* [PATCH V2 0/9] nvmet: add genblk ZBD backend
@ 2020-11-30  3:29 ` Chaitanya Kulkarni
  0 siblings, 0 replies; 44+ messages in thread
From: Chaitanya Kulkarni @ 2020-11-30  3:29 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: hch, sagi, damien.lemoal, Chaitanya Kulkarni

The NVMeOF host is capable of handling NVMe protocol based Zoned Block
Devices (ZBDs) in ZNS mode with the passthru backend. There is no
support for a generic block device backend to handle ZBDs that are not
NVMe devices.

This series adds support to export ZBDs (which are not NVMe drives)
from the target to the host over NVMeOF, using the host side ZNS
interface.

The patch series is built bottom-up: it first adds a prep patch and the
ZNS command-specific handlers on top of the generic block layer and
updates the data structures, then wires up the admin commands one by
one in the order the host calls them in the namespace initialization
sequence. Once everything is in place, it wires up the I/O command
handlers. See below for the patch-series overview.

I've tested the zonefs test cases with a memory-backed null_blk NVMeOF
namespace over the nvme-loop transport. The same test cases pass on the
NVMeOF zoned-bdev namespace and also pass for null_blk without NVMeOF.

Regards,
Chaitanya

Changes from V1:-

1. Remove the nvmet-$(CONFIG_BLK_DEV_ZONED) += zns.o.
2. Mark helpers inline.
3. Fix typos in the comments and update the comments.
4. Get rid of the curly brackets.
5. Don't allow drives with a smaller last zone.
6. Calculate the zasl as a function of max_zone_append_sectors and
   bio_max_pages so we don't have to split the bio (see the sketch
   after this list).
7. Add a global subsys->zasl and update the zasl when a new namespace
   is enabled.
8. Remove the loop in nvmet_bdev_execute_zone_mgmt_recv() and move the
   functionality into the report zone callback.
9. Add goto for the default case in nvmet_bdev_execute_zone_mgmt_send().
10. Allocate the zones buffer based on the requested zones size instead
    of the bdev nr_zones.
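
As a quick illustration of (6) above, here is roughly how the zasl
value falls out of max_zone_append_sectors (a sketch only; the
1024-sector example value is made up, and NVMET_MPSMIN_SHIFT = 12 as
in this series):

  static u8 example_zasl(unsigned int zone_append_sects)
  {
          /* 512-byte sectors -> bytes -> minimum memory pages (4K) */
          unsigned int npages = (zone_append_sects << 9) >> 12;

          /* e.g. 1024 sectors (512K) -> 128 pages -> ilog2(128) = 7 */
          return ilog2(npages);
  }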

Chaitanya Kulkarni (9):
  block: export __bio_iov_append_get_pages()
	Prep patch needed for implementing Zone Append.
  nvmet: add ZNS support for bdev-ns
	Core command handlers and various helpers for the ZBD backend,
	 which will be called by target-core/target-admin etc.
  nvmet: trim down id-desclist to use req->ns
	Cleanup needed to avoid code repetition when passing extra
	function parameters to the ZBD backend handlers.
  nvmet: add NVME_CSI_ZNS in ns-desc for zbdev
	Allows the host to identify a zoned namespace.
  nvmet: add cns-cs-ctrl in id-ctrl for ZNS bdev
	Allows the host to identify a controller with ZBD-ZNS support.
  nvmet: add cns-cs-ns in id-ctrl for ZNS bdev
	Allows the host to identify a namespace with ZBD-ZNS support.
  nvmet: add zns cmd effects to support zbdev
	Allows the host to use the ZNS commands when a zoned blkdev is
	 selected.
  nvmet: add zns bdev config support
	Allows the user to override target namespace attributes for
	 ZBD.
  nvmet: add ZNS based I/O cmds handlers
	Handlers for Zone-Mgmt-Send/Zone-Mgmt-Recv/Zone-Append.

 block/bio.c                       |   3 +-
 drivers/nvme/target/Makefile      |   2 +-
 drivers/nvme/target/admin-cmd.c   |  38 ++-
 drivers/nvme/target/io-cmd-bdev.c |  12 +
 drivers/nvme/target/io-cmd-file.c |   2 +-
 drivers/nvme/target/nvmet.h       |  19 ++
 drivers/nvme/target/zns.c         | 463 ++++++++++++++++++++++++++++++
 include/linux/bio.h               |   1 +
 8 files changed, 524 insertions(+), 16 deletions(-)
 create mode 100644 drivers/nvme/target/zns.c

-- 
2.22.1


* [PATCH V2 1/9] block: export __bio_iov_append_get_pages()
  2020-11-30  3:29 ` Chaitanya Kulkarni
@ 2020-11-30  3:29   ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 44+ messages in thread
From: Chaitanya Kulkarni @ 2020-11-30  3:29 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: hch, sagi, damien.lemoal, Chaitanya Kulkarni

In this prep patch we export __bio_iov_append_get_pages() so that the
NVMeOF target can reuse the core logic for building Zone Append bios
for REQ_OP_ZONE_APPEND without duplicating the code.
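
For context, the ZNS patch later in this series uses this export
roughly as follows (a trimmed sketch, not the exact handler; bdev,
sect, nr_vecs and the bvec-backed iter stand in for caller state):

  struct bio *bio = bio_alloc(GFP_KERNEL, nr_vecs);
  int ret;

  bio_set_dev(bio, bdev);
  bio->bi_iter.bi_sector = sect;
  bio->bi_opf = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;

  ret = __bio_iov_append_get_pages(bio, &iter);  /* the new export */
  if (ret) {
          bio_io_error(bio);
          return ret;
  }
  ret = submit_bio_wait(bio);
  bio_put(bio);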

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 block/bio.c         | 3 ++-
 include/linux/bio.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/bio.c b/block/bio.c
index fa01bef35bb1..de356fa28315 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1033,7 +1033,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	return 0;
 }
 
-static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
+int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
 {
 	unsigned short nr_pages = bio->bi_max_vecs - bio->bi_vcnt;
 	unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
@@ -1079,6 +1079,7 @@ static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
 	iov_iter_advance(iter, size - left);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(__bio_iov_append_get_pages);
 
 /**
  * bio_iov_iter_get_pages - add user or kernel pages to a bio
diff --git a/include/linux/bio.h b/include/linux/bio.h
index c6d765382926..47247c1b0b85 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -446,6 +446,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
 		unsigned int len, unsigned int off, bool *same_page);
 void __bio_add_page(struct bio *bio, struct page *page,
 		unsigned int len, unsigned int off);
+int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter);
 int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter);
 void bio_release_pages(struct bio *bio, bool mark_dirty);
 extern void bio_set_pages_dirty(struct bio *bio);
-- 
2.22.1


* [PATCH V2 2/9] nvmet: add ZNS support for bdev-ns
  2020-11-30  3:29 ` Chaitanya Kulkarni
@ 2020-11-30  3:29   ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 44+ messages in thread
From: Chaitanya Kulkarni @ 2020-11-30  3:29 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: hch, sagi, damien.lemoal, Chaitanya Kulkarni

Add zns-bdev-config, id-ctrl, id-ns, zns-cmd-effects, zone-mgmt-send,
zone-mgmt-recv and zone-append handlers for NVMeOF target to enable ZNS
support for bdev.
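
A note on the Zone Management Receive buffer sizing used in this patch
(a worked sketch; NUMD is the 0's-based dword count taken from the
command, and the report header and each zone descriptor are 64 bytes
per the ZNS specification):

  u32 bufsize = (le32_to_cpu(req->cmd->zmr.numd) + 1) << 2;

  /*
   * e.g. a report header plus 16 zone descriptors:
   *      bufsize = 64 + 16 * 64 = 1088 bytes
   *      numd    = (1088 / 4) - 1 = 271
   */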

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/Makefile      |   2 +-
 drivers/nvme/target/admin-cmd.c   |   4 +-
 drivers/nvme/target/io-cmd-file.c |   2 +-
 drivers/nvme/target/nvmet.h       |  19 ++
 drivers/nvme/target/zns.c         | 463 ++++++++++++++++++++++++++++++
 5 files changed, 486 insertions(+), 4 deletions(-)
 create mode 100644 drivers/nvme/target/zns.c

diff --git a/drivers/nvme/target/Makefile b/drivers/nvme/target/Makefile
index ebf91fc4c72e..d050f829b43a 100644
--- a/drivers/nvme/target/Makefile
+++ b/drivers/nvme/target/Makefile
@@ -10,7 +10,7 @@ obj-$(CONFIG_NVME_TARGET_FCLOOP)	+= nvme-fcloop.o
 obj-$(CONFIG_NVME_TARGET_TCP)		+= nvmet-tcp.o
 
 nvmet-y		+= core.o configfs.o admin-cmd.o fabrics-cmd.o \
-			discovery.o io-cmd-file.o io-cmd-bdev.o
+		   zns.o discovery.o io-cmd-file.o io-cmd-bdev.o
 nvmet-$(CONFIG_NVME_TARGET_PASSTHRU)	+= passthru.o
 nvme-loop-y	+= loop.o
 nvmet-rdma-y	+= rdma.o
diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index dca34489a1dc..509fd8dcca0c 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -579,8 +579,8 @@ static void nvmet_execute_identify_nslist(struct nvmet_req *req)
 	nvmet_req_complete(req, status);
 }
 
-static u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
-				    void *id, off_t *off)
+u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
+			     void *id, off_t *off)
 {
 	struct nvme_ns_id_desc desc = {
 		.nidt = type,
diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c
index 0abbefd9925e..2bd10960fa50 100644
--- a/drivers/nvme/target/io-cmd-file.c
+++ b/drivers/nvme/target/io-cmd-file.c
@@ -89,7 +89,7 @@ int nvmet_file_ns_enable(struct nvmet_ns *ns)
 	return ret;
 }
 
-static void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg)
+void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg)
 {
 	bv->bv_page = sg_page(sg);
 	bv->bv_offset = sg->offset;
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 592763732065..eee7866ae512 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -81,6 +81,10 @@ struct nvmet_ns {
 	struct pci_dev		*p2p_dev;
 	int			pi_type;
 	int			metadata_size;
+#ifdef CONFIG_BLK_DEV_ZONED
+	struct nvme_id_ns_zns	id_zns;
+	unsigned int		zasl;
+#endif
 };
 
 static inline struct nvmet_ns *to_nvmet_ns(struct config_item *item)
@@ -251,6 +255,10 @@ struct nvmet_subsys {
 	unsigned int		admin_timeout;
 	unsigned int		io_timeout;
 #endif /* CONFIG_NVME_TARGET_PASSTHRU */
+
+#ifdef CONFIG_BLK_DEV_ZONED
+	struct nvme_id_ctrl_zns	id_ctrl_zns;
+#endif
 };
 
 static inline struct nvmet_subsys *to_subsys(struct config_item *item)
@@ -603,4 +611,15 @@ static inline bool nvmet_ns_has_pi(struct nvmet_ns *ns)
 	return ns->pi_type && ns->metadata_size == sizeof(struct t10_pi_tuple);
 }
 
+void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req);
+void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req);
+u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off);
+bool nvmet_bdev_zns_enable(struct nvmet_ns *ns);
+void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req);
+void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req);
+void nvmet_bdev_execute_zone_append(struct nvmet_req *req);
+void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log);
+u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
+			     void *id, off_t *off);
+void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg);
 #endif /* _NVMET_H */
diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
new file mode 100644
index 000000000000..40dedfd51fd6
--- /dev/null
+++ b/drivers/nvme/target/zns.c
@@ -0,0 +1,463 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * NVMe ZNS-ZBD command implementation.
+ * Copyright (c) 2020-2021 HGST, a Western Digital Company.
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/uio.h>
+#include <linux/nvme.h>
+#include <linux/xarray.h>
+#include <linux/blkdev.h>
+#include <linux/module.h>
+#include "nvmet.h"
+
+#ifdef CONFIG_BLK_DEV_ZONED
+#define NVMET_MPSMIN_SHIFT	12
+
+static u16 nvmet_bdev_zns_checks(struct nvmet_req *req)
+{
+	u16 status = 0;
+
+	if (!bdev_is_zoned(req->ns->bdev)) {
+		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
+		goto out;
+	}
+
+	if (req->cmd->zmr.zra != NVME_ZRA_ZONE_REPORT) {
+		status = NVME_SC_INVALID_FIELD;
+		goto out;
+	}
+
+	if (req->cmd->zmr.zrasf != NVME_ZRASF_ZONE_REPORT_ALL) {
+		status = NVME_SC_INVALID_FIELD;
+		goto out;
+	}
+
+	if (req->cmd->zmr.pr != NVME_REPORT_ZONE_PARTIAL)
+		status = NVME_SC_INVALID_FIELD;
+out:
+	return status;
+}
+
+static inline struct block_device *nvmet_bdev(struct nvmet_req *req)
+{
+	return req->ns->bdev;
+}
+
+static inline u64 nvmet_zones_to_desc_size(unsigned int nr_zones)
+{
+	return sizeof(struct nvme_zone_report) +
+		(sizeof(struct nvme_zone_descriptor) * nr_zones);
+}
+
+static inline u64 nvmet_sect_to_lba(struct nvmet_ns *ns, sector_t sect)
+{
+	return sect >> (ns->blksize_shift - SECTOR_SHIFT);
+}
+
+static inline sector_t nvmet_lba_to_sect(struct nvmet_ns *ns, __le64 lba)
+{
+	return le64_to_cpu(lba) << (ns->blksize_shift - SECTOR_SHIFT);
+}
+
+/*
+ *  ZNS related command implementation and helpers.
+ */
+
+u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off)
+{
+	u8 nvme_cis_zns = NVME_CSI_ZNS;
+
+	if (!bdev_is_zoned(nvmet_bdev(req)))
+		return NVME_SC_SUCCESS;
+
+	return nvmet_copy_ns_identifier(req, NVME_NIDT_CSI, NVME_NIDT_CSI_LEN,
+					&nvme_cis_zns, off);
+}
+
+void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log)
+{
+	log->iocs[nvme_cmd_zone_append]		= cpu_to_le32(1 << 0);
+	log->iocs[nvme_cmd_zone_mgmt_send]	= cpu_to_le32(1 << 0);
+	log->iocs[nvme_cmd_zone_mgmt_recv]	= cpu_to_le32(1 << 0);
+}
+
+static int nvmet_bdev_validate_zns_zones_cb(struct blk_zone *z,
+					    unsigned int idx, void *data)
+{
+	struct blk_zone *zone = data;
+
+	memcpy(zone, z, sizeof(struct blk_zone));
+
+	return 0;
+}
+
+static inline bool nvmet_bdev_validate_zns_zones(struct nvmet_ns *ns)
+{
+	sector_t last_sect = get_capacity(ns->bdev->bd_disk) - 1;
+	struct blk_zone last_zone, first_zone;
+	int reported_zones;
+
+	reported_zones = blkdev_report_zones(ns->bdev, 0, 1,
+					     nvmet_bdev_validate_zns_zones_cb,
+					     &first_zone);
+	if (reported_zones != 1)
+		return false;
+
+	reported_zones = blkdev_report_zones(ns->bdev, last_sect, 1,
+					     nvmet_bdev_validate_zns_zones_cb,
+					     &last_zone);
+	if (reported_zones != 1)
+		return false;
+
+	return first_zone.capacity == last_zone.capacity;
+}
+
+static inline u8 nvmet_zasl(unsigned int zone_append_sects)
+{
+	unsigned int npages = (zone_append_sects << 9) >> NVMET_MPSMIN_SHIFT;
+	u8 zasl = ilog2(npages);
+
+	/*
+	 * The Zone Append Size Limit is expressed in units of the minimum
+	 * memory page size (i.e. 2^12) and is reported as a power of 2.
+	 */
+	return zasl;
+}
+
+static inline void nvmet_zns_update_zasl(struct nvmet_ns *ns)
+{
+	u8 bio_max_zasl = nvmet_zasl((BIO_MAX_PAGES * PAGE_SIZE) >> 9);
+	struct request_queue *q = ns->bdev->bd_disk->queue;
+	struct nvmet_ns *ins;
+	unsigned long idx;
+	u8 min_zasl;
+
+	/*
+	 * Calculate the new ctrl->zasl value when enabling a new ns. This
+	 * value has to be the minimum of the max_zone_append values from
+	 * the available namespaces.
+	 */
+	min_zasl = ns->zasl = nvmet_zasl(queue_max_zone_append_sectors(q));
+
+	xa_for_each(&ns->subsys->namespaces, idx, ins) {
+		struct request_queue *iq;
+		u8 izasl;
+
+		/* check the bdev before dereferencing it: ns may be file backed */
+		if (!ins->bdev || !bdev_is_zoned(ins->bdev))
+			continue;
+		iq = ins->bdev->bd_disk->queue;
+		izasl = nvmet_zasl(queue_max_zone_append_sectors(iq));
+		min_zasl = min_t(u8, min_zasl, izasl);
+	}
+
+	ns->subsys->id_ctrl_zns.zasl = min_t(u8, min_zasl, bio_max_zasl);
+}
+
+bool nvmet_bdev_zns_enable(struct nvmet_ns *ns)
+{
+	if (ns->bdev->bd_disk->queue->conv_zones_bitmap) {
+		pr_err("block devices with conventional zones are not supported\n");
+		return false;
+	}
+
+	if (!nvmet_bdev_validate_zns_zones(ns))
+		return false;
+
+	/*
+	 * For ZBC and ZAC devices, writes into sequential zones must be aligned
+	 * to the device physical block size. So use this value as the logical
+	 * block size to avoid errors.
+	 */
+	ns->blksize_shift = blksize_bits(bdev_physical_block_size(ns->bdev));
+
+	nvmet_zns_update_zasl(ns);
+
+	return true;
+}
+
+/*
+ * ZNS related Admin and I/O command handlers.
+ */
+void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
+{
+	struct nvmet_ctrl *ctrl = req->sq->ctrl;
+	struct nvme_id_ctrl_zns *id;
+	u16 status = 0;
+	u8 mdts;
+
+	id = kzalloc(sizeof(*id), GFP_KERNEL);
+	if (!id) {
+		status = NVME_SC_INTERNAL;
+		goto out;
+	}
+
+	/*
+	 * A Zone Append Size Limit of 0 indicates that the maximum data
+	 * transfer size for the Zone Append command is covered by the
+	 * controller Maximum Data Transfer Size (MDTS), so only clamp the
+	 * reported zasl by mdts when mdts itself is a real limit.
+	 */
+
+	mdts = ctrl->ops->get_mdts ? ctrl->ops->get_mdts(ctrl) : 0;
+	id->zasl = mdts ? min_t(u8, mdts, ctrl->subsys->id_ctrl_zns.zasl)
+			: ctrl->subsys->id_ctrl_zns.zasl;
+
+	status = nvmet_copy_to_sgl(req, 0, id, sizeof(*id));
+
+	kfree(id);
+out:
+	nvmet_req_complete(req, status);
+}
+
+void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
+{
+	struct nvme_id_ns_zns *id_zns;
+	u16 status = 0;
+	u64 zsze;
+
+	if (le32_to_cpu(req->cmd->identify.nsid) == NVME_NSID_ALL) {
+		req->error_loc = offsetof(struct nvme_identify, nsid);
+		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
+		goto out;
+	}
+
+	id_zns = kzalloc(sizeof(*id_zns), GFP_KERNEL);
+	if (!id_zns) {
+		status = NVME_SC_INTERNAL;
+		goto out;
+	}
+
+	req->ns = nvmet_find_namespace(req->sq->ctrl, req->cmd->identify.nsid);
+	if (!req->ns) {
+		status = NVME_SC_INTERNAL;
+		goto done;
+	}
+
+	if (!bdev_is_zoned(nvmet_bdev(req))) {
+		req->error_loc = offsetof(struct nvme_identify, nsid);
+		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
+		goto done;
+	}
+
+	nvmet_ns_revalidate(req->ns);
+	zsze = (bdev_zone_sectors(nvmet_bdev(req)) << 9) >>
+					req->ns->blksize_shift;
+	id_zns->lbafe[0].zsze = cpu_to_le64(zsze);
+	id_zns->mor = cpu_to_le32(bdev_max_open_zones(nvmet_bdev(req)));
+	id_zns->mar = cpu_to_le32(bdev_max_active_zones(nvmet_bdev(req)));
+
+done:
+	status = nvmet_copy_to_sgl(req, 0, id_zns, sizeof(*id_zns));
+	kfree(id_zns);
+out:
+	nvmet_req_complete(req, status);
+}
+
+struct nvmet_report_zone_data {
+	struct nvmet_ns *ns;
+	struct nvme_zone_report *rz;
+};
+
+static int nvmet_bdev_report_zone_cb(struct blk_zone *z, unsigned int idx,
+				     void *data)
+{
+	struct nvmet_report_zone_data *report_zone_data = data;
+	struct nvme_zone_descriptor *entries = report_zone_data->rz->entries;
+	struct nvmet_ns *ns = report_zone_data->ns;
+
+	entries[idx].zcap = cpu_to_le64(nvmet_sect_to_lba(ns, z->capacity));
+	entries[idx].zslba = cpu_to_le64(nvmet_sect_to_lba(ns, z->start));
+	entries[idx].wp = cpu_to_le64(nvmet_sect_to_lba(ns, z->wp));
+	entries[idx].za = z->reset ? 1 << 2 : 0;
+	entries[idx].zt = z->type;
+	entries[idx].zs = z->cond << 4;
+
+	return 0;
+}
+
+void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
+{
+	u32 bufsize = (le32_to_cpu(req->cmd->zmr.numd) + 1) << 2;
+	struct nvmet_report_zone_data data = { .ns = req->ns };
+	struct nvme_zone_mgmt_recv_cmd *zmr = &req->cmd->zmr;
+	sector_t sect = nvmet_lba_to_sect(req->ns, zmr->slba);
+	unsigned int nr_zones = bufsize / nvmet_zones_to_desc_size(1);
+	int reported_zones;
+	u16 status;
+
+	status = nvmet_bdev_zns_checks(req);
+	if (status)
+		goto out;
+
+	data.rz = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY);
+	if (!data.rz) {
+		status = NVME_SC_INTERNAL;
+		goto out;
+	}
+
+	reported_zones = blkdev_report_zones(nvmet_bdev(req), sect, nr_zones,
+					     nvmet_bdev_report_zone_cb,
+					     &data);
+	if (reported_zones < 0) {
+		status = NVME_SC_INTERNAL;
+		goto out_free_report_zones;
+	}
+
+	data.rz->nr_zones = cpu_to_le64(reported_zones);
+
+	status = nvmet_copy_to_sgl(req, 0, data.rz, bufsize);
+
+out_free_report_zones:
+	kvfree(data.rz);
+out:
+	nvmet_req_complete(req, status);
+}
+
+void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
+{
+	sector_t nr_sect = bdev_zone_sectors(nvmet_bdev(req));
+	struct nvme_zone_mgmt_send_cmd *c = &req->cmd->zms;
+	enum req_opf op = REQ_OP_LAST;
+	u16 status = NVME_SC_SUCCESS;
+	sector_t sect;
+	int ret;
+
+	sect = nvmet_lba_to_sect(req->ns, req->cmd->zms.slba);
+
+	switch (c->zsa) {
+	case NVME_ZONE_OPEN:
+		op = REQ_OP_ZONE_OPEN;
+		break;
+	case NVME_ZONE_CLOSE:
+		op = REQ_OP_ZONE_CLOSE;
+		break;
+	case NVME_ZONE_FINISH:
+		op = REQ_OP_ZONE_FINISH;
+		break;
+	case NVME_ZONE_RESET:
+		if (c->select_all)
+			nr_sect = get_capacity(nvmet_bdev(req)->bd_disk);
+		op = REQ_OP_ZONE_RESET;
+		break;
+	default:
+		status = NVME_SC_INVALID_FIELD;
+		goto out;
+	}
+
+	ret = blkdev_zone_mgmt(nvmet_bdev(req), op, sect, nr_sect, GFP_KERNEL);
+	if (ret)
+		status = NVME_SC_INTERNAL;
+
+out:
+	nvmet_req_complete(req, status);
+}
+
+void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
+{
+	unsigned long bv_cnt = req->sg_cnt;
+	int op = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
+	__le64 slba = req->cmd->rw.slba;
+	sector_t sect = nvmet_lba_to_sect(req->ns, slba);
+	u16 status = NVME_SC_SUCCESS;
+	size_t mapped_data_len = 0;
+	int sg_cnt = req->sg_cnt;
+	struct scatterlist *sg;
+	struct iov_iter from;
+	struct bio_vec *bvec;
+	size_t mapped_cnt;
+	struct bio *bio;
+	int ret;
+
+	if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req)))
+		return;
+
+	/*
+	 * ctrl->zasl is calculated with BIO_MAX_PAGES in mind so that we
+	 * never have to split the bio, i.e. we shouldn't see
+	 * sg_cnt > BIO_MAX_PAGES here since the zasl reported to the host
+	 * already accounts for BIO_MAX_PAGES.
+	 */
+	if (!req->sg_cnt)
+		goto out;
+
+	if (WARN_ON(req->sg_cnt > BIO_MAX_PAGES)) {
+		status = NVME_SC_INTERNAL | NVME_SC_DNR;
+		goto out;
+	}
+
+	bvec = kmalloc_array(bv_cnt, sizeof(*bvec), GFP_KERNEL);
+	if (!bvec) {
+		status = NVME_SC_INTERNAL | NVME_SC_DNR;
+		goto out;
+	}
+
+	for_each_sg(req->sg, sg, req->sg_cnt, mapped_cnt) {
+		nvmet_file_init_bvec(bvec, sg);
+		mapped_data_len += bvec[mapped_cnt].bv_len;
+		sg_cnt--;
+		if (mapped_cnt == bv_cnt)
+			break;
+	}
+
+	if (WARN_ON(sg_cnt)) {
+		status = NVME_SC_INTERNAL | NVME_SC_DNR;
+		goto out;
+	}
+
+	iov_iter_bvec(&from, WRITE, bvec, mapped_cnt, mapped_data_len);
+
+	bio = bio_alloc(GFP_KERNEL, bv_cnt);
+	bio_set_dev(bio, nvmet_bdev(req));
+	bio->bi_iter.bi_sector = sect;
+	bio->bi_opf = op;
+
+	ret = __bio_iov_append_get_pages(bio, &from);
+	if (unlikely(ret)) {
+		status = NVME_SC_INTERNAL | NVME_SC_DNR;
+		bio_io_error(bio);
+		goto bvec_free;
+	}
+
+	ret = submit_bio_wait(bio);
+	status = ret < 0 ? NVME_SC_INTERNAL : status;
+	bio_put(bio);
+
+	sect += (mapped_data_len >> 9);
+	req->cqe->result.u64 = cpu_to_le64(nvmet_sect_to_lba(req->ns, sect));
+
+bvec_free:
+	kfree(bvec);
+
+out:
+	nvmet_req_complete(req, status);
+}
+
+#else  /* CONFIG_BLK_DEV_ZONED */
+void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
+{
+}
+void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
+{
+}
+u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off)
+{
+	return 0;
+}
+bool nvmet_bdev_zns_enable(struct nvmet_ns *ns)
+{
+	return false;
+}
+void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
+{
+}
+void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
+{
+}
+void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
+{
+}
+void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log)
+{
+}
+#endif /* CONFIG_BLK_DEV_ZONED */
-- 
2.22.1


* [PATCH V2 3/9] nvmet: trim down id-desclist to use req->ns
  2020-11-30  3:29 ` Chaitanya Kulkarni
@ 2020-11-30  3:29   ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 44+ messages in thread
From: Chaitanya Kulkarni @ 2020-11-30  3:29 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: hch, sagi, damien.lemoal, Chaitanya Kulkarni

In this prep patch we remove the extra local variable struct nvmet_ns
in nvmet_execute_identify_desclist(), since req already has a member
that can be reused. This also eliminates the explicit call to
nvmet_put_namespace(), which is already present in the request
completion path.

This reduces the arguments to the function in the following patch that
implements ZNS for bdev-ns, so we can get away with passing just the
req argument instead of req and ns.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/admin-cmd.c | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 509fd8dcca0c..c64b40c631e0 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -603,37 +603,35 @@ u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
 
 static void nvmet_execute_identify_desclist(struct nvmet_req *req)
 {
-	struct nvmet_ns *ns;
 	u16 status = 0;
 	off_t off = 0;
 
-	ns = nvmet_find_namespace(req->sq->ctrl, req->cmd->identify.nsid);
-	if (!ns) {
+	req->ns = nvmet_find_namespace(req->sq->ctrl, req->cmd->identify.nsid);
+	if (!req->ns) {
 		req->error_loc = offsetof(struct nvme_identify, nsid);
 		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
 		goto out;
 	}
 
-	if (memchr_inv(&ns->uuid, 0, sizeof(ns->uuid))) {
+	if (memchr_inv(&req->ns->uuid, 0, sizeof(req->ns->uuid))) {
 		status = nvmet_copy_ns_identifier(req, NVME_NIDT_UUID,
 						  NVME_NIDT_UUID_LEN,
-						  &ns->uuid, &off);
+						  &req->ns->uuid, &off);
 		if (status)
-			goto out_put_ns;
+			goto out;
 	}
-	if (memchr_inv(ns->nguid, 0, sizeof(ns->nguid))) {
+	if (memchr_inv(req->ns->nguid, 0, sizeof(req->ns->nguid))) {
 		status = nvmet_copy_ns_identifier(req, NVME_NIDT_NGUID,
 						  NVME_NIDT_NGUID_LEN,
-						  &ns->nguid, &off);
+						  &req->ns->nguid, &off);
 		if (status)
-			goto out_put_ns;
+			goto out;
 	}
 
 	if (sg_zero_buffer(req->sg, req->sg_cnt, NVME_IDENTIFY_DATA_SIZE - off,
 			off) != NVME_IDENTIFY_DATA_SIZE - off)
 		status = NVME_SC_INTERNAL | NVME_SC_DNR;
-out_put_ns:
-	nvmet_put_namespace(ns);
+
 out:
 	nvmet_req_complete(req, status);
 }
-- 
2.22.1


* [PATCH V2 4/9] nvmet: add NVME_CSI_ZNS in ns-desc for zbdev
  2020-11-30  3:29 ` Chaitanya Kulkarni
@ 2020-11-30  3:29   ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 44+ messages in thread
From: Chaitanya Kulkarni @ 2020-11-30  3:29 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: hch, sagi, damien.lemoal, Chaitanya Kulkarni

When discovering ZNS support, the host side looks for the NVME_CSI_ZNS
value in the ns-desc. Update nvmet_execute_identify_desclist() so that
it now fills in the ns-desc with NVME_CSI_ZNS when the bdev is zoned.
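
For reference, the descriptor that nvmet_copy_ns_identifier() emits
for a zoned bdev has this shape (a sketch of the NIDT_CSI descriptor
layout; the NID payload is a single byte):

  struct nvme_ns_id_desc desc = {
          .nidt = NVME_NIDT_CSI,          /* Command Set Identifier type */
          .nidl = NVME_NIDT_CSI_LEN,      /* one byte of NID payload */
  };
  /* ...followed by one NID byte: NVME_CSI_ZNS (0x2) */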

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/admin-cmd.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index c64b40c631e0..d4fc1bb1a318 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -628,6 +628,10 @@ static void nvmet_execute_identify_desclist(struct nvmet_req *req)
 			goto out;
 	}
 
+	status = nvmet_process_zns_cis(req, &off);
+	if (status)
+		goto out;
+
 	if (sg_zero_buffer(req->sg, req->sg_cnt, NVME_IDENTIFY_DATA_SIZE - off,
 			off) != NVME_IDENTIFY_DATA_SIZE - off)
 		status = NVME_SC_INTERNAL | NVME_SC_DNR;
-- 
2.22.1


* [PATCH V2 5/9] nvmet: add cns-cs-ctrl in id-ctrl for ZNS bdev
  2020-11-30  3:29 ` Chaitanya Kulkarni
@ 2020-11-30  3:29   ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 44+ messages in thread
From: Chaitanya Kulkarni @ 2020-11-30  3:29 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: hch, sagi, damien.lemoal, Chaitanya Kulkarni

Update nvmet_execute_identify() such that it can now handle
NVME_ID_CNS_CS_CTRL when identify.csi is set to NVME_CSI_ZNS. This
allows the host to identify the controller's ZNS support.
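
For context, this handles a host-side Identify that looks roughly like
the following sketch (field values as defined in the NVMe spec; not a
verbatim copy of the host driver):

  struct nvme_command c = { };

  c.identify.opcode = nvme_admin_identify;
  c.identify.cns = NVME_ID_CNS_CS_CTRL;
  c.identify.csi = NVME_CSI_ZNS;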

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/admin-cmd.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index d4fc1bb1a318..e7d2b96cda6b 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -650,6 +650,10 @@ static void nvmet_execute_identify(struct nvmet_req *req)
 		return nvmet_execute_identify_ns(req);
 	case NVME_ID_CNS_CTRL:
 		return nvmet_execute_identify_ctrl(req);
+	case NVME_ID_CNS_CS_CTRL:
+		if (req->cmd->identify.csi == NVME_CSI_ZNS)
+			return nvmet_execute_identify_cns_cs_ctrl(req);
+		break;
 	case NVME_ID_CNS_NS_ACTIVE_LIST:
 		return nvmet_execute_identify_nslist(req);
 	case NVME_ID_CNS_NS_DESC_LIST:
-- 
2.22.1


* [PATCH V2 6/9] nvmet: add cns-cs-ns in id-ctrl for ZNS bdev
  2020-11-30  3:29 ` Chaitanya Kulkarni
@ 2020-11-30  3:29   ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 44+ messages in thread
From: Chaitanya Kulkarni @ 2020-11-30  3:29 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: hch, sagi, damien.lemoal, Chaitanya Kulkarni

Update nvmet_execute_identify() such that it can now handle
NVME_ID_CNS_CS_NS when identify.csi is set to NVME_CSI_ZNS. This
allows the host to identify a namespace with ZNS capabilities.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/admin-cmd.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index e7d2b96cda6b..cd368cbe3855 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -648,6 +648,10 @@ static void nvmet_execute_identify(struct nvmet_req *req)
 	switch (req->cmd->identify.cns) {
 	case NVME_ID_CNS_NS:
 		return nvmet_execute_identify_ns(req);
+	case NVME_ID_CNS_CS_NS:
+		if (req->cmd->identify.csi == NVME_CSI_ZNS)
+			return nvmet_execute_identify_cns_cs_ns(req);
+		break;
 	case NVME_ID_CNS_CTRL:
 		return nvmet_execute_identify_ctrl(req);
 	case NVME_ID_CNS_CS_CTRL:
-- 
2.22.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH V2 7/9] nvmet: add zns cmd effects to support zbdev
  2020-11-30  3:29 ` Chaitanya Kulkarni
@ 2020-11-30  3:29   ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 44+ messages in thread
From: Chaitanya Kulkarni @ 2020-11-30  3:29 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: hch, sagi, damien.lemoal, Chaitanya Kulkarni

Update the target-side command effects log to report support for
the ZNS commands for the zbdev backend.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/admin-cmd.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index cd368cbe3855..0099275951da 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -191,6 +191,8 @@ static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req)
 	log->iocs[nvme_cmd_dsm]			= cpu_to_le32(1 << 0);
 	log->iocs[nvme_cmd_write_zeroes]	= cpu_to_le32(1 << 0);
 
+	nvmet_zns_add_cmd_effects(log);
+
 	status = nvmet_copy_to_sgl(req, 0, log, sizeof(*log));
 
 	kfree(log);
-- 
2.22.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH V2 8/9] nvmet: add zns bdev config support
  2020-11-30  3:29 ` Chaitanya Kulkarni
@ 2020-11-30  3:29   ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 44+ messages in thread
From: Chaitanya Kulkarni @ 2020-11-30  3:29 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: hch, sagi, damien.lemoal, Chaitanya Kulkarni

For a zoned block device based bdev backend we need to override
ns->blksize_shift with the physical block size instead of the logical
block size so that writes to SMR drives do not result in an error.

Update the nvmet_bdev_ns_enable() to reflect that.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/io-cmd-bdev.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index 125dde3f410e..e1f6d59dd341 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -86,6 +86,9 @@ int nvmet_bdev_ns_enable(struct nvmet_ns *ns)
 	if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY_T10))
 		nvmet_bdev_ns_enable_integrity(ns);
 
+	if (bdev_is_zoned(ns->bdev) && !nvmet_bdev_zns_enable(ns))
+		return -EINVAL;
+
 	return 0;
 }
 
-- 
2.22.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread
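
For reference, the override that this commit message refers to lives in
nvmet_bdev_zns_enable() (added in patch 2/9 of this series):

	/*
	 * For ZBC and ZAC devices, writes into sequential zones must be
	 * aligned to the device physical block size. So use this value as
	 * the logical block size to avoid errors.
	 */
	ns->blksize_shift = blksize_bits(bdev_physical_block_size(ns->bdev));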

* [PATCH V2 9/9] nvmet: add ZNS based I/O cmds handlers
  2020-11-30  3:29 ` Chaitanya Kulkarni
@ 2020-11-30  3:29   ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 44+ messages in thread
From: Chaitanya Kulkarni @ 2020-11-30  3:29 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: hch, sagi, damien.lemoal, Chaitanya Kulkarni

Add zone-mgmt-send, zone-mgmt-recv and zone-append handlers for the
bdev backend so that it can support zoned block devices.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/io-cmd-bdev.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index e1f6d59dd341..25dcd0544d5d 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -453,6 +453,15 @@ u16 nvmet_bdev_parse_io_cmd(struct nvmet_req *req)
 	case nvme_cmd_write_zeroes:
 		req->execute = nvmet_bdev_execute_write_zeroes;
 		return 0;
+	case nvme_cmd_zone_append:
+		req->execute = nvmet_bdev_execute_zone_append;
+		return 0;
+	case nvme_cmd_zone_mgmt_recv:
+		req->execute = nvmet_bdev_execute_zone_mgmt_recv;
+		return 0;
+	case nvme_cmd_zone_mgmt_send:
+		req->execute = nvmet_bdev_execute_zone_mgmt_send;
+		return 0;
 	default:
 		pr_err("unhandled cmd %d on qid %d\n", cmd->common.opcode,
 		       req->sq->qid);
-- 
2.22.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/9] nvmet: add genblk ZBD backend
  2020-11-30  3:29 ` Chaitanya Kulkarni
@ 2020-11-30  6:51   ` Damien Le Moal
  -1 siblings, 0 replies; 44+ messages in thread
From: Damien Le Moal @ 2020-11-30  6:51 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-nvme; +Cc: hch, sagi

On 2020/11/30 12:29, Chaitanya Kulkarni wrote:
> NVMeOF Host is capable of handling the NVMe Protocol based Zoned Block
> Devices (ZBD) in the ZNS mode with the passthru backend. There is no
> support for a generic block device backend to handle the ZBD devices
> which are not NVMe devices.
> 
> This adds support to export the ZBD drives (which are not NVMe drives)
> to host from the target with NVMeOF using the host side ZNS interface.
> 
> The patch series is generated in bottom-top manner where, it first adds
> prep patch and ZNS command-specific handlers on the top of genblk and 
> updates the data structures, then one by one it wires up the admin cmds
> in the order host calls them in namespace initializing sequence. Once
> everything is ready, it wires-up the I/O command handlers. See below for 
> patch-series overview.
> 
> I've tested the ZoneFS testcases with the null_blk memory backed NVMeOF
> namespace with nvme-loop transport. The same testcases are passing on the
> NVMeOF zbd-ns and are passing for null_blk without NVMeOF .
> 
> Regards,
> Chaitanya
> 
> Changes from V1:-
> 
> 1. Remove the nvmet-$(CONFIG_BLK_DEV_ZONED) += zns.o.
> 2. Mark helpers inline.
> 3. Fix typos in the comments and update the comments.
> 4. Get rid of the curly brackets.
> 5. Don't allow drives with last smaller zones.
> 6. Calculate the zasl as a function of max_zone_append_sectors,
>    bio_max_pages so we don't have to split the bio.
> 7. Add global subsys->zasl and update the zasl when new namespace
>    is enabled.
> 8. Rmove the loop in the nvmet_bdev_execute_zone_mgmt_recv() and
>    move functionality in to the report zone callback.
> 9. Add goto for default case in nvmet_bdev_execute_zone_mgmt_send().
> 10, Allocate the zones buffer with zones size instead of bdev nr_zones.
> 
> Chaitanya Kulkarni (9):
>   block: export __bio_iov_append_get_pages()
> 	Prep patch needed for implementing Zone Append.
>   nvmet: add ZNS support for bdev-ns
> 	Core Command handlers and various helpers for ZBD backend which
> 	 will be called by target-core/target-admin etc.
>   nvmet: trim down id-desclist to use req->ns
> 	Cleanup needed to avoid the code repetation for passing extra
> 	function parameters for ZBD backend handlers.
>   nvmet: add NVME_CSI_ZNS in ns-desc for zbdev
> 	Allows host to identify zoned namesapce.
>   nvmet: add cns-cs-ctrl in id-ctrl for ZNS bdev
> 	Allows host to identify controller with the ZBD-ZNS.
>   nvmet: add cns-cs-ns in id-ctrl for ZNS bdev
> 	Allows host to identify namespace with the ZBD-ZNS.
>   nvmet: add zns cmd effects to support zbdev
> 	Allows host to support the ZNS commands when zoned-blkdev is
> 	 selected.
>   nvmet: add zns bdev config support
> 	Allows user to override any target namespace attributes for
> 	 ZBD.
>   nvmet: add ZNS based I/O cmds handlers
> 	Handlers for Zone-Mgmt-Send/Zone-Mgmt-Recv/Zone-Append.
> 
>  block/bio.c                       |   3 +-
>  drivers/nvme/target/Makefile      |   2 +-
>  drivers/nvme/target/admin-cmd.c   |  38 ++-
>  drivers/nvme/target/io-cmd-bdev.c |  12 +
>  drivers/nvme/target/io-cmd-file.c |   2 +-
>  drivers/nvme/target/nvmet.h       |  19 ++
>  drivers/nvme/target/zns.c         | 463 ++++++++++++++++++++++++++++++
>  include/linux/bio.h               |   1 +
>  8 files changed, 524 insertions(+), 16 deletions(-)
>  create mode 100644 drivers/nvme/target/zns.c
> 

I had a few questions about the failed zonefs tests that you reported in the
cover letter of V1. Did you run the tests again with V2 ? Do you still see the
errors or not ?


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH V2 2/9] nvmet: add ZNS support for bdev-ns
  2020-11-30  3:29   ` Chaitanya Kulkarni
@ 2020-11-30 11:57     ` Damien Le Moal
  -1 siblings, 0 replies; 44+ messages in thread
From: Damien Le Moal @ 2020-11-30 11:57 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-nvme; +Cc: hch, sagi

On 2020/11/30 12:29, Chaitanya Kulkarni wrote:
> Add zns-bdev-config, id-ctrl, id-ns, zns-cmd-effects, zone-mgmt-send,
> zone-mgmt-recv and zone-append handlers for NVMeOF target to enable ZNS
> support for bdev.
> 
> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
> ---
>  drivers/nvme/target/Makefile      |   2 +-
>  drivers/nvme/target/admin-cmd.c   |   4 +-
>  drivers/nvme/target/io-cmd-file.c |   2 +-
>  drivers/nvme/target/nvmet.h       |  19 ++
>  drivers/nvme/target/zns.c         | 463 ++++++++++++++++++++++++++++++
>  5 files changed, 486 insertions(+), 4 deletions(-)
>  create mode 100644 drivers/nvme/target/zns.c
> 
> diff --git a/drivers/nvme/target/Makefile b/drivers/nvme/target/Makefile
> index ebf91fc4c72e..d050f829b43a 100644
> --- a/drivers/nvme/target/Makefile
> +++ b/drivers/nvme/target/Makefile
> @@ -10,7 +10,7 @@ obj-$(CONFIG_NVME_TARGET_FCLOOP)	+= nvme-fcloop.o
>  obj-$(CONFIG_NVME_TARGET_TCP)		+= nvmet-tcp.o
>  
>  nvmet-y		+= core.o configfs.o admin-cmd.o fabrics-cmd.o \
> -			discovery.o io-cmd-file.o io-cmd-bdev.o
> +		   zns.o discovery.o io-cmd-file.o io-cmd-bdev.o
>  nvmet-$(CONFIG_NVME_TARGET_PASSTHRU)	+= passthru.o
>  nvme-loop-y	+= loop.o
>  nvmet-rdma-y	+= rdma.o
> diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
> index dca34489a1dc..509fd8dcca0c 100644
> --- a/drivers/nvme/target/admin-cmd.c
> +++ b/drivers/nvme/target/admin-cmd.c
> @@ -579,8 +579,8 @@ static void nvmet_execute_identify_nslist(struct nvmet_req *req)
>  	nvmet_req_complete(req, status);
>  }
>  
> -static u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
> -				    void *id, off_t *off)
> +u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
> +			     void *id, off_t *off)
>  {
>  	struct nvme_ns_id_desc desc = {
>  		.nidt = type,
> diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c
> index 0abbefd9925e..2bd10960fa50 100644
> --- a/drivers/nvme/target/io-cmd-file.c
> +++ b/drivers/nvme/target/io-cmd-file.c
> @@ -89,7 +89,7 @@ int nvmet_file_ns_enable(struct nvmet_ns *ns)
>  	return ret;
>  }
>  
> -static void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg)
> +void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg)
>  {
>  	bv->bv_page = sg_page(sg);
>  	bv->bv_offset = sg->offset;
> diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
> index 592763732065..eee7866ae512 100644
> --- a/drivers/nvme/target/nvmet.h
> +++ b/drivers/nvme/target/nvmet.h
> @@ -81,6 +81,10 @@ struct nvmet_ns {
>  	struct pci_dev		*p2p_dev;
>  	int			pi_type;
>  	int			metadata_size;
> +#ifdef CONFIG_BLK_DEV_ZONED
> +	struct nvme_id_ns_zns	id_zns;
> +	unsigned int		zasl;
> +#endif
>  };
>  
>  static inline struct nvmet_ns *to_nvmet_ns(struct config_item *item)
> @@ -251,6 +255,10 @@ struct nvmet_subsys {
>  	unsigned int		admin_timeout;
>  	unsigned int		io_timeout;
>  #endif /* CONFIG_NVME_TARGET_PASSTHRU */
> +
> +#ifdef CONFIG_BLK_DEV_ZONED
> +	struct nvme_id_ctrl_zns	id_ctrl_zns;
> +#endif
>  };
>  
>  static inline struct nvmet_subsys *to_subsys(struct config_item *item)
> @@ -603,4 +611,15 @@ static inline bool nvmet_ns_has_pi(struct nvmet_ns *ns)
>  	return ns->pi_type && ns->metadata_size == sizeof(struct t10_pi_tuple);
>  }
>  
> +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req);
> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req);
> +u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off);
> +bool nvmet_bdev_zns_enable(struct nvmet_ns *ns);
> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req);
> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req);
> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req);
> +void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log);
> +u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
> +			     void *id, off_t *off);
> +void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg);
>  #endif /* _NVMET_H */
> diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
> new file mode 100644
> index 000000000000..40dedfd51fd6
> --- /dev/null
> +++ b/drivers/nvme/target/zns.c
> @@ -0,0 +1,463 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * NVMe ZNS-ZBD command implementation.
> + * Copyright (c) 2020-2021 HGST, a Western Digital Company.
> + */
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +#include <linux/uio.h>
> +#include <linux/nvme.h>
> +#include <linux/xarray.h>
> +#include <linux/blkdev.h>
> +#include <linux/module.h>
> +#include "nvmet.h"
> +
> +#ifdef CONFIG_BLK_DEV_ZONED
> +#define NVMET_MPSMIN_SHIFT	12
> +
> +static u16 nvmet_bdev_zns_checks(struct nvmet_req *req)
> +{
> +	u16 status = 0;
> +
> +	if (!bdev_is_zoned(req->ns->bdev)) {
> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
> +		goto out;
> +	}
> +
> +	if (req->cmd->zmr.zra != NVME_ZRA_ZONE_REPORT) {
> +		status = NVME_SC_INVALID_FIELD;
> +		goto out;
> +	}
> +
> +	if (req->cmd->zmr.zrasf != NVME_ZRASF_ZONE_REPORT_ALL) {
> +		status = NVME_SC_INVALID_FIELD;
> +		goto out;
> +	}
> +
> +	if (req->cmd->zmr.pr != NVME_REPORT_ZONE_PARTIAL)
> +		status = NVME_SC_INVALID_FIELD;
> +out:
> +	return status;
> +}
> +
> +static inline struct block_device *nvmet_bdev(struct nvmet_req *req)
> +{
> +	return req->ns->bdev;
> +}
> +
> +static inline  u64 nvmet_zones_to_desc_size(unsigned int nr_zones)
> +{
> +	return sizeof(struct nvme_zone_report) +
> +		(sizeof(struct nvme_zone_descriptor) * nr_zones);
> +}
> +
> +static inline u64 nvmet_sect_to_lba(struct nvmet_ns *ns, sector_t sect)
> +{
> +	return sect >> (ns->blksize_shift - SECTOR_SHIFT);
> +}
> +
> +static inline sector_t nvmet_lba_to_sect(struct nvmet_ns *ns, __le64 lba)
> +{
> +	return le64_to_cpu(lba) << (ns->blksize_shift - SECTOR_SHIFT);
> +}
> +
> +/*
> + *  ZNS related command implementation and helpers.
> + */
> +
> +u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off)
> +{
> +	u16 nvme_cis_zns = NVME_CSI_ZNS;
> +
> +	if (!bdev_is_zoned(nvmet_bdev(req)))
> +		return NVME_SC_SUCCESS;
> +
> +	return nvmet_copy_ns_identifier(req, NVME_NIDT_CSI, NVME_NIDT_CSI_LEN,
> +					&nvme_cis_zns, off);
> +}
> +
> +void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log)
> +{
> +	log->iocs[nvme_cmd_zone_append]		= cpu_to_le32(1 << 0);
> +	log->iocs[nvme_cmd_zone_mgmt_send]	= cpu_to_le32(1 << 0);
> +	log->iocs[nvme_cmd_zone_mgmt_recv]	= cpu_to_le32(1 << 0);
> +}
> +
> +static int nvmet_bdev_validate_zns_zones_cb(struct blk_zone *z,
> +					    unsigned int idx, void *data)
> +{
> +	struct blk_zone *zone = data;
> +
> +	memcpy(zone, z, sizeof(struct blk_zone));
> +
> +	return 0;
> +}
> +
> +static inline bool nvmet_bdev_validate_zns_zones(struct nvmet_ns *ns)
> +{
> +	sector_t last_sect = get_capacity(ns->bdev->bd_disk) - 1;
> +	struct blk_zone last_zone, first_zone;
> +	int reported_zones;
> +
> +	reported_zones = blkdev_report_zones(ns->bdev, 0, 1,
> +					     nvmet_bdev_validate_zns_zones_cb,
> +					     &first_zone);
> +	if (reported_zones != 1)
> +		return false;
> +
> +	reported_zones = blkdev_report_zones(ns->bdev, last_sect, 1,
> +					     nvmet_bdev_validate_zns_zones_cb,
> +					     &last_zone);
> +	if (reported_zones != 1)
> +		return false;
> +
> +	return first_zone.capacity == last_zone.capacity ? true : false;
> +}

That is really heavy-handed... Why not simply:

return !(get_capacity(ns->bdev->bd_disk) & (bdev_zone_sectors(ns->bdev) - 1));

That does the same: true if capacity is a multiple of the zone size (no runt
zone), false otherwise.
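
A minimal sketch, keeping the original helper name and relying on the
block layer guarantee that the zone size is a power of two of sectors:

static inline bool nvmet_bdev_validate_zns_zones(struct nvmet_ns *ns)
{
	/* True only if the capacity is a whole multiple of the zone size. */
	return !(get_capacity(ns->bdev->bd_disk) &
		 (bdev_zone_sectors(ns->bdev) - 1));
}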

> +
> +static inline u8 nvmet_zasl(unsigned int zone_append_sects)
> +{
> +	unsigned int npages = (zone_append_sects << 9) >> NVMET_MPSMIN_SHIFT;
> +	u8 zasl = ilog2(npages);
> +
> +	/*
> +	 * Zone Append Size Limit is the value experessed in the units
> +	 * of minimum memory page size (i.e. 12) and is reported power of 2.
> +	 */
> +	return zasl;

Why not just:

	return ilog2((zone_append_sects << 9) >> NVMET_MPSMIN_SHIFT);

And are you sure that there is no possibility that the u8 type overflows here ?
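
For instance, a sketch that does the math on 64 bits so the shift cannot
overflow before the ilog2():

static inline u8 nvmet_zasl(unsigned int zone_append_sects)
{
	/*
	 * Zone Append Size Limit is expressed in units of the minimum
	 * memory page size (MPSMIN, i.e. 2^12) and is reported as a power
	 * of two. The result of ilog2() on a 64-bit value is at most 63,
	 * so it always fits in a u8.
	 */
	return ilog2(((u64)zone_append_sects << 9) >> NVMET_MPSMIN_SHIFT);
}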

> +}
> +
> +static inline void nvmet_zns_update_zasl(struct nvmet_ns *ns)
> +{
> +	u8 bio_max_zasl = nvmet_zasl((BIO_MAX_PAGES * PAGE_SIZE) >> 9);
> +	struct request_queue *q = ns->bdev->bd_disk->queue;
> +	struct nvmet_ns *ins;
> +	unsigned long idx;
> +	u8 min_zasl;
> +
> +	/*
> +	 * Calculate new ctrl->zasl value when enabling the new ns. This value
> +	 * has to be the minimum of the max_zone appned values from available

s/appned/append

> +	 * namespaces.
> +	 */
> +	min_zasl = ns->zasl = nvmet_zasl(queue_max_zone_append_sectors(q));
> +
> +	xa_for_each(&(ns->subsys->namespaces), idx, ins) {
> +		struct request_queue *iq = ins->bdev->bd_disk->queue;
> +		unsigned int imax_za_sects = queue_max_zone_append_sectors(iq);
> +		u8 izasl = nvmet_zasl(imax_za_sects);
> +
> +		if (!bdev_is_zoned(ins->bdev))
> +			continue;
> +
> +		min_zasl = min_zasl > izasl ? izasl : min_zasl;
> +	}
> +
> +	ns->subsys->id_ctrl_zns.zasl = min_t(u8, min_zasl, bio_max_zasl);

I do not understand what bio_max_zasl is used for here, as it does not
represent anything related to the zoned namespace backends.

> +}> +
> +bool nvmet_bdev_zns_enable(struct nvmet_ns *ns)
> +{
> +	if (ns->bdev->bd_disk->queue->conv_zones_bitmap) {
> +		pr_err("block devices with conventional zones are not supported.");
> +		return false;
> +	}

It may be better to move this test into nvmet_bdev_validate_zns_zones(), so
that all backend zone configuration checks are together.

> +
> +	if (!nvmet_bdev_validate_zns_zones(ns))
> +		return false;
> +
> +	/*
> +	 * For ZBC and ZAC devices, writes into sequential zones must be aligned
> +	 * to the device physical block size. So use this value as the logical
> +	 * block size to avoid errors.
> +	 */
> +	ns->blksize_shift = blksize_bits(bdev_physical_block_size(ns->bdev));
> +
> +	nvmet_zns_update_zasl(ns);
> +
> +	return true;
> +}
> +
> +/*
> + * ZNS related Admin and I/O command handlers.
> + */
> +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
> +{
> +	struct nvmet_ctrl *ctrl = req->sq->ctrl;
> +	struct nvme_id_ctrl_zns *id;
> +	u16 status = 0;
> +	u8 mdts;
> +
> +	id = kzalloc(sizeof(*id), GFP_KERNEL);
> +	if (!id) {
> +		status = NVME_SC_INTERNAL;
> +		goto out;
> +	}
> +
> +	/*
> +	 * Even though this function sets Zone Append Size Limit to 0,
> +	 * the 0 value here indicates that the maximum data transfer size for
> +	 * the Zone Append command is indicated by the ctrl
> +	 * Maximum Data Transfer Size (MDTS).
> +	 */

I do not understand this comment. It does not really match what the code
below is doing.

> +
> +	mdts = ctrl->ops->get_mdts ? ctrl->ops->get_mdts(ctrl) : 0;
> +
> +	id->zasl = min_t(u8, mdts, req->sq->ctrl->subsys->id_ctrl_zns.zasl);
> +
> +	status = nvmet_copy_to_sgl(req, 0, id, sizeof(*id));
> +
> +	kfree(id);
> +out:
> +	nvmet_req_complete(req, status);
> +}
> +
> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
> +{
> +	struct nvme_id_ns_zns *id_zns;
> +	u16 status = 0;
> +	u64 zsze;
> +
> +	if (le32_to_cpu(req->cmd->identify.nsid) == NVME_NSID_ALL) {
> +		req->error_loc = offsetof(struct nvme_identify, nsid);
> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
> +		goto out;
> +	}
> +
> +	id_zns = kzalloc(sizeof(*id_zns), GFP_KERNEL);
> +	if (!id_zns) {
> +		status = NVME_SC_INTERNAL;
> +		goto out;
> +	}
> +
> +	req->ns = nvmet_find_namespace(req->sq->ctrl, req->cmd->identify.nsid);
> +	if (!req->ns) {
> +		status = NVME_SC_INTERNAL;
> +		goto done;
> +	}
> +
> +	if (!bdev_is_zoned(nvmet_bdev(req))) {
> +		req->error_loc = offsetof(struct nvme_identify, nsid);
> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
> +		goto done;
> +	}
> +
> +	nvmet_ns_revalidate(req->ns);
> +	zsze = (bdev_zone_sectors(nvmet_bdev(req)) << 9) >>
> +					req->ns->blksize_shift;
> +	id_zns->lbafe[0].zsze = cpu_to_le64(zsze);
> +	id_zns->mor = cpu_to_le32(bdev_max_open_zones(nvmet_bdev(req)));
> +	id_zns->mar = cpu_to_le32(bdev_max_active_zones(nvmet_bdev(req)));
> +
> +done:
> +	status = nvmet_copy_to_sgl(req, 0, id_zns, sizeof(*id_zns));
> +	kfree(id_zns);
> +out:
> +	nvmet_req_complete(req, status);
> +}
> +
> +struct nvmet_report_zone_data {
> +	struct nvmet_ns *ns;
> +	struct nvme_zone_report *rz;
> +};
> +
> +static int nvmet_bdev_report_zone_cb(struct blk_zone *z, unsigned int idx,
> +				     void *data)
> +{
> +	struct nvmet_report_zone_data *report_zone_data = data;
> +	struct nvme_zone_descriptor *entries = report_zone_data->rz->entries;
> +	struct nvmet_ns *ns = report_zone_data->ns;
> +
> +	entries[idx].zcap = cpu_to_le64(nvmet_sect_to_lba(ns, z->capacity));
> +	entries[idx].zslba = cpu_to_le64(nvmet_sect_to_lba(ns, z->start));
> +	entries[idx].wp = cpu_to_le64(nvmet_sect_to_lba(ns, z->wp));
> +	entries[idx].za = z->reset ? 1 << 2 : 0;
> +	entries[idx].zt = z->type;
> +	entries[idx].zs = z->cond << 4;
> +
> +	return 0;
> +}
> +
> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
> +{
> +	u32 bufsize = (le32_to_cpu(req->cmd->zmr.numd) + 1) << 2;

size_t ?

> +	struct nvmet_report_zone_data data = { .ns = req->ns };
> +	struct nvme_zone_mgmt_recv_cmd *zmr = &req->cmd->zmr;
> +	sector_t sect = nvmet_lba_to_sect(req->ns, le64_to_cpu(zmr->slba));
> +	unsigned int nr_zones = bufsize / nvmet_zones_to_desc_size(1);

I think this is wrong since nvmet_zones_to_desc_size includes sizeof(struct
nvme_zone_report). I think what you want here is:

nr_zones = (bufsize - sizeof(struct nvme_zone_report)) /
	sizeof(struct nvme_zone_descriptor);

And then you can get rid of nvmet_zones_to_desc_size();

> +	int reported_zones;
> +	u16 status;
> +
> +	status = nvmet_bdev_zns_checks(req);
> +	if (status)
> +		goto out;
> +
> +	data.rz = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY);
> +	if (!data.rz) {
> +		status = NVME_SC_INTERNAL;
> +		goto out;
> +	}
> +
> +	reported_zones = blkdev_report_zones(nvmet_bdev(req), sect, nr_zones,
> +					     nvmet_bdev_report_zone_cb,
> +					     &data);
> +	if (reported_zones < 0) {
> +		status = NVME_SC_INTERNAL;
> +		goto out_free_report_zones;
> +	}
> +
> +	data.rz->nr_zones = cpu_to_le64(reported_zones);
> +
> +	status = nvmet_copy_to_sgl(req, 0, data.rz, bufsize);
> +
> +out_free_report_zones:
> +	kvfree(data.rz);
> +out:
> +	nvmet_req_complete(req, status);
> +}
> +
> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
> +{
> +	sector_t nr_sect = bdev_zone_sectors(nvmet_bdev(req));
> +	struct nvme_zone_mgmt_send_cmd *c = &req->cmd->zms;
> +	enum req_opf op = REQ_OP_LAST;
> +	u16 status = NVME_SC_SUCCESS;
> +	sector_t sect;
> +	int ret;
> +
> +	sect = nvmet_lba_to_sect(req->ns, le64_to_cpu(req->cmd->zms.slba));
> +
> +	switch (c->zsa) {
> +	case NVME_ZONE_OPEN:
> +		op = REQ_OP_ZONE_OPEN;
> +		break;
> +	case NVME_ZONE_CLOSE:
> +		op = REQ_OP_ZONE_CLOSE;
> +		break;
> +	case NVME_ZONE_FINISH:
> +		op = REQ_OP_ZONE_FINISH;
> +		break;
> +	case NVME_ZONE_RESET:
> +		if (c->select_all)
> +			nr_sect = get_capacity(nvmet_bdev(req)->bd_disk);
> +		op = REQ_OP_ZONE_RESET;
> +		break;
> +	default:
> +		status = NVME_SC_INVALID_FIELD;
> +		goto out;
> +	}
> +
> +	ret = blkdev_zone_mgmt(nvmet_bdev(req), op, sect, nr_sect, GFP_KERNEL);
> +	if (ret)
> +		status = NVME_SC_INTERNAL;
> +
> +out:
> +	nvmet_req_complete(req, status);
> +}
> +
> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
> +{
> +	unsigned long bv_cnt = req->sg_cnt;
> +	int op = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
> +	u64 slba = le64_to_cpu(req->cmd->rw.slba);
> +	sector_t sect = nvmet_lba_to_sect(req->ns, slba);
> +	u16 status = NVME_SC_SUCCESS;
> +	size_t mapped_data_len = 0;
> +	int sg_cnt = req->sg_cnt;
> +	struct scatterlist *sg;
> +	struct iov_iter from;
> +	struct bio_vec *bvec;
> +	size_t mapped_cnt;
> +	struct bio *bio;
> +	int ret;
> +
> +	if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req)))
> +		return;
> +
> +	/*
> +	 * When setting the ctrl->zasl we consider the BIO_MAX_PAGES so that we
> +	 * don't have to split the bio, i.e. we shouldn't get
> +	 * sg_cnt > BIO_MAX_PAGES since zasl on the host will limit the I/Os
> +	 * with the size that considers the BIO_MAX_PAGES.
> +	 */

What ? Unclear... You advertise max zone append as ctrl->zasl = min(all ns max
zone append sectors). What does BIO_MAX_PAGES have to do with anything ?

> +	if (!req->sg_cnt)
> +		goto out;
> +
> +	if (WARN_ON(req->sg_cnt > BIO_MAX_PAGES)) {
> +		status = NVME_SC_INTERNAL | NVME_SC_DNR;
> +		goto out;
> +	}
> +
> +	bvec = kmalloc_array(bv_cnt, sizeof(*bvec), GFP_KERNEL);
> +	if (!bvec) {
> +		status = NVME_SC_INTERNAL | NVME_SC_DNR;
> +		goto out;
> +	}
> +
> +	for_each_sg(req->sg, sg, req->sg_cnt, mapped_cnt) {
> +		nvmet_file_init_bvec(bvec, sg);
> +		mapped_data_len += bvec[mapped_cnt].bv_len;
> +		sg_cnt--;
> +		if (mapped_cnt == bv_cnt)
> +			break;
> +	}
> +
> +	if (WARN_ON(sg_cnt)) {
> +		status = NVME_SC_INTERNAL | NVME_SC_DNR;
> +		goto out;
> +	}
> +
> +	iov_iter_bvec(&from, WRITE, bvec, mapped_cnt, mapped_data_len);
> +
> +	bio = bio_alloc(GFP_KERNEL, bv_cnt);
> +	bio_set_dev(bio, nvmet_bdev(req));
> +	bio->bi_iter.bi_sector = sect;
> +	bio->bi_opf = op;

The op variable is used only here. It is not necessary.
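
I.e. simply:

	bio->bi_opf = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;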

> +
> +	ret =  __bio_iov_append_get_pages(bio, &from);

I still do not see why bio_iov_iter_get_pages() does not work here. Would you
care to explain please ?

> +	if (unlikely(ret)) {
> +		status = NVME_SC_INTERNAL | NVME_SC_DNR;
> +		bio_io_error(bio);
> +		goto bvec_free;
> +	}
> +
> +	ret = submit_bio_wait(bio);
> +	status = ret < 0 ? NVME_SC_INTERNAL : status;
> +	bio_put(bio);
> +
> +	sect += (mapped_data_len >> 9);
> +	req->cqe->result.u64 = le64_to_cpu(nvmet_sect_to_lba(req->ns, sect));
> +
> +bvec_free:
> +	kfree(bvec);
> +
> +out:
> +	nvmet_req_complete(req, status);
> +}
> +
> +#else  /* CONFIG_BLK_DEV_ZONED */
> +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
> +{
> +}
> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
> +{
> +}
> +u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off)
> +{
> +	return 0;
> +}
> +bool nvmet_bdev_zns_config(struct nvmet_ns *ns)
> +{
> +	return false;
> +}
> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
> +{
> +}
> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
> +{
> +}
> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
> +{
> +}
> +void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log)
> +{
> +}
> +#endif /* CONFIG_BLK_DEV_ZONED */
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH V2 8/9] nvmet: add zns bdev config support
  2020-11-30  3:29   ` Chaitanya Kulkarni
@ 2020-11-30 12:02     ` Damien Le Moal
  -1 siblings, 0 replies; 44+ messages in thread
From: Damien Le Moal @ 2020-11-30 12:02 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-nvme; +Cc: hch, sagi

On 2020/11/30 12:30, Chaitanya Kulkarni wrote:
> For zbd based bdev backend we need to override the ns->blksize_shift
> with the physical block size instead of using the logical block size
> so that SMR drives will not result in an error.
> 
> Update the nvmet_bdev_ns_enable() to reflect that.
> 
> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
> ---
>  drivers/nvme/target/io-cmd-bdev.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
> index 125dde3f410e..e1f6d59dd341 100644
> --- a/drivers/nvme/target/io-cmd-bdev.c
> +++ b/drivers/nvme/target/io-cmd-bdev.c
> @@ -86,6 +86,9 @@ int nvmet_bdev_ns_enable(struct nvmet_ns *ns)
>  	if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY_T10))
>  		nvmet_bdev_ns_enable_integrity(ns);
>  
> +	if (bdev_is_zoned(ns->bdev) && !nvmet_bdev_zns_enable(ns))
> +		return -EINVAL;
> +
>  	return 0;
>  }
>  
> 

I think this should be merged with patch 9. I do not see the point in having
this as a separate patch.

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH V2 2/9] nvmet: add ZNS support for bdev-ns
  2020-11-30  3:29   ` Chaitanya Kulkarni
@ 2020-11-30 12:29     ` Johannes Thumshirn
  -1 siblings, 0 replies; 44+ messages in thread
From: Johannes Thumshirn @ 2020-11-30 12:29 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-nvme; +Cc: hch, sagi, Damien Le Moal

On 30/11/2020 04:32, Chaitanya Kulkarni wrote:
> +	ret =  __bio_iov_append_get_pages(bio, &from);

Can't you just use bio_iov_iter_get_pages() here?

It does have a 

if (WARN_ON_ONCE(is_bvec))
	return -EINVAL;

in it but I think that can be deleted.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH V2 2/9] nvmet: add ZNS support for bdev-ns
  2020-11-30 11:57     ` Damien Le Moal
@ 2020-12-01  3:38       ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 44+ messages in thread
From: Chaitanya Kulkarni @ 2020-12-01  3:38 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, linux-nvme; +Cc: hch, sagi

On 11/30/20 03:57, Damien Le Moal wrote:
> On 2020/11/30 12:29, Chaitanya Kulkarni wrote:
>> Add zns-bdev-config, id-ctrl, id-ns, zns-cmd-effects, zone-mgmt-send,
>> zone-mgmt-recv and zone-append handlers for NVMeOF target to enable ZNS
>> support for bdev.
>>
>> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
>> ---
>>  drivers/nvme/target/Makefile      |   2 +-
>>  drivers/nvme/target/admin-cmd.c   |   4 +-
>>  drivers/nvme/target/io-cmd-file.c |   2 +-
>>  drivers/nvme/target/nvmet.h       |  19 ++
>>  drivers/nvme/target/zns.c         | 463 ++++++++++++++++++++++++++++++
>>  5 files changed, 486 insertions(+), 4 deletions(-)
>>  create mode 100644 drivers/nvme/target/zns.c
>>
>> diff --git a/drivers/nvme/target/Makefile b/drivers/nvme/target/Makefile
>> index ebf91fc4c72e..d050f829b43a 100644
>> --- a/drivers/nvme/target/Makefile
>> +++ b/drivers/nvme/target/Makefile
>> @@ -10,7 +10,7 @@ obj-$(CONFIG_NVME_TARGET_FCLOOP)	+= nvme-fcloop.o
>>  obj-$(CONFIG_NVME_TARGET_TCP)		+= nvmet-tcp.o
>>  
>>  nvmet-y		+= core.o configfs.o admin-cmd.o fabrics-cmd.o \
>> -			discovery.o io-cmd-file.o io-cmd-bdev.o
>> +		   zns.o discovery.o io-cmd-file.o io-cmd-bdev.o
>>  nvmet-$(CONFIG_NVME_TARGET_PASSTHRU)	+= passthru.o
>>  nvme-loop-y	+= loop.o
>>  nvmet-rdma-y	+= rdma.o
>> diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
>> index dca34489a1dc..509fd8dcca0c 100644
>> --- a/drivers/nvme/target/admin-cmd.c
>> +++ b/drivers/nvme/target/admin-cmd.c
>> @@ -579,8 +579,8 @@ static void nvmet_execute_identify_nslist(struct nvmet_req *req)
>>  	nvmet_req_complete(req, status);
>>  }
>>  
>> -static u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
>> -				    void *id, off_t *off)
>> +u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
>> +			     void *id, off_t *off)
>>  {
>>  	struct nvme_ns_id_desc desc = {
>>  		.nidt = type,
>> diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c
>> index 0abbefd9925e..2bd10960fa50 100644
>> --- a/drivers/nvme/target/io-cmd-file.c
>> +++ b/drivers/nvme/target/io-cmd-file.c
>> @@ -89,7 +89,7 @@ int nvmet_file_ns_enable(struct nvmet_ns *ns)
>>  	return ret;
>>  }
>>  
>> -static void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg)
>> +void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg)
>>  {
>>  	bv->bv_page = sg_page(sg);
>>  	bv->bv_offset = sg->offset;
>> diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
>> index 592763732065..eee7866ae512 100644
>> --- a/drivers/nvme/target/nvmet.h
>> +++ b/drivers/nvme/target/nvmet.h
>> @@ -81,6 +81,10 @@ struct nvmet_ns {
>>  	struct pci_dev		*p2p_dev;
>>  	int			pi_type;
>>  	int			metadata_size;
>> +#ifdef CONFIG_BLK_DEV_ZONED
>> +	struct nvme_id_ns_zns	id_zns;
>> +	unsigned int		zasl;
>> +#endif
>>  };
>>  
>>  static inline struct nvmet_ns *to_nvmet_ns(struct config_item *item)
>> @@ -251,6 +255,10 @@ struct nvmet_subsys {
>>  	unsigned int		admin_timeout;
>>  	unsigned int		io_timeout;
>>  #endif /* CONFIG_NVME_TARGET_PASSTHRU */
>> +
>> +#ifdef CONFIG_BLK_DEV_ZONED
>> +	struct nvme_id_ctrl_zns	id_ctrl_zns;
>> +#endif
>>  };
>>  
>>  static inline struct nvmet_subsys *to_subsys(struct config_item *item)
>> @@ -603,4 +611,15 @@ static inline bool nvmet_ns_has_pi(struct nvmet_ns *ns)
>>  	return ns->pi_type && ns->metadata_size == sizeof(struct t10_pi_tuple);
>>  }
>>  
>> +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req);
>> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req);
>> +u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off);
>> +bool nvmet_bdev_zns_enable(struct nvmet_ns *ns);
>> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req);
>> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req);
>> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req);
>> +void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log);
>> +u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
>> +			     void *id, off_t *off);
>> +void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg);
>>  #endif /* _NVMET_H */
>> diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
>> new file mode 100644
>> index 000000000000..40dedfd51fd6
>> --- /dev/null
>> +++ b/drivers/nvme/target/zns.c
>> @@ -0,0 +1,463 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * NVMe ZNS-ZBD command implementation.
>> + * Copyright (c) 2020-2021 HGST, a Western Digital Company.
>> + */
>> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>> +#include <linux/uio.h>
>> +#include <linux/nvme.h>
>> +#include <linux/xarray.h>
>> +#include <linux/blkdev.h>
>> +#include <linux/module.h>
>> +#include "nvmet.h"
>> +
>> +#ifdef CONFIG_BLK_DEV_ZONED
>> +#define NVMET_MPSMIN_SHIFT	12
>> +
>> +static u16 nvmet_bdev_zns_checks(struct nvmet_req *req)
>> +{
>> +	u16 status = 0;
>> +
>> +	if (!bdev_is_zoned(req->ns->bdev)) {
>> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
>> +		goto out;
>> +	}
>> +
>> +	if (req->cmd->zmr.zra != NVME_ZRA_ZONE_REPORT) {
>> +		status = NVME_SC_INVALID_FIELD;
>> +		goto out;
>> +	}
>> +
>> +	if (req->cmd->zmr.zrasf != NVME_ZRASF_ZONE_REPORT_ALL) {
>> +		status = NVME_SC_INVALID_FIELD;
>> +		goto out;
>> +	}
>> +
>> +	if (req->cmd->zmr.pr != NVME_REPORT_ZONE_PARTIAL)
>> +		status = NVME_SC_INVALID_FIELD;
>> +out:
>> +	return status;
>> +}
>> +
>> +static inline struct block_device *nvmet_bdev(struct nvmet_req *req)
>> +{
>> +	return req->ns->bdev;
>> +}
>> +
>> +static inline  u64 nvmet_zones_to_desc_size(unsigned int nr_zones)
>> +{
>> +	return sizeof(struct nvme_zone_report) +
>> +		(sizeof(struct nvme_zone_descriptor) * nr_zones);
>> +}
>> +
>> +static inline u64 nvmet_sect_to_lba(struct nvmet_ns *ns, sector_t sect)
>> +{
>> +	return sect >> (ns->blksize_shift - SECTOR_SHIFT);
>> +}
>> +
>> +static inline sector_t nvmet_lba_to_sect(struct nvmet_ns *ns, __le64 lba)
>> +{
>> +	return le64_to_cpu(lba) << (ns->blksize_shift - SECTOR_SHIFT);
>> +}
>> +
>> +/*
>> + *  ZNS related command implementation and helpers.
>> + */
>> +
>> +u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off)
>> +{
>> +	u16 nvme_cis_zns = NVME_CSI_ZNS;
>> +
>> +	if (!bdev_is_zoned(nvmet_bdev(req)))
>> +		return NVME_SC_SUCCESS;
>> +
>> +	return nvmet_copy_ns_identifier(req, NVME_NIDT_CSI, NVME_NIDT_CSI_LEN,
>> +					&nvme_cis_zns, off);
>> +}
>> +
>> +void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log)
>> +{
>> +	log->iocs[nvme_cmd_zone_append]		= cpu_to_le32(1 << 0);
>> +	log->iocs[nvme_cmd_zone_mgmt_send]	= cpu_to_le32(1 << 0);
>> +	log->iocs[nvme_cmd_zone_mgmt_recv]	= cpu_to_le32(1 << 0);
>> +}
>> +
>> +static int nvmet_bdev_validate_zns_zones_cb(struct blk_zone *z,
>> +					    unsigned int idx, void *data)
>> +{
>> +	struct blk_zone *zone = data;
>> +
>> +	memcpy(zone, z, sizeof(struct blk_zone));
>> +
>> +	return 0;
>> +}
>> +
>> +static inline bool nvmet_bdev_validate_zns_zones(struct nvmet_ns *ns)
>> +{
>> +	sector_t last_sect = get_capacity(ns->bdev->bd_disk) - 1;
>> +	struct blk_zone last_zone, first_zone;
>> +	int reported_zones;
>> +
>> +	reported_zones = blkdev_report_zones(ns->bdev, 0, 1,
>> +					     nvmet_bdev_validate_zns_zones_cb,
>> +					     &first_zone);
>> +	if (reported_zones != 1)
>> +		return false;
>> +
>> +	reported_zones = blkdev_report_zones(ns->bdev, last_sect, 1,
>> +					     nvmet_bdev_validate_zns_zones_cb,
>> +					     &last_zone);
>> +	if (reported_zones != 1)
>> +		return false;
>> +
>> +	return first_zone.capacity == last_zone.capacity ? true : false;
>> +}
> That is really heavy handed... Why not just simply:
>
> return !(get_capacity(ns->bdev->bd_disk) & (bdev_zone_sectors(ns->bdev) - 1));
>
> That does the same: true if capacity is a multiple of the zone size (no runt
> zone), false otherwise.
>
Okay.
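For reference, a minimal sketch of the suggested check (helper name
hypothetical); it relies on the block layer guarantee that the zone size
is a power of two:

	static inline bool nvmet_bdev_has_runt_zone(struct nvmet_ns *ns)
	{
		/* Non-zero iff capacity is not a multiple of the zone
		 * size, i.e. the last zone is smaller than the rest.
		 */
		return get_capacity(ns->bdev->bd_disk) &
		       (bdev_zone_sectors(ns->bdev) - 1);
	}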
>> +
>> +static inline u8 nvmet_zasl(unsigned int zone_append_sects)
>> +{
>> +	unsigned int npages = (zone_append_sects << 9) >> NVMET_MPSMIN_SHIFT;
>> +	u8 zasl = ilog2(npages);
>> +
>> +	/*
>> +	 * Zone Append Size Limit is the value experessed in the units
>> +	 * of minimum memory page size (i.e. 12) and is reported power of 2.
>> +	 */
>> +	return zasl;
> Why not just:
>
> 	return ilog2((zone_append_sects << 9) >> NVMET_MPSMIN_SHIFT);
Okay.
> And are you sure that there is no possibility that the u8 type overflows here ?
nvmet_zasl() is called with arguments which are unsigned ints:-
1. (BIO_MAX_PAGES * PAGE_SIZE) >> 9.
2. queue_max_zone_append_sectors().
3. queue_max_zone_append_sectors().

The size of an unsigned int on a 32-bit machine is 4 bytes, so
0xFFFFFFFF >> 12 = 0xFFFFF and log2(0xFFFFF) should fit into a u8,
unless I'm completely wrong.
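To make that bound explicit, a sketch of the reduced helper with the
arithmetic spelled out (assuming zone_append_sects << 9 does not overflow
the unsigned int, i.e. the queue limit stays below 0x7FFFFF sectors):

	static inline u8 nvmet_zasl(unsigned int zone_append_sects)
	{
		/* At most (0x7FFFFF << 9) >> 12 = 0xFFFFF pages, and
		 * ilog2(0xFFFFF) = 19, so the result fits in a u8.
		 */
		return ilog2((zone_append_sects << 9) >> NVMET_MPSMIN_SHIFT);
	}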
>> +}
>> +
>> +static inline void nvmet_zns_update_zasl(struct nvmet_ns *ns)
>> +{
>> +	u8 bio_max_zasl = nvmet_zasl((BIO_MAX_PAGES * PAGE_SIZE) >> 9);
>> +	struct request_queue *q = ns->bdev->bd_disk->queue;
>> +	struct nvmet_ns *ins;
>> +	unsigned long idx;
>> +	u8 min_zasl;
>> +
>> +	/*
>> +	 * Calculate new ctrl->zasl value when enabling the new ns. This value
>> +	 * has to be the minimum of the max_zone appned values from available
> s/appned/append
Okay.
>> +	 * namespaces.
>> +	 */
>> +	min_zasl = ns->zasl = nvmet_zasl(queue_max_zone_append_sectors(q));
>> +
>> +	xa_for_each(&(ns->subsys->namespaces), idx, ins) {
>> +		struct request_queue *iq = ins->bdev->bd_disk->queue;
>> +		unsigned int imax_za_sects = queue_max_zone_append_sectors(iq);
>> +		u8 izasl = nvmet_zasl(imax_za_sects);
>> +
>> +		if (!bdev_is_zoned(ins->bdev))
>> +			continue;
>> +
>> +		min_zasl = min_zasl > izasl ? izasl : min_zasl;
>> +	}
>> +
>> +	ns->subsys->id_ctrl_zns.zasl = min_t(u8, min_zasl, bio_max_zasl);
> I do not understand what bio_max_zasl is used for here, as it does not represent
> anything related to the zoned namespaces backends.
If we don't consider the bio max zasl then we may get a request larger
than what one bio can handle for zone append, which will lead to a bio
split that needs to be avoided.
>> +}
>> +
>> +bool nvmet_bdev_zns_enable(struct nvmet_ns *ns)
>> +{
>> +	if (ns->bdev->bd_disk->queue->conv_zones_bitmap) {
>> +		pr_err("block devices with conventional zones are not supported.");
>> +		return false;
>> +	}
> It may be better to move this test in nvmet_bdev_validate_zns_zones(), so that
> all backend zone configuration checks are together.
Okay.
>> +
>> +	if (!nvmet_bdev_validate_zns_zones(ns))
>> +		return false;
>> +
>> +	/*
>> +	 * For ZBC and ZAC devices, writes into sequential zones must be aligned
>> +	 * to the device physical block size. So use this value as the logical
>> +	 * block size to avoid errors.
>> +	 */
>> +	ns->blksize_shift = blksize_bits(bdev_physical_block_size(ns->bdev));
>> +
>> +	nvmet_zns_update_zasl(ns);
>> +
>> +	return true;
>> +}
>> +
>> +/*
>> + * ZNS related Admin and I/O command handlers.
>> + */
>> +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
>> +{
>> +	struct nvmet_ctrl *ctrl = req->sq->ctrl;
>> +	struct nvme_id_ctrl_zns *id;
>> +	u16 status = 0;
>> +	u8 mdts;
>> +
>> +	id = kzalloc(sizeof(*id), GFP_KERNEL);
>> +	if (!id) {
>> +		status = NVME_SC_INTERNAL;
>> +		goto out;
>> +	}
>> +
>> +	/*
>> +	 * Even though this function sets Zone Append Size Limit to 0,
>> +	 * the 0 value here indicates that the maximum data transfer size for
>> +	 * the Zone Append command is indicated by the ctrl
>> +	 * Maximum Data Transfer Size (MDTS).
>> +	 */
> I do not understand this comment. It does not really exactly match what the code
> below is doing.
Let me fix the code and the comment in next version.
>> +
>> +	mdts = ctrl->ops->get_mdts ? ctrl->ops->get_mdts(ctrl) : 0;
>> +
>> +	id->zasl = min_t(u8, mdts, req->sq->ctrl->subsys->id_ctrl_zns.zasl);
>> +
>> +	status = nvmet_copy_to_sgl(req, 0, id, sizeof(*id));
>> +
>> +	kfree(id);
>> +out:
>> +	nvmet_req_complete(req, status);
>> +}
>> +
>> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
>> +{
>> +	struct nvme_id_ns_zns *id_zns;
>> +	u16 status = 0;
>> +	u64 zsze;
>> +
>> +	if (le32_to_cpu(req->cmd->identify.nsid) == NVME_NSID_ALL) {
>> +		req->error_loc = offsetof(struct nvme_identify, nsid);
>> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
>> +		goto out;
>> +	}
>> +
>> +	id_zns = kzalloc(sizeof(*id_zns), GFP_KERNEL);
>> +	if (!id_zns) {
>> +		status = NVME_SC_INTERNAL;
>> +		goto out;
>> +	}
>> +
>> +	req->ns = nvmet_find_namespace(req->sq->ctrl, req->cmd->identify.nsid);
>> +	if (!req->ns) {
>> +		status = NVME_SC_INTERNAL;
>> +		goto done;
>> +	}
>> +
>> +	if (!bdev_is_zoned(nvmet_bdev(req))) {
>> +		req->error_loc = offsetof(struct nvme_identify, nsid);
>> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
>> +		goto done;
>> +	}
>> +
>> +	nvmet_ns_revalidate(req->ns);
>> +	zsze = (bdev_zone_sectors(nvmet_bdev(req)) << 9) >>
>> +					req->ns->blksize_shift;
>> +	id_zns->lbafe[0].zsze = cpu_to_le64(zsze);
>> +	id_zns->mor = cpu_to_le32(bdev_max_open_zones(nvmet_bdev(req)));
>> +	id_zns->mar = cpu_to_le32(bdev_max_active_zones(nvmet_bdev(req)));
>> +
>> +done:
>> +	status = nvmet_copy_to_sgl(req, 0, id_zns, sizeof(*id_zns));
>> +	kfree(id_zns);
>> +out:
>> +	nvmet_req_complete(req, status);
>> +}
>> +
>> +struct nvmet_report_zone_data {
>> +	struct nvmet_ns *ns;
>> +	struct nvme_zone_report *rz;
>> +};
>> +
>> +static int nvmet_bdev_report_zone_cb(struct blk_zone *z, unsigned int idx,
>> +				     void *data)
>> +{
>> +	struct nvmet_report_zone_data *report_zone_data = data;
>> +	struct nvme_zone_descriptor *entries = report_zone_data->rz->entries;
>> +	struct nvmet_ns *ns = report_zone_data->ns;
>> +
>> +	entries[idx].zcap = cpu_to_le64(nvmet_sect_to_lba(ns, z->capacity));
>> +	entries[idx].zslba = cpu_to_le64(nvmet_sect_to_lba(ns, z->start));
>> +	entries[idx].wp = cpu_to_le64(nvmet_sect_to_lba(ns, z->wp));
>> +	entries[idx].za = z->reset ? 1 << 2 : 0;
>> +	entries[idx].zt = z->type;
>> +	entries[idx].zs = z->cond << 4;
>> +
>> +	return 0;
>> +}
>> +
>> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
>> +{
>> +	u32 bufsize = (le32_to_cpu(req->cmd->zmr.numd) + 1) << 2;
> size_t ?
Yes.
>> +	struct nvmet_report_zone_data data = { .ns = req->ns };
>> +	struct nvme_zone_mgmt_recv_cmd *zmr = &req->cmd->zmr;
>> +	sector_t sect = nvmet_lba_to_sect(req->ns, le64_to_cpu(zmr->slba));
>> +	unsigned int nr_zones = bufsize / nvmet_zones_to_desc_size(1);
> I think this is wrong since nvmet_zones_to_desc_size includes sizeof(struct
> nvme_zone_report). I think what you want here is:
>
> nr_zones = (bufsize - sizeof(struct nvme_zone_report)) /
> 	sizeof(struct nvme_zone_descriptor);
>
> And then you can get rid of nvmet_zones_to_desc_size();
Yes, nvmet_zones_to_desc_size() is no longer needed; it is a leftover from v1.
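For example, with the corrected computation (both the report header and
each descriptor are 64 bytes per the ZNS spec):

	size_t bufsize = (le32_to_cpu(req->cmd->zmr.numd) + 1) << 2;
	unsigned int nr_zones = (bufsize - sizeof(struct nvme_zone_report)) /
				sizeof(struct nvme_zone_descriptor);

	/* numd = 1023 dwords -> bufsize = 4096 bytes, so
	 * nr_zones = (4096 - 64) / 64 = 63 descriptors.
	 */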
>> +	int reported_zones;
>> +	u16 status;
>> +
>> +	status = nvmet_bdev_zns_checks(req);
>> +	if (status)
>> +		goto out;
>> +
>> +	data.rz = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY);
>> +	if (!data.rz) {
>> +		status = NVME_SC_INTERNAL;
>> +		goto out;
>> +	}
>> +
>> +	reported_zones = blkdev_report_zones(nvmet_bdev(req), sect, nr_zones,
>> +					     nvmet_bdev_report_zone_cb,
>> +					     &data);
>> +	if (reported_zones < 0) {
>> +		status = NVME_SC_INTERNAL;
>> +		goto out_free_report_zones;
>> +	}
>> +
>> +	data.rz->nr_zones = cpu_to_le64(reported_zones);
>> +
>> +	status = nvmet_copy_to_sgl(req, 0, data.rz, bufsize);
>> +
>> +out_free_report_zones:
>> +	kvfree(data.rz);
>> +out:
>> +	nvmet_req_complete(req, status);
>> +}
>> +
>> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
>> +{
>> +	sector_t nr_sect = bdev_zone_sectors(nvmet_bdev(req));
>> +	struct nvme_zone_mgmt_send_cmd *c = &req->cmd->zms;
>> +	enum req_opf op = REQ_OP_LAST;
>> +	u16 status = NVME_SC_SUCCESS;
>> +	sector_t sect;
>> +	int ret;
>> +
>> +	sect = nvmet_lba_to_sect(req->ns, le64_to_cpu(req->cmd->zms.slba));
>> +
>> +	switch (c->zsa) {
>> +	case NVME_ZONE_OPEN:
>> +		op = REQ_OP_ZONE_OPEN;
>> +		break;
>> +	case NVME_ZONE_CLOSE:
>> +		op = REQ_OP_ZONE_CLOSE;
>> +		break;
>> +	case NVME_ZONE_FINISH:
>> +		op = REQ_OP_ZONE_FINISH;
>> +		break;
>> +	case NVME_ZONE_RESET:
>> +		if (c->select_all)
>> +			nr_sect = get_capacity(nvmet_bdev(req)->bd_disk);
>> +		op = REQ_OP_ZONE_RESET;
>> +		break;
>> +	default:
>> +		status = NVME_SC_INVALID_FIELD;
>> +		goto out;
>> +	}
>> +
>> +	ret = blkdev_zone_mgmt(nvmet_bdev(req), op, sect, nr_sect, GFP_KERNEL);
>> +	if (ret)
>> +		status = NVME_SC_INTERNAL;
>> +
>> +out:
>> +	nvmet_req_complete(req, status);
>> +}
>> +
>> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
>> +{
>> +	unsigned long bv_cnt = req->sg_cnt;
>> +	int op = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
>> +	u64 slba = le64_to_cpu(req->cmd->rw.slba);
>> +	sector_t sect = nvmet_lba_to_sect(req->ns, slba);
>> +	u16 status = NVME_SC_SUCCESS;
>> +	size_t mapped_data_len = 0;
>> +	int sg_cnt = req->sg_cnt;
>> +	struct scatterlist *sg;
>> +	struct iov_iter from;
>> +	struct bio_vec *bvec;
>> +	size_t mapped_cnt;
>> +	struct bio *bio;
>> +	int ret;
>> +
>> +	if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req)))
>> +		return;
>> +
>> +	/*
>> +	 * When setting the ctrl->zasl we consider the BIO_MAX_PAGES so that we
>> +	 * don't have to split the bio, i.e. we shouldn't get
>> +	 * sg_cnt > BIO_MAX_PAGES since zasl on the host will limit the I/Os
>> +	 * with the size that considers the BIO_MAX_PAGES.
>> +	 */
> What ? Unclear... You advertize max zone append as ctrl->zasl = min(all ns max
> zone append sectors). What does BIO_MAX_PAGES have to do with anything ?
That is not true; ctrl->zasl is advertised as
min(min over all ns max zone append sectors, bio_max_zasl) to avoid
splitting of the bio on the target side:-

See nvmet_zns_update_zasl()

+	ns->subsys->id_ctrl_zns.zasl = min_t(u8, min_zasl, bio_max_zasl);

Without considering the bio_max_pages we may have to set the ctrl->zasl
value to something greater than bio_max_pages_zasl; then we'll get a
request larger than what one bio can handle, which will lead to splitting
the bios, and that is what we want to avoid as per the comment on V1.

Can you please explain what is wrong with this approach, which regulates
the zasl with all the possible namespaces' zasl and bio_zasl?

Maybe there is a better way?
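To make the clamp concrete, a worked example assuming 4 KiB pages and
BIO_MAX_PAGES = 256:

	bio_max_zasl = nvmet_zasl((256 * 4096) >> 9)	/* 2048 sectors */
		     = ilog2((2048 << 9) >> 12)		/* 256 MPSMIN pages */
		     = 8;				/* 2^8 * 4 KiB = 1 MiB */

so any per-namespace zasl above 8 gets clamped to the 1 MiB that a single
bio can carry.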
>> +	if (!req->sg_cnt)
>> +		goto out;
>> +
>> +	if (WARN_ON(req->sg_cnt > BIO_MAX_PAGES)) {
>> +		status = NVME_SC_INTERNAL | NVME_SC_DNR;
>> +		goto out;
>> +	}
>> +
>> +	bvec = kmalloc_array(bv_cnt, sizeof(*bvec), GFP_KERNEL);
>> +	if (!bvec) {
>> +		status = NVME_SC_INTERNAL | NVME_SC_DNR;
>> +		goto out;
>> +	}
>> +
>> +	for_each_sg(req->sg, sg, req->sg_cnt, mapped_cnt) {
>> +		nvmet_file_init_bvec(bvec, sg);
>> +		mapped_data_len += bvec[mapped_cnt].bv_len;
>> +		sg_cnt--;
>> +		if (mapped_cnt == bv_cnt)
>> +			break;
>> +	}
>> +
>> +	if (WARN_ON(sg_cnt)) {
>> +		status = NVME_SC_INTERNAL | NVME_SC_DNR;
>> +		goto out;
>> +	}
>> +
>> +	iov_iter_bvec(&from, WRITE, bvec, mapped_cnt, mapped_data_len);
>> +
>> +	bio = bio_alloc(GFP_KERNEL, bv_cnt);
>> +	bio_set_dev(bio, nvmet_bdev(req));
>> +	bio->bi_iter.bi_sector = sect;
>> +	bio->bi_opf = op;
> The op variable is used only here. It is not necessary.
Yes.
>> +
>> +	ret =  __bio_iov_append_get_pages(bio, &from);
> I still do not see why bio_iov_iter_get_pages() does not work here. Would you
> care to explain please ?
>
The same reason pointed out by Johannes. Instead of calling the wrapper,
call the underlying core API, since we don't care about the rest of the
generic code. This also avoids an extra function call in the zone-append
fast path.
>> +	if (unlikely(ret)) {
>> +		status = NVME_SC_INTERNAL | NVME_SC_DNR;
>> +		bio_io_error(bio);
>> +		goto bvec_free;
>> +	}
>> +
>> +	ret = submit_bio_wait(bio);
>> +	status = ret < 0 ? NVME_SC_INTERNAL : status;
>> +	bio_put(bio);
>> +
>> +	sect += (mapped_data_len >> 9);
>> +	req->cqe->result.u64 = le64_to_cpu(nvmet_sect_to_lba(req->ns, sect));
>> +
>> +bvec_free:
>> +	kfree(bvec);
>> +
>> +out:
>> +	nvmet_req_complete(req, status);
>> +}
>> +
>> +#else  /* CONFIG_BLK_DEV_ZONED */
>> +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
>> +{
>> +}
>> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
>> +{
>> +}
>> +u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off)
>> +{
>> +	return 0;
>> +}
>> +bool nvmet_bdev_zns_config(struct nvmet_ns *ns)
>> +{
>> +	return false;
>> +}
>> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
>> +{
>> +}
>> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
>> +{
>> +}
>> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
>> +{
>> +}
>> +void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log)
>> +{
>> +}
>> +#endif /* CONFIG_BLK_DEV_ZONED */
>>
>


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH V2 8/9] nvmet: add zns bdev config support
  2020-11-30 12:02     ` Damien Le Moal
@ 2020-12-01  3:40       ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 44+ messages in thread
From: Chaitanya Kulkarni @ 2020-12-01  3:40 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, linux-nvme; +Cc: hch, sagi

On 11/30/20 04:02, Damien Le Moal wrote:
>>  
>>
> I think this should be merged with patch 9. I do not see the point in having
> this as a separate patch.

These two patches are doing different things: one deals with the target
side configuration management and the other deals with the host side I/O
handlers, which have no relation to the target side configuration
management, so I kept it separate.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/9] nvmet: add genblk ZBD backend
  2020-11-30  6:51   ` Damien Le Moal
@ 2020-12-01  3:42     ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 44+ messages in thread
From: Chaitanya Kulkarni @ 2020-12-01  3:42 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, linux-nvme; +Cc: hch, sagi

On 11/29/20 22:52, Damien Le Moal wrote:
>>  block/bio.c                       |   3 +-
>>  drivers/nvme/target/Makefile      |   2 +-
>>  drivers/nvme/target/admin-cmd.c   |  38 ++-
>>  drivers/nvme/target/io-cmd-bdev.c |  12 +
>>  drivers/nvme/target/io-cmd-file.c |   2 +-
>>  drivers/nvme/target/nvmet.h       |  19 ++
>>  drivers/nvme/target/zns.c         | 463 ++++++++++++++++++++++++++++++
>>  include/linux/bio.h               |   1 +
>>  8 files changed, 524 insertions(+), 16 deletions(-)
>>  create mode 100644 drivers/nvme/target/zns.c
>>
> I had a few questions about the failed zonefs tests that you reported in the
> cover letter of V1. Did you run the tests again with V2 ? Do you still see the
> errors or not ?
>
Please have a look at the V3 cover letter; it has the updated test log.



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH V2 2/9] nvmet: add ZNS support for bdev-ns
  2020-11-30 12:29     ` Johannes Thumshirn
@ 2020-12-01  3:49       ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 44+ messages in thread
From: Chaitanya Kulkarni @ 2020-12-01  3:49 UTC (permalink / raw)
  To: Johannes Thumshirn, linux-block, linux-nvme; +Cc: hch, sagi, Damien Le Moal

On 11/30/20 04:29, Johannes Thumshirn wrote:
> On 30/11/2020 04:32, Chaitanya Kulkarni wrote:
>> +	ret =  __bio_iov_append_get_pages(bio, &from);
> Can't you just use bio_iov_iter_get_pages() here?
>
> It does have a 
>
> if (WARN_ON_ONCE(is_bvec))
> 	return -EINVAL;
>
> in it but I think that can be deleted.
>
That was my initial patch, but it adds an extra function call to the
fast path for NVMeOF. We don't need any of the generic functionality from
bio_iov_iter_get_pages() anyway.

Why add an extra function call overhead in the hot path for each I/O ?
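For context, this is roughly the shape of the wrapper at this point in
time (paraphrased from block/bio.c, not verbatim); calling
__bio_iov_append_get_pages() directly skips the whole dispatch below,
including the WARN_ON_ONCE() that rejects bvec iterators for zone append:

	int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
	{
		const bool is_bvec = iov_iter_is_bvec(iter);
		int ret;

		do {
			if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
				if (WARN_ON_ONCE(is_bvec))
					return -EINVAL;
				ret = __bio_iov_append_get_pages(bio, iter);
			} else if (is_bvec) {
				ret = __bio_iov_bvec_add_pages(bio, iter);
			} else {
				ret = __bio_iov_iter_get_pages(bio, iter);
			}
		} while (!ret && iov_iter_count(iter) && !bio_full(bio, 0));

		return bio->bi_vcnt ? 0 : ret;
	}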


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH V2 2/9] nvmet: add ZNS support for bdev-ns
  2020-12-01  3:38       ` Chaitanya Kulkarni
@ 2020-12-01  5:44         ` Damien Le Moal
  -1 siblings, 0 replies; 44+ messages in thread
From: Damien Le Moal @ 2020-12-01  5:44 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-nvme; +Cc: hch, sagi

On 2020/12/01 12:38, Chaitanya Kulkarni wrote:
[...]
>>> +	 * namespaces.
>>> +	 */
>>> +	min_zasl = ns->zasl = nvmet_zasl(queue_max_zone_append_sectors(q));
>>> +
>>> +	xa_for_each(&(ns->subsys->namespaces), idx, ins) {
>>> +		struct request_queue *iq = ins->bdev->bd_disk->queue;
>>> +		unsigned int imax_za_sects = queue_max_zone_append_sectors(iq);
>>> +		u8 izasl = nvmet_zasl(imax_za_sects);
>>> +
>>> +		if (!bdev_is_zoned(ins->bdev))
>>> +			continue;
>>> +
>>> +		min_zasl = min_zasl > izasl ? izasl : min_zasl;
>>> +	}
>>> +
>>> +	ns->subsys->id_ctrl_zns.zasl = min_t(u8, min_zasl, bio_max_zasl);
>> I do not understand what bio_max_zasl is used for here, as it does not represent
>> anything related to the zoned namespaces backends.
> If we don't consider the bio max zasl then we may get a request larger
> than what one bio can handle for zone append, which will lead to a bio
> split that needs to be avoided.

Why ? zasl comes from the target backend max zone append sectors, which we
already know is OK and does not lead to BIO splitting for zone append. So why do
you need an extra verification against that bio_max_zasl ?

>>> +}
>>> +
>>> +bool nvmet_bdev_zns_enable(struct nvmet_ns *ns)
>>> +{
>>> +	if (ns->bdev->bd_disk->queue->conv_zones_bitmap) {
>>> +		pr_err("block devices with conventional zones are not supported.");
>>> +		return false;
>>> +	}
>> It may be better to move this test in nvmet_bdev_validate_zns_zones(), so that
>> all backend zone configuration checks are together.
> Okay.
>>> +
>>> +	if (!nvmet_bdev_validate_zns_zones(ns))
>>> +		return false;
>>> +
>>> +	/*
>>> +	 * For ZBC and ZAC devices, writes into sequential zones must be aligned
>>> +	 * to the device physical block size. So use this value as the logical
>>> +	 * block size to avoid errors.
>>> +	 */
>>> +	ns->blksize_shift = blksize_bits(bdev_physical_block_size(ns->bdev));
>>> +
>>> +	nvmet_zns_update_zasl(ns);
>>> +
>>> +	return true;
>>> +}
>>> +
>>> +/*
>>> + * ZNS related Admin and I/O command handlers.
>>> + */
>>> +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
>>> +{
>>> +	struct nvmet_ctrl *ctrl = req->sq->ctrl;
>>> +	struct nvme_id_ctrl_zns *id;
>>> +	u16 status = 0;
>>> +	u8 mdts;
>>> +
>>> +	id = kzalloc(sizeof(*id), GFP_KERNEL);
>>> +	if (!id) {
>>> +		status = NVME_SC_INTERNAL;
>>> +		goto out;
>>> +	}
>>> +
>>> +	/*
>>> +	 * Even though this function sets Zone Append Size Limit to 0,
>>> +	 * the 0 value here indicates that the maximum data transfer size for
>>> +	 * the Zone Append command is indicated by the ctrl
>>> +	 * Maximum Data Transfer Size (MDTS).
>>> +	 */
>> I do not understand this comment. It does not really exactly match what the code
>> below is doing.
> Let me fix the code and the comment in next version.
>>> +
>>> +	mdts = ctrl->ops->get_mdts ? ctrl->ops->get_mdts(ctrl) : 0;
>>> +
>>> +	id->zasl = min_t(u8, mdts, req->sq->ctrl->subsys->id_ctrl_zns.zasl);
>>> +
>>> +	status = nvmet_copy_to_sgl(req, 0, id, sizeof(*id));
>>> +
>>> +	kfree(id);
>>> +out:
>>> +	nvmet_req_complete(req, status);
>>> +}
>>> +
>>> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
>>> +{
>>> +	struct nvme_id_ns_zns *id_zns;
>>> +	u16 status = 0;
>>> +	u64 zsze;
>>> +
>>> +	if (le32_to_cpu(req->cmd->identify.nsid) == NVME_NSID_ALL) {
>>> +		req->error_loc = offsetof(struct nvme_identify, nsid);
>>> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
>>> +		goto out;
>>> +	}
>>> +
>>> +	id_zns = kzalloc(sizeof(*id_zns), GFP_KERNEL);
>>> +	if (!id_zns) {
>>> +		status = NVME_SC_INTERNAL;
>>> +		goto out;
>>> +	}
>>> +
>>> +	req->ns = nvmet_find_namespace(req->sq->ctrl, req->cmd->identify.nsid);
>>> +	if (!req->ns) {
>>> +		status = NVME_SC_INTERNAL;
>>> +		goto done;
>>> +	}
>>> +
>>> +	if (!bdev_is_zoned(nvmet_bdev(req))) {
>>> +		req->error_loc = offsetof(struct nvme_identify, nsid);
>>> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
>>> +		goto done;
>>> +	}
>>> +
>>> +	nvmet_ns_revalidate(req->ns);
>>> +	zsze = (bdev_zone_sectors(nvmet_bdev(req)) << 9) >>
>>> +					req->ns->blksize_shift;
>>> +	id_zns->lbafe[0].zsze = cpu_to_le64(zsze);
>>> +	id_zns->mor = cpu_to_le32(bdev_max_open_zones(nvmet_bdev(req)));
>>> +	id_zns->mar = cpu_to_le32(bdev_max_active_zones(nvmet_bdev(req)));
>>> +
>>> +done:
>>> +	status = nvmet_copy_to_sgl(req, 0, id_zns, sizeof(*id_zns));
>>> +	kfree(id_zns);
>>> +out:
>>> +	nvmet_req_complete(req, status);
>>> +}
>>> +
>>> +struct nvmet_report_zone_data {
>>> +	struct nvmet_ns *ns;
>>> +	struct nvme_zone_report *rz;
>>> +};
>>> +
>>> +static int nvmet_bdev_report_zone_cb(struct blk_zone *z, unsigned int idx,
>>> +				     void *data)
>>> +{
>>> +	struct nvmet_report_zone_data *report_zone_data = data;
>>> +	struct nvme_zone_descriptor *entries = report_zone_data->rz->entries;
>>> +	struct nvmet_ns *ns = report_zone_data->ns;
>>> +
>>> +	entries[idx].zcap = cpu_to_le64(nvmet_sect_to_lba(ns, z->capacity));
>>> +	entries[idx].zslba = cpu_to_le64(nvmet_sect_to_lba(ns, z->start));
>>> +	entries[idx].wp = cpu_to_le64(nvmet_sect_to_lba(ns, z->wp));
>>> +	entries[idx].za = z->reset ? 1 << 2 : 0;
>>> +	entries[idx].zt = z->type;
>>> +	entries[idx].zs = z->cond << 4;
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
>>> +{
>>> +	u32 bufsize = (le32_to_cpu(req->cmd->zmr.numd) + 1) << 2;
>> size_t ?
> Yes.
>>> +	struct nvmet_report_zone_data data = { .ns = req->ns };
>>> +	struct nvme_zone_mgmt_recv_cmd *zmr = &req->cmd->zmr;
>>> +	sector_t sect = nvmet_lba_to_sect(req->ns, le64_to_cpu(zmr->slba));
>>> +	unsigned int nr_zones = bufsize / nvmet_zones_to_desc_size(1);
>> I think this is wrong since nvmet_zones_to_desc_size includes sizeof(struct
>> nvme_zone_report). I think what you want here is:
>>
>> nr_zones = (bufsize - sizeof(struct nvme_zone_report)) /
>> 	sizeof(struct nvme_zone_descriptor);
>>
>> And then you can get rid of nvmet_zones_to_desc_size();
> Yes, it doesn't need nvmet_zones_to_desc_size() anymore from v1.
>>> +	int reported_zones;
>>> +	u16 status;
>>> +
>>> +	status = nvmet_bdev_zns_checks(req);
>>> +	if (status)
>>> +		goto out;
>>> +
>>> +	data.rz = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY);
>>> +	if (!data.rz) {
>>> +		status = NVME_SC_INTERNAL;
>>> +		goto out;
>>> +	}
>>> +
>>> +	reported_zones = blkdev_report_zones(nvmet_bdev(req), sect, nr_zones,
>>> +					     nvmet_bdev_report_zone_cb,
>>> +					     &data);
>>> +	if (reported_zones < 0) {
>>> +		status = NVME_SC_INTERNAL;
>>> +		goto out_free_report_zones;
>>> +	}
>>> +
>>> +	data.rz->nr_zones = cpu_to_le64(reported_zones);
>>> +
>>> +	status = nvmet_copy_to_sgl(req, 0, data.rz, bufsize);
>>> +
>>> +out_free_report_zones:
>>> +	kvfree(data.rz);
>>> +out:
>>> +	nvmet_req_complete(req, status);
>>> +}
>>> +
>>> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
>>> +{
>>> +	sector_t nr_sect = bdev_zone_sectors(nvmet_bdev(req));
>>> +	struct nvme_zone_mgmt_send_cmd *c = &req->cmd->zms;
>>> +	enum req_opf op = REQ_OP_LAST;
>>> +	u16 status = NVME_SC_SUCCESS;
>>> +	sector_t sect;
>>> +	int ret;
>>> +
>>> +	sect = nvmet_lba_to_sect(req->ns, le64_to_cpu(req->cmd->zms.slba));
>>> +
>>> +	switch (c->zsa) {
>>> +	case NVME_ZONE_OPEN:
>>> +		op = REQ_OP_ZONE_OPEN;
>>> +		break;
>>> +	case NVME_ZONE_CLOSE:
>>> +		op = REQ_OP_ZONE_CLOSE;
>>> +		break;
>>> +	case NVME_ZONE_FINISH:
>>> +		op = REQ_OP_ZONE_FINISH;
>>> +		break;
>>> +	case NVME_ZONE_RESET:
>>> +		if (c->select_all)
>>> +			nr_sect = get_capacity(nvmet_bdev(req)->bd_disk);
>>> +		op = REQ_OP_ZONE_RESET;
>>> +		break;
>>> +	default:
>>> +		status = NVME_SC_INVALID_FIELD;
>>> +		goto out;
>>> +	}
>>> +
>>> +	ret = blkdev_zone_mgmt(nvmet_bdev(req), op, sect, nr_sect, GFP_KERNEL);
>>> +	if (ret)
>>> +		status = NVME_SC_INTERNAL;
>>> +
>>> +out:
>>> +	nvmet_req_complete(req, status);
>>> +}
>>> +
>>> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
>>> +{
>>> +	unsigned long bv_cnt = req->sg_cnt;
>>> +	int op = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
>>> +	u64 slba = le64_to_cpu(req->cmd->rw.slba);
>>> +	sector_t sect = nvmet_lba_to_sect(req->ns, slba);
>>> +	u16 status = NVME_SC_SUCCESS;
>>> +	size_t mapped_data_len = 0;
>>> +	int sg_cnt = req->sg_cnt;
>>> +	struct scatterlist *sg;
>>> +	struct iov_iter from;
>>> +	struct bio_vec *bvec;
>>> +	size_t mapped_cnt;
>>> +	struct bio *bio;
>>> +	int ret;
>>> +
>>> +	if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req)))
>>> +		return;
>>> +
>>> +	/*
>>> +	 * When setting the ctrl->zasl we consider the BIO_MAX_PAGES so that we
>>> +	 * don't have to split the bio, i.e. we shouldn't get
>>> +	 * sg_cnt > BIO_MAX_PAGES since zasl on the host will limit the I/Os
>>> +	 * with the size that considers the BIO_MAX_PAGES.
>>> +	 */
>> What ? Unclear... You advertise max zone append as ctrl->zasl = min(all ns max
>> zone append sectors). What does BIO_MAX_PAGES have to do with anything ?
> That is not true, ctrl->zasl is advertised as
> min(min(all ns max zone append sectors), bio_max_zasl) to avoid the
> splitting of the bio on the target side:-

We already know that the zasl for each NS, given by the max zone append sectors
of the backends, is already OK, regardless of BIO_MAX_PAGES (see below).

> 
> See nvmet_zns_update_zasl()
> 
> +	ns->subsys->id_ctrl_zns.zasl = min_t(u8, min_zasl, bio_max_zasl);
> 
> Without considering the bio_max_pages we may have to set the ctrl->zasl
> value that is > bio_max_pages_zasl; then we'll get a request that is
> greater than what one bio can handle, which will lead to splitting the
> bios, which we want to avoid as per the comment in the V1.
> 
> Can you please explain what is wrong with this approach which regulates the
> zasl with all the possible namespaces zasl and bio_zasl?

For starters, with multi-page bvecs now, I am not sure BIO_MAX_PAGES makes any
sense anymore. Next, on the target side, the max zone append sectors limit for
each NS guarantees that a single BIO can be used without splitting needed.
Taking the min of all of them will NOT remove that guarantee. Hence I think
that BIO_MAX_PAGES has nothing to do in the middle of all this.
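
To illustrate, the update could presumably reduce to the per-namespace
minimum alone (a rough sketch reusing only the helpers quoted above, not
actual v4 code):

	static void nvmet_zns_update_zasl(struct nvmet_ns *ns)
	{
		struct request_queue *q = ns->bdev->bd_disk->queue;
		struct nvmet_ns *ins;
		unsigned long idx;
		u8 min_zasl;

		/* start from this namespace's own limit */
		min_zasl = ns->zasl = nvmet_zasl(queue_max_zone_append_sectors(q));

		/* take the minimum across all zoned namespaces of the subsystem */
		xa_for_each(&ns->subsys->namespaces, idx, ins) {
			struct request_queue *iq = ins->bdev->bd_disk->queue;

			if (!bdev_is_zoned(ins->bdev))
				continue;

			min_zasl = min_t(u8, min_zasl,
					 nvmet_zasl(queue_max_zone_append_sectors(iq)));
		}

		/* no bio_max_zasl clamp needed */
		ns->subsys->id_ctrl_zns.zasl = min_zasl;
	}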

> 
> Maybe there is a better way?
>>> +	if (!req->sg_cnt)
>>> +		goto out;
>>> +
>>> +	if (WARN_ON(req->sg_cnt > BIO_MAX_PAGES)) {
>>> +		status = NVME_SC_INTERNAL | NVME_SC_DNR;
>>> +		goto out;
>>> +	}
>>> +
>>> +	bvec = kmalloc_array(bv_cnt, sizeof(*bvec), GFP_KERNEL);
>>> +	if (!bvec) {
>>> +		status = NVME_SC_INTERNAL | NVME_SC_DNR;
>>> +		goto out;
>>> +	}
>>> +
>>> +	for_each_sg(req->sg, sg, req->sg_cnt, mapped_cnt) {
>>> +		nvmet_file_init_bvec(bvec, sg);
>>> +		mapped_data_len += bvec[mapped_cnt].bv_len;
>>> +		sg_cnt--;
>>> +		if (mapped_cnt == bv_cnt)
>>> +			break;
>>> +	}
>>> +
>>> +	if (WARN_ON(sg_cnt)) {
>>> +		status = NVME_SC_INTERNAL | NVME_SC_DNR;
>>> +		goto out;
>>> +	}
>>> +
>>> +	iov_iter_bvec(&from, WRITE, bvec, mapped_cnt, mapped_data_len);
>>> +
>>> +	bio = bio_alloc(GFP_KERNEL, bv_cnt);
>>> +	bio_set_dev(bio, nvmet_bdev(req));
>>> +	bio->bi_iter.bi_sector = sect;
>>> +	bio->bi_opf = op;
>> The op variable is used only here. It is not necessary.
> Yes.
>>> +
>>> +	ret =  __bio_iov_append_get_pages(bio, &from);
>> I still do not see why bio_iov_iter_get_pages() does not work here. Would you
>> care to explain please ?
>>
> The same reason pointed out by Johannes. Instead of calling the wrapper,
> call the underlying core API, since we don't care about the rest of the
> generic code. This also avoids an extra function call for the zone-append
> fast path.

I am still not convinced that __bio_iov_append_get_pages() will do the right
thing as it lacks the loop that bio_iov_iter_get_pages() has.
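
For reference, a condensed sketch of that structure (the bvec branch and
WARN checks elided, so this is approximate, not the exact block/bio.c code):

	int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
	{
		int ret;

		do {
			/* pick the zone-append aware helper when needed */
			if (bio_op(bio) == REQ_OP_ZONE_APPEND)
				ret = __bio_iov_append_get_pages(bio, iter);
			else
				ret = __bio_iov_iter_get_pages(bio, iter);
		} while (!ret && iov_iter_count(iter) && !bio_full(bio, 0));

		return bio->bi_vcnt ? 0 : ret;
	}

Calling __bio_iov_append_get_pages() directly skips that do/while, so a
single call may leave part of the iterator unmapped.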

>>> +	if (unlikely(ret)) {
>>> +		status = NVME_SC_INTERNAL | NVME_SC_DNR;
>>> +		bio_io_error(bio);
>>> +		goto bvec_free;
>>> +	}
>>> +
>>> +	ret = submit_bio_wait(bio);
>>> +	status = ret < 0 ? NVME_SC_INTERNAL : status;
>>> +	bio_put(bio);
>>> +
>>> +	sect += (mapped_data_len >> 9);
>>> +	req->cqe->result.u64 = le64_to_cpu(nvmet_sect_to_lba(req->ns, sect));
>>> +
>>> +bvec_free:
>>> +	kfree(bvec);
>>> +
>>> +out:
>>> +	nvmet_req_complete(req, status);
>>> +}
>>> +
>>> +#else  /* CONFIG_BLK_DEV_ZONED */
>>> +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
>>> +{
>>> +}
>>> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
>>> +{
>>> +}
>>> +u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off)
>>> +{
>>> +	return 0;
>>> +}
>>> +bool nvmet_bdev_zns_config(struct nvmet_ns *ns)
>>> +{
>>> +	return false;
>>> +}
>>> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
>>> +{
>>> +}
>>> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
>>> +{
>>> +}
>>> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
>>> +{
>>> +}
>>> +void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log)
>>> +{
>>> +}
>>> +#endif /* CONFIG_BLK_DEV_ZONED */
>>>
>>
> 
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH V2 2/9] nvmet: add ZNS support for bdev-ns
  2020-12-01  3:49       ` Chaitanya Kulkarni
@ 2020-12-01  7:51         ` Johannes Thumshirn
  -1 siblings, 0 replies; 44+ messages in thread
From: Johannes Thumshirn @ 2020-12-01  7:51 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-nvme; +Cc: hch, sagi, Damien Le Moal

On 01/12/2020 04:49, Chaitanya Kulkarni wrote:
> On 11/30/20 04:29, Johannes Thumshirn wrote:
>> On 30/11/2020 04:32, Chaitanya Kulkarni wrote:
>>> +	ret =  __bio_iov_append_get_pages(bio, &from);
>> Can't you just use bio_iov_iter_get_pages() here?
>>
>> It does have a 
>>
>> if (WARN_ON_ONCE(is_bvec))
>> 	return -EINVAL;
>>
>> in it but I think that can be deleted.
>>
> That was my initial patch but it adds an extra function call to the
> 
> fast patch for NVMeOF. We don't need any of the generic functionality from
> 
> bio_iov_iter_get_pages() anyway.
> 
> 
> Why add an extra function call overhead in the hot path for each I/O ?

At least in my compilation (gcc 10.1) there's no extra function call overhead.
__bio_iov_append_get_pages() gets fully inlined into bio_iov_iter_get_pages().

$ make block/bio.s
  CALL    scripts/checksyscalls.sh
  CALL    scripts/atomic/check-atomics.sh
  DESCEND  objtool
  CC      block/bio.s
$ grep __bio_iov_append_get_pages block/bio.s
$ grep bio_iov_iter_get_pages block/bio.s
__kstrtab_bio_iov_iter_get_pages:
        .asciz  "bio_iov_iter_get_pages"
__kstrtabns_bio_iov_iter_get_pages:
        .section "___ksymtab_gpl+bio_iov_iter_get_pages", "a"
__ksymtab_bio_iov_iter_get_pages:
        .long   bio_iov_iter_get_pages- .
        .long   __kstrtab_bio_iov_iter_get_pages- .
        .long   __kstrtabns_bio_iov_iter_get_pages- .
        .globl  bio_iov_iter_get_pages
        .type   bio_iov_iter_get_pages, @function
bio_iov_iter_get_pages:
        .type   bio_iov_iter_get_pages.cold, @function
bio_iov_iter_get_pages.cold:
        .size   bio_iov_iter_get_pages, .-bio_iov_iter_get_pages
        .size   bio_iov_iter_get_pages.cold, .-bio_iov_iter_get_pages.cold
        .type   __UNIQUE_ID___addressable_bio_iov_iter_get_pages499, @object
        .size   __UNIQUE_ID___addressable_bio_iov_iter_get_pages499, 8
__UNIQUE_ID___addressable_bio_iov_iter_get_pages499:
        .quad   bio_iov_iter_get_pages


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH V2 2/9] nvmet: add ZNS support for bdev-ns
  2020-12-01  7:51         ` Johannes Thumshirn
@ 2020-12-01 22:36           ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 44+ messages in thread
From: Chaitanya Kulkarni @ 2020-12-01 22:36 UTC (permalink / raw)
  To: Johannes Thumshirn, linux-block, linux-nvme; +Cc: hch, sagi, Damien Le Moal

On 11/30/20 23:51, Johannes Thumshirn wrote:
>> Why add an extra function call overhead in the hot path for each I/O ?
> At least in my compilation (gcc 10.1) there's no extra function call overhead.
> __bio_iov_append_get_pages() gets fully inlined into bio_iov_iter_get_pages().
>
> $ make block/bio.s
>   CALL    scripts/checksyscalls.sh
>   CALL    scripts/atomic/check-atomics.sh
>   DESCEND  objtool
>   CC      block/bio.s
> $ grep __bio_iov_append_get_pages block/bio.s
> $ grep bio_iov_iter_get_pages block/bio.s
> __kstrtab_bio_iov_iter_get_pages:
>         .asciz  "bio_iov_iter_get_pages"
> __kstrtabns_bio_iov_iter_get_pages:
>         .section "___ksymtab_gpl+bio_iov_iter_get_pages", "a"
> __ksymtab_bio_iov_iter_get_pages:
>         .long   bio_iov_iter_get_pages- .
>         .long   __kstrtab_bio_iov_iter_get_pages- .
>         .long   __kstrtabns_bio_iov_iter_get_pages- .
>         .globl  bio_iov_iter_get_pages
>         .type   bio_iov_iter_get_pages, @function
> bio_iov_iter_get_pages:
>         .type   bio_iov_iter_get_pages.cold, @function
> bio_iov_iter_get_pages.cold:
>         .size   bio_iov_iter_get_pages, .-bio_iov_iter_get_pages
>         .size   bio_iov_iter_get_pages.cold, .-bio_iov_iter_get_pages.cold
>         .type   __UNIQUE_ID___addressable_bio_iov_iter_get_pages499, @object
>         .size   __UNIQUE_ID___addressable_bio_iov_iter_get_pages499, 8
> __UNIQUE_ID___addressable_bio_iov_iter_get_pages499:
>         .quad   bio_iov_iter_get_pages
>
>
So it does on mine too, but that doesn't guarantee it on all platforms.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH V2 2/9] nvmet: add ZNS support for bdev-ns
  2020-12-01  5:44         ` Damien Le Moal
@ 2020-12-01 22:37           ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 44+ messages in thread
From: Chaitanya Kulkarni @ 2020-12-01 22:37 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, linux-nvme; +Cc: hch, sagi

On 11/30/20 21:44, Damien Le Moal wrote:
> On 2020/12/01 12:38, Chaitanya Kulkarni wrote:
> [...]
>>>> +	 * namespaces.
>>>> +	 */
>>>> +	min_zasl = ns->zasl = nvmet_zasl(queue_max_zone_append_sectors(q));
>>>> +
>>>> +	xa_for_each(&(ns->subsys->namespaces), idx, ins) {
>>>> +		struct request_queue *iq = ins->bdev->bd_disk->queue;
>>>> +		unsigned int imax_za_sects = queue_max_zone_append_sectors(iq);
>>>> +		u8 izasl = nvmet_zasl(imax_za_sects);
>>>> +
>>>> +		if (!bdev_is_zoned(ins->bdev))
>>>> +			continue;
>>>> +
>>>> +		min_zasl = min_zasl > izasl ? izasl : min_zasl;
>>>> +	}
>>>> +
>>>> +	ns->subsys->id_ctrl_zns.zasl = min_t(u8, min_zasl, bio_max_zasl);
>>> I do not understand what bio_max_zasl is used for here, as it does not represent
>>> anything related to the zoned namespaces backends.
>> If we don't consider the bio max zasl then we will get the request more than
>> one bio can handle for zone append and will lead to bio split which needs to
>> be avoided.
> Why ? zasl comes from the target backend max zone append sectors, which we
> already know is OK and does not lead to BIO splitting for zone append. So why do
> you need an extra verification against that bio_max_zasl ?
>
>>>> +}
>>>> +
>>>> +bool nvmet_bdev_zns_enable(struct nvmet_ns *ns)
>>>> +{
>>>> +	if (ns->bdev->bd_disk->queue->conv_zones_bitmap) {
>>>> +		pr_err("block devices with conventional zones are not supported.");
>>>> +		return false;
>>>> +	}
>>> It may be better to move this test in nvmet_bdev_validate_zns_zones(), so that
>>> all backend zone configuration checks are together.
>> Okay.
>>>> +
>>>> +	if (!nvmet_bdev_validate_zns_zones(ns))
>>>> +		return false;
>>>> +
>>>> +	/*
>>>> +	 * For ZBC and ZAC devices, writes into sequential zones must be aligned
>>>> +	 * to the device physical block size. So use this value as the logical
>>>> +	 * block size to avoid errors.
>>>> +	 */
>>>> +	ns->blksize_shift = blksize_bits(bdev_physical_block_size(ns->bdev));
>>>> +
>>>> +	nvmet_zns_update_zasl(ns);
>>>> +
>>>> +	return true;
>>>> +}
>>>> +
>>>> +/*
>>>> + * ZNS related Admin and I/O command handlers.
>>>> + */
>>>> +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
>>>> +{
>>>> +	struct nvmet_ctrl *ctrl = req->sq->ctrl;
>>>> +	struct nvme_id_ctrl_zns *id;
>>>> +	u16 status = 0;
>>>> +	u8 mdts;
>>>> +
>>>> +	id = kzalloc(sizeof(*id), GFP_KERNEL);
>>>> +	if (!id) {
>>>> +		status = NVME_SC_INTERNAL;
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	/*
>>>> +	 * Even though this function sets Zone Append Size Limit to 0,
>>>> +	 * the 0 value here indicates that the maximum data transfer size for
>>>> +	 * the Zone Append command is indicated by the ctrl
>>>> +	 * Maximum Data Transfer Size (MDTS).
>>>> +	 */
>>> I do not understand this comment. It does not really exactly match what the code
>>> below is doing.
>> Let me fix the code and the comment in next version.
>>>> +
>>>> +	mdts = ctrl->ops->get_mdts ? ctrl->ops->get_mdts(ctrl) : 0;
>>>> +
>>>> +	id->zasl = min_t(u8, mdts, req->sq->ctrl->subsys->id_ctrl_zns.zasl);
>>>> +
>>>> +	status = nvmet_copy_to_sgl(req, 0, id, sizeof(*id));
>>>> +
>>>> +	kfree(id);
>>>> +out:
>>>> +	nvmet_req_complete(req, status);
>>>> +}
>>>> +
>>>> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
>>>> +{
>>>> +	struct nvme_id_ns_zns *id_zns;
>>>> +	u16 status = 0;
>>>> +	u64 zsze;
>>>> +
>>>> +	if (le32_to_cpu(req->cmd->identify.nsid) == NVME_NSID_ALL) {
>>>> +		req->error_loc = offsetof(struct nvme_identify, nsid);
>>>> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	id_zns = kzalloc(sizeof(*id_zns), GFP_KERNEL);
>>>> +	if (!id_zns) {
>>>> +		status = NVME_SC_INTERNAL;
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	req->ns = nvmet_find_namespace(req->sq->ctrl, req->cmd->identify.nsid);
>>>> +	if (!req->ns) {
>>>> +		status = NVME_SC_INTERNAL;
>>>> +		goto done;
>>>> +	}
>>>> +
>>>> +	if (!bdev_is_zoned(nvmet_bdev(req))) {
>>>> +		req->error_loc = offsetof(struct nvme_identify, nsid);
>>>> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
>>>> +		goto done;
>>>> +	}
>>>> +
>>>> +	nvmet_ns_revalidate(req->ns);
>>>> +	zsze = (bdev_zone_sectors(nvmet_bdev(req)) << 9) >>
>>>> +					req->ns->blksize_shift;
>>>> +	id_zns->lbafe[0].zsze = cpu_to_le64(zsze);
>>>> +	id_zns->mor = cpu_to_le32(bdev_max_open_zones(nvmet_bdev(req)));
>>>> +	id_zns->mar = cpu_to_le32(bdev_max_active_zones(nvmet_bdev(req)));
>>>> +
>>>> +done:
>>>> +	status = nvmet_copy_to_sgl(req, 0, id_zns, sizeof(*id_zns));
>>>> +	kfree(id_zns);
>>>> +out:
>>>> +	nvmet_req_complete(req, status);
>>>> +}
>>>> +
>>>> +struct nvmet_report_zone_data {
>>>> +	struct nvmet_ns *ns;
>>>> +	struct nvme_zone_report *rz;
>>>> +};
>>>> +
>>>> +static int nvmet_bdev_report_zone_cb(struct blk_zone *z, unsigned int idx,
>>>> +				     void *data)
>>>> +{
>>>> +	struct nvmet_report_zone_data *report_zone_data = data;
>>>> +	struct nvme_zone_descriptor *entries = report_zone_data->rz->entries;
>>>> +	struct nvmet_ns *ns = report_zone_data->ns;
>>>> +
>>>> +	entries[idx].zcap = cpu_to_le64(nvmet_sect_to_lba(ns, z->capacity));
>>>> +	entries[idx].zslba = cpu_to_le64(nvmet_sect_to_lba(ns, z->start));
>>>> +	entries[idx].wp = cpu_to_le64(nvmet_sect_to_lba(ns, z->wp));
>>>> +	entries[idx].za = z->reset ? 1 << 2 : 0;
>>>> +	entries[idx].zt = z->type;
>>>> +	entries[idx].zs = z->cond << 4;
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
>>>> +{
>>>> +	u32 bufsize = (le32_to_cpu(req->cmd->zmr.numd) + 1) << 2;
>>> size_t ?
>> Yes.
>>>> +	struct nvmet_report_zone_data data = { .ns = req->ns };
>>>> +	struct nvme_zone_mgmt_recv_cmd *zmr = &req->cmd->zmr;
>>>> +	sector_t sect = nvmet_lba_to_sect(req->ns, le64_to_cpu(zmr->slba));
>>>> +	unsigned int nr_zones = bufsize / nvmet_zones_to_desc_size(1);
>>> I think this is wrong since nvmet_zones_to_desc_size includes sizeof(struct
>>> nvme_zone_report). I think what you want here is:
>>>
>>> nr_zones = (bufsize - sizeof(struct nvme_zone_report)) /
>>> 	sizeof(struct nvme_zone_descriptor);
>>>
>>> And then you can get rid of nvmet_zones_to_desc_size();
>> Yes, it doesn't need nvmet_zones_to_desc_size() anymore from v1.
>>>> +	int reported_zones;
>>>> +	u16 status;
>>>> +
>>>> +	status = nvmet_bdev_zns_checks(req);
>>>> +	if (status)
>>>> +		goto out;
>>>> +
>>>> +	data.rz = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY);
>>>> +	if (!data.rz) {
>>>> +		status = NVME_SC_INTERNAL;
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	reported_zones = blkdev_report_zones(nvmet_bdev(req), sect, nr_zones,
>>>> +					     nvmet_bdev_report_zone_cb,
>>>> +					     &data);
>>>> +	if (reported_zones < 0) {
>>>> +		status = NVME_SC_INTERNAL;
>>>> +		goto out_free_report_zones;
>>>> +	}
>>>> +
>>>> +	data.rz->nr_zones = cpu_to_le64(reported_zones);
>>>> +
>>>> +	status = nvmet_copy_to_sgl(req, 0, data.rz, bufsize);
>>>> +
>>>> +out_free_report_zones:
>>>> +	kvfree(data.rz);
>>>> +out:
>>>> +	nvmet_req_complete(req, status);
>>>> +}
>>>> +
>>>> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
>>>> +{
>>>> +	sector_t nr_sect = bdev_zone_sectors(nvmet_bdev(req));
>>>> +	struct nvme_zone_mgmt_send_cmd *c = &req->cmd->zms;
>>>> +	enum req_opf op = REQ_OP_LAST;
>>>> +	u16 status = NVME_SC_SUCCESS;
>>>> +	sector_t sect;
>>>> +	int ret;
>>>> +
>>>> +	sect = nvmet_lba_to_sect(req->ns, le64_to_cpu(req->cmd->zms.slba));
>>>> +
>>>> +	switch (c->zsa) {
>>>> +	case NVME_ZONE_OPEN:
>>>> +		op = REQ_OP_ZONE_OPEN;
>>>> +		break;
>>>> +	case NVME_ZONE_CLOSE:
>>>> +		op = REQ_OP_ZONE_CLOSE;
>>>> +		break;
>>>> +	case NVME_ZONE_FINISH:
>>>> +		op = REQ_OP_ZONE_FINISH;
>>>> +		break;
>>>> +	case NVME_ZONE_RESET:
>>>> +		if (c->select_all)
>>>> +			nr_sect = get_capacity(nvmet_bdev(req)->bd_disk);
>>>> +		op = REQ_OP_ZONE_RESET;
>>>> +		break;
>>>> +	default:
>>>> +		status = NVME_SC_INVALID_FIELD;
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	ret = blkdev_zone_mgmt(nvmet_bdev(req), op, sect, nr_sect, GFP_KERNEL);
>>>> +	if (ret)
>>>> +		status = NVME_SC_INTERNAL;
>>>> +
>>>> +out:
>>>> +	nvmet_req_complete(req, status);
>>>> +}
>>>> +
>>>> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
>>>> +{
>>>> +	unsigned long bv_cnt = req->sg_cnt;
>>>> +	int op = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
>>>> +	u64 slba = le64_to_cpu(req->cmd->rw.slba);
>>>> +	sector_t sect = nvmet_lba_to_sect(req->ns, slba);
>>>> +	u16 status = NVME_SC_SUCCESS;
>>>> +	size_t mapped_data_len = 0;
>>>> +	int sg_cnt = req->sg_cnt;
>>>> +	struct scatterlist *sg;
>>>> +	struct iov_iter from;
>>>> +	struct bio_vec *bvec;
>>>> +	size_t mapped_cnt;
>>>> +	struct bio *bio;
>>>> +	int ret;
>>>> +
>>>> +	if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req)))
>>>> +		return;
>>>> +
>>>> +	/*
>>>> +	 * When setting the ctrl->zasl we consider the BIO_MAX_PAGES so that we
>>>> +	 * don't have to split the bio, i.e. we shouldn't get
>>>> +	 * sg_cnt > BIO_MAX_PAGES since zasl on the host will limit the I/Os
>>>> +	 * with the size that considers the BIO_MAX_PAGES.
>>>> +	 */
>>> What ? Unclear... You advertise max zone append as ctrl->zasl = min(all ns max
>>> zone append sectors). What does BIO_MAX_PAGES have to do with anything ?
>> That is not true, ctrl->zasl is advertised as
>> min(min(all ns max zone append sectors), bio_max_zasl) to avoid the
>> splitting of the bio on the target side:-
> We already know that the zasl for each NS, given by the max zone append sectors
> of the backends, is already OK, regardless of BIO_MAX_PAGES (see below).
>
>> See nvmet_zns_update_zasl()
>>
>> +	ns->subsys->id_ctrl_zns.zasl = min_t(u8, min_zasl, bio_max_zasl);
>>
>> Without considering the bio_max_pages we may have to set the ctrl->zasl
>> value that is > bio_max_pages_zasl; then we'll get a request that is
>> greater than what one bio can handle, which will lead to splitting the
>> bios, which we want to avoid as per the comment in the V1.
>>
>> Can you please explain what is wrong with this approach which regulates the
>> zasl with all the possible namespaces zasl and bio_zasl?
> For starters, with multi-page bvecs now, I am not sure BIO_MAX_PAGES makes any
> sense anymore. Next, on the target side, the max zone append sectors limit for
> each NS guarantees that a single BIO can be used without splitting needed.
> Taking the min of all of them will NOT remove that guarantee. Hence I think
> that BIO_MAX_PAGES has nothing to do in the middle of all this.
>
Okay, let me remove the bio size check and send v4.
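(I.e. the v4 computation should then reduce to just:

	ns->subsys->id_ctrl_zns.zasl = min_zasl;

with the bio_max_zasl term gone.)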
>> Maybe there is a better way?
>>>> +	if (!req->sg_cnt)
>>>> +		goto out;
>>>> +
>>>> +	if (WARN_ON(req->sg_cnt > BIO_MAX_PAGES)) {
>>>> +		status = NVME_SC_INTERNAL | NVME_SC_DNR;
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	bvec = kmalloc_array(bv_cnt, sizeof(*bvec), GFP_KERNEL);
>>>> +	if (!bvec) {
>>>> +		status = NVME_SC_INTERNAL | NVME_SC_DNR;
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	for_each_sg(req->sg, sg, req->sg_cnt, mapped_cnt) {
>>>> +		nvmet_file_init_bvec(bvec, sg);
>>>> +		mapped_data_len += bvec[mapped_cnt].bv_len;
>>>> +		sg_cnt--;
>>>> +		if (mapped_cnt == bv_cnt)
>>>> +			break;
>>>> +	}
>>>> +
>>>> +	if (WARN_ON(sg_cnt)) {
>>>> +		status = NVME_SC_INTERNAL | NVME_SC_DNR;
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	iov_iter_bvec(&from, WRITE, bvec, mapped_cnt, mapped_data_len);
>>>> +
>>>> +	bio = bio_alloc(GFP_KERNEL, bv_cnt);
>>>> +	bio_set_dev(bio, nvmet_bdev(req));
>>>> +	bio->bi_iter.bi_sector = sect;
>>>> +	bio->bi_opf = op;
>>> The op variable is used only here. It is not necessary.
>> Yes.
>>>> +
>>>> +	ret =  __bio_iov_append_get_pages(bio, &from);
>>> I still do not see why bio_iov_iter_get_pages() does not work here. Would you
>>> care to explain please ?
>>>
>> The same reason pointed out by Johannes. Instead of calling the wrapper,
>> call the underlying core API, since we don't care about the rest of the
>> generic code. This also avoids an extra function call for the zone-append
>> fast path.
> I am still not convinced that __bio_iov_append_get_pages() will do the right
> thing as it lacks the loop that bio_iov_iter_get_pages() has.
>
>>>> +	if (unlikely(ret)) {
>>>> +		status = NVME_SC_INTERNAL | NVME_SC_DNR;
>>>> +		bio_io_error(bio);
>>>> +		goto bvec_free;
>>>> +	}
>>>> +
>>>> +	ret = submit_bio_wait(bio);
>>>> +	status = ret < 0 ? NVME_SC_INTERNAL : status;
>>>> +	bio_put(bio);
>>>> +
>>>> +	sect += (mapped_data_len >> 9);
>>>> +	req->cqe->result.u64 = le64_to_cpu(nvmet_sect_to_lba(req->ns, sect));
>>>> +
>>>> +bvec_free:
>>>> +	kfree(bvec);
>>>> +
>>>> +out:
>>>> +	nvmet_req_complete(req, status);
>>>> +}
>>>> +
>>>> +#else  /* CONFIG_BLK_DEV_ZONED */
>>>> +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
>>>> +{
>>>> +}
>>>> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
>>>> +{
>>>> +}
>>>> +u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off)
>>>> +{
>>>> +	return 0;
>>>> +}
>>>> +bool nvmet_bdev_zns_config(struct nvmet_ns *ns)
>>>> +{
>>>> +	return false;
>>>> +}
>>>> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
>>>> +{
>>>> +}
>>>> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
>>>> +{
>>>> +}
>>>> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
>>>> +{
>>>> +}
>>>> +void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log)
>>>> +{
>>>> +}
>>>> +#endif /* CONFIG_BLK_DEV_ZONED */
>>>>
>>
>


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH V2 2/9] nvmet: add ZNS support for bdev-ns
@ 2020-12-01 22:37           ` Chaitanya Kulkarni
  0 siblings, 0 replies; 44+ messages in thread
From: Chaitanya Kulkarni @ 2020-12-01 22:37 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, linux-nvme; +Cc: hch, sagi

On 11/30/20 21:44, Damien Le Moal wrote:
> On 2020/12/01 12:38, Chaitanya Kulkarni wrote:
> [...]
>>>> +	 * namespaces.
>>>> +	 */
>>>> +	min_zasl = ns->zasl = nvmet_zasl(queue_max_zone_append_sectors(q));
>>>> +
>>>> +	xa_for_each(&(ns->subsys->namespaces), idx, ins) {
>>>> +		struct request_queue *iq = ins->bdev->bd_disk->queue;
>>>> +		unsigned int imax_za_sects = queue_max_zone_append_sectors(iq);
>>>> +		u8 izasl = nvmet_zasl(imax_za_sects);
>>>> +
>>>> +		if (!bdev_is_zoned(ins->bdev))
>>>> +			continue;
>>>> +
>>>> +		min_zasl = min_zasl > izasl ? izasl : min_zasl;
>>>> +	}
>>>> +
>>>> +	ns->subsys->id_ctrl_zns.zasl = min_t(u8, min_zasl, bio_max_zasl);
>>> I do not understand what bio_max_zasl is used for here, as it does not represent
>>> anything related to the zoned namespaces backends.
>> If we don't consider the bio max zasl then we will get the request more than
>> one bio can handle for zone append and will lead to bio split whichneeds to
>> be avoided.
> Why ? zasl comes from the target backend max zone append sectors, which we
> already know is OK and does not lead to BIO splitting for zone append. So why do
> you need an extra verification against that bio_max_zasl ?
>
>>>> +}> +
>>>> +bool nvmet_bdev_zns_enable(struct nvmet_ns *ns)
>>>> +{
>>>> +	if (ns->bdev->bd_disk->queue->conv_zones_bitmap) {
>>>> +		pr_err("block devices with conventional zones are not supported.");
>>>> +		return false;
>>>> +	}
>>> It may be better to move this test in nvmet_bdev_validate_zns_zones(), so that
>>> all backend zone configuration checks are together.
>> Okay.
>>>> +
>>>> +	if (!nvmet_bdev_validate_zns_zones(ns))
>>>> +		return false;
>>>> +
>>>> +	/*
>>>> +	 * For ZBC and ZAC devices, writes into sequential zones must be aligned
>>>> +	 * to the device physical block size. So use this value as the logical
>>>> +	 * block size to avoid errors.
>>>> +	 */
>>>> +	ns->blksize_shift = blksize_bits(bdev_physical_block_size(ns->bdev));
>>>> +
>>>> +	nvmet_zns_update_zasl(ns);
>>>> +
>>>> +	return true;
>>>> +}
>>>> +
>>>> +/*
>>>> + * ZNS related Admin and I/O command handlers.
>>>> + */
>>>> +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
>>>> +{
>>>> +	struct nvmet_ctrl *ctrl = req->sq->ctrl;
>>>> +	struct nvme_id_ctrl_zns *id;
>>>> +	u16 status = 0;
>>>> +	u8 mdts;
>>>> +
>>>> +	id = kzalloc(sizeof(*id), GFP_KERNEL);
>>>> +	if (!id) {
>>>> +		status = NVME_SC_INTERNAL;
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	/*
>>>> +	 * Even though this function sets Zone Append Size Limit to 0,
>>>> +	 * the 0 value here indicates that the maximum data transfer size for
>>>> +	 * the Zone Append command is indicated by the ctrl
>>>> +	 * Maximum Data Transfer Size (MDTS).
>>>> +	 */
>>> I do not understand this comment. It does not really exactly match what the code
>>> below is doing.
>> Let me fix the code and the comment in next version.
>>>> +
>>>> +	mdts = ctrl->ops->get_mdts ? ctrl->ops->get_mdts(ctrl) : 0;
>>>> +
>>>> +	id->zasl = min_t(u8, mdts, req->sq->ctrl->subsys->id_ctrl_zns.zasl);
>>>> +
>>>> +	status = nvmet_copy_to_sgl(req, 0, id, sizeof(*id));
>>>> +
>>>> +	kfree(id);
>>>> +out:
>>>> +	nvmet_req_complete(req, status);
>>>> +}
>>>> +
>>>> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
>>>> +{
>>>> +	struct nvme_id_ns_zns *id_zns;
>>>> +	u16 status = 0;
>>>> +	u64 zsze;
>>>> +
>>>> +	if (le32_to_cpu(req->cmd->identify.nsid) == NVME_NSID_ALL) {
>>>> +		req->error_loc = offsetof(struct nvme_identify, nsid);
>>>> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	id_zns = kzalloc(sizeof(*id_zns), GFP_KERNEL);
>>>> +	if (!id_zns) {
>>>> +		status = NVME_SC_INTERNAL;
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	req->ns = nvmet_find_namespace(req->sq->ctrl, req->cmd->identify.nsid);
>>>> +	if (!req->ns) {
>>>> +		status = NVME_SC_INTERNAL;
>>>> +		goto done;
>>>> +	}
>>>> +
>>>> +	if (!bdev_is_zoned(nvmet_bdev(req))) {
>>>> +		req->error_loc = offsetof(struct nvme_identify, nsid);
>>>> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
>>>> +		goto done;
>>>> +	}
>>>> +
>>>> +	nvmet_ns_revalidate(req->ns);
>>>> +	zsze = (bdev_zone_sectors(nvmet_bdev(req)) << 9) >>
>>>> +					req->ns->blksize_shift;
>>>> +	id_zns->lbafe[0].zsze = cpu_to_le64(zsze);
>>>> +	id_zns->mor = cpu_to_le32(bdev_max_open_zones(nvmet_bdev(req)));
>>>> +	id_zns->mar = cpu_to_le32(bdev_max_active_zones(nvmet_bdev(req)));
>>>> +
>>>> +done:
>>>> +	status = nvmet_copy_to_sgl(req, 0, id_zns, sizeof(*id_zns));
>>>> +	kfree(id_zns);
>>>> +out:
>>>> +	nvmet_req_complete(req, status);
>>>> +}
>>>> +
>>>> +struct nvmet_report_zone_data {
>>>> +	struct nvmet_ns *ns;
>>>> +	struct nvme_zone_report *rz;
>>>> +};
>>>> +
>>>> +static int nvmet_bdev_report_zone_cb(struct blk_zone *z, unsigned int idx,
>>>> +				     void *data)
>>>> +{
>>>> +	struct nvmet_report_zone_data *report_zone_data = data;
>>>> +	struct nvme_zone_descriptor *entries = report_zone_data->rz->entries;
>>>> +	struct nvmet_ns *ns = report_zone_data->ns;
>>>> +
>>>> +	entries[idx].zcap = cpu_to_le64(nvmet_sect_to_lba(ns, z->capacity));
>>>> +	entries[idx].zslba = cpu_to_le64(nvmet_sect_to_lba(ns, z->start));
>>>> +	entries[idx].wp = cpu_to_le64(nvmet_sect_to_lba(ns, z->wp));
>>>> +	entries[idx].za = z->reset ? 1 << 2 : 0;
>>>> +	entries[idx].zt = z->type;
>>>> +	entries[idx].zs = z->cond << 4;
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
>>>> +{
>>>> +	u32 bufsize = (le32_to_cpu(req->cmd->zmr.numd) + 1) << 2;
>>> size_t ?
>> Yes.
>>>> +	struct nvmet_report_zone_data data = { .ns = req->ns };
>>>> +	struct nvme_zone_mgmt_recv_cmd *zmr = &req->cmd->zmr;
>>>> +	sector_t sect = nvmet_lba_to_sect(req->ns, le64_to_cpu(zmr->slba));
>>>> +	unsigned int nr_zones = bufsize / nvmet_zones_to_desc_size(1);
>>> I think this is wrong since nvmet_zones_to_desc_size includes sizeof(struct
>>> nvme_zone_report). I think what you want here is:
>>>
>>> nr_zones = (bufsize - sizeof(struct nvme_zone_report)) /
>>> 	sizeof(struct nvme_zone_descriptor);
>>>
>>> And then you can get rid of nvmet_zones_to_desc_size();
>> Yes, it doesn't need nvmet_zones_to_desc_size() anymore from v1.
>>>> +	int reported_zones;Maybe there is better way.
>>>> +	u16 status;
>>>> +
>>>> +	status = nvmet_bdev_zns_checks(req);
>>>> +	if (status)
>>>> +		goto out;
>>>> +
>>>> +	data.rz = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY);
>>>> +	if (!data.rz) {
>>>> +		status = NVME_SC_INTERNAL;
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	reported_zones = blkdev_report_zones(nvmet_bdev(req), sect, nr_zones,
>>>> +					     nvmet_bdev_report_zone_cb,
>>>> +					     &data);
>>>> +	if (reported_zones < 0) {
>>>> +		status = NVME_SC_INTERNAL;
>>>> +		goto out_free_report_zones;
>>>> +	}
>>>> +
>>>> +	data.rz->nr_zones = cpu_to_le64(reported_zones);
>>>> +
>>>> +	status = nvmet_copy_to_sgl(req, 0, data.rz, bufsize);
>>>> +
>>>> +out_free_report_zones:
>>>> +	kvfree(data.rz);
>>>> +out:
>>>> +	nvmet_req_complete(req, status);
>>>> +}
>>>> +
>>>> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
>>>> +{
>>>> +	sector_t nr_sect = bdev_zone_sectors(nvmet_bdev(req));
>>>> +	struct nvme_zone_mgmt_send_cmd *c = &req->cmd->zms;
>>>> +	enum req_opf op = REQ_OP_LAST;
>>>> +	u16 status = NVME_SC_SUCCESS;
>>>> +	sector_t sect;
>>>> +	int ret;
>>>> +
>>>> +	sect = nvmet_lba_to_sect(req->ns, le64_to_cpu(req->cmd->zms.slba));
>>>> +
>>>> +	switch (c->zsa) {
>>>> +	case NVME_ZONE_OPEN:
>>>> +		op = REQ_OP_ZONE_OPEN;
>>>> +		break;
>>>> +	case NVME_ZONE_CLOSE:
>>>> +		op = REQ_OP_ZONE_CLOSE;
>>>> +		break;
>>>> +	case NVME_ZONE_FINISH:
>>>> +		op = REQ_OP_ZONE_FINISH;
>>>> +		break;
>>>> +	case NVME_ZONE_RESET:
>>>> +		if (c->select_all)
>>>> +			nr_sect = get_capacity(nvmet_bdev(req)->bd_disk);
>>>> +		op = REQ_OP_ZONE_RESET;
>>>> +		break;
>>>> +	default:
>>>> +		status = NVME_SC_INVALID_FIELD;
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	ret = blkdev_zone_mgmt(nvmet_bdev(req), op, sect, nr_sect, GFP_KERNEL);
>>>> +	if (ret)
>>>> +		status = NVME_SC_INTERNAL;
>>>> +
>>>> +out:
>>>> +	nvmet_req_complete(req, status);
>>>> +}
>>>> +
>>>> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
>>>> +{
>>>> +	unsigned long bv_cnt = req->sg_cnt;
>>>> +	int op = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
>>>> +	u64 slba = le64_to_cpu(req->cmd->rw.slba);
>>>> +	sector_t sect = nvmet_lba_to_sect(req->ns, slba);
>>>> +	u16 status = NVME_SC_SUCCESS;
>>>> +	size_t mapped_data_len = 0;
>>>> +	int sg_cnt = req->sg_cnt;
>>>> +	struct scatterlist *sg;
>>>> +	struct iov_iter from;
>>>> +	struct bio_vec *bvec;
>>>> +	size_t mapped_cnt;
>>>> +	struct bio *bio;
>>>> +	int ret;
>>>> +
>>>> +	if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req)))
>>>> +		return;
>>>> +
>>>> +	/*
>>>> +	 * When setting the ctrl->zasl we consider the BIO_MAX_PAGES so that we
>>>> +	 * don't have to split the bio, i.e. we shouldn't get
>>>> +	 * sg_cnt > BIO_MAX_PAGES since zasl on the host will limit the I/Os
>>>> +	 * with the size that considers the BIO_MAX_PAGES.
>>>> +	 */
>>> What ? Unclear... You advertize max zone append as ctrl->zasl = min(all ns max
>>> zone append sectors). What does BIO_MAX_PAGES has to do with anything ?
>> That is not true, ctrl->zasl is advertised
>> min(min all ns max zone append sectors), bio_max_zasl) to avoid the
>> splitting
>> of the bio on the target side:-
> We already know that zasl for each NS, given by the max zone append sector of
> the backends, are already OK, regardless of BIO_MAX_PAGES (see below).
>
>> See nvmet_zns_update_zasl()
>>
>> +	ns->subsys->id_ctrl_zns.zasl = min_t(u8, min_zasl, bio_max_zasl);
>>
>> Without considering the bio_max_pages we may have to set the ctrl->zasl
>> value
>> thatis > bio_max_pages_zasl, then we'll get a request that is greater
>> than what
>> one bio can handle, that will lead to splitting the bios, which we want to
>> avoid as per the comment in the V1.
>>
>> Can you please explain what is wrong with this approach which regulates the
>> zasl with all the possible namespaces zasl and bio_zasl?
> For starter, with multi-page bvecs now, I am not sure BIO_MAX_PAGES makes any
> sense anymore. Next, on the target side, the max zone append sectors limit for
> each NS guarantee that a single BIO can be used without splitting needed. Taking
> the min  of all of them will NOT remove that guarantee. Hence I think that
> BIO_MAX_PAGES has nothing to do in the middle of all this.
>
Okay, let me remove the bio size check and send v4.
>> May be there is better way?
>>>> +	if (!req->sg_cnt)
>>>> +		goto out;
>>>> +
>>>> +	if (WARN_ON(req->sg_cnt > BIO_MAX_PAGES)) {
>>>> +		status = NVME_SC_INTERNAL | NVME_SC_DNR;
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	bvec = kmalloc_array(bv_cnt, sizeof(*bvec), GFP_KERNEL);
>>>> +	if (!bvec) {
>>>> +		status = NVME_SC_INTERNAL | NVME_SC_DNR;
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	for_each_sg(req->sg, sg, req->sg_cnt, mapped_cnt) {
>>>> +		nvmet_file_init_bvec(bvec, sg);
>>>> +		mapped_data_len += bvec[mapped_cnt].bv_len;
>>>> +		sg_cnt--;
>>>> +		if (mapped_cnt == bv_cnt)
>>>> +			brhigh frequency I/O operationeak;
>>>> +	}
>>>> +
>>>> +	if (WARN_ON(sg_cnt)) {
>>>> +		status = NVME_SC_INTERNAL | NVME_SC_DNR;
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	iov_iter_bvec(&from, WRITE, bvec, mapped_cnt, mapped_data_len);
>>>> +
>>>> +	bio = bio_alloc(GFP_KERNEL, bv_cnt);
>>>> +	bio_set_dev(bio, nvmet_bdev(req));
>>>> +	bio->bi_iter.bi_sector = sect;
>>>> +	bio->bi_opf = op;
>>> The op variable is used only here. It is not necessary.
>> Yes.
>>>> +
>>>> +	ret =  __bio_iov_append_get_pages(bio, &from);
>>> I still do not see why bio_iov_iter_get_pages() does not work here. Would you
>>> care to explain please ?
>>>
>> The same reason as pointed out by Johannes: instead of calling the wrapper,
>> call the underlying core API, since we don't care about the rest of the
>> generic code. This also avoids an extra function call in the zone-append
>> fast path.
> I am still not convinced that __bio_iov_append_get_pages() will do the right
> thing as it lacks the loop that bio_iov_iter_get_pages() has.
>
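For context, this is roughly the structure of the upstream helper (a
simplified sketch of block/bio.c around this series' base, not a verbatim
copy): the public API retries until the iterator is drained or the bio is
full, whereas a single call to __bio_iov_append_get_pages() maps one batch
of pages and returns.

	int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
	{
		int ret;

		do {
			if (bio_op(bio) == REQ_OP_ZONE_APPEND)
				ret = __bio_iov_append_get_pages(bio, iter);
			else
				ret = __bio_iov_iter_get_pages(bio, iter);
		} while (!ret && iov_iter_count(iter) && !bio_full(bio, 0));

		/* Partial success still counts if some pages were added. */
		return bio->bi_vcnt ? 0 : ret;
	}
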
>>>> +	if (unlikely(ret)) {
>>>> +		status = NVME_SC_INTERNAL | NVME_SC_DNR;
>>>> +		bio_io_error(bio);
>>>> +		goto bvec_free;
>>>> +	}
>>>> +
>>>> +	ret = submit_bio_wait(bio);
>>>> +	status = ret < 0 ? NVME_SC_INTERNAL : status;
>>>> +	bio_put(bio);
>>>> +
>>>> +	sect += (mapped_data_len >> 9);
>>>> +	req->cqe->result.u64 = le64_to_cpu(nvmet_sect_to_lba(req->ns, sect));
>>>> +
>>>> +bvec_free:
>>>> +	kfree(bvec);
>>>> +
>>>> +out:
>>>> +	nvmet_req_complete(req, status);
>>>> +}
>>>> +
>>>> +#else  /* CONFIG_BLK_DEV_ZONED */
>>>> +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
>>>> +{
>>>> +}
>>>> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
>>>> +{
>>>> +}
>>>> +u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off)
>>>> +{
>>>> +	return 0;
>>>> +}
>>>> +bool nvmet_bdev_zns_config(struct nvmet_ns *ns)
>>>> +{
>>>> +	return false;
>>>> +}
>>>> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
>>>> +{
>>>> +}
>>>> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
>>>> +{
>>>> +}
>>>> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
>>>> +{
>>>> +}
>>>> +void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log)
>>>> +{
>>>> +}
>>>> +#endif /* CONFIG_BLK_DEV_ZONED */
>>>>
>>
>



^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2020-12-01 22:38 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-30  3:29 [PATCH 0/9] nvmet: add genblk ZBD backend Chaitanya Kulkarni
2020-11-30  3:29 ` Chaitanya Kulkarni
2020-11-30  3:29 ` [PATCH V2 1/9] block: export __bio_iov_append_get_pages() Chaitanya Kulkarni
2020-11-30  3:29   ` Chaitanya Kulkarni
2020-11-30  3:29 ` [PATCH V2 2/9] nvmet: add ZNS support for bdev-ns Chaitanya Kulkarni
2020-11-30  3:29   ` Chaitanya Kulkarni
2020-11-30 11:57   ` Damien Le Moal
2020-11-30 11:57     ` Damien Le Moal
2020-12-01  3:38     ` Chaitanya Kulkarni
2020-12-01  3:38       ` Chaitanya Kulkarni
2020-12-01  5:44       ` Damien Le Moal
2020-12-01  5:44         ` Damien Le Moal
2020-12-01 22:37         ` Chaitanya Kulkarni
2020-12-01 22:37           ` Chaitanya Kulkarni
2020-11-30 12:29   ` Johannes Thumshirn
2020-11-30 12:29     ` Johannes Thumshirn
2020-12-01  3:49     ` Chaitanya Kulkarni
2020-12-01  3:49       ` Chaitanya Kulkarni
2020-12-01  7:51       ` Johannes Thumshirn
2020-12-01  7:51         ` Johannes Thumshirn
2020-12-01 22:36         ` Chaitanya Kulkarni
2020-12-01 22:36           ` Chaitanya Kulkarni
2020-11-30  3:29 ` [PATCH V2 3/9] nvmet: trim down id-desclist to use req->ns Chaitanya Kulkarni
2020-11-30  3:29   ` Chaitanya Kulkarni
2020-11-30  3:29 ` [PATCH V2 4/9] nvmet: add NVME_CSI_ZNS in ns-desc for zbdev Chaitanya Kulkarni
2020-11-30  3:29   ` Chaitanya Kulkarni
2020-11-30  3:29 ` [PATCH V2 5/9] nvmet: add cns-cs-ctrl in id-ctrl for ZNS bdev Chaitanya Kulkarni
2020-11-30  3:29   ` Chaitanya Kulkarni
2020-11-30  3:29 ` [PATCH V2 6/9] nvmet: add cns-cs-ns " Chaitanya Kulkarni
2020-11-30  3:29   ` Chaitanya Kulkarni
2020-11-30  3:29 ` [PATCH V2 7/9] nvmet: add zns cmd effects to support zbdev Chaitanya Kulkarni
2020-11-30  3:29   ` Chaitanya Kulkarni
2020-11-30  3:29 ` [PATCH V2 8/9] nvmet: add zns bdev config support Chaitanya Kulkarni
2020-11-30  3:29   ` Chaitanya Kulkarni
2020-11-30 12:02   ` Damien Le Moal
2020-11-30 12:02     ` Damien Le Moal
2020-12-01  3:40     ` Chaitanya Kulkarni
2020-12-01  3:40       ` Chaitanya Kulkarni
2020-11-30  3:29 ` [PATCH V2 9/9] nvmet: add ZNS based I/O cmds handlers Chaitanya Kulkarni
2020-11-30  3:29   ` Chaitanya Kulkarni
2020-11-30  6:51 ` [PATCH 0/9] nvmet: add genblk ZBD backend Damien Le Moal
2020-11-30  6:51   ` Damien Le Moal
2020-12-01  3:42   ` Chaitanya Kulkarni
2020-12-01  3:42     ` Chaitanya Kulkarni
