* [PATCH 0/9] nvmet: add genblk ZBD backend
@ 2020-11-26  2:40 ` Chaitanya Kulkarni
  0 siblings, 0 replies; 50+ messages in thread
From: Chaitanya Kulkarni @ 2020-11-26  2:40 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: sagi, hch, Chaitanya Kulkarni

Hi,

The NVMeOF host can already handle NVMe-protocol-based Zoned Block
Devices (ZBDs) in ZNS mode through the passthru backend. However, there
is no generic block device backend support for ZBDs that are not NVMe
devices.

This series adds support for exporting ZBDs that are not NVMe drives
from the NVMeOF target to the host, using the host-side ZNS interface.

The patch series is built bottom-up: it first adds a prep patch and the
ZNS command-specific handlers on top of the generic block layer and
updates the data structures, then wires up the admin commands one by one
in the order the host issues them during the namespace initialization
sequence. Once everything is in place, it wires up the I/O command
handlers. See below for a patch-series overview.

I've tested the ZoneFS test cases against a null_blk memory-backed
NVMeOF namespace over the nvme-loop transport. The same test cases pass
on the NVMeOF ZBD namespace and on null_blk without NVMeOF.
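
For reference, a minimal sketch of the kind of nvme-loop + null_blk
setup described here. This is a reconstruction, not taken from the
posting: the null_blk parameters, device name (/dev/nullb0), subsystem
NQN (testnqn), and port number are illustrative assumptions.

```shell
# Create a memory-backed zoned null_blk device (assumed parameters).
modprobe null_blk nr_devices=1 zoned=1 memory_backed=1

# Create an NVMeOF subsystem and expose the zoned device as namespace 1.
mkdir /sys/kernel/config/nvmet/subsystems/testnqn
echo 1 > /sys/kernel/config/nvmet/subsystems/testnqn/attr_allow_any_host
mkdir /sys/kernel/config/nvmet/subsystems/testnqn/namespaces/1
echo -n /dev/nullb0 > /sys/kernel/config/nvmet/subsystems/testnqn/namespaces/1/device_path
echo 1 > /sys/kernel/config/nvmet/subsystems/testnqn/namespaces/1/enable

# Bind the subsystem to a loop port and connect from the host side.
mkdir /sys/kernel/config/nvmet/ports/1
echo loop > /sys/kernel/config/nvmet/ports/1/addr_trtype
ln -s /sys/kernel/config/nvmet/subsystems/testnqn \
      /sys/kernel/config/nvmet/ports/1/subsystems/testnqn
nvme connect -t loop -n testnqn
```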

Regards,
Chaitanya

Chaitanya Kulkarni (9):
  block: export __bio_iov_append_get_pages()
	Prep patch needed for implementing Zone Append.
  nvmet: add ZNS support for bdev-ns
	Core command handlers and various helpers for the ZBD backend,
	 called from the target core/admin code.
  nvmet: trim down id-desclist to use req->ns
	Cleanup needed to avoid code repetition when passing extra
	function parameters for the ZBD backend handlers.
  nvmet: add NVME_CSI_ZNS in ns-desc for zbdev
	Allows the host to identify a zoned namespace.
  nvmet: add cns-cs-ctrl in id-ctrl for ZNS bdev
	Allows the host to identify a controller with ZBD-backed ZNS.
  nvmet: add cns-cs-ns in id-ctrl for ZNS bdev
	Allows the host to identify a namespace with ZBD-backed ZNS.
  nvmet: add zns cmd effects to support zbdev
	Allows the host to use the ZNS commands when a zoned block
	 device is selected.
  nvmet: add zns bdev config support
	Allows user to override any target namespace attributes for
	 ZBD.
  nvmet: add ZNS based I/O cmds handlers
	Handlers for Zone-Mgmt-Send/Zone-Mgmt-Recv/Zone-Append.

 block/bio.c                       |   3 +-
 drivers/nvme/target/Makefile      |   3 +-
 drivers/nvme/target/admin-cmd.c   |  38 ++-
 drivers/nvme/target/io-cmd-bdev.c |  12 +
 drivers/nvme/target/io-cmd-file.c |   2 +-
 drivers/nvme/target/nvmet.h       |  18 ++
 drivers/nvme/target/zns.c         | 390 ++++++++++++++++++++++++++++++
 include/linux/bio.h               |   1 +
 8 files changed, 451 insertions(+), 16 deletions(-)
 create mode 100644 drivers/nvme/target/zns.c

Test Report :-

# cat /sys/kernel/config/nvmet/subsystems/testnqn/namespaces/1/device_path 
/dev/nullb1
# nvme list | tr -s ' ' ' ' 
Node SN Model Namespace Usage Format FW Rev 
/dev/nvme1n1 212d336db96a4282 Linux 1 1.07 GB / 1.07 GB 4 KiB + 0 B 5.10.0-r
# ./zonefs-tests.sh /dev/nullb1 
Gathering information on /dev/nullb1...
zonefs-tests on /dev/nullb1:
  4 zones (0 conventional zones, 4 sequential zones)
  524288 512B sectors zone size (256 MiB)
  0 max open zones
Running tests
  Test 0010:  mkzonefs (options)                                   ... PASS
  Test 0011:  mkzonefs (force format)                              ... PASS
  Test 0012:  mkzonefs (invalid device)                            ... FAIL
  Test 0013:  mkzonefs (super block zone state)                    ... FAIL
  Test 0020:  mount (default)                                      ... PASS
  Test 0021:  mount (invalid device)                               ... PASS
  Test 0022:  mount (check mount directory sub-directories)        ... PASS
  Test 0023:  mount (options)                                      ... PASS
  Test 0030:  Number of files (default)                            ... PASS
  Test 0031:  Number of files (aggr_cnv)                           ... skip
  Test 0032:  Number of files using stat (default)                 ... PASS
  Test 0033:  Number of files using stat (aggr_cnv)                ... PASS
  Test 0034:  Number of blocks using stat (default)                ... PASS
  Test 0035:  Number of blocks using stat (aggr_cnv)               ... PASS
  Test 0040:  Files permissions (default)                          ... PASS
  Test 0041:  Files permissions (aggr_cnv)                         ... skip
  Test 0042:  Files permissions (set value)                        ... PASS
  Test 0043:  Files permissions (set value + aggr_cnv)             ... skip
  Test 0050:  Files owner (default)                                ... PASS
  Test 0051:  Files owner (aggr_cnv)                               ... skip
  Test 0052:  Files owner (set value)                              ... PASS
  Test 0053:  Files owner (set value + aggr_cnv)                   ... skip
  Test 0060:  Files size (default)                                 ... PASS
  Test 0061:  Files size (aggr_cnv)                                ... skip
  Test 0070:  Conventional file truncate                           ... skip
  Test 0071:  Conventional file truncate (aggr_cnv)                ... skip
  Test 0072:  Conventional file unlink                             ... skip
  Test 0073:  Conventional file unlink (aggr_cnv)                  ... skip
  Test 0074:  Conventional file random write                       ... skip
  Test 0075:  Conventional file random write (direct)              ... skip
  Test 0076:  Conventional file random write (aggr_cnv)            ... skip
  Test 0077:  Conventional file random write (aggr_cnv, direct)    ... skip
  Test 0078:  Conventional file mmap read/write                    ... skip
  Test 0079:  Conventional file mmap read/write (aggr_cnv)         ... skip
  Test 0080:  Sequential file truncate                             ... PASS
  Test 0081:  Sequential file unlink                               ... PASS
  Test 0082:  Sequential file buffered write IO                    ... PASS
  Test 0083:  Sequential file overwrite                            ... PASS
  Test 0084:  Sequential file unaligned write (sync IO)            ... PASS
  Test 0085:  Sequential file unaligned write (async IO)           ... PASS
  Test 0086:  Sequential file append (sync)                        ... PASS
  Test 0087:  Sequential file append (async)                       ... PASS
  Test 0088:  Sequential file random read                          ... PASS
  Test 0089:  Sequential file mmap read/write                      ... PASS
  Test 0090:  sequential file 4K synchronous write                 ... PASS
  Test 0091:  Sequential file large synchronous write              ... PASS

44 / 46 tests passed
#
#
# ./zonefs-tests.sh /dev/nvme1n1
Gathering information on /dev/nvme1n1...
zonefs-tests on /dev/nvme1n1:
  4 zones (0 conventional zones, 4 sequential zones)
  524288 512B sectors zone size (256 MiB)
  1 max open zones
Running tests
  Test 0010:  mkzonefs (options)                                   ... PASS
  Test 0011:  mkzonefs (force format)                              ... PASS
  Test 0012:  mkzonefs (invalid device)                            ... FAIL
  Test 0013:  mkzonefs (super block zone state)                    ... FAIL
  Test 0020:  mount (default)                                      ... PASS
  Test 0021:  mount (invalid device)                               ... PASS
  Test 0022:  mount (check mount directory sub-directories)        ... PASS
  Test 0023:  mount (options)                                      ... PASS
  Test 0030:  Number of files (default)                            ... PASS
  Test 0031:  Number of files (aggr_cnv)                           ... skip
  Test 0032:  Number of files using stat (default)                 ... PASS
  Test 0033:  Number of files using stat (aggr_cnv)                ... PASS
  Test 0034:  Number of blocks using stat (default)                ... PASS
  Test 0035:  Number of blocks using stat (aggr_cnv)               ... PASS
  Test 0040:  Files permissions (default)                          ... PASS
  Test 0041:  Files permissions (aggr_cnv)                         ... skip
  Test 0042:  Files permissions (set value)                        ... PASS
  Test 0043:  Files permissions (set value + aggr_cnv)             ... skip
  Test 0050:  Files owner (default)                                ... PASS
  Test 0051:  Files owner (aggr_cnv)                               ... skip
  Test 0052:  Files owner (set value)                              ... PASS
  Test 0053:  Files owner (set value + aggr_cnv)                   ... skip
  Test 0060:  Files size (default)                                 ... PASS
  Test 0061:  Files size (aggr_cnv)                                ... skip
  Test 0070:  Conventional file truncate                           ... skip
  Test 0071:  Conventional file truncate (aggr_cnv)                ... skip
  Test 0072:  Conventional file unlink                             ... skip
  Test 0073:  Conventional file unlink (aggr_cnv)                  ... skip
  Test 0074:  Conventional file random write                       ... skip
  Test 0075:  Conventional file random write (direct)              ... skip
  Test 0076:  Conventional file random write (aggr_cnv)            ... skip
  Test 0077:  Conventional file random write (aggr_cnv, direct)    ... skip
  Test 0078:  Conventional file mmap read/write                    ... skip
  Test 0079:  Conventional file mmap read/write (aggr_cnv)         ... skip
  Test 0080:  Sequential file truncate                             ... PASS
  Test 0081:  Sequential file unlink                               ... PASS
  Test 0082:  Sequential file buffered write IO                    ... PASS
  Test 0083:  Sequential file overwrite                            ... PASS
  Test 0084:  Sequential file unaligned write (sync IO)            ... PASS
  Test 0085:  Sequential file unaligned write (async IO)           ... PASS
  Test 0086:  Sequential file append (sync)                        ... PASS
  Test 0087:  Sequential file append (async)                       ... PASS
  Test 0088:  Sequential file random read                          ... PASS
  Test 0089:  Sequential file mmap read/write                      ... PASS
  Test 0090:  sequential file 4K synchronous write                 ... PASS
  Test 0091:  Sequential file large synchronous write              ... PASS

44 / 46 tests passed

-- 
2.22.1


^ permalink raw reply	[flat|nested] 50+ messages in thread


* [PATCH 1/9] block: export __bio_iov_append_get_pages()
  2020-11-26  2:40 ` Chaitanya Kulkarni
@ 2020-11-26  2:40   ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 50+ messages in thread
From: Chaitanya Kulkarni @ 2020-11-26  2:40 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: sagi, hch, Chaitanya Kulkarni

This prep patch exports __bio_iov_append_get_pages() so that the
NVMeOF target can reuse the core logic for building Zone Append bios
(REQ_OP_ZONE_APPEND) without duplicating code.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 block/bio.c         | 3 ++-
 include/linux/bio.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/bio.c b/block/bio.c
index fa01bef35bb1..de356fa28315 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1033,7 +1033,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	return 0;
 }
 
-static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
+int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
 {
 	unsigned short nr_pages = bio->bi_max_vecs - bio->bi_vcnt;
 	unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
@@ -1079,6 +1079,7 @@ static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
 	iov_iter_advance(iter, size - left);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(__bio_iov_append_get_pages);
 
 /**
  * bio_iov_iter_get_pages - add user or kernel pages to a bio
diff --git a/include/linux/bio.h b/include/linux/bio.h
index c6d765382926..47247c1b0b85 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -446,6 +446,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
 		unsigned int len, unsigned int off, bool *same_page);
 void __bio_add_page(struct bio *bio, struct page *page,
 		unsigned int len, unsigned int off);
+int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter);
 int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter);
 void bio_release_pages(struct bio *bio, bool mark_dirty);
 extern void bio_set_pages_dirty(struct bio *bio);
-- 
2.22.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread


* [PATCH 2/9] nvmet: add ZNS support for bdev-ns
  2020-11-26  2:40 ` Chaitanya Kulkarni
@ 2020-11-26  2:40   ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 50+ messages in thread
From: Chaitanya Kulkarni @ 2020-11-26  2:40 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: sagi, hch, Chaitanya Kulkarni

Add zns-bdev-config, id-ctrl, id-ns, zns-cmd-effects, zone-mgmt-send,
zone-mgmt-recv and zone-append handlers for NVMeOF target to enable ZNS
support for bdev.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/Makefile      |   2 +
 drivers/nvme/target/admin-cmd.c   |   4 +-
 drivers/nvme/target/io-cmd-file.c |   2 +-
 drivers/nvme/target/nvmet.h       |  18 ++
 drivers/nvme/target/zns.c         | 390 ++++++++++++++++++++++++++++++
 5 files changed, 413 insertions(+), 3 deletions(-)
 create mode 100644 drivers/nvme/target/zns.c

diff --git a/drivers/nvme/target/Makefile b/drivers/nvme/target/Makefile
index ebf91fc4c72e..bc147ff2df5d 100644
--- a/drivers/nvme/target/Makefile
+++ b/drivers/nvme/target/Makefile
@@ -12,6 +12,8 @@ obj-$(CONFIG_NVME_TARGET_TCP)		+= nvmet-tcp.o
 nvmet-y		+= core.o configfs.o admin-cmd.o fabrics-cmd.o \
 			discovery.o io-cmd-file.o io-cmd-bdev.o
 nvmet-$(CONFIG_NVME_TARGET_PASSTHRU)	+= passthru.o
+nvmet-$(CONFIG_BLK_DEV_ZONED)		+= zns.o
+
 nvme-loop-y	+= loop.o
 nvmet-rdma-y	+= rdma.o
 nvmet-fc-y	+= fc.o
diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index dca34489a1dc..509fd8dcca0c 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -579,8 +579,8 @@ static void nvmet_execute_identify_nslist(struct nvmet_req *req)
 	nvmet_req_complete(req, status);
 }
 
-static u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
-				    void *id, off_t *off)
+u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
+			     void *id, off_t *off)
 {
 	struct nvme_ns_id_desc desc = {
 		.nidt = type,
diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c
index 0abbefd9925e..2bd10960fa50 100644
--- a/drivers/nvme/target/io-cmd-file.c
+++ b/drivers/nvme/target/io-cmd-file.c
@@ -89,7 +89,7 @@ int nvmet_file_ns_enable(struct nvmet_ns *ns)
 	return ret;
 }
 
-static void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg)
+void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg)
 {
 	bv->bv_page = sg_page(sg);
 	bv->bv_offset = sg->offset;
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 592763732065..0542ba672a31 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -81,6 +81,9 @@ struct nvmet_ns {
 	struct pci_dev		*p2p_dev;
 	int			pi_type;
 	int			metadata_size;
+#ifdef CONFIG_BLK_DEV_ZONED
+	struct nvme_id_ns_zns	id_zns;
+#endif
 };
 
 static inline struct nvmet_ns *to_nvmet_ns(struct config_item *item)
@@ -251,6 +254,10 @@ struct nvmet_subsys {
 	unsigned int		admin_timeout;
 	unsigned int		io_timeout;
 #endif /* CONFIG_NVME_TARGET_PASSTHRU */
+
+#ifdef CONFIG_BLK_DEV_ZONED
+	struct nvme_id_ctrl_zns	id_ctrl_zns;
+#endif
 };
 
 static inline struct nvmet_subsys *to_subsys(struct config_item *item)
@@ -603,4 +610,15 @@ static inline bool nvmet_ns_has_pi(struct nvmet_ns *ns)
 	return ns->pi_type && ns->metadata_size == sizeof(struct t10_pi_tuple);
 }
 
+void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req);
+void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req);
+u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off);
+bool nvmet_bdev_zns_config(struct nvmet_ns *ns);
+void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req);
+void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req);
+void nvmet_bdev_execute_zone_append(struct nvmet_req *req);
+void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log);
+u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
+			     void *id, off_t *off);
+void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg);
 #endif /* _NVMET_H */
diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
new file mode 100644
index 000000000000..8ea6641a55e3
--- /dev/null
+++ b/drivers/nvme/target/zns.c
@@ -0,0 +1,390 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * NVMe ZNS-ZBD command implementation.
+ * Copyright (c) 2020-2021 HGST, a Western Digital Company.
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/uio.h>
+#include <linux/nvme.h>
+#include <linux/blkdev.h>
+#include <linux/module.h>
+#include "nvmet.h"
+
+#ifdef CONFIG_BLK_DEV_ZONED
+
+static u16 nvmet_bdev_zns_checks(struct nvmet_req *req)
+{
+	u16 status = 0;
+
+	if (!bdev_is_zoned(req->ns->bdev)) {
+		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
+		goto out;
+	}
+
+	if (req->cmd->zmr.zra != NVME_ZRA_ZONE_REPORT) {
+		status = NVME_SC_INVALID_FIELD;
+		goto out;
+	}
+
+	if (req->cmd->zmr.zrasf != NVME_ZRASF_ZONE_REPORT_ALL) {
+		status = NVME_SC_INVALID_FIELD;
+		goto out;
+	}
+
+	if (req->cmd->zmr.pr != NVME_REPORT_ZONE_PARTIAL)
+		status = NVME_SC_INVALID_FIELD;
+out:
+	return status;
+}
+
+static struct block_device *nvmet_bdev(struct nvmet_req *req)
+{
+	return req->ns->bdev;
+}
+
+static u64 nvmet_zones_to_descsize(unsigned int nr_zones)
+{
+	return sizeof(struct nvme_zone_report) +
+		(sizeof(struct nvme_zone_descriptor) * nr_zones);
+}
+
+static inline u64 nvmet_sect_to_lba(struct nvmet_ns *ns, sector_t sect)
+{
+	return sect >> (ns->blksize_shift - SECTOR_SHIFT);
+}
+
+static inline sector_t nvmet_lba_to_sect(struct nvmet_ns *ns, __le64 lba)
+{
+	return le64_to_cpu(lba) << (ns->blksize_shift - SECTOR_SHIFT);
+}
+
+/*
+ *  ZNS related command implementation and helpers.
+ */
+
+u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off)
+{
+	u16 nvme_cis_zns = NVME_CSI_ZNS;
+
+	if (bdev_is_zoned(nvmet_bdev(req))) {
+		return nvmet_copy_ns_identifier(req, NVME_NIDT_CSI,
+						 NVME_NIDT_CSI_LEN,
+						 &nvme_cis_zns, off);
+	}
+
+	return NVME_SC_SUCCESS;
+}
+
+void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log)
+{
+	log->iocs[nvme_cmd_zone_append]		= cpu_to_le32(1 << 0);
+	log->iocs[nvme_cmd_zone_mgmt_send]	= cpu_to_le32(1 << 0);
+	log->iocs[nvme_cmd_zone_mgmt_recv]	= cpu_to_le32(1 << 0);
+}
+
+bool nvmet_bdev_zns_config(struct nvmet_ns *ns)
+{
+	if (ns->bdev->bd_disk->queue->conv_zones_bitmap) {
+		pr_err("block device with conventional zones not supported.");
+		return false;
+	}
+	/*
+	 * SMR drives will results in error if writes are not aligned to the
+	 * physical block size just override.
+	 */
+	ns->blksize_shift = blksize_bits(bdev_physical_block_size(ns->bdev));
+	return true;
+}
+
+static int nvmet_bdev_report_zone_cb(struct blk_zone *zone, unsigned int idx,
+				     void *data)
+{
+	struct blk_zone *zones = data;
+
+	memcpy(&zones[idx], zone, sizeof(struct blk_zone));
+
+	return 0;
+}
+
+static void nvmet_get_zone_desc(struct nvmet_ns *ns, struct blk_zone *z,
+				struct nvme_zone_descriptor *rz)
+{
+	rz->zcap = cpu_to_le64(nvmet_sect_to_lba(ns, z->capacity));
+	rz->zslba = cpu_to_le64(nvmet_sect_to_lba(ns, z->start));
+	rz->wp = cpu_to_le64(nvmet_sect_to_lba(ns, z->wp));
+	rz->za = z->reset ? 1 << 2 : 0;
+	rz->zt = z->type;
+	rz->zs = z->cond << 4;
+}
+
+/*
+ * ZNS related Admin and I/O command handlers.
+ */
+void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
+{
+	struct nvme_id_ctrl_zns *id;
+	u16 status = 0;
+
+	id = kzalloc(sizeof(*id), GFP_KERNEL);
+	if (!id) {
+		status = NVME_SC_INTERNAL;
+		goto out;
+	}
+
+	/*
+	 * Even though this function sets Zone Append Size Limit to 0,
+	 * the 0 value here indicates that the maximum data transfer size for
+	 * the Zone Append command is indicated by the ctrl
+	 * Maximum Data Transfer Size (MDTS).
+	 */
+	id->zasl = 0;
+
+	status = nvmet_copy_to_sgl(req, 0, id, sizeof(*id));
+
+	kfree(id);
+out:
+	nvmet_req_complete(req, status);
+}
+
+void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
+{
+	struct nvme_id_ns_zns *id_zns;
+	u16 status = 0;
+	u64 zsze;
+
+	if (le32_to_cpu(req->cmd->identify.nsid) == NVME_NSID_ALL) {
+		req->error_loc = offsetof(struct nvme_identify, nsid);
+		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
+		goto out;
+	}
+
+	id_zns = kzalloc(sizeof(*id_zns), GFP_KERNEL);
+	if (!id_zns) {
+		status = NVME_SC_INTERNAL;
+		goto out;
+	}
+
+	req->ns = nvmet_find_namespace(req->sq->ctrl, req->cmd->identify.nsid);
+	if (!req->ns) {
+		status = NVME_SC_INTERNAL;
+		goto done;
+	}
+
+	if (!bdev_is_zoned(nvmet_bdev(req))) {
+		req->error_loc = offsetof(struct nvme_identify, nsid);
+		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
+		goto done;
+	}
+
+	nvmet_ns_revalidate(req->ns);
+	zsze = (bdev_zone_sectors(nvmet_bdev(req)) << 9) >>
+					req->ns->blksize_shift;
+	id_zns->lbafe[0].zsze = cpu_to_le64(zsze);
+	id_zns->mor = cpu_to_le32(bdev_max_open_zones(nvmet_bdev(req)));
+	id_zns->mar = cpu_to_le32(bdev_max_active_zones(nvmet_bdev(req)));
+
+done:
+	status = nvmet_copy_to_sgl(req, 0, id_zns, sizeof(*id_zns));
+	kfree(id_zns);
+out:
+	nvmet_req_complete(req, status);
+}
+
+void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
+{
+	struct request_queue *q = nvmet_bdev(req)->bd_disk->queue;
+	struct nvme_zone_mgmt_recv_cmd *zmr = &req->cmd->zmr;
+	unsigned int nz = blk_queue_nr_zones(q);
+	u64 bufsize = (zmr->numd << 2) + 1;
+	struct nvme_zone_report *rz;
+	struct blk_zone *zones;
+	int reported_zones;
+	sector_t sect;
+	u64 desc_size;
+	u16 status;
+	int i;
+
+	desc_size = nvmet_zones_to_descsize(blk_queue_nr_zones(q));
+	status = nvmet_bdev_zns_checks(req);
+	if (status)
+		goto out;
+
+	zones = kvcalloc(blkdev_nr_zones(nvmet_bdev(req)->bd_disk),
+			      sizeof(struct blk_zone), GFP_KERNEL);
+	if (!zones) {
+		status = NVME_SC_INTERNAL;
+		goto out;
+	}
+
+	rz = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY);
+	if (!rz) {
+		status = NVME_SC_INTERNAL;
+		goto out_free_zones;
+	}
+
+	sect = nvmet_lba_to_sect(req->ns, le64_to_cpu(req->cmd->zmr.slba));
+
+	for (nz = blk_queue_nr_zones(q); desc_size >= bufsize; nz--)
+		desc_size = nvmet_zones_to_descsize(nz);
+
+	reported_zones = blkdev_report_zones(nvmet_bdev(req), sect, nz,
+					     nvmet_bdev_report_zone_cb,
+					     zones);
+	if (reported_zones < 0) {
+		status = NVME_SC_INTERNAL;
+		goto out_free_report_zones;
+	}
+
+	rz->nr_zones = cpu_to_le64(reported_zones);
+	for (i = 0; i < reported_zones; i++)
+		nvmet_get_zone_desc(req->ns, &zones[i], &rz->entries[i]);
+
+	status = nvmet_copy_to_sgl(req, 0, rz, bufsize);
+
+out_free_report_zones:
+	kvfree(rz);
+out_free_zones:
+	kvfree(zones);
+out:
+	nvmet_req_complete(req, status);
+}
+
+void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
+{
+	sector_t nr_sect = bdev_zone_sectors(nvmet_bdev(req));
+	struct nvme_zone_mgmt_send_cmd *c = &req->cmd->zms;
+	u16 status = NVME_SC_SUCCESS;
+	enum req_opf op;
+	sector_t sect;
+	int ret;
+
+	sect = nvmet_lba_to_sect(req->ns, le64_to_cpu(req->cmd->zms.slba));
+
+	switch (c->zsa) {
+	case NVME_ZONE_OPEN:
+		op = REQ_OP_ZONE_OPEN;
+		break;
+	case NVME_ZONE_CLOSE:
+		op = REQ_OP_ZONE_CLOSE;
+		break;
+	case NVME_ZONE_FINISH:
+		op = REQ_OP_ZONE_FINISH;
+		break;
+	case NVME_ZONE_RESET:
+		if (c->select_all)
+			nr_sect = get_capacity(nvmet_bdev(req)->bd_disk);
+		op = REQ_OP_ZONE_RESET;
+		break;
+	default:
+		status = NVME_SC_INVALID_FIELD;
+		break;
+	}
+
+	ret = blkdev_zone_mgmt(nvmet_bdev(req), op, sect, nr_sect, GFP_KERNEL);
+	if (ret)
+		status = NVME_SC_INTERNAL;
+
+	nvmet_req_complete(req, status);
+}
+
+void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
+{
+	unsigned long bv_cnt = min(req->sg_cnt, BIO_MAX_PAGES);
+	int op = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
+	u64 slba = le64_to_cpu(req->cmd->rw.slba);
+	sector_t sect = nvmet_lba_to_sect(req->ns, slba);
+	u16 status = NVME_SC_SUCCESS;
+	int sg_cnt = req->sg_cnt;
+	struct scatterlist *sg;
+	size_t mapped_data_len;
+	struct iov_iter from;
+	struct bio_vec *bvec;
+	size_t mapped_cnt;
+	size_t io_len = 0;
+	struct bio *bio;
+	int ret;
+
+	if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req)))
+		return;
+
+	if (!req->sg_cnt) {
+		nvmet_req_complete(req, 0);
+		return;
+	}
+
+	bvec = kmalloc_array(bv_cnt, sizeof(*bvec), GFP_KERNEL);
+	if (!bvec) {
+		status = NVME_SC_INTERNAL;
+		goto out;
+	}
+
+	while (sg_cnt) {
+		mapped_data_len = 0;
+		for_each_sg(req->sg, sg, req->sg_cnt, mapped_cnt) {
+			nvmet_file_init_bvec(bvec, sg);
+			mapped_data_len += bvec[mapped_cnt].bv_len;
+			sg_cnt--;
+			if (mapped_cnt == bv_cnt)
+				break;
+		}
+		iov_iter_bvec(&from, WRITE, bvec, mapped_cnt, mapped_data_len);
+
+		bio = bio_alloc(GFP_KERNEL, bv_cnt);
+		bio_set_dev(bio, nvmet_bdev(req));
+		bio->bi_iter.bi_sector = sect;
+		bio->bi_opf = op;
+
+		ret =  __bio_iov_append_get_pages(bio, &from);
+		if (unlikely(ret)) {
+			status = NVME_SC_INTERNAL;
+			bio_io_error(bio);
+			kfree(bvec);
+			goto out;
+		}
+
+		ret = submit_bio_wait(bio);
+		bio_put(bio);
+		if (ret < 0) {
+			status = NVME_SC_INTERNAL;
+			break;
+		}
+
+		io_len += mapped_data_len;
+	}
+
+	sect += (io_len >> 9);
+	req->cqe->result.u64 = le64_to_cpu(nvmet_sect_to_lba(req->ns, sect));
+	kfree(bvec);
+
+out:
+	nvmet_req_complete(req, status);
+}
+
+#else  /* CONFIG_BLK_DEV_ZONED */
+static void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
+{
+}
+static void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
+{
+}
+u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off)
+{
+	return 0;
+}
+static bool nvmet_bdev_zns_config(struct nvmet_ns *ns)
+{
+	return false;
+}
+void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
+{
+}
+void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
+{
+}
+void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
+{
+}
+void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log)
+{
+}
+#endif /* CONFIG_BLK_DEV_ZONED */
-- 
2.22.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 2/9] nvmet: add ZNS support for bdev-ns
@ 2020-11-26  2:40   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 50+ messages in thread
From: Chaitanya Kulkarni @ 2020-11-26  2:40 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: sagi, Chaitanya Kulkarni, hch

Add zns-bdev-config, id-ctrl, id-ns, zns-cmd-effects, zone-mgmt-send,
zone-mgmt-recv and zone-append handlers to the NVMeOF target to enable
ZNS support for the bdev backend.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/Makefile      |   2 +
 drivers/nvme/target/admin-cmd.c   |   4 +-
 drivers/nvme/target/io-cmd-file.c |   2 +-
 drivers/nvme/target/nvmet.h       |  18 ++
 drivers/nvme/target/zns.c         | 390 ++++++++++++++++++++++++++++++
 5 files changed, 413 insertions(+), 3 deletions(-)
 create mode 100644 drivers/nvme/target/zns.c

diff --git a/drivers/nvme/target/Makefile b/drivers/nvme/target/Makefile
index ebf91fc4c72e..bc147ff2df5d 100644
--- a/drivers/nvme/target/Makefile
+++ b/drivers/nvme/target/Makefile
@@ -12,6 +12,8 @@ obj-$(CONFIG_NVME_TARGET_TCP)		+= nvmet-tcp.o
 nvmet-y		+= core.o configfs.o admin-cmd.o fabrics-cmd.o \
 			discovery.o io-cmd-file.o io-cmd-bdev.o
 nvmet-$(CONFIG_NVME_TARGET_PASSTHRU)	+= passthru.o
+nvmet-$(CONFIG_BLK_DEV_ZONED)		+= zns.o
+
 nvme-loop-y	+= loop.o
 nvmet-rdma-y	+= rdma.o
 nvmet-fc-y	+= fc.o
diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index dca34489a1dc..509fd8dcca0c 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -579,8 +579,8 @@ static void nvmet_execute_identify_nslist(struct nvmet_req *req)
 	nvmet_req_complete(req, status);
 }
 
-static u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
-				    void *id, off_t *off)
+u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
+			     void *id, off_t *off)
 {
 	struct nvme_ns_id_desc desc = {
 		.nidt = type,
diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c
index 0abbefd9925e..2bd10960fa50 100644
--- a/drivers/nvme/target/io-cmd-file.c
+++ b/drivers/nvme/target/io-cmd-file.c
@@ -89,7 +89,7 @@ int nvmet_file_ns_enable(struct nvmet_ns *ns)
 	return ret;
 }
 
-static void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg)
+void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg)
 {
 	bv->bv_page = sg_page(sg);
 	bv->bv_offset = sg->offset;
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 592763732065..0542ba672a31 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -81,6 +81,9 @@ struct nvmet_ns {
 	struct pci_dev		*p2p_dev;
 	int			pi_type;
 	int			metadata_size;
+#ifdef CONFIG_BLK_DEV_ZONED
+	struct nvme_id_ns_zns	id_zns;
+#endif
 };
 
 static inline struct nvmet_ns *to_nvmet_ns(struct config_item *item)
@@ -251,6 +254,10 @@ struct nvmet_subsys {
 	unsigned int		admin_timeout;
 	unsigned int		io_timeout;
 #endif /* CONFIG_NVME_TARGET_PASSTHRU */
+
+#ifdef CONFIG_BLK_DEV_ZONED
+	struct nvme_id_ctrl_zns	id_ctrl_zns;
+#endif
 };
 
 static inline struct nvmet_subsys *to_subsys(struct config_item *item)
@@ -603,4 +610,15 @@ static inline bool nvmet_ns_has_pi(struct nvmet_ns *ns)
 	return ns->pi_type && ns->metadata_size == sizeof(struct t10_pi_tuple);
 }
 
+void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req);
+void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req);
+u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off);
+bool nvmet_bdev_zns_config(struct nvmet_ns *ns);
+void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req);
+void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req);
+void nvmet_bdev_execute_zone_append(struct nvmet_req *req);
+void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log);
+u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
+			     void *id, off_t *off);
+void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg);
 #endif /* _NVMET_H */
diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
new file mode 100644
index 000000000000..8ea6641a55e3
--- /dev/null
+++ b/drivers/nvme/target/zns.c
@@ -0,0 +1,390 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * NVMe ZNS-ZBD command implementation.
+ * Copyright (c) 2020-2021 HGST, a Western Digital Company.
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/uio.h>
+#include <linux/nvme.h>
+#include <linux/blkdev.h>
+#include <linux/module.h>
+#include "nvmet.h"
+
+#ifdef CONFIG_BLK_DEV_ZONED
+
+static u16 nvmet_bdev_zns_checks(struct nvmet_req *req)
+{
+	u16 status = 0;
+
+	if (!bdev_is_zoned(req->ns->bdev)) {
+		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
+		goto out;
+	}
+
+	if (req->cmd->zmr.zra != NVME_ZRA_ZONE_REPORT) {
+		status = NVME_SC_INVALID_FIELD;
+		goto out;
+	}
+
+	if (req->cmd->zmr.zrasf != NVME_ZRASF_ZONE_REPORT_ALL) {
+		status = NVME_SC_INVALID_FIELD;
+		goto out;
+	}
+
+	if (req->cmd->zmr.pr != NVME_REPORT_ZONE_PARTIAL)
+		status = NVME_SC_INVALID_FIELD;
+out:
+	return status;
+}
+
+static struct block_device *nvmet_bdev(struct nvmet_req *req)
+{
+	return req->ns->bdev;
+}
+
+static u64 nvmet_zones_to_descsize(unsigned int nr_zones)
+{
+	return sizeof(struct nvme_zone_report) +
+		(sizeof(struct nvme_zone_descriptor) * nr_zones);
+}
+
+static inline u64 nvmet_sect_to_lba(struct nvmet_ns *ns, sector_t sect)
+{
+	return sect >> (ns->blksize_shift - SECTOR_SHIFT);
+}
+
+static inline sector_t nvmet_lba_to_sect(struct nvmet_ns *ns, __le64 lba)
+{
+	return le64_to_cpu(lba) << (ns->blksize_shift - SECTOR_SHIFT);
+}
+
+/*
+ * ZNS related command implementations and helpers.
+ */
+
+u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off)
+{
+	u8 nvme_csi = NVME_CSI_ZNS;
+
+	if (bdev_is_zoned(nvmet_bdev(req))) {
+		return nvmet_copy_ns_identifier(req, NVME_NIDT_CSI,
+						NVME_NIDT_CSI_LEN,
+						&nvme_csi, off);
+	}
+
+	return NVME_SC_SUCCESS;
+}
+
+void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log)
+{
+	log->iocs[nvme_cmd_zone_append]		= cpu_to_le32(1 << 0);
+	log->iocs[nvme_cmd_zone_mgmt_send]	= cpu_to_le32(1 << 0);
+	log->iocs[nvme_cmd_zone_mgmt_recv]	= cpu_to_le32(1 << 0);
+}
+
+bool nvmet_bdev_zns_config(struct nvmet_ns *ns)
+{
+	if (ns->bdev->bd_disk->queue->conv_zones_bitmap) {
+		pr_err("block device with conventional zones is not supported\n");
+		return false;
+	}
+	/*
+	 * SMR drives return an error if writes are not aligned to the
+	 * physical block size, so override the namespace block size here.
+	 */
+	ns->blksize_shift = blksize_bits(bdev_physical_block_size(ns->bdev));
+	return true;
+}
+
+static int nvmet_bdev_report_zone_cb(struct blk_zone *zone, unsigned int idx,
+				     void *data)
+{
+	struct blk_zone *zones = data;
+
+	memcpy(&zones[idx], zone, sizeof(struct blk_zone));
+
+	return 0;
+}
+
+static void nvmet_get_zone_desc(struct nvmet_ns *ns, struct blk_zone *z,
+				struct nvme_zone_descriptor *rz)
+{
+	rz->zcap = cpu_to_le64(nvmet_sect_to_lba(ns, z->capacity));
+	rz->zslba = cpu_to_le64(nvmet_sect_to_lba(ns, z->start));
+	rz->wp = cpu_to_le64(nvmet_sect_to_lba(ns, z->wp));
+	rz->za = z->reset ? 1 << 2 : 0;
+	rz->zt = z->type;
+	rz->zs = z->cond << 4;
+}
+
+/*
+ * ZNS related Admin and I/O command handlers.
+ */
+void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
+{
+	struct nvme_id_ctrl_zns *id;
+	u16 status = 0;
+
+	id = kzalloc(sizeof(*id), GFP_KERNEL);
+	if (!id) {
+		status = NVME_SC_INTERNAL;
+		goto out;
+	}
+
+	/*
+	 * Zone Append Size Limit (ZASL) is set to 0 here: per the ZNS
+	 * specification, a ZASL of 0 means that the maximum data transfer
+	 * size for the Zone Append command is given by the controller's
+	 * Maximum Data Transfer Size (MDTS).
+	 */
+	id->zasl = 0;
+
+	status = nvmet_copy_to_sgl(req, 0, id, sizeof(*id));
+
+	kfree(id);
+out:
+	nvmet_req_complete(req, status);
+}
+
+void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
+{
+	struct nvme_id_ns_zns *id_zns;
+	u16 status = 0;
+	u64 zsze;
+
+	if (le32_to_cpu(req->cmd->identify.nsid) == NVME_NSID_ALL) {
+		req->error_loc = offsetof(struct nvme_identify, nsid);
+		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
+		goto out;
+	}
+
+	id_zns = kzalloc(sizeof(*id_zns), GFP_KERNEL);
+	if (!id_zns) {
+		status = NVME_SC_INTERNAL;
+		goto out;
+	}
+
+	req->ns = nvmet_find_namespace(req->sq->ctrl, req->cmd->identify.nsid);
+	if (!req->ns) {
+		status = NVME_SC_INTERNAL;
+		goto done;
+	}
+
+	if (!bdev_is_zoned(nvmet_bdev(req))) {
+		req->error_loc = offsetof(struct nvme_identify, nsid);
+		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
+		goto done;
+	}
+
+	nvmet_ns_revalidate(req->ns);
+	zsze = (bdev_zone_sectors(nvmet_bdev(req)) << 9) >>
+					req->ns->blksize_shift;
+	id_zns->lbafe[0].zsze = cpu_to_le64(zsze);
+	id_zns->mor = cpu_to_le32(bdev_max_open_zones(nvmet_bdev(req)));
+	id_zns->mar = cpu_to_le32(bdev_max_active_zones(nvmet_bdev(req)));
+
+done:
+	status = nvmet_copy_to_sgl(req, 0, id_zns, sizeof(*id_zns));
+	kfree(id_zns);
+out:
+	nvmet_req_complete(req, status);
+}
+
+void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
+{
+	struct request_queue *q = nvmet_bdev(req)->bd_disk->queue;
+	struct nvme_zone_mgmt_recv_cmd *zmr = &req->cmd->zmr;
+	unsigned int nz = blk_queue_nr_zones(q);
+	u64 bufsize = ((u64)le32_to_cpu(zmr->numd) + 1) << 2;
+	struct nvme_zone_report *rz;
+	struct blk_zone *zones;
+	int reported_zones;
+	sector_t sect;
+	u64 desc_size;
+	u16 status;
+	int i;
+
+	desc_size = nvmet_zones_to_descsize(blk_queue_nr_zones(q));
+	status = nvmet_bdev_zns_checks(req);
+	if (status)
+		goto out;
+
+	zones = kvcalloc(blkdev_nr_zones(nvmet_bdev(req)->bd_disk),
+			      sizeof(struct blk_zone), GFP_KERNEL);
+	if (!zones) {
+		status = NVME_SC_INTERNAL;
+		goto out;
+	}
+
+	rz = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY);
+	if (!rz) {
+		status = NVME_SC_INTERNAL;
+		goto out_free_zones;
+	}
+
+	sect = nvmet_lba_to_sect(req->ns, le64_to_cpu(req->cmd->zmr.slba));
+
+	for (nz = blk_queue_nr_zones(q); desc_size >= bufsize; nz--)
+		desc_size = nvmet_zones_to_descsize(nz);
+
+	reported_zones = blkdev_report_zones(nvmet_bdev(req), sect, nz,
+					     nvmet_bdev_report_zone_cb,
+					     zones);
+	if (reported_zones < 0) {
+		status = NVME_SC_INTERNAL;
+		goto out_free_report_zones;
+	}
+
+	rz->nr_zones = cpu_to_le64(reported_zones);
+	for (i = 0; i < reported_zones; i++)
+		nvmet_get_zone_desc(req->ns, &zones[i], &rz->entries[i]);
+
+	status = nvmet_copy_to_sgl(req, 0, rz, bufsize);
+
+out_free_report_zones:
+	kvfree(rz);
+out_free_zones:
+	kvfree(zones);
+out:
+	nvmet_req_complete(req, status);
+}
+
+void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
+{
+	sector_t nr_sect = bdev_zone_sectors(nvmet_bdev(req));
+	struct nvme_zone_mgmt_send_cmd *c = &req->cmd->zms;
+	u16 status = NVME_SC_SUCCESS;
+	enum req_opf op;
+	sector_t sect;
+	int ret;
+
+	sect = nvmet_lba_to_sect(req->ns, le64_to_cpu(req->cmd->zms.slba));
+
+	switch (c->zsa) {
+	case NVME_ZONE_OPEN:
+		op = REQ_OP_ZONE_OPEN;
+		break;
+	case NVME_ZONE_CLOSE:
+		op = REQ_OP_ZONE_CLOSE;
+		break;
+	case NVME_ZONE_FINISH:
+		op = REQ_OP_ZONE_FINISH;
+		break;
+	case NVME_ZONE_RESET:
+		if (c->select_all)
+			nr_sect = get_capacity(nvmet_bdev(req)->bd_disk);
+		op = REQ_OP_ZONE_RESET;
+		break;
+	default:
+		status = NVME_SC_INVALID_FIELD;
+		break;
+	}
+
+	ret = blkdev_zone_mgmt(nvmet_bdev(req), op, sect, nr_sect, GFP_KERNEL);
+	if (ret)
+		status = NVME_SC_INTERNAL;
+
+	nvmet_req_complete(req, status);
+}
+
+void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
+{
+	unsigned long bv_cnt = min(req->sg_cnt, BIO_MAX_PAGES);
+	int op = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
+	u64 slba = le64_to_cpu(req->cmd->rw.slba);
+	sector_t sect = nvmet_lba_to_sect(req->ns, slba);
+	u16 status = NVME_SC_SUCCESS;
+	int sg_cnt = req->sg_cnt;
+	struct scatterlist *sg;
+	size_t mapped_data_len;
+	struct iov_iter from;
+	struct bio_vec *bvec;
+	size_t mapped_cnt;
+	size_t io_len = 0;
+	struct bio *bio;
+	int ret;
+
+	if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req)))
+		return;
+
+	if (!req->sg_cnt) {
+		nvmet_req_complete(req, 0);
+		return;
+	}
+
+	bvec = kmalloc_array(bv_cnt, sizeof(*bvec), GFP_KERNEL);
+	if (!bvec) {
+		status = NVME_SC_INTERNAL;
+		goto out;
+	}
+
+	while (sg_cnt) {
+		mapped_data_len = 0;
+		for_each_sg(req->sg, sg, req->sg_cnt, mapped_cnt) {
+			nvmet_file_init_bvec(bvec, sg);
+			mapped_data_len += bvec[mapped_cnt].bv_len;
+			sg_cnt--;
+			if (mapped_cnt == bv_cnt)
+				break;
+		}
+		iov_iter_bvec(&from, WRITE, bvec, mapped_cnt, mapped_data_len);
+
+		bio = bio_alloc(GFP_KERNEL, bv_cnt);
+		bio_set_dev(bio, nvmet_bdev(req));
+		bio->bi_iter.bi_sector = sect;
+		bio->bi_opf = op;
+
+		ret = __bio_iov_append_get_pages(bio, &from);
+		if (unlikely(ret)) {
+			status = NVME_SC_INTERNAL;
+			bio_io_error(bio);
+			kfree(bvec);
+			goto out;
+		}
+
+		ret = submit_bio_wait(bio);
+		bio_put(bio);
+		if (ret < 0) {
+			status = NVME_SC_INTERNAL;
+			break;
+		}
+
+		io_len += mapped_data_len;
+	}
+
+	sect += (io_len >> 9);
+	req->cqe->result.u64 = cpu_to_le64(nvmet_sect_to_lba(req->ns, sect));
+	kfree(bvec);
+
+out:
+	nvmet_req_complete(req, status);
+}
+
+#else  /* CONFIG_BLK_DEV_ZONED */
+void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
+{
+}
+void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
+{
+}
+u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off)
+{
+	return 0;
+}
+bool nvmet_bdev_zns_config(struct nvmet_ns *ns)
+{
+	return false;
+}
+void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
+{
+}
+void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
+{
+}
+void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
+{
+}
+void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log)
+{
+}
+#endif /* CONFIG_BLK_DEV_ZONED */
-- 
2.22.1


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 3/9] nvmet: trim down id-desclist to use req->ns
  2020-11-26  2:40 ` Chaitanya Kulkarni
@ 2020-11-26  2:40   ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 50+ messages in thread
From: Chaitanya Kulkarni @ 2020-11-26  2:40 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: sagi, hch, Chaitanya Kulkarni

In this prep patch, remove the extra local variable struct nvmet_ns in
nvmet_execute_identify_desclist(); req already has a member that can be
reused. This also eliminates the explicit call to nvmet_put_namespace(),
which is already made in the request completion path.

This reduces the number of arguments needed by the ZNS bdev-ns handlers
added in the following patches, which can then take just the req
argument instead of both req and ns.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/admin-cmd.c | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 509fd8dcca0c..c64b40c631e0 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -603,37 +603,35 @@ u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
 
 static void nvmet_execute_identify_desclist(struct nvmet_req *req)
 {
-	struct nvmet_ns *ns;
 	u16 status = 0;
 	off_t off = 0;
 
-	ns = nvmet_find_namespace(req->sq->ctrl, req->cmd->identify.nsid);
-	if (!ns) {
+	req->ns = nvmet_find_namespace(req->sq->ctrl, req->cmd->identify.nsid);
+	if (!req->ns) {
 		req->error_loc = offsetof(struct nvme_identify, nsid);
 		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
 		goto out;
 	}
 
-	if (memchr_inv(&ns->uuid, 0, sizeof(ns->uuid))) {
+	if (memchr_inv(&req->ns->uuid, 0, sizeof(req->ns->uuid))) {
 		status = nvmet_copy_ns_identifier(req, NVME_NIDT_UUID,
 						  NVME_NIDT_UUID_LEN,
-						  &ns->uuid, &off);
+						  &req->ns->uuid, &off);
 		if (status)
-			goto out_put_ns;
+			goto out;
 	}
-	if (memchr_inv(ns->nguid, 0, sizeof(ns->nguid))) {
+	if (memchr_inv(req->ns->nguid, 0, sizeof(req->ns->nguid))) {
 		status = nvmet_copy_ns_identifier(req, NVME_NIDT_NGUID,
 						  NVME_NIDT_NGUID_LEN,
-						  &ns->nguid, &off);
+						  &req->ns->nguid, &off);
 		if (status)
-			goto out_put_ns;
+			goto out;
 	}
 
 	if (sg_zero_buffer(req->sg, req->sg_cnt, NVME_IDENTIFY_DATA_SIZE - off,
 			off) != NVME_IDENTIFY_DATA_SIZE - off)
 		status = NVME_SC_INTERNAL | NVME_SC_DNR;
-out_put_ns:
-	nvmet_put_namespace(ns);
+
 out:
 	nvmet_req_complete(req, status);
 }
-- 
2.22.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 4/9] nvmet: add NVME_CSI_ZNS in ns-desc for zbdev
  2020-11-26  2:40 ` Chaitanya Kulkarni
@ 2020-11-26  2:40   ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 50+ messages in thread
From: Chaitanya Kulkarni @ 2020-11-26  2:40 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: sagi, hch, Chaitanya Kulkarni

When discovering ZNS namespaces, the host looks for the NVME_CSI_ZNS
value in the namespace identification descriptor list. Update
nvmet_execute_identify_desclist() so that it adds an NVME_CSI_ZNS
descriptor when the backing bdev is zoned.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/admin-cmd.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index c64b40c631e0..d4fc1bb1a318 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -628,6 +628,10 @@ static void nvmet_execute_identify_desclist(struct nvmet_req *req)
 			goto out;
 	}
 
+	status = nvmet_process_zns_cis(req, &off);
+	if (status)
+		goto out;
+
 	if (sg_zero_buffer(req->sg, req->sg_cnt, NVME_IDENTIFY_DATA_SIZE - off,
 			off) != NVME_IDENTIFY_DATA_SIZE - off)
 		status = NVME_SC_INTERNAL | NVME_SC_DNR;
-- 
2.22.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 5/9] nvmet: add cns-cs-ctrl in id-ctrl for ZNS bdev
  2020-11-26  2:40 ` Chaitanya Kulkarni
@ 2020-11-26  2:40   ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 50+ messages in thread
From: Chaitanya Kulkarni @ 2020-11-26  2:40 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: sagi, hch, Chaitanya Kulkarni

Update nvmet_execute_identify() so that it handles NVME_ID_CNS_CS_CTRL
when identify.csi is set to NVME_CSI_ZNS. This allows the host to
identify the controller's ZNS support.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/admin-cmd.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index d4fc1bb1a318..e7d2b96cda6b 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -650,6 +650,10 @@ static void nvmet_execute_identify(struct nvmet_req *req)
 		return nvmet_execute_identify_ns(req);
 	case NVME_ID_CNS_CTRL:
 		return nvmet_execute_identify_ctrl(req);
+	case NVME_ID_CNS_CS_CTRL:
+		if (req->cmd->identify.csi == NVME_CSI_ZNS)
+			return nvmet_execute_identify_cns_cs_ctrl(req);
+		break;
 	case NVME_ID_CNS_NS_ACTIVE_LIST:
 		return nvmet_execute_identify_nslist(req);
 	case NVME_ID_CNS_NS_DESC_LIST:
-- 
2.22.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 6/9] nvmet: add cns-cs-ns in id-ctrl for ZNS bdev
  2020-11-26  2:40 ` Chaitanya Kulkarni
@ 2020-11-26  2:40   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 50+ messages in thread
From: Chaitanya Kulkarni @ 2020-11-26  2:40 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: sagi, hch, Chaitanya Kulkarni

Update nvmet_execute_identify() so that it can handle
NVME_ID_CNS_CS_NS when identify.csi is set to NVME_CSI_ZNS. This
allows the host to discover the namespace's ZNS capabilities.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/admin-cmd.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index e7d2b96cda6b..cd368cbe3855 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -648,6 +648,10 @@ static void nvmet_execute_identify(struct nvmet_req *req)
 	switch (req->cmd->identify.cns) {
 	case NVME_ID_CNS_NS:
 		return nvmet_execute_identify_ns(req);
+	case NVME_ID_CNS_CS_NS:
+		if (req->cmd->identify.csi == NVME_CSI_ZNS)
+			return nvmet_execute_identify_cns_cs_ns(req);
+		break;
 	case NVME_ID_CNS_CTRL:
 		return nvmet_execute_identify_ctrl(req);
 	case NVME_ID_CNS_CS_CTRL:
-- 
2.22.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 7/9] nvmet: add zns cmd effects to support zbdev
  2020-11-26  2:40 ` Chaitanya Kulkarni
@ 2020-11-26  2:40   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 50+ messages in thread
From: Chaitanya Kulkarni @ 2020-11-26  2:40 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: sagi, hch, Chaitanya Kulkarni

Update the target-side command effects log to report support for the
ZNS commands when the backend is a zoned block device.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/admin-cmd.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index cd368cbe3855..0099275951da 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -191,6 +191,8 @@ static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req)
 	log->iocs[nvme_cmd_dsm]			= cpu_to_le32(1 << 0);
 	log->iocs[nvme_cmd_write_zeroes]	= cpu_to_le32(1 << 0);
 
+	nvmet_zns_add_cmd_effects(log);
+
 	status = nvmet_copy_to_sgl(req, 0, log, sizeof(*log));
 
 	kfree(log);
-- 
2.22.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 8/9] nvmet: add zns bdev config support
  2020-11-26  2:40 ` Chaitanya Kulkarni
@ 2020-11-26  2:40   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 50+ messages in thread
From: Chaitanya Kulkarni @ 2020-11-26  2:40 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: sagi, hch, Chaitanya Kulkarni

For a zoned block device backend we need to override ns->blksize_shift
with the physical block size instead of the logical block size, so
that SMR drives do not result in an error.

Update nvmet_bdev_ns_enable() to reflect that.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/io-cmd-bdev.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index 125dde3f410e..f8a500983abd 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -86,6 +86,9 @@ int nvmet_bdev_ns_enable(struct nvmet_ns *ns)
 	if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY_T10))
 		nvmet_bdev_ns_enable_integrity(ns);
 
+	if (bdev_is_zoned(ns->bdev) && !nvmet_bdev_zns_config(ns))
+		return -EINVAL;
+
 	return 0;
 }
 
-- 
2.22.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 9/9] nvmet: add ZNS based I/O cmds handlers
  2020-11-26  2:40 ` Chaitanya Kulkarni
@ 2020-11-26  2:40   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 50+ messages in thread
From: Chaitanya Kulkarni @ 2020-11-26  2:40 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: sagi, hch, Chaitanya Kulkarni

Add zone-mgmt-send, zone-mgmt-recv and zone-append handlers for the
bdev backend so that it can support zoned block devices.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/Makefile      | 3 +--
 drivers/nvme/target/io-cmd-bdev.c | 9 +++++++++
 drivers/nvme/target/zns.c         | 6 +++---
 3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/target/Makefile b/drivers/nvme/target/Makefile
index bc147ff2df5d..15307b1cc713 100644
--- a/drivers/nvme/target/Makefile
+++ b/drivers/nvme/target/Makefile
@@ -10,9 +10,8 @@ obj-$(CONFIG_NVME_TARGET_FCLOOP)	+= nvme-fcloop.o
 obj-$(CONFIG_NVME_TARGET_TCP)		+= nvmet-tcp.o
 
 nvmet-y		+= core.o configfs.o admin-cmd.o fabrics-cmd.o \
-			discovery.o io-cmd-file.o io-cmd-bdev.o
+		   zns.o discovery.o io-cmd-file.o io-cmd-bdev.o
 nvmet-$(CONFIG_NVME_TARGET_PASSTHRU)	+= passthru.o
-nvmet-$(CONFIG_BLK_DEV_ZONED)		+= zns.o
 
 nvme-loop-y	+= loop.o
 nvmet-rdma-y	+= rdma.o
diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index f8a500983abd..4fcc8374b857 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -453,6 +453,15 @@ u16 nvmet_bdev_parse_io_cmd(struct nvmet_req *req)
 	case nvme_cmd_write_zeroes:
 		req->execute = nvmet_bdev_execute_write_zeroes;
 		return 0;
+	case nvme_cmd_zone_append:
+		req->execute = nvmet_bdev_execute_zone_append;
+		return 0;
+	case nvme_cmd_zone_mgmt_recv:
+		req->execute = nvmet_bdev_execute_zone_mgmt_recv;
+		return 0;
+	case nvme_cmd_zone_mgmt_send:
+		req->execute = nvmet_bdev_execute_zone_mgmt_send;
+		return 0;
 	default:
 		pr_err("unhandled cmd %d on qid %d\n", cmd->common.opcode,
 		       req->sq->qid);
diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
index 8ea6641a55e3..efd11d7a6f96 100644
--- a/drivers/nvme/target/zns.c
+++ b/drivers/nvme/target/zns.c
@@ -361,17 +361,17 @@ void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
 }
 
 #else  /* CONFIG_BLK_DEV_ZONED */
-static void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
+void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
 {
 }
-static void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
+void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
 {
 }
 u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off)
 {
 	return 0;
 }
-static bool nvmet_bdev_zns_config(struct nvmet_ns *ns)
+bool nvmet_bdev_zns_config(struct nvmet_ns *ns)
 {
 	return false;
 }
-- 
2.22.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/9] nvmet: add genblk ZBD backend
  2020-11-26  2:40 ` Chaitanya Kulkarni
@ 2020-11-26  8:07   ` Damien Le Moal
  0 siblings, 0 replies; 50+ messages in thread
From: Damien Le Moal @ 2020-11-26  8:07 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-nvme; +Cc: sagi, hch

On 2020/11/26 11:42, Chaitanya Kulkarni wrote:
> Hi,
> 
> The NVMeOF host can already handle NVMe-protocol Zoned Block Devices
> (ZBD) in ZNS mode through the passthru backend. There is, however, no
> generic block device backend that can handle ZBD devices which are not
> NVMe devices.
> 
> This adds support for exporting ZBD drives (which are not NVMe drives)
> from the target to the host over NVMeOF, using the host-side ZNS interface.
> 
> The patch series is built bottom-up: it first adds a prep patch and the
> ZNS command-specific handlers on top of the generic block layer and
> updates the data structures, then wires up the admin commands one by one
> in the order the host issues them during namespace initialization. Once
> everything is ready, it wires up the I/O command handlers. See below for
> the patch-series overview.
> 
> I've run the zonefs test cases against a null_blk memory-backed NVMeOF
> namespace over the nvme-loop transport. The same test cases pass on the
> NVMeOF zbd-ns and on null_blk without NVMeOF.
> 
> Regards,
> Chaitanya
> 
> Chaitanya Kulkarni (9):
>   block: export __bio_iov_append_get_pages()
> 	Prep patch needed for implementing Zone Append.
>   nvmet: add ZNS support for bdev-ns
> 	Core Command handlers and various helpers for ZBD backend which
> 	 will be called by target-core/target-admin etc.
>   nvmet: trim down id-desclist to use req->ns
> 	Cleanup needed to avoid code repetition from passing extra
> 	function parameters to the ZBD backend handlers.
>   nvmet: add NVME_CSI_ZNS in ns-desc for zbdev
> 	Allows the host to identify a zoned namespace.
>   nvmet: add cns-cs-ctrl in id-ctrl for ZNS bdev
> 	Allows host to identify controller with the ZBD-ZNS.
>   nvmet: add cns-cs-ns in id-ctrl for ZNS bdev
> 	Allows host to identify namespace with the ZBD-ZNS.
>   nvmet: add zns cmd effects to support zbdev
> 	Allows host to support the ZNS commands when zoned-blkdev is
> 	 selected.
>   nvmet: add zns bdev config support
> 	Allows user to override any target namespace attributes for
> 	 ZBD.
>   nvmet: add ZNS based I/O cmds handlers
> 	Handlers for Zone-Mgmt-Send/Zone-Mgmt-Recv/Zone-Append.
> 
>  block/bio.c                       |   3 +-
>  drivers/nvme/target/Makefile      |   3 +-
>  drivers/nvme/target/admin-cmd.c   |  38 ++-
>  drivers/nvme/target/io-cmd-bdev.c |  12 +
>  drivers/nvme/target/io-cmd-file.c |   2 +-
>  drivers/nvme/target/nvmet.h       |  18 ++
>  drivers/nvme/target/zns.c         | 390 ++++++++++++++++++++++++++++++
>  include/linux/bio.h               |   1 +
>  8 files changed, 451 insertions(+), 16 deletions(-)
>  create mode 100644 drivers/nvme/target/zns.c
> 
> Test Report :-
> 
> # cat /sys/kernel/config/nvmet/subsystems/testnqn/namespaces/1/device_path 
> /dev/nullb1
> # nvme list | tr -s ' ' ' ' 
> Node SN Model Namespace Usage Format FW Rev 
> /dev/nvme1n1 212d336db96a4282 Linux 1 1.07 GB / 1.07 GB 4 KiB + 0 B 5.10.0-r
> # ./zonefs-tests.sh /dev/nullb1 
> Gathering information on /dev/nullb1...
> zonefs-tests on /dev/nullb1:
>   4 zones (0 conventional zones, 4 sequential zones)
>   524288 512B sectors zone size (256 MiB)
>   0 max open zones
> Running tests
>   Test 0010:  mkzonefs (options)                                   ... PASS
>   Test 0011:  mkzonefs (force format)                              ... PASS
>   Test 0012:  mkzonefs (invalid device)                            ... FAIL
>   Test 0013:  mkzonefs (super block zone state)                    ... FAIL

See below.

>   Test 0020:  mount (default)                                      ... PASS
>   Test 0021:  mount (invalid device)                               ... PASS
>   Test 0022:  mount (check mount directory sub-directories)        ... PASS
>   Test 0023:  mount (options)                                      ... PASS
>   Test 0030:  Number of files (default)                            ... PASS
>   Test 0031:  Number of files (aggr_cnv)                           ... skip
>   Test 0032:  Number of files using stat (default)                 ... PASS
>   Test 0033:  Number of files using stat (aggr_cnv)                ... PASS
>   Test 0034:  Number of blocks using stat (default)                ... PASS
>   Test 0035:  Number of blocks using stat (aggr_cnv)               ... PASS
>   Test 0040:  Files permissions (default)                          ... PASS
>   Test 0041:  Files permissions (aggr_cnv)                         ... skip
>   Test 0042:  Files permissions (set value)                        ... PASS
>   Test 0043:  Files permissions (set value + aggr_cnv)             ... skip
>   Test 0050:  Files owner (default)                                ... PASS
>   Test 0051:  Files owner (aggr_cnv)                               ... skip
>   Test 0052:  Files owner (set value)                              ... PASS
>   Test 0053:  Files owner (set value + aggr_cnv)                   ... skip
>   Test 0060:  Files size (default)                                 ... PASS
>   Test 0061:  Files size (aggr_cnv)                                ... skip
>   Test 0070:  Conventional file truncate                           ... skip
>   Test 0071:  Conventional file truncate (aggr_cnv)                ... skip
>   Test 0072:  Conventional file unlink                             ... skip
>   Test 0073:  Conventional file unlink (aggr_cnv)                  ... skip
>   Test 0074:  Conventional file random write                       ... skip
>   Test 0075:  Conventional file random write (direct)              ... skip
>   Test 0076:  Conventional file random write (aggr_cnv)            ... skip
>   Test 0077:  Conventional file random write (aggr_cnv, direct)    ... skip
>   Test 0078:  Conventional file mmap read/write                    ... skip
>   Test 0079:  Conventional file mmap read/write (aggr_cnv)         ... skip
>   Test 0080:  Sequential file truncate                             ... PASS
>   Test 0081:  Sequential file unlink                               ... PASS
>   Test 0082:  Sequential file buffered write IO                    ... PASS
>   Test 0083:  Sequential file overwrite                            ... PASS
>   Test 0084:  Sequential file unaligned write (sync IO)            ... PASS
>   Test 0085:  Sequential file unaligned write (async IO)           ... PASS
>   Test 0086:  Sequential file append (sync)                        ... PASS
>   Test 0087:  Sequential file append (async)                       ... PASS
>   Test 0088:  Sequential file random read                          ... PASS
>   Test 0089:  Sequential file mmap read/write                      ... PASS
>   Test 0090:  sequential file 4K synchronous write                 ... PASS
>   Test 0091:  Sequential file large synchronous write              ... PASS
> 
> 44 / 46 tests passed
> #
> #
> # ./zonefs-tests.sh /dev/nvme1n1
> Gathering information on /dev/nvme1n1...
> zonefs-tests on /dev/nvme1n1:
>   4 zones (0 conventional zones, 4 sequential zones)
>   524288 512B sectors zone size (256 MiB)
>   1 max open zones
> Running tests
>   Test 0010:  mkzonefs (options)                                   ... PASS
>   Test 0011:  mkzonefs (force format)                              ... PASS
>   Test 0012:  mkzonefs (invalid device)                            ... FAIL

Weird, this should not fail. zonefs-tests.sh creates a regular nullb device to
test that mkzonefs rejects regular disks. What was the failure here?

>   Test 0013:  mkzonefs (super block zone state)                    ... FAIL

Same, this should not fail: this checks that if the super block is in a
sequential zone, that zone must be in the full condition. And seeing that all
conventional tests are skipped, it looks like your nullb drive does not have any
conventional zones. What was the error here?

>   Test 0020:  mount (default)                                      ... PASS
>   Test 0021:  mount (invalid device)                               ... PASS
>   Test 0022:  mount (check mount directory sub-directories)        ... PASS
>   Test 0023:  mount (options)                                      ... PASS
>   Test 0030:  Number of files (default)                            ... PASS
>   Test 0031:  Number of files (aggr_cnv)                           ... skip
>   Test 0032:  Number of files using stat (default)                 ... PASS
>   Test 0033:  Number of files using stat (aggr_cnv)                ... PASS
>   Test 0034:  Number of blocks using stat (default)                ... PASS
>   Test 0035:  Number of blocks using stat (aggr_cnv)               ... PASS
>   Test 0040:  Files permissions (default)                          ... PASS
>   Test 0041:  Files permissions (aggr_cnv)                         ... skip
>   Test 0042:  Files permissions (set value)                        ... PASS
>   Test 0043:  Files permissions (set value + aggr_cnv)             ... skip
>   Test 0050:  Files owner (default)                                ... PASS
>   Test 0051:  Files owner (aggr_cnv)                               ... skip
>   Test 0052:  Files owner (set value)                              ... PASS
>   Test 0053:  Files owner (set value + aggr_cnv)                   ... skip
>   Test 0060:  Files size (default)                                 ... PASS
>   Test 0061:  Files size (aggr_cnv)                                ... skip
>   Test 0070:  Conventional file truncate                           ... skip
>   Test 0071:  Conventional file truncate (aggr_cnv)                ... skip
>   Test 0072:  Conventional file unlink                             ... skip
>   Test 0073:  Conventional file unlink (aggr_cnv)                  ... skip
>   Test 0074:  Conventional file random write                       ... skip
>   Test 0075:  Conventional file random write (direct)              ... skip
>   Test 0076:  Conventional file random write (aggr_cnv)            ... skip
>   Test 0077:  Conventional file random write (aggr_cnv, direct)    ... skip
>   Test 0078:  Conventional file mmap read/write                    ... skip
>   Test 0079:  Conventional file mmap read/write (aggr_cnv)         ... skip
>   Test 0080:  Sequential file truncate                             ... PASS
>   Test 0081:  Sequential file unlink                               ... PASS
>   Test 0082:  Sequential file buffered write IO                    ... PASS
>   Test 0083:  Sequential file overwrite                            ... PASS
>   Test 0084:  Sequential file unaligned write (sync IO)            ... PASS
>   Test 0085:  Sequential file unaligned write (async IO)           ... PASS
>   Test 0086:  Sequential file append (sync)                        ... PASS
>   Test 0087:  Sequential file append (async)                       ... PASS
>   Test 0088:  Sequential file random read                          ... PASS
>   Test 0089:  Sequential file mmap read/write                      ... PASS
>   Test 0090:  sequential file 4K synchronous write                 ... PASS
>   Test 0091:  Sequential file large synchronous write              ... PASS
> 
> 44 / 46 tests passed
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/9] nvmet: add genblk ZBD backend
@ 2020-11-26  8:07   ` Damien Le Moal
  0 siblings, 0 replies; 50+ messages in thread
From: Damien Le Moal @ 2020-11-26  8:07 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-nvme; +Cc: sagi, hch

On 2020/11/26 11:42, Chaitanya Kulkarni wrote:
> Hi,
> 
> NVMeOF Host is capable of handling the NVMe Protocol based Zoned Block
> Devices (ZBD) in the ZNS mode with the passthru backend. There is no
> support for a generic block device backend to handle the ZBD devices
> which are not NVMe devices.
> 
> This adds support to export the ZBD drives (which are not NVMe drives)
> to host from the target with NVMeOF using the host side ZNS interface.
> 
> The patch series is generated in bottom-top manner where, it first adds
> prep patch and ZNS command-specific handlers on the top of genblk and 
> updates the data structures, then one by one it wires up the admin cmds
> in the order host calls them in namespace initializing sequence. Once
> everything is ready, it wires-up the I/O command handlers. See below for 
> patch-series overview.
> 
> I've tested the ZoneFS testcases with the null_blk memory backed NVMeOF
> namespace with nvme-loop transport. The same testcases are passing on the
> NVMeOF zbd-ns and are passing for null_blk without NVMeOF .
> 
> Regards,
> Chaitanya
> 
> Chaitanya Kulkarni (9):
>   block: export __bio_iov_append_get_pages()
> 	Prep patch needed for implementing Zone Append.
>   nvmet: add ZNS support for bdev-ns
> 	Core Command handlers and various helpers for ZBD backend which
> 	 will be called by target-core/target-admin etc.
>   nvmet: trim down id-desclist to use req->ns
> 	Cleanup needed to avoid the code repetation for passing extra
> 	function parameters for ZBD backend handlers.
>   nvmet: add NVME_CSI_ZNS in ns-desc for zbdev
> 	Allows host to identify zoned namesapce.
>   nvmet: add cns-cs-ctrl in id-ctrl for ZNS bdev
> 	Allows host to identify controller with the ZBD-ZNS.
>   nvmet: add cns-cs-ns in id-ctrl for ZNS bdev
> 	Allows host to identify namespace with the ZBD-ZNS.
>   nvmet: add zns cmd effects to support zbdev
> 	Allows host to support the ZNS commands when zoned-blkdev is
> 	 selected.
>   nvmet: add zns bdev config support
> 	Allows user to override any target namespace attributes for
> 	 ZBD.
>   nvmet: add ZNS based I/O cmds handlers
> 	Handlers for Zone-Mgmt-Send/Zone-Mgmt-Recv/Zone-Append.
> 
>  block/bio.c                       |   3 +-
>  drivers/nvme/target/Makefile      |   3 +-
>  drivers/nvme/target/admin-cmd.c   |  38 ++-
>  drivers/nvme/target/io-cmd-bdev.c |  12 +
>  drivers/nvme/target/io-cmd-file.c |   2 +-
>  drivers/nvme/target/nvmet.h       |  18 ++
>  drivers/nvme/target/zns.c         | 390 ++++++++++++++++++++++++++++++
>  include/linux/bio.h               |   1 +
>  8 files changed, 451 insertions(+), 16 deletions(-)
>  create mode 100644 drivers/nvme/target/zns.c
> 
> Test Report :-
> 
> # cat /sys/kernel/config/nvmet/subsystems/testnqn/namespaces/1/device_path 
> /dev/nullb1
> # nvme list | tr -s ' ' ' ' 
> Node SN Model Namespace Usage Format FW Rev 
> /dev/nvme1n1 212d336db96a4282 Linux 1 1.07 GB / 1.07 GB 4 KiB + 0 B 5.10.0-r
> # ./zonefs-tests.sh /dev/nullb1 
> Gathering information on /dev/nullb1...
> zonefs-tests on /dev/nullb1:
>   4 zones (0 conventional zones, 4 sequential zones)
>   524288 512B sectors zone size (256 MiB)
>   0 max open zones
> Running tests
>   Test 0010:  mkzonefs (options)                                   ... PASS
>   Test 0011:  mkzonefs (force format)                              ... PASS
>   Test 0012:  mkzonefs (invalid device)                            ... FAIL
>   Test 0013:  mkzonefs (super block zone state)                    ... FAIL

See below.

>   Test 0020:  mount (default)                                      ... PASS
>   Test 0021:  mount (invalid device)                               ... PASS
>   Test 0022:  mount (check mount directory sub-directories)        ... PASS
>   Test 0023:  mount (options)                                      ... PASS
>   Test 0030:  Number of files (default)                            ... PASS
>   Test 0031:  Number of files (aggr_cnv)                           ... skip
>   Test 0032:  Number of files using stat (default)                 ... PASS
>   Test 0033:  Number of files using stat (aggr_cnv)                ... PASS
>   Test 0034:  Number of blocks using stat (default)                ... PASS
>   Test 0035:  Number of blocks using stat (aggr_cnv)               ... PASS
>   Test 0040:  Files permissions (default)                          ... PASS
>   Test 0041:  Files permissions (aggr_cnv)                         ... skip
>   Test 0042:  Files permissions (set value)                        ... PASS
>   Test 0043:  Files permissions (set value + aggr_cnv)             ... skip
>   Test 0050:  Files owner (default)                                ... PASS
>   Test 0051:  Files owner (aggr_cnv)                               ... skip
>   Test 0052:  Files owner (set value)                              ... PASS
>   Test 0053:  Files owner (set value + aggr_cnv)                   ... skip
>   Test 0060:  Files size (default)                                 ... PASS
>   Test 0061:  Files size (aggr_cnv)                                ... skip
>   Test 0070:  Conventional file truncate                           ... skip
>   Test 0071:  Conventional file truncate (aggr_cnv)                ... skip
>   Test 0072:  Conventional file unlink                             ... skip
>   Test 0073:  Conventional file unlink (aggr_cnv)                  ... skip
>   Test 0074:  Conventional file random write                       ... skip
>   Test 0075:  Conventional file random write (direct)              ... skip
>   Test 0076:  Conventional file random write (aggr_cnv)            ... skip
>   Test 0077:  Conventional file random write (aggr_cnv, direct)    ... skip
>   Test 0078:  Conventional file mmap read/write                    ... skip
>   Test 0079:  Conventional file mmap read/write (aggr_cnv)         ... skip
>   Test 0080:  Sequential file truncate                             ... PASS
>   Test 0081:  Sequential file unlink                               ... PASS
>   Test 0082:  Sequential file buffered write IO                    ... PASS
>   Test 0083:  Sequential file overwrite                            ... PASS
>   Test 0084:  Sequential file unaligned write (sync IO)            ... PASS
>   Test 0085:  Sequential file unaligned write (async IO)           ... PASS
>   Test 0086:  Sequential file append (sync)                        ... PASS
>   Test 0087:  Sequential file append (async)                       ... PASS
>   Test 0088:  Sequential file random read                          ... PASS
>   Test 0089:  Sequential file mmap read/write                      ... PASS
>   Test 0090:  sequential file 4K synchronous write                 ... PASS
>   Test 0091:  Sequential file large synchronous write              ... PASS
> 
> 44 / 46 tests passed
> #
> #
> # ./zonefs-tests.sh /dev/nvme1n1
> Gathering information on /dev/nvme1n1...
> zonefs-tests on /dev/nvme1n1:
>   4 zones (0 conventional zones, 4 sequential zones)
>   524288 512B sectors zone size (256 MiB)
>   1 max open zones
> Running tests
>   Test 0010:  mkzonefs (options)                                   ... PASS
>   Test 0011:  mkzonefs (force format)                              ... PASS
>   Test 0012:  mkzonefs (invalid device)                            ... FAIL

Weird, this should not fail. zonefs-tests.sh creates a regular nullb device to
test that mkzonefs rejects regular disks. What was the failure here?

>   Test 0013:  mkzonefs (super block zone state)                    ... FAIL

Same here, this should not fail: this test checks that if the super block is in
a sequential zone, that zone is in the full condition. And since all the
conventional zone tests are skipped, it looks like your nullb device does not
have any conventional zones. What was the error here?

>   Test 0020:  mount (default)                                      ... PASS
>   Test 0021:  mount (invalid device)                               ... PASS
>   Test 0022:  mount (check mount directory sub-directories)        ... PASS
>   Test 0023:  mount (options)                                      ... PASS
>   Test 0030:  Number of files (default)                            ... PASS
>   Test 0031:  Number of files (aggr_cnv)                           ... skip
>   Test 0032:  Number of files using stat (default)                 ... PASS
>   Test 0033:  Number of files using stat (aggr_cnv)                ... PASS
>   Test 0034:  Number of blocks using stat (default)                ... PASS
>   Test 0035:  Number of blocks using stat (aggr_cnv)               ... PASS
>   Test 0040:  Files permissions (default)                          ... PASS
>   Test 0041:  Files permissions (aggr_cnv)                         ... skip
>   Test 0042:  Files permissions (set value)                        ... PASS
>   Test 0043:  Files permissions (set value + aggr_cnv)             ... skip
>   Test 0050:  Files owner (default)                                ... PASS
>   Test 0051:  Files owner (aggr_cnv)                               ... skip
>   Test 0052:  Files owner (set value)                              ... PASS
>   Test 0053:  Files owner (set value + aggr_cnv)                   ... skip
>   Test 0060:  Files size (default)                                 ... PASS
>   Test 0061:  Files size (aggr_cnv)                                ... skip
>   Test 0070:  Conventional file truncate                           ... skip
>   Test 0071:  Conventional file truncate (aggr_cnv)                ... skip
>   Test 0072:  Conventional file unlink                             ... skip
>   Test 0073:  Conventional file unlink (aggr_cnv)                  ... skip
>   Test 0074:  Conventional file random write                       ... skip
>   Test 0075:  Conventional file random write (direct)              ... skip
>   Test 0076:  Conventional file random write (aggr_cnv)            ... skip
>   Test 0077:  Conventional file random write (aggr_cnv, direct)    ... skip
>   Test 0078:  Conventional file mmap read/write                    ... skip
>   Test 0079:  Conventional file mmap read/write (aggr_cnv)         ... skip
>   Test 0080:  Sequential file truncate                             ... PASS
>   Test 0081:  Sequential file unlink                               ... PASS
>   Test 0082:  Sequential file buffered write IO                    ... PASS
>   Test 0083:  Sequential file overwrite                            ... PASS
>   Test 0084:  Sequential file unaligned write (sync IO)            ... PASS
>   Test 0085:  Sequential file unaligned write (async IO)           ... PASS
>   Test 0086:  Sequential file append (sync)                        ... PASS
>   Test 0087:  Sequential file append (async)                       ... PASS
>   Test 0088:  Sequential file random read                          ... PASS
>   Test 0089:  Sequential file mmap read/write                      ... PASS
>   Test 0090:  sequential file 4K synchronous write                 ... PASS
>   Test 0091:  Sequential file large synchronous write              ... PASS
> 
> 44 / 46 tests passed
> 


-- 
Damien Le Moal
Western Digital Research

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/9] block: export __bio_iov_append_get_pages()
  2020-11-26  2:40   ` Chaitanya Kulkarni
@ 2020-11-26  8:09     ` Damien Le Moal
  -1 siblings, 0 replies; 50+ messages in thread
From: Damien Le Moal @ 2020-11-26  8:09 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-nvme; +Cc: sagi, hch

On 2020/11/26 11:42, Chaitanya Kulkarni wrote:
> In this prep patch we exoprt the __bio_iov_append_get_pages() so that
> NVMeOF target can use the core logic of building Zone Append bios for
> REQ_OP_ZONE_APPEND without repeating the code.
> 
> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
> ---
>  block/bio.c         | 3 ++-
>  include/linux/bio.h | 1 +
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index fa01bef35bb1..de356fa28315 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -1033,7 +1033,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
>  	return 0;
>  }
>  
> -static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
> +int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
>  {
>  	unsigned short nr_pages = bio->bi_max_vecs - bio->bi_vcnt;
>  	unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
> @@ -1079,6 +1079,7 @@ static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
>  	iov_iter_advance(iter, size - left);
>  	return ret;
>  }
> +EXPORT_SYMBOL_GPL(__bio_iov_append_get_pages);

Why not use bio_iov_iter_get_pages(), which is already exported? As long as the
bio op is set to REQ_OP_ZONE_APPEND when bio_iov_iter_get_pages() is called,
__bio_iov_append_get_pages() will be used by that function.

>  
>  /**
>   * bio_iov_iter_get_pages - add user or kernel pages to a bio
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index c6d765382926..47247c1b0b85 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -446,6 +446,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
>  		unsigned int len, unsigned int off, bool *same_page);
>  void __bio_add_page(struct bio *bio, struct page *page,
>  		unsigned int len, unsigned int off);
> +int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter);
>  int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter);
>  void bio_release_pages(struct bio *bio, bool mark_dirty);
>  extern void bio_set_pages_dirty(struct bio *bio);
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 50+ messages in thread


* Re: [PATCH 2/9] nvmet: add ZNS support for bdev-ns
  2020-11-26  2:40   ` Chaitanya Kulkarni
@ 2020-11-26  8:36     ` Damien Le Moal
  -1 siblings, 0 replies; 50+ messages in thread
From: Damien Le Moal @ 2020-11-26  8:36 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-nvme; +Cc: sagi, hch

On 2020/11/26 11:42, Chaitanya Kulkarni wrote:
> Add zns-bdev-config, id-ctrl, id-ns, zns-cmd-effects, zone-mgmt-send,
> zone-mgmt-recv and zone-append handlers for NVMeOF target to enable ZNS
> support for bdev.
> 
> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
> ---
>  drivers/nvme/target/Makefile      |   2 +
>  drivers/nvme/target/admin-cmd.c   |   4 +-
>  drivers/nvme/target/io-cmd-file.c |   2 +-
>  drivers/nvme/target/nvmet.h       |  18 ++
>  drivers/nvme/target/zns.c         | 390 ++++++++++++++++++++++++++++++
>  5 files changed, 413 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/nvme/target/zns.c
> 
> diff --git a/drivers/nvme/target/Makefile b/drivers/nvme/target/Makefile
> index ebf91fc4c72e..bc147ff2df5d 100644
> --- a/drivers/nvme/target/Makefile
> +++ b/drivers/nvme/target/Makefile
> @@ -12,6 +12,8 @@ obj-$(CONFIG_NVME_TARGET_TCP)		+= nvmet-tcp.o
>  nvmet-y		+= core.o configfs.o admin-cmd.o fabrics-cmd.o \
>  			discovery.o io-cmd-file.o io-cmd-bdev.o
>  nvmet-$(CONFIG_NVME_TARGET_PASSTHRU)	+= passthru.o
> +nvmet-$(CONFIG_BLK_DEV_ZONED)		+= zns.o
> +
>  nvme-loop-y	+= loop.o
>  nvmet-rdma-y	+= rdma.o
>  nvmet-fc-y	+= fc.o
> diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
> index dca34489a1dc..509fd8dcca0c 100644
> --- a/drivers/nvme/target/admin-cmd.c
> +++ b/drivers/nvme/target/admin-cmd.c
> @@ -579,8 +579,8 @@ static void nvmet_execute_identify_nslist(struct nvmet_req *req)
>  	nvmet_req_complete(req, status);
>  }
>  
> -static u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
> -				    void *id, off_t *off)
> +u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
> +			     void *id, off_t *off)
>  {
>  	struct nvme_ns_id_desc desc = {
>  		.nidt = type,
> diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c
> index 0abbefd9925e..2bd10960fa50 100644
> --- a/drivers/nvme/target/io-cmd-file.c
> +++ b/drivers/nvme/target/io-cmd-file.c
> @@ -89,7 +89,7 @@ int nvmet_file_ns_enable(struct nvmet_ns *ns)
>  	return ret;
>  }
>  
> -static void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg)
> +void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg)
>  {
>  	bv->bv_page = sg_page(sg);
>  	bv->bv_offset = sg->offset;
> diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
> index 592763732065..0542ba672a31 100644
> --- a/drivers/nvme/target/nvmet.h
> +++ b/drivers/nvme/target/nvmet.h
> @@ -81,6 +81,9 @@ struct nvmet_ns {
>  	struct pci_dev		*p2p_dev;
>  	int			pi_type;
>  	int			metadata_size;
> +#ifdef CONFIG_BLK_DEV_ZONED
> +	struct nvme_id_ns_zns	id_zns;
> +#endif
>  };
>  
>  static inline struct nvmet_ns *to_nvmet_ns(struct config_item *item)
> @@ -251,6 +254,10 @@ struct nvmet_subsys {
>  	unsigned int		admin_timeout;
>  	unsigned int		io_timeout;
>  #endif /* CONFIG_NVME_TARGET_PASSTHRU */
> +
> +#ifdef CONFIG_BLK_DEV_ZONED
> +	struct nvme_id_ctrl_zns	id_ctrl_zns;
> +#endif
>  };
>  
>  static inline struct nvmet_subsys *to_subsys(struct config_item *item)
> @@ -603,4 +610,15 @@ static inline bool nvmet_ns_has_pi(struct nvmet_ns *ns)
>  	return ns->pi_type && ns->metadata_size == sizeof(struct t10_pi_tuple);
>  }
>  
> +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req);
> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req);
> +u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off);
> +bool nvmet_bdev_zns_config(struct nvmet_ns *ns);
> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req);
> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req);
> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req);
> +void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log);
> +u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
> +			     void *id, off_t *off);
> +void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg);
>  #endif /* _NVMET_H */
> diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
> new file mode 100644
> index 000000000000..8ea6641a55e3
> --- /dev/null
> +++ b/drivers/nvme/target/zns.c
> @@ -0,0 +1,390 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * NVMe ZNS-ZBD command implementation.
> + * Copyright (c) 2020-2021 HGST, a Western Digital Company.
> + */
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +#include <linux/uio.h>
> +#include <linux/nvme.h>
> +#include <linux/blkdev.h>
> +#include <linux/module.h>
> +#include "nvmet.h"
> +
> +#ifdef CONFIG_BLK_DEV_ZONED

This file is compiled only if CONFIG_BLK_DEV_ZONED is defined, so what is the
point of this? The stubs for the !CONFIG_BLK_DEV_ZONED case should go into the
header file instead, no?

> +
> +static u16 nvmet_bdev_zns_checks(struct nvmet_req *req)
> +{
> +	u16 status = 0;
> +
> +	if (!bdev_is_zoned(req->ns->bdev)) {
> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
> +		goto out;

Why not return the status directly here? Same for the other cases below.

> +	}
> +
> +	if (req->cmd->zmr.zra != NVME_ZRA_ZONE_REPORT) {
> +		status = NVME_SC_INVALID_FIELD;
> +		goto out;
> +	}
> +
> +	if (req->cmd->zmr.zrasf != NVME_ZRASF_ZONE_REPORT_ALL) {
> +		status = NVME_SC_INVALID_FIELD;
> +		goto out;
> +	}
> +
> +	if (req->cmd->zmr.pr != NVME_REPORT_ZONE_PARTIAL)
> +		status = NVME_SC_INVALID_FIELD;
> +out:
> +	return status;
> +}
> +
> +static struct block_device *nvmet_bdev(struct nvmet_req *req)
> +{
> +	return req->ns->bdev;
> +}
> +
> +static u64 nvmet_zones_to_descsize(unsigned int nr_zones)
> +{
> +	return sizeof(struct nvme_zone_report) +
> +		(sizeof(struct nvme_zone_descriptor) * nr_zones);
> +}

These could be declared as inline.

> +
> +static inline u64 nvmet_sect_to_lba(struct nvmet_ns *ns, sector_t sect)
> +{
> +	return sect >> (ns->blksize_shift - SECTOR_SHIFT);
> +}
> +
> +static inline sector_t nvmet_lba_to_sect(struct nvmet_ns *ns, __le64 lba)
> +{
> +	return le64_to_cpu(lba) << (ns->blksize_shift - SECTOR_SHIFT);
> +}
> +
> +/*
> + *  ZNS related command implementation and helprs.

s/helprs/helpers

> + */
> +
> +u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off)
> +{
> +	u16 nvme_cis_zns = NVME_CSI_ZNS;
> +
> +	if (bdev_is_zoned(nvmet_bdev(req))) {
> +		return nvmet_copy_ns_identifier(req, NVME_NIDT_CSI,
> +						 NVME_NIDT_CSI_LEN,
> +						 &nvme_cis_zns, off);
> +	}

No need for the curly brackets.

> +
> +	return NVME_SC_SUCCESS;
> +}
> +
> +void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log)
> +{
> +	log->iocs[nvme_cmd_zone_append]		= cpu_to_le32(1 << 0);
> +	log->iocs[nvme_cmd_zone_mgmt_send]	= cpu_to_le32(1 << 0);
> +	log->iocs[nvme_cmd_zone_mgmt_recv]	= cpu_to_le32(1 << 0);
> +}
> +
> +bool nvmet_bdev_zns_config(struct nvmet_ns *ns)
> +{
> +	if (ns->bdev->bd_disk->queue->conv_zones_bitmap) {
> +		pr_err("block device with conventional zones not supported.");

pr_err("block devices with conventional zones are not supported.");

With SMR drives, the last zone of the disk can be smaller than the other zones.
That must be checked as well, since ZNS does not allow it: drives with a smaller
runt zone at the end cannot be exported.

> +		return false;
> +	}
> +	/*
> +	 * SMR drives will results in error if writes are not aligned to the
> +	 * physical block size just override.
> +	 */

	/*
	 * For ZBC and ZAC devices, writes into sequential zones must be aligned
	 * to the device physical block size. So use this value as the logical
	 * block size to avoid errors.
	 */

> +	ns->blksize_shift = blksize_bits(bdev_physical_block_size(ns->bdev));
> +	return true;
> +}
> +
> +static int nvmet_bdev_report_zone_cb(struct blk_zone *zone, unsigned int idx,
> +				     void *data)
> +{
> +	struct blk_zone *zones = data;
> +
> +	memcpy(&zones[idx], zone, sizeof(struct blk_zone));
> +
> +	return 0;
> +}
> +
> +static void nvmet_get_zone_desc(struct nvmet_ns *ns, struct blk_zone *z,
> +				struct nvme_zone_descriptor *rz)
> +{
> +	rz->zcap = cpu_to_le64(nvmet_sect_to_lba(ns, z->capacity));
> +	rz->zslba = cpu_to_le64(nvmet_sect_to_lba(ns, z->start));
> +	rz->wp = cpu_to_le64(nvmet_sect_to_lba(ns, z->wp));
> +	rz->za = z->reset ? 1 << 2 : 0;
> +	rz->zt = z->type;
> +	rz->zs = z->cond << 4;
> +}
> +
> +/*
> + * ZNS related Admin and I/O command handlers.
> + */
> +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
> +{
> +	struct nvme_id_ctrl_zns *id;
> +	u16 status = 0;
> +
> +	id = kzalloc(sizeof(*id), GFP_KERNEL);
> +	if (!id) {
> +		status = NVME_SC_INTERNAL;
> +		goto out;
> +	}
> +
> +	/*
> +	 * Even though this function sets Zone Append Size Limit to 0,
> +	 * the 0 value here indicates that the maximum data transfer size for
> +	 * the Zone Append command is indicated by the ctrl
> +	 * Maximum Data Transfer Size (MDTS).

But the target drive may have different values for max zone append sectors and
max_hw_sectors/max_sectors, so I think this needs finer-grained handling.

> +	 */
> +	id->zasl = 0;
> +
> +	status = nvmet_copy_to_sgl(req, 0, id, sizeof(*id));
> +
> +	kfree(id);
> +out:
> +	nvmet_req_complete(req, status);
> +}
> +
> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
> +{
> +	struct nvme_id_ns_zns *id_zns;
> +	u16 status = 0;
> +	u64 zsze;
> +
> +	if (le32_to_cpu(req->cmd->identify.nsid) == NVME_NSID_ALL) {
> +		req->error_loc = offsetof(struct nvme_identify, nsid);
> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
> +		goto out;
> +	}
> +
> +	id_zns = kzalloc(sizeof(*id_zns), GFP_KERNEL);
> +	if (!id_zns) {
> +		status = NVME_SC_INTERNAL;
> +		goto out;
> +	}
> +
> +	req->ns = nvmet_find_namespace(req->sq->ctrl, req->cmd->identify.nsid);
> +	if (!req->ns) {
> +		status = NVME_SC_INTERNAL;
> +		goto done;

That will result in nvmet_copy_to_sgl() being executed. Is that OK?
Shouldn't you only do the kfree(id_zns) and complete with an error here?

> +	}
> +
> +	if (!bdev_is_zoned(nvmet_bdev(req))) {
> +		req->error_loc = offsetof(struct nvme_identify, nsid);
> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
> +		goto done;

Same comment.

> +	}
> +
> +	nvmet_ns_revalidate(req->ns);
> +	zsze = (bdev_zone_sectors(nvmet_bdev(req)) << 9) >>
> +					req->ns->blksize_shift;
> +	id_zns->lbafe[0].zsze = cpu_to_le64(zsze);
> +	id_zns->mor = cpu_to_le32(bdev_max_open_zones(nvmet_bdev(req)));
> +	id_zns->mar = cpu_to_le32(bdev_max_active_zones(nvmet_bdev(req)));
> +
> +done:
> +	status = nvmet_copy_to_sgl(req, 0, id_zns, sizeof(*id_zns));
> +	kfree(id_zns);
> +out:
> +	nvmet_req_complete(req, status);
> +}
> +
> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
> +{
> +	struct request_queue *q = nvmet_bdev(req)->bd_disk->queue;
> +	struct nvme_zone_mgmt_recv_cmd *zmr = &req->cmd->zmr;
> +	unsigned int nz = blk_queue_nr_zones(q);
> +	u64 bufsize = (zmr->numd << 2) + 1;
> +	struct nvme_zone_report *rz;
> +	struct blk_zone *zones;
> +	int reported_zones;
> +	sector_t sect;
> +	u64 desc_size;
> +	u16 status;
> +	int i;
> +
> +	desc_size = nvmet_zones_to_descsize(blk_queue_nr_zones(q));
> +	status = nvmet_bdev_zns_checks(req);
> +	if (status)
> +		goto out;
> +
> +	zones = kvcalloc(blkdev_nr_zones(nvmet_bdev(req)->bd_disk),
> +			      sizeof(struct blk_zone), GFP_KERNEL);

This is not super nice: a large disk can have an enormous number of zones
(75000+ for the largest SMR HDDs today), but you actually do not need more zone
descriptors than fit in the request buffer.

> +	if (!zones) {
> +		status = NVME_SC_INTERNAL;
> +		goto out;
> +	}
> +
> +	rz = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY);
> +	if (!rz) {
> +		status = NVME_SC_INTERNAL;
> +		goto out_free_zones;
> +	}
> +
> +	sect = nvmet_lba_to_sect(req->ns, le64_to_cpu(req->cmd->zmr.slba));
> +
> +	for (nz = blk_queue_nr_zones(q); desc_size >= bufsize; nz--)
> +		desc_size = nvmet_zones_to_descsize(nz);
> +
> +	reported_zones = blkdev_report_zones(nvmet_bdev(req), sect, nz,
> +					     nvmet_bdev_report_zone_cb,
> +					     zones);
> +	if (reported_zones < 0) {
> +		status = NVME_SC_INTERNAL;
> +		goto out_free_report_zones;
> +	}
> +
> +	rz->nr_zones = cpu_to_le64(reported_zones);
> +	for (i = 0; i < reported_zones; i++)
> +		nvmet_get_zone_desc(req->ns, &zones[i], &rz->entries[i]);

This conversion can be done directly in the report zones callback, which
avoids looping twice over the reported zones.

> +
> +	status = nvmet_copy_to_sgl(req, 0, rz, bufsize);
> +
> +out_free_report_zones:
> +	kvfree(rz);
> +out_free_zones:
> +	kvfree(zones);
> +out:
> +	nvmet_req_complete(req, status);
> +}
> +
> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
> +{
> +	sector_t nr_sect = bdev_zone_sectors(nvmet_bdev(req));
> +	struct nvme_zone_mgmt_send_cmd *c = &req->cmd->zms;
> +	u16 status = NVME_SC_SUCCESS;
> +	enum req_opf op;
> +	sector_t sect;
> +	int ret;
> +
> +	sect = nvmet_lba_to_sect(req->ns, le64_to_cpu(req->cmd->zms.slba));
> +
> +	switch (c->zsa) {
> +	case NVME_ZONE_OPEN:
> +		op = REQ_OP_ZONE_OPEN;
> +		break;
> +	case NVME_ZONE_CLOSE:
> +		op = REQ_OP_ZONE_CLOSE;
> +		break;
> +	case NVME_ZONE_FINISH:
> +		op = REQ_OP_ZONE_FINISH;
> +		break;
> +	case NVME_ZONE_RESET:
> +		if (c->select_all)
> +			nr_sect = get_capacity(nvmet_bdev(req)->bd_disk);
> +		op = REQ_OP_ZONE_RESET;
> +		break;
> +	default:
> +		status = NVME_SC_INVALID_FIELD;
> +		break;

You need a goto here, otherwise blkdev_zone_mgmt() will still be called.

> +	}
> +
> +	ret = blkdev_zone_mgmt(nvmet_bdev(req), op, sect, nr_sect, GFP_KERNEL);
> +	if (ret)
> +		status = NVME_SC_INTERNAL;
> +
> +	nvmet_req_complete(req, status);
> +}
> +
> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
> +{
> +	unsigned long bv_cnt = min(req->sg_cnt, BIO_MAX_PAGES);
> +	int op = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
> +	u64 slba = le64_to_cpu(req->cmd->rw.slba);
> +	sector_t sect = nvmet_lba_to_sect(req->ns, slba);
> +	u16 status = NVME_SC_SUCCESS;
> +	int sg_cnt = req->sg_cnt;
> +	struct scatterlist *sg;
> +	size_t mapped_data_len;
> +	struct iov_iter from;
> +	struct bio_vec *bvec;
> +	size_t mapped_cnt;
> +	size_t io_len = 0;
> +	struct bio *bio;
> +	int ret;
> +
> +	if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req)))
> +		return;

No request completion?

> +
> +	if (!req->sg_cnt) {
> +		nvmet_req_complete(req, 0);
> +		return;
> +	}
> +
> +	bvec = kmalloc_array(bv_cnt, sizeof(*bvec), GFP_KERNEL);
> +	if (!bvec) {
> +		status = NVME_SC_INTERNAL;
> +		goto out;
> +	}
> +
> +	while (sg_cnt) {
> +		mapped_data_len = 0;
> +		for_each_sg(req->sg, sg, req->sg_cnt, mapped_cnt) {
> +			nvmet_file_init_bvec(bvec, sg);
> +			mapped_data_len += bvec[mapped_cnt].bv_len;
> +			sg_cnt--;
> +			if (mapped_cnt == bv_cnt)
> +				break;
> +		}
> +		iov_iter_bvec(&from, WRITE, bvec, mapped_cnt, mapped_data_len);
> +
> +		bio = bio_alloc(GFP_KERNEL, bv_cnt);
> +		bio_set_dev(bio, nvmet_bdev(req));
> +		bio->bi_iter.bi_sector = sect;
> +		bio->bi_opf = op;
> +
> +		ret =  __bio_iov_append_get_pages(bio, &from);
> +		if (unlikely(ret)) {
> +			status = NVME_SC_INTERNAL;
> +			bio_io_error(bio);
> +			kfree(bvec);
> +			goto out;
> +		}
> +
> +		ret = submit_bio_wait(bio);
> +		bio_put(bio);
> +		if (ret < 0) {
> +			status = NVME_SC_INTERNAL;
> +			break;
> +		}
> +
> +		io_len += mapped_data_len;
> +	}

This loop is equivalent to splitting a zone append into multiple appends, which
must not be done: it can lead to totally unpredictable ordering of the chunks.
What if another thread is doing a zone append to the same zone at the same time?

> +
> +	sect += (io_len >> 9);
> +	req->cqe->result.u64 = le64_to_cpu(nvmet_sect_to_lba(req->ns, sect));
> +	kfree(bvec);
> +
> +out:
> +	nvmet_req_complete(req, status);
> +}
> +
> +#else  /* CONFIG_BLK_DEV_ZONED */
> +static void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
> +{
> +}
> +static void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
> +{
> +}
> +u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off)
> +{
> +	return 0;
> +}
> +static bool nvmet_bdev_zns_config(struct nvmet_ns *ns)
> +{
> +	return false;
> +}
> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
> +{
> +}
> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
> +{
> +}
> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
> +{
> +}
> +void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log)
> +{
> +}

These stubs should go in the header file, and the braces should be on the same
line as the empty function body.

> +#endif /* CONFIG_BLK_DEV_ZONED */
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/9] nvmet: add ZNS support for bdev-ns
@ 2020-11-26  8:36     ` Damien Le Moal
  0 siblings, 0 replies; 50+ messages in thread
From: Damien Le Moal @ 2020-11-26  8:36 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-nvme; +Cc: sagi, hch

On 2020/11/26 11:42, Chaitanya Kulkarni wrote:
> Add zns-bdev-config, id-ctrl, id-ns, zns-cmd-effects, zone-mgmt-send,
> zone-mgmt-recv and zone-append handlers for NVMeOF target to enable ZNS
> support for bdev.
> 
> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
> ---
>  drivers/nvme/target/Makefile      |   2 +
>  drivers/nvme/target/admin-cmd.c   |   4 +-
>  drivers/nvme/target/io-cmd-file.c |   2 +-
>  drivers/nvme/target/nvmet.h       |  18 ++
>  drivers/nvme/target/zns.c         | 390 ++++++++++++++++++++++++++++++
>  5 files changed, 413 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/nvme/target/zns.c
> 
> diff --git a/drivers/nvme/target/Makefile b/drivers/nvme/target/Makefile
> index ebf91fc4c72e..bc147ff2df5d 100644
> --- a/drivers/nvme/target/Makefile
> +++ b/drivers/nvme/target/Makefile
> @@ -12,6 +12,8 @@ obj-$(CONFIG_NVME_TARGET_TCP)		+= nvmet-tcp.o
>  nvmet-y		+= core.o configfs.o admin-cmd.o fabrics-cmd.o \
>  			discovery.o io-cmd-file.o io-cmd-bdev.o
>  nvmet-$(CONFIG_NVME_TARGET_PASSTHRU)	+= passthru.o
> +nvmet-$(CONFIG_BLK_DEV_ZONED)		+= zns.o
> +
>  nvme-loop-y	+= loop.o
>  nvmet-rdma-y	+= rdma.o
>  nvmet-fc-y	+= fc.o
> diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
> index dca34489a1dc..509fd8dcca0c 100644
> --- a/drivers/nvme/target/admin-cmd.c
> +++ b/drivers/nvme/target/admin-cmd.c
> @@ -579,8 +579,8 @@ static void nvmet_execute_identify_nslist(struct nvmet_req *req)
>  	nvmet_req_complete(req, status);
>  }
>  
> -static u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
> -				    void *id, off_t *off)
> +u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
> +			     void *id, off_t *off)
>  {
>  	struct nvme_ns_id_desc desc = {
>  		.nidt = type,
> diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c
> index 0abbefd9925e..2bd10960fa50 100644
> --- a/drivers/nvme/target/io-cmd-file.c
> +++ b/drivers/nvme/target/io-cmd-file.c
> @@ -89,7 +89,7 @@ int nvmet_file_ns_enable(struct nvmet_ns *ns)
>  	return ret;
>  }
>  
> -static void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg)
> +void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg)
>  {
>  	bv->bv_page = sg_page(sg);
>  	bv->bv_offset = sg->offset;
> diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
> index 592763732065..0542ba672a31 100644
> --- a/drivers/nvme/target/nvmet.h
> +++ b/drivers/nvme/target/nvmet.h
> @@ -81,6 +81,9 @@ struct nvmet_ns {
>  	struct pci_dev		*p2p_dev;
>  	int			pi_type;
>  	int			metadata_size;
> +#ifdef CONFIG_BLK_DEV_ZONED
> +	struct nvme_id_ns_zns	id_zns;
> +#endif
>  };
>  
>  static inline struct nvmet_ns *to_nvmet_ns(struct config_item *item)
> @@ -251,6 +254,10 @@ struct nvmet_subsys {
>  	unsigned int		admin_timeout;
>  	unsigned int		io_timeout;
>  #endif /* CONFIG_NVME_TARGET_PASSTHRU */
> +
> +#ifdef CONFIG_BLK_DEV_ZONED
> +	struct nvme_id_ctrl_zns	id_ctrl_zns;
> +#endif
>  };
>  
>  static inline struct nvmet_subsys *to_subsys(struct config_item *item)
> @@ -603,4 +610,15 @@ static inline bool nvmet_ns_has_pi(struct nvmet_ns *ns)
>  	return ns->pi_type && ns->metadata_size == sizeof(struct t10_pi_tuple);
>  }
>  
> +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req);
> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req);
> +u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off);
> +bool nvmet_bdev_zns_config(struct nvmet_ns *ns);
> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req);
> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req);
> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req);
> +void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log);
> +u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
> +			     void *id, off_t *off);
> +void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg);
>  #endif /* _NVMET_H */
> diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
> new file mode 100644
> index 000000000000..8ea6641a55e3
> --- /dev/null
> +++ b/drivers/nvme/target/zns.c
> @@ -0,0 +1,390 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * NVMe ZNS-ZBD command implementation.
> + * Copyright (c) 2020-2021 HGST, a Western Digital Company.
> + */
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +#include <linux/uio.h>
> +#include <linux/nvme.h>
> +#include <linux/blkdev.h>
> +#include <linux/module.h>
> +#include "nvmet.h"
> +
> +#ifdef CONFIG_BLK_DEV_ZONED

This file is compiled only if CONFIG_BLK_DEV_ZONED is defined, so what is the
point of this ? The stubs for the !CONFIG_BLK_DEV_ZONED case should go into the
header file, no ?
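For illustration, the usual kernel pattern being described here (real implementations in zns.c, built only with CONFIG_BLK_DEV_ZONED, and static inline no-op stubs in the header otherwise) can be sketched outside the kernel like this; the function name and the config macro are stand-ins, not the actual symbols:

```c
#include <assert.h>

/* Stand-in for the kconfig symbol; in the kernel this comes from .config. */
/* #define CONFIG_BLK_DEV_ZONED */

#ifdef CONFIG_BLK_DEV_ZONED
/* Real implementation lives in zns.c, which is only built when the
 * config option is enabled. */
int nvmet_bdev_zns_enable(void);
#else
/* No-op stub in the header: callers compile unchanged, and zoned
 * support is simply reported as unavailable. */
static inline int nvmet_bdev_zns_enable(void) { return 0; }
#endif
```

With the stubs in the header, the Makefile can keep zns.o conditional and no #ifdef is needed inside zns.c itself.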

> +
> +static u16 nvmet_bdev_zns_checks(struct nvmet_req *req)
> +{
> +	u16 status = 0;
> +
> +	if (!bdev_is_zoned(req->ns->bdev)) {
> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
> +		goto out;

Why not return status directly here ? Same for the other cases below.

> +	}
> +
> +	if (req->cmd->zmr.zra != NVME_ZRA_ZONE_REPORT) {
> +		status = NVME_SC_INVALID_FIELD;
> +		goto out;
> +	}
> +
> +	if (req->cmd->zmr.zrasf != NVME_ZRASF_ZONE_REPORT_ALL) {
> +		status = NVME_SC_INVALID_FIELD;
> +		goto out;
> +	}
> +
> +	if (req->cmd->zmr.pr != NVME_REPORT_ZONE_PARTIAL)
> +		status = NVME_SC_INVALID_FIELD;
> +out:
> +	return status;
> +}
> +
> +static struct block_device *nvmet_bdev(struct nvmet_req *req)
> +{
> +	return req->ns->bdev;
> +}
> +
> +static u64 nvmet_zones_to_descsize(unsigned int nr_zones)
> +{
> +	return sizeof(struct nvme_zone_report) +
> +		(sizeof(struct nvme_zone_descriptor) * nr_zones);
> +}

These could be declared as inline.

> +
> +static inline u64 nvmet_sect_to_lba(struct nvmet_ns *ns, sector_t sect)
> +{
> +	return sect >> (ns->blksize_shift - SECTOR_SHIFT);
> +}
> +
> +static inline sector_t nvmet_lba_to_sect(struct nvmet_ns *ns, __le64 lba)
> +{
> +	return le64_to_cpu(lba) << (ns->blksize_shift - SECTOR_SHIFT);
> +}
> +
> +/*
> + *  ZNS related command implementation and helprs.

s/helprs/helpers

> + */
> +
> +u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off)
> +{
> +	u16 nvme_cis_zns = NVME_CSI_ZNS;
> +
> +	if (bdev_is_zoned(nvmet_bdev(req))) {
> +		return nvmet_copy_ns_identifier(req, NVME_NIDT_CSI,
> +						 NVME_NIDT_CSI_LEN,
> +						 &nvme_cis_zns, off);
> +	}

No need for the curly brackets.

> +
> +	return NVME_SC_SUCCESS;
> +}
> +
> +void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log)
> +{
> +	log->iocs[nvme_cmd_zone_append]		= cpu_to_le32(1 << 0);
> +	log->iocs[nvme_cmd_zone_mgmt_send]	= cpu_to_le32(1 << 0);
> +	log->iocs[nvme_cmd_zone_mgmt_recv]	= cpu_to_le32(1 << 0);
> +}
> +
> +bool nvmet_bdev_zns_config(struct nvmet_ns *ns)
> +{
> +	if (ns->bdev->bd_disk->queue->conv_zones_bitmap) {
> +		pr_err("block device with conventional zones not supported.");

pr_err("block devices with conventional zones are not supported.");

With SMR drives, the last zone of the disk can be smaller than the other zones.
ZNS does not allow that, so it needs to be checked too: drives with a smaller
runt zone at the end cannot be allowed.
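A sketch of the missing check (hypothetical helper; in the kernel this would compare get_capacity() against bdev_zone_sectors()):

```c
#include <assert.h>
#include <stdint.h>

/*
 * ZNS requires all zones of a namespace to have the same size, so a
 * backend drive whose capacity is not a multiple of its zone size
 * (i.e. that ends with a smaller "runt" zone) has to be rejected.
 */
static int bdev_has_runt_zone(uint64_t capacity_sects, uint64_t zone_sects)
{
	/* zone_sects is a power of two on Linux, so a mask test
	 * (capacity_sects & (zone_sects - 1)) would work too. */
	return capacity_sects % zone_sects != 0;
}
```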

> +		return false;
> +	}
> +	/*
> +	 * SMR drives will results in error if writes are not aligned to the
> +	 * physical block size just override.
> +	 */

	/*
	 * For ZBC and ZAC devices, writes into sequential zones must be aligned
	 * to the device physical block size. So use this value as the logical
	 * block size to avoid errors.
	 */

> +	ns->blksize_shift = blksize_bits(bdev_physical_block_size(ns->bdev));
> +	return true;
> +}
> +
> +static int nvmet_bdev_report_zone_cb(struct blk_zone *zone, unsigned int idx,
> +				     void *data)
> +{
> +	struct blk_zone *zones = data;
> +
> +	memcpy(&zones[idx], zone, sizeof(struct blk_zone));
> +
> +	return 0;
> +}
> +
> +static void nvmet_get_zone_desc(struct nvmet_ns *ns, struct blk_zone *z,
> +				struct nvme_zone_descriptor *rz)
> +{
> +	rz->zcap = cpu_to_le64(nvmet_sect_to_lba(ns, z->capacity));
> +	rz->zslba = cpu_to_le64(nvmet_sect_to_lba(ns, z->start));
> +	rz->wp = cpu_to_le64(nvmet_sect_to_lba(ns, z->wp));
> +	rz->za = z->reset ? 1 << 2 : 0;
> +	rz->zt = z->type;
> +	rz->zs = z->cond << 4;
> +}
> +
> +/*
> + * ZNS related Admin and I/O command handlers.
> + */
> +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
> +{
> +	struct nvme_id_ctrl_zns *id;
> +	u16 status = 0;
> +
> +	id = kzalloc(sizeof(*id), GFP_KERNEL);
> +	if (!id) {
> +		status = NVME_SC_INTERNAL;
> +		goto out;
> +	}
> +
> +	/*
> +	 * Even though this function sets Zone Append Size Limit to 0,
> +	 * the 0 value here indicates that the maximum data transfer size for
> +	 * the Zone Append command is indicated by the ctrl
> +	 * Maximum Data Transfer Size (MDTS).

But the target drive may have different values for max zone append sectors and
max_hw_sectors/max_sectors. So I think this needs finer handling.
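One possible refinement, shown as a stand-alone sketch rather than the in-kernel API: derive a non-zero ZASL from the backend's actual zone-append limit. ZASL is encoded as a power of two in units of the minimum memory page size (assumed 4K here):

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT	9
#define MPSMIN_SHIFT	12	/* assume CAP.MPSMIN corresponds to 4K pages */

/* Convert a max-zone-append limit given in 512B sectors into the ZASL
 * encoding (2^zasl units of the minimum memory page size). */
static uint8_t zasl_from_max_append_sects(uint32_t max_append_sects)
{
	uint32_t pages = max_append_sects >> (MPSMIN_SHIFT - SECTOR_SHIFT);
	uint8_t zasl = 0;

	while (pages > 1) {	/* integer log2 */
		pages >>= 1;
		zasl++;
	}
	return zasl;
}
```

This way the target can advertise the device's own append limit instead of falling back to MDTS when the two differ.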

> +	 */
> +	id->zasl = 0;
> +
> +	status = nvmet_copy_to_sgl(req, 0, id, sizeof(*id));
> +
> +	kfree(id);
> +out:
> +	nvmet_req_complete(req, status);
> +}
> +
> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
> +{
> +	struct nvme_id_ns_zns *id_zns;
> +	u16 status = 0;
> +	u64 zsze;
> +
> +	if (le32_to_cpu(req->cmd->identify.nsid) == NVME_NSID_ALL) {
> +		req->error_loc = offsetof(struct nvme_identify, nsid);
> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
> +		goto out;
> +	}
> +
> +	id_zns = kzalloc(sizeof(*id_zns), GFP_KERNEL);
> +	if (!id_zns) {
> +		status = NVME_SC_INTERNAL;
> +		goto out;
> +	}
> +
> +	req->ns = nvmet_find_namespace(req->sq->ctrl, req->cmd->identify.nsid);
> +	if (!req->ns) {
> +		status = NVME_SC_INTERNAL;
> +		goto done;

That will result in nvmet_copy_to_sgl() being executed. Is that OK ?
Shouldn't you do only the kfree(id_zns) and complete with an error here ?

> +	}
> +
> +	if (!bdev_is_zoned(nvmet_bdev(req))) {
> +		req->error_loc = offsetof(struct nvme_identify, nsid);
> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
> +		goto done;

Same comment.

> +	}
> +
> +	nvmet_ns_revalidate(req->ns);
> +	zsze = (bdev_zone_sectors(nvmet_bdev(req)) << 9) >>
> +					req->ns->blksize_shift;
> +	id_zns->lbafe[0].zsze = cpu_to_le64(zsze);
> +	id_zns->mor = cpu_to_le32(bdev_max_open_zones(nvmet_bdev(req)));
> +	id_zns->mar = cpu_to_le32(bdev_max_active_zones(nvmet_bdev(req)));
> +
> +done:
> +	status = nvmet_copy_to_sgl(req, 0, id_zns, sizeof(*id_zns));
> +	kfree(id_zns);
> +out:
> +	nvmet_req_complete(req, status);
> +}
> +
> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
> +{
> +	struct request_queue *q = nvmet_bdev(req)->bd_disk->queue;
> +	struct nvme_zone_mgmt_recv_cmd *zmr = &req->cmd->zmr;
> +	unsigned int nz = blk_queue_nr_zones(q);
> +	u64 bufsize = (zmr->numd << 2) + 1;
> +	struct nvme_zone_report *rz;
> +	struct blk_zone *zones;
> +	int reported_zones;
> +	sector_t sect;
> +	u64 desc_size;
> +	u16 status;
> +	int i;
> +
> +	desc_size = nvmet_zones_to_descsize(blk_queue_nr_zones(q));
> +	status = nvmet_bdev_zns_checks(req);
> +	if (status)
> +		goto out;
> +
> +	zones = kvcalloc(blkdev_nr_zones(nvmet_bdev(req)->bd_disk),
> +			      sizeof(struct blk_zone), GFP_KERNEL);

This is not super nice: a large disk will have an enormous number of zones
(75000+ for the largest SMR HDDs today). But you actually do not need more
zone descriptors than what fits in the request buffer.
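In other words, the number of zones to report can be bounded by the buffer size up front. A self-contained sketch of that arithmetic, using the 64-byte report header and 64-byte descriptor sizes from the ZNS layout as stand-in constants:

```c
#include <assert.h>
#include <stdint.h>

#define ZNS_REPORT_HDR_SIZE	64	/* struct nvme_zone_report */
#define ZNS_ZONE_DESC_SIZE	64	/* struct nvme_zone_descriptor */

/* Report only as many zones as fit in the host buffer instead of
 * allocating one entry per zone of the whole disk. */
static uint32_t zones_fitting_in_buf(uint64_t bufsize)
{
	if (bufsize <= ZNS_REPORT_HDR_SIZE)
		return 0;
	return (uint32_t)((bufsize - ZNS_REPORT_HDR_SIZE) /
			  ZNS_ZONE_DESC_SIZE);
}
```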

> +	if (!zones) {
> +		status = NVME_SC_INTERNAL;
> +		goto out;
> +	}
> +
> +	rz = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY);
> +	if (!rz) {
> +		status = NVME_SC_INTERNAL;
> +		goto out_free_zones;
> +	}
> +
> +	sect = nvmet_lba_to_sect(req->ns, le64_to_cpu(req->cmd->zmr.slba));
> +
> +	for (nz = blk_queue_nr_zones(q); desc_size >= bufsize; nz--)
> +		desc_size = nvmet_zones_to_descsize(nz);
> +
> +	reported_zones = blkdev_report_zones(nvmet_bdev(req), sect, nz,
> +					     nvmet_bdev_report_zone_cb,
> +					     zones);
> +	if (reported_zones < 0) {
> +		status = NVME_SC_INTERNAL;
> +		goto out_free_report_zones;
> +	}
> +
> +	rz->nr_zones = cpu_to_le64(reported_zones);
> +	for (i = 0; i < reported_zones; i++)
> +		nvmet_get_zone_desc(req->ns, &zones[i], &rz->entries[i]);

This can be done directly in the report zones cb. That will avoid looping twice
over the reported zones.
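A sketch of that single-pass approach, with simplified stand-in types (the real callback would fill struct nvme_zone_descriptor from struct blk_zone, converting sectors to LBAs):

```c
#include <assert.h>
#include <stdint.h>

struct zone_info { uint64_t start, wp; };  /* stand-in for struct blk_zone */
struct zone_desc { uint64_t zslba, wp; };  /* stand-in for the NVMe desc */

/* Fill the outgoing descriptor directly in the report-zones callback,
 * so no second loop over the reported zones is needed. */
static int report_zone_cb(const struct zone_info *z, unsigned int idx,
			  void *data)
{
	struct zone_desc *descs = data;

	descs[idx].zslba = z->start;	/* real code: sector-to-LBA shift */
	descs[idx].wp = z->wp;
	return 0;
}
```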

> +
> +	status = nvmet_copy_to_sgl(req, 0, rz, bufsize);
> +
> +out_free_report_zones:
> +	kvfree(rz);
> +out_free_zones:
> +	kvfree(zones);
> +out:
> +	nvmet_req_complete(req, status);
> +}
> +
> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
> +{
> +	sector_t nr_sect = bdev_zone_sectors(nvmet_bdev(req));
> +	struct nvme_zone_mgmt_send_cmd *c = &req->cmd->zms;
> +	u16 status = NVME_SC_SUCCESS;
> +	enum req_opf op;
> +	sector_t sect;
> +	int ret;
> +
> +	sect = nvmet_lba_to_sect(req->ns, le64_to_cpu(req->cmd->zms.slba));
> +
> +	switch (c->zsa) {
> +	case NVME_ZONE_OPEN:
> +		op = REQ_OP_ZONE_OPEN;
> +		break;
> +	case NVME_ZONE_CLOSE:
> +		op = REQ_OP_ZONE_CLOSE;
> +		break;
> +	case NVME_ZONE_FINISH:
> +		op = REQ_OP_ZONE_FINISH;
> +		break;
> +	case NVME_ZONE_RESET:
> +		if (c->select_all)
> +			nr_sect = get_capacity(nvmet_bdev(req)->bd_disk);
> +		op = REQ_OP_ZONE_RESET;
> +		break;
> +	default:
> +		status = NVME_SC_INVALID_FIELD;
> +		break;

You need a goto here or blkdev_zone_mgmt() will be called.
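A minimal stand-alone illustration of the control-flow fix (stubbed names, not the kernel code): without the jump, execution falls through and the helper runs with an uninitialized op even though status already holds an error:

```c
#include <assert.h>

#define SC_SUCCESS		0
#define SC_INVALID_FIELD	2

static int mgmt_calls;	/* counts calls into the stubbed zone-mgmt helper */

static int zone_mgmt(int op)
{
	(void)op;
	mgmt_calls++;
	return 0;
}

static int handle_zone_mgmt_send(int zsa)
{
	int status = SC_SUCCESS;
	int op;

	switch (zsa) {
	case 0:
		op = 1;	/* e.g. REQ_OP_ZONE_OPEN */
		break;
	default:
		status = SC_INVALID_FIELD;
		goto out;	/* the missing jump: skip zone_mgmt() */
	}
	zone_mgmt(op);
out:
	return status;
}
```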

> +	}
> +
> +	ret = blkdev_zone_mgmt(nvmet_bdev(req), op, sect, nr_sect, GFP_KERNEL);
> +	if (ret)
> +		status = NVME_SC_INTERNAL;
> +
> +	nvmet_req_complete(req, status);
> +}
> +
> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
> +{
> +	unsigned long bv_cnt = min(req->sg_cnt, BIO_MAX_PAGES);
> +	int op = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
> +	u64 slba = le64_to_cpu(req->cmd->rw.slba);
> +	sector_t sect = nvmet_lba_to_sect(req->ns, slba);
> +	u16 status = NVME_SC_SUCCESS;
> +	int sg_cnt = req->sg_cnt;
> +	struct scatterlist *sg;
> +	size_t mapped_data_len;
> +	struct iov_iter from;
> +	struct bio_vec *bvec;
> +	size_t mapped_cnt;
> +	size_t io_len = 0;
> +	struct bio *bio;
> +	int ret;
> +
> +	if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req)))
> +		return;

No request completion ?

> +
> +	if (!req->sg_cnt) {
> +		nvmet_req_complete(req, 0);
> +		return;
> +	}
> +
> +	bvec = kmalloc_array(bv_cnt, sizeof(*bvec), GFP_KERNEL);
> +	if (!bvec) {
> +		status = NVME_SC_INTERNAL;
> +		goto out;
> +	}
> +
> +	while (sg_cnt) {
> +		mapped_data_len = 0;
> +		for_each_sg(req->sg, sg, req->sg_cnt, mapped_cnt) {
> +			nvmet_file_init_bvec(bvec, sg);
> +			mapped_data_len += bvec[mapped_cnt].bv_len;
> +			sg_cnt--;
> +			if (mapped_cnt == bv_cnt)
> +				break;
> +		}
> +		iov_iter_bvec(&from, WRITE, bvec, mapped_cnt, mapped_data_len);
> +
> +		bio = bio_alloc(GFP_KERNEL, bv_cnt);
> +		bio_set_dev(bio, nvmet_bdev(req));
> +		bio->bi_iter.bi_sector = sect;
> +		bio->bi_opf = op;
> +
> +		ret =  __bio_iov_append_get_pages(bio, &from);
> +		if (unlikely(ret)) {
> +			status = NVME_SC_INTERNAL;
> +			bio_io_error(bio);
> +			kfree(bvec);
> +			goto out;
> +		}
> +
> +		ret = submit_bio_wait(bio);
> +		bio_put(bio);
> +		if (ret < 0) {
> +			status = NVME_SC_INTERNAL;
> +			break;
> +		}
> +
> +		io_len += mapped_data_len;
> +	}

This loop is equivalent to splitting a zone append. That must not be done as
that can lead to totally unpredictable ordering of the chunks. What if another
thread is doing zone append to the same zone at the same time ?

> +
> +	sect += (io_len >> 9);
> +	req->cqe->result.u64 = le64_to_cpu(nvmet_sect_to_lba(req->ns, sect));
> +	kfree(bvec);
> +
> +out:
> +	nvmet_req_complete(req, status);
> +}
> +
> +#else  /* CONFIG_BLK_DEV_ZONED */
> +static void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
> +{
> +}
> +static void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
> +{
> +}
> +u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off)
> +{
> +	return 0;
> +}
> +static bool nvmet_bdev_zns_config(struct nvmet_ns *ns)
> +{
> +	return false;
> +}
> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
> +{
> +}
> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
> +{
> +}
> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
> +{
> +}
> +void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log)
> +{
> +}

These should go in the header file. And put the brackets on the same line.

> +#endif /* CONFIG_BLK_DEV_ZONED */
> 


-- 
Damien Le Moal
Western Digital Research

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 5/9] nvmet: add cns-cs-ctrl in id-ctrl for ZNS bdev
  2020-11-26  2:40   ` Chaitanya Kulkarni
@ 2020-11-26  8:39     ` Damien Le Moal
  -1 siblings, 0 replies; 50+ messages in thread
From: Damien Le Moal @ 2020-11-26  8:39 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-nvme; +Cc: sagi, hch

On 2020/11/26 11:42, Chaitanya Kulkarni wrote:
> Update the nvmet_execute_identify() such that it can now handle
> NVME_ID_CNS_CS_CTRL when identify.cis is set to ZNS. This allows
> host to identify the support for ZNS.
> 
> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
> ---
>  drivers/nvme/target/admin-cmd.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
> index d4fc1bb1a318..e7d2b96cda6b 100644
> --- a/drivers/nvme/target/admin-cmd.c
> +++ b/drivers/nvme/target/admin-cmd.c
> @@ -650,6 +650,10 @@ static void nvmet_execute_identify(struct nvmet_req *req)
>  		return nvmet_execute_identify_ns(req);
>  	case NVME_ID_CNS_CTRL:
>  		return nvmet_execute_identify_ctrl(req);
> +	case NVME_ID_CNS_CS_CTRL:
> +		if (req->cmd->identify.csi == NVME_CSI_ZNS)
> +			return nvmet_execute_identify_cns_cs_ctrl(req);

This function is defined in zns.c, but it could be used for other NS types than
ZNS, no ?

> +		break;
>  	case NVME_ID_CNS_NS_ACTIVE_LIST:
>  		return nvmet_execute_identify_nslist(req);
>  	case NVME_ID_CNS_NS_DESC_LIST:
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 6/9] nvmet: add cns-cs-ns in id-ctrl for ZNS bdev
  2020-11-26  2:40   ` Chaitanya Kulkarni
@ 2020-11-26  8:40     ` Damien Le Moal
  -1 siblings, 0 replies; 50+ messages in thread
From: Damien Le Moal @ 2020-11-26  8:40 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-nvme; +Cc: sagi, hch

On 2020/11/26 11:42, Chaitanya Kulkarni wrote:
> Update the nvmet_execute_identify() such that it can now handle
> NVME_ID_CNS_CS_NS when identify.cis is set to ZNS. This allows
> host to identify the ns with ZNS capabilities.
> 
> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
> ---
>  drivers/nvme/target/admin-cmd.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
> index e7d2b96cda6b..cd368cbe3855 100644
> --- a/drivers/nvme/target/admin-cmd.c
> +++ b/drivers/nvme/target/admin-cmd.c
> @@ -648,6 +648,10 @@ static void nvmet_execute_identify(struct nvmet_req *req)
>  	switch (req->cmd->identify.cns) {
>  	case NVME_ID_CNS_NS:
>  		return nvmet_execute_identify_ns(req);
> +	case NVME_ID_CNS_CS_NS:
> +		if (req->cmd->identify.csi == NVME_CSI_ZNS)
> +			return nvmet_execute_identify_cns_cs_ns(req);
> +		break;
>  	case NVME_ID_CNS_CTRL:
>  		return nvmet_execute_identify_ctrl(req);
>  	case NVME_ID_CNS_CS_CTRL:
> 

Same patch as patch 5 ? Bug ?

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7/9] nvmet: add zns cmd effects to support zbdev
  2020-11-26  2:40   ` Chaitanya Kulkarni
@ 2020-11-26  8:40     ` Damien Le Moal
  -1 siblings, 0 replies; 50+ messages in thread
From: Damien Le Moal @ 2020-11-26  8:40 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-nvme; +Cc: sagi, hch

On 2020/11/26 11:42, Chaitanya Kulkarni wrote:
> Update the target side command effects logs with support for
> ZNS commands for zbdev.
> 
> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
> ---
>  drivers/nvme/target/admin-cmd.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
> index cd368cbe3855..0099275951da 100644
> --- a/drivers/nvme/target/admin-cmd.c
> +++ b/drivers/nvme/target/admin-cmd.c
> @@ -191,6 +191,8 @@ static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req)
>  	log->iocs[nvme_cmd_dsm]			= cpu_to_le32(1 << 0);
>  	log->iocs[nvme_cmd_write_zeroes]	= cpu_to_le32(1 << 0);
>  
> +	nvmet_zns_add_cmd_effects(log);
> +
>  	status = nvmet_copy_to_sgl(req, 0, log, sizeof(*log));
>  
>  	kfree(log);
> 

This could be squashed with patch 5, no ?

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 9/9] nvmet: add ZNS based I/O cmds handlers
  2020-11-26  2:40   ` Chaitanya Kulkarni
@ 2020-11-26  8:45     ` Damien Le Moal
  -1 siblings, 0 replies; 50+ messages in thread
From: Damien Le Moal @ 2020-11-26  8:45 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-nvme; +Cc: sagi, hch

On 2020/11/26 11:42, Chaitanya Kulkarni wrote:
> Add zone-mgmt-send, zone-mgmt-recv and zone-zppend handlers for the

s/zone-zppend/zone-append

> bdev backend so that it can support zbd.

s/zbd/zoned block devices (zbd is not an obvious acronym to all people)

> 
> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
> ---
>  drivers/nvme/target/Makefile      | 3 +--
>  drivers/nvme/target/io-cmd-bdev.c | 9 +++++++++
>  drivers/nvme/target/zns.c         | 6 +++---
>  3 files changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/nvme/target/Makefile b/drivers/nvme/target/Makefile
> index bc147ff2df5d..15307b1cc713 100644
> --- a/drivers/nvme/target/Makefile
> +++ b/drivers/nvme/target/Makefile
> @@ -10,9 +10,8 @@ obj-$(CONFIG_NVME_TARGET_FCLOOP)	+= nvme-fcloop.o
>  obj-$(CONFIG_NVME_TARGET_TCP)		+= nvmet-tcp.o
>  
>  nvmet-y		+= core.o configfs.o admin-cmd.o fabrics-cmd.o \
> -			discovery.o io-cmd-file.o io-cmd-bdev.o
> +		   zns.o discovery.o io-cmd-file.o io-cmd-bdev.o

OK. Now I understand the really not obvious #ifdef in zns.c.
Isn't there a better way to do this ? If the code that must be unconditionally
compiled to check support for ZNS/zoned devices is moved out of zns.c, you
would not need this dance with the Makefile, and that will clean up the code
(read: less of it).


>  nvmet-$(CONFIG_NVME_TARGET_PASSTHRU)	+= passthru.o
> -nvmet-$(CONFIG_BLK_DEV_ZONED)		+= zns.o
>  
>  nvme-loop-y	+= loop.o
>  nvmet-rdma-y	+= rdma.o
> diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
> index f8a500983abd..4fcc8374b857 100644
> --- a/drivers/nvme/target/io-cmd-bdev.c
> +++ b/drivers/nvme/target/io-cmd-bdev.c
> @@ -453,6 +453,15 @@ u16 nvmet_bdev_parse_io_cmd(struct nvmet_req *req)
>  	case nvme_cmd_write_zeroes:
>  		req->execute = nvmet_bdev_execute_write_zeroes;
>  		return 0;
> +	case nvme_cmd_zone_append:
> +		req->execute = nvmet_bdev_execute_zone_append;
> +		return 0;
> +	case nvme_cmd_zone_mgmt_recv:
> +		req->execute = nvmet_bdev_execute_zone_mgmt_recv;
> +		return 0;
> +	case nvme_cmd_zone_mgmt_send:
> +		req->execute = nvmet_bdev_execute_zone_mgmt_send;
> +		return 0;
>  	default:
>  		pr_err("unhandled cmd %d on qid %d\n", cmd->common.opcode,
>  		       req->sq->qid);
> diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
> index 8ea6641a55e3..efd11d7a6f96 100644
> --- a/drivers/nvme/target/zns.c
> +++ b/drivers/nvme/target/zns.c
> @@ -361,17 +361,17 @@ void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
>  }
>  
>  #else  /* CONFIG_BLK_DEV_ZONED */
> -static void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
> +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
>  {
>  }
> -static void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
>  {
>  }
>  u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off)
>  {
>  	return 0;
>  }
> -static bool nvmet_bdev_zns_config(struct nvmet_ns *ns)
> +bool nvmet_bdev_zns_config(struct nvmet_ns *ns)
>  {
>  	return false;
>  }
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/9] nvmet: add ZNS support for bdev-ns
  2020-11-26  2:40   ` Chaitanya Kulkarni
@ 2020-11-26  9:06     ` Damien Le Moal
  -1 siblings, 0 replies; 50+ messages in thread
From: Damien Le Moal @ 2020-11-26  9:06 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-nvme; +Cc: sagi, hch

On 2020/11/26 11:42, Chaitanya Kulkarni wrote:
> Add zns-bdev-config, id-ctrl, id-ns, zns-cmd-effects, zone-mgmt-send,
> zone-mgmt-recv and zone-append handlers for NVMeOF target to enable ZNS
> support for bdev.
> 
> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
> ---
>  drivers/nvme/target/Makefile      |   2 +
>  drivers/nvme/target/admin-cmd.c   |   4 +-
>  drivers/nvme/target/io-cmd-file.c |   2 +-
>  drivers/nvme/target/nvmet.h       |  18 ++
>  drivers/nvme/target/zns.c         | 390 ++++++++++++++++++++++++++++++
>  5 files changed, 413 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/nvme/target/zns.c
> 
> diff --git a/drivers/nvme/target/Makefile b/drivers/nvme/target/Makefile
> index ebf91fc4c72e..bc147ff2df5d 100644
> --- a/drivers/nvme/target/Makefile
> +++ b/drivers/nvme/target/Makefile
> @@ -12,6 +12,8 @@ obj-$(CONFIG_NVME_TARGET_TCP)		+= nvmet-tcp.o
>  nvmet-y		+= core.o configfs.o admin-cmd.o fabrics-cmd.o \
>  			discovery.o io-cmd-file.o io-cmd-bdev.o
>  nvmet-$(CONFIG_NVME_TARGET_PASSTHRU)	+= passthru.o
> +nvmet-$(CONFIG_BLK_DEV_ZONED)		+= zns.o
> +
>  nvme-loop-y	+= loop.o
>  nvmet-rdma-y	+= rdma.o
>  nvmet-fc-y	+= fc.o
> diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
> index dca34489a1dc..509fd8dcca0c 100644
> --- a/drivers/nvme/target/admin-cmd.c
> +++ b/drivers/nvme/target/admin-cmd.c
> @@ -579,8 +579,8 @@ static void nvmet_execute_identify_nslist(struct nvmet_req *req)
>  	nvmet_req_complete(req, status);
>  }
>  
> -static u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
> -				    void *id, off_t *off)
> +u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
> +			     void *id, off_t *off)
>  {
>  	struct nvme_ns_id_desc desc = {
>  		.nidt = type,
> diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c
> index 0abbefd9925e..2bd10960fa50 100644
> --- a/drivers/nvme/target/io-cmd-file.c
> +++ b/drivers/nvme/target/io-cmd-file.c
> @@ -89,7 +89,7 @@ int nvmet_file_ns_enable(struct nvmet_ns *ns)
>  	return ret;
>  }
>  
> -static void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg)
> +void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg)
>  {
>  	bv->bv_page = sg_page(sg);
>  	bv->bv_offset = sg->offset;
> diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
> index 592763732065..0542ba672a31 100644
> --- a/drivers/nvme/target/nvmet.h
> +++ b/drivers/nvme/target/nvmet.h
> @@ -81,6 +81,9 @@ struct nvmet_ns {
>  	struct pci_dev		*p2p_dev;
>  	int			pi_type;
>  	int			metadata_size;
> +#ifdef CONFIG_BLK_DEV_ZONED
> +	struct nvme_id_ns_zns	id_zns;
> +#endif
>  };
>  
>  static inline struct nvmet_ns *to_nvmet_ns(struct config_item *item)
> @@ -251,6 +254,10 @@ struct nvmet_subsys {
>  	unsigned int		admin_timeout;
>  	unsigned int		io_timeout;
>  #endif /* CONFIG_NVME_TARGET_PASSTHRU */
> +
> +#ifdef CONFIG_BLK_DEV_ZONED
> +	struct nvme_id_ctrl_zns	id_ctrl_zns;
> +#endif
>  };
>  
>  static inline struct nvmet_subsys *to_subsys(struct config_item *item)
> @@ -603,4 +610,15 @@ static inline bool nvmet_ns_has_pi(struct nvmet_ns *ns)
>  	return ns->pi_type && ns->metadata_size == sizeof(struct t10_pi_tuple);
>  }
>  
> +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req);
> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req);
> +u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off);
> +bool nvmet_bdev_zns_config(struct nvmet_ns *ns);
> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req);
> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req);
> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req);
> +void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log);
> +u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
> +			     void *id, off_t *off);
> +void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg);
>  #endif /* _NVMET_H */
> diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
> new file mode 100644
> index 000000000000..8ea6641a55e3
> --- /dev/null
> +++ b/drivers/nvme/target/zns.c
> @@ -0,0 +1,390 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * NVMe ZNS-ZBD command implementation.
> + * Copyright (c) 2020-2021 HGST, a Western Digital Company.
> + */
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +#include <linux/uio.h>
> +#include <linux/nvme.h>
> +#include <linux/blkdev.h>
> +#include <linux/module.h>
> +#include "nvmet.h"
> +
> +#ifdef CONFIG_BLK_DEV_ZONED
> +
> +static u16 nvmet_bdev_zns_checks(struct nvmet_req *req)
> +{
> +	u16 status = 0;
> +
> +	if (!bdev_is_zoned(req->ns->bdev)) {
> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
> +		goto out;
> +	}
> +
> +	if (req->cmd->zmr.zra != NVME_ZRA_ZONE_REPORT) {
> +		status = NVME_SC_INVALID_FIELD;
> +		goto out;
> +	}
> +
> +	if (req->cmd->zmr.zrasf != NVME_ZRASF_ZONE_REPORT_ALL) {
> +		status = NVME_SC_INVALID_FIELD;
> +		goto out;
> +	}
> +
> +	if (req->cmd->zmr.pr != NVME_REPORT_ZONE_PARTIAL)
> +		status = NVME_SC_INVALID_FIELD;
> +out:
> +	return status;
> +}
> +
> +static struct block_device *nvmet_bdev(struct nvmet_req *req)
> +{
> +	return req->ns->bdev;
> +}
> +
> +static u64 nvmet_zones_to_descsize(unsigned int nr_zones)
> +{
> +	return sizeof(struct nvme_zone_report) +
> +		(sizeof(struct nvme_zone_descriptor) * nr_zones);
> +}
> +
> +static inline u64 nvmet_sect_to_lba(struct nvmet_ns *ns, sector_t sect)
> +{
> +	return sect >> (ns->blksize_shift - SECTOR_SHIFT);
> +}
> +
> +static inline sector_t nvmet_lba_to_sect(struct nvmet_ns *ns, __le64 lba)
> +{
> +	return le64_to_cpu(lba) << (ns->blksize_shift - SECTOR_SHIFT);
> +}
> +
> +/*
> + * ZNS related command implementation and helpers.
> + */
> +
> +u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off)
> +{
> +	u16 nvme_cis_zns = NVME_CSI_ZNS;
> +
> +	if (bdev_is_zoned(nvmet_bdev(req))) {
> +		return nvmet_copy_ns_identifier(req, NVME_NIDT_CSI,
> +						 NVME_NIDT_CSI_LEN,
> +						 &nvme_cis_zns, off);
> +	}
> +
> +	return NVME_SC_SUCCESS;
> +}
> +
> +void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log)
> +{
> +	log->iocs[nvme_cmd_zone_append]		= cpu_to_le32(1 << 0);
> +	log->iocs[nvme_cmd_zone_mgmt_send]	= cpu_to_le32(1 << 0);
> +	log->iocs[nvme_cmd_zone_mgmt_recv]	= cpu_to_le32(1 << 0);
> +}
> +
> +bool nvmet_bdev_zns_config(struct nvmet_ns *ns)
> +{
> +	if (ns->bdev->bd_disk->queue->conv_zones_bitmap) {
> +		pr_err("block device with conventional zones not supported.");
> +		return false;
> +	}
> +	/*
> +	 * SMR drives will result in an error if writes are not aligned to
> +	 * the physical block size, so just override the block size here.
> +	 */
> +	ns->blksize_shift = blksize_bits(bdev_physical_block_size(ns->bdev));
> +	return true;
> +}
> +
> +static int nvmet_bdev_report_zone_cb(struct blk_zone *zone, unsigned int idx,
> +				     void *data)
> +{
> +	struct blk_zone *zones = data;
> +
> +	memcpy(&zones[idx], zone, sizeof(struct blk_zone));
> +
> +	return 0;
> +}
> +
> +static void nvmet_get_zone_desc(struct nvmet_ns *ns, struct blk_zone *z,
> +				struct nvme_zone_descriptor *rz)
> +{
> +	rz->zcap = cpu_to_le64(nvmet_sect_to_lba(ns, z->capacity));
> +	rz->zslba = cpu_to_le64(nvmet_sect_to_lba(ns, z->start));
> +	rz->wp = cpu_to_le64(nvmet_sect_to_lba(ns, z->wp));
> +	rz->za = z->reset ? 1 << 2 : 0;
> +	rz->zt = z->type;
> +	rz->zs = z->cond << 4;
> +}
> +
> +/*
> + * ZNS related Admin and I/O command handlers.
> + */
> +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
> +{
> +	struct nvme_id_ctrl_zns *id;
> +	u16 status = 0;
> +
> +	id = kzalloc(sizeof(*id), GFP_KERNEL);
> +	if (!id) {
> +		status = NVME_SC_INTERNAL;
> +		goto out;
> +	}
> +
> +	/*
> +	 * Even though this function sets Zone Append Size Limit to 0,
> +	 * the 0 value here indicates that the maximum data transfer size for
> +	 * the Zone Append command is indicated by the ctrl
> +	 * Maximum Data Transfer Size (MDTS).
> +	 */
> +	id->zasl = 0;
> +
> +	status = nvmet_copy_to_sgl(req, 0, id, sizeof(*id));
> +
> +	kfree(id);
> +out:
> +	nvmet_req_complete(req, status);
> +}
> +
> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
> +{
> +	struct nvme_id_ns_zns *id_zns;
> +	u16 status = 0;
> +	u64 zsze;
> +
> +	if (le32_to_cpu(req->cmd->identify.nsid) == NVME_NSID_ALL) {
> +		req->error_loc = offsetof(struct nvme_identify, nsid);
> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
> +		goto out;
> +	}
> +
> +	id_zns = kzalloc(sizeof(*id_zns), GFP_KERNEL);
> +	if (!id_zns) {
> +		status = NVME_SC_INTERNAL;
> +		goto out;
> +	}
> +
> +	req->ns = nvmet_find_namespace(req->sq->ctrl, req->cmd->identify.nsid);
> +	if (!req->ns) {
> +		status = NVME_SC_INTERNAL;
> +		goto done;
> +	}
> +
> +	if (!bdev_is_zoned(nvmet_bdev(req))) {
> +		req->error_loc = offsetof(struct nvme_identify, nsid);
> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
> +		goto done;
> +	}
> +
> +	nvmet_ns_revalidate(req->ns);
> +	zsze = (bdev_zone_sectors(nvmet_bdev(req)) << 9) >>
> +					req->ns->blksize_shift;
> +	id_zns->lbafe[0].zsze = cpu_to_le64(zsze);
> +	id_zns->mor = cpu_to_le32(bdev_max_open_zones(nvmet_bdev(req)));
> +	id_zns->mar = cpu_to_le32(bdev_max_active_zones(nvmet_bdev(req)));
> +
> +done:
> +	status = nvmet_copy_to_sgl(req, 0, id_zns, sizeof(*id_zns));
> +	kfree(id_zns);
> +out:
> +	nvmet_req_complete(req, status);
> +}
> +
> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
> +{
> +	struct request_queue *q = nvmet_bdev(req)->bd_disk->queue;
> +	struct nvme_zone_mgmt_recv_cmd *zmr = &req->cmd->zmr;
> +	unsigned int nz = blk_queue_nr_zones(q);
> +	u64 bufsize = (zmr->numd << 2) + 1;
> +	struct nvme_zone_report *rz;
> +	struct blk_zone *zones;
> +	int reported_zones;
> +	sector_t sect;
> +	u64 desc_size;
> +	u16 status;
> +	int i;
> +
> +	desc_size = nvmet_zones_to_descsize(blk_queue_nr_zones(q));
> +	status = nvmet_bdev_zns_checks(req);
> +	if (status)
> +		goto out;
> +
> +	zones = kvcalloc(blkdev_nr_zones(nvmet_bdev(req)->bd_disk),
> +			      sizeof(struct blk_zone), GFP_KERNEL);
> +	if (!zones) {
> +		status = NVME_SC_INTERNAL;
> +		goto out;
> +	}
> +
> +	rz = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY);
> +	if (!rz) {
> +		status = NVME_SC_INTERNAL;
> +		goto out_free_zones;
> +	}
> +
> +	sect = nvmet_lba_to_sect(req->ns, le64_to_cpu(req->cmd->zmr.slba));
> +
> +	for (nz = blk_queue_nr_zones(q); desc_size >= bufsize; nz--)
> +		desc_size = nvmet_zones_to_descsize(nz);

desc_size is not actually used anywhere. So what is the purpose of this?
If it is only there to determine nz, the number of zones that can be
reported, surely you can calculate that directly instead of using this loop.
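For what it is worth, the direct calculation could look something like the userspace sketch below. All names and the 64-byte placeholder sizes are hypothetical; the real kernel code would divide the remaining buffer space by sizeof(struct nvme_zone_descriptor), e.g. with div_u64().

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder sizes standing in for sizeof(struct nvme_zone_report)
 * and sizeof(struct nvme_zone_descriptor) in this sketch.
 */
#define ZONE_REPORT_HDR_SIZE 64ULL
#define ZONE_DESC_SIZE       64ULL

/* How many zone descriptors fit in a host buffer of bufsize bytes,
 * capped at the device's zone count: one division instead of a
 * decrement-until-it-fits loop.
 */
static uint64_t max_reportable_zones(uint64_t bufsize, uint64_t nr_zones)
{
	if (bufsize <= ZONE_REPORT_HDR_SIZE)
		return 0;

	uint64_t fit = (bufsize - ZONE_REPORT_HDR_SIZE) / ZONE_DESC_SIZE;

	return fit < nr_zones ? fit : nr_zones;
}
```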

> +
> +	reported_zones = blkdev_report_zones(nvmet_bdev(req), sect, nz,
> +					     nvmet_bdev_report_zone_cb,
> +					     zones);
> +	if (reported_zones < 0) {
> +		status = NVME_SC_INTERNAL;
> +		goto out_free_report_zones;
> +	}
> +
> +	rz->nr_zones = cpu_to_le64(reported_zones);
> +	for (i = 0; i < reported_zones; i++)
> +		nvmet_get_zone_desc(req->ns, &zones[i], &rz->entries[i]);
> +
> +	status = nvmet_copy_to_sgl(req, 0, rz, bufsize);
> +
> +out_free_report_zones:
> +	kvfree(rz);
> +out_free_zones:
> +	kvfree(zones);
> +out:
> +	nvmet_req_complete(req, status);
> +}
> +
> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
> +{
> +	sector_t nr_sect = bdev_zone_sectors(nvmet_bdev(req));
> +	struct nvme_zone_mgmt_send_cmd *c = &req->cmd->zms;
> +	u16 status = NVME_SC_SUCCESS;
> +	enum req_opf op;
> +	sector_t sect;
> +	int ret;
> +
> +	sect = nvmet_lba_to_sect(req->ns, le64_to_cpu(req->cmd->zms.slba));
> +
> +	switch (c->zsa) {
> +	case NVME_ZONE_OPEN:
> +		op = REQ_OP_ZONE_OPEN;
> +		break;
> +	case NVME_ZONE_CLOSE:
> +		op = REQ_OP_ZONE_CLOSE;
> +		break;
> +	case NVME_ZONE_FINISH:
> +		op = REQ_OP_ZONE_FINISH;
> +		break;
> +	case NVME_ZONE_RESET:
> +		if (c->select_all)
> +			nr_sect = get_capacity(nvmet_bdev(req)->bd_disk);
> +		op = REQ_OP_ZONE_RESET;
> +		break;
> +	default:
> +		status = NVME_SC_INVALID_FIELD;
> +		break;
> +	}
> +
> +	ret = blkdev_zone_mgmt(nvmet_bdev(req), op, sect, nr_sect, GFP_KERNEL);
> +	if (ret)
> +		status = NVME_SC_INTERNAL;
> +
> +	nvmet_req_complete(req, status);
> +}
> +
> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
> +{
> +	unsigned long bv_cnt = min(req->sg_cnt, BIO_MAX_PAGES);
> +	int op = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
> +	u64 slba = le64_to_cpu(req->cmd->rw.slba);
> +	sector_t sect = nvmet_lba_to_sect(req->ns, slba);
> +	u16 status = NVME_SC_SUCCESS;
> +	int sg_cnt = req->sg_cnt;
> +	struct scatterlist *sg;
> +	size_t mapped_data_len;
> +	struct iov_iter from;
> +	struct bio_vec *bvec;
> +	size_t mapped_cnt;
> +	size_t io_len = 0;
> +	struct bio *bio;
> +	int ret;
> +
> +	if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req)))
> +		return;
> +
> +	if (!req->sg_cnt) {
> +		nvmet_req_complete(req, 0);
> +		return;
> +	}
> +
> +	bvec = kmalloc_array(bv_cnt, sizeof(*bvec), GFP_KERNEL);
> +	if (!bvec) {
> +		status = NVME_SC_INTERNAL;
> +		goto out;
> +	}
> +
> +	while (sg_cnt) {
> +		mapped_data_len = 0;
> +		for_each_sg(req->sg, sg, req->sg_cnt, mapped_cnt) {
> +			nvmet_file_init_bvec(bvec, sg);
> +			mapped_data_len += bvec[mapped_cnt].bv_len;
> +			sg_cnt--;
> +			if (mapped_cnt == bv_cnt)
> +				break;
> +		}
> +		iov_iter_bvec(&from, WRITE, bvec, mapped_cnt, mapped_data_len);
> +
> +		bio = bio_alloc(GFP_KERNEL, bv_cnt);
> +		bio_set_dev(bio, nvmet_bdev(req));
> +		bio->bi_iter.bi_sector = sect;
> +		bio->bi_opf = op;
> +
> +		ret =  __bio_iov_append_get_pages(bio, &from);
> +		if (unlikely(ret)) {
> +			status = NVME_SC_INTERNAL;
> +			bio_io_error(bio);
> +			kfree(bvec);
> +			goto out;
> +		}
> +
> +		ret = submit_bio_wait(bio);
> +		bio_put(bio);
> +		if (ret < 0) {
> +			status = NVME_SC_INTERNAL;
> +			break;
> +		}
> +
> +		io_len += mapped_data_len;
> +	}
> +
> +	sect += (io_len >> 9);
> +	req->cqe->result.u64 = le64_to_cpu(nvmet_sect_to_lba(req->ns, sect));
> +	kfree(bvec);
> +
> +out:
> +	nvmet_req_complete(req, status);
> +}
> +
> +#else  /* CONFIG_BLK_DEV_ZONED */
> +static void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
> +{
> +}
> +static void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
> +{
> +}
> +u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off)
> +{
> +	return 0;
> +}
> +static bool nvmet_bdev_zns_config(struct nvmet_ns *ns)
> +{
> +	return false;
> +}
> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
> +{
> +}
> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
> +{
> +}
> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
> +{
> +}
> +void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log)
> +{
> +}
> +#endif /* CONFIG_BLK_DEV_ZONED */
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/9] nvmet: add ZNS support for bdev-ns
  2020-11-26  8:36     ` Damien Le Moal
@ 2020-11-28  0:09       ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 50+ messages in thread
From: Chaitanya Kulkarni @ 2020-11-28  0:09 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, linux-nvme; +Cc: sagi, hch

On 11/26/20 00:36, Damien Le Moal wrote:
> On 2020/11/26 11:42, Chaitanya Kulkarni wrote:
>> Add zns-bdev-config, id-ctrl, id-ns, zns-cmd-effects, zone-mgmt-send,
>> zone-mgmt-recv and zone-append handlers for NVMeOF target to enable ZNS
>> support for bdev.
>>
>> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
>> ---
>>  drivers/nvme/target/Makefile      |   2 +
>>  drivers/nvme/target/admin-cmd.c   |   4 +-
>>  drivers/nvme/target/io-cmd-file.c |   2 +-
>>  drivers/nvme/target/nvmet.h       |  18 ++
>>  drivers/nvme/target/zns.c         | 390 ++++++++++++++++++++++++++++++
>>  5 files changed, 413 insertions(+), 3 deletions(-)
>>  create mode 100644 drivers/nvme/target/zns.c
>>
>> diff --git a/drivers/nvme/target/Makefile b/drivers/nvme/target/Makefile
>> index ebf91fc4c72e..bc147ff2df5d 100644
>> --- a/drivers/nvme/target/Makefile
>> +++ b/drivers/nvme/target/Makefile
>> @@ -12,6 +12,8 @@ obj-$(CONFIG_NVME_TARGET_TCP)		+= nvmet-tcp.o
>>  nvmet-y		+= core.o configfs.o admin-cmd.o fabrics-cmd.o \
>>  			discovery.o io-cmd-file.o io-cmd-bdev.o
>>  nvmet-$(CONFIG_NVME_TARGET_PASSTHRU)	+= passthru.o
>> +nvmet-$(CONFIG_BLK_DEV_ZONED)		+= zns.o
>> +
>>  nvme-loop-y	+= loop.o
>>  nvmet-rdma-y	+= rdma.o
>>  nvmet-fc-y	+= fc.o
>> diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
>> index dca34489a1dc..509fd8dcca0c 100644
>> --- a/drivers/nvme/target/admin-cmd.c
>> +++ b/drivers/nvme/target/admin-cmd.c
>> @@ -579,8 +579,8 @@ static void nvmet_execute_identify_nslist(struct nvmet_req *req)
>>  	nvmet_req_complete(req, status);
>>  }
>>  
>> -static u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
>> -				    void *id, off_t *off)
>> +u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
>> +			     void *id, off_t *off)
>>  {
>>  	struct nvme_ns_id_desc desc = {
>>  		.nidt = type,
>> diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c
>> index 0abbefd9925e..2bd10960fa50 100644
>> --- a/drivers/nvme/target/io-cmd-file.c
>> +++ b/drivers/nvme/target/io-cmd-file.c
>> @@ -89,7 +89,7 @@ int nvmet_file_ns_enable(struct nvmet_ns *ns)
>>  	return ret;
>>  }
>>  
>> -static void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg)
>> +void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg)
>>  {
>>  	bv->bv_page = sg_page(sg);
>>  	bv->bv_offset = sg->offset;
>> diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
>> index 592763732065..0542ba672a31 100644
>> --- a/drivers/nvme/target/nvmet.h
>> +++ b/drivers/nvme/target/nvmet.h
>> @@ -81,6 +81,9 @@ struct nvmet_ns {
>>  	struct pci_dev		*p2p_dev;
>>  	int			pi_type;
>>  	int			metadata_size;
>> +#ifdef CONFIG_BLK_DEV_ZONED
>> +	struct nvme_id_ns_zns	id_zns;
>> +#endif
>>  };
>>  
>>  static inline struct nvmet_ns *to_nvmet_ns(struct config_item *item)
>> @@ -251,6 +254,10 @@ struct nvmet_subsys {
>>  	unsigned int		admin_timeout;
>>  	unsigned int		io_timeout;
>>  #endif /* CONFIG_NVME_TARGET_PASSTHRU */
>> +
>> +#ifdef CONFIG_BLK_DEV_ZONED
>> +	struct nvme_id_ctrl_zns	id_ctrl_zns;
>> +#endif
>>  };
>>  
>>  static inline struct nvmet_subsys *to_subsys(struct config_item *item)
>> @@ -603,4 +610,15 @@ static inline bool nvmet_ns_has_pi(struct nvmet_ns *ns)
>>  	return ns->pi_type && ns->metadata_size == sizeof(struct t10_pi_tuple);
>>  }
>>  
>> +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req);
>> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req);
>> +u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off);
>> +bool nvmet_bdev_zns_config(struct nvmet_ns *ns);
>> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req);
>> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req);
>> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req);
>> +void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log);
>> +u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
>> +			     void *id, off_t *off);
>> +void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg);
>>  #endif /* _NVMET_H */
>> diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
>> new file mode 100644
>> index 000000000000..8ea6641a55e3
>> --- /dev/null
>> +++ b/drivers/nvme/target/zns.c
>> @@ -0,0 +1,390 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * NVMe ZNS-ZBD command implementation.
>> + * Copyright (c) 2020-2021 HGST, a Western Digital Company.
>> + */
>> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>> +#include <linux/uio.h>
>> +#include <linux/nvme.h>
>> +#include <linux/blkdev.h>
>> +#include <linux/module.h>
>> +#include "nvmet.h"
>> +
>> +#ifdef CONFIG_BLK_DEV_ZONED
> This file is compiled only if CONFIG_BLK_DEV_ZONED is defined, so what is the
> point of this ? The stubs for the !CONFIG_BLK_DEV_ZONED case should go into the
> header file, no ?

Actually the conditional compilation of zns.c with CONFIG_BLK_DEV_ZONED
needs to be removed in the Makefile. I'm against putting these empty
stubs in the header file when CONFIG_BLK_DEV_ZONED is not true, as there
are several files (transport, discovery, file/passthru backend, etc.) in
nvme/target/*.c which would pull in empty stubs that have nothing to do
with the zoned bdev backend.

i.e. for Makefile it should be :-

diff --git a/drivers/nvme/target/Makefile b/drivers/nvme/target/Makefile
index ebf91fc4c72e..c67276a25363 100644
--- a/drivers/nvme/target/Makefile
+++ b/drivers/nvme/target/Makefile
@@ -10,7 +10,7 @@ obj-$(CONFIG_NVME_TARGET_FCLOOP)      += nvme-fcloop.o
 obj-$(CONFIG_NVME_TARGET_TCP)          += nvmet-tcp.o
 
 nvmet-y                += core.o configfs.o admin-cmd.o fabrics-cmd.o \
-                       discovery.o io-cmd-file.o io-cmd-bdev.o
+                       zns.o discovery.o io-cmd-file.o io-cmd-bdev.o
 nvmet-$(CONFIG_NVME_TARGET_PASSTHRU)   += passthru.o
 nvme-loop-y    += loop.o
 nvmet-rdma-y   += rdma.o

>> +
>> +static u16 nvmet_bdev_zns_checks(struct nvmet_req *req)
>> +{
>> +	u16 status = 0;
>> +
>> +	if (!bdev_is_zoned(req->ns->bdev)) {
>> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
>> +		goto out;
> Why not return status directly here ? Same for the other cases below.
I prefer centralized returns with goto, which follows similar code elsewhere.
>> +	}
>> +
>> +	if (req->cmd->zmr.zra != NVME_ZRA_ZONE_REPORT) {
>> +		status = NVME_SC_INVALID_FIELD;
>> +		goto out;
>> +	}
>> +
>> +	if (req->cmd->zmr.zrasf != NVME_ZRASF_ZONE_REPORT_ALL) {
>> +		status = NVME_SC_INVALID_FIELD;
>> +		goto out;
>> +	}
>> +
>> +	if (req->cmd->zmr.pr != NVME_REPORT_ZONE_PARTIAL)
>> +		status = NVME_SC_INVALID_FIELD;
>> +out:
>> +	return status;
>> +}
>> +
>> +static struct block_device *nvmet_bdev(struct nvmet_req *req)
>> +{
>> +	return req->ns->bdev;
>> +}
>> +
>> +static u64 nvmet_zones_to_descsize(unsigned int nr_zones)
>> +{
>> +	return sizeof(struct nvme_zone_report) +
>> +		(sizeof(struct nvme_zone_descriptor) * nr_zones);
>> +}
> These could be declared as inline.
>
Okay.
>> +
>> +static inline u64 nvmet_sect_to_lba(struct nvmet_ns *ns, sector_t sect)
>> +{
>> +	return sect >> (ns->blksize_shift - SECTOR_SHIFT);
>> +}
>> +
>> +static inline sector_t nvmet_lba_to_sect(struct nvmet_ns *ns, __le64 lba)
>> +{
>> +	return le64_to_cpu(lba) << (ns->blksize_shift - SECTOR_SHIFT);
>> +}
>> +
>> +/*
>> + *  ZNS related command implementation and helprs.
> s/helprs/helpers
Okay.
>
>> + */
>> +
>> +u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off)
>> +{
>> +	u16 nvme_cis_zns = NVME_CSI_ZNS;
>> +
>> +	if (bdev_is_zoned(nvmet_bdev(req))) {
>> +		return nvmet_copy_ns_identifier(req, NVME_NIDT_CSI,
>> +						 NVME_NIDT_CSI_LEN,
>> +						 &nvme_cis_zns, off);
>> +	}
> No need for the curly brackets.
Okay.
>> +
>> +	return NVME_SC_SUCCESS;
>> +}
>> +
>> +void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log)
>> +{
>> +	log->iocs[nvme_cmd_zone_append]		= cpu_to_le32(1 << 0);
>> +	log->iocs[nvme_cmd_zone_mgmt_send]	= cpu_to_le32(1 << 0);
>> +	log->iocs[nvme_cmd_zone_mgmt_recv]	= cpu_to_le32(1 << 0);
>> +}
>> +
>> +bool nvmet_bdev_zns_config(struct nvmet_ns *ns)
>> +{
>> +	if (ns->bdev->bd_disk->queue->conv_zones_bitmap) {
>> +		pr_err("block device with conventional zones not supported.");
> pr_err("block devices with conventional zones are not supported.");
>
> With SMR drives, the last zone of the disk can be smaller than the other zones.
> That needs to be checked too as that is not allowed by ZNS. Drives with a last
> smaller runt zone cannot be allowed.
Okay.
>> +		return false;
>> +	}
>> +	/*
>> +	 * SMR drives will results in error if writes are not aligned to the
>> +	 * physical block size just override.
>> +	 */
> 	/*
> 	 * For ZBC and ZAC devices, writes into sequential zones must be aligned
> 	 * to the device physical block size. So use this value as the logical
> 	 * block size to avoid errors.
> 	 */
Okay.
>> +	ns->blksize_shift = blksize_bits(bdev_physical_block_size(ns->bdev));
>> +	return true;
>> +}
>> +
>> +static int nvmet_bdev_report_zone_cb(struct blk_zone *zone, unsigned int idx,
>> +				     void *data)
>> +{
>> +	struct blk_zone *zones = data;
>> +
>> +	memcpy(&zones[idx], zone, sizeof(struct blk_zone));
>> +
>> +	return 0;
>> +}
>> +
>> +static void nvmet_get_zone_desc(struct nvmet_ns *ns, struct blk_zone *z,
>> +				struct nvme_zone_descriptor *rz)
>> +{
>> +	rz->zcap = cpu_to_le64(nvmet_sect_to_lba(ns, z->capacity));
>> +	rz->zslba = cpu_to_le64(nvmet_sect_to_lba(ns, z->start));
>> +	rz->wp = cpu_to_le64(nvmet_sect_to_lba(ns, z->wp));
>> +	rz->za = z->reset ? 1 << 2 : 0;
>> +	rz->zt = z->type;
>> +	rz->zs = z->cond << 4;
>> +}
>> +
>> +/*
>> + * ZNS related Admin and I/O command handlers.
>> + */
>> +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
>> +{
>> +	struct nvme_id_ctrl_zns *id;
>> +	u16 status = 0;
>> +
>> +	id = kzalloc(sizeof(*id), GFP_KERNEL);
>> +	if (!id) {
>> +		status = NVME_SC_INTERNAL;
>> +		goto out;
>> +	}
>> +
>> +	/*
>> +	 * Even though this function sets Zone Append Size Limit to 0,
>> +	 * the 0 value here indicates that the maximum data transfer size for
>> +	 * the Zone Append command is indicated by the ctrl
>> +	 * Maximum Data Transfer Size (MDTS).
> But the target drive may have different values for max zone append sectors and
> max_hw_sectors/max_sectors. So I think this needs finer handling.
I think we can get away with
min(mdts, min(max_zone_append_sectors, max_hw_sectors)); let me see.

>> +	 */
>> +	id->zasl = 0;
>> +
>> +	status = nvmet_copy_to_sgl(req, 0, id, sizeof(*id));
>> +
>> +	kfree(id);
>> +out:
>> +	nvmet_req_complete(req, status);
>> +}
>> +
>> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
>> +{
>> +	struct nvme_id_ns_zns *id_zns;
>> +	u16 status = 0;
>> +	u64 zsze;
>> +
>> +	if (le32_to_cpu(req->cmd->identify.nsid) == NVME_NSID_ALL) {
>> +		req->error_loc = offsetof(struct nvme_identify, nsid);
>> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
>> +		goto out;
>> +	}
>> +
>> +	id_zns = kzalloc(sizeof(*id_zns), GFP_KERNEL);
>> +	if (!id_zns) {
>> +		status = NVME_SC_INTERNAL;
>> +		goto out;
>> +	}
>> +
>> +	req->ns = nvmet_find_namespace(req->sq->ctrl, req->cmd->identify.nsid);
>> +	if (!req->ns) {
>> +		status = NVME_SC_INTERNAL;
>> +		goto done;
> That will result in nvmet_copy_to_sgl() being executed. Is that OK ?
> Shouldn't you do only the kfree(id_zns) and complete with an error here ?
A call to nvmet_copy_to_sgl() zeroes out the values, if any, when we
return the buffer in case of error. I don't see any problem with zeroing
out the buffer in case of error. Can you please explain why we shouldn't
do that?

>> +	}
>> +
>> +	if (!bdev_is_zoned(nvmet_bdev(req))) {
>> +		req->error_loc = offsetof(struct nvme_identify, nsid);
>> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
>> +		goto done;
> Same comment.
See above reply.
>> +	}
>> +
>> +	nvmet_ns_revalidate(req->ns);
>> +	zsze = (bdev_zone_sectors(nvmet_bdev(req)) << 9) >>
>> +					req->ns->blksize_shift;
>> +	id_zns->lbafe[0].zsze = cpu_to_le64(zsze);
>> +	id_zns->mor = cpu_to_le32(bdev_max_open_zones(nvmet_bdev(req)));
>> +	id_zns->mar = cpu_to_le32(bdev_max_active_zones(nvmet_bdev(req)));
>> +
>> +done:
>> +	status = nvmet_copy_to_sgl(req, 0, id_zns, sizeof(*id_zns));
>> +	kfree(id_zns);
>> +out:
>> +	nvmet_req_complete(req, status);
>> +}
>> +
>> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
>> +{
>> +	struct request_queue *q = nvmet_bdev(req)->bd_disk->queue;
>> +	struct nvme_zone_mgmt_recv_cmd *zmr = &req->cmd->zmr;
>> +	unsigned int nz = blk_queue_nr_zones(q);
>> +	u64 bufsize = (zmr->numd << 2) + 1;
>> +	struct nvme_zone_report *rz;
>> +	struct blk_zone *zones;
>> +	int reported_zones;
>> +	sector_t sect;
>> +	u64 desc_size;
>> +	u16 status;
>> +	int i;
>> +
>> +	desc_size = nvmet_zones_to_descsize(blk_queue_nr_zones(q));
>> +	status = nvmet_bdev_zns_checks(req);
>> +	if (status)
>> +		goto out;
>> +
>> +	zones = kvcalloc(blkdev_nr_zones(nvmet_bdev(req)->bd_disk),
>> +			      sizeof(struct blk_zone), GFP_KERNEL);
> This is not super nice: a large disk will have an enormous number of zones
> (75000+ for largest SMR HDD today). But you actually do not need more zones
> descs than what fits in req buffer.
The call to nvmet_copy_to_sgl() will nicely fail and return an error.
>> +	if (!zones) {
>> +		status = NVME_SC_INTERNAL;
>> +		goto out;
>> +	}
>> +
>> +	rz = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY);
>> +	if (!rz) {
>> +		status = NVME_SC_INTERNAL;
>> +		goto out_free_zones;
>> +	}
>> +
>> +	sect = nvmet_lba_to_sect(req->ns, le64_to_cpu(req->cmd->zmr.slba));
>> +
>> +	for (nz = blk_queue_nr_zones(q); desc_size >= bufsize; nz--)
>> +		desc_size = nvmet_zones_to_descsize(nz);
>> +
>> +	reported_zones = blkdev_report_zones(nvmet_bdev(req), sect, nz,
>> +					     nvmet_bdev_report_zone_cb,
>> +					     zones);
>> +	if (reported_zones < 0) {
>> +		status = NVME_SC_INTERNAL;
>> +		goto out_free_report_zones;
>> +	}
>> +
>> +	rz->nr_zones = cpu_to_le64(reported_zones);
>> +	for (i = 0; i < reported_zones; i++)
>> +		nvmet_get_zone_desc(req->ns, &zones[i], &rz->entries[i]);
> This can be done directly in the report zones cb. That will avoid looping twice
> over the reported zones.
Okay, I'll try and remove this loop.
>> +
>> +	status = nvmet_copy_to_sgl(req, 0, rz, bufsize);
>> +
>> +out_free_report_zones:
>> +	kvfree(rz);
>> +out_free_zones:
>> +	kvfree(zones);
>> +out:
>> +	nvmet_req_complete(req, status);
>> +}
>> +
>> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
>> +{
>> +	sector_t nr_sect = bdev_zone_sectors(nvmet_bdev(req));
>> +	struct nvme_zone_mgmt_send_cmd *c = &req->cmd->zms;
>> +	u16 status = NVME_SC_SUCCESS;
>> +	enum req_opf op;
>> +	sector_t sect;
>> +	int ret;
>> +
>> +	sect = nvmet_lba_to_sect(req->ns, le64_to_cpu(req->cmd->zms.slba));
>> +
>> +	switch (c->zsa) {
>> +	case NVME_ZONE_OPEN:
>> +		op = REQ_OP_ZONE_OPEN;
>> +		break;
>> +	case NVME_ZONE_CLOSE:
>> +		op = REQ_OP_ZONE_CLOSE;
>> +		break;
>> +	case NVME_ZONE_FINISH:
>> +		op = REQ_OP_ZONE_FINISH;
>> +		break;
>> +	case NVME_ZONE_RESET:
>> +		if (c->select_all)
>> +			nr_sect = get_capacity(nvmet_bdev(req)->bd_disk);
>> +		op = REQ_OP_ZONE_RESET;
>> +		break;
>> +	default:
>> +		status = NVME_SC_INVALID_FIELD;
>> +		break;
> You need a goto here or blkdev_zone_mgmt() will be called.
>
True.
>> +	}
>> +
>> +	ret = blkdev_zone_mgmt(nvmet_bdev(req), op, sect, nr_sect, GFP_KERNEL);
>> +	if (ret)
>> +		status = NVME_SC_INTERNAL;
>> +
>> +	nvmet_req_complete(req, status);
>> +}
>> +
>> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
>> +{
>> +	unsigned long bv_cnt = min(req->sg_cnt, BIO_MAX_PAGES);
>> +	int op = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
>> +	u64 slba = le64_to_cpu(req->cmd->rw.slba);
>> +	sector_t sect = nvmet_lba_to_sect(req->ns, slba);
>> +	u16 status = NVME_SC_SUCCESS;
>> +	int sg_cnt = req->sg_cnt;
>> +	struct scatterlist *sg;
>> +	size_t mapped_data_len;
>> +	struct iov_iter from;
>> +	struct bio_vec *bvec;
>> +	size_t mapped_cnt;
>> +	size_t io_len = 0;
>> +	struct bio *bio;
>> +	int ret;
>> +
>> +	if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req)))
>> +		return;
> No request completion ?
See nvmet_check_transfer_len().
>> +
>> +	if (!req->sg_cnt) {
>> +		nvmet_req_complete(req, 0);
>> +		return;
>> +	}
>> +
>> +	bvec = kmalloc_array(bv_cnt, sizeof(*bvec), GFP_KERNEL);
>> +	if (!bvec) {
>> +		status = NVME_SC_INTERNAL;
>> +		goto out;
>> +	}
>> +
>> +	while (sg_cnt) {
>> +		mapped_data_len = 0;
>> +		for_each_sg(req->sg, sg, req->sg_cnt, mapped_cnt) {
>> +			nvmet_file_init_bvec(bvec, sg);
>> +			mapped_data_len += bvec[mapped_cnt].bv_len;
>> +			sg_cnt--;
>> +			if (mapped_cnt == bv_cnt)
>> +				break;
>> +		}
>> +		iov_iter_bvec(&from, WRITE, bvec, mapped_cnt, mapped_data_len);
>> +
>> +		bio = bio_alloc(GFP_KERNEL, bv_cnt);
>> +		bio_set_dev(bio, nvmet_bdev(req));
>> +		bio->bi_iter.bi_sector = sect;
>> +		bio->bi_opf = op;
>> +
>> +		ret =  __bio_iov_append_get_pages(bio, &from);
>> +		if (unlikely(ret)) {
>> +			status = NVME_SC_INTERNAL;
>> +			bio_io_error(bio);
>> +			kfree(bvec);
>> +			goto out;
>> +		}
>> +
>> +		ret = submit_bio_wait(bio);
>> +		bio_put(bio);
>> +		if (ret < 0) {
>> +			status = NVME_SC_INTERNAL;
>> +			break;
>> +		}
>> +
>> +		io_len += mapped_data_len;
>> +	}
> This loop is equivalent to splitting a zone append. That must not be done as
> that can lead to totally unpredictable ordering of the chunks. What if another
> thread is doing zone append to the same zone at the same time ?
>
We can add something like per-zone bit locking here to prevent multiple
threads from appending to the same zone. With the zasl value derived from
max_zone_append_sectors (as mentioned in my reply), ideally we shouldn't
get a data length larger than what we can handle, if I'm not missing
something.

>> +
>> +	sect += (io_len >> 9);
>> +	req->cqe->result.u64 = le64_to_cpu(nvmet_sect_to_lba(req->ns, sect));
>> +	kfree(bvec);
>> +
>> +out:
>> +	nvmet_req_complete(req, status);
>> +}
>> +
>> +#else  /* CONFIG_BLK_DEV_ZONED */
>> +static void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
>> +{
>> +}
>> +static void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
>> +{
>> +}
>> +u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off)
>> +{
>> +	return 0;
>> +}
>> +static bool nvmet_bdev_zns_config(struct nvmet_ns *ns)
>> +{
>> +	return false;
>> +}
>> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
>> +{
>> +}
>> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
>> +{
>> +}
>> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
>> +{
>> +}
>> +void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log)
>> +{
>> +}
> These should go in the header file. And put the brackets on the same line.
>
As I explained earlier, this bloats the header file with empty stubs and
adds functions to the target transport code via nvmet.h which have
nothing to do with the backend. Regarding the {} style, I don't see
braces on the same line for the empty stubs, so I'm keeping it consistent
with what is in the repo.

>> +#endif /* CONFIG_BLK_DEV_ZONED */
>>

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/9] nvmet: add ZNS support for bdev-ns
@ 2020-11-28  0:09       ` Chaitanya Kulkarni
  0 siblings, 0 replies; 50+ messages in thread
From: Chaitanya Kulkarni @ 2020-11-28  0:09 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, linux-nvme; +Cc: sagi, hch

On 11/26/20 00:36, Damien Le Moal wrote:
> On 2020/11/26 11:42, Chaitanya Kulkarni wrote:
>> Add zns-bdev-config, id-ctrl, id-ns, zns-cmd-effects, zone-mgmt-send,
>> zone-mgmt-recv and zone-append handlers for NVMeOF target to enable ZNS
>> support for bdev.
>>
>> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
>> ---
>>  drivers/nvme/target/Makefile      |   2 +
>>  drivers/nvme/target/admin-cmd.c   |   4 +-
>>  drivers/nvme/target/io-cmd-file.c |   2 +-
>>  drivers/nvme/target/nvmet.h       |  18 ++
>>  drivers/nvme/target/zns.c         | 390 ++++++++++++++++++++++++++++++
>>  5 files changed, 413 insertions(+), 3 deletions(-)
>>  create mode 100644 drivers/nvme/target/zns.c
>>
>> diff --git a/drivers/nvme/target/Makefile b/drivers/nvme/target/Makefile
>> index ebf91fc4c72e..bc147ff2df5d 100644
>> --- a/drivers/nvme/target/Makefile
>> +++ b/drivers/nvme/target/Makefile
>> @@ -12,6 +12,8 @@ obj-$(CONFIG_NVME_TARGET_TCP)		+= nvmet-tcp.o
>>  nvmet-y		+= core.o configfs.o admin-cmd.o fabrics-cmd.o \
>>  			discovery.o io-cmd-file.o io-cmd-bdev.o
>>  nvmet-$(CONFIG_NVME_TARGET_PASSTHRU)	+= passthru.o
>> +nvmet-$(CONFIG_BLK_DEV_ZONED)		+= zns.o
>> +
>>  nvme-loop-y	+= loop.o
>>  nvmet-rdma-y	+= rdma.o
>>  nvmet-fc-y	+= fc.o
>> diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
>> index dca34489a1dc..509fd8dcca0c 100644
>> --- a/drivers/nvme/target/admin-cmd.c
>> +++ b/drivers/nvme/target/admin-cmd.c
>> @@ -579,8 +579,8 @@ static void nvmet_execute_identify_nslist(struct nvmet_req *req)
>>  	nvmet_req_complete(req, status);
>>  }
>>  
>> -static u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
>> -				    void *id, off_t *off)
>> +u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
>> +			     void *id, off_t *off)
>>  {
>>  	struct nvme_ns_id_desc desc = {
>>  		.nidt = type,
>> diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c
>> index 0abbefd9925e..2bd10960fa50 100644
>> --- a/drivers/nvme/target/io-cmd-file.c
>> +++ b/drivers/nvme/target/io-cmd-file.c
>> @@ -89,7 +89,7 @@ int nvmet_file_ns_enable(struct nvmet_ns *ns)
>>  	return ret;
>>  }
>>  
>> -static void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg)
>> +void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg)
>>  {
>>  	bv->bv_page = sg_page(sg);
>>  	bv->bv_offset = sg->offset;
>> diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
>> index 592763732065..0542ba672a31 100644
>> --- a/drivers/nvme/target/nvmet.h
>> +++ b/drivers/nvme/target/nvmet.h
>> @@ -81,6 +81,9 @@ struct nvmet_ns {
>>  	struct pci_dev		*p2p_dev;
>>  	int			pi_type;
>>  	int			metadata_size;
>> +#ifdef CONFIG_BLK_DEV_ZONED
>> +	struct nvme_id_ns_zns	id_zns;
>> +#endif
>>  };
>>  
>>  static inline struct nvmet_ns *to_nvmet_ns(struct config_item *item)
>> @@ -251,6 +254,10 @@ struct nvmet_subsys {
>>  	unsigned int		admin_timeout;
>>  	unsigned int		io_timeout;
>>  #endif /* CONFIG_NVME_TARGET_PASSTHRU */
>> +
>> +#ifdef CONFIG_BLK_DEV_ZONED
>> +	struct nvme_id_ctrl_zns	id_ctrl_zns;
>> +#endif
>>  };
>>  
>>  static inline struct nvmet_subsys *to_subsys(struct config_item *item)
>> @@ -603,4 +610,15 @@ static inline bool nvmet_ns_has_pi(struct nvmet_ns *ns)
>>  	return ns->pi_type && ns->metadata_size == sizeof(struct t10_pi_tuple);
>>  }
>>  
>> +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req);
>> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req);
>> +u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off);
>> +bool nvmet_bdev_zns_config(struct nvmet_ns *ns);
>> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req);
>> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req);
>> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req);
>> +void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log);
>> +u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
>> +			     void *id, off_t *off);
>> +void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg);
>>  #endif /* _NVMET_H */
>> diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
>> new file mode 100644
>> index 000000000000..8ea6641a55e3
>> --- /dev/null
>> +++ b/drivers/nvme/target/zns.c
>> @@ -0,0 +1,390 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * NVMe ZNS-ZBD command implementation.
>> + * Copyright (c) 2020-2021 HGST, a Western Digital Company.
>> + */
>> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>> +#include <linux/uio.h>
>> +#include <linux/nvme.h>
>> +#include <linux/blkdev.h>
>> +#include <linux/module.h>
>> +#include "nvmet.h"
>> +
>> +#ifdef CONFIG_BLK_DEV_ZONED
> This file is compiled only if CONFIG_BLK_DEV_ZONED is defined, so what is the
> point of this ? The stubs for the !CONFIG_BLK_DEV_ZONED case should go into the
> header file, no ?

Actually the conditional compilation of zns.c with CONFIG_BLK_DEV_ZONED

needs to be removed in the Makefile. I'm against putting these empty
stubs in the

makefile when CONFIG_BLK_DEV_ZONED is not true, as there are several files

transport, discovery, file/passthru backend etc in the nvme/target/*.c
which will add

empty stubs which has nothing to do with zoned bdev backend.

i.e. for Makefile it should be :-

diff --git a/drivers/nvme/target/Makefile b/drivers/nvme/target/Makefile
index ebf91fc4c72e..c67276a25363 100644
--- a/drivers/nvme/target/Makefile
+++ b/drivers/nvme/target/Makefile
@@ -10,7 +10,7 @@ obj-$(CONFIG_NVME_TARGET_FCLOOP)      += nvme-fcloop.o
 obj-$(CONFIG_NVME_TARGET_TCP)          += nvmet-tcp.o
 
 nvmet-y                += core.o configfs.o admin-cmd.o fabrics-cmd.o \
-                       discovery.o io-cmd-file.o io-cmd-bdev.o
+                       zns,o discovery.o io-cmd-file.o io-cmd-bdev.o
 nvmet-$(CONFIG_NVME_TARGET_PASSTHRU)   += passthru.o
 nvme-loop-y    += loop.o
 nvmet-rdma-y   += rdma.o

>> +
>> +static u16 nvmet_bdev_zns_checks(struct nvmet_req *req)
>> +{
>> +	u16 status = 0;
>> +
>> +	if (!bdev_is_zoned(req->ns->bdev)) {
>> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
>> +		goto out;
> Why not return status directly here ? Same for the other cases below.
I prefer centralize returns with goto, which follows the similar code.
>> +	}
>> +
>> +	if (req->cmd->zmr.zra != NVME_ZRA_ZONE_REPORT) {
>> +		status = NVME_SC_INVALID_FIELD;
>> +		goto out;
>> +	}
>> +
>> +	if (req->cmd->zmr.zrasf != NVME_ZRASF_ZONE_REPORT_ALL) {
>> +		status = NVME_SC_INVALID_FIELD;
>> +		goto out;
>> +	}
>> +
>> +	if (req->cmd->zmr.pr != NVME_REPORT_ZONE_PARTIAL)
>> +		status = NVME_SC_INVALID_FIELD;
>> +out:
>> +	return status;
>> +}
>> +
>> +static struct block_device *nvmet_bdev(struct nvmet_req *req)
>> +{
>> +	return req->ns->bdev;
>> +}
>> +
>> +static u64 nvmet_zones_to_descsize(unsigned int nr_zones)
>> +{
>> +	return sizeof(struct nvme_zone_report) +
>> +		(sizeof(struct nvme_zone_descriptor) * nr_zones);
>> +}
> These could be declared as inline.
>
Okay.
>> +
>> +static inline u64 nvmet_sect_to_lba(struct nvmet_ns *ns, sector_t sect)
>> +{
>> +	return sect >> (ns->blksize_shift - SECTOR_SHIFT);
>> +}
>> +
>> +static inline sector_t nvmet_lba_to_sect(struct nvmet_ns *ns, __le64 lba)
>> +{
>> +	return le64_to_cpu(lba) << (ns->blksize_shift - SECTOR_SHIFT);
>> +}
>> +
>> +/*
>> + *  ZNS related command implementation and helprs.
> s/helprs/helpers
Okay.
>
>> + */
>> +
>> +u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off)
>> +{
>> +	u16 nvme_cis_zns = NVME_CSI_ZNS;
>> +
>> +	if (bdev_is_zoned(nvmet_bdev(req))) {
>> +		return nvmet_copy_ns_identifier(req, NVME_NIDT_CSI,
>> +						 NVME_NIDT_CSI_LEN,
>> +						 &nvme_cis_zns, off);
>> +	}
> No need for the curly brackets.
Okay.
>> +
>> +	return NVME_SC_SUCCESS;
>> +}
>> +
>> +void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log)
>> +{
>> +	log->iocs[nvme_cmd_zone_append]		= cpu_to_le32(1 << 0);
>> +	log->iocs[nvme_cmd_zone_mgmt_send]	= cpu_to_le32(1 << 0);
>> +	log->iocs[nvme_cmd_zone_mgmt_recv]	= cpu_to_le32(1 << 0);
>> +}
>> +
>> +bool nvmet_bdev_zns_config(struct nvmet_ns *ns)
>> +{
>> +	if (ns->bdev->bd_disk->queue->conv_zones_bitmap) {
>> +		pr_err("block device with conventional zones not supported.");
> pr_err("block devices with conventional zones are not supported.");
>
> With SMR drives, the last zone of the disk can be smaller than the other zones.
> That needs to be checked too as that is not allowed by ZNS. Drives with a last
> smaller runt zone cannot be allowed.
Okay.
>> +		return false;
>> +	}
>> +	/*
>> +	 * SMR drives will results in error if writes are not aligned to the
>> +	 * physical block size just override.
>> +	 */
> 	/*
> 	 * For ZBC and ZAC devices, writes into sequential zones must be aligned
> 	 * to the device physical block size. So use this value as the logical
> 	 * block size to avoid errors.
> 	 */
Okay.
>> +	ns->blksize_shift = blksize_bits(bdev_physical_block_size(ns->bdev));
>> +	return true;
>> +}
>> +
>> +static int nvmet_bdev_report_zone_cb(struct blk_zone *zone, unsigned int idx,
>> +				     void *data)
>> +{
>> +	struct blk_zone *zones = data;
>> +
>> +	memcpy(&zones[idx], zone, sizeof(struct blk_zone));
>> +
>> +	return 0;
>> +}
>> +
>> +static void nvmet_get_zone_desc(struct nvmet_ns *ns, struct blk_zone *z,
>> +				struct nvme_zone_descriptor *rz)
>> +{
>> +	rz->zcap = cpu_to_le64(nvmet_sect_to_lba(ns, z->capacity));
>> +	rz->zslba = cpu_to_le64(nvmet_sect_to_lba(ns, z->start));
>> +	rz->wp = cpu_to_le64(nvmet_sect_to_lba(ns, z->wp));
>> +	rz->za = z->reset ? 1 << 2 : 0;
>> +	rz->zt = z->type;
>> +	rz->zs = z->cond << 4;
>> +}
>> +
>> +/*
>> + * ZNS related Admin and I/O command handlers.
>> + */
>> +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
>> +{
>> +	struct nvme_id_ctrl_zns *id;
>> +	u16 status = 0;
>> +
>> +	id = kzalloc(sizeof(*id), GFP_KERNEL);
>> +	if (!id) {
>> +		status = NVME_SC_INTERNAL;
>> +		goto out;
>> +	}
>> +
>> +	/*
>> +	 * Even though this function sets Zone Append Size Limit to 0,
>> +	 * the 0 value here indicates that the maximum data transfer size for
>> +	 * the Zone Append command is indicated by the ctrl
>> +	 * Maximum Data Transfer Size (MDTS).
> But the target drive may have different values for max zone append sectors and
> max_hw_sectors/max_sectors. So I think this needs finer handling.
I think we can getaway with the

min(mdts, min(max_zone_append_sectors/max_hw_sectors), let me see.

>> +	 */
>> +	id->zasl = 0;
>> +
>> +	status = nvmet_copy_to_sgl(req, 0, id, sizeof(*id));
>> +
>> +	kfree(id);
>> +out:
>> +	nvmet_req_complete(req, status);
>> +}
>> +
>> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
>> +{
>> +	struct nvme_id_ns_zns *id_zns;
>> +	u16 status = 0;
>> +	u64 zsze;
>> +
>> +	if (le32_to_cpu(req->cmd->identify.nsid) == NVME_NSID_ALL) {
>> +		req->error_loc = offsetof(struct nvme_identify, nsid);
>> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
>> +		goto out;
>> +	}
>> +
>> +	id_zns = kzalloc(sizeof(*id_zns), GFP_KERNEL);
>> +	if (!id_zns) {
>> +		status = NVME_SC_INTERNAL;
>> +		goto out;
>> +	}
>> +
>> +	req->ns = nvmet_find_namespace(req->sq->ctrl, req->cmd->identify.nsid);
>> +	if (!req->ns) {
>> +		status = NVME_SC_INTERNAL;
>> +		goto done;
> That will result in nvmet_copy_to_sgl() being executed. Is that OK ?
> Shouldn't you do only the kfree(id_zns) and complete with an error here ?
Call to nvmet_copy_to_sgl() zeroout the values if any when we return the

buffer in case of error. I don't see any problem with zeroing out buffer in

case of error. Can you please explain why we shouldn't do that ?

>> +	}
>> +
>> +	if (!bdev_is_zoned(nvmet_bdev(req))) {
>> +		req->error_loc = offsetof(struct nvme_identify, nsid);
>> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
>> +		goto done;
> Same comment.
See above reply.
>> +	}
>> +
>> +	nvmet_ns_revalidate(req->ns);
>> +	zsze = (bdev_zone_sectors(nvmet_bdev(req)) << 9) >>
>> +					req->ns->blksize_shift;
>> +	id_zns->lbafe[0].zsze = cpu_to_le64(zsze);
>> +	id_zns->mor = cpu_to_le32(bdev_max_open_zones(nvmet_bdev(req)));
>> +	id_zns->mar = cpu_to_le32(bdev_max_active_zones(nvmet_bdev(req)));
>> +
>> +done:
>> +	status = nvmet_copy_to_sgl(req, 0, id_zns, sizeof(*id_zns));
>> +	kfree(id_zns);
>> +out:
>> +	nvmet_req_complete(req, status);
>> +}
>> +
>> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
>> +{
>> +	struct request_queue *q = nvmet_bdev(req)->bd_disk->queue;
>> +	struct nvme_zone_mgmt_recv_cmd *zmr = &req->cmd->zmr;
>> +	unsigned int nz = blk_queue_nr_zones(q);
>> +	u64 bufsize = (zmr->numd << 2) + 1;
>> +	struct nvme_zone_report *rz;
>> +	struct blk_zone *zones;
>> +	int reported_zones;
>> +	sector_t sect;
>> +	u64 desc_size;
>> +	u16 status;
>> +	int i;
>> +
>> +	desc_size = nvmet_zones_to_descsize(blk_queue_nr_zones(q));
>> +	status = nvmet_bdev_zns_checks(req);
>> +	if (status)
>> +		goto out;
>> +
>> +	zones = kvcalloc(blkdev_nr_zones(nvmet_bdev(req)->bd_disk),
>> +			      sizeof(struct blk_zone), GFP_KERNEL);
> This is not super nice: a large disk will have an enormous number of zones
> (75000+ for largest SMR HDD today). But you actually do not need more zones
> descs than what fits in req buffer.
Call to nvmet_copy_to_sgl() nicely fail and return error.
>> +	if (!zones) {
>> +		status = NVME_SC_INTERNAL;
>> +		goto out;
>> +	}
>> +
>> +	rz = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY);
>> +	if (!rz) {
>> +		status = NVME_SC_INTERNAL;
>> +		goto out_free_zones;
>> +	}
>> +
>> +	sect = nvmet_lba_to_sect(req->ns, le64_to_cpu(req->cmd->zmr.slba));
>> +
>> +	for (nz = blk_queue_nr_zones(q); desc_size >= bufsize; nz--)
>> +		desc_size = nvmet_zones_to_descsize(nz);
>> +
>> +	reported_zones = blkdev_report_zones(nvmet_bdev(req), sect, nz,
>> +					     nvmet_bdev_report_zone_cb,
>> +					     zones);
>> +	if (reported_zones < 0) {
>> +		status = NVME_SC_INTERNAL;
>> +		goto out_free_report_zones;
>> +	}
>> +
>> +	rz->nr_zones = cpu_to_le64(reported_zones);
>> +	for (i = 0; i < reported_zones; i++)
>> +		nvmet_get_zone_desc(req->ns, &zones[i], &rz->entries[i]);
> This can be done directly in the report zones cb. That will avoid looping twice
> over the reported zones.
Okay, I'll try and remove this loop.
>> +
>> +	status = nvmet_copy_to_sgl(req, 0, rz, bufsize);
>> +
>> +out_free_report_zones:
>> +	kvfree(rz);
>> +out_free_zones:
>> +	kvfree(zones);
>> +out:
>> +	nvmet_req_complete(req, status);
>> +}
>> +
>> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
>> +{
>> +	sector_t nr_sect = bdev_zone_sectors(nvmet_bdev(req));
>> +	struct nvme_zone_mgmt_send_cmd *c = &req->cmd->zms;
>> +	u16 status = NVME_SC_SUCCESS;
>> +	enum req_opf op;
>> +	sector_t sect;
>> +	int ret;
>> +
>> +	sect = nvmet_lba_to_sect(req->ns, le64_to_cpu(req->cmd->zms.slba));
>> +
>> +	switch (c->zsa) {
>> +	case NVME_ZONE_OPEN:
>> +		op = REQ_OP_ZONE_OPEN;
>> +		break;
>> +	case NVME_ZONE_CLOSE:
>> +		op = REQ_OP_ZONE_CLOSE;
>> +		break;
>> +	case NVME_ZONE_FINISH:
>> +		op = REQ_OP_ZONE_FINISH;
>> +		break;
>> +	case NVME_ZONE_RESET:
>> +		if (c->select_all)
>> +			nr_sect = get_capacity(nvmet_bdev(req)->bd_disk);
>> +		op = REQ_OP_ZONE_RESET;
>> +		break;
>> +	default:
>> +		status = NVME_SC_INVALID_FIELD;
>> +		break;
> You needa goto here or blkdev_zone_mgmt() will be called.
>
True.
>> +	}
>> +
>> +	ret = blkdev_zone_mgmt(nvmet_bdev(req), op, sect, nr_sect, GFP_KERNEL);
>> +	if (ret)
>> +		status = NVME_SC_INTERNAL;
>> +
>> +	nvmet_req_complete(req, status);
>> +}
>> +
>> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
>> +{
>> +	unsigned long bv_cnt = min(req->sg_cnt, BIO_MAX_PAGES);
>> +	int op = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
>> +	u64 slba = le64_to_cpu(req->cmd->rw.slba);
>> +	sector_t sect = nvmet_lba_to_sect(req->ns, slba);
>> +	u16 status = NVME_SC_SUCCESS;
>> +	int sg_cnt = req->sg_cnt;
>> +	struct scatterlist *sg;
>> +	size_t mapped_data_len;
>> +	struct iov_iter from;
>> +	struct bio_vec *bvec;
>> +	size_t mapped_cnt;
>> +	size_t io_len = 0;
>> +	struct bio *bio;
>> +	int ret;
>> +
>> +	if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req)))
>> +		return;
> No request completion ?
See nvmet_check_transfer_len().
>> +
>> +	if (!req->sg_cnt) {
>> +		nvmet_req_complete(req, 0);
>> +		return;
>> +	}
>> +
>> +	bvec = kmalloc_array(bv_cnt, sizeof(*bvec), GFP_KERNEL);
>> +	if (!bvec) {
>> +		status = NVME_SC_INTERNAL;
>> +		goto out;
>> +	}
>> +
>> +	while (sg_cnt) {
>> +		mapped_data_len = 0;
>> +		for_each_sg(req->sg, sg, req->sg_cnt, mapped_cnt) {
>> +			nvmet_file_init_bvec(bvec, sg);
>> +			mapped_data_len += bvec[mapped_cnt].bv_len;
>> +			sg_cnt--;
>> +			if (mapped_cnt == bv_cnt)
>> +				break;
>> +		}
>> +		iov_iter_bvec(&from, WRITE, bvec, mapped_cnt, mapped_data_len);
>> +
>> +		bio = bio_alloc(GFP_KERNEL, bv_cnt);
>> +		bio_set_dev(bio, nvmet_bdev(req));
>> +		bio->bi_iter.bi_sector = sect;
>> +		bio->bi_opf = op;
>> +
>> +		ret =  __bio_iov_append_get_pages(bio, &from);
>> +		if (unlikely(ret)) {
>> +			status = NVME_SC_INTERNAL;
>> +			bio_io_error(bio);
>> +			kfree(bvec);
>> +			goto out;
>> +		}
>> +
>> +		ret = submit_bio_wait(bio);
>> +		bio_put(bio);
>> +		if (ret < 0) {
>> +			status = NVME_SC_INTERNAL;
>> +			break;
>> +		}
>> +
>> +		io_len += mapped_data_len;
>> +	}
> This loop is equivalent to splitting a zone append. That must not be done as
> that can lead to totally unpredictable ordering of the chunks. What if another
> thread is doing zone append to the same zone at the same time ?
>
We can add something like per-zone bit locking here to prevent that with
multiple threads. With the zasl value derived from
max_zone_append_sectors (as mentioned in my reply), ideally we shouldn't
get a data len larger than what we can handle, if I'm not missing
something.

>> +
>> +	sect += (io_len >> 9);
>> +	req->cqe->result.u64 = le64_to_cpu(nvmet_sect_to_lba(req->ns, sect));
>> +	kfree(bvec);
>> +
>> +out:
>> +	nvmet_req_complete(req, status);
>> +}
>> +
>> +#else  /* CONFIG_BLK_DEV_ZONED */
>> +static void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
>> +{
>> +}
>> +static void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
>> +{
>> +}
>> +u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off)
>> +{
>> +	return 0;
>> +}
>> +static bool nvmet_bdev_zns_config(struct nvmet_ns *ns)
>> +{
>> +	return false;
>> +}
>> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
>> +{
>> +}
>> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
>> +{
>> +}
>> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
>> +{
>> +}
>> +void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log)
>> +{
>> +}
> These should go in the header file. And put the brackets on the same line.
>
As I explained earlier, this bloats the header file with empty stubs and
adds functions to the target transport code via nvmet.h which have
nothing to do with the backend. Regarding the {} style, I don't see
braces on the same line for the empty stubs elsewhere, so I'm keeping it
consistent with what is in the repo.

>> +#endif /* CONFIG_BLK_DEV_ZONED */
>>

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* Re: [PATCH 2/9] nvmet: add ZNS support for bdev-ns
  2020-11-28  0:09       ` Chaitanya Kulkarni
@ 2020-11-30  0:16         ` Damien Le Moal
  0 siblings, 0 replies; 50+ messages in thread
From: Damien Le Moal @ 2020-11-30  0:16 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-nvme; +Cc: sagi, hch

On 2020/11/28 9:09, Chaitanya Kulkarni wrote:
> On 11/26/20 00:36, Damien Le Moal wrote:
>> On 2020/11/26 11:42, Chaitanya Kulkarni wrote:
>>> Add zns-bdev-config, id-ctrl, id-ns, zns-cmd-effects, zone-mgmt-send,
>>> zone-mgmt-recv and zone-append handlers for NVMeOF target to enable ZNS
>>> support for bdev.
>>>
>>> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
>>> ---
>>>  drivers/nvme/target/Makefile      |   2 +
>>>  drivers/nvme/target/admin-cmd.c   |   4 +-
>>>  drivers/nvme/target/io-cmd-file.c |   2 +-
>>>  drivers/nvme/target/nvmet.h       |  18 ++
>>>  drivers/nvme/target/zns.c         | 390 ++++++++++++++++++++++++++++++
>>>  5 files changed, 413 insertions(+), 3 deletions(-)
>>>  create mode 100644 drivers/nvme/target/zns.c
>>>
>>> diff --git a/drivers/nvme/target/Makefile b/drivers/nvme/target/Makefile
>>> index ebf91fc4c72e..bc147ff2df5d 100644
>>> --- a/drivers/nvme/target/Makefile
>>> +++ b/drivers/nvme/target/Makefile
>>> @@ -12,6 +12,8 @@ obj-$(CONFIG_NVME_TARGET_TCP)		+= nvmet-tcp.o
>>>  nvmet-y		+= core.o configfs.o admin-cmd.o fabrics-cmd.o \
>>>  			discovery.o io-cmd-file.o io-cmd-bdev.o
>>>  nvmet-$(CONFIG_NVME_TARGET_PASSTHRU)	+= passthru.o
>>> +nvmet-$(CONFIG_BLK_DEV_ZONED)		+= zns.o
>>> +
>>>  nvme-loop-y	+= loop.o
>>>  nvmet-rdma-y	+= rdma.o
>>>  nvmet-fc-y	+= fc.o
>>> diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
>>> index dca34489a1dc..509fd8dcca0c 100644
>>> --- a/drivers/nvme/target/admin-cmd.c
>>> +++ b/drivers/nvme/target/admin-cmd.c
>>> @@ -579,8 +579,8 @@ static void nvmet_execute_identify_nslist(struct nvmet_req *req)
>>>  	nvmet_req_complete(req, status);
>>>  }
>>>  
>>> -static u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
>>> -				    void *id, off_t *off)
>>> +u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
>>> +			     void *id, off_t *off)
>>>  {
>>>  	struct nvme_ns_id_desc desc = {
>>>  		.nidt = type,
>>> diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c
>>> index 0abbefd9925e..2bd10960fa50 100644
>>> --- a/drivers/nvme/target/io-cmd-file.c
>>> +++ b/drivers/nvme/target/io-cmd-file.c
>>> @@ -89,7 +89,7 @@ int nvmet_file_ns_enable(struct nvmet_ns *ns)
>>>  	return ret;
>>>  }
>>>  
>>> -static void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg)
>>> +void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg)
>>>  {
>>>  	bv->bv_page = sg_page(sg);
>>>  	bv->bv_offset = sg->offset;
>>> diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
>>> index 592763732065..0542ba672a31 100644
>>> --- a/drivers/nvme/target/nvmet.h
>>> +++ b/drivers/nvme/target/nvmet.h
>>> @@ -81,6 +81,9 @@ struct nvmet_ns {
>>>  	struct pci_dev		*p2p_dev;
>>>  	int			pi_type;
>>>  	int			metadata_size;
>>> +#ifdef CONFIG_BLK_DEV_ZONED
>>> +	struct nvme_id_ns_zns	id_zns;
>>> +#endif
>>>  };
>>>  
>>>  static inline struct nvmet_ns *to_nvmet_ns(struct config_item *item)
>>> @@ -251,6 +254,10 @@ struct nvmet_subsys {
>>>  	unsigned int		admin_timeout;
>>>  	unsigned int		io_timeout;
>>>  #endif /* CONFIG_NVME_TARGET_PASSTHRU */
>>> +
>>> +#ifdef CONFIG_BLK_DEV_ZONED
>>> +	struct nvme_id_ctrl_zns	id_ctrl_zns;
>>> +#endif
>>>  };
>>>  
>>>  static inline struct nvmet_subsys *to_subsys(struct config_item *item)
>>> @@ -603,4 +610,15 @@ static inline bool nvmet_ns_has_pi(struct nvmet_ns *ns)
>>>  	return ns->pi_type && ns->metadata_size == sizeof(struct t10_pi_tuple);
>>>  }
>>>  
>>> +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req);
>>> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req);
>>> +u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off);
>>> +bool nvmet_bdev_zns_config(struct nvmet_ns *ns);
>>> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req);
>>> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req);
>>> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req);
>>> +void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log);
>>> +u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
>>> +			     void *id, off_t *off);
>>> +void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg);
>>>  #endif /* _NVMET_H */
>>> diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
>>> new file mode 100644
>>> index 000000000000..8ea6641a55e3
>>> --- /dev/null
>>> +++ b/drivers/nvme/target/zns.c
>>> @@ -0,0 +1,390 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +/*
>>> + * NVMe ZNS-ZBD command implementation.
>>> + * Copyright (c) 2020-2021 HGST, a Western Digital Company.
>>> + */
>>> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>>> +#include <linux/uio.h>
>>> +#include <linux/nvme.h>
>>> +#include <linux/blkdev.h>
>>> +#include <linux/module.h>
>>> +#include "nvmet.h"
>>> +
>>> +#ifdef CONFIG_BLK_DEV_ZONED
>> This file is compiled only if CONFIG_BLK_DEV_ZONED is defined, so what is the
>> point of this ? The stubs for the !CONFIG_BLK_DEV_ZONED case should go into the
>> header file, no ?
> 
> Actually the conditional compilation of zns.c with CONFIG_BLK_DEV_ZONED
> needs to be removed in the Makefile. I'm against putting these empty
> stubs in the header when CONFIG_BLK_DEV_ZONED is not true, as there are
> several files (transport, discovery, file/passthru backend, etc.) in
> nvme/target/*.c which would gain empty stubs that have nothing to do
> with the zoned bdev backend.

Each file will not need to add empty stubs if these stubs are in a common
header. And empty stubs are compiled away so I do not see the problem. Having
such empty stubs in a header file under an #ifdef is I think a fairly standard
coding style in the kernel. The block layer zone code and scsi SMR code is coded
like that.

> 
> i.e. for Makefile it should be :-
> 
> diff --git a/drivers/nvme/target/Makefile b/drivers/nvme/target/Makefile
> index ebf91fc4c72e..c67276a25363 100644
> --- a/drivers/nvme/target/Makefile
> +++ b/drivers/nvme/target/Makefile
> @@ -10,7 +10,7 @@ obj-$(CONFIG_NVME_TARGET_FCLOOP)      += nvme-fcloop.o
>  obj-$(CONFIG_NVME_TARGET_TCP)          += nvmet-tcp.o
>  
>  nvmet-y                += core.o configfs.o admin-cmd.o fabrics-cmd.o \
> -                       discovery.o io-cmd-file.o io-cmd-bdev.o
> +                       zns.o discovery.o io-cmd-file.o io-cmd-bdev.o
>  nvmet-$(CONFIG_NVME_TARGET_PASSTHRU)   += passthru.o
>  nvme-loop-y    += loop.o
>  nvmet-rdma-y   += rdma.o

The NS scan and test for support of ZNS could go in discovery.c, and compiled
unconditionally exactly like the host nvme driver does. Everything else (ZNS
commands execution) can go into zns.c and that file compiled conditionally, with
empty stubs in zns.h or any other appropriate header file. I think that make
things clean and easy to understand, and avoid the big #ifdef in the C code.
Just my opinion here. You are the maintainer, so your call...

>>> +
>>> +static u16 nvmet_bdev_zns_checks(struct nvmet_req *req)
>>> +{
>>> +	u16 status = 0;
>>> +
>>> +	if (!bdev_is_zoned(req->ns->bdev)) {
>>> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
>>> +		goto out;
>> Why not return status directly here ? Same for the other cases below.
> I prefer centralized returns with goto, which follows similar code elsewhere.

Sure, but for such a simple function, this looks rather strange and in my
opinion uselessly complicates the code.

>>> +	}
>>> +
>>> +	if (req->cmd->zmr.zra != NVME_ZRA_ZONE_REPORT) {
>>> +		status = NVME_SC_INVALID_FIELD;
>>> +		goto out;
>>> +	}
>>> +
>>> +	if (req->cmd->zmr.zrasf != NVME_ZRASF_ZONE_REPORT_ALL) {
>>> +		status = NVME_SC_INVALID_FIELD;
>>> +		goto out;
>>> +	}
>>> +
>>> +	if (req->cmd->zmr.pr != NVME_REPORT_ZONE_PARTIAL)
>>> +		status = NVME_SC_INVALID_FIELD;
>>> +out:
>>> +	return status;
>>> +}

[...]
>>> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
>>> +{
>>> +	struct nvme_id_ns_zns *id_zns;
>>> +	u16 status = 0;
>>> +	u64 zsze;
>>> +
>>> +	if (le32_to_cpu(req->cmd->identify.nsid) == NVME_NSID_ALL) {
>>> +		req->error_loc = offsetof(struct nvme_identify, nsid);
>>> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
>>> +		goto out;
>>> +	}
>>> +
>>> +	id_zns = kzalloc(sizeof(*id_zns), GFP_KERNEL);
>>> +	if (!id_zns) {
>>> +		status = NVME_SC_INTERNAL;
>>> +		goto out;
>>> +	}
>>> +
>>> +	req->ns = nvmet_find_namespace(req->sq->ctrl, req->cmd->identify.nsid);
>>> +	if (!req->ns) {
>>> +		status = NVME_SC_INTERNAL;
>>> +		goto done;
>> That will result in nvmet_copy_to_sgl() being executed. Is that OK ?
>> Shouldn't you do only the kfree(id_zns) and complete with an error here ?
> The call to nvmet_copy_to_sgl() zeroes out the values, if any, when we
> return the buffer in case of error. I don't see any problem with
> zeroing out the buffer on error. Can you please explain why we
> shouldn't do that?

I cannot explain anything. I was merely pointing out what the code was doing and
if that was intentional. If there are no problems, then fine.

> 
>>> +	}
>>> +
>>> +	if (!bdev_is_zoned(nvmet_bdev(req))) {
>>> +		req->error_loc = offsetof(struct nvme_identify, nsid);
>>> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
>>> +		goto done;
>> Same comment.
> See above reply.
>>> +	}
>>> +
>>> +	nvmet_ns_revalidate(req->ns);
>>> +	zsze = (bdev_zone_sectors(nvmet_bdev(req)) << 9) >>
>>> +					req->ns->blksize_shift;
>>> +	id_zns->lbafe[0].zsze = cpu_to_le64(zsze);
>>> +	id_zns->mor = cpu_to_le32(bdev_max_open_zones(nvmet_bdev(req)));
>>> +	id_zns->mar = cpu_to_le32(bdev_max_active_zones(nvmet_bdev(req)));
>>> +
>>> +done:
>>> +	status = nvmet_copy_to_sgl(req, 0, id_zns, sizeof(*id_zns));
>>> +	kfree(id_zns);
>>> +out:
>>> +	nvmet_req_complete(req, status);
>>> +}
>>> +
>>> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
>>> +{
>>> +	struct request_queue *q = nvmet_bdev(req)->bd_disk->queue;
>>> +	struct nvme_zone_mgmt_recv_cmd *zmr = &req->cmd->zmr;
>>> +	unsigned int nz = blk_queue_nr_zones(q);
>>> +	u64 bufsize = (zmr->numd << 2) + 1;
>>> +	struct nvme_zone_report *rz;
>>> +	struct blk_zone *zones;
>>> +	int reported_zones;
>>> +	sector_t sect;
>>> +	u64 desc_size;
>>> +	u16 status;
>>> +	int i;
>>> +
>>> +	desc_size = nvmet_zones_to_descsize(blk_queue_nr_zones(q));
>>> +	status = nvmet_bdev_zns_checks(req);
>>> +	if (status)
>>> +		goto out;
>>> +
>>> +	zones = kvcalloc(blkdev_nr_zones(nvmet_bdev(req)->bd_disk),
>>> +			      sizeof(struct blk_zone), GFP_KERNEL);
>> This is not super nice: a large disk will have an enormous number of zones
>> (75000+ for largest SMR HDD today). But you actually do not need more zones
>> descs than what fits in req buffer.
> The call to nvmet_copy_to_sgl() will nicely fail and return an error.

That is not my point. The point is that this code will do an allocation for
75,000 x 64B = 4.8MB even if a single zone report is being requested. That is
not acceptable. This needs optimization: allocate only as many zone descriptors
as is requested.

>>> +	if (!zones) {
>>> +		status = NVME_SC_INTERNAL;
>>> +		goto out;
>>> +	}
>>> +
>>> +	rz = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY);
>>> +	if (!rz) {
>>> +		status = NVME_SC_INTERNAL;
>>> +		goto out_free_zones;
>>> +	}
>>> +
>>> +	sect = nvmet_lba_to_sect(req->ns, le64_to_cpu(req->cmd->zmr.slba));
>>> +
>>> +	for (nz = blk_queue_nr_zones(q); desc_size >= bufsize; nz--)
>>> +		desc_size = nvmet_zones_to_descsize(nz);
>>> +
>>> +	reported_zones = blkdev_report_zones(nvmet_bdev(req), sect, nz,
>>> +					     nvmet_bdev_report_zone_cb,
>>> +					     zones);
>>> +	if (reported_zones < 0) {
>>> +		status = NVME_SC_INTERNAL;
>>> +		goto out_free_report_zones;
>>> +	}
>>> +
>>> +	rz->nr_zones = cpu_to_le64(reported_zones);
>>> +	for (i = 0; i < reported_zones; i++)
>>> +		nvmet_get_zone_desc(req->ns, &zones[i], &rz->entries[i]);
>> This can be done directly in the report zones cb. That will avoid looping twice
>> over the reported zones.
> Okay, I'll try and remove this loop.
>>> +
>>> +	status = nvmet_copy_to_sgl(req, 0, rz, bufsize);
>>> +
>>> +out_free_report_zones:
>>> +	kvfree(rz);
>>> +out_free_zones:
>>> +	kvfree(zones);
>>> +out:
>>> +	nvmet_req_complete(req, status);
>>> +}
>>> +
>>> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
>>> +{
>>> +	sector_t nr_sect = bdev_zone_sectors(nvmet_bdev(req));
>>> +	struct nvme_zone_mgmt_send_cmd *c = &req->cmd->zms;
>>> +	u16 status = NVME_SC_SUCCESS;
>>> +	enum req_opf op;
>>> +	sector_t sect;
>>> +	int ret;
>>> +
>>> +	sect = nvmet_lba_to_sect(req->ns, le64_to_cpu(req->cmd->zms.slba));
>>> +
>>> +	switch (c->zsa) {
>>> +	case NVME_ZONE_OPEN:
>>> +		op = REQ_OP_ZONE_OPEN;
>>> +		break;
>>> +	case NVME_ZONE_CLOSE:
>>> +		op = REQ_OP_ZONE_CLOSE;
>>> +		break;
>>> +	case NVME_ZONE_FINISH:
>>> +		op = REQ_OP_ZONE_FINISH;
>>> +		break;
>>> +	case NVME_ZONE_RESET:
>>> +		if (c->select_all)
>>> +			nr_sect = get_capacity(nvmet_bdev(req)->bd_disk);
>>> +		op = REQ_OP_ZONE_RESET;
>>> +		break;
>>> +	default:
>>> +		status = NVME_SC_INVALID_FIELD;
>>> +		break;
>> You need a goto here or blkdev_zone_mgmt() will be called.
>>
> True.
>>> +	}
>>> +
>>> +	ret = blkdev_zone_mgmt(nvmet_bdev(req), op, sect, nr_sect, GFP_KERNEL);
>>> +	if (ret)
>>> +		status = NVME_SC_INTERNAL;
>>> +
>>> +	nvmet_req_complete(req, status);
>>> +}
>>> +
>>> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
>>> +{
>>> +	unsigned long bv_cnt = min(req->sg_cnt, BIO_MAX_PAGES);
>>> +	int op = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
>>> +	u64 slba = le64_to_cpu(req->cmd->rw.slba);
>>> +	sector_t sect = nvmet_lba_to_sect(req->ns, slba);
>>> +	u16 status = NVME_SC_SUCCESS;
>>> +	int sg_cnt = req->sg_cnt;
>>> +	struct scatterlist *sg;
>>> +	size_t mapped_data_len;
>>> +	struct iov_iter from;
>>> +	struct bio_vec *bvec;
>>> +	size_t mapped_cnt;
>>> +	size_t io_len = 0;
>>> +	struct bio *bio;
>>> +	int ret;
>>> +
>>> +	if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req)))
>>> +		return;
>> No request completion ?
> See nvmet_check_transfer_len().
>>> +
>>> +	if (!req->sg_cnt) {
>>> +		nvmet_req_complete(req, 0);
>>> +		return;
>>> +	}
>>> +
>>> +	bvec = kmalloc_array(bv_cnt, sizeof(*bvec), GFP_KERNEL);
>>> +	if (!bvec) {
>>> +		status = NVME_SC_INTERNAL;
>>> +		goto out;
>>> +	}
>>> +
>>> +	while (sg_cnt) {
>>> +		mapped_data_len = 0;
>>> +		for_each_sg(req->sg, sg, req->sg_cnt, mapped_cnt) {
>>> +			nvmet_file_init_bvec(bvec, sg);
>>> +			mapped_data_len += bvec[mapped_cnt].bv_len;
>>> +			sg_cnt--;
>>> +			if (mapped_cnt == bv_cnt)
>>> +				break;
>>> +		}
>>> +		iov_iter_bvec(&from, WRITE, bvec, mapped_cnt, mapped_data_len);
>>> +
>>> +		bio = bio_alloc(GFP_KERNEL, bv_cnt);
>>> +		bio_set_dev(bio, nvmet_bdev(req));
>>> +		bio->bi_iter.bi_sector = sect;
>>> +		bio->bi_opf = op;
>>> +
>>> +		ret =  __bio_iov_append_get_pages(bio, &from);
>>> +		if (unlikely(ret)) {
>>> +			status = NVME_SC_INTERNAL;
>>> +			bio_io_error(bio);
>>> +			kfree(bvec);
>>> +			goto out;
>>> +		}
>>> +
>>> +		ret = submit_bio_wait(bio);
>>> +		bio_put(bio);
>>> +		if (ret < 0) {
>>> +			status = NVME_SC_INTERNAL;
>>> +			break;
>>> +		}
>>> +
>>> +		io_len += mapped_data_len;
>>> +	}
>> This loop is equivalent to splitting a zone append. That must not be done as
>> that can lead to totally unpredictable ordering of the chunks. What if another
>> thread is doing zone append to the same zone at the same time ?
>>
> We can add something like per-zone bit locking here to prevent that with
> multiple threads. With the zasl value derived from
> max_zone_append_sectors (as mentioned in my reply), ideally we shouldn't
> get a data len larger than what we can handle, if I'm not missing
> something.

No way: a locking mechanism will negate the benefits of zone append vs regular
writes. So NACK on that. As you say, since you advertised the max zone append
sectors, you should not be getting a request larger than that limit. If you do,
fail the request immediately instead of trying to split the zone append command.

> 
>>> +
>>> +	sect += (io_len >> 9);
>>> +	req->cqe->result.u64 = le64_to_cpu(nvmet_sect_to_lba(req->ns, sect));
>>> +	kfree(bvec);
>>> +
>>> +out:
>>> +	nvmet_req_complete(req, status);
>>> +}
>>> +
>>> +#else  /* CONFIG_BLK_DEV_ZONED */
>>> +static void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
>>> +{
>>> +}
>>> +static void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
>>> +{
>>> +}
>>> +u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off)
>>> +{
>>> +	return 0;
>>> +}
>>> +static bool nvmet_bdev_zns_config(struct nvmet_ns *ns)
>>> +{
>>> +	return false;
>>> +}
>>> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
>>> +{
>>> +}
>>> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
>>> +{
>>> +}
>>> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
>>> +{
>>> +}
>>> +void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log)
>>> +{
>>> +}
>> These should go in the header file. And put the brackets on the same line.
>>
> As I explained earlier, this bloats the header file with empty stubs and
> adds functions to the target transport code via nvmet.h which have
> nothing to do with the backend. Regarding the {} style, I don't see
> braces on the same line for the empty stubs elsewhere, so I'm keeping
> it consistent with what is in the repo.

As I said above, I am not a fan of this style... At the very least, please
remove the static stubs as they will likely generate a defined-but-not-used
compiler warning.

> 
>>> +#endif /* CONFIG_BLK_DEV_ZONED */
>>>
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/9] nvmet: add ZNS support for bdev-ns
@ 2020-11-30  0:16         ` Damien Le Moal
  0 siblings, 0 replies; 50+ messages in thread
From: Damien Le Moal @ 2020-11-30  0:16 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-nvme; +Cc: sagi, hch

On 2020/11/28 9:09, Chaitanya Kulkarni wrote:
> On 11/26/20 00:36, Damien Le Moal wrote:
>> On 2020/11/26 11:42, Chaitanya Kulkarni wrote:
>>> Add zns-bdev-config, id-ctrl, id-ns, zns-cmd-effects, zone-mgmt-send,
>>> zone-mgmt-recv and zone-append handlers for NVMeOF target to enable ZNS
>>> support for bdev.
>>>
>>> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
>>> ---
>>>  drivers/nvme/target/Makefile      |   2 +
>>>  drivers/nvme/target/admin-cmd.c   |   4 +-
>>>  drivers/nvme/target/io-cmd-file.c |   2 +-
>>>  drivers/nvme/target/nvmet.h       |  18 ++
>>>  drivers/nvme/target/zns.c         | 390 ++++++++++++++++++++++++++++++
>>>  5 files changed, 413 insertions(+), 3 deletions(-)
>>>  create mode 100644 drivers/nvme/target/zns.c
>>>
>>> diff --git a/drivers/nvme/target/Makefile b/drivers/nvme/target/Makefile
>>> index ebf91fc4c72e..bc147ff2df5d 100644
>>> --- a/drivers/nvme/target/Makefile
>>> +++ b/drivers/nvme/target/Makefile
>>> @@ -12,6 +12,8 @@ obj-$(CONFIG_NVME_TARGET_TCP)		+= nvmet-tcp.o
>>>  nvmet-y		+= core.o configfs.o admin-cmd.o fabrics-cmd.o \
>>>  			discovery.o io-cmd-file.o io-cmd-bdev.o
>>>  nvmet-$(CONFIG_NVME_TARGET_PASSTHRU)	+= passthru.o
>>> +nvmet-$(CONFIG_BLK_DEV_ZONED)		+= zns.o
>>> +
>>>  nvme-loop-y	+= loop.o
>>>  nvmet-rdma-y	+= rdma.o
>>>  nvmet-fc-y	+= fc.o
>>> diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
>>> index dca34489a1dc..509fd8dcca0c 100644
>>> --- a/drivers/nvme/target/admin-cmd.c
>>> +++ b/drivers/nvme/target/admin-cmd.c
>>> @@ -579,8 +579,8 @@ static void nvmet_execute_identify_nslist(struct nvmet_req *req)
>>>  	nvmet_req_complete(req, status);
>>>  }
>>>  
>>> -static u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
>>> -				    void *id, off_t *off)
>>> +u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
>>> +			     void *id, off_t *off)
>>>  {
>>>  	struct nvme_ns_id_desc desc = {
>>>  		.nidt = type,
>>> diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c
>>> index 0abbefd9925e..2bd10960fa50 100644
>>> --- a/drivers/nvme/target/io-cmd-file.c
>>> +++ b/drivers/nvme/target/io-cmd-file.c
>>> @@ -89,7 +89,7 @@ int nvmet_file_ns_enable(struct nvmet_ns *ns)
>>>  	return ret;
>>>  }
>>>  
>>> -static void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg)
>>> +void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg)
>>>  {
>>>  	bv->bv_page = sg_page(sg);
>>>  	bv->bv_offset = sg->offset;
>>> diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
>>> index 592763732065..0542ba672a31 100644
>>> --- a/drivers/nvme/target/nvmet.h
>>> +++ b/drivers/nvme/target/nvmet.h
>>> @@ -81,6 +81,9 @@ struct nvmet_ns {
>>>  	struct pci_dev		*p2p_dev;
>>>  	int			pi_type;
>>>  	int			metadata_size;
>>> +#ifdef CONFIG_BLK_DEV_ZONED
>>> +	struct nvme_id_ns_zns	id_zns;
>>> +#endif
>>>  };
>>>  
>>>  static inline struct nvmet_ns *to_nvmet_ns(struct config_item *item)
>>> @@ -251,6 +254,10 @@ struct nvmet_subsys {
>>>  	unsigned int		admin_timeout;
>>>  	unsigned int		io_timeout;
>>>  #endif /* CONFIG_NVME_TARGET_PASSTHRU */
>>> +
>>> +#ifdef CONFIG_BLK_DEV_ZONED
>>> +	struct nvme_id_ctrl_zns	id_ctrl_zns;
>>> +#endif
>>>  };
>>>  
>>>  static inline struct nvmet_subsys *to_subsys(struct config_item *item)
>>> @@ -603,4 +610,15 @@ static inline bool nvmet_ns_has_pi(struct nvmet_ns *ns)
>>>  	return ns->pi_type && ns->metadata_size == sizeof(struct t10_pi_tuple);
>>>  }
>>>  
>>> +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req);
>>> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req);
>>> +u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off);
>>> +bool nvmet_bdev_zns_config(struct nvmet_ns *ns);
>>> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req);
>>> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req);
>>> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req);
>>> +void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log);
>>> +u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
>>> +			     void *id, off_t *off);
>>> +void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg);
>>>  #endif /* _NVMET_H */
>>> diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
>>> new file mode 100644
>>> index 000000000000..8ea6641a55e3
>>> --- /dev/null
>>> +++ b/drivers/nvme/target/zns.c
>>> @@ -0,0 +1,390 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +/*
>>> + * NVMe ZNS-ZBD command implementation.
>>> + * Copyright (c) 2020-2021 HGST, a Western Digital Company.
>>> + */
>>> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>>> +#include <linux/uio.h>
>>> +#include <linux/nvme.h>
>>> +#include <linux/blkdev.h>
>>> +#include <linux/module.h>
>>> +#include "nvmet.h"
>>> +
>>> +#ifdef CONFIG_BLK_DEV_ZONED
>> This file is compiled only if CONFIG_BLK_DEV_ZONED is defined, so what is the
>> point of this ? The stubs for the !CONFIG_BLK_DEV_ZONED case should go into the
>> header file, no ?
> 
> Actually the conditional compilation of zns.c with CONFIG_BLK_DEV_ZONED
> 
> needs to be removed in the Makefile. I'm against putting these empty
> stubs in the
> 
> makefile when CONFIG_BLK_DEV_ZONED is not true, as there are several files
> 
> transport, discovery, file/passthru backend etc in the nvme/target/*.c
> which will add
> 
> empty stubs which has nothing to do with zoned bdev backend.

Each file will not need to add empty stubs if these stubs are in a common
header. And empty stubs are compiled away so I do not see the problem. Having
such empty stubs in a header file under an #ifdef is I think a fairly standard
coding style in the kernel. The block layer zone code and scsi SMR code is coded
like that.

> 
> i.e. for Makefile it should be :-
> 
> diff --git a/drivers/nvme/target/Makefile b/drivers/nvme/target/Makefile
> index ebf91fc4c72e..c67276a25363 100644
> --- a/drivers/nvme/target/Makefile
> +++ b/drivers/nvme/target/Makefile
> @@ -10,7 +10,7 @@ obj-$(CONFIG_NVME_TARGET_FCLOOP)      += nvme-fcloop.o
>  obj-$(CONFIG_NVME_TARGET_TCP)          += nvmet-tcp.o
>  
>  nvmet-y                += core.o configfs.o admin-cmd.o fabrics-cmd.o \
> -                       discovery.o io-cmd-file.o io-cmd-bdev.o
> +                       zns,o discovery.o io-cmd-file.o io-cmd-bdev.o
>  nvmet-$(CONFIG_NVME_TARGET_PASSTHRU)   += passthru.o
>  nvme-loop-y    += loop.o
>  nvmet-rdma-y   += rdma.o

The NS scan and test for support of ZNS could go in discovery.c, and compiled
unconditionally exactly like the host nvme driver does. Everything else (ZNS
commands execution) can go into zns.c and that file compiled conditionally, with
empty stubs in zns.h or any other appropriate header file. I think that make
things clean and easy to understand, and avoid the big #ifdef in the C code.
Just my opinion here. You are the maintainer, so your call...

>>> +
>>> +static u16 nvmet_bdev_zns_checks(struct nvmet_req *req)
>>> +{
>>> +	u16 status = 0;
>>> +
>>> +	if (!bdev_is_zoned(req->ns->bdev)) {
>>> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
>>> +		goto out;
>> Why not return status directly here ? Same for the other cases below.
> I prefer centralize returns with goto, which follows the similar code.

Sure, but for such a simple function, this looks rather strange and in my
opinion uselessly complicates the code.

>>> +	}
>>> +
>>> +	if (req->cmd->zmr.zra != NVME_ZRA_ZONE_REPORT) {
>>> +		status = NVME_SC_INVALID_FIELD;
>>> +		goto out;
>>> +	}
>>> +
>>> +	if (req->cmd->zmr.zrasf != NVME_ZRASF_ZONE_REPORT_ALL) {
>>> +		status = NVME_SC_INVALID_FIELD;
>>> +		goto out;
>>> +	}
>>> +
>>> +	if (req->cmd->zmr.pr != NVME_REPORT_ZONE_PARTIAL)
>>> +		status = NVME_SC_INVALID_FIELD;
>>> +out:
>>> +	return status;
>>> +}

[...]
>>> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
>>> +{
>>> +	struct nvme_id_ns_zns *id_zns;
>>> +	u16 status = 0;
>>> +	u64 zsze;
>>> +
>>> +	if (le32_to_cpu(req->cmd->identify.nsid) == NVME_NSID_ALL) {
>>> +		req->error_loc = offsetof(struct nvme_identify, nsid);
>>> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
>>> +		goto out;
>>> +	}
>>> +
>>> +	id_zns = kzalloc(sizeof(*id_zns), GFP_KERNEL);
>>> +	if (!id_zns) {
>>> +		status = NVME_SC_INTERNAL;
>>> +		goto out;
>>> +	}
>>> +
>>> +	req->ns = nvmet_find_namespace(req->sq->ctrl, req->cmd->identify.nsid);
>>> +	if (!req->ns) {
>>> +		status = NVME_SC_INTERNAL;
>>> +		goto done;
>> That will result in nvmet_copy_to_sgl() being executed. Is that OK ?
>> Shouldn't you do only the kfree(id_zns) and complete with an error here ?
> Call to nvmet_copy_to_sgl() zeroout the values if any when we return the
> 
> buffer in case of error. I don't see any problem with zeroing out buffer in
> 
> case of error. Can you please explain why we shouldn't do that ?

I cannot explain anything. I was merely pointing out what the code was doing and
if that was intentional. If there are no problems, then fine.

> 
>>> +	}
>>> +
>>> +	if (!bdev_is_zoned(nvmet_bdev(req))) {
>>> +		req->error_loc = offsetof(struct nvme_identify, nsid);
>>> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
>>> +		goto done;
>> Same comment.
> See above reply.
>>> +	}
>>> +
>>> +	nvmet_ns_revalidate(req->ns);
>>> +	zsze = (bdev_zone_sectors(nvmet_bdev(req)) << 9) >>
>>> +					req->ns->blksize_shift;
>>> +	id_zns->lbafe[0].zsze = cpu_to_le64(zsze);
>>> +	id_zns->mor = cpu_to_le32(bdev_max_open_zones(nvmet_bdev(req)));
>>> +	id_zns->mar = cpu_to_le32(bdev_max_active_zones(nvmet_bdev(req)));
>>> +
>>> +done:
>>> +	status = nvmet_copy_to_sgl(req, 0, id_zns, sizeof(*id_zns));
>>> +	kfree(id_zns);
>>> +out:
>>> +	nvmet_req_complete(req, status);
>>> +}
>>> +
>>> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
>>> +{
>>> +	struct request_queue *q = nvmet_bdev(req)->bd_disk->queue;
>>> +	struct nvme_zone_mgmt_recv_cmd *zmr = &req->cmd->zmr;
>>> +	unsigned int nz = blk_queue_nr_zones(q);
>>> +	u64 bufsize = (zmr->numd << 2) + 1;
>>> +	struct nvme_zone_report *rz;
>>> +	struct blk_zone *zones;
>>> +	int reported_zones;
>>> +	sector_t sect;
>>> +	u64 desc_size;
>>> +	u16 status;
>>> +	int i;
>>> +
>>> +	desc_size = nvmet_zones_to_descsize(blk_queue_nr_zones(q));
>>> +	status = nvmet_bdev_zns_checks(req);
>>> +	if (status)
>>> +		goto out;
>>> +
>>> +	zones = kvcalloc(blkdev_nr_zones(nvmet_bdev(req)->bd_disk),
>>> +			      sizeof(struct blk_zone), GFP_KERNEL);
>> This is not super nice: a large disk will have an enormous number of zones
>> (75000+ for largest SMR HDD today). But you actually do not need more zones
>> descs than what fits in req buffer.
> Call to nvmet_copy_to_sgl() nicely fail and return error.

That is not my point. The point is that this code will do an allocation for
75,000 x 64B = 4.8MB even if a single zone report is being requested. That is
not acceptable. This needs optimization: allocate only as many zone descriptors
as is requested.

>>> +	if (!zones) {
>>> +		status = NVME_SC_INTERNAL;
>>> +		goto out;
>>> +	}
>>> +
>>> +	rz = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY);
>>> +	if (!rz) {
>>> +		status = NVME_SC_INTERNAL;
>>> +		goto out_free_zones;
>>> +	}
>>> +
>>> +	sect = nvmet_lba_to_sect(req->ns, le64_to_cpu(req->cmd->zmr.slba));
>>> +
>>> +	for (nz = blk_queue_nr_zones(q); desc_size >= bufsize; nz--)
>>> +		desc_size = nvmet_zones_to_descsize(nz);
>>> +
>>> +	reported_zones = blkdev_report_zones(nvmet_bdev(req), sect, nz,
>>> +					     nvmet_bdev_report_zone_cb,
>>> +					     zones);
>>> +	if (reported_zones < 0) {
>>> +		status = NVME_SC_INTERNAL;
>>> +		goto out_free_report_zones;
>>> +	}
>>> +
>>> +	rz->nr_zones = cpu_to_le64(reported_zones);
>>> +	for (i = 0; i < reported_zones; i++)
>>> +		nvmet_get_zone_desc(req->ns, &zones[i], &rz->entries[i]);
>> This can be done directly in the report zones cb. That will avoid looping twice
>> over the reported zones.
> Okay, I'll try and remove this loop.
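
Folding the conversion into the report callback would look roughly like this (simplified stand-in structs; the real code uses struct blk_zone, the NVMe zone descriptor, and a sector-to-LBA conversion omitted here):

```c
#include <stdint.h>

/* Simplified stand-ins for struct blk_zone and the NVMe zone descriptor. */
struct zone      { uint64_t start, wp; uint8_t cond; };
struct zone_desc { uint64_t zslba, wp; uint8_t zs; };

struct report_ctx {
	struct zone_desc *entries;
	unsigned int nr;
};

/*
 * report_zones_cb-style callback: convert each reported zone straight
 * into the NVMe report buffer, so no second pass over the zones array
 * (and no intermediate 'zones' allocation) is needed.
 */
int report_cb(struct zone *z, unsigned int idx, void *data)
{
	struct report_ctx *ctx = data;

	ctx->entries[idx].zslba = z->start;	/* LBA conversion elided */
	ctx->entries[idx].wp    = z->wp;
	ctx->entries[idx].zs    = z->cond;
	ctx->nr++;
	return 0;
}
```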
>>> +
>>> +	status = nvmet_copy_to_sgl(req, 0, rz, bufsize);
>>> +
>>> +out_free_report_zones:
>>> +	kvfree(rz);
>>> +out_free_zones:
>>> +	kvfree(zones);
>>> +out:
>>> +	nvmet_req_complete(req, status);
>>> +}
>>> +
>>> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
>>> +{
>>> +	sector_t nr_sect = bdev_zone_sectors(nvmet_bdev(req));
>>> +	struct nvme_zone_mgmt_send_cmd *c = &req->cmd->zms;
>>> +	u16 status = NVME_SC_SUCCESS;
>>> +	enum req_opf op;
>>> +	sector_t sect;
>>> +	int ret;
>>> +
>>> +	sect = nvmet_lba_to_sect(req->ns, le64_to_cpu(req->cmd->zms.slba));
>>> +
>>> +	switch (c->zsa) {
>>> +	case NVME_ZONE_OPEN:
>>> +		op = REQ_OP_ZONE_OPEN;
>>> +		break;
>>> +	case NVME_ZONE_CLOSE:
>>> +		op = REQ_OP_ZONE_CLOSE;
>>> +		break;
>>> +	case NVME_ZONE_FINISH:
>>> +		op = REQ_OP_ZONE_FINISH;
>>> +		break;
>>> +	case NVME_ZONE_RESET:
>>> +		if (c->select_all)
>>> +			nr_sect = get_capacity(nvmet_bdev(req)->bd_disk);
>>> +		op = REQ_OP_ZONE_RESET;
>>> +		break;
>>> +	default:
>>> +		status = NVME_SC_INVALID_FIELD;
>>> +		break;
>> You need a goto here or blkdev_zone_mgmt() will be called.
>>
> True.
>>> +	}
>>> +
>>> +	ret = blkdev_zone_mgmt(nvmet_bdev(req), op, sect, nr_sect, GFP_KERNEL);
>>> +	if (ret)
>>> +		status = NVME_SC_INTERNAL;
>>> +
>>> +	nvmet_req_complete(req, status);
>>> +}
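
The missing-goto fix can be sketched by factoring the action mapping into a helper, so an unknown action never reaches blkdev_zone_mgmt() (illustrative enum values only; the real NVME_ZONE_* and REQ_OP_* constants differ):

```c
/*
 * Map a Zone Send Action to a block-layer op.  Returning OP_INVALID
 * for an unknown action lets the caller complete the request with
 * NVME_SC_INVALID_FIELD without calling blkdev_zone_mgmt() -- the
 * same effect as the goto Damien asks for in the default case.
 */
enum zsa { ZSA_CLOSE = 1, ZSA_FINISH, ZSA_OPEN, ZSA_RESET };
enum op  { OP_INVALID = 0, OP_ZONE_OPEN, OP_ZONE_CLOSE,
	   OP_ZONE_FINISH, OP_ZONE_RESET };

enum op zsa_to_op(int zsa)
{
	switch (zsa) {
	case ZSA_OPEN:   return OP_ZONE_OPEN;
	case ZSA_CLOSE:  return OP_ZONE_CLOSE;
	case ZSA_FINISH: return OP_ZONE_FINISH;
	case ZSA_RESET:  return OP_ZONE_RESET;
	default:         return OP_INVALID;
	}
}
```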
>>> +
>>> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
>>> +{
>>> +	unsigned long bv_cnt = min(req->sg_cnt, BIO_MAX_PAGES);
>>> +	int op = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
>>> +	u64 slba = le64_to_cpu(req->cmd->rw.slba);
>>> +	sector_t sect = nvmet_lba_to_sect(req->ns, slba);
>>> +	u16 status = NVME_SC_SUCCESS;
>>> +	int sg_cnt = req->sg_cnt;
>>> +	struct scatterlist *sg;
>>> +	size_t mapped_data_len;
>>> +	struct iov_iter from;
>>> +	struct bio_vec *bvec;
>>> +	size_t mapped_cnt;
>>> +	size_t io_len = 0;
>>> +	struct bio *bio;
>>> +	int ret;
>>> +
>>> +	if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req)))
>>> +		return;
>> No request completion ?
> See nvmet_check_transfer_len().
>>> +
>>> +	if (!req->sg_cnt) {
>>> +		nvmet_req_complete(req, 0);
>>> +		return;
>>> +	}
>>> +
>>> +	bvec = kmalloc_array(bv_cnt, sizeof(*bvec), GFP_KERNEL);
>>> +	if (!bvec) {
>>> +		status = NVME_SC_INTERNAL;
>>> +		goto out;
>>> +	}
>>> +
>>> +	while (sg_cnt) {
>>> +		mapped_data_len = 0;
>>> +		for_each_sg(req->sg, sg, req->sg_cnt, mapped_cnt) {
>>> +			nvmet_file_init_bvec(bvec, sg);
>>> +			mapped_data_len += bvec[mapped_cnt].bv_len;
>>> +			sg_cnt--;
>>> +			if (mapped_cnt == bv_cnt)
>>> +				break;
>>> +		}
>>> +		iov_iter_bvec(&from, WRITE, bvec, mapped_cnt, mapped_data_len);
>>> +
>>> +		bio = bio_alloc(GFP_KERNEL, bv_cnt);
>>> +		bio_set_dev(bio, nvmet_bdev(req));
>>> +		bio->bi_iter.bi_sector = sect;
>>> +		bio->bi_opf = op;
>>> +
>>> +		ret =  __bio_iov_append_get_pages(bio, &from);
>>> +		if (unlikely(ret)) {
>>> +			status = NVME_SC_INTERNAL;
>>> +			bio_io_error(bio);
>>> +			kfree(bvec);
>>> +			goto out;
>>> +		}
>>> +
>>> +		ret = submit_bio_wait(bio);
>>> +		bio_put(bio);
>>> +		if (ret < 0) {
>>> +			status = NVME_SC_INTERNAL;
>>> +			break;
>>> +		}
>>> +
>>> +		io_len += mapped_data_len;
>>> +	}
>> This loop is equivalent to splitting a zone append. That must not be done as
>> that can lead to totally unpredictable ordering of the chunks. What if another
>> thread is doing zone append to the same zone at the same time ?
>>
> We can add something like per-zone bit locking here to prevent
> multiple threads from doing that. With the zasl value derived from
> max_zone_append_sectors (as mentioned in my reply), ideally we
> shouldn't get a data len larger than what we can handle, if I'm not
> missing something.

No way: a locking mechanism will negate the benefits of zone append vs regular
writes. So NACK on that. As you say, since you advertised the max zone append
sectors, you should not be getting a request larger than that limit. If you do,
fail the request immediately instead of trying to split the zone append command.
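
The "fail instead of split" check can be sketched as follows (a minimal userspace sketch, assuming a 4 KiB minimum memory page size and a nonzero zasl; in the spec a zasl of 0 means the MDTS limit applies instead, which is ignored here):

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed minimum memory page size (CAP.MPSMIN), 4 KiB here. */
#define MPS_MIN 4096u

/*
 * ZASL, like MDTS, encodes a power of two in units of the minimum
 * memory page size: the limit is MPS_MIN << zasl bytes.
 */
uint64_t zasl_max_bytes(uint8_t zasl)
{
	return (uint64_t)MPS_MIN << zasl;
}

/*
 * A zone append whose payload exceeds the advertised limit must be
 * failed up front, never split into multiple bios, since splitting
 * makes the write ordering within the zone unpredictable.
 */
bool zone_append_too_large(uint64_t data_len, uint8_t zasl)
{
	return data_len > zasl_max_bytes(zasl);
}
```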

> 
>>> +
>>> +	sect += (io_len >> 9);
>>> +	req->cqe->result.u64 = le64_to_cpu(nvmet_sect_to_lba(req->ns, sect));
>>> +	kfree(bvec);
>>> +
>>> +out:
>>> +	nvmet_req_complete(req, status);
>>> +}
>>> +
>>> +#else  /* CONFIG_BLK_DEV_ZONED */
>>> +static void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
>>> +{
>>> +}
>>> +static void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
>>> +{
>>> +}
>>> +u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off)
>>> +{
>>> +	return 0;
>>> +}
>>> +static bool nvmet_bdev_zns_config(struct nvmet_ns *ns)
>>> +{
>>> +	return false;
>>> +}
>>> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
>>> +{
>>> +}
>>> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
>>> +{
>>> +}
>>> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
>>> +{
>>> +}
>>> +void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log)
>>> +{
>>> +}
>> These should go in the header file. And put the brackets on the same line.
>>
> As I explained earlier, this bloats the header file with empty stubs
> and adds functions to the target transport code from nvmet.h, which
> has nothing to do with the backend. Regarding the {} style, I don't
> see braces on the same line for the empty stubs, so I'm keeping it
> consistent with what is in the repo.

As I said above, I am not a fan of this style... At the very least, please
remove the static stubs as they will likely generate a "defined but not used"
compiler warning.

> 
>>> +#endif /* CONFIG_BLK_DEV_ZONED */
>>>
> 


-- 
Damien Le Moal
Western Digital Research

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/9] nvmet: add ZNS support for bdev-ns
       [not found]     ` <BYAPR04MB496572575C9B29E682B0C25286F70@BYAPR04MB4965.namprd04.prod.outlook.com>
@ 2020-11-30  0:18         ` Damien Le Moal
  0 siblings, 0 replies; 50+ messages in thread
From: Damien Le Moal @ 2020-11-30  0:18 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-nvme; +Cc: sagi, hch

On 2020/11/28 9:11, Chaitanya Kulkarni wrote:
> On 11/26/20 01:06, Damien Le Moal wrote:
>>> +
>>> +	desc_size = nvmet_zones_to_descsize(blk_queue_nr_zones(q));
>>> +	status = nvmet_bdev_zns_checks(req);
>>> +	if (status)
>>> +		goto out;
>>> +
>>> +	zones = kvcalloc(blkdev_nr_zones(nvmet_bdev(req)->bd_disk),
>>> +			      sizeof(struct blk_zone), GFP_KERNEL);
>>> +	if (!zones) {
>>> +		status = NVME_SC_INTERNAL;
>>> +		goto out;
>>> +	}
>>> +
>>> +	rz = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY);
>>> +	if (!rz) {
>>> +		status = NVME_SC_INTERNAL;
>>> +		goto out_free_zones;
>>> +	}
>>> +
>>> +	sect = nvmet_lba_to_sect(req->ns, le64_to_cpu(req->cmd->zmr.slba));
>>> +
>>> +	for (nz = blk_queue_nr_zones(q); desc_size >= bufsize; nz--)
>>> +		desc_size = nvmet_zones_to_descsize(nz);
>> desc_size is actually not used anywhere to do something. So what is the purpose
>> of this ? If only to determine nz, the number of zones that can be reported,
>> surely you can calculate it instead of using this loop.
>>
> It reads nicely. Let me see if I can get rid of the loop without having to
> add complex calculations.

I do not think it reads nicely at all: it makes what is being "calculated" hard
to understand. And that definitely looks to me like a waste of CPU cycles
compared to a real calculation.


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 50+ messages in thread


* Re: [PATCH 6/9] nvmet: add cns-cs-ns in id-ctrl for ZNS bdev
       [not found]     ` <BYAPR04MB4965E885FFD93A530AF315B886F70@BYAPR04MB4965.namprd04.prod.outlook.com>
@ 2020-11-30  0:21         ` Damien Le Moal
  0 siblings, 0 replies; 50+ messages in thread
From: Damien Le Moal @ 2020-11-30  0:21 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-nvme; +Cc: sagi, hch

On 2020/11/28 9:13, Chaitanya Kulkarni wrote:
> On 11/26/20 00:40, Damien Le Moal wrote:
>>> diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
>>> index e7d2b96cda6b..cd368cbe3855 100644
>>> --- a/drivers/nvme/target/admin-cmd.c
>>> +++ b/drivers/nvme/target/admin-cmd.c
>>> @@ -648,6 +648,10 @@ static void nvmet_execute_identify(struct nvmet_req *req)
>>>  	switch (req->cmd->identify.cns) {
>>>  	case NVME_ID_CNS_NS:
>>>  		return nvmet_execute_identify_ns(req);
>>> +	case NVME_ID_CNS_CS_NS:
>>> +		if (req->cmd->identify.csi == NVME_CSI_ZNS)
>>> +			return nvmet_execute_identify_cns_cs_ns(req);
>>> +		break;
>>>  	case NVME_ID_CNS_CTRL:
>>>  		return nvmet_execute_identify_ctrl(req);
>>>  	case NVME_ID_CNS_CS_CTRL:
>>>
>> Same patch as patch 5 ? Bug ?
> 
> Yes, but right now there is no ns-type other than ZNS that we support.
> Can you please explain? I don't think I understood what bug you are
> referring to.

Patch 5 and 6 in the series you posted look identical to me. It looks like the
same patch was sent twice with a different number. Unless I missed some subtle
difference between them ?


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 50+ messages in thread


* Re: [PATCH 0/9] nvmet: add genblk ZBD backend
  2020-11-30  6:51   ` Damien Le Moal
@ 2020-12-01  3:42     ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 50+ messages in thread
From: Chaitanya Kulkarni @ 2020-12-01  3:42 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, linux-nvme; +Cc: hch, sagi

On 11/29/20 22:52, Damien Le Moal wrote:
>>  block/bio.c                       |   3 +-
>>  drivers/nvme/target/Makefile      |   2 +-
>>  drivers/nvme/target/admin-cmd.c   |  38 ++-
>>  drivers/nvme/target/io-cmd-bdev.c |  12 +
>>  drivers/nvme/target/io-cmd-file.c |   2 +-
>>  drivers/nvme/target/nvmet.h       |  19 ++
>>  drivers/nvme/target/zns.c         | 463 ++++++++++++++++++++++++++++++
>>  include/linux/bio.h               |   1 +
>>  8 files changed, 524 insertions(+), 16 deletions(-)
>>  create mode 100644 drivers/nvme/target/zns.c
>>
> I had a few questions about the failed zonefs tests that you reported in the
> cover letter of V1. Did you run the tests again with V2 ? Do you still see the
> errors or not ?
>
Please have a look at the V3 cover letter; it has the updated test log.



^ permalink raw reply	[flat|nested] 50+ messages in thread


* Re: [PATCH 0/9] nvmet: add genblk ZBD backend
  2020-11-30  3:29 ` Chaitanya Kulkarni
@ 2020-11-30  6:51   ` Damien Le Moal
  -1 siblings, 0 replies; 50+ messages in thread
From: Damien Le Moal @ 2020-11-30  6:51 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-nvme; +Cc: hch, sagi

On 2020/11/30 12:29, Chaitanya Kulkarni wrote:
> NVMeOF Host is capable of handling the NVMe Protocol based Zoned Block
> Devices (ZBD) in the ZNS mode with the passthru backend. There is no
> support for a generic block device backend to handle the ZBD devices
> which are not NVMe devices.
> 
> This adds support to export the ZBD drives (which are not NVMe drives)
> to host from the target with NVMeOF using the host side ZNS interface.
> 
> The patch series is built in a bottom-up manner: it first adds a prep
> patch and the ZNS command-specific handlers on top of genblk and
> updates the data structures, then wires up the admin commands one by
> one in the order the host calls them during the namespace
> initialization sequence. Once everything is ready, it wires up the
> I/O command handlers. See below for the patch-series overview.
> 
> I've tested the zonefs test cases with a null_blk memory-backed
> NVMeOF namespace over the nvme-loop transport. The same test cases
> pass on the NVMeOF zbd-ns and also pass for null_blk without NVMeOF.
> 
> Regards,
> Chaitanya
> 
> Changes from V1:-
> 
> 1. Remove the nvmet-$(CONFIG_BLK_DEV_ZONED) += zns.o.
> 2. Mark helpers inline.
> 3. Fix typos in the comments and update the comments.
> 4. Get rid of the curly brackets.
> 5. Don't allow drives with last smaller zones.
> 6. Calculate the zasl as a function of max_zone_append_sectors,
>    bio_max_pages so we don't have to split the bio.
> 7. Add global subsys->zasl and update the zasl when new namespace
>    is enabled.
> 8. Remove the loop in nvmet_bdev_execute_zone_mgmt_recv() and
>    move the functionality into the report zone callback.
> 9. Add goto for default case in nvmet_bdev_execute_zone_mgmt_send().
> 10. Allocate the zones buffer with the zones size instead of bdev nr_zones.
> 
> Chaitanya Kulkarni (9):
>   block: export __bio_iov_append_get_pages()
> 	Prep patch needed for implementing Zone Append.
>   nvmet: add ZNS support for bdev-ns
> 	Core Command handlers and various helpers for ZBD backend which
> 	 will be called by target-core/target-admin etc.
>   nvmet: trim down id-desclist to use req->ns
> 	Cleanup needed to avoid code repetition when passing extra
> 	function parameters for ZBD backend handlers.
>   nvmet: add NVME_CSI_ZNS in ns-desc for zbdev
> 	Allows host to identify a zoned namespace.
>   nvmet: add cns-cs-ctrl in id-ctrl for ZNS bdev
> 	Allows host to identify controller with the ZBD-ZNS.
>   nvmet: add cns-cs-ns in id-ctrl for ZNS bdev
> 	Allows host to identify namespace with the ZBD-ZNS.
>   nvmet: add zns cmd effects to support zbdev
> 	Allows host to support the ZNS commands when zoned-blkdev is
> 	 selected.
>   nvmet: add zns bdev config support
> 	Allows user to override any target namespace attributes for
> 	 ZBD.
>   nvmet: add ZNS based I/O cmds handlers
> 	Handlers for Zone-Mgmt-Send/Zone-Mgmt-Recv/Zone-Append.
> 
>  block/bio.c                       |   3 +-
>  drivers/nvme/target/Makefile      |   2 +-
>  drivers/nvme/target/admin-cmd.c   |  38 ++-
>  drivers/nvme/target/io-cmd-bdev.c |  12 +
>  drivers/nvme/target/io-cmd-file.c |   2 +-
>  drivers/nvme/target/nvmet.h       |  19 ++
>  drivers/nvme/target/zns.c         | 463 ++++++++++++++++++++++++++++++
>  include/linux/bio.h               |   1 +
>  8 files changed, 524 insertions(+), 16 deletions(-)
>  create mode 100644 drivers/nvme/target/zns.c
> 

I had a few questions about the failed zonefs tests that you reported in the
cover letter of V1. Did you run the tests again with V2 ? Do you still see the
errors or not ?


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 50+ messages in thread


* [PATCH 0/9] nvmet: add genblk ZBD backend
@ 2020-11-30  3:29 ` Chaitanya Kulkarni
  0 siblings, 0 replies; 50+ messages in thread
From: Chaitanya Kulkarni @ 2020-11-30  3:29 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: hch, sagi, damien.lemoal, Chaitanya Kulkarni

NVMeOF Host is capable of handling the NVMe Protocol based Zoned Block
Devices (ZBD) in the ZNS mode with the passthru backend. There is no
support for a generic block device backend to handle the ZBD devices
which are not NVMe devices.

This adds support to export the ZBD drives (which are not NVMe drives)
to host from the target with NVMeOF using the host side ZNS interface.

The patch series is built in a bottom-up manner: it first adds a prep
patch and the ZNS command-specific handlers on top of genblk and
updates the data structures, then wires up the admin commands one by
one in the order the host calls them during the namespace
initialization sequence. Once everything is ready, it wires up the
I/O command handlers. See below for the patch-series overview.

I've tested the zonefs test cases with a null_blk memory-backed
NVMeOF namespace over the nvme-loop transport. The same test cases
pass on the NVMeOF zbd-ns and also pass for null_blk without NVMeOF.

Regards,
Chaitanya

Changes from V1:-

1. Remove the nvmet-$(CONFIG_BLK_DEV_ZONED) += zns.o.
2. Mark helpers inline.
3. Fix typos in the comments and update the comments.
4. Get rid of the curly brackets.
5. Don't allow drives with last smaller zones.
6. Calculate the zasl as a function of max_zone_append_sectors,
   bio_max_pages so we don't have to split the bio.
7. Add global subsys->zasl and update the zasl when new namespace
   is enabled.
8. Remove the loop in nvmet_bdev_execute_zone_mgmt_recv() and
   move the functionality into the report zone callback.
9. Add goto for default case in nvmet_bdev_execute_zone_mgmt_send().
10. Allocate the zones buffer with the zones size instead of bdev nr_zones.

Chaitanya Kulkarni (9):
  block: export __bio_iov_append_get_pages()
	Prep patch needed for implementing Zone Append.
  nvmet: add ZNS support for bdev-ns
	Core Command handlers and various helpers for ZBD backend which
	 will be called by target-core/target-admin etc.
  nvmet: trim down id-desclist to use req->ns
	Cleanup needed to avoid code repetition when passing extra
	function parameters for ZBD backend handlers.
  nvmet: add NVME_CSI_ZNS in ns-desc for zbdev
	Allows host to identify a zoned namespace.
  nvmet: add cns-cs-ctrl in id-ctrl for ZNS bdev
	Allows host to identify controller with the ZBD-ZNS.
  nvmet: add cns-cs-ns in id-ctrl for ZNS bdev
	Allows host to identify namespace with the ZBD-ZNS.
  nvmet: add zns cmd effects to support zbdev
	Allows host to support the ZNS commands when zoned-blkdev is
	 selected.
  nvmet: add zns bdev config support
	Allows user to override any target namespace attributes for
	 ZBD.
  nvmet: add ZNS based I/O cmds handlers
	Handlers for Zone-Mgmt-Send/Zone-Mgmt-Recv/Zone-Append.

 block/bio.c                       |   3 +-
 drivers/nvme/target/Makefile      |   2 +-
 drivers/nvme/target/admin-cmd.c   |  38 ++-
 drivers/nvme/target/io-cmd-bdev.c |  12 +
 drivers/nvme/target/io-cmd-file.c |   2 +-
 drivers/nvme/target/nvmet.h       |  19 ++
 drivers/nvme/target/zns.c         | 463 ++++++++++++++++++++++++++++++
 include/linux/bio.h               |   1 +
 8 files changed, 524 insertions(+), 16 deletions(-)
 create mode 100644 drivers/nvme/target/zns.c

-- 
2.22.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

