linux-nvme.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V12 0/3] nvmet: add ZBD backend support
@ 2021-03-11  7:15 Chaitanya Kulkarni
  2021-03-11  7:15 ` [PATCH V12 1/3] nvmet: add NVM Command Set Identifier support Chaitanya Kulkarni
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Chaitanya Kulkarni @ 2021-03-11  7:15 UTC (permalink / raw)
  To: linux-nvme; +Cc: hch, kbusch, sagi, damien.lemoal, Chaitanya Kulkarni

Hi,

NVMeOF Host is capable of handling the NVMe Protocol based Zoned Block
Devices (ZBD) in the Zoned Namespaces (ZNS) mode with the passthru
backend. There is no support for a generic block device backend to
handle the ZBDs which are not NVMe protocol compliant.

This adds support to export the ZBDs (which are not NVMe drives) to host
from the target via NVMeOF using the host side ZNS interface.

Note: This patch-series is based on nvme-5.13.

Following scenarios tested successfully:-

* Zonefs Test with dm-linear on the top of SMR HDD exported over NVMeOF.
* Zonefs Test With CONFIG_BLK_DEV_ZONED nvme genblk target and ZBD with
  null_blk zoned target with zonefs.
* Without CONFIG_BLK_DEV_ZONED nvme tests on genblk target.

-ck

Changes from V11:-

1. Use bdev_logical_block_size() in nvmet_bdev_zns_enable().
2. Return error if nr_zones calculation ends up having value 0.
4. Drop the comment patch from this series :-
   nvme: add comments to nvme_zns_alloc_report_buffer

Changes from V10:-

1.  Move CONFIG_BLK_DEV_ZONED check into the caller of
    nvmet_set_csi_zns_effects().
2.  Move ZNS related csi code from csi patch to its own ZNS backend
    patch.
3.  For ZNS command effects logs set default command effects with 
    nvmet_set_csi_nvm_effects() along with nvmet_set_csi_zns_effects().
4.  Use goto for failure case in the nvmet_set_csi_zns_effects().
5.  Return directly from swicth in nvmet_execute_identify_desclist_csi().
6.  Merge Patch 2nd/3rd into one patch and move ZNS related code
    into its own :-
     [PATCH V10 2/8] nvmet: add NVM Command Set Identifier support
     [PATCH V10 3/8] nvmet: add command set supported ctrl cap
    Merged into new patch minux ZNS calls :-
     [PATCH V11 1/4] nvmet: add NVM Command Set Identifier support
7.  Move req->cmd->identify.csi == NVME_CSI_ZNS checks in to respective
    caller in nvmet_execute_identify().
8.  Update error log page in nvmet_bdev_zns_checks().
9.  Remove the terney expression nvmet_zns_update_zasl().
10. Drop following patches:-
     [PATCH V10 1/8] nvmet: trim args for nvmet_copy_ns_identifier()
     [PATCH V10 6/8] nvme-core: check ctrl css before setting up zns
     [PATCH V10 7/8] nvme-core: add a helper to print css related error

Changes from V9:-

1.  Use bio_add_zone_append_page() for REQ_OP_ZONE_APPEND.
2.  Add a separate prep patch to reduce the arguments for
    nvmet_copy_ns_identifier().
3.  Add a separate patch for nvmet CSI support.
4.  Add a separate patch for nvmet CSS support.
5.  Move nvmet_cc_css_check() to multi-css supported patch.
6.  Return error in csi cmd effects helper when !CONFIG_BLK_DEV_ZONED.
7.  Return error in id desc list helper when !CONFIG_BLK_DEV_ZONED.
8.  Remove goto and return from nvmet_bdev_zns_checks().
9.  Move nr_zones calculations near call to blkdev_report_zones() in
    nvmet_bdev_execute_zone_mgmt_recv().
10. Split command effects logs into respective CSI helpers.
11. Don't use the local variables to pass NVME_CSI_XXX values, instead
    use req->ns->csi, also move this from ZBD support patch to nvmet
    CSI support.
12. Fix the bug that is chekcing cns value instead of csi in identify.
13. bdev_is_zoned() is stubbed out the CONFIG_BLK_DEV_ZONED, so remove
    the check for CONFIG_BLK_DEV_ZONED before calling bdev_is_zoned().
14  Drop following patches :-
    [PATCH V9 1/9] block: export bio_add_hw_pages().
    [PATCH V9 5/9] nvmet: add bio get helper.
    [PATCH V9 6/9] nvmet: add bio init helper.
    [PATCH V9 8/9] nvmet: add common I/O length check helper.
    [PATCH V9 9/9] nvmet: call nvmet_bio_done() for zone append.
15. Add a patch to check for ctrl bits to make sure ctrl supports the
    multi css on the host side when setting up Zoned Namespace.
17. Add a documentation patch for the host side when calculating the
    buffer size allocation size for the report zones.
16. Rebase and retest 5.12-rc1.

Changes from V8:-

1. Rebase and retest on latest nvme-5.11.
2. Export ctrl->cap csi support only if CONFIG_BLK_DEV_ZONE is set.
3. Add a fix to admin ns-desc list handler for handling default csi.

Changes from V7:-

1. Just like what block layer provides an API for bio_init(), provide
   nvmet_bio_init() such that we move bio initialization code for
   nvme-read-write commands from bdev and zns backend into the centralize
   helper. 
2. With bdev/zns/file now we have three backends that are checking for
   req->sg_cnt and calling nvmet_check_transfer_len() before we process
   nvme-read-write commands. Move this duplicate code from three
   backeneds into the helper.
3. Export and use nvmet_bio_done() callback in
   nvmet_execute_zone_append() instead of the open coding the function.
   This also avoids code duplication for bio & request completion with
   error log page update.
4. Add zonefs tests log for dm linear device created on the top of SMR HDD
   exported with NVMeOF ZNS backend with the help of nvme-loop.

Changes from V6:-

1. Instead of calling report zones to find conventional zones in the 
   loop use the loop inside LLD blkdev_report_zones()->LLD_report_zones,
   that also simplifies the report zone callback.
2. Fix the bug in nvmet_bdev_has_conv_zones().
3. Remove conditional operators in the nvmet_bdev_execute_zone_append().

Changes from V5:-

1.  Use bio->bi_iter.bi_sector for result of the REQ_OP_ZONE_APPEND
    command.
2.  Add endianness to the helper nvmet_sect_to_lba().
3.  Make bufsize u32 in zone mgmt recv command handler.
4.  Add __GFP_ZERO for report zone data buffer to return clean buffer.

Changes from V4:-

1.  Don't use bio_iov_iter_get_pages() instead add a patch to export
    bio_add_hw_page() and call it directly for zone append.
2.  Add inline vector optimization for append bio.
3.  Update the commit logs for the patches.
4.  Remove ZNS related identify data structures, use individual members.
5.  Add a comment for macro NVMET_MPSMIN_SHIFT.
6.  Remove nvmet_bdev() helper.
7.  Move the command set identifier code into common code.
8.  Use IS_ENABLED() and move helpers fomr zns.c into common code.
9.  Add a patch to support Command Set identifiers.
10. Open code nvmet_bdev_validate_zns_zones().
11. Remove the per namespace min zasl calculation and don't allow
    namespaces with zasl value > the first ns zasl value.
12. Move the stubs into the header file.
13. Add lba to/from sector conversion helpers and update the
    io-cmd-bdev.c to avoid the code duplication.
14. Add everything into one patch for zns command handlers and 
    respective calls from the target code.
15. Remove the trim ns-desclist admin callback patch from this series.
16. Add bio get and put helpers patches to reduce the duplicate code in
    generic bdev, passthru, and generic zns backend.

Changes from V3:-

1.  Get rid of the bio_max_zasl check.
2.  Remove extra lines.
3.  Remove the block layer api export patch.
4.  Remove the bvec check in the bio_iov_iter_get_pages() for
    REQ_OP_ZONE_APPEND so that we can reuse the code.

Changes from V2:-

1.  Move conventional zone bitmap check into 
    nvmet_bdev_validate_zns_zones(). 
2.  Don't use report zones call to check the runt zone.
3.  Trim nvmet_zasl() helper.
4.  Fix typo in the nvmet_zns_update_zasl().
5.  Remove the comment and fix the mdts calculation in
    nvmet_execute_identify_cns_cs_ctrl().
6.  Use u64 for bufsize in nvmet_bdev_execute_zone_mgmt_recv().
7.  Remove nvmet_zones_to_desc_size() and fix the nr_zones
    calculation.
8.  Remove the op variable in nvmet_bdev_execute_zone_append().
9.  Fix the nr_zones calculation nvmet_bdev_execute_zone_mgmt_recv().
10. Update cover letter subject.

Changes from V1:-

1.  Remove the nvmet-$(CONFIG_BLK_DEV_ZONED) += zns.o.
2.  Mark helpers inline.
3.  Fix typos in the comments and update the comments.
4.  Get rid of the curly brackets.
5.  Don't allow drives with last smaller zones.
6.  Calculate the zasl as a function of ax_zone_append_sectors,
    bio_max_pages so we don't have to split the bio.
7.  Add global subsys->zasl and update the zasl when new namespace
    is enabled.
8.  Rmove the loop in the nvmet_bdev_execute_zone_mgmt_recv() and
    move functionality in to the report zone callback.
9.  Add goto for default case in nvmet_bdev_execute_zone_mgmt_send().
10. Allocate the zones buffer with zones size instead of bdev nr_zones.

Chaitanya Kulkarni (3):
  nvmet: add NVM Command Set Identifier support
  nvmet: add ZBD over ZNS backend support
  nvmet: add nvmet_req_bio put helper for backends

 drivers/nvme/target/Makefile      |   1 +
 drivers/nvme/target/admin-cmd.c   |  74 ++++++-
 drivers/nvme/target/core.c        |  16 +-
 drivers/nvme/target/io-cmd-bdev.c |  37 +++-
 drivers/nvme/target/nvmet.h       |  45 ++++
 drivers/nvme/target/passthru.c    |   3 +-
 drivers/nvme/target/zns.c         | 331 ++++++++++++++++++++++++++++++
 include/linux/nvme.h              |   1 +
 8 files changed, 487 insertions(+), 21 deletions(-)
 create mode 100644 drivers/nvme/target/zns.c

* Zonefs Test log with dm-linear on the top of SMR HDD:-
--------------------------------------------------------------------------------

1. Test Zoned Block Device info :- 
--------------------------------------------------------------------------------

# fdisk  -l /dev/sdh
Disk /dev/sdh: 13.64 TiB, 15000173281280 bytes, 3662151680 sectors
Disk model: HGST HSH721415AL
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
# cat /sys/block/sdh/queue/nr_zones
55880
# cat /sys/block/sdh/queue/zoned
host-managed
# cat /sys/block/sdh/queue/zone_append_max_bytes
688128

2. Creating NVMeOF target backed by dm-linear on the top of ZBD
--------------------------------------------------------------------------------

# ./zbdev.sh 1 dm-zbd
++ NQN=dm-zbd
++ echo '0 29022486528 linear /dev/sdh 274726912' | dmsetup create cksdh
9 directories, 4 files
++ mkdir /sys/kernel/config/nvmet/subsystems/dm-zbd
++ mkdir /sys/kernel/config/nvmet/subsystems/dm-zbd/namespaces/1
++ echo -n /dev/dm-0
++ cat /sys/kernel/config/nvmet/subsystems/dm-zbd/namespaces/1/device_path
/dev/dm-0
++ echo 1
++ mkdir /sys/kernel/config/nvmet/ports/1/
++ echo -n loop
++ echo -n 1
++ ln -s /sys/kernel/config/nvmet/subsystems/dm-zbd /sys/kernel/config/nvmet/ports/1/subsystems/
++ sleep 1
++ echo transport=loop,nqn=dm-zbd
++ sleep 1
++ dmesg -c
[233450.572565] nvmet: adding nsid 1 to subsystem dm-zbd
[233452.269477] nvmet: creating controller 1 for subsystem dm-zbd for NQN nqn.2014-08.org.nvmexpress:uuid:853d7e82-8018-44ce-8784-ab81e7465ad9.
[233452.283352] nvme nvme0: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices.
[233452.292805] nvme nvme0: creating 8 I/O queues.
[233452.299210] nvme nvme0: new ctrl: "dm-zbd"

3. dm-linear and backend SMR HDD association :-
--------------------------------------------------------------------------------

# cat /sys/kernel/config/nvmet/subsystems/dm-zbd/namespaces/1/device_path 
/dev/dm-0
# dmsetup ls --tree
 cksdh (252:0)
	 └─ (8:112)
# lsblk | head -3
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sdh       8:112  0  13.6T  0 disk 
└─cksdh 252:0    0  13.5T  0 dm   

4. NVMeOF controller :- 
--------------------------------------------------------------------------------

# nvme list | tr -s ' ' ' '  
Node SN Model Namespace Usage Format FW Rev  
/dev/nvme0n1 8c6f348dcd64404c Linux 1 14.86 TB / 14.86 TB 4 KiB + 0 B 5.10.0nv

5. Zonefs tests results :-
--------------------------------------------------------------------------------

# ./zonefs-tests.sh /dev/nvme0n1 
Gathering information on /dev/nvme0n1...
zonefs-tests on /dev/nvme0n1:
  55356 zones (0 conventional zones, 55356 sequential zones)
  524288 512B sectors zone size (256 MiB)
  1 max open zones
Running tests
  Test 0010:  mkzonefs (options)                                   ... PASS
  Test 0011:  mkzonefs (force format)                              ... PASS
  Test 0012:  mkzonefs (invalid device)                            ... PASS
  Test 0013:  mkzonefs (super block zone state)                    ... PASS
  Test 0020:  mount (default)                                      ... PASS
  Test 0021:  mount (invalid device)                               ... PASS
  Test 0022:  mount (check mount directory sub-directories)        ... PASS
  Test 0023:  mount (options)                                      ... PASS
  Test 0030:  Number of files (default)                            ... PASS
  Test 0031:  Number of files (aggr_cnv)                           ... skip
  Test 0032:  Number of files using stat (default)                 ... PASS
  Test 0033:  Number of files using stat (aggr_cnv)                ... PASS
  Test 0034:  Number of blocks using stat (default)                ... PASS
  Test 0035:  Number of blocks using stat (aggr_cnv)               ... PASS
  Test 0040:  Files permissions (default)                          ... PASS
  Test 0041:  Files permissions (aggr_cnv)                         ... skip
  Test 0042:  Files permissions (set value)                        ... PASS
  Test 0043:  Files permissions (set value + aggr_cnv)             ... skip
  Test 0050:  Files owner (default)                                ... PASS
  Test 0051:  Files owner (aggr_cnv)                               ... skip
  Test 0052:  Files owner (set value)                              ... PASS
  Test 0053:  Files owner (set value + aggr_cnv)                   ... skip
  Test 0060:  Files size (default)                                 ... PASS
  Test 0061:  Files size (aggr_cnv)                                ... skip
  Test 0070:  Conventional file truncate                           ... skip
  Test 0071:  Conventional file truncate (aggr_cnv)                ... skip
  Test 0072:  Conventional file unlink                             ... skip
  Test 0073:  Conventional file unlink (aggr_cnv)                  ... skip
  Test 0074:  Conventional file random write                       ... skip
  Test 0075:  Conventional file random write (direct)              ... skip
  Test 0076:  Conventional file random write (aggr_cnv)            ... skip
  Test 0077:  Conventional file random write (aggr_cnv, direct)    ... skip
  Test 0078:  Conventional file mmap read/write                    ... skip
  Test 0079:  Conventional file mmap read/write (aggr_cnv)         ... skip
  Test 0080:  Sequential file truncate                             ... PASS
  Test 0081:  Sequential file unlink                               ... PASS
  Test 0082:  Sequential file buffered write IO                    ... PASS
  Test 0083:  Sequential file overwrite                            ... PASS
  Test 0084:  Sequential file unaligned write (sync IO)            ... PASS
  Test 0085:  Sequential file unaligned write (async IO)           ... PASS
  Test 0086:  Sequential file append (sync)                        ... PASS
  Test 0087:  Sequential file append (async)                       ... PASS
  Test 0088:  Sequential file random read                          ... PASS
  Test 0089:  Sequential file mmap read/write                      ... PASS
  Test 0090:  sequential file 4K synchronous write                 ... PASS
  Test 0091:  Sequential file large synchronous write              ... PASS

46 / 46 tests passed

* With CONFIG_BLK_DEV_ZONED nvme and zonefs tests on membacked null_blk zoned :-
--------------------------------------------------------------------------------

# grep -i blk_dev_zoned .config
CONFIG_BLK_DEV_ZONED=y
# make M=drivers/nvme/ clean 
  CLEAN   drivers/nvme//Module.symvers
# make M=drivers/nvme/ 
  CC [M]  drivers/nvme//host/core.o
  CC [M]  drivers/nvme//host/trace.o
  CC [M]  drivers/nvme//host/lightnvm.o
  CC [M]  drivers/nvme//host/zns.o
  CC [M]  drivers/nvme//host/hwmon.o
  LD [M]  drivers/nvme//host/nvme-core.o
  CC [M]  drivers/nvme//host/pci.o
  LD [M]  drivers/nvme//host/nvme.o
  CC [M]  drivers/nvme//host/fabrics.o
  LD [M]  drivers/nvme//host/nvme-fabrics.o
  CC [M]  drivers/nvme//host/rdma.o
  LD [M]  drivers/nvme//host/nvme-rdma.o
  CC [M]  drivers/nvme//host/fc.o
  LD [M]  drivers/nvme//host/nvme-fc.o
  CC [M]  drivers/nvme//host/tcp.o
  LD [M]  drivers/nvme//host/nvme-tcp.o
  CC [M]  drivers/nvme//target/core.o
  CC [M]  drivers/nvme//target/configfs.o
  CC [M]  drivers/nvme//target/admin-cmd.o
  CC [M]  drivers/nvme//target/fabrics-cmd.o
  CC [M]  drivers/nvme//target/discovery.o
  CC [M]  drivers/nvme//target/io-cmd-file.o
  CC [M]  drivers/nvme//target/io-cmd-bdev.o
  CC [M]  drivers/nvme//target/passthru.o
  CC [M]  drivers/nvme//target/zns.o
  CC [M]  drivers/nvme//target/trace.o
  LD [M]  drivers/nvme//target/nvmet.o
  CC [M]  drivers/nvme//target/loop.o
  LD [M]  drivers/nvme//target/nvme-loop.o
  CC [M]  drivers/nvme//target/rdma.o
  LD [M]  drivers/nvme//target/nvmet-rdma.o
  CC [M]  drivers/nvme//target/fc.o
  LD [M]  drivers/nvme//target/nvmet-fc.o
  CC [M]  drivers/nvme//target/fcloop.o
  LD [M]  drivers/nvme//target/nvme-fcloop.o
  CC [M]  drivers/nvme//target/tcp.o
  LD [M]  drivers/nvme//target/nvmet-tcp.o
  MODPOST drivers/nvme//Module.symvers
  CC [M]  drivers/nvme//host/nvme-core.mod.o
  LD [M]  drivers/nvme//host/nvme-core.ko
  CC [M]  drivers/nvme//host/nvme-fabrics.mod.o
  LD [M]  drivers/nvme//host/nvme-fabrics.ko
  CC [M]  drivers/nvme//host/nvme-fc.mod.o
  LD [M]  drivers/nvme//host/nvme-fc.ko
  CC [M]  drivers/nvme//host/nvme-rdma.mod.o
  LD [M]  drivers/nvme//host/nvme-rdma.ko
  CC [M]  drivers/nvme//host/nvme-tcp.mod.o
  LD [M]  drivers/nvme//host/nvme-tcp.ko
  CC [M]  drivers/nvme//host/nvme.mod.o
  LD [M]  drivers/nvme//host/nvme.ko
  CC [M]  drivers/nvme//target/nvme-fcloop.mod.o
  LD [M]  drivers/nvme//target/nvme-fcloop.ko
  CC [M]  drivers/nvme//target/nvme-loop.mod.o
  LD [M]  drivers/nvme//target/nvme-loop.ko
  CC [M]  drivers/nvme//target/nvmet-fc.mod.o
  LD [M]  drivers/nvme//target/nvmet-fc.ko
  CC [M]  drivers/nvme//target/nvmet-rdma.mod.o
  LD [M]  drivers/nvme//target/nvmet-rdma.ko
  CC [M]  drivers/nvme//target/nvmet-tcp.mod.o
  LD [M]  drivers/nvme//target/nvmet-tcp.ko
  CC [M]  drivers/nvme//target/nvmet.mod.o
  LD [M]  drivers/nvme//target/nvmet.ko
# 
# cdblktests 
# ./check tests/nvme/
nvme/002 (create many subsystems and test discovery)         [passed]
    runtime  24.378s  ...  24.636s
nvme/003 (test if we're sending keep-alives to a discovery controller) [passed]
    runtime  10.133s  ...  10.152s
nvme/004 (test nvme and nvmet UUID NS descriptors)           [passed]
    runtime  2.463s  ...  2.478s
nvme/005 (reset local loopback target)                       [not run]
    nvme_core module does not have parameter multipath
nvme/006 (create an NVMeOF target with a block device-backed ns) [passed]
    runtime  0.095s  ...  0.122s
nvme/007 (create an NVMeOF target with a file-backed ns)     [passed]
    runtime  0.065s  ...  0.079s
nvme/008 (create an NVMeOF host with a block device-backed ns) [passed]
    runtime  2.473s  ...  2.501s
nvme/009 (create an NVMeOF host with a file-backed ns)       [passed]
    runtime  2.460s  ...  2.424s
nvme/010 (run data verification fio job on NVMeOF block device-backed ns) [passed]
    runtime  24.526s  ...  28.015s
nvme/011 (run data verification fio job on NVMeOF file-backed ns) [passed]
    runtime  265.967s  ...  282.717s
nvme/012 (run mkfs and data verification fio job on NVMeOF block device-backed ns) [passed]
    runtime  44.665s  ...  48.124s
nvme/013 (run mkfs and data verification fio job on NVMeOF file-backed ns) [passed]
    runtime  261.739s  ...  352.331s
nvme/014 (flush a NVMeOF block device-backed ns)             [passed]
    runtime  21.268s  ...  22.013s
nvme/015 (unit test for NVMe flush for file backed ns)       [passed]
    runtime  18.820s  ...  22.104s
nvme/016 (create/delete many NVMeOF block device-backed ns and test discovery) [passed]
    runtime  13.899s  ...  14.322s
nvme/017 (create/delete many file-ns and test discovery)     [passed]
    runtime  14.322s  ...  14.031s
nvme/018 (unit test NVMe-oF out of range access on a file backend) [passed]
    runtime  2.450s  ...  2.444s
nvme/019 (test NVMe DSM Discard command on NVMeOF block-device ns) [passed]
    runtime  2.475s  ...  2.489s
nvme/020 (test NVMe DSM Discard command on NVMeOF file-backed ns) [passed]
    runtime  2.410s  ...  2.448s
nvme/021 (test NVMe list command on NVMeOF file-backed ns)   [passed]
    runtime  2.441s  ...  2.439s
nvme/022 (test NVMe reset command on NVMeOF file-backed ns)  [passed]
    runtime  2.864s  ...  2.863s
nvme/023 (test NVMe smart-log command on NVMeOF block-device ns) [passed]
    runtime  2.465s  ...  2.446s
nvme/024 (test NVMe smart-log command on NVMeOF file-backed ns) [passed]
    runtime  2.416s  ...  2.411s
nvme/025 (test NVMe effects-log command on NVMeOF file-backed ns) [passed]
    runtime  2.419s  ...  2.748s
nvme/026 (test NVMe ns-descs command on NVMeOF file-backed ns) [passed]
    runtime  2.422s  ...  2.410s
nvme/027 (test NVMe ns-rescan command on NVMeOF file-backed ns) [passed]
    runtime  2.456s  ...  2.462s
nvme/028 (test NVMe list-subsys command on NVMeOF file-backed ns) [passed]
    runtime  2.427s  ...  2.429s
nvme/029 (test userspace IO via nvme-cli read/write interface) [passed]
    runtime  2.751s  ...  2.755s
nvme/030 (ensure the discovery generation counter is updated appropriately) [passed]
    runtime  0.346s  ...  0.357s
nvme/031 (test deletion of NVMeOF controllers immediately after setup) [passed]
    runtime  13.601s  ...  13.591s
nvme/038 (test deletion of NVMeOF subsystem without enabling) [passed]
    runtime  0.039s  ...  0.059s
#
# cdzonefstest 
# ./zonefs-tests.sh /dev/nvme1n1 
Gathering information on /dev/nvme1n1...
zonefs-tests on /dev/nvme1n1:
  16 zones (0 conventional zones, 16 sequential zones)
  131072 512B sectors zone size (64 MiB)
  1 max open zones
Running tests
  Test 0010:  mkzonefs (options)                                   ... PASS
  Test 0011:  mkzonefs (force format)                              ... PASS
  Test 0012:  mkzonefs (invalid device)                            ... PASS
  Test 0013:  mkzonefs (super block zone state)                    ... PASS
  Test 0020:  mount (default)                                      ... PASS
  Test 0021:  mount (invalid device)                               ... PASS
  Test 0022:  mount (check mount directory sub-directories)        ... PASS
  Test 0023:  mount (options)                                      ... PASS
  Test 0030:  Number of files (default)                            ... PASS
  Test 0031:  Number of files (aggr_cnv)                           ... skip
  Test 0032:  Number of files using stat (default)                 ... PASS
  Test 0033:  Number of files using stat (aggr_cnv)                ... PASS
  Test 0034:  Number of blocks using stat (default)                ... PASS
  Test 0035:  Number of blocks using stat (aggr_cnv)               ... PASS
  Test 0040:  Files permissions (default)                          ... PASS
  Test 0041:  Files permissions (aggr_cnv)                         ... skip
  Test 0042:  Files permissions (set value)                        ... PASS
  Test 0043:  Files permissions (set value + aggr_cnv)             ... skip
  Test 0050:  Files owner (default)                                ... PASS
  Test 0051:  Files owner (aggr_cnv)                               ... skip
  Test 0052:  Files owner (set value)                              ... PASS
  Test 0053:  Files owner (set value + aggr_cnv)                   ... skip
  Test 0060:  Files size (default)                                 ... PASS
  Test 0061:  Files size (aggr_cnv)                                ... skip
  Test 0070:  Conventional file truncate                           ... skip
  Test 0071:  Conventional file truncate (aggr_cnv)                ... skip
  Test 0072:  Conventional file unlink                             ... skip
  Test 0073:  Conventional file unlink (aggr_cnv)                  ... skip
  Test 0074:  Conventional file random write                       ... skip
  Test 0075:  Conventional file random write (direct)              ... skip
  Test 0076:  Conventional file random write (aggr_cnv)            ... skip
  Test 0077:  Conventional file random write (aggr_cnv, direct)    ... skip
  Test 0078:  Conventional file mmap read/write                    ... skip
  Test 0079:  Conventional file mmap read/write (aggr_cnv)         ... skip
  Test 0080:  Sequential file truncate                             ... PASS
  Test 0081:  Sequential file unlink                               ... PASS
  Test 0082:  Sequential file buffered write IO                    ... PASS
  Test 0083:  Sequential file overwrite                            ... PASS
  Test 0084:  Sequential file unaligned write (sync IO)            ... PASS
  Test 0085:  Sequential file unaligned write (async IO)           ... PASS
  Test 0086:  Sequential file append (sync)                        ... PASS
  Test 0087:  Sequential file append (async)                       ... PASS
  Test 0088:  Sequential file random read                          ... PASS
  Test 0089:  Sequential file mmap read/write                      ... PASS
  Test 0090:  sequential file 4K synchronous write                 ... PASS
  Test 0091:  Sequential file large synchronous write              ... PASS

46 / 46 tests passed

* Without CONFIG_BLK_DEV_ZONED nvme tests :-
--------------------------------------------------------------------------------

#
# grep -i blk_dev_zoned .config
# CONFIG_BLK_DEV_ZONED is not set
# makej M=drivers/nvme/ clean 
  CLEAN   drivers/nvme//Module.symvers
# makej M=drivers/nvme/ 
  CC [M]  drivers/nvme//host/core.o
  CC [M]  drivers/nvme//host/trace.o
  CC [M]  drivers/nvme//host/lightnvm.o
  CC [M]  drivers/nvme//target/core.o
  CC [M]  drivers/nvme//host/hwmon.o
  CC [M]  drivers/nvme//target/configfs.o
  CC [M]  drivers/nvme//host/pci.o
  CC [M]  drivers/nvme//target/admin-cmd.o
  CC [M]  drivers/nvme//host/fabrics.o
  CC [M]  drivers/nvme//host/rdma.o
  CC [M]  drivers/nvme//target/fabrics-cmd.o
  CC [M]  drivers/nvme//target/discovery.o
  CC [M]  drivers/nvme//host/fc.o
  CC [M]  drivers/nvme//target/io-cmd-file.o
  CC [M]  drivers/nvme//host/tcp.o
  CC [M]  drivers/nvme//target/io-cmd-bdev.o
  CC [M]  drivers/nvme//target/passthru.o
  CC [M]  drivers/nvme//target/trace.o
  CC [M]  drivers/nvme//target/loop.o
  CC [M]  drivers/nvme//target/rdma.o
  CC [M]  drivers/nvme//target/fc.o
  CC [M]  drivers/nvme//target/fcloop.o
  CC [M]  drivers/nvme//target/tcp.o
  LD [M]  drivers/nvme//target/nvme-loop.o
  LD [M]  drivers/nvme//target/nvme-fcloop.o
  LD [M]  drivers/nvme//target/nvmet-tcp.o
  LD [M]  drivers/nvme//host/nvme-fabrics.o
  LD [M]  drivers/nvme//host/nvme.o
  LD [M]  drivers/nvme//host/nvme-rdma.o
  LD [M]  drivers/nvme//target/nvmet-rdma.o
  LD [M]  drivers/nvme//target/nvmet.o
  LD [M]  drivers/nvme//target/nvmet-fc.o
  LD [M]  drivers/nvme//host/nvme-tcp.o
  LD [M]  drivers/nvme//host/nvme-fc.o
  LD [M]  drivers/nvme//host/nvme-core.o
  MODPOST drivers/nvme//Module.symvers
  CC [M]  drivers/nvme//host/nvme-core.mod.o
  CC [M]  drivers/nvme//host/nvme-fabrics.mod.o
  CC [M]  drivers/nvme//host/nvme-fc.mod.o
  CC [M]  drivers/nvme//host/nvme-rdma.mod.o
  CC [M]  drivers/nvme//host/nvme-tcp.mod.o
  CC [M]  drivers/nvme//host/nvme.mod.o
  CC [M]  drivers/nvme//target/nvme-fcloop.mod.o
  CC [M]  drivers/nvme//target/nvme-loop.mod.o
  CC [M]  drivers/nvme//target/nvmet-fc.mod.o
  CC [M]  drivers/nvme//target/nvmet-rdma.mod.o
  CC [M]  drivers/nvme//target/nvmet-tcp.mod.o
  CC [M]  drivers/nvme//target/nvmet.mod.o
  LD [M]  drivers/nvme//target/nvme-fcloop.ko
  LD [M]  drivers/nvme//host/nvme-tcp.ko
  LD [M]  drivers/nvme//host/nvme-core.ko
  LD [M]  drivers/nvme//target/nvmet-tcp.ko
  LD [M]  drivers/nvme//target/nvme-loop.ko
  LD [M]  drivers/nvme//target/nvmet-fc.ko
  LD [M]  drivers/nvme//host/nvme-fabrics.ko
  LD [M]  drivers/nvme//host/nvme-fc.ko
  LD [M]  drivers/nvme//target/nvmet-rdma.ko
  LD [M]  drivers/nvme//host/nvme-rdma.ko
  LD [M]  drivers/nvme//host/nvme.ko
  LD [M]  drivers/nvme//target/nvmet.ko
#
# cdblktests 
# ./check tests/nvme/
nvme/002 (create many subsystems and test discovery)         [passed]
    runtime    ...  27.640s
nvme/003 (test if we're sending keep-alives to a discovery controller) [passed]
    runtime  10.145s  ...  10.147s
nvme/004 (test nvme and nvmet UUID NS descriptors)           [passed]
    runtime  1.713s  ...  1.712s
nvme/005 (reset local loopback target)                       [not run]
    nvme_core module does not have parameter multipath
nvme/006 (create an NVMeOF target with a block device-backed ns) [passed]
    runtime  0.111s  ...  0.115s
nvme/007 (create an NVMeOF target with a file-backed ns)     [passed]
    runtime  0.081s  ...  0.069s
nvme/008 (create an NVMeOF host with a block device-backed ns) [passed]
    runtime  1.690s  ...  1.727s
nvme/009 (create an NVMeOF host with a file-backed ns)       [passed]
    runtime  1.659s  ...  1.661s
nvme/010 (run data verification fio job on NVMeOF block device-backed ns) [passed]
    runtime  28.781s  ...  30.166s
nvme/011 (run data verification fio job on NVMeOF file-backed ns) [passed]
    runtime  253.831s  ...  238.774s
nvme/012 (run mkfs and data verification fio job on NVMeOF block device-backed ns) [passed]
    runtime  40.003s  ...  68.076s
nvme/013 (run mkfs and data verification fio job on NVMeOF file-backed ns) [passed]
    runtime  272.649s  ...  283.720s
nvme/014 (flush a NVMeOF block device-backed ns)             [passed]
    runtime  21.772s  ...  21.397s
nvme/015 (unit test for NVMe flush for file backed ns)       [passed]
    runtime  21.908s  ...  18.622s
nvme/016 (create/delete many NVMeOF block device-backed ns and test discovery) [passed]
    runtime  15.860s  ...  18.313s
nvme/017 (create/delete many file-ns and test discovery)     [passed]
    runtime  16.470s  ...  18.374s
nvme/018 (unit test NVMe-oF out of range access on a file backend) [passed]
    runtime  1.665s  ...  1.890s
nvme/019 (test NVMe DSM Discard command on NVMeOF block-device ns) [passed]
    runtime  1.681s  ...  1.982s
nvme/020 (test NVMe DSM Discard command on NVMeOF file-backed ns) [passed]
    runtime  1.645s  ...  1.913s
nvme/021 (test NVMe list command on NVMeOF file-backed ns)   [passed]
    runtime  1.648s  ...  1.956s
nvme/022 (test NVMe reset command on NVMeOF file-backed ns)  [passed]
    runtime  2.063s  ...  2.553s
nvme/023 (test NVMe smart-log command on NVMeOF block-device ns) [passed]
    runtime  1.692s  ...  2.588s
nvme/024 (test NVMe smart-log command on NVMeOF file-backed ns) [passed]
    runtime  1.643s  ...  1.656s
nvme/025 (test NVMe effects-log command on NVMeOF file-backed ns) [passed]
    runtime  1.640s  ...  1.668s
nvme/026 (test NVMe ns-descs command on NVMeOF file-backed ns) [passed]
    runtime  1.643s  ...  1.961s
nvme/027 (test NVMe ns-rescan command on NVMeOF file-backed ns) [passed]
    runtime  1.641s  ...  1.677s
nvme/028 (test NVMe list-subsys command on NVMeOF file-backed ns) [passed]
    runtime  1.648s  ...  1.868s
nvme/029 (test userspace IO via nvme-cli read/write interface) [passed]
    runtime  1.982s  ...  2.703s
nvme/030 (ensure the discovery generation counter is updated appropriately) [passed]
    runtime  0.308s  ...  0.328s
nvme/031 (test deletion of NVMeOF controllers immediately after setup) [passed]
    runtime  5.432s  ...  7.495s
nvme/038 (test deletion of NVMeOF subsystem without enabling) [passed]
    runtime  0.053s  ...  0.046s
-- 
2.22.1


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH V12 1/3] nvmet: add NVM Command Set Identifier support
  2021-03-11  7:15 [PATCH V12 0/3] nvmet: add ZBD backend support Chaitanya Kulkarni
@ 2021-03-11  7:15 ` Chaitanya Kulkarni
  2021-03-11  7:15 ` [PATCH V12 2/3] nvmet: add ZBD over ZNS backend support Chaitanya Kulkarni
  2021-03-11  7:15 ` [PATCH V12 3/3] nvmet: add nvmet_req_bio put helper for backends Chaitanya Kulkarni
  2 siblings, 0 replies; 12+ messages in thread
From: Chaitanya Kulkarni @ 2021-03-11  7:15 UTC (permalink / raw)
  To: linux-nvme; +Cc: hch, kbusch, sagi, damien.lemoal, Chaitanya Kulkarni

NVMe TP 4056 allows controller to support different command sets.
NVMeoF target currently only supports namespaces that contain
traditional logical blocks that may be randomly read and written. In
some applications there is a value in exposing namespaces that contain
logical blocks that have special access rules (e.g. sequentially write
required namespace such as Zoned Namespace (ZNS)).

In order to support the Zoned Block Devices (ZBD) backend, controller
needs to have support for ZNS Command Set Identifier (CSI).

In this preparation patch, we adjust the code such that it can now
supports default command set identifier. We update the namespace data
structure to store the CSI value which defaults to NVME_CSI_NVM
that represents traditional logical blocks namespace type.

The CSI support is required to implement the ZBD backend for NVMeOF
with host side NVMe ZNS interface, since ZNS commands belongs to
the different command set than the default one.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/admin-cmd.c | 47 +++++++++++++++++++++++++++------
 drivers/nvme/target/core.c      | 16 ++++++++++-
 drivers/nvme/target/nvmet.h     |  1 +
 include/linux/nvme.h            |  1 +
 4 files changed, 56 insertions(+), 9 deletions(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index f4cc32674edd..176c8593d341 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -162,15 +162,8 @@ static void nvmet_execute_get_log_page_smart(struct nvmet_req *req)
 	nvmet_req_complete(req, status);
 }
 
-static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req)
+static void nvmet_set_csi_nvm_effects(struct nvme_effects_log *log)
 {
-	u16 status = NVME_SC_INTERNAL;
-	struct nvme_effects_log *log;
-
-	log = kzalloc(sizeof(*log), GFP_KERNEL);
-	if (!log)
-		goto out;
-
 	log->acs[nvme_admin_get_log_page]	= cpu_to_le32(1 << 0);
 	log->acs[nvme_admin_identify]		= cpu_to_le32(1 << 0);
 	log->acs[nvme_admin_abort_cmd]		= cpu_to_le32(1 << 0);
@@ -184,9 +177,31 @@ static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req)
 	log->iocs[nvme_cmd_flush]		= cpu_to_le32(1 << 0);
 	log->iocs[nvme_cmd_dsm]			= cpu_to_le32(1 << 0);
 	log->iocs[nvme_cmd_write_zeroes]	= cpu_to_le32(1 << 0);
+}
+
+static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req)
+{
+	struct nvme_effects_log *log;
+	u16 status = NVME_SC_SUCCESS;
+
+	log = kzalloc(sizeof(*log), GFP_KERNEL);
+	if (!log) {
+		status = NVME_SC_INTERNAL;
+		goto out;
+	}
+
+	switch (req->cmd->get_log_page.csi) {
+	case NVME_CSI_NVM:
+		nvmet_set_csi_nvm_effects(log);
+		break;
+	default:
+		status = NVME_SC_INVALID_LOG_PAGE;
+		goto free;
+	}
 
 	status = nvmet_copy_to_sgl(req, 0, log, sizeof(*log));
 
+free:
 	kfree(log);
 out:
 	nvmet_req_complete(req, status);
@@ -611,6 +626,18 @@ static u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
 	return 0;
 }
 
+static u16 nvmet_execute_identify_desclist_csi(struct nvmet_req *req, off_t *o)
+{
+	switch (req->ns->csi) {
+	case NVME_CSI_NVM:
+		return nvmet_copy_ns_identifier(req, NVME_NIDT_CSI,
+						NVME_NIDT_CSI_LEN,
+						&req->ns->csi, o);
+	}
+
+	return NVME_SC_INVALID_IO_CMD_SET;
+}
+
 static void nvmet_execute_identify_desclist(struct nvmet_req *req)
 {
 	off_t off = 0;
@@ -635,6 +662,10 @@ static void nvmet_execute_identify_desclist(struct nvmet_req *req)
 			goto out;
 	}
 
+	status = nvmet_execute_identify_desclist_csi(req, &off);
+	if (status)
+		goto out;
+
 	if (sg_zero_buffer(req->sg, req->sg_cnt, NVME_IDENTIFY_DATA_SIZE - off,
 			off) != NVME_IDENTIFY_DATA_SIZE - off)
 		status = NVME_SC_INTERNAL | NVME_SC_DNR;
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index adbede9ab7f3..4abe0b542c96 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -693,6 +693,7 @@ struct nvmet_ns *nvmet_ns_alloc(struct nvmet_subsys *subsys, u32 nsid)
 
 	uuid_gen(&ns->uuid);
 	ns->buffered_io = false;
+	ns->csi = NVME_CSI_NVM;
 
 	return ns;
 }
@@ -1113,6 +1114,17 @@ static inline u8 nvmet_cc_iocqes(u32 cc)
 	return (cc >> NVME_CC_IOCQES_SHIFT) & 0xf;
 }
 
+static inline bool nvmet_cc_css_check(u8 cc_css)
+{
+	switch (cc_css <<= NVME_CC_CSS_SHIFT) {
+	case NVME_CC_CSS_NVM:
+	case NVME_CC_CSS_CSI:
+		return true;
+	default:
+		return false;
+	}
+}
+
 static void nvmet_start_ctrl(struct nvmet_ctrl *ctrl)
 {
 	lockdep_assert_held(&ctrl->lock);
@@ -1121,7 +1133,7 @@ static void nvmet_start_ctrl(struct nvmet_ctrl *ctrl)
 	    nvmet_cc_iocqes(ctrl->cc) != NVME_NVM_IOCQES ||
 	    nvmet_cc_mps(ctrl->cc) != 0 ||
 	    nvmet_cc_ams(ctrl->cc) != 0 ||
-	    nvmet_cc_css(ctrl->cc) != 0) {
+	    !nvmet_cc_css_check(nvmet_cc_css(ctrl->cc))) {
 		ctrl->csts = NVME_CSTS_CFS;
 		return;
 	}
@@ -1172,6 +1184,8 @@ static void nvmet_init_cap(struct nvmet_ctrl *ctrl)
 {
 	/* command sets supported: NVMe command set: */
 	ctrl->cap = (1ULL << 37);
+	/* Controller supports one or more I/O Command Sets */
+	ctrl->cap |= (1ULL << 43);
 	/* CC.EN timeout in 500msec units: */
 	ctrl->cap |= (15ULL << 24);
 	/* maximum queue entries supported: */
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 24e261bf153a..ee5999920155 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -81,6 +81,7 @@ struct nvmet_ns {
 	struct pci_dev		*p2p_dev;
 	int			pi_type;
 	int			metadata_size;
+	u8			csi;
 };
 
 static inline struct nvmet_ns *to_nvmet_ns(struct config_item *item)
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index b08787cd0881..f09fbbb7876b 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -1494,6 +1494,7 @@ enum {
 	NVME_SC_NS_WRITE_PROTECTED	= 0x20,
 	NVME_SC_CMD_INTERRUPTED		= 0x21,
 	NVME_SC_TRANSIENT_TR_ERR	= 0x22,
+	NVME_SC_INVALID_IO_CMD_SET	= 0x2C,
 
 	NVME_SC_LBA_RANGE		= 0x80,
 	NVME_SC_CAP_EXCEEDED		= 0x81,
-- 
2.22.1


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH V12 2/3] nvmet: add ZBD over ZNS backend support
  2021-03-11  7:15 [PATCH V12 0/3] nvmet: add ZBD backend support Chaitanya Kulkarni
  2021-03-11  7:15 ` [PATCH V12 1/3] nvmet: add NVM Command Set Identifier support Chaitanya Kulkarni
@ 2021-03-11  7:15 ` Chaitanya Kulkarni
  2021-03-12  1:15   ` Damien Le Moal
  2021-03-11  7:15 ` [PATCH V12 3/3] nvmet: add nvmet_req_bio put helper for backends Chaitanya Kulkarni
  2 siblings, 1 reply; 12+ messages in thread
From: Chaitanya Kulkarni @ 2021-03-11  7:15 UTC (permalink / raw)
  To: linux-nvme; +Cc: hch, kbusch, sagi, damien.lemoal, Chaitanya Kulkarni

NVMe TP 4053 – Zoned Namespaces (ZNS) allows host software to
communicate with a non-volatile memory subsystem using zones for
NVMe protocol based controllers. NVMeOF already support the ZNS NVMe
Protocol compliant devices on the target in the passthru mode. There
are Generic zoned block devices like  Shingled Magnetic Recording (SMR)
HDDs that are not based on the NVMe protocol.

This patch adds ZNS backend to support the ZBDs for NVMeOF target.

This support includes implementing the new command set NVME_CSI_ZNS,
adding different command handlers for ZNS command set such as
NVMe Identify Controller, NVMe Identify Namespace, NVMe Zone Append,
NVMe Zone Management Send and NVMe Zone Management Receive.

With new command set identifier we also update the target command
effects logs to reflect the ZNS compliant commands.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/Makefile      |   1 +
 drivers/nvme/target/admin-cmd.c   |  27 +++
 drivers/nvme/target/io-cmd-bdev.c |  34 ++-
 drivers/nvme/target/nvmet.h       |  38 ++++
 drivers/nvme/target/zns.c         | 332 ++++++++++++++++++++++++++++++
 5 files changed, 424 insertions(+), 8 deletions(-)
 create mode 100644 drivers/nvme/target/zns.c

diff --git a/drivers/nvme/target/Makefile b/drivers/nvme/target/Makefile
index ebf91fc4c72e..9837e580fa7e 100644
--- a/drivers/nvme/target/Makefile
+++ b/drivers/nvme/target/Makefile
@@ -12,6 +12,7 @@ obj-$(CONFIG_NVME_TARGET_TCP)		+= nvmet-tcp.o
 nvmet-y		+= core.o configfs.o admin-cmd.o fabrics-cmd.o \
 			discovery.o io-cmd-file.o io-cmd-bdev.o
 nvmet-$(CONFIG_NVME_TARGET_PASSTHRU)	+= passthru.o
+nvmet-$(CONFIG_BLK_DEV_ZONED)		+= zns.o
 nvme-loop-y	+= loop.o
 nvmet-rdma-y	+= rdma.o
 nvmet-fc-y	+= fc.o
diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 176c8593d341..bf4876df624a 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -179,6 +179,13 @@ static void nvmet_set_csi_nvm_effects(struct nvme_effects_log *log)
 	log->iocs[nvme_cmd_write_zeroes]	= cpu_to_le32(1 << 0);
 }
 
+static void nvmet_set_csi_zns_effects(struct nvme_effects_log *log)
+{
+	log->iocs[nvme_cmd_zone_append]		= cpu_to_le32(1 << 0);
+	log->iocs[nvme_cmd_zone_mgmt_send]	= cpu_to_le32(1 << 0);
+	log->iocs[nvme_cmd_zone_mgmt_recv]	= cpu_to_le32(1 << 0);
+}
+
 static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req)
 {
 	struct nvme_effects_log *log;
@@ -194,6 +201,15 @@ static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req)
 	case NVME_CSI_NVM:
 		nvmet_set_csi_nvm_effects(log);
 		break;
+	case NVME_CSI_ZNS:
+		if (!IS_ENABLED(CONFIG_BLK_DEV_ZONED)) {
+			status = NVME_SC_INVALID_IO_CMD_SET;
+			goto free;
+		}
+
+		nvmet_set_csi_nvm_effects(log);
+		nvmet_set_csi_zns_effects(log);
+		break;
 	default:
 		status = NVME_SC_INVALID_LOG_PAGE;
 		goto free;
@@ -630,6 +646,13 @@ static u16 nvmet_execute_identify_desclist_csi(struct nvmet_req *req, off_t *o)
 {
 	switch (req->ns->csi) {
 	case NVME_CSI_NVM:
+		return nvmet_copy_ns_identifier(req, NVME_NIDT_CSI,
+						NVME_NIDT_CSI_LEN,
+						&req->ns->csi, o);
+	case NVME_CSI_ZNS:
+		if (!IS_ENABLED(CONFIG_BLK_DEV_ZONED))
+			return NVME_SC_INVALID_IO_CMD_SET;
+
 		return nvmet_copy_ns_identifier(req, NVME_NIDT_CSI,
 						NVME_NIDT_CSI_LEN,
 						&req->ns->csi, o);
@@ -682,8 +705,12 @@ static void nvmet_execute_identify(struct nvmet_req *req)
 	switch (req->cmd->identify.cns) {
 	case NVME_ID_CNS_NS:
 		return nvmet_execute_identify_ns(req);
+	case NVME_ID_CNS_CS_NS:
+		return nvmet_execute_identify_cns_cs_ns(req);
 	case NVME_ID_CNS_CTRL:
 		return nvmet_execute_identify_ctrl(req);
+	case NVME_ID_CNS_CS_CTRL:
+		return nvmet_execute_identify_cns_cs_ctrl(req);
 	case NVME_ID_CNS_NS_ACTIVE_LIST:
 		return nvmet_execute_identify_nslist(req);
 	case NVME_ID_CNS_NS_DESC_LIST:
diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index 9a8b3726a37c..ada0215f5e56 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -63,6 +63,14 @@ static void nvmet_bdev_ns_enable_integrity(struct nvmet_ns *ns)
 	}
 }
 
+void nvmet_bdev_ns_disable(struct nvmet_ns *ns)
+{
+	if (ns->bdev) {
+		blkdev_put(ns->bdev, FMODE_WRITE | FMODE_READ);
+		ns->bdev = NULL;
+	}
+}
+
 int nvmet_bdev_ns_enable(struct nvmet_ns *ns)
 {
 	int ret;
@@ -86,15 +94,16 @@ int nvmet_bdev_ns_enable(struct nvmet_ns *ns)
 	if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY_T10))
 		nvmet_bdev_ns_enable_integrity(ns);
 
-	return 0;
-}
-
-void nvmet_bdev_ns_disable(struct nvmet_ns *ns)
-{
-	if (ns->bdev) {
-		blkdev_put(ns->bdev, FMODE_WRITE | FMODE_READ);
-		ns->bdev = NULL;
+	/* bdev_is_zoned() is stubbed out of CONFIG_BLK_DEV_ZONED */
+	if (bdev_is_zoned(ns->bdev)) {
+		if (!nvmet_bdev_zns_enable(ns)) {
+			nvmet_bdev_ns_disable(ns);
+			return -EINVAL;
+		}
+		ns->csi = NVME_CSI_ZNS;
 	}
+
+	return 0;
 }
 
 void nvmet_bdev_ns_revalidate(struct nvmet_ns *ns)
@@ -448,6 +457,15 @@ u16 nvmet_bdev_parse_io_cmd(struct nvmet_req *req)
 	case nvme_cmd_write_zeroes:
 		req->execute = nvmet_bdev_execute_write_zeroes;
 		return 0;
+	case nvme_cmd_zone_append:
+		req->execute = nvmet_bdev_execute_zone_append;
+		return 0;
+	case nvme_cmd_zone_mgmt_recv:
+		req->execute = nvmet_bdev_execute_zone_mgmt_recv;
+		return 0;
+	case nvme_cmd_zone_mgmt_send:
+		req->execute = nvmet_bdev_execute_zone_mgmt_send;
+		return 0;
 	default:
 		return nvmet_report_invalid_opcode(req);
 	}
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index ee5999920155..f3fccc49de03 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -247,6 +247,10 @@ struct nvmet_subsys {
 	unsigned int		admin_timeout;
 	unsigned int		io_timeout;
 #endif /* CONFIG_NVME_TARGET_PASSTHRU */
+
+#ifdef CONFIG_BLK_DEV_ZONED
+	u8			zasl;
+#endif /* CONFIG_BLK_DEV_ZONED */
 };
 
 static inline struct nvmet_subsys *to_subsys(struct config_item *item)
@@ -584,6 +588,40 @@ static inline struct nvme_ctrl *nvmet_passthru_ctrl(struct nvmet_subsys *subsys)
 }
 #endif /* CONFIG_NVME_TARGET_PASSTHRU */
 
+#ifdef CONFIG_BLK_DEV_ZONED
+bool nvmet_bdev_zns_enable(struct nvmet_ns *ns);
+void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req);
+void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req);
+void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req);
+void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req);
+void nvmet_bdev_execute_zone_append(struct nvmet_req *req);
+#else  /* CONFIG_BLK_DEV_ZONED */
+static inline bool nvmet_bdev_zns_enable(struct nvmet_ns *ns)
+{
+	return false;
+}
+static inline void
+nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
+{
+}
+static inline void
+nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
+{
+}
+static inline void
+nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
+{
+}
+static inline void
+nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
+{
+}
+static inline void
+nvmet_bdev_execute_zone_append(struct nvmet_req *req)
+{
+}
+#endif /* CONFIG_BLK_DEV_ZONED */
+
 static inline struct nvme_ctrl *
 nvmet_req_passthru_ctrl(struct nvmet_req *req)
 {
diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
new file mode 100644
index 000000000000..e12629b02320
--- /dev/null
+++ b/drivers/nvme/target/zns.c
@@ -0,0 +1,332 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * NVMe ZNS-ZBD command implementation.
+ * Copyright (c) 2020-2021 HGST, a Western Digital Company.
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/nvme.h>
+#include <linux/blkdev.h>
+#include "nvmet.h"
+
+/*
+ * We set the Memory Page Size Minimum (MPSMIN) for target controller to 0
+ * which gets added by 12 in the nvme_enable_ctrl() which results in 2^12 = 4k
+ * as page_shift value. When calculating the ZASL use shift by 12.
+ */
+#define NVMET_MPSMIN_SHIFT	12
+
+static u16 nvmet_bdev_zns_checks(struct nvmet_req *req)
+{
+	if (!bdev_is_zoned(req->ns->bdev))
+		return NVME_SC_INVALID_NS | NVME_SC_DNR;
+
+	if (req->cmd->zmr.zra != NVME_ZRA_ZONE_REPORT) {
+		req->error_loc = offsetof(struct nvme_zone_mgmt_recv_cmd, zra);
+		return NVME_SC_INVALID_FIELD | NVME_SC_DNR;
+	}
+
+	if (req->cmd->zmr.zrasf != NVME_ZRASF_ZONE_REPORT_ALL) {
+		req->error_loc =
+			offsetof(struct nvme_zone_mgmt_recv_cmd, zrasf);
+		return NVME_SC_INVALID_FIELD | NVME_SC_DNR;
+	}
+
+	if (req->cmd->zmr.pr != NVME_REPORT_ZONE_PARTIAL) {
+		req->error_loc = offsetof(struct nvme_zone_mgmt_recv_cmd, pr);
+		return NVME_SC_INVALID_FIELD | NVME_SC_DNR;
+	}
+
+	return NVME_SC_SUCCESS;
+}
+
+static inline u8 nvmet_zasl(unsigned int zone_append_sects)
+{
+	/*
+	 * Zone Append Size Limit is the value experessed in the units
+	 * of minimum memory page size (i.e. 12) and is reported power of 2.
+	 */
+	return ilog2((zone_append_sects << 9) >> NVMET_MPSMIN_SHIFT);
+}
+
+static inline bool nvmet_zns_update_zasl(struct nvmet_ns *ns)
+{
+	struct request_queue *q = ns->bdev->bd_disk->queue;
+	u8 zasl = nvmet_zasl(queue_max_zone_append_sectors(q));
+
+	if (ns->subsys->zasl)
+		return ns->subsys->zasl < zasl;
+
+	ns->subsys->zasl = zasl;
+	return true;
+}
+
+static int nvmet_bdev_validate_zns_zones_cb(struct blk_zone *z,
+					    unsigned int i, void *data)
+{
+	if (z->type == BLK_ZONE_TYPE_CONVENTIONAL)
+		return -EOPNOTSUPP;
+	return 0;
+}
+
+static bool nvmet_bdev_has_conv_zones(struct block_device *bdev)
+{
+	int ret;
+
+	if (bdev->bd_disk->queue->conv_zones_bitmap)
+		return true;
+
+	ret = blkdev_report_zones(bdev, 0, blkdev_nr_zones(bdev->bd_disk),
+				  nvmet_bdev_validate_zns_zones_cb, NULL);
+
+	return ret <= 0 ? true : false;
+}
+
+bool nvmet_bdev_zns_enable(struct nvmet_ns *ns)
+{
+	if (nvmet_bdev_has_conv_zones(ns->bdev))
+		return false;
+
+	ns->blksize_shift = blksize_bits(bdev_logical_block_size(ns->bdev));
+
+	if (!nvmet_zns_update_zasl(ns))
+		return false;
+
+	return !(get_capacity(ns->bdev->bd_disk) &
+			(bdev_zone_sectors(ns->bdev) - 1));
+}
+
+void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
+{
+	u8 zasl = req->sq->ctrl->subsys->zasl;
+	struct nvmet_ctrl *ctrl = req->sq->ctrl;
+	struct nvme_id_ctrl_zns *id;
+	u16 status;
+
+	if (req->cmd->identify.csi != NVME_CSI_ZNS) {
+		req->error_loc = offsetof(struct nvme_common_command, opcode);
+		status = NVME_SC_INVALID_OPCODE | NVME_SC_DNR;
+		goto out;
+	}
+
+	id = kzalloc(sizeof(*id), GFP_KERNEL);
+	if (!id) {
+		status = NVME_SC_INTERNAL;
+		goto out;
+	}
+
+	if (ctrl->ops->get_mdts)
+		id->zasl = min_t(u8, ctrl->ops->get_mdts(ctrl), zasl);
+	else
+		id->zasl = zasl;
+
+	status = nvmet_copy_to_sgl(req, 0, id, sizeof(*id));
+
+	kfree(id);
+out:
+	nvmet_req_complete(req, status);
+}
+
+void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
+{
+	struct nvme_id_ns_zns *id_zns;
+	u64 zsze;
+	u16 status;
+
+	if (req->cmd->identify.csi != NVME_CSI_ZNS) {
+		req->error_loc = offsetof(struct nvme_common_command, opcode);
+		status = NVME_SC_INVALID_OPCODE | NVME_SC_DNR;
+		goto out;
+	}
+
+	if (le32_to_cpu(req->cmd->identify.nsid) == NVME_NSID_ALL) {
+		req->error_loc = offsetof(struct nvme_identify, nsid);
+		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
+		goto out;
+	}
+
+	id_zns = kzalloc(sizeof(*id_zns), GFP_KERNEL);
+	if (!id_zns) {
+		status = NVME_SC_INTERNAL;
+		goto out;
+	}
+
+	status = nvmet_req_find_ns(req);
+	if (status) {
+		status = NVME_SC_INTERNAL;
+		goto done;
+	}
+
+	if (!bdev_is_zoned(req->ns->bdev)) {
+		req->error_loc = offsetof(struct nvme_identify, nsid);
+		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
+		goto done;
+	}
+
+	nvmet_ns_revalidate(req->ns);
+	zsze = (bdev_zone_sectors(req->ns->bdev) << 9) >>
+					req->ns->blksize_shift;
+	id_zns->lbafe[0].zsze = cpu_to_le64(zsze);
+	id_zns->mor = cpu_to_le32(bdev_max_open_zones(req->ns->bdev));
+	id_zns->mar = cpu_to_le32(bdev_max_active_zones(req->ns->bdev));
+
+done:
+	status = nvmet_copy_to_sgl(req, 0, id_zns, sizeof(*id_zns));
+	kfree(id_zns);
+out:
+	nvmet_req_complete(req, status);
+}
+
+struct nvmet_report_zone_data {
+	struct nvmet_ns *ns;
+	struct nvme_zone_report *rz;
+};
+
+static int nvmet_bdev_report_zone_cb(struct blk_zone *z, unsigned i, void *d)
+{
+	struct nvmet_report_zone_data *report_zone_data = d;
+	struct nvme_zone_descriptor *entries = report_zone_data->rz->entries;
+	struct nvmet_ns *ns = report_zone_data->ns;
+
+	entries[i].zcap = nvmet_sect_to_lba(ns, z->capacity);
+	entries[i].zslba = nvmet_sect_to_lba(ns, z->start);
+	entries[i].wp = nvmet_sect_to_lba(ns, z->wp);
+	entries[i].za = z->reset ? 1 << 2 : 0;
+	entries[i].zt = z->type;
+	entries[i].zs = z->cond << 4;
+
+	return 0;
+}
+
+void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
+{
+	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->zmr.slba);
+	u32 bufsize = (le32_to_cpu(req->cmd->zmr.numd) + 1) << 2;
+	struct nvmet_report_zone_data data = { .ns = req->ns };
+	unsigned int nr_zones;
+	int reported_zones;
+	u16 status;
+
+	status = nvmet_bdev_zns_checks(req);
+	if (status)
+		goto out;
+
+	data.rz = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY | __GFP_ZERO);
+	if (!data.rz) {
+		status = NVME_SC_INTERNAL;
+		goto out;
+	}
+
+	nr_zones = (bufsize - sizeof(struct nvme_zone_report)) /
+			sizeof(struct nvme_zone_descriptor);
+	if (!nr_zones) {
+		status = NVME_SC_INVALID_FIELD | NVME_SC_DNR;
+		goto out_free_report_zones;
+	}
+
+	reported_zones = blkdev_report_zones(req->ns->bdev, sect, nr_zones,
+					     nvmet_bdev_report_zone_cb, &data);
+	if (reported_zones < 0) {
+		status = NVME_SC_INTERNAL;
+		goto out_free_report_zones;
+	}
+
+	data.rz->nr_zones = cpu_to_le64(reported_zones);
+
+	status = nvmet_copy_to_sgl(req, 0, data.rz, bufsize);
+
+out_free_report_zones:
+	kvfree(data.rz);
+out:
+	nvmet_req_complete(req, status);
+}
+
+void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
+{
+	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->zms.slba);
+	sector_t nr_sect = bdev_zone_sectors(req->ns->bdev);
+	u16 status = NVME_SC_SUCCESS;
+	u8 zsa = req->cmd->zms.zsa;
+	enum req_opf op;
+	int ret;
+	const unsigned int zsa_to_op[] = {
+		[NVME_ZONE_OPEN]	= REQ_OP_ZONE_OPEN,
+		[NVME_ZONE_CLOSE]	= REQ_OP_ZONE_CLOSE,
+		[NVME_ZONE_FINISH]	= REQ_OP_ZONE_FINISH,
+		[NVME_ZONE_RESET]	= REQ_OP_ZONE_RESET,
+	};
+
+	if (zsa > ARRAY_SIZE(zsa_to_op) || !zsa_to_op[zsa]) {
+		status = NVME_SC_INVALID_FIELD;
+		goto out;
+	}
+
+	op = zsa_to_op[zsa];
+
+	if (req->cmd->zms.select_all)
+		nr_sect = get_capacity(req->ns->bdev->bd_disk);
+
+	ret = blkdev_zone_mgmt(req->ns->bdev, op, sect, nr_sect, GFP_KERNEL);
+	if (ret)
+		status = NVME_SC_INTERNAL;
+out:
+	nvmet_req_complete(req, status);
+}
+
+void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
+{
+	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->rw.slba);
+	u16 status = NVME_SC_SUCCESS;
+	unsigned int total_len = 0;
+	struct scatterlist *sg;
+	int ret = 0, sg_cnt;
+	struct bio *bio;
+
+	if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req)))
+		return;
+
+	if (!req->sg_cnt) {
+		nvmet_req_complete(req, 0);
+		return;
+	}
+
+	if (req->transfer_len <= NVMET_MAX_INLINE_DATA_LEN) {
+		bio = &req->b.inline_bio;
+		bio_init(bio, req->inline_bvec, ARRAY_SIZE(req->inline_bvec));
+	} else {
+		bio = bio_alloc(GFP_KERNEL, req->sg_cnt);
+	}
+
+	bio_set_dev(bio, req->ns->bdev);
+	bio->bi_iter.bi_sector = sect;
+	bio->bi_opf = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
+	if (req->cmd->rw.control & cpu_to_le16(NVME_RW_FUA))
+		bio->bi_opf |= REQ_FUA;
+
+	for_each_sg(req->sg, sg, req->sg_cnt, sg_cnt) {
+		struct page *p = sg_page(sg);
+		unsigned int l = sg->length;
+		unsigned int o = sg->offset;
+
+		ret = bio_add_zone_append_page(bio, p, l, o);
+		if (ret != sg->length) {
+			status = NVME_SC_INTERNAL;
+			goto out_bio_put;
+		}
+
+		total_len += sg->length;
+	}
+
+	if (total_len != nvmet_rw_data_len(req)) {
+		status = NVME_SC_INTERNAL | NVME_SC_DNR;
+		goto out_bio_put;
+	}
+
+	ret = submit_bio_wait(bio);
+	req->cqe->result.u64 = nvmet_sect_to_lba(req->ns,
+						 bio->bi_iter.bi_sector);
+
+out_bio_put:
+	if (bio != &req->b.inline_bio)
+		bio_put(bio);
+	nvmet_req_complete(req, ret < 0 ? NVME_SC_INTERNAL : status);
+}
-- 
2.22.1


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH V12 3/3] nvmet: add nvmet_req_bio put helper for backends
  2021-03-11  7:15 [PATCH V12 0/3] nvmet: add ZBD backend support Chaitanya Kulkarni
  2021-03-11  7:15 ` [PATCH V12 1/3] nvmet: add NVM Command Set Identifier support Chaitanya Kulkarni
  2021-03-11  7:15 ` [PATCH V12 2/3] nvmet: add ZBD over ZNS backend support Chaitanya Kulkarni
@ 2021-03-11  7:15 ` Chaitanya Kulkarni
  2021-03-12  0:37   ` Damien Le Moal
  2 siblings, 1 reply; 12+ messages in thread
From: Chaitanya Kulkarni @ 2021-03-11  7:15 UTC (permalink / raw)
  To: linux-nvme; +Cc: hch, kbusch, sagi, damien.lemoal, Chaitanya Kulkarni

With the addition of zns backend now we have three different backends
with inline bio optimization. That leads to having duplicate code in
for freeing the bio in all three backends: generic bdev, passsthru and
generic zns.

Add a helper function to avoid duplicate code and update the
respective backends.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/io-cmd-bdev.c | 3 +--
 drivers/nvme/target/nvmet.h       | 6 ++++++
 drivers/nvme/target/passthru.c    | 3 +--
 drivers/nvme/target/zns.c         | 3 +--
 4 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index ada0215f5e56..ca39d787d71f 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -173,8 +173,7 @@ static void nvmet_bio_done(struct bio *bio)
 	struct nvmet_req *req = bio->bi_private;
 
 	nvmet_req_complete(req, blk_to_nvme_status(req, bio->bi_status));
-	if (bio != &req->b.inline_bio)
-		bio_put(bio);
+	nvmet_req_bio_put(req, bio);
 }
 
 #ifdef CONFIG_BLK_DEV_INTEGRITY
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index f3fccc49de03..2f1bd3ac34a2 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -654,4 +654,10 @@ static inline sector_t nvmet_lba_to_sect(struct nvmet_ns *ns, __le64 lba)
 	return le64_to_cpu(lba) << (ns->blksize_shift - SECTOR_SHIFT);
 }
 
+static inline void nvmet_req_bio_put(struct nvmet_req *req, struct bio *bio)
+{
+	if (bio != &req->b.inline_bio)
+		bio_put(bio);
+}
+
 #endif /* _NVMET_H */
diff --git a/drivers/nvme/target/passthru.c b/drivers/nvme/target/passthru.c
index 26c587ccd152..011aeebace55 100644
--- a/drivers/nvme/target/passthru.c
+++ b/drivers/nvme/target/passthru.c
@@ -206,8 +206,7 @@ static int nvmet_passthru_map_sg(struct nvmet_req *req, struct request *rq)
 	for_each_sg(req->sg, sg, req->sg_cnt, i) {
 		if (bio_add_pc_page(rq->q, bio, sg_page(sg), sg->length,
 				    sg->offset) < sg->length) {
-			if (bio != &req->p.inline_bio)
-				bio_put(bio);
+			nvmet_req_bio_put(req, bio);
 			return -EINVAL;
 		}
 	}
diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
index e12629b02320..112d213583ea 100644
--- a/drivers/nvme/target/zns.c
+++ b/drivers/nvme/target/zns.c
@@ -326,7 +326,6 @@ void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
 						 bio->bi_iter.bi_sector);
 
 out_bio_put:
-	if (bio != &req->b.inline_bio)
-		bio_put(bio);
+	nvmet_req_bio_put(req, bio);
 	nvmet_req_complete(req, ret < 0 ? NVME_SC_INTERNAL : status);
 }
-- 
2.22.1


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH V12 3/3] nvmet: add nvmet_req_bio put helper for backends
  2021-03-11  7:15 ` [PATCH V12 3/3] nvmet: add nvmet_req_bio put helper for backends Chaitanya Kulkarni
@ 2021-03-12  0:37   ` Damien Le Moal
  0 siblings, 0 replies; 12+ messages in thread
From: Damien Le Moal @ 2021-03-12  0:37 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-nvme; +Cc: hch, kbusch, sagi

On 2021/03/11 16:16, Chaitanya Kulkarni wrote:
> With the addition of zns backend now we have three different backends
> with inline bio optimization. That leads to having duplicate code in
> for freeing the bio in all three backends: generic bdev, passsthru and
> generic zns.
> 
> Add a helper function to avoid duplicate code and update the
> respective backends.
> 
> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
> ---
>  drivers/nvme/target/io-cmd-bdev.c | 3 +--
>  drivers/nvme/target/nvmet.h       | 6 ++++++
>  drivers/nvme/target/passthru.c    | 3 +--
>  drivers/nvme/target/zns.c         | 3 +--
>  4 files changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
> index ada0215f5e56..ca39d787d71f 100644
> --- a/drivers/nvme/target/io-cmd-bdev.c
> +++ b/drivers/nvme/target/io-cmd-bdev.c
> @@ -173,8 +173,7 @@ static void nvmet_bio_done(struct bio *bio)
>  	struct nvmet_req *req = bio->bi_private;
>  
>  	nvmet_req_complete(req, blk_to_nvme_status(req, bio->bi_status));
> -	if (bio != &req->b.inline_bio)
> -		bio_put(bio);
> +	nvmet_req_bio_put(req, bio);
>  }
>  
>  #ifdef CONFIG_BLK_DEV_INTEGRITY
> diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
> index f3fccc49de03..2f1bd3ac34a2 100644
> --- a/drivers/nvme/target/nvmet.h
> +++ b/drivers/nvme/target/nvmet.h
> @@ -654,4 +654,10 @@ static inline sector_t nvmet_lba_to_sect(struct nvmet_ns *ns, __le64 lba)
>  	return le64_to_cpu(lba) << (ns->blksize_shift - SECTOR_SHIFT);
>  }
>  
> +static inline void nvmet_req_bio_put(struct nvmet_req *req, struct bio *bio)
> +{
> +	if (bio != &req->b.inline_bio)
> +		bio_put(bio);
> +}
> +
>  #endif /* _NVMET_H */
> diff --git a/drivers/nvme/target/passthru.c b/drivers/nvme/target/passthru.c
> index 26c587ccd152..011aeebace55 100644
> --- a/drivers/nvme/target/passthru.c
> +++ b/drivers/nvme/target/passthru.c
> @@ -206,8 +206,7 @@ static int nvmet_passthru_map_sg(struct nvmet_req *req, struct request *rq)
>  	for_each_sg(req->sg, sg, req->sg_cnt, i) {
>  		if (bio_add_pc_page(rq->q, bio, sg_page(sg), sg->length,
>  				    sg->offset) < sg->length) {
> -			if (bio != &req->p.inline_bio)
> -				bio_put(bio);
> +			nvmet_req_bio_put(req, bio);
>  			return -EINVAL;
>  		}
>  	}
> diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
> index e12629b02320..112d213583ea 100644
> --- a/drivers/nvme/target/zns.c
> +++ b/drivers/nvme/target/zns.c
> @@ -326,7 +326,6 @@ void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
>  						 bio->bi_iter.bi_sector);
>  
>  out_bio_put:
> -	if (bio != &req->b.inline_bio)
> -		bio_put(bio);
> +	nvmet_req_bio_put(req, bio);
>  	nvmet_req_complete(req, ret < 0 ? NVME_SC_INTERNAL : status);
>  }
> 

Looks good to me.

Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>

-- 
Damien Le Moal
Western Digital Research

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V12 2/3] nvmet: add ZBD over ZNS backend support
  2021-03-11  7:15 ` [PATCH V12 2/3] nvmet: add ZBD over ZNS backend support Chaitanya Kulkarni
@ 2021-03-12  1:15   ` Damien Le Moal
  2021-03-12  6:29     ` Chaitanya Kulkarni
  0 siblings, 1 reply; 12+ messages in thread
From: Damien Le Moal @ 2021-03-12  1:15 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-nvme; +Cc: hch, kbusch, sagi

On 2021/03/11 16:16, Chaitanya Kulkarni wrote:
> NVMe TP 4053 – Zoned Namespaces (ZNS) allows host software to
> communicate with a non-volatile memory subsystem using zones for
> NVMe protocol based controllers. NVMeOF already support the ZNS NVMe
> Protocol compliant devices on the target in the passthru mode. There
> are Generic zoned block devices like  Shingled Magnetic Recording (SMR)
> HDDs that are not based on the NVMe protocol.
> 
> This patch adds ZNS backend to support the ZBDs for NVMeOF target.
> 
> This support includes implementing the new command set NVME_CSI_ZNS,
> adding different command handlers for ZNS command set such as
> NVMe Identify Controller, NVMe Identify Namespace, NVMe Zone Append,
> NVMe Zone Management Send and NVMe Zone Management Receive.
> 
> With new command set identifier we also update the target command
> effects logs to reflect the ZNS compliant commands.
> 
> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
> ---
>  drivers/nvme/target/Makefile      |   1 +
>  drivers/nvme/target/admin-cmd.c   |  27 +++
>  drivers/nvme/target/io-cmd-bdev.c |  34 ++-
>  drivers/nvme/target/nvmet.h       |  38 ++++
>  drivers/nvme/target/zns.c         | 332 ++++++++++++++++++++++++++++++
>  5 files changed, 424 insertions(+), 8 deletions(-)
>  create mode 100644 drivers/nvme/target/zns.c
> 
> diff --git a/drivers/nvme/target/Makefile b/drivers/nvme/target/Makefile
> index ebf91fc4c72e..9837e580fa7e 100644
> --- a/drivers/nvme/target/Makefile
> +++ b/drivers/nvme/target/Makefile
> @@ -12,6 +12,7 @@ obj-$(CONFIG_NVME_TARGET_TCP)		+= nvmet-tcp.o
>  nvmet-y		+= core.o configfs.o admin-cmd.o fabrics-cmd.o \
>  			discovery.o io-cmd-file.o io-cmd-bdev.o
>  nvmet-$(CONFIG_NVME_TARGET_PASSTHRU)	+= passthru.o
> +nvmet-$(CONFIG_BLK_DEV_ZONED)		+= zns.o
>  nvme-loop-y	+= loop.o
>  nvmet-rdma-y	+= rdma.o
>  nvmet-fc-y	+= fc.o
> diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
> index 176c8593d341..bf4876df624a 100644
> --- a/drivers/nvme/target/admin-cmd.c
> +++ b/drivers/nvme/target/admin-cmd.c
> @@ -179,6 +179,13 @@ static void nvmet_set_csi_nvm_effects(struct nvme_effects_log *log)
>  	log->iocs[nvme_cmd_write_zeroes]	= cpu_to_le32(1 << 0);
>  }
>  
> +static void nvmet_set_csi_zns_effects(struct nvme_effects_log *log)
> +{
> +	log->iocs[nvme_cmd_zone_append]		= cpu_to_le32(1 << 0);
> +	log->iocs[nvme_cmd_zone_mgmt_send]	= cpu_to_le32(1 << 0);
> +	log->iocs[nvme_cmd_zone_mgmt_recv]	= cpu_to_le32(1 << 0);
> +}
> +
>  static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req)
>  {
>  	struct nvme_effects_log *log;
> @@ -194,6 +201,15 @@ static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req)
>  	case NVME_CSI_NVM:
>  		nvmet_set_csi_nvm_effects(log);
>  		break;
> +	case NVME_CSI_ZNS:
> +		if (!IS_ENABLED(CONFIG_BLK_DEV_ZONED)) {
> +			status = NVME_SC_INVALID_IO_CMD_SET;
> +			goto free;
> +		}
> +
> +		nvmet_set_csi_nvm_effects(log);
> +		nvmet_set_csi_zns_effects(log);
> +		break;
>  	default:
>  		status = NVME_SC_INVALID_LOG_PAGE;
>  		goto free;
> @@ -630,6 +646,13 @@ static u16 nvmet_execute_identify_desclist_csi(struct nvmet_req *req, off_t *o)
>  {
>  	switch (req->ns->csi) {
>  	case NVME_CSI_NVM:
> +		return nvmet_copy_ns_identifier(req, NVME_NIDT_CSI,
> +						NVME_NIDT_CSI_LEN,
> +						&req->ns->csi, o);
> +	case NVME_CSI_ZNS:
> +		if (!IS_ENABLED(CONFIG_BLK_DEV_ZONED))
> +			return NVME_SC_INVALID_IO_CMD_SET;
> +
>  		return nvmet_copy_ns_identifier(req, NVME_NIDT_CSI,
>  						NVME_NIDT_CSI_LEN,
>  						&req->ns->csi, o);
> @@ -682,8 +705,12 @@ static void nvmet_execute_identify(struct nvmet_req *req)
>  	switch (req->cmd->identify.cns) {
>  	case NVME_ID_CNS_NS:
>  		return nvmet_execute_identify_ns(req);
> +	case NVME_ID_CNS_CS_NS:
> +		return nvmet_execute_identify_cns_cs_ns(req);
>  	case NVME_ID_CNS_CTRL:
>  		return nvmet_execute_identify_ctrl(req);
> +	case NVME_ID_CNS_CS_CTRL:
> +		return nvmet_execute_identify_cns_cs_ctrl(req);
>  	case NVME_ID_CNS_NS_ACTIVE_LIST:
>  		return nvmet_execute_identify_nslist(req);
>  	case NVME_ID_CNS_NS_DESC_LIST:
> diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
> index 9a8b3726a37c..ada0215f5e56 100644
> --- a/drivers/nvme/target/io-cmd-bdev.c
> +++ b/drivers/nvme/target/io-cmd-bdev.c
> @@ -63,6 +63,14 @@ static void nvmet_bdev_ns_enable_integrity(struct nvmet_ns *ns)
>  	}
>  }
>  
> +void nvmet_bdev_ns_disable(struct nvmet_ns *ns)
> +{
> +	if (ns->bdev) {
> +		blkdev_put(ns->bdev, FMODE_WRITE | FMODE_READ);
> +		ns->bdev = NULL;
> +	}
> +}
> +
>  int nvmet_bdev_ns_enable(struct nvmet_ns *ns)
>  {
>  	int ret;
> @@ -86,15 +94,16 @@ int nvmet_bdev_ns_enable(struct nvmet_ns *ns)
>  	if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY_T10))
>  		nvmet_bdev_ns_enable_integrity(ns);
>  
> -	return 0;
> -}
> -
> -void nvmet_bdev_ns_disable(struct nvmet_ns *ns)
> -{
> -	if (ns->bdev) {
> -		blkdev_put(ns->bdev, FMODE_WRITE | FMODE_READ);
> -		ns->bdev = NULL;
> +	/* bdev_is_zoned() is stubbed out of CONFIG_BLK_DEV_ZONED */

I do not really understand this comment and I do not think it is useful.
bdev_is_zoned() is always defined, regardless of CONFIG_BLK_DEV_ZONED. If
CONFIG_BLK_DEV_ZONED is not set, you will always get false.

> +	if (bdev_is_zoned(ns->bdev)) {
> +		if (!nvmet_bdev_zns_enable(ns)) {
> +			nvmet_bdev_ns_disable(ns);
> +			return -EINVAL;
> +		}
> +		ns->csi = NVME_CSI_ZNS;
>  	}
> +
> +	return 0;
>  }
>  
>  void nvmet_bdev_ns_revalidate(struct nvmet_ns *ns)
> @@ -448,6 +457,15 @@ u16 nvmet_bdev_parse_io_cmd(struct nvmet_req *req)
>  	case nvme_cmd_write_zeroes:
>  		req->execute = nvmet_bdev_execute_write_zeroes;
>  		return 0;
> +	case nvme_cmd_zone_append:
> +		req->execute = nvmet_bdev_execute_zone_append;
> +		return 0;
> +	case nvme_cmd_zone_mgmt_recv:
> +		req->execute = nvmet_bdev_execute_zone_mgmt_recv;
> +		return 0;
> +	case nvme_cmd_zone_mgmt_send:
> +		req->execute = nvmet_bdev_execute_zone_mgmt_send;
> +		return 0;
>  	default:
>  		return nvmet_report_invalid_opcode(req);
>  	}
> diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
> index ee5999920155..f3fccc49de03 100644
> --- a/drivers/nvme/target/nvmet.h
> +++ b/drivers/nvme/target/nvmet.h
> @@ -247,6 +247,10 @@ struct nvmet_subsys {
>  	unsigned int		admin_timeout;
>  	unsigned int		io_timeout;
>  #endif /* CONFIG_NVME_TARGET_PASSTHRU */
> +
> +#ifdef CONFIG_BLK_DEV_ZONED
> +	u8			zasl;
> +#endif /* CONFIG_BLK_DEV_ZONED */
>  };
>  
>  static inline struct nvmet_subsys *to_subsys(struct config_item *item)
> @@ -584,6 +588,40 @@ static inline struct nvme_ctrl *nvmet_passthru_ctrl(struct nvmet_subsys *subsys)
>  }
>  #endif /* CONFIG_NVME_TARGET_PASSTHRU */
>  
> +#ifdef CONFIG_BLK_DEV_ZONED
> +bool nvmet_bdev_zns_enable(struct nvmet_ns *ns);
> +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req);
> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req);
> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req);
> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req);
> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req);
> +#else  /* CONFIG_BLK_DEV_ZONED */
> +static inline bool nvmet_bdev_zns_enable(struct nvmet_ns *ns)
> +{
> +	return false;
> +}
> +static inline void
> +nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
> +{
> +}
> +static inline void
> +nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
> +{
> +}
> +static inline void
> +nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
> +{
> +}
> +static inline void
> +nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
> +{
> +}
> +static inline void
> +nvmet_bdev_execute_zone_append(struct nvmet_req *req)
> +{
> +}
> +#endif /* CONFIG_BLK_DEV_ZONED */
> +
>  static inline struct nvme_ctrl *
>  nvmet_req_passthru_ctrl(struct nvmet_req *req)
>  {
> diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
> new file mode 100644
> index 000000000000..e12629b02320
> --- /dev/null
> +++ b/drivers/nvme/target/zns.c
> @@ -0,0 +1,332 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * NVMe ZNS-ZBD command implementation.

s/ZBD/zoned block device

"ZBD" is not necessarilly an obvious acronym to everybody.

> + * Copyright (c) 2020-2021 HGST, a Western Digital Company.

This should be:

* Copyright (C) 2021 Western Digital Corporation or its affiliates.

> + */
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +#include <linux/nvme.h>
> +#include <linux/blkdev.h>
> +#include "nvmet.h"
> +
> +/*
> + * We set the Memory Page Size Minimum (MPSMIN) for target controller to 0
> + * which gets added by 12 in the nvme_enable_ctrl() which results in 2^12 = 4k
> + * as page_shift value. When calculating the ZASL use shift by 12.
> + */
> +#define NVMET_MPSMIN_SHIFT	12
> +
> +static u16 nvmet_bdev_zns_checks(struct nvmet_req *req)
> +{
> +	if (!bdev_is_zoned(req->ns->bdev))
> +		return NVME_SC_INVALID_NS | NVME_SC_DNR;
> +
> +	if (req->cmd->zmr.zra != NVME_ZRA_ZONE_REPORT) {
> +		req->error_loc = offsetof(struct nvme_zone_mgmt_recv_cmd, zra);
> +		return NVME_SC_INVALID_FIELD | NVME_SC_DNR;
> +	}
> +
> +	if (req->cmd->zmr.zrasf != NVME_ZRASF_ZONE_REPORT_ALL) {
> +		req->error_loc =
> +			offsetof(struct nvme_zone_mgmt_recv_cmd, zrasf);
> +		return NVME_SC_INVALID_FIELD | NVME_SC_DNR;
> +	}
> +
> +	if (req->cmd->zmr.pr != NVME_REPORT_ZONE_PARTIAL) {
> +		req->error_loc = offsetof(struct nvme_zone_mgmt_recv_cmd, pr);
> +		return NVME_SC_INVALID_FIELD | NVME_SC_DNR;
> +	}
> +
> +	return NVME_SC_SUCCESS;
> +}
> +
> +static inline u8 nvmet_zasl(unsigned int zone_append_sects)
> +{
> +	/*
> +	 * Zone Append Size Limit is the value experessed in the units
> +	 * of minimum memory page size (i.e. 12) and is reported power of 2.
> +	 */
> +	return ilog2((zone_append_sects << 9) >> NVMET_MPSMIN_SHIFT);

s/9/SECTOR_SHIFT

And you could rewrite this as:

return ilog2(zone_append_sects >> (NVMET_MPSMIN_SHIFT - SECTOR_SHIFT));

> +}
> +
> +static inline bool nvmet_zns_update_zasl(struct nvmet_ns *ns)
> +{
> +	struct request_queue *q = ns->bdev->bd_disk->queue;
> +	u8 zasl = nvmet_zasl(queue_max_zone_append_sectors(q));
> +
> +	if (ns->subsys->zasl)
> +		return ns->subsys->zasl < zasl;
> +
> +	ns->subsys->zasl = zasl;
> +	return true;
> +}
> +
> +static int nvmet_bdev_validate_zns_zones_cb(struct blk_zone *z,
> +					    unsigned int i, void *data)
> +{
> +	if (z->type == BLK_ZONE_TYPE_CONVENTIONAL)
> +		return -EOPNOTSUPP;
> +	return 0;
> +}
> +
> +static bool nvmet_bdev_has_conv_zones(struct block_device *bdev)
> +{
> +	int ret;
> +
> +	if (bdev->bd_disk->queue->conv_zones_bitmap)
> +		return true;
> +
> +	ret = blkdev_report_zones(bdev, 0, blkdev_nr_zones(bdev->bd_disk),
> +				  nvmet_bdev_validate_zns_zones_cb, NULL);
> +
> +	return ret <= 0 ? true : false;
> +}
> +
> +bool nvmet_bdev_zns_enable(struct nvmet_ns *ns)
> +{
> +	if (nvmet_bdev_has_conv_zones(ns->bdev))
> +		return false;
> +
> +	ns->blksize_shift = blksize_bits(bdev_logical_block_size(ns->bdev));
> +
> +	if (!nvmet_zns_update_zasl(ns))
> +		return false;
> +
> +	return !(get_capacity(ns->bdev->bd_disk) &
> +			(bdev_zone_sectors(ns->bdev) - 1));

It may be good to add a comment above this one as it is not necessarilly
obvious. Something like:

/*
 * Generic zoned block devices may have a smaller last zone which is
 * not supported by ZNS. Excludes zoned drives that have such smaller
 * last zone.
 */

> +}
> +
> +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
> +{
> +	u8 zasl = req->sq->ctrl->subsys->zasl;
> +	struct nvmet_ctrl *ctrl = req->sq->ctrl;
> +	struct nvme_id_ctrl_zns *id;
> +	u16 status;
> +
> +	if (req->cmd->identify.csi != NVME_CSI_ZNS) {
> +		req->error_loc = offsetof(struct nvme_common_command, opcode);
> +		status = NVME_SC_INVALID_OPCODE | NVME_SC_DNR;
> +		goto out;
> +	}
> +
> +	id = kzalloc(sizeof(*id), GFP_KERNEL);
> +	if (!id) {
> +		status = NVME_SC_INTERNAL;
> +		goto out;
> +	}
> +
> +	if (ctrl->ops->get_mdts)
> +		id->zasl = min_t(u8, ctrl->ops->get_mdts(ctrl), zasl);
> +	else
> +		id->zasl = zasl;
> +
> +	status = nvmet_copy_to_sgl(req, 0, id, sizeof(*id));
> +
> +	kfree(id);
> +out:
> +	nvmet_req_complete(req, status);
> +}
> +
> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
> +{
> +	struct nvme_id_ns_zns *id_zns;
> +	u64 zsze;
> +	u16 status;
> +
> +	if (req->cmd->identify.csi != NVME_CSI_ZNS) {
> +		req->error_loc = offsetof(struct nvme_common_command, opcode);
> +		status = NVME_SC_INVALID_OPCODE | NVME_SC_DNR;
> +		goto out;
> +	}
> +
> +	if (le32_to_cpu(req->cmd->identify.nsid) == NVME_NSID_ALL) {
> +		req->error_loc = offsetof(struct nvme_identify, nsid);
> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
> +		goto out;
> +	}
> +
> +	id_zns = kzalloc(sizeof(*id_zns), GFP_KERNEL);
> +	if (!id_zns) {
> +		status = NVME_SC_INTERNAL;
> +		goto out;
> +	}
> +
> +	status = nvmet_req_find_ns(req);
> +	if (status) {
> +		status = NVME_SC_INTERNAL;
> +		goto done;
> +	}
> +
> +	if (!bdev_is_zoned(req->ns->bdev)) {
> +		req->error_loc = offsetof(struct nvme_identify, nsid);
> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
> +		goto done;
> +	}
> +
> +	nvmet_ns_revalidate(req->ns);
> +	zsze = (bdev_zone_sectors(req->ns->bdev) << 9) >>
> +					req->ns->blksize_shift;

s/9/SECTOR_SHIFT

> +	id_zns->lbafe[0].zsze = cpu_to_le64(zsze);
> +	id_zns->mor = cpu_to_le32(bdev_max_open_zones(req->ns->bdev));
> +	id_zns->mar = cpu_to_le32(bdev_max_active_zones(req->ns->bdev));
> +
> +done:
> +	status = nvmet_copy_to_sgl(req, 0, id_zns, sizeof(*id_zns));
> +	kfree(id_zns);
> +out:
> +	nvmet_req_complete(req, status);
> +}
> +
> +struct nvmet_report_zone_data {
> +	struct nvmet_ns *ns;
> +	struct nvme_zone_report *rz;
> +};
> +
> +static int nvmet_bdev_report_zone_cb(struct blk_zone *z, unsigned i, void *d)
> +{
> +	struct nvmet_report_zone_data *report_zone_data = d;
> +	struct nvme_zone_descriptor *entries = report_zone_data->rz->entries;
> +	struct nvmet_ns *ns = report_zone_data->ns;
> +
> +	entries[i].zcap = nvmet_sect_to_lba(ns, z->capacity);
> +	entries[i].zslba = nvmet_sect_to_lba(ns, z->start);
> +	entries[i].wp = nvmet_sect_to_lba(ns, z->wp);
> +	entries[i].za = z->reset ? 1 << 2 : 0;
> +	entries[i].zt = z->type;
> +	entries[i].zs = z->cond << 4;
> +
> +	return 0;
> +}
> +
> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
> +{
> +	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->zmr.slba);
> +	u32 bufsize = (le32_to_cpu(req->cmd->zmr.numd) + 1) << 2;
> +	struct nvmet_report_zone_data data = { .ns = req->ns };
> +	unsigned int nr_zones;
> +	int reported_zones;
> +	u16 status;
> +
> +	status = nvmet_bdev_zns_checks(req);
> +	if (status)
> +		goto out;
> +
> +	data.rz = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY | __GFP_ZERO);
> +	if (!data.rz) {
> +		status = NVME_SC_INTERNAL;
> +		goto out;
> +	}
> +
> +	nr_zones = (bufsize - sizeof(struct nvme_zone_report)) /
> +			sizeof(struct nvme_zone_descriptor);
> +	if (!nr_zones) {
> +		status = NVME_SC_INVALID_FIELD | NVME_SC_DNR;
> +		goto out_free_report_zones;
> +	}
> +
> +	reported_zones = blkdev_report_zones(req->ns->bdev, sect, nr_zones,
> +					     nvmet_bdev_report_zone_cb, &data);
> +	if (reported_zones < 0) {
> +		status = NVME_SC_INTERNAL;
> +		goto out_free_report_zones;
> +	}

There is a problem here: the code as is ignores the request reporting option
field which can lead to an invalid zone report being returned. I think you need
to modify nvmet_bdev_report_zone_cb() to look at the reporting option field
passed by the initiator and filter the zone report since blkdev_report_zones()
does not handle that argument.

> +
> +	data.rz->nr_zones = cpu_to_le64(reported_zones);
> +
> +	status = nvmet_copy_to_sgl(req, 0, data.rz, bufsize);
> +
> +out_free_report_zones:
> +	kvfree(data.rz);
> +out:
> +	nvmet_req_complete(req, status);
> +}
> +
> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
> +{
> +	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->zms.slba);
> +	sector_t nr_sect = bdev_zone_sectors(req->ns->bdev);
> +	u16 status = NVME_SC_SUCCESS;
> +	u8 zsa = req->cmd->zms.zsa;
> +	enum req_opf op;
> +	int ret;
> +	const unsigned int zsa_to_op[] = {
> +		[NVME_ZONE_OPEN]	= REQ_OP_ZONE_OPEN,
> +		[NVME_ZONE_CLOSE]	= REQ_OP_ZONE_CLOSE,
> +		[NVME_ZONE_FINISH]	= REQ_OP_ZONE_FINISH,
> +		[NVME_ZONE_RESET]	= REQ_OP_ZONE_RESET,
> +	};
> +
> +	if (zsa > ARRAY_SIZE(zsa_to_op) || !zsa_to_op[zsa]) {

What is the point of the "!zsa_to_op[zsa]" here ? All the REQ_OP_ZONE_XXX are
non 0, always...

> +		status = NVME_SC_INVALID_FIELD;
> +		goto out;
> +	}
> +
> +	op = zsa_to_op[zsa];
> +
> +	if (req->cmd->zms.select_all)
> +		nr_sect = get_capacity(req->ns->bdev->bd_disk);
If select_all is set, sect must be ignored, so you need to have something like this:

	if (req->cmd->zms.select_all) {
		sect = 0;
		nr_sect = get_capacity(req->ns->bdev->bd_disk);
	} else {
		sect = nvmet_lba_to_sect(req->ns, req->cmd->zms.slba);
		nr_sect = bdev_zone_sectors(req->ns->bdev);
	}

Easier to read. Also may be rename nr_sect to nr_sects (plural).

> +
> +	ret = blkdev_zone_mgmt(req->ns->bdev, op, sect, nr_sect, GFP_KERNEL);
> +	if (ret)
> +		status = NVME_SC_INTERNAL;
> +out:
> +	nvmet_req_complete(req, status);
> +}
> +
> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
> +{
> +	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->rw.slba);
> +	u16 status = NVME_SC_SUCCESS;
> +	unsigned int total_len = 0;
> +	struct scatterlist *sg;
> +	int ret = 0, sg_cnt;
> +	struct bio *bio;
> +
> +	if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req)))
> +		return;

No nvmet_req_complete() call ? Is that done in nvmet_check_transfer_len() ?

> +
> +	if (!req->sg_cnt) {
> +		nvmet_req_complete(req, 0);
> +		return;
> +	}
> +
> +	if (req->transfer_len <= NVMET_MAX_INLINE_DATA_LEN) {
> +		bio = &req->b.inline_bio;
> +		bio_init(bio, req->inline_bvec, ARRAY_SIZE(req->inline_bvec));
> +	} else {
> +		bio = bio_alloc(GFP_KERNEL, req->sg_cnt);
> +	}
> +
> +	bio_set_dev(bio, req->ns->bdev);
> +	bio->bi_iter.bi_sector = sect;
> +	bio->bi_opf = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
> +	if (req->cmd->rw.control & cpu_to_le16(NVME_RW_FUA))
> +		bio->bi_opf |= REQ_FUA;
> +
> +	for_each_sg(req->sg, sg, req->sg_cnt, sg_cnt) {
> +		struct page *p = sg_page(sg);
> +		unsigned int l = sg->length;
> +		unsigned int o = sg->offset;
> +
> +		ret = bio_add_zone_append_page(bio, p, l, o);
> +		if (ret != sg->length) {
> +			status = NVME_SC_INTERNAL;
> +			goto out_bio_put;
> +		}
> +
> +		total_len += sg->length;
> +	}
> +
> +	if (total_len != nvmet_rw_data_len(req)) {
> +		status = NVME_SC_INTERNAL | NVME_SC_DNR;
> +		goto out_bio_put;
> +	}
> +
> +	ret = submit_bio_wait(bio);

submit_bio_wait() ? Why blocking here ? That would be bad for performance. Is it
mandatory to block here ? The result handling could be done in the bio_end
callback no ?

> +	req->cqe->result.u64 = nvmet_sect_to_lba(req->ns,
> +						 bio->bi_iter.bi_sector);
> +
> +out_bio_put:
> +	if (bio != &req->b.inline_bio)
> +		bio_put(bio);
> +	nvmet_req_complete(req, ret < 0 ? NVME_SC_INTERNAL : status);
> +}
> 


-- 
Damien Le Moal
Western Digital Research

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V12 2/3] nvmet: add ZBD over ZNS backend support
  2021-03-12  1:15   ` Damien Le Moal
@ 2021-03-12  6:29     ` Chaitanya Kulkarni
  2021-03-12  7:25       ` Damien Le Moal
  0 siblings, 1 reply; 12+ messages in thread
From: Chaitanya Kulkarni @ 2021-03-12  6:29 UTC (permalink / raw)
  To: Damien Le Moal, linux-nvme; +Cc: hch, kbusch, sagi

>> -
>> -void nvmet_bdev_ns_disable(struct nvmet_ns *ns)
>> -{
>> -	if (ns->bdev) {
>> -		blkdev_put(ns->bdev, FMODE_WRITE | FMODE_READ);
>> -		ns->bdev = NULL;
>> +	/* bdev_is_zoned() is stubbed out of CONFIG_BLK_DEV_ZONED */
> I do not really understand this comment and I do not think it is useful.
> bdev_is_zoned() is always defined, regardless of CONFIG_BLK_DEV_ZONED. If
> CONFIG_BLK_DEV_ZONED is not set, you will always get false.

Okay, will remove the comment.

>> +
>>  static inline struct nvme_ctrl *
>>  nvmet_req_passthru_ctrl(struct nvmet_req *req)
>>  {
>> diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
>> new file mode 100644
>> index 000000000000..e12629b02320
>> --- /dev/null
>> +++ b/drivers/nvme/target/zns.c
>> @@ -0,0 +1,332 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * NVMe ZNS-ZBD command implementation.
> s/ZBD/zoned block device
>
> "ZBD" is not necessarilly an obvious acronym to everybody.
>
>> + * Copyright (c) 2020-2021 HGST, a Western Digital Company.
> This should be:
>
> * Copyright (C) 2021 Western Digital Corporation or its affiliates.

Will fix the header.

>> +static inline u8 nvmet_zasl(unsigned int zone_append_sects)
>> +{
>> +	/*
>> +	 * Zone Append Size Limit is the value experessed in the units
>> +	 * of minimum memory page size (i.e. 12) and is reported power of 2.
>> +	 */
>> +	return ilog2((zone_append_sects << 9) >> NVMET_MPSMIN_SHIFT);
> s/9/SECTOR_SHIFT
>
> And you could rewrite this as:
>
> return ilog2(zone_append_sects >> (NVMET_MPSMIN_SHIFT - SECTOR_SHIFT));

SECTOR_SHIFT patches are nacked, reason being it doesn't make code
clear, how about following ?

return ilog2(zone_append_sects >> (NVMET_MPSMIN_SHIFT - 9));

>
>> +bool nvmet_bdev_zns_enable(struct nvmet_ns *ns)
>> +{
>> +	if (nvmet_bdev_has_conv_zones(ns->bdev))
>> +		return false;
>> +
>> +	ns->blksize_shift = blksize_bits(bdev_logical_block_size(ns->bdev));
>> +
>> +	if (!nvmet_zns_update_zasl(ns))
>> +		return false;
>> +
>> +	return !(get_capacity(ns->bdev->bd_disk) &
>> +			(bdev_zone_sectors(ns->bdev) - 1));
> It may be good to add a comment above this one as it is not necessarilly
> obvious. Something like:
>
> /*
>  * Generic zoned block devices may have a smaller last zone which is
>  * not supported by ZNS. Excludes zoned drives that have such smaller
>  * last zone.
>  */

Will add above comment.

>
>> +}
>> +
>> +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
>> +{
>> +	u8 zasl = req->sq->ctrl->subsys->zasl;
>> +	struct nvmet_ctrl *ctrl = req->sq->ctrl;
>> +	struct nvme_id_ctrl_zns *id;
>> +	u16 status;
>> +
>> +	if (req->cmd->identify.csi != NVME_CSI_ZNS) {
>> +		req->error_loc = offsetof(struct nvme_common_command, opcode);
>> +		status = NVME_SC_INVALID_OPCODE | NVME_SC_DNR;
>> +		goto out;
>> +	}
>> +
>> +	id = kzalloc(sizeof(*id), GFP_KERNEL);
>> +	if (!id) {
>> +		status = NVME_SC_INTERNAL;
>> +		goto out;
>> +	}
>> +
>> +	if (ctrl->ops->get_mdts)
>> +		id->zasl = min_t(u8, ctrl->ops->get_mdts(ctrl), zasl);
>> +	else
>> +		id->zasl = zasl;
>> +
>> +	status = nvmet_copy_to_sgl(req, 0, id, sizeof(*id));
>> +
>> +	kfree(id);
>> +out:
>> +	nvmet_req_complete(req, status);
>> +}
>> +
>> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
>> +{
>> +	struct nvme_id_ns_zns *id_zns;
>> +	u64 zsze;
>> +	u16 status;
>> +
>> +	if (req->cmd->identify.csi != NVME_CSI_ZNS) {
>> +		req->error_loc = offsetof(struct nvme_common_command, opcode);
>> +		status = NVME_SC_INVALID_OPCODE | NVME_SC_DNR;
>> +		goto out;
>> +	}
>> +
>> +	if (le32_to_cpu(req->cmd->identify.nsid) == NVME_NSID_ALL) {
>> +		req->error_loc = offsetof(struct nvme_identify, nsid);
>> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
>> +		goto out;
>> +	}
>> +
>> +	id_zns = kzalloc(sizeof(*id_zns), GFP_KERNEL);
>> +	if (!id_zns) {
>> +		status = NVME_SC_INTERNAL;
>> +		goto out;
>> +	}
>> +
>> +	status = nvmet_req_find_ns(req);
>> +	if (status) {
>> +		status = NVME_SC_INTERNAL;
>> +		goto done;
>> +	}
>> +
>> +	if (!bdev_is_zoned(req->ns->bdev)) {
>> +		req->error_loc = offsetof(struct nvme_identify, nsid);
>> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
>> +		goto done;
>> +	}
>> +
>> +	nvmet_ns_revalidate(req->ns);
>> +	zsze = (bdev_zone_sectors(req->ns->bdev) << 9) >>
>> +					req->ns->blksize_shift;
> s/9/SECTOR_SHIFT

See my earlier reply to the same comment.

>> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
>> +{
>> +	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->zmr.slba);
>> +	u32 bufsize = (le32_to_cpu(req->cmd->zmr.numd) + 1) << 2;
>> +	struct nvmet_report_zone_data data = { .ns = req->ns };
>> +	unsigned int nr_zones;
>> +	int reported_zones;
>> +	u16 status;
>> +
>> +	status = nvmet_bdev_zns_checks(req);
>> +	if (status)
>> +		goto out;
>> +
>> +	data.rz = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY | __GFP_ZERO);
>> +	if (!data.rz) {
>> +		status = NVME_SC_INTERNAL;
>> +		goto out;
>> +	}
>> +
>> +	nr_zones = (bufsize - sizeof(struct nvme_zone_report)) /
>> +			sizeof(struct nvme_zone_descriptor);
>> +	if (!nr_zones) {
>> +		status = NVME_SC_INVALID_FIELD | NVME_SC_DNR;
>> +		goto out_free_report_zones;
>> +	}
>> +
>> +	reported_zones = blkdev_report_zones(req->ns->bdev, sect, nr_zones,
>> +					     nvmet_bdev_report_zone_cb, &data);
>> +	if (reported_zones < 0) {
>> +		status = NVME_SC_INTERNAL;
>> +		goto out_free_report_zones;
>> +	}
> There is a problem here: the code as is ignores the request reporting option
> field which can lead to an invalid zone report being returned. I think you need
> to modify nvmet_bdev_report_zone_cb() to look at the reporting option field
> passed by the initiator and filter the zone report since blkdev_report_zones()
> does not handle that argument.

The reporting options are set by the host statistically in
nvme_ns_report_zones()
arefrom:-  nvme_ns_report_zones()
         c.zmr.zra = NVME_ZRA_ZONE_REPORT;
         c.zmr.zrasf = NVME_ZRASF_ZONE_REPORT_ALL;
         c.zmr.pr = NVME_REPORT_ZONE_PARTIAL;

All the above values are validated in the nvmet_bdev_zns_checks() helper
called from nvmet_bdev_execute_zone_mgmt_recv() before we allocate the
report zone buffer.

1. c.zmr.zra indicates the action which Reports zone descriptor entries
   through the Report Zones data structure.

   We validate this value is been set to NVME_ZRA_ZONE_REPORT in the
   nvmet_bdev_zns_chceks(). We are calling report zone after checking
   zone receive action it NVME_ZRA_ZONE_REPORT so not filtering is needed
   in the nvmet_bdev_report_zone_cb().

2. c.zmr.zrasf indicates the action specific field which is set to
   NVME_ZRASF_ZONE_REPORT_ALL.

   We validate this value is been set to NVME_ZRASF_ZONE_REPORT_ALL in the
   nvmet_bdev_zns_chceks(). Since host wants all the zones we don't need to
   filter any zone states in the nvmet_bdev_report_zone_cb().

3. c.zmr.pr is set to NVME_REPORT_ZONE_PARTIAL which value = 1 i.e value in
   the Report Zone data structure Number of Zones field indicates the
number of
   fully transferred zone descriptors in the data buffer, which we set from
   return value of the blkdev_report_zones() :-
  

   reported_zones = blkdev_report_zones(req->ns->bdev, sect, nr_zones,
					     nvmet_bdev_report_zone_cb, &data);
<snip>   data.rz->nr_zones = cpu_to_le64(reported_zones);

   So no filtering is needed in nvmet_bdev_report_zone_cb() for c.zmr.pr.

Can you please explain what filtering is missing in the current code ?

Maybe I'm looking into an old spec.

>> +
>> +	data.rz->nr_zones = cpu_to_le64(reported_zones);
>> +
>> +	status = nvmet_copy_to_sgl(req, 0, data.rz, bufsize);
>> +
>> +out_free_report_zones:
>> +	kvfree(data.rz);
>> +out:
>> +	nvmet_req_complete(req, status);
>> +}
>> +
>> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
>> +{
>> +	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->zms.slba);
>> +	sector_t nr_sect = bdev_zone_sectors(req->ns->bdev);
>> +	u16 status = NVME_SC_SUCCESS;
>> +	u8 zsa = req->cmd->zms.zsa;
>> +	enum req_opf op;
>> +	int ret;
>> +	const unsigned int zsa_to_op[] = {
>> +		[NVME_ZONE_OPEN]	= REQ_OP_ZONE_OPEN,
>> +		[NVME_ZONE_CLOSE]	= REQ_OP_ZONE_CLOSE,
>> +		[NVME_ZONE_FINISH]	= REQ_OP_ZONE_FINISH,
>> +		[NVME_ZONE_RESET]	= REQ_OP_ZONE_RESET,
>> +	};
>> +
>> +	if (zsa > ARRAY_SIZE(zsa_to_op) || !zsa_to_op[zsa]) {
> What is the point of the "!zsa_to_op[zsa]" here ? All the REQ_OP_ZONE_XXX are
> non 0, always...

Well this is just making sure that we receive the right action since sparse
array will return 0 for any other values than listed above having
!zsa_to_op[zsa] check we can return an error.

See nvme_sysfs_show_state() if you want.

>> +		status = NVME_SC_INVALID_FIELD;
>> +		goto out;
>> +	}
>> +
>> +	op = zsa_to_op[zsa];
>> +
>> +	if (req->cmd->zms.select_all)
>> +		nr_sect = get_capacity(req->ns->bdev->bd_disk);
> If select_all is set, sect must be ignored, so you need to have something like this:
>
> 	if (req->cmd->zms.select_all) {
> 		sect = 0;
> 		nr_sect = get_capacity(req->ns->bdev->bd_disk);
> 	} else {
> 		sect = nvmet_lba_to_sect(req->ns, req->cmd->zms.slba);
> 		nr_sect = bdev_zone_sectors(req->ns->bdev);
> 	}
>
> Easier to read. Also may be rename nr_sect to nr_sects (plural).

Okay, will make this change.

>> +
>> +	ret = blkdev_zone_mgmt(req->ns->bdev, op, sect, nr_sect, GFP_KERNEL);
>> +	if (ret)
>> +		status = NVME_SC_INTERNAL;
>> +out:
>> +	nvmet_req_complete(req, status);
>> +}
>> +
>> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
>> +{
>> +	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->rw.slba);
>> +	u16 status = NVME_SC_SUCCESS;
>> +	unsigned int total_len = 0;
>> +	struct scatterlist *sg;
>> +	int ret = 0, sg_cnt;
>> +	struct bio *bio;
>> +
>> +	if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req)))
>> +		return;
> No nvmet_req_complete() call ? Is that done in nvmet_check_transfer_len() ?

Yes it does, you had the same comment on earlier version, it can be
confusing
that is why I proposed a helper for check transfer len and !req->sg_cnt
check,
but we don't want that helper.

>> +
>> +	if (!req->sg_cnt) {
>> +		nvmet_req_complete(req, 0);
>> +		return;
>> +	}
>> +
>> +	if (req->transfer_len <= NVMET_MAX_INLINE_DATA_LEN) {
>> +		bio = &req->b.inline_bio;
>> +		bio_init(bio, req->inline_bvec, ARRAY_SIZE(req->inline_bvec));
>> +	} else {
>> +		bio = bio_alloc(GFP_KERNEL, req->sg_cnt);
>> +	}
>> +
>> +	bio_set_dev(bio, req->ns->bdev);
>> +	bio->bi_iter.bi_sector = sect;
>> +	bio->bi_opf = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
>> +	if (req->cmd->rw.control & cpu_to_le16(NVME_RW_FUA))
>> +		bio->bi_opf |= REQ_FUA;
>> +
>> +	for_each_sg(req->sg, sg, req->sg_cnt, sg_cnt) {
>> +		struct page *p = sg_page(sg);
>> +		unsigned int l = sg->length;
>> +		unsigned int o = sg->offset;
>> +
>> +		ret = bio_add_zone_append_page(bio, p, l, o);
>> +		if (ret != sg->length) {
>> +			status = NVME_SC_INTERNAL;
>> +			goto out_bio_put;
>> +		}
>> +
>> +		total_len += sg->length;
>> +	}
>> +
>> +	if (total_len != nvmet_rw_data_len(req)) {
>> +		status = NVME_SC_INTERNAL | NVME_SC_DNR;
>> +		goto out_bio_put;
>> +	}
>> +
>> +	ret = submit_bio_wait(bio);
> submit_bio_wait() ? Why blocking here ? That would be bad for performance. Is it
> mandatory to block here ? The result handling could be done in the bio_end
> callback no ?

I did initially, but zonefs uses sync I/O, I'm not sure about the btrfs,
if it does
please let me know I'll make it async.

If there is no async caller in the kernel for REQ_OP_ZONE_APPEND
shouldwe make this
async ?



_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V12 2/3] nvmet: add ZBD over ZNS backend support
  2021-03-12  6:29     ` Chaitanya Kulkarni
@ 2021-03-12  7:25       ` Damien Le Moal
  2021-03-13  2:40         ` Chaitanya Kulkarni
  2021-03-15  3:54         ` Chaitanya Kulkarni
  0 siblings, 2 replies; 12+ messages in thread
From: Damien Le Moal @ 2021-03-12  7:25 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-nvme; +Cc: hch, kbusch, sagi

On 2021/03/12 15:29, Chaitanya Kulkarni wrote:
[...]
>>> +static inline u8 nvmet_zasl(unsigned int zone_append_sects)
>>> +{
>>> +	/*
>>> +	 * Zone Append Size Limit is the value experessed in the units
>>> +	 * of minimum memory page size (i.e. 12) and is reported power of 2.
>>> +	 */
>>> +	return ilog2((zone_append_sects << 9) >> NVMET_MPSMIN_SHIFT);
>> s/9/SECTOR_SHIFT
>>
>> And you could rewrite this as:
>>
>> return ilog2(zone_append_sects >> (NVMET_MPSMIN_SHIFT - SECTOR_SHIFT));
> 
> SECTOR_SHIFT patches are nacked, reason being it doesn't make code
> clear, how about following ?

Weird. I personally find it much nicer than a magic value "9"... But why not...

[...]
>>> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
>>> +{
>>> +	struct nvme_id_ns_zns *id_zns;
>>> +	u64 zsze;
>>> +	u16 status;
>>> +
>>> +	if (req->cmd->identify.csi != NVME_CSI_ZNS) {
>>> +		req->error_loc = offsetof(struct nvme_common_command, opcode);
>>> +		status = NVME_SC_INVALID_OPCODE | NVME_SC_DNR;
>>> +		goto out;
>>> +	}
>>> +
>>> +	if (le32_to_cpu(req->cmd->identify.nsid) == NVME_NSID_ALL) {
>>> +		req->error_loc = offsetof(struct nvme_identify, nsid);
>>> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
>>> +		goto out;
>>> +	}
>>> +
>>> +	id_zns = kzalloc(sizeof(*id_zns), GFP_KERNEL);
>>> +	if (!id_zns) {
>>> +		status = NVME_SC_INTERNAL;
>>> +		goto out;
>>> +	}
>>> +
>>> +	status = nvmet_req_find_ns(req);
>>> +	if (status) {
>>> +		status = NVME_SC_INTERNAL;
>>> +		goto done;
>>> +	}
>>> +
>>> +	if (!bdev_is_zoned(req->ns->bdev)) {
>>> +		req->error_loc = offsetof(struct nvme_identify, nsid);
>>> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
>>> +		goto done;
>>> +	}
>>> +
>>> +	nvmet_ns_revalidate(req->ns);
>>> +	zsze = (bdev_zone_sectors(req->ns->bdev) << 9) >>
>>> +					req->ns->blksize_shift;
>> s/9/SECTOR_SHIFT
> 
> See my earlier reply to the same comment.

I do not agree but if that is the nvme code policy, sure.

>>> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
>>> +{
>>> +	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->zmr.slba);
>>> +	u32 bufsize = (le32_to_cpu(req->cmd->zmr.numd) + 1) << 2;
>>> +	struct nvmet_report_zone_data data = { .ns = req->ns };
>>> +	unsigned int nr_zones;
>>> +	int reported_zones;
>>> +	u16 status;
>>> +
>>> +	status = nvmet_bdev_zns_checks(req);
>>> +	if (status)
>>> +		goto out;
>>> +
>>> +	data.rz = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY | __GFP_ZERO);
>>> +	if (!data.rz) {
>>> +		status = NVME_SC_INTERNAL;
>>> +		goto out;
>>> +	}
>>> +
>>> +	nr_zones = (bufsize - sizeof(struct nvme_zone_report)) /
>>> +			sizeof(struct nvme_zone_descriptor);
>>> +	if (!nr_zones) {
>>> +		status = NVME_SC_INVALID_FIELD | NVME_SC_DNR;
>>> +		goto out_free_report_zones;
>>> +	}
>>> +
>>> +	reported_zones = blkdev_report_zones(req->ns->bdev, sect, nr_zones,
>>> +					     nvmet_bdev_report_zone_cb, &data);
>>> +	if (reported_zones < 0) {
>>> +		status = NVME_SC_INTERNAL;
>>> +		goto out_free_report_zones;
>>> +	}
>> There is a problem here: the code as is ignores the request reporting option
>> field which can lead to an invalid zone report being returned. I think you need
>> to modify nvmet_bdev_report_zone_cb() to look at the reporting option field
>> passed by the initiator and filter the zone report since blkdev_report_zones()
>> does not handle that argument.
> 
> The reporting options are set by the host statistically in
> nvme_ns_report_zones()
> arefrom:-  nvme_ns_report_zones()
>          c.zmr.zra = NVME_ZRA_ZONE_REPORT;
>          c.zmr.zrasf = NVME_ZRASF_ZONE_REPORT_ALL;
>          c.zmr.pr = NVME_REPORT_ZONE_PARTIAL;
> 
> All the above values are validated in the nvmet_bdev_zns_checks() helper
> called from nvmet_bdev_execute_zone_mgmt_recv() before we allocate the
> report zone buffer.
> 
> 1. c.zmr.zra indicates the action which Reports zone descriptor entries
>    through the Report Zones data structure.
> 
>    We validate this value is been set to NVME_ZRA_ZONE_REPORT in the
>    nvmet_bdev_zns_chceks(). We are calling report zone after checking
>    zone receive action it NVME_ZRA_ZONE_REPORT so not filtering is needed
>    in the nvmet_bdev_report_zone_cb().
> 
> 2. c.zmr.zrasf indicates the action specific field which is set to
>    NVME_ZRASF_ZONE_REPORT_ALL.
> 
>    We validate this value is been set to NVME_ZRASF_ZONE_REPORT_ALL in the
>    nvmet_bdev_zns_chceks(). Since host wants all the zones we don't need to
>    filter any zone states in the nvmet_bdev_report_zone_cb().
> 
> 3. c.zmr.pr is set to NVME_REPORT_ZONE_PARTIAL which value = 1 i.e value in
>    the Report Zone data structure Number of Zones field indicates the
> number of
>    fully transferred zone descriptors in the data buffer, which we set from
>    return value of the blkdev_report_zones() :-
>   
> 
>    reported_zones = blkdev_report_zones(req->ns->bdev, sect, nr_zones,
> 					     nvmet_bdev_report_zone_cb, &data);
> <snip>   data.rz->nr_zones = cpu_to_le64(reported_zones);
> 
>    So no filtering is needed in nvmet_bdev_report_zone_cb() for c.zmr.pr.
> 
> Can you please explain what filtering is missing in the current code ?
> 
> Maybe I'm looking into an old spec.

report zones command has the reporting options (ro) field (bits 15:08 of dword
13) where the user can specify the following values:

Value Description
0h List all zones.
1h List the zones in the ZSE:Empty state.
2h List the zones in the ZSIO:Implicitly Opened state.
3h List the zones in the ZSEO:Explicitly Opened state.
4h List the zones in the ZSC:Closed state.
5h List the zones in the ZSF:Full state.
6h List the zones in the ZSRO:Read Only state.
7h List the zones in the ZSO:Offline state.

to filter the zone report based on zone condition. blkdev_report_zones() will
always to a "list all zones", that is, ro == 0h.

But on the initiator side, if the client issue a report zones command through an
ioctl (passthrough/direct access not suing the block layer BLKREPORTZONES
ioctl), it may specify a different value for the ro field. Processing that
command using blkdev_report_zones() like you are doing here without any
additional filtering will give an incorrect report. Filtering based on the user
specified ro field needs to be added in nvmet_bdev_report_zone_cb().

The current code here is fine of the initiator/client side uses the block layer
and execute all report zones through blkdev_report_zones(). But things will
break if the client starts doing passthrough commands using nvme ioctl. No ?

>>> +
>>> +	data.rz->nr_zones = cpu_to_le64(reported_zones);
>>> +
>>> +	status = nvmet_copy_to_sgl(req, 0, data.rz, bufsize);
>>> +
>>> +out_free_report_zones:
>>> +	kvfree(data.rz);
>>> +out:
>>> +	nvmet_req_complete(req, status);
>>> +}
>>> +
>>> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
>>> +{
>>> +	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->zms.slba);
>>> +	sector_t nr_sect = bdev_zone_sectors(req->ns->bdev);
>>> +	u16 status = NVME_SC_SUCCESS;
>>> +	u8 zsa = req->cmd->zms.zsa;
>>> +	enum req_opf op;
>>> +	int ret;
>>> +	const unsigned int zsa_to_op[] = {
>>> +		[NVME_ZONE_OPEN]	= REQ_OP_ZONE_OPEN,
>>> +		[NVME_ZONE_CLOSE]	= REQ_OP_ZONE_CLOSE,
>>> +		[NVME_ZONE_FINISH]	= REQ_OP_ZONE_FINISH,
>>> +		[NVME_ZONE_RESET]	= REQ_OP_ZONE_RESET,
>>> +	};
>>> +
>>> +	if (zsa > ARRAY_SIZE(zsa_to_op) || !zsa_to_op[zsa]) {
>> What is the point of the "!zsa_to_op[zsa]" here ? All the REQ_OP_ZONE_XXX are
>> non 0, always...
> 
> Well this is just making sure that we receive the right action since sparse
> array will return 0 for any other values than listed above having
> !zsa_to_op[zsa] check we can return an error.

But zsa is unsigned and you check it against the array size. So it can only be
within 0 and array size - 1. That is enough... I really do not see the point of
clutering the condition with something that is always true...

[...]
>>> +
>>> +	ret = blkdev_zone_mgmt(req->ns->bdev, op, sect, nr_sect, GFP_KERNEL);
>>> +	if (ret)
>>> +		status = NVME_SC_INTERNAL;
>>> +out:
>>> +	nvmet_req_complete(req, status);
>>> +}
>>> +
>>> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
>>> +{
>>> +	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->rw.slba);
>>> +	u16 status = NVME_SC_SUCCESS;
>>> +	unsigned int total_len = 0;
>>> +	struct scatterlist *sg;
>>> +	int ret = 0, sg_cnt;
>>> +	struct bio *bio;
>>> +
>>> +	if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req)))
>>> +		return;
>> No nvmet_req_complete() call ? Is that done in nvmet_check_transfer_len() ?
> 
> Yes it does, you had the same comment on earlier version, it can be
> confusing
> that is why I proposed a helper for check transfer len and !req->sg_cnt
> check,
> but we don't want that helper.

Just add a comment saying that nvmet_check_transfer_len() calls
nvmet_req_complete(). That will help people like me with a short memory span :)

>>> +
>>> +	if (!req->sg_cnt) {
>>> +		nvmet_req_complete(req, 0);
>>> +		return;
>>> +	}
>>> +
>>> +	if (req->transfer_len <= NVMET_MAX_INLINE_DATA_LEN) {
>>> +		bio = &req->b.inline_bio;
>>> +		bio_init(bio, req->inline_bvec, ARRAY_SIZE(req->inline_bvec));
>>> +	} else {
>>> +		bio = bio_alloc(GFP_KERNEL, req->sg_cnt);
>>> +	}
>>> +
>>> +	bio_set_dev(bio, req->ns->bdev);
>>> +	bio->bi_iter.bi_sector = sect;
>>> +	bio->bi_opf = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
>>> +	if (req->cmd->rw.control & cpu_to_le16(NVME_RW_FUA))
>>> +		bio->bi_opf |= REQ_FUA;
>>> +
>>> +	for_each_sg(req->sg, sg, req->sg_cnt, sg_cnt) {
>>> +		struct page *p = sg_page(sg);
>>> +		unsigned int l = sg->length;
>>> +		unsigned int o = sg->offset;
>>> +
>>> +		ret = bio_add_zone_append_page(bio, p, l, o);
>>> +		if (ret != sg->length) {
>>> +			status = NVME_SC_INTERNAL;
>>> +			goto out_bio_put;
>>> +		}
>>> +
>>> +		total_len += sg->length;
>>> +	}
>>> +
>>> +	if (total_len != nvmet_rw_data_len(req)) {
>>> +		status = NVME_SC_INTERNAL | NVME_SC_DNR;
>>> +		goto out_bio_put;
>>> +	}
>>> +
>>> +	ret = submit_bio_wait(bio);
>> submit_bio_wait() ? Why blocking here ? That would be bad for performance. Is it
>> mandatory to block here ? The result handling could be done in the bio_end
>> callback no ?
> 
> I did initially, but zonefs uses sync I/O, I'm not sure about the btrfs,
> if it does
> please let me know I'll make it async.
> 
> If there is no async caller in the kernel for REQ_OP_ZONE_APPEND
> shouldwe make this async ?

This should not depend on what the user does, at all.
Yes, for now zonefs only uses zone append for sync write() calls. But I intend
to have zone append used for async writes too. And in btrfs, append writes are
used for all data IOs, sync or async. That is on the initiator side anyway. The
target side should not assume any special behavior of the initiator. So if there
is no technical reasons to prevent async append writes execution, I would rather
have all of them processed with async BIOs execution, similar to regular write BIOs.

-- 
Damien Le Moal
Western Digital Research

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V12 2/3] nvmet: add ZBD over ZNS backend support
  2021-03-12  7:25       ` Damien Le Moal
@ 2021-03-13  2:40         ` Chaitanya Kulkarni
  2021-03-15  3:54         ` Chaitanya Kulkarni
  1 sibling, 0 replies; 12+ messages in thread
From: Chaitanya Kulkarni @ 2021-03-13  2:40 UTC (permalink / raw)
  To: Damien Le Moal, linux-nvme; +Cc: hch, kbusch, sagi

On 3/11/21 23:26, Damien Le Moal wrote:
> On 2021/03/12 15:29, Chaitanya Kulkarni wrote:
> [...]
> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
>>>> +{
>>>> +	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->zmr.slba);
>>>> +	u32 bufsize = (le32_to_cpu(req->cmd->zmr.numd) + 1) << 2;
>>>> +	struct nvmet_report_zone_data data = { .ns = req->ns };
>>>> +	unsigned int nr_zones;
>>>> +	int reported_zones;
>>>> +	u16 status;
>>>> +
>>>> +	status = nvmet_bdev_zns_checks(req);
>>>> +	if (status)
>>>> +		goto out;
>>>> +
>>>> +	data.rz = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY | __GFP_ZERO);
>>>> +	if (!data.rz) {
>>>> +		status = NVME_SC_INTERNAL;
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	nr_zones = (bufsize - sizeof(struct nvme_zone_report)) /
>>>> +			sizeof(struct nvme_zone_descriptor);
>>>> +	if (!nr_zones) {
>>>> +		status = NVME_SC_INVALID_FIELD | NVME_SC_DNR;
>>>> +		goto out_free_report_zones;
>>>> +	}
>>>> +
>>>> +	reported_zones = blkdev_report_zones(req->ns->bdev, sect, nr_zones,
>>>> +					     nvmet_bdev_report_zone_cb, &data);
>>>> +	if (reported_zones < 0) {
>>>> +		status = NVME_SC_INTERNAL;
>>>> +		goto out_free_report_zones;
>>>> +	}
>>> There is a problem here: the code as is ignores the request reporting option
>>> field which can lead to an invalid zone report being returned. I think you need
>>> to modify nvmet_bdev_report_zone_cb() to look at the reporting option field
>>> passed by the initiator and filter the zone report since blkdev_report_zones()
>>> does not handle that argument.
>> The reporting options are set by the host statistically in
>> nvme_ns_report_zones()
>> arefrom:-  nvme_ns_report_zones()
>>          c.zmr.zra = NVME_ZRA_ZONE_REPORT;
>>          c.zmr.zrasf = NVME_ZRASF_ZONE_REPORT_ALL;
>>          c.zmr.pr = NVME_REPORT_ZONE_PARTIAL;
>>
>> All the above values are validated in the nvmet_bdev_zns_checks() helper
>> called from nvmet_bdev_execute_zone_mgmt_recv() before we allocate the
>> report zone buffer.
>>
>> 1. c.zmr.zra indicates the action which Reports zone descriptor entries
>>    through the Report Zones data structure.
>>
>>    We validate this value is been set to NVME_ZRA_ZONE_REPORT in the
>>    nvmet_bdev_zns_chceks(). We are calling report zone after checking
>>    zone receive action it NVME_ZRA_ZONE_REPORT so not filtering is needed
>>    in the nvmet_bdev_report_zone_cb().
>>
>> 2. c.zmr.zrasf indicates the action specific field which is set to
>>    NVME_ZRASF_ZONE_REPORT_ALL.
>>
>>    We validate this value is been set to NVME_ZRASF_ZONE_REPORT_ALL in the
>>    nvmet_bdev_zns_chceks(). Since host wants all the zones we don't need to
>>    filter any zone states in the nvmet_bdev_report_zone_cb().
>>
>> 3. c.zmr.pr is set to NVME_REPORT_ZONE_PARTIAL which value = 1 i.e value in
>>    the Report Zone data structure Number of Zones field indicates the
>> number of
>>    fully transferred zone descriptors in the data buffer, which we set from
>>    return value of the blkdev_report_zones() :-
>>   
>>
>>    reported_zones = blkdev_report_zones(req->ns->bdev, sect, nr_zones,
>> 					     nvmet_bdev_report_zone_cb, &data);
>> <snip>   data.rz->nr_zones = cpu_to_le64(reported_zones);
>>
>>    So no filtering is needed in nvmet_bdev_report_zone_cb() for c.zmr.pr.
>>
>> Can you please explain what filtering is missing in the current code ?
>>
>> Maybe I'm looking into an old spec.
> report zones command has the reporting options (ro) field (bits 15:08 of dword
> 13) where the user can specify the following values:
>
> Value Description
> 0h List all zones.
> 1h List the zones in the ZSE:Empty state.
> 2h List the zones in the ZSIO:Implicitly Opened state.
> 3h List the zones in the ZSEO:Explicitly Opened state.
> 4h List the zones in the ZSC:Closed state.
> 5h List the zones in the ZSF:Full state.
> 6h List the zones in the ZSRO:Read Only state.
> 7h List the zones in the ZSO:Offline state.
>
> to filter the zone report based on zone condition. blkdev_report_zones() will
> always to a "list all zones", that is, ro == 0h.
>
> But on the initiator side, if the client issue a report zones command through an
> ioctl (passthrough/direct access not suing the block layer BLKREPORTZONES
> ioctl), it may specify a different value for the ro field. Processing that
> command using blkdev_report_zones() like you are doing here without any
> additional filtering will give an incorrect report. Filtering based on the user
> specified ro field needs to be added in nvmet_bdev_report_zone_cb().
>
> The current code here is fine of the initiator/client side uses the block layer
> and execute all report zones through blkdev_report_zones(). But things will
> break if the client starts doing passthrough commands using nvme ioctl. No ?

Okay, so I'm using the right spec. With your explanation it needs a
filtering,
will add it to the next version.

>>>> +
>>>> +	data.rz->nr_zones = cpu_to_le64(reported_zones);
>>>> +
>>>> +	status = nvmet_copy_to_sgl(req, 0, data.rz, bufsize);
>>>> +
>>>> +out_free_report_zones:
>>>> +	kvfree(data.rz);
>>>> +out:
>>>> +	nvmet_req_complete(req, status);
>>>> +}
>>>> +
>>>> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
>>>> +{
>>>> +	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->zms.slba);
>>>> +	sector_t nr_sect = bdev_zone_sectors(req->ns->bdev);
>>>> +	u16 status = NVME_SC_SUCCESS;
>>>> +	u8 zsa = req->cmd->zms.zsa;
>>>> +	enum req_opf op;
>>>> +	int ret;
>>>> +	const unsigned int zsa_to_op[] = {
>>>> +		[NVME_ZONE_OPEN]	= REQ_OP_ZONE_OPEN,
>>>> +		[NVME_ZONE_CLOSE]	= REQ_OP_ZONE_CLOSE,
>>>> +		[NVME_ZONE_FINISH]	= REQ_OP_ZONE_FINISH,
>>>> +		[NVME_ZONE_RESET]	= REQ_OP_ZONE_RESET,
>>>> +	};
>>>> +
>>>> +	if (zsa > ARRAY_SIZE(zsa_to_op) || !zsa_to_op[zsa]) {
>>> What is the point of the "!zsa_to_op[zsa]I see, I'll add the async I/O interface." here ? All the REQ_OP_ZONE_XXX are
>>> non 0, always...
>> Well this is just making sure that we receive the right action since sparse
>> array will return 0 for any other values than listed above having
>> !zsa_to_op[zsa] check we can return an error.
> But zsa is unsigned and you check it against the array size. So it can only be
> within 0 and array size - 1. That is enough... I really do not see the point of
> clutering the condition with something that is always true...

Okay.

>
> [...]
>>>> +
>>>> +	ret = blkdev_zone_mgmt(req->ns->bdev, op, sect, nr_sect, GFP_KERNEL);
>>>> +	if (ret)
>>>> +		status = NVME_SC_INTERNAL;
>>>> +out:
>>>> +	nvmet_req_complete(req, status);
>>>> +}
>>>> +
>>>> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
>>>> +{
>>>> +	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->rw.slba);
>>>> +	u16 status = NVME_SC_SUCCESS;
>>>> +	unsigned int total_len = 0;
>>>> +	struct scatterlist *sg;
>>>> +	int ret = 0, sg_cnt;
>>>> +	struct bio *bio;
>>>> +
>>>> +	if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req)))
>>>> +		return;
>>> No nvmet_req_complete() call ? Is that done in nvmet_check_transfer_len() ?
>> Yes it does, you had the same comment on earlier version, it can be
>> confusing
>> that is why I proposed a helper for check transfer len and !req->sg_cnt
>> check,
>> but we don't want that helper.
> Just add a comment saying that nvmet_check_transfer_len() calls
> nvmet_req_complete(). That will help people like me with a short memory span :)

Okay.

>>>> +
>>>> +	if (!req->sg_cnt) {
>>>> +		nvmet_req_complete(req, 0);
>>>> +		return;
>>>> +	}
>>>> +
>>>> +	if (req->transfer_len <= NVMET_MAX_INLINE_DATA_LEN) {
>>>> +		bio = &req->b.inline_bio;
>>>> +		bio_init(bio, req->inline_bvec, ARRAY_SIZE(req->inline_bvec));
>>>> +	} else {
>>>> +		bio = bio_alloc(GFP_KERNEL, req->sg_cnt);
>>>> +	}
>>>> +
>>>> +	bio_set_dev(bio, req->ns->bdev);
>>>> +	bio->bi_iter.bi_sector = sect;
>>>> +	bio->bi_opf = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
>>>> +	if (req->cmd->rw.control & cpu_to_le16(NVME_RW_FUA))
>>>> +		bio->bi_opf |= REQ_FUA;
>>>> +
>>>> +	for_each_sg(req->sg, sg, req->sg_cnt, sg_cnt) {
>>>> +		struct page *p = sg_page(sg);
>>>> +		unsigned int l = sg->length;
>>>> +		unsigned int o = sg->offset;
>>>> +
>>>> +		ret = bio_add_zone_append_page(bio, p, l, o);
>>>> +		if (ret != sg->length) {
>>>> +			status = NVME_SC_INTERNAL;
>>>> +			goto out_bio_put;
>>>> +		}
>>>> +
>>>> +		total_len += sg->length;
>>>> +	}
>>>> +
>>>> +	if (total_len != nvmet_rw_data_len(req)) {
>>>> +		status = NVME_SC_INTERNAL | NVME_SC_DNR;
>>>> +		goto out_bio_put;
>>>> +	}
>>>> +
>>>> +	ret = submit_bio_wait(bio);
>>> submit_bio_wait() ? Why blocking here ? That would be bad for performance. Is it
>>> mandatory to block here ? The result handling could be done in the bio_end
>>> callback no ?
>> I did initially, but zonefs uses sync I/O, I'm not sure about the btrfs,
>> if it does
>> please let me know I'll make it async.
>>
>> If there is no async caller in the kernel for REQ_OP_ZONE_APPEND
>> shouldwe make this async ?
> This should not depend on what the user does, at all.
> Yes, for now zonefs only uses zone append for sync write() calls. But I intend
> to have zone append used for async writes too. And in btrfs, append writes are
> used for all data IOs, sync or async. That is on the initiator side anyway. The
> target side should not assume any special behavior of the initiator. So if there
> is no technical reasons to prevent async append writes execution, I would rather
> have all of them processed with async BIOs execution, similar to regular write BIOs.

Thanks for the explanation, I was not aware about the async interface.
I'll make it async.


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V12 2/3] nvmet: add ZBD over ZNS backend support
  2021-03-12  7:25       ` Damien Le Moal
  2021-03-13  2:40         ` Chaitanya Kulkarni
@ 2021-03-15  3:54         ` Chaitanya Kulkarni
  2021-03-15  4:09           ` Damien Le Moal
  1 sibling, 1 reply; 12+ messages in thread
From: Chaitanya Kulkarni @ 2021-03-15  3:54 UTC (permalink / raw)
  To: Damien Le Moal, hch; +Cc: linux-nvme, kbusch, sagi

Christoph/Damien,

On 3/11/21 23:26, Damien Le Moal wrote:
>>>> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
>>>> +{
>>>> +	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->zmr.slba);
>>>> +	u32 bufsize = (le32_to_cpu(req->cmd->zmr.numd) + 1) << 2;
>>>> +	struct nvmet_report_zone_data data = { .ns = req->ns };
>>>> +	unsigned int nr_zones;
>>>> +	int reported_zones;
>>>> +	u16 status;
>>>> +
>>>> +	status = nvmet_bdev_zns_checks(req);
>>>> +	if (status)
>>>> +		goto out;
>>>> +
>>>> +	data.rz = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY | __GFP_ZERO);
>>>> +	if (!data.rz) {
>>>> +		status = NVME_SC_INTERNAL;
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	nr_zones = (bufsize - sizeof(struct nvme_zone_report)) /
>>>> +			sizeof(struct nvme_zone_descriptor);
>>>> +	if (!nr_zones) {
>>>> +		status = NVME_SC_INVALID_FIELD | NVME_SC_DNR;
>>>> +		goto out_free_report_zones;
>>>> +	}
>>>> +
>>>> +	reported_zones = blkdev_report_zones(req->ns->bdev, sect, nr_zones,
>>>> +					     nvmet_bdev_report_zone_cb, &data);
>>>> +	if (reported_zones < 0) {
>>>> +		status = NVME_SC_INTERNAL;
>>>> +		goto out_free_report_zones;
>>>> +	}
>>> There is a problem here: the code as is ignores the request reporting option
>>> field which can lead to an invalid zone report being returned. I think you need
>>> to modify nvmet_bdev_report_zone_cb() to look at the reporting option field
>>> passed by the initiator and filter the zone report since blkdev_report_zones()
>>> does not handle that argument.
>> The reporting options are set by the host statistically in
>> nvme_ns_report_zones()
>> arefrom:-  nvme_ns_report_zones()
>>          c.zmr.zra = NVME_ZRA_ZONE_REPORT;
>>          c.zmr.zrasf = NVME_ZRASF_ZONE_REPORT_ALL;
>>          c.zmr.pr = NVME_REPORT_ZONE_PARTIAL;
>>
>> All the above values are validated in the nvmet_bdev_zns_checks() helper
>> called from nvmet_bdev_execute_zone_mgmt_recv() before we allocate the
>> report zone buffer.
>>
>> 1. c.zmr.zra indicates the action which Reports zone descriptor entries
>>    through the Report Zones data structure.
>>
>>    We validate this value is been set to NVME_ZRA_ZONE_REPORT in the
>>    nvmet_bdev_zns_chceks(). We are calling report zone after checking
>>    zone receive action it NVME_ZRA_ZONE_REPORT so not filtering is needed
>>    in the nvmet_bdev_report_zone_cb().
>>
>> 2. c.zmr.zrasf indicates the action specific field which is set to
>>    NVME_ZRASF_ZONE_REPORT_ALL.
>>
>>    We validate this value is been set to NVME_ZRASF_ZONE_REPORT_ALL in the
>>    nvmet_bdev_zns_chceks(). Since host wants all the zones we don't need to
>>    filter any zone states in the nvmet_bdev_report_zone_cb().
>>
>> 3. c.zmr.pr is set to NVME_REPORT_ZONE_PARTIAL which value = 1 i.e value in
>>    the Report Zone data structure Number of Zones field indicates the
>> number of
>>    fully transferred zone descriptors in the data buffer, which we set from
>>    return value of the blkdev_report_zones() :-
>>   
>>
>>    reported_zones = blkdev_report_zones(req->ns->bdev, sect, nr_zones,
>> 					     nvmet_bdev_report_zone_cb, &data);
>> <snip>   data.rz->nr_zones = cpu_to_le64(reported_zones);
>>
>>    So no filtering is needed in nvmet_bdev_report_zone_cb() for c.zmr.pr.
>>
>> Can you please explain what filtering is missing in the current code ?
>>
>> Maybe I'm looking into an old spec.
> report zones command has the reporting options (ro) field (bits 15:08 of dword
> 13) where the user can specify the following values:
>
> Value Description
> 0h List all zones.
> 1h List the zones in the ZSE:Empty state.
> 2h List the zones in the ZSIO:Implicitly Opened state.
> 3h List the zones in the ZSEO:Explicitly Opened state.
> 4h List the zones in the ZSC:Closed state.
> 5h List the zones in the ZSF:Full state.
> 6h List the zones in the ZSRO:Read Only state.
> 7h List the zones in the ZSO:Offline state.
>
> to filter the zone report based on zone condition. blkdev_report_zones() will
> always to a "list all zones", that is, ro == 0h.
>
> But on the initiator side, if the client issue a report zones command through an
> ioctl (passthrough/direct access not suing the block layer BLKREPORTZONES
> ioctl), it may specify a different value for the ro field. Processing that
> command using blkdev_report_zones() like you are doing here without any
> additional filtering will give an incorrect report. Filtering based on the user
> specified ro field needs to be added in nvmet_bdev_report_zone_cb().
>
> The current code here is fine of the initiator/client side uses the block layer
> and execute all report zones through blkdev_report_zones(). But things will
> break if the client starts doing passthrough commands using nvme ioctl. No ?
>

Regarding the passthru commands support following are the non technical
issues
I ran into, it will be great to get some feedback before we proceed :-

1. We don't support any passthru commands for the generic NVMeOF target
backends
   (file/bdev) for that we have a dedicated passthru target. ZBD backend
is a
   generic one, so do we really want to add support for user's NVMe
passthru
   commands ?

2. In past we've decided that for NVMeOF target non I/O commands should
be handled
   in the userspace applications to keep the target light.

I couldn't come up with justification :(.

There are other features that we may have to implement apart from filtering
if we want to support NVMe passthru commands e.g. Set Zone Descriptor
Extension,
Changed Zone List, AEN Zone Descriptor Changed Notices (Log Identifier
BFh) etc.
(these are just placeholder examples, please do not take it literally).

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V12 2/3] nvmet: add ZBD over ZNS backend support
  2021-03-15  3:54         ` Chaitanya Kulkarni
@ 2021-03-15  4:09           ` Damien Le Moal
  2021-03-15  4:53             ` Chaitanya Kulkarni
  0 siblings, 1 reply; 12+ messages in thread
From: Damien Le Moal @ 2021-03-15  4:09 UTC (permalink / raw)
  To: Chaitanya Kulkarni, hch; +Cc: linux-nvme, kbusch, sagi

On 2021/03/15 12:54, Chaitanya Kulkarni wrote:
> Christoph/Damien,
> 
> On 3/11/21 23:26, Damien Le Moal wrote:
>>>>> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
>>>>> +{
>>>>> +	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->zmr.slba);
>>>>> +	u32 bufsize = (le32_to_cpu(req->cmd->zmr.numd) + 1) << 2;
>>>>> +	struct nvmet_report_zone_data data = { .ns = req->ns };
>>>>> +	unsigned int nr_zones;
>>>>> +	int reported_zones;
>>>>> +	u16 status;
>>>>> +
>>>>> +	status = nvmet_bdev_zns_checks(req);
>>>>> +	if (status)
>>>>> +		goto out;
>>>>> +
>>>>> +	data.rz = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY | __GFP_ZERO);
>>>>> +	if (!data.rz) {
>>>>> +		status = NVME_SC_INTERNAL;
>>>>> +		goto out;
>>>>> +	}
>>>>> +
>>>>> +	nr_zones = (bufsize - sizeof(struct nvme_zone_report)) /
>>>>> +			sizeof(struct nvme_zone_descriptor);
>>>>> +	if (!nr_zones) {
>>>>> +		status = NVME_SC_INVALID_FIELD | NVME_SC_DNR;
>>>>> +		goto out_free_report_zones;
>>>>> +	}
>>>>> +
>>>>> +	reported_zones = blkdev_report_zones(req->ns->bdev, sect, nr_zones,
>>>>> +					     nvmet_bdev_report_zone_cb, &data);
>>>>> +	if (reported_zones < 0) {
>>>>> +		status = NVME_SC_INTERNAL;
>>>>> +		goto out_free_report_zones;
>>>>> +	}
>>>> There is a problem here: the code as is ignores the request reporting option
>>>> field which can lead to an invalid zone report being returned. I think you need
>>>> to modify nvmet_bdev_report_zone_cb() to look at the reporting option field
>>>> passed by the initiator and filter the zone report since blkdev_report_zones()
>>>> does not handle that argument.
>>> The reporting options are set by the host statistically in
>>> nvme_ns_report_zones()
>>> arefrom:-  nvme_ns_report_zones()
>>>          c.zmr.zra = NVME_ZRA_ZONE_REPORT;
>>>          c.zmr.zrasf = NVME_ZRASF_ZONE_REPORT_ALL;
>>>          c.zmr.pr = NVME_REPORT_ZONE_PARTIAL;
>>>
>>> All the above values are validated in the nvmet_bdev_zns_checks() helper
>>> called from nvmet_bdev_execute_zone_mgmt_recv() before we allocate the
>>> report zone buffer.
>>>
>>> 1. c.zmr.zra indicates the action which Reports zone descriptor entries
>>>    through the Report Zones data structure.
>>>
>>>    We validate this value is been set to NVME_ZRA_ZONE_REPORT in the
>>>    nvmet_bdev_zns_chceks(). We are calling report zone after checking
>>>    zone receive action it NVME_ZRA_ZONE_REPORT so not filtering is needed
>>>    in the nvmet_bdev_report_zone_cb().
>>>
>>> 2. c.zmr.zrasf indicates the action specific field which is set to
>>>    NVME_ZRASF_ZONE_REPORT_ALL.
>>>
>>>    We validate this value is been set to NVME_ZRASF_ZONE_REPORT_ALL in the
>>>    nvmet_bdev_zns_chceks(). Since host wants all the zones we don't need to
>>>    filter any zone states in the nvmet_bdev_report_zone_cb().
>>>
>>> 3. c.zmr.pr is set to NVME_REPORT_ZONE_PARTIAL which value = 1 i.e value in
>>>    the Report Zone data structure Number of Zones field indicates the
>>> number of
>>>    fully transferred zone descriptors in the data buffer, which we set from
>>>    return value of the blkdev_report_zones() :-
>>>   
>>>
>>>    reported_zones = blkdev_report_zones(req->ns->bdev, sect, nr_zones,
>>> 					     nvmet_bdev_report_zone_cb, &data);
>>> <snip>   data.rz->nr_zones = cpu_to_le64(reported_zones);
>>>
>>>    So no filtering is needed in nvmet_bdev_report_zone_cb() for c.zmr.pr.
>>>
>>> Can you please explain what filtering is missing in the current code ?
>>>
>>> Maybe I'm looking into an old spec.
>> report zones command has the reporting options (ro) field (bits 15:08 of dword
>> 13) where the user can specify the following values:
>>
>> Value Description
>> 0h List all zones.
>> 1h List the zones in the ZSE:Empty state.
>> 2h List the zones in the ZSIO:Implicitly Opened state.
>> 3h List the zones in the ZSEO:Explicitly Opened state.
>> 4h List the zones in the ZSC:Closed state.
>> 5h List the zones in the ZSF:Full state.
>> 6h List the zones in the ZSRO:Read Only state.
>> 7h List the zones in the ZSO:Offline state.
>>
>> to filter the zone report based on zone condition. blkdev_report_zones() will
>> always to a "list all zones", that is, ro == 0h.
>>
>> But on the initiator side, if the client issue a report zones command through an
>> ioctl (passthrough/direct access not suing the block layer BLKREPORTZONES
>> ioctl), it may specify a different value for the ro field. Processing that
>> command using blkdev_report_zones() like you are doing here without any
>> additional filtering will give an incorrect report. Filtering based on the user
>> specified ro field needs to be added in nvmet_bdev_report_zone_cb().
>>
>> The current code here is fine of the initiator/client side uses the block layer
>> and execute all report zones through blkdev_report_zones(). But things will
>> break if the client starts doing passthrough commands using nvme ioctl. No ?
>>
> 
> Regarding the passthru commands support following are the non technical
> issues
> I ran into, it will be great to get some feedback before we proceed :-
> 
> 1. We don't support any passthru commands for the generic NVMeOF target
> backends
>    (file/bdev) for that we have a dedicated passthru target. ZBD backend
> is a
>    generic one, so do we really want to add support for user's NVMe
> passthru
>    commands ?

If it is not supported then fine, no filtering is needed. But you should add a
check on the report zones command received by the target to make sure that the
reporting option field is "report all", and fail the command if any other
reporting option is set since it cannot be honored.

> 
> 2. In past we've decided that for NVMeOF target non I/O commands should
> be handled
>    in the userspace applications to keep the target light.
> 
> I couldn't come up with justification :(.
> 
> There are other features that we may have to implement apart from filtering
> if we want to support NVMe passthru commands e.g. Set Zone Descriptor
> Extension,
> Changed Zone List, AEN Zone Descriptor Changed Notices (Log Identifier
> BFh) etc.
> (these are just placeholder examples, please do not take it literally).
> 


-- 
Damien Le Moal
Western Digital Research

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V12 2/3] nvmet: add ZBD over ZNS backend support
  2021-03-15  4:09           ` Damien Le Moal
@ 2021-03-15  4:53             ` Chaitanya Kulkarni
  0 siblings, 0 replies; 12+ messages in thread
From: Chaitanya Kulkarni @ 2021-03-15  4:53 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: hch, linux-nvme, kbusch, sagi

On 3/14/21 21:09, Damien Le Moal wrote:
>> Regarding the passthru commands support following are the non technical
>> issues
>> I ran into, it will be great to get some feedback before we proceed :-
>>
>> 1. We don't support any passthru commands for the generic NVMeOF target
>> backends
>>    (file/bdev) for that we have a dedicated passthru target. ZBD backend
>> is a
>>    generic one, so do we really want to add support for user's NVMe
>> passthru
>>    commands ?
> If it is not supported then fine, no filtering is needed. But you should add a
> check on the report zones command received by the target to make sure that the
> reporting option field is "report all", and fail the command if any other
> reporting option is set since it cannot be honored.
>

Absolutely will do that, thanks for the quick reply.



_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-03-15  5:01 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-11  7:15 [PATCH V12 0/3] nvmet: add ZBD backend support Chaitanya Kulkarni
2021-03-11  7:15 ` [PATCH V12 1/3] nvmet: add NVM Command Set Identifier support Chaitanya Kulkarni
2021-03-11  7:15 ` [PATCH V12 2/3] nvmet: add ZBD over ZNS backend support Chaitanya Kulkarni
2021-03-12  1:15   ` Damien Le Moal
2021-03-12  6:29     ` Chaitanya Kulkarni
2021-03-12  7:25       ` Damien Le Moal
2021-03-13  2:40         ` Chaitanya Kulkarni
2021-03-15  3:54         ` Chaitanya Kulkarni
2021-03-15  4:09           ` Damien Le Moal
2021-03-15  4:53             ` Chaitanya Kulkarni
2021-03-11  7:15 ` [PATCH V12 3/3] nvmet: add nvmet_req_bio put helper for backends Chaitanya Kulkarni
2021-03-12  0:37   ` Damien Le Moal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).