* [PATCH V12 0/3] nvmet: add ZBD backend support @ 2021-03-11 7:15 Chaitanya Kulkarni 2021-03-11 7:15 ` [PATCH V12 1/3] nvmet: add NVM Command Set Identifier support Chaitanya Kulkarni ` (2 more replies) 0 siblings, 3 replies; 12+ messages in thread From: Chaitanya Kulkarni @ 2021-03-11 7:15 UTC (permalink / raw) To: linux-nvme; +Cc: hch, kbusch, sagi, damien.lemoal, Chaitanya Kulkarni Hi, NVMeOF Host is capable of handling the NVMe Protocol based Zoned Block Devices (ZBD) in the Zoned Namespaces (ZNS) mode with the passthru backend. There is no support for a generic block device backend to handle the ZBDs which are not NVMe protocol compliant. This adds support to export the ZBDs (which are not NVMe drives) to host from the target via NVMeOF using the host side ZNS interface. Note: This patch-series is based on nvme-5.13. Following scenarios tested successfully:- * Zonefs Test with dm-linear on the top of SMR HDD exported over NVMeOF. * Zonefs Test With CONFIG_BLK_DEV_ZONED nvme genblk target and ZBD with null_blk zoned target with zonefs. * Without CONFIG_BLK_DEV_ZONED nvme tests on genblk target. -ck Changes from V11:- 1. Use bdev_logical_block_size() in nvmet_bdev_zns_enable(). 2. Return error if nr_zones calculation ends up having value 0. 4. Drop the comment patch from this series :- nvme: add comments to nvme_zns_alloc_report_buffer Changes from V10:- 1. Move CONFIG_BLK_DEV_ZONED check into the caller of nvmet_set_csi_zns_effects(). 2. Move ZNS related csi code from csi patch to its own ZNS backend patch. 3. For ZNS command effects logs set default command effects with nvmet_set_csi_nvm_effects() along with nvmet_set_csi_zns_effects(). 4. Use goto for failure case in the nvmet_set_csi_zns_effects(). 5. Return directly from swicth in nvmet_execute_identify_desclist_csi(). 6. Merge Patch 2nd/3rd into one patch and move ZNS related code into its own :- [PATCH V10 2/8] nvmet: add NVM Command Set Identifier support [PATCH V10 3/8] nvmet: add command set supported ctrl cap Merged into new patch minux ZNS calls :- [PATCH V11 1/4] nvmet: add NVM Command Set Identifier support 7. Move req->cmd->identify.csi == NVME_CSI_ZNS checks in to respective caller in nvmet_execute_identify(). 8. Update error log page in nvmet_bdev_zns_checks(). 9. Remove the terney expression nvmet_zns_update_zasl(). 10. Drop following patches:- [PATCH V10 1/8] nvmet: trim args for nvmet_copy_ns_identifier() [PATCH V10 6/8] nvme-core: check ctrl css before setting up zns [PATCH V10 7/8] nvme-core: add a helper to print css related error Changes from V9:- 1. Use bio_add_zone_append_page() for REQ_OP_ZONE_APPEND. 2. Add a separate prep patch to reduce the arguments for nvmet_copy_ns_identifier(). 3. Add a separate patch for nvmet CSI support. 4. Add a separate patch for nvmet CSS support. 5. Move nvmet_cc_css_check() to multi-css supported patch. 6. Return error in csi cmd effects helper when !CONFIG_BLK_DEV_ZONED. 7. Return error in id desc list helper when !CONFIG_BLK_DEV_ZONED. 8. Remove goto and return from nvmet_bdev_zns_checks(). 9. Move nr_zones calculations near call to blkdev_report_zones() in nvmet_bdev_execute_zone_mgmt_recv(). 10. Split command effects logs into respective CSI helpers. 11. Don't use the local variables to pass NVME_CSI_XXX values, instead use req->ns->csi, also move this from ZBD support patch to nvmet CSI support. 12. Fix the bug that is chekcing cns value instead of csi in identify. 13. bdev_is_zoned() is stubbed out the CONFIG_BLK_DEV_ZONED, so remove the check for CONFIG_BLK_DEV_ZONED before calling bdev_is_zoned(). 14 Drop following patches :- [PATCH V9 1/9] block: export bio_add_hw_pages(). [PATCH V9 5/9] nvmet: add bio get helper. [PATCH V9 6/9] nvmet: add bio init helper. [PATCH V9 8/9] nvmet: add common I/O length check helper. [PATCH V9 9/9] nvmet: call nvmet_bio_done() for zone append. 15. Add a patch to check for ctrl bits to make sure ctrl supports the multi css on the host side when setting up Zoned Namespace. 17. Add a documentation patch for the host side when calculating the buffer size allocation size for the report zones. 16. Rebase and retest 5.12-rc1. Changes from V8:- 1. Rebase and retest on latest nvme-5.11. 2. Export ctrl->cap csi support only if CONFIG_BLK_DEV_ZONE is set. 3. Add a fix to admin ns-desc list handler for handling default csi. Changes from V7:- 1. Just like what block layer provides an API for bio_init(), provide nvmet_bio_init() such that we move bio initialization code for nvme-read-write commands from bdev and zns backend into the centralize helper. 2. With bdev/zns/file now we have three backends that are checking for req->sg_cnt and calling nvmet_check_transfer_len() before we process nvme-read-write commands. Move this duplicate code from three backeneds into the helper. 3. Export and use nvmet_bio_done() callback in nvmet_execute_zone_append() instead of the open coding the function. This also avoids code duplication for bio & request completion with error log page update. 4. Add zonefs tests log for dm linear device created on the top of SMR HDD exported with NVMeOF ZNS backend with the help of nvme-loop. Changes from V6:- 1. Instead of calling report zones to find conventional zones in the loop use the loop inside LLD blkdev_report_zones()->LLD_report_zones, that also simplifies the report zone callback. 2. Fix the bug in nvmet_bdev_has_conv_zones(). 3. Remove conditional operators in the nvmet_bdev_execute_zone_append(). Changes from V5:- 1. Use bio->bi_iter.bi_sector for result of the REQ_OP_ZONE_APPEND command. 2. Add endianness to the helper nvmet_sect_to_lba(). 3. Make bufsize u32 in zone mgmt recv command handler. 4. Add __GFP_ZERO for report zone data buffer to return clean buffer. Changes from V4:- 1. Don't use bio_iov_iter_get_pages() instead add a patch to export bio_add_hw_page() and call it directly for zone append. 2. Add inline vector optimization for append bio. 3. Update the commit logs for the patches. 4. Remove ZNS related identify data structures, use individual members. 5. Add a comment for macro NVMET_MPSMIN_SHIFT. 6. Remove nvmet_bdev() helper. 7. Move the command set identifier code into common code. 8. Use IS_ENABLED() and move helpers fomr zns.c into common code. 9. Add a patch to support Command Set identifiers. 10. Open code nvmet_bdev_validate_zns_zones(). 11. Remove the per namespace min zasl calculation and don't allow namespaces with zasl value > the first ns zasl value. 12. Move the stubs into the header file. 13. Add lba to/from sector conversion helpers and update the io-cmd-bdev.c to avoid the code duplication. 14. Add everything into one patch for zns command handlers and respective calls from the target code. 15. Remove the trim ns-desclist admin callback patch from this series. 16. Add bio get and put helpers patches to reduce the duplicate code in generic bdev, passthru, and generic zns backend. Changes from V3:- 1. Get rid of the bio_max_zasl check. 2. Remove extra lines. 3. Remove the block layer api export patch. 4. Remove the bvec check in the bio_iov_iter_get_pages() for REQ_OP_ZONE_APPEND so that we can reuse the code. Changes from V2:- 1. Move conventional zone bitmap check into nvmet_bdev_validate_zns_zones(). 2. Don't use report zones call to check the runt zone. 3. Trim nvmet_zasl() helper. 4. Fix typo in the nvmet_zns_update_zasl(). 5. Remove the comment and fix the mdts calculation in nvmet_execute_identify_cns_cs_ctrl(). 6. Use u64 for bufsize in nvmet_bdev_execute_zone_mgmt_recv(). 7. Remove nvmet_zones_to_desc_size() and fix the nr_zones calculation. 8. Remove the op variable in nvmet_bdev_execute_zone_append(). 9. Fix the nr_zones calculation nvmet_bdev_execute_zone_mgmt_recv(). 10. Update cover letter subject. Changes from V1:- 1. Remove the nvmet-$(CONFIG_BLK_DEV_ZONED) += zns.o. 2. Mark helpers inline. 3. Fix typos in the comments and update the comments. 4. Get rid of the curly brackets. 5. Don't allow drives with last smaller zones. 6. Calculate the zasl as a function of ax_zone_append_sectors, bio_max_pages so we don't have to split the bio. 7. Add global subsys->zasl and update the zasl when new namespace is enabled. 8. Rmove the loop in the nvmet_bdev_execute_zone_mgmt_recv() and move functionality in to the report zone callback. 9. Add goto for default case in nvmet_bdev_execute_zone_mgmt_send(). 10. Allocate the zones buffer with zones size instead of bdev nr_zones. Chaitanya Kulkarni (3): nvmet: add NVM Command Set Identifier support nvmet: add ZBD over ZNS backend support nvmet: add nvmet_req_bio put helper for backends drivers/nvme/target/Makefile | 1 + drivers/nvme/target/admin-cmd.c | 74 ++++++- drivers/nvme/target/core.c | 16 +- drivers/nvme/target/io-cmd-bdev.c | 37 +++- drivers/nvme/target/nvmet.h | 45 ++++ drivers/nvme/target/passthru.c | 3 +- drivers/nvme/target/zns.c | 331 ++++++++++++++++++++++++++++++ include/linux/nvme.h | 1 + 8 files changed, 487 insertions(+), 21 deletions(-) create mode 100644 drivers/nvme/target/zns.c * Zonefs Test log with dm-linear on the top of SMR HDD:- -------------------------------------------------------------------------------- 1. Test Zoned Block Device info :- -------------------------------------------------------------------------------- # fdisk -l /dev/sdh Disk /dev/sdh: 13.64 TiB, 15000173281280 bytes, 3662151680 sectors Disk model: HGST HSH721415AL Units: sectors of 1 * 4096 = 4096 bytes Sector size (logical/physical): 4096 bytes / 4096 bytes I/O size (minimum/optimal): 4096 bytes / 4096 bytes # cat /sys/block/sdh/queue/nr_zones 55880 # cat /sys/block/sdh/queue/zoned host-managed # cat /sys/block/sdh/queue/zone_append_max_bytes 688128 2. Creating NVMeOF target backed by dm-linear on the top of ZBD -------------------------------------------------------------------------------- # ./zbdev.sh 1 dm-zbd ++ NQN=dm-zbd ++ echo '0 29022486528 linear /dev/sdh 274726912' | dmsetup create cksdh 9 directories, 4 files ++ mkdir /sys/kernel/config/nvmet/subsystems/dm-zbd ++ mkdir /sys/kernel/config/nvmet/subsystems/dm-zbd/namespaces/1 ++ echo -n /dev/dm-0 ++ cat /sys/kernel/config/nvmet/subsystems/dm-zbd/namespaces/1/device_path /dev/dm-0 ++ echo 1 ++ mkdir /sys/kernel/config/nvmet/ports/1/ ++ echo -n loop ++ echo -n 1 ++ ln -s /sys/kernel/config/nvmet/subsystems/dm-zbd /sys/kernel/config/nvmet/ports/1/subsystems/ ++ sleep 1 ++ echo transport=loop,nqn=dm-zbd ++ sleep 1 ++ dmesg -c [233450.572565] nvmet: adding nsid 1 to subsystem dm-zbd [233452.269477] nvmet: creating controller 1 for subsystem dm-zbd for NQN nqn.2014-08.org.nvmexpress:uuid:853d7e82-8018-44ce-8784-ab81e7465ad9. [233452.283352] nvme nvme0: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices. [233452.292805] nvme nvme0: creating 8 I/O queues. [233452.299210] nvme nvme0: new ctrl: "dm-zbd" 3. dm-linear and backend SMR HDD association :- -------------------------------------------------------------------------------- # cat /sys/kernel/config/nvmet/subsystems/dm-zbd/namespaces/1/device_path /dev/dm-0 # dmsetup ls --tree cksdh (252:0) └─ (8:112) # lsblk | head -3 NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sdh 8:112 0 13.6T 0 disk └─cksdh 252:0 0 13.5T 0 dm 4. NVMeOF controller :- -------------------------------------------------------------------------------- # nvme list | tr -s ' ' ' ' Node SN Model Namespace Usage Format FW Rev /dev/nvme0n1 8c6f348dcd64404c Linux 1 14.86 TB / 14.86 TB 4 KiB + 0 B 5.10.0nv 5. Zonefs tests results :- -------------------------------------------------------------------------------- # ./zonefs-tests.sh /dev/nvme0n1 Gathering information on /dev/nvme0n1... zonefs-tests on /dev/nvme0n1: 55356 zones (0 conventional zones, 55356 sequential zones) 524288 512B sectors zone size (256 MiB) 1 max open zones Running tests Test 0010: mkzonefs (options) ... PASS Test 0011: mkzonefs (force format) ... PASS Test 0012: mkzonefs (invalid device) ... PASS Test 0013: mkzonefs (super block zone state) ... PASS Test 0020: mount (default) ... PASS Test 0021: mount (invalid device) ... PASS Test 0022: mount (check mount directory sub-directories) ... PASS Test 0023: mount (options) ... PASS Test 0030: Number of files (default) ... PASS Test 0031: Number of files (aggr_cnv) ... skip Test 0032: Number of files using stat (default) ... PASS Test 0033: Number of files using stat (aggr_cnv) ... PASS Test 0034: Number of blocks using stat (default) ... PASS Test 0035: Number of blocks using stat (aggr_cnv) ... PASS Test 0040: Files permissions (default) ... PASS Test 0041: Files permissions (aggr_cnv) ... skip Test 0042: Files permissions (set value) ... PASS Test 0043: Files permissions (set value + aggr_cnv) ... skip Test 0050: Files owner (default) ... PASS Test 0051: Files owner (aggr_cnv) ... skip Test 0052: Files owner (set value) ... PASS Test 0053: Files owner (set value + aggr_cnv) ... skip Test 0060: Files size (default) ... PASS Test 0061: Files size (aggr_cnv) ... skip Test 0070: Conventional file truncate ... skip Test 0071: Conventional file truncate (aggr_cnv) ... skip Test 0072: Conventional file unlink ... skip Test 0073: Conventional file unlink (aggr_cnv) ... skip Test 0074: Conventional file random write ... skip Test 0075: Conventional file random write (direct) ... skip Test 0076: Conventional file random write (aggr_cnv) ... skip Test 0077: Conventional file random write (aggr_cnv, direct) ... skip Test 0078: Conventional file mmap read/write ... skip Test 0079: Conventional file mmap read/write (aggr_cnv) ... skip Test 0080: Sequential file truncate ... PASS Test 0081: Sequential file unlink ... PASS Test 0082: Sequential file buffered write IO ... PASS Test 0083: Sequential file overwrite ... PASS Test 0084: Sequential file unaligned write (sync IO) ... PASS Test 0085: Sequential file unaligned write (async IO) ... PASS Test 0086: Sequential file append (sync) ... PASS Test 0087: Sequential file append (async) ... PASS Test 0088: Sequential file random read ... PASS Test 0089: Sequential file mmap read/write ... PASS Test 0090: sequential file 4K synchronous write ... PASS Test 0091: Sequential file large synchronous write ... PASS 46 / 46 tests passed * With CONFIG_BLK_DEV_ZONED nvme and zonefs tests on membacked null_blk zoned :- -------------------------------------------------------------------------------- # grep -i blk_dev_zoned .config CONFIG_BLK_DEV_ZONED=y # make M=drivers/nvme/ clean CLEAN drivers/nvme//Module.symvers # make M=drivers/nvme/ CC [M] drivers/nvme//host/core.o CC [M] drivers/nvme//host/trace.o CC [M] drivers/nvme//host/lightnvm.o CC [M] drivers/nvme//host/zns.o CC [M] drivers/nvme//host/hwmon.o LD [M] drivers/nvme//host/nvme-core.o CC [M] drivers/nvme//host/pci.o LD [M] drivers/nvme//host/nvme.o CC [M] drivers/nvme//host/fabrics.o LD [M] drivers/nvme//host/nvme-fabrics.o CC [M] drivers/nvme//host/rdma.o LD [M] drivers/nvme//host/nvme-rdma.o CC [M] drivers/nvme//host/fc.o LD [M] drivers/nvme//host/nvme-fc.o CC [M] drivers/nvme//host/tcp.o LD [M] drivers/nvme//host/nvme-tcp.o CC [M] drivers/nvme//target/core.o CC [M] drivers/nvme//target/configfs.o CC [M] drivers/nvme//target/admin-cmd.o CC [M] drivers/nvme//target/fabrics-cmd.o CC [M] drivers/nvme//target/discovery.o CC [M] drivers/nvme//target/io-cmd-file.o CC [M] drivers/nvme//target/io-cmd-bdev.o CC [M] drivers/nvme//target/passthru.o CC [M] drivers/nvme//target/zns.o CC [M] drivers/nvme//target/trace.o LD [M] drivers/nvme//target/nvmet.o CC [M] drivers/nvme//target/loop.o LD [M] drivers/nvme//target/nvme-loop.o CC [M] drivers/nvme//target/rdma.o LD [M] drivers/nvme//target/nvmet-rdma.o CC [M] drivers/nvme//target/fc.o LD [M] drivers/nvme//target/nvmet-fc.o CC [M] drivers/nvme//target/fcloop.o LD [M] drivers/nvme//target/nvme-fcloop.o CC [M] drivers/nvme//target/tcp.o LD [M] drivers/nvme//target/nvmet-tcp.o MODPOST drivers/nvme//Module.symvers CC [M] drivers/nvme//host/nvme-core.mod.o LD [M] drivers/nvme//host/nvme-core.ko CC [M] drivers/nvme//host/nvme-fabrics.mod.o LD [M] drivers/nvme//host/nvme-fabrics.ko CC [M] drivers/nvme//host/nvme-fc.mod.o LD [M] drivers/nvme//host/nvme-fc.ko CC [M] drivers/nvme//host/nvme-rdma.mod.o LD [M] drivers/nvme//host/nvme-rdma.ko CC [M] drivers/nvme//host/nvme-tcp.mod.o LD [M] drivers/nvme//host/nvme-tcp.ko CC [M] drivers/nvme//host/nvme.mod.o LD [M] drivers/nvme//host/nvme.ko CC [M] drivers/nvme//target/nvme-fcloop.mod.o LD [M] drivers/nvme//target/nvme-fcloop.ko CC [M] drivers/nvme//target/nvme-loop.mod.o LD [M] drivers/nvme//target/nvme-loop.ko CC [M] drivers/nvme//target/nvmet-fc.mod.o LD [M] drivers/nvme//target/nvmet-fc.ko CC [M] drivers/nvme//target/nvmet-rdma.mod.o LD [M] drivers/nvme//target/nvmet-rdma.ko CC [M] drivers/nvme//target/nvmet-tcp.mod.o LD [M] drivers/nvme//target/nvmet-tcp.ko CC [M] drivers/nvme//target/nvmet.mod.o LD [M] drivers/nvme//target/nvmet.ko # # cdblktests # ./check tests/nvme/ nvme/002 (create many subsystems and test discovery) [passed] runtime 24.378s ... 24.636s nvme/003 (test if we're sending keep-alives to a discovery controller) [passed] runtime 10.133s ... 10.152s nvme/004 (test nvme and nvmet UUID NS descriptors) [passed] runtime 2.463s ... 2.478s nvme/005 (reset local loopback target) [not run] nvme_core module does not have parameter multipath nvme/006 (create an NVMeOF target with a block device-backed ns) [passed] runtime 0.095s ... 0.122s nvme/007 (create an NVMeOF target with a file-backed ns) [passed] runtime 0.065s ... 0.079s nvme/008 (create an NVMeOF host with a block device-backed ns) [passed] runtime 2.473s ... 2.501s nvme/009 (create an NVMeOF host with a file-backed ns) [passed] runtime 2.460s ... 2.424s nvme/010 (run data verification fio job on NVMeOF block device-backed ns) [passed] runtime 24.526s ... 28.015s nvme/011 (run data verification fio job on NVMeOF file-backed ns) [passed] runtime 265.967s ... 282.717s nvme/012 (run mkfs and data verification fio job on NVMeOF block device-backed ns) [passed] runtime 44.665s ... 48.124s nvme/013 (run mkfs and data verification fio job on NVMeOF file-backed ns) [passed] runtime 261.739s ... 352.331s nvme/014 (flush a NVMeOF block device-backed ns) [passed] runtime 21.268s ... 22.013s nvme/015 (unit test for NVMe flush for file backed ns) [passed] runtime 18.820s ... 22.104s nvme/016 (create/delete many NVMeOF block device-backed ns and test discovery) [passed] runtime 13.899s ... 14.322s nvme/017 (create/delete many file-ns and test discovery) [passed] runtime 14.322s ... 14.031s nvme/018 (unit test NVMe-oF out of range access on a file backend) [passed] runtime 2.450s ... 2.444s nvme/019 (test NVMe DSM Discard command on NVMeOF block-device ns) [passed] runtime 2.475s ... 2.489s nvme/020 (test NVMe DSM Discard command on NVMeOF file-backed ns) [passed] runtime 2.410s ... 2.448s nvme/021 (test NVMe list command on NVMeOF file-backed ns) [passed] runtime 2.441s ... 2.439s nvme/022 (test NVMe reset command on NVMeOF file-backed ns) [passed] runtime 2.864s ... 2.863s nvme/023 (test NVMe smart-log command on NVMeOF block-device ns) [passed] runtime 2.465s ... 2.446s nvme/024 (test NVMe smart-log command on NVMeOF file-backed ns) [passed] runtime 2.416s ... 2.411s nvme/025 (test NVMe effects-log command on NVMeOF file-backed ns) [passed] runtime 2.419s ... 2.748s nvme/026 (test NVMe ns-descs command on NVMeOF file-backed ns) [passed] runtime 2.422s ... 2.410s nvme/027 (test NVMe ns-rescan command on NVMeOF file-backed ns) [passed] runtime 2.456s ... 2.462s nvme/028 (test NVMe list-subsys command on NVMeOF file-backed ns) [passed] runtime 2.427s ... 2.429s nvme/029 (test userspace IO via nvme-cli read/write interface) [passed] runtime 2.751s ... 2.755s nvme/030 (ensure the discovery generation counter is updated appropriately) [passed] runtime 0.346s ... 0.357s nvme/031 (test deletion of NVMeOF controllers immediately after setup) [passed] runtime 13.601s ... 13.591s nvme/038 (test deletion of NVMeOF subsystem without enabling) [passed] runtime 0.039s ... 0.059s # # cdzonefstest # ./zonefs-tests.sh /dev/nvme1n1 Gathering information on /dev/nvme1n1... zonefs-tests on /dev/nvme1n1: 16 zones (0 conventional zones, 16 sequential zones) 131072 512B sectors zone size (64 MiB) 1 max open zones Running tests Test 0010: mkzonefs (options) ... PASS Test 0011: mkzonefs (force format) ... PASS Test 0012: mkzonefs (invalid device) ... PASS Test 0013: mkzonefs (super block zone state) ... PASS Test 0020: mount (default) ... PASS Test 0021: mount (invalid device) ... PASS Test 0022: mount (check mount directory sub-directories) ... PASS Test 0023: mount (options) ... PASS Test 0030: Number of files (default) ... PASS Test 0031: Number of files (aggr_cnv) ... skip Test 0032: Number of files using stat (default) ... PASS Test 0033: Number of files using stat (aggr_cnv) ... PASS Test 0034: Number of blocks using stat (default) ... PASS Test 0035: Number of blocks using stat (aggr_cnv) ... PASS Test 0040: Files permissions (default) ... PASS Test 0041: Files permissions (aggr_cnv) ... skip Test 0042: Files permissions (set value) ... PASS Test 0043: Files permissions (set value + aggr_cnv) ... skip Test 0050: Files owner (default) ... PASS Test 0051: Files owner (aggr_cnv) ... skip Test 0052: Files owner (set value) ... PASS Test 0053: Files owner (set value + aggr_cnv) ... skip Test 0060: Files size (default) ... PASS Test 0061: Files size (aggr_cnv) ... skip Test 0070: Conventional file truncate ... skip Test 0071: Conventional file truncate (aggr_cnv) ... skip Test 0072: Conventional file unlink ... skip Test 0073: Conventional file unlink (aggr_cnv) ... skip Test 0074: Conventional file random write ... skip Test 0075: Conventional file random write (direct) ... skip Test 0076: Conventional file random write (aggr_cnv) ... skip Test 0077: Conventional file random write (aggr_cnv, direct) ... skip Test 0078: Conventional file mmap read/write ... skip Test 0079: Conventional file mmap read/write (aggr_cnv) ... skip Test 0080: Sequential file truncate ... PASS Test 0081: Sequential file unlink ... PASS Test 0082: Sequential file buffered write IO ... PASS Test 0083: Sequential file overwrite ... PASS Test 0084: Sequential file unaligned write (sync IO) ... PASS Test 0085: Sequential file unaligned write (async IO) ... PASS Test 0086: Sequential file append (sync) ... PASS Test 0087: Sequential file append (async) ... PASS Test 0088: Sequential file random read ... PASS Test 0089: Sequential file mmap read/write ... PASS Test 0090: sequential file 4K synchronous write ... PASS Test 0091: Sequential file large synchronous write ... PASS 46 / 46 tests passed * Without CONFIG_BLK_DEV_ZONED nvme tests :- -------------------------------------------------------------------------------- # # grep -i blk_dev_zoned .config # CONFIG_BLK_DEV_ZONED is not set # makej M=drivers/nvme/ clean CLEAN drivers/nvme//Module.symvers # makej M=drivers/nvme/ CC [M] drivers/nvme//host/core.o CC [M] drivers/nvme//host/trace.o CC [M] drivers/nvme//host/lightnvm.o CC [M] drivers/nvme//target/core.o CC [M] drivers/nvme//host/hwmon.o CC [M] drivers/nvme//target/configfs.o CC [M] drivers/nvme//host/pci.o CC [M] drivers/nvme//target/admin-cmd.o CC [M] drivers/nvme//host/fabrics.o CC [M] drivers/nvme//host/rdma.o CC [M] drivers/nvme//target/fabrics-cmd.o CC [M] drivers/nvme//target/discovery.o CC [M] drivers/nvme//host/fc.o CC [M] drivers/nvme//target/io-cmd-file.o CC [M] drivers/nvme//host/tcp.o CC [M] drivers/nvme//target/io-cmd-bdev.o CC [M] drivers/nvme//target/passthru.o CC [M] drivers/nvme//target/trace.o CC [M] drivers/nvme//target/loop.o CC [M] drivers/nvme//target/rdma.o CC [M] drivers/nvme//target/fc.o CC [M] drivers/nvme//target/fcloop.o CC [M] drivers/nvme//target/tcp.o LD [M] drivers/nvme//target/nvme-loop.o LD [M] drivers/nvme//target/nvme-fcloop.o LD [M] drivers/nvme//target/nvmet-tcp.o LD [M] drivers/nvme//host/nvme-fabrics.o LD [M] drivers/nvme//host/nvme.o LD [M] drivers/nvme//host/nvme-rdma.o LD [M] drivers/nvme//target/nvmet-rdma.o LD [M] drivers/nvme//target/nvmet.o LD [M] drivers/nvme//target/nvmet-fc.o LD [M] drivers/nvme//host/nvme-tcp.o LD [M] drivers/nvme//host/nvme-fc.o LD [M] drivers/nvme//host/nvme-core.o MODPOST drivers/nvme//Module.symvers CC [M] drivers/nvme//host/nvme-core.mod.o CC [M] drivers/nvme//host/nvme-fabrics.mod.o CC [M] drivers/nvme//host/nvme-fc.mod.o CC [M] drivers/nvme//host/nvme-rdma.mod.o CC [M] drivers/nvme//host/nvme-tcp.mod.o CC [M] drivers/nvme//host/nvme.mod.o CC [M] drivers/nvme//target/nvme-fcloop.mod.o CC [M] drivers/nvme//target/nvme-loop.mod.o CC [M] drivers/nvme//target/nvmet-fc.mod.o CC [M] drivers/nvme//target/nvmet-rdma.mod.o CC [M] drivers/nvme//target/nvmet-tcp.mod.o CC [M] drivers/nvme//target/nvmet.mod.o LD [M] drivers/nvme//target/nvme-fcloop.ko LD [M] drivers/nvme//host/nvme-tcp.ko LD [M] drivers/nvme//host/nvme-core.ko LD [M] drivers/nvme//target/nvmet-tcp.ko LD [M] drivers/nvme//target/nvme-loop.ko LD [M] drivers/nvme//target/nvmet-fc.ko LD [M] drivers/nvme//host/nvme-fabrics.ko LD [M] drivers/nvme//host/nvme-fc.ko LD [M] drivers/nvme//target/nvmet-rdma.ko LD [M] drivers/nvme//host/nvme-rdma.ko LD [M] drivers/nvme//host/nvme.ko LD [M] drivers/nvme//target/nvmet.ko # # cdblktests # ./check tests/nvme/ nvme/002 (create many subsystems and test discovery) [passed] runtime ... 27.640s nvme/003 (test if we're sending keep-alives to a discovery controller) [passed] runtime 10.145s ... 10.147s nvme/004 (test nvme and nvmet UUID NS descriptors) [passed] runtime 1.713s ... 1.712s nvme/005 (reset local loopback target) [not run] nvme_core module does not have parameter multipath nvme/006 (create an NVMeOF target with a block device-backed ns) [passed] runtime 0.111s ... 0.115s nvme/007 (create an NVMeOF target with a file-backed ns) [passed] runtime 0.081s ... 0.069s nvme/008 (create an NVMeOF host with a block device-backed ns) [passed] runtime 1.690s ... 1.727s nvme/009 (create an NVMeOF host with a file-backed ns) [passed] runtime 1.659s ... 1.661s nvme/010 (run data verification fio job on NVMeOF block device-backed ns) [passed] runtime 28.781s ... 30.166s nvme/011 (run data verification fio job on NVMeOF file-backed ns) [passed] runtime 253.831s ... 238.774s nvme/012 (run mkfs and data verification fio job on NVMeOF block device-backed ns) [passed] runtime 40.003s ... 68.076s nvme/013 (run mkfs and data verification fio job on NVMeOF file-backed ns) [passed] runtime 272.649s ... 283.720s nvme/014 (flush a NVMeOF block device-backed ns) [passed] runtime 21.772s ... 21.397s nvme/015 (unit test for NVMe flush for file backed ns) [passed] runtime 21.908s ... 18.622s nvme/016 (create/delete many NVMeOF block device-backed ns and test discovery) [passed] runtime 15.860s ... 18.313s nvme/017 (create/delete many file-ns and test discovery) [passed] runtime 16.470s ... 18.374s nvme/018 (unit test NVMe-oF out of range access on a file backend) [passed] runtime 1.665s ... 1.890s nvme/019 (test NVMe DSM Discard command on NVMeOF block-device ns) [passed] runtime 1.681s ... 1.982s nvme/020 (test NVMe DSM Discard command on NVMeOF file-backed ns) [passed] runtime 1.645s ... 1.913s nvme/021 (test NVMe list command on NVMeOF file-backed ns) [passed] runtime 1.648s ... 1.956s nvme/022 (test NVMe reset command on NVMeOF file-backed ns) [passed] runtime 2.063s ... 2.553s nvme/023 (test NVMe smart-log command on NVMeOF block-device ns) [passed] runtime 1.692s ... 2.588s nvme/024 (test NVMe smart-log command on NVMeOF file-backed ns) [passed] runtime 1.643s ... 1.656s nvme/025 (test NVMe effects-log command on NVMeOF file-backed ns) [passed] runtime 1.640s ... 1.668s nvme/026 (test NVMe ns-descs command on NVMeOF file-backed ns) [passed] runtime 1.643s ... 1.961s nvme/027 (test NVMe ns-rescan command on NVMeOF file-backed ns) [passed] runtime 1.641s ... 1.677s nvme/028 (test NVMe list-subsys command on NVMeOF file-backed ns) [passed] runtime 1.648s ... 1.868s nvme/029 (test userspace IO via nvme-cli read/write interface) [passed] runtime 1.982s ... 2.703s nvme/030 (ensure the discovery generation counter is updated appropriately) [passed] runtime 0.308s ... 0.328s nvme/031 (test deletion of NVMeOF controllers immediately after setup) [passed] runtime 5.432s ... 7.495s nvme/038 (test deletion of NVMeOF subsystem without enabling) [passed] runtime 0.053s ... 0.046s -- 2.22.1 _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH V12 1/3] nvmet: add NVM Command Set Identifier support 2021-03-11 7:15 [PATCH V12 0/3] nvmet: add ZBD backend support Chaitanya Kulkarni @ 2021-03-11 7:15 ` Chaitanya Kulkarni 2021-03-11 7:15 ` [PATCH V12 2/3] nvmet: add ZBD over ZNS backend support Chaitanya Kulkarni 2021-03-11 7:15 ` [PATCH V12 3/3] nvmet: add nvmet_req_bio put helper for backends Chaitanya Kulkarni 2 siblings, 0 replies; 12+ messages in thread From: Chaitanya Kulkarni @ 2021-03-11 7:15 UTC (permalink / raw) To: linux-nvme; +Cc: hch, kbusch, sagi, damien.lemoal, Chaitanya Kulkarni NVMe TP 4056 allows controller to support different command sets. NVMeoF target currently only supports namespaces that contain traditional logical blocks that may be randomly read and written. In some applications there is a value in exposing namespaces that contain logical blocks that have special access rules (e.g. sequentially write required namespace such as Zoned Namespace (ZNS)). In order to support the Zoned Block Devices (ZBD) backend, controller needs to have support for ZNS Command Set Identifier (CSI). In this preparation patch, we adjust the code such that it can now supports default command set identifier. We update the namespace data structure to store the CSI value which defaults to NVME_CSI_NVM that represents traditional logical blocks namespace type. The CSI support is required to implement the ZBD backend for NVMeOF with host side NVMe ZNS interface, since ZNS commands belongs to the different command set than the default one. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> --- drivers/nvme/target/admin-cmd.c | 47 +++++++++++++++++++++++++++------ drivers/nvme/target/core.c | 16 ++++++++++- drivers/nvme/target/nvmet.h | 1 + include/linux/nvme.h | 1 + 4 files changed, 56 insertions(+), 9 deletions(-) diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c index f4cc32674edd..176c8593d341 100644 --- a/drivers/nvme/target/admin-cmd.c +++ b/drivers/nvme/target/admin-cmd.c @@ -162,15 +162,8 @@ static void nvmet_execute_get_log_page_smart(struct nvmet_req *req) nvmet_req_complete(req, status); } -static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req) +static void nvmet_set_csi_nvm_effects(struct nvme_effects_log *log) { - u16 status = NVME_SC_INTERNAL; - struct nvme_effects_log *log; - - log = kzalloc(sizeof(*log), GFP_KERNEL); - if (!log) - goto out; - log->acs[nvme_admin_get_log_page] = cpu_to_le32(1 << 0); log->acs[nvme_admin_identify] = cpu_to_le32(1 << 0); log->acs[nvme_admin_abort_cmd] = cpu_to_le32(1 << 0); @@ -184,9 +177,31 @@ static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req) log->iocs[nvme_cmd_flush] = cpu_to_le32(1 << 0); log->iocs[nvme_cmd_dsm] = cpu_to_le32(1 << 0); log->iocs[nvme_cmd_write_zeroes] = cpu_to_le32(1 << 0); +} + +static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req) +{ + struct nvme_effects_log *log; + u16 status = NVME_SC_SUCCESS; + + log = kzalloc(sizeof(*log), GFP_KERNEL); + if (!log) { + status = NVME_SC_INTERNAL; + goto out; + } + + switch (req->cmd->get_log_page.csi) { + case NVME_CSI_NVM: + nvmet_set_csi_nvm_effects(log); + break; + default: + status = NVME_SC_INVALID_LOG_PAGE; + goto free; + } status = nvmet_copy_to_sgl(req, 0, log, sizeof(*log)); +free: kfree(log); out: nvmet_req_complete(req, status); @@ -611,6 +626,18 @@ static u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len, return 0; } +static u16 nvmet_execute_identify_desclist_csi(struct nvmet_req *req, off_t *o) +{ + switch (req->ns->csi) { + case NVME_CSI_NVM: + return nvmet_copy_ns_identifier(req, NVME_NIDT_CSI, + NVME_NIDT_CSI_LEN, + &req->ns->csi, o); + } + + return NVME_SC_INVALID_IO_CMD_SET; +} + static void nvmet_execute_identify_desclist(struct nvmet_req *req) { off_t off = 0; @@ -635,6 +662,10 @@ static void nvmet_execute_identify_desclist(struct nvmet_req *req) goto out; } + status = nvmet_execute_identify_desclist_csi(req, &off); + if (status) + goto out; + if (sg_zero_buffer(req->sg, req->sg_cnt, NVME_IDENTIFY_DATA_SIZE - off, off) != NVME_IDENTIFY_DATA_SIZE - off) status = NVME_SC_INTERNAL | NVME_SC_DNR; diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c index adbede9ab7f3..4abe0b542c96 100644 --- a/drivers/nvme/target/core.c +++ b/drivers/nvme/target/core.c @@ -693,6 +693,7 @@ struct nvmet_ns *nvmet_ns_alloc(struct nvmet_subsys *subsys, u32 nsid) uuid_gen(&ns->uuid); ns->buffered_io = false; + ns->csi = NVME_CSI_NVM; return ns; } @@ -1113,6 +1114,17 @@ static inline u8 nvmet_cc_iocqes(u32 cc) return (cc >> NVME_CC_IOCQES_SHIFT) & 0xf; } +static inline bool nvmet_cc_css_check(u8 cc_css) +{ + switch (cc_css <<= NVME_CC_CSS_SHIFT) { + case NVME_CC_CSS_NVM: + case NVME_CC_CSS_CSI: + return true; + default: + return false; + } +} + static void nvmet_start_ctrl(struct nvmet_ctrl *ctrl) { lockdep_assert_held(&ctrl->lock); @@ -1121,7 +1133,7 @@ static void nvmet_start_ctrl(struct nvmet_ctrl *ctrl) nvmet_cc_iocqes(ctrl->cc) != NVME_NVM_IOCQES || nvmet_cc_mps(ctrl->cc) != 0 || nvmet_cc_ams(ctrl->cc) != 0 || - nvmet_cc_css(ctrl->cc) != 0) { + !nvmet_cc_css_check(nvmet_cc_css(ctrl->cc))) { ctrl->csts = NVME_CSTS_CFS; return; } @@ -1172,6 +1184,8 @@ static void nvmet_init_cap(struct nvmet_ctrl *ctrl) { /* command sets supported: NVMe command set: */ ctrl->cap = (1ULL << 37); + /* Controller supports one or more I/O Command Sets */ + ctrl->cap |= (1ULL << 43); /* CC.EN timeout in 500msec units: */ ctrl->cap |= (15ULL << 24); /* maximum queue entries supported: */ diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h index 24e261bf153a..ee5999920155 100644 --- a/drivers/nvme/target/nvmet.h +++ b/drivers/nvme/target/nvmet.h @@ -81,6 +81,7 @@ struct nvmet_ns { struct pci_dev *p2p_dev; int pi_type; int metadata_size; + u8 csi; }; static inline struct nvmet_ns *to_nvmet_ns(struct config_item *item) diff --git a/include/linux/nvme.h b/include/linux/nvme.h index b08787cd0881..f09fbbb7876b 100644 --- a/include/linux/nvme.h +++ b/include/linux/nvme.h @@ -1494,6 +1494,7 @@ enum { NVME_SC_NS_WRITE_PROTECTED = 0x20, NVME_SC_CMD_INTERRUPTED = 0x21, NVME_SC_TRANSIENT_TR_ERR = 0x22, + NVME_SC_INVALID_IO_CMD_SET = 0x2C, NVME_SC_LBA_RANGE = 0x80, NVME_SC_CAP_EXCEEDED = 0x81, -- 2.22.1 _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH V12 2/3] nvmet: add ZBD over ZNS backend support 2021-03-11 7:15 [PATCH V12 0/3] nvmet: add ZBD backend support Chaitanya Kulkarni 2021-03-11 7:15 ` [PATCH V12 1/3] nvmet: add NVM Command Set Identifier support Chaitanya Kulkarni @ 2021-03-11 7:15 ` Chaitanya Kulkarni 2021-03-12 1:15 ` Damien Le Moal 2021-03-11 7:15 ` [PATCH V12 3/3] nvmet: add nvmet_req_bio put helper for backends Chaitanya Kulkarni 2 siblings, 1 reply; 12+ messages in thread From: Chaitanya Kulkarni @ 2021-03-11 7:15 UTC (permalink / raw) To: linux-nvme; +Cc: hch, kbusch, sagi, damien.lemoal, Chaitanya Kulkarni NVMe TP 4053 – Zoned Namespaces (ZNS) allows host software to communicate with a non-volatile memory subsystem using zones for NVMe protocol based controllers. NVMeOF already support the ZNS NVMe Protocol compliant devices on the target in the passthru mode. There are Generic zoned block devices like Shingled Magnetic Recording (SMR) HDDs that are not based on the NVMe protocol. This patch adds ZNS backend to support the ZBDs for NVMeOF target. This support includes implementing the new command set NVME_CSI_ZNS, adding different command handlers for ZNS command set such as NVMe Identify Controller, NVMe Identify Namespace, NVMe Zone Append, NVMe Zone Management Send and NVMe Zone Management Receive. With new command set identifier we also update the target command effects logs to reflect the ZNS compliant commands. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> --- drivers/nvme/target/Makefile | 1 + drivers/nvme/target/admin-cmd.c | 27 +++ drivers/nvme/target/io-cmd-bdev.c | 34 ++- drivers/nvme/target/nvmet.h | 38 ++++ drivers/nvme/target/zns.c | 332 ++++++++++++++++++++++++++++++ 5 files changed, 424 insertions(+), 8 deletions(-) create mode 100644 drivers/nvme/target/zns.c diff --git a/drivers/nvme/target/Makefile b/drivers/nvme/target/Makefile index ebf91fc4c72e..9837e580fa7e 100644 --- a/drivers/nvme/target/Makefile +++ b/drivers/nvme/target/Makefile @@ -12,6 +12,7 @@ obj-$(CONFIG_NVME_TARGET_TCP) += nvmet-tcp.o nvmet-y += core.o configfs.o admin-cmd.o fabrics-cmd.o \ discovery.o io-cmd-file.o io-cmd-bdev.o nvmet-$(CONFIG_NVME_TARGET_PASSTHRU) += passthru.o +nvmet-$(CONFIG_BLK_DEV_ZONED) += zns.o nvme-loop-y += loop.o nvmet-rdma-y += rdma.o nvmet-fc-y += fc.o diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c index 176c8593d341..bf4876df624a 100644 --- a/drivers/nvme/target/admin-cmd.c +++ b/drivers/nvme/target/admin-cmd.c @@ -179,6 +179,13 @@ static void nvmet_set_csi_nvm_effects(struct nvme_effects_log *log) log->iocs[nvme_cmd_write_zeroes] = cpu_to_le32(1 << 0); } +static void nvmet_set_csi_zns_effects(struct nvme_effects_log *log) +{ + log->iocs[nvme_cmd_zone_append] = cpu_to_le32(1 << 0); + log->iocs[nvme_cmd_zone_mgmt_send] = cpu_to_le32(1 << 0); + log->iocs[nvme_cmd_zone_mgmt_recv] = cpu_to_le32(1 << 0); +} + static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req) { struct nvme_effects_log *log; @@ -194,6 +201,15 @@ static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req) case NVME_CSI_NVM: nvmet_set_csi_nvm_effects(log); break; + case NVME_CSI_ZNS: + if (!IS_ENABLED(CONFIG_BLK_DEV_ZONED)) { + status = NVME_SC_INVALID_IO_CMD_SET; + goto free; + } + + nvmet_set_csi_nvm_effects(log); + nvmet_set_csi_zns_effects(log); + break; default: status = NVME_SC_INVALID_LOG_PAGE; goto free; @@ -630,6 +646,13 @@ static u16 nvmet_execute_identify_desclist_csi(struct nvmet_req *req, off_t *o) { switch (req->ns->csi) { case NVME_CSI_NVM: + return nvmet_copy_ns_identifier(req, NVME_NIDT_CSI, + NVME_NIDT_CSI_LEN, + &req->ns->csi, o); + case NVME_CSI_ZNS: + if (!IS_ENABLED(CONFIG_BLK_DEV_ZONED)) + return NVME_SC_INVALID_IO_CMD_SET; + return nvmet_copy_ns_identifier(req, NVME_NIDT_CSI, NVME_NIDT_CSI_LEN, &req->ns->csi, o); @@ -682,8 +705,12 @@ static void nvmet_execute_identify(struct nvmet_req *req) switch (req->cmd->identify.cns) { case NVME_ID_CNS_NS: return nvmet_execute_identify_ns(req); + case NVME_ID_CNS_CS_NS: + return nvmet_execute_identify_cns_cs_ns(req); case NVME_ID_CNS_CTRL: return nvmet_execute_identify_ctrl(req); + case NVME_ID_CNS_CS_CTRL: + return nvmet_execute_identify_cns_cs_ctrl(req); case NVME_ID_CNS_NS_ACTIVE_LIST: return nvmet_execute_identify_nslist(req); case NVME_ID_CNS_NS_DESC_LIST: diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c index 9a8b3726a37c..ada0215f5e56 100644 --- a/drivers/nvme/target/io-cmd-bdev.c +++ b/drivers/nvme/target/io-cmd-bdev.c @@ -63,6 +63,14 @@ static void nvmet_bdev_ns_enable_integrity(struct nvmet_ns *ns) } } +void nvmet_bdev_ns_disable(struct nvmet_ns *ns) +{ + if (ns->bdev) { + blkdev_put(ns->bdev, FMODE_WRITE | FMODE_READ); + ns->bdev = NULL; + } +} + int nvmet_bdev_ns_enable(struct nvmet_ns *ns) { int ret; @@ -86,15 +94,16 @@ int nvmet_bdev_ns_enable(struct nvmet_ns *ns) if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY_T10)) nvmet_bdev_ns_enable_integrity(ns); - return 0; -} - -void nvmet_bdev_ns_disable(struct nvmet_ns *ns) -{ - if (ns->bdev) { - blkdev_put(ns->bdev, FMODE_WRITE | FMODE_READ); - ns->bdev = NULL; + /* bdev_is_zoned() is stubbed out of CONFIG_BLK_DEV_ZONED */ + if (bdev_is_zoned(ns->bdev)) { + if (!nvmet_bdev_zns_enable(ns)) { + nvmet_bdev_ns_disable(ns); + return -EINVAL; + } + ns->csi = NVME_CSI_ZNS; } + + return 0; } void nvmet_bdev_ns_revalidate(struct nvmet_ns *ns) @@ -448,6 +457,15 @@ u16 nvmet_bdev_parse_io_cmd(struct nvmet_req *req) case nvme_cmd_write_zeroes: req->execute = nvmet_bdev_execute_write_zeroes; return 0; + case nvme_cmd_zone_append: + req->execute = nvmet_bdev_execute_zone_append; + return 0; + case nvme_cmd_zone_mgmt_recv: + req->execute = nvmet_bdev_execute_zone_mgmt_recv; + return 0; + case nvme_cmd_zone_mgmt_send: + req->execute = nvmet_bdev_execute_zone_mgmt_send; + return 0; default: return nvmet_report_invalid_opcode(req); } diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h index ee5999920155..f3fccc49de03 100644 --- a/drivers/nvme/target/nvmet.h +++ b/drivers/nvme/target/nvmet.h @@ -247,6 +247,10 @@ struct nvmet_subsys { unsigned int admin_timeout; unsigned int io_timeout; #endif /* CONFIG_NVME_TARGET_PASSTHRU */ + +#ifdef CONFIG_BLK_DEV_ZONED + u8 zasl; +#endif /* CONFIG_BLK_DEV_ZONED */ }; static inline struct nvmet_subsys *to_subsys(struct config_item *item) @@ -584,6 +588,40 @@ static inline struct nvme_ctrl *nvmet_passthru_ctrl(struct nvmet_subsys *subsys) } #endif /* CONFIG_NVME_TARGET_PASSTHRU */ +#ifdef CONFIG_BLK_DEV_ZONED +bool nvmet_bdev_zns_enable(struct nvmet_ns *ns); +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req); +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req); +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req); +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req); +void nvmet_bdev_execute_zone_append(struct nvmet_req *req); +#else /* CONFIG_BLK_DEV_ZONED */ +static inline bool nvmet_bdev_zns_enable(struct nvmet_ns *ns) +{ + return false; +} +static inline void +nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req) +{ +} +static inline void +nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req) +{ +} +static inline void +nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req) +{ +} +static inline void +nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req) +{ +} +static inline void +nvmet_bdev_execute_zone_append(struct nvmet_req *req) +{ +} +#endif /* CONFIG_BLK_DEV_ZONED */ + static inline struct nvme_ctrl * nvmet_req_passthru_ctrl(struct nvmet_req *req) { diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c new file mode 100644 index 000000000000..e12629b02320 --- /dev/null +++ b/drivers/nvme/target/zns.c @@ -0,0 +1,332 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * NVMe ZNS-ZBD command implementation. + * Copyright (c) 2020-2021 HGST, a Western Digital Company. + */ +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt +#include <linux/nvme.h> +#include <linux/blkdev.h> +#include "nvmet.h" + +/* + * We set the Memory Page Size Minimum (MPSMIN) for target controller to 0 + * which gets added by 12 in the nvme_enable_ctrl() which results in 2^12 = 4k + * as page_shift value. When calculating the ZASL use shift by 12. + */ +#define NVMET_MPSMIN_SHIFT 12 + +static u16 nvmet_bdev_zns_checks(struct nvmet_req *req) +{ + if (!bdev_is_zoned(req->ns->bdev)) + return NVME_SC_INVALID_NS | NVME_SC_DNR; + + if (req->cmd->zmr.zra != NVME_ZRA_ZONE_REPORT) { + req->error_loc = offsetof(struct nvme_zone_mgmt_recv_cmd, zra); + return NVME_SC_INVALID_FIELD | NVME_SC_DNR; + } + + if (req->cmd->zmr.zrasf != NVME_ZRASF_ZONE_REPORT_ALL) { + req->error_loc = + offsetof(struct nvme_zone_mgmt_recv_cmd, zrasf); + return NVME_SC_INVALID_FIELD | NVME_SC_DNR; + } + + if (req->cmd->zmr.pr != NVME_REPORT_ZONE_PARTIAL) { + req->error_loc = offsetof(struct nvme_zone_mgmt_recv_cmd, pr); + return NVME_SC_INVALID_FIELD | NVME_SC_DNR; + } + + return NVME_SC_SUCCESS; +} + +static inline u8 nvmet_zasl(unsigned int zone_append_sects) +{ + /* + * Zone Append Size Limit is the value experessed in the units + * of minimum memory page size (i.e. 12) and is reported power of 2. + */ + return ilog2((zone_append_sects << 9) >> NVMET_MPSMIN_SHIFT); +} + +static inline bool nvmet_zns_update_zasl(struct nvmet_ns *ns) +{ + struct request_queue *q = ns->bdev->bd_disk->queue; + u8 zasl = nvmet_zasl(queue_max_zone_append_sectors(q)); + + if (ns->subsys->zasl) + return ns->subsys->zasl < zasl; + + ns->subsys->zasl = zasl; + return true; +} + +static int nvmet_bdev_validate_zns_zones_cb(struct blk_zone *z, + unsigned int i, void *data) +{ + if (z->type == BLK_ZONE_TYPE_CONVENTIONAL) + return -EOPNOTSUPP; + return 0; +} + +static bool nvmet_bdev_has_conv_zones(struct block_device *bdev) +{ + int ret; + + if (bdev->bd_disk->queue->conv_zones_bitmap) + return true; + + ret = blkdev_report_zones(bdev, 0, blkdev_nr_zones(bdev->bd_disk), + nvmet_bdev_validate_zns_zones_cb, NULL); + + return ret <= 0 ? true : false; +} + +bool nvmet_bdev_zns_enable(struct nvmet_ns *ns) +{ + if (nvmet_bdev_has_conv_zones(ns->bdev)) + return false; + + ns->blksize_shift = blksize_bits(bdev_logical_block_size(ns->bdev)); + + if (!nvmet_zns_update_zasl(ns)) + return false; + + return !(get_capacity(ns->bdev->bd_disk) & + (bdev_zone_sectors(ns->bdev) - 1)); +} + +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req) +{ + u8 zasl = req->sq->ctrl->subsys->zasl; + struct nvmet_ctrl *ctrl = req->sq->ctrl; + struct nvme_id_ctrl_zns *id; + u16 status; + + if (req->cmd->identify.csi != NVME_CSI_ZNS) { + req->error_loc = offsetof(struct nvme_common_command, opcode); + status = NVME_SC_INVALID_OPCODE | NVME_SC_DNR; + goto out; + } + + id = kzalloc(sizeof(*id), GFP_KERNEL); + if (!id) { + status = NVME_SC_INTERNAL; + goto out; + } + + if (ctrl->ops->get_mdts) + id->zasl = min_t(u8, ctrl->ops->get_mdts(ctrl), zasl); + else + id->zasl = zasl; + + status = nvmet_copy_to_sgl(req, 0, id, sizeof(*id)); + + kfree(id); +out: + nvmet_req_complete(req, status); +} + +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req) +{ + struct nvme_id_ns_zns *id_zns; + u64 zsze; + u16 status; + + if (req->cmd->identify.csi != NVME_CSI_ZNS) { + req->error_loc = offsetof(struct nvme_common_command, opcode); + status = NVME_SC_INVALID_OPCODE | NVME_SC_DNR; + goto out; + } + + if (le32_to_cpu(req->cmd->identify.nsid) == NVME_NSID_ALL) { + req->error_loc = offsetof(struct nvme_identify, nsid); + status = NVME_SC_INVALID_NS | NVME_SC_DNR; + goto out; + } + + id_zns = kzalloc(sizeof(*id_zns), GFP_KERNEL); + if (!id_zns) { + status = NVME_SC_INTERNAL; + goto out; + } + + status = nvmet_req_find_ns(req); + if (status) { + status = NVME_SC_INTERNAL; + goto done; + } + + if (!bdev_is_zoned(req->ns->bdev)) { + req->error_loc = offsetof(struct nvme_identify, nsid); + status = NVME_SC_INVALID_NS | NVME_SC_DNR; + goto done; + } + + nvmet_ns_revalidate(req->ns); + zsze = (bdev_zone_sectors(req->ns->bdev) << 9) >> + req->ns->blksize_shift; + id_zns->lbafe[0].zsze = cpu_to_le64(zsze); + id_zns->mor = cpu_to_le32(bdev_max_open_zones(req->ns->bdev)); + id_zns->mar = cpu_to_le32(bdev_max_active_zones(req->ns->bdev)); + +done: + status = nvmet_copy_to_sgl(req, 0, id_zns, sizeof(*id_zns)); + kfree(id_zns); +out: + nvmet_req_complete(req, status); +} + +struct nvmet_report_zone_data { + struct nvmet_ns *ns; + struct nvme_zone_report *rz; +}; + +static int nvmet_bdev_report_zone_cb(struct blk_zone *z, unsigned i, void *d) +{ + struct nvmet_report_zone_data *report_zone_data = d; + struct nvme_zone_descriptor *entries = report_zone_data->rz->entries; + struct nvmet_ns *ns = report_zone_data->ns; + + entries[i].zcap = nvmet_sect_to_lba(ns, z->capacity); + entries[i].zslba = nvmet_sect_to_lba(ns, z->start); + entries[i].wp = nvmet_sect_to_lba(ns, z->wp); + entries[i].za = z->reset ? 1 << 2 : 0; + entries[i].zt = z->type; + entries[i].zs = z->cond << 4; + + return 0; +} + +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req) +{ + sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->zmr.slba); + u32 bufsize = (le32_to_cpu(req->cmd->zmr.numd) + 1) << 2; + struct nvmet_report_zone_data data = { .ns = req->ns }; + unsigned int nr_zones; + int reported_zones; + u16 status; + + status = nvmet_bdev_zns_checks(req); + if (status) + goto out; + + data.rz = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY | __GFP_ZERO); + if (!data.rz) { + status = NVME_SC_INTERNAL; + goto out; + } + + nr_zones = (bufsize - sizeof(struct nvme_zone_report)) / + sizeof(struct nvme_zone_descriptor); + if (!nr_zones) { + status = NVME_SC_INVALID_FIELD | NVME_SC_DNR; + goto out_free_report_zones; + } + + reported_zones = blkdev_report_zones(req->ns->bdev, sect, nr_zones, + nvmet_bdev_report_zone_cb, &data); + if (reported_zones < 0) { + status = NVME_SC_INTERNAL; + goto out_free_report_zones; + } + + data.rz->nr_zones = cpu_to_le64(reported_zones); + + status = nvmet_copy_to_sgl(req, 0, data.rz, bufsize); + +out_free_report_zones: + kvfree(data.rz); +out: + nvmet_req_complete(req, status); +} + +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req) +{ + sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->zms.slba); + sector_t nr_sect = bdev_zone_sectors(req->ns->bdev); + u16 status = NVME_SC_SUCCESS; + u8 zsa = req->cmd->zms.zsa; + enum req_opf op; + int ret; + const unsigned int zsa_to_op[] = { + [NVME_ZONE_OPEN] = REQ_OP_ZONE_OPEN, + [NVME_ZONE_CLOSE] = REQ_OP_ZONE_CLOSE, + [NVME_ZONE_FINISH] = REQ_OP_ZONE_FINISH, + [NVME_ZONE_RESET] = REQ_OP_ZONE_RESET, + }; + + if (zsa > ARRAY_SIZE(zsa_to_op) || !zsa_to_op[zsa]) { + status = NVME_SC_INVALID_FIELD; + goto out; + } + + op = zsa_to_op[zsa]; + + if (req->cmd->zms.select_all) + nr_sect = get_capacity(req->ns->bdev->bd_disk); + + ret = blkdev_zone_mgmt(req->ns->bdev, op, sect, nr_sect, GFP_KERNEL); + if (ret) + status = NVME_SC_INTERNAL; +out: + nvmet_req_complete(req, status); +} + +void nvmet_bdev_execute_zone_append(struct nvmet_req *req) +{ + sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->rw.slba); + u16 status = NVME_SC_SUCCESS; + unsigned int total_len = 0; + struct scatterlist *sg; + int ret = 0, sg_cnt; + struct bio *bio; + + if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req))) + return; + + if (!req->sg_cnt) { + nvmet_req_complete(req, 0); + return; + } + + if (req->transfer_len <= NVMET_MAX_INLINE_DATA_LEN) { + bio = &req->b.inline_bio; + bio_init(bio, req->inline_bvec, ARRAY_SIZE(req->inline_bvec)); + } else { + bio = bio_alloc(GFP_KERNEL, req->sg_cnt); + } + + bio_set_dev(bio, req->ns->bdev); + bio->bi_iter.bi_sector = sect; + bio->bi_opf = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE; + if (req->cmd->rw.control & cpu_to_le16(NVME_RW_FUA)) + bio->bi_opf |= REQ_FUA; + + for_each_sg(req->sg, sg, req->sg_cnt, sg_cnt) { + struct page *p = sg_page(sg); + unsigned int l = sg->length; + unsigned int o = sg->offset; + + ret = bio_add_zone_append_page(bio, p, l, o); + if (ret != sg->length) { + status = NVME_SC_INTERNAL; + goto out_bio_put; + } + + total_len += sg->length; + } + + if (total_len != nvmet_rw_data_len(req)) { + status = NVME_SC_INTERNAL | NVME_SC_DNR; + goto out_bio_put; + } + + ret = submit_bio_wait(bio); + req->cqe->result.u64 = nvmet_sect_to_lba(req->ns, + bio->bi_iter.bi_sector); + +out_bio_put: + if (bio != &req->b.inline_bio) + bio_put(bio); + nvmet_req_complete(req, ret < 0 ? NVME_SC_INTERNAL : status); +} -- 2.22.1 _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH V12 2/3] nvmet: add ZBD over ZNS backend support 2021-03-11 7:15 ` [PATCH V12 2/3] nvmet: add ZBD over ZNS backend support Chaitanya Kulkarni @ 2021-03-12 1:15 ` Damien Le Moal 2021-03-12 6:29 ` Chaitanya Kulkarni 0 siblings, 1 reply; 12+ messages in thread From: Damien Le Moal @ 2021-03-12 1:15 UTC (permalink / raw) To: Chaitanya Kulkarni, linux-nvme; +Cc: hch, kbusch, sagi On 2021/03/11 16:16, Chaitanya Kulkarni wrote: > NVMe TP 4053 – Zoned Namespaces (ZNS) allows host software to > communicate with a non-volatile memory subsystem using zones for > NVMe protocol based controllers. NVMeOF already support the ZNS NVMe > Protocol compliant devices on the target in the passthru mode. There > are Generic zoned block devices like Shingled Magnetic Recording (SMR) > HDDs that are not based on the NVMe protocol. > > This patch adds ZNS backend to support the ZBDs for NVMeOF target. > > This support includes implementing the new command set NVME_CSI_ZNS, > adding different command handlers for ZNS command set such as > NVMe Identify Controller, NVMe Identify Namespace, NVMe Zone Append, > NVMe Zone Management Send and NVMe Zone Management Receive. > > With new command set identifier we also update the target command > effects logs to reflect the ZNS compliant commands. > > Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> > --- > drivers/nvme/target/Makefile | 1 + > drivers/nvme/target/admin-cmd.c | 27 +++ > drivers/nvme/target/io-cmd-bdev.c | 34 ++- > drivers/nvme/target/nvmet.h | 38 ++++ > drivers/nvme/target/zns.c | 332 ++++++++++++++++++++++++++++++ > 5 files changed, 424 insertions(+), 8 deletions(-) > create mode 100644 drivers/nvme/target/zns.c > > diff --git a/drivers/nvme/target/Makefile b/drivers/nvme/target/Makefile > index ebf91fc4c72e..9837e580fa7e 100644 > --- a/drivers/nvme/target/Makefile > +++ b/drivers/nvme/target/Makefile > @@ -12,6 +12,7 @@ obj-$(CONFIG_NVME_TARGET_TCP) += nvmet-tcp.o > nvmet-y += core.o configfs.o admin-cmd.o fabrics-cmd.o \ > discovery.o io-cmd-file.o io-cmd-bdev.o > nvmet-$(CONFIG_NVME_TARGET_PASSTHRU) += passthru.o > +nvmet-$(CONFIG_BLK_DEV_ZONED) += zns.o > nvme-loop-y += loop.o > nvmet-rdma-y += rdma.o > nvmet-fc-y += fc.o > diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c > index 176c8593d341..bf4876df624a 100644 > --- a/drivers/nvme/target/admin-cmd.c > +++ b/drivers/nvme/target/admin-cmd.c > @@ -179,6 +179,13 @@ static void nvmet_set_csi_nvm_effects(struct nvme_effects_log *log) > log->iocs[nvme_cmd_write_zeroes] = cpu_to_le32(1 << 0); > } > > +static void nvmet_set_csi_zns_effects(struct nvme_effects_log *log) > +{ > + log->iocs[nvme_cmd_zone_append] = cpu_to_le32(1 << 0); > + log->iocs[nvme_cmd_zone_mgmt_send] = cpu_to_le32(1 << 0); > + log->iocs[nvme_cmd_zone_mgmt_recv] = cpu_to_le32(1 << 0); > +} > + > static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req) > { > struct nvme_effects_log *log; > @@ -194,6 +201,15 @@ static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req) > case NVME_CSI_NVM: > nvmet_set_csi_nvm_effects(log); > break; > + case NVME_CSI_ZNS: > + if (!IS_ENABLED(CONFIG_BLK_DEV_ZONED)) { > + status = NVME_SC_INVALID_IO_CMD_SET; > + goto free; > + } > + > + nvmet_set_csi_nvm_effects(log); > + nvmet_set_csi_zns_effects(log); > + break; > default: > status = NVME_SC_INVALID_LOG_PAGE; > goto free; > @@ -630,6 +646,13 @@ static u16 nvmet_execute_identify_desclist_csi(struct nvmet_req *req, off_t *o) > { > switch (req->ns->csi) { > case NVME_CSI_NVM: > + return nvmet_copy_ns_identifier(req, NVME_NIDT_CSI, > + NVME_NIDT_CSI_LEN, > + &req->ns->csi, o); > + case NVME_CSI_ZNS: > + if (!IS_ENABLED(CONFIG_BLK_DEV_ZONED)) > + return NVME_SC_INVALID_IO_CMD_SET; > + > return nvmet_copy_ns_identifier(req, NVME_NIDT_CSI, > NVME_NIDT_CSI_LEN, > &req->ns->csi, o); > @@ -682,8 +705,12 @@ static void nvmet_execute_identify(struct nvmet_req *req) > switch (req->cmd->identify.cns) { > case NVME_ID_CNS_NS: > return nvmet_execute_identify_ns(req); > + case NVME_ID_CNS_CS_NS: > + return nvmet_execute_identify_cns_cs_ns(req); > case NVME_ID_CNS_CTRL: > return nvmet_execute_identify_ctrl(req); > + case NVME_ID_CNS_CS_CTRL: > + return nvmet_execute_identify_cns_cs_ctrl(req); > case NVME_ID_CNS_NS_ACTIVE_LIST: > return nvmet_execute_identify_nslist(req); > case NVME_ID_CNS_NS_DESC_LIST: > diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c > index 9a8b3726a37c..ada0215f5e56 100644 > --- a/drivers/nvme/target/io-cmd-bdev.c > +++ b/drivers/nvme/target/io-cmd-bdev.c > @@ -63,6 +63,14 @@ static void nvmet_bdev_ns_enable_integrity(struct nvmet_ns *ns) > } > } > > +void nvmet_bdev_ns_disable(struct nvmet_ns *ns) > +{ > + if (ns->bdev) { > + blkdev_put(ns->bdev, FMODE_WRITE | FMODE_READ); > + ns->bdev = NULL; > + } > +} > + > int nvmet_bdev_ns_enable(struct nvmet_ns *ns) > { > int ret; > @@ -86,15 +94,16 @@ int nvmet_bdev_ns_enable(struct nvmet_ns *ns) > if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY_T10)) > nvmet_bdev_ns_enable_integrity(ns); > > - return 0; > -} > - > -void nvmet_bdev_ns_disable(struct nvmet_ns *ns) > -{ > - if (ns->bdev) { > - blkdev_put(ns->bdev, FMODE_WRITE | FMODE_READ); > - ns->bdev = NULL; > + /* bdev_is_zoned() is stubbed out of CONFIG_BLK_DEV_ZONED */ I do not really understand this comment and I do not think it is useful. bdev_is_zoned() is always defined, regardless of CONFIG_BLK_DEV_ZONED. If CONFIG_BLK_DEV_ZONED is not set, you will always get false. > + if (bdev_is_zoned(ns->bdev)) { > + if (!nvmet_bdev_zns_enable(ns)) { > + nvmet_bdev_ns_disable(ns); > + return -EINVAL; > + } > + ns->csi = NVME_CSI_ZNS; > } > + > + return 0; > } > > void nvmet_bdev_ns_revalidate(struct nvmet_ns *ns) > @@ -448,6 +457,15 @@ u16 nvmet_bdev_parse_io_cmd(struct nvmet_req *req) > case nvme_cmd_write_zeroes: > req->execute = nvmet_bdev_execute_write_zeroes; > return 0; > + case nvme_cmd_zone_append: > + req->execute = nvmet_bdev_execute_zone_append; > + return 0; > + case nvme_cmd_zone_mgmt_recv: > + req->execute = nvmet_bdev_execute_zone_mgmt_recv; > + return 0; > + case nvme_cmd_zone_mgmt_send: > + req->execute = nvmet_bdev_execute_zone_mgmt_send; > + return 0; > default: > return nvmet_report_invalid_opcode(req); > } > diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h > index ee5999920155..f3fccc49de03 100644 > --- a/drivers/nvme/target/nvmet.h > +++ b/drivers/nvme/target/nvmet.h > @@ -247,6 +247,10 @@ struct nvmet_subsys { > unsigned int admin_timeout; > unsigned int io_timeout; > #endif /* CONFIG_NVME_TARGET_PASSTHRU */ > + > +#ifdef CONFIG_BLK_DEV_ZONED > + u8 zasl; > +#endif /* CONFIG_BLK_DEV_ZONED */ > }; > > static inline struct nvmet_subsys *to_subsys(struct config_item *item) > @@ -584,6 +588,40 @@ static inline struct nvme_ctrl *nvmet_passthru_ctrl(struct nvmet_subsys *subsys) > } > #endif /* CONFIG_NVME_TARGET_PASSTHRU */ > > +#ifdef CONFIG_BLK_DEV_ZONED > +bool nvmet_bdev_zns_enable(struct nvmet_ns *ns); > +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req); > +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req); > +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req); > +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req); > +void nvmet_bdev_execute_zone_append(struct nvmet_req *req); > +#else /* CONFIG_BLK_DEV_ZONED */ > +static inline bool nvmet_bdev_zns_enable(struct nvmet_ns *ns) > +{ > + return false; > +} > +static inline void > +nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req) > +{ > +} > +static inline void > +nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req) > +{ > +} > +static inline void > +nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req) > +{ > +} > +static inline void > +nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req) > +{ > +} > +static inline void > +nvmet_bdev_execute_zone_append(struct nvmet_req *req) > +{ > +} > +#endif /* CONFIG_BLK_DEV_ZONED */ > + > static inline struct nvme_ctrl * > nvmet_req_passthru_ctrl(struct nvmet_req *req) > { > diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c > new file mode 100644 > index 000000000000..e12629b02320 > --- /dev/null > +++ b/drivers/nvme/target/zns.c > @@ -0,0 +1,332 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * NVMe ZNS-ZBD command implementation. s/ZBD/zoned block device "ZBD" is not necessarilly an obvious acronym to everybody. > + * Copyright (c) 2020-2021 HGST, a Western Digital Company. This should be: * Copyright (C) 2021 Western Digital Corporation or its affiliates. > + */ > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt > +#include <linux/nvme.h> > +#include <linux/blkdev.h> > +#include "nvmet.h" > + > +/* > + * We set the Memory Page Size Minimum (MPSMIN) for target controller to 0 > + * which gets added by 12 in the nvme_enable_ctrl() which results in 2^12 = 4k > + * as page_shift value. When calculating the ZASL use shift by 12. > + */ > +#define NVMET_MPSMIN_SHIFT 12 > + > +static u16 nvmet_bdev_zns_checks(struct nvmet_req *req) > +{ > + if (!bdev_is_zoned(req->ns->bdev)) > + return NVME_SC_INVALID_NS | NVME_SC_DNR; > + > + if (req->cmd->zmr.zra != NVME_ZRA_ZONE_REPORT) { > + req->error_loc = offsetof(struct nvme_zone_mgmt_recv_cmd, zra); > + return NVME_SC_INVALID_FIELD | NVME_SC_DNR; > + } > + > + if (req->cmd->zmr.zrasf != NVME_ZRASF_ZONE_REPORT_ALL) { > + req->error_loc = > + offsetof(struct nvme_zone_mgmt_recv_cmd, zrasf); > + return NVME_SC_INVALID_FIELD | NVME_SC_DNR; > + } > + > + if (req->cmd->zmr.pr != NVME_REPORT_ZONE_PARTIAL) { > + req->error_loc = offsetof(struct nvme_zone_mgmt_recv_cmd, pr); > + return NVME_SC_INVALID_FIELD | NVME_SC_DNR; > + } > + > + return NVME_SC_SUCCESS; > +} > + > +static inline u8 nvmet_zasl(unsigned int zone_append_sects) > +{ > + /* > + * Zone Append Size Limit is the value experessed in the units > + * of minimum memory page size (i.e. 12) and is reported power of 2. > + */ > + return ilog2((zone_append_sects << 9) >> NVMET_MPSMIN_SHIFT); s/9/SECTOR_SHIFT And you could rewrite this as: return ilog2(zone_append_sects >> (NVMET_MPSMIN_SHIFT - SECTOR_SHIFT)); > +} > + > +static inline bool nvmet_zns_update_zasl(struct nvmet_ns *ns) > +{ > + struct request_queue *q = ns->bdev->bd_disk->queue; > + u8 zasl = nvmet_zasl(queue_max_zone_append_sectors(q)); > + > + if (ns->subsys->zasl) > + return ns->subsys->zasl < zasl; > + > + ns->subsys->zasl = zasl; > + return true; > +} > + > +static int nvmet_bdev_validate_zns_zones_cb(struct blk_zone *z, > + unsigned int i, void *data) > +{ > + if (z->type == BLK_ZONE_TYPE_CONVENTIONAL) > + return -EOPNOTSUPP; > + return 0; > +} > + > +static bool nvmet_bdev_has_conv_zones(struct block_device *bdev) > +{ > + int ret; > + > + if (bdev->bd_disk->queue->conv_zones_bitmap) > + return true; > + > + ret = blkdev_report_zones(bdev, 0, blkdev_nr_zones(bdev->bd_disk), > + nvmet_bdev_validate_zns_zones_cb, NULL); > + > + return ret <= 0 ? true : false; > +} > + > +bool nvmet_bdev_zns_enable(struct nvmet_ns *ns) > +{ > + if (nvmet_bdev_has_conv_zones(ns->bdev)) > + return false; > + > + ns->blksize_shift = blksize_bits(bdev_logical_block_size(ns->bdev)); > + > + if (!nvmet_zns_update_zasl(ns)) > + return false; > + > + return !(get_capacity(ns->bdev->bd_disk) & > + (bdev_zone_sectors(ns->bdev) - 1)); It may be good to add a comment above this one as it is not necessarilly obvious. Something like: /* * Generic zoned block devices may have a smaller last zone which is * not supported by ZNS. Excludes zoned drives that have such smaller * last zone. */ > +} > + > +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req) > +{ > + u8 zasl = req->sq->ctrl->subsys->zasl; > + struct nvmet_ctrl *ctrl = req->sq->ctrl; > + struct nvme_id_ctrl_zns *id; > + u16 status; > + > + if (req->cmd->identify.csi != NVME_CSI_ZNS) { > + req->error_loc = offsetof(struct nvme_common_command, opcode); > + status = NVME_SC_INVALID_OPCODE | NVME_SC_DNR; > + goto out; > + } > + > + id = kzalloc(sizeof(*id), GFP_KERNEL); > + if (!id) { > + status = NVME_SC_INTERNAL; > + goto out; > + } > + > + if (ctrl->ops->get_mdts) > + id->zasl = min_t(u8, ctrl->ops->get_mdts(ctrl), zasl); > + else > + id->zasl = zasl; > + > + status = nvmet_copy_to_sgl(req, 0, id, sizeof(*id)); > + > + kfree(id); > +out: > + nvmet_req_complete(req, status); > +} > + > +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req) > +{ > + struct nvme_id_ns_zns *id_zns; > + u64 zsze; > + u16 status; > + > + if (req->cmd->identify.csi != NVME_CSI_ZNS) { > + req->error_loc = offsetof(struct nvme_common_command, opcode); > + status = NVME_SC_INVALID_OPCODE | NVME_SC_DNR; > + goto out; > + } > + > + if (le32_to_cpu(req->cmd->identify.nsid) == NVME_NSID_ALL) { > + req->error_loc = offsetof(struct nvme_identify, nsid); > + status = NVME_SC_INVALID_NS | NVME_SC_DNR; > + goto out; > + } > + > + id_zns = kzalloc(sizeof(*id_zns), GFP_KERNEL); > + if (!id_zns) { > + status = NVME_SC_INTERNAL; > + goto out; > + } > + > + status = nvmet_req_find_ns(req); > + if (status) { > + status = NVME_SC_INTERNAL; > + goto done; > + } > + > + if (!bdev_is_zoned(req->ns->bdev)) { > + req->error_loc = offsetof(struct nvme_identify, nsid); > + status = NVME_SC_INVALID_NS | NVME_SC_DNR; > + goto done; > + } > + > + nvmet_ns_revalidate(req->ns); > + zsze = (bdev_zone_sectors(req->ns->bdev) << 9) >> > + req->ns->blksize_shift; s/9/SECTOR_SHIFT > + id_zns->lbafe[0].zsze = cpu_to_le64(zsze); > + id_zns->mor = cpu_to_le32(bdev_max_open_zones(req->ns->bdev)); > + id_zns->mar = cpu_to_le32(bdev_max_active_zones(req->ns->bdev)); > + > +done: > + status = nvmet_copy_to_sgl(req, 0, id_zns, sizeof(*id_zns)); > + kfree(id_zns); > +out: > + nvmet_req_complete(req, status); > +} > + > +struct nvmet_report_zone_data { > + struct nvmet_ns *ns; > + struct nvme_zone_report *rz; > +}; > + > +static int nvmet_bdev_report_zone_cb(struct blk_zone *z, unsigned i, void *d) > +{ > + struct nvmet_report_zone_data *report_zone_data = d; > + struct nvme_zone_descriptor *entries = report_zone_data->rz->entries; > + struct nvmet_ns *ns = report_zone_data->ns; > + > + entries[i].zcap = nvmet_sect_to_lba(ns, z->capacity); > + entries[i].zslba = nvmet_sect_to_lba(ns, z->start); > + entries[i].wp = nvmet_sect_to_lba(ns, z->wp); > + entries[i].za = z->reset ? 1 << 2 : 0; > + entries[i].zt = z->type; > + entries[i].zs = z->cond << 4; > + > + return 0; > +} > + > +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req) > +{ > + sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->zmr.slba); > + u32 bufsize = (le32_to_cpu(req->cmd->zmr.numd) + 1) << 2; > + struct nvmet_report_zone_data data = { .ns = req->ns }; > + unsigned int nr_zones; > + int reported_zones; > + u16 status; > + > + status = nvmet_bdev_zns_checks(req); > + if (status) > + goto out; > + > + data.rz = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY | __GFP_ZERO); > + if (!data.rz) { > + status = NVME_SC_INTERNAL; > + goto out; > + } > + > + nr_zones = (bufsize - sizeof(struct nvme_zone_report)) / > + sizeof(struct nvme_zone_descriptor); > + if (!nr_zones) { > + status = NVME_SC_INVALID_FIELD | NVME_SC_DNR; > + goto out_free_report_zones; > + } > + > + reported_zones = blkdev_report_zones(req->ns->bdev, sect, nr_zones, > + nvmet_bdev_report_zone_cb, &data); > + if (reported_zones < 0) { > + status = NVME_SC_INTERNAL; > + goto out_free_report_zones; > + } There is a problem here: the code as is ignores the request reporting option field which can lead to an invalid zone report being returned. I think you need to modify nvmet_bdev_report_zone_cb() to look at the reporting option field passed by the initiator and filter the zone report since blkdev_report_zones() does not handle that argument. > + > + data.rz->nr_zones = cpu_to_le64(reported_zones); > + > + status = nvmet_copy_to_sgl(req, 0, data.rz, bufsize); > + > +out_free_report_zones: > + kvfree(data.rz); > +out: > + nvmet_req_complete(req, status); > +} > + > +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req) > +{ > + sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->zms.slba); > + sector_t nr_sect = bdev_zone_sectors(req->ns->bdev); > + u16 status = NVME_SC_SUCCESS; > + u8 zsa = req->cmd->zms.zsa; > + enum req_opf op; > + int ret; > + const unsigned int zsa_to_op[] = { > + [NVME_ZONE_OPEN] = REQ_OP_ZONE_OPEN, > + [NVME_ZONE_CLOSE] = REQ_OP_ZONE_CLOSE, > + [NVME_ZONE_FINISH] = REQ_OP_ZONE_FINISH, > + [NVME_ZONE_RESET] = REQ_OP_ZONE_RESET, > + }; > + > + if (zsa > ARRAY_SIZE(zsa_to_op) || !zsa_to_op[zsa]) { What is the point of the "!zsa_to_op[zsa]" here ? All the REQ_OP_ZONE_XXX are non 0, always... > + status = NVME_SC_INVALID_FIELD; > + goto out; > + } > + > + op = zsa_to_op[zsa]; > + > + if (req->cmd->zms.select_all) > + nr_sect = get_capacity(req->ns->bdev->bd_disk); If select_all is set, sect must be ignored, so you need to have something like this: if (req->cmd->zms.select_all) { sect = 0; nr_sect = get_capacity(req->ns->bdev->bd_disk); } else { sect = nvmet_lba_to_sect(req->ns, req->cmd->zms.slba); nr_sect = bdev_zone_sectors(req->ns->bdev); } Easier to read. Also may be rename nr_sect to nr_sects (plural). > + > + ret = blkdev_zone_mgmt(req->ns->bdev, op, sect, nr_sect, GFP_KERNEL); > + if (ret) > + status = NVME_SC_INTERNAL; > +out: > + nvmet_req_complete(req, status); > +} > + > +void nvmet_bdev_execute_zone_append(struct nvmet_req *req) > +{ > + sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->rw.slba); > + u16 status = NVME_SC_SUCCESS; > + unsigned int total_len = 0; > + struct scatterlist *sg; > + int ret = 0, sg_cnt; > + struct bio *bio; > + > + if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req))) > + return; No nvmet_req_complete() call ? Is that done in nvmet_check_transfer_len() ? > + > + if (!req->sg_cnt) { > + nvmet_req_complete(req, 0); > + return; > + } > + > + if (req->transfer_len <= NVMET_MAX_INLINE_DATA_LEN) { > + bio = &req->b.inline_bio; > + bio_init(bio, req->inline_bvec, ARRAY_SIZE(req->inline_bvec)); > + } else { > + bio = bio_alloc(GFP_KERNEL, req->sg_cnt); > + } > + > + bio_set_dev(bio, req->ns->bdev); > + bio->bi_iter.bi_sector = sect; > + bio->bi_opf = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE; > + if (req->cmd->rw.control & cpu_to_le16(NVME_RW_FUA)) > + bio->bi_opf |= REQ_FUA; > + > + for_each_sg(req->sg, sg, req->sg_cnt, sg_cnt) { > + struct page *p = sg_page(sg); > + unsigned int l = sg->length; > + unsigned int o = sg->offset; > + > + ret = bio_add_zone_append_page(bio, p, l, o); > + if (ret != sg->length) { > + status = NVME_SC_INTERNAL; > + goto out_bio_put; > + } > + > + total_len += sg->length; > + } > + > + if (total_len != nvmet_rw_data_len(req)) { > + status = NVME_SC_INTERNAL | NVME_SC_DNR; > + goto out_bio_put; > + } > + > + ret = submit_bio_wait(bio); submit_bio_wait() ? Why blocking here ? That would be bad for performance. Is it mandatory to block here ? The result handling could be done in the bio_end callback no ? > + req->cqe->result.u64 = nvmet_sect_to_lba(req->ns, > + bio->bi_iter.bi_sector); > + > +out_bio_put: > + if (bio != &req->b.inline_bio) > + bio_put(bio); > + nvmet_req_complete(req, ret < 0 ? NVME_SC_INTERNAL : status); > +} > -- Damien Le Moal Western Digital Research _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH V12 2/3] nvmet: add ZBD over ZNS backend support 2021-03-12 1:15 ` Damien Le Moal @ 2021-03-12 6:29 ` Chaitanya Kulkarni 2021-03-12 7:25 ` Damien Le Moal 0 siblings, 1 reply; 12+ messages in thread From: Chaitanya Kulkarni @ 2021-03-12 6:29 UTC (permalink / raw) To: Damien Le Moal, linux-nvme; +Cc: hch, kbusch, sagi >> - >> -void nvmet_bdev_ns_disable(struct nvmet_ns *ns) >> -{ >> - if (ns->bdev) { >> - blkdev_put(ns->bdev, FMODE_WRITE | FMODE_READ); >> - ns->bdev = NULL; >> + /* bdev_is_zoned() is stubbed out of CONFIG_BLK_DEV_ZONED */ > I do not really understand this comment and I do not think it is useful. > bdev_is_zoned() is always defined, regardless of CONFIG_BLK_DEV_ZONED. If > CONFIG_BLK_DEV_ZONED is not set, you will always get false. Okay, will remove the comment. >> + >> static inline struct nvme_ctrl * >> nvmet_req_passthru_ctrl(struct nvmet_req *req) >> { >> diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c >> new file mode 100644 >> index 000000000000..e12629b02320 >> --- /dev/null >> +++ b/drivers/nvme/target/zns.c >> @@ -0,0 +1,332 @@ >> +// SPDX-License-Identifier: GPL-2.0 >> +/* >> + * NVMe ZNS-ZBD command implementation. > s/ZBD/zoned block device > > "ZBD" is not necessarilly an obvious acronym to everybody. > >> + * Copyright (c) 2020-2021 HGST, a Western Digital Company. > This should be: > > * Copyright (C) 2021 Western Digital Corporation or its affiliates. Will fix the header. >> +static inline u8 nvmet_zasl(unsigned int zone_append_sects) >> +{ >> + /* >> + * Zone Append Size Limit is the value experessed in the units >> + * of minimum memory page size (i.e. 12) and is reported power of 2. >> + */ >> + return ilog2((zone_append_sects << 9) >> NVMET_MPSMIN_SHIFT); > s/9/SECTOR_SHIFT > > And you could rewrite this as: > > return ilog2(zone_append_sects >> (NVMET_MPSMIN_SHIFT - SECTOR_SHIFT)); SECTOR_SHIFT patches are nacked, reason being it doesn't make code clear, how about following ? return ilog2(zone_append_sects >> (NVMET_MPSMIN_SHIFT - 9)); > >> +bool nvmet_bdev_zns_enable(struct nvmet_ns *ns) >> +{ >> + if (nvmet_bdev_has_conv_zones(ns->bdev)) >> + return false; >> + >> + ns->blksize_shift = blksize_bits(bdev_logical_block_size(ns->bdev)); >> + >> + if (!nvmet_zns_update_zasl(ns)) >> + return false; >> + >> + return !(get_capacity(ns->bdev->bd_disk) & >> + (bdev_zone_sectors(ns->bdev) - 1)); > It may be good to add a comment above this one as it is not necessarilly > obvious. Something like: > > /* > * Generic zoned block devices may have a smaller last zone which is > * not supported by ZNS. Excludes zoned drives that have such smaller > * last zone. > */ Will add above comment. > >> +} >> + >> +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req) >> +{ >> + u8 zasl = req->sq->ctrl->subsys->zasl; >> + struct nvmet_ctrl *ctrl = req->sq->ctrl; >> + struct nvme_id_ctrl_zns *id; >> + u16 status; >> + >> + if (req->cmd->identify.csi != NVME_CSI_ZNS) { >> + req->error_loc = offsetof(struct nvme_common_command, opcode); >> + status = NVME_SC_INVALID_OPCODE | NVME_SC_DNR; >> + goto out; >> + } >> + >> + id = kzalloc(sizeof(*id), GFP_KERNEL); >> + if (!id) { >> + status = NVME_SC_INTERNAL; >> + goto out; >> + } >> + >> + if (ctrl->ops->get_mdts) >> + id->zasl = min_t(u8, ctrl->ops->get_mdts(ctrl), zasl); >> + else >> + id->zasl = zasl; >> + >> + status = nvmet_copy_to_sgl(req, 0, id, sizeof(*id)); >> + >> + kfree(id); >> +out: >> + nvmet_req_complete(req, status); >> +} >> + >> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req) >> +{ >> + struct nvme_id_ns_zns *id_zns; >> + u64 zsze; >> + u16 status; >> + >> + if (req->cmd->identify.csi != NVME_CSI_ZNS) { >> + req->error_loc = offsetof(struct nvme_common_command, opcode); >> + status = NVME_SC_INVALID_OPCODE | NVME_SC_DNR; >> + goto out; >> + } >> + >> + if (le32_to_cpu(req->cmd->identify.nsid) == NVME_NSID_ALL) { >> + req->error_loc = offsetof(struct nvme_identify, nsid); >> + status = NVME_SC_INVALID_NS | NVME_SC_DNR; >> + goto out; >> + } >> + >> + id_zns = kzalloc(sizeof(*id_zns), GFP_KERNEL); >> + if (!id_zns) { >> + status = NVME_SC_INTERNAL; >> + goto out; >> + } >> + >> + status = nvmet_req_find_ns(req); >> + if (status) { >> + status = NVME_SC_INTERNAL; >> + goto done; >> + } >> + >> + if (!bdev_is_zoned(req->ns->bdev)) { >> + req->error_loc = offsetof(struct nvme_identify, nsid); >> + status = NVME_SC_INVALID_NS | NVME_SC_DNR; >> + goto done; >> + } >> + >> + nvmet_ns_revalidate(req->ns); >> + zsze = (bdev_zone_sectors(req->ns->bdev) << 9) >> >> + req->ns->blksize_shift; > s/9/SECTOR_SHIFT See my earlier reply to the same comment. >> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req) >> +{ >> + sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->zmr.slba); >> + u32 bufsize = (le32_to_cpu(req->cmd->zmr.numd) + 1) << 2; >> + struct nvmet_report_zone_data data = { .ns = req->ns }; >> + unsigned int nr_zones; >> + int reported_zones; >> + u16 status; >> + >> + status = nvmet_bdev_zns_checks(req); >> + if (status) >> + goto out; >> + >> + data.rz = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY | __GFP_ZERO); >> + if (!data.rz) { >> + status = NVME_SC_INTERNAL; >> + goto out; >> + } >> + >> + nr_zones = (bufsize - sizeof(struct nvme_zone_report)) / >> + sizeof(struct nvme_zone_descriptor); >> + if (!nr_zones) { >> + status = NVME_SC_INVALID_FIELD | NVME_SC_DNR; >> + goto out_free_report_zones; >> + } >> + >> + reported_zones = blkdev_report_zones(req->ns->bdev, sect, nr_zones, >> + nvmet_bdev_report_zone_cb, &data); >> + if (reported_zones < 0) { >> + status = NVME_SC_INTERNAL; >> + goto out_free_report_zones; >> + } > There is a problem here: the code as is ignores the request reporting option > field which can lead to an invalid zone report being returned. I think you need > to modify nvmet_bdev_report_zone_cb() to look at the reporting option field > passed by the initiator and filter the zone report since blkdev_report_zones() > does not handle that argument. The reporting options are set by the host statistically in nvme_ns_report_zones() arefrom:- nvme_ns_report_zones() c.zmr.zra = NVME_ZRA_ZONE_REPORT; c.zmr.zrasf = NVME_ZRASF_ZONE_REPORT_ALL; c.zmr.pr = NVME_REPORT_ZONE_PARTIAL; All the above values are validated in the nvmet_bdev_zns_checks() helper called from nvmet_bdev_execute_zone_mgmt_recv() before we allocate the report zone buffer. 1. c.zmr.zra indicates the action which Reports zone descriptor entries through the Report Zones data structure. We validate this value is been set to NVME_ZRA_ZONE_REPORT in the nvmet_bdev_zns_chceks(). We are calling report zone after checking zone receive action it NVME_ZRA_ZONE_REPORT so not filtering is needed in the nvmet_bdev_report_zone_cb(). 2. c.zmr.zrasf indicates the action specific field which is set to NVME_ZRASF_ZONE_REPORT_ALL. We validate this value is been set to NVME_ZRASF_ZONE_REPORT_ALL in the nvmet_bdev_zns_chceks(). Since host wants all the zones we don't need to filter any zone states in the nvmet_bdev_report_zone_cb(). 3. c.zmr.pr is set to NVME_REPORT_ZONE_PARTIAL which value = 1 i.e value in the Report Zone data structure Number of Zones field indicates the number of fully transferred zone descriptors in the data buffer, which we set from return value of the blkdev_report_zones() :- reported_zones = blkdev_report_zones(req->ns->bdev, sect, nr_zones, nvmet_bdev_report_zone_cb, &data); <snip> data.rz->nr_zones = cpu_to_le64(reported_zones); So no filtering is needed in nvmet_bdev_report_zone_cb() for c.zmr.pr. Can you please explain what filtering is missing in the current code ? Maybe I'm looking into an old spec. >> + >> + data.rz->nr_zones = cpu_to_le64(reported_zones); >> + >> + status = nvmet_copy_to_sgl(req, 0, data.rz, bufsize); >> + >> +out_free_report_zones: >> + kvfree(data.rz); >> +out: >> + nvmet_req_complete(req, status); >> +} >> + >> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req) >> +{ >> + sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->zms.slba); >> + sector_t nr_sect = bdev_zone_sectors(req->ns->bdev); >> + u16 status = NVME_SC_SUCCESS; >> + u8 zsa = req->cmd->zms.zsa; >> + enum req_opf op; >> + int ret; >> + const unsigned int zsa_to_op[] = { >> + [NVME_ZONE_OPEN] = REQ_OP_ZONE_OPEN, >> + [NVME_ZONE_CLOSE] = REQ_OP_ZONE_CLOSE, >> + [NVME_ZONE_FINISH] = REQ_OP_ZONE_FINISH, >> + [NVME_ZONE_RESET] = REQ_OP_ZONE_RESET, >> + }; >> + >> + if (zsa > ARRAY_SIZE(zsa_to_op) || !zsa_to_op[zsa]) { > What is the point of the "!zsa_to_op[zsa]" here ? All the REQ_OP_ZONE_XXX are > non 0, always... Well this is just making sure that we receive the right action since sparse array will return 0 for any other values than listed above having !zsa_to_op[zsa] check we can return an error. See nvme_sysfs_show_state() if you want. >> + status = NVME_SC_INVALID_FIELD; >> + goto out; >> + } >> + >> + op = zsa_to_op[zsa]; >> + >> + if (req->cmd->zms.select_all) >> + nr_sect = get_capacity(req->ns->bdev->bd_disk); > If select_all is set, sect must be ignored, so you need to have something like this: > > if (req->cmd->zms.select_all) { > sect = 0; > nr_sect = get_capacity(req->ns->bdev->bd_disk); > } else { > sect = nvmet_lba_to_sect(req->ns, req->cmd->zms.slba); > nr_sect = bdev_zone_sectors(req->ns->bdev); > } > > Easier to read. Also may be rename nr_sect to nr_sects (plural). Okay, will make this change. >> + >> + ret = blkdev_zone_mgmt(req->ns->bdev, op, sect, nr_sect, GFP_KERNEL); >> + if (ret) >> + status = NVME_SC_INTERNAL; >> +out: >> + nvmet_req_complete(req, status); >> +} >> + >> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req) >> +{ >> + sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->rw.slba); >> + u16 status = NVME_SC_SUCCESS; >> + unsigned int total_len = 0; >> + struct scatterlist *sg; >> + int ret = 0, sg_cnt; >> + struct bio *bio; >> + >> + if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req))) >> + return; > No nvmet_req_complete() call ? Is that done in nvmet_check_transfer_len() ? Yes it does, you had the same comment on earlier version, it can be confusing that is why I proposed a helper for check transfer len and !req->sg_cnt check, but we don't want that helper. >> + >> + if (!req->sg_cnt) { >> + nvmet_req_complete(req, 0); >> + return; >> + } >> + >> + if (req->transfer_len <= NVMET_MAX_INLINE_DATA_LEN) { >> + bio = &req->b.inline_bio; >> + bio_init(bio, req->inline_bvec, ARRAY_SIZE(req->inline_bvec)); >> + } else { >> + bio = bio_alloc(GFP_KERNEL, req->sg_cnt); >> + } >> + >> + bio_set_dev(bio, req->ns->bdev); >> + bio->bi_iter.bi_sector = sect; >> + bio->bi_opf = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE; >> + if (req->cmd->rw.control & cpu_to_le16(NVME_RW_FUA)) >> + bio->bi_opf |= REQ_FUA; >> + >> + for_each_sg(req->sg, sg, req->sg_cnt, sg_cnt) { >> + struct page *p = sg_page(sg); >> + unsigned int l = sg->length; >> + unsigned int o = sg->offset; >> + >> + ret = bio_add_zone_append_page(bio, p, l, o); >> + if (ret != sg->length) { >> + status = NVME_SC_INTERNAL; >> + goto out_bio_put; >> + } >> + >> + total_len += sg->length; >> + } >> + >> + if (total_len != nvmet_rw_data_len(req)) { >> + status = NVME_SC_INTERNAL | NVME_SC_DNR; >> + goto out_bio_put; >> + } >> + >> + ret = submit_bio_wait(bio); > submit_bio_wait() ? Why blocking here ? That would be bad for performance. Is it > mandatory to block here ? The result handling could be done in the bio_end > callback no ? I did initially, but zonefs uses sync I/O, I'm not sure about the btrfs, if it does please let me know I'll make it async. If there is no async caller in the kernel for REQ_OP_ZONE_APPEND shouldwe make this async ? _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH V12 2/3] nvmet: add ZBD over ZNS backend support 2021-03-12 6:29 ` Chaitanya Kulkarni @ 2021-03-12 7:25 ` Damien Le Moal 2021-03-13 2:40 ` Chaitanya Kulkarni 2021-03-15 3:54 ` Chaitanya Kulkarni 0 siblings, 2 replies; 12+ messages in thread From: Damien Le Moal @ 2021-03-12 7:25 UTC (permalink / raw) To: Chaitanya Kulkarni, linux-nvme; +Cc: hch, kbusch, sagi On 2021/03/12 15:29, Chaitanya Kulkarni wrote: [...] >>> +static inline u8 nvmet_zasl(unsigned int zone_append_sects) >>> +{ >>> + /* >>> + * Zone Append Size Limit is the value experessed in the units >>> + * of minimum memory page size (i.e. 12) and is reported power of 2. >>> + */ >>> + return ilog2((zone_append_sects << 9) >> NVMET_MPSMIN_SHIFT); >> s/9/SECTOR_SHIFT >> >> And you could rewrite this as: >> >> return ilog2(zone_append_sects >> (NVMET_MPSMIN_SHIFT - SECTOR_SHIFT)); > > SECTOR_SHIFT patches are nacked, reason being it doesn't make code > clear, how about following ? Weird. I personally find it much nicer than a magic value "9"... But why not... [...] >>> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req) >>> +{ >>> + struct nvme_id_ns_zns *id_zns; >>> + u64 zsze; >>> + u16 status; >>> + >>> + if (req->cmd->identify.csi != NVME_CSI_ZNS) { >>> + req->error_loc = offsetof(struct nvme_common_command, opcode); >>> + status = NVME_SC_INVALID_OPCODE | NVME_SC_DNR; >>> + goto out; >>> + } >>> + >>> + if (le32_to_cpu(req->cmd->identify.nsid) == NVME_NSID_ALL) { >>> + req->error_loc = offsetof(struct nvme_identify, nsid); >>> + status = NVME_SC_INVALID_NS | NVME_SC_DNR; >>> + goto out; >>> + } >>> + >>> + id_zns = kzalloc(sizeof(*id_zns), GFP_KERNEL); >>> + if (!id_zns) { >>> + status = NVME_SC_INTERNAL; >>> + goto out; >>> + } >>> + >>> + status = nvmet_req_find_ns(req); >>> + if (status) { >>> + status = NVME_SC_INTERNAL; >>> + goto done; >>> + } >>> + >>> + if (!bdev_is_zoned(req->ns->bdev)) { >>> + req->error_loc = offsetof(struct nvme_identify, nsid); >>> + status = NVME_SC_INVALID_NS | NVME_SC_DNR; >>> + goto done; >>> + } >>> + >>> + nvmet_ns_revalidate(req->ns); >>> + zsze = (bdev_zone_sectors(req->ns->bdev) << 9) >> >>> + req->ns->blksize_shift; >> s/9/SECTOR_SHIFT > > See my earlier reply to the same comment. I do not agree but if that is the nvme code policy, sure. >>> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req) >>> +{ >>> + sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->zmr.slba); >>> + u32 bufsize = (le32_to_cpu(req->cmd->zmr.numd) + 1) << 2; >>> + struct nvmet_report_zone_data data = { .ns = req->ns }; >>> + unsigned int nr_zones; >>> + int reported_zones; >>> + u16 status; >>> + >>> + status = nvmet_bdev_zns_checks(req); >>> + if (status) >>> + goto out; >>> + >>> + data.rz = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY | __GFP_ZERO); >>> + if (!data.rz) { >>> + status = NVME_SC_INTERNAL; >>> + goto out; >>> + } >>> + >>> + nr_zones = (bufsize - sizeof(struct nvme_zone_report)) / >>> + sizeof(struct nvme_zone_descriptor); >>> + if (!nr_zones) { >>> + status = NVME_SC_INVALID_FIELD | NVME_SC_DNR; >>> + goto out_free_report_zones; >>> + } >>> + >>> + reported_zones = blkdev_report_zones(req->ns->bdev, sect, nr_zones, >>> + nvmet_bdev_report_zone_cb, &data); >>> + if (reported_zones < 0) { >>> + status = NVME_SC_INTERNAL; >>> + goto out_free_report_zones; >>> + } >> There is a problem here: the code as is ignores the request reporting option >> field which can lead to an invalid zone report being returned. I think you need >> to modify nvmet_bdev_report_zone_cb() to look at the reporting option field >> passed by the initiator and filter the zone report since blkdev_report_zones() >> does not handle that argument. > > The reporting options are set by the host statistically in > nvme_ns_report_zones() > arefrom:- nvme_ns_report_zones() > c.zmr.zra = NVME_ZRA_ZONE_REPORT; > c.zmr.zrasf = NVME_ZRASF_ZONE_REPORT_ALL; > c.zmr.pr = NVME_REPORT_ZONE_PARTIAL; > > All the above values are validated in the nvmet_bdev_zns_checks() helper > called from nvmet_bdev_execute_zone_mgmt_recv() before we allocate the > report zone buffer. > > 1. c.zmr.zra indicates the action which Reports zone descriptor entries > through the Report Zones data structure. > > We validate this value is been set to NVME_ZRA_ZONE_REPORT in the > nvmet_bdev_zns_chceks(). We are calling report zone after checking > zone receive action it NVME_ZRA_ZONE_REPORT so not filtering is needed > in the nvmet_bdev_report_zone_cb(). > > 2. c.zmr.zrasf indicates the action specific field which is set to > NVME_ZRASF_ZONE_REPORT_ALL. > > We validate this value is been set to NVME_ZRASF_ZONE_REPORT_ALL in the > nvmet_bdev_zns_chceks(). Since host wants all the zones we don't need to > filter any zone states in the nvmet_bdev_report_zone_cb(). > > 3. c.zmr.pr is set to NVME_REPORT_ZONE_PARTIAL which value = 1 i.e value in > the Report Zone data structure Number of Zones field indicates the > number of > fully transferred zone descriptors in the data buffer, which we set from > return value of the blkdev_report_zones() :- > > > reported_zones = blkdev_report_zones(req->ns->bdev, sect, nr_zones, > nvmet_bdev_report_zone_cb, &data); > <snip> data.rz->nr_zones = cpu_to_le64(reported_zones); > > So no filtering is needed in nvmet_bdev_report_zone_cb() for c.zmr.pr. > > Can you please explain what filtering is missing in the current code ? > > Maybe I'm looking into an old spec. report zones command has the reporting options (ro) field (bits 15:08 of dword 13) where the user can specify the following values: Value Description 0h List all zones. 1h List the zones in the ZSE:Empty state. 2h List the zones in the ZSIO:Implicitly Opened state. 3h List the zones in the ZSEO:Explicitly Opened state. 4h List the zones in the ZSC:Closed state. 5h List the zones in the ZSF:Full state. 6h List the zones in the ZSRO:Read Only state. 7h List the zones in the ZSO:Offline state. to filter the zone report based on zone condition. blkdev_report_zones() will always to a "list all zones", that is, ro == 0h. But on the initiator side, if the client issue a report zones command through an ioctl (passthrough/direct access not suing the block layer BLKREPORTZONES ioctl), it may specify a different value for the ro field. Processing that command using blkdev_report_zones() like you are doing here without any additional filtering will give an incorrect report. Filtering based on the user specified ro field needs to be added in nvmet_bdev_report_zone_cb(). The current code here is fine of the initiator/client side uses the block layer and execute all report zones through blkdev_report_zones(). But things will break if the client starts doing passthrough commands using nvme ioctl. No ? >>> + >>> + data.rz->nr_zones = cpu_to_le64(reported_zones); >>> + >>> + status = nvmet_copy_to_sgl(req, 0, data.rz, bufsize); >>> + >>> +out_free_report_zones: >>> + kvfree(data.rz); >>> +out: >>> + nvmet_req_complete(req, status); >>> +} >>> + >>> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req) >>> +{ >>> + sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->zms.slba); >>> + sector_t nr_sect = bdev_zone_sectors(req->ns->bdev); >>> + u16 status = NVME_SC_SUCCESS; >>> + u8 zsa = req->cmd->zms.zsa; >>> + enum req_opf op; >>> + int ret; >>> + const unsigned int zsa_to_op[] = { >>> + [NVME_ZONE_OPEN] = REQ_OP_ZONE_OPEN, >>> + [NVME_ZONE_CLOSE] = REQ_OP_ZONE_CLOSE, >>> + [NVME_ZONE_FINISH] = REQ_OP_ZONE_FINISH, >>> + [NVME_ZONE_RESET] = REQ_OP_ZONE_RESET, >>> + }; >>> + >>> + if (zsa > ARRAY_SIZE(zsa_to_op) || !zsa_to_op[zsa]) { >> What is the point of the "!zsa_to_op[zsa]" here ? All the REQ_OP_ZONE_XXX are >> non 0, always... > > Well this is just making sure that we receive the right action since sparse > array will return 0 for any other values than listed above having > !zsa_to_op[zsa] check we can return an error. But zsa is unsigned and you check it against the array size. So it can only be within 0 and array size - 1. That is enough... I really do not see the point of clutering the condition with something that is always true... [...] >>> + >>> + ret = blkdev_zone_mgmt(req->ns->bdev, op, sect, nr_sect, GFP_KERNEL); >>> + if (ret) >>> + status = NVME_SC_INTERNAL; >>> +out: >>> + nvmet_req_complete(req, status); >>> +} >>> + >>> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req) >>> +{ >>> + sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->rw.slba); >>> + u16 status = NVME_SC_SUCCESS; >>> + unsigned int total_len = 0; >>> + struct scatterlist *sg; >>> + int ret = 0, sg_cnt; >>> + struct bio *bio; >>> + >>> + if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req))) >>> + return; >> No nvmet_req_complete() call ? Is that done in nvmet_check_transfer_len() ? > > Yes it does, you had the same comment on earlier version, it can be > confusing > that is why I proposed a helper for check transfer len and !req->sg_cnt > check, > but we don't want that helper. Just add a comment saying that nvmet_check_transfer_len() calls nvmet_req_complete(). That will help people like me with a short memory span :) >>> + >>> + if (!req->sg_cnt) { >>> + nvmet_req_complete(req, 0); >>> + return; >>> + } >>> + >>> + if (req->transfer_len <= NVMET_MAX_INLINE_DATA_LEN) { >>> + bio = &req->b.inline_bio; >>> + bio_init(bio, req->inline_bvec, ARRAY_SIZE(req->inline_bvec)); >>> + } else { >>> + bio = bio_alloc(GFP_KERNEL, req->sg_cnt); >>> + } >>> + >>> + bio_set_dev(bio, req->ns->bdev); >>> + bio->bi_iter.bi_sector = sect; >>> + bio->bi_opf = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE; >>> + if (req->cmd->rw.control & cpu_to_le16(NVME_RW_FUA)) >>> + bio->bi_opf |= REQ_FUA; >>> + >>> + for_each_sg(req->sg, sg, req->sg_cnt, sg_cnt) { >>> + struct page *p = sg_page(sg); >>> + unsigned int l = sg->length; >>> + unsigned int o = sg->offset; >>> + >>> + ret = bio_add_zone_append_page(bio, p, l, o); >>> + if (ret != sg->length) { >>> + status = NVME_SC_INTERNAL; >>> + goto out_bio_put; >>> + } >>> + >>> + total_len += sg->length; >>> + } >>> + >>> + if (total_len != nvmet_rw_data_len(req)) { >>> + status = NVME_SC_INTERNAL | NVME_SC_DNR; >>> + goto out_bio_put; >>> + } >>> + >>> + ret = submit_bio_wait(bio); >> submit_bio_wait() ? Why blocking here ? That would be bad for performance. Is it >> mandatory to block here ? The result handling could be done in the bio_end >> callback no ? > > I did initially, but zonefs uses sync I/O, I'm not sure about the btrfs, > if it does > please let me know I'll make it async. > > If there is no async caller in the kernel for REQ_OP_ZONE_APPEND > shouldwe make this async ? This should not depend on what the user does, at all. Yes, for now zonefs only uses zone append for sync write() calls. But I intend to have zone append used for async writes too. And in btrfs, append writes are used for all data IOs, sync or async. That is on the initiator side anyway. The target side should not assume any special behavior of the initiator. So if there is no technical reasons to prevent async append writes execution, I would rather have all of them processed with async BIOs execution, similar to regular write BIOs. -- Damien Le Moal Western Digital Research _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH V12 2/3] nvmet: add ZBD over ZNS backend support 2021-03-12 7:25 ` Damien Le Moal @ 2021-03-13 2:40 ` Chaitanya Kulkarni 2021-03-15 3:54 ` Chaitanya Kulkarni 1 sibling, 0 replies; 12+ messages in thread From: Chaitanya Kulkarni @ 2021-03-13 2:40 UTC (permalink / raw) To: Damien Le Moal, linux-nvme; +Cc: hch, kbusch, sagi On 3/11/21 23:26, Damien Le Moal wrote: > On 2021/03/12 15:29, Chaitanya Kulkarni wrote: > [...] > +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req) >>>> +{ >>>> + sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->zmr.slba); >>>> + u32 bufsize = (le32_to_cpu(req->cmd->zmr.numd) + 1) << 2; >>>> + struct nvmet_report_zone_data data = { .ns = req->ns }; >>>> + unsigned int nr_zones; >>>> + int reported_zones; >>>> + u16 status; >>>> + >>>> + status = nvmet_bdev_zns_checks(req); >>>> + if (status) >>>> + goto out; >>>> + >>>> + data.rz = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY | __GFP_ZERO); >>>> + if (!data.rz) { >>>> + status = NVME_SC_INTERNAL; >>>> + goto out; >>>> + } >>>> + >>>> + nr_zones = (bufsize - sizeof(struct nvme_zone_report)) / >>>> + sizeof(struct nvme_zone_descriptor); >>>> + if (!nr_zones) { >>>> + status = NVME_SC_INVALID_FIELD | NVME_SC_DNR; >>>> + goto out_free_report_zones; >>>> + } >>>> + >>>> + reported_zones = blkdev_report_zones(req->ns->bdev, sect, nr_zones, >>>> + nvmet_bdev_report_zone_cb, &data); >>>> + if (reported_zones < 0) { >>>> + status = NVME_SC_INTERNAL; >>>> + goto out_free_report_zones; >>>> + } >>> There is a problem here: the code as is ignores the request reporting option >>> field which can lead to an invalid zone report being returned. I think you need >>> to modify nvmet_bdev_report_zone_cb() to look at the reporting option field >>> passed by the initiator and filter the zone report since blkdev_report_zones() >>> does not handle that argument. >> The reporting options are set by the host statistically in >> nvme_ns_report_zones() >> arefrom:- nvme_ns_report_zones() >> c.zmr.zra = NVME_ZRA_ZONE_REPORT; >> c.zmr.zrasf = NVME_ZRASF_ZONE_REPORT_ALL; >> c.zmr.pr = NVME_REPORT_ZONE_PARTIAL; >> >> All the above values are validated in the nvmet_bdev_zns_checks() helper >> called from nvmet_bdev_execute_zone_mgmt_recv() before we allocate the >> report zone buffer. >> >> 1. c.zmr.zra indicates the action which Reports zone descriptor entries >> through the Report Zones data structure. >> >> We validate this value is been set to NVME_ZRA_ZONE_REPORT in the >> nvmet_bdev_zns_chceks(). We are calling report zone after checking >> zone receive action it NVME_ZRA_ZONE_REPORT so not filtering is needed >> in the nvmet_bdev_report_zone_cb(). >> >> 2. c.zmr.zrasf indicates the action specific field which is set to >> NVME_ZRASF_ZONE_REPORT_ALL. >> >> We validate this value is been set to NVME_ZRASF_ZONE_REPORT_ALL in the >> nvmet_bdev_zns_chceks(). Since host wants all the zones we don't need to >> filter any zone states in the nvmet_bdev_report_zone_cb(). >> >> 3. c.zmr.pr is set to NVME_REPORT_ZONE_PARTIAL which value = 1 i.e value in >> the Report Zone data structure Number of Zones field indicates the >> number of >> fully transferred zone descriptors in the data buffer, which we set from >> return value of the blkdev_report_zones() :- >> >> >> reported_zones = blkdev_report_zones(req->ns->bdev, sect, nr_zones, >> nvmet_bdev_report_zone_cb, &data); >> <snip> data.rz->nr_zones = cpu_to_le64(reported_zones); >> >> So no filtering is needed in nvmet_bdev_report_zone_cb() for c.zmr.pr. >> >> Can you please explain what filtering is missing in the current code ? >> >> Maybe I'm looking into an old spec. > report zones command has the reporting options (ro) field (bits 15:08 of dword > 13) where the user can specify the following values: > > Value Description > 0h List all zones. > 1h List the zones in the ZSE:Empty state. > 2h List the zones in the ZSIO:Implicitly Opened state. > 3h List the zones in the ZSEO:Explicitly Opened state. > 4h List the zones in the ZSC:Closed state. > 5h List the zones in the ZSF:Full state. > 6h List the zones in the ZSRO:Read Only state. > 7h List the zones in the ZSO:Offline state. > > to filter the zone report based on zone condition. blkdev_report_zones() will > always to a "list all zones", that is, ro == 0h. > > But on the initiator side, if the client issue a report zones command through an > ioctl (passthrough/direct access not suing the block layer BLKREPORTZONES > ioctl), it may specify a different value for the ro field. Processing that > command using blkdev_report_zones() like you are doing here without any > additional filtering will give an incorrect report. Filtering based on the user > specified ro field needs to be added in nvmet_bdev_report_zone_cb(). > > The current code here is fine of the initiator/client side uses the block layer > and execute all report zones through blkdev_report_zones(). But things will > break if the client starts doing passthrough commands using nvme ioctl. No ? Okay, so I'm using the right spec. With your explanation it needs a filtering, will add it to the next version. >>>> + >>>> + data.rz->nr_zones = cpu_to_le64(reported_zones); >>>> + >>>> + status = nvmet_copy_to_sgl(req, 0, data.rz, bufsize); >>>> + >>>> +out_free_report_zones: >>>> + kvfree(data.rz); >>>> +out: >>>> + nvmet_req_complete(req, status); >>>> +} >>>> + >>>> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req) >>>> +{ >>>> + sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->zms.slba); >>>> + sector_t nr_sect = bdev_zone_sectors(req->ns->bdev); >>>> + u16 status = NVME_SC_SUCCESS; >>>> + u8 zsa = req->cmd->zms.zsa; >>>> + enum req_opf op; >>>> + int ret; >>>> + const unsigned int zsa_to_op[] = { >>>> + [NVME_ZONE_OPEN] = REQ_OP_ZONE_OPEN, >>>> + [NVME_ZONE_CLOSE] = REQ_OP_ZONE_CLOSE, >>>> + [NVME_ZONE_FINISH] = REQ_OP_ZONE_FINISH, >>>> + [NVME_ZONE_RESET] = REQ_OP_ZONE_RESET, >>>> + }; >>>> + >>>> + if (zsa > ARRAY_SIZE(zsa_to_op) || !zsa_to_op[zsa]) { >>> What is the point of the "!zsa_to_op[zsa]I see, I'll add the async I/O interface." here ? All the REQ_OP_ZONE_XXX are >>> non 0, always... >> Well this is just making sure that we receive the right action since sparse >> array will return 0 for any other values than listed above having >> !zsa_to_op[zsa] check we can return an error. > But zsa is unsigned and you check it against the array size. So it can only be > within 0 and array size - 1. That is enough... I really do not see the point of > clutering the condition with something that is always true... Okay. > > [...] >>>> + >>>> + ret = blkdev_zone_mgmt(req->ns->bdev, op, sect, nr_sect, GFP_KERNEL); >>>> + if (ret) >>>> + status = NVME_SC_INTERNAL; >>>> +out: >>>> + nvmet_req_complete(req, status); >>>> +} >>>> + >>>> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req) >>>> +{ >>>> + sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->rw.slba); >>>> + u16 status = NVME_SC_SUCCESS; >>>> + unsigned int total_len = 0; >>>> + struct scatterlist *sg; >>>> + int ret = 0, sg_cnt; >>>> + struct bio *bio; >>>> + >>>> + if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req))) >>>> + return; >>> No nvmet_req_complete() call ? Is that done in nvmet_check_transfer_len() ? >> Yes it does, you had the same comment on earlier version, it can be >> confusing >> that is why I proposed a helper for check transfer len and !req->sg_cnt >> check, >> but we don't want that helper. > Just add a comment saying that nvmet_check_transfer_len() calls > nvmet_req_complete(). That will help people like me with a short memory span :) Okay. >>>> + >>>> + if (!req->sg_cnt) { >>>> + nvmet_req_complete(req, 0); >>>> + return; >>>> + } >>>> + >>>> + if (req->transfer_len <= NVMET_MAX_INLINE_DATA_LEN) { >>>> + bio = &req->b.inline_bio; >>>> + bio_init(bio, req->inline_bvec, ARRAY_SIZE(req->inline_bvec)); >>>> + } else { >>>> + bio = bio_alloc(GFP_KERNEL, req->sg_cnt); >>>> + } >>>> + >>>> + bio_set_dev(bio, req->ns->bdev); >>>> + bio->bi_iter.bi_sector = sect; >>>> + bio->bi_opf = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE; >>>> + if (req->cmd->rw.control & cpu_to_le16(NVME_RW_FUA)) >>>> + bio->bi_opf |= REQ_FUA; >>>> + >>>> + for_each_sg(req->sg, sg, req->sg_cnt, sg_cnt) { >>>> + struct page *p = sg_page(sg); >>>> + unsigned int l = sg->length; >>>> + unsigned int o = sg->offset; >>>> + >>>> + ret = bio_add_zone_append_page(bio, p, l, o); >>>> + if (ret != sg->length) { >>>> + status = NVME_SC_INTERNAL; >>>> + goto out_bio_put; >>>> + } >>>> + >>>> + total_len += sg->length; >>>> + } >>>> + >>>> + if (total_len != nvmet_rw_data_len(req)) { >>>> + status = NVME_SC_INTERNAL | NVME_SC_DNR; >>>> + goto out_bio_put; >>>> + } >>>> + >>>> + ret = submit_bio_wait(bio); >>> submit_bio_wait() ? Why blocking here ? That would be bad for performance. Is it >>> mandatory to block here ? The result handling could be done in the bio_end >>> callback no ? >> I did initially, but zonefs uses sync I/O, I'm not sure about the btrfs, >> if it does >> please let me know I'll make it async. >> >> If there is no async caller in the kernel for REQ_OP_ZONE_APPEND >> shouldwe make this async ? > This should not depend on what the user does, at all. > Yes, for now zonefs only uses zone append for sync write() calls. But I intend > to have zone append used for async writes too. And in btrfs, append writes are > used for all data IOs, sync or async. That is on the initiator side anyway. The > target side should not assume any special behavior of the initiator. So if there > is no technical reasons to prevent async append writes execution, I would rather > have all of them processed with async BIOs execution, similar to regular write BIOs. Thanks for the explanation, I was not aware about the async interface. I'll make it async. _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH V12 2/3] nvmet: add ZBD over ZNS backend support 2021-03-12 7:25 ` Damien Le Moal 2021-03-13 2:40 ` Chaitanya Kulkarni @ 2021-03-15 3:54 ` Chaitanya Kulkarni 2021-03-15 4:09 ` Damien Le Moal 1 sibling, 1 reply; 12+ messages in thread From: Chaitanya Kulkarni @ 2021-03-15 3:54 UTC (permalink / raw) To: Damien Le Moal, hch; +Cc: linux-nvme, kbusch, sagi Christoph/Damien, On 3/11/21 23:26, Damien Le Moal wrote: >>>> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req) >>>> +{ >>>> + sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->zmr.slba); >>>> + u32 bufsize = (le32_to_cpu(req->cmd->zmr.numd) + 1) << 2; >>>> + struct nvmet_report_zone_data data = { .ns = req->ns }; >>>> + unsigned int nr_zones; >>>> + int reported_zones; >>>> + u16 status; >>>> + >>>> + status = nvmet_bdev_zns_checks(req); >>>> + if (status) >>>> + goto out; >>>> + >>>> + data.rz = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY | __GFP_ZERO); >>>> + if (!data.rz) { >>>> + status = NVME_SC_INTERNAL; >>>> + goto out; >>>> + } >>>> + >>>> + nr_zones = (bufsize - sizeof(struct nvme_zone_report)) / >>>> + sizeof(struct nvme_zone_descriptor); >>>> + if (!nr_zones) { >>>> + status = NVME_SC_INVALID_FIELD | NVME_SC_DNR; >>>> + goto out_free_report_zones; >>>> + } >>>> + >>>> + reported_zones = blkdev_report_zones(req->ns->bdev, sect, nr_zones, >>>> + nvmet_bdev_report_zone_cb, &data); >>>> + if (reported_zones < 0) { >>>> + status = NVME_SC_INTERNAL; >>>> + goto out_free_report_zones; >>>> + } >>> There is a problem here: the code as is ignores the request reporting option >>> field which can lead to an invalid zone report being returned. I think you need >>> to modify nvmet_bdev_report_zone_cb() to look at the reporting option field >>> passed by the initiator and filter the zone report since blkdev_report_zones() >>> does not handle that argument. >> The reporting options are set by the host statistically in >> nvme_ns_report_zones() >> arefrom:- nvme_ns_report_zones() >> c.zmr.zra = NVME_ZRA_ZONE_REPORT; >> c.zmr.zrasf = NVME_ZRASF_ZONE_REPORT_ALL; >> c.zmr.pr = NVME_REPORT_ZONE_PARTIAL; >> >> All the above values are validated in the nvmet_bdev_zns_checks() helper >> called from nvmet_bdev_execute_zone_mgmt_recv() before we allocate the >> report zone buffer. >> >> 1. c.zmr.zra indicates the action which Reports zone descriptor entries >> through the Report Zones data structure. >> >> We validate this value is been set to NVME_ZRA_ZONE_REPORT in the >> nvmet_bdev_zns_chceks(). We are calling report zone after checking >> zone receive action it NVME_ZRA_ZONE_REPORT so not filtering is needed >> in the nvmet_bdev_report_zone_cb(). >> >> 2. c.zmr.zrasf indicates the action specific field which is set to >> NVME_ZRASF_ZONE_REPORT_ALL. >> >> We validate this value is been set to NVME_ZRASF_ZONE_REPORT_ALL in the >> nvmet_bdev_zns_chceks(). Since host wants all the zones we don't need to >> filter any zone states in the nvmet_bdev_report_zone_cb(). >> >> 3. c.zmr.pr is set to NVME_REPORT_ZONE_PARTIAL which value = 1 i.e value in >> the Report Zone data structure Number of Zones field indicates the >> number of >> fully transferred zone descriptors in the data buffer, which we set from >> return value of the blkdev_report_zones() :- >> >> >> reported_zones = blkdev_report_zones(req->ns->bdev, sect, nr_zones, >> nvmet_bdev_report_zone_cb, &data); >> <snip> data.rz->nr_zones = cpu_to_le64(reported_zones); >> >> So no filtering is needed in nvmet_bdev_report_zone_cb() for c.zmr.pr. >> >> Can you please explain what filtering is missing in the current code ? >> >> Maybe I'm looking into an old spec. > report zones command has the reporting options (ro) field (bits 15:08 of dword > 13) where the user can specify the following values: > > Value Description > 0h List all zones. > 1h List the zones in the ZSE:Empty state. > 2h List the zones in the ZSIO:Implicitly Opened state. > 3h List the zones in the ZSEO:Explicitly Opened state. > 4h List the zones in the ZSC:Closed state. > 5h List the zones in the ZSF:Full state. > 6h List the zones in the ZSRO:Read Only state. > 7h List the zones in the ZSO:Offline state. > > to filter the zone report based on zone condition. blkdev_report_zones() will > always to a "list all zones", that is, ro == 0h. > > But on the initiator side, if the client issue a report zones command through an > ioctl (passthrough/direct access not suing the block layer BLKREPORTZONES > ioctl), it may specify a different value for the ro field. Processing that > command using blkdev_report_zones() like you are doing here without any > additional filtering will give an incorrect report. Filtering based on the user > specified ro field needs to be added in nvmet_bdev_report_zone_cb(). > > The current code here is fine of the initiator/client side uses the block layer > and execute all report zones through blkdev_report_zones(). But things will > break if the client starts doing passthrough commands using nvme ioctl. No ? > Regarding the passthru commands support following are the non technical issues I ran into, it will be great to get some feedback before we proceed :- 1. We don't support any passthru commands for the generic NVMeOF target backends (file/bdev) for that we have a dedicated passthru target. ZBD backend is a generic one, so do we really want to add support for user's NVMe passthru commands ? 2. In past we've decided that for NVMeOF target non I/O commands should be handled in the userspace applications to keep the target light. I couldn't come up with justification :(. There are other features that we may have to implement apart from filtering if we want to support NVMe passthru commands e.g. Set Zone Descriptor Extension, Changed Zone List, AEN Zone Descriptor Changed Notices (Log Identifier BFh) etc. (these are just placeholder examples, please do not take it literally). _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH V12 2/3] nvmet: add ZBD over ZNS backend support 2021-03-15 3:54 ` Chaitanya Kulkarni @ 2021-03-15 4:09 ` Damien Le Moal 2021-03-15 4:53 ` Chaitanya Kulkarni 0 siblings, 1 reply; 12+ messages in thread From: Damien Le Moal @ 2021-03-15 4:09 UTC (permalink / raw) To: Chaitanya Kulkarni, hch; +Cc: linux-nvme, kbusch, sagi On 2021/03/15 12:54, Chaitanya Kulkarni wrote: > Christoph/Damien, > > On 3/11/21 23:26, Damien Le Moal wrote: >>>>> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req) >>>>> +{ >>>>> + sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->zmr.slba); >>>>> + u32 bufsize = (le32_to_cpu(req->cmd->zmr.numd) + 1) << 2; >>>>> + struct nvmet_report_zone_data data = { .ns = req->ns }; >>>>> + unsigned int nr_zones; >>>>> + int reported_zones; >>>>> + u16 status; >>>>> + >>>>> + status = nvmet_bdev_zns_checks(req); >>>>> + if (status) >>>>> + goto out; >>>>> + >>>>> + data.rz = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY | __GFP_ZERO); >>>>> + if (!data.rz) { >>>>> + status = NVME_SC_INTERNAL; >>>>> + goto out; >>>>> + } >>>>> + >>>>> + nr_zones = (bufsize - sizeof(struct nvme_zone_report)) / >>>>> + sizeof(struct nvme_zone_descriptor); >>>>> + if (!nr_zones) { >>>>> + status = NVME_SC_INVALID_FIELD | NVME_SC_DNR; >>>>> + goto out_free_report_zones; >>>>> + } >>>>> + >>>>> + reported_zones = blkdev_report_zones(req->ns->bdev, sect, nr_zones, >>>>> + nvmet_bdev_report_zone_cb, &data); >>>>> + if (reported_zones < 0) { >>>>> + status = NVME_SC_INTERNAL; >>>>> + goto out_free_report_zones; >>>>> + } >>>> There is a problem here: the code as is ignores the request reporting option >>>> field which can lead to an invalid zone report being returned. I think you need >>>> to modify nvmet_bdev_report_zone_cb() to look at the reporting option field >>>> passed by the initiator and filter the zone report since blkdev_report_zones() >>>> does not handle that argument. >>> The reporting options are set by the host statistically in >>> nvme_ns_report_zones() >>> arefrom:- nvme_ns_report_zones() >>> c.zmr.zra = NVME_ZRA_ZONE_REPORT; >>> c.zmr.zrasf = NVME_ZRASF_ZONE_REPORT_ALL; >>> c.zmr.pr = NVME_REPORT_ZONE_PARTIAL; >>> >>> All the above values are validated in the nvmet_bdev_zns_checks() helper >>> called from nvmet_bdev_execute_zone_mgmt_recv() before we allocate the >>> report zone buffer. >>> >>> 1. c.zmr.zra indicates the action which Reports zone descriptor entries >>> through the Report Zones data structure. >>> >>> We validate this value is been set to NVME_ZRA_ZONE_REPORT in the >>> nvmet_bdev_zns_chceks(). We are calling report zone after checking >>> zone receive action it NVME_ZRA_ZONE_REPORT so not filtering is needed >>> in the nvmet_bdev_report_zone_cb(). >>> >>> 2. c.zmr.zrasf indicates the action specific field which is set to >>> NVME_ZRASF_ZONE_REPORT_ALL. >>> >>> We validate this value is been set to NVME_ZRASF_ZONE_REPORT_ALL in the >>> nvmet_bdev_zns_chceks(). Since host wants all the zones we don't need to >>> filter any zone states in the nvmet_bdev_report_zone_cb(). >>> >>> 3. c.zmr.pr is set to NVME_REPORT_ZONE_PARTIAL which value = 1 i.e value in >>> the Report Zone data structure Number of Zones field indicates the >>> number of >>> fully transferred zone descriptors in the data buffer, which we set from >>> return value of the blkdev_report_zones() :- >>> >>> >>> reported_zones = blkdev_report_zones(req->ns->bdev, sect, nr_zones, >>> nvmet_bdev_report_zone_cb, &data); >>> <snip> data.rz->nr_zones = cpu_to_le64(reported_zones); >>> >>> So no filtering is needed in nvmet_bdev_report_zone_cb() for c.zmr.pr. >>> >>> Can you please explain what filtering is missing in the current code ? >>> >>> Maybe I'm looking into an old spec. >> report zones command has the reporting options (ro) field (bits 15:08 of dword >> 13) where the user can specify the following values: >> >> Value Description >> 0h List all zones. >> 1h List the zones in the ZSE:Empty state. >> 2h List the zones in the ZSIO:Implicitly Opened state. >> 3h List the zones in the ZSEO:Explicitly Opened state. >> 4h List the zones in the ZSC:Closed state. >> 5h List the zones in the ZSF:Full state. >> 6h List the zones in the ZSRO:Read Only state. >> 7h List the zones in the ZSO:Offline state. >> >> to filter the zone report based on zone condition. blkdev_report_zones() will >> always to a "list all zones", that is, ro == 0h. >> >> But on the initiator side, if the client issue a report zones command through an >> ioctl (passthrough/direct access not suing the block layer BLKREPORTZONES >> ioctl), it may specify a different value for the ro field. Processing that >> command using blkdev_report_zones() like you are doing here without any >> additional filtering will give an incorrect report. Filtering based on the user >> specified ro field needs to be added in nvmet_bdev_report_zone_cb(). >> >> The current code here is fine of the initiator/client side uses the block layer >> and execute all report zones through blkdev_report_zones(). But things will >> break if the client starts doing passthrough commands using nvme ioctl. No ? >> > > Regarding the passthru commands support following are the non technical > issues > I ran into, it will be great to get some feedback before we proceed :- > > 1. We don't support any passthru commands for the generic NVMeOF target > backends > (file/bdev) for that we have a dedicated passthru target. ZBD backend > is a > generic one, so do we really want to add support for user's NVMe > passthru > commands ? If it is not supported then fine, no filtering is needed. But you should add a check on the report zones command received by the target to make sure that the reporting option field is "report all", and fail the command if any other reporting option is set since it cannot be honored. > > 2. In past we've decided that for NVMeOF target non I/O commands should > be handled > in the userspace applications to keep the target light. > > I couldn't come up with justification :(. > > There are other features that we may have to implement apart from filtering > if we want to support NVMe passthru commands e.g. Set Zone Descriptor > Extension, > Changed Zone List, AEN Zone Descriptor Changed Notices (Log Identifier > BFh) etc. > (these are just placeholder examples, please do not take it literally). > -- Damien Le Moal Western Digital Research _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH V12 2/3] nvmet: add ZBD over ZNS backend support 2021-03-15 4:09 ` Damien Le Moal @ 2021-03-15 4:53 ` Chaitanya Kulkarni 0 siblings, 0 replies; 12+ messages in thread From: Chaitanya Kulkarni @ 2021-03-15 4:53 UTC (permalink / raw) To: Damien Le Moal; +Cc: hch, linux-nvme, kbusch, sagi On 3/14/21 21:09, Damien Le Moal wrote: >> Regarding the passthru commands support following are the non technical >> issues >> I ran into, it will be great to get some feedback before we proceed :- >> >> 1. We don't support any passthru commands for the generic NVMeOF target >> backends >> (file/bdev) for that we have a dedicated passthru target. ZBD backend >> is a >> generic one, so do we really want to add support for user's NVMe >> passthru >> commands ? > If it is not supported then fine, no filtering is needed. But you should add a > check on the report zones command received by the target to make sure that the > reporting option field is "report all", and fail the command if any other > reporting option is set since it cannot be honored. > Absolutely will do that, thanks for the quick reply. _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH V12 3/3] nvmet: add nvmet_req_bio put helper for backends 2021-03-11 7:15 [PATCH V12 0/3] nvmet: add ZBD backend support Chaitanya Kulkarni 2021-03-11 7:15 ` [PATCH V12 1/3] nvmet: add NVM Command Set Identifier support Chaitanya Kulkarni 2021-03-11 7:15 ` [PATCH V12 2/3] nvmet: add ZBD over ZNS backend support Chaitanya Kulkarni @ 2021-03-11 7:15 ` Chaitanya Kulkarni 2021-03-12 0:37 ` Damien Le Moal 2 siblings, 1 reply; 12+ messages in thread From: Chaitanya Kulkarni @ 2021-03-11 7:15 UTC (permalink / raw) To: linux-nvme; +Cc: hch, kbusch, sagi, damien.lemoal, Chaitanya Kulkarni With the addition of zns backend now we have three different backends with inline bio optimization. That leads to having duplicate code in for freeing the bio in all three backends: generic bdev, passsthru and generic zns. Add a helper function to avoid duplicate code and update the respective backends. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> --- drivers/nvme/target/io-cmd-bdev.c | 3 +-- drivers/nvme/target/nvmet.h | 6 ++++++ drivers/nvme/target/passthru.c | 3 +-- drivers/nvme/target/zns.c | 3 +-- 4 files changed, 9 insertions(+), 6 deletions(-) diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c index ada0215f5e56..ca39d787d71f 100644 --- a/drivers/nvme/target/io-cmd-bdev.c +++ b/drivers/nvme/target/io-cmd-bdev.c @@ -173,8 +173,7 @@ static void nvmet_bio_done(struct bio *bio) struct nvmet_req *req = bio->bi_private; nvmet_req_complete(req, blk_to_nvme_status(req, bio->bi_status)); - if (bio != &req->b.inline_bio) - bio_put(bio); + nvmet_req_bio_put(req, bio); } #ifdef CONFIG_BLK_DEV_INTEGRITY diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h index f3fccc49de03..2f1bd3ac34a2 100644 --- a/drivers/nvme/target/nvmet.h +++ b/drivers/nvme/target/nvmet.h @@ -654,4 +654,10 @@ static inline sector_t nvmet_lba_to_sect(struct nvmet_ns *ns, __le64 lba) return le64_to_cpu(lba) << (ns->blksize_shift - SECTOR_SHIFT); } +static inline void nvmet_req_bio_put(struct nvmet_req *req, struct bio *bio) +{ + if (bio != &req->b.inline_bio) + bio_put(bio); +} + #endif /* _NVMET_H */ diff --git a/drivers/nvme/target/passthru.c b/drivers/nvme/target/passthru.c index 26c587ccd152..011aeebace55 100644 --- a/drivers/nvme/target/passthru.c +++ b/drivers/nvme/target/passthru.c @@ -206,8 +206,7 @@ static int nvmet_passthru_map_sg(struct nvmet_req *req, struct request *rq) for_each_sg(req->sg, sg, req->sg_cnt, i) { if (bio_add_pc_page(rq->q, bio, sg_page(sg), sg->length, sg->offset) < sg->length) { - if (bio != &req->p.inline_bio) - bio_put(bio); + nvmet_req_bio_put(req, bio); return -EINVAL; } } diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c index e12629b02320..112d213583ea 100644 --- a/drivers/nvme/target/zns.c +++ b/drivers/nvme/target/zns.c @@ -326,7 +326,6 @@ void nvmet_bdev_execute_zone_append(struct nvmet_req *req) bio->bi_iter.bi_sector); out_bio_put: - if (bio != &req->b.inline_bio) - bio_put(bio); + nvmet_req_bio_put(req, bio); nvmet_req_complete(req, ret < 0 ? NVME_SC_INTERNAL : status); } -- 2.22.1 _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH V12 3/3] nvmet: add nvmet_req_bio put helper for backends 2021-03-11 7:15 ` [PATCH V12 3/3] nvmet: add nvmet_req_bio put helper for backends Chaitanya Kulkarni @ 2021-03-12 0:37 ` Damien Le Moal 0 siblings, 0 replies; 12+ messages in thread From: Damien Le Moal @ 2021-03-12 0:37 UTC (permalink / raw) To: Chaitanya Kulkarni, linux-nvme; +Cc: hch, kbusch, sagi On 2021/03/11 16:16, Chaitanya Kulkarni wrote: > With the addition of zns backend now we have three different backends > with inline bio optimization. That leads to having duplicate code in > for freeing the bio in all three backends: generic bdev, passsthru and > generic zns. > > Add a helper function to avoid duplicate code and update the > respective backends. > > Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> > --- > drivers/nvme/target/io-cmd-bdev.c | 3 +-- > drivers/nvme/target/nvmet.h | 6 ++++++ > drivers/nvme/target/passthru.c | 3 +-- > drivers/nvme/target/zns.c | 3 +-- > 4 files changed, 9 insertions(+), 6 deletions(-) > > diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c > index ada0215f5e56..ca39d787d71f 100644 > --- a/drivers/nvme/target/io-cmd-bdev.c > +++ b/drivers/nvme/target/io-cmd-bdev.c > @@ -173,8 +173,7 @@ static void nvmet_bio_done(struct bio *bio) > struct nvmet_req *req = bio->bi_private; > > nvmet_req_complete(req, blk_to_nvme_status(req, bio->bi_status)); > - if (bio != &req->b.inline_bio) > - bio_put(bio); > + nvmet_req_bio_put(req, bio); > } > > #ifdef CONFIG_BLK_DEV_INTEGRITY > diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h > index f3fccc49de03..2f1bd3ac34a2 100644 > --- a/drivers/nvme/target/nvmet.h > +++ b/drivers/nvme/target/nvmet.h > @@ -654,4 +654,10 @@ static inline sector_t nvmet_lba_to_sect(struct nvmet_ns *ns, __le64 lba) > return le64_to_cpu(lba) << (ns->blksize_shift - SECTOR_SHIFT); > } > > +static inline void nvmet_req_bio_put(struct nvmet_req *req, struct bio *bio) > +{ > + if (bio != &req->b.inline_bio) > + bio_put(bio); > +} > + > #endif /* _NVMET_H */ > diff --git a/drivers/nvme/target/passthru.c b/drivers/nvme/target/passthru.c > index 26c587ccd152..011aeebace55 100644 > --- a/drivers/nvme/target/passthru.c > +++ b/drivers/nvme/target/passthru.c > @@ -206,8 +206,7 @@ static int nvmet_passthru_map_sg(struct nvmet_req *req, struct request *rq) > for_each_sg(req->sg, sg, req->sg_cnt, i) { > if (bio_add_pc_page(rq->q, bio, sg_page(sg), sg->length, > sg->offset) < sg->length) { > - if (bio != &req->p.inline_bio) > - bio_put(bio); > + nvmet_req_bio_put(req, bio); > return -EINVAL; > } > } > diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c > index e12629b02320..112d213583ea 100644 > --- a/drivers/nvme/target/zns.c > +++ b/drivers/nvme/target/zns.c > @@ -326,7 +326,6 @@ void nvmet_bdev_execute_zone_append(struct nvmet_req *req) > bio->bi_iter.bi_sector); > > out_bio_put: > - if (bio != &req->b.inline_bio) > - bio_put(bio); > + nvmet_req_bio_put(req, bio); > nvmet_req_complete(req, ret < 0 ? NVME_SC_INTERNAL : status); > } > Looks good to me. Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> -- Damien Le Moal Western Digital Research _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2021-03-15 5:01 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-03-11 7:15 [PATCH V12 0/3] nvmet: add ZBD backend support Chaitanya Kulkarni 2021-03-11 7:15 ` [PATCH V12 1/3] nvmet: add NVM Command Set Identifier support Chaitanya Kulkarni 2021-03-11 7:15 ` [PATCH V12 2/3] nvmet: add ZBD over ZNS backend support Chaitanya Kulkarni 2021-03-12 1:15 ` Damien Le Moal 2021-03-12 6:29 ` Chaitanya Kulkarni 2021-03-12 7:25 ` Damien Le Moal 2021-03-13 2:40 ` Chaitanya Kulkarni 2021-03-15 3:54 ` Chaitanya Kulkarni 2021-03-15 4:09 ` Damien Le Moal 2021-03-15 4:53 ` Chaitanya Kulkarni 2021-03-11 7:15 ` [PATCH V12 3/3] nvmet: add nvmet_req_bio put helper for backends Chaitanya Kulkarni 2021-03-12 0:37 ` Damien Le Moal
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).