* [PATCH V9 0/9] nvmet: add ZBD backend support
@ 2021-01-12  4:26 ` Chaitanya Kulkarni
  0 siblings, 0 replies; 98+ messages in thread
From: Chaitanya Kulkarni @ 2021-01-12  4:26 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: hch, sagi, damien.lemoal, Chaitanya Kulkarni

Hi,

The NVMeOF host is capable of handling NVMe protocol based Zoned Block
Devices (ZBDs) in Zoned Namespaces (ZNS) mode with the passthru
backend. There is no support for a generic block device backend to
handle ZBD devices that are not NVMe protocol compliant.

This adds support to export ZBDs (which are not NVMe drives) from the
target to the host via NVMeOF using the host side ZNS interface.

The patch series is generated in a bottom-up manner: it first adds
prep patches and ZNS command-specific handlers on top of the generic
block layer and updates the data structures, then wires up the admin
commands one by one in the order the host calls them in the namespace
initialization sequence. Once everything is ready, it wires up the I/O
command handlers. See below for a patch-series overview.

All zonefs testcases pass, both for a ZBD exported over NVMeOF backed
by a null_blk ZBD and for the null_blk ZBD without NVMeOF. Test
results are added below.

Note: This patch-series is based on the earlier posted patch series :-

[PATCH V2 0/4] nvmet: admin-cmd related cleanups and a fix
http://lists.infradead.org/pipermail/linux-nvme/2021-January/021729.html

-ck

Changes from V8:-

1. Rebase and retest on latest nvme-5.11.
2. Export ctrl->cap csi support only if CONFIG_BLK_DEV_ZONED is set.
3. Add a fix to admin ns-desc list handler for handling default csi.

Changes from V7:-

1. Just like the block layer provides an API for bio_init(), provide
   nvmet_bio_init() so that the bio initialization code for
   nvme-read-write commands moves from the bdev and zns backends into a
   centralized helper.
2. With bdev/zns/file we now have three backends that check
   req->sg_cnt and call nvmet_check_transfer_len() before processing
   nvme-read-write commands. Move this duplicated code from the three
   backends into a helper.
3. Export and use the nvmet_bio_done() callback in
   nvmet_execute_zone_append() instead of open coding the function.
   This also avoids code duplication for bio & request completion with
   the error log page update.
4. Add a zonefs tests log for a dm-linear device created on top of an
   SMR HDD exported with the NVMeOF ZNS backend with the help of
   nvme-loop.
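
Items 1-2 above fold the per-backend req->sg_cnt and transfer-length
checks into one shared helper. A minimal userspace model of that
pattern (model_req/model_check_io_req are illustrative names, not the
in-tree nvmet API):

```c
#include <stdbool.h>
#include <stdint.h>

/* toy stand-in for the fields the real helper inspects on the request */
struct model_req {
	uint32_t sg_cnt;       /* number of scatter-gather entries mapped */
	uint64_t transfer_len; /* bytes the host asked to transfer */
};

/*
 * One shared check, called by every backend (bdev/zns/file) before the
 * nvme-read-write fast path, instead of three open-coded copies.
 */
static bool model_check_io_req(const struct model_req *req,
			       uint64_t expect_len)
{
	if (req->transfer_len != expect_len)
		return false; /* host and target disagree on length */
	if (expect_len && !req->sg_cnt)
		return false; /* data expected but nothing mapped */
	return true;
}
```

Centralizing the check means a future fix (e.g. a different status code
for a length mismatch) lands in one place instead of three.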

Changes from V6:-

1. Instead of calling report zones in a loop to find conventional
   zones, use the loop inside the LLD via
   blkdev_report_zones()->LLD report_zones; this also simplifies the
   report zone callback.
2. Fix the bug in nvmet_bdev_has_conv_zones().
3. Remove conditional operators in the nvmet_bdev_execute_zone_append().
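
Item 1 relies on blkdev_report_zones() invoking a callback once per
zone, so the caller's loop disappears. A self-contained sketch of that
callback shape (model_* names are illustrative; the real kernel types
are struct blk_zone and report_zones_cb):

```c
#include <stddef.h>

enum model_zone_type { MODEL_ZONE_CONV, MODEL_ZONE_SEQ };

struct model_zone {
	enum model_zone_type type;
};

typedef int (*model_report_cb)(const struct model_zone *z, void *data);

/* stand-in for blkdev_report_zones(): drives the per-zone callback */
static int model_report_zones(const struct model_zone *zones, size_t nr,
			      model_report_cb cb, void *data)
{
	for (size_t i = 0; i < nr; i++) {
		int ret = cb(&zones[i], data);

		if (ret)
			return ret; /* callback may stop the iteration */
	}
	return 0;
}

/* callback for a conventional-zone scan: nonzero stops on first hit */
static int model_conv_zone_cb(const struct model_zone *z, void *data)
{
	(void)data;
	return z->type == MODEL_ZONE_CONV ? 1 : 0;
}
```

Returning nonzero from the callback short-circuits the scan, which is
what lets a has-conventional-zones check avoid walking every zone.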

Changes from V5:-

1.  Use bio->bi_iter.bi_sector for result of the REQ_OP_ZONE_APPEND
    command.
2.  Add endianness to the helper nvmet_sect_to_lba().
3.  Make bufsize u32 in zone mgmt recv command handler.
4.  Add __GFP_ZERO for report zone data buffer to return clean buffer.

Changes from V4:-

1.  Don't use bio_iov_iter_get_pages() instead add a patch to export
    bio_add_hw_page() and call it directly for zone append.
2.  Add inline vector optimization for append bio.
3.  Update the commit logs for the patches.
4.  Remove ZNS related identify data structures, use individual members.
5.  Add a comment for macro NVMET_MPSMIN_SHIFT.
6.  Remove nvmet_bdev() helper.
7.  Move the command set identifier code into common code.
8.  Use IS_ENABLED() and move helpers from zns.c into common code.
9.  Add a patch to support Command Set identifiers.
10. Open code nvmet_bdev_validate_zns_zones().
11. Remove the per namespace min zasl calculation and don't allow
    namespaces with zasl value > the first ns zasl value.
12. Move the stubs into the header file.
13. Add lba to/from sector conversion helpers and update the
    io-cmd-bdev.c to avoid the code duplication.
14. Add everything into one patch for zns command handlers and 
    respective calls from the target code.
15. Remove the trim ns-desclist admin callback patch from this series.
16. Add bio get and put helpers patches to reduce the duplicate code in
    generic bdev, passthru, and generic zns backend.
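
The conversion helpers in item 13 are plain shift conversions between
512-byte kernel sectors and the namespace LBA size. A userspace sketch
(names follow the series, but this is a model; the byte-order handling
of the real helpers is omitted):

```c
#include <stdint.h>

#define SECTOR_SHIFT 9 /* kernel sectors are 512 bytes */

/*
 * blksize_shift is log2 of the namespace block size, e.g. 12 for a
 * 4096-byte LBA format, so one LBA covers 2^(blksize_shift - 9) sectors.
 */
static inline uint64_t nvmet_sect_to_lba(uint8_t blksize_shift,
					 uint64_t sect)
{
	return sect >> (blksize_shift - SECTOR_SHIFT);
}

static inline uint64_t nvmet_lba_to_sect(uint8_t blksize_shift,
					 uint64_t lba)
{
	return lba << (blksize_shift - SECTOR_SHIFT);
}
```

With a 512-byte LBA format (blksize_shift = 9) both helpers are
identity conversions, which is why sharing them with io-cmd-bdev.c
removes duplication without changing behavior there.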

Changes from V3:-

1.  Get rid of the bio_max_zasl check.
2.  Remove extra lines.
3.  Remove the block layer api export patch.
4.  Remove the bvec check in the bio_iov_iter_get_pages() for
    REQ_OP_ZONE_APPEND so that we can reuse the code.

Changes from V2:-

1.  Move conventional zone bitmap check into 
    nvmet_bdev_validate_zns_zones(). 
2.  Don't use report zones call to check the runt zone.
3.  Trim nvmet_zasl() helper.
4.  Fix typo in the nvmet_zns_update_zasl().
5.  Remove the comment and fix the mdts calculation in
    nvmet_execute_identify_cns_cs_ctrl().
6.  Use u64 for bufsize in nvmet_bdev_execute_zone_mgmt_recv().
7.  Remove nvmet_zones_to_desc_size() and fix the nr_zones
    calculation.
8.  Remove the op variable in nvmet_bdev_execute_zone_append().
9.  Fix the nr_zones calculation nvmet_bdev_execute_zone_mgmt_recv().
10. Update cover letter subject.

Changes from V1:-

1.  Remove the nvmet-$(CONFIG_BLK_DEV_ZONED) += zns.o.
2.  Mark helpers inline.
3.  Fix typos in the comments and update the comments.
4.  Get rid of the curly brackets.
5.  Don't allow drives with last smaller zones.
6.  Calculate the zasl as a function of max_zone_append_sectors and
    bio_max_pages so we don't have to split the bio.
7.  Add global subsys->zasl and update the zasl when new namespace
    is enabled.
8.  Remove the loop in the nvmet_bdev_execute_zone_mgmt_recv() and
    move the functionality into the report zone callback.
9.  Add goto for default case in nvmet_bdev_execute_zone_mgmt_send().
10. Allocate the zones buffer with zones size instead of bdev nr_zones.
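
Item 6 derives zasl from max_zone_append_sectors so a zone append
never needs a bio split. ZASL is reported as a power of two in
CAP.MPSMIN units; a hedged userspace sketch of that calculation,
assuming a 4 KiB minimum page size (model_* names are illustrative,
the kernel uses ilog2()):

```c
#include <stdint.h>

#define SECTOR_SHIFT       9  /* 512-byte kernel sectors */
#define NVMET_MPSMIN_SHIFT 12 /* assumed 4 KiB CAP.MPSMIN */

/* portable log2 for the sketch */
static unsigned int model_ilog2(uint32_t v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/*
 * Convert the queue's max append size (in 512-byte sectors) into the
 * ZASL exponent: 2^zasl * MPSMIN bytes is the advertised limit.
 */
static uint8_t model_zasl(unsigned int zone_append_sects)
{
	return (uint8_t)model_ilog2(zone_append_sects >>
				    (NVMET_MPSMIN_SHIFT - SECTOR_SHIFT));
}
```

For example, a 512 KiB append limit is 1024 sectors, i.e. 128 MPSMIN
units, giving zasl = 7 under these assumptions.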

Chaitanya Kulkarni (9):
  block: export bio_add_hw_pages()
  nvmet: add lba to sect conversion helpers
  nvmet: add NVM command set identifier support
  nvmet: add ZBD over ZNS backend support
  nvmet: add bio get helper for different backends
  nvmet: add bio init helper for different backends
  nvmet: add bio put helper for different backends
  nvmet: add common I/O length check helper
  nvmet: call nvmet_bio_done() for zone append

 block/bio.c                       |   1 +
 block/blk.h                       |   4 -
 drivers/nvme/target/Makefile      |   1 +
 drivers/nvme/target/admin-cmd.c   |  67 ++++--
 drivers/nvme/target/core.c        |  16 +-
 drivers/nvme/target/io-cmd-bdev.c |  67 +++---
 drivers/nvme/target/io-cmd-file.c |   7 +-
 drivers/nvme/target/nvmet.h       |  97 +++++++++
 drivers/nvme/target/passthru.c    |  11 +-
 drivers/nvme/target/zns.c         | 328 ++++++++++++++++++++++++++++++
 include/linux/blkdev.h            |   4 +
 11 files changed, 536 insertions(+), 67 deletions(-)
 create mode 100644 drivers/nvme/target/zns.c

* Zonefs test log with dm-linear on top of an SMR HDD:-
--------------------------------------------------------------------------------

1. Test Zoned Block Device info :- 
--------------------------------------------------------------------------------

# fdisk  -l /dev/sdh
Disk /dev/sdh: 13.64 TiB, 15000173281280 bytes, 3662151680 sectors
Disk model: HGST HSH721415AL
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
# cat /sys/block/sdh/queue/nr_zones
55880
# cat /sys/block/sdh/queue/zoned
host-managed
# cat /sys/block/sdh/queue/zone_append_max_bytes
688128

2. Creating an NVMeOF target backed by dm-linear on top of the ZBD
--------------------------------------------------------------------------------

# ./zbdev.sh 1 dm-zbd
++ NQN=dm-zbd
++ echo '0 29022486528 linear /dev/sdh 274726912' | dmsetup create cksdh
9 directories, 4 files
++ mkdir /sys/kernel/config/nvmet/subsystems/dm-zbd
++ mkdir /sys/kernel/config/nvmet/subsystems/dm-zbd/namespaces/1
++ echo -n /dev/dm-0
++ cat /sys/kernel/config/nvmet/subsystems/dm-zbd/namespaces/1/device_path
/dev/dm-0
++ echo 1
++ mkdir /sys/kernel/config/nvmet/ports/1/
++ echo -n loop
++ echo -n 1
++ ln -s /sys/kernel/config/nvmet/subsystems/dm-zbd /sys/kernel/config/nvmet/ports/1/subsystems/
++ sleep 1
++ echo transport=loop,nqn=dm-zbd
++ sleep 1
++ dmesg -c
[233450.572565] nvmet: adding nsid 1 to subsystem dm-zbd
[233452.269477] nvmet: creating controller 1 for subsystem dm-zbd for NQN nqn.2014-08.org.nvmexpress:uuid:853d7e82-8018-44ce-8784-ab81e7465ad9.
[233452.283352] nvme nvme0: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices.
[233452.292805] nvme nvme0: creating 8 I/O queues.
[233452.299210] nvme nvme0: new ctrl: "dm-zbd"

3. dm-linear and backend SMR HDD association :-
--------------------------------------------------------------------------------

# cat /sys/kernel/config/nvmet/subsystems/dm-zbd/namespaces/1/device_path 
/dev/dm-0
# dmsetup ls --tree
 cksdh (252:0)
	 └─ (8:112)
# lsblk | head -3
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sdh       8:112  0  13.6T  0 disk 
└─cksdh 252:0    0  13.5T  0 dm   

4. NVMeOF controller :- 
--------------------------------------------------------------------------------

# nvme list | tr -s ' ' ' '  
Node SN Model Namespace Usage Format FW Rev  
/dev/nvme0n1 8c6f348dcd64404c Linux 1 14.86 TB / 14.86 TB 4 KiB + 0 B 5.10.0nv

5. Zonefs tests results :-
--------------------------------------------------------------------------------

# ./zonefs-tests.sh /dev/nvme0n1 
Gathering information on /dev/nvme0n1...
zonefs-tests on /dev/nvme0n1:
  55356 zones (0 conventional zones, 55356 sequential zones)
  524288 512B sectors zone size (256 MiB)
  1 max open zones
Running tests
  Test 0010:  mkzonefs (options)                                   ... PASS
  Test 0011:  mkzonefs (force format)                              ... PASS
  Test 0012:  mkzonefs (invalid device)                            ... PASS
  Test 0013:  mkzonefs (super block zone state)                    ... PASS
  Test 0020:  mount (default)                                      ... PASS
  Test 0021:  mount (invalid device)                               ... PASS
  Test 0022:  mount (check mount directory sub-directories)        ... PASS
  Test 0023:  mount (options)                                      ... PASS
  Test 0030:  Number of files (default)                            ... PASS
  Test 0031:  Number of files (aggr_cnv)                           ... skip
  Test 0032:  Number of files using stat (default)                 ... PASS
  Test 0033:  Number of files using stat (aggr_cnv)                ... PASS
  Test 0034:  Number of blocks using stat (default)                ... PASS
  Test 0035:  Number of blocks using stat (aggr_cnv)               ... PASS
  Test 0040:  Files permissions (default)                          ... PASS
  Test 0041:  Files permissions (aggr_cnv)                         ... skip
  Test 0042:  Files permissions (set value)                        ... PASS
  Test 0043:  Files permissions (set value + aggr_cnv)             ... skip
  Test 0050:  Files owner (default)                                ... PASS
  Test 0051:  Files owner (aggr_cnv)                               ... skip
  Test 0052:  Files owner (set value)                              ... PASS
  Test 0053:  Files owner (set value + aggr_cnv)                   ... skip
  Test 0060:  Files size (default)                                 ... PASS
  Test 0061:  Files size (aggr_cnv)                                ... skip
  Test 0070:  Conventional file truncate                           ... skip
  Test 0071:  Conventional file truncate (aggr_cnv)                ... skip
  Test 0072:  Conventional file unlink                             ... skip
  Test 0073:  Conventional file unlink (aggr_cnv)                  ... skip
  Test 0074:  Conventional file random write                       ... skip
  Test 0075:  Conventional file random write (direct)              ... skip
  Test 0076:  Conventional file random write (aggr_cnv)            ... skip
  Test 0077:  Conventional file random write (aggr_cnv, direct)    ... skip
  Test 0078:  Conventional file mmap read/write                    ... skip
  Test 0079:  Conventional file mmap read/write (aggr_cnv)         ... skip
  Test 0080:  Sequential file truncate                             ... PASS
  Test 0081:  Sequential file unlink                               ... PASS
  Test 0082:  Sequential file buffered write IO                    ... PASS
  Test 0083:  Sequential file overwrite                            ... PASS
  Test 0084:  Sequential file unaligned write (sync IO)            ... PASS
  Test 0085:  Sequential file unaligned write (async IO)           ... PASS
  Test 0086:  Sequential file append (sync)                        ... PASS
  Test 0087:  Sequential file append (async)                       ... PASS
  Test 0088:  Sequential file random read                          ... PASS
  Test 0089:  Sequential file mmap read/write                      ... PASS
  Test 0090:  sequential file 4K synchronous write                 ... PASS
  Test 0091:  Sequential file large synchronous write              ... PASS

46 / 46 tests passed

* Without CONFIG_BLK_DEV_ZONED nvme tests :-
--------------------------------------------------------------------------------

#
# grep -i blk_dev_zoned .config
# CONFIG_BLK_DEV_ZONED is not set
# makej M=drivers/nvme/ clean 
  CLEAN   drivers/nvme//Module.symvers
# makej M=drivers/nvme/ 
  CC [M]  drivers/nvme//host/core.o
  CC [M]  drivers/nvme//host/trace.o
  CC [M]  drivers/nvme//host/lightnvm.o
  CC [M]  drivers/nvme//target/core.o
  CC [M]  drivers/nvme//host/hwmon.o
  CC [M]  drivers/nvme//target/configfs.o
  CC [M]  drivers/nvme//host/pci.o
  CC [M]  drivers/nvme//target/admin-cmd.o
  CC [M]  drivers/nvme//host/fabrics.o
  CC [M]  drivers/nvme//host/rdma.o
  CC [M]  drivers/nvme//target/fabrics-cmd.o
  CC [M]  drivers/nvme//target/discovery.o
  CC [M]  drivers/nvme//host/fc.o
  CC [M]  drivers/nvme//target/io-cmd-file.o
  CC [M]  drivers/nvme//host/tcp.o
  CC [M]  drivers/nvme//target/io-cmd-bdev.o
  CC [M]  drivers/nvme//target/passthru.o
  CC [M]  drivers/nvme//target/trace.o
  CC [M]  drivers/nvme//target/loop.o
  CC [M]  drivers/nvme//target/rdma.o
  CC [M]  drivers/nvme//target/fc.o
  CC [M]  drivers/nvme//target/fcloop.o
  CC [M]  drivers/nvme//target/tcp.o
  LD [M]  drivers/nvme//target/nvme-loop.o
  LD [M]  drivers/nvme//target/nvme-fcloop.o
  LD [M]  drivers/nvme//target/nvmet-tcp.o
  LD [M]  drivers/nvme//host/nvme-fabrics.o
  LD [M]  drivers/nvme//host/nvme.o
  LD [M]  drivers/nvme//host/nvme-rdma.o
  LD [M]  drivers/nvme//target/nvmet-rdma.o
  LD [M]  drivers/nvme//target/nvmet.o
  LD [M]  drivers/nvme//target/nvmet-fc.o
  LD [M]  drivers/nvme//host/nvme-tcp.o
  LD [M]  drivers/nvme//host/nvme-fc.o
  LD [M]  drivers/nvme//host/nvme-core.o
  MODPOST drivers/nvme//Module.symvers
  CC [M]  drivers/nvme//host/nvme-core.mod.o
  CC [M]  drivers/nvme//host/nvme-fabrics.mod.o
  CC [M]  drivers/nvme//host/nvme-fc.mod.o
  CC [M]  drivers/nvme//host/nvme-rdma.mod.o
  CC [M]  drivers/nvme//host/nvme-tcp.mod.o
  CC [M]  drivers/nvme//host/nvme.mod.o
  CC [M]  drivers/nvme//target/nvme-fcloop.mod.o
  CC [M]  drivers/nvme//target/nvme-loop.mod.o
  CC [M]  drivers/nvme//target/nvmet-fc.mod.o
  CC [M]  drivers/nvme//target/nvmet-rdma.mod.o
  CC [M]  drivers/nvme//target/nvmet-tcp.mod.o
  CC [M]  drivers/nvme//target/nvmet.mod.o
  LD [M]  drivers/nvme//target/nvme-fcloop.ko
  LD [M]  drivers/nvme//host/nvme-tcp.ko
  LD [M]  drivers/nvme//host/nvme-core.ko
  LD [M]  drivers/nvme//target/nvmet-tcp.ko
  LD [M]  drivers/nvme//target/nvme-loop.ko
  LD [M]  drivers/nvme//target/nvmet-fc.ko
  LD [M]  drivers/nvme//host/nvme-fabrics.ko
  LD [M]  drivers/nvme//host/nvme-fc.ko
  LD [M]  drivers/nvme//target/nvmet-rdma.ko
  LD [M]  drivers/nvme//host/nvme-rdma.ko
  LD [M]  drivers/nvme//host/nvme.ko
  LD [M]  drivers/nvme//target/nvmet.ko
#
# cdblktests 
# ./check tests/nvme/
nvme/002 (create many subsystems and test discovery)         [passed]
    runtime    ...  27.640s
nvme/003 (test if we're sending keep-alives to a discovery controller) [passed]
    runtime  10.145s  ...  10.147s
nvme/004 (test nvme and nvmet UUID NS descriptors)           [passed]
    runtime  1.713s  ...  1.712s
nvme/005 (reset local loopback target)                       [not run]
    nvme_core module does not have parameter multipath
nvme/006 (create an NVMeOF target with a block device-backed ns) [passed]
    runtime  0.111s  ...  0.115s
nvme/007 (create an NVMeOF target with a file-backed ns)     [passed]
    runtime  0.081s  ...  0.069s
nvme/008 (create an NVMeOF host with a block device-backed ns) [passed]
    runtime  1.690s  ...  1.727s
nvme/009 (create an NVMeOF host with a file-backed ns)       [passed]
    runtime  1.659s  ...  1.661s
nvme/010 (run data verification fio job on NVMeOF block device-backed ns) [passed]
    runtime  28.781s  ...  30.166s
nvme/011 (run data verification fio job on NVMeOF file-backed ns) [passed]
    runtime  253.831s  ...  238.774s
nvme/012 (run mkfs and data verification fio job on NVMeOF block device-backed ns) [passed]
    runtime  40.003s  ...  68.076s
nvme/013 (run mkfs and data verification fio job on NVMeOF file-backed ns) [passed]
    runtime  272.649s  ...  283.720s
nvme/014 (flush a NVMeOF block device-backed ns)             [passed]
    runtime  21.772s  ...  21.397s
nvme/015 (unit test for NVMe flush for file backed ns)       [passed]
    runtime  21.908s  ...  18.622s
nvme/016 (create/delete many NVMeOF block device-backed ns and test discovery) [passed]
    runtime  15.860s  ...  18.313s
nvme/017 (create/delete many file-ns and test discovery)     [passed]
    runtime  16.470s  ...  18.374s
nvme/018 (unit test NVMe-oF out of range access on a file backend) [passed]
    runtime  1.665s  ...  1.890s
nvme/019 (test NVMe DSM Discard command on NVMeOF block-device ns) [passed]
    runtime  1.681s  ...  1.982s
nvme/020 (test NVMe DSM Discard command on NVMeOF file-backed ns) [passed]
    runtime  1.645s  ...  1.913s
nvme/021 (test NVMe list command on NVMeOF file-backed ns)   [passed]
    runtime  1.648s  ...  1.956s
nvme/022 (test NVMe reset command on NVMeOF file-backed ns)  [passed]
    runtime  2.063s  ...  2.553s
nvme/023 (test NVMe smart-log command on NVMeOF block-device ns) [passed]
    runtime  1.692s  ...  2.588s
nvme/024 (test NVMe smart-log command on NVMeOF file-backed ns) [passed]
    runtime  1.643s  ...  1.656s
nvme/025 (test NVMe effects-log command on NVMeOF file-backed ns) [passed]
    runtime  1.640s  ...  1.668s
nvme/026 (test NVMe ns-descs command on NVMeOF file-backed ns) [passed]
    runtime  1.643s  ...  1.961s
nvme/027 (test NVMe ns-rescan command on NVMeOF file-backed ns) [passed]
    runtime  1.641s  ...  1.677s
nvme/028 (test NVMe list-subsys command on NVMeOF file-backed ns) [passed]
    runtime  1.648s  ...  1.868s
nvme/029 (test userspace IO via nvme-cli read/write interface) [passed]
    runtime  1.982s  ...  2.703s
nvme/030 (ensure the discovery generation counter is updated appropriately) [passed]
    runtime  0.308s  ...  0.328s
nvme/031 (test deletion of NVMeOF controllers immediately after setup) [passed]
    runtime  5.432s  ...  7.495s
nvme/038 (test deletion of NVMeOF subsystem without enabling) [passed]
    runtime  0.053s  ...  0.046s

* With CONFIG_BLK_DEV_ZONED nvme and zonefs tests on membacked null_blk zoned :-
--------------------------------------------------------------------------------

# grep -i blk_dev_zoned .config
CONFIG_BLK_DEV_ZONED=y
# make M=drivers/nvme/ clean 
  CLEAN   drivers/nvme//Module.symvers
# make M=drivers/nvme/ 
  CC [M]  drivers/nvme//host/core.o
  CC [M]  drivers/nvme//host/trace.o
  CC [M]  drivers/nvme//host/lightnvm.o
  CC [M]  drivers/nvme//host/zns.o
  CC [M]  drivers/nvme//host/hwmon.o
  LD [M]  drivers/nvme//host/nvme-core.o
  CC [M]  drivers/nvme//host/pci.o
  LD [M]  drivers/nvme//host/nvme.o
  CC [M]  drivers/nvme//host/fabrics.o
  LD [M]  drivers/nvme//host/nvme-fabrics.o
  CC [M]  drivers/nvme//host/rdma.o
  LD [M]  drivers/nvme//host/nvme-rdma.o
  CC [M]  drivers/nvme//host/fc.o
  LD [M]  drivers/nvme//host/nvme-fc.o
  CC [M]  drivers/nvme//host/tcp.o
  LD [M]  drivers/nvme//host/nvme-tcp.o
  CC [M]  drivers/nvme//target/core.o
  CC [M]  drivers/nvme//target/configfs.o
  CC [M]  drivers/nvme//target/admin-cmd.o
  CC [M]  drivers/nvme//target/fabrics-cmd.o
  CC [M]  drivers/nvme//target/discovery.o
  CC [M]  drivers/nvme//target/io-cmd-file.o
  CC [M]  drivers/nvme//target/io-cmd-bdev.o
  CC [M]  drivers/nvme//target/passthru.o
  CC [M]  drivers/nvme//target/zns.o
  CC [M]  drivers/nvme//target/trace.o
  LD [M]  drivers/nvme//target/nvmet.o
  CC [M]  drivers/nvme//target/loop.o
  LD [M]  drivers/nvme//target/nvme-loop.o
  CC [M]  drivers/nvme//target/rdma.o
  LD [M]  drivers/nvme//target/nvmet-rdma.o
  CC [M]  drivers/nvme//target/fc.o
  LD [M]  drivers/nvme//target/nvmet-fc.o
  CC [M]  drivers/nvme//target/fcloop.o
  LD [M]  drivers/nvme//target/nvme-fcloop.o
  CC [M]  drivers/nvme//target/tcp.o
  LD [M]  drivers/nvme//target/nvmet-tcp.o
  MODPOST drivers/nvme//Module.symvers
  CC [M]  drivers/nvme//host/nvme-core.mod.o
  LD [M]  drivers/nvme//host/nvme-core.ko
  CC [M]  drivers/nvme//host/nvme-fabrics.mod.o
  LD [M]  drivers/nvme//host/nvme-fabrics.ko
  CC [M]  drivers/nvme//host/nvme-fc.mod.o
  LD [M]  drivers/nvme//host/nvme-fc.ko
  CC [M]  drivers/nvme//host/nvme-rdma.mod.o
  LD [M]  drivers/nvme//host/nvme-rdma.ko
  CC [M]  drivers/nvme//host/nvme-tcp.mod.o
  LD [M]  drivers/nvme//host/nvme-tcp.ko
  CC [M]  drivers/nvme//host/nvme.mod.o
  LD [M]  drivers/nvme//host/nvme.ko
  CC [M]  drivers/nvme//target/nvme-fcloop.mod.o
  LD [M]  drivers/nvme//target/nvme-fcloop.ko
  CC [M]  drivers/nvme//target/nvme-loop.mod.o
  LD [M]  drivers/nvme//target/nvme-loop.ko
  CC [M]  drivers/nvme//target/nvmet-fc.mod.o
  LD [M]  drivers/nvme//target/nvmet-fc.ko
  CC [M]  drivers/nvme//target/nvmet-rdma.mod.o
  LD [M]  drivers/nvme//target/nvmet-rdma.ko
  CC [M]  drivers/nvme//target/nvmet-tcp.mod.o
  LD [M]  drivers/nvme//target/nvmet-tcp.ko
  CC [M]  drivers/nvme//target/nvmet.mod.o
  LD [M]  drivers/nvme//target/nvmet.ko
# 
# cdblktests 
# ./check tests/nvme/
nvme/002 (create many subsystems and test discovery)         [passed]
    runtime  24.378s  ...  24.636s
nvme/003 (test if we're sending keep-alives to a discovery controller) [passed]
    runtime  10.133s  ...  10.152s
nvme/004 (test nvme and nvmet UUID NS descriptors)           [passed]
    runtime  2.463s  ...  2.478s
nvme/005 (reset local loopback target)                       [not run]
    nvme_core module does not have parameter multipath
nvme/006 (create an NVMeOF target with a block device-backed ns) [passed]
    runtime  0.095s  ...  0.122s
nvme/007 (create an NVMeOF target with a file-backed ns)     [passed]
    runtime  0.065s  ...  0.079s
nvme/008 (create an NVMeOF host with a block device-backed ns) [passed]
    runtime  2.473s  ...  2.501s
nvme/009 (create an NVMeOF host with a file-backed ns)       [passed]
    runtime  2.460s  ...  2.424s
nvme/010 (run data verification fio job on NVMeOF block device-backed ns) [passed]
    runtime  24.526s  ...  28.015s
nvme/011 (run data verification fio job on NVMeOF file-backed ns) [passed]
    runtime  265.967s  ...  282.717s
nvme/012 (run mkfs and data verification fio job on NVMeOF block device-backed ns) [passed]
    runtime  44.665s  ...  48.124s
nvme/013 (run mkfs and data verification fio job on NVMeOF file-backed ns) [passed]
    runtime  261.739s  ...  352.331s
nvme/014 (flush a NVMeOF block device-backed ns)             [passed]
    runtime  21.268s  ...  22.013s
nvme/015 (unit test for NVMe flush for file backed ns)       [passed]
    runtime  18.820s  ...  22.104s
nvme/016 (create/delete many NVMeOF block device-backed ns and test discovery) [passed]
    runtime  13.899s  ...  14.322s
nvme/017 (create/delete many file-ns and test discovery)     [passed]
    runtime  14.322s  ...  14.031s
nvme/018 (unit test NVMe-oF out of range access on a file backend) [passed]
    runtime  2.450s  ...  2.444s
nvme/019 (test NVMe DSM Discard command on NVMeOF block-device ns) [passed]
    runtime  2.475s  ...  2.489s
nvme/020 (test NVMe DSM Discard command on NVMeOF file-backed ns) [passed]
    runtime  2.410s  ...  2.448s
nvme/021 (test NVMe list command on NVMeOF file-backed ns)   [passed]
    runtime  2.441s  ...  2.439s
nvme/022 (test NVMe reset command on NVMeOF file-backed ns)  [passed]
    runtime  2.864s  ...  2.863s
nvme/023 (test NVMe smart-log command on NVMeOF block-device ns) [passed]
    runtime  2.465s  ...  2.446s
nvme/024 (test NVMe smart-log command on NVMeOF file-backed ns) [passed]
    runtime  2.416s  ...  2.411s
nvme/025 (test NVMe effects-log command on NVMeOF file-backed ns) [passed]
    runtime  2.419s  ...  2.748s
nvme/026 (test NVMe ns-descs command on NVMeOF file-backed ns) [passed]
    runtime  2.422s  ...  2.410s
nvme/027 (test NVMe ns-rescan command on NVMeOF file-backed ns) [passed]
    runtime  2.456s  ...  2.462s
nvme/028 (test NVMe list-subsys command on NVMeOF file-backed ns) [passed]
    runtime  2.427s  ...  2.429s
nvme/029 (test userspace IO via nvme-cli read/write interface) [passed]
    runtime  2.751s  ...  2.755s
nvme/030 (ensure the discovery generation counter is updated appropriately) [passed]
    runtime  0.346s  ...  0.357s
nvme/031 (test deletion of NVMeOF controllers immediately after setup) [passed]
    runtime  13.601s  ...  13.591s
nvme/038 (test deletion of NVMeOF subsystem without enabling) [passed]
    runtime  0.039s  ...  0.059s
#
# cdzonefstest 
# ./zonefs-tests.sh /dev/nvme1n1 
Gathering information on /dev/nvme1n1...
zonefs-tests on /dev/nvme1n1:
  16 zones (0 conventional zones, 16 sequential zones)
  131072 512B sectors zone size (64 MiB)
  1 max open zones
Running tests
  Test 0010:  mkzonefs (options)                                   ... PASS
  Test 0011:  mkzonefs (force format)                              ... PASS
  Test 0012:  mkzonefs (invalid device)                            ... PASS
  Test 0013:  mkzonefs (super block zone state)                    ... PASS
  Test 0020:  mount (default)                                      ... PASS
  Test 0021:  mount (invalid device)                               ... PASS
  Test 0022:  mount (check mount directory sub-directories)        ... PASS
  Test 0023:  mount (options)                                      ... PASS
  Test 0030:  Number of files (default)                            ... PASS
  Test 0031:  Number of files (aggr_cnv)                           ... skip
  Test 0032:  Number of files using stat (default)                 ... PASS
  Test 0033:  Number of files using stat (aggr_cnv)                ... PASS
  Test 0034:  Number of blocks using stat (default)                ... PASS
  Test 0035:  Number of blocks using stat (aggr_cnv)               ... PASS
  Test 0040:  Files permissions (default)                          ... PASS
  Test 0041:  Files permissions (aggr_cnv)                         ... skip
  Test 0042:  Files permissions (set value)                        ... PASS
  Test 0043:  Files permissions (set value + aggr_cnv)             ... skip
  Test 0050:  Files owner (default)                                ... PASS
  Test 0051:  Files owner (aggr_cnv)                               ... skip
  Test 0052:  Files owner (set value)                              ... PASS
  Test 0053:  Files owner (set value + aggr_cnv)                   ... skip
  Test 0060:  Files size (default)                                 ... PASS
  Test 0061:  Files size (aggr_cnv)                                ... skip
  Test 0070:  Conventional file truncate                           ... skip
  Test 0071:  Conventional file truncate (aggr_cnv)                ... skip
  Test 0072:  Conventional file unlink                             ... skip
  Test 0073:  Conventional file unlink (aggr_cnv)                  ... skip
  Test 0074:  Conventional file random write                       ... skip
  Test 0075:  Conventional file random write (direct)              ... skip
  Test 0076:  Conventional file random write (aggr_cnv)            ... skip
  Test 0077:  Conventional file random write (aggr_cnv, direct)    ... skip
  Test 0078:  Conventional file mmap read/write                    ... skip
  Test 0079:  Conventional file mmap read/write (aggr_cnv)         ... skip
  Test 0080:  Sequential file truncate                             ... PASS
  Test 0081:  Sequential file unlink                               ... PASS
  Test 0082:  Sequential file buffered write IO                    ... PASS
  Test 0083:  Sequential file overwrite                            ... PASS
  Test 0084:  Sequential file unaligned write (sync IO)            ... PASS
  Test 0085:  Sequential file unaligned write (async IO)           ... PASS
  Test 0086:  Sequential file append (sync)                        ... PASS
  Test 0087:  Sequential file append (async)                       ... PASS
  Test 0088:  Sequential file random read                          ... PASS
  Test 0089:  Sequential file mmap read/write                      ... PASS
  Test 0090:  sequential file 4K synchronous write                 ... PASS
  Test 0091:  Sequential file large synchronous write              ... PASS

46 / 46 tests passed
-- 
2.22.1


^ permalink raw reply	[flat|nested] 98+ messages in thread

* [PATCH V9 0/9] nvmet: add ZBD backend support
@ 2021-01-12  4:26 ` Chaitanya Kulkarni
  0 siblings, 0 replies; 98+ messages in thread
From: Chaitanya Kulkarni @ 2021-01-12  4:26 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: damien.lemoal, hch, Chaitanya Kulkarni, sagi

Hi,

NVMeOF Host is capable of handling the NVMe Protocol based Zoned Block
Devices (ZBD) in the Zoned Namespaces (ZNS) mode with the passthru
backend. There is no support for a generic block device backend to
handle the ZBD devices which are not NVMe protocol compliant.

This adds support to export the ZBDs (which are not NVMe drives) to host
the from target via NVMeOF using the host side ZNS interface.

The patch series is generated in bottom-top manner where, it first adds
prep patch and ZNS command-specific handlers on the top of genblk and 
updates the data structures, then one by one it wires up the admin cmds
in the order host calls them in namespace initializing sequence. Once
everything is ready, it wires-up the I/O command handlers. See below for
patch-series overview.

All the testcases are passing for the ZoneFS where ZBD exported with
NVMeOF backed by null_blk ZBD and null_blk ZBD without NVMeOF. Adding
test result below.

Note: This patch-series is based on the earlier posted patch series :-

[PATCH V2 0/4] nvmet: admin-cmd related cleanups and a fix
http://lists.infradead.org/pipermail/linux-nvme/2021-January/021729.html

-ck

Changes from V8:-

1. Rebase and retest on latest nvme-5.11.
2. Export ctrl->cap csi support only if CONFIG_BLK_DEV_ZONE is set.
3. Add a fix to admin ns-desc list handler for handling default csi.

Changes from V7:-

1. Just like what block layer provides an API for bio_init(), provide
   nvmet_bio_init() such that we move bio initialization code for
   nvme-read-write commands from bdev and zns backend into the centralize
   helper. 
2. With bdev/zns/file now we have three backends that are checking for
   req->sg_cnt and calling nvmet_check_transfer_len() before we process
   nvme-read-write commands. Move this duplicate code from three
   backeneds into the helper.
3. Export and use the nvmet_bio_done() callback in
   nvmet_execute_zone_append() instead of open coding it. This also
   avoids code duplication for bio & request completion with the
   error log page update.
4. Add a zonefs test log for a dm-linear device created on top of an
   SMR HDD exported with the NVMeOF ZNS backend via nvme-loop.

Changes from V6:-

1. Instead of calling report zones in a loop to find conventional
   zones, use the loop inside blkdev_report_zones() -> LLD
   report_zones(); this also simplifies the report zones callback.
2. Fix the bug in nvmet_bdev_has_conv_zones().
3. Remove conditional operators in the nvmet_bdev_execute_zone_append().

Changes from V5:-

1.  Use bio->bi_iter.bi_sector for result of the REQ_OP_ZONE_APPEND
    command.
2.  Add endianness to the helper nvmet_sect_to_lba().
3.  Make bufsize u32 in zone mgmt recv command handler.
4.  Add __GFP_ZERO for report zone data buffer to return clean buffer.

Changes from V4:-

1.  Don't use bio_iov_iter_get_pages(); instead, add a patch to export
    bio_add_hw_page() and call it directly for zone append.
2.  Add inline vector optimization for append bio.
3.  Update the commit logs for the patches.
4.  Remove ZNS related identify data structures, use individual members.
5.  Add a comment for macro NVMET_MPSMIN_SHIFT.
6.  Remove nvmet_bdev() helper.
7.  Move the command set identifier code into common code.
8.  Use IS_ENABLED() and move helpers from zns.c into common code.
9.  Add a patch to support Command Set identifiers.
10. Open code nvmet_bdev_validate_zns_zones().
11. Remove the per namespace min zasl calculation and don't allow
    namespaces with zasl value > the first ns zasl value.
12. Move the stubs into the header file.
13. Add lba to/from sector conversion helpers and update the
    io-cmd-bdev.c to avoid the code duplication.
14. Add everything into one patch for zns command handlers and 
    respective calls from the target code.
15. Remove the trim ns-desclist admin callback patch from this series.
16. Add bio get and put helpers patches to reduce the duplicate code in
    generic bdev, passthru, and generic zns backend.

Changes from V3:-

1.  Get rid of the bio_max_zasl check.
2.  Remove extra lines.
3.  Remove the block layer API export patch.
4.  Remove the bvec check in bio_iov_iter_get_pages() for
    REQ_OP_ZONE_APPEND so that we can reuse the code.

Changes from V2:-

1.  Move conventional zone bitmap check into 
    nvmet_bdev_validate_zns_zones(). 
2.  Don't use report zones call to check the runt zone.
3.  Trim nvmet_zasl() helper.
4.  Fix typo in the nvmet_zns_update_zasl().
5.  Remove the comment and fix the mdts calculation in
    nvmet_execute_identify_cns_cs_ctrl().
6.  Use u64 for bufsize in nvmet_bdev_execute_zone_mgmt_recv().
7.  Remove nvmet_zones_to_desc_size() and fix the nr_zones
    calculation.
8.  Remove the op variable in nvmet_bdev_execute_zone_append().
9.  Fix the nr_zones calculation nvmet_bdev_execute_zone_mgmt_recv().
10. Update cover letter subject.

Changes from V1:-

1.  Remove the nvmet-$(CONFIG_BLK_DEV_ZONED) += zns.o.
2.  Mark helpers inline.
3.  Fix typos in the comments and update the comments.
4.  Get rid of the curly brackets.
5.  Don't allow drives with last smaller zones.
6.  Calculate the zasl as a function of max_zone_append_sectors and
    bio_max_pages so we don't have to split the bio.
7.  Add global subsys->zasl and update the zasl when new namespace
    is enabled.
8.  Remove the loop in nvmet_bdev_execute_zone_mgmt_recv() and
    move the functionality into the report zones callback.
9.  Add goto for default case in nvmet_bdev_execute_zone_mgmt_send().
10. Allocate the zones buffer with zones size instead of bdev nr_zones.

Chaitanya Kulkarni (9):
  block: export bio_add_hw_pages()
  nvmet: add lba to sect conversion helpers
  nvmet: add NVM command set identifier support
  nvmet: add ZBD over ZNS backend support
  nvmet: add bio get helper for different backends
  nvmet: add bio init helper for different backends
  nvmet: add bio put helper for different backends
  nvmet: add common I/O length check helper
  nvmet: call nvmet_bio_done() for zone append

 block/bio.c                       |   1 +
 block/blk.h                       |   4 -
 drivers/nvme/target/Makefile      |   1 +
 drivers/nvme/target/admin-cmd.c   |  67 ++++--
 drivers/nvme/target/core.c        |  16 +-
 drivers/nvme/target/io-cmd-bdev.c |  67 +++---
 drivers/nvme/target/io-cmd-file.c |   7 +-
 drivers/nvme/target/nvmet.h       |  97 +++++++++
 drivers/nvme/target/passthru.c    |  11 +-
 drivers/nvme/target/zns.c         | 328 ++++++++++++++++++++++++++++++
 include/linux/blkdev.h            |   4 +
 11 files changed, 536 insertions(+), 67 deletions(-)
 create mode 100644 drivers/nvme/target/zns.c

* Zonefs Test log with dm-linear on the top of SMR HDD:-
--------------------------------------------------------------------------------

1. Test Zoned Block Device info :- 
--------------------------------------------------------------------------------

# fdisk  -l /dev/sdh
Disk /dev/sdh: 13.64 TiB, 15000173281280 bytes, 3662151680 sectors
Disk model: HGST HSH721415AL
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
# cat /sys/block/sdh/queue/nr_zones
55880
# cat /sys/block/sdh/queue/zoned
host-managed
# cat /sys/block/sdh/queue/zone_append_max_bytes
688128

2. Creating NVMeOF target backed by dm-linear on the top of ZBD
--------------------------------------------------------------------------------

# ./zbdev.sh 1 dm-zbd
++ NQN=dm-zbd
++ echo '0 29022486528 linear /dev/sdh 274726912' | dmsetup create cksdh
9 directories, 4 files
++ mkdir /sys/kernel/config/nvmet/subsystems/dm-zbd
++ mkdir /sys/kernel/config/nvmet/subsystems/dm-zbd/namespaces/1
++ echo -n /dev/dm-0
++ cat /sys/kernel/config/nvmet/subsystems/dm-zbd/namespaces/1/device_path
/dev/dm-0
++ echo 1
++ mkdir /sys/kernel/config/nvmet/ports/1/
++ echo -n loop
++ echo -n 1
++ ln -s /sys/kernel/config/nvmet/subsystems/dm-zbd /sys/kernel/config/nvmet/ports/1/subsystems/
++ sleep 1
++ echo transport=loop,nqn=dm-zbd
++ sleep 1
++ dmesg -c
[233450.572565] nvmet: adding nsid 1 to subsystem dm-zbd
[233452.269477] nvmet: creating controller 1 for subsystem dm-zbd for NQN nqn.2014-08.org.nvmexpress:uuid:853d7e82-8018-44ce-8784-ab81e7465ad9.
[233452.283352] nvme nvme0: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices.
[233452.292805] nvme nvme0: creating 8 I/O queues.
[233452.299210] nvme nvme0: new ctrl: "dm-zbd"

3. dm-linear and backend SMR HDD association :-
--------------------------------------------------------------------------------

# cat /sys/kernel/config/nvmet/subsystems/dm-zbd/namespaces/1/device_path 
/dev/dm-0
# dmsetup ls --tree
 cksdh (252:0)
	 └─ (8:112)
# lsblk | head -3
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sdh       8:112  0  13.6T  0 disk 
└─cksdh 252:0    0  13.5T  0 dm   

4. NVMeOF controller :- 
--------------------------------------------------------------------------------

# nvme list | tr -s ' ' ' '  
Node SN Model Namespace Usage Format FW Rev  
/dev/nvme0n1 8c6f348dcd64404c Linux 1 14.86 TB / 14.86 TB 4 KiB + 0 B 5.10.0nv

5. Zonefs tests results :-
--------------------------------------------------------------------------------

# ./zonefs-tests.sh /dev/nvme0n1 
Gathering information on /dev/nvme0n1...
zonefs-tests on /dev/nvme0n1:
  55356 zones (0 conventional zones, 55356 sequential zones)
  524288 512B sectors zone size (256 MiB)
  1 max open zones
Running tests
  Test 0010:  mkzonefs (options)                                   ... PASS
  Test 0011:  mkzonefs (force format)                              ... PASS
  Test 0012:  mkzonefs (invalid device)                            ... PASS
  Test 0013:  mkzonefs (super block zone state)                    ... PASS
  Test 0020:  mount (default)                                      ... PASS
  Test 0021:  mount (invalid device)                               ... PASS
  Test 0022:  mount (check mount directory sub-directories)        ... PASS
  Test 0023:  mount (options)                                      ... PASS
  Test 0030:  Number of files (default)                            ... PASS
  Test 0031:  Number of files (aggr_cnv)                           ... skip
  Test 0032:  Number of files using stat (default)                 ... PASS
  Test 0033:  Number of files using stat (aggr_cnv)                ... PASS
  Test 0034:  Number of blocks using stat (default)                ... PASS
  Test 0035:  Number of blocks using stat (aggr_cnv)               ... PASS
  Test 0040:  Files permissions (default)                          ... PASS
  Test 0041:  Files permissions (aggr_cnv)                         ... skip
  Test 0042:  Files permissions (set value)                        ... PASS
  Test 0043:  Files permissions (set value + aggr_cnv)             ... skip
  Test 0050:  Files owner (default)                                ... PASS
  Test 0051:  Files owner (aggr_cnv)                               ... skip
  Test 0052:  Files owner (set value)                              ... PASS
  Test 0053:  Files owner (set value + aggr_cnv)                   ... skip
  Test 0060:  Files size (default)                                 ... PASS
  Test 0061:  Files size (aggr_cnv)                                ... skip
  Test 0070:  Conventional file truncate                           ... skip
  Test 0071:  Conventional file truncate (aggr_cnv)                ... skip
  Test 0072:  Conventional file unlink                             ... skip
  Test 0073:  Conventional file unlink (aggr_cnv)                  ... skip
  Test 0074:  Conventional file random write                       ... skip
  Test 0075:  Conventional file random write (direct)              ... skip
  Test 0076:  Conventional file random write (aggr_cnv)            ... skip
  Test 0077:  Conventional file random write (aggr_cnv, direct)    ... skip
  Test 0078:  Conventional file mmap read/write                    ... skip
  Test 0079:  Conventional file mmap read/write (aggr_cnv)         ... skip
  Test 0080:  Sequential file truncate                             ... PASS
  Test 0081:  Sequential file unlink                               ... PASS
  Test 0082:  Sequential file buffered write IO                    ... PASS
  Test 0083:  Sequential file overwrite                            ... PASS
  Test 0084:  Sequential file unaligned write (sync IO)            ... PASS
  Test 0085:  Sequential file unaligned write (async IO)           ... PASS
  Test 0086:  Sequential file append (sync)                        ... PASS
  Test 0087:  Sequential file append (async)                       ... PASS
  Test 0088:  Sequential file random read                          ... PASS
  Test 0089:  Sequential file mmap read/write                      ... PASS
  Test 0090:  sequential file 4K synchronous write                 ... PASS
  Test 0091:  Sequential file large synchronous write              ... PASS

46 / 46 tests passed

* Without CONFIG_BLK_DEV_ZONED nvme tests :-
--------------------------------------------------------------------------------

#
# grep -i blk_dev_zoned .config
# CONFIG_BLK_DEV_ZONED is not set
# makej M=drivers/nvme/ clean 
  CLEAN   drivers/nvme//Module.symvers
# makej M=drivers/nvme/ 
  CC [M]  drivers/nvme//host/core.o
  CC [M]  drivers/nvme//host/trace.o
  CC [M]  drivers/nvme//host/lightnvm.o
  CC [M]  drivers/nvme//target/core.o
  CC [M]  drivers/nvme//host/hwmon.o
  CC [M]  drivers/nvme//target/configfs.o
  CC [M]  drivers/nvme//host/pci.o
  CC [M]  drivers/nvme//target/admin-cmd.o
  CC [M]  drivers/nvme//host/fabrics.o
  CC [M]  drivers/nvme//host/rdma.o
  CC [M]  drivers/nvme//target/fabrics-cmd.o
  CC [M]  drivers/nvme//target/discovery.o
  CC [M]  drivers/nvme//host/fc.o
  CC [M]  drivers/nvme//target/io-cmd-file.o
  CC [M]  drivers/nvme//host/tcp.o
  CC [M]  drivers/nvme//target/io-cmd-bdev.o
  CC [M]  drivers/nvme//target/passthru.o
  CC [M]  drivers/nvme//target/trace.o
  CC [M]  drivers/nvme//target/loop.o
  CC [M]  drivers/nvme//target/rdma.o
  CC [M]  drivers/nvme//target/fc.o
  CC [M]  drivers/nvme//target/fcloop.o
  CC [M]  drivers/nvme//target/tcp.o
  LD [M]  drivers/nvme//target/nvme-loop.o
  LD [M]  drivers/nvme//target/nvme-fcloop.o
  LD [M]  drivers/nvme//target/nvmet-tcp.o
  LD [M]  drivers/nvme//host/nvme-fabrics.o
  LD [M]  drivers/nvme//host/nvme.o
  LD [M]  drivers/nvme//host/nvme-rdma.o
  LD [M]  drivers/nvme//target/nvmet-rdma.o
  LD [M]  drivers/nvme//target/nvmet.o
  LD [M]  drivers/nvme//target/nvmet-fc.o
  LD [M]  drivers/nvme//host/nvme-tcp.o
  LD [M]  drivers/nvme//host/nvme-fc.o
  LD [M]  drivers/nvme//host/nvme-core.o
  MODPOST drivers/nvme//Module.symvers
  CC [M]  drivers/nvme//host/nvme-core.mod.o
  CC [M]  drivers/nvme//host/nvme-fabrics.mod.o
  CC [M]  drivers/nvme//host/nvme-fc.mod.o
  CC [M]  drivers/nvme//host/nvme-rdma.mod.o
  CC [M]  drivers/nvme//host/nvme-tcp.mod.o
  CC [M]  drivers/nvme//host/nvme.mod.o
  CC [M]  drivers/nvme//target/nvme-fcloop.mod.o
  CC [M]  drivers/nvme//target/nvme-loop.mod.o
  CC [M]  drivers/nvme//target/nvmet-fc.mod.o
  CC [M]  drivers/nvme//target/nvmet-rdma.mod.o
  CC [M]  drivers/nvme//target/nvmet-tcp.mod.o
  CC [M]  drivers/nvme//target/nvmet.mod.o
  LD [M]  drivers/nvme//target/nvme-fcloop.ko
  LD [M]  drivers/nvme//host/nvme-tcp.ko
  LD [M]  drivers/nvme//host/nvme-core.ko
  LD [M]  drivers/nvme//target/nvmet-tcp.ko
  LD [M]  drivers/nvme//target/nvme-loop.ko
  LD [M]  drivers/nvme//target/nvmet-fc.ko
  LD [M]  drivers/nvme//host/nvme-fabrics.ko
  LD [M]  drivers/nvme//host/nvme-fc.ko
  LD [M]  drivers/nvme//target/nvmet-rdma.ko
  LD [M]  drivers/nvme//host/nvme-rdma.ko
  LD [M]  drivers/nvme//host/nvme.ko
  LD [M]  drivers/nvme//target/nvmet.ko
#
# cdblktests 
# ./check tests/nvme/
nvme/002 (create many subsystems and test discovery)         [passed]
    runtime    ...  27.640s
nvme/003 (test if we're sending keep-alives to a discovery controller) [passed]
    runtime  10.145s  ...  10.147s
nvme/004 (test nvme and nvmet UUID NS descriptors)           [passed]
    runtime  1.713s  ...  1.712s
nvme/005 (reset local loopback target)                       [not run]
    nvme_core module does not have parameter multipath
nvme/006 (create an NVMeOF target with a block device-backed ns) [passed]
    runtime  0.111s  ...  0.115s
nvme/007 (create an NVMeOF target with a file-backed ns)     [passed]
    runtime  0.081s  ...  0.069s
nvme/008 (create an NVMeOF host with a block device-backed ns) [passed]
    runtime  1.690s  ...  1.727s
nvme/009 (create an NVMeOF host with a file-backed ns)       [passed]
    runtime  1.659s  ...  1.661s
nvme/010 (run data verification fio job on NVMeOF block device-backed ns) [passed]
    runtime  28.781s  ...  30.166s
nvme/011 (run data verification fio job on NVMeOF file-backed ns) [passed]
    runtime  253.831s  ...  238.774s
nvme/012 (run mkfs and data verification fio job on NVMeOF block device-backed ns) [passed]
    runtime  40.003s  ...  68.076s
nvme/013 (run mkfs and data verification fio job on NVMeOF file-backed ns) [passed]
    runtime  272.649s  ...  283.720s
nvme/014 (flush a NVMeOF block device-backed ns)             [passed]
    runtime  21.772s  ...  21.397s
nvme/015 (unit test for NVMe flush for file backed ns)       [passed]
    runtime  21.908s  ...  18.622s
nvme/016 (create/delete many NVMeOF block device-backed ns and test discovery) [passed]
    runtime  15.860s  ...  18.313s
nvme/017 (create/delete many file-ns and test discovery)     [passed]
    runtime  16.470s  ...  18.374s
nvme/018 (unit test NVMe-oF out of range access on a file backend) [passed]
    runtime  1.665s  ...  1.890s
nvme/019 (test NVMe DSM Discard command on NVMeOF block-device ns) [passed]
    runtime  1.681s  ...  1.982s
nvme/020 (test NVMe DSM Discard command on NVMeOF file-backed ns) [passed]
    runtime  1.645s  ...  1.913s
nvme/021 (test NVMe list command on NVMeOF file-backed ns)   [passed]
    runtime  1.648s  ...  1.956s
nvme/022 (test NVMe reset command on NVMeOF file-backed ns)  [passed]
    runtime  2.063s  ...  2.553s
nvme/023 (test NVMe smart-log command on NVMeOF block-device ns) [passed]
    runtime  1.692s  ...  2.588s
nvme/024 (test NVMe smart-log command on NVMeOF file-backed ns) [passed]
    runtime  1.643s  ...  1.656s
nvme/025 (test NVMe effects-log command on NVMeOF file-backed ns) [passed]
    runtime  1.640s  ...  1.668s
nvme/026 (test NVMe ns-descs command on NVMeOF file-backed ns) [passed]
    runtime  1.643s  ...  1.961s
nvme/027 (test NVMe ns-rescan command on NVMeOF file-backed ns) [passed]
    runtime  1.641s  ...  1.677s
nvme/028 (test NVMe list-subsys command on NVMeOF file-backed ns) [passed]
    runtime  1.648s  ...  1.868s
nvme/029 (test userspace IO via nvme-cli read/write interface) [passed]
    runtime  1.982s  ...  2.703s
nvme/030 (ensure the discovery generation counter is updated appropriately) [passed]
    runtime  0.308s  ...  0.328s
nvme/031 (test deletion of NVMeOF controllers immediately after setup) [passed]
    runtime  5.432s  ...  7.495s
nvme/038 (test deletion of NVMeOF subsystem without enabling) [passed]
    runtime  0.053s  ...  0.046s

* With CONFIG_BLK_DEV_ZONED nvme and zonefs tests on membacked null_blk zoned :-
--------------------------------------------------------------------------------

# grep -i blk_dev_zoned .config
CONFIG_BLK_DEV_ZONED=y
# make M=drivers/nvme/ clean 
  CLEAN   drivers/nvme//Module.symvers
# make M=drivers/nvme/ 
  CC [M]  drivers/nvme//host/core.o
  CC [M]  drivers/nvme//host/trace.o
  CC [M]  drivers/nvme//host/lightnvm.o
  CC [M]  drivers/nvme//host/zns.o
  CC [M]  drivers/nvme//host/hwmon.o
  LD [M]  drivers/nvme//host/nvme-core.o
  CC [M]  drivers/nvme//host/pci.o
  LD [M]  drivers/nvme//host/nvme.o
  CC [M]  drivers/nvme//host/fabrics.o
  LD [M]  drivers/nvme//host/nvme-fabrics.o
  CC [M]  drivers/nvme//host/rdma.o
  LD [M]  drivers/nvme//host/nvme-rdma.o
  CC [M]  drivers/nvme//host/fc.o
  LD [M]  drivers/nvme//host/nvme-fc.o
  CC [M]  drivers/nvme//host/tcp.o
  LD [M]  drivers/nvme//host/nvme-tcp.o
  CC [M]  drivers/nvme//target/core.o
  CC [M]  drivers/nvme//target/configfs.o
  CC [M]  drivers/nvme//target/admin-cmd.o
  CC [M]  drivers/nvme//target/fabrics-cmd.o
  CC [M]  drivers/nvme//target/discovery.o
  CC [M]  drivers/nvme//target/io-cmd-file.o
  CC [M]  drivers/nvme//target/io-cmd-bdev.o
  CC [M]  drivers/nvme//target/passthru.o
  CC [M]  drivers/nvme//target/zns.o
  CC [M]  drivers/nvme//target/trace.o
  LD [M]  drivers/nvme//target/nvmet.o
  CC [M]  drivers/nvme//target/loop.o
  LD [M]  drivers/nvme//target/nvme-loop.o
  CC [M]  drivers/nvme//target/rdma.o
  LD [M]  drivers/nvme//target/nvmet-rdma.o
  CC [M]  drivers/nvme//target/fc.o
  LD [M]  drivers/nvme//target/nvmet-fc.o
  CC [M]  drivers/nvme//target/fcloop.o
  LD [M]  drivers/nvme//target/nvme-fcloop.o
  CC [M]  drivers/nvme//target/tcp.o
  LD [M]  drivers/nvme//target/nvmet-tcp.o
  MODPOST drivers/nvme//Module.symvers
  CC [M]  drivers/nvme//host/nvme-core.mod.o
  LD [M]  drivers/nvme//host/nvme-core.ko
  CC [M]  drivers/nvme//host/nvme-fabrics.mod.o
  LD [M]  drivers/nvme//host/nvme-fabrics.ko
  CC [M]  drivers/nvme//host/nvme-fc.mod.o
  LD [M]  drivers/nvme//host/nvme-fc.ko
  CC [M]  drivers/nvme//host/nvme-rdma.mod.o
  LD [M]  drivers/nvme//host/nvme-rdma.ko
  CC [M]  drivers/nvme//host/nvme-tcp.mod.o
  LD [M]  drivers/nvme//host/nvme-tcp.ko
  CC [M]  drivers/nvme//host/nvme.mod.o
  LD [M]  drivers/nvme//host/nvme.ko
  CC [M]  drivers/nvme//target/nvme-fcloop.mod.o
  LD [M]  drivers/nvme//target/nvme-fcloop.ko
  CC [M]  drivers/nvme//target/nvme-loop.mod.o
  LD [M]  drivers/nvme//target/nvme-loop.ko
  CC [M]  drivers/nvme//target/nvmet-fc.mod.o
  LD [M]  drivers/nvme//target/nvmet-fc.ko
  CC [M]  drivers/nvme//target/nvmet-rdma.mod.o
  LD [M]  drivers/nvme//target/nvmet-rdma.ko
  CC [M]  drivers/nvme//target/nvmet-tcp.mod.o
  LD [M]  drivers/nvme//target/nvmet-tcp.ko
  CC [M]  drivers/nvme//target/nvmet.mod.o
  LD [M]  drivers/nvme//target/nvmet.ko
# 
# cdblktests 
# ./check tests/nvme/
nvme/002 (create many subsystems and test discovery)         [passed]
    runtime  24.378s  ...  24.636s
nvme/003 (test if we're sending keep-alives to a discovery controller) [passed]
    runtime  10.133s  ...  10.152s
nvme/004 (test nvme and nvmet UUID NS descriptors)           [passed]
    runtime  2.463s  ...  2.478s
nvme/005 (reset local loopback target)                       [not run]
    nvme_core module does not have parameter multipath
nvme/006 (create an NVMeOF target with a block device-backed ns) [passed]
    runtime  0.095s  ...  0.122s
nvme/007 (create an NVMeOF target with a file-backed ns)     [passed]
    runtime  0.065s  ...  0.079s
nvme/008 (create an NVMeOF host with a block device-backed ns) [passed]
    runtime  2.473s  ...  2.501s
nvme/009 (create an NVMeOF host with a file-backed ns)       [passed]
    runtime  2.460s  ...  2.424s
nvme/010 (run data verification fio job on NVMeOF block device-backed ns) [passed]
    runtime  24.526s  ...  28.015s
nvme/011 (run data verification fio job on NVMeOF file-backed ns) [passed]
    runtime  265.967s  ...  282.717s
nvme/012 (run mkfs and data verification fio job on NVMeOF block device-backed ns) [passed]
    runtime  44.665s  ...  48.124s
nvme/013 (run mkfs and data verification fio job on NVMeOF file-backed ns) [passed]
    runtime  261.739s  ...  352.331s
nvme/014 (flush a NVMeOF block device-backed ns)             [passed]
    runtime  21.268s  ...  22.013s
nvme/015 (unit test for NVMe flush for file backed ns)       [passed]
    runtime  18.820s  ...  22.104s
nvme/016 (create/delete many NVMeOF block device-backed ns and test discovery) [passed]
    runtime  13.899s  ...  14.322s
nvme/017 (create/delete many file-ns and test discovery)     [passed]
    runtime  14.322s  ...  14.031s
nvme/018 (unit test NVMe-oF out of range access on a file backend) [passed]
    runtime  2.450s  ...  2.444s
nvme/019 (test NVMe DSM Discard command on NVMeOF block-device ns) [passed]
    runtime  2.475s  ...  2.489s
nvme/020 (test NVMe DSM Discard command on NVMeOF file-backed ns) [passed]
    runtime  2.410s  ...  2.448s
nvme/021 (test NVMe list command on NVMeOF file-backed ns)   [passed]
    runtime  2.441s  ...  2.439s
nvme/022 (test NVMe reset command on NVMeOF file-backed ns)  [passed]
    runtime  2.864s  ...  2.863s
nvme/023 (test NVMe smart-log command on NVMeOF block-device ns) [passed]
    runtime  2.465s  ...  2.446s
nvme/024 (test NVMe smart-log command on NVMeOF file-backed ns) [passed]
    runtime  2.416s  ...  2.411s
nvme/025 (test NVMe effects-log command on NVMeOF file-backed ns) [passed]
    runtime  2.419s  ...  2.748s
nvme/026 (test NVMe ns-descs command on NVMeOF file-backed ns) [passed]
    runtime  2.422s  ...  2.410s
nvme/027 (test NVMe ns-rescan command on NVMeOF file-backed ns) [passed]
    runtime  2.456s  ...  2.462s
nvme/028 (test NVMe list-subsys command on NVMeOF file-backed ns) [passed]
    runtime  2.427s  ...  2.429s
nvme/029 (test userspace IO via nvme-cli read/write interface) [passed]
    runtime  2.751s  ...  2.755s
nvme/030 (ensure the discovery generation counter is updated appropriately) [passed]
    runtime  0.346s  ...  0.357s
nvme/031 (test deletion of NVMeOF controllers immediately after setup) [passed]
    runtime  13.601s  ...  13.591s
nvme/038 (test deletion of NVMeOF subsystem without enabling) [passed]
    runtime  0.039s  ...  0.059s
#
# cdzonefstest 
# ./zonefs-tests.sh /dev/nvme1n1 
Gathering information on /dev/nvme1n1...
zonefs-tests on /dev/nvme1n1:
  16 zones (0 conventional zones, 16 sequential zones)
  131072 512B sectors zone size (64 MiB)
  1 max open zones
Running tests
  Test 0010:  mkzonefs (options)                                   ... PASS
  Test 0011:  mkzonefs (force format)                              ... PASS
  Test 0012:  mkzonefs (invalid device)                            ... PASS
  Test 0013:  mkzonefs (super block zone state)                    ... PASS
  Test 0020:  mount (default)                                      ... PASS
  Test 0021:  mount (invalid device)                               ... PASS
  Test 0022:  mount (check mount directory sub-directories)        ... PASS
  Test 0023:  mount (options)                                      ... PASS
  Test 0030:  Number of files (default)                            ... PASS
  Test 0031:  Number of files (aggr_cnv)                           ... skip
  Test 0032:  Number of files using stat (default)                 ... PASS
  Test 0033:  Number of files using stat (aggr_cnv)                ... PASS
  Test 0034:  Number of blocks using stat (default)                ... PASS
  Test 0035:  Number of blocks using stat (aggr_cnv)               ... PASS
  Test 0040:  Files permissions (default)                          ... PASS
  Test 0041:  Files permissions (aggr_cnv)                         ... skip
  Test 0042:  Files permissions (set value)                        ... PASS
  Test 0043:  Files permissions (set value + aggr_cnv)             ... skip
  Test 0050:  Files owner (default)                                ... PASS
  Test 0051:  Files owner (aggr_cnv)                               ... skip
  Test 0052:  Files owner (set value)                              ... PASS
  Test 0053:  Files owner (set value + aggr_cnv)                   ... skip
  Test 0060:  Files size (default)                                 ... PASS
  Test 0061:  Files size (aggr_cnv)                                ... skip
  Test 0070:  Conventional file truncate                           ... skip
  Test 0071:  Conventional file truncate (aggr_cnv)                ... skip
  Test 0072:  Conventional file unlink                             ... skip
  Test 0073:  Conventional file unlink (aggr_cnv)                  ... skip
  Test 0074:  Conventional file random write                       ... skip
  Test 0075:  Conventional file random write (direct)              ... skip
  Test 0076:  Conventional file random write (aggr_cnv)            ... skip
  Test 0077:  Conventional file random write (aggr_cnv, direct)    ... skip
  Test 0078:  Conventional file mmap read/write                    ... skip
  Test 0079:  Conventional file mmap read/write (aggr_cnv)         ... skip
  Test 0080:  Sequential file truncate                             ... PASS
  Test 0081:  Sequential file unlink                               ... PASS
  Test 0082:  Sequential file buffered write IO                    ... PASS
  Test 0083:  Sequential file overwrite                            ... PASS
  Test 0084:  Sequential file unaligned write (sync IO)            ... PASS
  Test 0085:  Sequential file unaligned write (async IO)           ... PASS
  Test 0086:  Sequential file append (sync)                        ... PASS
  Test 0087:  Sequential file append (async)                       ... PASS
  Test 0088:  Sequential file random read                          ... PASS
  Test 0089:  Sequential file mmap read/write                      ... PASS
  Test 0090:  sequential file 4K synchronous write                 ... PASS
  Test 0091:  Sequential file large synchronous write              ... PASS

46 / 46 tests passed
-- 
2.22.1


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [PATCH V9 1/9] block: export bio_add_hw_pages()
  2021-01-12  4:26 ` Chaitanya Kulkarni
@ 2021-01-12  4:26   ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 98+ messages in thread
From: Chaitanya Kulkarni @ 2021-01-12  4:26 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: hch, sagi, damien.lemoal, Chaitanya Kulkarni

To implement the NVMe Zone Append command on the NVMeOF target side for
generic zoned block devices with the NVMe Zoned Namespaces interface, we
need to build bios within the hardware limits, i.e. use
bio_add_hw_page() with queue_max_zone_append_sectors() instead of
bio_add_page().

Without this export, the NVMeOF target would have to go through
bio_iov_iter_get_pages(), the existing bio_add_hw_page() caller, which
results in extra, inefficient work.

Export the API so that NVMeOF ZBD over ZNS backend can use it to build
Zone Append bios.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 block/bio.c            | 1 +
 block/blk.h            | 4 ----
 include/linux/blkdev.h | 4 ++++
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 1f2cc1fbe283..5cbd56b54f98 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -826,6 +826,7 @@ int bio_add_hw_page(struct request_queue *q, struct bio *bio,
 	bio->bi_iter.bi_size += len;
 	return len;
 }
+EXPORT_SYMBOL(bio_add_hw_page);
 
 /**
  * bio_add_pc_page	- attempt to add page to passthrough bio
diff --git a/block/blk.h b/block/blk.h
index 7550364c326c..200030b2d74f 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -351,8 +351,4 @@ int bdev_resize_partition(struct block_device *bdev, int partno,
 		sector_t start, sector_t length);
 int disk_expand_part_tbl(struct gendisk *disk, int target);
 
-int bio_add_hw_page(struct request_queue *q, struct bio *bio,
-		struct page *page, unsigned int len, unsigned int offset,
-		unsigned int max_sectors, bool *same_page);
-
 #endif /* BLK_INTERNAL_H */
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 070de09425ad..028ccc9bdf8d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -2005,6 +2005,10 @@ struct block_device *I_BDEV(struct inode *inode);
 struct block_device *bdgrab(struct block_device *bdev);
 void bdput(struct block_device *);
 
+int bio_add_hw_page(struct request_queue *q, struct bio *bio,
+		struct page *page, unsigned int len, unsigned int offset,
+		unsigned int max_sectors, bool *same_page);
+
 #ifdef CONFIG_BLOCK
 void invalidate_bdev(struct block_device *bdev);
 int truncate_bdev_range(struct block_device *bdev, fmode_t mode, loff_t lstart,
-- 
2.22.1


^ permalink raw reply related	[flat|nested] 98+ messages in thread


* [PATCH V9 2/9] nvmet: add lba to sect conversion helpers
  2021-01-12  4:26 ` Chaitanya Kulkarni
@ 2021-01-12  4:26   ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 98+ messages in thread
From: Chaitanya Kulkarni @ 2021-01-12  4:26 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: hch, sagi, damien.lemoal, Chaitanya Kulkarni

In this preparation patch, we add helpers to convert LBAs to sectors
and sectors to LBAs. This is needed to eliminate code duplication in the
ZBD backend.

Use these helpers in the block device backend.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/io-cmd-bdev.c |  8 +++-----
 drivers/nvme/target/nvmet.h       | 10 ++++++++++
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index 125dde3f410e..23095bdfce06 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -256,8 +256,7 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
 	if (is_pci_p2pdma_page(sg_page(req->sg)))
 		op |= REQ_NOMERGE;
 
-	sector = le64_to_cpu(req->cmd->rw.slba);
-	sector <<= (req->ns->blksize_shift - 9);
+	sector = nvmet_lba_to_sect(req->ns, req->cmd->rw.slba);
 
 	if (req->transfer_len <= NVMET_MAX_INLINE_DATA_LEN) {
 		bio = &req->b.inline_bio;
@@ -345,7 +344,7 @@ static u16 nvmet_bdev_discard_range(struct nvmet_req *req,
 	int ret;
 
 	ret = __blkdev_issue_discard(ns->bdev,
-			le64_to_cpu(range->slba) << (ns->blksize_shift - 9),
+			nvmet_lba_to_sect(ns, range->slba),
 			le32_to_cpu(range->nlb) << (ns->blksize_shift - 9),
 			GFP_KERNEL, 0, bio);
 	if (ret && ret != -EOPNOTSUPP) {
@@ -414,8 +413,7 @@ static void nvmet_bdev_execute_write_zeroes(struct nvmet_req *req)
 	if (!nvmet_check_transfer_len(req, 0))
 		return;
 
-	sector = le64_to_cpu(write_zeroes->slba) <<
-		(req->ns->blksize_shift - 9);
+	sector = nvmet_lba_to_sect(req->ns, write_zeroes->slba);
 	nr_sector = (((sector_t)le16_to_cpu(write_zeroes->length) + 1) <<
 		(req->ns->blksize_shift - 9));
 
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 592763732065..8776dd1a0490 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -603,4 +603,14 @@ static inline bool nvmet_ns_has_pi(struct nvmet_ns *ns)
 	return ns->pi_type && ns->metadata_size == sizeof(struct t10_pi_tuple);
 }
 
+static inline __le64 nvmet_sect_to_lba(struct nvmet_ns *ns, sector_t sect)
+{
+	return cpu_to_le64(sect >> (ns->blksize_shift - SECTOR_SHIFT));
+}
+
+static inline sector_t nvmet_lba_to_sect(struct nvmet_ns *ns, __le64 lba)
+{
+	return le64_to_cpu(lba) << (ns->blksize_shift - SECTOR_SHIFT);
+}
+
 #endif /* _NVMET_H */
-- 
2.22.1


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH V9 3/9] nvmet: add NVM command set identifier support
  2021-01-12  4:26 ` Chaitanya Kulkarni
@ 2021-01-12  4:26   ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 98+ messages in thread
From: Chaitanya Kulkarni @ 2021-01-12  4:26 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: hch, sagi, damien.lemoal, Chaitanya Kulkarni

NVMe TP 4056 allows a controller to support different command sets.
The NVMeoF target currently only supports namespaces that contain
traditional logical blocks that may be randomly read and written. In
some applications there is value in exposing namespaces that contain
logical blocks with special access rules (e.g. a sequential-write-
required namespace such as a Zoned Namespace (ZNS)).

In order to support the Zoned Block Device (ZBD) backend, the
controller needs support for the ZNS Command Set Identifier (CSI).

In this preparation patch, we adjust the code so that it can support
different command sets. We update the namespace data structure to
store the CSI value, which defaults to NVME_CSI_NVM, the traditional
logical block namespace type.

The CSI support is required to implement the ZBD backend over the NVMe
ZNS interface, since ZNS commands belong to a different command set
than the default one.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/admin-cmd.c | 39 ++++++++++++++++++++++-----------
 drivers/nvme/target/core.c      | 13 ++++++++++-
 drivers/nvme/target/nvmet.h     |  1 +
 3 files changed, 39 insertions(+), 14 deletions(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 220edacfccfb..a50b7bcac67a 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -175,19 +175,26 @@ static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req)
 	if (!log)
 		goto out;
 
-	log->acs[nvme_admin_get_log_page]	= cpu_to_le32(1 << 0);
-	log->acs[nvme_admin_identify]		= cpu_to_le32(1 << 0);
-	log->acs[nvme_admin_abort_cmd]		= cpu_to_le32(1 << 0);
-	log->acs[nvme_admin_set_features]	= cpu_to_le32(1 << 0);
-	log->acs[nvme_admin_get_features]	= cpu_to_le32(1 << 0);
-	log->acs[nvme_admin_async_event]	= cpu_to_le32(1 << 0);
-	log->acs[nvme_admin_keep_alive]		= cpu_to_le32(1 << 0);
-
-	log->iocs[nvme_cmd_read]		= cpu_to_le32(1 << 0);
-	log->iocs[nvme_cmd_write]		= cpu_to_le32(1 << 0);
-	log->iocs[nvme_cmd_flush]		= cpu_to_le32(1 << 0);
-	log->iocs[nvme_cmd_dsm]			= cpu_to_le32(1 << 0);
-	log->iocs[nvme_cmd_write_zeroes]	= cpu_to_le32(1 << 0);
+	switch (req->cmd->get_log_page.csi) {
+	case NVME_CSI_NVM:
+		log->acs[nvme_admin_get_log_page]	= cpu_to_le32(1 << 0);
+		log->acs[nvme_admin_identify]		= cpu_to_le32(1 << 0);
+		log->acs[nvme_admin_abort_cmd]		= cpu_to_le32(1 << 0);
+		log->acs[nvme_admin_set_features]	= cpu_to_le32(1 << 0);
+		log->acs[nvme_admin_get_features]	= cpu_to_le32(1 << 0);
+		log->acs[nvme_admin_async_event]	= cpu_to_le32(1 << 0);
+		log->acs[nvme_admin_keep_alive]		= cpu_to_le32(1 << 0);
+
+		log->iocs[nvme_cmd_read]		= cpu_to_le32(1 << 0);
+		log->iocs[nvme_cmd_write]		= cpu_to_le32(1 << 0);
+		log->iocs[nvme_cmd_flush]		= cpu_to_le32(1 << 0);
+		log->iocs[nvme_cmd_dsm]			= cpu_to_le32(1 << 0);
+		log->iocs[nvme_cmd_write_zeroes]	= cpu_to_le32(1 << 0);
+		break;
+	default:
+		status = NVME_SC_INVALID_LOG_PAGE;
+		break;
+	}
 
 	status = nvmet_copy_to_sgl(req, 0, log, sizeof(*log));
 
@@ -606,6 +613,7 @@ static u16 nvmet_copy_ns_identifier(struct nvmet_req *req, u8 type, u8 len,
 
 static void nvmet_execute_identify_desclist(struct nvmet_req *req)
 {
+	u16 nvme_cis_nvm = NVME_CSI_NVM;
 	u16 status = 0;
 	off_t off = 0;
 
@@ -631,6 +639,11 @@ static void nvmet_execute_identify_desclist(struct nvmet_req *req)
 			goto out;
 	}
 
+	status = nvmet_copy_ns_identifier(req, NVME_NIDT_CSI, NVME_NIDT_CSI_LEN,
+					  &nvme_cis_nvm, &off);
+	if (status)
+		goto out;
+
 	if (sg_zero_buffer(req->sg, req->sg_cnt, NVME_IDENTIFY_DATA_SIZE - off,
 			off) != NVME_IDENTIFY_DATA_SIZE - off)
 		status = NVME_SC_INTERNAL | NVME_SC_DNR;
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index 8ce4d59cc9e7..672e4009f8d6 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -681,6 +681,7 @@ struct nvmet_ns *nvmet_ns_alloc(struct nvmet_subsys *subsys, u32 nsid)
 
 	uuid_gen(&ns->uuid);
 	ns->buffered_io = false;
+	ns->csi = NVME_CSI_NVM;
 
 	return ns;
 }
@@ -1103,6 +1104,16 @@ static inline u8 nvmet_cc_iocqes(u32 cc)
 	return (cc >> NVME_CC_IOCQES_SHIFT) & 0xf;
 }
 
+static inline bool nvmet_cc_css_check(u8 cc_css)
+{
+	switch (cc_css <<= NVME_CC_CSS_SHIFT) {
+	case NVME_CC_CSS_NVM:
+		return true;
+	default:
+		return false;
+	}
+}
+
 static void nvmet_start_ctrl(struct nvmet_ctrl *ctrl)
 {
 	lockdep_assert_held(&ctrl->lock);
@@ -1111,7 +1122,7 @@ static void nvmet_start_ctrl(struct nvmet_ctrl *ctrl)
 	    nvmet_cc_iocqes(ctrl->cc) != NVME_NVM_IOCQES ||
 	    nvmet_cc_mps(ctrl->cc) != 0 ||
 	    nvmet_cc_ams(ctrl->cc) != 0 ||
-	    nvmet_cc_css(ctrl->cc) != 0) {
+	    !nvmet_cc_css_check(nvmet_cc_css(ctrl->cc))) {
 		ctrl->csts = NVME_CSTS_CFS;
 		return;
 	}
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 8776dd1a0490..476b3cd91c65 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -81,6 +81,7 @@ struct nvmet_ns {
 	struct pci_dev		*p2p_dev;
 	int			pi_type;
 	int			metadata_size;
+	u8			csi;
 };
 
 static inline struct nvmet_ns *to_nvmet_ns(struct config_item *item)
-- 
2.22.1


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH V9 4/9] nvmet: add ZBD over ZNS backend support
  2021-01-12  4:26 ` Chaitanya Kulkarni
@ 2021-01-12  4:26   ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 98+ messages in thread
From: Chaitanya Kulkarni @ 2021-01-12  4:26 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: hch, sagi, damien.lemoal, Chaitanya Kulkarni

NVMe TP 4053 – Zoned Namespaces (ZNS) allows host software to
communicate with a non-volatile memory subsystem using zones, for
NVMe protocol based controllers. NVMeOF already supports ZNS NVMe
protocol compliant devices on the target in passthru mode. There are
generic zoned block devices, such as Shingled Magnetic Recording (SMR)
HDDs, that are not based on the NVMe protocol.

This patch adds a ZNS backend to support ZBDs for the NVMeOF target.

This support includes implementing the new command set NVME_CSI_ZNS
and adding handlers for the ZNS command set commands:
NVMe Identify Controller, NVMe Identify Namespace, NVMe Zone Append,
NVMe Zone Management Send and NVMe Zone Management Receive.

With the new command set identifier, we also update the target command
effects log to reflect the ZNS compliant commands.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/Makefile      |   1 +
 drivers/nvme/target/admin-cmd.c   |  28 +++
 drivers/nvme/target/core.c        |   3 +
 drivers/nvme/target/io-cmd-bdev.c |  33 ++-
 drivers/nvme/target/nvmet.h       |  38 ++++
 drivers/nvme/target/zns.c         | 342 ++++++++++++++++++++++++++++++
 6 files changed, 437 insertions(+), 8 deletions(-)
 create mode 100644 drivers/nvme/target/zns.c

diff --git a/drivers/nvme/target/Makefile b/drivers/nvme/target/Makefile
index ebf91fc4c72e..9837e580fa7e 100644
--- a/drivers/nvme/target/Makefile
+++ b/drivers/nvme/target/Makefile
@@ -12,6 +12,7 @@ obj-$(CONFIG_NVME_TARGET_TCP)		+= nvmet-tcp.o
 nvmet-y		+= core.o configfs.o admin-cmd.o fabrics-cmd.o \
 			discovery.o io-cmd-file.o io-cmd-bdev.o
 nvmet-$(CONFIG_NVME_TARGET_PASSTHRU)	+= passthru.o
+nvmet-$(CONFIG_BLK_DEV_ZONED)		+= zns.o
 nvme-loop-y	+= loop.o
 nvmet-rdma-y	+= rdma.o
 nvmet-fc-y	+= fc.o
diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index a50b7bcac67a..bdf09d8faa48 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -191,6 +191,15 @@ static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req)
 		log->iocs[nvme_cmd_dsm]			= cpu_to_le32(1 << 0);
 		log->iocs[nvme_cmd_write_zeroes]	= cpu_to_le32(1 << 0);
 		break;
+	case NVME_CSI_ZNS:
+		if (IS_ENABLED(CONFIG_BLK_DEV_ZONED)) {
+			u32 *iocs = log->iocs;
+
+			iocs[nvme_cmd_zone_append]	= cpu_to_le32(1 << 0);
+			iocs[nvme_cmd_zone_mgmt_send]	= cpu_to_le32(1 << 0);
+			iocs[nvme_cmd_zone_mgmt_recv]	= cpu_to_le32(1 << 0);
+		}
+		break;
 	default:
 		status = NVME_SC_INVALID_LOG_PAGE;
 		break;
@@ -644,6 +653,17 @@ static void nvmet_execute_identify_desclist(struct nvmet_req *req)
 	if (status)
 		goto out;
 
+	if (IS_ENABLED(CONFIG_BLK_DEV_ZONED)) {
+		u16 nvme_cis_zns = NVME_CSI_ZNS;
+
+		if (req->ns->csi == NVME_CSI_ZNS)
+			status = nvmet_copy_ns_identifier(req, NVME_NIDT_CSI,
+							  NVME_NIDT_CSI_LEN,
+							  &nvme_cis_zns, &off);
+		if (status)
+			goto out;
+	}
+
 	if (sg_zero_buffer(req->sg, req->sg_cnt, NVME_IDENTIFY_DATA_SIZE - off,
 			off) != NVME_IDENTIFY_DATA_SIZE - off)
 		status = NVME_SC_INTERNAL | NVME_SC_DNR;
@@ -660,8 +680,16 @@ static void nvmet_execute_identify(struct nvmet_req *req)
 	switch (req->cmd->identify.cns) {
 	case NVME_ID_CNS_NS:
 		return nvmet_execute_identify_ns(req);
+	case NVME_ID_CNS_CS_NS:
+		if (req->cmd->identify.csi == NVME_CSI_ZNS)
+			return nvmet_execute_identify_cns_cs_ns(req);
+		break;
 	case NVME_ID_CNS_CTRL:
 		return nvmet_execute_identify_ctrl(req);
+	case NVME_ID_CNS_CS_CTRL:
+		if (req->cmd->identify.csi == NVME_CSI_ZNS)
+			return nvmet_execute_identify_cns_cs_ctrl(req);
+		break;
 	case NVME_ID_CNS_NS_ACTIVE_LIST:
 		return nvmet_execute_identify_nslist(req);
 	case NVME_ID_CNS_NS_DESC_LIST:
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index 672e4009f8d6..17d5da062a5a 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -1107,6 +1107,7 @@ static inline u8 nvmet_cc_iocqes(u32 cc)
 static inline bool nvmet_cc_css_check(u8 cc_css)
 {
 	switch (cc_css <<= NVME_CC_CSS_SHIFT) {
+	case NVME_CC_CSS_CSI:
 	case NVME_CC_CSS_NVM:
 		return true;
 	default:
@@ -1173,6 +1174,8 @@ static void nvmet_init_cap(struct nvmet_ctrl *ctrl)
 {
 	/* command sets supported: NVMe command set: */
 	ctrl->cap = (1ULL << 37);
+	if (IS_ENABLED(CONFIG_BLK_DEV_ZONED))
+		ctrl->cap |= (1ULL << 43);
 	/* CC.EN timeout in 500msec units: */
 	ctrl->cap |= (15ULL << 24);
 	/* maximum queue entries supported: */
diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index 23095bdfce06..6178ef643962 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -63,6 +63,14 @@ static void nvmet_bdev_ns_enable_integrity(struct nvmet_ns *ns)
 	}
 }
 
+void nvmet_bdev_ns_disable(struct nvmet_ns *ns)
+{
+	if (ns->bdev) {
+		blkdev_put(ns->bdev, FMODE_WRITE | FMODE_READ);
+		ns->bdev = NULL;
+	}
+}
+
 int nvmet_bdev_ns_enable(struct nvmet_ns *ns)
 {
 	int ret;
@@ -86,15 +94,15 @@ int nvmet_bdev_ns_enable(struct nvmet_ns *ns)
 	if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY_T10))
 		nvmet_bdev_ns_enable_integrity(ns);
 
-	return 0;
-}
-
-void nvmet_bdev_ns_disable(struct nvmet_ns *ns)
-{
-	if (ns->bdev) {
-		blkdev_put(ns->bdev, FMODE_WRITE | FMODE_READ);
-		ns->bdev = NULL;
+	if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) && bdev_is_zoned(ns->bdev)) {
+		if (!nvmet_bdev_zns_enable(ns)) {
+			nvmet_bdev_ns_disable(ns);
+			return -EINVAL;
+		}
+		ns->csi = NVME_CSI_ZNS;
 	}
+
+	return 0;
 }
 
 void nvmet_bdev_ns_revalidate(struct nvmet_ns *ns)
@@ -448,6 +456,15 @@ u16 nvmet_bdev_parse_io_cmd(struct nvmet_req *req)
 	case nvme_cmd_write_zeroes:
 		req->execute = nvmet_bdev_execute_write_zeroes;
 		return 0;
+	case nvme_cmd_zone_append:
+		req->execute = nvmet_bdev_execute_zone_append;
+		return 0;
+	case nvme_cmd_zone_mgmt_recv:
+		req->execute = nvmet_bdev_execute_zone_mgmt_recv;
+		return 0;
+	case nvme_cmd_zone_mgmt_send:
+		req->execute = nvmet_bdev_execute_zone_mgmt_send;
+		return 0;
 	default:
 		pr_err("unhandled cmd %d on qid %d\n", cmd->common.opcode,
 		       req->sq->qid);
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 476b3cd91c65..7361665585a2 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -252,6 +252,10 @@ struct nvmet_subsys {
 	unsigned int		admin_timeout;
 	unsigned int		io_timeout;
 #endif /* CONFIG_NVME_TARGET_PASSTHRU */
+
+#ifdef CONFIG_BLK_DEV_ZONED
+	u8			zasl;
+#endif /* CONFIG_BLK_DEV_ZONED */
 };
 
 static inline struct nvmet_subsys *to_subsys(struct config_item *item)
@@ -614,4 +618,38 @@ static inline sector_t nvmet_lba_to_sect(struct nvmet_ns *ns, __le64 lba)
 	return le64_to_cpu(lba) << (ns->blksize_shift - SECTOR_SHIFT);
 }
 
+#ifdef CONFIG_BLK_DEV_ZONED
+bool nvmet_bdev_zns_enable(struct nvmet_ns *ns);
+void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req);
+void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req);
+void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req);
+void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req);
+void nvmet_bdev_execute_zone_append(struct nvmet_req *req);
+#else  /* CONFIG_BLK_DEV_ZONED */
+static inline bool nvmet_bdev_zns_enable(struct nvmet_ns *ns)
+{
+	return false;
+}
+static inline void
+nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
+{
+}
+static inline void
+nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
+{
+}
+static inline void
+nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
+{
+}
+static inline void
+nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
+{
+}
+static inline void
+nvmet_bdev_execute_zone_append(struct nvmet_req *req)
+{
+}
+#endif /* CONFIG_BLK_DEV_ZONED */
+
 #endif /* _NVMET_H */
diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
new file mode 100644
index 000000000000..2a71f56e568d
--- /dev/null
+++ b/drivers/nvme/target/zns.c
@@ -0,0 +1,342 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * NVMe ZNS-ZBD command implementation.
+ * Copyright (c) 2020-2021 HGST, a Western Digital Company.
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/nvme.h>
+#include <linux/blkdev.h>
+#include "nvmet.h"
+
+/*
+ * We set the Memory Page Size Minimum (MPSMIN) for the target controller
+ * to 0, to which nvme_enable_ctrl() adds 12, giving a page_shift of 12,
+ * i.e. a 4K page size. When calculating the ZASL, use a shift of 12.
+ */
+#define NVMET_MPSMIN_SHIFT	12
+
+static u16 nvmet_bdev_zns_checks(struct nvmet_req *req)
+{
+	u16 status = NVME_SC_SUCCESS;
+
+	if (!bdev_is_zoned(req->ns->bdev)) {
+		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
+		goto out;
+	}
+
+	if (req->cmd->zmr.zra != NVME_ZRA_ZONE_REPORT) {
+		status = NVME_SC_INVALID_FIELD;
+		goto out;
+	}
+
+	if (req->cmd->zmr.zrasf != NVME_ZRASF_ZONE_REPORT_ALL) {
+		status = NVME_SC_INVALID_FIELD;
+		goto out;
+	}
+
+	if (req->cmd->zmr.pr != NVME_REPORT_ZONE_PARTIAL)
+		status = NVME_SC_INVALID_FIELD;
+
+out:
+	return status;
+}
+
+/*
+ *  ZNS related command implementation and helpers.
+ */
+
+static inline u8 nvmet_zasl(unsigned int zone_append_sects)
+{
+	/*
+	 * Zone Append Size Limit is the value expressed in units of the
+	 * minimum memory page size (i.e. 12) and is reported as a power of 2.
+	 */
+	return ilog2((zone_append_sects << 9) >> NVMET_MPSMIN_SHIFT);
+}
+
+static inline bool nvmet_zns_update_zasl(struct nvmet_ns *ns)
+{
+	struct request_queue *q = ns->bdev->bd_disk->queue;
+	u8 zasl = nvmet_zasl(queue_max_zone_append_sectors(q));
+
+	if (ns->subsys->zasl)
+		return ns->subsys->zasl < zasl ? false : true;
+
+	ns->subsys->zasl = zasl;
+	return true;
+}
+
+
+static int nvmet_bdev_validate_zns_zones_cb(struct blk_zone *z,
+					    unsigned int idx, void *data)
+{
+	if (z->type == BLK_ZONE_TYPE_CONVENTIONAL)
+		return -EOPNOTSUPP;
+	return 0;
+}
+
+static bool nvmet_bdev_has_conv_zones(struct block_device *bdev)
+{
+	int ret;
+
+	if (bdev->bd_disk->queue->conv_zones_bitmap)
+		return true;
+
+	ret = blkdev_report_zones(bdev, 0, blkdev_nr_zones(bdev->bd_disk),
+				  nvmet_bdev_validate_zns_zones_cb, NULL);
+
+	return ret < 0 ? true : false;
+}
+
+bool nvmet_bdev_zns_enable(struct nvmet_ns *ns)
+{
+	if (nvmet_bdev_has_conv_zones(ns->bdev))
+		return false;
+
+	/*
+	 * For ZBC and ZAC devices, writes into sequential zones must be aligned
+	 * to the device physical block size. So use this value as the logical
+	 * block size to avoid errors.
+	 */
+	ns->blksize_shift = blksize_bits(bdev_physical_block_size(ns->bdev));
+
+	if (!nvmet_zns_update_zasl(ns))
+		return false;
+
+	return !(get_capacity(ns->bdev->bd_disk) &
+			(bdev_zone_sectors(ns->bdev) - 1));
+}
+
+/*
+ * ZNS related Admin and I/O command handlers.
+ */
+void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
+{
+	u8 zasl = req->sq->ctrl->subsys->zasl;
+	struct nvmet_ctrl *ctrl = req->sq->ctrl;
+	struct nvme_id_ctrl_zns *id;
+	u16 status;
+
+	id = kzalloc(sizeof(*id), GFP_KERNEL);
+	if (!id) {
+		status = NVME_SC_INTERNAL;
+		goto out;
+	}
+
+	if (ctrl->ops->get_mdts)
+		id->zasl = min_t(u8, ctrl->ops->get_mdts(ctrl), zasl);
+	else
+		id->zasl = zasl;
+
+	status = nvmet_copy_to_sgl(req, 0, id, sizeof(*id));
+
+	kfree(id);
+out:
+	nvmet_req_complete(req, status);
+}
+
+void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
+{
+	struct nvme_id_ns_zns *id_zns;
+	u16 status = NVME_SC_SUCCESS;
+	u64 zsze;
+
+	if (le32_to_cpu(req->cmd->identify.nsid) == NVME_NSID_ALL) {
+		req->error_loc = offsetof(struct nvme_identify, nsid);
+		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
+		goto out;
+	}
+
+	id_zns = kzalloc(sizeof(*id_zns), GFP_KERNEL);
+	if (!id_zns) {
+		status = NVME_SC_INTERNAL;
+		goto out;
+	}
+
+	req->ns = nvmet_find_namespace(req->sq->ctrl, req->cmd->identify.nsid);
+	if (!req->ns) {
+		status = NVME_SC_INTERNAL;
+		goto done;
+	}
+
+	if (!bdev_is_zoned(req->ns->bdev)) {
+		req->error_loc = offsetof(struct nvme_identify, nsid);
+		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
+		goto done;
+	}
+
+	nvmet_ns_revalidate(req->ns);
+	zsze = (bdev_zone_sectors(req->ns->bdev) << 9) >>
+					req->ns->blksize_shift;
+	id_zns->lbafe[0].zsze = cpu_to_le64(zsze);
+	id_zns->mor = cpu_to_le32(bdev_max_open_zones(req->ns->bdev));
+	id_zns->mar = cpu_to_le32(bdev_max_active_zones(req->ns->bdev));
+
+done:
+	status = nvmet_copy_to_sgl(req, 0, id_zns, sizeof(*id_zns));
+	kfree(id_zns);
+out:
+	nvmet_req_complete(req, status);
+}
+
+struct nvmet_report_zone_data {
+	struct nvmet_ns *ns;
+	struct nvme_zone_report *rz;
+};
+
+static int nvmet_bdev_report_zone_cb(struct blk_zone *z, unsigned int idx,
+				     void *data)
+{
+	struct nvmet_report_zone_data *report_zone_data = data;
+	struct nvme_zone_descriptor *entries = report_zone_data->rz->entries;
+	struct nvmet_ns *ns = report_zone_data->ns;
+
+	entries[idx].zcap = nvmet_sect_to_lba(ns, z->capacity);
+	entries[idx].zslba = nvmet_sect_to_lba(ns, z->start);
+	entries[idx].wp = nvmet_sect_to_lba(ns, z->wp);
+	entries[idx].za = z->reset ? 1 << 2 : 0;
+	entries[idx].zt = z->type;
+	entries[idx].zs = z->cond << 4;
+
+	return 0;
+}
+
+void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
+{
+	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->zmr.slba);
+	u32 bufsize = (le32_to_cpu(req->cmd->zmr.numd) + 1) << 2;
+	struct nvmet_report_zone_data data = { .ns = req->ns };
+	unsigned int nr_zones;
+	int reported_zones;
+	u16 status;
+
+	nr_zones = (bufsize - sizeof(struct nvme_zone_report)) /
+			sizeof(struct nvme_zone_descriptor);
+
+	status = nvmet_bdev_zns_checks(req);
+	if (status)
+		goto out;
+
+	data.rz = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY | __GFP_ZERO);
+	if (!data.rz) {
+		status = NVME_SC_INTERNAL;
+		goto out;
+	}
+
+	reported_zones = blkdev_report_zones(req->ns->bdev, sect, nr_zones,
+					     nvmet_bdev_report_zone_cb,
+					     &data);
+	if (reported_zones < 0) {
+		status = NVME_SC_INTERNAL;
+		goto out_free_report_zones;
+	}
+
+	data.rz->nr_zones = cpu_to_le64(reported_zones);
+
+	status = nvmet_copy_to_sgl(req, 0, data.rz, bufsize);
+
+out_free_report_zones:
+	kvfree(data.rz);
+out:
+	nvmet_req_complete(req, status);
+}
+
+void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
+{
+	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->zms.slba);
+	sector_t nr_sect = bdev_zone_sectors(req->ns->bdev);
+	u16 status = NVME_SC_SUCCESS;
+	enum req_opf op;
+	int ret;
+
+	if (req->cmd->zms.select_all)
+		nr_sect = get_capacity(req->ns->bdev->bd_disk);
+
+	switch (req->cmd->zms.zsa) {
+	case NVME_ZONE_OPEN:
+		op = REQ_OP_ZONE_OPEN;
+		break;
+	case NVME_ZONE_CLOSE:
+		op = REQ_OP_ZONE_CLOSE;
+		break;
+	case NVME_ZONE_FINISH:
+		op = REQ_OP_ZONE_FINISH;
+		break;
+	case NVME_ZONE_RESET:
+		op = REQ_OP_ZONE_RESET;
+		break;
+	default:
+		status = NVME_SC_INVALID_FIELD;
+		goto out;
+	}
+
+	ret = blkdev_zone_mgmt(req->ns->bdev, op, sect, nr_sect, GFP_KERNEL);
+	if (ret)
+		status = NVME_SC_INTERNAL;
+out:
+	nvmet_req_complete(req, status);
+}
+
+void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
+{
+	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->rw.slba);
+	struct request_queue *q = req->ns->bdev->bd_disk->queue;
+	unsigned int max_sects = queue_max_zone_append_sectors(q);
+	u16 status = NVME_SC_SUCCESS;
+	unsigned int total_len = 0;
+	struct scatterlist *sg;
+	int ret = 0, sg_cnt;
+	struct bio *bio;
+
+	if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req)))
+		return;
+
+	if (!req->sg_cnt) {
+		nvmet_req_complete(req, 0);
+		return;
+	}
+
+	if (req->transfer_len <= NVMET_MAX_INLINE_DATA_LEN) {
+		bio = &req->b.inline_bio;
+		bio_init(bio, req->inline_bvec, ARRAY_SIZE(req->inline_bvec));
+	} else {
+		bio = bio_alloc(GFP_KERNEL, req->sg_cnt);
+	}
+
+	bio_set_dev(bio, req->ns->bdev);
+	bio->bi_iter.bi_sector = sect;
+	bio->bi_opf = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
+	if (req->cmd->rw.control & cpu_to_le16(NVME_RW_FUA))
+		bio->bi_opf |= REQ_FUA;
+
+	for_each_sg(req->sg, sg, req->sg_cnt, sg_cnt) {
+		struct page *p = sg_page(sg);
+		unsigned int l = sg->length;
+		unsigned int o = sg->offset;
+		bool same_page = false;
+
+		ret = bio_add_hw_page(q, bio, p, l, o, max_sects, &same_page);
+		if (ret != sg->length) {
+			status = NVME_SC_INTERNAL;
+			goto out_bio_put;
+		}
+		if (same_page)
+			put_page(p);
+
+		total_len += sg->length;
+	}
+
+	if (total_len != nvmet_rw_data_len(req)) {
+		status = NVME_SC_INTERNAL | NVME_SC_DNR;
+		goto out_bio_put;
+	}
+
+	ret = submit_bio_wait(bio);
+	req->cqe->result.u64 = nvmet_sect_to_lba(req->ns,
+						 bio->bi_iter.bi_sector);
+
+out_bio_put:
+	if (bio != &req->b.inline_bio)
+		bio_put(bio);
+	nvmet_req_complete(req, ret < 0 ? NVME_SC_INTERNAL : status);
+}
-- 
2.22.1


^ permalink raw reply related	[flat|nested] 98+ messages in thread


* [PATCH V9 5/9] nvmet: add bio get helper for different backends
  2021-01-12  4:26 ` Chaitanya Kulkarni
@ 2021-01-12  4:26   ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 98+ messages in thread
From: Chaitanya Kulkarni @ 2021-01-12  4:26 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: hch, sagi, damien.lemoal, Chaitanya Kulkarni

With the addition of the zns backend we now have three backends that use
the inline bio optimization. That leads to duplicate code for allocating
and initializing the bio in all three backends: generic bdev, passthru,
and generic zns.

Add a helper function to remove this duplication. The helper accepts the
bi_end_io callback to set in the non-inline bio_alloc() case; this is
needed because the passthru backend sets bio->bi_end_io = bio_put for
its non-inline allocation, and passing the callback as a parameter
avoids an extra branch in the passthru fast path. For the remaining
backends we use the same bi_end_io callback for the inline and
non-inline cases: nvmet_bio_done() for generic bdev and NULL for
generic zns.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/io-cmd-bdev.c |  7 +------
 drivers/nvme/target/nvmet.h       | 16 ++++++++++++++++
 drivers/nvme/target/passthru.c    |  8 +-------
 drivers/nvme/target/zns.c         |  8 +-------
 4 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index 6178ef643962..72746e29cb0d 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -266,12 +266,7 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
 
 	sector = nvmet_lba_to_sect(req->ns, req->cmd->rw.slba);
 
-	if (req->transfer_len <= NVMET_MAX_INLINE_DATA_LEN) {
-		bio = &req->b.inline_bio;
-		bio_init(bio, req->inline_bvec, ARRAY_SIZE(req->inline_bvec));
-	} else {
-		bio = bio_alloc(GFP_KERNEL, min(sg_cnt, BIO_MAX_PAGES));
-	}
+	bio = nvmet_req_bio_get(req, NULL);
 	bio_set_dev(bio, req->ns->bdev);
 	bio->bi_iter.bi_sector = sector;
 	bio->bi_private = req;
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 7361665585a2..3fc84f79cce1 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -652,4 +652,20 @@ nvmet_bdev_execute_zone_append(struct nvmet_req *req)
 }
 #endif /* CONFIG_BLK_DEV_ZONED */
 
+static inline struct bio *nvmet_req_bio_get(struct nvmet_req *req,
+					    bio_end_io_t *bi_end_io)
+{
+	struct bio *bio;
+
+	if (req->transfer_len <= NVMET_MAX_INLINE_DATA_LEN) {
+		bio = &req->b.inline_bio;
+		bio_init(bio, req->inline_bvec, ARRAY_SIZE(req->inline_bvec));
+		return bio;
+	}
+
+	bio = bio_alloc(GFP_KERNEL, req->sg_cnt);
+	bio->bi_end_io = bi_end_io;
+	return bio;
+}
+
 #endif /* _NVMET_H */
diff --git a/drivers/nvme/target/passthru.c b/drivers/nvme/target/passthru.c
index b9776fc8f08f..54f765b566ee 100644
--- a/drivers/nvme/target/passthru.c
+++ b/drivers/nvme/target/passthru.c
@@ -194,13 +194,7 @@ static int nvmet_passthru_map_sg(struct nvmet_req *req, struct request *rq)
 	if (req->sg_cnt > BIO_MAX_PAGES)
 		return -EINVAL;
 
-	if (req->transfer_len <= NVMET_MAX_INLINE_DATA_LEN) {
-		bio = &req->p.inline_bio;
-		bio_init(bio, req->inline_bvec, ARRAY_SIZE(req->inline_bvec));
-	} else {
-		bio = bio_alloc(GFP_KERNEL, min(req->sg_cnt, BIO_MAX_PAGES));
-		bio->bi_end_io = bio_put;
-	}
+	bio = nvmet_req_bio_get(req, bio_put);
 	bio->bi_opf = req_op(rq);
 
 	for_each_sg(req->sg, sg, req->sg_cnt, i) {
diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
index 2a71f56e568d..c32e93a3c7e1 100644
--- a/drivers/nvme/target/zns.c
+++ b/drivers/nvme/target/zns.c
@@ -296,13 +296,7 @@ void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
 		return;
 	}
 
-	if (req->transfer_len <= NVMET_MAX_INLINE_DATA_LEN) {
-		bio = &req->b.inline_bio;
-		bio_init(bio, req->inline_bvec, ARRAY_SIZE(req->inline_bvec));
-	} else {
-		bio = bio_alloc(GFP_KERNEL, req->sg_cnt);
-	}
-
+	bio = nvmet_req_bio_get(req, NULL);
 	bio_set_dev(bio, req->ns->bdev);
 	bio->bi_iter.bi_sector = sect;
 	bio->bi_opf = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
-- 
2.22.1


^ permalink raw reply related	[flat|nested] 98+ messages in thread


* [PATCH V9 6/9] nvmet: add bio init helper for different backends
  2021-01-12  4:26 ` Chaitanya Kulkarni
@ 2021-01-12  4:26   ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 98+ messages in thread
From: Chaitanya Kulkarni @ 2021-01-12  4:26 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: hch, sagi, damien.lemoal, Chaitanya Kulkarni

With the addition of the zns backend we now have two backends with the
same bio initialization code, duplicated between generic bdev and
generic zns.

Add a helper function to remove this duplication by initializing the
common bio parameters: block device, op flags, sector, private member,
and end io callback.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/io-cmd-bdev.c |  6 +-----
 drivers/nvme/target/nvmet.h       | 11 +++++++++++
 drivers/nvme/target/zns.c         |  6 +++---
 3 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index 72746e29cb0d..b1fb0bb1f39f 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -267,11 +267,7 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
 	sector = nvmet_lba_to_sect(req->ns, req->cmd->rw.slba);
 
 	bio = nvmet_req_bio_get(req, NULL);
-	bio_set_dev(bio, req->ns->bdev);
-	bio->bi_iter.bi_sector = sector;
-	bio->bi_private = req;
-	bio->bi_end_io = nvmet_bio_done;
-	bio->bi_opf = op;
+	nvmet_bio_init(bio, req->ns->bdev, op, sector, req, nvmet_bio_done);
 
 	blk_start_plug(&plug);
 	if (req->metadata_len)
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 3fc84f79cce1..1ec9e1b35c67 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -668,4 +668,15 @@ static inline struct bio *nvmet_req_bio_get(struct nvmet_req *req,
 	return bio;
 }
 
+static inline void nvmet_bio_init(struct bio *bio, struct block_device *bdev,
+				  unsigned int op, sector_t sect, void *private,
+				  bio_end_io_t *bi_end_io)
+{
+	bio_set_dev(bio, bdev);
+	bio->bi_opf = op;
+	bio->bi_iter.bi_sector = sect;
+	bio->bi_private = private;
+	bio->bi_end_io = bi_end_io;
+}
+
 #endif /* _NVMET_H */
diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
index c32e93a3c7e1..92213bed0006 100644
--- a/drivers/nvme/target/zns.c
+++ b/drivers/nvme/target/zns.c
@@ -281,6 +281,7 @@ void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
 {
 	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->rw.slba);
 	struct request_queue *q = req->ns->bdev->bd_disk->queue;
+	unsigned int op = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
 	unsigned int max_sects = queue_max_zone_append_sectors(q);
 	u16 status = NVME_SC_SUCCESS;
 	unsigned int total_len = 0;
@@ -297,9 +298,8 @@ void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
 	}
 
 	bio = nvmet_req_bio_get(req, NULL);
-	bio_set_dev(bio, req->ns->bdev);
-	bio->bi_iter.bi_sector = sect;
-	bio->bi_opf = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
+	nvmet_bio_init(bio, req->ns->bdev, op, sect, NULL, NULL);
+
 	if (req->cmd->rw.control & cpu_to_le16(NVME_RW_FUA))
 		bio->bi_opf |= REQ_FUA;
 
-- 
2.22.1


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH V9 6/9] nvmet: add bio init helper for different backends
@ 2021-01-12  4:26   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 98+ messages in thread
From: Chaitanya Kulkarni @ 2021-01-12  4:26 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: damien.lemoal, hch, Chaitanya Kulkarni, sagi

With the addition of the zns backend we now have two different backends
with the same bio initialization code, which leads to duplicate code in
both backends: generic bdev and generic zns.

Add a helper function that centralizes the bio initialization: it sets
the bio's block device, op flags, sector, end io callback, and private
member.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/io-cmd-bdev.c |  6 +-----
 drivers/nvme/target/nvmet.h       | 11 +++++++++++
 drivers/nvme/target/zns.c         |  6 +++---
 3 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index 72746e29cb0d..b1fb0bb1f39f 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -267,11 +267,7 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
 	sector = nvmet_lba_to_sect(req->ns, req->cmd->rw.slba);
 
 	bio = nvmet_req_bio_get(req, NULL);
-	bio_set_dev(bio, req->ns->bdev);
-	bio->bi_iter.bi_sector = sector;
-	bio->bi_private = req;
-	bio->bi_end_io = nvmet_bio_done;
-	bio->bi_opf = op;
+	nvmet_bio_init(bio, req->ns->bdev, op, sector, req, nvmet_bio_done);
 
 	blk_start_plug(&plug);
 	if (req->metadata_len)
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 3fc84f79cce1..1ec9e1b35c67 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -668,4 +668,15 @@ static inline struct bio *nvmet_req_bio_get(struct nvmet_req *req,
 	return bio;
 }
 
+static inline void nvmet_bio_init(struct bio *bio, struct block_device *bdev,
+				  unsigned int op, sector_t sect, void *private,
+				  bio_end_io_t *bi_end_io)
+{
+	bio_set_dev(bio, bdev);
+	bio->bi_opf = op;
+	bio->bi_iter.bi_sector = sect;
+	bio->bi_private = private;
+	bio->bi_end_io = bi_end_io;
+}
+
 #endif /* _NVMET_H */
diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
index c32e93a3c7e1..92213bed0006 100644
--- a/drivers/nvme/target/zns.c
+++ b/drivers/nvme/target/zns.c
@@ -281,6 +281,7 @@ void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
 {
 	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->rw.slba);
 	struct request_queue *q = req->ns->bdev->bd_disk->queue;
+	unsigned int op = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
 	unsigned int max_sects = queue_max_zone_append_sectors(q);
 	u16 status = NVME_SC_SUCCESS;
 	unsigned int total_len = 0;
@@ -297,9 +298,8 @@ void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
 	}
 
 	bio = nvmet_req_bio_get(req, NULL);
-	bio_set_dev(bio, req->ns->bdev);
-	bio->bi_iter.bi_sector = sect;
-	bio->bi_opf = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
+	nvmet_bio_init(bio, req->ns->bdev, op, sect, NULL, NULL);
+
 	if (req->cmd->rw.control & cpu_to_le16(NVME_RW_FUA))
 		bio->bi_opf |= REQ_FUA;
 
-- 
2.22.1


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH V9 7/9] nvmet: add bio put helper for different backends
  2021-01-12  4:26 ` Chaitanya Kulkarni
@ 2021-01-12  4:26   ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 98+ messages in thread
From: Chaitanya Kulkarni @ 2021-01-12  4:26 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: hch, sagi, damien.lemoal, Chaitanya Kulkarni

With the addition of the zns backend we now have three different
backends using the inline bio optimization, which leads to duplicate
bio-freeing code in all three backends: generic bdev, passthru, and
generic zns.

Add a helper function to avoid the duplicate code and update the
respective backends.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/io-cmd-bdev.c | 3 +--
 drivers/nvme/target/nvmet.h       | 6 ++++++
 drivers/nvme/target/passthru.c    | 3 +--
 drivers/nvme/target/zns.c         | 3 +--
 4 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index b1fb0bb1f39f..562c2dd9c08c 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -172,8 +172,7 @@ static void nvmet_bio_done(struct bio *bio)
 	struct nvmet_req *req = bio->bi_private;
 
 	nvmet_req_complete(req, blk_to_nvme_status(req, bio->bi_status));
-	if (bio != &req->b.inline_bio)
-		bio_put(bio);
+	nvmet_req_bio_put(req, bio);
 }
 
 #ifdef CONFIG_BLK_DEV_INTEGRITY
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 1ec9e1b35c67..93ebc9ae3fe4 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -679,4 +679,10 @@ static inline void nvmet_bio_init(struct bio *bio, struct block_device *bdev,
 	bio->bi_end_io = bi_end_io;
 }
 
+static inline void nvmet_req_bio_put(struct nvmet_req *req, struct bio *bio)
+{
+	if (bio != &req->b.inline_bio)
+		bio_put(bio);
+}
+
 #endif /* _NVMET_H */
diff --git a/drivers/nvme/target/passthru.c b/drivers/nvme/target/passthru.c
index 54f765b566ee..a4a73d64c603 100644
--- a/drivers/nvme/target/passthru.c
+++ b/drivers/nvme/target/passthru.c
@@ -200,8 +200,7 @@ static int nvmet_passthru_map_sg(struct nvmet_req *req, struct request *rq)
 	for_each_sg(req->sg, sg, req->sg_cnt, i) {
 		if (bio_add_pc_page(rq->q, bio, sg_page(sg), sg->length,
 				    sg->offset) < sg->length) {
-			if (bio != &req->p.inline_bio)
-				bio_put(bio);
+			nvmet_req_bio_put(req, bio);
 			return -EINVAL;
 		}
 	}
diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
index 92213bed0006..bba1d6957b6a 100644
--- a/drivers/nvme/target/zns.c
+++ b/drivers/nvme/target/zns.c
@@ -330,7 +330,6 @@ void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
 						 bio->bi_iter.bi_sector);
 
 out_bio_put:
-	if (bio != &req->b.inline_bio)
-		bio_put(bio);
+	nvmet_req_bio_put(req, bio);
 	nvmet_req_complete(req, ret < 0 ? NVME_SC_INTERNAL : status);
 }
-- 
2.22.1


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH V9 7/9] nvmet: add bio put helper for different backends
@ 2021-01-12  4:26   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 98+ messages in thread
From: Chaitanya Kulkarni @ 2021-01-12  4:26 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: damien.lemoal, hch, Chaitanya Kulkarni, sagi

With the addition of the zns backend we now have three different
backends using the inline bio optimization, which leads to duplicate
bio-freeing code in all three backends: generic bdev, passthru, and
generic zns.

Add a helper function to avoid the duplicate code and update the
respective backends.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/io-cmd-bdev.c | 3 +--
 drivers/nvme/target/nvmet.h       | 6 ++++++
 drivers/nvme/target/passthru.c    | 3 +--
 drivers/nvme/target/zns.c         | 3 +--
 4 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index b1fb0bb1f39f..562c2dd9c08c 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -172,8 +172,7 @@ static void nvmet_bio_done(struct bio *bio)
 	struct nvmet_req *req = bio->bi_private;
 
 	nvmet_req_complete(req, blk_to_nvme_status(req, bio->bi_status));
-	if (bio != &req->b.inline_bio)
-		bio_put(bio);
+	nvmet_req_bio_put(req, bio);
 }
 
 #ifdef CONFIG_BLK_DEV_INTEGRITY
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 1ec9e1b35c67..93ebc9ae3fe4 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -679,4 +679,10 @@ static inline void nvmet_bio_init(struct bio *bio, struct block_device *bdev,
 	bio->bi_end_io = bi_end_io;
 }
 
+static inline void nvmet_req_bio_put(struct nvmet_req *req, struct bio *bio)
+{
+	if (bio != &req->b.inline_bio)
+		bio_put(bio);
+}
+
 #endif /* _NVMET_H */
diff --git a/drivers/nvme/target/passthru.c b/drivers/nvme/target/passthru.c
index 54f765b566ee..a4a73d64c603 100644
--- a/drivers/nvme/target/passthru.c
+++ b/drivers/nvme/target/passthru.c
@@ -200,8 +200,7 @@ static int nvmet_passthru_map_sg(struct nvmet_req *req, struct request *rq)
 	for_each_sg(req->sg, sg, req->sg_cnt, i) {
 		if (bio_add_pc_page(rq->q, bio, sg_page(sg), sg->length,
 				    sg->offset) < sg->length) {
-			if (bio != &req->p.inline_bio)
-				bio_put(bio);
+			nvmet_req_bio_put(req, bio);
 			return -EINVAL;
 		}
 	}
diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
index 92213bed0006..bba1d6957b6a 100644
--- a/drivers/nvme/target/zns.c
+++ b/drivers/nvme/target/zns.c
@@ -330,7 +330,6 @@ void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
 						 bio->bi_iter.bi_sector);
 
 out_bio_put:
-	if (bio != &req->b.inline_bio)
-		bio_put(bio);
+	nvmet_req_bio_put(req, bio);
 	nvmet_req_complete(req, ret < 0 ? NVME_SC_INTERNAL : status);
 }
-- 
2.22.1



^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH V9 8/9] nvmet: add common I/O length check helper
  2021-01-12  4:26 ` Chaitanya Kulkarni
@ 2021-01-12  4:26   ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 98+ messages in thread
From: Chaitanya Kulkarni @ 2021-01-12  4:26 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: hch, sagi, damien.lemoal, Chaitanya Kulkarni

With the addition of the zns backend we now have three different
backends that check the nvmet request's transfer length and sg_cnt,
which leads to duplicate code in all three backends: generic bdev, file,
and generic zns.

Add a helper function to avoid the duplicate code and update the
respective backends.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/io-cmd-bdev.c |  8 +-------
 drivers/nvme/target/io-cmd-file.c |  7 +------
 drivers/nvme/target/nvmet.h       | 14 ++++++++++++++
 drivers/nvme/target/zns.c         |  7 +------
 4 files changed, 17 insertions(+), 19 deletions(-)

diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index 562c2dd9c08c..c23a719513b0 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -240,16 +240,10 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
 	int op, i, rc;
 	struct sg_mapping_iter prot_miter;
 	unsigned int iter_flags;
-	unsigned int total_len = nvmet_rw_data_len(req) + req->metadata_len;
 
-	if (!nvmet_check_transfer_len(req, total_len))
+	if (!nvmet_continue_io(req, nvmet_rw_data_len(req) + req->metadata_len))
 		return;
 
-	if (!req->sg_cnt) {
-		nvmet_req_complete(req, 0);
-		return;
-	}
-
 	if (req->cmd->rw.opcode == nvme_cmd_write) {
 		op = REQ_OP_WRITE | REQ_SYNC | REQ_IDLE;
 		if (req->cmd->rw.control & cpu_to_le16(NVME_RW_FUA))
diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c
index 0abbefd9925e..e7caff221b7b 100644
--- a/drivers/nvme/target/io-cmd-file.c
+++ b/drivers/nvme/target/io-cmd-file.c
@@ -241,14 +241,9 @@ static void nvmet_file_execute_rw(struct nvmet_req *req)
 {
 	ssize_t nr_bvec = req->sg_cnt;
 
-	if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req)))
+	if (!nvmet_continue_io(req, nvmet_rw_data_len(req)))
 		return;
 
-	if (!req->sg_cnt || !nr_bvec) {
-		nvmet_req_complete(req, 0);
-		return;
-	}
-
 	if (nr_bvec > NVMET_MAX_INLINE_BIOVEC)
 		req->f.bvec = kmalloc_array(nr_bvec, sizeof(struct bio_vec),
 				GFP_KERNEL);
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 93ebc9ae3fe4..f4f9d622df0d 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -685,4 +685,18 @@ static inline void nvmet_req_bio_put(struct nvmet_req *req, struct bio *bio)
 		bio_put(bio);
 }
 
+static inline bool nvmet_continue_io(struct nvmet_req *req,
+				     unsigned int total_len)
+{
+	if (!nvmet_check_transfer_len(req, total_len))
+		return false;
+
+	if (!req->sg_cnt) {
+		nvmet_req_complete(req, 0);
+		return false;
+	}
+
+	return true;
+}
+
 #endif /* _NVMET_H */
diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
index bba1d6957b6a..149bc8ce7010 100644
--- a/drivers/nvme/target/zns.c
+++ b/drivers/nvme/target/zns.c
@@ -289,14 +289,9 @@ void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
 	int ret = 0, sg_cnt;
 	struct bio *bio;
 
-	if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req)))
+	if (!nvmet_continue_io(req, nvmet_rw_data_len(req)))
 		return;
 
-	if (!req->sg_cnt) {
-		nvmet_req_complete(req, 0);
-		return;
-	}
-
 	bio = nvmet_req_bio_get(req, NULL);
 	nvmet_bio_init(bio, req->ns->bdev, op, sect, NULL, NULL);
 
-- 
2.22.1


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH V9 8/9] nvmet: add common I/O length check helper
@ 2021-01-12  4:26   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 98+ messages in thread
From: Chaitanya Kulkarni @ 2021-01-12  4:26 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: damien.lemoal, hch, Chaitanya Kulkarni, sagi

With the addition of the zns backend we now have three different
backends that check the nvmet request's transfer length and sg_cnt,
which leads to duplicate code in all three backends: generic bdev, file,
and generic zns.

Add a helper function to avoid the duplicate code and update the
respective backends.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/io-cmd-bdev.c |  8 +-------
 drivers/nvme/target/io-cmd-file.c |  7 +------
 drivers/nvme/target/nvmet.h       | 14 ++++++++++++++
 drivers/nvme/target/zns.c         |  7 +------
 4 files changed, 17 insertions(+), 19 deletions(-)

diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index 562c2dd9c08c..c23a719513b0 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -240,16 +240,10 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
 	int op, i, rc;
 	struct sg_mapping_iter prot_miter;
 	unsigned int iter_flags;
-	unsigned int total_len = nvmet_rw_data_len(req) + req->metadata_len;
 
-	if (!nvmet_check_transfer_len(req, total_len))
+	if (!nvmet_continue_io(req, nvmet_rw_data_len(req) + req->metadata_len))
 		return;
 
-	if (!req->sg_cnt) {
-		nvmet_req_complete(req, 0);
-		return;
-	}
-
 	if (req->cmd->rw.opcode == nvme_cmd_write) {
 		op = REQ_OP_WRITE | REQ_SYNC | REQ_IDLE;
 		if (req->cmd->rw.control & cpu_to_le16(NVME_RW_FUA))
diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c
index 0abbefd9925e..e7caff221b7b 100644
--- a/drivers/nvme/target/io-cmd-file.c
+++ b/drivers/nvme/target/io-cmd-file.c
@@ -241,14 +241,9 @@ static void nvmet_file_execute_rw(struct nvmet_req *req)
 {
 	ssize_t nr_bvec = req->sg_cnt;
 
-	if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req)))
+	if (!nvmet_continue_io(req, nvmet_rw_data_len(req)))
 		return;
 
-	if (!req->sg_cnt || !nr_bvec) {
-		nvmet_req_complete(req, 0);
-		return;
-	}
-
 	if (nr_bvec > NVMET_MAX_INLINE_BIOVEC)
 		req->f.bvec = kmalloc_array(nr_bvec, sizeof(struct bio_vec),
 				GFP_KERNEL);
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 93ebc9ae3fe4..f4f9d622df0d 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -685,4 +685,18 @@ static inline void nvmet_req_bio_put(struct nvmet_req *req, struct bio *bio)
 		bio_put(bio);
 }
 
+static inline bool nvmet_continue_io(struct nvmet_req *req,
+				     unsigned int total_len)
+{
+	if (!nvmet_check_transfer_len(req, total_len))
+		return false;
+
+	if (!req->sg_cnt) {
+		nvmet_req_complete(req, 0);
+		return false;
+	}
+
+	return true;
+}
+
 #endif /* _NVMET_H */
diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
index bba1d6957b6a..149bc8ce7010 100644
--- a/drivers/nvme/target/zns.c
+++ b/drivers/nvme/target/zns.c
@@ -289,14 +289,9 @@ void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
 	int ret = 0, sg_cnt;
 	struct bio *bio;
 
-	if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req)))
+	if (!nvmet_continue_io(req, nvmet_rw_data_len(req)))
 		return;
 
-	if (!req->sg_cnt) {
-		nvmet_req_complete(req, 0);
-		return;
-	}
-
 	bio = nvmet_req_bio_get(req, NULL);
 	nvmet_bio_init(bio, req->ns->bdev, op, sect, NULL, NULL);
 
-- 
2.22.1



^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH V9 9/9] nvmet: call nvmet_bio_done() for zone append
  2021-01-12  4:26 ` Chaitanya Kulkarni
@ 2021-01-12  4:26   ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 98+ messages in thread
From: Chaitanya Kulkarni @ 2021-01-12  4:26 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: hch, sagi, damien.lemoal, Chaitanya Kulkarni

The function nvmet_bdev_execute_zone_append() does exactly the same
thing on bio completion as nvmet_bio_done(): it completes the request
and calls nvmet_req_bio_put() to put a non-inline bio.

Export nvmet_bio_done() and use it in nvmet_bdev_execute_zone_append()
for request completion and bio processing. Set bio->bi_private to the
nvmet request after the call to submit_bio_wait(). The call to
nvmet_bio_done() also updates the error log page via
blk_to_nvme_status().

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/io-cmd-bdev.c |  2 +-
 drivers/nvme/target/nvmet.h       |  1 +
 drivers/nvme/target/zns.c         | 10 ++++------
 3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index c23a719513b0..72a22351da2a 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -167,7 +167,7 @@ static u16 blk_to_nvme_status(struct nvmet_req *req, blk_status_t blk_sts)
 	return status;
 }
 
-static void nvmet_bio_done(struct bio *bio)
+void nvmet_bio_done(struct bio *bio)
 {
 	struct nvmet_req *req = bio->bi_private;
 
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index f4f9d622df0d..ab84ab75b952 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -535,6 +535,7 @@ void nvmet_ns_changed(struct nvmet_subsys *subsys, u32 nsid);
 void nvmet_bdev_ns_revalidate(struct nvmet_ns *ns);
 int nvmet_file_ns_revalidate(struct nvmet_ns *ns);
 void nvmet_ns_revalidate(struct nvmet_ns *ns);
+void nvmet_bio_done(struct bio *bio);
 
 static inline u32 nvmet_rw_data_len(struct nvmet_req *req)
 {
diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
index 149bc8ce7010..da4be0231428 100644
--- a/drivers/nvme/target/zns.c
+++ b/drivers/nvme/target/zns.c
@@ -283,7 +283,6 @@ void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
 	struct request_queue *q = req->ns->bdev->bd_disk->queue;
 	unsigned int op = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
 	unsigned int max_sects = queue_max_zone_append_sectors(q);
-	u16 status = NVME_SC_SUCCESS;
 	unsigned int total_len = 0;
 	struct scatterlist *sg;
 	int ret = 0, sg_cnt;
@@ -306,7 +305,7 @@ void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
 
 		ret = bio_add_hw_page(q, bio, p, l, o, max_sects, &same_page);
 		if (ret != sg->length) {
-			status = NVME_SC_INTERNAL;
+			bio->bi_status = BLK_STS_IOERR;
 			goto out_bio_put;
 		}
 		if (same_page)
@@ -316,15 +315,14 @@ void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
 	}
 
 	if (total_len != nvmet_rw_data_len(req)) {
-		status = NVME_SC_INTERNAL | NVME_SC_DNR;
+		bio->bi_status = BLK_STS_IOERR;
 		goto out_bio_put;
 	}
 
 	ret = submit_bio_wait(bio);
 	req->cqe->result.u64 = nvmet_sect_to_lba(req->ns,
 						 bio->bi_iter.bi_sector);
-
 out_bio_put:
-	nvmet_req_bio_put(req, bio);
-	nvmet_req_complete(req, ret < 0 ? NVME_SC_INTERNAL : status);
+	bio->bi_private = req;
+	nvmet_bio_done(bio);
 }
-- 
2.22.1


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH V9 9/9] nvmet: call nvmet_bio_done() for zone append
@ 2021-01-12  4:26   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 98+ messages in thread
From: Chaitanya Kulkarni @ 2021-01-12  4:26 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: damien.lemoal, hch, Chaitanya Kulkarni, sagi

The function nvmet_bdev_execute_zone_append() does exactly the same
thing on bio completion as nvmet_bio_done(): it completes the request
and calls nvmet_req_bio_put() to put a non-inline bio.

Export nvmet_bio_done() and use it in nvmet_bdev_execute_zone_append()
for request completion and bio processing. Set bio->bi_private to the
nvmet request after the call to submit_bio_wait(). The call to
nvmet_bio_done() also updates the error log page via
blk_to_nvme_status().

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/io-cmd-bdev.c |  2 +-
 drivers/nvme/target/nvmet.h       |  1 +
 drivers/nvme/target/zns.c         | 10 ++++------
 3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index c23a719513b0..72a22351da2a 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -167,7 +167,7 @@ static u16 blk_to_nvme_status(struct nvmet_req *req, blk_status_t blk_sts)
 	return status;
 }
 
-static void nvmet_bio_done(struct bio *bio)
+void nvmet_bio_done(struct bio *bio)
 {
 	struct nvmet_req *req = bio->bi_private;
 
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index f4f9d622df0d..ab84ab75b952 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -535,6 +535,7 @@ void nvmet_ns_changed(struct nvmet_subsys *subsys, u32 nsid);
 void nvmet_bdev_ns_revalidate(struct nvmet_ns *ns);
 int nvmet_file_ns_revalidate(struct nvmet_ns *ns);
 void nvmet_ns_revalidate(struct nvmet_ns *ns);
+void nvmet_bio_done(struct bio *bio);
 
 static inline u32 nvmet_rw_data_len(struct nvmet_req *req)
 {
diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
index 149bc8ce7010..da4be0231428 100644
--- a/drivers/nvme/target/zns.c
+++ b/drivers/nvme/target/zns.c
@@ -283,7 +283,6 @@ void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
 	struct request_queue *q = req->ns->bdev->bd_disk->queue;
 	unsigned int op = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
 	unsigned int max_sects = queue_max_zone_append_sectors(q);
-	u16 status = NVME_SC_SUCCESS;
 	unsigned int total_len = 0;
 	struct scatterlist *sg;
 	int ret = 0, sg_cnt;
@@ -306,7 +305,7 @@ void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
 
 		ret = bio_add_hw_page(q, bio, p, l, o, max_sects, &same_page);
 		if (ret != sg->length) {
-			status = NVME_SC_INTERNAL;
+			bio->bi_status = BLK_STS_IOERR;
 			goto out_bio_put;
 		}
 		if (same_page)
@@ -316,15 +315,14 @@ void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
 	}
 
 	if (total_len != nvmet_rw_data_len(req)) {
-		status = NVME_SC_INTERNAL | NVME_SC_DNR;
+		bio->bi_status = BLK_STS_IOERR;
 		goto out_bio_put;
 	}
 
 	ret = submit_bio_wait(bio);
 	req->cqe->result.u64 = nvmet_sect_to_lba(req->ns,
 						 bio->bi_iter.bi_sector);
-
 out_bio_put:
-	nvmet_req_bio_put(req, bio);
-	nvmet_req_complete(req, ret < 0 ? NVME_SC_INTERNAL : status);
+	bio->bi_private = req;
+	nvmet_bio_done(bio);
 }
-- 
2.22.1



^ permalink raw reply related	[flat|nested] 98+ messages in thread

* Re: [PATCH V9 2/9] nvmet: add lba to sect conversion helpers
  2021-01-12  4:26   ` Chaitanya Kulkarni
@ 2021-01-12  5:08     ` Damien Le Moal
  -1 siblings, 0 replies; 98+ messages in thread
From: Damien Le Moal @ 2021-01-12  5:08 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-nvme; +Cc: hch, sagi

On 2021/01/12 13:26, Chaitanya Kulkarni wrote:
> In this preparation patch, we add helpers to convert lbas to sectors &
> sectors to lba. This is needed to eliminate code duplication in the ZBD
> backend.
> 
> Use these helpers in the block device backend.
> 
> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
> ---
>  drivers/nvme/target/io-cmd-bdev.c |  8 +++-----
>  drivers/nvme/target/nvmet.h       | 10 ++++++++++
>  2 files changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
> index 125dde3f410e..23095bdfce06 100644
> --- a/drivers/nvme/target/io-cmd-bdev.c
> +++ b/drivers/nvme/target/io-cmd-bdev.c
> @@ -256,8 +256,7 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
>  	if (is_pci_p2pdma_page(sg_page(req->sg)))
>  		op |= REQ_NOMERGE;
>  
> -	sector = le64_to_cpu(req->cmd->rw.slba);
> -	sector <<= (req->ns->blksize_shift - 9);
> +	sector = nvmet_lba_to_sect(req->ns, req->cmd->rw.slba);
>  
>  	if (req->transfer_len <= NVMET_MAX_INLINE_DATA_LEN) {
>  		bio = &req->b.inline_bio;
> @@ -345,7 +344,7 @@ static u16 nvmet_bdev_discard_range(struct nvmet_req *req,
>  	int ret;
>  
>  	ret = __blkdev_issue_discard(ns->bdev,
> -			le64_to_cpu(range->slba) << (ns->blksize_shift - 9),
> +			nvmet_lba_to_sect(ns, range->slba),
>  			le32_to_cpu(range->nlb) << (ns->blksize_shift - 9),
>  			GFP_KERNEL, 0, bio);
>  	if (ret && ret != -EOPNOTSUPP) {
> @@ -414,8 +413,7 @@ static void nvmet_bdev_execute_write_zeroes(struct nvmet_req *req)
>  	if (!nvmet_check_transfer_len(req, 0))
>  		return;
>  
> -	sector = le64_to_cpu(write_zeroes->slba) <<
> -		(req->ns->blksize_shift - 9);
> +	sector = nvmet_lba_to_sect(req->ns, write_zeroes->slba);
>  	nr_sector = (((sector_t)le16_to_cpu(write_zeroes->length) + 1) <<
>  		(req->ns->blksize_shift - 9));
>  
> diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
> index 592763732065..8776dd1a0490 100644
> --- a/drivers/nvme/target/nvmet.h
> +++ b/drivers/nvme/target/nvmet.h
> @@ -603,4 +603,14 @@ static inline bool nvmet_ns_has_pi(struct nvmet_ns *ns)
>  	return ns->pi_type && ns->metadata_size == sizeof(struct t10_pi_tuple);
>  }
>  
> +static inline __le64 nvmet_sect_to_lba(struct nvmet_ns *ns, sector_t sect)
> +{
> +	return cpu_to_le64(sect >> (ns->blksize_shift - SECTOR_SHIFT));
> +}
> +
> +static inline sector_t nvmet_lba_to_sect(struct nvmet_ns *ns, __le64 lba)
> +{
> +	return le64_to_cpu(lba) << (ns->blksize_shift - SECTOR_SHIFT);
> +}
> +
>  #endif /* _NVMET_H */
> 

Looks good.

Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH V9 2/9] nvmet: add lba to sect conversion helpers
@ 2021-01-12  5:08     ` Damien Le Moal
  0 siblings, 0 replies; 98+ messages in thread
From: Damien Le Moal @ 2021-01-12  5:08 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-nvme; +Cc: hch, sagi

On 2021/01/12 13:26, Chaitanya Kulkarni wrote:
> In this preparation patch, we add helpers to convert lbas to sectors &
> sectors to lba. This is needed to eliminate code duplication in the ZBD
> backend.
> 
> Use these helpers in the block device backend.
> 
> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
> ---
>  drivers/nvme/target/io-cmd-bdev.c |  8 +++-----
>  drivers/nvme/target/nvmet.h       | 10 ++++++++++
>  2 files changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
> index 125dde3f410e..23095bdfce06 100644
> --- a/drivers/nvme/target/io-cmd-bdev.c
> +++ b/drivers/nvme/target/io-cmd-bdev.c
> @@ -256,8 +256,7 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
>  	if (is_pci_p2pdma_page(sg_page(req->sg)))
>  		op |= REQ_NOMERGE;
>  
> -	sector = le64_to_cpu(req->cmd->rw.slba);
> -	sector <<= (req->ns->blksize_shift - 9);
> +	sector = nvmet_lba_to_sect(req->ns, req->cmd->rw.slba);
>  
>  	if (req->transfer_len <= NVMET_MAX_INLINE_DATA_LEN) {
>  		bio = &req->b.inline_bio;
> @@ -345,7 +344,7 @@ static u16 nvmet_bdev_discard_range(struct nvmet_req *req,
>  	int ret;
>  
>  	ret = __blkdev_issue_discard(ns->bdev,
> -			le64_to_cpu(range->slba) << (ns->blksize_shift - 9),
> +			nvmet_lba_to_sect(ns, range->slba),
>  			le32_to_cpu(range->nlb) << (ns->blksize_shift - 9),
>  			GFP_KERNEL, 0, bio);
>  	if (ret && ret != -EOPNOTSUPP) {
> @@ -414,8 +413,7 @@ static void nvmet_bdev_execute_write_zeroes(struct nvmet_req *req)
>  	if (!nvmet_check_transfer_len(req, 0))
>  		return;
>  
> -	sector = le64_to_cpu(write_zeroes->slba) <<
> -		(req->ns->blksize_shift - 9);
> +	sector = nvmet_lba_to_sect(req->ns, write_zeroes->slba);
>  	nr_sector = (((sector_t)le16_to_cpu(write_zeroes->length) + 1) <<
>  		(req->ns->blksize_shift - 9));
>  
> diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
> index 592763732065..8776dd1a0490 100644
> --- a/drivers/nvme/target/nvmet.h
> +++ b/drivers/nvme/target/nvmet.h
> @@ -603,4 +603,14 @@ static inline bool nvmet_ns_has_pi(struct nvmet_ns *ns)
>  	return ns->pi_type && ns->metadata_size == sizeof(struct t10_pi_tuple);
>  }
>  
> +static inline __le64 nvmet_sect_to_lba(struct nvmet_ns *ns, sector_t sect)
> +{
> +	return cpu_to_le64(sect >> (ns->blksize_shift - SECTOR_SHIFT));
> +}
> +
> +static inline sector_t nvmet_lba_to_sect(struct nvmet_ns *ns, __le64 lba)
> +{
> +	return le64_to_cpu(lba) << (ns->blksize_shift - SECTOR_SHIFT);
> +}
> +
>  #endif /* _NVMET_H */
> 

Looks good.

Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH V9 4/9] nvmet: add ZBD over ZNS backend support
  2021-01-12  4:26   ` Chaitanya Kulkarni
@ 2021-01-12  5:32     ` Damien Le Moal
  -1 siblings, 0 replies; 98+ messages in thread
From: Damien Le Moal @ 2021-01-12  5:32 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-nvme; +Cc: hch, sagi

On 2021/01/12 13:27, Chaitanya Kulkarni wrote:
> NVMe TP 4053 – Zoned Namespaces (ZNS) allows host software to
> communicate with a non-volatile memory subsystem using zones for
> NVMe protocol based controllers. NVMeOF already support the ZNS NVMe
> Protocol compliant devices on the target in the passthru mode. There
> are Generic zoned block devices like  Shingled Magnetic Recording (SMR)
> HDDs that are not based on the NVMe protocol.
> 
> This patch adds ZNS backend to support the ZBDs for NVMeOF target.
> 
> This support includes implementing the new command set NVME_CSI_ZNS,
> adding different command handlers for ZNS command set such as
> NVMe Identify Controller, NVMe Identify Namespace, NVMe Zone Append,
> NVMe Zone Management Send and NVMe Zone Management Receive.
> 
> With new command set identifier we also update the target command effects
> logs to reflect the ZNS compliant commands.
> 
> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
> ---
>  drivers/nvme/target/Makefile      |   1 +
>  drivers/nvme/target/admin-cmd.c   |  28 +++
>  drivers/nvme/target/core.c        |   3 +
>  drivers/nvme/target/io-cmd-bdev.c |  33 ++-
>  drivers/nvme/target/nvmet.h       |  38 ++++
>  drivers/nvme/target/zns.c         | 342 ++++++++++++++++++++++++++++++
>  6 files changed, 437 insertions(+), 8 deletions(-)
>  create mode 100644 drivers/nvme/target/zns.c
> 
> diff --git a/drivers/nvme/target/Makefile b/drivers/nvme/target/Makefile
> index ebf91fc4c72e..9837e580fa7e 100644
> --- a/drivers/nvme/target/Makefile
> +++ b/drivers/nvme/target/Makefile
> @@ -12,6 +12,7 @@ obj-$(CONFIG_NVME_TARGET_TCP)		+= nvmet-tcp.o
>  nvmet-y		+= core.o configfs.o admin-cmd.o fabrics-cmd.o \
>  			discovery.o io-cmd-file.o io-cmd-bdev.o
>  nvmet-$(CONFIG_NVME_TARGET_PASSTHRU)	+= passthru.o
> +nvmet-$(CONFIG_BLK_DEV_ZONED)		+= zns.o
>  nvme-loop-y	+= loop.o
>  nvmet-rdma-y	+= rdma.o
>  nvmet-fc-y	+= fc.o
> diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
> index a50b7bcac67a..bdf09d8faa48 100644
> --- a/drivers/nvme/target/admin-cmd.c
> +++ b/drivers/nvme/target/admin-cmd.c
> @@ -191,6 +191,15 @@ static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req)
>  		log->iocs[nvme_cmd_dsm]			= cpu_to_le32(1 << 0);
>  		log->iocs[nvme_cmd_write_zeroes]	= cpu_to_le32(1 << 0);
>  		break;
> +	case NVME_CSI_ZNS:
> +		if (IS_ENABLED(CONFIG_BLK_DEV_ZONED)) {
> +			u32 *iocs = log->iocs;
> +
> +			iocs[nvme_cmd_zone_append]	= cpu_to_le32(1 << 0);
> +			iocs[nvme_cmd_zone_mgmt_send]	= cpu_to_le32(1 << 0);
> +			iocs[nvme_cmd_zone_mgmt_recv]	= cpu_to_le32(1 << 0);
> +		}

Is it OK to not return an error here if CONFIG_BLK_DEV_ZONED is not enabled ?
I have not checked the entire code of this function nor how it is called, so I
may be wrong.

> +		break;
>  	default:
>  		status = NVME_SC_INVALID_LOG_PAGE;
>  		break;
> @@ -644,6 +653,17 @@ static void nvmet_execute_identify_desclist(struct nvmet_req *req)
>  	if (status)
>  		goto out;
>  
> +	if (IS_ENABLED(CONFIG_BLK_DEV_ZONED)) {
> +		u16 nvme_cis_zns = NVME_CSI_ZNS;
> +
> +		if (req->ns->csi == NVME_CSI_ZNS)
> +			status = nvmet_copy_ns_identifier(req, NVME_NIDT_CSI,
> +							  NVME_NIDT_CSI_LEN,
> +							  &nvme_cis_zns, &off);
> +		if (status)
> +			goto out;
> +	}

Same comment here.

> +
>  	if (sg_zero_buffer(req->sg, req->sg_cnt, NVME_IDENTIFY_DATA_SIZE - off,
>  			off) != NVME_IDENTIFY_DATA_SIZE - off)
>  		status = NVME_SC_INTERNAL | NVME_SC_DNR;
> @@ -660,8 +680,16 @@ static void nvmet_execute_identify(struct nvmet_req *req)
>  	switch (req->cmd->identify.cns) {
>  	case NVME_ID_CNS_NS:
>  		return nvmet_execute_identify_ns(req);
> +	case NVME_ID_CNS_CS_NS:
> +		if (req->cmd->identify.csi == NVME_CSI_ZNS)
> +			return nvmet_execute_identify_cns_cs_ns(req);
> +		break;
>  	case NVME_ID_CNS_CTRL:
>  		return nvmet_execute_identify_ctrl(req);
> +	case NVME_ID_CNS_CS_CTRL:
> +		if (req->cmd->identify.csi == NVME_CSI_ZNS)
> +			return nvmet_execute_identify_cns_cs_ctrl(req);
> +		break;
>  	case NVME_ID_CNS_NS_ACTIVE_LIST:
>  		return nvmet_execute_identify_nslist(req);
>  	case NVME_ID_CNS_NS_DESC_LIST:
> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
> index 672e4009f8d6..17d5da062a5a 100644
> --- a/drivers/nvme/target/core.c
> +++ b/drivers/nvme/target/core.c
> @@ -1107,6 +1107,7 @@ static inline u8 nvmet_cc_iocqes(u32 cc)
>  static inline bool nvmet_cc_css_check(u8 cc_css)
>  {
>  	switch (cc_css <<= NVME_CC_CSS_SHIFT) {
> +	case NVME_CC_CSS_CSI:
>  	case NVME_CC_CSS_NVM:
>  		return true;
>  	default:
> @@ -1173,6 +1174,8 @@ static void nvmet_init_cap(struct nvmet_ctrl *ctrl)
>  {
>  	/* command sets supported: NVMe command set: */
>  	ctrl->cap = (1ULL << 37);
> +	if (IS_ENABLED(CONFIG_BLK_DEV_ZONED))
> +		ctrl->cap |= (1ULL << 43);
>  	/* CC.EN timeout in 500msec units: */
>  	ctrl->cap |= (15ULL << 24);
>  	/* maximum queue entries supported: */
> diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
> index 23095bdfce06..6178ef643962 100644
> --- a/drivers/nvme/target/io-cmd-bdev.c
> +++ b/drivers/nvme/target/io-cmd-bdev.c
> @@ -63,6 +63,14 @@ static void nvmet_bdev_ns_enable_integrity(struct nvmet_ns *ns)
>  	}
>  }
>  
> +void nvmet_bdev_ns_disable(struct nvmet_ns *ns)
> +{
> +	if (ns->bdev) {
> +		blkdev_put(ns->bdev, FMODE_WRITE | FMODE_READ);
> +		ns->bdev = NULL;
> +	}
> +}
> +
>  int nvmet_bdev_ns_enable(struct nvmet_ns *ns)
>  {
>  	int ret;
> @@ -86,15 +94,15 @@ int nvmet_bdev_ns_enable(struct nvmet_ns *ns)
>  	if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY_T10))
>  		nvmet_bdev_ns_enable_integrity(ns);
>  
> -	return 0;
> -}
> -
> -void nvmet_bdev_ns_disable(struct nvmet_ns *ns)
> -{
> -	if (ns->bdev) {
> -		blkdev_put(ns->bdev, FMODE_WRITE | FMODE_READ);
> -		ns->bdev = NULL;
> +	if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) && bdev_is_zoned(ns->bdev)) {
> +		if (!nvmet_bdev_zns_enable(ns)) {
> +			nvmet_bdev_ns_disable(ns);
> +			return -EINVAL;
> +		}
> +		ns->csi = NVME_CSI_ZNS;
>  	}
> +
> +	return 0;
>  }
>  
>  void nvmet_bdev_ns_revalidate(struct nvmet_ns *ns)
> @@ -448,6 +456,15 @@ u16 nvmet_bdev_parse_io_cmd(struct nvmet_req *req)
>  	case nvme_cmd_write_zeroes:
>  		req->execute = nvmet_bdev_execute_write_zeroes;
>  		return 0;
> +	case nvme_cmd_zone_append:
> +		req->execute = nvmet_bdev_execute_zone_append;
> +		return 0;
> +	case nvme_cmd_zone_mgmt_recv:
> +		req->execute = nvmet_bdev_execute_zone_mgmt_recv;
> +		return 0;
> +	case nvme_cmd_zone_mgmt_send:
> +		req->execute = nvmet_bdev_execute_zone_mgmt_send;
> +		return 0;
>  	default:
>  		pr_err("unhandled cmd %d on qid %d\n", cmd->common.opcode,
>  		       req->sq->qid);
> diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
> index 476b3cd91c65..7361665585a2 100644
> --- a/drivers/nvme/target/nvmet.h
> +++ b/drivers/nvme/target/nvmet.h
> @@ -252,6 +252,10 @@ struct nvmet_subsys {
>  	unsigned int		admin_timeout;
>  	unsigned int		io_timeout;
>  #endif /* CONFIG_NVME_TARGET_PASSTHRU */
> +
> +#ifdef CONFIG_BLK_DEV_ZONED
> +	u8			zasl;
> +#endif /* CONFIG_BLK_DEV_ZONED */
>  };
>  
>  static inline struct nvmet_subsys *to_subsys(struct config_item *item)
> @@ -614,4 +618,38 @@ static inline sector_t nvmet_lba_to_sect(struct nvmet_ns *ns, __le64 lba)
>  	return le64_to_cpu(lba) << (ns->blksize_shift - SECTOR_SHIFT);
>  }
>  
> +#ifdef CONFIG_BLK_DEV_ZONED
> +bool nvmet_bdev_zns_enable(struct nvmet_ns *ns);
> +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req);
> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req);
> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req);
> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req);
> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req);
> +#else  /* CONFIG_BLK_DEV_ZONED */
> +static inline bool nvmet_bdev_zns_enable(struct nvmet_ns *ns)
> +{
> +	return false;
> +}
> +static inline void
> +nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
> +{
> +}
> +static inline void
> +nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
> +{
> +}
> +static inline void
> +nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
> +{
> +}
> +static inline void
> +nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
> +{
> +}
> +static inline void
> +nvmet_bdev_execute_zone_append(struct nvmet_req *req)
> +{
> +}
> +#endif /* CONFIG_BLK_DEV_ZONED */
> +
>  #endif /* _NVMET_H */
> diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
> new file mode 100644
> index 000000000000..2a71f56e568d
> --- /dev/null
> +++ b/drivers/nvme/target/zns.c
> @@ -0,0 +1,342 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * NVMe ZNS-ZBD command implementation.
> + * Copyright (c) 2020-2021 HGST, a Western Digital Company.
> + */
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +#include <linux/nvme.h>
> +#include <linux/blkdev.h>
> +#include "nvmet.h"
> +
> +/*
> + * We set the Memory Page Size Minimum (MPSMIN) for target controller to 0
> + * which gets added by 12 in the nvme_enable_ctrl() which results in 2^12 = 4k
> + * as page_shift value. When calculating the ZASL use shift by 12.
> + */
> +#define NVMET_MPSMIN_SHIFT	12
> +
> +static u16 nvmet_bdev_zns_checks(struct nvmet_req *req)
> +{
> +	u16 status = NVME_SC_SUCCESS;
> +
> +	if (!bdev_is_zoned(req->ns->bdev)) {
> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
> +		goto out;
> +	}
> +
> +	if (req->cmd->zmr.zra != NVME_ZRA_ZONE_REPORT) {
> +		status = NVME_SC_INVALID_FIELD;
> +		goto out;
> +	}
> +
> +	if (req->cmd->zmr.zrasf != NVME_ZRASF_ZONE_REPORT_ALL) {
> +		status = NVME_SC_INVALID_FIELD;
> +		goto out;
> +	}
> +
> +	if (req->cmd->zmr.pr != NVME_REPORT_ZONE_PARTIAL)
> +		status = NVME_SC_INVALID_FIELD;
> +
> +out:

You really want to keep this (useless) label ? Without it, the status variable
can be dropped and the code overall becomes much easier to read. Not to
mention that it makes the compiler's job of optimizing this easier.

> +	return status;
> +}
> +
> +/*
> + *  ZNS related command implementation and helpers.
> + */
> +
> +static inline u8 nvmet_zasl(unsigned int zone_append_sects)
> +{
> +	/*
> +	 * Zone Append Size Limit is a value expressed in units of the
> +	 * minimum memory page size (i.e. 2^12) and is reported as a power of 2.
> +	 */
> +	return ilog2((zone_append_sects << 9) >> NVMET_MPSMIN_SHIFT);
> +}
> +
> +static inline bool nvmet_zns_update_zasl(struct nvmet_ns *ns)
> +{
> +	struct request_queue *q = ns->bdev->bd_disk->queue;
> +	u8 zasl = nvmet_zasl(queue_max_zone_append_sectors(q));
> +
> +	if (ns->subsys->zasl)
> +		return ns->subsys->zasl < zasl ? false : true;
> +
> +	ns->subsys->zasl = zasl;
> +	return true;
> +}
> +
> +
> +static int nvmet_bdev_validate_zns_zones_cb(struct blk_zone *z,
> +					    unsigned int idx, void *data)
> +{
> +	if (z->type == BLK_ZONE_TYPE_CONVENTIONAL)
> +		return -EOPNOTSUPP;
> +	return 0;
> +}
> +
> +static bool nvmet_bdev_has_conv_zones(struct block_device *bdev)
> +{
> +	int ret;
> +
> +	if (bdev->bd_disk->queue->conv_zones_bitmap)
> +		return true;
> +
> +	ret = blkdev_report_zones(bdev, 0, blkdev_nr_zones(bdev->bd_disk),
> +				  nvmet_bdev_validate_zns_zones_cb, NULL);
> +
> +	return ret < 0 ? true : false;

return ret <= 0;

would be simpler.

Note that "<=" includes the error case of the device not reporting any zone
(device dead), which I think we should fail as well.

> +}
> +
> +bool nvmet_bdev_zns_enable(struct nvmet_ns *ns)
> +{
> +	if (nvmet_bdev_has_conv_zones(ns->bdev))
> +		return false;
> +
> +	/*
> +	 * For ZBC and ZAC devices, writes into sequential zones must be aligned
> +	 * to the device physical block size. So use this value as the logical
> +	 * block size to avoid errors.
> +	 */
> +	ns->blksize_shift = blksize_bits(bdev_physical_block_size(ns->bdev));
> +
> +	if (!nvmet_zns_update_zasl(ns))
> +		return false;
> +
> +	return !(get_capacity(ns->bdev->bd_disk) &
> +			(bdev_zone_sectors(ns->bdev) - 1));
> +}
> +
> +/*
> + * ZNS related Admin and I/O command handlers.
> + */
> +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
> +{
> +	u8 zasl = req->sq->ctrl->subsys->zasl;
> +	struct nvmet_ctrl *ctrl = req->sq->ctrl;
> +	struct nvme_id_ctrl_zns *id;
> +	u16 status;
> +
> +	id = kzalloc(sizeof(*id), GFP_KERNEL);
> +	if (!id) {
> +		status = NVME_SC_INTERNAL;
> +		goto out;
> +	}
> +
> +	if (ctrl->ops->get_mdts)
> +		id->zasl = min_t(u8, ctrl->ops->get_mdts(ctrl), zasl);
> +	else
> +		id->zasl = zasl;
> +
> +	status = nvmet_copy_to_sgl(req, 0, id, sizeof(*id));
> +
> +	kfree(id);
> +out:
> +	nvmet_req_complete(req, status);
> +}
> +
> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
> +{
> +	struct nvme_id_ns_zns *id_zns;
> +	u16 status = NVME_SC_SUCCESS;
> +	u64 zsze;
> +
> +	if (le32_to_cpu(req->cmd->identify.nsid) == NVME_NSID_ALL) {
> +		req->error_loc = offsetof(struct nvme_identify, nsid);
> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
> +		goto out;
> +	}
> +
> +	id_zns = kzalloc(sizeof(*id_zns), GFP_KERNEL);
> +	if (!id_zns) {
> +		status = NVME_SC_INTERNAL;
> +		goto out;
> +	}
> +
> +	req->ns = nvmet_find_namespace(req->sq->ctrl, req->cmd->identify.nsid);
> +	if (!req->ns) {
> +		status = NVME_SC_INTERNAL;
> +		goto done;
> +	}
> +
> +	if (!bdev_is_zoned(req->ns->bdev)) {
> +		req->error_loc = offsetof(struct nvme_identify, nsid);
> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
> +		goto done;
> +	}
> +
> +	nvmet_ns_revalidate(req->ns);
> +	zsze = (bdev_zone_sectors(req->ns->bdev) << 9) >>
> +					req->ns->blksize_shift;
> +	id_zns->lbafe[0].zsze = cpu_to_le64(zsze);
> +	id_zns->mor = cpu_to_le32(bdev_max_open_zones(req->ns->bdev));
> +	id_zns->mar = cpu_to_le32(bdev_max_active_zones(req->ns->bdev));
> +
> +done:
> +	status = nvmet_copy_to_sgl(req, 0, id_zns, sizeof(*id_zns));
> +	kfree(id_zns);
> +out:
> +	nvmet_req_complete(req, status);
> +}
> +
> +struct nvmet_report_zone_data {
> +	struct nvmet_ns *ns;
> +	struct nvme_zone_report *rz;
> +};
> +
> +static int nvmet_bdev_report_zone_cb(struct blk_zone *z, unsigned int idx,
> +				     void *data)
> +{
> +	struct nvmet_report_zone_data *report_zone_data = data;
> +	struct nvme_zone_descriptor *entries = report_zone_data->rz->entries;
> +	struct nvmet_ns *ns = report_zone_data->ns;
> +
> +	entries[idx].zcap = nvmet_sect_to_lba(ns, z->capacity);
> +	entries[idx].zslba = nvmet_sect_to_lba(ns, z->start);
> +	entries[idx].wp = nvmet_sect_to_lba(ns, z->wp);
> +	entries[idx].za = z->reset ? 1 << 2 : 0;
> +	entries[idx].zt = z->type;
> +	entries[idx].zs = z->cond << 4;
> +
> +	return 0;
> +}
> +
> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
> +{
> +	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->zmr.slba);
> +	u32 bufsize = (le32_to_cpu(req->cmd->zmr.numd) + 1) << 2;
> +	struct nvmet_report_zone_data data = { .ns = req->ns };
> +	unsigned int nr_zones;
> +	int reported_zones;
> +	u16 status;
> +
> +	nr_zones = (bufsize - sizeof(struct nvme_zone_report)) /
> +			sizeof(struct nvme_zone_descriptor);

I really would prefer this code to be moved down, before the call to
blkdev_report_zones().

You can also optimize this value a little with a min() of the value above and of
DIV_ROUND_UP(dev_capacity - sect, zone size). But not a big deal I think.

> +
> +	status = nvmet_bdev_zns_checks(req);
> +	if (status)
> +		goto out;
> +
> +	data.rz = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY | __GFP_ZERO);

Shouldn't this be GFP_NOIO ? Also, is the NORETRY critical ?
blkdev_report_zones() will do memory allocation too, and at least SCSI does retry.

> +	if (!data.rz) {
> +		status = NVME_SC_INTERNAL;
> +		goto out;
> +	}
> +
> +	reported_zones = blkdev_report_zones(req->ns->bdev, sect, nr_zones,
> +					     nvmet_bdev_report_zone_cb,
> +					     &data);
> +	if (reported_zones < 0) {
> +		status = NVME_SC_INTERNAL;
> +		goto out_free_report_zones;
> +	}
> +
> +	data.rz->nr_zones = cpu_to_le64(reported_zones);
> +
> +	status = nvmet_copy_to_sgl(req, 0, data.rz, bufsize);
> +
> +out_free_report_zones:
> +	kvfree(data.rz);
> +out:
> +	nvmet_req_complete(req, status);
> +}
> +
> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
> +{
> +	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->zms.slba);
> +	sector_t nr_sect = bdev_zone_sectors(req->ns->bdev);
> +	u16 status = NVME_SC_SUCCESS;
> +	enum req_opf op;
> +	int ret;
> +
> +	if (req->cmd->zms.select_all)
> +		nr_sect = get_capacity(req->ns->bdev->bd_disk);
> +
> +	switch (req->cmd->zms.zsa) {
> +	case NVME_ZONE_OPEN:
> +		op = REQ_OP_ZONE_OPEN;
> +		break;
> +	case NVME_ZONE_CLOSE:
> +		op = REQ_OP_ZONE_CLOSE;
> +		break;
> +	case NVME_ZONE_FINISH:
> +		op = REQ_OP_ZONE_FINISH;
> +		break;
> +	case NVME_ZONE_RESET:
> +		op = REQ_OP_ZONE_RESET;
> +		break;
> +	default:
> +		status = NVME_SC_INVALID_FIELD;
> +		goto out;
> +	}
> +
> +	ret = blkdev_zone_mgmt(req->ns->bdev, op, sect, nr_sect, GFP_KERNEL);

GFP_NOIO ?

> +	if (ret)
> +		status = NVME_SC_INTERNAL;
> +out:
> +	nvmet_req_complete(req, status);
> +}
> +
> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
> +{
> +	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->rw.slba);
> +	struct request_queue *q = req->ns->bdev->bd_disk->queue;
> +	unsigned int max_sects = queue_max_zone_append_sectors(q);
> +	u16 status = NVME_SC_SUCCESS;
> +	unsigned int total_len = 0;
> +	struct scatterlist *sg;
> +	int ret = 0, sg_cnt;
> +	struct bio *bio;
> +
> +	if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req)))
> +		return;
> +
> +	if (!req->sg_cnt) {
> +		nvmet_req_complete(req, 0);
> +		return;
> +	}
> +
> +	if (req->transfer_len <= NVMET_MAX_INLINE_DATA_LEN) {
> +		bio = &req->b.inline_bio;
> +		bio_init(bio, req->inline_bvec, ARRAY_SIZE(req->inline_bvec));
> +	} else {
> +		bio = bio_alloc(GFP_KERNEL, req->sg_cnt);
> +	}
> +
> +	bio_set_dev(bio, req->ns->bdev);
> +	bio->bi_iter.bi_sector = sect;
> +	bio->bi_opf = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
> +	if (req->cmd->rw.control & cpu_to_le16(NVME_RW_FUA))
> +		bio->bi_opf |= REQ_FUA;
> +
> +	for_each_sg(req->sg, sg, req->sg_cnt, sg_cnt) {
> +		struct page *p = sg_page(sg);
> +		unsigned int l = sg->length;
> +		unsigned int o = sg->offset;
> +		bool same_page = false;
> +
> +		ret = bio_add_hw_page(q, bio, p, l, o, max_sects, &same_page);
> +		if (ret != sg->length) {
> +			status = NVME_SC_INTERNAL;
> +			goto out_bio_put;
> +		}
> +		if (same_page)
> +			put_page(p);
> +
> +		total_len += sg->length;
> +	}
> +
> +	if (total_len != nvmet_rw_data_len(req)) {
> +		status = NVME_SC_INTERNAL | NVME_SC_DNR;
> +		goto out_bio_put;
> +	}
> +
> +	ret = submit_bio_wait(bio);
> +	req->cqe->result.u64 = nvmet_sect_to_lba(req->ns,
> +						 bio->bi_iter.bi_sector);
> +
> +out_bio_put:
> +	if (bio != &req->b.inline_bio)
> +		bio_put(bio);
> +	nvmet_req_complete(req, ret < 0 ? NVME_SC_INTERNAL : status);
> +}
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH V9 4/9] nvmet: add ZBD over ZNS backend support
@ 2021-01-12  5:32     ` Damien Le Moal
  0 siblings, 0 replies; 98+ messages in thread
From: Damien Le Moal @ 2021-01-12  5:32 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-nvme; +Cc: hch, sagi

On 2021/01/12 13:27, Chaitanya Kulkarni wrote:
> NVMe TP 4053 – Zoned Namespaces (ZNS) allows host software to
> communicate with a non-volatile memory subsystem using zones for
> NVMe protocol based controllers. NVMeOF already support the ZNS NVMe
> Protocol compliant devices on the target in the passthru mode. There
> are Generic zoned block devices like  Shingled Magnetic Recording (SMR)
> HDDs that are not based on the NVMe protocol.
> 
> This patch adds ZNS backend to support the ZBDs for NVMeOF target.
> 
> This support includes implementing the new command set NVME_CSI_ZNS,
> adding different command handlers for ZNS command set such as
> NVMe Identify Controller, NVMe Identify Namespace, NVMe Zone Append,
> NVMe Zone Management Send and NVMe Zone Management Receive.
> 
> With new command set identifier we also update the target command effects
> logs to reflect the ZNS compliant commands.
> 
> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
> ---
>  drivers/nvme/target/Makefile      |   1 +
>  drivers/nvme/target/admin-cmd.c   |  28 +++
>  drivers/nvme/target/core.c        |   3 +
>  drivers/nvme/target/io-cmd-bdev.c |  33 ++-
>  drivers/nvme/target/nvmet.h       |  38 ++++
>  drivers/nvme/target/zns.c         | 342 ++++++++++++++++++++++++++++++
>  6 files changed, 437 insertions(+), 8 deletions(-)
>  create mode 100644 drivers/nvme/target/zns.c
> 
> diff --git a/drivers/nvme/target/Makefile b/drivers/nvme/target/Makefile
> index ebf91fc4c72e..9837e580fa7e 100644
> --- a/drivers/nvme/target/Makefile
> +++ b/drivers/nvme/target/Makefile
> @@ -12,6 +12,7 @@ obj-$(CONFIG_NVME_TARGET_TCP)		+= nvmet-tcp.o
>  nvmet-y		+= core.o configfs.o admin-cmd.o fabrics-cmd.o \
>  			discovery.o io-cmd-file.o io-cmd-bdev.o
>  nvmet-$(CONFIG_NVME_TARGET_PASSTHRU)	+= passthru.o
> +nvmet-$(CONFIG_BLK_DEV_ZONED)		+= zns.o
>  nvme-loop-y	+= loop.o
>  nvmet-rdma-y	+= rdma.o
>  nvmet-fc-y	+= fc.o
> diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
> index a50b7bcac67a..bdf09d8faa48 100644
> --- a/drivers/nvme/target/admin-cmd.c
> +++ b/drivers/nvme/target/admin-cmd.c
> @@ -191,6 +191,15 @@ static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req)
>  		log->iocs[nvme_cmd_dsm]			= cpu_to_le32(1 << 0);
>  		log->iocs[nvme_cmd_write_zeroes]	= cpu_to_le32(1 << 0);
>  		break;
> +	case NVME_CSI_ZNS:
> +		if (IS_ENABLED(CONFIG_BLK_DEV_ZONED)) {
> +			u32 *iocs = log->iocs;
> +
> +			iocs[nvme_cmd_zone_append]	= cpu_to_le32(1 << 0);
> +			iocs[nvme_cmd_zone_mgmt_send]	= cpu_to_le32(1 << 0);
> +			iocs[nvme_cmd_zone_mgmt_recv]	= cpu_to_le32(1 << 0);
> +		}

Is it OK to not return an error here if CONFIG_BLK_DEV_ZONED is not enabled ?
I have not checked the entire code of this function nor how it is called, so I
may be wrong.

> +		break;
>  	default:
>  		status = NVME_SC_INVALID_LOG_PAGE;
>  		break;
> @@ -644,6 +653,17 @@ static void nvmet_execute_identify_desclist(struct nvmet_req *req)
>  	if (status)
>  		goto out;
>  
> +	if (IS_ENABLED(CONFIG_BLK_DEV_ZONED)) {
> +		u16 nvme_cis_zns = NVME_CSI_ZNS;
> +
> +		if (req->ns->csi == NVME_CSI_ZNS)
> +			status = nvmet_copy_ns_identifier(req, NVME_NIDT_CSI,
> +							  NVME_NIDT_CSI_LEN,
> +							  &nvme_cis_zns, &off);
> +		if (status)
> +			goto out;
> +	}

Same comment here.

> +
>  	if (sg_zero_buffer(req->sg, req->sg_cnt, NVME_IDENTIFY_DATA_SIZE - off,
>  			off) != NVME_IDENTIFY_DATA_SIZE - off)
>  		status = NVME_SC_INTERNAL | NVME_SC_DNR;
> @@ -660,8 +680,16 @@ static void nvmet_execute_identify(struct nvmet_req *req)
>  	switch (req->cmd->identify.cns) {
>  	case NVME_ID_CNS_NS:
>  		return nvmet_execute_identify_ns(req);
> +	case NVME_ID_CNS_CS_NS:
> +		if (req->cmd->identify.csi == NVME_CSI_ZNS)
> +			return nvmet_execute_identify_cns_cs_ns(req);
> +		break;
>  	case NVME_ID_CNS_CTRL:
>  		return nvmet_execute_identify_ctrl(req);
> +	case NVME_ID_CNS_CS_CTRL:
> +		if (req->cmd->identify.csi == NVME_CSI_ZNS)
> +			return nvmet_execute_identify_cns_cs_ctrl(req);
> +		break;
>  	case NVME_ID_CNS_NS_ACTIVE_LIST:
>  		return nvmet_execute_identify_nslist(req);
>  	case NVME_ID_CNS_NS_DESC_LIST:
> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
> index 672e4009f8d6..17d5da062a5a 100644
> --- a/drivers/nvme/target/core.c
> +++ b/drivers/nvme/target/core.c
> @@ -1107,6 +1107,7 @@ static inline u8 nvmet_cc_iocqes(u32 cc)
>  static inline bool nvmet_cc_css_check(u8 cc_css)
>  {
>  	switch (cc_css <<= NVME_CC_CSS_SHIFT) {
> +	case NVME_CC_CSS_CSI:
>  	case NVME_CC_CSS_NVM:
>  		return true;
>  	default:
> @@ -1173,6 +1174,8 @@ static void nvmet_init_cap(struct nvmet_ctrl *ctrl)
>  {
>  	/* command sets supported: NVMe command set: */
>  	ctrl->cap = (1ULL << 37);
> +	if (IS_ENABLED(CONFIG_BLK_DEV_ZONED))
> +		ctrl->cap |= (1ULL << 43);
>  	/* CC.EN timeout in 500msec units: */
>  	ctrl->cap |= (15ULL << 24);
>  	/* maximum queue entries supported: */
> diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
> index 23095bdfce06..6178ef643962 100644
> --- a/drivers/nvme/target/io-cmd-bdev.c
> +++ b/drivers/nvme/target/io-cmd-bdev.c
> @@ -63,6 +63,14 @@ static void nvmet_bdev_ns_enable_integrity(struct nvmet_ns *ns)
>  	}
>  }
>  
> +void nvmet_bdev_ns_disable(struct nvmet_ns *ns)
> +{
> +	if (ns->bdev) {
> +		blkdev_put(ns->bdev, FMODE_WRITE | FMODE_READ);
> +		ns->bdev = NULL;
> +	}
> +}
> +
>  int nvmet_bdev_ns_enable(struct nvmet_ns *ns)
>  {
>  	int ret;
> @@ -86,15 +94,15 @@ int nvmet_bdev_ns_enable(struct nvmet_ns *ns)
>  	if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY_T10))
>  		nvmet_bdev_ns_enable_integrity(ns);
>  
> -	return 0;
> -}
> -
> -void nvmet_bdev_ns_disable(struct nvmet_ns *ns)
> -{
> -	if (ns->bdev) {
> -		blkdev_put(ns->bdev, FMODE_WRITE | FMODE_READ);
> -		ns->bdev = NULL;
> +	if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) && bdev_is_zoned(ns->bdev)) {
> +		if (!nvmet_bdev_zns_enable(ns)) {
> +			nvmet_bdev_ns_disable(ns);
> +			return -EINVAL;
> +		}
> +		ns->csi = NVME_CSI_ZNS;
>  	}
> +
> +	return 0;
>  }
>  
>  void nvmet_bdev_ns_revalidate(struct nvmet_ns *ns)
> @@ -448,6 +456,15 @@ u16 nvmet_bdev_parse_io_cmd(struct nvmet_req *req)
>  	case nvme_cmd_write_zeroes:
>  		req->execute = nvmet_bdev_execute_write_zeroes;
>  		return 0;
> +	case nvme_cmd_zone_append:
> +		req->execute = nvmet_bdev_execute_zone_append;
> +		return 0;
> +	case nvme_cmd_zone_mgmt_recv:
> +		req->execute = nvmet_bdev_execute_zone_mgmt_recv;
> +		return 0;
> +	case nvme_cmd_zone_mgmt_send:
> +		req->execute = nvmet_bdev_execute_zone_mgmt_send;
> +		return 0;
>  	default:
>  		pr_err("unhandled cmd %d on qid %d\n", cmd->common.opcode,
>  		       req->sq->qid);
> diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
> index 476b3cd91c65..7361665585a2 100644
> --- a/drivers/nvme/target/nvmet.h
> +++ b/drivers/nvme/target/nvmet.h
> @@ -252,6 +252,10 @@ struct nvmet_subsys {
>  	unsigned int		admin_timeout;
>  	unsigned int		io_timeout;
>  #endif /* CONFIG_NVME_TARGET_PASSTHRU */
> +
> +#ifdef CONFIG_BLK_DEV_ZONED
> +	u8			zasl;
> +#endif /* CONFIG_BLK_DEV_ZONED */
>  };
>  
>  static inline struct nvmet_subsys *to_subsys(struct config_item *item)
> @@ -614,4 +618,38 @@ static inline sector_t nvmet_lba_to_sect(struct nvmet_ns *ns, __le64 lba)
>  	return le64_to_cpu(lba) << (ns->blksize_shift - SECTOR_SHIFT);
>  }
>  
> +#ifdef CONFIG_BLK_DEV_ZONED
> +bool nvmet_bdev_zns_enable(struct nvmet_ns *ns);
> +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req);
> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req);
> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req);
> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req);
> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req);
> +#else  /* CONFIG_BLK_DEV_ZONED */
> +static inline bool nvmet_bdev_zns_enable(struct nvmet_ns *ns)
> +{
> +	return false;
> +}
> +static inline void
> +nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
> +{
> +}
> +static inline void
> +nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
> +{
> +}
> +static inline void
> +nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
> +{
> +}
> +static inline void
> +nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
> +{
> +}
> +static inline void
> +nvmet_bdev_execute_zone_append(struct nvmet_req *req)
> +{
> +}
> +#endif /* CONFIG_BLK_DEV_ZONED */
> +
>  #endif /* _NVMET_H */
> diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
> new file mode 100644
> index 000000000000..2a71f56e568d
> --- /dev/null
> +++ b/drivers/nvme/target/zns.c
> @@ -0,0 +1,342 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * NVMe ZNS-ZBD command implementation.
> + * Copyright (c) 2020-2021 HGST, a Western Digital Company.
> + */
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +#include <linux/nvme.h>
> +#include <linux/blkdev.h>
> +#include "nvmet.h"
> +
> +/*
> + * We set the Memory Page Size Minimum (MPSMIN) for target controller to 0
> + * which gets added by 12 in the nvme_enable_ctrl() which results in 2^12 = 4k
> + * as page_shift value. When calculating the ZASL use shift by 12.
> + */
> +#define NVMET_MPSMIN_SHIFT	12
> +
> +static u16 nvmet_bdev_zns_checks(struct nvmet_req *req)
> +{
> +	u16 status = NVME_SC_SUCCESS;
> +
> +	if (!bdev_is_zoned(req->ns->bdev)) {
> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
> +		goto out;
> +	}
> +
> +	if (req->cmd->zmr.zra != NVME_ZRA_ZONE_REPORT) {
> +		status = NVME_SC_INVALID_FIELD;
> +		goto out;
> +	}
> +
> +	if (req->cmd->zmr.zrasf != NVME_ZRASF_ZONE_REPORT_ALL) {
> +		status = NVME_SC_INVALID_FIELD;
> +		goto out;
> +	}
> +
> +	if (req->cmd->zmr.pr != NVME_REPORT_ZONE_PARTIAL)
> +		status = NVME_SC_INVALID_FIELD;
> +
> +out:

You really want to keep this (useless) label ? Without it, the status variable
can be dropped and the code overall becomes so much easier to read... Not to
mention that life will be easier to the compiler for optimizing this.

> +	return status;
> +}
> +
> +/*
> + *  ZNS related command implementation and helpers.
> + */
> +
> +static inline u8 nvmet_zasl(unsigned int zone_append_sects)
> +{
> +	/*
> +	 * Zone Append Size Limit is the value experessed in the units
> +	 * of minimum memory page size (i.e. 12) and is reported power of 2.
> +	 */
> +	return ilog2((zone_append_sects << 9) >> NVMET_MPSMIN_SHIFT);
> +}
> +
> +static inline bool nvmet_zns_update_zasl(struct nvmet_ns *ns)
> +{
> +	struct request_queue *q = ns->bdev->bd_disk->queue;
> +	u8 zasl = nvmet_zasl(queue_max_zone_append_sectors(q));
> +
> +	if (ns->subsys->zasl)
> +		return ns->subsys->zasl < zasl ? false : true;
> +
> +	ns->subsys->zasl = zasl;
> +	return true;
> +}
> +
> +
> +static int nvmet_bdev_validate_zns_zones_cb(struct blk_zone *z,
> +					    unsigned int idx, void *data)
> +{
> +	if (z->type == BLK_ZONE_TYPE_CONVENTIONAL)
> +		return -EOPNOTSUPP;
> +	return 0;
> +}
> +
> +static bool nvmet_bdev_has_conv_zones(struct block_device *bdev)
> +{
> +	int ret;
> +
> +	if (bdev->bd_disk->queue->conv_zones_bitmap)
> +		return true;
> +
> +	ret = blkdev_report_zones(bdev, 0, blkdev_nr_zones(bdev->bd_disk),
> +				  nvmet_bdev_validate_zns_zones_cb, NULL);
> +
> +	return ret < 0 ? true : false;

return ret <= 0;

would be simpler.

Note that "<=" includes the error case of the device not reporting any zone
(device dead) as we should fail that case I think.

> +}
> +
> +bool nvmet_bdev_zns_enable(struct nvmet_ns *ns)
> +{
> +	if (nvmet_bdev_has_conv_zones(ns->bdev))
> +		return false;
> +
> +	/*
> +	 * For ZBC and ZAC devices, writes into sequential zones must be aligned
> +	 * to the device physical block size. So use this value as the logical
> +	 * block size to avoid errors.
> +	 */
> +	ns->blksize_shift = blksize_bits(bdev_physical_block_size(ns->bdev));
> +
> +	if (!nvmet_zns_update_zasl(ns))
> +		return false;
> +
> +	return !(get_capacity(ns->bdev->bd_disk) &
> +			(bdev_zone_sectors(ns->bdev) - 1));
> +}
> +
> +/*
> + * ZNS related Admin and I/O command handlers.
> + */
> +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
> +{
> +	u8 zasl = req->sq->ctrl->subsys->zasl;
> +	struct nvmet_ctrl *ctrl = req->sq->ctrl;
> +	struct nvme_id_ctrl_zns *id;
> +	u16 status;
> +
> +	id = kzalloc(sizeof(*id), GFP_KERNEL);
> +	if (!id) {
> +		status = NVME_SC_INTERNAL;
> +		goto out;
> +	}
> +
> +	if (ctrl->ops->get_mdts)
> +		id->zasl = min_t(u8, ctrl->ops->get_mdts(ctrl), zasl);
> +	else
> +		id->zasl = zasl;
> +
> +	status = nvmet_copy_to_sgl(req, 0, id, sizeof(*id));
> +
> +	kfree(id);
> +out:
> +	nvmet_req_complete(req, status);
> +}
> +
> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
> +{
> +	struct nvme_id_ns_zns *id_zns;
> +	u16 status = NVME_SC_SUCCESS;
> +	u64 zsze;
> +
> +	if (le32_to_cpu(req->cmd->identify.nsid) == NVME_NSID_ALL) {
> +		req->error_loc = offsetof(struct nvme_identify, nsid);
> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
> +		goto out;
> +	}
> +
> +	id_zns = kzalloc(sizeof(*id_zns), GFP_KERNEL);
> +	if (!id_zns) {
> +		status = NVME_SC_INTERNAL;
> +		goto out;
> +	}
> +
> +	req->ns = nvmet_find_namespace(req->sq->ctrl, req->cmd->identify.nsid);
> +	if (!req->ns) {
> +		status = NVME_SC_INTERNAL;
> +		goto done;
> +	}
> +
> +	if (!bdev_is_zoned(req->ns->bdev)) {
> +		req->error_loc = offsetof(struct nvme_identify, nsid);
> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
> +		goto done;
> +	}
> +
> +	nvmet_ns_revalidate(req->ns);
> +	zsze = (bdev_zone_sectors(req->ns->bdev) << 9) >>
> +					req->ns->blksize_shift;
> +	id_zns->lbafe[0].zsze = cpu_to_le64(zsze);
> +	id_zns->mor = cpu_to_le32(bdev_max_open_zones(req->ns->bdev));
> +	id_zns->mar = cpu_to_le32(bdev_max_active_zones(req->ns->bdev));
> +
> +done:
> +	status = nvmet_copy_to_sgl(req, 0, id_zns, sizeof(*id_zns));
> +	kfree(id_zns);
> +out:
> +	nvmet_req_complete(req, status);
> +}
> +
> +struct nvmet_report_zone_data {
> +	struct nvmet_ns *ns;
> +	struct nvme_zone_report *rz;
> +};
> +
> +static int nvmet_bdev_report_zone_cb(struct blk_zone *z, unsigned int idx,
> +				     void *data)
> +{
> +	struct nvmet_report_zone_data *report_zone_data = data;
> +	struct nvme_zone_descriptor *entries = report_zone_data->rz->entries;
> +	struct nvmet_ns *ns = report_zone_data->ns;
> +
> +	entries[idx].zcap = nvmet_sect_to_lba(ns, z->capacity);
> +	entries[idx].zslba = nvmet_sect_to_lba(ns, z->start);
> +	entries[idx].wp = nvmet_sect_to_lba(ns, z->wp);
> +	entries[idx].za = z->reset ? 1 << 2 : 0;
> +	entries[idx].zt = z->type;
> +	entries[idx].zs = z->cond << 4;
> +
> +	return 0;
> +}
> +
> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
> +{
> +	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->zmr.slba);
> +	u32 bufsize = (le32_to_cpu(req->cmd->zmr.numd) + 1) << 2;
> +	struct nvmet_report_zone_data data = { .ns = req->ns };
> +	unsigned int nr_zones;
> +	int reported_zones;
> +	u16 status;
> +
> +	nr_zones = (bufsize - sizeof(struct nvme_zone_report)) /
> +			sizeof(struct nvme_zone_descriptor);

I really would prefer this code to be moved down, before the call to
blkdev_report_zones().

You can also optimize this value a little with a min() of the value above and
DIV_ROUND_UP(dev_capacity - sect, zone size). But not a big deal I think.

> +
> +	status = nvmet_bdev_zns_checks(req);
> +	if (status)
> +		goto out;
> +
> +	data.rz = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY | __GFP_ZERO);

Shouldn't this be GFP_NOIO ? Also, is the NORETRY critical ?
blkdev_report_zones() will do memory allocation too, and at least scsi does retry.

> +	if (!data.rz) {
> +		status = NVME_SC_INTERNAL;
> +		goto out;
> +	}
> +
> +	reported_zones = blkdev_report_zones(req->ns->bdev, sect, nr_zones,
> +					     nvmet_bdev_report_zone_cb,
> +					     &data);
> +	if (reported_zones < 0) {
> +		status = NVME_SC_INTERNAL;
> +		goto out_free_report_zones;
> +	}
> +
> +	data.rz->nr_zones = cpu_to_le64(reported_zones);
> +
> +	status = nvmet_copy_to_sgl(req, 0, data.rz, bufsize);
> +
> +out_free_report_zones:
> +	kvfree(data.rz);
> +out:
> +	nvmet_req_complete(req, status);
> +}
> +
> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
> +{
> +	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->zms.slba);
> +	sector_t nr_sect = bdev_zone_sectors(req->ns->bdev);
> +	u16 status = NVME_SC_SUCCESS;
> +	enum req_opf op;
> +	int ret;
> +
> +	if (req->cmd->zms.select_all)
> +		nr_sect = get_capacity(req->ns->bdev->bd_disk);
> +
> +	switch (req->cmd->zms.zsa) {
> +	case NVME_ZONE_OPEN:
> +		op = REQ_OP_ZONE_OPEN;
> +		break;
> +	case NVME_ZONE_CLOSE:
> +		op = REQ_OP_ZONE_CLOSE;
> +		break;
> +	case NVME_ZONE_FINISH:
> +		op = REQ_OP_ZONE_FINISH;
> +		break;
> +	case NVME_ZONE_RESET:
> +		op = REQ_OP_ZONE_RESET;
> +		break;
> +	default:
> +		status = NVME_SC_INVALID_FIELD;
> +		goto out;
> +	}
> +
> +	ret = blkdev_zone_mgmt(req->ns->bdev, op, sect, nr_sect, GFP_KERNEL);

GFP_NOIO ?

> +	if (ret)
> +		status = NVME_SC_INTERNAL;
> +out:
> +	nvmet_req_complete(req, status);
> +}
> +
> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
> +{
> +	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->rw.slba);
> +	struct request_queue *q = req->ns->bdev->bd_disk->queue;
> +	unsigned int max_sects = queue_max_zone_append_sectors(q);
> +	u16 status = NVME_SC_SUCCESS;
> +	unsigned int total_len = 0;
> +	struct scatterlist *sg;
> +	int ret = 0, sg_cnt;
> +	struct bio *bio;
> +
> +	if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req)))
> +		return;
> +
> +	if (!req->sg_cnt) {
> +		nvmet_req_complete(req, 0);
> +		return;
> +	}
> +
> +	if (req->transfer_len <= NVMET_MAX_INLINE_DATA_LEN) {
> +		bio = &req->b.inline_bio;
> +		bio_init(bio, req->inline_bvec, ARRAY_SIZE(req->inline_bvec));
> +	} else {
> +		bio = bio_alloc(GFP_KERNEL, req->sg_cnt);
> +	}
> +
> +	bio_set_dev(bio, req->ns->bdev);
> +	bio->bi_iter.bi_sector = sect;
> +	bio->bi_opf = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
> +	if (req->cmd->rw.control & cpu_to_le16(NVME_RW_FUA))
> +		bio->bi_opf |= REQ_FUA;
> +
> +	for_each_sg(req->sg, sg, req->sg_cnt, sg_cnt) {
> +		struct page *p = sg_page(sg);
> +		unsigned int l = sg->length;
> +		unsigned int o = sg->offset;
> +		bool same_page = false;
> +
> +		ret = bio_add_hw_page(q, bio, p, l, o, max_sects, &same_page);
> +		if (ret != sg->length) {
> +			status = NVME_SC_INTERNAL;
> +			goto out_bio_put;
> +		}
> +		if (same_page)
> +			put_page(p);
> +
> +		total_len += sg->length;
> +	}
> +
> +	if (total_len != nvmet_rw_data_len(req)) {
> +		status = NVME_SC_INTERNAL | NVME_SC_DNR;
> +		goto out_bio_put;
> +	}
> +
> +	ret = submit_bio_wait(bio);
> +	req->cqe->result.u64 = nvmet_sect_to_lba(req->ns,
> +						 bio->bi_iter.bi_sector);
> +
> +out_bio_put:
> +	if (bio != &req->b.inline_bio)
> +		bio_put(bio);
> +	nvmet_req_complete(req, ret < 0 ? NVME_SC_INTERNAL : status);
> +}
> 


-- 
Damien Le Moal
Western Digital Research

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH V9 5/9] nvmet: add bio get helper for different backends
  2021-01-12  4:26   ` Chaitanya Kulkarni
@ 2021-01-12  5:37     ` Damien Le Moal
  -1 siblings, 0 replies; 98+ messages in thread
From: Damien Le Moal @ 2021-01-12  5:37 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-nvme; +Cc: hch, sagi

On 2021/01/12 13:27, Chaitanya Kulkarni wrote:
> With the addition of the zns backend now we have three different
> backends with inline bio optimization. That leads to having duplicate
> code for allocating or initializing the bio in all three backends:
> generic bdev, passthru, and generic zns.
> 
> Add a helper function to reduce the duplicate code such that the helper
> function accepts the bi_end_io callback which gets initialized for the
> non-inline bio_alloc() case. This is due to the special case needed for
> the passthru backend non-inline bio allocation bio_alloc(), where we set
> bio->bi_end_io = bio_put; having this parameter avoids the extra
> branch in the passthru fast path. For the rest of the backends, we set
> the same bi_end_io callback for inline and non-inline cases, that is,
> for generic bdev we set it to nvmet_bio_done() and for generic zns we
> set it to NULL.
> 
> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
> ---
>  drivers/nvme/target/io-cmd-bdev.c |  7 +------
>  drivers/nvme/target/nvmet.h       | 16 ++++++++++++++++
>  drivers/nvme/target/passthru.c    |  8 +-------
>  drivers/nvme/target/zns.c         |  8 +-------
>  4 files changed, 19 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
> index 6178ef643962..72746e29cb0d 100644
> --- a/drivers/nvme/target/io-cmd-bdev.c
> +++ b/drivers/nvme/target/io-cmd-bdev.c
> @@ -266,12 +266,7 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
>  
>  	sector = nvmet_lba_to_sect(req->ns, req->cmd->rw.slba);
>  
> -	if (req->transfer_len <= NVMET_MAX_INLINE_DATA_LEN) {
> -		bio = &req->b.inline_bio;
> -		bio_init(bio, req->inline_bvec, ARRAY_SIZE(req->inline_bvec));
> -	} else {
> -		bio = bio_alloc(GFP_KERNEL, min(sg_cnt, BIO_MAX_PAGES));
> -	}
> +	bio = nvmet_req_bio_get(req, NULL);
>  	bio_set_dev(bio, req->ns->bdev);
>  	bio->bi_iter.bi_sector = sector;
>  	bio->bi_private = req;
> diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
> index 7361665585a2..3fc84f79cce1 100644
> --- a/drivers/nvme/target/nvmet.h
> +++ b/drivers/nvme/target/nvmet.h
> @@ -652,4 +652,20 @@ nvmet_bdev_execute_zone_append(struct nvmet_req *req)
>  }
>  #endif /* CONFIG_BLK_DEV_ZONED */
>  
> +static inline struct bio *nvmet_req_bio_get(struct nvmet_req *req,
> +					    bio_end_io_t *bi_end_io)
> +{
> +	struct bio *bio;
> +
> +	if (req->transfer_len <= NVMET_MAX_INLINE_DATA_LEN) {
> +		bio = &req->b.inline_bio;
> +		bio_init(bio, req->inline_bvec, ARRAY_SIZE(req->inline_bvec));
> +		return bio;
> +	}
> +
> +	bio = bio_alloc(GFP_KERNEL, req->sg_cnt);

I have a doubt about the use of GFP_KERNEL here... Shouldn't these be GFP_NOIO ?
The code was like this so it may be OK, but without GFP_NOIO, is forward
progress guaranteed ? No recursion possible ?

> +	bio->bi_end_io = bi_end_io;
> +	return bio;
> +}
> +
>  #endif /* _NVMET_H */
> diff --git a/drivers/nvme/target/passthru.c b/drivers/nvme/target/passthru.c
> index b9776fc8f08f..54f765b566ee 100644
> --- a/drivers/nvme/target/passthru.c
> +++ b/drivers/nvme/target/passthru.c
> @@ -194,13 +194,7 @@ static int nvmet_passthru_map_sg(struct nvmet_req *req, struct request *rq)
>  	if (req->sg_cnt > BIO_MAX_PAGES)
>  		return -EINVAL;
>  
> -	if (req->transfer_len <= NVMET_MAX_INLINE_DATA_LEN) {
> -		bio = &req->p.inline_bio;
> -		bio_init(bio, req->inline_bvec, ARRAY_SIZE(req->inline_bvec));
> -	} else {
> -		bio = bio_alloc(GFP_KERNEL, min(req->sg_cnt, BIO_MAX_PAGES));
> -		bio->bi_end_io = bio_put;
> -	}
> +	bio = nvmet_req_bio_get(req, bio_put);
>  	bio->bi_opf = req_op(rq);
>  
>  	for_each_sg(req->sg, sg, req->sg_cnt, i) {
> diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
> index 2a71f56e568d..c32e93a3c7e1 100644
> --- a/drivers/nvme/target/zns.c
> +++ b/drivers/nvme/target/zns.c
> @@ -296,13 +296,7 @@ void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
>  		return;
>  	}
>  
> -	if (req->transfer_len <= NVMET_MAX_INLINE_DATA_LEN) {
> -		bio = &req->b.inline_bio;
> -		bio_init(bio, req->inline_bvec, ARRAY_SIZE(req->inline_bvec));
> -	} else {
> -		bio = bio_alloc(GFP_KERNEL, req->sg_cnt);
> -	}
> -
> +	bio = nvmet_req_bio_get(req, NULL);
>  	bio_set_dev(bio, req->ns->bdev);
>  	bio->bi_iter.bi_sector = sect;
>  	bio->bi_opf = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
> 
-- 
Damien Le Moal
Western Digital Research

* Re: [PATCH V9 6/9] nvmet: add bio init helper for different backends
  2021-01-12  4:26   ` Chaitanya Kulkarni
@ 2021-01-12  5:40     ` Damien Le Moal
  -1 siblings, 0 replies; 98+ messages in thread
From: Damien Le Moal @ 2021-01-12  5:40 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-nvme; +Cc: hch, sagi

On 2021/01/12 13:27, Chaitanya Kulkarni wrote:
> With the addition of the zns backend now we have two different backends
> with the same bio initialization code. That leads to having duplicate
> code in two backends: generic bdev and generic zns.
> 
> Add a helper function to reduce the duplicate code such that the helper
> function initializes the various bio parameters such as the bio's block
> device, op flags, sector, end io callback, and private member.
> 
> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
> ---
>  drivers/nvme/target/io-cmd-bdev.c |  6 +-----
>  drivers/nvme/target/nvmet.h       | 11 +++++++++++
>  drivers/nvme/target/zns.c         |  6 +++---
>  3 files changed, 15 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
> index 72746e29cb0d..b1fb0bb1f39f 100644
> --- a/drivers/nvme/target/io-cmd-bdev.c
> +++ b/drivers/nvme/target/io-cmd-bdev.c
> @@ -267,11 +267,7 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
>  	sector = nvmet_lba_to_sect(req->ns, req->cmd->rw.slba);
>  
>  	bio = nvmet_req_bio_get(req, NULL);
> -	bio_set_dev(bio, req->ns->bdev);
> -	bio->bi_iter.bi_sector = sector;
> -	bio->bi_private = req;
> -	bio->bi_end_io = nvmet_bio_done;
> -	bio->bi_opf = op;
> +	nvmet_bio_init(bio, req->ns->bdev, op, sector, req, nvmet_bio_done);
>  
>  	blk_start_plug(&plug);
>  	if (req->metadata_len)
> diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
> index 3fc84f79cce1..1ec9e1b35c67 100644
> --- a/drivers/nvme/target/nvmet.h
> +++ b/drivers/nvme/target/nvmet.h
> @@ -668,4 +668,15 @@ static inline struct bio *nvmet_req_bio_get(struct nvmet_req *req,
>  	return bio;
>  }
>  
> +static inline void nvmet_bio_init(struct bio *bio, struct block_device *bdev,
> +				  unsigned int op, sector_t sect, void *private,
> +				  bio_end_io_t *bi_end_io)
> +{
> +	bio_set_dev(bio, bdev);
> +	bio->bi_opf = op;
> +	bio->bi_iter.bi_sector = sect;
> +	bio->bi_private = private;
> +	bio->bi_end_io = bi_end_io;
> +}
> +
>  #endif /* _NVMET_H */
> diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
> index c32e93a3c7e1..92213bed0006 100644
> --- a/drivers/nvme/target/zns.c
> +++ b/drivers/nvme/target/zns.c
> @@ -281,6 +281,7 @@ void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
>  {
>  	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->rw.slba);
>  	struct request_queue *q = req->ns->bdev->bd_disk->queue;
> +	unsigned int op = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
>  	unsigned int max_sects = queue_max_zone_append_sectors(q);
>  	u16 status = NVME_SC_SUCCESS;
>  	unsigned int total_len = 0;
> @@ -297,9 +298,8 @@ void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
>  	}
>  
>  	bio = nvmet_req_bio_get(req, NULL);
> -	bio_set_dev(bio, req->ns->bdev);
> -	bio->bi_iter.bi_sector = sect;
> -	bio->bi_opf = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
> +	nvmet_bio_init(bio, req->ns->bdev, op, sect, NULL, NULL);

op is used only here I think. So is that variable really necessary ?

> +
>  	if (req->cmd->rw.control & cpu_to_le16(NVME_RW_FUA))
>  		bio->bi_opf |= REQ_FUA;
>  
> 

Apart from the nit above, looks good to me.

Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>

-- 
Damien Le Moal
Western Digital Research

* Re: [PATCH V9 1/9] block: export bio_add_hw_pages()
  2021-01-12  4:26   ` Chaitanya Kulkarni
@ 2021-01-12  5:40     ` Damien Le Moal
  -1 siblings, 0 replies; 98+ messages in thread
From: Damien Le Moal @ 2021-01-12  5:40 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-nvme; +Cc: hch, sagi

On 2021/01/12 13:26, Chaitanya Kulkarni wrote:
> To implement the NVMe Zone Append command on the NVMeOF target side for
> generic Zoned Block Devices with NVMe Zoned Namespaces interface, we
> need to build the bios with hardware limitations, i.e. we use
> bio_add_hw_page() with queue_max_zone_append_sectors() instead of
> bio_add_page().
> 
> Without this API being exported, the NVMeOF target would have to use
> the bio_add_hw_page() caller bio_iov_iter_get_pages(). That results in
> extra work, which is inefficient.
> 
> Export the API so that NVMeOF ZBD over ZNS backend can use it to build
> Zone Append bios.
> 
> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
> ---
>  block/bio.c            | 1 +
>  block/blk.h            | 4 ----
>  include/linux/blkdev.h | 4 ++++
>  3 files changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index 1f2cc1fbe283..5cbd56b54f98 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -826,6 +826,7 @@ int bio_add_hw_page(struct request_queue *q, struct bio *bio,
>  	bio->bi_iter.bi_size += len;
>  	return len;
>  }
> +EXPORT_SYMBOL(bio_add_hw_page);
>  
>  /**
>   * bio_add_pc_page	- attempt to add page to passthrough bio
> diff --git a/block/blk.h b/block/blk.h
> index 7550364c326c..200030b2d74f 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -351,8 +351,4 @@ int bdev_resize_partition(struct block_device *bdev, int partno,
>  		sector_t start, sector_t length);
>  int disk_expand_part_tbl(struct gendisk *disk, int target);
>  
> -int bio_add_hw_page(struct request_queue *q, struct bio *bio,
> -		struct page *page, unsigned int len, unsigned int offset,
> -		unsigned int max_sectors, bool *same_page);
> -
>  #endif /* BLK_INTERNAL_H */
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 070de09425ad..028ccc9bdf8d 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -2005,6 +2005,10 @@ struct block_device *I_BDEV(struct inode *inode);
>  struct block_device *bdgrab(struct block_device *bdev);
>  void bdput(struct block_device *);
>  
> +int bio_add_hw_page(struct request_queue *q, struct bio *bio,
> +		struct page *page, unsigned int len, unsigned int offset,
> +		unsigned int max_sectors, bool *same_page);
> +
>  #ifdef CONFIG_BLOCK
>  void invalidate_bdev(struct block_device *bdev);
>  int truncate_bdev_range(struct block_device *bdev, fmode_t mode, loff_t lstart,
> 

Looks good.

Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>

-- 
Damien Le Moal
Western Digital Research

* Re: [PATCH V9 5/9] nvmet: add bio get helper for different backends
  2021-01-12  5:37     ` Damien Le Moal
@ 2021-01-12  5:55       ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 98+ messages in thread
From: Chaitanya Kulkarni @ 2021-01-12  5:55 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, linux-nvme; +Cc: hch, sagi

On 1/11/21 21:37, Damien Le Moal wrote:
>> diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
>> index 7361665585a2..3fc84f79cce1 100644
>> --- a/drivers/nvme/target/nvmet.h
>> +++ b/drivers/nvme/target/nvmet.h
>> @@ -652,4 +652,20 @@ nvmet_bdev_execute_zone_append(struct nvmet_req *req)
>>  }
>>  #endif /* CONFIG_BLK_DEV_ZONED */
>>  
>> +static inline struct bio *nvmet_req_bio_get(struct nvmet_req *req,
>> +					    bio_end_io_t *bi_end_io)
>> +{
>> +	struct bio *bio;
>> +
>> +	if (req->transfer_len <= NVMET_MAX_INLINE_DATA_LEN) {
>> +		bio = &req->b.inline_bio;
>> +		bio_init(bio, req->inline_bvec, ARRAY_SIZE(req->inline_bvec));
>> +		return bio;
>> +	}
>> +
>> +	bio = bio_alloc(GFP_KERNEL, req->sg_cnt);
> I have a doubt about the use of GFP_KERNEL here... Shouldn't these be GFP_NOIO ?
> The code was like this so it may be OK, but without GFP_NOIO, is forward
> progress guaranteed ? No recursion possible ?
>
I've kept the original behavior; let me check whether GFP_KERNEL is
correct here. If it needs to change, I'll send a separate patch with a
proper commit message.

Thanks for pointing this out.

* Re: [PATCH V9 6/9] nvmet: add bio init helper for different backends
  2021-01-12  5:40     ` Damien Le Moal
@ 2021-01-12  5:57       ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 98+ messages in thread
From: Chaitanya Kulkarni @ 2021-01-12  5:57 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, linux-nvme; +Cc: hch, sagi

On 1/11/21 21:40, Damien Le Moal wrote:
>>  	bio = nvmet_req_bio_get(req, NULL);
>> -	bio_set_dev(bio, req->ns->bdev);
>> -	bio->bi_iter.bi_sector = sect;
>> -	bio->bi_opf = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
>> +	nvmet_bio_init(bio, req->ns->bdev, op, sect, NULL, NULL);
> op is used only here I think. So is that variable really necessary ?
>
This is just my personal preference: without the op variable the
nvmet_bio_init() call would have to wrap onto a second line, and I like
to keep a function call on one line as much as I can.

^ permalink raw reply	[flat|nested] 98+ messages in thread


* Re: [PATCH V9 4/9] nvmet: add ZBD over ZNS backend support
  2021-01-12  5:32     ` Damien Le Moal
@ 2021-01-12  6:11       ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 98+ messages in thread
From: Chaitanya Kulkarni @ 2021-01-12  6:11 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, linux-nvme; +Cc: hch, sagi

On 1/11/21 21:32, Damien Le Moal wrote:
> On 2021/01/12 13:27, Chaitanya Kulkarni wrote:
>> NVMe TP 4053 – Zoned Namespaces (ZNS) allows host software to
>> communicate with a non-volatile memory subsystem using zones for
>> NVMe protocol based controllers. NVMeOF already support the ZNS NVMe
>> Protocol compliant devices on the target in the passthru mode. There
>> are Generic zoned block devices like  Shingled Magnetic Recording (SMR)
>> HDDs that are not based on the NVMe protocol.
>>
>> This patch adds ZNS backend to support the ZBDs for NVMeOF target.
>>
>> This support includes implementing the new command set NVME_CSI_ZNS,
>> adding different command handlers for ZNS command set such as
>> NVMe Identify Controller, NVMe Identify Namespace, NVMe Zone Append,
>> NVMe Zone Management Send and NVMe Zone Management Receive.
>>
>> With new command set identifier we also update the target command effects
>> logs to reflect the ZNS compliant commands.
>>
>> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
>> ---
>>  drivers/nvme/target/Makefile      |   1 +
>>  drivers/nvme/target/admin-cmd.c   |  28 +++
>>  drivers/nvme/target/core.c        |   3 +
>>  drivers/nvme/target/io-cmd-bdev.c |  33 ++-
>>  drivers/nvme/target/nvmet.h       |  38 ++++
>>  drivers/nvme/target/zns.c         | 342 ++++++++++++++++++++++++++++++
>>  6 files changed, 437 insertions(+), 8 deletions(-)
>>  create mode 100644 drivers/nvme/target/zns.c
>>
>> diff --git a/drivers/nvme/target/Makefile b/drivers/nvme/target/Makefile
>> index ebf91fc4c72e..9837e580fa7e 100644
>> --- a/drivers/nvme/target/Makefile
>> +++ b/drivers/nvme/target/Makefile
>> @@ -12,6 +12,7 @@ obj-$(CONFIG_NVME_TARGET_TCP)		+= nvmet-tcp.o
>>  nvmet-y		+= core.o configfs.o admin-cmd.o fabrics-cmd.o \
>>  			discovery.o io-cmd-file.o io-cmd-bdev.o
>>  nvmet-$(CONFIG_NVME_TARGET_PASSTHRU)	+= passthru.o
>> +nvmet-$(CONFIG_BLK_DEV_ZONED)		+= zns.o
>>  nvme-loop-y	+= loop.o
>>  nvmet-rdma-y	+= rdma.o
>>  nvmet-fc-y	+= fc.o
>> diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
>> index a50b7bcac67a..bdf09d8faa48 100644
>> --- a/drivers/nvme/target/admin-cmd.c
>> +++ b/drivers/nvme/target/admin-cmd.c
>> @@ -191,6 +191,15 @@ static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req)
>>  		log->iocs[nvme_cmd_dsm]			= cpu_to_le32(1 << 0);
>>  		log->iocs[nvme_cmd_write_zeroes]	= cpu_to_le32(1 << 0);
>>  		break;
>> +	case NVME_CSI_ZNS:
>> +		if (IS_ENABLED(CONFIG_BLK_DEV_ZONED)) {
>> +			u32 *iocs = log->iocs;
>> +
>> +			iocs[nvme_cmd_zone_append]	= cpu_to_le32(1 << 0);
>> +			iocs[nvme_cmd_zone_mgmt_send]	= cpu_to_le32(1 << 0);
>> +			iocs[nvme_cmd_zone_mgmt_recv]	= cpu_to_le32(1 << 0);
>> +		}
> Is it OK to not return an error here if CONFIG_BLK_DEV_ZONED is not enabled ?
> I have not checked the entire code of this function nor how it is called, so I
> may be wrong.
Since we only set the controller CAP bit when CONFIG_BLK_DEV_ZONED is
enabled, we should be uniform everywhere in the code. I'll recheck
and make the change if needed.
>> +		break;
>>  	default:
>>  		status = NVME_SC_INVALID_LOG_PAGE;
>>  		break;
>> @@ -644,6 +653,17 @@ static void nvmet_execute_identify_desclist(struct nvmet_req *req)
>>  	if (status)
>>  		goto out;
>>  
>> +	if (IS_ENABLED(CONFIG_BLK_DEV_ZONED)) {
>> +		u16 nvme_cis_zns = NVME_CSI_ZNS;
>> +
>> +		if (req->ns->csi == NVME_CSI_ZNS)
>> +			status = nvmet_copy_ns_identifier(req, NVME_NIDT_CSI,
>> +							  NVME_NIDT_CSI_LEN,
>> +							  &nvme_cis_zns, &off);
>> +		if (status)
>> +			goto out;
>> +	}
> Same comment here.
I think the same explanation applies here too; I will recheck and make
the change if needed.
>
>> +
>>  	if (sg_zero_buffer(req->sg, req->sg_cnt, NVME_IDENTIFY_DATA_SIZE - off,
>>  			off) != NVME_IDENTIFY_DATA_SIZE - off)
>>  		status = NVME_SC_INTERNAL | NVME_SC_DNR;
>> @@ -660,8 +680,16 @@ static void nvmet_execute_identify(struct nvmet_req *req)
>>  	switch (req->cmd->identify.cns) {
>>  	case NVME_ID_CNS_NS:
>>  		return nvmet_execute_identify_ns(req);
>> +	case NVME_ID_CNS_CS_NS:
>> +		if (req->cmd->identify.csi == NVME_CSI_ZNS)
>> +			return nvmet_execute_identify_cns_cs_ns(req);
>> +		break;
>>  	case NVME_ID_CNS_CTRL:
>>  		return nvmet_execute_identify_ctrl(req);
>> +	case NVME_ID_CNS_CS_CTRL:
>> +		if (req->cmd->identify.csi == NVME_CSI_ZNS)
>> +			return nvmet_execute_identify_cns_cs_ctrl(req);
>> +		break;
>>  	case NVME_ID_CNS_NS_ACTIVE_LIST:
>>  		return nvmet_execute_identify_nslist(req);
>>  	case NVME_ID_CNS_NS_DESC_LIST:
>> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
>> index 672e4009f8d6..17d5da062a5a 100644
>> --- a/drivers/nvme/target/core.c
>> +++ b/drivers/nvme/target/core.c
>> @@ -1107,6 +1107,7 @@ static inline u8 nvmet_cc_iocqes(u32 cc)
>>  static inline bool nvmet_cc_css_check(u8 cc_css)
>>  {
>>  	switch (cc_css <<= NVME_CC_CSS_SHIFT) {
>> +	case NVME_CC_CSS_CSI:
>>  	case NVME_CC_CSS_NVM:
>>  		return true;
>>  	default:
>> @@ -1173,6 +1174,8 @@ static void nvmet_init_cap(struct nvmet_ctrl *ctrl)
>>  {
>>  	/* command sets supported: NVMe command set: */
>>  	ctrl->cap = (1ULL << 37);
>> +	if (IS_ENABLED(CONFIG_BLK_DEV_ZONED))
>> +		ctrl->cap |= (1ULL << 43);
>>  	/* CC.EN timeout in 500msec units: */
>>  	ctrl->cap |= (15ULL << 24);
>>  	/* maximum queue entries supported: */
>> diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
>> index 23095bdfce06..6178ef643962 100644
>> --- a/drivers/nvme/target/io-cmd-bdev.c
>> +++ b/drivers/nvme/target/io-cmd-bdev.c
>> @@ -63,6 +63,14 @@ static void nvmet_bdev_ns_enable_integrity(struct nvmet_ns *ns)
>>  	}
>>  }
>>  
>> +void nvmet_bdev_ns_disable(struct nvmet_ns *ns)
>> +{
>> +	if (ns->bdev) {
>> +		blkdev_put(ns->bdev, FMODE_WRITE | FMODE_READ);
>> +		ns->bdev = NULL;
>> +	}
>> +}
>> +
>>  int nvmet_bdev_ns_enable(struct nvmet_ns *ns)
>>  {
>>  	int ret;
>> @@ -86,15 +94,15 @@ int nvmet_bdev_ns_enable(struct nvmet_ns *ns)
>>  	if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY_T10))
>>  		nvmet_bdev_ns_enable_integrity(ns);
>>  
>> -	return 0;
>> -}
>> -
>> -void nvmet_bdev_ns_disable(struct nvmet_ns *ns)
>> -{
>> -	if (ns->bdev) {
>> -		blkdev_put(ns->bdev, FMODE_WRITE | FMODE_READ);
>> -		ns->bdev = NULL;
>> +	if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) && bdev_is_zoned(ns->bdev)) {
>> +		if (!nvmet_bdev_zns_enable(ns)) {
>> +			nvmet_bdev_ns_disable(ns);
>> +			return -EINVAL;
>> +		}
>> +		ns->csi = NVME_CSI_ZNS;
>>  	}
>> +
>> +	return 0;
>>  }
>>  
>>  void nvmet_bdev_ns_revalidate(struct nvmet_ns *ns)
>> @@ -448,6 +456,15 @@ u16 nvmet_bdev_parse_io_cmd(struct nvmet_req *req)
>>  	case nvme_cmd_write_zeroes:
>>  		req->execute = nvmet_bdev_execute_write_zeroes;
>>  		return 0;
>> +	case nvme_cmd_zone_append:
>> +		req->execute = nvmet_bdev_execute_zone_append;
>> +		return 0;
>> +	case nvme_cmd_zone_mgmt_recv:
>> +		req->execute = nvmet_bdev_execute_zone_mgmt_recv;
>> +		return 0;
>> +	case nvme_cmd_zone_mgmt_send:
>> +		req->execute = nvmet_bdev_execute_zone_mgmt_send;
>> +		return 0;
>>  	default:
>>  		pr_err("unhandled cmd %d on qid %d\n", cmd->common.opcode,
>>  		       req->sq->qid);
>> diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
>> index 476b3cd91c65..7361665585a2 100644
>> --- a/drivers/nvme/target/nvmet.h
>> +++ b/drivers/nvme/target/nvmet.h
>> @@ -252,6 +252,10 @@ struct nvmet_subsys {
>>  	unsigned int		admin_timeout;
>>  	unsigned int		io_timeout;
>>  #endif /* CONFIG_NVME_TARGET_PASSTHRU */
>> +
>> +#ifdef CONFIG_BLK_DEV_ZONED
>> +	u8			zasl;
>> +#endif /* CONFIG_BLK_DEV_ZONED */
>>  };
>>  
>>  static inline struct nvmet_subsys *to_subsys(struct config_item *item)
>> @@ -614,4 +618,38 @@ static inline sector_t nvmet_lba_to_sect(struct nvmet_ns *ns, __le64 lba)
>>  	return le64_to_cpu(lba) << (ns->blksize_shift - SECTOR_SHIFT);
>>  }
>>  
>> +#ifdef CONFIG_BLK_DEV_ZONED
>> +bool nvmet_bdev_zns_enable(struct nvmet_ns *ns);
>> +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req);
>> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req);
>> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req);
>> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req);
>> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req);
>> +#else  /* CONFIG_BLK_DEV_ZONED */
>> +static inline bool nvmet_bdev_zns_enable(struct nvmet_ns *ns)
>> +{
>> +	return false;
>> +}
>> +static inline void
>> +nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
>> +{
>> +}
>> +static inline void
>> +nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
>> +{
>> +}
>> +static inline void
>> +nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
>> +{
>> +}
>> +static inline void
>> +nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
>> +{
>> +}
>> +static inline void
>> +nvmet_bdev_execute_zone_append(struct nvmet_req *req)
>> +{
>> +}
>> +#endif /* CONFIG_BLK_DEV_ZONED */
>> +
>>  #endif /* _NVMET_H */
>> diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
>> new file mode 100644
>> index 000000000000..2a71f56e568d
>> --- /dev/null
>> +++ b/drivers/nvme/target/zns.c
>> @@ -0,0 +1,342 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * NVMe ZNS-ZBD command implementation.
>> + * Copyright (c) 2020-2021 HGST, a Western Digital Company.
>> + */
>> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>> +#include <linux/nvme.h>
>> +#include <linux/blkdev.h>
>> +#include "nvmet.h"
>> +
>> +/*
>> + * We set the Memory Page Size Minimum (MPSMIN) for target controller to 0
>> + * which gets added by 12 in the nvme_enable_ctrl() which results in 2^12 = 4k
>> + * as page_shift value. When calculating the ZASL use shift by 12.
>> + */
>> +#define NVMET_MPSMIN_SHIFT	12
>> +
>> +static u16 nvmet_bdev_zns_checks(struct nvmet_req *req)
>> +{
>> +	u16 status = NVME_SC_SUCCESS;
>> +
>> +	if (!bdev_is_zoned(req->ns->bdev)) {
>> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
>> +		goto out;
>> +	}
>> +
>> +	if (req->cmd->zmr.zra != NVME_ZRA_ZONE_REPORT) {
>> +		status = NVME_SC_INVALID_FIELD;
>> +		goto out;
>> +	}
>> +
>> +	if (req->cmd->zmr.zrasf != NVME_ZRASF_ZONE_REPORT_ALL) {
>> +		status = NVME_SC_INVALID_FIELD;
>> +		goto out;
>> +	}
>> +
>> +	if (req->cmd->zmr.pr != NVME_REPORT_ZONE_PARTIAL)
>> +		status = NVME_SC_INVALID_FIELD;
>> +
>> +out:
> You really want to keep this (useless) label ? Without it, the status variable
> can be dropped and the code overall becomes so much easier to read... Not to
> mention that life will be easier to the compiler for optimizing this.
>
Will remove it in the next version.
>> +	return status;
>> +}
>> +
>> +/*
>> + *  ZNS related command implementation and helpers.
>> + */
>> +
>> +static inline u8 nvmet_zasl(unsigned int zone_append_sects)
>> +{
>> +	/*
>> +	 * Zone Append Size Limit is the value experessed in the units
>> +	 * of minimum memory page size (i.e. 12) and is reported power of 2.
>> +	 */
>> +	return ilog2((zone_append_sects << 9) >> NVMET_MPSMIN_SHIFT);
>> +}
>> +
>> +static inline bool nvmet_zns_update_zasl(struct nvmet_ns *ns)
>> +{
>> +	struct request_queue *q = ns->bdev->bd_disk->queue;
>> +	u8 zasl = nvmet_zasl(queue_max_zone_append_sectors(q));
>> +
>> +	if (ns->subsys->zasl)
>> +		return ns->subsys->zasl < zasl ? false : true;
>> +
>> +	ns->subsys->zasl = zasl;
>> +	return true;
>> +}
>> +
>> +
>> +static int nvmet_bdev_validate_zns_zones_cb(struct blk_zone *z,
>> +					    unsigned int idx, void *data)
>> +{
>> +	if (z->type == BLK_ZONE_TYPE_CONVENTIONAL)
>> +		return -EOPNOTSUPP;
>> +	return 0;
>> +}
>> +
>> +static bool nvmet_bdev_has_conv_zones(struct block_device *bdev)
>> +{
>> +	int ret;
>> +
>> +	if (bdev->bd_disk->queue->conv_zones_bitmap)
>> +		return true;
>> +
>> +	ret = blkdev_report_zones(bdev, 0, blkdev_nr_zones(bdev->bd_disk),
>> +				  nvmet_bdev_validate_zns_zones_cb, NULL);
>> +
>> +	return ret < 0 ? true : false;
> return ret <= 0;
>
> would be simpler.
>
> Note that "<=" includes the error case of the device not reporting any zone
> (device dead) as we should fail that case I think.
>
Hmm, I will make that change.
>> +}
>> +
>> +bool nvmet_bdev_zns_enable(struct nvmet_ns *ns)
>> +{
>> +	if (nvmet_bdev_has_conv_zones(ns->bdev))
>> +		return false;
>> +
>> +	/*
>> +	 * For ZBC and ZAC devices, writes into sequential zones must be aligned
>> +	 * to the device physical block size. So use this value as the logical
>> +	 * block size to avoid errors.
>> +	 */
>> +	ns->blksize_shift = blksize_bits(bdev_physical_block_size(ns->bdev));
>> +
>> +	if (!nvmet_zns_update_zasl(ns))
>> +		return false;
>> +
>> +	return !(get_capacity(ns->bdev->bd_disk) &
>> +			(bdev_zone_sectors(ns->bdev) - 1));
>> +}
>> +
>> +/*
>> + * ZNS related Admin and I/O command handlers.
>> + */
>> +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
>> +{
>> +	u8 zasl = req->sq->ctrl->subsys->zasl;
>> +	struct nvmet_ctrl *ctrl = req->sq->ctrl;
>> +	struct nvme_id_ctrl_zns *id;
>> +	u16 status;
>> +
>> +	id = kzalloc(sizeof(*id), GFP_KERNEL);
>> +	if (!id) {
>> +		status = NVME_SC_INTERNAL;
>> +		goto out;
>> +	}
>> +
>> +	if (ctrl->ops->get_mdts)
>> +		id->zasl = min_t(u8, ctrl->ops->get_mdts(ctrl), zasl);
>> +	else
>> +		id->zasl = zasl;
>> +
>> +	status = nvmet_copy_to_sgl(req, 0, id, sizeof(*id));
>> +
>> +	kfree(id);
>> +out:
>> +	nvmet_req_complete(req, status);
>> +}
>> +
>> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
>> +{
>> +	struct nvme_id_ns_zns *id_zns;
>> +	u16 status = NVME_SC_SUCCESS;
>> +	u64 zsze;
>> +
>> +	if (le32_to_cpu(req->cmd->identify.nsid) == NVME_NSID_ALL) {
>> +		req->error_loc = offsetof(struct nvme_identify, nsid);
>> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
>> +		goto out;
>> +	}
>> +
>> +	id_zns = kzalloc(sizeof(*id_zns), GFP_KERNEL);
>> +	if (!id_zns) {
>> +		status = NVME_SC_INTERNAL;
>> +		goto out;
>> +	}
>> +
>> +	req->ns = nvmet_find_namespace(req->sq->ctrl, req->cmd->identify.nsid);
>> +	if (!req->ns) {
>> +		status = NVME_SC_INTERNAL;
>> +		goto done;
>> +	}
>> +
>> +	if (!bdev_is_zoned(req->ns->bdev)) {
>> +		req->error_loc = offsetof(struct nvme_identify, nsid);
>> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
>> +		goto done;
>> +	}
>> +
>> +	nvmet_ns_revalidate(req->ns);
>> +	zsze = (bdev_zone_sectors(req->ns->bdev) << 9) >>
>> +					req->ns->blksize_shift;
>> +	id_zns->lbafe[0].zsze = cpu_to_le64(zsze);
>> +	id_zns->mor = cpu_to_le32(bdev_max_open_zones(req->ns->bdev));
>> +	id_zns->mar = cpu_to_le32(bdev_max_active_zones(req->ns->bdev));
>> +
>> +done:
>> +	status = nvmet_copy_to_sgl(req, 0, id_zns, sizeof(*id_zns));
>> +	kfree(id_zns);
>> +out:
>> +	nvmet_req_complete(req, status);
>> +}
>> +
>> +struct nvmet_report_zone_data {
>> +	struct nvmet_ns *ns;
>> +	struct nvme_zone_report *rz;
>> +};
>> +
>> +static int nvmet_bdev_report_zone_cb(struct blk_zone *z, unsigned int idx,
>> +				     void *data)
>> +{
>> +	struct nvmet_report_zone_data *report_zone_data = data;
>> +	struct nvme_zone_descriptor *entries = report_zone_data->rz->entries;
>> +	struct nvmet_ns *ns = report_zone_data->ns;
>> +
>> +	entries[idx].zcap = nvmet_sect_to_lba(ns, z->capacity);
>> +	entries[idx].zslba = nvmet_sect_to_lba(ns, z->start);
>> +	entries[idx].wp = nvmet_sect_to_lba(ns, z->wp);
>> +	entries[idx].za = z->reset ? 1 << 2 : 0;
>> +	entries[idx].zt = z->type;
>> +	entries[idx].zs = z->cond << 4;
>> +
>> +	return 0;
>> +}
>> +
>> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
>> +{
>> +	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->zmr.slba);
>> +	u32 bufsize = (le32_to_cpu(req->cmd->zmr.numd) + 1) << 2;
>> +	struct nvmet_report_zone_data data = { .ns = req->ns };
>> +	unsigned int nr_zones;
>> +	int reported_zones;
>> +	u16 status;
>> +
>> +	nr_zones = (bufsize - sizeof(struct nvme_zone_report)) /
>> +			sizeof(struct nvme_zone_descriptor);
> I really would prefer this code to be moved down, before the call to
> blkdev_report_zones().
>
> You can also optimize this value a little with a min() of the value above and of
> DIV_ROUND_UP(dev_capacity - sect, zone size). But not a big deal I think.
I did that as per your last comment, but when I reviewed the code
against the host side it didn't match. I have a cleanup patch series to
fix nits and the host-side CSS checks for ZNS; I've added this change
to that series.
>> +
>> +	status = nvmet_bdev_zns_checks(req);
>> +	if (status)
>> +		goto out;
>> +
>> +	data.rz = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY | __GFP_ZERO);
> Shouldn't this be GFP_NOIO ? Also, is the NORETRY critical ?
Yes on GFP_NOIO. By "NORETRY critical", do you mean how we are
allocating the memory on the host side in nvme_zns_alloc_report_buffer()?
> blkdev_report_zones() will do mem allocation too and at leadt scsi does retry.
>
>> +	if (!data.rz) {
>> +		status = NVME_SC_INTERNAL;
>> +		goto out;
>> +	}
>> +
>> +	reported_zones = blkdev_report_zones(req->ns->bdev, sect, nr_zones,
>> +					     nvmet_bdev_report_zone_cb,
>> +					     &data);
>> +	if (reported_zones < 0) {
>> +		status = NVME_SC_INTERNAL;
>> +		goto out_free_report_zones;
>> +	}
>> +
>> +	data.rz->nr_zones = cpu_to_le64(reported_zones);
>> +
>> +	status = nvmet_copy_to_sgl(req, 0, data.rz, bufsize);
>> +
>> +out_free_report_zones:
>> +	kvfree(data.rz);
>> +out:
>> +	nvmet_req_complete(req, status);
>> +}
>> +
>> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
>> +{
>> +	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->zms.slba);
>> +	sector_t nr_sect = bdev_zone_sectors(req->ns->bdev);
>> +	u16 status = NVME_SC_SUCCESS;
>> +	enum req_opf op;
>> +	int ret;
>> +
>> +	if (req->cmd->zms.select_all)
>> +		nr_sect = get_capacity(req->ns->bdev->bd_disk);
>> +
>> +	switch (req->cmd->zms.zsa) {
>> +	case NVME_ZONE_OPEN:
>> +		op = REQ_OP_ZONE_OPEN;
>> +		break;
>> +	case NVME_ZONE_CLOSE:
>> +		op = REQ_OP_ZONE_CLOSE;
>> +		break;
>> +	case NVME_ZONE_FINISH:
>> +		op = REQ_OP_ZONE_FINISH;
>> +		break;
>> +	case NVME_ZONE_RESET:
>> +		op = REQ_OP_ZONE_RESET;
>> +		break;
>> +	default:
>> +		status = NVME_SC_INVALID_FIELD;
>> +		goto out;
>> +	}
>> +
>> +	ret = blkdev_zone_mgmt(req->ns->bdev, op, sect, nr_sect, GFP_KERNEL);
> GFP_NOIO ?
Yes.
>
>> +	if (ret)
>> +		status = NVME_SC_INTERNAL;
>> +out:
>> +	nvmet_req_complete(req, status);
>> +}
>> +
>> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
>> +{
>> +	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->rw.slba);
>> +	struct request_queue *q = req->ns->bdev->bd_disk->queue;
>> +	unsigned int max_sects = queue_max_zone_append_sectors(q);
>> +	u16 status = NVME_SC_SUCCESS;
>> +	unsigned int total_len = 0;
>> +	struct scatterlist *sg;
>> +	int ret = 0, sg_cnt;
>> +	struct bio *bio;
>> +
>> +	if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req)))
>> +		return;
>> +
>> +	if (!req->sg_cnt) {
>> +		nvmet_req_complete(req, 0);
>> +		return;
>> +	}
>> +
>> +	if (req->transfer_len <= NVMET_MAX_INLINE_DATA_LEN) {
>> +		bio = &req->b.inline_bio;
>> +		bio_init(bio, req->inline_bvec, ARRAY_SIZE(req->inline_bvec));
>> +	} else {
>> +		bio = bio_alloc(GFP_KERNEL, req->sg_cnt);
>> +	}
>> +
>> +	bio_set_dev(bio, req->ns->bdev);
>> +	bio->bi_iter.bi_sector = sect;
>> +	bio->bi_opf = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
>> +	if (req->cmd->rw.control & cpu_to_le16(NVME_RW_FUA))
>> +		bio->bi_opf |= REQ_FUA;
>> +
>> +	for_each_sg(req->sg, sg, req->sg_cnt, sg_cnt) {
>> +		struct page *p = sg_page(sg);
>> +		unsigned int l = sg->length;
>> +		unsigned int o = sg->offset;
>> +		bool same_page = false;
>> +
>> +		ret = bio_add_hw_page(q, bio, p, l, o, max_sects, &same_page);
>> +		if (ret != sg->length) {
>> +			status = NVME_SC_INTERNAL;
>> +			goto out_bio_put;
>> +		}
>> +		if (same_page)
>> +			put_page(p);
>> +
>> +		total_len += sg->length;
>> +	}
>> +
>> +	if (total_len != nvmet_rw_data_len(req)) {
>> +		status = NVME_SC_INTERNAL | NVME_SC_DNR;
>> +		goto out_bio_put;
>> +	}
>> +
>> +	ret = submit_bio_wait(bio);
>> +	req->cqe->result.u64 = nvmet_sect_to_lba(req->ns,
>> +						 bio->bi_iter.bi_sector);
>> +
>> +out_bio_put:
>> +	if (bio != &req->b.inline_bio)
>> +		bio_put(bio);
>> +	nvmet_req_complete(req, ret < 0 ? NVME_SC_INTERNAL : status);
>> +}
>>
>


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH V9 4/9] nvmet: add ZBD over ZNS backend support
@ 2021-01-12  6:11       ` Chaitanya Kulkarni
  0 siblings, 0 replies; 98+ messages in thread
From: Chaitanya Kulkarni @ 2021-01-12  6:11 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, linux-nvme; +Cc: hch, sagi

On 1/11/21 21:32, Damien Le Moal wrote:
> On 2021/01/12 13:27, Chaitanya Kulkarni wrote:
>> NVMe TP 4053 – Zoned Namespaces (ZNS) allows host software to
>> communicate with a non-volatile memory subsystem using zones for
>> NVMe protocol based controllers. NVMeOF already support the ZNS NVMe
>> Protocol compliant devices on the target in the passthru mode. There
>> are Generic zoned block devices like  Shingled Magnetic Recording (SMR)
>> HDDs that are not based on the NVMe protocol.
>>
>> This patch adds ZNS backend to support the ZBDs for NVMeOF target.
>>
>> This support includes implementing the new command set NVME_CSI_ZNS,
>> adding different command handlers for ZNS command set such as
>> NVMe Identify Controller, NVMe Identify Namespace, NVMe Zone Append,
>> NVMe Zone Management Send and NVMe Zone Management Receive.
>>
>> With new command set identifier we also update the target command effects
>> logs to reflect the ZNS compliant commands.
>>
>> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
>> ---
>>  drivers/nvme/target/Makefile      |   1 +
>>  drivers/nvme/target/admin-cmd.c   |  28 +++
>>  drivers/nvme/target/core.c        |   3 +
>>  drivers/nvme/target/io-cmd-bdev.c |  33 ++-
>>  drivers/nvme/target/nvmet.h       |  38 ++++
>>  drivers/nvme/target/zns.c         | 342 ++++++++++++++++++++++++++++++
>>  6 files changed, 437 insertions(+), 8 deletions(-)
>>  create mode 100644 drivers/nvme/target/zns.c
>>
>> diff --git a/drivers/nvme/target/Makefile b/drivers/nvme/target/Makefile
>> index ebf91fc4c72e..9837e580fa7e 100644
>> --- a/drivers/nvme/target/Makefile
>> +++ b/drivers/nvme/target/Makefile
>> @@ -12,6 +12,7 @@ obj-$(CONFIG_NVME_TARGET_TCP)		+= nvmet-tcp.o
>>  nvmet-y		+= core.o configfs.o admin-cmd.o fabrics-cmd.o \
>>  			discovery.o io-cmd-file.o io-cmd-bdev.o
>>  nvmet-$(CONFIG_NVME_TARGET_PASSTHRU)	+= passthru.o
>> +nvmet-$(CONFIG_BLK_DEV_ZONED)		+= zns.o
>>  nvme-loop-y	+= loop.o
>>  nvmet-rdma-y	+= rdma.o
>>  nvmet-fc-y	+= fc.o
>> diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
>> index a50b7bcac67a..bdf09d8faa48 100644
>> --- a/drivers/nvme/target/admin-cmd.c
>> +++ b/drivers/nvme/target/admin-cmd.c
>> @@ -191,6 +191,15 @@ static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req)
>>  		log->iocs[nvme_cmd_dsm]			= cpu_to_le32(1 << 0);
>>  		log->iocs[nvme_cmd_write_zeroes]	= cpu_to_le32(1 << 0);
>>  		break;
>> +	case NVME_CSI_ZNS:
>> +		if (IS_ENABLED(CONFIG_BLK_DEV_ZONED)) {
>> +			u32 *iocs = log->iocs;
>> +
>> +			iocs[nvme_cmd_zone_append]	= cpu_to_le32(1 << 0);
>> +			iocs[nvme_cmd_zone_mgmt_send]	= cpu_to_le32(1 << 0);
>> +			iocs[nvme_cmd_zone_mgmt_recv]	= cpu_to_le32(1 << 0);
>> +		}
> Is it OK to not return an error here if CONFIG_BLK_DEV_ZONED is not enabled ?
> I have not checked the entire code of this function nor how it is called, so I
> may be wrong.
Since we only set the controller cap when CONFIG_BLK_DEV_ZONED is
enabled we should be uniform everywhere in the code, I'll recheck
and make the change if needed.
>> +		break;
>>  	default:
>>  		status = NVME_SC_INVALID_LOG_PAGE;
>>  		break;
>> @@ -644,6 +653,17 @@ static void nvmet_execute_identify_desclist(struct nvmet_req *req)
>>  	if (status)
>>  		goto out;
>>  
>> +	if (IS_ENABLED(CONFIG_BLK_DEV_ZONED)) {
>> +		u16 nvme_cis_zns = NVME_CSI_ZNS;
>> +
>> +		if (req->ns->csi == NVME_CSI_ZNS)
>> +			status = nvmet_copy_ns_identifier(req, NVME_NIDT_CSI,
>> +							  NVME_NIDT_CSI_LEN,
>> +							  &nvme_cis_zns, &off);
>> +		if (status)
>> +			goto out;
>> +	}
> Same comment here.
I think same explanation applies here too, will recheck and make the change
if needed.
>
>> +
>>  	if (sg_zero_buffer(req->sg, req->sg_cnt, NVME_IDENTIFY_DATA_SIZE - off,
>>  			off) != NVME_IDENTIFY_DATA_SIZE - off)
>>  		status = NVME_SC_INTERNAL | NVME_SC_DNR;
>> @@ -660,8 +680,16 @@ static void nvmet_execute_identify(struct nvmet_req *req)
>>  	switch (req->cmd->identify.cns) {
>>  	case NVME_ID_CNS_NS:
>>  		return nvmet_execute_identify_ns(req);
>> +	case NVME_ID_CNS_CS_NS:
>> +		if (req->cmd->identify.csi == NVME_CSI_ZNS)
>> +			return nvmet_execute_identify_cns_cs_ns(req);
>> +		break;
>>  	case NVME_ID_CNS_CTRL:
>>  		return nvmet_execute_identify_ctrl(req);
>> +	case NVME_ID_CNS_CS_CTRL:
>> +		if (req->cmd->identify.csi == NVME_CSI_ZNS)
>> +			return nvmet_execute_identify_cns_cs_ctrl(req);
>> +		break;
>>  	case NVME_ID_CNS_NS_ACTIVE_LIST:
>>  		return nvmet_execute_identify_nslist(req);
>>  	case NVME_ID_CNS_NS_DESC_LIST:
>> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
>> index 672e4009f8d6..17d5da062a5a 100644
>> --- a/drivers/nvme/target/core.c
>> +++ b/drivers/nvme/target/core.c
>> @@ -1107,6 +1107,7 @@ static inline u8 nvmet_cc_iocqes(u32 cc)
>>  static inline bool nvmet_cc_css_check(u8 cc_css)
>>  {
>>  	switch (cc_css <<= NVME_CC_CSS_SHIFT) {
>> +	case NVME_CC_CSS_CSI:
>>  	case NVME_CC_CSS_NVM:
>>  		return true;
>>  	default:
>> @@ -1173,6 +1174,8 @@ static void nvmet_init_cap(struct nvmet_ctrl *ctrl)
>>  {
>>  	/* command sets supported: NVMe command set: */
>>  	ctrl->cap = (1ULL << 37);
>> +	if (IS_ENABLED(CONFIG_BLK_DEV_ZONED))
>> +		ctrl->cap |= (1ULL << 43);
>>  	/* CC.EN timeout in 500msec units: */
>>  	ctrl->cap |= (15ULL << 24);
>>  	/* maximum queue entries supported: */
>> diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
>> index 23095bdfce06..6178ef643962 100644
>> --- a/drivers/nvme/target/io-cmd-bdev.c
>> +++ b/drivers/nvme/target/io-cmd-bdev.c
>> @@ -63,6 +63,14 @@ static void nvmet_bdev_ns_enable_integrity(struct nvmet_ns *ns)
>>  	}
>>  }
>>  
>> +void nvmet_bdev_ns_disable(struct nvmet_ns *ns)
>> +{
>> +	if (ns->bdev) {
>> +		blkdev_put(ns->bdev, FMODE_WRITE | FMODE_READ);
>> +		ns->bdev = NULL;
>> +	}
>> +}
>> +
>>  int nvmet_bdev_ns_enable(struct nvmet_ns *ns)
>>  {
>>  	int ret;
>> @@ -86,15 +94,15 @@ int nvmet_bdev_ns_enable(struct nvmet_ns *ns)
>>  	if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY_T10))
>>  		nvmet_bdev_ns_enable_integrity(ns);
>>  
>> -	return 0;
>> -}
>> -
>> -void nvmet_bdev_ns_disable(struct nvmet_ns *ns)
>> -{
>> -	if (ns->bdev) {
>> -		blkdev_put(ns->bdev, FMODE_WRITE | FMODE_READ);
>> -		ns->bdev = NULL;
>> +	if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) && bdev_is_zoned(ns->bdev)) {
>> +		if (!nvmet_bdev_zns_enable(ns)) {
>> +			nvmet_bdev_ns_disable(ns);
>> +			return -EINVAL;
>> +		}
>> +		ns->csi = NVME_CSI_ZNS;
>>  	}
>> +
>> +	return 0;
>>  }
>>  
>>  void nvmet_bdev_ns_revalidate(struct nvmet_ns *ns)
>> @@ -448,6 +456,15 @@ u16 nvmet_bdev_parse_io_cmd(struct nvmet_req *req)
>>  	case nvme_cmd_write_zeroes:
>>  		req->execute = nvmet_bdev_execute_write_zeroes;
>>  		return 0;
>> +	case nvme_cmd_zone_append:
>> +		req->execute = nvmet_bdev_execute_zone_append;
>> +		return 0;
>> +	case nvme_cmd_zone_mgmt_recv:
>> +		req->execute = nvmet_bdev_execute_zone_mgmt_recv;
>> +		return 0;
>> +	case nvme_cmd_zone_mgmt_send:
>> +		req->execute = nvmet_bdev_execute_zone_mgmt_send;
>> +		return 0;
>>  	default:
>>  		pr_err("unhandled cmd %d on qid %d\n", cmd->common.opcode,
>>  		       req->sq->qid);
>> diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
>> index 476b3cd91c65..7361665585a2 100644
>> --- a/drivers/nvme/target/nvmet.h
>> +++ b/drivers/nvme/target/nvmet.h
>> @@ -252,6 +252,10 @@ struct nvmet_subsys {
>>  	unsigned int		admin_timeout;
>>  	unsigned int		io_timeout;
>>  #endif /* CONFIG_NVME_TARGET_PASSTHRU */
>> +
>> +#ifdef CONFIG_BLK_DEV_ZONED
>> +	u8			zasl;
>> +#endif /* CONFIG_BLK_DEV_ZONED */
>>  };
>>  
>>  static inline struct nvmet_subsys *to_subsys(struct config_item *item)
>> @@ -614,4 +618,38 @@ static inline sector_t nvmet_lba_to_sect(struct nvmet_ns *ns, __le64 lba)
>>  	return le64_to_cpu(lba) << (ns->blksize_shift - SECTOR_SHIFT);
>>  }
>>  
>> +#ifdef CONFIG_BLK_DEV_ZONED
>> +bool nvmet_bdev_zns_enable(struct nvmet_ns *ns);
>> +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req);
>> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req);
>> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req);
>> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req);
>> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req);
>> +#else  /* CONFIG_BLK_DEV_ZONED */
>> +static inline bool nvmet_bdev_zns_enable(struct nvmet_ns *ns)
>> +{
>> +	return false;
>> +}
>> +static inline void
>> +nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
>> +{
>> +}
>> +static inline void
>> +nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
>> +{
>> +}
>> +static inline void
>> +nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
>> +{
>> +}
>> +static inline void
>> +nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
>> +{
>> +}
>> +static inline void
>> +nvmet_bdev_execute_zone_append(struct nvmet_req *req)
>> +{
>> +}
>> +#endif /* CONFIG_BLK_DEV_ZONED */
>> +
>>  #endif /* _NVMET_H */
>> diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
>> new file mode 100644
>> index 000000000000..2a71f56e568d
>> --- /dev/null
>> +++ b/drivers/nvme/target/zns.c
>> @@ -0,0 +1,342 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * NVMe ZNS-ZBD command implementation.
>> + * Copyright (c) 2020-2021 HGST, a Western Digital Company.
>> + */
>> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>> +#include <linux/nvme.h>
>> +#include <linux/blkdev.h>
>> +#include "nvmet.h"
>> +
>> +/*
> + * We set the Memory Page Size Minimum (MPSMIN) for the target controller to
>> + * 0, to which nvme_enable_ctrl() adds 12, resulting in a page_shift value of
>> + * 2^12 = 4K. Use a shift of 12 when calculating the ZASL.
>> + */
>> +#define NVMET_MPSMIN_SHIFT	12
>> +
>> +static u16 nvmet_bdev_zns_checks(struct nvmet_req *req)
>> +{
>> +	u16 status = NVME_SC_SUCCESS;
>> +
>> +	if (!bdev_is_zoned(req->ns->bdev)) {
>> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
>> +		goto out;
>> +	}
>> +
>> +	if (req->cmd->zmr.zra != NVME_ZRA_ZONE_REPORT) {
>> +		status = NVME_SC_INVALID_FIELD;
>> +		goto out;
>> +	}
>> +
>> +	if (req->cmd->zmr.zrasf != NVME_ZRASF_ZONE_REPORT_ALL) {
>> +		status = NVME_SC_INVALID_FIELD;
>> +		goto out;
>> +	}
>> +
>> +	if (req->cmd->zmr.pr != NVME_REPORT_ZONE_PARTIAL)
>> +		status = NVME_SC_INVALID_FIELD;
>> +
>> +out:
> You really want to keep this (useless) label ? Without it, the status variable
> can be dropped and the code overall becomes so much easier to read... Not to
> mention that life will be easier to the compiler for optimizing this.
>
Will remove it in the next version.
>> +	return status;
>> +}
>> +
>> +/*
>> + *  ZNS related command implementation and helpers.
>> + */
>> +
>> +static inline u8 nvmet_zasl(unsigned int zone_append_sects)
>> +{
> +	/*
>> +	 * The Zone Append Size Limit is expressed in units of the minimum
>> +	 * memory page size (2^12 = 4K) and is reported as a power of 2.
>> +	 */
>> +	return ilog2((zone_append_sects << 9) >> NVMET_MPSMIN_SHIFT);
>> +}
>> +
>> +static inline bool nvmet_zns_update_zasl(struct nvmet_ns *ns)
>> +{
>> +	struct request_queue *q = ns->bdev->bd_disk->queue;
>> +	u8 zasl = nvmet_zasl(queue_max_zone_append_sectors(q));
>> +
>> +	if (ns->subsys->zasl)
>> +		return ns->subsys->zasl < zasl ? false : true;
>> +
>> +	ns->subsys->zasl = zasl;
>> +	return true;
>> +}
>> +
>> +
>> +static int nvmet_bdev_validate_zns_zones_cb(struct blk_zone *z,
>> +					    unsigned int idx, void *data)
>> +{
>> +	if (z->type == BLK_ZONE_TYPE_CONVENTIONAL)
>> +		return -EOPNOTSUPP;
>> +	return 0;
>> +}
>> +
>> +static bool nvmet_bdev_has_conv_zones(struct block_device *bdev)
>> +{
>> +	int ret;
>> +
>> +	if (bdev->bd_disk->queue->conv_zones_bitmap)
>> +		return true;
>> +
>> +	ret = blkdev_report_zones(bdev, 0, blkdev_nr_zones(bdev->bd_disk),
>> +				  nvmet_bdev_validate_zns_zones_cb, NULL);
>> +
>> +	return ret < 0 ? true : false;
> return ret <= 0;
>
> would be simpler.
>
> Note that "<=" includes the error case of the device not reporting any zone
> (device dead) as we should fail that case I think.
>
hmm will make that change.
>> +}
>> +
>> +bool nvmet_bdev_zns_enable(struct nvmet_ns *ns)
>> +{
>> +	if (nvmet_bdev_has_conv_zones(ns->bdev))
>> +		return false;
>> +
>> +	/*
>> +	 * For ZBC and ZAC devices, writes into sequential zones must be aligned
>> +	 * to the device physical block size. So use this value as the logical
>> +	 * block size to avoid errors.
>> +	 */
>> +	ns->blksize_shift = blksize_bits(bdev_physical_block_size(ns->bdev));
>> +
>> +	if (!nvmet_zns_update_zasl(ns))
>> +		return false;
>> +
>> +	return !(get_capacity(ns->bdev->bd_disk) &
>> +			(bdev_zone_sectors(ns->bdev) - 1));
>> +}
>> +
>> +/*
>> + * ZNS related Admin and I/O command handlers.
>> + */
>> +void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
>> +{
>> +	u8 zasl = req->sq->ctrl->subsys->zasl;
>> +	struct nvmet_ctrl *ctrl = req->sq->ctrl;
>> +	struct nvme_id_ctrl_zns *id;
>> +	u16 status;
>> +
>> +	id = kzalloc(sizeof(*id), GFP_KERNEL);
>> +	if (!id) {
>> +		status = NVME_SC_INTERNAL;
>> +		goto out;
>> +	}
>> +
>> +	if (ctrl->ops->get_mdts)
>> +		id->zasl = min_t(u8, ctrl->ops->get_mdts(ctrl), zasl);
>> +	else
>> +		id->zasl = zasl;
>> +
>> +	status = nvmet_copy_to_sgl(req, 0, id, sizeof(*id));
>> +
>> +	kfree(id);
>> +out:
>> +	nvmet_req_complete(req, status);
>> +}
>> +
>> +void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
>> +{
>> +	struct nvme_id_ns_zns *id_zns;
>> +	u16 status = NVME_SC_SUCCESS;
>> +	u64 zsze;
>> +
>> +	if (le32_to_cpu(req->cmd->identify.nsid) == NVME_NSID_ALL) {
>> +		req->error_loc = offsetof(struct nvme_identify, nsid);
>> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
>> +		goto out;
>> +	}
>> +
>> +	id_zns = kzalloc(sizeof(*id_zns), GFP_KERNEL);
>> +	if (!id_zns) {
>> +		status = NVME_SC_INTERNAL;
>> +		goto out;
>> +	}
>> +
>> +	req->ns = nvmet_find_namespace(req->sq->ctrl, req->cmd->identify.nsid);
>> +	if (!req->ns) {
>> +		status = NVME_SC_INTERNAL;
>> +		goto done;
>> +	}
>> +
>> +	if (!bdev_is_zoned(req->ns->bdev)) {
>> +		req->error_loc = offsetof(struct nvme_identify, nsid);
>> +		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
>> +		goto done;
>> +	}
>> +
>> +	nvmet_ns_revalidate(req->ns);
>> +	zsze = (bdev_zone_sectors(req->ns->bdev) << 9) >>
>> +					req->ns->blksize_shift;
>> +	id_zns->lbafe[0].zsze = cpu_to_le64(zsze);
>> +	id_zns->mor = cpu_to_le32(bdev_max_open_zones(req->ns->bdev));
>> +	id_zns->mar = cpu_to_le32(bdev_max_active_zones(req->ns->bdev));
>> +
>> +done:
>> +	status = nvmet_copy_to_sgl(req, 0, id_zns, sizeof(*id_zns));
>> +	kfree(id_zns);
>> +out:
>> +	nvmet_req_complete(req, status);
>> +}
>> +
>> +struct nvmet_report_zone_data {
>> +	struct nvmet_ns *ns;
>> +	struct nvme_zone_report *rz;
>> +};
>> +
>> +static int nvmet_bdev_report_zone_cb(struct blk_zone *z, unsigned int idx,
>> +				     void *data)
>> +{
>> +	struct nvmet_report_zone_data *report_zone_data = data;
>> +	struct nvme_zone_descriptor *entries = report_zone_data->rz->entries;
>> +	struct nvmet_ns *ns = report_zone_data->ns;
>> +
>> +	entries[idx].zcap = nvmet_sect_to_lba(ns, z->capacity);
>> +	entries[idx].zslba = nvmet_sect_to_lba(ns, z->start);
>> +	entries[idx].wp = nvmet_sect_to_lba(ns, z->wp);
>> +	entries[idx].za = z->reset ? 1 << 2 : 0;
>> +	entries[idx].zt = z->type;
>> +	entries[idx].zs = z->cond << 4;
>> +
>> +	return 0;
>> +}
>> +
>> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
>> +{
>> +	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->zmr.slba);
>> +	u32 bufsize = (le32_to_cpu(req->cmd->zmr.numd) + 1) << 2;
>> +	struct nvmet_report_zone_data data = { .ns = req->ns };
>> +	unsigned int nr_zones;
>> +	int reported_zones;
>> +	u16 status;
>> +
>> +	nr_zones = (bufsize - sizeof(struct nvme_zone_report)) /
>> +			sizeof(struct nvme_zone_descriptor);
> I really would prefer this code to be moved down, before the call to
> blkdev_report_zones().
>
> You can also optimize this value a little with a min() of the value above and of
> DIV_ROUND_UP(dev_capacity - sect, zone size). But not a big deal I think.
I did that as per your last comment, but when I reviewed the code against
the host side it didn't match. I have a cleanup patch series to fix nits and
host-side CSS checks for ZNS, and I've added this change into that series.
>> +
>> +	status = nvmet_bdev_zns_checks(req);
>> +	if (status)
>> +		goto out;
>> +
>> +	data.rz = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY | __GFP_ZERO);
> Shouldn't this be GFP_NOIO ? Also, is the NORETRY critical ?
Yes on GFP_NOIO. By NORETRY being critical, do you mean how we are
allocating the memory on the host side in nvme_zns_alloc_report_buffer()?
> blkdev_report_zones() will do mem allocation too and at leadt scsi does retry.
>
>> +	if (!data.rz) {
>> +		status = NVME_SC_INTERNAL;
>> +		goto out;
>> +	}
>> +
>> +	reported_zones = blkdev_report_zones(req->ns->bdev, sect, nr_zones,
>> +					     nvmet_bdev_report_zone_cb,
>> +					     &data);
>> +	if (reported_zones < 0) {
>> +		status = NVME_SC_INTERNAL;
>> +		goto out_free_report_zones;
>> +	}
>> +
>> +	data.rz->nr_zones = cpu_to_le64(reported_zones);
>> +
>> +	status = nvmet_copy_to_sgl(req, 0, data.rz, bufsize);
>> +
>> +out_free_report_zones:
>> +	kvfree(data.rz);
>> +out:
>> +	nvmet_req_complete(req, status);
>> +}
>> +
>> +void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
>> +{
>> +	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->zms.slba);
>> +	sector_t nr_sect = bdev_zone_sectors(req->ns->bdev);
>> +	u16 status = NVME_SC_SUCCESS;
>> +	enum req_opf op;
>> +	int ret;
>> +
>> +	if (req->cmd->zms.select_all)
>> +		nr_sect = get_capacity(req->ns->bdev->bd_disk);
>> +
>> +	switch (req->cmd->zms.zsa) {
>> +	case NVME_ZONE_OPEN:
>> +		op = REQ_OP_ZONE_OPEN;
>> +		break;
>> +	case NVME_ZONE_CLOSE:
>> +		op = REQ_OP_ZONE_CLOSE;
>> +		break;
>> +	case NVME_ZONE_FINISH:
>> +		op = REQ_OP_ZONE_FINISH;
>> +		break;
>> +	case NVME_ZONE_RESET:
>> +		op = REQ_OP_ZONE_RESET;
>> +		break;
>> +	default:
>> +		status = NVME_SC_INVALID_FIELD;
>> +		goto out;
>> +	}
>> +
>> +	ret = blkdev_zone_mgmt(req->ns->bdev, op, sect, nr_sect, GFP_KERNEL);
> GFP_NOIO ?
Yes.
>
>> +	if (ret)
>> +		status = NVME_SC_INTERNAL;
>> +out:
>> +	nvmet_req_complete(req, status);
>> +}
>> +
>> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
>> +{
>> +	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->rw.slba);
>> +	struct request_queue *q = req->ns->bdev->bd_disk->queue;
>> +	unsigned int max_sects = queue_max_zone_append_sectors(q);
>> +	u16 status = NVME_SC_SUCCESS;
>> +	unsigned int total_len = 0;
>> +	struct scatterlist *sg;
>> +	int ret = 0, sg_cnt;
>> +	struct bio *bio;
>> +
>> +	if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req)))
>> +		return;
>> +
>> +	if (!req->sg_cnt) {
>> +		nvmet_req_complete(req, 0);
>> +		return;
>> +	}
>> +
>> +	if (req->transfer_len <= NVMET_MAX_INLINE_DATA_LEN) {
>> +		bio = &req->b.inline_bio;
>> +		bio_init(bio, req->inline_bvec, ARRAY_SIZE(req->inline_bvec));
>> +	} else {
>> +		bio = bio_alloc(GFP_KERNEL, req->sg_cnt);
>> +	}
>> +
>> +	bio_set_dev(bio, req->ns->bdev);
>> +	bio->bi_iter.bi_sector = sect;
>> +	bio->bi_opf = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
>> +	if (req->cmd->rw.control & cpu_to_le16(NVME_RW_FUA))
>> +		bio->bi_opf |= REQ_FUA;
>> +
>> +	for_each_sg(req->sg, sg, req->sg_cnt, sg_cnt) {
>> +		struct page *p = sg_page(sg);
>> +		unsigned int l = sg->length;
>> +		unsigned int o = sg->offset;
>> +		bool same_page = false;
>> +
>> +		ret = bio_add_hw_page(q, bio, p, l, o, max_sects, &same_page);
>> +		if (ret != sg->length) {
>> +			status = NVME_SC_INTERNAL;
>> +			goto out_bio_put;
>> +		}
>> +		if (same_page)
>> +			put_page(p);
>> +
>> +		total_len += sg->length;
>> +	}
>> +
>> +	if (total_len != nvmet_rw_data_len(req)) {
>> +		status = NVME_SC_INTERNAL | NVME_SC_DNR;
>> +		goto out_bio_put;
>> +	}
>> +
>> +	ret = submit_bio_wait(bio);
>> +	req->cqe->result.u64 = nvmet_sect_to_lba(req->ns,
>> +						 bio->bi_iter.bi_sector);
>> +
>> +out_bio_put:
>> +	if (bio != &req->b.inline_bio)
>> +		bio_put(bio);
>> +	nvmet_req_complete(req, ret < 0 ? NVME_SC_INTERNAL : status);
>> +}
>>
>


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH V9 0/9] nvmet: add ZBD backend support
  2021-01-12  4:26 ` Chaitanya Kulkarni
@ 2021-01-12  6:12   ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 98+ messages in thread
From: Chaitanya Kulkarni @ 2021-01-12  6:12 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: linux-block, linux-nvme, hch, sagi

Damien,

On 1/11/21 20:26, Chaitanya Kulkarni wrote:
> Hi,
>
> NVMeOF Host is capable of handling the NVMe Protocol based Zoned Block
> Devices (ZBD) in the Zoned Namespaces (ZNS) mode with the passthru
> backend. There is no support for a generic block device backend to
> handle the ZBD devices which are not NVMe protocol compliant.
>
> This adds support to export the ZBDs (which are not NVMe drives) to host
> the from target via NVMeOF using the host side ZNS interface.
>
> The patch series is generated in bottom-top manner where, it first adds
> prep patch and ZNS command-specific handlers on the top of genblk and 
> updates the data structures, then one by one it wires up the admin cmds
> in the order host calls them in namespace initializing sequence. Once
> everything is ready, it wires-up the I/O command handlers. See below for
> patch-series overview.
>
> All the testcases are passing for the ZoneFS where ZBD exported with
> NVMeOF backed by null_blk ZBD and null_blk ZBD without NVMeOF. Adding
> test result below.
>
> Note: This patch-series is based on the earlier posted patch series :-
>
> [PATCH V2 0/4] nvmet: admin-cmd related cleanups and a fix
> http://lists.infradead.org/pipermail/linux-nvme/2021-January/021729.html
>
> -ck

Thanks a lot for your comments, I'll send a V10 with fixes for them.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH V9 6/9] nvmet: add bio init helper for different backends
  2021-01-12  5:57       ` Chaitanya Kulkarni
@ 2021-01-12  6:27         ` Damien Le Moal
  -1 siblings, 0 replies; 98+ messages in thread
From: Damien Le Moal @ 2021-01-12  6:27 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-nvme; +Cc: hch, sagi

On 2021/01/12 14:57, Chaitanya Kulkarni wrote:
> On 1/11/21 21:40, Damien Le Moal wrote:
>>>  	bio = nvmet_req_bio_get(req, NULL);
>>> -	bio_set_dev(bio, req->ns->bdev);
>>> -	bio->bi_iter.bi_sector = sect;
>>> -	bio->bi_opf = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
>>> +	nvmet_bio_init(bio, req->ns->bdev, op, sect, NULL, NULL);
>> op is used only here I think. So is that variable really necessary ?
>>
> This is just my personal preference as without using op we will have to
> add a new line to a function call, I like to keep the function call in
> one line as much as I can.

A new line in the code costs nothing. An unnecessary variable costs stack space...


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH V9 4/9] nvmet: add ZBD over ZNS backend support
  2021-01-12  6:11       ` Chaitanya Kulkarni
@ 2021-01-12  6:31         ` Damien Le Moal
  -1 siblings, 0 replies; 98+ messages in thread
From: Damien Le Moal @ 2021-01-12  6:31 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-nvme; +Cc: hch, sagi

On 2021/01/12 15:11, Chaitanya Kulkarni wrote:
[...]
>>> +void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
>>> +{
>>> +	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->zmr.slba);
>>> +	u32 bufsize = (le32_to_cpu(req->cmd->zmr.numd) + 1) << 2;
>>> +	struct nvmet_report_zone_data data = { .ns = req->ns };
>>> +	unsigned int nr_zones;
>>> +	int reported_zones;
>>> +	u16 status;
>>> +
>>> +	nr_zones = (bufsize - sizeof(struct nvme_zone_report)) /
>>> +			sizeof(struct nvme_zone_descriptor);
>> I really would prefer this code to be moved down, before the call to
>> blkdev_report_zones().
>>
>> You can also optimize this value a little with a min() of the value above and of
>> DIV_ROUND_UP(dev_capacity - sect, zone size). But not a big deal I think.
> I did that as per your last comment, when I did the code review with
> host side it didn't match, I've a cleanup patch series to fix nits and
> host side css checks for zns I've added this into that series.
>>> +
>>> +	status = nvmet_bdev_zns_checks(req);
>>> +	if (status)
>>> +		goto out;
>>> +
>>> +	data.rz = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY | __GFP_ZERO);
>> Shouldn't this be GFP_NOIO ? Also, is the NORETRY critical ?
> Yes on GFP_NOIO. NORETRY critical means how we areallocating the memory on
> the host side nvme_zns_alloc_report_buffer() ?

By critical, I mean that if __GFP_NORETRY is removed, things break ? Or is it
just an optimization to avoid overtaxing the host resources ? I suspect the
latter case, but wanted to make sure.



-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH V9 1/9] block: export bio_add_hw_pages()
  2021-01-12  4:26   ` Chaitanya Kulkarni
@ 2021-01-12  7:24     ` Christoph Hellwig
  -1 siblings, 0 replies; 98+ messages in thread
From: Christoph Hellwig @ 2021-01-12  7:24 UTC (permalink / raw)
  To: Chaitanya Kulkarni; +Cc: linux-block, linux-nvme, hch, sagi, damien.lemoal

EXPORT_SYMBOL_GPL, please.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH V9 3/9] nvmet: add NVM command set identifier support
  2021-01-12  4:26   ` Chaitanya Kulkarni
@ 2021-01-12  7:27     ` Christoph Hellwig
  -1 siblings, 0 replies; 98+ messages in thread
From: Christoph Hellwig @ 2021-01-12  7:27 UTC (permalink / raw)
  To: Chaitanya Kulkarni; +Cc: linux-block, linux-nvme, hch, sagi, damien.lemoal

The Command Set Identifier has no "NVM" in its name.


> +static inline bool nvmet_cc_css_check(u8 cc_css)
> +{
> +	switch (cc_css <<= NVME_CC_CSS_SHIFT) {
> +	case NVME_CC_CSS_NVM:
> +		return true;
> +	default:
> +		return false;
> +	}
> +}

This hunk looks misplaced, it isn't very useful on its own, but
should go together with the multiple command set support.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH V9 5/9] nvmet: add bio get helper for different backends
  2021-01-12  4:26   ` Chaitanya Kulkarni
@ 2021-01-12  7:33     ` Christoph Hellwig
  -1 siblings, 0 replies; 98+ messages in thread
From: Christoph Hellwig @ 2021-01-12  7:33 UTC (permalink / raw)
  To: Chaitanya Kulkarni; +Cc: linux-block, linux-nvme, hch, sagi, damien.lemoal

I'm not a huge fan of this helper, especially as it sets an end_io
callback only for the allocated case, which is a weird calling
convention.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH V9 6/9] nvmet: add bio init helper for different backends
  2021-01-12  4:26   ` Chaitanya Kulkarni
@ 2021-01-12  7:33     ` Christoph Hellwig
  -1 siblings, 0 replies; 98+ messages in thread
From: Christoph Hellwig @ 2021-01-12  7:33 UTC (permalink / raw)
  To: Chaitanya Kulkarni; +Cc: linux-block, linux-nvme, hch, sagi, damien.lemoal

> +static inline void nvmet_bio_init(struct bio *bio, struct block_device *bdev,
> +				  unsigned int op, sector_t sect, void *private,
> +				  bio_end_io_t *bi_end_io)
> +{
> +	bio_set_dev(bio, bdev);
> +	bio->bi_opf = op;
> +	bio->bi_iter.bi_sector = sect;
> +	bio->bi_private = private;
> +	bio->bi_end_io = bi_end_io;
> +}

Nothing NVMe specific about this.  The helper also doesn't really contain
any logic either.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH V9 8/9] nvmet: add common I/O length check helper
  2021-01-12  4:26   ` Chaitanya Kulkarni
@ 2021-01-12  7:35     ` Christoph Hellwig
  -1 siblings, 0 replies; 98+ messages in thread
From: Christoph Hellwig @ 2021-01-12  7:35 UTC (permalink / raw)
  To: Chaitanya Kulkarni; +Cc: linux-block, linux-nvme, hch, sagi, damien.lemoal

I can't say I like this helper.  The semantics are a little confusing,
not helped by the name.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH V9 9/9] nvmet: call nvmet_bio_done() for zone append
  2021-01-12  4:26   ` Chaitanya Kulkarni
@ 2021-01-12  7:36     ` Christoph Hellwig
  -1 siblings, 0 replies; 98+ messages in thread
From: Christoph Hellwig @ 2021-01-12  7:36 UTC (permalink / raw)
  To: Chaitanya Kulkarni; +Cc: linux-block, linux-nvme, hch, sagi, damien.lemoal

I don't see much of a need to share such trivial functionality over
different codebases.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH V9 4/9] nvmet: add ZBD over ZNS backend support
  2021-01-12  4:26   ` Chaitanya Kulkarni
@ 2021-01-12  7:48     ` Christoph Hellwig
  -1 siblings, 0 replies; 98+ messages in thread
From: Christoph Hellwig @ 2021-01-12  7:48 UTC (permalink / raw)
  To: Chaitanya Kulkarni; +Cc: linux-block, linux-nvme, hch, sagi, damien.lemoal

> diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
> index a50b7bcac67a..bdf09d8faa48 100644
> --- a/drivers/nvme/target/admin-cmd.c
> +++ b/drivers/nvme/target/admin-cmd.c
> @@ -191,6 +191,15 @@ static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req)
>  		log->iocs[nvme_cmd_dsm]			= cpu_to_le32(1 << 0);
>  		log->iocs[nvme_cmd_write_zeroes]	= cpu_to_le32(1 << 0);
>  		break;
> +	case NVME_CSI_ZNS:
> +		if (IS_ENABLED(CONFIG_BLK_DEV_ZONED)) {
> +			u32 *iocs = log->iocs;
> +
> +			iocs[nvme_cmd_zone_append]	= cpu_to_le32(1 << 0);
> +			iocs[nvme_cmd_zone_mgmt_send]	= cpu_to_le32(1 << 0);
> +			iocs[nvme_cmd_zone_mgmt_recv]	= cpu_to_le32(1 << 0);
> +		}
> +		break;

We need to return errors if the command set is not actually supported.
I also think splitting this into one helper per command set would
be nice.

> @@ -644,6 +653,17 @@ static void nvmet_execute_identify_desclist(struct nvmet_req *req)
>  	if (status)
>  		goto out;
>  
> +	if (IS_ENABLED(CONFIG_BLK_DEV_ZONED)) {
> +		u16 nvme_cis_zns = NVME_CSI_ZNS;
> +
> +		if (req->ns->csi == NVME_CSI_ZNS)
> +			status = nvmet_copy_ns_identifier(req, NVME_NIDT_CSI,
> +							  NVME_NIDT_CSI_LEN,
> +							  &nvme_cis_zns, &off);
> +		if (status)
> +			goto out;
> +	}

We need to add the CSI for every namespace, i.e. something like:

	status = nvmet_copy_ns_identifier(req, NVME_NIDT_CSI, NVME_NIDT_CSI_LEN,
					  &req->ns->csi);		
	if (status)
		goto out;

and this hunk needs to go into the CSI patch.

>  	if (sg_zero_buffer(req->sg, req->sg_cnt, NVME_IDENTIFY_DATA_SIZE - off,
>  			off) != NVME_IDENTIFY_DATA_SIZE - off)
>  		status = NVME_SC_INTERNAL | NVME_SC_DNR;
> @@ -660,8 +680,16 @@ static void nvmet_execute_identify(struct nvmet_req *req)
>  	switch (req->cmd->identify.cns) {
>  	case NVME_ID_CNS_NS:
>  		return nvmet_execute_identify_ns(req);
> +	case NVME_ID_CNS_CS_NS:
> +		if (req->cmd->identify.csi == NVME_CSI_ZNS)
> +			return nvmet_execute_identify_cns_cs_ns(req);
> +		break;
>  	case NVME_ID_CNS_CTRL:
>  		return nvmet_execute_identify_ctrl(req);
> +	case NVME_ID_CNS_CS_CTRL:
> +		if (req->cmd->identify.csi == NVME_CSI_ZNS)
> +			return nvmet_execute_identify_cns_cs_ctrl(req);
> +		break;

How does the CSI get mirrored into the cns field?

> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
> index 672e4009f8d6..17d5da062a5a 100644
> --- a/drivers/nvme/target/core.c
> +++ b/drivers/nvme/target/core.c
> @@ -1107,6 +1107,7 @@ static inline u8 nvmet_cc_iocqes(u32 cc)
>  static inline bool nvmet_cc_css_check(u8 cc_css)
>  {
>  	switch (cc_css <<= NVME_CC_CSS_SHIFT) {
> +	case NVME_CC_CSS_CSI:
>  	case NVME_CC_CSS_NVM:
>  		return true;
>  	default:
> @@ -1173,6 +1174,8 @@ static void nvmet_init_cap(struct nvmet_ctrl *ctrl)
>  {
>  	/* command sets supported: NVMe command set: */
>  	ctrl->cap = (1ULL << 37);
> +	if (IS_ENABLED(CONFIG_BLK_DEV_ZONED))
> +		ctrl->cap |= (1ULL << 43);
>  	/* CC.EN timeout in 500msec units: */
>  	ctrl->cap |= (15ULL << 24);
>  	/* maximum queue entries supported: */

This needs to go into a separate patch for multiple command set support.
We can probably merge the CAP and CC bits with the CSI support, though.

> +	if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) && bdev_is_zoned(ns->bdev)) {

bdev_is_zoned should be probably stubbed out for !CONFIG_BLK_DEV_ZONED
these days.

> +/*
> + *  ZNS related command implementation and helpers.
> + */

Well, that is the description of the whole file, isn't it?  I don't think
this comment adds much value.

> +	/*
> +	 * For ZBC and ZAC devices, writes into sequential zones must be aligned
> +	 * to the device physical block size. So use this value as the logical
> +	 * block size to avoid errors.
> +	 */

I do not understand the logic here, given that NVMe does not have
conventional zones.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH V9 4/9] nvmet: add ZBD over ZNS backend support
  2021-01-12  7:48     ` Christoph Hellwig
@ 2021-01-12  7:52       ` Damien Le Moal
  -1 siblings, 0 replies; 98+ messages in thread
From: Damien Le Moal @ 2021-01-12  7:52 UTC (permalink / raw)
  To: Christoph Hellwig, Chaitanya Kulkarni; +Cc: linux-block, linux-nvme, sagi

On 2021/01/12 16:48, Christoph Hellwig wrote:
>> diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
>> index a50b7bcac67a..bdf09d8faa48 100644
>> --- a/drivers/nvme/target/admin-cmd.c
>> +++ b/drivers/nvme/target/admin-cmd.c
>> @@ -191,6 +191,15 @@ static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req)
>>  		log->iocs[nvme_cmd_dsm]			= cpu_to_le32(1 << 0);
>>  		log->iocs[nvme_cmd_write_zeroes]	= cpu_to_le32(1 << 0);
>>  		break;
>> +	case NVME_CSI_ZNS:
>> +		if (IS_ENABLED(CONFIG_BLK_DEV_ZONED)) {
>> +			u32 *iocs = log->iocs;
>> +
>> +			iocs[nvme_cmd_zone_append]	= cpu_to_le32(1 << 0);
>> +			iocs[nvme_cmd_zone_mgmt_send]	= cpu_to_le32(1 << 0);
>> +			iocs[nvme_cmd_zone_mgmt_recv]	= cpu_to_le32(1 << 0);
>> +		}
>> +		break;
> 
> We need to return errors if the command set is not actually supported.
> I also think splitting this into one helper per command set would
> be nice.
> 
>> @@ -644,6 +653,17 @@ static void nvmet_execute_identify_desclist(struct nvmet_req *req)
>>  	if (status)
>>  		goto out;
>>  
>> +	if (IS_ENABLED(CONFIG_BLK_DEV_ZONED)) {
>> +		u16 nvme_cis_zns = NVME_CSI_ZNS;
>> +
>> +		if (req->ns->csi == NVME_CSI_ZNS)
>> +			status = nvmet_copy_ns_identifier(req, NVME_NIDT_CSI,
>> +							  NVME_NIDT_CSI_LEN,
>> +							  &nvme_cis_zns, &off);
>> +		if (status)
>> +			goto out;
>> +	}
> 
> We need to add the CSI for every namespace, i.e. something like:
> 
> 	status = nvmet_copy_ns_identifier(req, NVME_NIDT_CSI, NVME_NIDT_CSI_LEN,
> 					  &req->ns->csi);		
> 	if (status)
> 		goto out;
> 
> and this hunk needs to go into the CSI patch.
> 
>>  	if (sg_zero_buffer(req->sg, req->sg_cnt, NVME_IDENTIFY_DATA_SIZE - off,
>>  			off) != NVME_IDENTIFY_DATA_SIZE - off)
>>  		status = NVME_SC_INTERNAL | NVME_SC_DNR;
>> @@ -660,8 +680,16 @@ static void nvmet_execute_identify(struct nvmet_req *req)
>>  	switch (req->cmd->identify.cns) {
>>  	case NVME_ID_CNS_NS:
>>  		return nvmet_execute_identify_ns(req);
>> +	case NVME_ID_CNS_CS_NS:
>> +		if (req->cmd->identify.csi == NVME_CSI_ZNS)
>> +			return nvmet_execute_identify_cns_cs_ns(req);
>> +		break;
>>  	case NVME_ID_CNS_CTRL:
>>  		return nvmet_execute_identify_ctrl(req);
>> +	case NVME_ID_CNS_CS_CTRL:
>> +		if (req->cmd->identify.csi == NVME_CSI_ZNS)
>> +			return nvmet_execute_identify_cns_cs_ctrl(req);
>> +		break;
> 
> How does the CSI get mirrored into the cns field?
> 
>> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
>> index 672e4009f8d6..17d5da062a5a 100644
>> --- a/drivers/nvme/target/core.c
>> +++ b/drivers/nvme/target/core.c
>> @@ -1107,6 +1107,7 @@ static inline u8 nvmet_cc_iocqes(u32 cc)
>>  static inline bool nvmet_cc_css_check(u8 cc_css)
>>  {
>>  	switch (cc_css <<= NVME_CC_CSS_SHIFT) {
>> +	case NVME_CC_CSS_CSI:
>>  	case NVME_CC_CSS_NVM:
>>  		return true;
>>  	default:
>> @@ -1173,6 +1174,8 @@ static void nvmet_init_cap(struct nvmet_ctrl *ctrl)
>>  {
>>  	/* command sets supported: NVMe command set: */
>>  	ctrl->cap = (1ULL << 37);
>> +	if (IS_ENABLED(CONFIG_BLK_DEV_ZONED))
>> +		ctrl->cap |= (1ULL << 43);
>>  	/* CC.EN timeout in 500msec units: */
>>  	ctrl->cap |= (15ULL << 24);
>>  	/* maximum queue entries supported: */
> 
> This needs to go into a separate patch for multiple command set support.
> We can probably merge the CAP and CC bits with the CSI support, though.
> 
>> +	if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) && bdev_is_zoned(ns->bdev)) {
> 
> bdev_is_zoned should be probably stubbed out for !CONFIG_BLK_DEV_ZONED
> these days.
> 
>> +/*
>> + *  ZNS related command implementation and helpers.
>> + */
> 
> Well, that is the description of the whole file, isn't it?  I don't think
> this comment adds much value.
> 
>> +	/*
>> +	 * For ZBC and ZAC devices, writes into sequential zones must be aligned
>> +	 * to the device physical block size. So use this value as the logical
>> +	 * block size to avoid errors.
>> +	 */
> 
> I do not understand the logic here, given that NVMe does not have
> conventional zones.

512e SAS & SATA SMR drives (512B logical, 4K physical) are a big thing, and for
these, all writes in sequential zones must be 4K aligned. So I suggested to
Chaitanya to simply use the physical block size as the LBA size for the target
to avoid weird IO errors that would not make sense in ZNS/NVMe world (e.g. 512B
aligned write requests failing).


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH V9 1/9] block: export bio_add_hw_pages()
  2021-01-12  7:24     ` Christoph Hellwig
@ 2021-01-13  1:20       ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 98+ messages in thread
From: Chaitanya Kulkarni @ 2021-01-13  1:20 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block, linux-nvme, sagi, Damien Le Moal

On 1/11/21 23:24, Christoph Hellwig wrote:
> EXPORT_SYMBOL_GPL, please.
>
Okay.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH V9 3/9] nvmet: add NVM command set identifier support
  2021-01-12  7:27     ` Christoph Hellwig
@ 2021-01-13  4:16       ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 98+ messages in thread
From: Chaitanya Kulkarni @ 2021-01-13  4:16 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block, linux-nvme, sagi, Damien Le Moal

On 1/11/21 23:27, Christoph Hellwig wrote:
> The Command Set Identifier has no "NVM" in its name.
>
>
>> +static inline bool nvmet_cc_css_check(u8 cc_css)
>> +{
>> +	switch (cc_css <<= NVME_CC_CSS_SHIFT) {
>> +	case NVME_CC_CSS_NVM:
>> +		return true;
>> +	default:
>> +		return false;
>> +	}
>> +}
> This hunk looks misplaced, it isn't very useful on its own, but
> should go together with the multiple command set support.
>
We advertise command set support in nvmet_init_cap() via
ctrl->cap = (1ULL << 37), which results in nvme_enable_ctrl() setting
NVME_CC_CSS_NVM in ctrl->ctrl_config. The current code in
nvmet_start_ctrl() checks nvmet_cc_css(ctrl->cc) != 0, i.e. only that
the value is non-zero, without using the macro the host uses. The
function above does use that macro, and it also gives us a helper for
the next patch, where the cc_css value is non-zero (NVME_CC_CSS_CSI)
and ctrl->cap additionally has 1ULL << 43 set.

Given the code flow in [1], the function above is needed to make sure
the CSS value matches what the host set with the same macro
(NVME_CC_CSS_NVM) in nvme_enable_ctrl(). Otherwise the patch looks
incomplete, and folding the CSS NVM check into the CSS_CSI patch looks
like mixing things up to me.

Are you okay with that?

[1]
nvme_enable_ctrl()
 ctrl->ops->reg_write32(ctrl, NVME_REG_CC, ctrl->ctrl_config)
  nvmf_reg_write32()
   nvmet_parse_fabrics_cmd()
    nvmet_execute_prop_set()
     nvmet_update_ctrl()
      new cc != old cc == true -> nvmet_start_ctrl()
       nvmet_cc_css_check(ctrl->css)
        Check that the host has set NVME_CC_CSS_NVM in the controller
        configuration, since we support the default CSS_NVM, which the
        controller needs to accept irrespective of other CC_CSS values.
      
 



^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH V9 4/9] nvmet: add ZBD over ZNS backend support
  2021-01-12  7:48     ` Christoph Hellwig
@ 2021-01-13  4:57       ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 98+ messages in thread
From: Chaitanya Kulkarni @ 2021-01-13  4:57 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block, linux-nvme, sagi, Damien Le Moal

On 1/11/21 23:48, Christoph Hellwig wrote:
>> diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
>> index a50b7bcac67a..bdf09d8faa48 100644
>> --- a/drivers/nvme/target/admin-cmd.c
>> +++ b/drivers/nvme/target/admin-cmd.c
>> @@ -191,6 +191,15 @@ static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req)
>>  		log->iocs[nvme_cmd_dsm]			= cpu_to_le32(1 << 0);
>>  		log->iocs[nvme_cmd_write_zeroes]	= cpu_to_le32(1 << 0);
>>  		break;
>> +	case NVME_CSI_ZNS:
>> +		if (IS_ENABLED(CONFIG_BLK_DEV_ZONED)) {
>> +			u32 *iocs = log->iocs;
>> +
>> +			iocs[nvme_cmd_zone_append]	= cpu_to_le32(1 << 0);
>> +			iocs[nvme_cmd_zone_mgmt_send]	= cpu_to_le32(1 << 0);
>> +			iocs[nvme_cmd_zone_mgmt_recv]	= cpu_to_le32(1 << 0);
>> +		}
>> +		break;
> We need to return errors if the command set is not actually supported.
> I also think splitting this into one helper per command set would
> be nice.
>
Okay.
>> @@ -644,6 +653,17 @@ static void nvmet_execute_identify_desclist(struct nvmet_req *req)
>>  	if (status)
>>  		goto out;
>>  
>> +	if (IS_ENABLED(CONFIG_BLK_DEV_ZONED)) {
>> +		u16 nvme_cis_zns = NVME_CSI_ZNS;
>> +
>> +		if (req->ns->csi == NVME_CSI_ZNS)
>> +			status = nvmet_copy_ns_identifier(req, NVME_NIDT_CSI,
>> +							  NVME_NIDT_CSI_LEN,
>> +							  &nvme_cis_zns, &off);
>> +		if (status)
>> +			goto out;
>> +	}
> We need to add the CSI for every namespace, i.e. something like:
>
> 	status = nvmet_copy_ns_identifier(req, NVME_NIDT_CSI, NVME_NIDT_CSI_LEN,
> 					  &req->ns->csi);		
> 	if (status)
> 		goto out;
>
> and this hunk needs to go into the CSI patch.
Even better, we can get rid of the local variables...
>>  	if (sg_zero_buffer(req->sg, req->sg_cnt, NVME_IDENTIFY_DATA_SIZE - off,
>>  			off) != NVME_IDENTIFY_DATA_SIZE - off)
>>  		status = NVME_SC_INTERNAL | NVME_SC_DNR;
>> @@ -660,8 +680,16 @@ static void nvmet_execute_identify(struct nvmet_req *req)
>>  	switch (req->cmd->identify.cns) {
>>  	case NVME_ID_CNS_NS:
>>  		return nvmet_execute_identify_ns(req);
>> +	case NVME_ID_CNS_CS_NS:
>> +		if (req->cmd->identify.csi == NVME_CSI_ZNS)
>> +			return nvmet_execute_identify_cns_cs_ns(req);
>> +		break;
>>  	case NVME_ID_CNS_CTRL:
>>  		return nvmet_execute_identify_ctrl(req);
>> +	case NVME_ID_CNS_CS_CTRL:
>> +		if (req->cmd->identify.csi == NVME_CSI_ZNS)
>> +			return nvmet_execute_identify_cns_cs_ctrl(req);
>> +		break;
> How does the CSI get mirrored into the cns field?
>
There is only one cns and one csi value that we set from host/zns.c.
This is just to reject the request if we receive anything else, so
that we fail if anything changes on the host side.
>> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
>> index 672e4009f8d6..17d5da062a5a 100644
>> --- a/drivers/nvme/target/core.c
>> +++ b/drivers/nvme/target/core.c
>> @@ -1107,6 +1107,7 @@ static inline u8 nvmet_cc_iocqes(u32 cc)
>>  static inline bool nvmet_cc_css_check(u8 cc_css)
>>  {
>>  	switch (cc_css <<= NVME_CC_CSS_SHIFT) {
>> +	case NVME_CC_CSS_CSI:
>>  	case NVME_CC_CSS_NVM:
>>  		return true;
>>  	default:
>> @@ -1173,6 +1174,8 @@ static void nvmet_init_cap(struct nvmet_ctrl *ctrl)
>>  {
>>  	/* command sets supported: NVMe command set: */
>>  	ctrl->cap = (1ULL << 37);
>> +	if (IS_ENABLED(CONFIG_BLK_DEV_ZONED))
>> +		ctrl->cap |= (1ULL << 43);
>>  	/* CC.EN timeout in 500msec units: */
>>  	ctrl->cap |= (15ULL << 24);
>>  	/* maximum queue entries supported: */
> This needs to go into a separate patch for multiple command set support.
> We can probably merge the CAP and CC bits with the CSI support, though.
Do you mean the previous patch? But we don't add handlers for the
non-default I/O command set until this patch.
>> +	if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) && bdev_is_zoned(ns->bdev)) {
> bdev_is_zoned should be probably stubbed out for !CONFIG_BLK_DEV_ZONED
> these days.
Are you saying something like the following in the prep patch? Or
should we just remove the IS_ENABLED(CONFIG_BLK_DEV_ZONED) part in the
above if?

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 028ccc9bdf8d..124086c1a0ba 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1570,6 +1570,9 @@ static inline bool bdev_is_zoned(struct
block_device *bdev)
 {
        struct request_queue *q = bdev_get_queue(bdev);
 
+       if (!IS_ENABLED(CONFIG_BLK_DEV_ZONED))
+               return false;
+
        if (q)
                return blk_queue_is_zoned(q);
>> +/*
>> + *  ZNS related command implementation and helpers.
>> + */
> Well, that is the description of the whole file, isn't it?  I don't think
> this comment adds much value.
Stupid comment, will remove it.
>> +	/*
>> +	 * For ZBC and ZAC devices, writes into sequential zones must be aligned
>> +	 * to the device physical block size. So use this value as the logical
>> +	 * block size to avoid errors.
>> +	 */
> I do not understand the logic here, given that NVMe does not have
> conventional zones.
>
It should be :-

	/*
	 * For ZBC and ZAC devices, writes into sequential zones must be aligned
	 * to the device physical block size. So use this value as the *physical*
	 * block size to avoid errors.
	 */



^ permalink raw reply related	[flat|nested] 98+ messages in thread

* Re: [PATCH V9 5/9] nvmet: add bio get helper for different backends
  2021-01-12  7:33     ` Christoph Hellwig
@ 2021-01-13  5:03       ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 98+ messages in thread
From: Chaitanya Kulkarni @ 2021-01-13  5:03 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block, linux-nvme, sagi, Damien Le Moal

On 1/11/21 23:33, Christoph Hellwig wrote:
> I'm not a huge fan of this helper, especially as it sets an end_io
> callback only for the allocated case, which is a weird calling
> convention.
>

The patch has the right documentation, and that end_io is needed
for the passthru case. To get rid of the weirdness I can remove the
passthru case and make the end_io assignment apply to both inline and
non-inline bios.

Since this eliminates exactly identical lines of code in at least two
backends, we should try to use the helper in a non-weird way.


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH V9 6/9] nvmet: add bio init helper for different backends
  2021-01-12  7:33     ` Christoph Hellwig
@ 2021-01-13  5:04       ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 98+ messages in thread
From: Chaitanya Kulkarni @ 2021-01-13  5:04 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block, linux-nvme, sagi, Damien Le Moal

On 1/11/21 23:34, Christoph Hellwig wrote:
> Nothing NVMe specific about this.  The helper also doesn't really contain
> any logic either.
>
What is the preferable name? Just bio_init_fields()?

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH V9 8/9] nvmet: add common I/O length check helper
  2021-01-12  7:35     ` Christoph Hellwig
@ 2021-01-13  5:07       ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 98+ messages in thread
From: Chaitanya Kulkarni @ 2021-01-13  5:07 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block, linux-nvme, sagi, Damien Le Moal

On 1/11/21 23:35, Christoph Hellwig wrote:
> I can't say I like this helper.  The semantics are a little confusing,
> not helped by the name.
>
Is it because of the name of the helper? If so, can you please suggest
a name? It is the same code repeated in the three backends, and it
really bothers me to see duplicate code, but that's just me :P.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH V9 9/9] nvmet: call nvmet_bio_done() for zone append
  2021-01-12  7:36     ` Christoph Hellwig
@ 2021-01-13  5:13       ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 98+ messages in thread
From: Chaitanya Kulkarni @ 2021-01-13  5:13 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block, linux-nvme, sagi, Damien Le Moal

On 1/11/21 23:36, Christoph Hellwig wrote:
> I don't see much of a need to share such trivial functionality over
> different codebases.
>
Since there is a function which does exactly the same thing, why should
we open-code it? I didn't find a good reason to open-code it; in fact it
made it easier to search for nvmet_bio_done(), which is a logical point.

We can drop this, but please consider the previous patches, as the
current code has a lot of repetitive code. (I will remove the weirdness
and add the name fixes.)


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH V9 2/9] nvmet: add lba to sect conversion helpers
  2021-01-12  4:26   ` Chaitanya Kulkarni
@ 2021-01-18 18:19     ` Christoph Hellwig
  -1 siblings, 0 replies; 98+ messages in thread
From: Christoph Hellwig @ 2021-01-18 18:19 UTC (permalink / raw)
  To: Chaitanya Kulkarni; +Cc: linux-block, linux-nvme, hch, sagi, damien.lemoal

Applied to nvme-5.12.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH V9 3/9] nvmet: add NVM command set identifier support
  2021-01-13  4:16       ` Chaitanya Kulkarni
@ 2021-01-18 18:21         ` Christoph Hellwig
  -1 siblings, 0 replies; 98+ messages in thread
From: Christoph Hellwig @ 2021-01-18 18:21 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Christoph Hellwig, linux-block, linux-nvme, sagi, Damien Le Moal

On Wed, Jan 13, 2021 at 04:16:51AM +0000, Chaitanya Kulkarni wrote:
> We advertise the support for command sets supported in
> nvmet_init_cap() -> ctrl->cap = (1ULL << 37). This results in
> nvme_enable_ctrl() setting the ctrl->ctrl_config -> NVME_CC_CSS_NVM.
> The current code in nvmet_start_ctrl() -> nvmet_cc_css(ctrl->cc) != 0
> checks that the value is not 0, but doesn't use the macro used by the
> host. The above function does that, and also becomes a helper that we
> use in the next patch, where the cc_css value is != 0 but
> NVME_CC_CSS_CSI, with ctrl->cap set to 1ULL << 43.
> 
> With the code flow in [1], the above function is needed to make sure
> the css value matches the value set by the host using the same macro
> in nvme_enable_ctrl(), NVME_CC_CSS_NVM. Otherwise the patch looks
> incomplete, and adding a check for CSS NVM together with CSS_CSI looks
> like mixing things up to me.
> 
> Are you okay with that?

Yeah, we can probably include it in an overall multiple command sets
support patch.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH V9 4/9] nvmet: add ZBD over ZNS backend support
  2021-01-12  7:52       ` Damien Le Moal
@ 2021-01-18 18:25         ` Christoph Hellwig
  -1 siblings, 0 replies; 98+ messages in thread
From: Christoph Hellwig @ 2021-01-18 18:25 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Christoph Hellwig, Chaitanya Kulkarni, linux-block, linux-nvme, sagi

On Tue, Jan 12, 2021 at 07:52:27AM +0000, Damien Le Moal wrote:
> > 
> > I do not understand the logic here, given that NVMe does not have
> > conventional zones.
> 
> 512e SAS & SATA SMR drives (512B logical, 4K physical) are a big thing, and for
> these, all writes in sequential zones must be 4K aligned. So I suggested to
> Chaitanya to simply use the physical block size as the LBA size for the target
> to avoid weird IO errors that would not make sense in ZNS/NVMe world (e.g. 512B
> aligned write requests failing).

But in NVMe the physical block size exposes the atomic write unit, which
could be way too large.  If we want to do this cleanly we need to expose
a minimum sequential zone write alignment value in the block layer.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH V9 4/9] nvmet: add ZBD over ZNS backend support
  2021-01-13  4:57       ` Chaitanya Kulkarni
@ 2021-01-18 18:27         ` Christoph Hellwig
  -1 siblings, 0 replies; 98+ messages in thread
From: Christoph Hellwig @ 2021-01-18 18:27 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Christoph Hellwig, linux-block, linux-nvme, sagi, Damien Le Moal

On Wed, Jan 13, 2021 at 04:57:15AM +0000, Chaitanya Kulkarni wrote:
> >>  	/* command sets supported: NVMe command set: */
> >>  	ctrl->cap = (1ULL << 37);
> >> +	if (IS_ENABLED(CONFIG_BLK_DEV_ZONED))
> >> +		ctrl->cap |= (1ULL << 43);
> >>  	/* CC.EN timeout in 500msec units: */
> >>  	ctrl->cap |= (15ULL << 24);
> >>  	/* maximum queue entries supported: */
> > This needs to go into a separate patch for multiple command set support.
> > We can probably merge the CAP and CC bits with the CSI support, though.
> Do you mean the previous patch?

Yes.

> but we don't add handlers for the non-default I/O
> command set until this patch.

No, bit 43 just means the TP to support multiple command sets is supported.
That infrastructure can be used even by controllers only supporting the
NVM command set.

> > bdev_is_zoned should be probably stubbed out for !CONFIG_BLK_DEV_ZONED
> > these days.
> Are you saying something like the following in the prep patch? Or
> should I just remove the IS_ENABLED(CONFIG_BLK_DEV_ZONED) part in the
> above if?
> 
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 028ccc9bdf8d..124086c1a0ba 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1570,6 +1570,9 @@ static inline bool bdev_is_zoned(struct block_device *bdev)
>  {
>         struct request_queue *q = bdev_get_queue(bdev);
>  
> +       if (!IS_ENABLED(CONFIG_BLK_DEV_ZONED))
> +               return false;
> +
>         if (q)
>                 return blk_queue_is_zoned(q);

blk_queue_is_zoned calls blk_queue_zoned_model, which is stubbed out
already.  So no extra work should be required.

> 	/*
> 	 * For ZBC and ZAC devices, writes into sequential zones must be aligned
> 	 * to the device physical block size. So use this value as the *physical*
> 	 * block size to avoid errors.
> 	 */

See my reply to Damien.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH V9 5/9] nvmet: add bio get helper for different backends
  2021-01-13  5:03       ` Chaitanya Kulkarni
@ 2021-01-18 18:28         ` Christoph Hellwig
  -1 siblings, 0 replies; 98+ messages in thread
From: Christoph Hellwig @ 2021-01-18 18:28 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Christoph Hellwig, linux-block, linux-nvme, sagi, Damien Le Moal

On Wed, Jan 13, 2021 at 05:03:05AM +0000, Chaitanya Kulkarni wrote:
> On 1/11/21 23:33, Christoph Hellwig wrote:
> > I'm not a huge fan of this helper, especially as it sets an end_io
> > callback only for the allocated case, which is a weird calling
> > convention.
> >
> 
> The patch has the right documentation, and that end_io is needed
> for the passthru case. To get rid of the weirdness I can remove the
> passthru case and make the end_io assignment apply to both inline and
> non-inline bios.
> 
> Since this eliminates exactly identical lines of code in at least two
> backends, we should try to use the helper in a non-weird way.

I really do not like helpers that just eliminate "duplicate lines" vs.
encapsulating logic that makes sense on its own.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH V9 6/9] nvmet: add bio init helper for different backends
  2021-01-13  5:04       ` Chaitanya Kulkarni
@ 2021-01-18 18:33         ` Christoph Hellwig
  -1 siblings, 0 replies; 98+ messages in thread
From: Christoph Hellwig @ 2021-01-18 18:33 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Christoph Hellwig, linux-block, linux-nvme, sagi, Damien Le Moal

On Wed, Jan 13, 2021 at 05:04:49AM +0000, Chaitanya Kulkarni wrote:
> On 1/11/21 23:34, Christoph Hellwig wrote:
> > Nothing NVMe specific about this.  The helper also doesn't really contain
> > any logic either.
> >
> What is the preferable name? Just bio_init_fields()?

We'll eventually need to replace bio_alloc with something initializing
the additional fields.  For now I'd just skip the helper.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH V9 8/9] nvmet: add common I/O length check helper
  2021-01-13  5:07       ` Chaitanya Kulkarni
@ 2021-01-18 18:34         ` Christoph Hellwig
  -1 siblings, 0 replies; 98+ messages in thread
From: Christoph Hellwig @ 2021-01-18 18:34 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Christoph Hellwig, linux-block, linux-nvme, sagi, Damien Le Moal

On Wed, Jan 13, 2021 at 05:07:16AM +0000, Chaitanya Kulkarni wrote:
> On 1/11/21 23:35, Christoph Hellwig wrote:
> > I can't say I like this helper.  The semantics are a little confusing,
> > not helped by the name.
> >
> Is it because of the name of the helper? If so, can you please suggest
> a name? It is the same code repeated in the three backends, and it
> really bothers me to see duplicate code, but that's just me :P.

Also because it does two rather unrelated things.  Or in other words:
I don't think the helper helps the readability of the code.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH V9 4/9] nvmet: add ZBD over ZNS backend support
  2021-01-18 18:25         ` Christoph Hellwig
@ 2021-01-19  0:02           ` Damien Le Moal
  -1 siblings, 0 replies; 98+ messages in thread
From: Damien Le Moal @ 2021-01-19  0:02 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chaitanya Kulkarni, linux-block, linux-nvme, sagi

On 2021/01/19 3:25, Christoph Hellwig wrote:
> On Tue, Jan 12, 2021 at 07:52:27AM +0000, Damien Le Moal wrote:
>>>
>>> I do not understand the logic here, given that NVMe does not have
>>> conventional zones.
>>
>> 512e SAS & SATA SMR drives (512B logical, 4K physical) are a big thing, and for
>> these, all writes in sequential zones must be 4K aligned. So I suggested to
>> Chaitanya to simply use the physical block size as the LBA size for the target
>> to avoid weird IO errors that would not make sense in ZNS/NVMe world (e.g. 512B
>> aligned write requests failing).
> 
> But in NVMe the physical block size exposes the atomic write unit, which
> could be way too large.  If we want to do this cleanly we need to expose
> a minimum sequential zone write alignment value in the block layer.
> 

OK, good point. I think I can cook something like that today.

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH V9 4/9] nvmet: add ZBD over ZNS backend support
  2021-01-18 18:25         ` Christoph Hellwig
@ 2021-01-19  4:28           ` Damien Le Moal
  -1 siblings, 0 replies; 98+ messages in thread
From: Damien Le Moal @ 2021-01-19  4:28 UTC (permalink / raw)
  To: hch; +Cc: Chaitanya Kulkarni, sagi, linux-block, linux-nvme

On Mon, 2021-01-18 at 19:25 +0100, Christoph Hellwig wrote:
> On Tue, Jan 12, 2021 at 07:52:27AM +0000, Damien Le Moal wrote:
> > > 
> > > I do not understand the logic here, given that NVMe does not have
> > > conventional zones.
> > 
> > 512e SAS & SATA SMR drives (512B logical, 4K physical) are a big thing, and for
> > these, all writes in sequential zones must be 4K aligned. So I suggested to
> > Chaitanya to simply use the physical block size as the LBA size for the target
> > to avoid weird IO errors that would not make sense in ZNS/NVMe world (e.g. 512B
> > aligned write requests failing).
> 
> But in NVMe the physical block size exposes the atomic write unit, which
> could be way too large.  If we want to do this cleanly we need to expose
> a minimum sequential zone write alignment value in the block layer.

What about something like this below to add to Chaitanya's series?
This adds a queue limit, zone_write_granularity, which is set to the
physical block size for SCSI, and for NVMe/ZNS too, since there the
physical block size is limited to the atomic write unit. Lightly tested
with both 512e and 4Kn SMR drives.


diff --git a/block/blk-settings.c b/block/blk-settings.c
index 43990b1d148b..d6c2677a38df 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -60,6 +60,7 @@ void blk_set_default_limits(struct queue_limits *lim)
 	lim->io_opt = 0;
 	lim->misaligned = 0;
 	lim->zoned = BLK_ZONED_NONE;
+	lim->zone_write_granularity = 0;
 }
 EXPORT_SYMBOL(blk_set_default_limits);
 
@@ -341,6 +342,14 @@ void blk_queue_logical_block_size(struct request_queue *q, unsigned int size)
 		round_down(limits->max_hw_sectors, size >> SECTOR_SHIFT);
 	limits->max_sectors =
 		round_down(limits->max_sectors, size >> SECTOR_SHIFT);
+
+	if (blk_queue_is_zoned(q)) {
+		if (limits->zone_write_granularity < limits->logical_block_size)
+			limits->zone_write_granularity =
+				limits->logical_block_size;
+		if (q->limits.zone_write_granularity < q->limits.io_min)
+			q->limits.zone_write_granularity = q->limits.io_min;
+	}
 }
 EXPORT_SYMBOL(blk_queue_logical_block_size);
 
@@ -361,11 +370,39 @@ void blk_queue_physical_block_size(struct request_queue *q, unsigned int size)
 	if (q->limits.physical_block_size < q->limits.logical_block_size)
 		q->limits.physical_block_size = q->limits.logical_block_size;
 
-	if (q->limits.io_min < q->limits.physical_block_size)
+	if (q->limits.io_min < q->limits.physical_block_size) {
 		q->limits.io_min = q->limits.physical_block_size;
+		if (blk_queue_is_zoned(q)
+		    && q->limits.zone_write_granularity < q->limits.io_min)
+			q->limits.zone_write_granularity = q->limits.io_min;
+	}
 }
 EXPORT_SYMBOL(blk_queue_physical_block_size);
 
+/**
+ * blk_queue_zone_write_granularity - set zone write granularity for the queue
+ * @q:  the request queue for the zoned device
+ * @size:  the zone write granularity size, in bytes
+ *
+ * Description:
+ *   This should be set to the lowest possible size allowing to write in
+ *   sequential zones of a zoned block device.
+ */
+void blk_queue_zone_write_granularity(struct request_queue *q, unsigned int size)
+{
+	if (WARN_ON(!blk_queue_is_zoned(q)))
+		return;
+
+	q->limits.zone_write_granularity = size;
+
+	if (q->limits.zone_write_granularity < q->limits.logical_block_size)
+		q->limits.zone_write_granularity = q->limits.logical_block_size;
+
+	if (q->limits.zone_write_granularity < q->limits.io_min)
+		q->limits.zone_write_granularity = q->limits.io_min;
+}
+EXPORT_SYMBOL(blk_queue_zone_write_granularity);
+
 /**
  * blk_queue_alignment_offset - set physical block alignment offset
  * @q:	the request queue for the device
@@ -631,6 +668,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 			t->discard_granularity;
 	}
 
+	t->zone_write_granularity = max(t->zone_write_granularity,
+					b->zone_write_granularity);
 	t->zoned = max(t->zoned, b->zoned);
 	return ret;
 }
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index b513f1683af0..7ea3dd4d876b 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -219,6 +219,11 @@ static ssize_t queue_write_zeroes_max_show(struct request_queue *q, char *page)
 		(unsigned long long)q->limits.max_write_zeroes_sectors << 9);
 }
 
+static ssize_t queue_zone_write_granularity_show(struct request_queue *q, char *page)
+{
+	return queue_var_show(q->limits.zone_write_granularity, page);
+}
+
 static ssize_t queue_zone_append_max_show(struct request_queue *q, char *page)
 {
 	unsigned long long max_sectors = q->limits.max_zone_append_sectors;
@@ -585,6 +590,7 @@ QUEUE_RO_ENTRY(queue_discard_zeroes_data, "discard_zeroes_data");
 QUEUE_RO_ENTRY(queue_write_same_max, "write_same_max_bytes");
 QUEUE_RO_ENTRY(queue_write_zeroes_max, "write_zeroes_max_bytes");
 QUEUE_RO_ENTRY(queue_zone_append_max, "zone_append_max_bytes");
+QUEUE_RO_ENTRY(queue_zone_write_granularity, "zone_write_granularity");
 
 QUEUE_RO_ENTRY(queue_zoned, "zoned");
 QUEUE_RO_ENTRY(queue_nr_zones, "nr_zones");
@@ -639,6 +645,7 @@ static struct attribute *queue_attrs[] = {
 	&queue_write_same_max_entry.attr,
 	&queue_write_zeroes_max_entry.attr,
 	&queue_zone_append_max_entry.attr,
+	&queue_zone_write_granularity_entry.attr,
 	&queue_nonrot_entry.attr,
 	&queue_zoned_entry.attr,
 	&queue_nr_zones_entry.attr,
diff --git a/drivers/nvme/host/zns.c b/drivers/nvme/host/zns.c
index 1dfe9a3500e3..def76ac88248 100644
--- a/drivers/nvme/host/zns.c
+++ b/drivers/nvme/host/zns.c
@@ -113,6 +113,13 @@ int nvme_update_zone_info(struct nvme_ns *ns, unsigned lbaf)
 	blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q);
 	blk_queue_max_open_zones(q, le32_to_cpu(id->mor) + 1);
 	blk_queue_max_active_zones(q, le32_to_cpu(id->mar) + 1);
+
+	/*
+	 * The physical block size is limited to the Atomic Write Unit Power
+	 * Fail parameter. Use this value as the zone write granularity as it
+	 * may be different from the logical block size.
+	 */
+	blk_queue_zone_write_granularity(q, q->limits.physical_block_size);
 free_data:
 	kfree(id);
 	return status;
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index cf07b7f93579..41d602f7e62e 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -789,6 +789,16 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf)
 	blk_queue_max_active_zones(q, 0);
 	nr_zones = round_up(sdkp->capacity, zone_blocks) >> ilog2(zone_blocks);
 
+	/*
+	 * Per ZBC and ZAC specifications, writes in sequential write required
+	 * zones of host-managed devices must be aligned to the device physical
+	 * block size.
+	 */
+	if (blk_queue_zoned_model(q) == BLK_ZONED_HM)
+		blk_queue_zone_write_granularity(q, sdkp->physical_block_size);
+	else
+		blk_queue_zone_write_granularity(q, sdkp->device->sector_size);
+
 	/* READ16/WRITE16 is mandatory for ZBC disks */
 	sdkp->device->use_16_for_rw = 1;
 	sdkp->device->use_10_for_rw = 0;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f94ee3089e01..4b4df2644882 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -337,6 +337,7 @@ struct queue_limits {
 	unsigned int		max_zone_append_sectors;
 	unsigned int		discard_granularity;
 	unsigned int		discard_alignment;
+	unsigned int		zone_write_granularity;
 
 	unsigned short		max_segments;
 	unsigned short		max_integrity_segments;
@@ -1161,6 +1162,7 @@ extern void blk_queue_logical_block_size(struct request_queue *, unsigned int);
 extern void blk_queue_max_zone_append_sectors(struct request_queue *q,
 		unsigned int max_zone_append_sectors);
 extern void blk_queue_physical_block_size(struct request_queue *, unsigned int);
+void blk_queue_zone_write_granularity(struct request_queue *, unsigned int);
 extern void blk_queue_alignment_offset(struct request_queue *q,
 				       unsigned int alignment);
 void blk_queue_update_readahead(struct request_queue *q);



-- 
Damien Le Moal
Western Digital

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* Re: [PATCH V9 5/9] nvmet: add bio get helper for different backends
  2021-01-18 18:28         ` Christoph Hellwig
@ 2021-01-19  4:57           ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 98+ messages in thread
From: Chaitanya Kulkarni @ 2021-01-19  4:57 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block, linux-nvme, sagi, Damien Le Moal

On 1/18/21 10:28 AM, Christoph Hellwig wrote:
> On Wed, Jan 13, 2021 at 05:03:05AM +0000, Chaitanya Kulkarni wrote:
>> On 1/11/21 23:33, Christoph Hellwig wrote:
>>> I'm not a huge fan of this helper, especially as it sets an end_io
>>> callback only for the allocated case, which is a weird calling
>>> convention.
>>>
>> The patch has the right documentation, and that end_io is needed
>> for the passthru case. To get rid of the weirdness I can drop the
>> passthru case and assign end_io for both the inline and non-inline
>> bios, so it is no longer weird.
>>
>> Since this eliminates exactly identical lines of code in at least two
>> backends, we should try to use the helper in a non-weird way.
> I really do not like helpers that just eliminate "duplicate lines" as
> opposed to encapsulating logic that makes sense on its own.
>
Okay, I'll drop it in the next version.


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH V9 4/9] nvmet: add ZBD over ZNS backend support
  2021-01-19  4:28           ` Damien Le Moal
@ 2021-01-19  6:15             ` hch
  -1 siblings, 0 replies; 98+ messages in thread
From: hch @ 2021-01-19  6:15 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: hch, Chaitanya Kulkarni, sagi, linux-block, linux-nvme

On Tue, Jan 19, 2021 at 04:28:00AM +0000, Damien Le Moal wrote:
> What about something like this below to add to Chaitanya's series?
> This adds the queue limit zone_write_granularity which is set to the physical
> block size for scsi, and for NVMe/ZNS too since that value is limited to the
> atomic block size. Lightly tested with both 512e and 4kn SMR drives.

For ZNS it should just be the logical block size; the atomic size is not
the right thing to use here.

> +	if (blk_queue_is_zoned(q)) {
> +		if (limits->zone_write_granularity < limits-
> >logical_block_size)

Overly long line here, and your mailer wrapped it as a punishment :)

That being said I don't think the normal path to set the block size
should affect it.  Just add the manual call and leave the non-zoned
path alone.

> +EXPORT_SYMBOL(blk_queue_zone_write_granularity);

EXPORT_SYMBOL_GPL for all zoned stuff.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH V9 0/9] nvmet: add ZBD backend support
  2021-01-12  4:26 ` Chaitanya Kulkarni
@ 2021-02-10 22:42   ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 98+ messages in thread
From: Chaitanya Kulkarni @ 2021-02-10 22:42 UTC (permalink / raw)
  To: hch, Damien Le Moal; +Cc: sagi, linux-block, linux-nvme

Christoph/Damien,

On 1/11/21 8:26 PM, Chaitanya Kulkarni wrote:
> Hi,
>
> NVMeOF Host is capable of handling the NVMe Protocol based Zoned Block
> Devices (ZBD) in the Zoned Namespaces (ZNS) mode with the passthru
> backend. There is no support for a generic block device backend to
> handle the ZBD devices which are not NVMe protocol compliant.
>
> This adds support to export ZBDs (which are not NVMe drives) from the
> target to the host via NVMeOF using the host side ZNS interface.
>
> The patch series is built bottom-up: it first adds prep patches and the
> ZNS command-specific handlers on top of the generic block layer and
> updates the data structures, then wires up the admin commands one by one
> in the order the host calls them in the namespace initialization
> sequence. Once everything is ready, it wires up the I/O command
> handlers. See below for the patch-series overview.
>
> All the ZoneFS test cases pass, both for a ZBD exported over NVMeOF
> backed by a null_blk ZBD and for a null_blk ZBD without NVMeOF. Test
> results are added below.
>
> Note: This patch-series is based on the earlier posted patch series :-
>
> [PATCH V2 0/4] nvmet: admin-cmd related cleanups and a fix
> http://lists.infradead.org/pipermail/linux-nvme/2021-January/021729.html
>
> -ck
>
> Changes from V8:-
>
> 1. Rebase and retest on latest nvme-5.11.
> 2. Export ctrl->cap csi support only if CONFIG_BLK_DEV_ZONED is set.
> 3. Add a fix to admin ns-desc list handler for handling default csi.
>
I can see that Damien's granularity series is in the linux-block tree. I'm
planning to send v10 of this series; given that it also has a block layer
patch [1], should I base it on linux-block/for-next or linux-nvme/nvme-5.12?


[1]  [PATCH V9 1/9] block: export bio_add_hw_pages()

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH V9 0/9] nvmet: add ZBD backend support
  2021-02-10 22:42   ` Chaitanya Kulkarni
@ 2021-02-11  7:20     ` hch
  -1 siblings, 0 replies; 98+ messages in thread
From: hch @ 2021-02-11  7:20 UTC (permalink / raw)
  To: Chaitanya Kulkarni; +Cc: hch, Damien Le Moal, sagi, linux-block, linux-nvme

On Wed, Feb 10, 2021 at 10:42:42PM +0000, Chaitanya Kulkarni wrote:
> I can see that Damien's granularity series is in the linux-block tree, I'm
> planning to send v10 of this series given that it also has a block layer
> patch
> [1] should I use the linux-block/for-next or linux-nvme/nvme-5.12 ?

I'd just wait for -rc1.

^ permalink raw reply	[flat|nested] 98+ messages in thread

end of thread, other threads:[~2021-02-11  7:21 UTC | newest]

Thread overview: 98+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-12  4:26 [PATCH V9 0/9] nvmet: add ZBD backend support Chaitanya Kulkarni
2021-01-12  4:26 ` Chaitanya Kulkarni
2021-01-12  4:26 ` [PATCH V9 1/9] block: export bio_add_hw_pages() Chaitanya Kulkarni
2021-01-12  4:26   ` Chaitanya Kulkarni
2021-01-12  5:40   ` Damien Le Moal
2021-01-12  5:40     ` Damien Le Moal
2021-01-12  7:24   ` Christoph Hellwig
2021-01-12  7:24     ` Christoph Hellwig
2021-01-13  1:20     ` Chaitanya Kulkarni
2021-01-13  1:20       ` Chaitanya Kulkarni
2021-01-12  4:26 ` [PATCH V9 2/9] nvmet: add lba to sect conversion helpers Chaitanya Kulkarni
2021-01-12  4:26   ` Chaitanya Kulkarni
2021-01-12  5:08   ` Damien Le Moal
2021-01-12  5:08     ` Damien Le Moal
2021-01-18 18:19   ` Christoph Hellwig
2021-01-18 18:19     ` Christoph Hellwig
2021-01-12  4:26 ` [PATCH V9 3/9] nvmet: add NVM command set identifier support Chaitanya Kulkarni
2021-01-12  4:26   ` Chaitanya Kulkarni
2021-01-12  7:27   ` Christoph Hellwig
2021-01-12  7:27     ` Christoph Hellwig
2021-01-13  4:16     ` Chaitanya Kulkarni
2021-01-13  4:16       ` Chaitanya Kulkarni
2021-01-18 18:21       ` Christoph Hellwig
2021-01-18 18:21         ` Christoph Hellwig
2021-01-12  4:26 ` [PATCH V9 4/9] nvmet: add ZBD over ZNS backend support Chaitanya Kulkarni
2021-01-12  4:26   ` Chaitanya Kulkarni
2021-01-12  5:32   ` Damien Le Moal
2021-01-12  5:32     ` Damien Le Moal
2021-01-12  6:11     ` Chaitanya Kulkarni
2021-01-12  6:11       ` Chaitanya Kulkarni
2021-01-12  6:31       ` Damien Le Moal
2021-01-12  6:31         ` Damien Le Moal
2021-01-12  7:48   ` Christoph Hellwig
2021-01-12  7:48     ` Christoph Hellwig
2021-01-12  7:52     ` Damien Le Moal
2021-01-12  7:52       ` Damien Le Moal
2021-01-18 18:25       ` Christoph Hellwig
2021-01-18 18:25         ` Christoph Hellwig
2021-01-19  0:02         ` Damien Le Moal
2021-01-19  0:02           ` Damien Le Moal
2021-01-19  4:28         ` Damien Le Moal
2021-01-19  4:28           ` Damien Le Moal
2021-01-19  6:15           ` hch
2021-01-19  6:15             ` hch
2021-01-13  4:57     ` Chaitanya Kulkarni
2021-01-13  4:57       ` Chaitanya Kulkarni
2021-01-18 18:27       ` Christoph Hellwig
2021-01-18 18:27         ` Christoph Hellwig
2021-01-12  4:26 ` [PATCH V9 5/9] nvmet: add bio get helper for different backends Chaitanya Kulkarni
2021-01-12  4:26   ` Chaitanya Kulkarni
2021-01-12  5:37   ` Damien Le Moal
2021-01-12  5:37     ` Damien Le Moal
2021-01-12  5:55     ` Chaitanya Kulkarni
2021-01-12  5:55       ` Chaitanya Kulkarni
2021-01-12  7:33   ` Christoph Hellwig
2021-01-12  7:33     ` Christoph Hellwig
2021-01-13  5:03     ` Chaitanya Kulkarni
2021-01-13  5:03       ` Chaitanya Kulkarni
2021-01-18 18:28       ` Christoph Hellwig
2021-01-18 18:28         ` Christoph Hellwig
2021-01-19  4:57         ` Chaitanya Kulkarni
2021-01-19  4:57           ` Chaitanya Kulkarni
2021-01-12  4:26 ` [PATCH V9 6/9] nvmet: add bio init " Chaitanya Kulkarni
2021-01-12  4:26   ` Chaitanya Kulkarni
2021-01-12  5:40   ` Damien Le Moal
2021-01-12  5:40     ` Damien Le Moal
2021-01-12  5:57     ` Chaitanya Kulkarni
2021-01-12  5:57       ` Chaitanya Kulkarni
2021-01-12  6:27       ` Damien Le Moal
2021-01-12  6:27         ` Damien Le Moal
2021-01-12  7:33   ` Christoph Hellwig
2021-01-12  7:33     ` Christoph Hellwig
2021-01-13  5:04     ` Chaitanya Kulkarni
2021-01-13  5:04       ` Chaitanya Kulkarni
2021-01-18 18:33       ` Christoph Hellwig
2021-01-18 18:33         ` Christoph Hellwig
2021-01-12  4:26 ` [PATCH V9 7/9] nvmet: add bio put " Chaitanya Kulkarni
2021-01-12  4:26   ` Chaitanya Kulkarni
2021-01-12  4:26 ` [PATCH V9 8/9] nvmet: add common I/O length check helper Chaitanya Kulkarni
2021-01-12  4:26   ` Chaitanya Kulkarni
2021-01-12  7:35   ` Christoph Hellwig
2021-01-12  7:35     ` Christoph Hellwig
2021-01-13  5:07     ` Chaitanya Kulkarni
2021-01-13  5:07       ` Chaitanya Kulkarni
2021-01-18 18:34       ` Christoph Hellwig
2021-01-18 18:34         ` Christoph Hellwig
2021-01-12  4:26 ` [PATCH V9 9/9] nvmet: call nvmet_bio_done() for zone append Chaitanya Kulkarni
2021-01-12  4:26   ` Chaitanya Kulkarni
2021-01-12  7:36   ` Christoph Hellwig
2021-01-12  7:36     ` Christoph Hellwig
2021-01-13  5:13     ` Chaitanya Kulkarni
2021-01-13  5:13       ` Chaitanya Kulkarni
2021-01-12  6:12 ` [PATCH V9 0/9] nvmet: add ZBD backend support Chaitanya Kulkarni
2021-01-12  6:12   ` Chaitanya Kulkarni
2021-02-10 22:42 ` Chaitanya Kulkarni
2021-02-10 22:42   ` Chaitanya Kulkarni
2021-02-11  7:20   ` hch
2021-02-11  7:20     ` hch
