* [PATCH V2 0/6] block: add support for REQ_OP_VERIFY
@ 2022-07-13  7:20 Chaitanya Kulkarni
  2022-07-13  7:20 ` [PATCH V2 1/6] " Chaitanya Kulkarni
                   ` (6 more replies)
  0 siblings, 7 replies; 9+ messages in thread
From: Chaitanya Kulkarni @ 2022-07-13  7:20 UTC (permalink / raw)
  To: linux-block, linux-scsi, linux-nvme, linux-xfs, linux-fsdevel
  Cc: axboe, agk, song, djwong, kbusch, hch, sagi, jejb,
	martin.petersen, viro, javier, johannes.thumshirn, bvanassche,
	dongli.zhang, ming.lei, willy, jefflexu, josef, clm, dsterba,
	jack, tytso, adilger.kernel, jlayton, idryomov, danil.kipnis,
	ebiggers, jinpu.wang, Chaitanya Kulkarni


Hi,

One of the responsibilities of the operating system, along with managing
resources, is to provide a unified interface to the user by creating
hardware abstractions. In the Linux kernel storage stack that
abstraction is created by implementing generic request operations such
as REQ_OP_READ/REQ_OP_WRITE or REQ_OP_DISCARD/REQ_OP_WRITE_ZEROES,
which are then mapped to protocol-specific low-level hardware commands,
e.g. SCSI or NVMe.

With that in mind, this patch series implements a new block layer
operation to offload data verification onto the controller if supported,
or to emulate the operation if not. The main advantage is freeing up the
host CPU and reducing host link traffic: for some devices the internal
bandwidth is higher than the host link, so offloading this operation can
improve the performance of proactive error-detection applications such
as file-system-level scrubbing.

* Background *
-----------------------------------------------------------------------

The NVMe specification provides a controller-level Verify command [1],
similar to the ATA Verify command [2], where the controller is
responsible for data verification without transferring the data to the
host (offloading LBA verification). This is designed to proactively
discover data corruption while the device is idle, so that applications
can protect sensitive data and take corrective action instead of waiting
for a failure to occur.

The NVMe Verify command was added to provide low-level media scrubbing
and, possibly, relocation of data that shows correctable media
degradation. It also provides a way to enhance file-system-level
scrubbing/checksum verification and optionally offload this
CPU-intensive task to the kernel (when emulated), over the fabric, and
to the controller (when supported).

This is useful when the controller's internal bandwidth is higher than
the host's bandwidth, where it shows a sharp increase in performance due
to _no host traffic or host CPU involvement_.

* Implementation *
-----------------------------------------------------------------------

Right now there is no generic interface that in-kernel components such
as file systems, or userspace applications, can use (except passthru
commands or some combination of write/read/compare) to issue a verify
command through a central block layer API. This can lead to each
userspace application having a protocol-specific IOCTL, which defeats
the purpose of having the OS provide a H/W abstraction.

This patch series introduces a new payloadless block layer request
operation, REQ_OP_VERIFY, that allows in-kernel components and userspace
applications to verify a range of LBAs by offloading checksum
scrubbing/verification to the controller attached to the host. For
directly attached devices this decreases host DMA traffic and CPU usage,
and for fabrics-attached devices it decreases network traffic and CPU
usage on both host and target.
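
As an illustration, a minimal userspace sketch of driving the new
BLKVERIFY ioctl (added later in this series) could look as follows; the
device path and byte range are made up and error handling is trimmed:

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/fs.h>	/* BLKVERIFY, once this series is applied */

    int main(void)
    {
            /* byte offset and byte length, both 512-byte aligned */
            uint64_t range[2] = { 0, 1ULL << 30 };
            int fd = open("/dev/nvme0n1", O_RDONLY); /* FMODE_READ is enough */

            if (fd < 0 || ioctl(fd, BLKVERIFY, range) < 0) {
                    perror("BLKVERIFY");
                    return 1;
            }
            close(fd);
            return 0;
    }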

* Scope *
-----------------------------------------------------------------------

Please note this only covers the operating system level overhead.
Analyzing controller verify command performance for common protocols
(SCSI/NVMe) is out of scope for REQ_OP_VERIFY.

* Micro Benchmarks *
-----------------------------------------------------------------------

When verifying 500GB of data over NVMeOF with nvme-loop and null_blk as
the target backend block device, the results show roughly an 83%
reduction in elapsed time (about 6x faster) :-

With Verify resulting in REQ_OP_VERIFY to null_blk :-

real	2m3.773s
user	0m0.000s
sys	0m59.553s

With Emulation resulting in REQ_OP_READ to null_blk :-

real	12m18.964s
user	0m0.002s
sys	1m15.666s

Any comments are welcome.

Below is a summary of the test log :-

1. blkverify command on nvme-pcie :-
------------------------------------
[   22.802798] nvme nvme0: pci function 0000:00:04.0
[   22.846145] nvme nvme0: 48/0/0 default/read/poll queues
[   22.857822] blkdev_issue_verify 490
*[   22.857827] __blkdev_issue_verify 419*
*[   22.857828] __blkdev_issue_verify 452*
*[   22.857911] __blkdev_issue_verify 466*
[   22.857922] nvme_setup_verify 844
[   22.858287] blkdev_issue_verify 502
modprobe: FATAL: Module nvme is in use.

2. blkverify command on null_blk verify=0 :-
--------------------------------------------
Observed the emulation from the block layer :-

[   24.696254] blkdev_issue_verify 490
[   24.696259] __blkdev_issue_verify 419
[   24.696263] __blkdev_issue_verify 429
*[   24.696264] __blkdev_emulate_verify 366*
*[   24.696265] __blkdev_emulate_verify 368*
[   24.696334] blkdev_issue_verify 502

3. blkverify command on null_blk verify=1 :-
--------------------------------------------
Observed REQ_OP_VERIFY from the block layer :-
[   26.396652] blkdev_issue_verify 490
*[   26.396659] __blkdev_issue_verify 419*
*[   26.396662] __blkdev_issue_verify 452*
*[   26.396669] __blkdev_issue_verify 466*
[   26.396702] null_blk: null_process_cmd 1406 kworker/0:1H
[   26.396740] blkdev_issue_verify 502

4. blkverify command on NVMeOF block device backend null_blk verify=0 :-
Observed REQ_OP_VERIFY on the host side since the target advertises NVMe
Verify support. Observed the emulation from the block layer on the target :-
[   31.520548] blkdev_issue_verify 490
*[   31.520553] __blkdev_issue_verify 419*
*[   31.520554] __blkdev_issue_verify 452*
*[   31.520885] __blkdev_issue_verify 466*
[   31.520976] nvme_setup_verify 844
[   31.520982] nvmet: nvmet_bdev_submit_emulate_verify 469
[   31.520984] blkdev_issue_verify 490
[   31.520985] __blkdev_issue_verify 419
[   31.520989] __blkdev_issue_verify 429
*[   31.520990] __blkdev_emulate_verify 366*
*[   31.520990] __blkdev_emulate_verify 368*
*[   31.521088] blkdev_issue_verify 502*
*[   31.521097] blkdev_issue_verify 502*
[   31.534798] nvme nvme1: Removing ctrl: NQN "testnqn"

5. blkverify command on NVMeOF block device backend null_blk verify=1 :-
Observed REQ_OP_VERIFY from both the host and target block layers :-
[   54.399880] blkdev_issue_verify 490
[   54.399885] __blkdev_issue_verify 419
[   54.399887] __blkdev_issue_verify 452
[   54.399962] __blkdev_issue_verify 466
[   54.400038] nvme_setup_verify 844
[   54.400044] nvmet: nvmet_bdev_execute_verify 497
*[   54.400045] __blkdev_issue_verify 419*
*[   54.400046] __blkdev_issue_verify 452*
*[   54.400048] __blkdev_issue_verify 466*
[   54.400053] null_blk: null_process_cmd 1406 kworker/20:1
[   54.400062] blkdev_issue_verify 502
[   54.405139] nvme nvme1: Removing ctrl: NQN "testnqn"

6. blkverify command on scsi_debug drive :-
Observed REQ_OP_VERIFY mapped onto SCSI VERIFY(16) :-
[   61.727782] sd 2:0:0:0: Attached scsi generic sg3 type 0
[   61.727853] sd 2:0:0:0: Power-on or device reset occurred
[   61.729965] sd 2:0:0:0: [sdc] 8388608 512-byte logical blocks: (4.29 GB/4.00 GiB)
[   61.730992] sd 2:0:0:0: [sdc] Write Protect is off
[   61.730996] sd 2:0:0:0: [sdc] Mode Sense: 73 00 10 08
[   61.733141] sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA
[   61.737303] sd 2:0:0:0: [sdc] VERIFY16 supported
[   61.737307] sd 2:0:0:0: [sdc] Preferred minimum I/O size 512 bytes
[   61.737309] sd 2:0:0:0: [sdc] Optimal transfer size 524288 bytes
[   61.755811] sd 2:0:0:0: [sdc] VERIFY16 supported
[   61.757983] sd 2:0:0:0: [sdc] Attached SCSI disk
[   61.759689] blkdev_issue_verify 490
*[   61.759693] __blkdev_issue_verify 419*
*[   61.759695] __blkdev_issue_verify 452*
*[   61.759770] __blkdev_issue_verify 466*
[   61.759784] sd_setup_verify_cmnd 1101
[   61.759785] sd_setup_verify16_cmnd 1063
[   61.760800] blkdev_issue_verify 502

-ck

Changes from V1:-

1. Don't use kzalloc for buffer allocation. (Darrick)
2. Use the NVMe controller's VSL (Verify Size Limit) to set the verify
   max sectors limit for the block layer queue. (Keith, Christoph)
3. Remove the word "we" from commit messages and point to the right
   kernel subsystem. (Christoph).
4. Add complete original cover-letter.
5. Add SCSI REQ_OP_VERIFY patch with Damien's comments addressed.
6. Remove the patch for the NVMeOF file-ns.

References:-

[1] NVMe Verify :-

For pro-actively avoiding unrecoverable read errors, NVMe 1.4 adds
Verify and Get LBA Status commands. The Verify command is simple: it
does everything a normal read command does, except for returning the
data to the host system. If a read command would return an error, a
verify command will return the same error. If a read command would be
successful, a verify command will be as well. This makes it possible to
do a low-level scrub of the stored data without being bottlenecked by
the host interface bandwidth. Some SSDs will react to a fixable ECC
error by moving or re-writing degraded data, and a verify command
should trigger the same behavior. Overall, this should reduce the need
for filesystem-level checksum scrubbing/verification. Each Verify
command is tagged with a bit indicating whether the SSD should fail
fast or try hard to recover data, similar to, but overriding, the
drive's Read Recovery Level setting.

[2]

http://t13.org/Documents/UploadedDocuments/docs2017/di529...

Chaitanya Kulkarni (6):
  block: add support for REQ_OP_VERIFY
  nvme: add support for the Verify command
  nvmet: add Verify command support for bdev-ns
  nvmet: add Verify emulation support for bdev-ns
  null_blk: add REQ_OP_VERIFY support
  scsi: sd: add support for REQ_OP_VERIFY

 Documentation/ABI/stable/sysfs-block |  12 +++
 block/blk-core.c                     |   5 +
 block/blk-lib.c                      | 155 +++++++++++++++++++++++++++
 block/blk-merge.c                    |  18 ++++
 block/blk-settings.c                 |  17 +++
 block/blk-sysfs.c                    |   8 ++
 block/blk.h                          |   7 ++
 block/ioctl.c                        |  35 ++++++
 drivers/block/null_blk/main.c        |  20 +++-
 drivers/block/null_blk/null_blk.h    |   1 +
 drivers/nvme/host/core.c             |  31 ++++++
 drivers/nvme/host/nvme.h             |   1 +
 drivers/nvme/target/admin-cmd.c      |   3 +-
 drivers/nvme/target/io-cmd-bdev.c    |  66 ++++++++++++
 drivers/scsi/sd.c                    | 124 +++++++++++++++++++++
 drivers/scsi/sd.h                    |   5 +
 include/linux/bio.h                  |   9 +-
 include/linux/blk_types.h            |   2 +
 include/linux/blkdev.h               |  19 ++++
 include/linux/nvme.h                 |  19 ++++
 include/uapi/linux/fs.h              |   1 +
 21 files changed, 553 insertions(+), 5 deletions(-)

linux-block (for-next) # 
linux-block (for-next) # sh verify-test.sh 
nvme-pcie
[   22.802798] nvme nvme0: pci function 0000:00:04.0
[   22.846145] nvme nvme0: 48/0/0 default/read/poll queues
[   22.849666] nvme nvme0: Ignoring bogus Namespace Identifiers
[   22.857822] blkdev_issue_verify 490
[   22.857827] __blkdev_issue_verify 419
[   22.857828] __blkdev_issue_verify 452
[   22.857911] __blkdev_issue_verify 466
[   22.857922] nvme_setup_verify 844
[   22.858287] blkdev_issue_verify 502
modprobe: FATAL: Module nvme is in use.

null_blk verify=0
[   24.696254] blkdev_issue_verify 490
[   24.696259] __blkdev_issue_verify 419
[   24.696263] __blkdev_issue_verify 429
[   24.696264] __blkdev_emulate_verify 366
[   24.696265] __blkdev_emulate_verify 368
[   24.696334] blkdev_issue_verify 502

null_blk verify=1
[   26.396652] blkdev_issue_verify 490
[   26.396659] __blkdev_issue_verify 419
[   26.396662] __blkdev_issue_verify 452
[   26.396669] __blkdev_issue_verify 466
[   26.396702] null_blk: null_process_cmd 1406 kworker/0:1H
[   26.396740] blkdev_issue_verify 502

bdev-ns null_blk verify=0
++ FILE=/dev/nvme0n1
++ NN=1
++ NQN=testnqn
++ let NR_DEVICES=NN+1
++ modprobe -r null_blk
++ modprobe null_blk nr_devices=0 verify=0
++ modprobe nvme
++ modprobe nvme-fabrics
++ modprobe nvmet
++ modprobe nvme-loop
++ dmesg -c
++ sleep 2
++ tree /sys/kernel/config
/sys/kernel/config
├── nullb
│   └── features
└── nvmet
    ├── hosts
    ├── ports
    └── subsystems

5 directories, 1 file
++ mkdir /sys/kernel/config/nvmet/subsystems/testnqn
++ mkdir /sys/kernel/config/nvmet/ports/1/
++ echo -n loop
++ echo -n 1
++ ln -s /sys/kernel/config/nvmet/subsystems/testnqn /sys/kernel/config/nvmet/ports/1/subsystems/
++ sleep 1
++ echo transport=loop,nqn=testnqn
+++ shuf -i 1-1 -n 1
++ for i in `shuf -i  1-$NN -n $NN`
++ mkdir config/nullb/nullb1
++ echo 4096
++ echo 512000
++ echo 1
+++ cat config/nullb/nullb1/index
++ IDX=0
++ mkdir /sys/kernel/config/nvmet/subsystems/testnqn/namespaces/1
++ let IDX=IDX+1
++ echo ' ####### /dev/nullb1'
 ####### /dev/nullb1
++ echo -n /dev/nullb1
++ cat /sys/kernel/config/nvmet/subsystems/testnqn/namespaces/1/device_path
/dev/nullb1
++ echo 1
++ dmesg -c
[   30.489780] nvmet: creating nvm controller 1 for subsystem testnqn for NQN nqn.2014-08.org.nvmexpress:uuid:2ee37606-f9d7-4925-8a61-784320913d7b.
[   30.489918] nvme nvme1: creating 48 I/O queues.
[   30.495425] nvme nvme1: new ctrl: "testnqn"
[   30.500883] null_blk: disk nullb1 created
[   30.503497] nvmet: adding nsid 1 to subsystem testnqn
[   30.505313] nvme nvme1: rescanning namespaces.
++ sleep 1
++ mount
++ column -t
++ grep nvme
++ '[' 1 ']'
+++ wc -l
+++ ls -l /dev/nvme1 /dev/nvme1n1
++ cnt=2
++ echo 2
2
++ '[' 2 -gt 1 ']'
++ break
++ dmesg -c
+ nvme disconnect -n testnqn
NQN:testnqn disconnected 1 controller(s)

real	0m0.362s
user	0m0.000s
sys	0m0.009s
++ shuf -i 1-1 -n 1
+ for i in `shuf -i  1-$NN -n $NN`
+ echo 0
+ rmdir /sys/kernel/config/nvmet/subsystems/testnqn/namespaces/1
+ rmdir config/nullb/nullb1
+ sleep 2
+ rm -fr /sys/kernel/config/nvmet/ports/1/subsystems/testnqn
+ sleep 1
+ rmdir /sys/kernel/config/nvmet/ports/1
+ rmdir /sys/kernel/config/nvmet/subsystems/testnqn
+ sleep 1
+ modprobe -r nvme_loop
+ modprobe -r nvme_fabrics
+ modprobe -r nvmet
+ modprobe -r nvme
+ umount /mnt/nvme0n1
umount: /mnt/nvme0n1: no mount point specified.
+ umount /mnt/backend
umount: /mnt/backend: not mounted.
+ modprobe -r null_blk
+ tree /sys/kernel/config
/sys/kernel/config

0 directories, 0 files
[   31.520548] blkdev_issue_verify 490
[   31.520553] __blkdev_issue_verify 419
[   31.520554] __blkdev_issue_verify 452
[   31.520885] __blkdev_issue_verify 466
[   31.520976] nvme_setup_verify 844
[   31.520982] nvmet: nvmet_bdev_submit_emulate_verify 469
[   31.520984] blkdev_issue_verify 490
[   31.520985] __blkdev_issue_verify 419
[   31.520989] __blkdev_issue_verify 429
[   31.520990] __blkdev_emulate_verify 366
[   31.520990] __blkdev_emulate_verify 368
[   31.521088] blkdev_issue_verify 502
[   31.521097] blkdev_issue_verify 502
[   31.534798] nvme nvme1: Removing ctrl: NQN "testnqn"

bdev-ns null_blk verify=1
++ FILE=/dev/nvme0n1
++ NN=1
++ NQN=testnqn
++ let NR_DEVICES=NN+1
++ modprobe -r null_blk
++ modprobe null_blk nr_devices=0 verify=1
++ modprobe nvme
++ modprobe nvme-fabrics
++ modprobe nvmet
++ modprobe nvme-loop
++ dmesg -c
++ sleep 2
++ tree /sys/kernel/config
/sys/kernel/config
├── nullb
│   └── features
└── nvmet
    ├── hosts
    ├── ports
    └── subsystems

5 directories, 1 file
++ mkdir /sys/kernel/config/nvmet/subsystems/testnqn
++ mkdir /sys/kernel/config/nvmet/ports/1/
++ echo -n loop
++ echo -n 1
++ ln -s /sys/kernel/config/nvmet/subsystems/testnqn /sys/kernel/config/nvmet/ports/1/subsystems/
++ sleep 1
++ echo transport=loop,nqn=testnqn
+++ shuf -i 1-1 -n 1
++ for i in `shuf -i  1-$NN -n $NN`
++ mkdir config/nullb/nullb1
++ echo 4096
++ echo 512000
++ echo 1
+++ cat config/nullb/nullb1/index
++ IDX=0
++ mkdir /sys/kernel/config/nvmet/subsystems/testnqn/namespaces/1
++ let IDX=IDX+1
++ echo ' ####### /dev/nullb1'
 ####### /dev/nullb1
++ echo -n /dev/nullb1
++ cat /sys/kernel/config/nvmet/subsystems/testnqn/namespaces/1/device_path
/dev/nullb1
++ echo 1
++ dmesg -c
[   53.372782] nvmet: creating nvm controller 1 for subsystem testnqn for NQN nqn.2014-08.org.nvmexpress:uuid:0c78049e-e88f-4f9f-a8ff-bf6287235660.
[   53.373088] nvme nvme1: creating 48 I/O queues.
[   53.377729] nvme nvme1: new ctrl: "testnqn"
[   53.382877] null_blk: disk nullb1 created
[   53.385343] nvmet: adding nsid 1 to subsystem testnqn
[   53.387320] nvme nvme1: rescanning namespaces.
++ sleep 1
++ mount
++ column -t
++ grep nvme
++ '[' 1 ']'
+++ wc -l
+++ ls -l /dev/nvme1 /dev/nvme1n1
++ cnt=2
++ echo 2
2
++ '[' 2 -gt 1 ']'
++ break
++ dmesg -c
+ nvme disconnect -n testnqn
NQN:testnqn disconnected 1 controller(s)

real	0m0.364s
user	0m0.000s
sys	0m0.007s
++ shuf -i 1-1 -n 1
+ for i in `shuf -i  1-$NN -n $NN`
+ echo 0
+ rmdir /sys/kernel/config/nvmet/subsystems/testnqn/namespaces/1
+ rmdir config/nullb/nullb1
+ sleep 2
+ rm -fr /sys/kernel/config/nvmet/ports/1/subsystems/testnqn
+ sleep 1
+ rmdir /sys/kernel/config/nvmet/ports/1
+ rmdir /sys/kernel/config/nvmet/subsystems/testnqn
+ sleep 1
+ modprobe -r nvme_loop
+ modprobe -r nvme_fabrics
+ modprobe -r nvmet
+ modprobe -r nvme
+ umount /mnt/nvme0n1
umount: /mnt/nvme0n1: no mount point specified.
+ umount /mnt/backend
umount: /mnt/backend: not mounted.
+ modprobe -r null_blk
+ tree /sys/kernel/config
/sys/kernel/config

0 directories, 0 files
[   54.399880] blkdev_issue_verify 490
[   54.399885] __blkdev_issue_verify 419
[   54.399887] __blkdev_issue_verify 452
[   54.399962] __blkdev_issue_verify 466
[   54.400038] nvme_setup_verify 844
[   54.400044] nvmet: nvmet_bdev_execute_verify 497
[   54.400045] __blkdev_issue_verify 419
[   54.400046] __blkdev_issue_verify 452
[   54.400048] __blkdev_issue_verify 466
[   54.400053] null_blk: null_process_cmd 1406 kworker/20:1
[   54.400062] blkdev_issue_verify 502
[   54.405139] nvme nvme1: Removing ctrl: NQN "testnqn"

scsi debug
modprobe: FATAL: Module scsi_debug is in use.
[   61.392949] scsi_debug: module verification failed: signature and/or required key missing - tainting kernel
[   61.727201] scsi_debug:sdebug_driver_probe: scsi_debug: trim poll_queues to 0. poll_q/nr_hw = (0/1)
[   61.727208] scsi host2: scsi_debug: version 0191 [20210520]
                 dev_size_mb=4096, opts=0x0, submit_queues=1, statistics=0
[   61.727369] scsi 2:0:0:0: Direct-Access     Linux    scsi_debug       0191 PQ: 0 ANSI: 7
[   61.727782] sd 2:0:0:0: Attached scsi generic sg3 type 0
[   61.727853] sd 2:0:0:0: Power-on or device reset occurred
[   61.729965] sd 2:0:0:0: [sdc] 8388608 512-byte logical blocks: (4.29 GB/4.00 GiB)
[   61.730992] sd 2:0:0:0: [sdc] Write Protect is off
[   61.730996] sd 2:0:0:0: [sdc] Mode Sense: 73 00 10 08
[   61.733141] sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA
[   61.737303] sd 2:0:0:0: [sdc] VERIFY16 supported
[   61.737307] sd 2:0:0:0: [sdc] Preferred minimum I/O size 512 bytes
[   61.737309] sd 2:0:0:0: [sdc] Optimal transfer size 524288 bytes
[   61.755811] sd 2:0:0:0: [sdc] VERIFY16 supported
[   61.757983] sd 2:0:0:0: [sdc] Attached SCSI disk
[   61.759689] blkdev_issue_verify 490
[   61.759693] __blkdev_issue_verify 419
[   61.759695] __blkdev_issue_verify 452
[   61.759770] __blkdev_issue_verify 466
[   61.759784] sd_setup_verify_cmnd 1101
[   61.759785] sd_setup_verify16_cmnd 1063
[   61.760800] blkdev_issue_verify 502

-- 
2.29.0



* [PATCH V2 1/6] block: add support for REQ_OP_VERIFY
  2022-07-13  7:20 [PATCH V2 0/6] block: add support for REQ_OP_VERIFY Chaitanya Kulkarni
@ 2022-07-13  7:20 ` Chaitanya Kulkarni
  2022-07-13  7:20 ` [PATCH V2 2/6] nvme: add support for the Verify command Chaitanya Kulkarni
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Chaitanya Kulkarni @ 2022-07-13  7:20 UTC (permalink / raw)
  To: linux-block, linux-scsi, linux-nvme, linux-xfs, linux-fsdevel
  Cc: axboe, agk, song, djwong, kbusch, hch, sagi, jejb,
	martin.petersen, viro, javier, johannes.thumshirn, bvanassche,
	dongli.zhang, ming.lei, willy, jefflexu, josef, clm, dsterba,
	jack, tytso, adilger.kernel, jlayton, idryomov, danil.kipnis,
	ebiggers, jinpu.wang, Chaitanya Kulkarni

This adds a new block layer operation to offload verifying a range of
LBAs. This support is needed so that in-kernel users such as file
systems and fabrics components can offload LBA verification when it is
supported by the hardware controller; the prominent examples are the
SCSI and NVMe Verify commands. When hardware offloading is not
supported, the block layer API emulates the operation with reads. The
emulation is still useful when the block device is remotely attached,
e.g. using NVMeOF, since no data needs to cross the fabric.
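
As a hypothetical illustration of the API (the helper below is not part
of this patch; later patches in this series add real callers in the NVMe
target), a scrub pass could drive the new interface roughly as follows:

    #include <linux/blkdev.h>

    /* Illustrative only: verify a byte range of a block device. */
    static int example_scrub_range(struct block_device *bdev, u64 start,
                                   u64 len)
    {
            if ((start | len) & (bdev_logical_block_size(bdev) - 1))
                    return -EINVAL;

            /*
             * Offloaded to the device when it supports verify,
             * emulated with reads by the block layer otherwise.
             */
            return blkdev_issue_verify(bdev, start >> SECTOR_SHIFT,
                                       len >> SECTOR_SHIFT, GFP_KERNEL);
    }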

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
---
 Documentation/ABI/stable/sysfs-block |  12 +++
 block/blk-core.c                     |   5 +
 block/blk-lib.c                      | 155 +++++++++++++++++++++++++++
 block/blk-merge.c                    |  18 ++++
 block/blk-settings.c                 |  17 +++
 block/blk-sysfs.c                    |   8 ++
 block/blk.h                          |   7 ++
 block/ioctl.c                        |  35 ++++++
 include/linux/bio.h                  |   9 +-
 include/linux/blk_types.h            |   2 +
 include/linux/blkdev.h               |  19 ++++
 include/uapi/linux/fs.h              |   1 +
 12 files changed, 285 insertions(+), 3 deletions(-)

diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
index cd14ecb3c9a5..e3a10ed1f955 100644
--- a/Documentation/ABI/stable/sysfs-block
+++ b/Documentation/ABI/stable/sysfs-block
@@ -666,6 +666,18 @@ Description:
 		in a single write zeroes command. If write_zeroes_max_bytes is
 		0, write zeroes is not supported by the device.
 
+What:		/sys/block/<disk>/queue/verify_max_bytes
+Date:		July 2022
+Contact:	Chaitanya Kulkarni <kch@nvidia.com>
+Description:
+		For devices that support the verify operation, a single
+		request can be issued to verify a range of contiguous
+		blocks on the storage without any payload in the request.
+		This can be used to offload verification of LBAs on the
+		device without reading the data back. verify_max_bytes
+		indicates how many bytes can be verified in a single verify
+		command. If verify_max_bytes is 0, the verify operation is
+		not supported by the device.
 
 What:		/sys/block/<disk>/queue/zone_append_max_bytes
 Date:		May 2020
diff --git a/block/blk-core.c b/block/blk-core.c
index b530ce7b370c..a77975b8ae60 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -123,6 +123,7 @@ static const char *const blk_op_name[] = {
 	REQ_OP_NAME(ZONE_FINISH),
 	REQ_OP_NAME(ZONE_APPEND),
 	REQ_OP_NAME(WRITE_ZEROES),
+	REQ_OP_NAME(VERIFY),
 	REQ_OP_NAME(DRV_IN),
 	REQ_OP_NAME(DRV_OUT),
 };
@@ -785,6 +786,10 @@ void submit_bio_noacct(struct bio *bio)
 		if (!q->limits.max_write_zeroes_sectors)
 			goto not_supported;
 		break;
+	case REQ_OP_VERIFY:
+		if (!q->limits.max_verify_sectors)
+			goto not_supported;
+		break;
 	default:
 		break;
 	}
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 09b7e1200c0f..df2e2bc092b3 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -340,3 +340,158 @@ int blkdev_issue_secure_erase(struct block_device *bdev, sector_t sector,
 	return ret;
 }
 EXPORT_SYMBOL(blkdev_issue_secure_erase);
+
+/**
+ * __blkdev_emulate_verify - emulate number of verify operations
+ *				asynchronously
+ * @bdev:	blockdev to issue
+ * @sector:	start sector
+ * @nr_sects:	number of sectors to verify
+ * @gfp_mask:	memory allocation flags (for bio_alloc)
+ * @biop:	pointer to anchor bio
+ * @buf:	data buffer to be mapped onto the bio
+ *
+ * Description:
+ *  Verify a block range by emulating REQ_OP_VERIFY with REQ_OP_READ;
+ *  use this when H/W offloading is not supported. The caller is
+ *  responsible for handling the anchored bio.
+ */
+static int __blkdev_emulate_verify(struct block_device *bdev, sector_t sector,
+		sector_t nr_sects, gfp_t gfp_mask, struct bio **biop, char *buf)
+{
+	struct bio *bio = *biop;
+	unsigned int sz;
+	int bi_size;
+
+	while (nr_sects != 0) {
+		bio = blk_next_bio(bio, bdev,
+				__blkdev_sectors_to_bio_pages(nr_sects),
+				REQ_OP_READ, gfp_mask);
+		bio->bi_iter.bi_sector = sector;
+
+		while (nr_sects != 0) {
+			bool is_vaddr = is_vmalloc_addr(buf);
+			struct page *p;
+
+			p = is_vaddr ? vmalloc_to_page(buf) : virt_to_page(buf);
+			sz = min((sector_t) PAGE_SIZE, nr_sects << 9);
+
+			bi_size = bio_add_page(bio, p, sz, offset_in_page(buf));
+			if (bi_size < sz)
+				return -EIO;
+
+			nr_sects -= bi_size >> 9;
+			sector += bi_size >> 9;
+			buf += bi_size;
+		}
+		cond_resched();
+	}
+
+	*biop = bio;
+	return 0;
+}
+
+/**
+ * __blkdev_issue_verify - generate number of verify operations
+ * @bdev:	blockdev to issue
+ * @sector:	start sector
+ * @nr_sects:	number of sectors to verify
+ * @gfp_mask:	memory allocation flags (for bio_alloc())
+ * @biop:	pointer to anchor bio
+ *
+ * Description:
+ *  Verify a block range using hardware offload.
+ *
+ * The function will emulate verify operation if no explicit hardware
+ * offloading for verifying is provided.
+ */
+int __blkdev_issue_verify(struct block_device *bdev, sector_t sector,
+		sector_t nr_sects, gfp_t gfp_mask, struct bio **biop)
+{
+	unsigned int max_verify_sectors = bdev_verify_sectors(bdev);
+	sector_t min_io_sect = (BIO_MAX_VECS << PAGE_SHIFT) >> 9;
+	struct bio *bio = *biop;
+	sector_t curr_sects;
+	char *buf;
+
+	if (!max_verify_sectors) {
+		int ret = 0;
+
+		buf = kmalloc(min_io_sect << 9, GFP_KERNEL);
+		if (!buf)
+			return -ENOMEM;
+
+		while (nr_sects > 0) {
+			curr_sects = min_t(sector_t, nr_sects, min_io_sect);
+			ret = __blkdev_emulate_verify(bdev, sector, curr_sects,
+						      gfp_mask, &bio, buf);
+			if (ret)
+				break;
+
+			if (bio) {
+				ret = submit_bio_wait(bio);
+				bio_put(bio);
+				bio = NULL;
+			}
+
+			nr_sects -= curr_sects;
+			sector += curr_sects;
+
+		}
+		/* set the biop to NULL since we have already completed above */
+		*biop = NULL;
+		kfree(buf);
+		return ret;
+	}
+
+	while (nr_sects) {
+		bio = blk_next_bio(bio, bdev, 0, REQ_OP_VERIFY, gfp_mask);
+		bio->bi_iter.bi_sector = sector;
+
+		if (nr_sects > max_verify_sectors) {
+			bio->bi_iter.bi_size = max_verify_sectors << 9;
+			nr_sects -= max_verify_sectors;
+			sector += max_verify_sectors;
+		} else {
+			bio->bi_iter.bi_size = nr_sects << 9;
+			nr_sects = 0;
+		}
+		cond_resched();
+	}
+	*biop = bio;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(__blkdev_issue_verify);
+
+/**
+ * blkdev_issue_verify - verify a block range
+ * @bdev:	blockdev to verify
+ * @sector:	start sector
+ * @nr_sects:	number of sectors to verify
+ * @gfp_mask:	memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ *  Verify a block range using hardware offload.
+ */
+int blkdev_issue_verify(struct block_device *bdev, sector_t sector,
+		sector_t nr_sects, gfp_t gfp_mask)
+{
+	sector_t bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+	struct bio *bio = NULL;
+	struct blk_plug plug;
+	int ret = 0;
+
+	if ((sector | nr_sects) & bs_mask)
+		return -EINVAL;
+
+	blk_start_plug(&plug);
+	ret = __blkdev_issue_verify(bdev, sector, nr_sects, gfp_mask, &bio);
+	if (ret == 0 && bio) {
+		ret = submit_bio_wait(bio);
+		bio_put(bio);
+	}
+	blk_finish_plug(&plug);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(blkdev_issue_verify);
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 5abf5aa5a5f0..f19668c1b7bf 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -153,6 +153,20 @@ static struct bio *blk_bio_write_zeroes_split(struct request_queue *q,
 	return bio_split(bio, q->limits.max_write_zeroes_sectors, GFP_NOIO, bs);
 }
 
+static struct bio *blk_bio_verify_split(struct request_queue *q,
+		struct bio *bio, struct bio_set *bs, unsigned *nsegs)
+{
+	*nsegs = 0;
+
+	if (!q->limits.max_verify_sectors)
+		return NULL;
+
+	if (bio_sectors(bio) <= q->limits.max_verify_sectors)
+		return NULL;
+
+	return bio_split(bio, q->limits.max_verify_sectors, GFP_NOIO, bs);
+}
+
 /*
  * Return the maximum number of sectors from the start of a bio that may be
  * submitted as a single request to a block device. If enough sectors remain,
@@ -346,6 +360,10 @@ void __blk_queue_split(struct request_queue *q, struct bio **bio,
 		split = blk_bio_write_zeroes_split(q, *bio, &q->bio_split,
 				nr_segs);
 		break;
+	case REQ_OP_VERIFY:
+		split = blk_bio_verify_split(q, *bio, &q->bio_split,
+				nr_segs);
+		break;
 	default:
 		split = blk_bio_segment_split(q, *bio, &q->bio_split, nr_segs);
 		break;
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 8bb9eef5310e..83fb42d42a91 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -43,6 +43,7 @@ void blk_set_default_limits(struct queue_limits *lim)
 	lim->max_dev_sectors = 0;
 	lim->chunk_sectors = 0;
 	lim->max_write_zeroes_sectors = 0;
+	lim->max_verify_sectors = 0;
 	lim->max_zone_append_sectors = 0;
 	lim->max_discard_sectors = 0;
 	lim->max_hw_discard_sectors = 0;
@@ -80,6 +81,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
 	lim->max_sectors = UINT_MAX;
 	lim->max_dev_sectors = UINT_MAX;
 	lim->max_write_zeroes_sectors = UINT_MAX;
+	lim->max_verify_sectors = UINT_MAX;
 	lim->max_zone_append_sectors = UINT_MAX;
 }
 EXPORT_SYMBOL(blk_set_stacking_limits);
@@ -202,6 +204,19 @@ void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
 }
 EXPORT_SYMBOL(blk_queue_max_write_zeroes_sectors);
 
+/**
+ * blk_queue_max_verify_sectors - set max sectors for a single verify
+ *
+ * @q:  the request queue for the device
+ * @max_verify_sectors: maximum number of sectors to verify per command
+ **/
+void blk_queue_max_verify_sectors(struct request_queue *q,
+		unsigned int max_verify_sectors)
+{
+	q->limits.max_verify_sectors = max_verify_sectors;
+}
+EXPORT_SYMBOL(blk_queue_max_verify_sectors);
+
 /**
  * blk_queue_max_zone_append_sectors - set max sectors for a single zone append
  * @q:  the request queue for the device
@@ -554,6 +569,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 	t->max_dev_sectors = min_not_zero(t->max_dev_sectors, b->max_dev_sectors);
 	t->max_write_zeroes_sectors = min(t->max_write_zeroes_sectors,
 					b->max_write_zeroes_sectors);
+	t->max_verify_sectors = min(t->max_verify_sectors,
+				    b->max_verify_sectors);
 	t->max_zone_append_sectors = min(t->max_zone_append_sectors,
 					b->max_zone_append_sectors);
 	t->bounce = max(t->bounce, b->bounce);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index c0303026752d..bae72999a230 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -113,6 +113,12 @@ queue_ra_store(struct request_queue *q, const char *page, size_t count)
 	return ret;
 }
 
+static ssize_t queue_verify_max_show(struct request_queue *q, char *page)
+{
+	return sprintf(page, "%llu\n",
+		(unsigned long long)q->limits.max_verify_sectors << 9);
+}
+
 static ssize_t queue_max_sectors_show(struct request_queue *q, char *page)
 {
 	int max_sectors_kb = queue_max_sectors(q) >> 1;
@@ -593,6 +599,7 @@ QUEUE_RO_ENTRY(queue_discard_zeroes_data, "discard_zeroes_data");
 
 QUEUE_RO_ENTRY(queue_write_same_max, "write_same_max_bytes");
 QUEUE_RO_ENTRY(queue_write_zeroes_max, "write_zeroes_max_bytes");
+QUEUE_RO_ENTRY(queue_verify_max, "verify_max_bytes");
 QUEUE_RO_ENTRY(queue_zone_append_max, "zone_append_max_bytes");
 QUEUE_RO_ENTRY(queue_zone_write_granularity, "zone_write_granularity");
 
@@ -650,6 +657,7 @@ static struct attribute *queue_attrs[] = {
 	&queue_discard_zeroes_data_entry.attr,
 	&queue_write_same_max_entry.attr,
 	&queue_write_zeroes_max_entry.attr,
+	&queue_verify_max_entry.attr,
 	&queue_zone_append_max_entry.attr,
 	&queue_zone_write_granularity_entry.attr,
 	&queue_nonrot_entry.attr,
diff --git a/block/blk.h b/block/blk.h
index b71e22c97d77..af0a4942812f 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -132,6 +132,9 @@ static inline bool rq_mergeable(struct request *rq)
 	if (req_op(rq) == REQ_OP_WRITE_ZEROES)
 		return false;
 
+	if (req_op(rq) == REQ_OP_VERIFY)
+		return false;
+
 	if (req_op(rq) == REQ_OP_ZONE_APPEND)
 		return false;
 
@@ -169,6 +172,9 @@ static inline unsigned int blk_queue_get_max_sectors(struct request_queue *q,
 	if (unlikely(op == REQ_OP_WRITE_ZEROES))
 		return q->limits.max_write_zeroes_sectors;
 
+	if (unlikely(op == REQ_OP_VERIFY))
+		return q->limits.max_verify_sectors;
+
 	return q->limits.max_sectors;
 }
 
@@ -299,6 +305,7 @@ static inline bool blk_may_split(struct request_queue *q, struct bio *bio)
 	case REQ_OP_DISCARD:
 	case REQ_OP_SECURE_ERASE:
 	case REQ_OP_WRITE_ZEROES:
+	case REQ_OP_VERIFY:
 		return true; /* non-trivial splitting decisions */
 	default:
 		break;
diff --git a/block/ioctl.c b/block/ioctl.c
index 60121e89052b..31094e14f42a 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -192,6 +192,39 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
 	return err;
 }
 
+static int blk_ioctl_verify(struct block_device *bdev, fmode_t mode,
+		unsigned long arg)
+{
+	uint64_t range[2];
+	struct address_space *mapping;
+	uint64_t start, end, len;
+
+	if (!(mode & FMODE_READ))
+		return -EBADF;
+
+	if (copy_from_user(range, (void __user *)arg, sizeof(range)))
+		return -EFAULT;
+
+	start = range[0];
+	len = range[1];
+	end = start + len - 1;
+
+	if (start & 511)
+		return -EINVAL;
+	if (len & 511)
+		return -EINVAL;
+	if (end >= (uint64_t)i_size_read(bdev->bd_inode))
+		return -EINVAL;
+	if (end < start)
+		return -EINVAL;
+
+	/* Invalidate the page cache, including dirty pages */
+	mapping = bdev->bd_inode->i_mapping;
+	truncate_inode_pages_range(mapping, start, end);
+
+	return blkdev_issue_verify(bdev, start >> 9, len >> 9, GFP_KERNEL);
+}
+
 static int put_ushort(unsigned short __user *argp, unsigned short val)
 {
 	return put_user(val, argp);
@@ -483,6 +516,8 @@ static int blkdev_common_ioctl(struct block_device *bdev, fmode_t mode,
 		return blk_ioctl_secure_erase(bdev, mode, argp);
 	case BLKZEROOUT:
 		return blk_ioctl_zeroout(bdev, mode, arg);
+	case BLKVERIFY:
+		return blk_ioctl_verify(bdev, mode, arg);
 	case BLKGETDISKSEQ:
 		return put_u64(argp, bdev->bd_disk->diskseq);
 	case BLKREPORTZONE:
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 992ee987f273..31fa66c23485 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -55,7 +55,8 @@ static inline bool bio_has_data(struct bio *bio)
 	    bio->bi_iter.bi_size &&
 	    bio_op(bio) != REQ_OP_DISCARD &&
 	    bio_op(bio) != REQ_OP_SECURE_ERASE &&
-	    bio_op(bio) != REQ_OP_WRITE_ZEROES)
+	    bio_op(bio) != REQ_OP_WRITE_ZEROES &&
+	    bio_op(bio) != REQ_OP_VERIFY)
 		return true;
 
 	return false;
@@ -65,7 +66,8 @@ static inline bool bio_no_advance_iter(const struct bio *bio)
 {
 	return bio_op(bio) == REQ_OP_DISCARD ||
 	       bio_op(bio) == REQ_OP_SECURE_ERASE ||
-	       bio_op(bio) == REQ_OP_WRITE_ZEROES;
+	       bio_op(bio) == REQ_OP_WRITE_ZEROES ||
+	       bio_op(bio) == REQ_OP_VERIFY;
 }
 
 static inline void *bio_data(struct bio *bio)
@@ -176,7 +178,7 @@ static inline unsigned bio_segments(struct bio *bio)
 	struct bvec_iter iter;
 
 	/*
-	 * We special case discard/write same/write zeroes, because they
+	 * We special case discard/write same/write zeroes/verify, because they
 	 * interpret bi_size differently:
 	 */
 
@@ -184,6 +186,7 @@ static inline unsigned bio_segments(struct bio *bio)
 	case REQ_OP_DISCARD:
 	case REQ_OP_SECURE_ERASE:
 	case REQ_OP_WRITE_ZEROES:
+	case REQ_OP_VERIFY:
 		return 0;
 	default:
 		break;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index a24d4078fb21..0d5383fc84ed 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -363,6 +363,8 @@ enum req_opf {
 	REQ_OP_FLUSH		= 2,
 	/* discard sectors */
 	REQ_OP_DISCARD		= 3,
+	/* Verify the sectors */
+	REQ_OP_VERIFY		= 6,
 	/* securely erase sectors */
 	REQ_OP_SECURE_ERASE	= 5,
 	/* write the zero filled sector many times */
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 22c477fadc0f..8a44f442af9d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -296,6 +296,7 @@ struct queue_limits {
 	unsigned int		max_hw_discard_sectors;
 	unsigned int		max_secure_erase_sectors;
 	unsigned int		max_write_zeroes_sectors;
+	unsigned int		max_verify_sectors;
 	unsigned int		max_zone_append_sectors;
 	unsigned int		discard_granularity;
 	unsigned int		discard_alignment;
@@ -930,6 +931,8 @@ extern void blk_queue_max_discard_sectors(struct request_queue *q,
 		unsigned int max_discard_sectors);
 extern void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
 		unsigned int max_write_same_sectors);
+extern void blk_queue_max_verify_sectors(struct request_queue *q,
+		unsigned int max_verify_sectors);
 extern void blk_queue_logical_block_size(struct request_queue *, unsigned int);
 extern void blk_queue_max_zone_append_sectors(struct request_queue *q,
 		unsigned int max_zone_append_sectors);
@@ -1079,6 +1082,12 @@ extern int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 extern int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, unsigned flags);
 
+extern int __blkdev_issue_verify(struct block_device *bdev,
+		sector_t sector, sector_t nr_sects, gfp_t gfp_mask,
+		struct bio **biop);
+extern int blkdev_issue_verify(struct block_device *bdev, sector_t sector,
+		sector_t nr_sects, gfp_t gfp_mask);
+
 static inline int sb_issue_discard(struct super_block *sb, sector_t block,
 		sector_t nr_blocks, gfp_t gfp_mask, unsigned long flags)
 {
@@ -1253,6 +1262,16 @@ static inline unsigned int bdev_write_zeroes_sectors(struct block_device *bdev)
 	return 0;
 }
 
+static inline unsigned int bdev_verify_sectors(struct block_device *bdev)
+{
+	struct request_queue *q = bdev_get_queue(bdev);
+
+	if (q)
+		return q->limits.max_verify_sectors;
+
+	return 0;
+}
+
 static inline bool bdev_nonrot(struct block_device *bdev)
 {
 	return blk_queue_nonrot(bdev_get_queue(bdev));
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index bdf7b404b3e7..ad0e5cb5cac4 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -185,6 +185,7 @@ struct fsxattr {
 #define BLKROTATIONAL _IO(0x12,126)
 #define BLKZEROOUT _IO(0x12,127)
 #define BLKGETDISKSEQ _IOR(0x12,128,__u64)
+#define BLKVERIFY _IO(0x12,129)
 /*
  * A jump here: 130-136 are reserved for zoned block devices
  * (see uapi/linux/blkzoned.h)
-- 
2.29.0



* [PATCH V2 2/6] nvme: add support for the Verify command
  2022-07-13  7:20 [PATCH V2 0/6] block: add support for REQ_OP_VERIFY Chaitanya Kulkarni
  2022-07-13  7:20 ` [PATCH V2 1/6] " Chaitanya Kulkarni
@ 2022-07-13  7:20 ` Chaitanya Kulkarni
  2022-07-13  7:20 ` [PATCH V2 3/6] nvmet: add Verify command support for bdev-ns Chaitanya Kulkarni
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Chaitanya Kulkarni @ 2022-07-13  7:20 UTC (permalink / raw)
  To: linux-block, linux-scsi, linux-nvme, linux-xfs, linux-fsdevel
  Cc: axboe, agk, song, djwong, kbusch, hch, sagi, jejb,
	martin.petersen, viro, javier, johannes.thumshirn, bvanassche,
	dongli.zhang, ming.lei, willy, jefflexu, josef, clm, dsterba,
	jack, tytso, adilger.kernel, jlayton, idryomov, danil.kipnis,
	ebiggers, jinpu.wang, Chaitanya Kulkarni

Allow verify operations (REQ_OP_VERIFY) on the block device if the
controller advertises support for the optional Verify command in ONCS.
Add support to set up the Verify command, and derive the maximum number
of sectors allowed in a single verify command from the controller's VSL
(Verify Size Limit).
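
As an illustrative example of the mapping (numbers made up): on a
namespace formatted with 4KiB LBAs, a REQ_OP_VERIFY request covering
1MiB starting at 512-byte sector 2048 is set up by nvme_setup_verify()
below as slba = 2048 >> 3 = 256 and length = (1MiB >> 12) - 1 = 255
(the NLB field is 0-based), with no data transferred on the host link.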

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
---
 drivers/nvme/host/core.c | 31 +++++++++++++++++++++++++++++++
 drivers/nvme/host/nvme.h |  1 +
 include/linux/nvme.h     | 19 +++++++++++++++++++
 3 files changed, 51 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 252ab0a4bf8d..8b09ccc16184 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -838,6 +838,19 @@ static inline blk_status_t nvme_setup_write_zeroes(struct nvme_ns *ns,
 	return BLK_STS_OK;
 }
 
+static inline blk_status_t nvme_setup_verify(struct nvme_ns *ns,
+		struct request *req, struct nvme_command *cmnd)
+{
+	cmnd->verify.opcode = nvme_cmd_verify;
+	cmnd->verify.nsid = cpu_to_le32(ns->head->ns_id);
+	cmnd->verify.slba =
+		cpu_to_le64(nvme_sect_to_lba(ns, blk_rq_pos(req)));
+	cmnd->verify.length =
+		cpu_to_le16((blk_rq_bytes(req) >> ns->lba_shift) - 1);
+	cmnd->verify.control = 0;
+	return BLK_STS_OK;
+}
+
 static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
 		struct request *req, struct nvme_command *cmnd,
 		enum nvme_opcode op)
@@ -943,6 +956,9 @@ blk_status_t nvme_setup_cmd(struct nvme_ns *ns, struct request *req)
 	case REQ_OP_WRITE_ZEROES:
 		ret = nvme_setup_write_zeroes(ns, req, cmd);
 		break;
+	case REQ_OP_VERIFY:
+		ret = nvme_setup_verify(ns, req, cmd);
+		break;
 	case REQ_OP_DISCARD:
 		ret = nvme_setup_discard(ns, req, cmd);
 		break;
@@ -1672,6 +1688,17 @@ static void nvme_config_discard(struct gendisk *disk, struct nvme_ns *ns)
 		blk_queue_max_write_zeroes_sectors(queue, UINT_MAX);
 }
 
+static void nvme_config_verify(struct gendisk *disk, struct nvme_ns *ns)
+{
+	unsigned int sects = ns->ctrl->max_verify_sectors;
+
+	if (!(ns->ctrl->oncs & NVME_CTRL_ONCS_VERIFY))
+		return;
+
+	/* in case controller supports verify but vsl is 0 just use UINT_MAX */
+	blk_queue_max_verify_sectors(disk->queue, sects ? sects : UINT_MAX);
+}
+
 static bool nvme_ns_ids_equal(struct nvme_ns_ids *a, struct nvme_ns_ids *b)
 {
 	return uuid_equal(&a->uuid, &b->uuid) &&
@@ -1871,6 +1898,7 @@ static void nvme_update_disk_info(struct gendisk *disk,
 	set_capacity_and_notify(disk, capacity);
 
 	nvme_config_discard(disk, ns);
+	nvme_config_verify(disk, ns);
 	blk_queue_max_write_zeroes_sectors(disk->queue,
 					   ns->ctrl->max_zeroes_sectors);
 }
@@ -2971,6 +2999,9 @@ static int nvme_init_non_mdts_limits(struct nvme_ctrl *ctrl)
 	if (id->wzsl)
 		ctrl->max_zeroes_sectors = nvme_mps_to_sectors(ctrl, id->wzsl);
 
+	if (id->vsl)
+		ctrl->max_verify_sectors = nvme_mps_to_sectors(ctrl, id->vsl);
+
 free_data:
 	kfree(id);
 	return ret;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 7323a2f61126..3bb58282585d 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -279,6 +279,7 @@ struct nvme_ctrl {
 	u32 max_discard_sectors;
 	u32 max_discard_segments;
 	u32 max_zeroes_sectors;
+	u32 max_verify_sectors;
 #ifdef CONFIG_BLK_DEV_ZONED
 	u32 max_zone_append;
 #endif
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 07cfc922f8e4..967ec7257102 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -363,6 +363,7 @@ enum {
 	NVME_CTRL_ONCS_WRITE_ZEROES		= 1 << 3,
 	NVME_CTRL_ONCS_RESERVATIONS		= 1 << 5,
 	NVME_CTRL_ONCS_TIMESTAMP		= 1 << 6,
+	NVME_CTRL_ONCS_VERIFY			= 1 << 7,
 	NVME_CTRL_VWC_PRESENT			= 1 << 0,
 	NVME_CTRL_OACS_SEC_SUPP                 = 1 << 0,
 	NVME_CTRL_OACS_NS_MNGT_SUPP		= 1 << 3,
@@ -1003,6 +1004,23 @@ struct nvme_write_zeroes_cmd {
 	__le16			appmask;
 };
 
+struct nvme_verify_cmd {
+	__u8			opcode;
+	__u8			flags;
+	__u16			command_id;
+	__le32			nsid;
+	__u64			rsvd2;
+	__le64			metadata;
+	union nvme_data_ptr	dptr;
+	__le64			slba;
+	__le16			length;
+	__le16			control;
+	__le32			rsvd3;
+	__le32			reftag;
+	__le16			eapptag;
+	__le16			eappmask;
+};
+
 enum nvme_zone_mgmt_action {
 	NVME_ZONE_CLOSE		= 0x1,
 	NVME_ZONE_FINISH	= 0x2,
@@ -1541,6 +1559,7 @@ struct nvme_command {
 		struct nvme_format_cmd format;
 		struct nvme_dsm_cmd dsm;
 		struct nvme_write_zeroes_cmd write_zeroes;
+		struct nvme_verify_cmd verify;
 		struct nvme_zone_mgmt_send_cmd zms;
 		struct nvme_zone_mgmt_recv_cmd zmr;
 		struct nvme_abort_cmd abort;
-- 
2.29.0



* [PATCH V2 3/6] nvmet: add Verify command support for bdev-ns
  2022-07-13  7:20 [PATCH V2 0/6] block: add support for REQ_OP_VERIFY Chaitanya Kulkarni
  2022-07-13  7:20 ` [PATCH V2 1/6] " Chaitanya Kulkarni
  2022-07-13  7:20 ` [PATCH V2 2/6] nvme: add support for the Verify command Chaitanya Kulkarni
@ 2022-07-13  7:20 ` Chaitanya Kulkarni
  2022-07-13  7:20 ` [PATCH V2 4/6] nvmet: add Verify emulation " Chaitanya Kulkarni
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Chaitanya Kulkarni @ 2022-07-13  7:20 UTC (permalink / raw)
  To: linux-block, linux-scsi, linux-nvme, linux-xfs, linux-fsdevel
  Cc: axboe, agk, song, djwong, kbusch, hch, sagi, jejb,
	martin.petersen, viro, javier, johannes.thumshirn, bvanassche,
	dongli.zhang, ming.lei, willy, jefflexu, josef, clm, dsterba,
	jack, tytso, adilger.kernel, jlayton, idryomov, danil.kipnis,
	ebiggers, jinpu.wang, Chaitanya Kulkarni

Add support for handling the Verify command on the target. Call into
__blkdev_issue_verify(), which the block layer expands into
REQ_OP_VERIFY bios covering the requested LBA range.
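
As an illustrative example (numbers made up): for a namespace with a
4KiB block size (blksize_shift = 12), a Verify command with slba = 256
and length = 255 (0-based) is converted by nvmet_bdev_execute_verify()
below into starting sector 256 << 3 = 2048 and (255 + 1) << 3 = 2048
512-byte sectors, i.e. 1MiB, before calling __blkdev_issue_verify().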

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
---
 drivers/nvme/target/admin-cmd.c   |  3 ++-
 drivers/nvme/target/io-cmd-bdev.c | 38 +++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 397daaf51f1b..495c3a31473a 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -431,7 +431,8 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
 	id->nn = cpu_to_le32(NVMET_MAX_NAMESPACES);
 	id->mnan = cpu_to_le32(NVMET_MAX_NAMESPACES);
 	id->oncs = cpu_to_le16(NVME_CTRL_ONCS_DSM |
-			NVME_CTRL_ONCS_WRITE_ZEROES);
+			NVME_CTRL_ONCS_WRITE_ZEROES |
+			NVME_CTRL_ONCS_VERIFY);
 
 	/* XXX: don't report vwc if the underlying device is write through */
 	id->vwc = NVME_CTRL_VWC_PRESENT;
diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index 27a72504d31c..82c341f90f06 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -146,6 +146,7 @@ u16 blk_to_nvme_status(struct nvmet_req *req, blk_status_t blk_sts)
 		switch (req->cmd->common.opcode) {
 		case nvme_cmd_dsm:
 		case nvme_cmd_write_zeroes:
+		case nvme_cmd_verify:
 			status = NVME_SC_ONCS_NOT_SUPPORTED | NVME_SC_DNR;
 			break;
 		default:
@@ -171,6 +172,9 @@ u16 blk_to_nvme_status(struct nvmet_req *req, blk_status_t blk_sts)
 		req->error_slba =
 			le64_to_cpu(req->cmd->write_zeroes.slba);
 		break;
+	case nvme_cmd_verify:
+		req->error_slba = le64_to_cpu(req->cmd->verify.slba);
+		break;
 	default:
 		req->error_slba = 0;
 	}
@@ -442,6 +446,37 @@ static void nvmet_bdev_execute_write_zeroes(struct nvmet_req *req)
 	}
 }
 
+static void nvmet_bdev_execute_verify(struct nvmet_req *req)
+{
+	struct nvme_verify_cmd *verify = &req->cmd->verify;
+	struct bio *bio = NULL;
+	sector_t nr_sector;
+	sector_t sector;
+	int ret;
+
+	if (!nvmet_check_transfer_len(req, 0))
+		return;
+
+	if (!bdev_verify_sectors(req->ns->bdev)) {
+		nvmet_req_complete(req, NVME_SC_INTERNAL | NVME_SC_DNR);
+		return;
+	}
+
+	sector = le64_to_cpu(verify->slba) << (req->ns->blksize_shift - 9);
+	nr_sector = (((sector_t)le16_to_cpu(verify->length) + 1) <<
+			(req->ns->blksize_shift - 9));
+
+	ret = __blkdev_issue_verify(req->ns->bdev, sector, nr_sector,
+			GFP_KERNEL, &bio);
+	if (bio) {
+		bio->bi_private = req;
+		bio->bi_end_io = nvmet_bio_done;
+		submit_bio(bio);
+	} else {
+		nvmet_req_complete(req, errno_to_nvme_status(req, ret));
+	}
+}
+
 u16 nvmet_bdev_parse_io_cmd(struct nvmet_req *req)
 {
 	switch (req->cmd->common.opcode) {
@@ -460,6 +495,9 @@ u16 nvmet_bdev_parse_io_cmd(struct nvmet_req *req)
 	case nvme_cmd_write_zeroes:
 		req->execute = nvmet_bdev_execute_write_zeroes;
 		return 0;
+	case nvme_cmd_verify:
+		req->execute = nvmet_bdev_execute_verify;
+		return 0;
 	default:
 		return nvmet_report_invalid_opcode(req);
 	}
-- 
2.29.0



* [PATCH V2 4/6] nvmet: add Verify emulation support for bdev-ns
  2022-07-13  7:20 [PATCH V2 0/6] block: add support for REQ_OP_VERIFY Chaitanya Kulkarni
                   ` (2 preceding siblings ...)
  2022-07-13  7:20 ` [PATCH V2 3/6] nvmet: add Verify command support for bdev-ns Chaitanya Kulkarni
@ 2022-07-13  7:20 ` Chaitanya Kulkarni
  2022-07-13  7:20 ` [PATCH V2 5/6] null_blk: add REQ_OP_VERIFY support Chaitanya Kulkarni
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Chaitanya Kulkarni @ 2022-07-13  7:20 UTC (permalink / raw)
  To: linux-block, linux-scsi, linux-nvme, linux-xfs, linux-fsdevel
  Cc: axboe, agk, song, djwong, kbusch, hch, sagi, jejb,
	martin.petersen, viro, javier, johannes.thumshirn, bvanassche,
	dongli.zhang, ming.lei, willy, jefflexu, josef, clm, dsterba,
	jack, tytso, adilger.kernel, jlayton, idryomov, danil.kipnis,
	ebiggers, jinpu.wang, Chaitanya Kulkarni

Not all backend devices support verify requests that can be mapped to a
controller-specific command. This patch adds a way to emulate
REQ_OP_VERIFY for an NVMeOF block device namespace.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
---
 drivers/nvme/target/io-cmd-bdev.c | 48 ++++++++++++++++++++++++-------
 1 file changed, 38 insertions(+), 10 deletions(-)

diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index 82c341f90f06..aec287d3b7d7 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -446,35 +446,63 @@ static void nvmet_bdev_execute_write_zeroes(struct nvmet_req *req)
 	}
 }
 
-static void nvmet_bdev_execute_verify(struct nvmet_req *req)
+static void __nvmet_req_to_verify_sectors(struct nvmet_req *req,
+		sector_t *sects, sector_t *nr_sects)
 {
 	struct nvme_verify_cmd *verify = &req->cmd->verify;
+
+	*sects = le64_to_cpu(verify->slba) << (req->ns->blksize_shift - 9);
+	*nr_sects = (((sector_t)le16_to_cpu(verify->length) + 1) <<
+			(req->ns->blksize_shift - 9));
+}
+
+static void nvmet_bdev_submit_emulate_verify(struct nvmet_req *req)
+{
+	sector_t nr_sector;
+	sector_t sector;
+	int ret = 0;
+
+	__nvmet_req_to_verify_sectors(req, &sector, &nr_sector);
+	if (!nr_sector)
+		goto out;
+
+	/* blkdev_issue_verify() will automatically emulate */
+	ret = blkdev_issue_verify(req->ns->bdev, sector, nr_sector,
+			GFP_KERNEL);
+out:
+	nvmet_req_complete(req,
+		blk_to_nvme_status(req, errno_to_blk_status(ret)));
+}
+
+static void nvmet_bdev_execute_verify(struct nvmet_req *req)
+{
 	struct bio *bio = NULL;
 	sector_t nr_sector;
 	sector_t sector;
-	int ret;
+	int ret = 0;
 
 	if (!nvmet_check_transfer_len(req, 0))
 		return;
 
+	__nvmet_req_to_verify_sectors(req, &sector, &nr_sector);
+	if (!nr_sector)
+		goto out;
+
 	if (!bdev_verify_sectors(req->ns->bdev)) {
-		nvmet_req_complete(req, NVME_SC_INTERNAL | NVME_SC_DNR);
+		nvmet_bdev_submit_emulate_verify(req);
 		return;
 	}
 
-	sector = le64_to_cpu(verify->slba) << (req->ns->blksize_shift - 9);
-	nr_sector = (((sector_t)le16_to_cpu(verify->length) + 1) <<
-			(req->ns->blksize_shift - 9));
-
 	ret = __blkdev_issue_verify(req->ns->bdev, sector, nr_sector,
 			GFP_KERNEL, &bio);
-	if (bio) {
+	if (ret == 0 && bio) {
 		bio->bi_private = req;
 		bio->bi_end_io = nvmet_bio_done;
 		submit_bio(bio);
-	} else {
-		nvmet_req_complete(req, errno_to_nvme_status(req, ret));
+		return;
 	}
+out:
+	nvmet_req_complete(req, errno_to_nvme_status(req, ret));
 }
 
 u16 nvmet_bdev_parse_io_cmd(struct nvmet_req *req)
-- 
2.29.0



* [PATCH V2 5/6] null_blk: add REQ_OP_VERIFY support
  2022-07-13  7:20 [PATCH V2 0/6] block: add support for REQ_OP_VERIFY Chaitanya Kulkarni
                   ` (3 preceding siblings ...)
  2022-07-13  7:20 ` [PATCH V2 4/6] nvmet: add Verify emulation " Chaitanya Kulkarni
@ 2022-07-13  7:20 ` Chaitanya Kulkarni
  2022-07-13  7:20 ` [PATCH V2 6/6] scsi: sd: add support for REQ_OP_VERIFY Chaitanya Kulkarni
  2022-07-13 12:19 ` [PATCH V2 0/6] block: " Matthew Wilcox
  6 siblings, 0 replies; 9+ messages in thread
From: Chaitanya Kulkarni @ 2022-07-13  7:20 UTC (permalink / raw)
  To: linux-block, linux-scsi, linux-nvme, linux-xfs, linux-fsdevel
  Cc: axboe, agk, song, djwong, kbusch, hch, sagi, jejb,
	martin.petersen, viro, javier, johannes.thumshirn, bvanassche,
	dongli.zhang, ming.lei, willy, jefflexu, josef, clm, dsterba,
	jack, tytso, adilger.kernel, jlayton, idryomov, danil.kipnis,
	ebiggers, jinpu.wang, Chaitanya Kulkarni

Add a new module parameter and configfs attribute to configure handling
of REQ_OP_VERIFY. This is needed for testing the newly added
REQ_OP_VERIFY block layer operation.
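
For example (as exercised in the cover-letter test log), loading the
driver with "modprobe null_blk verify=0" forces the block layer
emulation path, while "verify=1" (the default) advertises native
REQ_OP_VERIFY support; the same setting is also exposed per device
through the new configfs "verify" attribute.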

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
---
 drivers/block/null_blk/main.c     | 20 +++++++++++++++++++-
 drivers/block/null_blk/null_blk.h |  1 +
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
index d44629538cc4..c31cad169595 100644
--- a/drivers/block/null_blk/main.c
+++ b/drivers/block/null_blk/main.c
@@ -77,6 +77,10 @@ enum {
 	NULL_IRQ_TIMER		= 2,
 };
 
+static bool g_verify = true;
+module_param_named(verify, g_verify, bool, 0444);
+MODULE_PARM_DESC(verify, "Allow REQ_OP_VERIFY processing. Default: true");
+
 static bool g_virt_boundary = false;
 module_param_named(virt_boundary, g_virt_boundary, bool, 0444);
 MODULE_PARM_DESC(virt_boundary, "Require a virtual boundary for the device. Default: False");
@@ -400,6 +404,7 @@ NULLB_DEVICE_ATTR(blocking, bool, NULL);
 NULLB_DEVICE_ATTR(use_per_node_hctx, bool, NULL);
 NULLB_DEVICE_ATTR(memory_backed, bool, NULL);
 NULLB_DEVICE_ATTR(discard, bool, NULL);
+NULLB_DEVICE_ATTR(verify, bool, NULL);
 NULLB_DEVICE_ATTR(mbps, uint, NULL);
 NULLB_DEVICE_ATTR(cache_size, ulong, NULL);
 NULLB_DEVICE_ATTR(zoned, bool, NULL);
@@ -522,6 +527,7 @@ static struct configfs_attribute *nullb_device_attrs[] = {
 	&nullb_device_attr_power,
 	&nullb_device_attr_memory_backed,
 	&nullb_device_attr_discard,
+	&nullb_device_attr_verify,
 	&nullb_device_attr_mbps,
 	&nullb_device_attr_cache_size,
 	&nullb_device_attr_badblocks,
@@ -588,7 +594,7 @@ nullb_group_drop_item(struct config_group *group, struct config_item *item)
 static ssize_t memb_group_features_show(struct config_item *item, char *page)
 {
 	return snprintf(page, PAGE_SIZE,
-			"memory_backed,discard,bandwidth,cache,badblocks,zoned,zone_size,zone_capacity,zone_nr_conv,zone_max_open,zone_max_active,blocksize,max_sectors,virt_boundary\n");
+			"memory_backed,discard,verify,bandwidth,cache,badblocks,zoned,zone_size,zone_capacity,zone_nr_conv,zone_max_open,zone_max_active,blocksize,max_sectors,virt_boundary\n");
 }
 
 CONFIGFS_ATTR_RO(memb_group_, features);
@@ -651,6 +657,7 @@ static struct nullb_device *null_alloc_dev(void)
 	dev->hw_queue_depth = g_hw_queue_depth;
 	dev->blocking = g_blocking;
 	dev->use_per_node_hctx = g_use_per_node_hctx;
+	dev->verify = g_verify;
 	dev->zoned = g_zoned;
 	dev->zone_size = g_zone_size;
 	dev->zone_capacity = g_zone_capacity;
@@ -1394,6 +1401,10 @@ blk_status_t null_process_cmd(struct nullb_cmd *cmd,
 			return ret;
 	}
 
+	/* currently implemented as noop */
+	if (op == REQ_OP_VERIFY)
+		return 0;
+
 	if (dev->memory_backed)
 		return null_handle_memory_backed(cmd, op, sector, nr_sectors);
 
@@ -1769,6 +1780,12 @@ static void null_config_discard(struct nullb *nullb)
 	blk_queue_max_discard_sectors(nullb->q, UINT_MAX >> 9);
 }
 
+static void null_config_verify(struct nullb *nullb)
+{
+	blk_queue_max_verify_sectors(nullb->q,
+				     nullb->dev->verify ? UINT_MAX >> 9 : 0);
+}
+
 static const struct block_device_operations null_bio_ops = {
 	.owner		= THIS_MODULE,
 	.submit_bio	= null_submit_bio,
@@ -2058,6 +2075,7 @@ static int null_add_dev(struct nullb_device *dev)
 		blk_queue_virt_boundary(nullb->q, PAGE_SIZE - 1);
 
 	null_config_discard(nullb);
+	null_config_verify(nullb);
 
 	if (config_item_name(&dev->item)) {
 		/* Use configfs dir name as the device name */
diff --git a/drivers/block/null_blk/null_blk.h b/drivers/block/null_blk/null_blk.h
index 8359b43842f2..2a1df1bc8165 100644
--- a/drivers/block/null_blk/null_blk.h
+++ b/drivers/block/null_blk/null_blk.h
@@ -111,6 +111,7 @@ struct nullb_device {
 	bool power; /* power on/off the device */
 	bool memory_backed; /* if data is stored in memory */
 	bool discard; /* if support discard */
+	bool verify; /* if support verify */
 	bool zoned; /* if device is zoned */
 	bool virt_boundary; /* virtual boundary on/off for the device */
 };
-- 
2.29.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH V2 6/6] scsi: sd: add support for REQ_OP_VERIFY
  2022-07-13  7:20 [PATCH V2 0/6] block: add support for REQ_OP_VERIFY Chaitanya Kulkarni
                   ` (4 preceding siblings ...)
  2022-07-13  7:20 ` [PATCH V2 5/6] null_blk: add REQ_OP_VERIFY support Chaitanya Kulkarni
@ 2022-07-13  7:20 ` Chaitanya Kulkarni
  2022-07-13 12:19 ` [PATCH V2 0/6] block: " Matthew Wilcox
  6 siblings, 0 replies; 9+ messages in thread
From: Chaitanya Kulkarni @ 2022-07-13  7:20 UTC (permalink / raw)
  To: linux-block, linux-scsi, linux-nvme, linux-xfs, linux-fsdevel
  Cc: axboe, agk, song, djwong, kbusch, hch, sagi, jejb,
	martin.petersen, viro, javier, johannes.thumshirn, bvanassche,
	dongli.zhang, ming.lei, willy, jefflexu, josef, clm, dsterba,
	jack, tytso, adilger.kernel, jlayton, idryomov, danil.kipnis,
	ebiggers, jinpu.wang, Chaitanya Kulkarni

Add support in the sd driver to handle the REQ_OP_VERIFY req_op and map
it onto VERIFY (16) or VERIFY (10). If the SCSI VERIFY (16) command is
not supported, fall back to VERIFY (10). Tested with scsi_debug.
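
A minimal sketch of exercising this path, assuming a scsi_debug disk and
that the new attribute sits next to the existing max_write_same_blocks
attribute (device names and sysfs paths below are illustrative):

  modprobe scsi_debug
  # check which VERIFY variant was detected during disk revalidation
  # (printed at debug level)
  dmesg | grep -i 'verify1[06] supported'
  # inspect the advertised verify limit, in logical blocks
  cat /sys/class/scsi_disk/*/max_verify_blocks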

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
---
 drivers/scsi/sd.c | 124 ++++++++++++++++++++++++++++++++++++++++++++++
 drivers/scsi/sd.h |   5 ++
 2 files changed, 129 insertions(+)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index eb02d939dd44..8ba8bdd78ebd 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -101,6 +101,7 @@ MODULE_ALIAS_SCSI_DEVICE(TYPE_ZBC);
 
 static void sd_config_discard(struct scsi_disk *, unsigned int);
 static void sd_config_write_same(struct scsi_disk *);
+static void sd_config_verify(struct scsi_disk *sdkp);
 static int  sd_revalidate_disk(struct gendisk *);
 static void sd_unlock_native_capacity(struct gendisk *disk);
 static int  sd_probe(struct device *);
@@ -519,6 +520,43 @@ max_write_same_blocks_store(struct device *dev, struct device_attribute *attr,
 }
 static DEVICE_ATTR_RW(max_write_same_blocks);
 
+static ssize_t
+max_verify_blocks_show(struct device *dev, struct device_attribute *attr,
+		       char *buf)
+{
+	struct scsi_disk *sdkp = to_scsi_disk(dev);
+
+	return sprintf(buf, "%u\n", sdkp->max_verify_blocks);
+}
+
+static ssize_t
+max_verify_blocks_store(struct device *dev, struct device_attribute *attr,
+			const char *buf, size_t count)
+{
+	struct scsi_disk *sdkp = to_scsi_disk(dev);
+	struct scsi_device *sdp = sdkp->device;
+	unsigned long max;
+	int err;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EACCES;
+
+	if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC)
+		return -EINVAL;
+
+	err = kstrtoul(buf, 10, &max);
+
+	if (err)
+		return err;
+
+	sdkp->max_verify_blocks = max;
+
+	sd_config_verify(sdkp);
+
+	return count;
+}
+static DEVICE_ATTR_RW(max_verify_blocks);
+
 static ssize_t
 zoned_cap_show(struct device *dev, struct device_attribute *attr, char *buf)
 {
@@ -579,6 +617,7 @@ static struct attribute *sd_disk_attrs[] = {
 	&dev_attr_provisioning_mode.attr,
 	&dev_attr_zeroing_mode.attr,
 	&dev_attr_max_write_same_blocks.attr,
+	&dev_attr_max_verify_blocks.attr,
 	&dev_attr_max_medium_access_timeouts.attr,
 	&dev_attr_zoned_cap.attr,
 	&dev_attr_max_retries.attr,
@@ -1018,6 +1057,68 @@ static void sd_config_write_same(struct scsi_disk *sdkp)
 					 (logical_block_size >> 9));
 }
 
+static blk_status_t sd_setup_verify16_cmnd(struct scsi_cmnd *cmd, u64 lba,
+					   u32 nr_blocks)
+{
+	cmd->cmd_len = 16;
+	cmd->cmnd[0] = VERIFY_16;
+	put_unaligned_be64(lba, &cmd->cmnd[2]);
+	put_unaligned_be32(nr_blocks, &cmd->cmnd[10]);
+	cmd->cmnd[14] = 0;
+	cmd->cmnd[15] = 0;
+
+	return BLK_STS_OK;
+}
+
+static blk_status_t sd_setup_verify10_cmnd(struct scsi_cmnd *cmd, u64 lba,
+					   u32 nr_blocks)
+{
+	if (lba > 0xffffffff || nr_blocks > 0xffff)
+		return BLK_STS_NOTSUPP;
+
+	cmd->cmd_len = 10;
+	cmd->cmnd[0] = VERIFY;
+	put_unaligned_be32((u32)lba, &cmd->cmnd[2]);
+	put_unaligned_be16((u16)nr_blocks, &cmd->cmnd[7]);
+	cmd->cmnd[9] = 0;
+
+	return BLK_STS_OK;
+}
+
+static blk_status_t sd_setup_verify_cmnd(struct scsi_cmnd *cmd)
+{
+	struct request *rq = scsi_cmd_to_rq(cmd);
+	struct scsi_disk *sdkp = scsi_disk(rq->q->disk);
+	struct scsi_device *sdp = cmd->device;
+	u64 lba = sectors_to_logical(sdp, blk_rq_pos(rq));
+	u32 nr_blocks = sectors_to_logical(sdp, blk_rq_sectors(rq));
+
+	if (!sdkp->verify16 && !sdkp->verify10)
+		goto out;
+
+	cmd->allowed = sdkp->max_retries;
+	cmd->sc_data_direction = DMA_NONE;
+	cmd->transfersize = 0;
+	/* skip vrprotect / dpo / bytchk */
+	cmd->cmnd[1] = 0;
+
+	if (sdkp->verify16)
+		return sd_setup_verify16_cmnd(cmd, lba, nr_blocks);
+	if (sdkp->verify10)
+		return sd_setup_verify10_cmnd(cmd, lba, nr_blocks);
+out:
+	return BLK_STS_TARGET;
+}
+
+static void sd_config_verify(struct scsi_disk *sdkp)
+{
+	unsigned int max_verify_sectors = sdkp->max_verify_blocks;
+	unsigned int logical_bs = sdkp->device->sector_size;
+	struct request_queue *q = sdkp->disk->queue;
+
+	blk_queue_max_verify_sectors(q, max_verify_sectors * (logical_bs >> 9));
+}
+
 static blk_status_t sd_setup_flush_cmnd(struct scsi_cmnd *cmd)
 {
 	struct request *rq = scsi_cmd_to_rq(cmd);
@@ -1244,6 +1345,8 @@ static blk_status_t sd_init_command(struct scsi_cmnd *cmd)
 		}
 	case REQ_OP_WRITE_ZEROES:
 		return sd_setup_write_zeroes_cmnd(cmd);
+	case REQ_OP_VERIFY:
+		return sd_setup_verify_cmnd(cmd);
 	case REQ_OP_FLUSH:
 		return sd_setup_flush_cmnd(cmd);
 	case REQ_OP_READ:
@@ -1935,6 +2038,7 @@ static int sd_done(struct scsi_cmnd *SCpnt)
 	switch (req_op(req)) {
 	case REQ_OP_DISCARD:
 	case REQ_OP_WRITE_ZEROES:
+	case REQ_OP_VERIFY:
 	case REQ_OP_ZONE_RESET:
 	case REQ_OP_ZONE_RESET_ALL:
 	case REQ_OP_ZONE_OPEN:
@@ -3021,6 +3125,24 @@ static void sd_read_write_same(struct scsi_disk *sdkp, unsigned char *buffer)
 		sdkp->ws10 = 1;
 }
 
+static void sd_read_verify(struct scsi_disk *sdkp, unsigned char *buffer)
+{
+	struct scsi_device *sdev = sdkp->device;
+
+	if (scsi_report_opcode(sdev, buffer, SD_BUF_SIZE, VERIFY_16) == 1) {
+		sd_printk(KERN_DEBUG, sdkp, "VERIFY16 supported\n");
+		sdkp->verify16 = 1;
+		sdkp->max_verify_blocks = SD_MAX_VERIFY16_BLOCKS;
+		return;
+	}
+
+	if (scsi_report_opcode(sdev, buffer, SD_BUF_SIZE, VERIFY) == 1) {
+		sd_printk(KERN_DEBUG, sdkp, "VERIFY10 supported\n");
+		sdkp->verify10 = 1;
+		sdkp->max_verify_blocks = SD_MAX_VERIFY10_BLOCKS;
+	}
+}
+
 static void sd_read_security(struct scsi_disk *sdkp, unsigned char *buffer)
 {
 	struct scsi_device *sdev = sdkp->device;
@@ -3264,6 +3386,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
 		sd_read_cache_type(sdkp, buffer);
 		sd_read_app_tag_own(sdkp, buffer);
 		sd_read_write_same(sdkp, buffer);
+		sd_read_verify(sdkp, buffer);
 		sd_read_security(sdkp, buffer);
 		sd_config_protection(sdkp);
 	}
@@ -3312,6 +3435,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
 
 	set_capacity_and_notify(disk, logical_to_sectors(sdp, sdkp->capacity));
 	sd_config_write_same(sdkp);
+	sd_config_verify(sdkp);
 	kfree(buffer);
 
 	/*
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 5eea762f84d1..249100e2ea1f 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -49,6 +49,8 @@ enum {
 	SD_MAX_XFER_BLOCKS = 0xffffffff,
 	SD_MAX_WS10_BLOCKS = 0xffff,
 	SD_MAX_WS16_BLOCKS = 0x7fffff,
+	SD_MAX_VERIFY10_BLOCKS = 0xffff,
+	SD_MAX_VERIFY16_BLOCKS = 0xffffff,
 };
 
 enum {
@@ -118,6 +120,7 @@ struct scsi_disk {
 	u32		max_xfer_blocks;
 	u32		opt_xfer_blocks;
 	u32		max_ws_blocks;
+	u32		max_verify_blocks;
 	u32		max_unmap_blocks;
 	u32		unmap_granularity;
 	u32		unmap_alignment;
@@ -145,6 +148,8 @@ struct scsi_disk {
 	unsigned	lbpvpd : 1;
 	unsigned	ws10 : 1;
 	unsigned	ws16 : 1;
+	unsigned        verify10 : 1;
+	unsigned        verify16 : 1;
 	unsigned	rc_basis: 2;
 	unsigned	zoned: 2;
 	unsigned	urswrz : 1;
-- 
2.29.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH V2 0/6] block: add support for REQ_OP_VERIFY
  2022-07-13  7:20 [PATCH V2 0/6] block: add support for REQ_OP_VERIFY Chaitanya Kulkarni
                   ` (5 preceding siblings ...)
  2022-07-13  7:20 ` [PATCH V2 6/6] scsi: sd: add support for REQ_OP_VERIFY Chaitanya Kulkarni
@ 2022-07-13 12:19 ` Matthew Wilcox
  2022-07-13 15:43   ` Chaitanya Kulkarni
  6 siblings, 1 reply; 9+ messages in thread
From: Matthew Wilcox @ 2022-07-13 12:19 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: linux-block, linux-scsi, linux-nvme, linux-xfs, linux-fsdevel,
	axboe, agk, song, djwong, kbusch, hch, sagi, jejb,
	martin.petersen, viro, javier, johannes.thumshirn, bvanassche,
	dongli.zhang, ming.lei, jefflexu, josef, clm, dsterba, jack,
	tytso, adilger.kernel, jlayton, idryomov, danil.kipnis, ebiggers,
	jinpu.wang

On Wed, Jul 13, 2022 at 12:20:13AM -0700, Chaitanya Kulkarni wrote:
> Hi,
> 
> One of the responsibilities of the Operating System, along with managing
> resources, is to provide a unified interface to the user by creating
> hardware abstractions. In the Linux Kernel storage stack that
> abstraction is created by implementing the generic request operations
> such as REQ_OP_READ/REQ_OP_WRITE or REQ_OP_DISCARD/REQ_OP_WRITE_ZEROES,
> etc that are mapped to the specific low-level hardware protocol commands 
> e.g. SCSI or NVMe.

Still NAK, see previous thread for reasons.  Somewhat disappointing that
you sent a new version instead of addressing those comments first.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH V2 0/6] block: add support for REQ_OP_VERIFY
  2022-07-13 12:19 ` [PATCH V2 0/6] block: " Matthew Wilcox
@ 2022-07-13 15:43   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 9+ messages in thread
From: Chaitanya Kulkarni @ 2022-07-13 15:43 UTC (permalink / raw)
  To: Matthew Wilcox, Chaitanya Kulkarni
  Cc: linux-block, linux-scsi, linux-nvme, linux-xfs, linux-fsdevel,
	axboe, agk, song, djwong, kbusch, hch, sagi, jejb,
	martin.petersen, viro, javier, johannes.thumshirn, bvanassche,
	dongli.zhang, ming.lei, jefflexu, josef, clm, dsterba, jack,
	tytso, adilger.kernel, jlayton, idryomov, danil.kipnis, ebiggers,
	jinpu.wang

On 7/13/2022 5:19 AM, Matthew Wilcox wrote:
> On Wed, Jul 13, 2022 at 12:20:13AM -0700, Chaitanya Kulkarni wrote:
>> Hi,
>>
>> One of the responsibilities of the Operating System, along with managing
>> resources, is to provide a unified interface to the user by creating
>> hardware abstractions. In the Linux Kernel storage stack that
>> abstraction is created by implementing the generic request operations
>> such as REQ_OP_READ/REQ_OP_WRITE or REQ_OP_DISCARD/REQ_OP_WRITE_ZEROES,
>> etc that are mapped to the specific low-level hardware protocol commands
>> e.g. SCSI or NVMe.
> 
> Still NAK, see previous thread for reasons.  Somewhat disappointing that
> you sent a new version instead of addressing those comments first.

I've also asked you exactly what you expect, as I did not understand
your first reply.

I've added the fixes to this version based on the other comments I've
received, as I think they still need to be reviewed.

I can see your reply from earlier today, so I'll continue the discussion
on that thread.

-ck



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-07-13 15:43 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-13  7:20 [PATCH V2 0/6] block: add support for REQ_OP_VERIFY Chaitanya Kulkarni
2022-07-13  7:20 ` [PATCH V2 1/6] " Chaitanya Kulkarni
2022-07-13  7:20 ` [PATCH V2 2/6] nvme: add support for the Verify command Chaitanya Kulkarni
2022-07-13  7:20 ` [PATCH V2 3/6] nvmet: add Verify command support for bdev-ns Chaitanya Kulkarni
2022-07-13  7:20 ` [PATCH V2 4/6] nvmet: add Verify emulation " Chaitanya Kulkarni
2022-07-13  7:20 ` [PATCH V2 5/6] null_blk: add REQ_OP_VERIFY support Chaitanya Kulkarni
2022-07-13  7:20 ` [PATCH V2 6/6] scsi: sd: add support for REQ_OP_VERIFY Chaitanya Kulkarni
2022-07-13 12:19 ` [PATCH V2 0/6] block: " Matthew Wilcox
2022-07-13 15:43   ` Chaitanya Kulkarni
