From: Douglas Gilbert <dgilbert@interlog.com>
To: Chaitanya Kulkarni <chaitanyak@nvidia.com>,
linux-block@vger.kernel.org, linux-raid@vger.kernel.org,
linux-nvme@lists.infradead.org, linux-scsi@vger.kernel.org,
target-devel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-xfs@vger.kernel.org, dm-devel@redhat.com
Cc: axboe@kernel.dk, agk@redhat.com, snitzer@redhat.com,
song@kernel.org, djwong@kernel.org, kbusch@kernel.org,
hch@lst.de, sagi@grimberg.me, jejb@linux.ibm.com,
martin.petersen@oracle.com, viro@zeniv.linux.org.uk,
javier@javigon.com, johannes.thumshirn@wdc.com,
bvanassche@acm.org, dongli.zhang@oracle.com, ming.lei@redhat.com,
osandov@fb.com, willy@infradead.org, jefflexu@linux.alibaba.com,
josef@toxicpanda.com, clm@fb.com, dsterba@suse.com,
jack@suse.com, tytso@mit.edu, adilger.kernel@dilger.ca,
jlayton@kernel.org, idryomov@gmail.com,
danil.kipnis@cloud.ionos.com, ebiggers@google.com,
jinpu.wang@cloud.ionos.com, Chaitanya Kulkarni <kch@nvidia.com>
Subject: Re: [RFC PATCH 0/8] block: add support for REQ_OP_VERIFY
Date: Thu, 4 Nov 2021 11:16:41 -0400 [thread overview]
Message-ID: <7f734d14-c107-daa3-aaa8-0eda3c592add@interlog.com> (raw)
In-Reply-To: <20211104064634.4481-1-chaitanyak@nvidia.com>
On 2021-11-04 2:46 a.m., Chaitanya Kulkarni wrote:
> From: Chaitanya Kulkarni <kch@nvidia.com>
>
> Hi,
>
> One of the responsibilities of the Operating System, along with managing
> resources, is to provide a unified interface to the user by creating
> hardware abstractions. In the Linux Kernel storage stack that
> abstraction is created by implementing the generic request operations
> such as REQ_OP_READ/REQ_OP_WRITE or REQ_OP_DISCARD/REQ_OP_WRITE_ZEROES,
> etc that are mapped to the specific low-level hardware protocol commands
> e.g. SCSI or NVMe.
>
> With that in mind, this RFC patch-series implements a new block layer
> operation to offload the data verification on to the controller if
> supported or emulate the operation if not. The main advantage is to free
> up the CPU and reduce the host link traffic since, for some devices,
> their internal bandwidth is higher than the host link and offloading this
> operation can improve the performance of the proactive error detection
> applications such as file system level scrubbing.
>
> * Background *
> -----------------------------------------------------------------------
>
> NVMe Specification provides a controller level Verify command [1] which
> is similar to the ATA Verify [2] command where the controller is
> responsible for data verification without transferring the data to the
> host. (Offloading LBAs verification). This is designed to proactively
> discover any data corruption issues when the device is free so that
> applications can protect sensitive data and take corrective action
> instead of waiting for failure to occur.
>
> The NVMe Verify command is added in order to provide low level media
> scrubbing and possibly moving the data to the right place in case it has
> correctable media degradation. Also, this provides a way to enhance
> file-system level scrubbing/checksum verification and optinally offload
> this task, which is CPU intensive, to the kernel (when emulated), over
> the fabric, and to the controller (when supported).
>
> This is useful when the controller's internal bandwidth is higher than
> the host's bandwith showing a sharp increase in the performance due to
> _no host traffic or host CPU involvement_.
>
> * Implementation *
> -----------------------------------------------------------------------
>
> Right now there is no generic interface which can be used by the
> in-kernel components such as file-system or userspace application
> (except passthru commands or some combination of write/read/compare) to
> issue verify command with the central block layer API. This can lead to
> each userspace applications having protocol specific IOCTL which
> defeates the purpose of having the OS provide a H/W abstraction.
>
> This patch series introduces a new block layer payloadless request
> operation REQ_OP_VERIFY that allows in-kernel components & userspace
> applications to verify the range of the LBAs by offloading checksum
> scrubbing/verification to the controller that is directly attached to
> the host. For direct attached devices this leads to decrease in the host
> DMA traffic and CPU usage and for the fabrics attached device over the
> network that leads to a decrease in the network traffic and CPU usage
> for both host & target.
>
> * Scope *
> -----------------------------------------------------------------------
>
> Please note this only covers the operating system level overhead.
> Analyzing controller verify command performance for common protocols
> (SCSI/NVMe) is out of scope for REQ_OP_VERIFY.
>
> * Micro Benchmarks *
> -----------------------------------------------------------------------
>
> When verifing 500GB of data on NVMeOF with nvme-loop and null_blk as a
> target backend block device results show almost a 80% performance
> increase :-
>
> With Verify resulting in REQ_OP_VERIFY to null_blk :-
>
> real 2m3.773s
> user 0m0.000s
> sys 0m59.553s
>
> With Emulation resulting in REQ_OP_READ null_blk :-
>
> real 12m18.964s
> user 0m0.002s
> sys 1m15.666s
>
>
> A detailed test log is included at the end of the cover letter.
> Each of the following was tested:
>
> 1. Direct Attached REQ_OP_VERIFY.
> 2. Fabrics Attached REQ_OP_VERIFY.
> 3. Multi-device (md) REQ_OP_VERIFY.
>
> * The complete picture *
> -----------------------------------------------------------------------
>
> For the completeness the whole kernel stack support is divided into
> two phases :-
>
> Phase I :-
>
> Add and stabilize the support for the Block layer & low level drivers
> such as SCSI, NVMe, MD, and NVMeOF, implement necessary emulations in
> the block layer if needed and provide block level tools such as
> _blkverify_. Also, add appropriate testcases for code-coverage.
>
> Phase II :-
>
> Add and stabilize the support for upper layer kernel components such
> as file-systems and provide userspace tools such _fsverify_ to route
> the request from file systems to block layer to Low level device
> drivers.
>
>
> Please note that the interfaces for blk-lib.c REQ_OP_VERIFY emulation
> will change in future I put together for the scope of RFC.
>
> Any comments are welcome.
Hi,
You may also want to consider higher level support for the NVME COMPARE
and SCSI VERIFY(BYTCHK=1) commands. Since PCIe and SAS transports are
full duplex, replacing two READs (plus a memcmp in host memory) with
one READ and one COMPARE may be a win on a bandwidth constrained
system. It is a safe to assume the data-in transfers on a storage transport
exceed (probably by a significant margin) the data-out transfers. An
offloaded COMPARE switches one of those data-in transfers to a data-out
transfer, so it should improve the bandwidth utilization.
I did some brief benchmarking on a NVME SSD's COMPARE command (its optional)
and the results were underwhelming. OTOH using my own dd variants (which
can do compare instead of copy) and a scsi_debug target (i.e. RAM) I have
seen compare times of > 15 GBps while a copy rarely exceeds 9 GBps.
BTW The SCSI VERIFY(BYTCHK=3) command compares one block sent from
the host with a sequence of logical blocks on the media. So, for example,
it would be a quick way of checking that a sequence of blocks contained
zero-ed data.
Doug Gilbert
next prev parent reply other threads:[~2021-11-04 15:22 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-11-04 6:46 [RFC PATCH 0/8] block: add support for REQ_OP_VERIFY Chaitanya Kulkarni
2021-11-04 6:46 ` [RFC PATCH 1/8] " Chaitanya Kulkarni
2021-11-04 17:25 ` Darrick J. Wong
2021-11-11 8:01 ` Chaitanya Kulkarni
2021-11-04 6:46 ` [RFC PATCH 2/8] scsi: add REQ_OP_VERIFY support Chaitanya Kulkarni
2021-11-04 12:33 ` Damien Le Moal
2021-11-11 8:07 ` Chaitanya Kulkarni
2021-11-04 6:46 ` [RFC PATCH 3/8] nvme: add support for the Verify command Chaitanya Kulkarni
2021-11-04 22:44 ` Keith Busch
2021-11-11 8:09 ` Chaitanya Kulkarni
2021-11-04 6:46 ` [PATCH 4/8] nvmet: add Verify command support for bdev-ns Chaitanya Kulkarni
2021-11-04 6:46 ` [RFC PATCH 5/8] nvmet: add Verify emulation " Chaitanya Kulkarni
2021-11-04 6:46 ` [RFC PATCH 6/8] nvmet: add verify emulation support for file-ns Chaitanya Kulkarni
2021-11-04 6:46 ` [RFC PATCH 7/8] null_blk: add REQ_OP_VERIFY support Chaitanya Kulkarni
2021-11-04 6:46 ` [RFC PATCH 8/8] md: add support for REQ_OP_VERIFY Chaitanya Kulkarni
2021-11-11 8:13 ` Chaitanya Kulkarni
2021-11-12 15:19 ` Mike Snitzer
2021-11-04 7:14 ` [RFC PATCH 0/8] block: " Christoph Hellwig
2021-11-04 9:27 ` Chaitanya Kulkarni
2021-11-04 17:32 ` Darrick J. Wong
2021-11-04 17:34 ` Christoph Hellwig
2021-11-04 22:37 ` Keith Busch
2021-11-05 8:25 ` javier
2021-11-11 8:18 ` Chaitanya Kulkarni
2021-11-04 15:16 ` Douglas Gilbert [this message]
2021-11-11 8:27 ` Chaitanya Kulkarni
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7f734d14-c107-daa3-aaa8-0eda3c592add@interlog.com \
--to=dgilbert@interlog.com \
--cc=adilger.kernel@dilger.ca \
--cc=agk@redhat.com \
--cc=axboe@kernel.dk \
--cc=bvanassche@acm.org \
--cc=chaitanyak@nvidia.com \
--cc=clm@fb.com \
--cc=danil.kipnis@cloud.ionos.com \
--cc=djwong@kernel.org \
--cc=dm-devel@redhat.com \
--cc=dongli.zhang@oracle.com \
--cc=dsterba@suse.com \
--cc=ebiggers@google.com \
--cc=hch@lst.de \
--cc=idryomov@gmail.com \
--cc=jack@suse.com \
--cc=javier@javigon.com \
--cc=jefflexu@linux.alibaba.com \
--cc=jejb@linux.ibm.com \
--cc=jinpu.wang@cloud.ionos.com \
--cc=jlayton@kernel.org \
--cc=johannes.thumshirn@wdc.com \
--cc=josef@toxicpanda.com \
--cc=kbusch@kernel.org \
--cc=kch@nvidia.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-nvme@lists.infradead.org \
--cc=linux-raid@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=ming.lei@redhat.com \
--cc=osandov@fb.com \
--cc=sagi@grimberg.me \
--cc=snitzer@redhat.com \
--cc=song@kernel.org \
--cc=target-devel@vger.kernel.org \
--cc=tytso@mit.edu \
--cc=viro@zeniv.linux.org.uk \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).