From: Ilya Dryomov <idryomov@gmail.com>
To: Peter Lieven <pl@kamp.de>
Cc: "Kevin Wolf" <kwolf@redhat.com>,
"Daniel P. Berrangé" <berrange@redhat.com>,
qemu-block@nongnu.org, ct@flyingcircus.io, qemu-devel@nongnu.org,
"Paolo Bonzini" <pbonzini@redhat.com>,
mreitz@redhat.com, "Jason Dillaman" <dillaman@redhat.com>
Subject: Re: [PATCH V4 5/6] block/rbd: add write zeroes support
Date: Fri, 2 Jul 2021 14:24:27 +0200
Message-ID: <CAOi1vP8pkgyquGggTMLKN3RirmFxQMxSe2PVa_JjJKMQddt-wA@mail.gmail.com>
In-Reply-To: <20210702090935.15300-6-pl@kamp.de>

On Fri, Jul 2, 2021 at 11:09 AM Peter Lieven <pl@kamp.de> wrote:
>
> This patch wittingly sets BDRV_REQ_NO_FALLBACK and silently ignores BDRV_REQ_MAY_UNMAP
> for older librbd versions.
>
> The rationale for this is as follows (citing Ilya Dryomov, current RBD maintainer):
> ---8<---
> a) remove the BDRV_REQ_MAY_UNMAP check in qemu_rbd_co_pwrite_zeroes()
> and as a consequence always unmap if librbd is too old
>
> It's not clear what qemu's expectation is but in general Write
> Zeroes is allowed to unmap. The only guarantee is that subsequent
> reads return zeroes, everything else is a hint. This is how it is
> specified in the kernel and in the NVMe spec.
>
> In particular, block/nvme.c implements it as follows:
>
>     if (flags & BDRV_REQ_MAY_UNMAP) {
>         cdw12 |= (1 << 25);
>     }
>
> This sets the Deallocate bit. But if it's not set, the device may
> still deallocate:
>
> """
> If the Deallocate bit (CDW12.DEAC) is set to '1' in a Write Zeroes
> command, and the namespace supports clearing all bytes to 0h in the
> values read (e.g., bits 2:0 in the DLFEAT field are set to 001b)
> from a deallocated logical block and its metadata (excluding
> protection information), then for each specified logical block, the
> controller:
> - should deallocate that logical block;
>
> ...
>
> If the Deallocate bit is cleared to '0' in a Write Zeroes command,
> and the namespace supports clearing all bytes to 0h in the values
> read (e.g., bits 2:0 in the DLFEAT field are set to 001b) from
> a deallocated logical block and its metadata (excluding protection
> information), then, for each specified logical block, the
> controller:
> - may deallocate that logical block;
> """
>
> https://nvmexpress.org/wp-content/uploads/NVM-Express-NVM-Command-Set-Specification-2021.06.02-Ratified-1.pdf
>
> b) set BDRV_REQ_NO_FALLBACK in supported_zero_flags
>
> Again, it's not clear what qemu expects here, but without it we end
> up in a ridiculous situation where specifying the "don't allow slow
> fallback" switch immediately fails all efficient zeroing requests on
> a device where Write Zeroes is always efficient:
>
>     $ qemu-io -c 'help write' | grep -- '-[zun]'
>      -n, -- with -z, don't allow slow fallback
>      -u, -- with -z, allow unmapping
>      -z, -- write zeroes using blk_co_pwrite_zeroes
>
>     $ qemu-io -f rbd -c 'write -z -u -n 0 1M' rbd:foo/bar
>     write failed: Operation not supported
> --->8---
>
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
> block/rbd.c | 32 +++++++++++++++++++++++++++++++-
> 1 file changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/block/rbd.c b/block/rbd.c
> index be0471944a..149317d33c 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -63,7 +63,8 @@ typedef enum {
>      RBD_AIO_READ,
>      RBD_AIO_WRITE,
>      RBD_AIO_DISCARD,
> -    RBD_AIO_FLUSH
> +    RBD_AIO_FLUSH,
> +    RBD_AIO_WRITE_ZEROES
>  } RBDAIOCmd;
>
> typedef struct BDRVRBDState {
> @@ -705,6 +706,10 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
>          }
>      }
>
> +#ifdef LIBRBD_SUPPORTS_WRITE_ZEROES
> +    bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK;
> +#endif
> +
>      /* When extending regular files, we get zeros from the OS */
>      bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
>
> @@ -827,6 +832,18 @@ static int coroutine_fn qemu_rbd_start_co(BlockDriverState *bs,
>      case RBD_AIO_FLUSH:
>          r = rbd_aio_flush(s->image, c);
>          break;
> +#ifdef LIBRBD_SUPPORTS_WRITE_ZEROES
> +    case RBD_AIO_WRITE_ZEROES: {
> +        int zero_flags = 0;
> +#ifdef RBD_WRITE_ZEROES_FLAG_THICK_PROVISION
> +        if (!(flags & BDRV_REQ_MAY_UNMAP)) {
> +            zero_flags = RBD_WRITE_ZEROES_FLAG_THICK_PROVISION;
> +        }
> +#endif
> +        r = rbd_aio_write_zeroes(s->image, offset, bytes, c, zero_flags, 0);
> +        break;
> +    }
> +#endif
>      default:
>          r = -EINVAL;
>      }
> @@ -897,6 +914,16 @@ static int coroutine_fn qemu_rbd_co_pdiscard(BlockDriverState *bs,
>      return qemu_rbd_start_co(bs, offset, count, NULL, 0, RBD_AIO_DISCARD);
>  }
>
> +#ifdef LIBRBD_SUPPORTS_WRITE_ZEROES
> +static int
> +coroutine_fn qemu_rbd_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
> +                                       int count, BdrvRequestFlags flags)
> +{
> +    return qemu_rbd_start_co(bs, offset, count, NULL, flags,
> +                             RBD_AIO_WRITE_ZEROES);
> +}
> +#endif
> +
>  static int qemu_rbd_getinfo(BlockDriverState *bs, BlockDriverInfo *bdi)
>  {
>      BDRVRBDState *s = bs->opaque;
> @@ -1120,6 +1147,9 @@ static BlockDriver bdrv_rbd = {
>      .bdrv_co_pwritev = qemu_rbd_co_pwritev,
>      .bdrv_co_flush_to_disk = qemu_rbd_co_flush,
>      .bdrv_co_pdiscard = qemu_rbd_co_pdiscard,
> +#ifdef LIBRBD_SUPPORTS_WRITE_ZEROES
> +    .bdrv_co_pwrite_zeroes = qemu_rbd_co_pwrite_zeroes,
> +#endif
>
>      .bdrv_snapshot_create = qemu_rbd_snap_create,
>      .bdrv_snapshot_delete = qemu_rbd_snap_remove,
> --
> 2.17.1
>
>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>

Thanks,

                Ilya