All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
To: Eric Blake <eblake@redhat.com>,
	"nbd@other.debian.org" <nbd@other.debian.org>
Cc: "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"qemu-block@nongnu.org" <qemu-block@nongnu.org>,
	"libguestfs@redhat.com" <libguestfs@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 1/1] protocol: Add NBD_CMD_FLAG_FAST_ZERO
Date: Wed, 28 Aug 2019 13:45:08 +0000	[thread overview]
Message-ID: <3b8fd5cb-4c9d-801e-61e0-3a620e758cb7@virtuozzo.com> (raw)
In-Reply-To: <810779d1-0289-d635-0446-93b3dd32ec95@redhat.com>

28.08.2019 16:04, Eric Blake wrote:
> On 8/28/19 4:57 AM, Vladimir Sementsov-Ogievskiy wrote:
> 
>>> Hence, it is desirable to have a way for clients to specify that a
>>> particular write zero request is being attempted for a fast wipe, and
>>> get an immediate failure if the zero request would otherwise take the
>>> same time as a write.  Conversely, if the client is not performing a
>>> pre-initialization pass, it is still more efficient in terms of
>>> networking traffic to send NBD_CMD_WRITE_ZERO requests where the
>>> server implements the fallback to the slower write, than it is for the
>>> client to have to perform the fallback to send NBD_CMD_WRITE with a
>>> zeroed buffer.
>>
>> How are you going to finally use it in qemu-img convert?
> 
> It's already in use there (in fact, the cover letter shows actual timing
> examples of how qemu-img's use of BDRV_REQ_NO_FALLBACK, which translates
> to NBD_CMD_FLAG_FAST_ZERO, observably affects timing).
> 
>> Ok, we have a loop
>> of sending write-zero requests. And on first ENOTSUP we'll assume that there
>> is no benefit to continue? But what if actually server returns ENOTSUP only
>> once when we have 1000 iterations? Seems we should still do zeroing if we
>> have only a few ENOTSUPs...
> 
> When attempting a bulk zero, you try to wipe the entire device,
> presumably with something that is large and aligned.  Yes, if you have
> to split the write zero request due to size limitations, then you risk
> that the first write zero succeeds but later ones fail, then you didn't
> wipe the entire disk, but you also don't need to redo the work on the
> first half of the image.  But it is much more likely that the first
> write of the bulk zero is representative of the overall operation (and
> so in practice, it only takes one fast zero attempt to learn if bulk
> zeroing is worthwhile, then continue with fast zeroes without issues).
> 
>>
>> I understand that fail-on-first ENOTSUP is OK for raw-without-fallocte vs qcow2,
>> as first will always return ENOTSUP and second will never fail.. But in such way
>> we'll OK with simpler extension, which only have one server-advirtised negotiation
>> flag NBD_FLAG_ZERO_IS_FAST.
> 
> Advertising that a server's zero behavior is always going to be
> successfully fast is a much harder flag to implement.  The justification
> for the semantics I chose (advertising that the server can quickly
> report failure if success is not fast, but not requiring fast zero)
> covers the case when the decision of whether a zero is fast may also
> depend on other factors - for example, if the server knows the image
> starts in an all-zero state, then it can track a boolean: all write zero
> requests while the boolean is set return immediate success (nothing to
> do), but after the first regular write, the boolean is set to false, and
> all further write zero requests fail as being potentially slow; and such
> an implementation is still useful for the qemu-img convert case.

Agreed, thanks for this example)

> 
>>
>> There is not such problem if we have only one iteration, so may be new command
>> FILL_ZERO, filling the whole device by zeros?
> 
> Or better yet, implement support for 64-bit commands.  Yes, my cover
> letter called out further orthogonal extensions, and implementing 64-bit
> zeroing (so that you can issue a write zero request over the entire
> image in one command), as well as a way for a server to advertise when
> the image begins life in an all-zero state, are also further extensions
> coming down the pipeline.  But as not all servers have to implement all
> of the extensions, each extension that can be orthogonally implemented
> and show an improvement on its own is still worthwhile; and my cover
> letter has shown that fast zeroes on their own make a measurable
> difference to certain workloads.
> 
>>> +    If the server advertised `NBD_FLAG_SEND_FAST_ZERO` but
>>> +    `NBD_CMD_FLAG_FAST_ZERO` is not set, then the server MUST NOT fail
>>> +    with `NBD_ENOTSUP`, even if the operation is no faster than a
>>> +    corresponding `NBD_CMD_WRITE`. Conversely, if
>>> +    `NBD_CMD_FLAG_FAST_ZERO` is set, the server MUST fail quickly with
>>> +    `NBD_ENOTSUP` unless the request can be serviced in less time than
>>> +    a corresponding `NBD_CMD_WRITE`, and SHOULD NOT alter the contents
>>> +    of the export when returning this failure. The server's
>>> +    determination of a fast request MAY depend on a number of factors,
>>> +    such as whether the request was suitably aligned, on whether the
>>> +    `NBD_CMD_FLAG_NO_HOLE` flag was present, or even on whether a
>>> +    previous `NBD_CMD_TRIM` had been performed on the region.  If the
>>> +    server did not advertise `NBD_FLAG_SEND_FAST_ZERO`, then it SHOULD
>>> +    NOT fail with `NBD_ENOTSUP`, regardless of the speed of servicing
>>> +    a request, and SHOULD fail with `NBD_EINVAL` if the
>>> +    `NBD_CMD_FLAG_FAST_ZERO` flag was set. A server MAY advertise
>>> +    `NBD_FLAG_SEND_FAST_ZERO` whether or not it can perform fast
>>> +    zeroing; similarly, a server SHOULD fail with `NBD_ENOTSUP` when
>>> +    the flag is set if the server cannot quickly determine in advance
>>> +    whether that request would have been fast, even if it turns out
>>> +    that the same request without the flag would be fast after all.
>>> +
>>
>> What if WRITE_ZERO in the average is faster than WRITE (for example by 20%),
>> but server never can guarantee performance for one WRITE_ZERO operation, do
>> you restrict such case? Hmm, OK, SHOULD is not MUST actually..
> 
> I think my followup mail, based on Wouter's questions, covers this: the

Hmm, OK, sorry for duplication

> goal is to document the use case of optimizing the copy of a sparse
> image, by probing whether a bulk pre-zeroing pass is worthwhile.  That
> should be the measuring rod - if the implementation can perform a faster
> sparse copy because of write zeroes that are sometimes, but not always,
> faster than writes, in spite of the duplicated I/O that happens to the
> data portions of the image that were touched twice by the pre-zero pass
> then the actual data pass, then succeeding on fast zero requests is
> okay.  But if it makes the overall image copy slower, then failing with
> ENOTSUP is probably better.  And at the end of the day, it is really
> just a heuristic - if the server guessed wrong, the worst that happens
> is slower performance (and not data corruption).
> 

OK, I understand, than it's all sounds good

-- 
Best regards,
Vladimir

  reply	other threads:[~2019-08-28 13:46 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-23 14:30 [Qemu-devel] cross-project patches: Add NBD Fast Zero support Eric Blake
2019-08-23 14:34 ` [Qemu-devel] [PATCH 0/1] NBD protocol change to add fast zero support Eric Blake
2019-08-23 14:34   ` [Qemu-devel] [PATCH 1/1] protocol: Add NBD_CMD_FLAG_FAST_ZERO Eric Blake
2019-08-23 18:48     ` Wouter Verhelst
2019-08-23 18:58       ` Eric Blake
2019-08-24  6:44         ` Wouter Verhelst
2019-08-28  9:57     ` Vladimir Sementsov-Ogievskiy
2019-08-28 13:04       ` Eric Blake
2019-08-28 13:45         ` Vladimir Sementsov-Ogievskiy [this message]
2019-09-03 20:53   ` [Qemu-devel] [Libguestfs] [PATCH 0/1] NBD protocol change to add fast zero support Eric Blake
2019-08-23 14:37 ` [Qemu-devel] [PATCH 0/5] Add NBD fast zero support to qemu client and server Eric Blake
2019-08-23 14:37   ` [Qemu-devel] [PATCH 1/5] nbd: Improve per-export flag handling in server Eric Blake
2019-08-30 18:00     ` Vladimir Sementsov-Ogievskiy
2019-08-30 23:10       ` Eric Blake
2019-08-30 23:32         ` Eric Blake
2019-09-03 16:39           ` Eric Blake
2019-09-04 17:08     ` Vladimir Sementsov-Ogievskiy
2019-08-23 14:37   ` [Qemu-devel] [PATCH 2/5] nbd: Prepare for NBD_CMD_FLAG_FAST_ZERO Eric Blake
2019-08-30 18:07     ` Vladimir Sementsov-Ogievskiy
2019-08-30 23:37       ` Eric Blake
2019-08-31  8:11         ` Vladimir Sementsov-Ogievskiy
2019-09-03 18:49       ` Eric Blake
2019-08-31  8:20     ` Vladimir Sementsov-Ogievskiy
2019-08-23 14:37   ` [Qemu-devel] [PATCH 3/5] nbd: Implement client use of NBD FAST_ZERO Eric Blake
2019-08-30 18:11     ` Vladimir Sementsov-Ogievskiy
2019-08-23 14:37   ` [Qemu-devel] [PATCH 4/5] nbd: Implement server " Eric Blake
2019-08-30 18:40     ` Vladimir Sementsov-Ogievskiy
2019-08-23 14:37   ` [Qemu-devel] [PATCH 5/5] nbd: Tolerate more errors to structured reply request Eric Blake
2019-08-23 16:41     ` Eric Blake
2019-08-28 13:55   ` [Qemu-devel] [PATCH 0/5] Add NBD fast zero support to qemu client and server Vladimir Sementsov-Ogievskiy
2019-08-28 14:05     ` Eric Blake
2019-08-23 14:38 ` [Qemu-devel] [libnbd PATCH 0/1] libnbd support for new fast zero Eric Blake
2019-08-23 14:38   ` [Qemu-devel] [libnbd PATCH 1/1] api: Add support for FAST_ZERO flag Eric Blake
2019-08-27 12:25     ` [Qemu-devel] [Libguestfs] " Richard W.M. Jones
2019-08-23 14:40 ` [Qemu-devel] [nbdkit PATCH 0/3] nbdkit support for new NBD fast zero Eric Blake
2019-08-23 14:40   ` [Qemu-devel] [nbdkit PATCH 1/3] server: Add internal support for NBDKIT_FLAG_FAST_ZERO Eric Blake
2019-08-23 14:40   ` [Qemu-devel] [nbdkit PATCH 2/3] filters: Add .can_fast_zero hook Eric Blake
2019-08-23 14:40   ` [Qemu-devel] [nbdkit PATCH 3/3] plugins: " Eric Blake
2019-08-23 21:16     ` [Qemu-devel] [Libguestfs] " Eric Blake
2019-08-27 15:43     ` Richard W.M. Jones
2019-08-23 15:05 ` [Qemu-devel] cross-project patches: Add NBD Fast Zero support Vladimir Sementsov-Ogievskiy
2019-08-27 12:14 ` [Qemu-devel] [Libguestfs] " Richard W.M. Jones
2019-08-27 13:23   ` Eric Blake

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3b8fd5cb-4c9d-801e-61e0-3a620e758cb7@virtuozzo.com \
    --to=vsementsov@virtuozzo.com \
    --cc=eblake@redhat.com \
    --cc=libguestfs@redhat.com \
    --cc=nbd@other.debian.org \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.