QEMU-Devel Archive on lore.kernel.org
 help / color / Atom feed
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
To: Eric Blake <eblake@redhat.com>,
	"nbd@other.debian.org" <nbd@other.debian.org>
Cc: "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"qemu-block@nongnu.org" <qemu-block@nongnu.org>,
	"libguestfs@redhat.com" <libguestfs@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 1/1] protocol: Add NBD_CMD_FLAG_FAST_ZERO
Date: Wed, 28 Aug 2019 13:45:08 +0000
Message-ID: <3b8fd5cb-4c9d-801e-61e0-3a620e758cb7@virtuozzo.com> (raw)
In-Reply-To: <810779d1-0289-d635-0446-93b3dd32ec95@redhat.com>

28.08.2019 16:04, Eric Blake wrote:
> On 8/28/19 4:57 AM, Vladimir Sementsov-Ogievskiy wrote:
> 
>>> Hence, it is desirable to have a way for clients to specify that a
>>> particular write zero request is being attempted for a fast wipe, and
>>> get an immediate failure if the zero request would otherwise take the
>>> same time as a write.  Conversely, if the client is not performing a
>>> pre-initialization pass, it is still more efficient in terms of
>>> networking traffic to send NBD_CMD_WRITE_ZERO requests where the
>>> server implements the fallback to the slower write, than it is for the
>>> client to have to perform the fallback to send NBD_CMD_WRITE with a
>>> zeroed buffer.
>>
>> How are you going to finally use it in qemu-img convert?
> 
> It's already in use there (in fact, the cover letter shows actual timing
> examples of how qemu-img's use of BDRV_REQ_NO_FALLBACK, which translates
> to NBD_CMD_FLAG_FAST_ZERO, observably affects timing).
> 
>> Ok, we have a loop
>> of sending write-zero requests. And on first ENOTSUP we'll assume that there
>> is no benefit to continue? But what if actually server returns ENOTSUP only
>> once when we have 1000 iterations? Seems we should still do zeroing if we
>> have only a few ENOTSUPs...
> 
> When attempting a bulk zero, you try to wipe the entire device,
> presumably with something that is large and aligned.  Yes, if you have
> to split the write zero request due to size limitations, then you risk
> that the first write zero succeeds but later ones fail, then you didn't
> wipe the entire disk, but you also don't need to redo the work on the
> first half of the image.  But it is much more likely that the first
> write of the bulk zero is representative of the overall operation (and
> so in practice, it only takes one fast zero attempt to learn if bulk
> zeroing is worthwhile, then continue with fast zeroes without issues).
> 
>>
>> I understand that fail-on-first ENOTSUP is OK for raw-without-fallocte vs qcow2,
>> as first will always return ENOTSUP and second will never fail.. But in such way
>> we'll OK with simpler extension, which only have one server-advirtised negotiation
>> flag NBD_FLAG_ZERO_IS_FAST.
> 
> Advertising that a server's zero behavior is always going to be
> successfully fast is a much harder flag to implement.  The justification
> for the semantics I chose (advertising that the server can quickly
> report failure if success is not fast, but not requiring fast zero)
> covers the case when the decision of whether a zero is fast may also
> depend on other factors - for example, if the server knows the image
> starts in an all-zero state, then it can track a boolean: all write zero
> requests while the boolean is set return immediate success (nothing to
> do), but after the first regular write, the boolean is set to false, and
> all further write zero requests fail as being potentially slow; and such
> an implementation is still useful for the qemu-img convert case.

Agreed, thanks for this example)

> 
>>
>> There is not such problem if we have only one iteration, so may be new command
>> FILL_ZERO, filling the whole device by zeros?
> 
> Or better yet, implement support for 64-bit commands.  Yes, my cover
> letter called out further orthogonal extensions, and implementing 64-bit
> zeroing (so that you can issue a write zero request over the entire
> image in one command), as well as a way for a server to advertise when
> the image begins life in an all-zero state, are also further extensions
> coming down the pipeline.  But as not all servers have to implement all
> of the extensions, each extension that can be orthogonally implemented
> and show an improvement on its own is still worthwhile; and my cover
> letter has shown that fast zeroes on their own make a measurable
> difference to certain workloads.
> 
>>> +    If the server advertised `NBD_FLAG_SEND_FAST_ZERO` but
>>> +    `NBD_CMD_FLAG_FAST_ZERO` is not set, then the server MUST NOT fail
>>> +    with `NBD_ENOTSUP`, even if the operation is no faster than a
>>> +    corresponding `NBD_CMD_WRITE`. Conversely, if
>>> +    `NBD_CMD_FLAG_FAST_ZERO` is set, the server MUST fail quickly with
>>> +    `NBD_ENOTSUP` unless the request can be serviced in less time than
>>> +    a corresponding `NBD_CMD_WRITE`, and SHOULD NOT alter the contents
>>> +    of the export when returning this failure. The server's
>>> +    determination of a fast request MAY depend on a number of factors,
>>> +    such as whether the request was suitably aligned, on whether the
>>> +    `NBD_CMD_FLAG_NO_HOLE` flag was present, or even on whether a
>>> +    previous `NBD_CMD_TRIM` had been performed on the region.  If the
>>> +    server did not advertise `NBD_FLAG_SEND_FAST_ZERO`, then it SHOULD
>>> +    NOT fail with `NBD_ENOTSUP`, regardless of the speed of servicing
>>> +    a request, and SHOULD fail with `NBD_EINVAL` if the
>>> +    `NBD_CMD_FLAG_FAST_ZERO` flag was set. A server MAY advertise
>>> +    `NBD_FLAG_SEND_FAST_ZERO` whether or not it can perform fast
>>> +    zeroing; similarly, a server SHOULD fail with `NBD_ENOTSUP` when
>>> +    the flag is set if the server cannot quickly determine in advance
>>> +    whether that request would have been fast, even if it turns out
>>> +    that the same request without the flag would be fast after all.
>>> +
>>
>> What if WRITE_ZERO in the average is faster than WRITE (for example by 20%),
>> but server never can guarantee performance for one WRITE_ZERO operation, do
>> you restrict such case? Hmm, OK, SHOULD is not MUST actually..
> 
> I think my followup mail, based on Wouter's questions, covers this: the

Hmm, OK, sorry for duplication

> goal is to document the use case of optimizing the copy of a sparse
> image, by probing whether a bulk pre-zeroing pass is worthwhile.  That
> should be the measuring rod - if the implementation can perform a faster
> sparse copy because of write zeroes that are sometimes, but not always,
> faster than writes, in spite of the duplicated I/O that happens to the
> data portions of the image that were touched twice by the pre-zero pass
> then the actual data pass, then succeeding on fast zero requests is
> okay.  But if it makes the overall image copy slower, then failing with
> ENOTSUP is probably better.  And at the end of the day, it is really
> just a heuristic - if the server guessed wrong, the worst that happens
> is slower performance (and not data corruption).
> 

OK, I understand, than it's all sounds good

-- 
Best regards,
Vladimir

  reply index

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-23 14:30 [Qemu-devel] cross-project patches: Add NBD Fast Zero support Eric Blake
2019-08-23 14:34 ` [Qemu-devel] [PATCH 0/1] NBD protocol change to add fast zero support Eric Blake
2019-08-23 14:34   ` [Qemu-devel] [PATCH 1/1] protocol: Add NBD_CMD_FLAG_FAST_ZERO Eric Blake
2019-08-23 18:48     ` Wouter Verhelst
2019-08-23 18:58       ` Eric Blake
2019-08-24  6:44         ` Wouter Verhelst
2019-08-28  9:57     ` Vladimir Sementsov-Ogievskiy
2019-08-28 13:04       ` Eric Blake
2019-08-28 13:45         ` Vladimir Sementsov-Ogievskiy [this message]
2019-09-03 20:53   ` [Qemu-devel] [Libguestfs] [PATCH 0/1] NBD protocol change to add fast zero support Eric Blake
2019-08-23 14:37 ` [Qemu-devel] [PATCH 0/5] Add NBD fast zero support to qemu client and server Eric Blake
2019-08-23 14:37   ` [Qemu-devel] [PATCH 1/5] nbd: Improve per-export flag handling in server Eric Blake
2019-08-30 18:00     ` Vladimir Sementsov-Ogievskiy
2019-08-30 23:10       ` Eric Blake
2019-08-30 23:32         ` Eric Blake
2019-09-03 16:39           ` Eric Blake
2019-09-04 17:08     ` Vladimir Sementsov-Ogievskiy
2019-08-23 14:37   ` [Qemu-devel] [PATCH 2/5] nbd: Prepare for NBD_CMD_FLAG_FAST_ZERO Eric Blake
2019-08-30 18:07     ` Vladimir Sementsov-Ogievskiy
2019-08-30 23:37       ` Eric Blake
2019-08-31  8:11         ` Vladimir Sementsov-Ogievskiy
2019-09-03 18:49       ` Eric Blake
2019-08-31  8:20     ` Vladimir Sementsov-Ogievskiy
2019-08-23 14:37   ` [Qemu-devel] [PATCH 3/5] nbd: Implement client use of NBD FAST_ZERO Eric Blake
2019-08-30 18:11     ` Vladimir Sementsov-Ogievskiy
2019-08-23 14:37   ` [Qemu-devel] [PATCH 4/5] nbd: Implement server " Eric Blake
2019-08-30 18:40     ` Vladimir Sementsov-Ogievskiy
2019-08-23 14:37   ` [Qemu-devel] [PATCH 5/5] nbd: Tolerate more errors to structured reply request Eric Blake
2019-08-23 16:41     ` Eric Blake
2019-08-28 13:55   ` [Qemu-devel] [PATCH 0/5] Add NBD fast zero support to qemu client and server Vladimir Sementsov-Ogievskiy
2019-08-28 14:05     ` Eric Blake
2019-08-23 14:38 ` [Qemu-devel] [libnbd PATCH 0/1] libnbd support for new fast zero Eric Blake
2019-08-23 14:38   ` [Qemu-devel] [libnbd PATCH 1/1] api: Add support for FAST_ZERO flag Eric Blake
2019-08-27 12:25     ` [Qemu-devel] [Libguestfs] " Richard W.M. Jones
2019-08-23 14:40 ` [Qemu-devel] [nbdkit PATCH 0/3] nbdkit support for new NBD fast zero Eric Blake
2019-08-23 14:40   ` [Qemu-devel] [nbdkit PATCH 1/3] server: Add internal support for NBDKIT_FLAG_FAST_ZERO Eric Blake
2019-08-23 14:40   ` [Qemu-devel] [nbdkit PATCH 2/3] filters: Add .can_fast_zero hook Eric Blake
2019-08-23 14:40   ` [Qemu-devel] [nbdkit PATCH 3/3] plugins: " Eric Blake
2019-08-23 21:16     ` [Qemu-devel] [Libguestfs] " Eric Blake
2019-08-27 15:43     ` Richard W.M. Jones
2019-08-23 15:05 ` [Qemu-devel] cross-project patches: Add NBD Fast Zero support Vladimir Sementsov-Ogievskiy
2019-08-27 12:14 ` [Qemu-devel] [Libguestfs] " Richard W.M. Jones
2019-08-27 13:23   ` Eric Blake

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3b8fd5cb-4c9d-801e-61e0-3a620e758cb7@virtuozzo.com \
    --to=vsementsov@virtuozzo.com \
    --cc=eblake@redhat.com \
    --cc=libguestfs@redhat.com \
    --cc=nbd@other.debian.org \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

QEMU-Devel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/qemu-devel/0 qemu-devel/git/0.git
	git clone --mirror https://lore.kernel.org/qemu-devel/1 qemu-devel/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 qemu-devel qemu-devel/ https://lore.kernel.org/qemu-devel \
		qemu-devel@nongnu.org qemu-devel@archiver.kernel.org
	public-inbox-index qemu-devel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.nongnu.qemu-devel


AGPL code for this site: git clone https://public-inbox.org/ public-inbox