All of lore.kernel.org
 help / color / mirror / Atom feed
From: Max Reitz <mreitz@redhat.com>
To: Eric Blake <eblake@redhat.com>, Kevin Wolf <kwolf@redhat.com>
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org,
	John Snow <jsnow@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v9 01/13] qcow2: Unallocate unmapped zero clusters if no backing file
Date: Fri, 12 May 2017 18:06:45 +0200	[thread overview]
Message-ID: <7171308e-833e-38a0-8775-9b5a52b8a105@redhat.com> (raw)
In-Reply-To: <161300e8-fefb-f2c6-09cd-afd16d14bca0@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 4674 bytes --]

On 2017-05-11 16:56, Eric Blake wrote:
> [revisiting this older patch version, even though the final version in
> today's pull request changed somewhat from this approach]
> 
> On 04/12/2017 04:49 AM, Kevin Wolf wrote:
>> Am 11.04.2017 um 03:17 hat Eric Blake geschrieben:
>>> 'qemu-img map' already coalesces information about unallocated
>>> clusters (those with status 0) and pure zero clusters (those
>>> with status BDRV_BLOCK_ZERO and no offset).  Furthermore, all
>>> qcow2 images with no backing file already report all unallocated
>>> clusters (in the preallocation sense of clusters with no offset)
>>> as BDRV_BLOCK_ZERO, regardless of whether the QCOW_OFLAG_ZERO was
>>> set in that L2 entry (QCOW_OFLAG_ZERO also implies a return of
>>> BDRV_BLOCK_ALLOCATED, but we intentionally do not expose that bit
>>> to external users), thanks to generic block layer code in
>>> bdrv_co_get_block_status().
>>>
>>> So, for an image with no backing file, having bdrv_pwrite_zeroes
>>> mark clusters as unallocated (defer to backing file) rather than
>>> reads-as-zero (regardless of backing file) makes no difference
>>> to normal behavior, but may potentially allow for fewer writes to
>>> the L2 table of a freshly-created image where the L2 table is
>>> initially written to all-zeroes (although I don't actually know
>>> if we skip an L2 update and flush when re-writing the same
>>> contents as already present).
>>
>> I don't get this. Allocating a cluster always involves an L2 update, no
>> matter whether it was previously unallocated or a zero cluster.
> 
> On IRC, John, Kevin, and I were discussing the current situation with
> libvirt NBD storage migration.  When libvirt creates a file on the
> destination (rather than the user pre-creating it), it currently
> defaults to 0.10 [v2] images, even if the source was a 1.1 image [v3]
> (arguably something that could be improved in libvirt, but
> https://bugzilla.redhat.com/show_bug.cgi?id=1371749 was closed as not a
> bug).
> 
> Therefore, the use case of doing a mirror job to a v2 image, and having
> that image become thick even though the source was thin, is happening
> more than we'd like
> (https://bugzilla.redhat.com/show_bug.cgi?id=1371749).  While Kevin had
> a point that in the common case we ALWAYS want to turn an unallocated
> cluster into a zero cluster (so that we don't have to audit whether all
> callers are properly accounting for the case where a backing image is
> added later), our conversation on IRC today conceded that we may want to
> introduce a new BDRV_REQ_READ_ZERO_NOP (or some such name) that
> particular callers can use to request that if a cluster already reads as
> zeroes, the write zero request does NOT have to change it.  Normal guest
> operations would not use the flag, but mirroring IS a case that would
> use the flag, so that we can end up with thinner mirrors even to 0.10
> images.
> 
> The other consideration is that on 0.10 images, even if we have to
> allocate, right now our allocation is done by way of failing with
> -ENOTSUP and falling back to the normal pwrite() of explicit zeroes.  It
> may be worth teaching the qcow2 layer to explicitly handle write zeroes,
> even on 0.10 images, by allocating a cluster (as needed) but then
> telling bs->file to write zeroes (punching a hole as appropriate) so
> that the file is still thin.  In fact, it matches the fact that we
> already have code that probes whether a qcow2 cluster that reports
> BDRV_BLOCK_DATA|BDRV_BLOCK_OFFSET_VALID then probes bs->file to see if
> there is a hole there, at which point it can add BDRV_BLOCK_ZERO to the
> bdrv_get_block_status.
> 
> I don't know which one of us will tackle patches along these lines, but
> wanted to at least capture the gist of the IRC conversation in the
> archives for a bit more permanence.

Just short ideas:

(1) I do consider it a bug if v2 images are created. The BZ wasn't
closed for a very good reason, but because nobody answered this question:

> is this really a problem?

And apparently it is.

(2) There is a way of creating zero clusters on v2 images, but we've
never done it (yet): You create a single data cluster containing zeroes
and then you just point to it whenever you need a zero cluster (until
its refcount overflows, at which point you allocate another cluster).
Maybe that helps.

I'd be really careful with optimizations for v2 images. You have already
proven to me that my fears of too much complexity are out of proportions
sometimes, but still. v2 upstream is old legacy and should be treated as
such, at least IMHO.

Max



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 498 bytes --]

  parent reply	other threads:[~2017-05-12 16:07 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-11  1:17 [Qemu-devel] [PATCH for-2.10 v9 00/13] add blkdebug tests Eric Blake
2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 01/13] qcow2: Unallocate unmapped zero clusters if no backing file Eric Blake
2017-04-12  9:49   ` Kevin Wolf
2017-04-12 13:32     ` Eric Blake
2017-04-21 18:42     ` Eric Blake
2017-05-11 14:56     ` Eric Blake
2017-05-11 15:18       ` Eric Blake
2017-05-12 16:06       ` Max Reitz [this message]
2017-05-12 23:00         ` John Snow
2017-05-15 18:35           ` Max Reitz
2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 02/13] iotests: Add test 179 to cover write zeroes with unmap Eric Blake
2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 03/13] qemu-io: Switch 'alloc' command to byte-based length Eric Blake
2017-04-11  2:37   ` Philippe Mathieu-Daudé
2017-04-11 12:11     ` Eric Blake
2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 04/13] qemu-io: Switch 'map' output to byte-based reporting Eric Blake
2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 05/13] qcow2: Optimize write zero of unaligned tail cluster Eric Blake
2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 06/13] qcow2: Assert that cluster operations are aligned Eric Blake
2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 07/13] qcow2: Discard/zero clusters by byte count Eric Blake
2017-04-11 22:12   ` Eric Blake
2017-04-11 22:15   ` [Qemu-devel] [PATCH v9.5 07/13] fixup! " Eric Blake
2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 08/13] blkdebug: Sanity check block layer guarantees Eric Blake
2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 09/13] blkdebug: Refactor error injection Eric Blake
2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 10/13] blkdebug: Add pass-through write_zero and discard support Eric Blake
2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 11/13] blkdebug: Simplify override logic Eric Blake
2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 12/13] blkdebug: Add ability to override unmap geometries Eric Blake
2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 13/13] tests: Add coverage for recent block geometry fixes Eric Blake

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7171308e-833e-38a0-8775-9b5a52b8a105@redhat.com \
    --to=mreitz@redhat.com \
    --cc=eblake@redhat.com \
    --cc=jsnow@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.