qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Max Reitz <mreitz@redhat.com>
To: Eric Blake <eblake@redhat.com>,
	Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>, Fam Zheng <fam@euphon.net>,
	"open list:Sheepdog" <sheepdog@lists.wpkg.org>,
	qemu-block@nongnu.org, Jeff Cody <codyprime@gmail.com>,
	Stefan Weil <sw@weilnetz.de>, Peter Lieven <pl@kamp.de>,
	"Richard W.M. Jones" <rjones@redhat.com>,
	Markus Armbruster <armbru@redhat.com>,
	david.edmondson@oracle.com, Stefan Hajnoczi <stefanha@redhat.com>,
	"Denis V. Lunev" <den@openvz.org>,
	Liu Yuan <namei.unix@gmail.com>,
	Jason Dillaman <dillaman@redhat.com>
Subject: Re: [PATCH 09/17] block: Refactor bdrv_has_zero_init{,_truncate}
Date: Wed, 5 Feb 2020 18:55:48 +0100	[thread overview]
Message-ID: <4f193e34-93bc-fd63-18cc-b9e0d9ee201d@redhat.com> (raw)
In-Reply-To: <945a62b8-6b50-70bf-bb7e-daacc00d5e43@redhat.com>


[-- Attachment #1.1: Type: text/plain, Size: 4798 bytes --]

On 05.02.20 15:07, Eric Blake wrote:
> On 2/5/20 1:51 AM, Vladimir Sementsov-Ogievskiy wrote:
> 
>>>>> +typedef enum {
>>>>> +    /*
>>>>> +     * bdrv_known_zeroes() should include this bit if the contents of
>>>>> +     * a freshly-created image with no backing file reads as all
>>>>> +     * zeroes without any additional effort.  If .bdrv_co_truncate is
>>>>> +     * set, then this must be clear if BDRV_ZERO_TRUNCATE is clear.
>>>>
>>>> I understand that this is preexisting logic, but could I ask: why?
>>>> What's wrong
>>>> if driver can guarantee that created file is all-zero, but is not sure
>>>> about
>>>> file resizing? I agree that it's normal for these flags to have the
>>>> same
>>>> value,
>>>> but what is the reason for this restriction?..
>>>
>>> If areas added by truncation (or growth, rather) are always zero, then
>>> the file can always be created with size 0 and grown from there.  Thus,
>>> images where truncation adds zeroed areas will generally always be zero
>>> after creation.
>>
>> This means, that if truncation bit is set, than create bit should be
>> set.. But
>> here we say that if truncation is clear, than create bit must be clear.
> 
> Max, did we get the logic backwards?

Or maybe my explanation was just wrong.

Because nobody actually forces a driver to use truncate to ensure that
an newly created file will be 0.  Hm.  And more importantly, you can’t
use truncate with PREALLOC_MODE_OFF when you want to create an image
with preallocation.

Let’s see.  The offending commit message says:

> No .bdrv_has_zero_init() implementation returns 1 if growing the file
> would add non-zero areas (at least with PREALLOC_MODE_OFF), so using it
> in lieu of this new function was always safe.
> 
> But on the other hand, it is possible that growing an image that is not
> zero-initialized would still add a zero-initialized area, like when
> using nonpreallocating truncation on a preallocated image.  For callers
> that care only about truncation, not about creation with potential
> preallocation, this new function is useful.

So I suppose the explanation is just the preallocation mode alone;
has_zero_init() is for the image’s actual preallocation mode, whereas
has_zero_init_truncate() is forced to PREALLOC_MODE_OFF.  As such, the
latter is less strict than the former.  So the former cannot be true
when the latter is false.

>>>> So, the only possible combination of flags, when they differs, is
>>>> create=0 and
>>>> truncate=1.. How is it possible?
>>>
>>> For preallocated qcow2 images, it depends on the storage whether they
>>> are actually 0 after creation.  Hence qcow2_has_zero_init() then defers
>>> to bdrv_has_zero_init() of s->data_file->bs.
>>>
>>> But when you truncate them (with PREALLOC_MODE_OFF, as
>>> BlockDriver.bdrv_has_zero_init_truncate()’s comment explains), the new
>>> area is always going to be 0, regardless of initial preallocation.
>>
>> ah yes, due to qcow2 zero clusters.
> 
> Hmm. Do we actually set the zero flag on unallocated clusters when
> resizing a qcow2 image?

No.  They are just unallocated, i.e. zero.  (Nodes with backing files
never return true for bdrv_has_zero_init_truncate anyway).

> That would be an O(n) operation (we have to
> visit the L2 entry for each added cluster, even if only to set the zero
> cluster bit).  Or do we instead just rely on the fact that qcow2 is
> inherently sparse, and that when you resize the guest-visible size
> without writing any new clusters, then it is only subsequent guest
> access to those addresses that finally allocate clusters, making resize
> O(1) (update the qcow2 metadata cluster, but not any L2 tables) while
> still reading 0 from the new data.  To some extent, that's what the
> allocation mode is supposed to control.
> 
> What about with external data images, where a resize in guest-visible
> length requires a resize of the underlying data image?  There, we DO
> have to worry about whether the data image resizes with zeroes (as in
> the filesystem) or with random data (as in a block device).

Well, partially: Namely, only with data_file_raw.  Because otherwise the
clusters are still unallocated and thus read as zero.  So yes, then we
do have to worry about that.

With data_file_raw, we have an obligation to make the data file return
the same data as the qcow2 file, so, um.  I wonder whether we actually
take any care of this yet.  If you have some external data file without
zero_init(_truncate), do get zeroes when reading from the qcow2 node,
but non-zeroes when reading from the raw data file?  That would be OK
without data_file_raw, but not with it.  I suppose I’ll have to test it.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  parent reply	other threads:[~2020-02-05 17:57 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-31 17:44 [PATCH 00/17] Improve qcow2 all-zero detection Eric Blake
2020-01-31 17:44 ` [PATCH 01/17] qcow2: Comment typo fixes Eric Blake
2020-02-04 14:12   ` Vladimir Sementsov-Ogievskiy
2020-02-09 19:34   ` Alberto Garcia
2020-01-31 17:44 ` [PATCH 02/17] qcow2: List autoclear bit names in header Eric Blake
2020-02-04 14:26   ` Vladimir Sementsov-Ogievskiy
2020-01-31 17:44 ` [PATCH 03/17] qcow2: Avoid feature name extension on small cluster size Eric Blake
2020-02-04 14:39   ` Vladimir Sementsov-Ogievskiy
2020-02-09 19:28   ` Alberto Garcia
2020-01-31 17:44 ` [PATCH 04/17] block: Improve documentation of .bdrv_has_zero_init Eric Blake
2020-02-04 15:03   ` Vladimir Sementsov-Ogievskiy
2020-02-04 15:16     ` Eric Blake
2020-01-31 17:44 ` [PATCH 05/17] block: Don't advertise zero_init_truncate with encryption Eric Blake
2020-02-10 18:12   ` Alberto Garcia
2020-01-31 17:44 ` [PATCH 06/17] block: Improve bdrv_has_zero_init_truncate with backing file Eric Blake
2020-02-10 18:13   ` Alberto Garcia
2020-01-31 17:44 ` [PATCH 07/17] gluster: Drop useless has_zero_init callback Eric Blake
2020-02-04 15:06   ` Vladimir Sementsov-Ogievskiy
2020-02-10 18:21   ` Alberto Garcia
2020-02-17  8:06   ` [GEDI] " Niels de Vos
2020-02-17 12:03     ` Eric Blake
2020-02-17 12:22       ` Eric Blake
2020-02-17 14:01       ` Niels de Vos
2020-01-31 17:44 ` [PATCH 08/17] sheepdog: Consistently set bdrv_has_zero_init_truncate Eric Blake
2020-02-04 15:09   ` Vladimir Sementsov-Ogievskiy
2020-01-31 17:44 ` [PATCH 09/17] block: Refactor bdrv_has_zero_init{,_truncate} Eric Blake
2020-02-04 15:35   ` Vladimir Sementsov-Ogievskiy
2020-02-04 15:49     ` Eric Blake
2020-02-04 16:07       ` Vladimir Sementsov-Ogievskiy
2020-02-04 17:42     ` Max Reitz
2020-02-04 17:51       ` Eric Blake
2020-02-05 16:43         ` Max Reitz
2020-02-05  7:51       ` Vladimir Sementsov-Ogievskiy
2020-02-05 14:07         ` Eric Blake
2020-02-05 14:25           ` Vladimir Sementsov-Ogievskiy
2020-02-05 14:36             ` Eric Blake
2020-02-05 17:55           ` Max Reitz [this message]
2020-02-04 17:53   ` Max Reitz
2020-02-04 19:03     ` Eric Blake
2020-02-05 17:22       ` Max Reitz
2020-02-05 18:39         ` Eric Blake
2020-02-06  9:18           ` Max Reitz
2020-01-31 17:44 ` [PATCH 10/17] block: Add new BDRV_ZERO_OPEN flag Eric Blake
2020-01-31 18:03   ` Eric Blake
2020-02-04 17:34   ` Max Reitz
2020-02-04 17:50     ` Eric Blake
2020-02-05  8:39       ` Vladimir Sementsov-Ogievskiy
2020-02-05 17:26       ` Max Reitz
2020-01-31 17:44 ` [PATCH 11/17] file-posix: Support BDRV_ZERO_OPEN Eric Blake
2020-01-31 17:44 ` [PATCH 12/17] gluster: " Eric Blake
2020-02-17  8:16   ` [GEDI] " Niels de Vos
2020-01-31 17:44 ` [PATCH 13/17] qcow2: Add new autoclear feature for all zero image Eric Blake
2020-02-03 17:45   ` Vladimir Sementsov-Ogievskiy
2020-02-04 13:12     ` Eric Blake
2020-02-04 13:29       ` Vladimir Sementsov-Ogievskiy
2020-01-31 17:44 ` [PATCH 14/17] qcow2: Expose all zero bit through .bdrv_known_zeroes Eric Blake
2020-01-31 17:44 ` [PATCH 15/17] qcow2: Implement all-zero autoclear bit Eric Blake
2020-01-31 17:44 ` [PATCH 16/17] iotests: Add new test for qcow2 all-zero bit Eric Blake
2020-01-31 17:44 ` [PATCH 17/17] qcow2: Let qemu-img check cover " Eric Blake
2020-02-04 17:32 ` [PATCH 00/17] Improve qcow2 all-zero detection Max Reitz
2020-02-04 18:53   ` Eric Blake
2020-02-05 17:04     ` Max Reitz
2020-02-05 19:21       ` Eric Blake
2020-02-06  9:12         ` Max Reitz
2020-02-05  9:04 ` Vladimir Sementsov-Ogievskiy
2020-02-05  9:25   ` Vladimir Sementsov-Ogievskiy
2020-02-05 14:26     ` Eric Blake
2020-02-05 14:47       ` Vladimir Sementsov-Ogievskiy
2020-02-05 15:14         ` Vladimir Sementsov-Ogievskiy
2020-02-05 17:58           ` Max Reitz
2020-02-05 14:22   ` Eric Blake
2020-02-05 14:43     ` Vladimir Sementsov-Ogievskiy
2020-02-05 14:58       ` Vladimir Sementsov-Ogievskiy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4f193e34-93bc-fd63-18cc-b9e0d9ee201d@redhat.com \
    --to=mreitz@redhat.com \
    --cc=armbru@redhat.com \
    --cc=codyprime@gmail.com \
    --cc=david.edmondson@oracle.com \
    --cc=den@openvz.org \
    --cc=dillaman@redhat.com \
    --cc=eblake@redhat.com \
    --cc=fam@euphon.net \
    --cc=kwolf@redhat.com \
    --cc=namei.unix@gmail.com \
    --cc=pl@kamp.de \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=rjones@redhat.com \
    --cc=sheepdog@lists.wpkg.org \
    --cc=stefanha@redhat.com \
    --cc=sw@weilnetz.de \
    --cc=vsementsov@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).