From: Kevin Wolf <kwolf@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Cc: fam@euphon.net, qemu-block@nongnu.org, qemu-devel@nongnu.org,
	mreitz@redhat.com, stefanha@redhat.com, den@openvz.org
Subject: Re: [PATCH 0/4] fix & merge block_status_above and is_allocated_above
Date: Tue, 19 Nov 2019 13:05:52 +0100
Message-ID: <20191119120552.GB5910@linux.fritz.box>
In-Reply-To: <20191116163410.12129-1-vsementsov@virtuozzo.com>

On 16.11.2019 at 17:34, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> I wanted to understand the real difference between
> bdrv_block_status_above and bdrv_is_allocated_above. IMHO
> bdrv_is_allocated_above should work through bdrv_block_status_above.
> 
> And I found a problem: bdrv_is_allocated_above considers space after
> EOF as UNALLOCATED for intermediate nodes.
> 
> UNALLOCATED is not about allocation at the fs level, but about whether
> we should go to the backing file or not. And that seems incorrect to me,
> since in the case of a short backing file we read zeroes after EOF
> instead of going further down the backing chain.

We actually have documentation for what it means:

 * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
 *                       layer rather than any backing, set by block layer

Say we have a short overlay, like this:

base.qcow2:     AAAAAAAA
overlay.qcow2:  BBBB

Then of course the content of block 5 (the one after EOF of
overlay.qcow2) is still determined by overlay.qcow2, which can be easily
verified by reading it from overlay.qcow2 (produces zeros) and from
base.qcow2 (produces As).

So the correct result when querying the block status of block 5 on
overlay.qcow2 is BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_ZERO.
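
To make the short-overlay example concrete, here is a toy model of it. This
is not QEMU code: the struct, the toy_* names and the flag values are made
up for illustration, and blocks are 0-indexed, so "block 5" is index 4.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag bits; names and values are assumptions for this sketch. */
enum {
    TOY_BLOCK_DATA      = 0x01,
    TOY_BLOCK_ZERO      = 0x02,
    TOY_BLOCK_ALLOCATED = 0x10,
};

/* Hypothetical image layer: one char per block, '.' = not in this layer. */
struct toy_image {
    const char *blocks;
    size_t len;                       /* number of blocks before EOF */
    const struct toy_image *backing;  /* NULL for the base image */
};

/* Status of one block as reported by this layer alone. */
static int toy_status(const struct toy_image *img, size_t block)
{
    assert(img != NULL);
    if (block >= img->len) {
        /* Past EOF: this layer determines the content (zeros). */
        return TOY_BLOCK_ALLOCATED | TOY_BLOCK_ZERO;
    }
    if (img->blocks[block] != '.') {
        return TOY_BLOCK_ALLOCATED | TOY_BLOCK_DATA;
    }
    return 0; /* unallocated: defer to the backing file */
}

/* Read one block through the chain, honouring toy_status() at each layer. */
static char toy_read(const struct toy_image *img, size_t block)
{
    for (; img != NULL; img = img->backing) {
        int st = toy_status(img, block);
        if (st & TOY_BLOCK_ZERO) {
            return 0;
        }
        if (st & TOY_BLOCK_DATA) {
            return img->blocks[block];
        }
    }
    return 0; /* nothing in the whole chain: reads as zero */
}
```

With base = { "AAAAAAAA", 8, NULL } and overlay = { "BBBB", 4, &base },
toy_read(&overlay, 4) yields 0 while toy_read(&base, 4) yields 'A', which is
the difference that makes the block ALLOCATED in the overlay.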

Interestingly, we already fixed the opposite case (large overlay over
short backing file) in commit e88ae2264d9 from May 2014 according to the
same logic.

> This leads to the following effect:
> 
> ./qemu-img create -f qcow2 base.qcow2 2M
> ./qemu-io -c "write -P 0x1 0 2M" base.qcow2
> 
> ./qemu-img create -f qcow2 -b base.qcow2 mid.qcow2 1M
> ./qemu-img create -f qcow2 -b mid.qcow2 top.qcow2 2M
> 
> Region 1M..2M is shadowed by the short middle image, so the guest sees zeroes:
> ./qemu-io -c "read -P 0 1M 1M" top.qcow2
> read 1048576/1048576 bytes at offset 1048576
> 1 MiB, 1 ops; 00.00 sec (22.795 GiB/sec and 23341.5807 ops/sec)
> 
> But after commit, the guest-visible state is changed, which seems wrong to me:
> ./qemu-img commit top.qcow2 -b mid.qcow2
> 
> ./qemu-io -c "read -P 0 1M 1M" mid.qcow2
> Pattern verification failed at offset 1048576, 1048576 bytes
> read 1048576/1048576 bytes at offset 1048576
> 1 MiB, 1 ops; 00.00 sec (4.981 GiB/sec and 5100.4794 ops/sec)
> 
> ./qemu-io -c "read -P 1 1M 1M" mid.qcow2
> read 1048576/1048576 bytes at offset 1048576
> 1 MiB, 1 ops; 00.00 sec (3.365 GiB/sec and 3446.1606 ops/sec)
> 
> 
> I don't know whether this is a real bug, because I don't know whether we
> support a backing file larger than its parent. Still, I'm not sure that
> this behavior of bdrv_is_allocated_above doesn't lead to other problems.

I agree it's a bug.

Your fix doesn't look right to me, though. You leave the buggy behaviour
of bdrv_co_block_status() as it is and then add four patches to work
around it in some (but not all) callers of it.

All it should take to fix this is making the bs->backing check
independent of want_zero and letting it set ALLOCATED. What I expected
would be something like the patch below.

But it doesn't seem to fully fix the problem (though 'alloc 1M 1M' in
qemu-io shows that the range is now considered allocated), so probably
there is still a separate bug in bdrv_is_allocated_above().
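
The decision logic that the patch at the end of this mail is aiming for can
be sketched as a standalone function. This is only a model under stated
assumptions: the toy_* names, the flag values and the flattened parameters
(has_backing and backing_len standing in for bs->backing and the result of
bdrv_getlength()) are hypothetical and not QEMU API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits; names and values are assumptions for this sketch. */
enum {
    TOY_BLOCK_DATA      = 0x01,
    TOY_BLOCK_ZERO      = 0x02,
    TOY_BLOCK_ALLOCATED = 0x10,
};

/*
 * Post-process the driver's status `ret` the way the patched branch does.
 * backing_len < 0 models a failed length query on the backing file.
 */
static int toy_fixup_status(int ret, bool want_zero, bool unalloc_is_zero,
                            bool has_backing, int64_t backing_len,
                            int64_t offset)
{
    assert(offset >= 0);

    if (ret & (TOY_BLOCK_DATA | TOY_BLOCK_ZERO)) {
        ret |= TOY_BLOCK_ALLOCATED;
    } else if (want_zero && unalloc_is_zero) {
        ret |= TOY_BLOCK_ZERO;
    } else if (has_backing) {
        if (backing_len >= 0 && offset >= backing_len) {
            /* Beyond EOF of the backing file: this layer decides. */
            if (want_zero) {
                ret |= TOY_BLOCK_ZERO;
            }
            /* Set regardless of want_zero; this is the point of the fix. */
            ret |= TOY_BLOCK_ALLOCATED;
        }
    }
    return ret;
}
```

For a query at offset 1M against a 1M backing file, the model reports
ALLOCATED even with want_zero == false, whereas the pre-patch structure only
reached the backing check when want_zero was set.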

And I think we'll want an iotests case for each of the two scenarios
(short overlay, short backing file).

Kevin


diff --git a/block/io.c b/block/io.c
index f75777f5ea..5eafcff01a 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2359,16 +2359,17 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
 
     if (ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO)) {
         ret |= BDRV_BLOCK_ALLOCATED;
-    } else if (want_zero) {
-        if (bdrv_unallocated_blocks_are_zero(bs)) {
-            ret |= BDRV_BLOCK_ZERO;
-        } else if (bs->backing) {
-            BlockDriverState *bs2 = bs->backing->bs;
-            int64_t size2 = bdrv_getlength(bs2);
-
-            if (size2 >= 0 && offset >= size2) {
+    } else if (want_zero && bdrv_unallocated_blocks_are_zero(bs)) {
+        ret |= BDRV_BLOCK_ZERO;
+    } else if (bs->backing) {
+        BlockDriverState *bs2 = bs->backing->bs;
+        int64_t size2 = bdrv_getlength(bs2);
+
+        if (size2 >= 0 && offset >= size2) {
+            if (want_zero) {
                 ret |= BDRV_BLOCK_ZERO;
             }
+            ret |= BDRV_BLOCK_ALLOCATED;
         }
     }
 


