* [Qemu-devel] [PATCH v3 0/2] mirror: fix improperly filled copy_bitmap for mirror block job
@ 2016-09-15 16:34 Denis V. Lunev
  2016-09-15 16:34 ` [Qemu-devel] [PATCH v3 1/2] block: sync bdrv_co_get_block_status_above() with bdrv_is_allocated_above() Denis V. Lunev
  2016-09-15 16:34 ` [Qemu-devel] [PATCH v3 2/2] mirror: fix improperly filled copy_bitmap for mirror block job Denis V. Lunev
  0 siblings, 2 replies; 12+ messages in thread
From: Denis V. Lunev @ 2016-09-15 16:34 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: den, Stefan Hajnoczi, Fam Zheng, Kevin Wolf, Max Reitz, Jeff Cody

bdrv_is_allocated_above() returns true even for completely zeroed areas,
because the BDRV_BLOCK_ALLOCATED flag is set both for zeroed areas and for
areas with data.

This series stops using the bdrv_is_allocated_above() wrapper and switches to
bdrv_get_block_status_above(), which distinguishes zeroed areas from areas
with data, so that extra I/O operations are avoided where possible.

This change requires some preparation in bdrv_get_block_status_above(),
which is done in patch 1.

Changes from v2:
- reworked patch 1 to properly hide data below short image
- fixed comment in patch 1
- fixed mask assignment in patch 2 to cover bdrv_is_zero_initialized() case

Changes from v1:
- fixed assert in 041 test case (added patch 1)
- fixed commit message
- fixed status check to be on the safe side

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Jeff Cody <jcody@redhat.com>

Denis V. Lunev (2):
  block: sync bdrv_co_get_block_status_above() with
    bdrv_is_allocated_above()
  mirror: fix improperly filled copy_bitmap for mirror block job

 block/io.c     | 25 ++++++++++++++++++++-----
 block/mirror.c | 17 +++++++++++------
 2 files changed, 31 insertions(+), 11 deletions(-)

-- 
2.7.4


* [Qemu-devel] [PATCH v3 1/2] block: sync bdrv_co_get_block_status_above() with bdrv_is_allocated_above()
  2016-09-15 16:34 [Qemu-devel] [PATCH v3 0/2] mirror: fix improperly filled copy_bitmap for mirror block job Denis V. Lunev
@ 2016-09-15 16:34 ` Denis V. Lunev
  2016-09-19  1:21   ` Fam Zheng
  2016-09-19 23:18   ` [Qemu-devel] " Max Reitz
  2016-09-15 16:34 ` [Qemu-devel] [PATCH v3 2/2] mirror: fix improperly filled copy_bitmap for mirror block job Denis V. Lunev
  1 sibling, 2 replies; 12+ messages in thread
From: Denis V. Lunev @ 2016-09-15 16:34 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: den, Stefan Hajnoczi, Fam Zheng, Kevin Wolf, Max Reitz, Jeff Cody

The two functions should behave very similarly, covering the same areas if a
backing store is shorter than the image. This change is necessary for the
follow-up patch, which switches mirror to bdrv_get_block_status_above(), to
avoid an assertion failure in check_block.

This change should be made very carefully. Let us assume that we have a
top image and two backing stores, L0->L1->L2.
  L0: --------------
  L1: -------
  L2: -------=======
The data marked as '=' in L2 should not appear as BDRV_BLOCK_ALLOCATED
and we should return it as filled in L0 image with properly calculated
*pnum value.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Jeff Cody <jcody@redhat.com>
---
 block/io.c | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/block/io.c b/block/io.c
index 420944d..067d465 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1741,18 +1741,33 @@ static int64_t coroutine_fn bdrv_co_get_block_status_above(BlockDriverState *bs,
         BlockDriverState **file)
 {
     BlockDriverState *p;
-    int64_t ret = 0;
+    int64_t ret = 0, res = nb_sectors;
 
     assert(bs != base);
     for (p = bs; p != base; p = backing_bs(p)) {
-        ret = bdrv_co_get_block_status(p, sector_num, nb_sectors, pnum, file);
-        if (ret < 0 || ret & BDRV_BLOCK_ALLOCATED) {
-            break;
+        int sc;
+        ret = bdrv_co_get_block_status(p, sector_num, nb_sectors, &sc, file);
+        if (ret < 0) {
+            return ret;
+        } else if (ret & BDRV_BLOCK_ALLOCATED) {
+            *pnum = sc;
+            return ret;
+        }
+
+        if (res > sc && (p == bs || sector_num + sc < p->total_sectors)) {
+            res = sc;
         }
+
         /* [sector_num, pnum] unallocated on this layer, which could be only
          * the first part of [sector_num, nb_sectors].  */
-        nb_sectors = MIN(nb_sectors, *pnum);
+        nb_sectors = MIN(nb_sectors, sc);
+
+        if (nb_sectors == 0) {
+            break;
+        }
     }
+
+    *pnum = res;
     return ret;
 }
 
-- 
2.7.4


* [Qemu-devel] [PATCH v3 2/2] mirror: fix improperly filled copy_bitmap for mirror block job
  2016-09-15 16:34 [Qemu-devel] [PATCH v3 0/2] mirror: fix improperly filled copy_bitmap for mirror block job Denis V. Lunev
  2016-09-15 16:34 ` [Qemu-devel] [PATCH v3 1/2] block: sync bdrv_co_get_block_status_above() with bdrv_is_allocated_above() Denis V. Lunev
@ 2016-09-15 16:34 ` Denis V. Lunev
  2016-09-15 17:19   ` Eric Blake
  2016-09-23 11:00   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 2 replies; 12+ messages in thread
From: Denis V. Lunev @ 2016-09-15 16:34 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: den, Stefan Hajnoczi, Fam Zheng, Kevin Wolf, Max Reitz, Jeff Cody

bdrv_is_allocated_above() returns true in the case even for completel
zeroed areas as BDRV_BLOCK_ALLOCATED flag is set in both cases.

The patch stops using bdrv_is_allocated_above() wrapper and switches to
bdrv_get_block_status_above() to distinguish zeroed areas and areas with
data to avoid extra IO operations if possible.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Jeff Cody <jcody@redhat.com>
---
 block/mirror.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index e0b3f41..2710f62 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -548,11 +548,11 @@ static void mirror_throttle(MirrorBlockJob *s)
 
 static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
 {
-    int64_t sector_num, end;
+    int64_t sector_num, end, alloc_mask;
     BlockDriverState *base = s->base;
     BlockDriverState *bs = blk_bs(s->common.blk);
     BlockDriverState *target_bs = blk_bs(s->target);
-    int ret, n;
+    int n;
 
     end = s->bdev_length / BDRV_SECTOR_SIZE;
 
@@ -585,11 +585,15 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
         mirror_drain(s);
     }
 
+    alloc_mask = base == NULL ? BDRV_BLOCK_ALLOCATED : BDRV_BLOCK_DATA;
+
     /* First part, loop on the sectors and initialize the dirty bitmap.  */
     for (sector_num = 0; sector_num < end; ) {
         /* Just to make sure we are not exceeding int limit. */
         int nb_sectors = MIN(INT_MAX >> BDRV_SECTOR_BITS,
                              end - sector_num);
+        int64_t status;
+        BlockDriverState *file;
 
         mirror_throttle(s);
 
@@ -597,13 +601,14 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
             return 0;
         }
 
-        ret = bdrv_is_allocated_above(bs, base, sector_num, nb_sectors, &n);
-        if (ret < 0) {
-            return ret;
+        status = bdrv_get_block_status_above(bs, base, sector_num,
+                                             nb_sectors, &n, &file);
+        if (status < 0) {
+            return status;
         }
 
         assert(n > 0);
-        if (ret == 1) {
+        if (status & alloc_mask) {
             bdrv_set_dirty_bitmap(s->dirty_bitmap, sector_num, n);
         }
         sector_num += n;
-- 
2.7.4


* Re: [Qemu-devel] [PATCH v3 2/2] mirror: fix improperly filled copy_bitmap for mirror block job
  2016-09-15 16:34 ` [Qemu-devel] [PATCH v3 2/2] mirror: fix improperly filled copy_bitmap for mirror block job Denis V. Lunev
@ 2016-09-15 17:19   ` Eric Blake
  2016-09-23 11:00   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 12+ messages in thread
From: Eric Blake @ 2016-09-15 17:19 UTC (permalink / raw)
  To: Denis V. Lunev, qemu-block, qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Jeff Cody, Max Reitz, Stefan Hajnoczi

On 09/15/2016 11:34 AM, Denis V. Lunev wrote:
> bdrv_is_allocated_above() returns true in the case even for completel

s/completel/completely/

> zeroed areas as BDRV_BLOCK_ALLOCATED flag is set in both cases.
> 
> The patch stops using bdrv_is_allocated_above() wrapper and switches to
> bdrv_get_block_status_above() to distinguish zeroed areas and areas with
> data to avoid extra IO operations if possible.
> 
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> CC: Fam Zheng <famz@redhat.com>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Max Reitz <mreitz@redhat.com>
> CC: Jeff Cody <jcody@redhat.com>
> ---
>  block/mirror.c | 17 +++++++++++------
>  1 file changed, 11 insertions(+), 6 deletions(-)

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH v3 1/2] block: sync bdrv_co_get_block_status_above() with bdrv_is_allocated_above()
  2016-09-15 16:34 ` [Qemu-devel] [PATCH v3 1/2] block: sync bdrv_co_get_block_status_above() with bdrv_is_allocated_above() Denis V. Lunev
@ 2016-09-19  1:21   ` Fam Zheng
  2016-09-19  4:37     ` Denis V. Lunev
  2016-09-19 23:18   ` [Qemu-devel] " Max Reitz
  1 sibling, 1 reply; 12+ messages in thread
From: Fam Zheng @ 2016-09-19  1:21 UTC (permalink / raw)
  To: Denis V. Lunev
  Cc: qemu-block, qemu-devel, Kevin Wolf, Jeff Cody, Max Reitz,
	Stefan Hajnoczi

On Thu, 09/15 19:34, Denis V. Lunev wrote:
> They should work very similar, covering same areas if backing store is
> shorter than the image. This change is necessary for the followup patch
> switching to bdrv_get_block_status_above() in mirror to avoid assert
> in check_block.
> 
> This change should be made very carefully. Let us assume that we have
> top image and 2 backing stores L0->L1->L2.

Stupid question: which one is top and which are backing?

>   L0: --------------
>   L1: -------
>   L2: -------=======
> The data marked as '=' in L2 should not appear as BDRV_BLOCK_ALLOCATED
> and we should return it as filled in L0 image with properly calculated
> *pnum value.

What '-', '=' and ' ' represent aren't immediately clear to me, could you put a
legend in the message too? Something like:

    '-': allocated
    '=': unallocated
    ' ': beyond EOF

> 
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> CC: Fam Zheng <famz@redhat.com>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Max Reitz <mreitz@redhat.com>
> CC: Jeff Cody <jcody@redhat.com>
> ---
>  block/io.c | 25 ++++++++++++++++++++-----
>  1 file changed, 20 insertions(+), 5 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index 420944d..067d465 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1741,18 +1741,33 @@ static int64_t coroutine_fn bdrv_co_get_block_status_above(BlockDriverState *bs,
>          BlockDriverState **file)
>  {
>      BlockDriverState *p;
> -    int64_t ret = 0;
> +    int64_t ret = 0, res = nb_sectors;
>  
>      assert(bs != base);
>      for (p = bs; p != base; p = backing_bs(p)) {
> -        ret = bdrv_co_get_block_status(p, sector_num, nb_sectors, pnum, file);
> -        if (ret < 0 || ret & BDRV_BLOCK_ALLOCATED) {
> -            break;
> +        int sc;
> +        ret = bdrv_co_get_block_status(p, sector_num, nb_sectors, &sc, file);
> +        if (ret < 0) {
> +            return ret;
> +        } else if (ret & BDRV_BLOCK_ALLOCATED) {
> +            *pnum = sc;
> +            return ret;
> +        }
> +
> +        if (res > sc && (p == bs || sector_num + sc < p->total_sectors)) {
> +            res = sc;
>          }
> +
>          /* [sector_num, pnum] unallocated on this layer, which could be only
>           * the first part of [sector_num, nb_sectors].  */
> -        nb_sectors = MIN(nb_sectors, *pnum);
> +        nb_sectors = MIN(nb_sectors, sc);
> +
> +        if (nb_sectors == 0) {
> +            break;
> +        }
>      }
> +
> +    *pnum = res;
>      return ret;
>  }
>  
> -- 
> 2.7.4
> 
> 


* Re: [Qemu-devel] [PATCH v3 1/2] block: sync bdrv_co_get_block_status_above() with bdrv_is_allocated_above()
  2016-09-19  1:21   ` Fam Zheng
@ 2016-09-19  4:37     ` Denis V. Lunev
  2016-09-19 20:39       ` Eric Blake
  0 siblings, 1 reply; 12+ messages in thread
From: Denis V. Lunev @ 2016-09-19  4:37 UTC (permalink / raw)
  To: Fam Zheng
  Cc: qemu-block, qemu-devel, Kevin Wolf, Jeff Cody, Max Reitz,
	Stefan Hajnoczi

On 09/19/2016 04:21 AM, Fam Zheng wrote:
> On Thu, 09/15 19:34, Denis V. Lunev wrote:
>> They should work very similar, covering same areas if backing store is
>> shorter than the image. This change is necessary for the followup patch
>> switching to bdrv_get_block_status_above() in mirror to avoid assert
>> in check_block.
>>
>> This change should be made very carefully. Let us assume that we have
>> top image and 2 backing stores L0->L1->L2.
> Stupid question: which one is top and which are backing?
L0 is top, L2 is at bottom.


>>   L0: --------------
>>   L1: -------
>>   L2: -------=======
>> The data marked as '=' in L2 should not appear as BDRV_BLOCK_ALLOCATED
>> and we should return it as filled in L0 image with properly calculated
>> *pnum value.
> What '-', '=' and ' ' represent aren't immediately clear to me, could you put a
> legend in the message too? Something like:
>
>     '-': allocated
>     '=': unallocated
>     ' ': beyond EOF
ok.

here '-' is unallocated
'=' is allocated.
The virtual size of the L1 image is shorter than that of L2 and L0, thus ' ' is beyond EOF.

Thank you, will rewrite today.

Den
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>> CC: Fam Zheng <famz@redhat.com>
>> CC: Kevin Wolf <kwolf@redhat.com>
>> CC: Max Reitz <mreitz@redhat.com>
>> CC: Jeff Cody <jcody@redhat.com>
>> ---
>>  block/io.c | 25 ++++++++++++++++++++-----
>>  1 file changed, 20 insertions(+), 5 deletions(-)
>>
>> diff --git a/block/io.c b/block/io.c
>> index 420944d..067d465 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -1741,18 +1741,33 @@ static int64_t coroutine_fn bdrv_co_get_block_status_above(BlockDriverState *bs,
>>          BlockDriverState **file)
>>  {
>>      BlockDriverState *p;
>> -    int64_t ret = 0;
>> +    int64_t ret = 0, res = nb_sectors;
>>  
>>      assert(bs != base);
>>      for (p = bs; p != base; p = backing_bs(p)) {
>> -        ret = bdrv_co_get_block_status(p, sector_num, nb_sectors, pnum, file);
>> -        if (ret < 0 || ret & BDRV_BLOCK_ALLOCATED) {
>> -            break;
>> +        int sc;
>> +        ret = bdrv_co_get_block_status(p, sector_num, nb_sectors, &sc, file);
>> +        if (ret < 0) {
>> +            return ret;
>> +        } else if (ret & BDRV_BLOCK_ALLOCATED) {
>> +            *pnum = sc;
>> +            return ret;
>> +        }
>> +
>> +        if (res > sc && (p == bs || sector_num + sc < p->total_sectors)) {
>> +            res = sc;
>>          }
>> +
>>          /* [sector_num, pnum] unallocated on this layer, which could be only
>>           * the first part of [sector_num, nb_sectors].  */
>> -        nb_sectors = MIN(nb_sectors, *pnum);
>> +        nb_sectors = MIN(nb_sectors, sc);
>> +
>> +        if (nb_sectors == 0) {
>> +            break;
>> +        }
>>      }
>> +
>> +    *pnum = res;
>>      return ret;
>>  }
>>  
>> -- 
>> 2.7.4
>>
>>


* Re: [Qemu-devel] [PATCH v3 1/2] block: sync bdrv_co_get_block_status_above() with bdrv_is_allocated_above()
  2016-09-19  4:37     ` Denis V. Lunev
@ 2016-09-19 20:39       ` Eric Blake
  2016-09-26 15:04         ` Kevin Wolf
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Blake @ 2016-09-19 20:39 UTC (permalink / raw)
  To: Denis V. Lunev, Fam Zheng
  Cc: Kevin Wolf, qemu-block, Jeff Cody, qemu-devel, Max Reitz,
	Stefan Hajnoczi

On 09/18/2016 11:37 PM, Denis V. Lunev wrote:
> On 09/19/2016 04:21 AM, Fam Zheng wrote:
>> On Thu, 09/15 19:34, Denis V. Lunev wrote:
>>> They should work very similar, covering same areas if backing store is
>>> shorter than the image. This change is necessary for the followup patch
>>> switching to bdrv_get_block_status_above() in mirror to avoid assert
>>> in check_block.
>>>
>>> This change should be made very carefully. Let us assume that we have
>>> top image and 2 backing stores L0->L1->L2.
>> Stupid question: which one is top and which are backing?
> L0 is top, L2 is at bottom.

I typically write this as:

L2 <- L1 <- L0

(read "L2 backs L1, which in turn backs L0") with the active on the
right.  So I understand the confusion in Fam's question where you were
using the opposite direction.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH v3 1/2] block: sync bdrv_co_get_block_status_above() with bdrv_is_allocated_above()
  2016-09-15 16:34 ` [Qemu-devel] [PATCH v3 1/2] block: sync bdrv_co_get_block_status_above() with bdrv_is_allocated_above() Denis V. Lunev
  2016-09-19  1:21   ` Fam Zheng
@ 2016-09-19 23:18   ` Max Reitz
  2016-09-20  6:13     ` Jeff Cody
  1 sibling, 1 reply; 12+ messages in thread
From: Max Reitz @ 2016-09-19 23:18 UTC (permalink / raw)
  To: Denis V. Lunev, qemu-block, qemu-devel
  Cc: Stefan Hajnoczi, Fam Zheng, Kevin Wolf, Jeff Cody

On 2016-09-15 at 18:34, Denis V. Lunev wrote:
> They should work very similar, covering same areas if backing store is
> shorter than the image. This change is necessary for the followup patch
> switching to bdrv_get_block_status_above() in mirror to avoid assert
> in check_block.
>
> This change should be made very carefully. Let us assume that we have
> top image and 2 backing stores L0->L1->L2.
>   L0: --------------
>   L1: -------
>   L2: -------=======
> The data marked as '=' in L2 should not appear as BDRV_BLOCK_ALLOCATED
> and we should return it as filled in L0 image with properly calculated
> *pnum value.
>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> CC: Fam Zheng <famz@redhat.com>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Max Reitz <mreitz@redhat.com>
> CC: Jeff Cody <jcody@redhat.com>
> ---
>  block/io.c | 25 ++++++++++++++++++++-----
>  1 file changed, 20 insertions(+), 5 deletions(-)
>
> diff --git a/block/io.c b/block/io.c
> index 420944d..067d465 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1741,18 +1741,33 @@ static int64_t coroutine_fn bdrv_co_get_block_status_above(BlockDriverState *bs,
>          BlockDriverState **file)
>  {
>      BlockDriverState *p;
> -    int64_t ret = 0;
> +    int64_t ret = 0, res = nb_sectors;

It's not wrong to make res an int64_t, but an int is sufficient.

>
>      assert(bs != base);
>      for (p = bs; p != base; p = backing_bs(p)) {
> -        ret = bdrv_co_get_block_status(p, sector_num, nb_sectors, pnum, file);
> -        if (ret < 0 || ret & BDRV_BLOCK_ALLOCATED) {
> -            break;
> +        int sc;
> +        ret = bdrv_co_get_block_status(p, sector_num, nb_sectors, &sc, file);
> +        if (ret < 0) {
> +            return ret;
> +        } else if (ret & BDRV_BLOCK_ALLOCATED) {
> +            *pnum = sc;
> +            return ret;
> +        }
> +
> +        if (res > sc && (p == bs || sector_num + sc < p->total_sectors)) {
> +            res = sc;

This definitely requires some comments because it took me a long time to 
figure out why we need "res" to be a separate variable from "nb_sectors" 
and why this condition is like it is.

So what I think this does is:

Basically, we want to return our final nb_sectors in *pnum. But we can 
have the constellation you noted in the commit message: A short 
intermediate layer, and the bottom layer has some data allocated beyond 
the end of that intermediate layer.

Now, when we pass through that intermediate layer, we need to shorten 
nb_sectors so that we don't query anything beyond the end of that 
intermediate layer because it doesn't matter anyway.

But we also want to remember that all of this area appears as 
unallocated to the top layer, so therefore we have to keep a second 
variable ("res") which retains this information.

Therefore, nb_sectors is always exactly the range we want to query, and 
"res" is the range we know to appear unallocated. This condition here 
tries to adjust "res" so that it conforms to that specification.

However, I'm not quite sure it actually does that. Let's take the case 
from your commit message:

L0: --------------
L1: -------
L2: -------=======

Let's say we invoke this function in the range [0, 14]. After passing 
through L0, res is 14 and nb_sectors is 14. After L1, res is still 14, 
but nb_sectors is 7. So far, so good.

But when passing through L2, "sc" will be 7 (and it will actually always 
be 7, regardless of what comes past sector 7, because nb_sectors is 7). 
Since L2 is larger than just 7 sectors, we will now reduce res to 7 as 
well (because sector_num + sc (= 0 + 7 = 7) < p->total_sectors (= 14)).

So therefore, we will set *pnum to 7. That doesn't seem too bad to me, 
but we could have achieved the same result by just setting *pnum to 
nb_sectors and not having to track the separate "res" variable.


Thus, I'm not quite sure what the point of this is. "res" will only be 
longer than "nb_sectors" as long as the layers get shorter or stay the 
same length when going downwards. As soon as one layer is longer than 
the one above it, "res" will probably be truncated to "sc" (which is 
going to be the same value as "nb_sectors", unless 
bdrv_co_get_block_status() returns a *pnum > nb_sectors).

I'm not sure whether I'm missing something here, though.
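
The walk-through above can be replayed on a toy model. The following is only a sketch, not QEMU code: Layer, layer_status() and demo_pnum() are hypothetical stand-ins, the flag value is illustrative, and it assumes that a per-layer status query is clamped at that layer's end of file. The loop body itself follows the v3 patch line by line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_ALLOCATED 0x10 /* illustrative stand-in for BDRV_BLOCK_ALLOCATED */
#define MAX_SECTORS 32

/* Hypothetical toy layer: one allocation flag per sector. */
typedef struct Layer {
    int64_t total_sectors;
    unsigned char allocated[MAX_SECTORS];
    struct Layer *backing;
} Layer;

/* Toy stand-in for bdrv_co_get_block_status(): reports the status of the
 * first sector and the length of the run sharing that status.  Assumption:
 * the run is clamped to the layer size, and is 0 at or past EOF. */
static int64_t layer_status(const Layer *l, int64_t sector_num,
                            int nb_sectors, int *pnum)
{
    int64_t left = l->total_sectors - sector_num;
    int n = 0;

    if (left <= 0 || nb_sectors <= 0) {
        *pnum = 0;
        return 0;
    }
    if (nb_sectors > left) {
        nb_sectors = (int)left;
    }
    while (n < nb_sectors &&
           l->allocated[sector_num + n] == l->allocated[sector_num]) {
        n++;
    }
    *pnum = n;
    return l->allocated[sector_num] ? BLOCK_ALLOCATED : 0;
}

/* The loop from the v3 patch, transcribed onto the toy model. */
static int64_t status_above_v3(Layer *bs, Layer *base, int64_t sector_num,
                               int nb_sectors, int *pnum)
{
    Layer *p;
    int64_t ret = 0, res = nb_sectors;

    assert(bs != base);
    for (p = bs; p != base; p = p->backing) {
        int sc;
        ret = layer_status(p, sector_num, nb_sectors, &sc);
        if (ret < 0) {
            return ret;
        } else if (ret & BLOCK_ALLOCATED) {
            *pnum = sc;
            return ret;
        }
        if (res > sc && (p == bs || sector_num + sc < p->total_sectors)) {
            res = sc;
        }
        nb_sectors = nb_sectors < sc ? nb_sectors : sc;
        if (nb_sectors == 0) {
            break;
        }
    }
    *pnum = (int)res;
    return ret;
}

/* Builds the chain from the commit message (L0 and L2 are 14 sectors,
 * L1 is 7, and only L2 sectors 7..13 are allocated) and returns *pnum. */
static int demo_pnum(int64_t sector_num, int nb_sectors)
{
    Layer l2 = { .total_sectors = 14, .backing = NULL };
    Layer l1 = { .total_sectors = 7, .backing = &l2 };
    Layer l0 = { .total_sectors = 14, .backing = &l1 };
    int pnum = -1;

    for (int i = 7; i < 14; i++) {
        l2.allocated[i] = 1;
    }
    status_above_v3(&l0, NULL, sector_num, nb_sectors, &pnum);
    return pnum;
}
```

Calling demo_pnum(0, 14) in this model follows the same path as the analysis: "res" stays at 14 after L0 and L1 and is cut to 7 once L2 is visited, which is the same value nb_sectors already holds at that point.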

>          }
> +
>          /* [sector_num, pnum] unallocated on this layer, which could be only

The "pnum" here should be changed to "sc".

>           * the first part of [sector_num, nb_sectors].  */
> -        nb_sectors = MIN(nb_sectors, *pnum);
> +        nb_sectors = MIN(nb_sectors, sc);
> +
> +        if (nb_sectors == 0) {
> +            break;

While I can see that in this case ret would be 0, I think it wouldn't 
hurt to add an explicit "ret = 0;" here, too.

Max

> +        }
>      }
> +
> +    *pnum = res;
>      return ret;
>  }
>
>


* Re: [Qemu-devel] [PATCH v3 1/2] block: sync bdrv_co_get_block_status_above() with bdrv_is_allocated_above()
  2016-09-19 23:18   ` [Qemu-devel] " Max Reitz
@ 2016-09-20  6:13     ` Jeff Cody
  0 siblings, 0 replies; 12+ messages in thread
From: Jeff Cody @ 2016-09-20  6:13 UTC (permalink / raw)
  To: Max Reitz
  Cc: Denis V. Lunev, qemu-block, qemu-devel, Stefan Hajnoczi,
	Fam Zheng, Kevin Wolf

On Tue, Sep 20, 2016 at 01:18:12AM +0200, Max Reitz wrote:
> On 2016-09-15 at 18:34, Denis V. Lunev wrote:
> >They should work very similar, covering same areas if backing store is
> >shorter than the image. This change is necessary for the followup patch
> >switching to bdrv_get_block_status_above() in mirror to avoid assert
> >in check_block.
> >
> >This change should be made very carefully. Let us assume that we have
> >top image and 2 backing stores L0->L1->L2.
> >  L0: --------------
> >  L1: -------
> >  L2: -------=======
> >The data marked as '=' in L2 should not appear as BDRV_BLOCK_ALLOCATED
> >and we should return it as filled in L0 image with properly calculated
> >*pnum value.
> >
> >Signed-off-by: Denis V. Lunev <den@openvz.org>
> >CC: Stefan Hajnoczi <stefanha@redhat.com>
> >CC: Fam Zheng <famz@redhat.com>
> >CC: Kevin Wolf <kwolf@redhat.com>
> >CC: Max Reitz <mreitz@redhat.com>
> >CC: Jeff Cody <jcody@redhat.com>
> >---
> > block/io.c | 25 ++++++++++++++++++++-----
> > 1 file changed, 20 insertions(+), 5 deletions(-)
> >
> >diff --git a/block/io.c b/block/io.c
> >index 420944d..067d465 100644
> >--- a/block/io.c
> >+++ b/block/io.c
> >@@ -1741,18 +1741,33 @@ static int64_t coroutine_fn bdrv_co_get_block_status_above(BlockDriverState *bs,
> >         BlockDriverState **file)
> > {
> >     BlockDriverState *p;
> >-    int64_t ret = 0;
> >+    int64_t ret = 0, res = nb_sectors;
> 
> It's not wrong to make res an int64_t, but an int is sufficient.
> 
> >
> >     assert(bs != base);
> >     for (p = bs; p != base; p = backing_bs(p)) {
> >-        ret = bdrv_co_get_block_status(p, sector_num, nb_sectors, pnum, file);
> >-        if (ret < 0 || ret & BDRV_BLOCK_ALLOCATED) {
> >-            break;
> >+        int sc;
> >+        ret = bdrv_co_get_block_status(p, sector_num, nb_sectors, &sc, file);
> >+        if (ret < 0) {
> >+            return ret;
> >+        } else if (ret & BDRV_BLOCK_ALLOCATED) {
> >+            *pnum = sc;
> >+            return ret;
> >+        }
> >+
> >+        if (res > sc && (p == bs || sector_num + sc < p->total_sectors)) {
> >+            res = sc;

For what it's worth, I think the code in bdrv_is_allocated_above() is a bit
more readable, which does largely the same thing as this function now does
with this patch.

Some of it is due to the nomenclature of 'res' and 'sc' here, neither
of which is very intuitive.  Part of it is out of this patch's control:
the explicit variable convention of "top", "intermediate", and "base" in the
while loop of bdrv_is_allocated_above() also aids readability.

I think since the functionality is largely pulled from
bdrv_is_allocated_above(), it would make sense to copy its syntax more
directly.

> 
> This definitely requires some comments because it took me a long time to
> figure out why we need "res" to be a separate variable from "nb_sectors" and
> why this condition is like it is.
> 
> So what I think this does is:
> 
> Basically, we want to return our final nb_sectors in *pnum. But we can have
> the constellation you noted in the commit message: A short intermediate
> layer, and the bottom layer has some data allocated beyond the end of that
> intermediate layer.
> 
> Now, when we pass through that intermediate layer, we need to shorten
> nb_sectors so that we don't query anything beyond the end of that
> intermediate layer because it doesn't matter anyway.
> 

That's the issue with this patch; we don't need to shorten nb_sectors,
precisely because it doesn't matter if it extends past the end of the
current layer. [1]


> But we also want to remember that all of this area appears as unallocated to
> the top layer, so therefore we have to keep a second variable ("res") which
> retains this information.
> 
> Therefore, nb_sectors is always exactly the range we want to query, and
> "res" is the range we know to appear unallocated. This condition here tries
> to adjust "res" so that it conforms to that specification.
>

This change makes this function similar to "bdrv_is_allocated_above()".  It
may be useful to refer back to commit 63ba17d, to see the rationale when
that function got the similar functionality:

    block: Fix is_allocated_above with resized files

    In an image chain, if the base image is smaller than the current
    image, we need to make sure to use the current images count of
    unallocated blocks once we get to the end of the base image. Without
    this change the code will return 0 blocks when it gets to the end
    of the base image and mirror_run will fail its assertion.


This sounds compatible with your above analysis.

 
> However, I'm not quite sure it actually does that. Let's take the case from
> your commit message:
> 
> L0: --------------
> L1: -------
> L2: -------=======
> 
> Let's say we invoke this function in the range [0, 14]. After passing
> through L0, res is 14 and nb_sectors is 14. After L1, res is still 14, but
> nb_sectors is 7. So far, so good.
> 
> But when passing through L2, "sc" will be 7 (and it will actually always be
> 7, regardless of what comes past sector 7, because nb_sectors is 7). Since
> L2 is larger than just 7 sectors, we will now reduce res to 7 as well
> (because sector_num + sc (= 0 + 7 = 7) < p->total_sectors (= 14)).
> 
> So therefore, we will set *pnum to 7. That doesn't seem too bad to me, but
> we could have achieved the same result by just setting *pnum to nb_sectors
> and not having to track the separate "res" variable.
> 
> 
> Thus, I'm not quite sure what the point of this is. "res" will only be
> longer than "nb_sectors" as long as the layers get shorter or stay the same
> length when going downwards. As soon as one layer is longer than the one
> above it, "res" will probably be truncated to "sc" (which is going to be the
> same value as "nb_sectors", unless bdrv_co_get_block_status() returns a
> *pnum > nb_sectors).
> 
> I'm not sure whether I'm missing something here, though.
>

There is a bug, below [1]:

> >         }
> >+
> >         /* [sector_num, pnum] unallocated on this layer, which could be only
>
> The "pnum" here should be changed to "sc".
> 
> >          * the first part of [sector_num, nb_sectors].  */
> >-        nb_sectors = MIN(nb_sectors, *pnum);
> >+        nb_sectors = MIN(nb_sectors, sc);

[1]

The above code should be deleted completely. There isn't a reason to set
nb_sectors to the lowest value here - we should always be passing in the
original nb_sectors value (the underlying driver will truncate the returned
pnum value as appropriate).
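
As a rough illustration of that suggestion, here is the same loop with the nb_sectors shrinking and the early break removed, run on a toy model. This is a sketch only: Layer, layer_status() and demo_no_shrink() are hypothetical stand-ins rather than QEMU code, the flag value is illustrative, and it assumes a per-layer query is clamped at that layer's end of file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_ALLOCATED 0x10 /* illustrative stand-in for BDRV_BLOCK_ALLOCATED */
#define MAX_SECTORS 32

/* Hypothetical toy layer: one allocation flag per sector. */
typedef struct Layer {
    int64_t total_sectors;
    unsigned char allocated[MAX_SECTORS];
    struct Layer *backing;
} Layer;

/* Toy per-layer status query.  Assumption: the returned run is clamped to
 * the layer size, and is 0 at or past EOF. */
static int64_t layer_status(const Layer *l, int64_t sector_num,
                            int nb_sectors, int *pnum)
{
    int64_t left = l->total_sectors - sector_num;
    int n = 0;

    if (left <= 0 || nb_sectors <= 0) {
        *pnum = 0;
        return 0;
    }
    if (nb_sectors > left) {
        nb_sectors = (int)left;
    }
    while (n < nb_sectors &&
           l->allocated[sector_num + n] == l->allocated[sector_num]) {
        n++;
    }
    *pnum = n;
    return l->allocated[sector_num] ? BLOCK_ALLOCATED : 0;
}

/* Variant of the patched loop: every layer is queried with the original
 * nb_sectors; only the reported length "res" is narrowed. */
static int64_t status_above_no_shrink(Layer *bs, Layer *base,
                                      int64_t sector_num, int nb_sectors,
                                      int *pnum)
{
    Layer *p;
    int64_t ret = 0;
    int res = nb_sectors;

    assert(bs != base);
    for (p = bs; p != base; p = p->backing) {
        int sc;
        ret = layer_status(p, sector_num, nb_sectors, &sc);
        if (ret < 0) {
            return ret;
        } else if (ret & BLOCK_ALLOCATED) {
            *pnum = sc;
            return ret;
        }
        if (res > sc && (p == bs || sector_num + sc < p->total_sectors)) {
            res = sc;
        }
    }
    *pnum = res;
    return ret;
}

/* Chain from the commit message: L0 and L2 are 14 sectors, L1 is 7, and
 * only L2 sectors 7..13 are allocated. */
static int64_t demo_no_shrink(int64_t sector_num, int nb_sectors, int *pnum)
{
    Layer l2 = { .total_sectors = 14, .backing = NULL };
    Layer l1 = { .total_sectors = 7, .backing = &l2 };
    Layer l0 = { .total_sectors = 14, .backing = &l1 };

    for (int i = 7; i < 14; i++) {
        l2.allocated[i] = 1;
    }
    return status_above_no_shrink(&l0, NULL, sector_num, nb_sectors, pnum);
}
```

In this toy model a query for sectors 0..13 still reports an unallocated run of 7. A query starting at sector 7 behaves differently from the v3 loop: without the break at nb_sectors == 0 the walk continues past the short L1 and reports the '=' run of L2 as allocated. Whether that matches the intended semantics is for the thread to settle; the sketch only makes the difference visible.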

> >+
> >+        if (nb_sectors == 0) {
> >+            break;

This can go as well.

-Jeff

> 
> While I can see that in this case ret would be 0, I think it wouldn't hurt
> to add an explicit "ret = 0;" here, too.
> 
> Max
> 
> >+        }
> >     }
> >+
> >+    *pnum = res;
> >     return ret;
> > }
> >
> >


* Re: [Qemu-devel] [PATCH v3 2/2] mirror: fix improperly filled copy_bitmap for mirror block job
  2016-09-15 16:34 ` [Qemu-devel] [PATCH v3 2/2] mirror: fix improperly filled copy_bitmap for mirror block job Denis V. Lunev
  2016-09-15 17:19   ` Eric Blake
@ 2016-09-23 11:00   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 12+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2016-09-23 11:00 UTC (permalink / raw)
  To: Denis V. Lunev, qemu-block, qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Jeff Cody, Max Reitz, Stefan Hajnoczi

On 15.09.2016 19:34, Denis V. Lunev wrote:
> bdrv_is_allocated_above() returns true in the case even for completel
> zeroed areas as BDRV_BLOCK_ALLOCATED flag is set in both cases.
>
> The patch stops using bdrv_is_allocated_above() wrapper and switches to
> bdrv_get_block_status_above() to distinguish zeroed areas and areas with
> data to avoid extra IO operations if possible.
>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> CC: Fam Zheng <famz@redhat.com>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Max Reitz <mreitz@redhat.com>
> CC: Jeff Cody <jcody@redhat.com>
> ---
>   block/mirror.c | 17 +++++++++++------
>   1 file changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/block/mirror.c b/block/mirror.c
> index e0b3f41..2710f62 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -548,11 +548,11 @@ static void mirror_throttle(MirrorBlockJob *s)
>   
>   static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
>   {
> -    int64_t sector_num, end;
> +    int64_t sector_num, end, alloc_mask;
>       BlockDriverState *base = s->base;
>       BlockDriverState *bs = blk_bs(s->common.blk);
>       BlockDriverState *target_bs = blk_bs(s->target);
> -    int ret, n;
> +    int n;
>   
>       end = s->bdev_length / BDRV_SECTOR_SIZE;
>   
> @@ -585,11 +585,15 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
>           mirror_drain(s);
>       }
>   
> +    alloc_mask = base == NULL ? BDRV_BLOCK_ALLOCATED : BDRV_BLOCK_DATA;

Shouldn't this be s/==/!=/ ?

> +
>       /* First part, loop on the sectors and initialize the dirty bitmap.  */
>       for (sector_num = 0; sector_num < end; ) {
>           /* Just to make sure we are not exceeding int limit. */
>           int nb_sectors = MIN(INT_MAX >> BDRV_SECTOR_BITS,
>                                end - sector_num);
> +        int64_t status;
> +        BlockDriverState *file;
>   
>           mirror_throttle(s);
>   
> @@ -597,13 +601,14 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
>               return 0;
>           }
>   
> -        ret = bdrv_is_allocated_above(bs, base, sector_num, nb_sectors, &n);
> -        if (ret < 0) {
> -            return ret;
> +        status = bdrv_get_block_status_above(bs, base, sector_num,
> +                                             nb_sectors, &n, &file);
> +        if (status < 0) {
> +            return status;
>           }
>   
>           assert(n > 0);
> -        if (ret == 1) {
> +        if (status & alloc_mask) {
>               bdrv_set_dirty_bitmap(s->dirty_bitmap, sector_num, n);
>           }
>           sector_num += n;
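
To see what the choice of mask changes, here is a small standalone sketch.
The SK_* flag names mirror the BDRV_BLOCK_* names used in the patch, but
their values, the SkExtent type and sk_count_dirty() are assumptions made
for illustration only:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag bits; the names follow the patch, the values are
 * assumptions of this sketch. */
#define SK_BLOCK_DATA      0x01
#define SK_BLOCK_ZERO      0x02
#define SK_BLOCK_ALLOCATED 0x10

/* Hypothetical extent: n consecutive sectors sharing one status word. */
typedef struct {
    int n;
    int64_t status;
} SkExtent;

/* Count the sectors that would be marked dirty when the status of each
 * extent is tested against the given mask. */
static int64_t sk_count_dirty(const SkExtent *ext, int count, int64_t mask)
{
    int64_t dirty = 0;
    for (int i = 0; i < count; i++) {
        assert(ext[i].n > 0);
        if (ext[i].status & mask) {
            dirty += ext[i].n;
        }
    }
    return dirty;
}
```

For three extents (8 sectors of data, 16 allocated but zeroed sectors,
4 unallocated sectors), testing against SK_BLOCK_ALLOCATED marks 24 sectors
dirty, while testing against SK_BLOCK_DATA marks only 8.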


-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [PATCH v3 1/2] block: sync bdrv_co_get_block_status_above() with bdrv_is_allocated_above()
  2016-09-19 20:39       ` Eric Blake
@ 2016-09-26 15:04         ` Kevin Wolf
  2016-09-26 21:42           ` [Qemu-devel] [Qemu-block] " Kashyap Chamarthy
  0 siblings, 1 reply; 12+ messages in thread
From: Kevin Wolf @ 2016-09-26 15:04 UTC (permalink / raw)
  To: Eric Blake
  Cc: Denis V. Lunev, Fam Zheng, qemu-block, Jeff Cody, qemu-devel,
	Max Reitz, Stefan Hajnoczi


Am 19.09.2016 um 22:39 hat Eric Blake geschrieben:
> On 09/18/2016 11:37 PM, Denis V. Lunev wrote:
> > On 09/19/2016 04:21 AM, Fam Zheng wrote:
> >> On Thu, 09/15 19:34, Denis V. Lunev wrote:
> >>> They should work very similarly, covering the same areas if the backing store is
> >>> shorter than the image. This change is necessary for the followup patch
> >>> switching to bdrv_get_block_status_above() in mirror to avoid assert
> >>> in check_block.
> >>>
> >>> This change should be made very carefully. Let us assume that we have
> >>> top image and 2 backing stores L0->L1->L2.
> >> Stupid question: which one is top and which are backing?
> > L0 is top, L2 is at bottom.
> 
> I typically write this as:
> 
> L2 <- L1 <- L0
> 
> (read "L2 backs L1, which in turn backs L0") with the active on the
> right.  So I understand the confusion in Fam's question where you were
> using the opposite direction.

And I tend to use this one:

    base <- sn1 <- sn2 <- top

"sn*" isn't any better than "L*", but having at least one of "base" and
"top" (or "active") in there disambiguates the roles of the nodes.

Kevin



* Re: [Qemu-devel] [Qemu-block] [PATCH v3 1/2] block: sync bdrv_co_get_block_status_above() with bdrv_is_allocated_above()
  2016-09-26 15:04         ` Kevin Wolf
@ 2016-09-26 21:42           ` Kashyap Chamarthy
  0 siblings, 0 replies; 12+ messages in thread
From: Kashyap Chamarthy @ 2016-09-26 21:42 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Eric Blake, Fam Zheng, qemu-block, qemu-devel, Max Reitz,
	Stefan Hajnoczi, Denis V. Lunev

On Mon, Sep 26, 2016 at 05:04:21PM +0200, Kevin Wolf wrote:
> Am 19.09.2016 um 22:39 hat Eric Blake geschrieben:

[...]

> > I typically write this as:
> > 
> > L2 <- L1 <- L0
> > 
> > (read "L2 backs L1, which in turn backs L0") with the active on the
> > right.  So I understand the confusion in Fam's question where you were
> > using the opposite direction.
> 
> And I tend to use this one:
> 
>     base <- sn1 <- sn2 <- top
> 
> "sn*" isn't any better than "L*", but having at least one of "base" and
> "top" (or "active") in there disambiguates the roles of the nodes.

Not to quibble over terminology too much, but now that I'm writing a doc
that I want to submit upstream, I began with your (Kevin's) notation.
Then, I thought: Hmm, "sn1" could also be referred to as 'base', and
"sn2" as 'top' when using `block-commit` (and `block-stream`, once it
starts supporting intermediate streaming?).

And, moreover, as Eric (correctly) warns elsewhere about file-names vs.
points-in-time: the guest state when 'sn1' was created is contained in
'base', so one could argue that 'sn1' ("snapshot 1") is a misnomer, and
is technically 'overlay1'.

So, I used the below notation until recently, including 'active'
(with the rationale Kevin mentioned):

    base <- overlay1 <- overlay2  <- active

Then, someone asked: "In the above chain, are you pointing to 'overlay2'
as active, or is 'active' a separate image unto itself"?  "Sigh, so it
is still prone to misunderstanding", I thought.

Given that, for now, though slightly more verbose and space-occupying, I
settled on the below (occasionally doing s/base/orig/, to avoid the
"overlay1 could be referred to as 'base' in some cases" problem):

                                      
                                    Live QEMU
                                       |
                                       v
    base <- overlay1 <- overlay2 <- overlay3

FWIW, the above also avoids the problem of a file called 'active' being
described as: "previously-active, but not anymore, because its contents
are merged into its backing file" in the event of a 'block-commit'.
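
For what it's worth, the notation maps directly onto a singly linked list
in which every image points at the image that backs it. A rough sketch,
where SkImage and sk_format_chain() are made-up names and not anything
from QEMU:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical node: each image points at the image that backs it. */
typedef struct SkImage {
    const char *name;
    const struct SkImage *backing;   /* NULL for the bottom-most image */
} SkImage;

/* Render the chain as "base <- ... <- live image", starting from the
 * image QEMU is writing to.  The recursion emits the deepest backing
 * file first.  Returns the number of bytes written (excluding the
 * terminating NUL), or -1 if buf is too small. */
static int sk_format_chain(const SkImage *img, char *buf, size_t size)
{
    int used = 0;
    int n;

    assert(img != NULL);
    if (img->backing) {
        used = sk_format_chain(img->backing, buf, size);
        if (used < 0) {
            return -1;
        }
        n = snprintf(buf + used, size - used, " <- ");
        if (n < 0 || (size_t)n >= size - used) {
            return -1;
        }
        used += n;
    }
    n = snprintf(buf + used, size - used, "%s", img->name);
    if (n < 0 || (size_t)n >= size - used) {
        return -1;
    }
    return used + n;
}
```

Starting from overlay3, this renders the same left-to-right chain as the
diagram, with the image QEMU writes to on the right.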

</end quibble>

-- 
/kashyap


Thread overview: 12+ messages
2016-09-15 16:34 [Qemu-devel] [PATCH v3 0/2] mirror: fix improperly filled copy_bitmap for mirror block job Denis V. Lunev
2016-09-15 16:34 ` [Qemu-devel] [PATCH v3 1/2] block: sync bdrv_co_get_block_status_above() with bdrv_is_allocated_above() Denis V. Lunev
2016-09-19  1:21   ` Fam Zheng
2016-09-19  4:37     ` Denis V. Lunev
2016-09-19 20:39       ` Eric Blake
2016-09-26 15:04         ` Kevin Wolf
2016-09-26 21:42           ` [Qemu-devel] [Qemu-block] " Kashyap Chamarthy
2016-09-19 23:18   ` [Qemu-devel] " Max Reitz
2016-09-20  6:13     ` Jeff Cody
2016-09-15 16:34 ` [Qemu-devel] [PATCH v3 2/2] mirror: fix improperly filled copy_bitmap for mirror block job Denis V. Lunev
2016-09-15 17:19   ` Eric Blake
2016-09-23 11:00   ` Vladimir Sementsov-Ogievskiy
