* [PATCH 0/2] block/rbd: fixes for bdrv_co_block_status
@ 2022-01-10 11:41 Peter Lieven
  2022-01-10 11:41 ` [PATCH 1/2] block/rbd: fix handling of holes in .bdrv_co_block_status Peter Lieven
  2022-01-10 11:41 ` [PATCH 2/2] block/rbd: workaround for ceph issue #53784 Peter Lieven
  0 siblings, 2 replies; 13+ messages in thread
From: Peter Lieven @ 2022-01-10 11:41 UTC
  To: qemu-block
  Cc: kwolf, idryomov, berrange, Peter Lieven, qemu-devel, ct,
	pbonzini, idryomov, mreitz, dillaman

Peter Lieven (2):
  block/rbd: fix handling of holes in .bdrv_co_block_status
  block/rbd: workaround for ceph issue #53784

 block/rbd.c | 72 +++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 50 insertions(+), 22 deletions(-)

-- 
2.25.1





* [PATCH 1/2] block/rbd: fix handling of holes in .bdrv_co_block_status
  2022-01-10 11:41 [PATCH 0/2] block/rbd: fixes for bdrv_co_block_status Peter Lieven
@ 2022-01-10 11:41 ` Peter Lieven
  2022-01-12  9:05   ` Ilya Dryomov
  2022-01-10 11:41 ` [PATCH 2/2] block/rbd: workaround for ceph issue #53784 Peter Lieven
  1 sibling, 1 reply; 13+ messages in thread
From: Peter Lieven @ 2022-01-10 11:41 UTC
  To: qemu-block
  Cc: kwolf, idryomov, berrange, qemu-stable, Peter Lieven, qemu-devel,
	ct, pbonzini, idryomov, mreitz, dillaman

The assumption that we can't hit a hole if we do not diff against a snapshot was wrong.

We can see a hole in an image when diffing against base if an older snapshot of the
image exists and we have discarded blocks in the image where the snapshot has data.

Fixes: 0347a8fd4c3faaedf119be04c197804be40a384b
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
---
 block/rbd.c | 55 +++++++++++++++++++++++++++++++++--------------------
 1 file changed, 34 insertions(+), 21 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index def96292e0..5e9dc91d81 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -1279,13 +1279,24 @@ static int qemu_rbd_diff_iterate_cb(uint64_t offs, size_t len,
     RBDDiffIterateReq *req = opaque;
 
     assert(req->offs + req->bytes <= offs);
-    /*
-     * we do not diff against a snapshot so we should never receive a callback
-     * for a hole.
-     */
-    assert(exists);
 
-    if (!req->exists && offs > req->offs) {
+    if (req->exists && offs > req->offs + req->bytes) {
+        /*
+         * we started in an allocated area and jumped over an unallocated area,
+         * req->bytes contains the length of the allocated area before the
+         * unallocated area. stop further processing.
+         */
+        return QEMU_RBD_EXIT_DIFF_ITERATE2;
+    }
+    if (req->exists && !exists) {
+        /*
+         * we started in an allocated area and reached a hole. req->bytes
+         * contains the length of the allocated area before the hole.
+         * stop further processing.
+         */
+        return QEMU_RBD_EXIT_DIFF_ITERATE2;
+    }
+    if (!req->exists && exists && offs > req->offs) {
         /*
          * we started in an unallocated area and hit the first allocated
          * block. req->bytes must be set to the length of the unallocated area
@@ -1295,17 +1306,19 @@ static int qemu_rbd_diff_iterate_cb(uint64_t offs, size_t len,
         return QEMU_RBD_EXIT_DIFF_ITERATE2;
     }
 
-    if (req->exists && offs > req->offs + req->bytes) {
-        /*
-         * we started in an allocated area and jumped over an unallocated area,
-         * req->bytes contains the length of the allocated area before the
-         * unallocated area. stop further processing.
-         */
-        return QEMU_RBD_EXIT_DIFF_ITERATE2;
-    }
+    /*
+     * assert that we caught all cases above and allocation state has not
+     * changed during callbacks.
+     */
+    assert(exists == req->exists || !req->bytes);
+    req->exists = exists;
 
-    req->bytes += len;
-    req->exists = true;
+    /*
+     * assert that we either return an unallocated block or have got callbacks
+     * for all allocated blocks present.
+     */
+    assert(!req->exists || offs == req->offs + req->bytes);
+    req->bytes = offs + len - req->offs;
 
     return 0;
 }
@@ -1354,13 +1367,13 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
     }
     assert(req.bytes <= bytes);
     if (!req.exists) {
-        if (r == 0) {
+        if (r == 0 && !req.bytes) {
             /*
-             * rbd_diff_iterate2 does not invoke callbacks for unallocated
-             * areas. This here catches the case where no callback was
-             * invoked at all (req.bytes == 0).
+             * rbd_diff_iterate2 does not invoke callbacks for unallocated areas
+             * except for the case where an overlay has a hole where the parent
+             * or an older snapshot of the image has not. This here catches the
+             * case where no callback was invoked at all.
              */
-            assert(req.bytes == 0);
             req.bytes = bytes;
         }
         status = BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID;
-- 
2.25.1





* [PATCH 2/2] block/rbd: workaround for ceph issue #53784
  2022-01-10 11:41 [PATCH 0/2] block/rbd: fixes for bdrv_co_block_status Peter Lieven
  2022-01-10 11:41 ` [PATCH 1/2] block/rbd: fix handling of holes in .bdrv_co_block_status Peter Lieven
@ 2022-01-10 11:41 ` Peter Lieven
  2022-01-10 14:18   ` Stefano Garzarella
  2022-01-12  9:59   ` Ilya Dryomov
  1 sibling, 2 replies; 13+ messages in thread
From: Peter Lieven @ 2022-01-10 11:41 UTC
  To: qemu-block
  Cc: kwolf, idryomov, berrange, qemu-stable, Peter Lieven, qemu-devel,
	ct, pbonzini, idryomov, mreitz, dillaman

librbd had a bug until early 2022 that affected all versions of ceph that
supported fast-diff. This bug results in reporting of incorrect offsets
if the offset parameter to rbd_diff_iterate2 is not object aligned.
Work around this bug by rounding down the offset to object boundaries.

Fixes: https://tracker.ceph.com/issues/53784
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
---
 block/rbd.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/block/rbd.c b/block/rbd.c
index 5e9dc91d81..260cb9f4b4 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -1333,6 +1333,7 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
     int status, r;
     RBDDiffIterateReq req = { .offs = offset };
     uint64_t features, flags;
+    int64_t head;
 
     assert(offset + bytes <= s->image_size);
 
@@ -1360,6 +1361,19 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
         return status;
     }
 
+    /*
+     * librbd had a bug until early 2022 that affected all versions of ceph that
+     * supported fast-diff. This bug results in reporting of incorrect offsets
+     * if the offset parameter to rbd_diff_iterate2 is not object aligned.
+     * Work around this bug by rounding down the offset to object boundaries.
+     *
+     * See: https://tracker.ceph.com/issues/53784
+     */
+    head = offset & (s->object_size - 1);
+    offset -= head;
+    req.offs -= head;
+    bytes += head;
+
     r = rbd_diff_iterate2(s->image, NULL, offset, bytes, true, true,
                           qemu_rbd_diff_iterate_cb, &req);
     if (r < 0 && r != QEMU_RBD_EXIT_DIFF_ITERATE2) {
@@ -1379,7 +1393,8 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
         status = BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID;
     }
 
-    *pnum = req.bytes;
+    assert(req.bytes > head);
+    *pnum = req.bytes - head;
     return status;
 }
 
-- 
2.25.1





* Re: [PATCH 2/2] block/rbd: workaround for ceph issue #53784
  2022-01-10 11:41 ` [PATCH 2/2] block/rbd: workaround for ceph issue #53784 Peter Lieven
@ 2022-01-10 14:18   ` Stefano Garzarella
  2022-01-11  9:10     ` Peter Lieven
  2022-01-12  9:59   ` Ilya Dryomov
  1 sibling, 1 reply; 13+ messages in thread
From: Stefano Garzarella @ 2022-01-10 14:18 UTC
  To: Peter Lieven
  Cc: kwolf, berrange, qemu-block, ct, qemu-stable, qemu-devel, mreitz,
	pbonzini, idryomov, idryomov, dillaman

On Mon, Jan 10, 2022 at 12:41:54PM +0100, Peter Lieven wrote:
>librbd had a bug until early 2022 that affected all versions of ceph that
>supported fast-diff. This bug results in reporting of incorrect offsets
>if the offset parameter to rbd_diff_iterate2 is not object aligned.
>Work around this bug by rounding down the offset to object boundaries.
>
>Fixes: https://tracker.ceph.com/issues/53784
>Cc: qemu-stable@nongnu.org
>Signed-off-by: Peter Lieven <pl@kamp.de>
>---
> block/rbd.c | 17 ++++++++++++++++-
> 1 file changed, 16 insertions(+), 1 deletion(-)
>
>diff --git a/block/rbd.c b/block/rbd.c
>index 5e9dc91d81..260cb9f4b4 100644
>--- a/block/rbd.c
>+++ b/block/rbd.c
>@@ -1333,6 +1333,7 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
>     int status, r;
>     RBDDiffIterateReq req = { .offs = offset };
>     uint64_t features, flags;
>+    int64_t head;
>
>     assert(offset + bytes <= s->image_size);
>
>@@ -1360,6 +1361,19 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
>         return status;
>     }
>
>+    /*
>+     * librbd had a bug until early 2022 that affected all versions of ceph that
>+     * supported fast-diff. This bug results in reporting of incorrect offsets
>+     * if the offset parameter to rbd_diff_iterate2 is not object aligned.
>+     * Work around this bug by rounding down the offset to object boundaries.
>+     *
>+     * See: https://tracker.ceph.com/issues/53784
>+     */
>+    head = offset & (s->object_size - 1);
>+    offset -= head;
>+    req.offs -= head;
>+    bytes += head;
>+
>     r = rbd_diff_iterate2(s->image, NULL, offset, bytes, true, true,
>                           qemu_rbd_diff_iterate_cb, &req);
>     if (r < 0 && r != QEMU_RBD_EXIT_DIFF_ITERATE2) {
>@@ -1379,7 +1393,8 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
>         status = BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID;
>     }
>
>-    *pnum = req.bytes;
>+    assert(req.bytes > head);
>+    *pnum = req.bytes - head;
>     return status;
> }

Thanks for the workaround!

I just tested this patch for the issue reported in this BZ [1] and the 
test now works correctly!

Tested-by: Stefano Garzarella <sgarzare@redhat.com>

[1] https://bugzilla.redhat.com/show_bug.cgi?id=2034791




* Re: [PATCH 2/2] block/rbd: workaround for ceph issue #53784
  2022-01-10 14:18   ` Stefano Garzarella
@ 2022-01-11  9:10     ` Peter Lieven
  2022-01-11 11:15       ` Stefano Garzarella
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Lieven @ 2022-01-11  9:10 UTC
  To: Stefano Garzarella
  Cc: kwolf, berrange, qemu-block, ct, qemu-stable, qemu-devel, mreitz,
	pbonzini, idryomov, idryomov, dillaman

On 10.01.22 at 15:18, Stefano Garzarella wrote:
> On Mon, Jan 10, 2022 at 12:41:54PM +0100, Peter Lieven wrote:
>> librbd had a bug until early 2022 that affected all versions of ceph that
>> supported fast-diff. This bug results in reporting of incorrect offsets
>> if the offset parameter to rbd_diff_iterate2 is not object aligned.
>> Work around this bug by rounding down the offset to object boundaries.
>>
>> Fixes: https://tracker.ceph.com/issues/53784
>> Cc: qemu-stable@nongnu.org
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>> block/rbd.c | 17 ++++++++++++++++-
>> 1 file changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/rbd.c b/block/rbd.c
>> index 5e9dc91d81..260cb9f4b4 100644
>> --- a/block/rbd.c
>> +++ b/block/rbd.c
>> @@ -1333,6 +1333,7 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
>>     int status, r;
>>     RBDDiffIterateReq req = { .offs = offset };
>>     uint64_t features, flags;
>> +    int64_t head;
>>
>>     assert(offset + bytes <= s->image_size);
>>
>> @@ -1360,6 +1361,19 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
>>         return status;
>>     }
>>
>> +    /*
>> +     * librbd had a bug until early 2022 that affected all versions of ceph that
>> +     * supported fast-diff. This bug results in reporting of incorrect offsets
>> +     * if the offset parameter to rbd_diff_iterate2 is not object aligned.
>> +     * Work around this bug by rounding down the offset to object boundaries.
>> +     *
>> +     * See: https://tracker.ceph.com/issues/53784
>> +     */
>> +    head = offset & (s->object_size - 1);
>> +    offset -= head;
>> +    req.offs -= head;
>> +    bytes += head;
>> +
>>     r = rbd_diff_iterate2(s->image, NULL, offset, bytes, true, true,
>>                           qemu_rbd_diff_iterate_cb, &req);
>>     if (r < 0 && r != QEMU_RBD_EXIT_DIFF_ITERATE2) {
>> @@ -1379,7 +1393,8 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
>>         status = BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID;
>>     }
>>
>> -    *pnum = req.bytes;
>> +    assert(req.bytes > head);
>> +    *pnum = req.bytes - head;
>>     return status;
>> }
>
> Thanks for the workaround!
>
> I just tested this patch for the issue reported in this BZ [1] and the test now works correctly!
>
> Tested-by: Stefano Garzarella <sgarzare@redhat.com>
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=2034791
>


Hi Stefano,


Thanks for the feedback. Please note that you also need the other patch, or you will sooner or later run into another assertion failure as soon as rbd snapshots are involved.


Regarding the workaround, I need confirmation from Ilya that it covers all cases. I do not know whether it works if striping or EC is configured on the pool.


Best,

Peter






* Re: [PATCH 2/2] block/rbd: workaround for ceph issue #53784
  2022-01-11  9:10     ` Peter Lieven
@ 2022-01-11 11:15       ` Stefano Garzarella
  0 siblings, 0 replies; 13+ messages in thread
From: Stefano Garzarella @ 2022-01-11 11:15 UTC
  To: Peter Lieven
  Cc: kwolf, berrange, qemu-block, ct, qemu-stable, qemu-devel, mreitz,
	pbonzini, idryomov, idryomov, dillaman

Hi Peter,

On Tue, Jan 11, 2022 at 10:10:16AM +0100, Peter Lieven wrote:
>Hi Stefano,
>
>
>thanks for the feedback. Please note that you also need the other patch 
>or you will sooner or later run into another assertion as soon as rbd 
>snapshots are involved.

Yep, I tested with the entire series applied.
Anyway, thanks for clarifying that.

>
>Regarding the workaround I need confirmation from Ilya that it covers 
>all cases. I do not know if it works if striping or EC is configured on 
>the pool.

Sure :-)

Thanks,
Stefano




* Re: [PATCH 1/2] block/rbd: fix handling of holes in .bdrv_co_block_status
  2022-01-10 11:41 ` [PATCH 1/2] block/rbd: fix handling of holes in .bdrv_co_block_status Peter Lieven
@ 2022-01-12  9:05   ` Ilya Dryomov
  2022-01-12 20:39     ` Peter Lieven
  0 siblings, 1 reply; 13+ messages in thread
From: Ilya Dryomov @ 2022-01-12  9:05 UTC
  To: Peter Lieven
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, qemu-stable, ct, qemu-devel, Paolo Bonzini,
	Max Reitz, Jason Dillaman

On Mon, Jan 10, 2022 at 12:42 PM Peter Lieven <pl@kamp.de> wrote:
>
> the assumption that we can't hit a hole if we do not diff against a snapshot was wrong.
>
> We can see a hole in an image if we diff against base if there exists an older snapshot
> of the image and we have discarded blocks in the image where the snapshot has data.
>
> Fixes: 0347a8fd4c3faaedf119be04c197804be40a384b
> Cc: qemu-stable@nongnu.org
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  block/rbd.c | 55 +++++++++++++++++++++++++++++++++--------------------
>  1 file changed, 34 insertions(+), 21 deletions(-)
>
> diff --git a/block/rbd.c b/block/rbd.c
> index def96292e0..5e9dc91d81 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -1279,13 +1279,24 @@ static int qemu_rbd_diff_iterate_cb(uint64_t offs, size_t len,
>      RBDDiffIterateReq *req = opaque;
>
>      assert(req->offs + req->bytes <= offs);
> -    /*
> -     * we do not diff against a snapshot so we should never receive a callback
> -     * for a hole.
> -     */
> -    assert(exists);
>
> -    if (!req->exists && offs > req->offs) {
> +    if (req->exists && offs > req->offs + req->bytes) {
> +        /*
> +         * we started in an allocated area and jumped over an unallocated area,
> +         * req->bytes contains the length of the allocated area before the
> +         * unallocated area. stop further processing.
> +         */
> +        return QEMU_RBD_EXIT_DIFF_ITERATE2;
> +    }
> +    if (req->exists && !exists) {
> +        /*
> +         * we started in an allocated area and reached a hole. req->bytes
> +         * contains the length of the allocated area before the hole.
> +         * stop further processing.
> +         */
> +        return QEMU_RBD_EXIT_DIFF_ITERATE2;
> +    }
> +    if (!req->exists && exists && offs > req->offs) {
>          /*
>           * we started in an unallocated area and hit the first allocated
>           * block. req->bytes must be set to the length of the unallocated area
> @@ -1295,17 +1306,19 @@ static int qemu_rbd_diff_iterate_cb(uint64_t offs, size_t len,
>          return QEMU_RBD_EXIT_DIFF_ITERATE2;
>      }
>
> -    if (req->exists && offs > req->offs + req->bytes) {
> -        /*
> -         * we started in an allocated area and jumped over an unallocated area,
> -         * req->bytes contains the length of the allocated area before the
> -         * unallocated area. stop further processing.
> -         */
> -        return QEMU_RBD_EXIT_DIFF_ITERATE2;
> -    }
> +    /*
> +     * assert that we caught all cases above and allocation state has not
> +     * changed during callbacks.
> +     */
> +    assert(exists == req->exists || !req->bytes);
> +    req->exists = exists;
>
> -    req->bytes += len;
> -    req->exists = true;
> +    /*
> +     * assert that we either return an unallocated block or have got callbacks
> +     * for all allocated blocks present.
> +     */
> +    assert(!req->exists || offs == req->offs + req->bytes);
> +    req->bytes = offs + len - req->offs;
>
>      return 0;
>  }
> @@ -1354,13 +1367,13 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
>      }
>      assert(req.bytes <= bytes);
>      if (!req.exists) {
> -        if (r == 0) {
> +        if (r == 0 && !req.bytes) {
>              /*
> -             * rbd_diff_iterate2 does not invoke callbacks for unallocated
> -             * areas. This here catches the case where no callback was
> -             * invoked at all (req.bytes == 0).
> +             * rbd_diff_iterate2 does not invoke callbacks for unallocated areas
> +             * except for the case where an overlay has a hole where the parent
> +             * or an older snapshot of the image has not. This here catches the
> +             * case where no callback was invoked at all.
>               */
> -            assert(req.bytes == 0);
>              req.bytes = bytes;
>          }
>          status = BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID;
> --
> 2.25.1
>
>

Hi Peter,

Can we just skip these "holes" by replacing the existing assert with
an if statement that would simply bail from the callback on !exists?

Just trying to keep the logic as simple as possible since as it turns
out we get to contend with ages-old librbd bugs here...
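
To illustrate the suggestion, here is a minimal sketch of the callback as it looked before this patch (the removed lines of the diff), with the assert on exists replaced by an early return. DiffReq, diff_cb_skip_holes and EXIT_DIFF_ITERATE are hypothetical stand-ins for the QEMU names, and the assignment in the first branch is assumed from the comment in the unchanged hunk context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel standing in for QEMU_RBD_EXIT_DIFF_ITERATE2. */
#define EXIT_DIFF_ITERATE 9000

/* Hypothetical stand-in for RBDDiffIterateReq. */
typedef struct {
    uint64_t offs;
    uint64_t bytes;
    bool exists;
} DiffReq;

static int diff_cb_skip_holes(uint64_t offs, size_t len, int exists,
                              void *opaque)
{
    DiffReq *req = opaque;

    assert(req->offs + req->bytes <= offs);

    if (!exists) {
        /* treat a hole callback exactly like no callback at all */
        return 0;
    }
    if (!req->exists && offs > req->offs) {
        /* started unallocated, hit the first allocated block (assumed) */
        req->bytes = offs - req->offs;
        return EXIT_DIFF_ITERATE;
    }
    if (req->exists && offs > req->offs + req->bytes) {
        /* started allocated, jumped over an unallocated area */
        return EXIT_DIFF_ITERATE;
    }

    req->bytes += len;
    req->exists = true;

    return 0;
}
```

With this variant a hole callback leaves the request untouched, so a later data callback is handled by the existing branches as if the hole had never been reported.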

Thanks,

                Ilya



* Re: [PATCH 2/2] block/rbd: workaround for ceph issue #53784
  2022-01-10 11:41 ` [PATCH 2/2] block/rbd: workaround for ceph issue #53784 Peter Lieven
  2022-01-10 14:18   ` Stefano Garzarella
@ 2022-01-12  9:59   ` Ilya Dryomov
  2022-01-12 11:55     ` Peter Lieven
  1 sibling, 1 reply; 13+ messages in thread
From: Ilya Dryomov @ 2022-01-12  9:59 UTC
  To: Peter Lieven
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, qemu-devel, ct, qemu-stable, Paolo Bonzini,
	Max Reitz, Jason Dillaman

On Mon, Jan 10, 2022 at 12:43 PM Peter Lieven <pl@kamp.de> wrote:
>
> librbd had a bug until early 2022 that affected all versions of ceph that
> supported fast-diff. This bug results in reporting of incorrect offsets
> if the offset parameter to rbd_diff_iterate2 is not object aligned.
> Work around this bug by rounding down the offset to object boundaries.
>
> Fixes: https://tracker.ceph.com/issues/53784

I don't think the Fixes tag is appropriate here.  Linking librbd
ticket is fine but this patch doesn't really fix anything.

> Cc: qemu-stable@nongnu.org
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  block/rbd.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/block/rbd.c b/block/rbd.c
> index 5e9dc91d81..260cb9f4b4 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -1333,6 +1333,7 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
>      int status, r;
>      RBDDiffIterateReq req = { .offs = offset };
>      uint64_t features, flags;
> +    int64_t head;
>
>      assert(offset + bytes <= s->image_size);
>
> @@ -1360,6 +1361,19 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
>          return status;
>      }
>
> +    /*
> +     * librbd had a bug until early 2022 that affected all versions of ceph that
> +     * supported fast-diff. This bug results in reporting of incorrect offsets
> +     * if the offset parameter to rbd_diff_iterate2 is not object aligned.
> +     * Work around this bug by rounding down the offset to object boundaries.
> +     *
> +     * See: https://tracker.ceph.com/issues/53784
> +     */
> +    head = offset & (s->object_size - 1);
> +    offset -= head;
> +    req.offs -= head;
> +    bytes += head;

So it looks like the intention is to have more or less a permanent
workaround since all librbd versions are affected, right?  For that,
I think we would need to also reject custom striping patterns and
clones.  For the above to be reliable, the image has to be standalone
and have a default striping pattern (stripe_unit == object_size &&
stripe_count == 1).  Otherwise, behave as if fast-diff is disabled or
invalid.
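
The condition described above can be written down as a small predicate. This is only a sketch under assumptions: ImageLayout and both function names are hypothetical, and how the values would be obtained from librbd is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of the image properties relevant to the workaround. */
typedef struct {
    uint64_t object_size;
    uint64_t stripe_unit;
    uint64_t stripe_count;
    bool has_parent;  /* true for clones */
} ImageLayout;

/* Default striping: one stripe unit per object, objects filled in sequence. */
static bool has_default_striping(const ImageLayout *l)
{
    assert(l != NULL);
    return l->stripe_unit == l->object_size && l->stripe_count == 1;
}

/* Rounding down is only considered reliable for standalone, default-striped images. */
static bool can_round_down_offset(const ImageLayout *l)
{
    assert(l != NULL);
    return !l->has_parent && has_default_striping(l);
}
```

A caller would fall back to the path taken when fast-diff is disabled or invalid whenever can_round_down_offset() returns false.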

> +

Nit: I'd replace { .offs = offset } initialization at the top with {}
and assign to req.offs here, right before calling rbd_diff_iterate2().

>      r = rbd_diff_iterate2(s->image, NULL, offset, bytes, true, true,
>                            qemu_rbd_diff_iterate_cb, &req);
>      if (r < 0 && r != QEMU_RBD_EXIT_DIFF_ITERATE2) {
> @@ -1379,7 +1393,8 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
>          status = BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID;
>      }
>
> -    *pnum = req.bytes;
> +    assert(req.bytes > head);

I'd expand the workaround comment with an explanation of why it's OK
to round down the offset -- because rbd_diff_iterate2() is called with
whole_object=true.  If that wasn't the case, on top of inconsistent
results for different offsets within an object, this assert could be
triggered.

Thanks,

                Ilya



* Re: [PATCH 2/2] block/rbd: workaround for ceph issue #53784
  2022-01-12  9:59   ` Ilya Dryomov
@ 2022-01-12 11:55     ` Peter Lieven
  2022-01-12 12:22       ` Ilya Dryomov
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Lieven @ 2022-01-12 11:55 UTC
  To: Ilya Dryomov
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, qemu-devel, ct, qemu-stable, Paolo Bonzini,
	Max Reitz, Jason Dillaman

On 12.01.22 at 10:59, Ilya Dryomov wrote:
> On Mon, Jan 10, 2022 at 12:43 PM Peter Lieven <pl@kamp.de> wrote:
>> librbd had a bug until early 2022 that affected all versions of ceph that
>> supported fast-diff. This bug results in reporting of incorrect offsets
>> if the offset parameter to rbd_diff_iterate2 is not object aligned.
>> Work around this bug by rounding down the offset to object boundaries.
>>
>> Fixes: https://tracker.ceph.com/issues/53784
> I don't think the Fixes tag is appropriate here.  Linking librbd
> ticket is fine but this patch doesn't really fix anything.


Okay, I will change that to See:


>
>> Cc: qemu-stable@nongnu.org
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>>  block/rbd.c | 17 ++++++++++++++++-
>>  1 file changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/rbd.c b/block/rbd.c
>> index 5e9dc91d81..260cb9f4b4 100644
>> --- a/block/rbd.c
>> +++ b/block/rbd.c
>> @@ -1333,6 +1333,7 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
>>      int status, r;
>>      RBDDiffIterateReq req = { .offs = offset };
>>      uint64_t features, flags;
>> +    int64_t head;
>>
>>      assert(offset + bytes <= s->image_size);
>>
>> @@ -1360,6 +1361,19 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
>>          return status;
>>      }
>>
>> +    /*
>> +     * librbd had a bug until early 2022 that affected all versions of ceph that
>> +     * supported fast-diff. This bug results in reporting of incorrect offsets
>> +     * if the offset parameter to rbd_diff_iterate2 is not object aligned.
>> +     * Work around this bug by rounding down the offset to object boundaries.
>> +     *
>> +     * See: https://tracker.ceph.com/issues/53784
>> +     */
>> +    head = offset & (s->object_size - 1);
>> +    offset -= head;
>> +    req.offs -= head;
>> +    bytes += head;
> So it looks like the intention is to have more or less a permanent
> workaround since all librbd versions are affected, right?  For that,
> I think we would need to also reject custom striping patterns and
> clones.  For the above to be reliable, the image has to be standalone
> and have a default striping pattern (stripe_unit == object_size &&
> stripe_count == 1).  Otherwise, behave as if fast-diff is disabled or
> invalid.


Do you have a feeling for how many users use a striping pattern other than the default?

What about EC backed pools?

Do you have another idea how we can detect if the librbd version is broken?


>
>> +
> Nit: I'd replace { .offs = offset } initialization at the top with {}
> and assign to req.offs here, right before calling rbd_diff_iterate2().
>
>>      r = rbd_diff_iterate2(s->image, NULL, offset, bytes, true, true,
>>                            qemu_rbd_diff_iterate_cb, &req);
>>      if (r < 0 && r != QEMU_RBD_EXIT_DIFF_ITERATE2) {
>> @@ -1379,7 +1393,8 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
>>          status = BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID;
>>      }
>>
>> -    *pnum = req.bytes;
>> +    assert(req.bytes > head);
> I'd expand the workaround comment with an explanation of why it's OK
> to round down the offset -- because rbd_diff_iterate2() is called with
> whole_object=true.  If that wasn't the case, on top of inconsistent
> results for different offsets within an object, this assert could be
> triggered.

Sure, you are right. I had this in mind. This also does not change complexity,
since we stay with the offset in the same object. I will mention both.


Peter






* Re: [PATCH 2/2] block/rbd: workaround for ceph issue #53784
  2022-01-12 11:55     ` Peter Lieven
@ 2022-01-12 12:22       ` Ilya Dryomov
  0 siblings, 0 replies; 13+ messages in thread
From: Ilya Dryomov @ 2022-01-12 12:22 UTC
  To: Peter Lieven
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, qemu-devel, ct, qemu-stable, Paolo Bonzini,
	Max Reitz, Jason Dillaman

On Wed, Jan 12, 2022 at 12:55 PM Peter Lieven <pl@kamp.de> wrote:
>
> On 12.01.22 at 10:59, Ilya Dryomov wrote:
> > On Mon, Jan 10, 2022 at 12:43 PM Peter Lieven <pl@kamp.de> wrote:
> >> librbd had a bug until early 2022 that affected all versions of ceph that
> >> supported fast-diff. This bug results in reporting of incorrect offsets
> >> if the offset parameter to rbd_diff_iterate2 is not object aligned.
> >> Work around this bug by rounding down the offset to object boundaries.
> >>
> >> Fixes: https://tracker.ceph.com/issues/53784
> > I don't think the Fixes tag is appropriate here.  Linking librbd
> > ticket is fine but this patch doesn't really fix anything.
>
>
> Okay, I will change that to See:

It's already linked in the source code, up to you if you also want to
link it in the description.

>
>
> >
> >> Cc: qemu-stable@nongnu.org
> >> Signed-off-by: Peter Lieven <pl@kamp.de>
> >> ---
> >>  block/rbd.c | 17 ++++++++++++++++-
> >>  1 file changed, 16 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/block/rbd.c b/block/rbd.c
> >> index 5e9dc91d81..260cb9f4b4 100644
> >> --- a/block/rbd.c
> >> +++ b/block/rbd.c
> >> @@ -1333,6 +1333,7 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
> >>      int status, r;
> >>      RBDDiffIterateReq req = { .offs = offset };
> >>      uint64_t features, flags;
> >> +    int64_t head;
> >>
> >>      assert(offset + bytes <= s->image_size);
> >>
> >> @@ -1360,6 +1361,19 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
> >>          return status;
> >>      }
> >>
> >> +    /*
> >> +     * librbd had a bug until early 2022 that affected all versions of ceph that
> >> +     * supported fast-diff. This bug results in reporting of incorrect offsets
> >> +     * if the offset parameter to rbd_diff_iterate2 is not object aligned.
> >> +     * Work around this bug by rounding down the offset to object boundaries.
> >> +     *
> >> +     * See: https://tracker.ceph.com/issues/53784
> >> +     */
> >> +    head = offset & (s->object_size - 1);
> >> +    offset -= head;
> >> +    req.offs -= head;
> >> +    bytes += head;
> > So it looks like the intention is to have more or less a permanent
> > workaround since all librbd versions are affected, right?  For that,
> > I think we would need to also reject custom striping patterns and
> > clones.  For the above to be reliable, the image has to be standalone
> > and have a default striping pattern (stripe_unit == object_size &&
> > stripe_count == 1).  Otherwise, behave as if fast-diff is disabled or
> > invalid.
>
>
> Do you have a feeling for how many users use a striping pattern other than the default?

Very few.

>
> What about EC backed pools?

In this context EC pools behave exactly the same as replicated pools.

>
> Do you have another idea how we can detect if the librbd version is broken?

No.  Initially I wanted to just fix these bugs in librbd, relying on
the assumption that setups with a new QEMU should also have a fairly
new librbd.  But after looking at various distros and realizing the
extent of rbd_diff_iterate2() issues, I think a long-term workaround
in QEMU makes sense.  A configure-time check for known good versions
of librbd can be added later if someone feels like it.

Thanks,

                Ilya



* Re: [PATCH 1/2] block/rbd: fix handling of holes in .bdrv_co_block_status
  2022-01-12  9:05   ` Ilya Dryomov
@ 2022-01-12 20:39     ` Peter Lieven
  2022-01-12 21:02       ` Ilya Dryomov
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Lieven @ 2022-01-12 20:39 UTC
  To: Ilya Dryomov
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, qemu-stable, ct, qemu-devel, Paolo Bonzini,
	Max Reitz, Jason Dillaman

On 12.01.22 at 10:05, Ilya Dryomov wrote:
> On Mon, Jan 10, 2022 at 12:42 PM Peter Lieven <pl@kamp.de> wrote:
>> the assumption that we can't hit a hole if we do not diff against a snapshot was wrong.
>>
>> We can see a hole in an image if we diff against base if there exists an older snapshot
>> of the image and we have discarded blocks in the image where the snapshot has data.
>>
>> Fixes: 0347a8fd4c3faaedf119be04c197804be40a384b
>> Cc: qemu-stable@nongnu.org
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>>  block/rbd.c | 55 +++++++++++++++++++++++++++++++++--------------------
>>  1 file changed, 34 insertions(+), 21 deletions(-)
>>
>> diff --git a/block/rbd.c b/block/rbd.c
>> index def96292e0..5e9dc91d81 100644
>> --- a/block/rbd.c
>> +++ b/block/rbd.c
>> @@ -1279,13 +1279,24 @@ static int qemu_rbd_diff_iterate_cb(uint64_t offs, size_t len,
>>      RBDDiffIterateReq *req = opaque;
>>
>>      assert(req->offs + req->bytes <= offs);
>> -    /*
>> -     * we do not diff against a snapshot so we should never receive a callback
>> -     * for a hole.
>> -     */
>> -    assert(exists);
>>
>> -    if (!req->exists && offs > req->offs) {
>> +    if (req->exists && offs > req->offs + req->bytes) {
>> +        /*
>> +         * we started in an allocated area and jumped over an unallocated area,
>> +         * req->bytes contains the length of the allocated area before the
>> +         * unallocated area. stop further processing.
>> +         */
>> +        return QEMU_RBD_EXIT_DIFF_ITERATE2;
>> +    }
>> +    if (req->exists && !exists) {
>> +        /*
>> +         * we started in an allocated area and reached a hole. req->bytes
>> +         * contains the length of the allocated area before the hole.
>> +         * stop further processing.
>> +         */
>> +        return QEMU_RBD_EXIT_DIFF_ITERATE2;
>> +    }
>> +    if (!req->exists && exists && offs > req->offs) {
>>          /*
>>           * we started in an unallocated area and hit the first allocated
>>           * block. req->bytes must be set to the length of the unallocated area
>> @@ -1295,17 +1306,19 @@ static int qemu_rbd_diff_iterate_cb(uint64_t offs, size_t len,
>>          return QEMU_RBD_EXIT_DIFF_ITERATE2;
>>      }
>>
>> -    if (req->exists && offs > req->offs + req->bytes) {
>> -        /*
>> -         * we started in an allocated area and jumped over an unallocated area,
>> -         * req->bytes contains the length of the allocated area before the
>> -         * unallocated area. stop further processing.
>> -         */
>> -        return QEMU_RBD_EXIT_DIFF_ITERATE2;
>> -    }
>> +    /*
>> +     * assert that we caught all cases above and allocation state has not
>> +     * changed during callbacks.
>> +     */
>> +    assert(exists == req->exists || !req->bytes);
>> +    req->exists = exists;
>>
>> -    req->bytes += len;
>> -    req->exists = true;
>> +    /*
>> +     * assert that we either return an unallocated block or have got callbacks
>> +     * for all allocated blocks present.
>> +     */
>> +    assert(!req->exists || offs == req->offs + req->bytes);
>> +    req->bytes = offs + len - req->offs;
>>
>>      return 0;
>>  }
>> @@ -1354,13 +1367,13 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
>>      }
>>      assert(req.bytes <= bytes);
>>      if (!req.exists) {
>> -        if (r == 0) {
>> +        if (r == 0 && !req.bytes) {
>>              /*
>> -             * rbd_diff_iterate2 does not invoke callbacks for unallocated
>> -             * areas. This here catches the case where no callback was
>> -             * invoked at all (req.bytes == 0).
>> +             * rbd_diff_iterate2 does not invoke callbacks for unallocated areas
>> +             * except for the case where an overlay has a hole where the parent
>> +             * or an older snapshot of the image has not. This here catches the
>> +             * case where no callback was invoked at all.
>>               */
>> -            assert(req.bytes == 0);
>>              req.bytes = bytes;
>>          }
>>          status = BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID;
>> --
>> 2.25.1
>>
>>
> Hi Peter,
>
> Can we just skip these "holes" by replacing the existing assert with
> an if statement that would simply bail from the callback on !exists?
>
> Just trying to keep the logic as simple as possible since as it turns
> out we get to contend with ages-old librbd bugs here...


I'm afraid this would not work. Consider qemu-img convert.

If we bail out, we would immediately call get_block_status with the offset
where we stopped and hit the !exists case again.


Peter




* Re: [PATCH 1/2] block/rbd: fix handling of holes in .bdrv_co_block_status
  2022-01-12 20:39     ` Peter Lieven
@ 2022-01-12 21:02       ` Ilya Dryomov
  2022-01-12 21:27         ` Peter Lieven
  0 siblings, 1 reply; 13+ messages in thread
From: Ilya Dryomov @ 2022-01-12 21:02 UTC (permalink / raw)
  To: Peter Lieven
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, qemu-stable, ct, qemu-devel, Paolo Bonzini,
	Max Reitz, Jason Dillaman

On Wed, Jan 12, 2022 at 9:39 PM Peter Lieven <pl@kamp.de> wrote:
>
> On 12.01.22 at 10:05, Ilya Dryomov wrote:
> > Hi Peter,
> >
> > Can we just skip these "holes" by replacing the existing assert with
> > an if statement that would simply bail from the callback on !exists?
> >
> > Just trying to keep the logic as simple as possible since as it turns
> > out we get to contend with ages-old librbd bugs here...
>
>
> I'm afraid this would not work. Consider qemu-img convert.
>
> If we bail out, we would immediately call get_block_status with the offset
> where we stopped and hit the !exists case again.

I'm suggesting bailing from the callback (i.e. return 0), not from the
entire rbd_diff_iterate2() instance.  The iteration would move on and
either stumble upon an allocated area within the requested range or run
off the end of the requested range.  Both of these cases are already
handled by the existing code.
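
A rough, self-contained sketch of what that could look like, not the
actual block/rbd.c code.  DiffReq, Extent, EXIT_DIFF_ITERATE, diff_cb()
and query() are hypothetical stand-ins; query() merely replays a list of
extents in place of rbd_diff_iterate2(), and the req->bytes assignment
in the "started unallocated" branch is inferred from the comment in the
quoted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU_RBD_EXIT_DIFF_ITERATE2. */
#define EXIT_DIFF_ITERATE (-9000)

typedef struct {
    uint64_t offs;   /* start of the queried range */
    uint64_t bytes;  /* length of the extent found so far */
    bool exists;     /* true once an allocated extent has been seen */
} DiffReq;

/* One callback invocation as the mock iterator would deliver it. */
typedef struct {
    uint64_t offs;
    size_t len;
    int exists;
} Extent;

static int diff_cb(uint64_t offs, size_t len, int exists, void *opaque)
{
    DiffReq *req = opaque;

    assert(req->offs + req->bytes <= offs);

    if (!exists) {
        /* A reported hole: leave the callback, let the iteration go on. */
        return 0;
    }
    if (!req->exists && offs > req->offs) {
        /* Started in an unallocated area and hit the first allocated block. */
        req->bytes = offs - req->offs;
        return EXIT_DIFF_ITERATE;
    }
    if (req->exists && offs > req->offs + req->bytes) {
        /* Started in an allocated area and jumped over an unallocated one. */
        return EXIT_DIFF_ITERATE;
    }
    req->bytes += len;
    req->exists = true;
    return 0;
}

/* Mock driver: feeds sorted, in-range extents to the callback. */
static DiffReq query(const Extent *ext, size_t n, uint64_t offs, uint64_t bytes)
{
    DiffReq req = { .offs = offs, .bytes = 0, .exists = false };
    int r = 0;

    for (size_t i = 0; i < n && r == 0; i++) {
        r = diff_cb(ext[i].offs, ext[i].len, ext[i].exists, &req);
    }
    if (!req.exists && r == 0) {
        /* Ran off the end without seeing data: whole range is unallocated. */
        req.bytes = bytes;
    }
    return req;
}
```

Since a skipped hole never touches req->bytes, a query that starts on
a hole still reports progress up to the next allocated block or the end
of the range, so the caller does not get stuck at the same offset.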

Thanks,

                Ilya



* Re: [PATCH 1/2] block/rbd: fix handling of holes in .bdrv_co_block_status
  2022-01-12 21:02       ` Ilya Dryomov
@ 2022-01-12 21:27         ` Peter Lieven
  0 siblings, 0 replies; 13+ messages in thread
From: Peter Lieven @ 2022-01-12 21:27 UTC (permalink / raw)
  To: Ilya Dryomov
  Cc: Kevin Wolf, Daniel P. Berrangé,
	qemu-block, ct, qemu-devel, qemu-stable, Paolo Bonzini,
	Max Reitz, Jason Dillaman


> On 12.01.2022 at 22:06, Ilya Dryomov <idryomov@gmail.com> wrote:
> 
> On Wed, Jan 12, 2022 at 9:39 PM Peter Lieven <pl@kamp.de> wrote:
>> 
>>> On 12.01.22 at 10:05, Ilya Dryomov wrote:
>>> Hi Peter,
>>> 
>>> Can we just skip these "holes" by replacing the existing assert with
>>> an if statement that would simply bail from the callback on !exists?
>>> 
>>> Just trying to keep the logic as simple as possible since as it turns
>>> out we get to contend with ages-old librbd bugs here...
>> 
>> 
>> I'm afraid this would not work. Consider qemu-img convert.
>>
>> If we bail out, we would immediately call get_block_status with the offset
>> where we stopped and hit the !exists case again.
> 
> I'm suggesting bailing from the callback (i.e. return 0), not from the
> entire rbd_diff_iterate2() instance.  The iteration would move on and
> either stumble upon an allocated area within the requested range or run
> off the end of the requested range.  Both of these cases are already
> handled by the existing code.

Ah, got it. That’s a smart solution!

Peter





Thread overview: 13+ messages
2022-01-10 11:41 [PATCH 0/2] block/rbd: fixes for bdrv_co_block_status Peter Lieven
2022-01-10 11:41 ` [PATCH 1/2] block/rbd: fix handling of holes in .bdrv_co_block_status Peter Lieven
2022-01-12  9:05   ` Ilya Dryomov
2022-01-12 20:39     ` Peter Lieven
2022-01-12 21:02       ` Ilya Dryomov
2022-01-12 21:27         ` Peter Lieven
2022-01-10 11:41 ` [PATCH 2/2] block/rbd: workaround for ceph issue #53784 Peter Lieven
2022-01-10 14:18   ` Stefano Garzarella
2022-01-11  9:10     ` Peter Lieven
2022-01-11 11:15       ` Stefano Garzarella
2022-01-12  9:59   ` Ilya Dryomov
2022-01-12 11:55     ` Peter Lieven
2022-01-12 12:22       ` Ilya Dryomov
