All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] block/rbd: fix write zeroes with growing images
@ 2022-03-17 16:26 Stefano Garzarella
  2022-03-17 18:27 ` Peter Lieven
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Stefano Garzarella @ 2022-03-17 16:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, Peter Lieven, Hanna Reitz, Ilya Dryomov,
	Stefano Garzarella

Commit d24f80234b ("block/rbd: increase dynamically the image size")
added a workaround to support growing images (eg. qcow2), resizing
the image before write operations that exceed the current size.

We recently added support for write zeroes and without the
workaround we can have problems with qcow2.

So let's move the resize into qemu_rbd_start_co() and do it when
the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 block/rbd.c | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 8f183eba2a..6caf35cbba 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -1107,6 +1107,20 @@ static int coroutine_fn qemu_rbd_start_co(BlockDriverState *bs,
 
     assert(!qiov || qiov->size == bytes);
 
+    if (cmd == RBD_AIO_WRITE || cmd == RBD_AIO_WRITE_ZEROES) {
+        /*
+         * RBD APIs don't allow us to write more than actual size, so in order
+         * to support growing images, we resize the image before write
+         * operations that exceed the current size.
+         */
+        if (offset + bytes > s->image_size) {
+            int r = qemu_rbd_resize(bs, offset + bytes);
+            if (r < 0) {
+                return r;
+            }
+        }
+    }
+
     r = rbd_aio_create_completion(&task,
                                   (rbd_callback_t) qemu_rbd_completion_cb, &c);
     if (r < 0) {
@@ -1182,18 +1196,6 @@ coroutine_fn qemu_rbd_co_pwritev(BlockDriverState *bs, int64_t offset,
                                  int64_t bytes, QEMUIOVector *qiov,
                                  BdrvRequestFlags flags)
 {
-    BDRVRBDState *s = bs->opaque;
-    /*
-     * RBD APIs don't allow us to write more than actual size, so in order
-     * to support growing images, we resize the image before write
-     * operations that exceed the current size.
-     */
-    if (offset + bytes > s->image_size) {
-        int r = qemu_rbd_resize(bs, offset + bytes);
-        if (r < 0) {
-            return r;
-        }
-    }
     return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, RBD_AIO_WRITE);
 }
 
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] block/rbd: fix write zeroes with growing images
  2022-03-17 16:26 [PATCH] block/rbd: fix write zeroes with growing images Stefano Garzarella
@ 2022-03-17 18:27 ` Peter Lieven
  2022-03-18  8:25   ` Stefano Garzarella
  2022-03-18 15:36 ` Hanna Reitz
  2022-03-19 12:33 ` Ilya Dryomov
  2 siblings, 1 reply; 18+ messages in thread
From: Peter Lieven @ 2022-03-17 18:27 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Kevin Wolf, Hanna Reitz, Ilya Dryomov, qemu-devel, qemu-block



> Am 17.03.2022 um 17:26 schrieb Stefano Garzarella <sgarzare@redhat.com>:
> 
> Commit d24f80234b ("block/rbd: increase dynamically the image size")
> added a workaround to support growing images (eg. qcow2), resizing
> the image before write operations that exceed the current size.
> 
> We recently added support for write zeroes and without the
> workaround we can have problems with qcow2.
> 
> So let's move the resize into qemu_rbd_start_co() and do it when
> the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.
> 
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
> Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
> block/rbd.c | 26 ++++++++++++++------------
> 1 file changed, 14 insertions(+), 12 deletions(-)
> 
> diff --git a/block/rbd.c b/block/rbd.c
> index 8f183eba2a..6caf35cbba 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -1107,6 +1107,20 @@ static int coroutine_fn qemu_rbd_start_co(BlockDriverState *bs,
> 
>     assert(!qiov || qiov->size == bytes);
> 
> +    if (cmd == RBD_AIO_WRITE || cmd == RBD_AIO_WRITE_ZEROES) {
> +        /*
> +         * RBD APIs don't allow us to write more than actual size, so in order
> +         * to support growing images, we resize the image before write
> +         * operations that exceed the current size.
> +         */
> +        if (offset + bytes > s->image_size) {
> +            int r = qemu_rbd_resize(bs, offset + bytes);
> +            if (r < 0) {
> +                return r;
> +            }
> +        }
> +    }
> +
>     r = rbd_aio_create_completion(&task,
>                                   (rbd_callback_t) qemu_rbd_completion_cb, &c);
>     if (r < 0) {
> @@ -1182,18 +1196,6 @@ coroutine_fn qemu_rbd_co_pwritev(BlockDriverState *bs, int64_t offset,
>                                  int64_t bytes, QEMUIOVector *qiov,
>                                  BdrvRequestFlags flags)
> {
> -    BDRVRBDState *s = bs->opaque;
> -    /*
> -     * RBD APIs don't allow us to write more than actual size, so in order
> -     * to support growing images, we resize the image before write
> -     * operations that exceed the current size.
> -     */
> -    if (offset + bytes > s->image_size) {
> -        int r = qemu_rbd_resize(bs, offset + bytes);
> -        if (r < 0) {
> -            return r;
> -        }
> -    }
>     return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, RBD_AIO_WRITE);
> }
> 
> -- 
> 2.35.1
> 

Do we really have a use case for growing rbd images?

Peter



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block/rbd: fix write zeroes with growing images
  2022-03-17 18:27 ` Peter Lieven
@ 2022-03-18  8:25   ` Stefano Garzarella
  2022-03-18 15:48     ` Peter Lieven
  2022-03-19 12:40     ` Ilya Dryomov
  0 siblings, 2 replies; 18+ messages in thread
From: Stefano Garzarella @ 2022-03-18  8:25 UTC (permalink / raw)
  To: Peter Lieven
  Cc: Kevin Wolf, Hanna Reitz, Ilya Dryomov, qemu-devel, qemu-block

On Thu, Mar 17, 2022 at 07:27:05PM +0100, Peter Lieven wrote:
>
>
>> Am 17.03.2022 um 17:26 schrieb Stefano Garzarella <sgarzare@redhat.com>:
>>
>> Commit d24f80234b ("block/rbd: increase dynamically the image size")
>> added a workaround to support growing images (eg. qcow2), resizing
>> the image before write operations that exceed the current size.
>>
>> We recently added support for write zeroes and without the
>> workaround we can have problems with qcow2.
>>
>> So let's move the resize into qemu_rbd_start_co() and do it when
>> the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.
>>
>> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
>> Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
>> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>> ---
>> block/rbd.c | 26 ++++++++++++++------------
>> 1 file changed, 14 insertions(+), 12 deletions(-)
>>
>> diff --git a/block/rbd.c b/block/rbd.c
>> index 8f183eba2a..6caf35cbba 100644
>> --- a/block/rbd.c
>> +++ b/block/rbd.c
>> @@ -1107,6 +1107,20 @@ static int coroutine_fn qemu_rbd_start_co(BlockDriverState *bs,
>>
>>     assert(!qiov || qiov->size == bytes);
>>
>> +    if (cmd == RBD_AIO_WRITE || cmd == RBD_AIO_WRITE_ZEROES) {
>> +        /*
>> +         * RBD APIs don't allow us to write more than actual size, so in order
>> +         * to support growing images, we resize the image before write
>> +         * operations that exceed the current size.
>> +         */
>> +        if (offset + bytes > s->image_size) {
>> +            int r = qemu_rbd_resize(bs, offset + bytes);
>> +            if (r < 0) {
>> +                return r;
>> +            }
>> +        }
>> +    }
>> +
>>     r = rbd_aio_create_completion(&task,
>>                                   (rbd_callback_t) qemu_rbd_completion_cb, &c);
>>     if (r < 0) {
>> @@ -1182,18 +1196,6 @@ coroutine_fn qemu_rbd_co_pwritev(BlockDriverState *bs, int64_t offset,
>>                                  int64_t bytes, QEMUIOVector *qiov,
>>                                  BdrvRequestFlags flags)
>> {
>> -    BDRVRBDState *s = bs->opaque;
>> -    /*
>> -     * RBD APIs don't allow us to write more than actual size, so in order
>> -     * to support growing images, we resize the image before write
>> -     * operations that exceed the current size.
>> -     */
>> -    if (offset + bytes > s->image_size) {
>> -        int r = qemu_rbd_resize(bs, offset + bytes);
>> -        if (r < 0) {
>> -            return r;
>> -        }
>> -    }
>>     return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, RBD_AIO_WRITE);
>> }
>>
>> --
>> 2.35.1
>>
>
>Do we really have a use case for growing rbd images?

The use case is to have a qcow2 image on rbd.
I don't think it's very common, but some people use it and here [1] we 
had a little discussion about features that could be interesting (e.g.  
persistent dirty bitmaps for incremental backup).

In any case the support is quite simple and does not affect other use 
cases since we only increase the size when we go beyond the current 
size.

IMHO we can have it in :-)

Thanks,
Stefano

[1] 
https://lore.kernel.org/all/20190415080452.GA6031@localhost.localdomain/



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block/rbd: fix write zeroes with growing images
  2022-03-17 16:26 [PATCH] block/rbd: fix write zeroes with growing images Stefano Garzarella
  2022-03-17 18:27 ` Peter Lieven
@ 2022-03-18 15:36 ` Hanna Reitz
  2022-03-19 12:33 ` Ilya Dryomov
  2 siblings, 0 replies; 18+ messages in thread
From: Hanna Reitz @ 2022-03-18 15:36 UTC (permalink / raw)
  To: Stefano Garzarella, qemu-devel
  Cc: Kevin Wolf, Ilya Dryomov, Peter Lieven, qemu-block

On 17.03.22 17:26, Stefano Garzarella wrote:
> Commit d24f80234b ("block/rbd: increase dynamically the image size")
> added a workaround to support growing images (eg. qcow2), resizing
> the image before write operations that exceed the current size.
>
> We recently added support for write zeroes and without the
> workaround we can have problems with qcow2.
>
> So let's move the resize into qemu_rbd_start_co() and do it when
> the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.
>
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
> Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>   block/rbd.c | 26 ++++++++++++++------------
>   1 file changed, 14 insertions(+), 12 deletions(-)

Thanks, applied to my block branch:

https://gitlab.com/hreitz/qemu/-/commits/block

Hanna



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block/rbd: fix write zeroes with growing images
  2022-03-18  8:25   ` Stefano Garzarella
@ 2022-03-18 15:48     ` Peter Lieven
  2022-03-18 16:47       ` Stefano Garzarella
  2022-03-19 12:40     ` Ilya Dryomov
  1 sibling, 1 reply; 18+ messages in thread
From: Peter Lieven @ 2022-03-18 15:48 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Kevin Wolf, Hanna Reitz, Ilya Dryomov, qemu-devel, qemu-block



> Am 18.03.2022 um 09:25 schrieb Stefano Garzarella <sgarzare@redhat.com>:
> 
> On Thu, Mar 17, 2022 at 07:27:05PM +0100, Peter Lieven wrote:
>> 
>> 
>>>> Am 17.03.2022 um 17:26 schrieb Stefano Garzarella <sgarzare@redhat.com>:
>>> 
>>> Commit d24f80234b ("block/rbd: increase dynamically the image size")
>>> added a workaround to support growing images (eg. qcow2), resizing
>>> the image before write operations that exceed the current size.
>>> 
>>> We recently added support for write zeroes and without the
>>> workaround we can have problems with qcow2.
>>> 
>>> So let's move the resize into qemu_rbd_start_co() and do it when
>>> the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.
>>> 
>>> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
>>> Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
>>> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>>> ---
>>> block/rbd.c | 26 ++++++++++++++------------
>>> 1 file changed, 14 insertions(+), 12 deletions(-)
>>> 
>>> diff --git a/block/rbd.c b/block/rbd.c
>>> index 8f183eba2a..6caf35cbba 100644
>>> --- a/block/rbd.c
>>> +++ b/block/rbd.c
>>> @@ -1107,6 +1107,20 @@ static int coroutine_fn qemu_rbd_start_co(BlockDriverState *bs,
>>> 
>>>    assert(!qiov || qiov->size == bytes);
>>> 
>>> +    if (cmd == RBD_AIO_WRITE || cmd == RBD_AIO_WRITE_ZEROES) {
>>> +        /*
>>> +         * RBD APIs don't allow us to write more than actual size, so in order
>>> +         * to support growing images, we resize the image before write
>>> +         * operations that exceed the current size.
>>> +         */
>>> +        if (offset + bytes > s->image_size) {
>>> +            int r = qemu_rbd_resize(bs, offset + bytes);
>>> +            if (r < 0) {
>>> +                return r;
>>> +            }
>>> +        }
>>> +    }
>>> +
>>>    r = rbd_aio_create_completion(&task,
>>>                                  (rbd_callback_t) qemu_rbd_completion_cb, &c);
>>>    if (r < 0) {
>>> @@ -1182,18 +1196,6 @@ coroutine_fn qemu_rbd_co_pwritev(BlockDriverState *bs, int64_t offset,
>>>                                 int64_t bytes, QEMUIOVector *qiov,
>>>                                 BdrvRequestFlags flags)
>>> {
>>> -    BDRVRBDState *s = bs->opaque;
>>> -    /*
>>> -     * RBD APIs don't allow us to write more than actual size, so in order
>>> -     * to support growing images, we resize the image before write
>>> -     * operations that exceed the current size.
>>> -     */
>>> -    if (offset + bytes > s->image_size) {
>>> -        int r = qemu_rbd_resize(bs, offset + bytes);
>>> -        if (r < 0) {
>>> -            return r;
>>> -        }
>>> -    }
>>>    return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, RBD_AIO_WRITE);
>>> }
>>> 
>>> --
>>> 2.35.1
>>> 
>> 
>> Do we really have a use case for growing rbd images?
> 
> The use case is to have a qcow2 image on rbd.
> I don't think it's very common, but some people use it and here [1] we had a little discussion about features that could be interesting (e.g.  persistent dirty bitmaps for incremental backup).
> 
> In any case the support is quite simple and does not affect other use cases since we only increase the size when we go beyond the current size.
> 
> IMHO we can have it in :-)
> 

The QCOW2 alone doesn’t make much sense, but additional metadata might be a use case.
Be aware that the current approach will serialize requests. If there is a real use case, we might think of a better solution.

Peter

> Thanks,
> Stefano
> 
> [1] https://lore.kernel.org/all/20190415080452.GA6031@localhost.localdomain/
> 




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block/rbd: fix write zeroes with growing images
  2022-03-18 15:48     ` Peter Lieven
@ 2022-03-18 16:47       ` Stefano Garzarella
  2022-03-19 15:15         ` Peter Lieven
  0 siblings, 1 reply; 18+ messages in thread
From: Stefano Garzarella @ 2022-03-18 16:47 UTC (permalink / raw)
  To: Peter Lieven
  Cc: Kevin Wolf, Hanna Reitz, Ilya Dryomov, qemu-devel, qemu-block

On Fri, Mar 18, 2022 at 04:48:18PM +0100, Peter Lieven wrote:
>
>
>> Am 18.03.2022 um 09:25 schrieb Stefano Garzarella <sgarzare@redhat.com>:
>>
>> On Thu, Mar 17, 2022 at 07:27:05PM +0100, Peter Lieven wrote:
>>>
>>>
>>>>> Am 17.03.2022 um 17:26 schrieb Stefano Garzarella <sgarzare@redhat.com>:
>>>>
>>>> Commit d24f80234b ("block/rbd: increase dynamically the image size")
>>>> added a workaround to support growing images (eg. qcow2), resizing
>>>> the image before write operations that exceed the current size.
>>>>
>>>> We recently added support for write zeroes and without the
>>>> workaround we can have problems with qcow2.
>>>>
>>>> So let's move the resize into qemu_rbd_start_co() and do it when
>>>> the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.
>>>>
>>>> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
>>>> Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
>>>> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>>>> ---
>>>> block/rbd.c | 26 ++++++++++++++------------
>>>> 1 file changed, 14 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/block/rbd.c b/block/rbd.c
>>>> index 8f183eba2a..6caf35cbba 100644
>>>> --- a/block/rbd.c
>>>> +++ b/block/rbd.c
>>>> @@ -1107,6 +1107,20 @@ static int coroutine_fn qemu_rbd_start_co(BlockDriverState *bs,
>>>>
>>>>    assert(!qiov || qiov->size == bytes);
>>>>
>>>> +    if (cmd == RBD_AIO_WRITE || cmd == RBD_AIO_WRITE_ZEROES) {
>>>> +        /*
>>>> +         * RBD APIs don't allow us to write more than actual size, so in order
>>>> +         * to support growing images, we resize the image before write
>>>> +         * operations that exceed the current size.
>>>> +         */
>>>> +        if (offset + bytes > s->image_size) {
>>>> +            int r = qemu_rbd_resize(bs, offset + bytes);
>>>> +            if (r < 0) {
>>>> +                return r;
>>>> +            }
>>>> +        }
>>>> +    }
>>>> +
>>>>    r = rbd_aio_create_completion(&task,
>>>>                                  (rbd_callback_t) qemu_rbd_completion_cb, &c);
>>>>    if (r < 0) {
>>>> @@ -1182,18 +1196,6 @@ coroutine_fn qemu_rbd_co_pwritev(BlockDriverState *bs, int64_t offset,
>>>>                                 int64_t bytes, QEMUIOVector *qiov,
>>>>                                 BdrvRequestFlags flags)
>>>> {
>>>> -    BDRVRBDState *s = bs->opaque;
>>>> -    /*
>>>> -     * RBD APIs don't allow us to write more than actual size, so in order
>>>> -     * to support growing images, we resize the image before write
>>>> -     * operations that exceed the current size.
>>>> -     */
>>>> -    if (offset + bytes > s->image_size) {
>>>> -        int r = qemu_rbd_resize(bs, offset + bytes);
>>>> -        if (r < 0) {
>>>> -            return r;
>>>> -        }
>>>> -    }
>>>>    return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, RBD_AIO_WRITE);
>>>> }
>>>>
>>>> --
>>>> 2.35.1
>>>>
>>>
>>> Do we really have a use case for growing rbd images?
>>
>> The use case is to have a qcow2 image on rbd.
>> I don't think it's very common, but some people use it and here [1] we had a little discussion about features that could be interesting (e.g.  persistent dirty bitmaps for incremental backup).
>>
>> In any case the support is quite simple and does not affect other use cases since we only increase the size when we go beyond the current size.
>>
>> IMHO we can have it in :-)
>>
>
>The QCOW2 alone doesn’t make much sense, but additional metadata might 
>be a use case.

Yep.

>Be aware that the current approach will serialize requests. If there is 
>a real use case, we might think of a better solution.

Good point, but it only happens when we have to resize, so maybe it's 
okay for now, but I agree we could do better ;-)

Thanks,
Stefano



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block/rbd: fix write zeroes with growing images
  2022-03-17 16:26 [PATCH] block/rbd: fix write zeroes with growing images Stefano Garzarella
  2022-03-17 18:27 ` Peter Lieven
  2022-03-18 15:36 ` Hanna Reitz
@ 2022-03-19 12:33 ` Ilya Dryomov
  2 siblings, 0 replies; 18+ messages in thread
From: Ilya Dryomov @ 2022-03-19 12:33 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Kevin Wolf, Hanna Reitz, Peter Lieven, qemu-devel, qemu block

On Thu, Mar 17, 2022 at 5:26 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> Commit d24f80234b ("block/rbd: increase dynamically the image size")
> added a workaround to support growing images (eg. qcow2), resizing
> the image before write operations that exceed the current size.
>
> We recently added support for write zeroes and without the
> workaround we can have problems with qcow2.
>
> So let's move the resize into qemu_rbd_start_co() and do it when
> the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.
>
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
> Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>  block/rbd.c | 26 ++++++++++++++------------
>  1 file changed, 14 insertions(+), 12 deletions(-)
>
> diff --git a/block/rbd.c b/block/rbd.c
> index 8f183eba2a..6caf35cbba 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -1107,6 +1107,20 @@ static int coroutine_fn qemu_rbd_start_co(BlockDriverState *bs,
>
>      assert(!qiov || qiov->size == bytes);
>
> +    if (cmd == RBD_AIO_WRITE || cmd == RBD_AIO_WRITE_ZEROES) {
> +        /*
> +         * RBD APIs don't allow us to write more than actual size, so in order
> +         * to support growing images, we resize the image before write
> +         * operations that exceed the current size.
> +         */
> +        if (offset + bytes > s->image_size) {
> +            int r = qemu_rbd_resize(bs, offset + bytes);
> +            if (r < 0) {
> +                return r;
> +            }
> +        }
> +    }
> +
>      r = rbd_aio_create_completion(&task,
>                                    (rbd_callback_t) qemu_rbd_completion_cb, &c);
>      if (r < 0) {
> @@ -1182,18 +1196,6 @@ coroutine_fn qemu_rbd_co_pwritev(BlockDriverState *bs, int64_t offset,
>                                   int64_t bytes, QEMUIOVector *qiov,
>                                   BdrvRequestFlags flags)
>  {
> -    BDRVRBDState *s = bs->opaque;
> -    /*
> -     * RBD APIs don't allow us to write more than actual size, so in order
> -     * to support growing images, we resize the image before write
> -     * operations that exceed the current size.
> -     */
> -    if (offset + bytes > s->image_size) {
> -        int r = qemu_rbd_resize(bs, offset + bytes);
> -        if (r < 0) {
> -            return r;
> -        }
> -    }
>      return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, RBD_AIO_WRITE);
>  }
>
> --
> 2.35.1
>

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>

Thanks,

                Ilya


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block/rbd: fix write zeroes with growing images
  2022-03-18  8:25   ` Stefano Garzarella
  2022-03-18 15:48     ` Peter Lieven
@ 2022-03-19 12:40     ` Ilya Dryomov
  2022-03-19 13:23       ` Ilya Dryomov
  1 sibling, 1 reply; 18+ messages in thread
From: Ilya Dryomov @ 2022-03-19 12:40 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Kevin Wolf, Hanna Reitz, Peter Lieven, qemu-devel, qemu block

On Fri, Mar 18, 2022 at 9:25 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Thu, Mar 17, 2022 at 07:27:05PM +0100, Peter Lieven wrote:
> >
> >
> >> Am 17.03.2022 um 17:26 schrieb Stefano Garzarella <sgarzare@redhat.com>:
> >>
> >> Commit d24f80234b ("block/rbd: increase dynamically the image size")
> >> added a workaround to support growing images (eg. qcow2), resizing
> >> the image before write operations that exceed the current size.
> >>
> >> We recently added support for write zeroes and without the
> >> workaround we can have problems with qcow2.
> >>
> >> So let's move the resize into qemu_rbd_start_co() and do it when
> >> the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.
> >>
> >> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
> >> Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
> >> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> >> ---
> >> block/rbd.c | 26 ++++++++++++++------------
> >> 1 file changed, 14 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/block/rbd.c b/block/rbd.c
> >> index 8f183eba2a..6caf35cbba 100644
> >> --- a/block/rbd.c
> >> +++ b/block/rbd.c
> >> @@ -1107,6 +1107,20 @@ static int coroutine_fn qemu_rbd_start_co(BlockDriverState *bs,
> >>
> >>     assert(!qiov || qiov->size == bytes);
> >>
> >> +    if (cmd == RBD_AIO_WRITE || cmd == RBD_AIO_WRITE_ZEROES) {
> >> +        /*
> >> +         * RBD APIs don't allow us to write more than actual size, so in order
> >> +         * to support growing images, we resize the image before write
> >> +         * operations that exceed the current size.
> >> +         */
> >> +        if (offset + bytes > s->image_size) {
> >> +            int r = qemu_rbd_resize(bs, offset + bytes);
> >> +            if (r < 0) {
> >> +                return r;
> >> +            }
> >> +        }
> >> +    }
> >> +
> >>     r = rbd_aio_create_completion(&task,
> >>                                   (rbd_callback_t) qemu_rbd_completion_cb, &c);
> >>     if (r < 0) {
> >> @@ -1182,18 +1196,6 @@ coroutine_fn qemu_rbd_co_pwritev(BlockDriverState *bs, int64_t offset,
> >>                                  int64_t bytes, QEMUIOVector *qiov,
> >>                                  BdrvRequestFlags flags)
> >> {
> >> -    BDRVRBDState *s = bs->opaque;
> >> -    /*
> >> -     * RBD APIs don't allow us to write more than actual size, so in order
> >> -     * to support growing images, we resize the image before write
> >> -     * operations that exceed the current size.
> >> -     */
> >> -    if (offset + bytes > s->image_size) {
> >> -        int r = qemu_rbd_resize(bs, offset + bytes);
> >> -        if (r < 0) {
> >> -            return r;
> >> -        }
> >> -    }
> >>     return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, RBD_AIO_WRITE);
> >> }
> >>
> >> --
> >> 2.35.1
> >>
> >
> >Do we really have a use case for growing rbd images?
>
> The use case is to have a qcow2 image on rbd.
> I don't think it's very common, but some people use it and here [1] we
> had a little discussion about features that could be interesting (e.g.
> persistent dirty bitmaps for incremental backup).

RBD supports that natively (object-map and fast-diff image features).
The granularity is higher than a typical QCOW2 granularity (64K vs 4M)
but I have never heard of that being a concern.

Thanks,

                Ilya


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block/rbd: fix write zeroes with growing images
  2022-03-19 12:40     ` Ilya Dryomov
@ 2022-03-19 13:23       ` Ilya Dryomov
  2022-03-21  8:17         ` Stefano Garzarella
  0 siblings, 1 reply; 18+ messages in thread
From: Ilya Dryomov @ 2022-03-19 13:23 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Kevin Wolf, Hanna Reitz, Peter Lieven, qemu-devel, qemu block

On Sat, Mar 19, 2022 at 1:40 PM Ilya Dryomov <idryomov@gmail.com> wrote:
>
> On Fri, Mar 18, 2022 at 9:25 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
> >
> > On Thu, Mar 17, 2022 at 07:27:05PM +0100, Peter Lieven wrote:
> > >
> > >
> > >> Am 17.03.2022 um 17:26 schrieb Stefano Garzarella <sgarzare@redhat.com>:
> > >>
> > >> Commit d24f80234b ("block/rbd: increase dynamically the image size")
> > >> added a workaround to support growing images (eg. qcow2), resizing
> > >> the image before write operations that exceed the current size.
> > >>
> > >> We recently added support for write zeroes and without the
> > >> workaround we can have problems with qcow2.
> > >>
> > >> So let's move the resize into qemu_rbd_start_co() and do it when
> > >> the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.
> > >>
> > >> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
> > >> Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
> > >> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > >> ---
> > >> block/rbd.c | 26 ++++++++++++++------------
> > >> 1 file changed, 14 insertions(+), 12 deletions(-)
> > >>
> > >> diff --git a/block/rbd.c b/block/rbd.c
> > >> index 8f183eba2a..6caf35cbba 100644
> > >> --- a/block/rbd.c
> > >> +++ b/block/rbd.c
> > >> @@ -1107,6 +1107,20 @@ static int coroutine_fn qemu_rbd_start_co(BlockDriverState *bs,
> > >>
> > >>     assert(!qiov || qiov->size == bytes);
> > >>
> > >> +    if (cmd == RBD_AIO_WRITE || cmd == RBD_AIO_WRITE_ZEROES) {
> > >> +        /*
> > >> +         * RBD APIs don't allow us to write more than actual size, so in order
> > >> +         * to support growing images, we resize the image before write
> > >> +         * operations that exceed the current size.
> > >> +         */
> > >> +        if (offset + bytes > s->image_size) {
> > >> +            int r = qemu_rbd_resize(bs, offset + bytes);
> > >> +            if (r < 0) {
> > >> +                return r;
> > >> +            }
> > >> +        }
> > >> +    }
> > >> +
> > >>     r = rbd_aio_create_completion(&task,
> > >>                                   (rbd_callback_t) qemu_rbd_completion_cb, &c);
> > >>     if (r < 0) {
> > >> @@ -1182,18 +1196,6 @@ coroutine_fn qemu_rbd_co_pwritev(BlockDriverState *bs, int64_t offset,
> > >>                                  int64_t bytes, QEMUIOVector *qiov,
> > >>                                  BdrvRequestFlags flags)
> > >> {
> > >> -    BDRVRBDState *s = bs->opaque;
> > >> -    /*
> > >> -     * RBD APIs don't allow us to write more than actual size, so in order
> > >> -     * to support growing images, we resize the image before write
> > >> -     * operations that exceed the current size.
> > >> -     */
> > >> -    if (offset + bytes > s->image_size) {
> > >> -        int r = qemu_rbd_resize(bs, offset + bytes);
> > >> -        if (r < 0) {
> > >> -            return r;
> > >> -        }
> > >> -    }
> > >>     return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, RBD_AIO_WRITE);
> > >> }
> > >>
> > >> --
> > >> 2.35.1
> > >>
> > >
> > >Do we really have a use case for growing rbd images?
> >
> > The use case is to have a qcow2 image on rbd.
> > I don't think it's very common, but some people use it and here [1] we
> > had a little discussion about features that could be interesting (e.g.
> > persistent dirty bitmaps for incremental backup).
>
> RBD supports that natively (object-map and fast-diff image features).
> The granularity is higher than a typical QCOW2 granularity (64K vs 4M)
> but I have never heard of that being a concern.

Sorry, I meant to say lower (more coarse grained) above.

RBD's default object size is 4M and that is the granularity at which
dirtiness is typically tracked.  It is possible to ask for a byte-level
incremental diff but it is obviously slow.

Thanks,

                Ilya


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block/rbd: fix write zeroes with growing images
  2022-03-18 16:47       ` Stefano Garzarella
@ 2022-03-19 15:15         ` Peter Lieven
  2022-03-21  8:31           ` Stefano Garzarella
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Lieven @ 2022-03-19 15:15 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Kevin Wolf, Hanna Reitz, Ilya Dryomov, qemu-devel, qemu-block



> Am 18.03.2022 um 17:47 schrieb Stefano Garzarella <sgarzare@redhat.com>:
> 
> On Fri, Mar 18, 2022 at 04:48:18PM +0100, Peter Lieven wrote:
>> 
>> 
>>>> Am 18.03.2022 um 09:25 schrieb Stefano Garzarella <sgarzare@redhat.com>:
>>> 
>>> On Thu, Mar 17, 2022 at 07:27:05PM +0100, Peter Lieven wrote:
>>>> 
>>>> 
>>>>>> Am 17.03.2022 um 17:26 schrieb Stefano Garzarella <sgarzare@redhat.com>:
>>>>> 
>>>>> Commit d24f80234b ("block/rbd: increase dynamically the image size")
>>>>> added a workaround to support growing images (eg. qcow2), resizing
>>>>> the image before write operations that exceed the current size.
>>>>> 
>>>>> We recently added support for write zeroes and without the
>>>>> workaround we can have problems with qcow2.
>>>>> 
>>>>> So let's move the resize into qemu_rbd_start_co() and do it when
>>>>> the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.
>>>>> 
>>>>> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
>>>>> Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
>>>>> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>>>>> ---
>>>>> block/rbd.c | 26 ++++++++++++++------------
>>>>> 1 file changed, 14 insertions(+), 12 deletions(-)
>>>>> 
>>>>> diff --git a/block/rbd.c b/block/rbd.c
>>>>> index 8f183eba2a..6caf35cbba 100644
>>>>> --- a/block/rbd.c
>>>>> +++ b/block/rbd.c
>>>>> @@ -1107,6 +1107,20 @@ static int coroutine_fn qemu_rbd_start_co(BlockDriverState *bs,
>>>>> 
>>>>>   assert(!qiov || qiov->size == bytes);
>>>>> 
>>>>> +    if (cmd == RBD_AIO_WRITE || cmd == RBD_AIO_WRITE_ZEROES) {
>>>>> +        /*
>>>>> +         * RBD APIs don't allow us to write more than actual size, so in order
>>>>> +         * to support growing images, we resize the image before write
>>>>> +         * operations that exceed the current size.
>>>>> +         */
>>>>> +        if (offset + bytes > s->image_size) {
>>>>> +            int r = qemu_rbd_resize(bs, offset + bytes);
>>>>> +            if (r < 0) {
>>>>> +                return r;
>>>>> +            }
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>>   r = rbd_aio_create_completion(&task,
>>>>>                                 (rbd_callback_t) qemu_rbd_completion_cb, &c);
>>>>>   if (r < 0) {
>>>>> @@ -1182,18 +1196,6 @@ coroutine_fn qemu_rbd_co_pwritev(BlockDriverState *bs, int64_t offset,
>>>>>                                int64_t bytes, QEMUIOVector *qiov,
>>>>>                                BdrvRequestFlags flags)
>>>>> {
>>>>> -    BDRVRBDState *s = bs->opaque;
>>>>> -    /*
>>>>> -     * RBD APIs don't allow us to write more than actual size, so in order
>>>>> -     * to support growing images, we resize the image before write
>>>>> -     * operations that exceed the current size.
>>>>> -     */
>>>>> -    if (offset + bytes > s->image_size) {
>>>>> -        int r = qemu_rbd_resize(bs, offset + bytes);
>>>>> -        if (r < 0) {
>>>>> -            return r;
>>>>> -        }
>>>>> -    }
>>>>>   return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, RBD_AIO_WRITE);
>>>>> }
>>>>> 
>>>>> --
>>>>> 2.35.1
>>>>> 
>>>> 
>>>> Do we really have a use case for growing rbd images?
>>> 
>>> The use case is to have a qcow2 image on rbd.
>>> I don't think it's very common, but some people use it and here [1] we had a little discussion about features that could be interesting (e.g.  persistent dirty bitmaps for incremental backup).
>>> 
>>> In any case the support is quite simple and does not affect other use cases since we only increase the size when we go beyond the current size.
>>> 
>>> IMHO we can have it in :-)
>>> 
>> 
>> The QCOW2 alone doesn’t make much sense, but additional metadata might be a use case.
> 
> Yep.
> 
>> Be aware that the current approach will serialize requests. If there is a real use case, we might think of a better solution.
> 
> Good point, but it only happens when we have to resize, so maybe it's okay for now, but I agree we could do better ;-)

There might also be a problem if a write for a higher offset past eof will be executed shortly before a write to a slightly lower offset past eof. The second resize will fail as it would shrink the image. We would need proper locking to avoid this. Maybe we need to check if we write past eof. If yes, take a lock around the resize op and then check again if it’s still eof and only resize if true.

Peter

> 
> Thanks,
> Stefano
> 




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block/rbd: fix write zeroes with growing images
  2022-03-19 13:23       ` Ilya Dryomov
@ 2022-03-21  8:17         ` Stefano Garzarella
  0 siblings, 0 replies; 18+ messages in thread
From: Stefano Garzarella @ 2022-03-21  8:17 UTC (permalink / raw)
  To: Ilya Dryomov
  Cc: Kevin Wolf, Hanna Reitz, Peter Lieven, qemu-devel, qemu block

On Sat, Mar 19, 2022 at 02:23:18PM +0100, Ilya Dryomov wrote:
>On Sat, Mar 19, 2022 at 1:40 PM Ilya Dryomov <idryomov@gmail.com> wrote:
>>
>> On Fri, Mar 18, 2022 at 9:25 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>> >
>> > On Thu, Mar 17, 2022 at 07:27:05PM +0100, Peter Lieven wrote:
>> > >
>> > >
>> > >> Am 17.03.2022 um 17:26 schrieb Stefano Garzarella <sgarzare@redhat.com>:
>> > >>
>> > >> Commit d24f80234b ("block/rbd: increase dynamically the image size")
>> > >> added a workaround to support growing images (eg. qcow2), resizing
>> > >> the image before write operations that exceed the current size.
>> > >>
>> > >> We recently added support for write zeroes and without the
>> > >> workaround we can have problems with qcow2.
>> > >>
>> > >> So let's move the resize into qemu_rbd_start_co() and do it when
>> > >> the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.
>> > >>
>> > >> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
>> > >> Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
>> > >> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>> > >> ---
>> > >> block/rbd.c | 26 ++++++++++++++------------
>> > >> 1 file changed, 14 insertions(+), 12 deletions(-)
>> > >>
>> > >> diff --git a/block/rbd.c b/block/rbd.c
>> > >> index 8f183eba2a..6caf35cbba 100644
>> > >> --- a/block/rbd.c
>> > >> +++ b/block/rbd.c
>> > >> @@ -1107,6 +1107,20 @@ static int coroutine_fn qemu_rbd_start_co(BlockDriverState *bs,
>> > >>
>> > >>     assert(!qiov || qiov->size == bytes);
>> > >>
>> > >> +    if (cmd == RBD_AIO_WRITE || cmd == RBD_AIO_WRITE_ZEROES) {
>> > >> +        /*
>> > >> +         * RBD APIs don't allow us to write more than actual size, so in order
>> > >> +         * to support growing images, we resize the image before write
>> > >> +         * operations that exceed the current size.
>> > >> +         */
>> > >> +        if (offset + bytes > s->image_size) {
>> > >> +            int r = qemu_rbd_resize(bs, offset + bytes);
>> > >> +            if (r < 0) {
>> > >> +                return r;
>> > >> +            }
>> > >> +        }
>> > >> +    }
>> > >> +
>> > >>     r = rbd_aio_create_completion(&task,
>> > >>                                   (rbd_callback_t) qemu_rbd_completion_cb, &c);
>> > >>     if (r < 0) {
>> > >> @@ -1182,18 +1196,6 @@ coroutine_fn qemu_rbd_co_pwritev(BlockDriverState *bs, int64_t offset,
>> > >>                                  int64_t bytes, QEMUIOVector *qiov,
>> > >>                                  BdrvRequestFlags flags)
>> > >> {
>> > >> -    BDRVRBDState *s = bs->opaque;
>> > >> -    /*
>> > >> -     * RBD APIs don't allow us to write more than actual size, so in order
>> > >> -     * to support growing images, we resize the image before write
>> > >> -     * operations that exceed the current size.
>> > >> -     */
>> > >> -    if (offset + bytes > s->image_size) {
>> > >> -        int r = qemu_rbd_resize(bs, offset + bytes);
>> > >> -        if (r < 0) {
>> > >> -            return r;
>> > >> -        }
>> > >> -    }
>> > >>     return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, RBD_AIO_WRITE);
>> > >> }
>> > >>
>> > >> --
>> > >> 2.35.1
>> > >>
>> > >
>> > >Do we really have a use case for growing rbd images?
>> >
>> > The use case is to have a qcow2 image on rbd.
>> > I don't think it's very common, but some people use it and here [1] we
>> > had a little discussion about features that could be interesting (e.g.
>> > persistent dirty bitmaps for incremental backup).
>>
>> RBD supports that natively (object-map and fast-diff image features).
>> The granularity is higher than a typical QCOW2 granularity (64K vs 4M)
>> but I have never heard of that being a concern.
>
>Sorry, I meant to say lower (more coarse grained) above.
>
>RBD's default object size is 4M and that is the granularity at which
>dirtiness is typically tracked.  It is possible to ask for a byte-level
>incremental diff but it is obviously slow.

I didn't know that, thanks for letting me know!

Stefano



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block/rbd: fix write zeroes with growing images
  2022-03-19 15:15         ` Peter Lieven
@ 2022-03-21  8:31           ` Stefano Garzarella
  2022-03-22  9:38             ` Hanna Reitz
  0 siblings, 1 reply; 18+ messages in thread
From: Stefano Garzarella @ 2022-03-21  8:31 UTC (permalink / raw)
  To: Peter Lieven
  Cc: Kevin Wolf, Hanna Reitz, Ilya Dryomov, qemu-devel, qemu-block

On Sat, Mar 19, 2022 at 04:15:33PM +0100, Peter Lieven wrote:
>
>
>> Am 18.03.2022 um 17:47 schrieb Stefano Garzarella <sgarzare@redhat.com>:
>>
>> On Fri, Mar 18, 2022 at 04:48:18PM +0100, Peter Lieven wrote:
>>>
>>>
>>>>> Am 18.03.2022 um 09:25 schrieb Stefano Garzarella <sgarzare@redhat.com>:
>>>>
>>>> On Thu, Mar 17, 2022 at 07:27:05PM +0100, Peter Lieven wrote:
>>>>>
>>>>>
>>>>>>> Am 17.03.2022 um 17:26 schrieb Stefano Garzarella <sgarzare@redhat.com>:
>>>>>>
>>>>>> Commit d24f80234b ("block/rbd: increase dynamically the image size")
>>>>>> added a workaround to support growing images (eg. qcow2), resizing
>>>>>> the image before write operations that exceed the current size.
>>>>>>
>>>>>> We recently added support for write zeroes and without the
>>>>>> workaround we can have problems with qcow2.
>>>>>>
>>>>>> So let's move the resize into qemu_rbd_start_co() and do it when
>>>>>> the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.
>>>>>>
>>>>>> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
>>>>>> Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
>>>>>> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>>>>>> ---
>>>>>> block/rbd.c | 26 ++++++++++++++------------
>>>>>> 1 file changed, 14 insertions(+), 12 deletions(-)
>>>>>>
>>>>>> diff --git a/block/rbd.c b/block/rbd.c
>>>>>> index 8f183eba2a..6caf35cbba 100644
>>>>>> --- a/block/rbd.c
>>>>>> +++ b/block/rbd.c
>>>>>> @@ -1107,6 +1107,20 @@ static int coroutine_fn qemu_rbd_start_co(BlockDriverState *bs,
>>>>>>
>>>>>>   assert(!qiov || qiov->size == bytes);
>>>>>>
>>>>>> +    if (cmd == RBD_AIO_WRITE || cmd == RBD_AIO_WRITE_ZEROES) {
>>>>>> +        /*
>>>>>> +         * RBD APIs don't allow us to write more than actual size, so in order
>>>>>> +         * to support growing images, we resize the image before write
>>>>>> +         * operations that exceed the current size.
>>>>>> +         */
>>>>>> +        if (offset + bytes > s->image_size) {
>>>>>> +            int r = qemu_rbd_resize(bs, offset + bytes);
>>>>>> +            if (r < 0) {
>>>>>> +                return r;
>>>>>> +            }
>>>>>> +        }
>>>>>> +    }
>>>>>> +
>>>>>>   r = rbd_aio_create_completion(&task,
>>>>>>                                 (rbd_callback_t) qemu_rbd_completion_cb, &c);
>>>>>>   if (r < 0) {
>>>>>> @@ -1182,18 +1196,6 @@ coroutine_fn qemu_rbd_co_pwritev(BlockDriverState *bs, int64_t offset,
>>>>>>                                int64_t bytes, QEMUIOVector *qiov,
>>>>>>                                BdrvRequestFlags flags)
>>>>>> {
>>>>>> -    BDRVRBDState *s = bs->opaque;
>>>>>> -    /*
>>>>>> -     * RBD APIs don't allow us to write more than actual size, so in order
>>>>>> -     * to support growing images, we resize the image before write
>>>>>> -     * operations that exceed the current size.
>>>>>> -     */
>>>>>> -    if (offset + bytes > s->image_size) {
>>>>>> -        int r = qemu_rbd_resize(bs, offset + bytes);
>>>>>> -        if (r < 0) {
>>>>>> -            return r;
>>>>>> -        }
>>>>>> -    }
>>>>>>   return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, RBD_AIO_WRITE);
>>>>>> }
>>>>>>
>>>>>> --
>>>>>> 2.35.1
>>>>>>
>>>>>
>>>>> Do we really have a use case for growing rbd images?
>>>>
>>>> The use case is to have a qcow2 image on rbd.
>>>> I don't think it's very common, but some people use it and here [1] 
>>>> we had a little discussion about features that could be interesting 
>>>> (e.g.  persistent dirty bitmaps for incremental backup).
>>>>
>>>> In any case the support is quite simple and does not affect other 
>>>> use cases since we only increase the size when we go beyond the 
>>>> current size.
>>>>
>>>> IMHO we can have it in :-)
>>>>
>>>
>>> The QCOW2 alone doesn’t make much sense, but additional metadata 
>>> might be a use case.
>>
>> Yep.
>>
>>> Be aware that the current approach will serialize requests. If there 
>>> is a real use case, we might think of a better solution.
>>
>> Good point, but it only happens when we have to resize, so maybe it's 
>> okay for now, but I agree we could do better ;-)
>
>There might also be a problem if a write for a higher offset past eof 
>will be executed shortly before a write to a slightly lower offset past 
>eof. The second resize will fail as it would shrink the image. We would 
>need proper locking to avoid this. Maybe we need to check if we write 
>past eof. If yes, take a lock around the resize op and then check again 
>if it’s still eof and only resize if true.

I thought rbd_resize() was synchronous. Indeed when you said this could 
serialize writes it sounded like confirmation to me.

Since we call rbd_resize() before rbd_aio_writev(), I thought this case 
could not occur.

Can you please elaborate?

Thanks,
Stefano



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block/rbd: fix write zeroes with growing images
  2022-03-21  8:31           ` Stefano Garzarella
@ 2022-03-22  9:38             ` Hanna Reitz
  2022-03-24  9:52               ` Peter Lieven
  0 siblings, 1 reply; 18+ messages in thread
From: Hanna Reitz @ 2022-03-22  9:38 UTC (permalink / raw)
  To: Stefano Garzarella, Peter Lieven
  Cc: Kevin Wolf, Ilya Dryomov, qemu-devel, qemu-block

On 21.03.22 09:31, Stefano Garzarella wrote:
> On Sat, Mar 19, 2022 at 04:15:33PM +0100, Peter Lieven wrote:
>>
>>
>>> Am 18.03.2022 um 17:47 schrieb Stefano Garzarella 
>>> <sgarzare@redhat.com>:
>>>
>>> On Fri, Mar 18, 2022 at 04:48:18PM +0100, Peter Lieven wrote:
>>>>
>>>>
>>>>>> Am 18.03.2022 um 09:25 schrieb Stefano Garzarella 
>>>>>> <sgarzare@redhat.com>:
>>>>>
>>>>> On Thu, Mar 17, 2022 at 07:27:05PM +0100, Peter Lieven wrote:
>>>>>>
>>>>>>
>>>>>>>> Am 17.03.2022 um 17:26 schrieb Stefano Garzarella 
>>>>>>>> <sgarzare@redhat.com>:
>>>>>>>
>>>>>>> Commit d24f80234b ("block/rbd: increase dynamically the image 
>>>>>>> size")
>>>>>>> added a workaround to support growing images (eg. qcow2), resizing
>>>>>>> the image before write operations that exceed the current size.
>>>>>>>
>>>>>>> We recently added support for write zeroes and without the
>>>>>>> workaround we can have problems with qcow2.
>>>>>>>
>>>>>>> So let's move the resize into qemu_rbd_start_co() and do it when
>>>>>>> the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.
>>>>>>>
>>>>>>> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
>>>>>>> Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
>>>>>>> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>>>>>>> ---
>>>>>>> block/rbd.c | 26 ++++++++++++++------------
>>>>>>> 1 file changed, 14 insertions(+), 12 deletions(-)
>>>>>>>
>>>>>>> diff --git a/block/rbd.c b/block/rbd.c
>>>>>>> index 8f183eba2a..6caf35cbba 100644
>>>>>>> --- a/block/rbd.c
>>>>>>> +++ b/block/rbd.c
>>>>>>> @@ -1107,6 +1107,20 @@ static int coroutine_fn 
>>>>>>> qemu_rbd_start_co(BlockDriverState *bs,
>>>>>>>
>>>>>>>   assert(!qiov || qiov->size == bytes);
>>>>>>>
>>>>>>> +    if (cmd == RBD_AIO_WRITE || cmd == RBD_AIO_WRITE_ZEROES) {
>>>>>>> +        /*
>>>>>>> +         * RBD APIs don't allow us to write more than actual 
>>>>>>> size, so in order
>>>>>>> +         * to support growing images, we resize the image 
>>>>>>> before write
>>>>>>> +         * operations that exceed the current size.
>>>>>>> +         */
>>>>>>> +        if (offset + bytes > s->image_size) {
>>>>>>> +            int r = qemu_rbd_resize(bs, offset + bytes);
>>>>>>> +            if (r < 0) {
>>>>>>> +                return r;
>>>>>>> +            }
>>>>>>> +        }
>>>>>>> +    }
>>>>>>> +
>>>>>>>   r = rbd_aio_create_completion(&task,
>>>>>>>                                 (rbd_callback_t) 
>>>>>>> qemu_rbd_completion_cb, &c);
>>>>>>>   if (r < 0) {
>>>>>>> @@ -1182,18 +1196,6 @@ coroutine_fn 
>>>>>>> qemu_rbd_co_pwritev(BlockDriverState *bs, int64_t offset,
>>>>>>>                                int64_t bytes, QEMUIOVector *qiov,
>>>>>>>                                BdrvRequestFlags flags)
>>>>>>> {
>>>>>>> -    BDRVRBDState *s = bs->opaque;
>>>>>>> -    /*
>>>>>>> -     * RBD APIs don't allow us to write more than actual size, 
>>>>>>> so in order
>>>>>>> -     * to support growing images, we resize the image before write
>>>>>>> -     * operations that exceed the current size.
>>>>>>> -     */
>>>>>>> -    if (offset + bytes > s->image_size) {
>>>>>>> -        int r = qemu_rbd_resize(bs, offset + bytes);
>>>>>>> -        if (r < 0) {
>>>>>>> -            return r;
>>>>>>> -        }
>>>>>>> -    }
>>>>>>>   return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, 
>>>>>>> RBD_AIO_WRITE);
>>>>>>> }
>>>>>>>
>>>>>>> -- 
>>>>>>> 2.35.1
>>>>>>>
>>>>>>
>>>>>> Do we really have a use case for growing rbd images?
>>>>>
>>>>> The use case is to have a qcow2 image on rbd.
>>>>> I don't think it's very common, but some people use it and here 
>>>>> [1] we had a little discussion about features that could be 
>>>>> interesting (e.g.  persistent dirty bitmaps for incremental backup).
>>>>>
>>>>> In any case the support is quite simple and does not affect other 
>>>>> use cases since we only increase the size when we go beyond the 
>>>>> current size.
>>>>>
>>>>> IMHO we can have it in :-)
>>>>>
>>>>
>>>> The QCOW2 alone doesn’t make much sense, but additional metadata 
>>>> might be a use case.
>>>
>>> Yep.
>>>
>>>> Be aware that the current approach will serialize requests. If 
>>>> there is a real use case, we might think of a better solution.
>>>
>>> Good point, but it only happens when we have to resize, so maybe 
>>> it's okay for now, but I agree we could do better ;-)
>>
>> There might also be a problem if a write for a higher offset past eof 
>> will be executed shortly before a write to a slightly lower offset 
>> past eof. The second resize will fail as it would shrink the image. 
>> We would need proper locking to avoid this. Maybe we need to check if 
>> we write past eof. If yes, take a lock around the resize op and then 
>> check again if it’s still eof and only resize if true.
>
> I thought rbd_resize() was synchronous. Indeed when you said this 
> could serialize writes it sounded like confirmation to me.
>
> Since we call rbd_resize() before rbd_aio_writev(), I thought this 
> case could not occur.
>
> Can you please elaborate?

Seconding this request, because if rbd_resize() is allowed to shrink 
data, it being asynchronous might cause data corruption.

I’ll keep your patch because I find this highly unlikely, though: 
qemu_rbd_resize() itself is definitely synchronous, it can’t invoke 
qemu_coroutine_yield().

The only other possibility that comes to my mind is that rbd_resize() 
might delay the actual resize operation, but I would still expect 
consecutive resize requests to be executed in order, and since we call 
rbd_aio_writev()/rbd_aio_write_zeroes() immediately after the 
rbd_resize() (with no yielding in between), everything should be 
executed in the order that we expect.

Hanna



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block/rbd: fix write zeroes with growing images
  2022-03-22  9:38             ` Hanna Reitz
@ 2022-03-24  9:52               ` Peter Lieven
  2022-03-24 10:40                 ` Stefano Garzarella
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Lieven @ 2022-03-24  9:52 UTC (permalink / raw)
  To: Hanna Reitz, Stefano Garzarella
  Cc: Kevin Wolf, Ilya Dryomov, qemu-devel, qemu-block

Am 22.03.22 um 10:38 schrieb Hanna Reitz:
> On 21.03.22 09:31, Stefano Garzarella wrote:
>> On Sat, Mar 19, 2022 at 04:15:33PM +0100, Peter Lieven wrote:
>>>
>>>
>>>> Am 18.03.2022 um 17:47 schrieb Stefano Garzarella <sgarzare@redhat.com>:
>>>>
>>>> On Fri, Mar 18, 2022 at 04:48:18PM +0100, Peter Lieven wrote:
>>>>>
>>>>>
>>>>>>> Am 18.03.2022 um 09:25 schrieb Stefano Garzarella <sgarzare@redhat.com>:
>>>>>>
>>>>>> On Thu, Mar 17, 2022 at 07:27:05PM +0100, Peter Lieven wrote:
>>>>>>>
>>>>>>>
>>>>>>>>> Am 17.03.2022 um 17:26 schrieb Stefano Garzarella <sgarzare@redhat.com>:
>>>>>>>>
>>>>>>>> Commit d24f80234b ("block/rbd: increase dynamically the image size")
>>>>>>>> added a workaround to support growing images (eg. qcow2), resizing
>>>>>>>> the image before write operations that exceed the current size.
>>>>>>>>
>>>>>>>> We recently added support for write zeroes and without the
>>>>>>>> workaround we can have problems with qcow2.
>>>>>>>>
>>>>>>>> So let's move the resize into qemu_rbd_start_co() and do it when
>>>>>>>> the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.
>>>>>>>>
>>>>>>>> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
>>>>>>>> Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
>>>>>>>> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>>>>>>>> ---
>>>>>>>> block/rbd.c | 26 ++++++++++++++------------
>>>>>>>> 1 file changed, 14 insertions(+), 12 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/block/rbd.c b/block/rbd.c
>>>>>>>> index 8f183eba2a..6caf35cbba 100644
>>>>>>>> --- a/block/rbd.c
>>>>>>>> +++ b/block/rbd.c
>>>>>>>> @@ -1107,6 +1107,20 @@ static int coroutine_fn qemu_rbd_start_co(BlockDriverState *bs,
>>>>>>>>
>>>>>>>>   assert(!qiov || qiov->size == bytes);
>>>>>>>>
>>>>>>>> +    if (cmd == RBD_AIO_WRITE || cmd == RBD_AIO_WRITE_ZEROES) {
>>>>>>>> +        /*
>>>>>>>> +         * RBD APIs don't allow us to write more than actual size, so in order
>>>>>>>> +         * to support growing images, we resize the image before write
>>>>>>>> +         * operations that exceed the current size.
>>>>>>>> +         */
>>>>>>>> +        if (offset + bytes > s->image_size) {
>>>>>>>> +            int r = qemu_rbd_resize(bs, offset + bytes);
>>>>>>>> +            if (r < 0) {
>>>>>>>> +                return r;
>>>>>>>> +            }
>>>>>>>> +        }
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>>   r = rbd_aio_create_completion(&task,
>>>>>>>>                                 (rbd_callback_t) qemu_rbd_completion_cb, &c);
>>>>>>>>   if (r < 0) {
>>>>>>>> @@ -1182,18 +1196,6 @@ coroutine_fn qemu_rbd_co_pwritev(BlockDriverState *bs, int64_t offset,
>>>>>>>>                                int64_t bytes, QEMUIOVector *qiov,
>>>>>>>>                                BdrvRequestFlags flags)
>>>>>>>> {
>>>>>>>> -    BDRVRBDState *s = bs->opaque;
>>>>>>>> -    /*
>>>>>>>> -     * RBD APIs don't allow us to write more than actual size, so in order
>>>>>>>> -     * to support growing images, we resize the image before write
>>>>>>>> -     * operations that exceed the current size.
>>>>>>>> -     */
>>>>>>>> -    if (offset + bytes > s->image_size) {
>>>>>>>> -        int r = qemu_rbd_resize(bs, offset + bytes);
>>>>>>>> -        if (r < 0) {
>>>>>>>> -            return r;
>>>>>>>> -        }
>>>>>>>> -    }
>>>>>>>>   return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, RBD_AIO_WRITE);
>>>>>>>> }
>>>>>>>>
>>>>>>>> -- 
>>>>>>>> 2.35.1
>>>>>>>>
>>>>>>>
>>>>>>> Do we really have a use case for growing rbd images?
>>>>>>
>>>>>> The use case is to have a qcow2 image on rbd.
>>>>>> I don't think it's very common, but some people use it and here [1] we had a little discussion about features that could be interesting (e.g.  persistent dirty bitmaps for incremental backup).
>>>>>>
>>>>>> In any case the support is quite simple and does not affect other use cases since we only increase the size when we go beyond the current size.
>>>>>>
>>>>>> IMHO we can have it in :-)
>>>>>>
>>>>>
>>>>> The QCOW2 alone doesn’t make much sense, but additional metadata might be a use case.
>>>>
>>>> Yep.
>>>>
>>>>> Be aware that the current approach will serialize requests. If there is a real use case, we might think of a better solution.
>>>>
>>>> Good point, but it only happens when we have to resize, so maybe it's okay for now, but I agree we could do better ;-)
>>>
>>> There might also be a problem if a write for a higher offset past eof will be executed shortly before a write to a slightly lower offset past eof. The second resize will fail as it would shrink the image. We would need proper locking to avoid this. 
>>> Maybe we need to check if we write past eof. If yes, take a lock around the resize op and then check again if it’s still eof and only resize if true.
>>
>> I thought rbd_resize() was synchronous. Indeed when you said this could serialize writes it sounded like confirmation to me.
>>
>> Since we call rbd_resize() before rbd_aio_writev(), I thought this case could not occur.
>>
>> Can you please elaborate?
>
> Seconding this request, because if rbd_resize() is allowed to shrink data, it being asynchronous might cause data corruption.
>
> I’ll keep your patch because I find this highly unlikely, though: qemu_rbd_resize() itself is definitely synchronous, it can’t invoke qemu_coroutine_yield().
>
> The only other possibility that comes to my mind is that rbd_resize() might delay the actual resize operation, but I would still expect consecutive resize requests to be executed in order, and since we call rbd_aio_writev()/rbd_aio_write_zeroes() 
> immediately after the rbd_resize() (with no yielding in between), everything should be executed in the order that we expect.


Maybe my assumption of parallelism here was wrong. I was thinking of:


Request A: write at offset (EOL + 4k).

Request A: rbd_resize is invoked (size EOL + 4k)

Request B: write at offset (EOL + 2k)

Request B: rbd_resize is invoked (size EOL + 2k) because image_size is not yet updated

Request A: rbd_resize finishes image_size is set to EOL + 4k

Request B: rbd_resize fails or shrinks the image to EOL + 2k


If this can't happen with coroutines, please ignore my concern.


Peter





^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block/rbd: fix write zeroes with growing images
  2022-03-24  9:52               ` Peter Lieven
@ 2022-03-24 10:40                 ` Stefano Garzarella
  2022-03-24 10:42                   ` Peter Lieven
  0 siblings, 1 reply; 18+ messages in thread
From: Stefano Garzarella @ 2022-03-24 10:40 UTC (permalink / raw)
  To: Peter Lieven
  Cc: Kevin Wolf, Hanna Reitz, Ilya Dryomov, qemu-devel, qemu-block

On Thu, Mar 24, 2022 at 10:52:04AM +0100, Peter Lieven wrote:
>Am 22.03.22 um 10:38 schrieb Hanna Reitz:
>>On 21.03.22 09:31, Stefano Garzarella wrote:
>>>On Sat, Mar 19, 2022 at 04:15:33PM +0100, Peter Lieven wrote:
>>>>
>>>>
>>>>>Am 18.03.2022 um 17:47 schrieb Stefano Garzarella <sgarzare@redhat.com>:
>>>>>
>>>>>On Fri, Mar 18, 2022 at 04:48:18PM +0100, Peter Lieven wrote:
>>>>>>
>>>>>>
>>>>>>>>Am 18.03.2022 um 09:25 schrieb Stefano Garzarella <sgarzare@redhat.com>:
>>>>>>>
>>>>>>>On Thu, Mar 17, 2022 at 07:27:05PM +0100, Peter Lieven wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>>>Am 17.03.2022 um 17:26 schrieb Stefano Garzarella <sgarzare@redhat.com>:
>>>>>>>>>
>>>>>>>>>Commit d24f80234b ("block/rbd: increase dynamically the image size")
>>>>>>>>>added a workaround to support growing images (eg. qcow2), resizing
>>>>>>>>>the image before write operations that exceed the current size.
>>>>>>>>>
>>>>>>>>>We recently added support for write zeroes and without the
>>>>>>>>>workaround we can have problems with qcow2.
>>>>>>>>>
>>>>>>>>>So let's move the resize into qemu_rbd_start_co() and do it when
>>>>>>>>>the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.
>>>>>>>>>
>>>>>>>>>Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
>>>>>>>>>Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
>>>>>>>>>Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>>>>>>>>>---
>>>>>>>>>block/rbd.c | 26 ++++++++++++++------------
>>>>>>>>>1 file changed, 14 insertions(+), 12 deletions(-)
>>>>>>>>>
>>>>>>>>>diff --git a/block/rbd.c b/block/rbd.c
>>>>>>>>>index 8f183eba2a..6caf35cbba 100644
>>>>>>>>>--- a/block/rbd.c
>>>>>>>>>+++ b/block/rbd.c
>>>>>>>>>@@ -1107,6 +1107,20 @@ static int coroutine_fn qemu_rbd_start_co(BlockDriverState *bs,
>>>>>>>>>
>>>>>>>>>  assert(!qiov || qiov->size == bytes);
>>>>>>>>>
>>>>>>>>>+    if (cmd == RBD_AIO_WRITE || cmd == RBD_AIO_WRITE_ZEROES) {
>>>>>>>>>+        /*
>>>>>>>>>+         * RBD APIs don't allow us to write more than actual size, so in order
>>>>>>>>>+         * to support growing images, we resize the image before write
>>>>>>>>>+         * operations that exceed the current size.
>>>>>>>>>+         */
>>>>>>>>>+        if (offset + bytes > s->image_size) {
>>>>>>>>>+            int r = qemu_rbd_resize(bs, offset + bytes);
>>>>>>>>>+            if (r < 0) {
>>>>>>>>>+                return r;
>>>>>>>>>+            }
>>>>>>>>>+        }
>>>>>>>>>+    }
>>>>>>>>>+
>>>>>>>>>  r = rbd_aio_create_completion(&task,
>>>>>>>>>                                (rbd_callback_t) qemu_rbd_completion_cb, &c);
>>>>>>>>>  if (r < 0) {
>>>>>>>>>@@ -1182,18 +1196,6 @@ coroutine_fn qemu_rbd_co_pwritev(BlockDriverState *bs, int64_t offset,
>>>>>>>>>                               int64_t bytes, QEMUIOVector *qiov,
>>>>>>>>>                               BdrvRequestFlags flags)
>>>>>>>>>{
>>>>>>>>>-    BDRVRBDState *s = bs->opaque;
>>>>>>>>>-    /*
>>>>>>>>>-     * RBD APIs don't allow us to write more than actual size, so in order
>>>>>>>>>-     * to support growing images, we resize the image before write
>>>>>>>>>-     * operations that exceed the current size.
>>>>>>>>>-     */
>>>>>>>>>-    if (offset + bytes > s->image_size) {
>>>>>>>>>-        int r = qemu_rbd_resize(bs, offset + bytes);
>>>>>>>>>-        if (r < 0) {
>>>>>>>>>-            return r;
>>>>>>>>>-        }
>>>>>>>>>-    }
>>>>>>>>>  return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, RBD_AIO_WRITE);
>>>>>>>>>}
>>>>>>>>>
>>>>>>>>>-- 2.35.1
>>>>>>>>>
>>>>>>>>
>>>>>>>>Do we really have a use case for growing rbd images?
>>>>>>>
>>>>>>>The use case is to have a qcow2 image on rbd.
>>>>>>>I don't think it's very common, but some people use it and here [1] we had a little discussion about features that could be interesting (e.g.  persistent dirty bitmaps for incremental backup).
>>>>>>>
>>>>>>>In any case the support is quite simple and does not affect other use cases since we only increase the size when we go beyond the current size.
>>>>>>>
>>>>>>>IMHO we can have it in :-)
>>>>>>>
>>>>>>
>>>>>>The QCOW2 alone doesn’t make much sense, but additional metadata might be a use case.
>>>>>
>>>>>Yep.
>>>>>
>>>>>>Be aware that the current approach will serialize requests. If there is a real use case, we might think of a better solution.
>>>>>
>>>>>Good point, but it only happens when we have to resize, so maybe it's okay for now, but I agree we could do better ;-)
>>>>
>>>>There might also be a problem if a write for a higher offset 
>>>>past eof will be executed shortly before a write to a slightly 
>>>>lower offset past eof. The second resize will fail as it would 
>>>>shrink the image. We would need proper locking to avoid this. 
>>>>Maybe we need to check if we write past eof. If yes, take a lock 
>>>>around the resize op and then check again if it’s still eof and 
>>>>only resize if true.
>>>
>>>I thought rbd_resize() was synchronous. Indeed when you said this could serialize writes it sounded like confirmation to me.
>>>
>>>Since we call rbd_resize() before rbd_aio_writev(), I thought this case could not occur.
>>>
>>>Can you please elaborate?
>>
>>Seconding this request, because if rbd_resize() is allowed to shrink data, it being asynchronous might cause data corruption.
>>
>>I’ll keep your patch because I find this highly unlikely, though: qemu_rbd_resize() itself is definitely synchronous, it can’t invoke qemu_coroutine_yield().
>>
>>The only other possibility that comes to my mind is that 
>>rbd_resize() might delay the actual resize operation, but I would 
>>still expect consecutive resize requests to be executed in order, 
>>and since we call rbd_aio_writev()/rbd_aio_write_zeroes() 
>>immediately after the rbd_resize() (with no yielding in between), 
>>everything should be executed in the order that we expect.
>
>
>Maybe my assumption of parallelism here was wrong. I was thinking of:
>
>
>Request A: write at offset (EOL + 4k).
>
>Request A: rbd_resize is invoked (size EOL + 4k)

IIUC Request B can't start until Request A calls qemu_coroutine_yield(), 
but I'm waiting for a confirmation from Hanna :-)

Thanks,
Stefano

>
>Request B: write at offset (EOL + 2k)
>
>Request B: rbd_resize is invoked (size EOL + 2k) because image_size is not yet updated
>
>Request A: rbd_resize finishes image_size is set to EOL + 4k
>
>Request B: rbd_resize fails or shrinks the image to EOL + 2k
>
>
>If this can't happen with coroutines, please ignore my concern.
>
>
>Peter
>
>
>



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block/rbd: fix write zeroes with growing images
  2022-03-24 10:40                 ` Stefano Garzarella
@ 2022-03-24 10:42                   ` Peter Lieven
  2022-03-24 11:06                     ` Hanna Reitz
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Lieven @ 2022-03-24 10:42 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Kevin Wolf, Hanna Reitz, Ilya Dryomov, qemu-devel, qemu-block

Am 24.03.22 um 11:40 schrieb Stefano Garzarella:
> On Thu, Mar 24, 2022 at 10:52:04AM +0100, Peter Lieven wrote:
>> Am 22.03.22 um 10:38 schrieb Hanna Reitz:
>>> On 21.03.22 09:31, Stefano Garzarella wrote:
>>>> On Sat, Mar 19, 2022 at 04:15:33PM +0100, Peter Lieven wrote:
>>>>>
>>>>>
>>>>>> Am 18.03.2022 um 17:47 schrieb Stefano Garzarella <sgarzare@redhat.com>:
>>>>>>
>>>>>> On Fri, Mar 18, 2022 at 04:48:18PM +0100, Peter Lieven wrote:
>>>>>>>
>>>>>>>
>>>>>>>>> Am 18.03.2022 um 09:25 schrieb Stefano Garzarella <sgarzare@redhat.com>:
>>>>>>>>
>>>>>>>> On Thu, Mar 17, 2022 at 07:27:05PM +0100, Peter Lieven wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>> Am 17.03.2022 um 17:26 schrieb Stefano Garzarella <sgarzare@redhat.com>:
>>>>>>>>>>
>>>>>>>>>> Commit d24f80234b ("block/rbd: increase dynamically the image size")
>>>>>>>>>> added a workaround to support growing images (eg. qcow2), resizing
>>>>>>>>>> the image before write operations that exceed the current size.
>>>>>>>>>>
>>>>>>>>>> We recently added support for write zeroes and without the
>>>>>>>>>> workaround we can have problems with qcow2.
>>>>>>>>>>
>>>>>>>>>> So let's move the resize into qemu_rbd_start_co() and do it when
>>>>>>>>>> the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.
>>>>>>>>>>
>>>>>>>>>> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
>>>>>>>>>> Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
>>>>>>>>>> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>>>>>>>>>> ---
>>>>>>>>>> block/rbd.c | 26 ++++++++++++++------------
>>>>>>>>>> 1 file changed, 14 insertions(+), 12 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/block/rbd.c b/block/rbd.c
>>>>>>>>>> index 8f183eba2a..6caf35cbba 100644
>>>>>>>>>> --- a/block/rbd.c
>>>>>>>>>> +++ b/block/rbd.c
>>>>>>>>>> @@ -1107,6 +1107,20 @@ static int coroutine_fn qemu_rbd_start_co(BlockDriverState *bs,
>>>>>>>>>>
>>>>>>>>>>   assert(!qiov || qiov->size == bytes);
>>>>>>>>>>
>>>>>>>>>> +    if (cmd == RBD_AIO_WRITE || cmd == RBD_AIO_WRITE_ZEROES) {
>>>>>>>>>> +        /*
>>>>>>>>>> +         * RBD APIs don't allow us to write more than actual size, so in order
>>>>>>>>>> +         * to support growing images, we resize the image before write
>>>>>>>>>> +         * operations that exceed the current size.
>>>>>>>>>> +         */
>>>>>>>>>> +        if (offset + bytes > s->image_size) {
>>>>>>>>>> +            int r = qemu_rbd_resize(bs, offset + bytes);
>>>>>>>>>> +            if (r < 0) {
>>>>>>>>>> +                return r;
>>>>>>>>>> +            }
>>>>>>>>>> +        }
>>>>>>>>>> +    }
>>>>>>>>>> +
>>>>>>>>>>   r = rbd_aio_create_completion(&task,
>>>>>>>>>>                                 (rbd_callback_t) qemu_rbd_completion_cb, &c);
>>>>>>>>>>   if (r < 0) {
>>>>>>>>>> @@ -1182,18 +1196,6 @@ coroutine_fn qemu_rbd_co_pwritev(BlockDriverState *bs, int64_t offset,
>>>>>>>>>>                                int64_t bytes, QEMUIOVector *qiov,
>>>>>>>>>>                                BdrvRequestFlags flags)
>>>>>>>>>> {
>>>>>>>>>> -    BDRVRBDState *s = bs->opaque;
>>>>>>>>>> -    /*
>>>>>>>>>> -     * RBD APIs don't allow us to write more than actual size, so in order
>>>>>>>>>> -     * to support growing images, we resize the image before write
>>>>>>>>>> -     * operations that exceed the current size.
>>>>>>>>>> -     */
>>>>>>>>>> -    if (offset + bytes > s->image_size) {
>>>>>>>>>> -        int r = qemu_rbd_resize(bs, offset + bytes);
>>>>>>>>>> -        if (r < 0) {
>>>>>>>>>> -            return r;
>>>>>>>>>> -        }
>>>>>>>>>> -    }
>>>>>>>>>>   return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, RBD_AIO_WRITE);
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> -- 2.35.1
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Do we really have a use case for growing rbd images?
>>>>>>>>
>>>>>>>> The use case is to have a qcow2 image on rbd.
>>>>>>>> I don't think it's very common, but some people use it and here [1] we had a little discussion about features that could be interesting (e.g.  persistent dirty bitmaps for incremental backup).
>>>>>>>>
>>>>>>>> In any case the support is quite simple and does not affect other use cases since we only increase the size when we go beyond the current size.
>>>>>>>>
>>>>>>>> IMHO we can have it in :-)
>>>>>>>>
>>>>>>>
>>>>>>> The QCOW2 alone doesn’t make much sense, but additional metadata might be a use case.
>>>>>>
>>>>>> Yep.
>>>>>>
>>>>>>> Be aware that the current approach will serialize requests. If there is a real use case, we might think of a better solution.
>>>>>>
>>>>>> Good point, but it only happens when we have to resize, so maybe it's okay for now, but I agree we could do better ;-)
>>>>>
>>>>> There might also be a problem if a write for a higher offset past eof will be executed shortly before a write to a slightly lower offset past eof. The second resize will fail as it would shrink the image. We would need proper locking to avoid this. 
>>>>> Maybe we need to check if we write past eof. If yes, take a lock around the resize op and then check again if it’s still eof and only resize if true.
>>>>
>>>> I thought rbd_resize() was synchronous. Indeed when you said this could serialize writes it sounded like confirmation to me.
>>>>
>>>> Since we call rbd_resize() before rbd_aio_writev(), I thought this case could not occur.
>>>>
>>>> Can you please elaborate?
>>>
>>> Seconding this request, because if rbd_resize() is allowed to shrink data, it being asynchronous might cause data corruption.
>>>
>>> I’ll keep your patch because I find this highly unlikely, though: qemu_rbd_resize() itself is definitely synchronous, it can’t invoke qemu_coroutine_yield().
>>>
>>> The only other possibility that comes to my mind is that rbd_resize() might delay the actual resize operation, but I would still expect consecutive resize requests to be executed in order, and since we call rbd_aio_writev()/rbd_aio_write_zeroes() 
>>> immediately after the rbd_resize() (with no yielding in between), everything should be executed in the order that we expect.
>>
>>
>> Maybe my assumption of parallelism here was wrong. I was thinking of:
>>
>>
>> Request A: write at offset (EOL + 4k).
>>
>> Request A: rbd_resize is invoked (size EOL + 4k)
>
> IIUC Request B can't start until Request A calls qemu_coroutine_yield(), but I'm waiting for a confirmation from Hanna :-)


Yes, and I would be interested if this is also true if coroutines are implemented as threads.

And, of course, that should read EOF and not EOL. Too much vim today ;-)


Peter





^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block/rbd: fix write zeroes with growing images
  2022-03-24 10:42                   ` Peter Lieven
@ 2022-03-24 11:06                     ` Hanna Reitz
  2022-03-24 11:34                       ` Peter Lieven
  0 siblings, 1 reply; 18+ messages in thread
From: Hanna Reitz @ 2022-03-24 11:06 UTC (permalink / raw)
  To: Peter Lieven, Stefano Garzarella
  Cc: Kevin Wolf, Ilya Dryomov, qemu-devel, qemu-block

On 24.03.22 11:42, Peter Lieven wrote:
> Am 24.03.22 um 11:40 schrieb Stefano Garzarella:
>> On Thu, Mar 24, 2022 at 10:52:04AM +0100, Peter Lieven wrote:
>>> Am 22.03.22 um 10:38 schrieb Hanna Reitz:
>>>> On 21.03.22 09:31, Stefano Garzarella wrote:
>>>>> On Sat, Mar 19, 2022 at 04:15:33PM +0100, Peter Lieven wrote:
>>>>>>
>>>>>>
>>>>>>> Am 18.03.2022 um 17:47 schrieb Stefano Garzarella 
>>>>>>> <sgarzare@redhat.com>:
>>>>>>>
>>>>>>> On Fri, Mar 18, 2022 at 04:48:18PM +0100, Peter Lieven wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>>> Am 18.03.2022 um 09:25 schrieb Stefano Garzarella 
>>>>>>>>>> <sgarzare@redhat.com>:
>>>>>>>>>
>>>>>>>>> On Thu, Mar 17, 2022 at 07:27:05PM +0100, Peter Lieven wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>> Am 17.03.2022 um 17:26 schrieb Stefano Garzarella 
>>>>>>>>>>>> <sgarzare@redhat.com>:
>>>>>>>>>>>
>>>>>>>>>>> Commit d24f80234b ("block/rbd: increase dynamically the 
>>>>>>>>>>> image size")
>>>>>>>>>>> added a workaround to support growing images (eg. qcow2), 
>>>>>>>>>>> resizing
>>>>>>>>>>> the image before write operations that exceed the current size.
>>>>>>>>>>>
>>>>>>>>>>> We recently added support for write zeroes and without the
>>>>>>>>>>> workaround we can have problems with qcow2.
>>>>>>>>>>>
>>>>>>>>>>> So let's move the resize into qemu_rbd_start_co() and do it 
>>>>>>>>>>> when
>>>>>>>>>>> the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.
>>>>>>>>>>>
>>>>>>>>>>> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
>>>>>>>>>>> Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
>>>>>>>>>>> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>>>>>>>>>>> ---
>>>>>>>>>>> block/rbd.c | 26 ++++++++++++++------------
>>>>>>>>>>> 1 file changed, 14 insertions(+), 12 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/block/rbd.c b/block/rbd.c
>>>>>>>>>>> index 8f183eba2a..6caf35cbba 100644
>>>>>>>>>>> --- a/block/rbd.c
>>>>>>>>>>> +++ b/block/rbd.c
>>>>>>>>>>> @@ -1107,6 +1107,20 @@ static int coroutine_fn 
>>>>>>>>>>> qemu_rbd_start_co(BlockDriverState *bs,
>>>>>>>>>>>
>>>>>>>>>>>   assert(!qiov || qiov->size == bytes);
>>>>>>>>>>>
>>>>>>>>>>> +    if (cmd == RBD_AIO_WRITE || cmd == RBD_AIO_WRITE_ZEROES) {
>>>>>>>>>>> +        /*
>>>>>>>>>>> +         * RBD APIs don't allow us to write more than 
>>>>>>>>>>> actual size, so in order
>>>>>>>>>>> +         * to support growing images, we resize the image 
>>>>>>>>>>> before write
>>>>>>>>>>> +         * operations that exceed the current size.
>>>>>>>>>>> +         */
>>>>>>>>>>> +        if (offset + bytes > s->image_size) {
>>>>>>>>>>> +            int r = qemu_rbd_resize(bs, offset + bytes);
>>>>>>>>>>> +            if (r < 0) {
>>>>>>>>>>> +                return r;
>>>>>>>>>>> +            }
>>>>>>>>>>> +        }
>>>>>>>>>>> +    }
>>>>>>>>>>> +
>>>>>>>>>>>   r = rbd_aio_create_completion(&task,
>>>>>>>>>>> (rbd_callback_t) qemu_rbd_completion_cb, &c);
>>>>>>>>>>>   if (r < 0) {
>>>>>>>>>>> @@ -1182,18 +1196,6 @@ coroutine_fn 
>>>>>>>>>>> qemu_rbd_co_pwritev(BlockDriverState *bs, int64_t offset,
>>>>>>>>>>>                                int64_t bytes, QEMUIOVector 
>>>>>>>>>>> *qiov,
>>>>>>>>>>> BdrvRequestFlags flags)
>>>>>>>>>>> {
>>>>>>>>>>> -    BDRVRBDState *s = bs->opaque;
>>>>>>>>>>> -    /*
>>>>>>>>>>> -     * RBD APIs don't allow us to write more than actual 
>>>>>>>>>>> size, so in order
>>>>>>>>>>> -     * to support growing images, we resize the image 
>>>>>>>>>>> before write
>>>>>>>>>>> -     * operations that exceed the current size.
>>>>>>>>>>> -     */
>>>>>>>>>>> -    if (offset + bytes > s->image_size) {
>>>>>>>>>>> -        int r = qemu_rbd_resize(bs, offset + bytes);
>>>>>>>>>>> -        if (r < 0) {
>>>>>>>>>>> -            return r;
>>>>>>>>>>> -        }
>>>>>>>>>>> -    }
>>>>>>>>>>>   return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, 
>>>>>>>>>>> RBD_AIO_WRITE);
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> -- 2.35.1
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Do we really have a use case for growing rbd images?
>>>>>>>>>
>>>>>>>>> The use case is to have a qcow2 image on rbd.
>>>>>>>>> I don't think it's very common, but some people use it and 
>>>>>>>>> here [1] we had a little discussion about features that could 
>>>>>>>>> be interesting (e.g. persistent dirty bitmaps for incremental 
>>>>>>>>> backup).
>>>>>>>>>
>>>>>>>>> In any case the support is quite simple and does not affect 
>>>>>>>>> other use cases since we only increase the size when we go 
>>>>>>>>> beyond the current size.
>>>>>>>>>
>>>>>>>>> IMHO we can have it in :-)
>>>>>>>>>
>>>>>>>>
>>>>>>>> The QCOW2 alone doesn’t make much sense, but additional 
>>>>>>>> metadata might be a use case.
>>>>>>>
>>>>>>> Yep.
>>>>>>>
>>>>>>>> Be aware that the current approach will serialize requests. If 
>>>>>>>> there is a real use case, we might think of a better solution.
>>>>>>>
>>>>>>> Good point, but it only happens when we have to resize, so maybe 
>>>>>>> it's okay for now, but I agree we could do better ;-)
>>>>>>
>>>>>> There might also be a problem if a write for a higher offset past 
>>>>>> eof will be executed shortly before a write to a slightly lower 
>>>>>> offset past eof. The second resize will fail as it would shrink 
>>>>>> the image. We would need proper locking to avoid this. Maybe we 
>>>>>> need to check if we write past eof. If yes, take a lock around 
>>>>>> the resize op and then check again if it’s still eof and only 
>>>>>> resize if true.
>>>>>
>>>>> I thought rbd_resize() was synchronous. Indeed when you said this 
>>>>> could serialize writes it sounded like confirmation to me.
>>>>>
>>>>> Since we call rbd_resize() before rbd_aio_writev(), I thought this 
>>>>> case could not occur.
>>>>>
>>>>> Can you please elaborate?
>>>>
>>>> Seconding this request, because if rbd_resize() is allowed to 
>>>> shrink data, it being asynchronous might cause data corruption.
>>>>
>>>> I’ll keep your patch because I find this highly unlikely, though: 
>>>> qemu_rbd_resize() itself is definitely synchronous, it can’t invoke 
>>>> qemu_coroutine_yield().
>>>>
>>>> The only other possibility that comes to my mind is that 
>>>> rbd_resize() might delay the actual resize operation, but I would 
>>>> still expect consecutive resize requests to be executed in order, 
>>>> and since we call rbd_aio_writev()/rbd_aio_write_zeroes() 
>>>> immediately after the rbd_resize() (with no yielding in between), 
>>>> everything should be executed in the order that we expect.
>>>
>>>
>>> Maybe my assumption of parallelism here was wrong. I was thinking of:
>>>
>>>
>>> Request A: write at offset (EOL + 4k).
>>>
>>> Request A: rbd_resize is invoked (size EOL + 4k)
>>
>> IIUC Request B can't start until Request A calls 
>> qemu_coroutine_yield(), but I'm waiting for a confirmation from Hanna :-)

That’s my impression at least.

> Yes, and I would be interested if this is also true if coroutines are 
> implemented as threads.

Depends on what you mean by that.  Coroutines are a form of cooperative 
multitasking, i.e. they can’t be preempted unless they explicitly 
yield.  Threads are generally supposed to be preemptive, so those are 
just different things.

Of course you can use coroutines in threads, i.e. run multiple requests 
in parallel.  But then the coroutine part becomes largely irrelevant, 
and you’re just facing standard thread-safety questions, and then of 
course this won’t be safe.  I assume to support such a model, all block 
drivers would need to be fully audited anyway, though.

For example, theoretically, the guest could then issue two resize 
operations simultaneously, and qemu_rbd_co_truncate() would be called in 
two concurrent threads.  This would already cause problems, because 
setting s->image_size would race.  That’s pre-existing regardless of 
this patch here (or d24f80234b39d2d5c0d91e63b5e4569d37b2399e).

What this means is that of course we could just slap a lock around the 
qemu_rbd_resize() call in qemu_rbd_start_co() (and its surrounding 
condition), it wouldn’t cost anything, assuming that this area can’t be 
run in parallel anyway.  But the rest of the block driver doesn’t 
contain a single lock yet, which to me signals that nothing in 
block/rbd.c is thread-safe anyway.

Hanna



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block/rbd: fix write zeroes with growing images
  2022-03-24 11:06                     ` Hanna Reitz
@ 2022-03-24 11:34                       ` Peter Lieven
  0 siblings, 0 replies; 18+ messages in thread
From: Peter Lieven @ 2022-03-24 11:34 UTC (permalink / raw)
  To: Hanna Reitz, Stefano Garzarella
  Cc: Kevin Wolf, Ilya Dryomov, qemu-devel, qemu-block

Am 24.03.22 um 12:06 schrieb Hanna Reitz:
> On 24.03.22 11:42, Peter Lieven wrote:
>> Am 24.03.22 um 11:40 schrieb Stefano Garzarella:
>>> On Thu, Mar 24, 2022 at 10:52:04AM +0100, Peter Lieven wrote:
>>>> Am 22.03.22 um 10:38 schrieb Hanna Reitz:
>>>>> On 21.03.22 09:31, Stefano Garzarella wrote:
>>>>>> On Sat, Mar 19, 2022 at 04:15:33PM +0100, Peter Lieven wrote:
>>>>>>>
>>>>>>>
>>>>>>>> Am 18.03.2022 um 17:47 schrieb Stefano Garzarella <sgarzare@redhat.com>:
>>>>>>>>
>>>>>>>> On Fri, Mar 18, 2022 at 04:48:18PM +0100, Peter Lieven wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>> Am 18.03.2022 um 09:25 schrieb Stefano Garzarella <sgarzare@redhat.com>:
>>>>>>>>>>
>>>>>>>>>> On Thu, Mar 17, 2022 at 07:27:05PM +0100, Peter Lieven wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>> Am 17.03.2022 um 17:26 schrieb Stefano Garzarella <sgarzare@redhat.com>:
>>>>>>>>>>>>
>>>>>>>>>>>> Commit d24f80234b ("block/rbd: increase dynamically the image size")
>>>>>>>>>>>> added a workaround to support growing images (eg. qcow2), resizing
>>>>>>>>>>>> the image before write operations that exceed the current size.
>>>>>>>>>>>>
>>>>>>>>>>>> We recently added support for write zeroes and without the
>>>>>>>>>>>> workaround we can have problems with qcow2.
>>>>>>>>>>>>
>>>>>>>>>>>> So let's move the resize into qemu_rbd_start_co() and do it when
>>>>>>>>>>>> the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.
>>>>>>>>>>>>
>>>>>>>>>>>> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
>>>>>>>>>>>> Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
>>>>>>>>>>>> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>> block/rbd.c | 26 ++++++++++++++------------
>>>>>>>>>>>> 1 file changed, 14 insertions(+), 12 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/block/rbd.c b/block/rbd.c
>>>>>>>>>>>> index 8f183eba2a..6caf35cbba 100644
>>>>>>>>>>>> --- a/block/rbd.c
>>>>>>>>>>>> +++ b/block/rbd.c
>>>>>>>>>>>> @@ -1107,6 +1107,20 @@ static int coroutine_fn qemu_rbd_start_co(BlockDriverState *bs,
>>>>>>>>>>>>
>>>>>>>>>>>>   assert(!qiov || qiov->size == bytes);
>>>>>>>>>>>>
>>>>>>>>>>>> +    if (cmd == RBD_AIO_WRITE || cmd == RBD_AIO_WRITE_ZEROES) {
>>>>>>>>>>>> +        /*
>>>>>>>>>>>> +         * RBD APIs don't allow us to write more than actual size, so in order
>>>>>>>>>>>> +         * to support growing images, we resize the image before write
>>>>>>>>>>>> +         * operations that exceed the current size.
>>>>>>>>>>>> +         */
>>>>>>>>>>>> +        if (offset + bytes > s->image_size) {
>>>>>>>>>>>> +            int r = qemu_rbd_resize(bs, offset + bytes);
>>>>>>>>>>>> +            if (r < 0) {
>>>>>>>>>>>> +                return r;
>>>>>>>>>>>> +            }
>>>>>>>>>>>> +        }
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +
>>>>>>>>>>>>   r = rbd_aio_create_completion(&task,
>>>>>>>>>>>> (rbd_callback_t) qemu_rbd_completion_cb, &c);
>>>>>>>>>>>>   if (r < 0) {
>>>>>>>>>>>> @@ -1182,18 +1196,6 @@ coroutine_fn qemu_rbd_co_pwritev(BlockDriverState *bs, int64_t offset,
>>>>>>>>>>>>                                int64_t bytes, QEMUIOVector *qiov,
>>>>>>>>>>>> BdrvRequestFlags flags)
>>>>>>>>>>>> {
>>>>>>>>>>>> -    BDRVRBDState *s = bs->opaque;
>>>>>>>>>>>> -    /*
>>>>>>>>>>>> -     * RBD APIs don't allow us to write more than actual size, so in order
>>>>>>>>>>>> -     * to support growing images, we resize the image before write
>>>>>>>>>>>> -     * operations that exceed the current size.
>>>>>>>>>>>> -     */
>>>>>>>>>>>> -    if (offset + bytes > s->image_size) {
>>>>>>>>>>>> -        int r = qemu_rbd_resize(bs, offset + bytes);
>>>>>>>>>>>> -        if (r < 0) {
>>>>>>>>>>>> -            return r;
>>>>>>>>>>>> -        }
>>>>>>>>>>>> -    }
>>>>>>>>>>>>   return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, RBD_AIO_WRITE);
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> -- 2.35.1
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Do we really have a use case for growing rbd images?
>>>>>>>>>>
>>>>>>>>>> The use case is to have a qcow2 image on rbd.
>>>>>>>>>> I don't think it's very common, but some people use it and here [1] we had a little discussion about features that could be interesting (e.g. persistent dirty bitmaps for incremental backup).
>>>>>>>>>>
>>>>>>>>>> In any case the support is quite simple and does not affect other use cases since we only increase the size when we go beyond the current size.
>>>>>>>>>>
>>>>>>>>>> IMHO we can have it in :-)
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The QCOW2 alone doesn’t make much sense, but additional metadata might be a use case.
>>>>>>>>
>>>>>>>> Yep.
>>>>>>>>
>>>>>>>>> Be aware that the current approach will serialize requests. If there is a real use case, we might think of a better solution.
>>>>>>>>
>>>>>>>> Good point, but it only happens when we have to resize, so maybe it's okay for now, but I agree we could do better ;-)
>>>>>>>
>>>>>>> There might also be a problem if a write for a higher offset past eof will be executed shortly before a write to a slightly lower offset past eof. The second resize will fail as it would shrink the image. We would need proper locking to avoid 
>>>>>>> this. Maybe we need to check if we write past eof. If yes, take a lock around the resize op and then check again if it’s still eof and only resize if true.
>>>>>>
>>>>>> I thought rbd_resize() was synchronous. Indeed when you said this could serialize writes it sounded like confirmation to me.
>>>>>>
>>>>>> Since we call rbd_resize() before rbd_aio_writev(), I thought this case could not occur.
>>>>>>
>>>>>> Can you please elaborate?
>>>>>
>>>>> Seconding this request, because if rbd_resize() is allowed to shrink data, it being asynchronous might cause data corruption.
>>>>>
>>>>> I’ll keep your patch because I find this highly unlikely, though: qemu_rbd_resize() itself is definitely synchronous, it can’t invoke qemu_coroutine_yield().
>>>>>
>>>>> The only other possibility that comes to my mind is that rbd_resize() might delay the actual resize operation, but I would still expect consecutive resize requests to be executed in order, and since we call rbd_aio_writev()/rbd_aio_write_zeroes() 
>>>>> immediately after the rbd_resize() (with no yielding in between), everything should be executed in the order that we expect.
>>>>
>>>>
>>>> Maybe my assumption of parallelism here was wrong. I was thinking of:
>>>>
>>>>
>>>> Request A: write at offset (EOL + 4k).
>>>>
>>>> Request A: rbd_resize is invoked (size EOL + 4k)
>>>
>>> IIUC Request B can't start until Request A calls qemu_coroutine_yield(), but I'm waiting for a confirmation from Hanna :-)
>
> That’s my impression at least.
>
>> Yes, and I would be interested if this is also true if coroutines are implemented as threads.
>
> Depends on what you mean by that.  Coroutines are a form of cooperative multitasking, i.e. they can’t be preempted unless they explicitly yield.  Threads are generally supposed to be preemptive, so those are just different things.


I believed there was a coroutine backend that used threads (at least in the past). Forget about that ;-)


>
> Of course you can use coroutines in threads, i.e. run multiple requests in parallel.  But then the coroutine part becomes largely irrelevant, and you’re just facing standard thread-safety questions, and then of course this won’t be safe.  I assume to 
> support such a model, all block drivers would need to be fully audited anyway, though.
>
> For example, theoretically, the guest could then issue two resize operations simultaneously, and qemu_rbd_co_truncate() would be called in two concurrent threads.  This would already cause problems, because setting s->image_size would race.  That’s 
> pre-existing regardless of this patch here (or d24f80234b39d2d5c0d91e63b5e4569d37b2399e).
>
> What this means is that of course we could just slap a lock around the qemu_rbd_resize() call in qemu_rbd_start_co() (and its surrounding condition), it wouldn’t cost anything, assuming that this area can’t be run in parallel anyway.  But the rest of 
> the block driver doesn’t contain a single lock yet, which to me signals that nothing in block/rbd.c is thread-safe anyway.


With that said I believe we can assume that the current implementation is safe (enough).


Peter





^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-03-24 11:36 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-17 16:26 [PATCH] block/rbd: fix write zeroes with growing images Stefano Garzarella
2022-03-17 18:27 ` Peter Lieven
2022-03-18  8:25   ` Stefano Garzarella
2022-03-18 15:48     ` Peter Lieven
2022-03-18 16:47       ` Stefano Garzarella
2022-03-19 15:15         ` Peter Lieven
2022-03-21  8:31           ` Stefano Garzarella
2022-03-22  9:38             ` Hanna Reitz
2022-03-24  9:52               ` Peter Lieven
2022-03-24 10:40                 ` Stefano Garzarella
2022-03-24 10:42                   ` Peter Lieven
2022-03-24 11:06                     ` Hanna Reitz
2022-03-24 11:34                       ` Peter Lieven
2022-03-19 12:40     ` Ilya Dryomov
2022-03-19 13:23       ` Ilya Dryomov
2022-03-21  8:17         ` Stefano Garzarella
2022-03-18 15:36 ` Hanna Reitz
2022-03-19 12:33 ` Ilya Dryomov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.