* [Qemu-devel] [PATCH v2] block/rbd: increase dynamically the image size
@ 2019-05-03 16:30 ` Stefano Garzarella
0 siblings, 0 replies; 7+ messages in thread
From: Stefano Garzarella @ 2019-05-03 16:30 UTC (permalink / raw)
To: qemu-devel; +Cc: Josh Durgin, qemu-block, Max Reitz, Kevin Wolf
The RBD APIs don't allow us to write beyond the size set with
rbd_create() or rbd_resize().
In order to support growing images (e.g. qcow2), we resize the
image before any write operation that extends past the current size.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
v2:
- use bs->total_sectors instead of adding a new field [Kevin]
- resize the image only during write operations [Kevin]
  For read operations, bdrv_aligned_preadv() already handles reads
  that exceed the length returned by bdrv_getlength(), so IMHO we
  don't need to handle them in the rbd driver.
---
block/rbd.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/block/rbd.c b/block/rbd.c
index 0c549c9935..613e8f4982 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -934,13 +934,25 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
     }

     switch (cmd) {
-    case RBD_AIO_WRITE:
+    case RBD_AIO_WRITE: {
+        /*
+         * RBD APIs don't allow us to write more than actual size, so in order
+         * to support growing images, we resize the image before write
+         * operations that exceed the current size.
+         */
+        if (off + size > bs->total_sectors * BDRV_SECTOR_SIZE) {
+            r = rbd_resize(s->image, off + size);
+            if (r < 0) {
+                goto failed_completion;
+            }
+        }
 #ifdef LIBRBD_SUPPORTS_IOVEC
             r = rbd_aio_writev(s->image, qiov->iov, qiov->niov, off, c);
 #else
             r = rbd_aio_write(s->image, off, size, rcb->buf, c);
 #endif
         break;
+    }
     case RBD_AIO_READ:
 #ifdef LIBRBD_SUPPORTS_IOVEC
             r = rbd_aio_readv(s->image, qiov->iov, qiov->niov, off, c);
--
2.20.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [PATCH v2] block/rbd: increase dynamically the image size
@ 2019-05-03 17:21 ` Jason Dillaman
0 siblings, 0 replies; 7+ messages in thread
From: Jason Dillaman @ 2019-05-03 17:21 UTC (permalink / raw)
To: Stefano Garzarella
Cc: qemu-devel, Kevin Wolf, Josh Durgin, qemu-block, Max Reitz
On Fri, May 3, 2019 at 12:30 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> RBD APIs don't allow us to write more than the size set with
> rbd_create() or rbd_resize().
> In order to support growing images (eg. qcow2), we resize the
> image before write operations that exceed the current size.
>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
> v2:
> - use bs->total_sectors instead of adding a new field [Kevin]
> - resize the image only during write operation [Kevin]
> for read operation, the bdrv_aligned_preadv() already handles reads
> that exceed the length returned by bdrv_getlength(), so IMHO we can
> avoid to handle it in the rbd driver
> ---
> block/rbd.c | 14 +++++++++++++-
> 1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/block/rbd.c b/block/rbd.c
> index 0c549c9935..613e8f4982 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -934,13 +934,25 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
> }
>
> switch (cmd) {
> - case RBD_AIO_WRITE:
> + case RBD_AIO_WRITE: {
> + /*
> + * RBD APIs don't allow us to write more than actual size, so in order
> + * to support growing images, we resize the image before write
> + * operations that exceed the current size.
> + */
> + if (off + size > bs->total_sectors * BDRV_SECTOR_SIZE) {
When will "bs->total_sectors" be refreshed to represent the correct
current size? You wouldn't want a future write whose extent is
greater than the original image size, but less than that of a previous
IO that expanded the image, to attempt to shrink the image.
> + r = rbd_resize(s->image, off + size);
> + if (r < 0) {
> + goto failed_completion;
> + }
> + }
> #ifdef LIBRBD_SUPPORTS_IOVEC
> r = rbd_aio_writev(s->image, qiov->iov, qiov->niov, off, c);
> #else
> r = rbd_aio_write(s->image, off, size, rcb->buf, c);
> #endif
> break;
> + }
> case RBD_AIO_READ:
> #ifdef LIBRBD_SUPPORTS_IOVEC
> r = rbd_aio_readv(s->image, qiov->iov, qiov->niov, off, c);
> --
> 2.20.1
>
>
--
Jason
^ permalink raw reply [flat|nested] 7+ messages in thread
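To make the concern above concrete, here is a minimal sketch of the shrink scenario. It does not use librbd or QEMU: MockImage, mock_resize(), naive_write() and run_stale_size_scenario() are hypothetical stand-ins, and cached_size plays the role of bs->total_sectors * BDRV_SECTOR_SIZE, assumed to stay stale while both requests are issued.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an RBD image: only its size matters here. */
typedef struct {
    uint64_t size;
} MockImage;

/* Stand-in for a resize call: sets the size unconditionally, so it can shrink. */
static void mock_resize(MockImage *img, uint64_t new_size)
{
    img->size = new_size;
}

/*
 * Mirrors the check in the v2 patch: the end of the write is compared
 * against a cached size that is only refreshed after a request completes.
 */
static void naive_write(MockImage *img, uint64_t cached_size,
                        uint64_t off, uint64_t len)
{
    if (off + len > cached_size) {
        mock_resize(img, off + len);
    }
}

/* Issue two writes before the cached size is refreshed; return the final size. */
static uint64_t run_stale_size_scenario(void)
{
    MockImage img = { .size = 1024 };
    uint64_t cached_size = img.size;             /* stale for both requests */

    naive_write(&img, cached_size, 4096, 4096);  /* grows the image to 8192 */
    assert(img.size == 8192);

    naive_write(&img, cached_size, 1024, 1024);  /* 2048 > 1024, so it resizes */
    return img.size;                             /* image was shrunk */
}
```

In this model the second write ends at 2048, which is beyond the stale cached size but below the size the first write established, so the unconditional resize truncates the image.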
* Re: [Qemu-devel] [PATCH v2] block/rbd: increase dynamically the image size
2019-05-03 17:21 ` Jason Dillaman
(?)
@ 2019-05-06 9:50 ` Stefano Garzarella
2019-05-07 9:43 ` Kevin Wolf
-1 siblings, 1 reply; 7+ messages in thread
From: Stefano Garzarella @ 2019-05-06 9:50 UTC (permalink / raw)
To: dillaman, Kevin Wolf; +Cc: Josh Durgin, qemu-devel, qemu-block, Max Reitz
On Fri, May 03, 2019 at 01:21:23PM -0400, Jason Dillaman wrote:
> On Fri, May 3, 2019 at 12:30 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> >
> > RBD APIs don't allow us to write more than the size set with
> > rbd_create() or rbd_resize().
> > In order to support growing images (eg. qcow2), we resize the
> > image before write operations that exceed the current size.
> >
> > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > ---
> > v2:
> > - use bs->total_sectors instead of adding a new field [Kevin]
> > - resize the image only during write operation [Kevin]
> > for read operation, the bdrv_aligned_preadv() already handles reads
> > that exceed the length returned by bdrv_getlength(), so IMHO we can
> > avoid to handle it in the rbd driver
> > ---
> > block/rbd.c | 14 +++++++++++++-
> > 1 file changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/block/rbd.c b/block/rbd.c
> > index 0c549c9935..613e8f4982 100644
> > --- a/block/rbd.c
> > +++ b/block/rbd.c
> > @@ -934,13 +934,25 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
> > }
> >
> > switch (cmd) {
> > - case RBD_AIO_WRITE:
> > + case RBD_AIO_WRITE: {
> > + /*
> > + * RBD APIs don't allow us to write more than actual size, so in order
> > + * to support growing images, we resize the image before write
> > + * operations that exceed the current size.
> > + */
> > + if (off + size > bs->total_sectors * BDRV_SECTOR_SIZE) {
>
> When will "bs->total_sectors" be refreshed to represent the correct
> current size? You wouldn't want a future write whose extent was
> greater than the original image size but less than a previous IO that
> expanded the image to attempt to shrink the image.
>
Good point!
IIUC it can happen, because bdrv_aligned_pwritev() performs these
steps:
1. call bdrv_driver_pwritev(), which invokes "drv->bdrv_aio_pwritev" and
then waits by calling "qemu_coroutine_yield()"
2. call bdrv_co_write_req_finish(), which updates "bs->total_sectors"
Between steps 1 and 2 another request can be executed, and then the
issue that you described can occur.
The solutions that I have in mind are:
a. Add a variable to BDRVRBDState to track the latest resize.
b. Call rbd_get_size() before rbd_resize() to make sure we never shrink
the image.
c. Update "bs->total_sectors" after the rbd_resize(), but I'm not
sure that is allowed.
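As a rough illustration of option a, the sketch below keeps the last known image size in the driver state and resizes only when a write ends beyond it. Everything here is a mock under stated assumptions: MockImage, MockRBDState, the image_size field, mock_resize() and grow_before_write() are hypothetical names, not the actual BDRVRBDState layout or the librbd API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an RBD image; counts resize calls for inspection. */
typedef struct {
    uint64_t size;
    int resize_calls;
} MockImage;

/* Stand-in for a resize call; returns 0 on success like a typical C API. */
static int mock_resize(MockImage *img, uint64_t new_size)
{
    img->size = new_size;
    img->resize_calls++;
    return 0;
}

/* Hypothetical driver state; image_size is the extra field from option a. */
typedef struct {
    MockImage *image;
    uint64_t image_size;   /* size after the most recent resize */
} MockRBDState;

/*
 * Grow the image only if the write ends beyond the size tracked in the
 * driver state, and update the tracked size right after a successful
 * resize, without waiting for the request to complete.
 */
static int grow_before_write(MockRBDState *s, uint64_t off, uint64_t len)
{
    uint64_t end = off + len;

    if (end > s->image_size) {
        int r = mock_resize(s->image, end);
        if (r < 0) {
            return r;
        }
        s->image_size = end;
    }
    return 0;
}
```

Because the tracked size is updated in the same step as the resize, a later write that ends below it never triggers a resize, so the image can only grow in this model.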
@Jason, @Kevin Do you have any advice?
Thanks,
Stefano
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [PATCH v2] block/rbd: increase dynamically the image size
2019-05-06 9:50 ` Stefano Garzarella
@ 2019-05-07 9:43 ` Kevin Wolf
2019-05-08 9:41 ` Stefano Garzarella
0 siblings, 1 reply; 7+ messages in thread
From: Kevin Wolf @ 2019-05-07 9:43 UTC (permalink / raw)
To: Stefano Garzarella
Cc: Josh Durgin, dillaman, qemu-devel, qemu-block, Max Reitz
Am 06.05.2019 um 11:50 hat Stefano Garzarella geschrieben:
> On Fri, May 03, 2019 at 01:21:23PM -0400, Jason Dillaman wrote:
> > On Fri, May 3, 2019 at 12:30 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > >
> > > RBD APIs don't allow us to write more than the size set with
> > > rbd_create() or rbd_resize().
> > > In order to support growing images (eg. qcow2), we resize the
> > > image before write operations that exceed the current size.
> > >
> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > ---
> > > v2:
> > > - use bs->total_sectors instead of adding a new field [Kevin]
> > > - resize the image only during write operation [Kevin]
> > > for read operation, the bdrv_aligned_preadv() already handles reads
> > > that exceed the length returned by bdrv_getlength(), so IMHO we can
> > > avoid to handle it in the rbd driver
> > > ---
> > > block/rbd.c | 14 +++++++++++++-
> > > 1 file changed, 13 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/block/rbd.c b/block/rbd.c
> > > index 0c549c9935..613e8f4982 100644
> > > --- a/block/rbd.c
> > > +++ b/block/rbd.c
> > > @@ -934,13 +934,25 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
> > > }
> > >
> > > switch (cmd) {
> > > - case RBD_AIO_WRITE:
> > > + case RBD_AIO_WRITE: {
> > > + /*
> > > + * RBD APIs don't allow us to write more than actual size, so in order
> > > + * to support growing images, we resize the image before write
> > > + * operations that exceed the current size.
> > > + */
> > > + if (off + size > bs->total_sectors * BDRV_SECTOR_SIZE) {
> >
> > When will "bs->total_sectors" be refreshed to represent the correct
> > current size? You wouldn't want a future write whose extent was
> > > greater than the original image size but less than a previous IO that
> > expanded the image to attempt to shrink the image.
> >
>
> Good point!
> IIUC it can happen, because in the bdrv_aligned_pwritev() we do these
> steps:
> 1. call bdrv_driver_pwritev() that invokes "drv->bdrv_aio_pwritev" and
> then it waits calling "qemu_coroutine_yield()"
> 2. call bdrv_co_write_req_finish() that updates the "bs->total_sectors"
>
> Between steps 1 and 2, maybe another request can be executed, then the
> issue that you described can occur.
>
> The solutions that I have in mind are:
> a. Add a variable in the BDRVRBDState to track the latest resize.
This would work and be relatively simple.
> b. Call rbd_get_size() before the rbd_resize() to be sure to avoid to shrink
> the image.
I'm not sure if rbd_get_size() involves network traffic or other
significant complexity. If so, I'd definitely avoid it.
> c. Updates the "bs->total_sectors" after the rbd_resize(), but I'm not
> sure it is allowed.
>
> @Jason, @Kevin Do you have any advice?
We need to make sure to run everything that bdrv_co_write_req_finish()
does for resizing an image:
bs->total_sectors = end_sector;
bdrv_parent_cb_resize(bs);
bdrv_dirty_bitmap_truncate(bs, end_sector << BDRV_SECTOR_BITS);
Just duplicating that code wouldn't be good; if something is added later,
we'd probably forget to update rbd, too. So I think your solution c would
at least involve refactoring the above code into a separate function that
can be called from rbd.
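The refactoring idea could look roughly like the sketch below: the three size-update steps live in one helper, and both the generic write-completion path and a driver call it. This is a mock, not QEMU code: MockBDS, mock_update_size(), mock_write_req_finish() and mock_driver_grow() are hypothetical names, and the "only when growing" guard in the completion path is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_BITS 9   /* 512-byte sectors, mirroring BDRV_SECTOR_BITS */

/* Hypothetical stand-in for a block node; fields record the side effects. */
typedef struct {
    int64_t total_sectors;
    int parent_notifications;   /* how often parents were told about a resize */
    int64_t bitmap_bytes;       /* size the dirty bitmaps were truncated to */
} MockBDS;

static void mock_parent_cb_resize(MockBDS *bs)
{
    bs->parent_notifications++;
}

static void mock_dirty_bitmap_truncate(MockBDS *bs, int64_t bytes)
{
    bs->bitmap_bytes = bytes;
}

/* Single place that performs every step needed when the size changes. */
static void mock_update_size(MockBDS *bs, int64_t end_sector)
{
    bs->total_sectors = end_sector;
    mock_parent_cb_resize(bs);
    mock_dirty_bitmap_truncate(bs, end_sector << SECTOR_BITS);
}

/* Generic completion path: reuses the helper when a write grew the node. */
static void mock_write_req_finish(MockBDS *bs, int64_t end_sector)
{
    if (end_sector > bs->total_sectors) {
        mock_update_size(bs, end_sector);
    }
}

/* Driver path: calls the same helper right after growing the image. */
static void mock_driver_grow(MockBDS *bs, int64_t end_sector)
{
    assert(end_sector > bs->total_sectors);
    mock_update_size(bs, end_sector);
}
```

With a shared helper, a step added to the size update later is picked up by the driver automatically, which is the maintenance concern raised above.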
But solution a might actually be the simplest. In this case, sorry for
giving you bad advice in v1 of the patch.
Kevin
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [PATCH v2] block/rbd: increase dynamically the image size
2019-05-07 9:43 ` Kevin Wolf
@ 2019-05-08 9:41 ` Stefano Garzarella
0 siblings, 0 replies; 7+ messages in thread
From: Stefano Garzarella @ 2019-05-08 9:41 UTC (permalink / raw)
To: Kevin Wolf; +Cc: Josh Durgin, dillaman, qemu-devel, qemu-block, Max Reitz
On Tue, May 07, 2019 at 11:43:50AM +0200, Kevin Wolf wrote:
> Am 06.05.2019 um 11:50 hat Stefano Garzarella geschrieben:
> > On Fri, May 03, 2019 at 01:21:23PM -0400, Jason Dillaman wrote:
> > > On Fri, May 3, 2019 at 12:30 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > > >
> > > > RBD APIs don't allow us to write more than the size set with
> > > > rbd_create() or rbd_resize().
> > > > In order to support growing images (eg. qcow2), we resize the
> > > > image before write operations that exceed the current size.
> > > >
> > > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > > ---
> > > > v2:
> > > > - use bs->total_sectors instead of adding a new field [Kevin]
> > > > - resize the image only during write operation [Kevin]
> > > > for read operation, the bdrv_aligned_preadv() already handles reads
> > > > that exceed the length returned by bdrv_getlength(), so IMHO we can
> > > > avoid to handle it in the rbd driver
> > > > ---
> > > > block/rbd.c | 14 +++++++++++++-
> > > > 1 file changed, 13 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/block/rbd.c b/block/rbd.c
> > > > index 0c549c9935..613e8f4982 100644
> > > > --- a/block/rbd.c
> > > > +++ b/block/rbd.c
> > > > @@ -934,13 +934,25 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
> > > > }
> > > >
> > > > switch (cmd) {
> > > > - case RBD_AIO_WRITE:
> > > > + case RBD_AIO_WRITE: {
> > > > + /*
> > > > + * RBD APIs don't allow us to write more than actual size, so in order
> > > > + * to support growing images, we resize the image before write
> > > > + * operations that exceed the current size.
> > > > + */
> > > > + if (off + size > bs->total_sectors * BDRV_SECTOR_SIZE) {
> > >
> > > When will "bs->total_sectors" be refreshed to represent the correct
> > > current size? You wouldn't want a future write whose extent was
> > > > greater than the original image size but less than a previous IO that
> > > expanded the image to attempt to shrink the image.
> > >
> >
> > Good point!
> > IIUC it can happen, because in the bdrv_aligned_pwritev() we do these
> > steps:
> > 1. call bdrv_driver_pwritev() that invokes "drv->bdrv_aio_pwritev" and
> > then it waits calling "qemu_coroutine_yield()"
> > 2. call bdrv_co_write_req_finish() that updates the "bs->total_sectors"
> >
> > Between steps 1 and 2, maybe another request can be executed, then the
> > issue that you described can occur.
> >
> > The solutions that I have in mind are:
> > a. Add a variable in the BDRVRBDState to track the latest resize.
>
> This would work and be relatively simple.
>
> > b. Call rbd_get_size() before the rbd_resize() to be sure to avoid to shrink
> > the image.
>
> I'm not sure if rbd_get_size() involves network traffic or other
> significant complexity. If so, I'd definitely avoid it.
>
> > c. Updates the "bs->total_sectors" after the rbd_resize(), but I'm not
> > sure it is allowed.
> >
> > @Jason, @Kevin Do you have any advice?
>
> We need to make sure to run everything that bdrv_co_write_req_finish()
> does for resizing an image:
>
> bs->total_sectors = end_sector;
> bdrv_parent_cb_resize(bs);
> bdrv_dirty_bitmap_truncate(bs, end_sector << BDRV_SECTOR_BITS);
>
> Just duplicating that code wouldn't be good; if something is added, we'd
> probably forget updating rbd, too. So I think your solution c would at
> least involve refactoring the above code into a separate function that
> can be called from rbd.
>
> But solution a might actually be the simplest. In this case, sorry for
> giving you bad advice in v1 of the patch.
>
I agree with you, 'a' should be the simplest to implement.
I'll send a v3 that fixes this.
Thanks,
Stefano
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2019-05-08 9:42 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
2019-05-03 16:30 [Qemu-devel] [PATCH v2] block/rbd: increase dynamically the image size Stefano Garzarella
2019-05-03 16:30 ` Stefano Garzarella
2019-05-03 17:21 ` Jason Dillaman
2019-05-03 17:21 ` Jason Dillaman
2019-05-06 9:50 ` Stefano Garzarella
2019-05-07 9:43 ` Kevin Wolf
2019-05-08 9:41 ` Stefano Garzarella