All of lore.kernel.org
 help / color / mirror / Atom feed
* QEMU RBD is slow with QCOW2 images
@ 2021-03-03 17:40 Stefano Garzarella
  2021-03-03 18:47 ` Jason Dillaman
  2021-03-04 12:05 ` Kevin Wolf
  0 siblings, 2 replies; 14+ messages in thread
From: Stefano Garzarella @ 2021-03-03 17:40 UTC (permalink / raw)
  To: Jason Dillaman; +Cc: Peter Lieven, qemu-devel, qemu-block

Hi Jason,
as reported in this BZ [1], when qemu-img creates a QCOW2 image on RBD 
writing data is very slow compared to a raw file.

Comparing raw vs QCOW2 image creation with RBD I found that we use a 
different object size, for the raw file I see '4 MiB objects', for QCOW2 
I see '64 KiB objects' as reported on comment 14 [2].
This should be the main issue of slowness, indeed forcing in the code 4 
MiB object size also for QCOW2 increased the speed a lot.

Looking better I discovered that for raw files, we call rbd_create() 
with obj_order = 0 (if 'cluster_size' options is not defined), so the 
default object size is used.
Instead for QCOW2, we use obj_order = 16, since the default 
'cluster_size' defined for QCOW2, is 64 KiB.

Using '-o cluster_size=2M' with qemu-img changed only the qcow2 cluster 
size, since in qcow2_co_create_opts() we remove the 'cluster_size' from 
QemuOpts calling qemu_opts_to_qdict_filtered().
For some reason that I have yet to understand, after this deletion, 
however remains in QemuOpts the default value of 'cluster_size' for 
qcow2 (64 KiB), that it's used in qemu_rbd_co_create_opts()

At this point my doubts are:
Does it make sense to use the same cluster_size as qcow2 as object_size 
in RBD?
If we want to keep the 2 options separated, how can it be done? Should 
we rename the option in block/rbd.c?

Thanks,
Stefano

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1744525
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1744525#c14



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: QEMU RBD is slow with QCOW2 images
  2021-03-03 17:40 QEMU RBD is slow with QCOW2 images Stefano Garzarella
@ 2021-03-03 18:47 ` Jason Dillaman
  2021-03-03 21:26   ` Peter Lieven
  2021-03-04  8:55   ` Stefano Garzarella
  2021-03-04 12:05 ` Kevin Wolf
  1 sibling, 2 replies; 14+ messages in thread
From: Jason Dillaman @ 2021-03-03 18:47 UTC (permalink / raw)
  To: Stefano Garzarella; +Cc: Peter Lieven, qemu-devel, qemu-block

On Wed, Mar 3, 2021 at 12:41 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> Hi Jason,
> as reported in this BZ [1], when qemu-img creates a QCOW2 image on RBD
> writing data is very slow compared to a raw file.
>
> Comparing raw vs QCOW2 image creation with RBD I found that we use a
> different object size, for the raw file I see '4 MiB objects', for QCOW2
> I see '64 KiB objects' as reported on comment 14 [2].
> This should be the main issue of slowness, indeed forcing in the code 4
> MiB object size also for QCOW2 increased the speed a lot.
>
> Looking better I discovered that for raw files, we call rbd_create()
> with obj_order = 0 (if 'cluster_size' options is not defined), so the
> default object size is used.
> Instead for QCOW2, we use obj_order = 16, since the default
> 'cluster_size' defined for QCOW2, is 64 KiB.
>
> Using '-o cluster_size=2M' with qemu-img changed only the qcow2 cluster
> size, since in qcow2_co_create_opts() we remove the 'cluster_size' from
> QemuOpts calling qemu_opts_to_qdict_filtered().
> For some reason that I have yet to understand, after this deletion,
> however remains in QemuOpts the default value of 'cluster_size' for
> qcow2 (64 KiB), that it's used in qemu_rbd_co_create_opts()
>
> At this point my doubts are:
> Does it make sense to use the same cluster_size as qcow2 as object_size
> in RBD?

No, not really. But it also doesn't really make any sense to put a
QCOW2 image within an RBD image. To clarify from the BZ, OpenStack
does not put QCOW2 images on RBD, it converts QCOW2 images into raw
images to store in RBD.

> If we want to keep the 2 options separated, how can it be done? Should
> we rename the option in block/rbd.c?

You can already pass overrides to the RBD block driver by just
appending them after the
"rbd:<filename>[:option1=value1[:option2=value2]]" portion, perhaps
that could be re-used.

> Thanks,
> Stefano
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1744525
> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1744525#c14
>


-- 
Jason



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: QEMU RBD is slow with QCOW2 images
  2021-03-03 18:47 ` Jason Dillaman
@ 2021-03-03 21:26   ` Peter Lieven
  2021-03-04  8:58     ` Stefano Garzarella
  2021-03-04  8:55   ` Stefano Garzarella
  1 sibling, 1 reply; 14+ messages in thread
From: Peter Lieven @ 2021-03-03 21:26 UTC (permalink / raw)
  To: dillaman, Stefano Garzarella; +Cc: qemu-devel, qemu-block

Am 03.03.21 um 19:47 schrieb Jason Dillaman:
> On Wed, Mar 3, 2021 at 12:41 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>> Hi Jason,
>> as reported in this BZ [1], when qemu-img creates a QCOW2 image on RBD
>> writing data is very slow compared to a raw file.
>>
>> Comparing raw vs QCOW2 image creation with RBD I found that we use a
>> different object size, for the raw file I see '4 MiB objects', for QCOW2
>> I see '64 KiB objects' as reported on comment 14 [2].
>> This should be the main issue of slowness, indeed forcing in the code 4
>> MiB object size also for QCOW2 increased the speed a lot.
>>
>> Looking better I discovered that for raw files, we call rbd_create()
>> with obj_order = 0 (if 'cluster_size' options is not defined), so the
>> default object size is used.
>> Instead for QCOW2, we use obj_order = 16, since the default
>> 'cluster_size' defined for QCOW2, is 64 KiB.
>>
>> Using '-o cluster_size=2M' with qemu-img changed only the qcow2 cluster
>> size, since in qcow2_co_create_opts() we remove the 'cluster_size' from
>> QemuOpts calling qemu_opts_to_qdict_filtered().
>> For some reason that I have yet to understand, after this deletion,
>> however remains in QemuOpts the default value of 'cluster_size' for
>> qcow2 (64 KiB), that it's used in qemu_rbd_co_create_opts()
>>
>> At this point my doubts are:
>> Does it make sense to use the same cluster_size as qcow2 as object_size
>> in RBD?
> No, not really. But it also doesn't really make any sense to put a
> QCOW2 image within an RBD image. To clarify from the BZ, OpenStack
> does not put QCOW2 images on RBD, it converts QCOW2 images into raw
> images to store in RBD.


As discussed earlier the only reasonable format for rbd image is raw.

What is the idea behind putting a qcow2 on an rbd pool?

Jason and I even discussed shortly durign the review of the rbd driver rewrite I posted

earlier if it was ok to drop support for writing past the end of file.


Anyway the reason why it is so slow is that write requests serialize if the

qcow2 file grows. If there is a sane reason why we need qcow2 on rbd

we need to implement at least preallocation mode = full to overcome

the serialization.


Peter




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: QEMU RBD is slow with QCOW2 images
  2021-03-03 18:47 ` Jason Dillaman
  2021-03-03 21:26   ` Peter Lieven
@ 2021-03-04  8:55   ` Stefano Garzarella
  2021-03-04 10:25     ` Daniel P. Berrangé
  1 sibling, 1 reply; 14+ messages in thread
From: Stefano Garzarella @ 2021-03-04  8:55 UTC (permalink / raw)
  To: dillaman; +Cc: Peter Lieven, qemu-devel, qemu-block

On Wed, Mar 03, 2021 at 01:47:06PM -0500, Jason Dillaman wrote:
>On Wed, Mar 3, 2021 at 12:41 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>>
>> Hi Jason,
>> as reported in this BZ [1], when qemu-img creates a QCOW2 image on RBD
>> writing data is very slow compared to a raw file.
>>
>> Comparing raw vs QCOW2 image creation with RBD I found that we use a
>> different object size, for the raw file I see '4 MiB objects', for QCOW2
>> I see '64 KiB objects' as reported on comment 14 [2].
>> This should be the main issue of slowness, indeed forcing in the code 4
>> MiB object size also for QCOW2 increased the speed a lot.
>>
>> Looking better I discovered that for raw files, we call rbd_create()
>> with obj_order = 0 (if 'cluster_size' options is not defined), so the
>> default object size is used.
>> Instead for QCOW2, we use obj_order = 16, since the default
>> 'cluster_size' defined for QCOW2, is 64 KiB.
>>
>> Using '-o cluster_size=2M' with qemu-img changed only the qcow2 cluster
>> size, since in qcow2_co_create_opts() we remove the 'cluster_size' from
>> QemuOpts calling qemu_opts_to_qdict_filtered().
>> For some reason that I have yet to understand, after this deletion,
>> however remains in QemuOpts the default value of 'cluster_size' for
>> qcow2 (64 KiB), that it's used in qemu_rbd_co_create_opts()
>>
>> At this point my doubts are:
>> Does it make sense to use the same cluster_size as qcow2 as object_size
>> in RBD?
>
>No, not really. But it also doesn't really make any sense to put a
>QCOW2 image within an RBD image. To clarify from the BZ, OpenStack
>does not put QCOW2 images on RBD, it converts QCOW2 images into raw
>images to store in RBD.

Yes, that was my doubt, thanks for the confirmation.

Also Daniel (+CC) confirmed me the same thing, but just to be complete 
he added that there is a case where OpenStack could use qcow2 on RBD, 
but in this case using in-kernel RBD, so the QEMU RBD is not involved.

>
>> If we want to keep the 2 options separated, how can it be done? Should
>> we rename the option in block/rbd.c?
>
>You can already pass overrides to the RBD block driver by just
>appending them after the
>"rbd:<filename>[:option1=value1[:option2=value2]]" portion, perhaps
>that could be re-used.

I see, we should extend qemu_rbd_parse_filename() to suppurt it.

Maybe if we don't want to support this configuration, we should print 
some warning messages.

Thanks,
Stefano



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: QEMU RBD is slow with QCOW2 images
  2021-03-03 21:26   ` Peter Lieven
@ 2021-03-04  8:58     ` Stefano Garzarella
  0 siblings, 0 replies; 14+ messages in thread
From: Stefano Garzarella @ 2021-03-04  8:58 UTC (permalink / raw)
  To: Peter Lieven; +Cc: dillaman, qemu-devel, qemu-block

On Wed, Mar 03, 2021 at 10:26:12PM +0100, Peter Lieven wrote:
>Am 03.03.21 um 19:47 schrieb Jason Dillaman:
>> On Wed, Mar 3, 2021 at 12:41 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>>> Hi Jason,
>>> as reported in this BZ [1], when qemu-img creates a QCOW2 image on RBD
>>> writing data is very slow compared to a raw file.
>>>
>>> Comparing raw vs QCOW2 image creation with RBD I found that we use a
>>> different object size, for the raw file I see '4 MiB objects', for QCOW2
>>> I see '64 KiB objects' as reported on comment 14 [2].
>>> This should be the main issue of slowness, indeed forcing in the code 4
>>> MiB object size also for QCOW2 increased the speed a lot.
>>>
>>> Looking better I discovered that for raw files, we call rbd_create()
>>> with obj_order = 0 (if 'cluster_size' options is not defined), so the
>>> default object size is used.
>>> Instead for QCOW2, we use obj_order = 16, since the default
>>> 'cluster_size' defined for QCOW2, is 64 KiB.
>>>
>>> Using '-o cluster_size=2M' with qemu-img changed only the qcow2 cluster
>>> size, since in qcow2_co_create_opts() we remove the 'cluster_size' from
>>> QemuOpts calling qemu_opts_to_qdict_filtered().
>>> For some reason that I have yet to understand, after this deletion,
>>> however remains in QemuOpts the default value of 'cluster_size' for
>>> qcow2 (64 KiB), that it's used in qemu_rbd_co_create_opts()
>>>
>>> At this point my doubts are:
>>> Does it make sense to use the same cluster_size as qcow2 as object_size
>>> in RBD?
>> No, not really. But it also doesn't really make any sense to put a
>> QCOW2 image within an RBD image. To clarify from the BZ, OpenStack
>> does not put QCOW2 images on RBD, it converts QCOW2 images into raw
>> images to store in RBD.
>
>
>As discussed earlier the only reasonable format for rbd image is raw.
>What is the idea behind putting a qcow2 on an rbd pool?
>Jason and I even discussed shortly durign the review of the rbd driver 
>rewrite I posted
>earlier if it was ok to drop support for writing past the end of file.
>
>Anyway the reason why it is so slow is that write requests serialize if the
>qcow2 file grows. If there is a sane reason why we need qcow2 on rbd
>we need to implement at least preallocation mode = full to overcome
>the serialization.

Agree, at most we could deprecate it by printing a message and then 
remove the support in future releases.

Thanks,
Stefano



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: QEMU RBD is slow with QCOW2 images
  2021-03-04  8:55   ` Stefano Garzarella
@ 2021-03-04 10:25     ` Daniel P. Berrangé
  2021-03-04 11:12       ` Stefano Garzarella
  0 siblings, 1 reply; 14+ messages in thread
From: Daniel P. Berrangé @ 2021-03-04 10:25 UTC (permalink / raw)
  To: Stefano Garzarella; +Cc: Peter Lieven, dillaman, qemu-devel, qemu-block

On Thu, Mar 04, 2021 at 09:55:40AM +0100, Stefano Garzarella wrote:
> On Wed, Mar 03, 2021 at 01:47:06PM -0500, Jason Dillaman wrote:
> > On Wed, Mar 3, 2021 at 12:41 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > > 
> > > Hi Jason,
> > > as reported in this BZ [1], when qemu-img creates a QCOW2 image on RBD
> > > writing data is very slow compared to a raw file.
> > > 
> > > Comparing raw vs QCOW2 image creation with RBD I found that we use a
> > > different object size, for the raw file I see '4 MiB objects', for QCOW2
> > > I see '64 KiB objects' as reported on comment 14 [2].
> > > This should be the main issue of slowness, indeed forcing in the code 4
> > > MiB object size also for QCOW2 increased the speed a lot.
> > > 
> > > Looking better I discovered that for raw files, we call rbd_create()
> > > with obj_order = 0 (if 'cluster_size' options is not defined), so the
> > > default object size is used.
> > > Instead for QCOW2, we use obj_order = 16, since the default
> > > 'cluster_size' defined for QCOW2, is 64 KiB.
> > > 
> > > Using '-o cluster_size=2M' with qemu-img changed only the qcow2 cluster
> > > size, since in qcow2_co_create_opts() we remove the 'cluster_size' from
> > > QemuOpts calling qemu_opts_to_qdict_filtered().
> > > For some reason that I have yet to understand, after this deletion,
> > > however remains in QemuOpts the default value of 'cluster_size' for
> > > qcow2 (64 KiB), that it's used in qemu_rbd_co_create_opts()
> > > 
> > > At this point my doubts are:
> > > Does it make sense to use the same cluster_size as qcow2 as object_size
> > > in RBD?
> > 
> > No, not really. But it also doesn't really make any sense to put a
> > QCOW2 image within an RBD image. To clarify from the BZ, OpenStack
> > does not put QCOW2 images on RBD, it converts QCOW2 images into raw
> > images to store in RBD.
> 
> Yes, that was my doubt, thanks for the confirmation.
> 
> Also Daniel (+CC) confirmed me the same thing, but just to be complete he
> added that there is a case where OpenStack could use qcow2 on RBD, but in
> this case using in-kernel RBD, so the QEMU RBD is not involved.
> 
> > 
> > > If we want to keep the 2 options separated, how can it be done? Should
> > > we rename the option in block/rbd.c?
> > 
> > You can already pass overrides to the RBD block driver by just
> > appending them after the
> > "rbd:<filename>[:option1=value1[:option2=value2]]" portion, perhaps
> > that could be re-used.
> 
> I see, we should extend qemu_rbd_parse_filename() to suppurt it.

We shouldn't really be extending the legacy filename syntax.
If we need extra options we want them in the QAPI schema for
blockdev.

> Maybe if we don't want to support this configuration, we should print some
> warning messages.

Note these are separate layers in QEMU block layer. qcow2 is a format
driver, while RBD is a protocol driver. QEMU lets any format driver be
run on top of any protocol driver in general. In practice there are
certain combinations that are more sane and commonly used than others,
but QEMU doesn't document this or assign support level to pairing
right now.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: QEMU RBD is slow with QCOW2 images
  2021-03-04 10:25     ` Daniel P. Berrangé
@ 2021-03-04 11:12       ` Stefano Garzarella
  2021-03-04 11:15         ` Daniel P. Berrangé
  0 siblings, 1 reply; 14+ messages in thread
From: Stefano Garzarella @ 2021-03-04 11:12 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: Peter Lieven, dillaman, qemu-devel, qemu-block

On Thu, Mar 04, 2021 at 10:25:33AM +0000, Daniel P. Berrangé wrote:
>On Thu, Mar 04, 2021 at 09:55:40AM +0100, Stefano Garzarella wrote:
>> On Wed, Mar 03, 2021 at 01:47:06PM -0500, Jason Dillaman wrote:
>> > On Wed, Mar 3, 2021 at 12:41 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>> > >
>> > > Hi Jason,
>> > > as reported in this BZ [1], when qemu-img creates a QCOW2 image on RBD
>> > > writing data is very slow compared to a raw file.
>> > >
>> > > Comparing raw vs QCOW2 image creation with RBD I found that we use a
>> > > different object size, for the raw file I see '4 MiB objects', for QCOW2
>> > > I see '64 KiB objects' as reported on comment 14 [2].
>> > > This should be the main issue of slowness, indeed forcing in the code 4
>> > > MiB object size also for QCOW2 increased the speed a lot.
>> > >
>> > > Looking better I discovered that for raw files, we call rbd_create()
>> > > with obj_order = 0 (if 'cluster_size' options is not defined), so the
>> > > default object size is used.
>> > > Instead for QCOW2, we use obj_order = 16, since the default
>> > > 'cluster_size' defined for QCOW2, is 64 KiB.
>> > >
>> > > Using '-o cluster_size=2M' with qemu-img changed only the qcow2 cluster
>> > > size, since in qcow2_co_create_opts() we remove the 'cluster_size' from
>> > > QemuOpts calling qemu_opts_to_qdict_filtered().
>> > > For some reason that I have yet to understand, after this deletion,
>> > > however remains in QemuOpts the default value of 'cluster_size' for
>> > > qcow2 (64 KiB), that it's used in qemu_rbd_co_create_opts()
>> > >
>> > > At this point my doubts are:
>> > > Does it make sense to use the same cluster_size as qcow2 as object_size
>> > > in RBD?
>> >
>> > No, not really. But it also doesn't really make any sense to put a
>> > QCOW2 image within an RBD image. To clarify from the BZ, OpenStack
>> > does not put QCOW2 images on RBD, it converts QCOW2 images into raw
>> > images to store in RBD.
>>
>> Yes, that was my doubt, thanks for the confirmation.
>>
>> Also Daniel (+CC) confirmed me the same thing, but just to be complete he
>> added that there is a case where OpenStack could use qcow2 on RBD, but in
>> this case using in-kernel RBD, so the QEMU RBD is not involved.
>>
>> >
>> > > If we want to keep the 2 options separated, how can it be done? Should
>> > > we rename the option in block/rbd.c?
>> >
>> > You can already pass overrides to the RBD block driver by just
>> > appending them after the
>> > "rbd:<filename>[:option1=value1[:option2=value2]]" portion, perhaps
>> > that could be re-used.
>>
>> I see, we should extend qemu_rbd_parse_filename() to suppurt it.
>
>We shouldn't really be extending the legacy filename syntax.
>If we need extra options we want them in the QAPI schema for
>blockdev.

Got it.

I'm still a bit confused about how QemuOpts are handled between format 
and protocol drivers.

It seems that in this case the protocol tries to access some information 
from the format (BLOCK_OPT_CLUSTER_SIZE).

Since the format removes this information from the QemuOpts passed to 
the protocol, this takes the default value of the format, even if a 
different value is specified.

Is it correct for a protocol to access BLOCK_OPT_CLUSTER_SIZE?

>
>> Maybe if we don't want to support this configuration, we should print some
>> warning messages.
>
>Note these are separate layers in QEMU block layer. qcow2 is a format
>driver, while RBD is a protocol driver. QEMU lets any format driver be
>run on top of any protocol driver in general. In practice there are
>certain combinations that are more sane and commonly used than others,
>but QEMU doesn't document this or assign support level to pairing
>right now.

Thanks for the clarification,
Stefano



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: QEMU RBD is slow with QCOW2 images
  2021-03-04 11:12       ` Stefano Garzarella
@ 2021-03-04 11:15         ` Daniel P. Berrangé
  0 siblings, 0 replies; 14+ messages in thread
From: Daniel P. Berrangé @ 2021-03-04 11:15 UTC (permalink / raw)
  To: Stefano Garzarella; +Cc: Peter Lieven, dillaman, qemu-devel, qemu-block

On Thu, Mar 04, 2021 at 12:12:51PM +0100, Stefano Garzarella wrote:
> On Thu, Mar 04, 2021 at 10:25:33AM +0000, Daniel P. Berrangé wrote:
> > On Thu, Mar 04, 2021 at 09:55:40AM +0100, Stefano Garzarella wrote:
> > > On Wed, Mar 03, 2021 at 01:47:06PM -0500, Jason Dillaman wrote:
> > > > On Wed, Mar 3, 2021 at 12:41 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > > > >
> > > > > Hi Jason,
> > > > > as reported in this BZ [1], when qemu-img creates a QCOW2 image on RBD
> > > > > writing data is very slow compared to a raw file.
> > > > >
> > > > > Comparing raw vs QCOW2 image creation with RBD I found that we use a
> > > > > different object size, for the raw file I see '4 MiB objects', for QCOW2
> > > > > I see '64 KiB objects' as reported on comment 14 [2].
> > > > > This should be the main issue of slowness, indeed forcing in the code 4
> > > > > MiB object size also for QCOW2 increased the speed a lot.
> > > > >
> > > > > Looking better I discovered that for raw files, we call rbd_create()
> > > > > with obj_order = 0 (if 'cluster_size' options is not defined), so the
> > > > > default object size is used.
> > > > > Instead for QCOW2, we use obj_order = 16, since the default
> > > > > 'cluster_size' defined for QCOW2, is 64 KiB.
> > > > >
> > > > > Using '-o cluster_size=2M' with qemu-img changed only the qcow2 cluster
> > > > > size, since in qcow2_co_create_opts() we remove the 'cluster_size' from
> > > > > QemuOpts calling qemu_opts_to_qdict_filtered().
> > > > > For some reason that I have yet to understand, after this deletion,
> > > > > however remains in QemuOpts the default value of 'cluster_size' for
> > > > > qcow2 (64 KiB), that it's used in qemu_rbd_co_create_opts()
> > > > >
> > > > > At this point my doubts are:
> > > > > Does it make sense to use the same cluster_size as qcow2 as object_size
> > > > > in RBD?
> > > >
> > > > No, not really. But it also doesn't really make any sense to put a
> > > > QCOW2 image within an RBD image. To clarify from the BZ, OpenStack
> > > > does not put QCOW2 images on RBD, it converts QCOW2 images into raw
> > > > images to store in RBD.
> > > 
> > > Yes, that was my doubt, thanks for the confirmation.
> > > 
> > > Also Daniel (+CC) confirmed me the same thing, but just to be complete he
> > > added that there is a case where OpenStack could use qcow2 on RBD, but in
> > > this case using in-kernel RBD, so the QEMU RBD is not involved.
> > > 
> > > >
> > > > > If we want to keep the 2 options separated, how can it be done? Should
> > > > > we rename the option in block/rbd.c?
> > > >
> > > > You can already pass overrides to the RBD block driver by just
> > > > appending them after the
> > > > "rbd:<filename>[:option1=value1[:option2=value2]]" portion, perhaps
> > > > that could be re-used.
> > > 
> > > I see, we should extend qemu_rbd_parse_filename() to suppurt it.
> > 
> > We shouldn't really be extending the legacy filename syntax.
> > If we need extra options we want them in the QAPI schema for
> > blockdev.
> 
> Got it.
> 
> I'm still a bit confused about how QemuOpts are handled between format and
> protocol drivers.
> 
> It seems that in this case the protocol tries to access some information
> from the format (BLOCK_OPT_CLUSTER_SIZE).
> 
> Since the format removes this information from the QemuOpts passed to the
> protocol, this takes the default value of the format, even if a different
> value is specified.
> 
> Is it correct for a protocol to access BLOCK_OPT_CLUSTER_SIZE?

In a -blockdev world, the caller would be expected to set the values
explicitly at all layers that need it.

You're talking about a scenario that is non-blockdev though, and
I'm not sure what the right answer is here. Will need Kevin/Max
to answer that one.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: QEMU RBD is slow with QCOW2 images
  2021-03-03 17:40 QEMU RBD is slow with QCOW2 images Stefano Garzarella
  2021-03-03 18:47 ` Jason Dillaman
@ 2021-03-04 12:05 ` Kevin Wolf
  2021-03-04 14:08   ` Stefano Garzarella
  1 sibling, 1 reply; 14+ messages in thread
From: Kevin Wolf @ 2021-03-04 12:05 UTC (permalink / raw)
  To: Stefano Garzarella; +Cc: Peter Lieven, Jason Dillaman, qemu-devel, qemu-block

Am 03.03.2021 um 18:40 hat Stefano Garzarella geschrieben:
> Hi Jason,
> as reported in this BZ [1], when qemu-img creates a QCOW2 image on RBD
> writing data is very slow compared to a raw file.
> 
> Comparing raw vs QCOW2 image creation with RBD I found that we use a
> different object size, for the raw file I see '4 MiB objects', for QCOW2 I
> see '64 KiB objects' as reported on comment 14 [2].
> This should be the main issue of slowness, indeed forcing in the code 4 MiB
> object size also for QCOW2 increased the speed a lot.
> 
> Looking better I discovered that for raw files, we call rbd_create() with
> obj_order = 0 (if 'cluster_size' options is not defined), so the default
> object size is used.
> Instead for QCOW2, we use obj_order = 16, since the default 'cluster_size'
> defined for QCOW2, is 64 KiB.

Hm, the QemuOpts-based image creation is messy, but why does the rbd
driver even see the cluster_size option?

The first thing qcow2_co_create_opts() does is splitting the passed
QemuOpts into options it will process on the qcow2 layer and options
that are passed to the protocol layer. So if you pass a cluster_size
option, qcow2 should take it for itself and not pass it to rbd.

If it is passed to rbd, I think that's a bug in the qcow2 driver.

> Using '-o cluster_size=2M' with qemu-img changed only the qcow2 cluster
> size, since in qcow2_co_create_opts() we remove the 'cluster_size' from
> QemuOpts calling qemu_opts_to_qdict_filtered().
> For some reason that I have yet to understand, after this deletion, however
> remains in QemuOpts the default value of 'cluster_size' for qcow2 (64 KiB),
> that it's used in qemu_rbd_co_create_opts()

So it seems you came to a similar conclusion. We need to find out where
the 64k come from and just fix that so that rbd uses its default.

> At this point my doubts are:
> Does it make sense to use the same cluster_size as qcow2 as object_size in
> RBD?
> If we want to keep the 2 options separated, how can it be done? Should we
> rename the option in block/rbd.c?

My lazy answer is that you could just use QMP blockdev-create, where you
create layer by layer separately.

What could possibly be done for the QemuOpts is using the dotted syntax
like for opening, so you could specify file.cluster_size=... for the
protocol layer (or data_file.cluster_size=... for the external data
file etc.)

Kevin



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: QEMU RBD is slow with QCOW2 images
  2021-03-04 12:05 ` Kevin Wolf
@ 2021-03-04 14:08   ` Stefano Garzarella
  2021-03-04 14:59     ` Kevin Wolf
  0 siblings, 1 reply; 14+ messages in thread
From: Stefano Garzarella @ 2021-03-04 14:08 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Peter Lieven, Jason Dillaman, qemu-devel, qemu-block

On Thu, Mar 04, 2021 at 01:05:02PM +0100, Kevin Wolf wrote:
>Am 03.03.2021 um 18:40 hat Stefano Garzarella geschrieben:
>> Hi Jason,
>> as reported in this BZ [1], when qemu-img creates a QCOW2 image on RBD
>> writing data is very slow compared to a raw file.
>>
>> Comparing raw vs QCOW2 image creation with RBD I found that we use a
>> different object size, for the raw file I see '4 MiB objects', for 
>> QCOW2 I
>> see '64 KiB objects' as reported on comment 14 [2].
>> This should be the main issue of slowness, indeed forcing in the code 4 MiB
>> object size also for QCOW2 increased the speed a lot.
>>
>> Looking better I discovered that for raw files, we call rbd_create() with
>> obj_order = 0 (if 'cluster_size' options is not defined), so the default
>> object size is used.
>> Instead for QCOW2, we use obj_order = 16, since the default 'cluster_size'
>> defined for QCOW2, is 64 KiB.
>
>Hm, the QemuOpts-based image creation is messy, but why does the rbd
>driver even see the cluster_size option?
>
>The first thing qcow2_co_create_opts() does is splitting the passed
>QemuOpts into options it will process on the qcow2 layer and options
>that are passed to the protocol layer. So if you pass a cluster_size
>option, qcow2 should take it for itself and not pass it to rbd.
>
>If it is passed to rbd, I think that's a bug in the qcow2 driver.

IIUC qcow2 properyl remove it, but when rbd uses 
qemu_opt_get_size_del(opts, BLOCK_OPT_CLUSTER_SIZE, 0) the default value 
of qcow2 format is returned.

Going in depth in qemu_opt_get_size_helper(), I found that 
qemu_opt_find() properly returns a NULL pointer, but then we call 
find_default_by_name() that returns the default value of qcow2 format 
(64k).

>
>> Using '-o cluster_size=2M' with qemu-img changed only the qcow2 cluster
>> size, since in qcow2_co_create_opts() we remove the 'cluster_size' from
>> QemuOpts calling qemu_opts_to_qdict_filtered().
>> For some reason that I have yet to understand, after this deletion, however
>> remains in QemuOpts the default value of 'cluster_size' for qcow2 (64 KiB),
>> that it's used in qemu_rbd_co_create_opts()
>
>So it seems you came to a similar conclusion. We need to find out where
>the 64k come from and just fix that so that rbd uses its default.

Yes, I tried debugging above, but I'm not sure how to fix it.

Maybe a new parameter in qemu_opt_get_size_helper() to prevent it from 
looking for the default value.
Or we should prevent the default value from being added to the 
opts->list->desc, but that part is still not very clear to me.

>
>> At this point my doubts are:
>> Does it make sense to use the same cluster_size as qcow2 as object_size in
>> RBD?
>> If we want to keep the 2 options separated, how can it be done? Should we
>> rename the option in block/rbd.c?
>
>My lazy answer is that you could just use QMP blockdev-create, where you
>create layer by layer separately.
>
>What could possibly be done for the QemuOpts is using the dotted syntax
>like for opening, so you could specify file.cluster_size=... for the
>protocol layer (or data_file.cluster_size=... for the external data
>file etc.)
>

This would be cool :-)

Thanks,
Stefano



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: QEMU RBD is slow with QCOW2 images
  2021-03-04 14:08   ` Stefano Garzarella
@ 2021-03-04 14:59     ` Kevin Wolf
  2021-03-04 17:32       ` Stefano Garzarella
  0 siblings, 1 reply; 14+ messages in thread
From: Kevin Wolf @ 2021-03-04 14:59 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Peter Lieven, Jason Dillaman, qemu-devel, qemu-block, armbru

Am 04.03.2021 um 15:08 hat Stefano Garzarella geschrieben:
> On Thu, Mar 04, 2021 at 01:05:02PM +0100, Kevin Wolf wrote:
> > Am 03.03.2021 um 18:40 hat Stefano Garzarella geschrieben:
> > > Hi Jason,
> > > as reported in this BZ [1], when qemu-img creates a QCOW2 image on RBD
> > > writing data is very slow compared to a raw file.
> > > 
> > > Comparing raw vs QCOW2 image creation with RBD I found that we use a
> > > different object size, for the raw file I see '4 MiB objects', for
> > > QCOW2 I
> > > see '64 KiB objects' as reported on comment 14 [2].
> > > This should be the main issue of slowness, indeed forcing in the code 4 MiB
> > > object size also for QCOW2 increased the speed a lot.
> > > 
> > > Looking better I discovered that for raw files, we call rbd_create() with
> > > obj_order = 0 (if 'cluster_size' options is not defined), so the default
> > > object size is used.
> > > Instead for QCOW2, we use obj_order = 16, since the default 'cluster_size'
> > > defined for QCOW2, is 64 KiB.
> > 
> > Hm, the QemuOpts-based image creation is messy, but why does the rbd
> > driver even see the cluster_size option?
> > 
> > The first thing qcow2_co_create_opts() does is splitting the passed
> > QemuOpts into options it will process on the qcow2 layer and options
> > that are passed to the protocol layer. So if you pass a cluster_size
> > option, qcow2 should take it for itself and not pass it to rbd.
> > 
> > If it is passed to rbd, I think that's a bug in the qcow2 driver.
> 
> IIUC qcow2 properyl remove it, but when rbd uses qemu_opt_get_size_del(opts,
> BLOCK_OPT_CLUSTER_SIZE, 0) the default value of qcow2 format is returned.
> 
> Going in depth in qemu_opt_get_size_helper(), I found that qemu_opt_find()
> properly returns a NULL pointer, but then we call find_default_by_name()
> that returns the default value of qcow2 format (64k).

Ugh, I see why. We're passing the protocol driver a QemuOpts that was
created for a QemuOptsList with the qcow2 default, not for its own
QemuOptsList. This is wrong.

Note that the QemuOptsList is not qcow2_create_opts itself, but a list
that is created with qemu_opts_append() to combine qcow2 and rbd options
into a new QemuOptsList. For overlapping options, the format wins.

I don't think you can change the QemuOptsList of an existing QemuOpts,
nor is there a clone operation that could just copy all options into a
new QemuOpts created for the rbd QemuOptsList, so maybe the easiest
hack^Wsolution would be converting to QDict and back...

> > > Using '-o cluster_size=2M' with qemu-img changed only the qcow2 cluster
> > > size, since in qcow2_co_create_opts() we remove the 'cluster_size' from
> > > QemuOpts calling qemu_opts_to_qdict_filtered().
> > > For some reason that I have yet to understand, after this deletion, however
> > > remains in QemuOpts the default value of 'cluster_size' for qcow2 (64 KiB),
> > > that it's used in qemu_rbd_co_create_opts()
> > 
> > So it seems you came to a similar conclusion. We need to find out where
> > the 64k come from and just fix that so that rbd uses its default.
> 
> Yes, I tried debugging above, but I'm not sure how to fix it.
> 
> Maybe a new parameter in qemu_opt_get_size_helper() to prevent it from
> looking for the default value.
> Or we should prevent the default value from being added to the
> opts->list->desc, but that part is still not very clear to me.

opts->list is already wrong, I think this is what we need to fix.

> > > At this point my doubts are:
> > > Does it make sense to use the same cluster_size as qcow2 as object_size in
> > > RBD?
> > > If we want to keep the 2 options separated, how can it be done? Should we
> > > rename the option in block/rbd.c?
> > 
> > My lazy answer is that you could just use QMP blockdev-create, where you
> > create layer by layer separately.
> > 
> > What could possibly be done for the QemuOpts is using the dotted syntax
> > like for opening, so you could specify file.cluster_size=... for the
> > protocol layer (or data_file.cluster_size=... for the external data
> > file etc.)
> 
> This would be cool :-)

I'm almost sure that compatibility will make this more complicated than
it sounds, but we could have a try.

Kevin



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: QEMU RBD is slow with QCOW2 images
  2021-03-04 14:59     ` Kevin Wolf
@ 2021-03-04 17:32       ` Stefano Garzarella
  2021-03-05  9:16         ` Kevin Wolf
  0 siblings, 1 reply; 14+ messages in thread
From: Stefano Garzarella @ 2021-03-04 17:32 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Peter Lieven, Jason Dillaman, qemu-devel, qemu-block, armbru

On Thu, Mar 04, 2021 at 03:59:17PM +0100, Kevin Wolf wrote:
>Am 04.03.2021 um 15:08 hat Stefano Garzarella geschrieben:
>> On Thu, Mar 04, 2021 at 01:05:02PM +0100, Kevin Wolf wrote:
>> > Am 03.03.2021 um 18:40 hat Stefano Garzarella geschrieben:
>> > > Hi Jason,
>> > > as reported in this BZ [1], when qemu-img creates a QCOW2 image on RBD
>> > > writing data is very slow compared to a raw file.
>> > >
>> > > Comparing raw vs QCOW2 image creation with RBD I found that we use a
>> > > different object size, for the raw file I see '4 MiB objects', for
>> > > QCOW2 I
>> > > see '64 KiB objects' as reported on comment 14 [2].
>> > > This should be the main issue of slowness, indeed forcing in the code 4 MiB
>> > > object size also for QCOW2 increased the speed a lot.
>> > >
>> > > Looking better I discovered that for raw files, we call rbd_create() with
>> > > obj_order = 0 (if 'cluster_size' options is not defined), so the default
>> > > object size is used.
>> > > Instead for QCOW2, we use obj_order = 16, since the default 'cluster_size'
>> > > defined for QCOW2, is 64 KiB.
>> >
>> > Hm, the QemuOpts-based image creation is messy, but why does the rbd
>> > driver even see the cluster_size option?
>> >
>> > The first thing qcow2_co_create_opts() does is splitting the passed
>> > QemuOpts into options it will process on the qcow2 layer and options
>> > that are passed to the protocol layer. So if you pass a cluster_size
>> > option, qcow2 should take it for itself and not pass it to rbd.
>> >
>> > If it is passed to rbd, I think that's a bug in the qcow2 driver.
>>
>> IIUC qcow2 properyl remove it, but when rbd uses qemu_opt_get_size_del(opts,
>> BLOCK_OPT_CLUSTER_SIZE, 0) the default value of qcow2 format is returned.
>>
>> Going in depth in qemu_opt_get_size_helper(), I found that qemu_opt_find()
>> properly returns a NULL pointer, but then we call find_default_by_name()
>> that returns the default value of qcow2 format (64k).
>
>Ugh, I see why. We're passing the protocol driver a QemuOpts that was
>created for a QemuOptsList with the qcow2 default, not for its own
>QemuOptsList. This is wrong.
>
>Note that the QemuOptsList is not qcow2_create_opts itself, but a list
>that is created with qemu_opts_append() to combine qcow2 and rbd options
>into a new QemuOptsList. For overlapping options, the format wins.
>
>I don't think you can change the QemuOptsList of an existing QemuOpts,
>nor is there a clone operation that could just copy all options into a
>new QemuOpts created for the rbd QemuOptsList, so maybe the easiest
>hack^Wsolution would be converting to QDict and back...

Do you mean something like this? (I'll send a proper patch when 
everything is a little clearer to me :-)

diff --git a/block.c b/block.c
index a1f3cecd75..74b02b32dc 100644
--- a/block.c
+++ b/block.c
@@ -671,13 +671,33 @@ out:
  int bdrv_create_file(const char *filename, QemuOpts *opts, Error **errp)
  {
      BlockDriver *drv;
+    QemuOpts *new_opts;
+    QDict *qdict;
+    int ret;

      drv = bdrv_find_protocol(filename, true, errp);
      if (drv == NULL) {
          return -ENOENT;
      }

-    return bdrv_create(drv, filename, opts, errp);
+    if (!drv->create_opts) {
+        error_setg(errp, "Driver '%s' does not support image creation",
+                   drv->format_name);
+        return -ENOTSUP;
+    }
+
+    qdict = qemu_opts_to_qdict(opts, NULL);
+    new_opts = qemu_opts_from_qdict(drv->create_opts, qdict, errp);
+    if (new_opts == NULL) {
+        ret = -EINVAL;
+        goto out;
+    }
+
+    ret = bdrv_create(drv, filename, new_opts, errp);
+out:
+    qemu_opts_del(new_opts);
+    qobject_unref(qdict);
+    return ret;
  }

>
>> > > Using '-o cluster_size=2M' with qemu-img changed only the qcow2 cluster
>> > > size, since in qcow2_co_create_opts() we remove the 'cluster_size' from
>> > > QemuOpts calling qemu_opts_to_qdict_filtered().
>> > > For some reason that I have yet to understand, after this deletion, however
>> > > remains in QemuOpts the default value of 'cluster_size' for qcow2 (64 KiB),
>> > > that it's used in qemu_rbd_co_create_opts()
>> >
>> > So it seems you came to a similar conclusion. We need to find out where
>> > the 64k come from and just fix that so that rbd uses its default.
>>
>> Yes, I tried debugging above, but I'm not sure how to fix it.
>>
>> Maybe a new parameter in qemu_opt_get_size_helper() to prevent it from
>> looking for the default value.
>> Or we should prevent the default value from being added to the
>> opts->list->desc, but that part is still not very clear to me.
>
>opts->list is already wrong, I think this is what we need to fix.
>
>> > > At this point my doubts are:
>> > > Does it make sense to use the same cluster_size as qcow2 as object_size in
>> > > RBD?
>> > > If we want to keep the 2 options separated, how can it be done?  
>> > > Should we
>> > > rename the option in block/rbd.c?
>> >
>> > My lazy answer is that you could just use QMP blockdev-create, where you
>> > create layer by layer separately.
>> >
>> > What could possibly be done for the QemuOpts is using the dotted syntax
>> > like for opening, so you could specify file.cluster_size=... for the
>> > protocol layer (or data_file.cluster_size=... for the external data
>> > file etc.)
>>
>> This would be cool :-)
>
>I'm almost sure that compatibility will make this more complicated than
>it sounds, but we could have a try.

Eheh I see your point.

Thanks,
Stefano



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: QEMU RBD is slow with QCOW2 images
  2021-03-04 17:32       ` Stefano Garzarella
@ 2021-03-05  9:16         ` Kevin Wolf
  2021-03-05  9:44           ` Stefano Garzarella
  0 siblings, 1 reply; 14+ messages in thread
From: Kevin Wolf @ 2021-03-05  9:16 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Peter Lieven, Jason Dillaman, qemu-devel, qemu-block, armbru

Am 04.03.2021 um 18:32 hat Stefano Garzarella geschrieben:
> On Thu, Mar 04, 2021 at 03:59:17PM +0100, Kevin Wolf wrote:
> > Am 04.03.2021 um 15:08 hat Stefano Garzarella geschrieben:
> > > On Thu, Mar 04, 2021 at 01:05:02PM +0100, Kevin Wolf wrote:
> > > > Am 03.03.2021 um 18:40 hat Stefano Garzarella geschrieben:
> > > > > Hi Jason,
> > > > > as reported in this BZ [1], when qemu-img creates a QCOW2 image on RBD
> > > > > writing data is very slow compared to a raw file.
> > > > >
> > > > > Comparing raw vs QCOW2 image creation with RBD I found that we use a
> > > > > different object size, for the raw file I see '4 MiB objects', for
> > > > > QCOW2 I
> > > > > see '64 KiB objects' as reported on comment 14 [2].
> > > > > This should be the main issue of slowness, indeed forcing in the code 4 MiB
> > > > > object size also for QCOW2 increased the speed a lot.
> > > > >
> > > > > Looking better I discovered that for raw files, we call rbd_create() with
> > > > > obj_order = 0 (if 'cluster_size' options is not defined), so the default
> > > > > object size is used.
> > > > > Instead for QCOW2, we use obj_order = 16, since the default 'cluster_size'
> > > > > defined for QCOW2, is 64 KiB.
> > > >
> > > > Hm, the QemuOpts-based image creation is messy, but why does the rbd
> > > > driver even see the cluster_size option?
> > > >
> > > > The first thing qcow2_co_create_opts() does is splitting the passed
> > > > QemuOpts into options it will process on the qcow2 layer and options
> > > > that are passed to the protocol layer. So if you pass a cluster_size
> > > > option, qcow2 should take it for itself and not pass it to rbd.
> > > >
> > > > If it is passed to rbd, I think that's a bug in the qcow2 driver.
> > > 
> > > IIUC qcow2 properyl remove it, but when rbd uses qemu_opt_get_size_del(opts,
> > > BLOCK_OPT_CLUSTER_SIZE, 0) the default value of qcow2 format is returned.
> > > 
> > > Going in depth in qemu_opt_get_size_helper(), I found that qemu_opt_find()
> > > properly returns a NULL pointer, but then we call find_default_by_name()
> > > that returns the default value of qcow2 format (64k).
> > 
> > Ugh, I see why. We're passing the protocol driver a QemuOpts that was
> > created for a QemuOptsList with the qcow2 default, not for its own
> > QemuOptsList. This is wrong.
> > 
> > Note that the QemuOptsList is not qcow2_create_opts itself, but a list
> > that is created with qemu_opts_append() to combine qcow2 and rbd options
> > into a new QemuOptsList. For overlapping options, the format wins.
> > 
> > I don't think you can change the QemuOptsList of an existing QemuOpts,
> > nor is there a clone operation that could just copy all options into a
> > new QemuOpts created for the rbd QemuOptsList, so maybe the easiest
> > hack^Wsolution would be converting to QDict and back...
> 
> Do you mean something like this? (I'll send a proper patch when everything
> is a little clearer to me :-)
> 
> diff --git a/block.c b/block.c
> index a1f3cecd75..74b02b32dc 100644
> --- a/block.c
> +++ b/block.c
> @@ -671,13 +671,33 @@ out:
>  int bdrv_create_file(const char *filename, QemuOpts *opts, Error **errp)
>  {
>      BlockDriver *drv;
> +    QemuOpts *new_opts;
> +    QDict *qdict;
> +    int ret;
> 
>      drv = bdrv_find_protocol(filename, true, errp);
>      if (drv == NULL) {
>          return -ENOENT;
>      }
> 
> -    return bdrv_create(drv, filename, opts, errp);
> +    if (!drv->create_opts) {
> +        error_setg(errp, "Driver '%s' does not support image creation",
> +                   drv->format_name);
> +        return -ENOTSUP;
> +    }
> +
> +    qdict = qemu_opts_to_qdict(opts, NULL);
> +    new_opts = qemu_opts_from_qdict(drv->create_opts, qdict, errp);
> +    if (new_opts == NULL) {
> +        ret = -EINVAL;
> +        goto out;
> +    }
> +
> +    ret = bdrv_create(drv, filename, new_opts, errp);
> +out:
> +    qemu_opts_del(new_opts);
> +    qobject_unref(qdict);
> +    return ret;
>  }

Something like this, yes. Does it work for you?

Of course, in the real patch it could use a comment why we're doing
these seemingly redundant conversions.

Kevin



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: QEMU RBD is slow with QCOW2 images
  2021-03-05  9:16         ` Kevin Wolf
@ 2021-03-05  9:44           ` Stefano Garzarella
  0 siblings, 0 replies; 14+ messages in thread
From: Stefano Garzarella @ 2021-03-05  9:44 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Peter Lieven, Jason Dillaman, qemu-devel, qemu-block, armbru

On Fri, Mar 05, 2021 at 10:16:41AM +0100, Kevin Wolf wrote:
>Am 04.03.2021 um 18:32 hat Stefano Garzarella geschrieben:
>> On Thu, Mar 04, 2021 at 03:59:17PM +0100, Kevin Wolf wrote:
>> > Am 04.03.2021 um 15:08 hat Stefano Garzarella geschrieben:
>> > > On Thu, Mar 04, 2021 at 01:05:02PM +0100, Kevin Wolf wrote:
>> > > > Am 03.03.2021 um 18:40 hat Stefano Garzarella geschrieben:
>> > > > > Hi Jason,
>> > > > > as reported in this BZ [1], when qemu-img creates a QCOW2 image on RBD
>> > > > > writing data is very slow compared to a raw file.
>> > > > >
>> > > > > Comparing raw vs QCOW2 image creation with RBD I found that we use a
>> > > > > different object size, for the raw file I see '4 MiB objects', for
>> > > > > QCOW2 I
>> > > > > see '64 KiB objects' as reported on comment 14 [2].
>> > > > > This should be the main issue of slowness, indeed forcing in the code 4 MiB
>> > > > > object size also for QCOW2 increased the speed a lot.
>> > > > >
>> > > > > Looking better I discovered that for raw files, we call rbd_create() with
>> > > > > obj_order = 0 (if 'cluster_size' options is not defined), so the default
>> > > > > object size is used.
>> > > > > Instead for QCOW2, we use obj_order = 16, since the default 'cluster_size'
>> > > > > defined for QCOW2, is 64 KiB.
>> > > >
>> > > > Hm, the QemuOpts-based image creation is messy, but why does the rbd
>> > > > driver even see the cluster_size option?
>> > > >
>> > > > The first thing qcow2_co_create_opts() does is splitting the passed
>> > > > QemuOpts into options it will process on the qcow2 layer and options
>> > > > that are passed to the protocol layer. So if you pass a cluster_size
>> > > > option, qcow2 should take it for itself and not pass it to rbd.
>> > > >
>> > > > If it is passed to rbd, I think that's a bug in the qcow2 driver.
>> > >
>> > > IIUC qcow2 properyl remove it, but when rbd uses qemu_opt_get_size_del(opts,
>> > > BLOCK_OPT_CLUSTER_SIZE, 0) the default value of qcow2 format is returned.
>> > >
>> > > Going in depth in qemu_opt_get_size_helper(), I found that qemu_opt_find()
>> > > properly returns a NULL pointer, but then we call find_default_by_name()
>> > > that returns the default value of qcow2 format (64k).
>> >
>> > Ugh, I see why. We're passing the protocol driver a QemuOpts that was
>> > created for a QemuOptsList with the qcow2 default, not for its own
>> > QemuOptsList. This is wrong.
>> >
>> > Note that the QemuOptsList is not qcow2_create_opts itself, but a list
>> > that is created with qemu_opts_append() to combine qcow2 and rbd options
>> > into a new QemuOptsList. For overlapping options, the format wins.
>> >
>> > I don't think you can change the QemuOptsList of an existing QemuOpts,
>> > nor is there a clone operation that could just copy all options into a
>> > new QemuOpts created for the rbd QemuOptsList, so maybe the easiest
>> > hack^Wsolution would be converting to QDict and back...
>>
>> Do you mean something like this? (I'll send a proper patch when everything
>> is a little clearer to me :-)
>>
>> diff --git a/block.c b/block.c
>> index a1f3cecd75..74b02b32dc 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -671,13 +671,33 @@ out:
>>  int bdrv_create_file(const char *filename, QemuOpts *opts, Error **errp)
>>  {
>>      BlockDriver *drv;
>> +    QemuOpts *new_opts;
>> +    QDict *qdict;
>> +    int ret;
>>
>>      drv = bdrv_find_protocol(filename, true, errp);
>>      if (drv == NULL) {
>>          return -ENOENT;
>>      }
>>
>> -    return bdrv_create(drv, filename, opts, errp);
>> +    if (!drv->create_opts) {
>> +        error_setg(errp, "Driver '%s' does not support image creation",
>> +                   drv->format_name);
>> +        return -ENOTSUP;
>> +    }
>> +
>> +    qdict = qemu_opts_to_qdict(opts, NULL);
>> +    new_opts = qemu_opts_from_qdict(drv->create_opts, qdict, errp);
>> +    if (new_opts == NULL) {
>> +        ret = -EINVAL;
>> +        goto out;
>> +    }
>> +
>> +    ret = bdrv_create(drv, filename, new_opts, errp);
>> +out:
>> +    qemu_opts_del(new_opts);
>> +    qobject_unref(qdict);
>> +    return ret;
>>  }
>
>Something like this, yes. Does it work for you?

Yes, I did some testing and only the protocol parameters are passed and 
its defaults.

Thanks for the suggestion!

>
>Of course, in the real patch it could use a comment why we're doing
>these seemingly redundant conversions.

Sure, I'm running some tests to make sure I haven't broken anything. :-)

Thanks,
Stefano



^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-03-05  9:47 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-03 17:40 QEMU RBD is slow with QCOW2 images Stefano Garzarella
2021-03-03 18:47 ` Jason Dillaman
2021-03-03 21:26   ` Peter Lieven
2021-03-04  8:58     ` Stefano Garzarella
2021-03-04  8:55   ` Stefano Garzarella
2021-03-04 10:25     ` Daniel P. Berrangé
2021-03-04 11:12       ` Stefano Garzarella
2021-03-04 11:15         ` Daniel P. Berrangé
2021-03-04 12:05 ` Kevin Wolf
2021-03-04 14:08   ` Stefano Garzarella
2021-03-04 14:59     ` Kevin Wolf
2021-03-04 17:32       ` Stefano Garzarella
2021-03-05  9:16         ` Kevin Wolf
2021-03-05  9:44           ` Stefano Garzarella

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.