Re: QEMU RBD is slow with QCOW2 images

From: "Daniel P. Berrangé" <berrange@redhat.com>
To: Stefano Garzarella <sgarzare@redhat.com>
Cc: Peter Lieven <pl@kamp.de>,
	dillaman@redhat.com, qemu-devel <qemu-devel@nongnu.org>,
	qemu-block <qemu-block@nongnu.org>
Subject: Re: QEMU RBD is slow with QCOW2 images
Date: Thu, 4 Mar 2021 11:15:11 +0000
Message-ID: <YEDBP86Y7OxiApwX@redhat.com>
In-Reply-To: <20210304111251.2ernxss627lllwqa@steredhat>

On Thu, Mar 04, 2021 at 12:12:51PM +0100, Stefano Garzarella wrote:
> On Thu, Mar 04, 2021 at 10:25:33AM +0000, Daniel P. Berrangé wrote:
> > On Thu, Mar 04, 2021 at 09:55:40AM +0100, Stefano Garzarella wrote:
> > > On Wed, Mar 03, 2021 at 01:47:06PM -0500, Jason Dillaman wrote:
> > > > On Wed, Mar 3, 2021 at 12:41 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > > > >
> > > > > Hi Jason,
> > > > > as reported in this BZ [1], when qemu-img creates a QCOW2 image on RBD
> > > > > writing data is very slow compared to a raw file.
> > > > >
> > > > > Comparing raw vs QCOW2 image creation with RBD I found that we use a
> > > > > different object size, for the raw file I see '4 MiB objects', for QCOW2
> > > > > I see '64 KiB objects' as reported on comment 14 [2].
> > > > > This is likely the main cause of the slowness: forcing a 4 MiB object
> > > > > size in the code for QCOW2 as well increased the speed a lot.
> > > > >
> > > > > Looking more closely, I discovered that for raw files we call
> > > > > rbd_create() with obj_order = 0 (if the 'cluster_size' option is not
> > > > > defined), so the default object size is used.
> > > > > For QCOW2, instead, we use obj_order = 16, since the default
> > > > > 'cluster_size' defined for QCOW2 is 64 KiB.
> > > > >
> > > > > Using '-o cluster_size=2M' with qemu-img changed only the qcow2 cluster
> > > > > size, since in qcow2_co_create_opts() we remove 'cluster_size' from
> > > > > QemuOpts by calling qemu_opts_to_qdict_filtered().
> > > > > For some reason that I have yet to understand, after this removal the
> > > > > default value of 'cluster_size' for qcow2 (64 KiB) still remains in
> > > > > QemuOpts, and it is used in qemu_rbd_co_create_opts().
> > > > >
> > > > > At this point my questions are:
> > > > > Does it make sense to use the qcow2 cluster_size as the object size
> > > > > in RBD?
> > > >
> > > > No, not really. But it also doesn't really make any sense to put a
> > > > QCOW2 image within an RBD image. To clarify from the BZ, OpenStack
> > > > does not put QCOW2 images on RBD, it converts QCOW2 images into raw
> > > > images to store in RBD.
> > > 
> > > Yes, that was my doubt, thanks for the confirmation.
> > > 
> > > Daniel (+CC) also confirmed the same thing to me; for completeness, he
> > > added that there is a case where OpenStack could use qcow2 on RBD, but
> > > that case uses in-kernel RBD, so the QEMU RBD driver is not involved.
> > > 
> > > >
> > > > > If we want to keep the 2 options separated, how can it be done? Should
> > > > > we rename the option in block/rbd.c?
> > > >
> > > > You can already pass overrides to the RBD block driver by just
> > > > appending them after the
> > > > "rbd:<filename>[:option1=value1[:option2=value2]]" portion, perhaps
> > > > that could be re-used.
> > > 
> > > I see, we should extend qemu_rbd_parse_filename() to support it.
> > 
> > We shouldn't really be extending the legacy filename syntax.
> > If we need extra options we want them in the QAPI schema for
> > blockdev.
> 
> Got it.
> 
> I'm still a bit confused about how QemuOpts are handled between format and
> protocol drivers.
> 
> It seems that in this case the protocol tries to access some information
> from the format (BLOCK_OPT_CLUSTER_SIZE).
> 
> Since the format removes this information from the QemuOpts passed to the
> protocol, this takes the default value of the format, even if a different
> value is specified.
> 
> Is it correct for a protocol to access BLOCK_OPT_CLUSTER_SIZE?

In a -blockdev world, the caller would be expected to set the values
explicitly at all layers that need them.

You're talking about a scenario that is non-blockdev though, and
I'm not sure what the right answer is here. Will need Kevin/Max
to answer that one.
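
As a side note on the numbers quoted above: obj_order is simply log2 of
the object size in bytes, so a 64 KiB cluster_size gives 16 and a 4 MiB
object gives 22. Below is a minimal Python sketch of that arithmetic. It
is not the actual block/rbd.c code; the helper names are made up for
illustration, and the 4 MiB default is just the value observed earlier in
this thread.

```python
KiB = 1024
MiB = 1024 * KiB


def rbd_obj_order(object_size: int) -> int:
    """Return log2(object_size), i.e. the 'order' for a given object size.

    Hypothetical helper: it only illustrates the size -> order mapping.
    """
    if object_size <= 0 or object_size & (object_size - 1):
        raise ValueError("object size must be a power of two")
    return object_size.bit_length() - 1


def object_size_for(order: int, default: int = 4 * MiB) -> int:
    """Map an order back to an object size.

    Order 0 means "use the default"; 4 MiB is assumed here because that is
    what the raw image showed in the thread.
    """
    return default if order == 0 else 1 << order


if __name__ == "__main__":
    for label, size in (("qcow2 default cluster_size", 64 * KiB),
                        ("raw image object size", 4 * MiB)):
        print(f"{label}: {size} bytes -> order {rbd_obj_order(size)}")
```

With the format's 64 KiB default leaking into the protocol layer, the image
ends up with 64 times more objects than the 4 MiB default would give for
the same amount of data.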

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|