From: "Daniel P. Berrangé" <berrange@redhat.com>
To: Stefano Garzarella <sgarzare@redhat.com>
Cc: Peter Lieven <pl@kamp.de>,
	dillaman@redhat.com, qemu-devel <qemu-devel@nongnu.org>,
	qemu-block <qemu-block@nongnu.org>
Subject: Re: QEMU RBD is slow with QCOW2 images
Date: Thu, 4 Mar 2021 11:15:11 +0000
Message-ID: <YEDBP86Y7OxiApwX@redhat.com>
In-Reply-To: <20210304111251.2ernxss627lllwqa@steredhat>

On Thu, Mar 04, 2021 at 12:12:51PM +0100, Stefano Garzarella wrote:
> On Thu, Mar 04, 2021 at 10:25:33AM +0000, Daniel P. Berrangé wrote:
> > On Thu, Mar 04, 2021 at 09:55:40AM +0100, Stefano Garzarella wrote:
> > > On Wed, Mar 03, 2021 at 01:47:06PM -0500, Jason Dillaman wrote:
> > > > On Wed, Mar 3, 2021 at 12:41 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > > > >
> > > > > Hi Jason,
> > > > > as reported in this BZ [1], when qemu-img creates a QCOW2 image on RBD
> > > > > writing data is very slow compared to a raw file.
> > > > >
> > > > > Comparing raw vs QCOW2 image creation with RBD I found that we use a
> > > > > different object size, for the raw file I see '4 MiB objects', for QCOW2
> > > > > I see '64 KiB objects' as reported on comment 14 [2].
> > > > > This is probably the main cause of the slowness: forcing a 4 MiB
> > > > > object size in the code for QCOW2 as well increased the speed a lot.
> > > > >
> > > > > Looking more closely, I found that for raw files we call rbd_create()
> > > > > with obj_order = 0 (if the 'cluster_size' option is not defined), so the
> > > > > default object size is used.
> > > > > For QCOW2, instead, we use obj_order = 16, since the default
> > > > > 'cluster_size' defined for QCOW2 is 64 KiB.
> > > > >
> > > > > Using '-o cluster_size=2M' with qemu-img changed only the qcow2 cluster
> > > > > size, since in qcow2_co_create_opts() we remove 'cluster_size' from
> > > > > QemuOpts by calling qemu_opts_to_qdict_filtered().
> > > > > For some reason that I have yet to understand, after this removal the
> > > > > default value of 'cluster_size' for qcow2 (64 KiB) remains in QemuOpts,
> > > > > and it is used in qemu_rbd_co_create_opts().
> > > > >
> > > > > At this point my doubts are:
> > > > > Does it make sense to use the same cluster_size as qcow2 as object_size
> > > > > in RBD?
> > > >
> > > > No, not really. But it also doesn't really make any sense to put a
> > > > QCOW2 image within an RBD image. To clarify from the BZ, OpenStack
> > > > does not put QCOW2 images on RBD, it converts QCOW2 images into raw
> > > > images to store in RBD.
> > > 
> > > Yes, that was my doubt, thanks for the confirmation.
> > > 
> > > Daniel (+CC) also confirmed the same thing to me but, for completeness, he
> > > added that there is a case where OpenStack could use qcow2 on RBD; that
> > > case uses in-kernel RBD, so the QEMU RBD driver is not involved.
> > > 
> > > >
> > > > > If we want to keep the 2 options separated, how can it be done? Should
> > > > > we rename the option in block/rbd.c?
> > > >
> > > > You can already pass overrides to the RBD block driver by just
> > > > appending them after the
> > > > "rbd:<filename>[:option1=value1[:option2=value2]]" portion, perhaps
> > > > that could be re-used.
> > > 
> > > I see, we should extend qemu_rbd_parse_filename() to support it.
> > 
> > We shouldn't really be extending the legacy filename syntax.
> > If we need extra options we want them in the QAPI schema for
> > blockdev.
> 
> Got it.
> 
> I'm still a bit confused about how QemuOpts are handled between format and
> protocol drivers.
> 
> It seems that in this case the protocol tries to access some information
> from the format (BLOCK_OPT_CLUSTER_SIZE).
> 
> Since the format removes this information from the QemuOpts passed to the
> protocol, this takes the default value of the format, even if a different
> value is specified.
> 
> Is it correct for a protocol to access BLOCK_OPT_CLUSTER_SIZE?

In a -blockdev world, the caller would be expected to set the values
explicitly at all layers that need it.
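
As an illustration only, here is a minimal Python sketch of what "explicit at
every layer" could look like when image creation is driven over QMP. It assumes
the blockdev-create command with an optional cluster-size in both the rbd and
qcow2 creation options, which should be checked against the QAPI schema of the
QEMU version in use. The job ids, the "rbd-node" node name and the helper names
are hypothetical, and the sizes are illustrative. The helper rbd_obj_order()
mirrors the arithmetic quoted above (64 KiB corresponds to obj_order = 16, and
0 means the RBD default).

```python
import json

KiB = 1024
MiB = 1024 * KiB


def rbd_obj_order(object_size):
    """Return log2 of a power-of-two object size; 0 means 'use the RBD default'."""
    if object_size == 0:
        return 0
    if object_size < 0 or object_size & (object_size - 1):
        raise ValueError("object size must be a power of two")
    return object_size.bit_length() - 1


def blockdev_create_cmds(pool, image, size, rbd_object_size, qcow2_cluster_size):
    """Build one blockdev-create command per layer, each with its own cluster size."""
    protocol = {
        "execute": "blockdev-create",
        "arguments": {
            "job-id": "create-rbd",  # hypothetical job id
            "options": {
                "driver": "rbd",
                "location": {"pool": pool, "image": image},
                "size": size,
                "cluster-size": rbd_object_size,  # RBD object size
            },
        },
    }
    fmt = {
        "execute": "blockdev-create",
        "arguments": {
            "job-id": "create-qcow2",  # hypothetical job id
            "options": {
                "driver": "qcow2",
                # hypothetical node name; the RBD image would first have to be
                # opened under this name with blockdev-add
                "file": "rbd-node",
                "size": size,
                "cluster-size": qcow2_cluster_size,  # qcow2 cluster size
            },
        },
    }
    return protocol, fmt


if __name__ == "__main__":
    proto_cmd, fmt_cmd = blockdev_create_cmds(
        "rbd", "test.qcow2", 1024 * MiB, 4 * MiB, 64 * KiB
    )
    print(json.dumps(proto_cmd, indent=2))
    print(json.dumps(fmt_cmd, indent=2))
    print("RBD obj_order:", rbd_obj_order(4 * MiB))
```

With the two values carried separately, the protocol layer never has to guess
from the format's cluster_size default.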

You're talking about a scenario that is non-blockdev though, and
I'm not sure what the right answer is here. Will need Kevin/Max
to answer that one.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



