* [Qemu-devel] I/O parallelism on QCOW2
From: Xingbo Wu @ 2014-09-04 16:32 UTC
  To: qemu-devel

Hello guys,
  After running a 16-thread sync-random-write test against qcow2, I observed
that qcow2 appears to serialize all of its metadata-related writes.
If qcow2 is designed to do this, then what is the concern? What would go
wrong if this ordering were relaxed?
By providing fewer features, raw files and QED scale well under parallel I/O
workloads. I assume qcow2 does this for good reasons. Thanks!

Here the qcow2 image is exported as /dev/nbd0 via qemu-nbd. The underlying
device is a spinning disk using the cfq scheduler on Linux 3.16.1.
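
For reference, the export/trace setup is along these lines (the image path
is a placeholder):

  qemu-nbd --connect=/dev/nbd0 /path/to/test.qcow2
  blktrace -d /dev/sdX -o - | blkparse -i -

where sdX is the underlying disk (device 8,32 in the traces below).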

Two excerpts from the blktrace output follow.
(The annotations are my own guesses; please correct me if I have
misunderstood the behavior.)
As seen in the traces below, the requests are issued by several different
threads. On top of /dev/nbd0 the writes are completely unordered, but after
passing through qemu-nbd and the qcow2 image file they reach the disk mostly
serialized.
(Legend: Q/G/I/D/C are the blkparse queue, get-request, insert, issue, and
complete events; WS marks a synchronous write and FWS a flush.)

====QCOW2
  8,32   7     4054    30.087620855 21061  D  WS 4096 + 128 [qemu-nbd]    ----metadata?
  8,32   7     4055    30.087626023 21061  D  WS 363008 + 624 [qemu-nbd]    ----data? It gets issued in parallel by chance.
  8,32   7     4056    30.087992341     0  C  WS 4096 + 128 [0]
  8,32   7     4057    30.089205833     0  C  WS 363008 + 624 [0]
  8,32   7     4058    30.089264151 21061  Q FWS [qemu-nbd]
  8,32   7     4059    30.089265478 21061  G FWS [qemu-nbd]
  8,32   7     4060    30.089266386 21061  I FWS [qemu-nbd]    ----Flush (Q -> G -> I -> C)
  8,32   7     4061    30.102978117     0  C  WS 0 [0]
  8,32   4     4930    30.103082669 21058  D  WS 363632 + 16 [qemu-nbd]    ----In very rare cases two writes to 6-digit sector numbers are issued in parallel; not this one!
  8,32   0     4655    30.103243164     0  C  WS 363632 + 16 [0]
  8,32   4     4931    30.103261463 21058  Q FWS [qemu-nbd]
  8,32   4     4932    30.103263349 21058  G FWS [qemu-nbd]
  8,32   4     4933    30.103264326 21058  I FWS [qemu-nbd]
  8,32   2     3772    30.103266142 21010  Q FWS [qemu-nbd]
  8,32   2     3773    30.103268936 21010  G FWS [qemu-nbd]
  8,32   2     3774    30.103270612 21010  I FWS [qemu-nbd]
  8,32   3     3717    30.111390919     0  C  WS 0 [0]
  8,32   4     4934    30.129806741     0  C  WS 0 [0]
  8,32   6     4407    30.129880842 21062  Q FWS [qemu-nbd]
  8,32   6     4408    30.129882728 21062  G FWS [qemu-nbd]
  8,32   6     4409    30.129884125 21062  I FWS [qemu-nbd]
  8,32   5     4807    30.130019058     0  C  WS 0 [0]
  8,32   5     4808    30.130033376     0  D  WS 1280 + 128 [swapper/0]    ----This one looks like a metadata write.
  8,32   3     3718    30.130417014 20895  C  WS 1280 + 128 [0]
  8,32   7     4062    30.130442436 20925  Q FWS [qemu-nbd]
  8,32   7     4063    30.130450258 20925  G FWS [qemu-nbd]
  8,32   7     4064    30.130451166 20925  I FWS [qemu-nbd]
  8,32   6     4410    30.133539827     0  C  WS 0 [0]
  8,32   4     4935    30.133609250 20892  Q FWS [qemu-nbd]
  8,32   4     4936    30.133625662 20892  G FWS [qemu-nbd]
  8,32   4     4937    30.133626710 20892  I FWS [qemu-nbd]
  8,32   7     4065    30.133758570     0  C  WS 0 [0]
  8,32   6     4411    30.133773516 21008  D  WS 2048 + 128 [qemu-nbd]
  8,32   6     4412    30.134165396     0  C  WS 2048 + 128 [0]
  8,32   6     4413    30.134191167 21008  Q FWS [qemu-nbd]
  8,32   6     4414    30.134192285 21008  G FWS [qemu-nbd]
  8,32   6     4415    30.134193193 21008  I FWS [qemu-nbd]
  8,32   4     4938    30.136255117     0  C  WS 0 [0]
  8,32   1     4780    30.136316368 21057  Q FWS [qemu-nbd]
  8,32   1     4781    30.136318743 21057  G FWS [qemu-nbd]
  8,32   1     4782    30.136320069 21057  I FWS [qemu-nbd]
  8,32   5     4809    30.136467435 20891  C  WS 0 [0]
====

On the raw partition things happen as I expected: the writes are issued in
parallel.

==== raw partition
  8,32   0      269     5.998464860 21154  D  WS 335548672 + 128 [fio]
  8,32   3      391     5.998474708 21146  D  WS 67113216 + 128 [fio]
  8,32   7      243     5.998483857 21159  D  WS 503320832 + 128 [fio]
  8,32   5      506     5.998494264 21149  D  WS 167776512 + 128 [fio]
  8,32   2      339     5.998509489 21156  D  WS 402657536 + 128 [fio]
  8,32   6      879     5.998522968 21158  D  WS 469766400 + 128 [fio]
  8,32   1      497     5.998537286 21151  D  WS 234885376 + 128 [fio]
  8,32   5      507     5.998553908 21144  D  WS 4352 + 128 [fio]
  8,32   2      340     5.998562568 21155  D  WS 369103104 + 128 [fio]
  8,32   6      880     5.998571159 21150  D  WS 201330944 + 128 [fio]
  8,32   5      508     5.998591064 21147  D  WS 100667648 + 128 [fio]
  8,32   2      341     5.998603635 21152  D  WS 268439808 + 128 [fio]
  8,32   6      881     5.998610410 21153  D  WS 301994240 + 128 [fio]
  8,32   6      882     5.998640860 21157  D  WS 436211968 + 128 [fio]
  8,32   2      342     5.998650429 21148  D  WS 134222080 + 128 [fio]
  8,32   7      244     5.998825870     0  C  WS 33558784 + 128 [0]
  8,32   7      245     5.998848638 21145  Q FWS [fio]
  8,32   7      246     5.998850175 21145  G FWS [fio]
  8,32   7      247     5.998851153 21145  I FWS [fio]
  8,32   0      270     5.999112918     0  C  WS 335548672 + 128 [0]
  8,32   0      271     5.999142600 21154  Q FWS [fio]
  8,32   0      272     5.999144137 21154  G FWS [fio]
  8,32   0      273     5.999145045 21154  I FWS [fio]
  8,32   3      392     5.999388302     0  C  WS 67113216 + 128 [0]
....

-- 

Cheers!
       吴兴博  Wu, Xingbo <wuxb45@gmail.com>


* Re: [Qemu-devel] I/O parallelism on QCOW2
From: Stefan Hajnoczi @ 2014-09-05 10:02 UTC
  To: Xingbo Wu; +Cc: qemu-devel

On Thu, Sep 04, 2014 at 12:32:12PM -0400, Xingbo Wu wrote:
>   After running a 16-thread sync-random-write test against qcow2, I
> observed that qcow2 appears to serialize all of its metadata-related writes.
> If qcow2 is designed to do this, then what is the concern? What would go
> wrong if this ordering were relaxed?

How do you know that serializing part of the write request is a
significant bottleneck?

Please post your benchmark results with raw, qed, and qcow2 handling 1-,
8-, and 16-threads of I/O (or whatever similar benchmarks you have run).

The bottleneck may actually be something else, so please share your
benchmark configuration and results.

> By providing fewer features, raw files and QED scale well under parallel I/O
> workloads. I assume qcow2 does this for good reasons. Thanks!

QED serializes allocating writes, see qed_aio_write_alloc().

In qcow2 the BDRVQcowState->lock is held across metadata updates.  The
important pieces here are:
 * qcow2_co_writev() only releases the lock around data writes
   (including COW).
 * qcow2_co_flush_to_os() holds the lock around metadata updates
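
Schematically (heavily simplified, not the literal qcow2 code; the helpers
other than qemu_co_mutex_lock/unlock are placeholders):

static coroutine_fn int qcow2_co_writev(BlockDriverState *bs, ...)
{
    BDRVQcowState *s = bs->opaque;

    qemu_co_mutex_lock(&s->lock);
    while (more_sectors_to_write()) {
        /* Cluster lookup/allocation and the L2/refcount table updates
         * happen under the lock, so concurrent allocating writers
         * serialize here. */
        lookup_or_allocate_clusters(s);

        /* Only the guest data write (and any COW copy) runs with the
         * lock dropped. */
        qemu_co_mutex_unlock(&s->lock);
        write_data_to_image_file(bs);
        qemu_co_mutex_lock(&s->lock);
    }
    qemu_co_mutex_unlock(&s->lock);

    return 0;
}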

Stefan


* Re: [Qemu-devel] I/O parallelism on QCOW2
From: Xingbo Wu @ 2014-09-05 16:45 UTC
  To: Stefan Hajnoczi; +Cc: qemu-devel

On Fri, Sep 5, 2014 at 6:02 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:

> On Thu, Sep 04, 2014 at 12:32:12PM -0400, Xingbo Wu wrote:
> >   After running a 16-thread sync-random-write test against qcow2, I
> > observed that qcow2 appears to serialize all of its metadata-related
> > writes.
> > If qcow2 is designed to do this, then what is the concern? What would
> > go wrong if this ordering were relaxed?
>
> How do you know that serializing part of the write request is a
> significant bottleneck?
>
> Please post your benchmark results with raw, qed, and qcow2 handling 1-,
> 8-, and 16-threads of I/O (or whatever similar benchmarks you have run).
>
> The bottleneck may actually be something else, so please share your
> benchmark configuration and results.
>
>
Here is the fio job file:
----j1.fio:
[j1]
direct=1
ioengine=psync
thread
fdatasync=1
runtime=300
numjobs=$NJ
# filename=/dev/sd? for raw disk
filename=/dev/nbd0
rw=write
bs=64k
offset_increment=1G
----EOF

The qcow2 image is created on the raw disk with -o lazy_refcounts.
The raw disk is a D3200AAJS.

As you can see, the test measures the performance of synchronous writes,
so in practice the result does not mean qcow2 has this bottleneck under a
real workload.


> > By providing fewer features, raw files and QED scale well under parallel
> > I/O workloads. I assume qcow2 does this for good reasons. Thanks!
>
> QED serializes allocating writes, see qed_aio_write_alloc().
>
> In qcow2 the BDRVQcowState->lock is held across metadata updates.  The
> important pieces here are:
>  * qcow2_co_writev() only releases the lock around data writes
>    (including COW).
>

Thanks. This is what I wanted to confirm.
So the lock is held during the metadata write and the related I/O activity?
That would explain why I saw serialized metadata updates in the trace.
Could the lock be released during metadata I/O?

>  * qcow2_co_flush_to_os() holds the lock around metadata updates
>
flush_to_os moves the data down to the image file, but does not necessarily
flush it to disk.

This function should usually return quickly with no actual disk I/O. The
later calls that flush the image file are what incur the FLUSH to disk. Is
that correct?
If so, the locking here does not matter.
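
My (possibly wrong) reading of the generic flush path in block.c is roughly
the following (paraphrased, not the literal code; error handling and the
checks for which callbacks a driver actually provides are omitted):

int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
{
    /* 1. Let the format driver push its dirty metadata into the image
     *    file; for qcow2 this is qcow2_co_flush_to_os(), which takes
     *    s->lock while writing out the L2/refcount cache. */
    bs->drv->bdrv_co_flush_to_os(bs);

    /* 2. If the driver can flush to stable storage itself (protocol
     *    drivers such as raw-posix), issue the real fdatasync()/FLUSH. */
    bs->drv->bdrv_co_flush_to_disk(bs);

    /* 3. Recurse into bs->file, so for a qcow2 image the durable flush
     *    ends up being done by the file protocol driver underneath. */
    return bdrv_co_flush(bs->file);
}

If that reading is right, the lock taken in flush_to_os() only covers step 1,
not the actual flush to disk.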


> Stefan
>



-- 

Cheers!
       吴兴博  Wu, Xingbo <wuxb45@gmail.com>


* Re: [Qemu-devel] I/O parallelism on QCOW2
From: Stefan Hajnoczi @ 2014-09-08 10:09 UTC
  To: Xingbo Wu; +Cc: qemu-devel

On Fri, Sep 05, 2014 at 12:45:27PM -0400, Xingbo Wu wrote:
> On Fri, Sep 5, 2014 at 6:02 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> 
> > On Thu, Sep 04, 2014 at 12:32:12PM -0400, Xingbo Wu wrote:
> > >   After running a 16-thread sync-random-write test against qcow2, I
> > > observed that qcow2 appears to serialize all of its metadata-related
> > > writes.
> > > If qcow2 is designed to do this, then what is the concern? What would
> > > go wrong if this ordering were relaxed?
> >
> > How do you know that serializing part of the write request is a
> > significant bottleneck?
> >
> > Please post your benchmark results with raw, qed, and qcow2 handling 1-,
> > 8-, and 16-threads of I/O (or whatever similar benchmarks you have run).
> >
> > The bottleneck may actually be something else, so please share your
> > benchmark configuration and results.
> >
> >
> Here is the fio job file:
> ----j1.fio:
> [j1]
> direct=1
> ioengine=psync
> thread
> fdatasync=1
> runtime=300
> numjobs=$NJ
> # filename=/dev/sd? for raw disk
> filename=/dev/nbd0
> rw=write
> bs=64k
> offset_increment=1G
> ----EOF

Aha!

The job file has fdatasync=1, so fio will issue fdatasync(2) after every
I/O.  If the qcow2 file is freshly created, no clusters are allocated yet,
so every write is an allocating write; in other words, each I/O request
causes a flush_to_os() and writes out metadata.

So even numjobs=1 will be significantly slower with qcow2 than with raw!
Serializing write requests is not the main problem here; metadata updates
are.

You can check this by running dd if=/dev/zero of=/dev/... bs=1M; sync in
the guest before running the benchmark.  The qcow2 results should now be
very close to raw.
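
In this qemu-nbd setup that would be something along these lines (device and
image are as in the earlier mails; dd stops with ENOSPC once the device is
full, which is fine here):

  dd if=/dev/zero of=/dev/nbd0 bs=1M
  sync

and then re-run the fio job against /dev/nbd0.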

Stefan

