* [Qemu-devel] I/O parallelism on QCOW2
@ 2014-09-04 16:32 Xingbo Wu
2014-09-05 10:02 ` Stefan Hajnoczi
0 siblings, 1 reply; 4+ messages in thread
From: Xingbo Wu @ 2014-09-04 16:32 UTC (permalink / raw)
To: qemu-devel
Hello guys,
After running a 16-thread sync-random-write test against qcow2, I
observed that qcow2 seems to serialize all of its metadata-related writes.
If qcow2 is designed to do this, *then what is the concern?* What would
go wrong if this ordering were relaxed?
By providing fewer features, raw-file and QED scale well on this parallel
I/O workload. I believe qcow2 does this for clear reasons. Thanks!
Here the qcow2 image is attached to /dev/nbd0 via qemu-nbd. The underlying
device is a spinning disk with the cfq scheduler, on Linux 3.16.1.
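For reference, the image was exported roughly like this (the image path
is a placeholder):
    qemu-nbd -c /dev/nbd0 /path/to/test.qcow2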
Two excerpts from the trace follow.
(The comments reflect my estimation and guesswork; please correct me if I
have misunderstood the behavior.)
As seen in the trace below, the requests are issued by several different
threads. On top of nbd0 the writes are completely unordered, but after
passing through qemu-nbd and the qcow2 image file, we see mostly
serialized requests.
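(The traces were captured with blktrace on the underlying disk, roughly
`blktrace -d /dev/sdc -o - | blkparse -i -'; 8,32 is sdc, and the exact
invocation is from memory. In the blkparse output: Q = queued, G = get
request, I = inserted into the scheduler, D = dispatched to the driver,
C = completed. In the RWBS column, WS = synchronous write and
FWS = flush + synchronous write.)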
====QCOW2
8,32 7 4054 30.087620855 21061 *D WS 4096* + 128 [qemu-nbd]
----metadata?
8,32 7 4055 30.087626023 21061 *D WS 363008* + 624 [qemu-nbd]
----data? It gets issued in parallel by chance.
8,32 7 4056 30.087992341 0 C WS 4096 + 128 [0]
8,32 7 4057 30.089205833 0 C WS 363008 + 624 [0]
8,32 7 4058 30.089264151 21061 Q FWS [qemu-nbd]
8,32 7 4059 30.089265478 21061 G FWS [qemu-nbd]
8,32 7 4060 30.089266386 21061 I FWS [qemu-nbd] ----Flush
(Q -> G -> I -> C)
8,32 7 4061 30.102978117 0 C WS 0 [0]
8,32 4 4930 30.103082669 21058 *D WS 363632* + 16 [qemu-nbd]
----In very rare cases we can see two writes to 6-digit sector numbers
issued in parallel; not this one!
8,32 0 4655 30.103243164 0 C WS 363632 + 16 [0]
8,32 4 4931 30.103261463 21058 Q FWS [qemu-nbd]
8,32 4 4932 30.103263349 21058 G FWS [qemu-nbd]
8,32 4 4933 30.103264326 21058 I FWS [qemu-nbd]
8,32 2 3772 30.103266142 21010 Q FWS [qemu-nbd]
8,32 2 3773 30.103268936 21010 G FWS [qemu-nbd]
8,32 2 3774 30.103270612 21010 I FWS [qemu-nbd]
8,32 3 3717 30.111390919 0 C WS 0 [0]
8,32 4 4934 30.129806741 0 C WS 0 [0]
8,32 6 4407 30.129880842 21062 Q FWS [qemu-nbd]
8,32 6 4408 30.129882728 21062 G FWS [qemu-nbd]
8,32 6 4409 30.129884125 21062 I FWS [qemu-nbd]
8,32 5 4807 30.130019058 0 C WS 0 [0]
8,32 5 4808 30.130033376 0 *D WS 1280* + 128 [swapper/0]
----This one looks like a metadata write.
8,32 3 3718 30.130417014 20895 C WS 1280 + 128 [0]
8,32 7 4062 30.130442436 20925 Q FWS [qemu-nbd]
8,32 7 4063 30.130450258 20925 G FWS [qemu-nbd]
8,32 7 4064 30.130451166 20925 I FWS [qemu-nbd]
8,32 6 4410 30.133539827 0 C WS 0 [0]
8,32 4 4935 30.133609250 20892 Q FWS [qemu-nbd]
8,32 4 4936 30.133625662 20892 G FWS [qemu-nbd]
8,32 4 4937 30.133626710 20892 I FWS [qemu-nbd]
8,32 7 4065 30.133758570 0 C WS 0 [0]
8,32 6 4411 30.133773516 21008 *D WS 2048* + 128 [qemu-nbd]
8,32 6 4412 30.134165396 0 C WS 2048 + 128 [0]
8,32 6 4413 30.134191167 21008 Q FWS [qemu-nbd]
8,32 6 4414 30.134192285 21008 G FWS [qemu-nbd]
8,32 6 4415 30.134193193 21008 I FWS [qemu-nbd]
8,32 4 4938 30.136255117 0 C WS 0 [0]
8,32 1 4780 30.136316368 21057 Q FWS [qemu-nbd]
8,32 1 4781 30.136318743 21057 G FWS [qemu-nbd]
8,32 1 4782 30.136320069 21057 I FWS [qemu-nbd]
8,32 5 4809 30.136467435 20891 C WS 0 [0]
====
On the raw partition, things happen as I expected: the writes are issued
in parallel.
==== raw partition
8,32 0 269 5.998464860 21154 D WS 335548672 + 128 [fio]
8,32 3 391 5.998474708 21146 D WS 67113216 + 128 [fio]
8,32 7 243 5.998483857 21159 D WS 503320832 + 128 [fio]
8,32 5 506 5.998494264 21149 D WS 167776512 + 128 [fio]
8,32 2 339 5.998509489 21156 D WS 402657536 + 128 [fio]
8,32 6 879 5.998522968 21158 D WS 469766400 + 128 [fio]
8,32 1 497 5.998537286 21151 D WS 234885376 + 128 [fio]
8,32 5 507 5.998553908 21144 D WS 4352 + 128 [fio]
8,32 2 340 5.998562568 21155 D WS 369103104 + 128 [fio]
8,32 6 880 5.998571159 21150 D WS 201330944 + 128 [fio]
8,32 5 508 5.998591064 21147 D WS 100667648 + 128 [fio]
8,32 2 341 5.998603635 21152 D WS 268439808 + 128 [fio]
8,32 6 881 5.998610410 21153 D WS 301994240 + 128 [fio]
8,32 6 882 5.998640860 21157 D WS 436211968 + 128 [fio]
8,32 2 342 5.998650429 21148 D WS 134222080 + 128 [fio]
8,32 7 244 5.998825870 0 C WS 33558784 + 128 [0]
8,32 7 245 5.998848638 21145 Q FWS [fio]
8,32 7 246 5.998850175 21145 G FWS [fio]
8,32 7 247 5.998851153 21145 I FWS [fio]
8,32 0 270 5.999112918 0 C WS 335548672 + 128 [0]
8,32 0 271 5.999142600 21154 Q FWS [fio]
8,32 0 272 5.999144137 21154 G FWS [fio]
8,32 0 273 5.999145045 21154 I FWS [fio]
8,32 3 392 5.999388302 0 C WS 67113216 + 128 [0]
....
--
Cheers!
吴兴博 Wu, Xingbo <wuxb45@gmail.com>
* Re: [Qemu-devel] I/O parallelism on QCOW2
2014-09-04 16:32 [Qemu-devel] I/O parallelism on QCOW2 Xingbo Wu
@ 2014-09-05 10:02 ` Stefan Hajnoczi
2014-09-05 16:45 ` Xingbo Wu
0 siblings, 1 reply; 4+ messages in thread
From: Stefan Hajnoczi @ 2014-09-05 10:02 UTC (permalink / raw)
To: Xingbo Wu; +Cc: qemu-devel
On Thu, Sep 04, 2014 at 12:32:12PM -0400, Xingbo Wu wrote:
> After running a 16-thread sync-random-write test against qcow2, I
> observed that qcow2 seems to serialize all of its metadata-related
> writes.
> If qcow2 is designed to do this, *then what is the concern?* What would
> go wrong if this ordering were relaxed?
How do you know that serializing part of the write request is a
significant bottleneck?
Please post your benchmark results with raw, qed, and qcow2 handling 1,
8, and 16 threads of I/O (or whatever similar benchmarks you have run).
The bottleneck may actually be something else, so please share your
benchmark configuration and results.
> By providing fewer features, raw-file and QED scale well on this
> parallel I/O workload. I believe qcow2 does this for clear reasons.
> Thanks!
QED serializes allocating writes, see qed_aio_write_alloc().
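From memory, the relevant part of block/qed.c looks roughly like this;
allocating requests are queued and only the head of the queue makes
progress (a simplified sketch, not the literal source):

----qed_aio_write_alloc() sketch
static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
{
    BDRVQEDState *s = acb_to_s(acb);

    /* Freeze this request if another allocating write is in progress */
    if (acb != QSIMPLEQ_FIRST(&s->allocating_write_reqs)) {
        QSIMPLEQ_INSERT_TAIL(&s->allocating_write_reqs, acb, next);
    }
    if (acb != QSIMPLEQ_FIRST(&s->allocating_write_reqs)) {
        return; /* resumed when the head request completes */
    }

    /* Only the head of the queue allocates clusters and writes */
    ...
}
----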
In qcow2 the BdrvQcowState->lock is held across metadata updates. The
important pieces here are:
* qcow2_co_writev() only releases the lock around data writes
(including COW).
* qcow2_co_flush_to_os() holds the lock around metadata updates
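To make the locking pattern concrete, the write path looks roughly like
this (paraphrased from memory and heavily simplified; error handling and
COW details are omitted, so don't treat it as the literal block/qcow2.c
source):

----qcow2_co_writev() sketch
static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
                                        int64_t sector_num,
                                        int nb_sectors, QEMUIOVector *qiov)
{
    BDRVQcowState *s = bs->opaque;
    uint64_t cluster_offset;
    QCowL2Meta *l2meta = NULL;

    qemu_co_mutex_lock(&s->lock);    /* all metadata work serializes here */

    /* Cluster lookup/allocation: may touch L2 tables and refcounts */
    qcow2_alloc_cluster_offset(bs, sector_num << 9, &nb_sectors,
                               &cluster_offset, &l2meta);

    qemu_co_mutex_unlock(&s->lock);  /* dropped only for the data write */
    bdrv_co_writev(bs->file, cluster_offset >> 9, nb_sectors, qiov);
    qemu_co_mutex_lock(&s->lock);

    /* Publish the new L2 entries (metadata again, under the lock) */
    qcow2_alloc_cluster_link_l2(bs, l2meta);

    qemu_co_mutex_unlock(&s->lock);
    return 0;
}
----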
Stefan
* Re: [Qemu-devel] I/O parallelism on QCOW2
2014-09-05 10:02 ` Stefan Hajnoczi
@ 2014-09-05 16:45 ` Xingbo Wu
2014-09-08 10:09 ` Stefan Hajnoczi
0 siblings, 1 reply; 4+ messages in thread
From: Xingbo Wu @ 2014-09-05 16:45 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: qemu-devel
On Fri, Sep 5, 2014 at 6:02 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Thu, Sep 04, 2014 at 12:32:12PM -0400, Xingbo Wu wrote:
> > After running a 16-thread sync-random-write test against qcow2, I
> > observed that qcow2 seems to serialize all of its metadata-related
> > writes.
> > If qcow2 is designed to do this, *then what is the concern?* What
> > would go wrong if this ordering were relaxed?
>
> How do you know that serializing part of the write request is a
> significant bottleneck?
>
> Please post your benchmark results with raw, qed, and qcow2 handling 1-,
> 8-, and 16-threads of I/O (or whatever similar benchmarks you have run).
>
> The bottleneck may actually be something else, so please share your
> benchmark configuration and results.
>
>
Here is the fio job file:
----j1.fio:
[j1]
direct=1
ioengine=psync
thread
fdatasync=1
runtime=300
numjobs=$NJ
# filename=/dev/sd? for raw disk
filename=/dev/nbd0
rw=write
bs=64k
offset_increment=1G
----EOF
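It is launched along the lines of `NJ=16 fio j1.fio'; numjobs picks up
$NJ from the environment through fio's variable expansion.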
The qcow2 image is created on the raw disk with -o lazy_refcounts; the
raw disk is a D3200AAJS.
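For reference, the image was created with something like the following
(path and size are placeholders):
    qemu-img create -f qcow2 -o lazy_refcounts=on /path/to/test.qcow2 16G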
As you can see, the test measures the performance of synchronous writes,
so the result does not necessarily mean qcow2 has this bottleneck under
real workloads.
> > By providing fewer features, raw-file and QED scale well on this
> > parallel I/O workload. I believe qcow2 does this for clear reasons.
> > Thanks!
>
> QED serializes allocating writes, see qed_aio_write_alloc().
>
> In qcow2 the BdrvQcowState->lock is held across metadata updates. The
> important pieces here are:
> * qcow2_co_writev() only releases the lock around data writes
> (including COW).
>
Thanks. This is what I wanted to confirm.
So the lock is held during the metadata write and the related I/O
activity? That's why I saw serialized metadata updates in the trace.
Could the lock be released during metadata I/O?
> * qcow2_co_flush_to_os() holds the lock around metadata updates
>
qcow2_co_flush_to_os() moves the data down into the image file, but does
not necessarily flush it to disk. This function should usually return
quickly with no actual disk I/O; the later calls that flush the image
file would incur the FLUSH to disk. Is that correct?
If so, the locking here should not matter.
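For reference, here is my reading of it (paraphrased from memory, so the
details may well be off; error handling omitted):

----qcow2_co_flush_to_os() as I understand it
static coroutine_fn int qcow2_co_flush_to_os(BlockDriverState *bs)
{
    BDRVQcowState *s = bs->opaque;

    qemu_co_mutex_lock(&s->lock);
    /* Write back dirty L2 and refcount blocks into the image file;
     * no fsync of the image file itself happens here. */
    qcow2_cache_flush(bs, s->l2_table_cache);
    qcow2_cache_flush(bs, s->refcount_block_cache);
    qemu_co_mutex_unlock(&s->lock);

    return 0;
}
----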
> Stefan
>
--
Cheers!
吴兴博 Wu, Xingbo <wuxb45@gmail.com>
* Re: [Qemu-devel] I/O parallelism on QCOW2
2014-09-05 16:45 ` Xingbo Wu
@ 2014-09-08 10:09 ` Stefan Hajnoczi
0 siblings, 0 replies; 4+ messages in thread
From: Stefan Hajnoczi @ 2014-09-08 10:09 UTC (permalink / raw)
To: Xingbo Wu; +Cc: qemu-devel
On Fri, Sep 05, 2014 at 12:45:27PM -0400, Xingbo Wu wrote:
> On Fri, Sep 5, 2014 at 6:02 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>
> > On Thu, Sep 04, 2014 at 12:32:12PM -0400, Xingbo Wu wrote:
> > > After running a 16-thread sync-random-write test against qcow2, I
> > > observed that qcow2 seems to serialize all of its metadata-related
> > > writes.
> > > If qcow2 is designed to do this, *then what is the concern?* What
> > > would go wrong if this ordering were relaxed?
> >
> > How do you know that serializing part of the write request is a
> > significant bottleneck?
> >
> > Please post your benchmark results with raw, qed, and qcow2 handling 1-,
> > 8-, and 16-threads of I/O (or whatever similar benchmarks you have run).
> >
> > The bottleneck may actually be something else, so please share your
> > benchmark configuration and results.
> >
> >
> Here is the fio job file:
> ----j1.fio:
> [j1]
> direct=1
> ioengine=psync
> thread
> fdatasync=1
> runtime=300
> numjobs=$NJ
> # filename=/dev/sd? for raw disk
> filename=/dev/nbd0
> rw=write
> bs=64k
> offset_increment=1G
> ----EOF
Aha!
The job file has fdatasync=1, so fio will issue fdatasync(2) after every
I/O. If the qcow2 file is freshly created, no clusters are allocated yet,
so every write is an allocating write; in other words, each I/O request
causes a flush_to_os() and writes out metadata.
So even numjobs=1 will be significantly slower with qcow2 than raw!
Serializing write requests is not the main problem here; metadata updates
are.
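Roughly, each 64k write in this job turns into a sequence like this
(simplified):
  1. allocate a cluster (refcount metadata, under the qcow2 lock)
  2. write the 64k of data
  3. link the cluster into the L2 table (metadata again, under the lock)
  4. fdatasync: flush_to_os() writes back the dirty metadata, then the
     image file is flushed to disk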
You can check this by running dd if=/dev/zero of=/dev/... bs=1M; sync in
the guest (here, against /dev/nbd0) before running the benchmark, so that
all clusters are already allocated. The qcow2 results should then be very
close to raw.
Stefan