* [Qemu-devel] I/O parallelism on QCOW2
From: Xingbo Wu @ 2014-09-04 16:32 UTC
  To: qemu-devel

Hello guys,
  After running a 16-thread sync-random-write test against qcow2, I observed
that qcow2 appears to serialize all of its metadata-related writes.
If qcow2 is designed to do this, then what is the concern? What would go
wrong if this ordering were relaxed?
By providing fewer features, raw files and QED scale well under parallel I/O
workloads. I assume qcow2 does this for good reasons. Thanks!

Here the qcow2 image is exported as /dev/nbd0 via qemu-nbd. The underlying
device is a spinning disk using the cfq scheduler on Linux 3.16.1.
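
For reference, the export/trace setup is along these lines (the image path
is a placeholder):

  qemu-nbd --connect=/dev/nbd0 /path/to/test.qcow2
  blktrace -d /dev/sdX -o - | blkparse -i -

where sdX is the underlying disk (device 8,32 in the traces below).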

Two excerpts from the blktrace output follow.
(The annotations are my own guesses; please correct me if I have
misunderstood the behavior.)
As seen in the traces below, the requests are issued by several different
threads. On top of /dev/nbd0 the writes are completely unordered, but after
passing through qemu-nbd and the qcow2 image file they reach the disk mostly
serialized.
(Legend: Q/G/I/D/C are the blkparse queue, get-request, insert, issue, and
complete events; WS marks a synchronous write and FWS a flush.)

====QCOW2
  8,32   7     4054    30.087620855 21061  D  WS 4096 + 128 [qemu-nbd]    ----metadata?
  8,32   7     4055    30.087626023 21061  D  WS 363008 + 624 [qemu-nbd]    ----data? It gets issued in parallel by chance.
  8,32   7     4056    30.087992341     0  C  WS 4096 + 128 [0]
  8,32   7     4057    30.089205833     0  C  WS 363008 + 624 [0]
  8,32   7     4058    30.089264151 21061  Q FWS [qemu-nbd]
  8,32   7     4059    30.089265478 21061  G FWS [qemu-nbd]
  8,32   7     4060    30.089266386 21061  I FWS [qemu-nbd]    ----Flush (Q -> G -> I -> C)
  8,32   7     4061    30.102978117     0  C  WS 0 [0]
  8,32   4     4930    30.103082669 21058  D  WS 363632 + 16 [qemu-nbd]    ----In very rare cases two writes to 6-digit sector numbers are issued in parallel; not this one!
  8,32   0     4655    30.103243164     0  C  WS 363632 + 16 [0]
  8,32   4     4931    30.103261463 21058  Q FWS [qemu-nbd]
  8,32   4     4932    30.103263349 21058  G FWS [qemu-nbd]
  8,32   4     4933    30.103264326 21058  I FWS [qemu-nbd]
  8,32   2     3772    30.103266142 21010  Q FWS [qemu-nbd]
  8,32   2     3773    30.103268936 21010  G FWS [qemu-nbd]
  8,32   2     3774    30.103270612 21010  I FWS [qemu-nbd]
  8,32   3     3717    30.111390919     0  C  WS 0 [0]
  8,32   4     4934    30.129806741     0  C  WS 0 [0]
  8,32   6     4407    30.129880842 21062  Q FWS [qemu-nbd]
  8,32   6     4408    30.129882728 21062  G FWS [qemu-nbd]
  8,32   6     4409    30.129884125 21062  I FWS [qemu-nbd]
  8,32   5     4807    30.130019058     0  C  WS 0 [0]
  8,32   5     4808    30.130033376     0  D  WS 1280 + 128 [swapper/0]    ----This one looks like a metadata write.
  8,32   3     3718    30.130417014 20895  C  WS 1280 + 128 [0]
  8,32   7     4062    30.130442436 20925  Q FWS [qemu-nbd]
  8,32   7     4063    30.130450258 20925  G FWS [qemu-nbd]
  8,32   7     4064    30.130451166 20925  I FWS [qemu-nbd]
  8,32   6     4410    30.133539827     0  C  WS 0 [0]
  8,32   4     4935    30.133609250 20892  Q FWS [qemu-nbd]
  8,32   4     4936    30.133625662 20892  G FWS [qemu-nbd]
  8,32   4     4937    30.133626710 20892  I FWS [qemu-nbd]
  8,32   7     4065    30.133758570     0  C  WS 0 [0]
  8,32   6     4411    30.133773516 21008  D  WS 2048 + 128 [qemu-nbd]
  8,32   6     4412    30.134165396     0  C  WS 2048 + 128 [0]
  8,32   6     4413    30.134191167 21008  Q FWS [qemu-nbd]
  8,32   6     4414    30.134192285 21008  G FWS [qemu-nbd]
  8,32   6     4415    30.134193193 21008  I FWS [qemu-nbd]
  8,32   4     4938    30.136255117     0  C  WS 0 [0]
  8,32   1     4780    30.136316368 21057  Q FWS [qemu-nbd]
  8,32   1     4781    30.136318743 21057  G FWS [qemu-nbd]
  8,32   1     4782    30.136320069 21057  I FWS [qemu-nbd]
  8,32   5     4809    30.136467435 20891  C  WS 0 [0]
====

On the raw partition things happen as I expected: the writes are issued in
parallel.

==== raw partition
  8,32   0      269     5.998464860 21154  D  WS 335548672 + 128 [fio]
  8,32   3      391     5.998474708 21146  D  WS 67113216 + 128 [fio]
  8,32   7      243     5.998483857 21159  D  WS 503320832 + 128 [fio]
  8,32   5      506     5.998494264 21149  D  WS 167776512 + 128 [fio]
  8,32   2      339     5.998509489 21156  D  WS 402657536 + 128 [fio]
  8,32   6      879     5.998522968 21158  D  WS 469766400 + 128 [fio]
  8,32   1      497     5.998537286 21151  D  WS 234885376 + 128 [fio]
  8,32   5      507     5.998553908 21144  D  WS 4352 + 128 [fio]
  8,32   2      340     5.998562568 21155  D  WS 369103104 + 128 [fio]
  8,32   6      880     5.998571159 21150  D  WS 201330944 + 128 [fio]
  8,32   5      508     5.998591064 21147  D  WS 100667648 + 128 [fio]
  8,32   2      341     5.998603635 21152  D  WS 268439808 + 128 [fio]
  8,32   6      881     5.998610410 21153  D  WS 301994240 + 128 [fio]
  8,32   6      882     5.998640860 21157  D  WS 436211968 + 128 [fio]
  8,32   2      342     5.998650429 21148  D  WS 134222080 + 128 [fio]
  8,32   7      244     5.998825870     0  C  WS 33558784 + 128 [0]
  8,32   7      245     5.998848638 21145  Q FWS [fio]
  8,32   7      246     5.998850175 21145  G FWS [fio]
  8,32   7      247     5.998851153 21145  I FWS [fio]
  8,32   0      270     5.999112918     0  C  WS 335548672 + 128 [0]
  8,32   0      271     5.999142600 21154  Q FWS [fio]
  8,32   0      272     5.999144137 21154  G FWS [fio]
  8,32   0      273     5.999145045 21154  I FWS [fio]
  8,32   3      392     5.999388302     0  C  WS 67113216 + 128 [0]
....

-- 

Cheers!
       吴兴博  Wu, Xingbo <wuxb45@gmail.com>


* Re: [Qemu-devel] I/O parallelism on QCOW2
From: Stefan Hajnoczi @ 2014-09-05 10:02 UTC
  To: Xingbo Wu; +Cc: qemu-devel

On Thu, Sep 04, 2014 at 12:32:12PM -0400, Xingbo Wu wrote:
>   After running a 16-thread sync-random-write test against qcow2, I
> observed that qcow2 appears to serialize all of its metadata-related writes.
> If qcow2 is designed to do this, then what is the concern? What would go
> wrong if this ordering were relaxed?

How do you know that serializing part of the write request is a
significant bottleneck?

Please post your benchmark results with raw, qed, and qcow2 handling 1-,
8-, and 16-threads of I/O (or whatever similar benchmarks you have run).

The bottleneck may actually be something else, so please share your
benchmark configuration and results.

> By providing fewer features, raw files and QED scale well under parallel I/O
> workloads. I assume qcow2 does this for good reasons. Thanks!

QED serializes allocating writes, see qed_aio_write_alloc().

In qcow2 the BDRVQcowState->lock is held across metadata updates.  The
important pieces here are:
 * qcow2_co_writev() only releases the lock around data writes
   (including COW).
 * qcow2_co_flush_to_os() holds the lock around metadata updates
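
Schematically (heavily simplified, not the literal qcow2 code; the helpers
other than qemu_co_mutex_lock/unlock are placeholders):

static coroutine_fn int qcow2_co_writev(BlockDriverState *bs, ...)
{
    BDRVQcowState *s = bs->opaque;

    qemu_co_mutex_lock(&s->lock);
    while (more_sectors_to_write()) {
        /* Cluster lookup/allocation and the L2/refcount table updates
         * happen under the lock, so concurrent allocating writers
         * serialize here. */
        lookup_or_allocate_clusters(s);

        /* Only the guest data write (and any COW copy) runs with the
         * lock dropped. */
        qemu_co_mutex_unlock(&s->lock);
        write_data_to_image_file(bs);
        qemu_co_mutex_lock(&s->lock);
    }
    qemu_co_mutex_unlock(&s->lock);

    return 0;
}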

Stefan


* Re: [Qemu-devel] I/O parallelism on QCOW2
From: Xingbo Wu @ 2014-09-05 16:45 UTC
  To: Stefan Hajnoczi; +Cc: qemu-devel

On Fri, Sep 5, 2014 at 6:02 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:

> On Thu, Sep 04, 2014 at 12:32:12PM -0400, Xingbo Wu wrote:
> >   After running a 16-thread sync-random-write test against qcow2, I
> > observed that qcow2 appears to serialize all of its metadata-related
> > writes.
> > If qcow2 is designed to do this, then what is the concern? What would
> > go wrong if this ordering were relaxed?
>
> How do you know that serializing part of the write request is a
> significant bottleneck?
>
> Please post your benchmark results with raw, qed, and qcow2 handling 1-,
> 8-, and 16-threads of I/O (or whatever similar benchmarks you have run).
>
> The bottleneck may actually be something else, so please share your
> benchmark configuration and results.
>
>
Here is the fio job file:
----j1.fio:
[j1]
direct=1
ioengine=psync
thread
fdatasync=1
runtime=300
numjobs=$NJ
# filename=/dev/sd? for raw disk
filename=/dev/nbd0
rw=write
bs=64k
offset_increment=1G
----EOF

The qcow2 image is created on the raw disk with -o lazy_refcounts.
The raw disk is a D3200AAJS.

As you can see, the test measures the performance of synchronous writes,
so in practice the result does not mean qcow2 has this bottleneck under a
real workload.


> > By providing fewer features, raw files and QED scale well under parallel
> > I/O workloads. I assume qcow2 does this for good reasons. Thanks!
>
> QED serializes allocating writes, see qed_aio_write_alloc().
>
> In qcow2 the BDRVQcowState->lock is held across metadata updates.  The
> important pieces here are:
>  * qcow2_co_writev() only releases the lock around data writes
>    (including COW).
>

Thanks. This is what I wanted to confirm.
So the lock is held during the metadata write and the related I/O activity?
That would explain why I saw serialized metadata updates in the trace.
Could the lock be released during metadata I/O?

>  * qcow2_co_flush_to_os() holds the lock around metadata updates
>
flush_to_os moves the data down to the image file, but does not necessarily
flush it to disk.

This function should usually return quickly with no actual disk I/O. The
later calls that flush the image file are what incur the FLUSH to disk. Is
that correct?
If so, the locking here does not matter.
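
My (possibly wrong) reading of the generic flush path in block.c is roughly
the following (paraphrased, not the literal code; error handling and the
checks for which callbacks a driver actually provides are omitted):

int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
{
    /* 1. Let the format driver push its dirty metadata into the image
     *    file; for qcow2 this is qcow2_co_flush_to_os(), which takes
     *    s->lock while writing out the L2/refcount cache. */
    bs->drv->bdrv_co_flush_to_os(bs);

    /* 2. If the driver can flush to stable storage itself (protocol
     *    drivers such as raw-posix), issue the real fdatasync()/FLUSH. */
    bs->drv->bdrv_co_flush_to_disk(bs);

    /* 3. Recurse into bs->file, so for a qcow2 image the durable flush
     *    ends up being done by the file protocol driver underneath. */
    return bdrv_co_flush(bs->file);
}

If that reading is right, the lock taken in flush_to_os() only covers step 1,
not the actual flush to disk.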


> Stefan
>



-- 

Cheers!
       吴兴博  Wu, Xingbo <wuxb45@gmail.com>


* Re: [Qemu-devel] I/O parallelism on QCOW2
From: Stefan Hajnoczi @ 2014-09-08 10:09 UTC
  To: Xingbo Wu; +Cc: qemu-devel

On Fri, Sep 05, 2014 at 12:45:27PM -0400, Xingbo Wu wrote:
> On Fri, Sep 5, 2014 at 6:02 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> 
> > On Thu, Sep 04, 2014 at 12:32:12PM -0400, Xingbo Wu wrote:
> > >   After running a 16-thread sync-random-write test against qcow2, I
> > > observed that qcow2 appears to serialize all of its metadata-related
> > > writes.
> > > If qcow2 is designed to do this, then what is the concern? What would
> > > go wrong if this ordering were relaxed?
> >
> > How do you know that serializing part of the write request is a
> > significant bottleneck?
> >
> > Please post your benchmark results with raw, qed, and qcow2 handling 1-,
> > 8-, and 16-threads of I/O (or whatever similar benchmarks you have run).
> >
> > The bottleneck may actually be something else, so please share your
> > benchmark configuration and results.
> >
> >
> Here is the fio job file:
> ----j1.fio:
> [j1]
> direct=1
> ioengine=psync
> thread
> fdatasync=1
> runtime=300
> numjobs=$NJ
> # filename=/dev/sd? for raw disk
> filename=/dev/nbd0
> rw=write
> bs=64k
> offset_increment=1G
> ----EOF

Aha!

The job file has fdatasync=1, so fio will issue fdatasync(2) after every
I/O.  If the qcow2 file is freshly created, no clusters are allocated yet,
so every write is an allocating write; in other words, each I/O request
causes a flush_to_os() and writes out metadata.

So even numjobs=1 will be significantly slower with qcow2 than with raw!
Serializing write requests is not the main problem here; metadata updates
are.

You can check this by running dd if=/dev/zero of=/dev/... bs=1M; sync in
the guest before running the benchmark.  The qcow2 results should now be
very close to raw.
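
In this qemu-nbd setup that would be something along these lines (device and
image are as in the earlier mails; dd stops with ENOSPC once the device is
full, which is fine here):

  dd if=/dev/zero of=/dev/nbd0 bs=1M
  sync

and then re-run the fio job against /dev/nbd0.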

Stefan

