[Qemu-devel] [Bug 1581334] [NEW] qemu + librbd takes high %sy cpu under high random io workload

* [Qemu-devel] [Bug 1581334] [NEW] qemu + librbd takes high %sy cpu under high random io workload
@ 2016-05-13  6:41 chenwqin
  2016-05-13 15:01 ` Serge Hallyn
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: chenwqin @ 2016-05-13  6:41 UTC (permalink / raw)
  To: qemu-devel

Public bug reported:

I got an IO problem. When running Qemu + ceph(use librbd), and do a random IO benchmark or some high load random IO test, it will exhaust all my host cpu on %sy cpu.
It doesn’t happen all the time, but when it appear it will reproduce every time I start a random IO benchmark(test with Fio).
And the only way to fix the problem is shutdown my vm and start it, but the problem will happen again with high random IO load.

Some information:
    Vendor      : HP
    Product     : HP ProLiant BL460c Gen9
    Kernel      : 3.16.0-4-amd64
    Disto       : Debian
    Version     : 8.4
    Arch        : amd64
    Qemu        : 2.1 ~ 2.6 (Yes, I already test the latest qemu2.6 version, but still got the problem)
    Ceph        : Hammer 0.94.5
    Librbd      : 0.94.5 ~ 10.2 (I rebuild librbd with ceph 10.2 source code, but the problem still here)
    Qemu config : cache=none
    Qemu cpu&mem: 4core, 8GB

How can i reproduce the problem?

while :; do bash randwrite.sh ; sleep 3600; done >test.log 2>&1 &
(Sleep 3600 is the key to reproduce my problem. I don’t known how long sleep suit for reproduce, but one hour sleep is enough. the problem will easy reproduce after a long sleep, if i keep benchmark running without sleep, i can't reproduce it)

My randwrite.sh script
----------------------------------------------
#!/bin/sh
sync
echo 3 > /proc/sys/vm/drop_caches

FILENAME=/dev/vdc
RUNTIME=100
BLOCKSIZE=4K
IOENGINE=libaio
RESULTFILE=fio-randwrite.log
IODEPTH=32
RAMP_TIME=5
SIZE=100G

fio --numjobs 10 --norandommap --randrepeat=0 --readwrite=randwrite --ramp_time=$RAMP_TIME --bs=$BLOCKSIZE --runtime=$RUNTIME --iodepth=$IODEPTH --filename=$FILENAME --ioengine=$IOENGINE --direct=1 --name=iops_randwrite --group_reporting  | tee $RESULTFILE
----------------------------------------------

What happened after the problem appear?
my vm will got huge IOPS drop. In my case, it will drop from 15000 IOPS to 3500 IOPS. And other thing, my host cpu will exhaust on %sy. Top output like this.

Qemu Fio benchmark
----------------------------------------------------
Tasks: 284 total,   2 running, 282 sleeping,   0 stopped,   0 zombie
%Cpu0  : 11.8 us, 66.7 sy,  0.0 ni, 21.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  : 12.7 us, 64.9 sy,  0.0 ni, 22.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  : 13.7 us, 64.5 sy,  0.0 ni, 21.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  : 13.2 us, 64.1 sy,  0.0 ni, 22.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  : 11.7 us, 65.4 sy,  0.0 ni, 22.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  : 13.2 us, 64.4 sy,  0.0 ni, 22.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  : 12.4 us, 65.1 sy,  0.0 ni, 22.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  : 13.6 us, 63.8 sy,  0.0 ni, 22.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu8  :  9.8 us, 73.0 sy,  0.0 ni, 17.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu9  :  7.8 us, 74.5 sy,  0.0 ni, 17.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu10 :  6.0 us, 81.4 sy,  0.0 ni,  6.6 id,  0.0 wa,  0.0 hi,  6.0 si,  0.0 st
%Cpu11 :  8.4 us, 79.5 sy,  0.0 ni,  8.8 id,  0.0 wa,  0.0 hi,  3.4 si,  0.0 st
%Cpu12 :  7.6 us, 80.7 sy,  0.0 ni,  7.0 id,  0.0 wa,  0.0 hi,  4.7 si,  0.0 st
%Cpu13 :  7.4 us, 79.9 sy,  0.0 ni,  7.7 id,  0.0 wa,  0.0 hi,  5.0 si,  0.0 st
%Cpu14 :  9.8 us, 75.4 sy,  0.0 ni, 11.4 id,  0.0 wa,  0.0 hi,  3.4 si,  0.0 st
%Cpu15 :  6.7 us, 80.1 sy,  0.0 ni, 10.1 id,  0.0 wa,  0.0 hi,  3.0 si,  0.0 st
%Cpu16 :  9.2 us, 69.2 sy,  0.0 ni, 17.5 id,  0.0 wa,  0.0 hi,  4.1 si,  0.0 st
%Cpu17 :  9.9 us, 66.6 sy,  0.0 ni, 20.1 id,  0.0 wa,  0.0 hi,  3.4 si,  0.0 st
%Cpu18 : 16.6 us, 49.0 sy,  0.0 ni, 34.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu19 : 16.7 us, 46.4 sy,  0.0 ni, 36.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu20 : 13.0 us, 50.8 sy,  0.0 ni, 36.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu21 : 18.9 us, 46.2 sy,  0.0 ni, 34.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu22 : 12.1 us, 52.9 sy,  0.0 ni, 35.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu23 : 15.9 us, 47.6 sy,  0.0 ni, 36.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu24 :  6.7 us, 62.0 sy,  0.0 ni, 31.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu25 :  7.6 us, 63.7 sy,  0.0 ni, 28.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu26 :  8.1 us, 75.8 sy,  0.0 ni, 16.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu27 :  6.7 us, 73.6 sy,  0.0 ni, 19.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu28 :  9.2 us, 74.3 sy,  0.0 ni, 16.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu29 :  8.2 us, 73.3 sy,  0.0 ni, 18.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu30 :  4.4 us, 73.1 sy,  0.0 ni, 22.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu31 :  7.5 us, 69.6 sy,  0.0 ni, 22.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  13217662+total,  3721572 used, 12845504+free,   283228 buffers
KiB Swap:  4194300 total,        0 used,  4194300 free.  2242976 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
30349 root      20   0 25.381g 499892  20640 R  2495  0.4 119:11.98 qemu-system-x86

Anything I do?
I use perf top, profile to debug the problem. It show me that something like thread deadlock problem. Any I test QEMU with kernel RBD, it work fine.
Here are the perf top output on host.
---------------------------------------------------------------
   PerfTop:   12393 irqs/sec  kernel:87.3%  exact:  0.0% [4000Hz cycles],  (all, 32 CPUs)
-------------------------------------------------------------------------------

    75.25%  [kernel]            [k] _raw_spin_lock
     1.17%  [kernel]            [k] futex_wait_setup
     0.86%  libc-2.19.so        [.] malloc
     0.58%  [kernel]            [k] futex_wake
     0.55%  libc-2.19.so        [.] 0x00000000000ea96f
     0.41%  [kernel]            [k] native_write_msr_safe
---------------------------------------------------------------

** Affects: qemu
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1581334

Title:
  qemu + librbd takes high %sy cpu under high random io workload

Status in QEMU:
  New

Bug description:
  I got an IO problem. When running Qemu + ceph(use librbd), and do a random IO benchmark or some high load random IO test, it will exhaust all my host cpu on %sy cpu.
  It doesn’t happen all the time, but when it appear it will reproduce every time I start a random IO benchmark(test with Fio).
  And the only way to fix the problem is shutdown my vm and start it, but the problem will happen again with high random IO load.

  Some information:
      Vendor      : HP
      Product     : HP ProLiant BL460c Gen9
      Kernel      : 3.16.0-4-amd64
      Disto       : Debian
      Version     : 8.4
      Arch        : amd64
      Qemu        : 2.1 ~ 2.6 (Yes, I already test the latest qemu2.6 version, but still got the problem)
      Ceph        : Hammer 0.94.5
      Librbd      : 0.94.5 ~ 10.2 (I rebuild librbd with ceph 10.2 source code, but the problem still here)
      Qemu config : cache=none
      Qemu cpu&mem: 4core, 8GB

  How can i reproduce the problem?

  while :; do bash randwrite.sh ; sleep 3600; done >test.log 2>&1 &
  (Sleep 3600 is the key to reproduce my problem. I don’t known how long sleep suit for reproduce, but one hour sleep is enough. the problem will easy reproduce after a long sleep, if i keep benchmark running without sleep, i can't reproduce it)

  My randwrite.sh script
  ----------------------------------------------
  #!/bin/sh
  sync
  echo 3 > /proc/sys/vm/drop_caches

  FILENAME=/dev/vdc
  RUNTIME=100
  BLOCKSIZE=4K
  IOENGINE=libaio
  RESULTFILE=fio-randwrite.log
  IODEPTH=32
  RAMP_TIME=5
  SIZE=100G

  fio --numjobs 10 --norandommap --randrepeat=0 --readwrite=randwrite --ramp_time=$RAMP_TIME --bs=$BLOCKSIZE --runtime=$RUNTIME --iodepth=$IODEPTH --filename=$FILENAME --ioengine=$IOENGINE --direct=1 --name=iops_randwrite --group_reporting  | tee $RESULTFILE
  ----------------------------------------------

  What happened after the problem appear?
  my vm will got huge IOPS drop. In my case, it will drop from 15000 IOPS to 3500 IOPS. And other thing, my host cpu will exhaust on %sy. Top output like this.

  Qemu Fio benchmark
  ----------------------------------------------------
  Tasks: 284 total,   2 running, 282 sleeping,   0 stopped,   0 zombie
  %Cpu0  : 11.8 us, 66.7 sy,  0.0 ni, 21.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
  %Cpu1  : 12.7 us, 64.9 sy,  0.0 ni, 22.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
  %Cpu2  : 13.7 us, 64.5 sy,  0.0 ni, 21.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
  %Cpu3  : 13.2 us, 64.1 sy,  0.0 ni, 22.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
  %Cpu4  : 11.7 us, 65.4 sy,  0.0 ni, 22.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
  %Cpu5  : 13.2 us, 64.4 sy,  0.0 ni, 22.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
  %Cpu6  : 12.4 us, 65.1 sy,  0.0 ni, 22.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
  %Cpu7  : 13.6 us, 63.8 sy,  0.0 ni, 22.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
  %Cpu8  :  9.8 us, 73.0 sy,  0.0 ni, 17.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
  %Cpu9  :  7.8 us, 74.5 sy,  0.0 ni, 17.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
  %Cpu10 :  6.0 us, 81.4 sy,  0.0 ni,  6.6 id,  0.0 wa,  0.0 hi,  6.0 si,  0.0 st
  %Cpu11 :  8.4 us, 79.5 sy,  0.0 ni,  8.8 id,  0.0 wa,  0.0 hi,  3.4 si,  0.0 st
  %Cpu12 :  7.6 us, 80.7 sy,  0.0 ni,  7.0 id,  0.0 wa,  0.0 hi,  4.7 si,  0.0 st
  %Cpu13 :  7.4 us, 79.9 sy,  0.0 ni,  7.7 id,  0.0 wa,  0.0 hi,  5.0 si,  0.0 st
  %Cpu14 :  9.8 us, 75.4 sy,  0.0 ni, 11.4 id,  0.0 wa,  0.0 hi,  3.4 si,  0.0 st
  %Cpu15 :  6.7 us, 80.1 sy,  0.0 ni, 10.1 id,  0.0 wa,  0.0 hi,  3.0 si,  0.0 st
  %Cpu16 :  9.2 us, 69.2 sy,  0.0 ni, 17.5 id,  0.0 wa,  0.0 hi,  4.1 si,  0.0 st
  %Cpu17 :  9.9 us, 66.6 sy,  0.0 ni, 20.1 id,  0.0 wa,  0.0 hi,  3.4 si,  0.0 st
  %Cpu18 : 16.6 us, 49.0 sy,  0.0 ni, 34.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
  %Cpu19 : 16.7 us, 46.4 sy,  0.0 ni, 36.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
  %Cpu20 : 13.0 us, 50.8 sy,  0.0 ni, 36.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
  %Cpu21 : 18.9 us, 46.2 sy,  0.0 ni, 34.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
  %Cpu22 : 12.1 us, 52.9 sy,  0.0 ni, 35.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
  %Cpu23 : 15.9 us, 47.6 sy,  0.0 ni, 36.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
  %Cpu24 :  6.7 us, 62.0 sy,  0.0 ni, 31.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
  %Cpu25 :  7.6 us, 63.7 sy,  0.0 ni, 28.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
  %Cpu26 :  8.1 us, 75.8 sy,  0.0 ni, 16.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
  %Cpu27 :  6.7 us, 73.6 sy,  0.0 ni, 19.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
  %Cpu28 :  9.2 us, 74.3 sy,  0.0 ni, 16.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
  %Cpu29 :  8.2 us, 73.3 sy,  0.0 ni, 18.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
  %Cpu30 :  4.4 us, 73.1 sy,  0.0 ni, 22.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
  %Cpu31 :  7.5 us, 69.6 sy,  0.0 ni, 22.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
  KiB Mem:  13217662+total,  3721572 used, 12845504+free,   283228 buffers
  KiB Swap:  4194300 total,        0 used,  4194300 free.  2242976 cached Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  30349 root      20   0 25.381g 499892  20640 R  2495  0.4 119:11.98 qemu-system-x86

  Anything I do?
  I use perf top, profile to debug the problem. It show me that something like thread deadlock problem. Any I test QEMU with kernel RBD, it work fine.
  Here are the perf top output on host.
  ---------------------------------------------------------------
     PerfTop:   12393 irqs/sec  kernel:87.3%  exact:  0.0% [4000Hz cycles],  (all, 32 CPUs)
  -------------------------------------------------------------------------------

      75.25%  [kernel]            [k] _raw_spin_lock
       1.17%  [kernel]            [k] futex_wait_setup
       0.86%  libc-2.19.so        [.] malloc
       0.58%  [kernel]            [k] futex_wake
       0.55%  libc-2.19.so        [.] 0x00000000000ea96f
       0.41%  [kernel]            [k] native_write_msr_safe
  ---------------------------------------------------------------

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1581334/+subscriptions

^ permalink raw reply	[flat|nested] 11+ messages in thread