* FW: cgroup blkio.weight working, but not for KVM guests
       [not found] <022401cdac8d$32565fa0$97031ee0$@ncsu.edu>
@ 2012-10-22 13:36 ` Ben Clay
  2012-10-23 12:35   ` Stefan Hajnoczi
  0 siblings, 1 reply; 5+ messages in thread
From: Ben Clay @ 2012-10-22 13:36 UTC (permalink / raw)
  To: kvm

Forwarding this to the KVM general list.  I doubt you folks can help me with
libvirt, but I was wondering if there’s some way to verify if the cache=none
parameter is being respected for my KVM guest’s disk image, or if there are
any other configuration/debug steps appropriate for KVM + virtio + cgroup.

Thanks.

Ben Clay
rbclay@ncsu.edu



From: Ben Clay [mailto:rbclay@ncsu.edu] 
Sent: Wednesday, October 17, 2012 11:31 AM
To: libvirt-users@redhat.com
Subject: cgroup blkio.weight working, but not for KVM guests

I’m running libvirt 0.10.2 and qemu-kvm-1.2.0, both compiled from source, on
CentOS 6.  I’ve got a working blkio cgroup hierarchy which I’m attaching
guests to using the following XML guest configs:

VM1 (foreground):

  <cputune>
    <shares>2048</shares>
  </cputune>
  <blkiotune>
    <weight>1000</weight>
  </blkiotune>

VM2 (background): 

  <cputune>
    <shares>2</shares>
  </cputune>
  <blkiotune>
    <weight>100</weight>
  </blkiotune>
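
(A quick way to confirm the weights actually landed, for anyone checking:
the cgroup mount point below is an assumption, e.g. /cgroup/blkio on CentOS 6
vs /sys/fs/cgroup/blkio elsewhere.)

virsh blkiotune foreground                                # should report weight : 1000
virsh blkiotune background                                # should report weight : 100
cat /cgroup/blkio/libvirt/qemu/foreground/blkio.weight    # 1000
cat /cgroup/blkio/libvirt/qemu/background/blkio.weight    # 100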

I’ve tested write throughput on the host using cgexec and dd, demonstrating
that libvirt has correctly set up the cgroups:

cgexec -g blkio:libvirt/qemu/foreground time dd if=/dev/zero of=trash1.img oflag=direct bs=1M count=4096 &
cgexec -g blkio:libvirt/qemu/background time dd if=/dev/zero of=trash2.img oflag=direct bs=1M count=4096 &

Snap from iotop, showing an 8:1 ratio (should be 10:1, but 8:1 is
acceptable):

Total DISK READ: 0.00 B/s | Total DISK WRITE: 91.52 M/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
 9602 be/4 root        0.00 B/s   10.71 M/s  0.00 % 98.54 % dd if=/dev/zero of=trash2.img oflag=direct bs=1M count=4096
 9601 be/4 root        0.00 B/s   80.81 M/s  0.00 % 97.76 % dd if=/dev/zero of=trash1.img oflag=direct bs=1M count=4096

Further, checking the task list inside each cgroup shows the guest’s main
PID, plus those of the virtio kernel threads.  It’s hard to tell if all the
virtio kernel threads are listed, but all the ones I’ve hunted down appear
to be there.
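
(The cross-check looks roughly like this; the cgroup path and the pgrep
pattern are assumptions, adjust to your setup.)

PID=$(pgrep -f "qemu-kvm -name foreground")
ls /proc/$PID/task                                 # threads of the guest's main QEMU process
cat /cgroup/blkio/libvirt/qemu/foreground/tasks    # tasks libvirt placed in the guest's blkio cgroup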

However, when running the same dd commands inside the guests, I get
roughly equal performance, nowhere near the ~8:1 relative bandwidth
enforcement I get on the host.  (The background dd was ctrl-c'd right
after the foreground one finished; both were started within 1s of each
other.)

[ben@foreground ~]$ dd if=/dev/zero of=trash1.img oflag=direct bs=1M count=4096
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 104.645 s, 41.0 MB/s

[ben@background ~]$ dd if=/dev/zero of=trash2.img oflag=direct bs=1M count=4096
^C4052+0 records in
4052+0 records out
4248829952 bytes (4.2 GB) copied, 106.318 s, 40.0 MB/s

Based on this statement from the Red Hat Resource Management Guide
(https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/ch-Subsystems_and_Tunable_Parameters.html):
“Currently, the Block I/O subsystem does not work for buffered write
operations. It is primarily targeted at direct I/O, although it works for
buffered read operations.” I thought this problem might be due to host-side
buffering, but I have that explicitly disabled in my guest configs:

  <devices>
    <emulator>/usr/bin/qemu-kvm</emulator>
    <disk type="file" device="disk">
      <driver name="qemu" type="raw" cache="none"/>
      <source file="/path/to/disk.img"/>
      <target dev="vda" bus="virtio"/>
      <alias name="virtio-disk0"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x04" function="0x0"/>
    </disk>

Here is the qemu-kvm command line from ps, showing that cache=none is
clearly being passed through from the guest XML config:

root      5110 20.8  4.3 4491352 349312 ?      Sl   11:58   0:38 /usr/bin/qemu-kvm
  -name background -S -M pc-1.2 -enable-kvm -m 2048
  -smp 2,sockets=2,cores=1,threads=1 -uuid ea632741-c7be-36ab-bd69-da3cbe505b38
  -no-user-config -nodefaults
  -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/background.monitor,server,nowait
  -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown
  -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
  -drive file=/path/to/disk.img,if=none,id=drive-virtio-disk0,format=raw,cache=none
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
  -netdev tap,fd=20,id=hostnet0,vhost=on,vhostfd=22
  -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:11:22:33:44:55,bus=pci.0,addr=0x3
  -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0
  -device usb-tablet,id=input0 -vnc 127.0.0.1:1 -vga cirrus
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

For fun I tried a few different cache options to try to bypass the host
buffer cache, including writethrough and directsync, but the number of
virtio kernel threads appeared to explode (especially for directsync) and
the throughput dropped quite low: ~50% of “none” for writethrough and ~5%
for directsync.

With cache=none, when I generate write loads inside the VMs, I do see growth
in the host’s buffer cache.  Further, if I use non-direct I/O inside the
VMs, and inflate the balloon (forcing the guest’s buffer cache to flush), I
don’t see a corresponding drop in background throughput.  Is it possible
that the cache="none" directive is not being respected?  
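
(For anyone reproducing this: the cache growth can be watched from the host
with something like the first command below while the guest writes, and the
balloon can be inflated via virsh; the size is illustrative.)

grep -E '^(Cached|Dirty|Writeback):' /proc/meminfo    # watch host page cache / dirty pages
virsh setmem background 1048576 --live                # shrink guest RAM to 1 GiB, inflating the balloon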

Since cgroups are working for host-side processes, I think my blkio
subsystem is correctly set up (using cfq, group_isolation=1, etc.).  Maybe I
miscompiled qemu without some needed direct I/O support?  Has anyone seen
this before?
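
(For completeness, those host-side settings can be double-checked like this;
sda is illustrative, and every device backing the image files should be
checked.)

cat /sys/block/sda/queue/scheduler                    # should show [cfq]
cat /sys/block/sda/queue/iosched/group_isolation      # should be 1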

Ben Clay
rbclay@ncsu.edu




* Re: FW: cgroup blkio.weight working, but not for KVM guests
  2012-10-22 13:36 ` FW: cgroup blkio.weight working, but not for KVM guests Ben Clay
@ 2012-10-23 12:35   ` Stefan Hajnoczi
  2012-10-23 22:48     ` Ben Clay
  0 siblings, 1 reply; 5+ messages in thread
From: Stefan Hajnoczi @ 2012-10-23 12:35 UTC (permalink / raw)
  To: Ben Clay; +Cc: kvm

On Mon, Oct 22, 2012 at 07:36:34AM -0600, Ben Clay wrote:
> Forwarding this to the KVM general list.  I doubt you folks can help me with
> libvirt, but I was wondering if there’s some way to verify if the cache=none
> parameter is being respected for my KVM guest’s disk image, or if there are
> any other configuration/debug steps appropriate for KVM + virtio + cgroup.

Here's how you can double-check the O_DIRECT flag:

Find the QEMU process PID on the host:

ps aux | grep qemu

Then find the file descriptor of the image file which the QEMU process
has open:

ls -l /proc/$PID/fd

Finally look at the file descriptor flags to confirm it is O_DIRECT:

grep ^flags: /proc/$PID/fdinfo/$FD

Note the flags field is in octal and you're looking for:

#define O_DIRECT        00040000        /* direct disk access hint */
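
Putting the steps together, a rough sketch (assuming a single qemu-kvm
process; substitute your image path):

PID=$(pgrep -f qemu-kvm | head -n1)
for fd in /proc/$PID/fd/*; do
    if [ "$(readlink "$fd")" = "/path/to/disk.img" ]; then
        grep ^flags: /proc/$PID/fdinfo/"${fd##*/}"
    fi
done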

Stefan


* RE: FW: cgroup blkio.weight working, but not for KVM guests
  2012-10-23 12:35   ` Stefan Hajnoczi
@ 2012-10-23 22:48     ` Ben Clay
  2012-10-24  6:10       ` Stefan Hajnoczi
  0 siblings, 1 reply; 5+ messages in thread
From: Ben Clay @ 2012-10-23 22:48 UTC (permalink / raw)
  To: 'Stefan Hajnoczi'; +Cc: kvm

Stefan-

Thanks for the hand-holding, it looks like the disk file is indeed open with
O_DIRECT:

[root@host ~]# grep ^flags: /proc/$PID/fdinfo/$FD
flags:	02140002
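
As a quick sanity check on that octal value (just shell arithmetic, nothing
QEMU-specific):

printf '%o\n' $(( 02140002 & 00040000 ))    # prints 40000, so the O_DIRECT bit is set
# on x86_64, 02140002 decodes to O_RDWR|O_DIRECT|O_LARGEFILE|O_CLOEXEC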

Since this is not an issue, I guess another possible source of problems is
that not all of the virtio threads attached to this domain are being placed
within the cgroup.  I will look through libvirt to see if they're setting
the guest process's cgroup classification as sticky (I can't imagine they
wouldn't be), but this raises another question: are virtio kernel threads
child processes of the guest's main process?

Are you aware of any other factor which I should be considering here?  I
reran the dd tests inside the guest with iflag=fullblock set to make sure
the guest buffer cache wasn't messing with throughput values (based on this:
http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=5929322ccb1f9d27c1b07b746d37419d17a7cbf6),
and got the same results listed earlier.

Thanks again!

Ben Clay
rbclay@ncsu.edu


-----Original Message-----
From: Stefan Hajnoczi [mailto:stefanha@gmail.com] 
Sent: Tuesday, October 23, 2012 6:35 AM
To: Ben Clay
Cc: kvm@vger.kernel.org
Subject: Re: FW: cgroup blkio.weight working, but not for KVM guests

On Mon, Oct 22, 2012 at 07:36:34AM -0600, Ben Clay wrote:
> Forwarding this to the KVM general list.  I doubt you folks can help 
> me with libvirt, but I was wondering if there’s some way to verify if 
> the cache=none parameter is being respected for my KVM guest’s disk 
> image, or if there are any other configuration/debug steps appropriate for
KVM + virtio + cgroup.

Here's how you can double-check the O_DIRECT flag:

Find the QEMU process PID on the host:

ps aux | grep qemu

Then find the file descriptor of the image file which the QEMU process has
open:

ls -l /proc/$PID/fd

Finally look at the file descriptor flags to confirm it is O_DIRECT:

grep ^flags: /proc/$PID/fdinfo/$FD

Note the flags field is in octal and you're looking for:

#define O_DIRECT        00040000        /* direct disk access hint */

Stefan



* Re: FW: cgroup blkio.weight working, but not for KVM guests
  2012-10-23 22:48     ` Ben Clay
@ 2012-10-24  6:10       ` Stefan Hajnoczi
  2012-10-25 17:13         ` Avi Kivity
  0 siblings, 1 reply; 5+ messages in thread
From: Stefan Hajnoczi @ 2012-10-24  6:10 UTC (permalink / raw)
  To: Ben Clay; +Cc: kvm

On Tue, Oct 23, 2012 at 04:48:13PM -0600, Ben Clay wrote:
> Since this is not an issue, I guess another source of problems could be that
> all the virtio threads attached to this domain are not being placed within
> the cgroup.  I will look through libvirt to see if they're setting the
> guest's process's cgroup classification as sticky (I can't imagine they
> wouldn't be), but this raises another question: are virtio kernel threads
> child processes of the guest's main process?

Virtio kernel threads?

Depending on the qemu-kvm -drive ...,aio=native|threads setting, you should
see either:

1. For aio=native QEMU uses the Linux AIO API.  I think this results in
   kernel threads that process I/O on behalf of the userspace process.

2. For aio=threads QEMU uses its own userspace threadpool to call
   preadv(2)/pwritev(2).  These threads are spawned from QEMU's
   "iothread" event loop.

I suggest you try switching between aio=native and aio=threads to check
if this causes the result you have been seeing.
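
For example, with libvirt the aio mode can be selected from the disk XML (a
sketch; the io attribute is what should map onto QEMU's aio= option):

  <driver name="qemu" type="raw" cache="none" io="threads"/>

which should then show up on the qemu-kvm command line as something like:

  -drive file=/path/to/disk.img,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=threads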

> Are you aware of any other factor which I should be considering here?

No, but I haven't played with the cgroups blkio controller much.

Stefan


* Re: FW: cgroup blkio.weight working, but not for KVM guests
  2012-10-24  6:10       ` Stefan Hajnoczi
@ 2012-10-25 17:13         ` Avi Kivity
  0 siblings, 0 replies; 5+ messages in thread
From: Avi Kivity @ 2012-10-25 17:13 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Ben Clay, kvm

On 10/24/2012 08:10 AM, Stefan Hajnoczi wrote:

> 1. For aio=native QEMU uses the Linux AIO API.  I think this results in
>    kernel threads that process I/O on behalf of the userspace process.

No, the request is submitted directly from io_submit(), and completion
sets the eventfd from irq context.  Usually no threads are involved.


-- 
error compiling committee.c: too many arguments to function

