* [Qemu-devel] Disparity between host and guest CPU utilization during disk IO benchmark
From: Nagarajan, Padhu (HPE Storage) @ 2017-07-10 17:27 UTC
  To: qemu-devel


I posted this on qemu-discuss and did not get a response; I'm hoping someone here can offer insights.

I was running an 8K random-read fio benchmark inside the guest with iodepth=32. The device used inside the guest for the test was a virtio-blk device with an iothread enabled, mapped onto a raw block device on the host. While this workload was running, I took a snapshot of the CPU utilization reported by the host and the guest. The guest had 4 cores; top inside the guest shows 3 idle cores and one core 74% utilized by fio (active on core 3). The host had 12 cores, and three of them were completely consumed by three qemu threads; top on the host shows each of these threads utilizing its core at close to 100%. The threads are "CPU 1/KVM", "CPU 3/KVM" and "IO iothread1". The CPU utilization picture on the host side is the same even if I run a light fio workload inside the guest (e.g. iodepth=1).

Why do I see two "CPU/KVM" threads occupying 100% CPU, even though only one core inside the guest is being utilized? Note that I had 'accel=kvm' turned on for the guest.

Why does a "CPU/KVM" thread on the host use 100% CPU, even though the corresponding guest core utilization is far lower?

The host operating environment was Debian 8 and the guest was CentOS 7.3. I was running qemu-2.8.1 on the host. The guest VM had 'accel=kvm' turned on. The CPU utilization snapshot from 'top' on host (left) and guest (right) at the same point in time can be found here: https://ibb.co/ho3q0F

~Padhu.


* Re: [Qemu-devel] Disparity between host and guest CPU utilization during disk IO benchmark
From: Stefan Hajnoczi @ 2017-07-11  9:18 UTC
  To: Nagarajan, Padhu (HPE Storage); +Cc: qemu-devel


On Mon, Jul 10, 2017 at 05:27:15PM +0000, Nagarajan, Padhu (HPE Storage) wrote:
> [...]
>
> Why do I see two "CPU/KVM" threads occupying 100% CPU, even though only one core inside the guest is being utilized? Note that I had 'accel=kvm' turned on for the guest.

fio might be submitting I/O requests on one vcpu while the completion
interrupts are processed on another vcpu.

To discuss further, please post:
1. Full QEMU command-line
2. Full fio command-line and job file (if applicable)
3. Output of cat /proc/interrupts inside the guest after running the
   benchmark (one quick way to capture the delta is sketched below)
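
For example, snapshotting the counters before and after the run makes the per-CPU delta easy to read (a rough sketch, run inside the guest; the file names are arbitrary and "fio ..." stands for your existing job):

  cat /proc/interrupts > /tmp/irq.before
  fio ...                                   # run the benchmark
  cat /proc/interrupts > /tmp/irq.after
  diff /tmp/irq.before /tmp/irq.after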


* Re: [Qemu-devel] Disparity between host and guest CPU utilization during disk IO benchmark
From: Nagarajan, Padhu (HPE Storage) @ 2017-07-13 23:34 UTC
  To: Stefan Hajnoczi; +Cc: qemu-devel

Thanks Stefan. I couldn't get to this earlier. I did another run and took a diff of /proc/interrupts before and after the run; it shows all the interrupts for 'virtio7-req.0' going to CPU1, which I guess explains the "CPU 1/KVM" vcpu utilization on the host.

34:        147     666085          0          0   PCI-MSI-edge      virtio7-req.0
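
For what it's worth, the vector's CPU affinity can also be inspected and changed from inside the guest; IRQ 34 below is simply the number shown above, and spreading it across all four guest CPUs is only an illustration (irqbalance, if running, may override it):

  cat /proc/irq/34/smp_affinity_list
  echo 0-3 > /proc/irq/34/smp_affinity_list    # let guest CPUs 0-3 handle this interrupt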

The only remaining question is the high CPU utilization of the vCPU threads for this workload. Even when I run a light fio workload (a queue depth of 1, which gives 8K IOPS), the vCPU threads are close to 100% utilization. Why is it so high, and does it have an impact on guest code that could be executing on the same CPU?

fio command line: fio --time_based --ioengine=libaio --randrepeat=1 --direct=1 --invalidate=1 --verify=0 --offset=0 --verify_fatal=0 --group_reporting --numjobs=1 --name=randread --rw=randread --blocksize=8K --iodepth=1 --runtime=60 --filename=/dev/vdb

qemu command line: qemu-system-x86_64 -L /usr/share/seabios/ -enable-kvm -name node1,debug-threads=on -name node1 -S -machine pc-i440fx-2.8,accel=kvm,usb=off -cpu SandyBridge -m 7680 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -object iothread,id=iothread1 -object iothread,id=iothread2 -object iothread,id=iothread3 -object iothread,id=iothread4 -uuid XX -nographic -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/node1fs.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device lsi,id=scsi0,bus=pci.0,addr=0x6 -device virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x7 -device virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x8 -drive file=rhel7.qcow2,if=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/dev/sdc,if=none,id=drive-virtio-disk1,format=raw,cache=none,aio=native -device virtio-blk-pci,iothread=iothread1,scsi=off,bus=pci.0,addr=0x17,drive=drive-virtio-disk1,id=virtio-disk1 -drive file=/dev/sdc,if=none,id=drive-scsi1-0-0-0,format=raw,cache=none,aio=native -device scsi-hd,bus=scsi1.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi1-0-0-0,id=scsi1-0-0-0 -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=25 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=XXX,bus=pci.0,addr=0x2 -netdev tap,fd=26,id=hostnet1,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=YYY,bus=pci.0,multifunction=on,addr=0x15 -netdev tap,fd=28,id=hostnet2,vhost=on,vhostfd=29 -device virtio-net-pci,netdev=hostnet2,id=net2,mac=ZZZ,bus=pci.0,multifunction=on,addr=0x16 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 -msg timestamp=on

# qemu-system-x86_64 --version
QEMU emulator version 2.8.0(Debian 1:2.8+dfsg-3~bpo8+1)
Copyright (c) 2003-2016 Fabrice Bellard and the QEMU Project developers

Note that I had the same host block device (/dev/sdc in this case) exposed over both virtio-scsi and virtio-blk to the guest VM for perf comparisons.

I see poor performance for 8K random reads inside the guest over both virtio-scsi and virtio-blk, compared to the host performance. Let me open another thread for that problem, but let me know if something obvious pops up based on the qemu command line.

~Padhu.


* Re: [Qemu-devel] Disparity between host and guest CPU utilization during disk IO benchmark
From: Stefan Hajnoczi @ 2017-07-14 10:03 UTC
  To: Nagarajan, Padhu (HPE Storage); +Cc: qemu-devel

On Fri, Jul 14, 2017 at 12:34 AM, Nagarajan, Padhu (HPE Storage)
<padhu@hpe.com> wrote:
> [...]
>
> The only remaining question is the high CPU utilization of the vCPU threads for this workload. Even when I run a light fio workload (a queue depth of 1, which gives 8K IOPS), the vCPU threads are close to 100% utilization. Why is it so high, and does it have an impact on guest code that could be executing on the same CPU?

100% is high for 8K IOPS.  I wonder what "perf top" shows on the host
while the fio benchmark is running inside the guest.
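
Something along these lines would be a reasonable first look, assuming a single QEMU process on the host (attach to its PID while fio is running in the guest):

  perf top -p "$(pgrep -f qemu-system-x86_64)"

or record a short profile for later inspection:

  perf record -p "$(pgrep -f qemu-system-x86_64)" -- sleep 30
  perf report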

The cross-CPU interrupts you saw suggest you can get better
performance by pinning vcpus and iothreads to host CPUs so that
physical storage interrupts are handled by the vcpu and iothread on
the same host CPU.  See the libvirt documentation for pinning vcpus
and iothreads.
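
With libvirt, for example, something like the following pins vcpu 1 and iothread1 onto adjacent host CPUs; the domain name "node1" and the host CPU numbers are only placeholders, so adjust them to wherever the physical storage interrupts land:

  virsh vcpupin node1 1 10
  virsh iothreadpin node1 1 11

The same can be made persistent via the <cputune> element (vcpupin/iothreadpin) in the domain XML.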

Stefan

^ permalink raw reply	[flat|nested] 4+ messages in thread

