* [Qemu-devel] Very poor IO performance which looks like some design problem.
@ 2015-04-10 20:38 ein
  2015-04-11 13:09 ` Paolo Bonzini
  2015-04-13  1:45 ` Fam Zheng
  0 siblings, 2 replies; 8+ messages in thread
From: ein @ 2015-04-10 20:38 UTC (permalink / raw)
  To: qemu-devel



Hello Group.

Let me describe my setup first.
The storage base is six SAS drives in RAID50 in an IBM x3650 M3 with an
LSI ServeRAID M5015 controller (FW Package Build: 12.13.0-0179).

Disk specs:
http://www.cnet.com/products/seagate-savvio-10k-4-600gb-sas-2/specs/

I've created six single-drive RAID0 volumes from the SAS drives above;
the reason was the poor performance of the controller itself at every
possible RAID level. Each virtual drive that is a member of my array
looks like this:

Virtual Drive: 3 (Target Id: 3)
Name                :
RAID Level          : Primary-0, Secondary-0, RAID Level Qualifier-0
Size                : 557.861 GB
Sector Size         : 512
Parity Size         : 0
State               : Optimal
Strip Size          : *128 KB*
Number Of Drives    : 1
Span Depth          : 1
Default Cache Policy: *WriteBack*, *ReadAheadNone*, *Cached*, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Cached, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : *Enabled*
Encryption Type     : None
Is VD Cached: No

On those six RAID0 volumes I've created a software RAID (mdadm, Debian 8
testing): two RAID5 arrays, which I then striped, resulting in a RAID50
array:

Personalities : [raid6] [raid5] [raid4] [raid0]
md0 : active raid0 md2[1] md1[0]
      2339045376 blocks super 1.2 512k chunks
     
md2 : active raid5 sdg1[2] sdf1[1] sde1[0]
      1169653760 blocks super 1.2 level 5, 128k chunk, algorithm 2 [3/3]
[UUU]
      bitmap: 1/5 pages [4KB], 65536KB chunk

md1 : active raid5 sdd1[2] sdc1[1] sdb1[0]
      1169653760 blocks super 1.2 level 5, 128k chunk, algorithm 2 [3/3]
[UUU]
      bitmap: 1/5 pages [4KB], 65536KB chunk
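
Roughly the equivalent of (reconstructed from the layout above, not the
exact commands used):

    mdadm --create /dev/md1 --level=5 --raid-devices=3 --chunk=128 /dev/sdb1 /dev/sdc1 /dev/sdd1
    mdadm --create /dev/md2 --level=5 --raid-devices=3 --chunk=128 /dev/sde1 /dev/sdf1 /dev/sdg1
    mdadm --create /dev/md0 --level=0 --raid-devices=2 --chunk=512 /dev/md1 /dev/md2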

On that array I've created an ext2 filesystem:
mkfs.ext2 -b 4096 -E stride=128,stripe-width=512 -vvm1 /dev/mapper/hdd-images -i 4194304

Small benchmarks of sequential read and write (20 GiB, with echo 3 >
/proc/sys/vm/drop_caches before every test; roughly the commands
sketched below):
*1*. Filesystem benchmark:
read 380 MB/s, write 200 MB/s
*2*. LVM volume benchmark:
read 409 MB/s, could not do a write test
*3*. RAID device test:
423 MB/s
*4*. When reading continuously from 4 SAS virtual drives with dd, I was
able to hit the bottleneck of the controller (6GB/s) easily.
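
The individual tests were essentially large sequential dd runs, along
these lines (the paths are examples, not the exact commands):

    sync; echo 3 > /proc/sys/vm/drop_caches
    dd if=/mnt/hdd-images/testfile of=/dev/null bs=1M count=20480                 # sequential read
    dd if=/dev/zero of=/mnt/hdd-images/testfile bs=1M count=20480 conv=fdatasync  # sequential write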

I've installed Windows Server 2012, and I'm having big problems finding
an optimal configuration that maximizes total throughput. The best
performance I got was with this configuration:

qemu-system-x86_64 -enable-kvm -name XXXX -S \
    -machine pc-1.1,accel=kvm,usb=off -cpu host -m 16000 -realtime mlock=off \
    -smp 4,sockets=4,cores=1,threads=1 \
    -uuid d0e14081-b4a0-23b5-ae39-110a686b0e55 -no-user-config -nodefaults \
    -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/acm-server.monitor,server,nowait \
    -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime \
    -no-shutdown -boot strict=on \
    -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
    -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 \
    -drive file=/var/lib/libvirt/images/xxx.img,if=none,id=drive-virtio-disk0,format=raw,cache=unsafe \
    -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
    -drive file=/dev/mapper/hdd-storage,if=none,id=drive-virtio-disk1,format=raw,cache=unsafe \
    -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk1,id=virtio-disk1 \
    -drive file=/var/lib/libvirt/images-hdd/storage.img,if=none,id=drive-virtio-disk2,format=raw,cache=unsafe \
    -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk2,id=virtio-disk2 \
    -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=25 \
    -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:f5:b5:b7,bus=pci.0,addr=0x3 \
    -chardev spicevmc,id=charchannel0,name=vdagent \
    -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 \
    -device usb-tablet,id=input0 \
    -spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on \
    -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,bus=pci.0,addr=0x2 \
    -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 \
    -msg timestamp=on

I was able to get 150 MB/s sequential read in the VM. Then I discovered
something extraordinary: when I limited the CPU count to one instead of
four, disk throughput was almost two times higher. Then I realized
something (see the attached screenshot):

QEMU creates more than 70 threads, and every one of them tries to write
to disk, which results in:
1. High I/O time.
2. Large latency.
3. Poor sequential read/write speeds.

When I limited the number of cores, I guess I limited the number of
threads as well. That's why I got better numbers.
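
The host-side thread count can be checked with, for example:

    ps -L -p $(pidof qemu-system-x86_64)       # list the QEMU threads
    top -H -p $(pidof qemu-system-x86_64)      # watch them live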

I've tried both the aio=native and aio=threads settings combined with
the deadline scheduler. Native AIO was much worse.

The final question: is there any way to prevent QEMU from creating such
a large number of threads when the VM does only one sequential R/W
operation?






[-- Attachment #1.2.2: cehgjdef.png --]
[-- Type: image/png, Size: 279792 bytes --]



* Re: [Qemu-devel] Very poor IO performance which looks like some design problem.
  2015-04-10 20:38 [Qemu-devel] Very poor IO performance which looks like some design problem ein
@ 2015-04-11 13:09 ` Paolo Bonzini
  2015-04-11 17:10   ` ein
  2015-04-13  1:45 ` Fam Zheng
  1 sibling, 1 reply; 8+ messages in thread
From: Paolo Bonzini @ 2015-04-11 13:09 UTC (permalink / raw)
  To: ein, qemu-devel



On 10/04/2015 22:38, ein wrote:
> 
> QEMU creates more than 70 threads, and every one of them tries to write
> to disk, which results in:
> 1. High I/O time.
> 2. Large latency.
> 3. Poor sequential read/write speeds.
> 
> When I limited the number of cores, I guess I limited the number of
> threads as well. That's why I got better numbers.
> 
> I've tried both the aio=native and aio=threads settings combined with
> the deadline scheduler. Native AIO was much worse.
> 
> The final question: is there any way to prevent QEMU from creating such
> a large number of threads when the VM does only one sequential R/W
> operation?

Use "aio=native,cache=none".  If that's not enough, you'll need to use
XFS or a block device; ext4 suffers from spinlock contention on O_DIRECT
I/O.
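
For example, one of the -drive lines from your original command would
then look roughly like:

    -drive file=/dev/mapper/hdd-storage,if=none,id=drive-virtio-disk1,format=raw,cache=none,aio=native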

Paolo


* Re: [Qemu-devel] Very poor IO performance which looks like some design problem.
  2015-04-11 13:09 ` Paolo Bonzini
@ 2015-04-11 17:10   ` ein
  2015-04-11 19:00     ` ein
  0 siblings, 1 reply; 8+ messages in thread
From: ein @ 2015-04-11 17:10 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel


On 04/11/2015 03:09 PM, Paolo Bonzini wrote:
> On 10/04/2015 22:38, ein wrote:
>> QEMU creates more than 70 threads, and every one of them tries to write
>> to disk, which results in:
>> 1. High I/O time.
>> 2. Large latency.
>> 3. Poor sequential read/write speeds.
>>
>> When I limited the number of cores, I guess I limited the number of
>> threads as well. That's why I got better numbers.
>>
>> I've tried both the aio=native and aio=threads settings combined with
>> the deadline scheduler. Native AIO was much worse.
>>
>> The final question: is there any way to prevent QEMU from creating such
>> a large number of threads when the VM does only one sequential R/W
>> operation?
> Use "aio=native,cache=none".  If that's not enough, you'll need to use
> XFS or a block device; ext4 suffers from spinlock contention on O_DIRECT
> I/O.
Hello Paolo, and thank you for the reply.

Firstly, I'm using ext2 now, which gave me more MiB/s than XFS in the
past. I've tried the combination of XFS and a block device with NTFS
(4 KB) on it. I did tests with aio=native,cache=none; the results in
this workload were significantly worse. I don't have the numbers on me
right now, but if somebody is interested I'll redo the tests. From my
experience I can say that disabling every software cache gives a
significant boost in sequential RW ops: I mean the QEMU cache, Linux
kernel dirty pages, or even caching in the VM itself. It somehow makes
the data flow smoother and more stable. Using a cache creates hiccups:
first there's enormous speed for a couple of seconds, more than the
hardware is capable of, then a flush and no data flow at all (or very
little) for a few to over a dozen seconds.
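
One way to bound that burst-then-stall behaviour on the host is to cap
the dirty page cache, e.g. (the values are only an illustration):

    sysctl -w vm.dirty_background_bytes=67108864   # start writeback at 64 MiB
    sysctl -w vm.dirty_bytes=268435456             # throttle writers at 256 MiB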







* Re: [Qemu-devel] Very poor IO performance which looks like some design problem.
  2015-04-11 17:10   ` ein
@ 2015-04-11 19:00     ` ein
  0 siblings, 0 replies; 8+ messages in thread
From: ein @ 2015-04-11 19:00 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel



Let me present some tests. Before every test I dropped the caches on the
host and rebooted the VM. The results are in the attached screenshots
(a write and a read screenshot for each case):

  *1.* 8 cores: copying a large file inside the VM from/to different
       physical disks (source is the soft RAID, destination is an SSD
       in hardware RAID1).
  *2.* 1 core: same copy test.
  *3.* 1 core, image on a block device.
  *4.* 1 core, image on a block device + aio=native,cache=none.
  *5.* 8 cores, disk image on LVM/XFS on the SSD (hw RAID1): copy &
       paste of a big file inside the VM from SSD to SSD (same logical
       volume).
  *6.* 1 core, disk image on LVM/XFS on the SSD (hw RAID1): same copy &
       paste test.

As you can see, there is a significant correlation between the vCPU
count and throughput: increasing the vCPU count *will* decrease
throughput.

On 04/11/2015 07:10 PM, ein wrote:
> On 04/11/2015 03:09 PM, Paolo Bonzini wrote:
>> On 10/04/2015 22:38, ein wrote:
>>> QEMU creates more than 70 threads, and every one of them tries to write
>>> to disk, which results in:
>>> 1. High I/O time.
>>> 2. Large latency.
>>> 3. Poor sequential read/write speeds.
>>>
>>> When I limited the number of cores, I guess I limited the number of
>>> threads as well. That's why I got better numbers.
>>>
>>> I've tried both the aio=native and aio=threads settings combined with
>>> the deadline scheduler. Native AIO was much worse.
>>>
>>> The final question: is there any way to prevent QEMU from creating such
>>> a large number of threads when the VM does only one sequential R/W
>>> operation?
>> Use "aio=native,cache=none".  If that's not enough, you'll need to use
>> XFS or a block device; ext4 suffers from spinlock contention on O_DIRECT
>> I/O.
> Hello Paolo, and thank you for the reply.
>
> Firstly, I'm using ext2 now, which gave me more MiB/s than XFS in the
> past. I've tried the combination of XFS and a block device with NTFS
> (4 KB) on it. I did tests with aio=native,cache=none; the results in
> this workload were significantly worse. I don't have the numbers on me
> right now, but if somebody is interested I'll redo the tests. From my
> experience I can say that disabling every software cache gives a
> significant boost in sequential RW ops: I mean the QEMU cache, Linux
> kernel dirty pages, or even caching in the VM itself. It somehow makes
> the data flow smoother and more stable. Using a cache creates hiccups:
> first there's enormous speed for a couple of seconds, more than the
> hardware is capable of, then a flush and no data flow at all (or very
> little) for a few to over a dozen seconds.
>
>
>
>


[-- Attachments #1.2.2 - #1.2.37: benchmark result screenshots, image/png --]



* Re: [Qemu-devel] Very poor IO performance which looks like some design problem.
  2015-04-10 20:38 [Qemu-devel] Very poor IO performance which looks like some design problem ein
  2015-04-11 13:09 ` Paolo Bonzini
@ 2015-04-13  1:45 ` Fam Zheng
  2015-04-13 12:28   ` ein
  1 sibling, 1 reply; 8+ messages in thread
From: Fam Zheng @ 2015-04-13  1:45 UTC (permalink / raw)
  To: ein; +Cc: qemu-devel

On Fri, 04/10 22:38, ein wrote:
> QEMU creates more than 70 threads, and every one of them tries to write
> to disk, which results in:
> 1. High I/O time.
> 2. Large latency.
> 3. Poor sequential read/write speeds.
> 
> When I limited the number of cores, I guess I limited the number of
> threads as well. That's why I got better numbers.
> 
> I've tried both the aio=native and aio=threads settings combined with
> the deadline scheduler. Native AIO was much worse.
> 
> The final question: is there any way to prevent QEMU from creating such
> a large number of threads when the VM does only one sequential R/W
> operation?

aio=native will make QEMU only submit IO from the IO thread, so you shouldn't
see 70 threads with that. And that should usually be a better option for
performance.
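
You can verify the thread count on the host, e.g.:

    ps -o nlwp= -p $(pidof qemu-system-x86_64)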


Fam


* Re: [Qemu-devel] Very poor IO performance which looks like some design problem.
  2015-04-13  1:45 ` Fam Zheng
@ 2015-04-13 12:28   ` ein
  2015-04-13 13:53     ` Paolo Bonzini
  2015-04-14 10:31     ` Kevin Wolf
  0 siblings, 2 replies; 8+ messages in thread
From: ein @ 2015-04-13 12:28 UTC (permalink / raw)
  To: Fam Zheng; +Cc: qemu-devel



Dear Fam,

Please check out my update:
http://lists.nongnu.org/archive/html/qemu-devel/2015-04/msg01318.html

Using aio=native,cache=none gives me 5x-20x lower performance than bare
metal and 3x-10x lower than aio=threads,cache=unsafe.

I'd like to know what numbers I should expect. Can somebody share the
results you get with aio=native for sequential IO ops? And of course
please add some info about your disk and controller configuration.
Maybe there's a bug in the current version of QEMU in Debian 8
(testing). Package details: https://packages.debian.org/jessie/qemu-kvm
Version: qemu-kvm (1:2.1+dfsg-11)

On 04/13/2015 03:45 AM, Fam Zheng wrote:
> On Fri, 04/10 22:38, ein wrote:
>> QEMU creates more than 70 threads, and every one of them tries to write
>> to disk, which results in:
>> 1. High I/O time.
>> 2. Large latency.
>> 3. Poor sequential read/write speeds.
>>
>> When I limited the number of cores, I guess I limited the number of
>> threads as well. That's why I got better numbers.
>>
>> I've tried both the aio=native and aio=threads settings combined with
>> the deadline scheduler. Native AIO was much worse.
>>
>> The final question: is there any way to prevent QEMU from creating such
>> a large number of threads when the VM does only one sequential R/W
>> operation?
> aio=native will make QEMU only submit IO from the IO thread, so you shouldn't
> see 70 threads with that. And that should usually be a better option for
> performance.
>
>
> Fam





* Re: [Qemu-devel] Very poor IO performance which looks like some design problem.
  2015-04-13 12:28   ` ein
@ 2015-04-13 13:53     ` Paolo Bonzini
  2015-04-14 10:31     ` Kevin Wolf
  1 sibling, 0 replies; 8+ messages in thread
From: Paolo Bonzini @ 2015-04-13 13:53 UTC (permalink / raw)
  To: ein, qemu-devel



On 13/04/2015 14:28, ein wrote:
> 
> 
> Please check out my update:
> http://lists.nongnu.org/archive/html/qemu-devel/2015-04/msg01318.html
> 
> Using aio=native,cache=none gives me 5x-20x lower performance than bare
> metal and 3x-10x lower than aio=threads,cache=unsafe.

Not a surprise that aio=threads,cache=unsafe is faster.  With
cache=unsafe you're telling QEMU that it's okay to lose data in case of
a host power loss.  Same for ext2 over XFS (ext2 isn't even journaled!).

Thus, use XFS and preallocate storage using the "fallocate" command.
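
For example (the size here is only a placeholder):

    fallocate -l 40G /var/lib/libvirt/images/xxx.img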

Paolo


* Re: [Qemu-devel] Very poor IO performance which looks like some design problem.
  2015-04-13 12:28   ` ein
  2015-04-13 13:53     ` Paolo Bonzini
@ 2015-04-14 10:31     ` Kevin Wolf
  1 sibling, 0 replies; 8+ messages in thread
From: Kevin Wolf @ 2015-04-14 10:31 UTC (permalink / raw)
  To: ein; +Cc: Fam Zheng, qemu-devel

Am 13.04.2015 um 14:28 hat ein geschrieben:
> Dear Fam,
> 
> Please check out my update:
> http://lists.nongnu.org/archive/html/qemu-devel/2015-04/msg01318.html
> 
> Using aio=native,cache=none gives me 5x-20x lower performance than bare metal
> and 3x-10x lower than aio=threads,cache=unsafe.

cache=unsafe isn't really useful to compare against because it throws
all flush requests away. If your benchmark includes disk flushes, you're
only cheating yourself with this mode.

It's generally accepted that cache=none,aio=native is the best
configuration for such scenarios. Also make sure to use the right guest
configuration; for example, using the right I/O scheduler can make a
major difference for Linux guests. Not sure how to configure Windows for
best performance, maybe someone else can help with that.
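
For a Linux guest that is something like (the device name will differ):

    cat /sys/block/vda/queue/scheduler
    echo deadline > /sys/block/vda/queue/scheduler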

If you insist on cheating and would like to combine both, you could give
'cache=none,cache.no-flush=on,aio=native' a try.
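
On one of the drives from the original command line that would be
roughly:

    -drive file=/dev/mapper/hdd-storage,if=none,id=drive-virtio-disk1,format=raw,cache=none,cache.no-flush=on,aio=native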

Kevin


Thread overview: 8+ messages
2015-04-10 20:38 [Qemu-devel] Very poor IO performance which looks like some design problem ein
2015-04-11 13:09 ` Paolo Bonzini
2015-04-11 17:10   ` ein
2015-04-11 19:00     ` ein
2015-04-13  1:45 ` Fam Zheng
2015-04-13 12:28   ` ein
2015-04-13 13:53     ` Paolo Bonzini
2015-04-14 10:31     ` Kevin Wolf
