* [Qemu-devel] Very poor IO performance which looks like some design problem.
@ 2015-04-10 20:38 ein
  2015-04-11 13:09 ` Paolo Bonzini
  2015-04-13  1:45 ` Fam Zheng
  0 siblings, 2 replies; 8+ messages in thread
From: ein @ 2015-04-10 20:38 UTC (permalink / raw)
  To: qemu-devel



Hello Group.

Let me describe my setup first.
The storage base is six SAS drives in RAID50 in an IBM x3650 M3 with an
LSI ServeRAID M5015 controller (FW Package Build: 12.13.0-0179).

Disk specs:
http://www.cnet.com/products/seagate-savvio-10k-4-600gb-sas-2/specs/

I've created six single-drive RAID0 volumes from the SAS drives above;
the reason was the poor performance of the controller itself at every
possible RAID level. Each virtual drive that is a member of my array
looks like this:

Virtual Drive: 3 (Target Id: 3)
Name                :
RAID Level          : Primary-0, Secondary-0, RAID Level Qualifier-0
Size                : 557.861 GB
Sector Size         : 512
Parity Size         : 0
State               : Optimal
Strip Size          : *128 KB*
Number Of Drives    : 1
Span Depth          : 1
Default Cache Policy: *WriteBack*, *ReadAheadNone*, *Cached*, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Cached, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : *Enabled*
Encryption Type     : None
Is VD Cached: No

On those six RAID0 volumes I've created a software RAID (mdadm, Debian 8
testing): two RAID5 arrays, which I then striped, resulting in a RAID50
array:

Personalities : [raid6] [raid5] [raid4] [raid0]
md0 : active raid0 md2[1] md1[0]
      2339045376 blocks super 1.2 512k chunks
     
md2 : active raid5 sdg1[2] sdf1[1] sde1[0]
      1169653760 blocks super 1.2 level 5, 128k chunk, algorithm 2 [3/3]
[UUU]
      bitmap: 1/5 pages [4KB], 65536KB chunk

md1 : active raid5 sdd1[2] sdc1[1] sdb1[0]
      1169653760 blocks super 1.2 level 5, 128k chunk, algorithm 2 [3/3]
[UUU]
      bitmap: 1/5 pages [4KB], 65536KB chunk
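
Roughly the equivalent of (reconstructed from the layout above, not the
exact commands used):

    mdadm --create /dev/md1 --level=5 --raid-devices=3 --chunk=128 /dev/sdb1 /dev/sdc1 /dev/sdd1
    mdadm --create /dev/md2 --level=5 --raid-devices=3 --chunk=128 /dev/sde1 /dev/sdf1 /dev/sdg1
    mdadm --create /dev/md0 --level=0 --raid-devices=2 --chunk=512 /dev/md1 /dev/md2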

On that array I've created an ext2 filesystem:
mkfs.ext2 -b 4096 -E stride=128,stripe-width=512 -vvm1 /dev/mapper/hdd-images -i 4194304

Small benchmarks of sequential read and write (20 GiB, with echo 3 >
/proc/sys/vm/drop_caches before every test; roughly the commands
sketched below):
*1*. Filesystem benchmark:
read 380 MB/s, write 200 MB/s
*2*. LVM volume benchmark:
read 409 MB/s, could not do a write test
*3*. RAID device test:
423 MB/s
*4*. When reading continuously from 4 SAS virtual drives with dd, I was
able to hit the bottleneck of the controller (6GB/s) easily.
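
The individual tests were essentially large sequential dd runs, along
these lines (the paths are examples, not the exact commands):

    sync; echo 3 > /proc/sys/vm/drop_caches
    dd if=/mnt/hdd-images/testfile of=/dev/null bs=1M count=20480                 # sequential read
    dd if=/dev/zero of=/mnt/hdd-images/testfile bs=1M count=20480 conv=fdatasync  # sequential write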

I've installed Windows Server 2012, and I'm having big problems finding
an optimal configuration that maximizes total throughput. The best
performance I got was with this configuration:

qemu-system-x86_64 -enable-kvm -name XXXX -S \
    -machine pc-1.1,accel=kvm,usb=off -cpu host -m 16000 -realtime mlock=off \
    -smp 4,sockets=4,cores=1,threads=1 \
    -uuid d0e14081-b4a0-23b5-ae39-110a686b0e55 -no-user-config -nodefaults \
    -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/acm-server.monitor,server,nowait \
    -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime \
    -no-shutdown -boot strict=on \
    -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
    -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 \
    -drive file=/var/lib/libvirt/images/xxx.img,if=none,id=drive-virtio-disk0,format=raw,cache=unsafe \
    -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
    -drive file=/dev/mapper/hdd-storage,if=none,id=drive-virtio-disk1,format=raw,cache=unsafe \
    -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk1,id=virtio-disk1 \
    -drive file=/var/lib/libvirt/images-hdd/storage.img,if=none,id=drive-virtio-disk2,format=raw,cache=unsafe \
    -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk2,id=virtio-disk2 \
    -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=25 \
    -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:f5:b5:b7,bus=pci.0,addr=0x3 \
    -chardev spicevmc,id=charchannel0,name=vdagent \
    -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 \
    -device usb-tablet,id=input0 \
    -spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on \
    -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,bus=pci.0,addr=0x2 \
    -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 \
    -msg timestamp=on

I was able to get 150 MB/s sequential read in the VM. Then I discovered
something extraordinary: when I limited the CPU count to one instead of
four, disk throughput was almost two times higher. Then I realized
something (see the attached screenshot):

QEMU creates more than 70 threads, and every one of them tries to write
to disk, which results in:
1. High I/O time.
2. Large latency.
3. Poor sequential read/write speeds.

When I limited the number of cores, I guess I limited the number of
threads as well. That's why I got better numbers.
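
The host-side thread count can be checked with, for example:

    ps -L -p $(pidof qemu-system-x86_64)       # list the QEMU threads
    top -H -p $(pidof qemu-system-x86_64)      # watch them live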

I've tried both the aio=native and aio=threads settings combined with
the deadline scheduler. Native AIO was much worse.

The final question: is there any way to prevent QEMU from creating such
a large number of threads when the VM does only one sequential R/W
operation?






[-- Attachment #1.2.2: cehgjdef.png --]
[-- Type: image/png, Size: 279792 bytes --]



* Re: [Qemu-devel] Very poor IO performance which looks like some design problem.
  2015-04-10 20:38 [Qemu-devel] Very poor IO performance which looks like some design problem ein
@ 2015-04-11 13:09 ` Paolo Bonzini
  2015-04-11 17:10   ` ein
  2015-04-13  1:45 ` Fam Zheng
  1 sibling, 1 reply; 8+ messages in thread
From: Paolo Bonzini @ 2015-04-11 13:09 UTC (permalink / raw)
  To: ein, qemu-devel



On 10/04/2015 22:38, ein wrote:
> 
> QEMU creates more than 70 threads, and every one of them tries to write
> to disk, which results in:
> 1. High I/O time.
> 2. Large latency.
> 3. Poor sequential read/write speeds.
> 
> When I limited the number of cores, I guess I limited the number of
> threads as well. That's why I got better numbers.
> 
> I've tried both the aio=native and aio=threads settings combined with
> the deadline scheduler. Native AIO was much worse.
> 
> The final question: is there any way to prevent QEMU from creating such
> a large number of threads when the VM does only one sequential R/W
> operation?

Use "aio=native,cache=none".  If that's not enough, you'll need to use
XFS or a block device; ext4 suffers from spinlock contention on O_DIRECT
I/O.
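
For example, one of the -drive lines from your original command would
then look roughly like:

    -drive file=/dev/mapper/hdd-storage,if=none,id=drive-virtio-disk1,format=raw,cache=none,aio=native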

Paolo


* Re: [Qemu-devel] Very poor IO performance which looks like some design problem.
  2015-04-11 13:09 ` Paolo Bonzini
@ 2015-04-11 17:10   ` ein
  2015-04-11 19:00     ` ein
  0 siblings, 1 reply; 8+ messages in thread
From: ein @ 2015-04-11 17:10 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel


On 04/11/2015 03:09 PM, Paolo Bonzini wrote:
> On 10/04/2015 22:38, ein wrote:
>> QEMU creates more than 70 threads, and every one of them tries to write
>> to disk, which results in:
>> 1. High I/O time.
>> 2. Large latency.
>> 3. Poor sequential read/write speeds.
>>
>> When I limited the number of cores, I guess I limited the number of
>> threads as well. That's why I got better numbers.
>>
>> I've tried both the aio=native and aio=threads settings combined with
>> the deadline scheduler. Native AIO was much worse.
>>
>> The final question: is there any way to prevent QEMU from creating such
>> a large number of threads when the VM does only one sequential R/W
>> operation?
> Use "aio=native,cache=none".  If that's not enough, you'll need to use
> XFS or a block device; ext4 suffers from spinlock contention on O_DIRECT
> I/O.
Hello Paolo, and thank you for the reply.

Firstly, I'm using ext2 now, which gave me more MiB/s than XFS in the
past. I've tried the combination of XFS and a block device with NTFS
(4 KB) on it. I did tests with aio=native,cache=none; the results in
this workload were significantly worse. I don't have the numbers on me
right now, but if somebody is interested I'll redo the tests. From my
experience I can say that disabling every software cache gives a
significant boost in sequential RW ops: I mean the QEMU cache, Linux
kernel dirty pages, or even caching in the VM itself. It somehow makes
the data flow smoother and more stable. Using a cache creates hiccups:
first there's enormous speed for a couple of seconds, more than the
hardware is capable of, then a flush and no data flow at all (or very
little) for a few to over a dozen seconds.
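
One way to bound that burst-then-stall behaviour on the host is to cap
the dirty page cache, e.g. (the values are only an illustration):

    sysctl -w vm.dirty_background_bytes=67108864   # start writeback at 64 MiB
    sysctl -w vm.dirty_bytes=268435456             # throttle writers at 256 MiB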







* Re: [Qemu-devel] Very poor IO performance which looks like some design problem.
  2015-04-11 17:10   ` ein
@ 2015-04-11 19:00     ` ein
  0 siblings, 0 replies; 8+ messages in thread
From: ein @ 2015-04-11 19:00 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel



Let me present some tests. Before every test I dropped the caches on the
host and rebooted the VM. The results are in the attached screenshots
(a write and a read screenshot for each case):

  *1.* 8 cores: copying a large file inside the VM from/to different
       physical disks (source is the soft RAID, destination is an SSD
       in hardware RAID1).
  *2.* 1 core: same copy test.
  *3.* 1 core, image on a block device.
  *4.* 1 core, image on a block device + aio=native,cache=none.
  *5.* 8 cores, disk image on LVM/XFS on the SSD (hw RAID1): copy &
       paste of a big file inside the VM from SSD to SSD (same logical
       volume).
  *6.* 1 core, disk image on LVM/XFS on the SSD (hw RAID1): same copy &
       paste test.

As you can see, there is a significant correlation between the vCPU
count and throughput: increasing the vCPU count *will* decrease
throughput.

On 04/11/2015 07:10 PM, ein wrote:
> On 04/11/2015 03:09 PM, Paolo Bonzini wrote:
>> On 10/04/2015 22:38, ein wrote:
>>> QEMU creates more than 70 threads, and every one of them tries to write
>>> to disk, which results in:
>>> 1. High I/O time.
>>> 2. Large latency.
>>> 3. Poor sequential read/write speeds.
>>>
>>> When I limited the number of cores, I guess I limited the number of
>>> threads as well. That's why I got better numbers.
>>>
>>> I've tried both the aio=native and aio=threads settings combined with
>>> the deadline scheduler. Native AIO was much worse.
>>>
>>> The final question: is there any way to prevent QEMU from creating such
>>> a large number of threads when the VM does only one sequential R/W
>>> operation?
>> Use "aio=native,cache=none".  If that's not enough, you'll need to use
>> XFS or a block device; ext4 suffers from spinlock contention on O_DIRECT
>> I/O.
> Hello Paolo, and thank you for the reply.
>
> Firstly, I'm using ext2 now, which gave me more MiB/s than XFS in the
> past. I've tried the combination of XFS and a block device with NTFS
> (4 KB) on it. I did tests with aio=native,cache=none; the results in
> this workload were significantly worse. I don't have the numbers on me
> right now, but if somebody is interested I'll redo the tests. From my
> experience I can say that disabling every software cache gives a
> significant boost in sequential RW ops: I mean the QEMU cache, Linux
> kernel dirty pages, or even caching in the VM itself. It somehow makes
> the data flow smoother and more stable. Using a cache creates hiccups:
> first there's enormous speed for a couple of seconds, more than the
> hardware is capable of, then a flush and no data flow at all (or very
> little) for a few to over a dozen seconds.
>
>
>
>


[-- Attachments #1.2.2 - #1.2.37: benchmark result screenshots, image/png --]



* Re: [Qemu-devel] Very poor IO performance which looks like some design problem.
  2015-04-10 20:38 [Qemu-devel] Very poor IO performance which looks like some design problem ein
  2015-04-11 13:09 ` Paolo Bonzini
@ 2015-04-13  1:45 ` Fam Zheng
  2015-04-13 12:28   ` ein
  1 sibling, 1 reply; 8+ messages in thread
From: Fam Zheng @ 2015-04-13  1:45 UTC (permalink / raw)
  To: ein; +Cc: qemu-devel

On Fri, 04/10 22:38, ein wrote:
> QEMU creates more than 70 threads, and every one of them tries to write
> to disk, which results in:
> 1. High I/O time.
> 2. Large latency.
> 3. Poor sequential read/write speeds.
> 
> When I limited the number of cores, I guess I limited the number of
> threads as well. That's why I got better numbers.
> 
> I've tried both the aio=native and aio=threads settings combined with
> the deadline scheduler. Native AIO was much worse.
> 
> The final question: is there any way to prevent QEMU from creating such
> a large number of threads when the VM does only one sequential R/W
> operation?

aio=native will make QEMU only submit IO from the IO thread, so you shouldn't
see 70 threads with that. And that should usually be a better option for
performance.
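
You can verify the thread count on the host, e.g.:

    ps -o nlwp= -p $(pidof qemu-system-x86_64)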


Fam


* Re: [Qemu-devel] Very poor IO performance which looks like some design problem.
  2015-04-13  1:45 ` Fam Zheng
@ 2015-04-13 12:28   ` ein
  2015-04-13 13:53     ` Paolo Bonzini
  2015-04-14 10:31     ` Kevin Wolf
  0 siblings, 2 replies; 8+ messages in thread
From: ein @ 2015-04-13 12:28 UTC (permalink / raw)
  To: Fam Zheng; +Cc: qemu-devel



Dear Fam,

Please check out my update:
http://lists.nongnu.org/archive/html/qemu-devel/2015-04/msg01318.html

Using aio=native,cache=none gives me 5x-20x lower performance than bare
metal and 3x-10x lower than aio=threads,cache=unsafe.

I'd like to know what numbers I should expect. Can somebody share the
results you get with aio=native for sequential IO ops? And of course
please add some info about your disk and controller configuration.
Maybe there's a bug in the current version of QEMU in Debian 8
(testing). Package details: https://packages.debian.org/jessie/qemu-kvm
Version: qemu-kvm (1:2.1+dfsg-11)

On 04/13/2015 03:45 AM, Fam Zheng wrote:
> On Fri, 04/10 22:38, ein wrote:
>> QEMU creates more than 70 threads, and every one of them tries to write
>> to disk, which results in:
>> 1. High I/O time.
>> 2. Large latency.
>> 3. Poor sequential read/write speeds.
>>
>> When I limited the number of cores, I guess I limited the number of
>> threads as well. That's why I got better numbers.
>>
>> I've tried both the aio=native and aio=threads settings combined with
>> the deadline scheduler. Native AIO was much worse.
>>
>> The final question: is there any way to prevent QEMU from creating such
>> a large number of threads when the VM does only one sequential R/W
>> operation?
> aio=native will make QEMU only submit IO from the IO thread, so you shouldn't
> see 70 threads with that. And that should usually be a better option for
> performance.
>
>
> Fam





* Re: [Qemu-devel] Very poor IO performance which looks like some design problem.
  2015-04-13 12:28   ` ein
@ 2015-04-13 13:53     ` Paolo Bonzini
  2015-04-14 10:31     ` Kevin Wolf
  1 sibling, 0 replies; 8+ messages in thread
From: Paolo Bonzini @ 2015-04-13 13:53 UTC (permalink / raw)
  To: ein, qemu-devel



On 13/04/2015 14:28, ein wrote:
> 
> 
> Please check out my update:
> http://lists.nongnu.org/archive/html/qemu-devel/2015-04/msg01318.html
> 
> Using aio=native,cache=none gives me 5x-20x lower performance than bare
> metal and 3x-10x lower than aio=threads,cache=unsafe.

Not a surprise that aio=threads,cache=unsafe is faster.  With
cache=unsafe you're telling QEMU that it's okay to lose data in case of
a host power loss.  Same for ext2 over XFS (ext2 isn't even journaled!).

Thus, use XFS and preallocate storage using the "fallocate" command.
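
For example (the size here is only a placeholder):

    fallocate -l 40G /var/lib/libvirt/images/xxx.img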

Paolo


* Re: [Qemu-devel] Very poor IO performance which looks like some design problem.
  2015-04-13 12:28   ` ein
  2015-04-13 13:53     ` Paolo Bonzini
@ 2015-04-14 10:31     ` Kevin Wolf
  1 sibling, 0 replies; 8+ messages in thread
From: Kevin Wolf @ 2015-04-14 10:31 UTC (permalink / raw)
  To: ein; +Cc: Fam Zheng, qemu-devel

Am 13.04.2015 um 14:28 hat ein geschrieben:
> Dear Fam,
> 
> Please check out my update:
> http://lists.nongnu.org/archive/html/qemu-devel/2015-04/msg01318.html
> 
> Using aio=native,cache=none gives me 5x-20x lower performance than bare metal
> and 3x-10x lower than aio=threads,cache=unsafe.

cache=unsafe isn't really useful to compare against because it throws
all flush requests away. If your benchmark includes disk flushes, you're
only cheating yourself with this mode.

It's generally accepted that cache=none,aio=native is the best
configuration for such scenarios. Also make sure to use the right guest
configuration; for example, using the right I/O scheduler can make a
major difference for Linux guests. Not sure how to configure Windows for
best performance, maybe someone else can help with that.
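
For a Linux guest that is something like (the device name will differ):

    cat /sys/block/vda/queue/scheduler
    echo deadline > /sys/block/vda/queue/scheduler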

If you insist on cheating and would like to combine both, you could give
'cache=none,cache.no-flush=on,aio=native' a try.
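
On one of the drives from the original command line that would be
roughly:

    -drive file=/dev/mapper/hdd-storage,if=none,id=drive-virtio-disk1,format=raw,cache=none,cache.no-flush=on,aio=native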

Kevin


Thread overview: 8+ messages
2015-04-10 20:38 [Qemu-devel] Very poor IO performance which looks like some design problem ein
2015-04-11 13:09 ` Paolo Bonzini
2015-04-11 17:10   ` ein
2015-04-11 19:00     ` ein
2015-04-13  1:45 ` Fam Zheng
2015-04-13 12:28   ` ein
2015-04-13 13:53     ` Paolo Bonzini
2015-04-14 10:31     ` Kevin Wolf
