* [Qemu-devel] [question] virtio-blk performance degradation happened with virito-serial
@ 2014-08-29  7:45 ` Zhang Haoyu
  0 siblings, 0 replies; 40+ messages in thread
From: Zhang Haoyu @ 2014-08-29  7:45 UTC (permalink / raw)
  To: qemu-devel, kvm

Hi, all

I started a VM with virtio-serial (default number of ports: 31) and found that virtio-blk performance degraded by about 25%. This problem can be reproduced 100% of the time.
without virtio-serial:
4k-read-random 1186 IOPS
with virtio-serial:
4k-read-random 871 IOPS

But if I use the max_ports=2 option to limit the maximum number of virtio-serial ports, the I/O performance degradation is much less severe, about 5%.

Also, no such degradation happens with IDE disks when virtio-serial is present.

[environment]
Host OS: linux-3.10
QEMU: 2.0.1
Guest OS: Windows Server 2008

[qemu command]
/usr/bin/kvm -id 1587174272642 -chardev socket,id=qmp,path=/var/run/qemu-server/1587174272642.qmp,server,nowait -mon chardev=qmp,mode=control -vnc :0,websocket,to=200 -enable-kvm -pidfile /var/run/qemu-server/1587174272642.pid -daemonize -name win2008-32 -smp sockets=1,cores=1 -cpu core2duo -nodefaults -vga cirrus -no-hpet -k en-us -boot menu=on,splash-time=8000 -m 2048 -usb -drive if=none,id=drive-ide0,media=cdrom,aio=native -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0,bootindex=200 -drive file=/sf/data/local/images/host-00e081de43d7/cea072c4294f/win2008-32.vm/vm-disk-1.qcow2,if=none,id=drive-ide2,cache=none,aio=native -device ide-hd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=100 -netdev type=tap,id=net0,ifname=158717427264200,script=/sf/etc/kvm/vtp-bridge -device e1000,mac=FE:FC:FE:D3:F9:2B,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300 -rtc driftfix=slew,clock=rt,base=localtime -global kvm-pit.lost_tick_policy=discard -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1

Any ideas?

Thanks,
Zhang Haoyu

^ permalink raw reply	[flat|nested] 40+ messages in thread
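
For reference, the with/without comparison described above can be reduced to a minimal set of command lines. This is only a sketch: guest.qcow2, the memory size and the machine options are placeholders standing in for the full command line quoted above.

  # baseline: no virtio-serial controller attached
  qemu-system-x86_64 -enable-kvm -m 2048 \
      -drive file=guest.qcow2,if=virtio,cache=none,aio=native

  # same guest with the default virtio-serial controller (max_ports defaults to 31)
  qemu-system-x86_64 -enable-kvm -m 2048 \
      -drive file=guest.qcow2,if=virtio,cache=none,aio=native \
      -device virtio-serial-pci

  # same guest with the controller limited to two ports
  qemu-system-x86_64 -enable-kvm -m 2048 \
      -drive file=guest.qcow2,if=virtio,cache=none,aio=native \
      -device virtio-serial-pci,max_ports=2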

* Re: [Qemu-devel] [question] virtio-blk performance degradation happened with virito-serial
  2014-08-29  7:45 ` [Qemu-devel] " Zhang Haoyu
  (?)
@ 2014-08-29 14:38 ` Amit Shah
  2014-09-01 12:38   ` [Qemu-devel] [question] virtio-blk performance degradationhappened " Zhang Haoyu
  -1 siblings, 1 reply; 40+ messages in thread
From: Amit Shah @ 2014-08-29 14:38 UTC (permalink / raw)
  To: Zhang Haoyu; +Cc: qemu-devel, kvm

On (Fri) 29 Aug 2014 [15:45:30], Zhang Haoyu wrote:
> Hi, all
> 
> I start a VM with virtio-serial (default ports number: 31), and found that virtio-blk performance degradation happened, about 25%, this problem can be reproduced 100%.
> without virtio-serial:
> 4k-read-random 1186 IOPS
> with virtio-serial:
> 4k-read-random 871 IOPS
> 
> but if use max_ports=2 option to limit the max number of virio-serial ports, then the IO performance degradation is not so serious, about 5%.
> 
> And, ide performance degradation does not happen with virtio-serial.

Pretty sure it's related to MSI vectors in use.  It's possible that
the virtio-serial device takes up all the avl vectors in the guests,
leaving old-style irqs for the virtio-blk device.

If you restrict the number of vectors the virtio-serial device gets
(using the -device virtio-serial-pci,vectors= param), does that make
things better for you?


		Amit

^ permalink raw reply	[flat|nested] 40+ messages in thread
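
A sketch of the vectors= suggestion above on the command line; the image path and the vector count of 4 are illustrative, not taken from the report:

  # cap the number of MSI vectors the virtio-serial controller may request
  qemu-system-x86_64 -enable-kvm -m 2048 \
      -drive file=guest.qcow2,if=virtio,cache=none,aio=native \
      -device virtio-serial-pci,vectors=4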

* Re: [Qemu-devel] [question] virtio-blk performance degradationhappened with virito-serial
  2014-08-29 14:38 ` Amit Shah
@ 2014-09-01 12:38   ` Zhang Haoyu
  2014-09-01 12:46     ` Amit Shah
  2014-09-01 12:52     ` [Qemu-devel] [question] virtio-blk performance degradationhappened " Zhang Haoyu
  0 siblings, 2 replies; 40+ messages in thread
From: Zhang Haoyu @ 2014-09-01 12:38 UTC (permalink / raw)
  To: Amit Shah; +Cc: qemu-devel, kvm

>> Hi, all
>> 
>> I start a VM with virtio-serial (default ports number: 31), and found that virtio-blk performance degradation happened, about 25%, this problem can be reproduced 100%.
>> without virtio-serial:
>> 4k-read-random 1186 IOPS
>> with virtio-serial:
>> 4k-read-random 871 IOPS
>> 
>> but if use max_ports=2 option to limit the max number of virio-serial ports, then the IO performance degradation is not so serious, about 5%.
>> 
>> And, ide performance degradation does not happen with virtio-serial.
>
>Pretty sure it's related to MSI vectors in use.  It's possible that
>the virtio-serial device takes up all the avl vectors in the guests,
>leaving old-style irqs for the virtio-blk device.
>
I don't think so.
I used iometer to test the 64k-read (or write) sequential case: if I disable virtio-serial dynamically via Device Manager -> virtio-serial => disable,
the performance improves by about 25% immediately; when I re-enable virtio-serial via Device Manager -> virtio-serial => enable,
the performance drops back to the degraded level again, very noticeably.
So I think it has nothing to do with legacy interrupt mode, right?

I am going to compare the perf top data on QEMU and the perf kvm stat data with virtio-serial enabled vs. disabled in the guest,
and also the perf top data inside the guest for the same two cases.
Any ideas?

Thanks,
Zhang Haoyu
>If you restrict the number of vectors the virtio-serial device gets
>(using the -device virtio-serial-pci,vectors= param), does that make
>things better for you?
>
>
>		Amit


^ permalink raw reply	[flat|nested] 40+ messages in thread
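
The measurements proposed above can be collected with stock perf; a rough sketch, where <qemu-pid> is a placeholder for the QEMU process of the test VM:

  # on the host: guest exit statistics while the benchmark runs (stop with Ctrl-C)
  perf kvm stat record -p <qemu-pid>
  perf kvm stat report

  # on the host: where the QEMU process itself spends time
  perf top -p <qemu-pid>

  # inside a Linux guest: guest-side hot spots
  perf top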

* Re: [Qemu-devel] [question] virtio-blk performance degradationhappened with virito-serial
  2014-09-01 12:38   ` [Qemu-devel] [question] virtio-blk performance degradationhappened " Zhang Haoyu
@ 2014-09-01 12:46     ` Amit Shah
  2014-09-01 12:57       ` [Qemu-devel] [question] virtio-blk performancedegradationhappened " Zhang Haoyu
  2014-09-01 12:52     ` [Qemu-devel] [question] virtio-blk performance degradationhappened " Zhang Haoyu
  1 sibling, 1 reply; 40+ messages in thread
From: Amit Shah @ 2014-09-01 12:46 UTC (permalink / raw)
  To: Zhang Haoyu; +Cc: qemu-devel, kvm

On (Mon) 01 Sep 2014 [20:38:20], Zhang Haoyu wrote:
> >> Hi, all
> >> 
> >> I start a VM with virtio-serial (default ports number: 31), and found that virtio-blk performance degradation happened, about 25%, this problem can be reproduced 100%.
> >> without virtio-serial:
> >> 4k-read-random 1186 IOPS
> >> with virtio-serial:
> >> 4k-read-random 871 IOPS
> >> 
> >> but if use max_ports=2 option to limit the max number of virio-serial ports, then the IO performance degradation is not so serious, about 5%.
> >> 
> >> And, ide performance degradation does not happen with virtio-serial.
> >
> >Pretty sure it's related to MSI vectors in use.  It's possible that
> >the virtio-serial device takes up all the avl vectors in the guests,
> >leaving old-style irqs for the virtio-blk device.
> >
> I don't think so,
> I use iometer to test 64k-read(or write)-sequence case, if I disable the virtio-serial dynamically via device manager->virtio-serial => disable,
> then the performance get promotion about 25% immediately, then I re-enable the virtio-serial via device manager->virtio-serial => enable,
> the performance got back again, very obvious.
> So, I think it has no business with legacy interrupt mode, right?
> 
> I am going to observe the difference of perf top data on qemu and perf kvm stat data when disable/enable virtio-serial in guest,
> and the difference of perf top data on guest when disable/enable virtio-serial in guest,
> any ideas?

So it's a windows guest; it could be something windows driver
specific, then?  Do you see the same on Linux guests too?

		Amit

^ permalink raw reply	[flat|nested] 40+ messages in thread
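
If the test is repeated on a Linux guest, a 4k random-read job roughly equivalent to the iometer case could look like the following sketch; /dev/vdb and the runtime are placeholders, assuming a dedicated virtio-blk data disk:

  fio --name=randread --filename=/dev/vdb --direct=1 \
      --ioengine=psync --rw=randread --bs=4k --iodepth=1 \
      --runtime=60 --time_based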

* Re: [Qemu-devel] [question] virtio-blk performance degradationhappened with virito-serial
  2014-09-01 12:38   ` [Qemu-devel] [question] virtio-blk performance degradationhappened " Zhang Haoyu
  2014-09-01 12:46     ` Amit Shah
@ 2014-09-01 12:52     ` Zhang Haoyu
  2014-09-01 13:09       ` Christian Borntraeger
  2014-09-02  6:36       ` [Qemu-devel] [question] virtio-blk performance degradationhappened " Amit Shah
  1 sibling, 2 replies; 40+ messages in thread
From: Zhang Haoyu @ 2014-09-01 12:52 UTC (permalink / raw)
  To: Zhang Haoyu, Amit Shah; +Cc: qemu-devel, kvm

>>> Hi, all
>>> 
>>> I start a VM with virtio-serial (default ports number: 31), and found that virtio-blk performance degradation happened, about 25%, this problem can be reproduced 100%.
>>> without virtio-serial:
>>> 4k-read-random 1186 IOPS
>>> with virtio-serial:
>>> 4k-read-random 871 IOPS
>>> 
>>> but if use max_ports=2 option to limit the max number of virio-serial ports, then the IO performance degradation is not so serious, about 5%.
>>> 
>>> And, ide performance degradation does not happen with virtio-serial.
>>
>>Pretty sure it's related to MSI vectors in use.  It's possible that
>>the virtio-serial device takes up all the avl vectors in the guests,
>>leaving old-style irqs for the virtio-blk device.
>>
>I don't think so,
>I use iometer to test 64k-read(or write)-sequence case, if I disable the virtio-serial dynamically via device manager->virtio-serial => disable,
>then the performance get promotion about 25% immediately, then I re-enable the virtio-serial via device manager->virtio-serial => enable,
>the performance got back again, very obvious.
One additional note:
Although virtio-serial is enabled, I don't use it at all, yet the degradation still happens.

>So, I think it has no business with legacy interrupt mode, right?
>
>I am going to observe the difference of perf top data on qemu and perf kvm stat data when disable/enable virtio-serial in guest,
>and the difference of perf top data on guest when disable/enable virtio-serial in guest,
>any ideas?
>
>Thanks,
>Zhang Haoyu
>>If you restrict the number of vectors the virtio-serial device gets
>>(using the -device virtio-serial-pci,vectors= param), does that make
>>things better for you?
>>
>>
>>		Amit


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [question] virtio-blk performancedegradationhappened with virito-serial
  2014-09-01 12:46     ` Amit Shah
@ 2014-09-01 12:57       ` Zhang Haoyu
  0 siblings, 0 replies; 40+ messages in thread
From: Zhang Haoyu @ 2014-09-01 12:57 UTC (permalink / raw)
  To: Amit Shah; +Cc: qemu-devel, kvm

>> >> Hi, all
>> >> 
>> >> I start a VM with virtio-serial (default ports number: 31), and found that virtio-blk performance degradation happened, about 25%, this problem can be reproduced 100%.
>> >> without virtio-serial:
>> >> 4k-read-random 1186 IOPS
>> >> with virtio-serial:
>> >> 4k-read-random 871 IOPS
>> >> 
>> >> but if use max_ports=2 option to limit the max number of virio-serial ports, then the IO performance degradation is not so serious, about 5%.
>> >> 
>> >> And, ide performance degradation does not happen with virtio-serial.
>> >
>> >Pretty sure it's related to MSI vectors in use.  It's possible that
>> >the virtio-serial device takes up all the avl vectors in the guests,
>> >leaving old-style irqs for the virtio-blk device.
>> >
>> I don't think so,
>> I use iometer to test 64k-read(or write)-sequence case, if I disable the virtio-serial dynamically via device manager->virtio-serial => disable,
>> then the performance get promotion about 25% immediately, then I re-enable the virtio-serial via device manager->virtio-serial => enable,
>> the performance got back again, very obvious.
>> So, I think it has no business with legacy interrupt mode, right?
>> 
>> I am going to observe the difference of perf top data on qemu and perf kvm stat data when disable/enable virtio-serial in guest,
>> and the difference of perf top data on guest when disable/enable virtio-serial in guest,
>> any ideas?
>
>So it's a windows guest; it could be something windows driver
>specific, then?  Do you see the same on Linux guests too?
>
I suspect windows driver specific, too.
I have not test linux guest, I'll test it later.

Thanks,
Zhang Haoyu
>		Amit


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [question] virtio-blk performance degradationhappened with virito-serial
  2014-09-01 12:52     ` [Qemu-devel] [question] virtio-blk performance degradationhappened " Zhang Haoyu
@ 2014-09-01 13:09       ` Christian Borntraeger
  2014-09-01 13:12         ` Paolo Bonzini
  2014-09-02  6:36       ` [Qemu-devel] [question] virtio-blk performance degradationhappened " Amit Shah
  1 sibling, 1 reply; 40+ messages in thread
From: Christian Borntraeger @ 2014-09-01 13:09 UTC (permalink / raw)
  To: Zhang Haoyu, Amit Shah; +Cc: qemu-devel, kvm

On 01/09/14 14:52, Zhang Haoyu wrote:
>>>> Hi, all
>>>>
>>>> I start a VM with virtio-serial (default ports number: 31), and found that virtio-blk performance degradation happened, about 25%, this problem can be reproduced 100%.
>>>> without virtio-serial:
>>>> 4k-read-random 1186 IOPS
>>>> with virtio-serial:
>>>> 4k-read-random 871 IOPS
>>>>
>>>> but if use max_ports=2 option to limit the max number of virio-serial ports, then the IO performance degradation is not so serious, about 5%.
>>>>
>>>> And, ide performance degradation does not happen with virtio-serial.
>>>
>>> Pretty sure it's related to MSI vectors in use.  It's possible that
>>> the virtio-serial device takes up all the avl vectors in the guests,
>>> leaving old-style irqs for the virtio-blk device.
>>>
>> I don't think so,
>> I use iometer to test 64k-read(or write)-sequence case, if I disable the virtio-serial dynamically via device manager->virtio-serial => disable,
>> then the performance get promotion about 25% immediately, then I re-enable the virtio-serial via device manager->virtio-serial => enable,
>> the performance got back again, very obvious.
> add comments:
> Although the virtio-serial is enabled, I don't use it at all, the degradation still happened.

This is just wild guessing:
If virtio-blk and virtio-serial share an IRQ, the guest operating system has to check each virtqueue for activity. Maybe there is some inefficiency doing that.
AFAIK virtio-serial registers 64 virtqueues (on 31 ports + console) even if everything is unused.

Christian



^ permalink raw reply	[flat|nested] 40+ messages in thread
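
The IRQ-sharing hypothesis above is easy to check from inside a Linux guest; a sketch of the commands:

  # do virtio-serial and virtio-blk end up on a shared legacy IRQ line?
  grep -i virtio /proc/interrupts

  # is MSI-X actually enabled on each virtio PCI function?
  lspci -vv | grep -E 'Virtio|MSI-X'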

* Re: [Qemu-devel] [question] virtio-blk performance degradationhappened with virito-serial
  2014-09-01 13:09       ` Christian Borntraeger
@ 2014-09-01 13:12         ` Paolo Bonzini
  2014-09-01 13:22           ` Christian Borntraeger
  0 siblings, 1 reply; 40+ messages in thread
From: Paolo Bonzini @ 2014-09-01 13:12 UTC (permalink / raw)
  To: Christian Borntraeger, Zhang Haoyu, Amit Shah; +Cc: qemu-devel, kvm

Il 01/09/2014 15:09, Christian Borntraeger ha scritto:
> This is just wild guessing:
> If virtio-blk and virtio-serial share an IRQ, the guest operating system has to check each virtqueue for activity. Maybe there is some inefficiency doing that.
> AFAIK virtio-serial registers 64 virtqueues (on 31 ports + console) even if everything is unused.

That could be the case if MSI is disabled.

Paolo

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [question] virtio-blk performance degradationhappened with virito-serial
  2014-09-01 13:12         ` Paolo Bonzini
@ 2014-09-01 13:22           ` Christian Borntraeger
  2014-09-01 13:29             ` Paolo Bonzini
  2014-09-04  7:56             ` [Qemu-devel] [question] virtio-blk performance degradationhappenedwith virito-serial Zhang Haoyu
  0 siblings, 2 replies; 40+ messages in thread
From: Christian Borntraeger @ 2014-09-01 13:22 UTC (permalink / raw)
  To: Paolo Bonzini, Zhang Haoyu, Amit Shah; +Cc: qemu-devel, kvm

On 01/09/14 15:12, Paolo Bonzini wrote:
> Il 01/09/2014 15:09, Christian Borntraeger ha scritto:
>> This is just wild guessing:
>> If virtio-blk and virtio-serial share an IRQ, the guest operating system has to check each virtqueue for activity. Maybe there is some inefficiency doing that.
>> AFAIK virtio-serial registers 64 virtqueues (on 31 ports + console) even if everything is unused.
> 
> That could be the case if MSI is disabled.
> 
> Paolo
> 

Do the windows virtio drivers enable MSIs, in their inf file?

Christian


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [question] virtio-blk performance degradationhappened with virito-serial
  2014-09-01 13:22           ` Christian Borntraeger
@ 2014-09-01 13:29             ` Paolo Bonzini
  2014-09-01 14:03               ` Christian Borntraeger
  2014-09-04  7:56             ` [Qemu-devel] [question] virtio-blk performance degradationhappenedwith virito-serial Zhang Haoyu
  1 sibling, 1 reply; 40+ messages in thread
From: Paolo Bonzini @ 2014-09-01 13:29 UTC (permalink / raw)
  To: Christian Borntraeger, Zhang Haoyu, Amit Shah; +Cc: qemu-devel, kvm

Il 01/09/2014 15:22, Christian Borntraeger ha scritto:
> > > If virtio-blk and virtio-serial share an IRQ, the guest operating system has to check each virtqueue for activity. Maybe there is some inefficiency doing that.
> > > AFAIK virtio-serial registers 64 virtqueues (on 31 ports + console) even if everything is unused.
> > 
> > That could be the case if MSI is disabled.
> 
> Do the windows virtio drivers enable MSIs, in their inf file?

It depends on the version of the drivers, but it is a reasonable guess
at what differs between Linux and Windows.  Haoyu, can you give us the
output of lspci from a Linux guest?

Paolo

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [question] virtio-blk performance degradationhappened with virito-serial
  2014-09-01 13:29             ` Paolo Bonzini
@ 2014-09-01 14:03               ` Christian Borntraeger
  2014-09-01 14:15                 ` Christian Borntraeger
  0 siblings, 1 reply; 40+ messages in thread
From: Christian Borntraeger @ 2014-09-01 14:03 UTC (permalink / raw)
  To: Paolo Bonzini, Zhang Haoyu, Amit Shah; +Cc: qemu-devel, kvm

On 01/09/14 15:29, Paolo Bonzini wrote:
> Il 01/09/2014 15:22, Christian Borntraeger ha scritto:
>>>> If virtio-blk and virtio-serial share an IRQ, the guest operating system has to check each virtqueue for activity. Maybe there is some inefficiency doing that.
>>>> AFAIK virtio-serial registers 64 virtqueues (on 31 ports + console) even if everything is unused.
>>>
>>> That could be the case if MSI is disabled.
>>
>> Do the windows virtio drivers enable MSIs, in their inf file?
> 
> It depends on the version of the drivers, but it is a reasonable guess
> at what differs between Linux and Windows.  Haoyu, can you give us the
> output of lspci from a Linux guest?
> 
> Paolo

Zhang Haoyu, which virtio drivers did you use?

I just checked the Fedora virtio driver. The INF file does not contain the MSI enablement as described in
http://msdn.microsoft.com/en-us/library/windows/hardware/ff544246%28v=vs.85%29.aspx
That would explain the performance issues - given that the link information is still true.



Christian






^ permalink raw reply	[flat|nested] 40+ messages in thread
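
For reference, the MSI enablement described on the MSDN page linked above boils down to a per-device registry directive in the driver's INF; roughly the following shape, where the section names are illustrative and the message-number limit is only an example:

  [VirtioSerial_Install.NT.HW]
  AddReg = VirtioSerial_MSI_AddReg

  [VirtioSerial_MSI_AddReg]
  HKR, "Interrupt Management", 0x00000010
  HKR, "Interrupt Management\MessageSignaledInterruptProperties", 0x00000010
  HKR, "Interrupt Management\MessageSignaledInterruptProperties", MSISupported, 0x00010001, 1
  HKR, "Interrupt Management\MessageSignaledInterruptProperties", MessageNumberLimit, 0x00010001, 4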

* Re: [Qemu-devel] [question] virtio-blk performance degradationhappened with virito-serial
  2014-09-01 14:03               ` Christian Borntraeger
@ 2014-09-01 14:15                 ` Christian Borntraeger
  0 siblings, 0 replies; 40+ messages in thread
From: Christian Borntraeger @ 2014-09-01 14:15 UTC (permalink / raw)
  To: Paolo Bonzini, Zhang Haoyu, Amit Shah; +Cc: qemu-devel, kvm

On 01/09/14 16:03, Christian Borntraeger wrote:
> On 01/09/14 15:29, Paolo Bonzini wrote:
>> Il 01/09/2014 15:22, Christian Borntraeger ha scritto:
>>>>> If virtio-blk and virtio-serial share an IRQ, the guest operating system has to check each virtqueue for activity. Maybe there is some inefficiency doing that.
>>>>> AFAIK virtio-serial registers 64 virtqueues (on 31 ports + console) even if everything is unused.
>>>>
>>>> That could be the case if MSI is disabled.
>>>
>>> Do the windows virtio drivers enable MSIs, in their inf file?
>>
>> It depends on the version of the drivers, but it is a reasonable guess
>> at what differs between Linux and Windows.  Haoyu, can you give us the
>> output of lspci from a Linux guest?
>>
>> Paolo
> 
> Zhang Haoyu, which virtio drivers did you use?
> 
> I just checked the Fedora virtio driver. The INF file does not contain the MSI enablement as described in
> http://msdn.microsoft.com/en-us/library/windows/hardware/ff544246%28v=vs.85%29.aspx
> That would explain the performance issues - given that the link information is still true.

Sorry, looked at the wrong inf file. The fedora driver does use MSI for serial and block.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [question] virtio-blk performance degradationhappened with virito-serial
  2014-09-01 12:52     ` [Qemu-devel] [question] virtio-blk performance degradationhappened " Zhang Haoyu
  2014-09-01 13:09       ` Christian Borntraeger
@ 2014-09-02  6:36       ` Amit Shah
  2014-09-02 18:05         ` Andrey Korolyov
                           ` (2 more replies)
  1 sibling, 3 replies; 40+ messages in thread
From: Amit Shah @ 2014-09-02  6:36 UTC (permalink / raw)
  To: Zhang Haoyu; +Cc: qemu-devel, kvm

On (Mon) 01 Sep 2014 [20:52:46], Zhang Haoyu wrote:
> >>> Hi, all
> >>> 
> >>> I start a VM with virtio-serial (default ports number: 31), and found that virtio-blk performance degradation happened, about 25%, this problem can be reproduced 100%.
> >>> without virtio-serial:
> >>> 4k-read-random 1186 IOPS
> >>> with virtio-serial:
> >>> 4k-read-random 871 IOPS
> >>> 
> >>> but if use max_ports=2 option to limit the max number of virio-serial ports, then the IO performance degradation is not so serious, about 5%.
> >>> 
> >>> And, ide performance degradation does not happen with virtio-serial.
> >>
> >>Pretty sure it's related to MSI vectors in use.  It's possible that
> >>the virtio-serial device takes up all the avl vectors in the guests,
> >>leaving old-style irqs for the virtio-blk device.
> >>
> >I don't think so,
> >I use iometer to test 64k-read(or write)-sequence case, if I disable the virtio-serial dynamically via device manager->virtio-serial => disable,
> >then the performance get promotion about 25% immediately, then I re-enable the virtio-serial via device manager->virtio-serial => enable,
> >the performance got back again, very obvious.
> add comments:
> Although the virtio-serial is enabled, I don't use it at all, the degradation still happened.

Using the vectors= option as mentioned below, you can restrict the
number of MSI vectors the virtio-serial device gets.  You can then
confirm whether it's MSI that's related to these issues.

> >So, I think it has no business with legacy interrupt mode, right?
> >
> >I am going to observe the difference of perf top data on qemu and perf kvm stat data when disable/enable virtio-serial in guest,
> >and the difference of perf top data on guest when disable/enable virtio-serial in guest,
> >any ideas?
> >
> >Thanks,
> >Zhang Haoyu
> >>If you restrict the number of vectors the virtio-serial device gets
> >>(using the -device virtio-serial-pci,vectors= param), does that make
> >>things better for you?



		Amit

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [question] virtio-blk performance degradationhappened with virito-serial
  2014-09-02  6:36       ` [Qemu-devel] [question] virtio-blk performance degradationhappened " Amit Shah
@ 2014-09-02 18:05         ` Andrey Korolyov
  2014-09-02 18:11             ` [Qemu-devel] " Amit Shah
  2014-09-04  2:20         ` [Qemu-devel] [question] virtio-blk performancedegradationhappened " Zhang Haoyu
  2014-09-19  5:53           ` Fam Zheng
  2 siblings, 1 reply; 40+ messages in thread
From: Andrey Korolyov @ 2014-09-02 18:05 UTC (permalink / raw)
  To: Amit Shah; +Cc: Zhang Haoyu, qemu-devel, kvm

On Tue, Sep 2, 2014 at 10:36 AM, Amit Shah <amit.shah@redhat.com> wrote:
> On (Mon) 01 Sep 2014 [20:52:46], Zhang Haoyu wrote:
>> >>> Hi, all
>> >>>
>> >>> I start a VM with virtio-serial (default ports number: 31), and found that virtio-blk performance degradation happened, about 25%, this problem can be reproduced 100%.
>> >>> without virtio-serial:
>> >>> 4k-read-random 1186 IOPS
>> >>> with virtio-serial:
>> >>> 4k-read-random 871 IOPS
>> >>>
>> >>> but if use max_ports=2 option to limit the max number of virio-serial ports, then the IO performance degradation is not so serious, about 5%.
>> >>>
>> >>> And, ide performance degradation does not happen with virtio-serial.
>> >>
>> >>Pretty sure it's related to MSI vectors in use.  It's possible that
>> >>the virtio-serial device takes up all the avl vectors in the guests,
>> >>leaving old-style irqs for the virtio-blk device.
>> >>
>> >I don't think so,
>> >I use iometer to test 64k-read(or write)-sequence case, if I disable the virtio-serial dynamically via device manager->virtio-serial => disable,
>> >then the performance get promotion about 25% immediately, then I re-enable the virtio-serial via device manager->virtio-serial => enable,
>> >the performance got back again, very obvious.
>> add comments:
>> Although the virtio-serial is enabled, I don't use it at all, the degradation still happened.
>
> Using the vectors= option as mentioned below, you can restrict the
> number of MSI vectors the virtio-serial device gets.  You can then
> confirm whether it's MSI that's related to these issues.
>
>> >So, I think it has no business with legacy interrupt mode, right?
>> >
>> >I am going to observe the difference of perf top data on qemu and perf kvm stat data when disable/enable virtio-serial in guest,
>> >and the difference of perf top data on guest when disable/enable virtio-serial in guest,
>> >any ideas?
>> >
>> >Thanks,
>> >Zhang Haoyu
>> >>If you restrict the number of vectors the virtio-serial device gets
>> >>(using the -device virtio-serial-pci,vectors= param), does that make
>> >>things better for you?
>
>
>
>                 Amit
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


I can confirm serious degradation compared to 1.1 with regular
serial output - I am able to hang the VM indefinitely after some tens
of seconds of continuously printing dmesg output to ttyS0. The VM ate
all available CPU quota during the test and hung for some tens of
seconds, not even responding to regular pings, while progressively
raising CPU consumption up to the limit.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [question] virtio-blk performance degradationhappened with virito-serial
  2014-09-02 18:05         ` Andrey Korolyov
@ 2014-09-02 18:11             ` Amit Shah
  0 siblings, 0 replies; 40+ messages in thread
From: Amit Shah @ 2014-09-02 18:11 UTC (permalink / raw)
  To: Andrey Korolyov; +Cc: Zhang Haoyu, qemu-devel, kvm

On (Tue) 02 Sep 2014 [22:05:45], Andrey Korolyov wrote:

> Can confirm serious degradation comparing to the 1.1 with regular
> serial output  - I am able to hang VM forever after some tens of
> seconds after continuously printing dmest to the ttyS0. VM just ate
> all available CPU quota during test and hanged over some tens of
> seconds, not even responding to regular pings and progressively
> raising CPU consumption up to the limit.

Entirely different to what's being discussed here.  You're observing
slowdown with ttyS0 in the guest -- the isa-serial device.  This
thread is discussing virtio-blk and virtio-serial.

		Amit

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [question] virtio-blk performance degradationhappened with virito-serial
  2014-09-02 18:11             ` [Qemu-devel] " Amit Shah
  (?)
@ 2014-09-02 18:27             ` Andrey Korolyov
  -1 siblings, 0 replies; 40+ messages in thread
From: Andrey Korolyov @ 2014-09-02 18:27 UTC (permalink / raw)
  To: Amit Shah; +Cc: Zhang Haoyu, qemu-devel, kvm

On Tue, Sep 2, 2014 at 10:11 PM, Amit Shah <amit.shah@redhat.com> wrote:
> On (Tue) 02 Sep 2014 [22:05:45], Andrey Korolyov wrote:
>
>> Can confirm serious degradation comparing to the 1.1 with regular
>> serial output  - I am able to hang VM forever after some tens of
>> seconds after continuously printing dmest to the ttyS0. VM just ate
>> all available CPU quota during test and hanged over some tens of
>> seconds, not even responding to regular pings and progressively
>> raising CPU consumption up to the limit.
>
> Entirely different to what's being discussed here.  You're observing
> slowdown with ttyS0 in the guest -- the isa-serial device.  This
> thread is discussing virtio-blk and virtio-serial.
>
>                 Amit

Sorry for the thread hijacking; the problem is definitely not related
to the interrupt rework. I will start a new thread.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [question] virtio-blk performancedegradationhappened with virito-serial
  2014-09-02  6:36       ` [Qemu-devel] [question] virtio-blk performance degradationhappened " Amit Shah
  2014-09-02 18:05         ` Andrey Korolyov
@ 2014-09-04  2:20         ` Zhang Haoyu
  2014-09-19  5:53           ` Fam Zheng
  2 siblings, 0 replies; 40+ messages in thread
From: Zhang Haoyu @ 2014-09-04  2:20 UTC (permalink / raw)
  To: Amit Shah; +Cc: qemu-devel, kvm

>> >>> Hi, all
>> >>> 
>> >>> I start a VM with virtio-serial (default ports number: 31), and found that virtio-blk performance degradation happened, about 25%, this problem can be reproduced 100%.
>> >>> without virtio-serial:
>> >>> 4k-read-random 1186 IOPS
>> >>> with virtio-serial:
>> >>> 4k-read-random 871 IOPS
>> >>> 
>> >>> but if use max_ports=2 option to limit the max number of virio-serial ports, then the IO performance degradation is not so serious, about 5%.
>> >>> 
>> >>> And, ide performance degradation does not happen with virtio-serial.
>> >>
>> >>Pretty sure it's related to MSI vectors in use.  It's possible that
>> >>the virtio-serial device takes up all the avl vectors in the guests,
>> >>leaving old-style irqs for the virtio-blk device.
>> >>
>> >I don't think so,
>> >I use iometer to test 64k-read(or write)-sequence case, if I disable the virtio-serial dynamically via device manager->virtio-serial => disable,
>> >then the performance get promotion about 25% immediately, then I re-enable the virtio-serial via device manager->virtio-serial => enable,
>> >the performance got back again, very obvious.
>> add comments:
>> Although the virtio-serial is enabled, I don't use it at all, the degradation still happened.
>
>Using the vectors= option as mentioned below, you can restrict the
>number of MSI vectors the virtio-serial device gets.  You can then
>confirm whether it's MSI that's related to these issues.
>
I use "-device virtio-serial,vectors=4" instead of "-device virtio-serial", but the degradation still happened, nothing changed.
with virtio-serial enabled:
64k-write-sequence: 4200 IOPS
with virtio-serial disabled:
64k-write-sequence: 5300 IOPS

How can I confirm whether it's MSI in Windows?

Thanks,
Zhang Haoyu

>> >So, I think it has no business with legacy interrupt mode, right?
>> >
>> >I am going to observe the difference of perf top data on qemu and perf kvm stat data when disable/enable virtio-serial in guest,
>> >and the difference of perf top data on guest when disable/enable virtio-serial in guest,
>> >any ideas?
>> >
>> >Thanks,
>> >Zhang Haoyu
>> >>If you restrict the number of vectors the virtio-serial device gets
>> >>(using the -device virtio-serial-pci,vectors= param), does that make
>> >>things better for you?


^ permalink raw reply	[flat|nested] 40+ messages in thread
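
On the question of confirming MSI inside a Windows guest: one common, unofficial check is the device's Resources tab in Device Manager, where an MSI interrupt typically shows up as a large negative IRQ number; the per-device MSISupported value can also be searched for in the registry, for example:

  reg query "HKLM\SYSTEM\CurrentControlSet\Enum\PCI" /s /f MSISupported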

* Re: [Qemu-devel] [question] virtio-blk performance degradationhappenedwith virito-serial
  2014-09-01 13:22           ` Christian Borntraeger
  2014-09-01 13:29             ` Paolo Bonzini
@ 2014-09-04  7:56             ` Zhang Haoyu
  2014-09-07  9:46                 ` Zhang Haoyu
  1 sibling, 1 reply; 40+ messages in thread
From: Zhang Haoyu @ 2014-09-04  7:56 UTC (permalink / raw)
  To: Paolo Bonzini, Christian Borntraeger, Amit Shah; +Cc: qemu-devel, kvm

>> > > If virtio-blk and virtio-serial share an IRQ, the guest operating system has to check each virtqueue for activity. Maybe there is some inefficiency doing that.
>> > > AFAIK virtio-serial registers 64 virtqueues (on 31 ports + console) even if everything is unused.
>> > 
>> > That could be the case if MSI is disabled.
>> 
>> Do the windows virtio drivers enable MSIs, in their inf file?
>
>It depends on the version of the drivers, but it is a reasonable guess
>at what differs between Linux and Windows.  Haoyu, can you give us the
>output of lspci from a Linux guest?
>
I ran a test with fio on a RHEL 6.5 guest, and the same degradation happened there too; it can be reproduced on the RHEL 6.5 guest 100% of the time.
virtio_console module installed:
64K-write-sequence: 285 MBPS, 4380 IOPS
virtio_console module uninstalled:
64K-write-sequence: 370 MBPS, 5670 IOPS

And virtio-blk's interrupt mode is always MSI, regardless of whether the virtio_console module is loaded or not.
25:    2245933   PCI-MSI-edge      virtio1-requests

fio command:
fio -filename /dev/vda -direct=1 -iodepth=1 -thread -rw=write -ioengine=psync -bs=64k -size=30G -numjobs=1 -name=mytest

QEMU comamnd:
/usr/bin/kvm -id 5497356709352 -chardev socket,id=qmp,path=/var/run/qemu-server/5497356709352.qmp,server,nowait -mon chardev=qmp,mode=control -vnc :0,websocket,to=200 -enable-kvm -pidfile /var/run/qemu-server/5497356709352.pid -daemonize -name io-test-rhel-6.5 -smp sockets=1,cores=1 -cpu core2duo -nodefaults -vga cirrus -no-hpet -k en-us -boot menu=on,splash-time=8000 -m 4096 -usb -drive file=/sf/data/local/zhanghaoyu/rhel-server-6.5-x86_64-dvd.iso,if=none,id=drive-ide0,media=cdrom,aio=native,forecast=disable -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0,bootindex=200 -drive file=/sf/data/local/images/host-1051721dff13/io-test-rhel-6.5.vm/vm-disk-1.qcow2,if=none,id=drive-virtio1,cache=none,aio=native -device virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb -drive file=/sf/data/local/images/host-1051721dff13/io-test-rhel-6.5.vm/vm-disk-2.qcow2,if=none,id=drive-virtio2,cache=none,aio=native -device virtio-blk-pci,drive=drive-virtio2,id=virtio2,bus=pci.0,addr=0xc,bootindex=101 -netdev type=tap,id=net0,ifname=164922379979200,script=/sf/etc/kvm/vtp-bridge,vhost=on,vhostforce=on -device virtio-net-pci,mac=FE:FC:FE:C6:47:F6,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300 -rtc driftfix=slew,clock=rt -global kvm-pit.lost_tick_policy=discard -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -chardev socket,path=/run/virtser/1649223799792.sock,server,nowait,id=channelser -device virtio-serial,vectors=4 -device virtserialport,chardev=channelser,name=channelser.virtserial0.0

[environment]
Host:linux-3.10(RHEL7-rc1)
QEMU: qemu-2.0.1
Guest: RHEL6.5

# lspci -tv
-[0000:00]-+-00.0  Intel Corporation 440FX - 82441FX PMC [Natoma]
           +-01.0  Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
           +-01.1  Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
           +-01.2  Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II]
           +-01.3  Intel Corporation 82371AB/EB/MB PIIX4 ACPI
           +-02.0  Cirrus Logic GD 5446
           +-03.0  Red Hat, Inc Virtio console
           +-0b.0  Red Hat, Inc Virtio block device
           +-0c.0  Red Hat, Inc Virtio block device
           \-12.0  Red Hat, Inc Virtio network device

# lspci -vvv
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
        Subsystem: Red Hat, Inc Qemu virtual machine
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
        Subsystem: Red Hat, Inc Qemu virtual machine
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II] (prog-if 80 [Master])
        Subsystem: Red Hat, Inc Qemu virtual machine
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Region 0: [virtual] Memory at 000001f0 (32-bit, non-prefetchable) [size=8]
        Region 1: [virtual] Memory at 000003f0 (type 3, non-prefetchable)
        Region 2: [virtual] Memory at 00000170 (32-bit, non-prefetchable) [size=8]
        Region 3: [virtual] Memory at 00000370 (type 3, non-prefetchable)
        Region 4: I/O ports at c0e0 [size=16]
        Kernel driver in use: ata_piix
        Kernel modules: ata_generic, pata_acpi, ata_piix

00:01.2 USB controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01) (prog-if 00 [UHCI])
        Subsystem: Red Hat, Inc Qemu virtual machine
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin D routed to IRQ 11
        Region 4: I/O ports at c080 [size=32]
        Kernel driver in use: uhci_hcd

00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
        Subsystem: Red Hat, Inc Qemu virtual machine
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 9
        Kernel driver in use: piix4_smbus
        Kernel modules: i2c-piix4

00:02.0 VGA compatible controller: Cirrus Logic GD 5446 (prog-if 00 [VGA controller])
        Subsystem: Red Hat, Inc Device 1100
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Region 0: Memory at fc000000 (32-bit, prefetchable) [size=32M]
        Region 1: Memory at febd0000 (32-bit, non-prefetchable) [size=4K]
        Expansion ROM at febc0000 [disabled] [size=64K]
        Kernel modules: cirrusfb

00:03.0 Communication controller: Red Hat, Inc Virtio console
        Subsystem: Red Hat, Inc Device 0003
        Physical Slot: 3
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 10
        Region 0: I/O ports at c0a0 [size=32]
        Region 1: Memory at febd1000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [40] MSI-X: Enable- Count=4 Masked-
                Vector table: BAR=1 offset=00000000
                PBA: BAR=1 offset=00000800
        Kernel driver in use: virtio-pci
        Kernel modules: virtio_pci

00:0b.0 SCSI storage controller: Red Hat, Inc Virtio block device
        Subsystem: Red Hat, Inc Device 0002
        Physical Slot: 11
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 10
        Region 0: I/O ports at c000 [size=64]
        Region 1: Memory at febd2000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [40] MSI-X: Enable+ Count=2 Masked-
                Vector table: BAR=1 offset=00000000
                PBA: BAR=1 offset=00000800
        Kernel driver in use: virtio-pci
        Kernel modules: virtio_pci

00:0c.0 SCSI storage controller: Red Hat, Inc Virtio block device
        Subsystem: Red Hat, Inc Device 0002
        Physical Slot: 12
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 11
        Region 0: I/O ports at c040 [size=64]
        Region 1: Memory at febd3000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [40] MSI-X: Enable+ Count=2 Masked-
                Vector table: BAR=1 offset=00000000
                PBA: BAR=1 offset=00000800
        Kernel driver in use: virtio-pci
        Kernel modules: virtio_pci

00:12.0 Ethernet controller: Red Hat, Inc Virtio network device
        Subsystem: Red Hat, Inc Device 0001
        Physical Slot: 18
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 10
        Region 0: I/O ports at c0c0 [size=32]
        Region 1: Memory at febd4000 (32-bit, non-prefetchable) [size=4K]
        Expansion ROM at feb80000 [disabled] [size=256K]
        Capabilities: [40] MSI-X: Enable+ Count=3 Masked-
                Vector table: BAR=1 offset=00000000
                PBA: BAR=1 offset=00000800
        Kernel driver in use: virtio-pci
        Kernel modules: virtio_pci

Thanks,
Zhang Haoyu

>Paolo

^ permalink raw reply	[flat|nested] 40+ messages in thread
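
The guest-side comparison in the message above can be driven purely by the module load state; a sketch, assuming virtio_console is built as a module (as on RHEL 6.5) and reusing the fio job from the report:

  # inside the guest: run the job with the virtio console driver removed ...
  modprobe -r virtio_console
  fio -filename /dev/vda -direct=1 -iodepth=1 -thread -rw=write \
      -ioengine=psync -bs=64k -size=30G -numjobs=1 -name=mytest

  # ... and again with it loaded
  modprobe virtio_console
  fio -filename /dev/vda -direct=1 -iodepth=1 -thread -rw=write \
      -ioengine=psync -bs=64k -size=30G -numjobs=1 -name=mytest

  # check the interrupt mode in both cases
  grep -i virtio /proc/interrupts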

* Re: [Qemu-devel] [question] virtio-blk performance degradationhappenedwith virito-serial
  2014-09-04  7:56             ` [Qemu-devel] [question] virtio-blk performance degradationhappenedwith virito-serial Zhang Haoyu
@ 2014-09-07  9:46                 ` Zhang Haoyu
  0 siblings, 0 replies; 40+ messages in thread
From: Zhang Haoyu @ 2014-09-07  9:46 UTC (permalink / raw)
  To: Zhang Haoyu, Paolo Bonzini, Christian Borntraeger, Amit Shah
  Cc: qemu-devel, kvm

Hi, Paolo, Amit,
any ideas?

Thanks,
Zhang Haoyu


On 2014-9-4 15:56, Zhang Haoyu wrote:
>>>>> If virtio-blk and virtio-serial share an IRQ, the guest operating system has to check each virtqueue for activity. Maybe there is some inefficiency doing that.
>>>>> AFAIK virtio-serial registers 64 virtqueues (on 31 ports + console) even if everything is unused.
>>>> That could be the case if MSI is disabled.
>>> Do the windows virtio drivers enable MSIs, in their inf file?
>> It depends on the version of the drivers, but it is a reasonable guess
>> at what differs between Linux and Windows.  Haoyu, can you give us the
>> output of lspci from a Linux guest?
>>
> I made a test with fio on rhel-6.5 guest, the same degradation happened too,  this degradation can be reproduced on rhel6.5 guest 100%.
> virtio_console module installed:
> 64K-write-sequence: 285 MBPS, 4380 IOPS
> virtio_console module uninstalled:
> 64K-write-sequence: 370 MBPS, 5670 IOPS
>
> And, virio-blk's interrupt mode always is MSI, no matter if virtio_console module is installed or uninstalled.
> 25:    2245933   PCI-MSI-edge      virtio1-requests
>
> fio command:
> fio -filename /dev/vda -direct=1 -iodepth=1 -thread -rw=write -ioengine=psync -bs=64k -size=30G -numjobs=1 -name=mytest
>
> QEMU comamnd:
> /usr/bin/kvm -id 5497356709352 -chardev socket,id=qmp,path=/var/run/qemu-server/5497356709352.qmp,server,nowait -mon chardev=qmp,mode=control -vnc :0,websocket,to=200 -enable-kvm -pidfile /var/run/qemu-server/5497356709352.pid -daemonize -name io-test-rhel-6.5 -smp sockets=1,cores=1 -cpu core2duo -nodefaults -vga cirrus -no-hpet -k en-us -boot menu=on,splash-time=8000 -m 4096 -usb -drive file=/sf/data/local/zhanghaoyu/rhel-server-6.5-x86_64-dvd.iso,if=none,id=drive-ide0,media=cdrom,aio=native,forecast=disable -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0,bootindex=200 -drive file=/sf/data/local/images/host-1051721dff13/io-test-rhel-6.5.vm/vm-disk-1.qcow2,if=none,id=drive-virtio1,cache=none,aio=native -device virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb -drive file=/sf/data/local/images/host-1051721dff13/io-test-rhel-6.5.vm/vm-disk-2.qcow2,if=none,id=drive-virtio2,cache=none,aio=native -device virtio-blk-pci,drive=drive-virtio2,id=virtio2,bus=pci.0,addr=0xc,bootindex=101 -netdev type=tap,id=net0,ifname=164922379979200,script=/sf/etc/kvm/vtp-bridge,vhost=on,vhostforce=on -device virtio-net-pci,mac=FE:FC:FE:C6:47:F6,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300 -rtc driftfix=slew,clock=rt -global kvm-pit.lost_tick_policy=discard -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -chardev socket,path=/run/virtser/1649223799792.sock,server,nowait,id=channelser -device virtio-serial,vectors=4 -device virtserialport,chardev=channelser,name=channelser.virtserial0.0
>
> [environment]
> Host:linux-3.10(RHEL7-rc1)
> QEMU: qemu-2.0.1
> Guest: RHEL6.5
>
> # lspci -tv
> -[0000:00]-+-00.0  Intel Corporation 440FX - 82441FX PMC [Natoma]
>            +-01.0  Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
>            +-01.1  Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
>            +-01.2  Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II]
>            +-01.3  Intel Corporation 82371AB/EB/MB PIIX4 ACPI
>            +-02.0  Cirrus Logic GD 5446
>            +-03.0  Red Hat, Inc Virtio console
>            +-0b.0  Red Hat, Inc Virtio block device
>            +-0c.0  Red Hat, Inc Virtio block device
>            \-12.0  Red Hat, Inc Virtio network device
>
> # lspci -vvv
> 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
>         Subsystem: Red Hat, Inc Qemu virtual machine
>         Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
>         Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>
> 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
>         Subsystem: Red Hat, Inc Qemu virtual machine
>         Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
>         Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>
> 00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II] (prog-if 80 [Master])
>         Subsystem: Red Hat, Inc Qemu virtual machine
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
>         Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0
>         Region 0: [virtual] Memory at 000001f0 (32-bit, non-prefetchable) [size=8]
>         Region 1: [virtual] Memory at 000003f0 (type 3, non-prefetchable)
>         Region 2: [virtual] Memory at 00000170 (32-bit, non-prefetchable) [size=8]
>         Region 3: [virtual] Memory at 00000370 (type 3, non-prefetchable)
>         Region 4: I/O ports at c0e0 [size=16]
>         Kernel driver in use: ata_piix
>         Kernel modules: ata_generic, pata_acpi, ata_piix
>
> 00:01.2 USB controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01) (prog-if 00 [UHCI])
>         Subsystem: Red Hat, Inc Qemu virtual machine
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
>         Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0
>         Interrupt: pin D routed to IRQ 11
>         Region 4: I/O ports at c080 [size=32]
>         Kernel driver in use: uhci_hcd
>
> 00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
>         Subsystem: Red Hat, Inc Qemu virtual machine
>         Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
>         Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Interrupt: pin A routed to IRQ 9
>         Kernel driver in use: piix4_smbus
>         Kernel modules: i2c-piix4
>
> 00:02.0 VGA compatible controller: Cirrus Logic GD 5446 (prog-if 00 [VGA controller])
>         Subsystem: Red Hat, Inc Device 1100
>         Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
>         Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Region 0: Memory at fc000000 (32-bit, prefetchable) [size=32M]
>         Region 1: Memory at febd0000 (32-bit, non-prefetchable) [size=4K]
>         Expansion ROM at febc0000 [disabled] [size=64K]
>         Kernel modules: cirrusfb
>
> 00:03.0 Communication controller: Red Hat, Inc Virtio console
>         Subsystem: Red Hat, Inc Device 0003
>         Physical Slot: 3
>         Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Interrupt: pin A routed to IRQ 10
>         Region 0: I/O ports at c0a0 [size=32]
>         Region 1: Memory at febd1000 (32-bit, non-prefetchable) [size=4K]
>         Capabilities: [40] MSI-X: Enable- Count=4 Masked-
>                 Vector table: BAR=1 offset=00000000
>                 PBA: BAR=1 offset=00000800
>         Kernel driver in use: virtio-pci
>         Kernel modules: virtio_pci
>
> 00:0b.0 SCSI storage controller: Red Hat, Inc Virtio block device
>         Subsystem: Red Hat, Inc Device 0002
>         Physical Slot: 11
>         Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Interrupt: pin A routed to IRQ 10
>         Region 0: I/O ports at c000 [size=64]
>         Region 1: Memory at febd2000 (32-bit, non-prefetchable) [size=4K]
>         Capabilities: [40] MSI-X: Enable+ Count=2 Masked-
>                 Vector table: BAR=1 offset=00000000
>                 PBA: BAR=1 offset=00000800
>         Kernel driver in use: virtio-pci
>         Kernel modules: virtio_pci
>
> 00:0c.0 SCSI storage controller: Red Hat, Inc Virtio block device
>         Subsystem: Red Hat, Inc Device 0002
>         Physical Slot: 12
>         Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Interrupt: pin A routed to IRQ 11
>         Region 0: I/O ports at c040 [size=64]
>         Region 1: Memory at febd3000 (32-bit, non-prefetchable) [size=4K]
>         Capabilities: [40] MSI-X: Enable+ Count=2 Masked-
>                 Vector table: BAR=1 offset=00000000
>                 PBA: BAR=1 offset=00000800
>         Kernel driver in use: virtio-pci
>         Kernel modules: virtio_pci
>
> 00:12.0 Ethernet controller: Red Hat, Inc Virtio network device
>         Subsystem: Red Hat, Inc Device 0001
>         Physical Slot: 18
>         Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Interrupt: pin A routed to IRQ 10
>         Region 0: I/O ports at c0c0 [size=32]
>         Region 1: Memory at febd4000 (32-bit, non-prefetchable) [size=4K]
>         Expansion ROM at feb80000 [disabled] [size=256K]
>         Capabilities: [40] MSI-X: Enable+ Count=3 Masked-
>                 Vector table: BAR=1 offset=00000000
>                 PBA: BAR=1 offset=00000800
>         Kernel driver in use: virtio-pci
>         Kernel modules: virtio_pci
>
> Thanks,
> Zhang Haoyu
>
>> Paolo
>
>



^ permalink raw reply	[flat|nested] 40+ messages in thread

>         Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Interrupt: pin A routed to IRQ 9
>         Kernel driver in use: piix4_smbus
>         Kernel modules: i2c-piix4
>
> 00:02.0 VGA compatible controller: Cirrus Logic GD 5446 (prog-if 00 [VGA controller])
>         Subsystem: Red Hat, Inc Device 1100
>         Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
>         Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Region 0: Memory at fc000000 (32-bit, prefetchable) [size=32M]
>         Region 1: Memory at febd0000 (32-bit, non-prefetchable) [size=4K]
>         Expansion ROM at febc0000 [disabled] [size=64K]
>         Kernel modules: cirrusfb
>
> 00:03.0 Communication controller: Red Hat, Inc Virtio console
>         Subsystem: Red Hat, Inc Device 0003
>         Physical Slot: 3
>         Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Interrupt: pin A routed to IRQ 10
>         Region 0: I/O ports at c0a0 [size=32]
>         Region 1: Memory at febd1000 (32-bit, non-prefetchable) [size=4K]
>         Capabilities: [40] MSI-X: Enable- Count=4 Masked-
>                 Vector table: BAR=1 offset=00000000
>                 PBA: BAR=1 offset=00000800
>         Kernel driver in use: virtio-pci
>         Kernel modules: virtio_pci
>
> 00:0b.0 SCSI storage controller: Red Hat, Inc Virtio block device
>         Subsystem: Red Hat, Inc Device 0002
>         Physical Slot: 11
>         Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Interrupt: pin A routed to IRQ 10
>         Region 0: I/O ports at c000 [size=64]
>         Region 1: Memory at febd2000 (32-bit, non-prefetchable) [size=4K]
>         Capabilities: [40] MSI-X: Enable+ Count=2 Masked-
>                 Vector table: BAR=1 offset=00000000
>                 PBA: BAR=1 offset=00000800
>         Kernel driver in use: virtio-pci
>         Kernel modules: virtio_pci
>
> 00:0c.0 SCSI storage controller: Red Hat, Inc Virtio block device
>         Subsystem: Red Hat, Inc Device 0002
>         Physical Slot: 12
>         Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Interrupt: pin A routed to IRQ 11
>         Region 0: I/O ports at c040 [size=64]
>         Region 1: Memory at febd3000 (32-bit, non-prefetchable) [size=4K]
>         Capabilities: [40] MSI-X: Enable+ Count=2 Masked-
>                 Vector table: BAR=1 offset=00000000
>                 PBA: BAR=1 offset=00000800
>         Kernel driver in use: virtio-pci
>         Kernel modules: virtio_pci
>
> 00:12.0 Ethernet controller: Red Hat, Inc Virtio network device
>         Subsystem: Red Hat, Inc Device 0001
>         Physical Slot: 18
>         Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Interrupt: pin A routed to IRQ 10
>         Region 0: I/O ports at c0c0 [size=32]
>         Region 1: Memory at febd4000 (32-bit, non-prefetchable) [size=4K]
>         Expansion ROM at feb80000 [disabled] [size=256K]
>         Capabilities: [40] MSI-X: Enable+ Count=3 Masked-
>                 Vector table: BAR=1 offset=00000000
>                 PBA: BAR=1 offset=00000800
>         Kernel driver in use: virtio-pci
>         Kernel modules: virtio_pci
>
> Thanks,
> Zhang Haoyu
>
>> Paolo
>
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [question] virtio-blk performance degradation happened with virito-serial
  2014-09-07  9:46                 ` Zhang Haoyu
@ 2014-09-11  6:11                   ` Amit Shah
  -1 siblings, 0 replies; 40+ messages in thread
From: Amit Shah @ 2014-09-11  6:11 UTC (permalink / raw)
  To: Zhang Haoyu
  Cc: Zhang Haoyu, Paolo Bonzini, Christian Borntraeger, qemu-devel, kvm

On (Sun) 07 Sep 2014 [17:46:26], Zhang Haoyu wrote:
> Hi, Paolo, Amit,
> any ideas?

I'll check this, thanks for testing with Linux guests.


		Amit

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [question] virtio-blk performance degradation happened with virito-serial
  2014-09-11  6:11                   ` Amit Shah
@ 2014-09-12  3:21                     ` Zhang Haoyu
  -1 siblings, 0 replies; 40+ messages in thread
From: Zhang Haoyu @ 2014-09-12  3:21 UTC (permalink / raw)
  To: Amit Shah, Zhang Haoyu
  Cc: Paolo Bonzini, qemu-devel, kvm, Christian Borntraeger

>>> > > If virtio-blk and virtio-serial share an IRQ, the guest operating system has to check each virtqueue for activity. Maybe there is some inefficiency doing that.
>>> > > AFAIK virtio-serial registers 64 virtqueues (on 31 ports + console) even if everything is unused.
>>> > 
>>> > That could be the case if MSI is disabled.
>>> 
>>> Do the windows virtio drivers enable MSIs, in their inf file?
>>
>>It depends on the version of the drivers, but it is a reasonable guess
>>at what differs between Linux and Windows.  Haoyu, can you give us the
>>output of lspci from a Linux guest?
>>
>I made a test with fio on rhel-6.5 guest, the same degradation happened too,  this degradation can be reproduced on rhel6.5 guest 100%.
>virtio_console module installed:
>64K-write-sequence: 285 MBPS, 4380 IOPS
>virtio_console module uninstalled:
>64K-write-sequence: 370 MBPS, 5670 IOPS
>
I used top -d 1 -H -p <qemu-pid> to monitor the per-thread CPU usage and found that:
virtio_console module installed:
qemu main thread cpu usage: 98%
virtio_console module uninstalled:
qemu main thread cpu usage: 60%

perf top -p <qemu-pid> results:
virtio_console module installed:
   PerfTop:    9868 irqs/sec  kernel:76.4%  exact:  0.0% [4000Hz cycles],  (target_pid: 88381)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------

    11.80%  [kernel]                 [k] _raw_spin_lock_irqsave
     8.42%  [kernel]                 [k] _raw_spin_unlock_irqrestore
     7.33%  [kernel]                 [k] fget_light
     6.28%  [kernel]                 [k] fput
     3.61%  [kernel]                 [k] do_sys_poll
     3.30%  qemu-system-x86_64       [.] qcow2_check_metadata_overlap
     3.10%  [kernel]                 [k] __pollwait
     2.15%  qemu-system-x86_64       [.] qemu_iohandler_poll
     1.44%  libglib-2.0.so.0.3200.4  [.] g_array_append_vals
     1.36%  libc-2.13.so             [.] 0x000000000011fc2a
     1.31%  libpthread-2.13.so       [.] pthread_mutex_lock
     1.24%  libglib-2.0.so.0.3200.4  [.] 0x000000000001f961
     1.20%  libpthread-2.13.so       [.] __pthread_mutex_unlock_usercnt
     0.99%  [kernel]                 [k] eventfd_poll
     0.98%  [vdso]                   [.] 0x0000000000000771
     0.97%  [kernel]                 [k] remove_wait_queue
     0.96%  qemu-system-x86_64       [.] qemu_iohandler_fill
     0.95%  [kernel]                 [k] add_wait_queue
     0.69%  [kernel]                 [k] __srcu_read_lock
     0.58%  [kernel]                 [k] poll_freewait
     0.57%  [kernel]                 [k] _raw_spin_lock_irq
     0.54%  [kernel]                 [k] __srcu_read_unlock
     0.47%  [kernel]                 [k] copy_user_enhanced_fast_string
     0.46%  [kvm_intel]              [k] vmx_vcpu_run
     0.46%  [kvm]                    [k] vcpu_enter_guest
     0.42%  [kernel]                 [k] tcp_poll
     0.41%  [kernel]                 [k] system_call_after_swapgs
     0.40%  libglib-2.0.so.0.3200.4  [.] g_slice_alloc
     0.40%  [kernel]                 [k] system_call
     0.38%  libpthread-2.13.so       [.] 0x000000000000e18d
     0.38%  libglib-2.0.so.0.3200.4  [.] g_slice_free1
     0.38%  qemu-system-x86_64       [.] address_space_translate_internal
     0.38%  [kernel]                 [k] _raw_spin_lock
     0.37%  qemu-system-x86_64       [.] phys_page_find
     0.36%  [kernel]                 [k] get_page_from_freelist
     0.35%  [kernel]                 [k] sock_poll
     0.34%  [kernel]                 [k] fsnotify
     0.31%  libglib-2.0.so.0.3200.4  [.] g_main_context_check
     0.30%  [kernel]                 [k] do_direct_IO
     0.29%  libpthread-2.13.so       [.] pthread_getspecific

virtio_console module uninstalled:
   PerfTop:    9138 irqs/sec  kernel:71.7%  exact:  0.0% [4000Hz cycles],  (target_pid: 88381)
------------------------------------------------------------------------------------------------------------------------------

     5.72%  qemu-system-x86_64       [.] qcow2_check_metadata_overlap
     4.51%  [kernel]                 [k] fget_light
     3.98%  [kernel]                 [k] _raw_spin_lock_irqsave
     2.55%  [kernel]                 [k] fput
     2.48%  libpthread-2.13.so       [.] pthread_mutex_lock
     2.46%  [kernel]                 [k] _raw_spin_unlock_irqrestore
     2.21%  libpthread-2.13.so       [.] __pthread_mutex_unlock_usercnt
     1.71%  [vdso]                   [.] 0x000000000000060c
     1.68%  libc-2.13.so             [.] 0x00000000000e751f
     1.64%  libglib-2.0.so.0.3200.4  [.] 0x000000000004fca0
     1.20%  [kernel]                 [k] __srcu_read_lock
     1.14%  [kernel]                 [k] do_sys_poll
     0.96%  [kernel]                 [k] _raw_spin_lock_irq
     0.95%  [kernel]                 [k] __pollwait
     0.91%  [kernel]                 [k] __srcu_read_unlock
     0.78%  [kernel]                 [k] tcp_poll
     0.74%  [kvm]                    [k] vcpu_enter_guest
     0.73%  [kvm_intel]              [k] vmx_vcpu_run
     0.72%  [kernel]                 [k] _raw_spin_lock
     0.72%  [kernel]                 [k] system_call_after_swapgs
     0.70%  [kernel]                 [k] copy_user_enhanced_fast_string
     0.67%  libglib-2.0.so.0.3200.4  [.] g_slice_free1
     0.66%  libpthread-2.13.so       [.] 0x000000000000e12d
     0.65%  [kernel]                 [k] system_call
     0.61%  [kernel]                 [k] do_direct_IO
     0.57%  qemu-system-x86_64       [.] qemu_iohandler_poll
     0.57%  [kernel]                 [k] fsnotify
     0.54%  libglib-2.0.so.0.3200.4  [.] g_slice_alloc
     0.50%  [kernel]                 [k] vfs_write
     0.49%  libpthread-2.13.so       [.] pthread_getspecific
     0.48%  qemu-system-x86_64       [.] qemu_event_reset
     0.47%  libglib-2.0.so.0.3200.4  [.] g_main_context_check
     0.46%  qemu-system-x86_64       [.] address_space_translate_internal
     0.46%  [kernel]                 [k] sock_poll
     0.46%  libpthread-2.13.so       [.] __pthread_disable_asynccancel
     0.44%  [kernel]                 [k] resched_task
     0.43%  libpthread-2.13.so       [.] __pthread_enable_asynccancel
     0.42%  qemu-system-x86_64       [.] phys_page_find
     0.39%  qemu-system-x86_64       [.] object_dynamic_cast_assert

>And, virtio-blk's interrupt mode is always MSI, no matter whether the virtio_console module is installed or uninstalled.
>25:    2245933   PCI-MSI-edge      virtio1-requests
>
>fio command:
>fio -filename /dev/vda -direct=1 -iodepth=1 -thread -rw=write -ioengine=psync -bs=64k -size=30G -numjobs=1 -name=mytest
>
>QEMU command:
>/usr/bin/kvm -id 5497356709352 -chardev socket,id=qmp,path=/var/run/qemu-server/5497356709352.qmp,server,nowait -mon chardev=qmp,mode=control -vnc :0,websocket,to=200 -enable-kvm -pidfile /var/run/qemu-server/5497356709352.pid -daemonize -name io-test-rhel-6.5 -smp sockets=1,cores=1 -cpu core2duo -nodefaults -vga cirrus -no-hpet -k en-us -boot menu=on,splash-time=8000 -m 4096 -usb -drive file=/sf/data/local/zhanghaoyu/rhel-server-6.5-x86_64-dvd.iso,if=none,id=drive-ide0,media=cdrom,aio=native,forecast=disable -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0,bootindex=200 -drive file=/sf/data/local/images/host-1051721dff13/io-test-rhel-6.5.vm/vm-disk-1.qcow2,if=none,id=drive-virtio1,cache=none,aio=native -device virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb -drive file=/sf/data/local/images/host-1051721dff13/io-test-rhel-6.5.vm/vm-disk-2.qcow2,if=none,id=drive-virtio2,cache=none,aio=native -device virtio-blk-pci,drive=drive-virtio2,id=virtio2,bus=pci.0,addr=0xc,bootindex=101 -netdev type=tap,id=net0,ifname=164922379979200,script=/sf/etc/kvm/vtp-bridge,vhost=on,vhostforce=on -device virtio-net-pci,mac=FE:FC:FE:C6:47:F6,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300 -rtc driftfix=slew,clock=rt -global kvm-pit.lost_tick_policy=discard -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -chardev socket,path=/run/virtser/1649223799792.sock,server,nowait,id=channelser -device virtio-serial,vectors=4 -device virtserialport,chardev=channelser,name=channelser.virtserial0.0
>
>[environment]
>Host:linux-3.10(RHEL7-rc1)
>QEMU: qemu-2.0.1
>Guest: RHEL6.5
>
># lspci -tv
>-[0000:00]-+-00.0  Intel Corporation 440FX - 82441FX PMC [Natoma]
>           +-01.0  Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
>           +-01.1  Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
>           +-01.2  Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II]
>           +-01.3  Intel Corporation 82371AB/EB/MB PIIX4 ACPI
>           +-02.0  Cirrus Logic GD 5446
>           +-03.0  Red Hat, Inc Virtio console
>           +-0b.0  Red Hat, Inc Virtio block device
>           +-0c.0  Red Hat, Inc Virtio block device
>           \-12.0  Red Hat, Inc Virtio network device
>
># lspci -vvv
>00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
>        Subsystem: Red Hat, Inc Qemu virtual machine
>        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
>        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>
>00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
>        Subsystem: Red Hat, Inc Qemu virtual machine
>        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
>        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>
>00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II] (prog-if 80 [Master])
>        Subsystem: Red Hat, Inc Qemu virtual machine
>        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
>        Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>        Latency: 0
>        Region 0: [virtual] Memory at 000001f0 (32-bit, non-prefetchable) [size=8]
>        Region 1: [virtual] Memory at 000003f0 (type 3, non-prefetchable)
>        Region 2: [virtual] Memory at 00000170 (32-bit, non-prefetchable) [size=8]
>        Region 3: [virtual] Memory at 00000370 (type 3, non-prefetchable)
>        Region 4: I/O ports at c0e0 [size=16]
>        Kernel driver in use: ata_piix
>        Kernel modules: ata_generic, pata_acpi, ata_piix
>
>00:01.2 USB controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01) (prog-if 00 [UHCI])
>        Subsystem: Red Hat, Inc Qemu virtual machine
>        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
>        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>        Latency: 0
>        Interrupt: pin D routed to IRQ 11
>        Region 4: I/O ports at c080 [size=32]
>        Kernel driver in use: uhci_hcd
>
>00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
>        Subsystem: Red Hat, Inc Qemu virtual machine
>        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
>        Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>        Interrupt: pin A routed to IRQ 9
>        Kernel driver in use: piix4_smbus
>        Kernel modules: i2c-piix4
>
>00:02.0 VGA compatible controller: Cirrus Logic GD 5446 (prog-if 00 [VGA controller])
>        Subsystem: Red Hat, Inc Device 1100
>        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
>        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>        Region 0: Memory at fc000000 (32-bit, prefetchable) [size=32M]
>        Region 1: Memory at febd0000 (32-bit, non-prefetchable) [size=4K]
>        Expansion ROM at febc0000 [disabled] [size=64K]
>        Kernel modules: cirrusfb
>
>00:03.0 Communication controller: Red Hat, Inc Virtio console
>        Subsystem: Red Hat, Inc Device 0003
>        Physical Slot: 3
>        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
>        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>        Interrupt: pin A routed to IRQ 10
>        Region 0: I/O ports at c0a0 [size=32]
>        Region 1: Memory at febd1000 (32-bit, non-prefetchable) [size=4K]
>        Capabilities: [40] MSI-X: Enable- Count=4 Masked-
>                Vector table: BAR=1 offset=00000000
>                PBA: BAR=1 offset=00000800
>        Kernel driver in use: virtio-pci
>        Kernel modules: virtio_pci
>
>00:0b.0 SCSI storage controller: Red Hat, Inc Virtio block device
>        Subsystem: Red Hat, Inc Device 0002
>        Physical Slot: 11
>        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
>        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>        Interrupt: pin A routed to IRQ 10
>        Region 0: I/O ports at c000 [size=64]
>        Region 1: Memory at febd2000 (32-bit, non-prefetchable) [size=4K]
>        Capabilities: [40] MSI-X: Enable+ Count=2 Masked-
>                Vector table: BAR=1 offset=00000000
>                PBA: BAR=1 offset=00000800
>        Kernel driver in use: virtio-pci
>        Kernel modules: virtio_pci
>
>00:0c.0 SCSI storage controller: Red Hat, Inc Virtio block device
>        Subsystem: Red Hat, Inc Device 0002
>        Physical Slot: 12
>        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
>        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>        Interrupt: pin A routed to IRQ 11
>        Region 0: I/O ports at c040 [size=64]
>        Region 1: Memory at febd3000 (32-bit, non-prefetchable) [size=4K]
>        Capabilities: [40] MSI-X: Enable+ Count=2 Masked-
>                Vector table: BAR=1 offset=00000000
>                PBA: BAR=1 offset=00000800
>        Kernel driver in use: virtio-pci
>        Kernel modules: virtio_pci
>
>00:12.0 Ethernet controller: Red Hat, Inc Virtio network device
>        Subsystem: Red Hat, Inc Device 0001
>        Physical Slot: 18
>        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
>        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>        Interrupt: pin A routed to IRQ 10
>        Region 0: I/O ports at c0c0 [size=32]
>        Region 1: Memory at febd4000 (32-bit, non-prefetchable) [size=4K]
>        Expansion ROM at feb80000 [disabled] [size=256K]
>        Capabilities: [40] MSI-X: Enable+ Count=3 Masked-
>                Vector table: BAR=1 offset=00000000
>                PBA: BAR=1 offset=00000800
>        Kernel driver in use: virtio-pci
>        Kernel modules: virtio_pci
>
>Thanks,
>Zhang Haoyu

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [question] virtio-blk performance degradation happened with virito-serial
  2014-09-12  3:21                     ` [Qemu-devel] " Zhang Haoyu
@ 2014-09-12 12:38                       ` Stefan Hajnoczi
  -1 siblings, 0 replies; 40+ messages in thread
From: Stefan Hajnoczi @ 2014-09-12 12:38 UTC (permalink / raw)
  To: Zhang Haoyu
  Cc: Amit Shah, Zhang Haoyu, Paolo Bonzini, qemu-devel, kvm,
	Christian Borntraeger, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 6938 bytes --]

On Fri, Sep 12, 2014 at 11:21:37AM +0800, Zhang Haoyu wrote:
> >>> > > If virtio-blk and virtio-serial share an IRQ, the guest operating system has to check each virtqueue for activity. Maybe there is some inefficiency doing that.
> >>> > > AFAIK virtio-serial registers 64 virtqueues (on 31 ports + console) even if everything is unused.
> >>> > 
> >>> > That could be the case if MSI is disabled.
> >>> 
> >>> Do the windows virtio drivers enable MSIs, in their inf file?
> >>
> >>It depends on the version of the drivers, but it is a reasonable guess
> >>at what differs between Linux and Windows.  Haoyu, can you give us the
> >>output of lspci from a Linux guest?
> >>
> >I made a test with fio on rhel-6.5 guest, the same degradation happened too,  this degradation can be reproduced on rhel6.5 guest 100%.
> >virtio_console module installed:
> >64K-write-sequence: 285 MBPS, 4380 IOPS
> >virtio_console module uninstalled:
> >64K-write-sequence: 370 MBPS, 5670 IOPS
> >
> I use top -d 1 -H -p <qemu-pid> to monitor the cpu usage, and found that,
> virtio_console module installed:
> qemu main thread cpu usage: 98%
> virtio_console module uninstalled:
> qemu main thread cpu usage: 60%
> 
> perf top -p <qemu-pid> result,
> virtio_console module installed:
>    PerfTop:    9868 irqs/sec  kernel:76.4%  exact:  0.0% [4000Hz cycles],  (target_pid: 88381)
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
> 
>     11.80%  [kernel]                 [k] _raw_spin_lock_irqsave
>      8.42%  [kernel]                 [k] _raw_spin_unlock_irqrestore
>      7.33%  [kernel]                 [k] fget_light
>      6.28%  [kernel]                 [k] fput
>      3.61%  [kernel]                 [k] do_sys_poll
>      3.30%  qemu-system-x86_64       [.] qcow2_check_metadata_overlap
>      3.10%  [kernel]                 [k] __pollwait
>      2.15%  qemu-system-x86_64       [.] qemu_iohandler_poll
>      1.44%  libglib-2.0.so.0.3200.4  [.] g_array_append_vals
>      1.36%  libc-2.13.so             [.] 0x000000000011fc2a
>      1.31%  libpthread-2.13.so       [.] pthread_mutex_lock
>      1.24%  libglib-2.0.so.0.3200.4  [.] 0x000000000001f961
>      1.20%  libpthread-2.13.so       [.] __pthread_mutex_unlock_usercnt
>      0.99%  [kernel]                 [k] eventfd_poll
>      0.98%  [vdso]                   [.] 0x0000000000000771
>      0.97%  [kernel]                 [k] remove_wait_queue
>      0.96%  qemu-system-x86_64       [.] qemu_iohandler_fill
>      0.95%  [kernel]                 [k] add_wait_queue
>      0.69%  [kernel]                 [k] __srcu_read_lock
>      0.58%  [kernel]                 [k] poll_freewait
>      0.57%  [kernel]                 [k] _raw_spin_lock_irq
>      0.54%  [kernel]                 [k] __srcu_read_unlock
>      0.47%  [kernel]                 [k] copy_user_enhanced_fast_string
>      0.46%  [kvm_intel]              [k] vmx_vcpu_run
>      0.46%  [kvm]                    [k] vcpu_enter_guest
>      0.42%  [kernel]                 [k] tcp_poll
>      0.41%  [kernel]                 [k] system_call_after_swapgs
>      0.40%  libglib-2.0.so.0.3200.4  [.] g_slice_alloc
>      0.40%  [kernel]                 [k] system_call
>      0.38%  libpthread-2.13.so       [.] 0x000000000000e18d
>      0.38%  libglib-2.0.so.0.3200.4  [.] g_slice_free1
>      0.38%  qemu-system-x86_64       [.] address_space_translate_internal
>      0.38%  [kernel]                 [k] _raw_spin_lock
>      0.37%  qemu-system-x86_64       [.] phys_page_find
>      0.36%  [kernel]                 [k] get_page_from_freelist
>      0.35%  [kernel]                 [k] sock_poll
>      0.34%  [kernel]                 [k] fsnotify
>      0.31%  libglib-2.0.so.0.3200.4  [.] g_main_context_check
>      0.30%  [kernel]                 [k] do_direct_IO
>      0.29%  libpthread-2.13.so       [.] pthread_getspecific
> 
> virtio_console module uninstalled:
>    PerfTop:    9138 irqs/sec  kernel:71.7%  exact:  0.0% [4000Hz cycles],  (target_pid: 88381)
> ------------------------------------------------------------------------------------------------------------------------------
> 
>      5.72%  qemu-system-x86_64       [.] qcow2_check_metadata_overlap
>      4.51%  [kernel]                 [k] fget_light
>      3.98%  [kernel]                 [k] _raw_spin_lock_irqsave
>      2.55%  [kernel]                 [k] fput
>      2.48%  libpthread-2.13.so       [.] pthread_mutex_lock
>      2.46%  [kernel]                 [k] _raw_spin_unlock_irqrestore
>      2.21%  libpthread-2.13.so       [.] __pthread_mutex_unlock_usercnt
>      1.71%  [vdso]                   [.] 0x000000000000060c
>      1.68%  libc-2.13.so             [.] 0x00000000000e751f
>      1.64%  libglib-2.0.so.0.3200.4  [.] 0x000000000004fca0
>      1.20%  [kernel]                 [k] __srcu_read_lock
>      1.14%  [kernel]                 [k] do_sys_poll
>      0.96%  [kernel]                 [k] _raw_spin_lock_irq
>      0.95%  [kernel]                 [k] __pollwait
>      0.91%  [kernel]                 [k] __srcu_read_unlock
>      0.78%  [kernel]                 [k] tcp_poll
>      0.74%  [kvm]                    [k] vcpu_enter_guest
>      0.73%  [kvm_intel]              [k] vmx_vcpu_run
>      0.72%  [kernel]                 [k] _raw_spin_lock
>      0.72%  [kernel]                 [k] system_call_after_swapgs
>      0.70%  [kernel]                 [k] copy_user_enhanced_fast_string
>      0.67%  libglib-2.0.so.0.3200.4  [.] g_slice_free1
>      0.66%  libpthread-2.13.so       [.] 0x000000000000e12d
>      0.65%  [kernel]                 [k] system_call
>      0.61%  [kernel]                 [k] do_direct_IO
>      0.57%  qemu-system-x86_64       [.] qemu_iohandler_poll
>      0.57%  [kernel]                 [k] fsnotify
>      0.54%  libglib-2.0.so.0.3200.4  [.] g_slice_alloc
>      0.50%  [kernel]                 [k] vfs_write
>      0.49%  libpthread-2.13.so       [.] pthread_getspecific
>      0.48%  qemu-system-x86_64       [.] qemu_event_reset
>      0.47%  libglib-2.0.so.0.3200.4  [.] g_main_context_check
>      0.46%  qemu-system-x86_64       [.] address_space_translate_internal
>      0.46%  [kernel]                 [k] sock_poll
>      0.46%  libpthread-2.13.so       [.] __pthread_disable_asynccancel
>      0.44%  [kernel]                 [k] resched_task
>      0.43%  libpthread-2.13.so       [.] __pthread_enable_asynccancel
>      0.42%  qemu-system-x86_64       [.] phys_page_find
>      0.39%  qemu-system-x86_64       [.] object_dynamic_cast_assert

Max: Unrelated to this performance issue, but I notice that the qcow2
metadata overlap check is high in the host CPU profile.  Have you had
any thoughts about optimizing the check?

Stefan

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [question] virtio-blk performance degradation happened with virito-serial
  2014-09-12 12:38                       ` Stefan Hajnoczi
@ 2014-09-13 17:22                         ` Max Reitz
  -1 siblings, 0 replies; 40+ messages in thread
From: Max Reitz @ 2014-09-13 17:22 UTC (permalink / raw)
  To: Stefan Hajnoczi, Zhang Haoyu
  Cc: Amit Shah, Zhang Haoyu, Paolo Bonzini, qemu-devel, kvm,
	Christian Borntraeger

On 12.09.2014 14:38, Stefan Hajnoczi wrote:
> Max: Unrelated to this performance issue but I notice that the qcow2
> metadata overlap check is high in the host CPU profile.  Have you had
> any thoughts about optimizing the check?
>
> Stefan

In fact, I have done so (albeit only briefly). Instead of gathering all 
the information in the overlap function itself, we could either keep a 
generic list of typed ranges (e.g. "cluster 0: header", "clusters 1 to 
5: L1 table", etc.) or a not-really-bitmap with 4 bits per entry 
specifying the cluster type (header, L1 table, free or data cluster, etc.).

The disadvantage of the former would be that in its simplest form we'd 
have to run through the whole list to find out whether a cluster is 
already reserved for metadata or not. We could easily optimize this by 
keeping the list in order and then performing a binary search.
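
Just to make the first variant concrete, here is a minimal sketch of such 
a sorted range list with a binary-search lookup (purely illustrative C, 
not actual QEMU code; all type and function names are made up):

#include <stdint.h>
#include <stdlib.h>

typedef enum {
    CL_FREE_OR_DATA = 0,
    CL_HEADER,
    CL_L1_TABLE,
    CL_L2_TABLE,
    CL_REFCOUNT_BLOCK,
    /* ... further metadata types ... */
} ClusterType;

/* One entry describes a run of consecutive metadata clusters. */
typedef struct {
    uint64_t    first_cluster;
    uint32_t    count;
    ClusterType type;
} MetadataRange;

/* Kept sorted by first_cluster, entries non-overlapping. */
typedef struct {
    MetadataRange *entries;
    size_t         n;
} MetadataRangeList;

/* Binary search; returns the metadata type covering @cluster,
 * or CL_FREE_OR_DATA if no range contains it. */
static ClusterType lookup_cluster(const MetadataRangeList *l,
                                  uint64_t cluster)
{
    size_t lo = 0, hi = l->n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const MetadataRange *r = &l->entries[mid];

        if (cluster < r->first_cluster) {
            hi = mid;
        } else if (cluster >= r->first_cluster + r->count) {
            lo = mid + 1;
        } else {
            return r->type;
        }
    }
    return CL_FREE_OR_DATA;
}

A real implementation would of course also have to keep the list updated 
when metadata clusters are allocated or freed, which the sketch leaves out.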

The disadvantage of the latter would obviously be its memory size: a 
1 TB image with 64 kB clusters has 16 million clusters, so at 4 bits per 
cluster the bitmap would be 8 MB in size. That could be considered 
acceptable, but I deem it too large. The advantage would be constant 
access time, of course.

We could combine both approaches, that is, using the bitmap as a cache: 
Whenever a cluster is overlap checked, the corresponding bitmap range 
(or "bitmap window") is requested; if that is not available, it is 
generated from the range list and then put into the cache.
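
A minimal sketch of that cache (building on the types and 
lookup_cluster() from the sketch above; again purely illustrative, with 
made-up names, a single cached window and an arbitrarily chosen window 
size of 65536 clusters):

#include <stdbool.h>
#include <stdint.h>

#define WINDOW_BITS     16
#define WINDOW_CLUSTERS (1u << WINDOW_BITS)   /* 65536 clusters per window */

/* 4 bits per cluster, so one window needs WINDOW_CLUSTERS / 2 bytes (32 kB). */
typedef struct {
    bool     valid;
    uint64_t window_index;                    /* which window is cached */
    uint8_t  bits[WINDOW_CLUSTERS / 2];
} BitmapWindow;

/* Generate the 4-bit cluster types of window @index from the range list.
 * A real implementation would walk the list once instead of doing one
 * binary search per cluster. */
static void build_window(const MetadataRangeList *l, BitmapWindow *win,
                         uint64_t index)
{
    uint64_t base = index * (uint64_t)WINDOW_CLUSTERS;

    for (uint64_t i = 0; i < WINDOW_CLUSTERS; i++) {
        uint8_t t = (uint8_t)lookup_cluster(l, base + i) & 0xf;
        if (i % 2 == 0) {
            win->bits[i / 2] = t;             /* low nibble  */
        } else {
            win->bits[i / 2] |= t << 4;       /* high nibble */
        }
    }
    win->window_index = index;
    win->valid = true;
}

/* Constant-time check once the right window is in the cache. */
static ClusterType cached_lookup(const MetadataRangeList *l,
                                 BitmapWindow *win, uint64_t cluster)
{
    uint64_t index  = cluster >> WINDOW_BITS;
    uint64_t offset = cluster & (WINDOW_CLUSTERS - 1);

    if (!win->valid || win->window_index != index) {
        build_window(l, win, index);          /* cache miss */
    }
    return (ClusterType)((win->bits[offset / 2] >> ((offset % 2) * 4)) & 0xf);
}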

The remaining question is how large the range list would be in memory. 
Basically, its size would be comparable to an RLE version of the bitmap. 
In contrast to a raw RLE version, however, we'd have to add the start 
cluster to each entry in order to be able to perform binary search and 
we'd omit free and/or data clusters. So, we'd have 4 bits for the 
cluster type, let's say 12 bits for the cluster count and of course 64 
bits for the first cluster index. Or, for maximum efficiency, we'd have 
64 - 9 - 1 = 54 bits for the cluster index, 4 bits for the type and then 
6 bits for the cluster count. The first variant gives us 10 bytes per 
metadata range, the second 8. Considering one refcount block can handle 
cluster_size / 2 entries and one L2 table can handle cluster_size / 8 
entries, we have (for images with a cluster size of 64 kB) a ratio of 
about 1/32768 refcount blocks per cluster and 1/8192 L2 tables per 
cluster. I guess we therefore have a metadata ratio of about 1/6000. At 
the worst, each metadata cluster requires its own range list entry, 
which for 10 bytes per entry means less than 30 kB for the list of a 1 
TB image with 64 kB clusters. I think that's acceptable.
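
To make the 8-byte variant concrete, the packing could look roughly like 
this (a sketch of the encoding only; these names are made up and do not 
exist in the qcow2 code):

/* Hypothetical 8-byte range-list entry:
 *   54 bits first-cluster index | 4 bits cluster type | 6 bits cluster count
 */
#include <stdint.h>

typedef uint64_t MetaRange;

#define META_RANGE(first, type, count)                \
    (((uint64_t)(first) << 10) |                      \
     (((uint64_t)(type) & 0xf) << 6) |                \
     ((uint64_t)(count) & 0x3f))

#define META_RANGE_FIRST(e)  ((e) >> 10)
#define META_RANGE_TYPE(e)   (((e) >> 6) & 0xf)
#define META_RANGE_COUNT(e)  ((e) & 0x3f)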

We could compress that list even more by making it a real RLE version of 
the bitmap, removing the cluster index from each entry; remember that 
for this mixed range list/bitmap approach we no longer need to be able 
to perform exact binary search but only need to be able to quickly seek 
to the beginning of a bitmap window. This can be achieved by forcing 
breaks in the range list at every window border and keeping track of 
those offsets along with the corresponding bitmap window index. When we 
want to generate a bitmap window, we look up the start offset in the 
range list (constant time), then generate it (linear in the window size), and 
can then perform constant-time lookups for each overlap check in that 
window.
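
As a toy model of that range-list-to-window expansion (caching and the 
forced window breaks omitted; the layout, sizes and names below are all 
made up for illustration and none of this is qcow2 code):

/* Toy model: expand the ranges overlapping one window into a per-cluster
 * type map, then do constant-time lookups inside the window.
 */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define WINDOW_CLUSTERS 64

enum { CT_FREE = 0, CT_HEADER, CT_L1, CT_L2, CT_REFCOUNT_BLOCK };

typedef struct Range {          /* "10-byte variant": explicit start index */
    uint64_t first_cluster;
    uint16_t count;             /* 12 bits would suffice */
    uint8_t  type;              /* 4 bits would suffice */
} Range;

/* Made-up metadata layout: header at 0, L1 at 1..5, one L2 table at 100. */
static const Range ranges[] = {
    { 0,   1, CT_HEADER },
    { 1,   5, CT_L1 },
    { 100, 1, CT_L2 },
};

static void generate_window(uint64_t window, uint8_t *map)
{
    uint64_t base = window * WINDOW_CLUSTERS;
    memset(map, CT_FREE, WINDOW_CLUSTERS);
    for (size_t i = 0; i < sizeof(ranges) / sizeof(ranges[0]); i++) {
        for (uint16_t j = 0; j < ranges[i].count; j++) {
            uint64_t c = ranges[i].first_cluster + j;
            if (c >= base && c < base + WINDOW_CLUSTERS) {
                map[c - base] = ranges[i].type;
            }
        }
    }
}

int main(void)
{
    uint8_t window0[WINDOW_CLUSTERS], window1[WINDOW_CLUSTERS];
    generate_window(0, window0);   /* in a real cache these would be     */
    generate_window(1, window1);   /* generated lazily and kept around   */
    printf("cluster 3:   type %d\n", window0[3]);                    /* L1 */
    printf("cluster 100: type %d\n", window1[100 - WINDOW_CLUSTERS]); /* L2 */
    printf("cluster 42:  type %d\n", window0[42]);             /* free/data */
    return 0;
}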

I think that could greatly speed things up and also allow us to always 
perform range checks on data structures not kept in memory (inactive L1 
and L2 tables). The only question now remaining to me is whether that 
caching is actually feasible or whether binary search into the range 
list (which then would have to include the cluster index for each entry) 
would be faster than generating bitmap windows which might suffer from 
ping-pong effects.

Max

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [question] virtio-blk performance degradation happened with virito-serial
  2014-09-12  3:21                     ` [Qemu-devel] " Zhang Haoyu
@ 2014-09-16 14:59                       ` Zhang Haoyu
  -1 siblings, 0 replies; 40+ messages in thread
From: Zhang Haoyu @ 2014-09-16 14:59 UTC (permalink / raw)
  To: Zhang Haoyu, Amit Shah, Paolo Bonzini
  Cc: qemu-devel, kvm, Christian Borntraeger

>>>>>> If virtio-blk and virtio-serial share an IRQ, the guest operating
>>>>>> system has to check each virtqueue for activity. Maybe there is some
>>>>>> inefficiency doing that.
>>>>>> AFAIK virtio-serial registers 64 virtqueues (on 31 ports +
>>>>>> console) even if everything is unused.
>>>>>
>>>>> That could be the case if MSI is disabled.
>>>>
>>>> Do the windows virtio drivers enable MSIs, in their inf file?
>>>
>>> It depends on the version of the drivers, but it is a reasonable guess
>>> at what differs between Linux and Windows. Haoyu, can you give us the
>>> output of lspci from a Linux guest?
>>>
>> I made a test with fio on rhel-6.5 guest, the same degradation
>> happened too, this degradation can be reproduced on rhel6.5 guest 100%.
>> virtio_console module installed:
>> 64K-write-sequence: 285 MBPS, 4380 IOPS
>> virtio_console module uninstalled:
>> 64K-write-sequence: 370 MBPS, 5670 IOPS
>>
>I use top -d 1 -H -p <qemu-pid> to monitor the cpu usage, and found that,
>virtio_console module installed:
>qemu main thread cpu usage: 98%
>virtio_console module uninstalled:
>qemu main thread cpu usage: 60%
>

I found that the statement "err = register_virtio_driver(&virtio_console);"
in the virtio_console module's init() function causes the degradation:
if I return directly before "err = register_virtio_driver(&virtio_console);",
the degradation disappears; if I return directly after it, the degradation
is still there (see the sketch after the test list below).
I will try the test cases below:
1. Do not emulate the virtio-serial device, then install/uninstall the
virtio_console driver in the guest, to see whether there is a difference
in virtio-blk performance and CPU usage.
2. Do not emulate the virtio-serial device, then install the virtio_balloon
driver (and also do not emulate the virtio-balloon device), to see whether
the virtio-blk performance degradation still happens.
3. Emulate the virtio-balloon device instead of the virtio-serial device,
then see whether virtio-blk performance is hampered.
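
Roughly, the bisection described above looks like this in the virtio_console
module init (a heavily elided sketch of the experiment, not the exact
upstream code; only the position of the early return matters):

/* Sketch of the init() bisection; everything except the early return and
 * register_virtio_driver() is elided.
 */
static int __init init(void)
{
    int err;

    /* ... chardev fops / class / debugfs setup elided ... */

#ifdef RETURN_BEFORE_REGISTER
    return 0;            /* variant (a): virtio-blk degradation disappears */
#endif

    err = register_virtio_driver(&virtio_console);
    if (err < 0)
        return err;      /* error cleanup elided */

    return 0;            /* variant (b): degradation is still present */
}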

Based on the test results, the corresponding analysis will be performed.
Any ideas?

Thanks,
Zhang Haoyu

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [question] virtio-blk performance degradationhappened with virito-serial
  2014-09-02  6:36       ` [Qemu-devel] [question] virtio-blk performance degradationhappened " Amit Shah
@ 2014-09-19  5:53           ` Fam Zheng
  2014-09-04  2:20         ` [Qemu-devel] [question] virtio-blk performancedegradationhappened " Zhang Haoyu
  2014-09-19  5:53           ` Fam Zheng
  2 siblings, 0 replies; 40+ messages in thread
From: Fam Zheng @ 2014-09-19  5:53 UTC (permalink / raw)
  To: Amit Shah; +Cc: Zhang Haoyu, qemu-devel, kvm, Paolo Bonzini

On Tue, 09/02 12:06, Amit Shah wrote:
> On (Mon) 01 Sep 2014 [20:52:46], Zhang Haoyu wrote:
> > >>> Hi, all
> > >>> 
> > >>> I start a VM with virtio-serial (default ports number: 31), and found that virtio-blk performance degradation happened, about 25%, this problem can be reproduced 100%.
> > >>> without virtio-serial:
> > >>> 4k-read-random 1186 IOPS
> > >>> with virtio-serial:
> > >>> 4k-read-random 871 IOPS
> > >>> 
> > >>> but if use max_ports=2 option to limit the max number of virio-serial ports, then the IO performance degradation is not so serious, about 5%.
> > >>> 
> > >>> And, ide performance degradation does not happen with virtio-serial.
> > >>
> > >>Pretty sure it's related to MSI vectors in use.  It's possible that
> > >>the virtio-serial device takes up all the avl vectors in the guests,
> > >>leaving old-style irqs for the virtio-blk device.
> > >>
> > >I don't think so,
> > >I use iometer to test 64k-read(or write)-sequence case, if I disable the virtio-serial dynamically via device manager->virtio-serial => disable,
> > >then the performance get promotion about 25% immediately, then I re-enable the virtio-serial via device manager->virtio-serial => enable,
> > >the performance got back again, very obvious.
> > add comments:
> > Although the virtio-serial is enabled, I don't use it at all, the degradation still happened.
> 
> Using the vectors= option as mentioned below, you can restrict the
> number of MSI vectors the virtio-serial device gets.  You can then
> confirm whether it's MSI that's related to these issues.

Amit,

It's related to the large number of ioeventfds used by virtio-serial-pci. With
virtio-serial-pci's ioeventfd=off, the performance is not affected, no matter
whether the guest initializes the device or not.

In my test, there are 12 fds to poll in qemu_poll_ns before the guest loads
virtio_console.ko, whereas there are 76 once virtio_console is modprobed.

It looks like ppoll takes more time when it has more fds to poll.

Some trace data with systemtap:

12 fds:

time  rel_time      symbol
15    (+1)          qemu_poll_ns  [enter]
18    (+3)          qemu_poll_ns  [return]

76 fds:

12    (+2)          qemu_poll_ns  [enter]
18    (+6)          qemu_poll_ns  [return]

I haven't looked at the virtio-serial code yet, so I'm not sure whether we
should reduce the number of ioeventfds in virtio-serial-pci or focus on
lower-level efficiency.

I haven't compared with g_poll, but I think the underlying syscall should be
the same.
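
To see the scan cost outside QEMU, here is a tiny standalone sketch (my own
toy benchmark, not QEMU code; the eventfds just stand in for the real
descriptors and the counts mirror the 12 vs. 76 above):

#define _GNU_SOURCE
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/eventfd.h>
#include <time.h>
#include <unistd.h>

/* Time a single ppoll() pass over nfds idle eventfds (zero timeout, so the
 * measured cost is just the fd scan, not any sleeping).
 */
static long long poll_once_ns(int nfds)
{
    struct pollfd *fds = calloc(nfds, sizeof(*fds));
    struct timespec t0, t1, timeout = { 0, 0 };
    long long ns;

    for (int i = 0; i < nfds; i++) {
        fds[i].fd = eventfd(0, 0);      /* never signalled */
        fds[i].events = POLLIN;
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    ppoll(fds, nfds, &timeout, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    ns = (t1.tv_sec - t0.tv_sec) * 1000000000LL + (t1.tv_nsec - t0.tv_nsec);

    for (int i = 0; i < nfds; i++) {
        close(fds[i].fd);
    }
    free(fds);
    return ns;
}

int main(void)
{
    printf("12 fds: %lld ns per ppoll\n", poll_once_ns(12));
    printf("76 fds: %lld ns per ppoll\n", poll_once_ns(76));
    return 0;
}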

Any ideas?

Fam


> 
> > >So, I think it has no business with legacy interrupt mode, right?
> > >
> > >I am going to observe the difference of perf top data on qemu and perf kvm stat data when disable/enable virtio-serial in guest,
> > >and the difference of perf top data on guest when disable/enable virtio-serial in guest,
> > >any ideas?
> > >
> > >Thanks,
> > >Zhang Haoyu
> > >>If you restrict the number of vectors the virtio-serial device gets
> > >>(using the -device virtio-serial-pci,vectors= param), does that make
> > >>things better for you?
> 
> 
> 
> 		Amit
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [question] virtio-blk performance degradationhappened with virito-serial
  2014-09-19  5:53           ` Fam Zheng
@ 2014-09-19 13:35             ` Paolo Bonzini
  -1 siblings, 0 replies; 40+ messages in thread
From: Paolo Bonzini @ 2014-09-19 13:35 UTC (permalink / raw)
  To: Fam Zheng, Amit Shah; +Cc: Zhang Haoyu, qemu-devel, kvm

On 19/09/2014 07:53, Fam Zheng wrote:
> Any ideas?

The obvious, but hardish one is to switch to epoll (one epoll fd per
AioContext, plus one for iohandler.c).

This would require converting iohandler.c to a GSource.
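
For reference, the bare syscall pattern behind that suggestion is something
like this (a minimal standalone sketch; it does not touch AioContext,
GSource, or iohandler.c at all):

/* Minimal epoll sketch: descriptors are registered once with the kernel,
 * and each wait returns only the ready ones, so the per-iteration cost does
 * not grow with the number of registered-but-idle fds the way ppoll's
 * linear scan does.
 */
#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

int main(void)
{
    int epfd = epoll_create1(0);
    int fds[2];
    struct epoll_event ev, events[8];

    if (pipe(fds) < 0) {
        return 1;
    }
    ev.events = EPOLLIN;
    ev.data.fd = fds[0];
    epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev);  /* register once */

    write(fds[1], "x", 1);                        /* make fds[0] readable */
    int n = epoll_wait(epfd, events, 8, -1);      /* returns only ready fds */
    printf("%d fd(s) ready, first is fd %d\n", n, events[0].data.fd);

    close(fds[0]);
    close(fds[1]);
    close(epfd);
    return 0;
}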

Paolo

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [question] virtio-blk performancedegradationhappened with virito-serial
  2014-09-19  5:53           ` Fam Zheng
@ 2014-09-22 13:23             ` Zhang Haoyu
  -1 siblings, 0 replies; 40+ messages in thread
From: Zhang Haoyu @ 2014-09-22 13:23 UTC (permalink / raw)
  To: Fam Zheng, Amit Shah; +Cc: qemu-devel, kvm, Paolo Bonzini

>> > >>> Hi, all
>> > >>> 
>> > >>> I start a VM with virtio-serial (default ports number: 31), and found that virtio-blk performance degradation happened, about 25%, this problem can be reproduced 100%.
>> > >>> without virtio-serial:
>> > >>> 4k-read-random 1186 IOPS
>> > >>> with virtio-serial:
>> > >>> 4k-read-random 871 IOPS
>> > >>> 
>> > >>> but if use max_ports=2 option to limit the max number of virio-serial ports, then the IO performance degradation is not so serious, about 5%.
>> > >>> 
>> > >>> And, ide performance degradation does not happen with virtio-serial.
>> > >>
>> > >>Pretty sure it's related to MSI vectors in use.  It's possible that
>> > >>the virtio-serial device takes up all the avl vectors in the guests,
>> > >>leaving old-style irqs for the virtio-blk device.
>> > >>
>> > >I don't think so,
>> > >I use iometer to test 64k-read(or write)-sequence case, if I disable the virtio-serial dynamically via device manager->virtio-serial => disable,
>> > >then the performance get promotion about 25% immediately, then I re-enable the virtio-serial via device manager->virtio-serial => enable,
>> > >the performance got back again, very obvious.
>> > add comments:
>> > Although the virtio-serial is enabled, I don't use it at all, the degradation still happened.
>> 
>> Using the vectors= option as mentioned below, you can restrict the
>> number of MSI vectors the virtio-serial device gets.  You can then
>> confirm whether it's MSI that's related to these issues.
>
>Amit,
>
>It's related to the big number of ioeventfds used in virtio-serial-pci. With
>virtio-serial-pci's ioeventfd=off, the performance is not affected no matter if
>guest initializes it or not.
>
>In my test, there are 12 fds to poll in qemu_poll_ns before loading guest
>virtio_console.ko, whereas 76 once modprobe virtio_console.
>
>Looks like the ppoll takes more time to poll more fds.
>
>Some trace data with systemtap:
>
>12 fds:
>
>time  rel_time      symbol
>15    (+1)          qemu_poll_ns  [enter]
>18    (+3)          qemu_poll_ns  [return]
>
>76 fd:
>
>12    (+2)          qemu_poll_ns  [enter]
>18    (+6)          qemu_poll_ns  [return]
>
>I haven't looked at virtio-serial code, I'm not sure if we should reduce the
>number of ioeventfds in virtio-serial-pci or focus on lower level efficiency.
>
Does ioeventfd=off hamper the performance of virtio-serial?
In my opinion, virtio-serial's use cases are not very throughput-intensive,
so ioeventfd=off should have only a slight impact on its performance.

Thanks,
Zhang Haoyu

>Haven't compared with g_poll but I think the underlying syscall should be the
>same.
>
>Any ideas?
>
>Fam
>
>
>> 
>> > >So, I think it has no business with legacy interrupt mode, right?
>> > >
>> > >I am going to observe the difference of perf top data on qemu and perf kvm stat data when disable/enable virtio-serial in guest,
>> > >and the difference of perf top data on guest when disable/enable virtio-serial in guest,
>> > >any ideas?
>> > >
>> > >Thanks,
>> > >Zhang Haoyu
>> > >>If you restrict the number of vectors the virtio-serial device gets
>> > >>(using the -device virtio-serial-pci,vectors= param), does that make
>> > >>things better for you?
>> 
>> 
>> 
>> 		Amit


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [question] virtio-blk performancedegradationhappened with virito-serial
  2014-09-22 13:23             ` Zhang Haoyu
@ 2014-09-23  1:29               ` Fam Zheng
  -1 siblings, 0 replies; 40+ messages in thread
From: Fam Zheng @ 2014-09-23  1:29 UTC (permalink / raw)
  To: Zhang Haoyu; +Cc: Amit Shah, qemu-devel, kvm, Paolo Bonzini

On Mon, 09/22 21:23, Zhang Haoyu wrote:
> >
> >Amit,
> >
> >It's related to the big number of ioeventfds used in virtio-serial-pci. With
> >virtio-serial-pci's ioeventfd=off, the performance is not affected no matter if
> >guest initializes it or not.
> >
> >In my test, there are 12 fds to poll in qemu_poll_ns before loading guest
> >virtio_console.ko, whereas 76 once modprobe virtio_console.
> >
> >Looks like the ppoll takes more time to poll more fds.
> >
> >Some trace data with systemtap:
> >
> >12 fds:
> >
> >time  rel_time      symbol
> >15    (+1)          qemu_poll_ns  [enter]
> >18    (+3)          qemu_poll_ns  [return]
> >
> >76 fd:
> >
> >12    (+2)          qemu_poll_ns  [enter]
> >18    (+6)          qemu_poll_ns  [return]
> >
> >I haven't looked at virtio-serial code, I'm not sure if we should reduce the
> >number of ioeventfds in virtio-serial-pci or focus on lower level efficiency.
> >
> Does ioeventfd=off hamper the performance of virtio-serial?

In theory it has an impact, but I have no data on this. If you have a
performance requirement, it's best to test it against your use case to answer
this question.

Fam

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2014-09-23  1:29 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-29  7:45 [question] virtio-blk performance degradation happened with virito-serial Zhang Haoyu
2014-08-29  7:45 ` [Qemu-devel] " Zhang Haoyu
2014-08-29 14:38 ` Amit Shah
2014-09-01 12:38   ` [Qemu-devel] [question] virtio-blk performance degradationhappened " Zhang Haoyu
2014-09-01 12:46     ` Amit Shah
2014-09-01 12:57       ` [Qemu-devel] [question] virtio-blk performancedegradationhappened " Zhang Haoyu
2014-09-01 12:52     ` [Qemu-devel] [question] virtio-blk performance degradationhappened " Zhang Haoyu
2014-09-01 13:09       ` Christian Borntraeger
2014-09-01 13:12         ` Paolo Bonzini
2014-09-01 13:22           ` Christian Borntraeger
2014-09-01 13:29             ` Paolo Bonzini
2014-09-01 14:03               ` Christian Borntraeger
2014-09-01 14:15                 ` Christian Borntraeger
2014-09-04  7:56             ` [Qemu-devel] [question] virtio-blk performance degradationhappenedwith virito-serial Zhang Haoyu
2014-09-07  9:46               ` Zhang Haoyu
2014-09-07  9:46                 ` Zhang Haoyu
2014-09-11  6:11                 ` Amit Shah
2014-09-11  6:11                   ` Amit Shah
2014-09-12  3:21                   ` [question] virtio-blk performance degradation happened with virito-serial Zhang Haoyu
2014-09-12  3:21                     ` [Qemu-devel] " Zhang Haoyu
2014-09-12 12:38                     ` Stefan Hajnoczi
2014-09-12 12:38                       ` Stefan Hajnoczi
2014-09-13 17:22                       ` Max Reitz
2014-09-13 17:22                         ` Max Reitz
2014-09-16 14:59                     ` Zhang Haoyu
2014-09-16 14:59                       ` [Qemu-devel] " Zhang Haoyu
2014-09-02  6:36       ` [Qemu-devel] [question] virtio-blk performance degradationhappened " Amit Shah
2014-09-02 18:05         ` Andrey Korolyov
2014-09-02 18:11           ` Amit Shah
2014-09-02 18:11             ` [Qemu-devel] " Amit Shah
2014-09-02 18:27             ` Andrey Korolyov
2014-09-04  2:20         ` [Qemu-devel] [question] virtio-blk performancedegradationhappened " Zhang Haoyu
2014-09-19  5:53         ` [Qemu-devel] [question] virtio-blk performance degradationhappened " Fam Zheng
2014-09-19  5:53           ` Fam Zheng
2014-09-19 13:35           ` Paolo Bonzini
2014-09-19 13:35             ` [Qemu-devel] " Paolo Bonzini
2014-09-22 13:23           ` [Qemu-devel] [question] virtio-blk performancedegradationhappened " Zhang Haoyu
2014-09-22 13:23             ` Zhang Haoyu
2014-09-23  1:29             ` Fam Zheng
2014-09-23  1:29               ` Fam Zheng
