* [Qemu-devel] [question] virtio-blk performance degradation happened with virtio-serial
From: Zhang Haoyu @ 2014-08-29  7:45 UTC
To: qemu-devel, kvm

Hi, all

I started a VM with virtio-serial (default number of ports: 31) and found that virtio-blk performance degraded by about 25%; this problem can be reproduced 100% of the time.
without virtio-serial:
    4k-read-random  1186 IOPS
with virtio-serial:
    4k-read-random   871 IOPS

But if I use the max_ports=2 option to limit the maximum number of virtio-serial ports, the I/O performance degradation is much less serious, about 5%.

Also, IDE performance degradation does not happen with virtio-serial.

[environment]
Host OS: linux-3.10
QEMU: 2.0.1
Guest OS: Windows Server 2008

[qemu command]
/usr/bin/kvm -id 1587174272642 -chardev socket,id=qmp,path=/var/run/qemu-server/1587174272642.qmp,server,nowait -mon chardev=qmp,mode=control -vnc :0,websocket,to=200 -enable-kvm -pidfile /var/run/qemu-server/1587174272642.pid -daemonize -name win2008-32 -smp sockets=1,cores=1 -cpu core2duo -nodefaults -vga cirrus -no-hpet -k en-us -boot menu=on,splash-time=8000 -m 2048 -usb -drive if=none,id=drive-ide0,media=cdrom,aio=native -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0,bootindex=200 -drive file=/sf/data/local/images/host-00e081de43d7/cea072c4294f/win2008-32.vm/vm-disk-1.qcow2,if=none,id=drive-ide2,cache=none,aio=native -device ide-hd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=100 -netdev type=tap,id=net0,ifname=158717427264200,script=/sf/etc/kvm/vtp-bridge -device e1000,mac=FE:FC:FE:D3:F9:2B,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300 -rtc driftfix=slew,clock=rt,base=localtime -global kvm-pit.lost_tick_policy=discard -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1

Any ideas?

Thanks,
Zhang Haoyu
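For reference, the max_ports cap mentioned above is a property of the virtio-serial device itself. A minimal sketch of the relevant part of such a command line (the id is a hypothetical placeholder and the rest of the invocation is elided):

```
/usr/bin/kvm \
    ... \
    -device virtio-serial-pci,id=serial0,max_ports=2 \
    ...
```

With max_ports=2 the device advertises far fewer virtqueues than the default 31-port configuration, which matches the much smaller (~5%) degradation reported above.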
* Re: [Qemu-devel] [question] virtio-blk performance degradation happened with virtio-serial
From: Amit Shah @ 2014-08-29 14:38 UTC
To: Zhang Haoyu; +Cc: qemu-devel, kvm

On (Fri) 29 Aug 2014 [15:45:30], Zhang Haoyu wrote:
> Hi, all
>
> I started a VM with virtio-serial (default number of ports: 31) and found that virtio-blk performance degraded by about 25%; this problem can be reproduced 100% of the time.
> without virtio-serial:
>     4k-read-random  1186 IOPS
> with virtio-serial:
>     4k-read-random   871 IOPS
>
> But if I use the max_ports=2 option to limit the maximum number of virtio-serial ports, the I/O performance degradation is much less serious, about 5%.
>
> Also, IDE performance degradation does not happen with virtio-serial.

Pretty sure it's related to the MSI vectors in use. It's possible that
the virtio-serial device takes up all the available vectors in the guest,
leaving old-style IRQs for the virtio-blk device.

If you restrict the number of vectors the virtio-serial device gets
(using the -device virtio-serial-pci,vectors= param), does that make
things better for you?

		Amit
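Amit's suggestion translates into a one-line change to the -device clause. A sketch (the value 4 is only an example; Zhang later uses vectors=4 in his Linux-guest test command further down the thread):

```
-device virtio-serial-pci,vectors=4
```

This bounds how many MSI-X vectors the serial device can claim, leaving the rest of the guest's vector budget for virtio-blk.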
* Re: [Qemu-devel] [question] virtio-blk performance degradation happened with virtio-serial
From: Zhang Haoyu @ 2014-09-01 12:38 UTC
To: Amit Shah; +Cc: qemu-devel, kvm

>> Hi, all
>>
>> I started a VM with virtio-serial (default number of ports: 31) and found that virtio-blk performance degraded by about 25%; this problem can be reproduced 100% of the time.
>> without virtio-serial:
>>     4k-read-random  1186 IOPS
>> with virtio-serial:
>>     4k-read-random   871 IOPS
>>
>> But if I use the max_ports=2 option to limit the maximum number of virtio-serial ports, the I/O performance degradation is much less serious, about 5%.
>>
>> Also, IDE performance degradation does not happen with virtio-serial.
>
> Pretty sure it's related to the MSI vectors in use. It's possible that
> the virtio-serial device takes up all the available vectors in the guest,
> leaving old-style IRQs for the virtio-blk device.
>
I don't think so. I used iometer to test the 64k-read (or -write) sequential case: if I disable virtio-serial dynamically via Device Manager -> virtio-serial -> disable, the performance improves by about 25% immediately, and when I re-enable virtio-serial via Device Manager -> virtio-serial -> enable, the performance drops back again; the effect is very obvious.
So I think it has nothing to do with legacy interrupt mode, right?

I am going to look at the difference in perf top data on QEMU and perf kvm stat data when virtio-serial is disabled/enabled in the guest, and the difference in perf top data in the guest itself; any ideas?

Thanks,
Zhang Haoyu

> If you restrict the number of vectors the virtio-serial device gets
> (using the -device virtio-serial-pci,vectors= param), does that make
> things better for you?
>
> 		Amit
* Re: [Qemu-devel] [question] virtio-blk performance degradation happened with virtio-serial
From: Amit Shah @ 2014-09-01 12:46 UTC
To: Zhang Haoyu; +Cc: qemu-devel, kvm

On (Mon) 01 Sep 2014 [20:38:20], Zhang Haoyu wrote:
>>> Hi, all
>>>
>>> I started a VM with virtio-serial (default number of ports: 31) and found that virtio-blk performance degraded by about 25%; this problem can be reproduced 100% of the time.
>>> without virtio-serial:
>>>     4k-read-random  1186 IOPS
>>> with virtio-serial:
>>>     4k-read-random   871 IOPS
>>>
>>> But if I use the max_ports=2 option to limit the maximum number of virtio-serial ports, the I/O performance degradation is much less serious, about 5%.
>>>
>>> Also, IDE performance degradation does not happen with virtio-serial.
>>
>> Pretty sure it's related to the MSI vectors in use. It's possible that
>> the virtio-serial device takes up all the available vectors in the guest,
>> leaving old-style IRQs for the virtio-blk device.
>>
> I don't think so. I used iometer to test the 64k-read (or -write) sequential case: if I disable virtio-serial dynamically via Device Manager -> virtio-serial -> disable, the performance improves by about 25% immediately, and when I re-enable virtio-serial via Device Manager -> virtio-serial -> enable, the performance drops back again; the effect is very obvious.
> So I think it has nothing to do with legacy interrupt mode, right?
>
> I am going to look at the difference in perf top data on QEMU and perf kvm stat data when virtio-serial is disabled/enabled in the guest, and the difference in perf top data in the guest itself; any ideas?

So it's a Windows guest; could it be something Windows-driver specific, then? Do you see the same on Linux guests too?

		Amit
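A quick way to answer the Linux-guest question is to check, inside the guest, whether each virtio device actually enabled MSI-X (`lspci -vvv` shows `Enable+` or `Enable-` on the MSI-X capability line) and what `/proc/interrupts` reports. The snippet below is a self-contained sketch that greps a hypothetical sample of `lspci` output; the sample text is invented for illustration, only the `Enable+`/`Enable-` convention comes from real `lspci` output:

```shell
# In a real guest you would run:
#   lspci -vvv | grep -E 'Virtio|MSI-X'
#   grep virtio /proc/interrupts
# Hypothetical sample output, for illustration only:
sample='00:03.0 Communication controller: Red Hat, Inc Virtio console
	Capabilities: [40] MSI-X: Enable- Count=4 Masked-
00:0b.0 SCSI storage controller: Red Hat, Inc Virtio block device
	Capabilities: [40] MSI-X: Enable+ Count=2 Masked-'
# "Enable+" means the driver actually enabled MSI-X for that device;
# "Enable-" means the device fell back to a legacy (shared) INTx line.
msix_on=$(printf '%s\n' "$sample" | grep -c 'MSI-X: Enable+')
echo "devices with MSI-X enabled: $msix_on"
```

In this sample only the block device has MSI-X enabled, which is exactly the mixed situation Amit's theory would predict.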
* Re: [Qemu-devel] [question] virtio-blk performance degradation happened with virtio-serial
From: Zhang Haoyu @ 2014-09-01 12:57 UTC
To: Amit Shah; +Cc: qemu-devel, kvm

>>>> Hi, all
>>>>
>>>> I started a VM with virtio-serial (default number of ports: 31) and found that virtio-blk performance degraded by about 25%; this problem can be reproduced 100% of the time.
>>>> without virtio-serial:
>>>>     4k-read-random  1186 IOPS
>>>> with virtio-serial:
>>>>     4k-read-random   871 IOPS
>>>>
>>>> But if I use the max_ports=2 option to limit the maximum number of virtio-serial ports, the I/O performance degradation is much less serious, about 5%.
>>>>
>>>> Also, IDE performance degradation does not happen with virtio-serial.
>>>
>>> Pretty sure it's related to the MSI vectors in use. It's possible that
>>> the virtio-serial device takes up all the available vectors in the guest,
>>> leaving old-style IRQs for the virtio-blk device.
>>>
>> I don't think so. I used iometer to test the 64k-read (or -write) sequential case: if I disable virtio-serial dynamically via Device Manager -> virtio-serial -> disable, the performance improves by about 25% immediately, and when I re-enable virtio-serial via Device Manager -> virtio-serial -> enable, the performance drops back again; the effect is very obvious.
>> So I think it has nothing to do with legacy interrupt mode, right?
>>
>> I am going to look at the difference in perf top data on QEMU and perf kvm stat data when virtio-serial is disabled/enabled in the guest, and the difference in perf top data in the guest itself; any ideas?
>
> So it's a Windows guest; could it be something Windows-driver specific, then? Do you see the same on Linux guests too?
>
I suspect something Windows-driver specific, too.
I have not tested a Linux guest; I'll test one later.

Thanks,
Zhang Haoyu

> 		Amit
* Re: [Qemu-devel] [question] virtio-blk performance degradation happened with virtio-serial
From: Zhang Haoyu @ 2014-09-01 12:52 UTC
To: Zhang Haoyu, Amit Shah; +Cc: qemu-devel, kvm

>>> Hi, all
>>>
>>> I started a VM with virtio-serial (default number of ports: 31) and found that virtio-blk performance degraded by about 25%; this problem can be reproduced 100% of the time.
>>> without virtio-serial:
>>>     4k-read-random  1186 IOPS
>>> with virtio-serial:
>>>     4k-read-random   871 IOPS
>>>
>>> But if I use the max_ports=2 option to limit the maximum number of virtio-serial ports, the I/O performance degradation is much less serious, about 5%.
>>>
>>> Also, IDE performance degradation does not happen with virtio-serial.
>>
>> Pretty sure it's related to the MSI vectors in use. It's possible that
>> the virtio-serial device takes up all the available vectors in the guest,
>> leaving old-style IRQs for the virtio-blk device.
>>
> I don't think so. I used iometer to test the 64k-read (or -write) sequential case: if I disable virtio-serial dynamically via Device Manager -> virtio-serial -> disable, the performance improves by about 25% immediately, and when I re-enable virtio-serial via Device Manager -> virtio-serial -> enable, the performance drops back again; the effect is very obvious.
One comment to add: although virtio-serial is enabled, I don't use it at all, and the degradation still happens.
> So I think it has nothing to do with legacy interrupt mode, right?
>
> I am going to look at the difference in perf top data on QEMU and perf kvm stat data when virtio-serial is disabled/enabled in the guest, and the difference in perf top data in the guest itself; any ideas?
>
> Thanks,
> Zhang Haoyu
>>> If you restrict the number of vectors the virtio-serial device gets
>>> (using the -device virtio-serial-pci,vectors= param), does that make
>>> things better for you?
>>>
>>> 		Amit
* Re: [Qemu-devel] [question] virtio-blk performance degradation happened with virtio-serial
From: Christian Borntraeger @ 2014-09-01 13:09 UTC
To: Zhang Haoyu, Amit Shah; +Cc: qemu-devel, kvm

On 01/09/14 14:52, Zhang Haoyu wrote:
>>>> Hi, all
>>>>
>>>> I started a VM with virtio-serial (default number of ports: 31) and found that virtio-blk performance degraded by about 25%; this problem can be reproduced 100% of the time.
>>>> without virtio-serial:
>>>>     4k-read-random  1186 IOPS
>>>> with virtio-serial:
>>>>     4k-read-random   871 IOPS
>>>>
>>>> But if I use the max_ports=2 option to limit the maximum number of virtio-serial ports, the I/O performance degradation is much less serious, about 5%.
>>>>
>>>> Also, IDE performance degradation does not happen with virtio-serial.
>>>
>>> Pretty sure it's related to the MSI vectors in use. It's possible that
>>> the virtio-serial device takes up all the available vectors in the guest,
>>> leaving old-style IRQs for the virtio-blk device.
>>>
>> I don't think so. I used iometer to test the 64k-read (or -write) sequential case: if I disable virtio-serial dynamically via Device Manager -> virtio-serial -> disable, the performance improves by about 25% immediately, and when I re-enable virtio-serial via Device Manager -> virtio-serial -> enable, the performance drops back again; the effect is very obvious.
> One comment to add: although virtio-serial is enabled, I don't use it at all, and the degradation still happens.

This is just wild guessing: if virtio-blk and virtio-serial share an IRQ, the guest operating system has to check each virtqueue for activity. Maybe there is some inefficiency in doing that.
AFAIK virtio-serial registers 64 virtqueues (for 31 ports + console) even if everything is unused.

Christian
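Christian's queue count can be sanity-checked with simple arithmetic, under the assumption that each virtio-serial port (the console included) gets one receive and one transmit virtqueue; the device's extra control queues are deliberately not counted here. A sketch:

```shell
ports=31     # default number of virtio-serial ports, from the report above
console=1    # the console is carried by the same device
# Assumption: one rx + one tx virtqueue per port; control queues excluded.
vqs=$(( (ports + console) * 2 ))
echo "virtqueues registered: $vqs"
```

That yields the 64 virtqueues Christian mentions, all of which a guest on a shared INTx line may have to scan on every interrupt.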
* Re: [Qemu-devel] [question] virtio-blk performance degradation happened with virtio-serial
From: Paolo Bonzini @ 2014-09-01 13:12 UTC
To: Christian Borntraeger, Zhang Haoyu, Amit Shah; +Cc: qemu-devel, kvm

On 01/09/2014 15:09, Christian Borntraeger wrote:
> This is just wild guessing: if virtio-blk and virtio-serial share an IRQ, the guest operating system has to check each virtqueue for activity. Maybe there is some inefficiency in doing that.
> AFAIK virtio-serial registers 64 virtqueues (for 31 ports + console) even if everything is unused.

That could be the case if MSI is disabled.

Paolo
* Re: [Qemu-devel] [question] virtio-blk performance degradation happened with virtio-serial
From: Christian Borntraeger @ 2014-09-01 13:22 UTC
To: Paolo Bonzini, Zhang Haoyu, Amit Shah; +Cc: qemu-devel, kvm

On 01/09/14 15:12, Paolo Bonzini wrote:
> On 01/09/2014 15:09, Christian Borntraeger wrote:
>> This is just wild guessing: if virtio-blk and virtio-serial share an IRQ, the guest operating system has to check each virtqueue for activity. Maybe there is some inefficiency in doing that.
>> AFAIK virtio-serial registers 64 virtqueues (for 31 ports + console) even if everything is unused.
>
> That could be the case if MSI is disabled.
>
> Paolo

Do the Windows virtio drivers enable MSIs, in their inf file?

Christian
* Re: [Qemu-devel] [question] virtio-blk performance degradation happened with virtio-serial
From: Paolo Bonzini @ 2014-09-01 13:29 UTC
To: Christian Borntraeger, Zhang Haoyu, Amit Shah; +Cc: qemu-devel, kvm

On 01/09/2014 15:22, Christian Borntraeger wrote:
>>> If virtio-blk and virtio-serial share an IRQ, the guest operating system has to check each virtqueue for activity. Maybe there is some inefficiency in doing that.
>>> AFAIK virtio-serial registers 64 virtqueues (for 31 ports + console) even if everything is unused.
>>
>> That could be the case if MSI is disabled.
>
> Do the Windows virtio drivers enable MSIs, in their inf file?

It depends on the version of the drivers, but it is a reasonable guess
at what differs between Linux and Windows. Haoyu, can you give us the
output of lspci from a Linux guest?

Paolo
* Re: [Qemu-devel] [question] virtio-blk performance degradation happened with virtio-serial
From: Christian Borntraeger @ 2014-09-01 14:03 UTC
To: Paolo Bonzini, Zhang Haoyu, Amit Shah; +Cc: qemu-devel, kvm

On 01/09/14 15:29, Paolo Bonzini wrote:
> On 01/09/2014 15:22, Christian Borntraeger wrote:
>>>> If virtio-blk and virtio-serial share an IRQ, the guest operating system has to check each virtqueue for activity. Maybe there is some inefficiency in doing that.
>>>> AFAIK virtio-serial registers 64 virtqueues (for 31 ports + console) even if everything is unused.
>>>
>>> That could be the case if MSI is disabled.
>>
>> Do the Windows virtio drivers enable MSIs, in their inf file?
>
> It depends on the version of the drivers, but it is a reasonable guess
> at what differs between Linux and Windows. Haoyu, can you give us the
> output of lspci from a Linux guest?
>
> Paolo

Zhang Haoyu, which virtio drivers did you use?

I just checked the Fedora virtio driver. The INF file does not contain the MSI enablement described in
http://msdn.microsoft.com/en-us/library/windows/hardware/ff544246%28v=vs.85%29.aspx
That would explain the performance issues - given that the linked information is still true.

Christian
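For context, the MSI enablement that MSDN page describes is a registry directive in the driver's INF. A hedged sketch of what such a fragment looks like; the install-section name is a hypothetical placeholder, while the registry path and the MSISupported value follow the linked documentation:

```
; Hypothetical INF fragment enabling MSI for a virtio device (sketch)
[VirtioSerial_Device.NT.HW]
AddReg = MSI_AddReg

[MSI_AddReg]
HKR, "Interrupt Management", 0x00000010
HKR, "Interrupt Management\MessageSignaledInterruptProperties", 0x00000010
HKR, "Interrupt Management\MessageSignaledInterruptProperties", MSISupported, 0x00010001, 1
```

If a driver's INF lacks such a section, Windows keeps the device on a line-based (shared) interrupt even though the virtual hardware advertises MSI-X.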
* Re: [Qemu-devel] [question] virtio-blk performance degradation happened with virtio-serial
From: Christian Borntraeger @ 2014-09-01 14:15 UTC
To: Paolo Bonzini, Zhang Haoyu, Amit Shah; +Cc: qemu-devel, kvm

On 01/09/14 16:03, Christian Borntraeger wrote:
> On 01/09/14 15:29, Paolo Bonzini wrote:
>> On 01/09/2014 15:22, Christian Borntraeger wrote:
>>>>> If virtio-blk and virtio-serial share an IRQ, the guest operating system has to check each virtqueue for activity. Maybe there is some inefficiency in doing that.
>>>>> AFAIK virtio-serial registers 64 virtqueues (for 31 ports + console) even if everything is unused.
>>>>
>>>> That could be the case if MSI is disabled.
>>>
>>> Do the Windows virtio drivers enable MSIs, in their inf file?
>>
>> It depends on the version of the drivers, but it is a reasonable guess
>> at what differs between Linux and Windows. Haoyu, can you give us the
>> output of lspci from a Linux guest?
>>
>> Paolo
>
> Zhang Haoyu, which virtio drivers did you use?
>
> I just checked the Fedora virtio driver. The INF file does not contain the MSI enablement described in
> http://msdn.microsoft.com/en-us/library/windows/hardware/ff544246%28v=vs.85%29.aspx
> That would explain the performance issues - given that the linked information is still true.

Sorry, I looked at the wrong inf file. The Fedora driver does use MSI for serial and block.
* Re: [Qemu-devel] [question] virtio-blk performance degradation happened with virtio-serial
From: Zhang Haoyu @ 2014-09-04  7:56 UTC
To: Paolo Bonzini, Christian Borntraeger, Amit Shah; +Cc: qemu-devel, kvm

>>>> If virtio-blk and virtio-serial share an IRQ, the guest operating system has to check each virtqueue for activity. Maybe there is some inefficiency in doing that.
>>>> AFAIK virtio-serial registers 64 virtqueues (for 31 ports + console) even if everything is unused.
>>>
>>> That could be the case if MSI is disabled.
>>
>> Do the Windows virtio drivers enable MSIs, in their inf file?
>
> It depends on the version of the drivers, but it is a reasonable guess
> at what differs between Linux and Windows. Haoyu, can you give us the
> output of lspci from a Linux guest?
>
I made a test with fio on a RHEL 6.5 guest; the same degradation happened there too, and it can be reproduced on the RHEL 6.5 guest 100% of the time.
virtio_console module installed:
    64K-write-sequence: 285 MBps, 4380 IOPS
virtio_console module uninstalled:
    64K-write-sequence: 370 MBps, 5670 IOPS

And virtio-blk's interrupt mode is always MSI, whether or not the virtio_console module is loaded:
 25:    2245933   PCI-MSI-edge   virtio1-requests

fio command:
fio -filename /dev/vda -direct=1 -iodepth=1 -thread -rw=write -ioengine=psync -bs=64k -size=30G -numjobs=1 -name=mytest

QEMU command:
/usr/bin/kvm -id 5497356709352 -chardev socket,id=qmp,path=/var/run/qemu-server/5497356709352.qmp,server,nowait -mon chardev=qmp,mode=control -vnc :0,websocket,to=200 -enable-kvm -pidfile /var/run/qemu-server/5497356709352.pid -daemonize -name io-test-rhel-6.5 -smp sockets=1,cores=1 -cpu core2duo -nodefaults -vga cirrus -no-hpet -k en-us -boot menu=on,splash-time=8000 -m 4096 -usb -drive file=/sf/data/local/zhanghaoyu/rhel-server-6.5-x86_64-dvd.iso,if=none,id=drive-ide0,media=cdrom,aio=native,forecast=disable -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0,bootindex=200 -drive file=/sf/data/local/images/host-1051721dff13/io-test-rhel-6.5.vm/vm-disk-1.qcow2,if=none,id=drive-virtio1,cache=none,aio=native -device virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb -drive file=/sf/data/local/images/host-1051721dff13/io-test-rhel-6.5.vm/vm-disk-2.qcow2,if=none,id=drive-virtio2,cache=none,aio=native -device virtio-blk-pci,drive=drive-virtio2,id=virtio2,bus=pci.0,addr=0xc,bootindex=101 -netdev type=tap,id=net0,ifname=164922379979200,script=/sf/etc/kvm/vtp-bridge,vhost=on,vhostforce=on -device virtio-net-pci,mac=FE:FC:FE:C6:47:F6,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300 -rtc driftfix=slew,clock=rt -global kvm-pit.lost_tick_policy=discard -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -chardev socket,path=/run/virtser/1649223799792.sock,server,nowait,id=channelser -device virtio-serial,vectors=4 -device virtserialport,chardev=channelser,name=channelser.virtserial0.0

[environment]
Host: linux-3.10 (RHEL7-rc1)
QEMU: qemu-2.0.1
Guest: RHEL 6.5

# lspci -tv
-[0000:00]-+-00.0  Intel Corporation 440FX - 82441FX PMC [Natoma]
           +-01.0  Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
           +-01.1  Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
           +-01.2  Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II]
           +-01.3  Intel Corporation 82371AB/EB/MB PIIX4 ACPI
           +-02.0  Cirrus Logic GD 5446
           +-03.0  Red Hat, Inc Virtio console
           +-0b.0  Red Hat, Inc Virtio block device
           +-0c.0  Red Hat, Inc Virtio block device
           \-12.0  Red Hat, Inc Virtio network device

# lspci -vvv
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
	Subsystem: Red Hat, Inc Qemu virtual machine
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
	Subsystem: Red Hat, Inc Qemu virtual machine
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II] (prog-if 80 [Master])
	Subsystem: Red Hat, Inc Qemu virtual machine
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Region 0: [virtual] Memory at 000001f0 (32-bit, non-prefetchable) [size=8]
	Region 1: [virtual] Memory at 000003f0 (type 3, non-prefetchable)
	Region 2: [virtual] Memory at 00000170 (32-bit, non-prefetchable) [size=8]
	Region 3: [virtual] Memory at 00000370 (type 3, non-prefetchable)
	Region 4: I/O ports at c0e0 [size=16]
	Kernel driver in use: ata_piix
	Kernel modules: ata_generic, pata_acpi, ata_piix

00:01.2 USB controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01) (prog-if 00 [UHCI])
	Subsystem: Red Hat, Inc Qemu virtual machine
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin D routed to IRQ 11
	Region 4: I/O ports at c080 [size=32]
	Kernel driver in use: uhci_hcd

00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
	Subsystem: Red Hat, Inc Qemu virtual machine
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 9
	Kernel driver in use: piix4_smbus
	Kernel modules: i2c-piix4

00:02.0 VGA compatible controller: Cirrus Logic GD 5446 (prog-if 00 [VGA controller])
	Subsystem: Red Hat, Inc Device 1100
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Region 0: Memory at fc000000 (32-bit, prefetchable) [size=32M]
	Region 1: Memory at febd0000 (32-bit, non-prefetchable) [size=4K]
	Expansion ROM at febc0000 [disabled] [size=64K]
	Kernel modules: cirrusfb

00:03.0 Communication controller: Red Hat, Inc Virtio console
	Subsystem: Red Hat, Inc Device 0003
	Physical Slot: 3
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 10
	Region 0: I/O ports at c0a0 [size=32]
	Region 1: Memory at febd1000 (32-bit, non-prefetchable) [size=4K]
	Capabilities: [40] MSI-X: Enable- Count=4 Masked-
		Vector table: BAR=1 offset=00000000
		PBA: BAR=1 offset=00000800
	Kernel driver in use: virtio-pci
	Kernel modules: virtio_pci

00:0b.0 SCSI storage controller: Red Hat, Inc Virtio block device
	Subsystem: Red Hat, Inc Device 0002
	Physical Slot: 11
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 10
	Region 0: I/O ports at c000 [size=64]
	Region 1: Memory at febd2000 (32-bit, non-prefetchable) [size=4K]
	Capabilities: [40] MSI-X: Enable+ Count=2 Masked-
		Vector table: BAR=1 offset=00000000
		PBA: BAR=1 offset=00000800
	Kernel driver in use: virtio-pci
	Kernel modules: virtio_pci

00:0c.0 SCSI storage controller: Red Hat, Inc Virtio block device
	Subsystem: Red Hat, Inc Device 0002
	Physical Slot: 12
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 11
	Region 0: I/O ports at c040 [size=64]
	Region 1: Memory at febd3000 (32-bit, non-prefetchable) [size=4K]
	Capabilities: [40] MSI-X: Enable+ Count=2 Masked-
		Vector table: BAR=1 offset=00000000
		PBA: BAR=1 offset=00000800
	Kernel driver in use: virtio-pci
	Kernel modules: virtio_pci

00:12.0 Ethernet controller: Red Hat, Inc Virtio network device
	Subsystem: Red Hat, Inc Device 0001
	Physical Slot: 18
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 10
	Region 0: I/O ports at c0c0 [size=32]
	Region 1: Memory at febd4000 (32-bit, non-prefetchable) [size=4K]
	Expansion ROM at feb80000 [disabled] [size=256K]
	Capabilities: [40] MSI-X: Enable+ Count=3 Masked-
		Vector table: BAR=1 offset=00000000
		PBA: BAR=1 offset=00000800
	Kernel driver in use: virtio-pci
	Kernel modules: virtio_pci

Thanks,
Zhang Haoyu
* Re: [Qemu-devel] [question] virtio-blk performance degradationhappenedwith virito-serial 2014-09-04 7:56 ` [Qemu-devel] [question] virtio-blk performance degradationhappenedwith virito-serial Zhang Haoyu @ 2014-09-07 9:46 ` Zhang Haoyu 0 siblings, 0 replies; 40+ messages in thread From: Zhang Haoyu @ 2014-09-07 9:46 UTC (permalink / raw) To: Zhang Haoyu, Paolo Bonzini, Christian Borntraeger, Amit Shah Cc: qemu-devel, kvm Hi, Paolo, Amit, any ideas? Thanks, Zhang Haoyu On 2014-9-4 15:56, Zhang Haoyu wrote: >>>>> If virtio-blk and virtio-serial share an IRQ, the guest operating system has to check each virtqueue for activity. Maybe there is some inefficiency doing that. >>>>> AFAIK virtio-serial registers 64 virtqueues (on 31 ports + console) even if everything is unused. >>>> That could be the case if MSI is disabled. >>> Do the windows virtio drivers enable MSIs, in their inf file? >> It depends on the version of the drivers, but it is a reasonable guess >> at what differs between Linux and Windows. Haoyu, can you give us the >> output of lspci from a Linux guest? >> > I made a test with fio on rhel-6.5 guest, the same degradation happened too, this degradation can be reproduced on rhel6.5 guest 100%. > virtio_console module installed: > 64K-write-sequence: 285 MBPS, 4380 IOPS > virtio_console module uninstalled: > 64K-write-sequence: 370 MBPS, 5670 IOPS > > And, virio-blk's interrupt mode always is MSI, no matter if virtio_console module is installed or uninstalled. 
> 25: 2245933 PCI-MSI-edge virtio1-requests > > fio command: > fio -filename /dev/vda -direct=1 -iodepth=1 -thread -rw=write -ioengine=psync -bs=64k -size=30G -numjobs=1 -name=mytest > > QEMU command: > /usr/bin/kvm -id 5497356709352 -chardev socket,id=qmp,path=/var/run/qemu-server/5497356709352.qmp,server,nowait -mon chardev=qmp,mode=control -vnc :0,websocket,to=200 -enable-kvm -pidfile /var/run/qemu-server/5497356709352.pid -daemonize -name io-test-rhel-6.5 -smp sockets=1,cores=1 -cpu core2duo -nodefaults -vga cirrus -no-hpet -k en-us -boot menu=on,splash-time=8000 -m 4096 -usb -drive file=/sf/data/local/zhanghaoyu/rhel-server-6.5-x86_64-dvd.iso,if=none,id=drive-ide0,media=cdrom,aio=native,forecast=disable -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0,bootindex=200 -drive file=/sf/data/local/images/host-1051721dff13/io-test-rhel-6.5.vm/vm-disk-1.qcow2,if=none,id=drive-virtio1,cache=none,aio=native -device virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb -drive file=/sf/data/local/images/host-1051721dff13/io-test-rhel-6.5.vm/vm-disk-2.qcow2,if=none,id=drive-virtio2,cache=none,aio=native -device virtio-blk-pci,drive=drive-virtio2,id=virtio2,bus=pci.0,addr=0xc,bootindex=101 -netdev type=tap,id=net0,ifname=164922379979200,script=/sf/etc/kvm/vtp-bridge,vhost=on,vhostforce=on -device virtio-net-pci,mac=FE:FC:FE:C6:47:F6,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300 -rtc driftfix=slew,clock=rt -global kvm-pit.lost_tick_policy=discard -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -chardev socket,path=/run/virtser/1649223799792.sock,server,nowait,id=channelser -device virtio-serial,vectors=4 -device virtserialport,chardev=channelser,name=channelser.virtserial0.0 > > [environment] > Host: linux-3.10 (RHEL7-rc1) > QEMU: qemu-2.0.1 > Guest: RHEL6.5 > > # lspci -tv > -[0000:00]-+-00.0 Intel Corporation 440FX - 82441FX PMC [Natoma] > +-01.0 Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] > +-01.1 Intel Corporation 
82371SB PIIX3 IDE [Natoma/Triton II] > +-01.2 Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] > +-01.3 Intel Corporation 82371AB/EB/MB PIIX4 ACPI > +-02.0 Cirrus Logic GD 5446 > +-03.0 Red Hat, Inc Virtio console > +-0b.0 Red Hat, Inc Virtio block device > +-0c.0 Red Hat, Inc Virtio block device > \-12.0 Red Hat, Inc Virtio network device > > # lspci -vvv > 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02) > Subsystem: Red Hat, Inc Qemu virtual machine > Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- > Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > > 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] > Subsystem: Red Hat, Inc Qemu virtual machine > Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- > Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > > 00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II] (prog-if 80 [Master]) > Subsystem: Red Hat, Inc Qemu virtual machine > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- > Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > Latency: 0 > Region 0: [virtual] Memory at 000001f0 (32-bit, non-prefetchable) [size=8] > Region 1: [virtual] Memory at 000003f0 (type 3, non-prefetchable) > Region 2: [virtual] Memory at 00000170 (32-bit, non-prefetchable) [size=8] > Region 3: [virtual] Memory at 00000370 (type 3, non-prefetchable) > Region 4: I/O ports at c0e0 [size=16] > Kernel driver in use: ata_piix > Kernel modules: ata_generic, pata_acpi, ata_piix > > 00:01.2 USB controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01) (prog-if 00 [UHCI]) > Subsystem: Red Hat, Inc Qemu virtual machine > Control: I/O+ Mem+ 
BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- > Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > Latency: 0 > Interrupt: pin D routed to IRQ 11 > Region 4: I/O ports at c080 [size=32] > Kernel driver in use: uhci_hcd > > 00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03) > Subsystem: Red Hat, Inc Qemu virtual machine > Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- > Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > Interrupt: pin A routed to IRQ 9 > Kernel driver in use: piix4_smbus > Kernel modules: i2c-piix4 > > 00:02.0 VGA compatible controller: Cirrus Logic GD 5446 (prog-if 00 [VGA controller]) > Subsystem: Red Hat, Inc Device 1100 > Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- > Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > Region 0: Memory at fc000000 (32-bit, prefetchable) [size=32M] > Region 1: Memory at febd0000 (32-bit, non-prefetchable) [size=4K] > Expansion ROM at febc0000 [disabled] [size=64K] > Kernel modules: cirrusfb > > 00:03.0 Communication controller: Red Hat, Inc Virtio console > Subsystem: Red Hat, Inc Device 0003 > Physical Slot: 3 > Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > Interrupt: pin A routed to IRQ 10 > Region 0: I/O ports at c0a0 [size=32] > Region 1: Memory at febd1000 (32-bit, non-prefetchable) [size=4K] > Capabilities: [40] MSI-X: Enable- Count=4 Masked- > Vector table: BAR=1 offset=00000000 > PBA: BAR=1 offset=00000800 > Kernel driver in use: virtio-pci > Kernel modules: virtio_pci > > 00:0b.0 SCSI storage controller: Red Hat, Inc Virtio block 
device > Subsystem: Red Hat, Inc Device 0002 > Physical Slot: 11 > Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+ > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > Interrupt: pin A routed to IRQ 10 > Region 0: I/O ports at c000 [size=64] > Region 1: Memory at febd2000 (32-bit, non-prefetchable) [size=4K] > Capabilities: [40] MSI-X: Enable+ Count=2 Masked- > Vector table: BAR=1 offset=00000000 > PBA: BAR=1 offset=00000800 > Kernel driver in use: virtio-pci > Kernel modules: virtio_pci > > 00:0c.0 SCSI storage controller: Red Hat, Inc Virtio block device > Subsystem: Red Hat, Inc Device 0002 > Physical Slot: 12 > Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+ > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > Interrupt: pin A routed to IRQ 11 > Region 0: I/O ports at c040 [size=64] > Region 1: Memory at febd3000 (32-bit, non-prefetchable) [size=4K] > Capabilities: [40] MSI-X: Enable+ Count=2 Masked- > Vector table: BAR=1 offset=00000000 > PBA: BAR=1 offset=00000800 > Kernel driver in use: virtio-pci > Kernel modules: virtio_pci > > 00:12.0 Ethernet controller: Red Hat, Inc Virtio network device > Subsystem: Red Hat, Inc Device 0001 > Physical Slot: 18 > Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+ > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > Interrupt: pin A routed to IRQ 10 > Region 0: I/O ports at c0c0 [size=32] > Region 1: Memory at febd4000 (32-bit, non-prefetchable) [size=4K] > Expansion ROM at feb80000 [disabled] [size=256K] > Capabilities: [40] MSI-X: Enable+ Count=3 Masked- > Vector table: BAR=1 offset=00000000 > PBA: BAR=1 offset=00000800 > Kernel driver in use: virtio-pci > Kernel modules: virtio_pci > > Thanks, > Zhang Haoyu > 
>> Paolo > > ^ permalink raw reply [flat|nested] 40+ messages in thread
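[Editor's note] The "64 virtqueues (on 31 ports + console)" figure quoted above is easy to reconstruct: virtio-serial gives each port an RX/TX virtqueue pair, and the console occupies one port slot of its own. A sketch of that arithmetic; note it counts only the per-port data queues, since a multiport device also carries a control queue pair on top:

```shell
# Data virtqueues for a virtio-serial device:
# one RX/TX pair per port, with the console taking one port slot.
ports=31                             # QEMU's default port count in this thread
data_queues=$(( (ports + 1) * 2 ))   # 31 ports + console, 2 queues each
echo "$data_queues"                  # prints 64, matching the quoted figure
```

This is why `max_ports=2` shrinks the degradation: far fewer queues for the guest to scan when the interrupt is shared.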
* Re: [Qemu-devel] [question] virtio-blk performance degradation happened with virito-serial 2014-09-07 9:46 ` Zhang Haoyu @ 2014-09-11 6:11 ` Amit Shah -1 siblings, 0 replies; 40+ messages in thread From: Amit Shah @ 2014-09-11 6:11 UTC (permalink / raw) To: Zhang Haoyu Cc: Zhang Haoyu, Paolo Bonzini, Christian Borntraeger, qemu-devel, kvm On (Sun) 07 Sep 2014 [17:46:26], Zhang Haoyu wrote: > Hi, Paolo, Amit, > any ideas? I'll check this, thanks for testing with Linux guests. Amit ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [question] virtio-blk performance degradation happened with virito-serial 2014-09-11 6:11 ` Amit Shah @ 2014-09-12 3:21 ` Zhang Haoyu -1 siblings, 0 replies; 40+ messages in thread From: Zhang Haoyu @ 2014-09-12 3:21 UTC (permalink / raw) To: Amit Shah, Zhang Haoyu Cc: Paolo Bonzini, qemu-devel, kvm, Christian Borntraeger >>> > > If virtio-blk and virtio-serial share an IRQ, the guest operating system has to check each virtqueue for activity. Maybe there is some inefficiency doing that. >>> > > AFAIK virtio-serial registers 64 virtqueues (on 31 ports + console) even if everything is unused. >>> > >>> > That could be the case if MSI is disabled. >>> >>> Do the windows virtio drivers enable MSIs, in their inf file? >> >>It depends on the version of the drivers, but it is a reasonable guess >>at what differs between Linux and Windows. Haoyu, can you give us the >>output of lspci from a Linux guest? >> >I made a test with fio on rhel-6.5 guest, the same degradation happened too, this degradation can be reproduced on rhel6.5 guest 100%. >virtio_console module installed: >64K-write-sequence: 285 MBPS, 4380 IOPS >virtio_console module uninstalled: >64K-write-sequence: 370 MBPS, 5670 IOPS >
I used top -d 1 -H -p <qemu-pid> to monitor the CPU usage, and found that:
virtio_console module installed: qemu main thread cpu usage: 98%
virtio_console module uninstalled: qemu main thread cpu usage: 60%

perf top -p <qemu-pid> result, virtio_console module installed:

PerfTop: 9868 irqs/sec kernel:76.4% exact: 0.0% [4000Hz cycles], (target_pid: 88381)
--------------------------------------------------------------------------------
11.80% [kernel] [k] _raw_spin_lock_irqsave
 8.42% [kernel] [k] _raw_spin_unlock_irqrestore
 7.33% [kernel] [k] fget_light
 6.28% [kernel] [k] fput
 3.61% [kernel] [k] do_sys_poll
 3.30% qemu-system-x86_64 [.] qcow2_check_metadata_overlap
 3.10% [kernel] [k] __pollwait
 2.15% qemu-system-x86_64 [.] qemu_iohandler_poll
 1.44% libglib-2.0.so.0.3200.4 [.] g_array_append_vals
 1.36% libc-2.13.so [.] 0x000000000011fc2a
 1.31% libpthread-2.13.so [.] pthread_mutex_lock
 1.24% libglib-2.0.so.0.3200.4 [.] 0x000000000001f961
 1.20% libpthread-2.13.so [.] __pthread_mutex_unlock_usercnt
 0.99% [kernel] [k] eventfd_poll
 0.98% [vdso] [.] 0x0000000000000771
 0.97% [kernel] [k] remove_wait_queue
 0.96% qemu-system-x86_64 [.] qemu_iohandler_fill
 0.95% [kernel] [k] add_wait_queue
 0.69% [kernel] [k] __srcu_read_lock
 0.58% [kernel] [k] poll_freewait
 0.57% [kernel] [k] _raw_spin_lock_irq
 0.54% [kernel] [k] __srcu_read_unlock
 0.47% [kernel] [k] copy_user_enhanced_fast_string
 0.46% [kvm_intel] [k] vmx_vcpu_run
 0.46% [kvm] [k] vcpu_enter_guest
 0.42% [kernel] [k] tcp_poll
 0.41% [kernel] [k] system_call_after_swapgs
 0.40% libglib-2.0.so.0.3200.4 [.] g_slice_alloc
 0.40% [kernel] [k] system_call
 0.38% libpthread-2.13.so [.] 0x000000000000e18d
 0.38% libglib-2.0.so.0.3200.4 [.] g_slice_free1
 0.38% qemu-system-x86_64 [.] address_space_translate_internal
 0.38% [kernel] [k] _raw_spin_lock
 0.37% qemu-system-x86_64 [.] phys_page_find
 0.36% [kernel] [k] get_page_from_freelist
 0.35% [kernel] [k] sock_poll
 0.34% [kernel] [k] fsnotify
 0.31% libglib-2.0.so.0.3200.4 [.] g_main_context_check
 0.30% [kernel] [k] do_direct_IO
 0.29% libpthread-2.13.so [.] pthread_getspecific

virtio_console module uninstalled:

PerfTop: 9138 irqs/sec kernel:71.7% exact: 0.0% [4000Hz cycles], (target_pid: 88381)
--------------------------------------------------------------------------------
 5.72% qemu-system-x86_64 [.] qcow2_check_metadata_overlap
 4.51% [kernel] [k] fget_light
 3.98% [kernel] [k] _raw_spin_lock_irqsave
 2.55% [kernel] [k] fput
 2.48% libpthread-2.13.so [.] pthread_mutex_lock
 2.46% [kernel] [k] _raw_spin_unlock_irqrestore
 2.21% libpthread-2.13.so [.] __pthread_mutex_unlock_usercnt
 1.71% [vdso] [.] 0x000000000000060c
 1.68% libc-2.13.so [.] 0x00000000000e751f
 1.64% libglib-2.0.so.0.3200.4 [.] 0x000000000004fca0
 1.20% [kernel] [k] __srcu_read_lock
 1.14% [kernel] [k] do_sys_poll
 0.96% [kernel] [k] _raw_spin_lock_irq
 0.95% [kernel] [k] __pollwait
 0.91% [kernel] [k] __srcu_read_unlock
 0.78% [kernel] [k] tcp_poll
 0.74% [kvm] [k] vcpu_enter_guest
 0.73% [kvm_intel] [k] vmx_vcpu_run
 0.72% [kernel] [k] _raw_spin_lock
 0.72% [kernel] [k] system_call_after_swapgs
 0.70% [kernel] [k] copy_user_enhanced_fast_string
 0.67% libglib-2.0.so.0.3200.4 [.] g_slice_free1
 0.66% libpthread-2.13.so [.] 0x000000000000e12d
 0.65% [kernel] [k] system_call
 0.61% [kernel] [k] do_direct_IO
 0.57% qemu-system-x86_64 [.] qemu_iohandler_poll
 0.57% [kernel] [k] fsnotify
 0.54% libglib-2.0.so.0.3200.4 [.] g_slice_alloc
 0.50% [kernel] [k] vfs_write
 0.49% libpthread-2.13.so [.] pthread_getspecific
 0.48% qemu-system-x86_64 [.] qemu_event_reset
 0.47% libglib-2.0.so.0.3200.4 [.] g_main_context_check
 0.46% qemu-system-x86_64 [.] address_space_translate_internal
 0.46% [kernel] [k] sock_poll
 0.46% libpthread-2.13.so [.] __pthread_disable_asynccancel
 0.44% [kernel] [k] resched_task
 0.43% libpthread-2.13.so [.] __pthread_enable_asynccancel
 0.42% qemu-system-x86_64 [.] phys_page_find
 0.39% qemu-system-x86_64 [.] object_dynamic_cast_assert

>And, virtio-blk's interrupt mode is always MSI, no matter if virtio_console module is installed or uninstalled. 
>25: 2245933 PCI-MSI-edge virtio1-requests > >fio command: >fio -filename /dev/vda -direct=1 -iodepth=1 -thread -rw=write -ioengine=psync -bs=64k -size=30G -numjobs=1 -name=mytest > >QEMU command: >/usr/bin/kvm -id 5497356709352 -chardev socket,id=qmp,path=/var/run/qemu-server/5497356709352.qmp,server,nowait -mon chardev=qmp,mode=control -vnc :0,websocket,to=200 -enable-kvm -pidfile /var/run/qemu-server/5497356709352.pid -daemonize -name io-test-rhel-6.5 -smp sockets=1,cores=1 -cpu core2duo -nodefaults -vga cirrus -no-hpet -k en-us -boot menu=on,splash-time=8000 -m 4096 -usb -drive file=/sf/data/local/zhanghaoyu/rhel-server-6.5-x86_64-dvd.iso,if=none,id=drive-ide0,media=cdrom,aio=native,forecast=disable -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0,bootindex=200 -drive file=/sf/data/local/images/host-1051721dff13/io-test-rhel-6.5.vm/vm-disk-1.qcow2,if=none,id=drive-virtio1,cache=none,aio=native -device virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb -drive file=/sf/data/local/images/host-1051721dff13/io-test-rhel-6.5.vm/vm-disk-2.qcow2,if=none,id=drive-virtio2,cache=none,aio=native -device virtio-blk-pci,drive=drive-virtio2,id=virtio2,bus=pci.0,addr=0xc,bootindex=101 -netdev type=tap,id=net0,ifname=164922379979200,script=/sf/etc/kvm/vtp-bridge,vhost=on,vhostforce=on -device virtio-net-pci,mac=FE:FC:FE:C6:47:F6,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300 -rtc driftfix=slew,clock=rt -global kvm-pit.lost_tick_policy=discard -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -chardev socket,path=/run/virtser/1649223799792.sock,server,nowait,id=channelser -device virtio-serial,vectors=4 -device virtserialport,chardev=channelser,name=channelser.virtserial0.0 > >[environment] >Host: linux-3.10 (RHEL7-rc1) >QEMU: qemu-2.0.1 >Guest: RHEL6.5 > ># lspci -tv >-[0000:00]-+-00.0 Intel Corporation 440FX - 82441FX PMC [Natoma] > +-01.0 Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] > +-01.1 Intel Corporation 82371SB PIIX3 
IDE [Natoma/Triton II] > +-01.2 Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] > +-01.3 Intel Corporation 82371AB/EB/MB PIIX4 ACPI > +-02.0 Cirrus Logic GD 5446 > +-03.0 Red Hat, Inc Virtio console > +-0b.0 Red Hat, Inc Virtio block device > +-0c.0 Red Hat, Inc Virtio block device > \-12.0 Red Hat, Inc Virtio network device > ># lspci -vvv >00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02) > Subsystem: Red Hat, Inc Qemu virtual machine > Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- > Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > >00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] > Subsystem: Red Hat, Inc Qemu virtual machine > Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- > Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > >00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II] (prog-if 80 [Master]) > Subsystem: Red Hat, Inc Qemu virtual machine > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- > Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > Latency: 0 > Region 0: [virtual] Memory at 000001f0 (32-bit, non-prefetchable) [size=8] > Region 1: [virtual] Memory at 000003f0 (type 3, non-prefetchable) > Region 2: [virtual] Memory at 00000170 (32-bit, non-prefetchable) [size=8] > Region 3: [virtual] Memory at 00000370 (type 3, non-prefetchable) > Region 4: I/O ports at c0e0 [size=16] > Kernel driver in use: ata_piix > Kernel modules: ata_generic, pata_acpi, ata_piix > >00:01.2 USB controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01) (prog-if 00 [UHCI]) > Subsystem: Red Hat, Inc Qemu virtual machine > Control: I/O+ Mem+ BusMaster+ 
SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- > Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > Latency: 0 > Interrupt: pin D routed to IRQ 11 > Region 4: I/O ports at c080 [size=32] > Kernel driver in use: uhci_hcd > >00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03) > Subsystem: Red Hat, Inc Qemu virtual machine > Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- > Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > Interrupt: pin A routed to IRQ 9 > Kernel driver in use: piix4_smbus > Kernel modules: i2c-piix4 > >00:02.0 VGA compatible controller: Cirrus Logic GD 5446 (prog-if 00 [VGA controller]) > Subsystem: Red Hat, Inc Device 1100 > Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- > Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > Region 0: Memory at fc000000 (32-bit, prefetchable) [size=32M] > Region 1: Memory at febd0000 (32-bit, non-prefetchable) [size=4K] > Expansion ROM at febc0000 [disabled] [size=64K] > Kernel modules: cirrusfb > >00:03.0 Communication controller: Red Hat, Inc Virtio console > Subsystem: Red Hat, Inc Device 0003 > Physical Slot: 3 > Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > Interrupt: pin A routed to IRQ 10 > Region 0: I/O ports at c0a0 [size=32] > Region 1: Memory at febd1000 (32-bit, non-prefetchable) [size=4K] > Capabilities: [40] MSI-X: Enable- Count=4 Masked- > Vector table: BAR=1 offset=00000000 > PBA: BAR=1 offset=00000800 > Kernel driver in use: virtio-pci > Kernel modules: virtio_pci > >00:0b.0 SCSI storage controller: Red Hat, Inc Virtio block device > 
Subsystem: Red Hat, Inc Device 0002 > Physical Slot: 11 > Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+ > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > Interrupt: pin A routed to IRQ 10 > Region 0: I/O ports at c000 [size=64] > Region 1: Memory at febd2000 (32-bit, non-prefetchable) [size=4K] > Capabilities: [40] MSI-X: Enable+ Count=2 Masked- > Vector table: BAR=1 offset=00000000 > PBA: BAR=1 offset=00000800 > Kernel driver in use: virtio-pci > Kernel modules: virtio_pci > >00:0c.0 SCSI storage controller: Red Hat, Inc Virtio block device > Subsystem: Red Hat, Inc Device 0002 > Physical Slot: 12 > Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+ > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > Interrupt: pin A routed to IRQ 11 > Region 0: I/O ports at c040 [size=64] > Region 1: Memory at febd3000 (32-bit, non-prefetchable) [size=4K] > Capabilities: [40] MSI-X: Enable+ Count=2 Masked- > Vector table: BAR=1 offset=00000000 > PBA: BAR=1 offset=00000800 > Kernel driver in use: virtio-pci > Kernel modules: virtio_pci > >00:12.0 Ethernet controller: Red Hat, Inc Virtio network device > Subsystem: Red Hat, Inc Device 0001 > Physical Slot: 18 > Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+ > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > Interrupt: pin A routed to IRQ 10 > Region 0: I/O ports at c0c0 [size=32] > Region 1: Memory at febd4000 (32-bit, non-prefetchable) [size=4K] > Expansion ROM at feb80000 [disabled] [size=256K] > Capabilities: [40] MSI-X: Enable+ Count=3 Masked- > Vector table: BAR=1 offset=00000000 > PBA: BAR=1 offset=00000800 > Kernel driver in use: virtio-pci > Kernel modules: virtio_pci > >Thanks, >Zhang Haoyu ^ permalink 
raw reply [flat|nested] 40+ messages in thread
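[Editor's note] The fio numbers quoted in this message (4380 IOPS with virtio_console loaded, 5670 without) translate into the degradation figure directly; a small sketch of the arithmetic, with the variable names being mine:

```shell
# Relative degradation of the 64K sequential-write results quoted above:
# (without - with) / without, in integer percent.
with_console=4380      # IOPS, virtio_console module loaded
without_console=5670   # IOPS, virtio_console module unloaded
loss_pct=$(( (without_console - with_console) * 100 / without_console ))
echo "${loss_pct}%"    # prints 22%, i.e. roughly a quarter of the throughput lost
```

The same formula applied to the Windows numbers in the first message (871 vs. 1186 IOPS) gives the "about 25%" the reporter cites.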
>25: 2245933 PCI-MSI-edge virtio1-requests > >fio command: >fio -filename /dev/vda -direct=1 -iodepth=1 -thread -rw=write -ioengine=psync -bs=64k -size=30G -numjobs=1 -name=mytest > >QEMU command: >/usr/bin/kvm -id 5497356709352 -chardev socket,id=qmp,path=/var/run/qemu-server/5497356709352.qmp,server,nowait -mon chardev=qmp,mode=control -vnc :0,websocket,to=200 -enable-kvm -pidfile /var/run/qemu-server/5497356709352.pid -daemonize -name io-test-rhel-6.5 -smp sockets=1,cores=1 -cpu core2duo -nodefaults -vga cirrus -no-hpet -k en-us -boot menu=on,splash-time=8000 -m 4096 -usb -drive file=/sf/data/local/zhanghaoyu/rhel-server-6.5-x86_64-dvd.iso,if=none,id=drive-ide0,media=cdrom,aio=native,forecast=disable -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0,bootindex=200 -drive file=/sf/data/local/images/host-1051721dff13/io-test-rhel-6.5.vm/vm-disk-1.qcow2,if=none,id=drive-virtio1,cache=none,aio=native -device virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb -drive file=/sf/data/local/images/host-1051721dff13/io-test-rhel-6.5.vm/vm-disk-2.qcow2,if=none,id=drive-virtio2,cache=none,aio=native -device virtio-blk-pci,drive=drive-virtio2,id=virtio2,bus=pci.0,addr=0xc,bootindex=101 -netdev type=tap,id=net0,ifname=164922379979200,script=/sf/etc/kvm/vtp-bridge,vhost=on,vhostforce=on -device virtio-net-pci,mac=FE:FC:FE:C6:47:F6,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300 -rtc driftfix=slew,clock=rt -global kvm-pit.lost_tick_policy=discard -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -chardev socket,path=/run/virtser/1649223799792.sock,server,nowait,id=channelser -device virtio-serial,vectors=4 -device virtserialport,chardev=channelser,name=channelser.virtserial0.0 > >[environment] >Host:linux-3.10(RHEL7-rc1) >QEMU: qemu-2.0.1 >Guest: RHEL6.5 > ># lspci -tv >-[0000:00]-+-00.0 Intel Corporation 440FX - 82441FX PMC [Natoma] > +-01.0 Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] > +-01.1 Intel Corporation 82371SB PIIX3
IDE [Natoma/Triton II] > +-01.2 Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] > +-01.3 Intel Corporation 82371AB/EB/MB PIIX4 ACPI > +-02.0 Cirrus Logic GD 5446 > +-03.0 Red Hat, Inc Virtio console > +-0b.0 Red Hat, Inc Virtio block device > +-0c.0 Red Hat, Inc Virtio block device > \-12.0 Red Hat, Inc Virtio network device > ># lspci -vvv >00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02) > Subsystem: Red Hat, Inc Qemu virtual machine > Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- > Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > >00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] > Subsystem: Red Hat, Inc Qemu virtual machine > Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- > Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > >00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II] (prog-if 80 [Master]) > Subsystem: Red Hat, Inc Qemu virtual machine > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- > Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > Latency: 0 > Region 0: [virtual] Memory at 000001f0 (32-bit, non-prefetchable) [size=8] > Region 1: [virtual] Memory at 000003f0 (type 3, non-prefetchable) > Region 2: [virtual] Memory at 00000170 (32-bit, non-prefetchable) [size=8] > Region 3: [virtual] Memory at 00000370 (type 3, non-prefetchable) > Region 4: I/O ports at c0e0 [size=16] > Kernel driver in use: ata_piix > Kernel modules: ata_generic, pata_acpi, ata_piix > >00:01.2 USB controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01) (prog-if 00 [UHCI]) > Subsystem: Red Hat, Inc Qemu virtual machine > Control: I/O+ Mem+ BusMaster+ 
SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- > Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > Latency: 0 > Interrupt: pin D routed to IRQ 11 > Region 4: I/O ports at c080 [size=32] > Kernel driver in use: uhci_hcd > >00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03) > Subsystem: Red Hat, Inc Qemu virtual machine > Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- > Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > Interrupt: pin A routed to IRQ 9 > Kernel driver in use: piix4_smbus > Kernel modules: i2c-piix4 > >00:02.0 VGA compatible controller: Cirrus Logic GD 5446 (prog-if 00 [VGA controller]) > Subsystem: Red Hat, Inc Device 1100 > Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- > Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > Region 0: Memory at fc000000 (32-bit, prefetchable) [size=32M] > Region 1: Memory at febd0000 (32-bit, non-prefetchable) [size=4K] > Expansion ROM at febc0000 [disabled] [size=64K] > Kernel modules: cirrusfb > >00:03.0 Communication controller: Red Hat, Inc Virtio console > Subsystem: Red Hat, Inc Device 0003 > Physical Slot: 3 > Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > Interrupt: pin A routed to IRQ 10 > Region 0: I/O ports at c0a0 [size=32] > Region 1: Memory at febd1000 (32-bit, non-prefetchable) [size=4K] > Capabilities: [40] MSI-X: Enable- Count=4 Masked- > Vector table: BAR=1 offset=00000000 > PBA: BAR=1 offset=00000800 > Kernel driver in use: virtio-pci > Kernel modules: virtio_pci > >00:0b.0 SCSI storage controller: Red Hat, Inc Virtio block device > 
Subsystem: Red Hat, Inc Device 0002 > Physical Slot: 11 > Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+ > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > Interrupt: pin A routed to IRQ 10 > Region 0: I/O ports at c000 [size=64] > Region 1: Memory at febd2000 (32-bit, non-prefetchable) [size=4K] > Capabilities: [40] MSI-X: Enable+ Count=2 Masked- > Vector table: BAR=1 offset=00000000 > PBA: BAR=1 offset=00000800 > Kernel driver in use: virtio-pci > Kernel modules: virtio_pci > >00:0c.0 SCSI storage controller: Red Hat, Inc Virtio block device > Subsystem: Red Hat, Inc Device 0002 > Physical Slot: 12 > Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+ > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > Interrupt: pin A routed to IRQ 11 > Region 0: I/O ports at c040 [size=64] > Region 1: Memory at febd3000 (32-bit, non-prefetchable) [size=4K] > Capabilities: [40] MSI-X: Enable+ Count=2 Masked- > Vector table: BAR=1 offset=00000000 > PBA: BAR=1 offset=00000800 > Kernel driver in use: virtio-pci > Kernel modules: virtio_pci > >00:12.0 Ethernet controller: Red Hat, Inc Virtio network device > Subsystem: Red Hat, Inc Device 0001 > Physical Slot: 18 > Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+ > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > Interrupt: pin A routed to IRQ 10 > Region 0: I/O ports at c0c0 [size=32] > Region 1: Memory at febd4000 (32-bit, non-prefetchable) [size=4K] > Expansion ROM at feb80000 [disabled] [size=256K] > Capabilities: [40] MSI-X: Enable+ Count=3 Masked- > Vector table: BAR=1 offset=00000000 > PBA: BAR=1 offset=00000800 > Kernel driver in use: virtio-pci > Kernel modules: virtio_pci > >Thanks, >Zhang Haoyu ^ permalink 
raw reply [flat|nested] 40+ messages in thread
* Re: [Qemu-devel] [question] virtio-blk performance degradation happened with virito-serial 2014-09-12 3:21 ` [Qemu-devel] " Zhang Haoyu @ 2014-09-12 12:38 ` Stefan Hajnoczi -1 siblings, 0 replies; 40+ messages in thread From: Stefan Hajnoczi @ 2014-09-12 12:38 UTC (permalink / raw) To: Zhang Haoyu Cc: Amit Shah, Zhang Haoyu, Paolo Bonzini, qemu-devel, kvm, Christian Borntraeger, Max Reitz [-- Attachment #1: Type: text/plain, Size: 6938 bytes --] On Fri, Sep 12, 2014 at 11:21:37AM +0800, Zhang Haoyu wrote: > >>> > > If virtio-blk and virtio-serial share an IRQ, the guest operating system has to check each virtqueue for activity. Maybe there is some inefficiency doing that. > >>> > > AFAIK virtio-serial registers 64 virtqueues (on 31 ports + console) even if everything is unused. > >>> > > >>> > That could be the case if MSI is disabled. > >>> > >>> Do the windows virtio drivers enable MSIs, in their inf file? > >> > >>It depends on the version of the drivers, but it is a reasonable guess > >>at what differs between Linux and Windows. Haoyu, can you give us the > >>output of lspci from a Linux guest? > >> > >I made a test with fio on rhel-6.5 guest, the same degradation happened too, this degradation can be reproduced on rhel6.5 guest 100%. 
> >virtio_console module installed: > >64K-write-sequence: 285 MBPS, 4380 IOPS > >virtio_console module uninstalled: > >64K-write-sequence: 370 MBPS, 5670 IOPS > > > I use top -d 1 -H -p <qemu-pid> to monitor the cpu usage, and found that, > virtio_console module installed: > qemu main thread cpu usage: 98% > virtio_console module uninstalled: > qemu main thread cpu usage: 60% > > perf top -p <qemu-pid> result, > virtio_console module installed: > PerfTop: 9868 irqs/sec kernel:76.4% exact: 0.0% [4000Hz cycles], (target_pid: 88381) > ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- > > 11.80% [kernel] [k] _raw_spin_lock_irqsave > 8.42% [kernel] [k] _raw_spin_unlock_irqrestore > 7.33% [kernel] [k] fget_light > 6.28% [kernel] [k] fput > 3.61% [kernel] [k] do_sys_poll > 3.30% qemu-system-x86_64 [.] qcow2_check_metadata_overlap > 3.10% [kernel] [k] __pollwait > 2.15% qemu-system-x86_64 [.] qemu_iohandler_poll > 1.44% libglib-2.0.so.0.3200.4 [.] g_array_append_vals > 1.36% libc-2.13.so [.] 0x000000000011fc2a > 1.31% libpthread-2.13.so [.] pthread_mutex_lock > 1.24% libglib-2.0.so.0.3200.4 [.] 0x000000000001f961 > 1.20% libpthread-2.13.so [.] __pthread_mutex_unlock_usercnt > 0.99% [kernel] [k] eventfd_poll > 0.98% [vdso] [.] 0x0000000000000771 > 0.97% [kernel] [k] remove_wait_queue > 0.96% qemu-system-x86_64 [.] qemu_iohandler_fill > 0.95% [kernel] [k] add_wait_queue > 0.69% [kernel] [k] __srcu_read_lock > 0.58% [kernel] [k] poll_freewait > 0.57% [kernel] [k] _raw_spin_lock_irq > 0.54% [kernel] [k] __srcu_read_unlock > 0.47% [kernel] [k] copy_user_enhanced_fast_string > 0.46% [kvm_intel] [k] vmx_vcpu_run > 0.46% [kvm] [k] vcpu_enter_guest > 0.42% [kernel] [k] tcp_poll > 0.41% [kernel] [k] system_call_after_swapgs > 0.40% libglib-2.0.so.0.3200.4 [.] g_slice_alloc > 0.40% [kernel] [k] system_call > 0.38% libpthread-2.13.so [.] 
0x000000000000e18d > 0.38% libglib-2.0.so.0.3200.4 [.] g_slice_free1 > 0.38% qemu-system-x86_64 [.] address_space_translate_internal > 0.38% [kernel] [k] _raw_spin_lock > 0.37% qemu-system-x86_64 [.] phys_page_find > 0.36% [kernel] [k] get_page_from_freelist > 0.35% [kernel] [k] sock_poll > 0.34% [kernel] [k] fsnotify > 0.31% libglib-2.0.so.0.3200.4 [.] g_main_context_check > 0.30% [kernel] [k] do_direct_IO > 0.29% libpthread-2.13.so [.] pthread_getspecific > > virtio_console module uninstalled: > PerfTop: 9138 irqs/sec kernel:71.7% exact: 0.0% [4000Hz cycles], (target_pid: 88381) > ------------------------------------------------------------------------------------------------------------------------------ > > 5.72% qemu-system-x86_64 [.] qcow2_check_metadata_overlap > 4.51% [kernel] [k] fget_light > 3.98% [kernel] [k] _raw_spin_lock_irqsave > 2.55% [kernel] [k] fput > 2.48% libpthread-2.13.so [.] pthread_mutex_lock > 2.46% [kernel] [k] _raw_spin_unlock_irqrestore > 2.21% libpthread-2.13.so [.] __pthread_mutex_unlock_usercnt > 1.71% [vdso] [.] 0x000000000000060c > 1.68% libc-2.13.so [.] 0x00000000000e751f > 1.64% libglib-2.0.so.0.3200.4 [.] 0x000000000004fca0 > 1.20% [kernel] [k] __srcu_read_lock > 1.14% [kernel] [k] do_sys_poll > 0.96% [kernel] [k] _raw_spin_lock_irq > 0.95% [kernel] [k] __pollwait > 0.91% [kernel] [k] __srcu_read_unlock > 0.78% [kernel] [k] tcp_poll > 0.74% [kvm] [k] vcpu_enter_guest > 0.73% [kvm_intel] [k] vmx_vcpu_run > 0.72% [kernel] [k] _raw_spin_lock > 0.72% [kernel] [k] system_call_after_swapgs > 0.70% [kernel] [k] copy_user_enhanced_fast_string > 0.67% libglib-2.0.so.0.3200.4 [.] g_slice_free1 > 0.66% libpthread-2.13.so [.] 0x000000000000e12d > 0.65% [kernel] [k] system_call > 0.61% [kernel] [k] do_direct_IO > 0.57% qemu-system-x86_64 [.] qemu_iohandler_poll > 0.57% [kernel] [k] fsnotify > 0.54% libglib-2.0.so.0.3200.4 [.] g_slice_alloc > 0.50% [kernel] [k] vfs_write > 0.49% libpthread-2.13.so [.] 
pthread_getspecific > 0.48% qemu-system-x86_64 [.] qemu_event_reset > 0.47% libglib-2.0.so.0.3200.4 [.] g_main_context_check > 0.46% qemu-system-x86_64 [.] address_space_translate_internal > 0.46% [kernel] [k] sock_poll > 0.46% libpthread-2.13.so [.] __pthread_disable_asynccancel > 0.44% [kernel] [k] resched_task > 0.43% libpthread-2.13.so [.] __pthread_enable_asynccancel > 0.42% qemu-system-x86_64 [.] phys_page_find > 0.39% qemu-system-x86_64 [.] object_dynamic_cast_assert Max: Unrelated to this performance issue but I notice that the qcow2 metadata overlap check is high in the host CPU profile. Have you had any thoughts about optimizing the check? Stefan [-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Qemu-devel] [question] virtio-blk performance degradation happened with virito-serial 2014-09-12 12:38 ` Stefan Hajnoczi @ 2014-09-13 17:22 ` Max Reitz -1 siblings, 0 replies; 40+ messages in thread From: Max Reitz @ 2014-09-13 17:22 UTC (permalink / raw) To: Stefan Hajnoczi, Zhang Haoyu Cc: Amit Shah, Zhang Haoyu, Paolo Bonzini, qemu-devel, kvm, Christian Borntraeger On 12.09.2014 14:38, Stefan Hajnoczi wrote: > Max: Unrelated to this performance issue but I notice that the qcow2 > metadata overlap check is high in the host CPU profile. Have you had > any thoughts about optimizing the check? > > Stefan In fact, I have done so (albeit only briefly). Instead of gathering all the information in the overlap function itself, we could either have a generic list of typed ranges (e.g. "cluster 0: header", "clusters 1 to 5: L1 table", etc.) or a not-really-bitmap (with 4 bits per entry specifying the cluster type (header, L1 table, free or data cluster, etc.)). The disadvantage of the former would be that in its simplest form we'd have to run through the whole list to find out whether a cluster is already reserved for metadata or not. We could easily optimize this by keeping the list in order and then performing a binary search. The disadvantage of the latter would obviously be its memory size. For a 1 TB image with 64 kB clusters, it would be 8 MB in size. Could be considered acceptable, but I deem it too large. The advantage would be constant access time, of course. We could combine both approaches, that is, using the bitmap as a cache: Whenever a cluster is overlap checked, the corresponding bitmap range (or "bitmap window") is requested; if that is not available, it is generated from the range list and then put into the cache. The remaining question is how large the range list would be in memory. Basically, its size would be comparable to an RLE version of the bitmap. 
In contrast to a raw RLE version, however, we'd have to add the start cluster to each entry in order to be able to perform binary search and we'd omit free and/or data clusters. So, we'd have 4 bits for the cluster type, let's say 12 bits for the cluster count and of course 64 bits for the first cluster index. Or, for maximum efficiency, we'd have 64 - 9 - 1 = 54 bits for the cluster index, 4 bits for the type and then 6 bits for the cluster count. The first variant gives us 10 bytes per metadata range, the second 8. Considering one refcount block can handle cluster_size / 2 entries and one L2 table can handle cluster_size / 8 entries, we have (for images with a cluster size of 64 kB) a ratio of about 1/32768 refcount blocks per cluster and 1/8192 L2 tables per cluster. I guess we therefore have a metadata ratio of about 1/6000. At the worst, each metadata cluster requires its own range list entry, which for 10 bytes per entry means less than 30 kB for the list of a 1 TB image with 64 kB clusters. I think that's acceptable. We could compress that list even more by making it a real RLE version of the bitmap, removing the cluster index from each entry; remember that for this mixed range list/bitmap approach we no longer need to be able to perform exact binary search but only need to be able to quickly seek to the beginning of a bitmap window. This can be achieved by forcing breaks in the range list at every window border and keeping track of those offsets along with the corresponding bitmap window index. When we want to generate a bitmap window, we look up the start offset in the range list (constant time), then generate it (linear in the window size) and can then perform constant-time lookups for each overlap check in that window. I think that could greatly speed things up and also allow us to always perform range checks on data structures not kept in memory (inactive L1 and L2 tables). 
The only question now remaining to me is whether that caching is actually feasible or whether binary search into the range list (which then would have to include the cluster index for each entry) would be faster than generating bitmap windows which might suffer from ping-pong effects. Max ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [question] virtio-blk performance degradation happened with virito-serial 2014-09-12 3:21 ` [Qemu-devel] " Zhang Haoyu @ 2014-09-16 14:59 ` Zhang Haoyu -1 siblings, 0 replies; 40+ messages in thread From: Zhang Haoyu @ 2014-09-16 14:59 UTC (permalink / raw) To: Zhang Haoyu, Amit Shah, Paolo Bonzini Cc: qemu-devel, kvm, Christian Borntraeger >>>>>> If virtio-blk and virtio-serial share an IRQ, the guest operating system has to check each virtqueue for activity. Maybe there is some inefficiency doing that. >>>>>> AFAIK virtio-serial registers 64 virtqueues (on 31 ports + console) even if everything is unused. >>>>> >>>>> That could be the case if MSI is disabled. >>>> >>>> Do the windows virtio drivers enable MSIs, in their inf file? >>> >>> It depends on the version of the drivers, but it is a reasonable guess >>> at what differs between Linux and Windows. Haoyu, can you give us the >>> output of lspci from a Linux guest? >>> >> I made a test with fio on rhel-6.5 guest, the same degradation happened too, this degradation can be reproduced on rhel6.5 guest 100%. >> virtio_console module installed: >> 64K-write-sequence: 285 MBPS, 4380 IOPS >> virtio_console module uninstalled: >> 64K-write-sequence: 370 MBPS, 5670 IOPS >> >I use top -d 1 -H -p <qemu-pid> to monitor the cpu usage, and found that, >virtio_console module installed: >qemu main thread cpu usage: 98% >virtio_console module uninstalled: >qemu main thread cpu usage: 60% > I found that the statement "err = register_virtio_driver(&virtio_console);" in virtio_console module's init() function will cause the degradation, if I directly return before "err = register_virtio_driver(&virtio_console);", then the degradation disappeared, if directly return after "err = register_virtio_driver(&virtio_console);", the degradation is still there. I will try below test case, 1. 
Do not emulate the virtio-serial device, then install/uninstall the virtio_console driver in the guest, to see whether there is a difference in virtio-blk performance and cpu usage. 2. Do not emulate the virtio-serial device, then install the virtio_balloon driver (while also not emulating a virtio-balloon device), to see whether the virtio-blk performance degradation still happens. 3. Emulate a virtio-balloon device instead of a virtio-serial device, then see whether virtio-blk performance is hampered. Based on the test results, corresponding analysis will be performed. Any ideas? Thanks, Zhang Haoyu ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Qemu-devel] [question] virtio-blk performance degradationhappened with virito-serial 2014-09-01 12:52 ` [Qemu-devel] [question] virtio-blk performance degradationhappened " Zhang Haoyu 2014-09-01 13:09 ` Christian Borntraeger @ 2014-09-02 6:36 ` Amit Shah 2014-09-02 18:05 ` Andrey Korolyov ` (2 more replies) 1 sibling, 3 replies; 40+ messages in thread From: Amit Shah @ 2014-09-02 6:36 UTC (permalink / raw) To: Zhang Haoyu; +Cc: qemu-devel, kvm On (Mon) 01 Sep 2014 [20:52:46], Zhang Haoyu wrote: > >>> Hi, all > >>> > >>> I start a VM with virtio-serial (default ports number: 31), and found that virtio-blk performance degradation happened, about 25%, this problem can be reproduced 100%. > >>> without virtio-serial: > >>> 4k-read-random 1186 IOPS > >>> with virtio-serial: > >>> 4k-read-random 871 IOPS > >>> > >>> but if use max_ports=2 option to limit the max number of virio-serial ports, then the IO performance degradation is not so serious, about 5%. > >>> > >>> And, ide performance degradation does not happen with virtio-serial. > >> > >>Pretty sure it's related to MSI vectors in use. It's possible that > >>the virtio-serial device takes up all the avl vectors in the guests, > >>leaving old-style irqs for the virtio-blk device. > >> > >I don't think so, > >I use iometer to test 64k-read(or write)-sequence case, if I disable the virtio-serial dynamically via device manager->virtio-serial => disable, > >then the performance get promotion about 25% immediately, then I re-enable the virtio-serial via device manager->virtio-serial => enable, > >the performance got back again, very obvious. > add comments: > Although the virtio-serial is enabled, I don't use it at all, the degradation still happened. Using the vectors= option as mentioned below, you can restrict the number of MSI vectors the virtio-serial device gets. You can then confirm whether it's MSI that's related to these issues. > >So, I think it has no business with legacy interrupt mode, right? 
> > > >I am going to observe the difference of perf top data on qemu and perf kvm stat data when disable/enable virtio-serial in guest, > >and the difference of perf top data on guest when disable/enable virtio-serial in guest, > >any ideas? > > > >Thanks, > >Zhang Haoyu > >>If you restrict the number of vectors the virtio-serial device gets > >>(using the -device virtio-serial-pci,vectors= param), does that make > >>things better for you? Amit ^ permalink raw reply [flat|nested] 40+ messages in thread
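As a concrete illustration of the vectors= suggestion, the option is set on the device's command line. This is a hypothetical, trimmed-down invocation (the disk path and device layout are placeholders, not the reporter's full command line):

```shell
# Cap the virtio-serial controller at 4 MSI vectors (hypothetical example).
qemu-system-x86_64 -enable-kvm -m 2048 \
    -drive file=vm-disk.qcow2,if=none,id=drive0,cache=none,aio=native \
    -device virtio-blk-pci,drive=drive0 \
    -device virtio-serial-pci,vectors=4
```

If the degradation tracks the vector count, that points at MSI allocation; if it persists regardless, the cause lies elsewhere.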
* Re: [Qemu-devel] [question] virtio-blk performance degradationhappened with virito-serial 2014-09-02 6:36 ` [Qemu-devel] [question] virtio-blk performance degradationhappened " Amit Shah @ 2014-09-02 18:05 ` Andrey Korolyov 2014-09-02 18:11 ` [Qemu-devel] " Amit Shah 2014-09-04 2:20 ` [Qemu-devel] [question] virtio-blk performancedegradationhappened " Zhang Haoyu 2014-09-19 5:53 ` Fam Zheng 2 siblings, 1 reply; 40+ messages in thread From: Andrey Korolyov @ 2014-09-02 18:05 UTC (permalink / raw) To: Amit Shah; +Cc: Zhang Haoyu, qemu-devel, kvm On Tue, Sep 2, 2014 at 10:36 AM, Amit Shah <amit.shah@redhat.com> wrote: > On (Mon) 01 Sep 2014 [20:52:46], Zhang Haoyu wrote: >> >>> Hi, all >> >>> >> >>> I start a VM with virtio-serial (default ports number: 31), and found that virtio-blk performance degradation happened, about 25%, this problem can be reproduced 100%. >> >>> without virtio-serial: >> >>> 4k-read-random 1186 IOPS >> >>> with virtio-serial: >> >>> 4k-read-random 871 IOPS >> >>> >> >>> but if use max_ports=2 option to limit the max number of virio-serial ports, then the IO performance degradation is not so serious, about 5%. >> >>> >> >>> And, ide performance degradation does not happen with virtio-serial. >> >> >> >>Pretty sure it's related to MSI vectors in use. It's possible that >> >>the virtio-serial device takes up all the avl vectors in the guests, >> >>leaving old-style irqs for the virtio-blk device. >> >> >> >I don't think so, >> >I use iometer to test 64k-read(or write)-sequence case, if I disable the virtio-serial dynamically via device manager->virtio-serial => disable, >> >then the performance get promotion about 25% immediately, then I re-enable the virtio-serial via device manager->virtio-serial => enable, >> >the performance got back again, very obvious. >> add comments: >> Although the virtio-serial is enabled, I don't use it at all, the degradation still happened. 
> > Using the vectors= option as mentioned below, you can restrict the > number of MSI vectors the virtio-serial device gets. You can then > confirm whether it's MSI that's related to these issues. > >> >So, I think it has no business with legacy interrupt mode, right? >> > >> >I am going to observe the difference of perf top data on qemu and perf kvm stat data when disable/enable virtio-serial in guest, >> >and the difference of perf top data on guest when disable/enable virtio-serial in guest, >> >any ideas? >> > >> >Thanks, >> >Zhang Haoyu >> >>If you restrict the number of vectors the virtio-serial device gets >> >>(using the -device virtio-serial-pci,vectors= param), does that make >> >>things better for you? > > > > Amit > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html Can confirm serious degradation compared to 1.1 with regular serial output - I am able to hang VM forever after some tens of seconds after continuously printing dmesg to the ttyS0. VM just ate all available CPU quota during test and hung for some tens of seconds, not even responding to regular pings and progressively raising CPU consumption up to the limit. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [question] virtio-blk performance degradationhappened with virito-serial 2014-09-02 18:05 ` Andrey Korolyov @ 2014-09-02 18:11 ` Amit Shah 0 siblings, 0 replies; 40+ messages in thread From: Amit Shah @ 2014-09-02 18:11 UTC (permalink / raw) To: Andrey Korolyov; +Cc: Zhang Haoyu, qemu-devel, kvm On (Tue) 02 Sep 2014 [22:05:45], Andrey Korolyov wrote: > Can confirm serious degradation compared to 1.1 with regular > serial output - I am able to hang VM forever after some tens of > seconds after continuously printing dmesg to the ttyS0. VM just ate > all available CPU quota during test and hung for some tens of > seconds, not even responding to regular pings and progressively > raising CPU consumption up to the limit. Entirely different to what's being discussed here. You're observing slowdown with ttyS0 in the guest -- the isa-serial device. This thread is discussing virtio-blk and virtio-serial. Amit ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Qemu-devel] [question] virtio-blk performance degradationhappened with virito-serial 2014-09-02 18:11 ` [Qemu-devel] " Amit Shah (?) @ 2014-09-02 18:27 ` Andrey Korolyov -1 siblings, 0 replies; 40+ messages in thread From: Andrey Korolyov @ 2014-09-02 18:27 UTC (permalink / raw) To: Amit Shah; +Cc: Zhang Haoyu, qemu-devel, kvm On Tue, Sep 2, 2014 at 10:11 PM, Amit Shah <amit.shah@redhat.com> wrote: > On (Tue) 02 Sep 2014 [22:05:45], Andrey Korolyov wrote: > >> Can confirm serious degradation compared to 1.1 with regular >> serial output - I am able to hang VM forever after some tens of >> seconds after continuously printing dmesg to the ttyS0. VM just ate >> all available CPU quota during test and hung for some tens of >> seconds, not even responding to regular pings and progressively >> raising CPU consumption up to the limit. > > Entirely different to what's being discussed here. You're observing > slowdown with ttyS0 in the guest -- the isa-serial device. This > thread is discussing virtio-blk and virtio-serial. > > Amit Sorry for thread hijacking, the problem is definitely not related to the interrupt rework; I will start a new thread. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Qemu-devel] [question] virtio-blk performancedegradationhappened with virito-serial 2014-09-02 6:36 ` [Qemu-devel] [question] virtio-blk performance degradationhappened " Amit Shah 2014-09-02 18:05 ` Andrey Korolyov @ 2014-09-04 2:20 ` Zhang Haoyu 2014-09-19 5:53 ` Fam Zheng 2 siblings, 0 replies; 40+ messages in thread From: Zhang Haoyu @ 2014-09-04 2:20 UTC (permalink / raw) To: Amit Shah; +Cc: qemu-devel, kvm >> >>> Hi, all >> >>> >> >>> I start a VM with virtio-serial (default ports number: 31), and found that virtio-blk performance degradation happened, about 25%, this problem can be reproduced 100%. >> >>> without virtio-serial: >> >>> 4k-read-random 1186 IOPS >> >>> with virtio-serial: >> >>> 4k-read-random 871 IOPS >> >>> >> >>> but if use max_ports=2 option to limit the max number of virio-serial ports, then the IO performance degradation is not so serious, about 5%. >> >>> >> >>> And, ide performance degradation does not happen with virtio-serial. >> >> >> >>Pretty sure it's related to MSI vectors in use. It's possible that >> >>the virtio-serial device takes up all the avl vectors in the guests, >> >>leaving old-style irqs for the virtio-blk device. >> >> >> >I don't think so, >> >I use iometer to test 64k-read(or write)-sequence case, if I disable the virtio-serial dynamically via device manager->virtio-serial => disable, >> >then the performance get promotion about 25% immediately, then I re-enable the virtio-serial via device manager->virtio-serial => enable, >> >the performance got back again, very obvious. >> add comments: >> Although the virtio-serial is enabled, I don't use it at all, the degradation still happened. > >Using the vectors= option as mentioned below, you can restrict the >number of MSI vectors the virtio-serial device gets. You can then >confirm whether it's MSI that's related to these issues. > I use "-device virtio-serial,vectors=4" instead of "-device virtio-serial", but the degradation still happened, nothing changed. 
with virtio-serial enabled: 64k-write-sequence: 4200 IOPS with virtio-serial disabled: 64k-write-sequence: 5300 IOPS How can I confirm whether MSI is in use in Windows? Thanks, Zhang Haoyu >> >So, I think it has no business with legacy interrupt mode, right? >> > >> >I am going to observe the difference of perf top data on qemu and perf kvm stat data when disable/enable virtio-serial in guest, >> >and the difference of perf top data on guest when disable/enable virtio-serial in guest, >> >any ideas? >> > >> >Thanks, >> >Zhang Haoyu >> >>If you restrict the number of vectors the virtio-serial device gets >> >>(using the -device virtio-serial-pci,vectors= param), does that make >> >>things better for you? ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Qemu-devel] [question] virtio-blk performance degradationhappened with virito-serial 2014-09-02 6:36 ` [Qemu-devel] [question] virtio-blk performance degradationhappened " Amit Shah @ 2014-09-19 5:53 ` Fam Zheng 2014-09-04 2:20 ` [Qemu-devel] [question] virtio-blk performancedegradationhappened " Zhang Haoyu 2014-09-19 5:53 ` Fam Zheng 2 siblings, 0 replies; 40+ messages in thread From: Fam Zheng @ 2014-09-19 5:53 UTC (permalink / raw) To: Amit Shah; +Cc: Zhang Haoyu, qemu-devel, kvm, Paolo Bonzini On Tue, 09/02 12:06, Amit Shah wrote: > On (Mon) 01 Sep 2014 [20:52:46], Zhang Haoyu wrote: > > >>> Hi, all > > >>> > > >>> I start a VM with virtio-serial (default ports number: 31), and found that virtio-blk performance degradation happened, about 25%, this problem can be reproduced 100%. > > >>> without virtio-serial: > > >>> 4k-read-random 1186 IOPS > > >>> with virtio-serial: > > >>> 4k-read-random 871 IOPS > > >>> > > >>> but if use max_ports=2 option to limit the max number of virio-serial ports, then the IO performance degradation is not so serious, about 5%. > > >>> > > >>> And, ide performance degradation does not happen with virtio-serial. > > >> > > >>Pretty sure it's related to MSI vectors in use. It's possible that > > >>the virtio-serial device takes up all the avl vectors in the guests, > > >>leaving old-style irqs for the virtio-blk device. > > >> > > >I don't think so, > > >I use iometer to test 64k-read(or write)-sequence case, if I disable the virtio-serial dynamically via device manager->virtio-serial => disable, > > >then the performance get promotion about 25% immediately, then I re-enable the virtio-serial via device manager->virtio-serial => enable, > > >the performance got back again, very obvious. > > add comments: > > Although the virtio-serial is enabled, I don't use it at all, the degradation still happened. > > Using the vectors= option as mentioned below, you can restrict the > number of MSI vectors the virtio-serial device gets. 
You can then > confirm whether it's MSI that's related to these issues. Amit, It's related to the big number of ioeventfds used in virtio-serial-pci. With virtio-serial-pci's ioeventfd=off, the performance is not affected no matter if guest initializes it or not. In my test, there are 12 fds to poll in qemu_poll_ns before loading guest virtio_console.ko, whereas 76 once modprobe virtio_console. Looks like the ppoll takes more time to poll more fds. Some trace data with systemtap: 12 fds: time rel_time symbol 15 (+1) qemu_poll_ns [enter] 18 (+3) qemu_poll_ns [return] 76 fd: 12 (+2) qemu_poll_ns [enter] 18 (+6) qemu_poll_ns [return] I haven't looked at virtio-serial code, I'm not sure if we should reduce the number of ioeventfds in virtio-serial-pci or focus on lower level efficiency. Haven't compared with g_poll but I think the underlying syscall should be the same. Any ideas? Fam > > > >So, I think it has no business with legacy interrupt mode, right? > > > > > >I am going to observe the difference of perf top data on qemu and perf kvm stat data when disable/enable virtio-serial in guest, > > >and the difference of perf top data on guest when disable/enable virtio-serial in guest, > > >any ideas? > > > > > >Thanks, > > >Zhang Haoyu > > >>If you restrict the number of vectors the virtio-serial device gets > > >>(using the -device virtio-serial-pci,vectors= param), does that make > > >>things better for you? > > > > Amit > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 40+ messages in thread
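The observation above — that qemu_poll_ns slows down as the fd count grows — reflects how poll()/ppoll() must hand the entire fd array to the kernel on every call, even when nothing is ready. A rough illustration of that scaling (plain Python with idle pipes, not QEMU's event loop; the 12 and 76 counts are taken from the trace above):

```python
import os
import select
import time

def time_poll(nfds, iterations=2000):
    """Poll nfds idle pipe read-ends; the kernel scans every fd per call."""
    pipes = [os.pipe() for _ in range(nfds)]
    poller = select.poll()
    for r, _ in pipes:
        poller.register(r, select.POLLIN)
    start = time.perf_counter()
    for _ in range(iterations):
        poller.poll(0)  # timeout 0: nothing is ready, return immediately
    elapsed = time.perf_counter() - start
    for r, w in pipes:
        os.close(r)
        os.close(w)
    return elapsed

t_small = time_poll(12)  # fd count before loading virtio_console.ko
t_large = time_poll(76)  # fd count after modprobe virtio_console
```

On most systems the 76-fd case comes out measurably slower per call, consistent with the +3 vs +6 deltas in the systemtap trace.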
* Re: [question] virtio-blk performance degradationhappened with virito-serial 2014-09-19 5:53 ` Fam Zheng @ 2014-09-19 13:35 ` Paolo Bonzini -1 siblings, 0 replies; 40+ messages in thread From: Paolo Bonzini @ 2014-09-19 13:35 UTC (permalink / raw) To: Fam Zheng, Amit Shah; +Cc: Zhang Haoyu, qemu-devel, kvm Il 19/09/2014 07:53, Fam Zheng ha scritto: > Any ideas? The obvious, but hardish one is to switch to epoll (one epoll fd per AioContext, plus one for iohandler.c). This would require converting iohandler.c to a GSource. Paolo ^ permalink raw reply [flat|nested] 40+ messages in thread
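The epoll approach avoids the per-call scan: fds are registered with the kernel once, and each wait returns only the ready ones. A minimal Linux-only sketch of the mechanism (Python's select.epoll; this illustrates the kernel interface, not QEMU's AioContext code):

```python
import os
import select

NUM_FDS = 76  # fd count from the virtio_console trace above
ep = select.epoll()
pipes = [os.pipe() for _ in range(NUM_FDS)]
for r, _ in pipes:
    ep.register(r, select.EPOLLIN)  # registration cost is paid once, not per wait

os.write(pipes[0][1], b"x")  # make exactly one fd readable
events = ep.poll(0)          # cost scales with ready fds, not registered fds
ready = [fd for fd, _ in events]

for r, w in pipes:
    ep.unregister(r)
    os.close(r)
    os.close(w)
ep.close()
```

Converting iohandler.c to a GSource, as suggested, is what would let each AioContext own one such epoll fd.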
* Re: [Qemu-devel] [question] virtio-blk performancedegradationhappened with virito-serial 2014-09-19 5:53 ` Fam Zheng @ 2014-09-22 13:23 ` Zhang Haoyu -1 siblings, 0 replies; 40+ messages in thread From: Zhang Haoyu @ 2014-09-22 13:23 UTC (permalink / raw) To: Fam Zheng, Amit Shah; +Cc: qemu-devel, kvm, Paolo Bonzini >> > >>> Hi, all >> > >>> >> > >>> I start a VM with virtio-serial (default ports number: 31), and found that virtio-blk performance degradation happened, about 25%, this problem can be reproduced 100%. >> > >>> without virtio-serial: >> > >>> 4k-read-random 1186 IOPS >> > >>> with virtio-serial: >> > >>> 4k-read-random 871 IOPS >> > >>> >> > >>> but if use max_ports=2 option to limit the max number of virio-serial ports, then the IO performance degradation is not so serious, about 5%. >> > >>> >> > >>> And, ide performance degradation does not happen with virtio-serial. >> > >> >> > >>Pretty sure it's related to MSI vectors in use. It's possible that >> > >>the virtio-serial device takes up all the avl vectors in the guests, >> > >>leaving old-style irqs for the virtio-blk device. >> > >> >> > >I don't think so, >> > >I use iometer to test 64k-read(or write)-sequence case, if I disable the virtio-serial dynamically via device manager->virtio-serial => disable, >> > >then the performance get promotion about 25% immediately, then I re-enable the virtio-serial via device manager->virtio-serial => enable, >> > >the performance got back again, very obvious. >> > add comments: >> > Although the virtio-serial is enabled, I don't use it at all, the degradation still happened. >> >> Using the vectors= option as mentioned below, you can restrict the >> number of MSI vectors the virtio-serial device gets. You can then >> confirm whether it's MSI that's related to these issues. > >Amit, > >It's related to the big number of ioeventfds used in virtio-serial-pci. 
With >virtio-serial-pci's ioeventfd=off, the performance is not affected no matter if >guest initializes it or not. > >In my test, there are 12 fds to poll in qemu_poll_ns before loading guest >virtio_console.ko, whereas 76 once modprobe virtio_console. > >Looks like the ppoll takes more time to poll more fds. > >Some trace data with systemtap: > >12 fds: > >time rel_time symbol >15 (+1) qemu_poll_ns [enter] >18 (+3) qemu_poll_ns [return] > >76 fd: > >12 (+2) qemu_poll_ns [enter] >18 (+6) qemu_poll_ns [return] > >I haven't looked at virtio-serial code, I'm not sure if we should reduce the >number of ioeventfds in virtio-serial-pci or focus on lower level efficiency. > Does ioeventfd=off hamper the performance of virtio-serial? In my opinion, virtio-serial's use scenarios do not demand high throughput, so ioeventfd=off should have only a slight impact on its performance. Thanks, Zhang Haoyu >Haven't compared with g_poll but I think the underlying syscall should be the >same. > >Any ideas? > >Fam > > >> >> > >So, I think it has no business with legacy interrupt mode, right? >> > > >> > >I am going to observe the difference of perf top data on qemu and perf kvm stat data when disable/enable virtio-serial in guest, >> > >and the difference of perf top data on guest when disable/enable virtio-serial in guest, >> > >any ideas? >> > > >> > >Thanks, >> > >Zhang Haoyu >> > >>If you restrict the number of vectors the virtio-serial device gets >> > >>(using the -device virtio-serial-pci,vectors= param), does that make >> > >>things better for you? >> >> >> >> Amit ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Qemu-devel] [question] virtio-blk performancedegradationhappened with virito-serial 2014-09-22 13:23 ` Zhang Haoyu @ 2014-09-23 1:29 ` Fam Zheng -1 siblings, 0 replies; 40+ messages in thread From: Fam Zheng @ 2014-09-23 1:29 UTC (permalink / raw) To: Zhang Haoyu; +Cc: Amit Shah, qemu-devel, kvm, Paolo Bonzini On Mon, 09/22 21:23, Zhang Haoyu wrote: > > > >Amit, > > > >It's related to the big number of ioeventfds used in virtio-serial-pci. With > >virtio-serial-pci's ioeventfd=off, the performance is not affected no matter if > >guest initializes it or not. > > > >In my test, there are 12 fds to poll in qemu_poll_ns before loading guest > >virtio_console.ko, whereas 76 once modprobe virtio_console. > > > >Looks like the ppoll takes more time to poll more fds. > > > >Some trace data with systemtap: > > > >12 fds: > > > >time rel_time symbol > >15 (+1) qemu_poll_ns [enter] > >18 (+3) qemu_poll_ns [return] > > > >76 fd: > > > >12 (+2) qemu_poll_ns [enter] > >18 (+6) qemu_poll_ns [return] > > > >I haven't looked at virtio-serial code, I'm not sure if we should reduce the > >number of ioeventfds in virtio-serial-pci or focus on lower level efficiency. > > > Does ioeventfd=off hamper the performance of virtio-serial? In theory it has an impact, but I have no data about this. If you have a performance demand, it's best to try it against your use case to answer this question. Fam ^ permalink raw reply [flat|nested] 40+ messages in thread
end of thread, other threads:[~2014-09-23  1:29 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-29  7:45 [question] virtio-blk performance degradation happened with virito-serial Zhang Haoyu
2014-08-29  7:45 ` [Qemu-devel] " Zhang Haoyu
2014-08-29 14:38   ` Amit Shah
2014-09-01 12:38     ` [Qemu-devel] [question] virtio-blk performance degradation happened " Zhang Haoyu
2014-09-01 12:46       ` Amit Shah
2014-09-01 12:57         ` [Qemu-devel] [question] virtio-blk performance degradation happened " Zhang Haoyu
2014-09-01 12:52       ` [Qemu-devel] [question] virtio-blk performance degradation happened " Zhang Haoyu
2014-09-01 13:09         ` Christian Borntraeger
2014-09-01 13:12           ` Paolo Bonzini
2014-09-01 13:22             ` Christian Borntraeger
2014-09-01 13:29               ` Paolo Bonzini
2014-09-01 14:03                 ` Christian Borntraeger
2014-09-01 14:15                   ` Christian Borntraeger
2014-09-04  7:56                     ` [Qemu-devel] [question] virtio-blk performance degradation happened with virito-serial Zhang Haoyu
2014-09-07  9:46                       ` Zhang Haoyu
2014-09-07  9:46                       ` Zhang Haoyu
2014-09-11  6:11                         ` Amit Shah
2014-09-11  6:11                         ` Amit Shah
2014-09-12  3:21                           ` [question] virtio-blk performance degradation happened with virito-serial Zhang Haoyu
2014-09-12  3:21                           ` [Qemu-devel] " Zhang Haoyu
2014-09-12 12:38                             ` Stefan Hajnoczi
2014-09-12 12:38                             ` Stefan Hajnoczi
2014-09-13 17:22                               ` Max Reitz
2014-09-13 17:22                               ` Max Reitz
2014-09-16 14:59                                 ` Zhang Haoyu
2014-09-16 14:59                                 ` [Qemu-devel] " Zhang Haoyu
2014-09-02  6:36       ` [Qemu-devel] [question] virtio-blk performance degradation happened " Amit Shah
2014-09-02 18:05         ` Andrey Korolyov
2014-09-02 18:11           ` Amit Shah
2014-09-02 18:11           ` [Qemu-devel] " Amit Shah
2014-09-02 18:27             ` Andrey Korolyov
2014-09-04  2:20               ` [Qemu-devel] [question] virtio-blk performance degradation happened " Zhang Haoyu
2014-09-19  5:53 ` [Qemu-devel] [question] virtio-blk performance degradation happened " Fam Zheng
2014-09-19  5:53 ` Fam Zheng
2014-09-19 13:35   ` Paolo Bonzini
2014-09-19 13:35   ` [Qemu-devel] " Paolo Bonzini
2014-09-22 13:23     ` [Qemu-devel] [question] virtio-blk performance degradation happened " Zhang Haoyu
2014-09-22 13:23     ` Zhang Haoyu
2014-09-23  1:29       ` Fam Zheng
2014-09-23  1:29       ` Fam Zheng