From: Anthony Liguori
Subject: Re: QEMU PIC indirection patch for in-kernel APIC work
Date: Thu, 05 Apr 2007 10:22:45 -0500
To: Avi Kivity
Cc: kvm-devel

Avi Kivity wrote:
> Dong, Eddie wrote:
>> Avi Kivity wrote:
>>
>>>> With PIC in Xen, CPU2K gets a 6.5% performance gain on an old 1000HZ
>>>> Linux kernel, and KB gets a 14% gain. We also did a shared PIC model
>>>> which shares PIC state between Qemu and the VMM with fewer LOC in the
>>>> VMM; it gets a similar performance gain (5.8% in my test).
>>>> BTW, at that time, the PIT was in the VMM already.
>>>>
>>> I expect that the gain in kvm will be smaller. Xen has to schedule
>>> dom0 to process the event channel (possibly on another cpu), dom0 has
>>> to schedule qemu-dm (again, possibly on another cpu), qemu does its
>>> thing, and then Xen has to schedule domU again. With kvm, we are
>>> always on the same cpu, and the only overhead is the system call,
>>> which is a few hundred nanoseconds. I expect with current hardware
>>> that it will be negligible (as a vmexit is measured in microseconds),
>>> but to become measurable as hardware improves.
>>>
>> Yes, very possible.
>> We can take a quick measurement to see how many cycles are spent in a
>> dummy I/O emulation in KVM/Qemu. In Xen, one of my old P4 3.8GHZ
>> platforms takes about 50-60K cycles. We can see how much it is in KVM.
>> BTW, today's Linux kernel is no longer 1000HZ :-)
>> thx,eddie
>>
> There's some (old) data here:
>
>    http://virt.kernelnewbies.org/KVM/Performance
>
> showing pio latency of ~5600 cycles. Note that this is on AMD, which
> takes fewer cycles to switch than the P4, but on the other hand, we
> still do a save/restore of the fpu state on every exit, so we can
> speed it up even more.

It varies quite a lot. On more modern Intel processors, the PIO latency
is pretty good. Also, the number on the wiki is worse than what we see
today, since back then there was an extra syscall pair in the PIO path.
I posted a timeline a while ago on kvm-devel; IIRC, going to userspace
added < 1k cycles.

However, the difference will become much more pronounced as we improve
the VMEXIT latency in KVM. Right now we save a lot of state that we
strictly don't have to. Avi mentioned the FPU state, but we also save
many MSRs on every exit that we strictly only need to save/restore when
we lose the CPU (such as the sysenter MSRs). The saving and restoring
of these MSRs makes up the bulk of the (avoidable) VMEXIT overhead.
Eliminating it will make the difference between in-kernel and userspace
handling much more pronounced.

Xen PIO latency is definitely much worse (at least a factor of 3).
However, what really kills Xen is that occasionally an undesirable
scheduling decision is made and the PIO latency jumps to an enormous
value. This ends up making the mean PIO latency considerably worse than
the median latency.
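As a rough illustration of the quick measurement Eddie suggests, a
guest-side snippet along these lines would do. This is a sketch only:
port 0x80 and the iteration count are arbitrary choices, rdtsc is not
serialized here, and it needs root in the guest for ioperm().

/* Ballpark measurement of PIO round-trip cost from inside a Linux
 * guest: time a batch of writes to port 0x80 (the POST diagnostic
 * port, picked only because writes to it are harmless) and average.
 * rdtsc is unserialized, so treat the result as approximate.
 */
#include <stdint.h>
#include <stdio.h>
#include <sys/io.h>

static inline uint64_t rdtsc(void)
{
	uint32_t lo, hi;
	__asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
	return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
	enum { ITERS = 100000 };
	uint64_t t0, t1;
	int i;

	if (ioperm(0x80, 1, 1) < 0) {
		perror("ioperm");
		return 1;
	}

	t0 = rdtsc();
	for (i = 0; i < ITERS; i++)
		outb(0, 0x80);		/* each outb forces a vmexit */
	t1 = rdtsc();

	printf("~%llu cycles per pio\n",
	       (unsigned long long)((t1 - t0) / ITERS));
	return 0;
}

Running the same binary on bare metal and subtracting gives the
emulation overhead directly; pinning to one cpu keeps the numbers
stable.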
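To make the sysenter MSR point concrete, the idea is to move the MSR
switching out of the vmexit path and into the path where the vcpu
thread actually loses the cpu. A minimal sketch of that deferral
follows; all names here are invented for illustration and are not the
actual kvm code, and this would of course run in ring 0.

/* Sketch only: instead of swapping the sysenter MSRs on every vmexit,
 * leave the guest values loaded across exits and swap them back only
 * when the vcpu thread is descheduled (e.g. from a context-switch
 * hook).  Struct and function names are made up for this example.
 */
#include <stdint.h>

#define MSR_IA32_SYSENTER_CS	0x174
#define MSR_IA32_SYSENTER_ESP	0x175
#define MSR_IA32_SYSENTER_EIP	0x176

struct vcpu_msrs {
	uint64_t guest_cs, guest_esp, guest_eip;
	uint64_t host_cs, host_esp, host_eip;
	int guest_loaded;	/* are the guest values in the cpu? */
};

static inline uint64_t rdmsr64(uint32_t msr)
{
	uint32_t lo, hi;
	__asm__ __volatile__("rdmsr" : "=a"(lo), "=d"(hi) : "c"(msr));
	return ((uint64_t)hi << 32) | lo;
}

static inline void wrmsr64(uint32_t msr, uint64_t val)
{
	__asm__ __volatile__("wrmsr" : : "c"(msr), "a"((uint32_t)val),
			     "d"((uint32_t)(val >> 32)));
}

/* vmexit path: touch nothing -- the guest values stay loaded. */

/* Called only when the vcpu thread is preempted or migrated away. */
void vcpu_put_sysenter(struct vcpu_msrs *m)
{
	if (!m->guest_loaded)
		return;
	m->guest_cs  = rdmsr64(MSR_IA32_SYSENTER_CS);
	m->guest_esp = rdmsr64(MSR_IA32_SYSENTER_ESP);
	m->guest_eip = rdmsr64(MSR_IA32_SYSENTER_EIP);
	wrmsr64(MSR_IA32_SYSENTER_CS,  m->host_cs);
	wrmsr64(MSR_IA32_SYSENTER_ESP, m->host_esp);
	wrmsr64(MSR_IA32_SYSENTER_EIP, m->host_eip);
	m->guest_loaded = 0;
}

/* Called when the thread gets the cpu back, before the next vmentry. */
void vcpu_load_sysenter(struct vcpu_msrs *m)
{
	if (m->guest_loaded)
		return;
	wrmsr64(MSR_IA32_SYSENTER_CS,  m->guest_cs);
	wrmsr64(MSR_IA32_SYSENTER_ESP, m->guest_esp);
	wrmsr64(MSR_IA32_SYSENTER_EIP, m->guest_eip);
	m->guest_loaded = 1;
}

With something like that in place, the common exit path does no MSR
accesses at all, which is where the bulk of the avoidable overhead
goes away.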
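And the mean-vs-median point is easy to see with toy numbers (the
sample values below are invented, not measurements):

/* One scheduling hiccup among ten otherwise steady PIOs barely moves
 * the median but drags the mean far away from it.
 */
#include <stdio.h>
#include <stdlib.h>

static int cmp(const void *a, const void *b)
{
	long x = *(const long *)a, y = *(const long *)b;
	return (x > y) - (x < y);
}

int main(void)
{
	/* nine ordinary PIOs plus one that hit a bad scheduling
	 * decision, in cycles (all values invented) */
	long lat[10] = { 15000, 15200, 14900, 15100, 15050,
			 14950, 15150, 15000, 15100, 1000000 };
	long sum = 0;
	int i;

	for (i = 0; i < 10; i++)
		sum += lat[i];
	qsort(lat, 10, sizeof(long), cmp);

	printf("median = %ld cycles, mean = %ld cycles\n",
	       (lat[4] + lat[5]) / 2, sum / 10);
	return 0;
}

One outlier in ten is enough here to put the mean (~113K cycles) at
about 7.5x the median (~15K).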
Regards,

Anthony Liguori