From: Anthony Liguori
Subject: Re: QEMU PIC indirection patch for in-kernel APIC work
Date: Thu, 05 Apr 2007 10:22:45 -0500
To: Avi Kivity
Cc: kvm-devel

Avi Kivity wrote:
> Dong, Eddie wrote:
>> Avi Kivity wrote:
>>
>>>> With PIC in Xen, CPU2K gets a 6.5% performance gain on an old 1000HZ
>>>> Linux kernel, and KB gets a 14% gain. We also did a shared PIC model
>>>> which shares PIC state between Qemu and the VMM with fewer LOC in the
>>>> VMM; it gets a similar performance gain (5.8% in my test).
>>>> BTW, at that time, the PIT was in the VMM already.
>>>>
>>> I expect that the gain in kvm will be smaller. Xen has to schedule
>>> dom0 to process the event channel (possibly on another cpu), dom0 has
>>> to schedule qemu-dm (again, possibly on another cpu), qemu does its
>>> thing, and then Xen has to schedule domU again. With kvm, we are
>>> always on the same cpu, and the only overhead is the system call,
>>> which is a few hundred nanoseconds. I expect with current hardware
>>> that it will be negligible (as a vmexit is measured in microseconds),
>>> but to become measurable as hardware improves.
>>>
>> Yes, very possible.
>> We can take a quick measurement to see how many cycles are spent in a
>> dummy I/O emulation in KVM/Qemu. In Xen, one of my old P4 3.8GHZ
>> platforms takes about 50-60K cycles. We can see how much it is in KVM.
>> BTW, today's Linux kernel is no longer 1000HZ :-)
>> thx,eddie
>>
> There's some (old) data here:
>
>    http://virt.kernelnewbies.org/KVM/Performance
>
> showing pio latency of ~5600 cycles. Note that this is on AMD, which
> takes fewer cycles to switch than the P4, but on the other hand, we
> still do a save/restore of the fpu state on every exit, so we can
> speed it up even more.

It varies quite a lot. On more modern Intel processors, the PIO latency
is pretty good. Also, the number on the wiki is worse than what we see
today, since back then there was an extra syscall pair in the PIO path.
I posted a timeline a while ago on kvm-devel; IIRC, going to userspace
added < 1k cycles.

However, the difference will become much more pronounced as we improve
the VMEXIT latency in KVM. Right now we save a lot of state that we
strictly don't have to. Avi mentioned the FPU state, but we also save
many MSRs on every exit that we strictly only need to save/restore when
we lose the CPU (such as the sysenter MSRs). The saving and restoring
of these MSRs makes up the bulk of the (avoidable) VMEXIT overhead.
Eliminating it will make the difference between in-kernel and userspace
handling much more pronounced.

Xen PIO latency is definitely much worse (at least a factor of 3).
However, what really kills Xen is that occasionally an undesirable
scheduling decision is made and the PIO latency jumps to an enormous
value. This ends up making the mean PIO latency considerably worse than
the median latency.
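As a rough illustration of the quick measurement Eddie suggests, a
guest-side snippet along these lines would do. This is a sketch only:
port 0x80 and the iteration count are arbitrary choices, rdtsc is not
serialized here, and it needs root in the guest for ioperm().

/* Ballpark measurement of PIO round-trip cost from inside a Linux
 * guest: time a batch of writes to port 0x80 (the POST diagnostic
 * port, picked only because writes to it are harmless) and average.
 * rdtsc is unserialized, so treat the result as approximate.
 */
#include <stdint.h>
#include <stdio.h>
#include <sys/io.h>

static inline uint64_t rdtsc(void)
{
	uint32_t lo, hi;
	__asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
	return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
	enum { ITERS = 100000 };
	uint64_t t0, t1;
	int i;

	if (ioperm(0x80, 1, 1) < 0) {
		perror("ioperm");
		return 1;
	}

	t0 = rdtsc();
	for (i = 0; i < ITERS; i++)
		outb(0, 0x80);		/* each outb forces a vmexit */
	t1 = rdtsc();

	printf("~%llu cycles per pio\n",
	       (unsigned long long)((t1 - t0) / ITERS));
	return 0;
}

Running the same binary on bare metal and subtracting gives the
emulation overhead directly; pinning to one cpu keeps the numbers
stable.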
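To make the sysenter MSR point concrete, the idea is to move the MSR
switching out of the vmexit path and into the path where the vcpu
thread actually loses the cpu. A minimal sketch of that deferral
follows; all names here are invented for illustration and are not the
actual kvm code, and this would of course run in ring 0.

/* Sketch only: instead of swapping the sysenter MSRs on every vmexit,
 * leave the guest values loaded across exits and swap them back only
 * when the vcpu thread is descheduled (e.g. from a context-switch
 * hook).  Struct and function names are made up for this example.
 */
#include <stdint.h>

#define MSR_IA32_SYSENTER_CS	0x174
#define MSR_IA32_SYSENTER_ESP	0x175
#define MSR_IA32_SYSENTER_EIP	0x176

struct vcpu_msrs {
	uint64_t guest_cs, guest_esp, guest_eip;
	uint64_t host_cs, host_esp, host_eip;
	int guest_loaded;	/* are the guest values in the cpu? */
};

static inline uint64_t rdmsr64(uint32_t msr)
{
	uint32_t lo, hi;
	__asm__ __volatile__("rdmsr" : "=a"(lo), "=d"(hi) : "c"(msr));
	return ((uint64_t)hi << 32) | lo;
}

static inline void wrmsr64(uint32_t msr, uint64_t val)
{
	__asm__ __volatile__("wrmsr" : : "c"(msr), "a"((uint32_t)val),
			     "d"((uint32_t)(val >> 32)));
}

/* vmexit path: touch nothing -- the guest values stay loaded. */

/* Called only when the vcpu thread is preempted or migrated away. */
void vcpu_put_sysenter(struct vcpu_msrs *m)
{
	if (!m->guest_loaded)
		return;
	m->guest_cs  = rdmsr64(MSR_IA32_SYSENTER_CS);
	m->guest_esp = rdmsr64(MSR_IA32_SYSENTER_ESP);
	m->guest_eip = rdmsr64(MSR_IA32_SYSENTER_EIP);
	wrmsr64(MSR_IA32_SYSENTER_CS,  m->host_cs);
	wrmsr64(MSR_IA32_SYSENTER_ESP, m->host_esp);
	wrmsr64(MSR_IA32_SYSENTER_EIP, m->host_eip);
	m->guest_loaded = 0;
}

/* Called when the thread gets the cpu back, before the next vmentry. */
void vcpu_load_sysenter(struct vcpu_msrs *m)
{
	if (m->guest_loaded)
		return;
	wrmsr64(MSR_IA32_SYSENTER_CS,  m->guest_cs);
	wrmsr64(MSR_IA32_SYSENTER_ESP, m->guest_esp);
	wrmsr64(MSR_IA32_SYSENTER_EIP, m->guest_eip);
	m->guest_loaded = 1;
}

With something like that in place, the common exit path does no MSR
accesses at all, which is where the bulk of the avoidable overhead
goes away.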
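And the mean-vs-median point is easy to see with toy numbers (the
sample values below are invented, not measurements):

/* One scheduling hiccup among ten otherwise steady PIOs barely moves
 * the median but drags the mean far away from it.
 */
#include <stdio.h>
#include <stdlib.h>

static int cmp(const void *a, const void *b)
{
	long x = *(const long *)a, y = *(const long *)b;
	return (x > y) - (x < y);
}

int main(void)
{
	/* nine ordinary PIOs plus one that hit a bad scheduling
	 * decision, in cycles (all values invented) */
	long lat[10] = { 15000, 15200, 14900, 15100, 15050,
			 14950, 15150, 15000, 15100, 1000000 };
	long sum = 0;
	int i;

	for (i = 0; i < 10; i++)
		sum += lat[i];
	qsort(lat, 10, sizeof(long), cmp);

	printf("median = %ld cycles, mean = %ld cycles\n",
	       (lat[4] + lat[5]) / 2, sum / 10);
	return 0;
}

One outlier in ten is enough here to put the mean (~113K cycles) at
about 7.5x the median (~15K).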
Regards,

Anthony Liguori