From: Andrew Theurer <habanero@linux.vnet.ibm.com>
To: Bruce Rogers <BROGERS@novell.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>, kvm@vger.kernel.org
Subject: Re: kvm scaling question
Date: Tue, 15 Sep 2009 09:10:05 -0500
Message-ID: <1253023806.4204.9.camel@twinturbo.austin.ibm.com>
In-Reply-To: <4AAE7B3B0200004800081118@novprvlin0050.provo.novell.com>

On Mon, 2009-09-14 at 17:19 -0600, Bruce Rogers wrote:
> On 9/11/2009 at 3:53 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > On Fri, Sep 11, 2009 at 09:36:10AM -0600, Bruce Rogers wrote:
> >> I am wondering if anyone has investigated how well kvm scales when
> >> supporting many guests, or many vcpus, or both.
> >> 
> >> I'll do some investigation into the per-VM memory overhead and
> >> play with bumping the max vcpu limit way beyond 16, but hopefully
> >> someone can comment on issues such as locking problems that are
> >> known to exist and need to be addressed to increase parallelism,
> >> general overhead percentages which can help set consolidation
> >> expectations, etc.
> > 
> > I suppose it depends on the guest and workload. With an EPT host and a
> > 16-way Linux guest doing kernel compilations, on a recent kernel, I see:
> > 
> > # Samples: 98703304
> > #
> > # Overhead          Command                      Shared Object  Symbol
> > # ........  ...............  .................................  ......
> > #
> >     97.15%               sh  [kernel]                           [k] vmx_vcpu_run
> >      0.27%               sh  [kernel]                           [k] kvm_arch_vcpu_ioctl_
> >      0.12%               sh  [kernel]                           [k] default_send_IPI_mas
> >      0.09%               sh  [kernel]                           [k] _spin_lock_irq
> > 
> > Which is pretty good. Without EPT/NPT the mmu_lock seems to be the major
> > bottleneck to parallelism.
> > 
> >> Also, when I did a simple experiment with vcpu overcommitment, I
> >> was surprised how quickly performance suffered (just bringing a
> >> Linux vm up), since I would have assumed the additional vcpus would
> >> be halted the vast majority of the time. On a 2-processor box,
> >> overcommitting a guest to 8 vcpus (I know this isn't a good usage
> >> scenario, but it does provide some insight) made boot time increase
> >> almost exponentially. At 16 vcpus, it took hours just to reach the
> >> gui login prompt.
> > 
> > One probable reason for that is that vcpus which hold spinlocks in the
> > guest are scheduled out in favour of vcpus which spin on that same lock.
> 
> I suspected it might be a whole lot of spinning. That does seem most
> likely. I was just surprised how bad the behavior was.
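
For reference, a profile like the one Marcelo quoted above can be
collected on the host with the perf tool (a minimal sketch, assuming a
kernel recent enough to ship tools/perf; the 30-second sampling window
is just one reasonable choice):

  # sample all cpus on the host while the guest workload runs
  perf record -a sleep 30
  # summarize the samples by symbol, like the output quoted above
  perf report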

I have collected lock_stat info on a similar vcpu over-commit
configuration, but on an EPT system, and saw a very significant amount
of spinning.  However, if you don't have EPT or NPT, I would bet that's
the first problem.  Still, I am a little surprised that simply booting
is such a problem.  It would be interesting to see what lock_stat shows
on your guest after booting with 16 vcpus.
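
In case it helps, this is roughly the interface I use (a minimal
sketch; it assumes the guest kernel is built with CONFIG_LOCK_STAT=y):

  head -50 /proc/lock_stat               # hottest locks are near the top
  echo 0 > /proc/lock_stat               # a write clears the statistics
  echo 1 > /proc/sys/kernel/lock_stat    # (re)enable collection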

I have observed that shortening the time between vcpu scheduling
decisions can help mitigate the lock holder preemption problem
(presumably because the spinning vcpu is de-scheduled earlier and the
vcpu holding the lock is scheduled sooner), but I imagine there are
other unwanted side effects, like a lower cache hit rate.
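
For what it's worth, "shortening the time" above just means tightening
the CFS tunables on the host (a rough sketch; the values are
illustrative, not recommendations):

  # shrink the scheduling period so vcpus rotate on/off cpu more often
  sysctl -w kernel.sched_latency_ns=6000000
  # lower the minimum timeslice so a spinning vcpu is preempted sooner
  sysctl -w kernel.sched_min_granularity_ns=750000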

-Andrew

> 
> Bruce