From mboxrd@z Thu Jan  1 00:00:00 1970
From: Andrew Theurer
Subject: Re: kvm scaling question
Date: Tue, 15 Sep 2009 09:10:05 -0500
Message-ID: <1253023806.4204.9.camel@twinturbo.austin.ibm.com>
References: <4AAA1A0A0200004800080E06@novprvlin0050.provo.novell.com>
	<20090911215355.GD6244@amt.cnet>
	<4AAE7B3B0200004800081118@novprvlin0050.provo.novell.com>
Reply-To: habanero@linux.vnet.ibm.com
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Cc: Marcelo Tosatti, kvm@vger.kernel.org
To: Bruce Rogers
Return-path:
Received: from e35.co.us.ibm.com ([32.97.110.153]:51682 "EHLO e35.co.us.ibm.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754131AbZIOOKk
	(ORCPT ); Tue, 15 Sep 2009 10:10:40 -0400
Received: from d03relay05.boulder.ibm.com (d03relay05.boulder.ibm.com [9.17.195.107])
	by e35.co.us.ibm.com (8.14.3/8.13.1) with ESMTP id n8FE0m8B015063
	for ; Tue, 15 Sep 2009 08:00:48 -0600
Received: from d03av01.boulder.ibm.com (d03av01.boulder.ibm.com [9.17.195.167])
	by d03relay05.boulder.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id n8FEALU8171252
	for ; Tue, 15 Sep 2009 08:10:22 -0600
Received: from d03av01.boulder.ibm.com (loopback [127.0.0.1])
	by d03av01.boulder.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id n8FEA79v025069
	for ; Tue, 15 Sep 2009 08:10:08 -0600
In-Reply-To: <4AAE7B3B0200004800081118@novprvlin0050.provo.novell.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On Mon, 2009-09-14 at 17:19 -0600, Bruce Rogers wrote:
> On 9/11/2009 at 3:53 PM, Marcelo Tosatti wrote:
> > On Fri, Sep 11, 2009 at 09:36:10AM -0600, Bruce Rogers wrote:
> >> I am wondering if anyone has investigated how well kvm scales when
> >> supporting many guests, or many vcpus, or both.
> >>
> >> I'll do some investigations into the per-vm memory overhead and
> >> play with bumping the max vcpu limit way beyond 16, but hopefully
> >> someone can comment on issues such as locking problems that are
> >> known to exist and need to be addressed to increase parallelism,
> >> general overhead percentages which can help provide consolidation
> >> expectations, etc.
> >
> > I suppose it depends on the guest and workload. With an EPT host and
> > a 16-way Linux guest doing kernel compilations, on a recent kernel,
> > I see:
> >
> > # Samples: 98703304
> > #
> > # Overhead  Command  Shared Object  Symbol
> > # ........  .......  .............  ......
> > #
> >    97.15%   sh       [kernel]       [k] vmx_vcpu_run
> >     0.27%   sh       [kernel]       [k] kvm_arch_vcpu_ioctl_
> >     0.12%   sh       [kernel]       [k] default_send_IPI_mas
> >     0.09%   sh       [kernel]       [k] _spin_lock_irq
> >
> > Which is pretty good. Without EPT/NPT, the mmu_lock seems to be the
> > major bottleneck to parallelism.
> >
> >> Also, when I did a simple experiment with vcpu overcommitment, I was
> >> surprised how quickly performance suffered (just bringing a Linux vm
> >> up), since I would have assumed the additional vcpus would have been
> >> halted the vast majority of the time. On a 2-processor box,
> >> overcommitting a guest to 8 vcpus (I know this isn't a good usage
> >> scenario, but it does provide some insights) caused the boot time to
> >> increase almost exponentially with the number of vcpus. At 16 vcpus,
> >> it took hours just to reach the gui login prompt.
> >
> > One probable reason for that is that vcpus which hold spinlocks in
> > the guest are scheduled out in favour of vcpus which spin on that
> > same lock.
>
> I suspected it might be a whole lot of spinning happening.

That does seem most likely. I was just surprised how bad the behavior
was.
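To make the failure mode concrete, here is a small user-space sketch
(purely illustrative; this is not guest or KVM code, and the program
name and everything in it are made up): a handful of threads hammer a
single pthread spinlock, and once the process is pinned to fewer cpus
than it has threads, aggregate throughput falls off a cliff, because
whenever the current holder is preempted every other thread just burns
its time slice spinning until the scheduler gets back around to the
holder. That is essentially what an over-committed guest's vcpus do on
a guest spinlock whose holder vcpu has been descheduled.

/*
 * spin-demo.c: illustrative user-space analogue only, not guest or
 * KVM code.  N threads hammer one pthread spinlock.  Pin the process
 * to fewer cpus than threads (e.g. "taskset -c 0,1 ./spin-demo 8")
 * and the total acquisition rate collapses due to lock-holder
 * preemption.
 *
 * Build: gcc -O2 -pthread -o spin-demo spin-demo.c
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static pthread_spinlock_t lock;
static volatile int stop;

static void *worker(void *arg)
{
        long long *acquired = arg;

        while (!stop) {
                pthread_spin_lock(&lock);
                (*acquired)++;           /* tiny critical section */
                pthread_spin_unlock(&lock);
        }
        return NULL;
}

int main(int argc, char **argv)
{
        int i, nthreads = argc > 1 ? atoi(argv[1]) : 8;
        pthread_t *tids = calloc(nthreads, sizeof(*tids));
        long long *counts = calloc(nthreads, sizeof(*counts));
        long long total = 0;

        pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);
        for (i = 0; i < nthreads; i++)
                pthread_create(&tids[i], NULL, worker, &counts[i]);

        sleep(5);                        /* let them fight for a while */
        stop = 1;

        for (i = 0; i < nthreads; i++) {
                pthread_join(tids[i], NULL);
                total += counts[i];
        }
        printf("%d threads: %lld lock acquisitions in 5s\n",
               nthreads, total);
        return 0;
}

Comparing something like "taskset -c 0,1 ./spin-demo 2" against
"taskset -c 0,1 ./spin-demo 8" should show the drop-off clearly.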
I have collected lock_stat info on a similar vcpu over-commit
configuration, but on an EPT system, and saw a very significant amount
of spinning. However, if you don't have EPT or NPT, I would bet that's
the first problem. Even so, I am a little surprised that simply booting
is such a problem. It would be interesting to see what lock_stat shows
on your guest after booting with 16 vcpus.

I have observed that shortening the time between vcpus getting
scheduled can help mitigate the lock-holder-preemption problem
(presumably because the spinning vcpu is descheduled earlier and the
vcpu holding the lock is scheduled sooner), but I imagine there are
other unwanted side effects, such as lower cache hit rates.

-Andrew

>
> Bruce
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html