From mboxrd@z Thu Jan 1 00:00:00 1970
From: Chegu Vinod
Subject: Re: Performance of 40-way guest running 2.6.32-220 (RHEL6.2) vs. 3.3.1 OS
Date: Tue, 17 Apr 2012 06:25:15 -0700
Message-ID: <4F8D6F3B.9070203@hp.com>
References: <4F871D12.3060006@redhat.com> <20120416121833.GR11918@redhat.com>
 <4F8C3057.7010404@hp.com> <20120417094939.GE11918@redhat.com>
Reply-To: chegu_vinod@hp.com
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Rik van Riel , kvm@vger.kernel.org
To: Gleb Natapov
Return-path:
Received: from g4t0017.houston.hp.com ([15.201.24.20]:39218 "EHLO
 g4t0017.houston.hp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with
 ESMTP id S932175Ab2DQNZr (ORCPT ); Tue, 17 Apr 2012 09:25:47 -0400
In-Reply-To: <20120417094939.GE11918@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 4/17/2012 2:49 AM, Gleb Natapov wrote:
> On Mon, Apr 16, 2012 at 07:44:39AM -0700, Chegu Vinod wrote:
>> On 4/16/2012 5:18 AM, Gleb Natapov wrote:
>>> On Thu, Apr 12, 2012 at 02:21:06PM -0400, Rik van Riel wrote:
>>>> On 04/11/2012 01:21 PM, Chegu Vinod wrote:
>>>>> Hello,
>>>>>
>>>>> While running AIM7 (workfile.high_systime) in a single 40-way (or a
>>>>> single 60-way) KVM guest I noticed pretty bad performance when the
>>>>> guest was booted with the 3.3.1 kernel when compared to the same
>>>>> guest booted with the 2.6.32-220 (RHEL6.2) kernel.
>>>>> For the 40-way guest, Guest-RunA (2.6.32-220 kernel) performed
>>>>> nearly 9x better than Guest-RunB (3.3.1 kernel). In the case of the
>>>>> 60-way guest run, the older guest kernel was nearly 12x better!
>>> How many CPUs does your host have?
>> 80 cores on the DL980 (i.e. 8 Westmere sockets).
>>
> So you are not oversubscribing CPUs at all. Are those real cores or
> including HT?

HT is off.

> Do you have other CPU hogs running on the host while testing the guest?

Nope. Sometimes I do run utilities like "perf" or "sar" or "mpstat" on
NUMA node 0 (where the guest is not running).

>
>> I was using numactl to bind the qemu of the 40-way guest to NUMA
>> nodes 4-7 (or, for a 60-way guest, binding it to nodes 2-7):
>>
>> /etc/qemu-ifup tap0
>>
>> numactl --cpunodebind=4,5,6,7 --membind=4,5,6,7 \
>> /usr/local/bin/qemu-system-x86_64 -enable-kvm \
>> -cpu Westmere,+rdtscp,+pdpe1gb,+dca,+xtpr,+tm2,+est,+vmx,+ds_cpl,+monitor,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme \
>> -enable-kvm \
>> -m 65536 -smp 40 \
>> -name vm1 \
>> -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/vm1.monitor,server,nowait \
>> -drive file=/var/lib/libvirt/images/vmVinod1/vm1.img,if=none,id=drive-virtio-disk0,format=qcow2,cache=none \
>> -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
>> -monitor stdio \
>> -net nic,macaddr=<..mac_addr..> \
>> -net tap,ifname=tap0,script=no,downscript=no \
>> -vnc :4
>>
>> /etc/qemu-ifdown tap0
>>
>>
>> I knew that there would be a few additional temporary qemu worker
>> threads created, i.e. some oversubscription will be there.
>>
> 4 nodes above have 40 real cores, yes?

Yes. Other than the qemu-related threads and some of the generic
per-cpu Linux kernel threads (e.g. migration etc.) there isn't anything
else running on these NUMA nodes.

> Can you try to run upstream
> kernel without binding at all and check the performance?

I shall re-run and get back to you with this info.

Typically for the native runs, binding the workload results in better
numbers. Hence I chose to do the binding for the guest too, i.e. on the
same NUMA nodes as the native case, for virt. vs. native comparison
purposes. Having said that, in the past I had seen a couple of cases
where the unbound guest performed better than the native case. Need to
re-run and dig into this further...

>
>> Will have to retry by doing some explicit pinning of the vcpus to
>> native cores (without using virsh).
>>
>>>>> Turned on function tracing and found that there appears to be more
>>>>> time being spent around the lock code in the 3.3.1 guest when
>>>>> compared to the 2.6.32-220 guest.
>>>> Looks like you may be running into the ticket spinlock
>>>> code. During the early RHEL 6 days, Gleb came up with a
>>>> patch to automatically disable ticket spinlocks when
>>>> running inside a KVM guest.
>>>>
>>>> IIRC that patch got rejected upstream at the time,
>>>> with upstream developers preferring to wait for a
>>>> "better solution".
>>>>
>>>> If such a better solution is not on its way upstream
>>>> now (two years later), maybe we should just merge
>>>> Gleb's patch upstream for the time being?
>>> I think the pv spinlock that is actively discussed currently should
>>> address the issue, but I am not sure someone has tested it against a
>>> non-ticket lock in a guest to see which one performs better.
>> I did see that discussion... it seems to have originated in the Xen
>> context.
>>
> Yes. The problem is the same for both hypervisors.
>
> --
>             Gleb.

Thanks
Vinod
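
P.S. Re: the explicit vcpu pinning without virsh -- what I had in mind
is something along these lines (a rough sketch only; it assumes the
vcpu thread ids are taken from the qemu monitor's "info cpus" output,
and that cores 40-79 are the ones sitting on NUMA nodes 4-7 on this
box -- the actual node-to-core mapping can be checked with
"numactl --hardware"):

  # In the "-monitor stdio" session:
  #   (qemu) info cpus
  #   * CPU #0: ... thread_id=NNN      <- each vcpu line shows its host
  #     CPU #1: ... thread_id=NNN         thread id
  #
  # Then, on the host, pin vcpu N to core 40+N:
  vcpu_tids="NNN NNN ..."        # the thread ids printed by "info cpus"
  core=40
  for tid in $vcpu_tids; do
      taskset -pc $core $tid     # bind this vcpu thread to one host core
      core=$((core + 1))
  done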
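
P.P.S. Re: the function tracing data mentioned above -- in case it
helps, one way to get per-call timings for just the spinlock paths
inside the guest is roughly the following (a sketch; it assumes debugfs
is mounted at /sys/kernel/debug, and the _raw_spin_lock* names apply to
the 3.3.1 guest -- IIRC the 2.6.32-220 kernel still calls these
_spin_lock*):

  cd /sys/kernel/debug/tracing
  echo 0 > tracing_on
  echo '_raw_spin_lock*' > set_ftrace_filter  # trace only the spinlock entry points
  echo function_graph > current_tracer        # records a duration for each call
  echo 1 > tracing_on
  sleep 10                                    # while the AIM7 run is in steady state
  echo 0 > tracing_on
  cat trace > /tmp/spinlock_trace.txt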