From mboxrd@z Thu Jan 1 00:00:00 1970
From: Chegu Vinod
Subject: Re: Performance of 40-way guest running 2.6.32-220 (RHEL6.2) vs. 3.3.1 OS
Date: Tue, 17 Apr 2012 06:25:15 -0700
Message-ID: <4F8D6F3B.9070203@hp.com>
References: <4F871D12.3060006@redhat.com> <20120416121833.GR11918@redhat.com>
 <4F8C3057.7010404@hp.com> <20120417094939.GE11918@redhat.com>
Reply-To: chegu_vinod@hp.com
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Rik van Riel , kvm@vger.kernel.org
To: Gleb Natapov
Return-path:
Received: from g4t0017.houston.hp.com ([15.201.24.20]:39218 "EHLO
 g4t0017.houston.hp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with
 ESMTP id S932175Ab2DQNZr (ORCPT ); Tue, 17 Apr 2012 09:25:47 -0400
In-Reply-To: <20120417094939.GE11918@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 4/17/2012 2:49 AM, Gleb Natapov wrote:
> On Mon, Apr 16, 2012 at 07:44:39AM -0700, Chegu Vinod wrote:
>> On 4/16/2012 5:18 AM, Gleb Natapov wrote:
>>> On Thu, Apr 12, 2012 at 02:21:06PM -0400, Rik van Riel wrote:
>>>> On 04/11/2012 01:21 PM, Chegu Vinod wrote:
>>>>> Hello,
>>>>>
>>>>> While running AIM7 (workfile.high_systime) in a single 40-way (or a
>>>>> single 60-way) KVM guest I noticed pretty bad performance when the
>>>>> guest was booted with the 3.3.1 kernel when compared to the same
>>>>> guest booted with the 2.6.32-220 (RHEL6.2) kernel.
>>>>> For the 40-way guest, Guest-RunA (2.6.32-220 kernel) performed
>>>>> nearly 9x better than Guest-RunB (3.3.1 kernel). In the case of the
>>>>> 60-way guest run, the older guest kernel was nearly 12x better!
>>> How many CPUs does your host have?
>> 80 cores on the DL980 (i.e. 8 Westmere sockets).
>>
> So you are not oversubscribing CPUs at all. Are those real cores or
> including HT?

HT is off.

> Do you have other CPU hogs running on the host while testing the guest?

Nope. Sometimes I do run utilities like "perf" or "sar" or "mpstat" on
NUMA node 0 (where the guest is not running).

>
>> I was using numactl to bind the qemu of the 40-way guest to NUMA
>> nodes 4-7 (or, for a 60-way guest, binding it to nodes 2-7):
>>
>> /etc/qemu-ifup tap0
>>
>> numactl --cpunodebind=4,5,6,7 --membind=4,5,6,7 \
>> /usr/local/bin/qemu-system-x86_64 -enable-kvm \
>> -cpu Westmere,+rdtscp,+pdpe1gb,+dca,+xtpr,+tm2,+est,+vmx,+ds_cpl,+monitor,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme \
>> -enable-kvm \
>> -m 65536 -smp 40 \
>> -name vm1 \
>> -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/vm1.monitor,server,nowait \
>> -drive file=/var/lib/libvirt/images/vmVinod1/vm1.img,if=none,id=drive-virtio-disk0,format=qcow2,cache=none \
>> -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
>> -monitor stdio \
>> -net nic,macaddr=<..mac_addr..> \
>> -net tap,ifname=tap0,script=no,downscript=no \
>> -vnc :4
>>
>> /etc/qemu-ifdown tap0
>>
>>
>> I knew that there would be a few additional temporary qemu worker
>> threads created, i.e. some oversubscription will be there.
>>
> 4 nodes above have 40 real cores, yes?

Yes. Other than the qemu-related threads and some of the generic
per-cpu Linux kernel threads (e.g. migration etc.) there isn't anything
else running on these NUMA nodes.

> Can you try to run upstream
> kernel without binding at all and check the performance?

I shall re-run and get back to you with this info.

Typically for the native runs, binding the workload results in better
numbers. Hence I chose to do the binding for the guest too, i.e. on the
same NUMA nodes as the native case, for virt. vs. native comparison
purposes. Having said that, in the past I had seen a couple of cases
where the unbound guest performed better than the native case. Need to
re-run and dig into this further...

>
>> Will have to retry by doing some explicit pinning of the vcpus to
>> native cores (without using virsh).
>>
>>>>> Turned on function tracing and found that there appears to be more
>>>>> time being spent around the lock code in the 3.3.1 guest when
>>>>> compared to the 2.6.32-220 guest.
>>>> Looks like you may be running into the ticket spinlock
>>>> code. During the early RHEL 6 days, Gleb came up with a
>>>> patch to automatically disable ticket spinlocks when
>>>> running inside a KVM guest.
>>>>
>>>> IIRC that patch got rejected upstream at the time,
>>>> with upstream developers preferring to wait for a
>>>> "better solution".
>>>>
>>>> If such a better solution is not on its way upstream
>>>> now (two years later), maybe we should just merge
>>>> Gleb's patch upstream for the time being?
>>> I think the pv spinlock that is actively discussed currently should
>>> address the issue, but I am not sure someone has tested it against a
>>> non-ticket lock in a guest to see which one performs better.
>> I did see that discussion... it seems to have originated in the Xen
>> context.
>>
> Yes. The problem is the same for both hypervisors.
>
> --
>             Gleb.

Thanks
Vinod
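
P.S. Re: the explicit vcpu pinning without virsh -- what I had in mind
is something along these lines (a rough sketch only; it assumes the
vcpu thread ids are taken from the qemu monitor's "info cpus" output,
and that cores 40-79 are the ones sitting on NUMA nodes 4-7 on this
box -- the actual node-to-core mapping can be checked with
"numactl --hardware"):

  # In the "-monitor stdio" session:
  #   (qemu) info cpus
  #   * CPU #0: ... thread_id=NNN      <- each vcpu line shows its host
  #     CPU #1: ... thread_id=NNN         thread id
  #
  # Then, on the host, pin vcpu N to core 40+N:
  vcpu_tids="NNN NNN ..."        # the thread ids printed by "info cpus"
  core=40
  for tid in $vcpu_tids; do
      taskset -pc $core $tid     # bind this vcpu thread to one host core
      core=$((core + 1))
  done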
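
P.P.S. Re: the function tracing data mentioned above -- in case it
helps, one way to get per-call timings for just the spinlock paths
inside the guest is roughly the following (a sketch; it assumes debugfs
is mounted at /sys/kernel/debug, and the _raw_spin_lock* names apply to
the 3.3.1 guest -- IIRC the 2.6.32-220 kernel still calls these
_spin_lock*):

  cd /sys/kernel/debug/tracing
  echo 0 > tracing_on
  echo '_raw_spin_lock*' > set_ftrace_filter  # trace only the spinlock entry points
  echo function_graph > current_tracer        # records a duration for each call
  echo 1 > tracing_on
  sleep 10                                    # while the AIM7 run is in steady state
  echo 0 > tracing_on
  cat trace > /tmp/spinlock_trace.txt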