From mboxrd@z Thu Jan 1 00:00:00 1970
From: Raghavendra K T
Subject: Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
Date: Wed, 18 Jan 2012 00:06:30 +0530
Message-ID: <4F15BFAE.7060500@linux.vnet.ibm.com>
References: <20120114182501.8604.68416.sendpatchset@oc5400248562.ibm.com>
 <3EC1B881-0724-49E3-B892-F40BEB07D15D@suse.de>
 <20120116142014.GA10155@linux.vnet.ibm.com>
 <4F146EA5.3010106@linux.vnet.ibm.com>
 <4F15AF9E.9000907@linux.vnet.ibm.com>
 <1485A122-9D48-46E3-A01E-E37B5C9EC54A@suse.de>
In-Reply-To: <1485A122-9D48-46E3-A01E-E37B5C9EC54A@suse.de>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
To: Alexander Graf
Cc: Jeremy Fitzhardinge, Greg Kroah-Hartman, linux-doc@vger.kernel.org,
 Peter Zijlstra, Jan Kiszka, Srivatsa Vaddagiri, Paul Mackerras,
 "H. Peter Anvin", Stefano Stabellini, Xen, Dave Jiang, KVM,
 Glauber Costa, X86, Ingo Molnar, Avi Kivity, Rik van Riel,
 Konrad Rzeszutek Wilk, Sasha Levin, Sedat Dilek, Thomas Gleixner,
 Virtualization, LKML, Dave Hansen
Sender: virtualization-bounces@lists.linux-foundation.org
Errors-To: virtualization-bounces@lists.linux-foundation.org
List-Id: kvm.vger.kernel.org

On 01/17/2012 11:09 PM, Alexander Graf wrote:
>
> On 17.01.2012, at 18:27, Raghavendra K T wrote:
>
>> On 01/17/2012 12:12 AM, Alexander Graf wrote:
>>>
>>> On 16.01.2012, at 19:38, Raghavendra K T wrote:
>>>
>>>> On 01/16/2012 07:53 PM, Alexander Graf wrote:
>>>>>
>>>>> On 16.01.2012, at 15:20, Srivatsa Vaddagiri wrote:
>>>>>
>>>>>> * Alexander Graf [2012-01-16 04:57:45]:
>>>>>>
>>>>>>> Speaking of which - have you benchmarked the performance degradation
>>>>>>> of pv ticket locks on bare metal?
>>>>>>
>>>>>> You mean, run the kernel on bare metal with CONFIG_PARAVIRT_SPINLOCKS
>>>>>> enabled and compare how it performs with CONFIG_PARAVIRT_SPINLOCKS
>>>>>> disabled, for some workload(s)?
>>>>>
>>>>> Yup
>>>>>
>>>>>>
>>>>>> In some sense, the 1x overcommit case results posted do measure the
>>>>>> overhead of (pv-)spinlocks, no? We don't see any overhead in that
>>>>>> case, at least for kernbench ..
>>>>>>
>>>>>>> Result for Non PLE machine :
>>>>>>> ============================
>>>>>>
>>>>>> [snip]
>>>>>>
>>>>>>> Kernbench:
>>>>>>>            BASE          BASE+patch
>>>>>
>>>>> What is BASE really? Is BASE already with the PV spinlocks enabled?
>>>>> I'm having a hard time understanding which tree you're working
>>>>> against, since the prerequisites aren't upstream yet.
>>>>>
>>>>>
>>>>> Alex
>>>>
>>>> Sorry for the confusion; I was a little imprecise about BASE.
>>>>
>>>> BASE is pre-3.2.0 plus Jeremy's following patches:
>>>> xadd (https://lkml.org/lkml/2011/10/4/328)
>>>> x86/ticketlock (https://lkml.org/lkml/2011/10/12/496).
>>>> So it has the ticketlock cleanups from Jeremy, built with
>>>> CONFIG_PARAVIRT_SPINLOCKS=y.
>>>>
>>>> BASE+patch = pre-3.2.0 + Jeremy's above patches + the above V5 PV
>>>> spinlock series, with CONFIG_PARAVIRT_SPINLOCKS=y.
>>>>
>>>> So in both cases CONFIG_PARAVIRT_SPINLOCKS=y.
>>>>
>>>> So let:
>>>> A. pre-3.2.0 with CONFIG_PARAVIRT_SPINLOCKS = n
>>>> B. pre-3.2.0 + Jeremy's above patches with CONFIG_PARAVIRT_SPINLOCKS = n
>>>> C. pre-3.2.0 + Jeremy's above patches with CONFIG_PARAVIRT_SPINLOCKS = y
>>>> D. pre-3.2.0 + Jeremy's above patches + V5 patches with CONFIG_PARAVIRT_SPINLOCKS = n
>>>> E. pre-3.2.0 + Jeremy's above patches + V5 patches with CONFIG_PARAVIRT_SPINLOCKS = y
>>>>
>>>> Is it the performance of A vs E that you are asking for? (It is
>>>> currently C vs E.)
>>>
>>> Since D and E only matter with KVM in use, yes, I'm mostly interested
>>> in A, B and C :).
>>>
>>>
>>> Alex
>>>
>>>
>> Setup:
>> Native: IBM xSeries with Intel(R) Xeon(R) X5570 2.93GHz CPU,
>> 8 cores, 64GB RAM (16 cpus online).
>>
>> Guest: a single guest with 8 VCPUs and 4GB RAM.
>> Benchmark: kernbench -f -H -M -o 20
>>
>> Here is the result (mean elapsed time in seconds, standard deviation
>> in parentheses):
>>
>> Native Run
>> ============
>> case A             case B             %improvement   case C            %improvement
>> 56.1917 (2.57125)  56.035 (2.02439)   0.278867       56.27 (2.40401)   -0.139344
>
> This looks a lot like statistical deviation. How often did you execute
> the test case? Did you make sure to have a clean base state every time?
>
> Maybe it'd be a good idea to create a small in-kernel microbenchmark
> with a couple of threads that take spinlocks, then do work for a
> specified number of cycles, then release them again and start anew. At
> the end of it, we can check how long the whole thing took for n runs.
> That would enable us to measure the worst case scenario.
>

It was a quick test: two iterations of kernbench (= 6 runs), and I made
sure the caches were cleared before each run with

  echo "1" > /proc/sys/vm/drop_caches
  ccache -C

Yes, maybe I can run a test such as you mentioned.

>>
>> Guest Run
>> ============
>> case A              case B              %improvement   case C             %improvement
>> 166.999 (15.7613)   161.876 (14.4874)   3.06768        161.24 (12.6497)   3.44852
>
> Is this the same machine? Why is the guest 3x slower?

Yes, it is the same non-PLE machine, but with all 16 cpus online. By
"3x slower", do you mean that case A is slower (pre-3.2.0 with
CONFIG_PARAVIRT_SPINLOCKS = n)?

>
>
> Alex
>
>>
>> We do not see much overhead in the native run with
>> CONFIG_PARAVIRT_SPINLOCKS = y
>>
>
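P.S. For anyone rechecking the numbers: the %improvement columns above
are the relative delta of the mean elapsed times against case A, i.e.
(A - X) / A * 100, where a lower time is better. A tiny userspace
sketch of the arithmetic (the helper name is made up; the values are
copied from the tables, and each call reproduces the table entry):

#include <stdio.h>

/*
 * Percent improvement of `candidate` over `base`, both mean kernbench
 * elapsed times in seconds (lower is better). Positive output means
 * the candidate is faster than the base.
 */
static double pct_improvement(double base, double candidate)
{
	return (base - candidate) / base * 100.0;
}

int main(void)
{
	/* Native run, A vs B: prints 0.278867, matching the table. */
	printf("%f\n", pct_improvement(56.1917, 56.035));
	/* Native run, A vs C: prints -0.139344, matching the table. */
	printf("%f\n", pct_improvement(56.1917, 56.27));
	/* Guest run, A vs B: prints 3.067683, i.e. 3.06768 in the table. */
	printf("%f\n", pct_improvement(166.999, 161.876));
	return 0;
}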
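And a rough sketch of the kind of in-kernel microbenchmark suggested
above: a few kthreads contend on one spinlock, spin for a fixed number
of cycles inside the critical section, release it, and start anew; the
last thread to finish reports the total elapsed time. This is untested,
all names here are made up, and it makes simplifying assumptions (a
fixed cpu_relax() loop stands in for "work for a specified number of
cycles", and timing starts at module load, so thread creation is
included in the measurement):

#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/spinlock.h>
#include <linux/ktime.h>
#include <linux/sched.h>
#include <linux/atomic.h>
#include <linux/err.h>

#define NR_SPINNERS	4	/* "a couple of threads" */
#define NR_ROUNDS	100000	/* lock/unlock rounds per thread */
#define WORK_CYCLES	200	/* busy work held under the lock */

static DEFINE_SPINLOCK(bench_lock);
static struct task_struct *spinners[NR_SPINNERS];
static atomic_t done_count = ATOMIC_INIT(0);
static ktime_t t_start;

static int spinner_fn(void *unused)
{
	int i, j;

	for (i = 0; i < NR_ROUNDS; i++) {
		spin_lock(&bench_lock);
		/* "do work for a specified number of cycles" */
		for (j = 0; j < WORK_CYCLES; j++)
			cpu_relax();
		spin_unlock(&bench_lock);
	}

	/* The last thread to finish reports the elapsed time. */
	if (atomic_inc_return(&done_count) == NR_SPINNERS)
		pr_info("spinbench: %lld ns for %d threads x %d rounds\n",
			ktime_to_ns(ktime_sub(ktime_get(), t_start)),
			NR_SPINNERS, NR_ROUNDS);

	/* Stay alive until rmmod so that kthread_stop() is always safe. */
	while (!kthread_should_stop())
		schedule_timeout_interruptible(HZ);
	return 0;
}

static int __init spinbench_init(void)
{
	int i;

	t_start = ktime_get();
	for (i = 0; i < NR_SPINNERS; i++)
		spinners[i] = kthread_run(spinner_fn, NULL, "spinbench/%d", i);
	return 0;
}

static void __exit spinbench_exit(void)
{
	int i;

	for (i = 0; i < NR_SPINNERS; i++)
		if (!IS_ERR_OR_NULL(spinners[i]))
			kthread_stop(spinners[i]);
}

module_init(spinbench_init);
module_exit(spinbench_exit);
MODULE_LICENSE("GPL");

Loading this on the case A, B and C kernels on bare metal and comparing
the reported totals over several insmod/rmmod cycles would approximate
the worst-case contention measurement described above.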