Date: Thu, 27 Sep 2012 12:33:04 +0200
From: Dor Laor
Reply-To: dlaor@redhat.com
To: Raghavendra K T
Cc: Chegu Vinod, Peter Zijlstra, "H. Peter Anvin", Marcelo Tosatti,
    Ingo Molnar, Avi Kivity, Rik van Riel, Srikar, "Nikunj A. Dadhania",
    KVM, Jiannan Ouyang, "Andrew M. Theurer", LKML, Srivatsa Vaddagiri,
    Gleb Natapov, Andrew Jones
Subject: Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler

On 09/27/2012 11:49 AM, Raghavendra K T wrote:
> On 09/25/2012 08:30 PM, Dor Laor wrote:
>> On 09/24/2012 02:02 PM, Raghavendra K T wrote:
>>> On 09/24/2012 02:12 PM, Dor Laor wrote:
>>>> In order to help PLE and pvticketlock converge, I thought a small
>>>> piece of test code should be developed to exercise this in a
>>>> predictable, deterministic way.
>>>>
>>>> The idea is to have a guest kernel module that spawns a new thread
>>>> each time you write to a /sys/.... entry.
>>>>
>>>> Each such thread spins on a spin lock. The specific spin lock is
>>>> also chosen through the /sys/ interface. Let's say we have an array
>>>> of spin locks, 10 times the number of vcpus.
>>>>
>>>> All the threads run:
>>>>
>>>> while (1) {
>>>>         spin_lock(my_lock);
>>>>         sum += execute_dummy_cpu_computation(time);
>>>>         spin_unlock(my_lock);
>>>>
>>>>         if (sys_tells_thread_to_die())
>>>>                 break;
>>>> }
>>>>
>>>> print_result(sum);
>>>>
>>>> Instead of calling the kernel's spin_lock functions, clone them and
>>>> make the ticket lock order deterministic and known (like a linear
>>>> walk of all the threads trying to catch that lock).
>>>
>>> By cloning, do you mean a hierarchy of the locks?
>>
>> No, I meant cloning the implementation of the current spin lock code
>> in order to set any order you may like for the ticket selection
>> (even for a non-pvticket-lock version).
>>
>> For instance, let's say you have N threads trying to grab the lock;
>> you can always make the ticket go linearly from 1->2...->N.
>> Not sure it's a good idea, just a recommendation.
>>
>>> Also, I believe the time should be passed via sysfs / hardcoded for
>>> each type of lock we are mimicking.
>>
>> Yap
>>
>>>> This way you can easily calculate:
>>>> 1. The score of a single vcpu running a single thread.
>>>> 2. The sum of all thread scores when #threads == #vcpus, all
>>>>    taking the same spin lock. The overall sum should be as close
>>>>    as possible to #1.
>>>> 3. Like #2, but with #threads > #vcpus, and other variations of
>>>>    #total vcpus (belonging to all VMs) > #pcpus.
>>>> 4. Create #threads == #vcpus, but let each thread have its own
>>>>    spin lock.
>>>> 5. Like #4 + #2.
>>>>
>>>> Hopefully this will allow you to judge and evaluate the exact
>>>> overhead of scheduling VMs and threads, since you have the ideal
>>>> result in hand and you know what the threads are doing.
>>>>
>>>> My 2 cents, Dor
>>>
>>> Thank you,
>>> I think this is an excellent idea (though I am still trying to put
>>> together all the pieces you mentioned). So overall we should be able
>>> to measure the performance of pvspinlock/PLE improvements with a
>>> deterministic load in the guest.
>>>
>>> The only thing I am missing is how to generate the different
>>> combinations of locks.
>>>
>>> Okay, let me see if I can come up with a solid model for this.
>>
>> Do you mean the various options for PLE/pvticket/other? I haven't
>> thought about it and assumed it's static, but it can also be
>> controlled through the temporary /sys interface.
>
> No, I am not there yet.
>
> So, in summary: we are suffering from inconsistent benchmark results
> while measuring the benefit of our improvements in PLE/pvlock etc.
>
> The good points of your suggestion are:
> - Giving predictability to the workload that runs in the guest, so
>   that we have a pi-pi comparison of the improvement.
>
> - We can easily tune the workload via sysfs, and we can have scripts
>   to automate it.
>
> What is complicated is:
> - How can we simulate a workload close to what we measure with
>   benchmarks?
> - How can we mimic lock holding times / lock hierarchy close to the
>   way they are seen with real workloads (e.g. a highly contended zone
>   lru lock with a similar amount of lock holding time)?

You can spin for an instruction count similar to the one you're
interested in (see the rough sketch at the end of this mail).

> - How close would it be, given that we ignore other types of spinning
>   (e.g. flush_tlb)?
>
> So I feel it is not as trivial as it looks.

Indeed, this is mainly a tool that can serve to optimize a few
synthetic workloads. I still believe it is worth going through this
exercise, since a 100% predictable and controlled case can help us
purely assess the state of the PLE and pvticket code. Otherwise we're
dealing with too many parameters and assumptions at once.

Dor
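
P.S. Something along the lines of the completely untested sketch below
is what I have in mind. To keep it short it uses module parameters
instead of the /sys interface from the proposal, plain spin_lock()
rather than a cloned deterministic-ticket variant, ndelay() as a
stand-in for the dummy computation / calibrated hold time, and made-up
names (lockbench, spinner, nr_threads, nr_locks, hold_ns):

/*
 * lockbench: untested sketch of a guest spin-lock benchmark module.
 * nr_threads spinners contend on nr_locks spin locks; each iteration
 * holds the lock for roughly hold_ns nanoseconds of busy work.
 */
#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/spinlock.h>
#include <linux/sched.h>
#include <linux/delay.h>
#include <linux/slab.h>
#include <linux/err.h>

static unsigned int nr_threads = 4;	/* e.g. == #vcpus */
module_param(nr_threads, uint, 0444);

static unsigned int nr_locks = 1;	/* 1 == everyone shares one lock */
module_param(nr_locks, uint, 0444);

static unsigned int hold_ns = 1000;	/* simulated lock hold time */
module_param(hold_ns, uint, 0444);

static spinlock_t *locks;
static struct task_struct **tasks;

static int spinner(void *data)
{
	spinlock_t *lock = data;
	unsigned long sum = 0;

	while (!kthread_should_stop()) {
		spin_lock(lock);
		ndelay(hold_ns);	/* dummy computation under the lock */
		sum++;
		spin_unlock(lock);
		cond_resched();
	}
	pr_info("lockbench: %s did %lu iterations\n", current->comm, sum);
	return 0;
}

static int __init lockbench_init(void)
{
	unsigned int i;

	if (!nr_threads || !nr_locks)
		return -EINVAL;

	locks = kcalloc(nr_locks, sizeof(*locks), GFP_KERNEL);
	tasks = kcalloc(nr_threads, sizeof(*tasks), GFP_KERNEL);
	if (!locks || !tasks) {
		kfree(locks);
		kfree(tasks);
		return -ENOMEM;
	}

	for (i = 0; i < nr_locks; i++)
		spin_lock_init(&locks[i]);

	/* spread the threads round-robin over the lock array */
	for (i = 0; i < nr_threads; i++)
		tasks[i] = kthread_run(spinner, &locks[i % nr_locks],
				       "lockbench/%u", i);
	return 0;
}

static void __exit lockbench_exit(void)
{
	unsigned int i;

	for (i = 0; i < nr_threads; i++)
		if (!IS_ERR_OR_NULL(tasks[i]))
			kthread_stop(tasks[i]);
	kfree(tasks);
	kfree(locks);
}

module_init(lockbench_init);
module_exit(lockbench_exit);
MODULE_LICENSE("GPL");

Loading it with e.g. "insmod lockbench.ko nr_threads=4 nr_locks=1"
gives the fully contended case (#2 above), nr_locks=nr_threads the
uncontended one (#4); the per-thread iteration counts are printed on
rmmod.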