From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
To: Avi Kivity <avi@redhat.com>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>,
	Rik van Riel <riel@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	"H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	Srikar <srikar@linux.vnet.ibm.com>,
	"Nikunj A. Dadhania" <nikunj@linux.vnet.ibm.com>,
	KVM <kvm@vger.kernel.org>, Jiannan Ouyang <ouyang@cs.pitt.edu>,
	chegu vinod <chegu_vinod@hp.com>,
	"Andrew M. Theurer" <habanero@linux.vnet.ibm.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Srivatsa Vaddagiri <srivatsa.vaddagiri@gmail.com>,
	Gleb Natapov <gleb@redhat.com>
Subject: Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler
Date: Wed, 3 Oct 2012 17:52:09 +0530
Message-ID: <20121003122209.GA9076@linux.vnet.ibm.com>
In-Reply-To: <50607F1F.2040704@redhat.com>

* Avi Kivity <avi@redhat.com> [2012-09-24 17:41:19]:

> On 09/21/2012 08:24 PM, Raghavendra K T wrote:
> > On 09/21/2012 06:32 PM, Rik van Riel wrote:
> >> On 09/21/2012 08:00 AM, Raghavendra K T wrote:
> >>> From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
> >>>
> >>> When total number of VCPUs of system is less than or equal to physical
> >>> CPUs,
> >>> PLE exits become costly since each VCPU can have dedicated PCPU, and
> >>> trying to find a target VCPU to yield_to just burns time in PLE handler.
> >>>
> >>> This patch reduces overhead, by simply doing a return in such
> >>> scenarios by
> >>> checking the length of current cpu runqueue.
> >>
> >> I am not convinced this is the way to go.
> >>
> >> The VCPU that is holding the lock, and is not releasing it,
> >> probably got scheduled out. That implies that VCPU is on a
> >> runqueue with at least one other task.
> > 
> > I see your point here, we have two cases:
> > 
> > case 1)
> > 
> > rq1 : vcpu1->wait(lockA) (spinning)
> > rq2 : vcpu2->holding(lockA) (running)
> > 
> > Here, ideally vcpu1 should not enter the PLE handler, since it would surely
> > get the lock within the ple_window cycles (assuming ple_window is tuned
> > perfectly for that workload).
> > 
> > Maybe this explains why we are not seeing any benefit with kernbench.
> > 
> > On the other hand, since we cannot have a ple_window tuned perfectly for
> > all types of workloads, we gain for those workloads which may need more
> > than 4096 cycles. Is that what we are seeing in the cases that benefit?
> 
> Maybe we need to increase the ple window regardless.  4096 cycles is 2
> microseconds or less (call it t_spin).  The overhead from
> kvm_vcpu_on_spin() and the associated task switches is at least a few
> microseconds, increasing as contention is added (call it t_yield).  The
> time for a natural context switch is several milliseconds (call it
> t_slice).  There is also the time the lock holder owns the lock,
> assuming no contention (t_hold).
> 
> If t_yield > t_spin, then in the undercommitted case it dominates
> t_spin.  If t_hold > t_spin we lose badly.
> 
> If t_spin > t_yield, then the undercommitted case doesn't suffer as much
> as most of the spinning happens in the guest instead of the host, so it
> can pick up the unlock timely.  We don't lose too much in the
> overcommitted case provided the values aren't too far apart (say a
> factor of 3).
> 
> Obviously t_spin must be significantly smaller than t_slice, otherwise
> it accomplishes nothing.
> 
> Regarding t_hold: if it is small, then a larger t_spin helps avoid false
> exits.  If it is large, then we're not very sensitive to t_spin.  It
> doesn't matter if it takes us 2 usec or 20 usec to yield, if we end up
> yielding for several milliseconds.
> 
> So I think it's worth trying again with ple_window of 20000-40000.
> 

Hi Avi,

I ran different benchmarks with increased ple_window values, and the results
do not look encouraging for increasing ple_window.
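
For a sense of scale, the ple_window values under test can be converted from
cycles to time. The sketch below is only illustrative: the 2 GHz clock is an
assumption (the host frequency is not stated in this mail), as is reading
8k/16k/32k as 8192/16384/32768 cycles, and cycles_to_usec is a hypothetical
helper, not anything from KVM.

```python
# Illustrative only: convert PLE window sizes (in cycles) to microseconds.
# ASSUMED_CLOCK_GHZ is an assumption; the mail does not state the host clock.
ASSUMED_CLOCK_GHZ = 2.0


def cycles_to_usec(cycles, clock_ghz=ASSUMED_CLOCK_GHZ):
    """Return the duration of `cycles` CPU cycles in microseconds."""
    # 1 GHz is 1000 cycles per microsecond.
    return cycles / (clock_ghz * 1000.0)


if __name__ == "__main__":
    # Assumed readings of 4k/8k/16k/32k as powers of two.
    for window in (4096, 8192, 16384, 32768):
        print(f"ple_window={window:6d} cycles ~ {cycles_to_usec(window):.3f} usec")
```

At this assumed clock the 4096-cycle window is about 2 usec, in line with the
"2 microseconds or less" estimate quoted above, and the 32k window is about
16 usec.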

Results:
16 core PLE machine with 16 vcpu guest. 

base_pleopt     = 3.6-rc5 + PLE handler optimization patch (ple_window = 4096)
base_pleopt_8k  = base_pleopt with ple_window = 8k
base_pleopt_16k = base_pleopt with ple_window = 16k
base_pleopt_32k = base_pleopt with ple_window = 32k


Percentage improvement of each benchmark w.r.t. base_pleopt (ple_window = 4096);
negative values are regressions.

                base_pleopt_8k  base_pleopt_16k  base_pleopt_32k
-----------------------------------------------------------------
kernbench_1x         -5.54915        -15.94529        -44.31562
kernbench_2x         -7.89399        -17.75039        -37.73498
-----------------------------------------------------------------
sysbench_1x           0.45955         -0.98778          0.05252
sysbench_2x           1.44071         -0.81625          1.35620
sysbench_3x           0.45549          1.51795         -0.41573
-----------------------------------------------------------------
hackbench_1x         -3.80272        -13.91456        -40.79059
hackbench_2x         -4.78999         -7.61382         -7.24475
-----------------------------------------------------------------
ebizzy_1x            -2.54626        -16.86050        -38.46109
ebizzy_2x            -8.75526        -19.29116        -48.33314
-----------------------------------------------------------------
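
The percentages above can be reproduced from the base and patch values in the
detailed tables. The sketch below shows the arithmetic as inferred from the
numbers themselves, not from the script that produced them: kernbench,
sysbench and hackbench appear to be treated as lower-is-better, and ebizzy as
higher-is-better. pct_improve is a hypothetical helper name.

```python
# Inferred from the reported numbers; not the original benchmarking script.
def pct_improve(base, patch, higher_is_better=False):
    """Percentage improvement of `patch` over `base`.

    For lower-is-better metrics a smaller patch value is an improvement;
    for higher-is-better metrics a larger patch value is an improvement.
    """
    if higher_is_better:
        return (patch - base) / base * 100.0
    return (base - patch) / base * 100.0


if __name__ == "__main__":
    # kernbench_1x, 8k window: lower is better, about -5.55
    print(round(pct_improve(41.0027, 43.2780), 2))
    # ebizzy_1x, 8k window: higher is better, about -2.55
    print(round(pct_improve(4256.25, 4147.875, higher_is_better=True), 2))
```

Because the inputs in the tables are rounded to four decimals, recomputed
values may differ from the reported ones in the last digit.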

I also collected perf top output to analyse the difference. The difference
appears to come from flushtlb (and also from the spinlock).

Ebizzy run for 4k ple_window
-  87.20%  [kernel]  [k] arch_local_irq_restore
   - arch_local_irq_restore
      - 100.00% _raw_spin_unlock_irqrestore
         + 52.89% release_pages
         + 47.10% pagevec_lru_move_fn
-   5.71%  [kernel]  [k] arch_local_irq_restore
   - arch_local_irq_restore
      + 86.03% default_send_IPI_mask_allbutself_phys
      + 13.96% default_send_IPI_mask_sequence_phys
-   3.10%  [kernel]  [k] smp_call_function_many
     smp_call_function_many


Ebizzy run for 32k ple_window

-  91.40%  [kernel]  [k] arch_local_irq_restore
   - arch_local_irq_restore
      - 100.00% _raw_spin_unlock_irqrestore
         + 53.13% release_pages
         + 46.86% pagevec_lru_move_fn
-   4.38%  [kernel]  [k] smp_call_function_many
     smp_call_function_many
-   2.51%  [kernel]  [k] arch_local_irq_restore
   - arch_local_irq_restore
      + 90.76% default_send_IPI_mask_allbutself_phys
      + 9.24% default_send_IPI_mask_sequence_phys


Below are the detailed results:
patch = base_pleopt_8k 
+-----------+-----------+-----------+------------+-----------+
                              kernbench 
+-----------+-----------+-----------+------------+-----------+
    base         stddev    patch       stddev      %improve
+-----------+-----------+-----------+------------+-----------+
    41.0027     0.7990	    43.2780     0.5180	  -5.54915
    89.2983     1.2406	    96.3475     1.8891	  -7.89399
+-----------+-----------+-----------+------------+-----------+
+-----------+-----------+-----------+------------+-----------+
                              sysbench 
+-----------+-----------+-----------+------------+-----------+
     9.9010     0.0558	     9.8555     0.1246	   0.45955
    19.7611     0.4290	    19.4764     0.0835	   1.44071
    29.1775     0.9903	    29.0446     0.8641	   0.45549
+-----------+-----------+-----------+------------+-----------+
+-----------+-----------+-----------+------------+-----------+
                              hackbench 
+-----------+-----------+-----------+------------+-----------+
    77.1580     1.9787	    80.0921     2.9696	  -3.80272
   239.2490     1.5660	   250.7090     2.6074	  -4.78999
+-----------+-----------+-----------+------------+-----------+
+-----------+-----------+-----------+------------+-----------+
                              ebizzy 
+-----------+-----------+-----------+------------+-----------+
  4256.2500   186.8053	  4147.8750   206.1840	  -2.54626
  2197.2500    93.1048	  2004.8750    85.7995	  -8.75526
+-----------+-----------+-----------+------------+-----------+

patch = base_pleopt_16k
+-----------+-----------+-----------+------------+-----------+
                              kernbench 
+-----------+-----------+-----------+------------+-----------+
    base         stddev    patch       stddev      %improve
+-----------+-----------+-----------+------------+-----------+
    41.0027     0.7990	    47.5407     0.5739	 -15.94529
    89.2983     1.2406	   105.1491     1.2244	 -17.75039
+-----------+-----------+-----------+------------+-----------+
+-----------+-----------+-----------+------------+-----------+
                              sysbench 
+-----------+-----------+-----------+------------+-----------+
     9.9010     0.0558	     9.9988     0.1106	  -0.98778
    19.7611     0.4290	    19.9224     0.9016	  -0.81625
    29.1775     0.9903	    28.7346     0.2788	   1.51795
+-----------+-----------+-----------+------------+-----------+
+-----------+-----------+-----------+------------+-----------+
                              hackbench 
+-----------+-----------+-----------+------------+-----------+
    77.1580     1.9787	    87.8942     2.2132	 -13.91456
   239.2490     1.5660	   257.4650     5.3674	  -7.61382
+-----------+-----------+-----------+------------+-----------+
+-----------+-----------+-----------+------------+-----------+
                              ebizzy 
+-----------+-----------+-----------+------------+-----------+
  4256.2500   186.8053	  3538.6250   101.1165	 -16.86050
  2197.2500    93.1048	  1773.3750    91.8414	 -19.29116
+-----------+-----------+-----------+------------+-----------+

patch = base_pleopt_32k
+-----------+-----------+-----------+------------+-----------+
                              kernbench 
+-----------+-----------+-----------+------------+-----------+
    base         stddev    patch       stddev      %improve
+-----------+-----------+-----------+------------+-----------+
    41.0027     0.7990	    59.1733     0.8102	 -44.31562
    89.2983     1.2406	   122.9950     1.5534	 -37.73498
+-----------+-----------+-----------+------------+-----------+
+-----------+-----------+-----------+------------+-----------+
                              sysbench 
+-----------+-----------+-----------+------------+-----------+
     9.9010     0.0558	     9.8958     0.0593	   0.05252
    19.7611     0.4290	    19.4931     0.1767	   1.35620
    29.1775     0.9903	    29.2988     1.0420	  -0.41573
+-----------+-----------+-----------+------------+-----------+
+-----------+-----------+-----------+------------+-----------+
                              hackbench 
+-----------+-----------+-----------+------------+-----------+
    77.1580     1.9787	   108.6312    13.1500	 -40.79059
   239.2490     1.5660	   256.5820     2.2722	  -7.24475
+-----------+-----------+-----------+------------+-----------+
+-----------+-----------+-----------+------------+-----------+
                              ebizzy 
+-----------+-----------+-----------+------------+-----------+
  4256.2500   186.8053	  2619.2500    80.8150	 -38.46109
  2197.2500    93.1048	  1135.2500    22.2887	 -48.33314
+-----------+-----------+-----------+------------+-----------+
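
One rough way to read the tables above is to compare each base/patch
difference against the reported standard deviations. The sketch below uses a
crude heuristic chosen purely for illustration (difference larger than the sum
of the two stddev values); it is an assumption, not a proper statistical test,
and outside_noise is a hypothetical helper.

```python
# Crude heuristic, for illustration only: treat a change as "outside noise"
# when the base/patch gap exceeds the sum of the two reported stddevs.
def outside_noise(base, base_sd, patch, patch_sd):
    """Return True if |patch - base| is larger than base_sd + patch_sd."""
    return abs(patch - base) > (base_sd + patch_sd)


if __name__ == "__main__":
    # First sysbench row, 8k window: gap 0.0455 vs spread 0.1804
    print(outside_noise(9.9010, 0.0558, 9.8555, 0.1246))
    # First kernbench row, 32k window: gap 18.1706 vs spread 1.6092
    print(outside_noise(41.0027, 0.7990, 59.1733, 0.8102))
```

By this measure the sysbench rows checked here stay within the run-to-run
spread, while the kernbench regression at 32k is far outside it.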

