linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Moger, Babu" <Babu.Moger@amd.com>
To: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: "joro@8bytes.org" <joro@8bytes.org>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: RE: [RFC 3/3] arch/x86/kvm: SVM: Introduce pause loop exit logic in SVM
Date: Sat, 10 Mar 2018 05:07:20 +0000	[thread overview]
Message-ID: <CY4PR12MB1768CC12C6685127A947D02795DD0@CY4PR12MB1768.namprd12.prod.outlook.com> (raw)
In-Reply-To: <20180309181233.GO12290@flask>

Radim,
 Thanks for the comments. Taken care of most of the comments.
 I have few questions/comments. Please see inline.

> -----Original Message-----
> From: Radim Krčmář <rkrcmar@redhat.com>
> Sent: Friday, March 9, 2018 12:13 PM
> To: Moger, Babu <Babu.Moger@amd.com>
> Cc: joro@8bytes.org; tglx@linutronix.de; mingo@redhat.com;
> hpa@zytor.com; x86@kernel.org; pbonzini@redhat.com;
> kvm@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [RFC 3/3] arch/x86/kvm: SVM: Introduce pause loop exit logic in
> SVM
> 
> 2018-03-02 11:17-0500, Babu Moger:
> > Bring the PLE(pause loop exit) logic to AMD svm driver.
> > We have noticed it help in situations where numerous pauses are
> generated
> > due to spinlock or other scenarios. Tested it with idle=poll and noticed
> > pause interceptions go down considerably.
> >
> > Signed-off-by: Babu Moger <babu.moger@amd.com>
> > ---
> >  arch/x86/kvm/svm.c | 114
> ++++++++++++++++++++++++++++++++++++++++++++++++++++-
> >  arch/x86/kvm/x86.h |   1 +
> >  2 files changed, 114 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > index 50a4e95..30bc851 100644
> > --- a/arch/x86/kvm/svm.c
> > +++ b/arch/x86/kvm/svm.c
> > @@ -263,6 +263,55 @@ struct amd_svm_iommu_ir {
> >  static bool npt_enabled;
> >  #endif
> >
> > +/*
> > + * These 2 parameters are used to config the controls for Pause-Loop
> Exiting:
> > + * pause_filter_thresh: On processors that support Pause
> filtering(indicated
> > + *	by CPUID Fn8000_000A_EDX), the VMCB provides a 16 bit pause filter
> > + *	count value. On VMRUN this value is loaded into an internal counter.
> > + *	Each time a pause instruction is executed, this counter is
> decremented
> > + *	until it reaches zero at which time a #VMEXIT is generated if pause
> > + *	intercept is enabled. Refer to  AMD APM Vol 2 Section 15.14.4 Pause
> > + *	Intercept Filtering for more details.
> > + *	This also indicate if ple logic enabled.
> > + *
> > + * pause_filter_count: In addition, some processor families support
> advanced
> 
> The comment has thresh/count flipped.

Good catch. Thanks

> 
> > + *	pause filtering (indicated by CPUID Fn8000_000A_EDX) upper bound
> on
> > + *	the amount of time a guest is allowed to execute in a pause loop.
> > + *	In this mode, a 16-bit pause filter threshold field is added in the
> > + *	VMCB. The threshold value is a cycle count that is used to reset the
> > + *	pause counter. As with simple pause filtering, VMRUN loads the
> pause
> > + *	count value from VMCB into an internal counter. Then, on each
> pause
> > + *	instruction the hardware checks the elapsed number of cycles since
> > + *	the most recent pause instruction against the pause filter threshold.
> > + *	If the elapsed cycle count is greater than the pause filter threshold,
> > + *	then the internal pause count is reloaded from the VMCB and
> execution
> > + *	continues. If the elapsed cycle count is less than the pause filter
> > + *	threshold, then the internal pause count is decremented. If the
> count
> > + *	value is less than zero and PAUSE intercept is enabled, a #VMEXIT is
> > + *	triggered. If advanced pause filtering is supported and pause filter
> > + *	threshold field is set to zero, the filter will operate in the simpler,
> > + *	count only mode.
> > + */
> > +
> > +static int pause_filter_thresh = KVM_DEFAULT_PLE_GAP;
> > +module_param(pause_filter_thresh, int, S_IRUGO);
> 
> I think it was a mistake to put signed values in VMX ...
> Please use unsigned variants and also properly sized.
> (The module param type would be "ushort" instead of "int".)

Sure. Will take care.
> 
> > +static int pause_filter_count = KVM_DEFAULT_PLE_WINDOW;
> > +module_param(pause_filter_count, int, S_IRUGO);
> 
> We are going to want a different default for pause_filter_count, because
> they have a different meaning.  On Intel, it's the number of cycles, on
> AMD, it's the number of PAUSE instructions.
> 
> The AMD's 3k is a bit high in comparison to Intel's 4k, but I'd keep 3k
> unless we have other benchmark results.

Ok. Testing with pause_filter_count = 3k for AMD. If everything goes fine, will make these changes.

> 
> > +static int ple_window_grow = KVM_DEFAULT_PLE_WINDOW_GROW;
> 
> The naming would be nicer with a consistent prefix.  We're growing
> pause_filter_count, so pause_filter_count_grow is easier to understand.
> (Albeit unwieldy.)

Sure. Will take care.

> 
> > +module_param(ple_window_grow, int, S_IRUGO);
> 
> (This is better as unsigned too ... VMX should have had that.)

Yes. Will fix it.

> 
> > @@ -1046,6 +1095,58 @@ static int avic_ga_log_notifier(u32 ga_tag)
> >  	return 0;
> >  }
> >
> > +static void grow_ple_window(struct kvm_vcpu *vcpu)
> > +{
> > +	struct vcpu_svm *svm = to_svm(vcpu);
> > +	struct vmcb_control_area *control = &svm->vmcb->control;
> > +	int old = control->pause_filter_count;
> > +
> > +	control->pause_filter_count = __grow_ple_window(old,
> > +							pause_filter_count,
> > +							ple_window_grow,
> > +
> 	ple_window_actual_max);
> > +
> > +	if (control->pause_filter_count != old)
> > +		mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
> > +
> > +	trace_kvm_ple_window_grow(vcpu->vcpu_id,
> > +				  control->pause_filter_count, old);
> 
> Please move the tracing into __shrink_ple_window to share the code.
> This probably belongs to patch [2/3].

I will have to pass vcpu_id, and have to make few changes to display old and new values.
I am afraid it might add few more extra instructions.

> 
> > +/*
> > + * ple_window_actual_max is computed to be one grow_ple_window()
> below
> > + * ple_window_max. (See __grow_ple_window for the reason.)
> > + * This prevents overflows, because ple_window_max is int.
> > + * ple_window_max effectively rounded down to a multiple of
> ple_window_grow in
> > + * this process.
> > + * ple_window_max is also prevented from setting control-
> >pause_filter_count <
> > + * pause_filter_count.
> > + */
> > +static void update_ple_window_actual_max(void)
> > +{
> > +	ple_window_actual_max =
> > +		__shrink_ple_window(max(ple_window_max,
> pause_filter_count),
> 
> (I have no idea what I was thinking when I wrote that for VMX. :[
>  I'll write a patch to get rid of ple_window_actual_max, because its
>  benefits are really minuscule and the logic is complicated.)

If you are thinking of just straight forward removal, I can take care of it.

> 
> > +				    pause_filter_count,
> > +				    ple_window_grow, SHRT_MIN);
> > +}
> >  static __init int svm_hardware_setup(void)
> >  {
> >  	int cpu;
> > @@ -1309,7 +1412,11 @@ static void init_vmcb(struct vcpu_svm *svm)
> >  	svm->vcpu.arch.hflags = 0;
> >
> >  	if (boot_cpu_has(X86_FEATURE_PAUSEFILTER)) {
> > -		control->pause_filter_count = 3000;
> > +		control->pause_filter_count = pause_filter_count;
> > +		if (boot_cpu_has(X86_FEATURE_PFTHRESHOLD))
> > +			control->pause_filter_thresh = pause_filter_thresh;
> > +		else
> > +			pause_filter_thresh = 0;
> 
> Please move this to hardware_setup and also clear pause_filter_count if

Moving this to hardware_setup will be a problem.  We don't have access to svm data structure in hardware_setup.

> X86_FEATURE_PAUSEFILTER is not present.

Sure. Will clear pause_filter_count if X86_FEATURE_PAUSEFILTER is not present.
> 
> >  		set_intercept(svm, INTERCEPT_PAUSE);
> 
> The intercept should then be disabled iff pause_filter_count == 0.

Yes, will disable intercept if pause_filter_count is zero.

> 
> The functionality looks correct,
> 
> thanks!

  reply	other threads:[~2018-03-10  5:07 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-02 16:17 [RFC 0/3] arch/x86/kvm: Introduce PLE(pause loop exit) logic in SVM Babu Moger
2018-03-02 16:17 ` [RFC 1/3] arch/x86/kvm: SVM: Introduce pause filter threshold Babu Moger
2018-03-02 16:17 ` [RFC 2/3] arch/x86/kvm: VMX: Bring the common code to header file Babu Moger
2018-03-02 16:17 ` [RFC 3/3] arch/x86/kvm: SVM: Introduce pause loop exit logic in SVM Babu Moger
2018-03-09 18:12   ` Radim Krčmář
2018-03-10  5:07     ` Moger, Babu [this message]
2018-03-14 13:26       ` Radim Krčmář
2018-03-16  2:07         ` Moger, Babu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CY4PR12MB1768CC12C6685127A947D02795DD0@CY4PR12MB1768.namprd12.prod.outlook.com \
    --to=babu.moger@amd.com \
    --cc=hpa@zytor.com \
    --cc=joro@8bytes.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=rkrcmar@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).