kvm.vger.kernel.org archive mirror
From: "Stamatis, Ilias" <ilstam@amazon.com>
To: "seanjc@google.com" <seanjc@google.com>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"jmattson@google.com" <jmattson@google.com>,
	"Woodhouse, David" <dwmw@amazon.co.uk>,
	"vkuznets@redhat.com" <vkuznets@redhat.com>,
	"mtosatti@redhat.com" <mtosatti@redhat.com>,
	"joro@8bytes.org" <joro@8bytes.org>,
	"zamsden@gmail.com" <zamsden@gmail.com>,
	"mlevitsk@redhat.com" <mlevitsk@redhat.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"wanpengli@tencent.com" <wanpengli@tencent.com>
Subject: Re: [PATCH v3 09/12] KVM: VMX: Remove vmx->current_tsc_ratio and decache_tsc_multiplier()
Date: Tue, 25 May 2021 19:25:26 +0000
Message-ID: <6d18b842e1ab946da2e0ebfae79fc51c3193802a.camel@amazon.com>
In-Reply-To: <YK0emU2NjWZWBovh@google.com>

On Tue, 2021-05-25 at 15:58 +0000, Sean Christopherson wrote:
> On Tue, May 25, 2021, Stamatis, Ilias wrote:
> > On Mon, 2021-05-24 at 18:44 +0000, Sean Christopherson wrote:
> > > Yes, but its existence is a complete hack.  vmx->current_tsc_ratio has the same
> > > scope as vcpu->arch.tsc_scaling_ratio, i.e. vmx == vcpu == vcpu->arch.  Unlike
> > > per-VMCS tracking, it should not be useful, keyword "should".
> > > 
> > > What I meant by my earlier comment:
> > > 
> > >   Its use in vmx_vcpu_load_vmcs() is basically "write the VMCS if we forgot to
> > >   earlier", which is all kinds of wrong.
> > > 
> > > is that vmx_vcpu_load_vmcs() should never write vmcs.TSC_MULTIPLIER.  The correct
> > > behavior is to set the field at VMCS initialization, and then immediately set it
> > > whenever the ratio is changed, e.g. on nested transition, from userspace, etc...
> > > In other words, my unclear feedback was to make it obsolete (and drop it) by
> > > fixing the underlying mess, not to just drop the optimization hack.
> > 
> > I understood this and replied earlier. The right place for the hw multiplier
> > field to be updated is inside set_tsc_khz() in common code when the ratio
> > changes. However, this requires adding another vendor callback etc. As all
> > this is further refactoring, I believe it's better to leave this series as
> > is, i.e. only touch code that is directly related to nested TSC scaling and
> > not try to do everything as part of the same series.
> 
> But it directly impacts your code, e.g. the nested enter/exit flows would need
> to dance around the decache silliness.  And I believe it even more directly
> impacts this series: kvm_set_tsc_khz() fails to handle the case where userspace
> invokes KVM_SET_TSC_KHZ while L2 is active.
> 
> > This makes testing easier too.
> 
> Hmm, sort of.  Yes, the fewer patches/modifications in a series definitely makes
> the series itself easier to test.  But stepping back and looking at the total
> cost of testing, I would argue that punting related changes to a later time
> increases the overall cost.  E.g. if someone else picks up the cleanup work,
> then they have to redo most, if not all, of the testing that you are already
> doing, including getting access to the proper hardware, understanding what tests
> to prioritize, etc...  Whereas adding one more patch to your series is an
> incremental cost since you already have the hardware setup, know which tests to
> run, etc...
> 
> > We can still implement these changes later.
> 
> We can, but we shouldn't.  Simply dropping vmx->current_tsc_ratio is not an
> option; it knowingly introduces a (minor) performance regression, for no reason
> other than wanting to avoid code churn.  Piling more stuff on top of the flawed
> decache logic is impolite, as it adds more work for the person that ends up
> doing the cleanup.  I would 100% agree if this were a significant cleanup and/or
> completely unrelated, but IMO that's not the case.
> 
> Compile tested only...
> 
> 
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index 029c9615378f..34ad7a17458a 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -90,6 +90,7 @@ KVM_X86_OP_NULL(has_wbinvd_exit)
>  KVM_X86_OP(get_l2_tsc_offset)
>  KVM_X86_OP(get_l2_tsc_multiplier)
>  KVM_X86_OP(write_tsc_offset)
> +KVM_X86_OP(write_tsc_multiplier)
>  KVM_X86_OP(get_exit_info)
>  KVM_X86_OP(check_intercept)
>  KVM_X86_OP(handle_exit_irqoff)
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index f099277b993d..a334ce7741ab 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1308,6 +1308,7 @@ struct kvm_x86_ops {
>         u64 (*get_l2_tsc_offset)(struct kvm_vcpu *vcpu);
>         u64 (*get_l2_tsc_multiplier)(struct kvm_vcpu *vcpu);
>         void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
> +       void (*write_tsc_multiplier)(struct kvm_vcpu *vcpu, u64 multiplier);
> 
>         /*
>          * Retrieve somewhat arbitrary exit information.  Intended to be used
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index b18f60463073..914afcceb46d 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1103,6 +1103,14 @@ static void svm_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
>         vmcb_mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
>  }
> 
> +static void svm_write_tsc_multiplier(struct kvm_vcpu *vcpu, u64 l1_multiplier)
> +{
> +       /*
> +        * Handled when loading guest state since the ratio is programmed via
> +        * MSR_AMD64_TSC_RATIO, not a field in the VMCB.
> +        */
> +}
> +
>  /* Evaluate instruction intercepts that depend on guest CPUID features. */
>  static void svm_recalc_instruction_intercepts(struct kvm_vcpu *vcpu,
>                                               struct vcpu_svm *svm)
> @@ -4528,6 +4536,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>         .get_l2_tsc_offset = svm_get_l2_tsc_offset,
>         .get_l2_tsc_multiplier = svm_get_l2_tsc_multiplier,
>         .write_tsc_offset = svm_write_tsc_offset,
> +       .write_tsc_multiplier = svm_write_tsc_multiplier,
> 
>         .load_mmu_pgd = svm_load_mmu_pgd,
> 
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 6058a65a6ede..712190493926 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -2535,7 +2535,7 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>         vmcs_write64(TSC_OFFSET, vcpu->arch.tsc_offset);
> 
>         if (kvm_has_tsc_control)
> -               decache_tsc_multiplier(vmx);
> +               vmcs_write64(TSC_MULTIPLIER, vcpu->arch.tsc_scaling_ratio);
> 
>         nested_vmx_transition_tlb_flush(vcpu, vmcs12, true);
> 
> @@ -4505,7 +4505,7 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
>                 vmcs_write32(TPR_THRESHOLD, vmx->nested.l1_tpr_threshold);
> 
>         if (kvm_has_tsc_control)
> -               decache_tsc_multiplier(vmx);
> +               vmcs_write64(TSC_MULTIPLIER, vcpu->arch.tsc_scaling_ratio);
> 
>         if (vmx->nested.change_vmcs01_virtual_apic_mode) {
>                 vmx->nested.change_vmcs01_virtual_apic_mode = false;
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 4b70431c2edd..bf845a08995e 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1390,11 +1390,6 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
> 
>                 vmx->loaded_vmcs->cpu = cpu;
>         }
> -
> -       /* Setup TSC multiplier */
> -       if (kvm_has_tsc_control &&
> -           vmx->current_tsc_ratio != vcpu->arch.tsc_scaling_ratio)
> -               decache_tsc_multiplier(vmx);
>  }
> 
>  /*
> @@ -1813,6 +1808,11 @@ static void vmx_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
>         vmcs_write64(TSC_OFFSET, offset);
> ...skipping...
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -322,8 +322,6 @@ struct vcpu_vmx {
>         /* apic deadline value in host tsc */
>         u64 hv_deadline_tsc;
> 
> -       u64 current_tsc_ratio;
> -
>         unsigned long host_debugctlmsr;
> 
>         /*
> @@ -532,12 +530,6 @@ static inline struct vmcs *alloc_vmcs(bool shadow)
>                               GFP_KERNEL_ACCOUNT);
>  }
> 
> -static inline void decache_tsc_multiplier(struct vcpu_vmx *vmx)
> -{
> -       vmx->current_tsc_ratio = vmx->vcpu.arch.tsc_scaling_ratio;
> -       vmcs_write64(TSC_MULTIPLIER, vmx->current_tsc_ratio);
> -}
> -
>  static inline bool vmx_has_waitpkg(struct vcpu_vmx *vmx)
>  {
>         return vmx->secondary_exec_control &
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index b61b54cea495..690de1868873 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2179,14 +2179,16 @@ static u32 adjust_tsc_khz(u32 khz, s32 ppm)
>         return v;
>  }
> 
> +static void kvm_vcpu_write_tsc_multiplier(struct kvm_vcpu *vcpu,
> +                                         u64 l1_multiplier);
> +
>  static int set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool scale)
>  {
>         u64 ratio;
> 
>         /* Guest TSC same frequency as host TSC? */
>         if (!scale) {
> -               vcpu->arch.l1_tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
> -               vcpu->arch.tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
> +               kvm_vcpu_write_tsc_multiplier(vcpu, kvm_default_tsc_scaling_ratio);
>                 return 0;
>         }
> 
> @@ -2212,7 +2214,7 @@ static int set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool scale)
>                 return -1;
>         }
> 
> -       vcpu->arch.l1_tsc_scaling_ratio = vcpu->arch.tsc_scaling_ratio = ratio;
> +       kvm_vcpu_write_tsc_multiplier(vcpu, ratio);
>         return 0;
>  }
> 
> @@ -2224,8 +2226,7 @@ static int kvm_set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz)
>         /* tsc_khz can be zero if TSC calibration fails */
>         if (user_tsc_khz == 0) {
>                 /* set tsc_scaling_ratio to a safe value */
> -               vcpu->arch.l1_tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
> -               vcpu->arch.tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
> +               kvm_vcpu_write_tsc_multiplier(vcpu, kvm_default_tsc_scaling_ratio);
>                 return -1;
>         }
> 
> @@ -2383,6 +2384,25 @@ static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 l1_offset)
>         static_call(kvm_x86_write_tsc_offset)(vcpu, vcpu->arch.tsc_offset);
>  }
> 
> +static void kvm_vcpu_write_tsc_multiplier(struct kvm_vcpu *vcpu,
> +                                         u64 l1_multiplier)
> +{
> +       if (!kvm_has_tsc_control)
> +               return;
> +
> +       vcpu->arch.l1_tsc_scaling_ratio = l1_multiplier;
> +
> +       /* Userspace is changing the multiplier while L2 is active... */
> +       if (is_guest_mode(vcpu))
> +               vcpu->arch.tsc_scaling_ratio = kvm_calc_nested_tsc_multiplier(
> +                       l1_multiplier,
> +                       static_call(kvm_x86_get_l2_tsc_multiplier)(vcpu));
> +       else
> +               vcpu->arch.tsc_scaling_ratio = l1_multiplier;
> +
> +       static_call(kvm_x86_write_tsc_multiplier)(vcpu, vcpu->arch.tsc_scaling_ratio);
> +}
> +
>  static inline bool kvm_check_tsc_unstable(void)
>  {
>  #ifdef CONFIG_X86_64
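
For reference, the is_guest_mode() branch in the quoted kvm_vcpu_write_tsc_multiplier() is what handles KVM_SET_TSC_KHZ arriving while L2 is active: the new L1 ratio has to be recombined with the ratio L1 programmed for L2. Below is a minimal userspace sketch of that arithmetic. It assumes VMX's fixed-point encoding with 48 fractional bits (1.0 == 1ULL << 48), and it assumes kvm_calc_nested_tsc_multiplier() from earlier in the series multiplies the two ratios and shifts out the fractional bits. All sketch_* names are hypothetical, and the kvm_has_tsc_control check and the vendor callback are left out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumption: the multiplier is unsigned fixed point with 48 fractional
 * bits, so 1.0 (no scaling) is 1ULL << 48.  Hypothetical stand-ins for
 * kvm_tsc_scaling_ratio_frac_bits and kvm_default_tsc_scaling_ratio.
 */
#define SKETCH_FRAC_BITS     48
#define SKETCH_DEFAULT_RATIO (1ULL << SKETCH_FRAC_BITS)

/* (a * b) >> shift, keeping the high bits of the 128-bit product. */
static uint64_t sketch_mul_u64_u64_shr(uint64_t a, uint64_t b, unsigned int shift)
{
	return (uint64_t)(((unsigned __int128)a * b) >> shift);
}

/* Effective ratio while L2 runs: L1's ratio scaled by the ratio L1 gave L2. */
static uint64_t sketch_nested_tsc_multiplier(uint64_t l1_multiplier,
					     uint64_t l2_multiplier)
{
	if (l2_multiplier == SKETCH_DEFAULT_RATIO)
		return l1_multiplier;	/* L1 does not scale L2: nothing to combine */
	return sketch_mul_u64_u64_shr(l1_multiplier, l2_multiplier,
				      SKETCH_FRAC_BITS);
}

/* Hypothetical per-vCPU state mirroring the two fields in the patch. */
struct sketch_vcpu {
	int guest_mode;			/* nonzero while L2 is active */
	uint64_t l2_multiplier;		/* ratio L1 programmed for L2 (vmcs12) */
	uint64_t l1_tsc_scaling_ratio;	/* ratio for L1 itself */
	uint64_t tsc_scaling_ratio;	/* ratio for whichever guest is running */
};

/* Models the quoted kvm_vcpu_write_tsc_multiplier(), minus the vendor hook. */
static void sketch_write_tsc_multiplier(struct sketch_vcpu *v,
					uint64_t l1_multiplier)
{
	v->l1_tsc_scaling_ratio = l1_multiplier;
	if (v->guest_mode)
		v->tsc_scaling_ratio =
			sketch_nested_tsc_multiplier(l1_multiplier, v->l2_multiplier);
	else
		v->tsc_scaling_ratio = l1_multiplier;
}
```

For example, with L1 running at 2x (1ULL << 49) and L1 giving L2 0.5x (1ULL << 47), L2's effective ratio comes out to exactly 1.0, i.e. L2 sees the host TSC frequency.
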

Hmm, this patch actually still removes the caching and introduces a small
performance overhead. For example, if neither L1 nor L2 is scaled, it will
still do a VMWRITE on every L2 entry/exit.

So do we want to get rid of decache_tsc_multiplier() but keep
vmx->current_tsc_ratio and do the check inside write_tsc_multiplier()? Or,
alternatively, delete vmx->current_tsc_ratio too and have
write_tsc_multiplier() receive two parameters, one for the old multiplier and
one for the new?
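
Both options come down to comparing the new multiplier with the value already in the loaded VMCS before issuing a VMWRITE. A minimal sketch of the first option follows; it is a userspace model, not KVM code, and every sketch_* name is hypothetical. One assumption worth flagging: since L1 and L2 run on different VMCSs (vmcs01 and vmcs02), the sketch keeps the cached value next to each VMCS rather than once per vCPU, so switching between them cannot leave the cache describing the wrong VMCS. A zero cache value means nothing has been written yet, assuming a zero multiplier is never programmed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of one VMCS: the hardware field plus a software copy
 * of the last value written to it.  vmcs01 (L1) and vmcs02 (L2) would each
 * be one of these, so each carries its own cache.
 */
struct sketch_vmcs {
	uint64_t tsc_multiplier;	/* stands in for the TSC_MULTIPLIER field */
	uint64_t cached_multiplier;	/* last value written, 0 if none yet */
	unsigned int vmwrites;		/* counts simulated VMWRITEs */
};

/* Stand-in for vmcs_write64(TSC_MULTIPLIER, val) on the loaded VMCS. */
static void sketch_vmcs_write_multiplier(struct sketch_vmcs *vmcs, uint64_t val)
{
	vmcs->tsc_multiplier = val;
	vmcs->vmwrites++;
}

/* Option 1 from the reply: skip the VMWRITE when the value is unchanged. */
static void sketch_write_tsc_multiplier_cached(struct sketch_vmcs *loaded,
					       uint64_t multiplier)
{
	if (loaded->cached_multiplier == multiplier)
		return;
	loaded->cached_multiplier = multiplier;
	sketch_vmcs_write_multiplier(loaded, multiplier);
}
```

Repeated nested entries with an unchanged ratio then cost a single VMWRITE in total. The second option (passing the old and new multipliers) moves the same comparison into the caller; it saves the same VMWRITEs only if the "old" value really reflects what is in the currently loaded VMCS.
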



Thread overview: 39+ messages
2021-05-21 10:24 [PATCH v3 00/12] KVM: Implement nested TSC scaling Ilias Stamatis
2021-05-21 10:24 ` [PATCH v3 01/12] math64.h: Add mul_s64_u64_shr() Ilias Stamatis
2021-05-24 17:49   ` Maxim Levitsky
2021-05-21 10:24 ` [PATCH v3 02/12] KVM: X86: Store L1's TSC scaling ratio in 'struct kvm_vcpu_arch' Ilias Stamatis
2021-05-24 17:49   ` Maxim Levitsky
2021-05-21 10:24 ` [PATCH v3 03/12] KVM: X86: Rename kvm_compute_tsc_offset() to kvm_compute_tsc_offset_l1() Ilias Stamatis
2021-05-24 14:21   ` Paolo Bonzini
2021-05-24 17:49     ` Maxim Levitsky
2021-05-21 10:24 ` [PATCH v3 04/12] KVM: X86: Add a ratio parameter to kvm_scale_tsc() Ilias Stamatis
2021-05-24 14:23   ` Paolo Bonzini
2021-05-24 15:48     ` Sean Christopherson
2021-05-24 15:56       ` Paolo Bonzini
2021-05-24 17:50     ` Maxim Levitsky
2021-05-21 10:24 ` [PATCH v3 05/12] KVM: VMX: Add a TSC multiplier field in VMCS12 Ilias Stamatis
2021-05-21 10:24 ` [PATCH v3 06/12] KVM: X86: Add functions for retrieving L2 TSC fields from common code Ilias Stamatis
2021-05-24 17:50   ` Maxim Levitsky
2021-05-21 10:24 ` [PATCH v3 07/12] KVM: X86: Add functions that calculate L2's TSC offset and multiplier Ilias Stamatis
2021-05-24 17:51   ` Maxim Levitsky
2021-05-21 10:24 ` [PATCH v3 08/12] KVM: X86: Move write_l1_tsc_offset() logic to common code and rename it Ilias Stamatis
2021-05-24 17:51   ` Maxim Levitsky
2021-05-21 10:24 ` [PATCH v3 09/12] KVM: VMX: Remove vmx->current_tsc_ratio and decache_tsc_multiplier() Ilias Stamatis
2021-05-24 17:53   ` Maxim Levitsky
2021-05-24 18:44     ` Sean Christopherson
2021-05-25 10:41       ` Stamatis, Ilias
2021-05-25 15:58         ` Sean Christopherson
2021-05-25 16:15           ` Paolo Bonzini
2021-05-25 16:34             ` Sean Christopherson
2021-05-25 17:34               ` Paolo Bonzini
2021-05-25 18:21                 ` Sean Christopherson
2021-05-25 18:52           ` Stamatis, Ilias
2021-05-25 19:25           ` Stamatis, Ilias [this message]
2021-05-25 23:35             ` Sean Christopherson
2021-05-21 10:24 ` [PATCH v3 10/12] KVM: VMX: Set the TSC offset and multiplier on nested entry and exit Ilias Stamatis
2021-05-24 17:54   ` Maxim Levitsky
2021-05-25 16:05   ` Sean Christopherson
2021-05-21 10:24 ` [PATCH v3 11/12] KVM: VMX: Expose TSC scaling to L2 Ilias Stamatis
2021-05-21 10:24 ` [PATCH v3 12/12] KVM: selftests: x86: Add vmx_nested_tsc_scaling_test Ilias Stamatis
2021-05-24 17:55   ` Maxim Levitsky
2021-05-24 15:37 ` [PATCH v3 00/12] KVM: Implement nested TSC scaling Paolo Bonzini
