kvm.vger.kernel.org archive mirror
From: Sean Christopherson <seanjc@google.com>
To: "Stamatis, Ilias" <ilstam@amazon.com>
Cc: "mlevitsk@redhat.com" <mlevitsk@redhat.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"jmattson@google.com" <jmattson@google.com>,
	"Woodhouse, David" <dwmw@amazon.co.uk>,
	"vkuznets@redhat.com" <vkuznets@redhat.com>,
	"joro@8bytes.org" <joro@8bytes.org>,
	"mtosatti@redhat.com" <mtosatti@redhat.com>,
	"zamsden@gmail.com" <zamsden@gmail.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"wanpengli@tencent.com" <wanpengli@tencent.com>
Subject: Re: [PATCH v3 09/12] KVM: VMX: Remove vmx->current_tsc_ratio and decache_tsc_multiplier()
Date: Tue, 25 May 2021 15:58:17 +0000
Message-ID: <YK0emU2NjWZWBovh@google.com>
In-Reply-To: <8a13dedc5bc118072d1e79d8af13b5026de736b3.camel@amazon.com>

On Tue, May 25, 2021, Stamatis, Ilias wrote:
> On Mon, 2021-05-24 at 18:44 +0000, Sean Christopherson wrote:
> > Yes, but its existence is a complete hack.  vmx->current_tsc_ratio has the same
> > scope as vcpu->arch.tsc_scaling_ratio, i.e. vmx == vcpu == vcpu->arch.  Unlike
> > per-VMCS tracking, it should not be useful, keyword "should".
> > 
> > What I meant by my earlier comment:
> > 
> >   Its use in vmx_vcpu_load_vmcs() is basically "write the VMCS if we forgot to
> >   earlier", which is all kinds of wrong.
> > 
> > is that vmx_vcpu_load_vmcs() should never write vmcs.TSC_MULTIPLIER.  The correct
> > behavior is to set the field at VMCS initialization, and then immediately set it
> > whenever the ratio is changed, e.g. on nested transition, from userspace, etc...
> > In other words, my unclear feedback was to make it obsolete (and drop it) by
> > fixing the underlying mess, not to just drop the optimization hack.
> 
> I understood this and replied earlier. The right place for the hw multiplier
> field to be updated is inside set_tsc_khz() in common code when the ratio
> changes. However, this requires adding another vendor callback, etc. As all
> this is further refactoring, I believe it's better to leave this series as is,
> i.e. only touching code that is directly related to nested TSC scaling, and not
> trying to do everything as part of the same series.

But it directly impacts your code, e.g. the nested enter/exit flows would need
to dance around the decache silliness.  And I believe it even more directly
impacts this series: kvm_set_tsc_khz() fails to handle the case where userspace
invokes KVM_SET_TSC_KHZ while L2 is active.

> This makes testing easier too.

Hmm, sort of.  Yes, fewer patches/modifications in a series definitely make
the series itself easier to test.  But stepping back and looking at the total
cost of testing, I would argue that punting related changes to a later time
increases the overall cost.  E.g. if someone else picks up the cleanup work,
then they have to redo most, if not all, of the testing that you are already
doing, including getting access to the proper hardware, understanding what tests
to prioritize, etc...  Whereas adding one more patch to your series is an
incremental cost since you already have the hardware setup, know which tests to
run, etc...

> We can still implement these changes later.

We can, but we shouldn't.  Simply dropping vmx->current_tsc_ratio is not an
option; it knowingly introduces a (minor) performance regression, for no reason
other than wanting to avoid code churn.  Piling more stuff on top of the flawed
decache logic is impolite, as it adds more work for the person that ends up
doing the cleanup.  I would 100% agree if this were a significant cleanup and/or
completely unrelated, but IMO that's not the case.

Compile tested only...


diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 029c9615378f..34ad7a17458a 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -90,6 +90,7 @@ KVM_X86_OP_NULL(has_wbinvd_exit)
 KVM_X86_OP(get_l2_tsc_offset)
 KVM_X86_OP(get_l2_tsc_multiplier)
 KVM_X86_OP(write_tsc_offset)
+KVM_X86_OP(write_tsc_multiplier)
 KVM_X86_OP(get_exit_info)
 KVM_X86_OP(check_intercept)
 KVM_X86_OP(handle_exit_irqoff)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f099277b993d..a334ce7741ab 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1308,6 +1308,7 @@ struct kvm_x86_ops {
        u64 (*get_l2_tsc_offset)(struct kvm_vcpu *vcpu);
        u64 (*get_l2_tsc_multiplier)(struct kvm_vcpu *vcpu);
        void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
+       void (*write_tsc_multiplier)(struct kvm_vcpu *vcpu, u64 multiplier);

        /*
         * Retrieve somewhat arbitrary exit information.  Intended to be used
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b18f60463073..914afcceb46d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1103,6 +1103,14 @@ static void svm_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
        vmcb_mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
 }

+static void svm_write_tsc_multiplier(struct kvm_vcpu *vcpu, u64 multiplier)
+{
+       /*
+        * Handled when loading guest state since the ratio is programmed via
+        * MSR_AMD64_TSC_RATIO, not a field in the VMCB.
+        */
+}
+
 /* Evaluate instruction intercepts that depend on guest CPUID features. */
 static void svm_recalc_instruction_intercepts(struct kvm_vcpu *vcpu,
                                              struct vcpu_svm *svm)
@@ -4528,6 +4536,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
        .get_l2_tsc_offset = svm_get_l2_tsc_offset,
        .get_l2_tsc_multiplier = svm_get_l2_tsc_multiplier,
        .write_tsc_offset = svm_write_tsc_offset,
+       .write_tsc_multiplier = svm_write_tsc_multiplier,

        .load_mmu_pgd = svm_load_mmu_pgd,

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 6058a65a6ede..712190493926 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2535,7 +2535,7 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
        vmcs_write64(TSC_OFFSET, vcpu->arch.tsc_offset);

        if (kvm_has_tsc_control)
-               decache_tsc_multiplier(vmx);
+               vmcs_write64(TSC_MULTIPLIER, vcpu->arch.tsc_scaling_ratio);

        nested_vmx_transition_tlb_flush(vcpu, vmcs12, true);

@@ -4505,7 +4505,7 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
                vmcs_write32(TPR_THRESHOLD, vmx->nested.l1_tpr_threshold);

        if (kvm_has_tsc_control)
-               decache_tsc_multiplier(vmx);
+               vmcs_write64(TSC_MULTIPLIER, vcpu->arch.tsc_scaling_ratio);

        if (vmx->nested.change_vmcs01_virtual_apic_mode) {
                vmx->nested.change_vmcs01_virtual_apic_mode = false;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4b70431c2edd..bf845a08995e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1390,11 +1390,6 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,

                vmx->loaded_vmcs->cpu = cpu;
        }
-
-       /* Setup TSC multiplier */
-       if (kvm_has_tsc_control &&
-           vmx->current_tsc_ratio != vcpu->arch.tsc_scaling_ratio)
-               decache_tsc_multiplier(vmx);
 }

 /*
@@ -1813,6 +1808,11 @@ static void vmx_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
        vmcs_write64(TSC_OFFSET, offset);
...skipping...
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -322,8 +322,6 @@ struct vcpu_vmx {
        /* apic deadline value in host tsc */
        u64 hv_deadline_tsc;

-       u64 current_tsc_ratio;
-
        unsigned long host_debugctlmsr;

        /*
@@ -532,12 +530,6 @@ static inline struct vmcs *alloc_vmcs(bool shadow)
                              GFP_KERNEL_ACCOUNT);
 }

-static inline void decache_tsc_multiplier(struct vcpu_vmx *vmx)
-{
-       vmx->current_tsc_ratio = vmx->vcpu.arch.tsc_scaling_ratio;
-       vmcs_write64(TSC_MULTIPLIER, vmx->current_tsc_ratio);
-}
-
 static inline bool vmx_has_waitpkg(struct vcpu_vmx *vmx)
 {
        return vmx->secondary_exec_control &
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b61b54cea495..690de1868873 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2179,14 +2179,16 @@ static u32 adjust_tsc_khz(u32 khz, s32 ppm)
        return v;
 }

+static void kvm_vcpu_write_tsc_multiplier(struct kvm_vcpu *vcpu,
+                                         u64 l1_multiplier);
+
 static int set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool scale)
 {
        u64 ratio;

        /* Guest TSC same frequency as host TSC? */
        if (!scale) {
-               vcpu->arch.l1_tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
-               vcpu->arch.tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
+               kvm_vcpu_write_tsc_multiplier(vcpu, kvm_default_tsc_scaling_ratio);
                return 0;
        }

@@ -2212,7 +2214,7 @@ static int set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool scale)
                return -1;
        }

-       vcpu->arch.l1_tsc_scaling_ratio = vcpu->arch.tsc_scaling_ratio = ratio;
+       kvm_vcpu_write_tsc_multiplier(vcpu, ratio);
        return 0;
 }

@@ -2224,8 +2226,7 @@ static int kvm_set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz)
        /* tsc_khz can be zero if TSC calibration fails */
        if (user_tsc_khz == 0) {
                /* set tsc_scaling_ratio to a safe value */
-               vcpu->arch.l1_tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
-               vcpu->arch.tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
+               kvm_vcpu_write_tsc_multiplier(vcpu, kvm_default_tsc_scaling_ratio);
                return -1;
        }

@@ -2383,6 +2384,25 @@ static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 l1_offset)
        static_call(kvm_x86_write_tsc_offset)(vcpu, vcpu->arch.tsc_offset);
 }

+static void kvm_vcpu_write_tsc_multiplier(struct kvm_vcpu *vcpu,
+                                         u64 l1_multiplier)
+{
+       if (!kvm_has_tsc_control)
+               return;
+
+       vcpu->arch.l1_tsc_scaling_ratio = l1_multiplier;
+
+       /* Userspace is changing the multiplier while L2 is active... */
+       if (is_guest_mode(vcpu))
+               vcpu->arch.tsc_scaling_ratio = kvm_calc_nested_tsc_multiplier(
+                       l1_multiplier,
+                       static_call(kvm_x86_get_l2_tsc_multiplier)(vcpu));
+       else
+               vcpu->arch.tsc_scaling_ratio = l1_multiplier;
+
+       static_call(kvm_x86_write_tsc_multiplier)(vcpu, vcpu->arch.tsc_scaling_ratio);
+}
+
 static inline bool kvm_check_tsc_unstable(void)
 {
 #ifdef CONFIG_X86_64

Thread overview: 39+ messages
2021-05-21 10:24 [PATCH v3 00/12] KVM: Implement nested TSC scaling Ilias Stamatis
2021-05-21 10:24 ` [PATCH v3 01/12] math64.h: Add mul_s64_u64_shr() Ilias Stamatis
2021-05-24 17:49   ` Maxim Levitsky
2021-05-21 10:24 ` [PATCH v3 02/12] KVM: X86: Store L1's TSC scaling ratio in 'struct kvm_vcpu_arch' Ilias Stamatis
2021-05-24 17:49   ` Maxim Levitsky
2021-05-21 10:24 ` [PATCH v3 03/12] KVM: X86: Rename kvm_compute_tsc_offset() to kvm_compute_tsc_offset_l1() Ilias Stamatis
2021-05-24 14:21   ` Paolo Bonzini
2021-05-24 17:49     ` Maxim Levitsky
2021-05-21 10:24 ` [PATCH v3 04/12] KVM: X86: Add a ratio parameter to kvm_scale_tsc() Ilias Stamatis
2021-05-24 14:23   ` Paolo Bonzini
2021-05-24 15:48     ` Sean Christopherson
2021-05-24 15:56       ` Paolo Bonzini
2021-05-24 17:50     ` Maxim Levitsky
2021-05-21 10:24 ` [PATCH v3 05/12] KVM: VMX: Add a TSC multiplier field in VMCS12 Ilias Stamatis
2021-05-21 10:24 ` [PATCH v3 06/12] KVM: X86: Add functions for retrieving L2 TSC fields from common code Ilias Stamatis
2021-05-24 17:50   ` Maxim Levitsky
2021-05-21 10:24 ` [PATCH v3 07/12] KVM: X86: Add functions that calculate L2's TSC offset and multiplier Ilias Stamatis
2021-05-24 17:51   ` Maxim Levitsky
2021-05-21 10:24 ` [PATCH v3 08/12] KVM: X86: Move write_l1_tsc_offset() logic to common code and rename it Ilias Stamatis
2021-05-24 17:51   ` Maxim Levitsky
2021-05-21 10:24 ` [PATCH v3 09/12] KVM: VMX: Remove vmx->current_tsc_ratio and decache_tsc_multiplier() Ilias Stamatis
2021-05-24 17:53   ` Maxim Levitsky
2021-05-24 18:44     ` Sean Christopherson
2021-05-25 10:41       ` Stamatis, Ilias
2021-05-25 15:58         ` Sean Christopherson [this message]
2021-05-25 16:15           ` Paolo Bonzini
2021-05-25 16:34             ` Sean Christopherson
2021-05-25 17:34               ` Paolo Bonzini
2021-05-25 18:21                 ` Sean Christopherson
2021-05-25 18:52           ` Stamatis, Ilias
2021-05-25 19:25           ` Stamatis, Ilias
2021-05-25 23:35             ` Sean Christopherson
2021-05-21 10:24 ` [PATCH v3 10/12] KVM: VMX: Set the TSC offset and multiplier on nested entry and exit Ilias Stamatis
2021-05-24 17:54   ` Maxim Levitsky
2021-05-25 16:05   ` Sean Christopherson
2021-05-21 10:24 ` [PATCH v3 11/12] KVM: VMX: Expose TSC scaling to L2 Ilias Stamatis
2021-05-21 10:24 ` [PATCH v3 12/12] KVM: selftests: x86: Add vmx_nested_tsc_scaling_test Ilias Stamatis
2021-05-24 17:55   ` Maxim Levitsky
2021-05-24 15:37 ` [PATCH v3 00/12] KVM: Implement nested TSC scaling Paolo Bonzini
