kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Maxim Levitsky <mlevitsk@redhat.com>
To: "Stamatis, Ilias" <ilstam@amazon.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"ilstam@mailbox.org" <ilstam@mailbox.org>
Cc: "jmattson@google.com" <jmattson@google.com>,
	"Woodhouse, David" <dwmw@amazon.co.uk>,
	"vkuznets@redhat.com" <vkuznets@redhat.com>,
	"joro@8bytes.org" <joro@8bytes.org>,
	"mtosatti@redhat.com" <mtosatti@redhat.com>,
	"zamsden@gmail.com" <zamsden@gmail.com>,
	"seanjc@google.com" <seanjc@google.com>,
	"wanpengli@tencent.com" <wanpengli@tencent.com>
Subject: Re: [PATCH 6/8] KVM: VMX: Make vmx_write_l1_tsc_offset() work with nested TSC scaling
Date: Tue, 11 May 2021 15:44:48 +0300	[thread overview]
Message-ID: <875fa1ff9a85ff601a05030eaa24a1db45a71f36.camel@redhat.com> (raw)
In-Reply-To: <9e0973a1adef19158e0c09a642b8c733556e272c.camel@amazon.com>

On Mon, 2021-05-10 at 16:08 +0000, Stamatis, Ilias wrote:
> On Mon, 2021-05-10 at 16:54 +0300, Maxim Levitsky wrote:
> > On Thu, 2021-05-06 at 10:32 +0000, ilstam@mailbox.org wrote:
> > > From: Ilias Stamatis <ilstam@amazon.com>
> > > 
> > > Calculating the current TSC offset is done differently when nested TSC
> > > scaling is used.
> > > 
> > > Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
> > > ---
> > >  arch/x86/kvm/vmx/vmx.c | 26 ++++++++++++++++++++------
> > >  1 file changed, 20 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > index 49241423b854..df7dc0e4c903 100644
> > > --- a/arch/x86/kvm/vmx/vmx.c
> > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > @@ -1797,10 +1797,16 @@ static void setup_msrs(struct vcpu_vmx *vmx)
> > >               vmx_update_msr_bitmap(&vmx->vcpu);
> > >  }
> > > 
> > > -static u64 vmx_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
> > > +/*
> > > + * This function receives the requested offset for L1 as an argument but it
> > > + * actually writes the "current" tsc offset to the VMCS and returns it. The
> > > + * current offset might be different in case an L2 guest is currently running
> > > + * and its VMCS02 is loaded.
> > > + */
> > 
> > (Not related to this patch) It might be a good idea to rename this callback
> > instead of this comment, but I am not sure about it.
> > 
> 
> Yes! I was planning to do this on v2 anyway as the name of that function
> is completely misleading/wrong IMHO.
> 
> I think that even the comment inside it that explains that when L1
> doesn't trap WRMSR then L2 TSC writes overwrite L1's TSC is misplaced.
> It should go one or more levels above I believe.
> 
> This function simply
> updates the TSC offset in the current VMCS depending on whether L1 or L2
> is executing. 
> 
> > > +static u64 vmx_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 l1_offset)
> > >  {
> > >       struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
> > > -     u64 g_tsc_offset = 0;
> > > +     u64 cur_offset = l1_offset;
> > > 
> > >       /*
> > >        * We're here if L1 chose not to trap WRMSR to TSC. According
> > > @@ -1809,11 +1815,19 @@ static u64 vmx_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
> > >        * to the newly set TSC to get L2's TSC.
> > >        */
> > >       if (is_guest_mode(vcpu) &&
> > > -         (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING))
> > > -             g_tsc_offset = vmcs12->tsc_offset;
> > > +         (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING)) {
> > > +             if (vmcs12->secondary_vm_exec_control & SECONDARY_EXEC_TSC_SCALING) {
> > > +                     cur_offset = kvm_compute_02_tsc_offset(
> > > +                                     l1_offset,
> > > +                                     vmcs12->tsc_multiplier,
> > > +                                     vmcs12->tsc_offset);
> > > +             } else {
> > > +                     cur_offset = l1_offset + vmcs12->tsc_offset;
> > > +             }
> > > +     }
> > > 
> > > -     vmcs_write64(TSC_OFFSET, offset + g_tsc_offset);
> > > -     return offset + g_tsc_offset;
> > > +     vmcs_write64(TSC_OFFSET, cur_offset);
> > > +     return cur_offset;
> > >  }
> > > 
> > >  /*
> > 
> > This code would be ideal to move to common code as SVM will do basically
> > the same thing.
> > Doesn't have to be done now though.
> > 
> 
> Hmm, how can we do the feature availability checking in common code?

We can add a vendor callback for this.

Just a few thoughts about how I think we can implement
the nested TSC scaling in (mostly) common code:


Assuming that the common code knows that:
1. Nested guest is running (already the case)

2. The format of the scaling multiplier is known 
(thankfully both SVM and VMX use fixed point binary number.

SVM is using 8.32 format and VMX using 16.48 format.

The common code already knows this via
kvm_max_tsc_scaling_ratio/kvm_tsc_scaling_ratio_frac_bits.

3. the value of nested TSC scaling multiplier 
is known to the common code.

(a callback to VMX/SVM code to query the TSC scaling value, 
and have it return 1 when TSC scaling is disabled should work)


Then the common code can do the whole thing, and only
let the SVM/VMX code write the actual multiplier.

As far as I know on the SVM, the TSC scaling works like that:

1. SVM has a CPUID bit to indicate that tsc scaling is supported.
(X86_FEATURE_TSCRATEMSR)

When this bit is set, TSC scale ratio is unconditionally enabled (but
can be just 1), and it is set via a special MSR (MSR_AMD64_TSC_RATIO)
rather than a field in VMCB (someone at AMD did cut corners...).

However since the TSC scaling is only effective when a guest is running,
that MSR can be treated almost as if it was just another VMCB field.

The TSC scale value is 32 bit fraction and another 8 bits the integer value
(as opposed to 48 bit fraction on VMX and 16 bits integer value).

I don't think that there are any other differences.

I should also note that I can do all of the above myself if 
I end up implementing the nested TSC scaling on AMD 
so I don't object much to the way that this patch series is done.

Best regards,
	Maxim Levitsky

> 
> > Best regards,
> >         Maxim Levitsky
> > 



  reply	other threads:[~2021-05-11 12:44 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-06 10:32 [PATCH 0/8] KVM: VMX: Implement nested TSC scaling ilstam
2021-05-06 10:32 ` [PATCH 1/8] KVM: VMX: Add a TSC multiplier field in VMCS12 ilstam
2021-05-06 14:50   ` kernel test robot
2021-05-06 17:36   ` Jim Mattson
2021-05-10 13:42   ` Maxim Levitsky
2021-05-06 10:32 ` [PATCH 2/8] KVM: X86: Store L1's TSC scaling ratio in 'struct kvm_vcpu_arch' ilstam
2021-05-10 13:43   ` Maxim Levitsky
2021-05-06 10:32 ` [PATCH 3/8] KVM: X86: Pass an additional 'L1' argument to kvm_scale_tsc() ilstam
2021-05-10 13:52   ` Maxim Levitsky
2021-05-10 15:44     ` Stamatis, Ilias
2021-05-06 10:32 ` [PATCH 4/8] KVM: VMX: Adjust the TSC-related VMCS fields on L2 entry and exit ilstam
2021-05-06 11:32   ` Paolo Bonzini
2021-05-06 17:35     ` Stamatis, Ilias
2021-05-10 14:11       ` Paolo Bonzini
2021-05-10 13:53   ` Maxim Levitsky
2021-05-10 14:44     ` Stamatis, Ilias
2021-05-11 12:38       ` Maxim Levitsky
2021-05-11 15:11         ` Stamatis, Ilias
2021-05-06 10:32 ` [PATCH 5/8] KVM: X86: Move tracing outside write_l1_tsc_offset() ilstam
2021-05-10 13:54   ` Maxim Levitsky
2021-05-06 10:32 ` [PATCH 6/8] KVM: VMX: Make vmx_write_l1_tsc_offset() work with nested TSC scaling ilstam
2021-05-10 13:54   ` Maxim Levitsky
2021-05-10 16:08     ` Stamatis, Ilias
2021-05-11 12:44       ` Maxim Levitsky [this message]
2021-05-11 17:44         ` Stamatis, Ilias
2021-05-06 10:32 ` [PATCH 7/8] KVM: VMX: Expose TSC scaling to L2 ilstam
2021-05-10 13:56   ` Maxim Levitsky
2021-05-06 10:32 ` [PATCH 8/8] KVM: selftests: x86: Add vmx_nested_tsc_scaling_test ilstam
2021-05-10 13:59   ` Maxim Levitsky
2021-05-11 11:16     ` Stamatis, Ilias
2021-05-11 12:47       ` Maxim Levitsky
2021-05-11 14:02         ` Stamatis, Ilias
2021-05-06 17:16 ` [PATCH 0/8] KVM: VMX: Implement nested TSC scaling Jim Mattson
2021-05-06 17:48   ` Stamatis, Ilias
2021-05-10 13:43     ` Maxim Levitsky
2021-05-10 14:29   ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=875fa1ff9a85ff601a05030eaa24a1db45a71f36.camel@redhat.com \
    --to=mlevitsk@redhat.com \
    --cc=dwmw@amazon.co.uk \
    --cc=ilstam@amazon.com \
    --cc=ilstam@mailbox.org \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=kvm@vger.kernel.org \
    --cc=mtosatti@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    --cc=zamsden@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).