From: "Radim Krčmář" <rkrcmar@redhat.com>
To: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Wanpeng Li <kernellwp@gmail.com>,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH v3 1/2] KVM: LAPIC: Optimize timer latency consider world switch time
Date: Wed, 12 Jun 2019 21:22:43 +0200 [thread overview]
Message-ID: <20190612192243.GA23583@flask> (raw)
In-Reply-To: <20190612151447.GD20308@linux.intel.com>
2019-06-12 08:14-0700, Sean Christopherson:
> On Wed, Jun 12, 2019 at 05:40:18PM +0800, Wanpeng Li wrote:
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > @@ -145,6 +145,12 @@ module_param(tsc_tolerance_ppm, uint, S_IRUGO | S_IWUSR);
> > static int __read_mostly lapic_timer_advance_ns = -1;
> > module_param(lapic_timer_advance_ns, int, S_IRUGO | S_IWUSR);
> >
> > +/*
> > + * lapic timer vmentry advance (tscdeadline mode only) in nanoseconds.
> > + */
> > +u32 __read_mostly vmentry_advance_ns = 300;
>
> Enabling this by default makes me nervous, e.g. nothing guarantees that
> future versions of KVM and/or CPUs will continue to have 300ns of overhead
> between wait_lapic_expire() and VM-Enter.
>
> If we want it enabled by default so that it gets tested, the default
> value should be extremely conservative, e.g. set the default to a small
> percentage (25%?) of the latency of VM-Enter itself on modern CPUs,
> VM-Enter latency being the min between VMLAUNCH and VMLOAD+VMRUN+VMSAVE.
I share the sentiment. We definitely must not enter the guest before
the deadline has expired and CPUs are approaching 5 GHz (in turbo), so
300 ns would be too much even today.
I wrote a simple testcase for rough timing and there are 267 cycles
(111 ns @ 2.4 GHz) between doing rdtsc() right after
kvm_wait_lapic_expire() [1] and doing rdtsc() in the guest as soon as
possible (see the attached kvm-unit-test).
That is on a Haswell, where vmexit.flat reports 2120 cycles for a
vmcall. This would linearly (likely incorrect method in this case)
translate to 230 cycles on a machine with 1800 cycles for a vmcall,
which is less than 50 ns @ 5 GHz.
I wouldn't go above 25 ns for a hard-coded default.
(We could also do a similar measurement when initializing KVM and have a
dynamic default, but I'm thinking it's going to be way too much code
for the benefit.)
---
1: This is how the TSC is read in KVM.
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index da24f1858acc..a7251ac0109b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6449,6 +6449,8 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
vcpu->arch.apic->lapic_timer.timer_advance_ns)
kvm_wait_lapic_expire(vcpu);
+ vcpu->last_seen_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
+
/*
* If this vCPU has touched SPEC_CTRL, restore the guest's value if
* it's non-zero. Since vmentry is serialising on affected CPUs, there
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6200d5a51f13..5e0ce8ca31e7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7201,6 +7201,9 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
case KVM_HC_SEND_IPI:
ret = kvm_pv_send_ipi(vcpu->kvm, a0, a1, a2, a3, op_64_bit);
break;
+ case KVM_HC_LAST_SEEN_TSC:
+ ret = vcpu->last_seen_tsc;
+ break;
default:
ret = -KVM_ENOSYS;
break;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index abafddb9fe2c..7f70fe7a28b1 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -323,6 +323,8 @@ struct kvm_vcpu {
bool preempted;
struct kvm_vcpu_arch arch;
struct dentry *debugfs_dentry;
+
+ u64 last_seen_tsc;
};
static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
index 6c0ce49931e5..dfbc6e9ad7a1 100644
--- a/include/uapi/linux/kvm_para.h
+++ b/include/uapi/linux/kvm_para.h
@@ -28,6 +28,7 @@
#define KVM_HC_MIPS_CONSOLE_OUTPUT 8
#define KVM_HC_CLOCK_PAIRING 9
#define KVM_HC_SEND_IPI 10
+#define KVM_HC_LAST_SEEN_TSC 11
/*
* hypercalls use architecture specific
next prev parent reply other threads:[~2019-06-12 19:22 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-06-12 9:40 [PATCH v3 1/2] KVM: LAPIC: Optimize timer latency consider world switch time Wanpeng Li
2019-06-12 9:40 ` [PATCH v3 2/2] KVM: LAPIC: remove the trailing newline used in the fmt parameter of TP_printk Wanpeng Li
2019-06-12 15:14 ` [PATCH v3 1/2] KVM: LAPIC: Optimize timer latency consider world switch time Sean Christopherson
2019-06-12 19:22 ` Radim Krčmář [this message]
2019-06-12 19:27 ` Radim Krčmář
2019-06-13 1:52 ` Wanpeng Li
2019-06-13 1:50 ` Wanpeng Li
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190612192243.GA23583@flask \
--to=rkrcmar@redhat.com \
--cc=kernellwp@gmail.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=sean.j.christopherson@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).