* [PATCH] KVM: X86: Properly account for guest CPU time when considering context tracking
@ 2021-03-29 9:43 Wanpeng Li
2021-03-29 17:15 ` Sean Christopherson
0 siblings, 1 reply; 4+ messages in thread
From: Wanpeng Li @ 2021-03-29 9:43 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
Jim Mattson, Joerg Roedel
From: Wanpeng Li <wanpengli@tencent.com>
The bugzilla https://bugzilla.kernel.org/show_bug.cgi?id=209831
reported that the guest time remains 0 when running a while true
loop in the guest.
The commit 87fa7f3e98a131 ("x86/kvm: Move context tracking where it
belongs") moves guest_exit_irqoff() close to vmexit breaks the
tick-based time accouting when the ticks that happen after IRQs are
disabled are incorrectly accounted to the host/system time. This is
because we exit the guest state too early.
vtime-based time accounting is tied to context tracking, keep the
guest_exit_irqoff() around vmexit code when both vtime-based time
accounting and specific cpu is context tracking mode active.
Otherwise, leave guest_exit_irqoff() after handle_exit_irqoff()
and explicit IRQ window for tick-based time accouting.
Fixes: 87fa7f3e98a131 ("x86/kvm: Move context tracking where it belongs")
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
arch/x86/kvm/svm/svm.c | 3 ++-
arch/x86/kvm/vmx/vmx.c | 3 ++-
arch/x86/kvm/x86.c | 2 ++
3 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 58a45bb..55fb5ce 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3812,7 +3812,8 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu,
* into world and some more.
*/
lockdep_hardirqs_off(CALLER_ADDR0);
- guest_exit_irqoff();
+ if (vtime_accounting_enabled_this_cpu())
+ guest_exit_irqoff();
instrumentation_begin();
trace_hardirqs_off_finish();
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 32cf828..85695b3 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6689,7 +6689,8 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
* into world and some more.
*/
lockdep_hardirqs_off(CALLER_ADDR0);
- guest_exit_irqoff();
+ if (vtime_accounting_enabled_this_cpu())
+ guest_exit_irqoff();
instrumentation_begin();
trace_hardirqs_off_finish();
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fe806e8..234c8b3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9185,6 +9185,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
++vcpu->stat.exits;
local_irq_disable();
kvm_after_interrupt(vcpu);
+ if (!vtime_accounting_enabled_this_cpu())
+ guest_exit_irqoff();
if (lapic_in_kernel(vcpu)) {
s64 delta = vcpu->arch.apic->lapic_timer.advance_expire_delta;
--
2.7.4
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] KVM: X86: Properly account for guest CPU time when considering context tracking
2021-03-29 9:43 [PATCH] KVM: X86: Properly account for guest CPU time when considering context tracking Wanpeng Li
@ 2021-03-29 17:15 ` Sean Christopherson
2021-03-30 1:33 ` Wanpeng Li
0 siblings, 1 reply; 4+ messages in thread
From: Sean Christopherson @ 2021-03-29 17:15 UTC (permalink / raw)
To: Wanpeng Li
Cc: linux-kernel, kvm, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
Jim Mattson, Joerg Roedel, Thomas Gleixner
+Thomas
On Mon, Mar 29, 2021, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
>
> The bugzilla https://bugzilla.kernel.org/show_bug.cgi?id=209831
> reported that the guest time remains 0 when running a while true
> loop in the guest.
>
> The commit 87fa7f3e98a131 ("x86/kvm: Move context tracking where it
> belongs") moves guest_exit_irqoff() close to vmexit breaks the
> tick-based time accouting when the ticks that happen after IRQs are
> disabled are incorrectly accounted to the host/system time. This is
> because we exit the guest state too early.
>
> vtime-based time accounting is tied to context tracking, keep the
> guest_exit_irqoff() around vmexit code when both vtime-based time
> accounting and specific cpu is context tracking mode active.
> Otherwise, leave guest_exit_irqoff() after handle_exit_irqoff()
> and explicit IRQ window for tick-based time accouting.
>
> Fixes: 87fa7f3e98a131 ("x86/kvm: Move context tracking where it belongs")
> Cc: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> ---
> arch/x86/kvm/svm/svm.c | 3 ++-
> arch/x86/kvm/vmx/vmx.c | 3 ++-
> arch/x86/kvm/x86.c | 2 ++
> 3 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 58a45bb..55fb5ce 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3812,7 +3812,8 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu,
> * into world and some more.
> */
> lockdep_hardirqs_off(CALLER_ADDR0);
> - guest_exit_irqoff();
> + if (vtime_accounting_enabled_this_cpu())
> + guest_exit_irqoff();
>
> instrumentation_begin();
> trace_hardirqs_off_finish();
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 32cf828..85695b3 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -6689,7 +6689,8 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
> * into world and some more.
> */
> lockdep_hardirqs_off(CALLER_ADDR0);
> - guest_exit_irqoff();
> + if (vtime_accounting_enabled_this_cpu())
> + guest_exit_irqoff();
This looks ok, as CONFIG_CONTEXT_TRACKING and CONFIG_VIRT_CPU_ACCOUNTING_GEN are
selected by CONFIG_NO_HZ_FULL=y, and can't be enabled independently, e.g. the
rcu_user_exit() call won't be delayed because it will never be called in the
!vtime case. But it still feels wrong poking into those details, e.g. it'll
be weird and/or wrong guest_exit_irqoff() gains stuff that isn't vtime specific.
Maybe that will never happen though? And of course, my hack alternative also
pokes into the details[*].
Thomas, do you have an input on the least awful way to handle this? My horrible
hack was to force PF_VCPU around the window where KVM handles IRQs after guest
exit.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d9f931c63293..6ddf341cd755 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9118,6 +9118,13 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
vcpu->mode = OUTSIDE_GUEST_MODE;
smp_wmb();
+ /*
+ * Temporarily pretend this task is running a vCPU when potentially
+ * processing an IRQ exit, including the below opening of an IRQ
+ * window. Tick-based accounting of guest time relies on PF_VCPU
+ * being set when the tick IRQ handler runs.
+ */
+ current->flags |= PF_VCPU;
static_call(kvm_x86_handle_exit_irqoff)(vcpu);
/*
@@ -9132,6 +9139,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
++vcpu->stat.exits;
local_irq_disable();
kvm_after_interrupt(vcpu);
+ current->flags &= ~PF_VCPU;
if (lapic_in_kernel(vcpu)) {
s64 delta = vcpu->arch.apic->lapic_timer.advance_expire_delta;
[*]https://lkml.kernel.org/r/20210206004218.312023-1-seanjc@google.com
> instrumentation_begin();
> trace_hardirqs_off_finish();
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index fe806e8..234c8b3 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9185,6 +9185,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> ++vcpu->stat.exits;
> local_irq_disable();
> kvm_after_interrupt(vcpu);
> + if (!vtime_accounting_enabled_this_cpu())
> + guest_exit_irqoff();
>
> if (lapic_in_kernel(vcpu)) {
> s64 delta = vcpu->arch.apic->lapic_timer.advance_expire_delta;
> --
> 2.7.4
>
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] KVM: X86: Properly account for guest CPU time when considering context tracking
2021-03-29 17:15 ` Sean Christopherson
@ 2021-03-30 1:33 ` Wanpeng Li
2021-04-06 22:16 ` Sean Christopherson
0 siblings, 1 reply; 4+ messages in thread
From: Wanpeng Li @ 2021-03-30 1:33 UTC (permalink / raw)
To: Sean Christopherson
Cc: LKML, kvm, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
Jim Mattson, Joerg Roedel, Thomas Gleixner
On Tue, 30 Mar 2021 at 01:15, Sean Christopherson <seanjc@google.com> wrote:
>
> +Thomas
>
> On Mon, Mar 29, 2021, Wanpeng Li wrote:
> > From: Wanpeng Li <wanpengli@tencent.com>
> >
> > The bugzilla https://bugzilla.kernel.org/show_bug.cgi?id=209831
> > reported that the guest time remains 0 when running a while true
> > loop in the guest.
> >
> > The commit 87fa7f3e98a131 ("x86/kvm: Move context tracking where it
> > belongs") moves guest_exit_irqoff() close to vmexit breaks the
> > tick-based time accouting when the ticks that happen after IRQs are
> > disabled are incorrectly accounted to the host/system time. This is
> > because we exit the guest state too early.
> >
> > vtime-based time accounting is tied to context tracking, keep the
> > guest_exit_irqoff() around vmexit code when both vtime-based time
> > accounting and specific cpu is context tracking mode active.
> > Otherwise, leave guest_exit_irqoff() after handle_exit_irqoff()
> > and explicit IRQ window for tick-based time accouting.
> >
> > Fixes: 87fa7f3e98a131 ("x86/kvm: Move context tracking where it belongs")
> > Cc: Sean Christopherson <seanjc@google.com>
> > Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> > ---
> > arch/x86/kvm/svm/svm.c | 3 ++-
> > arch/x86/kvm/vmx/vmx.c | 3 ++-
> > arch/x86/kvm/x86.c | 2 ++
> > 3 files changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 58a45bb..55fb5ce 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -3812,7 +3812,8 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu,
> > * into world and some more.
> > */
> > lockdep_hardirqs_off(CALLER_ADDR0);
> > - guest_exit_irqoff();
> > + if (vtime_accounting_enabled_this_cpu())
> > + guest_exit_irqoff();
> >
> > instrumentation_begin();
> > trace_hardirqs_off_finish();
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 32cf828..85695b3 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -6689,7 +6689,8 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
> > * into world and some more.
> > */
> > lockdep_hardirqs_off(CALLER_ADDR0);
> > - guest_exit_irqoff();
> > + if (vtime_accounting_enabled_this_cpu())
> > + guest_exit_irqoff();
>
> This looks ok, as CONFIG_CONTEXT_TRACKING and CONFIG_VIRT_CPU_ACCOUNTING_GEN are
> selected by CONFIG_NO_HZ_FULL=y, and can't be enabled independently, e.g. the
> rcu_user_exit() call won't be delayed because it will never be called in the
> !vtime case. But it still feels wrong poking into those details, e.g. it'll
> be weird and/or wrong guest_exit_irqoff() gains stuff that isn't vtime specific.
Could you elaborate what's the meaning of "it'll be weird and/or wrong
guest_exit_irqoff() gains stuff that isn't vtime specific."?
Wanpeng
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] KVM: X86: Properly account for guest CPU time when considering context tracking
2021-03-30 1:33 ` Wanpeng Li
@ 2021-04-06 22:16 ` Sean Christopherson
0 siblings, 0 replies; 4+ messages in thread
From: Sean Christopherson @ 2021-04-06 22:16 UTC (permalink / raw)
To: Wanpeng Li
Cc: LKML, kvm, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
Jim Mattson, Joerg Roedel, Thomas Gleixner
On Tue, Mar 30, 2021, Wanpeng Li wrote:
> On Tue, 30 Mar 2021 at 01:15, Sean Christopherson <seanjc@google.com> wrote:
> >
> > +Thomas
> >
> > On Mon, Mar 29, 2021, Wanpeng Li wrote:
> > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > index 32cf828..85695b3 100644
> > > --- a/arch/x86/kvm/vmx/vmx.c
> > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > @@ -6689,7 +6689,8 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
> > > * into world and some more.
> > > */
> > > lockdep_hardirqs_off(CALLER_ADDR0);
> > > - guest_exit_irqoff();
> > > + if (vtime_accounting_enabled_this_cpu())
> > > + guest_exit_irqoff();
> >
> > This looks ok, as CONFIG_CONTEXT_TRACKING and CONFIG_VIRT_CPU_ACCOUNTING_GEN are
> > selected by CONFIG_NO_HZ_FULL=y, and can't be enabled independently, e.g. the
> > rcu_user_exit() call won't be delayed because it will never be called in the
> > !vtime case. But it still feels wrong poking into those details, e.g. it'll
> > be weird and/or wrong guest_exit_irqoff() gains stuff that isn't vtime specific.
>
> Could you elaborate what's the meaning of "it'll be weird and/or wrong
> guest_exit_irqoff() gains stuff that isn't vtime specific."?
For example, if RCU logic is added to guest_exit_irqoff() that is needed
irrespective of vtime, then KVM will end up with different RCU logic depending
on whether or not vtime is enabled. RCU is just an example. My point is that
it doesn't seem impossible that there would be something in the future that
wants to tap into the guest->host transition.
Maybe that never happens and the vtime check is perfectly ok, but for me, the
name guest_exit_irqoff() doesn't sound like something that should hinge on
time accounting being enabled.
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2021-04-06 22:16 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-29 9:43 [PATCH] KVM: X86: Properly account for guest CPU time when considering context tracking Wanpeng Li
2021-03-29 17:15 ` Sean Christopherson
2021-03-30 1:33 ` Wanpeng Li
2021-04-06 22:16 ` Sean Christopherson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).