kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] KVM: X86: Properly account for guest CPU time when considering context tracking
@ 2021-03-29  9:43 Wanpeng Li
  2021-03-29 17:15 ` Sean Christopherson
  0 siblings, 1 reply; 4+ messages in thread
From: Wanpeng Li @ 2021-03-29  9:43 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

From: Wanpeng Li <wanpengli@tencent.com>

The bugzilla https://bugzilla.kernel.org/show_bug.cgi?id=209831 
reported that the guest time remains 0 when running a while true 
loop in the guest.

The commit 87fa7f3e98a131 ("x86/kvm: Move context tracking where it 
belongs") moves guest_exit_irqoff() close to vmexit breaks the 
tick-based time accouting when the ticks that happen after IRQs are 
disabled are incorrectly accounted to the host/system time. This is 
because we exit the guest state too early.

vtime-based time accounting is tied to context tracking, keep the 
guest_exit_irqoff() around vmexit code when both vtime-based time 
accounting and specific cpu is context tracking mode active. 
Otherwise, leave guest_exit_irqoff() after handle_exit_irqoff() 
and explicit IRQ window for tick-based time accouting.

Fixes: 87fa7f3e98a131 ("x86/kvm: Move context tracking where it belongs")
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/svm/svm.c | 3 ++-
 arch/x86/kvm/vmx/vmx.c | 3 ++-
 arch/x86/kvm/x86.c     | 2 ++
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 58a45bb..55fb5ce 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3812,7 +3812,8 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 	 * into world and some more.
 	 */
 	lockdep_hardirqs_off(CALLER_ADDR0);
-	guest_exit_irqoff();
+	if (vtime_accounting_enabled_this_cpu())
+		guest_exit_irqoff();
 
 	instrumentation_begin();
 	trace_hardirqs_off_finish();
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 32cf828..85695b3 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6689,7 +6689,8 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 	 * into world and some more.
 	 */
 	lockdep_hardirqs_off(CALLER_ADDR0);
-	guest_exit_irqoff();
+	if (vtime_accounting_enabled_this_cpu())
+		guest_exit_irqoff();
 
 	instrumentation_begin();
 	trace_hardirqs_off_finish();
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fe806e8..234c8b3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9185,6 +9185,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	++vcpu->stat.exits;
 	local_irq_disable();
 	kvm_after_interrupt(vcpu);
+	if (!vtime_accounting_enabled_this_cpu())
+		guest_exit_irqoff();
 
 	if (lapic_in_kernel(vcpu)) {
 		s64 delta = vcpu->arch.apic->lapic_timer.advance_expire_delta;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] KVM: X86: Properly account for guest CPU time when considering context tracking
  2021-03-29  9:43 [PATCH] KVM: X86: Properly account for guest CPU time when considering context tracking Wanpeng Li
@ 2021-03-29 17:15 ` Sean Christopherson
  2021-03-30  1:33   ` Wanpeng Li
  0 siblings, 1 reply; 4+ messages in thread
From: Sean Christopherson @ 2021-03-29 17:15 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: linux-kernel, kvm, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Thomas Gleixner

+Thomas

On Mon, Mar 29, 2021, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
> The bugzilla https://bugzilla.kernel.org/show_bug.cgi?id=209831 
> reported that the guest time remains 0 when running a while true 
> loop in the guest.
> 
> The commit 87fa7f3e98a131 ("x86/kvm: Move context tracking where it 
> belongs") moves guest_exit_irqoff() close to vmexit breaks the 
> tick-based time accouting when the ticks that happen after IRQs are 
> disabled are incorrectly accounted to the host/system time. This is 
> because we exit the guest state too early.
> 
> vtime-based time accounting is tied to context tracking, keep the 
> guest_exit_irqoff() around vmexit code when both vtime-based time 
> accounting and specific cpu is context tracking mode active. 
> Otherwise, leave guest_exit_irqoff() after handle_exit_irqoff() 
> and explicit IRQ window for tick-based time accouting.
> 
> Fixes: 87fa7f3e98a131 ("x86/kvm: Move context tracking where it belongs")
> Cc: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> ---
>  arch/x86/kvm/svm/svm.c | 3 ++-
>  arch/x86/kvm/vmx/vmx.c | 3 ++-
>  arch/x86/kvm/x86.c     | 2 ++
>  3 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 58a45bb..55fb5ce 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3812,7 +3812,8 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu,
>  	 * into world and some more.
>  	 */
>  	lockdep_hardirqs_off(CALLER_ADDR0);
> -	guest_exit_irqoff();
> +	if (vtime_accounting_enabled_this_cpu())
> +		guest_exit_irqoff();
>  
>  	instrumentation_begin();
>  	trace_hardirqs_off_finish();
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 32cf828..85695b3 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -6689,7 +6689,8 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
>  	 * into world and some more.
>  	 */
>  	lockdep_hardirqs_off(CALLER_ADDR0);
> -	guest_exit_irqoff();
> +	if (vtime_accounting_enabled_this_cpu())
> +		guest_exit_irqoff();

This looks ok, as CONFIG_CONTEXT_TRACKING and CONFIG_VIRT_CPU_ACCOUNTING_GEN are
selected by CONFIG_NO_HZ_FULL=y, and can't be enabled independently, e.g. the
rcu_user_exit() call won't be delayed because it will never be called in the
!vtime case.  But it still feels wrong poking into those details, e.g. it'll
be weird and/or wrong guest_exit_irqoff() gains stuff that isn't vtime specific.
Maybe that will never happen though?  And of course, my hack alternative also
pokes into the details[*].

Thomas, do you have an input on the least awful way to handle this?  My horrible
hack was to force PF_VCPU around the window where KVM handles IRQs after guest
exit.

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d9f931c63293..6ddf341cd755 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9118,6 +9118,13 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	vcpu->mode = OUTSIDE_GUEST_MODE;
 	smp_wmb();

+	/*
+	 * Temporarily pretend this task is running a vCPU when potentially
+	 * processing an IRQ exit, including the below opening of an IRQ
+	 * window.  Tick-based accounting of guest time relies on PF_VCPU
+	 * being set when the tick IRQ handler runs.
+	 */
+	current->flags |= PF_VCPU;
 	static_call(kvm_x86_handle_exit_irqoff)(vcpu);

 	/*
@@ -9132,6 +9139,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	++vcpu->stat.exits;
 	local_irq_disable();
 	kvm_after_interrupt(vcpu);
+	current->flags &= ~PF_VCPU;

 	if (lapic_in_kernel(vcpu)) {
 		s64 delta = vcpu->arch.apic->lapic_timer.advance_expire_delta;

[*]https://lkml.kernel.org/r/20210206004218.312023-1-seanjc@google.com

>  	instrumentation_begin();
>  	trace_hardirqs_off_finish();
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index fe806e8..234c8b3 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9185,6 +9185,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  	++vcpu->stat.exits;
>  	local_irq_disable();
>  	kvm_after_interrupt(vcpu);
> +	if (!vtime_accounting_enabled_this_cpu())
> +		guest_exit_irqoff();
>  
>  	if (lapic_in_kernel(vcpu)) {
>  		s64 delta = vcpu->arch.apic->lapic_timer.advance_expire_delta;
> -- 
> 2.7.4
> 

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] KVM: X86: Properly account for guest CPU time when considering context tracking
  2021-03-29 17:15 ` Sean Christopherson
@ 2021-03-30  1:33   ` Wanpeng Li
  2021-04-06 22:16     ` Sean Christopherson
  0 siblings, 1 reply; 4+ messages in thread
From: Wanpeng Li @ 2021-03-30  1:33 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: LKML, kvm, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Thomas Gleixner

On Tue, 30 Mar 2021 at 01:15, Sean Christopherson <seanjc@google.com> wrote:
>
> +Thomas
>
> On Mon, Mar 29, 2021, Wanpeng Li wrote:
> > From: Wanpeng Li <wanpengli@tencent.com>
> >
> > The bugzilla https://bugzilla.kernel.org/show_bug.cgi?id=209831
> > reported that the guest time remains 0 when running a while true
> > loop in the guest.
> >
> > The commit 87fa7f3e98a131 ("x86/kvm: Move context tracking where it
> > belongs") moves guest_exit_irqoff() close to vmexit breaks the
> > tick-based time accouting when the ticks that happen after IRQs are
> > disabled are incorrectly accounted to the host/system time. This is
> > because we exit the guest state too early.
> >
> > vtime-based time accounting is tied to context tracking, keep the
> > guest_exit_irqoff() around vmexit code when both vtime-based time
> > accounting and specific cpu is context tracking mode active.
> > Otherwise, leave guest_exit_irqoff() after handle_exit_irqoff()
> > and explicit IRQ window for tick-based time accouting.
> >
> > Fixes: 87fa7f3e98a131 ("x86/kvm: Move context tracking where it belongs")
> > Cc: Sean Christopherson <seanjc@google.com>
> > Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> > ---
> >  arch/x86/kvm/svm/svm.c | 3 ++-
> >  arch/x86/kvm/vmx/vmx.c | 3 ++-
> >  arch/x86/kvm/x86.c     | 2 ++
> >  3 files changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 58a45bb..55fb5ce 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -3812,7 +3812,8 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu,
> >        * into world and some more.
> >        */
> >       lockdep_hardirqs_off(CALLER_ADDR0);
> > -     guest_exit_irqoff();
> > +     if (vtime_accounting_enabled_this_cpu())
> > +             guest_exit_irqoff();
> >
> >       instrumentation_begin();
> >       trace_hardirqs_off_finish();
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 32cf828..85695b3 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -6689,7 +6689,8 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
> >        * into world and some more.
> >        */
> >       lockdep_hardirqs_off(CALLER_ADDR0);
> > -     guest_exit_irqoff();
> > +     if (vtime_accounting_enabled_this_cpu())
> > +             guest_exit_irqoff();
>
> This looks ok, as CONFIG_CONTEXT_TRACKING and CONFIG_VIRT_CPU_ACCOUNTING_GEN are
> selected by CONFIG_NO_HZ_FULL=y, and can't be enabled independently, e.g. the
> rcu_user_exit() call won't be delayed because it will never be called in the
> !vtime case.  But it still feels wrong poking into those details, e.g. it'll
> be weird and/or wrong guest_exit_irqoff() gains stuff that isn't vtime specific.

Could you elaborate what's the meaning of "it'll be weird and/or wrong
guest_exit_irqoff() gains stuff that isn't vtime specific."?

    Wanpeng

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] KVM: X86: Properly account for guest CPU time when considering context tracking
  2021-03-30  1:33   ` Wanpeng Li
@ 2021-04-06 22:16     ` Sean Christopherson
  0 siblings, 0 replies; 4+ messages in thread
From: Sean Christopherson @ 2021-04-06 22:16 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: LKML, kvm, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Thomas Gleixner

On Tue, Mar 30, 2021, Wanpeng Li wrote:
> On Tue, 30 Mar 2021 at 01:15, Sean Christopherson <seanjc@google.com> wrote:
> >
> > +Thomas
> >
> > On Mon, Mar 29, 2021, Wanpeng Li wrote:
> > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > index 32cf828..85695b3 100644
> > > --- a/arch/x86/kvm/vmx/vmx.c
> > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > @@ -6689,7 +6689,8 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
> > >        * into world and some more.
> > >        */
> > >       lockdep_hardirqs_off(CALLER_ADDR0);
> > > -     guest_exit_irqoff();
> > > +     if (vtime_accounting_enabled_this_cpu())
> > > +             guest_exit_irqoff();
> >
> > This looks ok, as CONFIG_CONTEXT_TRACKING and CONFIG_VIRT_CPU_ACCOUNTING_GEN are
> > selected by CONFIG_NO_HZ_FULL=y, and can't be enabled independently, e.g. the
> > rcu_user_exit() call won't be delayed because it will never be called in the
> > !vtime case.  But it still feels wrong poking into those details, e.g. it'll
> > be weird and/or wrong guest_exit_irqoff() gains stuff that isn't vtime specific.
> 
> Could you elaborate what's the meaning of "it'll be weird and/or wrong
> guest_exit_irqoff() gains stuff that isn't vtime specific."?

For example, if RCU logic is added to guest_exit_irqoff() that is needed
irrespective of vtime, then KVM will end up with different RCU logic depending
on whether or not vtime is enabled.  RCU is just an example.  My point is that
it doesn't seem impossible that there would be something in the future that
wants to tap into the guest->host transition.

Maybe that never happens and the vtime check is perfectly ok, but for me, the
name guest_exit_irqoff() doesn't sound like something that should hinge on
time accounting being enabled.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-04-06 22:16 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-29  9:43 [PATCH] KVM: X86: Properly account for guest CPU time when considering context tracking Wanpeng Li
2021-03-29 17:15 ` Sean Christopherson
2021-03-30  1:33   ` Wanpeng Li
2021-04-06 22:16     ` Sean Christopherson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).