* Commit "x86/kvm: Move context tracking where it belongs" broke guest time accounting
@ 2021-04-06 20:17 Michael Tokarev
2021-04-06 21:47 ` Sean Christopherson
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Michael Tokarev @ 2021-04-06 20:17 UTC (permalink / raw)
To: kvm, qemu-devel, Thomas Gleixner; +Cc: Paolo Bonzini
Hi!
It looks like this commit:
commit 87fa7f3e98a1310ef1ac1900e7ee7f9610a038bc
Author: Thomas Gleixner <tglx@linutronix.de>
Date: Wed Jul 8 21:51:54 2020 +0200
x86/kvm: Move context tracking where it belongs
Context tracking for KVM happens way too early in the vcpu_run()
code. Anything after guest_enter_irqoff() and before guest_exit_irqoff()
cannot use RCU and should also be not instrumented.
The current way of doing this covers way too much code. Move it closer to
the actual vmenter/exit code.
broke kvm guest cpu time accounting - after this commit, when running
qemu-system-x86_64 -enable-kvm, the guest time (in /proc/stat and
elsewhere) is always 0.
I dunno why it happened, but it happened, and all kernels after 5.9
are affected by this.
This commit is found in a (painful) git bisect between kernel 5.8 and 5.10.
Thanks,
/mjt
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Commit "x86/kvm: Move context tracking where it belongs" broke guest time accounting
2021-04-06 20:17 Commit "x86/kvm: Move context tracking where it belongs" broke guest time accounting Michael Tokarev
@ 2021-04-06 21:47 ` Sean Christopherson
2021-04-08 13:18 ` Thomas Gleixner
2021-04-07 11:01 ` Wanpeng Li
2021-04-13 10:48 ` Wanpeng Li
2 siblings, 1 reply; 14+ messages in thread
From: Sean Christopherson @ 2021-04-06 21:47 UTC (permalink / raw)
To: Michael Tokarev; +Cc: kvm, qemu-devel, Thomas Gleixner, Paolo Bonzini
On Tue, Apr 06, 2021, Michael Tokarev wrote:
> Hi!
>
> It looks like this commit:
>
> commit 87fa7f3e98a1310ef1ac1900e7ee7f9610a038bc
> Author: Thomas Gleixner <tglx@linutronix.de>
> Date: Wed Jul 8 21:51:54 2020 +0200
>
> x86/kvm: Move context tracking where it belongs
>
> Context tracking for KVM happens way too early in the vcpu_run()
> code. Anything after guest_enter_irqoff() and before guest_exit_irqoff()
> cannot use RCU and should also be not instrumented.
>
> The current way of doing this covers way too much code. Move it closer to
> the actual vmenter/exit code.
>
> broke kvm guest cpu time accounting - after this commit, when running
> qemu-system-x86_64 -enable-kvm, the guest time (in /proc/stat and
> elsewhere) is always 0.
>
> I dunno why it happened, but it happened, and all kernels after 5.9
> are affected by this.
>
> This commit is found in a (painful) git bisect between kernel 5.8 and 5.10.
Yes :-(
There's a bugzilla[1] and two proposed fixes[2][3]. I don't particularly like
either of the fixes, but an elegant solution hasn't presented itself.
Thomas/Paolo, can you please weigh in?
[1] https://bugzilla.kernel.org/show_bug.cgi?id=209831
[2] https://lkml.kernel.org/r/1617011036-11734-1-git-send-email-wanpengli@tencent.com
[3] https://lkml.kernel.org/r/20210206004218.312023-1-seanjc@google.com
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Commit "x86/kvm: Move context tracking where it belongs" broke guest time accounting
2021-04-06 20:17 Commit "x86/kvm: Move context tracking where it belongs" broke guest time accounting Michael Tokarev
@ 2021-04-07 11:01 ` Wanpeng Li
2021-04-07 11:01 ` Wanpeng Li
2021-04-13 10:48 ` Wanpeng Li
2 siblings, 0 replies; 14+ messages in thread
From: Wanpeng Li @ 2021-04-07 11:01 UTC (permalink / raw)
To: mjt; +Cc: kvm, qemu-devel@nongnu.org Developers, Thomas Gleixner, Paolo Bonzini
On Wed, 7 Apr 2021 at 18:55, Michael Tokarev <mjt@tls.msk.ru> wrote:
>
> Hi!
>
> It looks like this commit:
>
> commit 87fa7f3e98a1310ef1ac1900e7ee7f9610a038bc
> Author: Thomas Gleixner <tglx@linutronix.de>
> Date: Wed Jul 8 21:51:54 2020 +0200
>
> x86/kvm: Move context tracking where it belongs
>
> Context tracking for KVM happens way too early in the vcpu_run()
> code. Anything after guest_enter_irqoff() and before guest_exit_irqoff()
> cannot use RCU and should also be not instrumented.
>
> The current way of doing this covers way too much code. Move it closer to
> the actual vmenter/exit code.
>
> broke kvm guest cpu time accounting - after this commit, when running
> qemu-system-x86_64 -enable-kvm, the guest time (in /proc/stat and
> elsewhere) is always 0.
>
> I dunno why it happened, but it happened, and all kernels after 5.9
> are affected by this.
>
> This commit is found in a (painful) git bisect between kernel 5.8 and 5.10.
Thanks for the report. I have a patch here which can fix it.
https://lore.kernel.org/kvm/1617011036-11734-1-git-send-email-wanpengli@tencent.com/
Wanpeng
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Commit "x86/kvm: Move context tracking where it belongs" broke guest time accounting
@ 2021-04-07 11:01 ` Wanpeng Li
0 siblings, 0 replies; 14+ messages in thread
From: Wanpeng Li @ 2021-04-07 11:01 UTC (permalink / raw)
To: mjt; +Cc: Paolo Bonzini, Thomas Gleixner, qemu-devel@nongnu.org Developers, kvm
On Wed, 7 Apr 2021 at 18:55, Michael Tokarev <mjt@tls.msk.ru> wrote:
>
> Hi!
>
> It looks like this commit:
>
> commit 87fa7f3e98a1310ef1ac1900e7ee7f9610a038bc
> Author: Thomas Gleixner <tglx@linutronix.de>
> Date: Wed Jul 8 21:51:54 2020 +0200
>
> x86/kvm: Move context tracking where it belongs
>
> Context tracking for KVM happens way too early in the vcpu_run()
> code. Anything after guest_enter_irqoff() and before guest_exit_irqoff()
> cannot use RCU and should also be not instrumented.
>
> The current way of doing this covers way too much code. Move it closer to
> the actual vmenter/exit code.
>
> broke kvm guest cpu time accounting - after this commit, when running
> qemu-system-x86_64 -enable-kvm, the guest time (in /proc/stat and
> elsewhere) is always 0.
>
> I dunno why it happened, but it happened, and all kernels after 5.9
> are affected by this.
>
> This commit is found in a (painful) git bisect between kernel 5.8 and 5.10.
Thanks for the report. I have a patch here which can fix it.
https://lore.kernel.org/kvm/1617011036-11734-1-git-send-email-wanpengli@tencent.com/
Wanpeng
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Commit "x86/kvm: Move context tracking where it belongs" broke guest time accounting
2021-04-06 21:47 ` Sean Christopherson
@ 2021-04-08 13:18 ` Thomas Gleixner
0 siblings, 0 replies; 14+ messages in thread
From: Thomas Gleixner @ 2021-04-08 13:18 UTC (permalink / raw)
To: Sean Christopherson, Michael Tokarev; +Cc: kvm, qemu-devel, Paolo Bonzini
On Tue, Apr 06 2021 at 21:47, Sean Christopherson wrote:
> On Tue, Apr 06, 2021, Michael Tokarev wrote:
>> broke kvm guest cpu time accounting - after this commit, when running
>> qemu-system-x86_64 -enable-kvm, the guest time (in /proc/stat and
>> elsewhere) is always 0.
>>
>> I dunno why it happened, but it happened, and all kernels after 5.9
>> are affected by this.
>>
>> This commit is found in a (painful) git bisect between kernel 5.8 and 5.10.
>
> Yes :-(
>
> There's a bugzilla[1] and two proposed fixes[2][3]. I don't particularly like
> either of the fixes, but an elegant solution hasn't presented itself.
>
> Thomas/Paolo, can you please weigh in?
>
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=209831
> [2] https://lkml.kernel.org/r/1617011036-11734-1-git-send-email-wanpengli@tencent.com
> [3] https://lkml.kernel.org/r/20210206004218.312023-1-seanjc@google.com
All of the solutions I looked at so far are ugly as hell. The problem is
that the accounting is plumbed into the context tracking and moving
context tracking around to a different place is just wrong.
I think the right solution is to seperate the time accounting logic out
from guest_enter/exit_irqoff() and have virt time specific helpers which
can be placed at the proper spots in kvm.
Thanks,
tglx
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Commit "x86/kvm: Move context tracking where it belongs" broke guest time accounting
@ 2021-04-08 13:18 ` Thomas Gleixner
0 siblings, 0 replies; 14+ messages in thread
From: Thomas Gleixner @ 2021-04-08 13:18 UTC (permalink / raw)
To: Sean Christopherson, Michael Tokarev; +Cc: Paolo Bonzini, qemu-devel, kvm
On Tue, Apr 06 2021 at 21:47, Sean Christopherson wrote:
> On Tue, Apr 06, 2021, Michael Tokarev wrote:
>> broke kvm guest cpu time accounting - after this commit, when running
>> qemu-system-x86_64 -enable-kvm, the guest time (in /proc/stat and
>> elsewhere) is always 0.
>>
>> I dunno why it happened, but it happened, and all kernels after 5.9
>> are affected by this.
>>
>> This commit is found in a (painful) git bisect between kernel 5.8 and 5.10.
>
> Yes :-(
>
> There's a bugzilla[1] and two proposed fixes[2][3]. I don't particularly like
> either of the fixes, but an elegant solution hasn't presented itself.
>
> Thomas/Paolo, can you please weigh in?
>
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=209831
> [2] https://lkml.kernel.org/r/1617011036-11734-1-git-send-email-wanpengli@tencent.com
> [3] https://lkml.kernel.org/r/20210206004218.312023-1-seanjc@google.com
All of the solutions I looked at so far are ugly as hell. The problem is
that the accounting is plumbed into the context tracking and moving
context tracking around to a different place is just wrong.
I think the right solution is to seperate the time accounting logic out
from guest_enter/exit_irqoff() and have virt time specific helpers which
can be placed at the proper spots in kvm.
Thanks,
tglx
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Commit "x86/kvm: Move context tracking where it belongs" broke guest time accounting
2021-04-08 13:18 ` Thomas Gleixner
@ 2021-04-09 2:14 ` Wanpeng Li
-1 siblings, 0 replies; 14+ messages in thread
From: Wanpeng Li @ 2021-04-09 2:14 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Sean Christopherson, Michael Tokarev, kvm,
qemu-devel@nongnu.org Developers, Paolo Bonzini
On Thu, 8 Apr 2021 at 21:19, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Tue, Apr 06 2021 at 21:47, Sean Christopherson wrote:
> > On Tue, Apr 06, 2021, Michael Tokarev wrote:
> >> broke kvm guest cpu time accounting - after this commit, when running
> >> qemu-system-x86_64 -enable-kvm, the guest time (in /proc/stat and
> >> elsewhere) is always 0.
> >>
> >> I dunno why it happened, but it happened, and all kernels after 5.9
> >> are affected by this.
> >>
> >> This commit is found in a (painful) git bisect between kernel 5.8 and 5.10.
> >
> > Yes :-(
> >
> > There's a bugzilla[1] and two proposed fixes[2][3]. I don't particularly like
> > either of the fixes, but an elegant solution hasn't presented itself.
> >
> > Thomas/Paolo, can you please weigh in?
> >
> > [1] https://bugzilla.kernel.org/show_bug.cgi?id=209831
> > [2] https://lkml.kernel.org/r/1617011036-11734-1-git-send-email-wanpengli@tencent.com
> > [3] https://lkml.kernel.org/r/20210206004218.312023-1-seanjc@google.com
>
> All of the solutions I looked at so far are ugly as hell. The problem is
> that the accounting is plumbed into the context tracking and moving
> context tracking around to a different place is just wrong.
>
> I think the right solution is to seperate the time accounting logic out
> from guest_enter/exit_irqoff() and have virt time specific helpers which
> can be placed at the proper spots in kvm.
Good suggestion, I will have a try. :)
Wanpeng
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Commit "x86/kvm: Move context tracking where it belongs" broke guest time accounting
@ 2021-04-09 2:14 ` Wanpeng Li
0 siblings, 0 replies; 14+ messages in thread
From: Wanpeng Li @ 2021-04-09 2:14 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Sean Christopherson, Michael Tokarev,
qemu-devel@nongnu.org Developers, kvm, Paolo Bonzini
On Thu, 8 Apr 2021 at 21:19, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Tue, Apr 06 2021 at 21:47, Sean Christopherson wrote:
> > On Tue, Apr 06, 2021, Michael Tokarev wrote:
> >> broke kvm guest cpu time accounting - after this commit, when running
> >> qemu-system-x86_64 -enable-kvm, the guest time (in /proc/stat and
> >> elsewhere) is always 0.
> >>
> >> I dunno why it happened, but it happened, and all kernels after 5.9
> >> are affected by this.
> >>
> >> This commit is found in a (painful) git bisect between kernel 5.8 and 5.10.
> >
> > Yes :-(
> >
> > There's a bugzilla[1] and two proposed fixes[2][3]. I don't particularly like
> > either of the fixes, but an elegant solution hasn't presented itself.
> >
> > Thomas/Paolo, can you please weigh in?
> >
> > [1] https://bugzilla.kernel.org/show_bug.cgi?id=209831
> > [2] https://lkml.kernel.org/r/1617011036-11734-1-git-send-email-wanpengli@tencent.com
> > [3] https://lkml.kernel.org/r/20210206004218.312023-1-seanjc@google.com
>
> All of the solutions I looked at so far are ugly as hell. The problem is
> that the accounting is plumbed into the context tracking and moving
> context tracking around to a different place is just wrong.
>
> I think the right solution is to seperate the time accounting logic out
> from guest_enter/exit_irqoff() and have virt time specific helpers which
> can be placed at the proper spots in kvm.
Good suggestion, I will have a try. :)
Wanpeng
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Commit "x86/kvm: Move context tracking where it belongs" broke guest time accounting
2021-04-08 13:18 ` Thomas Gleixner
@ 2021-04-09 8:13 ` Wanpeng Li
-1 siblings, 0 replies; 14+ messages in thread
From: Wanpeng Li @ 2021-04-09 8:13 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Sean Christopherson, Michael Tokarev, kvm,
qemu-devel@nongnu.org Developers, Paolo Bonzini
On Thu, 8 Apr 2021 at 21:19, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Tue, Apr 06 2021 at 21:47, Sean Christopherson wrote:
> > On Tue, Apr 06, 2021, Michael Tokarev wrote:
> >> broke kvm guest cpu time accounting - after this commit, when running
> >> qemu-system-x86_64 -enable-kvm, the guest time (in /proc/stat and
> >> elsewhere) is always 0.
> >>
> >> I dunno why it happened, but it happened, and all kernels after 5.9
> >> are affected by this.
> >>
> >> This commit is found in a (painful) git bisect between kernel 5.8 and 5.10.
> >
> > Yes :-(
> >
> > There's a bugzilla[1] and two proposed fixes[2][3]. I don't particularly like
> > either of the fixes, but an elegant solution hasn't presented itself.
> >
> > Thomas/Paolo, can you please weigh in?
> >
> > [1] https://bugzilla.kernel.org/show_bug.cgi?id=209831
> > [2] https://lkml.kernel.org/r/1617011036-11734-1-git-send-email-wanpengli@tencent.com
> > [3] https://lkml.kernel.org/r/20210206004218.312023-1-seanjc@google.com
>
> All of the solutions I looked at so far are ugly as hell. The problem is
> that the accounting is plumbed into the context tracking and moving
> context tracking around to a different place is just wrong.
>
> I think the right solution is to seperate the time accounting logic out
> from guest_enter/exit_irqoff() and have virt time specific helpers which
> can be placed at the proper spots in kvm.
For x86 part, how about something like below:
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 48b396f3..7aeb724 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3730,6 +3730,7 @@ static noinstr void svm_vcpu_enter_exit(struct
kvm_vcpu *vcpu)
lockdep_hardirqs_on_prepare(CALLER_ADDR0);
instrumentation_end();
+ account_guest_enter();
guest_enter_irqoff();
lockdep_hardirqs_on(CALLER_ADDR0);
@@ -3759,6 +3760,8 @@ static noinstr void svm_vcpu_enter_exit(struct
kvm_vcpu *vcpu)
*/
lockdep_hardirqs_off(CALLER_ADDR0);
guest_exit_irqoff();
+ if (vtime_accounting_enabled_this_cpu())
+ account_guest_exit();
instrumentation_begin();
trace_hardirqs_off_finish();
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c05e6e2..5f6c30c 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6617,6 +6617,7 @@ static noinstr void vmx_vcpu_enter_exit(struct
kvm_vcpu *vcpu,
lockdep_hardirqs_on_prepare(CALLER_ADDR0);
instrumentation_end();
+ account_guest_enter();
guest_enter_irqoff();
lockdep_hardirqs_on(CALLER_ADDR0);
@@ -6648,6 +6649,8 @@ static noinstr void vmx_vcpu_enter_exit(struct
kvm_vcpu *vcpu,
*/
lockdep_hardirqs_off(CALLER_ADDR0);
guest_exit_irqoff();
+ if (vtime_accounting_enabled_this_cpu())
+ account_guest_exit();
instrumentation_begin();
trace_hardirqs_off_finish();
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 16fb395..33422c0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9229,6 +9229,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
++vcpu->stat.exits;
local_irq_disable();
kvm_after_interrupt(vcpu);
+ if (!vtime_accounting_enabled_this_cpu())
+ account_guest_exit();
if (lapic_in_kernel(vcpu)) {
s64 delta = vcpu->arch.apic->lapic_timer.advance_expire_delta;
diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index bceb064..ff70229 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -104,8 +104,7 @@ static inline void context_tracking_init(void) { }
#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
-/* must be called with irqs disabled */
-static __always_inline void guest_enter_irqoff(void)
+static __always_inline void account_guest_enter(void)
{
instrumentation_begin();
if (vtime_accounting_enabled_this_cpu())
@@ -113,7 +112,11 @@ static __always_inline void guest_enter_irqoff(void)
else
current->flags |= PF_VCPU;
instrumentation_end();
+}
+/* must be called with irqs disabled */
+static __always_inline void guest_enter_irqoff(void)
+{
if (context_tracking_enabled())
__context_tracking_enter(CONTEXT_GUEST);
@@ -131,11 +134,8 @@ static __always_inline void guest_enter_irqoff(void)
}
}
-static __always_inline void guest_exit_irqoff(void)
+static __always_inline void account_guest_exit(void)
{
- if (context_tracking_enabled())
- __context_tracking_exit(CONTEXT_GUEST);
-
instrumentation_begin();
if (vtime_accounting_enabled_this_cpu())
vtime_guest_exit(current);
@@ -144,8 +144,14 @@ static __always_inline void guest_exit_irqoff(void)
instrumentation_end();
}
+static __always_inline void guest_exit_irqoff(void)
+{
+ if (context_tracking_enabled())
+ __context_tracking_exit(CONTEXT_GUEST);
+}
+
#else
-static __always_inline void guest_enter_irqoff(void)
+static __always_inline void account_guest_enter(void)
{
/*
* This is running in ioctl context so its safe
@@ -155,11 +161,17 @@ static __always_inline void guest_enter_irqoff(void)
instrumentation_begin();
vtime_account_kernel(current);
current->flags |= PF_VCPU;
+ instrumentation_end();
+}
+
+static __always_inline void guest_enter_irqoff(void)
+{
+ instrumentation_begin();
rcu_virt_note_context_switch(smp_processor_id());
instrumentation_end();
}
-static __always_inline void guest_exit_irqoff(void)
+static __always_inline void account_guest_exit(void)
{
instrumentation_begin();
/* Flush the guest cputime we spent on the guest */
@@ -167,6 +179,11 @@ static __always_inline void guest_exit_irqoff(void)
current->flags &= ~PF_VCPU;
instrumentation_end();
}
+
+static __always_inline void guest_exit_irqoff(void)
+{
+
+}
#endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */
static inline void guest_exit(void)
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: Commit "x86/kvm: Move context tracking where it belongs" broke guest time accounting
@ 2021-04-09 8:13 ` Wanpeng Li
0 siblings, 0 replies; 14+ messages in thread
From: Wanpeng Li @ 2021-04-09 8:13 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Sean Christopherson, Michael Tokarev,
qemu-devel@nongnu.org Developers, kvm, Paolo Bonzini
On Thu, 8 Apr 2021 at 21:19, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Tue, Apr 06 2021 at 21:47, Sean Christopherson wrote:
> > On Tue, Apr 06, 2021, Michael Tokarev wrote:
> >> broke kvm guest cpu time accounting - after this commit, when running
> >> qemu-system-x86_64 -enable-kvm, the guest time (in /proc/stat and
> >> elsewhere) is always 0.
> >>
> >> I dunno why it happened, but it happened, and all kernels after 5.9
> >> are affected by this.
> >>
> >> This commit is found in a (painful) git bisect between kernel 5.8 and 5.10.
> >
> > Yes :-(
> >
> > There's a bugzilla[1] and two proposed fixes[2][3]. I don't particularly like
> > either of the fixes, but an elegant solution hasn't presented itself.
> >
> > Thomas/Paolo, can you please weigh in?
> >
> > [1] https://bugzilla.kernel.org/show_bug.cgi?id=209831
> > [2] https://lkml.kernel.org/r/1617011036-11734-1-git-send-email-wanpengli@tencent.com
> > [3] https://lkml.kernel.org/r/20210206004218.312023-1-seanjc@google.com
>
> All of the solutions I looked at so far are ugly as hell. The problem is
> that the accounting is plumbed into the context tracking and moving
> context tracking around to a different place is just wrong.
>
> I think the right solution is to seperate the time accounting logic out
> from guest_enter/exit_irqoff() and have virt time specific helpers which
> can be placed at the proper spots in kvm.
For x86 part, how about something like below:
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 48b396f3..7aeb724 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3730,6 +3730,7 @@ static noinstr void svm_vcpu_enter_exit(struct
kvm_vcpu *vcpu)
lockdep_hardirqs_on_prepare(CALLER_ADDR0);
instrumentation_end();
+ account_guest_enter();
guest_enter_irqoff();
lockdep_hardirqs_on(CALLER_ADDR0);
@@ -3759,6 +3760,8 @@ static noinstr void svm_vcpu_enter_exit(struct
kvm_vcpu *vcpu)
*/
lockdep_hardirqs_off(CALLER_ADDR0);
guest_exit_irqoff();
+ if (vtime_accounting_enabled_this_cpu())
+ account_guest_exit();
instrumentation_begin();
trace_hardirqs_off_finish();
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c05e6e2..5f6c30c 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6617,6 +6617,7 @@ static noinstr void vmx_vcpu_enter_exit(struct
kvm_vcpu *vcpu,
lockdep_hardirqs_on_prepare(CALLER_ADDR0);
instrumentation_end();
+ account_guest_enter();
guest_enter_irqoff();
lockdep_hardirqs_on(CALLER_ADDR0);
@@ -6648,6 +6649,8 @@ static noinstr void vmx_vcpu_enter_exit(struct
kvm_vcpu *vcpu,
*/
lockdep_hardirqs_off(CALLER_ADDR0);
guest_exit_irqoff();
+ if (vtime_accounting_enabled_this_cpu())
+ account_guest_exit();
instrumentation_begin();
trace_hardirqs_off_finish();
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 16fb395..33422c0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9229,6 +9229,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
++vcpu->stat.exits;
local_irq_disable();
kvm_after_interrupt(vcpu);
+ if (!vtime_accounting_enabled_this_cpu())
+ account_guest_exit();
if (lapic_in_kernel(vcpu)) {
s64 delta = vcpu->arch.apic->lapic_timer.advance_expire_delta;
diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index bceb064..ff70229 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -104,8 +104,7 @@ static inline void context_tracking_init(void) { }
#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
-/* must be called with irqs disabled */
-static __always_inline void guest_enter_irqoff(void)
+static __always_inline void account_guest_enter(void)
{
instrumentation_begin();
if (vtime_accounting_enabled_this_cpu())
@@ -113,7 +112,11 @@ static __always_inline void guest_enter_irqoff(void)
else
current->flags |= PF_VCPU;
instrumentation_end();
+}
+/* must be called with irqs disabled */
+static __always_inline void guest_enter_irqoff(void)
+{
if (context_tracking_enabled())
__context_tracking_enter(CONTEXT_GUEST);
@@ -131,11 +134,8 @@ static __always_inline void guest_enter_irqoff(void)
}
}
-static __always_inline void guest_exit_irqoff(void)
+static __always_inline void account_guest_exit(void)
{
- if (context_tracking_enabled())
- __context_tracking_exit(CONTEXT_GUEST);
-
instrumentation_begin();
if (vtime_accounting_enabled_this_cpu())
vtime_guest_exit(current);
@@ -144,8 +144,14 @@ static __always_inline void guest_exit_irqoff(void)
instrumentation_end();
}
+static __always_inline void guest_exit_irqoff(void)
+{
+ if (context_tracking_enabled())
+ __context_tracking_exit(CONTEXT_GUEST);
+}
+
#else
-static __always_inline void guest_enter_irqoff(void)
+static __always_inline void account_guest_enter(void)
{
/*
* This is running in ioctl context so its safe
@@ -155,11 +161,17 @@ static __always_inline void guest_enter_irqoff(void)
instrumentation_begin();
vtime_account_kernel(current);
current->flags |= PF_VCPU;
+ instrumentation_end();
+}
+
+static __always_inline void guest_enter_irqoff(void)
+{
+ instrumentation_begin();
rcu_virt_note_context_switch(smp_processor_id());
instrumentation_end();
}
-static __always_inline void guest_exit_irqoff(void)
+static __always_inline void account_guest_exit(void)
{
instrumentation_begin();
/* Flush the guest cputime we spent on the guest */
@@ -167,6 +179,11 @@ static __always_inline void guest_exit_irqoff(void)
current->flags &= ~PF_VCPU;
instrumentation_end();
}
+
+static __always_inline void guest_exit_irqoff(void)
+{
+
+}
#endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */
static inline void guest_exit(void)
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: Commit "x86/kvm: Move context tracking where it belongs" broke guest time accounting
2021-04-09 8:13 ` Wanpeng Li
@ 2021-04-09 9:19 ` Thomas Gleixner
-1 siblings, 0 replies; 14+ messages in thread
From: Thomas Gleixner @ 2021-04-09 9:19 UTC (permalink / raw)
To: Wanpeng Li
Cc: Sean Christopherson, Michael Tokarev, kvm,
qemu-devel@nongnu.org Developers, Paolo Bonzini
On Fri, Apr 09 2021 at 16:13, Wanpeng Li wrote:
> On Thu, 8 Apr 2021 at 21:19, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> + account_guest_enter();
This wants to move into the instrumentation_begin/end() section above.
> guest_enter_irqoff();
> lockdep_hardirqs_on(CALLER_ADDR0);
>
> @@ -3759,6 +3760,8 @@ static noinstr void svm_vcpu_enter_exit(struct
> kvm_vcpu *vcpu)
> */
> lockdep_hardirqs_off(CALLER_ADDR0);
> guest_exit_irqoff();
> + if (vtime_accounting_enabled_this_cpu())
> + account_guest_exit();
This time below. Aside of that I'd suggest to have two inlines instead
of having the conditional here.
>
> #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
> -/* must be called with irqs disabled */
> -static __always_inline void guest_enter_irqoff(void)
> +static __always_inline void account_guest_enter(void)
> {
> instrumentation_begin();
> if (vtime_accounting_enabled_this_cpu())
> @@ -113,7 +112,11 @@ static __always_inline void guest_enter_irqoff(void)
> else
> current->flags |= PF_VCPU;
> instrumentation_end();
If you move the invocation into the instrumentable section then this
instrumentation_begin/end() can be removed.
Something like the below +/- the obligatory bikeshed painting
vs. function names.
Thanks,
tglx
---
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3782,6 +3782,7 @@ static noinstr void svm_vcpu_enter_exit(
* accordingly.
*/
instrumentation_begin();
+ vtime_account_guest_enter();
trace_hardirqs_on_prepare();
lockdep_hardirqs_on_prepare(CALLER_ADDR0);
instrumentation_end();
@@ -3816,6 +3817,7 @@ static noinstr void svm_vcpu_enter_exit(
instrumentation_begin();
trace_hardirqs_off_finish();
+ vtime_account_guest_exit();
instrumentation_end();
}
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6655,6 +6655,7 @@ static noinstr void vmx_vcpu_enter_exit(
* accordingly.
*/
instrumentation_begin();
+ vtime_account_guest_enter();
trace_hardirqs_on_prepare();
lockdep_hardirqs_on_prepare(CALLER_ADDR0);
instrumentation_end();
@@ -6693,6 +6694,7 @@ static noinstr void vmx_vcpu_enter_exit(
instrumentation_begin();
trace_hardirqs_off_finish();
+ vtime_account_guest_exit();
instrumentation_end();
}
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9200,6 +9200,7 @@ static int vcpu_enter_guest(struct kvm_v
++vcpu->stat.exits;
local_irq_disable();
kvm_after_interrupt(vcpu);
+ vcpu_account_guest_exit();
if (lapic_in_kernel(vcpu)) {
s64 delta = vcpu->arch.apic->lapic_timer.advance_expire_delta;
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -107,13 +107,6 @@ static inline void context_tracking_init
/* must be called with irqs disabled */
static __always_inline void guest_enter_irqoff(void)
{
- instrumentation_begin();
- if (vtime_accounting_enabled_this_cpu())
- vtime_guest_enter(current);
- else
- current->flags |= PF_VCPU;
- instrumentation_end();
-
if (context_tracking_enabled())
__context_tracking_enter(CONTEXT_GUEST);
@@ -135,37 +128,18 @@ static __always_inline void guest_exit_i
{
if (context_tracking_enabled())
__context_tracking_exit(CONTEXT_GUEST);
-
- instrumentation_begin();
- if (vtime_accounting_enabled_this_cpu())
- vtime_guest_exit(current);
- else
- current->flags &= ~PF_VCPU;
- instrumentation_end();
}
#else
static __always_inline void guest_enter_irqoff(void)
{
- /*
- * This is running in ioctl context so its safe
- * to assume that it's the stime pending cputime
- * to flush.
- */
instrumentation_begin();
- vtime_account_kernel(current);
- current->flags |= PF_VCPU;
rcu_virt_note_context_switch(smp_processor_id());
instrumentation_end();
}
static __always_inline void guest_exit_irqoff(void)
{
- instrumentation_begin();
- /* Flush the guest cputime we spent on the guest */
- vtime_account_kernel(current);
- current->flags &= ~PF_VCPU;
- instrumentation_end();
}
#endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */
@@ -178,4 +152,24 @@ static inline void guest_exit(void)
local_irq_restore(flags);
}
+static __always_inline void vtime_account_guest_enter(void)
+{
+ if (vtime_accounting_enabled_this_cpu())
+ vtime_guest_enter(current);
+ else
+ current->flags |= PF_VCPU;
+}
+
+static __always_inline void vtime_account_guest_exit(void)
+{
+ if (vtime_accounting_enabled_this_cpu())
+ vtime_guest_exit(current);
+}
+
+static __always_inline void vcpu_account_guest_exit(void)
+{
+ if (!vtime_accounting_enabled_this_cpu())
+ current->flags &= ~PF_VCPU;
+}
+
#endif
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Commit "x86/kvm: Move context tracking where it belongs" broke guest time accounting
@ 2021-04-09 9:19 ` Thomas Gleixner
0 siblings, 0 replies; 14+ messages in thread
From: Thomas Gleixner @ 2021-04-09 9:19 UTC (permalink / raw)
To: Wanpeng Li
Cc: Sean Christopherson, Michael Tokarev,
qemu-devel@nongnu.org Developers, kvm, Paolo Bonzini
On Fri, Apr 09 2021 at 16:13, Wanpeng Li wrote:
> On Thu, 8 Apr 2021 at 21:19, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> + account_guest_enter();
This wants to move into the instrumentation_begin/end() section above.
> guest_enter_irqoff();
> lockdep_hardirqs_on(CALLER_ADDR0);
>
> @@ -3759,6 +3760,8 @@ static noinstr void svm_vcpu_enter_exit(struct
> kvm_vcpu *vcpu)
> */
> lockdep_hardirqs_off(CALLER_ADDR0);
> guest_exit_irqoff();
> + if (vtime_accounting_enabled_this_cpu())
> + account_guest_exit();
This time below. Aside of that I'd suggest to have two inlines instead
of having the conditional here.
>
> #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
> -/* must be called with irqs disabled */
> -static __always_inline void guest_enter_irqoff(void)
> +static __always_inline void account_guest_enter(void)
> {
> instrumentation_begin();
> if (vtime_accounting_enabled_this_cpu())
> @@ -113,7 +112,11 @@ static __always_inline void guest_enter_irqoff(void)
> else
> current->flags |= PF_VCPU;
> instrumentation_end();
If you move the invocation into the instrumentable section then this
instrumentation_begin/end() can be removed.
Something like the below +/- the obligatory bikeshed painting
vs. function names.
Thanks,
tglx
---
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3782,6 +3782,7 @@ static noinstr void svm_vcpu_enter_exit(
* accordingly.
*/
instrumentation_begin();
+ vtime_account_guest_enter();
trace_hardirqs_on_prepare();
lockdep_hardirqs_on_prepare(CALLER_ADDR0);
instrumentation_end();
@@ -3816,6 +3817,7 @@ static noinstr void svm_vcpu_enter_exit(
instrumentation_begin();
trace_hardirqs_off_finish();
+ vtime_account_guest_exit();
instrumentation_end();
}
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6655,6 +6655,7 @@ static noinstr void vmx_vcpu_enter_exit(
* accordingly.
*/
instrumentation_begin();
+ vtime_account_guest_enter();
trace_hardirqs_on_prepare();
lockdep_hardirqs_on_prepare(CALLER_ADDR0);
instrumentation_end();
@@ -6693,6 +6694,7 @@ static noinstr void vmx_vcpu_enter_exit(
instrumentation_begin();
trace_hardirqs_off_finish();
+ vtime_account_guest_exit();
instrumentation_end();
}
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9200,6 +9200,7 @@ static int vcpu_enter_guest(struct kvm_v
++vcpu->stat.exits;
local_irq_disable();
kvm_after_interrupt(vcpu);
+ vcpu_account_guest_exit();
if (lapic_in_kernel(vcpu)) {
s64 delta = vcpu->arch.apic->lapic_timer.advance_expire_delta;
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -107,13 +107,6 @@ static inline void context_tracking_init
/* must be called with irqs disabled */
static __always_inline void guest_enter_irqoff(void)
{
- instrumentation_begin();
- if (vtime_accounting_enabled_this_cpu())
- vtime_guest_enter(current);
- else
- current->flags |= PF_VCPU;
- instrumentation_end();
-
if (context_tracking_enabled())
__context_tracking_enter(CONTEXT_GUEST);
@@ -135,37 +128,18 @@ static __always_inline void guest_exit_i
{
if (context_tracking_enabled())
__context_tracking_exit(CONTEXT_GUEST);
-
- instrumentation_begin();
- if (vtime_accounting_enabled_this_cpu())
- vtime_guest_exit(current);
- else
- current->flags &= ~PF_VCPU;
- instrumentation_end();
}
#else
static __always_inline void guest_enter_irqoff(void)
{
- /*
- * This is running in ioctl context so its safe
- * to assume that it's the stime pending cputime
- * to flush.
- */
instrumentation_begin();
- vtime_account_kernel(current);
- current->flags |= PF_VCPU;
rcu_virt_note_context_switch(smp_processor_id());
instrumentation_end();
}
static __always_inline void guest_exit_irqoff(void)
{
- instrumentation_begin();
- /* Flush the guest cputime we spent on the guest */
- vtime_account_kernel(current);
- current->flags &= ~PF_VCPU;
- instrumentation_end();
}
#endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */
@@ -178,4 +152,24 @@ static inline void guest_exit(void)
local_irq_restore(flags);
}
+static __always_inline void vtime_account_guest_enter(void)
+{
+ if (vtime_accounting_enabled_this_cpu())
+ vtime_guest_enter(current);
+ else
+ current->flags |= PF_VCPU;
+}
+
+static __always_inline void vtime_account_guest_exit(void)
+{
+ if (vtime_accounting_enabled_this_cpu())
+ vtime_guest_exit(current);
+}
+
+static __always_inline void vcpu_account_guest_exit(void)
+{
+ if (!vtime_accounting_enabled_this_cpu())
+ current->flags &= ~PF_VCPU;
+}
+
#endif
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Commit "x86/kvm: Move context tracking where it belongs" broke guest time accounting
2021-04-06 20:17 Commit "x86/kvm: Move context tracking where it belongs" broke guest time accounting Michael Tokarev
@ 2021-04-13 10:48 ` Wanpeng Li
2021-04-07 11:01 ` Wanpeng Li
2021-04-13 10:48 ` Wanpeng Li
2 siblings, 0 replies; 14+ messages in thread
From: Wanpeng Li @ 2021-04-13 10:48 UTC (permalink / raw)
To: Michael Tokarev
Cc: kvm, qemu-devel@nongnu.org Developers, Thomas Gleixner, Paolo Bonzini
On Wed, 7 Apr 2021 at 18:55, Michael Tokarev <mjt@tls.msk.ru> wrote:
>
> Hi!
>
> It looks like this commit:
>
> commit 87fa7f3e98a1310ef1ac1900e7ee7f9610a038bc
> Author: Thomas Gleixner <tglx@linutronix.de>
> Date: Wed Jul 8 21:51:54 2020 +0200
>
> x86/kvm: Move context tracking where it belongs
>
> Context tracking for KVM happens way too early in the vcpu_run()
> code. Anything after guest_enter_irqoff() and before guest_exit_irqoff()
> cannot use RCU and should also be not instrumented.
>
> The current way of doing this covers way too much code. Move it closer to
> the actual vmenter/exit code.
>
> broke kvm guest cpu time accounting - after this commit, when running
> qemu-system-x86_64 -enable-kvm, the guest time (in /proc/stat and
> elsewhere) is always 0.
>
> I dunno why it happened, but it happened, and all kernels after 5.9
> are affected by this.
>
> This commit is found in a (painful) git bisect between kernel 5.8 and 5.10.
Hi Michael,
Please have a try.
https://lore.kernel.org/kvm/1618298169-3831-1-git-send-email-wanpengli@tencent.com/
Wanpeng
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Commit "x86/kvm: Move context tracking where it belongs" broke guest time accounting
@ 2021-04-13 10:48 ` Wanpeng Li
0 siblings, 0 replies; 14+ messages in thread
From: Wanpeng Li @ 2021-04-13 10:48 UTC (permalink / raw)
To: Michael Tokarev
Cc: Paolo Bonzini, Thomas Gleixner, qemu-devel@nongnu.org Developers, kvm
On Wed, 7 Apr 2021 at 18:55, Michael Tokarev <mjt@tls.msk.ru> wrote:
>
> Hi!
>
> It looks like this commit:
>
> commit 87fa7f3e98a1310ef1ac1900e7ee7f9610a038bc
> Author: Thomas Gleixner <tglx@linutronix.de>
> Date: Wed Jul 8 21:51:54 2020 +0200
>
> x86/kvm: Move context tracking where it belongs
>
> Context tracking for KVM happens way too early in the vcpu_run()
> code. Anything after guest_enter_irqoff() and before guest_exit_irqoff()
> cannot use RCU and should also be not instrumented.
>
> The current way of doing this covers way too much code. Move it closer to
> the actual vmenter/exit code.
>
> broke kvm guest cpu time accounting - after this commit, when running
> qemu-system-x86_64 -enable-kvm, the guest time (in /proc/stat and
> elsewhere) is always 0.
>
> I dunno why it happened, but it happened, and all kernels after 5.9
> are affected by this.
>
> This commit is found in a (painful) git bisect between kernel 5.8 and 5.10.
Hi Michael,
Please have a try.
https://lore.kernel.org/kvm/1618298169-3831-1-git-send-email-wanpengli@tencent.com/
Wanpeng
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2021-04-13 10:50 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-06 20:17 Commit "x86/kvm: Move context tracking where it belongs" broke guest time accounting Michael Tokarev
2021-04-06 21:47 ` Sean Christopherson
2021-04-08 13:18 ` Thomas Gleixner
2021-04-08 13:18 ` Thomas Gleixner
2021-04-09 2:14 ` Wanpeng Li
2021-04-09 2:14 ` Wanpeng Li
2021-04-09 8:13 ` Wanpeng Li
2021-04-09 8:13 ` Wanpeng Li
2021-04-09 9:19 ` Thomas Gleixner
2021-04-09 9:19 ` Thomas Gleixner
2021-04-07 11:01 ` Wanpeng Li
2021-04-07 11:01 ` Wanpeng Li
2021-04-13 10:48 ` Wanpeng Li
2021-04-13 10:48 ` Wanpeng Li
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.