* Commit "x86/kvm: Move context tracking where it belongs" broke guest time accounting @ 2021-04-06 20:17 Michael Tokarev 2021-04-06 21:47 ` Sean Christopherson ` (2 more replies) 0 siblings, 3 replies; 14+ messages in thread From: Michael Tokarev @ 2021-04-06 20:17 UTC (permalink / raw) To: kvm, qemu-devel, Thomas Gleixner; +Cc: Paolo Bonzini Hi! It looks like this commit: commit 87fa7f3e98a1310ef1ac1900e7ee7f9610a038bc Author: Thomas Gleixner <tglx@linutronix.de> Date: Wed Jul 8 21:51:54 2020 +0200 x86/kvm: Move context tracking where it belongs Context tracking for KVM happens way too early in the vcpu_run() code. Anything after guest_enter_irqoff() and before guest_exit_irqoff() cannot use RCU and should also be not instrumented. The current way of doing this covers way too much code. Move it closer to the actual vmenter/exit code. broke kvm guest cpu time accounting - after this commit, when running qemu-system-x86_64 -enable-kvm, the guest time (in /proc/stat and elsewhere) is always 0. I dunno why it happened, but it happened, and all kernels after 5.9 are affected by this. This commit is found in a (painful) git bisect between kernel 5.8 and 5.10. Thanks, /mjt ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Commit "x86/kvm: Move context tracking where it belongs" broke guest time accounting 2021-04-06 20:17 Commit "x86/kvm: Move context tracking where it belongs" broke guest time accounting Michael Tokarev @ 2021-04-06 21:47 ` Sean Christopherson 2021-04-08 13:18 ` Thomas Gleixner 2021-04-07 11:01 ` Wanpeng Li 2021-04-13 10:48 ` Wanpeng Li 2 siblings, 1 reply; 14+ messages in thread From: Sean Christopherson @ 2021-04-06 21:47 UTC (permalink / raw) To: Michael Tokarev; +Cc: kvm, qemu-devel, Thomas Gleixner, Paolo Bonzini On Tue, Apr 06, 2021, Michael Tokarev wrote: > Hi! > > It looks like this commit: > > commit 87fa7f3e98a1310ef1ac1900e7ee7f9610a038bc > Author: Thomas Gleixner <tglx@linutronix.de> > Date: Wed Jul 8 21:51:54 2020 +0200 > > x86/kvm: Move context tracking where it belongs > > Context tracking for KVM happens way too early in the vcpu_run() > code. Anything after guest_enter_irqoff() and before guest_exit_irqoff() > cannot use RCU and should also be not instrumented. > > The current way of doing this covers way too much code. Move it closer to > the actual vmenter/exit code. > > broke kvm guest cpu time accounting - after this commit, when running > qemu-system-x86_64 -enable-kvm, the guest time (in /proc/stat and > elsewhere) is always 0. > > I dunno why it happened, but it happened, and all kernels after 5.9 > are affected by this. > > This commit is found in a (painful) git bisect between kernel 5.8 and 5.10. Yes :-( There's a bugzilla[1] and two proposed fixes[2][3]. I don't particularly like either of the fixes, but an elegant solution hasn't presented itself. Thomas/Paolo, can you please weigh in? [1] https://bugzilla.kernel.org/show_bug.cgi?id=209831 [2] https://lkml.kernel.org/r/1617011036-11734-1-git-send-email-wanpengli@tencent.com [3] https://lkml.kernel.org/r/20210206004218.312023-1-seanjc@google.com ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Commit "x86/kvm: Move context tracking where it belongs" broke guest time accounting 2021-04-06 21:47 ` Sean Christopherson @ 2021-04-08 13:18 ` Thomas Gleixner 0 siblings, 0 replies; 14+ messages in thread From: Thomas Gleixner @ 2021-04-08 13:18 UTC (permalink / raw) To: Sean Christopherson, Michael Tokarev; +Cc: kvm, qemu-devel, Paolo Bonzini On Tue, Apr 06 2021 at 21:47, Sean Christopherson wrote: > On Tue, Apr 06, 2021, Michael Tokarev wrote: >> broke kvm guest cpu time accounting - after this commit, when running >> qemu-system-x86_64 -enable-kvm, the guest time (in /proc/stat and >> elsewhere) is always 0. >> >> I dunno why it happened, but it happened, and all kernels after 5.9 >> are affected by this. >> >> This commit is found in a (painful) git bisect between kernel 5.8 and 5.10. > > Yes :-( > > There's a bugzilla[1] and two proposed fixes[2][3]. I don't particularly like > either of the fixes, but an elegant solution hasn't presented itself. > > Thomas/Paolo, can you please weigh in? > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=209831 > [2] https://lkml.kernel.org/r/1617011036-11734-1-git-send-email-wanpengli@tencent.com > [3] https://lkml.kernel.org/r/20210206004218.312023-1-seanjc@google.com All of the solutions I looked at so far are ugly as hell. The problem is that the accounting is plumbed into the context tracking and moving context tracking around to a different place is just wrong. I think the right solution is to seperate the time accounting logic out from guest_enter/exit_irqoff() and have virt time specific helpers which can be placed at the proper spots in kvm. Thanks, tglx ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Commit "x86/kvm: Move context tracking where it belongs" broke guest time accounting @ 2021-04-08 13:18 ` Thomas Gleixner 0 siblings, 0 replies; 14+ messages in thread From: Thomas Gleixner @ 2021-04-08 13:18 UTC (permalink / raw) To: Sean Christopherson, Michael Tokarev; +Cc: Paolo Bonzini, qemu-devel, kvm On Tue, Apr 06 2021 at 21:47, Sean Christopherson wrote: > On Tue, Apr 06, 2021, Michael Tokarev wrote: >> broke kvm guest cpu time accounting - after this commit, when running >> qemu-system-x86_64 -enable-kvm, the guest time (in /proc/stat and >> elsewhere) is always 0. >> >> I dunno why it happened, but it happened, and all kernels after 5.9 >> are affected by this. >> >> This commit is found in a (painful) git bisect between kernel 5.8 and 5.10. > > Yes :-( > > There's a bugzilla[1] and two proposed fixes[2][3]. I don't particularly like > either of the fixes, but an elegant solution hasn't presented itself. > > Thomas/Paolo, can you please weigh in? > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=209831 > [2] https://lkml.kernel.org/r/1617011036-11734-1-git-send-email-wanpengli@tencent.com > [3] https://lkml.kernel.org/r/20210206004218.312023-1-seanjc@google.com All of the solutions I looked at so far are ugly as hell. The problem is that the accounting is plumbed into the context tracking and moving context tracking around to a different place is just wrong. I think the right solution is to seperate the time accounting logic out from guest_enter/exit_irqoff() and have virt time specific helpers which can be placed at the proper spots in kvm. Thanks, tglx ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Commit "x86/kvm: Move context tracking where it belongs" broke guest time accounting 2021-04-08 13:18 ` Thomas Gleixner @ 2021-04-09 2:14 ` Wanpeng Li -1 siblings, 0 replies; 14+ messages in thread From: Wanpeng Li @ 2021-04-09 2:14 UTC (permalink / raw) To: Thomas Gleixner Cc: Sean Christopherson, Michael Tokarev, kvm, qemu-devel@nongnu.org Developers, Paolo Bonzini On Thu, 8 Apr 2021 at 21:19, Thomas Gleixner <tglx@linutronix.de> wrote: > > On Tue, Apr 06 2021 at 21:47, Sean Christopherson wrote: > > On Tue, Apr 06, 2021, Michael Tokarev wrote: > >> broke kvm guest cpu time accounting - after this commit, when running > >> qemu-system-x86_64 -enable-kvm, the guest time (in /proc/stat and > >> elsewhere) is always 0. > >> > >> I dunno why it happened, but it happened, and all kernels after 5.9 > >> are affected by this. > >> > >> This commit is found in a (painful) git bisect between kernel 5.8 and 5.10. > > > > Yes :-( > > > > There's a bugzilla[1] and two proposed fixes[2][3]. I don't particularly like > > either of the fixes, but an elegant solution hasn't presented itself. > > > > Thomas/Paolo, can you please weigh in? > > > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=209831 > > [2] https://lkml.kernel.org/r/1617011036-11734-1-git-send-email-wanpengli@tencent.com > > [3] https://lkml.kernel.org/r/20210206004218.312023-1-seanjc@google.com > > All of the solutions I looked at so far are ugly as hell. The problem is > that the accounting is plumbed into the context tracking and moving > context tracking around to a different place is just wrong. > > I think the right solution is to seperate the time accounting logic out > from guest_enter/exit_irqoff() and have virt time specific helpers which > can be placed at the proper spots in kvm. Good suggestion, I will have a try. :) Wanpeng ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Commit "x86/kvm: Move context tracking where it belongs" broke guest time accounting @ 2021-04-09 2:14 ` Wanpeng Li 0 siblings, 0 replies; 14+ messages in thread From: Wanpeng Li @ 2021-04-09 2:14 UTC (permalink / raw) To: Thomas Gleixner Cc: Sean Christopherson, Michael Tokarev, qemu-devel@nongnu.org Developers, kvm, Paolo Bonzini On Thu, 8 Apr 2021 at 21:19, Thomas Gleixner <tglx@linutronix.de> wrote: > > On Tue, Apr 06 2021 at 21:47, Sean Christopherson wrote: > > On Tue, Apr 06, 2021, Michael Tokarev wrote: > >> broke kvm guest cpu time accounting - after this commit, when running > >> qemu-system-x86_64 -enable-kvm, the guest time (in /proc/stat and > >> elsewhere) is always 0. > >> > >> I dunno why it happened, but it happened, and all kernels after 5.9 > >> are affected by this. > >> > >> This commit is found in a (painful) git bisect between kernel 5.8 and 5.10. > > > > Yes :-( > > > > There's a bugzilla[1] and two proposed fixes[2][3]. I don't particularly like > > either of the fixes, but an elegant solution hasn't presented itself. > > > > Thomas/Paolo, can you please weigh in? > > > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=209831 > > [2] https://lkml.kernel.org/r/1617011036-11734-1-git-send-email-wanpengli@tencent.com > > [3] https://lkml.kernel.org/r/20210206004218.312023-1-seanjc@google.com > > All of the solutions I looked at so far are ugly as hell. The problem is > that the accounting is plumbed into the context tracking and moving > context tracking around to a different place is just wrong. > > I think the right solution is to seperate the time accounting logic out > from guest_enter/exit_irqoff() and have virt time specific helpers which > can be placed at the proper spots in kvm. Good suggestion, I will have a try. :) Wanpeng ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Commit "x86/kvm: Move context tracking where it belongs" broke guest time accounting 2021-04-08 13:18 ` Thomas Gleixner @ 2021-04-09 8:13 ` Wanpeng Li -1 siblings, 0 replies; 14+ messages in thread From: Wanpeng Li @ 2021-04-09 8:13 UTC (permalink / raw) To: Thomas Gleixner Cc: Sean Christopherson, Michael Tokarev, kvm, qemu-devel@nongnu.org Developers, Paolo Bonzini On Thu, 8 Apr 2021 at 21:19, Thomas Gleixner <tglx@linutronix.de> wrote: > > On Tue, Apr 06 2021 at 21:47, Sean Christopherson wrote: > > On Tue, Apr 06, 2021, Michael Tokarev wrote: > >> broke kvm guest cpu time accounting - after this commit, when running > >> qemu-system-x86_64 -enable-kvm, the guest time (in /proc/stat and > >> elsewhere) is always 0. > >> > >> I dunno why it happened, but it happened, and all kernels after 5.9 > >> are affected by this. > >> > >> This commit is found in a (painful) git bisect between kernel 5.8 and 5.10. > > > > Yes :-( > > > > There's a bugzilla[1] and two proposed fixes[2][3]. I don't particularly like > > either of the fixes, but an elegant solution hasn't presented itself. > > > > Thomas/Paolo, can you please weigh in? > > > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=209831 > > [2] https://lkml.kernel.org/r/1617011036-11734-1-git-send-email-wanpengli@tencent.com > > [3] https://lkml.kernel.org/r/20210206004218.312023-1-seanjc@google.com > > All of the solutions I looked at so far are ugly as hell. The problem is > that the accounting is plumbed into the context tracking and moving > context tracking around to a different place is just wrong. > > I think the right solution is to seperate the time accounting logic out > from guest_enter/exit_irqoff() and have virt time specific helpers which > can be placed at the proper spots in kvm. For x86 part, how about something like below: diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 48b396f3..7aeb724 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -3730,6 +3730,7 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu) lockdep_hardirqs_on_prepare(CALLER_ADDR0); instrumentation_end(); + account_guest_enter(); guest_enter_irqoff(); lockdep_hardirqs_on(CALLER_ADDR0); @@ -3759,6 +3760,8 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu) */ lockdep_hardirqs_off(CALLER_ADDR0); guest_exit_irqoff(); + if (vtime_accounting_enabled_this_cpu()) + account_guest_exit(); instrumentation_begin(); trace_hardirqs_off_finish(); diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index c05e6e2..5f6c30c 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -6617,6 +6617,7 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu, lockdep_hardirqs_on_prepare(CALLER_ADDR0); instrumentation_end(); + account_guest_enter(); guest_enter_irqoff(); lockdep_hardirqs_on(CALLER_ADDR0); @@ -6648,6 +6649,8 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu, */ lockdep_hardirqs_off(CALLER_ADDR0); guest_exit_irqoff(); + if (vtime_accounting_enabled_this_cpu()) + account_guest_exit(); instrumentation_begin(); trace_hardirqs_off_finish(); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 16fb395..33422c0 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9229,6 +9229,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) ++vcpu->stat.exits; local_irq_disable(); kvm_after_interrupt(vcpu); + if (!vtime_accounting_enabled_this_cpu()) + account_guest_exit(); if (lapic_in_kernel(vcpu)) { s64 delta = vcpu->arch.apic->lapic_timer.advance_expire_delta; diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h index bceb064..ff70229 100644 --- a/include/linux/context_tracking.h +++ b/include/linux/context_tracking.h @@ -104,8 +104,7 @@ static inline void context_tracking_init(void) { } #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN -/* must be called with irqs disabled */ -static __always_inline void guest_enter_irqoff(void) +static __always_inline void account_guest_enter(void) { instrumentation_begin(); if (vtime_accounting_enabled_this_cpu()) @@ -113,7 +112,11 @@ static __always_inline void guest_enter_irqoff(void) else current->flags |= PF_VCPU; instrumentation_end(); +} +/* must be called with irqs disabled */ +static __always_inline void guest_enter_irqoff(void) +{ if (context_tracking_enabled()) __context_tracking_enter(CONTEXT_GUEST); @@ -131,11 +134,8 @@ static __always_inline void guest_enter_irqoff(void) } } -static __always_inline void guest_exit_irqoff(void) +static __always_inline void account_guest_exit(void) { - if (context_tracking_enabled()) - __context_tracking_exit(CONTEXT_GUEST); - instrumentation_begin(); if (vtime_accounting_enabled_this_cpu()) vtime_guest_exit(current); @@ -144,8 +144,14 @@ static __always_inline void guest_exit_irqoff(void) instrumentation_end(); } +static __always_inline void guest_exit_irqoff(void) +{ + if (context_tracking_enabled()) + __context_tracking_exit(CONTEXT_GUEST); +} + #else -static __always_inline void guest_enter_irqoff(void) +static __always_inline void account_guest_enter(void) { /* * This is running in ioctl context so its safe @@ -155,11 +161,17 @@ static __always_inline void guest_enter_irqoff(void) instrumentation_begin(); vtime_account_kernel(current); current->flags |= PF_VCPU; + instrumentation_end(); +} + +static __always_inline void guest_enter_irqoff(void) +{ + instrumentation_begin(); rcu_virt_note_context_switch(smp_processor_id()); instrumentation_end(); } -static __always_inline void guest_exit_irqoff(void) +static __always_inline void account_guest_exit(void) { instrumentation_begin(); /* Flush the guest cputime we spent on the guest */ @@ -167,6 +179,11 @@ static __always_inline void guest_exit_irqoff(void) current->flags &= ~PF_VCPU; instrumentation_end(); } + +static __always_inline void guest_exit_irqoff(void) +{ + +} #endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */ static inline void guest_exit(void) ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: Commit "x86/kvm: Move context tracking where it belongs" broke guest time accounting @ 2021-04-09 8:13 ` Wanpeng Li 0 siblings, 0 replies; 14+ messages in thread From: Wanpeng Li @ 2021-04-09 8:13 UTC (permalink / raw) To: Thomas Gleixner Cc: Sean Christopherson, Michael Tokarev, qemu-devel@nongnu.org Developers, kvm, Paolo Bonzini On Thu, 8 Apr 2021 at 21:19, Thomas Gleixner <tglx@linutronix.de> wrote: > > On Tue, Apr 06 2021 at 21:47, Sean Christopherson wrote: > > On Tue, Apr 06, 2021, Michael Tokarev wrote: > >> broke kvm guest cpu time accounting - after this commit, when running > >> qemu-system-x86_64 -enable-kvm, the guest time (in /proc/stat and > >> elsewhere) is always 0. > >> > >> I dunno why it happened, but it happened, and all kernels after 5.9 > >> are affected by this. > >> > >> This commit is found in a (painful) git bisect between kernel 5.8 and 5.10. > > > > Yes :-( > > > > There's a bugzilla[1] and two proposed fixes[2][3]. I don't particularly like > > either of the fixes, but an elegant solution hasn't presented itself. > > > > Thomas/Paolo, can you please weigh in? > > > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=209831 > > [2] https://lkml.kernel.org/r/1617011036-11734-1-git-send-email-wanpengli@tencent.com > > [3] https://lkml.kernel.org/r/20210206004218.312023-1-seanjc@google.com > > All of the solutions I looked at so far are ugly as hell. The problem is > that the accounting is plumbed into the context tracking and moving > context tracking around to a different place is just wrong. > > I think the right solution is to seperate the time accounting logic out > from guest_enter/exit_irqoff() and have virt time specific helpers which > can be placed at the proper spots in kvm. For x86 part, how about something like below: diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 48b396f3..7aeb724 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -3730,6 +3730,7 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu) lockdep_hardirqs_on_prepare(CALLER_ADDR0); instrumentation_end(); + account_guest_enter(); guest_enter_irqoff(); lockdep_hardirqs_on(CALLER_ADDR0); @@ -3759,6 +3760,8 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu) */ lockdep_hardirqs_off(CALLER_ADDR0); guest_exit_irqoff(); + if (vtime_accounting_enabled_this_cpu()) + account_guest_exit(); instrumentation_begin(); trace_hardirqs_off_finish(); diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index c05e6e2..5f6c30c 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -6617,6 +6617,7 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu, lockdep_hardirqs_on_prepare(CALLER_ADDR0); instrumentation_end(); + account_guest_enter(); guest_enter_irqoff(); lockdep_hardirqs_on(CALLER_ADDR0); @@ -6648,6 +6649,8 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu, */ lockdep_hardirqs_off(CALLER_ADDR0); guest_exit_irqoff(); + if (vtime_accounting_enabled_this_cpu()) + account_guest_exit(); instrumentation_begin(); trace_hardirqs_off_finish(); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 16fb395..33422c0 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9229,6 +9229,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) ++vcpu->stat.exits; local_irq_disable(); kvm_after_interrupt(vcpu); + if (!vtime_accounting_enabled_this_cpu()) + account_guest_exit(); if (lapic_in_kernel(vcpu)) { s64 delta = vcpu->arch.apic->lapic_timer.advance_expire_delta; diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h index bceb064..ff70229 100644 --- a/include/linux/context_tracking.h +++ b/include/linux/context_tracking.h @@ -104,8 +104,7 @@ static inline void context_tracking_init(void) { } #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN -/* must be called with irqs disabled */ -static __always_inline void guest_enter_irqoff(void) +static __always_inline void account_guest_enter(void) { instrumentation_begin(); if (vtime_accounting_enabled_this_cpu()) @@ -113,7 +112,11 @@ static __always_inline void guest_enter_irqoff(void) else current->flags |= PF_VCPU; instrumentation_end(); +} +/* must be called with irqs disabled */ +static __always_inline void guest_enter_irqoff(void) +{ if (context_tracking_enabled()) __context_tracking_enter(CONTEXT_GUEST); @@ -131,11 +134,8 @@ static __always_inline void guest_enter_irqoff(void) } } -static __always_inline void guest_exit_irqoff(void) +static __always_inline void account_guest_exit(void) { - if (context_tracking_enabled()) - __context_tracking_exit(CONTEXT_GUEST); - instrumentation_begin(); if (vtime_accounting_enabled_this_cpu()) vtime_guest_exit(current); @@ -144,8 +144,14 @@ static __always_inline void guest_exit_irqoff(void) instrumentation_end(); } +static __always_inline void guest_exit_irqoff(void) +{ + if (context_tracking_enabled()) + __context_tracking_exit(CONTEXT_GUEST); +} + #else -static __always_inline void guest_enter_irqoff(void) +static __always_inline void account_guest_enter(void) { /* * This is running in ioctl context so its safe @@ -155,11 +161,17 @@ static __always_inline void guest_enter_irqoff(void) instrumentation_begin(); vtime_account_kernel(current); current->flags |= PF_VCPU; + instrumentation_end(); +} + +static __always_inline void guest_enter_irqoff(void) +{ + instrumentation_begin(); rcu_virt_note_context_switch(smp_processor_id()); instrumentation_end(); } -static __always_inline void guest_exit_irqoff(void) +static __always_inline void account_guest_exit(void) { instrumentation_begin(); /* Flush the guest cputime we spent on the guest */ @@ -167,6 +179,11 @@ static __always_inline void guest_exit_irqoff(void) current->flags &= ~PF_VCPU; instrumentation_end(); } + +static __always_inline void guest_exit_irqoff(void) +{ + +} #endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */ static inline void guest_exit(void) ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: Commit "x86/kvm: Move context tracking where it belongs" broke guest time accounting 2021-04-09 8:13 ` Wanpeng Li @ 2021-04-09 9:19 ` Thomas Gleixner -1 siblings, 0 replies; 14+ messages in thread From: Thomas Gleixner @ 2021-04-09 9:19 UTC (permalink / raw) To: Wanpeng Li Cc: Sean Christopherson, Michael Tokarev, kvm, qemu-devel@nongnu.org Developers, Paolo Bonzini On Fri, Apr 09 2021 at 16:13, Wanpeng Li wrote: > On Thu, 8 Apr 2021 at 21:19, Thomas Gleixner <tglx@linutronix.de> wrote: > > + account_guest_enter(); This wants to move into the instrumentation_begin/end() section above. > guest_enter_irqoff(); > lockdep_hardirqs_on(CALLER_ADDR0); > > @@ -3759,6 +3760,8 @@ static noinstr void svm_vcpu_enter_exit(struct > kvm_vcpu *vcpu) > */ > lockdep_hardirqs_off(CALLER_ADDR0); > guest_exit_irqoff(); > + if (vtime_accounting_enabled_this_cpu()) > + account_guest_exit(); This time below. Aside of that I'd suggest to have two inlines instead of having the conditional here. > > #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN > -/* must be called with irqs disabled */ > -static __always_inline void guest_enter_irqoff(void) > +static __always_inline void account_guest_enter(void) > { > instrumentation_begin(); > if (vtime_accounting_enabled_this_cpu()) > @@ -113,7 +112,11 @@ static __always_inline void guest_enter_irqoff(void) > else > current->flags |= PF_VCPU; > instrumentation_end(); If you move the invocation into the instrumentable section then this instrumentation_begin/end() can be removed. Something like the below +/- the obligatory bikeshed painting vs. function names. Thanks, tglx --- --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -3782,6 +3782,7 @@ static noinstr void svm_vcpu_enter_exit( * accordingly. */ instrumentation_begin(); + vtime_account_guest_enter(); trace_hardirqs_on_prepare(); lockdep_hardirqs_on_prepare(CALLER_ADDR0); instrumentation_end(); @@ -3816,6 +3817,7 @@ static noinstr void svm_vcpu_enter_exit( instrumentation_begin(); trace_hardirqs_off_finish(); + vtime_account_guest_exit(); instrumentation_end(); } --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -6655,6 +6655,7 @@ static noinstr void vmx_vcpu_enter_exit( * accordingly. */ instrumentation_begin(); + vtime_account_guest_enter(); trace_hardirqs_on_prepare(); lockdep_hardirqs_on_prepare(CALLER_ADDR0); instrumentation_end(); @@ -6693,6 +6694,7 @@ static noinstr void vmx_vcpu_enter_exit( instrumentation_begin(); trace_hardirqs_off_finish(); + vtime_account_guest_exit(); instrumentation_end(); } --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9200,6 +9200,7 @@ static int vcpu_enter_guest(struct kvm_v ++vcpu->stat.exits; local_irq_disable(); kvm_after_interrupt(vcpu); + vcpu_account_guest_exit(); if (lapic_in_kernel(vcpu)) { s64 delta = vcpu->arch.apic->lapic_timer.advance_expire_delta; --- a/include/linux/context_tracking.h +++ b/include/linux/context_tracking.h @@ -107,13 +107,6 @@ static inline void context_tracking_init /* must be called with irqs disabled */ static __always_inline void guest_enter_irqoff(void) { - instrumentation_begin(); - if (vtime_accounting_enabled_this_cpu()) - vtime_guest_enter(current); - else - current->flags |= PF_VCPU; - instrumentation_end(); - if (context_tracking_enabled()) __context_tracking_enter(CONTEXT_GUEST); @@ -135,37 +128,18 @@ static __always_inline void guest_exit_i { if (context_tracking_enabled()) __context_tracking_exit(CONTEXT_GUEST); - - instrumentation_begin(); - if (vtime_accounting_enabled_this_cpu()) - vtime_guest_exit(current); - else - current->flags &= ~PF_VCPU; - instrumentation_end(); } #else static __always_inline void guest_enter_irqoff(void) { - /* - * This is running in ioctl context so its safe - * to assume that it's the stime pending cputime - * to flush. - */ instrumentation_begin(); - vtime_account_kernel(current); - current->flags |= PF_VCPU; rcu_virt_note_context_switch(smp_processor_id()); instrumentation_end(); } static __always_inline void guest_exit_irqoff(void) { - instrumentation_begin(); - /* Flush the guest cputime we spent on the guest */ - vtime_account_kernel(current); - current->flags &= ~PF_VCPU; - instrumentation_end(); } #endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */ @@ -178,4 +152,24 @@ static inline void guest_exit(void) local_irq_restore(flags); } +static __always_inline void vtime_account_guest_enter(void) +{ + if (vtime_accounting_enabled_this_cpu()) + vtime_guest_enter(current); + else + current->flags |= PF_VCPU; +} + +static __always_inline void vtime_account_guest_exit(void) +{ + if (vtime_accounting_enabled_this_cpu()) + vtime_guest_exit(current); +} + +static __always_inline void vcpu_account_guest_exit(void) +{ + if (!vtime_accounting_enabled_this_cpu()) + current->flags &= ~PF_VCPU; +} + #endif ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Commit "x86/kvm: Move context tracking where it belongs" broke guest time accounting @ 2021-04-09 9:19 ` Thomas Gleixner 0 siblings, 0 replies; 14+ messages in thread From: Thomas Gleixner @ 2021-04-09 9:19 UTC (permalink / raw) To: Wanpeng Li Cc: Sean Christopherson, Michael Tokarev, qemu-devel@nongnu.org Developers, kvm, Paolo Bonzini On Fri, Apr 09 2021 at 16:13, Wanpeng Li wrote: > On Thu, 8 Apr 2021 at 21:19, Thomas Gleixner <tglx@linutronix.de> wrote: > > + account_guest_enter(); This wants to move into the instrumentation_begin/end() section above. > guest_enter_irqoff(); > lockdep_hardirqs_on(CALLER_ADDR0); > > @@ -3759,6 +3760,8 @@ static noinstr void svm_vcpu_enter_exit(struct > kvm_vcpu *vcpu) > */ > lockdep_hardirqs_off(CALLER_ADDR0); > guest_exit_irqoff(); > + if (vtime_accounting_enabled_this_cpu()) > + account_guest_exit(); This time below. Aside of that I'd suggest to have two inlines instead of having the conditional here. > > #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN > -/* must be called with irqs disabled */ > -static __always_inline void guest_enter_irqoff(void) > +static __always_inline void account_guest_enter(void) > { > instrumentation_begin(); > if (vtime_accounting_enabled_this_cpu()) > @@ -113,7 +112,11 @@ static __always_inline void guest_enter_irqoff(void) > else > current->flags |= PF_VCPU; > instrumentation_end(); If you move the invocation into the instrumentable section then this instrumentation_begin/end() can be removed. Something like the below +/- the obligatory bikeshed painting vs. function names. Thanks, tglx --- --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -3782,6 +3782,7 @@ static noinstr void svm_vcpu_enter_exit( * accordingly. */ instrumentation_begin(); + vtime_account_guest_enter(); trace_hardirqs_on_prepare(); lockdep_hardirqs_on_prepare(CALLER_ADDR0); instrumentation_end(); @@ -3816,6 +3817,7 @@ static noinstr void svm_vcpu_enter_exit( instrumentation_begin(); trace_hardirqs_off_finish(); + vtime_account_guest_exit(); instrumentation_end(); } --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -6655,6 +6655,7 @@ static noinstr void vmx_vcpu_enter_exit( * accordingly. */ instrumentation_begin(); + vtime_account_guest_enter(); trace_hardirqs_on_prepare(); lockdep_hardirqs_on_prepare(CALLER_ADDR0); instrumentation_end(); @@ -6693,6 +6694,7 @@ static noinstr void vmx_vcpu_enter_exit( instrumentation_begin(); trace_hardirqs_off_finish(); + vtime_account_guest_exit(); instrumentation_end(); } --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9200,6 +9200,7 @@ static int vcpu_enter_guest(struct kvm_v ++vcpu->stat.exits; local_irq_disable(); kvm_after_interrupt(vcpu); + vcpu_account_guest_exit(); if (lapic_in_kernel(vcpu)) { s64 delta = vcpu->arch.apic->lapic_timer.advance_expire_delta; --- a/include/linux/context_tracking.h +++ b/include/linux/context_tracking.h @@ -107,13 +107,6 @@ static inline void context_tracking_init /* must be called with irqs disabled */ static __always_inline void guest_enter_irqoff(void) { - instrumentation_begin(); - if (vtime_accounting_enabled_this_cpu()) - vtime_guest_enter(current); - else - current->flags |= PF_VCPU; - instrumentation_end(); - if (context_tracking_enabled()) __context_tracking_enter(CONTEXT_GUEST); @@ -135,37 +128,18 @@ static __always_inline void guest_exit_i { if (context_tracking_enabled()) __context_tracking_exit(CONTEXT_GUEST); - - instrumentation_begin(); - if (vtime_accounting_enabled_this_cpu()) - vtime_guest_exit(current); - else - current->flags &= ~PF_VCPU; - instrumentation_end(); } #else static __always_inline void guest_enter_irqoff(void) { - /* - * This is running in ioctl context so its safe - * to assume that it's the stime pending cputime - * to flush. - */ instrumentation_begin(); - vtime_account_kernel(current); - current->flags |= PF_VCPU; rcu_virt_note_context_switch(smp_processor_id()); instrumentation_end(); } static __always_inline void guest_exit_irqoff(void) { - instrumentation_begin(); - /* Flush the guest cputime we spent on the guest */ - vtime_account_kernel(current); - current->flags &= ~PF_VCPU; - instrumentation_end(); } #endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */ @@ -178,4 +152,24 @@ static inline void guest_exit(void) local_irq_restore(flags); } +static __always_inline void vtime_account_guest_enter(void) +{ + if (vtime_accounting_enabled_this_cpu()) + vtime_guest_enter(current); + else + current->flags |= PF_VCPU; +} + +static __always_inline void vtime_account_guest_exit(void) +{ + if (vtime_accounting_enabled_this_cpu()) + vtime_guest_exit(current); +} + +static __always_inline void vcpu_account_guest_exit(void) +{ + if (!vtime_accounting_enabled_this_cpu()) + current->flags &= ~PF_VCPU; +} + #endif ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Commit "x86/kvm: Move context tracking where it belongs" broke guest time accounting 2021-04-06 20:17 Commit "x86/kvm: Move context tracking where it belongs" broke guest time accounting Michael Tokarev @ 2021-04-07 11:01 ` Wanpeng Li 2021-04-07 11:01 ` Wanpeng Li 2021-04-13 10:48 ` Wanpeng Li 2 siblings, 0 replies; 14+ messages in thread From: Wanpeng Li @ 2021-04-07 11:01 UTC (permalink / raw) To: mjt; +Cc: kvm, qemu-devel@nongnu.org Developers, Thomas Gleixner, Paolo Bonzini On Wed, 7 Apr 2021 at 18:55, Michael Tokarev <mjt@tls.msk.ru> wrote: > > Hi! > > It looks like this commit: > > commit 87fa7f3e98a1310ef1ac1900e7ee7f9610a038bc > Author: Thomas Gleixner <tglx@linutronix.de> > Date: Wed Jul 8 21:51:54 2020 +0200 > > x86/kvm: Move context tracking where it belongs > > Context tracking for KVM happens way too early in the vcpu_run() > code. Anything after guest_enter_irqoff() and before guest_exit_irqoff() > cannot use RCU and should also be not instrumented. > > The current way of doing this covers way too much code. Move it closer to > the actual vmenter/exit code. > > broke kvm guest cpu time accounting - after this commit, when running > qemu-system-x86_64 -enable-kvm, the guest time (in /proc/stat and > elsewhere) is always 0. > > I dunno why it happened, but it happened, and all kernels after 5.9 > are affected by this. > > This commit is found in a (painful) git bisect between kernel 5.8 and 5.10. Thanks for the report. I have a patch here which can fix it. https://lore.kernel.org/kvm/1617011036-11734-1-git-send-email-wanpengli@tencent.com/ Wanpeng ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Commit "x86/kvm: Move context tracking where it belongs" broke guest time accounting @ 2021-04-07 11:01 ` Wanpeng Li 0 siblings, 0 replies; 14+ messages in thread From: Wanpeng Li @ 2021-04-07 11:01 UTC (permalink / raw) To: mjt; +Cc: Paolo Bonzini, Thomas Gleixner, qemu-devel@nongnu.org Developers, kvm On Wed, 7 Apr 2021 at 18:55, Michael Tokarev <mjt@tls.msk.ru> wrote: > > Hi! > > It looks like this commit: > > commit 87fa7f3e98a1310ef1ac1900e7ee7f9610a038bc > Author: Thomas Gleixner <tglx@linutronix.de> > Date: Wed Jul 8 21:51:54 2020 +0200 > > x86/kvm: Move context tracking where it belongs > > Context tracking for KVM happens way too early in the vcpu_run() > code. Anything after guest_enter_irqoff() and before guest_exit_irqoff() > cannot use RCU and should also be not instrumented. > > The current way of doing this covers way too much code. Move it closer to > the actual vmenter/exit code. > > broke kvm guest cpu time accounting - after this commit, when running > qemu-system-x86_64 -enable-kvm, the guest time (in /proc/stat and > elsewhere) is always 0. > > I dunno why it happened, but it happened, and all kernels after 5.9 > are affected by this. > > This commit is found in a (painful) git bisect between kernel 5.8 and 5.10. Thanks for the report. I have a patch here which can fix it. https://lore.kernel.org/kvm/1617011036-11734-1-git-send-email-wanpengli@tencent.com/ Wanpeng ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Commit "x86/kvm: Move context tracking where it belongs" broke guest time accounting 2021-04-06 20:17 Commit "x86/kvm: Move context tracking where it belongs" broke guest time accounting Michael Tokarev @ 2021-04-13 10:48 ` Wanpeng Li 2021-04-07 11:01 ` Wanpeng Li 2021-04-13 10:48 ` Wanpeng Li 2 siblings, 0 replies; 14+ messages in thread From: Wanpeng Li @ 2021-04-13 10:48 UTC (permalink / raw) To: Michael Tokarev Cc: kvm, qemu-devel@nongnu.org Developers, Thomas Gleixner, Paolo Bonzini On Wed, 7 Apr 2021 at 18:55, Michael Tokarev <mjt@tls.msk.ru> wrote: > > Hi! > > It looks like this commit: > > commit 87fa7f3e98a1310ef1ac1900e7ee7f9610a038bc > Author: Thomas Gleixner <tglx@linutronix.de> > Date: Wed Jul 8 21:51:54 2020 +0200 > > x86/kvm: Move context tracking where it belongs > > Context tracking for KVM happens way too early in the vcpu_run() > code. Anything after guest_enter_irqoff() and before guest_exit_irqoff() > cannot use RCU and should also be not instrumented. > > The current way of doing this covers way too much code. Move it closer to > the actual vmenter/exit code. > > broke kvm guest cpu time accounting - after this commit, when running > qemu-system-x86_64 -enable-kvm, the guest time (in /proc/stat and > elsewhere) is always 0. > > I dunno why it happened, but it happened, and all kernels after 5.9 > are affected by this. > > This commit is found in a (painful) git bisect between kernel 5.8 and 5.10. Hi Michael, Please have a try. https://lore.kernel.org/kvm/1618298169-3831-1-git-send-email-wanpengli@tencent.com/ Wanpeng ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Commit "x86/kvm: Move context tracking where it belongs" broke guest time accounting @ 2021-04-13 10:48 ` Wanpeng Li 0 siblings, 0 replies; 14+ messages in thread From: Wanpeng Li @ 2021-04-13 10:48 UTC (permalink / raw) To: Michael Tokarev Cc: Paolo Bonzini, Thomas Gleixner, qemu-devel@nongnu.org Developers, kvm On Wed, 7 Apr 2021 at 18:55, Michael Tokarev <mjt@tls.msk.ru> wrote: > > Hi! > > It looks like this commit: > > commit 87fa7f3e98a1310ef1ac1900e7ee7f9610a038bc > Author: Thomas Gleixner <tglx@linutronix.de> > Date: Wed Jul 8 21:51:54 2020 +0200 > > x86/kvm: Move context tracking where it belongs > > Context tracking for KVM happens way too early in the vcpu_run() > code. Anything after guest_enter_irqoff() and before guest_exit_irqoff() > cannot use RCU and should also be not instrumented. > > The current way of doing this covers way too much code. Move it closer to > the actual vmenter/exit code. > > broke kvm guest cpu time accounting - after this commit, when running > qemu-system-x86_64 -enable-kvm, the guest time (in /proc/stat and > elsewhere) is always 0. > > I dunno why it happened, but it happened, and all kernels after 5.9 > are affected by this. > > This commit is found in a (painful) git bisect between kernel 5.8 and 5.10. Hi Michael, Please have a try. https://lore.kernel.org/kvm/1618298169-3831-1-git-send-email-wanpengli@tencent.com/ Wanpeng ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2021-04-13 10:50 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-04-06 20:17 Commit "x86/kvm: Move context tracking where it belongs" broke guest time accounting Michael Tokarev 2021-04-06 21:47 ` Sean Christopherson 2021-04-08 13:18 ` Thomas Gleixner 2021-04-08 13:18 ` Thomas Gleixner 2021-04-09 2:14 ` Wanpeng Li 2021-04-09 2:14 ` Wanpeng Li 2021-04-09 8:13 ` Wanpeng Li 2021-04-09 8:13 ` Wanpeng Li 2021-04-09 9:19 ` Thomas Gleixner 2021-04-09 9:19 ` Thomas Gleixner 2021-04-07 11:01 ` Wanpeng Li 2021-04-07 11:01 ` Wanpeng Li 2021-04-13 10:48 ` Wanpeng Li 2021-04-13 10:48 ` Wanpeng Li
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.