KVM Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v3 0/9] KVM: Fix tick-based accounting for x86 guests
@ 2021-04-15 22:20 Sean Christopherson
  2021-04-15 22:20 ` [PATCH v3 1/9] context_tracking: Move guest exit context tracking to separate helpers Sean Christopherson
                   ` (9 more replies)
  0 siblings, 10 replies; 25+ messages in thread
From: Sean Christopherson @ 2021-04-15 22:20 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Thomas Gleixner,
	Michael Tokarev, Christian Borntraeger

This is a continuation of Wanpeng's series[1] to fix tick-based CPU time
accounting on x86, with my cleanups[2] bolted on top.  The core premise of
Wanpeng's patches are preserved, but they are heavily stripped down.
Specifically, only the "guest exit" paths are split, and no code is
consolidated.  The intent is to do as little as possible in the three
patches that need to be backported.  Keeping those changes as small as
possible also meant that my cleanups did not need to unwind much 
refactoring.

On x86, tested CONFIG_VIRT_CPU_ACCOUNTING_GEN =y and =n, and with
CONFIG_DEBUG_ENTRY=y && CONFIG_VALIDATE_STACKS=y.  Compile tested arm64,
MIPS, PPC, and s390, the latter with CONFIG_DEBUG_ENTRY=y for giggles.

One last note: I elected to use vtime_account_guest_exit() in the x86 code
instead of open coding these equivalents:

	if (vtime_accounting_enabled_this_cpu())
		vtime_guest_exit(current);
...
	if (!vtime_accounting_enabled_this_cpu())
		current->flags &= ~PF_VCPU;

With CONFIG_VIRT_CPU_ACCOUNTING_GEN=n, this is a complete non-issue, but
for the =y case it means context_tracking_enabled_this_cpu() is being
checked back-to-back.  The redundant checks bug me, but open coding the
gory details in x86 or providing funky variants in vtime.h felt worse.

Delta from Wanpeng's v2:

  - s/context_guest/context_tracking_guest, purely to match the existing
    functions.  I have no strong opinion either way.
  - Split only the "exit" functions.
  - Partially open code vcpu_account_guest_exit() and
    __vtime_account_guest_exit() in x86 to avoid churn when segueing into
    my cleanups (see above).

[1] https://lkml.kernel.org/r/1618298169-3831-1-git-send-email-wanpengli@tencent.com
[2] https://lkml.kernel.org/r/20210413182933.1046389-1-seanjc@google.com

Sean Christopherson (6):
  sched/vtime: Move vtime accounting external declarations above inlines
  sched/vtime: Move guest enter/exit vtime accounting to vtime.h
  context_tracking: Consolidate guest enter/exit wrappers
  context_tracking: KVM: Move guest enter/exit wrappers to KVM's domain
  KVM: x86: Consolidate guest enter/exit logic to common helpers
  KVM: Move instrumentation-safe annotations for enter/exit to x86 code

Wanpeng Li (3):
  context_tracking: Move guest exit context tracking to separate helpers
  context_tracking: Move guest exit vtime accounting to separate helpers
  KVM: x86: Defer tick-based accounting 'til after IRQ handling

 arch/x86/kvm/svm/svm.c           |  39 +--------
 arch/x86/kvm/vmx/vmx.c           |  39 +--------
 arch/x86/kvm/x86.c               |   8 ++
 arch/x86/kvm/x86.h               |  52 ++++++++++++
 include/linux/context_tracking.h |  92 ++++-----------------
 include/linux/kvm_host.h         |  38 +++++++++
 include/linux/vtime.h            | 138 +++++++++++++++++++------------
 7 files changed, 204 insertions(+), 202 deletions(-)

-- 
2.31.1.368.gbe11c130af-goog


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3 1/9] context_tracking: Move guest exit context tracking to separate helpers
  2021-04-15 22:20 [PATCH v3 0/9] KVM: Fix tick-based accounting for x86 guests Sean Christopherson
@ 2021-04-15 22:20 ` Sean Christopherson
  2021-04-20 18:48   ` Christian Borntraeger
  2021-04-21 10:57   ` Frederic Weisbecker
  2021-04-15 22:20 ` [PATCH v3 2/9] context_tracking: Move guest exit vtime accounting " Sean Christopherson
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 25+ messages in thread
From: Sean Christopherson @ 2021-04-15 22:20 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Thomas Gleixner,
	Michael Tokarev, Christian Borntraeger

From: Wanpeng Li <wanpengli@tencent.com>

Provide separate context tracking helpers for guest exit, the standalone
helpers will be called separately by KVM x86 in later patches to fix
tick-based accounting.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Michael Tokarev <mjt@tls.msk.ru>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 include/linux/context_tracking.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index bceb06498521..200d30cb3a82 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -131,10 +131,15 @@ static __always_inline void guest_enter_irqoff(void)
 	}
 }
 
-static __always_inline void guest_exit_irqoff(void)
+static __always_inline void context_tracking_guest_exit_irqoff(void)
 {
 	if (context_tracking_enabled())
 		__context_tracking_exit(CONTEXT_GUEST);
+}
+
+static __always_inline void guest_exit_irqoff(void)
+{
+	context_tracking_guest_exit_irqoff();
 
 	instrumentation_begin();
 	if (vtime_accounting_enabled_this_cpu())
@@ -159,6 +164,8 @@ static __always_inline void guest_enter_irqoff(void)
 	instrumentation_end();
 }
 
+static __always_inline void context_tracking_guest_exit_irqoff(void) { }
+
 static __always_inline void guest_exit_irqoff(void)
 {
 	instrumentation_begin();
-- 
2.31.1.368.gbe11c130af-goog


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3 2/9] context_tracking: Move guest exit vtime accounting to separate helpers
  2021-04-15 22:20 [PATCH v3 0/9] KVM: Fix tick-based accounting for x86 guests Sean Christopherson
  2021-04-15 22:20 ` [PATCH v3 1/9] context_tracking: Move guest exit context tracking to separate helpers Sean Christopherson
@ 2021-04-15 22:20 ` Sean Christopherson
  2021-04-20 18:48   ` Christian Borntraeger
  2021-04-15 22:21 ` [PATCH v3 3/9] KVM: x86: Defer tick-based accounting 'til after IRQ handling Sean Christopherson
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2021-04-15 22:20 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Thomas Gleixner,
	Michael Tokarev, Christian Borntraeger

From: Wanpeng Li <wanpengli@tencent.com>

Provide separate vtime accounting functions for guest exit instead of
open coding the logic within the context tracking code.  This will allow
KVM x86 to handle vtime accounting slightly differently when using
tick-based accounting.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Michael Tokarev <mjt@tls.msk.ru>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 include/linux/context_tracking.h | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index 200d30cb3a82..7cf03a8e5708 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -137,15 +137,20 @@ static __always_inline void context_tracking_guest_exit_irqoff(void)
 		__context_tracking_exit(CONTEXT_GUEST);
 }
 
-static __always_inline void guest_exit_irqoff(void)
+static __always_inline void vtime_account_guest_exit(void)
 {
-	context_tracking_guest_exit_irqoff();
-
-	instrumentation_begin();
 	if (vtime_accounting_enabled_this_cpu())
 		vtime_guest_exit(current);
 	else
 		current->flags &= ~PF_VCPU;
+}
+
+static __always_inline void guest_exit_irqoff(void)
+{
+	context_tracking_guest_exit_irqoff();
+
+	instrumentation_begin();
+	vtime_account_guest_exit();
 	instrumentation_end();
 }
 
@@ -166,12 +171,17 @@ static __always_inline void guest_enter_irqoff(void)
 
 static __always_inline void context_tracking_guest_exit_irqoff(void) { }
 
-static __always_inline void guest_exit_irqoff(void)
+static __always_inline void vtime_account_guest_exit(void)
 {
-	instrumentation_begin();
-	/* Flush the guest cputime we spent on the guest */
 	vtime_account_kernel(current);
 	current->flags &= ~PF_VCPU;
+}
+
+static __always_inline void guest_exit_irqoff(void)
+{
+	instrumentation_begin();
+	/* Flush the guest cputime we spent on the guest */
+	vtime_account_guest_exit();
 	instrumentation_end();
 }
 #endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */
-- 
2.31.1.368.gbe11c130af-goog


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3 3/9] KVM: x86: Defer tick-based accounting 'til after IRQ handling
  2021-04-15 22:20 [PATCH v3 0/9] KVM: Fix tick-based accounting for x86 guests Sean Christopherson
  2021-04-15 22:20 ` [PATCH v3 1/9] context_tracking: Move guest exit context tracking to separate helpers Sean Christopherson
  2021-04-15 22:20 ` [PATCH v3 2/9] context_tracking: Move guest exit vtime accounting " Sean Christopherson
@ 2021-04-15 22:21 ` Sean Christopherson
  2021-04-20 23:14   ` Frederic Weisbecker
  2021-04-21 10:07   ` Frederic Weisbecker
  2021-04-15 22:21 ` [PATCH v3 4/9] sched/vtime: Move vtime accounting external declarations above inlines Sean Christopherson
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 25+ messages in thread
From: Sean Christopherson @ 2021-04-15 22:21 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Thomas Gleixner,
	Michael Tokarev, Christian Borntraeger

From: Wanpeng Li <wanpengli@tencent.com>

When using tick-based accounting, defer the call to account guest time
until after servicing any IRQ(s) that happened in the guest or
immediately after VM-Exit.  Tick-based accounting of vCPU time relies on
PF_VCPU being set when the tick IRQ handler runs, and IRQs are blocked
throughout {svm,vmx}_vcpu_enter_exit().

This fixes a bug[*] where reported guest time remains '0', even when
running an infinite loop in the guest.

[*] https://bugzilla.kernel.org/show_bug.cgi?id=209831

Fixes: 87fa7f3e98a131 ("x86/kvm: Move context tracking where it belongs")
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Michael Tokarev <mjt@tls.msk.ru>
Cc: stable@vger.kernel.org#v5.9-rc1+
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 13 ++++++++++---
 arch/x86/kvm/vmx/vmx.c | 13 ++++++++++---
 arch/x86/kvm/x86.c     |  8 ++++++++
 3 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 48b396f33bee..bb2aa0dde7c5 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3750,17 +3750,24 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu)
 	 * have them in state 'on' as recorded before entering guest mode.
 	 * Same as enter_from_user_mode().
 	 *
-	 * guest_exit_irqoff() restores host context and reinstates RCU if
-	 * enabled and required.
+	 * context_tracking_guest_exit_irqoff() restores host context and
+	 * reinstates RCU if enabled and required.
 	 *
 	 * This needs to be done before the below as native_read_msr()
 	 * contains a tracepoint and x86_spec_ctrl_restore_host() calls
 	 * into world and some more.
 	 */
 	lockdep_hardirqs_off(CALLER_ADDR0);
-	guest_exit_irqoff();
+	context_tracking_guest_exit_irqoff();
 
 	instrumentation_begin();
+	/*
+	 * Account guest time when precise accounting based on context tracking
+	 * is enabled.  Tick based accounting is deferred until after IRQs that
+	 * occurred within the VM-Enter/VM-Exit "window" are handled.
+	 */
+	if (vtime_accounting_enabled_this_cpu())
+		vtime_account_guest_exit();
 	trace_hardirqs_off_finish();
 	instrumentation_end();
 }
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c05e6e2854b5..5ae9dc197048 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6639,17 +6639,24 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 	 * have them in state 'on' as recorded before entering guest mode.
 	 * Same as enter_from_user_mode().
 	 *
-	 * guest_exit_irqoff() restores host context and reinstates RCU if
-	 * enabled and required.
+	 * context_tracking_guest_exit_irqoff() restores host context and
+	 * reinstates RCU if enabled and required.
 	 *
 	 * This needs to be done before the below as native_read_msr()
 	 * contains a tracepoint and x86_spec_ctrl_restore_host() calls
 	 * into world and some more.
 	 */
 	lockdep_hardirqs_off(CALLER_ADDR0);
-	guest_exit_irqoff();
+	context_tracking_guest_exit_irqoff();
 
 	instrumentation_begin();
+	/*
+	 * Account guest time when precise accounting based on context tracking
+	 * is enabled.  Tick based accounting is deferred until after IRQs that
+	 * occurred within the VM-Enter/VM-Exit "window" are handled.
+	 */
+	if (vtime_accounting_enabled_this_cpu())
+		vtime_account_guest_exit();
 	trace_hardirqs_off_finish();
 	instrumentation_end();
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 16fb39503296..e4d475df1d4a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9230,6 +9230,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	local_irq_disable();
 	kvm_after_interrupt(vcpu);
 
+	/*
+	 * When using tick-based accounting, wait until after servicing IRQs to
+	 * account guest time so that any ticks that occurred while running the
+	 * guest are properly accounted to the guest.
+	 */
+	if (!vtime_accounting_enabled_this_cpu())
+		vtime_account_guest_exit();
+
 	if (lapic_in_kernel(vcpu)) {
 		s64 delta = vcpu->arch.apic->lapic_timer.advance_expire_delta;
 		if (delta != S64_MIN) {
-- 
2.31.1.368.gbe11c130af-goog


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3 4/9] sched/vtime: Move vtime accounting external declarations above inlines
  2021-04-15 22:20 [PATCH v3 0/9] KVM: Fix tick-based accounting for x86 guests Sean Christopherson
                   ` (2 preceding siblings ...)
  2021-04-15 22:21 ` [PATCH v3 3/9] KVM: x86: Defer tick-based accounting 'til after IRQ handling Sean Christopherson
@ 2021-04-15 22:21 ` Sean Christopherson
  2021-04-21  7:02   ` Christian Borntraeger
  2021-04-15 22:21 ` [PATCH v3 5/9] sched/vtime: Move guest enter/exit vtime accounting to vtime.h Sean Christopherson
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2021-04-15 22:21 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Thomas Gleixner,
	Michael Tokarev, Christian Borntraeger

Move the blob of external declarations (and their stubs) above the set of
inline definitions (and their stubs) for vtime accounting.  This will
allow a future patch to bring in more inline definitions without also
having to shuffle large chunks of code.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 include/linux/vtime.h | 94 +++++++++++++++++++++----------------------
 1 file changed, 47 insertions(+), 47 deletions(-)

diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index 041d6524d144..6a4317560539 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -10,53 +10,6 @@
 
 struct task_struct;
 
-/*
- * vtime_accounting_enabled_this_cpu() definitions/declarations
- */
-#if defined(CONFIG_VIRT_CPU_ACCOUNTING_NATIVE)
-
-static inline bool vtime_accounting_enabled_this_cpu(void) { return true; }
-extern void vtime_task_switch(struct task_struct *prev);
-
-#elif defined(CONFIG_VIRT_CPU_ACCOUNTING_GEN)
-
-/*
- * Checks if vtime is enabled on some CPU. Cputime readers want to be careful
- * in that case and compute the tickless cputime.
- * For now vtime state is tied to context tracking. We might want to decouple
- * those later if necessary.
- */
-static inline bool vtime_accounting_enabled(void)
-{
-	return context_tracking_enabled();
-}
-
-static inline bool vtime_accounting_enabled_cpu(int cpu)
-{
-	return context_tracking_enabled_cpu(cpu);
-}
-
-static inline bool vtime_accounting_enabled_this_cpu(void)
-{
-	return context_tracking_enabled_this_cpu();
-}
-
-extern void vtime_task_switch_generic(struct task_struct *prev);
-
-static inline void vtime_task_switch(struct task_struct *prev)
-{
-	if (vtime_accounting_enabled_this_cpu())
-		vtime_task_switch_generic(prev);
-}
-
-#else /* !CONFIG_VIRT_CPU_ACCOUNTING */
-
-static inline bool vtime_accounting_enabled_cpu(int cpu) {return false; }
-static inline bool vtime_accounting_enabled_this_cpu(void) { return false; }
-static inline void vtime_task_switch(struct task_struct *prev) { }
-
-#endif
-
 /*
  * Common vtime APIs
  */
@@ -94,6 +47,53 @@ static inline void vtime_account_hardirq(struct task_struct *tsk) { }
 static inline void vtime_flush(struct task_struct *tsk) { }
 #endif
 
+/*
+ * vtime_accounting_enabled_this_cpu() definitions/declarations
+ */
+#if defined(CONFIG_VIRT_CPU_ACCOUNTING_NATIVE)
+
+static inline bool vtime_accounting_enabled_this_cpu(void) { return true; }
+extern void vtime_task_switch(struct task_struct *prev);
+
+#elif defined(CONFIG_VIRT_CPU_ACCOUNTING_GEN)
+
+/*
+ * Checks if vtime is enabled on some CPU. Cputime readers want to be careful
+ * in that case and compute the tickless cputime.
+ * For now vtime state is tied to context tracking. We might want to decouple
+ * those later if necessary.
+ */
+static inline bool vtime_accounting_enabled(void)
+{
+	return context_tracking_enabled();
+}
+
+static inline bool vtime_accounting_enabled_cpu(int cpu)
+{
+	return context_tracking_enabled_cpu(cpu);
+}
+
+static inline bool vtime_accounting_enabled_this_cpu(void)
+{
+	return context_tracking_enabled_this_cpu();
+}
+
+extern void vtime_task_switch_generic(struct task_struct *prev);
+
+static inline void vtime_task_switch(struct task_struct *prev)
+{
+	if (vtime_accounting_enabled_this_cpu())
+		vtime_task_switch_generic(prev);
+}
+
+#else /* !CONFIG_VIRT_CPU_ACCOUNTING */
+
+static inline bool vtime_accounting_enabled_cpu(int cpu) {return false; }
+static inline bool vtime_accounting_enabled_this_cpu(void) { return false; }
+static inline void vtime_task_switch(struct task_struct *prev) { }
+
+#endif
+
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
 extern void irqtime_account_irq(struct task_struct *tsk, unsigned int offset);
-- 
2.31.1.368.gbe11c130af-goog


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3 5/9] sched/vtime: Move guest enter/exit vtime accounting to vtime.h
  2021-04-15 22:20 [PATCH v3 0/9] KVM: Fix tick-based accounting for x86 guests Sean Christopherson
                   ` (3 preceding siblings ...)
  2021-04-15 22:21 ` [PATCH v3 4/9] sched/vtime: Move vtime accounting external declarations above inlines Sean Christopherson
@ 2021-04-15 22:21 ` Sean Christopherson
  2021-04-15 22:21 ` [PATCH v3 6/9] context_tracking: Consolidate guest enter/exit wrappers Sean Christopherson
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Sean Christopherson @ 2021-04-15 22:21 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Thomas Gleixner,
	Michael Tokarev, Christian Borntraeger

Provide separate helpers for guest enter vtime accounting (in addition to
the existing guest exit helpers), and move all vtime accounting helpers
to vtime.h where the existing #ifdef infrastructure can be leveraged to
better delineate the different types of accounting.  This will also allow
future cleanups via deduplication of context tracking code.

Opportunstically delete the vtime_account_kernel() stub now that all
callers are wrapped with CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 include/linux/context_tracking.h | 17 +-----------
 include/linux/vtime.h            | 46 +++++++++++++++++++++++++++-----
 2 files changed, 41 insertions(+), 22 deletions(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index 7cf03a8e5708..1c05035396ad 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -137,14 +137,6 @@ static __always_inline void context_tracking_guest_exit_irqoff(void)
 		__context_tracking_exit(CONTEXT_GUEST);
 }
 
-static __always_inline void vtime_account_guest_exit(void)
-{
-	if (vtime_accounting_enabled_this_cpu())
-		vtime_guest_exit(current);
-	else
-		current->flags &= ~PF_VCPU;
-}
-
 static __always_inline void guest_exit_irqoff(void)
 {
 	context_tracking_guest_exit_irqoff();
@@ -163,20 +155,13 @@ static __always_inline void guest_enter_irqoff(void)
 	 * to flush.
 	 */
 	instrumentation_begin();
-	vtime_account_kernel(current);
-	current->flags |= PF_VCPU;
+	vtime_account_guest_enter();
 	rcu_virt_note_context_switch(smp_processor_id());
 	instrumentation_end();
 }
 
 static __always_inline void context_tracking_guest_exit_irqoff(void) { }
 
-static __always_inline void vtime_account_guest_exit(void)
-{
-	vtime_account_kernel(current);
-	current->flags &= ~PF_VCPU;
-}
-
 static __always_inline void guest_exit_irqoff(void)
 {
 	instrumentation_begin();
diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index 6a4317560539..3684487d01e1 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -3,21 +3,18 @@
 #define _LINUX_KERNEL_VTIME_H
 
 #include <linux/context_tracking_state.h>
+#include <linux/sched.h>
+
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
 #include <asm/vtime.h>
 #endif
 
-
-struct task_struct;
-
 /*
  * Common vtime APIs
  */
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING
 extern void vtime_account_kernel(struct task_struct *tsk);
 extern void vtime_account_idle(struct task_struct *tsk);
-#else /* !CONFIG_VIRT_CPU_ACCOUNTING */
-static inline void vtime_account_kernel(struct task_struct *tsk) { }
 #endif /* !CONFIG_VIRT_CPU_ACCOUNTING */
 
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
@@ -55,6 +52,18 @@ static inline void vtime_flush(struct task_struct *tsk) { }
 static inline bool vtime_accounting_enabled_this_cpu(void) { return true; }
 extern void vtime_task_switch(struct task_struct *prev);
 
+static __always_inline void vtime_account_guest_enter(void)
+{
+	vtime_account_kernel(current);
+	current->flags |= PF_VCPU;
+}
+
+static __always_inline void vtime_account_guest_exit(void)
+{
+	vtime_account_kernel(current);
+	current->flags &= ~PF_VCPU;
+}
+
 #elif defined(CONFIG_VIRT_CPU_ACCOUNTING_GEN)
 
 /*
@@ -86,12 +95,37 @@ static inline void vtime_task_switch(struct task_struct *prev)
 		vtime_task_switch_generic(prev);
 }
 
+static __always_inline void vtime_account_guest_enter(void)
+{
+	if (vtime_accounting_enabled_this_cpu())
+		vtime_guest_enter(current);
+	else
+		current->flags |= PF_VCPU;
+}
+
+static __always_inline void vtime_account_guest_exit(void)
+{
+	if (vtime_accounting_enabled_this_cpu())
+		vtime_guest_exit(current);
+	else
+		current->flags &= ~PF_VCPU;
+}
+
 #else /* !CONFIG_VIRT_CPU_ACCOUNTING */
 
-static inline bool vtime_accounting_enabled_cpu(int cpu) {return false; }
 static inline bool vtime_accounting_enabled_this_cpu(void) { return false; }
 static inline void vtime_task_switch(struct task_struct *prev) { }
 
+static __always_inline void vtime_account_guest_enter(void)
+{
+	current->flags |= PF_VCPU;
+}
+
+static __always_inline void vtime_account_guest_exit(void)
+{
+	current->flags &= ~PF_VCPU;
+}
+
 #endif
 
 
-- 
2.31.1.368.gbe11c130af-goog


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3 6/9] context_tracking: Consolidate guest enter/exit wrappers
  2021-04-15 22:20 [PATCH v3 0/9] KVM: Fix tick-based accounting for x86 guests Sean Christopherson
                   ` (4 preceding siblings ...)
  2021-04-15 22:21 ` [PATCH v3 5/9] sched/vtime: Move guest enter/exit vtime accounting to vtime.h Sean Christopherson
@ 2021-04-15 22:21 ` Sean Christopherson
  2021-04-15 22:21 ` [PATCH v3 7/9] context_tracking: KVM: Move guest enter/exit wrappers to KVM's domain Sean Christopherson
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Sean Christopherson @ 2021-04-15 22:21 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Thomas Gleixner,
	Michael Tokarev, Christian Borntraeger

Consolidate the guest enter/exit wrappers, providing and tweaking stubs
as needed.  This will allow moving the wrappers under KVM without having
to bleed #ifdefs into the soon-to-be KVM code.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 include/linux/context_tracking.h | 65 ++++++++++++--------------------
 1 file changed, 24 insertions(+), 41 deletions(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index 1c05035396ad..e172a547b2d0 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -71,6 +71,19 @@ static inline void exception_exit(enum ctx_state prev_ctx)
 	}
 }
 
+static __always_inline bool context_tracking_guest_enter_irqoff(void)
+{
+	if (context_tracking_enabled())
+		__context_tracking_enter(CONTEXT_GUEST);
+
+	return context_tracking_enabled_this_cpu();
+}
+
+static __always_inline void context_tracking_guest_exit_irqoff(void)
+{
+	if (context_tracking_enabled())
+		__context_tracking_exit(CONTEXT_GUEST);
+}
 
 /**
  * ct_state() - return the current context tracking state if known
@@ -92,6 +105,9 @@ static inline void user_exit_irqoff(void) { }
 static inline enum ctx_state exception_enter(void) { return 0; }
 static inline void exception_exit(enum ctx_state prev_ctx) { }
 static inline enum ctx_state ct_state(void) { return CONTEXT_DISABLED; }
+static inline bool context_tracking_guest_enter_irqoff(void) { return false; }
+static inline void context_tracking_guest_exit_irqoff(void) { }
+
 #endif /* !CONFIG_CONTEXT_TRACKING */
 
 #define CT_WARN_ON(cond) WARN_ON(context_tracking_enabled() && (cond))
@@ -102,74 +118,41 @@ extern void context_tracking_init(void);
 static inline void context_tracking_init(void) { }
 #endif /* CONFIG_CONTEXT_TRACKING_FORCE */
 
-
-#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
 /* must be called with irqs disabled */
 static __always_inline void guest_enter_irqoff(void)
 {
+	/*
+	 * This is running in ioctl context so its safe to assume that it's the
+	 * stime pending cputime to flush.
+	 */
 	instrumentation_begin();
-	if (vtime_accounting_enabled_this_cpu())
-		vtime_guest_enter(current);
-	else
-		current->flags |= PF_VCPU;
+	vtime_account_guest_enter();
 	instrumentation_end();
 
-	if (context_tracking_enabled())
-		__context_tracking_enter(CONTEXT_GUEST);
-
-	/* KVM does not hold any references to rcu protected data when it
+	/*
+	 * KVM does not hold any references to rcu protected data when it
 	 * switches CPU into a guest mode. In fact switching to a guest mode
 	 * is very similar to exiting to userspace from rcu point of view. In
 	 * addition CPU may stay in a guest mode for quite a long time (up to
 	 * one time slice). Lets treat guest mode as quiescent state, just like
 	 * we do with user-mode execution.
 	 */
-	if (!context_tracking_enabled_this_cpu()) {
+	if (!context_tracking_guest_enter_irqoff()) {
 		instrumentation_begin();
 		rcu_virt_note_context_switch(smp_processor_id());
 		instrumentation_end();
 	}
 }
 
-static __always_inline void context_tracking_guest_exit_irqoff(void)
-{
-	if (context_tracking_enabled())
-		__context_tracking_exit(CONTEXT_GUEST);
-}
-
 static __always_inline void guest_exit_irqoff(void)
 {
 	context_tracking_guest_exit_irqoff();
 
-	instrumentation_begin();
-	vtime_account_guest_exit();
-	instrumentation_end();
-}
-
-#else
-static __always_inline void guest_enter_irqoff(void)
-{
-	/*
-	 * This is running in ioctl context so its safe
-	 * to assume that it's the stime pending cputime
-	 * to flush.
-	 */
-	instrumentation_begin();
-	vtime_account_guest_enter();
-	rcu_virt_note_context_switch(smp_processor_id());
-	instrumentation_end();
-}
-
-static __always_inline void context_tracking_guest_exit_irqoff(void) { }
-
-static __always_inline void guest_exit_irqoff(void)
-{
 	instrumentation_begin();
 	/* Flush the guest cputime we spent on the guest */
 	vtime_account_guest_exit();
 	instrumentation_end();
 }
-#endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */
 
 static inline void guest_exit(void)
 {
-- 
2.31.1.368.gbe11c130af-goog


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3 7/9] context_tracking: KVM: Move guest enter/exit wrappers to KVM's domain
  2021-04-15 22:20 [PATCH v3 0/9] KVM: Fix tick-based accounting for x86 guests Sean Christopherson
                   ` (5 preceding siblings ...)
  2021-04-15 22:21 ` [PATCH v3 6/9] context_tracking: Consolidate guest enter/exit wrappers Sean Christopherson
@ 2021-04-15 22:21 ` Sean Christopherson
  2021-04-21  7:10   ` Christian Borntraeger
  2021-04-15 22:21 ` [PATCH v3 8/9] KVM: x86: Consolidate guest enter/exit logic to common helpers Sean Christopherson
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2021-04-15 22:21 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Thomas Gleixner,
	Michael Tokarev, Christian Borntraeger

Move the guest enter/exit wrappers to kvm_host.h so that KVM can manage
its context tracking vs. vtime accounting without bleeding too many KVM
details into the context tracking code.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 include/linux/context_tracking.h | 45 --------------------------------
 include/linux/kvm_host.h         | 45 ++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index e172a547b2d0..d4dc9c4d79aa 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -118,49 +118,4 @@ extern void context_tracking_init(void);
 static inline void context_tracking_init(void) { }
 #endif /* CONFIG_CONTEXT_TRACKING_FORCE */
 
-/* must be called with irqs disabled */
-static __always_inline void guest_enter_irqoff(void)
-{
-	/*
-	 * This is running in ioctl context so its safe to assume that it's the
-	 * stime pending cputime to flush.
-	 */
-	instrumentation_begin();
-	vtime_account_guest_enter();
-	instrumentation_end();
-
-	/*
-	 * KVM does not hold any references to rcu protected data when it
-	 * switches CPU into a guest mode. In fact switching to a guest mode
-	 * is very similar to exiting to userspace from rcu point of view. In
-	 * addition CPU may stay in a guest mode for quite a long time (up to
-	 * one time slice). Lets treat guest mode as quiescent state, just like
-	 * we do with user-mode execution.
-	 */
-	if (!context_tracking_guest_enter_irqoff()) {
-		instrumentation_begin();
-		rcu_virt_note_context_switch(smp_processor_id());
-		instrumentation_end();
-	}
-}
-
-static __always_inline void guest_exit_irqoff(void)
-{
-	context_tracking_guest_exit_irqoff();
-
-	instrumentation_begin();
-	/* Flush the guest cputime we spent on the guest */
-	vtime_account_guest_exit();
-	instrumentation_end();
-}
-
-static inline void guest_exit(void)
-{
-	unsigned long flags;
-
-	local_irq_save(flags);
-	guest_exit_irqoff();
-	local_irq_restore(flags);
-}
-
 #endif
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 3b06d12ec37e..444d5f0225cb 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -332,6 +332,51 @@ struct kvm_vcpu {
 	struct kvm_dirty_ring dirty_ring;
 };
 
+/* must be called with irqs disabled */
+static __always_inline void guest_enter_irqoff(void)
+{
+	/*
+	 * This is running in ioctl context so its safe to assume that it's the
+	 * stime pending cputime to flush.
+	 */
+	instrumentation_begin();
+	vtime_account_guest_enter();
+	instrumentation_end();
+
+	/*
+	 * KVM does not hold any references to rcu protected data when it
+	 * switches CPU into a guest mode. In fact switching to a guest mode
+	 * is very similar to exiting to userspace from rcu point of view. In
+	 * addition CPU may stay in a guest mode for quite a long time (up to
+	 * one time slice). Lets treat guest mode as quiescent state, just like
+	 * we do with user-mode execution.
+	 */
+	if (!context_tracking_guest_enter_irqoff()) {
+		instrumentation_begin();
+		rcu_virt_note_context_switch(smp_processor_id());
+		instrumentation_end();
+	}
+}
+
+static __always_inline void guest_exit_irqoff(void)
+{
+	context_tracking_guest_exit_irqoff();
+
+	instrumentation_begin();
+	/* Flush the guest cputime we spent on the guest */
+	vtime_account_guest_exit();
+	instrumentation_end();
+}
+
+static inline void guest_exit(void)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	guest_exit_irqoff();
+	local_irq_restore(flags);
+}
+
 static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
 {
 	/*
-- 
2.31.1.368.gbe11c130af-goog


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3 8/9] KVM: x86: Consolidate guest enter/exit logic to common helpers
  2021-04-15 22:20 [PATCH v3 0/9] KVM: Fix tick-based accounting for x86 guests Sean Christopherson
                   ` (6 preceding siblings ...)
  2021-04-15 22:21 ` [PATCH v3 7/9] context_tracking: KVM: Move guest enter/exit wrappers to KVM's domain Sean Christopherson
@ 2021-04-15 22:21 ` Sean Christopherson
  2021-04-15 22:21 ` [PATCH v3 9/9] KVM: Move instrumentation-safe annotations for enter/exit to x86 code Sean Christopherson
  2021-04-20 23:33 ` [PATCH v3 0/9] KVM: Fix tick-based accounting for x86 guests Frederic Weisbecker
  9 siblings, 0 replies; 25+ messages in thread
From: Sean Christopherson @ 2021-04-15 22:21 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Thomas Gleixner,
	Michael Tokarev, Christian Borntraeger

Move the enter/exit logic in {svm,vmx}_vcpu_enter_exit() to common
helpers.  Opportunistically update the somewhat stale comment about the
updates needing to occur immediately after VM-Exit.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 46 ++-----------------------------------
 arch/x86/kvm/vmx/vmx.c | 46 ++-----------------------------------
 arch/x86/kvm/x86.h     | 52 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 56 insertions(+), 88 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index bb2aa0dde7c5..0677595d07e5 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3713,25 +3713,7 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	/*
-	 * VMENTER enables interrupts (host state), but the kernel state is
-	 * interrupts disabled when this is invoked. Also tell RCU about
-	 * it. This is the same logic as for exit_to_user_mode().
-	 *
-	 * This ensures that e.g. latency analysis on the host observes
-	 * guest mode as interrupt enabled.
-	 *
-	 * guest_enter_irqoff() informs context tracking about the
-	 * transition to guest mode and if enabled adjusts RCU state
-	 * accordingly.
-	 */
-	instrumentation_begin();
-	trace_hardirqs_on_prepare();
-	lockdep_hardirqs_on_prepare(CALLER_ADDR0);
-	instrumentation_end();
-
-	guest_enter_irqoff();
-	lockdep_hardirqs_on(CALLER_ADDR0);
+	kvm_guest_enter_irqoff();
 
 	if (sev_es_guest(vcpu->kvm)) {
 		__svm_sev_es_vcpu_run(svm->vmcb_pa);
@@ -3745,31 +3727,7 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu)
 		vmload(__sme_page_pa(sd->save_area));
 	}
 
-	/*
-	 * VMEXIT disables interrupts (host state), but tracing and lockdep
-	 * have them in state 'on' as recorded before entering guest mode.
-	 * Same as enter_from_user_mode().
-	 *
-	 * context_tracking_guest_exit_irqoff() restores host context and
-	 * reinstates RCU if enabled and required.
-	 *
-	 * This needs to be done before the below as native_read_msr()
-	 * contains a tracepoint and x86_spec_ctrl_restore_host() calls
-	 * into world and some more.
-	 */
-	lockdep_hardirqs_off(CALLER_ADDR0);
-	context_tracking_guest_exit_irqoff();
-
-	instrumentation_begin();
-	/*
-	 * Account guest time when precise accounting based on context tracking
-	 * is enabled.  Tick based accounting is deferred until after IRQs that
-	 * occurred within the VM-Enter/VM-Exit "window" are handled.
-	 */
-	if (vtime_accounting_enabled_this_cpu())
-		vtime_account_guest_exit();
-	trace_hardirqs_off_finish();
-	instrumentation_end();
+	kvm_guest_exit_irqoff();
 }
 
 static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 5ae9dc197048..19b0e25bf598 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6600,25 +6600,7 @@ static fastpath_t vmx_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
 static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 					struct vcpu_vmx *vmx)
 {
-	/*
-	 * VMENTER enables interrupts (host state), but the kernel state is
-	 * interrupts disabled when this is invoked. Also tell RCU about
-	 * it. This is the same logic as for exit_to_user_mode().
-	 *
-	 * This ensures that e.g. latency analysis on the host observes
-	 * guest mode as interrupt enabled.
-	 *
-	 * guest_enter_irqoff() informs context tracking about the
-	 * transition to guest mode and if enabled adjusts RCU state
-	 * accordingly.
-	 */
-	instrumentation_begin();
-	trace_hardirqs_on_prepare();
-	lockdep_hardirqs_on_prepare(CALLER_ADDR0);
-	instrumentation_end();
-
-	guest_enter_irqoff();
-	lockdep_hardirqs_on(CALLER_ADDR0);
+	kvm_guest_enter_irqoff();
 
 	/* L1D Flush includes CPU buffer clear to mitigate MDS */
 	if (static_branch_unlikely(&vmx_l1d_should_flush))
@@ -6634,31 +6616,7 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 
 	vcpu->arch.cr2 = native_read_cr2();
 
-	/*
-	 * VMEXIT disables interrupts (host state), but tracing and lockdep
-	 * have them in state 'on' as recorded before entering guest mode.
-	 * Same as enter_from_user_mode().
-	 *
-	 * context_tracking_guest_exit_irqoff() restores host context and
-	 * reinstates RCU if enabled and required.
-	 *
-	 * This needs to be done before the below as native_read_msr()
-	 * contains a tracepoint and x86_spec_ctrl_restore_host() calls
-	 * into world and some more.
-	 */
-	lockdep_hardirqs_off(CALLER_ADDR0);
-	context_tracking_guest_exit_irqoff();
-
-	instrumentation_begin();
-	/*
-	 * Account guest time when precise accounting based on context tracking
-	 * is enabled.  Tick based accounting is deferred until after IRQs that
-	 * occurred within the VM-Enter/VM-Exit "window" are handled.
-	 */
-	if (vtime_accounting_enabled_this_cpu())
-		vtime_account_guest_exit();
-	trace_hardirqs_off_finish();
-	instrumentation_end();
+	kvm_guest_exit_irqoff();
 }
 
 static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index daccf20fbcd5..285953e81777 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -8,6 +8,58 @@
 #include "kvm_cache_regs.h"
 #include "kvm_emulate.h"
 
+static __always_inline void kvm_guest_enter_irqoff(void)
+{
+	/*
+	 * VMENTER enables interrupts (host state), but the kernel state is
+	 * interrupts disabled when this is invoked. Also tell RCU about
+	 * it. This is the same logic as for exit_to_user_mode().
+	 *
+	 * This ensures that e.g. latency analysis on the host observes
+	 * guest mode as interrupt enabled.
+	 *
+	 * guest_enter_irqoff() informs context tracking about the
+	 * transition to guest mode and if enabled adjusts RCU state
+	 * accordingly.
+	 */
+	instrumentation_begin();
+	trace_hardirqs_on_prepare();
+	lockdep_hardirqs_on_prepare(CALLER_ADDR0);
+	instrumentation_end();
+
+	guest_enter_irqoff();
+	lockdep_hardirqs_on(CALLER_ADDR0);
+}
+
+static __always_inline void kvm_guest_exit_irqoff(void)
+{
+	/*
+	 * VMEXIT disables interrupts (host state), but tracing and lockdep
+	 * have them in state 'on' as recorded before entering guest mode.
+	 * Same as enter_from_user_mode().
+	 *
+	 * context_tracking_guest_exit_irqoff() restores host context and
+	 * reinstates RCU if enabled and required.
+	 *
+	 * This needs to be done immediately after VM-Exit, before any code
+	 * that might contain tracepoints or call out to the greater world,
+	 * e.g. before x86_spec_ctrl_restore_host().
+	 */
+	lockdep_hardirqs_off(CALLER_ADDR0);
+	context_tracking_guest_exit_irqoff();
+
+	instrumentation_begin();
+	/*
+	 * Account guest time when precise accounting based on context tracking
+	 * is enabled.  Tick based accounting is deferred until after IRQs that
+	 * occurred within the VM-Enter/VM-Exit "window" are handled.
+	 */
+	if (vtime_accounting_enabled_this_cpu())
+		vtime_account_guest_exit();
+	trace_hardirqs_off_finish();
+	instrumentation_end();
+}
+
 #define KVM_NESTED_VMENTER_CONSISTENCY_CHECK(consistency_check)		\
 ({									\
 	bool failed = (consistency_check);				\
-- 
2.31.1.368.gbe11c130af-goog


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3 9/9] KVM: Move instrumentation-safe annotations for enter/exit to x86 code
  2021-04-15 22:20 [PATCH v3 0/9] KVM: Fix tick-based accounting for x86 guests Sean Christopherson
                   ` (7 preceding siblings ...)
  2021-04-15 22:21 ` [PATCH v3 8/9] KVM: x86: Consolidate guest enter/exit logic to common helpers Sean Christopherson
@ 2021-04-15 22:21 ` Sean Christopherson
  2021-04-21  8:09   ` Christian Borntraeger
  2021-04-20 23:33 ` [PATCH v3 0/9] KVM: Fix tick-based accounting for x86 guests Frederic Weisbecker
  9 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2021-04-15 22:21 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Thomas Gleixner,
	Michael Tokarev, Christian Borntraeger

Drop the instrumentation_{begin,end}() annonations from the common KVM
guest enter/exit helpers, and massage the x86 code as needed to preserve
the necessary annotations.  x86 is the only architecture whose transition
flow is tagged as noinstr, and more specifically, it is the only
architecture for which instrumentation_{begin,end}() can be non-empty.

No other architecture supports CONFIG_STACK_VALIDATION=y, and s390 is the
only other architecture that support CONFIG_DEBUG_ENTRY=y.  For
instrumentation annontations to be meaningful, both aformentioned configs
must be enabled.

Letting x86 deal with the annotations avoids unnecessary nops by
squashing back-to-back instrumention-safe sequences.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.h       | 4 ++--
 include/linux/kvm_host.h | 9 +--------
 2 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 285953e81777..b17857ac540b 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -25,9 +25,9 @@ static __always_inline void kvm_guest_enter_irqoff(void)
 	instrumentation_begin();
 	trace_hardirqs_on_prepare();
 	lockdep_hardirqs_on_prepare(CALLER_ADDR0);
-	instrumentation_end();
-
 	guest_enter_irqoff();
+	instrumentation_end();
+
 	lockdep_hardirqs_on(CALLER_ADDR0);
 }
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 444d5f0225cb..e5eb64019f47 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -339,9 +339,7 @@ static __always_inline void guest_enter_irqoff(void)
 	 * This is running in ioctl context so its safe to assume that it's the
 	 * stime pending cputime to flush.
 	 */
-	instrumentation_begin();
 	vtime_account_guest_enter();
-	instrumentation_end();
 
 	/*
 	 * KVM does not hold any references to rcu protected data when it
@@ -351,21 +349,16 @@ static __always_inline void guest_enter_irqoff(void)
 	 * one time slice). Lets treat guest mode as quiescent state, just like
 	 * we do with user-mode execution.
 	 */
-	if (!context_tracking_guest_enter_irqoff()) {
-		instrumentation_begin();
+	if (!context_tracking_guest_enter_irqoff())
 		rcu_virt_note_context_switch(smp_processor_id());
-		instrumentation_end();
-	}
 }
 
 static __always_inline void guest_exit_irqoff(void)
 {
 	context_tracking_guest_exit_irqoff();
 
-	instrumentation_begin();
 	/* Flush the guest cputime we spent on the guest */
 	vtime_account_guest_exit();
-	instrumentation_end();
 }
 
 static inline void guest_exit(void)
-- 
2.31.1.368.gbe11c130af-goog


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 1/9] context_tracking: Move guest exit context tracking to separate helpers
  2021-04-15 22:20 ` [PATCH v3 1/9] context_tracking: Move guest exit context tracking to separate helpers Sean Christopherson
@ 2021-04-20 18:48   ` Christian Borntraeger
  2021-04-21 10:57   ` Frederic Weisbecker
  1 sibling, 0 replies; 25+ messages in thread
From: Christian Borntraeger @ 2021-04-20 18:48 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Thomas Gleixner, Michael Tokarev



On 16.04.21 00:20, Sean Christopherson wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
> Provide separate context tracking helpers for guest exit, the standalone
> helpers will be called separately by KVM x86 in later patches to fix
> tick-based accounting.
> 
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Sean Christopherson <seanjc@google.com>
> Cc: Michael Tokarev <mjt@tls.msk.ru>
> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> Co-developed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>   include/linux/context_tracking.h | 9 ++++++++-
>   1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
> index bceb06498521..200d30cb3a82 100644
> --- a/include/linux/context_tracking.h
> +++ b/include/linux/context_tracking.h
> @@ -131,10 +131,15 @@ static __always_inline void guest_enter_irqoff(void)
>   	}
>   }
>   
> -static __always_inline void guest_exit_irqoff(void)
> +static __always_inline void context_tracking_guest_exit_irqoff(void)
>   {
>   	if (context_tracking_enabled())
>   		__context_tracking_exit(CONTEXT_GUEST);
> +}
> +
> +static __always_inline void guest_exit_irqoff(void)
> +{
> +	context_tracking_guest_exit_irqoff();
>   
>   	instrumentation_begin();
>   	if (vtime_accounting_enabled_this_cpu())
> @@ -159,6 +164,8 @@ static __always_inline void guest_enter_irqoff(void)
>   	instrumentation_end();
>   }
>   
> +static __always_inline void context_tracking_guest_exit_irqoff(void) { }
> +
>   static __always_inline void guest_exit_irqoff(void)
>   {
>   	instrumentation_begin();
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 2/9] context_tracking: Move guest exit vtime accounting to separate helpers
  2021-04-15 22:20 ` [PATCH v3 2/9] context_tracking: Move guest exit vtime accounting " Sean Christopherson
@ 2021-04-20 18:48   ` Christian Borntraeger
  0 siblings, 0 replies; 25+ messages in thread
From: Christian Borntraeger @ 2021-04-20 18:48 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Thomas Gleixner, Michael Tokarev



On 16.04.21 00:20, Sean Christopherson wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
> Provide separate vtime accounting functions for guest exit instead of
> open coding the logic within the context tracking code.  This will allow
> KVM x86 to handle vtime accounting slightly differently when using
> tick-based accounting.
> 
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Michael Tokarev <mjt@tls.msk.ru>
> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> Co-developed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>

> ---
>   include/linux/context_tracking.h | 24 +++++++++++++++++-------
>   1 file changed, 17 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
> index 200d30cb3a82..7cf03a8e5708 100644
> --- a/include/linux/context_tracking.h
> +++ b/include/linux/context_tracking.h
> @@ -137,15 +137,20 @@ static __always_inline void context_tracking_guest_exit_irqoff(void)
>   		__context_tracking_exit(CONTEXT_GUEST);
>   }
>   
> -static __always_inline void guest_exit_irqoff(void)
> +static __always_inline void vtime_account_guest_exit(void)
>   {
> -	context_tracking_guest_exit_irqoff();
> -
> -	instrumentation_begin();
>   	if (vtime_accounting_enabled_this_cpu())
>   		vtime_guest_exit(current);
>   	else
>   		current->flags &= ~PF_VCPU;
> +}
> +
> +static __always_inline void guest_exit_irqoff(void)
> +{
> +	context_tracking_guest_exit_irqoff();
> +
> +	instrumentation_begin();
> +	vtime_account_guest_exit();
>   	instrumentation_end();
>   }
>   
> @@ -166,12 +171,17 @@ static __always_inline void guest_enter_irqoff(void)
>   
>   static __always_inline void context_tracking_guest_exit_irqoff(void) { }
>   
> -static __always_inline void guest_exit_irqoff(void)
> +static __always_inline void vtime_account_guest_exit(void)
>   {
> -	instrumentation_begin();
> -	/* Flush the guest cputime we spent on the guest */
>   	vtime_account_kernel(current);
>   	current->flags &= ~PF_VCPU;
> +}
> +
> +static __always_inline void guest_exit_irqoff(void)
> +{
> +	instrumentation_begin();
> +	/* Flush the guest cputime we spent on the guest */
> +	vtime_account_guest_exit();
>   	instrumentation_end();
>   }
>   #endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 3/9] KVM: x86: Defer tick-based accounting 'til after IRQ handling
  2021-04-15 22:21 ` [PATCH v3 3/9] KVM: x86: Defer tick-based accounting 'til after IRQ handling Sean Christopherson
@ 2021-04-20 23:14   ` Frederic Weisbecker
  2021-04-20 23:26     ` Sean Christopherson
  2021-04-21 10:07   ` Frederic Weisbecker
  1 sibling, 1 reply; 25+ messages in thread
From: Frederic Weisbecker @ 2021-04-20 23:14 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Thomas Gleixner,
	Michael Tokarev, Christian Borntraeger

On Thu, Apr 15, 2021 at 03:21:00PM -0700, Sean Christopherson wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
> When using tick-based accounting, defer the call to account guest time
> until after servicing any IRQ(s) that happened in the guest or
> immediately after VM-Exit.  Tick-based accounting of vCPU time relies on
> PF_VCPU being set when the tick IRQ handler runs, and IRQs are blocked
> throughout {svm,vmx}_vcpu_enter_exit().
> 
> This fixes a bug[*] where reported guest time remains '0', even when
> running an infinite loop in the guest.
> 
> [*] https://bugzilla.kernel.org/show_bug.cgi?id=209831
> 
> Fixes: 87fa7f3e98a131 ("x86/kvm: Move context tracking where it belongs")
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Sean Christopherson <seanjc@google.com>
> Cc: Michael Tokarev <mjt@tls.msk.ru>
> Cc: stable@vger.kernel.org#v5.9-rc1+
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> Co-developed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 16fb39503296..e4d475df1d4a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9230,6 +9230,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  	local_irq_disable();
>  	kvm_after_interrupt(vcpu);
>  
> +	/*
> +	 * When using tick-based accounting, wait until after servicing IRQs to
> +	 * account guest time so that any ticks that occurred while running the
> +	 * guest are properly accounted to the guest.
> +	 */
> +	if (!vtime_accounting_enabled_this_cpu())
> +		vtime_account_guest_exit();

Can we rather have instead:

static inline void tick_account_guest_exit(void)
{
	if (!vtime_accounting_enabled_this_cpu())
		current->flags &= ~PF_VCPU;
}

It duplicates a bit of code but I think this will read less confusing.

Thanks.

> +
>  	if (lapic_in_kernel(vcpu)) {
>  		s64 delta = vcpu->arch.apic->lapic_timer.advance_expire_delta;
>  		if (delta != S64_MIN) {
> -- 
> 2.31.1.368.gbe11c130af-goog
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 3/9] KVM: x86: Defer tick-based accounting 'til after IRQ handling
  2021-04-20 23:14   ` Frederic Weisbecker
@ 2021-04-20 23:26     ` Sean Christopherson
  2021-04-21 10:11       ` Frederic Weisbecker
  2021-04-21 12:19       ` Frederic Weisbecker
  0 siblings, 2 replies; 25+ messages in thread
From: Sean Christopherson @ 2021-04-20 23:26 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Thomas Gleixner,
	Michael Tokarev, Christian Borntraeger

On Wed, Apr 21, 2021, Frederic Weisbecker wrote:
> On Thu, Apr 15, 2021 at 03:21:00PM -0700, Sean Christopherson wrote:
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 16fb39503296..e4d475df1d4a 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -9230,6 +9230,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> >  	local_irq_disable();
> >  	kvm_after_interrupt(vcpu);
> >  
> > +	/*
> > +	 * When using tick-based accounting, wait until after servicing IRQs to
> > +	 * account guest time so that any ticks that occurred while running the
> > +	 * guest are properly accounted to the guest.
> > +	 */
> > +	if (!vtime_accounting_enabled_this_cpu())
> > +		vtime_account_guest_exit();
> 
> Can we rather have instead:
> 
> static inline void tick_account_guest_exit(void)
> {
> 	if (!vtime_accounting_enabled_this_cpu())
> 		current->flags &= ~PF_VCPU;
> }
> 
> It duplicates a bit of code but I think this will read less confusing.

Either way works for me.  I used vtime_account_guest_exit() to try to keep as
many details as possible inside vtime, e.g. in case the implemenation is tweaked
in the future.  But I agree that pretending KVM isn't already deeply intertwined
with the details is a lie.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 0/9] KVM: Fix tick-based accounting for x86 guests
  2021-04-15 22:20 [PATCH v3 0/9] KVM: Fix tick-based accounting for x86 guests Sean Christopherson
                   ` (8 preceding siblings ...)
  2021-04-15 22:21 ` [PATCH v3 9/9] KVM: Move instrumentation-safe annotations for enter/exit to x86 code Sean Christopherson
@ 2021-04-20 23:33 ` Frederic Weisbecker
  9 siblings, 0 replies; 25+ messages in thread
From: Frederic Weisbecker @ 2021-04-20 23:33 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Thomas Gleixner,
	Michael Tokarev, Christian Borntraeger

On Thu, Apr 15, 2021 at 03:20:57PM -0700, Sean Christopherson wrote:
> This is a continuation of Wanpeng's series[1] to fix tick-based CPU time
> accounting on x86, with my cleanups[2] bolted on top.  The core premise of
> Wanpeng's patches are preserved, but they are heavily stripped down.
> Specifically, only the "guest exit" paths are split, and no code is
> consolidated.  The intent is to do as little as possible in the three
> patches that need to be backported.  Keeping those changes as small as
> possible also meant that my cleanups did not need to unwind much 
> refactoring.
> 
> On x86, tested CONFIG_VIRT_CPU_ACCOUNTING_GEN =y and =n, and with
> CONFIG_DEBUG_ENTRY=y && CONFIG_VALIDATE_STACKS=y.  Compile tested arm64,
> MIPS, PPC, and s390, the latter with CONFIG_DEBUG_ENTRY=y for giggles.
> 
> One last note: I elected to use vtime_account_guest_exit() in the x86 code
> instead of open coding these equivalents:
> 
> 	if (vtime_accounting_enabled_this_cpu())
> 		vtime_guest_exit(current);
> ...
> 	if (!vtime_accounting_enabled_this_cpu())
> 		current->flags &= ~PF_VCPU;
> 
> With CONFIG_VIRT_CPU_ACCOUNTING_GEN=n, this is a complete non-issue, but
> for the =y case it means context_tracking_enabled_this_cpu() is being
> checked back-to-back.  The redundant checks bug me, but open coding the
> gory details in x86 or providing funky variants in vtime.h felt worse.
> 
> Delta from Wanpeng's v2:
> 
>   - s/context_guest/context_tracking_guest, purely to match the existing
>     functions.  I have no strong opinion either way.
>   - Split only the "exit" functions.
>   - Partially open code vcpu_account_guest_exit() and
>     __vtime_account_guest_exit() in x86 to avoid churn when segueing into
>     my cleanups (see above).
> 
> [1] https://lkml.kernel.org/r/1618298169-3831-1-git-send-email-wanpengli@tencent.com
> [2] https://lkml.kernel.org/r/20210413182933.1046389-1-seanjc@google.com
> 
> Sean Christopherson (6):
>   sched/vtime: Move vtime accounting external declarations above inlines
>   sched/vtime: Move guest enter/exit vtime accounting to vtime.h
>   context_tracking: Consolidate guest enter/exit wrappers
>   context_tracking: KVM: Move guest enter/exit wrappers to KVM's domain
>   KVM: x86: Consolidate guest enter/exit logic to common helpers
>   KVM: Move instrumentation-safe annotations for enter/exit to x86 code
> 
> Wanpeng Li (3):
>   context_tracking: Move guest exit context tracking to separate helpers
>   context_tracking: Move guest exit vtime accounting to separate helpers
>   KVM: x86: Defer tick-based accounting 'til after IRQ handling
> 
>  arch/x86/kvm/svm/svm.c           |  39 +--------
>  arch/x86/kvm/vmx/vmx.c           |  39 +--------
>  arch/x86/kvm/x86.c               |   8 ++
>  arch/x86/kvm/x86.h               |  52 ++++++++++++
>  include/linux/context_tracking.h |  92 ++++-----------------
>  include/linux/kvm_host.h         |  38 +++++++++
>  include/linux/vtime.h            | 138 +++++++++++++++++++------------
>  7 files changed, 204 insertions(+), 202 deletions(-)

Please Cc me on any follow-up of this patchset. I have set up a lot of booby
traps on purpose in this cave and I might be able to remember a few on the way.
Should you meet one of the poisoned arrows, rest assured that you were not the
aimed target though.

Thanks.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 4/9] sched/vtime: Move vtime accounting external declarations above inlines
  2021-04-15 22:21 ` [PATCH v3 4/9] sched/vtime: Move vtime accounting external declarations above inlines Sean Christopherson
@ 2021-04-21  7:02   ` Christian Borntraeger
  0 siblings, 0 replies; 25+ messages in thread
From: Christian Borntraeger @ 2021-04-21  7:02 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Thomas Gleixner, Michael Tokarev



On 16.04.21 00:21, Sean Christopherson wrote:
> Move the blob of external declarations (and their stubs) above the set of
> inline definitions (and their stubs) for vtime accounting.  This will
> allow a future patch to bring in more inline definitions without also
> having to shuffle large chunks of code.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson<seanjc@google.com>

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>

> ---
>   include/linux/vtime.h | 94 +++++++++++++++++++++----------------------
>   1 file changed, 47 insertions(+), 47 deletions(-)
> 
> diff --git a/include/linux/vtime.h b/include/linux/vtime.h
> index 041d6524d144..6a4317560539 100644
> --- a/include/linux/vtime.h
> +++ b/include/linux/vtime.h
> @@ -10,53 +10,6 @@
>   
>   struct task_struct;
>   
> -/*
> - * vtime_accounting_enabled_this_cpu() definitions/declarations
> - */
> -#if defined(CONFIG_VIRT_CPU_ACCOUNTING_NATIVE)
> -
> -static inline bool vtime_accounting_enabled_this_cpu(void) { return true; }
> -extern void vtime_task_switch(struct task_struct *prev);
> -
> -#elif defined(CONFIG_VIRT_CPU_ACCOUNTING_GEN)
> -
> -/*
> - * Checks if vtime is enabled on some CPU. Cputime readers want to be careful
> - * in that case and compute the tickless cputime.
> - * For now vtime state is tied to context tracking. We might want to decouple
> - * those later if necessary.
> - */
> -static inline bool vtime_accounting_enabled(void)
> -{
> -	return context_tracking_enabled();
> -}
> -
> -static inline bool vtime_accounting_enabled_cpu(int cpu)
> -{
> -	return context_tracking_enabled_cpu(cpu);
> -}
> -
> -static inline bool vtime_accounting_enabled_this_cpu(void)
> -{
> -	return context_tracking_enabled_this_cpu();
> -}
> -
> -extern void vtime_task_switch_generic(struct task_struct *prev);
> -
> -static inline void vtime_task_switch(struct task_struct *prev)
> -{
> -	if (vtime_accounting_enabled_this_cpu())
> -		vtime_task_switch_generic(prev);
> -}
> -
> -#else /* !CONFIG_VIRT_CPU_ACCOUNTING */
> -
> -static inline bool vtime_accounting_enabled_cpu(int cpu) {return false; }
> -static inline bool vtime_accounting_enabled_this_cpu(void) { return false; }
> -static inline void vtime_task_switch(struct task_struct *prev) { }
> -
> -#endif
> -
>   /*
>    * Common vtime APIs
>    */
> @@ -94,6 +47,53 @@ static inline void vtime_account_hardirq(struct task_struct *tsk) { }
>   static inline void vtime_flush(struct task_struct *tsk) { }
>   #endif
>   
> +/*
> + * vtime_accounting_enabled_this_cpu() definitions/declarations
> + */
> +#if defined(CONFIG_VIRT_CPU_ACCOUNTING_NATIVE)
> +
> +static inline bool vtime_accounting_enabled_this_cpu(void) { return true; }
> +extern void vtime_task_switch(struct task_struct *prev);
> +
> +#elif defined(CONFIG_VIRT_CPU_ACCOUNTING_GEN)
> +
> +/*
> + * Checks if vtime is enabled on some CPU. Cputime readers want to be careful
> + * in that case and compute the tickless cputime.
> + * For now vtime state is tied to context tracking. We might want to decouple
> + * those later if necessary.
> + */
> +static inline bool vtime_accounting_enabled(void)
> +{
> +	return context_tracking_enabled();
> +}
> +
> +static inline bool vtime_accounting_enabled_cpu(int cpu)
> +{
> +	return context_tracking_enabled_cpu(cpu);
> +}
> +
> +static inline bool vtime_accounting_enabled_this_cpu(void)
> +{
> +	return context_tracking_enabled_this_cpu();
> +}
> +
> +extern void vtime_task_switch_generic(struct task_struct *prev);
> +
> +static inline void vtime_task_switch(struct task_struct *prev)
> +{
> +	if (vtime_accounting_enabled_this_cpu())
> +		vtime_task_switch_generic(prev);
> +}
> +
> +#else /* !CONFIG_VIRT_CPU_ACCOUNTING */
> +
> +static inline bool vtime_accounting_enabled_cpu(int cpu) {return false; }
> +static inline bool vtime_accounting_enabled_this_cpu(void) { return false; }
> +static inline void vtime_task_switch(struct task_struct *prev) { }
> +
> +#endif
> +
>   
>   #ifdef CONFIG_IRQ_TIME_ACCOUNTING
>   extern void irqtime_account_irq(struct task_struct *tsk, unsigned int offset);
> -- 2.31.1.368.gbe11c130af-goog
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 7/9] context_tracking: KVM: Move guest enter/exit wrappers to KVM's domain
  2021-04-15 22:21 ` [PATCH v3 7/9] context_tracking: KVM: Move guest enter/exit wrappers to KVM's domain Sean Christopherson
@ 2021-04-21  7:10   ` Christian Borntraeger
  0 siblings, 0 replies; 25+ messages in thread
From: Christian Borntraeger @ 2021-04-21  7:10 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Thomas Gleixner, Michael Tokarev,
	Frederic Weisbecker



On 16.04.21 00:21, Sean Christopherson wrote:
> Move the guest enter/exit wrappers to kvm_host.h so that KVM can manage
> its context tracking vs. vtime accounting without bleeding too many KVM
> details into the context tracking code.
> 
> No functional change intended.

Funny story. This used to be in kvm code and it was moved to context_tracking by
commit 521921bad1192fb1b8f9b6a5aa673635848b8b5f
     kvm: Move guest entry/exit APIs to context_tracking

I think with all your cleanup it can now move back as it no longer deals with
context tracking.


> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   include/linux/context_tracking.h | 45 --------------------------------
>   include/linux/kvm_host.h         | 45 ++++++++++++++++++++++++++++++++
>   2 files changed, 45 insertions(+), 45 deletions(-)
> 
> diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
> index e172a547b2d0..d4dc9c4d79aa 100644
> --- a/include/linux/context_tracking.h
> +++ b/include/linux/context_tracking.h
> @@ -118,49 +118,4 @@ extern void context_tracking_init(void);
>   static inline void context_tracking_init(void) { }
>   #endif /* CONFIG_CONTEXT_TRACKING_FORCE */
>   
> -/* must be called with irqs disabled */
> -static __always_inline void guest_enter_irqoff(void)
> -{
> -	/*
> -	 * This is running in ioctl context so its safe to assume that it's the
> -	 * stime pending cputime to flush.
> -	 */
> -	instrumentation_begin();
> -	vtime_account_guest_enter();
> -	instrumentation_end();
> -
> -	/*
> -	 * KVM does not hold any references to rcu protected data when it
> -	 * switches CPU into a guest mode. In fact switching to a guest mode
> -	 * is very similar to exiting to userspace from rcu point of view. In
> -	 * addition CPU may stay in a guest mode for quite a long time (up to
> -	 * one time slice). Lets treat guest mode as quiescent state, just like
> -	 * we do with user-mode execution.
> -	 */
> -	if (!context_tracking_guest_enter_irqoff()) {
> -		instrumentation_begin();
> -		rcu_virt_note_context_switch(smp_processor_id());
> -		instrumentation_end();
> -	}
> -}
> -
> -static __always_inline void guest_exit_irqoff(void)
> -{
> -	context_tracking_guest_exit_irqoff();
> -
> -	instrumentation_begin();
> -	/* Flush the guest cputime we spent on the guest */
> -	vtime_account_guest_exit();
> -	instrumentation_end();
> -}
> -
> -static inline void guest_exit(void)
> -{
> -	unsigned long flags;
> -
> -	local_irq_save(flags);
> -	guest_exit_irqoff();
> -	local_irq_restore(flags);
> -}
> -
>   #endif
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 3b06d12ec37e..444d5f0225cb 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -332,6 +332,51 @@ struct kvm_vcpu {
>   	struct kvm_dirty_ring dirty_ring;
>   };
>   
> +/* must be called with irqs disabled */
> +static __always_inline void guest_enter_irqoff(void)
> +{
> +	/*
> +	 * This is running in ioctl context so its safe to assume that it's the
> +	 * stime pending cputime to flush.
> +	 */
> +	instrumentation_begin();
> +	vtime_account_guest_enter();
> +	instrumentation_end();
> +
> +	/*
> +	 * KVM does not hold any references to rcu protected data when it
> +	 * switches CPU into a guest mode. In fact switching to a guest mode
> +	 * is very similar to exiting to userspace from rcu point of view. In
> +	 * addition CPU may stay in a guest mode for quite a long time (up to
> +	 * one time slice). Lets treat guest mode as quiescent state, just like
> +	 * we do with user-mode execution.
> +	 */
> +	if (!context_tracking_guest_enter_irqoff()) {
> +		instrumentation_begin();
> +		rcu_virt_note_context_switch(smp_processor_id());
> +		instrumentation_end();
> +	}
> +}
> +
> +static __always_inline void guest_exit_irqoff(void)
> +{
> +	context_tracking_guest_exit_irqoff();
> +
> +	instrumentation_begin();
> +	/* Flush the guest cputime we spent on the guest */
> +	vtime_account_guest_exit();
> +	instrumentation_end();
> +}
> +
> +static inline void guest_exit(void)
> +{
> +	unsigned long flags;
> +
> +	local_irq_save(flags);
> +	guest_exit_irqoff();
> +	local_irq_restore(flags);
> +}
> +
>   static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
>   {
>   	/*
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 9/9] KVM: Move instrumentation-safe annotations for enter/exit to x86 code
  2021-04-15 22:21 ` [PATCH v3 9/9] KVM: Move instrumentation-safe annotations for enter/exit to x86 code Sean Christopherson
@ 2021-04-21  8:09   ` Christian Borntraeger
  2021-04-22 14:38     ` Sven Schnelle
  0 siblings, 1 reply; 25+ messages in thread
From: Christian Borntraeger @ 2021-04-21  8:09 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Thomas Gleixner, Michael Tokarev, Sven Schnelle,
	Heiko Carstens



On 16.04.21 00:21, Sean Christopherson wrote:
> Drop the instrumentation_{begin,end}() annonations from the common KVM
> guest enter/exit helpers, and massage the x86 code as needed to preserve
> the necessary annotations.  x86 is the only architecture whose transition
> flow is tagged as noinstr, and more specifically, it is the only
> architecture for which instrumentation_{begin,end}() can be non-empty.
> 
> No other architecture supports CONFIG_STACK_VALIDATION=y, and s390 is the
> only other architecture that support CONFIG_DEBUG_ENTRY=y.  For
> instrumentation annontations to be meaningful, both aformentioned configs
> must be enabled.
> 
> Letting x86 deal with the annotations avoids unnecessary nops by
> squashing back-to-back instrumention-safe sequences.

We have considered implementing objtool for s390. Not sure where we
stand and if we will do this or not. Sven/Heiko?

So maybe drop this patch until every other arch agrees that there are
no plans to implement this.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 3/9] KVM: x86: Defer tick-based accounting 'til after IRQ handling
  2021-04-15 22:21 ` [PATCH v3 3/9] KVM: x86: Defer tick-based accounting 'til after IRQ handling Sean Christopherson
  2021-04-20 23:14   ` Frederic Weisbecker
@ 2021-04-21 10:07   ` Frederic Weisbecker
  1 sibling, 0 replies; 25+ messages in thread
From: Frederic Weisbecker @ 2021-04-21 10:07 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Thomas Gleixner,
	Michael Tokarev, Christian Borntraeger

On Thu, Apr 15, 2021 at 03:21:00PM -0700, Sean Christopherson wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
> When using tick-based accounting, defer the call to account guest time
> until after servicing any IRQ(s) that happened in the guest or
> immediately after VM-Exit.  Tick-based accounting of vCPU time relies on
> PF_VCPU being set when the tick IRQ handler runs, and IRQs are blocked
> throughout {svm,vmx}_vcpu_enter_exit().

Out of curiosity, how has guest accounting ever managed to work if we have always
cleared PF_VCPU before re-enabling interrupt?

Thanks.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 3/9] KVM: x86: Defer tick-based accounting 'til after IRQ handling
  2021-04-20 23:26     ` Sean Christopherson
@ 2021-04-21 10:11       ` Frederic Weisbecker
  2021-04-21 12:19       ` Frederic Weisbecker
  1 sibling, 0 replies; 25+ messages in thread
From: Frederic Weisbecker @ 2021-04-21 10:11 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Thomas Gleixner,
	Michael Tokarev, Christian Borntraeger

On Tue, Apr 20, 2021 at 11:26:34PM +0000, Sean Christopherson wrote:
> On Wed, Apr 21, 2021, Frederic Weisbecker wrote:
> > On Thu, Apr 15, 2021 at 03:21:00PM -0700, Sean Christopherson wrote:
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index 16fb39503296..e4d475df1d4a 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -9230,6 +9230,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> > >  	local_irq_disable();
> > >  	kvm_after_interrupt(vcpu);
> > >  
> > > +	/*
> > > +	 * When using tick-based accounting, wait until after servicing IRQs to
> > > +	 * account guest time so that any ticks that occurred while running the
> > > +	 * guest are properly accounted to the guest.
> > > +	 */
> > > +	if (!vtime_accounting_enabled_this_cpu())
> > > +		vtime_account_guest_exit();
> > 
> > Can we rather have instead:
> > 
> > static inline void tick_account_guest_exit(void)
> > {
> > 	if (!vtime_accounting_enabled_this_cpu())
> > 		current->flags &= ~PF_VCPU;
> > }
> > 
> > It duplicates a bit of code but I think this will read less confusing.
> 
> Either way works for me.  I used vtime_account_guest_exit() to try to keep as
> many details as possible inside vtime, e.g. in case the implemenation is tweaked
> in the future.  But I agree that pretending KVM isn't already deeply intertwined
> with the details is a lie.

Ok let's keep it as is then. It reads funny but can we perhaps hope that
other archs have the same issue and in the future we'll need to have this split
generalized outside x86 with:

	* guest_exit_vtime()
	* guest_exit_tick()

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 1/9] context_tracking: Move guest exit context tracking to separate helpers
  2021-04-15 22:20 ` [PATCH v3 1/9] context_tracking: Move guest exit context tracking to separate helpers Sean Christopherson
  2021-04-20 18:48   ` Christian Borntraeger
@ 2021-04-21 10:57   ` Frederic Weisbecker
  1 sibling, 0 replies; 25+ messages in thread
From: Frederic Weisbecker @ 2021-04-21 10:57 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Thomas Gleixner,
	Michael Tokarev, Christian Borntraeger

On Thu, Apr 15, 2021 at 03:20:58PM -0700, Sean Christopherson wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
> Provide separate context tracking helpers for guest exit, the standalone
> helpers will be called separately by KVM x86 in later patches to fix
> tick-based accounting.
> 
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Sean Christopherson <seanjc@google.com>
> Cc: Michael Tokarev <mjt@tls.msk.ru>
> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> Co-developed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  include/linux/context_tracking.h | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
> index bceb06498521..200d30cb3a82 100644
> --- a/include/linux/context_tracking.h
> +++ b/include/linux/context_tracking.h
> @@ -131,10 +131,15 @@ static __always_inline void guest_enter_irqoff(void)
>  	}
>  }
>  
> -static __always_inline void guest_exit_irqoff(void)
> +static __always_inline void context_tracking_guest_exit_irqoff(void)

You can make this context_tracking_guest_exit() since there is no irq-on
version.

Thanks.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 3/9] KVM: x86: Defer tick-based accounting 'til after IRQ handling
  2021-04-20 23:26     ` Sean Christopherson
  2021-04-21 10:11       ` Frederic Weisbecker
@ 2021-04-21 12:19       ` Frederic Weisbecker
  2021-04-28 22:38         ` Sean Christopherson
  1 sibling, 1 reply; 25+ messages in thread
From: Frederic Weisbecker @ 2021-04-21 12:19 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Thomas Gleixner,
	Michael Tokarev, Christian Borntraeger

On Tue, Apr 20, 2021 at 11:26:34PM +0000, Sean Christopherson wrote:
> On Wed, Apr 21, 2021, Frederic Weisbecker wrote:
> > On Thu, Apr 15, 2021 at 03:21:00PM -0700, Sean Christopherson wrote:
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index 16fb39503296..e4d475df1d4a 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -9230,6 +9230,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> > >  	local_irq_disable();
> > >  	kvm_after_interrupt(vcpu);
> > >  
> > > +	/*
> > > +	 * When using tick-based accounting, wait until after servicing IRQs to
> > > +	 * account guest time so that any ticks that occurred while running the
> > > +	 * guest are properly accounted to the guest.
> > > +	 */
> > > +	if (!vtime_accounting_enabled_this_cpu())
> > > +		vtime_account_guest_exit();
> > 
> > Can we rather have instead:
> > 
> > static inline void tick_account_guest_exit(void)
> > {
> > 	if (!vtime_accounting_enabled_this_cpu())
> > 		current->flags &= ~PF_VCPU;
> > }
> > 
> > It duplicates a bit of code but I think this will read less confusing.
> 
> Either way works for me.  I used vtime_account_guest_exit() to try to keep as
> many details as possible inside vtime, e.g. in case the implemenation is tweaked
> in the future.  But I agree that pretending KVM isn't already deeply intertwined
> with the details is a lie.

Ah I see, before 87fa7f3e98a131 the vtime was accounted after interrupts get
processed. So it used to work until then. I see that ARM64 waits for IRQs to
be enabled too.

PPC/book3s_hv, MIPS, s390 do it before IRQs get re-enabled (weird, how does that
work?)

And PPC/book3s_pr calls guest_exit() so I guess it has interrupts enabled.

The point is: does it matter to call vtime_account_guest_exit() before or
after interrupts? If it doesn't matter, we can simply call
vtime_account_guest_exit() once and for all once IRQs are re-enabled.

If it does matter because we don't want to account the host IRQs firing at the
end of vcpu exit, then probably we should standardize that behaviour and have
guest_exit_vtime() called before interrupts get enabled and guest_exit_tick()
called after interrupts get enabled. It's probably then beyond the scope of this
patchset but I would like to poke your opinion on that.

Thanks.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 9/9] KVM: Move instrumentation-safe annotations for enter/exit to x86 code
  2021-04-21  8:09   ` Christian Borntraeger
@ 2021-04-22 14:38     ` Sven Schnelle
  2021-04-23  9:32       ` Vasily Gorbik
  0 siblings, 1 reply; 25+ messages in thread
From: Sven Schnelle @ 2021-04-22 14:38 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, linux-kernel, Thomas Gleixner,
	Michael Tokarev, Heiko Carstens, gor

Christian Borntraeger <borntraeger@de.ibm.com> writes:

> On 16.04.21 00:21, Sean Christopherson wrote:
>> Drop the instrumentation_{begin,end}() annonations from the common KVM
>> guest enter/exit helpers, and massage the x86 code as needed to preserve
>> the necessary annotations.  x86 is the only architecture whose transition
>> flow is tagged as noinstr, and more specifically, it is the only
>> architecture for which instrumentation_{begin,end}() can be non-empty.
>> No other architecture supports CONFIG_STACK_VALIDATION=y, and s390
>> is the
>> only other architecture that support CONFIG_DEBUG_ENTRY=y.  For
>> instrumentation annontations to be meaningful, both aformentioned configs
>> must be enabled.
>> Letting x86 deal with the annotations avoids unnecessary nops by
>> squashing back-to-back instrumention-safe sequences.
>
> We have considered implementing objtool for s390. Not sure where we
> stand and if we will do this or not. Sven/Heiko?

We are planning to support objtool on s390. Vasily is working on it -
maybe he has some thoughts about this.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 9/9] KVM: Move instrumentation-safe annotations for enter/exit to x86 code
  2021-04-22 14:38     ` Sven Schnelle
@ 2021-04-23  9:32       ` Vasily Gorbik
  0 siblings, 0 replies; 25+ messages in thread
From: Vasily Gorbik @ 2021-04-23  9:32 UTC (permalink / raw)
  To: Sven Schnelle
  Cc: Christian Borntraeger, Sean Christopherson, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Thomas Gleixner, Michael Tokarev, Heiko Carstens

On Thu, Apr 22, 2021 at 04:38:24PM +0200, Sven Schnelle wrote:
> Christian Borntraeger <borntraeger@de.ibm.com> writes:
> 
> > On 16.04.21 00:21, Sean Christopherson wrote:
> >> Drop the instrumentation_{begin,end}() annonations from the common KVM
> >> guest enter/exit helpers, and massage the x86 code as needed to preserve
> >> the necessary annotations.  x86 is the only architecture whose transition
> >> flow is tagged as noinstr, and more specifically, it is the only
> >> architecture for which instrumentation_{begin,end}() can be non-empty.
> >> No other architecture supports CONFIG_STACK_VALIDATION=y, and s390
> >> is the
> >> only other architecture that support CONFIG_DEBUG_ENTRY=y.  For
> >> instrumentation annontations to be meaningful, both aformentioned configs
> >> must be enabled.
> >> Letting x86 deal with the annotations avoids unnecessary nops by
> >> squashing back-to-back instrumention-safe sequences.
> >
> > We have considered implementing objtool for s390. Not sure where we
> > stand and if we will do this or not. Sven/Heiko?
> 
> We are planning to support objtool on s390. Vasily is working on it -
> maybe he has some thoughts about this.

We got CONFIG_DEBUG_ENTRY=y since 5.12, objtool runs on vmlinux.o but I have
not yet enabled --noinstr option in s390 objtool. So, it's hard to say in
advance if this particular change would make things better or worse.
In general, common code annotations are problematic, because arch
specific code is still not identical and this leads sometimes to different
needs for common code annotations.

I'll try to experiment with --noinstr on s390 shortly.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 3/9] KVM: x86: Defer tick-based accounting 'til after IRQ handling
  2021-04-21 12:19       ` Frederic Weisbecker
@ 2021-04-28 22:38         ` Sean Christopherson
  0 siblings, 0 replies; 25+ messages in thread
From: Sean Christopherson @ 2021-04-28 22:38 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Thomas Gleixner,
	Michael Tokarev, Christian Borntraeger

Apologies for the slow response.

On Wed, Apr 21, 2021, Frederic Weisbecker wrote:
> On Tue, Apr 20, 2021 at 11:26:34PM +0000, Sean Christopherson wrote:
> > On Wed, Apr 21, 2021, Frederic Weisbecker wrote:
> > > On Thu, Apr 15, 2021 at 03:21:00PM -0700, Sean Christopherson wrote:
> > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > index 16fb39503296..e4d475df1d4a 100644
> > > > --- a/arch/x86/kvm/x86.c
> > > > +++ b/arch/x86/kvm/x86.c
> > > > @@ -9230,6 +9230,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> > > >  	local_irq_disable();
> > > >  	kvm_after_interrupt(vcpu);
> > > >  
> > > > +	/*
> > > > +	 * When using tick-based accounting, wait until after servicing IRQs to
> > > > +	 * account guest time so that any ticks that occurred while running the
> > > > +	 * guest are properly accounted to the guest.
> > > > +	 */
> > > > +	if (!vtime_accounting_enabled_this_cpu())
> > > > +		vtime_account_guest_exit();
> > > 
> > > Can we rather have instead:
> > > 
> > > static inline void tick_account_guest_exit(void)
> > > {
> > > 	if (!vtime_accounting_enabled_this_cpu())
> > > 		current->flags &= ~PF_VCPU;
> > > }
> > > 
> > > It duplicates a bit of code but I think this will read less confusing.
> > 
> > Either way works for me.  I used vtime_account_guest_exit() to try to keep as
> > many details as possible inside vtime, e.g. in case the implemenation is tweaked
> > in the future.  But I agree that pretending KVM isn't already deeply intertwined
> > with the details is a lie.
> 
> Ah I see, before 87fa7f3e98a131 the vtime was accounted after interrupts get
> processed. So it used to work until then. I see that ARM64 waits for IRQs to
> be enabled too.
> 
> PPC/book3s_hv, MIPS, s390 do it before IRQs get re-enabled (weird, how does that
> work?)

No idea.  It's entirely possible it doesn't work on one or more of those
architectures.

Based on init/Kconfig, s390 doesn't support tick-based accounting, so I assume
s390 is ok.

  config TICK_CPU_ACCOUNTING
	bool "Simple tick based cputime accounting"
	depends on !S390 && !NO_HZ_FULL

> And PPC/book3s_pr calls guest_exit() so I guess it has interrupts enabled.
> 
> The point is: does it matter to call vtime_account_guest_exit() before or
> after interrupts? If it doesn't matter, we can simply call
> vtime_account_guest_exit() once and for all once IRQs are re-enabled.
> 
> If it does matter because we don't want to account the host IRQs firing at the
> end of vcpu exit, then probably we should standardize that behaviour and have
> guest_exit_vtime() called before interrupts get enabled and guest_exit_tick()
> called after interrupts get enabled. It's probably then beyond the scope of this
> patchset but I would like to poke your opinion on that.
> 
> Thanks.

I don't know.  For x86, I would be ok with simply moving the call to
vtime_account_guest_exit() to after IRQs are enabled.  It would bug me a little
bit that KVM _could_ be more precise when running with
CONFIG_VIRT_CPU_ACCOUNTING_GEN=y, and KVM would still be poking into the details
of vtime_account_guest_exit() to some extent, but overall it would be an
improvement from a code cleanliness perspective.

The problem is I have no clue who, if anyone, deploys KVM on x86 with
CONFIG_VIRT_CPU_ACCOUNTING_GEN=y.  On the other hand, AMD/SVM has always had the
"inaccurate" accounting, and Intel/VMX has been inaccurate since commit
d7a08882a0a4 ("KVM: x86: Unconditionally enable irqs in guest context"), which
amusingly was a fix for an edge case in tick-based accounting.

Anyone have an opinion either way?  I'm very tempted to go with Frederic's
suggestion of moving the time accounting back to where it was, it makes KVM just
a little less ugly.

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, back to index

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-15 22:20 [PATCH v3 0/9] KVM: Fix tick-based accounting for x86 guests Sean Christopherson
2021-04-15 22:20 ` [PATCH v3 1/9] context_tracking: Move guest exit context tracking to separate helpers Sean Christopherson
2021-04-20 18:48   ` Christian Borntraeger
2021-04-21 10:57   ` Frederic Weisbecker
2021-04-15 22:20 ` [PATCH v3 2/9] context_tracking: Move guest exit vtime accounting " Sean Christopherson
2021-04-20 18:48   ` Christian Borntraeger
2021-04-15 22:21 ` [PATCH v3 3/9] KVM: x86: Defer tick-based accounting 'til after IRQ handling Sean Christopherson
2021-04-20 23:14   ` Frederic Weisbecker
2021-04-20 23:26     ` Sean Christopherson
2021-04-21 10:11       ` Frederic Weisbecker
2021-04-21 12:19       ` Frederic Weisbecker
2021-04-28 22:38         ` Sean Christopherson
2021-04-21 10:07   ` Frederic Weisbecker
2021-04-15 22:21 ` [PATCH v3 4/9] sched/vtime: Move vtime accounting external declarations above inlines Sean Christopherson
2021-04-21  7:02   ` Christian Borntraeger
2021-04-15 22:21 ` [PATCH v3 5/9] sched/vtime: Move guest enter/exit vtime accounting to vtime.h Sean Christopherson
2021-04-15 22:21 ` [PATCH v3 6/9] context_tracking: Consolidate guest enter/exit wrappers Sean Christopherson
2021-04-15 22:21 ` [PATCH v3 7/9] context_tracking: KVM: Move guest enter/exit wrappers to KVM's domain Sean Christopherson
2021-04-21  7:10   ` Christian Borntraeger
2021-04-15 22:21 ` [PATCH v3 8/9] KVM: x86: Consolidate guest enter/exit logic to common helpers Sean Christopherson
2021-04-15 22:21 ` [PATCH v3 9/9] KVM: Move instrumentation-safe annotations for enter/exit to x86 code Sean Christopherson
2021-04-21  8:09   ` Christian Borntraeger
2021-04-22 14:38     ` Sven Schnelle
2021-04-23  9:32       ` Vasily Gorbik
2021-04-20 23:33 ` [PATCH v3 0/9] KVM: Fix tick-based accounting for x86 guests Frederic Weisbecker

KVM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kvm/0 kvm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kvm kvm/ https://lore.kernel.org/kvm \
		kvm@vger.kernel.org
	public-inbox-index kvm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.kvm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git