linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/7] kvm: fix latent guest entry/exit bugs
@ 2022-01-19 10:58 Mark Rutland
  2022-01-19 10:58 ` [PATCH v2 1/7] entry: add arch_in_rcu_eqs() Mark Rutland
                   ` (7 more replies)
  0 siblings, 8 replies; 41+ messages in thread
From: Mark Rutland @ 2022-01-19 10:58 UTC
  To: linux-kernel, Paolo Bonzini, Michael Ellerman
  Cc: aleksandar.qemu.devel, alexandru.elisei, anup.patel, aou,
	atish.patra, borntraeger, bp, catalin.marinas, chenhuacai,
	dave.hansen, frankja, frederic, gor, hca, james.morse, jmattson,
	joro, luto, mark.rutland, maz, mingo, nsaenzju, palmer, paulmck,
	paul.walmsley, peterz, seanjc, suzuki.poulose, svens, tglx,
	tsbogend, vkuznets, wanpengli, will

Several architectures have latent bugs around guest entry/exit. This
series addresses those for:

	arm64, mips, riscv, s390, x86

However, I'm not sure how to address powerpc and could do with some help
there. I have build-tested the arm64, mips, riscv, s390, and x86 cases,
but I don't have a suitable HW setup to test these, so any review and/or
testing would be much appreciated.

Issues include:

1) Several architectures enable interrupts between guest_enter() and
   guest_exit(). As this period is an RCU extended quiescent state (EQS)
   this is unsound unless the irq entry code explicitly wakes RCU, which
   most architectures only do for entry from userspace or idle (see the
   simplified sketch after this list).

   I believe this affects: arm64, riscv, s390

   I am not sure about powerpc.

2) Several architectures permit instrumentation of code between
   guest_enter() and guest_exit(), e.g. KASAN, KCOV, KCSAN, etc. As
   instrumentation may directly or indirectly use RCU, this has the same
   problems as with interrupts.

   I believe this affects: arm64, mips, powerpc, riscv, s390

3) Several architectures do not inform lockdep and tracing that
   interrupts are enabled during the execution of the guest, or do so in
   an incorrect order. Generally this means that logs will report IRQs
   being masked for much longer than is actually the case, which is not
   ideal for debugging. I don't know whether this affects the
   correctness of lockdep.

   I believe this affects: arm64, mips, powerpc, riscv, s390
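
As a simplified sketch (illustrative only; details vary by
architecture), the problematic shape underlying issues 1 and 3 is:

	guest_enter_irqoff();		/* enter RCU EQS */
	< run the vcpu >
	local_irq_enable();		/* IRQs handled without RCU watching,
					   and without lockdep/tracing being
					   informed in the correct order */
	...
	guest_exit();			/* exit RCU EQS */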

This was previously fixed for x86 specifically in a series of commits:

  87fa7f3e98a1310e ("x86/kvm: Move context tracking where it belongs")
  0642391e2139a2c1 ("x86/kvm/vmx: Add hardirq tracing to guest enter/exit")
  9fc975e9efd03e57 ("x86/kvm/svm: Add hardirq tracing on guest enter/exit")
  3ebccdf373c21d86 ("x86/kvm/vmx: Move guest enter/exit into .noinstr.text")
  135961e0a7d555fc ("x86/kvm/svm: Move guest enter/exit into .noinstr.text")
  160457140187c5fb ("KVM: x86: Defer vtime accounting 'til after IRQ handling")
  bc908e091b326467 ("KVM: x86: Consolidate guest enter/exit logic to common helpers")

But other architectures were left broken, and the infrastructure for
handling this correctly is x86-specific.

This series introduces generic helper functions which can be used to
handle the problems above, and migrates architectures over to these,
fixing the latent issues. For s390, where the KVM guest EQS is
interruptible, I've added infrastructure to wake RCU during this EQS.

Since v1 [1]:
* Add arch_in_rcu_eqs()
* Convert s390
* Rename exit_to_guest_mode() -> guest_state_enter_irqoff()
* Rename enter_from_guest_mode() -> guest_state_exit_irqoff()
* Various commit message cleanups

I've pushed the series (based on v5.16) to my kvm/entry-rework branch:

  https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=kvm/entry-rework
  git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git kvm/entry-rework

... with this version tagged as kvm-entry-rework-20220119.

[1] https://lore.kernel.org/r/20220111153539.2532246-1-mark.rutland@arm.com/

Thanks,
Mark.

Mark Rutland (7):
  entry: add arch_in_rcu_eqs()
  kvm: add guest_state_{enter,exit}_irqoff()
  kvm/arm64: rework guest entry logic
  kvm/mips: rework guest entry logic
  kvm/riscv: rework guest entry logic
  kvm/s390: rework guest entry logic
  kvm/x86: rework guest entry logic

 arch/arm64/kvm/arm.c                 |  51 +++++++-----
 arch/mips/kvm/mips.c                 |  37 ++++++++-
 arch/riscv/kvm/vcpu.c                |  44 +++++++----
 arch/s390/include/asm/entry-common.h |  10 +++
 arch/s390/include/asm/kvm_host.h     |   3 +
 arch/s390/kvm/kvm-s390.c             |  49 +++++++++---
 arch/s390/kvm/vsie.c                 |  17 ++--
 arch/x86/kvm/svm/svm.c               |   4 +-
 arch/x86/kvm/vmx/vmx.c               |   4 +-
 arch/x86/kvm/x86.c                   |   4 +-
 arch/x86/kvm/x86.h                   |  45 -----------
 include/linux/entry-common.h         |  16 ++++
 include/linux/kvm_host.h             | 112 ++++++++++++++++++++++++++-
 kernel/entry/common.c                |   3 +-
 14 files changed, 286 insertions(+), 113 deletions(-)

-- 
2.30.2



* [PATCH v2 1/7] entry: add arch_in_rcu_eqs()
  2022-01-19 10:58 [PATCH v2 0/7] kvm: fix latent guest entry/exit bugs Mark Rutland
@ 2022-01-19 10:58 ` Mark Rutland
  2022-01-19 17:35   ` Christian Borntraeger
  2022-01-21 17:34   ` Nicolas Saenz Julienne
  2022-01-19 10:58 ` [PATCH v2 2/7] kvm: add guest_state_{enter,exit}_irqoff() Mark Rutland
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 41+ messages in thread
From: Mark Rutland @ 2022-01-19 10:58 UTC
  To: linux-kernel, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner
  Cc: aleksandar.qemu.devel, alexandru.elisei, anup.patel, aou,
	atish.patra, borntraeger, bp, catalin.marinas, chenhuacai,
	dave.hansen, frankja, frederic, gor, hca, james.morse, jmattson,
	joro, mark.rutland, maz, mingo, mpe, nsaenzju, palmer, paulmck,
	paul.walmsley, pbonzini, seanjc, suzuki.poulose, svens, tsbogend,
	vkuznets, wanpengli, will

All architectures have an interruptible RCU extended quiescent state
(EQS) as part of their idle sequences, where interrupts can occur
without RCU watching. Entry code must account for this and wake RCU as
necessary; the common entry code deals with this in irqentry_enter() by
treating any interrupt from an idle thread as potentially having
occurred with an EQS and waking RCU for the duration of the interrupt
via rcu_irq_enter() .. rcu_irq_exit().

Some architectures may have other interruptible EQSs which require
similar treatment. For example, on s390 it is necessary to enable
interrupts around guest entry in the middle of a period where core KVM
code has entered an EQS.

So that architectures can wake RCU in these cases, this patch adds a
new arch_in_rcu_eqs() hook to the common entry code which is checked in
addition to the existing is_idle_task() check, with RCU woken if
either returns true. A default implementation is provided which always
returns false, which suffices for most architectures.

As no architectures currently implement arch_in_rcu_eqs(), there should
be no functional change as a result of this patch alone. A subsequent
patch will add an s390 implementation to fix a latent bug with missing
RCU wakeups.
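
For illustration, an architecture's override would be shaped roughly as
below (a sketch only, not the final code; the s390 patch later in this
series adds the real implementation, keyed off PF_VCPU):

	/* in the architecture's <asm/entry-common.h> */
	static __always_inline bool arch_in_rcu_eqs(void)
	{
		/* true whenever current may be inside an interruptible EQS */
		return current->flags & PF_VCPU;
	}
	#define arch_in_rcu_eqs arch_in_rcu_eqs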

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/entry-common.h | 16 ++++++++++++++++
 kernel/entry/common.c        |  3 ++-
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index 2e2b8d6140ed4..f1b91a13a15a6 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -99,6 +99,22 @@ static inline __must_check int arch_syscall_enter_tracehook(struct pt_regs *regs
 }
 #endif
 
+/**
+ * arch_in_rcu_eqs - Architecture specific check for RCU extended quiescent
+ * states.
+ *
+ * Returns: true if the CPU is potentially in an RCU EQS, false otherwise.
+ *
+ * Architectures only need to define this if threads other than the idle thread
+ * may have an interruptible EQS. This does not need to handle idle threads. It
+ * is safe to over-estimate at the cost of redundant RCU management work.
+ *
+ * Invoked from irqentry_enter()
+ */
+#ifndef arch_in_rcu_eqs
+static __always_inline bool arch_in_rcu_eqs(void) { return false; }
+#endif
+
 /**
  * enter_from_user_mode - Establish state when coming from user mode
  *
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index d5a61d565ad5d..b13d4e0b0b643 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -349,7 +349,8 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
 	 * TINY_RCU does not support EQS, so let the compiler eliminate
 	 * this part when enabled.
 	 */
-	if (!IS_ENABLED(CONFIG_TINY_RCU) && is_idle_task(current)) {
+	if (!IS_ENABLED(CONFIG_TINY_RCU) &&
+	    (is_idle_task(current) || arch_in_rcu_eqs())) {
 		/*
 		 * If RCU is not watching then the same careful
 		 * sequence vs. lockdep and tracing is required
-- 
2.30.2



* [PATCH v2 2/7] kvm: add guest_state_{enter,exit}_irqoff()
  2022-01-19 10:58 [PATCH v2 0/7] kvm: fix latent guest entry/exit bugs Mark Rutland
  2022-01-19 10:58 ` [PATCH v2 1/7] entry: add arch_in_rcu_eqs() Mark Rutland
@ 2022-01-19 10:58 ` Mark Rutland
  2022-01-20 11:00   ` Paolo Bonzini
  2022-01-21 17:35   ` Nicolas Saenz Julienne
  2022-01-19 10:58 ` [PATCH v2 3/7] kvm/arm64: rework guest entry logic Mark Rutland
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 41+ messages in thread
From: Mark Rutland @ 2022-01-19 10:58 UTC
  To: linux-kernel
  Cc: aleksandar.qemu.devel, alexandru.elisei, anup.patel, aou,
	atish.patra, borntraeger, bp, catalin.marinas, chenhuacai,
	dave.hansen, frankja, frederic, gor, hca, james.morse, jmattson,
	joro, luto, mark.rutland, maz, mingo, mpe, nsaenzju, palmer,
	paulmck, paul.walmsley, pbonzini, peterz, seanjc, suzuki.poulose,
	svens, tglx, tsbogend, vkuznets, wanpengli, will

When transitioning to/from guest mode, it is necessary to inform
lockdep, tracing, and RCU in a specific order, similar to the
requirements for transitions to/from user mode. Additionally, it is
necessary to perform vtime accounting for a window around running the
guest, with RCU enabled, such that timer interrupts taken from the guest
can be accounted as guest time.

Most architectures don't handle all the necessary pieces, and have a
number of common bugs, including unsafe usage of RCU during the window
between guest_enter() and guest_exit().

On x86, this was dealt with across commits:

  87fa7f3e98a1310e ("x86/kvm: Move context tracking where it belongs")
  0642391e2139a2c1 ("x86/kvm/vmx: Add hardirq tracing to guest enter/exit")
  9fc975e9efd03e57 ("x86/kvm/svm: Add hardirq tracing on guest enter/exit")
  3ebccdf373c21d86 ("x86/kvm/vmx: Move guest enter/exit into .noinstr.text")
  135961e0a7d555fc ("x86/kvm/svm: Move guest enter/exit into .noinstr.text")
  160457140187c5fb ("KVM: x86: Defer vtime accounting 'til after IRQ handling")
  bc908e091b326467 ("KVM: x86: Consolidate guest enter/exit logic to common helpers")

... but those fixes are specific to x86, and as the resulting logic
(while correct) is split across generic helper functions and
x86-specific helper functions, it is difficult to see that the
entry/exit accounting is balanced.

This patch adds generic helpers which architectures can use to handle
guest entry/exit consistently and correctly. The guest_{enter,exit}()
helpers are split into guest_timing_{enter,exit}() to perform vtime
accounting, and guest_context_{enter,exit}() to perform the necessary
context tracking and RCU management. The existing guest_{enter,exit}()
helpers are left as wrappers of these.

Atop this, new guest_state_enter_irqoff() and guest_state_exit_irqoff()
helpers are added to handle the ordering of lockdep, tracing, and RCU
management. These are intended to mirror exit_to_user_mode() and
enter_from_user_mode().

Subsequent patches will migrate architectures over to the new helpers,
following a sequence:

	guest_timing_enter_irqoff();

	guest_state_enter_irqoff();
	< run the vcpu >
	guest_state_exit_irqoff();

	< take any pending IRQs >

	guest_timing_exit_irqoff();

This sequence handles all of the above correctly, and more clearly
balances the entry and exit portions, making it easier to understand.
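
Concretely, an architecture's vcpu run path ends up shaped roughly as
below (illustrative only; the helper name is made up here, and the
per-architecture patches contain the real conversions):

	local_irq_disable();
	guest_timing_enter_irqoff();

	/* noinstr helper wrapping guest_state_{enter,exit}_irqoff() */
	ret = arch_vcpu_enter_exit(vcpu);

	/* transiently unmask IRQs so that pending IRQs are taken */
	local_irq_enable();
	local_irq_disable();

	guest_timing_exit_irqoff();
	local_irq_enable();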

The existing helpers are marked as deprecated, and will be removed once
all architectures have been converted.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
---
 include/linux/kvm_host.h | 112 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 109 insertions(+), 3 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c310648cc8f1a..774a3b9e9bd8d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -29,6 +29,8 @@
 #include <linux/refcount.h>
 #include <linux/nospec.h>
 #include <linux/notifier.h>
+#include <linux/ftrace.h>
+#include <linux/instrumentation.h>
 #include <asm/signal.h>
 
 #include <linux/kvm.h>
@@ -362,8 +364,11 @@ struct kvm_vcpu {
 	int last_used_slot;
 };
 
-/* must be called with irqs disabled */
-static __always_inline void guest_enter_irqoff(void)
+/*
+ * Start accounting time towards a guest.
+ * Must be called before entering guest context.
+ */
+static __always_inline void guest_timing_enter_irqoff(void)
 {
 	/*
 	 * This is running in ioctl context so its safe to assume that it's the
@@ -372,7 +377,18 @@ static __always_inline void guest_enter_irqoff(void)
 	instrumentation_begin();
 	vtime_account_guest_enter();
 	instrumentation_end();
+}
 
+/*
+ * Enter guest context and enter an RCU extended quiescent state.
+ *
+ * Between guest_context_enter_irqoff() and guest_context_exit_irqoff() it is
+ * unsafe to use any code which may directly or indirectly use RCU, tracing
+ * (including IRQ flag tracing), or lockdep. All code in this period must be
+ * non-instrumentable.
+ */
+static __always_inline void guest_context_enter_irqoff(void)
+{
 	/*
 	 * KVM does not hold any references to rcu protected data when it
 	 * switches CPU into a guest mode. In fact switching to a guest mode
@@ -388,16 +404,79 @@ static __always_inline void guest_enter_irqoff(void)
 	}
 }
 
-static __always_inline void guest_exit_irqoff(void)
+/*
+ * Deprecated. Architectures should move to guest_timing_enter_irqoff() and
+ * guest_state_enter_irqoff().
+ */
+static __always_inline void guest_enter_irqoff(void)
+{
+	guest_timing_enter_irqoff();
+	guest_context_enter_irqoff();
+}
+
+/**
+ * guest_state_enter_irqoff - Fixup state when entering a guest
+ *
+ * Entry to a guest will enable interrupts, but the kernel state is interrupts
+ * disabled when this is invoked. Also tell RCU about it.
+ *
+ * 1) Trace interrupts on state
+ * 2) Invoke context tracking if enabled to adjust RCU state
+ * 3) Tell lockdep that interrupts are enabled
+ *
+ * Invoked from architecture specific code before entering a guest.
+ * Must be called with interrupts disabled and the caller must be
+ * non-instrumentable.
+ * The caller has to invoke guest_timing_enter_irqoff() before this.
+ *
+ * Note: this is analogous to exit_to_user_mode().
+ */
+static __always_inline void guest_state_enter_irqoff(void)
+{
+	instrumentation_begin();
+	trace_hardirqs_on_prepare();
+	lockdep_hardirqs_on_prepare(CALLER_ADDR0);
+	instrumentation_end();
+
+	guest_context_enter_irqoff();
+	lockdep_hardirqs_on(CALLER_ADDR0);
+}
+
+/*
+ * Exit guest context and exit an RCU extended quiescent state.
+ *
+ * Between guest_context_enter_irqoff() and guest_context_exit_irqoff() it is
+ * unsafe to use any code which may directly or indirectly use RCU, tracing
+ * (including IRQ flag tracing), or lockdep. All code in this period must be
+ * non-instrumentable.
+ */
+static __always_inline void guest_context_exit_irqoff(void)
 {
 	context_tracking_guest_exit();
+}
 
+/*
+ * Stop accounting time towards a guest.
+ * Must be called after exiting guest context.
+ */
+static __always_inline void guest_timing_exit_irqoff(void)
+{
 	instrumentation_begin();
 	/* Flush the guest cputime we spent on the guest */
 	vtime_account_guest_exit();
 	instrumentation_end();
 }
 
+/*
+ * Deprecated. Architectures should move to guest_state_exit_irqoff() and
+ * guest_timing_exit_irqoff().
+ */
+static __always_inline void guest_exit_irqoff(void)
+{
+	guest_context_exit_irqoff();
+	guest_timing_exit_irqoff();
+}
+
 static inline void guest_exit(void)
 {
 	unsigned long flags;
@@ -407,6 +486,33 @@ static inline void guest_exit(void)
 	local_irq_restore(flags);
 }
 
+/**
+ * guest_state_exit_irqoff - Establish state when returning from guest mode
+ *
+ * Entry from a guest disables interrupts, but guest mode is traced as
+ * interrupts enabled. Also with NO_HZ_FULL RCU might be idle.
+ *
+ * 1) Tell lockdep that interrupts are disabled
+ * 2) Invoke context tracking if enabled to reactivate RCU
+ * 3) Trace interrupts off state
+ *
+ * Invoked from architecture specific code after exiting a guest.
+ * Must be invoked with interrupts disabled and the caller must be
+ * non-instrumentable.
+ * The caller has to invoke guest_timing_exit_irqoff() after this.
+ *
+ * Note: this is analogous to enter_from_user_mode().
+ */
+static __always_inline void guest_state_exit_irqoff(void)
+{
+	lockdep_hardirqs_off(CALLER_ADDR0);
+	guest_context_exit_irqoff();
+
+	instrumentation_begin();
+	trace_hardirqs_off_finish();
+	instrumentation_end();
+}
+
 static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
 {
 	/*
-- 
2.30.2



* [PATCH v2 3/7] kvm/arm64: rework guest entry logic
  2022-01-19 10:58 [PATCH v2 0/7] kvm: fix latent guest entry/exit bugs Mark Rutland
  2022-01-19 10:58 ` [PATCH v2 1/7] entry: add arch_in_rcu_eqs() Mark Rutland
  2022-01-19 10:58 ` [PATCH v2 2/7] kvm: add guest_state_{enter,exit}_irqoff() Mark Rutland
@ 2022-01-19 10:58 ` Mark Rutland
  2022-01-21 17:37   ` Nicolas Saenz Julienne
  2022-01-19 10:58 ` [PATCH v2 4/7] kvm/mips: " Mark Rutland
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 41+ messages in thread
From: Mark Rutland @ 2022-01-19 10:58 UTC
  To: linux-kernel
  Cc: aleksandar.qemu.devel, alexandru.elisei, anup.patel, aou,
	atish.patra, borntraeger, bp, catalin.marinas, chenhuacai,
	dave.hansen, frankja, frederic, gor, hca, james.morse, jmattson,
	joro, luto, mark.rutland, maz, mingo, mpe, nsaenzju, palmer,
	paulmck, paul.walmsley, pbonzini, peterz, seanjc, suzuki.poulose,
	svens, tglx, tsbogend, vkuznets, wanpengli, will

In kvm_arch_vcpu_ioctl_run() we enter an RCU extended quiescent state
(EQS) by calling guest_enter_irqoff(), and unmask IRQs prior to
exiting the EQS by calling guest_exit(). As the IRQ entry code will not
wake RCU in this case, we may run the core IRQ code and IRQ handler
without RCU watching, leading to various potential problems.

Additionally, we do not inform lockdep or tracing that interrupts will
be enabled during guest execution, which can lead to misleading traces
and warnings that interrupts have been enabled for overly-long periods.

This patch fixes these issues by using the new timing and context
entry/exit helpers to ensure that interrupts are handled during guest
vtime but with RCU watching, with a sequence:

	guest_timing_enter_irqoff();

	guest_state_enter_irqoff();
	< run the vcpu >
	guest_state_exit_irqoff();

	< take any pending IRQs >

	guest_timing_exit_irqoff();

Since instrumentation may make use of RCU, we must also ensure that no
instrumented code is run during the EQS. I've split out the critical
section into a new kvm_arm_vcpu_enter_exit() helper which is marked
noinstr.

Fixes: 1b3d546daf85ed2b ("arm/arm64: KVM: Properly account for guest CPU time")
Reported-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/arm.c | 51 ++++++++++++++++++++++++++++----------------
 1 file changed, 33 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index e4727dc771bf3..b2222d8eb0b55 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -764,6 +764,24 @@ static bool kvm_vcpu_exit_request(struct kvm_vcpu *vcpu, int *ret)
 			xfer_to_guest_mode_work_pending();
 }
 
+/*
+ * Actually run the vCPU, entering an RCU extended quiescent state (EQS) while
+ * the vCPU is running.
+ *
+ * This must be noinstr as instrumentation may make use of RCU, and this is not
+ * safe during the EQS.
+ */
+static int noinstr kvm_arm_vcpu_enter_exit(struct kvm_vcpu *vcpu)
+{
+	int ret;
+
+	guest_state_enter_irqoff();
+	ret = kvm_call_hyp_ret(__kvm_vcpu_run, vcpu);
+	guest_state_exit_irqoff();
+
+	return ret;
+}
+
 /**
  * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
  * @vcpu:	The VCPU pointer
@@ -854,9 +872,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 		 * Enter the guest
 		 */
 		trace_kvm_entry(*vcpu_pc(vcpu));
-		guest_enter_irqoff();
+		guest_timing_enter_irqoff();
 
-		ret = kvm_call_hyp_ret(__kvm_vcpu_run, vcpu);
+		ret = kvm_arm_vcpu_enter_exit(vcpu);
 
 		vcpu->mode = OUTSIDE_GUEST_MODE;
 		vcpu->stat.exits++;
@@ -891,26 +909,23 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 		kvm_arch_vcpu_ctxsync_fp(vcpu);
 
 		/*
-		 * We may have taken a host interrupt in HYP mode (ie
-		 * while executing the guest). This interrupt is still
-		 * pending, as we haven't serviced it yet!
+		 * We must ensure that any pending interrupts are taken before
+		 * we exit guest timing so that timer ticks are accounted as
+		 * guest time. Transiently unmask interrupts so that any
+		 * pending interrupts are taken.
 		 *
-		 * We're now back in SVC mode, with interrupts
-		 * disabled.  Enabling the interrupts now will have
-		 * the effect of taking the interrupt again, in SVC
-		 * mode this time.
+		 * Per ARM DDI 0487G.b section D1.13.4, an ISB (or other
+		 * context synchronization event) is necessary to ensure that
+		 * pending interrupts are taken.
 		 */
 		local_irq_enable();
+		isb();
+		local_irq_disable();
+
+		guest_timing_exit_irqoff();
+
+		local_irq_enable();
 
-		/*
-		 * We do local_irq_enable() before calling guest_exit() so
-		 * that if a timer interrupt hits while running the guest we
-		 * account that tick as being spent in the guest.  We enable
-		 * preemption after calling guest_exit() so that if we get
-		 * preempted we make sure ticks after that is not counted as
-		 * guest time.
-		 */
-		guest_exit();
 		trace_kvm_exit(ret, kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
 
 		/* Exit types that need handling before we can be preempted */
-- 
2.30.2



* [PATCH v2 4/7] kvm/mips: rework guest entry logic
  2022-01-19 10:58 [PATCH v2 0/7] kvm: fix latent guest entry/exit bugs Mark Rutland
                   ` (2 preceding siblings ...)
  2022-01-19 10:58 ` [PATCH v2 3/7] kvm/arm64: rework guest entry logic Mark Rutland
@ 2022-01-19 10:58 ` Mark Rutland
  2022-01-20 11:10   ` Paolo Bonzini
  2022-01-20 16:44   ` Mark Rutland
  2022-01-19 10:58 ` [PATCH v2 5/7] kvm/riscv: " Mark Rutland
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 41+ messages in thread
From: Mark Rutland @ 2022-01-19 10:58 UTC
  To: linux-kernel
  Cc: aleksandar.qemu.devel, alexandru.elisei, anup.patel, aou,
	atish.patra, borntraeger, bp, catalin.marinas, chenhuacai,
	dave.hansen, frankja, frederic, gor, hca, james.morse, jmattson,
	joro, luto, mark.rutland, maz, mingo, mpe, nsaenzju, palmer,
	paulmck, paul.walmsley, pbonzini, peterz, seanjc, suzuki.poulose,
	svens, tglx, tsbogend, vkuznets, wanpengli, will

In kvm_arch_vcpu_ioctl_run() we use guest_enter_irqoff() and
guest_exit_irqoff() directly, with interrupts masked between these. As
we don't handle any timer ticks during this window, we will not account
time spent within the guest as guest time, which is unfortunate.

Additionally, we do not inform lockdep or tracing that interrupts will
be enabled during guest execution, which can lead to misleading traces
and warnings that interrupts have been enabled for overly-long periods.

This patch fixes these issues by using the new timing and context
entry/exit helpers to ensure that interrupts are handled during guest
vtime but with RCU watching, with a sequence:

	guest_timing_enter_irqoff();

	guest_state_enter_irqoff();
	< run the vcpu >
	guest_state_exit_irqoff();

	< take any pending IRQs >

	guest_timing_exit_irqoff();

Since instrumentation may make use of RCU, we must also ensure that no
instrumented code is run during the EQS. I've split out the critical
section into a new kvm_mips_vcpu_enter_exit() helper which is marked
noinstr.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
---
 arch/mips/kvm/mips.c | 37 ++++++++++++++++++++++++++++++++++---
 1 file changed, 34 insertions(+), 3 deletions(-)

diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index aa20d074d3883..1a961c2434fee 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -438,6 +438,24 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
 	return -ENOIOCTLCMD;
 }
 
+/*
+ * Actually run the vCPU, entering an RCU extended quiescent state (EQS) while
+ * the vCPU is running.
+ *
+ * This must be noinstr as instrumentation may make use of RCU, and this is not
+ * safe during the EQS.
+ */
+static int noinstr kvm_mips_vcpu_enter_exit(struct kvm_vcpu *vcpu)
+{
+	int ret;
+
+	guest_state_enter_irqoff();
+	ret = kvm_mips_callbacks->vcpu_run(vcpu);
+	guest_state_exit_irqoff();
+
+	return ret;
+}
+
 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 {
 	int r = -EINTR;
@@ -458,7 +476,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 	lose_fpu(1);
 
 	local_irq_disable();
-	guest_enter_irqoff();
+	guest_timing_enter_irqoff();
 	trace_kvm_enter(vcpu);
 
 	/*
@@ -469,10 +487,23 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 	 */
 	smp_store_mb(vcpu->mode, IN_GUEST_MODE);
 
-	r = kvm_mips_callbacks->vcpu_run(vcpu);
+	r = kvm_mips_vcpu_enter_exit(vcpu);
+
+	/*
+	 * We must ensure that any pending interrupts are taken before
+	 * we exit guest timing so that timer ticks are accounted as
+	 * guest time. Transiently unmask interrupts so that any
+	 * pending interrupts are taken.
+	 *
+	 * TODO: is there a barrier which ensures that pending interrupts are
+	 * recognised? Currently this just hopes that the CPU takes any pending
+	 * interrupts between the enable and disable.
+	 */
+	local_irq_enable();
+	local_irq_disable();
 
 	trace_kvm_out(vcpu);
-	guest_exit_irqoff();
+	guest_timing_exit_irqoff();
 	local_irq_enable();
 
 out:
-- 
2.30.2



* [PATCH v2 5/7] kvm/riscv: rework guest entry logic
  2022-01-19 10:58 [PATCH v2 0/7] kvm: fix latent guest entry/exit bugs Mark Rutland
                   ` (3 preceding siblings ...)
  2022-01-19 10:58 ` [PATCH v2 4/7] kvm/mips: " Mark Rutland
@ 2022-01-19 10:58 ` Mark Rutland
  2022-01-20 11:18   ` Paolo Bonzini
  2022-01-19 10:58 ` [PATCH v2 6/7] kvm/s390: " Mark Rutland
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 41+ messages in thread
From: Mark Rutland @ 2022-01-19 10:58 UTC
  To: linux-kernel
  Cc: aleksandar.qemu.devel, alexandru.elisei, anup.patel, aou,
	atish.patra, borntraeger, bp, catalin.marinas, chenhuacai,
	dave.hansen, frankja, frederic, gor, hca, james.morse, jmattson,
	joro, luto, mark.rutland, maz, mingo, mpe, nsaenzju, palmer,
	paulmck, paul.walmsley, pbonzini, peterz, seanjc, suzuki.poulose,
	svens, tglx, tsbogend, vkuznets, wanpengli, will

In kvm_arch_vcpu_ioctl_run() we enter an RCU extended quiescent state
(EQS) by calling guest_enter_irqoff(), and unmask IRQs prior to exiting
the EQS by calling guest_exit(). As the IRQ entry code will not wake RCU
in this case, we may run the core IRQ code and IRQ handler without RCU
watching, leading to various potential problems.

Additionally, we do not inform lockdep or tracing that interrupts will
be enabled during guest execution, which can lead to misleading traces
and warnings that interrupts have been enabled for overly-long periods.

This patch fixes these issues by using the new timing and context
entry/exit helpers to ensure that interrupts are handled during guest
vtime but with RCU watching, with a sequence:

	guest_timing_enter_irqoff();

	guest_state_enter_irqoff();
	< run the vcpu >
	guest_state_exit_irqoff();

	< take any pending IRQs >

	guest_timing_exit_irqoff();

Since instrumentation may make use of RCU, we must also ensure that no
instrumented code is run during the EQS. I've split out the critical
section into a new kvm_riscv_vcpu_enter_exit() helper which is marked
noinstr.

Fixes: 99cdc6c18c2d815e ("RISC-V: Add initial skeletal KVM support")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Anup Patel <anup.patel@wdc.com>
Cc: Atish Patra <atish.patra@wdc.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
---
 arch/riscv/kvm/vcpu.c | 44 ++++++++++++++++++++++++++-----------------
 1 file changed, 27 insertions(+), 17 deletions(-)

diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index fb84619df0127..5033e8b8541aa 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -675,6 +675,20 @@ static void kvm_riscv_update_hvip(struct kvm_vcpu *vcpu)
 	csr_write(CSR_HVIP, csr->hvip);
 }
 
+/*
+ * Actually run the vCPU, entering an RCU extended quiescent state (EQS) while
+ * the vCPU is running.
+ *
+ * This must be noinstr as instrumentation may make use of RCU, and this is not
+ * safe during the EQS.
+ */
+static void noinstr kvm_riscv_vcpu_enter_exit(struct kvm_vcpu *vcpu)
+{
+	guest_state_enter_irqoff();
+	__kvm_riscv_switch_to(&vcpu->arch);
+	guest_state_exit_irqoff();
+}
+
 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 {
 	int ret;
@@ -766,9 +780,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 			continue;
 		}
 
-		guest_enter_irqoff();
+		guest_timing_enter_irqoff();
 
-		__kvm_riscv_switch_to(&vcpu->arch);
+		kvm_riscv_vcpu_enter_exit(vcpu);
 
 		vcpu->mode = OUTSIDE_GUEST_MODE;
 		vcpu->stat.exits++;
@@ -788,25 +802,21 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 		kvm_riscv_vcpu_sync_interrupts(vcpu);
 
 		/*
-		 * We may have taken a host interrupt in VS/VU-mode (i.e.
-		 * while executing the guest). This interrupt is still
-		 * pending, as we haven't serviced it yet!
+		 * We must ensure that any pending interrupts are taken before
+		 * we exit guest timing so that timer ticks are accounted as
+		 * guest time. Transiently unmask interrupts so that any
+		 * pending interrupts are taken.
 		 *
-		 * We're now back in HS-mode with interrupts disabled
-		 * so enabling the interrupts now will have the effect
-		 * of taking the interrupt again, in HS-mode this time.
+		 * There's no barrier which ensures that pending interrupts are
+		 * recognised, so we just hope that the CPU takes any pending
+		 * interrupts between the enable and disable.
 		 */
 		local_irq_enable();
+		local_irq_disable();
 
-		/*
-		 * We do local_irq_enable() before calling guest_exit() so
-		 * that if a timer interrupt hits while running the guest
-		 * we account that tick as being spent in the guest. We
-		 * enable preemption after calling guest_exit() so that if
-		 * we get preempted we make sure ticks after that is not
-		 * counted as guest time.
-		 */
-		guest_exit();
+		guest_timing_exit_irqoff();
+
+		local_irq_enable();
 
 		preempt_enable();
 
-- 
2.30.2



* [PATCH v2 6/7] kvm/s390: rework guest entry logic
  2022-01-19 10:58 [PATCH v2 0/7] kvm: fix latent guest entry/exit bugs Mark Rutland
                   ` (4 preceding siblings ...)
  2022-01-19 10:58 ` [PATCH v2 5/7] kvm/riscv: " Mark Rutland
@ 2022-01-19 10:58 ` Mark Rutland
  2022-01-19 10:58 ` [PATCH v2 7/7] kvm/x86: " Mark Rutland
  2022-01-19 18:25 ` [PATCH v2 0/7] kvm: fix latent guest entry/exit bugs Christian Borntraeger
  7 siblings, 0 replies; 41+ messages in thread
From: Mark Rutland @ 2022-01-19 10:58 UTC
  To: linux-kernel
  Cc: aleksandar.qemu.devel, alexandru.elisei, anup.patel, aou,
	atish.patra, borntraeger, bp, catalin.marinas, chenhuacai,
	dave.hansen, frankja, frederic, gor, hca, james.morse, jmattson,
	joro, luto, mark.rutland, maz, mingo, mpe, nsaenzju, palmer,
	paulmck, paul.walmsley, pbonzini, peterz, seanjc, suzuki.poulose,
	svens, tglx, tsbogend, vkuznets, wanpengli, will

In __vcpu_run() and do_vsie_run(), we enter an RCU extended quiescent
state (EQS) by calling guest_enter_irqoff(), which lasts until
__vcpu_run() calls guest_exit_irqoff(). However, between the two we
enable interrupts and may handle interrupts during the EQS. As the IRQ
entry code will not wake RCU in this case, we may run the core IRQ code
and IRQ handler without RCU watching, leading to various potential
problems.

It is necessary to unmask (host) interrupts around entering the guest,
as entering the guest via SIE will not automatically unmask these. When
a host interrupt is taken from a guest, it is taken via its regular
host IRQ handler rather than being treated as a direct exit from SIE.
Due to this, we cannot simply mask interrupts around guest entry, and
must handle interrupts during this window, waking RCU as required.

Additionally, between guest_enter_irqoff() and guest_exit_irqoff(), we
use local_irq_enable() and local_irq_disable() to unmask interrupts,
violating the ordering requirements for RCU/lockdep/tracing around
entry/exit sequences. Further, since this occurs in an instrumentable
function, it's possible that instrumented code runs during this window,
with potential usage of RCU, etc.

To fix the RCU wakeup problem, an s390 implementation of
arch_in_rcu_eqs() is added which checks for PF_VCPU in current->flags.
PF_VCPU is set/cleared by guest_timing_{enter,exit}_irqoff(), which
surround the actual guest entry.

To fix the remaining issues, the lower-level guest entry logic is moved
into a shared noinstr helper function using the
guest_state_{enter,exit}_irqoff() helpers. These perform all the
lockdep/RCU/tracing manipulation necessary, but as sie64a() does not
enable/disable interrupts, we must do this explicitly with the
non-instrumented arch_local_irq_{enable,disable}() helpers:

	guest_state_enter_irqoff()

	arch_local_irq_enable();
	sie64a(...);
	arch_local_irq_disable();

	guest_state_exit_irqoff();

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
---
 arch/s390/include/asm/entry-common.h | 10 ++++++
 arch/s390/include/asm/kvm_host.h     |  3 ++
 arch/s390/kvm/kvm-s390.c             | 49 +++++++++++++++++++++-------
 arch/s390/kvm/vsie.c                 | 17 ++++------
 4 files changed, 58 insertions(+), 21 deletions(-)

diff --git a/arch/s390/include/asm/entry-common.h b/arch/s390/include/asm/entry-common.h
index 17aead80aadba..e69a2ab28b847 100644
--- a/arch/s390/include/asm/entry-common.h
+++ b/arch/s390/include/asm/entry-common.h
@@ -57,6 +57,16 @@ static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
 
 #define arch_exit_to_user_mode_prepare arch_exit_to_user_mode_prepare
 
+static __always_inline bool arch_in_rcu_eqs(void)
+{
+	if (IS_ENABLED(CONFIG_KVM))
+		return current->flags & PF_VCPU;
+
+	return false;
+}
+
+#define arch_in_rcu_eqs arch_in_rcu_eqs
+
 static inline bool on_thread_stack(void)
 {
 	return !(((unsigned long)(current->stack) ^ current_stack_pointer()) & ~(THREAD_SIZE - 1));
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index a604d51acfc83..bf7efd990039b 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -995,6 +995,9 @@ void kvm_arch_crypto_set_masks(struct kvm *kvm, unsigned long *apm,
 extern int sie64a(struct kvm_s390_sie_block *, u64 *);
 extern char sie_exit;
 
+extern int kvm_s390_enter_exit_sie(struct kvm_s390_sie_block *scb,
+				   u64 *gprs);
+
 extern int kvm_s390_gisc_register(struct kvm *kvm, u32 gisc);
 extern int kvm_s390_gisc_unregister(struct kvm *kvm, u32 gisc);
 
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 14a18ba5ff2c8..d13401bf6a5a2 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4169,6 +4169,30 @@ static int vcpu_post_run(struct kvm_vcpu *vcpu, int exit_reason)
 	return vcpu_post_run_fault_in_sie(vcpu);
 }
 
+int noinstr kvm_s390_enter_exit_sie(struct kvm_s390_sie_block *scb,
+				    u64 *gprs)
+{
+	int ret;
+
+	guest_state_enter_irqoff();
+
+	/*
+	 * The guest_state_{enter,exit}_irqoff() functions inform lockdep and
+	 * tracing that entry to the guest will enable host IRQs, and exit from
+	 * the guest will disable host IRQs.
+	 *
+	 * We must not use lockdep/tracing/RCU in this critical section, so we
+	 * use the low-level arch_local_irq_*() helpers to enable/disable IRQs.
+	 */
+	arch_local_irq_enable();
+	ret = sie64a(scb, gprs);
+	arch_local_irq_disable();
+
+	guest_state_exit_irqoff();
+
+	return ret;
+}
+
 #define PSW_INT_MASK (PSW_MASK_EXT | PSW_MASK_IO | PSW_MASK_MCHECK)
 static int __vcpu_run(struct kvm_vcpu *vcpu)
 {
@@ -4189,12 +4213,9 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
 		srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
 		/*
 		 * As PF_VCPU will be used in fault handler, between
-		 * guest_enter and guest_exit should be no uaccess.
+		 * guest_timing_enter_irqoff and guest_timing_exit_irqoff
+		 * should be no uaccess.
 		 */
-		local_irq_disable();
-		guest_enter_irqoff();
-		__disable_cpu_timer_accounting(vcpu);
-		local_irq_enable();
 		if (kvm_s390_pv_cpu_is_protected(vcpu)) {
 			memcpy(sie_page->pv_grregs,
 			       vcpu->run->s.regs.gprs,
@@ -4202,8 +4223,18 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
 		}
 		if (test_cpu_flag(CIF_FPU))
 			load_fpu_regs();
-		exit_reason = sie64a(vcpu->arch.sie_block,
-				     vcpu->run->s.regs.gprs);
+
+		local_irq_disable();
+		guest_timing_enter_irqoff();
+		__disable_cpu_timer_accounting(vcpu);
+
+		exit_reason = kvm_s390_enter_exit_sie(vcpu->arch.sie_block,
+						      vcpu->run->s.regs.gprs);
+
+		__enable_cpu_timer_accounting(vcpu);
+		guest_timing_exit_irqoff();
+		local_irq_enable();
+
 		if (kvm_s390_pv_cpu_is_protected(vcpu)) {
 			memcpy(vcpu->run->s.regs.gprs,
 			       sie_page->pv_grregs,
@@ -4219,10 +4250,6 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
 				vcpu->arch.sie_block->gpsw.mask &= ~PSW_INT_MASK;
 			}
 		}
-		local_irq_disable();
-		__enable_cpu_timer_accounting(vcpu);
-		guest_exit_irqoff();
-		local_irq_enable();
 		vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
 
 		rc = vcpu_post_run(vcpu, exit_reason);
diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
index acda4b6fc8518..e9b0b2d04e1e3 100644
--- a/arch/s390/kvm/vsie.c
+++ b/arch/s390/kvm/vsie.c
@@ -1106,10 +1106,6 @@ static int do_vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
 	    vcpu->arch.sie_block->fpf & FPF_BPBC)
 		set_thread_flag(TIF_ISOLATE_BP_GUEST);
 
-	local_irq_disable();
-	guest_enter_irqoff();
-	local_irq_enable();
-
 	/*
 	 * Simulate a SIE entry of the VCPU (see sie64a), so VCPU blocking
 	 * and VCPU requests also hinder the vSIE from running and lead
@@ -1120,15 +1116,16 @@ static int do_vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
 	barrier();
 	if (test_cpu_flag(CIF_FPU))
 		load_fpu_regs();
-	if (!kvm_s390_vcpu_sie_inhibited(vcpu))
-		rc = sie64a(scb_s, vcpu->run->s.regs.gprs);
+	if (!kvm_s390_vcpu_sie_inhibited(vcpu)) {
+		local_irq_disable();
+		guest_timing_enter_irqoff();
+		rc = kvm_s390_enter_exit_sie(scb_s, vcpu->run->s.regs.gprs);
+		guest_timing_exit_irqoff();
+		local_irq_enable();
+	}
 	barrier();
 	vcpu->arch.sie_block->prog0c &= ~PROG_IN_SIE;
 
-	local_irq_disable();
-	guest_exit_irqoff();
-	local_irq_enable();
-
 	/* restore guest state for bp isolation override */
 	if (!guest_bp_isolation)
 		clear_thread_flag(TIF_ISOLATE_BP_GUEST);
-- 
2.30.2



* [PATCH v2 7/7] kvm/x86: rework guest entry logic
  2022-01-19 10:58 [PATCH v2 0/7] kvm: fix latent guest entry/exit bugs Mark Rutland
                   ` (5 preceding siblings ...)
  2022-01-19 10:58 ` [PATCH v2 6/7] kvm/s390: " Mark Rutland
@ 2022-01-19 10:58 ` Mark Rutland
  2022-01-20 11:20   ` Paolo Bonzini
  2022-01-21 17:40   ` Nicolas Saenz Julienne
  2022-01-19 18:25 ` [PATCH v2 0/7] kvm: fix latent guest entry/exit bugs Christian Borntraeger
  7 siblings, 2 replies; 41+ messages in thread
From: Mark Rutland @ 2022-01-19 10:58 UTC
  To: linux-kernel
  Cc: aleksandar.qemu.devel, alexandru.elisei, anup.patel, aou,
	atish.patra, borntraeger, bp, catalin.marinas, chenhuacai,
	dave.hansen, frankja, frederic, gor, hca, james.morse, jmattson,
	joro, luto, mark.rutland, maz, mingo, mpe, nsaenzju, palmer,
	paulmck, paul.walmsley, pbonzini, peterz, seanjc, suzuki.poulose,
	svens, tglx, tsbogend, vkuznets, wanpengli, will

For consistency and clarity, migrate x86 over to the generic helpers for
guest timing and lockdep/RCU/tracing management, and remove the
x86-specific helpers.

Prior to this patch, the guest timing was entered in
kvm_guest_enter_irqoff() (called by vmx_vcpu_enter_exit() and
svm_vcpu_enter_exit()), and was exited by the call to
vtime_account_guest_exit() within vcpu_enter_guest().

To minimize duplication and to more clearly balance entry and exit, both
entry and exit of guest timing are placed in vcpu_enter_guest(), using
the new guest_timing_{enter,exit}_irqoff() helpers. When context
tracking is used a small amount of additional time will be accounted
towards guests; tick-based accounting is unaffected as IRQs are
disabled at this point and not enabled until after the return from the
guest.

This also corrects (benign) mis-balanced context tracking accounting
introduced in commits:

  ae95f566b3d22ade ("KVM: X86: TSCDEADLINE MSR emulation fastpath")
  26efe2fd92e50822 ("KVM: VMX: Handle preemption timer fastpath")

Where KVM can enter a guest multiple times, calling vtime_guest_enter()
without a corresponding call to vtime_account_guest_exit(), and with
vtime_account_system() called when vtime_account_guest() should be used.
As account_system_time() checks PF_VCPU and calls account_guest_time(),
this doesn't result in any functional problem, but is unnecessarily
confusing.
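
For reference, this is benign because the core accounting logic routes
ticks for PF_VCPU tasks to guest time regardless, roughly as below
(simplified from kernel/sched/cputime.c):

	void account_system_time(struct task_struct *p, int hardirq_offset,
				 u64 cputime)
	{
		/* ticks during PF_VCPU are accounted as guest time */
		if ((p->flags & PF_VCPU) && (irq_count() - hardirq_offset == 0)) {
			account_guest_time(p, cputime);
			return;
		}
		/* ... otherwise accounted as system time ... */
	}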

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/svm/svm.c |  4 ++--
 arch/x86/kvm/vmx/vmx.c |  4 ++--
 arch/x86/kvm/x86.c     |  4 +++-
 arch/x86/kvm/x86.h     | 45 ------------------------------------------
 4 files changed, 7 insertions(+), 50 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 5151efa424acb..1253add2c1075 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3814,7 +3814,7 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu)
 	struct vcpu_svm *svm = to_svm(vcpu);
 	unsigned long vmcb_pa = svm->current_vmcb->pa;
 
-	kvm_guest_enter_irqoff();
+	guest_state_enter_irqoff();
 
 	if (sev_es_guest(vcpu->kvm)) {
 		__svm_sev_es_vcpu_run(vmcb_pa);
@@ -3834,7 +3834,7 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu)
 		vmload(__sme_page_pa(sd->save_area));
 	}
 
-	kvm_guest_exit_irqoff();
+	guest_state_exit_irqoff();
 }
 
 static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0dbf94eb954fd..f458026a85159 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6593,7 +6593,7 @@ static fastpath_t vmx_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
 static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 					struct vcpu_vmx *vmx)
 {
-	kvm_guest_enter_irqoff();
+	guest_state_enter_irqoff();
 
 	/* L1D Flush includes CPU buffer clear to mitigate MDS */
 	if (static_branch_unlikely(&vmx_l1d_should_flush))
@@ -6609,7 +6609,7 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 
 	vcpu->arch.cr2 = native_read_cr2();
 
-	kvm_guest_exit_irqoff();
+	guest_state_exit_irqoff();
 }
 
 static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e50e97ac44084..bd3873b90889d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9876,6 +9876,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		set_debugreg(0, 7);
 	}
 
+	guest_timing_enter_irqoff();
+
 	for (;;) {
 		/*
 		 * Assert that vCPU vs. VM APICv state is consistent.  An APICv
@@ -9949,7 +9951,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	 * of accounting via context tracking, but the loss of accuracy is
 	 * acceptable for all known use cases.
 	 */
-	vtime_account_guest_exit();
+	guest_timing_exit_irqoff();
 
 	if (lapic_in_kernel(vcpu)) {
 		s64 delta = vcpu->arch.apic->lapic_timer.advance_expire_delta;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 4abcd8d9836dd..8e50645ac740e 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -10,51 +10,6 @@
 
 void kvm_spurious_fault(void);
 
-static __always_inline void kvm_guest_enter_irqoff(void)
-{
-	/*
-	 * VMENTER enables interrupts (host state), but the kernel state is
-	 * interrupts disabled when this is invoked. Also tell RCU about
-	 * it. This is the same logic as for exit_to_user_mode().
-	 *
-	 * This ensures that e.g. latency analysis on the host observes
-	 * guest mode as interrupt enabled.
-	 *
-	 * guest_enter_irqoff() informs context tracking about the
-	 * transition to guest mode and if enabled adjusts RCU state
-	 * accordingly.
-	 */
-	instrumentation_begin();
-	trace_hardirqs_on_prepare();
-	lockdep_hardirqs_on_prepare(CALLER_ADDR0);
-	instrumentation_end();
-
-	guest_enter_irqoff();
-	lockdep_hardirqs_on(CALLER_ADDR0);
-}
-
-static __always_inline void kvm_guest_exit_irqoff(void)
-{
-	/*
-	 * VMEXIT disables interrupts (host state), but tracing and lockdep
-	 * have them in state 'on' as recorded before entering guest mode.
-	 * Same as enter_from_user_mode().
-	 *
-	 * context_tracking_guest_exit() restores host context and reinstates
-	 * RCU if enabled and required.
-	 *
-	 * This needs to be done immediately after VM-Exit, before any code
-	 * that might contain tracepoints or call out to the greater world,
-	 * e.g. before x86_spec_ctrl_restore_host().
-	 */
-	lockdep_hardirqs_off(CALLER_ADDR0);
-	context_tracking_guest_exit();
-
-	instrumentation_begin();
-	trace_hardirqs_off_finish();
-	instrumentation_end();
-}
-
 #define KVM_NESTED_VMENTER_CONSISTENCY_CHECK(consistency_check)		\
 ({									\
 	bool failed = (consistency_check);				\
-- 
2.30.2



* Re: [PATCH v2 1/7] entry: add arch_in_rcu_eqs()
  2022-01-19 10:58 ` [PATCH v2 1/7] entry: add arch_in_rcu_eqs() Mark Rutland
@ 2022-01-19 17:35   ` Christian Borntraeger
  2022-01-21 17:34   ` Nicolas Saenz Julienne
  1 sibling, 0 replies; 41+ messages in thread
From: Christian Borntraeger @ 2022-01-19 17:35 UTC
  To: Mark Rutland, linux-kernel, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner
  Cc: aleksandar.qemu.devel, alexandru.elisei, anup.patel, aou,
	atish.patra, bp, catalin.marinas, chenhuacai, dave.hansen,
	frankja, frederic, gor, hca, james.morse, jmattson, joro, maz,
	mingo, mpe, nsaenzju, palmer, paulmck, paul.walmsley, pbonzini,
	seanjc, suzuki.poulose, svens, tsbogend, vkuznets, wanpengli,
	will



On 19.01.22 at 11:58, Mark Rutland wrote:
> All architectures have an interruptible RCU extended quiescent state
> (EQS) as part of their idle sequences, where interrupts can occur
> without RCU watching. Entry code must account for this and wake RCU as
> necessary; the common entry code deals with this in irqentry_enter() by
> treating any interrupt from an idle thread as potentially having
> occurred with an EQS and waking RCU for the duration of the interrupt
> via rcu_irq_enter() .. rcu_irq_exit().
> 
> Some architectures may have other interruptible EQSs which require
> similar treatment. For example, on s390 is it necessary to enable
> interrupts around guest entry in the middle of a period where core KVM
> code has entered an EQS.
> 
> So that architectueres can wake RCU in these cases, this patch adds a
                     ^
> new arch_in_rcu_eqs() hook to the common entry code which is checked in
> addition to the existing is_idle_task() check, with RCU woken if
> either returns true. A default implementation is provided which always
> returns false, which suffices for most architectures.
> 
> As no architectures currently implement arch_in_rcu_eqs(), there should
> be no functional change as a result of this patch alone. A subsequent
> patch will add an s390 implementation to fix a latent bug with missing
> RCU wakeups.
> 
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
> Cc: Heiko Carstens <hca@linux.ibm.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Sven Schnelle <svens@linux.ibm.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>   include/linux/entry-common.h | 16 ++++++++++++++++
>   kernel/entry/common.c        |  3 ++-
>   2 files changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
> index 2e2b8d6140ed4..f1b91a13a15a6 100644
> --- a/include/linux/entry-common.h
> +++ b/include/linux/entry-common.h
> @@ -99,6 +99,22 @@ static inline __must_check int arch_syscall_enter_tracehook(struct pt_regs *regs
>   }
>   #endif
>   
> +/**
> + * arch_in_rcu_eqs - Architecture specific check for RCU extended quiescent
> + * states.
> + *
> + * Returns: true if the CPU is potentially in an RCU EQS, false otherwise.
> + *
> + * Architectures only need to define this if threads other than the idle thread
> + * may have an interruptible EQS. This does not need to handle idle threads. It
> + * is safe to over-estimate at the cost of redundant RCU management work.
> + *
> + * Invoked from irqentry_enter()
> + */
> +#ifndef arch_in_rcu_eqs
> +static __always_inline bool arch_in_rcu_eqs(void) { return false; }
> +#endif
> +
>   /**
>    * enter_from_user_mode - Establish state when coming from user mode
>    *
> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> index d5a61d565ad5d..b13d4e0b0b643 100644
> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -349,7 +349,8 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
>   	 * TINY_RCU does not support EQS, so let the compiler eliminate
>   	 * this part when enabled.
>   	 */
> -	if (!IS_ENABLED(CONFIG_TINY_RCU) && is_idle_task(current)) {
> +	if (!IS_ENABLED(CONFIG_TINY_RCU) &&
> +	    (is_idle_task(current) || arch_in_rcu_eqs())) {
>   		/*
>   		 * If RCU is not watching then the same careful
>   		 * sequence vs. lockdep and tracing is required


* Re: [PATCH v2 0/7] kvm: fix latent guest entry/exit bugs
  2022-01-19 10:58 [PATCH v2 0/7] kvm: fix latent guest entry/exit bugs Mark Rutland
                   ` (6 preceding siblings ...)
  2022-01-19 10:58 ` [PATCH v2 7/7] kvm/x86: " Mark Rutland
@ 2022-01-19 18:25 ` Christian Borntraeger
  2022-01-19 18:28   ` Christian Borntraeger
  2022-01-19 19:22   ` Mark Rutland
  7 siblings, 2 replies; 41+ messages in thread
From: Christian Borntraeger @ 2022-01-19 18:25 UTC
  To: Mark Rutland, linux-kernel, Paolo Bonzini, Michael Ellerman
  Cc: aleksandar.qemu.devel, alexandru.elisei, anup.patel, aou,
	atish.patra, bp, catalin.marinas, chenhuacai, dave.hansen,
	frankja, frederic, gor, hca, james.morse, jmattson, joro, luto,
	maz, mingo, nsaenzju, palmer, paulmck, paul.walmsley, peterz,
	seanjc, suzuki.poulose, svens, tglx, tsbogend, vkuznets,
	wanpengli, will, Anup Patel, Atish Patra

On 19.01.22 at 11:58, Mark Rutland wrote:


CCing new emails for Anup and Atish so that they are aware of this thread.

> Several architectures have latent bugs around guest entry/exit. This
> series addresses those for:
> 
> 	arm64, mips, riscv, s390, x86
> 
> However, I'm not sure how to address powerpc and could do with some help
> there. I have build-tested the arm64, mips, riscv, s390, and x86 cases,
> but I don't have a suitable HW setup to test these, so any review and/or
> testing would be much appreciated.
> 
> Issues include:
> 
> 1) Several architectures enable interrupts between guest_enter() and
>     guest_exit(). As this period is an RCU extended quiescent state (EQS)
>     this is unsound unless the irq entry code explicitly wakes RCU, which
>     most architectures only do for entry from userspace or idle.
> 
>     I believe this affects: arm64, riscv, s390
> 
>     I am not sure about powerpc.
> 
> 2) Several architectures permit instrumentation of code between
>     guest_enter() and guest_exit(), e.g. KASAN, KCOV, KCSAN, etc. As
>     instrumentation may directly or indirectly use RCU, this has the same
>     problems as with interrupts.
> 
>     I believe this affects: arm64, mips, powerpc, riscv, s390
> 
> 3) Several architectures do not inform lockdep and tracing that
>     interrupts are enabled during the execution of the guest, or do so in
>     an incorrect order. Generally this means that logs will report IRQs
>     being masked for much longer than is actually the case, which is not
>     ideal for debugging. I don't know whether this affects the
>     correctness of lockdep.
> 
>     I believe this affects: arm64, mips, powerpc, riscv, s390
> 
> This was previously fixed for x86 specifically in a series of commits:
> 
>    87fa7f3e98a1310e ("x86/kvm: Move context tracking where it belongs")
>    0642391e2139a2c1 ("x86/kvm/vmx: Add hardirq tracing to guest enter/exit")
>    9fc975e9efd03e57 ("x86/kvm/svm: Add hardirq tracing on guest enter/exit")
>    3ebccdf373c21d86 ("x86/kvm/vmx: Move guest enter/exit into .noinstr.text")
>    135961e0a7d555fc ("x86/kvm/svm: Move guest enter/exit into .noinstr.text")
>    160457140187c5fb ("KVM: x86: Defer vtime accounting 'til after IRQ handling")
>    bc908e091b326467 ("KVM: x86: Consolidate guest enter/exit logic to common helpers")
> 
> But other architectures were left broken, and the infrastructure for
> handling this correctly is x86-specific.
> 
> This series introduces generic helper functions which can be used to
> handle the problems above, and migrates architectures over to these,
> fixing the latent issues. For s390, where the KVM guest EQS is
> interruptible, I've added infrastructure to wake RCU during this EQS.
> 
> Since v1 [1]:
> * Add arch_in_rcu_eqs()
> * Convert s390
> * Rename exit_to_guest_mode() -> guest_state_enter_irqoff()
> * Rename enter_from_guest_mode() -> guest_state_exit_irqoff()
> * Various commit message cleanups
> 
> I've pushed the series (based on v5.16) to my kvm/entry-rework branch:
> 
>    https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=kvm/entry-rework
>    git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git kvm/entry-rework
> 
> ... with this version tagged as kvm-entry-rework-20210119.
> 
> [1] https://lore.kernel.org/r/20220111153539.2532246-1-mark.rutland@arm.com/
> 
> Thanks,
> Mark.
> 
> Mark Rutland (7):
>    entry: add arch_in_rcu_eqs()
>    kvm: add guest_state_{enter,exit}_irqoff()
>    kvm/arm64: rework guest entry logic
>    kvm/mips: rework guest entry logic
>    kvm/riscv: rework guest entry logic
>    kvm/s390: rework guest entry logic
>    kvm/x86: rework guest entry logic
> 
>   arch/arm64/kvm/arm.c                 |  51 +++++++-----
>   arch/mips/kvm/mips.c                 |  37 ++++++++-
>   arch/riscv/kvm/vcpu.c                |  44 +++++++----
>   arch/s390/include/asm/entry-common.h |  10 +++
>   arch/s390/include/asm/kvm_host.h     |   3 +
>   arch/s390/kvm/kvm-s390.c             |  49 +++++++++---
>   arch/s390/kvm/vsie.c                 |  17 ++--
>   arch/x86/kvm/svm/svm.c               |   4 +-
>   arch/x86/kvm/vmx/vmx.c               |   4 +-
>   arch/x86/kvm/x86.c                   |   4 +-
>   arch/x86/kvm/x86.h                   |  45 -----------
>   include/linux/entry-common.h         |  16 ++++
>   include/linux/kvm_host.h             | 112 ++++++++++++++++++++++++++-
>   kernel/entry/common.c                |   3 +-
>   14 files changed, 286 insertions(+), 113 deletions(-)
> 


I just gave this a spin on s390 with debugging on and I got the following:

[  457.151295] ------------[ cut here ]------------
[  457.151311] WARNING: CPU: 14 PID: 0 at kernel/rcu/tree.c:613 rcu_eqs_enter.constprop.0+0xf8/0x118
[  457.151324] Modules linked in: vhost_vsock vmw_vsock_virtio_transport_common vsock vhost vhost_iotlb xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT xt_tcpudp nft_compat nf_nat_tftp nft_objref nf_conntrack_tftp nft_counter kvm nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink sunrpc mlx5_ib ib_uverbs s390_trng ib_core genwqe_card crc_itu_t vfio_ccw mdev vfio_iommu_type1 eadm_sch vfio zcrypt_cex4 sch_fq_codel configfs ip_tables x_tables mlx5_core ghash_s390 prng aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common pkey zcrypt rng_core autofs4
[  457.151422] CPU: 14 PID: 0 Comm: swapper/14 Not tainted 5.16.0-00007-g89e9021389e2 #3
[  457.151428] Hardware name: IBM 3906 M04 704 (LPAR)
[  457.151432] Krnl PSW : 0404d00180000000 00000000a7c0495c (rcu_eqs_enter.constprop.0+0xfc/0x118)
[  457.151440]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
[  457.151445] Krnl GPRS: ffffffffebd81d31 4000000000000000 0000000000000070 00000000a7fd7024
[  457.151450]            0000000000000000 0000000000000001 0000000000000000 0000000000000000
[  457.151454]            000000000000000e 000000000000000e 00000000a84d3a88 0000001fd8645c00
[  457.151458]            0000000000000000 0000000000000000 00000000a7c04882 0000038000653dc0
[  457.151468] Krnl Code: 00000000a7c0494c: ebaff0a00004	lmg	%r10,%r15,160(%r15)
                           00000000a7c04952: c0f4fffffef7	brcl	15,00000000a7c04740
                          #00000000a7c04958: af000000		mc	0,0
                          >00000000a7c0495c: a7f4ffa3		brc	15,00000000a7c048a2
                           00000000a7c04960: c0e500003f70	brasl	%r14,00000000a7c0c840
                           00000000a7c04966: a7f4ffcd		brc	15,00000000a7c04900
                           00000000a7c0496a: c0e500003f6b	brasl	%r14,00000000a7c0c840
                           00000000a7c04970: a7f4ffde		brc	15,00000000a7c0492c
[  457.151527] Call Trace:
[  457.151530]  [<00000000a7c0495c>] rcu_eqs_enter.constprop.0+0xfc/0x118
[  457.151536] ([<00000000a7c04882>] rcu_eqs_enter.constprop.0+0x22/0x118)
[  457.151540]  [<00000000a7c14cd2>] default_idle_call+0x62/0xd8
[  457.151545]  [<00000000a6f816c6>] do_idle+0xf6/0x1b0
[  457.151553]  [<00000000a6f81a06>] cpu_startup_entry+0x36/0x40
[  457.151558]  [<00000000a7c16abe>] restart_int_handler+0x6e/0x90
[  457.151563] no locks held by swapper/14/0.
[  457.151567] Last Breaking-Event-Address:
[  457.151570]  [<00000000a7c0489e>] rcu_eqs_enter.constprop.0+0x3e/0x118
[  457.151574] irq event stamp: 608654
[  457.151578] hardirqs last  enabled at (608653): [<00000000a70190d8>] tick_nohz_idle_enter+0xb0/0x130
[  457.151584] hardirqs last disabled at (608654): [<00000000a6f8173e>] do_idle+0x16e/0x1b0
[  457.151589] softirqs last  enabled at (608586): [<00000000a7c1861a>] __do_softirq+0x4ba/0x668
[  457.151594] softirqs last disabled at (608581): [<00000000a6f367c6>] __irq_exit_rcu+0x13e/0x170
[  457.151600] ---[ end trace 2ae2154f9724de86 ]---

I can't see right now what's wrong; your patches look sane.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 0/7] kvm: fix latent guest entry/exit bugs
  2022-01-19 18:25 ` [PATCH v2 0/7] kvm: fix latent guest entry/exit bugs Christian Borntraeger
@ 2022-01-19 18:28   ` Christian Borntraeger
  2022-01-19 19:22   ` Mark Rutland
  1 sibling, 0 replies; 41+ messages in thread
From: Christian Borntraeger @ 2022-01-19 18:28 UTC (permalink / raw)
  To: Mark Rutland, linux-kernel, Paolo Bonzini, Michael Ellerman
  Cc: aleksandar.qemu.devel, alexandru.elisei, anup.patel, aou,
	atish.patra, bp, catalin.marinas, chenhuacai, dave.hansen,
	frankja, frederic, gor, hca, james.morse, jmattson, joro, luto,
	maz, mingo, nsaenzju, palmer, paulmck, paul.walmsley, peterz,
	seanjc, suzuki.poulose, svens, tglx, tsbogend, vkuznets,
	wanpengli, will, Anup Patel, Atish Patra

> [  457.151295] ------------[ cut here ]------------
> [  457.151311] WARNING: CPU: 14 PID: 0 at kernel/rcu/tree.c:613 rcu_eqs_enter.constprop.0+0xf8/0x118
> [  457.151324] Modules linked in: vhost_vsock vmw_vsock_virtio_transport_common vsock vhost vhost_iotlb xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT xt_tcpudp nft_compat nf_nat_tftp nft_objref nf_conntrack_tftp nft_counter kvm nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink sunrpc mlx5_ib ib_uverbs s390_trng ib_core genwqe_card crc_itu_t vfio_ccw mdev vfio_iommu_type1 eadm_sch vfio zcrypt_cex4 sch_fq_codel configfs ip_tables x_tables mlx5_core ghash_s390 prng aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common pkey zcrypt rng_core autofs4
> [  457.151422] CPU: 14 PID: 0 Comm: swapper/14 Not tainted 5.16.0-00007-g89e9021389e2 #3
> [  457.151428] Hardware name: IBM 3906 M04 704 (LPAR)
> [  457.151432] Krnl PSW : 0404d00180000000 00000000a7c0495c (rcu_eqs_enter.constprop.0+0xfc/0x118)
> [  457.151440]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
> [  457.151445] Krnl GPRS: ffffffffebd81d31 4000000000000000 0000000000000070 00000000a7fd7024
> [  457.151450]            0000000000000000 0000000000000001 0000000000000000 0000000000000000
> [  457.151454]            000000000000000e 000000000000000e 00000000a84d3a88 0000001fd8645c00
> [  457.151458]            0000000000000000 0000000000000000 00000000a7c04882 0000038000653dc0
> [  457.151468] Krnl Code: 00000000a7c0494c: ebaff0a00004    lmg    %r10,%r15,160(%r15)
>                            00000000a7c04952: c0f4fffffef7    brcl    15,00000000a7c04740
>                           #00000000a7c04958: af000000        mc    0,0
>                           >00000000a7c0495c: a7f4ffa3        brc    15,00000000a7c048a2
>                            00000000a7c04960: c0e500003f70    brasl    %r14,00000000a7c0c840
>                            00000000a7c04966: a7f4ffcd        brc    15,00000000a7c04900
>                            00000000a7c0496a: c0e500003f6b    brasl    %r14,00000000a7c0c840
>                            00000000a7c04970: a7f4ffde        brc    15,00000000a7c0492c
> [  457.151527] Call Trace:
> [  457.151530]  [<00000000a7c0495c>] rcu_eqs_enter.constprop.0+0xfc/0x118
> [  457.151536] ([<00000000a7c04882>] rcu_eqs_enter.constprop.0+0x22/0x118)
> [  457.151540]  [<00000000a7c14cd2>] default_idle_call+0x62/0xd8
> [  457.151545]  [<00000000a6f816c6>] do_idle+0xf6/0x1b0
> [  457.151553]  [<00000000a6f81a06>] cpu_startup_entry+0x36/0x40
> [  457.151558]  [<00000000a7c16abe>] restart_int_handler+0x6e/0x90
> [  457.151563] no locks held by swapper/14/0.
> [  457.151567] Last Breaking-Event-Address:
> [  457.151570]  [<00000000a7c0489e>] rcu_eqs_enter.constprop.0+0x3e/0x118
> [  457.151574] irq event stamp: 608654
> [  457.151578] hardirqs last  enabled at (608653): [<00000000a70190d8>] tick_nohz_idle_enter+0xb0/0x130
> [  457.151584] hardirqs last disabled at (608654): [<00000000a6f8173e>] do_idle+0x16e/0x1b0
> [  457.151589] softirqs last  enabled at (608586): [<00000000a7c1861a>] __do_softirq+0x4ba/0x668
> [  457.151594] softirqs last disabled at (608581): [<00000000a6f367c6>] __irq_exit_rcu+0x13e/0x170
> [  457.151600] ---[ end trace 2ae2154f9724de86 ]---
> 
> I can't see right now what's wrong; your patches look sane.


Now followed by another splat (which might be a follow-up to the error above):

[  574.301736] =============================
[  574.301741] WARNING: suspicious RCU usage
[  574.301746] 5.16.0-00007-g89e9021389e2 #3 Tainted: G        W
[  574.301751] -----------------------------
[  574.301755] kernel/rcu/tree.c:821 Bad RCU  dynticks_nmi_nesting counter
                !
[  574.301760]
                other info that might help us debug this:

[  574.301764]
                rcu_scheduler_active = 2, debug_locks = 1
[  574.301769] 2 locks held by CPU 0/KVM/8327:
[  574.301773]  #0: 00000001521956b8 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x9a/0xa40 [kvm]
[  574.301843]  #1: 000000016e92c6a8 (&kvm->srcu){....}-{0:0}, at: __vcpu_run+0x332/0x5b8 [kvm]
[  574.301873]
                stack backtrace:
[  574.301878] CPU: 46 PID: 8327 Comm: CPU 0/KVM Tainted: G        W         5.16.0-00007-g89e9021389e2 #3
[  574.301884] Hardware name: IBM 3906 M04 704 (LPAR)
[  574.301888] Call Trace:
[  574.301892]  [<00000000a7c001c6>] dump_stack_lvl+0x8e/0xc8
[  574.301903]  [<00000000a6fee70e>] rcu_irq_exit_check_preempt+0x136/0x1c8
[  574.301913]  [<00000000a6ff9d1a>] irqentry_exit_cond_resched+0x32/0x80
[  574.301921]  [<00000000a7c0521c>] irqentry_exit+0xac/0x130
[  574.301927]  [<00000000a7c16626>] ext_int_handler+0xe6/0x120
[  574.301933]  [<00000000a6fc2482>] lock_acquire.part.0+0x12a/0x258
[  574.301939] ([<00000000a6fc2450>] lock_acquire.part.0+0xf8/0x258)
[  574.301944]  [<00000000a6fc2660>] lock_acquire+0xb0/0x200
[  574.302053]  [<000003ff807092ce>] __vcpu_run+0x376/0x5b8 [kvm]
[  574.302076]  [<000003ff807099ba>] kvm_arch_vcpu_ioctl_run+0x10a/0x270 [kvm]
[  574.302098]  [<000003ff806f002c>] kvm_vcpu_ioctl+0x27c/0xa40 [kvm]
[  574.302120]  [<00000000a728b5b6>] __s390x_sys_ioctl+0xbe/0x100
[  574.302129]  [<00000000a7c038ea>] __do_syscall+0x1da/0x208
[  574.302133]  [<00000000a7c16352>] system_call+0x82/0xb0
[  574.302138] 2 locks held by CPU 0/KVM/8327:
[  574.302143]  #0: 00000001521956b8 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x9a/0xa40 [kvm]
[  574.302172]  #1: 000000016e92c6a8 (&kvm->srcu){....}-{0:0}, at: __vcpu_run+0x332/0x5b8 [kvm]


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 0/7] kvm: fix latent guest entry/exit bugs
  2022-01-19 18:25 ` [PATCH v2 0/7] kvm: fix latent guest entry/exit bugs Christian Borntraeger
  2022-01-19 18:28   ` Christian Borntraeger
@ 2022-01-19 19:22   ` Mark Rutland
  2022-01-19 19:30     ` Christian Borntraeger
  2022-01-20 11:28     ` Paolo Bonzini
  1 sibling, 2 replies; 41+ messages in thread
From: Mark Rutland @ 2022-01-19 19:22 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: linux-kernel, Paolo Bonzini, Michael Ellerman,
	aleksandar.qemu.devel, alexandru.elisei, anup.patel, aou,
	atish.patra, bp, catalin.marinas, chenhuacai, dave.hansen,
	frankja, frederic, gor, hca, james.morse, jmattson, joro, luto,
	maz, mingo, nsaenzju, palmer, paulmck, paul.walmsley, peterz,
	seanjc, suzuki.poulose, svens, tglx, tsbogend, vkuznets,
	wanpengli, will, Anup Patel, Atish Patra

On Wed, Jan 19, 2022 at 07:25:20PM +0100, Christian Borntraeger wrote:
> Am 19.01.22 um 11:58 schrieb Mark Rutland:
> 
> 
> CCing new emails for Anup and Atish so that they are aware of this thread.

Ah; whoops. I'd meant to fix the Ccs on the patches.

Thanks!

[...]

> I just gave this a spin on s390 with debugging on and I got the following:
> 
> [  457.151295] ------------[ cut here ]------------
> [  457.151311] WARNING: CPU: 14 PID: 0 at kernel/rcu/tree.c:613 rcu_eqs_enter.constprop.0+0xf8/0x118

Hmm, so IIUC that's:

	WARN_ON_ONCE(rdp->dynticks_nmi_nesting != DYNTICK_IRQ_NONIDLE);

... and we're clearly in the idle thread here.

I wonder, is the s390 guest entry/exit *preemptible* ?

If a timer IRQ can preempt in the middle of the EQS, we wouldn't balance
things before a ctx-switch to the idle thread, which would then be able
to hit this.
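
Roughly, the interleaving I have in mind (a hypothetical sketch, assuming
CONFIG_PREEMPT=y; names are from kernel/rcu/tree.c):

	guest_state_enter_irqoff();	/* rcu_eqs_enter(): dynticks_nmi_nesting -> 0 */
	< timer IRQ taken while inside the EQS >
		rcu_irq_enter();	/* dynticks_nmi_nesting -> 1 */
		< preempted: ctx-switch to the idle thread >
			rcu_eqs_enter();	/* nesting != DYNTICK_IRQ_NONIDLE -> WARN */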

I'll need to go audit the other architectures for similar.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 0/7] kvm: fix latent guest entry/exit bugs
  2022-01-19 19:22   ` Mark Rutland
@ 2022-01-19 19:30     ` Christian Borntraeger
  2022-01-20 11:57       ` Mark Rutland
  2022-01-20 11:28     ` Paolo Bonzini
  1 sibling, 1 reply; 41+ messages in thread
From: Christian Borntraeger @ 2022-01-19 19:30 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-kernel, Paolo Bonzini, Michael Ellerman,
	aleksandar.qemu.devel, alexandru.elisei, anup.patel, aou,
	atish.patra, bp, catalin.marinas, chenhuacai, dave.hansen,
	frankja, frederic, gor, hca, james.morse, jmattson, joro, luto,
	maz, mingo, nsaenzju, palmer, paulmck, paul.walmsley, peterz,
	seanjc, suzuki.poulose, svens, tglx, tsbogend, vkuznets,
	wanpengli, will, Anup Patel, Atish Patra



Am 19.01.22 um 20:22 schrieb Mark Rutland:
> On Wed, Jan 19, 2022 at 07:25:20PM +0100, Christian Borntraeger wrote:
>> Am 19.01.22 um 11:58 schrieb Mark Rutland:
>>
>>
>> CCing new emails for Anup and Atish so that they are aware of this thread.
> 
> Ah; whoops. I'd meant to fix the Ccs on the patches.
> 
> Thanks!
> 
> [...]
> 
>> I just gave this a spin on s390 with debugging on and I got the following:
>>
>> [  457.151295] ------------[ cut here ]------------
>> [  457.151311] WARNING: CPU: 14 PID: 0 at kernel/rcu/tree.c:613 rcu_eqs_enter.constprop.0+0xf8/0x118
> 
> Hmm, so IIUC that's:
> 
> 	WARN_ON_ONCE(rdp->dynticks_nmi_nesting != DYNTICK_IRQ_NONIDLE);
> 
> ... and we're clearly in the idle thread here.
> 
> I wonder, is the s390 guest entry/exit *preemptible* ?

Looks like debug_defconfig is indeed using preemption:

CONFIG_PREEMPT_BUILD=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTION=y
CONFIG_PREEMPT_RCU=y
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_DEBUG_PREEMPT=y
CONFIG_PREEMPTIRQ_TRACEPOINTS=y
CONFIG_TRACE_PREEMPT_TOGGLE=y
CONFIG_PREEMPT_TRACER=y
# CONFIG_PREEMPTIRQ_DELAY_TEST is not set

> 
> If a timer IRQ can preempt in the middle of the EQS, we wouldn't balance
> things before a ctx-switch to the idle thread, which would then be able
> to hit this.
> 
> I'll need to go audit the other architectures for similar.
> 
> Thanks,
> Mark.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 2/7] kvm: add guest_state_{enter,exit}_irqoff()
  2022-01-19 10:58 ` [PATCH v2 2/7] kvm: add guest_state_{enter,exit}_irqoff() Mark Rutland
@ 2022-01-20 11:00   ` Paolo Bonzini
  2022-01-21 17:35   ` Nicolas Saenz Julienne
  1 sibling, 0 replies; 41+ messages in thread
From: Paolo Bonzini @ 2022-01-20 11:00 UTC (permalink / raw)
  To: Mark Rutland, linux-kernel
  Cc: aleksandar.qemu.devel, alexandru.elisei, anup.patel, aou,
	atish.patra, borntraeger, bp, catalin.marinas, chenhuacai,
	dave.hansen, frankja, frederic, gor, hca, james.morse, jmattson,
	joro, luto, maz, mingo, mpe, nsaenzju, palmer, paulmck,
	paul.walmsley, peterz, seanjc, suzuki.poulose, svens, tglx,
	tsbogend, vkuznets, wanpengli, will

On 1/19/22 11:58, Mark Rutland wrote:
> When transitioning to/from guest mode, it is necessary to inform
> lockdep, tracing, and RCU in a specific order, similar to the
> requirements for transitions to/from user mode. Additionally, it is
> necessary to perform vtime accounting for a window around running the
> guest, with RCU enabled, such that timer interrupts taken from the guest
> can be accounted as guest time.
> 
> Most architectures don't handle all the necessary pieces, and have a
> number of common bugs, including unsafe usage of RCU during the window
> between guest_enter() and guest_exit().
> 
> On x86, this was dealt with across commits:
> 
>    87fa7f3e98a1310e ("x86/kvm: Move context tracking where it belongs")
>    0642391e2139a2c1 ("x86/kvm/vmx: Add hardirq tracing to guest enter/exit")
>    9fc975e9efd03e57 ("x86/kvm/svm: Add hardirq tracing on guest enter/exit")
>    3ebccdf373c21d86 ("x86/kvm/vmx: Move guest enter/exit into .noinstr.text")
>    135961e0a7d555fc ("x86/kvm/svm: Move guest enter/exit into .noinstr.text")
>    160457140187c5fb ("KVM: x86: Defer vtime accounting 'til after IRQ handling")
>    bc908e091b326467 ("KVM: x86: Consolidate guest enter/exit logic to common helpers")
> 
> ... but those fixes are specific to x86, and as the resulting logic
> (while correct) is split across generic helper functions and
> x86-specific helper functions, it is difficult to see that the
> entry/exit accounting is balanced.
> 
> This patch adds generic helpers which architectures can use to handle
> guest entry/exit consistently and correctly. The guest_{enter,exit}()
> helpers are split into guest_timing_{enter,exit}() to perform vtime
> accounting, and guest_context_{enter,exit}() to perform the necessary
> context tracking and RCU management. The existing guest_{enter,exit}()
> helpers are left as wrappers of these.
> 
> Atop this, new guest_state_enter_irqoff() and guest_state_exit_irqoff()
> helpers are added to handle the ordering of lockdep, tracing, and RCU
> management. These are intended to mirror exit_to_user_mode() and
> enter_from_user_mode().
> 
> Subsequent patches will migrate architectures over to the new helpers,
> following a sequence:
> 
> 	guest_timing_enter_irqoff();
> 
> 	guest_state_enter_irqoff();
> 	< run the vcpu >
> 	guest_state_exit_irqoff();
> 
> 	< take any pending IRQs >
> 
> 	guest_timing_exit_irqoff();
> 
> This sequence handles all of the above correctly, and more clearly
> balances the entry and exit portions, making it easier to understand.
> 
> The existing helpers are marked as deprecated, and will be removed once
> all architectures have been converted.
> 
> There should be no functional change as a result of this patch.
> 
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Reviewed-by: Marc Zyngier <maz@kernel.org>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   include/linux/kvm_host.h | 112 +++++++++++++++++++++++++++++++++++++--
>   1 file changed, 109 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index c310648cc8f1a..774a3b9e9bd8d 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -29,6 +29,8 @@
>   #include <linux/refcount.h>
>   #include <linux/nospec.h>
>   #include <linux/notifier.h>
> +#include <linux/ftrace.h>
> +#include <linux/instrumentation.h>
>   #include <asm/signal.h>
>   
>   #include <linux/kvm.h>
> @@ -362,8 +364,11 @@ struct kvm_vcpu {
>   	int last_used_slot;
>   };
>   
> -/* must be called with irqs disabled */
> -static __always_inline void guest_enter_irqoff(void)
> +/*
> + * Start accounting time towards a guest.
> + * Must be called before entering guest context.
> + */
> +static __always_inline void guest_timing_enter_irqoff(void)
>   {
>   	/*
>   	 * This is running in ioctl context so its safe to assume that it's the
> @@ -372,7 +377,18 @@ static __always_inline void guest_enter_irqoff(void)
>   	instrumentation_begin();
>   	vtime_account_guest_enter();
>   	instrumentation_end();
> +}
>   
> +/*
> + * Enter guest context and enter an RCU extended quiescent state.
> + *
> + * Between guest_context_enter_irqoff() and guest_context_exit_irqoff() it is
> + * unsafe to use any code which may directly or indirectly use RCU, tracing
> + * (including IRQ flag tracing), or lockdep. All code in this period must be
> + * non-instrumentable.
> + */
> +static __always_inline void guest_context_enter_irqoff(void)
> +{
>   	/*
>   	 * KVM does not hold any references to rcu protected data when it
>   	 * switches CPU into a guest mode. In fact switching to a guest mode
> @@ -388,16 +404,79 @@ static __always_inline void guest_enter_irqoff(void)
>   	}
>   }
>   
> -static __always_inline void guest_exit_irqoff(void)
> +/*
> + * Deprecated. Architectures should move to guest_timing_enter_irqoff() and
> + * guest_state_enter_irqoff().
> + */
> +static __always_inline void guest_enter_irqoff(void)
> +{
> +	guest_timing_enter_irqoff();
> +	guest_context_enter_irqoff();
> +}
> +
> +/**
> + * guest_state_enter_irqoff - Fixup state when entering a guest
> + *
> + * Entry to a guest will enable interrupts, but the kernel state is interrupts
> + * disabled when this is invoked. Also tell RCU about it.
> + *
> + * 1) Trace interrupts on state
> + * 2) Invoke context tracking if enabled to adjust RCU state
> + * 3) Tell lockdep that interrupts are enabled
> + *
> + * Invoked from architecture specific code before entering a guest.
> + * Must be called with interrupts disabled and the caller must be
> + * non-instrumentable.
> + * The caller has to invoke guest_timing_enter_irqoff() before this.
> + *
> + * Note: this is analogous to exit_to_user_mode().
> + */
> +static __always_inline void guest_state_enter_irqoff(void)
> +{
> +	instrumentation_begin();
> +	trace_hardirqs_on_prepare();
> +	lockdep_hardirqs_on_prepare(CALLER_ADDR0);
> +	instrumentation_end();
> +
> +	guest_context_enter_irqoff();
> +	lockdep_hardirqs_on(CALLER_ADDR0);
> +}
> +
> +/*
> + * Exit guest context and exit an RCU extended quiescent state.
> + *
> + * Between guest_context_enter_irqoff() and guest_context_exit_irqoff() it is
> + * unsafe to use any code which may directly or indirectly use RCU, tracing
> + * (including IRQ flag tracing), or lockdep. All code in this period must be
> + * non-instrumentable.
> + */
> +static __always_inline void guest_context_exit_irqoff(void)
>   {
>   	context_tracking_guest_exit();
> +}
>   
> +/*
> + * Stop accounting time towards a guest.
> + * Must be called after exiting guest context.
> + */
> +static __always_inline void guest_timing_exit_irqoff(void)
> +{
>   	instrumentation_begin();
>   	/* Flush the guest cputime we spent on the guest */
>   	vtime_account_guest_exit();
>   	instrumentation_end();
>   }
>   
> +/*
> + * Deprecated. Architectures should move to guest_state_exit_irqoff() and
> + * guest_timing_exit_irqoff().
> + */
> +static __always_inline void guest_exit_irqoff(void)
> +{
> +	guest_context_exit_irqoff();
> +	guest_timing_exit_irqoff();
> +}
> +
>   static inline void guest_exit(void)
>   {
>   	unsigned long flags;
> @@ -407,6 +486,33 @@ static inline void guest_exit(void)
>   	local_irq_restore(flags);
>   }
>   
> +/**
> + * guest_state_exit_irqoff - Establish state when returning from guest mode
> + *
> + * Entry from a guest disables interrupts, but guest mode is traced as
> + * interrupts enabled. Also with NO_HZ_FULL RCU might be idle.
> + *
> + * 1) Tell lockdep that interrupts are disabled
> + * 2) Invoke context tracking if enabled to reactivate RCU
> + * 3) Trace interrupts off state
> + *
> + * Invoked from architecture specific code after exiting a guest.
> + * Must be invoked with interrupts disabled and the caller must be
> + * non-instrumentable.
> + * The caller has to invoke guest_timing_exit_irqoff() after this.
> + *
> + * Note: this is analogous to enter_from_user_mode().
> + */
> +static __always_inline void guest_state_exit_irqoff(void)
> +{
> +	lockdep_hardirqs_off(CALLER_ADDR0);
> +	guest_context_exit_irqoff();
> +
> +	instrumentation_begin();
> +	trace_hardirqs_off_finish();
> +	instrumentation_end();
> +}
> +
>   static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
>   {
>   	/*

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 4/7] kvm/mips: rework guest entry logic
  2022-01-19 10:58 ` [PATCH v2 4/7] kvm/mips: " Mark Rutland
@ 2022-01-20 11:10   ` Paolo Bonzini
  2022-01-20 13:33     ` Mark Rutland
  2022-01-20 16:44   ` Mark Rutland
  1 sibling, 1 reply; 41+ messages in thread
From: Paolo Bonzini @ 2022-01-20 11:10 UTC (permalink / raw)
  To: Mark Rutland, linux-kernel
  Cc: aleksandar.qemu.devel, alexandru.elisei, anup.patel, aou,
	atish.patra, borntraeger, bp, catalin.marinas, chenhuacai,
	dave.hansen, frankja, frederic, gor, hca, james.morse, jmattson,
	joro, luto, maz, mingo, mpe, nsaenzju, palmer, paulmck,
	paul.walmsley, peterz, seanjc, suzuki.poulose, svens, tglx,
	tsbogend, vkuznets, wanpengli, will

On 1/19/22 11:58, Mark Rutland wrote:
> +	 * TODO: is there a barrier which ensures that pending interrupts are
> +	 * recognised? Currently this just hopes that the CPU takes any pending
> +	 * interrupts between the enable and disable.
> +	 */
> +	local_irq_enable();
> +	local_irq_disable();
>   

It's okay: there is irq_enable_hazard(), and it's automatically included
in arch_local_irq_enable().

Paolo


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 5/7] kvm/riscv: rework guest entry logic
  2022-01-19 10:58 ` [PATCH v2 5/7] kvm/riscv: " Mark Rutland
@ 2022-01-20 11:18   ` Paolo Bonzini
  2022-01-20 12:56     ` Mark Rutland
  0 siblings, 1 reply; 41+ messages in thread
From: Paolo Bonzini @ 2022-01-20 11:18 UTC (permalink / raw)
  To: Mark Rutland, linux-kernel
  Cc: aleksandar.qemu.devel, alexandru.elisei, anup.patel, aou,
	atish.patra, borntraeger, bp, catalin.marinas, chenhuacai,
	dave.hansen, frankja, frederic, gor, hca, james.morse, jmattson,
	joro, luto, maz, mingo, mpe, nsaenzju, palmer, paulmck,
	paul.walmsley, peterz, seanjc, suzuki.poulose, svens, tglx,
	tsbogend, vkuznets, wanpengli, will

On 1/19/22 11:58, Mark Rutland wrote:
> +		 * There's no barrier which ensures that pending interrupts are
> +		 * recognised, so we just hope that the CPU takes any pending
> +		 * interrupts between the enable and disable.
>   		 */
>   		local_irq_enable();
> +		local_irq_disable();
>   

This should be the required architectural behavior: "a CSR access is 
performed after the execution of any prior instructions in program order 
whose behavior modifies or is modified by the CSR state and before the 
execution of any subsequent instructions in program order whose behavior 
modifies or is modified by the CSR state" (Zicsr spec, paragraph "CSR 
Access Ordering", available at 
https://www.five-embeddev.com/riscv-isa-manual/latest/csr.html#csrinsts).

Paolo


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 7/7] kvm/x86: rework guest entry logic
  2022-01-19 10:58 ` [PATCH v2 7/7] kvm/x86: " Mark Rutland
@ 2022-01-20 11:20   ` Paolo Bonzini
  2022-01-21 17:40   ` Nicolas Saenz Julienne
  1 sibling, 0 replies; 41+ messages in thread
From: Paolo Bonzini @ 2022-01-20 11:20 UTC (permalink / raw)
  To: Mark Rutland, linux-kernel
  Cc: aleksandar.qemu.devel, alexandru.elisei, anup.patel, aou,
	atish.patra, borntraeger, bp, catalin.marinas, chenhuacai,
	dave.hansen, frankja, frederic, gor, hca, james.morse, jmattson,
	joro, luto, maz, mingo, mpe, nsaenzju, palmer, paulmck,
	paul.walmsley, peterz, seanjc, suzuki.poulose, svens, tglx,
	tsbogend, vkuznets, wanpengli, will

On 1/19/22 11:58, Mark Rutland wrote:
> 
> To minimize duplication and to more clearly balance entry and exit, both
> entry and exit of guest timing are placed in vcpu_enter_guest(), using
> the new guest_timing_{enter,exit}_irqoff() helpers. When context
> tracking is used a small amount of additional time will be accounted
> towards guests; tick-based accounting is unaffected as IRQs are
> disabled at this point and not enabled until after the return from the
> guest.
> 
> This also corrects (benign) mis-balanced context tracking accounting
> introduced in commits:
> 
>    ae95f566b3d22ade ("KVM: X86: TSCDEADLINE MSR emulation fastpath")
>    26efe2fd92e50822 ("KVM: VMX: Handle preemption timer fastpath")
> 
> Where KVM can enter a guest multiple times, calling vtime_guest_enter()
> without a corresponding call to vtime_account_guest_exit(), and with
> vtime_account_system() called when vtime_account_guest() should be used.
> As account_system_time() checks PF_VCPU and calls account_guest_time(),
> this doesn't result in any functional problem, but is unnecessarily
> confusing.

Yes, I agree.

Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Paolo


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 0/7] kvm: fix latent guest entry/exit bugs
  2022-01-19 19:22   ` Mark Rutland
  2022-01-19 19:30     ` Christian Borntraeger
@ 2022-01-20 11:28     ` Paolo Bonzini
  2022-01-20 12:03       ` Mark Rutland
  1 sibling, 1 reply; 41+ messages in thread
From: Paolo Bonzini @ 2022-01-20 11:28 UTC (permalink / raw)
  To: Mark Rutland, Christian Borntraeger
  Cc: linux-kernel, Michael Ellerman, aleksandar.qemu.devel,
	alexandru.elisei, anup.patel, aou, atish.patra, bp,
	catalin.marinas, chenhuacai, dave.hansen, frankja, frederic, gor,
	hca, james.morse, jmattson, joro, luto, maz, mingo, nsaenzju,
	palmer, paulmck, paul.walmsley, peterz, seanjc, suzuki.poulose,
	svens, tglx, tsbogend, vkuznets, wanpengli, will, Anup Patel,
	Atish Patra

On 1/19/22 20:22, Mark Rutland wrote:
> I wonder, is the s390 guest entry/exit *preemptible* ?
> 
> If a timer IRQ can preempt in the middle of the EQS, we wouldn't balance
> things before a ctx-switch to the idle thread, which would then be able
> to hit this.
> 
> I'll need to go audit the other architectures for similar.

They don't enable interrupts in the entry/exit path so they should be 
okay.  RISC-V and x86 have an explicit preempt_disable/enable, while 
MIPS only has local_irq_disable/enable.
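
Schematically, the shape on those architectures is (a sketch, not a quote of
any one implementation):

	local_irq_disable();
	guest_enter_irqoff();		/* enter the RCU EQS with IRQs masked */
	/* run the vcpu; IRQs stay masked throughout */
	guest_exit_irqoff();		/* leave the EQS before IRQs are unmasked */
	local_irq_enable();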

(MIPS is a mess of dead code, I have patches to clean it up).

Paolo


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 0/7] kvm: fix latent guest entry/exit bugs
  2022-01-19 19:30     ` Christian Borntraeger
@ 2022-01-20 11:57       ` Mark Rutland
  2022-01-20 12:02         ` Christian Borntraeger
  0 siblings, 1 reply; 41+ messages in thread
From: Mark Rutland @ 2022-01-20 11:57 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: linux-kernel, Paolo Bonzini, Michael Ellerman,
	aleksandar.qemu.devel, alexandru.elisei, anup.patel, aou,
	atish.patra, bp, catalin.marinas, chenhuacai, dave.hansen,
	frankja, frederic, gor, hca, james.morse, jmattson, joro, luto,
	maz, mingo, nsaenzju, palmer, paulmck, paul.walmsley, peterz,
	seanjc, suzuki.poulose, svens, tglx, tsbogend, vkuznets,
	wanpengli, will, Anup Patel, Atish Patra

On Wed, Jan 19, 2022 at 08:30:17PM +0100, Christian Borntraeger wrote:
> 
> 
> Am 19.01.22 um 20:22 schrieb Mark Rutland:
> > On Wed, Jan 19, 2022 at 07:25:20PM +0100, Christian Borntraeger wrote:
> > > Am 19.01.22 um 11:58 schrieb Mark Rutland:
> > > 
> > > 
> > > CCing new emails for Anup and Atish so that they are aware of this thread.
> > 
> > Ah; whoops. I'd meant to fix the Ccs on the patches.
> > 
> > Thanks!
> > 
> > [...]
> > 
> > > I just gave this a spin on s390 with debugging on and I got the following:
> > > 
> > > [  457.151295] ------------[ cut here ]------------
> > > [  457.151311] WARNING: CPU: 14 PID: 0 at kernel/rcu/tree.c:613 rcu_eqs_enter.constprop.0+0xf8/0x118
> > 
> > Hmm, so IIUC that's:
> > 
> > 	WARN_ON_ONCE(rdp->dynticks_nmi_nesting != DYNTICK_IRQ_NONIDLE);
> > 
> > ... and we're clearly in the idle thread here.
> > 
> > I wonder, is the s390 guest entry/exit *preemptible* ?
> 
> Looks like debug_defconfig is indeed using preemption:
> 
> CONFIG_PREEMPT_BUILD=y
> # CONFIG_PREEMPT_NONE is not set
> # CONFIG_PREEMPT_VOLUNTARY is not set
> CONFIG_PREEMPT=y
> CONFIG_PREEMPT_COUNT=y
> CONFIG_PREEMPTION=y
> CONFIG_PREEMPT_RCU=y
> CONFIG_PREEMPT_NOTIFIERS=y
> CONFIG_DEBUG_PREEMPT=y
> CONFIG_PREEMPTIRQ_TRACEPOINTS=y
> CONFIG_TRACE_PREEMPT_TOGGLE=y
> CONFIG_PREEMPT_TRACER=y
> # CONFIG_PREEMPTIRQ_DELAY_TEST is not set

Thanks for confirming!

Could you try with CONFIG_PROVE_RCU=y ? That can't be selected directly, but
selecting PROVE_LOCKING=y will enable it.

If I'm right, with that we should get a splat out of
rcu_irq_exit_check_preempt().

If so, I think we can solve this with preempt_{disable,enable}() around the
guest_timing_{enter,exit}_irqoff() calls. We'll also need to add some more
comments around arch_in_rcu_eqs() that arch-specific EQSs should be
non-preemptible.
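
i.e. something like the below (an untested sketch of the idea, not a final
patch):

	preempt_disable();
	guest_timing_enter_irqoff();

	/* ... enter/exit the guest, take any pending IRQs ... */

	guest_timing_exit_irqoff();
	preempt_enable();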

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 0/7] kvm: fix latent guest entry/exit bugs
  2022-01-20 11:57       ` Mark Rutland
@ 2022-01-20 12:02         ` Christian Borntraeger
  0 siblings, 0 replies; 41+ messages in thread
From: Christian Borntraeger @ 2022-01-20 12:02 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-kernel, Paolo Bonzini, Michael Ellerman,
	aleksandar.qemu.devel, alexandru.elisei, anup.patel, aou,
	atish.patra, bp, catalin.marinas, chenhuacai, dave.hansen,
	frankja, frederic, gor, hca, james.morse, jmattson, joro, luto,
	maz, mingo, nsaenzju, palmer, paulmck, paul.walmsley, peterz,
	seanjc, suzuki.poulose, svens, tglx, tsbogend, vkuznets,
	wanpengli, will, Anup Patel, Atish Patra



Am 20.01.22 um 12:57 schrieb Mark Rutland:
> On Wed, Jan 19, 2022 at 08:30:17PM +0100, Christian Borntraeger wrote:
>>
>>
>> Am 19.01.22 um 20:22 schrieb Mark Rutland:
>>> On Wed, Jan 19, 2022 at 07:25:20PM +0100, Christian Borntraeger wrote:
>>>> Am 19.01.22 um 11:58 schrieb Mark Rutland:
>>>>
>>>>
>>>> CCing new emails for Anup and Atish so that they are aware of this thread.
>>>
>>> Ah; whoops. I'd meant to fix the Ccs on the patches.
>>>
>>> Thanks!
>>>
>>> [...]
>>>
>>>> I just gave this a spin on s390 with debugging on and I got the following:
>>>>
>>>> [  457.151295] ------------[ cut here ]------------
>>>> [  457.151311] WARNING: CPU: 14 PID: 0 at kernel/rcu/tree.c:613 rcu_eqs_enter.constprop.0+0xf8/0x118
>>>
>>> Hmm, so IIUC that's:
>>>
>>> 	WARN_ON_ONCE(rdp->dynticks_nmi_nesting != DYNTICK_IRQ_NONIDLE);
>>>
>>> ... and we're clearly in the idle thread here.
>>>
>>> I wonder, is the s390 guest entry/exit *preemptible* ?
>>
>> Looks like debug_defconfig is indeed using preemption:
>>
>> CONFIG_PREEMPT_BUILD=y
>> # CONFIG_PREEMPT_NONE is not set
>> # CONFIG_PREEMPT_VOLUNTARY is not set
>> CONFIG_PREEMPT=y
>> CONFIG_PREEMPT_COUNT=y
>> CONFIG_PREEMPTION=y
>> CONFIG_PREEMPT_RCU=y
>> CONFIG_PREEMPT_NOTIFIERS=y
>> CONFIG_DEBUG_PREEMPT=y
>> CONFIG_PREEMPTIRQ_TRACEPOINTS=y
>> CONFIG_TRACE_PREEMPT_TOGGLE=y
>> CONFIG_PREEMPT_TRACER=y
>> # CONFIG_PREEMPTIRQ_DELAY_TEST is not set
> 
> Thanks for confirming!
> 
> Could you try with CONFIG_PROVE_RCU=y ? That can't be selected directly, but
> selecting PROVE_LOCKING=y will enable it.

PROVE_LOCKING was enabled in my runs as well as PROVE_RCU.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 0/7] kvm: fix latent guest entry/exit bugs
  2022-01-20 11:28     ` Paolo Bonzini
@ 2022-01-20 12:03       ` Mark Rutland
  2022-01-20 15:14         ` Christian Borntraeger
  0 siblings, 1 reply; 41+ messages in thread
From: Mark Rutland @ 2022-01-20 12:03 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Christian Borntraeger, linux-kernel, Michael Ellerman,
	aleksandar.qemu.devel, alexandru.elisei, anup.patel, aou,
	atish.patra, bp, catalin.marinas, chenhuacai, dave.hansen,
	frankja, frederic, gor, hca, james.morse, jmattson, joro, luto,
	maz, mingo, nsaenzju, palmer, paulmck, paul.walmsley, peterz,
	seanjc, suzuki.poulose, svens, tglx, tsbogend, vkuznets,
	wanpengli, will, Anup Patel, Atish Patra

On Thu, Jan 20, 2022 at 12:28:09PM +0100, Paolo Bonzini wrote:
> On 1/19/22 20:22, Mark Rutland wrote:
> > I wonder, is the s390 guest entry/exit *preemptible* ?
> > 
> > If a timer IRQ can preempt in the middle of the EQS, we wouldn't balance
> > things before a ctx-switch to the idle thread, which would then be able
> > to hit this.
> > 
> > I'll need to go audit the other architectures for similar.
> 
> They don't enable interrupts in the entry/exit path so they should be okay.

True.

So it sounds like for s390 adding an explicit preempt_{disable,enable}() is the
right thing to do. I'll add that and explanatory commentary.

> RISC-V and x86 have an explicit preempt_disable/enable, while MIPS only has
> local_irq_disable/enable.
> 
> (MIPS is a mess of dead code, I have patches to clean it up).

Sure; I haven't wrapped my head around ppc yet, but I assume they keep
interrupts disabled as with the other simple cases.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 5/7] kvm/riscv: rework guest entry logic
  2022-01-20 11:18   ` Paolo Bonzini
@ 2022-01-20 12:56     ` Mark Rutland
  2022-01-20 13:13       ` Paolo Bonzini
  0 siblings, 1 reply; 41+ messages in thread
From: Mark Rutland @ 2022-01-20 12:56 UTC (permalink / raw)
  To: Paolo Bonzini, palmer
  Cc: linux-kernel, aleksandar.qemu.devel, alexandru.elisei,
	anup.patel, aou, atish.patra, borntraeger, bp, catalin.marinas,
	chenhuacai, dave.hansen, frankja, frederic, gor, hca,
	james.morse, jmattson, joro, luto, maz, mingo, mpe, nsaenzju,
	paulmck, paul.walmsley, peterz, seanjc, suzuki.poulose, svens,
	tglx, tsbogend, vkuznets, wanpengli, will

On Thu, Jan 20, 2022 at 12:18:04PM +0100, Paolo Bonzini wrote:
> On 1/19/22 11:58, Mark Rutland wrote:
> > +		 * There's no barrier which ensures that pending interrupts are
> > +		 * recognised, so we just hope that the CPU takes any pending
> > +		 * interrupts between the enable and disable.
> >   		 */
> >   		local_irq_enable();
> > +		local_irq_disable();
> 
> This should be the required architectural behavior: "a CSR access is
> performed after the execution of any prior instructions in program order
> whose behavior modifies or is modified by the CSR state and before the
> execution of any subsequent instructions in program order whose behavior
> modifies or is modified by the CSR state" (Zicsr spec, paragraph "CSR Access
> Ordering", available at
> https://www.five-embeddev.com/riscv-isa-manual/latest/csr.html#csrinsts).

I think that's necessary, but not sufficient.

IIUC that wording means that writes to the CSR state occur in program order
without requiring additional barriers to take effect. The current value of the
CSR determines whether interrupts *can* be taken, but that doesn't say that
pending interrupts *must* be taken immediately when unmasked in the CSR.

For comparison, ARMv8 has similar wording that writes to PSTATE.I via the
DAIF/DAIFSet/DAIFClr registers occur in program order and do not require
additional barriers. However, there's also wording that to ensure that a
pending interrupt is taken, a context-synchronization-event (e.g. an ISB
instruction) is necessary. Without that, a back-to-back enable->disable will
not necessarily result in a pending interrupt being taken.

The arm64 patch in this series has references to the documentation.
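
For illustration, the arm64-style sequence is roughly (a sketch using the
arm64 isb() helper; see that patch for the real details):

	local_irq_enable();
	isb();			/* context synchronization event: a pending IRQ is taken */
	local_irq_disable();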

I had asked Palmer about this on IRC, and he didn't seem to think the
architecture mandated this (or at least, was unsure).

Palmer, thoughts?

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 5/7] kvm/riscv: rework guest entry logic
  2022-01-20 12:56     ` Mark Rutland
@ 2022-01-20 13:13       ` Paolo Bonzini
  0 siblings, 0 replies; 41+ messages in thread
From: Paolo Bonzini @ 2022-01-20 13:13 UTC (permalink / raw)
  To: Mark Rutland, palmer
  Cc: linux-kernel, aleksandar.qemu.devel, alexandru.elisei,
	anup.patel, aou, atish.patra, borntraeger, bp, catalin.marinas,
	chenhuacai, dave.hansen, frankja, frederic, gor, hca,
	james.morse, jmattson, joro, luto, maz, mingo, mpe, nsaenzju,
	paulmck, paul.walmsley, peterz, seanjc, suzuki.poulose, svens,
	tglx, tsbogend, vkuznets, wanpengli, will

On 1/20/22 13:56, Mark Rutland wrote:
>> This should be the required architectural behavior: "a CSR access is
>> performed after the execution of any prior instructions in program order
>> whose behavior modifies or is modified by the CSR state and before the
>> execution of any subsequent instructions in program order whose behavior
>> modifies or is modified by the CSR state" (Zicsr spec, paragraph "CSR Access
>> Ordering", available at
>> https://www.five-embeddev.com/riscv-isa-manual/latest/csr.html#csrinsts).
>
> I think that's necessary, but not sufficient.
> 
> IIUC that wording means that writes to the CSR state occur in program order
> without requiring additional barriers to take effect. The current value of the
> CSR determines whether interrupts *can* be taken, but that doesn't say that
> pending interrupts *must* be taken immediately when unmasked in the CSR.

I see.  Yeah, my reasoning was that there would be _different_ 
instructions executed after the CSR write if an interrupt has to be 
taken, but perhaps that's a bit of a stretch.

Paolo


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 4/7] kvm/mips: rework guest entry logic
  2022-01-20 11:10   ` Paolo Bonzini
@ 2022-01-20 13:33     ` Mark Rutland
  0 siblings, 0 replies; 41+ messages in thread
From: Mark Rutland @ 2022-01-20 13:33 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, aleksandar.qemu.devel, alexandru.elisei,
	anup.patel, aou, atish.patra, borntraeger, bp, catalin.marinas,
	chenhuacai, dave.hansen, frankja, frederic, gor, hca,
	james.morse, jmattson, joro, luto, maz, mingo, mpe, nsaenzju,
	palmer, paulmck, paul.walmsley, peterz, seanjc, suzuki.poulose,
	svens, tglx, tsbogend, vkuznets, wanpengli, will

On Thu, Jan 20, 2022 at 12:10:22PM +0100, Paolo Bonzini wrote:
> On 1/19/22 11:58, Mark Rutland wrote:
> > +	 * TODO: is there a barrier which ensures that pending interrupts are
> > +	 * recognised? Currently this just hopes that the CPU takes any pending
> > +	 * interrupts between the enable and disable.
> > +	 */
> > +	local_irq_enable();
> > +	local_irq_disable();
> 
> It's okay, there is irq_enable_hazard() but it's automatically included in
> arch_local_irq_enable().

As with the riscv case, I'm not sure whether that ensures that a pending
IRQ is actually recognized and taken.

Since there's also an irq_disable_hazard() it looks like that's there to
ensure the IRQ mask is updated in program order, rather than
guaranteeing that a pending IRQ is necessarily taken while IRQs are
unmasked.

In practice, I suspect it probably does, but it'd be good if someone
from the MIPS side could say something either way.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 0/7] kvm: fix latent guest entry/exit bugs
  2022-01-20 12:03       ` Mark Rutland
@ 2022-01-20 15:14         ` Christian Borntraeger
  2022-01-21  9:53           ` Christian Borntraeger
  0 siblings, 1 reply; 41+ messages in thread
From: Christian Borntraeger @ 2022-01-20 15:14 UTC (permalink / raw)
  To: Mark Rutland, Paolo Bonzini
  Cc: linux-kernel, Michael Ellerman, aleksandar.qemu.devel,
	alexandru.elisei, anup.patel, aou, atish.patra, bp,
	catalin.marinas, chenhuacai, dave.hansen, frankja, frederic, gor,
	hca, james.morse, jmattson, joro, luto, maz, mingo, nsaenzju,
	palmer, paulmck, paul.walmsley, peterz, seanjc, suzuki.poulose,
	svens, tglx, tsbogend, vkuznets, wanpengli, will, Anup Patel,
	Atish Patra



Am 20.01.22 um 13:03 schrieb Mark Rutland:
> On Thu, Jan 20, 2022 at 12:28:09PM +0100, Paolo Bonzini wrote:
>> On 1/19/22 20:22, Mark Rutland wrote:
>>> I wonder, is the s390 guest entry/exit *preemptible* ?
>>>
>>> If a timer IRQ can preempt in the middle of the EQS, we wouldn't balance
>>> things before a ctx-switch to the idle thread, which would then be able
>>> to hit this.
>>>
>>> I'll need to go audit the other architectures for similar.
>>
>> They don't enable interrupts in the entry/exit path so they should be okay.
> 
> True.
> 
> So it sounds like for s390 adding an explicit preempt_{disable,enable}() is the
> right thing to do. I'll add that and explanatory commentary.

That would not be trivial, I guess. We do allow (and need) page faults on SIE
for guest demand paging, and this piece of arch/s390/mm/fault.c

        case GMAP_FAULT:
                 if (faulthandler_disabled() || !mm)
                         goto out;
                 break;
         }

would no longer work, since faulthandler_disabled() checks the preempt count.
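
For reference, faulthandler_disabled() is roughly (paraphrasing
include/linux/uaccess.h):

	#define faulthandler_disabled() (pagefault_disabled() || in_atomic())

... and in_atomic() looks at the preempt count, so a preempt_disable() across
the SIE window would send GMAP faults down the "goto out" path above.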


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 4/7] kvm/mips: rework guest entry logic
  2022-01-19 10:58 ` [PATCH v2 4/7] kvm/mips: " Mark Rutland
  2022-01-20 11:10   ` Paolo Bonzini
@ 2022-01-20 16:44   ` Mark Rutland
  2022-01-20 16:57     ` Paolo Bonzini
  1 sibling, 1 reply; 41+ messages in thread
From: Mark Rutland @ 2022-01-20 16:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: aleksandar.qemu.devel, alexandru.elisei, anup.patel, aou,
	atish.patra, borntraeger, bp, catalin.marinas, chenhuacai,
	dave.hansen, frankja, frederic, gor, hca, james.morse, jmattson,
	joro, luto, maz, mingo, mpe, nsaenzju, palmer, paulmck,
	paul.walmsley, pbonzini, peterz, seanjc, suzuki.poulose, svens,
	tglx, tsbogend, vkuznets, wanpengli, will

On Wed, Jan 19, 2022 at 10:58:51AM +0000, Mark Rutland wrote:
> In kvm_arch_vcpu_ioctl_run() we use guest_enter_irqoff() and
> guest_exit_irqoff() directly, with interrupts masked between these. As
> we don't handle any timer ticks during this window, we will not account
> time spent within the guest as guest time, which is unfortunate.
> 
> Additionally, we do not inform lockdep or tracing that interrupts will
> be enabled during guest execution, which caan lead to misleading traces
> and warnings that interrupts have been enabled for overly-long periods.
> 
> This patch fixes these issues by using the new timing and context
> entry/exit helpers to ensure that interrupts are handled during guest
> vtime but with RCU watching, with a sequence:
> 
> 	guest_timing_enter_irqoff();
> 
> 	guest_state_enter_irqoff();
> 	< run the vcpu >
> 	guest_state_exit_irqoff();
> 
> 	< take any pending IRQs >
> 
> 	guest_timing_exit_irqoff();

Looking again, this patch isn't sufficient.

On MIPS a guest exit will be handled by kvm_mips_handle_exit() *before*
returning into the "< run the vcpu >" step above, so we won't call 
guest_state_exit_irqoff() before using RCU, etc.

This'll need some more thought...

Mark.

> Since instrumentation may make use of RCU, we must also ensure that no
> instrumented code is run during the EQS. I've split out the critical
> section into a new kvm_mips_enter_exit_vcpu() helper which is marked
> noinstr.
> 
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Cc: Huacai Chen <chenhuacai@kernel.org>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
> ---
>  arch/mips/kvm/mips.c | 37 ++++++++++++++++++++++++++++++++++---
>  1 file changed, 34 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> index aa20d074d3883..1a961c2434fee 100644
> --- a/arch/mips/kvm/mips.c
> +++ b/arch/mips/kvm/mips.c
> @@ -438,6 +438,24 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
>  	return -ENOIOCTLCMD;
>  }
>  
> +/*
> + * Actually run the vCPU, entering an RCU extended quiescent state (EQS) while
> + * the vCPU is running.
> + *
> + * This must be noinstr as instrumentation may make use of RCU, and this is not
> + * safe during the EQS.
> + */
> +static int noinstr kvm_mips_vcpu_enter_exit(struct kvm_vcpu *vcpu)
> +{
> +	int ret;
> +
> +	guest_state_enter_irqoff();
> +	ret = kvm_mips_callbacks->vcpu_run(vcpu);
> +	guest_state_exit_irqoff();
> +
> +	return ret;
> +}
> +
>  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>  {
>  	int r = -EINTR;
> @@ -458,7 +476,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>  	lose_fpu(1);
>  
>  	local_irq_disable();
> -	guest_enter_irqoff();
> +	guest_timing_enter_irqoff();
>  	trace_kvm_enter(vcpu);
>  
>  	/*
> @@ -469,10 +487,23 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>  	 */
>  	smp_store_mb(vcpu->mode, IN_GUEST_MODE);
>  
> -	r = kvm_mips_callbacks->vcpu_run(vcpu);
> +	r = kvm_mips_vcpu_enter_exit(vcpu);
> +
> +	/*
> +	 * We must ensure that any pending interrupts are taken before
> +	 * we exit guest timing so that timer ticks are accounted as
> +	 * guest time. Transiently unmask interrupts so that any
> +	 * pending interrupts are taken.
> +	 *
> +	 * TODO: is there a barrier which ensures that pending interrupts are
> +	 * recognised? Currently this just hopes that the CPU takes any pending
> +	 * interrupts between the enable and disable.
> +	 */
> +	local_irq_enable();
> +	local_irq_disable();
>  
>  	trace_kvm_out(vcpu);
> -	guest_exit_irqoff();
> +	guest_timing_exit_irqoff();
>  	local_irq_enable();
>  
>  out:
> -- 
> 2.30.2
> 
> 

* Re: [PATCH v2 4/7] kvm/mips: rework guest entry logic
  2022-01-20 16:44   ` Mark Rutland
@ 2022-01-20 16:57     ` Paolo Bonzini
  2022-01-20 17:15       ` Mark Rutland
  0 siblings, 1 reply; 41+ messages in thread
From: Paolo Bonzini @ 2022-01-20 16:57 UTC (permalink / raw)
  To: Mark Rutland, linux-kernel
  Cc: aleksandar.qemu.devel, alexandru.elisei, anup.patel, aou,
	atish.patra, borntraeger, bp, catalin.marinas, chenhuacai,
	dave.hansen, frankja, frederic, gor, hca, james.morse, jmattson,
	joro, luto, maz, mingo, mpe, nsaenzju, palmer, paulmck,
	paul.walmsley, peterz, seanjc, suzuki.poulose, svens, tglx,
	tsbogend, vkuznets, wanpengli, will

On 1/20/22 17:44, Mark Rutland wrote:
> On Wed, Jan 19, 2022 at 10:58:51AM +0000, Mark Rutland wrote:
>> In kvm_arch_vcpu_ioctl_run() we use guest_enter_irqoff() and
>> guest_exit_irqoff() directly, with interrupts masked between these. As
>> we don't handle any timer ticks during this window, we will not account
>> time spent within the guest as guest time, which is unfortunate.
>>
>> Additionally, we do not inform lockdep or tracing that interrupts will
>> be enabled during guest execution, which can lead to misleading traces
>> and warnings that interrupts have been enabled for overly-long periods.
>>
>> This patch fixes these issues by using the new timing and context
>> entry/exit helpers to ensure that interrupts are handled during guest
>> vtime but with RCU watching, with a sequence:
>>
>> 	guest_timing_enter_irqoff();
>>
>> 	guest_state_enter_irqoff();
>> 	< run the vcpu >
>> 	guest_state_exit_irqoff();
>>
>> 	< take any pending IRQs >
>>
>> 	guest_timing_exit_irqoff();
> 
> Looking again, this patch isn't sufficient.
> 
> On MIPS a guest exit will be handled by kvm_mips_handle_exit() *before*
> returning into the "< run the vcpu >" step above, so we won't call
> guest_state_exit_irqoff() before using RCU, etc.

Indeed.  kvm_mips_handle_exit has a weird mutual recursion through runtime-assembled code, but then this is MIPS...

This should do it:

diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index e59cb6246f76..6f2291f017f5 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -1192,6 +1192,7 @@ int kvm_mips_handle_exit(struct kvm_vcpu *vcpu)
  	kvm_mips_set_c0_status();
  
  	local_irq_enable();
+	guest_timing_exit_irqoff();
  
  	kvm_debug("kvm_mips_handle_exit: cause: %#x, PC: %p, kvm_run: %p, kvm_vcpu: %p\n",
  			cause, opc, run, vcpu);
@@ -1325,6 +1326,7 @@ int kvm_mips_handle_exit(struct kvm_vcpu *vcpu)
  	}
  
  	if (ret == RESUME_GUEST) {
+		guest_timing_enter_irqoff();
  		trace_kvm_reenter(vcpu);
  
  		/*


Paolo


* Re: [PATCH v2 4/7] kvm/mips: rework guest entry logic
  2022-01-20 16:57     ` Paolo Bonzini
@ 2022-01-20 17:15       ` Mark Rutland
  2022-01-20 17:17         ` Sean Christopherson
  2022-01-20 17:29         ` Paolo Bonzini
  0 siblings, 2 replies; 41+ messages in thread
From: Mark Rutland @ 2022-01-20 17:15 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, aleksandar.qemu.devel, alexandru.elisei,
	anup.patel, aou, atish.patra, borntraeger, bp, catalin.marinas,
	chenhuacai, dave.hansen, frankja, frederic, gor, hca,
	james.morse, jmattson, joro, luto, maz, mingo, mpe, nsaenzju,
	palmer, paulmck, paul.walmsley, peterz, seanjc, suzuki.poulose,
	svens, tglx, tsbogend, vkuznets, wanpengli, will

On Thu, Jan 20, 2022 at 05:57:05PM +0100, Paolo Bonzini wrote:
> On 1/20/22 17:44, Mark Rutland wrote:
> > On Wed, Jan 19, 2022 at 10:58:51AM +0000, Mark Rutland wrote:
> > > In kvm_arch_vcpu_ioctl_run() we use guest_enter_irqoff() and
> > > guest_exit_irqoff() directly, with interrupts masked between these. As
> > > we don't handle any timer ticks during this window, we will not account
> > > time spent within the guest as guest time, which is unfortunate.
> > > 
> > > Additionally, we do not inform lockdep or tracing that interrupts will
> > > be enabled during guest execution, which can lead to misleading traces
> > > and warnings that interrupts have been enabled for overly-long periods.
> > > 
> > > This patch fixes these issues by using the new timing and context
> > > entry/exit helpers to ensure that interrupts are handled during guest
> > > vtime but with RCU watching, with a sequence:
> > > 
> > > 	guest_timing_enter_irqoff();
> > > 
> > > 	guest_state_enter_irqoff();
> > > 	< run the vcpu >
> > > 	guest_state_exit_irqoff();
> > > 
> > > 	< take any pending IRQs >
> > > 
> > > 	guest_timing_exit_irqoff();
> > 
> > Looking again, this patch isn't sufficient.
> > 
> > On MIPS a guest exit will be handled by kvm_mips_handle_exit() *before*
> > returning into the "< run the vcpu >" step above, so we won't call
> > guest_state_exit_irqoff() before using RCU, etc.
> 
> Indeed.  kvm_mips_handle_exit has a weird mutual recursion through runtime-assembled code, but then this is MIPS...
> 
> This should do it:
> 
> diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> index e59cb6246f76..6f2291f017f5 100644
> --- a/arch/mips/kvm/mips.c
> +++ b/arch/mips/kvm/mips.c
> @@ -1192,6 +1192,7 @@ int kvm_mips_handle_exit(struct kvm_vcpu *vcpu)
>  	kvm_mips_set_c0_status();
>  	local_irq_enable();
> +	guest_timing_exit_irqoff();
>  	kvm_debug("kvm_mips_handle_exit: cause: %#x, PC: %p, kvm_run: %p, kvm_vcpu: %p\n",
>  			cause, opc, run, vcpu);
> @@ -1325,6 +1326,7 @@ int kvm_mips_handle_exit(struct kvm_vcpu *vcpu)
>  	}
>  	if (ret == RESUME_GUEST) {
> +		guest_timing_enter_irqoff();
>  		trace_kvm_reenter(vcpu);
>  		/*

As above, we'll also need the guest_state_{enter,exit}() calls
surrounding this (e.g. before that local_irq_enable() at the start of
kvm_mips_handle_exit()), and that needs to happen in noinstr code, etc.

It looks like the simplest thing to do is to rename
kvm_mips_handle_exit() to __kvm_mips_handle_exit() and add a
kvm_mips_handle_exit() wrapper that handles that (with the return path
conditional on whether __kvm_mips_handle_exit() returns RESUME_GUEST).
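
Sketched out (untested; __kvm_mips_handle_exit() being the renamed body
of the current function):

	static int noinstr kvm_mips_handle_exit(struct kvm_vcpu *vcpu)
	{
		int ret;

		/* We arrive here from the guest, still within the EQS */
		guest_state_exit_irqoff();

		ret = __kvm_mips_handle_exit(vcpu);

		/* Only re-enter the EQS when resuming the guest */
		if (ret == RESUME_GUEST)
			guest_state_enter_irqoff();

		return ret;
	}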

I'll have a go at that tomorrow when I regain the capability to think.

Longer-term MIPS should move to a loop like everyone else has:

	for (;;) {
		status = kvm_mips_vcpu_enter_exit();

		if (handle_exit(status))
			break;
		
		...
	}

... which is far easier to manage.

Thanks,
Mark.

* Re: [PATCH v2 4/7] kvm/mips: rework guest entry logic
  2022-01-20 17:15       ` Mark Rutland
@ 2022-01-20 17:17         ` Sean Christopherson
  2022-01-20 17:29         ` Paolo Bonzini
  1 sibling, 0 replies; 41+ messages in thread
From: Sean Christopherson @ 2022-01-20 17:17 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Paolo Bonzini, linux-kernel, aleksandar.qemu.devel,
	alexandru.elisei, anup.patel, aou, atish.patra, borntraeger, bp,
	catalin.marinas, chenhuacai, dave.hansen, frankja, frederic, gor,
	hca, james.morse, jmattson, joro, luto, maz, mingo, mpe,
	nsaenzju, palmer, paulmck, paul.walmsley, peterz, suzuki.poulose,
	svens, tglx, tsbogend, vkuznets, wanpengli, will

On Thu, Jan 20, 2022, Mark Rutland wrote:
> Longer-term MIPS should move to a loop like everyone else has:
> 
> 	for (;;) {
> 		status = kvm_mips_vcpu_enter_exit();
> 
> 		if (handle_exit(status))
> 			break;
> 		
> 		...
> 	}
> 
> ... which is far easier to manage.

I don't suppose we can just remove MIPS "support"?  And PPC while we're at it?  :-D

* Re: [PATCH v2 4/7] kvm/mips: rework guest entry logic
  2022-01-20 17:15       ` Mark Rutland
  2022-01-20 17:17         ` Sean Christopherson
@ 2022-01-20 17:29         ` Paolo Bonzini
  2022-01-21 12:44           ` Mark Rutland
  1 sibling, 1 reply; 41+ messages in thread
From: Paolo Bonzini @ 2022-01-20 17:29 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-kernel, aleksandar.qemu.devel, alexandru.elisei,
	anup.patel, aou, atish.patra, borntraeger, bp, catalin.marinas,
	chenhuacai, dave.hansen, frankja, frederic, gor, hca,
	james.morse, jmattson, joro, luto, maz, mingo, mpe, nsaenzju,
	palmer, paulmck, paul.walmsley, peterz, seanjc, suzuki.poulose,
	svens, tglx, tsbogend, vkuznets, wanpengli, will

On 1/20/22 18:15, Mark Rutland wrote:
> As above, we'll also need the guest_state_{enter,exit}() calls
> surrounding this (e.g. before that local_irq_enable() at the start of
> kvm_mips_handle_exit()),

Oh, indeed.  And there is also an interrupt-enabled area similar to 
s390's, in both vcpu_run and the exception handler entry point (which 
falls through to the exit handler created by kvm_mips_build_exit).  For 
example:

         /* Setup status register for running guest in UM */
         uasm_i_ori(&p, V1, V1, ST0_EXL | KSU_USER | ST0_IE);
         UASM_i_LA(&p, AT, ~(ST0_CU0 | ST0_MX | ST0_SX | ST0_UX));
         uasm_i_and(&p, V1, V1, AT);
         uasm_i_mtc0(&p, V1, C0_STATUS);
         uasm_i_ehb(&p);

I'd rather get rid of the EQS for MIPS altogether.

> and that needs to happen in noinstr code, etc.

There are bigger problems with instrumentation, because the 
runtime-generated code as far as I can tell is not noinstr.

Paolo


* Re: [PATCH v2 0/7] kvm: fix latent guest entry/exit bugs
  2022-01-20 15:14         ` Christian Borntraeger
@ 2022-01-21  9:53           ` Christian Borntraeger
  2022-01-21 14:17             ` Christian Borntraeger
  0 siblings, 1 reply; 41+ messages in thread
From: Christian Borntraeger @ 2022-01-21  9:53 UTC (permalink / raw)
  To: Mark Rutland, Paolo Bonzini
  Cc: linux-kernel, Michael Ellerman, aleksandar.qemu.devel,
	alexandru.elisei, anup.patel, aou, atish.patra, bp,
	catalin.marinas, chenhuacai, dave.hansen, frankja, frederic, gor,
	hca, james.morse, jmattson, joro, luto, maz, mingo, nsaenzju,
	palmer, paulmck, paul.walmsley, peterz, seanjc, suzuki.poulose,
	svens, tglx, tsbogend, vkuznets, wanpengli, will, Anup Patel,
	Atish Patra

On 20.01.22 at 16:14, Christian Borntraeger wrote:
> 
> 
> On 20.01.22 at 13:03, Mark Rutland wrote:
>> On Thu, Jan 20, 2022 at 12:28:09PM +0100, Paolo Bonzini wrote:
>>> On 1/19/22 20:22, Mark Rutland wrote:
>>>> I wonder, is the s390 guest entry/exit *preemptible*?
>>>>
>>>> If a timer IRQ can preempt in the middle of the EQS, we wouldn't balance
>>>> things before a ctx-switch to the idle thread, which would then be able
>>>> to hit this.
>>>>
>>>> I'll need to go audit the other architectures for similar.
>>>
>>> They don't enable interrupts in the entry/exit path so they should be okay.
>>
>> True.
>>
>> So it sounds like for s390 adding an explicit preempt_{disable,enable}() is the
>> right thing to do. I'll add that and explanatory commentary.
> 
> That would not be trivial I guess. We do allow (and need) page faults on sie for guest
> demand paging and
> 
> this piece of arch/s390/mm/fault.c
> 
>         case GMAP_FAULT:
>                  if (faulthandler_disabled() || !mm)
>                          goto out;
>                  break;
>          }
> 
> would no longer work since faulthandler_disabled checks for the preempt count.
> 


Something like this


diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index d30f5986fa85..1c7d45346e12 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -385,10 +385,18 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
                         return 0;
                 goto out;
         case USER_FAULT:
-       case GMAP_FAULT:
                 if (faulthandler_disabled() || !mm)
                         goto out;
                 break;
+               /*
+                * We know that we interrupted SIE and we are not in an IRQ.
+                * Preemption might be disabled, thus checking for in_atomic
+                * would result in failures
+                */
+       case GMAP_FAULT:
+               if (pagefault_disabled() || !mm)
+                       goto out;
+               break;
         }
  
         perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);

seems to work with preemption disabled around sie. Not sure yet if this is correct.
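
For clarity, "preemption disabled around sie" here means roughly the
following shape in __vcpu_run() (a sketch; the exact placement around
the series' kvm_s390_enter_exit_sie() helper is an assumption):

	preempt_disable();
	guest_state_enter_irqoff();
	exit_reason = sie64a(vcpu->arch.sie_block, vcpu->run->s.regs.gprs);
	guest_state_exit_irqoff();
	preempt_enable();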

* Re: [PATCH v2 4/7] kvm/mips: rework guest entry logic
  2022-01-20 17:29         ` Paolo Bonzini
@ 2022-01-21 12:44           ` Mark Rutland
  0 siblings, 0 replies; 41+ messages in thread
From: Mark Rutland @ 2022-01-21 12:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, aleksandar.qemu.devel, alexandru.elisei,
	anup.patel, aou, atish.patra, borntraeger, bp, catalin.marinas,
	chenhuacai, dave.hansen, frankja, frederic, gor, hca,
	james.morse, jmattson, joro, luto, maz, mingo, mpe, nsaenzju,
	palmer, paulmck, paul.walmsley, peterz, seanjc, suzuki.poulose,
	svens, tglx, tsbogend, vkuznets, wanpengli, will

On Thu, Jan 20, 2022 at 06:29:25PM +0100, Paolo Bonzini wrote:
> On 1/20/22 18:15, Mark Rutland wrote:
> > As above, we'll also need the guest_state_{enter,exit}() calls
> > surrounding this (e.g. before that local_irq_enable() at the start of
> > kvm_mips_handle_exit()),
> 
> Oh, indeed.  And there is also an interrupt-enabled area similar to s390's,
> in both vcpu_run and the exception handler entry point (which falls through
> to the exit handler created by kvm_mips_build_exit).  For example:
> 
>         /* Setup status register for running guest in UM */
>         uasm_i_ori(&p, V1, V1, ST0_EXL | KSU_USER | ST0_IE);
>         UASM_i_LA(&p, AT, ~(ST0_CU0 | ST0_MX | ST0_SX | ST0_UX));
>         uasm_i_and(&p, V1, V1, AT);
>         uasm_i_mtc0(&p, V1, C0_STATUS);
>         uasm_i_ehb(&p);
> 
> I'd rather get rid of the EQS for MIPS altogether.

Ok; I'm not immediately sure how to do that without invasive changes around the
context tracking bits.

Did you have a specific approach in mind, or was that just a general statement?

> > and that needs to happen in noinstr code, etc.
> 
> There are bigger problems with instrumentation, because the
> runtime-generated code as far as I can tell is not noinstr.

The generated sequences themselves are not a problem -- they're not
compiler-instrumented, and kprobes will reject them since they live in a
kzalloc()'d buffer which is outside of kernel text.

Those call tlbmiss_handler_setup_pgd(), but that itself is runtime-generated,
and AFAICT doesn't call anything. It is placed within the kernel text, but it
could be blacklisted from kprobes.

Have I missed something there?

Thanks,
Mark.

* Re: [PATCH v2 0/7] kvm: fix latent guest entry/exit bugs
  2022-01-21  9:53           ` Christian Borntraeger
@ 2022-01-21 14:17             ` Christian Borntraeger
  2022-01-21 14:30               ` Mark Rutland
  0 siblings, 1 reply; 41+ messages in thread
From: Christian Borntraeger @ 2022-01-21 14:17 UTC (permalink / raw)
  To: Mark Rutland, Paolo Bonzini
  Cc: linux-kernel, Michael Ellerman, aleksandar.qemu.devel,
	alexandru.elisei, anup.patel, aou, atish.patra, bp,
	catalin.marinas, chenhuacai, dave.hansen, frankja, frederic, gor,
	hca, james.morse, jmattson, joro, luto, maz, mingo, nsaenzju,
	palmer, paulmck, paul.walmsley, peterz, seanjc, suzuki.poulose,
	svens, tglx, tsbogend, vkuznets, wanpengli, will, Anup Patel,
	Atish Patra



On 21.01.22 at 10:53, Christian Borntraeger wrote:
> On 20.01.22 at 16:14, Christian Borntraeger wrote:
>>
>>
>> On 20.01.22 at 13:03, Mark Rutland wrote:
>>> On Thu, Jan 20, 2022 at 12:28:09PM +0100, Paolo Bonzini wrote:
>>>> On 1/19/22 20:22, Mark Rutland wrote:
>>>>> I wonder, is the s390 guest entry/exit *preemptible*?
>>>>>
>>>>> If a timer IRQ can preempt in the middle of the EQS, we wouldn't balance
>>>>> things before a ctx-switch to the idle thread, which would then be able
>>>>> to hit this.
>>>>>
>>>>> I'll need to go audit the other architectures for similar.
>>>>
>>>> They don't enable interrupts in the entry/exit path so they should be okay.
>>>
>>> True.
>>>
>>> So it sounds like for s390 adding an explicit preempt_{disable,enable}() is the
>>> right thing to do. I'll add that and explanatory commentary.
>>
>> That would not be trivial I guess. We do allow (and need) page faults on sie for guest
>> demand paging and
>>
>> this piece of arch/s390/mm/fault.c
>>
>>         case GMAP_FAULT:
>>                  if (faulthandler_disabled() || !mm)
>>                          goto out;
>>                  break;
>>          }
>>
>> would no longer work since faulthandler_disabled checks for the preempt count.
>>
> 
> 
> Something like this
> 
> 
> diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
> index d30f5986fa85..1c7d45346e12 100644
> --- a/arch/s390/mm/fault.c
> +++ b/arch/s390/mm/fault.c
> @@ -385,10 +385,18 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
>                          return 0;
>                  goto out;
>          case USER_FAULT:
> -       case GMAP_FAULT:
>                  if (faulthandler_disabled() || !mm)
>                          goto out;
>                  break;
> +               /*
> +                * We know that we interrupted SIE and we are not in an IRQ.
> +                * Preemption might be disabled, thus checking for in_atomic
> +                * would result in failures
> +                */
> +       case GMAP_FAULT:
> +               if (pagefault_disabled() || !mm)
> +                       goto out;
> +               break;
>          }
> 
>          perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
> 
> seems to work with preemption disabled around sie. Not sure yet if this is correct.


No, it does not work: scheduling while preemption is disabled.
[ 1880.448663] BUG: scheduling while atomic: qemu-system-s39/1806/0x00000002
[ 1880.448674] INFO: lockdep is turned off.
[ 1880.448676] Modules linked in: kvm nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink sunrpc mlx5_ib ib_uverbs s390_trng ib_core genwqe_card crc_itu_t vfio_ccw mdev vfio_iommu_type1 eadm_sch vfio zcrypt_cex4 sch_fq_codel configfs ip_tables x_tables mlx5_core ghash_s390 prng aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common pkey zcrypt rng_core autofs4
[ 1880.448730] Preemption disabled at:
[ 1880.448731] [<000003ff8070da88>] ckc_irq_pending+0x30/0xe0 [kvm]
[ 1880.448778] CPU: 47 PID: 1806 Comm: qemu-system-s39 Tainted: G        W         5.16.0-00007-g89e9021389e2-dirty #15
[ 1880.448782] Hardware name: IBM 3906 M04 704 (LPAR)
[ 1880.448784] Call Trace:
[ 1880.448785]  [<000000000bf001d6>] dump_stack_lvl+0x8e/0xc8
[ 1880.448794]  [<000000000b26e08a>] __schedule_bug+0xe2/0xf8
[ 1880.448801]  [<000000000b26e212>] schedule_debug+0x172/0x1a8
[ 1880.448804]  [<000000000bf0bcae>] __schedule+0x56/0x8b0
[ 1880.448808]  [<000000000bf0c570>] schedule+0x68/0x110
[ 1880.448811]  [<000000000bf13e76>] schedule_timeout+0x106/0x160
[ 1880.448815]  [<000000000bf0ddf2>] wait_for_completion+0xc2/0x110
[ 1880.448818]  [<000000000b258674>] __flush_work+0xd4/0x118
[ 1880.448823]  [<000000000b4e3c88>] __drain_all_pages+0x218/0x308
[ 1880.448829]  [<000000000b4ec3bc>] __alloc_pages_slowpath.constprop.0+0x5bc/0xc98
[ 1880.448832]  [<000000000b4ece5c>] __alloc_pages+0x3c4/0x448
[ 1880.448835]  [<000000000b5143cc>] alloc_pages_vma+0x9c/0x360
[ 1880.448841]  [<000000000b4c0d6e>] do_swap_page+0x66e/0xca0
[ 1880.448845]  [<000000000b4c3012>] __handle_mm_fault+0x29a/0x4b0
[ 1880.448869]  [<000000000b4c33ac>] handle_mm_fault+0x184/0x3a8
[ 1880.448872]  [<000000000b2062ce>] do_exception+0x136/0x490
[ 1880.448877]  [<000000000b206b9a>] do_dat_exception+0x2a/0x50
[ 1880.448880]  [<000000000bf03650>] __do_pgm_check+0x120/0x1f0
[ 1880.448882]  [<000000000bf164ee>] pgm_check_handler+0x11e/0x180
[ 1880.448885]  [<000000000bf16298>] sie_exit+0x0/0x48
[ 1880.448888] ([<000003ff8071e954>] kvm_s390_enter_exit_sie+0x64/0x98 [kvm])
[ 1880.448910]  [<000003ff807061fa>] __vcpu_run+0x2a2/0x5b8 [kvm]
[ 1880.448931]  [<000003ff807069ba>] kvm_arch_vcpu_ioctl_run+0x10a/0x270 [kvm]
[ 1880.448953]  [<000003ff806ed02c>] kvm_vcpu_ioctl+0x27c/0xa40 [kvm]
[ 1880.448975]  [<000000000b58b5c6>] __s390x_sys_ioctl+0xbe/0x100
[ 1880.448982]  [<000000000bf038fa>] __do_syscall+0x1da/0x208
[ 1880.448984]  [<000000000bf16362>] system_call+0x82/0xb0

* Re: [PATCH v2 0/7] kvm: fix latent guest entry/exit bugs
  2022-01-21 14:17             ` Christian Borntraeger
@ 2022-01-21 14:30               ` Mark Rutland
  2022-01-21 14:42                 ` Christian Borntraeger
  0 siblings, 1 reply; 41+ messages in thread
From: Mark Rutland @ 2022-01-21 14:30 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Paolo Bonzini, linux-kernel, Michael Ellerman,
	aleksandar.qemu.devel, alexandru.elisei, anup.patel, aou,
	atish.patra, bp, catalin.marinas, chenhuacai, dave.hansen,
	frankja, frederic, gor, hca, james.morse, jmattson, joro, luto,
	maz, mingo, nsaenzju, palmer, paulmck, paul.walmsley, peterz,
	seanjc, suzuki.poulose, svens, tglx, tsbogend, vkuznets,
	wanpengli, will, Anup Patel, Atish Patra

On Fri, Jan 21, 2022 at 03:17:01PM +0100, Christian Borntraeger wrote:
> 
> 
> On 21.01.22 at 10:53, Christian Borntraeger wrote:
> > On 20.01.22 at 16:14, Christian Borntraeger wrote:
> > > 
> > > 
> > > On 20.01.22 at 13:03, Mark Rutland wrote:
> > > > On Thu, Jan 20, 2022 at 12:28:09PM +0100, Paolo Bonzini wrote:
> > > > > On 1/19/22 20:22, Mark Rutland wrote:
> > > > > > I wonder, is the s390 guest entry/exit *preemptible*?
> > > > > > 
> > > > > > If a timer IRQ can preempt in the middle of the EQS, we wouldn't balance
> > > > > > things before a ctx-switch to the idle thread, which would then be able
> > > > > > to hit this.
> > > > > > 
> > > > > > I'll need to go audit the other architectures for similar.
> > > > > 
> > > > > They don't enable interrupts in the entry/exit path so they should be okay.
> > > > 
> > > > True.
> > > > 
> > > > So it sounds like for s390 adding an explicit preempt_{disable,enable}() is the
> > > > right thing to do. I'll add that and explanatory commentary.
> > > 
> > > That would not be trivial I guess. We do allow (and need) page faults on sie for guest
> > > demand paging and
> > > 
> > > this piece of arch/s390/mm/fault.c
> > > 
> > >         case GMAP_FAULT:
> > >                  if (faulthandler_disabled() || !mm)
> > >                          goto out;
> > >                  break;
> > >          }
> > > 
> > > would no longer work since faulthandler_disabled checks for the preempt count.
> > > 
> > 
> > 
> > Something like this
> > 
> > 
> > diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
> > index d30f5986fa85..1c7d45346e12 100644
> > --- a/arch/s390/mm/fault.c
> > +++ b/arch/s390/mm/fault.c
> > @@ -385,10 +385,18 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
> >                          return 0;
> >                  goto out;
> >          case USER_FAULT:
> > -       case GMAP_FAULT:
> >                  if (faulthandler_disabled() || !mm)
> >                          goto out;
> >                  break;
> > +               /*
> > +                * We know that we interrupted SIE and we are not in an IRQ.
> > +                * Preemption might be disabled, thus checking for in_atomic
> > +                * would result in failures
> > +                */
> > +       case GMAP_FAULT:
> > +               if (pagefault_disabled() || !mm)
> > +                       goto out;
> > +               break;
> >          }
> > 
> >          perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
> > 
> > seems to work with preemption disabled around sie. Not sure yet if this is correct.
> 
> 
> No, it does not work: scheduling while preemption is disabled.

Hmm... which exceptions do we expect to take from a guest?

I wonder if we can handle those more like other architectures by getting those
to immediately return from the sie64a() call with some status code that we can
handle outside of the guest_state_{enter,exit}_irqoff() critical section.

On arm64 in VHE mode, we swap our exception vectors when entering/exiting the
guest to allow us to do that; I wonder if we could do similar here?

Thanks,
Mark.

* Re: [PATCH v2 0/7] kvm: fix latent guest entry/exit bugs
  2022-01-21 14:30               ` Mark Rutland
@ 2022-01-21 14:42                 ` Christian Borntraeger
  2022-01-21 15:29                   ` Mark Rutland
  0 siblings, 1 reply; 41+ messages in thread
From: Christian Borntraeger @ 2022-01-21 14:42 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Paolo Bonzini, linux-kernel, Michael Ellerman,
	aleksandar.qemu.devel, alexandru.elisei, anup.patel, aou,
	atish.patra, bp, catalin.marinas, chenhuacai, dave.hansen,
	frankja, frederic, gor, hca, james.morse, jmattson, joro, luto,
	maz, mingo, nsaenzju, palmer, paulmck, paul.walmsley, peterz,
	seanjc, suzuki.poulose, svens, tglx, tsbogend, vkuznets,
	wanpengli, will, Anup Patel, Atish Patra



On 21.01.22 at 15:30, Mark Rutland wrote:
> On Fri, Jan 21, 2022 at 03:17:01PM +0100, Christian Borntraeger wrote:
>>
>>
>> On 21.01.22 at 10:53, Christian Borntraeger wrote:
>>> On 20.01.22 at 16:14, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 20.01.22 at 13:03, Mark Rutland wrote:
>>>>> On Thu, Jan 20, 2022 at 12:28:09PM +0100, Paolo Bonzini wrote:
>>>>>> On 1/19/22 20:22, Mark Rutland wrote:
>>>>>>> I wonder, is the s390 guest entry/exit *preemptible*?
>>>>>>>
>>>>>>> If a timer IRQ can preempt in the middle of the EQS, we wouldn't balance
>>>>>>> things before a ctx-switch to the idle thread, which would then be able
>>>>>>> to hit this.
>>>>>>>
>>>>>>> I'll need to go audit the other architectures for similar.
>>>>>>
>>>>>> They don't enable interrupts in the entry/exit path so they should be okay.
>>>>>
>>>>> True.
>>>>>
>>>>> So it sounds like for s390 adding an explicit preempt_{disable,enable}() is the
>>>>> right thing to do. I'll add that and explanatory commentary.
>>>>
>>>> That would not be trivial I guess. We do allow (and need) page faults on sie for guest
>>>> demand paging and
>>>>
>>>> this piece of arch/s390/mm/fault.c
>>>>
>>>>          case GMAP_FAULT:
>>>>                   if (faulthandler_disabled() || !mm)
>>>>                           goto out;
>>>>                   break;
>>>>           }
>>>>
>>>> would no longer work since faulthandler_disabled checks for the preempt count.
>>>>
>>>
>>>
>>> Something like this
>>>
>>>
>>> diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
>>> index d30f5986fa85..1c7d45346e12 100644
>>> --- a/arch/s390/mm/fault.c
>>> +++ b/arch/s390/mm/fault.c
>>> @@ -385,10 +385,18 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
>>>                           return 0;
>>>                   goto out;
>>>           case USER_FAULT:
>>> -       case GMAP_FAULT:
>>>                   if (faulthandler_disabled() || !mm)
>>>                           goto out;
>>>                   break;
>>> +               /*
>>> +                * We know that we interrupted SIE and we are not in an IRQ.
>>> +                * Preemption might be disabled, thus checking for in_atomic
>>> +                * would result in failures
>>> +                */
>>> +       case GMAP_FAULT:
>>> +               if (pagefault_disabled() || !mm)
>>> +                       goto out;
>>> +               break;
>>>           }
>>>
>>>           perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
>>>
>>> seems to work with preemption disabled around sie. Not sure yet if this is correct.
>>
>>
>> No, it does not work: scheduling while preemption is disabled.
> 
> Hmm... which exceptions do we expect to take from a guest?
> 
> I wonder if we can handle those more like other architectures by getting those
> to immediately return from the sie64a() call with some status code that we can
> handle outside of the guest_state_{enter,exit}_irqoff() critical section.

We take all kinds of page faults (invalid->valid, ro->rw) on the sie instruction and
run them in the context of the pgm_check handler, just like for userspace. The pgm_check
handler does a sie_exit (similar to the interrupt handlers) by setting the return IA.
  
> On arm64 in VHE mode, we swap our exception vectors when entering/exiting the
> guest to allow us to do that; I wonder if we could do similar here?



* Re: [PATCH v2 0/7] kvm: fix latent guest entry/exit bugs
  2022-01-21 14:42                 ` Christian Borntraeger
@ 2022-01-21 15:29                   ` Mark Rutland
  2022-01-21 15:40                     ` Christian Borntraeger
  0 siblings, 1 reply; 41+ messages in thread
From: Mark Rutland @ 2022-01-21 15:29 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Paolo Bonzini, linux-kernel, Michael Ellerman,
	aleksandar.qemu.devel, alexandru.elisei, anup.patel, aou,
	atish.patra, bp, catalin.marinas, chenhuacai, dave.hansen,
	frankja, frederic, gor, hca, james.morse, jmattson, joro, luto,
	maz, mingo, nsaenzju, palmer, paulmck, paul.walmsley, peterz,
	seanjc, suzuki.poulose, svens, tglx, tsbogend, vkuznets,
	wanpengli, will, Anup Patel, Atish Patra

On Fri, Jan 21, 2022 at 03:42:48PM +0100, Christian Borntraeger wrote:
> On 21.01.22 at 15:30, Mark Rutland wrote:
> > On Fri, Jan 21, 2022 at 03:17:01PM +0100, Christian Borntraeger wrote:
> > > On 21.01.22 at 10:53, Christian Borntraeger wrote:
> > > > On 20.01.22 at 16:14, Christian Borntraeger wrote:
> > > > > On 20.01.22 at 13:03, Mark Rutland wrote:
> > > > > > On Thu, Jan 20, 2022 at 12:28:09PM +0100, Paolo Bonzini wrote:
> > > > > > > On 1/19/22 20:22, Mark Rutland wrote:
> > > > > > > > I wonder, is the s390 guest entry/exit *preemptible*?
> > > > > > > > 
> > > > > > > > If a timer IRQ can preempt in the middle of the EQS, we wouldn't balance
> > > > > > > > things before a ctx-switch to the idle thread, which would then be able
> > > > > > > > to hit this.
> > > > > > > > 
> > > > > > > > I'll need to go audit the other architectures for similar.
> > > > > > > 
> > > > > > > They don't enable interrupts in the entry/exit path so they should be okay.
> > > > > > 
> > > > > > True.
> > > > > > 
> > > > > > So it sounds like for s390 adding an explicit preempt_{disable,enable}() is the
> > > > > > right thing to do. I'll add that and explanatory commentary.
> > > > > 
> > > > > That would not be trivial I guess. We do allow (and need) page faults on sie for guest
> > > > > demand paging and
> > > > > 
> > > > > this piece of arch/s390/mm/fault.c
> > > > > 
> > > > >          case GMAP_FAULT:
> > > > >                   if (faulthandler_disabled() || !mm)
> > > > >                           goto out;
> > > > >                   break;
> > > > >           }
> > > > > 
> > > > > would no longer work since faulthandler_disabled checks for the preempt count.
> > > > > 
> > > > 
> > > > 
> > > > Something like this
> > > > 
> > > > 
> > > > diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
> > > > index d30f5986fa85..1c7d45346e12 100644
> > > > --- a/arch/s390/mm/fault.c
> > > > +++ b/arch/s390/mm/fault.c
> > > > @@ -385,10 +385,18 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
> > > >                           return 0;
> > > >                   goto out;
> > > >           case USER_FAULT:
> > > > -       case GMAP_FAULT:
> > > >                   if (faulthandler_disabled() || !mm)
> > > >                           goto out;
> > > >                   break;
> > > > +               /*
> > > > +                * We know that we interrupted SIE and we are not in an IRQ.
> > > > +                * Preemption might be disabled, thus checking for in_atomic
> > > > +                * would result in failures
> > > > +                */
> > > > +       case GMAP_FAULT:
> > > > +               if (pagefault_disabled() || !mm)
> > > > +                       goto out;
> > > > +               break;
> > > >           }
> > > > 
> > > >           perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
> > > > 
> > > > seems to work with preemption disabled around sie. Not sure yet if this is correct.
> > > 
> > > 
> > > No, it does not work: scheduling while preemption is disabled.
> > 
> > Hmm... which exceptions do we expect to take from a guest?
> > 
> > I wonder if we can handle those more like other architectures by getting those
> > to immediately return from the sie64a() call with some status code that we can
> > handle outside of the guest_state_{enter,exit}_irqoff() critical section.
> 
> We take all kinds of page faults (invalid->valid, ro->rw) on the sie instruction and
> run them in the context of the pgm_check handler, just like for userspace.

Do we only expect to take faults from a guest (which IIUC are the GMAP_FAULT
cases in the bit above), or are there other exceptions we expect to take from
the middle of SIE?

> The pgm_check handler does a sie_exit (similar to the interrupt handlers) by
> setting the return IA.

Sure, but that sie_exit happens *after* the handler runs, whereas I'm asking
whether we can structure this like all the other architectures and turn all
exceptions into returns from sie64a() that we can handle after having called
guest_state_exit_irqoff().

> > On arm64 in VHE mode, we swap our exception vectors when entering/exiting the
> > guest to allow us to do that; I wonder if we could do similar here?

Does this idea sound at all plausible?

Thanks,
Mark.

* Re: [PATCH v2 0/7] kvm: fix latent guest entry/exit bugs
  2022-01-21 15:29                   ` Mark Rutland
@ 2022-01-21 15:40                     ` Christian Borntraeger
  0 siblings, 0 replies; 41+ messages in thread
From: Christian Borntraeger @ 2022-01-21 15:40 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Paolo Bonzini, linux-kernel, Michael Ellerman,
	aleksandar.qemu.devel, alexandru.elisei, anup.patel, aou,
	atish.patra, bp, catalin.marinas, chenhuacai, dave.hansen,
	frankja, frederic, gor, hca, james.morse, jmattson, joro, luto,
	maz, mingo, nsaenzju, palmer, paulmck, paul.walmsley, peterz,
	seanjc, suzuki.poulose, svens, tglx, tsbogend, vkuznets,
	wanpengli, will, Anup Patel, Atish Patra



On 21.01.22 at 16:29, Mark Rutland wrote:
> On Fri, Jan 21, 2022 at 03:42:48PM +0100, Christian Borntraeger wrote:
>> On 21.01.22 at 15:30, Mark Rutland wrote:
>>> On Fri, Jan 21, 2022 at 03:17:01PM +0100, Christian Borntraeger wrote:
>>>> On 21.01.22 at 10:53, Christian Borntraeger wrote:
>>>>> On 20.01.22 at 16:14, Christian Borntraeger wrote:
>>>>>> On 20.01.22 at 13:03, Mark Rutland wrote:
>>>>>>> On Thu, Jan 20, 2022 at 12:28:09PM +0100, Paolo Bonzini wrote:
>>>>>>>> On 1/19/22 20:22, Mark Rutland wrote:
>>>>>>>>> I wonder, is the s390 guest entry/exit *preemptible*?
>>>>>>>>>
>>>>>>>>> If a timer IRQ can preempt in the middle of the EQS, we wouldn't balance
>>>>>>>>> things before a ctx-switch to the idle thread, which would then be able
>>>>>>>>> to hit this.
>>>>>>>>>
>>>>>>>>> I'll need to go audit the other architectures for similar.
>>>>>>>>
>>>>>>>> They don't enable interrupts in the entry/exit path so they should be okay.
>>>>>>>
>>>>>>> True.
>>>>>>>
>>>>>>> So it sounds like for s390 adding an explicit preempt_{disable,enable}() is the
>>>>>>> right thing to do. I'll add that and explanatory commentary.
>>>>>>
>>>>>> That would not be trivial I guess. We do allow (and need) page faults on sie for guest
>>>>>> demand paging and
>>>>>>
>>>>>> this piece of arch/s390/mm/fault.c
>>>>>>
>>>>>>           case GMAP_FAULT:
>>>>>>                    if (faulthandler_disabled() || !mm)
>>>>>>                            goto out;
>>>>>>                    break;
>>>>>>            }
>>>>>>
>>>>>> would no longer work since faulthandler_disabled checks for the preempt count.
>>>>>>
>>>>>
>>>>>
>>>>> Something like this
>>>>>
>>>>>
>>>>> diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
>>>>> index d30f5986fa85..1c7d45346e12 100644
>>>>> --- a/arch/s390/mm/fault.c
>>>>> +++ b/arch/s390/mm/fault.c
>>>>> @@ -385,10 +385,18 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
>>>>>                            return 0;
>>>>>                    goto out;
>>>>>            case USER_FAULT:
>>>>> -       case GMAP_FAULT:
>>>>>                    if (faulthandler_disabled() || !mm)
>>>>>                            goto out;
>>>>>                    break;
>>>>> +               /*
>>>>> +                * We know that we interrupted SIE and we are not in an IRQ.
>>>>> +                * Preemption might be disabled, thus checking for in_atomic
>>>>> +                * would result in failures
>>>>> +                */
>>>>> +       case GMAP_FAULT:
>>>>> +               if (pagefault_disabled() || !mm)
>>>>> +                       goto out;
>>>>> +               break;
>>>>>            }
>>>>>
>>>>>            perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
>>>>>
>>>>> seems to work with preemption disabled around sie. Not sure yet if this is correct.
>>>>
>>>>
>>>> No, it does not work: scheduling while preemption is disabled.
>>>
>>> Hmm... which exceptions do we expect to take from a guest?
>>>
>>> I wonder if we can handle those more like other architectures by getting those
>>> to immediately return from the sie64a() call with some status code that we can
>>> handle outside of the guest_state_{enter,exit}_irqoff() critical section.
>>
>> We take all kind of page faults (invalid->valid, ro->rw) on the sie instruction and
>> run that in the context of the pgm_check handler just like for userspace.
> 
> Do we only expect to take faults from a guest (which IIUC are the GMAP_FAULT
> cases in the bit above), or are there other exceptions we expect to take from
> the middle of SIE?
> 
>> the pgm_check handler does a sie_exit (similar to the interrupt handlers) by
>> setting the return IA.
> 
> Sure, but that sie_exit happens *after* the handler runs, whereas I'm asking
> whether we can structure this like all the other architectures and turn all
> exceptions into returns from sie64a() that we can handle after having called
> guest_state_exit_irqoff().
> 
>>> On arm64 in VHE mode, we swap our exception vectors when entering/exiting the
>>> guest to allow us to do that; I wonder if we could do similar here?
> 
> Does this idea sound at all plausible?

Maybe. We already have something like that for async_pf (see kvm_arch_setup_async_pf)
where we handle the pagefault later on, after first kicking off the resolution for major
page faults (via FAULT_FLAG_RETRY_NOWAIT) in the low-level handler.
Let me think about it.

* Re: [PATCH v2 1/7] entry: add arch_in_rcu_eqs()
  2022-01-19 10:58 ` [PATCH v2 1/7] entry: add arch_in_rcu_eqs() Mark Rutland
  2022-01-19 17:35   ` Christian Borntraeger
@ 2022-01-21 17:34   ` Nicolas Saenz Julienne
  1 sibling, 0 replies; 41+ messages in thread
From: Nicolas Saenz Julienne @ 2022-01-21 17:34 UTC (permalink / raw)
  To: Mark Rutland, linux-kernel, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner
  Cc: aleksandar.qemu.devel, alexandru.elisei, anup.patel, aou,
	atish.patra, borntraeger, bp, catalin.marinas, chenhuacai,
	dave.hansen, frankja, frederic, gor, hca, james.morse, jmattson,
	joro, maz, mingo, mpe, palmer, paulmck, paul.walmsley, pbonzini,
	seanjc, suzuki.poulose, svens, tsbogend, vkuznets, wanpengli,
	will

On Wed, 2022-01-19 at 10:58 +0000, Mark Rutland wrote:
> All architectures have an interruptible RCU extended quiescent state
> (EQS) as part of their idle sequences, where interrupts can occur
> without RCU watching. Entry code must account for this and wake RCU as
> necessary; the common entry code deals with this in irqentry_enter() by
> treating any interrupt from an idle thread as potentially having
> occurred within an EQS and waking RCU for the duration of the interrupt
> via rcu_irq_enter() .. rcu_irq_exit().
> 
> Some architectures may have other interruptible EQSs which require
> similar treatment. For example, on s390 it is necessary to enable
> interrupts around guest entry in the middle of a period where core KVM
> code has entered an EQS.
> 
> So that architectures can wake RCU in these cases, this patch adds a
> new arch_in_rcu_eqs() hook to the common entry code which is checked in
> addition to the existing is_idle_task() check, with RCU woken if
> either returns true. A default implementation is provided which always
> returns false, which suffices for most architectures.
> 
> As no architectures currently implement arch_in_rcu_eqs(), there should
> be no functional change as a result of this patch alone. A subsequent
> patch will add an s390 implementation to fix a latent bug with missing
> RCU wakeups.
> 
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
> Cc: Heiko Carstens <hca@linux.ibm.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Sven Schnelle <svens@linux.ibm.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> ---
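
The shape being described boils down to something like the following
(a reconstruction for illustration, not the patch itself; the exact
wiring in irqentry_enter() may differ):

	/* Default; architectures with other interruptible EQSs override it. */
	#ifndef arch_in_rcu_eqs
	static __always_inline bool arch_in_rcu_eqs(void)
	{
		return false;
	}
	#endif

	/* irqentry_enter() then wakes RCU when either condition holds: */
	if (is_idle_task(current) || arch_in_rcu_eqs()) {
		lockdep_hardirqs_off(CALLER_ADDR0);
		rcu_irq_enter();
		instrumentation_begin();
		trace_hardirqs_off_finish();
		instrumentation_end();
	}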

Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>

Thanks,

-- 
Nicolás Sáenz


* Re: [PATCH v2 2/7] kvm: add guest_state_{enter,exit}_irqoff()
  2022-01-19 10:58 ` [PATCH v2 2/7] kvm: add guest_state_{enter,exit}_irqoff() Mark Rutland
  2022-01-20 11:00   ` Paolo Bonzini
@ 2022-01-21 17:35   ` Nicolas Saenz Julienne
  1 sibling, 0 replies; 41+ messages in thread
From: Nicolas Saenz Julienne @ 2022-01-21 17:35 UTC (permalink / raw)
  To: Mark Rutland, linux-kernel
  Cc: aleksandar.qemu.devel, alexandru.elisei, anup.patel, aou,
	atish.patra, borntraeger, bp, catalin.marinas, chenhuacai,
	dave.hansen, frankja, frederic, gor, hca, james.morse, jmattson,
	joro, luto, maz, mingo, mpe, palmer, paulmck, paul.walmsley,
	pbonzini, peterz, seanjc, suzuki.poulose, svens, tglx, tsbogend,
	vkuznets, wanpengli, will

On Wed, 2022-01-19 at 10:58 +0000, Mark Rutland wrote:
> When transitioning to/from guest mode, it is necessary to inform
> lockdep, tracing, and RCU in a specific order, similar to the
> requirements for transitions to/from user mode. Additionally, it is
> necessary to perform vtime accounting for a window around running the
> guest, with RCU enabled, such that timer interrupts taken from the guest
> can be accounted as guest time.
> 
> Most architectures don't handle all the necessary pieces, and have a
> number of common bugs, including unsafe usage of RCU during the window
> between guest_enter() and guest_exit().
> 
> On x86, this was dealt with across commits:
> 
>   87fa7f3e98a1310e ("x86/kvm: Move context tracking where it belongs")
>   0642391e2139a2c1 ("x86/kvm/vmx: Add hardirq tracing to guest enter/exit")
>   9fc975e9efd03e57 ("x86/kvm/svm: Add hardirq tracing on guest enter/exit")
>   3ebccdf373c21d86 ("x86/kvm/vmx: Move guest enter/exit into .noinstr.text")
>   135961e0a7d555fc ("x86/kvm/svm: Move guest enter/exit into .noinstr.text")
>   160457140187c5fb ("KVM: x86: Defer vtime accounting 'til after IRQ handling")
>   bc908e091b326467 ("KVM: x86: Consolidate guest enter/exit logic to common helpers")
> 
> ... but those fixes are specific to x86, and as the resulting logic
> (while correct) is split across generic helper functions and
> x86-specific helper functions, it is difficult to see that the
> entry/exit accounting is balanced.
> 
> This patch adds generic helpers which architectures can use to handle
> guest entry/exit consistently and correctly. The guest_{enter,exit}()
> helpers are split into guest_timing_{enter,exit}() to perform vtime
> accounting, and guest_context_{enter,exit}() to perform the necessary
> context tracking and RCU management. The existing guest_{enter,exit}()
> helpers are left as wrappers of these.
> 
> Atop this, new guest_state_enter_irqoff() and guest_state_exit_irqoff()
> helpers are added to handle the ordering of lockdep, tracing, and RCU
> management. These are intended to mirror exit_to_user_mode() and
> enter_from_user_mode().
> 
> Subsequent patches will migrate architectures over to the new helpers,
> following a sequence:
> 
> 	guest_timing_enter_irqoff();
> 
> 	guest_state_enter_irqoff();
> 	< run the vcpu >
> 	guest_state_exit_irqoff();
> 
> 	< take any pending IRQs >
> 
> 	guest_timing_exit_irqoff();
> 
> This sequence handles all of the above correctly, and more clearly
> balances the entry and exit portions, making it easier to understand.
> 
> The existing helpers are marked as deprecated, and will be removed once
> all architectures have been converted.
> 
> There should be no functional change as a result of this patch.
> 
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Reviewed-by: Marc Zyngier <maz@kernel.org>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> ---
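
For reference, the ordering these helpers encode is roughly the
following (a reconstructed sketch, so the exact bodies may differ;
guest_context_{enter,exit}_irqoff() are the context-tracking/RCU halves
described above):

	static __always_inline void guest_state_enter_irqoff(void)
	{
		instrumentation_begin();
		trace_hardirqs_on_prepare();
		lockdep_hardirqs_on_prepare(CALLER_ADDR0);
		instrumentation_end();

		guest_context_enter_irqoff();	/* enter the RCU EQS */
		lockdep_hardirqs_on(CALLER_ADDR0);
	}

	static __always_inline void guest_state_exit_irqoff(void)
	{
		lockdep_hardirqs_off(CALLER_ADDR0);
		guest_context_exit_irqoff();	/* exit the RCU EQS */

		instrumentation_begin();
		trace_hardirqs_off_finish();
		instrumentation_end();
	}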

Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>

Thanks,

-- 
Nicolás Sáenz


* Re: [PATCH v2 3/7] kvm/arm64: rework guest entry logic
  2022-01-19 10:58 ` [PATCH v2 3/7] kvm/arm64: rework guest entry logic Mark Rutland
@ 2022-01-21 17:37   ` Nicolas Saenz Julienne
  0 siblings, 0 replies; 41+ messages in thread
From: Nicolas Saenz Julienne @ 2022-01-21 17:37 UTC (permalink / raw)
  To: Mark Rutland, linux-kernel
  Cc: aleksandar.qemu.devel, alexandru.elisei, anup.patel, aou,
	atish.patra, borntraeger, bp, catalin.marinas, chenhuacai,
	dave.hansen, frankja, frederic, gor, hca, james.morse, jmattson,
	joro, luto, maz, mingo, mpe, palmer, paulmck, paul.walmsley,
	pbonzini, peterz, seanjc, suzuki.poulose, svens, tglx, tsbogend,
	vkuznets, wanpengli, will

On Wed, 2022-01-19 at 10:58 +0000, Mark Rutland wrote:
> In kvm_arch_vcpu_ioctl_run() we enter an RCU extended quiescent state
> (EQS) by calling guest_enter_irqoff(), and unmask IRQs prior to
> exiting the EQS by calling guest_exit(). As the IRQ entry code will not
> wake RCU in this case, we may run the core IRQ code and IRQ handler
> without RCU watching, leading to various potential problems.
> 
> Additionally, we do not inform lockdep or tracing that interrupts will
> be enabled during guest execution, which can lead to misleading traces
> and warnings that interrupts have been enabled for overly-long periods.
> 
> This patch fixes these issues by using the new timing and context
> entry/exit helpers to ensure that interrupts are handled during guest
> vtime but with RCU watching, with a sequence:
> 
> 	guest_timing_enter_irqoff();
> 
> 	guest_state_enter_irqoff();
> 	< run the vcpu >
> 	guest_state_exit_irqoff();
> 
> 	< take any pending IRQs >
> 
> 	guest_timing_exit_irqoff();
> 
> Since instrumentation may make use of RCU, we must also ensure that no
> instrumented code is run during the EQS. I've split out the critical
> section into a new kvm_arm_vcpu_enter_exit() helper which is marked
> noinstr.
> 
> Fixes: 1b3d546daf85ed2b ("arm/arm64: KVM: Properly account for guest CPU time")
> Reported-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Reviewed-by: Marc Zyngier <maz@kernel.org>
> Cc: Alexandru Elisei <alexandru.elisei@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Cc: James Morse <james.morse@arm.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> Cc: Will Deacon <will@kernel.org>
> ---
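
For context, the noinstr critical section described above amounts to
something like this (a reconstructed sketch; the exact body may differ
from the patch):

	static int noinstr kvm_arm_vcpu_enter_exit(struct kvm_vcpu *vcpu)
	{
		int ret;

		guest_state_enter_irqoff();
		ret = kvm_call_hyp_ret(__kvm_vcpu_run, vcpu);
		guest_state_exit_irqoff();

		return ret;
	}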

Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>

Thanks,

-- 
Nicolás Sáenz


* Re: [PATCH v2 7/7] kvm/x86: rework guest entry logic
  2022-01-19 10:58 ` [PATCH v2 7/7] kvm/x86: " Mark Rutland
  2022-01-20 11:20   ` Paolo Bonzini
@ 2022-01-21 17:40   ` Nicolas Saenz Julienne
  1 sibling, 0 replies; 41+ messages in thread
From: Nicolas Saenz Julienne @ 2022-01-21 17:40 UTC (permalink / raw)
  To: Mark Rutland, linux-kernel
  Cc: aleksandar.qemu.devel, alexandru.elisei, anup.patel, aou,
	atish.patra, borntraeger, bp, catalin.marinas, chenhuacai,
	dave.hansen, frankja, frederic, gor, hca, james.morse, jmattson,
	joro, luto, maz, mingo, mpe, palmer, paulmck, paul.walmsley,
	pbonzini, peterz, seanjc, suzuki.poulose, svens, tglx, tsbogend,
	vkuznets, wanpengli, will

On Wed, 2022-01-19 at 10:58 +0000, Mark Rutland wrote:
> For consistency and clarity, migrate x86 over to the generic helpers for
> guest timing and lockdep/RCU/tracing management, and remove the
> x86-specific helpers.
> 
> Prior to this patch, the guest timing was entered in
> kvm_guest_enter_irqoff() (called by svm_vcpu_enter_exit() and
> vmx_vcpu_enter_exit()), and was exited by the call to
> vtime_account_guest_exit() within vcpu_enter_guest().
> 
> To minimize duplication and to more clearly balance entry and exit, both
> entry and exit of guest timing are placed in vcpu_enter_guest(), using
> the new guest_timing_{enter,exit}_irqoff() helpers. When context
> tracking is used, a small amount of additional time will be accounted
> towards guests; tick-based accounting is unaffected as IRQs are
> disabled at this point and not enabled until after the return from the
> guest.
> 
> This also corrects (benign) mis-balanced context tracking accounting
> introduced in commits:
> 
>   ae95f566b3d22ade ("KVM: X86: TSCDEADLINE MSR emulation fastpath")
>   26efe2fd92e50822 ("KVM: VMX: Handle preemption timer fastpath")
> 
> Where KVM can enter a guest multiple times, calling vtime_guest_enter()
> without a corresponding call to vtime_account_guest_exit(), and with
> vtime_account_system() called when vtime_account_guest() should be used.
> As account_system_time() checks PF_VCPU and calls account_guest_time(),
> this doesn't result in any functional problem, but is unnecessarily
> confusing.
> 
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Jim Mattson <jmattson@google.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Sean Christopherson <seanjc@google.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> Cc: Wanpeng Li <wanpengli@tencent.com>
> ---

Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>

Thanks,

-- 
Nicolás Sáenz

