linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: LKML <linux-kernel@vger.kernel.org>
Cc: x86@kernel.org, "Paul E. McKenney" <paulmck@kernel.org>,
	Andy Lutomirski <luto@kernel.org>,
	Alexandre Chartre <alexandre.chartre@oracle.com>,
	Frederic Weisbecker <frederic@kernel.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Sean Christopherson <sean.j.christopherson@intel.com>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Petr Mladek <pmladek@suse.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Joel Fernandes <joel@joelfernandes.org>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	Juergen Gross <jgross@suse.com>, Brian Gerst <brgerst@gmail.com>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Josh Poimboeuf <jpoimboe@redhat.com>,
	Will Deacon <will@kernel.org>
Subject: [patch V4 part 5 07/31] x86/entry: Provide idtentry_entry/exit_cond_rcu()
Date: Tue, 05 May 2020 15:53:48 +0200	[thread overview]
Message-ID: <20200505135828.808686575@linutronix.de> (raw)
In-Reply-To: 20200505135341.730586321@linutronix.de

The pagefault handler cannot use the regular idtentry_enter() because on
that invokes rcu_irq_enter() the pagefault was caused in the kernel. Not a
problem per se, but kernel side page faults can schedule which is not
possible without invoking rcu_irq_exit().

Adding rcu_irq_exit() and a matching rcu_irq_enter() into the actual
pagefault handling code is possible, but not pretty either.

Provide idtentry_entry/exit_cond_rcu() which calls rcu_irq_enter() only
when RCU is not watching. While this is not a legit kernel #PF establishing
RCU before handling it avoids RCU side effects which might affect
debugability.

The function is also useful for implementing lightweight scheduler IPI
entry handling later.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/entry/common.c         |  119 ++++++++++++++++++++++++++++++++++------
 arch/x86/include/asm/idtentry.h |    3 +
 2 files changed, 106 insertions(+), 16 deletions(-)

--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -515,6 +515,28 @@ SYSCALL_DEFINE0(ni_syscall)
 	return -ENOSYS;
 }
 
+static __always_inline bool __idtentry_enter(struct pt_regs *regs,
+					     bool cond_rcu)
+{
+	if (user_mode(regs)) {
+		enter_from_user_mode();
+	} else {
+		if (!cond_rcu || !rcu_is_watching()) {
+			lockdep_hardirqs_off(CALLER_ADDR0);
+			rcu_irq_enter();
+			instr_begin();
+			trace_hardirqs_off_prepare();
+			instr_end();
+			return true;
+		} else {
+			instr_begin();
+			trace_hardirqs_off();
+			instr_end();
+		}
+	}
+	return false;
+}
+
 /**
  * idtentry_enter - Handle state tracking on idtentry
  * @regs:	Pointer to pt_regs of interrupted context
@@ -532,19 +554,60 @@ SYSCALL_DEFINE0(ni_syscall)
  */
 void noinstr idtentry_enter(struct pt_regs *regs)
 {
-	if (user_mode(regs)) {
-		enter_from_user_mode();
-	} else {
-		lockdep_hardirqs_off(CALLER_ADDR0);
-		rcu_irq_enter();
-		instr_begin();
-		trace_hardirqs_off_prepare();
-		instr_end();
-	}
+	__idtentry_enter(regs, false);
+}
+
+/**
+ * idtentry_enter_cond_rcu - Handle state tracking on idtentry with conditional
+ *			     RCU handling
+ * @regs:	Pointer to pt_regs of interrupted context
+ *
+ * Invokes:
+ *  - lockdep irqflag state tracking as low level ASM entry disabled
+ *    interrupts.
+ *
+ *  - Context tracking if the exception hit user mode.
+ *
+ *  - The hardirq tracer to keep the state consistent as low level ASM
+ *    entry disabled interrupts.
+ *
+ * For kernel mode entries the conditional RCU handling is useful for two
+ * purposes
+ *
+ * 1) Pagefaults: Kernel code can fault and sleep, e.g. on exec. This code
+ *    is not in an RCU idle section. If rcu_irq_enter() would be invoked
+ *    then nothing would invoke rcu_irq_exit() before scheduling.
+ *
+ *   If the kernel faults in a RCU idle section then all bets are off
+ *   anyway but at least avoiding a subsequent issue vs. RCU is helpful for
+ *   debugging.
+ *
+ * 2) Scheduler IPI: To avoid the overhead of a regular idtentry vs. RCU
+ *    and irq_enter() the IPI can be made lightweight if the tracepoints
+ *    are not enabled. While the IPI functionality itself does not require
+ *    RCU (folding preempt count) it still calls out into instrumentable
+ *    functions, e.g. ack_APIC_irq(). The scheduler IPI can hit RCU idle
+ *    sections, so RCU needs to be adjusted. For the fast path case, e.g.
+ *    KVM kicking a vCPU out of guest mode this can be avoided because the
+ *    IPI is handled after KVM reestablished kernel context including RCU.
+ *
+ * For user mode entries enter_from_user_mode() must be invoked to
+ * establish the proper context for NOHZ_FULL. Otherwise scheduling on exit
+ * would not be possible.
+ *
+ * Returns: True if RCU has been adjusted on a kernel entry
+ *	    False otherwise
+ *
+ * The return value must be fed into the rcu_exit argument of
+ * idtentry_exit_cond_rcu().
+ */
+bool noinstr idtentry_enter_cond_rcu(struct pt_regs *regs)
+{
+	return __idtentry_enter(regs, true);
 }
 
 static __always_inline void __idtentry_exit(struct pt_regs *regs,
-					    bool preempt_hcall)
+					    bool preempt_hcall, bool rcu_exit)
 {
 	lockdep_assert_irqs_disabled();
 
@@ -568,7 +631,8 @@ static __always_inline void __idtentry_e
 			 */
 			if (!preempt_count()) {
 				instr_begin();
-				rcu_irq_exit_preempt();
+				if (rcu_exit)
+					rcu_irq_exit_preempt();
 				if (need_resched())
 					preempt_schedule_irq();
 				/* Covers both tracing and lockdep */
@@ -592,11 +656,13 @@ static __always_inline void __idtentry_e
 		trace_hardirqs_on_prepare();
 		lockdep_hardirqs_on_prepare(CALLER_ADDR0);
 		instr_end();
-		rcu_irq_exit();
+		if (rcu_exit)
+			rcu_irq_exit();
 		lockdep_hardirqs_on(CALLER_ADDR0);
 	} else {
 		/* IRQ flags state is correct already. Just tell RCU */
-		rcu_irq_exit();
+		if (rcu_exit)
+			rcu_irq_exit();
 	}
 }
 
@@ -617,7 +683,28 @@ static __always_inline void __idtentry_e
  */
 void noinstr idtentry_exit(struct pt_regs *regs)
 {
-	__idtentry_exit(regs, false);
+	__idtentry_exit(regs, false, true);
+}
+
+/**
+ * idtentry_exit_cond_rcu - Handle return from exception with conditional RCU
+ *			    handling
+ * @regs:	Pointer to pt_regs (exception entry regs)
+ * @rcu_exit:	Invoke rcu_irq_exit() if true
+ *
+ * Depending on the return target (kernel/user) this runs the necessary
+ * preemption and work checks if possible and reguired and returns to
+ * the caller with interrupts disabled and no further work pending.
+ *
+ * This is the last action before returning to the low level ASM code which
+ * just needs to return to the appropriate context.
+ *
+ * Counterpart to idtentry_enter_cond_rcu(). The return value of the entry
+ * function must be fed into the @rcu_exit argument.
+ */
+void noinstr idtentry_exit_cond_rcu(struct pt_regs *regs, bool rcu_exit)
+{
+	__idtentry_exit(regs, false, rcu_exit);
 }
 
 #ifdef CONFIG_XEN_PV
@@ -658,11 +745,11 @@ static noinstr void run_on_irqstack(void
 	set_irq_regs(old_regs);
 
 	if (IS_ENABLED(CONFIG_PREEMPTION)) {
-		__idtentry_exit(regs, false);
+		__idtentry_exit(regs, false, true);
 	} else {
 		bool inhcall = __this_cpu_read(xen_in_preemptible_hcall);
 
-		__idtentry_exit(regs, inhcall && need_resched());
+		__idtentry_exit(regs, inhcall && need_resched(), true);
 	}
 }
 #endif /* CONFIG_XEN_PV */
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -10,6 +10,9 @@
 void idtentry_enter(struct pt_regs *regs);
 void idtentry_exit(struct pt_regs *regs);
 
+bool idtentry_enter_cond_rcu(struct pt_regs *regs);
+void idtentry_exit_cond_rcu(struct pt_regs *regs, bool rcu_exit);
+
 /**
  * DECLARE_IDTENTRY - Declare functions for simple IDT entry points
  *		      No error code pushed by hardware


  parent reply	other threads:[~2020-05-05 14:16 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-05 13:53 [patch V4 part 5 00/31] x86/entry: Entry/exception code rework, Thomas Gleixner
2020-05-05 13:53 ` [patch V4 part 5 01/31] genirq: Provide irq_enter/exit_rcu() Thomas Gleixner
2020-05-15  5:53   ` Andy Lutomirski
2020-05-05 13:53 ` [patch V4 part 5 02/31] x86/entry: Provide helpers for execute on irqstack Thomas Gleixner
2020-05-06  8:20   ` Thomas Gleixner
2020-05-10  4:33   ` Lai Jiangshan
2020-05-11  9:07   ` Alexandre Chartre
2020-05-11 11:54     ` Thomas Gleixner
2020-05-05 13:53 ` [patch V4 part 5 03/31] x86/entry/64: Move softirq stack switch to C Thomas Gleixner
2020-05-05 13:53 ` [patch V4 part 5 04/31] x86/entry: Split idtentry_enter/exit() Thomas Gleixner
2020-05-11 12:42   ` Alexandre Chartre
2020-05-05 13:53 ` [patch V4 part 5 05/31] x86/entry: Switch XEN/PV hypercall entry to IDTENTRY Thomas Gleixner
2020-05-07  2:11   ` Boris Ostrovsky
2020-05-07  8:30     ` Thomas Gleixner
2020-05-05 13:53 ` [patch V4 part 5 06/31] x86/entry/64: Simplify idtentry_body Thomas Gleixner
2020-05-05 13:53 ` Thomas Gleixner [this message]
2020-05-11 13:53   ` [patch V4 part 5 07/31] x86/entry: Provide idtentry_entry/exit_cond_rcu() Alexandre Chartre
2020-05-11 14:13     ` Peter Zijlstra
2020-05-12 16:30     ` Thomas Gleixner
2020-05-05 13:53 ` [patch V4 part 5 08/31] x86/entry: Switch page fault exception to IDTENTRY_RAW Thomas Gleixner
2020-05-05 13:53 ` [patch V4 part 5 09/31] x86/entry: Remove the transition leftovers Thomas Gleixner
2020-05-11 14:11   ` Alexandre Chartre
2020-05-05 13:53 ` [patch V4 part 5 10/31] x86/entry: Change exit path of xen_failsafe_callback Thomas Gleixner
2020-05-05 13:53 ` [patch V4 part 5 11/31] x86/entry/64: Remove error_exit Thomas Gleixner
2020-05-05 13:53 ` [patch V4 part 5 12/31] x86/entry/32: Remove common_exception Thomas Gleixner
2020-05-05 13:53 ` [patch V4 part 5 13/31] x86/irq: Convey vector as argument and not in ptregs Thomas Gleixner
2020-05-10  2:44   ` Lai Jiangshan
2020-05-11 14:35     ` Thomas Gleixner
2020-05-11 15:11       ` Lai Jiangshan
2020-05-05 13:53 ` [patch V4 part 5 14/31] x86/irq/64: Provide handle_irq() Thomas Gleixner
2020-05-05 13:53 ` [patch V4 part 5 15/31] x86/entry: Add IRQENTRY_IRQ macro Thomas Gleixner
2020-05-05 13:53 ` [patch V4 part 5 16/31] x86/entry: Use idtentry for interrupts Thomas Gleixner
2020-05-05 13:53 ` [patch V4 part 5 17/31] x86/entry: Provide IDTENTRY_SYSVEC Thomas Gleixner
2020-05-05 13:53 ` [patch V4 part 5 18/31] x86/entry: Convert APIC interrupts to IDTENTRY_SYSVEC Thomas Gleixner
2020-05-05 13:54 ` [patch V4 part 5 19/31] x86/entry: Convert SMP system vectors " Thomas Gleixner
2020-05-05 13:54 ` [patch V4 part 5 20/31] x86/entry: Convert various system vectors Thomas Gleixner
2020-05-05 13:54 ` [patch V4 part 5 21/31] x86/entry: Convert KVM vectors to IDTENTRY_SYSVEC Thomas Gleixner
2020-05-05 13:54 ` [patch V4 part 5 22/31] x86/entry: Convert various hypervisor " Thomas Gleixner
2020-05-06 16:56   ` Wei Liu
2020-05-06 17:11     ` Thomas Gleixner
2020-05-05 13:54 ` [patch V4 part 5 23/31] x86/entry: Convert XEN hypercall vector " Thomas Gleixner
2020-05-05 13:54 ` [patch V4 part 5 24/31] x86/entry: Convert reschedule interrupt to IDTENTRY_RAW Thomas Gleixner
2020-05-05 13:54 ` [patch V4 part 5 25/31] x86/entry: Remove the apic/BUILD interrupt leftovers Thomas Gleixner
2020-05-05 13:54 ` [patch V4 part 5 26/31] x86/entry/64: Remove IRQ stack switching ASM Thomas Gleixner
2020-05-05 13:54 ` [patch V4 part 5 27/31] x86/entry: Make enter_from_user_mode() static Thomas Gleixner
2020-05-05 13:54 ` [patch V4 part 5 28/31] x86/entry/32: Remove redundant irq disable code Thomas Gleixner
2020-05-05 13:54 ` [patch V4 part 5 29/31] x86/entry/64: Remove TRACE_IRQS_*_DEBUG Thomas Gleixner
2020-05-05 13:54 ` [patch V4 part 5 30/31] x86/entry: Move paranoid irq tracing out of ASM code Thomas Gleixner
2020-05-05 13:54 ` [patch V4 part 5 31/31] x86/entry: Remove the TRACE_IRQS cruft Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200505135828.808686575@linutronix.de \
    --to=tglx@linutronix.de \
    --cc=alexandre.chartre@oracle.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=brgerst@gmail.com \
    --cc=frederic@kernel.org \
    --cc=jgross@suse.com \
    --cc=joel@joelfernandes.org \
    --cc=jpoimboe@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mhiramat@kernel.org \
    --cc=paulmck@kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=pmladek@suse.com \
    --cc=rostedt@goodmis.org \
    --cc=sean.j.christopherson@intel.com \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).