linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: LKML <linux-kernel@vger.kernel.org>
Cc: x86@kernel.org, "Paul E. McKenney" <paulmck@kernel.org>,
	Andy Lutomirski <luto@kernel.org>,
	Alexandre Chartre <alexandre.chartre@oracle.com>,
	Frederic Weisbecker <frederic@kernel.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Sean Christopherson <sean.j.christopherson@intel.com>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Petr Mladek <pmladek@suse.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Joel Fernandes <joel@joelfernandes.org>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	Juergen Gross <jgross@suse.com>, Brian Gerst <brgerst@gmail.com>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Josh Poimboeuf <jpoimboe@redhat.com>,
	Will Deacon <will@kernel.org>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	Wei Liu <wei.liu@kernel.org>,
	Michael Kelley <mikelley@microsoft.com>,
	Jason Chen CJ <jason.cj.chen@intel.com>,
	Zhao Yakui <yakui.zhao@intel.com>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>
Subject: [patch V5 12/38] x86/entry: Provide idtentry_entry/exit_cond_rcu()
Date: Tue, 12 May 2020 23:01:11 +0200	[thread overview]
Message-ID: <20200512213810.611305682@linutronix.de> (raw)
In-Reply-To: 20200512210059.056244513@linutronix.de

The pagefault handler cannot use the regular idtentry_enter() because that
invokes rcu_irq_enter() if the pagefault was caused in the kernel. Not a
problem per se, but kernel side page faults can schedule which is not
possible without invoking rcu_irq_exit().

Adding rcu_irq_exit() and a matching rcu_irq_enter() into the actual
pagefault handling code would be possible, but not pretty either.

Provide idtentry_entry/exit_cond_rcu() which calls rcu_irq_enter() only
when RCU is not watching. The conditional RCU enabling is a correctness
issue: A kernel page fault which hits a RCU idle reason can neither
schedule nor is it likely to survive. But avoiding RCU warnings or RCU side
effects is at least increasing the chance for useful debug output.

The function is also useful for implementing lightweight reschedule IPI and
KVM posted interrupt IPI entry handling later.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V5: Add comments about the rcu_exit conditional and make changelog readable.
---
 arch/x86/entry/common.c         |  142 +++++++++++++++++++++++++++++++++++-----
 arch/x86/include/asm/idtentry.h |    3 
 2 files changed, 128 insertions(+), 17 deletions(-)

--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -515,6 +515,36 @@ SYSCALL_DEFINE0(ni_syscall)
 	return -ENOSYS;
 }
 
+static __always_inline bool __idtentry_enter(struct pt_regs *regs,
+					     bool cond_rcu)
+{
+	if (user_mode(regs)) {
+		enter_from_user_mode();
+	} else {
+		if (!cond_rcu || !__rcu_is_watching()) {
+			/*
+			 * If RCU is not watching then the same careful
+			 * sequence vs. lockdep and tracing is required.
+			 */
+			lockdep_hardirqs_off(CALLER_ADDR0);
+			rcu_irq_enter();
+			instrumentation_begin();
+			trace_hardirqs_off_prepare();
+			instrumentation_end();
+			return true;
+		} else {
+			/*
+			 * If RCU is watching then the combo function
+			 * can be used.
+			 */
+			instrumentation_begin();
+			trace_hardirqs_off();
+			instrumentation_end();
+		}
+	}
+	return false;
+}
+
 /**
  * idtentry_enter - Handle state tracking on idtentry
  * @regs:	Pointer to pt_regs of interrupted context
@@ -532,19 +562,60 @@ SYSCALL_DEFINE0(ni_syscall)
  */
 void noinstr idtentry_enter(struct pt_regs *regs)
 {
-	if (user_mode(regs)) {
-		enter_from_user_mode();
-	} else {
-		lockdep_hardirqs_off(CALLER_ADDR0);
-		rcu_irq_enter();
-		instrumentation_begin();
-		trace_hardirqs_off_prepare();
-		instrumentation_end();
-	}
+	__idtentry_enter(regs, false);
+}
+
+/**
+ * idtentry_enter_cond_rcu - Handle state tracking on idtentry with conditional
+ *			     RCU handling
+ * @regs:	Pointer to pt_regs of interrupted context
+ *
+ * Invokes:
+ *  - lockdep irqflag state tracking as low level ASM entry disabled
+ *    interrupts.
+ *
+ *  - Context tracking if the exception hit user mode.
+ *
+ *  - The hardirq tracer to keep the state consistent as low level ASM
+ *    entry disabled interrupts.
+ *
+ * For kernel mode entries the conditional RCU handling is useful for two
+ * purposes
+ *
+ * 1) Pagefaults: Kernel code can fault and sleep, e.g. on exec. This code
+ *    is not in an RCU idle section. If rcu_irq_enter() would be invoked
+ *    then nothing would invoke rcu_irq_exit() before scheduling.
+ *
+ *   If the kernel faults in a RCU idle section then all bets are off
+ *   anyway but at least avoiding a subsequent issue vs. RCU is helpful for
+ *   debugging.
+ *
+ * 2) Scheduler IPI: To avoid the overhead of a regular idtentry vs. RCU
+ *    and irq_enter() the IPI can be made lightweight if the tracepoints
+ *    are not enabled. While the IPI functionality itself does not require
+ *    RCU (folding preempt count) it still calls out into instrumentable
+ *    functions, e.g. ack_APIC_irq(). The scheduler IPI can hit RCU idle
+ *    sections, so RCU needs to be adjusted. For the fast path case, e.g.
+ *    KVM kicking a vCPU out of guest mode this can be avoided because the
+ *    IPI is handled after KVM reestablished kernel context including RCU.
+ *
+ * For user mode entries enter_from_user_mode() must be invoked to
+ * establish the proper context for NOHZ_FULL. Otherwise scheduling on exit
+ * would not be possible.
+ *
+ * Returns: True if RCU has been adjusted on a kernel entry
+ *	    False otherwise
+ *
+ * The return value must be fed into the rcu_exit argument of
+ * idtentry_exit_cond_rcu().
+ */
+bool noinstr idtentry_enter_cond_rcu(struct pt_regs *regs)
+{
+	return __idtentry_enter(regs, true);
 }
 
 static __always_inline void __idtentry_exit(struct pt_regs *regs,
-					    bool preempt_hcall)
+					    bool preempt_hcall, bool rcu_exit)
 {
 	lockdep_assert_irqs_disabled();
 
@@ -570,7 +641,12 @@ static __always_inline void __idtentry_e
 				if (IS_ENABLED(CONFIG_DEBUG_ENTRY))
 					WARN_ON_ONCE(!on_thread_stack());
 				instrumentation_begin();
-				rcu_irq_exit_preempt();
+				/*
+				 * Conditional for idtentry_exit_cond_rcu(),
+				 * unconditional for all other users.
+				 */
+				if (rcu_exit)
+					rcu_irq_exit_preempt();
 				if (need_resched())
 					preempt_schedule_irq();
 				/* Covers both tracing and lockdep */
@@ -602,11 +678,22 @@ static __always_inline void __idtentry_e
 		trace_hardirqs_on_prepare();
 		lockdep_hardirqs_on_prepare(CALLER_ADDR0);
 		instrumentation_end();
-		rcu_irq_exit();
+		/*
+		 * Conditional for idtentry_exit_cond_rcu(), unconditional
+		 * for all other users.
+		 */
+		if (rcu_exit)
+			rcu_irq_exit();
 		lockdep_hardirqs_on(CALLER_ADDR0);
 	} else {
-		/* IRQ flags state is correct already. Just tell RCU */
-		rcu_irq_exit();
+		/*
+		 * IRQ flags state is correct already. Just tell RCU.
+		 *
+		 * Conditional for idtentry_exit_cond_rcu(), unconditional
+		 * for all other users.
+		 */
+		if (rcu_exit)
+			rcu_irq_exit();
 	}
 }
 
@@ -627,7 +714,28 @@ static __always_inline void __idtentry_e
  */
 void noinstr idtentry_exit(struct pt_regs *regs)
 {
-	__idtentry_exit(regs, false);
+	__idtentry_exit(regs, false, true);
+}
+
+/**
+ * idtentry_exit_cond_rcu - Handle return from exception with conditional RCU
+ *			    handling
+ * @regs:	Pointer to pt_regs (exception entry regs)
+ * @rcu_exit:	Invoke rcu_irq_exit() if true
+ *
+ * Depending on the return target (kernel/user) this runs the necessary
+ * preemption and work checks if possible and reguired and returns to
+ * the caller with interrupts disabled and no further work pending.
+ *
+ * This is the last action before returning to the low level ASM code which
+ * just needs to return to the appropriate context.
+ *
+ * Counterpart to idtentry_enter_cond_rcu(). The return value of the entry
+ * function must be fed into the @rcu_exit argument.
+ */
+void noinstr idtentry_exit_cond_rcu(struct pt_regs *regs, bool rcu_exit)
+{
+	__idtentry_exit(regs, false, rcu_exit);
 }
 
 #ifdef CONFIG_XEN_PV
@@ -659,11 +767,11 @@ static void __xen_pv_evtchn_do_upcall(vo
 	set_irq_regs(old_regs);
 
 	if (IS_ENABLED(CONFIG_PREEMPTION)) {
-		__idtentry_exit(regs, false);
+		__idtentry_exit(regs, false, true);
 	} else {
 		bool inhcall = __this_cpu_read(xen_in_preemptible_hcall);
 
-		__idtentry_exit(regs, inhcall && need_resched());
+		__idtentry_exit(regs, inhcall && need_resched(), true);
 	}
 }
 #endif /* CONFIG_XEN_PV */
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -10,6 +10,9 @@
 void idtentry_enter(struct pt_regs *regs);
 void idtentry_exit(struct pt_regs *regs);
 
+bool idtentry_enter_cond_rcu(struct pt_regs *regs);
+void idtentry_exit_cond_rcu(struct pt_regs *regs, bool rcu_exit);
+
 /**
  * DECLARE_IDTENTRY - Declare functions for simple IDT entry points
  *		      No error code pushed by hardware


  parent reply	other threads:[~2020-05-12 22:25 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-12 21:00 [patch V5 00/38] x86/entry: Entry/exception code rework - the leftovers Thomas Gleixner
2020-05-12 21:01 ` [patch V5 01/38] x86/kvm/svm: Use uninstrumented wrmsrl() to restore GS Thomas Gleixner
2020-05-13  7:11   ` Jürgen Groß
2020-05-12 21:01 ` [patch V5 02/38] x86/entry/64: Use native swapgs in asm_native_load_gs_index() Thomas Gleixner
2020-05-13  2:02   ` Steven Rostedt
2020-05-13  6:34     ` Thomas Gleixner
2020-05-13  7:12   ` Jürgen Groß
2020-05-19 19:58   ` [tip: x86/entry] x86/entry/64: Use native swapgs in asm_load_gs_index() tip-bot2 for Thomas Gleixner
2020-05-12 21:01 ` [patch V5 03/38] nmi, tracing: Provide nmi_enter/exit_notrace() Thomas Gleixner
2020-05-15  1:32   ` Steven Rostedt
2020-05-15  1:35     ` Steven Rostedt
2020-05-15  1:37       ` Steven Rostedt
2020-05-12 21:01 ` [patch V5 04/38] x86: Make hardware latency tracing explicit Thomas Gleixner
2020-05-15  1:43   ` Steven Rostedt
2020-05-15 15:08     ` Thomas Gleixner
2020-05-12 21:01 ` [patch V5 05/38] genirq: Provide irq_enter/exit_rcu() Thomas Gleixner
2020-05-12 21:01 ` [patch V5 06/38] x86/entry: Provide helpers for execute on irqstack Thomas Gleixner
2020-05-13 21:43   ` Josh Poimboeuf
2020-05-12 21:01 ` [patch V5 07/38] x86/entry/64: Move do_softirq_own_stack() to C Thomas Gleixner
2020-05-12 21:01 ` [patch V5 08/38] x86/entry: Split idtentry_enter/exit() Thomas Gleixner
2020-05-12 21:01 ` [patch V5 09/38] x86/entry: Switch XEN/PV hypercall entry to IDTENTRY Thomas Gleixner
2020-05-14 16:24   ` Boris Ostrovsky
2020-05-12 21:01 ` [patch V5 10/38] x86/entry/64: Simplify idtentry_body Thomas Gleixner
2020-05-12 21:01 ` [patch V5 11/38] rcu: Provide __rcu_is_watching() Thomas Gleixner
2020-05-19 19:52   ` [tip: core/rcu] " tip-bot2 for Thomas Gleixner
2020-05-12 21:01 ` Thomas Gleixner [this message]
2020-05-12 21:01 ` [patch V5 13/38] x86/entry: Switch page fault exception to IDTENTRY_RAW Thomas Gleixner
2020-05-12 21:01 ` [patch V5 14/38] x86/entry: Remove the transition leftovers Thomas Gleixner
2020-05-12 21:01 ` [patch V5 15/38] x86/entry: Change exit path of xen_failsafe_callback Thomas Gleixner
2020-05-12 21:01 ` [patch V5 16/38] x86/entry/64: Remove error_exit Thomas Gleixner
2020-05-12 21:01 ` [patch V5 17/38] x86/entry/32: Remove common_exception Thomas Gleixner
2020-05-12 21:01 ` [patch V5 18/38] x86/irq: Use generic irq_regs implementation Thomas Gleixner
2020-05-12 21:01 ` [patch V5 19/38] x86/irq: Convey vector as argument and not in ptregs Thomas Gleixner
2020-05-12 21:01 ` [patch V5 20/38] x86/irq/64: Provide handle_irq() Thomas Gleixner
2020-05-12 21:01 ` [patch V5 21/38] x86/entry: Add IRQENTRY_IRQ macro Thomas Gleixner
2020-05-12 21:01 ` [patch V5 22/38] x86/entry: Use idtentry for interrupts Thomas Gleixner
2020-05-12 21:01 ` [patch V5 23/38] genirq: Provde __irq_enter/exit_raw() Thomas Gleixner
2020-05-12 21:01 ` [patch V5 24/38] x86/entry: Provide IDTENTRY_SYSVEC Thomas Gleixner
2020-05-15  0:16   ` Boris Ostrovsky
2020-05-15  8:52     ` Thomas Gleixner
2020-05-12 21:01 ` [patch V5 25/38] x86/entry: Convert APIC interrupts to IDTENTRY_SYSVEC Thomas Gleixner
2020-05-12 21:01 ` [patch V5 26/38] x86/entry: Convert SMP system vectors " Thomas Gleixner
2020-05-12 21:01 ` [patch V5 27/38] x86/entry: Convert various system vectors Thomas Gleixner
2020-05-12 21:01 ` [patch V5 28/38] x86/entry: Convert KVM vectors to IDTENTRY_SYSVEC Thomas Gleixner
2020-05-12 21:01 ` [patch V5 29/38] x86/entry: Convert various hypervisor " Thomas Gleixner
2020-05-12 21:01 ` [patch V5 30/38] x86/entry: Convert XEN hypercall vector " Thomas Gleixner
2020-05-12 21:01 ` [patch V5 31/38] x86/entry: Convert reschedule interrupt to IDTENTRY_SYSVEC_SIMPLE Thomas Gleixner
2020-05-12 21:01 ` [patch V5 32/38] x86/entry: Remove the apic/BUILD interrupt leftovers Thomas Gleixner
2020-05-12 21:01 ` [patch V5 33/38] x86/entry/64: Remove IRQ stack switching ASM Thomas Gleixner
2020-05-12 21:01 ` [patch V5 34/38] x86/entry: Make enter_from_user_mode() static Thomas Gleixner
2020-05-12 21:01 ` [patch V5 35/38] x86/entry/32: Remove redundant irq disable code Thomas Gleixner
2020-05-12 21:01 ` [patch V5 36/38] x86/entry/64: Remove TRACE_IRQS_*_DEBUG Thomas Gleixner
2020-05-12 21:01 ` [patch V5 37/38] x86/entry: Move paranoid irq tracing out of ASM code Thomas Gleixner
2020-05-12 21:01 ` [patch V5 38/38] x86/entry: Remove the TRACE_IRQS cruft Thomas Gleixner
2020-05-14 16:35 ` [patch V5 00/38] x86/entry: Entry/exception code rework - the leftovers Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200512213810.611305682@linutronix.de \
    --to=tglx@linutronix.de \
    --cc=alexandre.chartre@oracle.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=brgerst@gmail.com \
    --cc=frederic@kernel.org \
    --cc=jason.cj.chen@intel.com \
    --cc=jgross@suse.com \
    --cc=joel@joelfernandes.org \
    --cc=jpoimboe@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mhiramat@kernel.org \
    --cc=mikelley@microsoft.com \
    --cc=paulmck@kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pmladek@suse.com \
    --cc=rostedt@goodmis.org \
    --cc=sean.j.christopherson@intel.com \
    --cc=thomas.lendacky@amd.com \
    --cc=wei.liu@kernel.org \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    --cc=yakui.zhao@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).