* [PATCH tip/core/rcu 0/19] RCU flavor-consolidation changes for v4.20/v5.0
@ 2018-08-29 22:20 Paul E. McKenney
  2018-08-29 22:20 ` [PATCH tip/core/rcu 01/19] rcu: Refactor rcu_{nmi,irq}_{enter,exit}() Paul E. McKenney
                   ` (19 more replies)
  0 siblings, 20 replies; 49+ messages in thread
From: Paul E. McKenney @ 2018-08-29 22:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg, joel

Hello!

This series contains the RCU flavor consolidation, along with some initial
cleanup work enabled by that consolidation (and some that became apparent
while cleaning up):

1.	Refactor rcu_{nmi,irq}_{enter,exit}(), saving a branch on the
	idle entry/exit hotpaths, courtesy of Byungchul Park.

2.	Defer reporting RCU-preempt quiescent states when disabled.
	This is the key commit that consolidates the RCU-bh and RCU-sched
	flavors into RCU; however, the RCU-bh and RCU-sched flavors
	still exist independently at this point.

3.	Test extended "rcu" read-side critical sections.  This commit
	causes rcutorture to test RCU's new-found ability to act as
	the combination of RCU, RCU-bh, and RCU-sched.

4.	Allow processing deferred QSes for exiting RCU-preempt readers.
	This is an optimization.

5.	Remove now-unused ->b.exp_need_qs field from the rcu_special union.

6.	Add warning to detect half-interrupts.  Test the claim that
	the Linux kernel no longer does half-interrupts.

7.	Apply RCU-bh QSes to RCU-sched and RCU-preempt when safe, that is,
	make the consolidated RCU inherit RCU-bh's denial-of-service
	avoidance mechanism.

8.	Report expedited grace periods at context-switch time.  This is
	an optimization enabled by the RCU flavor consolidation.

9.	Define RCU-bh update API in terms of RCU.  This commit gets rid
	of the RCU-bh update mechanism.

10.	Update comments and help text to account for the removal of
	the RCU-bh update mechanism.

11.	Drop "wake" parameter from rcu_report_exp_rdp().

12.	Fix typo in rcu_get_gp_kthreads_prio() header comment.

13.	Define RCU-sched API in terms of RCU for Tree RCU PREEMPT builds.
	Although this commit gets rid of the RCU-sched update mechanism
	from PREEMPT builds, it of course remains as the sole RCU flavor
	for !PREEMPT && !SMP builds.

14.	Express Tiny RCU updates in terms of RCU rather than RCU-sched.
	This will enable additional cleanups and code savings.

15.	Remove RCU_STATE_INITIALIZER() in favor of just using an open-coded
	initializer for the sole remaining rcu_state structure.

16.	Eliminate rcu_state structure's ->call field, as it is now always
	just call_rcu().

17.	Remove rcu_state structure's ->rda field, as there is now only one
	set of per-CPU rcu_data structures.

							Thanx, Paul

------------------------------------------------------------------------

 Documentation/RCU/Design/Requirements/Requirements.html |   50 -
 include/linux/rcupdate.h                                |   48 -
 include/linux/rcupdate_wait.h                           |    6 
 include/linux/rcutiny.h                                 |   61 +
 include/linux/rcutree.h                                 |   31 
 include/linux/sched.h                                   |    6 
 kernel/rcu/Kconfig                                      |   10 
 kernel/rcu/rcutorture.c                                 |    1 
 kernel/rcu/tiny.c                                       |  163 +--
 kernel/rcu/tree.c                                       |  655 +++++-----------
 kernel/rcu/tree.h                                       |   44 -
 kernel/rcu/tree_exp.h                                   |  256 +++---
 kernel/rcu/tree_plugin.h                                |  523 ++++++------
 kernel/rcu/update.c                                     |    2 
 kernel/softirq.c                                        |    3 
 15 files changed, 836 insertions(+), 1023 deletions(-)



* [PATCH tip/core/rcu 01/19] rcu: Refactor rcu_{nmi,irq}_{enter,exit}()
  2018-08-29 22:20 [PATCH tip/core/rcu 0/19] RCU flavor-consolidation changes for v4.20/v5.0 Paul E. McKenney
@ 2018-08-29 22:20 ` Paul E. McKenney
  2018-08-30 18:10   ` Steven Rostedt
  2018-08-29 22:20 ` [PATCH tip/core/rcu 02/19] rcu: Defer reporting RCU-preempt quiescent states when disabled Paul E. McKenney
                   ` (18 subsequent siblings)
  19 siblings, 1 reply; 49+ messages in thread
From: Paul E. McKenney @ 2018-08-29 22:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg, joel,
	Byungchul Park, Paul E . McKenney

From: Byungchul Park <byungchul.park@lge.com>

When entering or exiting irq or NMI handlers, the current code uses
->dynticks_nmi_nesting to detect if it is in the outermost handler,
that is, the one interrupting or returning to an RCU-idle context (the
idle loop or nohz_full usermode execution).  When entering the outermost
handler via an interrupt (as opposed to NMI), it is necessary to invoke
rcu_dynticks_task_exit() just before the CPU is marked non-idle from an
RCU perspective and to invoke rcu_cleanup_after_idle() just after the
CPU is marked non-idle.  Similarly, when exiting the outermost handler
via an interrupt, it is necessary to invoke rcu_prepare_for_idle() just
before marking the CPU idle and to invoke rcu_dynticks_task_enter()
just after marking the CPU idle.

The decision to execute these four functions is currently taken in
rcu_irq_enter() and rcu_irq_exit() as follows:

   rcu_irq_enter()
      /* A conditional branch with ->dynticks_nmi_nesting */
      rcu_nmi_enter()
         /* A conditional branch with ->dynticks */
      /* A conditional branch with ->dynticks_nmi_nesting */

   rcu_irq_exit()
      /* A conditional branch with ->dynticks_nmi_nesting */
      rcu_nmi_exit()
         /* A conditional branch with ->dynticks_nmi_nesting */
      /* A conditional branch with ->dynticks_nmi_nesting */

   rcu_nmi_enter()
      /* A conditional branch with ->dynticks */

   rcu_nmi_exit()
      /* A conditional branch with ->dynticks_nmi_nesting */

This works, but the conditional branches in rcu_irq_enter() and
rcu_irq_exit() are redundant with those in rcu_nmi_enter() and
rcu_nmi_exit(), respectively.  Redundant branches are not something
we want in the to/from-idle fastpaths, so this commit refactors
rcu_{nmi,irq}_{enter,exit}() to use a common inlined function that is
passed a constant argument, as follows:

   rcu_irq_enter() inlining rcu_nmi_enter_common(irq=true)
      /* A conditional branch with ->dynticks */

   rcu_irq_exit() inlining rcu_nmi_exit_common(irq=true)
      /* A conditional branch with ->dynticks_nmi_nesting */

   rcu_nmi_enter() inlining rcu_nmi_enter_common(irq=false)
      /* A conditional branch with ->dynticks */

   rcu_nmi_exit() inlining rcu_nmi_exit_common(irq=false)
      /* A conditional branch with ->dynticks_nmi_nesting */

The combination of the constant function argument and the inlining allows
the compiler to discard the conditionals that previously controlled
execution of rcu_dynticks_task_exit(), rcu_cleanup_after_idle(),
rcu_prepare_for_idle(), and rcu_dynticks_task_enter().  This reduces both
the to-idle and from-idle path lengths by two conditional branches each,
and improves readability as well.
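
As a rough standalone sketch of the technique (made-up names and plain
userspace C, not the actual tree.c code), the constant argument lets the
compiler drop the irq-only work entirely when inlining the irq==false
callers:

	#include <stdio.h>

	/* Hedged illustration of constant-argument inlining; made-up names. */
	static inline __attribute__((__always_inline__)) void exit_common(int irq)
	{
		if (irq)
			printf("irq-only pre-work\n");	/* compiler can drop this when irq is a constant 0 */
		printf("shared eqs-enter work\n");
		if (irq)
			printf("irq-only post-work\n");	/* likewise dropped */
	}

	void nmi_exit_example(void) { exit_common(0); }	/* just the shared work */
	void irq_exit_example(void) { exit_common(1); }	/* keeps the irq-only calls */

	int main(void)
	{
		nmi_exit_example();
		irq_exit_example();
		return 0;
	}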

This commit also changes the order of execution from this:

	rcu_dynticks_task_exit();
	rcu_dynticks_eqs_exit();
	trace_rcu_dyntick();
	rcu_cleanup_after_idle();

To this:

	rcu_dynticks_task_exit();
	rcu_dynticks_eqs_exit();
	rcu_cleanup_after_idle();
	trace_rcu_dyntick();

In other words, the calls to rcu_cleanup_after_idle() and trace_rcu_dyntick()
are reversed.  This has no functional effect because the real
concern is whether a given call is before or after the call to
rcu_dynticks_eqs_exit(), and this patch does not change that.  Before the
call to rcu_dynticks_eqs_exit(), RCU is not yet watching the current
CPU and after that call RCU is watching.

A similar switch in calling order happens on the idle-entry path, with
similar lack of effect for the same reasons.

Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 61 +++++++++++++++++++++++++++++++----------------
 1 file changed, 41 insertions(+), 20 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0b760c1369f7..0adf77923e8b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -771,17 +771,18 @@ void rcu_user_enter(void)
 #endif /* CONFIG_NO_HZ_FULL */
 
 /**
- * rcu_nmi_exit - inform RCU of exit from NMI context
+ * rcu_nmi_exit_common - inform RCU of exit from NMI context
+ * @irq: Is this call from rcu_irq_exit?
  *
  * If we are returning from the outermost NMI handler that interrupted an
  * RCU-idle period, update rdtp->dynticks and rdtp->dynticks_nmi_nesting
  * to let the RCU grace-period handling know that the CPU is back to
  * being RCU-idle.
  *
- * If you add or remove a call to rcu_nmi_exit(), be sure to test
+ * If you add or remove a call to rcu_nmi_exit_common(), be sure to test
  * with CONFIG_RCU_EQS_DEBUG=y.
  */
-void rcu_nmi_exit(void)
+static __always_inline void rcu_nmi_exit_common(bool irq)
 {
 	struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
 
@@ -807,7 +808,22 @@ void rcu_nmi_exit(void)
 	/* This NMI interrupted an RCU-idle CPU, restore RCU-idleness. */
 	trace_rcu_dyntick(TPS("Startirq"), rdtp->dynticks_nmi_nesting, 0, rdtp->dynticks);
 	WRITE_ONCE(rdtp->dynticks_nmi_nesting, 0); /* Avoid store tearing. */
+
+	if (irq)
+		rcu_prepare_for_idle();
+
 	rcu_dynticks_eqs_enter();
+
+	if (irq)
+		rcu_dynticks_task_enter();
+}
+
+/**
+ * rcu_nmi_exit - inform RCU of exit from NMI context
+ */
+void rcu_nmi_exit(void)
+{
+	rcu_nmi_exit_common(false);
 }
 
 /**
@@ -831,14 +847,8 @@ void rcu_nmi_exit(void)
  */
 void rcu_irq_exit(void)
 {
-	struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
-
 	lockdep_assert_irqs_disabled();
-	if (rdtp->dynticks_nmi_nesting == 1)
-		rcu_prepare_for_idle();
-	rcu_nmi_exit();
-	if (rdtp->dynticks_nmi_nesting == 0)
-		rcu_dynticks_task_enter();
+	rcu_nmi_exit_common(true);
 }
 
 /*
@@ -921,7 +931,8 @@ void rcu_user_exit(void)
 #endif /* CONFIG_NO_HZ_FULL */
 
 /**
- * rcu_nmi_enter - inform RCU of entry to NMI context
+ * rcu_nmi_enter_common - inform RCU of entry to NMI context
+ * @irq: Is this call from rcu_irq_enter?
  *
  * If the CPU was idle from RCU's viewpoint, update rdtp->dynticks and
  * rdtp->dynticks_nmi_nesting to let the RCU grace-period handling know
@@ -929,10 +940,10 @@ void rcu_user_exit(void)
  * long as the nesting level does not overflow an int.  (You will probably
  * run out of stack space first.)
  *
- * If you add or remove a call to rcu_nmi_enter(), be sure to test
+ * If you add or remove a call to rcu_nmi_enter_common(), be sure to test
  * with CONFIG_RCU_EQS_DEBUG=y.
  */
-void rcu_nmi_enter(void)
+static __always_inline void rcu_nmi_enter_common(bool irq)
 {
 	struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
 	long incby = 2;
@@ -949,7 +960,15 @@ void rcu_nmi_enter(void)
 	 * period (observation due to Andy Lutomirski).
 	 */
 	if (rcu_dynticks_curr_cpu_in_eqs()) {
+
+		if (irq)
+			rcu_dynticks_task_exit();
+
 		rcu_dynticks_eqs_exit();
+
+		if (irq)
+			rcu_cleanup_after_idle();
+
 		incby = 1;
 	}
 	trace_rcu_dyntick(incby == 1 ? TPS("Endirq") : TPS("++="),
@@ -960,6 +979,14 @@ void rcu_nmi_enter(void)
 	barrier();
 }
 
+/**
+ * rcu_nmi_enter - inform RCU of entry to NMI context
+ */
+void rcu_nmi_enter(void)
+{
+	rcu_nmi_enter_common(false);
+}
+
 /**
  * rcu_irq_enter - inform RCU that current CPU is entering irq away from idle
  *
@@ -984,14 +1011,8 @@ void rcu_nmi_enter(void)
  */
 void rcu_irq_enter(void)
 {
-	struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
-
 	lockdep_assert_irqs_disabled();
-	if (rdtp->dynticks_nmi_nesting == 0)
-		rcu_dynticks_task_exit();
-	rcu_nmi_enter();
-	if (rdtp->dynticks_nmi_nesting == 1)
-		rcu_cleanup_after_idle();
+	rcu_nmi_enter_common(true);
 }
 
 /*
-- 
2.17.1



* [PATCH tip/core/rcu 02/19] rcu: Defer reporting RCU-preempt quiescent states when disabled
  2018-08-29 22:20 [PATCH tip/core/rcu 0/19] RCU flavor-consolidation changes for v4.20/v5.0 Paul E. McKenney
  2018-08-29 22:20 ` [PATCH tip/core/rcu 01/19] rcu: Refactor rcu_{nmi,irq}_{enter,exit}() Paul E. McKenney
@ 2018-08-29 22:20 ` Paul E. McKenney
  2018-10-29 11:24   ` Ran Rozenstein
  2018-08-29 22:20 ` [PATCH tip/core/rcu 03/19] rcutorture: Test extended "rcu" read-side critical sections Paul E. McKenney
                   ` (17 subsequent siblings)
  19 siblings, 1 reply; 49+ messages in thread
From: Paul E. McKenney @ 2018-08-29 22:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg, joel,
	Paul E. McKenney

This commit defers reporting of RCU-preempt quiescent states at
rcu_read_unlock_special() time when any of interrupts, softirq, or
preemption are disabled.  These deferred quiescent states are reported
at a later RCU_SOFTIRQ, context switch, idle entry, or CPU-hotplug
offline operation.  Of course, if another RCU read-side critical
section has started in the meantime, the reporting of the quiescent
state will be further deferred.

This also means that disabling preemption, interrupts, and/or
softirqs will act as an RCU-preempt read-side critical section.
This is enforced by checking preempt_count() as needed.
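
Roughly speaking (a sketch with stand-in masks, not the exact hunks
below, which use PREEMPT_MASK, SOFTIRQ_MASK, and irqs_disabled_flags()),
the immediate-versus-deferred decision looks like this:

	/* Sketch only: stand-in mask values and a caller-supplied count. */
	#define EX_PREEMPT_MASK	0x000000ffu
	#define EX_SOFTIRQ_MASK	0x0000ff00u

	static int can_report_qs_now(unsigned int pcount, int irqs_disabled)
	{
		/* Disabled preemption, softirq, or irqs all act as an
		 * RCU-preempt reader, so any of them forces deferral. */
		if ((pcount & (EX_PREEMPT_MASK | EX_SOFTIRQ_MASK)) || irqs_disabled)
			return 0;	/* defer to softirq/context switch/idle entry */
		return 1;		/* safe to report the quiescent state now */
	}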

Some special cases must be handled on an ad-hoc basis; for example,
context switch is a quiescent state even though both the scheduler and
do_exit() disable preemption.  In these cases, additional calls to
rcu_preempt_deferred_qs() override the preemption disabling.  Similar
logic overrides disabled interrupts in rcu_preempt_check_callbacks()
because in this case the quiescent state happened just before the
corresponding scheduling-clock interrupt.

In theory, this change lifts a long-standing restriction: if interrupts
were disabled across a call to rcu_read_unlock(), the matching
rcu_read_lock() also had to be contained within that
interrupts-disabled region of code.  Because the reporting of the
corresponding RCU-preempt quiescent state is now deferred until
after interrupts have been enabled, it is no longer possible for this
situation to result in deadlocks involving the scheduler's runqueue and
priority-inheritance locks.  This may allow some code simplification that
might reduce interrupt latency a bit.  Unfortunately, in practice this
would also defer deboosting a low-priority task that had been subjected
to RCU priority boosting, so real-time-response considerations might
well force this restriction to remain in place.
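
For concreteness, the usage pattern in question is of the following
general form (illustration only, not code from this patch):

	rcu_read_lock();
	/* ... */
	local_irq_save(flags);
	/* ... */
	rcu_read_unlock();	/* QS report deferred until irqs are re-enabled */
	local_irq_restore(flags);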

Because RCU-preempt grace periods are now blocked not only by RCU
read-side critical sections, but also by disabling of interrupts,
preemption, and softirqs, it will be possible to eliminate RCU-bh and
RCU-sched in favor of RCU-preempt in CONFIG_PREEMPT=y kernels.  This may
require some additional plumbing to provide the network denial-of-service
guarantees that have been traditionally provided by RCU-bh.  Once these
are in place, CONFIG_PREEMPT=n kernels will be able to fold RCU-bh
into RCU-sched.  This would mean that all kernels would have but
one flavor of RCU, which would open the door to significant code
cleanup.

Moving to a single flavor of RCU would also have the beneficial effect
of reducing the NOCB kthreads by at least a factor of two.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Apply rcu_read_unlock_special() preempt_count() feedback
  from Joel Fernandes. ]
[ paulmck: Adjust rcu_eqs_enter() call to rcu_preempt_deferred_qs() in
  response to bug reports from kbuild test robot. ]
[ paulmck: Fix bug located by kbuild test robot involving recursion
  via rcu_preempt_deferred_qs(). ]
---
 .../RCU/Design/Requirements/Requirements.html |  50 +++---
 include/linux/rcutiny.h                       |   5 +
 kernel/rcu/tree.c                             |   9 ++
 kernel/rcu/tree.h                             |   3 +
 kernel/rcu/tree_exp.h                         |  71 +++++++--
 kernel/rcu/tree_plugin.h                      | 144 +++++++++++++-----
 6 files changed, 205 insertions(+), 77 deletions(-)

diff --git a/Documentation/RCU/Design/Requirements/Requirements.html b/Documentation/RCU/Design/Requirements/Requirements.html
index 49690228b1c6..038714475edb 100644
--- a/Documentation/RCU/Design/Requirements/Requirements.html
+++ b/Documentation/RCU/Design/Requirements/Requirements.html
@@ -2394,30 +2394,9 @@ when invoked from a CPU-hotplug notifier.
 <p>
 RCU depends on the scheduler, and the scheduler uses RCU to
 protect some of its data structures.
-This means the scheduler is forbidden from acquiring
-the runqueue locks and the priority-inheritance locks
-in the middle of an outermost RCU read-side critical section unless either
-(1)&nbsp;it releases them before exiting that same
-RCU read-side critical section, or
-(2)&nbsp;interrupts are disabled across
-that entire RCU read-side critical section.
-This same prohibition also applies (recursively!) to any lock that is acquired
-while holding any lock to which this prohibition applies.
-Adhering to this rule prevents preemptible RCU from invoking
-<tt>rcu_read_unlock_special()</tt> while either runqueue or
-priority-inheritance locks are held, thus avoiding deadlock.
-
-<p>
-Prior to v4.4, it was only necessary to disable preemption across
-RCU read-side critical sections that acquired scheduler locks.
-In v4.4, expedited grace periods started using IPIs, and these
-IPIs could force a <tt>rcu_read_unlock()</tt> to take the slowpath.
-Therefore, this expedited-grace-period change required disabling of
-interrupts, not just preemption.
-
-<p>
-For RCU's part, the preemptible-RCU <tt>rcu_read_unlock()</tt>
-implementation must be written carefully to avoid similar deadlocks.
+The preemptible-RCU <tt>rcu_read_unlock()</tt>
+implementation must therefore be written carefully to avoid deadlocks
+involving the scheduler's runqueue and priority-inheritance locks.
 In particular, <tt>rcu_read_unlock()</tt> must tolerate an
 interrupt where the interrupt handler invokes both
 <tt>rcu_read_lock()</tt> and <tt>rcu_read_unlock()</tt>.
@@ -2426,7 +2405,7 @@ negative nesting levels to avoid destructive recursion via
 interrupt handler's use of RCU.
 
 <p>
-This pair of mutual scheduler-RCU requirements came as a
+This scheduler-RCU requirement came as a
 <a href="https://lwn.net/Articles/453002/">complete surprise</a>.
 
 <p>
@@ -2437,9 +2416,28 @@ when running context-switch-heavy workloads when built with
 <tt>CONFIG_NO_HZ_FULL=y</tt>
 <a href="http://www.rdrop.com/users/paulmck/scalability/paper/BareMetal.2015.01.15b.pdf">did come as a surprise [PDF]</a>.
 RCU has made good progress towards meeting this requirement, even
-for context-switch-have <tt>CONFIG_NO_HZ_FULL=y</tt> workloads,
+for context-switch-heavy <tt>CONFIG_NO_HZ_FULL=y</tt> workloads,
 but there is room for further improvement.
 
+<p>
+In the past, it was forbidden to disable interrupts across an
+<tt>rcu_read_unlock()</tt> unless that interrupt-disabled region
+of code also included the matching <tt>rcu_read_lock()</tt>.
+Violating this restriction could result in deadlocks involving the
+scheduler's runqueue and priority-inheritance spinlocks.
+This restriction was lifted when interrupt-disabled calls to
+<tt>rcu_read_unlock()</tt> started deferring the reporting of
+the resulting RCU-preempt quiescent state until the end of that
+interrupts-disabled region.
+This deferred reporting means that the scheduler's runqueue and
+priority-inheritance locks cannot be held while reporting an RCU-preempt
+quiescent state, which lifts the earlier restriction, at least from
+a deadlock perspective.
+Unfortunately, real-time systems using RCU priority boosting may
+need this restriction to remain in effect because deferred
+quiescent-state reporting also defers deboosting, which in turn
+degrades real-time latencies.
+
 <h3><a name="Tracing and RCU">Tracing and RCU</a></h3>
 
 <p>
diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index 8d9a0ea8f0b5..f617ab19bb51 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -115,6 +115,11 @@ static inline void rcu_irq_exit_irqson(void) { }
 static inline void rcu_irq_enter_irqson(void) { }
 static inline void rcu_irq_exit(void) { }
 static inline void exit_rcu(void) { }
+static inline bool rcu_preempt_need_deferred_qs(struct task_struct *t)
+{
+	return false;
+}
+static inline void rcu_preempt_deferred_qs(struct task_struct *t) { }
 #ifdef CONFIG_SRCU
 void rcu_scheduler_starting(void);
 #else /* #ifndef CONFIG_SRCU */
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0adf77923e8b..dc041c2afbcc 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -422,6 +422,7 @@ static void rcu_momentary_dyntick_idle(void)
 	special = atomic_add_return(2 * RCU_DYNTICK_CTRL_CTR, &rdtp->dynticks);
 	/* It is illegal to call this from idle state. */
 	WARN_ON_ONCE(!(special & RCU_DYNTICK_CTRL_CTR));
+	rcu_preempt_deferred_qs(current);
 }
 
 /*
@@ -729,6 +730,7 @@ static void rcu_eqs_enter(bool user)
 		do_nocb_deferred_wakeup(rdp);
 	}
 	rcu_prepare_for_idle();
+	rcu_preempt_deferred_qs(current);
 	WRITE_ONCE(rdtp->dynticks_nesting, 0); /* Avoid irq-access tearing. */
 	rcu_dynticks_eqs_enter();
 	rcu_dynticks_task_enter();
@@ -2849,6 +2851,12 @@ __rcu_process_callbacks(struct rcu_state *rsp)
 
 	WARN_ON_ONCE(!rdp->beenonline);
 
+	/* Report any deferred quiescent states if preemption enabled. */
+	if (!(preempt_count() & PREEMPT_MASK))
+		rcu_preempt_deferred_qs(current);
+	else if (rcu_preempt_need_deferred_qs(current))
+		resched_cpu(rdp->cpu); /* Provoke future context switch. */
+
 	/* Update RCU state based on any recent quiescent states. */
 	rcu_check_quiescent_state(rsp, rdp);
 
@@ -3822,6 +3830,7 @@ void rcu_report_dead(unsigned int cpu)
 	rcu_report_exp_rdp(&rcu_sched_state,
 			   this_cpu_ptr(rcu_sched_state.rda), true);
 	preempt_enable();
+	rcu_preempt_deferred_qs(current);
 	for_each_rcu_flavor(rsp)
 		rcu_cleanup_dying_idle_cpu(cpu, rsp);
 
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 4e74df768c57..025bd2e5592b 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -195,6 +195,7 @@ struct rcu_data {
 	bool		core_needs_qs;	/* Core waits for quiesc state. */
 	bool		beenonline;	/* CPU online at least once. */
 	bool		gpwrap;		/* Possible ->gp_seq wrap. */
+	bool		deferred_qs;	/* This CPU awaiting a deferred QS? */
 	struct rcu_node *mynode;	/* This CPU's leaf of hierarchy */
 	unsigned long grpmask;		/* Mask to apply to leaf qsmask. */
 	unsigned long	ticks_this_gp;	/* The number of scheduling-clock */
@@ -461,6 +462,8 @@ static void rcu_cleanup_after_idle(void);
 static void rcu_prepare_for_idle(void);
 static void rcu_idle_count_callbacks_posted(void);
 static bool rcu_preempt_has_tasks(struct rcu_node *rnp);
+static bool rcu_preempt_need_deferred_qs(struct task_struct *t);
+static void rcu_preempt_deferred_qs(struct task_struct *t);
 static void print_cpu_stall_info_begin(void);
 static void print_cpu_stall_info(struct rcu_state *rsp, int cpu);
 static void print_cpu_stall_info_end(void);
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 0b2c2ad69629..f9d5bbd8adce 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -262,6 +262,7 @@ static void rcu_report_exp_cpu_mult(struct rcu_state *rsp, struct rcu_node *rnp,
 static void rcu_report_exp_rdp(struct rcu_state *rsp, struct rcu_data *rdp,
 			       bool wake)
 {
+	WRITE_ONCE(rdp->deferred_qs, false);
 	rcu_report_exp_cpu_mult(rsp, rdp->mynode, rdp->grpmask, wake);
 }
 
@@ -735,32 +736,70 @@ EXPORT_SYMBOL_GPL(synchronize_sched_expedited);
  */
 static void sync_rcu_exp_handler(void *info)
 {
-	struct rcu_data *rdp;
+	unsigned long flags;
 	struct rcu_state *rsp = info;
+	struct rcu_data *rdp = this_cpu_ptr(rsp->rda);
+	struct rcu_node *rnp = rdp->mynode;
 	struct task_struct *t = current;
 
 	/*
-	 * Within an RCU read-side critical section, request that the next
-	 * rcu_read_unlock() report.  Unless this RCU read-side critical
-	 * section has already blocked, in which case it is already set
-	 * up for the expedited grace period to wait on it.
+	 * First, the common case of not being in an RCU read-side
+	 * critical section.  If also enabled or idle, immediately
+	 * report the quiescent state, otherwise defer.
 	 */
-	if (t->rcu_read_lock_nesting > 0 &&
-	    !t->rcu_read_unlock_special.b.blocked) {
-		t->rcu_read_unlock_special.b.exp_need_qs = true;
+	if (!t->rcu_read_lock_nesting) {
+		if (!(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK)) ||
+		    rcu_dynticks_curr_cpu_in_eqs()) {
+			rcu_report_exp_rdp(rsp, rdp, true);
+		} else {
+			rdp->deferred_qs = true;
+			resched_cpu(rdp->cpu);
+		}
 		return;
 	}
 
 	/*
-	 * We are either exiting an RCU read-side critical section (negative
-	 * values of t->rcu_read_lock_nesting) or are not in one at all
-	 * (zero value of t->rcu_read_lock_nesting).  Or we are in an RCU
-	 * read-side critical section that blocked before this expedited
-	 * grace period started.  Either way, we can immediately report
-	 * the quiescent state.
+	 * Second, the less-common case of being in an RCU read-side
+	 * critical section.  In this case we can count on a future
+	 * rcu_read_unlock().  However, this rcu_read_unlock() might
+	 * execute on some other CPU, but in that case there will be
+	 * a future context switch.  Either way, if the expedited
+	 * grace period is still waiting on this CPU, set ->deferred_qs
+	 * so that the eventual quiescent state will be reported.
+	 * Note that there is a large group of race conditions that
+	 * can have caused this quiescent state to already have been
+	 * reported, so we really do need to check ->expmask.
 	 */
-	rdp = this_cpu_ptr(rsp->rda);
-	rcu_report_exp_rdp(rsp, rdp, true);
+	if (t->rcu_read_lock_nesting > 0) {
+		raw_spin_lock_irqsave_rcu_node(rnp, flags);
+		if (rnp->expmask & rdp->grpmask)
+			rdp->deferred_qs = true;
+		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+	}
+
+	/*
+	 * The final and least likely case is where the interrupted
+	 * code was just about to or just finished exiting the RCU-preempt
+	 * read-side critical section, and no, we can't tell which.
+	 * So either way, set ->deferred_qs to flag later code that
+	 * a quiescent state is required.
+	 *
+	 * If the CPU is fully enabled (or if some buggy RCU-preempt
+	 * read-side critical section is being used from idle), just
+	 * invoke rcu_preempt_defer_qs() to immediately report the
+	 * quiescent state.  We cannot use rcu_read_unlock_special()
+	 * because we are in an interrupt handler, which will cause that
+	 * function to take an early exit without doing anything.
+	 *
+	 * Otherwise, use resched_cpu() to force a context switch after
+	 * the CPU enables everything.
+	 */
+	rdp->deferred_qs = true;
+	if (!(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK)) ||
+	    WARN_ON_ONCE(rcu_dynticks_curr_cpu_in_eqs()))
+		rcu_preempt_deferred_qs(t);
+	else
+		resched_cpu(rdp->cpu);
 }
 
 /**
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index a97c20ea9bce..542791361908 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -371,6 +371,9 @@ static void rcu_preempt_note_context_switch(bool preempt)
 		 * behalf of preempted instance of __rcu_read_unlock().
 		 */
 		rcu_read_unlock_special(t);
+		rcu_preempt_deferred_qs(t);
+	} else {
+		rcu_preempt_deferred_qs(t);
 	}
 
 	/*
@@ -464,54 +467,51 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp)
 }
 
 /*
- * Handle special cases during rcu_read_unlock(), such as needing to
- * notify RCU core processing or task having blocked during the RCU
- * read-side critical section.
+ * Report deferred quiescent states.  The deferral time can
+ * be quite short, for example, in the case of the call from
+ * rcu_read_unlock_special().
  */
-static void rcu_read_unlock_special(struct task_struct *t)
+static void
+rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
 {
 	bool empty_exp;
 	bool empty_norm;
 	bool empty_exp_now;
-	unsigned long flags;
 	struct list_head *np;
 	bool drop_boost_mutex = false;
 	struct rcu_data *rdp;
 	struct rcu_node *rnp;
 	union rcu_special special;
 
-	/* NMI handlers cannot block and cannot safely manipulate state. */
-	if (in_nmi())
-		return;
-
-	local_irq_save(flags);
-
 	/*
 	 * If RCU core is waiting for this CPU to exit its critical section,
 	 * report the fact that it has exited.  Because irqs are disabled,
 	 * t->rcu_read_unlock_special cannot change.
 	 */
 	special = t->rcu_read_unlock_special;
+	rdp = this_cpu_ptr(rcu_state_p->rda);
+	if (!special.s && !rdp->deferred_qs) {
+		local_irq_restore(flags);
+		return;
+	}
 	if (special.b.need_qs) {
 		rcu_preempt_qs();
 		t->rcu_read_unlock_special.b.need_qs = false;
-		if (!t->rcu_read_unlock_special.s) {
+		if (!t->rcu_read_unlock_special.s && !rdp->deferred_qs) {
 			local_irq_restore(flags);
 			return;
 		}
 	}
 
 	/*
-	 * Respond to a request for an expedited grace period, but only if
-	 * we were not preempted, meaning that we were running on the same
-	 * CPU throughout.  If we were preempted, the exp_need_qs flag
-	 * would have been cleared at the time of the first preemption,
-	 * and the quiescent state would be reported when we were dequeued.
+	 * Respond to a request by an expedited grace period for a
+	 * quiescent state from this CPU.  Note that requests from
+	 * tasks are handled when removing the task from the
+	 * blocked-tasks list below.
 	 */
-	if (special.b.exp_need_qs) {
-		WARN_ON_ONCE(special.b.blocked);
+	if (special.b.exp_need_qs || rdp->deferred_qs) {
 		t->rcu_read_unlock_special.b.exp_need_qs = false;
-		rdp = this_cpu_ptr(rcu_state_p->rda);
+		rdp->deferred_qs = false;
 		rcu_report_exp_rdp(rcu_state_p, rdp, true);
 		if (!t->rcu_read_unlock_special.s) {
 			local_irq_restore(flags);
@@ -519,19 +519,6 @@ static void rcu_read_unlock_special(struct task_struct *t)
 		}
 	}
 
-	/* Hardware IRQ handlers cannot block, complain if they get here. */
-	if (in_irq() || in_serving_softirq()) {
-		lockdep_rcu_suspicious(__FILE__, __LINE__,
-				       "rcu_read_unlock() from irq or softirq with blocking in critical section!!!\n");
-		pr_alert("->rcu_read_unlock_special: %#x (b: %d, enq: %d nq: %d)\n",
-			 t->rcu_read_unlock_special.s,
-			 t->rcu_read_unlock_special.b.blocked,
-			 t->rcu_read_unlock_special.b.exp_need_qs,
-			 t->rcu_read_unlock_special.b.need_qs);
-		local_irq_restore(flags);
-		return;
-	}
-
 	/* Clean up if blocked during RCU read-side critical section. */
 	if (special.b.blocked) {
 		t->rcu_read_unlock_special.b.blocked = false;
@@ -602,6 +589,72 @@ static void rcu_read_unlock_special(struct task_struct *t)
 	}
 }
 
+/*
+ * Is a deferred quiescent-state pending, and are we also not in
+ * an RCU read-side critical section?  It is the caller's responsibility
+ * to ensure it is otherwise safe to report any deferred quiescent
+ * states.  The reason for this is that it is safe to report a
+ * quiescent state during context switch even though preemption
+ * is disabled.  This function cannot be expected to understand these
+ * nuances, so the caller must handle them.
+ */
+static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
+{
+	return (this_cpu_ptr(&rcu_preempt_data)->deferred_qs ||
+		READ_ONCE(t->rcu_read_unlock_special.s)) &&
+	       !t->rcu_read_lock_nesting;
+}
+
+/*
+ * Report a deferred quiescent state if needed and safe to do so.
+ * As with rcu_preempt_need_deferred_qs(), "safe" involves only
+ * not being in an RCU read-side critical section.  The caller must
+ * evaluate safety in terms of interrupt, softirq, and preemption
+ * disabling.
+ */
+static void rcu_preempt_deferred_qs(struct task_struct *t)
+{
+	unsigned long flags;
+	bool couldrecurse = t->rcu_read_lock_nesting >= 0;
+
+	if (!rcu_preempt_need_deferred_qs(t))
+		return;
+	if (couldrecurse)
+		t->rcu_read_lock_nesting -= INT_MIN;
+	local_irq_save(flags);
+	rcu_preempt_deferred_qs_irqrestore(t, flags);
+	if (couldrecurse)
+		t->rcu_read_lock_nesting += INT_MIN;
+}
+
+/*
+ * Handle special cases during rcu_read_unlock(), such as needing to
+ * notify RCU core processing or task having blocked during the RCU
+ * read-side critical section.
+ */
+static void rcu_read_unlock_special(struct task_struct *t)
+{
+	unsigned long flags;
+	bool preempt_bh_were_disabled =
+			!!(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK));
+	bool irqs_were_disabled;
+
+	/* NMI handlers cannot block and cannot safely manipulate state. */
+	if (in_nmi())
+		return;
+
+	local_irq_save(flags);
+	irqs_were_disabled = irqs_disabled_flags(flags);
+	if ((preempt_bh_were_disabled || irqs_were_disabled) &&
+	    t->rcu_read_unlock_special.b.blocked) {
+		/* Need to defer quiescent state until everything is enabled. */
+		raise_softirq_irqoff(RCU_SOFTIRQ);
+		local_irq_restore(flags);
+		return;
+	}
+	rcu_preempt_deferred_qs_irqrestore(t, flags);
+}
+
 /*
  * Dump detailed information for all tasks blocking the current RCU
  * grace period on the specified rcu_node structure.
@@ -737,10 +790,20 @@ static void rcu_preempt_check_callbacks(void)
 	struct rcu_state *rsp = &rcu_preempt_state;
 	struct task_struct *t = current;
 
-	if (t->rcu_read_lock_nesting == 0) {
-		rcu_preempt_qs();
+	if (t->rcu_read_lock_nesting > 0 ||
+	    (preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK))) {
+		/* No QS, force context switch if deferred. */
+		if (rcu_preempt_need_deferred_qs(t))
+			resched_cpu(smp_processor_id());
+	} else if (rcu_preempt_need_deferred_qs(t)) {
+		rcu_preempt_deferred_qs(t); /* Report deferred QS. */
+		return;
+	} else if (!t->rcu_read_lock_nesting) {
+		rcu_preempt_qs(); /* Report immediate QS. */
 		return;
 	}
+
+	/* If GP is oldish, ask for help from rcu_read_unlock_special(). */
 	if (t->rcu_read_lock_nesting > 0 &&
 	    __this_cpu_read(rcu_data_p->core_needs_qs) &&
 	    __this_cpu_read(rcu_data_p->cpu_no_qs.b.norm) &&
@@ -859,6 +922,7 @@ void exit_rcu(void)
 	barrier();
 	t->rcu_read_unlock_special.b.blocked = true;
 	__rcu_read_unlock();
+	rcu_preempt_deferred_qs(current);
 }
 
 /*
@@ -940,6 +1004,16 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp)
 	return false;
 }
 
+/*
+ * Because there is no preemptible RCU, there can be no deferred quiescent
+ * states.
+ */
+static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
+{
+	return false;
+}
+static void rcu_preempt_deferred_qs(struct task_struct *t) { }
+
 /*
  * Because preemptible RCU does not exist, we never have to check for
  * tasks blocked within RCU read-side critical sections.
-- 
2.17.1



* [PATCH tip/core/rcu 03/19] rcutorture: Test extended "rcu" read-side critical sections
  2018-08-29 22:20 [PATCH tip/core/rcu 0/19] RCU flavor-consolidation changes for v4.20/v5.0 Paul E. McKenney
  2018-08-29 22:20 ` [PATCH tip/core/rcu 01/19] rcu: Refactor rcu_{nmi,irq}_{enter,exit}() Paul E. McKenney
  2018-08-29 22:20 ` [PATCH tip/core/rcu 02/19] rcu: Defer reporting RCU-preempt quiescent states when disabled Paul E. McKenney
@ 2018-08-29 22:20 ` Paul E. McKenney
  2018-08-29 22:20 ` [PATCH tip/core/rcu 04/19] rcu: Allow processing deferred QSes for exiting RCU-preempt readers Paul E. McKenney
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 49+ messages in thread
From: Paul E. McKenney @ 2018-08-29 22:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg, joel,
	Paul E. McKenney

This commit makes the "rcu" torture type test extended read-side
critical sections in order to test the deferral of RCU-preempt
quiescent-state testing.

In CONFIG_PREEMPT=n kernels, this simply duplicates the setup already
in place for the "sched" torture type.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/rcutorture.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index c596c6f1e457..c55d1483886e 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -431,6 +431,7 @@ static struct rcu_torture_ops rcu_ops = {
 	.stats		= NULL,
 	.irq_capable	= 1,
 	.can_boost	= rcu_can_boost(),
+	.extendables	= RCUTORTURE_MAX_EXTEND,
 	.name		= "rcu"
 };
 
-- 
2.17.1



* [PATCH tip/core/rcu 04/19] rcu: Allow processing deferred QSes for exiting RCU-preempt readers
  2018-08-29 22:20 [PATCH tip/core/rcu 0/19] RCU flavor-consolidation changes for v4.20/v5.0 Paul E. McKenney
                   ` (2 preceding siblings ...)
  2018-08-29 22:20 ` [PATCH tip/core/rcu 03/19] rcutorture: Test extended "rcu" read-side critical sections Paul E. McKenney
@ 2018-08-29 22:20 ` Paul E. McKenney
  2018-08-29 22:20 ` [PATCH tip/core/rcu 05/19] rcu: Remove now-unused ->b.exp_need_qs field from the rcu_special union Paul E. McKenney
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 49+ messages in thread
From: Paul E. McKenney @ 2018-08-29 22:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg, joel,
	Paul E. McKenney

If an RCU-preempt read-side critical section is exiting, that is,
->rcu_read_lock_nesting is negative, then it is a good time to look
at the possibility of reporting deferred quiescent states.  This
commit therefore updates the checks in rcu_preempt_need_deferred_qs()
to allow exiting critical sections to report deferred quiescent states.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree_plugin.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 542791361908..24c209676d20 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -602,7 +602,7 @@ static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
 {
 	return (this_cpu_ptr(&rcu_preempt_data)->deferred_qs ||
 		READ_ONCE(t->rcu_read_unlock_special.s)) &&
-	       !t->rcu_read_lock_nesting;
+	       t->rcu_read_lock_nesting <= 0;
 }
 
 /*
-- 
2.17.1



* [PATCH tip/core/rcu 05/19] rcu: Remove now-unused ->b.exp_need_qs field from the rcu_special union
  2018-08-29 22:20 [PATCH tip/core/rcu 0/19] RCU flavor-consolidation changes for v4.20/v5.0 Paul E. McKenney
                   ` (3 preceding siblings ...)
  2018-08-29 22:20 ` [PATCH tip/core/rcu 04/19] rcu: Allow processing deferred QSes for exiting RCU-preempt readers Paul E. McKenney
@ 2018-08-29 22:20 ` Paul E. McKenney
  2018-08-29 22:20 ` [PATCH tip/core/rcu 06/19] rcu: Add warning to detect half-interrupts Paul E. McKenney
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 49+ messages in thread
From: Paul E. McKenney @ 2018-08-29 22:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg, joel,
	Paul E. McKenney

The ->b.exp_need_qs field is now set only to false, so this commit
removes it.  The job this field used to do is now done by the rcu_data
structure's ->deferred_qs field, which is a consequence of a better
split between task-based (the rcu_node structure's ->exp_tasks field) and
CPU-based (the aforementioned rcu_data structure's ->deferred_qs field)
tracking of quiescent states for RCU-preempt expedited grace periods.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/sched.h    |  6 +-----
 kernel/rcu/tree_plugin.h | 13 ++++---------
 2 files changed, 5 insertions(+), 14 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 977cb57d7bc9..004ca21f7e80 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -571,12 +571,8 @@ union rcu_special {
 	struct {
 		u8			blocked;
 		u8			need_qs;
-		u8			exp_need_qs;
-
-		/* Otherwise the compiler can store garbage here: */
-		u8			pad;
 	} b; /* Bits. */
-	u32 s; /* Set of bits. */
+	u16 s; /* Set of bits. */
 };
 
 enum perf_event_task_context {
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 24c209676d20..527a52792dce 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -284,13 +284,10 @@ static void rcu_preempt_ctxt_queue(struct rcu_node *rnp, struct rcu_data *rdp)
 	 * no need to check for a subsequent expedited GP.  (Though we are
 	 * still in a quiescent state in any case.)
 	 */
-	if (blkd_state & RCU_EXP_BLKD &&
-	    t->rcu_read_unlock_special.b.exp_need_qs) {
-		t->rcu_read_unlock_special.b.exp_need_qs = false;
+	if (blkd_state & RCU_EXP_BLKD && rdp->deferred_qs)
 		rcu_report_exp_rdp(rdp->rsp, rdp, true);
-	} else {
-		WARN_ON_ONCE(t->rcu_read_unlock_special.b.exp_need_qs);
-	}
+	else
+		WARN_ON_ONCE(rdp->deferred_qs);
 }
 
 /*
@@ -509,9 +506,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
 	 * tasks are handled when removing the task from the
 	 * blocked-tasks list below.
 	 */
-	if (special.b.exp_need_qs || rdp->deferred_qs) {
-		t->rcu_read_unlock_special.b.exp_need_qs = false;
-		rdp->deferred_qs = false;
+	if (rdp->deferred_qs) {
 		rcu_report_exp_rdp(rcu_state_p, rdp, true);
 		if (!t->rcu_read_unlock_special.s) {
 			local_irq_restore(flags);
-- 
2.17.1



* [PATCH tip/core/rcu 06/19] rcu: Add warning to detect half-interrupts
  2018-08-29 22:20 [PATCH tip/core/rcu 0/19] RCU flavor-consolidation changes for v4.20/v5.0 Paul E. McKenney
                   ` (4 preceding siblings ...)
  2018-08-29 22:20 ` [PATCH tip/core/rcu 05/19] rcu: Remove now-unused ->b.exp_need_qs field from the rcu_special union Paul E. McKenney
@ 2018-08-29 22:20 ` Paul E. McKenney
  2019-03-11 13:39   ` Joel Fernandes
  2018-08-29 22:20 ` [PATCH tip/core/rcu 07/19] rcu: Apply RCU-bh QSes to RCU-sched and RCU-preempt when safe Paul E. McKenney
                   ` (13 subsequent siblings)
  19 siblings, 1 reply; 49+ messages in thread
From: Paul E. McKenney @ 2018-08-29 22:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg, joel,
	Paul E. McKenney

RCU's dyntick-idle code is written to tolerate half-interrupts, that is,
either an interrupt that invokes rcu_irq_enter() but never invokes the
corresponding rcu_irq_exit() on the one hand, or an interrupt that never
invokes rcu_irq_enter() but does invoke the "corresponding" rcu_irq_exit()
on the other.  These things really did happen at one time, as evidenced
by this ca-2011 LKML post:

http://lkml.kernel.org/r/20111014170019.GE2428@linux.vnet.ibm.com

The reason why RCU tolerates half-interrupts is that usermode helpers
used exceptions to invoke a system call from within the kernel such that
the system call did a normal return (not a return from exception) to
the calling context.  This caused rcu_irq_enter() to be invoked without
a matching rcu_irq_exit().  However, usermode helpers have since been
rewritten to make much more housebroken use of workqueues, kernel threads,
and do_execve(), and therefore should no longer produce half-interrupts.
No one knows of any other source of half-interrupts, but then again,
no one seems insane enough to go audit the entire kernel to verify that
half-interrupts really are a relic of the past.
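
Schematically, a half-interrupt of the first kind looks like this
(illustration only):

	rcu_irq_enter();	/* entry side runs normally */
	/* Exception-based "system call" issued from within the kernel, */
	/* which then does a normal return to the calling context, so   */
	/* the matching rcu_irq_exit() is never invoked.                 */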

This commit therefore adds a pair of WARN_ON_ONCE() calls that will
trigger in the presence of half interrupts, which the code will continue
to handle correctly.  If neither of these WARN_ON_ONCE() trigger by
mid-2021, then perhaps RCU can stop handling half-interrupts, which
would be a considerable simplification.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Joel Fernandes <joel@joelfernandes.org>
Reported-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/tree.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index dc041c2afbcc..d2b6ade692c9 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -714,6 +714,7 @@ static void rcu_eqs_enter(bool user)
 	struct rcu_dynticks *rdtp;
 
 	rdtp = this_cpu_ptr(&rcu_dynticks);
+	WARN_ON_ONCE(rdtp->dynticks_nmi_nesting != DYNTICK_IRQ_NONIDLE);
 	WRITE_ONCE(rdtp->dynticks_nmi_nesting, 0);
 	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
 		     rdtp->dynticks_nesting == 0);
@@ -895,6 +896,7 @@ static void rcu_eqs_exit(bool user)
 	trace_rcu_dyntick(TPS("End"), rdtp->dynticks_nesting, 1, rdtp->dynticks);
 	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && !is_idle_task(current));
 	WRITE_ONCE(rdtp->dynticks_nesting, 1);
+	WARN_ON_ONCE(rdtp->dynticks_nmi_nesting);
 	WRITE_ONCE(rdtp->dynticks_nmi_nesting, DYNTICK_IRQ_NONIDLE);
 }
 
-- 
2.17.1



* [PATCH tip/core/rcu 07/19] rcu: Apply RCU-bh QSes to RCU-sched and RCU-preempt when safe
  2018-08-29 22:20 [PATCH tip/core/rcu 0/19] RCU flavor-consolidation changes for v4.20/v5.0 Paul E. McKenney
                   ` (5 preceding siblings ...)
  2018-08-29 22:20 ` [PATCH tip/core/rcu 06/19] rcu: Add warning to detect half-interrupts Paul E. McKenney
@ 2018-08-29 22:20 ` Paul E. McKenney
  2018-08-29 22:20 ` [PATCH tip/core/rcu 08/19] rcu: Report expedited grace periods at context-switch time Paul E. McKenney
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 49+ messages in thread
From: Paul E. McKenney @ 2018-08-29 22:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg, joel,
	Paul E. McKenney

One necessary step towards consolidating the three flavors of RCU is to
make sure that the resulting consolidated "one flavor to rule them all"
correctly handles networking denial-of-service attacks.  One thing that
allows RCU-bh to do so is that __do_softirq() invokes rcu_bh_qs() every
so often, and so something similar has to happen for consolidated RCU.

This must be done carefully.  For example, if a preemption-disabled
region of code takes an interrupt which does softirq processing before
returning, consolidated RCU must ignore the resulting rcu_bh_qs()
invocations -- preemption is still disabled, and that means an RCU
reader for the consolidated flavor.

This commit therefore creates a new rcu_softirq_qs() that is called only
from the ksoftirqd task, thus avoiding the interrupted-a-preempted-region
problem.  This new rcu_softirq_qs() function invokes rcu_sched_qs(),
rcu_preempt_qs(), and rcu_preempt_deferred_qs().  The latter call handles
any deferred quiescent states.
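
The resulting call site in __do_softirq() takes the following general
shape (see the kernel/softirq.c hunk below for the actual change):

	rcu_bh_qs();
	if (__this_cpu_read(ksoftirqd) == current)
		rcu_softirq_qs();	/* only the ksoftirqd task reports here */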

Note that __do_softirq() still invokes rcu_bh_qs().  It will continue to
do so until a later stage of cleanup when the RCU-bh flavor is removed.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Fix !SMP issue located by kbuild test robot. ]
---
 include/linux/rcutiny.h  | 5 +++++
 include/linux/rcutree.h  | 1 +
 kernel/rcu/tree.c        | 7 +++++++
 kernel/rcu/tree.h        | 1 +
 kernel/rcu/tree_plugin.h | 5 +++++
 kernel/softirq.c         | 2 ++
 6 files changed, 21 insertions(+)

diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index f617ab19bb51..bcfbc40a7239 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -90,6 +90,11 @@ static inline void kfree_call_rcu(struct rcu_head *head,
 	call_rcu(head, func);
 }
 
+static inline void rcu_softirq_qs(void)
+{
+	rcu_sched_qs();
+}
+
 #define rcu_note_context_switch(preempt) \
 	do { \
 		rcu_sched_qs(); \
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index 914655848ef6..664b580695d6 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -30,6 +30,7 @@
 #ifndef __LINUX_RCUTREE_H
 #define __LINUX_RCUTREE_H
 
+void rcu_softirq_qs(void);
 void rcu_note_context_switch(bool preempt);
 int rcu_needs_cpu(u64 basem, u64 *nextevt);
 void rcu_cpu_stall_reset(void);
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index d2b6ade692c9..9b6bd12133c4 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -255,6 +255,13 @@ void rcu_bh_qs(void)
 	}
 }
 
+void rcu_softirq_qs(void)
+{
+	rcu_sched_qs();
+	rcu_preempt_qs();
+	rcu_preempt_deferred_qs(current);
+}
+
 /*
  * Steal a bit from the bottom of ->dynticks for idle entry/exit
  * control.  Initially this is for TLB flushing.
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 025bd2e5592b..e02c882861eb 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -433,6 +433,7 @@ DECLARE_PER_CPU(char, rcu_cpu_has_work);
 
 /* Forward declarations for rcutree_plugin.h */
 static void rcu_bootup_announce(void);
+static void rcu_preempt_qs(void);
 static void rcu_preempt_note_context_switch(bool preempt);
 static int rcu_preempt_blocked_readers_cgp(struct rcu_node *rnp);
 #ifdef CONFIG_HOTPLUG_CPU
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 527a52792dce..c686bf63bba5 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -974,6 +974,11 @@ static void __init rcu_bootup_announce(void)
 	rcu_bootup_announce_oddness();
 }
 
+/* Because preemptible RCU does not exist, we can ignore its QSes. */
+static void rcu_preempt_qs(void)
+{
+}
+
 /*
  * Because preemptible RCU does not exist, we never have to check for
  * CPUs being in quiescent states.
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 6f584861d329..ebd69694144a 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -302,6 +302,8 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 	}
 
 	rcu_bh_qs();
+	if (__this_cpu_read(ksoftirqd) == current)
+		rcu_softirq_qs();
 	local_irq_disable();
 
 	pending = local_softirq_pending();
-- 
2.17.1



* [PATCH tip/core/rcu 08/19] rcu: Report expedited grace periods at context-switch time
  2018-08-29 22:20 [PATCH tip/core/rcu 0/19] RCU flavor-consolidation changes for v4.20/v5.0 Paul E. McKenney
                   ` (6 preceding siblings ...)
  2018-08-29 22:20 ` [PATCH tip/core/rcu 07/19] rcu: Apply RCU-bh QSes to RCU-sched and RCU-preempt when safe Paul E. McKenney
@ 2018-08-29 22:20 ` Paul E. McKenney
  2018-08-29 22:20 ` [PATCH tip/core/rcu 09/19] rcu: Define RCU-bh update API in terms of RCU Paul E. McKenney
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 49+ messages in thread
From: Paul E. McKenney @ 2018-08-29 22:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg, joel,
	Paul E. McKenney

This commit reduces the latency of expedited RCU grace periods by
reporting a quiescent state for the CPU at context-switch time.
In CONFIG_PREEMPT=y kernels, if the outgoing task is still within an
RCU read-side critical section (and thus still blocking some grace
period, perhaps including this expedited grace period), then that task
will already have been placed on one of the leaf rcu_node structures'
->blkd_tasks list.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree_plugin.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index c686bf63bba5..0d7107fb3dec 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -332,7 +332,7 @@ static void rcu_preempt_qs(void)
 static void rcu_preempt_note_context_switch(bool preempt)
 {
 	struct task_struct *t = current;
-	struct rcu_data *rdp;
+	struct rcu_data *rdp = this_cpu_ptr(rcu_state_p->rda);
 	struct rcu_node *rnp;
 
 	lockdep_assert_irqs_disabled();
@@ -341,7 +341,6 @@ static void rcu_preempt_note_context_switch(bool preempt)
 	    !t->rcu_read_unlock_special.b.blocked) {
 
 		/* Possibly blocking in an RCU read-side critical section. */
-		rdp = this_cpu_ptr(rcu_state_p->rda);
 		rnp = rdp->mynode;
 		raw_spin_lock_rcu_node(rnp);
 		t->rcu_read_unlock_special.b.blocked = true;
@@ -383,6 +382,8 @@ static void rcu_preempt_note_context_switch(bool preempt)
 	 * means that we continue to block the current grace period.
 	 */
 	rcu_preempt_qs();
+	if (rdp->deferred_qs)
+		rcu_report_exp_rdp(rcu_state_p, rdp, true);
 }
 
 /*
-- 
2.17.1



* [PATCH tip/core/rcu 09/19] rcu: Define RCU-bh update API in terms of RCU
  2018-08-29 22:20 [PATCH tip/core/rcu 0/19] RCU flavor-consolidation changes for v4.20/v5.0 Paul E. McKenney
                   ` (7 preceding siblings ...)
  2018-08-29 22:20 ` [PATCH tip/core/rcu 08/19] rcu: Report expedited grace periods at context-switch time Paul E. McKenney
@ 2018-08-29 22:20 ` Paul E. McKenney
  2018-08-29 22:20 ` [PATCH tip/core/rcu 10/19] rcu: Update comments and help text for no more RCU-bh updaters Paul E. McKenney
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 49+ messages in thread
From: Paul E. McKenney @ 2018-08-29 22:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg, joel,
	Paul E. McKenney

Now that the main RCU API knows about softirq disabling and softirq's
quiescent states, the RCU-bh update code can be dispensed with.
This commit therefore removes the RCU-bh update-side implementation and
defines RCU-bh's update-side API in terms of that of either RCU-preempt or
RCU-sched, depending on the setting of the CONFIG_PREEMPT Kconfig option.

In kernels built with CONFIG_RCU_NOCB_CPU=y this has the knock-on effect
of reducing by one the number of rcuo kthreads per CPU.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/rcupdate.h |  10 ++--
 include/linux/rcutiny.h  |  10 +++-
 include/linux/rcutree.h  |   8 ++-
 kernel/rcu/tiny.c        | 115 +++++++--------------------------------
 kernel/rcu/tree.c        |  97 +++------------------------------
 kernel/rcu/tree_plugin.h |   1 -
 kernel/softirq.c         |   1 -
 7 files changed, 48 insertions(+), 194 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 75e5b393cf44..9ebfd436cec7 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -55,11 +55,15 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func);
 #define	call_rcu	call_rcu_sched
 #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
 
-void call_rcu_bh(struct rcu_head *head, rcu_callback_t func);
 void call_rcu_sched(struct rcu_head *head, rcu_callback_t func);
 void synchronize_sched(void);
 void rcu_barrier_tasks(void);
 
+static inline void call_rcu_bh(struct rcu_head *head, rcu_callback_t func)
+{
+	call_rcu(head, func);
+}
+
 #ifdef CONFIG_PREEMPT_RCU
 
 void __rcu_read_lock(void);
@@ -104,7 +108,6 @@ static inline int rcu_preempt_depth(void)
 void rcu_init(void);
 extern int rcu_scheduler_active __read_mostly;
 void rcu_sched_qs(void);
-void rcu_bh_qs(void);
 void rcu_check_callbacks(int user);
 void rcu_report_dead(unsigned int cpu);
 void rcutree_migrate_callbacks(int cpu);
@@ -326,8 +329,7 @@ static inline void rcu_preempt_sleep_check(void) { }
  * and rcu_assign_pointer().  Some of these could be folded into their
  * callers, but they are left separate in order to ease introduction of
  * multiple flavors of pointers to match the multiple flavors of RCU
- * (e.g., __rcu_bh, * __rcu_sched, and __srcu), should this make sense in
- * the future.
+ * (e.g., __rcu_sched, and __srcu), should this make sense in the future.
  */
 
 #ifdef __CHECKER__
diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index bcfbc40a7239..ac26c27ccde8 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -56,19 +56,23 @@ static inline void cond_synchronize_sched(unsigned long oldstate)
 	might_sleep();
 }
 
-extern void rcu_barrier_bh(void);
-extern void rcu_barrier_sched(void);
-
 static inline void synchronize_rcu_expedited(void)
 {
 	synchronize_sched();	/* Only one CPU, so pretty fast anyway!!! */
 }
 
+extern void rcu_barrier_sched(void);
+
 static inline void rcu_barrier(void)
 {
 	rcu_barrier_sched();  /* Only one CPU, so only one list of callbacks! */
 }
 
+static inline void rcu_barrier_bh(void)
+{
+	rcu_barrier();
+}
+
 static inline void synchronize_rcu_bh(void)
 {
 	synchronize_sched();
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index 664b580695d6..c789c302a2c9 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -45,7 +45,11 @@ static inline void rcu_virt_note_context_switch(int cpu)
 	rcu_note_context_switch(false);
 }
 
-void synchronize_rcu_bh(void);
+static inline void synchronize_rcu_bh(void)
+{
+	synchronize_rcu();
+}
+
 void synchronize_sched_expedited(void);
 void synchronize_rcu_expedited(void);
 
@@ -69,7 +73,7 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func);
  */
 static inline void synchronize_rcu_bh_expedited(void)
 {
-	synchronize_sched_expedited();
+	synchronize_rcu_expedited();
 }
 
 void rcu_barrier(void);
diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
index befc9321a89c..cadcf63c4889 100644
--- a/kernel/rcu/tiny.c
+++ b/kernel/rcu/tiny.c
@@ -51,64 +51,22 @@ static struct rcu_ctrlblk rcu_sched_ctrlblk = {
 	.curtail	= &rcu_sched_ctrlblk.rcucblist,
 };
 
-static struct rcu_ctrlblk rcu_bh_ctrlblk = {
-	.donetail	= &rcu_bh_ctrlblk.rcucblist,
-	.curtail	= &rcu_bh_ctrlblk.rcucblist,
-};
-
-void rcu_barrier_bh(void)
-{
-	wait_rcu_gp(call_rcu_bh);
-}
-EXPORT_SYMBOL(rcu_barrier_bh);
-
 void rcu_barrier_sched(void)
 {
 	wait_rcu_gp(call_rcu_sched);
 }
 EXPORT_SYMBOL(rcu_barrier_sched);
 
-/*
- * Helper function for rcu_sched_qs() and rcu_bh_qs().
- * Also irqs are disabled to avoid confusion due to interrupt handlers
- * invoking call_rcu().
- */
-static int rcu_qsctr_help(struct rcu_ctrlblk *rcp)
-{
-	if (rcp->donetail != rcp->curtail) {
-		rcp->donetail = rcp->curtail;
-		return 1;
-	}
-
-	return 0;
-}
-
-/*
- * Record an rcu quiescent state.  And an rcu_bh quiescent state while we
- * are at it, given that any rcu quiescent state is also an rcu_bh
- * quiescent state.  Use "+" instead of "||" to defeat short circuiting.
- */
+/* Record an rcu quiescent state.  */
 void rcu_sched_qs(void)
 {
 	unsigned long flags;
 
 	local_irq_save(flags);
-	if (rcu_qsctr_help(&rcu_sched_ctrlblk) +
-	    rcu_qsctr_help(&rcu_bh_ctrlblk))
-		raise_softirq(RCU_SOFTIRQ);
-	local_irq_restore(flags);
-}
-
-/*
- * Record an rcu_bh quiescent state.
- */
-void rcu_bh_qs(void)
-{
-	unsigned long flags;
-
-	local_irq_save(flags);
-	if (rcu_qsctr_help(&rcu_bh_ctrlblk))
+	if (rcu_sched_ctrlblk.donetail != rcu_sched_ctrlblk.curtail) {
+		rcu_sched_ctrlblk.donetail = rcu_sched_ctrlblk.curtail;
 		raise_softirq(RCU_SOFTIRQ);
+	}
 	local_irq_restore(flags);
 }
 
@@ -122,32 +80,27 @@ void rcu_check_callbacks(int user)
 {
 	if (user)
 		rcu_sched_qs();
-	if (user || !in_softirq())
-		rcu_bh_qs();
 }
 
-/*
- * Invoke the RCU callbacks on the specified rcu_ctrlkblk structure
- * whose grace period has elapsed.
- */
-static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp)
+/* Invoke the RCU callbacks whose grace period has elapsed.  */
+static __latent_entropy void rcu_process_callbacks(struct softirq_action *unused)
 {
 	struct rcu_head *next, *list;
 	unsigned long flags;
 
 	/* Move the ready-to-invoke callbacks to a local list. */
 	local_irq_save(flags);
-	if (rcp->donetail == &rcp->rcucblist) {
+	if (rcu_sched_ctrlblk.donetail == &rcu_sched_ctrlblk.rcucblist) {
 		/* No callbacks ready, so just leave. */
 		local_irq_restore(flags);
 		return;
 	}
-	list = rcp->rcucblist;
-	rcp->rcucblist = *rcp->donetail;
-	*rcp->donetail = NULL;
-	if (rcp->curtail == rcp->donetail)
-		rcp->curtail = &rcp->rcucblist;
-	rcp->donetail = &rcp->rcucblist;
+	list = rcu_sched_ctrlblk.rcucblist;
+	rcu_sched_ctrlblk.rcucblist = *rcu_sched_ctrlblk.donetail;
+	*rcu_sched_ctrlblk.donetail = NULL;
+	if (rcu_sched_ctrlblk.curtail == rcu_sched_ctrlblk.donetail)
+		rcu_sched_ctrlblk.curtail = &rcu_sched_ctrlblk.rcucblist;
+	rcu_sched_ctrlblk.donetail = &rcu_sched_ctrlblk.rcucblist;
 	local_irq_restore(flags);
 
 	/* Invoke the callbacks on the local list. */
@@ -162,19 +115,13 @@ static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp)
 	}
 }
 
-static __latent_entropy void rcu_process_callbacks(struct softirq_action *unused)
-{
-	__rcu_process_callbacks(&rcu_sched_ctrlblk);
-	__rcu_process_callbacks(&rcu_bh_ctrlblk);
-}
-
 /*
  * Wait for a grace period to elapse.  But it is illegal to invoke
  * synchronize_sched() from within an RCU read-side critical section.
  * Therefore, any legal call to synchronize_sched() is a quiescent
  * state, and so on a UP system, synchronize_sched() need do nothing.
- * Ditto for synchronize_rcu_bh().  (But Lai Jiangshan points out the
- * benefits of doing might_sleep() to reduce latency.)
+ * (But Lai Jiangshan points out the benefits of doing might_sleep()
+ * to reduce latency.)
  *
  * Cool, huh?  (Due to Josh Triplett.)
  */
@@ -188,11 +135,11 @@ void synchronize_sched(void)
 EXPORT_SYMBOL_GPL(synchronize_sched);
 
 /*
- * Helper function for call_rcu() and call_rcu_bh().
+ * Post an RCU callback to be invoked after the end of an RCU-sched grace
+ * period.  But since we have but one CPU, that would be after any
+ * quiescent state.
  */
-static void __call_rcu(struct rcu_head *head,
-		       rcu_callback_t func,
-		       struct rcu_ctrlblk *rcp)
+void call_rcu_sched(struct rcu_head *head, rcu_callback_t func)
 {
 	unsigned long flags;
 
@@ -201,8 +148,8 @@ static void __call_rcu(struct rcu_head *head,
 	head->next = NULL;
 
 	local_irq_save(flags);
-	*rcp->curtail = head;
-	rcp->curtail = &head->next;
+	*rcu_sched_ctrlblk.curtail = head;
+	rcu_sched_ctrlblk.curtail = &head->next;
 	local_irq_restore(flags);
 
 	if (unlikely(is_idle_task(current))) {
@@ -210,28 +157,8 @@ static void __call_rcu(struct rcu_head *head,
 		resched_cpu(0);
 	}
 }
-
-/*
- * Post an RCU callback to be invoked after the end of an RCU-sched grace
- * period.  But since we have but one CPU, that would be after any
- * quiescent state.
- */
-void call_rcu_sched(struct rcu_head *head, rcu_callback_t func)
-{
-	__call_rcu(head, func, &rcu_sched_ctrlblk);
-}
 EXPORT_SYMBOL_GPL(call_rcu_sched);
 
-/*
- * Post an RCU bottom-half callback to be invoked after any subsequent
- * quiescent state.
- */
-void call_rcu_bh(struct rcu_head *head, rcu_callback_t func)
-{
-	__call_rcu(head, func, &rcu_bh_ctrlblk);
-}
-EXPORT_SYMBOL_GPL(call_rcu_bh);
-
 void __init rcu_init(void)
 {
 	open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 9b6bd12133c4..b602d60462ba 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -108,7 +108,6 @@ struct rcu_state sname##_state = { \
 }
 
 RCU_STATE_INITIALIZER(rcu_sched, 's', call_rcu_sched);
-RCU_STATE_INITIALIZER(rcu_bh, 'b', call_rcu_bh);
 
 static struct rcu_state *const rcu_state_p;
 LIST_HEAD(rcu_struct_flavors);
@@ -244,17 +243,6 @@ void rcu_sched_qs(void)
 			   this_cpu_ptr(&rcu_sched_data), true);
 }
 
-void rcu_bh_qs(void)
-{
-	RCU_LOCKDEP_WARN(preemptible(), "rcu_bh_qs() invoked with preemption enabled!!!");
-	if (__this_cpu_read(rcu_bh_data.cpu_no_qs.s)) {
-		trace_rcu_grace_period(TPS("rcu_bh"),
-				       __this_cpu_read(rcu_bh_data.gp_seq),
-				       TPS("cpuqs"));
-		__this_cpu_write(rcu_bh_data.cpu_no_qs.b.norm, false);
-	}
-}
-
 void rcu_softirq_qs(void)
 {
 	rcu_sched_qs();
@@ -581,7 +569,7 @@ EXPORT_SYMBOL_GPL(rcu_sched_get_gp_seq);
  */
 unsigned long rcu_bh_get_gp_seq(void)
 {
-	return READ_ONCE(rcu_bh_state.gp_seq);
+	return READ_ONCE(rcu_state_p->gp_seq);
 }
 EXPORT_SYMBOL_GPL(rcu_bh_get_gp_seq);
 
@@ -621,7 +609,7 @@ EXPORT_SYMBOL_GPL(rcu_force_quiescent_state);
  */
 void rcu_bh_force_quiescent_state(void)
 {
-	force_quiescent_state(&rcu_bh_state);
+	force_quiescent_state(rcu_state_p);
 }
 EXPORT_SYMBOL_GPL(rcu_bh_force_quiescent_state);
 
@@ -680,10 +668,8 @@ void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags,
 
 	switch (test_type) {
 	case RCU_FLAVOR:
-		rsp = rcu_state_p;
-		break;
 	case RCU_BH_FLAVOR:
-		rsp = &rcu_bh_state;
+		rsp = rcu_state_p;
 		break;
 	case RCU_SCHED_FLAVOR:
 		rsp = &rcu_sched_state;
@@ -2672,26 +2658,15 @@ void rcu_check_callbacks(int user)
 		 * nested interrupt.  In this case, the CPU is in
 		 * a quiescent state, so note it.
 		 *
-		 * No memory barrier is required here because both
-		 * rcu_sched_qs() and rcu_bh_qs() reference only CPU-local
-		 * variables that other CPUs neither access nor modify,
-		 * at least not while the corresponding CPU is online.
+		 * No memory barrier is required here because
+		 * rcu_sched_qs() references only CPU-local variables
+		 * that other CPUs neither access nor modify, at least
+		 * not while the corresponding CPU is online.
 		 */
 
 		rcu_sched_qs();
-		rcu_bh_qs();
 		rcu_note_voluntary_context_switch(current);
 
-	} else if (!in_softirq()) {
-
-		/*
-		 * Get here if this CPU did not take its interrupt from
-		 * softirq, in other words, if it is not interrupting
-		 * a rcu_bh read-side critical section.  This is an _bh
-		 * critical section, so note it.
-		 */
-
-		rcu_bh_qs();
 	}
 	rcu_preempt_check_callbacks();
 	if (rcu_pending())
@@ -3078,34 +3053,6 @@ void call_rcu_sched(struct rcu_head *head, rcu_callback_t func)
 }
 EXPORT_SYMBOL_GPL(call_rcu_sched);
 
-/**
- * call_rcu_bh() - Queue an RCU for invocation after a quicker grace period.
- * @head: structure to be used for queueing the RCU updates.
- * @func: actual callback function to be invoked after the grace period
- *
- * The callback function will be invoked some time after a full grace
- * period elapses, in other words after all currently executing RCU
- * read-side critical sections have completed. call_rcu_bh() assumes
- * that the read-side critical sections end on completion of a softirq
- * handler. This means that read-side critical sections in process
- * context must not be interrupted by softirqs. This interface is to be
- * used when most of the read-side critical sections are in softirq context.
- * RCU read-side critical sections are delimited by:
- *
- * - rcu_read_lock() and  rcu_read_unlock(), if in interrupt context, OR
- * - rcu_read_lock_bh() and rcu_read_unlock_bh(), if in process context.
- *
- * These may be nested.
- *
- * See the description of call_rcu() for more detailed information on
- * memory ordering guarantees.
- */
-void call_rcu_bh(struct rcu_head *head, rcu_callback_t func)
-{
-	__call_rcu(head, func, &rcu_bh_state, -1, 0);
-}
-EXPORT_SYMBOL_GPL(call_rcu_bh);
-
 /*
  * Queue an RCU callback for lazy invocation after a grace period.
  * This will likely be later named something like "call_rcu_lazy()",
@@ -3190,33 +3137,6 @@ void synchronize_sched(void)
 }
 EXPORT_SYMBOL_GPL(synchronize_sched);
 
-/**
- * synchronize_rcu_bh - wait until an rcu_bh grace period has elapsed.
- *
- * Control will return to the caller some time after a full rcu_bh grace
- * period has elapsed, in other words after all currently executing rcu_bh
- * read-side critical sections have completed.  RCU read-side critical
- * sections are delimited by rcu_read_lock_bh() and rcu_read_unlock_bh(),
- * and may be nested.
- *
- * See the description of synchronize_sched() for more detailed information
- * on memory ordering guarantees.
- */
-void synchronize_rcu_bh(void)
-{
-	RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map) ||
-			 lock_is_held(&rcu_lock_map) ||
-			 lock_is_held(&rcu_sched_lock_map),
-			 "Illegal synchronize_rcu_bh() in RCU-bh read-side critical section");
-	if (rcu_blocking_is_gp())
-		return;
-	if (rcu_gp_is_expedited())
-		synchronize_rcu_bh_expedited();
-	else
-		wait_rcu_gp(call_rcu_bh);
-}
-EXPORT_SYMBOL_GPL(synchronize_rcu_bh);
-
 /**
  * get_state_synchronize_rcu - Snapshot current RCU state
  *
@@ -3528,7 +3448,7 @@ static void _rcu_barrier(struct rcu_state *rsp)
  */
 void rcu_barrier_bh(void)
 {
-	_rcu_barrier(&rcu_bh_state);
+	_rcu_barrier(rcu_state_p);
 }
 EXPORT_SYMBOL_GPL(rcu_barrier_bh);
 
@@ -4179,7 +4099,6 @@ void __init rcu_init(void)
 
 	rcu_bootup_announce();
 	rcu_init_geometry();
-	rcu_init_one(&rcu_bh_state);
 	rcu_init_one(&rcu_sched_state);
 	if (dump_tree)
 		rcu_dump_rcu_node_tree(&rcu_sched_state);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 0d7107fb3dec..1ff742a3c8d1 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1320,7 +1320,6 @@ static int rcu_spawn_one_boost_kthread(struct rcu_state *rsp,
 static void rcu_kthread_do_work(void)
 {
 	rcu_do_batch(&rcu_sched_state, this_cpu_ptr(&rcu_sched_data));
-	rcu_do_batch(&rcu_bh_state, this_cpu_ptr(&rcu_bh_data));
 	rcu_do_batch(&rcu_preempt_state, this_cpu_ptr(&rcu_preempt_data));
 }
 
diff --git a/kernel/softirq.c b/kernel/softirq.c
index ebd69694144a..7a0720a20003 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -301,7 +301,6 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 		pending >>= softirq_bit;
 	}
 
-	rcu_bh_qs();
 	if (__this_cpu_read(ksoftirqd) == current)
 		rcu_softirq_qs();
 	local_irq_disable();
-- 
2.17.1



* [PATCH tip/core/rcu 10/19] rcu: Update comments and help text for no more RCU-bh updaters
  2018-08-29 22:20 [PATCH tip/core/rcu 0/19] RCU flavor-consolidation changes for v4.20/v5.0 Paul E. McKenney
                   ` (8 preceding siblings ...)
  2018-08-29 22:20 ` [PATCH tip/core/rcu 09/19] rcu: Define RCU-bh update API in terms of RCU Paul E. McKenney
@ 2018-08-29 22:20 ` Paul E. McKenney
  2018-08-29 22:20 ` [PATCH tip/core/rcu 11/19] rcu: Drop "wake" parameter from rcu_report_exp_rdp() Paul E. McKenney
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 49+ messages in thread
From: Paul E. McKenney @ 2018-08-29 22:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg, joel,
	Paul E. McKenney

This commit updates comments and help text to account for the fact that
RCU-bh update-side functions are now simple wrappers for their RCU or
RCU-sched counterparts.
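
For reference, the reader/updater pairing that the updated
rcu_read_lock_bh() comment describes now looks roughly as follows (the
"struct cfg"/"gbl_cfg" names are hypothetical, not part of this patch):

	#include <linux/rcupdate.h>
	#include <linux/slab.h>

	struct cfg {
		int val;
	};

	static struct cfg __rcu *gbl_cfg;

	static int read_val(void)
	{
		struct cfg *p;
		int v = -1;

		rcu_read_lock_bh();		/* also disables softirqs */
		p = rcu_dereference_bh(gbl_cfg);
		if (p)
			v = p->val;
		rcu_read_unlock_bh();
		return v;
	}

	/* Caller serializes concurrent updates to gbl_cfg. */
	static void replace_cfg(struct cfg *newp, struct cfg *oldp)
	{
		rcu_assign_pointer(gbl_cfg, newp);
		synchronize_rcu();	/* now also waits out _bh readers */
		kfree(oldp);
	}

As the updated comments note, synchronize_rcu_bh() remains available as a
wrapper in the short term, but new code can use synchronize_rcu() directly.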

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/rcupdate.h      | 12 ++++--------
 include/linux/rcupdate_wait.h |  6 +++---
 include/linux/rcutree.h       | 14 ++------------
 kernel/rcu/Kconfig            | 10 +++++-----
 kernel/rcu/tree.c             | 17 +++++++++--------
 kernel/rcu/update.c           |  2 +-
 6 files changed, 24 insertions(+), 37 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 9ebfd436cec7..8d5740edd63c 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -688,14 +688,10 @@ static inline void rcu_read_unlock(void)
 /**
  * rcu_read_lock_bh() - mark the beginning of an RCU-bh critical section
  *
- * This is equivalent of rcu_read_lock(), but to be used when updates
- * are being done using call_rcu_bh() or synchronize_rcu_bh(). Since
- * both call_rcu_bh() and synchronize_rcu_bh() consider completion of a
- * softirq handler to be a quiescent state, a process in RCU read-side
- * critical section must be protected by disabling softirqs. Read-side
- * critical sections in interrupt context can use just rcu_read_lock(),
- * though this should at least be commented to avoid confusing people
- * reading the code.
+ * This is equivalent to rcu_read_lock(), but also disables softirqs.
+ * Note that synchronize_rcu() and friends may be used for the update
+ * side, although synchronize_rcu_bh() is available as a wrapper in the
+ * short term.  Longer term, the _bh update-side API will be eliminated.
  *
  * Note that rcu_read_lock_bh() and the matching rcu_read_unlock_bh()
  * must occur in the same context, for example, it is illegal to invoke
diff --git a/include/linux/rcupdate_wait.h b/include/linux/rcupdate_wait.h
index 57f371344152..bc104699560e 100644
--- a/include/linux/rcupdate_wait.h
+++ b/include/linux/rcupdate_wait.h
@@ -36,13 +36,13 @@ do {									\
  * @...: List of call_rcu() functions for the flavors to wait on.
  *
  * This macro waits concurrently for multiple flavors of RCU grace periods.
- * For example, synchronize_rcu_mult(call_rcu, call_rcu_bh) would wait
- * on concurrent RCU and RCU-bh grace periods.  Waiting on a give SRCU
+ * For example, synchronize_rcu_mult(call_rcu, call_rcu_sched) would wait
+ * on concurrent RCU and RCU-sched grace periods.  Waiting on a given SRCU
  * domain requires you to write a wrapper function for that SRCU domain's
  * call_srcu() function, supplying the corresponding srcu_struct.
  *
  * If Tiny RCU, tell _wait_rcu_gp() not to bother waiting for RCU
- * or RCU-bh, given that anywhere synchronize_rcu_mult() can be called
+ * or RCU-sched, given that anywhere synchronize_rcu_mult() can be called
  * is automatically a grace period.
  */
 #define synchronize_rcu_mult(...) \
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index c789c302a2c9..f7a41323aa54 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -58,18 +58,8 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func);
 /**
  * synchronize_rcu_bh_expedited - Brute-force RCU-bh grace period
  *
- * Wait for an RCU-bh grace period to elapse, but use a "big hammer"
- * approach to force the grace period to end quickly.  This consumes
- * significant time on all CPUs and is unfriendly to real-time workloads,
- * so is thus not recommended for any sort of common-case code.  In fact,
- * if you are using synchronize_rcu_bh_expedited() in a loop, please
- * restructure your code to batch your updates, and then use a single
- * synchronize_rcu_bh() instead.
- *
- * Note that it is illegal to call this function while holding any lock
- * that is acquired by a CPU-hotplug notifier.  And yes, it is also illegal
- * to call this function from a CPU-hotplug notifier.  Failing to observe
- * these restriction will result in deadlock.
+ * This is a transitional API and will soon be removed, with all
+ * callers converted to synchronize_rcu_expedited().
  */
 static inline void synchronize_rcu_bh_expedited(void)
 {
diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
index 9210379c0353..a0b7f0103ca9 100644
--- a/kernel/rcu/Kconfig
+++ b/kernel/rcu/Kconfig
@@ -229,11 +229,11 @@ config RCU_NOCB_CPU
 	  CPUs specified at boot time by the rcu_nocbs parameter.
 	  For each such CPU, a kthread ("rcuox/N") will be created to
 	  invoke callbacks, where the "N" is the CPU being offloaded,
-	  and where the "x" is "b" for RCU-bh, "p" for RCU-preempt, and
-	  "s" for RCU-sched.  Nothing prevents this kthread from running
-	  on the specified CPUs, but (1) the kthreads may be preempted
-	  between each callback, and (2) affinity or cgroups can be used
-	  to force the kthreads to run on whatever set of CPUs is desired.
+	  and where the "x" is "p" for RCU-preempt and "s" for RCU-sched.
+	  Nothing prevents this kthread from running on the specified
+	  CPUs, but (1) the kthreads may be preempted between each
+	  callback, and (2) affinity or cgroups can be used to force
+	  the kthreads to run on whatever set of CPUs is desired.
 
 	  Say Y here if you want to help to debug reduced OS jitter.
 	  Say N here if you are unsure.
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b602d60462ba..8d04cf2c6f76 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -565,7 +565,8 @@ unsigned long rcu_sched_get_gp_seq(void)
 EXPORT_SYMBOL_GPL(rcu_sched_get_gp_seq);
 
 /*
- * Return the number of RCU-bh GPs completed thus far for debug & stats.
+ * Return the number of RCU GPs completed thus far for debug & stats.
+ * This is a transitional API and will soon be removed.
  */
 unsigned long rcu_bh_get_gp_seq(void)
 {
@@ -3068,13 +3069,13 @@ void kfree_call_rcu(struct rcu_head *head,
 EXPORT_SYMBOL_GPL(kfree_call_rcu);
 
 /*
- * Because a context switch is a grace period for RCU-sched and RCU-bh,
- * any blocking grace-period wait automatically implies a grace period
- * if there is only one CPU online at any point time during execution
- * of either synchronize_sched() or synchronize_rcu_bh().  It is OK to
- * occasionally incorrectly indicate that there are multiple CPUs online
- * when there was in fact only one the whole time, as this just adds
- * some overhead: RCU still operates correctly.
+ * Because a context switch is a grace period for RCU-sched, any blocking
+ * grace-period wait automatically implies a grace period if there
+ * is only one CPU online at any point time during execution of either
+ * synchronize_sched() or synchronize_rcu_bh().  It is OK to occasionally
+ * incorrectly indicate that there are multiple CPUs online when there
+ * was in fact only one the whole time, as this just adds some overhead:
+ * RCU still operates correctly.
  */
 static int rcu_blocking_is_gp(void)
 {
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index 39cb23d22109..9ea87d0aa386 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -298,7 +298,7 @@ EXPORT_SYMBOL_GPL(rcu_read_lock_held);
  *
  * Check debug_lockdep_rcu_enabled() to prevent false positives during boot.
  *
- * Note that rcu_read_lock() is disallowed if the CPU is either idle or
+ * Note that rcu_read_lock_bh() is disallowed if the CPU is either idle or
  * offline from an RCU perspective, so check for those as well.
  */
 int rcu_read_lock_bh_held(void)
-- 
2.17.1



* [PATCH tip/core/rcu 11/19] rcu: Drop "wake" parameter from rcu_report_exp_rdp()
  2018-08-29 22:20 [PATCH tip/core/rcu 0/19] RCU flavor-consolidation changes for v4.20/v5.0 Paul E. McKenney
                   ` (9 preceding siblings ...)
  2018-08-29 22:20 ` [PATCH tip/core/rcu 10/19] rcu: Update comments and help text for no more RCU-bh updaters Paul E. McKenney
@ 2018-08-29 22:20 ` Paul E. McKenney
  2018-08-29 22:20 ` [PATCH tip/core/rcu 12/19] rcu: Fix typo in rcu_get_gp_kthreads_prio() header comment Paul E. McKenney
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 49+ messages in thread
From: Paul E. McKenney @ 2018-08-29 22:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg, joel,
	Paul E. McKenney

The rcu_report_exp_rdp() function is always invoked with its "wake"
argument set to "true", so this commit drops this parameter.  The only
potential call site that would use "false" is in the code driving the
expedited grace period, and that code uses rcu_report_exp_cpu_mult()
instead, which therefore retains its "wake" parameter.
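
Concretely, the per-CPU report path becomes the following thin wrapper
(abridged from the tree_exp.h hunk below), with the remaining "wake"
flexibility confined to rcu_report_exp_cpu_mult():

	static void rcu_report_exp_rdp(struct rcu_state *rsp, struct rcu_data *rdp)
	{
		WRITE_ONCE(rdp->deferred_qs, false);
		rcu_report_exp_cpu_mult(rsp, rdp->mynode, rdp->grpmask, true);
	}

Call sites therefore shrink from rcu_report_exp_rdp(rsp, rdp, true) to
rcu_report_exp_rdp(rsp, rdp).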

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c        | 9 +++------
 kernel/rcu/tree_exp.h    | 9 ++++-----
 kernel/rcu/tree_plugin.h | 6 +++---
 3 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8d04cf2c6f76..bb40d3598a0d 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -165,8 +165,7 @@ static void rcu_cleanup_dead_rnp(struct rcu_node *rnp_leaf);
 static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu);
 static void invoke_rcu_core(void);
 static void invoke_rcu_callbacks(struct rcu_state *rsp, struct rcu_data *rdp);
-static void rcu_report_exp_rdp(struct rcu_state *rsp,
-			       struct rcu_data *rdp, bool wake);
+static void rcu_report_exp_rdp(struct rcu_state *rsp, struct rcu_data *rdp);
 static void sync_sched_exp_online_cleanup(int cpu);
 
 /* rcuc/rcub kthread realtime priority */
@@ -239,8 +238,7 @@ void rcu_sched_qs(void)
 	if (!__this_cpu_read(rcu_sched_data.cpu_no_qs.b.exp))
 		return;
 	__this_cpu_write(rcu_sched_data.cpu_no_qs.b.exp, false);
-	rcu_report_exp_rdp(&rcu_sched_state,
-			   this_cpu_ptr(&rcu_sched_data), true);
+	rcu_report_exp_rdp(&rcu_sched_state, this_cpu_ptr(&rcu_sched_data));
 }
 
 void rcu_softirq_qs(void)
@@ -3757,8 +3755,7 @@ void rcu_report_dead(unsigned int cpu)
 
 	/* QS for any half-done expedited RCU-sched GP. */
 	preempt_disable();
-	rcu_report_exp_rdp(&rcu_sched_state,
-			   this_cpu_ptr(rcu_sched_state.rda), true);
+	rcu_report_exp_rdp(&rcu_sched_state, this_cpu_ptr(rcu_sched_state.rda));
 	preempt_enable();
 	rcu_preempt_deferred_qs(current);
 	for_each_rcu_flavor(rsp)
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index f9d5bbd8adce..0f8f225c1b46 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -259,11 +259,10 @@ static void rcu_report_exp_cpu_mult(struct rcu_state *rsp, struct rcu_node *rnp,
 /*
  * Report expedited quiescent state for specified rcu_data (CPU).
  */
-static void rcu_report_exp_rdp(struct rcu_state *rsp, struct rcu_data *rdp,
-			       bool wake)
+static void rcu_report_exp_rdp(struct rcu_state *rsp, struct rcu_data *rdp)
 {
 	WRITE_ONCE(rdp->deferred_qs, false);
-	rcu_report_exp_cpu_mult(rsp, rdp->mynode, rdp->grpmask, wake);
+	rcu_report_exp_cpu_mult(rsp, rdp->mynode, rdp->grpmask, true);
 }
 
 /* Common code for synchronize_{rcu,sched}_expedited() work-done checking. */
@@ -352,7 +351,7 @@ static void sync_sched_exp_handler(void *data)
 		return;
 	if (rcu_is_cpu_rrupt_from_idle()) {
 		rcu_report_exp_rdp(&rcu_sched_state,
-				   this_cpu_ptr(&rcu_sched_data), true);
+				   this_cpu_ptr(&rcu_sched_data));
 		return;
 	}
 	__this_cpu_write(rcu_sched_data.cpu_no_qs.b.exp, true);
@@ -750,7 +749,7 @@ static void sync_rcu_exp_handler(void *info)
 	if (!t->rcu_read_lock_nesting) {
 		if (!(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK)) ||
 		    rcu_dynticks_curr_cpu_in_eqs()) {
-			rcu_report_exp_rdp(rsp, rdp, true);
+			rcu_report_exp_rdp(rsp, rdp);
 		} else {
 			rdp->deferred_qs = true;
 			resched_cpu(rdp->cpu);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 1ff742a3c8d1..9f0d054e6c20 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -285,7 +285,7 @@ static void rcu_preempt_ctxt_queue(struct rcu_node *rnp, struct rcu_data *rdp)
 	 * still in a quiescent state in any case.)
 	 */
 	if (blkd_state & RCU_EXP_BLKD && rdp->deferred_qs)
-		rcu_report_exp_rdp(rdp->rsp, rdp, true);
+		rcu_report_exp_rdp(rdp->rsp, rdp);
 	else
 		WARN_ON_ONCE(rdp->deferred_qs);
 }
@@ -383,7 +383,7 @@ static void rcu_preempt_note_context_switch(bool preempt)
 	 */
 	rcu_preempt_qs();
 	if (rdp->deferred_qs)
-		rcu_report_exp_rdp(rcu_state_p, rdp, true);
+		rcu_report_exp_rdp(rcu_state_p, rdp);
 }
 
 /*
@@ -508,7 +508,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
 	 * blocked-tasks list below.
 	 */
 	if (rdp->deferred_qs) {
-		rcu_report_exp_rdp(rcu_state_p, rdp, true);
+		rcu_report_exp_rdp(rcu_state_p, rdp);
 		if (!t->rcu_read_unlock_special.s) {
 			local_irq_restore(flags);
 			return;
-- 
2.17.1



* [PATCH tip/core/rcu 12/19] rcu: Fix typo in rcu_get_gp_kthreads_prio() header comment
  2018-08-29 22:20 [PATCH tip/core/rcu 0/19] RCU flavor-consolidation changes for v4.20/v5.0 Paul E. McKenney
                   ` (10 preceding siblings ...)
  2018-08-29 22:20 ` [PATCH tip/core/rcu 11/19] rcu: Drop "wake" parameter from rcu_report_exp_rdp() Paul E. McKenney
@ 2018-08-29 22:20 ` Paul E. McKenney
  2018-08-29 22:20 ` [PATCH tip/core/rcu 13/19] rcu: Define RCU-sched API in terms of RCU for Tree RCU PREEMPT builds Paul E. McKenney
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 49+ messages in thread
From: Paul E. McKenney @ 2018-08-29 22:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg, joel,
	Paul E. McKenney

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index bb40d3598a0d..5cc035dc61cb 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -181,7 +181,7 @@ module_param(gp_init_delay, int, 0444);
 static int gp_cleanup_delay;
 module_param(gp_cleanup_delay, int, 0444);
 
-/* Retreive RCU kthreads priority for rcutorture */
+/* Retrieve RCU kthreads priority for rcutorture */
 int rcu_get_gp_kthreads_prio(void)
 {
 	return kthread_prio;
-- 
2.17.1



* [PATCH tip/core/rcu 13/19] rcu: Define RCU-sched API in terms of RCU for Tree RCU PREEMPT builds
  2018-08-29 22:20 [PATCH tip/core/rcu 0/19] RCU flavor-consolidation changes for v4.20/v5.0 Paul E. McKenney
                   ` (11 preceding siblings ...)
  2018-08-29 22:20 ` [PATCH tip/core/rcu 12/19] rcu: Fix typo in rcu_get_gp_kthreads_prio() header comment Paul E. McKenney
@ 2018-08-29 22:20 ` Paul E. McKenney
  2018-08-29 22:20 ` [PATCH tip/core/rcu 14/19] rcu: Express Tiny RCU updates in terms of RCU rather than RCU-sched Paul E. McKenney
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 49+ messages in thread
From: Paul E. McKenney @ 2018-08-29 22:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg, joel,
	Paul E. McKenney, Andi Kleen

Now that RCU-preempt knows about preemption disabling, its implementation
of synchronize_rcu() works for synchronize_sched(), and likewise for the
other RCU-sched update-side API members.  This commit therefore confines
the RCU-sched update-side code to CONFIG_PREEMPT=n builds, and defines
RCU-sched's update-side API members in terms of those of RCU-preempt.

This means that any given build of the Linux kernel has only one
update-side flavor of RCU, namely RCU-preempt for CONFIG_PREEMPT=y builds
and RCU-sched for CONFIG_PREEMPT=n builds.  This in turn means that kernels
built with CONFIG_RCU_NOCB_CPU=y have only one rcuo kthread per CPU.
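
In code terms, the Tree RCU update-side mapping boils down to the
following (condensed from the rcutree.h and tree.c hunks below):

	void synchronize_sched(void)
	{
		synchronize_rcu();	/* This is transitional. */
	}

	void call_rcu_sched(struct rcu_head *head, rcu_callback_t func)
	{
		call_rcu(head, func);
	}

	static inline void synchronize_sched_expedited(void)
	{
		synchronize_rcu_expedited();
	}

Every RCU-sched update-side request is thus serviced by the build's single
remaining flavor, RCU-preempt for CONFIG_PREEMPT=y or RCU-sched for
CONFIG_PREEMPT=n.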

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
---
 include/linux/rcupdate.h |  14 +-
 include/linux/rcutiny.h  |   7 +
 include/linux/rcutree.h  |   7 +-
 kernel/rcu/tree.c        | 301 +++++++++++++--------------------------
 kernel/rcu/tree.h        |   9 +-
 kernel/rcu/tree_exp.h    | 153 ++++++++++----------
 kernel/rcu/tree_plugin.h | 297 ++++++++++++++------------------------
 7 files changed, 308 insertions(+), 480 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 8d5740edd63c..94474bb6b5c4 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -49,11 +49,11 @@
 
 /* Exported common interfaces */
 
-#ifdef CONFIG_PREEMPT_RCU
-void call_rcu(struct rcu_head *head, rcu_callback_t func);
-#else /* #ifdef CONFIG_PREEMPT_RCU */
+#ifdef CONFIG_TINY_RCU
 #define	call_rcu	call_rcu_sched
-#endif /* #else #ifdef CONFIG_PREEMPT_RCU */
+#else
+void call_rcu(struct rcu_head *head, rcu_callback_t func);
+#endif
 
 void call_rcu_sched(struct rcu_head *head, rcu_callback_t func);
 void synchronize_sched(void);
@@ -92,11 +92,6 @@ static inline void __rcu_read_unlock(void)
 		preempt_enable();
 }
 
-static inline void synchronize_rcu(void)
-{
-	synchronize_sched();
-}
-
 static inline int rcu_preempt_depth(void)
 {
 	return 0;
@@ -107,7 +102,6 @@ static inline int rcu_preempt_depth(void)
 /* Internal to kernel */
 void rcu_init(void);
 extern int rcu_scheduler_active __read_mostly;
-void rcu_sched_qs(void);
 void rcu_check_callbacks(int user);
 void rcu_report_dead(unsigned int cpu);
 void rcutree_migrate_callbacks(int cpu);
diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index ac26c27ccde8..df2c0895c5e7 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -36,6 +36,11 @@ static inline int rcu_dynticks_snap(struct rcu_dynticks *rdtp)
 /* Never flag non-existent other CPUs! */
 static inline bool rcu_eqs_special_set(int cpu) { return false; }
 
+static inline void synchronize_rcu(void)
+{
+	synchronize_sched();
+}
+
 static inline unsigned long get_state_synchronize_rcu(void)
 {
 	return 0;
@@ -94,6 +99,8 @@ static inline void kfree_call_rcu(struct rcu_head *head,
 	call_rcu(head, func);
 }
 
+void rcu_sched_qs(void);
+
 static inline void rcu_softirq_qs(void)
 {
 	rcu_sched_qs();
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index f7a41323aa54..0c44720f0e84 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -45,14 +45,19 @@ static inline void rcu_virt_note_context_switch(int cpu)
 	rcu_note_context_switch(false);
 }
 
+void synchronize_rcu(void);
 static inline void synchronize_rcu_bh(void)
 {
 	synchronize_rcu();
 }
 
-void synchronize_sched_expedited(void);
 void synchronize_rcu_expedited(void);
 
+static inline void synchronize_sched_expedited(void)
+{
+	synchronize_rcu_expedited();
+}
+
 void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func);
 
 /**
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 5cc035dc61cb..a8965a7caf25 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -92,24 +92,29 @@ static const char *tp_##sname##_varname __used __tracepoint_string = sname##_var
 
 #define RCU_STATE_INITIALIZER(sname, sabbr, cr) \
 DEFINE_RCU_TPS(sname) \
-static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, sname##_data); \
-struct rcu_state sname##_state = { \
-	.level = { &sname##_state.node[0] }, \
-	.rda = &sname##_data, \
+static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data); \
+struct rcu_state rcu_state = { \
+	.level = { &rcu_state.node[0] }, \
+	.rda = &rcu_data, \
 	.call = cr, \
 	.gp_state = RCU_GP_IDLE, \
 	.gp_seq = (0UL - 300UL) << RCU_SEQ_CTR_SHIFT, \
-	.barrier_mutex = __MUTEX_INITIALIZER(sname##_state.barrier_mutex), \
+	.barrier_mutex = __MUTEX_INITIALIZER(rcu_state.barrier_mutex), \
 	.name = RCU_STATE_NAME(sname), \
 	.abbr = sabbr, \
-	.exp_mutex = __MUTEX_INITIALIZER(sname##_state.exp_mutex), \
-	.exp_wake_mutex = __MUTEX_INITIALIZER(sname##_state.exp_wake_mutex), \
-	.ofl_lock = __SPIN_LOCK_UNLOCKED(sname##_state.ofl_lock), \
+	.exp_mutex = __MUTEX_INITIALIZER(rcu_state.exp_mutex), \
+	.exp_wake_mutex = __MUTEX_INITIALIZER(rcu_state.exp_wake_mutex), \
+	.ofl_lock = __SPIN_LOCK_UNLOCKED(rcu_state.ofl_lock), \
 }
 
-RCU_STATE_INITIALIZER(rcu_sched, 's', call_rcu_sched);
+#ifdef CONFIG_PREEMPT_RCU
+RCU_STATE_INITIALIZER(rcu_preempt, 'p', call_rcu);
+#else
+RCU_STATE_INITIALIZER(rcu_sched, 's', call_rcu);
+#endif
 
-static struct rcu_state *const rcu_state_p;
+static struct rcu_state *const rcu_state_p = &rcu_state;
+static struct rcu_data __percpu *const rcu_data_p = &rcu_data;
 LIST_HEAD(rcu_struct_flavors);
 
 /* Dump rcu_node combining tree at boot to verify correct setup. */
@@ -220,31 +225,9 @@ static int rcu_gp_in_progress(struct rcu_state *rsp)
 	return rcu_seq_state(rcu_seq_current(&rsp->gp_seq));
 }
 
-/*
- * Note a quiescent state.  Because we do not need to know
- * how many quiescent states passed, just if there was at least
- * one since the start of the grace period, this just sets a flag.
- * The caller must have disabled preemption.
- */
-void rcu_sched_qs(void)
-{
-	RCU_LOCKDEP_WARN(preemptible(), "rcu_sched_qs() invoked with preemption enabled!!!");
-	if (!__this_cpu_read(rcu_sched_data.cpu_no_qs.s))
-		return;
-	trace_rcu_grace_period(TPS("rcu_sched"),
-			       __this_cpu_read(rcu_sched_data.gp_seq),
-			       TPS("cpuqs"));
-	__this_cpu_write(rcu_sched_data.cpu_no_qs.b.norm, false);
-	if (!__this_cpu_read(rcu_sched_data.cpu_no_qs.b.exp))
-		return;
-	__this_cpu_write(rcu_sched_data.cpu_no_qs.b.exp, false);
-	rcu_report_exp_rdp(&rcu_sched_state, this_cpu_ptr(&rcu_sched_data));
-}
-
 void rcu_softirq_qs(void)
 {
-	rcu_sched_qs();
-	rcu_preempt_qs();
+	rcu_qs();
 	rcu_preempt_deferred_qs(current);
 }
 
@@ -418,31 +401,18 @@ static void rcu_momentary_dyntick_idle(void)
 	rcu_preempt_deferred_qs(current);
 }
 
-/*
- * Note a context switch.  This is a quiescent state for RCU-sched,
- * and requires special handling for preemptible RCU.
- * The caller must have disabled interrupts.
+/**
+ * rcu_is_cpu_rrupt_from_idle - see if idle or immediately interrupted from idle
+ *
+ * If the current CPU is idle or running at a first-level (not nested)
+ * interrupt from idle, return true.  The caller must have at least
+ * disabled preemption.
  */
-void rcu_note_context_switch(bool preempt)
+static int rcu_is_cpu_rrupt_from_idle(void)
 {
-	barrier(); /* Avoid RCU read-side critical sections leaking down. */
-	trace_rcu_utilization(TPS("Start context switch"));
-	rcu_sched_qs();
-	rcu_preempt_note_context_switch(preempt);
-	/* Load rcu_urgent_qs before other flags. */
-	if (!smp_load_acquire(this_cpu_ptr(&rcu_dynticks.rcu_urgent_qs)))
-		goto out;
-	this_cpu_write(rcu_dynticks.rcu_urgent_qs, false);
-	if (unlikely(raw_cpu_read(rcu_dynticks.rcu_need_heavy_qs)))
-		rcu_momentary_dyntick_idle();
-	this_cpu_inc(rcu_dynticks.rcu_qs_ctr);
-	if (!preempt)
-		rcu_tasks_qs(current);
-out:
-	trace_rcu_utilization(TPS("End context switch"));
-	barrier(); /* Avoid RCU read-side critical sections leaking up. */
+	return __this_cpu_read(rcu_dynticks.dynticks_nesting) <= 0 &&
+	       __this_cpu_read(rcu_dynticks.dynticks_nmi_nesting) <= 1;
 }
-EXPORT_SYMBOL_GPL(rcu_note_context_switch);
 
 /*
  * Register a quiescent state for all RCU flavors.  If there is an
@@ -476,8 +446,8 @@ void rcu_all_qs(void)
 		rcu_momentary_dyntick_idle();
 		local_irq_restore(flags);
 	}
-	if (unlikely(raw_cpu_read(rcu_sched_data.cpu_no_qs.b.exp)))
-		rcu_sched_qs();
+	if (unlikely(raw_cpu_read(rcu_data.cpu_no_qs.b.exp)))
+		rcu_qs();
 	this_cpu_inc(rcu_dynticks.rcu_qs_ctr);
 	barrier(); /* Avoid RCU read-side critical sections leaking up. */
 	preempt_enable();
@@ -558,7 +528,7 @@ EXPORT_SYMBOL_GPL(rcu_get_gp_seq);
  */
 unsigned long rcu_sched_get_gp_seq(void)
 {
-	return READ_ONCE(rcu_sched_state.gp_seq);
+	return rcu_get_gp_seq();
 }
 EXPORT_SYMBOL_GPL(rcu_sched_get_gp_seq);
 
@@ -590,7 +560,7 @@ EXPORT_SYMBOL_GPL(rcu_exp_batches_completed);
  */
 unsigned long rcu_exp_batches_completed_sched(void)
 {
-	return rcu_sched_state.expedited_sequence;
+	return rcu_state.expedited_sequence;
 }
 EXPORT_SYMBOL_GPL(rcu_exp_batches_completed_sched);
 
@@ -617,7 +587,7 @@ EXPORT_SYMBOL_GPL(rcu_bh_force_quiescent_state);
  */
 void rcu_sched_force_quiescent_state(void)
 {
-	force_quiescent_state(&rcu_sched_state);
+	rcu_force_quiescent_state();
 }
 EXPORT_SYMBOL_GPL(rcu_sched_force_quiescent_state);
 
@@ -668,10 +638,8 @@ void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags,
 	switch (test_type) {
 	case RCU_FLAVOR:
 	case RCU_BH_FLAVOR:
-		rsp = rcu_state_p;
-		break;
 	case RCU_SCHED_FLAVOR:
-		rsp = &rcu_sched_state;
+		rsp = rcu_state_p;
 		break;
 	default:
 		break;
@@ -1106,19 +1074,6 @@ EXPORT_SYMBOL_GPL(rcu_lockdep_current_cpu_online);
 
 #endif /* #if defined(CONFIG_PROVE_RCU) && defined(CONFIG_HOTPLUG_CPU) */
 
-/**
- * rcu_is_cpu_rrupt_from_idle - see if idle or immediately interrupted from idle
- *
- * If the current CPU is idle or running at a first-level (not nested)
- * interrupt from idle, return true.  The caller must have at least
- * disabled preemption.
- */
-static int rcu_is_cpu_rrupt_from_idle(void)
-{
-	return __this_cpu_read(rcu_dynticks.dynticks_nesting) <= 0 &&
-	       __this_cpu_read(rcu_dynticks.dynticks_nmi_nesting) <= 1;
-}
-
 /*
  * We are reporting a quiescent state on behalf of some other CPU, so
  * it is our responsibility to check for and handle potential overflow
@@ -2363,7 +2318,7 @@ rcu_report_unblock_qs_rnp(struct rcu_state *rsp,
 	struct rcu_node *rnp_p;
 
 	raw_lockdep_assert_held_rcu_node(rnp);
-	if (WARN_ON_ONCE(rcu_state_p == &rcu_sched_state) ||
+	if (WARN_ON_ONCE(!IS_ENABLED(CONFIG_PREEMPT)) ||
 	    WARN_ON_ONCE(rsp != rcu_state_p) ||
 	    WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp(rnp)) ||
 	    rnp->qsmask != 0) {
@@ -2649,25 +2604,7 @@ void rcu_check_callbacks(int user)
 {
 	trace_rcu_utilization(TPS("Start scheduler-tick"));
 	increment_cpu_stall_ticks();
-	if (user || rcu_is_cpu_rrupt_from_idle()) {
-
-		/*
-		 * Get here if this CPU took its interrupt from user
-		 * mode or from the idle loop, and if this is not a
-		 * nested interrupt.  In this case, the CPU is in
-		 * a quiescent state, so note it.
-		 *
-		 * No memory barrier is required here because
-		 * rcu_sched_qs() references only CPU-local variables
-		 * that other CPUs neither access nor modify, at least
-		 * not while the corresponding CPU is online.
-		 */
-
-		rcu_sched_qs();
-		rcu_note_voluntary_context_switch(current);
-
-	}
-	rcu_preempt_check_callbacks();
+	rcu_flavor_check_callbacks(user);
 	if (rcu_pending())
 		invoke_rcu_core();
 
@@ -2693,7 +2630,7 @@ static void force_qs_rnp(struct rcu_state *rsp, int (*f)(struct rcu_data *rsp))
 		mask = 0;
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 		if (rnp->qsmask == 0) {
-			if (rcu_state_p == &rcu_sched_state ||
+			if (!IS_ENABLED(CONFIG_PREEMPT) ||
 			    rsp != rcu_state_p ||
 			    rcu_preempt_blocked_readers_cgp(rnp)) {
 				/*
@@ -3027,28 +2964,56 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func,
 }
 
 /**
- * call_rcu_sched() - Queue an RCU for invocation after sched grace period.
+ * call_rcu() - Queue an RCU callback for invocation after a grace period.
  * @head: structure to be used for queueing the RCU updates.
  * @func: actual callback function to be invoked after the grace period
  *
  * The callback function will be invoked some time after a full grace
- * period elapses, in other words after all currently executing RCU
- * read-side critical sections have completed. call_rcu_sched() assumes
- * that the read-side critical sections end on enabling of preemption
- * or on voluntary preemption.
- * RCU read-side critical sections are delimited by:
- *
- * - rcu_read_lock_sched() and rcu_read_unlock_sched(), OR
- * - anything that disables preemption.
- *
- *  These may be nested.
+ * period elapses, in other words after all pre-existing RCU read-side
+ * critical sections have completed.  However, the callback function
+ * might well execute concurrently with RCU read-side critical sections
+ * that started after call_rcu() was invoked.  RCU read-side critical
+ * sections are delimited by rcu_read_lock() and rcu_read_unlock(), and
+ * may be nested.  In addition, regions of code across which interrupts,
+ * preemption, or softirqs have been disabled also serve as RCU read-side
+ * critical sections.  This includes hardware interrupt handlers, softirq
+ * handlers, and NMI handlers.
+ *
+ * Note that all CPUs must agree that the grace period extended beyond
+ * all pre-existing RCU read-side critical sections.  On systems with more
+ * than one CPU, this means that when "func()" is invoked, each CPU is
+ * guaranteed to have executed a full memory barrier since the end of its
+ * last RCU read-side critical section whose beginning preceded the call
+ * to call_rcu().  It also means that each CPU executing an RCU read-side
+ * critical section that continues beyond the start of "func()" must have
+ * executed a memory barrier after the call_rcu() but before the beginning
+ * of that RCU read-side critical section.  Note that these guarantees
+ * include CPUs that are offline, idle, or executing in user mode, as
+ * well as CPUs that are executing in the kernel.
+ *
+ * Furthermore, if CPU A invoked call_rcu() and CPU B invoked the
+ * resulting RCU callback function "func()", then both CPU A and CPU B are
+ * guaranteed to execute a full memory barrier during the time interval
+ * between the call to call_rcu() and the invocation of "func()" -- even
+ * if CPU A and CPU B are the same CPU (but again only if the system has
+ * more than one CPU).
+ */
+void call_rcu(struct rcu_head *head, rcu_callback_t func)
+{
+	__call_rcu(head, func, rcu_state_p, -1, 0);
+}
+EXPORT_SYMBOL_GPL(call_rcu);
+
+/**
+ * call_rcu_sched() - Queue an RCU for invocation after sched grace period.
+ * @head: structure to be used for queueing the RCU updates.
+ * @func: actual callback function to be invoked after the grace period
  *
- * See the description of call_rcu() for more detailed information on
- * memory ordering guarantees.
+ * This is transitional.
  */
 void call_rcu_sched(struct rcu_head *head, rcu_callback_t func)
 {
-	__call_rcu(head, func, &rcu_sched_state, -1, 0);
+	call_rcu(head, func);
 }
 EXPORT_SYMBOL_GPL(call_rcu_sched);
 
@@ -3066,73 +3031,14 @@ void kfree_call_rcu(struct rcu_head *head,
 }
 EXPORT_SYMBOL_GPL(kfree_call_rcu);
 
-/*
- * Because a context switch is a grace period for RCU-sched, any blocking
- * grace-period wait automatically implies a grace period if there
- * is only one CPU online at any point time during execution of either
- * synchronize_sched() or synchronize_rcu_bh().  It is OK to occasionally
- * incorrectly indicate that there are multiple CPUs online when there
- * was in fact only one the whole time, as this just adds some overhead:
- * RCU still operates correctly.
- */
-static int rcu_blocking_is_gp(void)
-{
-	int ret;
-
-	might_sleep();  /* Check for RCU read-side critical section. */
-	preempt_disable();
-	ret = num_online_cpus() <= 1;
-	preempt_enable();
-	return ret;
-}
-
 /**
  * synchronize_sched - wait until an rcu-sched grace period has elapsed.
  *
- * Control will return to the caller some time after a full rcu-sched
- * grace period has elapsed, in other words after all currently executing
- * rcu-sched read-side critical sections have completed.   These read-side
- * critical sections are delimited by rcu_read_lock_sched() and
- * rcu_read_unlock_sched(), and may be nested.  Note that preempt_disable(),
- * local_irq_disable(), and so on may be used in place of
- * rcu_read_lock_sched().
- *
- * This means that all preempt_disable code sequences, including NMI and
- * non-threaded hardware-interrupt handlers, in progress on entry will
- * have completed before this primitive returns.  However, this does not
- * guarantee that softirq handlers will have completed, since in some
- * kernels, these handlers can run in process context, and can block.
- *
- * Note that this guarantee implies further memory-ordering guarantees.
- * On systems with more than one CPU, when synchronize_sched() returns,
- * each CPU is guaranteed to have executed a full memory barrier since the
- * end of its last RCU-sched read-side critical section whose beginning
- * preceded the call to synchronize_sched().  In addition, each CPU having
- * an RCU read-side critical section that extends beyond the return from
- * synchronize_sched() is guaranteed to have executed a full memory barrier
- * after the beginning of synchronize_sched() and before the beginning of
- * that RCU read-side critical section.  Note that these guarantees include
- * CPUs that are offline, idle, or executing in user mode, as well as CPUs
- * that are executing in the kernel.
- *
- * Furthermore, if CPU A invoked synchronize_sched(), which returned
- * to its caller on CPU B, then both CPU A and CPU B are guaranteed
- * to have executed a full memory barrier during the execution of
- * synchronize_sched() -- even if CPU A and CPU B are the same CPU (but
- * again only if the system has more than one CPU).
+ * This is transitional.
  */
 void synchronize_sched(void)
 {
-	RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map) ||
-			 lock_is_held(&rcu_lock_map) ||
-			 lock_is_held(&rcu_sched_lock_map),
-			 "Illegal synchronize_sched() in RCU-sched read-side critical section");
-	if (rcu_blocking_is_gp())
-		return;
-	if (rcu_gp_is_expedited())
-		synchronize_sched_expedited();
-	else
-		wait_rcu_gp(call_rcu_sched);
+	synchronize_rcu();
 }
 EXPORT_SYMBOL_GPL(synchronize_sched);
 
@@ -3180,41 +3086,23 @@ EXPORT_SYMBOL_GPL(cond_synchronize_rcu);
 /**
  * get_state_synchronize_sched - Snapshot current RCU-sched state
  *
- * Returns a cookie that is used by a later call to cond_synchronize_sched()
- * to determine whether or not a full grace period has elapsed in the
- * meantime.
+ * This is transitional, and only used by rcutorture.
  */
 unsigned long get_state_synchronize_sched(void)
 {
-	/*
-	 * Any prior manipulation of RCU-protected data must happen
-	 * before the load from ->gp_seq.
-	 */
-	smp_mb();  /* ^^^ */
-	return rcu_seq_snap(&rcu_sched_state.gp_seq);
+	return get_state_synchronize_rcu();
 }
 EXPORT_SYMBOL_GPL(get_state_synchronize_sched);
 
 /**
  * cond_synchronize_sched - Conditionally wait for an RCU-sched grace period
- *
  * @oldstate: return value from earlier call to get_state_synchronize_sched()
  *
- * If a full RCU-sched grace period has elapsed since the earlier call to
- * get_state_synchronize_sched(), just return.  Otherwise, invoke
- * synchronize_sched() to wait for a full grace period.
- *
- * Yes, this function does not take counter wrap into account.  But
- * counter wrap is harmless.  If the counter wraps, we have waited for
- * more than 2 billion grace periods (and way more on a 64-bit system!),
- * so waiting for one additional grace period should be just fine.
+ * This is transitional and only used by rcutorture.
  */
 void cond_synchronize_sched(unsigned long oldstate)
 {
-	if (!rcu_seq_done(&rcu_sched_state.gp_seq, oldstate))
-		synchronize_sched();
-	else
-		smp_mb(); /* Ensure GP ends before subsequent accesses. */
+	cond_synchronize_rcu(oldstate);
 }
 EXPORT_SYMBOL_GPL(cond_synchronize_sched);
 
@@ -3451,12 +3339,28 @@ void rcu_barrier_bh(void)
 }
 EXPORT_SYMBOL_GPL(rcu_barrier_bh);
 
+/**
+ * rcu_barrier - Wait until all in-flight call_rcu() callbacks complete.
+ *
+ * Note that this primitive does not necessarily wait for an RCU grace period
+ * to complete.  For example, if there are no RCU callbacks queued anywhere
+ * in the system, then rcu_barrier() is within its rights to return
+ * immediately, without waiting for anything, much less an RCU grace period.
+ */
+void rcu_barrier(void)
+{
+	_rcu_barrier(rcu_state_p);
+}
+EXPORT_SYMBOL_GPL(rcu_barrier);
+
 /**
  * rcu_barrier_sched - Wait for in-flight call_rcu_sched() callbacks.
+ *
+ * This is transitional.
  */
 void rcu_barrier_sched(void)
 {
-	_rcu_barrier(&rcu_sched_state);
+	rcu_barrier();
 }
 EXPORT_SYMBOL_GPL(rcu_barrier_sched);
 
@@ -3755,7 +3659,7 @@ void rcu_report_dead(unsigned int cpu)
 
 	/* QS for any half-done expedited RCU-sched GP. */
 	preempt_disable();
-	rcu_report_exp_rdp(&rcu_sched_state, this_cpu_ptr(rcu_sched_state.rda));
+	rcu_report_exp_rdp(&rcu_state, this_cpu_ptr(rcu_state.rda));
 	preempt_enable();
 	rcu_preempt_deferred_qs(current);
 	for_each_rcu_flavor(rsp)
@@ -4097,10 +4001,9 @@ void __init rcu_init(void)
 
 	rcu_bootup_announce();
 	rcu_init_geometry();
-	rcu_init_one(&rcu_sched_state);
+	rcu_init_one(&rcu_state);
 	if (dump_tree)
-		rcu_dump_rcu_node_tree(&rcu_sched_state);
-	__rcu_init_preempt();
+		rcu_dump_rcu_node_tree(&rcu_state);
 	open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
 
 	/*
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index e02c882861eb..38658ca87dcb 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -225,9 +225,6 @@ struct rcu_data {
 
 	/* 5) _rcu_barrier(), OOM callbacks, and expediting. */
 	struct rcu_head barrier_head;
-#ifdef CONFIG_RCU_FAST_NO_HZ
-	struct rcu_head oom_head;
-#endif /* #ifdef CONFIG_RCU_FAST_NO_HZ */
 	int exp_dynticks_snap;		/* Double-check need for IPI. */
 
 	/* 6) Callback offloading. */
@@ -433,8 +430,7 @@ DECLARE_PER_CPU(char, rcu_cpu_has_work);
 
 /* Forward declarations for rcutree_plugin.h */
 static void rcu_bootup_announce(void);
-static void rcu_preempt_qs(void);
-static void rcu_preempt_note_context_switch(bool preempt);
+static void rcu_qs(void);
 static int rcu_preempt_blocked_readers_cgp(struct rcu_node *rnp);
 #ifdef CONFIG_HOTPLUG_CPU
 static bool rcu_preempt_has_tasks(struct rcu_node *rnp);
@@ -444,9 +440,8 @@ static int rcu_print_task_stall(struct rcu_node *rnp);
 static int rcu_print_task_exp_stall(struct rcu_node *rnp);
 static void rcu_preempt_check_blocked_tasks(struct rcu_state *rsp,
 					    struct rcu_node *rnp);
-static void rcu_preempt_check_callbacks(void);
+static void rcu_flavor_check_callbacks(int user);
 void call_rcu(struct rcu_head *head, rcu_callback_t func);
-static void __init __rcu_init_preempt(void);
 static void dump_blkd_tasks(struct rcu_state *rsp, struct rcu_node *rnp,
 			    int ncheck);
 static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags);
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 0f8f225c1b46..5619edfd414e 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -265,7 +265,7 @@ static void rcu_report_exp_rdp(struct rcu_state *rsp, struct rcu_data *rdp)
 	rcu_report_exp_cpu_mult(rsp, rdp->mynode, rdp->grpmask, true);
 }
 
-/* Common code for synchronize_{rcu,sched}_expedited() work-done checking. */
+/* Common code for work-done checking. */
 static bool sync_exp_work_done(struct rcu_state *rsp, unsigned long s)
 {
 	if (rcu_exp_gp_seq_done(rsp, s)) {
@@ -337,45 +337,6 @@ static bool exp_funnel_lock(struct rcu_state *rsp, unsigned long s)
 	return false;
 }
 
-/* Invoked on each online non-idle CPU for expedited quiescent state. */
-static void sync_sched_exp_handler(void *data)
-{
-	struct rcu_data *rdp;
-	struct rcu_node *rnp;
-	struct rcu_state *rsp = data;
-
-	rdp = this_cpu_ptr(rsp->rda);
-	rnp = rdp->mynode;
-	if (!(READ_ONCE(rnp->expmask) & rdp->grpmask) ||
-	    __this_cpu_read(rcu_sched_data.cpu_no_qs.b.exp))
-		return;
-	if (rcu_is_cpu_rrupt_from_idle()) {
-		rcu_report_exp_rdp(&rcu_sched_state,
-				   this_cpu_ptr(&rcu_sched_data));
-		return;
-	}
-	__this_cpu_write(rcu_sched_data.cpu_no_qs.b.exp, true);
-	/* Store .exp before .rcu_urgent_qs. */
-	smp_store_release(this_cpu_ptr(&rcu_dynticks.rcu_urgent_qs), true);
-	resched_cpu(smp_processor_id());
-}
-
-/* Send IPI for expedited cleanup if needed at end of CPU-hotplug operation. */
-static void sync_sched_exp_online_cleanup(int cpu)
-{
-	struct rcu_data *rdp;
-	int ret;
-	struct rcu_node *rnp;
-	struct rcu_state *rsp = &rcu_sched_state;
-
-	rdp = per_cpu_ptr(rsp->rda, cpu);
-	rnp = rdp->mynode;
-	if (!(READ_ONCE(rnp->expmask) & rdp->grpmask))
-		return;
-	ret = smp_call_function_single(cpu, sync_sched_exp_handler, rsp, 0);
-	WARN_ON_ONCE(ret);
-}
-
 /*
  * Select the CPUs within the specified rcu_node that the upcoming
  * expedited grace period needs to wait for.
@@ -691,39 +652,6 @@ static void _synchronize_rcu_expedited(struct rcu_state *rsp,
 	mutex_unlock(&rsp->exp_mutex);
 }
 
-/**
- * synchronize_sched_expedited - Brute-force RCU-sched grace period
- *
- * Wait for an RCU-sched grace period to elapse, but use a "big hammer"
- * approach to force the grace period to end quickly.  This consumes
- * significant time on all CPUs and is unfriendly to real-time workloads,
- * so is thus not recommended for any sort of common-case code.  In fact,
- * if you are using synchronize_sched_expedited() in a loop, please
- * restructure your code to batch your updates, and then use a single
- * synchronize_sched() instead.
- *
- * This implementation can be thought of as an application of sequence
- * locking to expedited grace periods, but using the sequence counter to
- * determine when someone else has already done the work instead of for
- * retrying readers.
- */
-void synchronize_sched_expedited(void)
-{
-	struct rcu_state *rsp = &rcu_sched_state;
-
-	RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map) ||
-			 lock_is_held(&rcu_lock_map) ||
-			 lock_is_held(&rcu_sched_lock_map),
-			 "Illegal synchronize_sched_expedited() in RCU read-side critical section");
-
-	/* If only one CPU, this is automatically a grace period. */
-	if (rcu_blocking_is_gp())
-		return;
-
-	_synchronize_rcu_expedited(rsp, sync_sched_exp_handler);
-}
-EXPORT_SYMBOL_GPL(synchronize_sched_expedited);
-
 #ifdef CONFIG_PREEMPT_RCU
 
 /*
@@ -801,6 +729,11 @@ static void sync_rcu_exp_handler(void *info)
 		resched_cpu(rdp->cpu);
 }
 
+/* PREEMPT=y, so no RCU-sched to clean up after. */
+static void sync_sched_exp_online_cleanup(int cpu)
+{
+}
+
 /**
  * synchronize_rcu_expedited - Brute-force RCU grace period
  *
@@ -818,6 +751,8 @@ static void sync_rcu_exp_handler(void *info)
  * you are using synchronize_rcu_expedited() in a loop, please restructure
  * your code to batch your updates, and then Use a single synchronize_rcu()
  * instead.
+ *
+ * This has the same semantics as (but is more brutal than) synchronize_rcu().
  */
 void synchronize_rcu_expedited(void)
 {
@@ -836,13 +771,79 @@ EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
 
 #else /* #ifdef CONFIG_PREEMPT_RCU */
 
+/* Invoked on each online non-idle CPU for expedited quiescent state. */
+static void sync_sched_exp_handler(void *data)
+{
+	struct rcu_data *rdp;
+	struct rcu_node *rnp;
+	struct rcu_state *rsp = data;
+
+	rdp = this_cpu_ptr(rsp->rda);
+	rnp = rdp->mynode;
+	if (!(READ_ONCE(rnp->expmask) & rdp->grpmask) ||
+	    __this_cpu_read(rcu_data.cpu_no_qs.b.exp))
+		return;
+	if (rcu_is_cpu_rrupt_from_idle()) {
+		rcu_report_exp_rdp(&rcu_state, this_cpu_ptr(&rcu_data));
+		return;
+	}
+	__this_cpu_write(rcu_data.cpu_no_qs.b.exp, true);
+	/* Store .exp before .rcu_urgent_qs. */
+	smp_store_release(this_cpu_ptr(&rcu_dynticks.rcu_urgent_qs), true);
+	resched_cpu(smp_processor_id());
+}
+
+/* Send IPI for expedited cleanup if needed at end of CPU-hotplug operation. */
+static void sync_sched_exp_online_cleanup(int cpu)
+{
+	struct rcu_data *rdp;
+	int ret;
+	struct rcu_node *rnp;
+	struct rcu_state *rsp = &rcu_state;
+
+	rdp = per_cpu_ptr(rsp->rda, cpu);
+	rnp = rdp->mynode;
+	if (!(READ_ONCE(rnp->expmask) & rdp->grpmask))
+		return;
+	ret = smp_call_function_single(cpu, sync_sched_exp_handler, rsp, 0);
+	WARN_ON_ONCE(ret);
+}
+
 /*
- * Wait for an rcu-preempt grace period, but make it happen quickly.
- * But because preemptible RCU does not exist, map to rcu-sched.
+ * Because a context switch is a grace period for RCU-sched, any blocking
+ * grace-period wait automatically implies a grace period if there
+ * is only one CPU online at any point in time during execution of either
+ * synchronize_sched() or synchronize_rcu_bh().  It is OK to occasionally
+ * incorrectly indicate that there are multiple CPUs online when there
+ * was in fact only one the whole time, as this just adds some overhead:
+ * RCU still operates correctly.
  */
+static int rcu_blocking_is_gp(void)
+{
+	int ret;
+
+	might_sleep();  /* Check for RCU read-side critical section. */
+	preempt_disable();
+	ret = num_online_cpus() <= 1;
+	preempt_enable();
+	return ret;
+}
+
+/* PREEMPT=n implementation of synchronize_rcu_expedited(). */
 void synchronize_rcu_expedited(void)
 {
-	synchronize_sched_expedited();
+	struct rcu_state *rsp = &rcu_state;
+
+	RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map) ||
+			 lock_is_held(&rcu_lock_map) ||
+			 lock_is_held(&rcu_sched_lock_map),
+			 "Illegal synchronize_rcu_expedited() in RCU read-side critical section");
+
+	/* If only one CPU, this is automatically a grace period. */
+	if (rcu_blocking_is_gp())
+		return;
+
+	_synchronize_rcu_expedited(rsp, sync_sched_exp_handler);
 }
 EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
 
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 9f0d054e6c20..2c81f8dd63b4 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -123,10 +123,6 @@ static void __init rcu_bootup_announce_oddness(void)
 
 #ifdef CONFIG_PREEMPT_RCU
 
-RCU_STATE_INITIALIZER(rcu_preempt, 'p', call_rcu);
-static struct rcu_state *const rcu_state_p = &rcu_preempt_state;
-static struct rcu_data __percpu *const rcu_data_p = &rcu_preempt_data;
-
 static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
 			       bool wake);
 static void rcu_read_unlock_special(struct task_struct *t);
@@ -303,15 +299,15 @@ static void rcu_preempt_ctxt_queue(struct rcu_node *rnp, struct rcu_data *rdp)
  *
  * Callers to this function must disable preemption.
  */
-static void rcu_preempt_qs(void)
+static void rcu_qs(void)
 {
-	RCU_LOCKDEP_WARN(preemptible(), "rcu_preempt_qs() invoked with preemption enabled!!!\n");
+	RCU_LOCKDEP_WARN(preemptible(), "rcu_qs() invoked with preemption enabled!!!\n");
 	if (__this_cpu_read(rcu_data_p->cpu_no_qs.s)) {
 		trace_rcu_grace_period(TPS("rcu_preempt"),
 				       __this_cpu_read(rcu_data_p->gp_seq),
 				       TPS("cpuqs"));
 		__this_cpu_write(rcu_data_p->cpu_no_qs.b.norm, false);
-		barrier(); /* Coordinate with rcu_preempt_check_callbacks(). */
+		barrier(); /* Coordinate with rcu_flavor_check_callbacks(). */
 		current->rcu_read_unlock_special.b.need_qs = false;
 	}
 }
@@ -329,12 +325,14 @@ static void rcu_preempt_qs(void)
  *
  * Caller must disable interrupts.
  */
-static void rcu_preempt_note_context_switch(bool preempt)
+void rcu_note_context_switch(bool preempt)
 {
 	struct task_struct *t = current;
 	struct rcu_data *rdp = this_cpu_ptr(rcu_state_p->rda);
 	struct rcu_node *rnp;
 
+	barrier(); /* Avoid RCU read-side critical sections leaking down. */
+	trace_rcu_utilization(TPS("Start context switch"));
 	lockdep_assert_irqs_disabled();
 	WARN_ON_ONCE(!preempt && t->rcu_read_lock_nesting > 0);
 	if (t->rcu_read_lock_nesting > 0 &&
@@ -381,10 +379,13 @@ static void rcu_preempt_note_context_switch(bool preempt)
 	 * grace period, then the fact that the task has been enqueued
 	 * means that we continue to block the current grace period.
 	 */
-	rcu_preempt_qs();
+	rcu_qs();
 	if (rdp->deferred_qs)
 		rcu_report_exp_rdp(rcu_state_p, rdp);
+	trace_rcu_utilization(TPS("End context switch"));
+	barrier(); /* Avoid RCU read-side critical sections leaking up. */
 }
+EXPORT_SYMBOL_GPL(rcu_note_context_switch);
 
 /*
  * Check for preempted RCU readers blocking the current grace period
@@ -493,7 +494,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
 		return;
 	}
 	if (special.b.need_qs) {
-		rcu_preempt_qs();
+		rcu_qs();
 		t->rcu_read_unlock_special.b.need_qs = false;
 		if (!t->rcu_read_unlock_special.s && !rdp->deferred_qs) {
 			local_irq_restore(flags);
@@ -596,7 +597,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
  */
 static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
 {
-	return (this_cpu_ptr(&rcu_preempt_data)->deferred_qs ||
+	return (this_cpu_ptr(&rcu_data)->deferred_qs ||
 		READ_ONCE(t->rcu_read_unlock_special.s)) &&
 	       t->rcu_read_lock_nesting <= 0;
 }
@@ -781,11 +782,14 @@ rcu_preempt_check_blocked_tasks(struct rcu_state *rsp, struct rcu_node *rnp)
  *
  * Caller must disable hard irqs.
  */
-static void rcu_preempt_check_callbacks(void)
+static void rcu_flavor_check_callbacks(int user)
 {
-	struct rcu_state *rsp = &rcu_preempt_state;
+	struct rcu_state *rsp = &rcu_state;
 	struct task_struct *t = current;
 
+	if (user || rcu_is_cpu_rrupt_from_idle()) {
+		rcu_note_voluntary_context_switch(current);
+	}
 	if (t->rcu_read_lock_nesting > 0 ||
 	    (preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK))) {
 		/* No QS, force context switch if deferred. */
@@ -795,7 +799,7 @@ static void rcu_preempt_check_callbacks(void)
 		rcu_preempt_deferred_qs(t); /* Report deferred QS. */
 		return;
 	} else if (!t->rcu_read_lock_nesting) {
-		rcu_preempt_qs(); /* Report immediate QS. */
+		rcu_qs(); /* Report immediate QS. */
 		return;
 	}
 
@@ -808,44 +812,6 @@ static void rcu_preempt_check_callbacks(void)
 		t->rcu_read_unlock_special.b.need_qs = true;
 }
 
-/**
- * call_rcu() - Queue an RCU callback for invocation after a grace period.
- * @head: structure to be used for queueing the RCU updates.
- * @func: actual callback function to be invoked after the grace period
- *
- * The callback function will be invoked some time after a full grace
- * period elapses, in other words after all pre-existing RCU read-side
- * critical sections have completed.  However, the callback function
- * might well execute concurrently with RCU read-side critical sections
- * that started after call_rcu() was invoked.  RCU read-side critical
- * sections are delimited by rcu_read_lock() and rcu_read_unlock(),
- * and may be nested.
- *
- * Note that all CPUs must agree that the grace period extended beyond
- * all pre-existing RCU read-side critical section.  On systems with more
- * than one CPU, this means that when "func()" is invoked, each CPU is
- * guaranteed to have executed a full memory barrier since the end of its
- * last RCU read-side critical section whose beginning preceded the call
- * to call_rcu().  It also means that each CPU executing an RCU read-side
- * critical section that continues beyond the start of "func()" must have
- * executed a memory barrier after the call_rcu() but before the beginning
- * of that RCU read-side critical section.  Note that these guarantees
- * include CPUs that are offline, idle, or executing in user mode, as
- * well as CPUs that are executing in the kernel.
- *
- * Furthermore, if CPU A invoked call_rcu() and CPU B invoked the
- * resulting RCU callback function "func()", then both CPU A and CPU B are
- * guaranteed to execute a full memory barrier during the time interval
- * between the call to call_rcu() and the invocation of "func()" -- even
- * if CPU A and CPU B are the same CPU (but again only if the system has
- * more than one CPU).
- */
-void call_rcu(struct rcu_head *head, rcu_callback_t func)
-{
-	__call_rcu(head, func, rcu_state_p, -1, 0);
-}
-EXPORT_SYMBOL_GPL(call_rcu);
-
 /**
  * synchronize_rcu - wait until a grace period has elapsed.
  *
@@ -856,14 +822,28 @@ EXPORT_SYMBOL_GPL(call_rcu);
  * concurrently with new RCU read-side critical sections that began while
  * synchronize_rcu() was waiting.  RCU read-side critical sections are
  * delimited by rcu_read_lock() and rcu_read_unlock(), and may be nested.
+ * In addition, regions of code across which interrupts, preemption, or
+ * softirqs have been disabled also serve as RCU read-side critical
+ * sections.  This includes hardware interrupt handlers, softirq handlers,
+ * and NMI handlers.
+ *
+ * Note that this guarantee implies further memory-ordering guarantees.
+ * On systems with more than one CPU, when synchronize_rcu() returns,
+ * each CPU is guaranteed to have executed a full memory barrier since the
+ * end of its last RCU-sched read-side critical section whose beginning
+ * preceded the call to synchronize_rcu().  In addition, each CPU having
+ * an RCU read-side critical section that extends beyond the return from
+ * synchronize_rcu() is guaranteed to have executed a full memory barrier
+ * after the beginning of synchronize_rcu() and before the beginning of
+ * that RCU read-side critical section.  Note that these guarantees include
+ * CPUs that are offline, idle, or executing in user mode, as well as CPUs
+ * that are executing in the kernel.
  *
- * See the description of synchronize_sched() for more detailed
- * information on memory-ordering guarantees.  However, please note
- * that -only- the memory-ordering guarantees apply.  For example,
- * synchronize_rcu() is -not- guaranteed to wait on things like code
- * protected by preempt_disable(), instead, synchronize_rcu() is -only-
- * guaranteed to wait on RCU read-side critical sections, that is, sections
- * of code protected by rcu_read_lock().
+ * Furthermore, if CPU A invoked synchronize_rcu(), which returned
+ * to its caller on CPU B, then both CPU A and CPU B are guaranteed
+ * to have executed a full memory barrier during the execution of
+ * synchronize_rcu() -- even if CPU A and CPU B are the same CPU (but
+ * again only if the system has more than one CPU).
  */
 void synchronize_rcu(void)
 {
@@ -880,28 +860,6 @@ void synchronize_rcu(void)
 }
 EXPORT_SYMBOL_GPL(synchronize_rcu);
 
-/**
- * rcu_barrier - Wait until all in-flight call_rcu() callbacks complete.
- *
- * Note that this primitive does not necessarily wait for an RCU grace period
- * to complete.  For example, if there are no RCU callbacks queued anywhere
- * in the system, then rcu_barrier() is within its rights to return
- * immediately, without waiting for anything, much less an RCU grace period.
- */
-void rcu_barrier(void)
-{
-	_rcu_barrier(rcu_state_p);
-}
-EXPORT_SYMBOL_GPL(rcu_barrier);
-
-/*
- * Initialize preemptible RCU's state structures.
- */
-static void __init __rcu_init_preempt(void)
-{
-	rcu_init_one(rcu_state_p);
-}
-
 /*
  * Check for a task exiting while in a preemptible-RCU read-side
  * critical section, clean up if so.  No need to issue warnings,
@@ -964,8 +922,6 @@ dump_blkd_tasks(struct rcu_state *rsp, struct rcu_node *rnp, int ncheck)
 
 #else /* #ifdef CONFIG_PREEMPT_RCU */
 
-static struct rcu_state *const rcu_state_p = &rcu_sched_state;
-
 /*
  * Tell them what RCU they are running.
  */
@@ -975,18 +931,48 @@ static void __init rcu_bootup_announce(void)
 	rcu_bootup_announce_oddness();
 }
 
-/* Because preemptible RCU does not exist, we can ignore its QSes. */
-static void rcu_preempt_qs(void)
+/*
+ * Note a quiescent state for PREEMPT=n.  Because we do not need to know
+ * how many quiescent states passed, just if there was at least one since
+ * the start of the grace period, this just sets a flag.  The caller must
+ * have disabled preemption.
+ */
+static void rcu_qs(void)
 {
+	RCU_LOCKDEP_WARN(preemptible(), "rcu_qs() invoked with preemption enabled!!!");
+	if (!__this_cpu_read(rcu_data.cpu_no_qs.s))
+		return;
+	trace_rcu_grace_period(TPS("rcu_sched"),
+			       __this_cpu_read(rcu_data.gp_seq), TPS("cpuqs"));
+	__this_cpu_write(rcu_data.cpu_no_qs.b.norm, false);
+	if (!__this_cpu_read(rcu_data.cpu_no_qs.b.exp))
+		return;
+	__this_cpu_write(rcu_data.cpu_no_qs.b.exp, false);
+	rcu_report_exp_rdp(&rcu_state, this_cpu_ptr(&rcu_data));
 }
 
 /*
- * Because preemptible RCU does not exist, we never have to check for
- * CPUs being in quiescent states.
+ * Note a PREEMPT=n context switch.  The caller must have disabled interrupts.
  */
-static void rcu_preempt_note_context_switch(bool preempt)
+void rcu_note_context_switch(bool preempt)
 {
+	barrier(); /* Avoid RCU read-side critical sections leaking down. */
+	trace_rcu_utilization(TPS("Start context switch"));
+	rcu_qs();
+	/* Load rcu_urgent_qs before other flags. */
+	if (!smp_load_acquire(this_cpu_ptr(&rcu_dynticks.rcu_urgent_qs)))
+		goto out;
+	this_cpu_write(rcu_dynticks.rcu_urgent_qs, false);
+	if (unlikely(raw_cpu_read(rcu_dynticks.rcu_need_heavy_qs)))
+		rcu_momentary_dyntick_idle();
+	this_cpu_inc(rcu_dynticks.rcu_qs_ctr);
+	if (!preempt)
+		rcu_tasks_qs(current);
+out:
+	trace_rcu_utilization(TPS("End context switch"));
+	barrier(); /* Avoid RCU read-side critical sections leaking up. */
 }
+EXPORT_SYMBOL_GPL(rcu_note_context_switch);
 
 /*
  * Because preemptible RCU does not exist, there are never any preempted
@@ -1054,29 +1040,48 @@ rcu_preempt_check_blocked_tasks(struct rcu_state *rsp, struct rcu_node *rnp)
 }
 
 /*
- * Because preemptible RCU does not exist, it never has any callbacks
- * to check.
+ * Check to see if this CPU is in a non-context-switch quiescent state
+ * (user mode or idle loop for rcu, non-softirq execution for rcu_bh).
+ * Also schedule RCU core processing.
+ *
+ * This function must be called from hardirq context.  It is normally
+ * invoked from the scheduling-clock interrupt.
  */
-static void rcu_preempt_check_callbacks(void)
+static void rcu_flavor_check_callbacks(int user)
 {
-}
+	if (user || rcu_is_cpu_rrupt_from_idle()) {
 
-/*
- * Because preemptible RCU does not exist, rcu_barrier() is just
- * another name for rcu_barrier_sched().
- */
-void rcu_barrier(void)
-{
-	rcu_barrier_sched();
+		/*
+		 * Get here if this CPU took its interrupt from user
+		 * mode or from the idle loop, and if this is not a
+		 * nested interrupt.  In this case, the CPU is in
+		 * a quiescent state, so note it.
+		 *
+		 * No memory barrier is required here because rcu_qs()
+		 * references only CPU-local variables that other CPUs
+		 * neither access nor modify, at least not while the
+		 * corresponding CPU is online.
+		 */
+
+		rcu_qs();
+	}
 }
-EXPORT_SYMBOL_GPL(rcu_barrier);
 
-/*
- * Because preemptible RCU does not exist, it need not be initialized.
- */
-static void __init __rcu_init_preempt(void)
+/* PREEMPT=n implementation of synchronize_rcu(). */
+void synchronize_rcu(void)
 {
+	RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map) ||
+			 lock_is_held(&rcu_lock_map) ||
+			 lock_is_held(&rcu_sched_lock_map),
+			 "Illegal synchronize_rcu() in RCU-sched read-side critical section");
+	if (rcu_blocking_is_gp())
+		return;
+	if (rcu_gp_is_expedited())
+		synchronize_rcu_expedited();
+	else
+		wait_rcu_gp(call_rcu);
 }
+EXPORT_SYMBOL_GPL(synchronize_rcu);
 
 /*
  * Because preemptible RCU does not exist, tasks cannot possibly exit
@@ -1319,8 +1324,7 @@ static int rcu_spawn_one_boost_kthread(struct rcu_state *rsp,
 
 static void rcu_kthread_do_work(void)
 {
-	rcu_do_batch(&rcu_sched_state, this_cpu_ptr(&rcu_sched_data));
-	rcu_do_batch(&rcu_preempt_state, this_cpu_ptr(&rcu_preempt_data));
+	rcu_do_batch(&rcu_state, this_cpu_ptr(&rcu_data));
 }
 
 static void rcu_cpu_kthread_setup(unsigned int cpu)
@@ -1727,87 +1731,6 @@ static void rcu_idle_count_callbacks_posted(void)
 	__this_cpu_add(rcu_dynticks.nonlazy_posted, 1);
 }
 
-/*
- * Data for flushing lazy RCU callbacks at OOM time.
- */
-static atomic_t oom_callback_count;
-static DECLARE_WAIT_QUEUE_HEAD(oom_callback_wq);
-
-/*
- * RCU OOM callback -- decrement the outstanding count and deliver the
- * wake-up if we are the last one.
- */
-static void rcu_oom_callback(struct rcu_head *rhp)
-{
-	if (atomic_dec_and_test(&oom_callback_count))
-		wake_up(&oom_callback_wq);
-}
-
-/*
- * Post an rcu_oom_notify callback on the current CPU if it has at
- * least one lazy callback.  This will unnecessarily post callbacks
- * to CPUs that already have a non-lazy callback at the end of their
- * callback list, but this is an infrequent operation, so accept some
- * extra overhead to keep things simple.
- */
-static void rcu_oom_notify_cpu(void *unused)
-{
-	struct rcu_state *rsp;
-	struct rcu_data *rdp;
-
-	for_each_rcu_flavor(rsp) {
-		rdp = raw_cpu_ptr(rsp->rda);
-		if (rcu_segcblist_n_lazy_cbs(&rdp->cblist)) {
-			atomic_inc(&oom_callback_count);
-			rsp->call(&rdp->oom_head, rcu_oom_callback);
-		}
-	}
-}
-
-/*
- * If low on memory, ensure that each CPU has a non-lazy callback.
- * This will wake up CPUs that have only lazy callbacks, in turn
- * ensuring that they free up the corresponding memory in a timely manner.
- * Because an uncertain amount of memory will be freed in some uncertain
- * timeframe, we do not claim to have freed anything.
- */
-static int rcu_oom_notify(struct notifier_block *self,
-			  unsigned long notused, void *nfreed)
-{
-	int cpu;
-
-	/* Wait for callbacks from earlier instance to complete. */
-	wait_event(oom_callback_wq, atomic_read(&oom_callback_count) == 0);
-	smp_mb(); /* Ensure callback reuse happens after callback invocation. */
-
-	/*
-	 * Prevent premature wakeup: ensure that all increments happen
-	 * before there is a chance of the counter reaching zero.
-	 */
-	atomic_set(&oom_callback_count, 1);
-
-	for_each_online_cpu(cpu) {
-		smp_call_function_single(cpu, rcu_oom_notify_cpu, NULL, 1);
-		cond_resched_tasks_rcu_qs();
-	}
-
-	/* Unconditionally decrement: no need to wake ourselves up. */
-	atomic_dec(&oom_callback_count);
-
-	return NOTIFY_OK;
-}
-
-static struct notifier_block rcu_oom_nb = {
-	.notifier_call = rcu_oom_notify
-};
-
-static int __init rcu_register_oom_notifier(void)
-{
-	register_oom_notifier(&rcu_oom_nb);
-	return 0;
-}
-early_initcall(rcu_register_oom_notifier);
-
 #endif /* #else #if !defined(CONFIG_RCU_FAST_NO_HZ) */
 
 #ifdef CONFIG_RCU_FAST_NO_HZ
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH tip/core/rcu 14/19] rcu: Express Tiny RCU updates in terms of RCU rather than RCU-sched
  2018-08-29 22:20 [PATCH tip/core/rcu 0/19] RCU flavor-consolidation changes for v4.20/v5.0 Paul E. McKenney
                   ` (12 preceding siblings ...)
  2018-08-29 22:20 ` [PATCH tip/core/rcu 13/19] rcu: Define RCU-sched API in terms of RCU for Tree RCU PREEMPT builds Paul E. McKenney
@ 2018-08-29 22:20 ` Paul E. McKenney
  2018-08-29 22:20 ` [PATCH tip/core/rcu 15/19] rcu: Remove RCU_STATE_INITIALIZER() Paul E. McKenney
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 49+ messages in thread
From: Paul E. McKenney @ 2018-08-29 22:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg, joel,
	Paul E. McKenney

This commit renames Tiny RCU functions so that the lowest level of
functionality is RCU (e.g., synchronize_rcu()) rather than RCU-sched
(e.g., synchronize_sched()).  This provides greater naming compatibility
with Tree RCU, which will in turn permit more LoC removal once
the RCU-sched and RCU-bh update-side APIs are removed.
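
For illustration only (not part of the patch), the old RCU-sched names survive
on Tiny RCU as trivial static-inline wrappers around the RCU names, so existing
callers keep compiling unchanged; a minimal sketch mirroring the rcutiny.h
hunks below:

	static inline void synchronize_sched(void)
	{
		synchronize_rcu();	/* Tiny RCU: only one CPU, so trivial. */
	}

	static inline void call_rcu_sched(struct rcu_head *head, rcu_callback_t func)
	{
		call_rcu(head, func);	/* One CPU, hence one callback list. */
	}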

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Fix Tiny call_rcu()'s EXPORT_SYMBOL() in response to a bug
  report from kbuild test robot. ]
---
 include/linux/rcupdate.h | 12 +++++-----
 include/linux/rcutiny.h  | 34 +++++++++++++++-------------
 include/linux/rcutree.h  |  1 -
 kernel/rcu/tiny.c        | 48 ++++++++++++++++++++--------------------
 4 files changed, 48 insertions(+), 47 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 94474bb6b5c4..1207c6c9bd8b 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -49,15 +49,14 @@
 
 /* Exported common interfaces */
 
-#ifdef CONFIG_TINY_RCU
-#define	call_rcu	call_rcu_sched
-#else
-void call_rcu(struct rcu_head *head, rcu_callback_t func);
+#ifndef CONFIG_TINY_RCU
+void synchronize_sched(void);
+void call_rcu_sched(struct rcu_head *head, rcu_callback_t func);
 #endif
 
-void call_rcu_sched(struct rcu_head *head, rcu_callback_t func);
-void synchronize_sched(void);
+void call_rcu(struct rcu_head *head, rcu_callback_t func);
 void rcu_barrier_tasks(void);
+void synchronize_rcu(void);
 
 static inline void call_rcu_bh(struct rcu_head *head, rcu_callback_t func)
 {
@@ -68,7 +67,6 @@ static inline void call_rcu_bh(struct rcu_head *head, rcu_callback_t func)
 
 void __rcu_read_lock(void);
 void __rcu_read_unlock(void);
-void synchronize_rcu(void);
 
 /*
  * Defined as a macro as it is a very low level header included from
diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index df2c0895c5e7..e66fb8bc2127 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -36,9 +36,9 @@ static inline int rcu_dynticks_snap(struct rcu_dynticks *rdtp)
 /* Never flag non-existent other CPUs! */
 static inline bool rcu_eqs_special_set(int cpu) { return false; }
 
-static inline void synchronize_rcu(void)
+static inline void synchronize_sched(void)
 {
-	synchronize_sched();
+	synchronize_rcu();
 }
 
 static inline unsigned long get_state_synchronize_rcu(void)
@@ -61,16 +61,11 @@ static inline void cond_synchronize_sched(unsigned long oldstate)
 	might_sleep();
 }
 
-static inline void synchronize_rcu_expedited(void)
-{
-	synchronize_sched();	/* Only one CPU, so pretty fast anyway!!! */
-}
+extern void rcu_barrier(void);
 
-extern void rcu_barrier_sched(void);
-
-static inline void rcu_barrier(void)
+static inline void rcu_barrier_sched(void)
 {
-	rcu_barrier_sched();  /* Only one CPU, so only one list of callbacks! */
+	rcu_barrier();  /* Only one CPU, so only one list of callbacks! */
 }
 
 static inline void rcu_barrier_bh(void)
@@ -88,27 +83,36 @@ static inline void synchronize_rcu_bh_expedited(void)
 	synchronize_sched();
 }
 
+static inline void synchronize_rcu_expedited(void)
+{
+	synchronize_sched();
+}
+
 static inline void synchronize_sched_expedited(void)
 {
 	synchronize_sched();
 }
 
-static inline void kfree_call_rcu(struct rcu_head *head,
-				  rcu_callback_t func)
+static inline void call_rcu_sched(struct rcu_head *head, rcu_callback_t func)
+{
+	call_rcu(head, func);
+}
+
+static inline void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 {
 	call_rcu(head, func);
 }
 
-void rcu_sched_qs(void);
+void rcu_qs(void);
 
 static inline void rcu_softirq_qs(void)
 {
-	rcu_sched_qs();
+	rcu_qs();
 }
 
 #define rcu_note_context_switch(preempt) \
 	do { \
-		rcu_sched_qs(); \
+		rcu_qs(); \
 		rcu_tasks_qs(current); \
 	} while (0)
 
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index 0c44720f0e84..6d30a0809300 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -45,7 +45,6 @@ static inline void rcu_virt_note_context_switch(int cpu)
 	rcu_note_context_switch(false);
 }
 
-void synchronize_rcu(void);
 static inline void synchronize_rcu_bh(void)
 {
 	synchronize_rcu();
diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
index cadcf63c4889..30826fb6e438 100644
--- a/kernel/rcu/tiny.c
+++ b/kernel/rcu/tiny.c
@@ -46,25 +46,25 @@ struct rcu_ctrlblk {
 };
 
 /* Definition for rcupdate control block. */
-static struct rcu_ctrlblk rcu_sched_ctrlblk = {
-	.donetail	= &rcu_sched_ctrlblk.rcucblist,
-	.curtail	= &rcu_sched_ctrlblk.rcucblist,
+static struct rcu_ctrlblk rcu_ctrlblk = {
+	.donetail	= &rcu_ctrlblk.rcucblist,
+	.curtail	= &rcu_ctrlblk.rcucblist,
 };
 
-void rcu_barrier_sched(void)
+void rcu_barrier(void)
 {
-	wait_rcu_gp(call_rcu_sched);
+	wait_rcu_gp(call_rcu);
 }
-EXPORT_SYMBOL(rcu_barrier_sched);
+EXPORT_SYMBOL(rcu_barrier);
 
 /* Record an rcu quiescent state.  */
-void rcu_sched_qs(void)
+void rcu_qs(void)
 {
 	unsigned long flags;
 
 	local_irq_save(flags);
-	if (rcu_sched_ctrlblk.donetail != rcu_sched_ctrlblk.curtail) {
-		rcu_sched_ctrlblk.donetail = rcu_sched_ctrlblk.curtail;
+	if (rcu_ctrlblk.donetail != rcu_ctrlblk.curtail) {
+		rcu_ctrlblk.donetail = rcu_ctrlblk.curtail;
 		raise_softirq(RCU_SOFTIRQ);
 	}
 	local_irq_restore(flags);
@@ -79,7 +79,7 @@ void rcu_sched_qs(void)
 void rcu_check_callbacks(int user)
 {
 	if (user)
-		rcu_sched_qs();
+		rcu_qs();
 }
 
 /* Invoke the RCU callbacks whose grace period has elapsed.  */
@@ -90,17 +90,17 @@ static __latent_entropy void rcu_process_callbacks(struct softirq_action *unused
 
 	/* Move the ready-to-invoke callbacks to a local list. */
 	local_irq_save(flags);
-	if (rcu_sched_ctrlblk.donetail == &rcu_sched_ctrlblk.rcucblist) {
+	if (rcu_ctrlblk.donetail == &rcu_ctrlblk.rcucblist) {
 		/* No callbacks ready, so just leave. */
 		local_irq_restore(flags);
 		return;
 	}
-	list = rcu_sched_ctrlblk.rcucblist;
-	rcu_sched_ctrlblk.rcucblist = *rcu_sched_ctrlblk.donetail;
-	*rcu_sched_ctrlblk.donetail = NULL;
-	if (rcu_sched_ctrlblk.curtail == rcu_sched_ctrlblk.donetail)
-		rcu_sched_ctrlblk.curtail = &rcu_sched_ctrlblk.rcucblist;
-	rcu_sched_ctrlblk.donetail = &rcu_sched_ctrlblk.rcucblist;
+	list = rcu_ctrlblk.rcucblist;
+	rcu_ctrlblk.rcucblist = *rcu_ctrlblk.donetail;
+	*rcu_ctrlblk.donetail = NULL;
+	if (rcu_ctrlblk.curtail == rcu_ctrlblk.donetail)
+		rcu_ctrlblk.curtail = &rcu_ctrlblk.rcucblist;
+	rcu_ctrlblk.donetail = &rcu_ctrlblk.rcucblist;
 	local_irq_restore(flags);
 
 	/* Invoke the callbacks on the local list. */
@@ -125,21 +125,21 @@ static __latent_entropy void rcu_process_callbacks(struct softirq_action *unused
  *
  * Cool, huh?  (Due to Josh Triplett.)
  */
-void synchronize_sched(void)
+void synchronize_rcu(void)
 {
 	RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map) ||
 			 lock_is_held(&rcu_lock_map) ||
 			 lock_is_held(&rcu_sched_lock_map),
 			 "Illegal synchronize_sched() in RCU read-side critical section");
 }
-EXPORT_SYMBOL_GPL(synchronize_sched);
+EXPORT_SYMBOL_GPL(synchronize_rcu);
 
 /*
  * Post an RCU callback to be invoked after the end of an RCU-sched grace
  * period.  But since we have but one CPU, that would be after any
  * quiescent state.
  */
-void call_rcu_sched(struct rcu_head *head, rcu_callback_t func)
+void call_rcu(struct rcu_head *head, rcu_callback_t func)
 {
 	unsigned long flags;
 
@@ -148,16 +148,16 @@ void call_rcu_sched(struct rcu_head *head, rcu_callback_t func)
 	head->next = NULL;
 
 	local_irq_save(flags);
-	*rcu_sched_ctrlblk.curtail = head;
-	rcu_sched_ctrlblk.curtail = &head->next;
+	*rcu_ctrlblk.curtail = head;
+	rcu_ctrlblk.curtail = &head->next;
 	local_irq_restore(flags);
 
 	if (unlikely(is_idle_task(current))) {
-		/* force scheduling for rcu_sched_qs() */
+		/* force scheduling for rcu_qs() */
 		resched_cpu(0);
 	}
 }
-EXPORT_SYMBOL_GPL(call_rcu_sched);
+EXPORT_SYMBOL_GPL(call_rcu);
 
 void __init rcu_init(void)
 {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH tip/core/rcu 15/19] rcu: Remove RCU_STATE_INITIALIZER()
  2018-08-29 22:20 [PATCH tip/core/rcu 0/19] RCU flavor-consolidation changes for v4.20/v5.0 Paul E. McKenney
                   ` (13 preceding siblings ...)
  2018-08-29 22:20 ` [PATCH tip/core/rcu 14/19] rcu: Express Tiny RCU updates in terms of RCU rather than RCU-sched Paul E. McKenney
@ 2018-08-29 22:20 ` Paul E. McKenney
  2018-08-29 22:20 ` [PATCH tip/core/rcu 16/19] rcu: Eliminate rcu_state structure's ->call field Paul E. McKenney
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 49+ messages in thread
From: Paul E. McKenney @ 2018-08-29 22:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg, joel,
	Paul E. McKenney

Now that a given build of the Linux kernel has only one set of rcu_state,
rcu_node, and rcu_data structures, there is no point in creating a macro
to declare and compile-time initialize them.  This commit therefore
just does normal declaration and compile-time initialization of these
structures.  While in the area, this commit also removes #ifndefs of
the no-longer-ever-defined preprocessor macro RCU_TREE_NONCORE.
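
As an illustrative sketch (not part of the patch), the change amounts to
replacing the per-flavor macro expansion with one open-coded designated
initializer, with the tracing name chosen by preprocessor symbols; this
condenses the tree.c and tree.h hunks below:

	/* Was: RCU_STATE_INITIALIZER(rcu_preempt, 'p', call_rcu); or the rcu_sched variant. */
	struct rcu_state rcu_state = {
		.level	= { &rcu_state.node[0] },
		.name	= RCU_NAME,	/* "rcu_preempt" or "rcu_sched". */
		.abbr	= RCU_ABBR,	/* 'p' or 's'. */
	};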

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 54 ++++++++++++-----------------------------------
 kernel/rcu/tree.h | 29 +++++++++++++++++++------
 2 files changed, 37 insertions(+), 46 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index a8965a7caf25..1868df8089ba 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -72,46 +72,20 @@
 
 /* Data structures. */
 
-/*
- * In order to export the rcu_state name to the tracing tools, it
- * needs to be added in the __tracepoint_string section.
- * This requires defining a separate variable tp_<sname>_varname
- * that points to the string being used, and this will allow
- * the tracing userspace tools to be able to decipher the string
- * address to the matching string.
- */
-#ifdef CONFIG_TRACING
-# define DEFINE_RCU_TPS(sname) \
-static char sname##_varname[] = #sname; \
-static const char *tp_##sname##_varname __used __tracepoint_string = sname##_varname;
-# define RCU_STATE_NAME(sname) sname##_varname
-#else
-# define DEFINE_RCU_TPS(sname)
-# define RCU_STATE_NAME(sname) __stringify(sname)
-#endif
-
-#define RCU_STATE_INITIALIZER(sname, sabbr, cr) \
-DEFINE_RCU_TPS(sname) \
-static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data); \
-struct rcu_state rcu_state = { \
-	.level = { &rcu_state.node[0] }, \
-	.rda = &rcu_data, \
-	.call = cr, \
-	.gp_state = RCU_GP_IDLE, \
-	.gp_seq = (0UL - 300UL) << RCU_SEQ_CTR_SHIFT, \
-	.barrier_mutex = __MUTEX_INITIALIZER(rcu_state.barrier_mutex), \
-	.name = RCU_STATE_NAME(sname), \
-	.abbr = sabbr, \
-	.exp_mutex = __MUTEX_INITIALIZER(rcu_state.exp_mutex), \
-	.exp_wake_mutex = __MUTEX_INITIALIZER(rcu_state.exp_wake_mutex), \
-	.ofl_lock = __SPIN_LOCK_UNLOCKED(rcu_state.ofl_lock), \
-}
-
-#ifdef CONFIG_PREEMPT_RCU
-RCU_STATE_INITIALIZER(rcu_preempt, 'p', call_rcu);
-#else
-RCU_STATE_INITIALIZER(rcu_sched, 's', call_rcu);
-#endif
+static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data);
+struct rcu_state rcu_state = {
+	.level = { &rcu_state.node[0] },
+	.rda = &rcu_data,
+	.call = call_rcu,
+	.gp_state = RCU_GP_IDLE,
+	.gp_seq = (0UL - 300UL) << RCU_SEQ_CTR_SHIFT,
+	.barrier_mutex = __MUTEX_INITIALIZER(rcu_state.barrier_mutex),
+	.name = RCU_NAME,
+	.abbr = RCU_ABBR,
+	.exp_mutex = __MUTEX_INITIALIZER(rcu_state.exp_mutex),
+	.exp_wake_mutex = __MUTEX_INITIALIZER(rcu_state.exp_wake_mutex),
+	.ofl_lock = __SPIN_LOCK_UNLOCKED(rcu_state.ofl_lock),
+};
 
 static struct rcu_state *const rcu_state_p = &rcu_state;
 static struct rcu_data __percpu *const rcu_data_p = &rcu_data;
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 38658ca87dcb..3f36562d3118 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -386,7 +386,6 @@ struct rcu_state {
 #define RCU_GP_CLEANUP   7	/* Grace-period cleanup started. */
 #define RCU_GP_CLEANED   8	/* Grace-period cleanup complete. */
 
-#ifndef RCU_TREE_NONCORE
 static const char * const gp_state_names[] = {
 	"RCU_GP_IDLE",
 	"RCU_GP_WAIT_GPS",
@@ -398,7 +397,29 @@ static const char * const gp_state_names[] = {
 	"RCU_GP_CLEANUP",
 	"RCU_GP_CLEANED",
 };
-#endif /* #ifndef RCU_TREE_NONCORE */
+
+/*
+ * In order to export the rcu_state name to the tracing tools, it
+ * needs to be added in the __tracepoint_string section.
+ * This requires defining a separate variable tp_<sname>_varname
+ * that points to the string being used, and this will allow
+ * the tracing userspace tools to be able to decipher the string
+ * address to the matching string.
+ */
+#ifdef CONFIG_PREEMPT_RCU
+#define RCU_ABBR 'p'
+#define RCU_NAME_RAW "rcu_preempt"
+#else /* #ifdef CONFIG_PREEMPT_RCU */
+#define RCU_ABBR 's'
+#define RCU_NAME_RAW "rcu_sched"
+#endif /* #else #ifdef CONFIG_PREEMPT_RCU */
+#ifndef CONFIG_TRACING
+#define RCU_NAME RCU_NAME_RAW
+#else /* #ifdef CONFIG_TRACING */
+static char rcu_name[] = RCU_NAME_RAW;
+static const char *tp_rcu_varname __used __tracepoint_string = rcu_name;
+#define RCU_NAME rcu_name
+#endif /* #else #ifdef CONFIG_TRACING */
 
 extern struct list_head rcu_struct_flavors;
 
@@ -426,8 +447,6 @@ DECLARE_PER_CPU(unsigned int, rcu_cpu_kthread_loops);
 DECLARE_PER_CPU(char, rcu_cpu_has_work);
 #endif /* #ifdef CONFIG_RCU_BOOST */
 
-#ifndef RCU_TREE_NONCORE
-
 /* Forward declarations for rcutree_plugin.h */
 static void rcu_bootup_announce(void);
 static void rcu_qs(void);
@@ -495,5 +514,3 @@ void srcu_offline_cpu(unsigned int cpu);
 void srcu_online_cpu(unsigned int cpu) { }
 void srcu_offline_cpu(unsigned int cpu) { }
 #endif /* #else #ifdef CONFIG_SRCU */
-
-#endif /* #ifndef RCU_TREE_NONCORE */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH tip/core/rcu 16/19] rcu: Eliminate rcu_state structure's ->call field
  2018-08-29 22:20 [PATCH tip/core/rcu 0/19] RCU flavor-consolidation changes for v4.20/v5.0 Paul E. McKenney
                   ` (14 preceding siblings ...)
  2018-08-29 22:20 ` [PATCH tip/core/rcu 15/19] rcu: Remove RCU_STATE_INITIALIZER() Paul E. McKenney
@ 2018-08-29 22:20 ` Paul E. McKenney
  2018-08-29 22:20 ` [PATCH tip/core/rcu 17/19] rcu: Remove rcu_state structure's ->rda field Paul E. McKenney
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 49+ messages in thread
From: Paul E. McKenney @ 2018-08-29 22:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg, joel,
	Paul E. McKenney

The rcu_state structure's ->call field references the corresponding RCU
flavor's call_rcu() function.  However, now that there is only ever one
rcu_state structure in a given build of the Linux kernel, and that flavor
uses plain old call_rcu(), there is not a lot of point in continuing to
have the ->call field.  This commit therefore removes it.
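
As an illustration (not part of the patch), the sole remaining use of the
field is the expedited fallback path, which can simply call call_rcu()
directly rather than indirecting through a call_rcu_func_t pointer; a
condensed view of the tree_exp.h hunk below:

	/* If expedited grace periods are prohibited, fall back to normal. */
	if (rcu_gp_is_normal()) {
		wait_rcu_gp(call_rcu);		/* Was: wait_rcu_gp(rsp->call); */
		return;
	}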

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c     | 1 -
 kernel/rcu/tree.h     | 1 -
 kernel/rcu/tree_exp.h | 2 +-
 3 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 1868df8089ba..0c736e078fe6 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -76,7 +76,6 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data);
 struct rcu_state rcu_state = {
 	.level = { &rcu_state.node[0] },
 	.rda = &rcu_data,
-	.call = call_rcu,
 	.gp_state = RCU_GP_IDLE,
 	.gp_seq = (0UL - 300UL) << RCU_SEQ_CTR_SHIFT,
 	.barrier_mutex = __MUTEX_INITIALIZER(rcu_state.barrier_mutex),
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 3f36562d3118..c50060567146 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -313,7 +313,6 @@ struct rcu_state {
 						/* Hierarchy levels (+1 to */
 						/*  shut bogus gcc warning) */
 	struct rcu_data __percpu *rda;		/* pointer of percu rcu_data. */
-	call_rcu_func_t call;			/* call_rcu() flavor. */
 	int ncpus;				/* # CPUs seen so far. */
 
 	/* The following fields are guarded by the root rcu_node's lock. */
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 5619edfd414e..224f05f0c0c9 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -619,7 +619,7 @@ static void _synchronize_rcu_expedited(struct rcu_state *rsp,
 
 	/* If expedited grace periods are prohibited, fall back to normal. */
 	if (rcu_gp_is_normal()) {
-		wait_rcu_gp(rsp->call);
+		wait_rcu_gp(call_rcu);
 		return;
 	}
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH tip/core/rcu 17/19] rcu: Remove rcu_state structure's ->rda field
  2018-08-29 22:20 [PATCH tip/core/rcu 0/19] RCU flavor-consolidation changes for v4.20/v5.0 Paul E. McKenney
                   ` (15 preceding siblings ...)
  2018-08-29 22:20 ` [PATCH tip/core/rcu 16/19] rcu: Eliminate rcu_state structure's ->call field Paul E. McKenney
@ 2018-08-29 22:20 ` Paul E. McKenney
  2018-08-29 22:20 ` [PATCH tip/core/rcu 18/19] rcu: Remove rcu_state_p pointer to default rcu_state structure Paul E. McKenney
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 49+ messages in thread
From: Paul E. McKenney @ 2018-08-29 22:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg, joel,
	Paul E. McKenney

The rcu_state structure's ->rda field was used to find the per-CPU
rcu_data structures corresponding to that rcu_state structure.  But now
there is only one rcu_state structure (creatively named "rcu_state")
and one set of per-CPU rcu_data structures (creatively named "rcu_data").
Therefore, uses of the ->rda field can always be replaced by "rcu_data",
and this commit makes that change and removes the ->rda field.
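
For readers less familiar with the kernel's per-CPU API, here is a minimal
sketch of the conversion pattern (illustration only; the rdp_for_cpu() and
rdp_for_this_cpu() helpers are hypothetical):

	static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data);

	/* Look up the rcu_data structure of a given CPU. */
	static struct rcu_data *rdp_for_cpu(int cpu)
	{
		return per_cpu_ptr(&rcu_data, cpu);	/* Was: per_cpu_ptr(rsp->rda, cpu). */
	}

	/* Look up the rcu_data structure of the running CPU. */
	static struct rcu_data *rdp_for_this_cpu(void)
	{
		return this_cpu_ptr(&rcu_data);		/* Was: this_cpu_ptr(rsp->rda). */
	}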

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c        | 67 ++++++++++++++++++++--------------------
 kernel/rcu/tree.h        |  1 -
 kernel/rcu/tree_exp.h    | 19 ++++++------
 kernel/rcu/tree_plugin.h | 24 +++++++-------
 4 files changed, 54 insertions(+), 57 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0c736e078fe6..1dd8086ee90d 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -75,7 +75,6 @@
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data);
 struct rcu_state rcu_state = {
 	.level = { &rcu_state.node[0] },
-	.rda = &rcu_data,
 	.gp_state = RCU_GP_IDLE,
 	.gp_seq = (0UL - 300UL) << RCU_SEQ_CTR_SHIFT,
 	.barrier_mutex = __MUTEX_INITIALIZER(rcu_state.barrier_mutex),
@@ -586,7 +585,7 @@ void show_rcu_gp_kthreads(void)
 			if (!rcu_is_leaf_node(rnp))
 				continue;
 			for_each_leaf_node_possible_cpu(rnp, cpu) {
-				rdp = per_cpu_ptr(rsp->rda, cpu);
+				rdp = per_cpu_ptr(&rcu_data, cpu);
 				if (rdp->gpwrap ||
 				    ULONG_CMP_GE(rsp->gp_seq,
 						 rdp->gp_seq_needed))
@@ -660,7 +659,7 @@ static void rcu_eqs_enter(bool user)
 	trace_rcu_dyntick(TPS("Start"), rdtp->dynticks_nesting, 0, rdtp->dynticks);
 	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && !is_idle_task(current));
 	for_each_rcu_flavor(rsp) {
-		rdp = this_cpu_ptr(rsp->rda);
+		rdp = this_cpu_ptr(&rcu_data);
 		do_nocb_deferred_wakeup(rdp);
 	}
 	rcu_prepare_for_idle();
@@ -1033,7 +1032,7 @@ bool rcu_lockdep_current_cpu_online(void)
 		return true;
 	preempt_disable();
 	for_each_rcu_flavor(rsp) {
-		rdp = this_cpu_ptr(rsp->rda);
+		rdp = this_cpu_ptr(&rcu_data);
 		rnp = rdp->mynode;
 		if (rdp->grpmask & rcu_rnp_online_cpus(rnp)) {
 			preempt_enable();
@@ -1351,7 +1350,7 @@ static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gp_seq)
 
 	print_cpu_stall_info_end();
 	for_each_possible_cpu(cpu)
-		totqlen += rcu_segcblist_n_cbs(&per_cpu_ptr(rsp->rda,
+		totqlen += rcu_segcblist_n_cbs(&per_cpu_ptr(&rcu_data,
 							    cpu)->cblist);
 	pr_cont("(detected by %d, t=%ld jiffies, g=%ld, q=%lu)\n",
 	       smp_processor_id(), (long)(jiffies - rsp->gp_start),
@@ -1391,7 +1390,7 @@ static void print_cpu_stall(struct rcu_state *rsp)
 {
 	int cpu;
 	unsigned long flags;
-	struct rcu_data *rdp = this_cpu_ptr(rsp->rda);
+	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
 	struct rcu_node *rnp = rcu_get_root(rsp);
 	long totqlen = 0;
 
@@ -1412,7 +1411,7 @@ static void print_cpu_stall(struct rcu_state *rsp)
 	raw_spin_unlock_irqrestore_rcu_node(rdp->mynode, flags);
 	print_cpu_stall_info_end();
 	for_each_possible_cpu(cpu)
-		totqlen += rcu_segcblist_n_cbs(&per_cpu_ptr(rsp->rda,
+		totqlen += rcu_segcblist_n_cbs(&per_cpu_ptr(&rcu_data,
 							    cpu)->cblist);
 	pr_cont(" (t=%lu jiffies g=%ld q=%lu)\n",
 		jiffies - rsp->gp_start,
@@ -1623,7 +1622,7 @@ static bool rcu_start_this_gp(struct rcu_node *rnp_start, struct rcu_data *rdp,
 static bool rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
 {
 	bool needmore;
-	struct rcu_data *rdp = this_cpu_ptr(rsp->rda);
+	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
 
 	needmore = ULONG_CMP_LT(rnp->gp_seq, rnp->gp_seq_needed);
 	if (!needmore)
@@ -1935,7 +1934,7 @@ static bool rcu_gp_init(struct rcu_state *rsp)
 	rcu_for_each_node_breadth_first(rsp, rnp) {
 		rcu_gp_slow(rsp, gp_init_delay);
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
-		rdp = this_cpu_ptr(rsp->rda);
+		rdp = this_cpu_ptr(&rcu_data);
 		rcu_preempt_check_blocked_tasks(rsp, rnp);
 		rnp->qsmask = rnp->qsmaskinit;
 		WRITE_ONCE(rnp->gp_seq, rsp->gp_seq);
@@ -2049,7 +2048,7 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
 			dump_blkd_tasks(rsp, rnp, 10);
 		WARN_ON_ONCE(rnp->qsmask);
 		WRITE_ONCE(rnp->gp_seq, new_gp_seq);
-		rdp = this_cpu_ptr(rsp->rda);
+		rdp = this_cpu_ptr(&rcu_data);
 		if (rnp == rdp->mynode)
 			needgp = __note_gp_changes(rsp, rnp, rdp) || needgp;
 		/* smp_mb() provided by prior unlock-lock pair. */
@@ -2069,7 +2068,7 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
 	trace_rcu_grace_period(rsp->name, rsp->gp_seq, TPS("end"));
 	rsp->gp_state = RCU_GP_IDLE;
 	/* Check for GP requests since above loop. */
-	rdp = this_cpu_ptr(rsp->rda);
+	rdp = this_cpu_ptr(&rcu_data);
 	if (!needgp && ULONG_CMP_LT(rnp->gp_seq, rnp->gp_seq_needed)) {
 		trace_rcu_this_gp(rnp, rdp, rnp->gp_seq_needed,
 				  TPS("CleanupMore"));
@@ -2404,7 +2403,7 @@ rcu_check_quiescent_state(struct rcu_state *rsp, struct rcu_data *rdp)
 static void rcu_cleanup_dying_cpu(struct rcu_state *rsp)
 {
 	RCU_TRACE(bool blkd;)
-	RCU_TRACE(struct rcu_data *rdp = this_cpu_ptr(rsp->rda);)
+	RCU_TRACE(struct rcu_data *rdp = this_cpu_ptr(&rcu_data);)
 	RCU_TRACE(struct rcu_node *rnp = rdp->mynode;)
 
 	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
@@ -2468,7 +2467,7 @@ static void rcu_cleanup_dead_rnp(struct rcu_node *rnp_leaf)
  */
 static void rcu_cleanup_dead_cpu(int cpu, struct rcu_state *rsp)
 {
-	struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
+	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
 	struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */
 
 	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
@@ -2621,7 +2620,7 @@ static void force_qs_rnp(struct rcu_state *rsp, int (*f)(struct rcu_data *rsp))
 		for_each_leaf_node_possible_cpu(rnp, cpu) {
 			unsigned long bit = leaf_node_cpu_bit(rnp, cpu);
 			if ((rnp->qsmask & bit) != 0) {
-				if (f(per_cpu_ptr(rsp->rda, cpu)))
+				if (f(per_cpu_ptr(&rcu_data, cpu)))
 					mask |= bit;
 			}
 		}
@@ -2647,7 +2646,7 @@ static void force_quiescent_state(struct rcu_state *rsp)
 	struct rcu_node *rnp_old = NULL;
 
 	/* Funnel through hierarchy to reduce memory contention. */
-	rnp = __this_cpu_read(rsp->rda->mynode);
+	rnp = __this_cpu_read(rcu_data.mynode);
 	for (; rnp != NULL; rnp = rnp->parent) {
 		ret = (READ_ONCE(rsp->gp_flags) & RCU_GP_FLAG_FQS) ||
 		      !raw_spin_trylock(&rnp->fqslock);
@@ -2739,7 +2738,7 @@ static void
 __rcu_process_callbacks(struct rcu_state *rsp)
 {
 	unsigned long flags;
-	struct rcu_data *rdp = raw_cpu_ptr(rsp->rda);
+	struct rcu_data *rdp = raw_cpu_ptr(&rcu_data);
 	struct rcu_node *rnp = rdp->mynode;
 
 	WARN_ON_ONCE(!rdp->beenonline);
@@ -2893,14 +2892,14 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func,
 	head->func = func;
 	head->next = NULL;
 	local_irq_save(flags);
-	rdp = this_cpu_ptr(rsp->rda);
+	rdp = this_cpu_ptr(&rcu_data);
 
 	/* Add the callback to our list. */
 	if (unlikely(!rcu_segcblist_is_enabled(&rdp->cblist)) || cpu != -1) {
 		int offline;
 
 		if (cpu != -1)
-			rdp = per_cpu_ptr(rsp->rda, cpu);
+			rdp = per_cpu_ptr(&rcu_data, cpu);
 		if (likely(rdp->mynode)) {
 			/* Post-boot, so this should be for a no-CBs CPU. */
 			offline = !__call_rcu_nocb(rdp, head, lazy, flags);
@@ -3134,7 +3133,7 @@ static int rcu_pending(void)
 	struct rcu_state *rsp;
 
 	for_each_rcu_flavor(rsp)
-		if (__rcu_pending(rsp, this_cpu_ptr(rsp->rda)))
+		if (__rcu_pending(rsp, this_cpu_ptr(&rcu_data)))
 			return 1;
 	return 0;
 }
@@ -3152,7 +3151,7 @@ static bool rcu_cpu_has_callbacks(bool *all_lazy)
 	struct rcu_state *rsp;
 
 	for_each_rcu_flavor(rsp) {
-		rdp = this_cpu_ptr(rsp->rda);
+		rdp = this_cpu_ptr(&rcu_data);
 		if (rcu_segcblist_empty(&rdp->cblist))
 			continue;
 		hc = true;
@@ -3201,7 +3200,7 @@ static void rcu_barrier_callback(struct rcu_head *rhp)
 static void rcu_barrier_func(void *type)
 {
 	struct rcu_state *rsp = type;
-	struct rcu_data *rdp = raw_cpu_ptr(rsp->rda);
+	struct rcu_data *rdp = raw_cpu_ptr(&rcu_data);
 
 	_rcu_barrier_trace(rsp, TPS("IRQ"), -1, rsp->barrier_sequence);
 	rdp->barrier_head.func = rcu_barrier_callback;
@@ -3261,7 +3260,7 @@ static void _rcu_barrier(struct rcu_state *rsp)
 	for_each_possible_cpu(cpu) {
 		if (!cpu_online(cpu) && !rcu_is_nocb_cpu(cpu))
 			continue;
-		rdp = per_cpu_ptr(rsp->rda, cpu);
+		rdp = per_cpu_ptr(&rcu_data, cpu);
 		if (rcu_is_nocb_cpu(cpu)) {
 			if (!rcu_nocb_cpu_needs_barrier(rsp, cpu)) {
 				_rcu_barrier_trace(rsp, TPS("OfflineNoCB"), cpu,
@@ -3371,7 +3370,7 @@ static void rcu_init_new_rnp(struct rcu_node *rnp_leaf)
 static void __init
 rcu_boot_init_percpu_data(int cpu, struct rcu_state *rsp)
 {
-	struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
+	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
 
 	/* Set up local state, ensuring consistent view of global state. */
 	rdp->grpmask = leaf_node_cpu_bit(rdp->mynode, cpu);
@@ -3397,7 +3396,7 @@ static void
 rcu_init_percpu_data(int cpu, struct rcu_state *rsp)
 {
 	unsigned long flags;
-	struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
+	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
 	struct rcu_node *rnp = rcu_get_root(rsp);
 
 	/* Set up local state, ensuring consistent view of global state. */
@@ -3453,7 +3452,7 @@ int rcutree_prepare_cpu(unsigned int cpu)
  */
 static void rcutree_affinity_setting(unsigned int cpu, int outgoing)
 {
-	struct rcu_data *rdp = per_cpu_ptr(rcu_state_p->rda, cpu);
+	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
 
 	rcu_boost_kthread_setaffinity(rdp->mynode, outgoing);
 }
@@ -3470,7 +3469,7 @@ int rcutree_online_cpu(unsigned int cpu)
 	struct rcu_state *rsp;
 
 	for_each_rcu_flavor(rsp) {
-		rdp = per_cpu_ptr(rsp->rda, cpu);
+		rdp = per_cpu_ptr(&rcu_data, cpu);
 		rnp = rdp->mynode;
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 		rnp->ffmask |= rdp->grpmask;
@@ -3497,7 +3496,7 @@ int rcutree_offline_cpu(unsigned int cpu)
 	struct rcu_state *rsp;
 
 	for_each_rcu_flavor(rsp) {
-		rdp = per_cpu_ptr(rsp->rda, cpu);
+		rdp = per_cpu_ptr(&rcu_data, cpu);
 		rnp = rdp->mynode;
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 		rnp->ffmask &= ~rdp->grpmask;
@@ -3531,7 +3530,7 @@ int rcutree_dead_cpu(unsigned int cpu)
 
 	for_each_rcu_flavor(rsp) {
 		rcu_cleanup_dead_cpu(cpu, rsp);
-		do_nocb_deferred_wakeup(per_cpu_ptr(rsp->rda, cpu));
+		do_nocb_deferred_wakeup(per_cpu_ptr(&rcu_data, cpu));
 	}
 	return 0;
 }
@@ -3565,7 +3564,7 @@ void rcu_cpu_starting(unsigned int cpu)
 	per_cpu(rcu_cpu_started, cpu) = 1;
 
 	for_each_rcu_flavor(rsp) {
-		rdp = per_cpu_ptr(rsp->rda, cpu);
+		rdp = per_cpu_ptr(&rcu_data, cpu);
 		rnp = rdp->mynode;
 		mask = rdp->grpmask;
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
@@ -3599,7 +3598,7 @@ static void rcu_cleanup_dying_idle_cpu(int cpu, struct rcu_state *rsp)
 {
 	unsigned long flags;
 	unsigned long mask;
-	struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
+	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
 	struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */
 
 	/* Remove outgoing CPU from mask in the leaf rcu_node structure. */
@@ -3632,7 +3631,7 @@ void rcu_report_dead(unsigned int cpu)
 
 	/* QS for any half-done expedited RCU-sched GP. */
 	preempt_disable();
-	rcu_report_exp_rdp(&rcu_state, this_cpu_ptr(rcu_state.rda));
+	rcu_report_exp_rdp(&rcu_state, this_cpu_ptr(&rcu_data));
 	preempt_enable();
 	rcu_preempt_deferred_qs(current);
 	for_each_rcu_flavor(rsp)
@@ -3646,7 +3645,7 @@ static void rcu_migrate_callbacks(int cpu, struct rcu_state *rsp)
 {
 	unsigned long flags;
 	struct rcu_data *my_rdp;
-	struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
+	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
 	struct rcu_node *rnp_root = rcu_get_root(rdp->rsp);
 	bool needwake;
 
@@ -3654,7 +3653,7 @@ static void rcu_migrate_callbacks(int cpu, struct rcu_state *rsp)
 		return;  /* No callbacks to migrate. */
 
 	local_irq_save(flags);
-	my_rdp = this_cpu_ptr(rsp->rda);
+	my_rdp = this_cpu_ptr(&rcu_data);
 	if (rcu_nocb_adopt_orphan_cbs(my_rdp, rdp, flags)) {
 		local_irq_restore(flags);
 		return;
@@ -3856,7 +3855,7 @@ static void __init rcu_init_one(struct rcu_state *rsp)
 	for_each_possible_cpu(i) {
 		while (i > rnp->grphi)
 			rnp++;
-		per_cpu_ptr(rsp->rda, i)->mynode = rnp;
+		per_cpu_ptr(&rcu_data, i)->mynode = rnp;
 		rcu_boot_init_percpu_data(i, rsp);
 	}
 	list_add(&rsp->flavors, &rcu_struct_flavors);
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index c50060567146..d60304f1ef56 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -312,7 +312,6 @@ struct rcu_state {
 	struct rcu_node *level[RCU_NUM_LVLS + 1];
 						/* Hierarchy levels (+1 to */
 						/*  shut bogus gcc warning) */
-	struct rcu_data __percpu *rda;		/* pointer of percu rcu_data. */
 	int ncpus;				/* # CPUs seen so far. */
 
 	/* The following fields are guarded by the root rcu_node's lock. */
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 224f05f0c0c9..3a8a582d9958 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -286,7 +286,7 @@ static bool sync_exp_work_done(struct rcu_state *rsp, unsigned long s)
  */
 static bool exp_funnel_lock(struct rcu_state *rsp, unsigned long s)
 {
-	struct rcu_data *rdp = per_cpu_ptr(rsp->rda, raw_smp_processor_id());
+	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, raw_smp_processor_id());
 	struct rcu_node *rnp = rdp->mynode;
 	struct rcu_node *rnp_root = rcu_get_root(rsp);
 
@@ -361,7 +361,7 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
 	mask_ofl_test = 0;
 	for_each_leaf_node_cpu_mask(rnp, cpu, rnp->expmask) {
 		unsigned long mask = leaf_node_cpu_bit(rnp, cpu);
-		struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
+		struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
 		struct rcu_dynticks *rdtp = per_cpu_ptr(&rcu_dynticks, cpu);
 		int snap;
 
@@ -390,7 +390,7 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
 	/* IPI the remaining CPUs for expedited quiescent state. */
 	for_each_leaf_node_cpu_mask(rnp, cpu, rnp->expmask) {
 		unsigned long mask = leaf_node_cpu_bit(rnp, cpu);
-		struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
+		struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
 
 		if (!(mask_ofl_ipi & mask))
 			continue;
@@ -509,7 +509,7 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
 				if (!(rnp->expmask & mask))
 					continue;
 				ndetected++;
-				rdp = per_cpu_ptr(rsp->rda, cpu);
+				rdp = per_cpu_ptr(&rcu_data, cpu);
 				pr_cont(" %d-%c%c%c", cpu,
 					"O."[!!cpu_online(cpu)],
 					"o."[!!(rdp->grpmask & rnp->expmaskinit)],
@@ -642,7 +642,7 @@ static void _synchronize_rcu_expedited(struct rcu_state *rsp,
 	}
 
 	/* Wait for expedited grace period to complete. */
-	rdp = per_cpu_ptr(rsp->rda, raw_smp_processor_id());
+	rdp = per_cpu_ptr(&rcu_data, raw_smp_processor_id());
 	rnp = rcu_get_root(rsp);
 	wait_event(rnp->exp_wq[rcu_seq_ctr(s) & 0x3],
 		   sync_exp_work_done(rsp, s));
@@ -665,7 +665,7 @@ static void sync_rcu_exp_handler(void *info)
 {
 	unsigned long flags;
 	struct rcu_state *rsp = info;
-	struct rcu_data *rdp = this_cpu_ptr(rsp->rda);
+	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
 	struct rcu_node *rnp = rdp->mynode;
 	struct task_struct *t = current;
 
@@ -772,13 +772,12 @@ EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
 #else /* #ifdef CONFIG_PREEMPT_RCU */
 
 /* Invoked on each online non-idle CPU for expedited quiescent state. */
-static void sync_sched_exp_handler(void *data)
+static void sync_sched_exp_handler(void *unused)
 {
 	struct rcu_data *rdp;
 	struct rcu_node *rnp;
-	struct rcu_state *rsp = data;
 
-	rdp = this_cpu_ptr(rsp->rda);
+	rdp = this_cpu_ptr(&rcu_data);
 	rnp = rdp->mynode;
 	if (!(READ_ONCE(rnp->expmask) & rdp->grpmask) ||
 	    __this_cpu_read(rcu_data.cpu_no_qs.b.exp))
@@ -801,7 +800,7 @@ static void sync_sched_exp_online_cleanup(int cpu)
 	struct rcu_node *rnp;
 	struct rcu_state *rsp = &rcu_state;
 
-	rdp = per_cpu_ptr(rsp->rda, cpu);
+	rdp = per_cpu_ptr(&rcu_data, cpu);
 	rnp = rdp->mynode;
 	if (!(READ_ONCE(rnp->expmask) & rdp->grpmask))
 		return;
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 2c81f8dd63b4..b7a99a6e64b6 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -328,7 +328,7 @@ static void rcu_qs(void)
 void rcu_note_context_switch(bool preempt)
 {
 	struct task_struct *t = current;
-	struct rcu_data *rdp = this_cpu_ptr(rcu_state_p->rda);
+	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
 	struct rcu_node *rnp;
 
 	barrier(); /* Avoid RCU read-side critical sections leaking down. */
@@ -488,7 +488,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
 	 * t->rcu_read_unlock_special cannot change.
 	 */
 	special = t->rcu_read_unlock_special;
-	rdp = this_cpu_ptr(rcu_state_p->rda);
+	rdp = this_cpu_ptr(&rcu_data);
 	if (!special.s && !rdp->deferred_qs) {
 		local_irq_restore(flags);
 		return;
@@ -911,7 +911,7 @@ dump_blkd_tasks(struct rcu_state *rsp, struct rcu_node *rnp, int ncheck)
 	}
 	pr_cont("\n");
 	for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++) {
-		rdp = per_cpu_ptr(rsp->rda, cpu);
+		rdp = per_cpu_ptr(&rcu_data, cpu);
 		onl = !!(rdp->grpmask & rcu_rnp_online_cpus(rnp));
 		pr_info("\t%d: %c online: %ld(%d) offline: %ld(%d)\n",
 			cpu, ".o"[onl],
@@ -1437,7 +1437,7 @@ static void __init rcu_spawn_boost_kthreads(void)
 
 static void rcu_prepare_kthreads(int cpu)
 {
-	struct rcu_data *rdp = per_cpu_ptr(rcu_state_p->rda, cpu);
+	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
 	struct rcu_node *rnp = rdp->mynode;
 
 	/* Fire up the incoming CPU's kthread and leaf rcu_node kthread. */
@@ -1574,7 +1574,7 @@ static bool __maybe_unused rcu_try_advance_all_cbs(void)
 	rdtp->last_advance_all = jiffies;
 
 	for_each_rcu_flavor(rsp) {
-		rdp = this_cpu_ptr(rsp->rda);
+		rdp = this_cpu_ptr(&rcu_data);
 		rnp = rdp->mynode;
 
 		/*
@@ -1692,7 +1692,7 @@ static void rcu_prepare_for_idle(void)
 		return;
 	rdtp->last_accelerate = jiffies;
 	for_each_rcu_flavor(rsp) {
-		rdp = this_cpu_ptr(rsp->rda);
+		rdp = this_cpu_ptr(&rcu_data);
 		if (!rcu_segcblist_pend_cbs(&rdp->cblist))
 			continue;
 		rnp = rdp->mynode;
@@ -1778,7 +1778,7 @@ static void print_cpu_stall_info(struct rcu_state *rsp, int cpu)
 {
 	unsigned long delta;
 	char fast_no_hz[72];
-	struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
+	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
 	struct rcu_dynticks *rdtp = rdp->dynticks;
 	char *ticks_title;
 	unsigned long ticks_value;
@@ -1833,7 +1833,7 @@ static void increment_cpu_stall_ticks(void)
 	struct rcu_state *rsp;
 
 	for_each_rcu_flavor(rsp)
-		raw_cpu_inc(rsp->rda->ticks_this_gp);
+		raw_cpu_inc(rcu_data.ticks_this_gp);
 }
 
 #ifdef CONFIG_RCU_NOCB_CPU
@@ -1965,7 +1965,7 @@ static void wake_nocb_leader_defer(struct rcu_data *rdp, int waketype,
  */
 static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu)
 {
-	struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
+	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
 	unsigned long ret;
 #ifdef CONFIG_PROVE_RCU
 	struct rcu_head *rhp;
@@ -2426,7 +2426,7 @@ void __init rcu_init_nohz(void)
 
 	for_each_rcu_flavor(rsp) {
 		for_each_cpu(cpu, rcu_nocb_mask)
-			init_nocb_callback_list(per_cpu_ptr(rsp->rda, cpu));
+			init_nocb_callback_list(per_cpu_ptr(&rcu_data, cpu));
 		rcu_organize_nocb_kthreads(rsp);
 	}
 }
@@ -2452,7 +2452,7 @@ static void rcu_spawn_one_nocb_kthread(struct rcu_state *rsp, int cpu)
 	struct rcu_data *rdp;
 	struct rcu_data *rdp_last;
 	struct rcu_data *rdp_old_leader;
-	struct rcu_data *rdp_spawn = per_cpu_ptr(rsp->rda, cpu);
+	struct rcu_data *rdp_spawn = per_cpu_ptr(&rcu_data, cpu);
 	struct task_struct *t;
 
 	/*
@@ -2545,7 +2545,7 @@ static void __init rcu_organize_nocb_kthreads(struct rcu_state *rsp)
 	 * we will spawn the needed set of rcu_nocb_kthread() kthreads.
 	 */
 	for_each_cpu(cpu, rcu_nocb_mask) {
-		rdp = per_cpu_ptr(rsp->rda, cpu);
+		rdp = per_cpu_ptr(&rcu_data, cpu);
 		if (rdp->cpu >= nl) {
 			/* New leader, set up for followers & next leader. */
 			nl = DIV_ROUND_UP(rdp->cpu + 1, ls) * ls;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH tip/core/rcu 18/19] rcu: Remove rcu_state_p pointer to default rcu_state structure
  2018-08-29 22:20 [PATCH tip/core/rcu 0/19] RCU flavor-consolidation changes for v4.20/v5.0 Paul E. McKenney
                   ` (16 preceding siblings ...)
  2018-08-29 22:20 ` [PATCH tip/core/rcu 17/19] rcu: Remove rcu_state structure's ->rda field Paul E. McKenney
@ 2018-08-29 22:20 ` Paul E. McKenney
  2018-08-29 22:20 ` [PATCH tip/core/rcu 19/19] rcu: Remove rcu_data_p pointer to default rcu_data structure Paul E. McKenney
  2018-08-29 22:22 ` [PATCH tip/core/rcu 0/19] RCU flavor-consolidation changes for v4.20/v5.0 Paul E. McKenney
  19 siblings, 0 replies; 49+ messages in thread
From: Paul E. McKenney @ 2018-08-29 22:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg, joel,
	Paul E. McKenney

The rcu_state_p pointer references the default rcu_state structure,
that is, the structure used by call_rcu(), as opposed to those used
by call_rcu_bh() and sometimes by call_rcu_sched().  But there is now
only one rcu_state structure, so that one structure is by definition
the default, which means that the rcu_state_p pointer no longer serves
any useful purpose.
This commit therefore removes it.
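
For illustration, here is a minimal stand-alone sketch (mock type, not
the kernel's definition; the gp_seq field name is taken from the diff
below) of why the pointer carries no information once only one
rcu_state structure exists:

	#include <assert.h>

	struct mock_rcu_state { unsigned long gp_seq; };
	static struct mock_rcu_state rcu_state;
	/* The kind of constant pointer this commit deletes: */
	static struct mock_rcu_state *const rcu_state_p = &rcu_state;

	int main(void)
	{
		/* With a single instance, the pointer can only ever equal
		 * &rcu_state, so rcu_state_p->gp_seq and rcu_state.gp_seq
		 * name the same object. */
		assert(rcu_state_p == &rcu_state);
		assert(&rcu_state_p->gp_seq == &rcu_state.gp_seq);
		return 0;
	}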

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c        | 27 ++++++++++++---------------
 kernel/rcu/tree_exp.h    |  2 +-
 kernel/rcu/tree_plugin.h | 16 ++++++++--------
 3 files changed, 21 insertions(+), 24 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 1dd8086ee90d..a3bcf08ad596 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -85,7 +85,6 @@ struct rcu_state rcu_state = {
 	.ofl_lock = __SPIN_LOCK_UNLOCKED(rcu_state.ofl_lock),
 };
 
-static struct rcu_state *const rcu_state_p = &rcu_state;
 static struct rcu_data __percpu *const rcu_data_p = &rcu_data;
 LIST_HEAD(rcu_struct_flavors);
 
@@ -491,7 +490,7 @@ static int rcu_pending(void);
  */
 unsigned long rcu_get_gp_seq(void)
 {
-	return READ_ONCE(rcu_state_p->gp_seq);
+	return READ_ONCE(rcu_state.gp_seq);
 }
 EXPORT_SYMBOL_GPL(rcu_get_gp_seq);
 
@@ -510,7 +509,7 @@ EXPORT_SYMBOL_GPL(rcu_sched_get_gp_seq);
  */
 unsigned long rcu_bh_get_gp_seq(void)
 {
-	return READ_ONCE(rcu_state_p->gp_seq);
+	return READ_ONCE(rcu_state.gp_seq);
 }
 EXPORT_SYMBOL_GPL(rcu_bh_get_gp_seq);
 
@@ -522,7 +521,7 @@ EXPORT_SYMBOL_GPL(rcu_bh_get_gp_seq);
  */
 unsigned long rcu_exp_batches_completed(void)
 {
-	return rcu_state_p->expedited_sequence;
+	return rcu_state.expedited_sequence;
 }
 EXPORT_SYMBOL_GPL(rcu_exp_batches_completed);
 
@@ -541,7 +540,7 @@ EXPORT_SYMBOL_GPL(rcu_exp_batches_completed_sched);
  */
 void rcu_force_quiescent_state(void)
 {
-	force_quiescent_state(rcu_state_p);
+	force_quiescent_state(&rcu_state);
 }
 EXPORT_SYMBOL_GPL(rcu_force_quiescent_state);
 
@@ -550,7 +549,7 @@ EXPORT_SYMBOL_GPL(rcu_force_quiescent_state);
  */
 void rcu_bh_force_quiescent_state(void)
 {
-	force_quiescent_state(rcu_state_p);
+	force_quiescent_state(&rcu_state);
 }
 EXPORT_SYMBOL_GPL(rcu_bh_force_quiescent_state);
 
@@ -611,7 +610,7 @@ void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags,
 	case RCU_FLAVOR:
 	case RCU_BH_FLAVOR:
 	case RCU_SCHED_FLAVOR:
-		rsp = rcu_state_p;
+		rsp = &rcu_state;
 		break;
 	default:
 		break;
@@ -2291,7 +2290,6 @@ rcu_report_unblock_qs_rnp(struct rcu_state *rsp,
 
 	raw_lockdep_assert_held_rcu_node(rnp);
 	if (WARN_ON_ONCE(!IS_ENABLED(CONFIG_PREEMPT)) ||
-	    WARN_ON_ONCE(rsp != rcu_state_p) ||
 	    WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp(rnp)) ||
 	    rnp->qsmask != 0) {
 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
@@ -2603,7 +2601,6 @@ static void force_qs_rnp(struct rcu_state *rsp, int (*f)(struct rcu_data *rsp))
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 		if (rnp->qsmask == 0) {
 			if (!IS_ENABLED(CONFIG_PREEMPT) ||
-			    rsp != rcu_state_p ||
 			    rcu_preempt_blocked_readers_cgp(rnp)) {
 				/*
 				 * No point in scanning bits because they
@@ -2972,7 +2969,7 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func,
  */
 void call_rcu(struct rcu_head *head, rcu_callback_t func)
 {
-	__call_rcu(head, func, rcu_state_p, -1, 0);
+	__call_rcu(head, func, &rcu_state, -1, 0);
 }
 EXPORT_SYMBOL_GPL(call_rcu);
 
@@ -2999,7 +2996,7 @@ EXPORT_SYMBOL_GPL(call_rcu_sched);
 void kfree_call_rcu(struct rcu_head *head,
 		    rcu_callback_t func)
 {
-	__call_rcu(head, func, rcu_state_p, -1, 1);
+	__call_rcu(head, func, &rcu_state, -1, 1);
 }
 EXPORT_SYMBOL_GPL(kfree_call_rcu);
 
@@ -3028,7 +3025,7 @@ unsigned long get_state_synchronize_rcu(void)
 	 * before the load from ->gp_seq.
 	 */
 	smp_mb();  /* ^^^ */
-	return rcu_seq_snap(&rcu_state_p->gp_seq);
+	return rcu_seq_snap(&rcu_state.gp_seq);
 }
 EXPORT_SYMBOL_GPL(get_state_synchronize_rcu);
 
@@ -3048,7 +3045,7 @@ EXPORT_SYMBOL_GPL(get_state_synchronize_rcu);
  */
 void cond_synchronize_rcu(unsigned long oldstate)
 {
-	if (!rcu_seq_done(&rcu_state_p->gp_seq, oldstate))
+	if (!rcu_seq_done(&rcu_state.gp_seq, oldstate))
 		synchronize_rcu();
 	else
 		smp_mb(); /* Ensure GP ends before subsequent accesses. */
@@ -3307,7 +3304,7 @@ static void _rcu_barrier(struct rcu_state *rsp)
  */
 void rcu_barrier_bh(void)
 {
-	_rcu_barrier(rcu_state_p);
+	_rcu_barrier(&rcu_state);
 }
 EXPORT_SYMBOL_GPL(rcu_barrier_bh);
 
@@ -3321,7 +3318,7 @@ EXPORT_SYMBOL_GPL(rcu_barrier_bh);
  */
 void rcu_barrier(void)
 {
-	_rcu_barrier(rcu_state_p);
+	_rcu_barrier(&rcu_state);
 }
 EXPORT_SYMBOL_GPL(rcu_barrier);
 
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 3a8a582d9958..298a6904bbcd 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -756,7 +756,7 @@ static void sync_sched_exp_online_cleanup(int cpu)
  */
 void synchronize_rcu_expedited(void)
 {
-	struct rcu_state *rsp = rcu_state_p;
+	struct rcu_state *rsp = &rcu_state;
 
 	RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map) ||
 			 lock_is_held(&rcu_lock_map) ||
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index b7a99a6e64b6..329d5802d899 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -381,7 +381,7 @@ void rcu_note_context_switch(bool preempt)
 	 */
 	rcu_qs();
 	if (rdp->deferred_qs)
-		rcu_report_exp_rdp(rcu_state_p, rdp);
+		rcu_report_exp_rdp(&rcu_state, rdp);
 	trace_rcu_utilization(TPS("End context switch"));
 	barrier(); /* Avoid RCU read-side critical sections leaking up. */
 }
@@ -509,7 +509,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
 	 * blocked-tasks list below.
 	 */
 	if (rdp->deferred_qs) {
-		rcu_report_exp_rdp(rcu_state_p, rdp);
+		rcu_report_exp_rdp(&rcu_state, rdp);
 		if (!t->rcu_read_unlock_special.s) {
 			local_irq_restore(flags);
 			return;
@@ -566,7 +566,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
 							 rnp->grplo,
 							 rnp->grphi,
 							 !!rnp->gp_tasks);
-			rcu_report_unblock_qs_rnp(rcu_state_p, rnp, flags);
+			rcu_report_unblock_qs_rnp(&rcu_state, rnp, flags);
 		} else {
 			raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 		}
@@ -580,7 +580,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
 		 * then we need to report up the rcu_node hierarchy.
 		 */
 		if (!empty_exp && empty_exp_now)
-			rcu_report_exp_rnp(rcu_state_p, rnp, true);
+			rcu_report_exp_rnp(&rcu_state, rnp, true);
 	} else {
 		local_irq_restore(flags);
 	}
@@ -1300,7 +1300,7 @@ static int rcu_spawn_one_boost_kthread(struct rcu_state *rsp,
 	struct sched_param sp;
 	struct task_struct *t;
 
-	if (rcu_state_p != rsp)
+	if (&rcu_state != rsp)
 		return 0;
 
 	if (!rcu_scheduler_fully_active || rcu_rnp_online_cpus(rnp) == 0)
@@ -1431,8 +1431,8 @@ static void __init rcu_spawn_boost_kthreads(void)
 	for_each_possible_cpu(cpu)
 		per_cpu(rcu_cpu_has_work, cpu) = 0;
 	BUG_ON(smpboot_register_percpu_thread(&rcu_cpu_thread_spec));
-	rcu_for_each_leaf_node(rcu_state_p, rnp)
-		(void)rcu_spawn_one_boost_kthread(rcu_state_p, rnp);
+	rcu_for_each_leaf_node(&rcu_state, rnp)
+		(void)rcu_spawn_one_boost_kthread(&rcu_state, rnp);
 }
 
 static void rcu_prepare_kthreads(int cpu)
@@ -1442,7 +1442,7 @@ static void rcu_prepare_kthreads(int cpu)
 
 	/* Fire up the incoming CPU's kthread and leaf rcu_node kthread. */
 	if (rcu_scheduler_fully_active)
-		(void)rcu_spawn_one_boost_kthread(rcu_state_p, rnp);
+		(void)rcu_spawn_one_boost_kthread(&rcu_state, rnp);
 }
 
 #else /* #ifdef CONFIG_RCU_BOOST */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH tip/core/rcu 19/19] rcu: Remove rcu_data_p pointer to default rcu_data structure
  2018-08-29 22:20 [PATCH tip/core/rcu 0/19] RCU flavor-consolidation changes for v4.20/v5.0 Paul E. McKenney
                   ` (17 preceding siblings ...)
  2018-08-29 22:20 ` [PATCH tip/core/rcu 18/19] rcu: Remove rcu_state_p pointer to default rcu_state structure Paul E. McKenney
@ 2018-08-29 22:20 ` Paul E. McKenney
  2018-08-29 22:22 ` [PATCH tip/core/rcu 0/19] RCU flavor-consolidation changes for v4.20/v5.0 Paul E. McKenney
  19 siblings, 0 replies; 49+ messages in thread
From: Paul E. McKenney @ 2018-08-29 22:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg, joel,
	Paul E. McKenney

The rcu_data_p pointer references the default set of per-CPU rcu_data
structures, that is, the set used by call_rcu(), as opposed to the
sets used by call_rcu_bh() and sometimes by call_rcu_sched().  But
there is now only one set of per-CPU rcu_data structures, so that one
set is by definition the default, which means that the rcu_data_p
pointer no longer serves any useful purpose.  This commit therefore
removes it.
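
For illustration, here is a tiny stand-alone sketch (mock types and a
fixed CPU count as a crude stand-in for the real per-CPU machinery) of
why the pointer is redundant once there is only one per-CPU set:

	#include <assert.h>

	#define MOCK_NR_CPUS 4

	struct mock_rcu_data { int cpu_no_qs; };
	/* Crude stand-in for the kernel's sole per-CPU rcu_data variable: */
	static struct mock_rcu_data rcu_data[MOCK_NR_CPUS];
	/* The kind of constant pointer this commit deletes: */
	static struct mock_rcu_data *const rcu_data_p = rcu_data;

	int main(void)
	{
		int cpu = 2;

		/* Either spelling reaches the same per-CPU element. */
		assert(&rcu_data_p[cpu] == &rcu_data[cpu]);
		return 0;
	}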

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c        |  1 -
 kernel/rcu/tree_plugin.h | 10 +++++-----
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index a3bcf08ad596..f0e7e3972fd9 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -85,7 +85,6 @@ struct rcu_state rcu_state = {
 	.ofl_lock = __SPIN_LOCK_UNLOCKED(rcu_state.ofl_lock),
 };
 
-static struct rcu_data __percpu *const rcu_data_p = &rcu_data;
 LIST_HEAD(rcu_struct_flavors);
 
 /* Dump rcu_node combining tree at boot to verify correct setup. */
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 329d5802d899..18175ca19f34 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -302,11 +302,11 @@ static void rcu_preempt_ctxt_queue(struct rcu_node *rnp, struct rcu_data *rdp)
 static void rcu_qs(void)
 {
 	RCU_LOCKDEP_WARN(preemptible(), "rcu_qs() invoked with preemption enabled!!!\n");
-	if (__this_cpu_read(rcu_data_p->cpu_no_qs.s)) {
+	if (__this_cpu_read(rcu_data.cpu_no_qs.s)) {
 		trace_rcu_grace_period(TPS("rcu_preempt"),
-				       __this_cpu_read(rcu_data_p->gp_seq),
+				       __this_cpu_read(rcu_data.gp_seq),
 				       TPS("cpuqs"));
-		__this_cpu_write(rcu_data_p->cpu_no_qs.b.norm, false);
+		__this_cpu_write(rcu_data.cpu_no_qs.b.norm, false);
 		barrier(); /* Coordinate with rcu_flavor_check_callbacks(). */
 		current->rcu_read_unlock_special.b.need_qs = false;
 	}
@@ -805,8 +805,8 @@ static void rcu_flavor_check_callbacks(int user)
 
 	/* If GP is oldish, ask for help from rcu_read_unlock_special(). */
 	if (t->rcu_read_lock_nesting > 0 &&
-	    __this_cpu_read(rcu_data_p->core_needs_qs) &&
-	    __this_cpu_read(rcu_data_p->cpu_no_qs.b.norm) &&
+	    __this_cpu_read(rcu_data.core_needs_qs) &&
+	    __this_cpu_read(rcu_data.cpu_no_qs.b.norm) &&
 	    !t->rcu_read_unlock_special.b.need_qs &&
 	    time_after(jiffies, rsp->gp_start + HZ))
 		t->rcu_read_unlock_special.b.need_qs = true;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH tip/core/rcu 0/19] RCU flavor-consolidation changes for v4.20/v5.0
  2018-08-29 22:20 [PATCH tip/core/rcu 0/19] RCU flavor-consolidation changes for v4.20/v5.0 Paul E. McKenney
                   ` (18 preceding siblings ...)
  2018-08-29 22:20 ` [PATCH tip/core/rcu 19/19] rcu: Remove rcu_data_p pointer to default rcu_data structure Paul E. McKenney
@ 2018-08-29 22:22 ` Paul E. McKenney
  19 siblings, 0 replies; 49+ messages in thread
From: Paul E. McKenney @ 2018-08-29 22:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg, joel

On Wed, Aug 29, 2018 at 03:20:21PM -0700, Paul E. McKenney wrote:
> Hello!
> 
> This series contains the RCU flavor consolidation, along with some initial
> cleanup work enabled by that consolidation (and some that became apparent
> while cleaning up):
> 
> 1.	Refactor rcu_{nmi,irq}_{enter,exit}(), saving a branch on the
> 	idle entry/exit hotpaths, courtesy of Byungchul Park,
> 
> 2.	Defer reporting RCU-preempt quiescent states when disabled.
> 	This is the key commit that consolidates the RCU-bh and RCU-sched
> 	flavors into RCU, however, the RCU-bh and RCU-sched flavors
> 	still exist independently as well at this point.
> 
> 3.	Test extended "rcu" read-side critical sections.  This commit
> 	causes rcutorture to test RCU's new-found ability to act as
> 	the combination of RCU, RCU-bh, and RCU-sched.
> 
> 4.	Allow processing deferred QSes for exiting RCU-preempt readers.
> 	This is a optimization.
> 
> 5.	Remove now-unused ->b.exp_need_qs field from the rcu_special union.
> 
> 6.	Add warning to detect half-interrupts.  Test the claim that
> 	the Linux kernel no longer does half-interrupts.
> 
> 7.	Apply RCU-bh QSes to RCU-sched and RCU-preempt when safe, that is,
> 	make the consolidated RCU inherit RCU-bh's denial-of-service
> 	avoidance mechanism.
> 
> 8.	Report expedited grace periods at context-switch time.  This is
> 	an optimization enabled by the RCU flavor consolidation.
> 
> 9.	Define RCU-bh update API in terms of RCU.  This commit gets rid
> 	of the RCU-bh update mechanism.
> 
> 10.	Update comments and help text to account for the removal of
> 	the RCU-bh update mechanism.
> 
> 11.	Drop "wake" parameter from rcu_report_exp_rdp().
> 
> 12.	Fix typo in rcu_get_gp_kthreads_prio() header comment.
> 
> 13.	Define RCU-sched API in terms of RCU for Tree RCU PREEMPT builds.
> 	Although this commit gets rid of the RCU-sched update mechanism
> 	from PREEMPT builds, it of course remains as the sole RCU flavor
> 	for !PREEMPT && !SMP builds.
> 
> 14.	Express Tiny RCU updates in terms of RCU rather than RCU-sched.
> 	This will enable additional cleanups and code savings.
> 
> 15.	Remove RCU_STATE_INITIALIZER() in favor of just using an open-coded
> 	initializer for the sole remaining rcu_state structure.
> 
> 16.	Eliminate rcu_state structure's ->call field, as it is now always
> 	just call_rcu().
> 
> 17.	Remove rcu_state structure's ->rda field, as there is now only one
> 	set of per-CPU rcu_data structures.

And last, but perhaps not least:

18.	Remove rcu_state_p pointer to default rcu_state structure, now
	that there is only one rcu_state structure.

19.	Remove rcu_data_p pointer to default rcu_data structure, now that
	there is only one set of per-CPU rcu_data structures.

							Thanx, Paul

> ------------------------------------------------------------------------
> 
>  Documentation/RCU/Design/Requirements/Requirements.html |   50 -
>  include/linux/rcupdate.h                                |   48 -
>  include/linux/rcupdate_wait.h                           |    6 
>  include/linux/rcutiny.h                                 |   61 +
>  include/linux/rcutree.h                                 |   31 
>  include/linux/sched.h                                   |    6 
>  kernel/rcu/Kconfig                                      |   10 
>  kernel/rcu/rcutorture.c                                 |    1 
>  kernel/rcu/tiny.c                                       |  163 +--
>  kernel/rcu/tree.c                                       |  655 +++++-----------
>  kernel/rcu/tree.h                                       |   44 -
>  kernel/rcu/tree_exp.h                                   |  256 +++---
>  kernel/rcu/tree_plugin.h                                |  523 ++++++------
>  kernel/rcu/update.c                                     |    2 
>  kernel/softirq.c                                        |    3 
>  15 files changed, 836 insertions(+), 1023 deletions(-)


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH tip/core/rcu 01/19] rcu: Refactor rcu_{nmi,irq}_{enter,exit}()
  2018-08-29 22:20 ` [PATCH tip/core/rcu 01/19] rcu: Refactor rcu_{nmi,irq}_{enter,exit}() Paul E. McKenney
@ 2018-08-30 18:10   ` Steven Rostedt
  2018-08-30 23:02     ` Paul E. McKenney
  2018-08-31  2:25     ` Byungchul Park
  0 siblings, 2 replies; 49+ messages in thread
From: Steven Rostedt @ 2018-08-30 18:10 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, dhowells, edumazet,
	fweisbec, oleg, joel, Byungchul Park

On Wed, 29 Aug 2018 15:20:29 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> This commit also changes order of execution from this:
> 
> 	rcu_dynticks_task_exit();
> 	rcu_dynticks_eqs_exit();
> 	trace_rcu_dyntick();
> 	rcu_cleanup_after_idle();
> 
> To this:
> 
> 	rcu_dynticks_task_exit();
> 	rcu_dynticks_eqs_exit();
> 	rcu_cleanup_after_idle();
> 	trace_rcu_dyntick();
> 
> In other words, the calls to trace_rcu_dyntick() and trace_rcu_dyntick()

How is trace_rcu_dyntick() and trace_rcu_dyntick reversed ? ;-)

> are reversed.  This has no functional effect because the real
> concern is whether a given call is before or after the call to
> rcu_dynticks_eqs_exit(), and this patch does not change that.  Before the
> call to rcu_dynticks_eqs_exit(), RCU is not yet watching the current
> CPU and after that call RCU is watching.
> 
> A similar switch in calling order happens on the idle-entry path, with
> similar lack of effect for the same reasons.
> 
> Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Byungchul Park <byungchul.park@lge.com>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> ---
>  kernel/rcu/tree.c | 61 +++++++++++++++++++++++++++++++----------------
>  1 file changed, 41 insertions(+), 20 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 0b760c1369f7..0adf77923e8b 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -771,17 +771,18 @@ void rcu_user_enter(void)
>  #endif /* CONFIG_NO_HZ_FULL */
>  
>  /**
> - * rcu_nmi_exit - inform RCU of exit from NMI context
> + * rcu_nmi_exit_common - inform RCU of exit from NMI context
> + * @irq: Is this call from rcu_irq_exit?
>   *
>   * If we are returning from the outermost NMI handler that interrupted an
>   * RCU-idle period, update rdtp->dynticks and rdtp->dynticks_nmi_nesting
>   * to let the RCU grace-period handling know that the CPU is back to
>   * being RCU-idle.
>   *
> - * If you add or remove a call to rcu_nmi_exit(), be sure to test
> + * If you add or remove a call to rcu_nmi_exit_common(), be sure to test
>   * with CONFIG_RCU_EQS_DEBUG=y.

As this is a static function, this description doesn't make sense. You
need to move the description down to the new rcu_nmi_exit() below.

Other than that...

Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

-- Steve


>   */
> -void rcu_nmi_exit(void)
> +static __always_inline void rcu_nmi_exit_common(bool irq)
>  {
>  	struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
>  
> @@ -807,7 +808,22 @@ void rcu_nmi_exit(void)
>  	/* This NMI interrupted an RCU-idle CPU, restore RCU-idleness. */
>  	trace_rcu_dyntick(TPS("Startirq"), rdtp->dynticks_nmi_nesting, 0, rdtp->dynticks);
>  	WRITE_ONCE(rdtp->dynticks_nmi_nesting, 0); /* Avoid store tearing. */
> +
> +	if (irq)
> +		rcu_prepare_for_idle();
> +
>  	rcu_dynticks_eqs_enter();
> +
> +	if (irq)
> +		rcu_dynticks_task_enter();
> +}
> +
> +/**
> + * rcu_nmi_exit - inform RCU of exit from NMI context
> + */
> +void rcu_nmi_exit(void)
> +{
> +	rcu_nmi_exit_common(false);
>  }
>  
>  /**
> @@ -831,14 +847,8 @@ void rcu_nmi_exit(void)
>   */
>  void rcu_irq_exit(void)
>  {
> -	struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
> -
>  	lockdep_assert_irqs_disabled();
> -	if (rdtp->dynticks_nmi_nesting == 1)
> -		rcu_prepare_for_idle();
> -	rcu_nmi_exit();
> -	if (rdtp->dynticks_nmi_nesting == 0)
> -		rcu_dynticks_task_enter();
> +	rcu_nmi_exit_common(true);
>  }
>  
>  /*
> @@ -921,7 +931,8 @@ void rcu_user_exit(void)
>  #endif /* CONFIG_NO_HZ_FULL */
>  
>  /**
> - * rcu_nmi_enter - inform RCU of entry to NMI context
> + * rcu_nmi_enter_common - inform RCU of entry to NMI context
> + * @irq: Is this call from rcu_irq_enter?
>   *
>   * If the CPU was idle from RCU's viewpoint, update rdtp->dynticks and
>   * rdtp->dynticks_nmi_nesting to let the RCU grace-period handling know
> @@ -929,10 +940,10 @@ void rcu_user_exit(void)
>   * long as the nesting level does not overflow an int.  (You will probably
>   * run out of stack space first.)
>   *
> - * If you add or remove a call to rcu_nmi_enter(), be sure to test
> + * If you add or remove a call to rcu_nmi_enter_common(), be sure to test
>   * with CONFIG_RCU_EQS_DEBUG=y.
>   */
> -void rcu_nmi_enter(void)
> +static __always_inline void rcu_nmi_enter_common(bool irq)
>  {
>  	struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
>  	long incby = 2;
> @@ -949,7 +960,15 @@ void rcu_nmi_enter(void)
>  	 * period (observation due to Andy Lutomirski).
>  	 */
>  	if (rcu_dynticks_curr_cpu_in_eqs()) {
> +
> +		if (irq)
> +			rcu_dynticks_task_exit();
> +
>  		rcu_dynticks_eqs_exit();
> +
> +		if (irq)
> +			rcu_cleanup_after_idle();
> +
>  		incby = 1;
>  	}
>  	trace_rcu_dyntick(incby == 1 ? TPS("Endirq") : TPS("++="),
> @@ -960,6 +979,14 @@ void rcu_nmi_enter(void)
>  	barrier();
>  }
>  
> +/**
> + * rcu_nmi_enter - inform RCU of entry to NMI context
> + */
> +void rcu_nmi_enter(void)
> +{
> +	rcu_nmi_enter_common(false);
> +}
> +
>  /**
>   * rcu_irq_enter - inform RCU that current CPU is entering irq away from idle
>   *
> @@ -984,14 +1011,8 @@ void rcu_nmi_enter(void)
>   */
>  void rcu_irq_enter(void)
>  {
> -	struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
> -
>  	lockdep_assert_irqs_disabled();
> -	if (rdtp->dynticks_nmi_nesting == 0)
> -		rcu_dynticks_task_exit();
> -	rcu_nmi_enter();
> -	if (rdtp->dynticks_nmi_nesting == 1)
> -		rcu_cleanup_after_idle();
> +	rcu_nmi_enter_common(true);
>  }
>  
>  /*


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH tip/core/rcu 01/19] rcu: Refactor rcu_{nmi,irq}_{enter,exit}()
  2018-08-30 18:10   ` Steven Rostedt
@ 2018-08-30 23:02     ` Paul E. McKenney
  2018-08-31  2:25     ` Byungchul Park
  1 sibling, 0 replies; 49+ messages in thread
From: Paul E. McKenney @ 2018-08-30 23:02 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, dhowells, edumazet,
	fweisbec, oleg, joel, Byungchul Park

On Thu, Aug 30, 2018 at 02:10:32PM -0400, Steven Rostedt wrote:
> On Wed, 29 Aug 2018 15:20:29 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > This commit also changes order of execution from this:
> > 
> > 	rcu_dynticks_task_exit();
> > 	rcu_dynticks_eqs_exit();
> > 	trace_rcu_dyntick();
> > 	rcu_cleanup_after_idle();
> > 
> > To this:
> > 
> > 	rcu_dynticks_task_exit();
> > 	rcu_dynticks_eqs_exit();
> > 	rcu_cleanup_after_idle();
> > 	trace_rcu_dyntick();
> > 
> > In other words, the calls to trace_rcu_dyntick() and trace_rcu_dyntick()
> 
> How is trace_rcu_dyntick() and trace_rcu_dyntick reversed ? ;-)

Very carefully?

I changed the first trace_rcu_dyntick() to rcu_cleanup_after_idle(),
good catch!

> > are reversed.  This has no functional effect because the real
> > concern is whether a given call is before or after the call to
> > rcu_dynticks_eqs_exit(), and this patch does not change that.  Before the
> > call to rcu_dynticks_eqs_exit(), RCU is not yet watching the current
> > CPU and after that call RCU is watching.
> > 
> > A similar switch in calling order happens on the idle-entry path, with
> > similar lack of effect for the same reasons.
> > 
> > Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Signed-off-by: Byungchul Park <byungchul.park@lge.com>
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > ---
> >  kernel/rcu/tree.c | 61 +++++++++++++++++++++++++++++++----------------
> >  1 file changed, 41 insertions(+), 20 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 0b760c1369f7..0adf77923e8b 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -771,17 +771,18 @@ void rcu_user_enter(void)
> >  #endif /* CONFIG_NO_HZ_FULL */
> >  
> >  /**
> > - * rcu_nmi_exit - inform RCU of exit from NMI context
> > + * rcu_nmi_exit_common - inform RCU of exit from NMI context
> > + * @irq: Is this call from rcu_irq_exit?
> >   *
> >   * If we are returning from the outermost NMI handler that interrupted an
> >   * RCU-idle period, update rdtp->dynticks and rdtp->dynticks_nmi_nesting
> >   * to let the RCU grace-period handling know that the CPU is back to
> >   * being RCU-idle.
> >   *
> > - * If you add or remove a call to rcu_nmi_exit(), be sure to test
> > + * If you add or remove a call to rcu_nmi_exit_common(), be sure to test
> >   * with CONFIG_RCU_EQS_DEBUG=y.
> 
> As this is a static function, this description doesn't make sense. You
> need to move the description down to the new rcu_nmi_exit() below.

Heh!  This will give git a chance to show off its conflict-resolution
capabilities!!!  Let's see how it does...

Not bad!  It resolved the conflicts automatically despite the code
movement.  Nice!!!  ;-)

> Other than that...
> 
> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

Of course my penalty for my lack of faith in git is a second rebase
to pull this in.  ;-)

Thank you for your review and comments!

							Thanx, Paul

> -- Steve
> 
> 
> >   */
> > -void rcu_nmi_exit(void)
> > +static __always_inline void rcu_nmi_exit_common(bool irq)
> >  {
> >  	struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
> >  
> > @@ -807,7 +808,22 @@ void rcu_nmi_exit(void)
> >  	/* This NMI interrupted an RCU-idle CPU, restore RCU-idleness. */
> >  	trace_rcu_dyntick(TPS("Startirq"), rdtp->dynticks_nmi_nesting, 0, rdtp->dynticks);
> >  	WRITE_ONCE(rdtp->dynticks_nmi_nesting, 0); /* Avoid store tearing. */
> > +
> > +	if (irq)
> > +		rcu_prepare_for_idle();
> > +
> >  	rcu_dynticks_eqs_enter();
> > +
> > +	if (irq)
> > +		rcu_dynticks_task_enter();
> > +}
> > +
> > +/**
> > + * rcu_nmi_exit - inform RCU of exit from NMI context
> > + */
> > +void rcu_nmi_exit(void)
> > +{
> > +	rcu_nmi_exit_common(false);
> >  }
> >  
> >  /**
> > @@ -831,14 +847,8 @@ void rcu_nmi_exit(void)
> >   */
> >  void rcu_irq_exit(void)
> >  {
> > -	struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
> > -
> >  	lockdep_assert_irqs_disabled();
> > -	if (rdtp->dynticks_nmi_nesting == 1)
> > -		rcu_prepare_for_idle();
> > -	rcu_nmi_exit();
> > -	if (rdtp->dynticks_nmi_nesting == 0)
> > -		rcu_dynticks_task_enter();
> > +	rcu_nmi_exit_common(true);
> >  }
> >  
> >  /*
> > @@ -921,7 +931,8 @@ void rcu_user_exit(void)
> >  #endif /* CONFIG_NO_HZ_FULL */
> >  
> >  /**
> > - * rcu_nmi_enter - inform RCU of entry to NMI context
> > + * rcu_nmi_enter_common - inform RCU of entry to NMI context
> > + * @irq: Is this call from rcu_irq_enter?
> >   *
> >   * If the CPU was idle from RCU's viewpoint, update rdtp->dynticks and
> >   * rdtp->dynticks_nmi_nesting to let the RCU grace-period handling know
> > @@ -929,10 +940,10 @@ void rcu_user_exit(void)
> >   * long as the nesting level does not overflow an int.  (You will probably
> >   * run out of stack space first.)
> >   *
> > - * If you add or remove a call to rcu_nmi_enter(), be sure to test
> > + * If you add or remove a call to rcu_nmi_enter_common(), be sure to test
> >   * with CONFIG_RCU_EQS_DEBUG=y.
> >   */
> > -void rcu_nmi_enter(void)
> > +static __always_inline void rcu_nmi_enter_common(bool irq)
> >  {
> >  	struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
> >  	long incby = 2;
> > @@ -949,7 +960,15 @@ void rcu_nmi_enter(void)
> >  	 * period (observation due to Andy Lutomirski).
> >  	 */
> >  	if (rcu_dynticks_curr_cpu_in_eqs()) {
> > +
> > +		if (irq)
> > +			rcu_dynticks_task_exit();
> > +
> >  		rcu_dynticks_eqs_exit();
> > +
> > +		if (irq)
> > +			rcu_cleanup_after_idle();
> > +
> >  		incby = 1;
> >  	}
> >  	trace_rcu_dyntick(incby == 1 ? TPS("Endirq") : TPS("++="),
> > @@ -960,6 +979,14 @@ void rcu_nmi_enter(void)
> >  	barrier();
> >  }
> >  
> > +/**
> > + * rcu_nmi_enter - inform RCU of entry to NMI context
> > + */
> > +void rcu_nmi_enter(void)
> > +{
> > +	rcu_nmi_enter_common(false);
> > +}
> > +
> >  /**
> >   * rcu_irq_enter - inform RCU that current CPU is entering irq away from idle
> >   *
> > @@ -984,14 +1011,8 @@ void rcu_nmi_enter(void)
> >   */
> >  void rcu_irq_enter(void)
> >  {
> > -	struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
> > -
> >  	lockdep_assert_irqs_disabled();
> > -	if (rdtp->dynticks_nmi_nesting == 0)
> > -		rcu_dynticks_task_exit();
> > -	rcu_nmi_enter();
> > -	if (rdtp->dynticks_nmi_nesting == 1)
> > -		rcu_cleanup_after_idle();
> > +	rcu_nmi_enter_common(true);
> >  }
> >  
> >  /*
> 


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH tip/core/rcu 01/19] rcu: Refactor rcu_{nmi,irq}_{enter,exit}()
  2018-08-30 18:10   ` Steven Rostedt
  2018-08-30 23:02     ` Paul E. McKenney
@ 2018-08-31  2:25     ` Byungchul Park
  1 sibling, 0 replies; 49+ messages in thread
From: Byungchul Park @ 2018-08-31  2:25 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Paul E. McKenney, linux-kernel, mingo, jiangshanlai, dipankar,
	akpm, mathieu.desnoyers, josh, tglx, peterz, dhowells, edumazet,
	fweisbec, oleg, joel, kernel-team

On Thu, Aug 30, 2018 at 02:10:32PM -0400, Steven Rostedt wrote:
> On Wed, 29 Aug 2018 15:20:29 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > This commit also changes order of execution from this:
> > 
> > 	rcu_dynticks_task_exit();
> > 	rcu_dynticks_eqs_exit();
> > 	trace_rcu_dyntick();
> > 	rcu_cleanup_after_idle();
> > 
> > To this:
> > 
> > 	rcu_dynticks_task_exit();
> > 	rcu_dynticks_eqs_exit();
> > 	rcu_cleanup_after_idle();
> > 	trace_rcu_dyntick();
> > 
> > In other words, the calls to trace_rcu_dyntick() and trace_rcu_dyntick()
> 
> How is trace_rcu_dyntick() and trace_rcu_dyntick reversed ? ;-)
> 
> > are reversed.  This has no functional effect because the real
> > concern is whether a given call is before or after the call to
> > rcu_dynticks_eqs_exit(), and this patch does not change that.  Before the
> > call to rcu_dynticks_eqs_exit(), RCU is not yet watching the current
> > CPU and after that call RCU is watching.
> > 
> > A similar switch in calling order happens on the idle-entry path, with
> > similar lack of effect for the same reasons.
> > 
> > Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Signed-off-by: Byungchul Park <byungchul.park@lge.com>
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > ---
> >  kernel/rcu/tree.c | 61 +++++++++++++++++++++++++++++++----------------
> >  1 file changed, 41 insertions(+), 20 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 0b760c1369f7..0adf77923e8b 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -771,17 +771,18 @@ void rcu_user_enter(void)
> >  #endif /* CONFIG_NO_HZ_FULL */
> >  
> >  /**
> > - * rcu_nmi_exit - inform RCU of exit from NMI context
> > + * rcu_nmi_exit_common - inform RCU of exit from NMI context
> > + * @irq: Is this call from rcu_irq_exit?
> >   *
> >   * If we are returning from the outermost NMI handler that interrupted an
> >   * RCU-idle period, update rdtp->dynticks and rdtp->dynticks_nmi_nesting
> >   * to let the RCU grace-period handling know that the CPU is back to
> >   * being RCU-idle.
> >   *
> > - * If you add or remove a call to rcu_nmi_exit(), be sure to test
> > + * If you add or remove a call to rcu_nmi_exit_common(), be sure to test
> >   * with CONFIG_RCU_EQS_DEBUG=y.
> 
> As this is a static function, this description doesn't make sense. You
> need to move the description down to the new rcu_nmi_exit() below.

Right, I should've done that.

Thank you, Steve.

Byungchul

> Other than that...
> 
> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> 
> -- Steve
> 
> 
> >   */
> > -void rcu_nmi_exit(void)
> > +static __always_inline void rcu_nmi_exit_common(bool irq)
> >  {
> >  	struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
> >  
> > @@ -807,7 +808,22 @@ void rcu_nmi_exit(void)
> >  	/* This NMI interrupted an RCU-idle CPU, restore RCU-idleness. */
> >  	trace_rcu_dyntick(TPS("Startirq"), rdtp->dynticks_nmi_nesting, 0, rdtp->dynticks);
> >  	WRITE_ONCE(rdtp->dynticks_nmi_nesting, 0); /* Avoid store tearing. */
> > +
> > +	if (irq)
> > +		rcu_prepare_for_idle();
> > +
> >  	rcu_dynticks_eqs_enter();
> > +
> > +	if (irq)
> > +		rcu_dynticks_task_enter();
> > +}
> > +
> > +/**
> > + * rcu_nmi_exit - inform RCU of exit from NMI context
> > + */
> > +void rcu_nmi_exit(void)
> > +{
> > +	rcu_nmi_exit_common(false);
> >  }
> >  
> >  /**
> > @@ -831,14 +847,8 @@ void rcu_nmi_exit(void)
> >   */
> >  void rcu_irq_exit(void)
> >  {
> > -	struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
> > -
> >  	lockdep_assert_irqs_disabled();
> > -	if (rdtp->dynticks_nmi_nesting == 1)
> > -		rcu_prepare_for_idle();
> > -	rcu_nmi_exit();
> > -	if (rdtp->dynticks_nmi_nesting == 0)
> > -		rcu_dynticks_task_enter();
> > +	rcu_nmi_exit_common(true);
> >  }
> >  
> >  /*
> > @@ -921,7 +931,8 @@ void rcu_user_exit(void)
> >  #endif /* CONFIG_NO_HZ_FULL */
> >  
> >  /**
> > - * rcu_nmi_enter - inform RCU of entry to NMI context
> > + * rcu_nmi_enter_common - inform RCU of entry to NMI context
> > + * @irq: Is this call from rcu_irq_enter?
> >   *
> >   * If the CPU was idle from RCU's viewpoint, update rdtp->dynticks and
> >   * rdtp->dynticks_nmi_nesting to let the RCU grace-period handling know
> > @@ -929,10 +940,10 @@ void rcu_user_exit(void)
> >   * long as the nesting level does not overflow an int.  (You will probably
> >   * run out of stack space first.)
> >   *
> > - * If you add or remove a call to rcu_nmi_enter(), be sure to test
> > + * If you add or remove a call to rcu_nmi_enter_common(), be sure to test
> >   * with CONFIG_RCU_EQS_DEBUG=y.
> >   */
> > -void rcu_nmi_enter(void)
> > +static __always_inline void rcu_nmi_enter_common(bool irq)
> >  {
> >  	struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
> >  	long incby = 2;
> > @@ -949,7 +960,15 @@ void rcu_nmi_enter(void)
> >  	 * period (observation due to Andy Lutomirski).
> >  	 */
> >  	if (rcu_dynticks_curr_cpu_in_eqs()) {
> > +
> > +		if (irq)
> > +			rcu_dynticks_task_exit();
> > +
> >  		rcu_dynticks_eqs_exit();
> > +
> > +		if (irq)
> > +			rcu_cleanup_after_idle();
> > +
> >  		incby = 1;
> >  	}
> >  	trace_rcu_dyntick(incby == 1 ? TPS("Endirq") : TPS("++="),
> > @@ -960,6 +979,14 @@ void rcu_nmi_enter(void)
> >  	barrier();
> >  }
> >  
> > +/**
> > + * rcu_nmi_enter - inform RCU of entry to NMI context
> > + */
> > +void rcu_nmi_enter(void)
> > +{
> > +	rcu_nmi_enter_common(false);
> > +}
> > +
> >  /**
> >   * rcu_irq_enter - inform RCU that current CPU is entering irq away from idle
> >   *
> > @@ -984,14 +1011,8 @@ void rcu_nmi_enter(void)
> >   */
> >  void rcu_irq_enter(void)
> >  {
> > -	struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
> > -
> >  	lockdep_assert_irqs_disabled();
> > -	if (rdtp->dynticks_nmi_nesting == 0)
> > -		rcu_dynticks_task_exit();
> > -	rcu_nmi_enter();
> > -	if (rdtp->dynticks_nmi_nesting == 1)
> > -		rcu_cleanup_after_idle();
> > +	rcu_nmi_enter_common(true);
> >  }
> >  
> >  /*

^ permalink raw reply	[flat|nested] 49+ messages in thread

* RE: [PATCH tip/core/rcu 02/19] rcu: Defer reporting RCU-preempt quiescent states when disabled
  2018-08-29 22:20 ` [PATCH tip/core/rcu 02/19] rcu: Defer reporting RCU-preempt quiescent states when disabled Paul E. McKenney
@ 2018-10-29 11:24   ` Ran Rozenstein
  2018-10-29 14:27     ` Paul E. McKenney
  0 siblings, 1 reply; 49+ messages in thread
From: Ran Rozenstein @ 2018-10-29 11:24 UTC (permalink / raw)
  To: Paul E. McKenney, linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg, joel,
	Maor Gottlieb, Tariq Toukan, Eran Ben Elisha, Leon Romanovsky

Hi Paul and all,

> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
> owner@vger.kernel.org] On Behalf Of Paul E. McKenney
> Sent: Thursday, August 30, 2018 01:21
> To: linux-kernel@vger.kernel.org
> Cc: mingo@kernel.org; jiangshanlai@gmail.com; dipankar@in.ibm.com;
> akpm@linux-foundation.org; mathieu.desnoyers@efficios.com;
> josh@joshtriplett.org; tglx@linutronix.de; peterz@infradead.org;
> rostedt@goodmis.org; dhowells@redhat.com; edumazet@google.com;
> fweisbec@gmail.com; oleg@redhat.com; joel@joelfernandes.org; Paul E.
> McKenney <paulmck@linux.vnet.ibm.com>
> Subject: [PATCH tip/core/rcu 02/19] rcu: Defer reporting RCU-preempt
> quiescent states when disabled
> 
> This commit defers reporting of RCU-preempt quiescent states at
> rcu_read_unlock_special() time when any of interrupts, softirq, or
> preemption are disabled.  These deferred quiescent states are reported at a
> later RCU_SOFTIRQ, context switch, idle entry, or CPU-hotplug offline
> operation.  Of course, if another RCU read-side critical section has started in
> the meantime, the reporting of the quiescent state will be further deferred.
> 
> This also means that disabling preemption, interrupts, and/or softirqs will act
> as an RCU-preempt read-side critical section.
> This is enforced by checking preempt_count() as needed.
> 
> Some special cases must be handled on an ad-hoc basis, for example,
> context switch is a quiescent state even though both the scheduler and
> do_exit() disable preemption.  In these cases, additional calls to
> rcu_preempt_deferred_qs() override the preemption disabling.  Similar logic
> overrides disabled interrupts in rcu_preempt_check_callbacks() because in
> this case the quiescent state happened just before the corresponding
> scheduling-clock interrupt.
> 
> In theory, this change lifts a long-standing restriction that required that if
> interrupts were disabled across a call to rcu_read_unlock() that the matching
> rcu_read_lock() also be contained within that interrupts-disabled region of
> code.  Because the reporting of the corresponding RCU-preempt quiescent
> state is now deferred until after interrupts have been enabled, it is no longer
> possible for this situation to result in deadlocks involving the scheduler's
> runqueue and priority-inheritance locks.  This may allow some code
> simplification that might reduce interrupt latency a bit.  Unfortunately, in
> practice this would also defer deboosting a low-priority task that had been
> subjected to RCU priority boosting, so real-time-response considerations
> might well force this restriction to remain in place.
> 
> Because RCU-preempt grace periods are now blocked not only by RCU read-
> side critical sections, but also by disabling of interrupts, preemption, and
> softirqs, it will be possible to eliminate RCU-bh and RCU-sched in favor of
> RCU-preempt in CONFIG_PREEMPT=y kernels.  This may require some
> additional plumbing to provide the network denial-of-service guarantees
> that have been traditionally provided by RCU-bh.  Once these are in place,
> CONFIG_PREEMPT=n kernels will be able to fold RCU-bh into RCU-sched.
> This would mean that all kernels would have but one flavor of RCU, which
> would open the door to significant code cleanup.
> 
> Moving to a single flavor of RCU would also have the beneficial effect of
> reducing the NOCB kthreads by at least a factor of two.
> 
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck:
> Apply rcu_read_unlock_special() preempt_count() feedback
>   from Joel Fernandes. ]
> [ paulmck: Adjust rcu_eqs_enter() call to rcu_preempt_deferred_qs() in
>   response to bug reports from kbuild test robot. ] [ paulmck: Fix bug located
> by kbuild test robot involving recursion
>   via rcu_preempt_deferred_qs(). ]
> ---
>  .../RCU/Design/Requirements/Requirements.html |  50 +++---
>  include/linux/rcutiny.h                       |   5 +
>  kernel/rcu/tree.c                             |   9 ++
>  kernel/rcu/tree.h                             |   3 +
>  kernel/rcu/tree_exp.h                         |  71 +++++++--
>  kernel/rcu/tree_plugin.h                      | 144 +++++++++++++-----
>  6 files changed, 205 insertions(+), 77 deletions(-)
> 

We started seeing the trace below in our regression system; after bisecting, I found that this is the offending commit.
This appears immediately on boot.
Please let me know if you need any additional details.

Thanks,
Ran


[2018-10-27 05:53:07] ================================================================================
[2018-10-27 05:53:07] UBSAN: Undefined behaviour in kernel/rcu/tree_plugin.h:620:28
[2018-10-27 05:53:07] signed integer overflow:
[2018-10-27 05:53:07] 0 - -2147483648 cannot be represented in type 'int'
[2018-10-27 05:53:07] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.19.0-for-upstream-dbg-2018-10-25_03-10-39-82 #1
[2018-10-27 05:53:07] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[2018-10-27 05:53:07] Call Trace:
[2018-10-27 05:53:07]  dump_stack+0x9a/0xeb
[2018-10-27 05:53:07]  ubsan_epilogue+0x9/0x7c
[2018-10-27 05:53:07]  handle_overflow+0x235/0x278
[2018-10-27 05:53:07]  ? __ubsan_handle_negate_overflow+0x1bd/0x1bd
[2018-10-27 05:53:07]  ? sched_clock_local+0xd4/0x140
[2018-10-27 05:53:07]  ? kvm_clock_read+0x14/0x30
[2018-10-27 05:53:07]  rcu_preempt_deferred_qs+0x12a/0x150
[2018-10-27 05:53:07]  rcu_note_context_switch+0x1b9/0x1ac0
[2018-10-27 05:53:07]  __schedule+0x22d/0x1fd0
[2018-10-27 05:53:07]  ? pci_mmcfg_check_reserved+0x130/0x130
[2018-10-27 05:53:07]  ? sched_set_stop_task+0x330/0x330
[2018-10-27 05:53:07]  ? lockdep_hardirqs_on+0x360/0x620
[2018-10-27 05:53:07]  schedule_idle+0x45/0x80
[2018-10-27 05:53:07]  do_idle+0x23e/0x3e0
[2018-10-27 05:53:07]  ? check_flags.part.26+0x440/0x440
[2018-10-27 05:53:07]  ? arch_cpu_idle_exit+0x40/0x40
[2018-10-27 05:53:07]  ? __wake_up_common+0x156/0x5c0
[2018-10-27 05:53:07]  ? _raw_spin_unlock_irqrestore+0x59/0x70
[2018-10-27 05:53:07]  cpu_startup_entry+0x19/0x20
[2018-10-27 05:53:07]  start_secondary+0x420/0x570
[2018-10-27 05:53:07]  ? set_cpu_sibling_map+0x2f90/0x2f90
[2018-10-27 05:53:07]  secondary_startup_64+0xa4/0xb0
[2018-10-27 05:53:07] ================================================================================
[2018-10-27 05:53:07] ================================================================================
[2018-10-27 05:53:07] UBSAN: Undefined behaviour in kernel/rcu/tree_plugin.h:624:28
[2018-10-27 05:53:07] signed integer overflow:
[2018-10-27 05:53:07] -2147483648 + -2147483648 cannot be represented in type 'int'
[2018-10-27 05:53:07] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.19.0-for-upstream-dbg-2018-10-25_03-10-39-82 #1
[2018-10-27 05:53:07] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[2018-10-27 05:53:07] Call Trace:
[2018-10-27 05:53:07]  dump_stack+0x9a/0xeb
[2018-10-27 05:53:07]  ubsan_epilogue+0x9/0x7c
[2018-10-27 05:53:07]  handle_overflow+0x235/0x278
[2018-10-27 05:53:07]  ? __ubsan_handle_negate_overflow+0x1bd/0x1bd
[2018-10-27 05:53:07]  ? check_flags.part.26+0x440/0x440
[2018-10-27 05:53:07]  ? _raw_spin_unlock_irqrestore+0x3c/0x70
[2018-10-27 05:53:07]  ? _raw_spin_unlock_irqrestore+0x3c/0x70
[2018-10-27 05:53:07]  ? lockdep_hardirqs_off+0x1fd/0x2c0
[2018-10-27 05:53:07]  ? kvm_clock_read+0x14/0x30
[2018-10-27 05:53:07]  rcu_preempt_deferred_qs+0x145/0x150
[2018-10-27 05:53:07]  rcu_note_context_switch+0x1b9/0x1ac0
[2018-10-27 05:53:07]  __schedule+0x22d/0x1fd0
[2018-10-27 05:53:07]  ? pci_mmcfg_check_reserved+0x130/0x130
[2018-10-27 05:53:07]  ? sched_set_stop_task+0x330/0x330
[2018-10-27 05:53:07]  ? lockdep_hardirqs_on+0x360/0x620
[2018-10-27 05:53:07]  schedule_idle+0x45/0x80
[2018-10-27 05:53:07]  do_idle+0x23e/0x3e0
[2018-10-27 05:53:07]  ? check_flags.part.26+0x440/0x440
[2018-10-27 05:53:07]  ? arch_cpu_idle_exit+0x40/0x40
[2018-10-27 05:53:07]  ? __wake_up_common+0x156/0x5c0
[2018-10-27 05:53:07]  ? _raw_spin_unlock_irqrestore+0x59/0x70
[2018-10-27 05:53:07]  cpu_startup_entry+0x19/0x20
[2018-10-27 05:53:07]  start_secondary+0x420/0x570
[2018-10-27 05:53:07]  ? set_cpu_sibling_map+0x2f90/0x2f90
[2018-10-27 05:53:07]  secondary_startup_64+0xa4/0xb0
[2018-10-27 05:53:07] ================================================================================



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH tip/core/rcu 02/19] rcu: Defer reporting RCU-preempt quiescent states when disabled
  2018-10-29 11:24   ` Ran Rozenstein
@ 2018-10-29 14:27     ` Paul E. McKenney
  2018-10-30  3:44       ` Joel Fernandes
  0 siblings, 1 reply; 49+ messages in thread
From: Paul E. McKenney @ 2018-10-29 14:27 UTC (permalink / raw)
  To: Ran Rozenstein
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Maor Gottlieb, Tariq Toukan,
	Eran Ben Elisha, Leon Romanovsky

On Mon, Oct 29, 2018 at 11:24:42AM +0000, Ran Rozenstein wrote:
> Hi Paul and all,
> 
> > -----Original Message-----
> > From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
> > owner@vger.kernel.org] On Behalf Of Paul E. McKenney
> > Sent: Thursday, August 30, 2018 01:21
> > To: linux-kernel@vger.kernel.org
> > Cc: mingo@kernel.org; jiangshanlai@gmail.com; dipankar@in.ibm.com;
> > akpm@linux-foundation.org; mathieu.desnoyers@efficios.com;
> > josh@joshtriplett.org; tglx@linutronix.de; peterz@infradead.org;
> > rostedt@goodmis.org; dhowells@redhat.com; edumazet@google.com;
> > fweisbec@gmail.com; oleg@redhat.com; joel@joelfernandes.org; Paul E.
> > McKenney <paulmck@linux.vnet.ibm.com>
> > Subject: [PATCH tip/core/rcu 02/19] rcu: Defer reporting RCU-preempt
> > quiescent states when disabled
> > 
> > This commit defers reporting of RCU-preempt quiescent states at
> > rcu_read_unlock_special() time when any of interrupts, softirq, or
> > preemption are disabled.  These deferred quiescent states are reported at a
> > later RCU_SOFTIRQ, context switch, idle entry, or CPU-hotplug offline
> > operation.  Of course, if another RCU read-side critical section has started in
> > the meantime, the reporting of the quiescent state will be further deferred.
> > 
> > This also means that disabling preemption, interrupts, and/or softirqs will act
> > as an RCU-preempt read-side critical section.
> > This is enforced by checking preempt_count() as needed.
> > 
> > Some special cases must be handled on an ad-hoc basis, for example,
> > context switch is a quiescent state even though both the scheduler and
> > do_exit() disable preemption.  In these cases, additional calls to
> > rcu_preempt_deferred_qs() override the preemption disabling.  Similar logic
> > overrides disabled interrupts in rcu_preempt_check_callbacks() because in
> > this case the quiescent state happened just before the corresponding
> > scheduling-clock interrupt.
> > 
> > In theory, this change lifts a long-standing restriction that required that if
> > interrupts were disabled across a call to rcu_read_unlock() that the matching
> > rcu_read_lock() also be contained within that interrupts-disabled region of
> > code.  Because the reporting of the corresponding RCU-preempt quiescent
> > state is now deferred until after interrupts have been enabled, it is no longer
> > possible for this situation to result in deadlocks involving the scheduler's
> > runqueue and priority-inheritance locks.  This may allow some code
> > simplification that might reduce interrupt latency a bit.  Unfortunately, in
> > practice this would also defer deboosting a low-priority task that had been
> > subjected to RCU priority boosting, so real-time-response considerations
> > might well force this restriction to remain in place.
> > 
> > Because RCU-preempt grace periods are now blocked not only by RCU read-
> > side critical sections, but also by disabling of interrupts, preemption, and
> > softirqs, it will be possible to eliminate RCU-bh and RCU-sched in favor of
> > RCU-preempt in CONFIG_PREEMPT=y kernels.  This may require some
> > additional plumbing to provide the network denial-of-service guarantees
> > that have been traditionally provided by RCU-bh.  Once these are in place,
> > CONFIG_PREEMPT=n kernels will be able to fold RCU-bh into RCU-sched.
> > This would mean that all kernels would have but one flavor of RCU, which
> > would open the door to significant code cleanup.
> > 
> > Moving to a single flavor of RCU would also have the beneficial effect of
> > reducing the NOCB kthreads by at least a factor of two.
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck:
> > Apply rcu_read_unlock_special() preempt_count() feedback
> >   from Joel Fernandes. ]
> > [ paulmck: Adjust rcu_eqs_enter() call to rcu_preempt_deferred_qs() in
> >   response to bug reports from kbuild test robot. ] [ paulmck: Fix bug located
> > by kbuild test robot involving recursion
> >   via rcu_preempt_deferred_qs(). ]
> > ---
> >  .../RCU/Design/Requirements/Requirements.html |  50 +++---
> >  include/linux/rcutiny.h                       |   5 +
> >  kernel/rcu/tree.c                             |   9 ++
> >  kernel/rcu/tree.h                             |   3 +
> >  kernel/rcu/tree_exp.h                         |  71 +++++++--
> >  kernel/rcu/tree_plugin.h                      | 144 +++++++++++++-----
> >  6 files changed, 205 insertions(+), 77 deletions(-)
> > 
> 
> We started seeing the trace below in our regression system; after bisecting, I found that this is the offending commit.
> This appears immediately on boot.
> Please let me know if you need any additional details.

Interesting.  Here is the offending function:

	static void rcu_preempt_deferred_qs(struct task_struct *t)
	{
		unsigned long flags;
		bool couldrecurse = t->rcu_read_lock_nesting >= 0;

		if (!rcu_preempt_need_deferred_qs(t))
			return;
		if (couldrecurse)
			t->rcu_read_lock_nesting -= INT_MIN;
		local_irq_save(flags);
		rcu_preempt_deferred_qs_irqrestore(t, flags);
		if (couldrecurse)
			t->rcu_read_lock_nesting += INT_MIN;
	}

Using two's-complement arithmetic (which the kernel build's gcc arguments
enforce, last I checked), this does work.  But as UBSAN says, subtracting
INT_MIN from a non-negative int overflows, which is undefined behavior
according to the C standard.

Good catch!!!

So how do I make the above code not simply function, but rather meet
the C standard?

One approach is to add INT_MIN going in, then add INT_MAX and then add 1
coming out.

Another approach is to sacrifice the INT_MAX value (should be plenty
safe), thus subtract INT_MAX going in and add INT_MAX coming out.
For consistency, I suppose that I should change the INT_MIN in
__rcu_read_unlock() to -INT_MAX.
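
For illustration, a quick user-space sketch (plain C with a
hypothetical helper name, not kernel code) of why the INT_MAX offset
stays in range for any non-negative nesting value:

	#include <assert.h>
	#include <limits.h>

	static int offset_round_trip(int nesting)	/* caller ensures nesting >= 0 */
	{
		nesting -= INT_MAX;	/* lands in [-INT_MAX, 0]: representable, no overflow */
		/* ...the deferred quiescent-state reporting would run here... */
		nesting += INT_MAX;	/* restores the original value exactly */
		return nesting;
	}

	int main(void)
	{
		/* By contrast, 0 - INT_MIN cannot be represented in an int,
		 * which is what UBSAN is complaining about above. */
		assert(offset_round_trip(0) == 0);
		assert(offset_round_trip(42) == 42);
		return 0;
	}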

I could also leave __rcu_read_unlock() alone and XOR the top
bit of t->rcu_read_lock_nesting on entry and exit to/from
rcu_preempt_deferred_qs().

Sacrificing the INT_MIN value seems most maintainable, as in the following
patch.  Thoughts?

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index bd8186d0f4a7..f1b40c6d36e4 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -424,7 +424,7 @@ void __rcu_read_unlock(void)
 		--t->rcu_read_lock_nesting;
 	} else {
 		barrier();  /* critical section before exit code. */
-		t->rcu_read_lock_nesting = INT_MIN;
+		t->rcu_read_lock_nesting = -INT_MAX;
 		barrier();  /* assign before ->rcu_read_unlock_special load */
 		if (unlikely(READ_ONCE(t->rcu_read_unlock_special.s)))
 			rcu_read_unlock_special(t);
@@ -617,11 +617,11 @@ static void rcu_preempt_deferred_qs(struct task_struct *t)
 	if (!rcu_preempt_need_deferred_qs(t))
 		return;
 	if (couldrecurse)
-		t->rcu_read_lock_nesting -= INT_MIN;
+		t->rcu_read_lock_nesting -= INT_MAX;
 	local_irq_save(flags);
 	rcu_preempt_deferred_qs_irqrestore(t, flags);
 	if (couldrecurse)
-		t->rcu_read_lock_nesting += INT_MIN;
+		t->rcu_read_lock_nesting += INT_MAX;
 }
 
 /*


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH tip/core/rcu 02/19] rcu: Defer reporting RCU-preempt quiescent states when disabled
  2018-10-29 14:27     ` Paul E. McKenney
@ 2018-10-30  3:44       ` Joel Fernandes
  2018-10-30 12:58         ` Paul E. McKenney
  0 siblings, 1 reply; 49+ messages in thread
From: Joel Fernandes @ 2018-10-30  3:44 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Ran Rozenstein, linux-kernel, mingo, jiangshanlai, dipankar,
	akpm, mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, Maor Gottlieb, Tariq Toukan,
	Eran Ben Elisha, Leon Romanovsky

On Mon, Oct 29, 2018 at 07:27:35AM -0700, Paul E. McKenney wrote:
> On Mon, Oct 29, 2018 at 11:24:42AM +0000, Ran Rozenstein wrote:
> > Hi Paul and all,
> > 
> > > -----Original Message-----
> > > From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
> > > owner@vger.kernel.org] On Behalf Of Paul E. McKenney
> > > Sent: Thursday, August 30, 2018 01:21
> > > To: linux-kernel@vger.kernel.org
> > > Cc: mingo@kernel.org; jiangshanlai@gmail.com; dipankar@in.ibm.com;
> > > akpm@linux-foundation.org; mathieu.desnoyers@efficios.com;
> > > josh@joshtriplett.org; tglx@linutronix.de; peterz@infradead.org;
> > > rostedt@goodmis.org; dhowells@redhat.com; edumazet@google.com;
> > > fweisbec@gmail.com; oleg@redhat.com; joel@joelfernandes.org; Paul E.
> > > McKenney <paulmck@linux.vnet.ibm.com>
> > > Subject: [PATCH tip/core/rcu 02/19] rcu: Defer reporting RCU-preempt
> > > quiescent states when disabled
> > > 
> > > This commit defers reporting of RCU-preempt quiescent states at
> > > rcu_read_unlock_special() time when any of interrupts, softirq, or
> > > preemption are disabled.  These deferred quiescent states are reported at a
> > > later RCU_SOFTIRQ, context switch, idle entry, or CPU-hotplug offline
> > > operation.  Of course, if another RCU read-side critical section has started in
> > > the meantime, the reporting of the quiescent state will be further deferred.
> > > 
> > > This also means that disabling preemption, interrupts, and/or softirqs will act
> > > as an RCU-preempt read-side critical section.
> > > This is enforced by checking preempt_count() as needed.
> > > 
> > > Some special cases must be handled on an ad-hoc basis, for example,
> > > context switch is a quiescent state even though both the scheduler and
> > > do_exit() disable preemption.  In these cases, additional calls to
> > > rcu_preempt_deferred_qs() override the preemption disabling.  Similar logic
> > > overrides disabled interrupts in rcu_preempt_check_callbacks() because in
> > > this case the quiescent state happened just before the corresponding
> > > scheduling-clock interrupt.
> > > 
> > > In theory, this change lifts a long-standing restriction that required that if
> > > interrupts were disabled across a call to rcu_read_unlock() that the matching
> > > rcu_read_lock() also be contained within that interrupts-disabled region of
> > > code.  Because the reporting of the corresponding RCU-preempt quiescent
> > > state is now deferred until after interrupts have been enabled, it is no longer
> > > possible for this situation to result in deadlocks involving the scheduler's
> > > runqueue and priority-inheritance locks.  This may allow some code
> > > simplification that might reduce interrupt latency a bit.  Unfortunately, in
> > > practice this would also defer deboosting a low-priority task that had been
> > > subjected to RCU priority boosting, so real-time-response considerations
> > > might well force this restriction to remain in place.
> > > 
> > > Because RCU-preempt grace periods are now blocked not only by RCU read-
> > > side critical sections, but also by disabling of interrupts, preemption, and
> > > softirqs, it will be possible to eliminate RCU-bh and RCU-sched in favor of
> > > RCU-preempt in CONFIG_PREEMPT=y kernels.  This may require some
> > > additional plumbing to provide the network denial-of-service guarantees
> > > that have been traditionally provided by RCU-bh.  Once these are in place,
> > > CONFIG_PREEMPT=n kernels will be able to fold RCU-bh into RCU-sched.
> > > This would mean that all kernels would have but one flavor of RCU, which
> > > would open the door to significant code cleanup.
> > > 
> > > Moving to a single flavor of RCU would also have the beneficial effect of
> > > reducing the NOCB kthreads by at least a factor of two.
> > > 
> > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > [ paulmck: Apply rcu_read_unlock_special() preempt_count() feedback
> > >   from Joel Fernandes. ]
> > > [ paulmck: Adjust rcu_eqs_enter() call to rcu_preempt_deferred_qs() in
> > >   response to bug reports from kbuild test robot. ]
> > > [ paulmck: Fix bug located by kbuild test robot involving recursion
> > >   via rcu_preempt_deferred_qs(). ]
> > > ---
> > >  .../RCU/Design/Requirements/Requirements.html |  50 +++---
> > >  include/linux/rcutiny.h                       |   5 +
> > >  kernel/rcu/tree.c                             |   9 ++
> > >  kernel/rcu/tree.h                             |   3 +
> > >  kernel/rcu/tree_exp.h                         |  71 +++++++--
> > >  kernel/rcu/tree_plugin.h                      | 144 +++++++++++++-----
> > >  6 files changed, 205 insertions(+), 77 deletions(-)
> > > 
> > 
> > We started seeing the trace below in our regression system; after bisecting, I found this is the offending commit.
> > This appears immediately on boot. 
> > Please let me know if you need any additional details.
> 
> Interesting.  Here is the offending function:
> 
> 	static void rcu_preempt_deferred_qs(struct task_struct *t)
> 	{
> 		unsigned long flags;
> 		bool couldrecurse = t->rcu_read_lock_nesting >= 0;
> 
> 		if (!rcu_preempt_need_deferred_qs(t))
> 			return;
> 		if (couldrecurse)
> 			t->rcu_read_lock_nesting -= INT_MIN;
> 		local_irq_save(flags);
> 		rcu_preempt_deferred_qs_irqrestore(t, flags);
> 		if (couldrecurse)
> 			t->rcu_read_lock_nesting += INT_MIN;
> 	}
> 
> Using twos-complement arithmetic (which the kernel build gcc arguments
> enforce, last I checked) this does work.  But as UBSAN says, subtracting
> INT_MIN is unconditionally undefined behavior according to the C standard.
> 
> Good catch!!!
> 
> So how do I make the above code not simply function, but rather meet
> the C standard?
> 
> One approach is to add INT_MIN going in, then add INT_MAX and then add 1
> coming out.
> 
> Another approach is to sacrifice the INT_MAX value (should be plenty
> safe), thus subtract INT_MAX going in and add INT_MAX coming out.
> For consistency, I suppose that I should change the INT_MIN in
> __rcu_read_unlock() to -INT_MAX.
> 
> I could also leave __rcu_read_unlock() alone and XOR the top
> bit of t->rcu_read_lock_nesting on entry and exit to/from
> rcu_preempt_deferred_qs().
> 
> Sacrificing the INT_MIN value seems most maintainable, as in the following
> patch.  Thoughts?

The INT_MAX naming could be very confusing for nesting levels, could we not
instead just define something like:
#define RCU_NESTING_MIN (INT_MIN - 1)
#define RCU_NESTING_MAX (INT_MAX)

and just use that? also one more comment below:

> 
> ------------------------------------------------------------------------
> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index bd8186d0f4a7..f1b40c6d36e4 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -424,7 +424,7 @@ void __rcu_read_unlock(void)
>  		--t->rcu_read_lock_nesting;
>  	} else {
>  		barrier();  /* critical section before exit code. */
> -		t->rcu_read_lock_nesting = INT_MIN;
> +		t->rcu_read_lock_nesting = -INT_MAX;
>  		barrier();  /* assign before ->rcu_read_unlock_special load */
>  		if (unlikely(READ_ONCE(t->rcu_read_unlock_special.s)))
>  			rcu_read_unlock_special(t);
> @@ -617,11 +617,11 @@ static void rcu_preempt_deferred_qs(struct task_struct *t)
>  	if (!rcu_preempt_need_deferred_qs(t))
>  		return;
>  	if (couldrecurse)
> -		t->rcu_read_lock_nesting -= INT_MIN;
> +		t->rcu_read_lock_nesting -= INT_MAX;

Shouldn't this be:  t->rcu_read_lock_nesting -= -INT_MAX; ?

>  	local_irq_save(flags);
>  	rcu_preempt_deferred_qs_irqrestore(t, flags);
>  	if (couldrecurse)
> -		t->rcu_read_lock_nesting += INT_MIN;
> +		t->rcu_read_lock_nesting += INT_MAX;

And t->rcu_read_lock_nesting += -INT_MAX; ?

But apologies if I missed something, thanks,

 - Joel


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH tip/core/rcu 02/19] rcu: Defer reporting RCU-preempt quiescent states when disabled
  2018-10-30  3:44       ` Joel Fernandes
@ 2018-10-30 12:58         ` Paul E. McKenney
  2018-10-30 22:21           ` Joel Fernandes
  0 siblings, 1 reply; 49+ messages in thread
From: Paul E. McKenney @ 2018-10-30 12:58 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Ran Rozenstein, linux-kernel, mingo, jiangshanlai, dipankar,
	akpm, mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, Maor Gottlieb, Tariq Toukan,
	Eran Ben Elisha, Leon Romanovsky

On Mon, Oct 29, 2018 at 08:44:52PM -0700, Joel Fernandes wrote:
> On Mon, Oct 29, 2018 at 07:27:35AM -0700, Paul E. McKenney wrote:
> > On Mon, Oct 29, 2018 at 11:24:42AM +0000, Ran Rozenstein wrote:
> > > Hi Paul and all,
> > > 
> > > > -----Original Message-----
> > > > From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
> > > > owner@vger.kernel.org] On Behalf Of Paul E. McKenney
> > > > Sent: Thursday, August 30, 2018 01:21
> > > > To: linux-kernel@vger.kernel.org
> > > > Cc: mingo@kernel.org; jiangshanlai@gmail.com; dipankar@in.ibm.com;
> > > > akpm@linux-foundation.org; mathieu.desnoyers@efficios.com;
> > > > josh@joshtriplett.org; tglx@linutronix.de; peterz@infradead.org;
> > > > rostedt@goodmis.org; dhowells@redhat.com; edumazet@google.com;
> > > > fweisbec@gmail.com; oleg@redhat.com; joel@joelfernandes.org; Paul E.
> > > > McKenney <paulmck@linux.vnet.ibm.com>
> > > > Subject: [PATCH tip/core/rcu 02/19] rcu: Defer reporting RCU-preempt
> > > > quiescent states when disabled
> > > > 
> > > > This commit defers reporting of RCU-preempt quiescent states at
> > > > rcu_read_unlock_special() time when any of interrupts, softirq, or
> > > > preemption are disabled.  These deferred quiescent states are reported at a
> > > > later RCU_SOFTIRQ, context switch, idle entry, or CPU-hotplug offline
> > > > operation.  Of course, if another RCU read-side critical section has started in
> > > > the meantime, the reporting of the quiescent state will be further deferred.
> > > > 
> > > > This also means that disabling preemption, interrupts, and/or softirqs will act
> > > > as an RCU-preempt read-side critical section.
> > > > This is enforced by checking preempt_count() as needed.
> > > > 
> > > > Some special cases must be handled on an ad-hoc basis, for example,
> > > > context switch is a quiescent state even though both the scheduler and
> > > > do_exit() disable preemption.  In these cases, additional calls to
> > > > rcu_preempt_deferred_qs() override the preemption disabling.  Similar logic
> > > > overrides disabled interrupts in rcu_preempt_check_callbacks() because in
> > > > this case the quiescent state happened just before the corresponding
> > > > scheduling-clock interrupt.
> > > > 
> > > > In theory, this change lifts a long-standing restriction that required that if
> > > > interrupts were disabled across a call to rcu_read_unlock() that the matching
> > > > rcu_read_lock() also be contained within that interrupts-disabled region of
> > > > code.  Because the reporting of the corresponding RCU-preempt quiescent
> > > > state is now deferred until after interrupts have been enabled, it is no longer
> > > > possible for this situation to result in deadlocks involving the scheduler's
> > > > runqueue and priority-inheritance locks.  This may allow some code
> > > > simplification that might reduce interrupt latency a bit.  Unfortunately, in
> > > > practice this would also defer deboosting a low-priority task that had been
> > > > subjected to RCU priority boosting, so real-time-response considerations
> > > > might well force this restriction to remain in place.
> > > > 
> > > > Because RCU-preempt grace periods are now blocked not only by RCU read-
> > > > side critical sections, but also by disabling of interrupts, preemption, and
> > > > softirqs, it will be possible to eliminate RCU-bh and RCU-sched in favor of
> > > > RCU-preempt in CONFIG_PREEMPT=y kernels.  This may require some
> > > > additional plumbing to provide the network denial-of-service guarantees
> > > > that have been traditionally provided by RCU-bh.  Once these are in place,
> > > > CONFIG_PREEMPT=n kernels will be able to fold RCU-bh into RCU-sched.
> > > > This would mean that all kernels would have but one flavor of RCU, which
> > > > would open the door to significant code cleanup.
> > > > 
> > > > Moving to a single flavor of RCU would also have the beneficial effect of
> > > > reducing the NOCB kthreads by at least a factor of two.
> > > > 
> > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > [ paulmck: Apply rcu_read_unlock_special() preempt_count() feedback
> > > >   from Joel Fernandes. ]
> > > > [ paulmck: Adjust rcu_eqs_enter() call to rcu_preempt_deferred_qs() in
> > > >   response to bug reports from kbuild test robot. ]
> > > > [ paulmck: Fix bug located by kbuild test robot involving recursion
> > > >   via rcu_preempt_deferred_qs(). ]
> > > > ---
> > > >  .../RCU/Design/Requirements/Requirements.html |  50 +++---
> > > >  include/linux/rcutiny.h                       |   5 +
> > > >  kernel/rcu/tree.c                             |   9 ++
> > > >  kernel/rcu/tree.h                             |   3 +
> > > >  kernel/rcu/tree_exp.h                         |  71 +++++++--
> > > >  kernel/rcu/tree_plugin.h                      | 144 +++++++++++++-----
> > > >  6 files changed, 205 insertions(+), 77 deletions(-)
> > > > 
> > > 
> > > We started seeing the trace below in our regression system; after bisecting, I found this is the offending commit.
> > > This appears immediately on boot. 
> > > Please let me know if you need any additional details.
> > 
> > Interesting.  Here is the offending function:
> > 
> > 	static void rcu_preempt_deferred_qs(struct task_struct *t)
> > 	{
> > 		unsigned long flags;
> > 		bool couldrecurse = t->rcu_read_lock_nesting >= 0;
> > 
> > 		if (!rcu_preempt_need_deferred_qs(t))
> > 			return;
> > 		if (couldrecurse)
> > 			t->rcu_read_lock_nesting -= INT_MIN;
> > 		local_irq_save(flags);
> > 		rcu_preempt_deferred_qs_irqrestore(t, flags);
> > 		if (couldrecurse)
> > 			t->rcu_read_lock_nesting += INT_MIN;
> > 	}
> > 
> > Using twos-complement arithmetic (which the kernel build gcc arguments
> > enforce, last I checked) this does work.  But as UBSAN says, subtracting
> > INT_MIN is unconditionally undefined behavior according to the C standard.
> > 
> > Good catch!!!
> > 
> > So how do I make the above code not simply function, but rather meet
> > the C standard?
> > 
> > One approach is to add INT_MIN going in, then add INT_MAX and then add 1
> > coming out.
> > 
> > Another approach is to sacrifice the INT_MAX value (should be plenty
> > safe), thus subtract INT_MAX going in and add INT_MAX coming out.
> > For consistency, I suppose that I should change the INT_MIN in
> > __rcu_read_unlock() to -INT_MAX.
> > 
> > I could also leave __rcu_read_unlock() alone and XOR the top
> > bit of t->rcu_read_lock_nesting on entry and exit to/from
> > rcu_preempt_deferred_qs().
> > 
> > Sacrificing the INT_MIN value seems most maintainable, as in the following
> > patch.  Thoughts?
> 
> The INT_MAX naming could be very confusing for nesting levels, could we not
> instead just define something like:
> #define RCU_NESTING_MIN (INT_MIN - 1)
> #define RCU_NESTING_MAX (INT_MAX)
> 
> and just use that? also one more comment below:

Hmmm...  There is currently no use for RCU_NESTING_MAX, but if the check
at the end of __rcu_read_unlock() were to be extended to check for
too-deep positive nesting, it would need to check for something like
INT_MAX/2.  You could of course argue that the current check against
INT_MIN/2 should instead be against -INT_MAX/2, but there really isn't
much difference between the two.

Another approach would be to convert to unsigned in order to avoid the
overflow problem completely.

For the moment, anyway, I am inclined to leave it as is.
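
For reference, a minimal userspace sketch of what that unsigned variant could
look like (illustrative only, not a proposed patch; TOY_NEST_BIAS is a made-up
name, and the sign-based checks elsewhere in the code would of course need a
different encoding):

#include <assert.h>
#include <limits.h>

#define TOY_NEST_BIAS ((unsigned int)INT_MAX)

int main(void)
{
	unsigned int nesting = 2;	/* small positive nesting depth */

	nesting -= TOY_NEST_BIAS;	/* wraps modulo 2^32, well defined in C */
	nesting++;			/* nested rcu_read_lock()... */
	nesting--;			/* ...and rcu_read_unlock() */
	nesting += TOY_NEST_BIAS;	/* bias removed */

	assert(nesting == 2);		/* modular arithmetic round-trips exactly */
	return 0;
}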

> > ------------------------------------------------------------------------
> > 
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index bd8186d0f4a7..f1b40c6d36e4 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -424,7 +424,7 @@ void __rcu_read_unlock(void)
> >  		--t->rcu_read_lock_nesting;
> >  	} else {
> >  		barrier();  /* critical section before exit code. */
> > -		t->rcu_read_lock_nesting = INT_MIN;
> > +		t->rcu_read_lock_nesting = -INT_MAX;
> >  		barrier();  /* assign before ->rcu_read_unlock_special load */
> >  		if (unlikely(READ_ONCE(t->rcu_read_unlock_special.s)))
> >  			rcu_read_unlock_special(t);
> > @@ -617,11 +617,11 @@ static void rcu_preempt_deferred_qs(struct task_struct *t)
> >  	if (!rcu_preempt_need_deferred_qs(t))
> >  		return;
> >  	if (couldrecurse)
> > -		t->rcu_read_lock_nesting -= INT_MIN;
> > +		t->rcu_read_lock_nesting -= INT_MAX;
> 
> Shouldn't this be:  t->rcu_read_lock_nesting -= -INT_MAX; ?

Suppose t->rcu_read_lock_nesting is zero.  Then your change would leave the
value a large positive number (INT_MAX, to be precise), which would then
result in signed integer overflow if there was a nested rcu_read_lock().
Worse yet, if t->rcu_read_lock_nesting is any positive number, subtracting
-INT_MAX would immediately result in signed integer overflow.  Please
note that the whole point of this patch in the first place is to avoid
signed integer overflow.

Note that the check at the beginning of this function never sets
couldrecurse if ->rcu_read_lock_nesting is negative, which avoids the
possibility of overflow given the current code.  Unless you have two
billion nested rcu_read_lock() calls, in which case you broke it, so
you get to buy it.  ;-)
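
For concreteness, here is a minimal userspace sketch of that arithmetic
(a model only, assuming a 32-bit int; it is not part of any posted patch):

#include <limits.h>
#include <stdio.h>

int main(void)
{
	int nesting = 2;	/* small positive nesting, the couldrecurse case */

	/* The patch's direction: subtract INT_MAX going in... */
	nesting -= INT_MAX;	/* 2 - 2147483647 = -2147483645, representable */

	/* A nested rcu_read_lock()/rcu_read_unlock() pair stays negative. */
	nesting++;
	nesting--;

	/* ...and add it back coming out. */
	nesting += INT_MAX;	/* back to 2, no overflow */
	printf("restored nesting: %d\n", nesting);

	/*
	 * By contrast, "nesting -= -INT_MAX" is "nesting += INT_MAX",
	 * which overflows (undefined behavior) for any nesting >= 1.
	 */
	return 0;
}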

> >  	local_irq_save(flags);
> >  	rcu_preempt_deferred_qs_irqrestore(t, flags);
> >  	if (couldrecurse)
> > -		t->rcu_read_lock_nesting += INT_MIN;
> > +		t->rcu_read_lock_nesting += INT_MAX;
> 
> And t->rcu_read_lock_nesting += -INT_MAX; ?

And this would have similar problems for ->rcu_read_lock_nesting
not equal to zero on entry to this function.

> But apologies if I missed something, thanks,

No need to apologize, especially if it turns out that I am the one
who is confused.  Either way, thank you for looking this over!

							Thanx, Paul


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH tip/core/rcu 02/19] rcu: Defer reporting RCU-preempt quiescent states when disabled
  2018-10-30 12:58         ` Paul E. McKenney
@ 2018-10-30 22:21           ` Joel Fernandes
  2018-10-31 18:22             ` Paul E. McKenney
  0 siblings, 1 reply; 49+ messages in thread
From: Joel Fernandes @ 2018-10-30 22:21 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Ran Rozenstein, linux-kernel, mingo, jiangshanlai, dipankar,
	akpm, mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, Maor Gottlieb, Tariq Toukan,
	Eran Ben Elisha, Leon Romanovsky

On Tue, Oct 30, 2018 at 05:58:00AM -0700, Paul E. McKenney wrote:
> On Mon, Oct 29, 2018 at 08:44:52PM -0700, Joel Fernandes wrote:
> > On Mon, Oct 29, 2018 at 07:27:35AM -0700, Paul E. McKenney wrote:
> > > On Mon, Oct 29, 2018 at 11:24:42AM +0000, Ran Rozenstein wrote:
> > > > Hi Paul and all,
> > > > 
> > > > > -----Original Message-----
> > > > > From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
> > > > > owner@vger.kernel.org] On Behalf Of Paul E. McKenney
> > > > > Sent: Thursday, August 30, 2018 01:21
> > > > > To: linux-kernel@vger.kernel.org
> > > > > Cc: mingo@kernel.org; jiangshanlai@gmail.com; dipankar@in.ibm.com;
> > > > > akpm@linux-foundation.org; mathieu.desnoyers@efficios.com;
> > > > > josh@joshtriplett.org; tglx@linutronix.de; peterz@infradead.org;
> > > > > rostedt@goodmis.org; dhowells@redhat.com; edumazet@google.com;
> > > > > fweisbec@gmail.com; oleg@redhat.com; joel@joelfernandes.org; Paul E.
> > > > > McKenney <paulmck@linux.vnet.ibm.com>
> > > > > Subject: [PATCH tip/core/rcu 02/19] rcu: Defer reporting RCU-preempt
> > > > > quiescent states when disabled
> > > > > 
> > > > > This commit defers reporting of RCU-preempt quiescent states at
> > > > > rcu_read_unlock_special() time when any of interrupts, softirq, or
> > > > > preemption are disabled.  These deferred quiescent states are reported at a
> > > > > later RCU_SOFTIRQ, context switch, idle entry, or CPU-hotplug offline
> > > > > operation.  Of course, if another RCU read-side critical section has started in
> > > > > the meantime, the reporting of the quiescent state will be further deferred.
> > > > > 
> > > > > This also means that disabling preemption, interrupts, and/or softirqs will act
> > > > > as an RCU-preempt read-side critical section.
> > > > > This is enforced by checking preempt_count() as needed.
> > > > > 
> > > > > Some special cases must be handled on an ad-hoc basis, for example,
> > > > > context switch is a quiescent state even though both the scheduler and
> > > > > do_exit() disable preemption.  In these cases, additional calls to
> > > > > rcu_preempt_deferred_qs() override the preemption disabling.  Similar logic
> > > > > overrides disabled interrupts in rcu_preempt_check_callbacks() because in
> > > > > this case the quiescent state happened just before the corresponding
> > > > > scheduling-clock interrupt.
> > > > > 
> > > > > In theory, this change lifts a long-standing restriction that required that if
> > > > > interrupts were disabled across a call to rcu_read_unlock() that the matching
> > > > > rcu_read_lock() also be contained within that interrupts-disabled region of
> > > > > code.  Because the reporting of the corresponding RCU-preempt quiescent
> > > > > state is now deferred until after interrupts have been enabled, it is no longer
> > > > > possible for this situation to result in deadlocks involving the scheduler's
> > > > > runqueue and priority-inheritance locks.  This may allow some code
> > > > > simplification that might reduce interrupt latency a bit.  Unfortunately, in
> > > > > practice this would also defer deboosting a low-priority task that had been
> > > > > subjected to RCU priority boosting, so real-time-response considerations
> > > > > might well force this restriction to remain in place.
> > > > > 
> > > > > Because RCU-preempt grace periods are now blocked not only by RCU read-
> > > > > side critical sections, but also by disabling of interrupts, preemption, and
> > > > > softirqs, it will be possible to eliminate RCU-bh and RCU-sched in favor of
> > > > > RCU-preempt in CONFIG_PREEMPT=y kernels.  This may require some
> > > > > additional plumbing to provide the network denial-of-service guarantees
> > > > > that have been traditionally provided by RCU-bh.  Once these are in place,
> > > > > CONFIG_PREEMPT=n kernels will be able to fold RCU-bh into RCU-sched.
> > > > > This would mean that all kernels would have but one flavor of RCU, which
> > > > > would open the door to significant code cleanup.
> > > > > 
> > > > > Moving to a single flavor of RCU would also have the beneficial effect of
> > > > > reducing the NOCB kthreads by at least a factor of two.
> > > > > 
> > > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > > [ paulmck: Apply rcu_read_unlock_special() preempt_count() feedback
> > > > >   from Joel Fernandes. ]
> > > > > [ paulmck: Adjust rcu_eqs_enter() call to rcu_preempt_deferred_qs() in
> > > > >   response to bug reports from kbuild test robot. ]
> > > > > [ paulmck: Fix bug located by kbuild test robot involving recursion
> > > > >   via rcu_preempt_deferred_qs(). ]
> > > > > ---
> > > > >  .../RCU/Design/Requirements/Requirements.html |  50 +++---
> > > > >  include/linux/rcutiny.h                       |   5 +
> > > > >  kernel/rcu/tree.c                             |   9 ++
> > > > >  kernel/rcu/tree.h                             |   3 +
> > > > >  kernel/rcu/tree_exp.h                         |  71 +++++++--
> > > > >  kernel/rcu/tree_plugin.h                      | 144 +++++++++++++-----
> > > > >  6 files changed, 205 insertions(+), 77 deletions(-)
> > > > > 
> > > > 
> > > > We started seeing the trace below in our regression system; after bisecting, I found this is the offending commit.
> > > > This appears immediately on boot. 
> > > > Please let me know if you need any additional details.
> > > 
> > > Interesting.  Here is the offending function:
> > > 
> > > 	static void rcu_preempt_deferred_qs(struct task_struct *t)
> > > 	{
> > > 		unsigned long flags;
> > > 		bool couldrecurse = t->rcu_read_lock_nesting >= 0;
> > > 
> > > 		if (!rcu_preempt_need_deferred_qs(t))
> > > 			return;
> > > 		if (couldrecurse)
> > > 			t->rcu_read_lock_nesting -= INT_MIN;
> > > 		local_irq_save(flags);
> > > 		rcu_preempt_deferred_qs_irqrestore(t, flags);
> > > 		if (couldrecurse)
> > > 			t->rcu_read_lock_nesting += INT_MIN;
> > > 	}
> > > 
> > > Using twos-complement arithmetic (which the kernel build gcc arguments
> > > enforce, last I checked) this does work.  But as UBSAN says, subtracting
> > > INT_MIN is unconditionally undefined behavior according to the C standard.
> > > 
> > > Good catch!!!
> > > 
> > > So how do I make the above code not simply function, but rather meet
> > > the C standard?
> > > 
> > > One approach is to add INT_MIN going in, then add INT_MAX and then add 1
> > > coming out.
> > > 
> > > Another approach is to sacrifice the INT_MAX value (should be plenty
> > > safe), thus subtract INT_MAX going in and add INT_MAX coming out.
> > > For consistency, I suppose that I should change the INT_MIN in
> > > __rcu_read_unlock() to -INT_MAX.
> > > 
> > > I could also leave __rcu_read_unlock() alone and XOR the top
> > > bit of t->rcu_read_lock_nesting on entry and exit to/from
> > > rcu_preempt_deferred_qs().
> > > 
> > > Sacrificing the INT_MIN value seems most maintainable, as in the following
> > > patch.  Thoughts?
> > 
> > The INT_MAX naming could be very confusing for nesting levels, could we not
> > instead just define something like:
> > #define RCU_NESTING_MIN (INT_MIN - 1)
> > #define RCU_NESTING_MAX (INT_MAX)
> > 
> > and just use that? also one more comment below:
> 
> Hmmm...  There is currently no use for RCU_NESTING_MAX, but if the check
> at the end of __rcu_read_unlock() were to be extended to check for
> too-deep positive nesting, it would need to check for something like
> INT_MAX/2.  You could of course argue that the current check against
> INT_MIN/2 should instead be against -INT_MAX/2, but there really isn't
> much difference between the two.
> 
> Another approach would be to convert to unsigned in order to avoid the
> overflow problem completely.
> 
> For the moment, anyway, I am inclined to leave it as is.

Both the unsigned and INT_MIN/2 options sound good to me, but if you want to
leave it as is, that would be fine as well. Thanks,

- Joel
 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH tip/core/rcu 02/19] rcu: Defer reporting RCU-preempt quiescent states when disabled
  2018-10-30 22:21           ` Joel Fernandes
@ 2018-10-31 18:22             ` Paul E. McKenney
  2018-11-02 19:43               ` Paul E. McKenney
  0 siblings, 1 reply; 49+ messages in thread
From: Paul E. McKenney @ 2018-10-31 18:22 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Ran Rozenstein, linux-kernel, mingo, jiangshanlai, dipankar,
	akpm, mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, Maor Gottlieb, Tariq Toukan,
	Eran Ben Elisha, Leon Romanovsky

On Tue, Oct 30, 2018 at 03:21:23PM -0700, Joel Fernandes wrote:
> On Tue, Oct 30, 2018 at 05:58:00AM -0700, Paul E. McKenney wrote:
> > On Mon, Oct 29, 2018 at 08:44:52PM -0700, Joel Fernandes wrote:
> > > On Mon, Oct 29, 2018 at 07:27:35AM -0700, Paul E. McKenney wrote:
> > > > On Mon, Oct 29, 2018 at 11:24:42AM +0000, Ran Rozenstein wrote:
> > > > > Hi Paul and all,
> > > > > 
> > > > > > -----Original Message-----
> > > > > > From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
> > > > > > owner@vger.kernel.org] On Behalf Of Paul E. McKenney
> > > > > > Sent: Thursday, August 30, 2018 01:21
> > > > > > To: linux-kernel@vger.kernel.org
> > > > > > Cc: mingo@kernel.org; jiangshanlai@gmail.com; dipankar@in.ibm.com;
> > > > > > akpm@linux-foundation.org; mathieu.desnoyers@efficios.com;
> > > > > > josh@joshtriplett.org; tglx@linutronix.de; peterz@infradead.org;
> > > > > > rostedt@goodmis.org; dhowells@redhat.com; edumazet@google.com;
> > > > > > fweisbec@gmail.com; oleg@redhat.com; joel@joelfernandes.org; Paul E.
> > > > > > McKenney <paulmck@linux.vnet.ibm.com>
> > > > > > Subject: [PATCH tip/core/rcu 02/19] rcu: Defer reporting RCU-preempt
> > > > > > quiescent states when disabled
> > > > > > 
> > > > > > This commit defers reporting of RCU-preempt quiescent states at
> > > > > > rcu_read_unlock_special() time when any of interrupts, softirq, or
> > > > > > preemption are disabled.  These deferred quiescent states are reported at a
> > > > > > later RCU_SOFTIRQ, context switch, idle entry, or CPU-hotplug offline
> > > > > > operation.  Of course, if another RCU read-side critical section has started in
> > > > > > the meantime, the reporting of the quiescent state will be further deferred.
> > > > > > 
> > > > > > This also means that disabling preemption, interrupts, and/or softirqs will act
> > > > > > as an RCU-preempt read-side critical section.
> > > > > > This is enforced by checking preempt_count() as needed.
> > > > > > 
> > > > > > Some special cases must be handled on an ad-hoc basis, for example,
> > > > > > context switch is a quiescent state even though both the scheduler and
> > > > > > do_exit() disable preemption.  In these cases, additional calls to
> > > > > > rcu_preempt_deferred_qs() override the preemption disabling.  Similar logic
> > > > > > overrides disabled interrupts in rcu_preempt_check_callbacks() because in
> > > > > > this case the quiescent state happened just before the corresponding
> > > > > > scheduling-clock interrupt.
> > > > > > 
> > > > > > In theory, this change lifts a long-standing restriction that required that if
> > > > > > interrupts were disabled across a call to rcu_read_unlock() that the matching
> > > > > > rcu_read_lock() also be contained within that interrupts-disabled region of
> > > > > > code.  Because the reporting of the corresponding RCU-preempt quiescent
> > > > > > state is now deferred until after interrupts have been enabled, it is no longer
> > > > > > possible for this situation to result in deadlocks involving the scheduler's
> > > > > > runqueue and priority-inheritance locks.  This may allow some code
> > > > > > simplification that might reduce interrupt latency a bit.  Unfortunately, in
> > > > > > practice this would also defer deboosting a low-priority task that had been
> > > > > > subjected to RCU priority boosting, so real-time-response considerations
> > > > > > might well force this restriction to remain in place.
> > > > > > 
> > > > > > Because RCU-preempt grace periods are now blocked not only by RCU read-
> > > > > > side critical sections, but also by disabling of interrupts, preemption, and
> > > > > > softirqs, it will be possible to eliminate RCU-bh and RCU-sched in favor of
> > > > > > RCU-preempt in CONFIG_PREEMPT=y kernels.  This may require some
> > > > > > additional plumbing to provide the network denial-of-service guarantees
> > > > > > that have been traditionally provided by RCU-bh.  Once these are in place,
> > > > > > CONFIG_PREEMPT=n kernels will be able to fold RCU-bh into RCU-sched.
> > > > > > This would mean that all kernels would have but one flavor of RCU, which
> > > > > > would open the door to significant code cleanup.
> > > > > > 
> > > > > > Moving to a single flavor of RCU would also have the beneficial effect of
> > > > > > reducing the NOCB kthreads by at least a factor of two.
> > > > > > 
> > > > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > > > [ paulmck: Apply rcu_read_unlock_special() preempt_count() feedback
> > > > > >   from Joel Fernandes. ]
> > > > > > [ paulmck: Adjust rcu_eqs_enter() call to rcu_preempt_deferred_qs() in
> > > > > >   response to bug reports from kbuild test robot. ]
> > > > > > [ paulmck: Fix bug located by kbuild test robot involving recursion
> > > > > >   via rcu_preempt_deferred_qs(). ]
> > > > > > ---
> > > > > >  .../RCU/Design/Requirements/Requirements.html |  50 +++---
> > > > > >  include/linux/rcutiny.h                       |   5 +
> > > > > >  kernel/rcu/tree.c                             |   9 ++
> > > > > >  kernel/rcu/tree.h                             |   3 +
> > > > > >  kernel/rcu/tree_exp.h                         |  71 +++++++--
> > > > > >  kernel/rcu/tree_plugin.h                      | 144 +++++++++++++-----
> > > > > >  6 files changed, 205 insertions(+), 77 deletions(-)
> > > > > > 
> > > > > 
> > > > > We started seeing the trace below in our regression system; after bisecting, I found this is the offending commit.
> > > > > This appears immediately on boot. 
> > > > > Please let me know if you need any additional details.
> > > > 
> > > > Interesting.  Here is the offending function:
> > > > 
> > > > 	static void rcu_preempt_deferred_qs(struct task_struct *t)
> > > > 	{
> > > > 		unsigned long flags;
> > > > 		bool couldrecurse = t->rcu_read_lock_nesting >= 0;
> > > > 
> > > > 		if (!rcu_preempt_need_deferred_qs(t))
> > > > 			return;
> > > > 		if (couldrecurse)
> > > > 			t->rcu_read_lock_nesting -= INT_MIN;
> > > > 		local_irq_save(flags);
> > > > 		rcu_preempt_deferred_qs_irqrestore(t, flags);
> > > > 		if (couldrecurse)
> > > > 			t->rcu_read_lock_nesting += INT_MIN;
> > > > 	}
> > > > 
> > > > Using twos-complement arithmetic (which the kernel build gcc arguments
> > > > enforce, last I checked) this does work.  But as UBSAN says, subtracting
> > > > INT_MIN is unconditionally undefined behavior according to the C standard.
> > > > 
> > > > Good catch!!!
> > > > 
> > > > So how do I make the above code not simply function, but rather meet
> > > > the C standard?
> > > > 
> > > > One approach is to add INT_MIN going in, then add INT_MAX and then add 1
> > > > coming out.
> > > > 
> > > > Another approach is to sacrifice the INT_MAX value (should be plenty
> > > > safe), thus subtract INT_MAX going in and add INT_MAX coming out.
> > > > For consistency, I suppose that I should change the INT_MIN in
> > > > __rcu_read_unlock() to -INT_MAX.
> > > > 
> > > > I could also leave __rcu_read_unlock() alone and XOR the top
> > > > bit of t->rcu_read_lock_nesting on entry and exit to/from
> > > > rcu_preempt_deferred_qs().
> > > > 
> > > > Sacrificing the INT_MIN value seems most maintainable, as in the following
> > > > patch.  Thoughts?
> > > 
> > > The INT_MAX naming could be very confusing for nesting levels, could we not
> > > instead just define something like:
> > > #define RCU_NESTING_MIN (INT_MIN - 1)
> > > #define RCU_NESTING_MAX (INT_MAX)
> > > 
> > > and just use that? also one more comment below:
> > 
> > Hmmm...  There is currently no use for RCU_NESTING_MAX, but if the check
> > at the end of __rcu_read_unlock() were to be extended to check for
> > too-deep positive nesting, it would need to check for something like
> > INT_MAX/2.  You could of course argue that the current check against
> > INT_MIN/2 should instead be against -INT_MAX/2, but there really isn't
> > much difference between the two.
> > 
> > Another approach would be to convert to unsigned in order to avoid the
> > overflow problem completely.
> > 
> > For the moment, anyway, I am inclined to leave it as is.
> 
> Both the unsigned and INT_MIN/2 options sound good to me, but if you want to
> leave it as is, that would be fine as well. Thanks,

One approach would be something like this:

#define RCU_READ_LOCK_BIAS (INT_MAX)
#define RCU_READ_LOCK_NMAX (-INT_MAX)
#define RCU_READ_LOCK_PMAX INT_MAX

Then __rcu_read_unlock() would set ->rcu_read_lock_nesting to
-RCU_READ_LOCK_BIAS, and compare against RCU_READ_LOCK_NMAX.
The comparison against RCU_READ_LOCK_PMAX would preferably take
place just after the increment in __rcu_read_lock(), again only under
CONFIG_PROVE_RCU.

rcu_preempt_deferred_qs() would then subtract then add RCU_READ_LOCK_BIAS.

Thoughts?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH tip/core/rcu 02/19] rcu: Defer reporting RCU-preempt quiescent states when disabled
  2018-10-31 18:22             ` Paul E. McKenney
@ 2018-11-02 19:43               ` Paul E. McKenney
  2018-11-26 13:55                 ` Ran Rozenstein
  0 siblings, 1 reply; 49+ messages in thread
From: Paul E. McKenney @ 2018-11-02 19:43 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Ran Rozenstein, linux-kernel, mingo, jiangshanlai, dipankar,
	akpm, mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, Maor Gottlieb, Tariq Toukan,
	Eran Ben Elisha, Leon Romanovsky

On Wed, Oct 31, 2018 at 11:22:59AM -0700, Paul E. McKenney wrote:
> On Tue, Oct 30, 2018 at 03:21:23PM -0700, Joel Fernandes wrote:
> > On Tue, Oct 30, 2018 at 05:58:00AM -0700, Paul E. McKenney wrote:
> > > On Mon, Oct 29, 2018 at 08:44:52PM -0700, Joel Fernandes wrote:
> > > > On Mon, Oct 29, 2018 at 07:27:35AM -0700, Paul E. McKenney wrote:

[ . . . ]

> > > > The INT_MAX naming could be very confusing for nesting levels, could we not
> > > > instead just define something like:
> > > > #define RCU_NESTING_MIN (INT_MIN - 1)
> > > > #define RCU_NESTING_MAX (INT_MAX)
> > > > 
> > > > and just use that? also one more comment below:
> > > 
> > > Hmmm...  There is currently no use for RCU_NESTING_MAX, but if the check
> > > at the end of __rcu_read_unlock() were to be extended to check for
> > > too-deep positive nesting, it would need to check for something like
> > > INT_MAX/2.  You could of course argue that the current check against
> > > INT_MIN/2 should instead be against -INT_MAX/2, but there really isn't
> > > much difference between the two.
> > > 
> > > Another approach would be to convert to unsigned in order to avoid the
> > > overflow problem completely.
> > > 
> > > For the moment, anyway, I am inclined to leave it as is.
> > 
> > Both the unsigned and INT_MIN/2 options sound good to me, but if you want to
> > leave it as is, that would be fine as well. Thanks,
> 
> One approach would be something like this:
> 
> #define RCU_READ_LOCK_BIAS (INT_MAX)
> #define RCU_READ_LOCK_NMAX (-INT_MAX)
> #define RCU_READ_LOCK_PMAX INT_MAX
> 
> Then __rcu_read_unlock() would set ->rcu_read_lock_nesting to
> -RCU_READ_LOCK_BIAS, and compare against RCU_READ_LOCK_NMAX.
> The comparison against RCU_READ_LOCK_PMAX would preferably take
> place just after the increment in __rcu_read_lock(), again only under
> CONFIG_PROVE_RCU.
> 
> rcu_preempt_deferred_qs() would then subtract then add RCU_READ_LOCK_BIAS.
> 
> Thoughts?

Hearing no objections, here is the updated patch.

								Thanx, Paul

------------------------------------------------------------------------

commit 970cab5d3d206029ed27274a98ea1c3d7e780e53
Author: Paul E. McKenney <paulmck@linux.ibm.com>
Date:   Mon Oct 29 07:36:50 2018 -0700

    rcu: Avoid signed integer overflow in rcu_preempt_deferred_qs()
    
    Subtracting INT_MIN can be interpreted as unconditional signed integer
    overflow, which according to the C standard is undefined behavior.
    Therefore, kernel build arguments notwithstanding, it would be good to
    future-proof the code.  This commit therefore substitutes INT_MAX for
    INT_MIN in order to avoid undefined behavior.
    
    While in the neighborhood, this commit also creates some meaningful names
    for INT_MAX and friends in order to improve readability, as suggested
    by Joel Fernandes.
    
    Reported-by: Ran Rozenstein <ranro@mellanox.com>
    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
    
    squash! rcu: Avoid signed integer overflow in rcu_preempt_deferred_qs()
    
    While in the neighborhood, use macros to give meaningful names.
    
    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index bd8186d0f4a7..e60f820ffb83 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -397,6 +397,11 @@ static int rcu_preempt_blocked_readers_cgp(struct rcu_node *rnp)
 	return rnp->gp_tasks != NULL;
 }
 
+/* Bias and limit values for ->rcu_read_lock_nesting. */
+#define RCU_NEST_BIAS INT_MAX
+#define RCU_NEST_NMAX (-INT_MAX / 2)
+#define RCU_NEST_PMAX (INT_MAX / 2)
+
 /*
  * Preemptible RCU implementation for rcu_read_lock().
  * Just increment ->rcu_read_lock_nesting, shared state will be updated
@@ -405,6 +410,8 @@ static int rcu_preempt_blocked_readers_cgp(struct rcu_node *rnp)
 void __rcu_read_lock(void)
 {
 	current->rcu_read_lock_nesting++;
+	if (IS_ENABLED(CONFIG_PROVE_LOCKING))
+		WARN_ON_ONCE(current->rcu_read_lock_nesting > RCU_NEST_PMAX);
 	barrier();  /* critical section after entry code. */
 }
 EXPORT_SYMBOL_GPL(__rcu_read_lock);
@@ -424,20 +431,18 @@ void __rcu_read_unlock(void)
 		--t->rcu_read_lock_nesting;
 	} else {
 		barrier();  /* critical section before exit code. */
-		t->rcu_read_lock_nesting = INT_MIN;
+		t->rcu_read_lock_nesting = -RCU_NEST_BIAS;
 		barrier();  /* assign before ->rcu_read_unlock_special load */
 		if (unlikely(READ_ONCE(t->rcu_read_unlock_special.s)))
 			rcu_read_unlock_special(t);
 		barrier();  /* ->rcu_read_unlock_special load before assign */
 		t->rcu_read_lock_nesting = 0;
 	}
-#ifdef CONFIG_PROVE_LOCKING
-	{
-		int rrln = READ_ONCE(t->rcu_read_lock_nesting);
+	if (IS_ENABLED(CONFIG_PROVE_LOCKING)) {
+		int rrln = t->rcu_read_lock_nesting;
 
-		WARN_ON_ONCE(rrln < 0 && rrln > INT_MIN / 2);
+		WARN_ON_ONCE(rrln < 0 && rrln > RCU_NEST_NMAX);
 	}
-#endif /* #ifdef CONFIG_PROVE_LOCKING */
 }
 EXPORT_SYMBOL_GPL(__rcu_read_unlock);
 
@@ -617,11 +622,11 @@ static void rcu_preempt_deferred_qs(struct task_struct *t)
 	if (!rcu_preempt_need_deferred_qs(t))
 		return;
 	if (couldrecurse)
-		t->rcu_read_lock_nesting -= INT_MIN;
+		t->rcu_read_lock_nesting -= RCU_NEST_BIAS;
 	local_irq_save(flags);
 	rcu_preempt_deferred_qs_irqrestore(t, flags);
 	if (couldrecurse)
-		t->rcu_read_lock_nesting += INT_MIN;
+		t->rcu_read_lock_nesting += RCU_NEST_BIAS;
 }
 
 /*


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* RE: [PATCH tip/core/rcu 02/19] rcu: Defer reporting RCU-preempt quiescent states when disabled
  2018-11-02 19:43               ` Paul E. McKenney
@ 2018-11-26 13:55                 ` Ran Rozenstein
  2018-11-26 19:00                   ` Paul E. McKenney
  0 siblings, 1 reply; 49+ messages in thread
From: Ran Rozenstein @ 2018-11-26 13:55 UTC (permalink / raw)
  To: paulmck, Joel Fernandes
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, Maor Gottlieb, Tariq Toukan,
	Eran Ben Elisha, Leon Romanovsky

> 
> Hearing no objections, here is the updated patch.
> 
> 								Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit 970cab5d3d206029ed27274a98ea1c3d7e780e53
> Author: Paul E. McKenney <paulmck@linux.ibm.com>
> Date:   Mon Oct 29 07:36:50 2018 -0700
> 
>     rcu: Avoid signed integer overflow in rcu_preempt_deferred_qs()
> 
>     Subtracting INT_MIN can be interpreted as unconditional signed integer
>     overflow, which according to the C standard is undefined behavior.
>     Therefore, kernel build arguments notwithstanding, it would be good to
>     future-proof the code.  This commit therefore substitutes INT_MAX for
>     INT_MIN in order to avoid undefined behavior.
> 
>     While in the neighborhood, this commit also creates some meaningful names
>     for INT_MAX and friends in order to improve readability, as suggested
>     by Joel Fernandes.
> 
>     Reported-by: Ran Rozenstein <ranro@mellanox.com>
>     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> 
>     squash! rcu: Avoid signed integer overflow in rcu_preempt_deferred_qs()
> 
>     While in the neighborhood, use macros to give meaningful names.
> 
>     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> 


Hi,

What is the acceptance status of this patch?

Thanks,
Ran


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH tip/core/rcu 02/19] rcu: Defer reporting RCU-preempt quiescent states when disabled
  2018-11-26 13:55                 ` Ran Rozenstein
@ 2018-11-26 19:00                   ` Paul E. McKenney
  0 siblings, 0 replies; 49+ messages in thread
From: Paul E. McKenney @ 2018-11-26 19:00 UTC (permalink / raw)
  To: Ran Rozenstein
  Cc: Joel Fernandes, linux-kernel, mingo, jiangshanlai, dipankar,
	akpm, mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, Maor Gottlieb, Tariq Toukan,
	Eran Ben Elisha, Leon Romanovsky

On Mon, Nov 26, 2018 at 01:55:37PM +0000, Ran Rozenstein wrote:
> > 
> > Hearing no objections, here is the updated patch.
> > 
> > 								Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > commit 970cab5d3d206029ed27274a98ea1c3d7e780e53
> > Author: Paul E. McKenney <paulmck@linux.ibm.com>
> > Date:   Mon Oct 29 07:36:50 2018 -0700
> > 
> >     rcu: Avoid signed integer overflow in rcu_preempt_deferred_qs()
> > 
> >     Subtracting INT_MIN can be interpreted as unconditional signed integer
> >     overflow, which according to the C standard is undefined behavior.
> >     Therefore, kernel build arguments notwithstanding, it would be good to
> >     future-proof the code.  This commit therefore substitutes INT_MAX for
> >     INT_MIN in order to avoid undefined behavior.
> > 
> >     While in the neighborhood, this commit also creates some meaningful names
> >     for INT_MAX and friends in order to improve readability, as suggested
> >     by Joel Fernandes.
> > 
> >     Reported-by: Ran Rozenstein <ranro@mellanox.com>
> >     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> > 
> >     squash! rcu: Avoid signed integer overflow in rcu_preempt_deferred_qs()
> > 
> >     While in the neighborhood, use macros to give meaningful names.
> > 
> >     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> 
> Hi,
> 
> What is the acceptance status of this patch?

It is queued in -rcu.  If no problems arise beforehand, I intend to submit
it as part of a pull request into -tip, which (again if no problems arise)
will be pulled into mainline during the next merge window.

Oddly enough, a couple of weeks ago the C++ Standards Committee voted
in a proposal for C++20 removing undefined behavior for signed integer
overflow.  This is C++ rather than C, and C must support additional
hardware that wouldn't much like forcing twos complement for signed
integer overflow.  But still...  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH tip/core/rcu 06/19] rcu: Add warning to detect half-interrupts
  2018-08-29 22:20 ` [PATCH tip/core/rcu 06/19] rcu: Add warning to detect half-interrupts Paul E. McKenney
@ 2019-03-11 13:39   ` Joel Fernandes
  2019-03-11 22:29     ` Paul E. McKenney
  2019-03-15  7:31     ` Byungchul Park
  0 siblings, 2 replies; 49+ messages in thread
From: Joel Fernandes @ 2019-03-11 13:39 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, rcu, jiangshanlai, dipankar, mathieu.desnoyers,
	josh, rostedt, luto, byungchul.park

On Wed, Aug 29, 2018 at 03:20:34PM -0700, Paul E. McKenney wrote:
> RCU's dyntick-idle code is written to tolerate half-interrupts, that is,
> either an interrupt that invokes rcu_irq_enter() but never invokes the
> corresponding rcu_irq_exit() on the one hand, or an interrupt that never
> invokes rcu_irq_enter() but does invoke the "corresponding" rcu_irq_exit()
> on the other.  These things really did happen at one time, as evidenced
> by this ca-2011 LKML post:
> 
> http://lkml.kernel.org/r/20111014170019.GE2428@linux.vnet.ibm.com
> 
> The reason why RCU tolerates half-interrupts is that usermode helpers
> used exceptions to invoke a system call from within the kernel such that
> the system call did a normal return (not a return from exception) to
> the calling context.  This caused rcu_irq_enter() to be invoked without
> a matching rcu_irq_exit().  However, usermode helpers have since been
> rewritten to make much more housebroken use of workqueues, kernel threads,
> and do_execve(), and therefore should no longer produce half-interrupts.
> No one knows of any other source of half-interrupts, but then again,
> no one seems insane enough to go audit the entire kernel to verify that
> half-interrupts really are a relic of the past.
> 
> This commit therefore adds a pair of WARN_ON_ONCE() calls that will
> trigger in the presence of half interrupts, which the code will continue
> to handle correctly.  If neither of these WARN_ON_ONCE() trigger by
> mid-2021, then perhaps RCU can stop handling half-interrupts, which
> would be a considerable simplification.

Hi Paul and everyone,
I was thinking some more about this patch and whether we can simplify this code
much in 2021. Since 2021 is a bit far away, I thought working on it again to
keep it fresh in memory is a good idea ;-)

To me it seems we cannot easily combine the counters (dynticks_nesting and
dynticks_nmi_nesting) even if we confirmed that there is no possibility of a
half-interrupt scenario (assuming simplification means counter combining like
Byungchul tried to do in https://goo.gl/X1U77X). The reason is because these
2 counters need to be tracked separately as they are used differently in the
following function:

static int rcu_is_cpu_rrupt_from_idle(void)
{
        return __this_cpu_read(rcu_data.dynticks_nesting) <= 0 &&
               __this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 1;
}

dynticks_nesting actually tracks if we entered/exited idle or user mode.

dynticks_nmi_nesting tracks if we entered/exited interrupts.

We have to do the "dynticks_nmi_nesting <= 1" check because
rcu_is_cpu_rrupt_from_idle() can possibly be called from an interrupt itself
(like timer) so we discount 1 interrupt, and the "dynticks_nesting <= 0"
check is because the CPU MUST be in user or idle for the check to return
true. We can't really combine these two into one counter then I think because
they both convey different messages.

The only simplification we can do is probably that the "crowbar" updates to
dynticks_nmi_nesting can be removed from rcu_eqs_enter/exit once we confirm
no more half-interrupts are possible. Which might still be a worthwhile thing
to do (while still keeping both counters separate).
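
For concreteness, a toy userspace model of that check (the toy_* names and the
value 1000001 are made up for illustration; they are not the kernel's exact
bookkeeping) could look like this:

#include <stdbool.h>
#include <stdio.h>

/* Models the two per-CPU counters discussed above. */
struct toy_rcu_data {
	long dynticks_nesting;		/* process-level (idle/user) state */
	long dynticks_nmi_nesting;	/* irq/NMI nesting */
};

static bool toy_rrupt_from_idle(const struct toy_rcu_data *rdp)
{
	return rdp->dynticks_nesting <= 0 && rdp->dynticks_nmi_nesting <= 1;
}

int main(void)
{
	/*
	 * Scheduling-clock interrupt taken while the CPU was idle:
	 * process level is in an extended quiescent state and only the
	 * current interrupt is on the stack.
	 */
	struct toy_rcu_data from_idle = { 0, 1 };

	/*
	 * The same interrupt taken from process-level kernel code: the
	 * nmi_nesting counter carries the large DYNTICK_IRQ_NONIDLE
	 * crowbar value plus the current interrupt, so any value > 1
	 * (1000001 here is purely illustrative) fails the second check.
	 */
	struct toy_rcu_data from_kernel = { 1, 1000001 };

	printf("from idle:   %d\n", toy_rrupt_from_idle(&from_idle));	/* 1 */
	printf("from kernel: %d\n", toy_rrupt_from_idle(&from_kernel));	/* 0 */
	return 0;
}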

However, I think we could combine the counters and simplify the code if we
implemented rcu_is_cpu_rrupt_from_idle() differently such that it does not
need the counters, but NOHZ_FULL may take issue with that since it needs
rcu_user_enter->rcu_eqs_enter to convey that the CPU is "RCU"-idle.

Actually, I had another question... rcu_user_enter() is a NOOP in !NOHZ_FULL config.
In this case I was wondering if the warning Paul added (in the patch I'm replying to)
will really get fired for half-interrupts. The vast majority of systems, I believe, are
NOHZ_IDLE, not NOHZ_FULL.
This is what a half-interrupt really looks like, right? Please correct me if I'm wrong:
rcu_irq_enter()   [half interrupt causes an exception and thus rcu_irq_enter]
rcu_user_enter()  [due to usermode upcall]
rcu_user_exit()
(no more rcu_irq_exit() - hence half an interrupt)

But the rcu_user_enter()/exit is a NOOP in some configs, so will the warning in
rcu_eqs_e{xit,nter} really do anything?

Or was the idea with adding the new warnings, that they would fire the next
time rcu_idle_enter/exit is called? Like for example:

rcu_irq_enter()   [This is due to half-interrupt]
rcu_idle_enter()  [Eventually we enter the idle loop at some point
		   after the half-interrupt and the rcu_eqs_enter()
		   would "crowbar" the dynticks_nmi_nesting counter to 0].

thanks!

 - Joel

> 
> Reported-by: Steven Rostedt <rostedt@goodmis.org>
> Reported-by: Joel Fernandes <joel@joelfernandes.org>
> Reported-by: Andy Lutomirski <luto@kernel.org>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  kernel/rcu/tree.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index dc041c2afbcc..d2b6ade692c9 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -714,6 +714,7 @@ static void rcu_eqs_enter(bool user)
>  	struct rcu_dynticks *rdtp;
>  
>  	rdtp = this_cpu_ptr(&rcu_dynticks);
> +	WARN_ON_ONCE(rdtp->dynticks_nmi_nesting != DYNTICK_IRQ_NONIDLE);
>  	WRITE_ONCE(rdtp->dynticks_nmi_nesting, 0);
>  	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
>  		     rdtp->dynticks_nesting == 0);
> @@ -895,6 +896,7 @@ static void rcu_eqs_exit(bool user)
>  	trace_rcu_dyntick(TPS("End"), rdtp->dynticks_nesting, 1, rdtp->dynticks);
>  	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && !is_idle_task(current));
>  	WRITE_ONCE(rdtp->dynticks_nesting, 1);
> +	WARN_ON_ONCE(rdtp->dynticks_nmi_nesting);
>  	WRITE_ONCE(rdtp->dynticks_nmi_nesting, DYNTICK_IRQ_NONIDLE);
>  }
>  
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH tip/core/rcu 06/19] rcu: Add warning to detect half-interrupts
  2019-03-11 13:39   ` Joel Fernandes
@ 2019-03-11 22:29     ` Paul E. McKenney
  2019-03-12 15:05       ` Joel Fernandes
  2019-03-15  7:31     ` Byungchul Park
  1 sibling, 1 reply; 49+ messages in thread
From: Paul E. McKenney @ 2019-03-11 22:29 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, rcu, jiangshanlai, dipankar, mathieu.desnoyers,
	josh, rostedt, luto, byungchul.park

On Mon, Mar 11, 2019 at 09:39:39AM -0400, Joel Fernandes wrote:
> On Wed, Aug 29, 2018 at 03:20:34PM -0700, Paul E. McKenney wrote:
> > RCU's dyntick-idle code is written to tolerate half-interrupts, that is,
> > either an interrupt that invokes rcu_irq_enter() but never invokes the
> > corresponding rcu_irq_exit() on the one hand, or an interrupt that never
> > invokes rcu_irq_enter() but does invoke the "corresponding" rcu_irq_exit()
> > on the other.  These things really did happen at one time, as evidenced
> > by this ca-2011 LKML post:
> > 
> > http://lkml.kernel.org/r/20111014170019.GE2428@linux.vnet.ibm.com
> > 
> > The reason why RCU tolerates half-interrupts is that usermode helpers
> > used exceptions to invoke a system call from within the kernel such that
> > the system call did a normal return (not a return from exception) to
> > the calling context.  This caused rcu_irq_enter() to be invoked without
> > a matching rcu_irq_exit().  However, usermode helpers have since been
> > rewritten to make much more housebroken use of workqueues, kernel threads,
> > and do_execve(), and therefore should no longer produce half-interrupts.
> > No one knows of any other source of half-interrupts, but then again,
> > no one seems insane enough to go audit the entire kernel to verify that
> > half-interrupts really are a relic of the past.
> > 
> > This commit therefore adds a pair of WARN_ON_ONCE() calls that will
> > trigger in the presence of half interrupts, which the code will continue
> > to handle correctly.  If neither of these WARN_ON_ONCE() trigger by
> > mid-2021, then perhaps RCU can stop handling half-interrupts, which
> > would be a considerable simplification.
> 
> Hi Paul and everyone,
> I was thinking some more about this patch and whether we can simplify this code
> much in 2021. Since 2021 is a bit far away, I thought working on it again to
> keep it fresh in memory is a good idea ;-)

Indeed, easy to forget.  ;-)

> To me it seems we cannot easily combine the counters (dynticks_nesting and
> dynticks_nmi_nesting) even if we confirmed that there is no possibility of a
> half-interrupt scenario (assuming simplification means counter combining like
> Byungchul tried to do in https://goo.gl/X1U77X). The reason is because these
> 2 counters need to be tracked separately as they are used differently in the
> following function:
> 
> static int rcu_is_cpu_rrupt_from_idle(void)
> {
>         return __this_cpu_read(rcu_data.dynticks_nesting) <= 0 &&
>                __this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 1;
> }
> 
> dynticks_nesting actually tracks if we entered/exited idle or user mode.

True, though it tracks user mode only in CONFIG_NO_HZ_FULL kernels.

> dynticks_nmi_nesting tracks if we entered/exited interrupts.

Including NMIs, yes.

> We have to do the "dynticks_nmi_nesting <= 1" check because
> rcu_is_cpu_rrupt_from_idle() can possibly be called from an interrupt itself
> (like timer) so we discount 1 interrupt, and the "dynticks_nesting <= 0"
> check is because the CPU MUST be in user or idle for the check to return
> true. We can't really combine these two into one counter then I think because
> they both convey different messages.
> 
> The only simplification we can do is probably that the "crowbar" updates to
> dynticks_nmi_nesting can be removed from rcu_eqs_enter/exit once we confirm
> no more half-interrupts are possible. Which might still be a worthwhile thing
> to do (while still keeping both counters separate).
> 
> However, I think we could combine the counters and simplify the code if we
> implemented rcu_is_cpu_rrupt_from_idle() differently such that it does not
> need the counters, but NOHZ_FULL may take issue with that since it needs
> rcu_user_enter->rcu_eqs_enter to convey that the CPU is "RCU"-idle.

I haven't gone through it in detail, but it seems like we should be able
to treat in-kernel process-level execution like an interrupt from idle
or userspace, as the case might be.  If we did that, shouldn't we be
able to just do this?

static int rcu_is_cpu_rrupt_from_idle(void)
{
        return __this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 1;
}

> Actually, I had another question... rcu_user_enter() is a NOOP in !NOHZ_FULL config.
> In this case I was wondering if the warning Paul added (in the patch I'm replying to)
> will really get fired for half-interrupts. The vast majority of systems, I believe, are
> NOHZ_IDLE, not NOHZ_FULL.
> This is what a half-interrupt really looks like, right? Please correct me if I'm wrong:
> rcu_irq_enter()   [half interrupt causes an exception and thus rcu_irq_enter]
> rcu_user_enter()  [due to usermode upcall]
> rcu_user_exit()
> (no more rcu_irq_exit() - hence half an interrupt)
> 
> But the rcu_user_enter()/exit is a NOOP in some configs, so will the warning in
> rcu_eqs_e{xit,nter} really do anything?

Yes, because these are also called from rcu_idle_enter() and
rcu_idle_exit(), which are invoked even in !NO_HZ_FULL kernels.

> Or was the idea with adding the new warnings, that they would fire the next
> time rcu_idle_enter/exit is called? Like for example:
> 
> rcu_irq_enter()   [This is due to half-interrupt]
> rcu_idle_enter()  [Eventually we enter the idle loop at some point
> 		   after the half-interrupt and the rcu_eqs_enter()
> 		   would "crowbar" the dynticks_nmi_nesting counter to 0].

You got it!  ;-)

So yes, these warnings just detect the presence of misnesting.  Presumably
event tracing would then be used to track down the culprits.  Assuming that
the misnesting is reproducible and all that.
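
To spell out the mechanism, a minimal userspace sketch (simplified, assumed
counter handling -- not the kernel code) of how the crowbar exposes a
half-interrupt:

#include <stdio.h>
#include <limits.h>

#define DYNTICK_IRQ_NONIDLE	(LONG_MAX / 2 + 1)

static long dynticks_nmi_nesting = DYNTICK_IRQ_NONIDLE;	/* process level */

static void irq_enter(void) { dynticks_nmi_nesting++; }	/* simplified */
static void irq_exit(void)  { dynticks_nmi_nesting--; }	/* simplified */

static void eqs_enter(void)	/* models rcu_eqs_enter() on idle entry */
{
	if (dynticks_nmi_nesting != DYNTICK_IRQ_NONIDLE)	/* the new warning */
		printf("WARN: misnested rcu_irq_enter()/rcu_irq_exit()\n");
	dynticks_nmi_nesting = 0;				/* the crowbar */
}

int main(void)
{
	irq_enter();	/* a well-behaved interrupt... */
	irq_exit();	/* ...that exits properly: no warning later */

	irq_enter();	/* half-interrupt: no matching irq_exit() */
	eqs_enter();	/* idle entry later notices the misnesting */
	return 0;
}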

							Thanx, Paul

> thanks!
> 
>  - Joel
> 
> > 
> > Reported-by: Steven Rostedt <rostedt@goodmis.org>
> > Reported-by: Joel Fernandes <joel@joelfernandes.org>
> > Reported-by: Andy Lutomirski <luto@kernel.org>
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >  kernel/rcu/tree.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index dc041c2afbcc..d2b6ade692c9 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -714,6 +714,7 @@ static void rcu_eqs_enter(bool user)
> >  	struct rcu_dynticks *rdtp;
> >  
> >  	rdtp = this_cpu_ptr(&rcu_dynticks);
> > +	WARN_ON_ONCE(rdtp->dynticks_nmi_nesting != DYNTICK_IRQ_NONIDLE);
> >  	WRITE_ONCE(rdtp->dynticks_nmi_nesting, 0);
> >  	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
> >  		     rdtp->dynticks_nesting == 0);
> > @@ -895,6 +896,7 @@ static void rcu_eqs_exit(bool user)
> >  	trace_rcu_dyntick(TPS("End"), rdtp->dynticks_nesting, 1, rdtp->dynticks);
> >  	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && !is_idle_task(current));
> >  	WRITE_ONCE(rdtp->dynticks_nesting, 1);
> > +	WARN_ON_ONCE(rdtp->dynticks_nmi_nesting);
> >  	WRITE_ONCE(rdtp->dynticks_nmi_nesting, DYNTICK_IRQ_NONIDLE);
> >  }
> >  
> > -- 
> > 2.17.1
> > 
> 


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH tip/core/rcu 06/19] rcu: Add warning to detect half-interrupts
  2019-03-11 22:29     ` Paul E. McKenney
@ 2019-03-12 15:05       ` Joel Fernandes
  2019-03-12 15:20         ` Paul E. McKenney
  0 siblings, 1 reply; 49+ messages in thread
From: Joel Fernandes @ 2019-03-12 15:05 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, rcu, jiangshanlai, dipankar, mathieu.desnoyers,
	josh, rostedt, luto, byungchul.park

On Mon, Mar 11, 2019 at 03:29:03PM -0700, Paul E. McKenney wrote:
> On Mon, Mar 11, 2019 at 09:39:39AM -0400, Joel Fernandes wrote:
> > On Wed, Aug 29, 2018 at 03:20:34PM -0700, Paul E. McKenney wrote:
> > > RCU's dyntick-idle code is written to tolerate half-interrupts, that is,
> > > either an interrupt that invokes rcu_irq_enter() but never invokes the
> > > corresponding rcu_irq_exit() on the one hand, or an interrupt that never
> > > invokes rcu_irq_enter() but does invoke the "corresponding" rcu_irq_exit()
> > > on the other.  These things really did happen at one time, as evidenced
> > > by this ca-2011 LKML post:
> > > 
> > > http://lkml.kernel.org/r/20111014170019.GE2428@linux.vnet.ibm.com
> > > 
> > > The reason why RCU tolerates half-interrupts is that usermode helpers
> > > used exceptions to invoke a system call from within the kernel such that
> > > the system call did a normal return (not a return from exception) to
> > > the calling context.  This caused rcu_irq_enter() to be invoked without
> > > a matching rcu_irq_exit().  However, usermode helpers have since been
> > > rewritten to make much more housebroken use of workqueues, kernel threads,
> > > and do_execve(), and therefore should no longer produce half-interrupts.
> > > No one knows of any other source of half-interrupts, but then again,
> > > no one seems insane enough to go audit the entire kernel to verify that
> > > half-interrupts really are a relic of the past.
> > > 
> > > This commit therefore adds a pair of WARN_ON_ONCE() calls that will
> > > trigger in the presence of half interrupts, which the code will continue
> > > to handle correctly.  If neither of these WARN_ON_ONCE() trigger by
> > > mid-2021, then perhaps RCU can stop handling half-interrupts, which
> > > would be a considerable simplification.
> > 
> > Hi Paul and everyone,
> > I was thinking some more about this patch and whether we can simplify this code
> > much in 2021. Since 2021 is a bit far away, I thought working on it again to
> > keep it fresh in memory is a good idea ;-)
> 
> Indeed, easy to forget.  ;-)
> 
> > To me it seems we cannot easily combine the counters (dynticks_nesting and
> > dynticks_nmi_nesting) even if we confirmed that there is no possibility of a
> > half-interrupt scenario (assuming simplification means counter combining like
> > Byungchul tried to do in https://goo.gl/X1U77X). The reason is because these
> > 2 counters need to be tracked separately as they are used differently in the
> > following function:
> > 
> > static int rcu_is_cpu_rrupt_from_idle(void)
> > {
> >         return __this_cpu_read(rcu_data.dynticks_nesting) <= 0 &&
> >                __this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 1;
> > }
> > 
> > dynticks_nesting actually tracks if we entered/exited idle or user mode.
> 
> True, though it tracks user mode only in CONFIG_NO_HZ_FULL kernels.
> 
> > dynticks_nmi_nesting tracks if we entered/exited interrupts.
> 
> Including NMIs, yes.
> 
> > We have to do the "dynticks_nmi_nesting <= 1" check because
> > rcu_is_cpu_rrupt_from_idle() can possibly be called from an interrupt itself
> > (like timer) so we discount 1 interrupt, and, the "dynticks_nesting <= 0"
> > check is because the CPU MUST be in user or idle for the check to return
> > true. We can't really combine these two into one counter then I think because
> > they both convey different messages.
> > 
> > The only simplification we can do is probably the "crowbar" updates to
> > dynticks_nmi_nesting can be removed from rcu_eqs_enter/exit once we confirm
> > no more half-interrupts are possible. Which might still be a worthwhile thing
> > to do (while still keeping both counters separate).
> > 
> > However, I think we could combine the counters and lead to simplifying the code
> > in case we implement rcu_is_cpu_rrupt_from_idle differently such that it does
> > not need the counters but NOHZ_FULL may take issue with that since it needs
> > rcu_user_enter->rcu_eqs_enter to convey that the CPU is "RCU"-idle.
> 
> I haven't gone through it in detail, but it seems like we should be able
> to treat in-kernel process-level execution like an interrupt from idle
> or userspace, as the case might be.  If we did that, shouldn't we be
> able to just do this?
> 
> static int rcu_is_cpu_rrupt_from_idle(void)
> {
>         return __this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 1;
> }

I think that would work only if this function is always called from an
interrupt:

The comment on this function says:
 * If the current CPU is idle or running at a first-level (not nested)
 * interrupt from idle, return true.  The caller must have at least
 * disabled preemption.
 */

According to this comment rcu_is_cpu_rrupt_from_idle can be called from
somewhere other than an interrupt, although from code reading that does not
seem to be the case.

If it is ever possible to call rcu_is_cpu_rrupt_from_idle from anywhere but
an interrupt, then we would have a scenario like:
rcu_eqs_exit()   (Called say from rcu_idle_exit)
rcu_is_cpu_rrupt_from_idle  (now nesting counter is 1, so the <=1 check
                             returns true).

This would result in the function falsely claiming the CPU was idle from an
RCU standpoint I think.

However, both from testing and from code reading, I don't see such a scenario
happening. Could we be more explicit in the code that this function can only
be called from an interrupt, and also change the code comment to be clearer
about it (like the following diff)?

I made the following change and it didn't seem to blow up, and it also makes
the code clearer. The only change I would make if we were considering merging
this is to change the BUG_ON to WARN_ON_ONCE (a sketch of that variant follows
the diff), but I want to check first whether you think this change is useful
to do. I think being explicit about the intention is a good thing. I also
specifically check for the first level of interrupt nesting (== 1).

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index a45c45a4a09b..531c63c40001 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -377,14 +377,19 @@ static void __maybe_unused rcu_momentary_dyntick_idle(void)
 /**
  * rcu_is_cpu_rrupt_from_idle - see if idle or immediately interrupted from idle
  *
- * If the current CPU is idle or running at a first-level (not nested)
+ * If the current CPU is idle and running at a first-level (not nested)
  * interrupt from idle, return true.  The caller must have at least
  * disabled preemption.
  */
 static int rcu_is_cpu_rrupt_from_idle(void)
 {
-	return __this_cpu_read(rcu_data.dynticks_nesting) <= 0 &&
-	       __this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 1;
+	int dynticks_nesting = __this_cpu_read(rcu_data.dynticks_nesting);
+	int dynticks_nmi_nesting = __this_cpu_read(rcu_data.dynticks_nmi_nesting);
+
+	/* This function should only be called from an interrupt */
+	BUG_ON(dynticks_nmi_nesting < 1);
+
+	return dynticks_nmi_nesting == 1 && dynticks_nesting <= 0;
 }
 
 #define DEFAULT_RCU_BLIMIT 10     /* Maximum callbacks per rcu_do_batch. */
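
For reference, the WARN_ON_ONCE variant I mention above would be something
like this (sketch only, not a tested patch):

static int rcu_is_cpu_rrupt_from_idle(void)
{
	long nesting = __this_cpu_read(rcu_data.dynticks_nesting);
	long nmi_nesting = __this_cpu_read(rcu_data.dynticks_nmi_nesting);

	/* Complain once, but keep going, if called outside an interrupt. */
	WARN_ON_ONCE(nmi_nesting < 1);

	return nmi_nesting == 1 && nesting <= 0;
}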


> > Actually, I had another question... rcu_user_enter() is a NOOP in !NOHZ_FULL config.
> > In this case I was wondering if the warning Paul added (in the patch I'm replying to)
> > will really get fired for half-interrupts. The vast majority of the systems I believe are
> > NOHZ_IDLE not NOHZ_FULL.
> > This is what a half-interrupt really looks like right? Please correct me if I'm wrong:
> > rcu_irq_enter()   [half interrupt causes an exception and thus rcu_irq_enter]
> > rcu_user_enter()  [due to usermode upcall]
> > rcu_user_exit()
> > (no more rcu_irq_exit() - hence half an interrupt)
> > 
> > But the rcu_user_enter()/exit is a NOOP in some configs, so will the warning in
> > rcu_eqs_e{xit,nter} really do anything?
> 
> Yes, because these are also called from rcu_idle_enter() and
> rcu_idle_exit(), which are invoked even in !NO_HZ_FULL kernels.

Got it.

> > Or was the idea with adding the new warnings, that they would fire the next
> > time rcu_idle_enter/exit is called? Like for example:
> > 
> > rcu_irq_enter()   [This is due to half-interrupt]
> > rcu_idle_enter()  [Eventually we enter the idle loop at some point
> > 		   after the half-interrupt and the rcu_eqs_enter()
> > 		   would "crowbar" the dynticks_nmi_nesting counter to 0].
> 
> You got it!  ;-)

Thanks a lot for confirming,

- Joel


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH tip/core/rcu 06/19] rcu: Add warning to detect half-interrupts
  2019-03-12 15:05       ` Joel Fernandes
@ 2019-03-12 15:20         ` Paul E. McKenney
  2019-03-13 15:09           ` Joel Fernandes
  0 siblings, 1 reply; 49+ messages in thread
From: Paul E. McKenney @ 2019-03-12 15:20 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, rcu, jiangshanlai, dipankar, mathieu.desnoyers,
	josh, rostedt, luto, byungchul.park

On Tue, Mar 12, 2019 at 11:05:14AM -0400, Joel Fernandes wrote:
> On Mon, Mar 11, 2019 at 03:29:03PM -0700, Paul E. McKenney wrote:
> > On Mon, Mar 11, 2019 at 09:39:39AM -0400, Joel Fernandes wrote:
> > > On Wed, Aug 29, 2018 at 03:20:34PM -0700, Paul E. McKenney wrote:
> > > > RCU's dyntick-idle code is written to tolerate half-interrupts, that is,
> > > > either an interrupt that invokes rcu_irq_enter() but never invokes the
> > > > corresponding rcu_irq_exit() on the one hand, or an interrupt that never
> > > > invokes rcu_irq_enter() but does invoke the "corresponding" rcu_irq_exit()
> > > > on the other.  These things really did happen at one time, as evidenced
> > > > by this ca-2011 LKML post:
> > > > 
> > > > http://lkml.kernel.org/r/20111014170019.GE2428@linux.vnet.ibm.com
> > > > 
> > > > The reason why RCU tolerates half-interrupts is that usermode helpers
> > > > used exceptions to invoke a system call from within the kernel such that
> > > > the system call did a normal return (not a return from exception) to
> > > > the calling context.  This caused rcu_irq_enter() to be invoked without
> > > > a matching rcu_irq_exit().  However, usermode helpers have since been
> > > > rewritten to make much more housebroken use of workqueues, kernel threads,
> > > > and do_execve(), and therefore should no longer produce half-interrupts.
> > > > No one knows of any other source of half-interrupts, but then again,
> > > > no one seems insane enough to go audit the entire kernel to verify that
> > > > half-interrupts really are a relic of the past.
> > > > 
> > > > This commit therefore adds a pair of WARN_ON_ONCE() calls that will
> > > > trigger in the presence of half interrupts, which the code will continue
> > > > to handle correctly.  If neither of these WARN_ON_ONCE() trigger by
> > > > mid-2021, then perhaps RCU can stop handling half-interrupts, which
> > > > would be a considerable simplification.
> > > 
> > > Hi Paul and everyone,
> > > I was thinking some more about this patch and whether we can simplify this code
> > > much in 2021. Since 2021 is a bit far away, I thought working on it again to
> > > keep it fresh in memory is a good idea ;-)
> > 
> > Indeed, easy to forget.  ;-)
> > 
> > > To me it seems we cannot easily combine the counters (dynticks_nesting and
> > > dynticks_nmi_nesting) even if we confirmed that there is no possibility of a
> > > half-interrupt scenario (assuming simplification means counter combining like
> > > Byungchul tried to do in https://goo.gl/X1U77X). The reason is because these
> > > 2 counters need to be tracked separately as they are used differently in the
> > > following function:
> > > 
> > > static int rcu_is_cpu_rrupt_from_idle(void)
> > > {
> > >         return __this_cpu_read(rcu_data.dynticks_nesting) <= 0 &&
> > >                __this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 1;
> > > }
> > > 
> > > dynticks_nesting actually tracks if we entered/exited idle or user mode.
> > 
> > True, though it tracks user mode only in CONFIG_NO_HZ_FULL kernels.
> > 
> > > dynticks_nmi_nesting tracks if we entered/exited interrupts.
> > 
> > Including NMIs, yes.
> > 
> > > We have to do the "dynticks_nmi_nesting <= 1" check because
> > > rcu_is_cpu_rrupt_from_idle() can possibly be called from an interrupt itself
> > > (like timer) so we discount 1 interrupt, and, the "dynticks_nesting <= 0"
> > > check is because the CPU MUST be in user or idle for the check to return
> > > true. We can't really combine these two into one counter then I think because
> > > they both convey different messages.
> > > 
> > > The only simplification we can do is probably the "crowbar" updates to
> > > dynticks_nmi_nesting can be removed from rcu_eqs_enter/exit once we confirm
> > > no more half-interrupts are possible. Which might still be a worthwhile thing
> > > to do (while still keeping both counters separate).
> > > 
> > > However, I think we could combine the counters and lead to simplifying the code
> > > in case we implement rcu_is_cpu_rrupt_from_idle differently such that it does
> > > not need the counters but NOHZ_FULL may take issue with that since it needs
> > > rcu_user_enter->rcu_eqs_enter to convey that the CPU is "RCU"-idle.
> > 
> > I haven't gone through it in detail, but it seems like we should be able
> > to treat in-kernel process-level execution like an interrupt from idle
> > or userspace, as the case might be.  If we did that, shouldn't we be
> > able to just do this?
> > 
> > static int rcu_is_cpu_rrupt_from_idle(void)
> > {
> >         return __this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 1;
> > }
> 
> I think that would work only if this function is always called from an
> interrupt:
> 
> The comments on this function says:
>  * If the current CPU is idle or running at a first-level (not nested)
>  * interrupt from idle, return true.  The caller must have at least
>  * disabled preemption.
>  */
> 
> According to this comment rcu_is_cpu_rrupt_from_idle can be called from
> somewhere other than an interrupt although from code-reading that does not
> seem to be.
> 
> If it is ever possible to call rcu_is_cpu_rrupt_from_idle from anywhere but
> an interrupt, then we would have a scenario like:
> rcu_eqs_exit()   (Called say from rcu_idle_exit)
> rcu_is_cpu_rrupt_from_idle  (now nesting counter is 1, so the <=1 check
>                              returns true).
> 
> This would result in the function falsely claiming the CPU was idle from an
> RCU standpoint I think.
> 
> However both from testing and from code reading, I don't see such a scenario
> happening.

Agreed!

>            Could we be more explicit in the code that this function can only
> be called from an interrupt, and also we change the code comment to be more
> clear about it (like the following diff)?

That would be good!

Nice trick on using dyntick state to check for interrupt nesting, but
wouldn't consolidating the counters break that?  But is there a lockdep
check for being in a hardware interrupt handler?  If not, could one
be added?  This would have the benefit of not adding overhead to the
scheduling-clock interrupt in production builds of the Linux kernel,
while still finding this bug in testing.

(Another approach would be to use IS_ENABLED(CONFIG_PROVE_RCU), but
a lockdep check would be cleaner.)

> I made the following change and it didn't seem to blow up and also makes the
> code more clear. The only change I would make if we were considering merging
> this is to change the BUG_ON to WARN_ON_ONCE but I want to check first if you
> think this change is useful to do. I think being explicit about the intention
> is a good thing. I also specifically check for first-level of interrupt
> nesting (==1).
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index a45c45a4a09b..531c63c40001 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -377,14 +377,19 @@ static void __maybe_unused rcu_momentary_dyntick_idle(void)
>  /**
>   * rcu_is_cpu_rrupt_from_idle - see if idle or immediately interrupted from idle
>   *
> - * If the current CPU is idle or running at a first-level (not nested)
> + * If the current CPU is idle and running at a first-level (not nested)
>   * interrupt from idle, return true.  The caller must have at least
>   * disabled preemption.
>   */
>  static int rcu_is_cpu_rrupt_from_idle(void)
>  {
> -	return __this_cpu_read(rcu_data.dynticks_nesting) <= 0 &&
> -	       __this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 1;
> +	int dynticks_nesting = __this_cpu_read(rcu_data.dynticks_nesting);
> +	int dynticks_nmi_nesting = __this_cpu_read(rcu_data.dynticks_nmi_nesting);
> +
> +	/* This function should only be called from an interrupt */
> +	BUG_ON(dynticks_nmi_nesting < 1);
> +
> +	return dynticks_nmi_nesting == 1 && dynticks_nesting <= 0;
>  }
>  
>  #define DEFAULT_RCU_BLIMIT 10     /* Maximum callbacks per rcu_do_batch. */
> 
> 
> > > Actually, I had another question... rcu_user_enter() is a NOOP in !NOHZ_FULL config.
> > > In this case I was wondering if the warning Paul added (in the patch I'm replying to)
> > > will really get fired for half-interrupts. The vast majority of the systems I believe are
> > > NOHZ_IDLE not NOHZ_FULL.
> > > This is what a half-interrupt really looks like right? Please correct me if I'm wrong:
> > > rcu_irq_enter()   [half interrupt causes an exception and thus rcu_irq_enter]
> > > rcu_user_enter()  [due to usermode upcall]
> > > rcu_user_exit()
> > > (no more rcu_irq_exit() - hence half an interrupt)
> > > 
> > > But the rcu_user_enter()/exit is a NOOP in some configs, so will the warning in
> > > rcu_eqs_e{xit,nter} really do anything?
> > 
> > Yes, because these are also called from rcu_idle_enter() and
> > rcu_idle_exit(), which are invoked even in !NO_HZ_FULL kernels.
> 
> Got it.
> 
> > > Or was the idea with adding the new warnings, that they would fire the next
> > > time rcu_idle_enter/exit is called? Like for example:
> > > 
> > > rcu_irq_enter()   [This is due to half-interrupt]
> > > rcu_idle_enter()  [Eventually we enter the idle loop at some point
> > > 		   after the half-interrupt and the rcu_eqs_enter()
> > > 		   would "crowbar" the dynticks_nmi_nesting counter to 0].
> > 
> > You got it!  ;-)
> 
> Thanks a lot for confirming,

Thank you for looking into this!

								Thanx, Paul


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH tip/core/rcu 06/19] rcu: Add warning to detect half-interrupts
  2019-03-12 15:20         ` Paul E. McKenney
@ 2019-03-13 15:09           ` Joel Fernandes
  2019-03-13 15:27             ` Steven Rostedt
  0 siblings, 1 reply; 49+ messages in thread
From: Joel Fernandes @ 2019-03-13 15:09 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, rcu, jiangshanlai, dipankar, mathieu.desnoyers,
	josh, rostedt, luto, byungchul.park

On Tue, Mar 12, 2019 at 08:20:34AM -0700, Paul E. McKenney wrote:
[snip]
> 
> >            Could we be more explicit in the code that this function can only
> > be called from an interrupt, and also we change the code comment to be more
> > clear about it (like the following diff)?
> 
> That would be good!
> 
> Nice trick on using dyntick state to check for interrupt nesting, but
> wouldn't consolidating the counters break that?  But is there a lockdep
> check for being in a hardware interrupt handler?  If not, could one
> be added?  This would have the benefit of not adding overhead to the
> scheduling-clock interrupt in production builds of the Linux kernel,
> while still finding this bug in testing.
> 
> (Another approach would be to use IS_ENABLED(CONFIG_PROVE_RCU), but
> a lockdep check would be cleaner.)

AFAICS, lockdep does not specifically track when we enter an interrupt, but
rather only tracks when interrupts are enabled/disabled.

But we could use in_irq() to find out whether we are in an interrupt (it
checks the HARDIRQ_MASK bits of preempt_count).

I will add an in_irq() check that is active only when PROVE_RCU is
enabled, and send a patch.
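
Roughly something like this (just a sketch of what I have in mind; the
actual patch may differ):

static int rcu_is_cpu_rrupt_from_idle(void)
{
	/* Constant-folded away unless CONFIG_PROVE_RCU is enabled. */
	WARN_ON_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) && !in_irq());

	return __this_cpu_read(rcu_data.dynticks_nesting) <= 0 &&
	       __this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 1;
}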

Thanks and it is my pleasure to look into this, quite interesting!

 - Joel


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH tip/core/rcu 06/19] rcu: Add warning to detect half-interrupts
  2019-03-13 15:09           ` Joel Fernandes
@ 2019-03-13 15:27             ` Steven Rostedt
  2019-03-13 15:51               ` Paul E. McKenney
  0 siblings, 1 reply; 49+ messages in thread
From: Steven Rostedt @ 2019-03-13 15:27 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Paul E. McKenney, linux-kernel, rcu, jiangshanlai, dipankar,
	mathieu.desnoyers, josh, luto, byungchul.park

On Wed, 13 Mar 2019 11:09:48 -0400
Joel Fernandes <joel@joelfernandes.org> wrote:

> AFAICS, lockdep does not specifically track when we enter an interrupt, but
> rather only tracks when interrupts are enabled/disabled.

It does:

#define __irq_enter()					\
	do {						\
		account_irq_enter_time(current);	\
		preempt_count_add(HARDIRQ_OFFSET);	\
		trace_hardirq_enter();			\
	} while (0)

# define trace_hardirq_enter()			\
do {						\
	current->hardirq_context++;		\
} while (0)


And if the hardirq_context ever does not match "in_irq()" lockdep will
complain loudly.
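
And in_irq() itself is (roughly) just the HARDIRQ bits of preempt_count:

#define hardirq_count()	(preempt_count() & HARDIRQ_MASK)
#define in_irq()	(hardirq_count())

so lockdep is effectively cross-checking its own hardirq_context tracking
against the preempt_count-based state.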

-- Steve

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH tip/core/rcu 06/19] rcu: Add warning to detect half-interrupts
  2019-03-13 15:27             ` Steven Rostedt
@ 2019-03-13 15:51               ` Paul E. McKenney
  2019-03-13 16:51                 ` Steven Rostedt
  0 siblings, 1 reply; 49+ messages in thread
From: Paul E. McKenney @ 2019-03-13 15:51 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Joel Fernandes, linux-kernel, rcu, jiangshanlai, dipankar,
	mathieu.desnoyers, josh, luto, byungchul.park

On Wed, Mar 13, 2019 at 11:27:26AM -0400, Steven Rostedt wrote:
> On Wed, 13 Mar 2019 11:09:48 -0400
> Joel Fernandes <joel@joelfernandes.org> wrote:
> 
> > AFAICS, lockdep does not specifically track when we enter an interrupt, but
> > rather only tracks when interrupts are enabled/disabled.
> 
> It does:
> 
> #define __irq_enter()					\
> 	do {						\
> 		account_irq_enter_time(current);	\
> 		preempt_count_add(HARDIRQ_OFFSET);	\
> 		trace_hardirq_enter();			\
> 	} while (0)
> 
> # define trace_hardirq_enter()			\
> do {						\
> 	current->hardirq_context++;		\
> } while (0)
> 
> 
> And if the hardirq_context ever does not match "in_irq()" lockdep will
> complain loudly.

Good to know, thank you!

Does this mean that there is a better approach than Joel's suggestion?
I believe he would end up with something like this:

	WARN_ON_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) && !in_irq());

It would be nice if there is something like this:

	lockdep_assert_in_irq_handler();

But I haven't seen this.  (Not that I have looked particularly hard for
such a thing, mind you!)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH tip/core/rcu 06/19] rcu: Add warning to detect half-interrupts
  2019-03-13 15:51               ` Paul E. McKenney
@ 2019-03-13 16:51                 ` Steven Rostedt
  2019-03-13 18:07                   ` Paul E. McKenney
  0 siblings, 1 reply; 49+ messages in thread
From: Steven Rostedt @ 2019-03-13 16:51 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Joel Fernandes, linux-kernel, rcu, jiangshanlai, dipankar,
	mathieu.desnoyers, josh, luto, byungchul.park

On Wed, 13 Mar 2019 08:51:55 -0700
"Paul E. McKenney" <paulmck@linux.ibm.com> wrote:

> Does this mean that there is a better approach than Joel's suggestion?
> I believe he would end up with something like this:
> 
> 	WARN_ON_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) && !in_irq());
> 
> It would be nice if there is something like this:
> 
> 	lockdep_assert_in_irq_handler();
> 
> But I haven't seen this.  (Not that I have looked particularly hard for
> such a thing, mind you!)

That would be trivial to implement:

#define lockdep_assert_in_irq() do {					\
		WARN_ON(debug_locks && !current->hardirq_context);	\
	} while (0)
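
and then a caller that must be running in a hardirq handler, say
rcu_is_cpu_rrupt_from_idle(), would just do something like (untested, only
to show where the assertion would go):

static int rcu_is_cpu_rrupt_from_idle(void)
{
	lockdep_assert_in_irq();	/* scream if not in a hardirq handler */

	return __this_cpu_read(rcu_data.dynticks_nesting) <= 0 &&
	       __this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 1;
}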

-- Steve

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH tip/core/rcu 06/19] rcu: Add warning to detect half-interrupts
  2019-03-13 16:51                 ` Steven Rostedt
@ 2019-03-13 18:07                   ` Paul E. McKenney
  2019-03-14 12:31                     ` Joel Fernandes
  0 siblings, 1 reply; 49+ messages in thread
From: Paul E. McKenney @ 2019-03-13 18:07 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Joel Fernandes, linux-kernel, rcu, jiangshanlai, dipankar,
	mathieu.desnoyers, josh, luto, byungchul.park

On Wed, Mar 13, 2019 at 12:51:25PM -0400, Steven Rostedt wrote:
> On Wed, 13 Mar 2019 08:51:55 -0700
> "Paul E. McKenney" <paulmck@linux.ibm.com> wrote:
> 
> > Does this mean that there is a better approach than Joel's suggestion?
> > I believe he would end up with something like this:
> > 
> > 	WARN_ON_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) && !in_irq());
> > 
> > It would be nice if there is something like this:
> > 
> > 	lockdep_assert_in_irq_handler();
> > 
> > But I haven't seen this.  (Not that I have looked particularly hard for
> > such a thing, mind you!)
> 
> That would be trivial to implement:
> 
> #define lockdep_assert_in_irq() do {
> 		WARN_ON(debug_locks && !current->hardirq_context);
> 	} while (0)

Looks good to me!

Joel, does this work for you?  I could be wrong, but I suspect that Steve
is suggesting that you incorporate the above into your eventual patch.  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH tip/core/rcu 06/19] rcu: Add warning to detect half-interrupts
  2019-03-13 18:07                   ` Paul E. McKenney
@ 2019-03-14 12:31                     ` Joel Fernandes
  2019-03-14 13:36                       ` Steven Rostedt
  0 siblings, 1 reply; 49+ messages in thread
From: Joel Fernandes @ 2019-03-14 12:31 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, linux-kernel, rcu, jiangshanlai, dipankar,
	mathieu.desnoyers, josh, luto, byungchul.park

On Wed, Mar 13, 2019 at 11:07:30AM -0700, Paul E. McKenney wrote:
> On Wed, Mar 13, 2019 at 12:51:25PM -0400, Steven Rostedt wrote:
> > On Wed, 13 Mar 2019 08:51:55 -0700
> > "Paul E. McKenney" <paulmck@linux.ibm.com> wrote:
> > 
> > > Does this mean that there is a better approach than Joel's suggestion?
> > > I believe he would end up with something like this:
> > > 
> > > 	WARN_ON_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) && !in_irq());
> > > 
> > > It would be nice if there is something like this:
> > > 
> > > 	lockdep_assert_in_irq_handler();
> > > 
> > > But I haven't seen this.  (Not that I have looked particularly hard for
> > > such a thing, mind you!)
> > 
> > That would be trivial to implement:
> > 
> > #define lockdep_assert_in_irq() do {
> > 		WARN_ON(debug_locks && !current->hardirq_context);
> > 	} while (0)
> 
> Looks good to me!
> 
> Joel, does this work for you?  I could be wrong, but I suspect that Steve
> is suggesting that you incorporate the above into your eventual patch.  ;-)

Oh thanks for pointing that out. Yes it does work for me. I agree with the
lockdep API addition and others could benefit from it too. I will incorporate
the lockdep API addition into the RCU patch, but let me know if I should
rather split it.

thanks!

 - Joel


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH tip/core/rcu 06/19] rcu: Add warning to detect half-interrupts
  2019-03-14 12:31                     ` Joel Fernandes
@ 2019-03-14 13:36                       ` Steven Rostedt
  2019-03-14 13:37                         ` Steven Rostedt
  0 siblings, 1 reply; 49+ messages in thread
From: Steven Rostedt @ 2019-03-14 13:36 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Paul E. McKenney, linux-kernel, rcu, jiangshanlai, dipankar,
	mathieu.desnoyers, josh, luto, byungchul.park

On Thu, 14 Mar 2019 08:31:59 -0400
Joel Fernandes <joel@joelfernandes.org> wrote:

> Oh thanks for pointing that out. Yes it does work for me. I agree with the
> lockdep API addition and others could benefit from it too. I will incorporate
> the lockdep API addition into the RCU patch, but let me know if I should
> rather split it.

I'd recommend splitting it (adding the lockdep_assert in a patch by
itself), but make sure that patch Cc's the lockdep maintainers and
explains the reason for adding it. Might as well Cc the lockdep
maintainers on the entire series too.

-- Steve

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH tip/core/rcu 06/19] rcu: Add warning to detect half-interrupts
  2019-03-14 13:36                       ` Steven Rostedt
@ 2019-03-14 13:37                         ` Steven Rostedt
  2019-03-14 21:27                           ` Joel Fernandes
  0 siblings, 1 reply; 49+ messages in thread
From: Steven Rostedt @ 2019-03-14 13:37 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Paul E. McKenney, linux-kernel, rcu, jiangshanlai, dipankar,
	mathieu.desnoyers, josh, luto, byungchul.park

On Thu, 14 Mar 2019 09:36:57 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Thu, 14 Mar 2019 08:31:59 -0400
> Joel Fernandes <joel@joelfernandes.org> wrote:
> 
> > Oh thanks for pointing that out. Yes it does work for me. I agree with the
> > lockdep API addition and others could benefit from it too. I will incorporate
> > the lockdep API addition into the RCU patch, but let me know if I should
> > rather split it.  
> 
> I'd recommend splitting it (adding the lockdep_assert in a patch by
> itself), but make sure that patch Cc's the lockdep maintainers and
> explains the reason for adding it. Might as well Cc the lockdep
> maintainers on the entire series too.
>

Feel free to add "Suggested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>"
on that patch too.

-- Steve

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH tip/core/rcu 06/19] rcu: Add warning to detect half-interrupts
  2019-03-14 13:37                         ` Steven Rostedt
@ 2019-03-14 21:27                           ` Joel Fernandes
  0 siblings, 0 replies; 49+ messages in thread
From: Joel Fernandes @ 2019-03-14 21:27 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Paul E. McKenney, linux-kernel, rcu, jiangshanlai, dipankar,
	mathieu.desnoyers, josh, luto, byungchul.park

On Thu, Mar 14, 2019 at 09:37:46AM -0400, Steven Rostedt wrote:
> On Thu, 14 Mar 2019 09:36:57 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > On Thu, 14 Mar 2019 08:31:59 -0400
> > Joel Fernandes <joel@joelfernandes.org> wrote:
> > 
> > > Oh thanks for pointing that out. Yes it does work for me. I agree with the
> > > lockdep API addition and others could benefit from it too. I will incorporate
> > > the lockdep API addition into the RCU patch, but let me know if I should
> > > rather split it.  
> > 
> > I'd recommend splitting it (adding the lockdep_assert in a patch by
> > itself), but make sure that patch Cc's the lockdep maintainers and
> > explains the reason for adding it. Might as well Cc the lockdep
> > maintainers on the entire series too.
> >
> 
> Feel free to add "Suggested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>"
> on that patch too.

Will definitely split it and add your tag. Thanks!

 - Joel


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH tip/core/rcu 06/19] rcu: Add warning to detect half-interrupts
  2019-03-11 13:39   ` Joel Fernandes
  2019-03-11 22:29     ` Paul E. McKenney
@ 2019-03-15  7:31     ` Byungchul Park
  2019-03-15  7:44       ` Byungchul Park
  1 sibling, 1 reply; 49+ messages in thread
From: Byungchul Park @ 2019-03-15  7:31 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Paul E. McKenney, linux-kernel, rcu, jiangshanlai, dipankar,
	mathieu.desnoyers, josh, rostedt, luto, kernel-team

On Mon, Mar 11, 2019 at 09:39:39AM -0400, Joel Fernandes wrote:
> On Wed, Aug 29, 2018 at 03:20:34PM -0700, Paul E. McKenney wrote:
> > RCU's dyntick-idle code is written to tolerate half-interrupts, that is,
> > either an interrupt that invokes rcu_irq_enter() but never invokes the
> > corresponding rcu_irq_exit() on the one hand, or an interrupt that never
> > invokes rcu_irq_enter() but does invoke the "corresponding" rcu_irq_exit()
> > on the other.  These things really did happen at one time, as evidenced
> > by this ca-2011 LKML post:
> > 
> > http://lkml.kernel.org/r/20111014170019.GE2428@linux.vnet.ibm.com
> > 
> > The reason why RCU tolerates half-interrupts is that usermode helpers
> > used exceptions to invoke a system call from within the kernel such that
> > the system call did a normal return (not a return from exception) to
> > the calling context.  This caused rcu_irq_enter() to be invoked without
> > a matching rcu_irq_exit().  However, usermode helpers have since been
> > rewritten to make much more housebroken use of workqueues, kernel threads,
> > and do_execve(), and therefore should no longer produce half-interrupts.
> > No one knows of any other source of half-interrupts, but then again,
> > no one seems insane enough to go audit the entire kernel to verify that
> > half-interrupts really are a relic of the past.
> > 
> > This commit therefore adds a pair of WARN_ON_ONCE() calls that will
> > trigger in the presence of half interrupts, which the code will continue
> > to handle correctly.  If neither of these WARN_ON_ONCE() trigger by
> > mid-2021, then perhaps RCU can stop handling half-interrupts, which
> > would be a considerable simplification.
> 
> Hi Paul and everyone,
> I was thinking some more about this patch and whether we can simplify this code
> much in 2021. Since 2021 is a bit far away, I thought working on it again to
> keep it fresh in memory is a good idea ;-)
> 
> To me it seems we cannot easily combine the counters (dynticks_nesting and
> dynticks_nmi_nesting) even if we confirmed that there is no possibility of a
> half-interrupt scenario (assuming simplification means counter combining like
> Byungchul tried to do in https://goo.gl/X1U77X). The reason is because these
> 2 counters need to be tracked separately as they are used differently in the
> following function:

Hi Joel and Paul,

I always love the way you logically approach problems, so I'm a fan of
all your work :) But I'm JUST curious about something here. Why can't
we combine them the way I tried even if we confirm there is no
possibility of half-interrupts? IMHO, the only thing we want to know by
calling rcu_is_cpu_rrupt_from_idle() is whether the interrupt comes from
RCU-idle or not - of course assuming the caller context is always a
well-defined interrupt context, e.g. the tick handler.

So the function can return true if the caller is within an RCU-idle
region, allowing for the single well-known level of interrupt nesting.

Of course, since we cannot confirm that yet, the crowbar is necessary.
But does it still have a problem even after we confirm it? Why? What am
I missing? Could you explain why for me? :(

Thanks,
Byungchul

> static int rcu_is_cpu_rrupt_from_idle(void)
> {
>         return __this_cpu_read(rcu_data.dynticks_nesting) <= 0 &&
>                __this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 1;
> }
> 
> dynticks_nesting actually tracks if we entered/exited idle or user mode.
> 
> dynticks_nmi_nesting tracks if we entered/exited interrupts.
> 
> We have to do the "dynticks_nmi_nesting <= 1" check because
> rcu_is_cpu_rrupt_from_idle() can possibly be called from an interrupt itself
> (like timer) so we discount 1 interrupt, and, the "dynticks_nesting <= 0"
> check is because the CPU MUST be in user or idle for the check to return
> true. We can't really combine these two into one counter then I think because
> they both convey different messages.
> 
> The only simplification we can do is probably the "crowbar" updates to
> dynticks_nmi_nesting can be removed from rcu_eqs_enter/exit once we confirm
> no more half-interrupts are possible. Which might still be a worthwhile thing
> to do (while still keeping both counters separate).
> 
> However, I think we could combine the counters and lead to simplifying the code
> in case we implement rcu_is_cpu_rrupt_from_idle differently such that it does
> not need the counters but NOHZ_FULL may take issue with that since it needs
> rcu_user_enter->rcu_eqs_enter to convey that the CPU is "RCU"-idle.
> 
> Actually, I had another question... rcu_user_enter() is a NOOP in !NOHZ_FULL config.
> In this case I was wondering if the warning Paul added (in the patch I'm replying to)
> will really get fired for half-interrupts. The vast majority of the systems I believe are
> NOHZ_IDLE not NOHZ_FULL.
> This is what a half-interrupt really looks like right? Please correct me if I'm wrong:
> rcu_irq_enter()   [half interrupt causes an exception and thus rcu_irq_enter]
> rcu_user_enter()  [due to usermode upcall]
> rcu_user_exit()
> (no more rcu_irq_exit() - hence half an interrupt)
> 
> But the rcu_user_enter()/exit is a NOOP in some configs, so will the warning in
> rcu_eqs_e{xit,nter} really do anything?
> 
> Or was the idea with adding the new warnings, that they would fire the next
> time rcu_idle_enter/exit is called? Like for example:
> 
> rcu_irq_enter()   [This is due to half-interrupt]
> rcu_idle_enter()  [Eventually we enter the idle loop at some point
> 		   after the half-interrupt and the rcu_eqs_enter()
> 		   would "crowbar" the dynticks_nmi_nesting counter to 0].
> 
> thanks!
> 
>  - Joel
> 
> > 
> > Reported-by: Steven Rostedt <rostedt@goodmis.org>
> > Reported-by: Joel Fernandes <joel@joelfernandes.org>
> > Reported-by: Andy Lutomirski <luto@kernel.org>
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >  kernel/rcu/tree.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index dc041c2afbcc..d2b6ade692c9 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -714,6 +714,7 @@ static void rcu_eqs_enter(bool user)
> >  	struct rcu_dynticks *rdtp;
> >  
> >  	rdtp = this_cpu_ptr(&rcu_dynticks);
> > +	WARN_ON_ONCE(rdtp->dynticks_nmi_nesting != DYNTICK_IRQ_NONIDLE);
> >  	WRITE_ONCE(rdtp->dynticks_nmi_nesting, 0);
> >  	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
> >  		     rdtp->dynticks_nesting == 0);
> > @@ -895,6 +896,7 @@ static void rcu_eqs_exit(bool user)
> >  	trace_rcu_dyntick(TPS("End"), rdtp->dynticks_nesting, 1, rdtp->dynticks);
> >  	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && !is_idle_task(current));
> >  	WRITE_ONCE(rdtp->dynticks_nesting, 1);
> > +	WARN_ON_ONCE(rdtp->dynticks_nmi_nesting);
> >  	WRITE_ONCE(rdtp->dynticks_nmi_nesting, DYNTICK_IRQ_NONIDLE);
> >  }
> >  
> > -- 
> > 2.17.1
> > 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH tip/core/rcu 06/19] rcu: Add warning to detect half-interrupts
  2019-03-15  7:31     ` Byungchul Park
@ 2019-03-15  7:44       ` Byungchul Park
  2019-03-15 13:46         ` Joel Fernandes
  0 siblings, 1 reply; 49+ messages in thread
From: Byungchul Park @ 2019-03-15  7:44 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Paul E. McKenney, linux-kernel, rcu, jiangshanlai, dipankar,
	mathieu.desnoyers, josh, rostedt, luto, kernel-team

On 03/15/2019 04:31 PM, Byungchul Park wrote:
> On Mon, Mar 11, 2019 at 09:39:39AM -0400, Joel Fernandes wrote:
>> On Wed, Aug 29, 2018 at 03:20:34PM -0700, Paul E. McKenney wrote:
>>> RCU's dyntick-idle code is written to tolerate half-interrupts, that is,
>>> either an interrupt that invokes rcu_irq_enter() but never invokes the
>>> corresponding rcu_irq_exit() on the one hand, or an interrupt that never
>>> invokes rcu_irq_enter() but does invoke the "corresponding" rcu_irq_exit()
>>> on the other.  These things really did happen at one time, as evidenced
>>> by this ca-2011 LKML post:
>>>
>>> http://lkml.kernel.org/r/20111014170019.GE2428@linux.vnet.ibm.com
>>>
>>> The reason why RCU tolerates half-interrupts is that usermode helpers
>>> used exceptions to invoke a system call from within the kernel such that
>>> the system call did a normal return (not a return from exception) to
>>> the calling context.  This caused rcu_irq_enter() to be invoked without
>>> a matching rcu_irq_exit().  However, usermode helpers have since been
>>> rewritten to make much more housebroken use of workqueues, kernel threads,
>>> and do_execve(), and therefore should no longer produce half-interrupts.
>>> No one knows of any other source of half-interrupts, but then again,
>>> no one seems insane enough to go audit the entire kernel to verify that
>>> half-interrupts really are a relic of the past.
>>>
>>> This commit therefore adds a pair of WARN_ON_ONCE() calls that will
>>> trigger in the presence of half interrupts, which the code will continue
>>> to handle correctly.  If neither of these WARN_ON_ONCE() trigger by
>>> mid-2021, then perhaps RCU can stop handling half-interrupts, which
>>> would be a considerable simplification.
>> Hi Paul and everyone,
>> I was thinking some more about this patch and whether we can simplify this code
>> much in 2021. Since 2021 is a bit far away, I thought working on it again to
>> keep it fresh in memory is a good idea ;-)
>>
>> To me it seems we cannot easily combine the counters (dynticks_nesting and
>> dynticks_nmi_nesting) even if we confirmed that there is no possibility of a
>> half-interrupt scenario (assuming simplification means counter combining like
>> Byungchul tried to do in https://goo.gl/X1U77X). The reason is because these
>> 2 counters need to be tracked separately as they are used differently in the
>> following function:
> Hi Joel and Paul,
>
> I always love the way to logically approach problems so I'm a fan of
> all your works :) But I'm JUST curious about something here. Why can't
> we combine them the way I tried even if we confirm no possibility of
> half-interrupt? IMHO, the only thing we want to know through calling
> rcu_is_cpu_rrupt_from_idle() is whether the interrupt comes from
> RCU-idle or not - of course assuming the caller context always be an
> well-defined interrupt context like e.g. the tick handler.
>
> So the function can return true if the caller is within a RCU-idle
> region except a well-known single interrupt nested.
>
> Of course, now that we cannot confirm it yet, the crowbar is necessary.
> But does it still have a problem even after confirming it? Why? What am
> I missing? Could you explain why for me? :(


Did you also want to consider the case where the function is called from
something other than a well-known interrupt context? If yes, then I agree
with you: there doesn't seem to be any such code, and it's not a good idea
to let the function be called from arbitrary contexts anyway.


> Thanks,
> Byungchul
>
>> static int rcu_is_cpu_rrupt_from_idle(void)
>> {
>>          return __this_cpu_read(rcu_data.dynticks_nesting) <= 0 &&
>>                 __this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 1;
>> }
>>
>> dynticks_nesting actually tracks if we entered/exited idle or user mode.
>>
>> dynticks_nmi_nesting tracks if we entered/exited interrupts.
>>
>> We have to do the "dynticks_nmi_nesting <= 1" check because
>> rcu_is_cpu_rrupt_from_idle() can possibly be called from an interrupt itself
>> (like timer) so we discount 1 interrupt, and, the "dynticks_nesting <= 0"
>> check is because the CPU MUST be in user or idle for the check to return
>> true. We can't really combine these two into one counter then I think because
>> they both convey different messages.
>>
>> The only simplification we can do is probably the "crowbar" updates to
>> dynticks_nmi_nesting can be removed from rcu_eqs_enter/exit once we confirm
>> no more half-interrupts are possible. Which might still be a worthwhile thing
>> to do (while still keeping both counters separate).
>>
>> However, I think we could combine the counters and lead to simplifying the code
>> in case we implement rcu_is_cpu_rrupt_from_idle differently such that it does
>> not need the counters but NOHZ_FULL may take issue with that since it needs
>> rcu_user_enter->rcu_eqs_enter to convey that the CPU is "RCU"-idle.
>>
>> Actually, I had another question... rcu_user_enter() is a NOOP in !NOHZ_FULL config.
>> In this case I was wondering if the warning Paul added (in the patch I'm replying to)
>> will really get fired for half-interrupts. The vast majority of the systems I believe are
>> NOHZ_IDLE not NOHZ_FULL.
>> This is what a half-interrupt really looks like right? Please correct me if I'm wrong:
>> rcu_irq_enter()   [half interrupt causes an exception and thus rcu_irq_enter]
>> rcu_user_enter()  [due to usermode upcall]
>> rcu_user_exit()
>> (no more rcu_irq_exit() - hence half an interrupt)
>>
>> But the rcu_user_enter()/exit is a NOOP in some configs, so will the warning in
>> rcu_eqs_e{xit,nter} really do anything?
>>
>> Or was the idea with adding the new warnings, that they would fire the next
>> time rcu_idle_enter/exit is called? Like for example:
>>
>> rcu_irq_enter()   [This is due to half-interrupt]
>> rcu_idle_enter()  [Eventually we enter the idle loop at some point
>> 		   after the half-interrupt and the rcu_eqs_enter()
>> 		   would "crowbar" the dynticks_nmi_nesting counter to 0].
>>
>> thanks!
>>
>>   - Joel
>>
>>> Reported-by: Steven Rostedt <rostedt@goodmis.org>
>>> Reported-by: Joel Fernandes <joel@joelfernandes.org>
>>> Reported-by: Andy Lutomirski <luto@kernel.org>
>>> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>>> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>>> ---
>>>   kernel/rcu/tree.c | 2 ++
>>>   1 file changed, 2 insertions(+)
>>>
>>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>>> index dc041c2afbcc..d2b6ade692c9 100644
>>> --- a/kernel/rcu/tree.c
>>> +++ b/kernel/rcu/tree.c
>>> @@ -714,6 +714,7 @@ static void rcu_eqs_enter(bool user)
>>>   	struct rcu_dynticks *rdtp;
>>>   
>>>   	rdtp = this_cpu_ptr(&rcu_dynticks);
>>> +	WARN_ON_ONCE(rdtp->dynticks_nmi_nesting != DYNTICK_IRQ_NONIDLE);
>>>   	WRITE_ONCE(rdtp->dynticks_nmi_nesting, 0);
>>>   	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
>>>   		     rdtp->dynticks_nesting == 0);
>>> @@ -895,6 +896,7 @@ static void rcu_eqs_exit(bool user)
>>>   	trace_rcu_dyntick(TPS("End"), rdtp->dynticks_nesting, 1, rdtp->dynticks);
>>>   	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && !is_idle_task(current));
>>>   	WRITE_ONCE(rdtp->dynticks_nesting, 1);
>>> +	WARN_ON_ONCE(rdtp->dynticks_nmi_nesting);
>>>   	WRITE_ONCE(rdtp->dynticks_nmi_nesting, DYNTICK_IRQ_NONIDLE);
>>>   }
>>>   
>>> -- 
>>> 2.17.1
>>>


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH tip/core/rcu 06/19] rcu: Add warning to detect half-interrupts
  2019-03-15  7:44       ` Byungchul Park
@ 2019-03-15 13:46         ` Joel Fernandes
  0 siblings, 0 replies; 49+ messages in thread
From: Joel Fernandes @ 2019-03-15 13:46 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Paul E. McKenney, linux-kernel, rcu, jiangshanlai, dipankar,
	mathieu.desnoyers, josh, rostedt, luto, kernel-team

On Fri, Mar 15, 2019 at 04:44:52PM +0900, Byungchul Park wrote:
> On 03/15/2019 04:31 PM, Byungchul Park wrote:
> > On Mon, Mar 11, 2019 at 09:39:39AM -0400, Joel Fernandes wrote:
> > > On Wed, Aug 29, 2018 at 03:20:34PM -0700, Paul E. McKenney wrote:
> > > > RCU's dyntick-idle code is written to tolerate half-interrupts, that is,
> > > > either an interrupt that invokes rcu_irq_enter() but never invokes the
> > > > corresponding rcu_irq_exit() on the one hand, or an interrupt that never
> > > > invokes rcu_irq_enter() but does invoke the "corresponding" rcu_irq_exit()
> > > > on the other.  These things really did happen at one time, as evidenced
> > > > by this ca-2011 LKML post:
> > > > 
> > > > http://lkml.kernel.org/r/20111014170019.GE2428@linux.vnet.ibm.com
> > > > 
> > > > The reason why RCU tolerates half-interrupts is that usermode helpers
> > > > used exceptions to invoke a system call from within the kernel such that
> > > > the system call did a normal return (not a return from exception) to
> > > > the calling context.  This caused rcu_irq_enter() to be invoked without
> > > > a matching rcu_irq_exit().  However, usermode helpers have since been
> > > > rewritten to make much more housebroken use of workqueues, kernel threads,
> > > > and do_execve(), and therefore should no longer produce half-interrupts.
> > > > No one knows of any other source of half-interrupts, but then again,
> > > > no one seems insane enough to go audit the entire kernel to verify that
> > > > half-interrupts really are a relic of the past.
> > > > 
> > > > This commit therefore adds a pair of WARN_ON_ONCE() calls that will
> > > > trigger in the presence of half interrupts, which the code will continue
> > > > to handle correctly.  If neither of these WARN_ON_ONCE() trigger by
> > > > mid-2021, then perhaps RCU can stop handling half-interrupts, which
> > > > would be a considerable simplification.
> > > Hi Paul and everyone,
> > > I was thinking some more about this patch and whether we can simplify this code
> > > much in 2021. Since 2021 is a bit far away, I thought working on it again to
> > > keep it fresh in memory is a good idea ;-)
> > > 
> > > To me it seems we cannot easily combine the counters (dynticks_nesting and
> > > dynticks_nmi_nesting) even if we confirmed that there is no possibility of a
> > > half-interrupt scenario (assuming simplification means counter combining like
> > > Byungchul tried to do in https://goo.gl/X1U77X). The reason is because these
> > > 2 counters need to be tracked separately as they are used differently in the
> > > following function:
> > Hi Joel and Paul,
> > 
> > I always love the way to logically approach problems so I'm a fan of
> > all your works :) But I'm JUST curious about something here. Why can't
> > we combine them the way I tried even if we confirm no possibility of
> > half-interrupt? IMHO, the only thing we want to know through calling
> > rcu_is_cpu_rrupt_from_idle() is whether the interrupt comes from
> > RCU-idle or not - of course assuming the caller context always be an
> > well-defined interrupt context like e.g. the tick handler.
> > 
> > So the function can return true if the caller is within a RCU-idle
> > region except a well-known single interrupt nested.
> > 
> > Of course, now that we cannot confirm it yet, the crowbar is necessary.
> > But does it still have a problem even after confirming it? Why? What am
> > I missing? Could you explain why for me? :(
> 
> 
> Did you also want to consider the case the function is called from others
> than
> well-known interrupt contexts? If yes, then I agree with you, there doesn't
> seem to be the kind of code and it's not a good idea to let the function be
> called generally though.

We were discussing exactly this on the thread. I am going to be adding a
lockdep check to make sure the function isn't called from anywhere but an
interrupt context. Then once we can confirm that there are no more
half-interrupts in the future, we can apply your counter combining approach.

Based on the comments on the rrupt_from_idle() function, it wasn't clear to
me if the function was intended to be called from any context. That's why I
thought the counter combining approach would not work, but you are right - it
should work.

I will CC you on the lockdep check patch. Also hope you are subscribed to
the new shiny rcu@vger.kernel.org list as well :-)

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2019-03-15 13:46 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-29 22:20 [PATCH tip/core/rcu 0/19] RCU flavor-consolidation changes for v4.20/v5.0 Paul E. McKenney
2018-08-29 22:20 ` [PATCH tip/core/rcu 01/19] rcu: Refactor rcu_{nmi,irq}_{enter,exit}() Paul E. McKenney
2018-08-30 18:10   ` Steven Rostedt
2018-08-30 23:02     ` Paul E. McKenney
2018-08-31  2:25     ` Byungchul Park
2018-08-29 22:20 ` [PATCH tip/core/rcu 02/19] rcu: Defer reporting RCU-preempt quiescent states when disabled Paul E. McKenney
2018-10-29 11:24   ` Ran Rozenstein
2018-10-29 14:27     ` Paul E. McKenney
2018-10-30  3:44       ` Joel Fernandes
2018-10-30 12:58         ` Paul E. McKenney
2018-10-30 22:21           ` Joel Fernandes
2018-10-31 18:22             ` Paul E. McKenney
2018-11-02 19:43               ` Paul E. McKenney
2018-11-26 13:55                 ` Ran Rozenstein
2018-11-26 19:00                   ` Paul E. McKenney
2018-08-29 22:20 ` [PATCH tip/core/rcu 03/19] rcutorture: Test extended "rcu" read-side critical sections Paul E. McKenney
2018-08-29 22:20 ` [PATCH tip/core/rcu 04/19] rcu: Allow processing deferred QSes for exiting RCU-preempt readers Paul E. McKenney
2018-08-29 22:20 ` [PATCH tip/core/rcu 05/19] rcu: Remove now-unused ->b.exp_need_qs field from the rcu_special union Paul E. McKenney
2018-08-29 22:20 ` [PATCH tip/core/rcu 06/19] rcu: Add warning to detect half-interrupts Paul E. McKenney
2019-03-11 13:39   ` Joel Fernandes
2019-03-11 22:29     ` Paul E. McKenney
2019-03-12 15:05       ` Joel Fernandes
2019-03-12 15:20         ` Paul E. McKenney
2019-03-13 15:09           ` Joel Fernandes
2019-03-13 15:27             ` Steven Rostedt
2019-03-13 15:51               ` Paul E. McKenney
2019-03-13 16:51                 ` Steven Rostedt
2019-03-13 18:07                   ` Paul E. McKenney
2019-03-14 12:31                     ` Joel Fernandes
2019-03-14 13:36                       ` Steven Rostedt
2019-03-14 13:37                         ` Steven Rostedt
2019-03-14 21:27                           ` Joel Fernandes
2019-03-15  7:31     ` Byungchul Park
2019-03-15  7:44       ` Byungchul Park
2019-03-15 13:46         ` Joel Fernandes
2018-08-29 22:20 ` [PATCH tip/core/rcu 07/19] rcu: Apply RCU-bh QSes to RCU-sched and RCU-preempt when safe Paul E. McKenney
2018-08-29 22:20 ` [PATCH tip/core/rcu 08/19] rcu: Report expedited grace periods at context-switch time Paul E. McKenney
2018-08-29 22:20 ` [PATCH tip/core/rcu 09/19] rcu: Define RCU-bh update API in terms of RCU Paul E. McKenney
2018-08-29 22:20 ` [PATCH tip/core/rcu 10/19] rcu: Update comments and help text for no more RCU-bh updaters Paul E. McKenney
2018-08-29 22:20 ` [PATCH tip/core/rcu 11/19] rcu: Drop "wake" parameter from rcu_report_exp_rdp() Paul E. McKenney
2018-08-29 22:20 ` [PATCH tip/core/rcu 12/19] rcu: Fix typo in rcu_get_gp_kthreads_prio() header comment Paul E. McKenney
2018-08-29 22:20 ` [PATCH tip/core/rcu 13/19] rcu: Define RCU-sched API in terms of RCU for Tree RCU PREEMPT builds Paul E. McKenney
2018-08-29 22:20 ` [PATCH tip/core/rcu 14/19] rcu: Express Tiny RCU updates in terms of RCU rather than RCU-sched Paul E. McKenney
2018-08-29 22:20 ` [PATCH tip/core/rcu 15/19] rcu: Remove RCU_STATE_INITIALIZER() Paul E. McKenney
2018-08-29 22:20 ` [PATCH tip/core/rcu 16/19] rcu: Eliminate rcu_state structure's ->call field Paul E. McKenney
2018-08-29 22:20 ` [PATCH tip/core/rcu 17/19] rcu: Remove rcu_state structure's ->rda field Paul E. McKenney
2018-08-29 22:20 ` [PATCH tip/core/rcu 18/19] rcu: Remove rcu_state_p pointer to default rcu_state structure Paul E. McKenney
2018-08-29 22:20 ` [PATCH tip/core/rcu 19/19] rcu: Remove rcu_data_p pointer to default rcu_data structure Paul E. McKenney
2018-08-29 22:22 ` [PATCH tip/core/rcu 0/19] RCU flavor-consolidation changes for v4.20/v5.0 Paul E. McKenney
