* [PATCH tip/core/rcu 0/15] General fixes
@ 2017-07-24 21:44 Paul E. McKenney
  2017-07-24 21:44 ` [PATCH tip/core/rcu 01/15] sched,rcu: Make cond_resched() provide RCU quiescent state Paul E. McKenney
                   ` (14 more replies)
  0 siblings, 15 replies; 35+ messages in thread
From: Paul E. McKenney @ 2017-07-24 21:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg

Hello!

This series contains general fixes:

1.	Make cond_resched() provide RCU quiescent states, this time
	without degrading performance.

2.	Use timer as backstop for NOCB deferred wakeups.

3.	Simplify RCU Kconfig by driving TASKS_RCU directly off of PREEMPT.

4.	Create reasonable API for do_exit() TASKS_RCU processing.

5.	Add TPS() to event-traced strings.

6.	Move rcu.h to new trivial-function style.

7.	Add event tracing to ->gp_tasks update at GP start.

8.	Add idle swait variants which don't contribute to load average,
	courtesy of Luis R. Rodriguez.

9.	Use idle versions of swait to make idle-hack clear, courtesy of
	Luis R. Rodriguez.

10.	Add TPS() protection for _rcu_barrier_trace strings.

11.	Set disable_rcu_irq_enter on rcu_eqs_exit(), courtesy of
	Masami Hiramatsu.

12.	Add assertions verifying blocked-tasks list.

13.	Make rcu_idle_enter() rely on callers disabling irqs, courtesy
	of Peter Zijlstra.

14.	Add warning to rcu_idle_enter() for irqs enabled.

15.	Remove exports from rcu_idle_exit() and rcu_idle_enter().

							Thanx, Paul

------------------------------------------------------------------------

 include/linux/rcupdate.h                                    |   13 
 include/linux/sched.h                                       |    8 
 include/linux/swait.h                                       |   55 +++
 kernel/exit.c                                               |    7 
 kernel/rcu/Kconfig                                          |    3 
 kernel/rcu/rcu.h                                            |  128 +------
 kernel/rcu/rcutorture.c                                     |   17 
 kernel/rcu/tree.c                                           |   75 +---
 kernel/rcu/tree.h                                           |    2 
 kernel/rcu/tree_plugin.h                                    |  211 +++++++-----
 kernel/rcu/update.c                                         |   18 -
 kernel/sched/core.c                                         |    1 
 tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt |    2 
 13 files changed, 276 insertions(+), 264 deletions(-)


* [PATCH tip/core/rcu 01/15] sched,rcu: Make cond_resched() provide RCU quiescent state
  2017-07-24 21:44 [PATCH tip/core/rcu 0/15] General fixes Paul E. McKenney
@ 2017-07-24 21:44 ` Paul E. McKenney
  2017-08-15 16:19   ` [PATCH v5 " Paul E. McKenney
  2017-07-24 21:44 ` [PATCH tip/core/rcu 02/15] rcu: Use timer as backstop for NOCB deferred wakeups Paul E. McKenney
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 35+ messages in thread
From: Paul E. McKenney @ 2017-07-24 21:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Paul E. McKenney

There is some confusion as to which of cond_resched() or
cond_resched_rcu_qs() should be added to long in-kernel loops.
This commit therefore eliminates the decision by adding RCU quiescent
states to cond_resched().  This commit also simplifies the code that
used to interact with cond_resched_rcu_qs(), and that now interacts with
cond_resched(), to reduce its overhead.  This reduction is necessary to
allow the heavier-weight cond_resched_rcu_qs() mechanism to be invoked
everywhere that cond_resched() is invoked.

Part of that reduction in overhead converts the jiffies_till_sched_qs
kernel parameter to read-only at runtime, thus eliminating the need for
bounds checking.
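
For illustration only (not part of this patch, and the item type and helpers
below are hypothetical), this is the kind of long in-kernel loop the change
targets; a plain cond_resched() now suffices:

struct example_item;					/* Hypothetical type. */
void example_process_item(struct example_item *p);	/* Hypothetical helper. */

/*
 * Long-running in-kernel loop: with this patch, the plain cond_resched()
 * below both yields the CPU when needed and reports an RCU quiescent
 * state, so there is no longer any need to choose between it and
 * cond_resched_rcu_qs().
 */
static void example_scan(struct example_item **tab, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		example_process_item(tab[i]);
		cond_resched();
	}
}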

Reported-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 include/linux/sched.h |  3 ++-
 kernel/rcu/tree.c     | 23 ++++-------------------
 kernel/sched/core.c   |  1 +
 3 files changed, 7 insertions(+), 20 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2ba9ec93423f..12f326aa5871 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1508,10 +1508,11 @@ static inline int test_tsk_need_resched(struct task_struct *tsk)
  * cond_resched_lock() will drop the spinlock before scheduling,
  * cond_resched_softirq() will enable bhs before scheduling.
  */
+void rcu_all_qs(void);
 #ifndef CONFIG_PREEMPT
 extern int _cond_resched(void);
 #else
-static inline int _cond_resched(void) { return 0; }
+static inline int _cond_resched(void) { rcu_all_qs(); return 0; }
 #endif
 
 #define cond_resched() ({			\
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 51d4c3acf32d..ab4c2cea208d 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -537,8 +537,8 @@ module_param(rcu_kick_kthreads, bool, 0644);
  * How long the grace period must be before we start recruiting
  * quiescent-state help from rcu_note_context_switch().
  */
-static ulong jiffies_till_sched_qs = HZ / 20;
-module_param(jiffies_till_sched_qs, ulong, 0644);
+static ulong jiffies_till_sched_qs = HZ / 16;
+module_param(jiffies_till_sched_qs, ulong, 0444);
 
 static bool rcu_start_gp_advanced(struct rcu_state *rsp, struct rcu_node *rnp,
 				  struct rcu_data *rdp);
@@ -1230,7 +1230,6 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
 	unsigned long jtsq;
 	bool *rnhqp;
 	bool *ruqp;
-	unsigned long rjtsc;
 	struct rcu_node *rnp;
 
 	/*
@@ -1247,23 +1246,13 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
 		return 1;
 	}
 
-	/* Compute and saturate jiffies_till_sched_qs. */
-	jtsq = jiffies_till_sched_qs;
-	rjtsc = rcu_jiffies_till_stall_check();
-	if (jtsq > rjtsc / 2) {
-		WRITE_ONCE(jiffies_till_sched_qs, rjtsc);
-		jtsq = rjtsc / 2;
-	} else if (jtsq < 1) {
-		WRITE_ONCE(jiffies_till_sched_qs, 1);
-		jtsq = 1;
-	}
-
 	/*
 	 * Has this CPU encountered a cond_resched_rcu_qs() since the
 	 * beginning of the grace period?  For this to be the case,
 	 * the CPU has to have noticed the current grace period.  This
 	 * might not be the case for nohz_full CPUs looping in the kernel.
 	 */
+	jtsq = jiffies_till_sched_qs;
 	rnp = rdp->mynode;
 	ruqp = per_cpu_ptr(&rcu_dynticks.rcu_urgent_qs, rdp->cpu);
 	if (time_after(jiffies, rdp->rsp->gp_start + jtsq) &&
@@ -1271,7 +1260,7 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
 	    READ_ONCE(rdp->gpnum) == rnp->gpnum && !rdp->gpwrap) {
 		trace_rcu_fqs(rdp->rsp->name, rdp->gpnum, rdp->cpu, TPS("rqc"));
 		return 1;
-	} else {
+	} else if (time_after(jiffies, rdp->rsp->gp_start + jtsq)) {
 		/* Load rcu_qs_ctr before store to rcu_urgent_qs. */
 		smp_store_release(ruqp, true);
 	}
@@ -1299,10 +1288,6 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
 	 * updates are only once every few jiffies, the probability of
 	 * lossage (and thus of slight grace-period extension) is
 	 * quite low.
-	 *
-	 * Note that if the jiffies_till_sched_qs boot/sysfs parameter
-	 * is set too high, we override with half of the RCU CPU stall
-	 * warning delay.
 	 */
 	rnhqp = &per_cpu(rcu_dynticks.rcu_need_heavy_qs, rdp->cpu);
 	if (!READ_ONCE(*rnhqp) &&
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 17c667b427b4..9433633012ba 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4808,6 +4808,7 @@ int __sched _cond_resched(void)
 		preempt_schedule_common();
 		return 1;
 	}
+	rcu_all_qs();
 	return 0;
 }
 EXPORT_SYMBOL(_cond_resched);
-- 
2.5.2


* [PATCH tip/core/rcu 02/15] rcu: Use timer as backstop for NOCB deferred wakeups
  2017-07-24 21:44 [PATCH tip/core/rcu 0/15] General fixes Paul E. McKenney
  2017-07-24 21:44 ` [PATCH tip/core/rcu 01/15] sched,rcu: Make cond_resched() provide RCU quiescent state Paul E. McKenney
@ 2017-07-24 21:44 ` Paul E. McKenney
  2017-07-25 18:12   ` Steven Rostedt
  2017-07-24 21:44 ` [PATCH tip/core/rcu 03/15] rcu: Drive TASKS_RCU directly off of PREEMPT Paul E. McKenney
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 35+ messages in thread
From: Paul E. McKenney @ 2017-07-24 21:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Paul E. McKenney

The handling of RCU's no-CBs CPUs has a maintenance headache, namely
that if call_rcu() is invoked with interrupts disabled, the rcuo kthread
wakeup must be deferred to a point where we can be sure that scheduler
locks are not held.  Of course, there are a lot of code paths leading
from an interrupts-disabled invocation of call_rcu(), and missing any
one of these can result in excessive callback-invocation latency, and
potentially even system hangs.

This commit therefore uses a timer to guarantee that the wakeup will
eventually occur.  If one of the deferred-wakeup points kicks in, then
the timer is simply cancelled.

This commit also fixes up an incomplete removal of commits that were
intended to plug remaining exit paths, which should have the added
benefit of reducing the overhead of RCU's context-switch hooks.  In
addition, it simplifies leader-to-follower callback-list handoff by
introducing locking.  The call_rcu()-to-leader handoff continues to
use atomic operations in order to maintain good real-time latency for
common-case use of call_rcu().
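
In outline, abbreviated from the patch below (tracing and special cases
elided, so treat this as a sketch rather than the actual code):

/* Deferral path: record the wakeup type and arm a one-jiffy backstop. */
static void sketch_wake_nocb_leader_defer(struct rcu_data *rdp, int waketype)
{
	unsigned long flags;

	raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
	if (rdp->nocb_defer_wakeup == RCU_NOCB_WAKE_NOT)
		mod_timer(&rdp->nocb_timer, jiffies + 1);
	WRITE_ONCE(rdp->nocb_defer_wakeup, waketype);
	raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
}

/*
 * If one of the deferred-wakeup points runs first, it del_timer()s the
 * backstop under ->nocb_lock; otherwise the timer fires and performs
 * the wakeup itself.
 */
static void sketch_nocb_timer_handler(unsigned long x)
{
	do_nocb_deferred_wakeup_common((struct rcu_data *)x);
}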

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Dan Carpenter fix for mod_timer() usage bug found by smatch. ]
---
 kernel/rcu/tree.h        |   2 +
 kernel/rcu/tree_plugin.h | 179 +++++++++++++++++++++++++++++------------------
 2 files changed, 111 insertions(+), 70 deletions(-)

diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 9af0f31d6847..fe83f684ddcd 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -268,7 +268,9 @@ struct rcu_data {
 	struct rcu_head **nocb_follower_tail;
 	struct swait_queue_head nocb_wq; /* For nocb kthreads to sleep on. */
 	struct task_struct *nocb_kthread;
+	raw_spinlock_t nocb_lock;	/* Guard following pair of fields. */
 	int nocb_defer_wakeup;		/* Defer wakeup of nocb_kthread. */
+	struct timer_list nocb_timer;	/* Enforce finite deferral. */
 
 	/* The following fields are used by the leader, hence own cacheline. */
 	struct rcu_head *nocb_gp_head ____cacheline_internodealigned_in_smp;
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 908b309d60d7..bb9e6e43130f 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1788,23 +1788,62 @@ bool rcu_is_nocb_cpu(int cpu)
 }
 
 /*
- * Kick the leader kthread for this NOCB group.
+ * Kick the leader kthread for this NOCB group.  Caller holds ->nocb_lock
+ * and this function releases it.
  */
-static void wake_nocb_leader(struct rcu_data *rdp, bool force)
+static void __wake_nocb_leader(struct rcu_data *rdp, bool force,
+			       unsigned long flags)
+	__releases(rdp->nocb_lock)
 {
 	struct rcu_data *rdp_leader = rdp->nocb_leader;
 
-	if (!READ_ONCE(rdp_leader->nocb_kthread))
+	lockdep_assert_held(&rdp->nocb_lock);
+	if (!READ_ONCE(rdp_leader->nocb_kthread)) {
+		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
 		return;
-	if (READ_ONCE(rdp_leader->nocb_leader_sleep) || force) {
+	}
+	if (rdp_leader->nocb_leader_sleep || force) {
 		/* Prior smp_mb__after_atomic() orders against prior enqueue. */
 		WRITE_ONCE(rdp_leader->nocb_leader_sleep, false);
+		del_timer(&rdp->nocb_timer);
+		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
 		smp_mb(); /* ->nocb_leader_sleep before swake_up(). */
 		swake_up(&rdp_leader->nocb_wq);
+	} else {
+		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
 	}
 }
 
 /*
+ * Kick the leader kthread for this NOCB group, but caller has not
+ * acquired locks.
+ */
+static void wake_nocb_leader(struct rcu_data *rdp, bool force)
+{
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
+	__wake_nocb_leader(rdp, force, flags);
+}
+
+/*
+ * Arrange to wake the leader kthread for this NOCB group at some
+ * future time when it is safe to do so.
+ */
+static void wake_nocb_leader_defer(struct rcu_data *rdp, int waketype,
+				   const char *reason)
+{
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
+	if (rdp->nocb_defer_wakeup == RCU_NOCB_WAKE_NOT)
+		mod_timer(&rdp->nocb_timer, jiffies + 1);
+	WRITE_ONCE(rdp->nocb_defer_wakeup, waketype);
+	trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, reason);
+	raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
+}
+
+/*
  * Does the specified CPU need an RCU callback for the specified flavor
  * of rcu_barrier()?
  */
@@ -1891,11 +1930,8 @@ static void __call_rcu_nocb_enqueue(struct rcu_data *rdp,
 			trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu,
 					    TPS("WakeEmpty"));
 		} else {
-			WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE);
-			/* Store ->nocb_defer_wakeup before ->rcu_urgent_qs. */
-			smp_store_release(this_cpu_ptr(&rcu_dynticks.rcu_urgent_qs), true);
-			trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu,
-					    TPS("WakeEmptyIsDeferred"));
+			wake_nocb_leader_defer(rdp, RCU_NOCB_WAKE,
+					       TPS("WakeEmptyIsDeferred"));
 		}
 		rdp->qlen_last_fqs_check = 0;
 	} else if (len > rdp->qlen_last_fqs_check + qhimark) {
@@ -1905,11 +1941,8 @@ static void __call_rcu_nocb_enqueue(struct rcu_data *rdp,
 			trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu,
 					    TPS("WakeOvf"));
 		} else {
-			WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_FORCE);
-			/* Store ->nocb_defer_wakeup before ->rcu_urgent_qs. */
-			smp_store_release(this_cpu_ptr(&rcu_dynticks.rcu_urgent_qs), true);
-			trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu,
-					    TPS("WakeOvfIsDeferred"));
+			wake_nocb_leader_defer(rdp, RCU_NOCB_WAKE,
+					       TPS("WakeOvfIsDeferred"));
 		}
 		rdp->qlen_last_fqs_check = LONG_MAX / 2;
 	} else {
@@ -2031,6 +2064,7 @@ static void rcu_nocb_wait_gp(struct rcu_data *rdp)
 static void nocb_leader_wait(struct rcu_data *my_rdp)
 {
 	bool firsttime = true;
+	unsigned long flags;
 	bool gotcbs;
 	struct rcu_data *rdp;
 	struct rcu_head **tail;
@@ -2042,7 +2076,11 @@ static void nocb_leader_wait(struct rcu_data *my_rdp)
 		trace_rcu_nocb_wake(my_rdp->rsp->name, my_rdp->cpu, "Sleep");
 		swait_event_interruptible(my_rdp->nocb_wq,
 				!READ_ONCE(my_rdp->nocb_leader_sleep));
-		/* Memory barrier handled by smp_mb() calls below and repoll. */
+		raw_spin_lock_irqsave(&my_rdp->nocb_lock, flags);
+		my_rdp->nocb_leader_sleep = true;
+		WRITE_ONCE(my_rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
+		del_timer(&my_rdp->nocb_timer);
+		raw_spin_unlock_irqrestore(&my_rdp->nocb_lock, flags);
 	} else if (firsttime) {
 		firsttime = false; /* Don't drown trace log with "Poll"! */
 		trace_rcu_nocb_wake(my_rdp->rsp->name, my_rdp->cpu, "Poll");
@@ -2054,7 +2092,7 @@ static void nocb_leader_wait(struct rcu_data *my_rdp)
 	 * nocb_gp_head, where they await a grace period.
 	 */
 	gotcbs = false;
-	smp_mb(); /* wakeup before ->nocb_head reads. */
+	smp_mb(); /* wakeup and _sleep before ->nocb_head reads. */
 	for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_follower) {
 		rdp->nocb_gp_head = READ_ONCE(rdp->nocb_head);
 		if (!rdp->nocb_gp_head)
@@ -2066,56 +2104,41 @@ static void nocb_leader_wait(struct rcu_data *my_rdp)
 		gotcbs = true;
 	}
 
-	/*
-	 * If there were no callbacks, sleep a bit, rescan after a
-	 * memory barrier, and go retry.
-	 */
+	/* No callbacks?  Sleep a bit if polling, and go retry.  */
 	if (unlikely(!gotcbs)) {
-		if (!rcu_nocb_poll)
+		WARN_ON(signal_pending(current));
+		if (rcu_nocb_poll) {
+			schedule_timeout_interruptible(1);
+		} else {
 			trace_rcu_nocb_wake(my_rdp->rsp->name, my_rdp->cpu,
 					    "WokeEmpty");
-		WARN_ON(signal_pending(current));
-		schedule_timeout_interruptible(1);
-
-		/* Rescan in case we were a victim of memory ordering. */
-		my_rdp->nocb_leader_sleep = true;
-		smp_mb();  /* Ensure _sleep true before scan. */
-		for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_follower)
-			if (READ_ONCE(rdp->nocb_head)) {
-				/* Found CB, so short-circuit next wait. */
-				my_rdp->nocb_leader_sleep = false;
-				break;
-			}
+		}
 		goto wait_again;
 	}
 
 	/* Wait for one grace period. */
 	rcu_nocb_wait_gp(my_rdp);
 
-	/*
-	 * We left ->nocb_leader_sleep unset to reduce cache thrashing.
-	 * We set it now, but recheck for new callbacks while
-	 * traversing our follower list.
-	 */
-	my_rdp->nocb_leader_sleep = true;
-	smp_mb(); /* Ensure _sleep true before scan of ->nocb_head. */
-
 	/* Each pass through the following loop wakes a follower, if needed. */
 	for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_follower) {
-		if (READ_ONCE(rdp->nocb_head))
+		if (!rcu_nocb_poll &&
+		    READ_ONCE(rdp->nocb_head) &&
+		    READ_ONCE(my_rdp->nocb_leader_sleep)) {
+			raw_spin_lock_irqsave(&my_rdp->nocb_lock, flags);
 			my_rdp->nocb_leader_sleep = false;/* No need to sleep.*/
+			raw_spin_unlock_irqrestore(&my_rdp->nocb_lock, flags);
+		}
 		if (!rdp->nocb_gp_head)
 			continue; /* No CBs, so no need to wake follower. */
 
 		/* Append callbacks to follower's "done" list. */
-		tail = xchg(&rdp->nocb_follower_tail, rdp->nocb_gp_tail);
+		raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
+		tail = rdp->nocb_follower_tail;
+		rdp->nocb_follower_tail = rdp->nocb_gp_tail;
 		*tail = rdp->nocb_gp_head;
-		smp_mb__after_atomic(); /* Store *tail before wakeup. */
+		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
 		if (rdp != my_rdp && tail == &rdp->nocb_follower_head) {
-			/*
-			 * List was empty, wake up the follower.
-			 * Memory barriers supplied by atomic_long_add().
-			 */
+			/* List was empty, so wake up the follower.  */
 			swake_up(&rdp->nocb_wq);
 		}
 	}
@@ -2131,28 +2154,16 @@ static void nocb_leader_wait(struct rcu_data *my_rdp)
  */
 static void nocb_follower_wait(struct rcu_data *rdp)
 {
-	bool firsttime = true;
-
 	for (;;) {
-		if (!rcu_nocb_poll) {
-			trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu,
-					    "FollowerSleep");
-			swait_event_interruptible(rdp->nocb_wq,
-						 READ_ONCE(rdp->nocb_follower_head));
-		} else if (firsttime) {
-			/* Don't drown trace log with "Poll"! */
-			firsttime = false;
-			trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, "Poll");
-		}
+		trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, "FollowerSleep");
+		swait_event_interruptible(rdp->nocb_wq,
+					 READ_ONCE(rdp->nocb_follower_head));
 		if (smp_load_acquire(&rdp->nocb_follower_head)) {
 			/* ^^^ Ensure CB invocation follows _head test. */
 			return;
 		}
-		if (!rcu_nocb_poll)
-			trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu,
-					    "WokeEmpty");
 		WARN_ON(signal_pending(current));
-		schedule_timeout_interruptible(1);
+		trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, "WokeEmpty");
 	}
 }
 
@@ -2165,6 +2176,7 @@ static void nocb_follower_wait(struct rcu_data *rdp)
 static int rcu_nocb_kthread(void *arg)
 {
 	int c, cl;
+	unsigned long flags;
 	struct rcu_head *list;
 	struct rcu_head *next;
 	struct rcu_head **tail;
@@ -2179,11 +2191,14 @@ static int rcu_nocb_kthread(void *arg)
 			nocb_follower_wait(rdp);
 
 		/* Pull the ready-to-invoke callbacks onto local list. */
-		list = READ_ONCE(rdp->nocb_follower_head);
+		raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
+		list = rdp->nocb_follower_head;
+		rdp->nocb_follower_head = NULL;
+		tail = rdp->nocb_follower_tail;
+		rdp->nocb_follower_tail = &rdp->nocb_follower_head;
+		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
 		BUG_ON(!list);
 		trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, "WokeNonEmpty");
-		WRITE_ONCE(rdp->nocb_follower_head, NULL);
-		tail = xchg(&rdp->nocb_follower_tail, &rdp->nocb_follower_head);
 
 		/* Each pass through the following loop invokes a callback. */
 		trace_rcu_batch_start(rdp->rsp->name,
@@ -2226,18 +2241,39 @@ static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp)
 }
 
 /* Do a deferred wakeup of rcu_nocb_kthread(). */
-static void do_nocb_deferred_wakeup(struct rcu_data *rdp)
+static void do_nocb_deferred_wakeup_common(struct rcu_data *rdp)
 {
+	unsigned long flags;
 	int ndw;
 
-	if (!rcu_nocb_need_deferred_wakeup(rdp))
+	raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
+	if (!rcu_nocb_need_deferred_wakeup(rdp)) {
+		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
 		return;
+	}
 	ndw = READ_ONCE(rdp->nocb_defer_wakeup);
 	WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
-	wake_nocb_leader(rdp, ndw == RCU_NOCB_WAKE_FORCE);
+	__wake_nocb_leader(rdp, ndw == RCU_NOCB_WAKE_FORCE, flags);
 	trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, TPS("DeferredWake"));
 }
 
+/* Do a deferred wakeup of rcu_nocb_kthread() from a timer handler. */
+static void do_nocb_deferred_wakeup_timer(unsigned long x)
+{
+	do_nocb_deferred_wakeup_common((struct rcu_data *)x);
+}
+
+/*
+ * Do a deferred wakeup of rcu_nocb_kthread() from fastpath.
+ * This means we do an inexact common-case check.  Note that if
+ * we miss, ->nocb_timer will eventually clean things up.
+ */
+static void do_nocb_deferred_wakeup(struct rcu_data *rdp)
+{
+	if (rcu_nocb_need_deferred_wakeup(rdp))
+		do_nocb_deferred_wakeup_common(rdp);
+}
+
 void __init rcu_init_nohz(void)
 {
 	int cpu;
@@ -2287,6 +2323,9 @@ static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp)
 	rdp->nocb_tail = &rdp->nocb_head;
 	init_swait_queue_head(&rdp->nocb_wq);
 	rdp->nocb_follower_tail = &rdp->nocb_follower_head;
+	raw_spin_lock_init(&rdp->nocb_lock);
+	setup_timer(&rdp->nocb_timer, do_nocb_deferred_wakeup_timer,
+		    (unsigned long)rdp);
 }
 
 /*
-- 
2.5.2


* [PATCH tip/core/rcu 03/15] rcu: Drive TASKS_RCU directly off of PREEMPT
  2017-07-24 21:44 [PATCH tip/core/rcu 0/15] General fixes Paul E. McKenney
  2017-07-24 21:44 ` [PATCH tip/core/rcu 01/15] sched,rcu: Make cond_resched() provide RCU quiescent state Paul E. McKenney
  2017-07-24 21:44 ` [PATCH tip/core/rcu 02/15] rcu: Use timer as backstop for NOCB deferred wakeups Paul E. McKenney
@ 2017-07-24 21:44 ` Paul E. McKenney
  2017-07-25 18:14   ` Steven Rostedt
  2017-07-24 21:44 ` [PATCH tip/core/rcu 04/15] rcu: Create reasonable API for do_exit() TASKS_RCU processing Paul E. McKenney
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 35+ messages in thread
From: Paul E. McKenney @ 2017-07-24 21:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Paul E. McKenney

TASKS_RCU is actually used only in PREEMPT kernels; otherwise, RCU-sched
is used instead.  This commit therefore makes synchronize_rcu_tasks()
and call_rcu_tasks() available always, but mapped to synchronize_sched()
and call_rcu_sched(), respectively, when !PREEMPT.  This approach also
allows some #ifdefs to be removed from rcutorture.
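
As a usage-level illustration (hypothetical caller, not from this patch),
code such as a trampoline-freeing path can now invoke the Tasks RCU API
unconditionally; under !PREEMPT the call silently becomes its RCU-sched
equivalent:

void example_free_trampoline(void *tramp);	/* Hypothetical helper. */

/*
 * Hypothetical caller: no #ifdef CONFIG_TASKS_RCU is needed around
 * this call.  In !PREEMPT kernels it maps to synchronize_sched(),
 * which suffices because tasks cannot be preempted mid-trampoline.
 */
static void example_retire_trampoline(void *tramp)
{
	synchronize_rcu_tasks();	/* Wait for all in-flight users. */
	example_free_trampoline(tramp);
}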

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
---
 include/linux/rcupdate.h                                |  6 ++++--
 kernel/rcu/Kconfig                                      |  3 +--
 kernel/rcu/rcutorture.c                                 | 17 +----------------
 .../selftests/rcutorture/doc/TREE_RCU-kconfig.txt       |  2 +-
 4 files changed, 7 insertions(+), 21 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index f816fc72b51e..c3f380befdd7 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -58,8 +58,6 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func);
 void call_rcu_bh(struct rcu_head *head, rcu_callback_t func);
 void call_rcu_sched(struct rcu_head *head, rcu_callback_t func);
 void synchronize_sched(void);
-void call_rcu_tasks(struct rcu_head *head, rcu_callback_t func);
-void synchronize_rcu_tasks(void);
 void rcu_barrier_tasks(void);
 
 #ifdef CONFIG_PREEMPT_RCU
@@ -176,10 +174,14 @@ extern struct srcu_struct tasks_rcu_exit_srcu;
 		rcu_all_qs(); \
 		rcu_note_voluntary_context_switch_lite(t); \
 	} while (0)
+void call_rcu_tasks(struct rcu_head *head, rcu_callback_t func);
+void synchronize_rcu_tasks(void);
 #else /* #ifdef CONFIG_TASKS_RCU */
 #define TASKS_RCU(x) do { } while (0)
 #define rcu_note_voluntary_context_switch_lite(t)	do { } while (0)
 #define rcu_note_voluntary_context_switch(t)		rcu_all_qs()
+#define call_rcu_tasks call_rcu_sched
+#define synchronize_rcu_tasks synchronize_sched
 #endif /* #else #ifdef CONFIG_TASKS_RCU */
 
 /**
diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
index be90c945063f..9210379c0353 100644
--- a/kernel/rcu/Kconfig
+++ b/kernel/rcu/Kconfig
@@ -69,8 +69,7 @@ config TREE_SRCU
 	  This option selects the full-fledged version of SRCU.
 
 config TASKS_RCU
-	bool
-	default n
+	def_bool PREEMPT
 	select SRCU
 	help
 	  This option enables a task-based RCU implementation that uses
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index b8f7f8ce8575..b284c861a511 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -696,8 +696,6 @@ static struct rcu_torture_ops sched_ops = {
 	.name		= "sched"
 };
 
-#ifdef CONFIG_TASKS_RCU
-
 /*
  * Definitions for RCU-tasks torture testing.
  */
@@ -735,24 +733,11 @@ static struct rcu_torture_ops tasks_ops = {
 	.name		= "tasks"
 };
 
-#define RCUTORTURE_TASKS_OPS &tasks_ops,
-
 static bool __maybe_unused torturing_tasks(void)
 {
 	return cur_ops == &tasks_ops;
 }
 
-#else /* #ifdef CONFIG_TASKS_RCU */
-
-#define RCUTORTURE_TASKS_OPS
-
-static bool __maybe_unused torturing_tasks(void)
-{
-	return false;
-}
-
-#endif /* #else #ifdef CONFIG_TASKS_RCU */
-
 /*
  * RCU torture priority-boost testing.  Runs one real-time thread per
  * CPU for moderate bursts, repeatedly registering RCU callbacks and
@@ -1749,7 +1734,7 @@ rcu_torture_init(void)
 	int firsterr = 0;
 	static struct rcu_torture_ops *torture_ops[] = {
 		&rcu_ops, &rcu_bh_ops, &rcu_busted_ops, &srcu_ops, &srcud_ops,
-		&sched_ops, RCUTORTURE_TASKS_OPS
+		&sched_ops, &tasks_ops,
 	};
 
 	if (!torture_init_begin(torture_type, verbose, &torture_runnable))
diff --git a/tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt b/tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt
index 9ad3f89c8dc7..af6fca03602f 100644
--- a/tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt
+++ b/tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt
@@ -69,11 +69,11 @@ CONFIG_RCU_TORTURE_TEST_RUNNABLE
 CONFIG_PREEMPT_RCU
 CONFIG_TREE_RCU
 CONFIG_TINY_RCU
+CONFIG_TASKS_RCU
 
 	These are controlled by CONFIG_PREEMPT and/or CONFIG_SMP.
 
 CONFIG_SRCU
-CONFIG_TASKS_RCU
 
 	Selected by CONFIG_RCU_TORTURE_TEST, so cannot disable.
 
-- 
2.5.2


* [PATCH tip/core/rcu 04/15] rcu: Create reasonable API for do_exit() TASKS_RCU processing
  2017-07-24 21:44 [PATCH tip/core/rcu 0/15] General fixes Paul E. McKenney
                   ` (2 preceding siblings ...)
  2017-07-24 21:44 ` [PATCH tip/core/rcu 03/15] rcu: Drive TASKS_RCU directly off of PREEMPT Paul E. McKenney
@ 2017-07-24 21:44 ` Paul E. McKenney
  2017-07-24 21:44 ` [PATCH tip/core/rcu 05/15] rcu: Add TPS() to event-traced strings Paul E. McKenney
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 35+ messages in thread
From: Paul E. McKenney @ 2017-07-24 21:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Paul E. McKenney

Currently, the exit-time support for TASKS_RCU is open-coded in do_exit().
This commit creates exit_tasks_rcu_start() and exit_tasks_rcu_finish()
APIs for do_exit() use.  This has the benefit of confining the use of the
tasks_rcu_exit_srcu variable to one file, allowing it to become static.
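
A minimal sketch of the intended usage (the real and only caller is
do_exit(), as shown in the kernel/exit.c hunk below; the function name
here is hypothetical):

/* Bracket the exit-time region that a Tasks RCU grace period must wait for. */
static void example_exit_path(void)
{
	exit_tasks_rcu_start();		/* srcu_read_lock() on tasks_rcu_exit_srcu. */
	/* ... exit-time processing visible to Tasks RCU ... */
	exit_tasks_rcu_finish();	/* Matching srcu_read_unlock(). */
}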

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/rcupdate.h |  7 ++++---
 include/linux/sched.h    |  5 +++--
 kernel/exit.c            |  7 ++-----
 kernel/rcu/update.c      | 18 +++++++++++++++++-
 4 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index c3f380befdd7..ce9d21923d75 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -162,8 +162,6 @@ static inline void rcu_init_nohz(void) { }
  * macro rather than an inline function to avoid #include hell.
  */
 #ifdef CONFIG_TASKS_RCU
-#define TASKS_RCU(x) x
-extern struct srcu_struct tasks_rcu_exit_srcu;
 #define rcu_note_voluntary_context_switch_lite(t) \
 	do { \
 		if (READ_ONCE((t)->rcu_tasks_holdout)) \
@@ -176,12 +174,15 @@ extern struct srcu_struct tasks_rcu_exit_srcu;
 	} while (0)
 void call_rcu_tasks(struct rcu_head *head, rcu_callback_t func);
 void synchronize_rcu_tasks(void);
+void exit_tasks_rcu_start(void);
+void exit_tasks_rcu_finish(void);
 #else /* #ifdef CONFIG_TASKS_RCU */
-#define TASKS_RCU(x) do { } while (0)
 #define rcu_note_voluntary_context_switch_lite(t)	do { } while (0)
 #define rcu_note_voluntary_context_switch(t)		rcu_all_qs()
 #define call_rcu_tasks call_rcu_sched
 #define synchronize_rcu_tasks synchronize_sched
+static inline void exit_tasks_rcu_start(void) { }
+static inline void exit_tasks_rcu_finish(void) { }
 #endif /* #else #ifdef CONFIG_TASKS_RCU */
 
 /**
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 12f326aa5871..f6d6ad47511d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -582,9 +582,10 @@ struct task_struct {
 
 #ifdef CONFIG_TASKS_RCU
 	unsigned long			rcu_tasks_nvcsw;
-	bool				rcu_tasks_holdout;
-	struct list_head		rcu_tasks_holdout_list;
+	u8				rcu_tasks_holdout;
+	u8				rcu_tasks_idx;
 	int				rcu_tasks_idle_cpu;
+	struct list_head		rcu_tasks_holdout_list;
 #endif /* #ifdef CONFIG_TASKS_RCU */
 
 	struct sched_info		sched_info;
diff --git a/kernel/exit.c b/kernel/exit.c
index c5548faa9f37..d297c525f188 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -764,7 +764,6 @@ void __noreturn do_exit(long code)
 {
 	struct task_struct *tsk = current;
 	int group_dead;
-	TASKS_RCU(int tasks_rcu_i);
 
 	profile_task_exit(tsk);
 	kcov_task_exit(tsk);
@@ -881,9 +880,7 @@ void __noreturn do_exit(long code)
 	 */
 	flush_ptrace_hw_breakpoint(tsk);
 
-	TASKS_RCU(preempt_disable());
-	TASKS_RCU(tasks_rcu_i = __srcu_read_lock(&tasks_rcu_exit_srcu));
-	TASKS_RCU(preempt_enable());
+	exit_tasks_rcu_start();
 	exit_notify(tsk, group_dead);
 	proc_exit_connector(tsk);
 	mpol_put_task_policy(tsk);
@@ -918,7 +915,7 @@ void __noreturn do_exit(long code)
 	if (tsk->nr_dirtied)
 		__this_cpu_add(dirty_throttle_leaks, tsk->nr_dirtied);
 	exit_rcu();
-	TASKS_RCU(__srcu_read_unlock(&tasks_rcu_exit_srcu, tasks_rcu_i));
+	exit_tasks_rcu_finish();
 
 	do_task_dead();
 }
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index 00e77c470017..5033b66d2753 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -568,7 +568,7 @@ static DECLARE_WAIT_QUEUE_HEAD(rcu_tasks_cbs_wq);
 static DEFINE_RAW_SPINLOCK(rcu_tasks_cbs_lock);
 
 /* Track exiting tasks in order to allow them to be waited for. */
-DEFINE_SRCU(tasks_rcu_exit_srcu);
+DEFINE_STATIC_SRCU(tasks_rcu_exit_srcu);
 
 /* Control stall timeouts.  Disable with <= 0, otherwise jiffies till stall. */
 #define RCU_TASK_STALL_TIMEOUT (HZ * 60 * 10)
@@ -875,6 +875,22 @@ static void rcu_spawn_tasks_kthread(void)
 	mutex_unlock(&rcu_tasks_kthread_mutex);
 }
 
+/* Do the srcu_read_lock() for the above synchronize_srcu().  */
+void exit_tasks_rcu_start(void)
+{
+	preempt_disable();
+	current->rcu_tasks_idx = __srcu_read_lock(&tasks_rcu_exit_srcu);
+	preempt_enable();
+}
+
+/* Do the srcu_read_unlock() for the above synchronize_srcu().  */
+void exit_tasks_rcu_finish(void)
+{
+	preempt_disable();
+	__srcu_read_unlock(&tasks_rcu_exit_srcu, current->rcu_tasks_idx);
+	preempt_enable();
+}
+
 #endif /* #ifdef CONFIG_TASKS_RCU */
 
 #ifndef CONFIG_TINY_RCU
-- 
2.5.2


* [PATCH tip/core/rcu 05/15] rcu: Add TPS() to event-traced strings
  2017-07-24 21:44 [PATCH tip/core/rcu 0/15] General fixes Paul E. McKenney
                   ` (3 preceding siblings ...)
  2017-07-24 21:44 ` [PATCH tip/core/rcu 04/15] rcu: Create reasonable API for do_exit() TASKS_RCU processing Paul E. McKenney
@ 2017-07-24 21:44 ` Paul E. McKenney
  2017-07-28  1:32   ` Steven Rostedt
  2017-07-24 21:44 ` [PATCH tip/core/rcu 06/15] rcu: Move rcu.h to new trivial-function style Paul E. McKenney
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 35+ messages in thread
From: Paul E. McKenney @ 2017-07-24 21:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Paul E. McKenney

Strings used in event tracing need to be specially handled, for example,
using the TPS() macro.  Without the TPS() macro, although output looks
fine from within a running kernel, extracting traces from a crash dump
produces garbage instead of strings.  This commit therefore adds the TPS()
macro to some unadorned strings that were passed to event-tracing macros.
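
For reference, and not part of this patch: within the RCU source files,
TPS() is shorthand for tracepoint_string(), which places the literal where
crash-dump tooling can map the recorded pointer back to its text:

#define TPS(x)	tracepoint_string(x)

/* Resulting call style, as used throughout this patch: */
trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, TPS("WokeEmpty"));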

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/rcu/tree_plugin.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index bb9e6e43130f..14ba496a13cd 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2073,7 +2073,7 @@ static void nocb_leader_wait(struct rcu_data *my_rdp)
 
 	/* Wait for callbacks to appear. */
 	if (!rcu_nocb_poll) {
-		trace_rcu_nocb_wake(my_rdp->rsp->name, my_rdp->cpu, "Sleep");
+		trace_rcu_nocb_wake(my_rdp->rsp->name, my_rdp->cpu, TPS("Sleep"));
 		swait_event_interruptible(my_rdp->nocb_wq,
 				!READ_ONCE(my_rdp->nocb_leader_sleep));
 		raw_spin_lock_irqsave(&my_rdp->nocb_lock, flags);
@@ -2083,7 +2083,7 @@ static void nocb_leader_wait(struct rcu_data *my_rdp)
 		raw_spin_unlock_irqrestore(&my_rdp->nocb_lock, flags);
 	} else if (firsttime) {
 		firsttime = false; /* Don't drown trace log with "Poll"! */
-		trace_rcu_nocb_wake(my_rdp->rsp->name, my_rdp->cpu, "Poll");
+		trace_rcu_nocb_wake(my_rdp->rsp->name, my_rdp->cpu, TPS("Poll"));
 	}
 
 	/*
@@ -2111,7 +2111,7 @@ static void nocb_leader_wait(struct rcu_data *my_rdp)
 			schedule_timeout_interruptible(1);
 		} else {
 			trace_rcu_nocb_wake(my_rdp->rsp->name, my_rdp->cpu,
-					    "WokeEmpty");
+					    TPS("WokeEmpty"));
 		}
 		goto wait_again;
 	}
@@ -2155,7 +2155,7 @@ static void nocb_leader_wait(struct rcu_data *my_rdp)
 static void nocb_follower_wait(struct rcu_data *rdp)
 {
 	for (;;) {
-		trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, "FollowerSleep");
+		trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, TPS("FollowerSleep"));
 		swait_event_interruptible(rdp->nocb_wq,
 					 READ_ONCE(rdp->nocb_follower_head));
 		if (smp_load_acquire(&rdp->nocb_follower_head)) {
@@ -2163,7 +2163,7 @@ static void nocb_follower_wait(struct rcu_data *rdp)
 			return;
 		}
 		WARN_ON(signal_pending(current));
-		trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, "WokeEmpty");
+		trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, TPS("WokeEmpty"));
 	}
 }
 
@@ -2198,7 +2198,7 @@ static int rcu_nocb_kthread(void *arg)
 		rdp->nocb_follower_tail = &rdp->nocb_follower_head;
 		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
 		BUG_ON(!list);
-		trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, "WokeNonEmpty");
+		trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, TPS("WokeNonEmpty"));
 
 		/* Each pass through the following loop invokes a callback. */
 		trace_rcu_batch_start(rdp->rsp->name,
-- 
2.5.2


* [PATCH tip/core/rcu 06/15] rcu: Move rcu.h to new trivial-function style
  2017-07-24 21:44 [PATCH tip/core/rcu 0/15] General fixes Paul E. McKenney
                   ` (4 preceding siblings ...)
  2017-07-24 21:44 ` [PATCH tip/core/rcu 05/15] rcu: Add TPS() to event-traced strings Paul E. McKenney
@ 2017-07-24 21:44 ` Paul E. McKenney
  2017-07-24 21:44 ` [PATCH tip/core/rcu 07/15] rcu: Add event tracing to ->gp_tasks update at GP start Paul E. McKenney
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 35+ messages in thread
From: Paul E. McKenney @ 2017-07-24 21:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Paul E. McKenney

This commit saves a few lines in kernel/rcu/rcu.h by moving to single-line
definitions for trivial functions, instead of the old style where the
two curly braces each get their own line.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/rcu.h | 128 +++++++++----------------------------------------------
 1 file changed, 20 insertions(+), 108 deletions(-)

diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 808b8c85f626..e4b43fef89f5 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -356,22 +356,10 @@ do {									\
 
 #ifdef CONFIG_TINY_RCU
 /* Tiny RCU doesn't expedite, as its purpose in life is instead to be tiny. */
-static inline bool rcu_gp_is_normal(void)  /* Internal RCU use. */
-{
-	return true;
-}
-static inline bool rcu_gp_is_expedited(void)  /* Internal RCU use. */
-{
-	return false;
-}
-
-static inline void rcu_expedite_gp(void)
-{
-}
-
-static inline void rcu_unexpedite_gp(void)
-{
-}
+static inline bool rcu_gp_is_normal(void) { return true; }
+static inline bool rcu_gp_is_expedited(void) { return false; }
+static inline void rcu_expedite_gp(void) { }
+static inline void rcu_unexpedite_gp(void) { }
 #else /* #ifdef CONFIG_TINY_RCU */
 bool rcu_gp_is_normal(void);     /* Internal RCU use. */
 bool rcu_gp_is_expedited(void);  /* Internal RCU use. */
@@ -419,12 +407,8 @@ static inline void rcutorture_get_gp_data(enum rcutorture_type test_type,
 	*gpnum = 0;
 	*completed = 0;
 }
-static inline void rcutorture_record_test_transition(void)
-{
-}
-static inline void rcutorture_record_progress(unsigned long vernum)
-{
-}
+static inline void rcutorture_record_test_transition(void) { }
+static inline void rcutorture_record_progress(unsigned long vernum) { }
 #ifdef CONFIG_RCU_TRACE
 void do_trace_rcu_torture_read(const char *rcutorturename,
 			       struct rcu_head *rhp,
@@ -460,92 +444,20 @@ void srcutorture_get_gp_data(enum rcutorture_type test_type,
 #endif
 
 #ifdef CONFIG_TINY_RCU
-
-/*
- * Return the number of grace periods started.
- */
-static inline unsigned long rcu_batches_started(void)
-{
-	return 0;
-}
-
-/*
- * Return the number of bottom-half grace periods started.
- */
-static inline unsigned long rcu_batches_started_bh(void)
-{
-	return 0;
-}
-
-/*
- * Return the number of sched grace periods started.
- */
-static inline unsigned long rcu_batches_started_sched(void)
-{
-	return 0;
-}
-
-/*
- * Return the number of grace periods completed.
- */
-static inline unsigned long rcu_batches_completed(void)
-{
-	return 0;
-}
-
-/*
- * Return the number of bottom-half grace periods completed.
- */
-static inline unsigned long rcu_batches_completed_bh(void)
-{
-	return 0;
-}
-
-/*
- * Return the number of sched grace periods completed.
- */
-static inline unsigned long rcu_batches_completed_sched(void)
-{
-	return 0;
-}
-
-/*
- * Return the number of expedited grace periods completed.
- */
-static inline unsigned long rcu_exp_batches_completed(void)
-{
-	return 0;
-}
-
-/*
- * Return the number of expedited sched grace periods completed.
- */
-static inline unsigned long rcu_exp_batches_completed_sched(void)
-{
-	return 0;
-}
-
-static inline unsigned long srcu_batches_completed(struct srcu_struct *sp)
-{
-	return 0;
-}
-
-static inline void rcu_force_quiescent_state(void)
-{
-}
-
-static inline void rcu_bh_force_quiescent_state(void)
-{
-}
-
-static inline void rcu_sched_force_quiescent_state(void)
-{
-}
-
-static inline void show_rcu_gp_kthreads(void)
-{
-}
-
+static inline unsigned long rcu_batches_started(void) { return 0; }
+static inline unsigned long rcu_batches_started_bh(void) { return 0; }
+static inline unsigned long rcu_batches_started_sched(void) { return 0; }
+static inline unsigned long rcu_batches_completed(void) { return 0; }
+static inline unsigned long rcu_batches_completed_bh(void) { return 0; }
+static inline unsigned long rcu_batches_completed_sched(void) { return 0; }
+static inline unsigned long rcu_exp_batches_completed(void) { return 0; }
+static inline unsigned long rcu_exp_batches_completed_sched(void) { return 0; }
+static inline unsigned long
+srcu_batches_completed(struct srcu_struct *sp) { return 0; }
+static inline void rcu_force_quiescent_state(void) { }
+static inline void rcu_bh_force_quiescent_state(void) { }
+static inline void rcu_sched_force_quiescent_state(void) { }
+static inline void show_rcu_gp_kthreads(void) { }
 #else /* #ifdef CONFIG_TINY_RCU */
 extern unsigned long rcutorture_testseq;
 extern unsigned long rcutorture_vernum;
-- 
2.5.2


* [PATCH tip/core/rcu 07/15] rcu: Add event tracing to ->gp_tasks update at GP start
  2017-07-24 21:44 [PATCH tip/core/rcu 0/15] General fixes Paul E. McKenney
                   ` (5 preceding siblings ...)
  2017-07-24 21:44 ` [PATCH tip/core/rcu 06/15] rcu: Move rcu.h to new trivial-function style Paul E. McKenney
@ 2017-07-24 21:44 ` Paul E. McKenney
  2017-07-28  1:38   ` Steven Rostedt
  2017-07-24 21:44 ` [PATCH tip/core/rcu 08/15] swait: add idle variants which don't contribute to load average Paul E. McKenney
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 35+ messages in thread
From: Paul E. McKenney @ 2017-07-24 21:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Paul E. McKenney

There is currently event tracing to track when a task is preempted
within a preemptible RCU read-side critical section, and also when that
task subsequently reaches its outermost rcu_read_unlock(), but none
indicating when a newly started grace period must wait on pre-existing
readers that have been preempted at least once since the beginning of
their current RCU read-side critical sections.

This commit therefore adds an event trace at grace-period start in
the case where there are such readers.  Note that only the first
reader in the list is traced.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree_plugin.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 14ba496a13cd..3e3f92e981a1 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -636,10 +636,17 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
  */
 static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
 {
+	struct task_struct *t;
+
 	RCU_LOCKDEP_WARN(preemptible(), "rcu_preempt_check_blocked_tasks() invoked with preemption enabled!!!\n");
 	WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp(rnp));
-	if (rcu_preempt_has_tasks(rnp))
+	if (rcu_preempt_has_tasks(rnp)) {
 		rnp->gp_tasks = rnp->blkd_tasks.next;
+		t = container_of(rnp->gp_tasks, struct task_struct,
+				 rcu_node_entry);
+		trace_rcu_unlock_preempted_task(TPS("rcu_preempt-GPS"),
+						rnp->gpnum, t->pid);
+	}
 	WARN_ON_ONCE(rnp->qsmask);
 }
 
-- 
2.5.2


* [PATCH tip/core/rcu 08/15] swait: add idle variants which don't contribute to load average
  2017-07-24 21:44 [PATCH tip/core/rcu 0/15] General fixes Paul E. McKenney
                   ` (6 preceding siblings ...)
  2017-07-24 21:44 ` [PATCH tip/core/rcu 07/15] rcu: Add event tracing to ->gp_tasks update at GP start Paul E. McKenney
@ 2017-07-24 21:44 ` Paul E. McKenney
  2017-07-24 21:44 ` [PATCH tip/core/rcu 09/15] rcu: use idle versions of swait to make idle-hack clear Paul E. McKenney
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 35+ messages in thread
From: Paul E. McKenney @ 2017-07-24 21:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Luis R. Rodriguez, Paul E. McKenney

From: "Luis R. Rodriguez" <mcgrof@kernel.org>

There are cases where folks use an interruptible swait from within
kthreads.  This is rather confusing, given that you would expect
interruptible waits to be -- interruptible, but kthreads are not
interruptible!  The reason for this practice is to avoid having these
kthreads contribute to the system load average.

When systems are idle some kthreads may spend a lot of time blocking if
using swait_event_timeout(). This would contribute to the system load
average. On systems without preemption this would mean the load average
of an idle system is bumped to 2 instead of 0. On systems with PREEMPT=y
this would mean the load average of an idle system is bumped to 3
instead of 0.

This adds a proper API using TASK_IDLE to make such goals explicit and
avoid confusion.
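
A minimal sketch of the intended use from a kthread (all names below are
hypothetical, not from this patch):

struct example_ctx {			/* Hypothetical context. */
	struct swait_queue_head wq;
	bool work_pending;
};

void example_do_work(struct example_ctx *ctx);	/* Hypothetical helper. */

static int example_kthread(void *arg)
{
	struct example_ctx *ctx = arg;

	while (!kthread_should_stop()) {
		/* TASK_IDLE wait: ignores signals, stays out of loadavg. */
		swait_event_idle_timeout(ctx->wq,
					 READ_ONCE(ctx->work_pending) ||
					 kthread_should_stop(), HZ);
		if (READ_ONCE(ctx->work_pending))
			example_do_work(ctx);
	}
	return 0;
}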

Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/swait.h | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/include/linux/swait.h b/include/linux/swait.h
index c1f9c62a8a50..4a4e180d0a35 100644
--- a/include/linux/swait.h
+++ b/include/linux/swait.h
@@ -169,4 +169,59 @@ do {									\
 	__ret;								\
 })
 
+#define __swait_event_idle(wq, condition)				\
+	(void)___swait_event(wq, condition, TASK_IDLE, 0, schedule())
+
+/**
+ * swait_event_idle - wait without system load contribution
+ * @wq: the waitqueue to wait on
+ * @condition: a C expression for the event to wait for
+ *
+ * The process is put to sleep (TASK_IDLE) until the @condition evaluates to
+ * true. The @condition is checked each time the waitqueue @wq is woken up.
+ *
+ * This function is mostly used when a kthread or workqueue waits for some
+ * condition and doesn't want to contribute to system load. Signals are
+ * ignored.
+ */
+#define swait_event_idle(wq, condition)					\
+do {									\
+	if (condition)							\
+		break;							\
+	__swait_event_idle(wq, condition);				\
+} while (0)
+
+#define __swait_event_idle_timeout(wq, condition, timeout)		\
+	___swait_event(wq, ___wait_cond_timeout(condition),		\
+		       TASK_IDLE, timeout,				\
+		       __ret = schedule_timeout(__ret))
+
+/**
+ * swait_event_idle_timeout - wait up to timeout without load contribution
+ * @wq: the waitqueue to wait on
+ * @condition: a C expression for the event to wait for
+ * @timeout: timeout at which we'll give up in jiffies
+ *
+ * The process is put to sleep (TASK_IDLE) until the @condition evaluates to
+ * true. The @condition is checked each time the waitqueue @wq is woken up.
+ *
+ * This function is mostly used when a kthread or workqueue waits for some
+ * condition and doesn't want to contribute to system load. Signals are
+ * ignored.
+ *
+ * Returns:
+ * 0 if the @condition evaluated to %false after the @timeout elapsed,
+ * 1 if the @condition evaluated to %true after the @timeout elapsed,
+ * or the remaining jiffies (at least 1) if the @condition evaluated
+ * to %true before the @timeout elapsed.
+ */
+#define swait_event_idle_timeout(wq, condition, timeout)		\
+({									\
+	long __ret = timeout;						\
+	if (!___wait_cond_timeout(condition))				\
+		__ret = __swait_event_idle_timeout(wq,			\
+						   condition, timeout);	\
+	__ret;								\
+})
+
 #endif /* _LINUX_SWAIT_H */
-- 
2.5.2


* [PATCH tip/core/rcu 09/15] rcu: use idle versions of swait to make idle-hack clear
  2017-07-24 21:44 [PATCH tip/core/rcu 0/15] General fixes Paul E. McKenney
                   ` (7 preceding siblings ...)
  2017-07-24 21:44 ` [PATCH tip/core/rcu 08/15] swait: add idle variants which don't contribute to load average Paul E. McKenney
@ 2017-07-24 21:44 ` Paul E. McKenney
  2017-07-24 21:44 ` [PATCH tip/core/rcu 10/15] rcu: Add TPS() protection for _rcu_barrier_trace strings Paul E. McKenney
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 35+ messages in thread
From: Paul E. McKenney @ 2017-07-24 21:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Luis R. Rodriguez, Paul E. McKenney

From: "Luis R. Rodriguez" <mcgrof@kernel.org>

These RCU waits were set to use interruptible waits to avoid the kthreads
contributing to system load average, even though they are not interruptible
as they are spawned from a kthread.  Use the new TASK_IDLE swaits, which
make the goal clear and remove any confusion about these paths possibly
being interruptible -- they are not.

When the system is idle the RCU grace-period kthread will spend all its time
blocked inside the swait_event_interruptible().  If an interruptible wait
were not used, then this kthread would contribute to the load average.
that an idle system would have a load average of 2 (or 3 if PREEMPT=y),
rather than the load average of 0 that almost fifty years of UNIX has
conditioned sysadmins to expect.

The same argument applies to swait_event_interruptible_timeout() use. The
RCU grace-period kthread spends its time blocked inside this call while
waiting for grace periods to complete. In particular, if there was only one
busy CPU, but that CPU was frequently invoking call_rcu(), then the RCU
grace-period kthread would spend almost all its time blocked inside the
swait_event_interruptible_timeout(). This would mean that the load average
would be 2 rather than the expected 1 for the single busy CPU.

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index ab4c2cea208d..9a90d9b1dc04 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2052,8 +2052,8 @@ static bool rcu_gp_init(struct rcu_state *rsp)
 }
 
 /*
- * Helper function for wait_event_interruptible_timeout() wakeup
- * at force-quiescent-state time.
+ * Helper function for swait_event_idle() wakeup at force-quiescent-state
+ * time.
  */
 static bool rcu_gp_fqs_check_wake(struct rcu_state *rsp, int *gfp)
 {
@@ -2191,9 +2191,8 @@ static int __noreturn rcu_gp_kthread(void *arg)
 					       READ_ONCE(rsp->gpnum),
 					       TPS("reqwait"));
 			rsp->gp_state = RCU_GP_WAIT_GPS;
-			swait_event_interruptible(rsp->gp_wq,
-						 READ_ONCE(rsp->gp_flags) &
-						 RCU_GP_FLAG_INIT);
+			swait_event_idle(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
+						     RCU_GP_FLAG_INIT);
 			rsp->gp_state = RCU_GP_DONE_GPS;
 			/* Locking provides needed memory barrier. */
 			if (rcu_gp_init(rsp))
@@ -2224,7 +2223,7 @@ static int __noreturn rcu_gp_kthread(void *arg)
 					       READ_ONCE(rsp->gpnum),
 					       TPS("fqswait"));
 			rsp->gp_state = RCU_GP_WAIT_FQS;
-			ret = swait_event_interruptible_timeout(rsp->gp_wq,
+			ret = swait_event_idle_timeout(rsp->gp_wq,
 					rcu_gp_fqs_check_wake(rsp, &gf), j);
 			rsp->gp_state = RCU_GP_DOING_FQS;
 			/* Locking provides needed memory barriers. */
-- 
2.5.2


* [PATCH tip/core/rcu 10/15] rcu: Add TPS() protection for _rcu_barrier_trace strings
  2017-07-24 21:44 [PATCH tip/core/rcu 0/15] General fixes Paul E. McKenney
                   ` (8 preceding siblings ...)
  2017-07-24 21:44 ` [PATCH tip/core/rcu 09/15] rcu: use idle versions of swait to make idle-hack clear Paul E. McKenney
@ 2017-07-24 21:44 ` Paul E. McKenney
  2017-07-28  1:40   ` Steven Rostedt
  2017-07-24 21:44 ` [PATCH tip/core/rcu 11/15] rcu/tracing: Set disable_rcu_irq_enter on rcu_eqs_exit() Paul E. McKenney
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 35+ messages in thread
From: Paul E. McKenney @ 2017-07-24 21:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Paul E. McKenney

The _rcu_barrier_trace() function is a wrapper for trace_rcu_barrier(),
which needs TPS() protection for strings passed through the second
argument.  However, it has escaped prior TPS()-ification efforts because
_rcu_barrier_trace() does not start with "trace_".  This commit therefore
adds the needed TPS() protection.
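
For context, the wrapper in question looks roughly like the following
(quoted approximately from kernel/rcu/tree.c, so treat the exact argument
list as an assumption); its string argument flows straight into
trace_rcu_barrier(), which is why it needs TPS():

#define _rcu_barrier_trace(rsp, s, cpu, done) \
	trace_rcu_barrier((rsp)->name, s, cpu, \
			  atomic_read(&(rsp)->barrier_cpu_count), done)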

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/rcu/tree.c | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 9a90d9b1dc04..7e018696fd82 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3553,10 +3553,11 @@ static void rcu_barrier_callback(struct rcu_head *rhp)
 	struct rcu_state *rsp = rdp->rsp;
 
 	if (atomic_dec_and_test(&rsp->barrier_cpu_count)) {
-		_rcu_barrier_trace(rsp, "LastCB", -1, rsp->barrier_sequence);
+		_rcu_barrier_trace(rsp, TPS("LastCB"), -1,
+				   rsp->barrier_sequence);
 		complete(&rsp->barrier_completion);
 	} else {
-		_rcu_barrier_trace(rsp, "CB", -1, rsp->barrier_sequence);
+		_rcu_barrier_trace(rsp, TPS("CB"), -1, rsp->barrier_sequence);
 	}
 }
 
@@ -3568,14 +3569,15 @@ static void rcu_barrier_func(void *type)
 	struct rcu_state *rsp = type;
 	struct rcu_data *rdp = raw_cpu_ptr(rsp->rda);
 
-	_rcu_barrier_trace(rsp, "IRQ", -1, rsp->barrier_sequence);
+	_rcu_barrier_trace(rsp, TPS("IRQ"), -1, rsp->barrier_sequence);
 	rdp->barrier_head.func = rcu_barrier_callback;
 	debug_rcu_head_queue(&rdp->barrier_head);
 	if (rcu_segcblist_entrain(&rdp->cblist, &rdp->barrier_head, 0)) {
 		atomic_inc(&rsp->barrier_cpu_count);
 	} else {
 		debug_rcu_head_unqueue(&rdp->barrier_head);
-		_rcu_barrier_trace(rsp, "IRQNQ", -1, rsp->barrier_sequence);
+		_rcu_barrier_trace(rsp, TPS("IRQNQ"), -1,
+				   rsp->barrier_sequence);
 	}
 }
 
@@ -3589,14 +3591,15 @@ static void _rcu_barrier(struct rcu_state *rsp)
 	struct rcu_data *rdp;
 	unsigned long s = rcu_seq_snap(&rsp->barrier_sequence);
 
-	_rcu_barrier_trace(rsp, "Begin", -1, s);
+	_rcu_barrier_trace(rsp, TPS("Begin"), -1, s);
 
 	/* Take mutex to serialize concurrent rcu_barrier() requests. */
 	mutex_lock(&rsp->barrier_mutex);
 
 	/* Did someone else do our work for us? */
 	if (rcu_seq_done(&rsp->barrier_sequence, s)) {
-		_rcu_barrier_trace(rsp, "EarlyExit", -1, rsp->barrier_sequence);
+		_rcu_barrier_trace(rsp, TPS("EarlyExit"), -1,
+				   rsp->barrier_sequence);
 		smp_mb(); /* caller's subsequent code after above check. */
 		mutex_unlock(&rsp->barrier_mutex);
 		return;
@@ -3604,7 +3607,7 @@ static void _rcu_barrier(struct rcu_state *rsp)
 
 	/* Mark the start of the barrier operation. */
 	rcu_seq_start(&rsp->barrier_sequence);
-	_rcu_barrier_trace(rsp, "Inc1", -1, rsp->barrier_sequence);
+	_rcu_barrier_trace(rsp, TPS("Inc1"), -1, rsp->barrier_sequence);
 
 	/*
 	 * Initialize the count to one rather than to zero in order to
@@ -3627,10 +3630,10 @@ static void _rcu_barrier(struct rcu_state *rsp)
 		rdp = per_cpu_ptr(rsp->rda, cpu);
 		if (rcu_is_nocb_cpu(cpu)) {
 			if (!rcu_nocb_cpu_needs_barrier(rsp, cpu)) {
-				_rcu_barrier_trace(rsp, "OfflineNoCB", cpu,
+				_rcu_barrier_trace(rsp, TPS("OfflineNoCB"), cpu,
 						   rsp->barrier_sequence);
 			} else {
-				_rcu_barrier_trace(rsp, "OnlineNoCB", cpu,
+				_rcu_barrier_trace(rsp, TPS("OnlineNoCB"), cpu,
 						   rsp->barrier_sequence);
 				smp_mb__before_atomic();
 				atomic_inc(&rsp->barrier_cpu_count);
@@ -3638,11 +3641,11 @@ static void _rcu_barrier(struct rcu_state *rsp)
 					   rcu_barrier_callback, rsp, cpu, 0);
 			}
 		} else if (rcu_segcblist_n_cbs(&rdp->cblist)) {
-			_rcu_barrier_trace(rsp, "OnlineQ", cpu,
+			_rcu_barrier_trace(rsp, TPS("OnlineQ"), cpu,
 					   rsp->barrier_sequence);
 			smp_call_function_single(cpu, rcu_barrier_func, rsp, 1);
 		} else {
-			_rcu_barrier_trace(rsp, "OnlineNQ", cpu,
+			_rcu_barrier_trace(rsp, TPS("OnlineNQ"), cpu,
 					   rsp->barrier_sequence);
 		}
 	}
@@ -3659,7 +3662,7 @@ static void _rcu_barrier(struct rcu_state *rsp)
 	wait_for_completion(&rsp->barrier_completion);
 
 	/* Mark the end of the barrier operation. */
-	_rcu_barrier_trace(rsp, "Inc2", -1, rsp->barrier_sequence);
+	_rcu_barrier_trace(rsp, TPS("Inc2"), -1, rsp->barrier_sequence);
 	rcu_seq_end(&rsp->barrier_sequence);
 
 	/* Other rcu_barrier() invocations can now safely proceed. */
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH tip/core/rcu 11/15] rcu/tracing: Set disable_rcu_irq_enter on rcu_eqs_exit()
  2017-07-24 21:44 [PATCH tip/core/rcu 0/15] General fixes Paul E. McKenney
                   ` (9 preceding siblings ...)
  2017-07-24 21:44 ` [PATCH tip/core/rcu 10/15] rcu: Add TPS() protection for _rcu_barrier_trace strings Paul E. McKenney
@ 2017-07-24 21:44 ` Paul E. McKenney
  2017-07-24 21:44 ` [PATCH tip/core/rcu 12/15] rcu: Add assertions verifying blocked-tasks list Paul E. McKenney
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 35+ messages in thread
From: Paul E. McKenney @ 2017-07-24 21:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Masami Hiramatsu, Paul E. McKenney

From: Masami Hiramatsu <mhiramat@kernel.org>

Set disable_rcu_irq_enter not only in rcu_eqs_enter_common() but also in
rcu_eqs_exit(), since rcu_eqs_exit() suffers from the same issue as was
fixed for rcu_eqs_enter_common() by commit 03ecd3f48e57 ("rcu/tracing:
Add rcu_disabled to denote when rcu_irq_enter() will not work").
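
As a reminder of how the flag is consumed (a rough paraphrase of the
mechanism added by 03ecd3f48e57, not verbatim kernel code), the tracing
code consults a helper along these lines and avoids rcu_irq_enter()
while the flag is set, which keeps any event tracing reached from
rcu_eqs_exit_common() safe:

	bool rcu_irq_enter_disabled(void)
	{
		return this_cpu_read(disable_rcu_irq_enter);
	}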

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 7e018696fd82..913c90eccd4d 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -955,8 +955,10 @@ static void rcu_eqs_exit(bool user)
 	if (oldval & DYNTICK_TASK_NEST_MASK) {
 		rdtp->dynticks_nesting += DYNTICK_TASK_NEST_VALUE;
 	} else {
+		__this_cpu_inc(disable_rcu_irq_enter);
 		rdtp->dynticks_nesting = DYNTICK_TASK_EXIT_IDLE;
 		rcu_eqs_exit_common(oldval, user);
+		__this_cpu_dec(disable_rcu_irq_enter);
 	}
 }
 
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH tip/core/rcu 12/15] rcu: Add assertions verifying blocked-tasks list
  2017-07-24 21:44 [PATCH tip/core/rcu 0/15] General fixes Paul E. McKenney
                   ` (10 preceding siblings ...)
  2017-07-24 21:44 ` [PATCH tip/core/rcu 11/15] rcu/tracing: Set disable_rcu_irq_enter on rcu_eqs_exit() Paul E. McKenney
@ 2017-07-24 21:44 ` Paul E. McKenney
  2017-07-24 21:44 ` [PATCH tip/core/rcu 13/15] rcu: Make rcu_idle_enter() rely on callers disabling irqs Paul E. McKenney
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 35+ messages in thread
From: Paul E. McKenney @ 2017-07-24 21:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Paul E. McKenney

This commit adds assertions verifying the consistency of the rcu_node
structure's ->blkd_tasks list and its ->gp_tasks, ->exp_tasks, and
->boost_tasks pointers.  In particular, the ->blkd_tasks list must be
empty for all rcu_node structures other than the leaves.
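
One of the new assertions uses the "!a != !b" idiom, which deserves a
quick illustration (generic C, restating a line from the patch below):
normalizing both sides with "!" turns the comparison into a boolean
exclusive-or, so the warning fires exactly when one condition holds
without the other:

	/* Fires iff the two conditions disagree: */
	WARN_ON_ONCE(!(blkd_state & RCU_GP_BLKD) !=
		     !(rnp->qsmask & rdp->grpmask));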

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c        |  2 ++
 kernel/rcu/tree_plugin.h | 11 +++++++++--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 913c90eccd4d..c028eb254704 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2395,6 +2395,8 @@ rcu_report_qs_rnp(unsigned long mask, struct rcu_state *rsp,
 			return;
 		}
 		WARN_ON_ONCE(oldmask); /* Any child must be all zeroed! */
+		WARN_ON_ONCE(rnp->level != rcu_num_lvls - 1 &&
+			     rcu_preempt_blocked_readers_cgp(rnp));
 		rnp->qsmask &= ~mask;
 		trace_rcu_quiescent_state_report(rsp->name, rnp->gpnum,
 						 mask, rnp->qsmask, rnp->level,
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 3e3f92e981a1..eadf8b95b5e9 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -180,6 +180,8 @@ static void rcu_preempt_ctxt_queue(struct rcu_node *rnp, struct rcu_data *rdp)
 	struct task_struct *t = current;
 
 	lockdep_assert_held(&rnp->lock);
+	WARN_ON_ONCE(rdp->mynode != rnp);
+	WARN_ON_ONCE(rnp->level != rcu_num_lvls - 1);
 
 	/*
 	 * Decide where to queue the newly blocked task.  In theory,
@@ -261,6 +263,10 @@ static void rcu_preempt_ctxt_queue(struct rcu_node *rnp, struct rcu_data *rdp)
 		rnp->gp_tasks = &t->rcu_node_entry;
 	if (!rnp->exp_tasks && (blkd_state & RCU_EXP_BLKD))
 		rnp->exp_tasks = &t->rcu_node_entry;
+	WARN_ON_ONCE(!(blkd_state & RCU_GP_BLKD) !=
+		     !(rnp->qsmask & rdp->grpmask));
+	WARN_ON_ONCE(!(blkd_state & RCU_EXP_BLKD) !=
+		     !(rnp->expmask & rdp->grpmask));
 	raw_spin_unlock_rcu_node(rnp); /* interrupts remain disabled. */
 
 	/*
@@ -482,6 +488,7 @@ void rcu_read_unlock_special(struct task_struct *t)
 		rnp = t->rcu_blocked_node;
 		raw_spin_lock_rcu_node(rnp); /* irqs already disabled. */
 		WARN_ON_ONCE(rnp != t->rcu_blocked_node);
+		WARN_ON_ONCE(rnp->level != rcu_num_lvls - 1);
 		empty_norm = !rcu_preempt_blocked_readers_cgp(rnp);
 		empty_exp = sync_rcu_preempt_exp_done(rnp);
 		smp_mb(); /* ensure expedited fastpath sees end of RCU c-s. */
@@ -495,10 +502,10 @@ void rcu_read_unlock_special(struct task_struct *t)
 		if (&t->rcu_node_entry == rnp->exp_tasks)
 			rnp->exp_tasks = np;
 		if (IS_ENABLED(CONFIG_RCU_BOOST)) {
-			if (&t->rcu_node_entry == rnp->boost_tasks)
-				rnp->boost_tasks = np;
 			/* Snapshot ->boost_mtx ownership w/rnp->lock held. */
 			drop_boost_mutex = rt_mutex_owner(&rnp->boost_mtx) == t;
+			if (&t->rcu_node_entry == rnp->boost_tasks)
+				rnp->boost_tasks = np;
 		}
 
 		/*
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH tip/core/rcu 13/15] rcu: Make rcu_idle_enter() rely on callers disabling irqs
  2017-07-24 21:44 [PATCH tip/core/rcu 0/15] General fixes Paul E. McKenney
                   ` (11 preceding siblings ...)
  2017-07-24 21:44 ` [PATCH tip/core/rcu 12/15] rcu: Add assertions verifying blocked-tasks list Paul E. McKenney
@ 2017-07-24 21:44 ` Paul E. McKenney
  2017-07-24 21:44 ` [PATCH tip/core/rcu 14/15] rcu: Add warning to rcu_idle_enter() for irqs enabled Paul E. McKenney
  2017-07-24 21:44 ` [PATCH tip/core/rcu 15/15] rcu: Remove exports from rcu_idle_exit() and rcu_idle_enter() Paul E. McKenney
  14 siblings, 0 replies; 35+ messages in thread
From: Paul E. McKenney @ 2017-07-24 21:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Paul E. McKenney

From: "Peter Zijlstra (Intel)" <peterz@infradead.org>

All callers of rcu_idle_enter() have irqs disabled, so there is no
point in rcu_idle_enter() disabling them again.  This commit therefore
replaces the irq disabling with an RCU_LOCKDEP_WARN().
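
For context, RCU_LOCKDEP_WARN(c, s) is roughly the following (a loose
paraphrase of the rcupdate.h macro, shown only to make the substitution
clear): with CONFIG_PROVE_RCU it complains once, and only once, when the
condition is true, and without it the check compiles away entirely:

	#define RCU_LOCKDEP_WARN(c, s)						\
		do {								\
			static bool __warned;					\
			if (debug_lockdep_rcu_enabled() && !__warned && (c)) {	\
				__warned = true;				\
				lockdep_rcu_suspicious(__FILE__, __LINE__, s);	\
			}							\
		} while (0)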

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c028eb254704..64b2cb9a8830 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -843,11 +843,8 @@ static void rcu_eqs_enter(bool user)
  */
 void rcu_idle_enter(void)
 {
-	unsigned long flags;
-
-	local_irq_save(flags);
+	RCU_LOCKDEP_WARN(!irqs_disabled(), "rcu_idle_enter() invoked with irqs enabled!!!");
 	rcu_eqs_enter(false);
-	local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(rcu_idle_enter);
 
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH tip/core/rcu 14/15] rcu: Add warning to rcu_idle_enter() for irqs enabled
  2017-07-24 21:44 [PATCH tip/core/rcu 0/15] General fixes Paul E. McKenney
                   ` (12 preceding siblings ...)
  2017-07-24 21:44 ` [PATCH tip/core/rcu 13/15] rcu: Make rcu_idle_enter() rely on callers disabling irqs Paul E. McKenney
@ 2017-07-24 21:44 ` Paul E. McKenney
  2017-07-24 21:44 ` [PATCH tip/core/rcu 15/15] rcu: Remove exports from rcu_idle_exit() and rcu_idle_enter() Paul E. McKenney
  14 siblings, 0 replies; 35+ messages in thread
From: Paul E. McKenney @ 2017-07-24 21:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Paul E. McKenney

All current callers of rcu_idle_enter() have irqs disabled, and
rcu_idle_enter() relies on this, but doesn't check.  This commit
therefore adds an RCU_LOCKDEP_WARN() to add some verification to the trust.
While we are there, pass "true" rather than "1" to rcu_eqs_enter().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 64b2cb9a8830..b4961a5f47a8 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -859,7 +859,8 @@ EXPORT_SYMBOL_GPL(rcu_idle_enter);
  */
 void rcu_user_enter(void)
 {
-	rcu_eqs_enter(1);
+	RCU_LOCKDEP_WARN(!irqs_disabled(), "rcu_user_enter() invoked with irqs enabled!!!");
+	rcu_eqs_enter(true);
 }
 #endif /* CONFIG_NO_HZ_FULL */
 
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH tip/core/rcu 15/15] rcu: Remove exports from rcu_idle_exit() and rcu_idle_enter()
  2017-07-24 21:44 [PATCH tip/core/rcu 0/15] General fixes Paul E. McKenney
                   ` (13 preceding siblings ...)
  2017-07-24 21:44 ` [PATCH tip/core/rcu 14/15] rcu: Add warning to rcu_idle_enter() for irqs enabled Paul E. McKenney
@ 2017-07-24 21:44 ` Paul E. McKenney
  14 siblings, 0 replies; 35+ messages in thread
From: Paul E. McKenney @ 2017-07-24 21:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Paul E. McKenney

The rcu_idle_exit() and rcu_idle_enter() functions are exported because
they were originally used by RCU_NONIDLE(), which was intended to
be usable from modules.  However, RCU_NONIDLE() now instead uses
rcu_irq_enter_irqson() and rcu_irq_exit_irqson(), which are not
exported, and there have been no complaints.
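
For reference, RCU_NONIDLE() is now roughly the following (paraphrased
here rather than quoted from rcupdate.h), so nothing in the macro needs
the rcu_idle_*() exports any more:

	#define RCU_NONIDLE(a) \
		do { \
			rcu_irq_enter_irqson(); \
			do { a; } while (0); \
			rcu_irq_exit_irqson(); \
		} while (0)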

This commit therefore removes the exports from rcu_idle_exit() and
rcu_idle_enter().

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b4961a5f47a8..febee2f46d50 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -846,7 +846,6 @@ void rcu_idle_enter(void)
 	RCU_LOCKDEP_WARN(!irqs_disabled(), "rcu_idle_enter() invoked with irqs enabled!!!");
 	rcu_eqs_enter(false);
 }
-EXPORT_SYMBOL_GPL(rcu_idle_enter);
 
 #ifdef CONFIG_NO_HZ_FULL
 /**
@@ -979,7 +978,6 @@ void rcu_idle_exit(void)
 	rcu_eqs_exit(false);
 	local_irq_restore(flags);
 }
-EXPORT_SYMBOL_GPL(rcu_idle_exit);
 
 #ifdef CONFIG_NO_HZ_FULL
 /**
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH tip/core/rcu 02/15] rcu: Use timer as backstop for NOCB deferred wakeups
  2017-07-24 21:44 ` [PATCH tip/core/rcu 02/15] rcu: Use timer as backstop for NOCB deferred wakeups Paul E. McKenney
@ 2017-07-25 18:12   ` Steven Rostedt
  2017-07-25 19:18     ` Paul E. McKenney
  0 siblings, 1 reply; 35+ messages in thread
From: Steven Rostedt @ 2017-07-25 18:12 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, dhowells, edumazet,
	fweisbec, oleg

On Mon, 24 Jul 2017 14:44:31 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> The handling of RCU's no-CBs CPUs has a maintenance headache, namely
> that if call_rcu() is invoked with interrupts disabled, the rcuo kthread
> wakeup must be defered to a point where we can be sure that scheduler
> locks are not held.  Of course, there are a lot of code paths leading
> from an interrupts-disabled invocation of call_rcu(), and missing any
> one of these can result in excessive callback-invocation latency, and
> potentially even system hangs.

What about using irq_work? That's what perf and ftrace use for such a
case.

-- Steve

> 
> This commit therefore uses a timer to guarantee that the wakeup will
> eventually occur.  If one of the deferred-wakeup points kicks in, then
> the timer is simply cancelled.
> 
> This commit also fixes up an incomplete removal of commits that were
> intended to plug remaining exit paths, which should have the added
> benefit of reducing the overhead of RCU's context-switch hooks.  In
> addition, it simplifies leader-to-follower callback-list handoff by
> introducing locking.  The call_rcu()-to-leader handoff continues to
> use atomic operations in order to maintain good real-time latency for
> common-case use of call_rcu().
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH tip/core/rcu 03/15] rcu: Drive TASKS_RCU directly off of PREEMPT
  2017-07-24 21:44 ` [PATCH tip/core/rcu 03/15] rcu: Drive TASKS_RCU directly off of PREEMPT Paul E. McKenney
@ 2017-07-25 18:14   ` Steven Rostedt
  2017-07-25 19:19     ` Paul E. McKenney
  0 siblings, 1 reply; 35+ messages in thread
From: Steven Rostedt @ 2017-07-25 18:14 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, dhowells, edumazet,
	fweisbec, oleg

On Mon, 24 Jul 2017 14:44:32 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> The actual use of TASKS_RCU is only when PREEMPT, otherwise RCU-sched
> is used instead.  This commit therefore makes synchronize_rcu_tasks()
> and call_rcu_tasks() available always, but mapped to synchronize_sched()
> and call_rcu_sched(), respectively, when !PREEMPT.  This approach also
> allows some #ifdefs to be removed from rcutorture.
> 

Hmm, I'll need to update ftrace. I believe I call synchronize_sched()
twice with !PREEMPT then.

-- Steve

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH tip/core/rcu 02/15] rcu: Use timer as backstop for NOCB deferred wakeups
  2017-07-25 18:12   ` Steven Rostedt
@ 2017-07-25 19:18     ` Paul E. McKenney
  2017-07-25 22:17       ` Steven Rostedt
  0 siblings, 1 reply; 35+ messages in thread
From: Paul E. McKenney @ 2017-07-25 19:18 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, dhowells, edumazet,
	fweisbec, oleg

On Tue, Jul 25, 2017 at 02:12:20PM -0400, Steven Rostedt wrote:
> On Mon, 24 Jul 2017 14:44:31 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > The handling of RCU's no-CBs CPUs has a maintenance headache, namely
> > that if call_rcu() is invoked with interrupts disabled, the rcuo kthread
> > wakeup must be defered to a point where we can be sure that scheduler
> > locks are not held.  Of course, there are a lot of code paths leading
> > from an interrupts-disabled invocation of call_rcu(), and missing any
> > one of these can result in excessive callback-invocation latency, and
> > potentially even system hangs.
> 
> What about using irq_work? That's what perf and ftrace use for such a
> case.

I hadn't looked at irq_work before, thank you for the pointer!

I nevertheless believe that timers work better in this particular case
because they can be cancelled (which appears to be the common case),
because they normally are not at all time-critical, and because running
in softirq
is just fine -- no need to run out of the scheduling-clock interrupt.

Seem reasonable?

							Thanx, Paul

> -- Steve
> 
> > 
> > This commit therefore uses a timer to guarantee that the wakeup will
> > eventually occur.  If one of the deferred-wakeup points kicks in, then
> > the timer is simply cancelled.
> > 
> > This commit also fixes up an incomplete removal of commits that were
> > intended to plug remaining exit paths, which should have the added
> > benefit of reducing the overhead of RCU's context-switch hooks.  In
> > addition, it simplifies leader-to-follower callback-list handoff by
> > introducing locking.  The call_rcu()-to-leader handoff continues to
> > use atomic operations in order to maintain good real-time latency for
> > common-case use of call_rcu().
> > 
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH tip/core/rcu 03/15] rcu: Drive TASKS_RCU directly off of PREEMPT
  2017-07-25 18:14   ` Steven Rostedt
@ 2017-07-25 19:19     ` Paul E. McKenney
  0 siblings, 0 replies; 35+ messages in thread
From: Paul E. McKenney @ 2017-07-25 19:19 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, dhowells, edumazet,
	fweisbec, oleg

On Tue, Jul 25, 2017 at 02:14:05PM -0400, Steven Rostedt wrote:
> On Mon, 24 Jul 2017 14:44:32 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > The actual use of TASKS_RCU is only when PREEMPT, otherwise RCU-sched
> > is used instead.  This commit therefore makes synchronize_rcu_tasks()
> > and call_rcu_tasks() available always, but mapped to synchronize_sched()
> > and call_rcu_sched(), respectively, when !PREEMPT.  This approach also
> > allows some #ifdefs to be removed from rcutorture.
> 
> Hmm, I'll need to update ftrace. I believe I call synchronize_sched()
> twice with !PREEMPT then.

Does this mean that synchronize_rcu_tasks() should invoke
synchronize_sched() twice?  Easy change to make, if so.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH tip/core/rcu 02/15] rcu: Use timer as backstop for NOCB deferred wakeups
  2017-07-25 19:18     ` Paul E. McKenney
@ 2017-07-25 22:17       ` Steven Rostedt
  2017-07-26  0:05         ` Paul E. McKenney
  0 siblings, 1 reply; 35+ messages in thread
From: Steven Rostedt @ 2017-07-25 22:17 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, dhowells, edumazet,
	fweisbec, oleg

On Tue, 25 Jul 2017 12:18:14 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> On Tue, Jul 25, 2017 at 02:12:20PM -0400, Steven Rostedt wrote:
> > On Mon, 24 Jul 2017 14:44:31 -0700
> > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> >   
> > > The handling of RCU's no-CBs CPUs has a maintenance headache, namely
> > > that if call_rcu() is invoked with interrupts disabled, the rcuo kthread
> > > wakeup must be defered to a point where we can be sure that scheduler
> > > locks are not held.  Of course, there are a lot of code paths leading
> > > from an interrupts-disabled invocation of call_rcu(), and missing any
> > > one of these can result in excessive callback-invocation latency, and
> > > potentially even system hangs.  
> > 
> > What about using irq_work? That's what perf and ftrace use for such a
> > case.  
> 
> I hadn't looked at irq_work before, thank you for the pointer!
> 
> I nevertheless believe that timers work better in this particular case
> because they can be cancelled (which appears to be the common case), they

Is the common case here that it doesn't trigger? That is, the
del_timer() will be called?

> normally are not at all time-critical, and because running in softirq
> is just fine -- no need to run out of the scheduling-clock interrupt.

irq_work doesn't always use the scheduling clock. IIRC, it will simply
trigger an interrupt (if the arch supports it), and the work will be
done when interrupts are enabled (the interrupt that will do the work
will trigger).

> 
> Seem reasonable?

Don't know. With irq_work, you just call it and forget about it. No
need to mod or del timers.

-- Steve

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH tip/core/rcu 02/15] rcu: Use timer as backstop for NOCB deferred wakeups
  2017-07-25 22:17       ` Steven Rostedt
@ 2017-07-26  0:05         ` Paul E. McKenney
  2017-07-26 21:18           ` Steven Rostedt
  0 siblings, 1 reply; 35+ messages in thread
From: Paul E. McKenney @ 2017-07-26  0:05 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, dhowells, edumazet,
	fweisbec, oleg

On Tue, Jul 25, 2017 at 06:17:10PM -0400, Steven Rostedt wrote:
> On Tue, 25 Jul 2017 12:18:14 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > On Tue, Jul 25, 2017 at 02:12:20PM -0400, Steven Rostedt wrote:
> > > On Mon, 24 Jul 2017 14:44:31 -0700
> > > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > >   
> > > > The handling of RCU's no-CBs CPUs has a maintenance headache, namely
> > > > that if call_rcu() is invoked with interrupts disabled, the rcuo kthread
> > > > wakeup must be defered to a point where we can be sure that scheduler
> > > > locks are not held.  Of course, there are a lot of code paths leading
> > > > from an interrupts-disabled invocation of call_rcu(), and missing any
> > > > one of these can result in excessive callback-invocation latency, and
> > > > potentially even system hangs.  
> > > 
> > > What about using irq_work? That's what perf and ftrace use for such a
> > > case.  
> > 
> > I hadn't looked at irq_work before, thank you for the pointer!
> > 
> > I nevertheless believe that timers work better in this particular case
> > because they can be cancelled (which appears to be the common case), they
> 
> Is the common case here that it doesn't trigger? That is, the
> del_timer() will be called?

If you have lots of call_rcu() invocations, many of them will be invoked
with interrupts enabled, and a later one with interrupts enabled will
take care of things for the earlier ones.  So there can be workloads
where this is the case.

> > normally are not at all time-critical, and because running in softirq
> > is just fine -- no need to run out of the scheduling-clock interrupt.
> 
> irq_work doesn't always use the scheduling clock. IIRC, it will simply
> trigger a interrupt (if the arch supports it), and the work will be
> done when interrupts are enabled (the interrupt that will do the work
> will trigger)

Ah, OK, so scheduling clock is just the backstop.  Still, softirq
is a bit nicer to manage than hardirq.

> > Seem reasonable?
> 
> Don't know. With irq_work, you just call it and forget about it. No
> need to mod or del timers.

But I could have a series of call_rcu() invocations with interrupts
disabled, so I would need to interact somehow with the irq_work handler.
Either that or dynamically allocate the needed data structure.

Or am I missing something here?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH tip/core/rcu 02/15] rcu: Use timer as backstop for NOCB deferred wakeups
  2017-07-26  0:05         ` Paul E. McKenney
@ 2017-07-26 21:18           ` Steven Rostedt
  2017-07-26 21:47             ` Paul E. McKenney
  0 siblings, 1 reply; 35+ messages in thread
From: Steven Rostedt @ 2017-07-26 21:18 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, dhowells, edumazet,
	fweisbec, oleg

On Tue, 25 Jul 2017 17:05:40 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> On Tue, Jul 25, 2017 at 06:17:10PM -0400, Steven Rostedt wrote:
> > On Tue, 25 Jul 2017 12:18:14 -0700
> > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> >   
> > > On Tue, Jul 25, 2017 at 02:12:20PM -0400, Steven Rostedt wrote:  
> > > > On Mon, 24 Jul 2017 14:44:31 -0700
> > > > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > > >     
> > > > > The handling of RCU's no-CBs CPUs has a maintenance headache, namely
> > > > > that if call_rcu() is invoked with interrupts disabled, the rcuo kthread
> > > > > wakeup must be defered to a point where we can be sure that scheduler
> > > > > locks are not held.  Of course, there are a lot of code paths leading
> > > > > from an interrupts-disabled invocation of call_rcu(), and missing any
> > > > > one of these can result in excessive callback-invocation latency, and
> > > > > potentially even system hangs.    
> > > > 
> > > > What about using irq_work? That's what perf and ftrace use for such a
> > > > case.    
> > > 
> > > I hadn't looked at irq_work before, thank you for the pointer!
> > > 
> > > I nevertheless believe that timers work better in this particular case
> > > because they can be cancelled (which appears to be the common case), they  
> > 
> > Is the common case here that it doesn't trigger? That is, the
> > del_timer() will be called?  
> 
> If you have lots of call_rcu() invocations, many of them will be invoked
> with interrupts enabled, and a later one with interrupts enabled will
> take care of things for the earlier ones.  So there can be workloads
> where this is the case.

Note, only the first irq_work call will take action. The other
callers will see that an irq_work is pending and will not reinvoke one.

> 
> > > normally are not at all time-critical, and because running in softirq
> > > is just fine -- no need to run out of the scheduling-clock interrupt.  
> > 
> > irq_work doesn't always use the scheduling clock. IIRC, it will simply
> > trigger a interrupt (if the arch supports it), and the work will be
> > done when interrupts are enabled (the interrupt that will do the work
> > will trigger)  
> 
> Ah, OK, so scheduling clock is just the backstop.  Still, softirq
> is a bit nicer to manage than hardirq.

Still requires a hard interrupt (timer) (thinking of NOHZ FULL where
this does matter).

> 
> > > Seem reasonable?  
> > 
> > Don't know. With irq_work, you just call it and forget about it. No
> > need to mod or del timers.  
> 
> But I could have a series of call_rcu() invocations with interrupts
> disabled, so I would need to interact somehow with the irq_work handler.
> Either that or dynamically allocate the needed data structure.
> 
> Or am I missing something here?

You treat it just like you do with the timer code. You have an irq_work
struct attached to your rdp descriptor, and call irq_work_queue() when
interrupts are disabled. If that work isn't already pending, it will
raise an interrupt. Then the irq_work handler will look at the rdp attached to
the irq_work (container_of()), and then wake the associated thread.

It is much lighter weight than a timer setup.
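
For concreteness, a minimal sketch of that shape (the field and handler
names here are made up and the wakeup is simplified, so this is not the
actual patch):

	struct rcu_data {
		...
		struct irq_work nocb_iw;	/* hypothetical field */
	};

	static void nocb_iw_handler(struct irq_work *iwp)
	{
		struct rcu_data *rdp = container_of(iwp, struct rcu_data, nocb_iw);

		swake_up(&rdp->nocb_wq);	/* stand-in for the real wakeup path */
	}

	/* Once, at init time: */
	init_irq_work(&rdp->nocb_iw, nocb_iw_handler);

	/* At the deferred-wakeup point, with irqs disabled: */
	irq_work_queue(&rdp->nocb_iw);		/* no-op if already pending */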

-- Steve

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH tip/core/rcu 02/15] rcu: Use timer as backstop for NOCB deferred wakeups
  2017-07-26 21:18           ` Steven Rostedt
@ 2017-07-26 21:47             ` Paul E. McKenney
  2017-07-26 23:09               ` Steven Rostedt
  0 siblings, 1 reply; 35+ messages in thread
From: Paul E. McKenney @ 2017-07-26 21:47 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, dhowells, edumazet,
	fweisbec, oleg

On Wed, Jul 26, 2017 at 05:18:01PM -0400, Steven Rostedt wrote:
> On Tue, 25 Jul 2017 17:05:40 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > On Tue, Jul 25, 2017 at 06:17:10PM -0400, Steven Rostedt wrote:
> > > On Tue, 25 Jul 2017 12:18:14 -0700
> > > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > >   
> > > > On Tue, Jul 25, 2017 at 02:12:20PM -0400, Steven Rostedt wrote:  
> > > > > On Mon, 24 Jul 2017 14:44:31 -0700
> > > > > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > > > >     
> > > > > > The handling of RCU's no-CBs CPUs has a maintenance headache, namely
> > > > > > that if call_rcu() is invoked with interrupts disabled, the rcuo kthread
> > > > > > wakeup must be defered to a point where we can be sure that scheduler
> > > > > > locks are not held.  Of course, there are a lot of code paths leading
> > > > > > from an interrupts-disabled invocation of call_rcu(), and missing any
> > > > > > one of these can result in excessive callback-invocation latency, and
> > > > > > potentially even system hangs.    
> > > > > 
> > > > > What about using irq_work? That's what perf and ftrace use for such a
> > > > > case.    
> > > > 
> > > > I hadn't looked at irq_work before, thank you for the pointer!
> > > > 
> > > > I nevertheless believe that timers work better in this particular case
> > > > because they can be cancelled (which appears to be the common case), they  
> > > 
> > > Is the common case here that it doesn't trigger? That is, the
> > > del_timer() will be called?  
> > 
> > If you have lots of call_rcu() invocations, many of them will be invoked
> > with interrupts enabled, and a later one with interrupts enabled will
> > take care of things for the earlier ones.  So there can be workloads
> > where this is the case.
> 
> Note, only the first irq_work called will take action. The other
> callers will see that a irq_work is pending and will not reivoke one.

OK, that does make things a bit easier.

But suppose that an old irq_work has just done the wakeup on CPU 0,
but has not yet completed, and the rcuo kthread duly wakes up, does
some stuff on CPU 1 and goes to sleep, then CPU 2 gets a call_rcu()
with interrupts disabled, and therefore wants to do an irq_work again.
But the irq_work on CPU 0 is still running.

OK, this seems to be handled by clearing IRQ_WORK_PENDING before invoking
the irq_work handler.
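
(Roughly this shape in the dispatch path, paraphrasing from memory rather
than quoting:

	flags = work->flags & ~IRQ_WORK_PENDING;
	xchg(&work->flags, flags);	/* re-queueing is allowed from here on */
	work->func(work);

so a wakeup racing with a new interrupts-disabled call_rcu() just gets a
second, harmless pass.)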

> > > > normally are not at all time-critical, and because running in softirq
> > > > is just fine -- no need to run out of the scheduling-clock interrupt.  
> > > 
> > > irq_work doesn't always use the scheduling clock. IIRC, it will simply
> > > trigger a interrupt (if the arch supports it), and the work will be
> > > done when interrupts are enabled (the interrupt that will do the work
> > > will trigger)  
> > 
> > Ah, OK, so scheduling clock is just the backstop.  Still, softirq
> > is a bit nicer to manage than hardirq.
> 
> Still requires a hard interrupt (timer) (thinking of NOHZ FULL where
> this does matter).

But only assuming that there isn't an interrupts-enabled invocation of
call_rcu() before the timer would have gone off.  In this case, the
irq_work would still trigger, and if I didn't keep the "don't need it"
complexity of the current timer-based patch, it could further result in
a spurious wakeup of the rcuo kthread, which could be just as much of
a problem for nohz_full CPUs.  (Yes, hopefully the rcuo kthread would
be placed to avoid nohz_full CPUs, but on the other hand, hopefully
code that caused call_rcu() to be invoked with interrupts disabled
would also be so placed.)

> > > > Seem reasonable?  
> > > 
> > > Don't know. With irq_work, you just call it and forget about it. No
> > > need to mod or del timers.  
> > 
> > But I could have a series of call_rcu() invocations with interrupts
> > disabled, so I would need to interact somehow with the irq_work handler.
> > Either that or dynamically allocate the needed data structure.
> > 
> > Or am I missing something here?
> 
> You treat it just like you are with the timer code. You have a irq_work
> struct attached to your rdp descriptor. And call irq_work_run() when
> interrupts are disabled. If it hasn't already been invoked it will
> invoke one. Then the irq_work handler will look at the rdp attached to
> the irq_work (container_of()), and then wake the associated thread.
> 
> It is much lighter weight than a timer setup.

How much lighter weight?  In other words, what fraction of the
timers have to avoid being cancelled for irq_work to break even?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH tip/core/rcu 02/15] rcu: Use timer as backstop for NOCB deferred wakeups
  2017-07-26 21:47             ` Paul E. McKenney
@ 2017-07-26 23:09               ` Steven Rostedt
  2017-07-27 17:33                 ` Paul E. McKenney
  0 siblings, 1 reply; 35+ messages in thread
From: Steven Rostedt @ 2017-07-26 23:09 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, dhowells, edumazet,
	fweisbec, oleg

On Wed, 26 Jul 2017 14:47:41 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:


> > It is much lighter weight than a timer setup.  
> 
> How much lighter weight?  In other words, what fraction of the
> timers have to avoid being cancelled for irq_work to break even?

No idea. I guess that would be a nice academic exercise ;-)

-- Steve

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH tip/core/rcu 02/15] rcu: Use timer as backstop for NOCB deferred wakeups
  2017-07-26 23:09               ` Steven Rostedt
@ 2017-07-27 17:33                 ` Paul E. McKenney
  0 siblings, 0 replies; 35+ messages in thread
From: Paul E. McKenney @ 2017-07-27 17:33 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, dhowells, edumazet,
	fweisbec, oleg

On Wed, Jul 26, 2017 at 07:09:34PM -0400, Steven Rostedt wrote:
> On Wed, 26 Jul 2017 14:47:41 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> 
> > > It is much lighter weight than a timer setup.  
> > 
> > How much lighter weight?  In other words, what fraction of the
> > timers have to avoid being cancelled for irq_work to break even?
> 
> No idea. I guess that would be a nice academic exercise ;-)

Hmmm...  I guess I should try it and see what it does, though that
would be some merge window after the upcoming one.  If it is indeed
simpler, I might forgive some possible performance shortcomings...

							Thanx, Paul

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH tip/core/rcu 05/15] rcu: Add TPS() to event-traced strings
  2017-07-24 21:44 ` [PATCH tip/core/rcu 05/15] rcu: Add TPS() to event-traced strings Paul E. McKenney
@ 2017-07-28  1:32   ` Steven Rostedt
  0 siblings, 0 replies; 35+ messages in thread
From: Steven Rostedt @ 2017-07-28  1:32 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, dhowells, edumazet,
	fweisbec, oleg

On Mon, 24 Jul 2017 14:44:34 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> Strings used in event tracing need to be specially handled, for example,
> using the TPS() macro.  Without the TPS() macro, although output looks
> fine from within a running kernel, extracting traces from a crash dump
> produces garbage instead of strings.  This commit therefore adds the TPS()
> macro to some unadorned strings that were passed to event-tracing macros.
> 
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>

Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

-- Steve

> ---
>  kernel/rcu/tree_plugin.h | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index bb9e6e43130f..14ba496a13cd 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -2073,7 +2073,7 @@ static void nocb_leader_wait(struct rcu_data *my_rdp)
>  
>  	/* Wait for callbacks to appear. */
>  	if (!rcu_nocb_poll) {
> -		trace_rcu_nocb_wake(my_rdp->rsp->name, my_rdp->cpu, "Sleep");
> +		trace_rcu_nocb_wake(my_rdp->rsp->name, my_rdp->cpu, TPS("Sleep"));
>  		swait_event_interruptible(my_rdp->nocb_wq,
>  				!READ_ONCE(my_rdp->nocb_leader_sleep));
>  		raw_spin_lock_irqsave(&my_rdp->nocb_lock, flags);
> @@ -2083,7 +2083,7 @@ static void nocb_leader_wait(struct rcu_data *my_rdp)
>  		raw_spin_unlock_irqrestore(&my_rdp->nocb_lock, flags);
>  	} else if (firsttime) {
>  		firsttime = false; /* Don't drown trace log with "Poll"! */
> -		trace_rcu_nocb_wake(my_rdp->rsp->name, my_rdp->cpu, "Poll");
> +		trace_rcu_nocb_wake(my_rdp->rsp->name, my_rdp->cpu, TPS("Poll"));
>  	}
>  
>  	/*
> @@ -2111,7 +2111,7 @@ static void nocb_leader_wait(struct rcu_data *my_rdp)
>  			schedule_timeout_interruptible(1);
>  		} else {
>  			trace_rcu_nocb_wake(my_rdp->rsp->name, my_rdp->cpu,
> -					    "WokeEmpty");
> +					    TPS("WokeEmpty"));
>  		}
>  		goto wait_again;
>  	}
> @@ -2155,7 +2155,7 @@ static void nocb_leader_wait(struct rcu_data *my_rdp)
>  static void nocb_follower_wait(struct rcu_data *rdp)
>  {
>  	for (;;) {
> -		trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, "FollowerSleep");
> +		trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, TPS("FollowerSleep"));
>  		swait_event_interruptible(rdp->nocb_wq,
>  					 READ_ONCE(rdp->nocb_follower_head));
>  		if (smp_load_acquire(&rdp->nocb_follower_head)) {
> @@ -2163,7 +2163,7 @@ static void nocb_follower_wait(struct rcu_data *rdp)
>  			return;
>  		}
>  		WARN_ON(signal_pending(current));
> -		trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, "WokeEmpty");
> +		trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, TPS("WokeEmpty"));
>  	}
>  }
>  
> @@ -2198,7 +2198,7 @@ static int rcu_nocb_kthread(void *arg)
>  		rdp->nocb_follower_tail = &rdp->nocb_follower_head;
>  		raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
>  		BUG_ON(!list);
> -		trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, "WokeNonEmpty");
> +		trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, TPS("WokeNonEmpty"));
>  
>  		/* Each pass through the following loop invokes a callback. */
>  		trace_rcu_batch_start(rdp->rsp->name,

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH tip/core/rcu 07/15] rcu: Add event tracing to ->gp_tasks update at GP start
  2017-07-24 21:44 ` [PATCH tip/core/rcu 07/15] rcu: Add event tracing to ->gp_tasks update at GP start Paul E. McKenney
@ 2017-07-28  1:38   ` Steven Rostedt
  2017-07-28  3:22     ` Paul E. McKenney
  0 siblings, 1 reply; 35+ messages in thread
From: Steven Rostedt @ 2017-07-28  1:38 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, dhowells, edumazet,
	fweisbec, oleg

On Mon, 24 Jul 2017 14:44:36 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> There is currently event tracing to track when a task is preempted
> within a preemptible RCU read-side critical section, and also when that
> task subsequently reaches its outermost rcu_read_unlock(), but none
> indicating when a new grace period starts when that grace period must
> wait on pre-existing readers that have been been preempted at least once
> since the beginning of their current RCU read-side critical sections.
> 
> This commit therefore adds an event trace at grace-period start in
> the case where there are such readers.  Note that only the first
> reader in the list is traced.
> 
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> ---
>  kernel/rcu/tree_plugin.h | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 14ba496a13cd..3e3f92e981a1 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -636,10 +636,17 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
>   */
>  static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
>  {
> +	struct task_struct *t;
> +
>  	RCU_LOCKDEP_WARN(preemptible(), "rcu_preempt_check_blocked_tasks() invoked with preemption enabled!!!\n");
>  	WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp(rnp));
> -	if (rcu_preempt_has_tasks(rnp))
> +	if (rcu_preempt_has_tasks(rnp)) {

The only function of this if block is to fill the content of the
trace event, correct?

What about doing:

	if (trace_rcu_unlock_preempted_task_enabled() &&
	    rcu_preempt_has_tasks(rnp)) {

instead? The trace_rcu_unlock_preempted_task_enabled() is a static
branch (aka jump_label), which would make the above a constant branch
when tracing is not enabled, and would keep this from adding any extra
overhead.
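
The general pattern (with a hypothetical event "foo") looks like this;
the _enabled() helper is generated for every event, so the argument
setup is skipped entirely when the event is off:

	if (trace_foo_enabled()) {	/* static key: patched-out branch when off */
		u64 arg = expensive_to_compute();	/* hypothetical */

		trace_foo(arg);
	}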

-- Steve

>  		rnp->gp_tasks = rnp->blkd_tasks.next;
> +		t = container_of(rnp->gp_tasks, struct task_struct,
> +				 rcu_node_entry);
> +		trace_rcu_unlock_preempted_task(TPS("rcu_preempt-GPS"),
> +						rnp->gpnum, t->pid);
> +	}
>  	WARN_ON_ONCE(rnp->qsmask);
>  }
>  

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH tip/core/rcu 10/15] rcu: Add TPS() protection for _rcu_barrier_trace strings
  2017-07-24 21:44 ` [PATCH tip/core/rcu 10/15] rcu: Add TPS() protection for _rcu_barrier_trace strings Paul E. McKenney
@ 2017-07-28  1:40   ` Steven Rostedt
  0 siblings, 0 replies; 35+ messages in thread
From: Steven Rostedt @ 2017-07-28  1:40 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, dhowells, edumazet,
	fweisbec, oleg

On Mon, 24 Jul 2017 14:44:39 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> The _rcu_barrier_trace() function is a wrapper for trace_rcu_barrier(),
> which needs TPS() protection for strings passed through the second
> argument.  However, it has escaped prior TPS()-ification efforts because
> it _rcu_barrier_trace() does not start with "trace_".  This commit
> therefore adds the needed TPS() protection
> 
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>

Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

-- Steve

> ---
>  kernel/rcu/tree.c | 27 +++++++++++++++------------
>  1 file changed, 15 insertions(+), 12 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 9a90d9b1dc04..7e018696fd82 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3553,10 +3553,11 @@ static void rcu_barrier_callback(struct rcu_head *rhp)
>  	struct rcu_state *rsp = rdp->rsp;
>  
>  	if (atomic_dec_and_test(&rsp->barrier_cpu_count)) {
> -		_rcu_barrier_trace(rsp, "LastCB", -1, rsp->barrier_sequence);
> +		_rcu_barrier_trace(rsp, TPS("LastCB"), -1,
> +				   rsp->barrier_sequence);
>  		complete(&rsp->barrier_completion);
>  	} else {
> -		_rcu_barrier_trace(rsp, "CB", -1, rsp->barrier_sequence);
> +		_rcu_barrier_trace(rsp, TPS("CB"), -1, rsp->barrier_sequence);
>  	}
>  }
>  
> @@ -3568,14 +3569,15 @@ static void rcu_barrier_func(void *type)
>  	struct rcu_state *rsp = type;
>  	struct rcu_data *rdp = raw_cpu_ptr(rsp->rda);
>  
> -	_rcu_barrier_trace(rsp, "IRQ", -1, rsp->barrier_sequence);
> +	_rcu_barrier_trace(rsp, TPS("IRQ"), -1, rsp->barrier_sequence);
>  	rdp->barrier_head.func = rcu_barrier_callback;
>  	debug_rcu_head_queue(&rdp->barrier_head);
>  	if (rcu_segcblist_entrain(&rdp->cblist, &rdp->barrier_head, 0)) {
>  		atomic_inc(&rsp->barrier_cpu_count);
>  	} else {
>  		debug_rcu_head_unqueue(&rdp->barrier_head);
> -		_rcu_barrier_trace(rsp, "IRQNQ", -1, rsp->barrier_sequence);
> +		_rcu_barrier_trace(rsp, TPS("IRQNQ"), -1,
> +				   rsp->barrier_sequence);
>  	}
>  }
>  
> @@ -3589,14 +3591,15 @@ static void _rcu_barrier(struct rcu_state *rsp)
>  	struct rcu_data *rdp;
>  	unsigned long s = rcu_seq_snap(&rsp->barrier_sequence);
>  
> -	_rcu_barrier_trace(rsp, "Begin", -1, s);
> +	_rcu_barrier_trace(rsp, TPS("Begin"), -1, s);
>  
>  	/* Take mutex to serialize concurrent rcu_barrier() requests. */
>  	mutex_lock(&rsp->barrier_mutex);
>  
>  	/* Did someone else do our work for us? */
>  	if (rcu_seq_done(&rsp->barrier_sequence, s)) {
> -		_rcu_barrier_trace(rsp, "EarlyExit", -1, rsp->barrier_sequence);
> +		_rcu_barrier_trace(rsp, TPS("EarlyExit"), -1,
> +				   rsp->barrier_sequence);
>  		smp_mb(); /* caller's subsequent code after above check. */
>  		mutex_unlock(&rsp->barrier_mutex);
>  		return;
> @@ -3604,7 +3607,7 @@ static void _rcu_barrier(struct rcu_state *rsp)
>  
>  	/* Mark the start of the barrier operation. */
>  	rcu_seq_start(&rsp->barrier_sequence);
> -	_rcu_barrier_trace(rsp, "Inc1", -1, rsp->barrier_sequence);
> +	_rcu_barrier_trace(rsp, TPS("Inc1"), -1, rsp->barrier_sequence);
>  
>  	/*
>  	 * Initialize the count to one rather than to zero in order to
> @@ -3627,10 +3630,10 @@ static void _rcu_barrier(struct rcu_state *rsp)
>  		rdp = per_cpu_ptr(rsp->rda, cpu);
>  		if (rcu_is_nocb_cpu(cpu)) {
>  			if (!rcu_nocb_cpu_needs_barrier(rsp, cpu)) {
> -				_rcu_barrier_trace(rsp, "OfflineNoCB", cpu,
> +				_rcu_barrier_trace(rsp, TPS("OfflineNoCB"), cpu,
>  						   rsp->barrier_sequence);
>  			} else {
> -				_rcu_barrier_trace(rsp, "OnlineNoCB", cpu,
> +				_rcu_barrier_trace(rsp, TPS("OnlineNoCB"), cpu,
>  						   rsp->barrier_sequence);
>  				smp_mb__before_atomic();
>  				atomic_inc(&rsp->barrier_cpu_count);
> @@ -3638,11 +3641,11 @@ static void _rcu_barrier(struct rcu_state *rsp)
>  					   rcu_barrier_callback, rsp, cpu, 0);
>  			}
>  		} else if (rcu_segcblist_n_cbs(&rdp->cblist)) {
> -			_rcu_barrier_trace(rsp, "OnlineQ", cpu,
> +			_rcu_barrier_trace(rsp, TPS("OnlineQ"), cpu,
>  					   rsp->barrier_sequence);
>  			smp_call_function_single(cpu, rcu_barrier_func, rsp, 1);
>  		} else {
> -			_rcu_barrier_trace(rsp, "OnlineNQ", cpu,
> +			_rcu_barrier_trace(rsp, TPS("OnlineNQ"), cpu,
>  					   rsp->barrier_sequence);
>  		}
>  	}
> @@ -3659,7 +3662,7 @@ static void _rcu_barrier(struct rcu_state *rsp)
>  	wait_for_completion(&rsp->barrier_completion);
>  
>  	/* Mark the end of the barrier operation. */
> -	_rcu_barrier_trace(rsp, "Inc2", -1, rsp->barrier_sequence);
> +	_rcu_barrier_trace(rsp, TPS("Inc2"), -1, rsp->barrier_sequence);
>  	rcu_seq_end(&rsp->barrier_sequence);
>  
>  	/* Other rcu_barrier() invocations can now safely proceed. */

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH tip/core/rcu 07/15] rcu: Add event tracing to ->gp_tasks update at GP start
  2017-07-28  1:38   ` Steven Rostedt
@ 2017-07-28  3:22     ` Paul E. McKenney
  2017-07-28 12:18       ` Steven Rostedt
  0 siblings, 1 reply; 35+ messages in thread
From: Paul E. McKenney @ 2017-07-28  3:22 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, dhowells, edumazet,
	fweisbec, oleg

On Thu, Jul 27, 2017 at 09:38:10PM -0400, Steven Rostedt wrote:
> On Mon, 24 Jul 2017 14:44:36 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > There is currently event tracing to track when a task is preempted
> > within a preemptible RCU read-side critical section, and also when that
> > task subsequently reaches its outermost rcu_read_unlock(), but none
> > indicating when a new grace period starts when that grace period must
> > wait on pre-existing readers that have been been preempted at least once
> > since the beginning of their current RCU read-side critical sections.
> > 
> > This commit therefore adds an event trace at grace-period start in
> > the case where there are such readers.  Note that only the first
> > reader in the list is traced.
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > ---
> >  kernel/rcu/tree_plugin.h | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 14ba496a13cd..3e3f92e981a1 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -636,10 +636,17 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
> >   */
> >  static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
> >  {
> > +	struct task_struct *t;
> > +
> >  	RCU_LOCKDEP_WARN(preemptible(), "rcu_preempt_check_blocked_tasks() invoked with preemption enabled!!!\n");
> >  	WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp(rnp));
> > -	if (rcu_preempt_has_tasks(rnp))
> > +	if (rcu_preempt_has_tasks(rnp)) {
> 
> The only function of this if block is to fill the content of the
> trace event, correct?
> 
> What about doing:
> 
> 	if (trace_rcu_unlock_preempted_task_enabled() &&
> 	    rcu_preempt_has_tasks(rnp)) {
> 
> instead? The trace_rcu_unlock_preempted_task_enabled() is a static
> branch (aka jump_label), which would make the above a constant branch
> when tracing is not enabled, and would keep this from adding any extra
> overhead.
> 
> -- Steve
> 
> >  		rnp->gp_tasks = rnp->blkd_tasks.next;

The trace_rcu_unlock_preempted_task_enabled() call is a new one on me,
thank you!

Unfortunately, the above assignment to rnp->gp_tasks is required even
if tracing is disabled.  The reason is that the newly started grace
period needs to wait on all tasks that have been preempted within their
current RCU read-side critical section, and rnp->gp_tasks records the
point in the rnp->blkd_tasks list beyond which all preempted tasks block
this new grace period.
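
A rough picture at grace-period start (illustration only):

    rnp->blkd_tasks  ->  T1 <-> T2 <-> T3
                          ^
                          rnp->gp_tasks == rnp->blkd_tasks.next

Every task from ->gp_tasks onward was preempted within an RCU read-side
critical section that began before this grace period, so the new grace
period must wait for each of them.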

If this assignment is omitted, we get too-short grace periods, and the
tasks on this list might still be holding references to stuff that gets
freed at the end of this new grace period.

I applied your two acks, thank you!

							Thanx, Paul

> > +		t = container_of(rnp->gp_tasks, struct task_struct,
> > +				 rcu_node_entry);
> > +		trace_rcu_unlock_preempted_task(TPS("rcu_preempt-GPS"),
> > +						rnp->gpnum, t->pid);
> > +	}
> >  	WARN_ON_ONCE(rnp->qsmask);
> >  }
> >  
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH tip/core/rcu 07/15] rcu: Add event tracing to ->gp_tasks update at GP start
  2017-07-28  3:22     ` Paul E. McKenney
@ 2017-07-28 12:18       ` Steven Rostedt
  2017-07-28 17:13         ` Paul E. McKenney
  0 siblings, 1 reply; 35+ messages in thread
From: Steven Rostedt @ 2017-07-28 12:18 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, dhowells, edumazet,
	fweisbec, oleg

On Thu, 27 Jul 2017 20:22:32 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> On Thu, Jul 27, 2017 at 09:38:10PM -0400, Steven Rostedt wrote:
> > On Mon, 24 Jul 2017 14:44:36 -0700
> > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> >   
> > > There is currently event tracing to track when a task is preempted
> > > within a preemptible RCU read-side critical section, and also when that
> > > task subsequently reaches its outermost rcu_read_unlock(), but none
> > > indicating when a new grace period starts when that grace period must
> > > wait on pre-existing readers that have been been preempted at least once
> > > since the beginning of their current RCU read-side critical sections.
> > > 
> > > This commit therefore adds an event trace at grace-period start in
> > > the case where there are such readers.  Note that only the first
> > > reader in the list is traced.
> > > 
> > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > ---
> > >  kernel/rcu/tree_plugin.h | 9 ++++++++-
> > >  1 file changed, 8 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > index 14ba496a13cd..3e3f92e981a1 100644
> > > --- a/kernel/rcu/tree_plugin.h
> > > +++ b/kernel/rcu/tree_plugin.h
> > > @@ -636,10 +636,17 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
> > >   */
> > >  static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
> > >  {
> > > +	struct task_struct *t;
> > > +
> > >  	RCU_LOCKDEP_WARN(preemptible(), "rcu_preempt_check_blocked_tasks() invoked with preemption enabled!!!\n");
> > >  	WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp(rnp));
> > > -	if (rcu_preempt_has_tasks(rnp))
> > > +	if (rcu_preempt_has_tasks(rnp)) {  
> > 
> > The only function of this if block is to fill the content of the
> > trace event, correct?
> > 
> > What about doing:
> > 
> > 	if (trace_rcu_unlock_preempted_task_enabled() &&
> > 	    rcu_preempt_has_tasks(rnp)) {
> > 
> > instead? The trace_rcu_unlock_preempted_task_enabled() is a static
> > branch (aka jump_label), which would make the above a constant branch
> > when tracing is not enabled, and would keep this from adding any extra
> > overhead.
> > 
> > -- Steve
> >   
> > >  		rnp->gp_tasks = rnp->blkd_tasks.next;  
> 
> The trace_rcu_unlock_preempted_task_enabled() call is a new one on me,
> thank you!
> 
> Unfortunately, the above assignment to rnp->gp_tasks is required even
> if tracing is disabled.  The reason is that the newly started grace
> period needs to wait on all tasks that have been preempted within their
> current RCU read-side critical section, and rnp->gp_tasks records the
> point in the rnp->blkd_tasks list beyond which all preempted tasks block
> this new grace period.
> 
> If this assignment is omitted, we get too-short grace periods, and the
> tasks on this list might still be holding references to stuff that gets
> freed at the end of this new grace period.
> 
> I applied your two acks, thank you!
> 

And with your answer about the block not just being for tracing, you can
add my acked-by here too ;-)

-- Steve

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH tip/core/rcu 07/15] rcu: Add event tracing to ->gp_tasks update at GP start
  2017-07-28 12:18       ` Steven Rostedt
@ 2017-07-28 17:13         ` Paul E. McKenney
  0 siblings, 0 replies; 35+ messages in thread
From: Paul E. McKenney @ 2017-07-28 17:13 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, dhowells, edumazet,
	fweisbec, oleg

On Fri, Jul 28, 2017 at 08:18:18AM -0400, Steven Rostedt wrote:
> On Thu, 27 Jul 2017 20:22:32 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > On Thu, Jul 27, 2017 at 09:38:10PM -0400, Steven Rostedt wrote:
> > > On Mon, 24 Jul 2017 14:44:36 -0700
> > > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > >   
> > > > There is currently event tracing to track when a task is preempted
> > > > within a preemptible RCU read-side critical section, and also when that
> > > > task subsequently reaches its outermost rcu_read_unlock(), but none
> > > > indicating when a new grace period starts when that grace period must
> > > > wait on pre-existing readers that have been been preempted at least once
> > > > since the beginning of their current RCU read-side critical sections.
> > > > 
> > > > This commit therefore adds an event trace at grace-period start in
> > > > the case where there are such readers.  Note that only the first
> > > > reader in the list is traced.
> > > > 
> > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > ---
> > > >  kernel/rcu/tree_plugin.h | 9 ++++++++-
> > > >  1 file changed, 8 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > > index 14ba496a13cd..3e3f92e981a1 100644
> > > > --- a/kernel/rcu/tree_plugin.h
> > > > +++ b/kernel/rcu/tree_plugin.h
> > > > @@ -636,10 +636,17 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
> > > >   */
> > > >  static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
> > > >  {
> > > > +	struct task_struct *t;
> > > > +
> > > >  	RCU_LOCKDEP_WARN(preemptible(), "rcu_preempt_check_blocked_tasks() invoked with preemption enabled!!!\n");
> > > >  	WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp(rnp));
> > > > -	if (rcu_preempt_has_tasks(rnp))
> > > > +	if (rcu_preempt_has_tasks(rnp)) {  
> > > 
> > > The only function of this if block is to fill the content of the
> > > trace event, correct?
> > > 
> > > What about doing:
> > > 
> > > 	if (trace_rcu_unlock_preempted_task_enabled() &&
> > > 	    rcu_preempt_has_tasks(rnp)) {
> > > 
> > > instead? The trace_rcu_unlock_preempted_task_enabled() is a static
> > > branch (aka jump_label), which would make the above a constant branch
> > > when tracing is not enabled, and would keep this from adding any extra
> > > overhead.
> > > 
> > > -- Steve
> > >   
> > > >  		rnp->gp_tasks = rnp->blkd_tasks.next;  
> > 
> > The trace_rcu_unlock_preempted_task_enabled() call is a new one on me,
> > thank you!
> > 
> > Unfortunately, the above assignment to rnp->gp_tasks is required even
> > if tracing is disabled.  The reason is that the newly started grace
> > period needs to wait on all tasks that have been preempted within their
> > current RCU read-side critical section, and rnp->gp_tasks records the
> > point in the rnp->blkd_tasks list beyond which all preempted tasks block
> > this new grace period.
> > 
> > If this assignment is omitted, we get too-short grace periods, and the
> > tasks on this list might still be holding references to stuff that gets
> > freed at the end of this new grace period.
> > 
> > I applied your two acks, thank you!
> > 
> 
> And with your answer about the block not just being for tracing, you can
> add my acked-by here too ;-)

Added, thank you!
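
For anyone following along, here is a rough sketch (not the literal hunk,
which is truncated above) of what the function ends up doing: the
->gp_tasks assignment is the part that must happen regardless of tracing,
and the trace call itself already sits behind the usual tracepoint static
key, so no extra guard is needed.  The trace-call argument list below is
illustrative rather than copied from the patch:

	static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
	{
		struct task_struct *t;

		RCU_LOCKDEP_WARN(preemptible(),
				 "rcu_preempt_check_blocked_tasks() invoked with preemption enabled!!!\n");
		WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp(rnp));
		if (rcu_preempt_has_tasks(rnp)) {
			/*
			 * Required even with tracing off: record the point in
			 * ->blkd_tasks beyond which preempted readers block
			 * the grace period that is now starting.
			 */
			rnp->gp_tasks = rnp->blkd_tasks.next;
			/* Tracepoint is a static-key no-op unless enabled. */
			t = container_of(rnp->gp_tasks, struct task_struct,
					 rcu_node_entry);
			trace_rcu_unlock_preempted_task(TPS("rcu_preempt-GPS"),
							rnp->gpnum, t->pid);
		}
	}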

							Thanx, Paul

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v5 tip/core/rcu 01/15] sched,rcu: Make cond_resched() provide RCU quiescent state
  2017-07-24 21:44 ` [PATCH tip/core/rcu 01/15] sched,rcu: Make cond_resched() provide RCU quiescent state Paul E. McKenney
@ 2017-08-15 16:19   ` Paul E. McKenney
  2017-08-17  8:22     ` Ingo Molnar
  0 siblings, 1 reply; 35+ messages in thread
From: Paul E. McKenney @ 2017-08-15 16:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg

There is some confusion as to which of cond_resched() or
cond_resched_rcu_qs() should be added to long in-kernel loops.
This commit therefore eliminates the decision by adding RCU quiescent
states to cond_resched().  This commit also simplifies the code that
used to interact with cond_resched_rcu_qs(), and that now interacts with
cond_resched(), to reduce its overhead.  This reduction is necessary to
allow the heavier-weight cond_resched_rcu_qs() mechanism to be invoked
everywhere that cond_resched() is invoked.

Part of that reduction in overhead converts the jiffies_till_sched_qs
kernel parameter to read-only at runtime, thus eliminating the need for
bounds checking.

Reported-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
[ paulmck: Tuning for performance issues reported by 0day Test Robot. ]

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8337e2db0bb2..d2f291a3a44a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1522,10 +1522,11 @@ static inline int test_tsk_need_resched(struct task_struct *tsk)
  * cond_resched_lock() will drop the spinlock before scheduling,
  * cond_resched_softirq() will enable bhs before scheduling.
  */
+void rcu_all_qs(void);
 #ifndef CONFIG_PREEMPT
 extern int _cond_resched(void);
 #else
-static inline int _cond_resched(void) { return 0; }
+static inline int _cond_resched(void) { rcu_all_qs(); return 0; }
 #endif
 
 #define cond_resched() ({			\
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 51d4c3acf32d..e40cb5190783 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -537,8 +537,8 @@ module_param(rcu_kick_kthreads, bool, 0644);
  * How long the grace period must be before we start recruiting
  * quiescent-state help from rcu_note_context_switch().
  */
-static ulong jiffies_till_sched_qs = HZ / 20;
-module_param(jiffies_till_sched_qs, ulong, 0644);
+static ulong jiffies_till_sched_qs = HZ / 10;
+module_param(jiffies_till_sched_qs, ulong, 0444);
 
 static bool rcu_start_gp_advanced(struct rcu_state *rsp, struct rcu_node *rnp,
 				  struct rcu_data *rdp);
@@ -1230,7 +1230,6 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
 	unsigned long jtsq;
 	bool *rnhqp;
 	bool *ruqp;
-	unsigned long rjtsc;
 	struct rcu_node *rnp;
 
 	/*
@@ -1247,23 +1246,13 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
 		return 1;
 	}
 
-	/* Compute and saturate jiffies_till_sched_qs. */
-	jtsq = jiffies_till_sched_qs;
-	rjtsc = rcu_jiffies_till_stall_check();
-	if (jtsq > rjtsc / 2) {
-		WRITE_ONCE(jiffies_till_sched_qs, rjtsc);
-		jtsq = rjtsc / 2;
-	} else if (jtsq < 1) {
-		WRITE_ONCE(jiffies_till_sched_qs, 1);
-		jtsq = 1;
-	}
-
 	/*
 	 * Has this CPU encountered a cond_resched_rcu_qs() since the
 	 * beginning of the grace period?  For this to be the case,
 	 * the CPU has to have noticed the current grace period.  This
 	 * might not be the case for nohz_full CPUs looping in the kernel.
 	 */
+	jtsq = jiffies_till_sched_qs;
 	rnp = rdp->mynode;
 	ruqp = per_cpu_ptr(&rcu_dynticks.rcu_urgent_qs, rdp->cpu);
 	if (time_after(jiffies, rdp->rsp->gp_start + jtsq) &&
@@ -1271,7 +1260,7 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
 	    READ_ONCE(rdp->gpnum) == rnp->gpnum && !rdp->gpwrap) {
 		trace_rcu_fqs(rdp->rsp->name, rdp->gpnum, rdp->cpu, TPS("rqc"));
 		return 1;
-	} else {
+	} else if (time_after(jiffies, rdp->rsp->gp_start + jtsq)) {
 		/* Load rcu_qs_ctr before store to rcu_urgent_qs. */
 		smp_store_release(ruqp, true);
 	}
@@ -1299,10 +1288,6 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
 	 * updates are only once every few jiffies, the probability of
 	 * lossage (and thus of slight grace-period extension) is
 	 * quite low.
-	 *
-	 * Note that if the jiffies_till_sched_qs boot/sysfs parameter
-	 * is set too high, we override with half of the RCU CPU stall
-	 * warning delay.
 	 */
 	rnhqp = &per_cpu(rcu_dynticks.rcu_need_heavy_qs, rdp->cpu);
 	if (!READ_ONCE(*rnhqp) &&
@@ -1311,7 +1296,7 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
 		WRITE_ONCE(*rnhqp, true);
 		/* Store rcu_need_heavy_qs before rcu_urgent_qs. */
 		smp_store_release(ruqp, true);
-		rdp->rsp->jiffies_resched += 5; /* Re-enable beating. */
+		rdp->rsp->jiffies_resched += jtsq; /* Re-enable beating. */
 	}
 
 	/*
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 17c667b427b4..9433633012ba 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4808,6 +4808,7 @@ int __sched _cond_resched(void)
 		preempt_schedule_common();
 		return 1;
 	}
+	rcu_all_qs();
 	return 0;
 }
 EXPORT_SYMBOL(_cond_resched);
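
For context (not part of the patch), a minimal sketch of the call pattern
this enables in long in-kernel loops; the loop, helper, and type names
below are hypothetical:

	/* Hypothetical long-running in-kernel loop. */
	static void scan_all_items(struct list_head *head)
	{
		struct item *ip;			/* hypothetical type */

		list_for_each_entry(ip, head, entry) {
			process_item(ip);		/* hypothetical work */
			/*
			 * cond_resched() now both offers to reschedule and
			 * reports an RCU quiescent state, so there is no
			 * longer a choice to make between it and
			 * cond_resched_rcu_qs().
			 */
			cond_resched();
		}
	}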

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 tip/core/rcu 01/15] sched,rcu: Make cond_resched() provide RCU quiescent state
  2017-08-15 16:19   ` [PATCH v5 " Paul E. McKenney
@ 2017-08-17  8:22     ` Ingo Molnar
  2017-08-17 12:40       ` Paul E. McKenney
  0 siblings, 1 reply; 35+ messages in thread
From: Ingo Molnar @ 2017-08-17  8:22 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, jiangshanlai, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Linus Torvalds


* Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:

> There is some confusion as to which of cond_resched() or
> cond_resched_rcu_qs() should be added to long in-kernel loops.
> This commit therefore eliminates the decision by adding RCU quiescent
> states to cond_resched().  This commit also simplifies the code that
> used to interact with cond_resched_rcu_qs(), and that now interacts with
> cond_resched(), to reduce its overhead.  This reduction is necessary to
> allow the heavier-weight cond_resched_rcu_qs() mechanism to be invoked
> everywhere that cond_resched() is invoked.
> 
> Part of that reduction in overhead converts the jiffies_till_sched_qs
> kernel parameter to read-only at runtime, thus eliminating the need for
> bounds checking.
> 
> Reported-by: Michal Hocko <mhocko@kernel.org>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> [ paulmck: Tuning for performance issues reported by 0day Test Robot. ]
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 8337e2db0bb2..d2f291a3a44a 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1522,10 +1522,11 @@ static inline int test_tsk_need_resched(struct task_struct *tsk)
>   * cond_resched_lock() will drop the spinlock before scheduling,
>   * cond_resched_softirq() will enable bhs before scheduling.
>   */
> +void rcu_all_qs(void);
>  #ifndef CONFIG_PREEMPT
>  extern int _cond_resched(void);
>  #else
> -static inline int _cond_resched(void) { return 0; }
> +static inline int _cond_resched(void) { rcu_all_qs(); return 0; }
>  #endif
>  
>  #define cond_resched() ({			\
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 51d4c3acf32d..e40cb5190783 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -537,8 +537,8 @@ module_param(rcu_kick_kthreads, bool, 0644);
>   * How long the grace period must be before we start recruiting
>   * quiescent-state help from rcu_note_context_switch().
>   */
> -static ulong jiffies_till_sched_qs = HZ / 20;
> -module_param(jiffies_till_sched_qs, ulong, 0644);
> +static ulong jiffies_till_sched_qs = HZ / 10;
> +module_param(jiffies_till_sched_qs, ulong, 0444);
>  
>  static bool rcu_start_gp_advanced(struct rcu_state *rsp, struct rcu_node *rnp,
>  				  struct rcu_data *rdp);
> @@ -1230,7 +1230,6 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
>  	unsigned long jtsq;
>  	bool *rnhqp;
>  	bool *ruqp;
> -	unsigned long rjtsc;
>  	struct rcu_node *rnp;
>  
>  	/*
> @@ -1247,23 +1246,13 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
>  		return 1;
>  	}
>  
> -	/* Compute and saturate jiffies_till_sched_qs. */
> -	jtsq = jiffies_till_sched_qs;
> -	rjtsc = rcu_jiffies_till_stall_check();
> -	if (jtsq > rjtsc / 2) {
> -		WRITE_ONCE(jiffies_till_sched_qs, rjtsc);
> -		jtsq = rjtsc / 2;
> -	} else if (jtsq < 1) {
> -		WRITE_ONCE(jiffies_till_sched_qs, 1);
> -		jtsq = 1;
> -	}
> -
>  	/*
>  	 * Has this CPU encountered a cond_resched_rcu_qs() since the
>  	 * beginning of the grace period?  For this to be the case,
>  	 * the CPU has to have noticed the current grace period.  This
>  	 * might not be the case for nohz_full CPUs looping in the kernel.
>  	 */
> +	jtsq = jiffies_till_sched_qs;
>  	rnp = rdp->mynode;
>  	ruqp = per_cpu_ptr(&rcu_dynticks.rcu_urgent_qs, rdp->cpu);
>  	if (time_after(jiffies, rdp->rsp->gp_start + jtsq) &&
> @@ -1271,7 +1260,7 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
>  	    READ_ONCE(rdp->gpnum) == rnp->gpnum && !rdp->gpwrap) {
>  		trace_rcu_fqs(rdp->rsp->name, rdp->gpnum, rdp->cpu, TPS("rqc"));
>  		return 1;
> -	} else {
> +	} else if (time_after(jiffies, rdp->rsp->gp_start + jtsq)) {
>  		/* Load rcu_qs_ctr before store to rcu_urgent_qs. */
>  		smp_store_release(ruqp, true);
>  	}
> @@ -1299,10 +1288,6 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
>  	 * updates are only once every few jiffies, the probability of
>  	 * lossage (and thus of slight grace-period extension) is
>  	 * quite low.
> -	 *
> -	 * Note that if the jiffies_till_sched_qs boot/sysfs parameter
> -	 * is set too high, we override with half of the RCU CPU stall
> -	 * warning delay.
>  	 */
>  	rnhqp = &per_cpu(rcu_dynticks.rcu_need_heavy_qs, rdp->cpu);
>  	if (!READ_ONCE(*rnhqp) &&
> @@ -1311,7 +1296,7 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
>  		WRITE_ONCE(*rnhqp, true);
>  		/* Store rcu_need_heavy_qs before rcu_urgent_qs. */
>  		smp_store_release(ruqp, true);
> -		rdp->rsp->jiffies_resched += 5; /* Re-enable beating. */
> +		rdp->rsp->jiffies_resched += jtsq; /* Re-enable beating. */
>  	}
>  
>  	/*
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 17c667b427b4..9433633012ba 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4808,6 +4808,7 @@ int __sched _cond_resched(void)
>  		preempt_schedule_common();
>  		return 1;
>  	}
> +	rcu_all_qs();
>  	return 0;
>  }
>  EXPORT_SYMBOL(_cond_resched);

So I'm a bit uneasy about this change:

- There are hundreds of uses of cond_resched(), some of them in commonly inlined
  functions.

- cond_resched() typically gets called in functions that _might_ take a long time
  to execute, but that's not a given.

- it's definitely getting called opportunistically as well, under
  PREEMPT_VOLUNTARY, from common lightweight helpers that we know are in
  schedulable contexts. We risk adding significant overhead here.

So what we risk here is turning a function that is known to be super simple into
something much slower - and exporting slowdowns to literally thousands of explicit
and implicit usage sites.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 tip/core/rcu 01/15] sched,rcu: Make cond_resched() provide RCU quiescent state
  2017-08-17  8:22     ` Ingo Molnar
@ 2017-08-17 12:40       ` Paul E. McKenney
  0 siblings, 0 replies; 35+ messages in thread
From: Paul E. McKenney @ 2017-08-17 12:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, jiangshanlai, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Linus Torvalds

On Thu, Aug 17, 2017 at 10:22:44AM +0200, Ingo Molnar wrote:
> 
> * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> 
> > There is some confusion as to which of cond_resched() or
> > cond_resched_rcu_qs() should be added to long in-kernel loops.
> > This commit therefore eliminates the decision by adding RCU quiescent
> > states to cond_resched().  This commit also simplifies the code that
> > used to interact with cond_resched_rcu_qs(), and that now interacts with
> > cond_resched(), to reduce its overhead.  This reduction is necessary to
> > allow the heavier-weight cond_resched_rcu_qs() mechanism to be invoked
> > everywhere that cond_resched() is invoked.
> > 
> > Part of that reduction in overhead converts the jiffies_till_sched_qs
> > kernel parameter to read-only at runtime, thus eliminating the need for
> > bounds checking.
> > 
> > Reported-by: Michal Hocko <mhocko@kernel.org>
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > [ paulmck: Tuning for performance issues reported by 0day Test Robot. ]
> > 
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 8337e2db0bb2..d2f291a3a44a 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -1522,10 +1522,11 @@ static inline int test_tsk_need_resched(struct task_struct *tsk)
> >   * cond_resched_lock() will drop the spinlock before scheduling,
> >   * cond_resched_softirq() will enable bhs before scheduling.
> >   */
> > +void rcu_all_qs(void);
> >  #ifndef CONFIG_PREEMPT
> >  extern int _cond_resched(void);
> >  #else
> > -static inline int _cond_resched(void) { return 0; }
> > +static inline int _cond_resched(void) { rcu_all_qs(); return 0; }
> >  #endif
> >  
> >  #define cond_resched() ({			\
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 51d4c3acf32d..e40cb5190783 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -537,8 +537,8 @@ module_param(rcu_kick_kthreads, bool, 0644);
> >   * How long the grace period must be before we start recruiting
> >   * quiescent-state help from rcu_note_context_switch().
> >   */
> > -static ulong jiffies_till_sched_qs = HZ / 20;
> > -module_param(jiffies_till_sched_qs, ulong, 0644);
> > +static ulong jiffies_till_sched_qs = HZ / 10;
> > +module_param(jiffies_till_sched_qs, ulong, 0444);
> >  
> >  static bool rcu_start_gp_advanced(struct rcu_state *rsp, struct rcu_node *rnp,
> >  				  struct rcu_data *rdp);
> > @@ -1230,7 +1230,6 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
> >  	unsigned long jtsq;
> >  	bool *rnhqp;
> >  	bool *ruqp;
> > -	unsigned long rjtsc;
> >  	struct rcu_node *rnp;
> >  
> >  	/*
> > @@ -1247,23 +1246,13 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
> >  		return 1;
> >  	}
> >  
> > -	/* Compute and saturate jiffies_till_sched_qs. */
> > -	jtsq = jiffies_till_sched_qs;
> > -	rjtsc = rcu_jiffies_till_stall_check();
> > -	if (jtsq > rjtsc / 2) {
> > -		WRITE_ONCE(jiffies_till_sched_qs, rjtsc);
> > -		jtsq = rjtsc / 2;
> > -	} else if (jtsq < 1) {
> > -		WRITE_ONCE(jiffies_till_sched_qs, 1);
> > -		jtsq = 1;
> > -	}
> > -
> >  	/*
> >  	 * Has this CPU encountered a cond_resched_rcu_qs() since the
> >  	 * beginning of the grace period?  For this to be the case,
> >  	 * the CPU has to have noticed the current grace period.  This
> >  	 * might not be the case for nohz_full CPUs looping in the kernel.
> >  	 */
> > +	jtsq = jiffies_till_sched_qs;
> >  	rnp = rdp->mynode;
> >  	ruqp = per_cpu_ptr(&rcu_dynticks.rcu_urgent_qs, rdp->cpu);
> >  	if (time_after(jiffies, rdp->rsp->gp_start + jtsq) &&
> > @@ -1271,7 +1260,7 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
> >  	    READ_ONCE(rdp->gpnum) == rnp->gpnum && !rdp->gpwrap) {
> >  		trace_rcu_fqs(rdp->rsp->name, rdp->gpnum, rdp->cpu, TPS("rqc"));
> >  		return 1;
> > -	} else {
> > +	} else if (time_after(jiffies, rdp->rsp->gp_start + jtsq)) {
> >  		/* Load rcu_qs_ctr before store to rcu_urgent_qs. */
> >  		smp_store_release(ruqp, true);
> >  	}
> > @@ -1299,10 +1288,6 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
> >  	 * updates are only once every few jiffies, the probability of
> >  	 * lossage (and thus of slight grace-period extension) is
> >  	 * quite low.
> > -	 *
> > -	 * Note that if the jiffies_till_sched_qs boot/sysfs parameter
> > -	 * is set too high, we override with half of the RCU CPU stall
> > -	 * warning delay.
> >  	 */
> >  	rnhqp = &per_cpu(rcu_dynticks.rcu_need_heavy_qs, rdp->cpu);
> >  	if (!READ_ONCE(*rnhqp) &&
> > @@ -1311,7 +1296,7 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
> >  		WRITE_ONCE(*rnhqp, true);
> >  		/* Store rcu_need_heavy_qs before rcu_urgent_qs. */
> >  		smp_store_release(ruqp, true);
> > -		rdp->rsp->jiffies_resched += 5; /* Re-enable beating. */
> > +		rdp->rsp->jiffies_resched += jtsq; /* Re-enable beating. */
> >  	}
> >  
> >  	/*
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 17c667b427b4..9433633012ba 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -4808,6 +4808,7 @@ int __sched _cond_resched(void)
> >  		preempt_schedule_common();
> >  		return 1;
> >  	}
> > +	rcu_all_qs();
> >  	return 0;
> >  }
> >  EXPORT_SYMBOL(_cond_resched);
> 
> So I'm a bit uneasy about this change:
> 
> - There are hundreds of uses of cond_resched(), some of them in commonly inlined
>   functions.
> 
> - cond_resched() typically gets called in functions that _might_ take a long time
>   to execute, but that's not a given.
> 
> - it's definitely getting called opportunistically as well, under
>   PREEMPT_VOLUNTARY, from common lightweight helpers that we know are in
>   schedulable contexts. We risk adding significant overhead here.
> 
> So what we risk here is turning a function that is known to be super simple into
> something much slower - and exporting slowdowns to literally thousands of explicit
> and implicit usage sites.

Yeah, I have been refining this one since November.  It finally seems
to make 0day Test Robot happy, but it isn't going to hurt to keep it
out for another cycle.

I will rebase it into my stack intended for 4.15 before sending you a
pull request.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2017-08-17 12:40 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-24 21:44 [PATCH tip/core/rcu 0/15] General fixes Paul E. McKenney
2017-07-24 21:44 ` [PATCH tip/core/rcu 01/15] sched,rcu: Make cond_resched() provide RCU quiescent state Paul E. McKenney
2017-08-15 16:19   ` [PATCH v5 " Paul E. McKenney
2017-08-17  8:22     ` Ingo Molnar
2017-08-17 12:40       ` Paul E. McKenney
2017-07-24 21:44 ` [PATCH tip/core/rcu 02/15] rcu: Use timer as backstop for NOCB deferred wakeups Paul E. McKenney
2017-07-25 18:12   ` Steven Rostedt
2017-07-25 19:18     ` Paul E. McKenney
2017-07-25 22:17       ` Steven Rostedt
2017-07-26  0:05         ` Paul E. McKenney
2017-07-26 21:18           ` Steven Rostedt
2017-07-26 21:47             ` Paul E. McKenney
2017-07-26 23:09               ` Steven Rostedt
2017-07-27 17:33                 ` Paul E. McKenney
2017-07-24 21:44 ` [PATCH tip/core/rcu 03/15] rcu: Drive TASKS_RCU directly off of PREEMPT Paul E. McKenney
2017-07-25 18:14   ` Steven Rostedt
2017-07-25 19:19     ` Paul E. McKenney
2017-07-24 21:44 ` [PATCH tip/core/rcu 04/15] rcu: Create reasonable API for do_exit() TASKS_RCU processing Paul E. McKenney
2017-07-24 21:44 ` [PATCH tip/core/rcu 05/15] rcu: Add TPS() to event-traced strings Paul E. McKenney
2017-07-28  1:32   ` Steven Rostedt
2017-07-24 21:44 ` [PATCH tip/core/rcu 06/15] rcu: Move rcu.h to new trivial-function style Paul E. McKenney
2017-07-24 21:44 ` [PATCH tip/core/rcu 07/15] rcu: Add event tracing to ->gp_tasks update at GP start Paul E. McKenney
2017-07-28  1:38   ` Steven Rostedt
2017-07-28  3:22     ` Paul E. McKenney
2017-07-28 12:18       ` Steven Rostedt
2017-07-28 17:13         ` Paul E. McKenney
2017-07-24 21:44 ` [PATCH tip/core/rcu 08/15] swait: add idle variants which don't contribute to load average Paul E. McKenney
2017-07-24 21:44 ` [PATCH tip/core/rcu 09/15] rcu: use idle versions of swait to make idle-hack clear Paul E. McKenney
2017-07-24 21:44 ` [PATCH tip/core/rcu 10/15] rcu: Add TPS() protection for _rcu_barrier_trace strings Paul E. McKenney
2017-07-28  1:40   ` Steven Rostedt
2017-07-24 21:44 ` [PATCH tip/core/rcu 11/15] rcu/tracing: Set disable_rcu_irq_enter on rcu_eqs_exit() Paul E. McKenney
2017-07-24 21:44 ` [PATCH tip/core/rcu 12/15] rcu: Add assertions verifying blocked-tasks list Paul E. McKenney
2017-07-24 21:44 ` [PATCH tip/core/rcu 13/15] rcu: Make rcu_idle_enter() rely on callers disabling irqs Paul E. McKenney
2017-07-24 21:44 ` [PATCH tip/core/rcu 14/15] rcu: Add warning to rcu_idle_enter() for irqs enabled Paul E. McKenney
2017-07-24 21:44 ` [PATCH tip/core/rcu 15/15] rcu: Remove exports from rcu_idle_exit() and rcu_idle_enter() Paul E. McKenney
