* [PATCH tip/core/rcu 0/17] Miscellaneous fixes for 3.17
@ 2014-07-07 22:37 Paul E. McKenney
  2014-07-07 22:38 ` [PATCH tip/core/rcu 01/17] rcu: Document deadlock-avoidance information for rcu_read_unlock() Paul E. McKenney
  2014-07-09  2:14 ` [PATCH tip/core/rcu 0/17] Miscellaneous fixes for 3.17 Lai Jiangshan
  0 siblings, 2 replies; 54+ messages in thread
From: Paul E. McKenney @ 2014-07-07 22:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg, sbw

Hello!

This series provides miscellaneous fixes:

1.	Document deadlock-avoidance information in rcu_read_unlock()'s
	docbook comment header.

2.	Remove obsolete references to TINY_PREEMPT_RCU.

3.	Add deadlock explanation to local_irq_save() call in
	__lock_task_sighand().

4.	Make the rcu_node arrays static const char * const,
	courtesy of Fabian Frederick.

5.	Remove redundant ACCESS_ONCE() from tick_do_timer_cpu under
	#ifdef CONFIG_NO_HZ_FULL.

6.	Eliminate read-modify-write ACCESS_ONCE() calls.

7.	Loosen __call_rcu()'s rcu_head alignment constraint to handle
	m68k's 16-bit alignment.

8.	Allow post-unlock reference for rt_mutex.

9.	Check both root and current rcu_node structures when setting up
	future grace periods, courtesy of Pranith Kumar.

10.	Simplify priority boosting by putting rt_mutex in rcu_node
	structure.

11.	Bind grace-period kthreads to non-NO_HZ_FULL CPUs instead of the
	timekeeping CPU, at least for CONFIG_NO_HZ_FULL_SYSIDLE=n.

12.	Don't use NMIs to dump other CPUs' stacks.

13.	Use __this_cpu_read() instead of per_cpu_ptr(), courtesy of Shan Wei.

14.	Remove CONFIG_PROVE_RCU_DELAY.

15.	Fix __rcu_reclaim to use true/false instead of 1/0.

16.	Fix sparse warning in rcu_initiate_boost(), courtesy of Pranith
	Kumar.

17.	Fix sparse warning in rcu_report_unblock_qs_rnp(), again courtesy
	of Pranith Kumar.

							Thanx, Paul

------------------------------------------------------------------------

 b/include/linux/init_task.h                               |    9 --
 b/include/linux/rcupdate.h                                |   45 ++++++++--
 b/include/linux/sched.h                                   |    6 -
 b/include/linux/tick.h                                    |   19 ++++
 b/init/Kconfig                                            |    2 
 b/kernel/rcu/rcu.h                                        |    8 +
 b/kernel/rcu/srcu.c                                       |    4 
 b/kernel/rcu/tree.c                                       |   59 ++++++--------
 b/kernel/rcu/tree.h                                       |    8 +
 b/kernel/rcu/tree_plugin.h                                |   52 +++++++-----
 b/kernel/rcu/update.c                                     |    3 
 b/kernel/signal.c                                         |    4 
 b/kernel/time/tick-sched.c                                |   10 ++
 b/tools/testing/selftests/rcutorture/configs/rcu/TREE01   |    1 
 b/tools/testing/selftests/rcutorture/configs/rcu/TREE02   |    1 
 b/tools/testing/selftests/rcutorture/configs/rcu/TREE02-T |    1 
 b/tools/testing/selftests/rcutorture/configs/rcu/TREE03   |    1 
 b/tools/testing/selftests/rcutorture/configs/rcu/TREE04   |    1 
 b/tools/testing/selftests/rcutorture/configs/rcu/TREE05   |    1 
 b/tools/testing/selftests/rcutorture/configs/rcu/TREE06   |    1 
 b/tools/testing/selftests/rcutorture/configs/rcu/TREE07   |    1 
 b/tools/testing/selftests/rcutorture/configs/rcu/TREE08   |    1 
 b/tools/testing/selftests/rcutorture/configs/rcu/TREE08-T |    1 
 b/tools/testing/selftests/rcutorture/configs/rcu/TREE09   |    1 
 24 files changed, 147 insertions(+), 93 deletions(-)


* [PATCH tip/core/rcu 01/17] rcu: Document deadlock-avoidance information for rcu_read_unlock()
  2014-07-07 22:37 [PATCH tip/core/rcu 0/17] Miscellaneous fixes for 3.17 Paul E. McKenney
@ 2014-07-07 22:38 ` Paul E. McKenney
  2014-07-07 22:38   ` [PATCH tip/core/rcu 02/17] rcu: Handle obsolete references to TINY_PREEMPT_RCU Paul E. McKenney
                     ` (15 more replies)
  2014-07-09  2:14 ` [PATCH tip/core/rcu 0/17] Miscellaneous fixes for 3.17 Lai Jiangshan
  1 sibling, 16 replies; 54+ messages in thread
From: Paul E. McKenney @ 2014-07-07 22:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg, sbw,
	Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
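For illustration, a minimal sketch of the first deadlock-avoidance
approach documented in the comment below (gp and do_something_with()
are hypothetical placeholders, not code from this patch):

	preempt_disable();
	rcu_read_lock();
	do_something_with(rcu_dereference(gp));
	rcu_read_unlock();	/* Reader was never preempted, hence never */
				/*  boosted, hence no rt_mutex_unlock().   */
	preempt_enable();

Disabling irqs around the critical section works equally well, since
it too rules out preemption of the reader.
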
 include/linux/rcupdate.h | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 6a94cc8b1ca0..c56ad15204ec 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -858,6 +858,34 @@ static inline void rcu_read_lock(void)
 /**
  * rcu_read_unlock() - marks the end of an RCU read-side critical section.
  *
+ * In most situations, rcu_read_unlock() is immune from deadlock.
+ * However, in kernels built with CONFIG_RCU_BOOST, rcu_read_unlock()
+ * is responsible for deboosting, which it does via rt_mutex_unlock().
+ * Unfortunately, this function acquires the scheduler's runqueue and
+ * priority-inheritance spinlocks.  This means that deadlock could result
+ * if the caller of rcu_read_unlock() already holds one of these locks or
+ * any lock that is ever acquired while holding them.
+ *
+ * That said, RCU readers are never priority boosted unless they were
+ * preempted.  Therefore, one way to avoid deadlock is to make sure
+ * that preemption never happens within any RCU read-side critical
+ * section whose outermost rcu_read_unlock() is called with one of
+ * rt_mutex_unlock()'s locks held.  Such preemption can be avoided in
+ * a number of ways, for example, by invoking preempt_disable() before
+ * the critical section's outermost rcu_read_lock().
+ *
+ * Given that the set of locks acquired by rt_mutex_unlock() might change
+ * at any time, a somewhat more future-proofed approach is to make sure
+ * that preemption never happens within any RCU read-side critical
+ * section whose outermost rcu_read_unlock() is called with irqs disabled.
+ * This approach relies on the fact that rt_mutex_unlock() currently only
+ * acquires irq-disabled locks.
+ *
+ * The second of these two approaches is best in most situations;
+ * however, the first approach can also be useful, at least to those
+ * developers willing to keep abreast of the set of locks acquired by
+ * rt_mutex_unlock().
+ *
  * See rcu_read_lock() for more information.
  */
 static inline void rcu_read_unlock(void)
-- 
1.8.1.5


* [PATCH tip/core/rcu 02/17] rcu: Handle obsolete references to TINY_PREEMPT_RCU
  2014-07-07 22:38 ` [PATCH tip/core/rcu 01/17] rcu: Document deadlock-avoidance information for rcu_read_unlock() Paul E. McKenney
@ 2014-07-07 22:38   ` Paul E. McKenney
  2014-07-07 22:38   ` [PATCH tip/core/rcu 03/17] signal: Explain local_irq_save() call Paul E. McKenney
                     ` (14 subsequent siblings)
  15 siblings, 0 replies; 54+ messages in thread
From: Paul E. McKenney @ 2014-07-07 22:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg, sbw,
	Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/rcupdate.h | 17 ++++++++---------
 init/Kconfig             |  2 +-
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index c56ad15204ec..d231aa17b1d7 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -826,15 +826,14 @@ static inline void rcu_preempt_sleep_check(void)
  * read-side critical section that would block in a !PREEMPT kernel.
  * But if you want the full story, read on!
  *
- * In non-preemptible RCU implementations (TREE_RCU and TINY_RCU), it
- * is illegal to block while in an RCU read-side critical section.  In
- * preemptible RCU implementations (TREE_PREEMPT_RCU and TINY_PREEMPT_RCU)
- * in CONFIG_PREEMPT kernel builds, RCU read-side critical sections may
- * be preempted, but explicit blocking is illegal.  Finally, in preemptible
- * RCU implementations in real-time (with -rt patchset) kernel builds,
- * RCU read-side critical sections may be preempted and they may also
- * block, but only when acquiring spinlocks that are subject to priority
- * inheritance.
+ * In non-preemptible RCU implementations (TREE_RCU and TINY_RCU),
+ * it is illegal to block while in an RCU read-side critical section.
+ * In preemptible RCU implementations (TREE_PREEMPT_RCU) in CONFIG_PREEMPT
+ * kernel builds, RCU read-side critical sections may be preempted,
+ * but explicit blocking is illegal.  Finally, in preemptible RCU
+ * implementations in real-time (with -rt patchset) kernel builds, RCU
+ * read-side critical sections may be preempted and they may also block, but
+ * only when acquiring spinlocks that are subject to priority inheritance.
  */
 static inline void rcu_read_lock(void)
 {
diff --git a/init/Kconfig b/init/Kconfig
index 9d76b99af1b9..977b37806e95 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -505,7 +505,7 @@ config PREEMPT_RCU
 	def_bool TREE_PREEMPT_RCU
 	help
 	  This option enables preemptible-RCU code that is common between
-	  the TREE_PREEMPT_RCU and TINY_PREEMPT_RCU implementations.
+	  TREE_PREEMPT_RCU and, in the old days, TINY_PREEMPT_RCU.
 
 config RCU_STALL_COMMON
 	def_bool ( TREE_RCU || TREE_PREEMPT_RCU || RCU_TRACE )
-- 
1.8.1.5


* [PATCH tip/core/rcu 03/17] signal: Explain local_irq_save() call
  2014-07-07 22:38 ` [PATCH tip/core/rcu 01/17] rcu: Document deadlock-avoidance information for rcu_read_unlock() Paul E. McKenney
  2014-07-07 22:38   ` [PATCH tip/core/rcu 02/17] rcu: Handle obsolete references to TINY_PREEMPT_RCU Paul E. McKenney
@ 2014-07-07 22:38   ` Paul E. McKenney
  2014-07-08  9:01     ` Lai Jiangshan
  2014-07-07 22:38   ` [PATCH tip/core/rcu 04/17] rcu: Make rcu node arrays static const char * const Paul E. McKenney
                     ` (13 subsequent siblings)
  15 siblings, 1 reply; 54+ messages in thread
From: Paul E. McKenney @ 2014-07-07 22:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg, sbw,
	Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

The explicit local_irq_save() in __lock_task_sighand() is needed to avoid
a potential deadlock condition, as noted in a841796f11c90d53 (signal:
align __lock_task_sighand() irq disabling and RCU).  However, someone
reading the code might be forgiven for concluding that this separate
local_irq_save() was completely unnecessary.  This commit therefore adds
a comment referencing the shiny new block comment on rcu_read_unlock().
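
For illustration, the hazard that the ordering avoids (a simplified
sketch, not code from this patch): had the critical section been entered
before irqs were disabled, the reader could be preempted, and therefore
boosted, with the consequences described in rcu_read_unlock()'s new
header comment:

	rcu_read_lock();		/* Preemption/boosting possible here. */
	local_irq_save(*flags);
	spin_lock(&sighand->siglock);
	...
	rcu_read_unlock();		/* Boosted reader deboosts via       */
					/*  rt_mutex_unlock() while holding  */
					/*  an irq-disabled lock: deadlock   */
					/*  risk per the new comment.        */

Invoking local_irq_save() first, as __lock_task_sighand() does, rules
out the preemption and thus the deboosting.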

Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/signal.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/signal.c b/kernel/signal.c
index a4077e90f19f..46161e744760 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1263,6 +1263,10 @@ struct sighand_struct *__lock_task_sighand(struct task_struct *tsk,
 	struct sighand_struct *sighand;
 
 	for (;;) {
+		/*
+		 * Disable interrupts early to avoid deadlocks.
+		 * See rcu_read_unlock comment header for details.
+		 */
 		local_irq_save(*flags);
 		rcu_read_lock();
 		sighand = rcu_dereference(tsk->sighand);
-- 
1.8.1.5


* [PATCH tip/core/rcu 04/17] rcu: Make rcu node arrays static const char * const
  2014-07-07 22:38 ` [PATCH tip/core/rcu 01/17] rcu: Document deadlock-avoidance information for rcu_read_unlock() Paul E. McKenney
  2014-07-07 22:38   ` [PATCH tip/core/rcu 02/17] rcu: Handle obsolete references to TINY_PREEMPT_RCU Paul E. McKenney
  2014-07-07 22:38   ` [PATCH tip/core/rcu 03/17] signal: Explain local_irq_save() call Paul E. McKenney
@ 2014-07-07 22:38   ` Paul E. McKenney
  2014-07-07 22:38   ` [PATCH tip/core/rcu 05/17] rcu: remove redundant ACCESS_ONCE() from tick_do_timer_cpu Paul E. McKenney
                     ` (12 subsequent siblings)
  15 siblings, 0 replies; 54+ messages in thread
From: Paul E. McKenney @ 2014-07-07 22:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg, sbw,
	Fabian Frederick, Paul E. McKenney

From: Fabian Frederick <fabf@skynet.be>

Those two arrays are being passed to lockdep_init_map(), which expects
const char *, and are stored in lockdep_map the same way.

Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 625d0b0cd75a..ebd99af2214e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3564,14 +3564,16 @@ static void __init rcu_init_levelspread(struct rcu_state *rsp)
 static void __init rcu_init_one(struct rcu_state *rsp,
 		struct rcu_data __percpu *rda)
 {
-	static char *buf[] = { "rcu_node_0",
-			       "rcu_node_1",
-			       "rcu_node_2",
-			       "rcu_node_3" };  /* Match MAX_RCU_LVLS */
-	static char *fqs[] = { "rcu_node_fqs_0",
-			       "rcu_node_fqs_1",
-			       "rcu_node_fqs_2",
-			       "rcu_node_fqs_3" };  /* Match MAX_RCU_LVLS */
+	static const char * const buf[] = {
+		"rcu_node_0",
+		"rcu_node_1",
+		"rcu_node_2",
+		"rcu_node_3" };  /* Match MAX_RCU_LVLS */
+	static const char * const fqs[] = {
+		"rcu_node_fqs_0",
+		"rcu_node_fqs_1",
+		"rcu_node_fqs_2",
+		"rcu_node_fqs_3" };  /* Match MAX_RCU_LVLS */
 	static u8 fl_mask = 0x1;
 	int cpustride = 1;
 	int i;
-- 
1.8.1.5


* [PATCH tip/core/rcu 05/17] rcu: remove redundant ACCESS_ONCE() from tick_do_timer_cpu
  2014-07-07 22:38 ` [PATCH tip/core/rcu 01/17] rcu: Document deadlock-avoidance information for rcu_read_unlock() Paul E. McKenney
                     ` (2 preceding siblings ...)
  2014-07-07 22:38   ` [PATCH tip/core/rcu 04/17] rcu: Make rcu node arrays static const char * const Paul E. McKenney
@ 2014-07-07 22:38   ` Paul E. McKenney
  2014-07-08 14:46     ` Frederic Weisbecker
  2014-07-07 22:38   ` [PATCH tip/core/rcu 06/17] rcu: Eliminate read-modify-write ACCESS_ONCE() calls Paul E. McKenney
                     ` (11 subsequent siblings)
  15 siblings, 1 reply; 54+ messages in thread
From: Paul E. McKenney @ 2014-07-07 22:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg, sbw,
	Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

In kernels built with CONFIG_NO_HZ_FULL, tick_do_timer_cpu is constant
once boot completes.  Thus, there is no need to wrap it in ACCESS_ONCE()
in code that is built only when CONFIG_NO_HZ_FULL=y.  This commit therefore
removes the redundant ACCESS_ONCE().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree_plugin.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 02ac0fb186b8..5da9f9b3abc9 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2844,7 +2844,7 @@ static bool rcu_nohz_full_cpu(struct rcu_state *rsp)
 static void rcu_bind_gp_kthread(void)
 {
 #ifdef CONFIG_NO_HZ_FULL
-	int cpu = ACCESS_ONCE(tick_do_timer_cpu);
+	int cpu = tick_do_timer_cpu;
 
 	if (cpu < 0 || cpu >= nr_cpu_ids)
 		return;
-- 
1.8.1.5


* [PATCH tip/core/rcu 06/17] rcu: Eliminate read-modify-write ACCESS_ONCE() calls
  2014-07-07 22:38 ` [PATCH tip/core/rcu 01/17] rcu: Document deadlock-avoidance information for rcu_read_unlock() Paul E. McKenney
                     ` (3 preceding siblings ...)
  2014-07-07 22:38   ` [PATCH tip/core/rcu 05/17] rcu: remove redundant ACCESS_ONCE() from tick_do_timer_cpu Paul E. McKenney
@ 2014-07-07 22:38   ` Paul E. McKenney
  2014-07-08 16:59     ` Pranith Kumar
  2014-07-07 22:38   ` [PATCH tip/core/rcu 07/17] rcu: Loosen __call_rcu()'s rcu_head alignment constraint Paul E. McKenney
                     ` (10 subsequent siblings)
  15 siblings, 1 reply; 54+ messages in thread
From: Paul E. McKenney @ 2014-07-07 22:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg, sbw,
	Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

RCU contains code of the following forms:

	ACCESS_ONCE(x)++;
	ACCESS_ONCE(x) += y;
	ACCESS_ONCE(x) -= y;

Now these constructs do operate correctly, but they really result in a
pair of volatile accesses, one to do the load and another to do the store.
This can be confusing, as the casual reader might well assume that (for
example) gcc might generate a memory-to-memory add instruction for each
of these three cases.  In fact, gcc will do no such thing.  Also, there
is a good chance that the kernel will move to separate load and store
variants of ACCESS_ONCE(), and constructs like the above could easily
confuse both people and scripts attempting to make that sort of change.
Finally, most of RCU's read-modify-write uses of ACCESS_ONCE() really
only need the store to be volatile, so the read-modify-write form
might be misleading.
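
For reference, ACCESS_ONCE() is simply a volatile cast:

	#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

so ACCESS_ONCE(x)++ implies a volatile load followed by a volatile
store, while the substituted form makes only the store volatile:

	ACCESS_ONCE(x)++;		/* volatile load, then volatile store. */
	ACCESS_ONCE(x) = x + 1;		/* plain load, volatile store only.    */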

This commit therefore changes the above forms in RCU so that each instance
of ACCESS_ONCE() either does a load or a store, but not both.  In a few
cases, ACCESS_ONCE() was not critical, for example, for maintaining
statistics.  In these cases, ACCESS_ONCE() has been dispensed with
entirely.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/srcu.c |  4 ++--
 kernel/rcu/tree.c | 12 ++++++------
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/kernel/rcu/srcu.c b/kernel/rcu/srcu.c
index c639556f3fa0..e037f3eb2f7b 100644
--- a/kernel/rcu/srcu.c
+++ b/kernel/rcu/srcu.c
@@ -298,9 +298,9 @@ int __srcu_read_lock(struct srcu_struct *sp)
 
 	idx = ACCESS_ONCE(sp->completed) & 0x1;
 	preempt_disable();
-	ACCESS_ONCE(this_cpu_ptr(sp->per_cpu_ref)->c[idx]) += 1;
+	__this_cpu_inc(sp->per_cpu_ref->c[idx]);
 	smp_mb(); /* B */  /* Avoid leaking the critical section. */
-	ACCESS_ONCE(this_cpu_ptr(sp->per_cpu_ref)->seq[idx]) += 1;
+	__this_cpu_inc(sp->per_cpu_ref->seq[idx]);
 	preempt_enable();
 	return idx;
 }
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index ebd99af2214e..6bf7daebcc6b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2347,7 +2347,7 @@ static void rcu_do_batch(struct rcu_state *rsp, struct rcu_data *rdp)
 	}
 	smp_mb(); /* List handling before counting for rcu_barrier(). */
 	rdp->qlen_lazy -= count_lazy;
-	ACCESS_ONCE(rdp->qlen) -= count;
+	ACCESS_ONCE(rdp->qlen) = rdp->qlen - count;
 	rdp->n_cbs_invoked += count;
 
 	/* Reinstate batch limit if we have worked down the excess. */
@@ -2492,7 +2492,7 @@ static void force_quiescent_state(struct rcu_state *rsp)
 		if (rnp_old != NULL)
 			raw_spin_unlock(&rnp_old->fqslock);
 		if (ret) {
-			ACCESS_ONCE(rsp->n_force_qs_lh)++;
+			rsp->n_force_qs_lh++;
 			return;
 		}
 		rnp_old = rnp;
@@ -2504,7 +2504,7 @@ static void force_quiescent_state(struct rcu_state *rsp)
 	smp_mb__after_unlock_lock();
 	raw_spin_unlock(&rnp_old->fqslock);
 	if (ACCESS_ONCE(rsp->gp_flags) & RCU_GP_FLAG_FQS) {
-		ACCESS_ONCE(rsp->n_force_qs_lh)++;
+		rsp->n_force_qs_lh++;
 		raw_spin_unlock_irqrestore(&rnp_old->lock, flags);
 		return;  /* Someone beat us to it. */
 	}
@@ -2693,7 +2693,7 @@ __call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu),
 		local_irq_restore(flags);
 		return;
 	}
-	ACCESS_ONCE(rdp->qlen)++;
+	ACCESS_ONCE(rdp->qlen) = rdp->qlen + 1;
 	if (lazy)
 		rdp->qlen_lazy++;
 	else
@@ -3257,7 +3257,7 @@ static void _rcu_barrier(struct rcu_state *rsp)
 	 * ACCESS_ONCE() to prevent the compiler from speculating
 	 * the increment to precede the early-exit check.
 	 */
-	ACCESS_ONCE(rsp->n_barrier_done)++;
+	ACCESS_ONCE(rsp->n_barrier_done) = rsp->n_barrier_done + 1;
 	WARN_ON_ONCE((rsp->n_barrier_done & 0x1) != 1);
 	_rcu_barrier_trace(rsp, "Inc1", -1, rsp->n_barrier_done);
 	smp_mb(); /* Order ->n_barrier_done increment with below mechanism. */
@@ -3307,7 +3307,7 @@ static void _rcu_barrier(struct rcu_state *rsp)
 
 	/* Increment ->n_barrier_done to prevent duplicate work. */
 	smp_mb(); /* Keep increment after above mechanism. */
-	ACCESS_ONCE(rsp->n_barrier_done)++;
+	ACCESS_ONCE(rsp->n_barrier_done) = rsp->n_barrier_done + 1;
 	WARN_ON_ONCE((rsp->n_barrier_done & 0x1) != 0);
 	_rcu_barrier_trace(rsp, "Inc2", -1, rsp->n_barrier_done);
 	smp_mb(); /* Keep increment before caller's subsequent code. */
-- 
1.8.1.5


* [PATCH tip/core/rcu 07/17] rcu: Loosen __call_rcu()'s rcu_head alignment constraint
  2014-07-07 22:38 ` [PATCH tip/core/rcu 01/17] rcu: Document deadlock-avoidance information for rcu_read_unlock() Paul E. McKenney
                     ` (4 preceding siblings ...)
  2014-07-07 22:38   ` [PATCH tip/core/rcu 06/17] rcu: Eliminate read-modify-write ACCESS_ONCE() calls Paul E. McKenney
@ 2014-07-07 22:38   ` Paul E. McKenney
  2014-07-07 22:38   ` [PATCH tip/core/rcu 08/17] rcu: Allow post-unlock reference for rt_mutex Paul E. McKenney
                     ` (9 subsequent siblings)
  15 siblings, 0 replies; 54+ messages in thread
From: Paul E. McKenney @ 2014-07-07 22:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg, sbw,
	Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

The m68k architecture aligns only to 16-bit boundaries, which can cause
the align-to-32-bits check in __call_rcu() to trigger.  Because there is
currently no known potential need for more than one low-order bit, this
commit loosens the check to 16-bit boundaries.
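
For example, an rcu_head at a (hypothetical) 16-bit-aligned address
such as 0x1002 trips the old check but passes the new one:

	0x1002 & 0x3 == 0x2	/* old check: WARN fires.   */
	0x1002 & 0x1 == 0x0	/* new check: WARN silent.  */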

Reported-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 6bf7daebcc6b..bcd635e42841 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2662,7 +2662,7 @@ __call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu),
 	unsigned long flags;
 	struct rcu_data *rdp;
 
-	WARN_ON_ONCE((unsigned long)head & 0x3); /* Misaligned rcu_head! */
+	WARN_ON_ONCE((unsigned long)head & 0x1); /* Misaligned rcu_head! */
 	if (debug_rcu_head_queue(head)) {
 		/* Probable double call_rcu(), so leak the callback. */
 		ACCESS_ONCE(head->func) = rcu_leak_callback;
-- 
1.8.1.5


* [PATCH tip/core/rcu 08/17] rcu: Allow post-unlock reference for rt_mutex
  2014-07-07 22:38 ` [PATCH tip/core/rcu 01/17] rcu: Document deadlock-avoidance information for rcu_read_unlock() Paul E. McKenney
                     ` (5 preceding siblings ...)
  2014-07-07 22:38   ` [PATCH tip/core/rcu 07/17] rcu: Loosen __call_rcu()'s rcu_head alignment constraint Paul E. McKenney
@ 2014-07-07 22:38   ` Paul E. McKenney
  2014-07-09  1:50     ` Lai Jiangshan
  2014-07-07 22:38   ` [PATCH tip/core/rcu 09/17] rcu: Check both root and current rcu_node when setting up future grace period Paul E. McKenney
                     ` (8 subsequent siblings)
  15 siblings, 1 reply; 54+ messages in thread
From: Paul E. McKenney @ 2014-07-07 22:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg, sbw,
	Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

The current approach to RCU priority boosting uses an rt_mutex strictly
for its priority-boosting side effects.  The rt_mutex_init_proxy_locked()
function is used by the booster to initialize the lock as held by the
boostee.  The booster then uses rt_mutex_lock() to acquire this rt_mutex,
which priority-boosts the boostee.  When the boostee reaches the end
of its outermost RCU read-side critical section, it checks a field in
its task structure to see whether it has been boosted, and, if so, uses
rt_mutex_unlock() to release the rt_mutex.  The booster can then go on
to boost the next task that is blocking the current RCU grace period.

But reasonable implementations of rt_mutex_unlock() might result in the
boostee referencing the rt_mutex's data after releasing it.  However, the
booster might have re-initialized the rt_mutex between the time that the
boostee released it and the time that it later referenced it.  This is
clearly asking for trouble, so this commit introduces a completion that
forces the booster to wait until the boostee has completely finished with
the rt_mutex, thus avoiding the case where the booster re-initializes
the rt_mutex before the boostee's last reference to it has been dropped.
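
For illustration, a simplified sketch of the interleaving being closed
off (t and t2 are successive boostees; this is not code from the patch):

	Booster					Boostee t
	-------					---------
	rt_mutex_init_proxy_locked(&mtx, t);
	rt_mutex_lock(&mtx);	/* blocks, boosting t */
						rt_mutex_unlock(&mtx);
	/* rt_mutex_lock() returns. */
	rt_mutex_init_proxy_locked(&mtx, t2);
						/* ...but t may still be  */
						/*  referencing mtx here. */

With this commit, the booster's wait_for_completion() holds off the
re-initialization until the boostee's complete() signals that it is
entirely done with the rt_mutex.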

This of course does introduce some overhead, but the priority-boosting
code paths are miles from any possible fastpath, and the overhead of
executing the completion will normally be quite small compared to the
overhead of priority boosting and deboosting, so this should be OK.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.h        | 5 +++++
 kernel/rcu/tree_plugin.h | 8 +++++++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 0f69a79c5b7d..3eeb919e26a2 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -172,6 +172,11 @@ struct rcu_node {
 				/*  queued on this rcu_node structure that */
 				/*  are blocking the current grace period, */
 				/*  there can be no such task. */
+	struct completion boost_completion;
+				/* Used to ensure that the rt_mutex used */
+				/*  to carry out the boosting is fully */
+				/*  released with no future boostee accesses */
+				/*  before that rt_mutex is re-initialized. */
 	unsigned long boost_time;
 				/* When to start boosting (jiffies). */
 	struct task_struct *boost_kthread_task;
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 5da9f9b3abc9..9c811879d31e 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -427,8 +427,10 @@ void rcu_read_unlock_special(struct task_struct *t)
 
 #ifdef CONFIG_RCU_BOOST
 		/* Unboost if we were boosted. */
-		if (rbmp)
+		if (rbmp) {
 			rt_mutex_unlock(rbmp);
+			complete(&rnp->boost_completion);
+		}
 #endif /* #ifdef CONFIG_RCU_BOOST */
 
 		/*
@@ -1202,10 +1204,14 @@ static int rcu_boost(struct rcu_node *rnp)
 	t = container_of(tb, struct task_struct, rcu_node_entry);
 	rt_mutex_init_proxy_locked(&mtx, t);
 	t->rcu_boost_mutex = &mtx;
+	init_completion(&rnp->boost_completion);
 	raw_spin_unlock_irqrestore(&rnp->lock, flags);
 	rt_mutex_lock(&mtx);  /* Side effect: boosts task t's priority. */
 	rt_mutex_unlock(&mtx);  /* Keep lockdep happy. */
 
+	/* Wait until boostee is done accessing mtx before reinitializing. */
+	wait_for_completion(&rnp->boost_completion);
+
 	return ACCESS_ONCE(rnp->exp_tasks) != NULL ||
 	       ACCESS_ONCE(rnp->boost_tasks) != NULL;
 }
-- 
1.8.1.5


* [PATCH tip/core/rcu 09/17] rcu: Check both root and current rcu_node when setting up future grace period
  2014-07-07 22:38 ` [PATCH tip/core/rcu 01/17] rcu: Document deadlock-avoidance information for rcu_read_unlock() Paul E. McKenney
                     ` (6 preceding siblings ...)
  2014-07-07 22:38   ` [PATCH tip/core/rcu 08/17] rcu: Allow post-unlock reference for rt_mutex Paul E. McKenney
@ 2014-07-07 22:38   ` Paul E. McKenney
  2014-07-07 22:38   ` [PATCH tip/core/rcu 10/17] rcu: Simplify priority boosting by putting rt_mutex in rcu_node Paul E. McKenney
                     ` (7 subsequent siblings)
  15 siblings, 0 replies; 54+ messages in thread
From: Paul E. McKenney @ 2014-07-07 22:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg, sbw,
	Pranith Kumar, Paul E. McKenney

From: Pranith Kumar <bobby.prani@gmail.com>

The rcu_start_future_gp() function checks the current rcu_node's ->gpnum
and ->completed twice, once without ACCESS_ONCE() and once with it,
which is pointless because we hold that rcu_node's ->lock at that point.
The intent was to check the current rcu_node structure and the root
rcu_node structure, the latter locklessly with ACCESS_ONCE().  This
commit therefore makes that change.

The reason that it is safe to locklessly check the root rcu_node's
->gpnum and ->completed fields is that we hold the current rcu_node's
->lock, which constrains the root rcu_node's ability to change its
->gpnum and ->completed fields.  Of course, if there is a single rcu_node
structure, then rnp_root==rnp, and holding the lock prevents all changes.
If there is more than one rcu_node structure, then the code updates the
fields in the following order:

1.	Increment rnp_root->gpnum to start new grace period.
2.	Increment rnp->gpnum to initialize the current rcu_node,
	continuing initialization for the new grace period.
3.	Increment rnp_root->completed to end the current grace period.
4.	Increment rnp->completed to continue cleaning up after the
	old grace period.

So there are four possible combinations of relative values of these
four fields:

N   N   N   N:  RCU idle, new grace period must be initiated.
		Although rnp_root->gpnum might be incremented immediately
		after we check, that will just result in unnecessary work:
		the grace period has already started, and we merely make
		a redundant attempt to start it.

N+1 N   N   N:  RCU grace period just started.  No further change is
		possible because we hold rnp->lock, so the checks of
		rnp_root->gpnum and rnp_root->completed are stable.
		We know that our request for a future grace period will
		be seen during grace-period cleanup.

N+1 N   N+1 N:  RCU grace period is ongoing.  Because rnp->gpnum is
		different than rnp->completed, we won't even look at
		rnp_root->gpnum and rnp_root->completed, so the possible
		concurrent change to rnp_root->completed does not matter.
		We know that our request for a future grace period will
		be seen during grace-period cleanup, which cannot pass
		this rcu_node because we hold its ->lock.

N+1 N+1 N+1 N:  RCU grace period has ended, but not yet been cleaned up.
		Because rnp->gpnum is different than rnp->completed, we
		won't look at rnp_root->gpnum and rnp_root->completed, so
		the possible concurrent change to rnp_root->completed does
		not matter.  We know that our request for a future grace
		period will be seen during grace-period cleanup, which
		cannot pass this rcu_node because we hold its ->lock.

Therefore, despite initial appearances, the lockless check is safe.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
[ paulmck: Update comment to say why the lockless check is safe. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index bcd635e42841..3f93033d3c61 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1305,10 +1305,16 @@ rcu_start_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 	 * believe that a grace period is in progress, then we must wait
 	 * for the one following, which is in "c".  Because our request
 	 * will be noticed at the end of the current grace period, we don't
-	 * need to explicitly start one.
+	 * need to explicitly start one.  We only do the lockless check
+	 * of rnp_root's fields if the current rcu_node structure thinks
+	 * there is no grace period in flight, and because we hold rnp->lock,
+	 * the only possible change is when rnp_root's two fields are
+	 * equal, in which case rnp_root->gpnum might be concurrently
+	 * incremented.  But that is OK, as it will just result in our
+	 * doing some extra useless work.
 	 */
 	if (rnp->gpnum != rnp->completed ||
-	    ACCESS_ONCE(rnp->gpnum) != ACCESS_ONCE(rnp->completed)) {
+	    ACCESS_ONCE(rnp_root->gpnum) != ACCESS_ONCE(rnp_root->completed)) {
 		rnp->need_future_gp[c & 0x1]++;
 		trace_rcu_future_gp(rnp, rdp, c, TPS("Startedleaf"));
 		goto out;
-- 
1.8.1.5


* [PATCH tip/core/rcu 10/17] rcu: Simplify priority boosting by putting rt_mutex in rcu_node
  2014-07-07 22:38 ` [PATCH tip/core/rcu 01/17] rcu: Document deadlock-avoidance information for rcu_read_unlock() Paul E. McKenney
                     ` (7 preceding siblings ...)
  2014-07-07 22:38   ` [PATCH tip/core/rcu 09/17] rcu: Check both root and current rcu_node when setting up future grace period Paul E. McKenney
@ 2014-07-07 22:38   ` Paul E. McKenney
  2014-07-07 22:38   ` [PATCH tip/core/rcu 11/17] rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs Paul E. McKenney
                     ` (6 subsequent siblings)
  15 siblings, 0 replies; 54+ messages in thread
From: Paul E. McKenney @ 2014-07-07 22:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg, sbw,
	Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

RCU priority boosting currently checks for boosting via a pointer in
task_struct.  However, this is not needed: As Oleg noted, if the
rt_mutex is placed in the rcu_node instead of on the booster's stack,
the boostee can simply check it to see if it owns the lock.  This commit
makes this change, shrinking task_struct by one pointer and the kernel
by thirteen lines.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/init_task.h |  9 +--------
 include/linux/sched.h     |  6 ------
 kernel/rcu/tree.h         |  3 +++
 kernel/rcu/tree_plugin.h  | 25 +++++++++++--------------
 4 files changed, 15 insertions(+), 28 deletions(-)

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 6df7f9fe0d01..2bb4c4f3531a 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -102,12 +102,6 @@ extern struct group_info init_groups;
 #define INIT_IDS
 #endif
 
-#ifdef CONFIG_RCU_BOOST
-#define INIT_TASK_RCU_BOOST()						\
-	.rcu_boost_mutex = NULL,
-#else
-#define INIT_TASK_RCU_BOOST()
-#endif
 #ifdef CONFIG_TREE_PREEMPT_RCU
 #define INIT_TASK_RCU_TREE_PREEMPT()					\
 	.rcu_blocked_node = NULL,
@@ -119,8 +113,7 @@ extern struct group_info init_groups;
 	.rcu_read_lock_nesting = 0,					\
 	.rcu_read_unlock_special = 0,					\
 	.rcu_node_entry = LIST_HEAD_INIT(tsk.rcu_node_entry),		\
-	INIT_TASK_RCU_TREE_PREEMPT()					\
-	INIT_TASK_RCU_BOOST()
+	INIT_TASK_RCU_TREE_PREEMPT()
 #else
 #define INIT_TASK_RCU_PREEMPT(tsk)
 #endif
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 306f4f0c987a..3cfbc05e66e6 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1270,9 +1270,6 @@ struct task_struct {
 #ifdef CONFIG_TREE_PREEMPT_RCU
 	struct rcu_node *rcu_blocked_node;
 #endif /* #ifdef CONFIG_TREE_PREEMPT_RCU */
-#ifdef CONFIG_RCU_BOOST
-	struct rt_mutex *rcu_boost_mutex;
-#endif /* #ifdef CONFIG_RCU_BOOST */
 
 #if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT)
 	struct sched_info sched_info;
@@ -2009,9 +2006,6 @@ static inline void rcu_copy_process(struct task_struct *p)
 #ifdef CONFIG_TREE_PREEMPT_RCU
 	p->rcu_blocked_node = NULL;
 #endif /* #ifdef CONFIG_TREE_PREEMPT_RCU */
-#ifdef CONFIG_RCU_BOOST
-	p->rcu_boost_mutex = NULL;
-#endif /* #ifdef CONFIG_RCU_BOOST */
 	INIT_LIST_HEAD(&p->rcu_node_entry);
 }
 
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 3eeb919e26a2..60fb0eaa2d16 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -177,6 +177,9 @@ struct rcu_node {
 				/*  to carry out the boosting is fully */
 				/*  released with no future boostee accesses */
 				/*  before that rt_mutex is re-initialized. */
+	struct rt_mutex boost_mtx;
+				/* Used only for the priority-boosting */
+				/*  side effect, not as a lock. */
 	unsigned long boost_time;
 				/* When to start boosting (jiffies). */
 	struct task_struct *boost_kthread_task;
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 9c811879d31e..719587af7b10 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -33,6 +33,7 @@
 #define RCU_KTHREAD_PRIO 1
 
 #ifdef CONFIG_RCU_BOOST
+#include "../locking/rtmutex_common.h"
 #define RCU_BOOST_PRIO CONFIG_RCU_BOOST_PRIO
 #else
 #define RCU_BOOST_PRIO RCU_KTHREAD_PRIO
@@ -336,7 +337,7 @@ void rcu_read_unlock_special(struct task_struct *t)
 	unsigned long flags;
 	struct list_head *np;
 #ifdef CONFIG_RCU_BOOST
-	struct rt_mutex *rbmp = NULL;
+	bool drop_boost_mutex = false;
 #endif /* #ifdef CONFIG_RCU_BOOST */
 	struct rcu_node *rnp;
 	int special;
@@ -398,11 +399,8 @@ void rcu_read_unlock_special(struct task_struct *t)
 #ifdef CONFIG_RCU_BOOST
 		if (&t->rcu_node_entry == rnp->boost_tasks)
 			rnp->boost_tasks = np;
-		/* Snapshot/clear ->rcu_boost_mutex with rcu_node lock held. */
-		if (t->rcu_boost_mutex) {
-			rbmp = t->rcu_boost_mutex;
-			t->rcu_boost_mutex = NULL;
-		}
+		/* Snapshot ->boost_mtx ownership with rcu_node lock held. */
+		drop_boost_mutex = rt_mutex_owner(&rnp->boost_mtx) == t;
 #endif /* #ifdef CONFIG_RCU_BOOST */
 
 		/*
@@ -427,8 +425,8 @@ void rcu_read_unlock_special(struct task_struct *t)
 
 #ifdef CONFIG_RCU_BOOST
 		/* Unboost if we were boosted. */
-		if (rbmp) {
-			rt_mutex_unlock(rbmp);
+		if (drop_boost_mutex) {
+			rt_mutex_unlock(&rnp->boost_mtx);
 			complete(&rnp->boost_completion);
 		}
 #endif /* #ifdef CONFIG_RCU_BOOST */
@@ -1151,7 +1149,6 @@ static void rcu_wake_cond(struct task_struct *t, int status)
 static int rcu_boost(struct rcu_node *rnp)
 {
 	unsigned long flags;
-	struct rt_mutex mtx;
 	struct task_struct *t;
 	struct list_head *tb;
 
@@ -1202,14 +1199,14 @@ static int rcu_boost(struct rcu_node *rnp)
 	 * section.
 	 */
 	t = container_of(tb, struct task_struct, rcu_node_entry);
-	rt_mutex_init_proxy_locked(&mtx, t);
-	t->rcu_boost_mutex = &mtx;
+	rt_mutex_init_proxy_locked(&rnp->boost_mtx, t);
 	init_completion(&rnp->boost_completion);
 	raw_spin_unlock_irqrestore(&rnp->lock, flags);
-	rt_mutex_lock(&mtx);  /* Side effect: boosts task t's priority. */
-	rt_mutex_unlock(&mtx);  /* Keep lockdep happy. */
+	/* Lock only for side effect: boosts task t's priority. */
+	rt_mutex_lock(&rnp->boost_mtx);
+	rt_mutex_unlock(&rnp->boost_mtx);  /* Then keep lockdep happy. */
 
-	/* Wait until boostee is done accessing mtx before reinitializing. */
+	/* Wait for boostee to be done w/boost_mtx before reinitializing. */
 	wait_for_completion(&rnp->boost_completion);
 
 	return ACCESS_ONCE(rnp->exp_tasks) != NULL ||
-- 
1.8.1.5


* [PATCH tip/core/rcu 11/17] rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs
  2014-07-07 22:38 ` [PATCH tip/core/rcu 01/17] rcu: Document deadlock-avoidance information for rcu_read_unlock() Paul E. McKenney
                     ` (8 preceding siblings ...)
  2014-07-07 22:38   ` [PATCH tip/core/rcu 10/17] rcu: Simplify priority boosting by putting rt_mutex in rcu_node Paul E. McKenney
@ 2014-07-07 22:38   ` Paul E. McKenney
  2014-07-08 15:24     ` Frederic Weisbecker
  2014-07-07 22:38   ` [PATCH tip/core/rcu 12/17] rcu: Don't use NMIs to dump other CPUs' stacks Paul E. McKenney
                     ` (5 subsequent siblings)
  15 siblings, 1 reply; 54+ messages in thread
From: Paul E. McKenney @ 2014-07-07 22:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg, sbw,
	Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

Binding the grace-period kthreads to the timekeeping CPU resulted in
significant performance decreases for some workloads.  For more detail,
see:

https://lkml.org/lkml/2014/6/3/395 for benchmark numbers

https://lkml.org/lkml/2014/6/4/218 for CPU statistics

It turns out that it is necessary to bind the grace-period kthreads
to the timekeeping CPU only when all CPUs other than CPU 0 are nohz_full
CPUs on the one hand, or when CONFIG_NO_HZ_FULL_SYSIDLE=y on the other.
In other cases, it suffices to bind the grace-period kthreads to the
set of non-nohz_full CPUs.

This commit therefore creates a tick_nohz_not_full_mask that is the
complement of tick_nohz_full_mask, and then binds the grace-period
kthread to the set of CPUs indicated by this new mask, which covers
the CONFIG_NO_HZ_FULL_SYSIDLE=n case.  The CONFIG_NO_HZ_FULL_SYSIDLE=y
case still binds the grace-period kthreads to the timekeeping CPU.
This commit also includes the tick_nohz_full_enabled() check suggested
by Frederic Weisbecker.

Reported-by: Jet Chen <jet.chen@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Created housekeeping_affine() per fweisbec feedback. ]
---
 include/linux/tick.h     | 19 +++++++++++++++++++
 kernel/rcu/tree_plugin.h | 14 +++++++++-----
 kernel/time/tick-sched.c | 10 ++++++++++
 3 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index b84773cb9f4c..c39af3261351 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -12,6 +12,7 @@
 #include <linux/hrtimer.h>
 #include <linux/context_tracking_state.h>
 #include <linux/cpumask.h>
+#include <linux/sched.h>
 
 #ifdef CONFIG_GENERIC_CLOCKEVENTS
 
@@ -162,6 +163,7 @@ static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }
 #ifdef CONFIG_NO_HZ_FULL
 extern bool tick_nohz_full_running;
 extern cpumask_var_t tick_nohz_full_mask;
+extern cpumask_var_t tick_nohz_not_full_mask;
 
 static inline bool tick_nohz_full_enabled(void)
 {
@@ -194,6 +196,23 @@ static inline void tick_nohz_full_kick_all(void) { }
 static inline void __tick_nohz_task_switch(struct task_struct *tsk) { }
 #endif
 
+static inline bool is_housekeeping_cpu(int cpu)
+{
+#ifdef CONFIG_NO_HZ_FULL
+	if (tick_nohz_full_enabled())
+		return cpumask_test_cpu(cpu, tick_nohz_not_full_mask);
+#endif
+	return true;
+}
+
+static inline void housekeeping_affine(struct task_struct *t)
+{
+#ifdef CONFIG_NO_HZ_FULL
+	if (tick_nohz_full_enabled())
+		set_cpus_allowed_ptr(t, tick_nohz_not_full_mask);
+#endif
+}
+
 static inline void tick_nohz_full_check(void)
 {
 	if (tick_nohz_full_enabled())
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 719587af7b10..b39ba7239bd6 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2846,12 +2846,16 @@ static bool rcu_nohz_full_cpu(struct rcu_state *rsp)
  */
 static void rcu_bind_gp_kthread(void)
 {
-#ifdef CONFIG_NO_HZ_FULL
-	int cpu = tick_do_timer_cpu;
+	int __maybe_unused cpu;
 
-	if (cpu < 0 || cpu >= nr_cpu_ids)
+	if (!tick_nohz_full_enabled())
 		return;
-	if (raw_smp_processor_id() != cpu)
+#ifdef CONFIG_NO_HZ_FULL_SYSIDLE
+	cpu = tick_do_timer_cpu;
+	if (cpu >= 0 && cpu < nr_cpu_ids && raw_smp_processor_id() != cpu)
 		set_cpus_allowed_ptr(current, cpumask_of(cpu));
-#endif /* #ifdef CONFIG_NO_HZ_FULL */
+#else /* #ifdef CONFIG_NO_HZ_FULL_SYSIDLE */
+	if (!is_housekeeping_cpu(raw_smp_processor_id()))
+		housekeeping_affine(current);
+#endif /* #else #ifdef CONFIG_NO_HZ_FULL_SYSIDLE */
 }
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 6558b7ac112d..e023134d63a1 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -154,6 +154,7 @@ static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs)
 
 #ifdef CONFIG_NO_HZ_FULL
 cpumask_var_t tick_nohz_full_mask;
+cpumask_var_t tick_nohz_not_full_mask;
 bool tick_nohz_full_running;
 
 static bool can_stop_full_tick(void)
@@ -281,6 +282,7 @@ static int __init tick_nohz_full_setup(char *str)
 	int cpu;
 
 	alloc_bootmem_cpumask_var(&tick_nohz_full_mask);
+	alloc_bootmem_cpumask_var(&tick_nohz_not_full_mask);
 	if (cpulist_parse(str, tick_nohz_full_mask) < 0) {
 		pr_warning("NOHZ: Incorrect nohz_full cpumask\n");
 		return 1;
@@ -291,6 +293,8 @@ static int __init tick_nohz_full_setup(char *str)
 		pr_warning("NO_HZ: Clearing %d from nohz_full range for timekeeping\n", cpu);
 		cpumask_clear_cpu(cpu, tick_nohz_full_mask);
 	}
+	cpumask_andnot(tick_nohz_not_full_mask,
+		       cpu_possible_mask, tick_nohz_full_mask);
 	tick_nohz_full_running = true;
 
 	return 1;
@@ -332,9 +336,15 @@ static int tick_nohz_init_all(void)
 		pr_err("NO_HZ: Can't allocate full dynticks cpumask\n");
 		return err;
 	}
+	if (!alloc_cpumask_var(&tick_nohz_not_full_mask, GFP_KERNEL)) {
+		pr_err("NO_HZ: Can't allocate not-full dynticks cpumask\n");
+		return err;
+	}
 	err = 0;
 	cpumask_setall(tick_nohz_full_mask);
 	cpumask_clear_cpu(smp_processor_id(), tick_nohz_full_mask);
+	cpumask_clear(tick_nohz_not_full_mask);
+	cpumask_set_cpu(smp_processor_id(), tick_nohz_not_full_mask);
 	tick_nohz_full_running = true;
 #endif
 	return err;
-- 
1.8.1.5


* [PATCH tip/core/rcu 12/17] rcu: Don't use NMIs to dump other CPUs' stacks
  2014-07-07 22:38 ` [PATCH tip/core/rcu 01/17] rcu: Document deadlock-avoidance information for rcu_read_unlock() Paul E. McKenney
                     ` (9 preceding siblings ...)
  2014-07-07 22:38   ` [PATCH tip/core/rcu 11/17] rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs Paul E. McKenney
@ 2014-07-07 22:38   ` Paul E. McKenney
  2014-07-07 22:38   ` [PATCH tip/core/rcu 13/17] rcu: Use __this_cpu_read() instead of per_cpu_ptr() Paul E. McKenney
                     ` (4 subsequent siblings)
  15 siblings, 0 replies; 54+ messages in thread
From: Paul E. McKenney @ 2014-07-07 22:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg, sbw,
	Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

Although NMI-based stack dumps are in principle more accurate, they are
also more likely to trigger deadlocks.  This commit therefore replaces
all uses of trigger_all_cpu_backtrace() with rcu_dump_cpu_stacks(), so
that the CPU detecting an RCU CPU stall does the stack dumping.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 3f93033d3c61..8f3e4d43d736 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1013,10 +1013,7 @@ static void record_gp_stall_check_time(struct rcu_state *rsp)
 }
 
 /*
- * Dump stacks of all tasks running on stalled CPUs.  This is a fallback
- * for architectures that do not implement trigger_all_cpu_backtrace().
- * The NMI-triggered stack traces are more accurate because they are
- * printed by the target CPU.
+ * Dump stacks of all tasks running on stalled CPUs.
  */
 static void rcu_dump_cpu_stacks(struct rcu_state *rsp)
 {
@@ -1094,7 +1091,7 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
 	       (long)rsp->gpnum, (long)rsp->completed, totqlen);
 	if (ndetected == 0)
 		pr_err("INFO: Stall ended before state dump start\n");
-	else if (!trigger_all_cpu_backtrace())
+	else
 		rcu_dump_cpu_stacks(rsp);
 
 	/* Complain about tasks blocking the grace period. */
@@ -1125,8 +1122,7 @@ static void print_cpu_stall(struct rcu_state *rsp)
 	pr_cont(" (t=%lu jiffies g=%ld c=%ld q=%lu)\n",
 		jiffies - rsp->gp_start,
 		(long)rsp->gpnum, (long)rsp->completed, totqlen);
-	if (!trigger_all_cpu_backtrace())
-		dump_stack();
+	rcu_dump_cpu_stacks(rsp);
 
 	raw_spin_lock_irqsave(&rnp->lock, flags);
 	if (ULONG_CMP_GE(jiffies, ACCESS_ONCE(rsp->jiffies_stall)))
-- 
1.8.1.5


* [PATCH tip/core/rcu 13/17] rcu: Use __this_cpu_read() instead of per_cpu_ptr()
  2014-07-07 22:38 ` [PATCH tip/core/rcu 01/17] rcu: Document deadlock-avoidance information for rcu_read_unlock() Paul E. McKenney
                     ` (10 preceding siblings ...)
  2014-07-07 22:38   ` [PATCH tip/core/rcu 12/17] rcu: Don't use NMIs to dump other CPUs' stacks Paul E. McKenney
@ 2014-07-07 22:38   ` Paul E. McKenney
  2014-07-07 22:38   ` [PATCH tip/core/rcu 14/17] rcu: remove CONFIG_PROVE_RCU_DELAY Paul E. McKenney
                     ` (3 subsequent siblings)
  15 siblings, 0 replies; 54+ messages in thread
From: Paul E. McKenney @ 2014-07-07 22:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg, sbw,
	Paul E. McKenney, Shan Wei, Pranith Kumar

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

The __this_cpu_read() function produces better code than does
per_cpu_ptr() on both ARM and x86.  For example, gcc (Ubuntu/Linaro
4.7.3-12ubuntu1) 4.7.3 produces the following:

ARMv7 per_cpu_ptr():

force_quiescent_state:
    mov    r3, sp    @,
    bic    r1, r3, #8128    @ tmp171,,
    ldr    r2, .L98    @ tmp169,
    bic    r1, r1, #63    @ tmp170, tmp171,
    ldr    r3, [r0, #220]    @ __ptr, rsp_6(D)->rda
    ldr    r1, [r1, #20]    @ D.35903_68->cpu, D.35903_68->cpu
    mov    r6, r0    @ rsp, rsp
    ldr    r2, [r2, r1, asl #2]    @ tmp173, __per_cpu_offset
    add    r3, r3, r2    @ tmp175, __ptr, tmp173
    ldr    r5, [r3, #12]    @ rnp_old, D.29162_13->mynode

ARMv7 __this_cpu_read():

force_quiescent_state:
    ldr    r3, [r0, #220]    @ rsp_7(D)->rda, rsp_7(D)->rda
    mov    r6, r0    @ rsp, rsp
    add    r3, r3, #12    @ __ptr, rsp_7(D)->rda,
    ldr    r5, [r2, r3]    @ rnp_old, *D.29176_13

Using gcc 4.8.2:

x86_64 per_cpu_ptr():

    movl %gs:cpu_number,%edx    # cpu_number, pscr_ret__
    movslq    %edx, %rdx    # pscr_ret__, pscr_ret__
    movq    __per_cpu_offset(,%rdx,8), %rdx    # __per_cpu_offset, tmp93
    movq    %rdi, %r13    # rsp, rsp
    movq    1000(%rdi), %rax    # rsp_9(D)->rda, __ptr
    movq    24(%rdx,%rax), %r12    # _15->mynode, rnp_old

x86_64 __this_cpu_read():

    movq    %rdi, %r13    # rsp, rsp
    movq    1000(%rdi), %rax    # rsp_9(D)->rda, rsp_9(D)->rda
    movq %gs:24(%rax),%r12    # _10->mynode, rnp_old

Because __this_cpu_read() produces significantly better code on these
two very different architectures, this commit makes the change.

Signed-off-by: Shan Wei <davidshan@tencent.com>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
---
 kernel/rcu/tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8f3e4d43d736..a6c5424ffa38 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2487,7 +2487,7 @@ static void force_quiescent_state(struct rcu_state *rsp)
 	struct rcu_node *rnp_old = NULL;
 
 	/* Funnel through hierarchy to reduce memory contention. */
-	rnp = per_cpu_ptr(rsp->rda, raw_smp_processor_id())->mynode;
+	rnp = __this_cpu_read(rsp->rda->mynode);
 	for (; rnp != NULL; rnp = rnp->parent) {
 		ret = (ACCESS_ONCE(rsp->gp_flags) & RCU_GP_FLAG_FQS) ||
 		      !raw_spin_trylock(&rnp->fqslock);
-- 
1.8.1.5


* [PATCH tip/core/rcu 14/17] rcu: remove CONFIG_PROVE_RCU_DELAY
  2014-07-07 22:38 ` [PATCH tip/core/rcu 01/17] rcu: Document deadlock-avoidance information for rcu_read_unlock() Paul E. McKenney
                     ` (11 preceding siblings ...)
  2014-07-07 22:38   ` [PATCH tip/core/rcu 13/17] rcu: Use __this_cpu_read() instead of per_cpu_ptr() Paul E. McKenney
@ 2014-07-07 22:38   ` Paul E. McKenney
  2014-07-08  8:11     ` Paul Bolle
  2014-07-07 22:38   ` [PATCH tip/core/rcu 15/17] rcu: Fix __rcu_reclaim() to use true/false for bool Paul E. McKenney
                     ` (2 subsequent siblings)
  15 siblings, 1 reply; 54+ messages in thread
From: Paul E. McKenney @ 2014-07-07 22:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg, sbw,
	Paul E. McKenney, Andi Kleen

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

The CONFIG_PROVE_RCU_DELAY Kconfig parameter doesn't appear to be very
effective at finding race conditions, so this commit removes it.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
---
 kernel/rcu/tree.c                                       | 5 -----
 kernel/rcu/update.c                                     | 3 ---
 tools/testing/selftests/rcutorture/configs/rcu/TREE01   | 1 -
 tools/testing/selftests/rcutorture/configs/rcu/TREE02   | 1 -
 tools/testing/selftests/rcutorture/configs/rcu/TREE02-T | 1 -
 tools/testing/selftests/rcutorture/configs/rcu/TREE03   | 1 -
 tools/testing/selftests/rcutorture/configs/rcu/TREE04   | 1 -
 tools/testing/selftests/rcutorture/configs/rcu/TREE05   | 1 -
 tools/testing/selftests/rcutorture/configs/rcu/TREE06   | 1 -
 tools/testing/selftests/rcutorture/configs/rcu/TREE07   | 1 -
 tools/testing/selftests/rcutorture/configs/rcu/TREE08   | 1 -
 tools/testing/selftests/rcutorture/configs/rcu/TREE08-T | 1 -
 tools/testing/selftests/rcutorture/configs/rcu/TREE09   | 1 -
 13 files changed, 19 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index a6c5424ffa38..1b70cb6fbe3c 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1647,11 +1647,6 @@ static int rcu_gp_init(struct rcu_state *rsp)
 					    rnp->level, rnp->grplo,
 					    rnp->grphi, rnp->qsmask);
 		raw_spin_unlock_irq(&rnp->lock);
-#ifdef CONFIG_PROVE_RCU_DELAY
-		if ((prandom_u32() % (rcu_num_nodes + 1)) == 0 &&
-		    system_state == SYSTEM_RUNNING)
-			udelay(200);
-#endif /* #ifdef CONFIG_PROVE_RCU_DELAY */
 		cond_resched();
 	}
 
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index bc7883570530..4056d7992a6c 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -90,9 +90,6 @@ void __rcu_read_unlock(void)
 	} else {
 		barrier();  /* critical section before exit code. */
 		t->rcu_read_lock_nesting = INT_MIN;
-#ifdef CONFIG_PROVE_RCU_DELAY
-		udelay(10); /* Make preemption more probable. */
-#endif /* #ifdef CONFIG_PROVE_RCU_DELAY */
 		barrier();  /* assign before ->rcu_read_unlock_special load */
 		if (unlikely(ACCESS_ONCE(t->rcu_read_unlock_special)))
 			rcu_read_unlock_special(t);
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE01 b/tools/testing/selftests/rcutorture/configs/rcu/TREE01
index 9c827ec59a97..063b7079c621 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE01
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE01
@@ -15,7 +15,6 @@ CONFIG_RCU_FANOUT_EXACT=n
 CONFIG_RCU_NOCB_CPU=y
 CONFIG_RCU_NOCB_CPU_ZERO=y
 CONFIG_DEBUG_LOCK_ALLOC=n
-CONFIG_PROVE_RCU_DELAY=n
 CONFIG_RCU_CPU_STALL_INFO=n
 CONFIG_RCU_CPU_STALL_VERBOSE=n
 CONFIG_RCU_BOOST=n
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE02 b/tools/testing/selftests/rcutorture/configs/rcu/TREE02
index 1a777b5f68b5..ea119ba2f7d4 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE02
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE02
@@ -18,7 +18,6 @@ CONFIG_RCU_FANOUT_EXACT=n
 CONFIG_RCU_NOCB_CPU=n
 CONFIG_DEBUG_LOCK_ALLOC=y
 CONFIG_PROVE_LOCKING=n
-CONFIG_PROVE_RCU_DELAY=n
 CONFIG_RCU_CPU_STALL_INFO=n
 CONFIG_RCU_CPU_STALL_VERBOSE=y
 CONFIG_RCU_BOOST=n
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE02-T b/tools/testing/selftests/rcutorture/configs/rcu/TREE02-T
index 61c8d9ce5bb2..19cf9485f48a 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE02-T
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE02-T
@@ -18,7 +18,6 @@ CONFIG_RCU_FANOUT_EXACT=n
 CONFIG_RCU_NOCB_CPU=n
 CONFIG_DEBUG_LOCK_ALLOC=y
 CONFIG_PROVE_LOCKING=n
-CONFIG_PROVE_RCU_DELAY=n
 CONFIG_RCU_CPU_STALL_INFO=n
 CONFIG_RCU_CPU_STALL_VERBOSE=y
 CONFIG_RCU_BOOST=n
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE03 b/tools/testing/selftests/rcutorture/configs/rcu/TREE03
index c1f111c1561b..f4567fb3e332 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE03
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE03
@@ -14,7 +14,6 @@ CONFIG_RCU_FANOUT_LEAF=4
 CONFIG_RCU_FANOUT_EXACT=n
 CONFIG_RCU_NOCB_CPU=n
 CONFIG_DEBUG_LOCK_ALLOC=n
-CONFIG_PROVE_RCU_DELAY=n
 CONFIG_RCU_CPU_STALL_INFO=n
 CONFIG_RCU_CPU_STALL_VERBOSE=n
 CONFIG_RCU_BOOST=y
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE04 b/tools/testing/selftests/rcutorture/configs/rcu/TREE04
index 7dbd27ce17a4..0a262fbb0c12 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE04
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE04
@@ -18,7 +18,6 @@ CONFIG_RCU_FANOUT_LEAF=2
 CONFIG_RCU_FANOUT_EXACT=n
 CONFIG_RCU_NOCB_CPU=n
 CONFIG_DEBUG_LOCK_ALLOC=n
-CONFIG_PROVE_RCU_DELAY=n
 CONFIG_RCU_CPU_STALL_INFO=y
 CONFIG_RCU_CPU_STALL_VERBOSE=y
 CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE05 b/tools/testing/selftests/rcutorture/configs/rcu/TREE05
index d0f32e574743..3a06b97e9a73 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE05
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE05
@@ -18,7 +18,6 @@ CONFIG_RCU_NOCB_CPU_NONE=y
 CONFIG_DEBUG_LOCK_ALLOC=y
 CONFIG_PROVE_LOCKING=y
 CONFIG_PROVE_RCU=y
-CONFIG_PROVE_RCU_DELAY=y
 CONFIG_RCU_CPU_STALL_INFO=n
 CONFIG_RCU_CPU_STALL_VERBOSE=n
 CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE06 b/tools/testing/selftests/rcutorture/configs/rcu/TREE06
index 2e477dfb9c57..8f084cca91bf 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE06
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE06
@@ -19,7 +19,6 @@ CONFIG_RCU_NOCB_CPU=n
 CONFIG_DEBUG_LOCK_ALLOC=y
 CONFIG_PROVE_LOCKING=y
 CONFIG_PROVE_RCU=y
-CONFIG_PROVE_RCU_DELAY=n
 CONFIG_RCU_CPU_STALL_INFO=n
 CONFIG_RCU_CPU_STALL_VERBOSE=n
 CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE07 b/tools/testing/selftests/rcutorture/configs/rcu/TREE07
index 042f86ef362a..ab6225506909 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE07
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE07
@@ -17,7 +17,6 @@ CONFIG_RCU_FANOUT_LEAF=2
 CONFIG_RCU_FANOUT_EXACT=n
 CONFIG_RCU_NOCB_CPU=n
 CONFIG_DEBUG_LOCK_ALLOC=n
-CONFIG_PROVE_RCU_DELAY=n
 CONFIG_RCU_CPU_STALL_INFO=y
 CONFIG_RCU_CPU_STALL_VERBOSE=n
 CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE08 b/tools/testing/selftests/rcutorture/configs/rcu/TREE08
index 3438cee1e3c5..69a2e255bf98 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE08
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE08
@@ -18,7 +18,6 @@ CONFIG_RCU_FANOUT_LEAF=2
 CONFIG_RCU_NOCB_CPU=y
 CONFIG_RCU_NOCB_CPU_ALL=y
 CONFIG_DEBUG_LOCK_ALLOC=n
-CONFIG_PROVE_RCU_DELAY=n
 CONFIG_RCU_CPU_STALL_INFO=n
 CONFIG_RCU_CPU_STALL_VERBOSE=n
 CONFIG_RCU_BOOST=n
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE08-T b/tools/testing/selftests/rcutorture/configs/rcu/TREE08-T
index bf4523d3e44c..a0f32fb8f17e 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE08-T
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE08-T
@@ -18,7 +18,6 @@ CONFIG_RCU_FANOUT_LEAF=2
 CONFIG_RCU_NOCB_CPU=y
 CONFIG_RCU_NOCB_CPU_ALL=y
 CONFIG_DEBUG_LOCK_ALLOC=n
-CONFIG_PROVE_RCU_DELAY=n
 CONFIG_RCU_CPU_STALL_INFO=n
 CONFIG_RCU_CPU_STALL_VERBOSE=n
 CONFIG_RCU_BOOST=n
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE09 b/tools/testing/selftests/rcutorture/configs/rcu/TREE09
index 81e4f7c0bf0b..b7a62a540ad1 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE09
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE09
@@ -13,7 +13,6 @@ CONFIG_SUSPEND=n
 CONFIG_HIBERNATION=n
 CONFIG_RCU_NOCB_CPU=n
 CONFIG_DEBUG_LOCK_ALLOC=n
-CONFIG_PROVE_RCU_DELAY=n
 CONFIG_RCU_CPU_STALL_INFO=n
 CONFIG_RCU_CPU_STALL_VERBOSE=n
 CONFIG_RCU_BOOST=n
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH tip/core/rcu 15/17] rcu: Fix __rcu_reclaim() to use true/false for bool
  2014-07-07 22:38 ` [PATCH tip/core/rcu 01/17] rcu: Document deadlock-avoidance information for rcu_read_unlock() Paul E. McKenney
                     ` (12 preceding siblings ...)
  2014-07-07 22:38   ` [PATCH tip/core/rcu 14/17] rcu: remove CONFIG_PROVE_RCU_DELAY Paul E. McKenney
@ 2014-07-07 22:38   ` Paul E. McKenney
  2014-07-07 22:38   ` [PATCH tip/core/rcu 16/17] rcu: Fix a sparse warning in rcu_initiate_boost() Paul E. McKenney
  2014-07-07 22:38   ` [PATCH tip/core/rcu 17/17] rcu: Fix a sparse warning in rcu_report_unblock_qs_rnp() Paul E. McKenney
  15 siblings, 0 replies; 54+ messages in thread
From: Paul E. McKenney @ 2014-07-07 22:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg, sbw,
	Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

The __rcu_reclaim() function returned 0/1, which is not proper for a
function of type bool.  This commit therefore converts it to use
false/true.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/rcu.h | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index bfda2726ca45..ff1a6de62f17 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -99,6 +99,10 @@ static inline void debug_rcu_head_unqueue(struct rcu_head *head)
 
 void kfree(const void *);
 
+/*
+ * Reclaim the specified callback, either by invoking it (non-lazy case)
+ * or freeing it directly (lazy case).  Return true if lazy, false otherwise.
+ */
 static inline bool __rcu_reclaim(const char *rn, struct rcu_head *head)
 {
 	unsigned long offset = (unsigned long)head->func;
@@ -108,12 +112,12 @@ static inline bool __rcu_reclaim(const char *rn, struct rcu_head *head)
 		RCU_TRACE(trace_rcu_invoke_kfree_callback(rn, head, offset));
 		kfree((void *)head - offset);
 		rcu_lock_release(&rcu_callback_map);
-		return 1;
+		return true;
 	} else {
 		RCU_TRACE(trace_rcu_invoke_callback(rn, head));
 		head->func(head);
 		rcu_lock_release(&rcu_callback_map);
-		return 0;
+		return false;
 	}
 }
 
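For the record, a hypothetical caller sketch (not part of this patch)
showing how the bool return value is consumed, loosely modeled on
rcu_do_batch():

	count++;
	if (__rcu_reclaim(rsp->name, head))
		count_lazy++;	/* lazy (kfree_rcu()) callback reclaimed */
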
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH tip/core/rcu 16/17] rcu: Fix a sparse warning in rcu_initiate_boost()
  2014-07-07 22:38 ` [PATCH tip/core/rcu 01/17] rcu: Document deadlock-avoidance information for rcu_read_unlock() Paul E. McKenney
                     ` (13 preceding siblings ...)
  2014-07-07 22:38   ` [PATCH tip/core/rcu 15/17] rcu: Fix __rcu_reclaim() to use true/false for bool Paul E. McKenney
@ 2014-07-07 22:38   ` Paul E. McKenney
  2014-07-07 22:38   ` [PATCH tip/core/rcu 17/17] rcu: Fix a sparse warning in rcu_report_unblock_qs_rnp() Paul E. McKenney
  15 siblings, 0 replies; 54+ messages in thread
From: Paul E. McKenney @ 2014-07-07 22:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg, sbw,
	Pranith Kumar, Paul E. McKenney

From: Pranith Kumar <bobby.prani@gmail.com>

This commit annotates rcu_initiate_boost() in order to fix the following
sparse warning:

	kernel/rcu/tree_plugin.h:1494:13: warning: context imbalance in 'rcu_initiate_boost' - unexpected unlock

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree_plugin.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index b39ba7239bd6..0409ba34a05c 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1259,6 +1259,7 @@ static int rcu_boost_kthread(void *arg)
  * about it going away.
  */
 static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags)
+	__releases(rnp->lock)
 {
 	struct task_struct *t;
 
@@ -1494,6 +1495,7 @@ static void rcu_prepare_kthreads(int cpu)
 #else /* #ifdef CONFIG_RCU_BOOST */
 
 static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags)
+	__releases(rnp->lock)
 {
 	raw_spin_unlock_irqrestore(&rnp->lock, flags);
 }
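
For readers unfamiliar with the annotation: __releases() compiles away
under gcc and only feeds sparse's context tracking.  From memory, the
definitions in include/linux/compiler.h are roughly:

	#ifdef __CHECKER__
	# define __acquires(x)	__attribute__((context(x,0,1)))
	# define __releases(x)	__attribute__((context(x,1,0)))
	#else
	# define __acquires(x)
	# define __releases(x)
	#endif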
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH tip/core/rcu 17/17] rcu: Fix a sparse warning in rcu_report_unblock_qs_rnp()
  2014-07-07 22:38 ` [PATCH tip/core/rcu 01/17] rcu: Document deadlock-avoidance information for rcu_read_unlock() Paul E. McKenney
                     ` (14 preceding siblings ...)
  2014-07-07 22:38   ` [PATCH tip/core/rcu 16/17] rcu: Fix a sparse warning in rcu_initiate_boost() Paul E. McKenney
@ 2014-07-07 22:38   ` Paul E. McKenney
  15 siblings, 0 replies; 54+ messages in thread
From: Paul E. McKenney @ 2014-07-07 22:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg, sbw,
	Pranith Kumar, Paul E. McKenney

From: Pranith Kumar <bobby.prani@gmail.com>

This commit annotates rcu_report_unblock_qs_rnp() in order to fix the
following sparse warning:

kernel/rcu/tree_plugin.h:990:13: warning: context imbalance in 'rcu_report_unblock_qs_rnp' - unexpected unlock

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree_plugin.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 0409ba34a05c..c66bdcb20c82 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -988,6 +988,7 @@ static int rcu_preempt_blocked_readers_cgp(struct rcu_node *rnp)
 
 /* Because preemptible RCU does not exist, no quieting of tasks. */
 static void rcu_report_unblock_qs_rnp(struct rcu_node *rnp, unsigned long flags)
+	__releases(rnp->lock)
 {
 	raw_spin_unlock_irqrestore(&rnp->lock, flags);
 }
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH tip/core/rcu 14/17] rcu: remove CONFIG_PROVE_RCU_DELAY
  2014-07-07 22:38   ` [PATCH tip/core/rcu 14/17] rcu: remove CONFIG_PROVE_RCU_DELAY Paul E. McKenney
@ 2014-07-08  8:11     ` Paul Bolle
  2014-07-08 13:56       ` Paul E. McKenney
  0 siblings, 1 reply; 54+ messages in thread
From: Paul Bolle @ 2014-07-08  8:11 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, niv, tglx, peterz, rostedt, dhowells, edumazet, dvhart,
	fweisbec, oleg, sbw, Andi Kleen

Paul,

On Mon, 2014-07-07 at 15:38 -0700, Paul E. McKenney wrote:
> The CONFIG_PROVE_RCU_DELAY Kconfig parameter doesn't appear to be very
> effective at finding race conditions, so this commit removes it.
> 
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Andi Kleen <ak@linux.intel.com>
> ---
>  kernel/rcu/tree.c                                       | 5 -----
>  kernel/rcu/update.c                                     | 3 ---

The commit explanation implies that this patch would also remove the
entry for PROVE_RCU_DELAY in lib/Kconfig.debug. Is there any reason to
keep it after this patch?
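
For reference, that entry looks roughly like this (quoting from memory,
help text elided):

	config PROVE_RCU_DELAY
		bool "RCU debugging: preemptible RCU race provocation"
		depends on DEBUG_KERNEL && PREEMPT_RCU
		default n
		help
		  ...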

>  tools/testing/selftests/rcutorture/configs/rcu/TREE01   | 1 -
>  tools/testing/selftests/rcutorture/configs/rcu/TREE02   | 1 -
>  tools/testing/selftests/rcutorture/configs/rcu/TREE02-T | 1 -
>  tools/testing/selftests/rcutorture/configs/rcu/TREE03   | 1 -
>  tools/testing/selftests/rcutorture/configs/rcu/TREE04   | 1 -
>  tools/testing/selftests/rcutorture/configs/rcu/TREE05   | 1 -
>  tools/testing/selftests/rcutorture/configs/rcu/TREE06   | 1 -
>  tools/testing/selftests/rcutorture/configs/rcu/TREE07   | 1 -
>  tools/testing/selftests/rcutorture/configs/rcu/TREE08   | 1 -
>  tools/testing/selftests/rcutorture/configs/rcu/TREE08-T | 1 -
>  tools/testing/selftests/rcutorture/configs/rcu/TREE09   | 1 -
>  13 files changed, 19 deletions(-)

This doesn't touch a few other references in
tools/testing/selftests/rcutorture/. I have no idea how these tests
work, so I assume these references are still needed. Is that correct?


Paul Bolle


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH tip/core/rcu 03/17] signal: Explain local_irq_save() call
  2014-07-07 22:38   ` [PATCH tip/core/rcu 03/17] signal: Explain local_irq_save() call Paul E. McKenney
@ 2014-07-08  9:01     ` Lai Jiangshan
  2014-07-08 15:50       ` Paul E. McKenney
  0 siblings, 1 reply; 54+ messages in thread
From: Lai Jiangshan @ 2014-07-08  9:01 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, dipankar, akpm, mathieu.desnoyers, josh,
	niv, tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, sbw

On 07/08/2014 06:38 AM, Paul E. McKenney wrote:
> From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> 
> The explicit local_irq_save() in __lock_task_sighand() is needed to avoid
> a potential deadlock condition, as noted in a841796f11c90d53 (signal:
> align __lock_task_sighand() irq disabling and RCU).  However, someone
> reading the code might be forgiven for concluding that this separate
> local_irq_save() was completely unnecessary.  This commit therefore adds
> a comment referencing the shiny new block comment on rcu_read_unlock().
> 
> Reported-by: Oleg Nesterov <oleg@redhat.com>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Acked-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  kernel/signal.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/kernel/signal.c b/kernel/signal.c
> index a4077e90f19f..46161e744760 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1263,6 +1263,10 @@ struct sighand_struct *__lock_task_sighand(struct task_struct *tsk,
>  	struct sighand_struct *sighand;
>  
>  	for (;;) {
> +		/*
> +		 * Disable interrupts early to avoid deadlocks.
> +		 * See rcu_read_unlock comment header for details.
> +		 */

A pair of parentheses is missing here: rcu_read_unlock().
After fixing that, please add:

Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>


This reminds me that I should keep up my effort to solve the deadlock
problem where rcu_read_unlock() overlaps with scheduler locks.

>  		local_irq_save(*flags);
>  		rcu_read_lock();
>  		sighand = rcu_dereference(tsk->sighand);


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH tip/core/rcu 14/17] rcu: remove CONFIG_PROVE_RCU_DELAY
  2014-07-08  8:11     ` Paul Bolle
@ 2014-07-08 13:56       ` Paul E. McKenney
  0 siblings, 0 replies; 54+ messages in thread
From: Paul E. McKenney @ 2014-07-08 13:56 UTC (permalink / raw)
  To: Paul Bolle
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, niv, tglx, peterz, rostedt, dhowells, edumazet, dvhart,
	fweisbec, oleg, sbw, Andi Kleen

On Tue, Jul 08, 2014 at 10:11:06AM +0200, Paul Bolle wrote:
> Paul,
> 
> On Mon, 2014-07-07 at 15:38 -0700, Paul E. McKenney wrote:
> > The CONFIG_PROVE_RCU_DELAY Kconfig parameter doesn't appear to be very
> > effective at finding race conditions, so this commit removes it.
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Cc: Andi Kleen <ak@linux.intel.com>
> > ---
> >  kernel/rcu/tree.c                                       | 5 -----
> >  kernel/rcu/update.c                                     | 3 ---
> 
> The commit explanation implies that this patch would also remove the
> entry for PROVE_RCU_DELAY in lib/Kconfig.debug. Is there any reason to
> keep it after this patch?
> 
> >  tools/testing/selftests/rcutorture/configs/rcu/TREE01   | 1 -
> >  tools/testing/selftests/rcutorture/configs/rcu/TREE02   | 1 -
> >  tools/testing/selftests/rcutorture/configs/rcu/TREE02-T | 1 -
> >  tools/testing/selftests/rcutorture/configs/rcu/TREE03   | 1 -
> >  tools/testing/selftests/rcutorture/configs/rcu/TREE04   | 1 -
> >  tools/testing/selftests/rcutorture/configs/rcu/TREE05   | 1 -
> >  tools/testing/selftests/rcutorture/configs/rcu/TREE06   | 1 -
> >  tools/testing/selftests/rcutorture/configs/rcu/TREE07   | 1 -
> >  tools/testing/selftests/rcutorture/configs/rcu/TREE08   | 1 -
> >  tools/testing/selftests/rcutorture/configs/rcu/TREE08-T | 1 -
> >  tools/testing/selftests/rcutorture/configs/rcu/TREE09   | 1 -
> >  13 files changed, 19 deletions(-)
> 
> This doesn't touch a few other references in
> tools/testing/selftests/rcutorture/. I have no idea how these tests
> work, so I assume these references are still needed. Is that correct?

Good catch on both counts, fixed.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH tip/core/rcu 05/17] rcu: remove redundant ACCESS_ONCE() from tick_do_timer_cpu
  2014-07-07 22:38   ` [PATCH tip/core/rcu 05/17] rcu: remove redundant ACCESS_ONCE() from tick_do_timer_cpu Paul E. McKenney
@ 2014-07-08 14:46     ` Frederic Weisbecker
  0 siblings, 0 replies; 54+ messages in thread
From: Frederic Weisbecker @ 2014-07-08 14:46 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, niv, tglx, peterz, rostedt, dhowells, edumazet, dvhart,
	oleg, sbw

On Mon, Jul 07, 2014 at 03:38:09PM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> 
> In kernels built with CONFIG_NO_HZ_FULL, tick_do_timer_cpu is constant
> once boot completes.  Thus, there is no need to wrap it in ACCESS_ONCE()
> in code that is built only when CONFIG_NO_HZ_FULL is enabled.  This
> commit therefore removes the redundant ACCESS_ONCE().
> 
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

ACK, thanks!

> ---
>  kernel/rcu/tree_plugin.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 02ac0fb186b8..5da9f9b3abc9 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -2844,7 +2844,7 @@ static bool rcu_nohz_full_cpu(struct rcu_state *rsp)
>  static void rcu_bind_gp_kthread(void)
>  {
>  #ifdef CONFIG_NO_HZ_FULL
> -	int cpu = ACCESS_ONCE(tick_do_timer_cpu);
> +	int cpu = tick_do_timer_cpu;
>  
>  	if (cpu < 0 || cpu >= nr_cpu_ids)
>  		return;
> -- 
> 1.8.1.5
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH tip/core/rcu 11/17] rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs
  2014-07-07 22:38   ` [PATCH tip/core/rcu 11/17] rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs Paul E. McKenney
@ 2014-07-08 15:24     ` Frederic Weisbecker
  2014-07-08 15:47       ` Paul E. McKenney
  0 siblings, 1 reply; 54+ messages in thread
From: Frederic Weisbecker @ 2014-07-08 15:24 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, niv, tglx, peterz, rostedt, dhowells, edumazet, dvhart,
	oleg, sbw

On Mon, Jul 07, 2014 at 03:38:15PM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> 
> Binding the grace-period kthreads to the timekeeping CPU resulted in
> significant performance decreases for some workloads.  For more detail,
> see:
> 
> https://lkml.org/lkml/2014/6/3/395 for benchmark numbers
> 
> https://lkml.org/lkml/2014/6/4/218 for CPU statistics
> 
> It turns out that it is necessary to bind the grace-period kthreads
> to the timekeeping CPU only when all but CPU 0 is a nohz_full CPU
> on the one hand or if CONFIG_NO_HZ_FULL_SYSIDLE=y on the other.
> In other cases, it suffices to bind the grace-period kthreads to the
> set of non-nohz_full CPUs.
> 
> This commit therefore creates a tick_nohz_not_full_mask that is the
> complement of tick_nohz_full_mask, and then binds the grace-period
> kthread to the set of CPUs indicated by this new mask, which covers
> the CONFIG_NO_HZ_FULL_SYSIDLE=n case.  The CONFIG_NO_HZ_FULL_SYSIDLE=y
> case still binds the grace-period kthreads to the timekeeping CPU.
> This commit also includes the tick_nohz_full_enabled() check suggested
> by Frederic Weisbecker.
> 
> Reported-by: Jet Chen <jet.chen@intel.com>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> [ paulmck: Created housekeeping_affine() per fweisbec feedback. ]
> ---
>  include/linux/tick.h     | 19 +++++++++++++++++++
>  kernel/rcu/tree_plugin.h | 14 +++++++++-----
>  kernel/time/tick-sched.c | 10 ++++++++++
>  3 files changed, 38 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/tick.h b/include/linux/tick.h
> index b84773cb9f4c..c39af3261351 100644
> --- a/include/linux/tick.h
> +++ b/include/linux/tick.h
> @@ -12,6 +12,7 @@
>  #include <linux/hrtimer.h>
>  #include <linux/context_tracking_state.h>
>  #include <linux/cpumask.h>
> +#include <linux/sched.h>
>  
>  #ifdef CONFIG_GENERIC_CLOCKEVENTS
>  
> @@ -162,6 +163,7 @@ static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }
>  #ifdef CONFIG_NO_HZ_FULL
>  extern bool tick_nohz_full_running;
>  extern cpumask_var_t tick_nohz_full_mask;
> +extern cpumask_var_t tick_nohz_not_full_mask;

So I'm still puzzled by this mask.

How about creating a temporary cpumask on top of tick_nohz_full_mask
from housekeeping_affine().

If you wonder about performance, this can be called once for good from
rcu_spawn_gp_kthread() (that would be much better than checking that all
the time from the kthread itself anyway).
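
Something along these lines is what I have in mind (rough, untested
sketch, using only the existing cpumask API):

	static void housekeeping_affine(struct task_struct *t)
	{
		cpumask_var_t mask;

		if (!tick_nohz_full_enabled())
			return;
		if (!zalloc_cpumask_var(&mask, GFP_KERNEL))
			return;
		/* Housekeeping CPUs = complement of the nohz_full set. */
		cpumask_andnot(mask, cpu_possible_mask, tick_nohz_full_mask);
		set_cpus_allowed_ptr(t, mask);
		free_cpumask_var(mask);
	}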

In case you're out of time, I can handle that. Just checking your opinion.

Thanks.

>  
>  static inline bool tick_nohz_full_enabled(void)
>  {
> @@ -194,6 +196,23 @@ static inline void tick_nohz_full_kick_all(void) { }
>  static inline void __tick_nohz_task_switch(struct task_struct *tsk) { }
>  #endif
>  
> +static inline bool is_housekeeping_cpu(int cpu)
> +{
> +#ifdef CONFIG_NO_HZ_FULL
> +	if (tick_nohz_full_enabled())
> +		return cpumask_test_cpu(cpu, tick_nohz_not_full_mask);
> +#endif
> +	return true;
> +}
> +
> +static inline void housekeeping_affine(struct task_struct *t)
> +{
> +#ifdef CONFIG_NO_HZ_FULL
> +	if (tick_nohz_full_enabled())
> +		set_cpus_allowed_ptr(t, tick_nohz_not_full_mask);
> +#endif
> +}
> +
>  static inline void tick_nohz_full_check(void)
>  {
>  	if (tick_nohz_full_enabled())
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 719587af7b10..b39ba7239bd6 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -2846,12 +2846,16 @@ static bool rcu_nohz_full_cpu(struct rcu_state *rsp)
>   */
>  static void rcu_bind_gp_kthread(void)
>  {
> -#ifdef CONFIG_NO_HZ_FULL
> -	int cpu = tick_do_timer_cpu;
> +	int __maybe_unused cpu;
>  
> -	if (cpu < 0 || cpu >= nr_cpu_ids)
> +	if (!tick_nohz_full_enabled())
>  		return;
> -	if (raw_smp_processor_id() != cpu)
> +#ifdef CONFIG_NO_HZ_FULL_SYSIDLE
> +	cpu = tick_do_timer_cpu;
> +	if (cpu >= 0 && cpu < nr_cpu_ids && raw_smp_processor_id() != cpu)
>  		set_cpus_allowed_ptr(current, cpumask_of(cpu));
> -#endif /* #ifdef CONFIG_NO_HZ_FULL */
> +#else /* #ifdef CONFIG_NO_HZ_FULL_SYSIDLE */
> +	if (!is_housekeeping_cpu(raw_smp_processor_id()))
> +		housekeeping_affine(current);
> +#endif /* #else #ifdef CONFIG_NO_HZ_FULL_SYSIDLE */
>  }
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 6558b7ac112d..e023134d63a1 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -154,6 +154,7 @@ static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs)
>  
>  #ifdef CONFIG_NO_HZ_FULL
>  cpumask_var_t tick_nohz_full_mask;
> +cpumask_var_t tick_nohz_not_full_mask;
>  bool tick_nohz_full_running;
>  
>  static bool can_stop_full_tick(void)
> @@ -281,6 +282,7 @@ static int __init tick_nohz_full_setup(char *str)
>  	int cpu;
>  
>  	alloc_bootmem_cpumask_var(&tick_nohz_full_mask);
> +	alloc_bootmem_cpumask_var(&tick_nohz_not_full_mask);
>  	if (cpulist_parse(str, tick_nohz_full_mask) < 0) {
>  		pr_warning("NOHZ: Incorrect nohz_full cpumask\n");
>  		return 1;
> @@ -291,6 +293,8 @@ static int __init tick_nohz_full_setup(char *str)
>  		pr_warning("NO_HZ: Clearing %d from nohz_full range for timekeeping\n", cpu);
>  		cpumask_clear_cpu(cpu, tick_nohz_full_mask);
>  	}
> +	cpumask_andnot(tick_nohz_not_full_mask,
> +		       cpu_possible_mask, tick_nohz_full_mask);
>  	tick_nohz_full_running = true;
>  
>  	return 1;
> @@ -332,9 +336,15 @@ static int tick_nohz_init_all(void)
>  		pr_err("NO_HZ: Can't allocate full dynticks cpumask\n");
>  		return err;
>  	}
> +	if (!alloc_cpumask_var(&tick_nohz_not_full_mask, GFP_KERNEL)) {
> +		pr_err("NO_HZ: Can't allocate not-full dynticks cpumask\n");
> +		return err;
> +	}
>  	err = 0;
>  	cpumask_setall(tick_nohz_full_mask);
>  	cpumask_clear_cpu(smp_processor_id(), tick_nohz_full_mask);
> +	cpumask_clear(tick_nohz_not_full_mask);
> +	cpumask_set_cpu(smp_processor_id(), tick_nohz_not_full_mask);
>  	tick_nohz_full_running = true;
>  #endif
>  	return err;
> -- 
> 1.8.1.5
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH tip/core/rcu 11/17] rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs
  2014-07-08 15:24     ` Frederic Weisbecker
@ 2014-07-08 15:47       ` Paul E. McKenney
  2014-07-08 18:38         ` Frederic Weisbecker
  0 siblings, 1 reply; 54+ messages in thread
From: Paul E. McKenney @ 2014-07-08 15:47 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, niv, tglx, peterz, rostedt, dhowells, edumazet, dvhart,
	oleg, sbw

On Tue, Jul 08, 2014 at 05:24:00PM +0200, Frederic Weisbecker wrote:
> On Mon, Jul 07, 2014 at 03:38:15PM -0700, Paul E. McKenney wrote:
> > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > 
> > Binding the grace-period kthreads to the timekeeping CPU resulted in
> > significant performance decreases for some workloads.  For more detail,
> > see:
> > 
> > https://lkml.org/lkml/2014/6/3/395 for benchmark numbers
> > 
> > https://lkml.org/lkml/2014/6/4/218 for CPU statistics
> > 
> > It turns out that it is necessary to bind the grace-period kthreads
> > to the timekeeping CPU only when all but CPU 0 is a nohz_full CPU
> > on the one hand or if CONFIG_NO_HZ_FULL_SYSIDLE=y on the other.
> > In other cases, it suffices to bind the grace-period kthreads to the
> > set of non-nohz_full CPUs.
> > 
> > This commit therefore creates a tick_nohz_not_full_mask that is the
> > complement of tick_nohz_full_mask, and then binds the grace-period
> > kthread to the set of CPUs indicated by this new mask, which covers
> > the CONFIG_NO_HZ_FULL_SYSIDLE=n case.  The CONFIG_NO_HZ_FULL_SYSIDLE=y
> > case still binds the grace-period kthreads to the timekeeping CPU.
> > This commit also includes the tick_nohz_full_enabled() check suggested
> > by Frederic Weisbecker.
> > 
> > Reported-by: Jet Chen <jet.chen@intel.com>
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > [ paulmck: Created housekeeping_affine() per fweisbec feedback. ]
> > ---
> >  include/linux/tick.h     | 19 +++++++++++++++++++
> >  kernel/rcu/tree_plugin.h | 14 +++++++++-----
> >  kernel/time/tick-sched.c | 10 ++++++++++
> >  3 files changed, 38 insertions(+), 5 deletions(-)
> > 
> > diff --git a/include/linux/tick.h b/include/linux/tick.h
> > index b84773cb9f4c..c39af3261351 100644
> > --- a/include/linux/tick.h
> > +++ b/include/linux/tick.h
> > @@ -12,6 +12,7 @@
> >  #include <linux/hrtimer.h>
> >  #include <linux/context_tracking_state.h>
> >  #include <linux/cpumask.h>
> > +#include <linux/sched.h>
> >  
> >  #ifdef CONFIG_GENERIC_CLOCKEVENTS
> >  
> > @@ -162,6 +163,7 @@ static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }
> >  #ifdef CONFIG_NO_HZ_FULL
> >  extern bool tick_nohz_full_running;
> >  extern cpumask_var_t tick_nohz_full_mask;
> > +extern cpumask_var_t tick_nohz_not_full_mask;
> 
> So I'm still puzzled by this mask.
> 
> How about creating a temporary cpumask on top of tick_nohz_full_mask
> from housekeeping_affine().
> 
> If you wonder about performance, this can be called once for good from
> rcu_spawn_gp_kthread() (that would be much better than checking that all
> the time from the kthread itself anyway).

I was figuring that a fair number of the kthreads might eventually
be using this, not just the grace-period kthreads.  In addition,
my concern about once-for-good affinity is for the NO_HZ_FULL_SYSIDLE=y
case, where moving the grace-period kthreads prevents ever entering
full-system idle state.

Or am I missing some use case?

							Thanx, Paul

> In case you're out of time, I can handle that. Just checking your opinion.
> 
> Thanks.
> 
> >  
> >  static inline bool tick_nohz_full_enabled(void)
> >  {
> > @@ -194,6 +196,23 @@ static inline void tick_nohz_full_kick_all(void) { }
> >  static inline void __tick_nohz_task_switch(struct task_struct *tsk) { }
> >  #endif
> >  
> > +static inline bool is_housekeeping_cpu(int cpu)
> > +{
> > +#ifdef CONFIG_NO_HZ_FULL
> > +	if (tick_nohz_full_enabled())
> > +		return cpumask_test_cpu(cpu, tick_nohz_not_full_mask);
> > +#endif
> > +	return true;
> > +}
> > +
> > +static inline void housekeeping_affine(struct task_struct *t)
> > +{
> > +#ifdef CONFIG_NO_HZ_FULL
> > +	if (tick_nohz_full_enabled())
> > +		set_cpus_allowed_ptr(t, tick_nohz_not_full_mask);
> > +#endif
> > +}
> > +
> >  static inline void tick_nohz_full_check(void)
> >  {
> >  	if (tick_nohz_full_enabled())
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 719587af7b10..b39ba7239bd6 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -2846,12 +2846,16 @@ static bool rcu_nohz_full_cpu(struct rcu_state *rsp)
> >   */
> >  static void rcu_bind_gp_kthread(void)
> >  {
> > -#ifdef CONFIG_NO_HZ_FULL
> > -	int cpu = tick_do_timer_cpu;
> > +	int __maybe_unused cpu;
> >  
> > -	if (cpu < 0 || cpu >= nr_cpu_ids)
> > +	if (!tick_nohz_full_enabled())
> >  		return;
> > -	if (raw_smp_processor_id() != cpu)
> > +#ifdef CONFIG_NO_HZ_FULL_SYSIDLE
> > +	cpu = tick_do_timer_cpu;
> > +	if (cpu >= 0 && cpu < nr_cpu_ids && raw_smp_processor_id() != cpu)
> >  		set_cpus_allowed_ptr(current, cpumask_of(cpu));
> > -#endif /* #ifdef CONFIG_NO_HZ_FULL */
> > +#else /* #ifdef CONFIG_NO_HZ_FULL_SYSIDLE */
> > +	if (!is_housekeeping_cpu(raw_smp_processor_id()))
> > +		housekeeping_affine(current);
> > +#endif /* #else #ifdef CONFIG_NO_HZ_FULL_SYSIDLE */
> >  }
> > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > index 6558b7ac112d..e023134d63a1 100644
> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -154,6 +154,7 @@ static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs)
> >  
> >  #ifdef CONFIG_NO_HZ_FULL
> >  cpumask_var_t tick_nohz_full_mask;
> > +cpumask_var_t tick_nohz_not_full_mask;
> >  bool tick_nohz_full_running;
> >  
> >  static bool can_stop_full_tick(void)
> > @@ -281,6 +282,7 @@ static int __init tick_nohz_full_setup(char *str)
> >  	int cpu;
> >  
> >  	alloc_bootmem_cpumask_var(&tick_nohz_full_mask);
> > +	alloc_bootmem_cpumask_var(&tick_nohz_not_full_mask);
> >  	if (cpulist_parse(str, tick_nohz_full_mask) < 0) {
> >  		pr_warning("NOHZ: Incorrect nohz_full cpumask\n");
> >  		return 1;
> > @@ -291,6 +293,8 @@ static int __init tick_nohz_full_setup(char *str)
> >  		pr_warning("NO_HZ: Clearing %d from nohz_full range for timekeeping\n", cpu);
> >  		cpumask_clear_cpu(cpu, tick_nohz_full_mask);
> >  	}
> > +	cpumask_andnot(tick_nohz_not_full_mask,
> > +		       cpu_possible_mask, tick_nohz_full_mask);
> >  	tick_nohz_full_running = true;
> >  
> >  	return 1;
> > @@ -332,9 +336,15 @@ static int tick_nohz_init_all(void)
> >  		pr_err("NO_HZ: Can't allocate full dynticks cpumask\n");
> >  		return err;
> >  	}
> > +	if (!alloc_cpumask_var(&tick_nohz_not_full_mask, GFP_KERNEL)) {
> > +		pr_err("NO_HZ: Can't allocate not-full dynticks cpumask\n");
> > +		return err;
> > +	}
> >  	err = 0;
> >  	cpumask_setall(tick_nohz_full_mask);
> >  	cpumask_clear_cpu(smp_processor_id(), tick_nohz_full_mask);
> > +	cpumask_clear(tick_nohz_not_full_mask);
> > +	cpumask_set_cpu(smp_processor_id(), tick_nohz_not_full_mask);
> >  	tick_nohz_full_running = true;
> >  #endif
> >  	return err;
> > -- 
> > 1.8.1.5
> > 
> 


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH tip/core/rcu 03/17] signal: Explain local_irq_save() call
  2014-07-08  9:01     ` Lai Jiangshan
@ 2014-07-08 15:50       ` Paul E. McKenney
  0 siblings, 0 replies; 54+ messages in thread
From: Paul E. McKenney @ 2014-07-08 15:50 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, mingo, dipankar, akpm, mathieu.desnoyers, josh,
	niv, tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, sbw

On Tue, Jul 08, 2014 at 05:01:51PM +0800, Lai Jiangshan wrote:
> On 07/08/2014 06:38 AM, Paul E. McKenney wrote:
> > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > 
> > The explicit local_irq_save() in __lock_task_sighand() is needed to avoid
> > a potential deadlock condition, as noted in a841796f11c90d53 (signal:
> > align __lock_task_sighand() irq disabling and RCU).  However, someone
> > reading the code might be forgiven for concluding that this separate
> > local_irq_save() was completely unnecessary.  This commit therefore adds
> > a comment referencing the shiny new block comment on rcu_read_unlock().
> > 
> > Reported-by: Oleg Nesterov <oleg@redhat.com>
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Acked-by: Oleg Nesterov <oleg@redhat.com>
> > ---
> >  kernel/signal.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/kernel/signal.c b/kernel/signal.c
> > index a4077e90f19f..46161e744760 100644
> > --- a/kernel/signal.c
> > +++ b/kernel/signal.c
> > @@ -1263,6 +1263,10 @@ struct sighand_struct *__lock_task_sighand(struct task_struct *tsk,
> >  	struct sighand_struct *sighand;
> >  
> >  	for (;;) {
> > +		/*
> > +		 * Disable interrupts early to avoid deadlocks.
> > +		 * See rcu_read_unlock comment header for details.
> > +		 */
> 
> A pair of parentheses is missing here: rcu_read_unlock().
> After fixing that, please add:
> 
> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>

Good point, added the "()" and your Reviewed-by.

> This reminds me that I should keep up my effort to solve the deadlock
> problem where rcu_read_unlock() overlaps with scheduler locks.

That would be good!  I vaguely remember an earlier patch of yours that
Steven Rostedt gave feedback on, but have not been able to locate either
email.

							Thanx, Paul

> >  		local_irq_save(*flags);
> >  		rcu_read_lock();
> >  		sighand = rcu_dereference(tsk->sighand);
> 


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH tip/core/rcu 06/17] rcu: Eliminate read-modify-write ACCESS_ONCE() calls
  2014-07-07 22:38   ` [PATCH tip/core/rcu 06/17] rcu: Eliminate read-modify-write ACCESS_ONCE() calls Paul E. McKenney
@ 2014-07-08 16:59     ` Pranith Kumar
  2014-07-08 20:35       ` Paul E. McKenney
  0 siblings, 1 reply; 54+ messages in thread
From: Pranith Kumar @ 2014-07-08 16:59 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, mingo, laijs, Dipankar Sarma, Andrew Morton,
	Mathieu Desnoyers, Josh Triplett, niv, tglx, Peter Zijlstra,
	rostedt, dhowells, Eric Dumazet

Hi Paul,

On Mon, Jul 7, 2014 at 6:38 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
>
> RCU contains code of the following forms:
>
>         ACCESS_ONCE(x)++;
>         ACCESS_ONCE(x) += y;
>         ACCESS_ONCE(x) -= y;
>
> Now these constructs do operate correctly, but they really result in a
> pair of volatile accesses, one to do the load and another to do the store.
> This can be confusing, as the casual reader might well assume that (for
> example) gcc might generate a memory-to-memory add instruction for each
> of these three cases.  In fact, gcc will do no such thing.  Also, there
> is a good chance that the kernel will move to separate load and store
> variants of ACCESS_ONCE(), and constructs like the above could easily
> confuse both people and scripts attempting to make that sort of change.
> Finally, most of RCU's read-modify-write uses of ACCESS_ONCE() really
> only need the store to be volatile, so that the read-modify-write form
> might be misleading.
>
> This commit therefore changes the above forms in RCU so that each instance
> of ACCESS_ONCE() either does a load or a store, but not both.  In a few
> cases, ACCESS_ONCE() was not critical, for example, for maintaining
> statistics.  In these cases, ACCESS_ONCE() has been dispensed with
> entirely.
>
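
If I read the commit correctly, the intended pattern is (illustrated
with a hypothetical variable x, not from the patch below):

	- ACCESS_ONCE(x)++;
	+ ACCESS_ONCE(x) = x + 1;	/* plain load, volatile store */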

Is there any reason why |=, &= cannot be replaced similarly? Also
there are a few more in tree_plugin.h. Please find patch below:

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
---
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index dac6d20..f500395 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1700,7 +1700,7 @@ static int rcu_gp_fqs(struct rcu_state *rsp, int
fqs_state_in)
        if (ACCESS_ONCE(rsp->gp_flags) & RCU_GP_FLAG_FQS) {
                raw_spin_lock_irq(&rnp->lock);
                smp_mb__after_unlock_lock();
-               ACCESS_ONCE(rsp->gp_flags) &= ~RCU_GP_FLAG_FQS;
+               ACCESS_ONCE(rsp->gp_flags) = rsp->gp_flags & ~RCU_GP_FLAG_FQS;
                raw_spin_unlock_irq(&rnp->lock);
        }
        return fqs_state;
@@ -2514,7 +2514,7 @@ static void force_quiescent_state(struct rcu_state *rsp)
                raw_spin_unlock_irqrestore(&rnp_old->lock, flags);
                return;  /* Someone beat us to it. */
        }
-       ACCESS_ONCE(rsp->gp_flags) |= RCU_GP_FLAG_FQS;
+       ACCESS_ONCE(rsp->gp_flags) = rsp->gp_flags | RCU_GP_FLAG_FQS;
        raw_spin_unlock_irqrestore(&rnp_old->lock, flags);
        wake_up(&rsp->gp_wq);  /* Memory barrier implied by wake_up() path. */
 }
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 1a4ab26..752d382 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -897,7 +897,8 @@ void synchronize_rcu_expedited(void)

        /* Clean up and exit. */
        smp_mb(); /* ensure expedited GP seen before counter increment. */
-       ACCESS_ONCE(sync_rcu_preempt_exp_count)++;
+       ACCESS_ONCE(sync_rcu_preempt_exp_count) =
+                                       sync_rcu_preempt_exp_count + 1;
 unlock_mb_ret:
        mutex_unlock(&sync_rcu_preempt_exp_mutex);
 mb_ret:
@@ -2307,8 +2308,9 @@ static int rcu_nocb_kthread(void *arg)
                        list = next;
                }
                trace_rcu_batch_end(rdp->rsp->name, c, !!list, 0, 0, 1);
-               ACCESS_ONCE(rdp->nocb_p_count) -= c;
-               ACCESS_ONCE(rdp->nocb_p_count_lazy) -= cl;
+               ACCESS_ONCE(rdp->nocb_p_count) = rdp->nocb_p_count - c;
+               ACCESS_ONCE(rdp->nocb_p_count_lazy) =
+                                               rdp->nocb_p_count_lazy - cl;
                rdp->n_nocbs_invoked += c;
        }
        return 0;

-- 
Pranith

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH tip/core/rcu 11/17] rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs
  2014-07-08 15:47       ` Paul E. McKenney
@ 2014-07-08 18:38         ` Frederic Weisbecker
  2014-07-08 19:58           ` Paul E. McKenney
  2014-07-11 18:10           ` Christoph Lameter
  0 siblings, 2 replies; 54+ messages in thread
From: Frederic Weisbecker @ 2014-07-08 18:38 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, niv, tglx, peterz, rostedt, dhowells, edumazet, dvhart,
	oleg, sbw

On Tue, Jul 08, 2014 at 08:47:23AM -0700, Paul E. McKenney wrote:
> On Tue, Jul 08, 2014 at 05:24:00PM +0200, Frederic Weisbecker wrote:
> > On Mon, Jul 07, 2014 at 03:38:15PM -0700, Paul E. McKenney wrote:
> > > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > > 
> > > Binding the grace-period kthreads to the timekeeping CPU resulted in
> > > significant performance decreases for some workloads.  For more detail,
> > > see:
> > > 
> > > https://lkml.org/lkml/2014/6/3/395 for benchmark numbers
> > > 
> > > https://lkml.org/lkml/2014/6/4/218 for CPU statistics
> > > 
> > > It turns out that it is necessary to bind the grace-period kthreads
> > > to the timekeeping CPU only when all but CPU 0 is a nohz_full CPU
> > > on the one hand or if CONFIG_NO_HZ_FULL_SYSIDLE=y on the other.
> > > In other cases, it suffices to bind the grace-period kthreads to the
> > > set of non-nohz_full CPUs.
> > > 
> > > This commit therefore creates a tick_nohz_not_full_mask that is the
> > > complement of tick_nohz_full_mask, and then binds the grace-period
> > > kthread to the set of CPUs indicated by this new mask, which covers
> > > the CONFIG_NO_HZ_FULL_SYSIDLE=n case.  The CONFIG_NO_HZ_FULL_SYSIDLE=y
> > > case still binds the grace-period kthreads to the timekeeping CPU.
> > > This commit also includes the tick_nohz_full_enabled() check suggested
> > > by Frederic Weisbecker.
> > > 
> > > Reported-by: Jet Chen <jet.chen@intel.com>
> > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > [ paulmck: Created housekeeping_affine() per fweisbec feedback. ]
> > > ---
> > >  include/linux/tick.h     | 19 +++++++++++++++++++
> > >  kernel/rcu/tree_plugin.h | 14 +++++++++-----
> > >  kernel/time/tick-sched.c | 10 ++++++++++
> > >  3 files changed, 38 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/include/linux/tick.h b/include/linux/tick.h
> > > index b84773cb9f4c..c39af3261351 100644
> > > --- a/include/linux/tick.h
> > > +++ b/include/linux/tick.h
> > > @@ -12,6 +12,7 @@
> > >  #include <linux/hrtimer.h>
> > >  #include <linux/context_tracking_state.h>
> > >  #include <linux/cpumask.h>
> > > +#include <linux/sched.h>
> > >  
> > >  #ifdef CONFIG_GENERIC_CLOCKEVENTS
> > >  
> > > @@ -162,6 +163,7 @@ static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }
> > >  #ifdef CONFIG_NO_HZ_FULL
> > >  extern bool tick_nohz_full_running;
> > >  extern cpumask_var_t tick_nohz_full_mask;
> > > +extern cpumask_var_t tick_nohz_not_full_mask;
> > 
> > So I'm still puzzled by this mask.
> > 
> > How about creating a temporary cpumask on top of tick_nohz_full_mask
> > from housekeeping_affine().
> > 
> > If you wonder about performance, this can be called once for good from
> > rcu_spawn_gp_kthread() (that would be much better than checking that all
> > the time from the kthread itself anyway).
> 
> I was figuring that a fair number of the kthreads might eventually
> be using this, not just the grace-period kthreads.

Ok makes sense. But can we just rename the cpumask to housekeeping_mask?

> In addition,
> my concern about once-for-good affinity is for the NO_HZ_FULL_SYSIDLE=y
> case, where moving the grace-period kthreads prevents ever entering
> full-system idle state.
> 
> Or am I missing some use case?

No that's what I had in mind. But rcu_spawn_gp_kthread() still looks like
a better place for that. Or am I missing something else?

Thanks.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH tip/core/rcu 11/17] rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs
  2014-07-08 18:38         ` Frederic Weisbecker
@ 2014-07-08 19:58           ` Paul E. McKenney
  2014-07-08 20:40             ` Frederic Weisbecker
  2014-07-11 18:10           ` Christoph Lameter
  1 sibling, 1 reply; 54+ messages in thread
From: Paul E. McKenney @ 2014-07-08 19:58 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, niv, tglx, peterz, rostedt, dhowells, edumazet, dvhart,
	oleg, sbw

On Tue, Jul 08, 2014 at 08:38:47PM +0200, Frederic Weisbecker wrote:
> On Tue, Jul 08, 2014 at 08:47:23AM -0700, Paul E. McKenney wrote:
> > On Tue, Jul 08, 2014 at 05:24:00PM +0200, Frederic Weisbecker wrote:
> > > On Mon, Jul 07, 2014 at 03:38:15PM -0700, Paul E. McKenney wrote:
> > > > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > > > 
> > > > Binding the grace-period kthreads to the timekeeping CPU resulted in
> > > > significant performance decreases for some workloads.  For more detail,
> > > > see:
> > > > 
> > > > https://lkml.org/lkml/2014/6/3/395 for benchmark numbers
> > > > 
> > > > https://lkml.org/lkml/2014/6/4/218 for CPU statistics
> > > > 
> > > > It turns out that it is necessary to bind the grace-period kthreads
> > > > to the timekeeping CPU only when all but CPU 0 is a nohz_full CPU
> > > > on the one hand or if CONFIG_NO_HZ_FULL_SYSIDLE=y on the other.
> > > > In other cases, it suffices to bind the grace-period kthreads to the
> > > > set of non-nohz_full CPUs.
> > > > 
> > > > This commit therefore creates a tick_nohz_not_full_mask that is the
> > > > complement of tick_nohz_full_mask, and then binds the grace-period
> > > > kthread to the set of CPUs indicated by this new mask, which covers
> > > > the CONFIG_NO_HZ_FULL_SYSIDLE=n case.  The CONFIG_NO_HZ_FULL_SYSIDLE=y
> > > > case still binds the grace-period kthreads to the timekeeping CPU.
> > > > This commit also includes the tick_nohz_full_enabled() check suggested
> > > > by Frederic Weisbecker.
> > > > 
> > > > Reported-by: Jet Chen <jet.chen@intel.com>
> > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > [ paulmck: Created housekeeping_affine() per fweisbec feedback. ]
> > > > ---
> > > >  include/linux/tick.h     | 19 +++++++++++++++++++
> > > >  kernel/rcu/tree_plugin.h | 14 +++++++++-----
> > > >  kernel/time/tick-sched.c | 10 ++++++++++
> > > >  3 files changed, 38 insertions(+), 5 deletions(-)
> > > > 
> > > > diff --git a/include/linux/tick.h b/include/linux/tick.h
> > > > index b84773cb9f4c..c39af3261351 100644
> > > > --- a/include/linux/tick.h
> > > > +++ b/include/linux/tick.h
> > > > @@ -12,6 +12,7 @@
> > > >  #include <linux/hrtimer.h>
> > > >  #include <linux/context_tracking_state.h>
> > > >  #include <linux/cpumask.h>
> > > > +#include <linux/sched.h>
> > > >  
> > > >  #ifdef CONFIG_GENERIC_CLOCKEVENTS
> > > >  
> > > > @@ -162,6 +163,7 @@ static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }
> > > >  #ifdef CONFIG_NO_HZ_FULL
> > > >  extern bool tick_nohz_full_running;
> > > >  extern cpumask_var_t tick_nohz_full_mask;
> > > > +extern cpumask_var_t tick_nohz_not_full_mask;
> > > 
> > > So I'm still puzzled by this mask.
> > > 
> > > How about creating a temporary cpumask on top of tick_nohz_full_mask
> > > from housekeeping_affine().
> > > 
> > > If you wonder about performance, this can be called once for good from
> > > rcu_spawn_gp_kthread() (that would be much better than checking that all
> > > the time from the kthread itself anyway).
> > 
> > I was figuring that a fair number of the kthreads might eventually
> > be using this, not just the grace-period kthreads.
> 
> Ok makes sense. But can we just rename the cpumask to housekeeping_mask?

Good point!  After all, it someday might be something other than the
complement of tick_nohz_full_mask.

> > In addition,
> > my concern about once-for-good affinity is for the NO_HZ_FULL_SYSIDLE=y
> > case, where moving the grace-period kthreads prevents ever entering
> > full-system idle state.
> > 
> > Or am I missing some use case?
> 
> No that's what I had in mind. But rcu_spawn_gp_kthread() still looks like
> a better place for that. Or am I missing something else?

My fear was that sysadmins would move it in the NO_HZ_FULL_SYSIDLE=y
case, in which case it needs to move back.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH tip/core/rcu 06/17] rcu: Eliminate read-modify-write ACCESS_ONCE() calls
  2014-07-08 16:59     ` Pranith Kumar
@ 2014-07-08 20:35       ` Paul E. McKenney
  2014-07-08 20:43         ` Pranith Kumar
  0 siblings, 1 reply; 54+ messages in thread
From: Paul E. McKenney @ 2014-07-08 20:35 UTC (permalink / raw)
  To: Pranith Kumar
  Cc: LKML, mingo, laijs, Dipankar Sarma, Andrew Morton,
	Mathieu Desnoyers, Josh Triplett, niv, tglx, Peter Zijlstra,
	rostedt, dhowells, Eric Dumazet

On Tue, Jul 08, 2014 at 12:59:46PM -0400, Pranith Kumar wrote:
> Hi Paul,
> 
> On Mon, Jul 7, 2014 at 6:38 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> >
> > RCU contains code of the following forms:
> >
> >         ACCESS_ONCE(x)++;
> >         ACCESS_ONCE(x) += y;
> >         ACCESS_ONCE(x) -= y;
> >
> > Now these constructs do operate correctly, but they really result in a
> > pair of volatile accesses, one to do the load and another to do the store.
> > This can be confusing, as the casual reader might well assume that (for
> > example) gcc might generate a memory-to-memory add instruction for each
> > of these three cases.  In fact, gcc will do no such thing.  Also, there
> > is a good chance that the kernel will move to separate load and store
> > variants of ACCESS_ONCE(), and constructs like the above could easily
> > confuse both people and scripts attempting to make that sort of change.
> > Finally, most of RCU's read-modify-write uses of ACCESS_ONCE() really
> > only need the store to be volatile, so that the read-modify-write form
> > might be misleading.
> >
> > This commit therefore changes the above forms in RCU so that each instance
> > of ACCESS_ONCE() either does a load or a store, but not both.  In a few
> > cases, ACCESS_ONCE() was not critical, for example, for maintaining
> > statistics.  In these cases, ACCESS_ONCE() has been dispensed with
> > entirely.
> >
> 
> Is there any reason why |=, &= cannot be replaced similarly? Also
> there are a few more in tree_plugin.h. Please find patch below:

Good catch, I clearly didn't include enough patterns in my search.

But please see below.  And please rebase onto branch rcu/dev in
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git,
as this patch set does not apply.

							Thanx, Paul

> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
> ---
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index dac6d20..f500395 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1700,7 +1700,7 @@ static int rcu_gp_fqs(struct rcu_state *rsp, int
> fqs_state_in)
>         if (ACCESS_ONCE(rsp->gp_flags) & RCU_GP_FLAG_FQS) {
>                 raw_spin_lock_irq(&rnp->lock);
>                 smp_mb__after_unlock_lock();
> -               ACCESS_ONCE(rsp->gp_flags) &= ~RCU_GP_FLAG_FQS;
> +               ACCESS_ONCE(rsp->gp_flags) = rsp->gp_flags & ~RCU_GP_FLAG_FQS;

Here we need ACCESS_ONCE() around both instances of rsp->gp_flags.
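
That is, presumably something like:

		ACCESS_ONCE(rsp->gp_flags) =
			ACCESS_ONCE(rsp->gp_flags) & ~RCU_GP_FLAG_FQS;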

>                 raw_spin_unlock_irq(&rnp->lock);
>         }
>         return fqs_state;
> @@ -2514,7 +2514,7 @@ static void force_quiescent_state(struct rcu_state *rsp)
>                 raw_spin_unlock_irqrestore(&rnp_old->lock, flags);
>                 return;  /* Someone beat us to it. */
>         }
> -       ACCESS_ONCE(rsp->gp_flags) |= RCU_GP_FLAG_FQS;
> +       ACCESS_ONCE(rsp->gp_flags) = rsp->gp_flags | RCU_GP_FLAG_FQS;

Same here.

>         raw_spin_unlock_irqrestore(&rnp_old->lock, flags);
>         wake_up(&rsp->gp_wq);  /* Memory barrier implied by wake_up() path. */
>  }
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 1a4ab26..752d382 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -897,7 +897,8 @@ void synchronize_rcu_expedited(void)
> 
>         /* Clean up and exit. */
>         smp_mb(); /* ensure expedited GP seen before counter increment. */
> -       ACCESS_ONCE(sync_rcu_preempt_exp_count)++;
> +       ACCESS_ONCE(sync_rcu_preempt_exp_count) =
> +                                       sync_rcu_preempt_exp_count + 1;

This one is OK as is because this code path is the only thing that
updates sync_rcu_preempt_exp_count, so the plain load cannot race with
any other update; only the store needs to be volatile.

>  unlock_mb_ret:
>         mutex_unlock(&sync_rcu_preempt_exp_mutex);
>  mb_ret:
> @@ -2307,8 +2308,9 @@ static int rcu_nocb_kthread(void *arg)
>                         list = next;
>                 }
>                 trace_rcu_batch_end(rdp->rsp->name, c, !!list, 0, 0, 1);
> -               ACCESS_ONCE(rdp->nocb_p_count) -= c;
> -               ACCESS_ONCE(rdp->nocb_p_count_lazy) -= cl;
> +               ACCESS_ONCE(rdp->nocb_p_count) = rdp->nocb_p_count - c;
> +               ACCESS_ONCE(rdp->nocb_p_count_lazy) =
> +                                               rdp->nocb_p_count_lazy - cl;

Same here, no other code path updates ->nocb_p_count_lazy.

>                 rdp->n_nocbs_invoked += c;
>         }
>         return 0;
> 
> -- 
> Pranith
> 


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH tip/core/rcu 11/17] rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs
  2014-07-08 19:58           ` Paul E. McKenney
@ 2014-07-08 20:40             ` Frederic Weisbecker
  2014-07-08 22:05               ` Paul E. McKenney
  0 siblings, 1 reply; 54+ messages in thread
From: Frederic Weisbecker @ 2014-07-08 20:40 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, niv, tglx, peterz, rostedt, dhowells, edumazet, dvhart,
	oleg, sbw

On Tue, Jul 08, 2014 at 12:58:37PM -0700, Paul E. McKenney wrote:
> On Tue, Jul 08, 2014 at 08:38:47PM +0200, Frederic Weisbecker wrote:
> > On Tue, Jul 08, 2014 at 08:47:23AM -0700, Paul E. McKenney wrote:
> > > On Tue, Jul 08, 2014 at 05:24:00PM +0200, Frederic Weisbecker wrote:
> > > > On Mon, Jul 07, 2014 at 03:38:15PM -0700, Paul E. McKenney wrote:
> > > > > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > > > > 
> > > > > Binding the grace-period kthreads to the timekeeping CPU resulted in
> > > > > significant performance decreases for some workloads.  For more detail,
> > > > > see:
> > > > > 
> > > > > https://lkml.org/lkml/2014/6/3/395 for benchmark numbers
> > > > > 
> > > > > https://lkml.org/lkml/2014/6/4/218 for CPU statistics
> > > > > 
> > > > > It turns out that it is necessary to bind the grace-period kthreads
> > > > > to the timekeeping CPU only when all but CPU 0 is a nohz_full CPU
> > > > > on the one hand or if CONFIG_NO_HZ_FULL_SYSIDLE=y on the other.
> > > > > In other cases, it suffices to bind the grace-period kthreads to the
> > > > > set of non-nohz_full CPUs.
> > > > > 
> > > > > This commit therefore creates a tick_nohz_not_full_mask that is the
> > > > > complement of tick_nohz_full_mask, and then binds the grace-period
> > > > > kthread to the set of CPUs indicated by this new mask, which covers
> > > > > the CONFIG_NO_HZ_FULL_SYSIDLE=n case.  The CONFIG_NO_HZ_FULL_SYSIDLE=y
> > > > > case still binds the grace-period kthreads to the timekeeping CPU.
> > > > > This commit also includes the tick_nohz_full_enabled() check suggested
> > > > > by Frederic Weisbecker.
> > > > > 
> > > > > Reported-by: Jet Chen <jet.chen@intel.com>
> > > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > > [ paulmck: Created housekeeping_affine() per fweisbec feedback. ]
> > > > > ---
> > > > >  include/linux/tick.h     | 19 +++++++++++++++++++
> > > > >  kernel/rcu/tree_plugin.h | 14 +++++++++-----
> > > > >  kernel/time/tick-sched.c | 10 ++++++++++
> > > > >  3 files changed, 38 insertions(+), 5 deletions(-)
> > > > > 
> > > > > diff --git a/include/linux/tick.h b/include/linux/tick.h
> > > > > index b84773cb9f4c..c39af3261351 100644
> > > > > --- a/include/linux/tick.h
> > > > > +++ b/include/linux/tick.h
> > > > > @@ -12,6 +12,7 @@
> > > > >  #include <linux/hrtimer.h>
> > > > >  #include <linux/context_tracking_state.h>
> > > > >  #include <linux/cpumask.h>
> > > > > +#include <linux/sched.h>
> > > > >  
> > > > >  #ifdef CONFIG_GENERIC_CLOCKEVENTS
> > > > >  
> > > > > @@ -162,6 +163,7 @@ static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }
> > > > >  #ifdef CONFIG_NO_HZ_FULL
> > > > >  extern bool tick_nohz_full_running;
> > > > >  extern cpumask_var_t tick_nohz_full_mask;
> > > > > +extern cpumask_var_t tick_nohz_not_full_mask;
> > > > 
> > > > So I'm still puzzled by this mask.
> > > > 
> > > > How about creating a temporary cpumask on top of tick_nohz_full_mask
> > > > from housekeeping_affine().
> > > > 
> > > > If you wonder about performance, this can be called once for good from
> > > > rcu_spawn_gp_kthread() (that would be much better than checking that all
> > > > the time from the kthread itself anyway).
> > > 
> > > I was figuring that a fair number of the kthreads might eventually
> > > be using this, not just for the grace-period kthreads.
> > 
> > Ok makes sense. But can we just rename the cpumask to housekeeping_mask?
> 
> Good point!  After all, it someday might be something other than the
> complement of tick_nohz_full_mask.
> 
> > > In addition,
> > > my concern about once-for-good affinity is for the NO_HZ_FULL_SYSIDLE=y
> > > case, where moving the grace-period kthreads prevents ever entering
> > > full-system idle state.
> > > 
> > > Or am I missing some use case?
> > 
> > No that's what I had in mind. But rcu_spawn_gp_kthread() still looks like
> > a better place for that. Or am I missing something else?
> 
> My fear was that sysadmins would move it in the NO_HZ_FULL_SYSIDLE=y
> case, in which case it needs to move back.

If you use kthread_bind(), this can't be overridden by the user. See PF_NO_SETAFFINITY.
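
For reference, this is roughly what kthread_bind() does, and the check
that enforces it (a condensed sketch of kernel/kthread.c and of the
sched_setaffinity() test, not an exact copy of either):

	void kthread_bind(struct task_struct *p, unsigned int cpu)
	{
		/* Only safe while the freshly created kthread still sleeps. */
		if (!wait_task_inactive(p, TASK_UNINTERRUPTIBLE)) {
			WARN_ON(1);
			return;
		}
		do_set_cpus_allowed(p, cpumask_of(cpu));
		p->flags |= PF_NO_SETAFFINITY;	/* affinity is now fixed */
	}

	/* ... and in sched_setaffinity(): */
	if (p->flags & PF_NO_SETAFFINITY) {
		retval = -EINVAL;	/* userspace cannot rebind the task */
		goto out_put_task;
	}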


* Re: [PATCH tip/core/rcu 06/17] rcu: Eliminate read-modify-write ACCESS_ONCE() calls
  2014-07-08 20:35       ` Paul E. McKenney
@ 2014-07-08 20:43         ` Pranith Kumar
  2014-07-08 21:40           ` Paul E. McKenney
  0 siblings, 1 reply; 54+ messages in thread
From: Pranith Kumar @ 2014-07-08 20:43 UTC (permalink / raw)
  To: Paul McKenney
  Cc: LKML, mingo, laijs, Dipankar Sarma, Andrew Morton,
	Mathieu Desnoyers, Josh Triplett, niv, tglx, Peter Zijlstra,
	rostedt, dhowells, Eric Dumazet

On Tue, Jul 8, 2014 at 4:35 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
>
> Good catch, I clearly didn't include enough patterns in my search.
>
> But please see below.  And please rebase onto branch rcu/dev in
> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git,
> as this patch set does not apply.

OK, I will resend the patch. One question below:

>
>                                                         Thanx, Paul
>
>> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
>> ---
>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>> index dac6d20..f500395 100644
>> --- a/kernel/rcu/tree.c
>> +++ b/kernel/rcu/tree.c
>> @@ -1700,7 +1700,7 @@ static int rcu_gp_fqs(struct rcu_state *rsp, int
>> fqs_state_in)
>>         if (ACCESS_ONCE(rsp->gp_flags) & RCU_GP_FLAG_FQS) {
>>                 raw_spin_lock_irq(&rnp->lock);
>>                 smp_mb__after_unlock_lock();
>> -               ACCESS_ONCE(rsp->gp_flags) &= ~RCU_GP_FLAG_FQS;
>> +               ACCESS_ONCE(rsp->gp_flags) = rsp->gp_flags & ~RCU_GP_FLAG_FQS;
>
> Here we need ACCESS_ONCE() around both instances of rsp->gp_flags.

I see that all accesses of gp_flags are wrapped with ACCESS_ONCE(). Is
there any reason why we can't declare it as 'volatile' and not use
ACCESS_ONCE everywhere?

-- 
Pranith


* Re: [PATCH tip/core/rcu 06/17] rcu: Eliminate read-modify-write ACCESS_ONCE() calls
  2014-07-08 20:43         ` Pranith Kumar
@ 2014-07-08 21:40           ` Paul E. McKenney
  0 siblings, 0 replies; 54+ messages in thread
From: Paul E. McKenney @ 2014-07-08 21:40 UTC (permalink / raw)
  To: Pranith Kumar
  Cc: LKML, mingo, laijs, Dipankar Sarma, Andrew Morton,
	Mathieu Desnoyers, Josh Triplett, niv, tglx, Peter Zijlstra,
	rostedt, dhowells, Eric Dumazet

On Tue, Jul 08, 2014 at 04:43:37PM -0400, Pranith Kumar wrote:
> On Tue, Jul 8, 2014 at 4:35 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> >
> > Good catch, I clearly didn't include enough patterns in my search.
> >
> > But please see below.  And please rebase onto branch rcu/dev in
> > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git,
> > as this patch set does not apply.
> 
> OK, I will resend the patch. One question below:
> 
> >
> >                                                         Thanx, Paul
> >
> >> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
> >> ---
> >> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> >> index dac6d20..f500395 100644
> >> --- a/kernel/rcu/tree.c
> >> +++ b/kernel/rcu/tree.c
> >> @@ -1700,7 +1700,7 @@ static int rcu_gp_fqs(struct rcu_state *rsp, int
> >> fqs_state_in)
> >>         if (ACCESS_ONCE(rsp->gp_flags) & RCU_GP_FLAG_FQS) {
> >>                 raw_spin_lock_irq(&rnp->lock);
> >>                 smp_mb__after_unlock_lock();
> >> -               ACCESS_ONCE(rsp->gp_flags) &= ~RCU_GP_FLAG_FQS;
> >> +               ACCESS_ONCE(rsp->gp_flags) = rsp->gp_flags & ~RCU_GP_FLAG_FQS;
> >
> > Here we need ACCESS_ONCE() around both instances of rsp->gp_flags.
> 
> I see that all accesses of gp_flags are wrapped with ACCESS_ONCE(). Is
> there any reason why we can't declare it as 'volatile' and not use
> ACCESS_ONCE everywhere?

The explicit ACCESS_ONCE() serves as a good documentation aid.
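
For reference, ACCESS_ONCE() is nothing more than a volatile cast applied
at a single access site (its definition from include/linux/compiler.h is
shown below), so marking each access keeps the "this access is
intentionally racy" fact visible exactly where it happens, instead of
hiding it in the declaration:

	#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

	/* The read-modify-write above, with both accesses marked: */
	ACCESS_ONCE(rsp->gp_flags) = ACCESS_ONCE(rsp->gp_flags) & ~RCU_GP_FLAG_FQS;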

							Thanx, Paul



* Re: [PATCH tip/core/rcu 11/17] rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs
  2014-07-08 20:40             ` Frederic Weisbecker
@ 2014-07-08 22:05               ` Paul E. McKenney
  2014-07-09 15:40                 ` Frederic Weisbecker
  0 siblings, 1 reply; 54+ messages in thread
From: Paul E. McKenney @ 2014-07-08 22:05 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, niv, tglx, peterz, rostedt, dhowells, edumazet, dvhart,
	oleg, sbw

On Tue, Jul 08, 2014 at 10:40:11PM +0200, Frederic Weisbecker wrote:
> On Tue, Jul 08, 2014 at 12:58:37PM -0700, Paul E. McKenney wrote:
> > On Tue, Jul 08, 2014 at 08:38:47PM +0200, Frederic Weisbecker wrote:
> > > On Tue, Jul 08, 2014 at 08:47:23AM -0700, Paul E. McKenney wrote:
> > > > On Tue, Jul 08, 2014 at 05:24:00PM +0200, Frederic Weisbecker wrote:
> > > > > On Mon, Jul 07, 2014 at 03:38:15PM -0700, Paul E. McKenney wrote:
> > > > > > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > > > > > 
> > > > > > Binding the grace-period kthreads to the timekeeping CPU resulted in
> > > > > > significant performance decreases for some workloads.  For more detail,
> > > > > > see:
> > > > > > 
> > > > > > https://lkml.org/lkml/2014/6/3/395 for benchmark numbers
> > > > > > 
> > > > > > https://lkml.org/lkml/2014/6/4/218 for CPU statistics
> > > > > > 
> > > > > > It turns out that it is necessary to bind the grace-period kthreads
> > > > > > to the timekeeping CPU only when all but CPU 0 is a nohz_full CPU
> > > > > > on the one hand or if CONFIG_NO_HZ_FULL_SYSIDLE=y on the other.
> > > > > > In other cases, it suffices to bind the grace-period kthreads to the
> > > > > > set of non-nohz_full CPUs.
> > > > > > 
> > > > > > This commit therefore creates a tick_nohz_not_full_mask that is the
> > > > > > complement of tick_nohz_full_mask, and then binds the grace-period
> > > > > > kthread to the set of CPUs indicated by this new mask, which covers
> > > > > > the CONFIG_NO_HZ_FULL_SYSIDLE=n case.  The CONFIG_NO_HZ_FULL_SYSIDLE=y
> > > > > > case still binds the grace-period kthreads to the timekeeping CPU.
> > > > > > This commit also includes the tick_nohz_full_enabled() check suggested
> > > > > > by Frederic Weisbecker.
> > > > > > 
> > > > > > Reported-by: Jet Chen <jet.chen@intel.com>
> > > > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > > > [ paulmck: Created housekeeping_affine() per fweisbec feedback. ]
> > > > > > ---
> > > > > >  include/linux/tick.h     | 19 +++++++++++++++++++
> > > > > >  kernel/rcu/tree_plugin.h | 14 +++++++++-----
> > > > > >  kernel/time/tick-sched.c | 10 ++++++++++
> > > > > >  3 files changed, 38 insertions(+), 5 deletions(-)
> > > > > > 
> > > > > > diff --git a/include/linux/tick.h b/include/linux/tick.h
> > > > > > index b84773cb9f4c..c39af3261351 100644
> > > > > > --- a/include/linux/tick.h
> > > > > > +++ b/include/linux/tick.h
> > > > > > @@ -12,6 +12,7 @@
> > > > > >  #include <linux/hrtimer.h>
> > > > > >  #include <linux/context_tracking_state.h>
> > > > > >  #include <linux/cpumask.h>
> > > > > > +#include <linux/sched.h>
> > > > > >  
> > > > > >  #ifdef CONFIG_GENERIC_CLOCKEVENTS
> > > > > >  
> > > > > > @@ -162,6 +163,7 @@ static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }
> > > > > >  #ifdef CONFIG_NO_HZ_FULL
> > > > > >  extern bool tick_nohz_full_running;
> > > > > >  extern cpumask_var_t tick_nohz_full_mask;
> > > > > > +extern cpumask_var_t tick_nohz_not_full_mask;
> > > > > 
> > > > > So I'm still puzzled by this mask.
> > > > > 
> > > > > How about creating a temporary cpumask on top of tick_nohz_full_mask
> > > > > from housekeeping_affine().
> > > > > 
> > > > > If you wonder about performance, this can be called once for good from
> > > > > rcu_spawn_gp_kthread() (that would be much better than checking that all
> > > > > the time from the kthread itself anyway).
> > > > 
> > > > I was figuring that a fair number of the kthreads might eventually
> > > > be using this, not just for the grace-period kthreads.
> > > 
> > > Ok makes sense. But can we just rename the cpumask to housekeeping_mask?
> > 
> > Good point!  After all, it someday might be something other than the
> > complement of tick_nohz_full_mask.
> > 
> > > > In addition,
> > > > my concern about once-for-good affinity is for the NO_HZ_FULL_SYSIDLE=y
> > > > case, where moving the grace-period kthreads prevents ever entering
> > > > full-system idle state.
> > > > 
> > > > Or am I missing some use case?
> > > 
> > > No that's what I had in mind. But rcu_spawn_gp_kthread() still looks like
> > > a better place for that. Or am I missing something else?
> > 
> > My fear was that sysadmins would move it in the NO_HZ_FULL_SYSIDLE=y
> > case, in which case it needs to move back.
> 
> If you use kthread_bind(), this can't be overridden by the user. See PF_NO_SETAFFINITY.

Fair point.  This would be a kthread_bind_housekeeping(), then.

But I need to create a kthread_bind_mask() or some such that acts like
kthread_bind(), but which takes a cpumask rather than a cpu.  Easy
enough to do, but sounds like 3.18 material rather than 3.17 material.
So this is on my 3.18 list.
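
Something like the following, modeled directly on kthread_bind() (an
untested sketch, name subject to change):

	void kthread_bind_mask(struct task_struct *p, const struct cpumask *mask)
	{
		/* As in kthread_bind(): only safe while the kthread sleeps. */
		if (!wait_task_inactive(p, TASK_UNINTERRUPTIBLE)) {
			WARN_ON(1);
			return;
		}
		do_set_cpus_allowed(p, mask);
		p->flags |= PF_NO_SETAFFINITY;
	}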

							Thanx, Paul



* Re: [PATCH tip/core/rcu 08/17] rcu: Allow post-unlock reference for rt_mutex
  2014-07-07 22:38   ` [PATCH tip/core/rcu 08/17] rcu: Allow post-unlock reference for rt_mutex Paul E. McKenney
@ 2014-07-09  1:50     ` Lai Jiangshan
  2014-07-09 16:04       ` Paul E. McKenney
  0 siblings, 1 reply; 54+ messages in thread
From: Lai Jiangshan @ 2014-07-09  1:50 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, dipankar, akpm, mathieu.desnoyers, josh,
	niv, tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, sbw

On 07/08/2014 06:38 AM, Paul E. McKenney wrote:
> From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> 
> The current approach to RCU priority boosting uses an rt_mutex strictly
> for its priority-boosting side effects.  The rt_mutex_init_proxy_locked()
> function is used by the booster to initialize the lock as held by the
> boostee.  The booster then uses rt_mutex_lock() to acquire this rt_mutex,
> which priority-boosts the boostee.  When the boostee reaches the end
> of its outermost RCU read-side critical section, it checks a field in
> its task structure to see whether it has been boosted, and, if so, uses
> rt_mutex_unlock() to release the rt_mutex.  The booster can then go on
> to boost the next task that is blocking the current RCU grace period.
> 
> But reasonable implementations of rt_mutex_unlock() might result in the
> boostee referencing the rt_mutex's data after releasing it. 

XXXX_unlock(lock_ptr) should not reference lock_ptr after it has unlocked the lock. (*)
So I think this patch is unneeded.  Although the overhead it adds is on the slow path,
it still adds review burden.

And although the original rt_mutex_unlock() violated rule (*) on the fast cmpxchg path,
that has since been fixed.

It is the lock subsystem's responsibility to guarantee this.  I would prefer to defer
the wait_for_completion() stuff until the boostee actually needs to re-access the booster
after rt_mutex_unlock(), instead of adding it now.
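
To make rule (*) concrete, here is a hypothetical unlock that violates
it (my_lock, MY_LOCK_TRACE, and trace_unlock() are made-up names):

	void bad_unlock(struct my_lock *lock)
	{
		smp_store_release(&lock->owner, NULL);	/* lock is free here */
		/*
		 * BAD: another CPU may already have acquired, released,
		 * and even freed the lock, so this load can be a
		 * use-after-free -- unless the API explicitly permits
		 * post-unlock references, which is what this patch does.
		 */
		if (lock->flags & MY_LOCK_TRACE)
			trace_unlock(lock);
	}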

Thanks,
Lai


* Re: [PATCH tip/core/rcu 0/17] Miscellaneous fixes for 3.17
  2014-07-07 22:37 [PATCH tip/core/rcu 0/17] Miscellaneous fixes for 3.17 Paul E. McKenney
  2014-07-07 22:38 ` [PATCH tip/core/rcu 01/17] rcu: Document deadlock-avoidance information for rcu_read_unlock() Paul E. McKenney
@ 2014-07-09  2:14 ` Lai Jiangshan
  1 sibling, 0 replies; 54+ messages in thread
From: Lai Jiangshan @ 2014-07-09  2:14 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, mingo, dipankar, akpm, mathieu.desnoyers, josh,
	niv, tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, sbw

Besides patch 3, please also add my Reviewed-by for these patches:
patches 1, 2, 5, 7, 12, 13, 15, 16, and 17

Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>

Thanks,
Lai

On 07/08/2014 06:37 AM, Paul E. McKenney wrote:
> Hello!
> 
> This series provides miscellaneous fixes:
> 
> 1.	Document deadlock-avoidance information in rcu_read_unlock()'s
> 	docbook comment header.
> 
> 2.	Remove obsolete references to TINY_PREEMPT_RCU.
> 
> 3.	Add deadlock explanation to local_irq_save() call in
> 	__lock_task_sighand().
> 
> 4.	Make the rcu_node arrays be static const char * const,
> 	courtesy of Fabian Frederick.
> 
> 5.	Remove redundant ACCESS_ONCE() from tick_do_timer_cpu under
> 	#ifdef CONFIG_NO_HZ_FULL.
> 
> 6.	Eliminate read-modify-write ACCESS_ONCE() calls.
> 
> 7.	Loosen __call_rcu()'s rcu_head alignment constraint to handle
> 	m68k's 16-bit alignment.
> 
> 8.	Allow post-unlock reference for rt_mutex.
> 
> 9.	Check both root and current rcu_node structures when setting up
> 	future grace periods, courtesy of Pranith Kumar.
> 
> 10.	Simplify priority boosting by putting rt_mutex in rcu_node
> 	structure.
> 
> 11.	Bind grace-period kthreads to no-NO_HZ_FULL CPUs instead of the
> 	timekeeping CPU, at least for CONFIG_NO_HZ_FULL_SYSIDLE=n.
> 
> 12.	Don't use NMIs to dump other CPUs' stacks.
> 
> 13.	Use __this_cpu_read() instead of per_cpu_ptr(), courtesy of Shan Wei.
> 
> 14.	Remove CONFIG_PROVE_RCU_DELAY.
> 
> 15.	Fix __rcu_reclaim to use true/false instead of 1/0.
> 
> 16.	Fix sparse warning in rcu_initiate_boost(), courtesy of Pranith
> 	Kumar.
> 
> 17.	Fix sparse warning in rcu_report_unblock_qs_rnp(), again courtesy
> 	of Pranith Kumar.
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
>  b/include/linux/init_task.h                               |    9 --
>  b/include/linux/rcupdate.h                                |   45 ++++++++--
>  b/include/linux/sched.h                                   |    6 -
>  b/include/linux/tick.h                                    |   19 ++++
>  b/init/Kconfig                                            |    2 
>  b/kernel/rcu/rcu.h                                        |    8 +
>  b/kernel/rcu/srcu.c                                       |    4 
>  b/kernel/rcu/tree.c                                       |   59 ++++++--------
>  b/kernel/rcu/tree.h                                       |    8 +
>  b/kernel/rcu/tree_plugin.h                                |   52 +++++++-----
>  b/kernel/rcu/update.c                                     |    3 
>  b/kernel/signal.c                                         |    4 
>  b/kernel/time/tick-sched.c                                |   10 ++
>  b/tools/testing/selftests/rcutorture/configs/rcu/TREE01   |    1 
>  b/tools/testing/selftests/rcutorture/configs/rcu/TREE02   |    1 
>  b/tools/testing/selftests/rcutorture/configs/rcu/TREE02-T |    1 
>  b/tools/testing/selftests/rcutorture/configs/rcu/TREE03   |    1 
>  b/tools/testing/selftests/rcutorture/configs/rcu/TREE04   |    1 
>  b/tools/testing/selftests/rcutorture/configs/rcu/TREE05   |    1 
>  b/tools/testing/selftests/rcutorture/configs/rcu/TREE06   |    1 
>  b/tools/testing/selftests/rcutorture/configs/rcu/TREE07   |    1 
>  b/tools/testing/selftests/rcutorture/configs/rcu/TREE08   |    1 
>  b/tools/testing/selftests/rcutorture/configs/rcu/TREE08-T |    1 
>  b/tools/testing/selftests/rcutorture/configs/rcu/TREE09   |    1 
>  24 files changed, 147 insertions(+), 93 deletions(-)
> 
> .
> 



* Re: [PATCH tip/core/rcu 11/17] rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs
  2014-07-08 22:05               ` Paul E. McKenney
@ 2014-07-09 15:40                 ` Frederic Weisbecker
  0 siblings, 0 replies; 54+ messages in thread
From: Frederic Weisbecker @ 2014-07-09 15:40 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, niv, tglx, peterz, rostedt, dhowells, edumazet, dvhart,
	oleg, sbw

On Tue, Jul 08, 2014 at 03:05:56PM -0700, Paul E. McKenney wrote:
> Fair point.  This would be a kthread_bind_housekeeping(), then.

Hmm, after all, this should only be needed for kthreads, so yeah.

> But I need to create a kthread_bind_mask() or some such that acts like
> kthread_bind(), but which takes a cpumask rather than a cpu.  Easy
> enough to do, but sounds like 3.18 material rather than 3.17 material.
> So this is on my 3.18 list.

Ok sounds good.

Thanks!
 
> 							Thanx, Paul
> 


* Re: [PATCH tip/core/rcu 08/17] rcu: Allow post-unlock reference for rt_mutex
  2014-07-09  1:50     ` Lai Jiangshan
@ 2014-07-09 16:04       ` Paul E. McKenney
  0 siblings, 0 replies; 54+ messages in thread
From: Paul E. McKenney @ 2014-07-09 16:04 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, mingo, dipankar, akpm, mathieu.desnoyers, josh,
	niv, tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, sbw

On Wed, Jul 09, 2014 at 09:50:09AM +0800, Lai Jiangshan wrote:
> On 07/08/2014 06:38 AM, Paul E. McKenney wrote:
> > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > 
> > The current approach to RCU priority boosting uses an rt_mutex strictly
> > for its priority-boosting side effects.  The rt_mutex_init_proxy_locked()
> > function is used by the booster to initialize the lock as held by the
> > boostee.  The booster then uses rt_mutex_lock() to acquire this rt_mutex,
> > which priority-boosts the boostee.  When the boostee reaches the end
> > of its outermost RCU read-side critical section, it checks a field in
> > its task structure to see whether it has been boosted, and, if so, uses
> > rt_mutex_unlock() to release the rt_mutex.  The booster can then go on
> > to boost the next task that is blocking the current RCU grace period.
> > 
> > But reasonable implementations of rt_mutex_unlock() might result in the
> > boostee referencing the rt_mutex's data after releasing it. 
> 
> XXXX_unlock(lock_ptr) should not reference lock_ptr after it has unlocked the lock. (*)
> So I think this patch is unneeded.  Although the overhead it adds is on the slow path,
> it still adds review burden.
> 
> And although the original rt_mutex_unlock() violated rule (*) on the fast cmpxchg path,
> that has since been fixed.
> 
> It is the lock subsystem's responsibility to guarantee this.  I would prefer to defer
> the wait_for_completion() stuff until the boostee actually needs to re-access the booster
> after rt_mutex_unlock(), instead of adding it now.

It is on my list to remove.  ;-)

							Thanx, Paul



* Re: [PATCH tip/core/rcu 11/17] rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs
  2014-07-08 18:38         ` Frederic Weisbecker
  2014-07-08 19:58           ` Paul E. McKenney
@ 2014-07-11 18:10           ` Christoph Lameter
  2014-07-11 18:25             ` Frederic Weisbecker
  2014-07-11 18:29             ` Paul E. McKenney
  1 sibling, 2 replies; 54+ messages in thread
From: Christoph Lameter @ 2014-07-11 18:10 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E. McKenney, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, josh, niv, tglx, peterz, rostedt, dhowells,
	edumazet, dvhart, oleg, sbw

On Tue, 8 Jul 2014, Frederic Weisbecker wrote:

> > I was figuring that a fair number of the kthreads might eventually
> > be using this, not just for the grace-period kthreads.
>
> Ok makes sense. But can we just rename the cpumask to housekeeping_mask?

That would imply that all no-nohz processors are housekeeping? So all
processors with a tick are housekeeping?

Could we make that set configurable? Ideally I'd like to have the ability
to restrict the housekeeping to one processor.



* Re: [PATCH tip/core/rcu 11/17] rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs
  2014-07-11 18:10           ` Christoph Lameter
@ 2014-07-11 18:25             ` Frederic Weisbecker
  2014-07-11 18:45               ` Paul E. McKenney
  2014-07-11 19:05               ` Christoph Lameter
  2014-07-11 18:29             ` Paul E. McKenney
  1 sibling, 2 replies; 54+ messages in thread
From: Frederic Weisbecker @ 2014-07-11 18:25 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Paul E. McKenney, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, josh, niv, tglx, peterz, rostedt, dhowells,
	edumazet, dvhart, oleg, sbw

On Fri, Jul 11, 2014 at 01:10:41PM -0500, Christoph Lameter wrote:
> On Tue, 8 Jul 2014, Frederic Weisbecker wrote:
> 
> > > I was figuring that a fair number of the kthreads might eventually
> > > be using this, not just for the grace-period kthreads.
> >
> > Ok makes sense. But can we just rename the cpumask to housekeeping_mask?
> 
> That would imply that all no-nohz processors are housekeeping? So all
> processors with a tick are housekeeping?

Well, now that I think about it again, I would really like to keep housekeeping
to CPU 0 when nohz_full= is passed.

> 
> Could we make that set configurable? Ideally I'd like to have the ability
> to restrict the housekeeping to one processor.

Ah, I'm curious about your usecase. But I think we can do that. And we should.

In fact I think that Paul could keep affining grace period kthread to CPU 0
for the sole case when we have nohz_full= parameter passed.

I think the performance issues reported to him refer to CONFIG_NO_HZ_FULL=y
config without nohz_full= parameter passed. That's the most important to address.

Optimizing the "nohz_full= passed" case is probably not very useful and worse
it complicates things a lot.

What do you think Paul? Can we simplify things that way? I'm pretty sure that
nobody cares about optimizing the nohz_full= case. That would really simplify
things to stick to CPU 0.


* Re: [PATCH tip/core/rcu 11/17] rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs
  2014-07-11 18:10           ` Christoph Lameter
  2014-07-11 18:25             ` Frederic Weisbecker
@ 2014-07-11 18:29             ` Paul E. McKenney
  1 sibling, 0 replies; 54+ messages in thread
From: Paul E. McKenney @ 2014-07-11 18:29 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Frederic Weisbecker, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, josh, niv, tglx, peterz, rostedt, dhowells,
	edumazet, dvhart, oleg, sbw

On Fri, Jul 11, 2014 at 01:10:41PM -0500, Christoph Lameter wrote:
> On Tue, 8 Jul 2014, Frederic Weisbecker wrote:
> 
> > > I was figuring that a fair number of the kthreads might eventually
> > > be using this, not just for the grace-period kthreads.
> >
> > Ok makes sense. But can we just rename the cpumask to housekeeping_mask?
> 
> That would imply that all no-nohz processors are housekeeping? So all
> processors with a tick are housekeeping?
> 
> Could we make that set configurable? Ideally I'd like to have the ability
> to restrict the housekeeping to one processor.

We have a housekeeping_affine() in -rcu that currently assumes that all
no-nohz CPUs are housekeeping CPUs.  It would not be hard to add the
ability to restrict the housekeeping CPUs further.  Coming to agreement
on exactly how to go about doing it might be hard, but read on!  ;-)

Here are some possibilities that come to mind:

1.	Have a housekeeping= boot parameter that takes the list of
	housekeeping CPUs.  You have full control: You break it,
	you get to keep the pieces.

2.	As above, but eliminate any nohz_full= CPUs from the
	housekeeping= list (probably with a splat).

	This of course raises the question about what to do if the
	resulting housekeeping set is empty:

	a.	Complain and let housekeeping tasks run anywhere.

	b.	Complain and restrict housekeeping tasks to !nohz CPUs.

3.	Have a housekeeping= boot parameter that specifies the number
	of housekeeping CPUs, which are taken from the !nohz_full
	list in order.  If too many are specified, complain and either:

	a.	Restrict to !nohz_full CPUs.

	b.	Use the specified number of nohz_full CPUs as
		housekeeping CPUs.

Thoughts?
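
For option 1, the parsing side could look much like the existing
nohz_full= handling in kernel/time/tick-sched.c (a sketch, assuming a
cpulist-style argument and the housekeeping_mask discussed earlier):

	static int __init housekeeping_setup(char *str)
	{
		alloc_bootmem_cpumask_var(&housekeeping_mask);
		if (cpulist_parse(str, housekeeping_mask) < 0) {
			pr_warn("housekeeping: incorrect CPU range\n");
			return 1;
		}
		return 1;
	}
	__setup("housekeeping=", housekeeping_setup);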

							Thanx, Paul



* Re: [PATCH tip/core/rcu 11/17] rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs
  2014-07-11 18:25             ` Frederic Weisbecker
@ 2014-07-11 18:45               ` Paul E. McKenney
  2014-07-11 18:57                 ` Frederic Weisbecker
  2014-07-11 19:05               ` Christoph Lameter
  1 sibling, 1 reply; 54+ messages in thread
From: Paul E. McKenney @ 2014-07-11 18:45 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Christoph Lameter, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, josh, niv, tglx, peterz, rostedt, dhowells,
	edumazet, dvhart, oleg, sbw

On Fri, Jul 11, 2014 at 08:25:43PM +0200, Frederic Weisbecker wrote:
> On Fri, Jul 11, 2014 at 01:10:41PM -0500, Christoph Lameter wrote:
> > On Tue, 8 Jul 2014, Frederic Weisbecker wrote:
> > 
> > > > I was figuring that a fair number of the kthreads might eventually
> > > > be using this, not just for the grace-period kthreads.
> > >
> > > Ok makes sense. But can we just rename the cpumask to housekeeping_mask?
> > 
> > That would imply that all no-nohz processors are housekeeping? So all
> > processors with a tick are housekeeping?
> 
> Well, now that I think about it again, I would really like to keep housekeeping
> to CPU 0 when nohz_full= is passed.

When CONFIG_NO_HZ_FULL_SYSIDLE=y, then housekeeping kthreads are bound to
CPU 0.  However, doing this causes significant slowdowns according to
Fengguang's testing, so when CONFIG_NO_HZ_FULL_SYSIDLE=n, I bind the
housekeeping kthreads to the set of non-nohz_full CPUs.
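
In other words, the policy is roughly this (a sketch, not the exact
-rcu code for housekeeping_affine()):

	static void housekeeping_affine(struct task_struct *t)
	{
		if (IS_ENABLED(CONFIG_NO_HZ_FULL_SYSIDLE))
			set_cpus_allowed_ptr(t, cpumask_of(0));	/* timekeeper */
		else if (tick_nohz_full_enabled())
			set_cpus_allowed_ptr(t, housekeeping_mask);
	}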

> > Could we make that set configurable? Ideally I'd like to have the ability
> > to restrict the housekeeping to one processor.
> 
> Ah, I'm curious about your usecase. But I think we can do that. And we should.
> 
> In fact I think that Paul could keep affining grace period kthread to CPU 0
> for the sole case when we have nohz_full= parameter passed.
> 
> I think the performance issues reported to him refer to CONFIG_NO_HZ_FULL=y
> config without nohz_full= parameter passed. That's the most important to address.
> 
> Optimizing the "nohz_full= passed" case is probably not very useful and worse
> it complicates things a lot.
> 
> What do you think Paul? Can we simplify things that way? I'm pretty sure that
> nobody cares about optimizing the nohz_full= case. That would really simplify
> things to stick to CPU 0.

When we have CONFIG_NO_HZ_FULL_SYSIDLE=y, agreed.  In that case, having
housekeeping CPUs on CPUs other than CPU 0 means that you never reach
full-system-idle state.

But in other cases, we appear to need more than one housekeeping CPU.
This is especially the case when people run general workloads on systems
that have NO_HZ_FULL=y, which appears to be a significant fraction of
the systems these days.

							Thanx, Paul



* Re: [PATCH tip/core/rcu 11/17] rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs
  2014-07-11 18:45               ` Paul E. McKenney
@ 2014-07-11 18:57                 ` Frederic Weisbecker
  2014-07-11 19:08                   ` Paul E. McKenney
  0 siblings, 1 reply; 54+ messages in thread
From: Frederic Weisbecker @ 2014-07-11 18:57 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Christoph Lameter, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, josh, niv, tglx, peterz, rostedt, dhowells,
	edumazet, dvhart, oleg, sbw

On Fri, Jul 11, 2014 at 11:45:28AM -0700, Paul E. McKenney wrote:
> On Fri, Jul 11, 2014 at 08:25:43PM +0200, Frederic Weisbecker wrote:
> > On Fri, Jul 11, 2014 at 01:10:41PM -0500, Christoph Lameter wrote:
> > > On Tue, 8 Jul 2014, Frederic Weisbecker wrote:
> > > 
> > > > > I was figuring that a fair number of the kthreads might eventually
> > > > > be using this, not just for the grace-period kthreads.
> > > >
> > > > Ok makes sense. But can we just rename the cpumask to housekeeping_mask?
> > > 
> > > That would imply that all no-nohz processors are housekeeping? So all
> > > processors with a tick are housekeeping?
> > 
> > Well, now that I think about it again, I would really like to keep housekeeping
> > to CPU 0 when nohz_full= is passed.
> 
> When CONFIG_NO_HZ_FULL_SYSIDLE=y, then housekeeping kthreads are bound to
> CPU 0.  However, doing this causes significant slowdowns according to
> Fengguang's testing, so when CONFIG_NO_HZ_FULL_SYSIDLE=n, I bind the
> housekeeping kthreads to the set of non-nohz_full CPUs.

But did he see these slowdowns with nohz_full= parameter passed? I doubt he
tested that. And I'm not sure that people who need full dynticks will run
the usecases that trigger slowdowns with grace period kthreads.

I also doubt that people will often omit CPUs other than CPU 0 from the
nohz_full= range.

> 
> > > Could we make that set configurable? Ideally I'd like to have the ability
> > > to restrict the housekeeping to one processor.
> > 
> > Ah, I'm curious about your usecase. But I think we can do that. And we should.
> > 
> > In fact I think that Paul could keep affining grace period kthread to CPU 0
> > for the sole case when we have nohz_full= parameter passed.
> > 
> > I think the performance issues reported to him refer to CONFIG_NO_HZ_FULL=y
> > config without nohz_full= parameter passed. That's the most important to address.
> > 
> > Optimizing the "nohz_full= passed" case is probably not very useful and worse
> > it complicates things a lot.
> > 
> > What do you think Paul? Can we simplify things that way? I'm pretty sure that
> > nobody cares about optimizing the nohz_full= case. That would really simplify
> > things to stick to CPU 0.
> 
> When we have CONFIG_NO_HZ_FULL_SYSIDLE=y, agreed.  In that case, having
> housekeeping CPUs on CPUs other than CPU 0 means that you never reach
> full-system-idle state.

That said, I expect CONFIG_NO_HZ_FULL_SYSIDLE=y to always be enabled for those
who run NO_HZ_FULL in the long run.

> 
> But in other cases, we appear to need more than one housekeeping CPU.
> This is especially the case when people run general workloads on systems
> that have NO_HZ_FULL=y, which appears to be a significant fraction of
> the systems these days.

Yeah NO_HZ_FULL=y is likely to be enabled in many distros. But you know the
amount of nohz_full= users.


* Re: [PATCH tip/core/rcu 11/17] rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs
  2014-07-11 18:25             ` Frederic Weisbecker
  2014-07-11 18:45               ` Paul E. McKenney
@ 2014-07-11 19:05               ` Christoph Lameter
  2014-07-11 19:11                 ` Frederic Weisbecker
  2014-07-11 20:15                 ` Peter Zijlstra
  1 sibling, 2 replies; 54+ messages in thread
From: Christoph Lameter @ 2014-07-11 19:05 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E. McKenney, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, josh, niv, tglx, peterz, rostedt, dhowells,
	edumazet, dvhart, oleg, sbw

On Fri, 11 Jul 2014, Frederic Weisbecker wrote:

> > That would imply that all no-nohz processors are housekeeping? So all
> > processors with a tick are housekeeping?
>
> Well, now that I think about it again, I would really like to keep housekeeping
> to CPU 0 when nohz_full= is passed.

Yeah.

> > Could we make that set configurable? Ideally I'd like to have the ability
> > to restrict the housekeeping to one processor.
>
> Ah, I'm curious about your usecase. But I think we can do that. And we should.

The use case is pretty straightforward because we are trying to keep as
much OS noise as possible off most processors. Processor 0 is the
sacrificial lamb that will be used for all OS processing and hopefully all
high latency operations will occur there. Processors 1-X have a tick but
we still try to keep latencies sane. And then there is X-Y where tick is
off.
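
For example, on a hypothetical 16-CPU box, that split could be expressed
with today's parameters plus the housekeeping= one Paul proposed above:

	nohz_full=8-15 rcu_nocbs=8-15 housekeeping=0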


* Re: [PATCH tip/core/rcu 11/17] rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs
  2014-07-11 18:57                 ` Frederic Weisbecker
@ 2014-07-11 19:08                   ` Paul E. McKenney
  2014-07-11 19:26                     ` Frederic Weisbecker
  0 siblings, 1 reply; 54+ messages in thread
From: Paul E. McKenney @ 2014-07-11 19:08 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Christoph Lameter, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, josh, niv, tglx, peterz, rostedt, dhowells,
	edumazet, dvhart, oleg, sbw

On Fri, Jul 11, 2014 at 08:57:33PM +0200, Frederic Weisbecker wrote:
> On Fri, Jul 11, 2014 at 11:45:28AM -0700, Paul E. McKenney wrote:
> > On Fri, Jul 11, 2014 at 08:25:43PM +0200, Frederic Weisbecker wrote:
> > > On Fri, Jul 11, 2014 at 01:10:41PM -0500, Christoph Lameter wrote:
> > > > On Tue, 8 Jul 2014, Frederic Weisbecker wrote:
> > > > 
> > > > > > I was figuring that a fair number of the kthreads might eventually
> > > > > > be using this, not just for the grace-period kthreads.
> > > > >
> > > > > Ok makes sense. But can we just rename the cpumask to housekeeping_mask?
> > > > 
> > > > That would imply that all no-nohz processors are housekeeping? So all
> > > > processors with a tick are housekeeping?
> > > 
> > > Well, now that I think about it again, I would really like to keep housekeeping
> > > to CPU 0 when nohz_full= is passed.
> > 
> > When CONFIG_NO_HZ_FULL_SYSIDLE=y, then housekeeping kthreads are bound to
> > CPU 0.  However, doing this causes significant slowdowns according to
> > Fengguang's testing, so when CONFIG_NO_HZ_FULL_SYSIDLE=n, I bind the
> > housekeeping kthreads to the set of non-nohz_full CPUs.
> 
> But did he see these slowdowns with nohz_full= parameter passed? I doubt he
> tested that. And I'm not sure that people who need full dynticks will run
> the usecases that trigger slowdowns with grace period kthreads.
> 
> I also doubt that people will often omit CPUs other than CPU 0 from the
> nohz_full= range.

Agreed, this is only a problem when people run workloads for which
NO_HZ_FULL is not well-suited.  Which is why I settled on designating
the non-nohz_full= CPUs as the housekeeping CPUs -- people wanting to
run general workloads not suited to NO_HZ_FULL probably won't specify
nohz_full=.  If they don't, then any CPU can be a housekeeping CPU.

> > > > Could we make that set configurable? Ideally I'd like to have the ability
> > > > to restrict the housekeeping to one processor.
> > > 
> > > Ah, I'm curious about your usecase. But I think we can do that. And we should.
> > > 
> > > In fact I think that Paul could keep affining grace period kthread to CPU 0
> > > for the sole case when we have nohz_full= parameter passed.
> > > 
> > > I think the performance issues reported to him refer to CONFIG_NO_HZ_FULL=y
> > > config without nohz_full= parameter passed. That's the most important to address.
> > > 
> > > Optimizing the "nohz_full= passed" case is probably not very useful and worse
> > > it complicates things a lot.
> > > 
> > > What do you think Paul? Can we simplify things that way? I'm pretty sure that
> > > nobody cares about optimizing the nohz_full= case. That would really simplify
> > > things to stick to CPU 0.
> > 
> > When we have CONFIG_NO_HZ_FULL_SYSIDLE=y, agreed.  In that case, having
> > housekeeping CPUs on CPUs other than CPU 0 means that you never reach
> > full-system-idle state.
> 
> That said, I expect CONFIG_NO_HZ_FULL_SYSIDLE=y to always be enabled for those
> who run NO_HZ_FULL in the long run.

Hmmm...  That probably means that we need boot-time parameters to
make sysidle detection really happen.  Otherwise, many users will
get a nasty surprise once CONFIG_NO_HZ_FULL_SYSIDLE=y is enabled on
systems that really aren't running HPC or RT workloads.

I suppose that I could confine SYSIDLE's attention to the nohz_full=
CPUs -- that might actually make things work nicely in all cases with
no configuration of any sort required.  I will need to give this some
thought.
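
Roughly this sort of check, that is (hypothetical, and ignoring the
memory-ordering care the real sysidle state machine needs):

	static bool sysidle_nohz_full_cpus_idle(void)
	{
		int cpu;

		for_each_cpu(cpu, tick_nohz_full_mask)
			if (!idle_cpu(cpu))
				return false;
		return true;	/* every nohz_full= CPU is idle */
	}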

> > But in other cases, we appear to need more than one housekeeping CPU.
> > This is especially the case when people run general workloads on systems
> > that have NO_HZ_FULL=y, which appears to be a significant fraction of
> > the systems these days.
> 
> Yeah NO_HZ_FULL=y is likely to be enabled in many distros. But you know the
> amount of nohz_full= users.

Indeed!  ;-)

								Thanx, Paul



* Re: [PATCH tip/core/rcu 11/17] rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs
  2014-07-11 19:05               ` Christoph Lameter
@ 2014-07-11 19:11                 ` Frederic Weisbecker
  2014-07-11 20:35                   ` Paul E. McKenney
  2014-07-11 20:15                 ` Peter Zijlstra
  1 sibling, 1 reply; 54+ messages in thread
From: Frederic Weisbecker @ 2014-07-11 19:11 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Paul E. McKenney, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, josh, niv, tglx, peterz, rostedt, dhowells,
	edumazet, dvhart, oleg, sbw

On Fri, Jul 11, 2014 at 02:05:08PM -0500, Christoph Lameter wrote:
> On Fri, 11 Jul 2014, Frederic Weisbecker wrote:
> 
> > > That would imply that all no-nohz processors are housekeeping? So all
> > > processors with a tick are housekeeping?
> >
> > Well, now that I think about it again, I would really like to keep housekeeping
> > to CPU 0 when nohz_full= is passed.
> 
> Yeah.
> 
> > > Could we make that set configurable? Ideally I'd like to have the ability
> > > to restrict the housekeeping to one processor.
> >
> > Ah, I'm curious about your usecase. But I think we can do that. And we should.
> 
> The use case is pretty straightforward because we are trying to keep as
> much OS noise as possible off most processors. Processor 0 is the
> sacrificial lamb that will be used for all OS processing and hopefully all
> high latency operations will occur there. Processors 1-X have a tick but
> we still try to keep latencies sane. And then there is X-Y where tick is
> off.

Ok. I don't entirely get why you need 1-X but I can easily imagine some non-latency-critical
stuff running there.

Paul proposed "housekeeping=". If we ever go there, I'd rather vote for "sacrificial_lamb="


* Re: [PATCH tip/core/rcu 11/17] rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs
  2014-07-11 19:08                   ` Paul E. McKenney
@ 2014-07-11 19:26                     ` Frederic Weisbecker
  2014-07-11 19:43                       ` Paul E. McKenney
  0 siblings, 1 reply; 54+ messages in thread
From: Frederic Weisbecker @ 2014-07-11 19:26 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Christoph Lameter, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, josh, niv, tglx, peterz, rostedt, dhowells,
	edumazet, dvhart, oleg, sbw

On Fri, Jul 11, 2014 at 12:08:16PM -0700, Paul E. McKenney wrote:
> On Fri, Jul 11, 2014 at 08:57:33PM +0200, Frederic Weisbecker wrote:
> > On Fri, Jul 11, 2014 at 11:45:28AM -0700, Paul E. McKenney wrote:
> > > On Fri, Jul 11, 2014 at 08:25:43PM +0200, Frederic Weisbecker wrote:
> > > > On Fri, Jul 11, 2014 at 01:10:41PM -0500, Christoph Lameter wrote:
> > > > > On Tue, 8 Jul 2014, Frederic Weisbecker wrote:
> > > > > 
> > > > > > > I was figuring that a fair number of the kthreads might eventually
> > > > > > > be using this, not just for the grace-period kthreads.
> > > > > >
> > > > > > Ok makes sense. But can we just rename the cpumask to housekeeping_mask?
> > > > > 
> > > > > That would imply that all no-nohz processors are housekeeping? So all
> > > > > processors with a tick are housekeeping?
> > > > 
> > > > Well, now that I think about it again, I would really like to keep housekeeping
> > > > to CPU 0 when nohz_full= is passed.
> > > 
> > > When CONFIG_NO_HZ_FULL_SYSIDLE=y, then housekeeping kthreads are bound to
> > > CPU 0.  However, doing this causes significant slowdowns according to
> > > Fengguang's testing, so when CONFIG_NO_HZ_FULL_SYSIDLE=n, I bind the
> > > housekeeping kthreads to the set of non-nohz_full CPUs.
> > 
> > But did he see these slowdowns with nohz_full= parameter passed? I doubt he
> > tested that. And I'm not sure that people who need full dynticks will run
> > the usecases that trigger slowdowns with grace period kthreads.
> > 
> > I also doubt that people will often omit CPUs other than CPU 0 from the
> > nohz_full= range.
> 
> Agreed, this is only a problem when people run workloads for which
> NO_HZ_FULL is not well-suited.  Which is why I settled on designating
> the non-nohz_full= CPUs as the housekeeping CPUs -- people wanting to
> run general workloads not suited to NO_HZ_FULL probably won't specify
> nohz_full=.  If they don't, then any CPU can be a housekeeping CPU.

Right. So affining the GP kthread to all non-nohz-full CPUs works in all cases. It's convenient
but it requires some plumbing:

* add a housekeeping cpumask and implement housekeeping_affine on top
* add kthread_bind_cpumask()

So what I propose is to skip these complications and just do:

        if (tick_nohz_full_enabled()) // means that somebody passed nohz_full= kernel parameter
            kthread_bind_cpu(GP kthread, 0)
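
In concrete terms, from rcu_spawn_gp_kthread() before the kthread is
first woken (a sketch; rsp->gp_kthread is the real field name):

	if (tick_nohz_full_enabled())	/* somebody passed nohz_full= */
		kthread_bind(rsp->gp_kthread, 0);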

Moreover Thomas didn't like the idea of extending housekeeping duty beyond CPU 0, arguing that
it's too early for that. He meant that for timekeeping but the idea is expandable.

> 
> > > > > Could we make that set configurable? Ideally I'd like to have the ability
> > > > > to restrict the housekeeping to one processor.
> > > > 
> > > > Ah, I'm curious about your usecase. But I think we can do that. And we should.
> > > > 
> > > > In fact I think that Paul could keep affining grace period kthread to CPU 0
> > > > for the sole case when we have nohz_full= parameter passed.
> > > > 
> > > > I think the performance issues reported to him refer to CONFIG_NO_HZ_FULL=y
> > > > config without nohz_full= parameter passed. That's the most important to address.
> > > > 
> > > > Optimizing the "nohz_full= passed" case is probably not very useful and worse
> > > > it complicates things a lot.
> > > > 
> > > > What do you think Paul? Can we simplify things that way? I'm pretty sure that
> > > > nobody cares about optimizing the nohz_full= case. That would really simplify
> > > > things to stick to CPU 0.
> > > 
> > > When we have CONFIG_NO_HZ_FULL_SYSIDLE=y, agreed.  In that case, having
> > > housekeeping CPUs on CPUs other than CPU 0 means that you never reach
> > > full-system-idle state.
> > 
> > That said, I expect CONFIG_NO_HZ_FULL_SYSIDLE=y to always be enabled for those
> > who run NO_HZ_FULL in the long run.
> 
> Hmmm...  That probably means that we need boot-time parameters to
> make sysidle detection really happen.  Otherwise, many users will
> get a nasty surprise once CONFIG_NO_HZ_FULL_SYSIDLE=y is enabled on
> systems that really aren't running HPC or RT workloads.
> 
> I suppose that I could confine SYSIDLE's attention to the nohz_full=
> CPUs -- that might actually make things work nicely in all cases with
> no configuration of any sort required.  I will need to give this some
> thought.

Exactly, nohz_full= gives all the information we need for sysidle.

> 
> > > But in other cases, we appear to need more than one housekeeping CPU.
> > > This is especially the case when people run general workloads on systems
> > > that have NO_HZ_FULL=y, which appears to be a significant fraction of
> > > the systems these days.
> > 
> > Yeah NO_HZ_FULL=y is likely to be enabled in many distros. But you know the
> > amount of nohz_full= users.
> 
> Indeed!  ;-)
> 
> 								Thanx, Paul
> 


* Re: [PATCH tip/core/rcu 11/17] rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs
  2014-07-11 19:26                     ` Frederic Weisbecker
@ 2014-07-11 19:43                       ` Paul E. McKenney
  2014-07-11 19:55                         ` Frederic Weisbecker
  0 siblings, 1 reply; 54+ messages in thread
From: Paul E. McKenney @ 2014-07-11 19:43 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Christoph Lameter, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, josh, niv, tglx, peterz, rostedt, dhowells,
	edumazet, dvhart, oleg, sbw

On Fri, Jul 11, 2014 at 09:26:14PM +0200, Frederic Weisbecker wrote:
> On Fri, Jul 11, 2014 at 12:08:16PM -0700, Paul E. McKenney wrote:
> > On Fri, Jul 11, 2014 at 08:57:33PM +0200, Frederic Weisbecker wrote:
> > > On Fri, Jul 11, 2014 at 11:45:28AM -0700, Paul E. McKenney wrote:
> > > > On Fri, Jul 11, 2014 at 08:25:43PM +0200, Frederic Weisbecker wrote:
> > > > > On Fri, Jul 11, 2014 at 01:10:41PM -0500, Christoph Lameter wrote:
> > > > > > On Tue, 8 Jul 2014, Frederic Weisbecker wrote:
> > > > > > 
> > > > > > > > I was figuring that a fair number of the kthreads might eventually
> > > > > > > > be using this, not just for the grace-period kthreads.
> > > > > > >
> > > > > > > Ok makes sense. But can we just rename the cpumask to housekeeping_mask?
> > > > > > 
> > > > > > That would imply that all no-nohz processors are housekeeping? So all
> > > > > > processors with a tick are housekeeping?
> > > > > 
> > > > > Well, now that I think about it again, I would really like to keep housekeeping
> > > > > to CPU 0 when nohz_full= is passed.
> > > > 
> > > > When CONFIG_NO_HZ_FULL_SYSIDLE=y, then housekeeping kthreads are bound to
> > > > CPU 0.  However, doing this causes significant slowdowns according to
> > > > Fengguang's testing, so when CONFIG_NO_HZ_FULL_SYSIDLE=n, I bind the
> > > > housekeeping kthreads to the set of non-nohz_full CPUs.
> > > 
> > > But did he see these slowdowns with nohz_full= parameter passed? I doubt he
> > > tested that. And I'm not sure that people who need full dynticks will run
> > > the usecases that trigger slowdowns with grace period kthreads.
> > > 
> > > I also doubt that people will often omit CPUs other than CPU 0 from the
> > > nohz_full= range.
> > 
> > Agreed, this is only a problem when people run workloads for which
> > NO_HZ_FULL is not well-suited.  Which is why I settled on designating
> > the non-nohz_full= CPUs as the housekeeping CPUs -- people wanting to
> > run general workloads not suited to NO_HZ_FULL probably won't specify
> > nohz_full=.  If they don't, then any CPU can be a housekeeping CPU.
> 
> Right. So affining GP kthread to all non-nohz-full CPU works in all case. It's convenient
> but it requires some plumbing:
> 
> * add a housekeeping cpumask and implement housekeeping_affine on top
> * add kthread_bind_cpumask()

Yep.

> So what I propose is to skip these complications and just do:
> 
>         if (tick_nohz_full_enabled()) // means that somebody passed nohz_full= kernel parameter
>             kthread_bind_cpu(GP kthread, 0)
> 
> Moreover Thomas didn't like the idea of extending housekeeping duty beyond CPU 0, arguing that
> it's too early for that. He meant that for timekeeping but the idea is expandable.

Although I agree that we can get away with a single timekeeping CPU, I
don't believe that we can get away with having only a single housekeeping CPU.

> > > > > > Could we make that set configurable? Ideally I'd like to have the ability
> > > > > > to restrict the housekeeping to one processor.
> > > > > 
> > > > > Ah, I'm curious about your usecase. But I think we can do that. And we should.
> > > > > 
> > > > > In fact I think that Paul could keep affining grace period kthread to CPU 0
> > > > > for the sole case when we have nohz_full= parameter passed.
> > > > > 
> > > > > I think the performance issues reported to him refer to CONFIG_NO_HZ_FULL=y
> > > > > config without nohz_full= parameter passed. That's the most important to address.
> > > > > 
> > > > > Optimizing the "nohz_full= passed" case is probably not very useful and worse
> > > > > it complicates things a lot.
> > > > > 
> > > > > What do you think Paul? Can we simplify things that way? I'm pretty sure that
> > > > > nobody cares about optimizing the nohz_full= case. That would really simplify
> > > > > things to stick to CPU 0.
> > > > 
> > > > When we have CONFIG_NO_HZ_FULL_SYSIDLE=y, agreed.  In that case, having
> > > > housekeeping CPUs on CPUs other than CPU 0 means that you never reach
> > > > full-system-idle state.
> > > 
> > > That said, I expect CONFIG_NO_HZ_FULL_SYSIDLE=y to always be enabled for those
> > > who run NO_HZ_FULL in the long run.
> > 
> > Hmmm...  That probably means that we need boot-time parameters to
> > make sysidle detection really happen.  Otherwise, many users will
> > get a nasty surprise once CONFIG_NO_HZ_FULL_SYSIDLE=y is enabled on
> > systems that really aren't running HPC or RT workloads.
> > 
> > I suppose that I could confine SYSIDLE's attention to the nohz_full=
> > CPUs -- that might actually make things work nicely in all cases with
> > no configuration of any sort required.  I will need to give this some
> > thought.
> 
> Exactly, nohz_full= gives all the information we need for sysidle.

Famous last words!  ;-)

But it has worked well thus far.

							Thanx, Paul

> > > > But in other cases, we appear to need more than one housekeeping CPU.
> > > > This is especially the case when people run general workloads on systems
> > > > that have NO_HZ_FULL=y, which appears to be a significant fraction of
> > > > the systems these days.
> > > 
> > > Yeah NO_HZ_FULL=y is likely to be enabled in many distros. But you know the
> > > amount of nohz_full= users.
> > 
> > Indeed!  ;-)
> > 
> > 								Thanx, Paul
> > 
> 



* Re: [PATCH tip/core/rcu 11/17] rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs
  2014-07-11 19:43                       ` Paul E. McKenney
@ 2014-07-11 19:55                         ` Frederic Weisbecker
  0 siblings, 0 replies; 54+ messages in thread
From: Frederic Weisbecker @ 2014-07-11 19:55 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Christoph Lameter, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, josh, niv, tglx, peterz, rostedt, dhowells,
	edumazet, dvhart, oleg, sbw

On Fri, Jul 11, 2014 at 12:43:14PM -0700, Paul E. McKenney wrote:
> On Fri, Jul 11, 2014 at 09:26:14PM +0200, Frederic Weisbecker wrote:
> > On Fri, Jul 11, 2014 at 12:08:16PM -0700, Paul E. McKenney wrote:
> > > On Fri, Jul 11, 2014 at 08:57:33PM +0200, Frederic Weisbecker wrote:
> > > > On Fri, Jul 11, 2014 at 11:45:28AM -0700, Paul E. McKenney wrote:
> > > > > On Fri, Jul 11, 2014 at 08:25:43PM +0200, Frederic Weisbecker wrote:
> > > > > > On Fri, Jul 11, 2014 at 01:10:41PM -0500, Christoph Lameter wrote:
> > > > > > > On Tue, 8 Jul 2014, Frederic Weisbecker wrote:
> > > > > > > 
> > > > > > > > > I was figuring that a fair number of the kthreads might eventually
> > > > > > > > > be using this, not just for the grace-period kthreads.
> > > > > > > >
> > > > > > > > Ok makes sense. But can we just rename the cpumask to housekeeping_mask?
> > > > > > > 
> > > > > > > That would imply that all no-nohz processors are housekeeping? So all
> > > > > > > processors with a tick are housekeeping?
> > > > > > 
> > > > > > Well, now that I think about it again, I would really like to keep housekeeping
> > > > > > to CPU 0 when nohz_full= is passed.
> > > > > 
> > > > > When CONFIG_NO_HZ_FULL_SYSIDLE=y, then housekeeping kthreads are bound to
> > > > > CPU 0.  However, doing this causes significant slowdowns according to
> > > > > Fengguang's testing, so when CONFIG_NO_HZ_FULL_SYSIDLE=n, I bind the
> > > > > housekeeping kthreads to the set of non-nohz_full CPUs.
> > > > 
> > > > But did he see these slowdowns with nohz_full= parameter passed? I doubt he
> > > > tested that. And I'm not sure that people who need full dynticks will run
> > > > the usecases that trigger slowdowns with grace period kthreads.
> > > > 
> > > > I also doubt that people will often omit CPUs other than CPU 0 from the
> > > > nohz_full= range.
> > > 
> > > Agreed, this is only a problem when people run workloads for which
> > > NO_HZ_FULL is not well-suited.  Which is why I settled on designating
> > > the non-nohz_full= CPUs as the housekeeping CPUs -- people wanting to
> > > run general workloads not suited to NO_HZ_FULL probably won't specify
> > > nohz_full=.  If they don't, then any CPU can be a housekeeping CPU.
> > 
> > Right. So affining the GP kthread to all non-nohz-full CPUs works in all cases. It's convenient
> > but it requires some plumbing:
> > 
> > * add a housekeeping cpumask and implement housekeeping_affine on top
> > * add kthread_bind_cpumask()
> 
> Yep.
> 
> > So what I propose is to skip these complications and just do:
> > 
> >         if (tick_nohz_full_enabled()) // means that somebody passed nohz_full= kernel parameter
> >             kthread_bind_cpu(GP kthread, 0)
> > 
> > Moreover Thomas didn't like the idea of extending housekeeping duty beyond CPU 0, arguing that
> > it's too early for that. He meant that for timekeeping but the idea is expandable.
> 
> Although I agree that we can get away with a single timekeeping CPU, I
> don't believe that we can get away with having only a single housekeeping CPU.

Ok, well I won't insist too much. As long as the performance issue is fixed, I'm ok :)

Thanks.


* Re: [PATCH tip/core/rcu 11/17] rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs
  2014-07-11 19:05               ` Christoph Lameter
  2014-07-11 19:11                 ` Frederic Weisbecker
@ 2014-07-11 20:15                 ` Peter Zijlstra
  2014-07-14 13:53                   ` Christoph Lameter
  1 sibling, 1 reply; 54+ messages in thread
From: Peter Zijlstra @ 2014-07-11 20:15 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Frederic Weisbecker, Paul E. McKenney, linux-kernel, mingo,
	laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	rostedt, dhowells, edumazet, dvhart, oleg, sbw

On Fri, Jul 11, 2014 at 02:05:08PM -0500, Christoph Lameter wrote:
> The use case is pretty straightforward because we are trying to keep as
> much OS noise as possible off most processors. Processor 0 is the
> sacrificial lamb that will be used for all OS processing and hopefully all
> high latency operations will occur there. Processors 1-X have a tick but
> we still try to keep latencies sane. And then there is X-Y where tick is
> off.

So the problem is -- and this isn't too hard to create -- when we
saturate/overload cpu0 with just the housekeeping. At that point it
would be good if we could scale and use multiple housekeepers.


* Re: [PATCH tip/core/rcu 11/17] rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs
  2014-07-11 19:11                 ` Frederic Weisbecker
@ 2014-07-11 20:35                   ` Paul E. McKenney
  2014-07-11 20:45                     ` Frederic Weisbecker
  0 siblings, 1 reply; 54+ messages in thread
From: Paul E. McKenney @ 2014-07-11 20:35 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Christoph Lameter, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, josh, niv, tglx, peterz, rostedt, dhowells,
	edumazet, dvhart, oleg, sbw

On Fri, Jul 11, 2014 at 09:11:15PM +0200, Frederic Weisbecker wrote:
> On Fri, Jul 11, 2014 at 02:05:08PM -0500, Christoph Lameter wrote:
> > On Fri, 11 Jul 2014, Frederic Weisbecker wrote:
> > 
> > > > That would imply that all non-nohz processors are housekeeping? So all
> > > > processors with a tick are housekeeping?
> > >
> > > Well, now that I think about it again, I would really like to keep housekeeping
> > > on CPU 0 when nohz_full= is passed.
> > 
> > Yeah.
> > 
> > > > Could we make that set configurable? Ideally I'd like to have the ability
> > > > to restrict the housekeeping to one processor.
> > >
> > > Ah, I'm curious about your use case. But I think we can do that. And we should.
> > 
> > The use case is pretty straightforward because we are trying to keep as
> > much OS noise as possible off most processors. Processor 0 is the
> > sacrificial lamb that will be used for all OS processing, and hopefully all
> > high-latency operations will occur there. Processors 1-X have a tick, but
> > we still try to keep latencies sane. And then there are processors X-Y
> > where the tick is off.
> 
> Ok. I don't entirely get why you need 1-X but I can easily imagine some non-latency-critical
> stuff running there.
> 
> Paul proposed "housekeeping=". If we ever go there, I'd rather vote for "sacrificial_lamb="

Given Christoph's desire for only one housekeeping CPU, I guess the
counting version makes the most sense, so that "housekeeping=N" designates
the first N non-nohz CPUs in numerical order as housekeeping CPUs.
If there are fewer than N non-nohz CPUs, you get a splat at boot time
and your request is capped at the number of non-nohz CPUs.

Seem reasonable?
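
A sketch of how such a counting parameter might be parsed and capped, for
illustration only; "housekeeping=" was never merged, and the names below
(nr_housekeeping, housekeeping_cap()) are assumptions:

        /* Hypothetical "housekeeping=N": designate the first N non-nohz CPUs
         * as housekeeping, capping N with a boot-time splat if it is too big. */
        static int nr_housekeeping __initdata = 1;

        static int __init housekeeping_setup(char *str)
        {
                if (kstrtoint(str, 10, &nr_housekeeping) || nr_housekeeping < 1)
                        nr_housekeeping = 1;
                return 1;
        }
        __setup("housekeeping=", housekeeping_setup);

        static void __init housekeeping_cap(void)
        {
                int nr_non_nohz = num_possible_cpus() -
                                  cpumask_weight(tick_nohz_full_mask);

                if (nr_housekeeping > nr_non_nohz) {
                        WARN(1, "housekeeping=%d capped to %d non-nohz CPUs\n",
                             nr_housekeeping, nr_non_nohz);
                        nr_housekeeping = nr_non_nohz;
                }
        }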

							Thanx, Paul



* Re: [PATCH tip/core/rcu 11/17] rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs
  2014-07-11 20:35                   ` Paul E. McKenney
@ 2014-07-11 20:45                     ` Frederic Weisbecker
  2014-07-12  1:39                       ` Paul E. McKenney
  0 siblings, 1 reply; 54+ messages in thread
From: Frederic Weisbecker @ 2014-07-11 20:45 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Christoph Lameter, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, josh, niv, tglx, peterz, rostedt, dhowells,
	edumazet, dvhart, oleg, sbw

On Fri, Jul 11, 2014 at 01:35:13PM -0700, Paul E. McKenney wrote:
> On Fri, Jul 11, 2014 at 09:11:15PM +0200, Frederic Weisbecker wrote:
> > On Fri, Jul 11, 2014 at 02:05:08PM -0500, Christoph Lameter wrote:
> > > On Fri, 11 Jul 2014, Frederic Weisbecker wrote:
> > > 
> > > > > That would imply that all non-nohz processors are housekeeping? So all
> > > > > processors with a tick are housekeeping?
> > > >
> > > > Well, now that I think about it again, I would really like to keep housekeeping
> > > > on CPU 0 when nohz_full= is passed.
> > > 
> > > Yeah.
> > > 
> > > > > Could we make that set configurable? Ideally I'd like to have the ability
> > > > > to restrict the housekeeping to one processor.
> > > >
> > > > Ah, I'm curious about your use case. But I think we can do that. And we should.
> > > 
> > > The use case is pretty straightforward because we are trying to keep as
> > > much OS noise as possible off most processors. Processor 0 is the
> > > sacrificial lamb that will be used for all OS processing, and hopefully all
> > > high-latency operations will occur there. Processors 1-X have a tick, but
> > > we still try to keep latencies sane. And then there are processors X-Y
> > > where the tick is off.
> > 
> > Ok. I don't entirely get why you need 1-X but I can easily imagine some non-latency-critical
> > stuff running there.
> > 
> > Paul proposed "housekeeping=". If we ever go there, I'd rather vote for "sacrificial_lamb="
> 
> Given Christoph's desire for only one housekeeping CPU, I guess the
> counting version makes the most sense, so that "housekeeping=N" designates
> the first N non-nohz CPUs in numerical order as housekeeping CPUs.
> If there are fewer than N non-nohz CPUs, you get a splat at boot time
> and your request is capped at the number of non-nohz CPUs.
> 
> Seem reasonable?

I wonder whether it wouldn't be more reasonable to affine housekeeping to all non-nohz-full CPUs
by default, and then let people who want finer-grained housekeeping manually affine kthreads from userspace.

That implies binding without PF_NO_SETAFFINITY, but that's easy enough to do.
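
Something like the sketch below, assuming a housekeeping cpumask exists
(illustrative, not necessarily what gets merged). The point is that
kthread_bind() sets PF_NO_SETAFFINITY, which is what blocks a later
sched_setaffinity() from userspace, while set_cpus_allowed_ptr() does not:

        /* Affine a kthread to the housekeeping CPUs without setting
         * PF_NO_SETAFFINITY, so userspace can still re-affine it later.
         * housekeeping_mask is an assumed non-nohz-full cpumask. */
        static void housekeeping_affine(struct task_struct *t)
        {
                if (tick_nohz_full_enabled())
                        set_cpus_allowed_ptr(t, housekeeping_mask);
        }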



* Re: [PATCH tip/core/rcu 11/17] rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs
  2014-07-11 20:45                     ` Frederic Weisbecker
@ 2014-07-12  1:39                       ` Paul E. McKenney
  2014-07-14 13:52                         ` Christoph Lameter
  0 siblings, 1 reply; 54+ messages in thread
From: Paul E. McKenney @ 2014-07-12  1:39 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Christoph Lameter, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, josh, niv, tglx, peterz, rostedt, dhowells,
	edumazet, dvhart, oleg, sbw

On Fri, Jul 11, 2014 at 10:45:05PM +0200, Frederic Weisbecker wrote:
> On Fri, Jul 11, 2014 at 01:35:13PM -0700, Paul E. McKenney wrote:
> > On Fri, Jul 11, 2014 at 09:11:15PM +0200, Frederic Weisbecker wrote:
> > > On Fri, Jul 11, 2014 at 02:05:08PM -0500, Christoph Lameter wrote:
> > > > On Fri, 11 Jul 2014, Frederic Weisbecker wrote:
> > > > 
> > > > > > That would imply that all non-nohz processors are housekeeping? So all
> > > > > > processors with a tick are housekeeping?
> > > > >
> > > > > Well, now that I think about it again, I would really like to keep housekeeping
> > > > > on CPU 0 when nohz_full= is passed.
> > > > 
> > > > Yeah.
> > > > 
> > > > > > Could we make that set configurable? Ideally I'd like to have the ability
> > > > > > to restrict the housekeeping to one processor.
> > > > >
> > > > > Ah, I'm curious about your use case. But I think we can do that. And we should.
> > > > 
> > > > The use case is pretty straightforward because we are trying to keep as
> > > > much OS noise as possible off most processors. Processor 0 is the
> > > > sacrificial lamb that will be used for all OS processing, and hopefully all
> > > > high-latency operations will occur there. Processors 1-X have a tick, but
> > > > we still try to keep latencies sane. And then there are processors X-Y
> > > > where the tick is off.
> > > 
> > > Ok. I don't entirely get why you need 1-X but I can easily imagine some non-latency-critical
> > > stuff running there.
> > > 
> > > Paul proposed "housekeeping=". If we ever go there, I'd rather vote for "sacrificial_lamb="
> > 
> > Given Christoph's desire for only one housekeeping CPU, I guess the
> > counting version makes the most sense, so that "housekeeping=N" designates
> > the first N non-nohz CPUs in numerical order as housekeeping CPUs.
> > If there are fewer than N non-nohz CPUs, you get a splat at boot time
> > and your request is capped at the number of non-nohz CPUs.
> > 
> > Seem reasonable?
> 
> I wonder whether it wouldn't be more reasonable to affine housekeeping to all non-nohz-full CPUs
> by default, and then let people who want finer-grained housekeeping manually affine kthreads from userspace.
> 
> That implies binding without PF_NO_SETAFFINITY, but that's easy enough to do.

Works for me!

							Thanx, Paul



* Re: [PATCH tip/core/rcu 11/17] rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs
  2014-07-12  1:39                       ` Paul E. McKenney
@ 2014-07-14 13:52                         ` Christoph Lameter
  0 siblings, 0 replies; 54+ messages in thread
From: Christoph Lameter @ 2014-07-14 13:52 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Frederic Weisbecker, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, josh, niv, tglx, peterz, rostedt, dhowells,
	edumazet, dvhart, oleg, sbw

On Fri, 11 Jul 2014, Paul E. McKenney wrote:

> >
> > I wonder whether it wouldn't be more reasonable to affine housekeeping to all non-nohz-full CPUs
> > by default, and then let people who want finer-grained housekeeping manually affine kthreads from userspace.
> >
> > That implies binding without PF_NO_SETAFFINITY, but that's easy enough to do.
>
> Works for me!

Also here.



* Re: [PATCH tip/core/rcu 11/17] rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs
  2014-07-11 20:15                 ` Peter Zijlstra
@ 2014-07-14 13:53                   ` Christoph Lameter
  0 siblings, 0 replies; 54+ messages in thread
From: Christoph Lameter @ 2014-07-14 13:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Frederic Weisbecker, Paul E. McKenney, linux-kernel, mingo,
	laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	rostedt, dhowells, edumazet, dvhart, oleg, sbw

On Fri, 11 Jul 2014, Peter Zijlstra wrote:

> On Fri, Jul 11, 2014 at 02:05:08PM -0500, Christoph Lameter wrote:
> The use case is pretty straightforward because we are trying to keep as
> much OS noise as possible off most processors. Processor 0 is the
> sacrificial lamb that will be used for all OS processing, and hopefully all
> high-latency operations will occur there. Processors 1-X have a tick, but
> we still try to keep latencies sane. And then there are processors X-Y
> where the tick is off.
>
> So the problem is -- and this isn't too hard to create -- when we
> saturate/overload cpu0 with just the housekeeping. At that point it
> would be good if we could scale and use multiple housekeepers.

Well, we would simply add another machine at that point, but there are
certainly configurations in which we would also want multiple housekeeping
CPUs.



end of thread

Thread overview: 54+ messages
2014-07-07 22:37 [PATCH tip/core/rcu 0/17] Miscellaneous fixes for 3.17 Paul E. McKenney
2014-07-07 22:38 ` [PATCH tip/core/rcu 01/17] rcu: Document deadlock-avoidance information for rcu_read_unlock() Paul E. McKenney
2014-07-07 22:38   ` [PATCH tip/core/rcu 02/17] rcu: Handle obsolete references to TINY_PREEMPT_RCU Paul E. McKenney
2014-07-07 22:38   ` [PATCH tip/core/rcu 03/17] signal: Explain local_irq_save() call Paul E. McKenney
2014-07-08  9:01     ` Lai Jiangshan
2014-07-08 15:50       ` Paul E. McKenney
2014-07-07 22:38   ` [PATCH tip/core/rcu 04/17] rcu: Make rcu node arrays static const char * const Paul E. McKenney
2014-07-07 22:38   ` [PATCH tip/core/rcu 05/17] rcu: remove redundant ACCESS_ONCE() from tick_do_timer_cpu Paul E. McKenney
2014-07-08 14:46     ` Frederic Weisbecker
2014-07-07 22:38   ` [PATCH tip/core/rcu 06/17] rcu: Eliminate read-modify-write ACCESS_ONCE() calls Paul E. McKenney
2014-07-08 16:59     ` Pranith Kumar
2014-07-08 20:35       ` Paul E. McKenney
2014-07-08 20:43         ` Pranith Kumar
2014-07-08 21:40           ` Paul E. McKenney
2014-07-07 22:38   ` [PATCH tip/core/rcu 07/17] rcu: Loosen __call_rcu()'s rcu_head alignment constraint Paul E. McKenney
2014-07-07 22:38   ` [PATCH tip/core/rcu 08/17] rcu: Allow post-unlock reference for rt_mutex Paul E. McKenney
2014-07-09  1:50     ` Lai Jiangshan
2014-07-09 16:04       ` Paul E. McKenney
2014-07-07 22:38   ` [PATCH tip/core/rcu 09/17] rcu: Check both root and current rcu_node when setting up future grace period Paul E. McKenney
2014-07-07 22:38   ` [PATCH tip/core/rcu 10/17] rcu: Simplify priority boosting by putting rt_mutex in rcu_node Paul E. McKenney
2014-07-07 22:38   ` [PATCH tip/core/rcu 11/17] rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs Paul E. McKenney
2014-07-08 15:24     ` Frederic Weisbecker
2014-07-08 15:47       ` Paul E. McKenney
2014-07-08 18:38         ` Frederic Weisbecker
2014-07-08 19:58           ` Paul E. McKenney
2014-07-08 20:40             ` Frederic Weisbecker
2014-07-08 22:05               ` Paul E. McKenney
2014-07-09 15:40                 ` Frederic Weisbecker
2014-07-11 18:10           ` Christoph Lameter
2014-07-11 18:25             ` Frederic Weisbecker
2014-07-11 18:45               ` Paul E. McKenney
2014-07-11 18:57                 ` Frederic Weisbecker
2014-07-11 19:08                   ` Paul E. McKenney
2014-07-11 19:26                     ` Frederic Weisbecker
2014-07-11 19:43                       ` Paul E. McKenney
2014-07-11 19:55                         ` Frederic Weisbecker
2014-07-11 19:05               ` Christoph Lameter
2014-07-11 19:11                 ` Frederic Weisbecker
2014-07-11 20:35                   ` Paul E. McKenney
2014-07-11 20:45                     ` Frederic Weisbecker
2014-07-12  1:39                       ` Paul E. McKenney
2014-07-14 13:52                         ` Christoph Lameter
2014-07-11 20:15                 ` Peter Zijlstra
2014-07-14 13:53                   ` Christoph Lameter
2014-07-11 18:29             ` Paul E. McKenney
2014-07-07 22:38   ` [PATCH tip/core/rcu 12/17] rcu: Don't use NMIs to dump other CPUs' stacks Paul E. McKenney
2014-07-07 22:38   ` [PATCH tip/core/rcu 13/17] rcu: Use __this_cpu_read() instead of per_cpu_ptr() Paul E. McKenney
2014-07-07 22:38   ` [PATCH tip/core/rcu 14/17] rcu: remove CONFIG_PROVE_RCU_DELAY Paul E. McKenney
2014-07-08  8:11     ` Paul Bolle
2014-07-08 13:56       ` Paul E. McKenney
2014-07-07 22:38   ` [PATCH tip/core/rcu 15/17] rcu: Fix __rcu_reclaim() to use true/false for bool Paul E. McKenney
2014-07-07 22:38   ` [PATCH tip/core/rcu 16/17] rcu: Fix a sparse warning in rcu_initiate_boost() Paul E. McKenney
2014-07-07 22:38   ` [PATCH tip/core/rcu 17/17] rcu: Fix a sparse warning in rcu_report_unblock_qs_rnp() Paul E. McKenney
2014-07-09  2:14 ` [PATCH tip/core/rcu 0/17] Miscellaneous fixes for 3.17 Lai Jiangshan
