linux-kernel.vger.kernel.org archive mirror
* [PATCH tip/core/rcu 0/13] Miscellaneous fixes for 4.4
@ 2015-10-06 16:13 Paul E. McKenney
  2015-10-06 16:13 ` [PATCH tip/core/rcu 01/13] sched: Export sched_setscheduler_nocheck Paul E. McKenney
  2015-10-06 17:23 ` [PATCH tip/core/rcu 0/13] Miscellaneous fixes for 4.4 Josh Triplett
  0 siblings, 2 replies; 57+ messages in thread
From: Paul E. McKenney @ 2015-10-06 16:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani

Hello!

This series contains miscellaneous fixes:

1.	Export sched_setscheduler_nocheck() so that the new locktorture
	rtmutex_lock tests can be run as modules, courtesy of Davidlohr
	Bueso.

2.	Use rcu_callback_t in call_rcu*() and friends to improve
	readability and to make cscope able to find them, courtesy of
	Boqun Feng.

3.	Use call_rcu_func_t to replace explicit type equivalents when
	defining RCU callback functions, courtesy of Boqun Feng.

4.	Don't unnecessarily disable preemption for Tiny and Tree
	RCU readers (only for preemptible RCU readers), courtesy
	of Boqun Feng.

5.	Eliminate boot-time panic when a silly boot-time fanout is
	specified.

6.	Add online/offline info to help debug stall-warning messages.

7.	Move preemption disabling out of __srcu_read_lock() into
	srcu_read_lock().

8.	Finish folding ->fqs_state into ->gp_state, courtesy of Petr Mladek.

9.	Correct comment for values of ->gp_state field.

10.	Add rcu_pointer_handoff() to allow explicit marking of handing
	off protection from RCU to some other means, such as locking
	or reference counting.

11.	Make list_entry_rcu() use lockless_dereference(), courtesy
	of Patrick Marlier.  Although this patch does nothing more
	than eliminate a single store and a single load of an
	unshared stack variable, it nevertheless provides a
	measurable performance increase:
	http://people.csail.mit.edu/amatveev/RLU_SOSP2015.pdf

12.	Remove deprecated rcu_lockdep_assert().

							Thanx, Paul

------------------------------------------------------------------------

 b/Documentation/kernel-parameters.txt |    9 +++--
 b/include/linux/list.h                |    5 +-
 b/include/linux/list_bl.h             |    5 +-
 b/include/linux/list_nulls.h          |    3 +
 b/include/linux/rculist.h             |    5 --
 b/include/linux/rcupdate.h            |   59 +++++++++++++++++-----------------
 b/include/linux/rcutiny.h             |    3 +
 b/include/linux/rcutree.h             |    2 -
 b/include/linux/srcu.h                |    5 ++
 b/kernel/exit.c                       |    2 +
 b/kernel/rcu/rcutorture.c             |    6 +--
 b/kernel/rcu/srcu.c                   |    4 --
 b/kernel/rcu/tiny.c                   |    8 ++--
 b/kernel/rcu/tree.c                   |   55 ++++++++++++++++++-------------
 b/kernel/rcu/tree.h                   |   21 +++---------
 b/kernel/rcu/tree_plugin.h            |   10 ++++-
 b/kernel/rcu/tree_trace.c             |    2 -
 b/kernel/rcu/update.c                 |    2 -
 b/kernel/sched/core.c                 |    1 
 19 files changed, 111 insertions(+), 96 deletions(-)



* [PATCH tip/core/rcu 01/13] sched: Export sched_setscheduler_nocheck
  2015-10-06 16:13 [PATCH tip/core/rcu 0/13] Miscellaneous fixes for 4.4 Paul E. McKenney
@ 2015-10-06 16:13 ` Paul E. McKenney
  2015-10-06 16:13   ` [PATCH tip/core/rcu 02/13] rcu: Use rcu_callback_t in call_rcu*() and friends Paul E. McKenney
                     ` (11 more replies)
  2015-10-06 17:23 ` [PATCH tip/core/rcu 0/13] Miscellaneous fixes for 4.4 Josh Triplett
  1 sibling, 12 replies; 57+ messages in thread
From: Paul E. McKenney @ 2015-10-06 16:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Davidlohr Bueso, Paul E. McKenney

From: Davidlohr Bueso <dave@stgolabs.net>

The new locktorture rtmutex_lock tests exercise priority boosting, which
means that they need to set some tasks to real-time priority.  To do this,
they use sched_setscheduler_nocheck().  However, this is not exported to
modules, which results in the following error when building locktorture
as a module:

ERROR: "sched_setscheduler_nocheck" [kernel/locking/locktorture.ko] undefined!

This commit therefore adds an EXPORT_SYMBOL_GPL() to allow this function
to be invoked from locktorture when built as a module.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2f9c92884817..c4e607873d6f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4022,6 +4022,7 @@ int sched_setscheduler_nocheck(struct task_struct *p, int policy,
 {
 	return _sched_setscheduler(p, policy, param, false);
 }
+EXPORT_SYMBOL_GPL(sched_setscheduler_nocheck);
 
 static int
 do_sched_setscheduler(pid_t pid, int policy, struct sched_param __user *param)
-- 
2.5.2



* [PATCH tip/core/rcu 02/13] rcu: Use rcu_callback_t in call_rcu*() and friends
  2015-10-06 16:13 ` [PATCH tip/core/rcu 01/13] sched: Export sched_setscheduler_nocheck Paul E. McKenney
@ 2015-10-06 16:13   ` Paul E. McKenney
  2015-10-06 16:13   ` [PATCH tip/core/rcu 03/13] rcu: Use call_rcu_func_t to replace explicit type equivalents Paul E. McKenney
                     ` (10 subsequent siblings)
  11 siblings, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2015-10-06 16:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Boqun Feng, Paul E. McKenney

From: Boqun Feng <boqun.feng@gmail.com>

As we now have the rcu_callback_t typedef as the type of RCU callbacks, we
should use it in call_rcu*() and friends as the type of their callback
parameters. This saves a few lines of code and makes it clear which
functions require an RCU callback rather than some other callback as
their argument.

Besides, this can also help cscope to generate a better database for
code reading.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/rcupdate.h | 10 +++++-----
 include/linux/rcutiny.h  |  2 +-
 include/linux/rcutree.h  |  2 +-
 kernel/rcu/rcutorture.c  |  4 ++--
 kernel/rcu/srcu.c        |  2 +-
 kernel/rcu/tiny.c        |  8 ++++----
 kernel/rcu/tree.c        |  8 ++++----
 kernel/rcu/tree.h        |  2 +-
 kernel/rcu/tree_plugin.h |  2 +-
 kernel/rcu/update.c      |  2 +-
 10 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 581abf848566..d63bb77dab35 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -160,7 +160,7 @@ void do_trace_rcu_torture_read(const char *rcutorturename,
  * more than one CPU).
  */
 void call_rcu(struct rcu_head *head,
-	      void (*func)(struct rcu_head *head));
+	      rcu_callback_t func);
 
 #else /* #ifdef CONFIG_PREEMPT_RCU */
 
@@ -191,7 +191,7 @@ void call_rcu(struct rcu_head *head,
  * memory ordering guarantees.
  */
 void call_rcu_bh(struct rcu_head *head,
-		 void (*func)(struct rcu_head *head));
+		 rcu_callback_t func);
 
 /**
  * call_rcu_sched() - Queue an RCU for invocation after sched grace period.
@@ -213,7 +213,7 @@ void call_rcu_bh(struct rcu_head *head,
  * memory ordering guarantees.
  */
 void call_rcu_sched(struct rcu_head *head,
-		    void (*func)(struct rcu_head *rcu));
+		    rcu_callback_t func);
 
 void synchronize_sched(void);
 
@@ -274,7 +274,7 @@ do {									\
  * See the description of call_rcu() for more detailed information on
  * memory ordering guarantees.
  */
-void call_rcu_tasks(struct rcu_head *head, void (*func)(struct rcu_head *head));
+void call_rcu_tasks(struct rcu_head *head, rcu_callback_t func);
 void synchronize_rcu_tasks(void);
 void rcu_barrier_tasks(void);
 
@@ -1065,7 +1065,7 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
 #define __kfree_rcu(head, offset) \
 	do { \
 		BUILD_BUG_ON(!__is_kfree_rcu_offset(offset)); \
-		kfree_call_rcu(head, (void (*)(struct rcu_head *))(unsigned long)(offset)); \
+		kfree_call_rcu(head, (rcu_callback_t)(unsigned long)(offset)); \
 	} while (0)
 
 /**
diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index ff968b7af3a4..c8a0722f77ea 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -83,7 +83,7 @@ static inline void synchronize_sched_expedited(void)
 }
 
 static inline void kfree_call_rcu(struct rcu_head *head,
-				  void (*func)(struct rcu_head *rcu))
+				  rcu_callback_t func)
 {
 	call_rcu(head, func);
 }
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index 5abec82f325e..60d15a080d7c 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -48,7 +48,7 @@ void synchronize_rcu_bh(void);
 void synchronize_sched_expedited(void);
 void synchronize_rcu_expedited(void);
 
-void kfree_call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu));
+void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func);
 
 /**
  * synchronize_rcu_bh_expedited - Brute-force RCU-bh grace period
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 77192953dee5..51c8e7f02f48 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -448,7 +448,7 @@ static void synchronize_rcu_busted(void)
 }
 
 static void
-call_rcu_busted(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
+call_rcu_busted(struct rcu_head *head, rcu_callback_t func)
 {
 	/* This is a deliberate bug for testing purposes only! */
 	func(head);
@@ -523,7 +523,7 @@ static void srcu_torture_synchronize(void)
 }
 
 static void srcu_torture_call(struct rcu_head *head,
-			      void (*func)(struct rcu_head *head))
+			      rcu_callback_t func)
 {
 	call_srcu(srcu_ctlp, head, func);
 }
diff --git a/kernel/rcu/srcu.c b/kernel/rcu/srcu.c
index d3fcb2ec8536..9e6122540d28 100644
--- a/kernel/rcu/srcu.c
+++ b/kernel/rcu/srcu.c
@@ -387,7 +387,7 @@ static void srcu_flip(struct srcu_struct *sp)
  * srcu_struct structure.
  */
 void call_srcu(struct srcu_struct *sp, struct rcu_head *head,
-		void (*func)(struct rcu_head *head))
+	       rcu_callback_t func)
 {
 	unsigned long flags;
 
diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
index d0471056d0af..944b1b491ed8 100644
--- a/kernel/rcu/tiny.c
+++ b/kernel/rcu/tiny.c
@@ -44,7 +44,7 @@ struct rcu_ctrlblk;
 static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp);
 static void rcu_process_callbacks(struct softirq_action *unused);
 static void __call_rcu(struct rcu_head *head,
-		       void (*func)(struct rcu_head *rcu),
+		       rcu_callback_t func,
 		       struct rcu_ctrlblk *rcp);
 
 #include "tiny_plugin.h"
@@ -203,7 +203,7 @@ EXPORT_SYMBOL_GPL(synchronize_sched);
  * Helper function for call_rcu() and call_rcu_bh().
  */
 static void __call_rcu(struct rcu_head *head,
-		       void (*func)(struct rcu_head *rcu),
+		       rcu_callback_t func,
 		       struct rcu_ctrlblk *rcp)
 {
 	unsigned long flags;
@@ -229,7 +229,7 @@ static void __call_rcu(struct rcu_head *head,
  * period.  But since we have but one CPU, that would be after any
  * quiescent state.
  */
-void call_rcu_sched(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
+void call_rcu_sched(struct rcu_head *head, rcu_callback_t func)
 {
 	__call_rcu(head, func, &rcu_sched_ctrlblk);
 }
@@ -239,7 +239,7 @@ EXPORT_SYMBOL_GPL(call_rcu_sched);
  * Post an RCU bottom-half callback to be invoked after any subsequent
  * quiescent state.
  */
-void call_rcu_bh(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
+void call_rcu_bh(struct rcu_head *head, rcu_callback_t func)
 {
 	__call_rcu(head, func, &rcu_bh_ctrlblk);
 }
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 775d36cc0050..b9d9e0249e2f 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3017,7 +3017,7 @@ static void rcu_leak_callback(struct rcu_head *rhp)
  * is expected to specify a CPU.
  */
 static void
-__call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu),
+__call_rcu(struct rcu_head *head, rcu_callback_t func,
 	   struct rcu_state *rsp, int cpu, bool lazy)
 {
 	unsigned long flags;
@@ -3088,7 +3088,7 @@ __call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu),
 /*
  * Queue an RCU-sched callback for invocation after a grace period.
  */
-void call_rcu_sched(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
+void call_rcu_sched(struct rcu_head *head, rcu_callback_t func)
 {
 	__call_rcu(head, func, &rcu_sched_state, -1, 0);
 }
@@ -3097,7 +3097,7 @@ EXPORT_SYMBOL_GPL(call_rcu_sched);
 /*
  * Queue an RCU callback for invocation after a quicker grace period.
  */
-void call_rcu_bh(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
+void call_rcu_bh(struct rcu_head *head, rcu_callback_t func)
 {
 	__call_rcu(head, func, &rcu_bh_state, -1, 0);
 }
@@ -3111,7 +3111,7 @@ EXPORT_SYMBOL_GPL(call_rcu_bh);
  * function may only be called from __kfree_rcu().
  */
 void kfree_call_rcu(struct rcu_head *head,
-		    void (*func)(struct rcu_head *rcu))
+		    rcu_callback_t func)
 {
 	__call_rcu(head, func, rcu_state_p, -1, 1);
 }
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 2e991f8361e4..ad11529375cc 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -584,7 +584,7 @@ static void rcu_print_detail_task_stall(struct rcu_state *rsp);
 static int rcu_print_task_stall(struct rcu_node *rnp);
 static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp);
 static void rcu_preempt_check_callbacks(void);
-void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu));
+void call_rcu(struct rcu_head *head, rcu_callback_t func);
 static void __init __rcu_init_preempt(void);
 static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags);
 static void rcu_preempt_boost_start_gp(struct rcu_node *rnp);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index b2bf3963a0ae..06116ae6dfd7 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -500,7 +500,7 @@ static void rcu_preempt_do_callbacks(void)
 /*
  * Queue a preemptible-RCU callback for invocation after a grace period.
  */
-void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
+void call_rcu(struct rcu_head *head, rcu_callback_t func)
 {
 	__call_rcu(head, func, rcu_state_p, -1, 0);
 }
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index 7a0b3bc7c5ed..5f748c5a40f0 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -534,7 +534,7 @@ static void rcu_spawn_tasks_kthread(void);
  * Post an RCU-tasks callback.  First call must be from process context
  * after the scheduler if fully operational.
  */
-void call_rcu_tasks(struct rcu_head *rhp, void (*func)(struct rcu_head *rhp))
+void call_rcu_tasks(struct rcu_head *rhp, rcu_callback_t func)
 {
 	unsigned long flags;
 	bool needwake;
-- 
2.5.2



* [PATCH tip/core/rcu 03/13] rcu: Use call_rcu_func_t to replace explicit type equivalents
  2015-10-06 16:13 ` [PATCH tip/core/rcu 01/13] sched: Export sched_setscheduler_nocheck Paul E. McKenney
  2015-10-06 16:13   ` [PATCH tip/core/rcu 02/13] rcu: Use rcu_callback_t in call_rcu*() and friends Paul E. McKenney
@ 2015-10-06 16:13   ` Paul E. McKenney
  2015-10-06 16:13   ` [PATCH tip/core/rcu 04/13] rcu: Don't disable preemption for Tiny and Tree RCU readers Paul E. McKenney
                     ` (9 subsequent siblings)
  11 siblings, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2015-10-06 16:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Boqun Feng, Paul E. McKenney

From: Boqun Feng <boqun.feng@gmail.com>

We have had the call_rcu_func_t typedef for quite a while, but we still
use explicit function pointer types in some places.  These types can
confuse cscope and can be hard to read.  This patch therefore replaces
these types with the call_rcu_func_t typedef.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/rcutorture.c | 2 +-
 kernel/rcu/tree.h       | 3 +--
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 51c8e7f02f48..f9ec6cbe77d3 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -252,7 +252,7 @@ struct rcu_torture_ops {
 	void (*exp_sync)(void);
 	unsigned long (*get_state)(void);
 	void (*cond_sync)(unsigned long oldstate);
-	void (*call)(struct rcu_head *head, void (*func)(struct rcu_head *rcu));
+	call_rcu_func_t call;
 	void (*cb_barrier)(void);
 	void (*fqs)(void);
 	void (*stats)(void);
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index ad11529375cc..0c33c82cec64 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -464,8 +464,7 @@ struct rcu_state {
 						/*  shut bogus gcc warning) */
 	u8 flavor_mask;				/* bit in flavor mask. */
 	struct rcu_data __percpu *rda;		/* pointer of percu rcu_data. */
-	void (*call)(struct rcu_head *head,	/* call_rcu() flavor. */
-		     void (*func)(struct rcu_head *head));
+	call_rcu_func_t call;			/* call_rcu() flavor. */
 
 	/* The following fields are guarded by the root rcu_node's lock. */
 
-- 
2.5.2



* [PATCH tip/core/rcu 04/13] rcu: Don't disable preemption for Tiny and Tree RCU readers
  2015-10-06 16:13 ` [PATCH tip/core/rcu 01/13] sched: Export sched_setscheduler_nocheck Paul E. McKenney
  2015-10-06 16:13   ` [PATCH tip/core/rcu 02/13] rcu: Use rcu_callback_t in call_rcu*() and friends Paul E. McKenney
  2015-10-06 16:13   ` [PATCH tip/core/rcu 03/13] rcu: Use call_rcu_func_t to replace explicit type equivalents Paul E. McKenney
@ 2015-10-06 16:13   ` Paul E. McKenney
  2015-10-06 16:20     ` [Kernel networking modules.] OSI levels 2 & 3, Assistance - If anyone knows anyone in the US. North West region John D Allen, Leveridge Systems INC
  2015-10-06 16:44     ` [PATCH tip/core/rcu 04/13] rcu: Don't disable preemption for Tiny and Tree RCU readers Josh Triplett
  2015-10-06 16:13   ` [PATCH tip/core/rcu 05/13] rcu: Eliminate panic when silly boot-time fanout specified Paul E. McKenney
                     ` (8 subsequent siblings)
  11 siblings, 2 replies; 57+ messages in thread
From: Paul E. McKenney @ 2015-10-06 16:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Boqun Feng, Paul E. McKenney

From: Boqun Feng <boqun.feng@gmail.com>

Because preempt_disable() maps to barrier() for non-debug builds,
it forces the compiler to spill and reload registers.  Because Tree
RCU and Tiny RCU now only appear in CONFIG_PREEMPT=n builds, these
barrier() instances generate needless extra code for each instance of
rcu_read_lock() and rcu_read_unlock().  This extra code slows down Tree
RCU and bloats Tiny RCU.

This commit therefore removes the preempt_disable() and preempt_enable()
from the non-preemptible implementations of __rcu_read_lock() and
__rcu_read_unlock(), respectively.  However, for debug purposes,
preempt_disable() and preempt_enable() are still invoked if
CONFIG_PREEMPT_COUNT=y, because this allows detection of sleeping inside
atomic sections in non-preemptible kernels.

This is based on an earlier patch by Paul E. McKenney, fixing
a bug encountered in kernels built with CONFIG_PREEMPT=n and
CONFIG_PREEMPT_COUNT=y.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/rcupdate.h | 6 ++++--
 include/linux/rcutiny.h  | 1 +
 kernel/rcu/tree.c        | 9 +++++++++
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index d63bb77dab35..6c3ceceb6148 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -297,12 +297,14 @@ void synchronize_rcu(void);
 
 static inline void __rcu_read_lock(void)
 {
-	preempt_disable();
+	if (IS_ENABLED(CONFIG_PREEMPT_COUNT))
+		preempt_disable();
 }
 
 static inline void __rcu_read_unlock(void)
 {
-	preempt_enable();
+	if (IS_ENABLED(CONFIG_PREEMPT_COUNT))
+		preempt_enable();
 }
 
 static inline void synchronize_rcu(void)
diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index c8a0722f77ea..4c1aaf9cce7b 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -216,6 +216,7 @@ static inline bool rcu_is_watching(void)
 
 static inline void rcu_all_qs(void)
 {
+	barrier(); /* Avoid RCU read-side critical sections leaking across. */
 }
 
 #endif /* __LINUX_RCUTINY_H */
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b9d9e0249e2f..93c0f23c3e45 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -337,12 +337,14 @@ static void rcu_momentary_dyntick_idle(void)
  */
 void rcu_note_context_switch(void)
 {
+	barrier(); /* Avoid RCU read-side critical sections leaking down. */
 	trace_rcu_utilization(TPS("Start context switch"));
 	rcu_sched_qs();
 	rcu_preempt_note_context_switch();
 	if (unlikely(raw_cpu_read(rcu_sched_qs_mask)))
 		rcu_momentary_dyntick_idle();
 	trace_rcu_utilization(TPS("End context switch"));
+	barrier(); /* Avoid RCU read-side critical sections leaking up. */
 }
 EXPORT_SYMBOL_GPL(rcu_note_context_switch);
 
@@ -353,12 +355,19 @@ EXPORT_SYMBOL_GPL(rcu_note_context_switch);
  * RCU flavors in desperate need of a quiescent state, which will normally
  * be none of them).  Either way, do a lightweight quiescent state for
  * all RCU flavors.
+ *
+ * The barrier() calls are redundant in the common case when this is
+ * called externally, but just in case this is called from within this
+ * file.
+ *
  */
 void rcu_all_qs(void)
 {
+	barrier(); /* Avoid RCU read-side critical sections leaking down. */
 	if (unlikely(raw_cpu_read(rcu_sched_qs_mask)))
 		rcu_momentary_dyntick_idle();
 	this_cpu_inc(rcu_qs_ctr);
+	barrier(); /* Avoid RCU read-side critical sections leaking up. */
 }
 EXPORT_SYMBOL_GPL(rcu_all_qs);
 
-- 
2.5.2



* [PATCH tip/core/rcu 05/13] rcu: Eliminate panic when silly boot-time fanout specified
  2015-10-06 16:13 ` [PATCH tip/core/rcu 01/13] sched: Export sched_setscheduler_nocheck Paul E. McKenney
                     ` (2 preceding siblings ...)
  2015-10-06 16:13   ` [PATCH tip/core/rcu 04/13] rcu: Don't disable preemption for Tiny and Tree RCU readers Paul E. McKenney
@ 2015-10-06 16:13   ` Paul E. McKenney
  2015-10-06 16:13   ` [PATCH tip/core/rcu 06/13] rcu: Add online/offline info to stall warning message Paul E. McKenney
                     ` (7 subsequent siblings)
  11 siblings, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2015-10-06 16:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

This commit loosens rcutree.rcu_fanout_leaf range checks
and replaces a panic() with a fallback to compile-time values.
This fallback is accompanied by a WARN_ON(), and both occur when the
rcutree.rcu_fanout_leaf value is too small to accommodate the number of
CPUs.  For example, given the current four-level limit for the rcu_node
tree, a system with more than 16 CPUs built with CONFIG_RCU_FANOUT=2 must
have rcutree.rcu_fanout_leaf larger than 2.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 Documentation/kernel-parameters.txt |  9 ++++++---
 kernel/rcu/tree.c                   | 20 +++++++++++---------
 2 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 22a4b687ea5b..23ec96877311 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3074,9 +3074,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			cache-to-cache transfer latencies.
 
 	rcutree.rcu_fanout_leaf= [KNL]
-			Increase the number of CPUs assigned to each
-			leaf rcu_node structure.  Useful for very large
-			systems.
+			Change the number of CPUs assigned to each
+			leaf rcu_node structure.  Useful for very
+			large systems, which will choose the value 64,
+			and for NUMA systems with large remote-access
+			latencies, which will choose a value aligned
+			with the appropriate hardware boundaries.
 
 	rcutree.jiffies_till_sched_qs= [KNL]
 			Set required age in jiffies for a
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 93c0f23c3e45..713eb92314b4 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4230,13 +4230,12 @@ static void __init rcu_init_geometry(void)
 		rcu_fanout_leaf, nr_cpu_ids);
 
 	/*
-	 * The boot-time rcu_fanout_leaf parameter is only permitted
-	 * to increase the leaf-level fanout, not decrease it.  Of course,
-	 * the leaf-level fanout cannot exceed the number of bits in
-	 * the rcu_node masks.  Complain and fall back to the compile-
-	 * time values if these limits are exceeded.
+	 * The boot-time rcu_fanout_leaf parameter must be at least two
+	 * and cannot exceed the number of bits in the rcu_node masks.
+	 * Complain and fall back to the compile-time values if this
+	 * limit is exceeded.
 	 */
-	if (rcu_fanout_leaf < RCU_FANOUT_LEAF ||
+	if (rcu_fanout_leaf < 2 ||
 	    rcu_fanout_leaf > sizeof(unsigned long) * 8) {
 		rcu_fanout_leaf = RCU_FANOUT_LEAF;
 		WARN_ON(1);
@@ -4253,10 +4252,13 @@ static void __init rcu_init_geometry(void)
 
 	/*
 	 * The tree must be able to accommodate the configured number of CPUs.
-	 * If this limit is exceeded than we have a serious problem elsewhere.
+	 * If this limit is exceeded, fall back to the compile-time values.
 	 */
-	if (nr_cpu_ids > rcu_capacity[RCU_NUM_LVLS - 1])
-		panic("rcu_init_geometry: rcu_capacity[] is too small");
+	if (nr_cpu_ids > rcu_capacity[RCU_NUM_LVLS - 1]) {
+		rcu_fanout_leaf = RCU_FANOUT_LEAF;
+		WARN_ON(1);
+		return;
+	}
 
 	/* Calculate the number of levels in the tree. */
 	for (i = 0; nr_cpu_ids > rcu_capacity[i]; i++) {
-- 
2.5.2



* [PATCH tip/core/rcu 06/13] rcu: Add online/offline info to stall warning message
  2015-10-06 16:13 ` [PATCH tip/core/rcu 01/13] sched: Export sched_setscheduler_nocheck Paul E. McKenney
                     ` (3 preceding siblings ...)
  2015-10-06 16:13   ` [PATCH tip/core/rcu 05/13] rcu: Eliminate panic when silly boot-time fanout specified Paul E. McKenney
@ 2015-10-06 16:13   ` Paul E. McKenney
  2015-10-06 17:15     ` Josh Triplett
  2015-10-06 16:13   ` [PATCH tip/core/rcu 07/13] rcu: Move preemption disabling out of __srcu_read_lock() Paul E. McKenney
                     ` (6 subsequent siblings)
  11 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2015-10-06 16:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

This commit makes the RCU CPU stall warning message print online/offline
indications immediately after the CPU number.  Three characters follow
the CPU number: an "O" if the global CPU-hotplug state believes the CPU
to be offline, an "o" if RCU believed the CPU to be offline at the start
of the current grace period, and an "N" if RCU believes the CPU will be
offline at the start of the next grace period, with "." in each position
otherwise.  So for CPU 10, you would normally see "10-...:" indicating
that everything believes that the CPU is online.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree_plugin.h | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 06116ae6dfd7..57ed9c13ae5a 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1702,8 +1702,12 @@ static void print_cpu_stall_info(struct rcu_state *rsp, int cpu)
 		ticks_value = rsp->gpnum - rdp->gpnum;
 	}
 	print_cpu_stall_fast_no_hz(fast_no_hz, cpu);
-	pr_err("\t%d: (%lu %s) idle=%03x/%llx/%d softirq=%u/%u fqs=%ld %s\n",
-	       cpu, ticks_value, ticks_title,
+	pr_err("\t%d-%c%c%c: (%lu %s) idle=%03x/%llx/%d softirq=%u/%u fqs=%ld %s\n",
+	       cpu,
+	       "O."[!!cpu_online(cpu)],
+	       "o."[!!(rdp->grpmask & rdp->mynode->qsmaskinit)],
+	       "N."[!!(rdp->grpmask & rdp->mynode->qsmaskinitnext)],
+	       ticks_value, ticks_title,
 	       atomic_read(&rdtp->dynticks) & 0xfff,
 	       rdtp->dynticks_nesting, rdtp->dynticks_nmi_nesting,
 	       rdp->softirq_snap, kstat_softirqs_cpu(RCU_SOFTIRQ, cpu),
-- 
2.5.2



* [PATCH tip/core/rcu 07/13] rcu: Move preemption disabling out of __srcu_read_lock()
  2015-10-06 16:13 ` [PATCH tip/core/rcu 01/13] sched: Export sched_setscheduler_nocheck Paul E. McKenney
                     ` (4 preceding siblings ...)
  2015-10-06 16:13   ` [PATCH tip/core/rcu 06/13] rcu: Add online/offline info to stall warning message Paul E. McKenney
@ 2015-10-06 16:13   ` Paul E. McKenney
  2015-10-06 17:18     ` Josh Triplett
  2015-10-06 20:07     ` Peter Zijlstra
  2015-10-06 16:13   ` [PATCH tip/core/rcu 08/13] rcu: Finish folding ->fqs_state into ->gp_state Paul E. McKenney
                     ` (5 subsequent siblings)
  11 siblings, 2 replies; 57+ messages in thread
From: Paul E. McKenney @ 2015-10-06 16:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

Currently, __srcu_read_lock() cannot be invoked from restricted
environments because it contains calls to preempt_disable() and
preempt_enable(), both of which can invoke lockdep, which is a bad
idea in some restricted execution modes.  This commit therefore moves
the preempt_disable() and preempt_enable() from __srcu_read_lock()
to srcu_read_lock().  It also inserts the preempt_disable() and
preempt_enable() around the call to __srcu_read_lock() in do_exit().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/srcu.h | 5 ++++-
 kernel/exit.c        | 2 ++
 kernel/rcu/srcu.c    | 2 --
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index bdeb4567b71e..f5f80c5643ac 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -215,8 +215,11 @@ static inline int srcu_read_lock_held(struct srcu_struct *sp)
  */
 static inline int srcu_read_lock(struct srcu_struct *sp) __acquires(sp)
 {
-	int retval = __srcu_read_lock(sp);
+	int retval;
 
+	preempt_disable();
+	retval = __srcu_read_lock(sp);
+	preempt_enable();
 	rcu_lock_acquire(&(sp)->dep_map);
 	return retval;
 }
diff --git a/kernel/exit.c b/kernel/exit.c
index ea95ee1b5ef7..0e93b63bbc59 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -761,7 +761,9 @@ void do_exit(long code)
 	 */
 	flush_ptrace_hw_breakpoint(tsk);
 
+	TASKS_RCU(preempt_disable());
 	TASKS_RCU(tasks_rcu_i = __srcu_read_lock(&tasks_rcu_exit_srcu));
+	TASKS_RCU(preempt_enable());
 	exit_notify(tsk, group_dead);
 	proc_exit_connector(tsk);
 #ifdef CONFIG_NUMA
diff --git a/kernel/rcu/srcu.c b/kernel/rcu/srcu.c
index 9e6122540d28..a63a1ea5a41b 100644
--- a/kernel/rcu/srcu.c
+++ b/kernel/rcu/srcu.c
@@ -298,11 +298,9 @@ int __srcu_read_lock(struct srcu_struct *sp)
 	int idx;
 
 	idx = READ_ONCE(sp->completed) & 0x1;
-	preempt_disable();
 	__this_cpu_inc(sp->per_cpu_ref->c[idx]);
 	smp_mb(); /* B */  /* Avoid leaking the critical section. */
 	__this_cpu_inc(sp->per_cpu_ref->seq[idx]);
-	preempt_enable();
 	return idx;
 }
 EXPORT_SYMBOL_GPL(__srcu_read_lock);
-- 
2.5.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH tip/core/rcu 08/13] rcu: Finish folding ->fqs_state into ->gp_state
  2015-10-06 16:13 ` [PATCH tip/core/rcu 01/13] sched: Export sched_setscheduler_nocheck Paul E. McKenney
                     ` (5 preceding siblings ...)
  2015-10-06 16:13   ` [PATCH tip/core/rcu 07/13] rcu: Move preemption disabling out of __srcu_read_lock() Paul E. McKenney
@ 2015-10-06 16:13   ` Paul E. McKenney
  2015-10-06 16:13   ` [PATCH tip/core/rcu 09/13] rcu: Correct comment for values of ->gp_state field Paul E. McKenney
                     ` (4 subsequent siblings)
  11 siblings, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2015-10-06 16:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Petr Mladek, Paul E. McKenney

From: Petr Mladek <pmladek@suse.com>

Commit 4cdfc175c25c89ee ("rcu: Move quiescent-state forcing
into kthread") started the process of folding the old ->fqs_state into
->gp_state, but did not complete it.  This situation does not cause
any malfunction, but can result in extremely confusing trace output.
This commit completes this task of eliminating ->fqs_state in favor
of ->gp_state.

The old ->fqs_state was also used to decide when to collect dyntick-idle
snapshots.  For this purpose, this commit adds a boolean variable to the
kthread, which is set on the first call to rcu_gp_fqs() for a given grace
period and cleared otherwise.

Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c       | 18 ++++++++----------
 kernel/rcu/tree.h       | 14 +++-----------
 kernel/rcu/tree_trace.c |  2 +-
 3 files changed, 12 insertions(+), 22 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 713eb92314b4..4d296b0fb987 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -98,7 +98,7 @@ struct rcu_state sname##_state = { \
 	.level = { &sname##_state.node[0] }, \
 	.rda = &sname##_data, \
 	.call = cr, \
-	.fqs_state = RCU_GP_IDLE, \
+	.gp_state = RCU_GP_IDLE, \
 	.gpnum = 0UL - 300UL, \
 	.completed = 0UL - 300UL, \
 	.orphan_lock = __RAW_SPIN_LOCK_UNLOCKED(&sname##_state.orphan_lock), \
@@ -1936,16 +1936,15 @@ static bool rcu_gp_fqs_check_wake(struct rcu_state *rsp, int *gfp)
 /*
  * Do one round of quiescent-state forcing.
  */
-static int rcu_gp_fqs(struct rcu_state *rsp, int fqs_state_in)
+static void rcu_gp_fqs(struct rcu_state *rsp, bool first_time)
 {
-	int fqs_state = fqs_state_in;
 	bool isidle = false;
 	unsigned long maxj;
 	struct rcu_node *rnp = rcu_get_root(rsp);
 
 	WRITE_ONCE(rsp->gp_activity, jiffies);
 	rsp->n_force_qs++;
-	if (fqs_state == RCU_SAVE_DYNTICK) {
+	if (first_time) {
 		/* Collect dyntick-idle snapshots. */
 		if (is_sysidle_rcu_state(rsp)) {
 			isidle = true;
@@ -1954,7 +1953,6 @@ static int rcu_gp_fqs(struct rcu_state *rsp, int fqs_state_in)
 		force_qs_rnp(rsp, dyntick_save_progress_counter,
 			     &isidle, &maxj);
 		rcu_sysidle_report_gp(rsp, isidle, maxj);
-		fqs_state = RCU_FORCE_QS;
 	} else {
 		/* Handle dyntick-idle and offline CPUs. */
 		isidle = true;
@@ -1968,7 +1966,6 @@ static int rcu_gp_fqs(struct rcu_state *rsp, int fqs_state_in)
 			   READ_ONCE(rsp->gp_flags) & ~RCU_GP_FLAG_FQS);
 		raw_spin_unlock_irq(&rnp->lock);
 	}
-	return fqs_state;
 }
 
 /*
@@ -2032,7 +2029,7 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
 	/* Declare grace period done. */
 	WRITE_ONCE(rsp->completed, rsp->gpnum);
 	trace_rcu_grace_period(rsp->name, rsp->completed, TPS("end"));
-	rsp->fqs_state = RCU_GP_IDLE;
+	rsp->gp_state = RCU_GP_IDLE;
 	rdp = this_cpu_ptr(rsp->rda);
 	/* Advance CBs to reduce false positives below. */
 	needgp = rcu_advance_cbs(rsp, rnp, rdp) || needgp;
@@ -2050,7 +2047,7 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
  */
 static int __noreturn rcu_gp_kthread(void *arg)
 {
-	int fqs_state;
+	bool first_gp_fqs;
 	int gf;
 	unsigned long j;
 	int ret;
@@ -2082,7 +2079,7 @@ static int __noreturn rcu_gp_kthread(void *arg)
 		}
 
 		/* Handle quiescent-state forcing. */
-		fqs_state = RCU_SAVE_DYNTICK;
+		first_gp_fqs = true;
 		j = jiffies_till_first_fqs;
 		if (j > HZ) {
 			j = HZ;
@@ -2110,7 +2107,8 @@ static int __noreturn rcu_gp_kthread(void *arg)
 				trace_rcu_grace_period(rsp->name,
 						       READ_ONCE(rsp->gpnum),
 						       TPS("fqsstart"));
-				fqs_state = rcu_gp_fqs(rsp, fqs_state);
+				rcu_gp_fqs(rsp, first_gp_fqs);
+				first_gp_fqs = false;
 				trace_rcu_grace_period(rsp->name,
 						       READ_ONCE(rsp->gpnum),
 						       TPS("fqsend"));
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 0c33c82cec64..be6d1e8eeb79 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -412,13 +412,6 @@ struct rcu_data {
 	struct rcu_state *rsp;
 };
 
-/* Values for fqs_state field in struct rcu_state. */
-#define RCU_GP_IDLE		0	/* No grace period in progress. */
-#define RCU_GP_INIT		1	/* Grace period being initialized. */
-#define RCU_SAVE_DYNTICK	2	/* Need to scan dyntick state. */
-#define RCU_FORCE_QS		3	/* Need to force quiescent state. */
-#define RCU_SIGNAL_INIT		RCU_SAVE_DYNTICK
-
 /* Values for nocb_defer_wakeup field in struct rcu_data. */
 #define RCU_NOGP_WAKE_NOT	0
 #define RCU_NOGP_WAKE		1
@@ -468,9 +461,8 @@ struct rcu_state {
 
 	/* The following fields are guarded by the root rcu_node's lock. */
 
-	u8	fqs_state ____cacheline_internodealigned_in_smp;
-						/* Force QS state. */
-	u8	boost;				/* Subject to priority boost. */
+	u8	boost ____cacheline_internodealigned_in_smp;
+						/* Subject to priority boost. */
 	unsigned long gpnum;			/* Current gp number. */
 	unsigned long completed;		/* # of last completed gp. */
 	struct task_struct *gp_kthread;		/* Task for grace periods. */
@@ -538,7 +530,7 @@ struct rcu_state {
 #define RCU_GP_FLAG_FQS  0x2	/* Need grace-period quiescent-state forcing. */
 
 /* Values for rcu_state structure's gp_flags field. */
-#define RCU_GP_WAIT_INIT 0	/* Initial state. */
+#define RCU_GP_IDLE	 0	/* Initial state and no GP in progress. */
 #define RCU_GP_WAIT_GPS  1	/* Wait for grace-period start. */
 #define RCU_GP_DONE_GPS  2	/* Wait done for grace-period start. */
 #define RCU_GP_WAIT_FQS  3	/* Wait for force-quiescent-state time. */
diff --git a/kernel/rcu/tree_trace.c b/kernel/rcu/tree_trace.c
index 6fc4c5ff3bb5..1d61f5ba4641 100644
--- a/kernel/rcu/tree_trace.c
+++ b/kernel/rcu/tree_trace.c
@@ -268,7 +268,7 @@ static void print_one_rcu_state(struct seq_file *m, struct rcu_state *rsp)
 	gpnum = rsp->gpnum;
 	seq_printf(m, "c=%ld g=%ld s=%d jfq=%ld j=%x ",
 		   ulong2long(rsp->completed), ulong2long(gpnum),
-		   rsp->fqs_state,
+		   rsp->gp_state,
 		   (long)(rsp->jiffies_force_qs - jiffies),
 		   (int)(jiffies & 0xffff));
 	seq_printf(m, "nfqs=%lu/nfqsng=%lu(%lu) fqlh=%lu oqlen=%ld/%ld\n",
-- 
2.5.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH tip/core/rcu 09/13] rcu: Correct comment for values of ->gp_state field
  2015-10-06 16:13 ` [PATCH tip/core/rcu 01/13] sched: Export sched_setscheduler_nocheck Paul E. McKenney
                     ` (6 preceding siblings ...)
  2015-10-06 16:13   ` [PATCH tip/core/rcu 08/13] rcu: Finish folding ->fqs_state into ->gp_state Paul E. McKenney
@ 2015-10-06 16:13   ` Paul E. McKenney
  2015-10-06 16:13   ` [PATCH tip/core/rcu 10/13] rcu: Add rcu_pointer_handoff() Paul E. McKenney
                     ` (3 subsequent siblings)
  11 siblings, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2015-10-06 16:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

This commit corrects the comment for the values of the ->gp_state field,
which previously incorrectly said that these were for the ->gp_flags
field.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index be6d1e8eeb79..674ebbc3e406 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -529,7 +529,7 @@ struct rcu_state {
 #define RCU_GP_FLAG_INIT 0x1	/* Need grace-period initialization. */
 #define RCU_GP_FLAG_FQS  0x2	/* Need grace-period quiescent-state forcing. */
 
-/* Values for rcu_state structure's gp_flags field. */
+/* Values for rcu_state structure's gp_state field. */
 #define RCU_GP_IDLE	 0	/* Initial state and no GP in progress. */
 #define RCU_GP_WAIT_GPS  1	/* Wait for grace-period start. */
 #define RCU_GP_DONE_GPS  2	/* Wait done for grace-period start. */
-- 
2.5.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH tip/core/rcu 10/13] rcu: Add rcu_pointer_handoff()
  2015-10-06 16:13 ` [PATCH tip/core/rcu 01/13] sched: Export sched_setscheduler_nocheck Paul E. McKenney
                     ` (7 preceding siblings ...)
  2015-10-06 16:13   ` [PATCH tip/core/rcu 09/13] rcu: Correct comment for values of ->gp_state field Paul E. McKenney
@ 2015-10-06 16:13   ` Paul E. McKenney
  2015-10-06 17:21     ` Josh Triplett
  2015-10-06 20:27     ` Peter Zijlstra
  2015-10-06 16:13   ` [PATCH tip/core/rcu 11/13] rculist: Make list_entry_rcu() use lockless_dereference() Paul E. McKenney
                     ` (2 subsequent siblings)
  11 siblings, 2 replies; 57+ messages in thread
From: Paul E. McKenney @ 2015-10-06 16:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

This commit adds an rcu_pointer_handoff() that is intended to mark
situations where a structure's protection transitions from RCU to some
other mechanism (locking, reference counting, whatever).  These markings
should allow external tools to more easily spot bugs involving leaking
pointers out of RCU read-side critical sections.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/rcupdate.h | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 6c3ceceb6148..587eb057e2fa 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -813,6 +813,28 @@ static inline void rcu_preempt_sleep_check(void)
 #define rcu_dereference_sched(p) rcu_dereference_sched_check(p, 0)
 
 /**
+ * rcu_pointer_handoff() - Hand off a pointer from RCU to other mechanism
+ * @p: The pointer to hand off
+ *
+ * This is simply an identity function, but it documents where a pointer
+ * is handed off from RCU to some other synchronization mechanism, for
+ * example, reference counting or locking.  In C11, it would map to
+ * kill_dependency().  It could be used as follows:
+ *
+ *	rcu_read_lock();
+ *	p = rcu_dereference(gp);
+ *	long_lived = is_long_lived(p);
+ *	if (long_lived) {
+ *		if (!atomic_inc_not_zero(p->refcnt))
+ *			long_lived = false;
+ *		else
+ *			p = rcu_pointer_handoff(p);
+ *	}
+ *	rcu_read_unlock();
+ */
+#define rcu_pointer_handoff(p) (p)
+
+/**
  * rcu_read_lock() - mark the beginning of an RCU read-side critical section
  *
  * When synchronize_rcu() is invoked on one CPU while other CPUs
-- 
2.5.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH tip/core/rcu 11/13] rculist: Make list_entry_rcu() use lockless_dereference()
  2015-10-06 16:13 ` [PATCH tip/core/rcu 01/13] sched: Export sched_setscheduler_nocheck Paul E. McKenney
                     ` (8 preceding siblings ...)
  2015-10-06 16:13   ` [PATCH tip/core/rcu 10/13] rcu: Add rcu_pointer_handoff() Paul E. McKenney
@ 2015-10-06 16:13   ` Paul E. McKenney
  2015-10-26  8:45     ` Ingo Molnar
  2015-10-06 16:13   ` [PATCH tip/core/rcu 12/13] rcu: Remove deprecated rcu_lockdep_assert() Paul E. McKenney
  2015-10-06 16:13   ` [PATCH tip/core/rcu 13/13] rculist: Use WRITE_ONCE() when deleting from reader-visible list Paul E. McKenney
  11 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2015-10-06 16:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Patrick Marlier, Paul E. McKenney

From: Patrick Marlier <patrick.marlier@gmail.com>

The current list_entry_rcu() implementation copies the pointer to a stack
variable, then invokes rcu_dereference_raw() on it.  This results in an
additional store-load pair.  Now, most compilers will emit normal store
and load instructions, which might seem to be of negligible overhead,
but this results in a load-hit-store situation that can cause surprisingly
long pipeline stalls, even on modern microprocessors.  The problem is
that the store takes time to update the store buffer, which can delay
the immediately following load.

This commit therefore switches to the lockless_dereference() primitive,
which does not expect the __rcu annotations (that are anyway not present
in the list_head structure) and which, like rcu_dereference_raw(),
does not check for an enclosing RCU read-side critical section.
Most importantly, it does not copy the pointer, thus avoiding the
load-hit-store overhead.

Signed-off-by: Patrick Marlier <patrick.marlier@gmail.com>
[ paulmck: Switched to lockless_dereference() to suppress sparse warnings. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/rculist.h | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index 17c6b1f84a77..5ed540986019 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -247,10 +247,7 @@ static inline void list_splice_init_rcu(struct list_head *list,
  * primitives such as list_add_rcu() as long as it's guarded by rcu_read_lock().
  */
 #define list_entry_rcu(ptr, type, member) \
-({ \
-	typeof(*ptr) __rcu *__ptr = (typeof(*ptr) __rcu __force *)ptr; \
-	container_of((typeof(ptr))rcu_dereference_raw(__ptr), type, member); \
-})
+	container_of(lockless_dereference(ptr), type, member)
 
 /**
  * Where are list_empty_rcu() and list_first_entry_rcu()?
-- 
2.5.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH tip/core/rcu 12/13] rcu: Remove deprecated rcu_lockdep_assert()
  2015-10-06 16:13 ` [PATCH tip/core/rcu 01/13] sched: Export sched_setscheduler_nocheck Paul E. McKenney
                     ` (9 preceding siblings ...)
  2015-10-06 16:13   ` [PATCH tip/core/rcu 11/13] rculist: Make list_entry_rcu() use lockless_dereference() Paul E. McKenney
@ 2015-10-06 16:13   ` Paul E. McKenney
  2015-10-06 16:13   ` [PATCH tip/core/rcu 13/13] rculist: Use WRITE_ONCE() when deleting from reader-visible list Paul E. McKenney
  11 siblings, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2015-10-06 16:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

The old rcu_lockdep_assert() was retained to ease handling of incoming
patches, but any use will result in deprecated warnings.  However, its
replacement, RCU_LOCKDEP_WARN(), is now upstream.  It is therefore
time to remove rcu_lockdep_assert(), which this commit does.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/rcupdate.h | 21 ---------------------
 1 file changed, 21 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 587eb057e2fa..a0189ba67fde 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -537,29 +537,9 @@ static inline int rcu_read_lock_sched_held(void)
 
 #endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */
 
-/* Deprecate rcu_lockdep_assert():  Use RCU_LOCKDEP_WARN() instead. */
-static inline void __attribute((deprecated)) deprecate_rcu_lockdep_assert(void)
-{
-}
-
 #ifdef CONFIG_PROVE_RCU
 
 /**
- * rcu_lockdep_assert - emit lockdep splat if specified condition not met
- * @c: condition to check
- * @s: informative message
- */
-#define rcu_lockdep_assert(c, s)					\
-	do {								\
-		static bool __section(.data.unlikely) __warned;		\
-		deprecate_rcu_lockdep_assert();				\
-		if (debug_lockdep_rcu_enabled() && !__warned && !(c)) {	\
-			__warned = true;				\
-			lockdep_rcu_suspicious(__FILE__, __LINE__, s);	\
-		}							\
-	} while (0)
-
-/**
  * RCU_LOCKDEP_WARN - emit lockdep splat if specified condition is met
  * @c: condition to check
  * @s: informative message
@@ -596,7 +576,6 @@ static inline void rcu_preempt_sleep_check(void)
 
 #else /* #ifdef CONFIG_PROVE_RCU */
 
-#define rcu_lockdep_assert(c, s) deprecate_rcu_lockdep_assert()
 #define RCU_LOCKDEP_WARN(c, s) do { } while (0)
 #define rcu_sleep_check() do { } while (0)
 
-- 
2.5.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH tip/core/rcu 13/13] rculist: Use WRITE_ONCE() when deleting from reader-visible list
  2015-10-06 16:13 ` [PATCH tip/core/rcu 01/13] sched: Export sched_setscheduler_nocheck Paul E. McKenney
                     ` (10 preceding siblings ...)
  2015-10-06 16:13   ` [PATCH tip/core/rcu 12/13] rcu: Remove deprecated rcu_lockdep_assert() Paul E. McKenney
@ 2015-10-06 16:13   ` Paul E. McKenney
  11 siblings, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2015-10-06 16:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

The various RCU list-deletion macros (list_del_rcu(),
hlist_del_init_rcu(), hlist_del_rcu(), hlist_bl_del_init_rcu(),
hlist_bl_del_rcu(), hlist_nulls_del_init_rcu(), and hlist_nulls_del_rcu())
do plain stores into the ->next pointer of the preceding list element.
Unfortunately, the compiler is within its rights to (for example) use
byte-at-a-time writes to update the pointer, which would fatally confuse
concurrent readers.  This patch therefore adds the needed WRITE_ONCE()
macros.

KernelThreadSanitizer (KTSAN) reported the __hlist_del() issue, which
is a problem when __hlist_del() is invoked by hlist_del_rcu().

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/list.h       | 5 +++--
 include/linux/list_bl.h    | 5 +++--
 include/linux/list_nulls.h | 3 ++-
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/include/linux/list.h b/include/linux/list.h
index 3e3e64a61002..993395a2e55c 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -87,7 +87,7 @@ static inline void list_add_tail(struct list_head *new, struct list_head *head)
 static inline void __list_del(struct list_head * prev, struct list_head * next)
 {
 	next->prev = prev;
-	prev->next = next;
+	WRITE_ONCE(prev->next, next);
 }
 
 /**
@@ -615,7 +615,8 @@ static inline void __hlist_del(struct hlist_node *n)
 {
 	struct hlist_node *next = n->next;
 	struct hlist_node **pprev = n->pprev;
-	*pprev = next;
+
+	WRITE_ONCE(*pprev, next);
 	if (next)
 		next->pprev = pprev;
 }
diff --git a/include/linux/list_bl.h b/include/linux/list_bl.h
index 2eb88556c5c5..8132214e8efd 100644
--- a/include/linux/list_bl.h
+++ b/include/linux/list_bl.h
@@ -93,9 +93,10 @@ static inline void __hlist_bl_del(struct hlist_bl_node *n)
 	LIST_BL_BUG_ON((unsigned long)n & LIST_BL_LOCKMASK);
 
 	/* pprev may be `first`, so be careful not to lose the lock bit */
-	*pprev = (struct hlist_bl_node *)
+	WRITE_ONCE(*pprev,
+		   (struct hlist_bl_node *)
 			((unsigned long)next |
-			 ((unsigned long)*pprev & LIST_BL_LOCKMASK));
+			 ((unsigned long)*pprev & LIST_BL_LOCKMASK)));
 	if (next)
 		next->pprev = pprev;
 }
diff --git a/include/linux/list_nulls.h b/include/linux/list_nulls.h
index f266661d2666..444d2b1313bd 100644
--- a/include/linux/list_nulls.h
+++ b/include/linux/list_nulls.h
@@ -76,7 +76,8 @@ static inline void __hlist_nulls_del(struct hlist_nulls_node *n)
 {
 	struct hlist_nulls_node *next = n->next;
 	struct hlist_nulls_node **pprev = n->pprev;
-	*pprev = next;
+
+	WRITE_ONCE(*pprev, next);
 	if (!is_a_nulls(next))
 		next->pprev = pprev;
 }
-- 
2.5.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [Kernel networking modules.]  OSI levels 2 & 3, Assistance - If anyone knows anyone in the US. North West region
  2015-10-06 16:13   ` [PATCH tip/core/rcu 04/13] rcu: Don't disable preemption for Tiny and Tree RCU readers Paul E. McKenney
@ 2015-10-06 16:20     ` John D Allen, Leveridge Systems INC
  2015-10-06 16:44     ` [PATCH tip/core/rcu 04/13] rcu: Don't disable preemption for Tiny and Tree RCU readers Josh Triplett
  1 sibling, 0 replies; 57+ messages in thread
From: John D Allen, Leveridge Systems INC @ 2015-10-06 16:20 UTC (permalink / raw)
  To: linux-kernel


OSI levels 2 & 3, Assistance - If anyone knows anyone in the US. North West region, please advise.

Welcome assistance on Kernel developments on the networking side.

thanks in advance.

Regards, John.


> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH tip/core/rcu 04/13] rcu: Don't disable preemption for Tiny and Tree RCU readers
  2015-10-06 16:13   ` [PATCH tip/core/rcu 04/13] rcu: Don't disable preemption for Tiny and Tree RCU readers Paul E. McKenney
  2015-10-06 16:20     ` [Kernel networking modules.] OSI levels 2 & 3, Assistance - If anyone knows anyone in the US. North West region John D Allen, Leveridge Systems INC
@ 2015-10-06 16:44     ` Josh Triplett
  2015-10-06 17:01       ` Paul E. McKenney
  1 sibling, 1 reply; 57+ messages in thread
From: Josh Triplett @ 2015-10-06 16:44 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, tglx, peterz, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani, Boqun Feng

On Tue, Oct 06, 2015 at 09:13:39AM -0700, Paul E. McKenney wrote:
> From: Boqun Feng <boqun.feng@gmail.com>
> 
> Because preempt_disable() maps to barrier() for non-debug builds,
> it forces the compiler to spill and reload registers.  Because Tree
> RCU and Tiny RCU now only appear in CONFIG_PREEMPT=n builds, these
> barrier() instances generate needless extra code for each instance of
> rcu_read_lock() and rcu_read_unlock().  This extra code slows down Tree
> RCU and bloats Tiny RCU.
> 
> This commit therefore removes the preempt_disable() and preempt_enable()
> from the non-preemptible implementations of __rcu_read_lock() and
> __rcu_read_unlock(), respectively.  However, for debug purposes,
> preempt_disable() and preempt_enable() are still invoked if
> CONFIG_PREEMPT_COUNT=y, because this allows detection of sleeping inside
> atomic sections in non-preemptible kernels.
> 
> This is based on an earlier patch by Paul E. McKenney, fixing
> a bug encountered in kernels built with CONFIG_PREEMPT=n and
> CONFIG_PREEMPT_COUNT=y.

This also adds explicit barrier() calls to several internal RCU
functions, but the commit message doesn't explain those at all.

> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> ---
>  include/linux/rcupdate.h | 6 ++++--
>  include/linux/rcutiny.h  | 1 +
>  kernel/rcu/tree.c        | 9 +++++++++
>  3 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index d63bb77dab35..6c3ceceb6148 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -297,12 +297,14 @@ void synchronize_rcu(void);
>  
>  static inline void __rcu_read_lock(void)
>  {
> -	preempt_disable();
> +	if (IS_ENABLED(CONFIG_PREEMPT_COUNT))
> +		preempt_disable();
>  }
>  
>  static inline void __rcu_read_unlock(void)
>  {
> -	preempt_enable();
> +	if (IS_ENABLED(CONFIG_PREEMPT_COUNT))
> +		preempt_enable();
>  }
>  
>  static inline void synchronize_rcu(void)
> diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
> index c8a0722f77ea..4c1aaf9cce7b 100644
> --- a/include/linux/rcutiny.h
> +++ b/include/linux/rcutiny.h
> @@ -216,6 +216,7 @@ static inline bool rcu_is_watching(void)
>  
>  static inline void rcu_all_qs(void)
>  {
> +	barrier(); /* Avoid RCU read-side critical sections leaking across. */
>  }
>  
>  #endif /* __LINUX_RCUTINY_H */
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index b9d9e0249e2f..93c0f23c3e45 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -337,12 +337,14 @@ static void rcu_momentary_dyntick_idle(void)
>   */
>  void rcu_note_context_switch(void)
>  {
> +	barrier(); /* Avoid RCU read-side critical sections leaking down. */
>  	trace_rcu_utilization(TPS("Start context switch"));
>  	rcu_sched_qs();
>  	rcu_preempt_note_context_switch();
>  	if (unlikely(raw_cpu_read(rcu_sched_qs_mask)))
>  		rcu_momentary_dyntick_idle();
>  	trace_rcu_utilization(TPS("End context switch"));
> +	barrier(); /* Avoid RCU read-side critical sections leaking up. */
>  }
>  EXPORT_SYMBOL_GPL(rcu_note_context_switch);
>  
> @@ -353,12 +355,19 @@ EXPORT_SYMBOL_GPL(rcu_note_context_switch);
>   * RCU flavors in desperate need of a quiescent state, which will normally
>   * be none of them).  Either way, do a lightweight quiescent state for
>   * all RCU flavors.
> + *
> + * The barrier() calls are redundant in the common case when this is
> + * called externally, but just in case this is called from within this
> + * file.
> + *
>   */
>  void rcu_all_qs(void)
>  {
> +	barrier(); /* Avoid RCU read-side critical sections leaking down. */
>  	if (unlikely(raw_cpu_read(rcu_sched_qs_mask)))
>  		rcu_momentary_dyntick_idle();
>  	this_cpu_inc(rcu_qs_ctr);
> +	barrier(); /* Avoid RCU read-side critical sections leaking up. */
>  }
>  EXPORT_SYMBOL_GPL(rcu_all_qs);
>  
> -- 
> 2.5.2
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH tip/core/rcu 04/13] rcu: Don't disable preemption for Tiny and Tree RCU readers
  2015-10-06 16:44     ` [PATCH tip/core/rcu 04/13] rcu: Don't disable preemption for Tiny and Tree RCU readers Josh Triplett
@ 2015-10-06 17:01       ` Paul E. McKenney
  2015-10-06 17:16         ` Josh Triplett
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2015-10-06 17:01 UTC (permalink / raw)
  To: Josh Triplett
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, tglx, peterz, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani, Boqun Feng

On Tue, Oct 06, 2015 at 09:44:45AM -0700, Josh Triplett wrote:
> On Tue, Oct 06, 2015 at 09:13:39AM -0700, Paul E. McKenney wrote:
> > From: Boqun Feng <boqun.feng@gmail.com>
> > 
> > Because preempt_disable() maps to barrier() for non-debug builds,
> > it forces the compiler to spill and reload registers.  Because Tree
> > RCU and Tiny RCU now only appear in CONFIG_PREEMPT=n builds, these
> > barrier() instances generate needless extra code for each instance of
> > rcu_read_lock() and rcu_read_unlock().  This extra code slows down Tree
> > RCU and bloats Tiny RCU.
> > 
> > This commit therefore removes the preempt_disable() and preempt_enable()
> > from the non-preemptible implementations of __rcu_read_lock() and
> > __rcu_read_unlock(), respectively.  However, for debug purposes,
> > preempt_disable() and preempt_enable() are still invoked if
> > CONFIG_PREEMPT_COUNT=y, because this allows detection of sleeping inside
> > atomic sections in non-preemptible kernels.
> > 
> > This is based on an earlier patch by Paul E. McKenney, fixing
> > a bug encountered in kernels built with CONFIG_PREEMPT=n and
> > CONFIG_PREEMPT_COUNT=y.
> 
> This also adds explicit barrier() calls to several internal RCU
> functions, but the commit message doesn't explain those at all.

To compensate for them being removed from rcu_read_lock() and
rcu_read_unlock(), but yes, I will update.

And good catch!

						Thanx, Paul

> > Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > ---
> >  include/linux/rcupdate.h | 6 ++++--
> >  include/linux/rcutiny.h  | 1 +
> >  kernel/rcu/tree.c        | 9 +++++++++
> >  3 files changed, 14 insertions(+), 2 deletions(-)
> > 
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index d63bb77dab35..6c3ceceb6148 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -297,12 +297,14 @@ void synchronize_rcu(void);
> >  
> >  static inline void __rcu_read_lock(void)
> >  {
> > -	preempt_disable();
> > +	if (IS_ENABLED(CONFIG_PREEMPT_COUNT))
> > +		preempt_disable();
> >  }
> >  
> >  static inline void __rcu_read_unlock(void)
> >  {
> > -	preempt_enable();
> > +	if (IS_ENABLED(CONFIG_PREEMPT_COUNT))
> > +		preempt_enable();
> >  }
> >  
> >  static inline void synchronize_rcu(void)
> > diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
> > index c8a0722f77ea..4c1aaf9cce7b 100644
> > --- a/include/linux/rcutiny.h
> > +++ b/include/linux/rcutiny.h
> > @@ -216,6 +216,7 @@ static inline bool rcu_is_watching(void)
> >  
> >  static inline void rcu_all_qs(void)
> >  {
> > +	barrier(); /* Avoid RCU read-side critical sections leaking across. */
> >  }
> >  
> >  #endif /* __LINUX_RCUTINY_H */
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index b9d9e0249e2f..93c0f23c3e45 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -337,12 +337,14 @@ static void rcu_momentary_dyntick_idle(void)
> >   */
> >  void rcu_note_context_switch(void)
> >  {
> > +	barrier(); /* Avoid RCU read-side critical sections leaking down. */
> >  	trace_rcu_utilization(TPS("Start context switch"));
> >  	rcu_sched_qs();
> >  	rcu_preempt_note_context_switch();
> >  	if (unlikely(raw_cpu_read(rcu_sched_qs_mask)))
> >  		rcu_momentary_dyntick_idle();
> >  	trace_rcu_utilization(TPS("End context switch"));
> > +	barrier(); /* Avoid RCU read-side critical sections leaking up. */
> >  }
> >  EXPORT_SYMBOL_GPL(rcu_note_context_switch);
> >  
> > @@ -353,12 +355,19 @@ EXPORT_SYMBOL_GPL(rcu_note_context_switch);
> >   * RCU flavors in desperate need of a quiescent state, which will normally
> >   * be none of them).  Either way, do a lightweight quiescent state for
> >   * all RCU flavors.
> > + *
> > + * The barrier() calls are redundant in the common case when this is
> > + * called externally, but just in case this is called from within this
> > + * file.
> > + *
> >   */
> >  void rcu_all_qs(void)
> >  {
> > +	barrier(); /* Avoid RCU read-side critical sections leaking down. */
> >  	if (unlikely(raw_cpu_read(rcu_sched_qs_mask)))
> >  		rcu_momentary_dyntick_idle();
> >  	this_cpu_inc(rcu_qs_ctr);
> > +	barrier(); /* Avoid RCU read-side critical sections leaking up. */
> >  }
> >  EXPORT_SYMBOL_GPL(rcu_all_qs);
> >  
> > -- 
> > 2.5.2
> > 
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH tip/core/rcu 06/13] rcu: Add online/offline info to stall warning message
  2015-10-06 16:13   ` [PATCH tip/core/rcu 06/13] rcu: Add online/offline info to stall warning message Paul E. McKenney
@ 2015-10-06 17:15     ` Josh Triplett
  0 siblings, 0 replies; 57+ messages in thread
From: Josh Triplett @ 2015-10-06 17:15 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, tglx, peterz, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Tue, Oct 06, 2015 at 09:13:41AM -0700, Paul E. McKenney wrote:
> This commit makes the RCU CPU stall warning message print online/offline
> indications immediately after the CPU number.  A "?" indicates global
> offline, a "," global online, and a "!" indicates RCU believes that the
> CPU is offline and "." otherwise, both right after the CPU number.
> So for CPU 10, you would normally see "10,.:" indicating that everything
> believes that the CPU is online.

This explanation doesn't seem to agree with the actual characters used.
These use 'O', 'o', and 'N', instead, with '.' for "true" in all three
cases.  And the output from the code includes a '-' after the CPU
number.

Also, this output needs matching documentation in
src/linux/Documentation/RCU/stallwarn.txt, ideally added as part of the
same commit.

> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> ---
>  kernel/rcu/tree_plugin.h | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 06116ae6dfd7..57ed9c13ae5a 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -1702,8 +1702,12 @@ static void print_cpu_stall_info(struct rcu_state *rsp, int cpu)
>  		ticks_value = rsp->gpnum - rdp->gpnum;
>  	}
>  	print_cpu_stall_fast_no_hz(fast_no_hz, cpu);
> -	pr_err("\t%d: (%lu %s) idle=%03x/%llx/%d softirq=%u/%u fqs=%ld %s\n",
> -	       cpu, ticks_value, ticks_title,
> +	pr_err("\t%d-%c%c%c: (%lu %s) idle=%03x/%llx/%d softirq=%u/%u fqs=%ld %s\n",
> +	       cpu,
> +	       "O."[!!cpu_online(cpu)],
> +	       "o."[!!(rdp->grpmask & rdp->mynode->qsmaskinit)],
> +	       "N."[!!(rdp->grpmask & rdp->mynode->qsmaskinitnext)],
> +	       ticks_value, ticks_title,
>  	       atomic_read(&rdtp->dynticks) & 0xfff,
>  	       rdtp->dynticks_nesting, rdtp->dynticks_nmi_nesting,
>  	       rdp->softirq_snap, kstat_softirqs_cpu(RCU_SOFTIRQ, cpu),
> -- 
> 2.5.2
> 


* Re: [PATCH tip/core/rcu 04/13] rcu: Don't disable preemption for Tiny and Tree RCU readers
  2015-10-06 17:01       ` Paul E. McKenney
@ 2015-10-06 17:16         ` Josh Triplett
  2015-10-06 17:42           ` Paul E. McKenney
  0 siblings, 1 reply; 57+ messages in thread
From: Josh Triplett @ 2015-10-06 17:16 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, tglx, peterz, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani, Boqun Feng

On Tue, Oct 06, 2015 at 10:01:01AM -0700, Paul E. McKenney wrote:
> On Tue, Oct 06, 2015 at 09:44:45AM -0700, Josh Triplett wrote:
> > On Tue, Oct 06, 2015 at 09:13:39AM -0700, Paul E. McKenney wrote:
> > > From: Boqun Feng <boqun.feng@gmail.com>
> > > 
> > > Because preempt_disable() maps to barrier() for non-debug builds,
> > > it forces the compiler to spill and reload registers.  Because Tree
> > > RCU and Tiny RCU now only appear in CONFIG_PREEMPT=n builds, these
> > > barrier() instances generate needless extra code for each instance of
> > > rcu_read_lock() and rcu_read_unlock().  This extra code slows down Tree
> > > RCU and bloats Tiny RCU.
> > > 
> > > This commit therefore removes the preempt_disable() and preempt_enable()
> > > from the non-preemptible implementations of __rcu_read_lock() and
> > > __rcu_read_unlock(), respectively.  However, for debug purposes,
> > > preempt_disable() and preempt_enable() are still invoked if
> > > CONFIG_PREEMPT_COUNT=y, because this allows detection of sleeping inside
> > > atomic sections in non-preemptible kernels.
> > > 
> > > This is based on an earlier patch by Paul E. McKenney, fixing
> > > a bug encountered in kernels built with CONFIG_PREEMPT=n and
> > > CONFIG_PREEMPT_COUNT=y.
> > 
> > This also adds explicit barrier() calls to several internal RCU
> > functions, but the commit message doesn't explain those at all.
> 
> To compensate for them being removed from rcu_read_lock() and
> rcu_read_unlock(), but yes, I will update.

That much seemed clear from the comments, but that doesn't explain *why*
those functions need barriers of their own even though rcu_read_lock()
and rcu_read_unlock() don't.

> > > Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > ---
> > >  include/linux/rcupdate.h | 6 ++++--
> > >  include/linux/rcutiny.h  | 1 +
> > >  kernel/rcu/tree.c        | 9 +++++++++
> > >  3 files changed, 14 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > > index d63bb77dab35..6c3ceceb6148 100644
> > > --- a/include/linux/rcupdate.h
> > > +++ b/include/linux/rcupdate.h
> > > @@ -297,12 +297,14 @@ void synchronize_rcu(void);
> > >  
> > >  static inline void __rcu_read_lock(void)
> > >  {
> > > -	preempt_disable();
> > > +	if (IS_ENABLED(CONFIG_PREEMPT_COUNT))
> > > +		preempt_disable();
> > >  }
> > >  
> > >  static inline void __rcu_read_unlock(void)
> > >  {
> > > -	preempt_enable();
> > > +	if (IS_ENABLED(CONFIG_PREEMPT_COUNT))
> > > +		preempt_enable();
> > >  }
> > >  
> > >  static inline void synchronize_rcu(void)
> > > diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
> > > index c8a0722f77ea..4c1aaf9cce7b 100644
> > > --- a/include/linux/rcutiny.h
> > > +++ b/include/linux/rcutiny.h
> > > @@ -216,6 +216,7 @@ static inline bool rcu_is_watching(void)
> > >  
> > >  static inline void rcu_all_qs(void)
> > >  {
> > > +	barrier(); /* Avoid RCU read-side critical sections leaking across. */
> > >  }
> > >  
> > >  #endif /* __LINUX_RCUTINY_H */
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index b9d9e0249e2f..93c0f23c3e45 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -337,12 +337,14 @@ static void rcu_momentary_dyntick_idle(void)
> > >   */
> > >  void rcu_note_context_switch(void)
> > >  {
> > > +	barrier(); /* Avoid RCU read-side critical sections leaking down. */
> > >  	trace_rcu_utilization(TPS("Start context switch"));
> > >  	rcu_sched_qs();
> > >  	rcu_preempt_note_context_switch();
> > >  	if (unlikely(raw_cpu_read(rcu_sched_qs_mask)))
> > >  		rcu_momentary_dyntick_idle();
> > >  	trace_rcu_utilization(TPS("End context switch"));
> > > +	barrier(); /* Avoid RCU read-side critical sections leaking up. */
> > >  }
> > >  EXPORT_SYMBOL_GPL(rcu_note_context_switch);
> > >  
> > > @@ -353,12 +355,19 @@ EXPORT_SYMBOL_GPL(rcu_note_context_switch);
> > >   * RCU flavors in desperate need of a quiescent state, which will normally
> > >   * be none of them).  Either way, do a lightweight quiescent state for
> > >   * all RCU flavors.
> > > + *
> > > + * The barrier() calls are redundant in the common case when this is
> > > + * called externally, but just in case this is called from within this
> > > + * file.
> > > + *
> > >   */
> > >  void rcu_all_qs(void)
> > >  {
> > > +	barrier(); /* Avoid RCU read-side critical sections leaking down. */
> > >  	if (unlikely(raw_cpu_read(rcu_sched_qs_mask)))
> > >  		rcu_momentary_dyntick_idle();
> > >  	this_cpu_inc(rcu_qs_ctr);
> > > +	barrier(); /* Avoid RCU read-side critical sections leaking up. */
> > >  }
> > >  EXPORT_SYMBOL_GPL(rcu_all_qs);
> > >  
> > > -- 
> > > 2.5.2
> > > 
> > 
> 


* Re: [PATCH tip/core/rcu 07/13] rcu: Move preemption disabling out of __srcu_read_lock()
  2015-10-06 16:13   ` [PATCH tip/core/rcu 07/13] rcu: Move preemption disabling out of __srcu_read_lock() Paul E. McKenney
@ 2015-10-06 17:18     ` Josh Triplett
  2015-10-06 17:36       ` Paul E. McKenney
  2015-10-06 20:07     ` Peter Zijlstra
  1 sibling, 1 reply; 57+ messages in thread
From: Josh Triplett @ 2015-10-06 17:18 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, tglx, peterz, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Tue, Oct 06, 2015 at 09:13:42AM -0700, Paul E. McKenney wrote:
> Currently, __srcu_read_lock() cannot be invoked from restricted
> environments because it contains calls to preempt_disable() and
> preempt_enable(), both of which can invoke lockdep, which is a bad
> idea in some restricted execution modes.  This commit therefore moves
> the preempt_disable() and preempt_enable() from __srcu_read_lock()
> to srcu_read_lock().  It also inserts the preempt_disable() and
> preempt_enable() around the call to __srcu_read_lock() in do_exit().

What restricted environments do you intend to invoke
__srcu_read_lock from?

This change seems fine, but I don't see any change in this patch series
that needs this, hence my curiosity.

> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> ---
>  include/linux/srcu.h | 5 ++++-
>  kernel/exit.c        | 2 ++
>  kernel/rcu/srcu.c    | 2 --
>  3 files changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/srcu.h b/include/linux/srcu.h
> index bdeb4567b71e..f5f80c5643ac 100644
> --- a/include/linux/srcu.h
> +++ b/include/linux/srcu.h
> @@ -215,8 +215,11 @@ static inline int srcu_read_lock_held(struct srcu_struct *sp)
>   */
>  static inline int srcu_read_lock(struct srcu_struct *sp) __acquires(sp)
>  {
> -	int retval = __srcu_read_lock(sp);
> +	int retval;
>  
> +	preempt_disable();
> +	retval = __srcu_read_lock(sp);
> +	preempt_enable();
>  	rcu_lock_acquire(&(sp)->dep_map);
>  	return retval;
>  }
> diff --git a/kernel/exit.c b/kernel/exit.c
> index ea95ee1b5ef7..0e93b63bbc59 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -761,7 +761,9 @@ void do_exit(long code)
>  	 */
>  	flush_ptrace_hw_breakpoint(tsk);
>  
> +	TASKS_RCU(preempt_disable());
>  	TASKS_RCU(tasks_rcu_i = __srcu_read_lock(&tasks_rcu_exit_srcu));
> +	TASKS_RCU(preempt_enable());
>  	exit_notify(tsk, group_dead);
>  	proc_exit_connector(tsk);
>  #ifdef CONFIG_NUMA
> diff --git a/kernel/rcu/srcu.c b/kernel/rcu/srcu.c
> index 9e6122540d28..a63a1ea5a41b 100644
> --- a/kernel/rcu/srcu.c
> +++ b/kernel/rcu/srcu.c
> @@ -298,11 +298,9 @@ int __srcu_read_lock(struct srcu_struct *sp)
>  	int idx;
>  
>  	idx = READ_ONCE(sp->completed) & 0x1;
> -	preempt_disable();
>  	__this_cpu_inc(sp->per_cpu_ref->c[idx]);
>  	smp_mb(); /* B */  /* Avoid leaking the critical section. */
>  	__this_cpu_inc(sp->per_cpu_ref->seq[idx]);
> -	preempt_enable();
>  	return idx;
>  }
>  EXPORT_SYMBOL_GPL(__srcu_read_lock);
> -- 
> 2.5.2
> 


* Re: [PATCH tip/core/rcu 10/13] rcu: Add rcu_pointer_handoff()
  2015-10-06 16:13   ` [PATCH tip/core/rcu 10/13] rcu: Add rcu_pointer_handoff() Paul E. McKenney
@ 2015-10-06 17:21     ` Josh Triplett
  2015-10-06 17:31       ` Paul E. McKenney
  2015-10-06 20:27     ` Peter Zijlstra
  1 sibling, 1 reply; 57+ messages in thread
From: Josh Triplett @ 2015-10-06 17:21 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, tglx, peterz, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Tue, Oct 06, 2015 at 09:13:45AM -0700, Paul E. McKenney wrote:
> This commit adds an rcu_pointer_handoff() that is intended to mark
> situations where a structure's protection transitions from RCU to some
> other mechanism (locking, reference counting, whatever).  These markings
> should allow external tools to more easily spot bugs involving leaking
> pointers out of RCU read-side critical sections.
> 
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Shouldn't this expect the __rcu address space on the pointer, and cast
away the __rcu with __force?

>  include/linux/rcupdate.h | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 6c3ceceb6148..587eb057e2fa 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -813,6 +813,28 @@ static inline void rcu_preempt_sleep_check(void)
>  #define rcu_dereference_sched(p) rcu_dereference_sched_check(p, 0)
>  
>  /**
> + * rcu_pointer_handoff() - Hand off a pointer from RCU to other mechanism
> + * @p: The pointer to hand off
> + *
> + * This is simply an identity function, but it documents where a pointer
> + * is handed off from RCU to some other synchronization mechanism, for
> + * example, reference counting or locking.  In C11, it would map to
> + * kill_dependency().  It could be used as follows:
> + *
> + *	rcu_read_lock();
> + *	p = rcu_dereference(gp);
> + *	long_lived = is_long_lived(p);
> + *	if (long_lived) {
> + *		if (!atomic_inc_not_zero(p->refcnt))
> + *			long_lived = false;
> + *		else
> + *			p = rcu_pointer_handoff(p);
> + *	}
> + *	rcu_read_unlock();
> + */
> +#define rcu_pointer_handoff(p) (p)
> +
> +/**
>   * rcu_read_lock() - mark the beginning of an RCU read-side critical section
>   *
>   * When synchronize_rcu() is invoked on one CPU while other CPUs
> -- 
> 2.5.2
> 


* Re: [PATCH tip/core/rcu 0/13] Miscellaneous fixes for 4.4
  2015-10-06 16:13 [PATCH tip/core/rcu 0/13] Miscellaneous fixes for 4.4 Paul E. McKenney
  2015-10-06 16:13 ` [PATCH tip/core/rcu 01/13] sched: Export sched_setscheduler_nocheck Paul E. McKenney
@ 2015-10-06 17:23 ` Josh Triplett
  2015-10-06 17:38   ` Paul E. McKenney
  1 sibling, 1 reply; 57+ messages in thread
From: Josh Triplett @ 2015-10-06 17:23 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, tglx, peterz, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Tue, Oct 06, 2015 at 09:13:05AM -0700, Paul E. McKenney wrote:
> Hello!
> 
> This series contains miscellaneous fixes:
> 
> 1.	Export sched_setscheduler_nocheck() so that the new locktorture
> 	rtmutex_lock tests can be run as modules, courtesy of Davidlohr
> 	Bueso.
> 
> 2.	Use rcu_callback_t in call_rcu*() and friends to improve
> 	readability and to make cscope able to find them, courtesy of
> 	Boqun Feng.
> 
> 3.	Use call_rcu_func_t to replace explicit type equivalents when
> 	defining RCU callback functions, courtesy of Boqun Feng.
> 
> 4.	Don't unnecessarily disable preemption for Tiny and Tree
> 	RCU readers (only for preemptible RCU readers), courtesy
> 	of Boqun Feng.
> 
> 5.	Eliminate boot-time panic when a silly boot-time fanout is
> 	specified.
> 
> 6.	Add online/offline info to help debug stall-warning messages.
> 
> 7.	Move preemption disabling out of __srcu_read_lock() into
> 	srcu_read_lock().
> 
> 8.	Finish folding ->fqs_state into ->gp_state, courtesy of Petr Mladek.
> 
> 9.	Correct comment for values of ->gp_state field.
> 
> 10.	Add rcu_pointer_handoff() to allow explicit marking of handing
> 	off protection from RCU to some other means, such as locking
> 	or reference counting.
> 
> 11.	Make list_entry_rcu() use lockless_dereference(), courtesy
> 	of Patrick Marlier.  Despite the fact that this patch
> 	does nothing more than eliminate a single store and a
> 	single load of an unshared stack variable it nevertheless
> 	manages to provide a measurable performance increase:
> 	http://people.csail.mit.edu/amatveev/RLU_SOSP2015.pdf
> 
> 12.	Remove deprecated rcu_lockdep_assert().

(And a patch 13 not mentioned here.)

I responded to patches 4, 6, and 10 with feedback; for the rest (and for
those with the issues addressed):

Reviewed-by: Josh Triplett <josh@joshtriplett.org>


* Re: [PATCH tip/core/rcu 10/13] rcu: Add rcu_pointer_handoff()
  2015-10-06 17:21     ` Josh Triplett
@ 2015-10-06 17:31       ` Paul E. McKenney
  2015-10-06 17:36         ` Josh Triplett
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2015-10-06 17:31 UTC (permalink / raw)
  To: Josh Triplett
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, tglx, peterz, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Tue, Oct 06, 2015 at 10:21:28AM -0700, Josh Triplett wrote:
> On Tue, Oct 06, 2015 at 09:13:45AM -0700, Paul E. McKenney wrote:
> > This commit adds an rcu_pointer_handoff() that is intended to mark
> > situations where a structure's protection transitions from RCU to some
> > other mechanism (locking, reference counting, whatever).  These markings
> > should allow external tools to more easily spot bugs involving leaking
> > pointers out of RCU read-side critical sections.
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> Shouldn't this expect the __rcu address space on the pointer, and cast
> away the __rcu with __force?

I do not believe so, given that the __rcu was already removed by a preceding
rcu_dereference().  Or am I missing something?

							Thanx, Paul

> >  include/linux/rcupdate.h | 22 ++++++++++++++++++++++
> >  1 file changed, 22 insertions(+)
> > 
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index 6c3ceceb6148..587eb057e2fa 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -813,6 +813,28 @@ static inline void rcu_preempt_sleep_check(void)
> >  #define rcu_dereference_sched(p) rcu_dereference_sched_check(p, 0)
> >  
> >  /**
> > + * rcu_pointer_handoff() - Hand off a pointer from RCU to other mechanism
> > + * @p: The pointer to hand off
> > + *
> > + * This is simply an identity function, but it documents where a pointer
> > + * is handed off from RCU to some other synchronization mechanism, for
> > + * example, reference counting or locking.  In C11, it would map to
> > + * kill_dependency().  It could be used as follows:
> > + *
> > + *	rcu_read_lock();
> > + *	p = rcu_dereference(gp);
> > + *	long_lived = is_long_lived(p);
> > + *	if (long_lived) {
> > + *		if (!atomic_inc_not_zero(p->refcnt))
> > + *			long_lived = false;
> > + *		else
> > + *			p = rcu_pointer_handoff(p);
> > + *	}
> > + *	rcu_read_unlock();
> > + */
> > +#define rcu_pointer_handoff(p) (p)
> > +
> > +/**
> >   * rcu_read_lock() - mark the beginning of an RCU read-side critical section
> >   *
> >   * When synchronize_rcu() is invoked on one CPU while other CPUs
> > -- 
> > 2.5.2
> > 
> 



* Re: [PATCH tip/core/rcu 10/13] rcu: Add rcu_pointer_handoff()
  2015-10-06 17:31       ` Paul E. McKenney
@ 2015-10-06 17:36         ` Josh Triplett
  0 siblings, 0 replies; 57+ messages in thread
From: Josh Triplett @ 2015-10-06 17:36 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, tglx, peterz, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Tue, Oct 06, 2015 at 10:31:52AM -0700, Paul E. McKenney wrote:
> On Tue, Oct 06, 2015 at 10:21:28AM -0700, Josh Triplett wrote:
> > On Tue, Oct 06, 2015 at 09:13:45AM -0700, Paul E. McKenney wrote:
> > > This commit adds an rcu_pointer_handoff() that is intended to mark
> > > situations where a structure's protection transitions from RCU to some
> > > other mechanism (locking, reference counting, whatever).  These markings
> > > should allow external tools to more easily spot bugs involving leaking
> > > pointers out of RCU read-side critical sections.
> > > 
> > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > 
> > Shouldn't this expect the __rcu address space on the pointer, and cast
> > away the __rcu with __force?
> 
> I do not believe so, given that the __rcu was already removed by a preceding
> rcu_dereference().  Or am I missing something?

Ah, I see.  Per the example, you don't call this on an __rcu pointer
directly, only on a pointer you've already obtained from RCU after
giving it a lifetime of its own.  Nevermind.

> > >  include/linux/rcupdate.h | 22 ++++++++++++++++++++++
> > >  1 file changed, 22 insertions(+)
> > > 
> > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > > index 6c3ceceb6148..587eb057e2fa 100644
> > > --- a/include/linux/rcupdate.h
> > > +++ b/include/linux/rcupdate.h
> > > @@ -813,6 +813,28 @@ static inline void rcu_preempt_sleep_check(void)
> > >  #define rcu_dereference_sched(p) rcu_dereference_sched_check(p, 0)
> > >  
> > >  /**
> > > + * rcu_pointer_handoff() - Hand off a pointer from RCU to other mechanism
> > > + * @p: The pointer to hand off
> > > + *
> > > + * This is simply an identity function, but it documents where a pointer
> > > + * is handed off from RCU to some other synchronization mechanism, for
> > > + * example, reference counting or locking.  In C11, it would map to
> > > + * kill_dependency().  It could be used as follows:
> > > + *
> > > + *	rcu_read_lock();
> > > + *	p = rcu_dereference(gp);
> > > + *	long_lived = is_long_lived(p);
> > > + *	if (long_lived) {
> > > + *		if (!atomic_inc_not_zero(p->refcnt))
> > > + *			long_lived = false;
> > > + *		else
> > > + *			p = rcu_pointer_handoff(p);
> > > + *	}
> > > + *	rcu_read_unlock();
> > > + */
> > > +#define rcu_pointer_handoff(p) (p)
> > > +
> > > +/**
> > >   * rcu_read_lock() - mark the beginning of an RCU read-side critical section
> > >   *
> > >   * When synchronize_rcu() is invoked on one CPU while other CPUs
> > > -- 
> > > 2.5.2
> > > 
> > 
> 


* Re: [PATCH tip/core/rcu 07/13] rcu: Move preemption disabling out of __srcu_read_lock()
  2015-10-06 17:18     ` Josh Triplett
@ 2015-10-06 17:36       ` Paul E. McKenney
  2015-10-06 17:43         ` Josh Triplett
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2015-10-06 17:36 UTC (permalink / raw)
  To: Josh Triplett
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, tglx, peterz, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Tue, Oct 06, 2015 at 10:18:39AM -0700, Josh Triplett wrote:
> On Tue, Oct 06, 2015 at 09:13:42AM -0700, Paul E. McKenney wrote:
> > Currently, __srcu_read_lock() cannot be invoked from restricted
> > environments because it contains calls to preempt_disable() and
> > preempt_enable(), both of which can invoke lockdep, which is a bad
> > idea in some restricted execution modes.  This commit therefore moves
> > the preempt_disable() and preempt_enable() from __srcu_read_lock()
> > to srcu_read_lock().  It also inserts the preempt_disable() and
> > preempt_enable() around the call to __srcu_read_lock() in do_exit().
> 
> What restricted environments do you intend to invoke
> __srcu_read_lock from?
> 
> This change seems fine, but I don't see any change in this patch series
> that needs this, hence my curiosity.

Someone asked me for it, and now I cannot find it.  :-(

Something to the effect of when running unmapped during exception entry
or something like that.  I guess one way to find out would be to remove
the commit and see who complained, but on the other hand, it arguably
makes more sense to have only the bare mechanism in __srcu_read_lock()
and put the environmental protection into srcu_read_lock().

							Thanx, Paul

> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > ---
> >  include/linux/srcu.h | 5 ++++-
> >  kernel/exit.c        | 2 ++
> >  kernel/rcu/srcu.c    | 2 --
> >  3 files changed, 6 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/linux/srcu.h b/include/linux/srcu.h
> > index bdeb4567b71e..f5f80c5643ac 100644
> > --- a/include/linux/srcu.h
> > +++ b/include/linux/srcu.h
> > @@ -215,8 +215,11 @@ static inline int srcu_read_lock_held(struct srcu_struct *sp)
> >   */
> >  static inline int srcu_read_lock(struct srcu_struct *sp) __acquires(sp)
> >  {
> > -	int retval = __srcu_read_lock(sp);
> > +	int retval;
> >  
> > +	preempt_disable();
> > +	retval = __srcu_read_lock(sp);
> > +	preempt_enable();
> >  	rcu_lock_acquire(&(sp)->dep_map);
> >  	return retval;
> >  }
> > diff --git a/kernel/exit.c b/kernel/exit.c
> > index ea95ee1b5ef7..0e93b63bbc59 100644
> > --- a/kernel/exit.c
> > +++ b/kernel/exit.c
> > @@ -761,7 +761,9 @@ void do_exit(long code)
> >  	 */
> >  	flush_ptrace_hw_breakpoint(tsk);
> >  
> > +	TASKS_RCU(preempt_disable());
> >  	TASKS_RCU(tasks_rcu_i = __srcu_read_lock(&tasks_rcu_exit_srcu));
> > +	TASKS_RCU(preempt_enable());
> >  	exit_notify(tsk, group_dead);
> >  	proc_exit_connector(tsk);
> >  #ifdef CONFIG_NUMA
> > diff --git a/kernel/rcu/srcu.c b/kernel/rcu/srcu.c
> > index 9e6122540d28..a63a1ea5a41b 100644
> > --- a/kernel/rcu/srcu.c
> > +++ b/kernel/rcu/srcu.c
> > @@ -298,11 +298,9 @@ int __srcu_read_lock(struct srcu_struct *sp)
> >  	int idx;
> >  
> >  	idx = READ_ONCE(sp->completed) & 0x1;
> > -	preempt_disable();
> >  	__this_cpu_inc(sp->per_cpu_ref->c[idx]);
> >  	smp_mb(); /* B */  /* Avoid leaking the critical section. */
> >  	__this_cpu_inc(sp->per_cpu_ref->seq[idx]);
> > -	preempt_enable();
> >  	return idx;
> >  }
> >  EXPORT_SYMBOL_GPL(__srcu_read_lock);
> > -- 
> > 2.5.2
> > 
> 



* Re: [PATCH tip/core/rcu 0/13] Miscellaneous fixes for 4.4
  2015-10-06 17:23 ` [PATCH tip/core/rcu 0/13] Miscellaneous fixes for 4.4 Josh Triplett
@ 2015-10-06 17:38   ` Paul E. McKenney
  0 siblings, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2015-10-06 17:38 UTC (permalink / raw)
  To: Josh Triplett
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, tglx, peterz, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Tue, Oct 06, 2015 at 10:23:51AM -0700, Josh Triplett wrote:
> On Tue, Oct 06, 2015 at 09:13:05AM -0700, Paul E. McKenney wrote:
> > Hello!
> > 
> > This series contains miscellaneous fixes:
> > 
> > 1.	Export sched_setscheduler_nocheck() so that the new locktorture
> > 	rtmutex_lock tests can be run as modules, courtesy of Davidlohr
> > 	Bueso.
> > 
> > 2.	Use rcu_callback_t in call_rcu*() and friends to improve
> > 	readability and to make cscope able to find them, courtesy of
> > 	Boqun Feng.
> > 
> > 3.	Use call_rcu_func_t to replace explicit type equivalents when
> > 	defining RCU callback functions, courtesy of Boqun Feng.
> > 
> > 4.	Don't unnecessarily disable preemption for Tiny and Tree
> > 	RCU readers (only for preemptible RCU readers), courtesy
> > 	of Boqun Feng.
> > 
> > 5.	Eliminate boot-time panic when a silly boot-time fanout is
> > 	specified.
> > 
> > 6.	Add online/offline info to help debug stall-warning messages.
> > 
> > 7.	Move preemption disabling out of __srcu_read_lock() into
> > 	srcu_read_lock().
> > 
> > 8.	Finish folding ->fqs_state into ->gp_state, courtesy of Petr Mladek.
> > 
> > 9.	Correct comment for values of ->gp_state field.
> > 
> > 10.	Add rcu_pointer_handoff() to allow explicit marking of handing
> > 	off protection from RCU to some other means, such as locking
> > 	or reference counting.
> > 
> > 11.	Make list_entry_rcu() use lockless_dereference(), courtesy
> > 	of Patrick Marlier.  Despite the fact that this patch
> > 	does nothing more than eliminate a single store and a
> > 	single load of an unshared stack variable it nevertheless
> > 	manages to provide a measurable performance increase:
> > 	http://people.csail.mit.edu/amatveev/RLU_SOSP2015.pdf
> > 
> > 12.	Remove deprecated rcu_lockdep_assert().
> 
> (And a patch 13 not mentioned here.)

Oops...  "Use WRITE_ONCE() when deleting from reader-visible list" included
to make KTSAN happier.

> I responded to patches 4, 6, and 10 with feedback; for the rest (and for
> those with the issues addressed):
> 
> Reviewed-by: Josh Triplett <josh@joshtriplett.org>

Thank you for the review and comments!

							Thanx, Paul



* Re: [PATCH tip/core/rcu 04/13] rcu: Don't disable preemption for Tiny and Tree RCU readers
  2015-10-06 17:16         ` Josh Triplett
@ 2015-10-06 17:42           ` Paul E. McKenney
  2015-10-06 17:46             ` Josh Triplett
  2015-10-06 20:05             ` Peter Zijlstra
  0 siblings, 2 replies; 57+ messages in thread
From: Paul E. McKenney @ 2015-10-06 17:42 UTC (permalink / raw)
  To: Josh Triplett
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, tglx, peterz, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani, Boqun Feng

On Tue, Oct 06, 2015 at 10:16:30AM -0700, Josh Triplett wrote:
> On Tue, Oct 06, 2015 at 10:01:01AM -0700, Paul E. McKenney wrote:
> > On Tue, Oct 06, 2015 at 09:44:45AM -0700, Josh Triplett wrote:
> > > On Tue, Oct 06, 2015 at 09:13:39AM -0700, Paul E. McKenney wrote:
> > > > From: Boqun Feng <boqun.feng@gmail.com>
> > > > 
> > > > Because preempt_disable() maps to barrier() for non-debug builds,
> > > > it forces the compiler to spill and reload registers.  Because Tree
> > > > RCU and Tiny RCU now only appear in CONFIG_PREEMPT=n builds, these
> > > > barrier() instances generate needless extra code for each instance of
> > > > rcu_read_lock() and rcu_read_unlock().  This extra code slows down Tree
> > > > RCU and bloats Tiny RCU.
> > > > 
> > > > This commit therefore removes the preempt_disable() and preempt_enable()
> > > > from the non-preemptible implementations of __rcu_read_lock() and
> > > > __rcu_read_unlock(), respectively.  However, for debug purposes,
> > > > preempt_disable() and preempt_enable() are still invoked if
> > > > CONFIG_PREEMPT_COUNT=y, because this allows detection of sleeping inside
> > > > atomic sections in non-preemptible kernels.
> > > > 
> > > > This is based on an earlier patch by Paul E. McKenney, fixing
> > > > a bug encountered in kernels built with CONFIG_PREEMPT=n and
> > > > CONFIG_PREEMPT_COUNT=y.
> > > 
> > > This also adds explicit barrier() calls to several internal RCU
> > > functions, but the commit message doesn't explain those at all.
> > 
> > To compensate for them being removed from rcu_read_lock() and
> > rcu_read_unlock(), but yes, I will update.
> 
> That much seemed clear from the comments, but that doesn't explain *why*
> those functions need barriers of their own even though rcu_read_lock()
> and rcu_read_unlock() don't.

Ah.  The reason is that Tiny RCU and Tree RCU (the !PREEMPT ones) act
by implicitly extending (and, if need be, merging) the RCU read-side
critical sections to include all the code between successive quiescent
states, for example, all the code between a pair of calls to schedule().

Therefore, there need to be barrier() calls in the quiescent-state
functions.  Some could be argued to be implicitly present due to
translation-unit boundaries, but paranoia and all that.

Would adding that sort of explanation help?

							Thanx, Paul

> > > > Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > ---
> > > >  include/linux/rcupdate.h | 6 ++++--
> > > >  include/linux/rcutiny.h  | 1 +
> > > >  kernel/rcu/tree.c        | 9 +++++++++
> > > >  3 files changed, 14 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > > > index d63bb77dab35..6c3ceceb6148 100644
> > > > --- a/include/linux/rcupdate.h
> > > > +++ b/include/linux/rcupdate.h
> > > > @@ -297,12 +297,14 @@ void synchronize_rcu(void);
> > > >  
> > > >  static inline void __rcu_read_lock(void)
> > > >  {
> > > > -	preempt_disable();
> > > > +	if (IS_ENABLED(CONFIG_PREEMPT_COUNT))
> > > > +		preempt_disable();
> > > >  }
> > > >  
> > > >  static inline void __rcu_read_unlock(void)
> > > >  {
> > > > -	preempt_enable();
> > > > +	if (IS_ENABLED(CONFIG_PREEMPT_COUNT))
> > > > +		preempt_enable();
> > > >  }
> > > >  
> > > >  static inline void synchronize_rcu(void)
> > > > diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
> > > > index c8a0722f77ea..4c1aaf9cce7b 100644
> > > > --- a/include/linux/rcutiny.h
> > > > +++ b/include/linux/rcutiny.h
> > > > @@ -216,6 +216,7 @@ static inline bool rcu_is_watching(void)
> > > >  
> > > >  static inline void rcu_all_qs(void)
> > > >  {
> > > > +	barrier(); /* Avoid RCU read-side critical sections leaking across. */
> > > >  }
> > > >  
> > > >  #endif /* __LINUX_RCUTINY_H */
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index b9d9e0249e2f..93c0f23c3e45 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -337,12 +337,14 @@ static void rcu_momentary_dyntick_idle(void)
> > > >   */
> > > >  void rcu_note_context_switch(void)
> > > >  {
> > > > +	barrier(); /* Avoid RCU read-side critical sections leaking down. */
> > > >  	trace_rcu_utilization(TPS("Start context switch"));
> > > >  	rcu_sched_qs();
> > > >  	rcu_preempt_note_context_switch();
> > > >  	if (unlikely(raw_cpu_read(rcu_sched_qs_mask)))
> > > >  		rcu_momentary_dyntick_idle();
> > > >  	trace_rcu_utilization(TPS("End context switch"));
> > > > +	barrier(); /* Avoid RCU read-side critical sections leaking up. */
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(rcu_note_context_switch);
> > > >  
> > > > @@ -353,12 +355,19 @@ EXPORT_SYMBOL_GPL(rcu_note_context_switch);
> > > >   * RCU flavors in desperate need of a quiescent state, which will normally
> > > >   * be none of them).  Either way, do a lightweight quiescent state for
> > > >   * all RCU flavors.
> > > > + *
> > > > + * The barrier() calls are redundant in the common case when this is
> > > > + * called externally, but just in case this is called from within this
> > > > + * file.
> > > > + *
> > > >   */
> > > >  void rcu_all_qs(void)
> > > >  {
> > > > +	barrier(); /* Avoid RCU read-side critical sections leaking down. */
> > > >  	if (unlikely(raw_cpu_read(rcu_sched_qs_mask)))
> > > >  		rcu_momentary_dyntick_idle();
> > > >  	this_cpu_inc(rcu_qs_ctr);
> > > > +	barrier(); /* Avoid RCU read-side critical sections leaking up. */
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(rcu_all_qs);
> > > >  
> > > > -- 
> > > > 2.5.2
> > > > 
> > > 
> > 
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH tip/core/rcu 07/13] rcu: Move preemption disabling out of __srcu_read_lock()
  2015-10-06 17:36       ` Paul E. McKenney
@ 2015-10-06 17:43         ` Josh Triplett
  2015-10-06 18:07           ` Paul E. McKenney
  0 siblings, 1 reply; 57+ messages in thread
From: Josh Triplett @ 2015-10-06 17:43 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, tglx, peterz, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Tue, Oct 06, 2015 at 10:36:46AM -0700, Paul E. McKenney wrote:
> On Tue, Oct 06, 2015 at 10:18:39AM -0700, Josh Triplett wrote:
> > On Tue, Oct 06, 2015 at 09:13:42AM -0700, Paul E. McKenney wrote:
> > > Currently, __srcu_read_lock() cannot be invoked from restricted
> > > environments because it contains calls to preempt_disable() and
> > > preempt_enable(), both of which can invoke lockdep, which is a bad
> > > idea in some restricted execution modes.  This commit therefore moves
> > > the preempt_disable() and preempt_enable() from __srcu_read_lock()
> > > to srcu_read_lock().  It also inserts the preempt_disable() and
> > > preempt_enable() around the call to __srcu_read_lock() in do_exit().
> > 
> > What restricted environments do you intend to invoke
> > __srcu_read_lock from?
> > 
> > This change seems fine, but I don't see any change in this patch series
> > that needs this, hence my curiosity.
> 
> Someone asked me for it, and now I cannot find it.  :-(
> 
> Something to the effect of when running unmapped during exception entry
> or something like that.  I guess one way to find out would be to remove
> the commit and see who complained, but on the other hand, it arguably
> makes more sense to have only the bare mechanism in __srcu_read_lock()
> and put the environmental protection into srcu_read_lock().

I agree; I just find the idea that someone would need to call
__srcu_read_lock rather than srcu_read_lock odd and worthy of further
understanding. :)


* Re: [PATCH tip/core/rcu 04/13] rcu: Don't disable preemption for Tiny and Tree RCU readers
  2015-10-06 17:42           ` Paul E. McKenney
@ 2015-10-06 17:46             ` Josh Triplett
  2015-10-06 20:05             ` Peter Zijlstra
  1 sibling, 0 replies; 57+ messages in thread
From: Josh Triplett @ 2015-10-06 17:46 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, tglx, peterz, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani, Boqun Feng

On Tue, Oct 06, 2015 at 10:42:04AM -0700, Paul E. McKenney wrote:
> On Tue, Oct 06, 2015 at 10:16:30AM -0700, Josh Triplett wrote:
> > On Tue, Oct 06, 2015 at 10:01:01AM -0700, Paul E. McKenney wrote:
> > > On Tue, Oct 06, 2015 at 09:44:45AM -0700, Josh Triplett wrote:
> > > > On Tue, Oct 06, 2015 at 09:13:39AM -0700, Paul E. McKenney wrote:
> > > > > From: Boqun Feng <boqun.feng@gmail.com>
> > > > > 
> > > > > Because preempt_disable() maps to barrier() for non-debug builds,
> > > > > it forces the compiler to spill and reload registers.  Because Tree
> > > > > RCU and Tiny RCU now only appear in CONFIG_PREEMPT=n builds, these
> > > > > barrier() instances generate needless extra code for each instance of
> > > > > rcu_read_lock() and rcu_read_unlock().  This extra code slows down Tree
> > > > > RCU and bloats Tiny RCU.
> > > > > 
> > > > > This commit therefore removes the preempt_disable() and preempt_enable()
> > > > > from the non-preemptible implementations of __rcu_read_lock() and
> > > > > __rcu_read_unlock(), respectively.  However, for debug purposes,
> > > > > preempt_disable() and preempt_enable() are still invoked if
> > > > > CONFIG_PREEMPT_COUNT=y, because this allows detection of sleeping inside
> > > > > atomic sections in non-preemptible kernels.
> > > > > 
> > > > > This is based on an earlier patch by Paul E. McKenney, fixing
> > > > > a bug encountered in kernels built with CONFIG_PREEMPT=n and
> > > > > CONFIG_PREEMPT_COUNT=y.
> > > > 
> > > > This also adds explicit barrier() calls to several internal RCU
> > > > functions, but the commit message doesn't explain those at all.
> > > 
> > > To compensate for them being removed from rcu_read_lock() and
> > > rcu_read_unlock(), but yes, I will update.
> > 
> > That much seemed clear from the comments, but that doesn't explain *why*
> > those functions need barriers of their own even though rcu_read_lock()
> > and rcu_read_unlock() don't.
> 
> Ah.  The reason is that Tiny RCU and Tree RCU (the !PREEMPT ones) act
> by implicitly extending (and, if need be, merging) the RCU read-side
> critical sections to include all the code between successive quiescent
> states, for example, all the code between a pair of calls to schedule().
> 
> Therefore, there need to be barrier() calls in the quiescent-state
> functions.  Some could be argued to be implicitly present due to
> translation-unit boundaries, but paranoia and all that.
> 
> Would adding that sort of explanation help?

Yes, it would.


* Re: [PATCH tip/core/rcu 07/13] rcu: Move preemption disabling out of __srcu_read_lock()
  2015-10-06 17:43         ` Josh Triplett
@ 2015-10-06 18:07           ` Paul E. McKenney
  0 siblings, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2015-10-06 18:07 UTC (permalink / raw)
  To: Josh Triplett
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, tglx, peterz, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Tue, Oct 06, 2015 at 10:43:11AM -0700, Josh Triplett wrote:
> On Tue, Oct 06, 2015 at 10:36:46AM -0700, Paul E. McKenney wrote:
> > On Tue, Oct 06, 2015 at 10:18:39AM -0700, Josh Triplett wrote:
> > > On Tue, Oct 06, 2015 at 09:13:42AM -0700, Paul E. McKenney wrote:
> > > > Currently, __srcu_read_lock() cannot be invoked from restricted
> > > > environments because it contains calls to preempt_disable() and
> > > > preempt_enable(), both of which can invoke lockdep, which is a bad
> > > > idea in some restricted execution modes.  This commit therefore moves
> > > > the preempt_disable() and preempt_enable() from __srcu_read_lock()
> > > > to srcu_read_lock().  It also inserts the preempt_disable() and
> > > > preempt_enable() around the call to __srcu_read_lock() in do_exit().
> > > 
> > > What restricted environments do you intend to invoke
> > > __srcu_read_lock from?
> > > 
> > > This change seems fine, but I don't see any change in this patch series
> > > that needs this, hence my curiosity.
> > 
> > Someone asked me for it, and now I cannot find it.  :-(
> > 
> > Something to the effect of when running unmapped during exception entry
> > or something like that.  I guess one way to find out would be to remove
> > the commit and see who complained, but on the other hand, it arguably
> > makes more sense to have only the bare mechanism in __srcu_read_lock()
> > and put the environmental protection into srcu_read_lock().
> 
> I agree; I just find the idea that someone would need to call
> __srcu_read_lock rather than srcu_read_lock odd and worthy of further
> understanding. :)

And they did supply an explanation that seemed satisfactory at the time,
but I cannot find that either.  I clearly need to track that sort of stuff
better!

							Thanx, Paul



* Re: [PATCH tip/core/rcu 04/13] rcu: Don't disable preemption for Tiny and Tree RCU readers
  2015-10-06 17:42           ` Paul E. McKenney
  2015-10-06 17:46             ` Josh Triplett
@ 2015-10-06 20:05             ` Peter Zijlstra
  2015-10-06 20:18               ` Paul E. McKenney
  1 sibling, 1 reply; 57+ messages in thread
From: Peter Zijlstra @ 2015-10-06 20:05 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Josh Triplett, linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, tglx, rostedt, dhowells, edumazet, dvhart,
	fweisbec, oleg, bobby.prani, Boqun Feng

On Tue, Oct 06, 2015 at 10:42:04AM -0700, Paul E. McKenney wrote:

> 
> Ah.  The reason is that Tiny RCU and Tree RCU (the !PREEMPT ones) act
> by implicitly extending (and, if need be, merging) the RCU read-side
> critical sections to include all the code between successive quiescent
> states, for example, all the code between a pair of calls to schedule().
> 
> Therefore, there need to be barrier() calls in the quiescent-state
> functions.  Some could be argued to be implicitly present due to
> translation-unit boundaries, but paranoia and all that.
> 
> Would adding that sort of explanation help?

> +++ b/include/linux/rcutiny.h
> @@ -216,6 +216,7 @@ static inline bool rcu_is_watching(void)
>  
>  static inline void rcu_all_qs(void)
>  {
> +	barrier(); /* Avoid RCU read-side critical sections leaking across. */
>  }
>  
>  #endif /* __LINUX_RCUTINY_H */

This is more than sheer paranoia, I think: inlined functions are not a
compiler barrier.

> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index b9d9e0249e2f..93c0f23c3e45 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -337,12 +337,14 @@ static void rcu_momentary_dyntick_idle(void)
>   */
>  void rcu_note_context_switch(void)
>  {
> +	barrier(); /* Avoid RCU read-side critical sections leaking down. */
>  	trace_rcu_utilization(TPS("Start context switch"));
>  	rcu_sched_qs();
>  	rcu_preempt_note_context_switch();
>  	if (unlikely(raw_cpu_read(rcu_sched_qs_mask)))
>  		rcu_momentary_dyntick_idle();
>  	trace_rcu_utilization(TPS("End context switch"));
> +	barrier(); /* Avoid RCU read-side critical sections leaking up. */
>  }
>  EXPORT_SYMBOL_GPL(rcu_note_context_switch);

These OTOH could be fixed with a noinline, such that the compiler may
never inline it, even with whole-program-optimizations, thereby
guaranteeing a function call boundary or compiler barrier.


* Re: [PATCH tip/core/rcu 07/13] rcu: Move preemption disabling out of __srcu_read_lock()
  2015-10-06 16:13   ` [PATCH tip/core/rcu 07/13] rcu: Move preemption disabling out of __srcu_read_lock() Paul E. McKenney
  2015-10-06 17:18     ` Josh Triplett
@ 2015-10-06 20:07     ` Peter Zijlstra
  2015-10-06 20:19       ` Paul E. McKenney
  1 sibling, 1 reply; 57+ messages in thread
From: Peter Zijlstra @ 2015-10-06 20:07 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Tue, Oct 06, 2015 at 09:13:42AM -0700, Paul E. McKenney wrote:
> Currently, __srcu_read_lock() cannot be invoked from restricted
> environments because it contains calls to preempt_disable() and
> preempt_enable(), both of which can invoke lockdep, which is a bad
> idea in some restricted execution modes.  This commit therefore moves
> the preempt_disable() and preempt_enable() from __srcu_read_lock()
> to srcu_read_lock().  It also inserts the preempt_disable() and
> preempt_enable() around the call to __srcu_read_lock() in do_exit().

Did you not simply want to use: preempt_disable_notrace() ?


* Re: [PATCH tip/core/rcu 04/13] rcu: Don't disable preemption for Tiny and Tree RCU readers
  2015-10-06 20:05             ` Peter Zijlstra
@ 2015-10-06 20:18               ` Paul E. McKenney
  2015-10-06 20:52                 ` Peter Zijlstra
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2015-10-06 20:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Josh Triplett, linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, tglx, rostedt, dhowells, edumazet, dvhart,
	fweisbec, oleg, bobby.prani, Boqun Feng

On Tue, Oct 06, 2015 at 10:05:38PM +0200, Peter Zijlstra wrote:
> On Tue, Oct 06, 2015 at 10:42:04AM -0700, Paul E. McKenney wrote:
> 
> > 
> > Ah.  The reason is that Tiny RCU and Tree RCU (the !PREEMPT ones) act
> > by implicitly extending (and, if need be, merging) the RCU read-side
> > critical sections to include all the code between successive quiescent
> > states, for example, all the code between a pair of calls to schedule().
> > 
> > Therefore, there need to be barrier() calls in the quiescent-state
> > functions.  Some could be argued to be implicitly present due to
> > translation-unit boundaries, but paranoia and all that.
> > 
> > Would adding that sort of explanation help?
> 
> > +++ b/include/linux/rcutiny.h
> > @@ -216,6 +216,7 @@ static inline bool rcu_is_watching(void)
> >  
> >  static inline void rcu_all_qs(void)
> >  {
> > +	barrier(); /* Avoid RCU read-side critical sections leaking across. */
> >  }
> >  
> >  #endif /* __LINUX_RCUTINY_H */
> 
> This is more than sheer paranoia I think, inlined functions are not a
> compiler barrier.

Yep, agreed.

> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index b9d9e0249e2f..93c0f23c3e45 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -337,12 +337,14 @@ static void rcu_momentary_dyntick_idle(void)
> >   */
> >  void rcu_note_context_switch(void)
> >  {
> > +	barrier(); /* Avoid RCU read-side critical sections leaking down. */
> >  	trace_rcu_utilization(TPS("Start context switch"));
> >  	rcu_sched_qs();
> >  	rcu_preempt_note_context_switch();
> >  	if (unlikely(raw_cpu_read(rcu_sched_qs_mask)))
> >  		rcu_momentary_dyntick_idle();
> >  	trace_rcu_utilization(TPS("End context switch"));
> > +	barrier(); /* Avoid RCU read-side critical sections leaking up. */
> >  }
> >  EXPORT_SYMBOL_GPL(rcu_note_context_switch);
> 
> These OTOH could be fixed with a noinline, such that the compiler may
> never inline it, even with whole-program-optimizations, thereby
> guaranteeing a function call boundary or compiler barrier.

I like the barrier() with the comment.  I expect it will be a bit more
robust against toolchain changes.

							Thanx, Paul



* Re: [PATCH tip/core/rcu 07/13] rcu: Move preemption disabling out of __srcu_read_lock()
  2015-10-06 20:07     ` Peter Zijlstra
@ 2015-10-06 20:19       ` Paul E. McKenney
  2015-10-06 20:32         ` Peter Zijlstra
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2015-10-06 20:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Tue, Oct 06, 2015 at 10:07:25PM +0200, Peter Zijlstra wrote:
> On Tue, Oct 06, 2015 at 09:13:42AM -0700, Paul E. McKenney wrote:
> > Currently, __srcu_read_lock() cannot be invoked from restricted
> > environments because it contains calls to preempt_disable() and
> > preempt_enable(), both of which can invoke lockdep, which is a bad
> > idea in some restricted execution modes.  This commit therefore moves
> > the preempt_disable() and preempt_enable() from __srcu_read_lock()
> > to srcu_read_lock().  It also inserts the preempt_disable() and
> > preempt_enable() around the call to __srcu_read_lock() in do_exit().
> 
> Did you not simply want to use: preempt_disable_notrace() ?

I believe that tracing the preempt_disable() in srcu_read_lock() and
srcu_read_unlock() is actually a good thing.  Or am I missing your
point?

							Thanx, Paul



* Re: [PATCH tip/core/rcu 10/13] rcu: Add rcu_pointer_handoff()
  2015-10-06 16:13   ` [PATCH tip/core/rcu 10/13] rcu: Add rcu_pointer_handoff() Paul E. McKenney
  2015-10-06 17:21     ` Josh Triplett
@ 2015-10-06 20:27     ` Peter Zijlstra
  2015-10-06 21:02       ` Paul E. McKenney
  1 sibling, 1 reply; 57+ messages in thread
From: Peter Zijlstra @ 2015-10-06 20:27 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Tue, Oct 06, 2015 at 09:13:45AM -0700, Paul E. McKenney wrote:
>  /**
> + * rcu_pointer_handoff() - Hand off a pointer from RCU to other mechanism
> + * @p: The pointer to hand off
> + *
> + * This is simply an identity function, but it documents where a pointer
> + * is handed off from RCU to some other synchronization mechanism, for
> + * example, reference counting or locking.  In C11, it would map to
> + * kill_dependency().  It could be used as follows:
> + *
> + *	rcu_read_lock();
> + *	p = rcu_dereference(gp);
> + *	long_lived = is_long_lived(p);
> + *	if (long_lived) {
> + *		if (!atomic_inc_not_zero(p->refcnt))
> + *			long_lived = false;
> + *		else
> + *			p = rcu_pointer_handoff(p);
> + *	}
> + *	rcu_read_unlock();
> + */
> +#define rcu_pointer_handoff(p) (p)

Will you actually be using this? It seems a tad pointless to add if you
don't.


* Re: [PATCH tip/core/rcu 07/13] rcu: Move preemption disabling out of __srcu_read_lock()
  2015-10-06 20:19       ` Paul E. McKenney
@ 2015-10-06 20:32         ` Peter Zijlstra
  2015-10-06 21:03           ` Paul E. McKenney
  0 siblings, 1 reply; 57+ messages in thread
From: Peter Zijlstra @ 2015-10-06 20:32 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Tue, Oct 06, 2015 at 01:19:15PM -0700, Paul E. McKenney wrote:
> On Tue, Oct 06, 2015 at 10:07:25PM +0200, Peter Zijlstra wrote:
> > On Tue, Oct 06, 2015 at 09:13:42AM -0700, Paul E. McKenney wrote:
> > > Currently, __srcu_read_lock() cannot be invoked from restricted
> > > environments because it contains calls to preempt_disable() and
> > > preempt_enable(), both of which can invoke lockdep, which is a bad
> > > idea in some restricted execution modes.  This commit therefore moves
> > > the preempt_disable() and preempt_enable() from __srcu_read_lock()
> > > to srcu_read_lock().  It also inserts the preempt_disable() and
> > > preempt_enable() around the call to __srcu_read_lock() in do_exit().
> > 
> > Did you not simply want to use: preempt_disable_notrace() ?
> 
> I believe that tracing the preempt_disable() in srcu_read_lock() and
> srcu_read_unlock() is actually a good thing.  Or am I missing your
> point?

Depends a bit on why we needed this change in the first place -- which,
going by the other branch of this thread, seems lost. However,
preempt_{dis,en}able_notrace() will not end up in any tracer/lockdep and
generate the minimum code that preserves the required semantics.




* Re: [PATCH tip/core/rcu 04/13] rcu: Don't disable preemption for Tiny and Tree RCU readers
  2015-10-06 20:18               ` Paul E. McKenney
@ 2015-10-06 20:52                 ` Peter Zijlstra
  2015-10-06 21:05                   ` Paul E. McKenney
  0 siblings, 1 reply; 57+ messages in thread
From: Peter Zijlstra @ 2015-10-06 20:52 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Josh Triplett, linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, tglx, rostedt, dhowells, edumazet, dvhart,
	fweisbec, oleg, bobby.prani, Boqun Feng

On Tue, Oct 06, 2015 at 01:18:01PM -0700, Paul E. McKenney wrote:
> On Tue, Oct 06, 2015 at 10:05:38PM +0200, Peter Zijlstra wrote:

> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index b9d9e0249e2f..93c0f23c3e45 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -337,12 +337,14 @@ static void rcu_momentary_dyntick_idle(void)
> > >   */
> > >  void rcu_note_context_switch(void)
> > >  {
> > > +	barrier(); /* Avoid RCU read-side critical sections leaking down. */
> > >  	trace_rcu_utilization(TPS("Start context switch"));
> > >  	rcu_sched_qs();
> > >  	rcu_preempt_note_context_switch();
> > >  	if (unlikely(raw_cpu_read(rcu_sched_qs_mask)))
> > >  		rcu_momentary_dyntick_idle();
> > >  	trace_rcu_utilization(TPS("End context switch"));
> > > +	barrier(); /* Avoid RCU read-side critical sections leaking up. */
> > >  }
> > >  EXPORT_SYMBOL_GPL(rcu_note_context_switch);
> > 
> > These OTOH could be fixed with a noinline, such that the compiler may
> > never inline it, even with whole-program-optimizations, thereby
> > guaranteeing a function call boundary or compiler barrier.
> 
> I like the barrier() with the comment.  I expect it will be a bit more
> robust against toolchain changes.

Don't you in fact already rely on the fact that schedule() is a function
call and will not be inlined? (it doesn't have noinline and I suppose
whole program optimizers could go funny on it).


* Re: [PATCH tip/core/rcu 10/13] rcu: Add rcu_pointer_handoff()
  2015-10-06 20:27     ` Peter Zijlstra
@ 2015-10-06 21:02       ` Paul E. McKenney
  2015-10-07  7:22         ` Peter Zijlstra
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2015-10-06 21:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Tue, Oct 06, 2015 at 10:27:41PM +0200, Peter Zijlstra wrote:
> On Tue, Oct 06, 2015 at 09:13:45AM -0700, Paul E. McKenney wrote:
> >  /**
> > + * rcu_pointer_handoff() - Hand off a pointer from RCU to other mechanism
> > + * @p: The pointer to hand off
> > + *
> > + * This is simply an identity function, but it documents where a pointer
> > + * is handed off from RCU to some other synchronization mechanism, for
> > + * example, reference counting or locking.  In C11, it would map to
> > + * kill_dependency().  It could be used as follows:
> > + *
> > + *	rcu_read_lock();
> > + *	p = rcu_dereference(gp);
> > + *	long_lived = is_long_lived(p);
> > + *	if (long_lived) {
> > + *		if (!atomic_inc_not_zero(p->refcnt))
> > + *			long_lived = false;
> > + *		else
> > + *			p = rcu_pointer_handoff(p);
> > + *	}
> > + *	rcu_read_unlock();
> > + */
> > +#define rcu_pointer_handoff(p) (p)
> 
> Will you actually be using this? It seems a tad pointless to add if you
> don't.

Some of the LLVM guys believe that they can diagnose RCU pointer leaks
if this is used.  But yes, it does need to be used.

							Thanx, Paul



* Re: [PATCH tip/core/rcu 07/13] rcu: Move preemption disabling out of __srcu_read_lock()
  2015-10-06 20:32         ` Peter Zijlstra
@ 2015-10-06 21:03           ` Paul E. McKenney
  2015-10-07  7:20             ` Peter Zijlstra
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2015-10-06 21:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Tue, Oct 06, 2015 at 10:32:24PM +0200, Peter Zijlstra wrote:
> On Tue, Oct 06, 2015 at 01:19:15PM -0700, Paul E. McKenney wrote:
> > On Tue, Oct 06, 2015 at 10:07:25PM +0200, Peter Zijlstra wrote:
> > > On Tue, Oct 06, 2015 at 09:13:42AM -0700, Paul E. McKenney wrote:
> > > > Currently, __srcu_read_lock() cannot be invoked from restricted
> > > > environments because it contains calls to preempt_disable() and
> > > > preempt_enable(), both of which can invoke lockdep, which is a bad
> > > > idea in some restricted execution modes.  This commit therefore moves
> > > > the preempt_disable() and preempt_enable() from __srcu_read_lock()
> > > > to srcu_read_lock().  It also inserts the preempt_disable() and
> > > > preempt_enable() around the call to __srcu_read_lock() in do_exit().
> > > 
> > > Did you not simply want to use: preempt_disable_notrace() ?
> > 
> > I believe that tracing the preempt_disable() in srcu_read_lock() and
> > srcu_read_unlock() is actually a good thing.  Or am I missing your
> > point?
> 
> Depends a bit on why we needed this change in the first place -- which,
> going by the other branch of this thread, seems lost. However,
> preempt_{dis,en}able_notrace() will not end up in any tracer/lockdep and
> generate the minimum code that preserves the required semantics.

True enough!  But can all architectures locate the TIF in all contexts?

							Thanx, Paul



* Re: [PATCH tip/core/rcu 04/13] rcu: Don't disable preemption for Tiny and Tree RCU readers
  2015-10-06 20:52                 ` Peter Zijlstra
@ 2015-10-06 21:05                   ` Paul E. McKenney
  2015-10-07  7:19                     ` Peter Zijlstra
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2015-10-06 21:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Josh Triplett, linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, tglx, rostedt, dhowells, edumazet, dvhart,
	fweisbec, oleg, bobby.prani, Boqun Feng

On Tue, Oct 06, 2015 at 10:52:00PM +0200, Peter Zijlstra wrote:
> On Tue, Oct 06, 2015 at 01:18:01PM -0700, Paul E. McKenney wrote:
> > On Tue, Oct 06, 2015 at 10:05:38PM +0200, Peter Zijlstra wrote:
> 
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index b9d9e0249e2f..93c0f23c3e45 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -337,12 +337,14 @@ static void rcu_momentary_dyntick_idle(void)
> > > >   */
> > > >  void rcu_note_context_switch(void)
> > > >  {
> > > > +	barrier(); /* Avoid RCU read-side critical sections leaking down. */
> > > >  	trace_rcu_utilization(TPS("Start context switch"));
> > > >  	rcu_sched_qs();
> > > >  	rcu_preempt_note_context_switch();
> > > >  	if (unlikely(raw_cpu_read(rcu_sched_qs_mask)))
> > > >  		rcu_momentary_dyntick_idle();
> > > >  	trace_rcu_utilization(TPS("End context switch"));
> > > > +	barrier(); /* Avoid RCU read-side critical sections leaking up. */
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(rcu_note_context_switch);
> > > 
> > > These OTOH could be fixed with a noinline, such that the compiler may
> > > never inline it, even with whole-program-optimizations, thereby
> > > guaranteeing a function call boundary or compiler barrier.
> > 
> > I like the barrier() with the comment.  I expect it will be a bit more
> > robust against toolchain changes.
> 
> Don't you in fact already rely on the fact that schedule() is a function
> call and will not be inlined? (it doesn't have noinline and I suppose
> whole program optimizers could go funny on it).

Probably pretty much everywhere I call schedule().  But I was thinking
> that barrier() at the beginning and end of an external function didn't
need to do anything.  Is that incorrect?

							Thanx, Paul



* Re: [PATCH tip/core/rcu 04/13] rcu: Don't disable preemption for Tiny and Tree RCU readers
  2015-10-06 21:05                   ` Paul E. McKenney
@ 2015-10-07  7:19                     ` Peter Zijlstra
  0 siblings, 0 replies; 57+ messages in thread
From: Peter Zijlstra @ 2015-10-07  7:19 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Josh Triplett, linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, tglx, rostedt, dhowells, edumazet, dvhart,
	fweisbec, oleg, bobby.prani, Boqun Feng

On Tue, Oct 06, 2015 at 02:05:39PM -0700, Paul E. McKenney wrote:
> On Tue, Oct 06, 2015 at 10:52:00PM +0200, Peter Zijlstra wrote:
> > On Tue, Oct 06, 2015 at 01:18:01PM -0700, Paul E. McKenney wrote:
> > > On Tue, Oct 06, 2015 at 10:05:38PM +0200, Peter Zijlstra wrote:
> > 
> > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > index b9d9e0249e2f..93c0f23c3e45 100644
> > > > > --- a/kernel/rcu/tree.c
> > > > > +++ b/kernel/rcu/tree.c
> > > > > @@ -337,12 +337,14 @@ static void rcu_momentary_dyntick_idle(void)
> > > > >   */
> > > > >  void rcu_note_context_switch(void)
> > > > >  {
> > > > > +	barrier(); /* Avoid RCU read-side critical sections leaking down. */
> > > > >  	trace_rcu_utilization(TPS("Start context switch"));
> > > > >  	rcu_sched_qs();
> > > > >  	rcu_preempt_note_context_switch();
> > > > >  	if (unlikely(raw_cpu_read(rcu_sched_qs_mask)))
> > > > >  		rcu_momentary_dyntick_idle();
> > > > >  	trace_rcu_utilization(TPS("End context switch"));
> > > > > +	barrier(); /* Avoid RCU read-side critical sections leaking up. */
> > > > >  }
> > > > >  EXPORT_SYMBOL_GPL(rcu_note_context_switch);
> > > > 
> > > > These OTOH could be fixed with a noinline, such that the compiler may
> > > > never inline it, even with whole-program-optimizations, thereby
> > > > guaranteeing a function call boundary or compiler barrier.
> > > 
> > > I like the barrier() with the comment.  I expect it will be a bit more
> > > robust against toolchain changes.
> > 
> > Don't you in fact already rely on the fact that schedule() is a function
> > call and will not be inlined? (it doesn't have noinline and I suppose
> > whole program optimizers could go funny on it).
> 
> Probably pretty much everywhere I call schedule().  But I was thinking
> > that barrier() at the beginning and end of an external function didn't
> need to do anything.  Is that incorrect?

No. My point was more that by removing barrier() from
rcu_read_{un,}lock() you hard rely on schedule() being a compiler
barrier, and I was thinking you need it to be a function call for that,
but this is incorrect.

Even without it being a function call, there are explicit compiler
barriers in there that even whole-program optimizers cannot make go
away, so my bad for creating confusion.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH tip/core/rcu 07/13] rcu: Move preemption disabling out of __srcu_read_lock()
  2015-10-06 21:03           ` Paul E. McKenney
@ 2015-10-07  7:20             ` Peter Zijlstra
  2015-10-07 14:18               ` Paul E. McKenney
  0 siblings, 1 reply; 57+ messages in thread
From: Peter Zijlstra @ 2015-10-07  7:20 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Tue, Oct 06, 2015 at 02:03:48PM -0700, Paul E. McKenney wrote:
> On Tue, Oct 06, 2015 at 10:32:24PM +0200, Peter Zijlstra wrote:
> > On Tue, Oct 06, 2015 at 01:19:15PM -0700, Paul E. McKenney wrote:
> > > On Tue, Oct 06, 2015 at 10:07:25PM +0200, Peter Zijlstra wrote:
> > > > On Tue, Oct 06, 2015 at 09:13:42AM -0700, Paul E. McKenney wrote:
> > > > > Currently, __srcu_read_lock() cannot be invoked from restricted
> > > > > environments because it contains calls to preempt_disable() and
> > > > > preempt_enable(), both of which can invoke lockdep, which is a bad
> > > > > idea in some restricted execution modes.  This commit therefore moves
> > > > > the preempt_disable() and preempt_enable() from __srcu_read_lock()
> > > > > to srcu_read_lock().  It also inserts the preempt_disable() and
> > > > > preempt_enable() around the call to __srcu_read_lock() in do_exit().
> > > > 
> > > > Did you not simply want to use: preempt_disable_notrace() ?
> > > 
> > > I believe that tracing the preempt_disable() in srcu_read_lock() and
> > > srcu_read_unlock() is actually a good thing.  Or am I missing your
> > > point?
> > 
> > Depends a bit on why we needed this change in the first place -- which,
> > going by the other branch of this thread, seems lost. However,
> > preempt_{dis,en}able_notrace() will not end up in any tracer/lockdep and
> > generate the minimum code that preserves the required semantics.
> 
> True enough!  But can all architectures locate the TIF in all contexts?

They had better, otherwise we have a problem with NMIs touching it :-)

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH tip/core/rcu 10/13] rcu: Add rcu_pointer_handoff()
  2015-10-06 21:02       ` Paul E. McKenney
@ 2015-10-07  7:22         ` Peter Zijlstra
  2015-10-07 14:20           ` Paul E. McKenney
  0 siblings, 1 reply; 57+ messages in thread
From: Peter Zijlstra @ 2015-10-07  7:22 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Tue, Oct 06, 2015 at 02:02:43PM -0700, Paul E. McKenney wrote:
> On Tue, Oct 06, 2015 at 10:27:41PM +0200, Peter Zijlstra wrote:
> > On Tue, Oct 06, 2015 at 09:13:45AM -0700, Paul E. McKenney wrote:
> > >  /**
> > > + * rcu_pointer_handoff() - Hand off a pointer from RCU to other mechanism
> > > + * @p: The pointer to hand off
> > > + *
> > > + * This is simply an identity function, but it documents where a pointer
> > > + * is handed off from RCU to some other synchronization mechanism, for
> > > + * example, reference counting or locking.  In C11, it would map to
> > > + * kill_dependency().  It could be used as follows:
> > > + *
> > > + *	rcu_read_lock();
> > > + *	p = rcu_dereference(gp);
> > > + *	long_lived = is_long_lived(p);
> > > + *	if (long_lived) {
> > > + *		if (!atomic_inc_not_zero(p->refcnt))
> > > + *			long_lived = false;
> > > + *		else
> > > + *			p = rcu_pointer_handoff(p);
> > > + *	}
> > > + *	rcu_read_unlock();
> > > + */
> > > +#define rcu_pointer_handoff(p) (p)
> > 
> > Will you actually be using this? It seems a tad pointless to add if you
> > don't.
> 
> Some of the LLVM guys believe that they can diagnose RCU pointer leaks
> if this is used.  But yes, it does need to be used.

The thing is, I'm not convinced this is a 'sane' interface. It's _far_
too easy to forget. It doesn't make any kind of sense either, which is
part of why it's hard to remember.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH tip/core/rcu 07/13] rcu: Move preemption disabling out of __srcu_read_lock()
  2015-10-07  7:20             ` Peter Zijlstra
@ 2015-10-07 14:18               ` Paul E. McKenney
  0 siblings, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2015-10-07 14:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Wed, Oct 07, 2015 at 09:20:38AM +0200, Peter Zijlstra wrote:
> On Tue, Oct 06, 2015 at 02:03:48PM -0700, Paul E. McKenney wrote:
> > On Tue, Oct 06, 2015 at 10:32:24PM +0200, Peter Zijlstra wrote:
> > > On Tue, Oct 06, 2015 at 01:19:15PM -0700, Paul E. McKenney wrote:
> > > > On Tue, Oct 06, 2015 at 10:07:25PM +0200, Peter Zijlstra wrote:
> > > > > On Tue, Oct 06, 2015 at 09:13:42AM -0700, Paul E. McKenney wrote:
> > > > > > Currently, __srcu_read_lock() cannot be invoked from restricted
> > > > > > environments because it contains calls to preempt_disable() and
> > > > > > preempt_enable(), both of which can invoke lockdep, which is a bad
> > > > > > idea in some restricted execution modes.  This commit therefore moves
> > > > > > the preempt_disable() and preempt_enable() from __srcu_read_lock()
> > > > > > to srcu_read_lock().  It also inserts the preempt_disable() and
> > > > > > preempt_enable() around the call to __srcu_read_lock() in do_exit().
> > > > > 
> > > > > Did you not simply want to use: preempt_disable_notrace() ?
> > > > 
> > > > I believe that tracing the preempt_disable() in srcu_read_lock() and
> > > > srcu_read_unlock() is actually a good thing.  Or am I missing your
> > > > point?
> > > 
> > > Depends a bit on why we needed this change in the first place -- which,
> > > going by the other branch of this thread, seems lost. However,
> > > preempt_{dis,en}able_notrace() will not end up in any tracer/lockdep and
> > > generate the minimum code that preserves the required semantics.
> > 
> > True enough!  But can all architectures locate the TIF in all contexts?
> 
> They had better, otherwise we have a problem with NMIs touching it :-)

Fair enough!  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH tip/core/rcu 10/13] rcu: Add rcu_pointer_handoff()
  2015-10-07  7:22         ` Peter Zijlstra
@ 2015-10-07 14:20           ` Paul E. McKenney
  0 siblings, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2015-10-07 14:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Wed, Oct 07, 2015 at 09:22:27AM +0200, Peter Zijlstra wrote:
> On Tue, Oct 06, 2015 at 02:02:43PM -0700, Paul E. McKenney wrote:
> > On Tue, Oct 06, 2015 at 10:27:41PM +0200, Peter Zijlstra wrote:
> > > On Tue, Oct 06, 2015 at 09:13:45AM -0700, Paul E. McKenney wrote:
> > > >  /**
> > > > + * rcu_pointer_handoff() - Hand off a pointer from RCU to other mechanism
> > > > + * @p: The pointer to hand off
> > > > + *
> > > > + * This is simply an identity function, but it documents where a pointer
> > > > + * is handed off from RCU to some other synchronization mechanism, for
> > > > + * example, reference counting or locking.  In C11, it would map to
> > > > + * kill_dependency().  It could be used as follows:
> > > > + *
> > > > + *	rcu_read_lock();
> > > > + *	p = rcu_dereference(gp);
> > > > + *	long_lived = is_long_lived(p);
> > > > + *	if (long_lived) {
> > > > + *		if (!atomic_inc_not_zero(p->refcnt))
> > > > + *			long_lived = false;
> > > > + *		else
> > > > + *			p = rcu_pointer_handoff(p);
> > > > + *	}
> > > > + *	rcu_read_unlock();
> > > > + */
> > > > +#define rcu_pointer_handoff(p) (p)
> > > 
> > > Will you actually be using this? It seems a tad pointless to add if you
> > > don't.
> > 
> > Some of the LLVM guys believe that they can diagnose RCU pointer leaks
> > if this is used.  But yes, it does need to be used.
> 
> The thing is, I'm not convinced this is a 'sane' interface. It's _far_
> too easy to forget. It doesn't make any kind of sense either, which is
> part of why it's hard to remember.

Indeed, the only thing that would make it easy to remember is if there are
tools that check for pointer leaks from RCU read-side critical sections.
But without this interface, such tools are insanely difficult to create.
So there is a chicken-and-egg problem here, which I am attempting to
deal with by providing an egg.  Hopefully not laying an egg, but time
will tell.  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH tip/core/rcu 11/13] rculist: Make list_entry_rcu() use lockless_dereference()
  2015-10-06 16:13   ` [PATCH tip/core/rcu 11/13] rculist: Make list_entry_rcu() use lockless_dereference() Paul E. McKenney
@ 2015-10-26  8:45     ` Ingo Molnar
  2015-10-26 14:55       ` Paul E. McKenney
  0 siblings, 1 reply; 57+ messages in thread
From: Ingo Molnar @ 2015-10-26  8:45 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, jiangshanlai, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, peterz, rostedt, dhowells, edumazet, dvhart,
	fweisbec, oleg, bobby.prani, Patrick Marlier


* Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:

> From: Patrick Marlier <patrick.marlier@gmail.com>
> 
> The current list_entry_rcu() implementation copies the pointer to a stack
> variable, then invokes rcu_dereference_raw() on it.  This results in an
> additional store-load pair.  Now, most compilers will emit normal store
> and load instructions, which might seem to be of negligible overhead,
> but this results in a load-hit-store situation that can cause surprisingly
> long pipeline stalls, even on modern microprocessors.  The problem is
> that it takes time for the store to get the store buffer updated, which
> can delay the subsequent load, which immediately follows.
> 
> This commit therefore switches to the lockless_dereference() primitive,
> which does not expect the __rcu annotations (that are anyway not present
> in the list_head structure) and which, like rcu_dereference_raw(),
> does not check for an enclosing RCU read-side critical section.
> Most importantly, it does not copy the pointer, thus avoiding the
> load-hit-store overhead.
> 
> Signed-off-by: Patrick Marlier <patrick.marlier@gmail.com>
> [ paulmck: Switched to lockless_dereference() to suppress sparse warnings. ]
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> ---
>  include/linux/rculist.h | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/include/linux/rculist.h b/include/linux/rculist.h
> index 17c6b1f84a77..5ed540986019 100644
> --- a/include/linux/rculist.h
> +++ b/include/linux/rculist.h
> @@ -247,10 +247,7 @@ static inline void list_splice_init_rcu(struct list_head *list,
>   * primitives such as list_add_rcu() as long as it's guarded by rcu_read_lock().
>   */
>  #define list_entry_rcu(ptr, type, member) \
> -({ \
> -	typeof(*ptr) __rcu *__ptr = (typeof(*ptr) __rcu __force *)ptr; \
> -	container_of((typeof(ptr))rcu_dereference_raw(__ptr), type, member); \
> -})
> +	container_of(lockless_dereference(ptr), type, member)

So this commit:

  8db70b132dd5 ("rculist: Make list_entry_rcu() use lockless_dereference()")

when merged with Linus's latest tree, triggers the following build failure on 
allyesconfig/allmodconfig x86:

triton:~/tip> make fs/fs-writeback.o
  CHK     include/config/kernel.release
  CHK     include/generated/uapi/linux/version.h
  CHK     include/generated/utsrelease.h
  CHK     include/generated/bounds.h
  CHK     include/generated/timeconst.h
  CHK     include/generated/asm-offsets.h
  CALL    scripts/checksyscalls.sh
  CC      fs/fs-writeback.o
In file included from fs/fs-writeback.c:16:0:
fs/fs-writeback.c: In function ‘bdi_split_work_to_wbs’:
include/linux/compiler.h:281:20: error: lvalue required as unary ‘&’ operand
   __read_once_size(&(x), __u.__c, sizeof(x));  \
                    ^
include/linux/kernel.h:811:49: note: in definition of macro ‘container_of’
  const typeof( ((type *)0)->member ) *__mptr = (ptr); \
                                                 ^
include/linux/compiler.h:286:22: note: in expansion of macro ‘__READ_ONCE’
 #define READ_ONCE(x) __READ_ONCE(x, 1)
                      ^
include/linux/compiler.h:533:26: note: in expansion of macro ‘READ_ONCE’
  typeof(p) _________p1 = READ_ONCE(p); \
                          ^
include/linux/rculist.h:250:15: note: in expansion of macro ‘lockless_dereference’
  container_of(lockless_dereference(ptr), type, member)
               ^
fs/fs-writeback.c:782:29: note: in expansion of macro ‘list_entry_rcu’
  struct bdi_writeback *wb = list_entry_rcu(&bdi->wb_list,
                             ^
include/linux/compiler.h:283:28: error: lvalue required as unary ‘&’ operand
   __read_once_size_nocheck(&(x), __u.__c, sizeof(x)); \
                            ^
include/linux/kernel.h:811:49: note: in definition of macro ‘container_of’
  const typeof( ((type *)0)->member ) *__mptr = (ptr); \
                                                 ^
include/linux/compiler.h:286:22: note: in expansion of macro ‘__READ_ONCE’
 #define READ_ONCE(x) __READ_ONCE(x, 1)
                      ^
include/linux/compiler.h:533:26: note: in expansion of macro ‘READ_ONCE’
  typeof(p) _________p1 = READ_ONCE(p); \
                          ^
include/linux/rculist.h:250:15: note: in expansion of macro ‘lockless_dereference’
  container_of(lockless_dereference(ptr), type, member)
               ^
fs/fs-writeback.c:782:29: note: in expansion of macro ‘list_entry_rcu’
  struct bdi_writeback *wb = list_entry_rcu(&bdi->wb_list,
                             ^
scripts/Makefile.build:258: recipe for target 'fs/fs-writeback.o' failed
make[1]: *** [fs/fs-writeback.o] Error 1
Makefile:1526: recipe for target 'fs/fs-writeback.o' failed
make: *** [fs/fs-writeback.o] Error 2

It's this new usage in fs/fs-writeback.c:

static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
                                  struct wb_writeback_work *base_work,
                                  bool skip_if_busy)
{
        struct bdi_writeback *last_wb = NULL;
        struct bdi_writeback *wb = list_entry_rcu(&bdi->wb_list,
                                                struct bdi_writeback, bdi_node);

so AFAICS the problem is lockless_dereference() not being able to take this valid 
but non-lvalue list head, because READ_ONCE() does not take non-lvalues. All other 
existing uses of list_entry_rcu() happen to use lvalues, so this bug didn't get 
triggered before.

For now I've reverted this commit locally to make the kernel build, but I
suspect we need a cleaner solution.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH tip/core/rcu 11/13] rculist: Make list_entry_rcu() use lockless_dereference()
  2015-10-26  8:45     ` Ingo Molnar
@ 2015-10-26 14:55       ` Paul E. McKenney
  2015-10-26 18:02         ` Ingo Molnar
  2015-10-27  3:37         ` Linus Torvalds
  0 siblings, 2 replies; 57+ messages in thread
From: Paul E. McKenney @ 2015-10-26 14:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, jiangshanlai, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, peterz, rostedt, dhowells, edumazet, dvhart,
	fweisbec, oleg, bobby.prani, Patrick Marlier

On Mon, Oct 26, 2015 at 09:45:06AM +0100, Ingo Molnar wrote:
> 
> * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> 
> > From: Patrick Marlier <patrick.marlier@gmail.com>
> > 
> > The current list_entry_rcu() implementation copies the pointer to a stack
> > variable, then invokes rcu_dereference_raw() on it.  This results in an
> > additional store-load pair.  Now, most compilers will emit normal store
> > and load instructions, which might seem to be of negligible overhead,
> > but this results in a load-hit-store situation that can cause surprisingly
> > long pipeline stalls, even on modern microprocessors.  The problem is
> > that it takes time for the store to get the store buffer updated, which
> > can delay the subsequent load, which immediately follows.
> > 
> > This commit therefore switches to the lockless_dereference() primitive,
> > which does not expect the __rcu annotations (that are anyway not present
> > in the list_head structure) and which, like rcu_dereference_raw(),
> > does not check for an enclosing RCU read-side critical section.
> > Most importantly, it does not copy the pointer, thus avoiding the
> > load-hit-store overhead.
> > 
> > Signed-off-by: Patrick Marlier <patrick.marlier@gmail.com>
> > [ paulmck: Switched to lockless_dereference() to suppress sparse warnings. ]
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > ---
> >  include/linux/rculist.h | 5 +----
> >  1 file changed, 1 insertion(+), 4 deletions(-)
> > 
> > diff --git a/include/linux/rculist.h b/include/linux/rculist.h
> > index 17c6b1f84a77..5ed540986019 100644
> > --- a/include/linux/rculist.h
> > +++ b/include/linux/rculist.h
> > @@ -247,10 +247,7 @@ static inline void list_splice_init_rcu(struct list_head *list,
> >   * primitives such as list_add_rcu() as long as it's guarded by rcu_read_lock().
> >   */
> >  #define list_entry_rcu(ptr, type, member) \
> > -({ \
> > -	typeof(*ptr) __rcu *__ptr = (typeof(*ptr) __rcu __force *)ptr; \
> > -	container_of((typeof(ptr))rcu_dereference_raw(__ptr), type, member); \
> > -})
> > +	container_of(lockless_dereference(ptr), type, member)
> 
> So this commit:
> 
>   8db70b132dd5 ("rculist: Make list_entry_rcu() use lockless_dereference()")
> 
> when merged with Linus's latest tree, triggers the following build failure on 
> allyesconfig/allmodconfig x86:
> 
> triton:~/tip> make fs/fs-writeback.o
>   CHK     include/config/kernel.release
>   CHK     include/generated/uapi/linux/version.h
>   CHK     include/generated/utsrelease.h
>   CHK     include/generated/bounds.h
>   CHK     include/generated/timeconst.h
>   CHK     include/generated/asm-offsets.h
>   CALL    scripts/checksyscalls.sh
>   CC      fs/fs-writeback.o
> In file included from fs/fs-writeback.c:16:0:
> fs/fs-writeback.c: In function ‘bdi_split_work_to_wbs’:
> include/linux/compiler.h:281:20: error: lvalue required as unary ‘&’ operand
>    __read_once_size(&(x), __u.__c, sizeof(x));  \
>                     ^
> include/linux/kernel.h:811:49: note: in definition of macro ‘container_of’
>   const typeof( ((type *)0)->member ) *__mptr = (ptr); \
>                                                  ^
> include/linux/compiler.h:286:22: note: in expansion of macro ‘__READ_ONCE’
>  #define READ_ONCE(x) __READ_ONCE(x, 1)
>                       ^
> include/linux/compiler.h:533:26: note: in expansion of macro ‘READ_ONCE’
>   typeof(p) _________p1 = READ_ONCE(p); \
>                           ^
> include/linux/rculist.h:250:15: note: in expansion of macro ‘lockless_dereference’
>   container_of(lockless_dereference(ptr), type, member)
>                ^
> fs/fs-writeback.c:782:29: note: in expansion of macro ‘list_entry_rcu’
>   struct bdi_writeback *wb = list_entry_rcu(&bdi->wb_list,
>                              ^
> include/linux/compiler.h:283:28: error: lvalue required as unary ‘&’ operand
>    __read_once_size_nocheck(&(x), __u.__c, sizeof(x)); \
>                             ^
> include/linux/kernel.h:811:49: note: in definition of macro ‘container_of’
>   const typeof( ((type *)0)->member ) *__mptr = (ptr); \
>                                                  ^
> include/linux/compiler.h:286:22: note: in expansion of macro ‘__READ_ONCE’
>  #define READ_ONCE(x) __READ_ONCE(x, 1)
>                       ^
> include/linux/compiler.h:533:26: note: in expansion of macro ‘READ_ONCE’
>   typeof(p) _________p1 = READ_ONCE(p); \
>                           ^
> include/linux/rculist.h:250:15: note: in expansion of macro ‘lockless_dereference’
>   container_of(lockless_dereference(ptr), type, member)
>                ^
> fs/fs-writeback.c:782:29: note: in expansion of macro ‘list_entry_rcu’
>   struct bdi_writeback *wb = list_entry_rcu(&bdi->wb_list,
>                              ^
> scripts/Makefile.build:258: recipe for target 'fs/fs-writeback.o' failed
> make[1]: *** [fs/fs-writeback.o] Error 1
> Makefile:1526: recipe for target 'fs/fs-writeback.o' failed
> make: *** [fs/fs-writeback.o] Error 2
> 
> It's this new usage in fs/fs-writeback.c:
> 
> static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
>                                   struct wb_writeback_work *base_work,
>                                   bool skip_if_busy)
> {
>         struct bdi_writeback *last_wb = NULL;
>         struct bdi_writeback *wb = list_entry_rcu(&bdi->wb_list,

I believe that the above should instead be:

	struct bdi_writeback *wb = list_entry_rcu(bdi->wb_list.next,

After all, RCU read-side list primitives need to fetch pointers in order
to traverse those pointers in an RCU-safe manner.  The patch below clears
this up for me, does it also work for you?

							Thanx, Paul

------------------------------------------------------------------------

commit 9221c4e78cc3511cd15b1433617ae2548ad8f631
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Mon Oct 26 07:47:20 2015 -0700

    fs: Fix argument to list_entry_rcu()
    
    The list_entry_rcu() macro requires that its first argument be the
    pointer to the list_head contained in the structure whose pointer is to
    be returned, but fetched in an RCU-safe manner.  This commit therefore
    fixes the use in bdi_split_work_to_wbs() to follow this convention.
    
    Reported-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 29e4599f6fc1..126d4de8faad 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -779,7 +779,7 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
 				  bool skip_if_busy)
 {
 	struct bdi_writeback *last_wb = NULL;
-	struct bdi_writeback *wb = list_entry_rcu(&bdi->wb_list,
+	struct bdi_writeback *wb = list_entry_rcu(bdi->wb_list.next,
 						struct bdi_writeback, bdi_node);
 
 	might_sleep();


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH tip/core/rcu 11/13] rculist: Make list_entry_rcu() use lockless_dereference()
  2015-10-26 14:55       ` Paul E. McKenney
@ 2015-10-26 18:02         ` Ingo Molnar
  2015-10-27  3:37         ` Linus Torvalds
  1 sibling, 0 replies; 57+ messages in thread
From: Ingo Molnar @ 2015-10-26 18:02 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, jiangshanlai, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, peterz, rostedt, dhowells, edumazet, dvhart,
	fweisbec, oleg, bobby.prani, Patrick Marlier, Linus Torvalds


* Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:

> > It's this new usage in fs/fs-writeback.c:
> > 
> > static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
> >                                   struct wb_writeback_work *base_work,
> >                                   bool skip_if_busy)
> > {
> >         struct bdi_writeback *last_wb = NULL;
> >         struct bdi_writeback *wb = list_entry_rcu(&bdi->wb_list,
> 
> I believe that the above should instead be:
> 
> 	struct bdi_writeback *wb = list_entry_rcu(bdi->wb_list.next,
> 
> After all, RCU read-side list primitives need to fetch pointers in order to 
> traverse those pointers in an RCU-safe manner.  The patch below clears this up 
> for me, does it also work for you?

Are you sure about that?

I considered this solution too, but the code goes like this:

static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
                                  struct wb_writeback_work *base_work,
                                  bool skip_if_busy)
{
        struct bdi_writeback *last_wb = NULL;
        struct bdi_writeback *wb = list_entry_rcu(&bdi->wb_list,
                                                struct bdi_writeback, bdi_node);

        might_sleep();
restart:
        rcu_read_lock();
        list_for_each_entry_continue_rcu(wb, &bdi->wb_list, bdi_node) {

and list_for_each_entry_continue_rcu() will start the iteration with the next
entry. So if you initialize the cursor with .next, then we'll start with
.next->next, i.e. we skip the first entry.

That seems to change behavior and break the logic.

Another solution I considered is to use bdi->wb_list.next->prev, but that,
beyond being ugly, causes actual extra runtime overhead - for something that
seems academic.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH tip/core/rcu 11/13] rculist: Make list_entry_rcu() use lockless_dereference()
  2015-10-26 14:55       ` Paul E. McKenney
  2015-10-26 18:02         ` Ingo Molnar
@ 2015-10-27  3:37         ` Linus Torvalds
  2015-10-27  5:19           ` Tejun Heo
  2015-10-27  5:32           ` [PATCH tip/core/rcu 11/13] rculist: Make list_entry_rcu() use lockless_dereference() Paul E. McKenney
  1 sibling, 2 replies; 57+ messages in thread
From: Linus Torvalds @ 2015-10-27  3:37 UTC (permalink / raw)
  To: Paul McKenney, Tejun Heo
  Cc: Ingo Molnar, Linux Kernel Mailing List, Lai Jiangshan,
	Dipankar Sarma, Andrew Morton, Mathieu Desnoyers, Josh Triplett,
	Thomas Gleixner, Peter Zijlstra, Steven Rostedt, David Howells,
	Eric Dumazet, Darren Hart, Frédéric Weisbecker,
	Oleg Nesterov, pranith kumar, Patrick Marlier

On Mon, Oct 26, 2015 at 11:55 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
>>         struct bdi_writeback *last_wb = NULL;
>>         struct bdi_writeback *wb = list_entry_rcu(&bdi->wb_list,
>
> I believe that the above should instead be:
>
>         struct bdi_writeback *wb = list_entry_rcu(bdi->wb_list.next,

I don't think you can do that.

You haven't even taken the RCU read lock yet at this point.

What the code seems to try to do is to get the "head pointer" of the
list before taking the read lock (since _that_ is stable), and then
follow the list under the lock.

You're making it actually follow the first RCU pointer too early.

That said, I'm not sure why it doesn't just do the normal

    rcu_read_lock();
    list_for_each_entry_rcu(wb, &bdi->wb_list, bdi_node) {
        ....
    }
    rcu_read_unlock();

like the other places do. It looks like it wants that
"list_for_each_entry_continue_rcu()" because it does that odd "pin
entry and drop rcu lock and retake it and continue where you left
off", but I'm not sure why the continue version would be so
different.. It's going to do that "follow next entry" regardless, and
the "goto restart" doesn't look like it actually adds anything. If
following the next pointer is ok even after having released the RCU
read lock, then I'm not seeing why the end of the loop couldn't just
do

                rcu_read_unlock();
                wb_wait_for_completion(bdi, &fallback_work_done);
                rcu_read_lock();

and just continue the loop (and the pinning of "wb" and releasing the
"last_wb" thing in the *next* iteration should make it all work the
same).

Adding Tejun to the cc, because this is his code and there's probably
something subtle I'm missing. Tejun, can you take a look? It's
bdi_split_work_to_wbs() in fs/fs-writeback.c.

                Linus

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH tip/core/rcu 11/13] rculist: Make list_entry_rcu() use lockless_dereference()
  2015-10-27  3:37         ` Linus Torvalds
@ 2015-10-27  5:19           ` Tejun Heo
  2015-10-27  5:33             ` Paul E. McKenney
                               ` (2 more replies)
  2015-10-27  5:32           ` [PATCH tip/core/rcu 11/13] rculist: Make list_entry_rcu() use lockless_dereference() Paul E. McKenney
  1 sibling, 3 replies; 57+ messages in thread
From: Tejun Heo @ 2015-10-27  5:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Paul McKenney, Ingo Molnar, Linux Kernel Mailing List,
	Lai Jiangshan, Dipankar Sarma, Andrew Morton, Mathieu Desnoyers,
	Josh Triplett, Thomas Gleixner, Peter Zijlstra, Steven Rostedt,
	David Howells, Eric Dumazet, Darren Hart,
	Frédéric Weisbecker, Oleg Nesterov, pranith kumar,
	Patrick Marlier

Hello,

On Tue, Oct 27, 2015 at 12:37:16PM +0900, Linus Torvalds wrote:
> > I believe that the above should instead be:
> >
> >         struct bdi_writeback *wb = list_entry_rcu(bdi->wb_list.next,

I should have just used list_entry() here.  It's just offsetting the
pointer to set up the initial iteration point.

...
> That said, I'm not sure why it doesn't just do the normal
> 
>     rcu_read_lock();
>     list_for_each_entry_rcu(wb, &bdi->wb_list, bdi_node) {
>         ....
>     }
>     rcu_read_unlock();
> 
> like the other places do. It looks like it wants that
> "list_for_each_entry_continue_rcu()" because it does that odd "pin
> entry and drop rcu lock and retake it and continue where you left
> off", but I'm not sure why the continue version would be so
> different.. It's going to do that "follow next entry" regardless, and
> the "goto restart" doesn't look like it actually adds anything. If
> following the next pointer is ok even after having released the RCU
> read lock, then I'm not seeing why the end of the loop couldn't just
> do
> 
>                 rcu_read_unlock();
>                 wb_wait_for_completion(bdi, &fallback_work_done);
>                 rcu_read_lock();
> 
> and just continue the loop (and the pinning of "wb" and releasing the
> "last_wb" thing in the *next* iteration should make it all work the
> same).
> 
> Adding Tejun to the cc, because this is his code and there's probably
> something subtle I'm missing. Tejun, can you take a look? It's
> bdi_split_work_to_wbs() in fs/fs-writeback.c.

Yeah, just releasing and regrabbing should work too as the iterator
doesn't depend on anything other than the current entry (e.g. as
opposed to imaginary list_for_each_entry_safe_rcu()).  It's slightly
icky to meddle with locking behind the iterator's back, though.  Either
way should be fine but how about something like the following?

Subject: writeback: don't use list_entry_rcu() for pointer offsetting in bdi_split_work_to_wbs()

bdi_split_work_to_wbs() uses list_for_each_entry_rcu_continue() to
walk @bdi->wb_list.  To set up the initial iteration condition, it
uses list_entry_rcu() to calculate the entry pointer corresponding to
the list head; however, this isn't an actual RCU dereference and using
list_entry_rcu() for it ended up breaking a proposed list_entry_rcu()
change because it was feeding an non-lvalue pointer into the macro.

Don't use the RCU variant for simple pointer offsetting.  Use
list_entry() instead.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 fs/fs-writeback.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 29e4599..7378169 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -779,8 +779,8 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
 				  bool skip_if_busy)
 {
 	struct bdi_writeback *last_wb = NULL;
-	struct bdi_writeback *wb = list_entry_rcu(&bdi->wb_list,
-						struct bdi_writeback, bdi_node);
+	struct bdi_writeback *wb = list_entry(&bdi->wb_list,
+					      struct bdi_writeback, bdi_node);
 
 	might_sleep();
 restart:

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH tip/core/rcu 11/13] rculist: Make list_entry_rcu() use lockless_dereference()
  2015-10-27  3:37         ` Linus Torvalds
  2015-10-27  5:19           ` Tejun Heo
@ 2015-10-27  5:32           ` Paul E. McKenney
  1 sibling, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2015-10-27  5:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tejun Heo, Ingo Molnar, Linux Kernel Mailing List, Lai Jiangshan,
	Dipankar Sarma, Andrew Morton, Mathieu Desnoyers, Josh Triplett,
	Thomas Gleixner, Peter Zijlstra, Steven Rostedt, David Howells,
	Eric Dumazet, Darren Hart, Frédéric Weisbecker,
	Oleg Nesterov, pranith kumar, Patrick Marlier

On Tue, Oct 27, 2015 at 12:37:16PM +0900, Linus Torvalds wrote:
> On Mon, Oct 26, 2015 at 11:55 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> >>         struct bdi_writeback *last_wb = NULL;
> >>         struct bdi_writeback *wb = list_entry_rcu(&bdi->wb_list,
> >
> > I believe that the above should instead be:
> >
> >         struct bdi_writeback *wb = list_entry_rcu(bdi->wb_list.next,
> 
> I don't think you can do that.
> 
> You haven't even taken the RCU read lock yet at this point.
> 
> What the code seems to try to do is to get the "head pointer" of the
> list before taking the read lock (since _that_ is stable), and then
> follow the list under the lock.
> 
> You're making it actually follow the first RCU pointer too early.

Good point, color me dazed and confused.  :-/

							Thanx, Paul

> That said, I'm not sure why it doesn't just do the normal
> 
>     rcu_read_lock();
>     list_for_each_entry_rcu(wb, &bdi->wb_list, bdi_node) {
>         ....
>     }
>     rcu_read_unlock();
> 
> like the other places do. It looks like it wants that
> "list_for_each_entry_continue_rcu()" because it does that odd "pin
> entry and drop rcu lock and retake it and continue where you left
> off", but I'm not sure why the continue version would be so
> different. It's going to do that "follow next entry" regardless, and
> the "goto restart" doesn't look like it actually adds anything. If
> following the next pointer is ok even after having released the RCU
> read lock, then I'm not seeing why the end of the loop couldn't just
> do
> 
>                 rcu_read_unlock();
>                 wb_wait_for_completion(bdi, &fallback_work_done);
>                 rcu_read_lock();
> 
> and just continue the loop (and the pinning of "wb" and releasing the
> "last_wb" thing in the *next* iteration should make it all work the
> same).
> 
> Adding Tejun to the cc, because this is his code and there's probably
> something subtle I'm missing. Tejun, can you take a look? It's
> bdi_split_work_to_wbs() in fs/fs-writeback.c.
> 
>                 Linus
> 



* Re: [PATCH tip/core/rcu 11/13] rculist: Make list_entry_rcu() use lockless_dereference()
  2015-10-27  5:19           ` Tejun Heo
@ 2015-10-27  5:33             ` Paul E. McKenney
  2015-10-28  8:33             ` Ingo Molnar
  2015-10-28 20:58             ` [tip:core/rcu] fs/writeback, rcu: Don't use list_entry_rcu() for pointer offsetting in bdi_split_work_to_wbs() tip-bot for Tejun Heo
  2 siblings, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2015-10-27  5:33 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Linus Torvalds, Ingo Molnar, Linux Kernel Mailing List,
	Lai Jiangshan, Dipankar Sarma, Andrew Morton, Mathieu Desnoyers,
	Josh Triplett, Thomas Gleixner, Peter Zijlstra, Steven Rostedt,
	David Howells, Eric Dumazet, Darren Hart,
	Frédéric Weisbecker, Oleg Nesterov, pranith kumar,
	Patrick Marlier

On Tue, Oct 27, 2015 at 02:19:39PM +0900, Tejun Heo wrote:
> Hello,
> 
> On Tue, Oct 27, 2015 at 12:37:16PM +0900, Linus Torvalds wrote:
> > > I believe that the above should instead be:
> > >
> > >         struct bdi_writeback *wb = list_entry_rcu(bdi->wb_list.next,
> 
> I should have just used list_entry() here.  It's just offsetting the
> pointer to set up the initial iteration point.

OK, that sounds much better!

> ...
> > That said, I'm not sure why it doesn't just do the normal
> > 
> >     rcu_read_lock();
> >     list_for_each_entry_rcu(wb, &bdi->wb_list, bdi_node) {
> >         ....
> >     }
> >     rcu_read_unlock();
> > 
> > like the other places do. It looks like it wants that
> > "list_for_each_entry_continue_rcu()" because it does that odd "pin
> > entry and drop rcu lock and retake it and continue where you left
> > off", but I'm not sure why the continue version would be so
> > different. It's going to do that "follow next entry" regardless, and
> > the "goto restart" doesn't look like it actually adds anything. If
> > following the next pointer is ok even after having released the RCU
> > read lock, then I'm not seeing why the end of the loop couldn't just
> > do
> > 
> >                 rcu_read_unlock();
> >                 wb_wait_for_completion(bdi, &fallback_work_done);
> >                 rcu_read_lock();
> > 
> > and just continue the loop (and the pinning of "wb" and releasing the
> > "last_wb" thing in the *next* iteration should make it all work the
> > same).
> > 
> > Adding Tejun to the cc, because this is his code and there's probably
> > something subtle I'm missing. Tejun, can you take a look? It's
> > bdi_split_work_to_wbs() in fs/fs-writeback.c.
> 
> Yeah, just releasing and regrabbing should work too as the iterator
> doesn't depend on anything other than the current entry (e.g. as
> opposed to imaginary list_for_each_entry_safe_rcu()).  It's slightly
> icky to meddle with locking behind the iterator's back tho.  Either
> way should be fine but how about something like the following?
> 
> Subject: writeback: don't use list_entry_rcu() for pointer offsetting in bdi_split_work_to_wbs()
> 
> bdi_split_work_to_wbs() uses list_for_each_entry_rcu_continue() to
> walk @bdi->wb_list.  To set up the initial iteration condition, it
> uses list_entry_rcu() to calculate the entry pointer corresponding to
> the list head; however, this isn't an actual RCU dereference and using
> list_entry_rcu() for it ended up breaking a proposed list_entry_rcu()
> change because it was feeding a non-lvalue pointer into the macro.
> 
> Don't use the RCU variant for simple pointer offsetting.  Use
> list_entry() instead.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> ---
>  fs/fs-writeback.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 29e4599..7378169 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -779,8 +779,8 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
>  				  bool skip_if_busy)
>  {
>  	struct bdi_writeback *last_wb = NULL;
> -	struct bdi_writeback *wb = list_entry_rcu(&bdi->wb_list,
> -						struct bdi_writeback, bdi_node);
> +	struct bdi_writeback *wb = list_entry(&bdi->wb_list,
> +					      struct bdi_writeback, bdi_node);
> 
>  	might_sleep();
>  restart:
> 



* Re: [PATCH tip/core/rcu 11/13] rculist: Make list_entry_rcu() use lockless_dereference()
  2015-10-27  5:19           ` Tejun Heo
  2015-10-27  5:33             ` Paul E. McKenney
@ 2015-10-28  8:33             ` Ingo Molnar
  2015-10-28 20:35               ` Patrick Marlier
  2015-10-28 20:58             ` [tip:core/rcu] fs/writeback, rcu: Don't use list_entry_rcu() for pointer offsetting in bdi_split_work_to_wbs() tip-bot for Tejun Heo
  2 siblings, 1 reply; 57+ messages in thread
From: Ingo Molnar @ 2015-10-28  8:33 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Linus Torvalds, Paul McKenney, Linux Kernel Mailing List,
	Lai Jiangshan, Dipankar Sarma, Andrew Morton, Mathieu Desnoyers,
	Josh Triplett, Thomas Gleixner, Peter Zijlstra, Steven Rostedt,
	David Howells, Eric Dumazet, Darren Hart,
	Frédéric Weisbecker, Oleg Nesterov, pranith kumar,
	Patrick Marlier


* Tejun Heo <tj@kernel.org> wrote:

> Subject: writeback: don't use list_entry_rcu() for pointer offsetting in bdi_split_work_to_wbs()
> 
> bdi_split_work_to_wbs() uses list_for_each_entry_rcu_continue() to
> walk @bdi->wb_list.  To set up the initial iteration condition, it
> uses list_entry_rcu() to calculate the entry pointer corresponding to
> the list head; however, this isn't an actual RCU dereference and using
> list_entry_rcu() for it ended up breaking a proposed list_entry_rcu()
> change because it was feeding a non-lvalue pointer into the macro.
> 
> Don't use the RCU variant for simple pointer offsetting.  Use
> list_entry() instead.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
>  fs/fs-writeback.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 29e4599..7378169 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -779,8 +779,8 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
>  				  bool skip_if_busy)
>  {
>  	struct bdi_writeback *last_wb = NULL;
> -	struct bdi_writeback *wb = list_entry_rcu(&bdi->wb_list,
> -						struct bdi_writeback, bdi_node);
> +	struct bdi_writeback *wb = list_entry(&bdi->wb_list,
> +					      struct bdi_writeback, bdi_node);
>  
>  	might_sleep();

Any objections against me applying this fix to tip:core/rcu so that I can push the 
recent RCU changes towards linux-next without triggering a build failure?

Thanks,

	Ingo


* Re: [PATCH tip/core/rcu 11/13] rculist: Make list_entry_rcu() use lockless_dereference()
  2015-10-28  8:33             ` Ingo Molnar
@ 2015-10-28 20:35               ` Patrick Marlier
  2015-10-29  0:00                 ` Paul E. McKenney
  0 siblings, 1 reply; 57+ messages in thread
From: Patrick Marlier @ 2015-10-28 20:35 UTC (permalink / raw)
  To: Ingo Molnar, Tejun Heo
  Cc: Linus Torvalds, Paul McKenney, Linux Kernel Mailing List,
	Lai Jiangshan, Dipankar Sarma, Andrew Morton, Mathieu Desnoyers,
	Josh Triplett, Thomas Gleixner, Peter Zijlstra, Steven Rostedt,
	David Howells, Eric Dumazet, Darren Hart,
	Frédéric Weisbecker, Oleg Nesterov, pranith kumar



On 10/28/2015 09:33 AM, Ingo Molnar wrote:
>
> * Tejun Heo <tj@kernel.org> wrote:
>
>> Subject: writeback: don't use list_entry_rcu() for pointer offsetting in bdi_split_work_to_wbs()
>>
>> bdi_split_work_to_wbs() uses list_for_each_entry_rcu_continue() to
>> walk @bdi->wb_list.  To set up the initial iteration condition, it
>> uses list_entry_rcu() to calculate the entry pointer corresponding to
>> the list head; however, this isn't an actual RCU dereference and using
>> list_entry_rcu() for it ended up breaking a proposed list_entry_rcu()
>> change because it was feeding a non-lvalue pointer into the macro.
>>
>> Don't use the RCU variant for simple pointer offsetting.  Use
>> list_entry() instead.
>>
>> Signed-off-by: Tejun Heo <tj@kernel.org>
>> ---
>>   fs/fs-writeback.c |    4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
>> index 29e4599..7378169 100644
>> --- a/fs/fs-writeback.c
>> +++ b/fs/fs-writeback.c
>> @@ -779,8 +779,8 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
>>   				  bool skip_if_busy)
>>   {
>>   	struct bdi_writeback *last_wb = NULL;
>> -	struct bdi_writeback *wb = list_entry_rcu(&bdi->wb_list,
>> -						struct bdi_writeback, bdi_node);
>> +	struct bdi_writeback *wb = list_entry(&bdi->wb_list,
>> +					      struct bdi_writeback, bdi_node);
>>
>>   	might_sleep();
>
> Any objections against me applying this fix to tip:core/rcu so that I can push the
> recent RCU changes towards linux-next without triggering a build failure?

No objection on my side but probably you are waiting for an ack from 
somebody else.
--
Pat


* [tip:core/rcu] fs/writeback, rcu: Don't use list_entry_rcu() for pointer offsetting in bdi_split_work_to_wbs()
  2015-10-27  5:19           ` Tejun Heo
  2015-10-27  5:33             ` Paul E. McKenney
  2015-10-28  8:33             ` Ingo Molnar
@ 2015-10-28 20:58             ` tip-bot for Tejun Heo
  2 siblings, 0 replies; 57+ messages in thread
From: tip-bot for Tejun Heo @ 2015-10-28 20:58 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: edumazet, dipankar, torvalds, jiangshanlai, bobby.prani, rostedt,
	peterz, linux-kernel, mingo, mathieu.desnoyers, dhowells, hpa,
	dvhart, tj, paulmck, tglx, josh, fweisbec, oleg, patrick.marlier

Commit-ID:  b33e18f61bd18227a456016a77b1a968f5bc1d65
Gitweb:     http://git.kernel.org/tip/b33e18f61bd18227a456016a77b1a968f5bc1d65
Author:     Tejun Heo <tj@kernel.org>
AuthorDate: Tue, 27 Oct 2015 14:19:39 +0900
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 28 Oct 2015 13:17:30 +0100

fs/writeback, rcu: Don't use list_entry_rcu() for pointer offsetting in bdi_split_work_to_wbs()

bdi_split_work_to_wbs() uses list_for_each_entry_rcu_continue()
to walk @bdi->wb_list.  To set up the initial iteration
condition, it uses list_entry_rcu() to calculate the entry
pointer corresponding to the list head; however, this isn't an
actual RCU dereference and using list_entry_rcu() for it ended
up breaking a proposed list_entry_rcu() change because it was
feeding a non-lvalue pointer into the macro.

Don't use the RCU variant for simple pointer offsetting.  Use
list_entry() instead.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Patrick Marlier <patrick.marlier@gmail.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: pranith kumar <bobby.prani@gmail.com>
Link: http://lkml.kernel.org/r/20151027051939.GA19355@mtj.duckdns.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 fs/fs-writeback.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 29e4599..7378169 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -779,8 +779,8 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
 				  bool skip_if_busy)
 {
 	struct bdi_writeback *last_wb = NULL;
-	struct bdi_writeback *wb = list_entry_rcu(&bdi->wb_list,
-						struct bdi_writeback, bdi_node);
+	struct bdi_writeback *wb = list_entry(&bdi->wb_list,
+					      struct bdi_writeback, bdi_node);
 
 	might_sleep();
 restart:


* Re: [PATCH tip/core/rcu 11/13] rculist: Make list_entry_rcu() use lockless_dereference()
  2015-10-28 20:35               ` Patrick Marlier
@ 2015-10-29  0:00                 ` Paul E. McKenney
  2015-10-29  2:13                   ` Tejun Heo
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2015-10-29  0:00 UTC (permalink / raw)
  To: Patrick Marlier
  Cc: Ingo Molnar, Tejun Heo, Linus Torvalds,
	Linux Kernel Mailing List, Lai Jiangshan, Dipankar Sarma,
	Andrew Morton, Mathieu Desnoyers, Josh Triplett, Thomas Gleixner,
	Peter Zijlstra, Steven Rostedt, David Howells, Eric Dumazet,
	Darren Hart, Frédéric Weisbecker, Oleg Nesterov,
	pranith kumar

On Wed, Oct 28, 2015 at 09:35:42PM +0100, Patrick Marlier wrote:
> 
> 
> On 10/28/2015 09:33 AM, Ingo Molnar wrote:
> >
> >* Tejun Heo <tj@kernel.org> wrote:
> >
> >>Subject: writeback: don't use list_entry_rcu() for pointer offsetting in bdi_split_work_to_wbs()
> >>
> >>bdi_split_work_to_wbs() uses list_for_each_entry_rcu_continue() to
> >>walk @bdi->wb_list.  To set up the initial iteration condition, it
> >>uses list_entry_rcu() to calculate the entry pointer corresponding to
> >>the list head; however, this isn't an actual RCU dereference and using
> >>list_entry_rcu() for it ended up breaking a proposed list_entry_rcu()
> >>change because it was feeding a non-lvalue pointer into the macro.
> >>
> >>Don't use the RCU variant for simple pointer offsetting.  Use
> >>list_entry() instead.
> >>
> >>Signed-off-by: Tejun Heo <tj@kernel.org>
> >>---
> >>  fs/fs-writeback.c |    4 ++--
> >>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >>diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> >>index 29e4599..7378169 100644
> >>--- a/fs/fs-writeback.c
> >>+++ b/fs/fs-writeback.c
> >>@@ -779,8 +779,8 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
> >>  				  bool skip_if_busy)
> >>  {
> >>  	struct bdi_writeback *last_wb = NULL;
> >>-	struct bdi_writeback *wb = list_entry_rcu(&bdi->wb_list,
> >>-						struct bdi_writeback, bdi_node);
> >>+	struct bdi_writeback *wb = list_entry(&bdi->wb_list,
> >>+					      struct bdi_writeback, bdi_node);
> >>
> >>  	might_sleep();
> >
> >Any objections against me applying this fix to tip:core/rcu so that I can push the
> >recent RCU changes towards linux-next without triggering a build failure?
> 
> No objection on my side but probably you are waiting for an ack from
> somebody else.

I am guessing that he was asking Tejun, but just for the record, I am
OK with it as well:

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

							Thanx, Paul



* Re: [PATCH tip/core/rcu 11/13] rculist: Make list_entry_rcu() use lockless_dereference()
  2015-10-29  0:00                 ` Paul E. McKenney
@ 2015-10-29  2:13                   ` Tejun Heo
  0 siblings, 0 replies; 57+ messages in thread
From: Tejun Heo @ 2015-10-29  2:13 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Patrick Marlier, Ingo Molnar, Linus Torvalds,
	Linux Kernel Mailing List, Lai Jiangshan, Dipankar Sarma,
	Andrew Morton, Mathieu Desnoyers, Josh Triplett, Thomas Gleixner,
	Peter Zijlstra, Steven Rostedt, David Howells, Eric Dumazet,
	Darren Hart, Frédéric Weisbecker, Oleg Nesterov,
	pranith kumar

On Wed, Oct 28, 2015 at 05:00:30PM -0700, Paul E. McKenney wrote:
> > >Any objections against me applying this fix to tip:core/rcu so that I can push the
> > >recent RCU changes towards linux-next without triggering a build failure?
> > 
> > No objection on my side but probably you are waiting for an ack from
> > somebody else.
> 
> I am guessing that he was asking Tejun, but just for the record, I am
> OK with it as well:
> 
> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Oops, sorry, for some reason I thought Ingo was asking someone else.
The patch already got applied but yes routing it through -tip sounds
good.

Thanks.

-- 
tejun


end of thread, other threads:[~2015-10-29  2:13 UTC | newest]

Thread overview: 57+ messages
-- links below jump to the message on this page --
2015-10-06 16:13 [PATCH tip/core/rcu 0/13] Miscellaneous fixes for 4.4 Paul E. McKenney
2015-10-06 16:13 ` [PATCH tip/core/rcu 01/13] sched: Export sched_setscheduler_nocheck Paul E. McKenney
2015-10-06 16:13   ` [PATCH tip/core/rcu 02/13] rcu: Use rcu_callback_t in call_rcu*() and friends Paul E. McKenney
2015-10-06 16:13   ` [PATCH tip/core/rcu 03/13] rcu: Use call_rcu_func_t to replace explicit type equivalents Paul E. McKenney
2015-10-06 16:13   ` [PATCH tip/core/rcu 04/13] rcu: Don't disable preemption for Tiny and Tree RCU readers Paul E. McKenney
2015-10-06 16:20     ` [Kernel networking modules.] OSI levels 2 & 3, Assistance - If anyone knows anyone in the US. North West region John D Allen, Leveridge Systems INC
2015-10-06 16:44     ` [PATCH tip/core/rcu 04/13] rcu: Don't disable preemption for Tiny and Tree RCU readers Josh Triplett
2015-10-06 17:01       ` Paul E. McKenney
2015-10-06 17:16         ` Josh Triplett
2015-10-06 17:42           ` Paul E. McKenney
2015-10-06 17:46             ` Josh Triplett
2015-10-06 20:05             ` Peter Zijlstra
2015-10-06 20:18               ` Paul E. McKenney
2015-10-06 20:52                 ` Peter Zijlstra
2015-10-06 21:05                   ` Paul E. McKenney
2015-10-07  7:19                     ` Peter Zijlstra
2015-10-06 16:13   ` [PATCH tip/core/rcu 05/13] rcu: Eliminate panic when silly boot-time fanout specified Paul E. McKenney
2015-10-06 16:13   ` [PATCH tip/core/rcu 06/13] rcu: Add online/offline info to stall warning message Paul E. McKenney
2015-10-06 17:15     ` Josh Triplett
2015-10-06 16:13   ` [PATCH tip/core/rcu 07/13] rcu: Move preemption disabling out of __srcu_read_lock() Paul E. McKenney
2015-10-06 17:18     ` Josh Triplett
2015-10-06 17:36       ` Paul E. McKenney
2015-10-06 17:43         ` Josh Triplett
2015-10-06 18:07           ` Paul E. McKenney
2015-10-06 20:07     ` Peter Zijlstra
2015-10-06 20:19       ` Paul E. McKenney
2015-10-06 20:32         ` Peter Zijlstra
2015-10-06 21:03           ` Paul E. McKenney
2015-10-07  7:20             ` Peter Zijlstra
2015-10-07 14:18               ` Paul E. McKenney
2015-10-06 16:13   ` [PATCH tip/core/rcu 08/13] rcu: Finish folding ->fqs_state into ->gp_state Paul E. McKenney
2015-10-06 16:13   ` [PATCH tip/core/rcu 09/13] rcu: Correct comment for values of ->gp_state field Paul E. McKenney
2015-10-06 16:13   ` [PATCH tip/core/rcu 10/13] rcu: Add rcu_pointer_handoff() Paul E. McKenney
2015-10-06 17:21     ` Josh Triplett
2015-10-06 17:31       ` Paul E. McKenney
2015-10-06 17:36         ` Josh Triplett
2015-10-06 20:27     ` Peter Zijlstra
2015-10-06 21:02       ` Paul E. McKenney
2015-10-07  7:22         ` Peter Zijlstra
2015-10-07 14:20           ` Paul E. McKenney
2015-10-06 16:13   ` [PATCH tip/core/rcu 11/13] rculist: Make list_entry_rcu() use lockless_dereference() Paul E. McKenney
2015-10-26  8:45     ` Ingo Molnar
2015-10-26 14:55       ` Paul E. McKenney
2015-10-26 18:02         ` Ingo Molnar
2015-10-27  3:37         ` Linus Torvalds
2015-10-27  5:19           ` Tejun Heo
2015-10-27  5:33             ` Paul E. McKenney
2015-10-28  8:33             ` Ingo Molnar
2015-10-28 20:35               ` Patrick Marlier
2015-10-29  0:00                 ` Paul E. McKenney
2015-10-29  2:13                   ` Tejun Heo
2015-10-28 20:58             ` [tip:core/rcu] fs/writeback, rcu: Don't use list_entry_rcu() for pointer offsetting in bdi_split_work_to_wbs() tip-bot for Tejun Heo
2015-10-27  5:32           ` [PATCH tip/core/rcu 11/13] rculist: Make list_entry_rcu() use lockless_dereference() Paul E. McKenney
2015-10-06 16:13   ` [PATCH tip/core/rcu 12/13] rcu: Remove deprecated rcu_lockdep_assert() Paul E. McKenney
2015-10-06 16:13   ` [PATCH tip/core/rcu 13/13] rculist: Use WRITE_ONCE() when deleting from reader-visible list Paul E. McKenney
2015-10-06 17:23 ` [PATCH tip/core/rcu 0/13] Miscellaneous fixes for 4.4 Josh Triplett
2015-10-06 17:38   ` Paul E. McKenney
