All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH tip/core/rcu 0/9] v2 Fixes for 3.12
@ 2013-08-20  2:41 Paul E. McKenney
  2013-08-20  2:42 ` [PATCH tip/core/rcu 1/9] rcu: Expedite grace periods during suspend/resume Paul E. McKenney
  0 siblings, 1 reply; 14+ messages in thread
From: Paul E. McKenney @ 2013-08-20  2:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, darren, fweisbec, sbw

Hello!

This series contains a number of miscellaneous fixes:

1.	Switch to expedited grace periods during hibernation and
	suspend/resume operations (courtesy of Borislav Petkov and
	Bjørn Mork).

2.	Make RCU debugobjects fixup actions leak callbacks rather than
	risking corrupting the callback lists.

3.	Propagate above callback leaking to debugobjects's callers.

4.	Make call_rcu() leak callbacks when debugobjects indicates an error.

5.	Avoid needless self-wakeups from the grace-period kthreads.

6.	Make list_first_or_null_rcu() use list_entry_rcu() (courtesy of
	Tejun Heo).

7.	Fix irq_work_queue() build error for TREE_PREEMPT_RCU for some
	configurations (courtesy of James Hogan).

8.	Simplify _rcu_barrier() processing per Linus Torvalds feedback.

9.	Avoid signed-overflow undefined behavior in time_after() and friends.

Changes since v1 (https://lkml.org/lkml/2013/8/17/106):

o	Merge v1 patches 1, 2, and 6 into the new patch 1.

o	Apply other review comments from Josh Triplett.

							Thanx, Paul


 b/include/linux/debugobjects.h |    6 +-
 b/include/linux/jiffies.h      |    8 +--
 b/include/linux/rculist.h      |    5 +-
 b/init/Kconfig                 |    1 
 b/kernel/rcu.h                 |   10 ++--
 b/kernel/rcupdate.c            |  100 -----------------------------------------
 b/kernel/rcutree.c             |   58 +++++++++++++++++++++--
 b/lib/debugobjects.c           |   20 +++++---
 8 files changed, 84 insertions(+), 124 deletions(-)


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH tip/core/rcu 1/9] rcu: Expedite grace periods during suspend/resume
  2013-08-20  2:41 [PATCH tip/core/rcu 0/9] v2 Fixes for 3.12 Paul E. McKenney
@ 2013-08-20  2:42 ` Paul E. McKenney
  2013-08-20  2:42   ` [PATCH tip/core/rcu 2/9] rcu: Simplify debug-objects fixups Paul E. McKenney
                     ` (8 more replies)
  0 siblings, 9 replies; 14+ messages in thread
From: Paul E. McKenney @ 2013-08-20  2:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, darren, fweisbec, sbw,
	Borislav Petkov, Borislav Petkov, Bjørn Mork,
	Paul E. McKenney

From: Borislav Petkov <bp@alien8.de>

CONFIG_RCU_FAST_NO_HZ can increase grace-period durations by up to
a factor of four, which can result in long suspend and resume times.
Thus, this commit temporarily switches to expedited grace periods when
suspending the box and return to normal settings when resuming.  Similar
logic is applied to hibernation.

Because expedited grace periods are of dubious benefit on very large
systems, so this commit restricts their automated use during suspend
and resume to systems of 256 or fewer CPUs.  (Some day a number of
Linux-kernel facilities, including RCU's expedited grace periods,
will be more scalable, but I need to see bug reports first.)

[ paulmck: This also papers over an audio/irq bug, but hopefully that will
  be fixed soon. ]

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
---
 kernel/rcutree.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 338f1d1..a7bf517 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -54,6 +54,7 @@
 #include <linux/stop_machine.h>
 #include <linux/random.h>
 #include <linux/ftrace_event.h>
+#include <linux/suspend.h>
 
 #include "rcutree.h"
 #include <trace/events/rcu.h>
@@ -3032,6 +3033,25 @@ static int rcu_cpu_notify(struct notifier_block *self,
 	return NOTIFY_OK;
 }
 
+static int rcu_pm_notify(struct notifier_block *self,
+			 unsigned long action, void *hcpu)
+{
+	switch (action) {
+	case PM_HIBERNATION_PREPARE:
+	case PM_SUSPEND_PREPARE:
+		if (nr_cpu_ids <= 256) /* Expediting bad for large systems. */
+			rcu_expedited = 1;
+		break;
+	case PM_POST_HIBERNATION:
+	case PM_POST_SUSPEND:
+		rcu_expedited = 0;
+		break;
+	default:
+		break;
+	}
+	return NOTIFY_OK;
+}
+
 /*
  * Spawn the kthread that handles this RCU flavor's grace periods.
  */
@@ -3273,6 +3293,7 @@ void __init rcu_init(void)
 	 * or the scheduler are operational.
 	 */
 	cpu_notifier(rcu_cpu_notify, 0);
+	pm_notifier(rcu_pm_notify, 0);
 	for_each_online_cpu(cpu)
 		rcu_cpu_notify(NULL, CPU_UP_PREPARE, (void *)(long)cpu);
 }
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH tip/core/rcu 2/9] rcu: Simplify debug-objects fixups
  2013-08-20  2:42 ` [PATCH tip/core/rcu 1/9] rcu: Expedite grace periods during suspend/resume Paul E. McKenney
@ 2013-08-20  2:42   ` Paul E. McKenney
  2013-08-20  2:42   ` [PATCH tip/core/rcu 3/9] debugobjects: Make debug_object_activate() return status Paul E. McKenney
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2013-08-20  2:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, darren, fweisbec, sbw,
	Paul E. McKenney, Mathieu Desnoyers, Sedat Dilek,
	Davidlohr Bueso, Rik van Riel, Linus Torvalds

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

The current debug-objects fixups are complex and heavyweight, and the
fixups are not complete:  Even with the fixups, RCU's callback lists
can still be corrupted.  This commit therefore strips the fixups down
to their minimal form, eliminating two of the three.

It would be even better if (for example) call_rcu() simply leaked
any problematic callbacks, but for that to happen, the debug-objects
system would need to inform its caller of suspicious situations.
This is the subject of a later commit in this series.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
---
 kernel/rcupdate.c | 100 ------------------------------------------------------
 1 file changed, 100 deletions(-)

diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
index 14994d4..33eb462 100644
--- a/kernel/rcupdate.c
+++ b/kernel/rcupdate.c
@@ -212,43 +212,6 @@ static inline void debug_rcu_head_free(struct rcu_head *head)
 }
 
 /*
- * fixup_init is called when:
- * - an active object is initialized
- */
-static int rcuhead_fixup_init(void *addr, enum debug_obj_state state)
-{
-	struct rcu_head *head = addr;
-
-	switch (state) {
-	case ODEBUG_STATE_ACTIVE:
-		/*
-		 * Ensure that queued callbacks are all executed.
-		 * If we detect that we are nested in a RCU read-side critical
-		 * section, we should simply fail, otherwise we would deadlock.
-		 * In !PREEMPT configurations, there is no way to tell if we are
-		 * in a RCU read-side critical section or not, so we never
-		 * attempt any fixup and just print a warning.
-		 */
-#ifndef CONFIG_PREEMPT
-		WARN_ON_ONCE(1);
-		return 0;
-#endif
-		if (rcu_preempt_depth() != 0 || preempt_count() != 0 ||
-		    irqs_disabled()) {
-			WARN_ON_ONCE(1);
-			return 0;
-		}
-		rcu_barrier();
-		rcu_barrier_sched();
-		rcu_barrier_bh();
-		debug_object_init(head, &rcuhead_debug_descr);
-		return 1;
-	default:
-		return 0;
-	}
-}
-
-/*
  * fixup_activate is called when:
  * - an active object is activated
  * - an unknown object is activated (might be a statically initialized object)
@@ -268,69 +231,8 @@ static int rcuhead_fixup_activate(void *addr, enum debug_obj_state state)
 		debug_object_init(head, &rcuhead_debug_descr);
 		debug_object_activate(head, &rcuhead_debug_descr);
 		return 0;
-
-	case ODEBUG_STATE_ACTIVE:
-		/*
-		 * Ensure that queued callbacks are all executed.
-		 * If we detect that we are nested in a RCU read-side critical
-		 * section, we should simply fail, otherwise we would deadlock.
-		 * In !PREEMPT configurations, there is no way to tell if we are
-		 * in a RCU read-side critical section or not, so we never
-		 * attempt any fixup and just print a warning.
-		 */
-#ifndef CONFIG_PREEMPT
-		WARN_ON_ONCE(1);
-		return 0;
-#endif
-		if (rcu_preempt_depth() != 0 || preempt_count() != 0 ||
-		    irqs_disabled()) {
-			WARN_ON_ONCE(1);
-			return 0;
-		}
-		rcu_barrier();
-		rcu_barrier_sched();
-		rcu_barrier_bh();
-		debug_object_activate(head, &rcuhead_debug_descr);
-		return 1;
 	default:
-		return 0;
-	}
-}
-
-/*
- * fixup_free is called when:
- * - an active object is freed
- */
-static int rcuhead_fixup_free(void *addr, enum debug_obj_state state)
-{
-	struct rcu_head *head = addr;
-
-	switch (state) {
-	case ODEBUG_STATE_ACTIVE:
-		/*
-		 * Ensure that queued callbacks are all executed.
-		 * If we detect that we are nested in a RCU read-side critical
-		 * section, we should simply fail, otherwise we would deadlock.
-		 * In !PREEMPT configurations, there is no way to tell if we are
-		 * in a RCU read-side critical section or not, so we never
-		 * attempt any fixup and just print a warning.
-		 */
-#ifndef CONFIG_PREEMPT
-		WARN_ON_ONCE(1);
-		return 0;
-#endif
-		if (rcu_preempt_depth() != 0 || preempt_count() != 0 ||
-		    irqs_disabled()) {
-			WARN_ON_ONCE(1);
-			return 0;
-		}
-		rcu_barrier();
-		rcu_barrier_sched();
-		rcu_barrier_bh();
-		debug_object_free(head, &rcuhead_debug_descr);
 		return 1;
-	default:
-		return 0;
 	}
 }
 
@@ -369,9 +271,7 @@ EXPORT_SYMBOL_GPL(destroy_rcu_head_on_stack);
 
 struct debug_obj_descr rcuhead_debug_descr = {
 	.name = "rcu_head",
-	.fixup_init = rcuhead_fixup_init,
 	.fixup_activate = rcuhead_fixup_activate,
-	.fixup_free = rcuhead_fixup_free,
 };
 EXPORT_SYMBOL_GPL(rcuhead_debug_descr);
 #endif /* #ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD */
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH tip/core/rcu 3/9] debugobjects: Make debug_object_activate() return status
  2013-08-20  2:42 ` [PATCH tip/core/rcu 1/9] rcu: Expedite grace periods during suspend/resume Paul E. McKenney
  2013-08-20  2:42   ` [PATCH tip/core/rcu 2/9] rcu: Simplify debug-objects fixups Paul E. McKenney
@ 2013-08-20  2:42   ` Paul E. McKenney
  2013-08-20  2:42   ` [PATCH tip/core/rcu 4/9] rcu: Make call_rcu() leak callbacks for debug-object errors Paul E. McKenney
                     ` (6 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2013-08-20  2:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, darren, fweisbec, sbw,
	Paul E. McKenney, Mathieu Desnoyers, Sedat Dilek,
	Davidlohr Bueso, Rik van Riel, Linus Torvalds

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

In order to better respond to things like duplicate invocations
of call_rcu(), RCU needs to see the status of a call to
debug_object_activate().  This would allow RCU to leak the callback in
order to avoid adding freelist-reuse mischief to the duplicate invoations.
This commit therefore makes debug_object_activate() return status,
zero for success and -EINVAL for failure.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
---
 include/linux/debugobjects.h |  6 +++---
 lib/debugobjects.c           | 20 ++++++++++++++------
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/include/linux/debugobjects.h b/include/linux/debugobjects.h
index 0e5f578..98ffcbd 100644
--- a/include/linux/debugobjects.h
+++ b/include/linux/debugobjects.h
@@ -63,7 +63,7 @@ struct debug_obj_descr {
 extern void debug_object_init      (void *addr, struct debug_obj_descr *descr);
 extern void
 debug_object_init_on_stack(void *addr, struct debug_obj_descr *descr);
-extern void debug_object_activate  (void *addr, struct debug_obj_descr *descr);
+extern int debug_object_activate  (void *addr, struct debug_obj_descr *descr);
 extern void debug_object_deactivate(void *addr, struct debug_obj_descr *descr);
 extern void debug_object_destroy   (void *addr, struct debug_obj_descr *descr);
 extern void debug_object_free      (void *addr, struct debug_obj_descr *descr);
@@ -85,8 +85,8 @@ static inline void
 debug_object_init      (void *addr, struct debug_obj_descr *descr) { }
 static inline void
 debug_object_init_on_stack(void *addr, struct debug_obj_descr *descr) { }
-static inline void
-debug_object_activate  (void *addr, struct debug_obj_descr *descr) { }
+static inline int
+debug_object_activate  (void *addr, struct debug_obj_descr *descr) { return 0; }
 static inline void
 debug_object_deactivate(void *addr, struct debug_obj_descr *descr) { }
 static inline void
diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index 37061ed..bf2c8b1 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -381,19 +381,21 @@ void debug_object_init_on_stack(void *addr, struct debug_obj_descr *descr)
  * debug_object_activate - debug checks when an object is activated
  * @addr:	address of the object
  * @descr:	pointer to an object specific debug description structure
+ * Returns 0 for success, -EINVAL for check failed.
  */
-void debug_object_activate(void *addr, struct debug_obj_descr *descr)
+int debug_object_activate(void *addr, struct debug_obj_descr *descr)
 {
 	enum debug_obj_state state;
 	struct debug_bucket *db;
 	struct debug_obj *obj;
 	unsigned long flags;
+	int ret;
 	struct debug_obj o = { .object = addr,
 			       .state = ODEBUG_STATE_NOTAVAILABLE,
 			       .descr = descr };
 
 	if (!debug_objects_enabled)
-		return;
+		return 0;
 
 	db = get_bucket((unsigned long) addr);
 
@@ -405,23 +407,26 @@ void debug_object_activate(void *addr, struct debug_obj_descr *descr)
 		case ODEBUG_STATE_INIT:
 		case ODEBUG_STATE_INACTIVE:
 			obj->state = ODEBUG_STATE_ACTIVE;
+			ret = 0;
 			break;
 
 		case ODEBUG_STATE_ACTIVE:
 			debug_print_object(obj, "activate");
 			state = obj->state;
 			raw_spin_unlock_irqrestore(&db->lock, flags);
-			debug_object_fixup(descr->fixup_activate, addr, state);
-			return;
+			ret = debug_object_fixup(descr->fixup_activate, addr, state);
+			return ret ? -EINVAL : 0;
 
 		case ODEBUG_STATE_DESTROYED:
 			debug_print_object(obj, "activate");
+			ret = -EINVAL;
 			break;
 		default:
+			ret = 0;
 			break;
 		}
 		raw_spin_unlock_irqrestore(&db->lock, flags);
-		return;
+		return ret;
 	}
 
 	raw_spin_unlock_irqrestore(&db->lock, flags);
@@ -431,8 +436,11 @@ void debug_object_activate(void *addr, struct debug_obj_descr *descr)
 	 * true or not.
 	 */
 	if (debug_object_fixup(descr->fixup_activate, addr,
-			   ODEBUG_STATE_NOTAVAILABLE))
+			   ODEBUG_STATE_NOTAVAILABLE)) {
 		debug_print_object(&o, "activate");
+		return -EINVAL;
+	}
+	return 0;
 }
 
 /**
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH tip/core/rcu 4/9] rcu: Make call_rcu() leak callbacks for debug-object errors
  2013-08-20  2:42 ` [PATCH tip/core/rcu 1/9] rcu: Expedite grace periods during suspend/resume Paul E. McKenney
  2013-08-20  2:42   ` [PATCH tip/core/rcu 2/9] rcu: Simplify debug-objects fixups Paul E. McKenney
  2013-08-20  2:42   ` [PATCH tip/core/rcu 3/9] debugobjects: Make debug_object_activate() return status Paul E. McKenney
@ 2013-08-20  2:42   ` Paul E. McKenney
  2013-08-20  2:42   ` [PATCH tip/core/rcu 5/9] rcu: Avoid redundant grace-period kthread wakeups Paul E. McKenney
                     ` (5 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2013-08-20  2:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, darren, fweisbec, sbw,
	Paul E. McKenney, Mathieu Desnoyers, Sedat Dilek,
	Davidlohr Bueso, Rik van Riel, Linus Torvalds

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

If someone does a duplicate call_rcu(), the worst thing the second
call_rcu() could do would be to actually queue the callback the second
time because doing so corrupts whatever list the callback was already
queued on.  This commit therefore makes __call_rcu() check the new
return value from debug-objects and leak the callback upon error.
This commit also substitutes rcu_leak_callback() for whatever callback
function was previously in place in order to avoid freeing the callback
out from under any readers that might still be referencing it.

These changes increase the probability that the debug-objects error
messages will actually make it somewhere visible.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
---
 kernel/rcu.h     | 10 +++++++---
 kernel/rcutree.c | 14 +++++++++++++-
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu.h b/kernel/rcu.h
index 0a90ccc..7713196 100644
--- a/kernel/rcu.h
+++ b/kernel/rcu.h
@@ -67,12 +67,15 @@
 
 extern struct debug_obj_descr rcuhead_debug_descr;
 
-static inline void debug_rcu_head_queue(struct rcu_head *head)
+static inline int debug_rcu_head_queue(struct rcu_head *head)
 {
-	debug_object_activate(head, &rcuhead_debug_descr);
+	int r1;
+
+	r1 = debug_object_activate(head, &rcuhead_debug_descr);
 	debug_object_active_state(head, &rcuhead_debug_descr,
 				  STATE_RCU_HEAD_READY,
 				  STATE_RCU_HEAD_QUEUED);
+	return r1;
 }
 
 static inline void debug_rcu_head_unqueue(struct rcu_head *head)
@@ -83,8 +86,9 @@ static inline void debug_rcu_head_unqueue(struct rcu_head *head)
 	debug_object_deactivate(head, &rcuhead_debug_descr);
 }
 #else	/* !CONFIG_DEBUG_OBJECTS_RCU_HEAD */
-static inline void debug_rcu_head_queue(struct rcu_head *head)
+static inline int debug_rcu_head_queue(struct rcu_head *head)
 {
+	return 0;
 }
 
 static inline void debug_rcu_head_unqueue(struct rcu_head *head)
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index a7bf517..9184056 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -2305,6 +2305,13 @@ static void __call_rcu_core(struct rcu_state *rsp, struct rcu_data *rdp,
 }
 
 /*
+ * RCU callback function to leak a callback.
+ */
+static void rcu_leak_callback(struct rcu_head *rhp)
+{
+}
+
+/*
  * Helper function for call_rcu() and friends.  The cpu argument will
  * normally be -1, indicating "currently running CPU".  It may specify
  * a CPU only if that CPU is a no-CBs CPU.  Currently, only _rcu_barrier()
@@ -2318,7 +2325,12 @@ __call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu),
 	struct rcu_data *rdp;
 
 	WARN_ON_ONCE((unsigned long)head & 0x3); /* Misaligned rcu_head! */
-	debug_rcu_head_queue(head);
+	if (debug_rcu_head_queue(head)) {
+		/* Probable double call_rcu(), so leak the callback. */
+		ACCESS_ONCE(head->func) = rcu_leak_callback;
+		WARN_ONCE(1, "__call_rcu(): Leaked duplicate callback\n");
+		return;
+	}
 	head->func = func;
 	head->next = NULL;
 
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH tip/core/rcu 5/9] rcu: Avoid redundant grace-period kthread wakeups
  2013-08-20  2:42 ` [PATCH tip/core/rcu 1/9] rcu: Expedite grace periods during suspend/resume Paul E. McKenney
                     ` (2 preceding siblings ...)
  2013-08-20  2:42   ` [PATCH tip/core/rcu 4/9] rcu: Make call_rcu() leak callbacks for debug-object errors Paul E. McKenney
@ 2013-08-20  2:42   ` Paul E. McKenney
  2013-08-20  2:42   ` [PATCH tip/core/rcu 6/9] rculist: list_first_or_null_rcu() should use list_entry_rcu() Paul E. McKenney
                     ` (4 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2013-08-20  2:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, darren, fweisbec, sbw,
	Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

When setting up an in-the-future "advanced" grace period, the code needs
to wake up the relevant grace-period kthread, which it currently does
unconditionally.  However, this results in needless wakeups in the case
where the advanced grace period is being set up by the grace-period
kthread itself, which is a non-uncommon situation.  This commit therefore
checks to see if the running thread is the grace-period kthread, and
avoids doing the irq_work_queue()-mediated wakeup in that case.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
---
 kernel/rcutree.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 9184056..c6a064a 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -1576,10 +1576,12 @@ rcu_start_gp_advanced(struct rcu_state *rsp, struct rcu_node *rnp,
 
 	/*
 	 * We can't do wakeups while holding the rnp->lock, as that
-	 * could cause possible deadlocks with the rq->lock. Deter
-	 * the wakeup to interrupt context.
+	 * could cause possible deadlocks with the rq->lock. Defer
+	 * the wakeup to interrupt context.  And don't bother waking
+	 * up the running kthread.
 	 */
-	irq_work_queue(&rsp->wakeup_work);
+	if (current != rsp->gp_kthread)
+		irq_work_queue(&rsp->wakeup_work);
 }
 
 /*
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH tip/core/rcu 6/9] rculist: list_first_or_null_rcu() should use list_entry_rcu()
  2013-08-20  2:42 ` [PATCH tip/core/rcu 1/9] rcu: Expedite grace periods during suspend/resume Paul E. McKenney
                     ` (3 preceding siblings ...)
  2013-08-20  2:42   ` [PATCH tip/core/rcu 5/9] rcu: Avoid redundant grace-period kthread wakeups Paul E. McKenney
@ 2013-08-20  2:42   ` Paul E. McKenney
  2013-08-20  2:42   ` [PATCH tip/core/rcu 7/9] rcu: Select IRQ_WORK from TREE_PREEMPT_RCU Paul E. McKenney
                     ` (3 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2013-08-20  2:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, darren, fweisbec, sbw,
	Tejun Heo, Paul E. McKenney, David S. Miller, Li Zefan,
	Patrick McHardy, stable

From: Tejun Heo <tj@kernel.org>

list_first_or_null() should test whether the list is empty and return
pointer to the first entry if not in a RCU safe manner.  It's broken
in several ways.

* It compares __kernel @__ptr with __rcu @__next triggering the
  following sparse warning.

  net/core/dev.c:4331:17: error: incompatible types in comparison expression (different address spaces)

* It doesn't perform rcu_dereference*() and computes the entry address
  using container_of() directly from the __rcu pointer which is
  inconsitent with other rculist interface.  As a result, all three
  in-kernel users - net/core/dev.c, macvlan, cgroup - are buggy.  They
  dereference the pointer w/o going through read barrier.

* While ->next dereference passes through list_next_rcu(), the
  compiler is still free to fetch ->next more than once and thus
  nullify the "__ptr != __next" condition check.

Fix it by making list_first_or_null_rcu() dereference ->next directly
using ACCESS_ONCE() and then use list_entry_rcu() on it like other
rculist accessors.

v2: Paul pointed out that the compiler may fetch the pointer more than
    once nullifying the condition check.  ACCESS_ONCE() added on
    ->next dereference.

v3: Restored () around macro param which was accidentally removed.
    Spotted by Paul.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: stable@vger.kernel.org
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
---
 include/linux/rculist.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index f4b1001..4106721 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -267,8 +267,9 @@ static inline void list_splice_init_rcu(struct list_head *list,
  */
 #define list_first_or_null_rcu(ptr, type, member) \
 	({struct list_head *__ptr = (ptr); \
-	  struct list_head __rcu *__next = list_next_rcu(__ptr); \
-	  likely(__ptr != __next) ? container_of(__next, type, member) : NULL; \
+	  struct list_head *__next = ACCESS_ONCE(__ptr->next); \
+	  likely(__ptr != __next) ? \
+		list_entry_rcu(__next, type, member) : NULL; \
 	})
 
 /**
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH tip/core/rcu 7/9] rcu: Select IRQ_WORK from TREE_PREEMPT_RCU
  2013-08-20  2:42 ` [PATCH tip/core/rcu 1/9] rcu: Expedite grace periods during suspend/resume Paul E. McKenney
                     ` (4 preceding siblings ...)
  2013-08-20  2:42   ` [PATCH tip/core/rcu 6/9] rculist: list_first_or_null_rcu() should use list_entry_rcu() Paul E. McKenney
@ 2013-08-20  2:42   ` Paul E. McKenney
  2013-08-20  2:42   ` [PATCH tip/core/rcu 8/9] rcu: Simplify _rcu_barrier() processing Paul E. McKenney
                     ` (2 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2013-08-20  2:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, darren, fweisbec, sbw,
	James Hogan, Paul E. McKenney

From: James Hogan <james.hogan@imgtec.com>

TREE_RCU and TREE_PREEMPT_RCU both cause kernel/rcutree.c to be built,
but only TREE_RCU selects IRQ_WORK, which can result in an undefined
reference to irq_work_queue for some (random) configs:

kernel/built-in.o In function `rcu_start_gp_advanced':
kernel/rcutree.c:1564: undefined reference to `irq_work_queue'

Select IRQ_WORK from TREE_PREEMPT_RCU too to fix this.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
---
 init/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/init/Kconfig b/init/Kconfig
index 247084b..c08a549 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -470,6 +470,7 @@ config TREE_RCU
 config TREE_PREEMPT_RCU
 	bool "Preemptible tree-based hierarchical RCU"
 	depends on PREEMPT
+	select IRQ_WORK
 	help
 	  This option selects the RCU implementation that is
 	  designed for very large SMP systems with hundreds or
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH tip/core/rcu 8/9] rcu: Simplify _rcu_barrier() processing
  2013-08-20  2:42 ` [PATCH tip/core/rcu 1/9] rcu: Expedite grace periods during suspend/resume Paul E. McKenney
                     ` (5 preceding siblings ...)
  2013-08-20  2:42   ` [PATCH tip/core/rcu 7/9] rcu: Select IRQ_WORK from TREE_PREEMPT_RCU Paul E. McKenney
@ 2013-08-20  2:42   ` Paul E. McKenney
  2013-08-20  9:48     ` Lai Jiangshan
  2013-08-20  2:42   ` [PATCH tip/core/rcu 9/9] jiffies: Avoid undefined behavior from signed overflow Paul E. McKenney
  2013-08-20  9:58   ` [PATCH tip/core/rcu 1/9] rcu: Expedite grace periods during suspend/resume Lai Jiangshan
  8 siblings, 1 reply; 14+ messages in thread
From: Paul E. McKenney @ 2013-08-20  2:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, darren, fweisbec, sbw,
	Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

This commit drops an unneeded ACCESS_ONCE() and simplifies an "our work
is done" check in _rcu_barrier().  This applies feedback from Linus
(https://lkml.org/lkml/2013/7/26/777) that he gave to similar code
in an unrelated patch.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
---
 kernel/rcutree.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index c6a064a..612aff1 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -2817,9 +2817,20 @@ static void _rcu_barrier(struct rcu_state *rsp)
 	 * transition.  The "if" expression below therefore rounds the old
 	 * value up to the next even number and adds two before comparing.
 	 */
-	snap_done = ACCESS_ONCE(rsp->n_barrier_done);
+	snap_done = rsp->n_barrier_done;
 	_rcu_barrier_trace(rsp, "Check", -1, snap_done);
-	if (ULONG_CMP_GE(snap_done, ((snap + 1) & ~0x1) + 2)) {
+
+	/*
+	 * If the value in snap is odd, we needed to wait for the current
+	 * rcu_barrier() to complete, then wait for the next one, in other
+	 * words, we need the value of snap_done to be three larger than
+	 * the value of snap.  On the other hand, if the value in snap is
+	 * even, we only had to wait for the next rcu_barrier() to complete,
+	 * in other words, we need the value of snap_done to be only two
+	 * greater than the value of snap.  The "(snap + 3) & 0x1" computes
+	 * this for us (thank you, Linus!).
+	 */
+	if (ULONG_CMP_GE(snap_done, (snap + 3) & ~0x1)) {
 		_rcu_barrier_trace(rsp, "EarlyExit", -1, snap_done);
 		smp_mb(); /* caller's subsequent code after above check. */
 		mutex_unlock(&rsp->barrier_mutex);
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH tip/core/rcu 9/9] jiffies: Avoid undefined behavior from signed overflow
  2013-08-20  2:42 ` [PATCH tip/core/rcu 1/9] rcu: Expedite grace periods during suspend/resume Paul E. McKenney
                     ` (6 preceding siblings ...)
  2013-08-20  2:42   ` [PATCH tip/core/rcu 8/9] rcu: Simplify _rcu_barrier() processing Paul E. McKenney
@ 2013-08-20  2:42   ` Paul E. McKenney
  2013-08-20  9:58   ` [PATCH tip/core/rcu 1/9] rcu: Expedite grace periods during suspend/resume Lai Jiangshan
  8 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2013-08-20  2:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, darren, fweisbec, sbw,
	Paul E. McKenney, John Stultz, David S. Miller, Arnd Bergmann,
	Ingo Molnar, Linus Torvalds, Eric Dumazet, Kevin Easton

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

According to the C standard 3.4.3p3, overflow of a signed integer results
in undefined behavior.  This commit therefore changes the definitions
of time_after(), time_after_eq(), time_after64(), and time_after_eq64()
to avoid this undefined behavior.  The trick is that the subtraction
is done using unsigned arithmetic, which according to 6.2.5p9 cannot
overflow because it is defined as modulo arithmetic.  This has the added
(though admittedly quite small) benefit of shortening four lines of code
by four characters each.

Note that the C standard considers the cast from unsigned to
signed to be implementation-defined, see 6.3.1.3p3.  However, on a
two's-complement system, an implementation that defines anything other
than a reinterpretation of the bits is free to come to me, and I will be
happy to act as a witness for its being committed to an insane asylum.
(Although I have nothing against saturating arithmetic or signals in some
cases, these things really should not be the default when compiling an
operating-system kernel.)

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Kevin Easton <kevin@guarana.org>
[ paulmck: Included time_after64() and time_after_eq64(), as suggested
  by Eric Dumazet, also fixed commit message.]
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
---
 include/linux/jiffies.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/jiffies.h b/include/linux/jiffies.h
index 97ba4e7..d235e88 100644
--- a/include/linux/jiffies.h
+++ b/include/linux/jiffies.h
@@ -101,13 +101,13 @@ static inline u64 get_jiffies_64(void)
 #define time_after(a,b)		\
 	(typecheck(unsigned long, a) && \
 	 typecheck(unsigned long, b) && \
-	 ((long)(b) - (long)(a) < 0))
+	 ((long)((b) - (a)) < 0))
 #define time_before(a,b)	time_after(b,a)
 
 #define time_after_eq(a,b)	\
 	(typecheck(unsigned long, a) && \
 	 typecheck(unsigned long, b) && \
-	 ((long)(a) - (long)(b) >= 0))
+	 ((long)((a) - (b)) >= 0))
 #define time_before_eq(a,b)	time_after_eq(b,a)
 
 /*
@@ -130,13 +130,13 @@ static inline u64 get_jiffies_64(void)
 #define time_after64(a,b)	\
 	(typecheck(__u64, a) &&	\
 	 typecheck(__u64, b) && \
-	 ((__s64)(b) - (__s64)(a) < 0))
+	 ((__s64)((b) - (a)) < 0))
 #define time_before64(a,b)	time_after64(b,a)
 
 #define time_after_eq64(a,b)	\
 	(typecheck(__u64, a) && \
 	 typecheck(__u64, b) && \
-	 ((__s64)(a) - (__s64)(b) >= 0))
+	 ((__s64)((a) - (b)) >= 0))
 #define time_before_eq64(a,b)	time_after_eq64(b,a)
 
 #define time_in_range64(a, b, c) \
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH tip/core/rcu 8/9] rcu: Simplify _rcu_barrier() processing
  2013-08-20  2:42   ` [PATCH tip/core/rcu 8/9] rcu: Simplify _rcu_barrier() processing Paul E. McKenney
@ 2013-08-20  9:48     ` Lai Jiangshan
  2013-08-20 18:50       ` Paul E. McKenney
  0 siblings, 1 reply; 14+ messages in thread
From: Lai Jiangshan @ 2013-08-20  9:48 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, dipankar, akpm, mathieu.desnoyers, josh,
	niv, tglx, peterz, rostedt, dhowells, edumazet, darren, fweisbec,
	sbw

On 08/20/2013 10:42 AM, Paul E. McKenney wrote:
> From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> 
> This commit drops an unneeded ACCESS_ONCE() and simplifies an "our work
> is done" check in _rcu_barrier().  This applies feedback from Linus
> (https://lkml.org/lkml/2013/7/26/777) that he gave to similar code
> in an unrelated patch.
> 
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
> ---
>  kernel/rcutree.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index c6a064a..612aff1 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -2817,9 +2817,20 @@ static void _rcu_barrier(struct rcu_state *rsp)
>  	 * transition.  The "if" expression below therefore rounds the old
>  	 * value up to the next even number and adds two before comparing.
>  	 */
> -	snap_done = ACCESS_ONCE(rsp->n_barrier_done);
> +	snap_done = rsp->n_barrier_done;
>  	_rcu_barrier_trace(rsp, "Check", -1, snap_done);
> -	if (ULONG_CMP_GE(snap_done, ((snap + 1) & ~0x1) + 2)) {
> +
> +	/*
> +	 * If the value in snap is odd, we needed to wait for the current
> +	 * rcu_barrier() to complete, then wait for the next one, in other
> +	 * words, we need the value of snap_done to be three larger than
> +	 * the value of snap.  On the other hand, if the value in snap is
> +	 * even, we only had to wait for the next rcu_barrier() to complete,
> +	 * in other words, we need the value of snap_done to be only two
> +	 * greater than the value of snap.  The "(snap + 3) & 0x1" computes

"(snap + 3) & 0x1"
==> "(snap + 3) & ~0x1"

> +	 * this for us (thank you, Linus!).
> +	 */
> +	if (ULONG_CMP_GE(snap_done, (snap + 3) & ~0x1)) {
>  		_rcu_barrier_trace(rsp, "EarlyExit", -1, snap_done);
>  		smp_mb(); /* caller's subsequent code after above check. */
>  		mutex_unlock(&rsp->barrier_mutex);


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH tip/core/rcu 1/9] rcu: Expedite grace periods during suspend/resume
  2013-08-20  2:42 ` [PATCH tip/core/rcu 1/9] rcu: Expedite grace periods during suspend/resume Paul E. McKenney
                     ` (7 preceding siblings ...)
  2013-08-20  2:42   ` [PATCH tip/core/rcu 9/9] jiffies: Avoid undefined behavior from signed overflow Paul E. McKenney
@ 2013-08-20  9:58   ` Lai Jiangshan
  2013-08-20 18:42     ` Paul E. McKenney
  8 siblings, 1 reply; 14+ messages in thread
From: Lai Jiangshan @ 2013-08-20  9:58 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, dipankar, akpm, mathieu.desnoyers, josh,
	niv, tglx, peterz, rostedt, dhowells, edumazet, darren, fweisbec,
	sbw, Borislav Petkov, Borislav Petkov, Bjørn Mork

On 08/20/2013 10:42 AM, Paul E. McKenney wrote:
> From: Borislav Petkov <bp@alien8.de>
> 
> CONFIG_RCU_FAST_NO_HZ can increase grace-period durations by up to
> a factor of four, which can result in long suspend and resume times.
> Thus, this commit temporarily switches to expedited grace periods when
> suspending the box and return to normal settings when resuming.  Similar
> logic is applied to hibernation.
> 
> Because expedited grace periods are of dubious benefit on very large
> systems, so this commit restricts their automated use during suspend
> and resume to systems of 256 or fewer CPUs.  (Some day a number of
> Linux-kernel facilities, including RCU's expedited grace periods,
> will be more scalable, but I need to see bug reports first.)
> 
> [ paulmck: This also papers over an audio/irq bug, but hopefully that will
>   be fixed soon. ]
> 
> Signed-off-by: Borislav Petkov <bp@suse.de>
> Signed-off-by: Bjørn Mork <bjorn@mork.no>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
> ---
>  kernel/rcutree.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index 338f1d1..a7bf517 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -54,6 +54,7 @@
>  #include <linux/stop_machine.h>
>  #include <linux/random.h>
>  #include <linux/ftrace_event.h>
> +#include <linux/suspend.h>
>  
>  #include "rcutree.h"
>  #include <trace/events/rcu.h>
> @@ -3032,6 +3033,25 @@ static int rcu_cpu_notify(struct notifier_block *self,
>  	return NOTIFY_OK;
>  }
>  
> +static int rcu_pm_notify(struct notifier_block *self,
> +			 unsigned long action, void *hcpu)
> +{
> +	switch (action) {
> +	case PM_HIBERNATION_PREPARE:
> +	case PM_SUSPEND_PREPARE:
> +		if (nr_cpu_ids <= 256) /* Expediting bad for large systems. */
> +			rcu_expedited = 1;
> +		break;
> +	case PM_POST_HIBERNATION:
> +	case PM_POST_SUSPEND:
> +		rcu_expedited = 0;

Users can set it via sysfs, this notify will changes it.
I think we can introduce an rcu_expedited_syfs_saved;
thus we can change this line to:
-		rcu_expedited = 0;
+		rcu_expedited = rcu_expedited_syfs_saved;


rcu_init() {
	...
+	rcu_expedited_syfs_saved = rcu_expedited;
}

static ssize_t rcu_expedited_store(struct kobject *kobj,
				   struct kobj_attribute *attr,
				   const char *buf, size_t count)
{
	if (kstrtoint(buf, 0, &rcu_expedited))
		return -EINVAL;

+	rcu_expedited_syfs_saved = rcu_expedited;
	return count;
}

> +		break;
> +	default:
> +		break;
> +	}
> +	return NOTIFY_OK;
> +}
> +
>  /*
>   * Spawn the kthread that handles this RCU flavor's grace periods.
>   */
> @@ -3273,6 +3293,7 @@ void __init rcu_init(void)
>  	 * or the scheduler are operational.
>  	 */
>  	cpu_notifier(rcu_cpu_notify, 0);
> +	pm_notifier(rcu_pm_notify, 0);
>  	for_each_online_cpu(cpu)
>  		rcu_cpu_notify(NULL, CPU_UP_PREPARE, (void *)(long)cpu);
>  }


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH tip/core/rcu 1/9] rcu: Expedite grace periods during suspend/resume
  2013-08-20  9:58   ` [PATCH tip/core/rcu 1/9] rcu: Expedite grace periods during suspend/resume Lai Jiangshan
@ 2013-08-20 18:42     ` Paul E. McKenney
  0 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2013-08-20 18:42 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, mingo, dipankar, akpm, mathieu.desnoyers, josh,
	niv, tglx, peterz, rostedt, dhowells, edumazet, darren, fweisbec,
	sbw, Borislav Petkov, Borislav Petkov, Bjørn Mork

On Tue, Aug 20, 2013 at 05:58:40PM +0800, Lai Jiangshan wrote:
> On 08/20/2013 10:42 AM, Paul E. McKenney wrote:
> > From: Borislav Petkov <bp@alien8.de>
> > 
> > CONFIG_RCU_FAST_NO_HZ can increase grace-period durations by up to
> > a factor of four, which can result in long suspend and resume times.
> > Thus, this commit temporarily switches to expedited grace periods when
> > suspending the box and return to normal settings when resuming.  Similar
> > logic is applied to hibernation.
> > 
> > Because expedited grace periods are of dubious benefit on very large
> > systems, so this commit restricts their automated use during suspend
> > and resume to systems of 256 or fewer CPUs.  (Some day a number of
> > Linux-kernel facilities, including RCU's expedited grace periods,
> > will be more scalable, but I need to see bug reports first.)
> > 
> > [ paulmck: This also papers over an audio/irq bug, but hopefully that will
> >   be fixed soon. ]
> > 
> > Signed-off-by: Borislav Petkov <bp@suse.de>
> > Signed-off-by: Bjørn Mork <bjorn@mork.no>
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Reviewed-by: Josh Triplett <josh@joshtriplett.org>
> > ---
> >  kernel/rcutree.c | 21 +++++++++++++++++++++
> >  1 file changed, 21 insertions(+)
> > 
> > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> > index 338f1d1..a7bf517 100644
> > --- a/kernel/rcutree.c
> > +++ b/kernel/rcutree.c
> > @@ -54,6 +54,7 @@
> >  #include <linux/stop_machine.h>
> >  #include <linux/random.h>
> >  #include <linux/ftrace_event.h>
> > +#include <linux/suspend.h>
> >  
> >  #include "rcutree.h"
> >  #include <trace/events/rcu.h>
> > @@ -3032,6 +3033,25 @@ static int rcu_cpu_notify(struct notifier_block *self,
> >  	return NOTIFY_OK;
> >  }
> >  
> > +static int rcu_pm_notify(struct notifier_block *self,
> > +			 unsigned long action, void *hcpu)
> > +{
> > +	switch (action) {
> > +	case PM_HIBERNATION_PREPARE:
> > +	case PM_SUSPEND_PREPARE:
> > +		if (nr_cpu_ids <= 256) /* Expediting bad for large systems. */
> > +			rcu_expedited = 1;
> > +		break;
> > +	case PM_POST_HIBERNATION:
> > +	case PM_POST_SUSPEND:
> > +		rcu_expedited = 0;
> 
> Users can set it via sysfs, this notify will changes it.
> I think we can introduce an rcu_expedited_syfs_saved;
> thus we can change this line to:
> -		rcu_expedited = 0;
> +		rcu_expedited = rcu_expedited_syfs_saved;

We could do this, but there are still races where user tasks update sysfs
while the operation is in progress.  There are other races as well,
particularly if multiple user tasks are concurrently attempting to do
this sysfs update.  The final solution likely involves a bunch of stuff,
possibly including a driver to gain release-on-exit semantics.

Until someone actually tries using this, we won't really know what we
actually need.  And it is always possible that no one will actually
use it.  So we need to hold off until we see some real-world use cases.

							Thanx, Paul

> rcu_init() {
> 	...
> +	rcu_expedited_syfs_saved = rcu_expedited;
> }
> 
> static ssize_t rcu_expedited_store(struct kobject *kobj,
> 				   struct kobj_attribute *attr,
> 				   const char *buf, size_t count)
> {
> 	if (kstrtoint(buf, 0, &rcu_expedited))
> 		return -EINVAL;
> 
> +	rcu_expedited_syfs_saved = rcu_expedited;
> 	return count;
> }
> 
> > +		break;
> > +	default:
> > +		break;
> > +	}
> > +	return NOTIFY_OK;
> > +}
> > +
> >  /*
> >   * Spawn the kthread that handles this RCU flavor's grace periods.
> >   */
> > @@ -3273,6 +3293,7 @@ void __init rcu_init(void)
> >  	 * or the scheduler are operational.
> >  	 */
> >  	cpu_notifier(rcu_cpu_notify, 0);
> > +	pm_notifier(rcu_pm_notify, 0);
> >  	for_each_online_cpu(cpu)
> >  		rcu_cpu_notify(NULL, CPU_UP_PREPARE, (void *)(long)cpu);
> >  }
> 
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH tip/core/rcu 8/9] rcu: Simplify _rcu_barrier() processing
  2013-08-20  9:48     ` Lai Jiangshan
@ 2013-08-20 18:50       ` Paul E. McKenney
  0 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2013-08-20 18:50 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, mingo, dipankar, akpm, mathieu.desnoyers, josh,
	niv, tglx, peterz, rostedt, dhowells, edumazet, darren, fweisbec,
	sbw

On Tue, Aug 20, 2013 at 05:48:01PM +0800, Lai Jiangshan wrote:
> On 08/20/2013 10:42 AM, Paul E. McKenney wrote:
> > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > 
> > This commit drops an unneeded ACCESS_ONCE() and simplifies an "our work
> > is done" check in _rcu_barrier().  This applies feedback from Linus
> > (https://lkml.org/lkml/2013/7/26/777) that he gave to similar code
> > in an unrelated patch.
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Reviewed-by: Josh Triplett <josh@joshtriplett.org>
> > ---
> >  kernel/rcutree.c | 15 +++++++++++++--
> >  1 file changed, 13 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> > index c6a064a..612aff1 100644
> > --- a/kernel/rcutree.c
> > +++ b/kernel/rcutree.c
> > @@ -2817,9 +2817,20 @@ static void _rcu_barrier(struct rcu_state *rsp)
> >  	 * transition.  The "if" expression below therefore rounds the old
> >  	 * value up to the next even number and adds two before comparing.
> >  	 */
> > -	snap_done = ACCESS_ONCE(rsp->n_barrier_done);
> > +	snap_done = rsp->n_barrier_done;
> >  	_rcu_barrier_trace(rsp, "Check", -1, snap_done);
> > -	if (ULONG_CMP_GE(snap_done, ((snap + 1) & ~0x1) + 2)) {
> > +
> > +	/*
> > +	 * If the value in snap is odd, we needed to wait for the current
> > +	 * rcu_barrier() to complete, then wait for the next one, in other
> > +	 * words, we need the value of snap_done to be three larger than
> > +	 * the value of snap.  On the other hand, if the value in snap is
> > +	 * even, we only had to wait for the next rcu_barrier() to complete,
> > +	 * in other words, we need the value of snap_done to be only two
> > +	 * greater than the value of snap.  The "(snap + 3) & 0x1" computes
> 
> "(snap + 3) & 0x1"
> ==> "(snap + 3) & ~0x1"

Good catch, fixed!

							Thanx, Paul

> > +	 * this for us (thank you, Linus!).
> > +	 */
> > +	if (ULONG_CMP_GE(snap_done, (snap + 3) & ~0x1)) {
> >  		_rcu_barrier_trace(rsp, "EarlyExit", -1, snap_done);
> >  		smp_mb(); /* caller's subsequent code after above check. */
> >  		mutex_unlock(&rsp->barrier_mutex);
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2013-08-20 18:50 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-20  2:41 [PATCH tip/core/rcu 0/9] v2 Fixes for 3.12 Paul E. McKenney
2013-08-20  2:42 ` [PATCH tip/core/rcu 1/9] rcu: Expedite grace periods during suspend/resume Paul E. McKenney
2013-08-20  2:42   ` [PATCH tip/core/rcu 2/9] rcu: Simplify debug-objects fixups Paul E. McKenney
2013-08-20  2:42   ` [PATCH tip/core/rcu 3/9] debugobjects: Make debug_object_activate() return status Paul E. McKenney
2013-08-20  2:42   ` [PATCH tip/core/rcu 4/9] rcu: Make call_rcu() leak callbacks for debug-object errors Paul E. McKenney
2013-08-20  2:42   ` [PATCH tip/core/rcu 5/9] rcu: Avoid redundant grace-period kthread wakeups Paul E. McKenney
2013-08-20  2:42   ` [PATCH tip/core/rcu 6/9] rculist: list_first_or_null_rcu() should use list_entry_rcu() Paul E. McKenney
2013-08-20  2:42   ` [PATCH tip/core/rcu 7/9] rcu: Select IRQ_WORK from TREE_PREEMPT_RCU Paul E. McKenney
2013-08-20  2:42   ` [PATCH tip/core/rcu 8/9] rcu: Simplify _rcu_barrier() processing Paul E. McKenney
2013-08-20  9:48     ` Lai Jiangshan
2013-08-20 18:50       ` Paul E. McKenney
2013-08-20  2:42   ` [PATCH tip/core/rcu 9/9] jiffies: Avoid undefined behavior from signed overflow Paul E. McKenney
2013-08-20  9:58   ` [PATCH tip/core/rcu 1/9] rcu: Expedite grace periods during suspend/resume Lai Jiangshan
2013-08-20 18:42     ` Paul E. McKenney

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.