All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH tip/core/rcu 0/16] RCU-tasks implementation
@ 2014-08-11 22:48 Paul E. McKenney
  2014-08-11 22:48 ` [PATCH v5 tip/core/rcu 01/16] rcu: Add call_rcu_tasks() Paul E. McKenney
  2014-08-12 23:57 ` [PATCH tip/core/rcu 0/16] RCU-tasks implementation Paul E. McKenney
  0 siblings, 2 replies; 60+ messages in thread
From: Paul E. McKenney @ 2014-08-11 22:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani

Hello!

This series provides v5 of a prototype of an RCU-tasks implementation,
which has been requested to assist with tramopoline removal.  This flavor
of RCU is task-based rather than CPU-based, and has voluntary context
switch, usermode execution, and the idle loops as its only quiescent
states.  This selection of quiescent states ensures that at the end
of a grace period, there will no longer be any tasks depending on a
trampoline that was removed before the beginning of that grace period.
This works because such trampolines do not contain function calls,
do not contain voluntary context switches, do not switch to usermode,
and do not switch to idle.

The patches in this series are as follows:

1.	Adds the basic call_rcu_tasks() functionality.

2.	Provides cond_resched_rcu_qs() to force quiescent states, including
	RCU-tasks quiescent states, in long loops.

3.	Adds synchronous APIs: synchronize_rcu_tasks() and
	rcu_barrier_tasks().

4.	Handle the possibility of tasks being preempted for extended
	periods of time after being removed from the task list.

5.	Adds GPL exports for the above APIs, courtesy of Steven Rostedt.

6.	Adds rcutorture tests for RCU-tasks.

7.	Adds RCU-tasks test cases to rcutorture scripting.

8.	Adds stall-warning checks for RCU-tasks.

9.	Improves RCU-tasks energy efficiency by replacing polling with
	wait/wakeup.

10.	Document RCU-tasks stall-warning messages.

11.	Defer rcu_tasks_kthread() creation until first call_rcu_tasks()
	to avoid populating systems with unneeded kthreads.

12.	Treat nohz_full= operation by a given task on a given CPU as
	an RCU-tasks quiescent state.  (In previous versions, RCU-tasks
	would wait for a context switch or non-nohz_full operation in
	this case.)

13.	Allow preemption while looping over holdout tasks while waiting
	for grace-period to end.

14.	Remove redundant preempt_disable() from
	rcu_note_voluntary_context_switch().

15.	Make RCU-tasks grace periods wait for idle tasks.

16.	Add additional task-specific information on RCU-tasks stall-warning
	messages.

Changes from v4:

o       CONFIG_PROVE_RCU added to one of the test scenarios.

o	Moved from srcu_read_lock() and srcu_read_unlock to __srcu_read_lock()
	and __srcu_read_unlock() to avoid CONFIG_PROVE_RCU false positives
	in do_exit().

o	Added tracking of idle tasks.

o	Improved tracking of nohz_full tasks (and by extension, idle tasks)
	by having RCU's dyntick-idle transitions store the CPU number into
	the task structure instead of the previous choice of storing the
	task pointer into the per-CPU rcu_dynticks data structure.

o	Changed timings to reduce overhead.  The loop scanning the list
	of holdout tasks is now done once per second instead of ten times
	a second, and stall warnings are emitted ten minutes into the
	grace period instead of three minutes into the grace period.

o	Added more task-state information on stall warnings.

Changes from v3:

o	Add do_exit() SRCU hooks to handle tasks being preempted after
	having removed themselves from the task list.  The need for this
	was pointed out by Oleg Nesterov, and the implmentation suggested
	by Lai Jiangshan.

o	Create rcu_tasks_kthread only if call_rcu_tasks() is invoked.

Changes from v2:

o	Use get_task_struct() instead of do_exit() hooks to synchronize
	with exiting tasks, as suggested by Lai Jiangshan.

o	Add checks of ->on_rq to the grace-period-wait polling, again
	as suggested by Lai Jiangshan.

o	Repositioned synchronize_sched() calls and improved their
	comments.

Changes from v1:

o	The lockdep issue with list locking was finessed by ditching
	list locking in favor of having the list manipulated by a single
	kthread.  This change trimmed about 150 highly concurrent lines
	from the implementation.

o	Get rid of the scheduler hooks in favor of polling the
	per-task count of voluntary context switches, in response
	to Peter Zijlstra's concerns about scheduler overhead.

o	Passes more aggressive rcutorture runs, which indicates that
	an increase in rcutorture's aggression is called for.

o	Handled review comments from Peter Zijlstra, Lai Jiangshan,
	Frederic Weisbecker, and Oleg Nesterov.

o	Added RCU-tasks stall-warning documentation.

Remaining issues include:

o	The current implementation does not yet recognize tasks that start
	out executing is usermode.  Instead, it waits for the next
	scheduling-clock tick to note them.

o	If a task is preempted while executing in usermode, the RCU-tasks
	grace period will not end until that task resumes.

o	More about RCU-tasks needs to be added to Documentation/RCU.

o	CPUs that would otherwise remain idle throughout an RCU-tasks
	grace period will be interrupted once.  (It might be possible
	to do without this, discussions in flight.)

o	There are concerns that use of this very special-purpose RCU
	API might grow unexpectedly large.  One idea under discussion
	is to export this API only to specific subsystems.

o	There are probably still bugs.

							Thanx, Paul

------------------------------------------------------------------------

 b/Documentation/RCU/stallwarn.txt                             |   33 
 b/Documentation/kernel-parameters.txt                         |    5 
 b/fs/file.c                                                   |    2 
 b/include/linux/init_task.h                                   |   12 
 b/include/linux/rcupdate.h                                    |   57 +
 b/include/linux/sched.h                                       |   25 
 b/init/Kconfig                                                |   10 
 b/kernel/exit.c                                               |    3 
 b/kernel/rcu/rcutorture.c                                     |   54 +
 b/kernel/rcu/tiny.c                                           |    2 
 b/kernel/rcu/tree.c                                           |   16 
 b/kernel/rcu/tree.h                                           |    2 
 b/kernel/rcu/tree_plugin.h                                    |   18 
 b/kernel/rcu/update.c                                         |  413 +++++++++-
 b/mm/mlock.c                                                  |    2 
 b/tools/testing/selftests/rcutorture/configs/rcu/TASKS01      |    9 
 b/tools/testing/selftests/rcutorture/configs/rcu/TASKS01.boot |    1 
 b/tools/testing/selftests/rcutorture/configs/rcu/TASKS02      |    5 
 b/tools/testing/selftests/rcutorture/configs/rcu/TASKS02.boot |    1 
 b/tools/testing/selftests/rcutorture/configs/rcu/TASKS03      |   13 
 b/tools/testing/selftests/rcutorture/configs/rcu/TASKS03.boot |    1 
 21 files changed, 618 insertions(+), 66 deletions(-)


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v5 tip/core/rcu 01/16] rcu: Add call_rcu_tasks()
  2014-08-11 22:48 [PATCH tip/core/rcu 0/16] RCU-tasks implementation Paul E. McKenney
@ 2014-08-11 22:48 ` Paul E. McKenney
  2014-08-11 22:48   ` [PATCH v5 tip/core/rcu 02/16] rcu: Provide cond_resched_rcu_qs() to force quiescent states in long loops Paul E. McKenney
                     ` (15 more replies)
  2014-08-12 23:57 ` [PATCH tip/core/rcu 0/16] RCU-tasks implementation Paul E. McKenney
  1 sibling, 16 replies; 60+ messages in thread
From: Paul E. McKenney @ 2014-08-11 22:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

This commit adds a new RCU-tasks flavor of RCU, which provides
call_rcu_tasks().  This RCU flavor's quiescent states are voluntary
context switch (not preemption!), userspace execution, and the idle loop.
Note that unlike other RCU flavors, these quiescent states occur in tasks,
not necessarily CPUs.  Includes fixes from Steven Rostedt.

This RCU flavor is assumed to have very infrequent latency-tolerant
updaters.  This assumption permits significant simplifications, including
a single global callback list protected by a single global lock, along
with a single linked list containing all tasks that have not yet passed
through a quiescent state.  If experience shows this assumption to be
incorrect, the required additional complexity will be added.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/init_task.h |   9 +++
 include/linux/rcupdate.h  |  36 ++++++++++
 include/linux/sched.h     |  23 ++++---
 init/Kconfig              |  10 +++
 kernel/rcu/tiny.c         |   2 +
 kernel/rcu/tree.c         |   2 +
 kernel/rcu/update.c       | 171 ++++++++++++++++++++++++++++++++++++++++++++++
 7 files changed, 242 insertions(+), 11 deletions(-)

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 6df7f9fe0d01..78715ea7c30c 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -124,6 +124,14 @@ extern struct group_info init_groups;
 #else
 #define INIT_TASK_RCU_PREEMPT(tsk)
 #endif
+#ifdef CONFIG_TASKS_RCU
+#define INIT_TASK_RCU_TASKS(tsk)					\
+	.rcu_tasks_holdout = false,					\
+	.rcu_tasks_holdout_list =					\
+		LIST_HEAD_INIT(tsk.rcu_tasks_holdout_list),
+#else
+#define INIT_TASK_RCU_TASKS(tsk)
+#endif
 
 extern struct cred init_cred;
 
@@ -231,6 +239,7 @@ extern struct task_group root_task_group;
 	INIT_FTRACE_GRAPH						\
 	INIT_TRACE_RECURSION						\
 	INIT_TASK_RCU_PREEMPT(tsk)					\
+	INIT_TASK_RCU_TASKS(tsk)					\
 	INIT_CPUSET_SEQ(tsk)						\
 	INIT_RT_MUTEXES(tsk)						\
 	INIT_VTIME(tsk)							\
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 6a94cc8b1ca0..829efc99df3e 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -197,6 +197,26 @@ void call_rcu_sched(struct rcu_head *head,
 
 void synchronize_sched(void);
 
+/**
+ * call_rcu_tasks() - Queue an RCU for invocation task-based grace period
+ * @head: structure to be used for queueing the RCU updates.
+ * @func: actual callback function to be invoked after the grace period
+ *
+ * The callback function will be invoked some time after a full grace
+ * period elapses, in other words after all currently executing RCU
+ * read-side critical sections have completed. call_rcu_tasks() assumes
+ * that the read-side critical sections end at a voluntary context
+ * switch (not a preemption!), entry into idle, or transition to usermode
+ * execution.  As such, there are no read-side primitives analogous to
+ * rcu_read_lock() and rcu_read_unlock() because this primitive is intended
+ * to determine that all tasks have passed through a safe state, not so
+ * much for data-strcuture synchronization.
+ *
+ * See the description of call_rcu() for more detailed information on
+ * memory ordering guarantees.
+ */
+void call_rcu_tasks(struct rcu_head *head, void (*func)(struct rcu_head *head));
+
 #ifdef CONFIG_PREEMPT_RCU
 
 void __rcu_read_lock(void);
@@ -294,6 +314,22 @@ static inline void rcu_user_hooks_switch(struct task_struct *prev,
 		rcu_irq_exit(); \
 	} while (0)
 
+/*
+ * Note a voluntary context switch for RCU-tasks benefit.  This is a
+ * macro rather than an inline function to avoid #include hell.
+ */
+#ifdef CONFIG_TASKS_RCU
+#define rcu_note_voluntary_context_switch(t) \
+	do { \
+		preempt_disable(); /* Exclude synchronize_sched(); */ \
+		if (ACCESS_ONCE((t)->rcu_tasks_holdout)) \
+			ACCESS_ONCE((t)->rcu_tasks_holdout) = 0; \
+		preempt_enable(); \
+	} while (0)
+#else /* #ifdef CONFIG_TASKS_RCU */
+#define rcu_note_voluntary_context_switch(t)	do { } while (0)
+#endif /* #else #ifdef CONFIG_TASKS_RCU */
+
 #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) || defined(CONFIG_SMP)
 bool __rcu_is_watching(void);
 #endif /* #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) || defined(CONFIG_SMP) */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 306f4f0c987a..3cf124389ec7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1273,6 +1273,11 @@ struct task_struct {
 #ifdef CONFIG_RCU_BOOST
 	struct rt_mutex *rcu_boost_mutex;
 #endif /* #ifdef CONFIG_RCU_BOOST */
+#ifdef CONFIG_TASKS_RCU
+	unsigned long rcu_tasks_nvcsw;
+	int rcu_tasks_holdout;
+	struct list_head rcu_tasks_holdout_list;
+#endif /* #ifdef CONFIG_TASKS_RCU */
 
 #if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT)
 	struct sched_info sched_info;
@@ -1998,31 +2003,27 @@ extern void task_clear_jobctl_pending(struct task_struct *task,
 				      unsigned int mask);
 
 #ifdef CONFIG_PREEMPT_RCU
-
 #define RCU_READ_UNLOCK_BLOCKED (1 << 0) /* blocked while in RCU read-side. */
 #define RCU_READ_UNLOCK_NEED_QS (1 << 1) /* RCU core needs CPU response. */
+#endif /* #ifdef CONFIG_PREEMPT_RCU */
 
 static inline void rcu_copy_process(struct task_struct *p)
 {
+#ifdef CONFIG_PREEMPT_RCU
 	p->rcu_read_lock_nesting = 0;
 	p->rcu_read_unlock_special = 0;
-#ifdef CONFIG_TREE_PREEMPT_RCU
 	p->rcu_blocked_node = NULL;
-#endif /* #ifdef CONFIG_TREE_PREEMPT_RCU */
 #ifdef CONFIG_RCU_BOOST
 	p->rcu_boost_mutex = NULL;
 #endif /* #ifdef CONFIG_RCU_BOOST */
 	INIT_LIST_HEAD(&p->rcu_node_entry);
+#endif /* #ifdef CONFIG_PREEMPT_RCU */
+#ifdef CONFIG_TASKS_RCU
+	p->rcu_tasks_holdout = false;
+	INIT_LIST_HEAD(&p->rcu_tasks_holdout_list);
+#endif /* #ifdef CONFIG_TASKS_RCU */
 }
 
-#else
-
-static inline void rcu_copy_process(struct task_struct *p)
-{
-}
-
-#endif
-
 static inline void tsk_restore_flags(struct task_struct *task,
 				unsigned long orig_flags, unsigned long flags)
 {
diff --git a/init/Kconfig b/init/Kconfig
index 9d76b99af1b9..c56cb62a2df1 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -507,6 +507,16 @@ config PREEMPT_RCU
 	  This option enables preemptible-RCU code that is common between
 	  the TREE_PREEMPT_RCU and TINY_PREEMPT_RCU implementations.
 
+config TASKS_RCU
+	bool "Task_based RCU implementation using voluntary context switch"
+	default n
+	help
+	  This option enables a task-based RCU implementation that uses
+	  only voluntary context switch (not preemption!), idle, and
+	  user-mode execution as quiescent states.
+
+	  If unsure, say N.
+
 config RCU_STALL_COMMON
 	def_bool ( TREE_RCU || TREE_PREEMPT_RCU || RCU_TRACE )
 	help
diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
index d9efcc13008c..717f00854fc0 100644
--- a/kernel/rcu/tiny.c
+++ b/kernel/rcu/tiny.c
@@ -254,6 +254,8 @@ void rcu_check_callbacks(int cpu, int user)
 		rcu_sched_qs(cpu);
 	else if (!in_softirq())
 		rcu_bh_qs(cpu);
+	if (user)
+		rcu_note_voluntary_context_switch(current);
 }
 
 /*
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 625d0b0cd75a..f958c52f644d 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2413,6 +2413,8 @@ void rcu_check_callbacks(int cpu, int user)
 	rcu_preempt_check_callbacks(cpu);
 	if (rcu_pending(cpu))
 		invoke_rcu_core();
+	if (user)
+		rcu_note_voluntary_context_switch(current);
 	trace_rcu_utilization(TPS("End scheduler-tick"));
 }
 
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index bc7883570530..f6f164119a14 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -47,6 +47,7 @@
 #include <linux/hardirq.h>
 #include <linux/delay.h>
 #include <linux/module.h>
+#include <linux/kthread.h>
 
 #define CREATE_TRACE_POINTS
 
@@ -350,3 +351,173 @@ static int __init check_cpu_stall_init(void)
 early_initcall(check_cpu_stall_init);
 
 #endif /* #ifdef CONFIG_RCU_STALL_COMMON */
+
+#ifdef CONFIG_TASKS_RCU
+
+/*
+ * Simple variant of RCU whose quiescent states are voluntary context switch,
+ * user-space execution, and idle.  As such, grace periods can take one good
+ * long time.  There are no read-side primitives similar to rcu_read_lock()
+ * and rcu_read_unlock() because this implementation is intended to get
+ * the system into a safe state for some of the manipulations involved in
+ * tracing and the like.  Finally, this implementation does not support
+ * high call_rcu_tasks() rates from multiple CPUs.  If this is required,
+ * per-CPU callback lists will be needed.
+ */
+
+/* Global list of callbacks and associated lock. */
+static struct rcu_head *rcu_tasks_cbs_head;
+static struct rcu_head **rcu_tasks_cbs_tail = &rcu_tasks_cbs_head;
+static DEFINE_RAW_SPINLOCK(rcu_tasks_cbs_lock);
+
+/* Post an RCU-tasks callback. */
+void call_rcu_tasks(struct rcu_head *rhp, void (*func)(struct rcu_head *rhp))
+{
+	unsigned long flags;
+
+	rhp->next = NULL;
+	rhp->func = func;
+	raw_spin_lock_irqsave(&rcu_tasks_cbs_lock, flags);
+	*rcu_tasks_cbs_tail = rhp;
+	rcu_tasks_cbs_tail = &rhp->next;
+	raw_spin_unlock_irqrestore(&rcu_tasks_cbs_lock, flags);
+}
+EXPORT_SYMBOL_GPL(call_rcu_tasks);
+
+/* See if tasks are still holding out, complain if so. */
+static void check_holdout_task(struct task_struct *t)
+{
+	if (!ACCESS_ONCE(t->rcu_tasks_holdout) ||
+	    t->rcu_tasks_nvcsw != ACCESS_ONCE(t->nvcsw) ||
+	    !ACCESS_ONCE(t->on_rq)) {
+		ACCESS_ONCE(t->rcu_tasks_holdout) = 0;
+		list_del_rcu(&t->rcu_tasks_holdout_list);
+		put_task_struct(t);
+	}
+}
+
+/* RCU-tasks kthread that detects grace periods and invokes callbacks. */
+static int __noreturn rcu_tasks_kthread(void *arg)
+{
+	unsigned long flags;
+	struct task_struct *g, *t;
+	struct rcu_head *list;
+	struct rcu_head *next;
+	LIST_HEAD(rcu_tasks_holdouts);
+
+	/* FIXME: Add housekeeping affinity. */
+
+	/*
+	 * Each pass through the following loop makes one check for
+	 * newly arrived callbacks, and, if there are some, waits for
+	 * one RCU-tasks grace period and then invokes the callbacks.
+	 * This loop is terminated by the system going down.  ;-)
+	 */
+	for (;;) {
+
+		/* Pick up any new callbacks. */
+		raw_spin_lock_irqsave(&rcu_tasks_cbs_lock, flags);
+		list = rcu_tasks_cbs_head;
+		rcu_tasks_cbs_head = NULL;
+		rcu_tasks_cbs_tail = &rcu_tasks_cbs_head;
+		raw_spin_unlock_irqrestore(&rcu_tasks_cbs_lock, flags);
+
+		/* If there were none, wait a bit and start over. */
+		if (!list) {
+			schedule_timeout_interruptible(HZ);
+			WARN_ON(signal_pending(current));
+			continue;
+		}
+
+		/*
+		 * Wait for all pre-existing t->on_rq and t->nvcsw
+		 * transitions to complete.  Invoking synchronize_sched()
+		 * suffices because all these transitions occur with
+		 * interrupts disabled.  Without this synchronize_sched(),
+		 * a read-side critical section that started before the
+		 * grace period might be incorrectly seen as having started
+		 * after the grace period.
+		 *
+		 * This synchronize_sched() also dispenses with the
+		 * need for a memory barrier on the first store to
+		 * ->rcu_tasks_holdout, as it forces the store to happen
+		 * after the beginning of the grace period.
+		 */
+		synchronize_sched();
+
+		/*
+		 * There were callbacks, so we need to wait for an
+		 * RCU-tasks grace period.  Start off by scanning
+		 * the task list for tasks that are not already
+		 * voluntarily blocked.  Mark these tasks and make
+		 * a list of them in rcu_tasks_holdouts.
+		 */
+		rcu_read_lock();
+		for_each_process_thread(g, t) {
+			if (t != current && ACCESS_ONCE(t->on_rq) &&
+			    !is_idle_task(t)) {
+				get_task_struct(t);
+				t->rcu_tasks_nvcsw = ACCESS_ONCE(t->nvcsw);
+				ACCESS_ONCE(t->rcu_tasks_holdout) = 1;
+				list_add(&t->rcu_tasks_holdout_list,
+					 &rcu_tasks_holdouts);
+			}
+		}
+		rcu_read_unlock();
+
+		/*
+		 * Each pass through the following loop scans the list
+		 * of holdout tasks, removing any that are no longer
+		 * holdouts.  When the list is empty, we are done.
+		 */
+		while (!list_empty(&rcu_tasks_holdouts)) {
+			schedule_timeout_interruptible(HZ);
+			WARN_ON(signal_pending(current));
+			rcu_read_lock();
+			list_for_each_entry_rcu(t, &rcu_tasks_holdouts,
+						rcu_tasks_holdout_list)
+				check_holdout_task(t);
+			rcu_read_unlock();
+		}
+
+		/*
+		 * Because ->on_rq and ->nvcsw are not guaranteed
+		 * to have a full memory barriers prior to them in the
+		 * schedule() path, memory reordering on other CPUs could
+		 * cause their RCU-tasks read-side critical sections to
+		 * extend past the end of the grace period.  However,
+		 * because these ->nvcsw updates are carried out with
+		 * interrupts disabled, we can use synchronize_sched()
+		 * to force the needed ordering on all such CPUs.
+		 *
+		 * This synchronize_sched() also confines all
+		 * ->rcu_tasks_holdout accesses to be within the grace
+		 * period, avoiding the need for memory barriers for
+		 * ->rcu_tasks_holdout accesses.
+		 */
+		synchronize_sched();
+
+		/* Invoke the callbacks. */
+		while (list) {
+			next = list->next;
+			local_bh_disable();
+			list->func(list);
+			local_bh_enable();
+			list = next;
+			cond_resched();
+		}
+	}
+}
+
+/* Spawn rcu_tasks_kthread() at boot time. */
+static int __init rcu_spawn_tasks_kthread(void)
+{
+	struct task_struct __maybe_unused *t;
+
+	t = kthread_run(rcu_tasks_kthread, NULL, "rcu_tasks_kthread");
+	BUG_ON(IS_ERR(t));
+	return 0;
+}
+early_initcall(rcu_spawn_tasks_kthread);
+
+#endif /* #ifdef CONFIG_TASKS_RCU */
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v5 tip/core/rcu 02/16] rcu: Provide cond_resched_rcu_qs() to force quiescent states in long loops
  2014-08-11 22:48 ` [PATCH v5 tip/core/rcu 01/16] rcu: Add call_rcu_tasks() Paul E. McKenney
@ 2014-08-11 22:48   ` Paul E. McKenney
  2014-08-11 22:48   ` [PATCH v5 tip/core/rcu 03/16] rcu: Add synchronous grace-period waiting for RCU-tasks Paul E. McKenney
                     ` (14 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: Paul E. McKenney @ 2014-08-11 22:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

RCU-tasks requires the occasional voluntary context switch
from CPU-bound in-kernel tasks.  In some cases, this requires
instrumenting cond_resched().  However, there is some reluctance
to countenance unconditionally instrumenting cond_resched() (see
http://lwn.net/Articles/603252/), so this commit creates a separate
cond_resched_rcu_qs() that may be used in place of cond_resched() in
locations prone to long-duration in-kernel looping.

This commit currently instruments only RCU-tasks.  Future possibilities
include also instrumenting RCU, RCU-bh, and RCU-sched in order to reduce
IPI usage.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 fs/file.c                |  2 +-
 include/linux/rcupdate.h | 13 +++++++++++++
 kernel/rcu/rcutorture.c  |  4 ++--
 kernel/rcu/tree.c        | 12 ++++++------
 kernel/rcu/tree_plugin.h |  2 +-
 mm/mlock.c               |  2 +-
 6 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index 66923fe3176e..1cafc4c9275b 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -367,7 +367,7 @@ static struct fdtable *close_files(struct files_struct * files)
 				struct file * file = xchg(&fdt->fd[i], NULL);
 				if (file) {
 					filp_close(file, files);
-					cond_resched();
+					cond_resched_rcu_qs();
 				}
 			}
 			i++;
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 829efc99df3e..ac87f587a1c1 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -330,6 +330,19 @@ static inline void rcu_user_hooks_switch(struct task_struct *prev,
 #define rcu_note_voluntary_context_switch(t)	do { } while (0)
 #endif /* #else #ifdef CONFIG_TASKS_RCU */
 
+/**
+ * cond_resched_rcu_qs - Report potential quiescent states to RCU
+ *
+ * This macro resembles cond_resched(), except that it is defined to
+ * report potential quiescent states to RCU-tasks even if the cond_resched()
+ * machinery were to be shut off, as some advocate for PREEMPT kernels.
+ */
+#define cond_resched_rcu_qs() \
+do { \
+	rcu_note_voluntary_context_switch(current); \
+	cond_resched(); \
+} while (0)
+
 #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) || defined(CONFIG_SMP)
 bool __rcu_is_watching(void);
 #endif /* #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) || defined(CONFIG_SMP) */
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 7fa34f86e5ba..febe07062ac5 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -667,7 +667,7 @@ static int rcu_torture_boost(void *arg)
 				}
 				call_rcu_time = jiffies;
 			}
-			cond_resched();
+			cond_resched_rcu_qs();
 			stutter_wait("rcu_torture_boost");
 			if (torture_must_stop())
 				goto checkwait;
@@ -1019,7 +1019,7 @@ rcu_torture_reader(void *arg)
 		__this_cpu_inc(rcu_torture_batch[completed]);
 		preempt_enable();
 		cur_ops->readunlock(idx);
-		cond_resched();
+		cond_resched_rcu_qs();
 		stutter_wait("rcu_torture_reader");
 	} while (!torture_must_stop());
 	if (irqreader && cur_ops->irq_capable) {
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index f958c52f644d..645a33efc0d4 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1650,7 +1650,7 @@ static int rcu_gp_init(struct rcu_state *rsp)
 		    system_state == SYSTEM_RUNNING)
 			udelay(200);
 #endif /* #ifdef CONFIG_PROVE_RCU_DELAY */
-		cond_resched();
+		cond_resched_rcu_qs();
 	}
 
 	mutex_unlock(&rsp->onoff_mutex);
@@ -1739,7 +1739,7 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
 		/* smp_mb() provided by prior unlock-lock pair. */
 		nocb += rcu_future_gp_cleanup(rsp, rnp);
 		raw_spin_unlock_irq(&rnp->lock);
-		cond_resched();
+		cond_resched_rcu_qs();
 	}
 	rnp = rcu_get_root(rsp);
 	raw_spin_lock_irq(&rnp->lock);
@@ -1788,7 +1788,7 @@ static int __noreturn rcu_gp_kthread(void *arg)
 			/* Locking provides needed memory barrier. */
 			if (rcu_gp_init(rsp))
 				break;
-			cond_resched();
+			cond_resched_rcu_qs();
 			flush_signals(current);
 			trace_rcu_grace_period(rsp->name,
 					       ACCESS_ONCE(rsp->gpnum),
@@ -1831,10 +1831,10 @@ static int __noreturn rcu_gp_kthread(void *arg)
 				trace_rcu_grace_period(rsp->name,
 						       ACCESS_ONCE(rsp->gpnum),
 						       TPS("fqsend"));
-				cond_resched();
+				cond_resched_rcu_qs();
 			} else {
 				/* Deal with stray signal. */
-				cond_resched();
+				cond_resched_rcu_qs();
 				flush_signals(current);
 				trace_rcu_grace_period(rsp->name,
 						       ACCESS_ONCE(rsp->gpnum),
@@ -2437,7 +2437,7 @@ static void force_qs_rnp(struct rcu_state *rsp,
 	struct rcu_node *rnp;
 
 	rcu_for_each_leaf_node(rsp, rnp) {
-		cond_resched();
+		cond_resched_rcu_qs();
 		mask = 0;
 		raw_spin_lock_irqsave(&rnp->lock, flags);
 		smp_mb__after_unlock_lock();
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 02ac0fb186b8..a86a363ea453 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1842,7 +1842,7 @@ static int rcu_oom_notify(struct notifier_block *self,
 	get_online_cpus();
 	for_each_online_cpu(cpu) {
 		smp_call_function_single(cpu, rcu_oom_notify_cpu, NULL, 1);
-		cond_resched();
+		cond_resched_rcu_qs();
 	}
 	put_online_cpus();
 
diff --git a/mm/mlock.c b/mm/mlock.c
index b1eb53634005..bc386a22d647 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -782,7 +782,7 @@ static int do_mlockall(int flags)
 
 		/* Ignore errors */
 		mlock_fixup(vma, &prev, vma->vm_start, vma->vm_end, newflags);
-		cond_resched();
+		cond_resched_rcu_qs();
 	}
 out:
 	return 0;
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v5 tip/core/rcu 03/16] rcu: Add synchronous grace-period waiting for RCU-tasks
  2014-08-11 22:48 ` [PATCH v5 tip/core/rcu 01/16] rcu: Add call_rcu_tasks() Paul E. McKenney
  2014-08-11 22:48   ` [PATCH v5 tip/core/rcu 02/16] rcu: Provide cond_resched_rcu_qs() to force quiescent states in long loops Paul E. McKenney
@ 2014-08-11 22:48   ` Paul E. McKenney
  2014-08-11 22:48   ` [PATCH v5 tip/core/rcu 04/16] rcu: Make TASKS_RCU handle tasks that are almost done exiting Paul E. McKenney
                     ` (13 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: Paul E. McKenney @ 2014-08-11 22:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

It turns out to be easier to add the synchronous grace-period waiting
functions to RCU-tasks than to work around their absense in rcutorture,
so this commit adds them.  The key point is that the existence of
call_rcu_tasks() means that rcutorture needs an rcu_barrier_tasks().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/rcupdate.h |  2 ++
 kernel/rcu/update.c      | 55 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index ac87f587a1c1..1f073af940a5 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -216,6 +216,8 @@ void synchronize_sched(void);
  * memory ordering guarantees.
  */
 void call_rcu_tasks(struct rcu_head *head, void (*func)(struct rcu_head *head));
+void synchronize_rcu_tasks(void);
+void rcu_barrier_tasks(void);
 
 #ifdef CONFIG_PREEMPT_RCU
 
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index f6f164119a14..d6fa54befa22 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -384,6 +384,61 @@ void call_rcu_tasks(struct rcu_head *rhp, void (*func)(struct rcu_head *rhp))
 }
 EXPORT_SYMBOL_GPL(call_rcu_tasks);
 
+/**
+ * synchronize_rcu_tasks - wait until an rcu-tasks grace period has elapsed.
+ *
+ * Control will return to the caller some time after a full rcu-tasks
+ * grace period has elapsed, in other words after all currently
+ * executing rcu-tasks read-side critical sections have elapsed.  These
+ * read-side critical sections are delimited by calls to schedule(),
+ * cond_resched_rcu_qs(), idle execution, userspace execution, calls
+ * to synchronize_rcu_tasks(), and (in theory, anyway) cond_resched().
+ *
+ * This is a very specialized primitive, intended only for a few uses in
+ * tracing and other situations requiring manipulation of function
+ * preambles and profiling hooks.  The synchronize_rcu_tasks() function
+ * is not (yet) intended for heavy use from multiple CPUs.
+ *
+ * Note that this guarantee implies further memory-ordering guarantees.
+ * On systems with more than one CPU, when synchronize_rcu_tasks() returns,
+ * each CPU is guaranteed to have executed a full memory barrier since the
+ * end of its last RCU-tasks read-side critical section whose beginning
+ * preceded the call to synchronize_rcu_tasks().  In addition, each CPU
+ * having an RCU-tasks read-side critical section that extends beyond
+ * the return from synchronize_rcu_tasks() is guaranteed to have executed
+ * a full memory barrier after the beginning of synchronize_rcu_tasks()
+ * and before the beginning of that RCU-tasks read-side critical section.
+ * Note that these guarantees include CPUs that are offline, idle, or
+ * executing in user mode, as well as CPUs that are executing in the kernel.
+ *
+ * Furthermore, if CPU A invoked synchronize_rcu_tasks(), which returned
+ * to its caller on CPU B, then both CPU A and CPU B are guaranteed
+ * to have executed a full memory barrier during the execution of
+ * synchronize_rcu_tasks() -- even if CPU A and CPU B are the same CPU
+ * (but again only if the system has more than one CPU).
+ */
+void synchronize_rcu_tasks(void)
+{
+	/* Complain if the scheduler has not started.  */
+	rcu_lockdep_assert(!rcu_scheduler_active,
+			   "synchronize_rcu_tasks called too soon");
+
+	/* Wait for the grace period. */
+	wait_rcu_gp(call_rcu_tasks);
+}
+
+/**
+ * rcu_barrier_tasks - Wait for in-flight call_rcu_tasks() callbacks.
+ *
+ * Although the current implementation is guaranteed to wait, it is not
+ * obligated to, for example, if there are no pending callbacks.
+ */
+void rcu_barrier_tasks(void)
+{
+	/* There is only one callback queue, so this is easy.  ;-) */
+	synchronize_rcu_tasks();
+}
+
 /* See if tasks are still holding out, complain if so. */
 static void check_holdout_task(struct task_struct *t)
 {
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v5 tip/core/rcu 04/16] rcu: Make TASKS_RCU handle tasks that are almost done exiting
  2014-08-11 22:48 ` [PATCH v5 tip/core/rcu 01/16] rcu: Add call_rcu_tasks() Paul E. McKenney
  2014-08-11 22:48   ` [PATCH v5 tip/core/rcu 02/16] rcu: Provide cond_resched_rcu_qs() to force quiescent states in long loops Paul E. McKenney
  2014-08-11 22:48   ` [PATCH v5 tip/core/rcu 03/16] rcu: Add synchronous grace-period waiting for RCU-tasks Paul E. McKenney
@ 2014-08-11 22:48   ` Paul E. McKenney
  2014-08-11 22:48   ` [PATCH v5 tip/core/rcu 05/16] rcu: Export RCU-tasks APIs to GPL modules Paul E. McKenney
                     ` (12 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: Paul E. McKenney @ 2014-08-11 22:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

Once a task has passed exit_notify() in the do_exit() code path, it
is no longer on the task lists, and is therefore no longer visible
to rcu_tasks_kthread().  This means that an almost-exited task might
be preempted while within a trampoline, and this task won't be waited
on by rcu_tasks_kthread().  This commit fixes this bug by adding an
srcu_struct.  An exiting task does srcu_read_lock() just before calling
exit_notify(), and does the corresponding srcu_read_unlock() after
doing the final preempt_disable().  This means that rcu_tasks_kthread()
can do synchronize_srcu() to wait for all mostly-exited tasks to reach
their final preempt_disable() region, and then use synchronize_sched()
to wait for those tasks to finish exiting.

Reported-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/rcupdate.h |  3 +++
 kernel/exit.c            |  3 +++
 kernel/rcu/update.c      | 21 +++++++++++++++++++++
 3 files changed, 27 insertions(+)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 1f073af940a5..e6aea256ad39 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -321,6 +321,8 @@ static inline void rcu_user_hooks_switch(struct task_struct *prev,
  * macro rather than an inline function to avoid #include hell.
  */
 #ifdef CONFIG_TASKS_RCU
+#define TASKS_RCU(x) x
+extern struct srcu_struct tasks_rcu_exit_srcu;
 #define rcu_note_voluntary_context_switch(t) \
 	do { \
 		preempt_disable(); /* Exclude synchronize_sched(); */ \
@@ -329,6 +331,7 @@ static inline void rcu_user_hooks_switch(struct task_struct *prev,
 		preempt_enable(); \
 	} while (0)
 #else /* #ifdef CONFIG_TASKS_RCU */
+#define TASKS_RCU(x) do { } while (0)
 #define rcu_note_voluntary_context_switch(t)	do { } while (0)
 #endif /* #else #ifdef CONFIG_TASKS_RCU */
 
diff --git a/kernel/exit.c b/kernel/exit.c
index e5c4668f1799..b63823876afc 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -663,6 +663,7 @@ void do_exit(long code)
 {
 	struct task_struct *tsk = current;
 	int group_dead;
+	TASKS_RCU(int tasks_rcu_i);
 
 	profile_task_exit(tsk);
 
@@ -772,6 +773,7 @@ void do_exit(long code)
 	 */
 	flush_ptrace_hw_breakpoint(tsk);
 
+	TASKS_RCU(tasks_rcu_i = __srcu_read_lock(&tasks_rcu_exit_srcu));
 	exit_notify(tsk, group_dead);
 	proc_exit_connector(tsk);
 #ifdef CONFIG_NUMA
@@ -811,6 +813,7 @@ void do_exit(long code)
 	if (tsk->nr_dirtied)
 		__this_cpu_add(dirty_throttle_leaks, tsk->nr_dirtied);
 	exit_rcu();
+	TASKS_RCU(__srcu_read_unlock(&tasks_rcu_exit_srcu, tasks_rcu_i));
 
 	/*
 	 * The setting of TASK_RUNNING by try_to_wake_up() may be delayed
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index d6fa54befa22..4cece6e886ee 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -370,6 +370,13 @@ static struct rcu_head *rcu_tasks_cbs_head;
 static struct rcu_head **rcu_tasks_cbs_tail = &rcu_tasks_cbs_head;
 static DEFINE_RAW_SPINLOCK(rcu_tasks_cbs_lock);
 
+/* Track exiting tasks in order to allow them to be waited for. */
+DEFINE_SRCU(tasks_rcu_exit_srcu);
+
+/* Control stall timeouts.  Disable with <= 0, otherwise jiffies till stall. */
+static int rcu_task_stall_timeout __read_mostly = HZ * 60 * 3;
+module_param(rcu_task_stall_timeout, int, 0644);
+
 /* Post an RCU-tasks callback. */
 void call_rcu_tasks(struct rcu_head *rhp, void (*func)(struct rcu_head *rhp))
 {
@@ -521,6 +528,15 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 		rcu_read_unlock();
 
 		/*
+		 * Wait for tasks that are in the process of exiting.
+		 * This does only part of the job, ensuring that all
+		 * tasks that were previously exiting reach the point
+		 * where they have disabled preemption, allowing the
+		 * later synchronize_sched() to finish the job.
+		 */
+		synchronize_srcu(&tasks_rcu_exit_srcu);
+
+		/*
 		 * Each pass through the following loop scans the list
 		 * of holdout tasks, removing any that are no longer
 		 * holdouts.  When the list is empty, we are done.
@@ -549,6 +565,11 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 		 * ->rcu_tasks_holdout accesses to be within the grace
 		 * period, avoiding the need for memory barriers for
 		 * ->rcu_tasks_holdout accesses.
+		 *
+		 * In addition, this synchronize_sched() waits for exiting
+		 * tasks to complete their final preempt_disable() region
+		 * of execution, cleaning up after the synchronize_srcu()
+		 * above.
 		 */
 		synchronize_sched();
 
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v5 tip/core/rcu 05/16] rcu: Export RCU-tasks APIs to GPL modules
  2014-08-11 22:48 ` [PATCH v5 tip/core/rcu 01/16] rcu: Add call_rcu_tasks() Paul E. McKenney
                     ` (2 preceding siblings ...)
  2014-08-11 22:48   ` [PATCH v5 tip/core/rcu 04/16] rcu: Make TASKS_RCU handle tasks that are almost done exiting Paul E. McKenney
@ 2014-08-11 22:48   ` Paul E. McKenney
  2014-08-14 19:08     ` Pranith Kumar
  2014-08-11 22:48   ` [PATCH v5 tip/core/rcu 06/16] rcutorture: Add torture tests for RCU-tasks Paul E. McKenney
                     ` (11 subsequent siblings)
  15 siblings, 1 reply; 60+ messages in thread
From: Paul E. McKenney @ 2014-08-11 22:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: Steven Rostedt <rostedt@goodmis.org>

This commit exports the RCU-tasks APIs, call_rcu_tasks(),
synchronize_rcu_tasks(), and rcu_barrier_tasks(), to GPL-licensed
kernel modules.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
---
 kernel/rcu/update.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index 4cece6e886ee..8f53a41dd9ee 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -433,6 +433,7 @@ void synchronize_rcu_tasks(void)
 	/* Wait for the grace period. */
 	wait_rcu_gp(call_rcu_tasks);
 }
+EXPORT_SYMBOL_GPL(synchronize_rcu_tasks);
 
 /**
  * rcu_barrier_tasks - Wait for in-flight call_rcu_tasks() callbacks.
@@ -445,6 +446,7 @@ void rcu_barrier_tasks(void)
 	/* There is only one callback queue, so this is easy.  ;-) */
 	synchronize_rcu_tasks();
 }
+EXPORT_SYMBOL_GPL(rcu_barrier_tasks);
 
 /* See if tasks are still holding out, complain if so. */
 static void check_holdout_task(struct task_struct *t)
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v5 tip/core/rcu 06/16] rcutorture: Add torture tests for RCU-tasks
  2014-08-11 22:48 ` [PATCH v5 tip/core/rcu 01/16] rcu: Add call_rcu_tasks() Paul E. McKenney
                     ` (3 preceding siblings ...)
  2014-08-11 22:48   ` [PATCH v5 tip/core/rcu 05/16] rcu: Export RCU-tasks APIs to GPL modules Paul E. McKenney
@ 2014-08-11 22:48   ` Paul E. McKenney
  2014-08-14 21:34     ` Pranith Kumar
  2014-08-11 22:48   ` [PATCH v5 tip/core/rcu 07/16] rcutorture: Add RCU-tasks test cases Paul E. McKenney
                     ` (10 subsequent siblings)
  15 siblings, 1 reply; 60+ messages in thread
From: Paul E. McKenney @ 2014-08-11 22:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

This commit adds torture tests for RCU-tasks.  It also fixes a bug that
would segfault for an RCU flavor lacking a callback-barrier function.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
---
 include/linux/rcupdate.h |  1 +
 kernel/rcu/rcutorture.c  | 50 +++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index e6aea256ad39..f504f797c9c8 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -55,6 +55,7 @@ enum rcutorture_type {
 	RCU_FLAVOR,
 	RCU_BH_FLAVOR,
 	RCU_SCHED_FLAVOR,
+	RCU_TASKS_FLAVOR,
 	SRCU_FLAVOR,
 	INVALID_RCU_FLAVOR
 };
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index febe07062ac5..52423f2c74da 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -601,6 +601,52 @@ static struct rcu_torture_ops sched_ops = {
 	.name		= "sched"
 };
 
+#ifdef CONFIG_TASKS_RCU
+
+/*
+ * Definitions for RCU-tasks torture testing.
+ */
+
+static int tasks_torture_read_lock(void)
+{
+	return 0;
+}
+
+static void tasks_torture_read_unlock(int idx)
+{
+}
+
+static void rcu_tasks_torture_deferred_free(struct rcu_torture *p)
+{
+	call_rcu_tasks(&p->rtort_rcu, rcu_torture_cb);
+}
+
+static struct rcu_torture_ops tasks_ops = {
+	.ttype		= RCU_TASKS_FLAVOR,
+	.init		= rcu_sync_torture_init,
+	.readlock	= tasks_torture_read_lock,
+	.read_delay	= rcu_read_delay,  /* just reuse rcu's version. */
+	.readunlock	= tasks_torture_read_unlock,
+	.completed	= rcu_no_completed,
+	.deferred_free	= rcu_tasks_torture_deferred_free,
+	.sync		= synchronize_rcu_tasks,
+	.exp_sync	= synchronize_rcu_tasks,
+	.call		= call_rcu_tasks,
+	.cb_barrier	= rcu_barrier_tasks,
+	.fqs		= NULL,
+	.stats		= NULL,
+	.irq_capable	= 1,
+	.name		= "tasks"
+};
+
+#define RCUTORTURE_TASKS_OPS &tasks_ops,
+
+#else /* #ifdef CONFIG_TASKS_RCU */
+
+#define RCUTORTURE_TASKS_OPS
+
+#endif /* #else #ifdef CONFIG_TASKS_RCU */
+
 /*
  * RCU torture priority-boost testing.  Runs one real-time thread per
  * CPU for moderate bursts, repeatedly registering RCU callbacks and
@@ -1295,7 +1341,8 @@ static int rcu_torture_barrier_cbs(void *arg)
 		if (atomic_dec_and_test(&barrier_cbs_count))
 			wake_up(&barrier_wq);
 	} while (!torture_must_stop());
-	cur_ops->cb_barrier();
+	if (cur_ops->cb_barrier != NULL)
+		cur_ops->cb_barrier();
 	destroy_rcu_head_on_stack(&rcu);
 	torture_kthread_stopping("rcu_torture_barrier_cbs");
 	return 0;
@@ -1534,6 +1581,7 @@ rcu_torture_init(void)
 	int firsterr = 0;
 	static struct rcu_torture_ops *torture_ops[] = {
 		&rcu_ops, &rcu_bh_ops, &rcu_busted_ops, &srcu_ops, &sched_ops,
+		RCUTORTURE_TASKS_OPS
 	};
 
 	if (!torture_init_begin(torture_type, verbose, &rcutorture_runnable))
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v5 tip/core/rcu 07/16] rcutorture: Add RCU-tasks test cases
  2014-08-11 22:48 ` [PATCH v5 tip/core/rcu 01/16] rcu: Add call_rcu_tasks() Paul E. McKenney
                     ` (4 preceding siblings ...)
  2014-08-11 22:48   ` [PATCH v5 tip/core/rcu 06/16] rcutorture: Add torture tests for RCU-tasks Paul E. McKenney
@ 2014-08-11 22:48   ` Paul E. McKenney
  2014-08-11 22:48   ` [PATCH v5 tip/core/rcu 08/16] rcu: Add stall-warning checks for RCU-tasks Paul E. McKenney
                     ` (9 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: Paul E. McKenney @ 2014-08-11 22:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

This commit adds the TASKS01 and TASKS02 Kconfig fragments, along with
the corresponding TASKS01.boot and TASKS02.boot boot-parameter files
specifying that rcutorture test RCU-tasks instead of the default flavor.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 tools/testing/selftests/rcutorture/configs/rcu/TASKS01      |  9 +++++++++
 tools/testing/selftests/rcutorture/configs/rcu/TASKS01.boot |  1 +
 tools/testing/selftests/rcutorture/configs/rcu/TASKS02      |  5 +++++
 tools/testing/selftests/rcutorture/configs/rcu/TASKS02.boot |  1 +
 tools/testing/selftests/rcutorture/configs/rcu/TASKS03      | 13 +++++++++++++
 tools/testing/selftests/rcutorture/configs/rcu/TASKS03.boot |  1 +
 6 files changed, 30 insertions(+)
 create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/TASKS01
 create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/TASKS01.boot
 create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/TASKS02
 create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/TASKS02.boot
 create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/TASKS03
 create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/TASKS03.boot

diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TASKS01 b/tools/testing/selftests/rcutorture/configs/rcu/TASKS01
new file mode 100644
index 000000000000..97f0a0b27ef7
--- /dev/null
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TASKS01
@@ -0,0 +1,9 @@
+CONFIG_SMP=y
+CONFIG_NR_CPUS=2
+CONFIG_HOTPLUG_CPU=y
+CONFIG_PREEMPT_NONE=n
+CONFIG_PREEMPT_VOLUNTARY=n
+CONFIG_PREEMPT=y
+CONFIG_DEBUG_LOCK_ALLOC=y
+CONFIG_PROVE_RCU=y
+CONFIG_TASKS_RCU=y
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TASKS01.boot b/tools/testing/selftests/rcutorture/configs/rcu/TASKS01.boot
new file mode 100644
index 000000000000..cd2a188eeb6d
--- /dev/null
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TASKS01.boot
@@ -0,0 +1 @@
+rcutorture.torture_type=tasks
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TASKS02 b/tools/testing/selftests/rcutorture/configs/rcu/TASKS02
new file mode 100644
index 000000000000..696d2ea74d13
--- /dev/null
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TASKS02
@@ -0,0 +1,5 @@
+CONFIG_SMP=n
+CONFIG_PREEMPT_NONE=y
+CONFIG_PREEMPT_VOLUNTARY=n
+CONFIG_PREEMPT=n
+CONFIG_TASKS_RCU=y
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TASKS02.boot b/tools/testing/selftests/rcutorture/configs/rcu/TASKS02.boot
new file mode 100644
index 000000000000..cd2a188eeb6d
--- /dev/null
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TASKS02.boot
@@ -0,0 +1 @@
+rcutorture.torture_type=tasks
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TASKS03 b/tools/testing/selftests/rcutorture/configs/rcu/TASKS03
new file mode 100644
index 000000000000..9c60da5b5d1d
--- /dev/null
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TASKS03
@@ -0,0 +1,13 @@
+CONFIG_SMP=y
+CONFIG_NR_CPUS=2
+CONFIG_HOTPLUG_CPU=n
+CONFIG_SUSPEND=n
+CONFIG_HIBERNATION=n
+CONFIG_PREEMPT_NONE=n
+CONFIG_PREEMPT_VOLUNTARY=n
+CONFIG_PREEMPT=y
+CONFIG_TASKS_RCU=y
+CONFIG_HZ_PERIODIC=n
+CONFIG_NO_HZ_IDLE=n
+CONFIG_NO_HZ_FULL=y
+CONFIG_NO_HZ_FULL_ALL=y
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TASKS03.boot b/tools/testing/selftests/rcutorture/configs/rcu/TASKS03.boot
new file mode 100644
index 000000000000..cd2a188eeb6d
--- /dev/null
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TASKS03.boot
@@ -0,0 +1 @@
+rcutorture.torture_type=tasks
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v5 tip/core/rcu 08/16] rcu: Add stall-warning checks for RCU-tasks
  2014-08-11 22:48 ` [PATCH v5 tip/core/rcu 01/16] rcu: Add call_rcu_tasks() Paul E. McKenney
                     ` (5 preceding siblings ...)
  2014-08-11 22:48   ` [PATCH v5 tip/core/rcu 07/16] rcutorture: Add RCU-tasks test cases Paul E. McKenney
@ 2014-08-11 22:48   ` Paul E. McKenney
  2014-08-14 21:39     ` Pranith Kumar
  2014-08-11 22:48   ` [PATCH v5 tip/core/rcu 09/16] rcu: Improve RCU-tasks energy efficiency Paul E. McKenney
                     ` (8 subsequent siblings)
  15 siblings, 1 reply; 60+ messages in thread
From: Paul E. McKenney @ 2014-08-11 22:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

This commit adds a three-minute RCU-tasks stall warning.  The actual
time is controlled by the boot/sysfs parameter rcu_task_stall_timeout,
with values less than or equal to zero disabling the stall warnings.
The default value is three minutes, which means that the tasks that
have not yet responded will get their stacks dumped every ten minutes,
until they pass through a voluntary context switch.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 Documentation/kernel-parameters.txt |  5 +++++
 kernel/rcu/update.c                 | 27 ++++++++++++++++++++++++---
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 910c3829f81d..8cdbde7b17f5 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2921,6 +2921,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 	rcupdate.rcu_cpu_stall_timeout= [KNL]
 			Set timeout for RCU CPU stall warning messages.
 
+	rcupdate.rcu_task_stall_timeout= [KNL]
+			Set timeout in jiffies for RCU task stall warning
+			messages.  Disable with a value less than or equal
+			to zero.
+
 	rdinit=		[KNL]
 			Format: <full_path>
 			Run specified binary instead of /init from the ramdisk,
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index 8f53a41dd9ee..f1535404a79e 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -374,7 +374,7 @@ static DEFINE_RAW_SPINLOCK(rcu_tasks_cbs_lock);
 DEFINE_SRCU(tasks_rcu_exit_srcu);
 
 /* Control stall timeouts.  Disable with <= 0, otherwise jiffies till stall. */
-static int rcu_task_stall_timeout __read_mostly = HZ * 60 * 3;
+static int rcu_task_stall_timeout __read_mostly = HZ * 60 * 10;
 module_param(rcu_task_stall_timeout, int, 0644);
 
 /* Post an RCU-tasks callback. */
@@ -449,7 +449,8 @@ void rcu_barrier_tasks(void)
 EXPORT_SYMBOL_GPL(rcu_barrier_tasks);
 
 /* See if tasks are still holding out, complain if so. */
-static void check_holdout_task(struct task_struct *t)
+static void check_holdout_task(struct task_struct *t,
+			       bool needreport, bool *firstreport)
 {
 	if (!ACCESS_ONCE(t->rcu_tasks_holdout) ||
 	    t->rcu_tasks_nvcsw != ACCESS_ONCE(t->nvcsw) ||
@@ -457,7 +458,15 @@ static void check_holdout_task(struct task_struct *t)
 		ACCESS_ONCE(t->rcu_tasks_holdout) = 0;
 		list_del_rcu(&t->rcu_tasks_holdout_list);
 		put_task_struct(t);
+		return;
 	}
+	if (!needreport)
+		return;
+	if (*firstreport) {
+		pr_err("INFO: rcu_tasks detected stalls on tasks:\n");
+		*firstreport = false;
+	}
+	sched_show_task(t);
 }
 
 /* RCU-tasks kthread that detects grace periods and invokes callbacks. */
@@ -465,6 +474,7 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 {
 	unsigned long flags;
 	struct task_struct *g, *t;
+	unsigned long lastreport;
 	struct rcu_head *list;
 	struct rcu_head *next;
 	LIST_HEAD(rcu_tasks_holdouts);
@@ -543,13 +553,24 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 		 * of holdout tasks, removing any that are no longer
 		 * holdouts.  When the list is empty, we are done.
 		 */
+		lastreport = jiffies;
 		while (!list_empty(&rcu_tasks_holdouts)) {
+			bool firstreport;
+			bool needreport;
+			int rtst;
+
 			schedule_timeout_interruptible(HZ);
+			rtst = ACCESS_ONCE(rcu_task_stall_timeout);
+			needreport = rtst > 0 &&
+				     time_after(jiffies, lastreport + rtst);
+			if (needreport)
+				lastreport = jiffies;
+			firstreport = true;
 			WARN_ON(signal_pending(current));
 			rcu_read_lock();
 			list_for_each_entry_rcu(t, &rcu_tasks_holdouts,
 						rcu_tasks_holdout_list)
-				check_holdout_task(t);
+				check_holdout_task(t, needreport, &firstreport);
 			rcu_read_unlock();
 		}
 
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v5 tip/core/rcu 09/16] rcu: Improve RCU-tasks energy efficiency
  2014-08-11 22:48 ` [PATCH v5 tip/core/rcu 01/16] rcu: Add call_rcu_tasks() Paul E. McKenney
                     ` (6 preceding siblings ...)
  2014-08-11 22:48   ` [PATCH v5 tip/core/rcu 08/16] rcu: Add stall-warning checks for RCU-tasks Paul E. McKenney
@ 2014-08-11 22:48   ` Paul E. McKenney
  2014-08-14 21:42     ` Pranith Kumar
  2014-08-11 22:48   ` [PATCH v5 tip/core/rcu 10/16] documentation: Add verbiage on RCU-tasks stall warning messages Paul E. McKenney
                     ` (7 subsequent siblings)
  15 siblings, 1 reply; 60+ messages in thread
From: Paul E. McKenney @ 2014-08-11 22:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

The current RCU-tasks implementation uses strict polling to detect
callback arrivals.  This works quite well, but is not so good for
energy efficiency.  This commit therefore replaces the strict polling
with a wait queue.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/update.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index f1535404a79e..1256a900cd01 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -368,6 +368,7 @@ early_initcall(check_cpu_stall_init);
 /* Global list of callbacks and associated lock. */
 static struct rcu_head *rcu_tasks_cbs_head;
 static struct rcu_head **rcu_tasks_cbs_tail = &rcu_tasks_cbs_head;
+static DECLARE_WAIT_QUEUE_HEAD(rcu_tasks_cbs_wq);
 static DEFINE_RAW_SPINLOCK(rcu_tasks_cbs_lock);
 
 /* Track exiting tasks in order to allow them to be waited for. */
@@ -381,13 +382,17 @@ module_param(rcu_task_stall_timeout, int, 0644);
 void call_rcu_tasks(struct rcu_head *rhp, void (*func)(struct rcu_head *rhp))
 {
 	unsigned long flags;
+	bool needwake;
 
 	rhp->next = NULL;
 	rhp->func = func;
 	raw_spin_lock_irqsave(&rcu_tasks_cbs_lock, flags);
+	needwake = !rcu_tasks_cbs_head;
 	*rcu_tasks_cbs_tail = rhp;
 	rcu_tasks_cbs_tail = &rhp->next;
 	raw_spin_unlock_irqrestore(&rcu_tasks_cbs_lock, flags);
+	if (needwake)
+		wake_up(&rcu_tasks_cbs_wq);
 }
 EXPORT_SYMBOL_GPL(call_rcu_tasks);
 
@@ -498,8 +503,12 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 
 		/* If there were none, wait a bit and start over. */
 		if (!list) {
-			schedule_timeout_interruptible(HZ);
-			WARN_ON(signal_pending(current));
+			wait_event_interruptible(rcu_tasks_cbs_wq,
+						 rcu_tasks_cbs_head);
+			if (!rcu_tasks_cbs_head) {
+				WARN_ON(signal_pending(current));
+				schedule_timeout_interruptible(HZ/10);
+			}
 			continue;
 		}
 
@@ -605,6 +614,7 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 			list = next;
 			cond_resched();
 		}
+		schedule_timeout_uninterruptible(HZ/10);
 	}
 }
 
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v5 tip/core/rcu 10/16] documentation: Add verbiage on RCU-tasks stall warning messages
  2014-08-11 22:48 ` [PATCH v5 tip/core/rcu 01/16] rcu: Add call_rcu_tasks() Paul E. McKenney
                     ` (7 preceding siblings ...)
  2014-08-11 22:48   ` [PATCH v5 tip/core/rcu 09/16] rcu: Improve RCU-tasks energy efficiency Paul E. McKenney
@ 2014-08-11 22:48   ` Paul E. McKenney
  2014-08-11 22:49   ` [PATCH v5 tip/core/rcu 11/16] rcu: Defer rcu_tasks_kthread() creation till first call_rcu_tasks() Paul E. McKenney
                     ` (6 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: Paul E. McKenney @ 2014-08-11 22:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

This commit documents RCU-tasks stall warning messages and also describes
when to use the new cond_resched_rcu_qs() API.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 Documentation/RCU/stallwarn.txt | 33 ++++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
index 68fe3ad27015..ef5a2fd4ff70 100644
--- a/Documentation/RCU/stallwarn.txt
+++ b/Documentation/RCU/stallwarn.txt
@@ -56,8 +56,20 @@ RCU_STALL_RAT_DELAY
 	two jiffies.  (This is a cpp macro, not a kernel configuration
 	parameter.)
 
-When a CPU detects that it is stalling, it will print a message similar
-to the following:
+rcupdate.rcu_task_stall_timeout
+
+	This boot/sysfs parameter controls the RCU-tasks stall warning
+	interval.  A value of zero or less suppresses RCU-tasks stall
+	warnings.  A positive value sets the stall-warning interval
+	in jiffies.  An RCU-tasks stall warning starts wtih the line:
+
+		INFO: rcu_tasks detected stalls on tasks:
+
+	And continues with the output of sched_show_task() for each
+	task stalling the current RCU-tasks grace period.
+
+For non-RCU-tasks flavors of RCU, when a CPU detects that it is stalling,
+it will print a message similar to the following:
 
 INFO: rcu_sched_state detected stall on CPU 5 (t=2500 jiffies)
 
@@ -174,8 +186,12 @@ o	A CPU looping with preemption disabled.  This condition can
 o	A CPU looping with bottom halves disabled.  This condition can
 	result in RCU-sched and RCU-bh stalls.
 
-o	For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel
-	without invoking schedule().
+o	For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the
+	kernel without invoking schedule().  Note that cond_resched()
+	does not necessarily prevent RCU CPU stall warnings.  Therefore,
+	if the looping in the kernel is really expected and desirable
+	behavior, you might need to replace some of the cond_resched()
+	calls with calls to cond_resched_rcu_qs().
 
 o	A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might
 	happen to preempt a low-priority task in the middle of an RCU
@@ -208,11 +224,10 @@ o	A hardware failure.  This is quite unlikely, but has occurred
 	This resulted in a series of RCU CPU stall warnings, eventually
 	leading the realization that the CPU had failed.
 
-The RCU, RCU-sched, and RCU-bh implementations have CPU stall warning.
-SRCU does not have its own CPU stall warnings, but its calls to
-synchronize_sched() will result in RCU-sched detecting RCU-sched-related
-CPU stalls.  Please note that RCU only detects CPU stalls when there is
-a grace period in progress.  No grace period, no CPU stall warnings.
+The RCU, RCU-sched, RCU-bh, and RCU-tasks implementations have CPU stall
+warning.  Note that SRCU does -not- have CPU stall warnings.  Please note
+that RCU only detects CPU stalls when there is a grace period in progress.
+No grace period, no CPU stall warnings.
 
 To diagnose the cause of the stall, inspect the stack traces.
 The offending function will usually be near the top of the stack.
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v5 tip/core/rcu 11/16] rcu: Defer rcu_tasks_kthread() creation till first call_rcu_tasks()
  2014-08-11 22:48 ` [PATCH v5 tip/core/rcu 01/16] rcu: Add call_rcu_tasks() Paul E. McKenney
                     ` (8 preceding siblings ...)
  2014-08-11 22:48   ` [PATCH v5 tip/core/rcu 10/16] documentation: Add verbiage on RCU-tasks stall warning messages Paul E. McKenney
@ 2014-08-11 22:49   ` Paul E. McKenney
  2014-08-14 22:28     ` Pranith Kumar
  2014-08-11 22:49   ` [PATCH v5 tip/core/rcu 12/16] rcu: Make TASKS_RCU handle nohz_full= CPUs Paul E. McKenney
                     ` (5 subsequent siblings)
  15 siblings, 1 reply; 60+ messages in thread
From: Paul E. McKenney @ 2014-08-11 22:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

It is expected that many sites will have CONFIG_TASKS_RCU=y, but
will never actually invoke call_rcu_tasks().  For such sites, creating
rcu_tasks_kthread() at boot is wasteful.  This commit therefore defers
creation of this kthread until the time of the first call_rcu_tasks().

This of course means that the first call_rcu_tasks() must be invoked
from process context after the scheduler is fully operational.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/update.c | 33 ++++++++++++++++++++++++++-------
 1 file changed, 26 insertions(+), 7 deletions(-)

diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index 1256a900cd01..d997163c7e92 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -378,7 +378,12 @@ DEFINE_SRCU(tasks_rcu_exit_srcu);
 static int rcu_task_stall_timeout __read_mostly = HZ * 60 * 10;
 module_param(rcu_task_stall_timeout, int, 0644);
 
-/* Post an RCU-tasks callback. */
+static void rcu_spawn_tasks_kthread(void);
+
+/*
+ * Post an RCU-tasks callback.  First call must be from process context
+ * after the scheduler if fully operational.
+ */
 void call_rcu_tasks(struct rcu_head *rhp, void (*func)(struct rcu_head *rhp))
 {
 	unsigned long flags;
@@ -391,8 +396,10 @@ void call_rcu_tasks(struct rcu_head *rhp, void (*func)(struct rcu_head *rhp))
 	*rcu_tasks_cbs_tail = rhp;
 	rcu_tasks_cbs_tail = &rhp->next;
 	raw_spin_unlock_irqrestore(&rcu_tasks_cbs_lock, flags);
-	if (needwake)
+	if (needwake) {
+		rcu_spawn_tasks_kthread();
 		wake_up(&rcu_tasks_cbs_wq);
+	}
 }
 EXPORT_SYMBOL_GPL(call_rcu_tasks);
 
@@ -618,15 +625,27 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 	}
 }
 
-/* Spawn rcu_tasks_kthread() at boot time. */
-static int __init rcu_spawn_tasks_kthread(void)
+/* Spawn rcu_tasks_kthread() at first call to call_rcu_tasks(). */
+static void rcu_spawn_tasks_kthread(void)
 {
-	struct task_struct __maybe_unused *t;
+	static DEFINE_MUTEX(rcu_tasks_kthread_mutex);
+	static struct task_struct *rcu_tasks_kthread_ptr;
+	struct task_struct *t;
 
+	if (ACCESS_ONCE(rcu_tasks_kthread_ptr)) {
+		smp_mb(); /* Ensure caller sees full kthread. */
+		return;
+	}
+	mutex_lock(&rcu_tasks_kthread_mutex);
+	if (rcu_tasks_kthread_ptr) {
+		mutex_unlock(&rcu_tasks_kthread_mutex);
+		return;
+	}
 	t = kthread_run(rcu_tasks_kthread, NULL, "rcu_tasks_kthread");
 	BUG_ON(IS_ERR(t));
-	return 0;
+	smp_mb(); /* Ensure others see full kthread. */
+	ACCESS_ONCE(rcu_tasks_kthread_ptr) = t;
+	mutex_unlock(&rcu_tasks_kthread_mutex);
 }
-early_initcall(rcu_spawn_tasks_kthread);
 
 #endif /* #ifdef CONFIG_TASKS_RCU */
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v5 tip/core/rcu 12/16] rcu: Make TASKS_RCU handle nohz_full= CPUs
  2014-08-11 22:48 ` [PATCH v5 tip/core/rcu 01/16] rcu: Add call_rcu_tasks() Paul E. McKenney
                     ` (9 preceding siblings ...)
  2014-08-11 22:49   ` [PATCH v5 tip/core/rcu 11/16] rcu: Defer rcu_tasks_kthread() creation till first call_rcu_tasks() Paul E. McKenney
@ 2014-08-11 22:49   ` Paul E. McKenney
  2014-08-14 22:55     ` Pranith Kumar
  2014-08-11 22:49   ` [PATCH v5 tip/core/rcu 13/16] rcu: Make rcu_tasks_kthread()'s GP-wait loop allow preemption Paul E. McKenney
                     ` (4 subsequent siblings)
  15 siblings, 1 reply; 60+ messages in thread
From: Paul E. McKenney @ 2014-08-11 22:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

Currently TASKS_RCU would ignore a CPU running a task in nohz_full=
usermode execution.  There would be neither a context switch nor a
scheduling-clock interrupt to tell TASKS_RCU that the task in question
had passed through a quiescent state.  The grace period would therefore
extend indefinitely.  This commit therefore makes RCU's dyntick-idle
subsystem record the task_struct structure of the task that is running
in dyntick-idle mode on each CPU.  The TASKS_RCU grace period can
then access this information and record a quiescent state on
behalf of any CPU running in dyntick-idle usermode.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/init_task.h |  3 ++-
 include/linux/sched.h     |  2 ++
 kernel/rcu/tree.c         |  2 ++
 kernel/rcu/tree.h         |  2 ++
 kernel/rcu/tree_plugin.h  | 16 ++++++++++++++++
 kernel/rcu/update.c       |  4 +++-
 6 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 78715ea7c30c..642828009324 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -128,7 +128,8 @@ extern struct group_info init_groups;
 #define INIT_TASK_RCU_TASKS(tsk)					\
 	.rcu_tasks_holdout = false,					\
 	.rcu_tasks_holdout_list =					\
-		LIST_HEAD_INIT(tsk.rcu_tasks_holdout_list),
+		LIST_HEAD_INIT(tsk.rcu_tasks_holdout_list),		\
+	.rcu_tasks_idle_cpu = -1,
 #else
 #define INIT_TASK_RCU_TASKS(tsk)
 #endif
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 3cf124389ec7..5fa041f7a034 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1277,6 +1277,7 @@ struct task_struct {
 	unsigned long rcu_tasks_nvcsw;
 	int rcu_tasks_holdout;
 	struct list_head rcu_tasks_holdout_list;
+	int rcu_tasks_idle_cpu;
 #endif /* #ifdef CONFIG_TASKS_RCU */
 
 #if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT)
@@ -2021,6 +2022,7 @@ static inline void rcu_copy_process(struct task_struct *p)
 #ifdef CONFIG_TASKS_RCU
 	p->rcu_tasks_holdout = false;
 	INIT_LIST_HEAD(&p->rcu_tasks_holdout_list);
+	p->rcu_tasks_idle_cpu = -1;
 #endif /* #ifdef CONFIG_TASKS_RCU */
 }
 
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 645a33efc0d4..0d9ee1e4f446 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -526,6 +526,7 @@ static void rcu_eqs_enter_common(struct rcu_dynticks *rdtp, long long oldval,
 	atomic_inc(&rdtp->dynticks);
 	smp_mb__after_atomic();  /* Force ordering with next sojourn. */
 	WARN_ON_ONCE(atomic_read(&rdtp->dynticks) & 0x1);
+	rcu_dynticks_task_enter();
 
 	/*
 	 * It is illegal to enter an extended quiescent state while
@@ -642,6 +643,7 @@ void rcu_irq_exit(void)
 static void rcu_eqs_exit_common(struct rcu_dynticks *rdtp, long long oldval,
 			       int user)
 {
+	rcu_dynticks_task_exit();
 	smp_mb__before_atomic();  /* Force ordering w/previous sojourn. */
 	atomic_inc(&rdtp->dynticks);
 	/* CPUs seeing atomic_inc() must see later RCU read-side crit sects */
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 0f69a79c5b7d..37ff593b7725 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -579,6 +579,8 @@ static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle,
 static void rcu_bind_gp_kthread(void);
 static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp);
 static bool rcu_nohz_full_cpu(struct rcu_state *rsp);
+static void rcu_dynticks_task_enter(void);
+static void rcu_dynticks_task_exit(void);
 
 #endif /* #ifndef RCU_TREE_NONCORE */
 
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index a86a363ea453..0d8ef5cb1976 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2852,3 +2852,19 @@ static void rcu_bind_gp_kthread(void)
 		set_cpus_allowed_ptr(current, cpumask_of(cpu));
 #endif /* #ifdef CONFIG_NO_HZ_FULL */
 }
+
+/* Record the current task on dyntick-idle entry. */
+static void rcu_dynticks_task_enter(void)
+{
+#if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL)
+	ACCESS_ONCE(current->rcu_tasks_idle_cpu) = smp_processor_id();
+#endif /* #if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL) */
+}
+
+/* Record no current task on dyntick-idle exit. */
+static void rcu_dynticks_task_exit(void)
+{
+#if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL)
+	ACCESS_ONCE(current->rcu_tasks_idle_cpu) = -1;
+#endif /* #if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL) */
+}
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index d997163c7e92..a4140f25cf1a 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -466,7 +466,9 @@ static void check_holdout_task(struct task_struct *t,
 {
 	if (!ACCESS_ONCE(t->rcu_tasks_holdout) ||
 	    t->rcu_tasks_nvcsw != ACCESS_ONCE(t->nvcsw) ||
-	    !ACCESS_ONCE(t->on_rq)) {
+	    !ACCESS_ONCE(t->on_rq) ||
+	    (IS_ENABLED(CONFIG_NO_HZ_FULL) &&
+	     !is_idle_task(t) && t->rcu_tasks_idle_cpu >= 0)) {
 		ACCESS_ONCE(t->rcu_tasks_holdout) = 0;
 		list_del_rcu(&t->rcu_tasks_holdout_list);
 		put_task_struct(t);
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v5 tip/core/rcu 13/16] rcu: Make rcu_tasks_kthread()'s GP-wait loop allow preemption
  2014-08-11 22:48 ` [PATCH v5 tip/core/rcu 01/16] rcu: Add call_rcu_tasks() Paul E. McKenney
                     ` (10 preceding siblings ...)
  2014-08-11 22:49   ` [PATCH v5 tip/core/rcu 12/16] rcu: Make TASKS_RCU handle nohz_full= CPUs Paul E. McKenney
@ 2014-08-11 22:49   ` Paul E. McKenney
  2014-08-11 22:49   ` [PATCH v5 tip/core/rcu 14/16] rcu: Remove redundant preempt_disable() from rcu_note_voluntary_context_switch() Paul E. McKenney
                     ` (3 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: Paul E. McKenney @ 2014-08-11 22:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

The grace-period-wait loop in rcu_tasks_kthread() is under (unnecessary)
RCU protection, and therefore has no preemption points in a PREEMPT=n
kernel.  This commit therefore removes the RCU protection and inserts
cond_resched().

Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/update.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index a4140f25cf1a..2ae6fb8752d4 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -470,7 +470,7 @@ static void check_holdout_task(struct task_struct *t,
 	    (IS_ENABLED(CONFIG_NO_HZ_FULL) &&
 	     !is_idle_task(t) && t->rcu_tasks_idle_cpu >= 0)) {
 		ACCESS_ONCE(t->rcu_tasks_holdout) = 0;
-		list_del_rcu(&t->rcu_tasks_holdout_list);
+		list_del_init(&t->rcu_tasks_holdout_list);
 		put_task_struct(t);
 		return;
 	}
@@ -576,6 +576,7 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 			bool firstreport;
 			bool needreport;
 			int rtst;
+			struct task_struct *t1;
 
 			schedule_timeout_interruptible(HZ);
 			rtst = ACCESS_ONCE(rcu_task_stall_timeout);
@@ -585,11 +586,11 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 				lastreport = jiffies;
 			firstreport = true;
 			WARN_ON(signal_pending(current));
-			rcu_read_lock();
-			list_for_each_entry_rcu(t, &rcu_tasks_holdouts,
-						rcu_tasks_holdout_list)
+			list_for_each_entry_safe(t, t1, &rcu_tasks_holdouts,
+						rcu_tasks_holdout_list) {
 				check_holdout_task(t, needreport, &firstreport);
-			rcu_read_unlock();
+				cond_resched();
+			}
 		}
 
 		/*
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v5 tip/core/rcu 14/16] rcu: Remove redundant preempt_disable() from rcu_note_voluntary_context_switch()
  2014-08-11 22:48 ` [PATCH v5 tip/core/rcu 01/16] rcu: Add call_rcu_tasks() Paul E. McKenney
                     ` (11 preceding siblings ...)
  2014-08-11 22:49   ` [PATCH v5 tip/core/rcu 13/16] rcu: Make rcu_tasks_kthread()'s GP-wait loop allow preemption Paul E. McKenney
@ 2014-08-11 22:49   ` Paul E. McKenney
  2014-08-13 10:56     ` Peter Zijlstra
  2014-08-11 22:49   ` [PATCH v5 tip/core/rcu 15/16] rcu: Make RCU-tasks wait for idle tasks Paul E. McKenney
                     ` (2 subsequent siblings)
  15 siblings, 1 reply; 60+ messages in thread
From: Paul E. McKenney @ 2014-08-11 22:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

In theory, synchronize_sched() requires a read-side critical section to
order against.  In practice, preemption can be thought of as being
disabled across every machine instruction.  So this commit removes
the redundant preempt_disable() from rcu_note_voluntary_context_switch().

Reported-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/rcupdate.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index f504f797c9c8..ed6e3e2e0089 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -326,10 +326,8 @@ static inline void rcu_user_hooks_switch(struct task_struct *prev,
 extern struct srcu_struct tasks_rcu_exit_srcu;
 #define rcu_note_voluntary_context_switch(t) \
 	do { \
-		preempt_disable(); /* Exclude synchronize_sched(); */ \
 		if (ACCESS_ONCE((t)->rcu_tasks_holdout)) \
 			ACCESS_ONCE((t)->rcu_tasks_holdout) = 0; \
-		preempt_enable(); \
 	} while (0)
 #else /* #ifdef CONFIG_TASKS_RCU */
 #define TASKS_RCU(x) do { } while (0)
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v5 tip/core/rcu 15/16] rcu: Make RCU-tasks wait for idle tasks
  2014-08-11 22:48 ` [PATCH v5 tip/core/rcu 01/16] rcu: Add call_rcu_tasks() Paul E. McKenney
                     ` (12 preceding siblings ...)
  2014-08-11 22:49   ` [PATCH v5 tip/core/rcu 14/16] rcu: Remove redundant preempt_disable() from rcu_note_voluntary_context_switch() Paul E. McKenney
@ 2014-08-11 22:49   ` Paul E. McKenney
  2014-08-13  8:12     ` Peter Zijlstra
  2014-08-11 22:49   ` [PATCH v5 tip/core/rcu 16/16] rcu: Additional information on RCU-tasks stall-warning messages Paul E. McKenney
  2014-08-14 20:46   ` [PATCH v5 tip/core/rcu 01/16] rcu: Add call_rcu_tasks() Pranith Kumar
  15 siblings, 1 reply; 60+ messages in thread
From: Paul E. McKenney @ 2014-08-11 22:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

Because idle-task code may need to be patched, RCU-tasks need to wait
for idle tasks to schedule.  This commit therefore detects this case
via context switch.  Block CPU hotplug during this time to avoid sending
IPIs to offline CPUs.

Note that checking for changes in the dyntick-idle counters is tempting,
but wrong.  The reason that it is wrong is that a interrupt or NMI can
increment these counters without necessarily allowing the idle tasks to
make any forward progress.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/update.c | 65 ++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 54 insertions(+), 11 deletions(-)

diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index 2ae6fb8752d4..9ea2a26487c5 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -48,6 +48,7 @@
 #include <linux/delay.h>
 #include <linux/module.h>
 #include <linux/kthread.h>
+#include "../sched/sched.h" /* cpu_rq()->idle */
 
 #define CREATE_TRACE_POINTS
 
@@ -464,15 +465,33 @@ EXPORT_SYMBOL_GPL(rcu_barrier_tasks);
 static void check_holdout_task(struct task_struct *t,
 			       bool needreport, bool *firstreport)
 {
-	if (!ACCESS_ONCE(t->rcu_tasks_holdout) ||
-	    t->rcu_tasks_nvcsw != ACCESS_ONCE(t->nvcsw) ||
-	    !ACCESS_ONCE(t->on_rq) ||
-	    (IS_ENABLED(CONFIG_NO_HZ_FULL) &&
-	     !is_idle_task(t) && t->rcu_tasks_idle_cpu >= 0)) {
-		ACCESS_ONCE(t->rcu_tasks_holdout) = 0;
-		list_del_init(&t->rcu_tasks_holdout_list);
-		put_task_struct(t);
-		return;
+	if (!ACCESS_ONCE(t->rcu_tasks_holdout))
+		goto not_holdout; /* Other detection of non-holdout status. */
+	if (t->rcu_tasks_nvcsw != ACCESS_ONCE(t->nvcsw))
+		goto not_holdout; /* Voluntary context switch. */
+	if (!ACCESS_ONCE(t->on_rq))
+		goto not_holdout; /* Not on runqueue. */
+	if (IS_ENABLED(CONFIG_NO_HZ_FULL) &&
+	    !is_idle_task(t) && t->rcu_tasks_idle_cpu >= 0)
+		goto not_holdout; /* NO_HZ_FULL userspace execution. */
+	if (is_idle_task(t)) {
+		int cpu;
+
+		cpu = task_cpu(t);
+		if (cpu >= 0 && cpu_curr(cpu) != t)
+			goto not_holdout; /* Idle task not running. */
+
+		if (cpu >= 0) {
+			/*
+			 * We must schedule on the idle CPU.  Note that
+			 * checking for changes in dyntick-idle counters
+			 * is not sufficient, as an interrupt or NMI can
+			 * change these counters without guaranteeing that
+			 * the underlying idle task has made progress.
+			 */
+			set_cpus_allowed_ptr(current, cpumask_of(cpu));
+			set_cpus_allowed_ptr(current, cpu_online_mask);
+		}
 	}
 	if (!needreport)
 		return;
@@ -481,11 +500,17 @@ static void check_holdout_task(struct task_struct *t,
 		*firstreport = false;
 	}
 	sched_show_task(t);
+	return;
+not_holdout:
+	ACCESS_ONCE(t->rcu_tasks_holdout) = 0;
+	list_del_init(&t->rcu_tasks_holdout_list);
+	put_task_struct(t);
 }
 
 /* RCU-tasks kthread that detects grace periods and invokes callbacks. */
 static int __noreturn rcu_tasks_kthread(void *arg)
 {
+	int cpu;
 	unsigned long flags;
 	struct task_struct *g, *t;
 	unsigned long lastreport;
@@ -546,8 +571,7 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 		 */
 		rcu_read_lock();
 		for_each_process_thread(g, t) {
-			if (t != current && ACCESS_ONCE(t->on_rq) &&
-			    !is_idle_task(t)) {
+			if (t != current && ACCESS_ONCE(t->on_rq)) {
 				get_task_struct(t);
 				t->rcu_tasks_nvcsw = ACCESS_ONCE(t->nvcsw);
 				ACCESS_ONCE(t->rcu_tasks_holdout) = 1;
@@ -558,6 +582,24 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 		rcu_read_unlock();
 
 		/*
+		 * Next, queue up any currently running idle tasks.
+		 * Exclude CPU hotplug during the time we are working
+		 * with idle tasks, as it is considered bad form to
+		 * send IPIs to offline CPUs.
+		 */
+		get_online_cpus();
+		for_each_online_cpu(cpu) {
+			t = cpu_rq(cpu)->idle;
+			if (t == cpu_curr(cpu)) {
+				get_task_struct(t);
+				t->rcu_tasks_nvcsw = ACCESS_ONCE(t->nvcsw);
+				ACCESS_ONCE(t->rcu_tasks_holdout) = 1;
+				list_add(&t->rcu_tasks_holdout_list,
+					 &rcu_tasks_holdouts);
+			}
+		}
+
+		/*
 		 * Wait for tasks that are in the process of exiting.
 		 * This does only part of the job, ensuring that all
 		 * tasks that were previously exiting reach the point
@@ -592,6 +634,7 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 				cond_resched();
 			}
 		}
+		put_online_cpus();
 
 		/*
 		 * Because ->on_rq and ->nvcsw are not guaranteed
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v5 tip/core/rcu 16/16] rcu: Additional information on RCU-tasks stall-warning messages
  2014-08-11 22:48 ` [PATCH v5 tip/core/rcu 01/16] rcu: Add call_rcu_tasks() Paul E. McKenney
                     ` (13 preceding siblings ...)
  2014-08-11 22:49   ` [PATCH v5 tip/core/rcu 15/16] rcu: Make RCU-tasks wait for idle tasks Paul E. McKenney
@ 2014-08-11 22:49   ` Paul E. McKenney
  2014-08-14 20:46   ` [PATCH v5 tip/core/rcu 01/16] rcu: Add call_rcu_tasks() Pranith Kumar
  15 siblings, 0 replies; 60+ messages in thread
From: Paul E. McKenney @ 2014-08-11 22:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/update.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index 9ea2a26487c5..43ea9c37bbd0 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -465,6 +465,8 @@ EXPORT_SYMBOL_GPL(rcu_barrier_tasks);
 static void check_holdout_task(struct task_struct *t,
 			       bool needreport, bool *firstreport)
 {
+	int cpu;
+
 	if (!ACCESS_ONCE(t->rcu_tasks_holdout))
 		goto not_holdout; /* Other detection of non-holdout status. */
 	if (t->rcu_tasks_nvcsw != ACCESS_ONCE(t->nvcsw))
@@ -475,8 +477,6 @@ static void check_holdout_task(struct task_struct *t,
 	    !is_idle_task(t) && t->rcu_tasks_idle_cpu >= 0)
 		goto not_holdout; /* NO_HZ_FULL userspace execution. */
 	if (is_idle_task(t)) {
-		int cpu;
-
 		cpu = task_cpu(t);
 		if (cpu >= 0 && cpu_curr(cpu) != t)
 			goto not_holdout; /* Idle task not running. */
@@ -499,6 +499,12 @@ static void check_holdout_task(struct task_struct *t,
 		pr_err("INFO: rcu_tasks detected stalls on tasks:\n");
 		*firstreport = false;
 	}
+	cpu = task_cpu(t);
+	pr_alert("%p: %c%c nvcsw: %lu/%lu holdout: %d idle_cpu: %d/%d\n",
+		 t, ".I"[is_idle_task(t)],
+		 "N."[cpu < 0 || !tick_nohz_full_cpu(cpu)],
+		 t->rcu_tasks_nvcsw, t->nvcsw, t->rcu_tasks_holdout,
+		 t->rcu_tasks_idle_cpu, cpu);
 	sched_show_task(t);
 	return;
 not_holdout:
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH tip/core/rcu 0/16] RCU-tasks implementation
  2014-08-11 22:48 [PATCH tip/core/rcu 0/16] RCU-tasks implementation Paul E. McKenney
  2014-08-11 22:48 ` [PATCH v5 tip/core/rcu 01/16] rcu: Add call_rcu_tasks() Paul E. McKenney
@ 2014-08-12 23:57 ` Paul E. McKenney
  1 sibling, 0 replies; 60+ messages in thread
From: Paul E. McKenney @ 2014-08-12 23:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani

On Mon, Aug 11, 2014 at 03:48:40PM -0700, Paul E. McKenney wrote:
> Hello!
> 
> This series provides v5 of a prototype of an RCU-tasks implementation,
> which has been requested to assist with tramopoline removal.  This flavor
> of RCU is task-based rather than CPU-based, and has voluntary context
> switch, usermode execution, and the idle loops as its only quiescent
> states.  This selection of quiescent states ensures that at the end
> of a grace period, there will no longer be any tasks depending on a
> trampoline that was removed before the beginning of that grace period.
> This works because such trampolines do not contain function calls,
> do not contain voluntary context switches, do not switch to usermode,
> and do not switch to idle.

[ . . . ]

> o	There are probably still bugs.

And there probably are, but this version passes 10-hour rcutorture tests
in a few configurations, so getting reasonably robust.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 15/16] rcu: Make RCU-tasks wait for idle tasks
  2014-08-11 22:49   ` [PATCH v5 tip/core/rcu 15/16] rcu: Make RCU-tasks wait for idle tasks Paul E. McKenney
@ 2014-08-13  8:12     ` Peter Zijlstra
  2014-08-13 12:48       ` Paul E. McKenney
  0 siblings, 1 reply; 60+ messages in thread
From: Peter Zijlstra @ 2014-08-13  8:12 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani

[-- Attachment #1: Type: text/plain, Size: 914 bytes --]

On Mon, Aug 11, 2014 at 03:49:04PM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> 
> Because idle-task code may need to be patched, RCU-tasks need to wait
> for idle tasks to schedule.  This commit therefore detects this case
> via context switch.  Block CPU hotplug during this time to avoid sending
> IPIs to offline CPUs.
> 
> Note that checking for changes in the dyntick-idle counters is tempting,
> but wrong.  The reason that it is wrong is that a interrupt or NMI can
> increment these counters without necessarily allowing the idle tasks to
> make any forward progress.

I'm going to NAK this.. with that rcu_idle patch I send there's
typically only a single idle function thats out of bounds and if its
more it can be made that with a bit of tlc to the cpuidle driver in
question.

This needs _FAR_ more justification than a maybe and a want.

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 14/16] rcu: Remove redundant preempt_disable() from rcu_note_voluntary_context_switch()
  2014-08-11 22:49   ` [PATCH v5 tip/core/rcu 14/16] rcu: Remove redundant preempt_disable() from rcu_note_voluntary_context_switch() Paul E. McKenney
@ 2014-08-13 10:56     ` Peter Zijlstra
  2014-08-13 14:07       ` Paul E. McKenney
  0 siblings, 1 reply; 60+ messages in thread
From: Peter Zijlstra @ 2014-08-13 10:56 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani

[-- Attachment #1: Type: text/plain, Size: 728 bytes --]

On Mon, Aug 11, 2014 at 03:49:03PM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> 
> In theory, synchronize_sched() requires a read-side critical section to
> order against.  In practice, preemption can be thought of as being
> disabled across every machine instruction.  So this commit removes
> the redundant preempt_disable() from rcu_note_voluntary_context_switch().

>  #define rcu_note_voluntary_context_switch(t) \
>  	do { \
> -		preempt_disable(); /* Exclude synchronize_sched(); */ \
>  		if (ACCESS_ONCE((t)->rcu_tasks_holdout)) \
>  			ACCESS_ONCE((t)->rcu_tasks_holdout) = 0; \
> -		preempt_enable(); \
>  	} while (0)

But that's more than 1 instruction.

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 15/16] rcu: Make RCU-tasks wait for idle tasks
  2014-08-13  8:12     ` Peter Zijlstra
@ 2014-08-13 12:48       ` Paul E. McKenney
  2014-08-13 13:40         ` Peter Zijlstra
  0 siblings, 1 reply; 60+ messages in thread
From: Paul E. McKenney @ 2014-08-13 12:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani

On Wed, Aug 13, 2014 at 10:12:15AM +0200, Peter Zijlstra wrote:
> On Mon, Aug 11, 2014 at 03:49:04PM -0700, Paul E. McKenney wrote:
> > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > 
> > Because idle-task code may need to be patched, RCU-tasks need to wait
> > for idle tasks to schedule.  This commit therefore detects this case
> > via context switch.  Block CPU hotplug during this time to avoid sending
> > IPIs to offline CPUs.
> > 
> > Note that checking for changes in the dyntick-idle counters is tempting,
> > but wrong.  The reason that it is wrong is that a interrupt or NMI can
> > increment these counters without necessarily allowing the idle tasks to
> > make any forward progress.
> 
> I'm going to NAK this.. with that rcu_idle patch I send there's
> typically only a single idle function thats out of bounds and if its
> more it can be made that with a bit of tlc to the cpuidle driver in
> question.
> 
> This needs _FAR_ more justification than a maybe and a want.

Peter, your patch might be a good start, but I didn't see any reaction
from Steven or Masami and it did only x86.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 15/16] rcu: Make RCU-tasks wait for idle tasks
  2014-08-13 12:48       ` Paul E. McKenney
@ 2014-08-13 13:40         ` Peter Zijlstra
  2014-08-13 13:51           ` Steven Rostedt
  2014-08-13 14:12           ` Paul E. McKenney
  0 siblings, 2 replies; 60+ messages in thread
From: Peter Zijlstra @ 2014-08-13 13:40 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani

[-- Attachment #1: Type: text/plain, Size: 1365 bytes --]

On Wed, Aug 13, 2014 at 05:48:18AM -0700, Paul E. McKenney wrote:
> On Wed, Aug 13, 2014 at 10:12:15AM +0200, Peter Zijlstra wrote:
> > On Mon, Aug 11, 2014 at 03:49:04PM -0700, Paul E. McKenney wrote:
> > > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > > 
> > > Because idle-task code may need to be patched, RCU-tasks need to wait
> > > for idle tasks to schedule.  This commit therefore detects this case
> > > via context switch.  Block CPU hotplug during this time to avoid sending
> > > IPIs to offline CPUs.
> > > 
> > > Note that checking for changes in the dyntick-idle counters is tempting,
> > > but wrong.  The reason that it is wrong is that a interrupt or NMI can
> > > increment these counters without necessarily allowing the idle tasks to
> > > make any forward progress.
> > 
> > I'm going to NAK this.. with that rcu_idle patch I send there's
> > typically only a single idle function thats out of bounds and if its
> > more it can be made that with a bit of tlc to the cpuidle driver in
> > question.
> > 
> > This needs _FAR_ more justification than a maybe and a want.
> 
> Peter, your patch might be a good start, but I didn't see any reaction
> from Steven or Masami and it did only x86.

That's not an excuse for doing horrible things. And inventing new infra
that needs to wake all CPUs is horrible.

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 15/16] rcu: Make RCU-tasks wait for idle tasks
  2014-08-13 13:40         ` Peter Zijlstra
@ 2014-08-13 13:51           ` Steven Rostedt
  2014-08-13 14:07             ` Peter Zijlstra
  2014-08-13 20:56             ` Paul E. McKenney
  2014-08-13 14:12           ` Paul E. McKenney
  1 sibling, 2 replies; 60+ messages in thread
From: Steven Rostedt @ 2014-08-13 13:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, dhowells, edumazet, dvhart,
	fweisbec, oleg, bobby.prani

On Wed, 13 Aug 2014 15:40:25 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Aug 13, 2014 at 05:48:18AM -0700, Paul E. McKenney wrote:
> > On Wed, Aug 13, 2014 at 10:12:15AM +0200, Peter Zijlstra wrote:
> > > On Mon, Aug 11, 2014 at 03:49:04PM -0700, Paul E. McKenney wrote:
> > > > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > > > 
> > > > Because idle-task code may need to be patched, RCU-tasks need to wait
> > > > for idle tasks to schedule.  This commit therefore detects this case
> > > > via context switch.  Block CPU hotplug during this time to avoid sending
> > > > IPIs to offline CPUs.
> > > > 
> > > > Note that checking for changes in the dyntick-idle counters is tempting,
> > > > but wrong.  The reason that it is wrong is that a interrupt or NMI can
> > > > increment these counters without necessarily allowing the idle tasks to
> > > > make any forward progress.
> > > 
> > > I'm going to NAK this.. with that rcu_idle patch I send there's
> > > typically only a single idle function thats out of bounds and if its
> > > more it can be made that with a bit of tlc to the cpuidle driver in
> > > question.
> > > 
> > > This needs _FAR_ more justification than a maybe and a want.
> > 
> > Peter, your patch might be a good start, but I didn't see any reaction
> > from Steven or Masami and it did only x86.
> 
> That's not an excuse for doing horrible things. And inventing new infra
> that needs to wake all CPUs is horrible.

I still need to look at the patches, but if this is just for the idle
case, then we don't need it. The idle case can be solved with a simple
sched_on_each_cpu(). I need a way to solve waiting for processes to
finish from a preemption point.

That's all I want, and if we can remove the "idle" case and document it
well that it's not covered and a sched_on_each_cpu() may be needed,
then I'm fine with that.

	sched_on_each_cpu(dummy_op);
	call_rcu_tasks(free_tramp);

Would that work?

-- Steve

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 15/16] rcu: Make RCU-tasks wait for idle tasks
  2014-08-13 13:51           ` Steven Rostedt
@ 2014-08-13 14:07             ` Peter Zijlstra
  2014-08-13 14:13               ` Steven Rostedt
  2014-08-13 20:56             ` Paul E. McKenney
  1 sibling, 1 reply; 60+ messages in thread
From: Peter Zijlstra @ 2014-08-13 14:07 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Paul E. McKenney, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, dhowells, edumazet, dvhart,
	fweisbec, oleg, bobby.prani

[-- Attachment #1: Type: text/plain, Size: 895 bytes --]

On Wed, Aug 13, 2014 at 09:51:32AM -0400, Steven Rostedt wrote:

> I still need to look at the patches, but if this is just for the idle
> case, then we don't need it. The idle case can be solved with a simple
> sched_on_each_cpu(). I need a way to solve waiting for processes to
> finish from a preemption point.
> 
> That's all I want, and if we can remove the "idle" case and document it
> well that it's not covered and a sched_on_each_cpu() may be needed,
> then I'm fine with that.
> 
> 	sched_on_each_cpu(dummy_op);
> 	call_rcu_tasks(free_tramp);

Sure, but why not dtrt and push rcu_idle hooks all the way down into the
idle drivers if and where appropriate?

There isn't _that_ much idle driver code. Also, some stuff should be
cleaned up; we're already calling stop_critical_timings() in the generic
idle code, and then calling it again in the cpuidle drivers.



[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 14/16] rcu: Remove redundant preempt_disable() from rcu_note_voluntary_context_switch()
  2014-08-13 10:56     ` Peter Zijlstra
@ 2014-08-13 14:07       ` Paul E. McKenney
  2014-08-13 14:33         ` Peter Zijlstra
  0 siblings, 1 reply; 60+ messages in thread
From: Paul E. McKenney @ 2014-08-13 14:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani

On Wed, Aug 13, 2014 at 12:56:18PM +0200, Peter Zijlstra wrote:
> On Mon, Aug 11, 2014 at 03:49:03PM -0700, Paul E. McKenney wrote:
> > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > 
> > In theory, synchronize_sched() requires a read-side critical section to
> > order against.  In practice, preemption can be thought of as being
> > disabled across every machine instruction.  So this commit removes
> > the redundant preempt_disable() from rcu_note_voluntary_context_switch().
> 
> >  #define rcu_note_voluntary_context_switch(t) \
> >  	do { \
> > -		preempt_disable(); /* Exclude synchronize_sched(); */ \
> >  		if (ACCESS_ONCE((t)->rcu_tasks_holdout)) \
> >  			ACCESS_ONCE((t)->rcu_tasks_holdout) = 0; \
> > -		preempt_enable(); \
> >  	} while (0)
> 
> But that's more than 1 instruction.

Yeah, the commit log could use some help.  The instruction in question
is the store.  The "if" is just an optimization.

So suppose that this sequence is preempted between the "if" and the store,
and that the synchronize_sched() (and quite a bit more besides!) takes
place during this preemption.  The task is still in a quiescent state
at the time of the store, so the store is still legitimate.

That said, it might be better to just leave preemption disabled, as that
certainly makes things simpler.  Thoughts?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 15/16] rcu: Make RCU-tasks wait for idle tasks
  2014-08-13 13:40         ` Peter Zijlstra
  2014-08-13 13:51           ` Steven Rostedt
@ 2014-08-13 14:12           ` Paul E. McKenney
  2014-08-13 14:42             ` Peter Zijlstra
  1 sibling, 1 reply; 60+ messages in thread
From: Paul E. McKenney @ 2014-08-13 14:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani

On Wed, Aug 13, 2014 at 03:40:25PM +0200, Peter Zijlstra wrote:
> On Wed, Aug 13, 2014 at 05:48:18AM -0700, Paul E. McKenney wrote:
> > On Wed, Aug 13, 2014 at 10:12:15AM +0200, Peter Zijlstra wrote:
> > > On Mon, Aug 11, 2014 at 03:49:04PM -0700, Paul E. McKenney wrote:
> > > > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > > > 
> > > > Because idle-task code may need to be patched, RCU-tasks need to wait
> > > > for idle tasks to schedule.  This commit therefore detects this case
> > > > via context switch.  Block CPU hotplug during this time to avoid sending
> > > > IPIs to offline CPUs.
> > > > 
> > > > Note that checking for changes in the dyntick-idle counters is tempting,
> > > > but wrong.  The reason that it is wrong is that a interrupt or NMI can
> > > > increment these counters without necessarily allowing the idle tasks to
> > > > make any forward progress.
> > > 
> > > I'm going to NAK this.. with that rcu_idle patch I send there's
> > > typically only a single idle function thats out of bounds and if its
> > > more it can be made that with a bit of tlc to the cpuidle driver in
> > > question.
> > > 
> > > This needs _FAR_ more justification than a maybe and a want.
> > 
> > Peter, your patch might be a good start, but I didn't see any reaction
> > from Steven or Masami and it did only x86.
> 
> That's not an excuse for doing horrible things. And inventing new infra
> that needs to wake all CPUs is horrible.

Does your patch even work?  Looks like it should, and yes, the idle loop
seems quite a bit simpler than it was a few years ago, but we really
don't need some strange thing that leaves a CPU idle but not visible as
such to RCU.

I have already said that I will be happy to rip out the wakeup code
when it is no longer needed, and I agree that it would be way better if
not needed.  But I won't base a patch on hypotheticals.  You have already
drawn way too much water from -that- well over the past years!  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 15/16] rcu: Make RCU-tasks wait for idle tasks
  2014-08-13 14:07             ` Peter Zijlstra
@ 2014-08-13 14:13               ` Steven Rostedt
  2014-08-13 14:43                 ` Paul E. McKenney
  2014-08-13 14:43                 ` Peter Zijlstra
  0 siblings, 2 replies; 60+ messages in thread
From: Steven Rostedt @ 2014-08-13 14:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, dhowells, edumazet, dvhart,
	fweisbec, oleg, bobby.prani

On Wed, 13 Aug 2014 16:07:05 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Aug 13, 2014 at 09:51:32AM -0400, Steven Rostedt wrote:
> 
> > I still need to look at the patches, but if this is just for the idle
> > case, then we don't need it. The idle case can be solved with a simple
> > sched_on_each_cpu(). I need a way to solve waiting for processes to
> > finish from a preemption point.
> > 
> > That's all I want, and if we can remove the "idle" case and document it
> > well that it's not covered and a sched_on_each_cpu() may be needed,
> > then I'm fine with that.
> > 
> > 	sched_on_each_cpu(dummy_op);
> > 	call_rcu_tasks(free_tramp);
> 
> Sure, but why not dtrt and push rcu_idle hooks all the way down into the
> idle drivers if and where appropriate?
> 
> There isn't _that_ much idle driver code. Also, some stuff should be
> cleaned up; we're already calling stop_critical_timings() in the generic
> idle code, and then calling it again in the cpuidle drivers.
> 
> 

True, perhaps the rcu code should hook into the stop_critical_timings
code?

-- Steve

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 14/16] rcu: Remove redundant preempt_disable() from rcu_note_voluntary_context_switch()
  2014-08-13 14:07       ` Paul E. McKenney
@ 2014-08-13 14:33         ` Peter Zijlstra
  2014-08-13 20:06           ` Paul E. McKenney
  0 siblings, 1 reply; 60+ messages in thread
From: Peter Zijlstra @ 2014-08-13 14:33 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani

[-- Attachment #1: Type: text/plain, Size: 1584 bytes --]

On Wed, Aug 13, 2014 at 07:07:51AM -0700, Paul E. McKenney wrote:
> On Wed, Aug 13, 2014 at 12:56:18PM +0200, Peter Zijlstra wrote:
> > On Mon, Aug 11, 2014 at 03:49:03PM -0700, Paul E. McKenney wrote:
> > > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > > 
> > > In theory, synchronize_sched() requires a read-side critical section to
> > > order against.  In practice, preemption can be thought of as being
> > > disabled across every machine instruction.  So this commit removes
> > > the redundant preempt_disable() from rcu_note_voluntary_context_switch().
> > 
> > >  #define rcu_note_voluntary_context_switch(t) \
> > >  	do { \
> > > -		preempt_disable(); /* Exclude synchronize_sched(); */ \
> > >  		if (ACCESS_ONCE((t)->rcu_tasks_holdout)) \
> > >  			ACCESS_ONCE((t)->rcu_tasks_holdout) = 0; \
> > > -		preempt_enable(); \
> > >  	} while (0)
> > 
> > But that's more than 1 instruction.
> 
> Yeah, the commit log could use some help.  The instruction in question
> is the store.  The "if" is just an optimization.
> 
> So suppose that this sequence is preempted between the "if" and the store,
> and that the synchronize_sched() (and quite a bit more besides!) takes
> place during this preemption.  The task is still in a quiescent state
> at the time of the store, so the store is still legitimate.
> 
> That said, it might be better to just leave preemption disabled, as that
> certainly makes things simpler.  Thoughts?

A comment explaining it should be fine I think. I was just raising the
obvious fail in the changelog.

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 15/16] rcu: Make RCU-tasks wait for idle tasks
  2014-08-13 14:12           ` Paul E. McKenney
@ 2014-08-13 14:42             ` Peter Zijlstra
  2014-08-13 17:24               ` Peter Zijlstra
  2014-08-13 18:20               ` Paul E. McKenney
  0 siblings, 2 replies; 60+ messages in thread
From: Peter Zijlstra @ 2014-08-13 14:42 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, rafael

[-- Attachment #1: Type: text/plain, Size: 1739 bytes --]

On Wed, Aug 13, 2014 at 07:12:17AM -0700, Paul E. McKenney wrote:
> > That's not an excuse for doing horrible things. And inventing new infra
> > that needs to wake all CPUs is horrible.
> 
> Does your patch even work? 

Haven't even tried compiling it, but making it work shouldn't be too
hard.

> Looks like it should, and yes, the idle loop
> seems quite a bit simpler than it was a few years ago, but we really
> don't need some strange thing that leaves a CPU idle but not visible as
> such to RCU.

There's slightly more to it though; things like the x86 mwait idle wait
functions tend to do far too much; for instance look at:

drivers/idle/intel_idle.c:intel_idle()

We should push the rcu_idle_{enter,exit}() down to around
mwait_idle_with_hints(), so we don't call half the word with RCU
disabled.

> I have already said that I will be happy to rip out the wakeup code
> when it is no longer needed, and I agree that it would be way better if
> not needed.

I'd prefer to dtrt now and not needing to fix it later.

Auditing all idle functions will be somewhat of a pain, but its entirely
doable. Looking at this stuff, it appears we can clean it up massively;
see how the generic cpuidle code already has the broadcast logic in, so
we can remove that from the drivers by setting the right flags.

We can similarly pull out the leave_mm() call by adding a
CPUIDLE_FLAG_TLB_FLUSH. At which point all we'd need to do is mark the
intel_idle (and all other cpuidle_state::enter functions with __notrace.

> But I won't base a patch on hypotheticals.  You have already
> drawn way too much water from -that- well over the past years!  ;-)

not entirely sure what you're referring to there ;-)

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 15/16] rcu: Make RCU-tasks wait for idle tasks
  2014-08-13 14:13               ` Steven Rostedt
@ 2014-08-13 14:43                 ` Paul E. McKenney
  2014-08-13 16:30                   ` Peter Zijlstra
  2014-08-13 16:35                   ` Peter Zijlstra
  2014-08-13 14:43                 ` Peter Zijlstra
  1 sibling, 2 replies; 60+ messages in thread
From: Paul E. McKenney @ 2014-08-13 14:43 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, dhowells, edumazet, dvhart,
	fweisbec, oleg, bobby.prani

On Wed, Aug 13, 2014 at 10:13:01AM -0400, Steven Rostedt wrote:
> On Wed, 13 Aug 2014 16:07:05 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Wed, Aug 13, 2014 at 09:51:32AM -0400, Steven Rostedt wrote:
> > 
> > > I still need to look at the patches, but if this is just for the idle
> > > case, then we don't need it. The idle case can be solved with a simple
> > > sched_on_each_cpu(). I need a way to solve waiting for processes to
> > > finish from a preemption point.
> > > 
> > > That's all I want, and if we can remove the "idle" case and document it
> > > well that it's not covered and a sched_on_each_cpu() may be needed,
> > > then I'm fine with that.
> > > 
> > > 	sched_on_each_cpu(dummy_op);
> > > 	call_rcu_tasks(free_tramp);
> > 
> > Sure, but why not dtrt and push rcu_idle hooks all the way down into the
> > idle drivers if and where appropriate?
> > 
> > There isn't _that_ much idle driver code. Also, some stuff should be
> > cleaned up; we're already calling stop_critical_timings() in the generic
> > idle code, and then calling it again in the cpuidle drivers.
> 
> True, perhaps the rcu code should hook into the stop_critical_timings
> code?

Is that safe given that stop_critical_timings() is invoked around other things?
Let's see...

o	drivers/acpi/acpi_pad.c, power_saving_thread().

	Looks like a kthread that does idle injection.  Currently, RCU
	sees it as not a quiescent state.  Would it kill these guys to
	put in a comment or two about what this is for???

	So adding rcu_idle_enter() and rcu_idle_exit() here might
	actually fix a bug, though it is not clear how long this thing
	actually runs.  If only for a few milliseconds, no harm done.

o	drivers/acpi/processor_idle.c, acpi_idle_do_entry().

	Hmmm...  Is the idle loop really simpler these days?  Or is the
	complexity just better hidden?  :-/

	So acpi_idle_do_entry() is called from several places, will chase
	them down later.  Does stop_critical_timings() nest?

o	drivers/thermal/intel_powerclamp.c, clamp_thread().

	Looks similar to power_saving_thread(), but for thermal control.
	Probably short term, shouldn't be a problem either way.

o	kernel/printk/printk.c, console_cont_flush().

	call_console_drivers() certainly isn't idle.  Might not use
	RCU read sides, but...

o	kernel/printk/printk.c, console_unlock()

	Same as console_cont_flush().

So the first three look OK to hook rcu_idle_enter() and rcu_idle_exit()
into, but the last two don't look so good.

That said, if you are OK not tracing the stuff under stop_critical_timings(),
then I can use the RCU dyntick-idle state and not wake anything up.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 15/16] rcu: Make RCU-tasks wait for idle tasks
  2014-08-13 14:13               ` Steven Rostedt
  2014-08-13 14:43                 ` Paul E. McKenney
@ 2014-08-13 14:43                 ` Peter Zijlstra
  1 sibling, 0 replies; 60+ messages in thread
From: Peter Zijlstra @ 2014-08-13 14:43 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Paul E. McKenney, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, dhowells, edumazet, dvhart,
	fweisbec, oleg, bobby.prani

[-- Attachment #1: Type: text/plain, Size: 1442 bytes --]

On Wed, Aug 13, 2014 at 10:13:01AM -0400, Steven Rostedt wrote:
> On Wed, 13 Aug 2014 16:07:05 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Wed, Aug 13, 2014 at 09:51:32AM -0400, Steven Rostedt wrote:
> > 
> > > I still need to look at the patches, but if this is just for the idle
> > > case, then we don't need it. The idle case can be solved with a simple
> > > sched_on_each_cpu(). I need a way to solve waiting for processes to
> > > finish from a preemption point.
> > > 
> > > That's all I want, and if we can remove the "idle" case and document it
> > > well that it's not covered and a sched_on_each_cpu() may be needed,
> > > then I'm fine with that.
> > > 
> > > 	sched_on_each_cpu(dummy_op);
> > > 	call_rcu_tasks(free_tramp);
> > 
> > Sure, but why not dtrt and push rcu_idle hooks all the way down into the
> > idle drivers if and where appropriate?
> > 
> > There isn't _that_ much idle driver code. Also, some stuff should be
> > cleaned up; we're already calling stop_critical_timings() in the generic
> > idle code, and then calling it again in the cpuidle drivers.
> > 
> > 
> 
> True, perhaps the rcu code should hook into the stop_critical_timings
> code?

Not sure; the current proposal would have rcu_idle code be far narrower
than the critical_timings thing, not sure if that's an accident or
desired.

If they have similar requirements we could indeed merge them.

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 15/16] rcu: Make RCU-tasks wait for idle tasks
  2014-08-13 14:43                 ` Paul E. McKenney
@ 2014-08-13 16:30                   ` Peter Zijlstra
  2014-08-13 16:43                     ` Jacob Pan
  2014-08-13 16:35                   ` Peter Zijlstra
  1 sibling, 1 reply; 60+ messages in thread
From: Peter Zijlstra @ 2014-08-13 16:30 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, dhowells, edumazet, dvhart,
	fweisbec, oleg, bobby.prani, jacob.jun.pan

[-- Attachment #1: Type: text/plain, Size: 838 bytes --]

On Wed, Aug 13, 2014 at 07:43:32AM -0700, Paul E. McKenney wrote:
> o	drivers/acpi/acpi_pad.c, power_saving_thread().
> 
> 	Looks like a kthread that does idle injection.  Currently, RCU
> 	sees it as not a quiescent state.  Would it kill these guys to
> 	put in a comment or two about what this is for???
> 
> 	So adding rcu_idle_enter() and rcu_idle_exit() here might
> 	actually fix a bug, though it is not clear how long this thing
> 	actually runs.  If only for a few milliseconds, no harm done.
> 

> o	drivers/thermal/intel_powerclamp.c, clamp_thread().
> 
> 	Looks similar to power_saving_thread(), but for thermal control.
> 	Probably short term, shouldn't be a problem either way.
> 

There's patches somewhere that make that go-away

  https://lkml.org/lkml/2014/6/4/56

Jacob was going to look at that.

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 15/16] rcu: Make RCU-tasks wait for idle tasks
  2014-08-13 14:43                 ` Paul E. McKenney
  2014-08-13 16:30                   ` Peter Zijlstra
@ 2014-08-13 16:35                   ` Peter Zijlstra
  2014-08-13 18:25                     ` Paul E. McKenney
  1 sibling, 1 reply; 60+ messages in thread
From: Peter Zijlstra @ 2014-08-13 16:35 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, dhowells, edumazet, dvhart,
	fweisbec, oleg, bobby.prani

[-- Attachment #1: Type: text/plain, Size: 579 bytes --]

On Wed, Aug 13, 2014 at 07:43:32AM -0700, Paul E. McKenney wrote:
> So the first three look OK to hook rcu_idle_enter() and rcu_idle_exit()
> into, but the last two don't look so good.
> 
> That said, if you are OK not tracing the stuff under stop_critical_timings(),
> then I can use the RCU dyntick-idle state and not wake anything up.

Either way, Steve could easily whip up a debug thing that could validate
that. Simply WARN whenever an __mcount happens when under rcu_idle.

And if we make these idle functions small enough that should not be a
problem at all.

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 15/16] rcu: Make RCU-tasks wait for idle tasks
  2014-08-13 16:30                   ` Peter Zijlstra
@ 2014-08-13 16:43                     ` Jacob Pan
  2014-08-13 18:24                       ` Paul E. McKenney
  0 siblings, 1 reply; 60+ messages in thread
From: Jacob Pan @ 2014-08-13 16:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, Steven Rostedt, linux-kernel, mingo, laijs,
	dipankar, akpm, mathieu.desnoyers, josh, tglx, dhowells,
	edumazet, dvhart, fweisbec, oleg, bobby.prani

On Wed, 13 Aug 2014 18:30:15 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Aug 13, 2014 at 07:43:32AM -0700, Paul E. McKenney wrote:
> > o	drivers/acpi/acpi_pad.c, power_saving_thread().
> > 
> > 	Looks like a kthread that does idle injection.  Currently,
> > RCU sees it as not a quiescent state.  Would it kill these guys to
> > 	put in a comment or two about what this is for???
> > 
> > 	So adding rcu_idle_enter() and rcu_idle_exit() here might
> > 	actually fix a bug, though it is not clear how long this
> > thing actually runs.  If only for a few milliseconds, no harm done.
> > 
> 
> > o	drivers/thermal/intel_powerclamp.c, clamp_thread().
> > 
> > 	Looks similar to power_saving_thread(), but for thermal
> > control. Probably short term, shouldn't be a problem either way.
> > 
> 
> There's patches somewhere that make that go-away
> 
>   https://lkml.org/lkml/2014/6/4/56
> 
> Jacob was going to look at that.

yes, it is on my plate, plan to submit for 3.18. the idle period is
only for a few miliseconds, default to 6ms.

Thanks,

Jacob

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 15/16] rcu: Make RCU-tasks wait for idle tasks
  2014-08-13 14:42             ` Peter Zijlstra
@ 2014-08-13 17:24               ` Peter Zijlstra
  2014-08-13 17:30                 ` Peter Zijlstra
  2014-08-13 18:16                 ` Peter Zijlstra
  2014-08-13 18:20               ` Paul E. McKenney
  1 sibling, 2 replies; 60+ messages in thread
From: Peter Zijlstra @ 2014-08-13 17:24 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, rafael

[-- Attachment #1: Type: text/plain, Size: 2693 bytes --]

On Wed, Aug 13, 2014 at 04:42:19PM +0200, Peter Zijlstra wrote:
> Auditing all idle functions will be somewhat of a pain, but its entirely
> doable. Looking at this stuff, it appears we can clean it up massively;
> see how the generic cpuidle code already has the broadcast logic in, so
> we can remove that from the drivers by setting the right flags.
> 
> We can similarly pull out the leave_mm() call by adding a
> CPUIDLE_FLAG_TLB_FLUSH. At which point all we'd need to do is mark the
> intel_idle (and all other cpuidle_state::enter functions with __notrace.

This removes the broadcast stuff from intel_idle.c; processor_idle.c hurts
my brain, but something similar should be possible.

---
 drivers/idle/intel_idle.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 4d140bbbe100..6613d4ee60ce 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -508,11 +508,8 @@ static int intel_idle(struct cpuidle_device *dev,
 	unsigned long ecx = 1; /* break on interrupt flag */
 	struct cpuidle_state *state = &drv->states[index];
 	unsigned long eax = flg2MWAIT(state->flags);
-	unsigned int cstate;
 	int cpu = smp_processor_id();
 
-	cstate = (((eax) >> MWAIT_SUBSTATE_SIZE) & MWAIT_CSTATE_MASK) + 1;
-
 	/*
 	 * leave_mm() to avoid costly and often unnecessary wakeups
 	 * for flushing the user TLB's associated with the active mm.
@@ -520,14 +517,8 @@ static int intel_idle(struct cpuidle_device *dev,
 	if (state->flags & CPUIDLE_FLAG_TLB_FLUSHED)
 		leave_mm(cpu);
 
-	if (!(lapic_timer_reliable_states & (1 << (cstate))))
-		clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu);
-
 	mwait_idle_with_hints(eax, ecx);
 
-	if (!(lapic_timer_reliable_states & (1 << (cstate))))
-		clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
-
 	return index;
 }
 
@@ -670,6 +661,7 @@ static int __init intel_idle_probe(void)
 {
 	unsigned int eax, ebx, ecx;
 	const struct x86_cpu_id *id;
+	int i;
 
 	if (max_cstate == 0) {
 		pr_debug(PREFIX "disabled\n");
@@ -705,6 +697,15 @@ static int __init intel_idle_probe(void)
 	else
 		on_each_cpu(__setup_broadcast_timer, (void *)true, 1);
 
+	for (i = 0; cpuidle_state_table[i].enter; i++) {
+		struct cpuidle_state *state = &cpuidle_state_table[i];
+		int cstate = ((flg2MWAIT(state->flags) >> MWAIT_SUBSTATE_SIZE) & 
+				MWAIT_CSTATE_MASK) + 1;
+
+		if (!(lapic_timer_reliable_states & (1 << cstate)))
+			state->flags |= CPUIDLE_FLAG_TIMER_STOP;
+	}
+
 	pr_debug(PREFIX "v" INTEL_IDLE_VERSION
 		" model 0x%X\n", boot_cpu_data.x86_model);
 

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 15/16] rcu: Make RCU-tasks wait for idle tasks
  2014-08-13 17:24               ` Peter Zijlstra
@ 2014-08-13 17:30                 ` Peter Zijlstra
  2014-08-13 18:16                 ` Peter Zijlstra
  1 sibling, 0 replies; 60+ messages in thread
From: Peter Zijlstra @ 2014-08-13 17:30 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, rafael

[-- Attachment #1: Type: text/plain, Size: 3430 bytes --]

On Wed, Aug 13, 2014 at 07:24:07PM +0200, Peter Zijlstra wrote:
> On Wed, Aug 13, 2014 at 04:42:19PM +0200, Peter Zijlstra wrote:
> > Auditing all idle functions will be somewhat of a pain, but its entirely
> > doable. Looking at this stuff, it appears we can clean it up massively;
> > see how the generic cpuidle code already has the broadcast logic in, so
> > we can remove that from the drivers by setting the right flags.
> > 
> > We can similarly pull out the leave_mm() call by adding a
> > CPUIDLE_FLAG_TLB_FLUSH. At which point all we'd need to do is mark the
> > intel_idle (and all other cpuidle_state::enter functions with __notrace.
> 
> This removes the broadcast stuff from intel_idle.c; processor_idle.c hurts
> my brain, but something similar should be possible.
> 

And this moves the leave_mm() bit to generic code.

---
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -101,14 +101,6 @@ static int intel_idle_cpu_init(int cpu);
 static struct cpuidle_state *cpuidle_state_table;
 
 /*
- * Set this flag for states where the HW flushes the TLB for us
- * and so we don't need cross-calls to keep it consistent.
- * If this flag is set, SW flushes the TLB, so even if the
- * HW doesn't do the flushing, this flag is safe to use.
- */
-#define CPUIDLE_FLAG_TLB_FLUSHED	0x10000
-
-/*
  * MWAIT takes an 8-bit "hint" in EAX "suggesting"
  * the C-state (top nibble) and sub-state (bottom nibble)
  * 0x00 means "MWAIT(C1)", 0x10 means "MWAIT(C2)" etc.
@@ -508,14 +500,6 @@ static int intel_idle(struct cpuidle_dev
 	unsigned long ecx = 1; /* break on interrupt flag */
 	struct cpuidle_state *state = &drv->states[index];
 	unsigned long eax = flg2MWAIT(state->flags);
-	int cpu = smp_processor_id();
-
-	/*
-	 * leave_mm() to avoid costly and often unnecessary wakeups
-	 * for flushing the user TLB's associated with the active mm.
-	 */
-	if (state->flags & CPUIDLE_FLAG_TLB_FLUSHED)
-		leave_mm(cpu);
 
 	mwait_idle_with_hints(eax, ecx);
 
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -56,6 +56,7 @@ struct cpuidle_state {
 #define CPUIDLE_FLAG_TIME_VALID	(0x01) /* is residency time measurable? */
 #define CPUIDLE_FLAG_COUPLED	(0x02) /* state applies to multiple cpus */
 #define CPUIDLE_FLAG_TIMER_STOP (0x04)  /* timer is stopped on this state */
+#define CPUIDLE_FLAG_TLB_FLUSHED (0x08) /* TLBs are flushed on this state */
 
 #define CPUIDLE_DRIVER_FLAGS_MASK (0xFFFF0000)
 
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -79,7 +79,7 @@ static void cpuidle_idle_call(void)
 	struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices);
 	struct cpuidle_driver *drv = cpuidle_get_cpu_driver(dev);
 	int next_state, entered_state;
-	unsigned int broadcast;
+	unsigned int broadcast, flags;
 
 	/*
 	 * Check if the idle task must be rescheduled. If it is the
@@ -135,7 +135,16 @@ static void cpuidle_idle_call(void)
 		goto exit_idle;
 	}
 
-	broadcast = drv->states[next_state].flags & CPUIDLE_FLAG_TIMER_STOP;
+	flags = drv->states[next_state].flags;
+
+	/*
+	 * leave_mm() to avoid costly and often unnecessary wakeups
+	 * for flushing the user TLB's associated with the active mm.
+	 */
+	if (flags & CPUIDLE_FLAG_TLB_FLUSHED)
+		leave_mm(dev->cpu);
+
+	broadcast = flags & CPUIDLE_FLAG_TIMER_STOP;
 
 	/*
 	 * Tell the time framework to switch to a broadcast timer

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 15/16] rcu: Make RCU-tasks wait for idle tasks
  2014-08-13 17:24               ` Peter Zijlstra
  2014-08-13 17:30                 ` Peter Zijlstra
@ 2014-08-13 18:16                 ` Peter Zijlstra
  1 sibling, 0 replies; 60+ messages in thread
From: Peter Zijlstra @ 2014-08-13 18:16 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, rafael

[-- Attachment #1: Type: text/plain, Size: 5739 bytes --]

On Wed, Aug 13, 2014 at 07:24:07PM +0200, Peter Zijlstra wrote:
> On Wed, Aug 13, 2014 at 04:42:19PM +0200, Peter Zijlstra wrote:
> > Auditing all idle functions will be somewhat of a pain, but its entirely
> > doable. Looking at this stuff, it appears we can clean it up massively;
> > see how the generic cpuidle code already has the broadcast logic in, so
> > we can remove that from the drivers by setting the right flags.
> > 
> > We can similarly pull out the leave_mm() call by adding a
> > CPUIDLE_FLAG_TLB_FLUSH. At which point all we'd need to do is mark the
> > intel_idle (and all other cpuidle_state::enter functions with __notrace.
> 
> This removes the broadcast stuff from intel_idle.c; processor_idle.c hurts
> my brain, but something similar should be possible.

Still hurts my brain; esp. acpi_idle_enter_bm() which calls a massive
lot of code to disable busmastering, but maybe its doable, we need that
__notrace detectoring.


Also, someone needs to double check this; preferably LenB or Rafael who
have seen this driver before. Teh pain.

---
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -171,32 +171,11 @@ static void lapic_timer_propagate_broadc
 				 (void *)pr, 1);
 }
 
-/* Power(C) State timer broadcast control */
-static void lapic_timer_state_broadcast(struct acpi_processor *pr,
-				       struct acpi_processor_cx *cx,
-				       int broadcast)
-{
-	int state = cx - pr->power.states;
-
-	if (state >= pr->power.timer_broadcast_on_state) {
-		unsigned long reason;
-
-		reason = broadcast ?  CLOCK_EVT_NOTIFY_BROADCAST_ENTER :
-			CLOCK_EVT_NOTIFY_BROADCAST_EXIT;
-		clockevents_notify(reason, &pr->id);
-	}
-}
-
 #else
 
 static void lapic_timer_check_state(int state, struct acpi_processor *pr,
 				   struct acpi_processor_cx *cstate) { }
 static void lapic_timer_propagate_broadcast(struct acpi_processor *pr) { }
-static void lapic_timer_state_broadcast(struct acpi_processor *pr,
-				       struct acpi_processor_cx *cx,
-				       int broadcast)
-{
-}
 
 #endif
 
@@ -717,19 +696,10 @@ static inline void acpi_idle_do_entry(st
 static int acpi_idle_enter_c1(struct cpuidle_device *dev,
 		struct cpuidle_driver *drv, int index)
 {
-	struct acpi_processor *pr;
 	struct acpi_processor_cx *cx = per_cpu(acpi_cstate[index], dev->cpu);
 
-	pr = __this_cpu_read(processors);
-
-	if (unlikely(!pr))
-		return -EINVAL;
-
-	lapic_timer_state_broadcast(pr, cx, 1);
 	acpi_idle_do_entry(cx);
 
-	lapic_timer_state_broadcast(pr, cx, 0);
-
 	return index;
 }
 
@@ -785,22 +755,8 @@ static int acpi_idle_enter_simple(struct
 		return acpi_idle_enter_c1(dev, drv, CPUIDLE_DRIVER_STATE_START);
 #endif
 
-	/*
-	 * Must be done before busmaster disable as we might need to
-	 * access HPET !
-	 */
-	lapic_timer_state_broadcast(pr, cx, 1);
-
-	if (cx->type == ACPI_STATE_C3)
-		ACPI_FLUSH_CPU_CACHE();
-
-	/* Tell the scheduler that we are going deep-idle: */
-	sched_clock_idle_sleep_event();
 	acpi_idle_do_entry(cx);
 
-	sched_clock_idle_wakeup_event(0);
-
-	lapic_timer_state_broadcast(pr, cx, 0);
 	return index;
 }
 
@@ -843,16 +799,6 @@ static int acpi_idle_enter_bm(struct cpu
 		}
 	}
 
-	acpi_unlazy_tlb(smp_processor_id());
-
-	/* Tell the scheduler that we are going deep-idle: */
-	sched_clock_idle_sleep_event();
-	/*
-	 * Must be done before busmaster disable as we might need to
-	 * access HPET !
-	 */
-	lapic_timer_state_broadcast(pr, cx, 1);
-
 	/*
 	 * disable bus master
 	 * bm_check implies we need ARB_DIS
@@ -884,9 +830,6 @@ static int acpi_idle_enter_bm(struct cpu
 		raw_spin_unlock(&c3_lock);
 	}
 
-	sched_clock_idle_wakeup_event(0);
-
-	lapic_timer_state_broadcast(pr, cx, 0);
 	return index;
 }
 
@@ -989,6 +932,7 @@ static int acpi_processor_setup_cpuidle_
 				state->flags |= CPUIDLE_FLAG_TIME_VALID;
 
 			state->enter = acpi_idle_enter_c1;
+			state->flags |= CPUIDLE_FLAG_TIMER_STOP;
 			state->enter_dead = acpi_idle_play_dead;
 			drv->safe_state_index = count;
 			break;
@@ -996,15 +940,20 @@ static int acpi_processor_setup_cpuidle_
 			case ACPI_STATE_C2:
 			state->flags |= CPUIDLE_FLAG_TIME_VALID;
 			state->enter = acpi_idle_enter_simple;
+			state->flags |= CPUIDLE_FLAG_TIMER_STOP;
 			state->enter_dead = acpi_idle_play_dead;
 			drv->safe_state_index = count;
 			break;
 
 			case ACPI_STATE_C3:
 			state->flags |= CPUIDLE_FLAG_TIME_VALID;
-			state->enter = pr->flags.bm_check ?
-					acpi_idle_enter_bm :
-					acpi_idle_enter_simple;
+			state->flags |= CPUIDLE_FLAG_TLB_FLUSHED;
+			state->flags |= CPUIDLE_FLAG_TIMER_STOP;
+			if (pr->flags.bm_check) {
+				state->enter = acpi_idle_enter_bm;
+			} else {
+				state->enter = acpi_idle_enter_simple;
+			}
 			break;
 		}
 
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -152,9 +152,11 @@ static void cpuidle_idle_call(void)
 	 * is used from another cpu as a broadcast timer, this call may
 	 * fail if it is not available
 	 */
-	if (broadcast &&
-	    clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &dev->cpu))
-		goto use_default;
+	if (broadcast) {
+		if (clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &dev->cpu))
+			goto use_default;
+		sched_clock_idle_sleep_event();
+	}
 
 	trace_cpu_idle_rcuidle(next_state, dev->cpu);
 
@@ -167,8 +169,10 @@ static void cpuidle_idle_call(void)
 
 	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev->cpu);
 
-	if (broadcast)
+	if (broadcast) {
+		sched_clock_idle_wakeup_event(0);
 		clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &dev->cpu);
+	}
 
 	/*
 	 * Give the governor an opportunity to reflect on the outcome

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 15/16] rcu: Make RCU-tasks wait for idle tasks
  2014-08-13 14:42             ` Peter Zijlstra
  2014-08-13 17:24               ` Peter Zijlstra
@ 2014-08-13 18:20               ` Paul E. McKenney
  2014-08-13 18:55                 ` Peter Zijlstra
  1 sibling, 1 reply; 60+ messages in thread
From: Paul E. McKenney @ 2014-08-13 18:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, rafael

On Wed, Aug 13, 2014 at 04:42:19PM +0200, Peter Zijlstra wrote:
> On Wed, Aug 13, 2014 at 07:12:17AM -0700, Paul E. McKenney wrote:
> > > That's not an excuse for doing horrible things. And inventing new infra
> > > that needs to wake all CPUs is horrible.
> > 
> > Does your patch even work? 
> 
> Haven't even tried compiling it, but making it work shouldn't be too
> hard.
> 
> > Looks like it should, and yes, the idle loop
> > seems quite a bit simpler than it was a few years ago, but we really
> > don't need some strange thing that leaves a CPU idle but not visible as
> > such to RCU.
> 
> There's slightly more to it though; things like the x86 mwait idle wait
> functions tend to do far too much; for instance look at:
> 
> drivers/idle/intel_idle.c:intel_idle()

OK, let's see if I can follow the idly bouncing ball.

Looks like most arches call cpu_startup_entry(), which calls
arch_cpu_idle_prepare() followed by cpu_idle_loop().

arch_cpu_idle_prepare() is local_fiq_enable() on ARM and empty elsewhere.

cpu_idle_loop() does tick_nohz_idle_enter(), which does some nohz stuff.
It also checks for offline, and invokes arch_cpu_idle_enter(), which
is empty except on x86 and ARM.  On ARM, it messes with LEDs, and on
x86 it appears to disable an NMI-based watchdog timer.
Interrupts are disabled, and we do either cpu_idle_poll() if specified
or cpuidle_idle_call().  cpu_idle_poll() is the old-time idle loop,
with rcu_idle_enter()/_exit() and enabling interrupts prior to spinning.
No sign of stop_critical_timings(), though.  So let's not bury
rcu_idle_enter() in stop_critical_timings().

cpuidle_idle_call() does a fastpath irq-enable/exit if need-resched,
then does stop_critical_timings() and rcu_idle_enter().  Then we
have the buried complexity with cpuidle_select(), but a negative
return says to check need-resched and enable interrupts or to
invoke arch_cpu_idle(), which executes various sleep instructions
on various architectures.  Some notable variants:

o	ARM has an arm_pm_idle() function pointer so that different
	SoCs can have different idle-power-down sequences.  Alternatively,
	some SoCs have functions named CPUNAME_do_idle(), for example,
	imx5_cpu_do_idle().  These seem to invoke processor.do_idle().

	Pushing rcu_idle_enter() and rcu_idle_exit() down below
	arch_cpu_idle() on ARM looks to be asking for it.

o	ARM64 does cpu_do_idle, which does a wait-for-interrupt instruction.

o	AVR calls into assembly, hand-coding the need-resched check.
	Not sure why that would still be needed.

o	CRIS has one function that just enables interrupts and returns,
	and anohter that enables interrupts and halts.

o	UM times the duration of the idle time based on what appears to
	be the time until the next event.

o	Unicore does a string of what appear to be no-ops.

o	x86 does a couple of levels of indirection, one being the
	x86_idle() function pointer and another being the safe_halt()
	function pointer.

	amd_e400_idle() is a bit ornate, but still does default_idle()
	which wrappers the safe_halt() pointer.

And various other architectures seem to work similarly, but lots of
hair here.  So Steven, you OK with the underlying arch_cpu_idle()
functions being off-limits to tracing?

Now, if cpuidle_select() returns non-negative, we are dealing with
the CPU-idle governor, which is invoked at the later cpuidle_enter().

Hmmm...  On the CPU-idle drivers...

o	apm_idle_driver puts the idle loop into the ->enter() function,
	apm_cpu_idle().

o	ACPI puts the idle loop in acpi_idle_do_entry(), and does call
	stop_critical_timings(), but not rcu_idle_enter().
	So presumably stop_critical_timings() can nest?  Not clear
	from the code.

o	The CPS driver is even stranger...  Is cps_gen_entry_code()
	really depositing assembly instructions into a buffer that is
	passed back as a function?

o	The intel_idle driver is the one with mwait_idle_with_hints(),
	so you covered it below.

Your patch covers the cpuidle_enter() transition, which means
that functions like cpuidle_enter(), acpi_idle_enter_c1(), and
acpi_idle_do_entry() would be off-limits to trampolining.  In the case
of CPS, quite a bit of code.

> We should push the rcu_idle_{enter,exit}() down to around
> mwait_idle_with_hints(), so we don't call half the word with RCU
> disabled.

That would be for the intel_idle.c CPU-idle driver.  The other drivers
also need rcu_idle_{enter,exit}().

> > I have already said that I will be happy to rip out the wakeup code
> > when it is no longer needed, and I agree that it would be way better if
> > not needed.
> 
> I'd prefer to dtrt now and not needing to fix it later.

Once it works, I might consider it "right" and adjust accordingly.
At the moment, speculation.

> Auditing all idle functions will be somewhat of a pain, but its entirely
> doable. Looking at this stuff, it appears we can clean it up massively;
> see how the generic cpuidle code already has the broadcast logic in, so
> we can remove that from the drivers by setting the right flags.

There is certainly quite a bit of hair in a number of these drivers,
no two ways about it.

> We can similarly pull out the leave_mm() call by adding a
> CPUIDLE_FLAG_TLB_FLUSH. At which point all we'd need to do is mark the
> intel_idle (and all other cpuidle_state::enter functions with __notrace.

That one seems to be specific to intel_idle.  But yes, nice to avoid
waking an idle CPU for TLB flushes.

> > But I won't base a patch on hypotheticals.  You have already
> > drawn way too much water from -that- well over the past years!  ;-)
> 
> not entirely sure what you're referring to there ;-)

Heh!

							Thanx, Paul


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 15/16] rcu: Make RCU-tasks wait for idle tasks
  2014-08-13 16:43                     ` Jacob Pan
@ 2014-08-13 18:24                       ` Paul E. McKenney
  0 siblings, 0 replies; 60+ messages in thread
From: Paul E. McKenney @ 2014-08-13 18:24 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Peter Zijlstra, Steven Rostedt, linux-kernel, mingo, laijs,
	dipankar, akpm, mathieu.desnoyers, josh, tglx, dhowells,
	edumazet, dvhart, fweisbec, oleg, bobby.prani

On Wed, Aug 13, 2014 at 09:43:05AM -0700, Jacob Pan wrote:
> On Wed, 13 Aug 2014 18:30:15 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Wed, Aug 13, 2014 at 07:43:32AM -0700, Paul E. McKenney wrote:
> > > o	drivers/acpi/acpi_pad.c, power_saving_thread().
> > > 
> > > 	Looks like a kthread that does idle injection.  Currently,
> > > RCU sees it as not a quiescent state.  Would it kill these guys to
> > > 	put in a comment or two about what this is for???
> > > 
> > > 	So adding rcu_idle_enter() and rcu_idle_exit() here might
> > > 	actually fix a bug, though it is not clear how long this
> > > thing actually runs.  If only for a few milliseconds, no harm done.
> > > 
> > 
> > > o	drivers/thermal/intel_powerclamp.c, clamp_thread().
> > > 
> > > 	Looks similar to power_saving_thread(), but for thermal
> > > control. Probably short term, shouldn't be a problem either way.
> > > 
> > 
> > There's patches somewhere that make that go-away
> > 
> >   https://lkml.org/lkml/2014/6/4/56
> > 
> > Jacob was going to look at that.
> 
> yes, it is on my plate, plan to submit for 3.18. the idle period is
> only for a few miliseconds, default to 6ms.

OK, 6ms shouldn't be a problem.  But if you want much longer, you
will need to do rcu_idle_enter() and rcu_idle_exit().  Of course, if
you are getting rid of this entirely, no need to worry -- though the
patch looks like it just uses a different idle loop.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 15/16] rcu: Make RCU-tasks wait for idle tasks
  2014-08-13 16:35                   ` Peter Zijlstra
@ 2014-08-13 18:25                     ` Paul E. McKenney
  0 siblings, 0 replies; 60+ messages in thread
From: Paul E. McKenney @ 2014-08-13 18:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, dhowells, edumazet, dvhart,
	fweisbec, oleg, bobby.prani

On Wed, Aug 13, 2014 at 06:35:22PM +0200, Peter Zijlstra wrote:
> On Wed, Aug 13, 2014 at 07:43:32AM -0700, Paul E. McKenney wrote:
> > So the first three look OK to hook rcu_idle_enter() and rcu_idle_exit()
> > into, but the last two don't look so good.
> > 
> > That said, if you are OK not tracing the stuff under stop_critical_timings(),
> > then I can use the RCU dyntick-idle state and not wake anything up.
> 
> Either way, Steve could easily whip up a debug thing that could validate
> that. Simply WARN whenever an __mcount happens when under rcu_idle.
> 
> And if we make these idle functions small enough that should not be a
> problem at all.

Right now, the CPU-idle drivers look quite hairy and error-prone.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 15/16] rcu: Make RCU-tasks wait for idle tasks
  2014-08-13 18:20               ` Paul E. McKenney
@ 2014-08-13 18:55                 ` Peter Zijlstra
  2014-08-13 19:54                   ` Paul E. McKenney
  0 siblings, 1 reply; 60+ messages in thread
From: Peter Zijlstra @ 2014-08-13 18:55 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, rafael

[-- Attachment #1: Type: text/plain, Size: 3875 bytes --]

On Wed, Aug 13, 2014 at 11:20:30AM -0700, Paul E. McKenney wrote:
> cpuidle_idle_call() does a fastpath irq-enable/exit if need-resched,
> then does stop_critical_timings() and rcu_idle_enter().  Then we
> have the buried complexity with cpuidle_select(), but a negative
> return says to check need-resched and enable interrupts or to
> invoke arch_cpu_idle(), which executes various sleep instructions
> on various architectures.  Some notable variants:


> And various other architectures seem to work similarly, but lots of
> hair here.  So Steven, you OK with the underlying arch_cpu_idle()
> functions being off-limits to tracing?

I didn't find anything particularly hairy in the arch_cpu_idle()
implementations, lots of simple 'go sleep' or 'spin' like things.

> Now, if cpuidle_select() returns non-negative, we are dealing with
> the CPU-idle governor, which is invoked at the later cpuidle_enter().
> 
> Hmmm...  On the CPU-idle drivers...
> 
> o	apm_idle_driver puts the idle loop into the ->enter() function,
> 	apm_cpu_idle().

Yes, this one is creative. The best I came up with is
adding CPUIDLE_FLAG_RCU_IDLE which indicates that the driver will do the
rcu_idle calls and place them in apm_do_idle() around the
apm_bios_call_simple() thing.

Now, that apm_bios_call_simple() thing uses on_cpu0(), which schedules
work on cpu0, which to me seems to guarantee this won't be used on any
SMP system, because that simply _cannot_ work for idle.

And on UP its a few more function calls, we could sprinkle some
__always_inline()s around if we really care I suppose.

> o	ACPI puts the idle loop in acpi_idle_do_entry(), and does call
> 	stop_critical_timings(), but not rcu_idle_enter().
> 	So presumably stop_critical_timings() can nest?  Not clear
> 	from the code.

Yeah, so I'm not sure I see that they nest properly..

Still ACPI does a lot of weird crap in the busmaster idle function,
again I'd suggest that CPUIDLE_FLAG_RCU_IDLE which would let the driver
do rcu_idle itself, and place it in appropriate sites.

Not too hard I think in this case.

> o	The CPS driver is even stranger...  Is cps_gen_entry_code()
> 	really depositing assembly instructions into a buffer that is
> 	passed back as a function?

I had not yet looked at this one; its got that cpu_pm_{enter,exit}()
thing going.. we could do the same and place the manual RCU_IDLE around
cps_pm_enter_state()

> o	The intel_idle driver is the one with mwait_idle_with_hints(),
> 	so you covered it below.

Yeah, fairly straight fwd driver that, _lots_ saner than the ACPI one.

> Your patch covers the cpuidle_enter() transition, which means
> that functions like cpuidle_enter(), acpi_idle_enter_c1(), and
> acpi_idle_do_entry() would be off-limits to trampolining.  In the case
> of CPS, quite a bit of code.

So I think we can do this; sure lots of code, but typically 'simpler'
than RCU stuff.

> > We should push the rcu_idle_{enter,exit}() down to around
> > mwait_idle_with_hints(), so we don't call half the word with RCU
> > disabled.
> 
> That would be for the intel_idle.c CPU-idle driver.  The other drivers
> also need rcu_idle_{enter,exit}().

Right, so simple drivers can use the generic rcu_idle bits from
kernel/sched/idle.c and difficult drivers can use CPUIDLE_FLAG_RCU_IDLE
and do some manual cleverness.

> > > I have already said that I will be happy to rip out the wakeup code
> > > when it is no longer needed, and I agree that it would be way better if
> > > not needed.
> > 
> > I'd prefer to dtrt now and not needing to fix it later.
> 
> Once it works, I might consider it "right" and adjust accordingly.
> At the moment, speculation.

I think its simpler than doing RCU, maybe a little more work, but hey,
I'm the idiot that does full arch/ sweeps on a semi regular basis.


[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 15/16] rcu: Make RCU-tasks wait for idle tasks
  2014-08-13 18:55                 ` Peter Zijlstra
@ 2014-08-13 19:54                   ` Paul E. McKenney
  0 siblings, 0 replies; 60+ messages in thread
From: Paul E. McKenney @ 2014-08-13 19:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, rafael

On Wed, Aug 13, 2014 at 08:55:29PM +0200, Peter Zijlstra wrote:
> On Wed, Aug 13, 2014 at 11:20:30AM -0700, Paul E. McKenney wrote:
> > cpuidle_idle_call() does a fastpath irq-enable/exit if need-resched,
> > then does stop_critical_timings() and rcu_idle_enter().  Then we
> > have the buried complexity with cpuidle_select(), but a negative
> > return says to check need-resched and enable interrupts or to
> > invoke arch_cpu_idle(), which executes various sleep instructions
> > on various architectures.  Some notable variants:
> 
> > And various other architectures seem to work similarly, but lots of
> > hair here.  So Steven, you OK with the underlying arch_cpu_idle()
> > functions being off-limits to tracing?
> 
> I didn't find anything particularly hairy in the arch_cpu_idle()
> implementations, lots of simple 'go sleep' or 'spin' like things.

"Hairy" in terms of lots of assembly, in many cases apparently tightly
coupled to a given SoC.

> > Now, if cpuidle_select() returns non-negative, we are dealing with
> > the CPU-idle governor, which is invoked at the later cpuidle_enter().
> > 
> > Hmmm...  On the CPU-idle drivers...
> > 
> > o	apm_idle_driver puts the idle loop into the ->enter() function,
> > 	apm_cpu_idle().
> 
> Yes, this one is creative. The best I came up with is
> adding CPUIDLE_FLAG_RCU_IDLE which indicates that the driver will do the
> rcu_idle calls and place them in apm_do_idle() around the
> apm_bios_call_simple() thing.
> 
> Now, that apm_bios_call_simple() thing uses on_cpu0(), which schedules
> work on cpu0, which to me seems to guarantee this won't be used on any
> SMP system, because that simply _cannot_ work for idle.
> 
> And on UP its a few more function calls, we could sprinkle some
> __always_inline()s around if we really care I suppose.

Heh, I missed the UP-only pieces.

> > o	ACPI puts the idle loop in acpi_idle_do_entry(), and does call
> > 	stop_critical_timings(), but not rcu_idle_enter().
> > 	So presumably stop_critical_timings() can nest?  Not clear
> > 	from the code.
> 
> Yeah, so I'm not sure I see that they nest properly..
> 
> Still ACPI does a lot of weird crap in the busmaster idle function,
> again I'd suggest that CPUIDLE_FLAG_RCU_IDLE which would let the driver
> do rcu_idle itself, and place it in appropriate sites.
> 
> Not too hard I think in this case.

My main concern is avoiding situations where the driver manages to loop
without passing through the rcu_idle_enter() and rcu_idle_exit().
In case someone manages to misconfigure things so that the driver
just endlessly shuttles among states without actually ever really
going idle.

> > o	The CPS driver is even stranger...  Is cps_gen_entry_code()
> > 	really depositing assembly instructions into a buffer that is
> > 	passed back as a function?
> 
> I had not yet looked at this one; its got that cpu_pm_{enter,exit}()
> thing going.. we could do the same and place the manual RCU_IDLE around
> cps_pm_enter_state()

Again, as long as it avoids loops that don't include code under
rcu_idle_enter().

> > o	The intel_idle driver is the one with mwait_idle_with_hints(),
> > 	so you covered it below.
> 
> Yeah, fairly straight fwd driver that, _lots_ saner than the ACPI one.

Now -that- is damning with faint praise!  ;-)

> > Your patch covers the cpuidle_enter() transition, which means
> > that functions like cpuidle_enter(), acpi_idle_enter_c1(), and
> > acpi_idle_do_entry() would be off-limits to trampolining.  In the case
> > of CPS, quite a bit of code.
> 
> So I think we can do this; sure lots of code, but typically 'simpler'
> than RCU stuff.

For some definition of "simpler".  ;-)

> > > We should push the rcu_idle_{enter,exit}() down to around
> > > mwait_idle_with_hints(), so we don't call half the word with RCU
> > > disabled.
> > 
> > That would be for the intel_idle.c CPU-idle driver.  The other drivers
> > also need rcu_idle_{enter,exit}().
> 
> Right, so simple drivers can use the generic rcu_idle bits from
> kernel/sched/idle.c and difficult drivers can use CPUIDLE_FLAG_RCU_IDLE
> and do some manual cleverness.

OK, but in this case the relevant definition of "simple" is "never will
need a trampoline".  Steven, thoughts?

> > > > I have already said that I will be happy to rip out the wakeup code
> > > > when it is no longer needed, and I agree that it would be way better if
> > > > not needed.
> > > 
> > > I'd prefer to dtrt now and not needing to fix it later.
> > 
> > Once it works, I might consider it "right" and adjust accordingly.
> > At the moment, speculation.
> 
> I think its simpler than doing RCU, maybe a little more work, but hey,
> I'm the idiot that does full arch/ sweeps on a semi regular basis.

Hmmm...  Exactly what are you thinking could be enabled by this
proposed change to the idle code?  From what I can see at the moment,
it would allow me to drop the schedule-on-holdout-idle code and allow
synchronize_sched() to be substituted on !CONFIG_PREEMPT kernels.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 14/16] rcu: Remove redundant preempt_disable() from rcu_note_voluntary_context_switch()
  2014-08-13 14:33         ` Peter Zijlstra
@ 2014-08-13 20:06           ` Paul E. McKenney
  0 siblings, 0 replies; 60+ messages in thread
From: Paul E. McKenney @ 2014-08-13 20:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani

On Wed, Aug 13, 2014 at 04:33:10PM +0200, Peter Zijlstra wrote:
> On Wed, Aug 13, 2014 at 07:07:51AM -0700, Paul E. McKenney wrote:
> > On Wed, Aug 13, 2014 at 12:56:18PM +0200, Peter Zijlstra wrote:
> > > On Mon, Aug 11, 2014 at 03:49:03PM -0700, Paul E. McKenney wrote:
> > > > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > > > 
> > > > In theory, synchronize_sched() requires a read-side critical section to
> > > > order against.  In practice, preemption can be thought of as being
> > > > disabled across every machine instruction.  So this commit removes
> > > > the redundant preempt_disable() from rcu_note_voluntary_context_switch().
> > > 
> > > >  #define rcu_note_voluntary_context_switch(t) \
> > > >  	do { \
> > > > -		preempt_disable(); /* Exclude synchronize_sched(); */ \
> > > >  		if (ACCESS_ONCE((t)->rcu_tasks_holdout)) \
> > > >  			ACCESS_ONCE((t)->rcu_tasks_holdout) = 0; \
> > > > -		preempt_enable(); \
> > > >  	} while (0)
> > > 
> > > But that's more than 1 instruction.
> > 
> > Yeah, the commit log could use some help.  The instruction in question
> > is the store.  The "if" is just an optimization.
> > 
> > So suppose that this sequence is preempted between the "if" and the store,
> > and that the synchronize_sched() (and quite a bit more besides!) takes
> > place during this preemption.  The task is still in a quiescent state
> > at the time of the store, so the store is still legitimate.
> > 
> > That said, it might be better to just leave preemption disabled, as that
> > certainly makes things simpler.  Thoughts?
> 
> A comment explaining it should be fine I think. I was just raising the
> obvious fail in the changelog.

Fair enough, here is the update.

							Thanx, Paul

------------------------------------------------------------------------

rcu: Remove redundant preempt_disable() from rcu_note_voluntary_context_switch()

In theory, synchronize_sched() requires a read-side critical section
to order against.  In practice, preemption can be thought of as
being disabled across every machine instruction, at least for those
machine instructions that are not in the idle loop and not on offline
CPUs.  So this commit removes the redundant preempt_disable() from
rcu_note_voluntary_context_switch().

Please note that the single instruction in question is the store of
zero to ->rcu_tasks_holdout.  The "if" is simply a performance optimization
that avoids unnecessary stores.  To see this, keep in mind that both
the "if" condition and the store are in a quiescent state.  Therefore,
even if the task is preempted for a full grace period (presumably due
to its having done a context switch beforehand), the store will be
recording a legitimate quiescent state.

Reported-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index f504f797c9c8..ed6e3e2e0089 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -326,10 +326,8 @@ static inline void rcu_user_hooks_switch(struct task_struct *prev,
 extern struct srcu_struct tasks_rcu_exit_srcu;
 #define rcu_note_voluntary_context_switch(t) \
 	do { \
-		preempt_disable(); /* Exclude synchronize_sched(); */ \
 		if (ACCESS_ONCE((t)->rcu_tasks_holdout)) \
 			ACCESS_ONCE((t)->rcu_tasks_holdout) = 0; \
-		preempt_enable(); \
 	} while (0)
 #else /* #ifdef CONFIG_TASKS_RCU */
 #define TASKS_RCU(x) do { } while (0)


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 15/16] rcu: Make RCU-tasks wait for idle tasks
  2014-08-13 13:51           ` Steven Rostedt
  2014-08-13 14:07             ` Peter Zijlstra
@ 2014-08-13 20:56             ` Paul E. McKenney
  1 sibling, 0 replies; 60+ messages in thread
From: Paul E. McKenney @ 2014-08-13 20:56 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, dhowells, edumazet, dvhart,
	fweisbec, oleg, bobby.prani

On Wed, Aug 13, 2014 at 09:51:32AM -0400, Steven Rostedt wrote:
> On Wed, 13 Aug 2014 15:40:25 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Wed, Aug 13, 2014 at 05:48:18AM -0700, Paul E. McKenney wrote:
> > > On Wed, Aug 13, 2014 at 10:12:15AM +0200, Peter Zijlstra wrote:
> > > > On Mon, Aug 11, 2014 at 03:49:04PM -0700, Paul E. McKenney wrote:
> > > > > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > > > > 
> > > > > Because idle-task code may need to be patched, RCU-tasks need to wait
> > > > > for idle tasks to schedule.  This commit therefore detects this case
> > > > > via context switch.  Block CPU hotplug during this time to avoid sending
> > > > > IPIs to offline CPUs.
> > > > > 
> > > > > Note that checking for changes in the dyntick-idle counters is tempting,
> > > > > but wrong.  The reason that it is wrong is that a interrupt or NMI can
> > > > > increment these counters without necessarily allowing the idle tasks to
> > > > > make any forward progress.
> > > > 
> > > > I'm going to NAK this.. with that rcu_idle patch I send there's
> > > > typically only a single idle function thats out of bounds and if its
> > > > more it can be made that with a bit of tlc to the cpuidle driver in
> > > > question.
> > > > 
> > > > This needs _FAR_ more justification than a maybe and a want.
> > > 
> > > Peter, your patch might be a good start, but I didn't see any reaction
> > > from Steven or Masami and it did only x86.
> > 
> > That's not an excuse for doing horrible things. And inventing new infra
> > that needs to wake all CPUs is horrible.
> 
> I still need to look at the patches, but if this is just for the idle
> case, then we don't need it. The idle case can be solved with a simple
> sched_on_each_cpu(). I need a way to solve waiting for processes to
> finish from a preemption point.
> 
> That's all I want, and if we can remove the "idle" case and document it
> well that it's not covered and a sched_on_each_cpu() may be needed,
> then I'm fine with that.
> 
> 	sched_on_each_cpu(dummy_op);
> 	call_rcu_tasks(free_tramp);
> 
> Would that work?

If you are taking that approach, I can of course drop my commit dealing
with idle tasks.  Should the rcu_idle_enter() and rcu_idle_exit() calls
avoid cover any functions needing trampolines, it would be easy to pull
them back in -- especially given that the RCU dyntick-idle information
would call out the quiescent states appropriately.

So unless you tell me otherwise, Steven, I will drop the idle-detection
commit in favor of your sched_on_each_cpu() approach.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 05/16] rcu: Export RCU-tasks APIs to GPL modules
  2014-08-11 22:48   ` [PATCH v5 tip/core/rcu 05/16] rcu: Export RCU-tasks APIs to GPL modules Paul E. McKenney
@ 2014-08-14 19:08     ` Pranith Kumar
  2014-08-14 21:29       ` Paul E. McKenney
  0 siblings, 1 reply; 60+ messages in thread
From: Pranith Kumar @ 2014-08-14 19:08 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Ingo Molnar, Lai Jiangshan, Dipankar Sarma, Andrew Morton,
	Mathieu Desnoyers, Josh Triplett, Thomas Gleixner,
	Peter Zijlstra, Steven Rostedt, David Howells, Eric Dumazet,
	dvhart, Frédéric Weisbecker, Oleg Nesterov

On Mon, Aug 11, 2014 at 6:48 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
>
> This commit exports the RCU-tasks APIs, call_rcu_tasks(),
> synchronize_rcu_tasks(), and rcu_barrier_tasks(), to GPL-licensed
> kernel modules.

Only two of these are being exported in this patch. Patch 1 is adding
the export for call_rcu_tasks().


>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
> ---
>  kernel/rcu/update.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> index 4cece6e886ee..8f53a41dd9ee 100644
> --- a/kernel/rcu/update.c
> +++ b/kernel/rcu/update.c
> @@ -433,6 +433,7 @@ void synchronize_rcu_tasks(void)
>         /* Wait for the grace period. */
>         wait_rcu_gp(call_rcu_tasks);
>  }
> +EXPORT_SYMBOL_GPL(synchronize_rcu_tasks);
>
>  /**
>   * rcu_barrier_tasks - Wait for in-flight call_rcu_tasks() callbacks.
> @@ -445,6 +446,7 @@ void rcu_barrier_tasks(void)
>         /* There is only one callback queue, so this is easy.  ;-) */
>         synchronize_rcu_tasks();
>  }
> +EXPORT_SYMBOL_GPL(rcu_barrier_tasks);
>
>  /* See if tasks are still holding out, complain if so. */
>  static void check_holdout_task(struct task_struct *t)
> --
> 1.8.1.5
>



-- 
Pranith

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 01/16] rcu: Add call_rcu_tasks()
  2014-08-11 22:48 ` [PATCH v5 tip/core/rcu 01/16] rcu: Add call_rcu_tasks() Paul E. McKenney
                     ` (14 preceding siblings ...)
  2014-08-11 22:49   ` [PATCH v5 tip/core/rcu 16/16] rcu: Additional information on RCU-tasks stall-warning messages Paul E. McKenney
@ 2014-08-14 20:46   ` Pranith Kumar
  2014-08-14 21:22     ` Paul E. McKenney
  15 siblings, 1 reply; 60+ messages in thread
From: Pranith Kumar @ 2014-08-14 20:46 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Ingo Molnar, Lai Jiangshan, Dipankar Sarma, Andrew Morton,
	Mathieu Desnoyers, Josh Triplett, Thomas Gleixner,
	Peter Zijlstra, Steven Rostedt, David Howells, Eric Dumazet,
	dvhart, Frédéric Weisbecker, Oleg Nesterov

On Mon, Aug 11, 2014 at 6:48 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
>
> This commit adds a new RCU-tasks flavor of RCU, which provides
> call_rcu_tasks().  This RCU flavor's quiescent states are voluntary
> context switch (not preemption!), userspace execution, and the idle loop.
> Note that unlike other RCU flavors, these quiescent states occur in tasks,
> not necessarily CPUs.  Includes fixes from Steven Rostedt.
>
> This RCU flavor is assumed to have very infrequent latency-tolerant
> updaters.  This assumption permits significant simplifications, including
> a single global callback list protected by a single global lock, along
> with a single linked list containing all tasks that have not yet passed
> through a quiescent state.  If experience shows this assumption to be
> incorrect, the required additional complexity will be added.
>
> Suggested-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Please find comments below. I did not read all the ~100 emails in this
series, so please forgive if I ask something repetitive and just point
that out. I will go digging :)

> ---
>  include/linux/init_task.h |   9 +++
>  include/linux/rcupdate.h  |  36 ++++++++++
>  include/linux/sched.h     |  23 ++++---
>  init/Kconfig              |  10 +++
>  kernel/rcu/tiny.c         |   2 +
>  kernel/rcu/tree.c         |   2 +
>  kernel/rcu/update.c       | 171 ++++++++++++++++++++++++++++++++++++++++++++++
>  7 files changed, 242 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/init_task.h b/include/linux/init_task.h
> index 6df7f9fe0d01..78715ea7c30c 100644
> --- a/include/linux/init_task.h
> +++ b/include/linux/init_task.h
> @@ -124,6 +124,14 @@ extern struct group_info init_groups;
>  #else
>  #define INIT_TASK_RCU_PREEMPT(tsk)
>  #endif
> +#ifdef CONFIG_TASKS_RCU
> +#define INIT_TASK_RCU_TASKS(tsk)                                       \
> +       .rcu_tasks_holdout = false,                                     \
> +       .rcu_tasks_holdout_list =                                       \
> +               LIST_HEAD_INIT(tsk.rcu_tasks_holdout_list),
> +#else
> +#define INIT_TASK_RCU_TASKS(tsk)
> +#endif

rcu_tasks_holdout is defined as an int. So use 0 may be?

I see that there are other locations which set it to 'false'. So may
just change the definition to bool, as it seems more appropriate.

Also why is rcu_tasks_nvcsw not being initialized? I see that it can
be read before initialized, no?

>
>  extern struct cred init_cred;
>
> @@ -231,6 +239,7 @@ extern struct task_group root_task_group;
>         INIT_FTRACE_GRAPH                                               \
>         INIT_TRACE_RECURSION                                            \
>         INIT_TASK_RCU_PREEMPT(tsk)                                      \
> +       INIT_TASK_RCU_TASKS(tsk)                                        \
>         INIT_CPUSET_SEQ(tsk)                                            \
>         INIT_RT_MUTEXES(tsk)                                            \
>         INIT_VTIME(tsk)                                                 \
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 6a94cc8b1ca0..829efc99df3e 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -197,6 +197,26 @@ void call_rcu_sched(struct rcu_head *head,
>
>  void synchronize_sched(void);
>
> +/**
> + * call_rcu_tasks() - Queue an RCU for invocation task-based grace period

-ENOPARSE :(

> + * @head: structure to be used for queueing the RCU updates.
> + * @func: actual callback function to be invoked after the grace period
> + *
> + * The callback function will be invoked some time after a full grace
> + * period elapses, in other words after all currently executing RCU
> + * read-side critical sections have completed. call_rcu_tasks() assumes
> + * that the read-side critical sections end at a voluntary context
> + * switch (not a preemption!), entry into idle, or transition to usermode
> + * execution.  As such, there are no read-side primitives analogous to
> + * rcu_read_lock() and rcu_read_unlock() because this primitive is intended
> + * to determine that all tasks have passed through a safe state, not so
> + * much for data-strcuture synchronization.

s/strcuture/structure

> + *
> + * See the description of call_rcu() for more detailed information on
> + * memory ordering guarantees.
> + */
> +void call_rcu_tasks(struct rcu_head *head, void (*func)(struct rcu_head *head));
> +
>  #ifdef CONFIG_PREEMPT_RCU
>
>  void __rcu_read_lock(void);
> @@ -294,6 +314,22 @@ static inline void rcu_user_hooks_switch(struct task_struct *prev,
>                 rcu_irq_exit(); \
>         } while (0)
>
> +/*
> + * Note a voluntary context switch for RCU-tasks benefit.  This is a
> + * macro rather than an inline function to avoid #include hell.
> + */
> +#ifdef CONFIG_TASKS_RCU
> +#define rcu_note_voluntary_context_switch(t) \
> +       do { \
> +               preempt_disable(); /* Exclude synchronize_sched(); */ \
> +               if (ACCESS_ONCE((t)->rcu_tasks_holdout)) \
> +                       ACCESS_ONCE((t)->rcu_tasks_holdout) = 0; \
> +               preempt_enable(); \
> +       } while (0)
> +#else /* #ifdef CONFIG_TASKS_RCU */
> +#define rcu_note_voluntary_context_switch(t)   do { } while (0)
> +#endif /* #else #ifdef CONFIG_TASKS_RCU */
> +
>  #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) || defined(CONFIG_SMP)
>  bool __rcu_is_watching(void);
>  #endif /* #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) || defined(CONFIG_SMP) */
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 306f4f0c987a..3cf124389ec7 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1273,6 +1273,11 @@ struct task_struct {
>  #ifdef CONFIG_RCU_BOOST
>         struct rt_mutex *rcu_boost_mutex;
>  #endif /* #ifdef CONFIG_RCU_BOOST */
> +#ifdef CONFIG_TASKS_RCU
> +       unsigned long rcu_tasks_nvcsw;
> +       int rcu_tasks_holdout;
> +       struct list_head rcu_tasks_holdout_list;
> +#endif /* #ifdef CONFIG_TASKS_RCU */
>
>  #if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT)
>         struct sched_info sched_info;
> @@ -1998,31 +2003,27 @@ extern void task_clear_jobctl_pending(struct task_struct *task,
>                                       unsigned int mask);
>
>  #ifdef CONFIG_PREEMPT_RCU
> -
>  #define RCU_READ_UNLOCK_BLOCKED (1 << 0) /* blocked while in RCU read-side. */
>  #define RCU_READ_UNLOCK_NEED_QS (1 << 1) /* RCU core needs CPU response. */
> +#endif /* #ifdef CONFIG_PREEMPT_RCU */
>
>  static inline void rcu_copy_process(struct task_struct *p)
>  {
> +#ifdef CONFIG_PREEMPT_RCU
>         p->rcu_read_lock_nesting = 0;
>         p->rcu_read_unlock_special = 0;
> -#ifdef CONFIG_TREE_PREEMPT_RCU
>         p->rcu_blocked_node = NULL;
> -#endif /* #ifdef CONFIG_TREE_PREEMPT_RCU */
>  #ifdef CONFIG_RCU_BOOST
>         p->rcu_boost_mutex = NULL;
>  #endif /* #ifdef CONFIG_RCU_BOOST */
>         INIT_LIST_HEAD(&p->rcu_node_entry);
> +#endif /* #ifdef CONFIG_PREEMPT_RCU */
> +#ifdef CONFIG_TASKS_RCU
> +       p->rcu_tasks_holdout = false;
> +       INIT_LIST_HEAD(&p->rcu_tasks_holdout_list);
> +#endif /* #ifdef CONFIG_TASKS_RCU */
>  }

I think rcu_tasks_nvcsw needs to be set here too.

>
> -#else
> -
> -static inline void rcu_copy_process(struct task_struct *p)
> -{
> -}
> -
> -#endif
> -
>  static inline void tsk_restore_flags(struct task_struct *task,
>                                 unsigned long orig_flags, unsigned long flags)
>  {
> diff --git a/init/Kconfig b/init/Kconfig
> index 9d76b99af1b9..c56cb62a2df1 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -507,6 +507,16 @@ config PREEMPT_RCU
>           This option enables preemptible-RCU code that is common between
>           the TREE_PREEMPT_RCU and TINY_PREEMPT_RCU implementations.
>
> +config TASKS_RCU
> +       bool "Task_based RCU implementation using voluntary context switch"
> +       default n
> +       help
> +         This option enables a task-based RCU implementation that uses
> +         only voluntary context switch (not preemption!), idle, and
> +         user-mode execution as quiescent states.
> +
> +         If unsure, say N.
> +
>  config RCU_STALL_COMMON
>         def_bool ( TREE_RCU || TREE_PREEMPT_RCU || RCU_TRACE )
>         help
> diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
> index d9efcc13008c..717f00854fc0 100644
> --- a/kernel/rcu/tiny.c
> +++ b/kernel/rcu/tiny.c
> @@ -254,6 +254,8 @@ void rcu_check_callbacks(int cpu, int user)
>                 rcu_sched_qs(cpu);
>         else if (!in_softirq())
>                 rcu_bh_qs(cpu);
> +       if (user)
> +               rcu_note_voluntary_context_switch(current);
>  }
>
>  /*
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 625d0b0cd75a..f958c52f644d 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2413,6 +2413,8 @@ void rcu_check_callbacks(int cpu, int user)
>         rcu_preempt_check_callbacks(cpu);
>         if (rcu_pending(cpu))
>                 invoke_rcu_core();
> +       if (user)
> +               rcu_note_voluntary_context_switch(current);
>         trace_rcu_utilization(TPS("End scheduler-tick"));
>  }
>
> diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> index bc7883570530..f6f164119a14 100644
> --- a/kernel/rcu/update.c
> +++ b/kernel/rcu/update.c
> @@ -47,6 +47,7 @@
>  #include <linux/hardirq.h>
>  #include <linux/delay.h>
>  #include <linux/module.h>
> +#include <linux/kthread.h>
>
>  #define CREATE_TRACE_POINTS
>
> @@ -350,3 +351,173 @@ static int __init check_cpu_stall_init(void)
>  early_initcall(check_cpu_stall_init);
>
>  #endif /* #ifdef CONFIG_RCU_STALL_COMMON */
> +
> +#ifdef CONFIG_TASKS_RCU
> +
> +/*
> + * Simple variant of RCU whose quiescent states are voluntary context switch,
> + * user-space execution, and idle.  As such, grace periods can take one good
> + * long time.  There are no read-side primitives similar to rcu_read_lock()
> + * and rcu_read_unlock() because this implementation is intended to get
> + * the system into a safe state for some of the manipulations involved in
> + * tracing and the like.  Finally, this implementation does not support
> + * high call_rcu_tasks() rates from multiple CPUs.  If this is required,
> + * per-CPU callback lists will be needed.
> + */
> +
> +/* Global list of callbacks and associated lock. */
> +static struct rcu_head *rcu_tasks_cbs_head;
> +static struct rcu_head **rcu_tasks_cbs_tail = &rcu_tasks_cbs_head;
> +static DEFINE_RAW_SPINLOCK(rcu_tasks_cbs_lock);
> +
> +/* Post an RCU-tasks callback. */
> +void call_rcu_tasks(struct rcu_head *rhp, void (*func)(struct rcu_head *rhp))
> +{
> +       unsigned long flags;
> +
> +       rhp->next = NULL;
> +       rhp->func = func;
> +       raw_spin_lock_irqsave(&rcu_tasks_cbs_lock, flags);
> +       *rcu_tasks_cbs_tail = rhp;
> +       rcu_tasks_cbs_tail = &rhp->next;
> +       raw_spin_unlock_irqrestore(&rcu_tasks_cbs_lock, flags);
> +}
> +EXPORT_SYMBOL_GPL(call_rcu_tasks);
> +
> +/* See if tasks are still holding out, complain if so. */
> +static void check_holdout_task(struct task_struct *t)
> +{
> +       if (!ACCESS_ONCE(t->rcu_tasks_holdout) ||
> +           t->rcu_tasks_nvcsw != ACCESS_ONCE(t->nvcsw) ||
> +           !ACCESS_ONCE(t->on_rq)) {
> +               ACCESS_ONCE(t->rcu_tasks_holdout) = 0;
> +               list_del_rcu(&t->rcu_tasks_holdout_list);
> +               put_task_struct(t);
> +       }
> +}
> +

I don't see a WARN() for the "complain if so" part. :)


> +/* RCU-tasks kthread that detects grace periods and invokes callbacks. */
> +static int __noreturn rcu_tasks_kthread(void *arg)
> +{
> +       unsigned long flags;
> +       struct task_struct *g, *t;
> +       struct rcu_head *list;
> +       struct rcu_head *next;
> +       LIST_HEAD(rcu_tasks_holdouts);
> +
> +       /* FIXME: Add housekeeping affinity. */
> +
> +       /*
> +        * Each pass through the following loop makes one check for
> +        * newly arrived callbacks, and, if there are some, waits for
> +        * one RCU-tasks grace period and then invokes the callbacks.
> +        * This loop is terminated by the system going down.  ;-)
> +        */
> +       for (;;) {
> +
> +               /* Pick up any new callbacks. */
> +               raw_spin_lock_irqsave(&rcu_tasks_cbs_lock, flags);
> +               list = rcu_tasks_cbs_head;
> +               rcu_tasks_cbs_head = NULL;
> +               rcu_tasks_cbs_tail = &rcu_tasks_cbs_head;
> +               raw_spin_unlock_irqrestore(&rcu_tasks_cbs_lock, flags);
> +
> +               /* If there were none, wait a bit and start over. */
> +               if (!list) {
> +                       schedule_timeout_interruptible(HZ);
> +                       WARN_ON(signal_pending(current));
> +                       continue;
> +               }

Why not use a wait queue here? Since this is called very infrequently,
it should be a win when compared to periodically waking up and
checking, no?

> +
> +               /*
> +                * Wait for all pre-existing t->on_rq and t->nvcsw
> +                * transitions to complete.  Invoking synchronize_sched()
> +                * suffices because all these transitions occur with
> +                * interrupts disabled.  Without this synchronize_sched(),
> +                * a read-side critical section that started before the
> +                * grace period might be incorrectly seen as having started
> +                * after the grace period.
> +                *
> +                * This synchronize_sched() also dispenses with the
> +                * need for a memory barrier on the first store to
> +                * ->rcu_tasks_holdout, as it forces the store to happen
> +                * after the beginning of the grace period.
> +                */
> +               synchronize_sched();
> +
> +               /*
> +                * There were callbacks, so we need to wait for an
> +                * RCU-tasks grace period.  Start off by scanning
> +                * the task list for tasks that are not already
> +                * voluntarily blocked.  Mark these tasks and make
> +                * a list of them in rcu_tasks_holdouts.
> +                */
> +               rcu_read_lock();
> +               for_each_process_thread(g, t) {
> +                       if (t != current && ACCESS_ONCE(t->on_rq) &&
> +                           !is_idle_task(t)) {
> +                               get_task_struct(t);
> +                               t->rcu_tasks_nvcsw = ACCESS_ONCE(t->nvcsw);
> +                               ACCESS_ONCE(t->rcu_tasks_holdout) = 1;
> +                               list_add(&t->rcu_tasks_holdout_list,
> +                                        &rcu_tasks_holdouts);
> +                       }
> +               }
> +               rcu_read_unlock();

I don't see why this is a read side critical section. What am I missing?

> +
> +               /*
> +                * Each pass through the following loop scans the list
> +                * of holdout tasks, removing any that are no longer
> +                * holdouts.  When the list is empty, we are done.
> +                */
> +               while (!list_empty(&rcu_tasks_holdouts)) {
> +                       schedule_timeout_interruptible(HZ);
> +                       WARN_ON(signal_pending(current));
> +                       rcu_read_lock();
> +                       list_for_each_entry_rcu(t, &rcu_tasks_holdouts,
> +                                               rcu_tasks_holdout_list)
> +                               check_holdout_task(t);
> +                       rcu_read_unlock();
> +               }
> +
> +               /*
> +                * Because ->on_rq and ->nvcsw are not guaranteed
> +                * to have a full memory barriers prior to them in the
> +                * schedule() path, memory reordering on other CPUs could
> +                * cause their RCU-tasks read-side critical sections to
> +                * extend past the end of the grace period.  However,
> +                * because these ->nvcsw updates are carried out with
> +                * interrupts disabled, we can use synchronize_sched()
> +                * to force the needed ordering on all such CPUs.
> +                *
> +                * This synchronize_sched() also confines all
> +                * ->rcu_tasks_holdout accesses to be within the grace
> +                * period, avoiding the need for memory barriers for
> +                * ->rcu_tasks_holdout accesses.
> +                */
> +               synchronize_sched();
> +
> +               /* Invoke the callbacks. */
> +               while (list) {
> +                       next = list->next;

I think adding a prefetch(next) here should be helpful.

> +                       local_bh_disable();
> +                       list->func(list);
> +                       local_bh_enable();
> +                       list = next;
> +                       cond_resched();
> +               }
> +       }
> +}
> +
> +/* Spawn rcu_tasks_kthread() at boot time. */
> +static int __init rcu_spawn_tasks_kthread(void)
> +{
> +       struct task_struct __maybe_unused *t;
> +
> +       t = kthread_run(rcu_tasks_kthread, NULL, "rcu_tasks_kthread");
> +       BUG_ON(IS_ERR(t));
> +       return 0;
> +}
> +early_initcall(rcu_spawn_tasks_kthread);
> +
> +#endif /* #ifdef CONFIG_TASKS_RCU */
> --
> 1.8.1.5
>



-- 
Pranith

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 01/16] rcu: Add call_rcu_tasks()
  2014-08-14 20:46   ` [PATCH v5 tip/core/rcu 01/16] rcu: Add call_rcu_tasks() Pranith Kumar
@ 2014-08-14 21:22     ` Paul E. McKenney
  0 siblings, 0 replies; 60+ messages in thread
From: Paul E. McKenney @ 2014-08-14 21:22 UTC (permalink / raw)
  To: Pranith Kumar
  Cc: LKML, Ingo Molnar, Lai Jiangshan, Dipankar Sarma, Andrew Morton,
	Mathieu Desnoyers, Josh Triplett, Thomas Gleixner,
	Peter Zijlstra, Steven Rostedt, David Howells, Eric Dumazet,
	dvhart, Frédéric Weisbecker, Oleg Nesterov

On Thu, Aug 14, 2014 at 04:46:34PM -0400, Pranith Kumar wrote:
> On Mon, Aug 11, 2014 at 6:48 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> >
> > This commit adds a new RCU-tasks flavor of RCU, which provides
> > call_rcu_tasks().  This RCU flavor's quiescent states are voluntary
> > context switch (not preemption!), userspace execution, and the idle loop.
> > Note that unlike other RCU flavors, these quiescent states occur in tasks,
> > not necessarily CPUs.  Includes fixes from Steven Rostedt.
> >
> > This RCU flavor is assumed to have very infrequent latency-tolerant
> > updaters.  This assumption permits significant simplifications, including
> > a single global callback list protected by a single global lock, along
> > with a single linked list containing all tasks that have not yet passed
> > through a quiescent state.  If experience shows this assumption to be
> > incorrect, the required additional complexity will be added.
> >
> > Suggested-by: Steven Rostedt <rostedt@goodmis.org>
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> Please find comments below. I did not read all the ~100 emails in this
> series, so please forgive if I ask something repetitive and just point
> that out. I will go digging :)

;-)

> > ---
> >  include/linux/init_task.h |   9 +++
> >  include/linux/rcupdate.h  |  36 ++++++++++
> >  include/linux/sched.h     |  23 ++++---
> >  init/Kconfig              |  10 +++
> >  kernel/rcu/tiny.c         |   2 +
> >  kernel/rcu/tree.c         |   2 +
> >  kernel/rcu/update.c       | 171 ++++++++++++++++++++++++++++++++++++++++++++++
> >  7 files changed, 242 insertions(+), 11 deletions(-)
> >
> > diff --git a/include/linux/init_task.h b/include/linux/init_task.h
> > index 6df7f9fe0d01..78715ea7c30c 100644
> > --- a/include/linux/init_task.h
> > +++ b/include/linux/init_task.h
> > @@ -124,6 +124,14 @@ extern struct group_info init_groups;
> >  #else
> >  #define INIT_TASK_RCU_PREEMPT(tsk)
> >  #endif
> > +#ifdef CONFIG_TASKS_RCU
> > +#define INIT_TASK_RCU_TASKS(tsk)                                       \
> > +       .rcu_tasks_holdout = false,                                     \
> > +       .rcu_tasks_holdout_list =                                       \
> > +               LIST_HEAD_INIT(tsk.rcu_tasks_holdout_list),
> > +#else
> > +#define INIT_TASK_RCU_TASKS(tsk)
> > +#endif
> 
> rcu_tasks_holdout is defined as an int. So use 0 may be?

Good point.  I started with a bool, but then needed to do
smp_store_release(), which doesn't support bool.

> I see that there are other locations which set it to 'false'. So may
> just change the definition to bool, as it seems more appropriate.

If I no longer use smp_store_release, yep.

And it appears that I no longer do, so changed back to bool.

> Also why is rcu_tasks_nvcsw not being initialized? I see that it can
> be read before initialized, no?

It initialized by rcu_tasks_kthread() before putting a given task on the
rcu_tasks_holdouts list.  It is only read for tasks on that list.  So
there is not use before initialization.

> >  extern struct cred init_cred;
> >
> > @@ -231,6 +239,7 @@ extern struct task_group root_task_group;
> >         INIT_FTRACE_GRAPH                                               \
> >         INIT_TRACE_RECURSION                                            \
> >         INIT_TASK_RCU_PREEMPT(tsk)                                      \
> > +       INIT_TASK_RCU_TASKS(tsk)                                        \
> >         INIT_CPUSET_SEQ(tsk)                                            \
> >         INIT_RT_MUTEXES(tsk)                                            \
> >         INIT_VTIME(tsk)                                                 \
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index 6a94cc8b1ca0..829efc99df3e 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -197,6 +197,26 @@ void call_rcu_sched(struct rcu_head *head,
> >
> >  void synchronize_sched(void);
> >
> > +/**
> > + * call_rcu_tasks() - Queue an RCU for invocation task-based grace period
> 
> -ENOPARSE :(
> 
> > + * @head: structure to be used for queueing the RCU updates.
> > + * @func: actual callback function to be invoked after the grace period
> > + *
> > + * The callback function will be invoked some time after a full grace
> > + * period elapses, in other words after all currently executing RCU
> > + * read-side critical sections have completed. call_rcu_tasks() assumes
> > + * that the read-side critical sections end at a voluntary context
> > + * switch (not a preemption!), entry into idle, or transition to usermode
> > + * execution.  As such, there are no read-side primitives analogous to
> > + * rcu_read_lock() and rcu_read_unlock() because this primitive is intended
> > + * to determine that all tasks have passed through a safe state, not so
> > + * much for data-strcuture synchronization.
> 
> s/strcuture/structure
> 
> > + *
> > + * See the description of call_rcu() for more detailed information on
> > + * memory ordering guarantees.
> > + */
> > +void call_rcu_tasks(struct rcu_head *head, void (*func)(struct rcu_head *head));
> > +
> >  #ifdef CONFIG_PREEMPT_RCU
> >
> >  void __rcu_read_lock(void);
> > @@ -294,6 +314,22 @@ static inline void rcu_user_hooks_switch(struct task_struct *prev,
> >                 rcu_irq_exit(); \
> >         } while (0)
> >
> > +/*
> > + * Note a voluntary context switch for RCU-tasks benefit.  This is a
> > + * macro rather than an inline function to avoid #include hell.
> > + */
> > +#ifdef CONFIG_TASKS_RCU
> > +#define rcu_note_voluntary_context_switch(t) \
> > +       do { \
> > +               preempt_disable(); /* Exclude synchronize_sched(); */ \
> > +               if (ACCESS_ONCE((t)->rcu_tasks_holdout)) \
> > +                       ACCESS_ONCE((t)->rcu_tasks_holdout) = 0; \
> > +               preempt_enable(); \
> > +       } while (0)
> > +#else /* #ifdef CONFIG_TASKS_RCU */
> > +#define rcu_note_voluntary_context_switch(t)   do { } while (0)
> > +#endif /* #else #ifdef CONFIG_TASKS_RCU */
> > +
> >  #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) || defined(CONFIG_SMP)
> >  bool __rcu_is_watching(void);
> >  #endif /* #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) || defined(CONFIG_SMP) */
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 306f4f0c987a..3cf124389ec7 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -1273,6 +1273,11 @@ struct task_struct {
> >  #ifdef CONFIG_RCU_BOOST
> >         struct rt_mutex *rcu_boost_mutex;
> >  #endif /* #ifdef CONFIG_RCU_BOOST */
> > +#ifdef CONFIG_TASKS_RCU
> > +       unsigned long rcu_tasks_nvcsw;
> > +       int rcu_tasks_holdout;
> > +       struct list_head rcu_tasks_holdout_list;
> > +#endif /* #ifdef CONFIG_TASKS_RCU */
> >
> >  #if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT)
> >         struct sched_info sched_info;
> > @@ -1998,31 +2003,27 @@ extern void task_clear_jobctl_pending(struct task_struct *task,
> >                                       unsigned int mask);
> >
> >  #ifdef CONFIG_PREEMPT_RCU
> > -
> >  #define RCU_READ_UNLOCK_BLOCKED (1 << 0) /* blocked while in RCU read-side. */
> >  #define RCU_READ_UNLOCK_NEED_QS (1 << 1) /* RCU core needs CPU response. */
> > +#endif /* #ifdef CONFIG_PREEMPT_RCU */
> >
> >  static inline void rcu_copy_process(struct task_struct *p)
> >  {
> > +#ifdef CONFIG_PREEMPT_RCU
> >         p->rcu_read_lock_nesting = 0;
> >         p->rcu_read_unlock_special = 0;
> > -#ifdef CONFIG_TREE_PREEMPT_RCU
> >         p->rcu_blocked_node = NULL;
> > -#endif /* #ifdef CONFIG_TREE_PREEMPT_RCU */
> >  #ifdef CONFIG_RCU_BOOST
> >         p->rcu_boost_mutex = NULL;
> >  #endif /* #ifdef CONFIG_RCU_BOOST */
> >         INIT_LIST_HEAD(&p->rcu_node_entry);
> > +#endif /* #ifdef CONFIG_PREEMPT_RCU */
> > +#ifdef CONFIG_TASKS_RCU
> > +       p->rcu_tasks_holdout = false;
> > +       INIT_LIST_HEAD(&p->rcu_tasks_holdout_list);
> > +#endif /* #ifdef CONFIG_TASKS_RCU */
> >  }
> 
> I think rcu_tasks_nvcsw needs to be set here too.

Nope, just in rcu_tasks_kthread().

> >
> > -#else
> > -
> > -static inline void rcu_copy_process(struct task_struct *p)
> > -{
> > -}
> > -
> > -#endif
> > -
> >  static inline void tsk_restore_flags(struct task_struct *task,
> >                                 unsigned long orig_flags, unsigned long flags)
> >  {
> > diff --git a/init/Kconfig b/init/Kconfig
> > index 9d76b99af1b9..c56cb62a2df1 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -507,6 +507,16 @@ config PREEMPT_RCU
> >           This option enables preemptible-RCU code that is common between
> >           the TREE_PREEMPT_RCU and TINY_PREEMPT_RCU implementations.
> >
> > +config TASKS_RCU
> > +       bool "Task_based RCU implementation using voluntary context switch"
> > +       default n
> > +       help
> > +         This option enables a task-based RCU implementation that uses
> > +         only voluntary context switch (not preemption!), idle, and
> > +         user-mode execution as quiescent states.
> > +
> > +         If unsure, say N.
> > +
> >  config RCU_STALL_COMMON
> >         def_bool ( TREE_RCU || TREE_PREEMPT_RCU || RCU_TRACE )
> >         help
> > diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
> > index d9efcc13008c..717f00854fc0 100644
> > --- a/kernel/rcu/tiny.c
> > +++ b/kernel/rcu/tiny.c
> > @@ -254,6 +254,8 @@ void rcu_check_callbacks(int cpu, int user)
> >                 rcu_sched_qs(cpu);
> >         else if (!in_softirq())
> >                 rcu_bh_qs(cpu);
> > +       if (user)
> > +               rcu_note_voluntary_context_switch(current);
> >  }
> >
> >  /*
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 625d0b0cd75a..f958c52f644d 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -2413,6 +2413,8 @@ void rcu_check_callbacks(int cpu, int user)
> >         rcu_preempt_check_callbacks(cpu);
> >         if (rcu_pending(cpu))
> >                 invoke_rcu_core();
> > +       if (user)
> > +               rcu_note_voluntary_context_switch(current);
> >         trace_rcu_utilization(TPS("End scheduler-tick"));
> >  }
> >
> > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> > index bc7883570530..f6f164119a14 100644
> > --- a/kernel/rcu/update.c
> > +++ b/kernel/rcu/update.c
> > @@ -47,6 +47,7 @@
> >  #include <linux/hardirq.h>
> >  #include <linux/delay.h>
> >  #include <linux/module.h>
> > +#include <linux/kthread.h>
> >
> >  #define CREATE_TRACE_POINTS
> >
> > @@ -350,3 +351,173 @@ static int __init check_cpu_stall_init(void)
> >  early_initcall(check_cpu_stall_init);
> >
> >  #endif /* #ifdef CONFIG_RCU_STALL_COMMON */
> > +
> > +#ifdef CONFIG_TASKS_RCU
> > +
> > +/*
> > + * Simple variant of RCU whose quiescent states are voluntary context switch,
> > + * user-space execution, and idle.  As such, grace periods can take one good
> > + * long time.  There are no read-side primitives similar to rcu_read_lock()
> > + * and rcu_read_unlock() because this implementation is intended to get
> > + * the system into a safe state for some of the manipulations involved in
> > + * tracing and the like.  Finally, this implementation does not support
> > + * high call_rcu_tasks() rates from multiple CPUs.  If this is required,
> > + * per-CPU callback lists will be needed.
> > + */
> > +
> > +/* Global list of callbacks and associated lock. */
> > +static struct rcu_head *rcu_tasks_cbs_head;
> > +static struct rcu_head **rcu_tasks_cbs_tail = &rcu_tasks_cbs_head;
> > +static DEFINE_RAW_SPINLOCK(rcu_tasks_cbs_lock);
> > +
> > +/* Post an RCU-tasks callback. */
> > +void call_rcu_tasks(struct rcu_head *rhp, void (*func)(struct rcu_head *rhp))
> > +{
> > +       unsigned long flags;
> > +
> > +       rhp->next = NULL;
> > +       rhp->func = func;
> > +       raw_spin_lock_irqsave(&rcu_tasks_cbs_lock, flags);
> > +       *rcu_tasks_cbs_tail = rhp;
> > +       rcu_tasks_cbs_tail = &rhp->next;
> > +       raw_spin_unlock_irqrestore(&rcu_tasks_cbs_lock, flags);
> > +}
> > +EXPORT_SYMBOL_GPL(call_rcu_tasks);
> > +
> > +/* See if tasks are still holding out, complain if so. */
> > +static void check_holdout_task(struct task_struct *t)
> > +{
> > +       if (!ACCESS_ONCE(t->rcu_tasks_holdout) ||
> > +           t->rcu_tasks_nvcsw != ACCESS_ONCE(t->nvcsw) ||
> > +           !ACCESS_ONCE(t->on_rq)) {
> > +               ACCESS_ONCE(t->rcu_tasks_holdout) = 0;
> > +               list_del_rcu(&t->rcu_tasks_holdout_list);
> > +               put_task_struct(t);
> > +       }
> > +}
> > +
> 
> I don't see a WARN() for the "complain if so" part. :)

Indeed, that comes in a later patch.  Good catch, fixed the comment.

> > +/* RCU-tasks kthread that detects grace periods and invokes callbacks. */
> > +static int __noreturn rcu_tasks_kthread(void *arg)
> > +{
> > +       unsigned long flags;
> > +       struct task_struct *g, *t;
> > +       struct rcu_head *list;
> > +       struct rcu_head *next;
> > +       LIST_HEAD(rcu_tasks_holdouts);
> > +
> > +       /* FIXME: Add housekeeping affinity. */
> > +
> > +       /*
> > +        * Each pass through the following loop makes one check for
> > +        * newly arrived callbacks, and, if there are some, waits for
> > +        * one RCU-tasks grace period and then invokes the callbacks.
> > +        * This loop is terminated by the system going down.  ;-)
> > +        */
> > +       for (;;) {
> > +
> > +               /* Pick up any new callbacks. */
> > +               raw_spin_lock_irqsave(&rcu_tasks_cbs_lock, flags);
> > +               list = rcu_tasks_cbs_head;
> > +               rcu_tasks_cbs_head = NULL;
> > +               rcu_tasks_cbs_tail = &rcu_tasks_cbs_head;
> > +               raw_spin_unlock_irqrestore(&rcu_tasks_cbs_lock, flags);
> > +
> > +               /* If there were none, wait a bit and start over. */
> > +               if (!list) {
> > +                       schedule_timeout_interruptible(HZ);
> > +                       WARN_ON(signal_pending(current));
> > +                       continue;
> > +               }
> 
> Why not use a wait queue here? Since this is called very infrequently,
> it should be a win when compared to periodically waking up and
> checking, no?

That comes in a later patch (rcu: Improve RCU-tasks energy efficiency).
Brain-dead simple first, more sophisticated later.

> > +
> > +               /*
> > +                * Wait for all pre-existing t->on_rq and t->nvcsw
> > +                * transitions to complete.  Invoking synchronize_sched()
> > +                * suffices because all these transitions occur with
> > +                * interrupts disabled.  Without this synchronize_sched(),
> > +                * a read-side critical section that started before the
> > +                * grace period might be incorrectly seen as having started
> > +                * after the grace period.
> > +                *
> > +                * This synchronize_sched() also dispenses with the
> > +                * need for a memory barrier on the first store to
> > +                * ->rcu_tasks_holdout, as it forces the store to happen
> > +                * after the beginning of the grace period.
> > +                */
> > +               synchronize_sched();
> > +
> > +               /*
> > +                * There were callbacks, so we need to wait for an
> > +                * RCU-tasks grace period.  Start off by scanning
> > +                * the task list for tasks that are not already
> > +                * voluntarily blocked.  Mark these tasks and make
> > +                * a list of them in rcu_tasks_holdouts.
> > +                */
> > +               rcu_read_lock();
> > +               for_each_process_thread(g, t) {
> > +                       if (t != current && ACCESS_ONCE(t->on_rq) &&
> > +                           !is_idle_task(t)) {
> > +                               get_task_struct(t);
> > +                               t->rcu_tasks_nvcsw = ACCESS_ONCE(t->nvcsw);
> > +                               ACCESS_ONCE(t->rcu_tasks_holdout) = 1;
> > +                               list_add(&t->rcu_tasks_holdout_list,
> > +                                        &rcu_tasks_holdouts);
> > +                       }
> > +               }
> > +               rcu_read_unlock();
> 
> I don't see why this is a read side critical section. What am I missing?

You are missing that it is not safe to traverse the tasks list without
either holding the tasks lock or being in a read-side critical section.

> > +
> > +               /*
> > +                * Each pass through the following loop scans the list
> > +                * of holdout tasks, removing any that are no longer
> > +                * holdouts.  When the list is empty, we are done.
> > +                */
> > +               while (!list_empty(&rcu_tasks_holdouts)) {
> > +                       schedule_timeout_interruptible(HZ);
> > +                       WARN_ON(signal_pending(current));
> > +                       rcu_read_lock();
> > +                       list_for_each_entry_rcu(t, &rcu_tasks_holdouts,
> > +                                               rcu_tasks_holdout_list)
> > +                               check_holdout_task(t);
> > +                       rcu_read_unlock();
> > +               }
> > +
> > +               /*
> > +                * Because ->on_rq and ->nvcsw are not guaranteed
> > +                * to have a full memory barriers prior to them in the
> > +                * schedule() path, memory reordering on other CPUs could
> > +                * cause their RCU-tasks read-side critical sections to
> > +                * extend past the end of the grace period.  However,
> > +                * because these ->nvcsw updates are carried out with
> > +                * interrupts disabled, we can use synchronize_sched()
> > +                * to force the needed ordering on all such CPUs.
> > +                *
> > +                * This synchronize_sched() also confines all
> > +                * ->rcu_tasks_holdout accesses to be within the grace
> > +                * period, avoiding the need for memory barriers for
> > +                * ->rcu_tasks_holdout accesses.
> > +                */
> > +               synchronize_sched();
> > +
> > +               /* Invoke the callbacks. */
> > +               while (list) {
> > +                       next = list->next;
> 
> I think adding a prefetch(next) here should be helpful.

We do have that on the tree and tiny callback invocation, which makes
sense because those flavors can easily have a large number of callbacks.
But SRCU and RCU-tasks dispense with the prefetch() because there are
not likely to be very many callbacks.

Might add the prefetch() for SRCU and RCU-tasks at some point if that
changes.

							Thanx, Paul

> > +                       local_bh_disable();
> > +                       list->func(list);
> > +                       local_bh_enable();
> > +                       list = next;
> > +                       cond_resched();
> > +               }
> > +       }
> > +}
> > +
> > +/* Spawn rcu_tasks_kthread() at boot time. */
> > +static int __init rcu_spawn_tasks_kthread(void)
> > +{
> > +       struct task_struct __maybe_unused *t;
> > +
> > +       t = kthread_run(rcu_tasks_kthread, NULL, "rcu_tasks_kthread");
> > +       BUG_ON(IS_ERR(t));
> > +       return 0;
> > +}
> > +early_initcall(rcu_spawn_tasks_kthread);
> > +
> > +#endif /* #ifdef CONFIG_TASKS_RCU */
> > --
> > 1.8.1.5
> >
> 
> 
> 
> -- 
> Pranith
> 


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 05/16] rcu: Export RCU-tasks APIs to GPL modules
  2014-08-14 19:08     ` Pranith Kumar
@ 2014-08-14 21:29       ` Paul E. McKenney
  0 siblings, 0 replies; 60+ messages in thread
From: Paul E. McKenney @ 2014-08-14 21:29 UTC (permalink / raw)
  To: Pranith Kumar
  Cc: LKML, Ingo Molnar, Lai Jiangshan, Dipankar Sarma, Andrew Morton,
	Mathieu Desnoyers, Josh Triplett, Thomas Gleixner,
	Peter Zijlstra, Steven Rostedt, David Howells, Eric Dumazet,
	dvhart, Frédéric Weisbecker, Oleg Nesterov

On Thu, Aug 14, 2014 at 03:08:06PM -0400, Pranith Kumar wrote:
> On Mon, Aug 11, 2014 at 6:48 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > From: Steven Rostedt <rostedt@goodmis.org>
> >
> > This commit exports the RCU-tasks APIs, call_rcu_tasks(),
> > synchronize_rcu_tasks(), and rcu_barrier_tasks(), to GPL-licensed
> > kernel modules.
> 
> Only two of these are being exported in this patch. Patch 1 is adding
> the export for call_rcu_tasks().

Good point, updated the commit log accordingly.

							Thanx, Paul

> > Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Reviewed-by: Josh Triplett <josh@joshtriplett.org>
> > ---
> >  kernel/rcu/update.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> > index 4cece6e886ee..8f53a41dd9ee 100644
> > --- a/kernel/rcu/update.c
> > +++ b/kernel/rcu/update.c
> > @@ -433,6 +433,7 @@ void synchronize_rcu_tasks(void)
> >         /* Wait for the grace period. */
> >         wait_rcu_gp(call_rcu_tasks);
> >  }
> > +EXPORT_SYMBOL_GPL(synchronize_rcu_tasks);
> >
> >  /**
> >   * rcu_barrier_tasks - Wait for in-flight call_rcu_tasks() callbacks.
> > @@ -445,6 +446,7 @@ void rcu_barrier_tasks(void)
> >         /* There is only one callback queue, so this is easy.  ;-) */
> >         synchronize_rcu_tasks();
> >  }
> > +EXPORT_SYMBOL_GPL(rcu_barrier_tasks);
> >
> >  /* See if tasks are still holding out, complain if so. */
> >  static void check_holdout_task(struct task_struct *t)
> > --
> > 1.8.1.5
> >
> 
> 
> 
> -- 
> Pranith
> 


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 06/16] rcutorture: Add torture tests for RCU-tasks
  2014-08-11 22:48   ` [PATCH v5 tip/core/rcu 06/16] rcutorture: Add torture tests for RCU-tasks Paul E. McKenney
@ 2014-08-14 21:34     ` Pranith Kumar
  2014-08-14 21:44       ` Paul E. McKenney
  0 siblings, 1 reply; 60+ messages in thread
From: Pranith Kumar @ 2014-08-14 21:34 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Ingo Molnar, Lai Jiangshan, Dipankar Sarma, Andrew Morton,
	Mathieu Desnoyers, Josh Triplett, Thomas Gleixner,
	Peter Zijlstra, Steven Rostedt, David Howells, Eric Dumazet,
	dvhart, Frédéric Weisbecker, Oleg Nesterov

On Mon, Aug 11, 2014 at 6:48 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
>
> This commit adds torture tests for RCU-tasks.  It also fixes a bug that
> would segfault for an RCU flavor lacking a callback-barrier function.
>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
> ---
>  include/linux/rcupdate.h |  1 +
>  kernel/rcu/rcutorture.c  | 50 +++++++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 50 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index e6aea256ad39..f504f797c9c8 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -55,6 +55,7 @@ enum rcutorture_type {
>         RCU_FLAVOR,
>         RCU_BH_FLAVOR,
>         RCU_SCHED_FLAVOR,
> +       RCU_TASKS_FLAVOR,
>         SRCU_FLAVOR,
>         INVALID_RCU_FLAVOR
>  };
> diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> index febe07062ac5..52423f2c74da 100644
> --- a/kernel/rcu/rcutorture.c
> +++ b/kernel/rcu/rcutorture.c
> @@ -601,6 +601,52 @@ static struct rcu_torture_ops sched_ops = {
>         .name           = "sched"
>  };
>
> +#ifdef CONFIG_TASKS_RCU
> +
> +/*
> + * Definitions for RCU-tasks torture testing.
> + */
> +
> +static int tasks_torture_read_lock(void)
> +{
> +       return 0;
> +}
> +
> +static void tasks_torture_read_unlock(int idx)
> +{
> +}
> +
> +static void rcu_tasks_torture_deferred_free(struct rcu_torture *p)
> +{
> +       call_rcu_tasks(&p->rtort_rcu, rcu_torture_cb);
> +}
> +
> +static struct rcu_torture_ops tasks_ops = {
> +       .ttype          = RCU_TASKS_FLAVOR,
> +       .init           = rcu_sync_torture_init,
> +       .readlock       = tasks_torture_read_lock,
> +       .read_delay     = rcu_read_delay,  /* just reuse rcu's version. */
> +       .readunlock     = tasks_torture_read_unlock,
> +       .completed      = rcu_no_completed,
> +       .deferred_free  = rcu_tasks_torture_deferred_free,
> +       .sync           = synchronize_rcu_tasks,
> +       .exp_sync       = synchronize_rcu_tasks,
> +       .call           = call_rcu_tasks,
> +       .cb_barrier     = rcu_barrier_tasks,
> +       .fqs            = NULL,
> +       .stats          = NULL,
> +       .irq_capable    = 1,
> +       .name           = "tasks"
> +};
> +
> +#define RCUTORTURE_TASKS_OPS &tasks_ops,


Not sure about the comma here, no harm but still... a minor nit :)

> +
> +#else /* #ifdef CONFIG_TASKS_RCU */
> +
> +#define RCUTORTURE_TASKS_OPS
> +
> +#endif /* #else #ifdef CONFIG_TASKS_RCU */
> +
>  /*
>   * RCU torture priority-boost testing.  Runs one real-time thread per
>   * CPU for moderate bursts, repeatedly registering RCU callbacks and
> @@ -1295,7 +1341,8 @@ static int rcu_torture_barrier_cbs(void *arg)
>                 if (atomic_dec_and_test(&barrier_cbs_count))
>                         wake_up(&barrier_wq);
>         } while (!torture_must_stop());
> -       cur_ops->cb_barrier();
> +       if (cur_ops->cb_barrier != NULL)
> +               cur_ops->cb_barrier();
>         destroy_rcu_head_on_stack(&rcu);
>         torture_kthread_stopping("rcu_torture_barrier_cbs");
>         return 0;
> @@ -1534,6 +1581,7 @@ rcu_torture_init(void)
>         int firsterr = 0;
>         static struct rcu_torture_ops *torture_ops[] = {
>                 &rcu_ops, &rcu_bh_ops, &rcu_busted_ops, &srcu_ops, &sched_ops,
> +               RCUTORTURE_TASKS_OPS
>         };
>
>         if (!torture_init_begin(torture_type, verbose, &rcutorture_runnable))
> --
> 1.8.1.5
>



-- 
Pranith

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 08/16] rcu: Add stall-warning checks for RCU-tasks
  2014-08-11 22:48   ` [PATCH v5 tip/core/rcu 08/16] rcu: Add stall-warning checks for RCU-tasks Paul E. McKenney
@ 2014-08-14 21:39     ` Pranith Kumar
  2014-08-14 21:59       ` Paul E. McKenney
  0 siblings, 1 reply; 60+ messages in thread
From: Pranith Kumar @ 2014-08-14 21:39 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Ingo Molnar, Lai Jiangshan, Dipankar Sarma, Andrew Morton,
	Mathieu Desnoyers, Josh Triplett, Thomas Gleixner,
	Peter Zijlstra, Steven Rostedt, David Howells, Eric Dumazet,
	dvhart, Frédéric Weisbecker, Oleg Nesterov

On Mon, Aug 11, 2014 at 6:48 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
>
> This commit adds a three-minute RCU-tasks stall warning.  The actual
> time is controlled by the boot/sysfs parameter rcu_task_stall_timeout,
> with values less than or equal to zero disabling the stall warnings.
> The default value is three minutes, which means that the tasks that
> have not yet responded will get their stacks dumped every ten minutes,
> until they pass through a voluntary context switch.
>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Something about 3 minutes and 10 minutes is mixed up here!

> ---
>  Documentation/kernel-parameters.txt |  5 +++++
>  kernel/rcu/update.c                 | 27 ++++++++++++++++++++++++---
>  2 files changed, 29 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index 910c3829f81d..8cdbde7b17f5 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -2921,6 +2921,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>         rcupdate.rcu_cpu_stall_timeout= [KNL]
>                         Set timeout for RCU CPU stall warning messages.
>
> +       rcupdate.rcu_task_stall_timeout= [KNL]
> +                       Set timeout in jiffies for RCU task stall warning
> +                       messages.  Disable with a value less than or equal
> +                       to zero.
> +
>         rdinit=         [KNL]
>                         Format: <full_path>
>                         Run specified binary instead of /init from the ramdisk,
> diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> index 8f53a41dd9ee..f1535404a79e 100644
> --- a/kernel/rcu/update.c
> +++ b/kernel/rcu/update.c
> @@ -374,7 +374,7 @@ static DEFINE_RAW_SPINLOCK(rcu_tasks_cbs_lock);
>  DEFINE_SRCU(tasks_rcu_exit_srcu);
>
>  /* Control stall timeouts.  Disable with <= 0, otherwise jiffies till stall. */
> -static int rcu_task_stall_timeout __read_mostly = HZ * 60 * 3;
> +static int rcu_task_stall_timeout __read_mostly = HZ * 60 * 10;
>  module_param(rcu_task_stall_timeout, int, 0644);
>
>  /* Post an RCU-tasks callback. */
> @@ -449,7 +449,8 @@ void rcu_barrier_tasks(void)
>  EXPORT_SYMBOL_GPL(rcu_barrier_tasks);
>
>  /* See if tasks are still holding out, complain if so. */
> -static void check_holdout_task(struct task_struct *t)
> +static void check_holdout_task(struct task_struct *t,
> +                              bool needreport, bool *firstreport)
>  {
>         if (!ACCESS_ONCE(t->rcu_tasks_holdout) ||
>             t->rcu_tasks_nvcsw != ACCESS_ONCE(t->nvcsw) ||
> @@ -457,7 +458,15 @@ static void check_holdout_task(struct task_struct *t)
>                 ACCESS_ONCE(t->rcu_tasks_holdout) = 0;
>                 list_del_rcu(&t->rcu_tasks_holdout_list);
>                 put_task_struct(t);
> +               return;
>         }
> +       if (!needreport)
> +               return;
> +       if (*firstreport) {
> +               pr_err("INFO: rcu_tasks detected stalls on tasks:\n");
> +               *firstreport = false;
> +       }
> +       sched_show_task(t);
>  }
>
>  /* RCU-tasks kthread that detects grace periods and invokes callbacks. */
> @@ -465,6 +474,7 @@ static int __noreturn rcu_tasks_kthread(void *arg)
>  {
>         unsigned long flags;
>         struct task_struct *g, *t;
> +       unsigned long lastreport;
>         struct rcu_head *list;
>         struct rcu_head *next;
>         LIST_HEAD(rcu_tasks_holdouts);
> @@ -543,13 +553,24 @@ static int __noreturn rcu_tasks_kthread(void *arg)
>                  * of holdout tasks, removing any that are no longer
>                  * holdouts.  When the list is empty, we are done.
>                  */
> +               lastreport = jiffies;
>                 while (!list_empty(&rcu_tasks_holdouts)) {
> +                       bool firstreport;
> +                       bool needreport;
> +                       int rtst;
> +
>                         schedule_timeout_interruptible(HZ);
> +                       rtst = ACCESS_ONCE(rcu_task_stall_timeout);
> +                       needreport = rtst > 0 &&
> +                                    time_after(jiffies, lastreport + rtst);
> +                       if (needreport)
> +                               lastreport = jiffies;
> +                       firstreport = true;
>                         WARN_ON(signal_pending(current));
>                         rcu_read_lock();
>                         list_for_each_entry_rcu(t, &rcu_tasks_holdouts,
>                                                 rcu_tasks_holdout_list)
> -                               check_holdout_task(t);
> +                               check_holdout_task(t, needreport, &firstreport);
>                         rcu_read_unlock();
>                 }
>
> --
> 1.8.1.5
>



-- 
Pranith

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 09/16] rcu: Improve RCU-tasks energy efficiency
  2014-08-11 22:48   ` [PATCH v5 tip/core/rcu 09/16] rcu: Improve RCU-tasks energy efficiency Paul E. McKenney
@ 2014-08-14 21:42     ` Pranith Kumar
  2014-08-14 21:55       ` Paul E. McKenney
  0 siblings, 1 reply; 60+ messages in thread
From: Pranith Kumar @ 2014-08-14 21:42 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Ingo Molnar, Lai Jiangshan, Dipankar Sarma, Andrew Morton,
	Mathieu Desnoyers, Josh Triplett, Thomas Gleixner,
	Peter Zijlstra, Steven Rostedt, David Howells, Eric Dumazet,
	dvhart, Frédéric Weisbecker, Oleg Nesterov

On Mon, Aug 11, 2014 at 6:48 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
>
> The current RCU-tasks implementation uses strict polling to detect
> callback arrivals.  This works quite well, but is not so good for
> energy efficiency.  This commit therefore replaces the strict polling
> with a wait queue.
>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> ---
>  kernel/rcu/update.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> index f1535404a79e..1256a900cd01 100644
> --- a/kernel/rcu/update.c
> +++ b/kernel/rcu/update.c
> @@ -368,6 +368,7 @@ early_initcall(check_cpu_stall_init);
>  /* Global list of callbacks and associated lock. */
>  static struct rcu_head *rcu_tasks_cbs_head;
>  static struct rcu_head **rcu_tasks_cbs_tail = &rcu_tasks_cbs_head;
> +static DECLARE_WAIT_QUEUE_HEAD(rcu_tasks_cbs_wq);
>  static DEFINE_RAW_SPINLOCK(rcu_tasks_cbs_lock);
>
>  /* Track exiting tasks in order to allow them to be waited for. */
> @@ -381,13 +382,17 @@ module_param(rcu_task_stall_timeout, int, 0644);
>  void call_rcu_tasks(struct rcu_head *rhp, void (*func)(struct rcu_head *rhp))
>  {
>         unsigned long flags;
> +       bool needwake;
>
>         rhp->next = NULL;
>         rhp->func = func;
>         raw_spin_lock_irqsave(&rcu_tasks_cbs_lock, flags);
> +       needwake = !rcu_tasks_cbs_head;
>         *rcu_tasks_cbs_tail = rhp;
>         rcu_tasks_cbs_tail = &rhp->next;
>         raw_spin_unlock_irqrestore(&rcu_tasks_cbs_lock, flags);
> +       if (needwake)
> +               wake_up(&rcu_tasks_cbs_wq);
>  }
>  EXPORT_SYMBOL_GPL(call_rcu_tasks);

I think you want

needwake = !!rcu_tasks_cbs_head;

otherwise it will wake up when rcu_tasks_cbs_head is null, no?

>
> @@ -498,8 +503,12 @@ static int __noreturn rcu_tasks_kthread(void *arg)
>
>                 /* If there were none, wait a bit and start over. */
>                 if (!list) {
> -                       schedule_timeout_interruptible(HZ);
> -                       WARN_ON(signal_pending(current));
> +                       wait_event_interruptible(rcu_tasks_cbs_wq,
> +                                                rcu_tasks_cbs_head);
> +                       if (!rcu_tasks_cbs_head) {
> +                               WARN_ON(signal_pending(current));
> +                               schedule_timeout_interruptible(HZ/10);
> +                       }
>                         continue;
>                 }
>
> @@ -605,6 +614,7 @@ static int __noreturn rcu_tasks_kthread(void *arg)
>                         list = next;
>                         cond_resched();
>                 }
> +               schedule_timeout_uninterruptible(HZ/10);
>         }
>  }
>
> --
> 1.8.1.5
>



-- 
Pranith

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 06/16] rcutorture: Add torture tests for RCU-tasks
  2014-08-14 21:34     ` Pranith Kumar
@ 2014-08-14 21:44       ` Paul E. McKenney
  2014-08-14 21:49         ` Paul E. McKenney
  0 siblings, 1 reply; 60+ messages in thread
From: Paul E. McKenney @ 2014-08-14 21:44 UTC (permalink / raw)
  To: Pranith Kumar
  Cc: LKML, Ingo Molnar, Lai Jiangshan, Dipankar Sarma, Andrew Morton,
	Mathieu Desnoyers, Josh Triplett, Thomas Gleixner,
	Peter Zijlstra, Steven Rostedt, David Howells, Eric Dumazet,
	dvhart, Frédéric Weisbecker, Oleg Nesterov

On Thu, Aug 14, 2014 at 05:34:53PM -0400, Pranith Kumar wrote:
> On Mon, Aug 11, 2014 at 6:48 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> >
> > This commit adds torture tests for RCU-tasks.  It also fixes a bug that
> > would segfault for an RCU flavor lacking a callback-barrier function.
> >
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Reviewed-by: Josh Triplett <josh@joshtriplett.org>
> > ---
> >  include/linux/rcupdate.h |  1 +
> >  kernel/rcu/rcutorture.c  | 50 +++++++++++++++++++++++++++++++++++++++++++++++-
> >  2 files changed, 50 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index e6aea256ad39..f504f797c9c8 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -55,6 +55,7 @@ enum rcutorture_type {
> >         RCU_FLAVOR,
> >         RCU_BH_FLAVOR,
> >         RCU_SCHED_FLAVOR,
> > +       RCU_TASKS_FLAVOR,
> >         SRCU_FLAVOR,
> >         INVALID_RCU_FLAVOR
> >  };
> > diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> > index febe07062ac5..52423f2c74da 100644
> > --- a/kernel/rcu/rcutorture.c
> > +++ b/kernel/rcu/rcutorture.c
> > @@ -601,6 +601,52 @@ static struct rcu_torture_ops sched_ops = {
> >         .name           = "sched"
> >  };
> >
> > +#ifdef CONFIG_TASKS_RCU
> > +
> > +/*
> > + * Definitions for RCU-tasks torture testing.
> > + */
> > +
> > +static int tasks_torture_read_lock(void)
> > +{
> > +       return 0;
> > +}
> > +
> > +static void tasks_torture_read_unlock(int idx)
> > +{
> > +}
> > +
> > +static void rcu_tasks_torture_deferred_free(struct rcu_torture *p)
> > +{
> > +       call_rcu_tasks(&p->rtort_rcu, rcu_torture_cb);
> > +}
> > +
> > +static struct rcu_torture_ops tasks_ops = {
> > +       .ttype          = RCU_TASKS_FLAVOR,
> > +       .init           = rcu_sync_torture_init,
> > +       .readlock       = tasks_torture_read_lock,
> > +       .read_delay     = rcu_read_delay,  /* just reuse rcu's version. */
> > +       .readunlock     = tasks_torture_read_unlock,
> > +       .completed      = rcu_no_completed,
> > +       .deferred_free  = rcu_tasks_torture_deferred_free,
> > +       .sync           = synchronize_rcu_tasks,
> > +       .exp_sync       = synchronize_rcu_tasks,
> > +       .call           = call_rcu_tasks,
> > +       .cb_barrier     = rcu_barrier_tasks,
> > +       .fqs            = NULL,
> > +       .stats          = NULL,
> > +       .irq_capable    = 1,
> > +       .name           = "tasks"
> > +};
> > +
> > +#define RCUTORTURE_TASKS_OPS &tasks_ops,
> 
> Not sure about the comma here, no harm but still... a minor nit :)

Good point, it would be better to parenthesize this an put the comma
at the point of use.  Fixed!

							Thanx, Paul

> > +
> > +#else /* #ifdef CONFIG_TASKS_RCU */
> > +
> > +#define RCUTORTURE_TASKS_OPS
> > +
> > +#endif /* #else #ifdef CONFIG_TASKS_RCU */
> > +
> >  /*
> >   * RCU torture priority-boost testing.  Runs one real-time thread per
> >   * CPU for moderate bursts, repeatedly registering RCU callbacks and
> > @@ -1295,7 +1341,8 @@ static int rcu_torture_barrier_cbs(void *arg)
> >                 if (atomic_dec_and_test(&barrier_cbs_count))
> >                         wake_up(&barrier_wq);
> >         } while (!torture_must_stop());
> > -       cur_ops->cb_barrier();
> > +       if (cur_ops->cb_barrier != NULL)
> > +               cur_ops->cb_barrier();
> >         destroy_rcu_head_on_stack(&rcu);
> >         torture_kthread_stopping("rcu_torture_barrier_cbs");
> >         return 0;
> > @@ -1534,6 +1581,7 @@ rcu_torture_init(void)
> >         int firsterr = 0;
> >         static struct rcu_torture_ops *torture_ops[] = {
> >                 &rcu_ops, &rcu_bh_ops, &rcu_busted_ops, &srcu_ops, &sched_ops,
> > +               RCUTORTURE_TASKS_OPS
> >         };
> >
> >         if (!torture_init_begin(torture_type, verbose, &rcutorture_runnable))
> > --
> > 1.8.1.5
> >
> 
> 
> 
> -- 
> Pranith
> 


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 06/16] rcutorture: Add torture tests for RCU-tasks
  2014-08-14 21:44       ` Paul E. McKenney
@ 2014-08-14 21:49         ` Paul E. McKenney
  0 siblings, 0 replies; 60+ messages in thread
From: Paul E. McKenney @ 2014-08-14 21:49 UTC (permalink / raw)
  To: Pranith Kumar
  Cc: LKML, Ingo Molnar, Lai Jiangshan, Dipankar Sarma, Andrew Morton,
	Mathieu Desnoyers, Josh Triplett, Thomas Gleixner,
	Peter Zijlstra, Steven Rostedt, David Howells, Eric Dumazet,
	dvhart, Frédéric Weisbecker, Oleg Nesterov

On Thu, Aug 14, 2014 at 02:44:15PM -0700, Paul E. McKenney wrote:
> On Thu, Aug 14, 2014 at 05:34:53PM -0400, Pranith Kumar wrote:
> > On Mon, Aug 11, 2014 at 6:48 PM, Paul E. McKenney
> > <paulmck@linux.vnet.ibm.com> wrote:
> > > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > >
> > > This commit adds torture tests for RCU-tasks.  It also fixes a bug that
> > > would segfault for an RCU flavor lacking a callback-barrier function.
> > >
> > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > Reviewed-by: Josh Triplett <josh@joshtriplett.org>
> > > ---
> > >  include/linux/rcupdate.h |  1 +
> > >  kernel/rcu/rcutorture.c  | 50 +++++++++++++++++++++++++++++++++++++++++++++++-
> > >  2 files changed, 50 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > > index e6aea256ad39..f504f797c9c8 100644
> > > --- a/include/linux/rcupdate.h
> > > +++ b/include/linux/rcupdate.h
> > > @@ -55,6 +55,7 @@ enum rcutorture_type {
> > >         RCU_FLAVOR,
> > >         RCU_BH_FLAVOR,
> > >         RCU_SCHED_FLAVOR,
> > > +       RCU_TASKS_FLAVOR,
> > >         SRCU_FLAVOR,
> > >         INVALID_RCU_FLAVOR
> > >  };
> > > diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> > > index febe07062ac5..52423f2c74da 100644
> > > --- a/kernel/rcu/rcutorture.c
> > > +++ b/kernel/rcu/rcutorture.c
> > > @@ -601,6 +601,52 @@ static struct rcu_torture_ops sched_ops = {
> > >         .name           = "sched"
> > >  };
> > >
> > > +#ifdef CONFIG_TASKS_RCU
> > > +
> > > +/*
> > > + * Definitions for RCU-tasks torture testing.
> > > + */
> > > +
> > > +static int tasks_torture_read_lock(void)
> > > +{
> > > +       return 0;
> > > +}
> > > +
> > > +static void tasks_torture_read_unlock(int idx)
> > > +{
> > > +}
> > > +
> > > +static void rcu_tasks_torture_deferred_free(struct rcu_torture *p)
> > > +{
> > > +       call_rcu_tasks(&p->rtort_rcu, rcu_torture_cb);
> > > +}
> > > +
> > > +static struct rcu_torture_ops tasks_ops = {
> > > +       .ttype          = RCU_TASKS_FLAVOR,
> > > +       .init           = rcu_sync_torture_init,
> > > +       .readlock       = tasks_torture_read_lock,
> > > +       .read_delay     = rcu_read_delay,  /* just reuse rcu's version. */
> > > +       .readunlock     = tasks_torture_read_unlock,
> > > +       .completed      = rcu_no_completed,
> > > +       .deferred_free  = rcu_tasks_torture_deferred_free,
> > > +       .sync           = synchronize_rcu_tasks,
> > > +       .exp_sync       = synchronize_rcu_tasks,
> > > +       .call           = call_rcu_tasks,
> > > +       .cb_barrier     = rcu_barrier_tasks,
> > > +       .fqs            = NULL,
> > > +       .stats          = NULL,
> > > +       .irq_capable    = 1,
> > > +       .name           = "tasks"
> > > +};
> > > +
> > > +#define RCUTORTURE_TASKS_OPS &tasks_ops,
> > 
> > Not sure about the comma here, no harm but still... a minor nit :)
> 
> Good point, it would be better to parenthesize this an put the comma
> at the point of use.  Fixed!

Except that this gives me a syntax error when CONFIG_TASKS_RCU=n because
we end up with a pair of consecutive commas.  Nice try, though!  ;-)

							Thanx, Paul

> > > +
> > > +#else /* #ifdef CONFIG_TASKS_RCU */
> > > +
> > > +#define RCUTORTURE_TASKS_OPS
> > > +
> > > +#endif /* #else #ifdef CONFIG_TASKS_RCU */
> > > +
> > >  /*
> > >   * RCU torture priority-boost testing.  Runs one real-time thread per
> > >   * CPU for moderate bursts, repeatedly registering RCU callbacks and
> > > @@ -1295,7 +1341,8 @@ static int rcu_torture_barrier_cbs(void *arg)
> > >                 if (atomic_dec_and_test(&barrier_cbs_count))
> > >                         wake_up(&barrier_wq);
> > >         } while (!torture_must_stop());
> > > -       cur_ops->cb_barrier();
> > > +       if (cur_ops->cb_barrier != NULL)
> > > +               cur_ops->cb_barrier();
> > >         destroy_rcu_head_on_stack(&rcu);
> > >         torture_kthread_stopping("rcu_torture_barrier_cbs");
> > >         return 0;
> > > @@ -1534,6 +1581,7 @@ rcu_torture_init(void)
> > >         int firsterr = 0;
> > >         static struct rcu_torture_ops *torture_ops[] = {
> > >                 &rcu_ops, &rcu_bh_ops, &rcu_busted_ops, &srcu_ops, &sched_ops,
> > > +               RCUTORTURE_TASKS_OPS
> > >         };
> > >
> > >         if (!torture_init_begin(torture_type, verbose, &rcutorture_runnable))
> > > --
> > > 1.8.1.5
> > >
> > 
> > 
> > 
> > -- 
> > Pranith
> > 


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 09/16] rcu: Improve RCU-tasks energy efficiency
  2014-08-14 21:42     ` Pranith Kumar
@ 2014-08-14 21:55       ` Paul E. McKenney
  2014-08-14 22:00         ` Pranith Kumar
  0 siblings, 1 reply; 60+ messages in thread
From: Paul E. McKenney @ 2014-08-14 21:55 UTC (permalink / raw)
  To: Pranith Kumar
  Cc: LKML, Ingo Molnar, Lai Jiangshan, Dipankar Sarma, Andrew Morton,
	Mathieu Desnoyers, Josh Triplett, Thomas Gleixner,
	Peter Zijlstra, Steven Rostedt, David Howells, Eric Dumazet,
	dvhart, Frédéric Weisbecker, Oleg Nesterov

On Thu, Aug 14, 2014 at 05:42:06PM -0400, Pranith Kumar wrote:
> On Mon, Aug 11, 2014 at 6:48 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> >
> > The current RCU-tasks implementation uses strict polling to detect
> > callback arrivals.  This works quite well, but is not so good for
> > energy efficiency.  This commit therefore replaces the strict polling
> > with a wait queue.
> >
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > ---
> >  kernel/rcu/update.c | 14 ++++++++++++--
> >  1 file changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> > index f1535404a79e..1256a900cd01 100644
> > --- a/kernel/rcu/update.c
> > +++ b/kernel/rcu/update.c
> > @@ -368,6 +368,7 @@ early_initcall(check_cpu_stall_init);
> >  /* Global list of callbacks and associated lock. */
> >  static struct rcu_head *rcu_tasks_cbs_head;
> >  static struct rcu_head **rcu_tasks_cbs_tail = &rcu_tasks_cbs_head;
> > +static DECLARE_WAIT_QUEUE_HEAD(rcu_tasks_cbs_wq);
> >  static DEFINE_RAW_SPINLOCK(rcu_tasks_cbs_lock);
> >
> >  /* Track exiting tasks in order to allow them to be waited for. */
> > @@ -381,13 +382,17 @@ module_param(rcu_task_stall_timeout, int, 0644);
> >  void call_rcu_tasks(struct rcu_head *rhp, void (*func)(struct rcu_head *rhp))
> >  {
> >         unsigned long flags;
> > +       bool needwake;
> >
> >         rhp->next = NULL;
> >         rhp->func = func;
> >         raw_spin_lock_irqsave(&rcu_tasks_cbs_lock, flags);
> > +       needwake = !rcu_tasks_cbs_head;
> >         *rcu_tasks_cbs_tail = rhp;
> >         rcu_tasks_cbs_tail = &rhp->next;
> >         raw_spin_unlock_irqrestore(&rcu_tasks_cbs_lock, flags);
> > +       if (needwake)
> > +               wake_up(&rcu_tasks_cbs_wq);
> >  }
> >  EXPORT_SYMBOL_GPL(call_rcu_tasks);
> 
> I think you want
> 
> needwake = !!rcu_tasks_cbs_head;
> 
> otherwise it will wake up when rcu_tasks_cbs_head is null, no?

Well, that is exactly what we want.  Note that we do the test -before-
the enqueue.  This means that we do the wakeup if the list -was-
empty before the enqueue, which is exactly the case where the task
might be asleep without having already been sent a wakeup.

Assuming that wakeups are reliably delivered, of course.  But if they
are not reliably delivered, that is a bug that needs to be fixed.

							Thanx, Paul

> > @@ -498,8 +503,12 @@ static int __noreturn rcu_tasks_kthread(void *arg)
> >
> >                 /* If there were none, wait a bit and start over. */
> >                 if (!list) {
> > -                       schedule_timeout_interruptible(HZ);
> > -                       WARN_ON(signal_pending(current));
> > +                       wait_event_interruptible(rcu_tasks_cbs_wq,
> > +                                                rcu_tasks_cbs_head);
> > +                       if (!rcu_tasks_cbs_head) {
> > +                               WARN_ON(signal_pending(current));
> > +                               schedule_timeout_interruptible(HZ/10);
> > +                       }
> >                         continue;
> >                 }
> >
> > @@ -605,6 +614,7 @@ static int __noreturn rcu_tasks_kthread(void *arg)
> >                         list = next;
> >                         cond_resched();
> >                 }
> > +               schedule_timeout_uninterruptible(HZ/10);
> >         }
> >  }
> >
> > --
> > 1.8.1.5
> >
> 
> 
> 
> -- 
> Pranith
> 


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 08/16] rcu: Add stall-warning checks for RCU-tasks
  2014-08-14 21:39     ` Pranith Kumar
@ 2014-08-14 21:59       ` Paul E. McKenney
  0 siblings, 0 replies; 60+ messages in thread
From: Paul E. McKenney @ 2014-08-14 21:59 UTC (permalink / raw)
  To: Pranith Kumar
  Cc: LKML, Ingo Molnar, Lai Jiangshan, Dipankar Sarma, Andrew Morton,
	Mathieu Desnoyers, Josh Triplett, Thomas Gleixner,
	Peter Zijlstra, Steven Rostedt, David Howells, Eric Dumazet,
	dvhart, Frédéric Weisbecker, Oleg Nesterov

On Thu, Aug 14, 2014 at 05:39:54PM -0400, Pranith Kumar wrote:
> On Mon, Aug 11, 2014 at 6:48 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> >
> > This commit adds a three-minute RCU-tasks stall warning.  The actual
> > time is controlled by the boot/sysfs parameter rcu_task_stall_timeout,
> > with values less than or equal to zero disabling the stall warnings.
> > The default value is three minutes, which means that the tasks that
> > have not yet responded will get their stacks dumped every ten minutes,
> > until they pass through a voluntary context switch.
> >
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> Something about 3 minutes and 10 minutes is mixed up here!

Good catch, updated the commit log to also say ten minutes.

							Thanx, Paul

> > ---
> >  Documentation/kernel-parameters.txt |  5 +++++
> >  kernel/rcu/update.c                 | 27 ++++++++++++++++++++++++---
> >  2 files changed, 29 insertions(+), 3 deletions(-)
> >
> > diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> > index 910c3829f81d..8cdbde7b17f5 100644
> > --- a/Documentation/kernel-parameters.txt
> > +++ b/Documentation/kernel-parameters.txt
> > @@ -2921,6 +2921,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
> >         rcupdate.rcu_cpu_stall_timeout= [KNL]
> >                         Set timeout for RCU CPU stall warning messages.
> >
> > +       rcupdate.rcu_task_stall_timeout= [KNL]
> > +                       Set timeout in jiffies for RCU task stall warning
> > +                       messages.  Disable with a value less than or equal
> > +                       to zero.
> > +
> >         rdinit=         [KNL]
> >                         Format: <full_path>
> >                         Run specified binary instead of /init from the ramdisk,
> > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> > index 8f53a41dd9ee..f1535404a79e 100644
> > --- a/kernel/rcu/update.c
> > +++ b/kernel/rcu/update.c
> > @@ -374,7 +374,7 @@ static DEFINE_RAW_SPINLOCK(rcu_tasks_cbs_lock);
> >  DEFINE_SRCU(tasks_rcu_exit_srcu);
> >
> >  /* Control stall timeouts.  Disable with <= 0, otherwise jiffies till stall. */
> > -static int rcu_task_stall_timeout __read_mostly = HZ * 60 * 3;
> > +static int rcu_task_stall_timeout __read_mostly = HZ * 60 * 10;
> >  module_param(rcu_task_stall_timeout, int, 0644);
> >
> >  /* Post an RCU-tasks callback. */
> > @@ -449,7 +449,8 @@ void rcu_barrier_tasks(void)
> >  EXPORT_SYMBOL_GPL(rcu_barrier_tasks);
> >
> >  /* See if tasks are still holding out, complain if so. */
> > -static void check_holdout_task(struct task_struct *t)
> > +static void check_holdout_task(struct task_struct *t,
> > +                              bool needreport, bool *firstreport)
> >  {
> >         if (!ACCESS_ONCE(t->rcu_tasks_holdout) ||
> >             t->rcu_tasks_nvcsw != ACCESS_ONCE(t->nvcsw) ||
> > @@ -457,7 +458,15 @@ static void check_holdout_task(struct task_struct *t)
> >                 ACCESS_ONCE(t->rcu_tasks_holdout) = 0;
> >                 list_del_rcu(&t->rcu_tasks_holdout_list);
> >                 put_task_struct(t);
> > +               return;
> >         }
> > +       if (!needreport)
> > +               return;
> > +       if (*firstreport) {
> > +               pr_err("INFO: rcu_tasks detected stalls on tasks:\n");
> > +               *firstreport = false;
> > +       }
> > +       sched_show_task(t);
> >  }
> >
> >  /* RCU-tasks kthread that detects grace periods and invokes callbacks. */
> > @@ -465,6 +474,7 @@ static int __noreturn rcu_tasks_kthread(void *arg)
> >  {
> >         unsigned long flags;
> >         struct task_struct *g, *t;
> > +       unsigned long lastreport;
> >         struct rcu_head *list;
> >         struct rcu_head *next;
> >         LIST_HEAD(rcu_tasks_holdouts);
> > @@ -543,13 +553,24 @@ static int __noreturn rcu_tasks_kthread(void *arg)
> >                  * of holdout tasks, removing any that are no longer
> >                  * holdouts.  When the list is empty, we are done.
> >                  */
> > +               lastreport = jiffies;
> >                 while (!list_empty(&rcu_tasks_holdouts)) {
> > +                       bool firstreport;
> > +                       bool needreport;
> > +                       int rtst;
> > +
> >                         schedule_timeout_interruptible(HZ);
> > +                       rtst = ACCESS_ONCE(rcu_task_stall_timeout);
> > +                       needreport = rtst > 0 &&
> > +                                    time_after(jiffies, lastreport + rtst);
> > +                       if (needreport)
> > +                               lastreport = jiffies;
> > +                       firstreport = true;
> >                         WARN_ON(signal_pending(current));
> >                         rcu_read_lock();
> >                         list_for_each_entry_rcu(t, &rcu_tasks_holdouts,
> >                                                 rcu_tasks_holdout_list)
> > -                               check_holdout_task(t);
> > +                               check_holdout_task(t, needreport, &firstreport);
> >                         rcu_read_unlock();
> >                 }
> >
> > --
> > 1.8.1.5
> >
> 
> 
> 
> -- 
> Pranith
> 


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 09/16] rcu: Improve RCU-tasks energy efficiency
  2014-08-14 21:55       ` Paul E. McKenney
@ 2014-08-14 22:00         ` Pranith Kumar
  0 siblings, 0 replies; 60+ messages in thread
From: Pranith Kumar @ 2014-08-14 22:00 UTC (permalink / raw)
  To: Paul McKenney
  Cc: LKML, Ingo Molnar, Lai Jiangshan, Dipankar Sarma, Andrew Morton,
	Mathieu Desnoyers, Josh Triplett, Thomas Gleixner,
	Peter Zijlstra, Steven Rostedt, David Howells, Eric Dumazet,
	dvhart, Frédéric Weisbecker, Oleg Nesterov

On Thu, Aug 14, 2014 at 5:55 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Thu, Aug 14, 2014 at 05:42:06PM -0400, Pranith Kumar wrote:
>> On Mon, Aug 11, 2014 at 6:48 PM, Paul E. McKenney
>> <paulmck@linux.vnet.ibm.com> wrote:
>> > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
>> >
>> > The current RCU-tasks implementation uses strict polling to detect
>> > callback arrivals.  This works quite well, but is not so good for
>> > energy efficiency.  This commit therefore replaces the strict polling
>> > with a wait queue.
>> >
>> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>> > ---
>> >  kernel/rcu/update.c | 14 ++++++++++++--
>> >  1 file changed, 12 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
>> > index f1535404a79e..1256a900cd01 100644
>> > --- a/kernel/rcu/update.c
>> > +++ b/kernel/rcu/update.c
>> > @@ -368,6 +368,7 @@ early_initcall(check_cpu_stall_init);
>> >  /* Global list of callbacks and associated lock. */
>> >  static struct rcu_head *rcu_tasks_cbs_head;
>> >  static struct rcu_head **rcu_tasks_cbs_tail = &rcu_tasks_cbs_head;
>> > +static DECLARE_WAIT_QUEUE_HEAD(rcu_tasks_cbs_wq);
>> >  static DEFINE_RAW_SPINLOCK(rcu_tasks_cbs_lock);
>> >
>> >  /* Track exiting tasks in order to allow them to be waited for. */
>> > @@ -381,13 +382,17 @@ module_param(rcu_task_stall_timeout, int, 0644);
>> >  void call_rcu_tasks(struct rcu_head *rhp, void (*func)(struct rcu_head *rhp))
>> >  {
>> >         unsigned long flags;
>> > +       bool needwake;
>> >
>> >         rhp->next = NULL;
>> >         rhp->func = func;
>> >         raw_spin_lock_irqsave(&rcu_tasks_cbs_lock, flags);
>> > +       needwake = !rcu_tasks_cbs_head;
>> >         *rcu_tasks_cbs_tail = rhp;
>> >         rcu_tasks_cbs_tail = &rhp->next;
>> >         raw_spin_unlock_irqrestore(&rcu_tasks_cbs_lock, flags);
>> > +       if (needwake)
>> > +               wake_up(&rcu_tasks_cbs_wq);
>> >  }
>> >  EXPORT_SYMBOL_GPL(call_rcu_tasks);
>>
>> I think you want
>>
>> needwake = !!rcu_tasks_cbs_head;
>>
>> otherwise it will wake up when rcu_tasks_cbs_head is null, no?
>
> Well, that is exactly what we want.  Note that we do the test -before-
> the enqueue.  This means that we do the wakeup if the list -was-
> empty before the enqueue, which is exactly the case where the task
> might be asleep without having already been sent a wakeup.
>
> Assuming that wakeups are reliably delivered, of course.  But if they
> are not reliably delivered, that is a bug that needs to be fixed.
>

Ohk, I did not notice the modification through rcu_tasks_cbs_tail! All is well.

-- 
Pranith

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 11/16] rcu: Defer rcu_tasks_kthread() creation till first call_rcu_tasks()
  2014-08-11 22:49   ` [PATCH v5 tip/core/rcu 11/16] rcu: Defer rcu_tasks_kthread() creation till first call_rcu_tasks() Paul E. McKenney
@ 2014-08-14 22:28     ` Pranith Kumar
  2014-08-14 22:53       ` Paul E. McKenney
  0 siblings, 1 reply; 60+ messages in thread
From: Pranith Kumar @ 2014-08-14 22:28 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Ingo Molnar, Lai Jiangshan, Dipankar Sarma, Andrew Morton,
	Mathieu Desnoyers, Josh Triplett, Thomas Gleixner,
	Peter Zijlstra, Steven Rostedt, David Howells, Eric Dumazet,
	dvhart, Frédéric Weisbecker, Oleg Nesterov

On Mon, Aug 11, 2014 at 6:49 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
>
> It is expected that many sites will have CONFIG_TASKS_RCU=y, but
> will never actually invoke call_rcu_tasks().  For such sites, creating
> rcu_tasks_kthread() at boot is wasteful.  This commit therefore defers
> creation of this kthread until the time of the first call_rcu_tasks().
>
> This of course means that the first call_rcu_tasks() must be invoked
> from process context after the scheduler is fully operational.
>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> ---
>  kernel/rcu/update.c | 33 ++++++++++++++++++++++++++-------
>  1 file changed, 26 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> index 1256a900cd01..d997163c7e92 100644
> --- a/kernel/rcu/update.c
> +++ b/kernel/rcu/update.c
> @@ -378,7 +378,12 @@ DEFINE_SRCU(tasks_rcu_exit_srcu);
>  static int rcu_task_stall_timeout __read_mostly = HZ * 60 * 10;
>  module_param(rcu_task_stall_timeout, int, 0644);
>
> -/* Post an RCU-tasks callback. */
> +static void rcu_spawn_tasks_kthread(void);
> +
> +/*
> + * Post an RCU-tasks callback.  First call must be from process context
> + * after the scheduler if fully operational.
> + */
>  void call_rcu_tasks(struct rcu_head *rhp, void (*func)(struct rcu_head *rhp))
>  {
>         unsigned long flags;
> @@ -391,8 +396,10 @@ void call_rcu_tasks(struct rcu_head *rhp, void (*func)(struct rcu_head *rhp))
>         *rcu_tasks_cbs_tail = rhp;
>         rcu_tasks_cbs_tail = &rhp->next;
>         raw_spin_unlock_irqrestore(&rcu_tasks_cbs_lock, flags);
> -       if (needwake)
> +       if (needwake) {
> +               rcu_spawn_tasks_kthread();
>                 wake_up(&rcu_tasks_cbs_wq);
> +       }
>  }
>  EXPORT_SYMBOL_GPL(call_rcu_tasks);
>
> @@ -618,15 +625,27 @@ static int __noreturn rcu_tasks_kthread(void *arg)
>         }
>  }
>
> -/* Spawn rcu_tasks_kthread() at boot time. */
> -static int __init rcu_spawn_tasks_kthread(void)
> +/* Spawn rcu_tasks_kthread() at first call to call_rcu_tasks(). */
> +static void rcu_spawn_tasks_kthread(void)
>  {
> -       struct task_struct __maybe_unused *t;
> +       static DEFINE_MUTEX(rcu_tasks_kthread_mutex);
> +       static struct task_struct *rcu_tasks_kthread_ptr;
> +       struct task_struct *t;
>
> +       if (ACCESS_ONCE(rcu_tasks_kthread_ptr)) {
> +               smp_mb(); /* Ensure caller sees full kthread. */
> +               return;
> +       }

I don't see the need for this smp_mb(). The caller has already seen
that rcu_tasks_kthread_ptr is assigned. What are we ensuring with this
barrier again?

an smp_rmb() before this ACCESS_ONCE() and an smp_wmb() after
assigning to rcu_tasks_kthread_ptr should be enough, right?

> +       mutex_lock(&rcu_tasks_kthread_mutex);
> +       if (rcu_tasks_kthread_ptr) {
> +               mutex_unlock(&rcu_tasks_kthread_mutex);
> +               return;
> +       }
>         t = kthread_run(rcu_tasks_kthread, NULL, "rcu_tasks_kthread");
>         BUG_ON(IS_ERR(t));
> -       return 0;
> +       smp_mb(); /* Ensure others see full kthread. */
> +       ACCESS_ONCE(rcu_tasks_kthread_ptr) = t;

Isn't it better to reverse these two statements and change as follows?

ACCESS_ONCE(rcu_tasks_kthread_ptr) = t;
smp_wmb();

or

smp_store_release(rcu_tasks_kthread_ptr, t);

will ensure that this write to rcu_task_kthread_ptr is ordered with
the previous read. I recently read memory-barriers.txt, so please
excuse me if I am totally wrong. But I am confused! :(

> +       mutex_unlock(&rcu_tasks_kthread_mutex);
>  }
> -early_initcall(rcu_spawn_tasks_kthread);
>
>  #endif /* #ifdef CONFIG_TASKS_RCU */
> --
> 1.8.1.5
>



-- 
Pranith

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 11/16] rcu: Defer rcu_tasks_kthread() creation till first call_rcu_tasks()
  2014-08-14 22:28     ` Pranith Kumar
@ 2014-08-14 22:53       ` Paul E. McKenney
  0 siblings, 0 replies; 60+ messages in thread
From: Paul E. McKenney @ 2014-08-14 22:53 UTC (permalink / raw)
  To: Pranith Kumar
  Cc: LKML, Ingo Molnar, Lai Jiangshan, Dipankar Sarma, Andrew Morton,
	Mathieu Desnoyers, Josh Triplett, Thomas Gleixner,
	Peter Zijlstra, Steven Rostedt, David Howells, Eric Dumazet,
	dvhart, Frédéric Weisbecker, Oleg Nesterov

On Thu, Aug 14, 2014 at 06:28:53PM -0400, Pranith Kumar wrote:
> On Mon, Aug 11, 2014 at 6:49 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> >
> > It is expected that many sites will have CONFIG_TASKS_RCU=y, but
> > will never actually invoke call_rcu_tasks().  For such sites, creating
> > rcu_tasks_kthread() at boot is wasteful.  This commit therefore defers
> > creation of this kthread until the time of the first call_rcu_tasks().
> >
> > This of course means that the first call_rcu_tasks() must be invoked
> > from process context after the scheduler is fully operational.
> >
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > ---
> >  kernel/rcu/update.c | 33 ++++++++++++++++++++++++++-------
> >  1 file changed, 26 insertions(+), 7 deletions(-)
> >
> > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> > index 1256a900cd01..d997163c7e92 100644
> > --- a/kernel/rcu/update.c
> > +++ b/kernel/rcu/update.c
> > @@ -378,7 +378,12 @@ DEFINE_SRCU(tasks_rcu_exit_srcu);
> >  static int rcu_task_stall_timeout __read_mostly = HZ * 60 * 10;
> >  module_param(rcu_task_stall_timeout, int, 0644);
> >
> > -/* Post an RCU-tasks callback. */
> > +static void rcu_spawn_tasks_kthread(void);
> > +
> > +/*
> > + * Post an RCU-tasks callback.  First call must be from process context
> > + * after the scheduler if fully operational.
> > + */
> >  void call_rcu_tasks(struct rcu_head *rhp, void (*func)(struct rcu_head *rhp))
> >  {
> >         unsigned long flags;
> > @@ -391,8 +396,10 @@ void call_rcu_tasks(struct rcu_head *rhp, void (*func)(struct rcu_head *rhp))
> >         *rcu_tasks_cbs_tail = rhp;
> >         rcu_tasks_cbs_tail = &rhp->next;
> >         raw_spin_unlock_irqrestore(&rcu_tasks_cbs_lock, flags);
> > -       if (needwake)
> > +       if (needwake) {
> > +               rcu_spawn_tasks_kthread();
> >                 wake_up(&rcu_tasks_cbs_wq);
> > +       }
> >  }
> >  EXPORT_SYMBOL_GPL(call_rcu_tasks);
> >
> > @@ -618,15 +625,27 @@ static int __noreturn rcu_tasks_kthread(void *arg)
> >         }
> >  }
> >
> > -/* Spawn rcu_tasks_kthread() at boot time. */
> > -static int __init rcu_spawn_tasks_kthread(void)
> > +/* Spawn rcu_tasks_kthread() at first call to call_rcu_tasks(). */
> > +static void rcu_spawn_tasks_kthread(void)
> >  {
> > -       struct task_struct __maybe_unused *t;
> > +       static DEFINE_MUTEX(rcu_tasks_kthread_mutex);
> > +       static struct task_struct *rcu_tasks_kthread_ptr;
> > +       struct task_struct *t;
> >
> > +       if (ACCESS_ONCE(rcu_tasks_kthread_ptr)) {
> > +               smp_mb(); /* Ensure caller sees full kthread. */
> > +               return;
> > +       }
> 
> I don't see the need for this smp_mb(). The caller has already seen
> that rcu_tasks_kthread_ptr is assigned. What are we ensuring with this
> barrier again?

We are ensuring that any later operations on rcu_tasks_kthread_ptr
see a fully initialized thread.  Because these later operations
might be loads, we cannot rely on control dependencies.

> an smp_rmb() before this ACCESS_ONCE() and an smp_wmb() after
> assigning to rcu_tasks_kthread_ptr should be enough, right?

Probably.  But given that rcu_spawn_tasks_kthread() is only called
when a CPU is onlined, I am not much inclined to weaken it.

> > +       mutex_lock(&rcu_tasks_kthread_mutex);
> > +       if (rcu_tasks_kthread_ptr) {
> > +               mutex_unlock(&rcu_tasks_kthread_mutex);
> > +               return;
> > +       }
> >         t = kthread_run(rcu_tasks_kthread, NULL, "rcu_tasks_kthread");
> >         BUG_ON(IS_ERR(t));
> > -       return 0;
> > +       smp_mb(); /* Ensure others see full kthread. */
> > +       ACCESS_ONCE(rcu_tasks_kthread_ptr) = t;
> 
> Isn't it better to reverse these two statements and change as follows?
> 
> ACCESS_ONCE(rcu_tasks_kthread_ptr) = t;
> smp_wmb();

This would break.  We need all the task creation stuff to be seen as
having happened before the store to rcu_tasks_kthread_ptr.  Putting
the barrier after the store to rcu_tasks_kthread_ptr would allow
both compiler and CPU to reorder task-creation stuff to follow the
store to the pointer, which would not be good.

> or
> 
> smp_store_release(rcu_tasks_kthread_ptr, t);
> 
> will ensure that this write to rcu_task_kthread_ptr is ordered with
> the previous read. I recently read memory-barriers.txt, so please
> excuse me if I am totally wrong. But I am confused! :(

Hmmm...  An smp_store_release() combined with smp_load_acquire()
up earlier might be a good approach.  Maybe as a future cleanup.

But please note that smp_store_release() puts the barrier -before-
the store.  ;-)

							Thanx, Paul

> > +       mutex_unlock(&rcu_tasks_kthread_mutex);
> >  }
> > -early_initcall(rcu_spawn_tasks_kthread);
> >
> >  #endif /* #ifdef CONFIG_TASKS_RCU */
> > --
> > 1.8.1.5
> >
> 
> 
> 
> -- 
> Pranith
> 


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 12/16] rcu: Make TASKS_RCU handle nohz_full= CPUs
  2014-08-11 22:49   ` [PATCH v5 tip/core/rcu 12/16] rcu: Make TASKS_RCU handle nohz_full= CPUs Paul E. McKenney
@ 2014-08-14 22:55     ` Pranith Kumar
  2014-08-14 23:16       ` Paul E. McKenney
  0 siblings, 1 reply; 60+ messages in thread
From: Pranith Kumar @ 2014-08-14 22:55 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Ingo Molnar, Lai Jiangshan, Dipankar Sarma, Andrew Morton,
	Mathieu Desnoyers, Josh Triplett, Thomas Gleixner,
	Peter Zijlstra, Steven Rostedt, David Howells, Eric Dumazet,
	dvhart, Frédéric Weisbecker, Oleg Nesterov

On Mon, Aug 11, 2014 at 6:49 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
>
> Currently TASKS_RCU would ignore a CPU running a task in nohz_full=
> usermode execution.  There would be neither a context switch nor a
> scheduling-clock interrupt to tell TASKS_RCU that the task in question
> had passed through a quiescent state.  The grace period would therefore
> extend indefinitely.  This commit therefore makes RCU's dyntick-idle
> subsystem record the task_struct structure of the task that is running
> in dyntick-idle mode on each CPU.  The TASKS_RCU grace period can
> then access this information and record a quiescent state on
> behalf of any CPU running in dyntick-idle usermode.
>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> ---
>  include/linux/init_task.h |  3 ++-
>  include/linux/sched.h     |  2 ++
>  kernel/rcu/tree.c         |  2 ++
>  kernel/rcu/tree.h         |  2 ++
>  kernel/rcu/tree_plugin.h  | 16 ++++++++++++++++
>  kernel/rcu/update.c       |  4 +++-
>  6 files changed, 27 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/init_task.h b/include/linux/init_task.h
> index 78715ea7c30c..642828009324 100644
> --- a/include/linux/init_task.h
> +++ b/include/linux/init_task.h
> @@ -128,7 +128,8 @@ extern struct group_info init_groups;
>  #define INIT_TASK_RCU_TASKS(tsk)                                       \
>         .rcu_tasks_holdout = false,                                     \
>         .rcu_tasks_holdout_list =                                       \
> -               LIST_HEAD_INIT(tsk.rcu_tasks_holdout_list),
> +               LIST_HEAD_INIT(tsk.rcu_tasks_holdout_list),             \
> +       .rcu_tasks_idle_cpu = -1,
>  #else
>  #define INIT_TASK_RCU_TASKS(tsk)
>  #endif
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 3cf124389ec7..5fa041f7a034 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1277,6 +1277,7 @@ struct task_struct {
>         unsigned long rcu_tasks_nvcsw;
>         int rcu_tasks_holdout;
>         struct list_head rcu_tasks_holdout_list;
> +       int rcu_tasks_idle_cpu;
>  #endif /* #ifdef CONFIG_TASKS_RCU */
>
>  #if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT)
> @@ -2021,6 +2022,7 @@ static inline void rcu_copy_process(struct task_struct *p)
>  #ifdef CONFIG_TASKS_RCU
>         p->rcu_tasks_holdout = false;
>         INIT_LIST_HEAD(&p->rcu_tasks_holdout_list);
> +       p->rcu_tasks_idle_cpu = -1;
>  #endif /* #ifdef CONFIG_TASKS_RCU */
>  }
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 645a33efc0d4..0d9ee1e4f446 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -526,6 +526,7 @@ static void rcu_eqs_enter_common(struct rcu_dynticks *rdtp, long long oldval,
>         atomic_inc(&rdtp->dynticks);
>         smp_mb__after_atomic();  /* Force ordering with next sojourn. */
>         WARN_ON_ONCE(atomic_read(&rdtp->dynticks) & 0x1);
> +       rcu_dynticks_task_enter();
>
>         /*
>          * It is illegal to enter an extended quiescent state while
> @@ -642,6 +643,7 @@ void rcu_irq_exit(void)
>  static void rcu_eqs_exit_common(struct rcu_dynticks *rdtp, long long oldval,
>                                int user)
>  {
> +       rcu_dynticks_task_exit();
>         smp_mb__before_atomic();  /* Force ordering w/previous sojourn. */
>         atomic_inc(&rdtp->dynticks);
>         /* CPUs seeing atomic_inc() must see later RCU read-side crit sects */
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index 0f69a79c5b7d..37ff593b7725 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -579,6 +579,8 @@ static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle,
>  static void rcu_bind_gp_kthread(void);
>  static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp);
>  static bool rcu_nohz_full_cpu(struct rcu_state *rsp);
> +static void rcu_dynticks_task_enter(void);
> +static void rcu_dynticks_task_exit(void);
>
>  #endif /* #ifndef RCU_TREE_NONCORE */
>
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index a86a363ea453..0d8ef5cb1976 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -2852,3 +2852,19 @@ static void rcu_bind_gp_kthread(void)
>                 set_cpus_allowed_ptr(current, cpumask_of(cpu));
>  #endif /* #ifdef CONFIG_NO_HZ_FULL */
>  }
> +
> +/* Record the current task on dyntick-idle entry. */
> +static void rcu_dynticks_task_enter(void)
> +{
> +#if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL)
> +       ACCESS_ONCE(current->rcu_tasks_idle_cpu) = smp_processor_id();
> +#endif /* #if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL) */

Shouldn't we check that the cpu is actually a nohz_full cpu, like follows:

 static void rcu_dynticks_task_enter(void)
 {
 #if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL)
-       ACCESS_ONCE(current->rcu_tasks_idle_cpu) = smp_processor_id();
+       if (tick_nohz_full_cpu(smp_processor_id())
+                 ACCESS_ONCE(current->rcu_tasks_idle_cpu) = smp_processor_id();
 #endif /* #if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL) */
 }

> +}
> +
> +/* Record no current task on dyntick-idle exit. */
> +static void rcu_dynticks_task_exit(void)
> +{
> +#if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL)
> +       ACCESS_ONCE(current->rcu_tasks_idle_cpu) = -1;
> +#endif /* #if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL) */
> +}
> diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> index d997163c7e92..a4140f25cf1a 100644
> --- a/kernel/rcu/update.c
> +++ b/kernel/rcu/update.c
> @@ -466,7 +466,9 @@ static void check_holdout_task(struct task_struct *t,
>  {
>         if (!ACCESS_ONCE(t->rcu_tasks_holdout) ||
>             t->rcu_tasks_nvcsw != ACCESS_ONCE(t->nvcsw) ||
> -           !ACCESS_ONCE(t->on_rq)) {
> +           !ACCESS_ONCE(t->on_rq) ||
> +           (IS_ENABLED(CONFIG_NO_HZ_FULL) &&
> +            !is_idle_task(t) && t->rcu_tasks_idle_cpu >= 0)) {

rcu_tasks_idle_cpu will be -1 in CONFIG_NO_HZ_FULL is not enabled. Why
are you checking both here?

>                 ACCESS_ONCE(t->rcu_tasks_holdout) = 0;
>                 list_del_rcu(&t->rcu_tasks_holdout_list);
>                 put_task_struct(t);
> --
> 1.8.1.5
>



-- 
Pranith

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v5 tip/core/rcu 12/16] rcu: Make TASKS_RCU handle nohz_full= CPUs
  2014-08-14 22:55     ` Pranith Kumar
@ 2014-08-14 23:16       ` Paul E. McKenney
  0 siblings, 0 replies; 60+ messages in thread
From: Paul E. McKenney @ 2014-08-14 23:16 UTC (permalink / raw)
  To: Pranith Kumar
  Cc: LKML, Ingo Molnar, Lai Jiangshan, Dipankar Sarma, Andrew Morton,
	Mathieu Desnoyers, Josh Triplett, Thomas Gleixner,
	Peter Zijlstra, Steven Rostedt, David Howells, Eric Dumazet,
	dvhart, Frédéric Weisbecker, Oleg Nesterov

On Thu, Aug 14, 2014 at 06:55:35PM -0400, Pranith Kumar wrote:
> On Mon, Aug 11, 2014 at 6:49 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> >
> > Currently TASKS_RCU would ignore a CPU running a task in nohz_full=
> > usermode execution.  There would be neither a context switch nor a
> > scheduling-clock interrupt to tell TASKS_RCU that the task in question
> > had passed through a quiescent state.  The grace period would therefore
> > extend indefinitely.  This commit therefore makes RCU's dyntick-idle
> > subsystem record the task_struct structure of the task that is running
> > in dyntick-idle mode on each CPU.  The TASKS_RCU grace period can
> > then access this information and record a quiescent state on
> > behalf of any CPU running in dyntick-idle usermode.
> >
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > ---
> >  include/linux/init_task.h |  3 ++-
> >  include/linux/sched.h     |  2 ++
> >  kernel/rcu/tree.c         |  2 ++
> >  kernel/rcu/tree.h         |  2 ++
> >  kernel/rcu/tree_plugin.h  | 16 ++++++++++++++++
> >  kernel/rcu/update.c       |  4 +++-
> >  6 files changed, 27 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/init_task.h b/include/linux/init_task.h
> > index 78715ea7c30c..642828009324 100644
> > --- a/include/linux/init_task.h
> > +++ b/include/linux/init_task.h
> > @@ -128,7 +128,8 @@ extern struct group_info init_groups;
> >  #define INIT_TASK_RCU_TASKS(tsk)                                       \
> >         .rcu_tasks_holdout = false,                                     \
> >         .rcu_tasks_holdout_list =                                       \
> > -               LIST_HEAD_INIT(tsk.rcu_tasks_holdout_list),
> > +               LIST_HEAD_INIT(tsk.rcu_tasks_holdout_list),             \
> > +       .rcu_tasks_idle_cpu = -1,
> >  #else
> >  #define INIT_TASK_RCU_TASKS(tsk)
> >  #endif
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 3cf124389ec7..5fa041f7a034 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -1277,6 +1277,7 @@ struct task_struct {
> >         unsigned long rcu_tasks_nvcsw;
> >         int rcu_tasks_holdout;
> >         struct list_head rcu_tasks_holdout_list;
> > +       int rcu_tasks_idle_cpu;
> >  #endif /* #ifdef CONFIG_TASKS_RCU */
> >
> >  #if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT)
> > @@ -2021,6 +2022,7 @@ static inline void rcu_copy_process(struct task_struct *p)
> >  #ifdef CONFIG_TASKS_RCU
> >         p->rcu_tasks_holdout = false;
> >         INIT_LIST_HEAD(&p->rcu_tasks_holdout_list);
> > +       p->rcu_tasks_idle_cpu = -1;
> >  #endif /* #ifdef CONFIG_TASKS_RCU */
> >  }
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 645a33efc0d4..0d9ee1e4f446 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -526,6 +526,7 @@ static void rcu_eqs_enter_common(struct rcu_dynticks *rdtp, long long oldval,
> >         atomic_inc(&rdtp->dynticks);
> >         smp_mb__after_atomic();  /* Force ordering with next sojourn. */
> >         WARN_ON_ONCE(atomic_read(&rdtp->dynticks) & 0x1);
> > +       rcu_dynticks_task_enter();
> >
> >         /*
> >          * It is illegal to enter an extended quiescent state while
> > @@ -642,6 +643,7 @@ void rcu_irq_exit(void)
> >  static void rcu_eqs_exit_common(struct rcu_dynticks *rdtp, long long oldval,
> >                                int user)
> >  {
> > +       rcu_dynticks_task_exit();
> >         smp_mb__before_atomic();  /* Force ordering w/previous sojourn. */
> >         atomic_inc(&rdtp->dynticks);
> >         /* CPUs seeing atomic_inc() must see later RCU read-side crit sects */
> > diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> > index 0f69a79c5b7d..37ff593b7725 100644
> > --- a/kernel/rcu/tree.h
> > +++ b/kernel/rcu/tree.h
> > @@ -579,6 +579,8 @@ static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle,
> >  static void rcu_bind_gp_kthread(void);
> >  static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp);
> >  static bool rcu_nohz_full_cpu(struct rcu_state *rsp);
> > +static void rcu_dynticks_task_enter(void);
> > +static void rcu_dynticks_task_exit(void);
> >
> >  #endif /* #ifndef RCU_TREE_NONCORE */
> >
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index a86a363ea453..0d8ef5cb1976 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -2852,3 +2852,19 @@ static void rcu_bind_gp_kthread(void)
> >                 set_cpus_allowed_ptr(current, cpumask_of(cpu));
> >  #endif /* #ifdef CONFIG_NO_HZ_FULL */
> >  }
> > +
> > +/* Record the current task on dyntick-idle entry. */
> > +static void rcu_dynticks_task_enter(void)
> > +{
> > +#if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL)
> > +       ACCESS_ONCE(current->rcu_tasks_idle_cpu) = smp_processor_id();
> > +#endif /* #if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL) */
> 
> Shouldn't we check that the cpu is actually a nohz_full cpu, like follows:
> 
>  static void rcu_dynticks_task_enter(void)
>  {
>  #if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL)
> -       ACCESS_ONCE(current->rcu_tasks_idle_cpu) = smp_processor_id();
> +       if (tick_nohz_full_cpu(smp_processor_id())
> +                 ACCESS_ONCE(current->rcu_tasks_idle_cpu) = smp_processor_id();
>  #endif /* #if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL) */
>  }

No need.  We put any required idle-task checks on the
rcu_tasks_kthread() slow path, not on this fast path.  Keep in mind
that the tick_nohz_full_cpu() check is not necessarily cheaper than just
doing the store.

> > +}
> > +
> > +/* Record no current task on dyntick-idle exit. */
> > +static void rcu_dynticks_task_exit(void)
> > +{
> > +#if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL)
> > +       ACCESS_ONCE(current->rcu_tasks_idle_cpu) = -1;
> > +#endif /* #if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL) */
> > +}
> > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> > index d997163c7e92..a4140f25cf1a 100644
> > --- a/kernel/rcu/update.c
> > +++ b/kernel/rcu/update.c
> > @@ -466,7 +466,9 @@ static void check_holdout_task(struct task_struct *t,
> >  {
> >         if (!ACCESS_ONCE(t->rcu_tasks_holdout) ||
> >             t->rcu_tasks_nvcsw != ACCESS_ONCE(t->nvcsw) ||
> > -           !ACCESS_ONCE(t->on_rq)) {
> > +           !ACCESS_ONCE(t->on_rq) ||
> > +           (IS_ENABLED(CONFIG_NO_HZ_FULL) &&
> > +            !is_idle_task(t) && t->rcu_tasks_idle_cpu >= 0)) {
> 
> rcu_tasks_idle_cpu will be -1 in CONFIG_NO_HZ_FULL is not enabled. Why
> are you checking both here?

If CONFIG_NO_HZ_FULL is not enabled, the remainder of the condition
is dead code.  If CONFIG_NO_HZ_FULL -is- enabled, and if idle tasks
got on the list somehow, this allows the stall warning code to
complain about this.

							Thanx, Paul

> >                 ACCESS_ONCE(t->rcu_tasks_holdout) = 0;
> >                 list_del_rcu(&t->rcu_tasks_holdout_list);
> >                 put_task_struct(t);
> > --
> > 1.8.1.5
> >
> 
> 
> 
> -- 
> Pranith
> 


^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2014-08-14 23:16 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-11 22:48 [PATCH tip/core/rcu 0/16] RCU-tasks implementation Paul E. McKenney
2014-08-11 22:48 ` [PATCH v5 tip/core/rcu 01/16] rcu: Add call_rcu_tasks() Paul E. McKenney
2014-08-11 22:48   ` [PATCH v5 tip/core/rcu 02/16] rcu: Provide cond_resched_rcu_qs() to force quiescent states in long loops Paul E. McKenney
2014-08-11 22:48   ` [PATCH v5 tip/core/rcu 03/16] rcu: Add synchronous grace-period waiting for RCU-tasks Paul E. McKenney
2014-08-11 22:48   ` [PATCH v5 tip/core/rcu 04/16] rcu: Make TASKS_RCU handle tasks that are almost done exiting Paul E. McKenney
2014-08-11 22:48   ` [PATCH v5 tip/core/rcu 05/16] rcu: Export RCU-tasks APIs to GPL modules Paul E. McKenney
2014-08-14 19:08     ` Pranith Kumar
2014-08-14 21:29       ` Paul E. McKenney
2014-08-11 22:48   ` [PATCH v5 tip/core/rcu 06/16] rcutorture: Add torture tests for RCU-tasks Paul E. McKenney
2014-08-14 21:34     ` Pranith Kumar
2014-08-14 21:44       ` Paul E. McKenney
2014-08-14 21:49         ` Paul E. McKenney
2014-08-11 22:48   ` [PATCH v5 tip/core/rcu 07/16] rcutorture: Add RCU-tasks test cases Paul E. McKenney
2014-08-11 22:48   ` [PATCH v5 tip/core/rcu 08/16] rcu: Add stall-warning checks for RCU-tasks Paul E. McKenney
2014-08-14 21:39     ` Pranith Kumar
2014-08-14 21:59       ` Paul E. McKenney
2014-08-11 22:48   ` [PATCH v5 tip/core/rcu 09/16] rcu: Improve RCU-tasks energy efficiency Paul E. McKenney
2014-08-14 21:42     ` Pranith Kumar
2014-08-14 21:55       ` Paul E. McKenney
2014-08-14 22:00         ` Pranith Kumar
2014-08-11 22:48   ` [PATCH v5 tip/core/rcu 10/16] documentation: Add verbiage on RCU-tasks stall warning messages Paul E. McKenney
2014-08-11 22:49   ` [PATCH v5 tip/core/rcu 11/16] rcu: Defer rcu_tasks_kthread() creation till first call_rcu_tasks() Paul E. McKenney
2014-08-14 22:28     ` Pranith Kumar
2014-08-14 22:53       ` Paul E. McKenney
2014-08-11 22:49   ` [PATCH v5 tip/core/rcu 12/16] rcu: Make TASKS_RCU handle nohz_full= CPUs Paul E. McKenney
2014-08-14 22:55     ` Pranith Kumar
2014-08-14 23:16       ` Paul E. McKenney
2014-08-11 22:49   ` [PATCH v5 tip/core/rcu 13/16] rcu: Make rcu_tasks_kthread()'s GP-wait loop allow preemption Paul E. McKenney
2014-08-11 22:49   ` [PATCH v5 tip/core/rcu 14/16] rcu: Remove redundant preempt_disable() from rcu_note_voluntary_context_switch() Paul E. McKenney
2014-08-13 10:56     ` Peter Zijlstra
2014-08-13 14:07       ` Paul E. McKenney
2014-08-13 14:33         ` Peter Zijlstra
2014-08-13 20:06           ` Paul E. McKenney
2014-08-11 22:49   ` [PATCH v5 tip/core/rcu 15/16] rcu: Make RCU-tasks wait for idle tasks Paul E. McKenney
2014-08-13  8:12     ` Peter Zijlstra
2014-08-13 12:48       ` Paul E. McKenney
2014-08-13 13:40         ` Peter Zijlstra
2014-08-13 13:51           ` Steven Rostedt
2014-08-13 14:07             ` Peter Zijlstra
2014-08-13 14:13               ` Steven Rostedt
2014-08-13 14:43                 ` Paul E. McKenney
2014-08-13 16:30                   ` Peter Zijlstra
2014-08-13 16:43                     ` Jacob Pan
2014-08-13 18:24                       ` Paul E. McKenney
2014-08-13 16:35                   ` Peter Zijlstra
2014-08-13 18:25                     ` Paul E. McKenney
2014-08-13 14:43                 ` Peter Zijlstra
2014-08-13 20:56             ` Paul E. McKenney
2014-08-13 14:12           ` Paul E. McKenney
2014-08-13 14:42             ` Peter Zijlstra
2014-08-13 17:24               ` Peter Zijlstra
2014-08-13 17:30                 ` Peter Zijlstra
2014-08-13 18:16                 ` Peter Zijlstra
2014-08-13 18:20               ` Paul E. McKenney
2014-08-13 18:55                 ` Peter Zijlstra
2014-08-13 19:54                   ` Paul E. McKenney
2014-08-11 22:49   ` [PATCH v5 tip/core/rcu 16/16] rcu: Additional information on RCU-tasks stall-warning messages Paul E. McKenney
2014-08-14 20:46   ` [PATCH v5 tip/core/rcu 01/16] rcu: Add call_rcu_tasks() Pranith Kumar
2014-08-14 21:22     ` Paul E. McKenney
2014-08-12 23:57 ` [PATCH tip/core/rcu 0/16] RCU-tasks implementation Paul E. McKenney

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.