linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC tip/core/rcu 0/28] Preview of RCU changes for 3.3
@ 2011-11-02 20:30 Paul E. McKenney
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 01/28] powerpc: Strengthen value-returning-atomics memory barriers Paul E. McKenney
                   ` (28 more replies)
  0 siblings, 29 replies; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-02 20:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches

Hello!

This patchset permits idle tasks to use RCU read-side critical sections,
although they are still prohibited between tick_nohz_idle_exit_norcu()
and tick_nohz_idle_exit_norcu(); makes synchronize_sched_expedited()
better able to share work among concurrent callers, allows ftrace_dump()
to be invoked from modules, dumps tracing upon detection of an rcutorture
failure, detects illegal use of RCU read-side critical sections from
extended quiescent states, legitimizes the pre-existin use of RCU in the
idle notifiers, fixes a memory-barrier botch, introduces an SRCU-like bulk
reference count, improve dynticks entry/exit tracing, further improves
RCU's ability to allow a given CPU to enter dyntick-idle mode quickly,
fixes idle-task checks, updates documentation, and additional fixes
from a still-ongoing top-to-bottom inspection of RCU.  The patches are
as follows:

1.	Strengthen memory barriers used in PowerPC value-returning
	atomics and locking primitives.  It is likely that this
	commit will be superseded by something from the powerpc
	maintainers.  The need for this strengthening was validated
	by tooling from Peter Sewell's group at the University of
	Cambridge.
2.	Rename ->signaled to ->fqs_state to clarify the code.
3.	Fix a race that could permit RCU-preempt expedited grace
	periods to complete too soon.
4.	Improve synchronize_sched_expedited()'s ability to share work
	among concurrent callers.
5.	Document the troubleshooting of lockdep lock-class leaks.
6.	Explicitly track idleness, which is a step towards permitting
	the idle tasks to contain RCU read-side critical sections
	(but only outside the body of the idle loop).
7,8.	Add an EXPORT_SYMBOL_GPL() for ftrace_dump() so that
	rcutorture can dump the trace buffer upon detection of
	an error, and then make rcutorture do the dumping.
9.	Document a failing scheduling-clock tick as yet another
	possible cause of RCU CPU stall warnings.
10.	Disable preemption in rcu_is_cpu_idle() in order to prevent
	spurious lockdep-RCU splats.
11.	Remove a useless self-awaken when setting up expedited grace
	periods, courtesy of Thomas Gleixner and the -rt effort.
12-17.	Make lockdep-RCU warn when RCU read-side primitives are
	invoked from an idle RCU extended quiescent state, mostly
	courtesy of Frederic Weisbecker.
18-23.	Separate out the scheduler-clock tick's idea of dyntick
	idle from RCU's notion of an idle extended quiescent state, mostly
	courtesy of Frederic Weisbecker.  These commits are needed for
	Frederic's work to suppress the scheduler-clock tick when there
	is but one runnable task on a given CPU.
24.	Introduce a bulk reference count, which is related to SRCU,
	but which allows a reference to be acquired in an irq handler
	and released by the task that was interrupted.
25-26.	Improve dyntick-idle tracing and diagnostics.
27.	Allow CPUs with pending RCU callbacks to enter dyntick-idle
	mode.  Beware this commit, as it compiled and passed rcutorture
	on the first try, which historically has indicated the presence
	of subtle and highly destructive bugs.
28.	Fix RCU's determination of whether or not it is running in the
	context of an idle task.

For a testing-only version of this patchset from git, please see the
following subject-to-rebase branch:

git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git rcu/dev

							Thanx, Paul

------------------------------------------------------------------------

 arch/arm/kernel/process.c                  |    4 
 arch/avr32/kernel/process.c                |    4 
 arch/blackfin/kernel/process.c             |    4 
 arch/microblaze/kernel/process.c           |    4 
 arch/mips/kernel/process.c                 |    4 
 arch/openrisc/kernel/idle.c                |    4 
 arch/powerpc/kernel/idle.c                 |   20 +
 arch/powerpc/platforms/iseries/setup.c     |    8 
 arch/s390/kernel/process.c                 |    4 
 arch/sh/kernel/idle.c                      |    4 
 arch/sparc/kernel/process_64.c             |    4 
 arch/tile/kernel/process.c                 |    4 
 arch/um/kernel/process.c                   |    4 
 arch/unicore32/kernel/process.c            |    4 
 arch/x86/kernel/process_32.c               |    4 
 arch/x86/kernel/process_64.c               |   14 -
 b/Documentation/RCU/stallwarn.txt          |    5 
 b/Documentation/RCU/trace.txt              |    4 
 b/Documentation/lockdep-design.txt         |   61 +++++
 b/arch/arm/kernel/process.c                |    4 
 b/arch/avr32/kernel/process.c              |    4 
 b/arch/blackfin/kernel/process.c           |    4 
 b/arch/microblaze/kernel/process.c         |    4 
 b/arch/mips/kernel/process.c               |    4 
 b/arch/openrisc/kernel/idle.c              |    4 
 b/arch/powerpc/include/asm/synch.h         |    6 
 b/arch/powerpc/kernel/idle.c               |    4 
 b/arch/powerpc/platforms/iseries/setup.c   |    8 
 b/arch/powerpc/platforms/pseries/lpar.c    |    4 
 b/arch/s390/kernel/process.c               |    4 
 b/arch/sh/kernel/idle.c                    |    4 
 b/arch/sparc/kernel/process_64.c           |    4 
 b/arch/tile/kernel/process.c               |    4 
 b/arch/um/kernel/process.c                 |    4 
 b/arch/unicore32/kernel/process.c          |    4 
 b/arch/x86/kernel/apic/apic.c              |    6 
 b/arch/x86/kernel/apic/io_apic.c           |    2 
 b/arch/x86/kernel/cpu/mcheck/therm_throt.c |    2 
 b/arch/x86/kernel/cpu/mcheck/threshold.c   |    2 
 b/arch/x86/kernel/irq.c                    |    6 
 b/arch/x86/kernel/process_32.c             |    4 
 b/arch/x86/kernel/process_64.c             |    4 
 b/include/linux/hardirq.h                  |   21 --
 b/include/linux/rcupdate.h                 |   21 --
 b/include/linux/srcu.h                     |   36 ++-
 b/include/linux/tick.h                     |   11 -
 b/include/trace/events/rcu.h               |   10 
 b/kernel/lockdep.c                         |   22 ++
 b/kernel/rcu.h                             |    7 
 b/kernel/rcupdate.c                        |   10 
 b/kernel/rcutiny.c                         |  124 ++++++++++--
 b/kernel/rcutorture.c                      |   18 +
 b/kernel/rcutree.c                         |   16 -
 b/kernel/rcutree.h                         |    4 
 b/kernel/rcutree_plugin.h                  |    7 
 b/kernel/rcutree_trace.c                   |    2 
 b/kernel/softirq.c                         |    2 
 b/kernel/srcu.c                            |    3 
 b/kernel/time/tick-sched.c                 |    6 
 b/kernel/trace/trace.c                     |    1 
 include/linux/rcupdate.h                   |  139 ++++++++-----
 include/linux/srcu.h                       |   55 +++++
 include/linux/tick.h                       |   59 ++++-
 include/trace/events/rcu.h                 |   41 +++
 kernel/rcupdate.c                          |    2 
 kernel/rcutiny.c                           |   45 ++--
 kernel/rcutree.c                           |  298 ++++++++++++++++++++---------
 kernel/rcutree.h                           |   22 --
 kernel/rcutree_plugin.h                    |  175 +++++++++++++----
 kernel/rcutree_trace.c                     |   10 
 kernel/softirq.c                           |    2 
 kernel/time/tick-sched.c                   |  118 ++++++-----
 72 files changed, 1080 insertions(+), 467 deletions(-)


^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH RFC tip/core/rcu 01/28] powerpc: Strengthen value-returning-atomics memory barriers
  2011-11-02 20:30 [PATCH RFC tip/core/rcu 0/28] Preview of RCU changes for 3.3 Paul E. McKenney
@ 2011-11-02 20:30 ` Paul E. McKenney
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 02/28] rcu: ->signaled better named ->fqs_state Paul E. McKenney
                   ` (27 subsequent siblings)
  28 siblings, 0 replies; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-02 20:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, Paul E. McKenney, anton, benh, paulus

The trailing isync/lwsync in PowerPC value-returning atomics needs
to be a sync in order to provide the required ordering properties.
The leading lwsync/eieio can remain, as the remainder of the required
ordering guarantees are provided by the atomic instructions: Any
reordering will cause the stwcx to fail, which will result in a retry.

This commit provides the needed adjustment.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: anton@samba.org
Cc: benh@kernel.crashing.org
Cc: paulus@samba.org
---
 arch/powerpc/include/asm/synch.h |    6 +-----
 1 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/synch.h b/arch/powerpc/include/asm/synch.h
index d7cab44..4d97fbe 100644
--- a/arch/powerpc/include/asm/synch.h
+++ b/arch/powerpc/include/asm/synch.h
@@ -37,11 +37,7 @@ static inline void isync(void)
 #endif
 
 #ifdef CONFIG_SMP
-#define __PPC_ACQUIRE_BARRIER				\
-	START_LWSYNC_SECTION(97);			\
-	isync;						\
-	MAKE_LWSYNC_SECTION_ENTRY(97, __lwsync_fixup);
-#define PPC_ACQUIRE_BARRIER	"\n" stringify_in_c(__PPC_ACQUIRE_BARRIER)
+#define PPC_ACQUIRE_BARRIER	"\n" stringify_in_c(sync;)
 #define PPC_RELEASE_BARRIER	stringify_in_c(LWSYNC) "\n"
 #else
 #define PPC_ACQUIRE_BARRIER
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH RFC tip/core/rcu 02/28] rcu: ->signaled better named ->fqs_state
  2011-11-02 20:30 [PATCH RFC tip/core/rcu 0/28] Preview of RCU changes for 3.3 Paul E. McKenney
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 01/28] powerpc: Strengthen value-returning-atomics memory barriers Paul E. McKenney
@ 2011-11-02 20:30 ` Paul E. McKenney
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 03/28] rcu: Avoid RCU-preempt expedited grace-period botch Paul E. McKenney
                   ` (26 subsequent siblings)
  28 siblings, 0 replies; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-02 20:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, Paul E. McKenney

The ->signaled field was named before complications in the form of
dyntick-idle mode and offlined CPUs.  These complications have required
that force_quiescent_state() be implemented as a state machine, instead
of simply unconditionally sending reschedule IPIs.  Therefore, this
commit renames ->signaled to ->fqs_state to catch up with the new
force_quiescent_state() reality.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcutree.c       |   16 ++++++++--------
 kernel/rcutree.h       |    4 ++--
 kernel/rcutree_trace.c |    2 +-
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index e234eb9..cb7c46e 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -69,7 +69,7 @@ static struct lock_class_key rcu_node_class[NUM_RCU_LVLS];
 		NUM_RCU_LVL_3, \
 		NUM_RCU_LVL_4, /* == MAX_RCU_LVLS */ \
 	}, \
-	.signaled = RCU_GP_IDLE, \
+	.fqs_state = RCU_GP_IDLE, \
 	.gpnum = -300, \
 	.completed = -300, \
 	.onofflock = __RAW_SPIN_LOCK_UNLOCKED(&structname##_state.onofflock), \
@@ -866,8 +866,8 @@ rcu_start_gp(struct rcu_state *rsp, unsigned long flags)
 	/* Advance to a new grace period and initialize state. */
 	rsp->gpnum++;
 	trace_rcu_grace_period(rsp->name, rsp->gpnum, "start");
-	WARN_ON_ONCE(rsp->signaled == RCU_GP_INIT);
-	rsp->signaled = RCU_GP_INIT; /* Hold off force_quiescent_state. */
+	WARN_ON_ONCE(rsp->fqs_state == RCU_GP_INIT);
+	rsp->fqs_state = RCU_GP_INIT; /* Hold off force_quiescent_state. */
 	rsp->jiffies_force_qs = jiffies + RCU_JIFFIES_TILL_FORCE_QS;
 	record_gp_stall_check_time(rsp);
 
@@ -877,7 +877,7 @@ rcu_start_gp(struct rcu_state *rsp, unsigned long flags)
 		rnp->qsmask = rnp->qsmaskinit;
 		rnp->gpnum = rsp->gpnum;
 		rnp->completed = rsp->completed;
-		rsp->signaled = RCU_SIGNAL_INIT; /* force_quiescent_state OK. */
+		rsp->fqs_state = RCU_SIGNAL_INIT; /* force_quiescent_state OK */
 		rcu_start_gp_per_cpu(rsp, rnp, rdp);
 		rcu_preempt_boost_start_gp(rnp);
 		trace_rcu_grace_period_init(rsp->name, rnp->gpnum,
@@ -927,7 +927,7 @@ rcu_start_gp(struct rcu_state *rsp, unsigned long flags)
 
 	rnp = rcu_get_root(rsp);
 	raw_spin_lock(&rnp->lock);		/* irqs already disabled. */
-	rsp->signaled = RCU_SIGNAL_INIT; /* force_quiescent_state now OK. */
+	rsp->fqs_state = RCU_SIGNAL_INIT; /* force_quiescent_state now OK. */
 	raw_spin_unlock(&rnp->lock);		/* irqs remain disabled. */
 	raw_spin_unlock_irqrestore(&rsp->onofflock, flags);
 }
@@ -991,7 +991,7 @@ static void rcu_report_qs_rsp(struct rcu_state *rsp, unsigned long flags)
 
 	rsp->completed = rsp->gpnum;  /* Declare the grace period complete. */
 	trace_rcu_grace_period(rsp->name, rsp->completed, "end");
-	rsp->signaled = RCU_GP_IDLE;
+	rsp->fqs_state = RCU_GP_IDLE;
 	rcu_start_gp(rsp, flags);  /* releases root node's rnp->lock. */
 }
 
@@ -1457,7 +1457,7 @@ static void force_quiescent_state(struct rcu_state *rsp, int relaxed)
 		goto unlock_fqs_ret;  /* no GP in progress, time updated. */
 	}
 	rsp->fqs_active = 1;
-	switch (rsp->signaled) {
+	switch (rsp->fqs_state) {
 	case RCU_GP_IDLE:
 	case RCU_GP_INIT:
 
@@ -1473,7 +1473,7 @@ static void force_quiescent_state(struct rcu_state *rsp, int relaxed)
 		force_qs_rnp(rsp, dyntick_save_progress_counter);
 		raw_spin_lock(&rnp->lock);  /* irqs already disabled */
 		if (rcu_gp_in_progress(rsp))
-			rsp->signaled = RCU_FORCE_QS;
+			rsp->fqs_state = RCU_FORCE_QS;
 		break;
 
 	case RCU_FORCE_QS:
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index 849ce9e..517f2f8 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -302,7 +302,7 @@ struct rcu_data {
 	struct rcu_state *rsp;
 };
 
-/* Values for signaled field in struct rcu_state. */
+/* Values for fqs_state field in struct rcu_state. */
 #define RCU_GP_IDLE		0	/* No grace period in progress. */
 #define RCU_GP_INIT		1	/* Grace period being initialized. */
 #define RCU_SAVE_DYNTICK	2	/* Need to scan dyntick state. */
@@ -361,7 +361,7 @@ struct rcu_state {
 
 	/* The following fields are guarded by the root rcu_node's lock. */
 
-	u8	signaled ____cacheline_internodealigned_in_smp;
+	u8	fqs_state ____cacheline_internodealigned_in_smp;
 						/* Force QS state. */
 	u8	fqs_active;			/* force_quiescent_state() */
 						/*  is running. */
diff --git a/kernel/rcutree_trace.c b/kernel/rcutree_trace.c
index 9feffa4..59c7bee 100644
--- a/kernel/rcutree_trace.c
+++ b/kernel/rcutree_trace.c
@@ -278,7 +278,7 @@ static void print_one_rcu_state(struct seq_file *m, struct rcu_state *rsp)
 	gpnum = rsp->gpnum;
 	seq_printf(m, "c=%lu g=%lu s=%d jfq=%ld j=%x "
 		      "nfqs=%lu/nfqsng=%lu(%lu) fqlh=%lu\n",
-		   rsp->completed, gpnum, rsp->signaled,
+		   rsp->completed, gpnum, rsp->fqs_state,
 		   (long)(rsp->jiffies_force_qs - jiffies),
 		   (int)(jiffies & 0xffff),
 		   rsp->n_force_qs, rsp->n_force_qs_ngp,
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH RFC tip/core/rcu 03/28] rcu: Avoid RCU-preempt expedited grace-period botch
  2011-11-02 20:30 [PATCH RFC tip/core/rcu 0/28] Preview of RCU changes for 3.3 Paul E. McKenney
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 01/28] powerpc: Strengthen value-returning-atomics memory barriers Paul E. McKenney
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 02/28] rcu: ->signaled better named ->fqs_state Paul E. McKenney
@ 2011-11-02 20:30 ` Paul E. McKenney
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 04/28] rcu: Make synchronize_sched_expedited() better at work sharing Paul E. McKenney
                   ` (25 subsequent siblings)
  28 siblings, 0 replies; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-02 20:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, Paul E. McKenney

Because rcu_read_unlock_special() samples rcu_preempted_readers_exp(rnp)
after dropping rnp->lock, the following sequence of events is possible:

1.	Task A exits its RCU read-side critical section, and removes
	itself from the ->blkd_tasks list, releases rnp->lock, and is
	then preempted.  Task B remains on the ->blkd_tasks list, and
	blocks the current expedited grace period.

2.	Task B exits from its RCU read-side critical section and removes
	itself from the ->blkd_tasks list.  Because it is the last task
	blocking the current expedited grace period, it ends that
	expedited grace period.

3.	Task A resumes, and samples rcu_preempted_readers_exp(rnp) which
	of course indicates that nothing is blocking the nonexistent
	expedited grace period. Task A is again preempted.

4.	Some other CPU starts an expedited grace period.  There are several
	tasks blocking this expedited grace period queued on the
	same rcu_node structure that Task A was using in step 1 above.

5.	Task A examines its state and incorrectly concludes that it was
	the last task blocking the expedited grace period on the current
	rcu_node structure.  It therefore reports completion up the
	rcu_node tree.

6.	The expedited grace period can then incorrectly complete before
	the tasks blocked on this same rcu_node structure exit their
	RCU read-side critical sections.  Arbitrarily bad things happen.

This commit therefore takes a snapshot of rcu_preempted_readers_exp(rnp)
prior to dropping the lock, so that only the last task thinks that it is
the last task, thus avoiding the failure scenario laid out above.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcutree_plugin.h |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 4b9b9f8..7986053 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -312,6 +312,7 @@ static noinline void rcu_read_unlock_special(struct task_struct *t)
 {
 	int empty;
 	int empty_exp;
+	int empty_exp_now;
 	unsigned long flags;
 	struct list_head *np;
 #ifdef CONFIG_RCU_BOOST
@@ -382,8 +383,10 @@ static noinline void rcu_read_unlock_special(struct task_struct *t)
 		/*
 		 * If this was the last task on the current list, and if
 		 * we aren't waiting on any CPUs, report the quiescent state.
-		 * Note that rcu_report_unblock_qs_rnp() releases rnp->lock.
+		 * Note that rcu_report_unblock_qs_rnp() releases rnp->lock,
+		 * so we must take a snapshot of the expedited state.
 		 */
+		empty_exp_now = !rcu_preempted_readers_exp(rnp);
 		if (!empty && !rcu_preempt_blocked_readers_cgp(rnp)) {
 			trace_rcu_quiescent_state_report("preempt_rcu",
 							 rnp->gpnum,
@@ -406,7 +409,7 @@ static noinline void rcu_read_unlock_special(struct task_struct *t)
 		 * If this was the last task on the expedited lists,
 		 * then we need to report up the rcu_node hierarchy.
 		 */
-		if (!empty_exp && !rcu_preempted_readers_exp(rnp))
+		if (!empty_exp && empty_exp_now)
 			rcu_report_exp_rnp(&rcu_preempt_state, rnp);
 	} else {
 		local_irq_restore(flags);
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH RFC tip/core/rcu 04/28] rcu: Make synchronize_sched_expedited() better at work sharing
  2011-11-02 20:30 [PATCH RFC tip/core/rcu 0/28] Preview of RCU changes for 3.3 Paul E. McKenney
                   ` (2 preceding siblings ...)
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 03/28] rcu: Avoid RCU-preempt expedited grace-period botch Paul E. McKenney
@ 2011-11-02 20:30 ` Paul E. McKenney
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 05/28] lockdep: Update documentation for lock-class leak detection Paul E. McKenney
                   ` (24 subsequent siblings)
  28 siblings, 0 replies; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-02 20:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, Paul E. McKenney

When synchronize_sched_expedited() takes its second and subsequent
snapshots of sync_sched_expedited_started, it subtracts 1.  This
means that the concurrent caller of synchronize_sched_expedited()
that incremented to that value sees our successful completion, it
will not be able to take advantage of it.  This restriction is
pointless, given that our full expedited grace period would have
happened after the other guy started, and thus should be able to
serve as a proxy for the other guy successfully executing
try_stop_cpus().

This commit therefore removes the subtraction of 1.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcutree_plugin.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 7986053..708dc57 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -1910,7 +1910,7 @@ void synchronize_sched_expedited(void)
 		 * grace period works for us.
 		 */
 		get_online_cpus();
-		snap = atomic_read(&sync_sched_expedited_started) - 1;
+		snap = atomic_read(&sync_sched_expedited_started);
 		smp_mb(); /* ensure read is before try_stop_cpus(). */
 	}
 
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH RFC tip/core/rcu 05/28] lockdep: Update documentation for lock-class leak detection
  2011-11-02 20:30 [PATCH RFC tip/core/rcu 0/28] Preview of RCU changes for 3.3 Paul E. McKenney
                   ` (3 preceding siblings ...)
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 04/28] rcu: Make synchronize_sched_expedited() better at work sharing Paul E. McKenney
@ 2011-11-02 20:30 ` Paul E. McKenney
  2011-11-03  2:57   ` Josh Triplett
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 06/28] rcu: Track idleness independent of idle tasks Paul E. McKenney
                   ` (23 subsequent siblings)
  28 siblings, 1 reply; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-02 20:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, Paul E. McKenney

There are a number of bugs that can leak or overuse lock classes,
which can cause the maximum number of lock classes (currently 8191)
to be exceeded.  However, the documentation does not tell you how to
track down these problems.  This commit addresses this shortcoming.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 Documentation/lockdep-design.txt |   61 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 61 insertions(+), 0 deletions(-)

diff --git a/Documentation/lockdep-design.txt b/Documentation/lockdep-design.txt
index abf768c..383bb23 100644
--- a/Documentation/lockdep-design.txt
+++ b/Documentation/lockdep-design.txt
@@ -221,3 +221,64 @@ when the chain is validated for the first time, is then put into a hash
 table, which hash-table can be checked in a lockfree manner. If the
 locking chain occurs again later on, the hash table tells us that we
 dont have to validate the chain again.
+
+Troubleshooting:
+----------------
+
+The validator tracks a maximum of MAX_LOCKDEP_KEYS number of lock classes.
+Exceeding this number will trigger the following lockdep warning:
+
+	(DEBUG_LOCKS_WARN_ON(id >= MAX_LOCKDEP_KEYS))
+
+By default, MAX_LOCKDEP_KEYS is currently set to 8191, and typical
+desktop systems have less than 1,000 lock classes, so this warning
+normally results from lock-class leakage or failure to properly
+initialize locks.  These two problems are illustrated below:
+
+1.	Repeated module loading and unloading while running the validator
+	will result in lock-class leakage.  The issue here is that each
+	load of the module will create a new set of lock classes for that
+	module's locks, but module unloading does not remove old classes.
+	Therefore, if that module is loaded and unloaded repeatedly,
+	the number of lock classes will eventually reach the maximum.
+
+2.	Using structures such as arrays that have large numbers of
+	locks that are not explicitly initialized.  For example,
+	a hash table with 8192 buckets where each bucket has its
+	own spinlock_t will consume 8192 lock classes -unless- each
+	spinlock is initialized, for example, using spin_lock_init().
+	Failure to properly initialize the per-bucket spinlocks would
+	guarantee lock-class overflow.	In contrast, a loop that called
+	spin_lock_init() on each lock would place all 8192 locks into a
+	single lock class.
+
+	The moral of this story is that you should always explicitly
+	initialize your locks.
+
+One might argue that the validator should be modified to allow lock
+classes to be reused.  However, if you are tempted to make this argument,
+first review the code and think through the changes that would be
+required, keeping in mind that the lock classes to be removed are likely
+to be linked into the lock-dependency graph.  This turns out to be a
+harder to do than to say.
+
+Of course, if you do run out of lock classes, the next thing to do is
+to find the offending lock classes.  First, the following command gives
+you the number of lock classes currently in use along with the maximum:
+
+	grep "lock-classes" /proc/lockdep_stats
+
+This command produces the following output on a modest Power system:
+
+	 lock-classes:                          748 [max: 8191]
+
+If the number allocated (748 above) increases continually over time,
+then there is likely a leak.  The following command can be used to
+identify the leaking lock classes:
+
+	grep "BD" /proc/lockdep
+
+Run the command and save the output, then compare against the output
+from a later run of this command to identify the leakers.  This same
+output can also help you find situations where lock initialization
+has been omitted.
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH RFC tip/core/rcu 06/28] rcu: Track idleness independent of idle tasks
  2011-11-02 20:30 [PATCH RFC tip/core/rcu 0/28] Preview of RCU changes for 3.3 Paul E. McKenney
                   ` (4 preceding siblings ...)
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 05/28] lockdep: Update documentation for lock-class leak detection Paul E. McKenney
@ 2011-11-02 20:30 ` Paul E. McKenney
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 07/28] trace: Allow ftrace_dump() to be called from modules Paul E. McKenney
                   ` (22 subsequent siblings)
  28 siblings, 0 replies; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-02 20:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, Paul E. McKenney, Paul E. McKenney

From: Paul E. McKenney <paul.mckenney@linaro.org>

Earlier versions of RCU used the scheduling-clock tick to detect idleness
by checking for the idle task, but handled idleness differently for
CONFIG_NO_HZ=y.  But there are now a number of uses of RCU read-side
critical sections in the idle task, for example, for tracing.  A more
fine-grained detection of idleness is therefore required.

This commit presses the old dyntick-idle code into full-time service,
so that rcu_idle_enter(), previously known as rcu_enter_nohz(), is
always invoked at the beginning of an idle loop iteration.  Similarly,
rcu_idle_exit(), previously known as rcu_exit_nohz(), is always invoked
at the end of an idle-loop iteration.  This allows the idle task to
use RCU everywhere except between consecutive rcu_idle_enter() and
rcu_idle_exit() calls, in turn allowing architecture maintainers to
specify exactly where in the idle loop that RCU may be used.

Because some of the userspace upcall uses can result in what looks
to RCU like half of an interrupt, it is not possible to expect that
the irq_enter() and irq_exit() hooks will give exact counts.  This
patch therefore expands the ->dynticks_nesting counter to 64 bits
and uses two separate bitfields to count process/idle transitions
and interrupt entry/exit transitions.  It is presumed that userspace
upcalls do not happen in the idle loop or from usermode execution
(though usermode might do a system call that results in an upcall).
The counter is hard-reset on each process/idle transition, which
avoids the interrupt entry/exit error from accumulating.  Overflow
is avoided by the 64-bitness of the ->dyntick_nesting counter.

This commit also adds warnings if a non-idle task asks RCU to enter
idle state (and these checks will need some adjustment before applying
Frederic's OS-jitter patches (http://lkml.org/lkml/2011/10/7/246).
In addition, validation of ->dynticks and ->dynticks_nesting is added.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 Documentation/RCU/trace.txt |    4 -
 include/linux/hardirq.h     |   21 ----
 include/linux/rcupdate.h    |   21 +---
 include/linux/tick.h        |   11 ++-
 include/trace/events/rcu.h  |   10 +-
 kernel/rcutiny.c            |  124 ++++++++++++++++++++---
 kernel/rcutree.c            |  229 ++++++++++++++++++++++++++++++-------------
 kernel/rcutree.h            |   15 +--
 kernel/rcutree_trace.c      |   10 +--
 kernel/time/tick-sched.c    |    6 +-
 10 files changed, 297 insertions(+), 154 deletions(-)

diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt
index aaf65f6..49587ab 100644
--- a/Documentation/RCU/trace.txt
+++ b/Documentation/RCU/trace.txt
@@ -105,14 +105,10 @@ o	"dt" is the current value of the dyntick counter that is incremented
 	or one greater than the interrupt-nesting depth otherwise.
 	The number after the second "/" is the NMI nesting depth.
 
-	This field is displayed only for CONFIG_NO_HZ kernels.
-
 o	"df" is the number of times that some other CPU has forced a
 	quiescent state on behalf of this CPU due to this CPU being in
 	dynticks-idle state.
 
-	This field is displayed only for CONFIG_NO_HZ kernels.
-
 o	"of" is the number of times that some other CPU has forced a
 	quiescent state on behalf of this CPU due to this CPU being
 	offline.  In a perfect world, this might never happen, but it
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index f743883..bb7f309 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -139,20 +139,7 @@ static inline void account_system_vtime(struct task_struct *tsk)
 extern void account_system_vtime(struct task_struct *tsk);
 #endif
 
-#if defined(CONFIG_NO_HZ)
 #if defined(CONFIG_TINY_RCU) || defined(CONFIG_TINY_PREEMPT_RCU)
-extern void rcu_enter_nohz(void);
-extern void rcu_exit_nohz(void);
-
-static inline void rcu_irq_enter(void)
-{
-	rcu_exit_nohz();
-}
-
-static inline void rcu_irq_exit(void)
-{
-	rcu_enter_nohz();
-}
 
 static inline void rcu_nmi_enter(void)
 {
@@ -163,17 +150,9 @@ static inline void rcu_nmi_exit(void)
 }
 
 #else
-extern void rcu_irq_enter(void);
-extern void rcu_irq_exit(void);
 extern void rcu_nmi_enter(void);
 extern void rcu_nmi_exit(void);
 #endif
-#else
-# define rcu_irq_enter() do { } while (0)
-# define rcu_irq_exit() do { } while (0)
-# define rcu_nmi_enter() do { } while (0)
-# define rcu_nmi_exit() do { } while (0)
-#endif /* #if defined(CONFIG_NO_HZ) */
 
 /*
  * It is safe to do non-atomic ops on ->hardirq_context,
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 2cf4226..cd1ad4b 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -177,23 +177,10 @@ extern void rcu_sched_qs(int cpu);
 extern void rcu_bh_qs(int cpu);
 extern void rcu_check_callbacks(int cpu, int user);
 struct notifier_block;
-
-#ifdef CONFIG_NO_HZ
-
-extern void rcu_enter_nohz(void);
-extern void rcu_exit_nohz(void);
-
-#else /* #ifdef CONFIG_NO_HZ */
-
-static inline void rcu_enter_nohz(void)
-{
-}
-
-static inline void rcu_exit_nohz(void)
-{
-}
-
-#endif /* #else #ifdef CONFIG_NO_HZ */
+extern void rcu_idle_enter(void);
+extern void rcu_idle_exit(void);
+extern void rcu_irq_enter(void);
+extern void rcu_irq_exit(void);
 
 /*
  * Infrastructure to implement the synchronize_() primitives in
diff --git a/include/linux/tick.h b/include/linux/tick.h
index b232ccc..ca40838 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -127,8 +127,15 @@ extern ktime_t tick_nohz_get_sleep_length(void);
 extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
 extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
 # else
-static inline void tick_nohz_stop_sched_tick(int inidle) { }
-static inline void tick_nohz_restart_sched_tick(void) { }
+static inline void tick_nohz_stop_sched_tick(int inidle)
+{
+	if (inidle)
+		rcu_idle_enter();
+}
+static inline void tick_nohz_restart_sched_tick(void)
+{
+	rcu_idle_exit();
+}
 static inline ktime_t tick_nohz_get_sleep_length(void)
 {
 	ktime_t len = { .tv64 = NSEC_PER_SEC/HZ };
diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index 669fbd6..e577180 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -246,19 +246,21 @@ TRACE_EVENT(rcu_fqs,
  */
 TRACE_EVENT(rcu_dyntick,
 
-	TP_PROTO(char *polarity),
+	TP_PROTO(char *polarity, int nesting),
 
-	TP_ARGS(polarity),
+	TP_ARGS(polarity, nesting),
 
 	TP_STRUCT__entry(
 		__field(char *, polarity)
+		__field(int, nesting)
 	),
 
 	TP_fast_assign(
 		__entry->polarity = polarity;
+		__entry->nesting = nesting;
 	),
 
-	TP_printk("%s", __entry->polarity)
+	TP_printk("%s %d", __entry->polarity, __entry->nesting)
 );
 
 /*
@@ -443,7 +445,7 @@ TRACE_EVENT(rcu_batch_end,
 #define trace_rcu_unlock_preempted_task(rcuname, gpnum, pid) do { } while (0)
 #define trace_rcu_quiescent_state_report(rcuname, gpnum, mask, qsmask, level, grplo, grphi, gp_tasks) do { } while (0)
 #define trace_rcu_fqs(rcuname, gpnum, cpu, qsevent) do { } while (0)
-#define trace_rcu_dyntick(polarity) do { } while (0)
+#define trace_rcu_dyntick(polarity, nesting) do { } while (0)
 #define trace_rcu_callback(rcuname, rhp, qlen) do { } while (0)
 #define trace_rcu_kfree_callback(rcuname, rhp, offset, qlen) do { } while (0)
 #define trace_rcu_batch_start(rcuname, qlen, blimit) do { } while (0)
diff --git a/kernel/rcutiny.c b/kernel/rcutiny.c
index da775c8..6b0ace4 100644
--- a/kernel/rcutiny.c
+++ b/kernel/rcutiny.c
@@ -54,31 +54,122 @@ static void __call_rcu(struct rcu_head *head,
 
 #include "rcutiny_plugin.h"
 
-#ifdef CONFIG_NO_HZ
+static long long rcu_dynticks_nesting = LLONG_MAX / 2;
 
-static long rcu_dynticks_nesting = 1;
+/* Common code for rcu_idle_enter() and rcu_irq_exit(), see kernel/rcutree.c. */
+static void rcu_idle_enter_common(void)
+{
+	if (rcu_dynticks_nesting) {
+		RCU_TRACE(trace_rcu_dyntick("--=", rcu_dynticks_nesting));
+		return;
+	}
+	RCU_TRACE(trace_rcu_dyntick("Start", rcu_dynticks_nesting));
+	if (!idle_cpu(smp_processor_id())) {
+		WARN_ON_ONCE(1);	/* must be idle task! */
+		RCU_TRACE(trace_rcu_dyntick("Error on entry: not idle task",
+					    rcu_dynticks_nesting));
+		ftrace_dump(DUMP_ALL);
+	}
+	rcu_sched_qs(0); /* implies rcu_bh_qsctr_inc(0) */
+}
 
 /*
- * Enter dynticks-idle mode, which is an extended quiescent state
- * if we have fully entered that mode (i.e., if the new value of
- * dynticks_nesting is zero).
+ * Enter idle, which is an extended quiescent state if we have fully
+ * entered that mode (i.e., if the new value of dynticks_nesting is zero).
  */
-void rcu_enter_nohz(void)
+void rcu_idle_enter(void)
 {
-	if (--rcu_dynticks_nesting == 0)
-		rcu_sched_qs(0); /* implies rcu_bh_qsctr_inc(0) */
+	unsigned long flags;
+
+	local_irq_save(flags);
+	rcu_dynticks_nesting = 0;
+	rcu_idle_enter_common();
+	local_irq_restore(flags);
 }
 
 /*
- * Exit dynticks-idle mode, so that we are no longer in an extended
- * quiescent state.
+ * Exit an interrupt handler towards idle.
+ */
+void rcu_irq_exit(void)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	rcu_dynticks_nesting--;
+	WARN_ON_ONCE(rcu_dynticks_nesting < 0);
+	rcu_idle_enter_common();
+	local_irq_restore(flags);
+}
+
+/* Common code for rcu_idle_exit() and rcu_irq_enter(), see kernel/rcutree.c. */
+static void rcu_idle_exit_common(long long oldval)
+{
+	if (oldval) {
+		RCU_TRACE(trace_rcu_dyntick("++=", rcu_dynticks_nesting));
+		return;
+	}
+	RCU_TRACE(trace_rcu_dyntick("End", oldval));
+	if (!idle_cpu(smp_processor_id())) {
+		WARN_ON_ONCE(1);	/* must be idle task! */
+		RCU_TRACE(trace_rcu_dyntick("Error on exit: not idle task",
+			  oldval));
+		ftrace_dump(DUMP_ALL);
+	}
+}
+
+/*
+ * Exit idle, so that we are no longer in an extended quiescent state.
  */
-void rcu_exit_nohz(void)
+void rcu_idle_exit(void)
 {
+	unsigned long flags;
+	long long oldval;
+
+	local_irq_save(flags);
+	oldval = rcu_dynticks_nesting;
+	WARN_ON_ONCE(oldval != 0);
+	rcu_dynticks_nesting = LLONG_MAX / 2;
+	rcu_idle_exit_common(oldval);
+	local_irq_restore(flags);
+}
+
+/*
+ * Enter an interrupt handler, moving away from idle.
+ */
+void rcu_irq_enter(void)
+{
+	unsigned long flags;
+	long long oldval;
+
+	local_irq_save(flags);
+	oldval = rcu_dynticks_nesting;
 	rcu_dynticks_nesting++;
+	WARN_ON_ONCE(rcu_dynticks_nesting == 0);
+	rcu_idle_exit_common(oldval);
+	local_irq_restore(flags);
+}
+
+#ifdef CONFIG_PROVE_RCU
+
+/*
+ * Test whether RCU thinks that the current CPU is idle.
+ */
+int rcu_is_cpu_idle(void)
+{
+	return !rcu_dynticks_nesting;
 }
 
-#endif /* #ifdef CONFIG_NO_HZ */
+#endif /* #ifdef CONFIG_PROVE_RCU */
+
+/*
+ * Test whether the current CPU was interrupted from idle.  Nested
+ * interrupts don't count, we must be running at the first interrupt
+ * level.
+ */
+int rcu_is_cpu_rrupt_from_idle(void)
+{
+	return rcu_dynticks_nesting <= 0;
+}
 
 /*
  * Helper function for rcu_sched_qs() and rcu_bh_qs().
@@ -127,14 +218,13 @@ void rcu_bh_qs(int cpu)
 
 /*
  * Check to see if the scheduling-clock interrupt came from an extended
- * quiescent state, and, if so, tell RCU about it.
+ * quiescent state, and, if so, tell RCU about it.  This function must
+ * be called from hardirq context.  It is normally called from the
+ * scheduling-clock interrupt.
  */
 void rcu_check_callbacks(int cpu, int user)
 {
-	if (user ||
-	    (idle_cpu(cpu) &&
-	     !in_softirq() &&
-	     hardirq_count() <= (1 << HARDIRQ_SHIFT)))
+	if (user || rcu_is_cpu_rrupt_from_idle())
 		rcu_sched_qs(cpu);
 	else if (!in_softirq())
 		rcu_bh_qs(cpu);
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index cb7c46e..abb5167 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -195,12 +195,10 @@ void rcu_note_context_switch(int cpu)
 }
 EXPORT_SYMBOL_GPL(rcu_note_context_switch);
 
-#ifdef CONFIG_NO_HZ
 DEFINE_PER_CPU(struct rcu_dynticks, rcu_dynticks) = {
-	.dynticks_nesting = 1,
+	.dynticks_nesting = LLONG_MAX / 2,
 	.dynticks = ATOMIC_INIT(1),
 };
-#endif /* #ifdef CONFIG_NO_HZ */
 
 static int blimit = 10;		/* Maximum callbacks per rcu_do_batch. */
 static int qhimark = 10000;	/* If this many pending, ignore blimit. */
@@ -328,11 +326,11 @@ static int rcu_implicit_offline_qs(struct rcu_data *rdp)
 		return 1;
 	}
 
-	/* If preemptible RCU, no point in sending reschedule IPI. */
-	if (rdp->preemptible)
-		return 0;
-
-	/* The CPU is online, so send it a reschedule IPI. */
+	/*
+	 * The CPU is online, so send it a reschedule IPI.  This forces
+	 * it through the scheduler, and (inefficiently) also handles cases
+	 * where idle loops fail to inform RCU about the CPU being idle.
+	 */
 	if (rdp->cpu != smp_processor_id())
 		smp_send_reschedule(rdp->cpu);
 	else
@@ -343,51 +341,97 @@ static int rcu_implicit_offline_qs(struct rcu_data *rdp)
 
 #endif /* #ifdef CONFIG_SMP */
 
-#ifdef CONFIG_NO_HZ
+/*
+ * rcu_idle_enter_common - inform RCU that current CPU is moving towards idle
+ *
+ * If the new value of the ->dynticks_nesting counter now is zero,
+ * we really have entered idle, and must do the appropriate accounting.
+ * The caller must have disabled interrupts.
+ */
+static void rcu_idle_enter_common(struct rcu_dynticks *rdtp)
+{
+	if (rdtp->dynticks_nesting) {
+		trace_rcu_dyntick("--=", rdtp->dynticks_nesting);
+		return;
+	}
+	trace_rcu_dyntick("Start", rdtp->dynticks_nesting);
+	if (!idle_cpu(smp_processor_id())) {
+		WARN_ON_ONCE(1);	/* must be idle task! */
+		trace_rcu_dyntick("Error on entry: not idle task",
+				   rdtp->dynticks_nesting);
+		ftrace_dump(DUMP_ALL);
+	}
+	/* CPUs seeing atomic_inc() must see prior RCU read-side crit sects */
+	smp_mb__before_atomic_inc();  /* See above. */
+	atomic_inc(&rdtp->dynticks);
+	smp_mb__after_atomic_inc();  /* Force ordering with next sojourn. */
+	WARN_ON_ONCE(atomic_read(&rdtp->dynticks) & 0x1);
+}
 
 /**
- * rcu_enter_nohz - inform RCU that current CPU is entering nohz
+ * rcu_idle_enter - inform RCU that current CPU is entering idle
  *
- * Enter nohz mode, in other words, -leave- the mode in which RCU
+ * Enter idle mode, in other words, -leave- the mode in which RCU
  * read-side critical sections can occur.  (Though RCU read-side
- * critical sections can occur in irq handlers in nohz mode, a possibility
- * handled by rcu_irq_enter() and rcu_irq_exit()).
+ * critical sections can occur in irq handlers in idle, a possibility
+ * handled by irq_enter() and irq_exit().)
+ *
+ * We crowbar the ->dynticks_nesting field to zero to allow for
+ * the possibility of usermode upcalls having messed up our count
+ * of interrupt nesting level during the prior busy period.
  */
-void rcu_enter_nohz(void)
+void rcu_idle_enter(void)
 {
 	unsigned long flags;
 	struct rcu_dynticks *rdtp;
 
 	local_irq_save(flags);
 	rdtp = &__get_cpu_var(rcu_dynticks);
-	if (--rdtp->dynticks_nesting) {
-		local_irq_restore(flags);
-		return;
-	}
-	trace_rcu_dyntick("Start");
-	/* CPUs seeing atomic_inc() must see prior RCU read-side crit sects */
-	smp_mb__before_atomic_inc();  /* See above. */
-	atomic_inc(&rdtp->dynticks);
-	smp_mb__after_atomic_inc();  /* Force ordering with next sojourn. */
-	WARN_ON_ONCE(atomic_read(&rdtp->dynticks) & 0x1);
+	rdtp->dynticks_nesting = 0;
+	rcu_idle_enter_common(rdtp);
 	local_irq_restore(flags);
 }
 
-/*
- * rcu_exit_nohz - inform RCU that current CPU is leaving nohz
+/**
+ * rcu_irq_exit - inform RCU that current CPU is exiting irq towards idle
+ *
+ * Exit from an interrupt handler, which might possibly result in entering
+ * idle mode, in other words, leaving the mode in which read-side critical
+ * sections can occur.
  *
- * Exit nohz mode, in other words, -enter- the mode in which RCU
- * read-side critical sections normally occur.
+ * This code assumes that the idle loop never does anything that might
+ * result in unbalanced calls to irq_enter() and irq_exit().  If your
+ * architecture violates this assumption, RCU will give you what you
+ * deserve, good and hard.  But very infrequently and irreproducibly.
+ *
+ * Use things like work queues to work around this limitation.
+ *
+ * You have been warned.
  */
-void rcu_exit_nohz(void)
+void rcu_irq_exit(void)
 {
 	unsigned long flags;
 	struct rcu_dynticks *rdtp;
 
 	local_irq_save(flags);
 	rdtp = &__get_cpu_var(rcu_dynticks);
-	if (rdtp->dynticks_nesting++) {
-		local_irq_restore(flags);
+	rdtp->dynticks_nesting--;
+	WARN_ON_ONCE(rdtp->dynticks_nesting < 0);
+	rcu_idle_enter_common(rdtp);
+	local_irq_restore(flags);
+}
+
+/*
+ * rcu_idle_exit_common - inform RCU that current CPU is moving away from idle
+ *
+ * If the new value of the ->dynticks_nesting counter was previously zero,
+ * we really have exited idle, and must do the appropriate accounting.
+ * The caller must have disabled interrupts.
+ */
+static void rcu_idle_exit_common(struct rcu_dynticks *rdtp, long long oldval)
+{
+	if (oldval) {
+		trace_rcu_dyntick("++=", rdtp->dynticks_nesting);
 		return;
 	}
 	smp_mb__before_atomic_inc();  /* Force ordering w/previous sojourn. */
@@ -395,7 +439,71 @@ void rcu_exit_nohz(void)
 	/* CPUs seeing atomic_inc() must see later RCU read-side crit sects */
 	smp_mb__after_atomic_inc();  /* See above. */
 	WARN_ON_ONCE(!(atomic_read(&rdtp->dynticks) & 0x1));
-	trace_rcu_dyntick("End");
+	trace_rcu_dyntick("End", oldval);
+	if (!idle_cpu(smp_processor_id())) {
+		WARN_ON_ONCE(1);	/* must be idle task! */
+		trace_rcu_dyntick("Error on exit: not idle task", oldval);
+		ftrace_dump(DUMP_ALL);
+	}
+}
+
+/**
+ * rcu_idle_exit - inform RCU that current CPU is leaving idle
+ *
+ * Exit idle mode, in other words, -enter- the mode in which RCU
+ * read-side critical sections can occur.
+ *
+ * We crowbar the ->dynticks_nesting field to LLONG_MAX/2 to allow for
+ * the possibility of usermode upcalls messing up our count
+ * of interrupt nesting level during the busy period that is just
+ * now starting.
+ */
+void rcu_idle_exit(void)
+{
+	unsigned long flags;
+	struct rcu_dynticks *rdtp;
+	long long oldval;
+
+	local_irq_save(flags);
+	rdtp = &__get_cpu_var(rcu_dynticks);
+	oldval = rdtp->dynticks_nesting;
+	WARN_ON_ONCE(oldval != 0);
+	rdtp->dynticks_nesting = LLONG_MAX / 2;
+	rcu_idle_exit_common(rdtp, oldval);
+	local_irq_restore(flags);
+}
+
+/**
+ * rcu_irq_enter - inform RCU that current CPU is entering irq away from idle
+ *
+ * Enter an interrupt handler, which might possibly result in exiting
+ * idle mode, in other words, entering the mode in which read-side critical
+ * sections can occur.
+ *
+ * Note that the Linux kernel is fully capable of entering an interrupt
+ * handler that it never exits, for example when doing upcalls to
+ * user mode!  This code assumes that the idle loop never does upcalls to
+ * user mode.  If your architecture does do upcalls from the idle loop (or
+ * does anything else that results in unbalanced calls to the irq_enter()
+ * and irq_exit() functions), RCU will give you what you deserve, good
+ * and hard.  But very infrequently and irreproducibly.
+ *
+ * Use things like work queues to work around this limitation.
+ *
+ * You have been warned.
+ */
+void rcu_irq_enter(void)
+{
+	unsigned long flags;
+	struct rcu_dynticks *rdtp;
+	long long oldval;
+
+	local_irq_save(flags);
+	rdtp = &__get_cpu_var(rcu_dynticks);
+	oldval = rdtp->dynticks_nesting;
+	rdtp->dynticks_nesting++;
+	WARN_ON_ONCE(rdtp->dynticks_nesting == 0);
+	rcu_idle_exit_common(rdtp, oldval);
 	local_irq_restore(flags);
 }
 
@@ -442,27 +550,32 @@ void rcu_nmi_exit(void)
 	WARN_ON_ONCE(atomic_read(&rdtp->dynticks) & 0x1);
 }
 
+#ifdef CONFIG_PROVE_RCU
+
 /**
- * rcu_irq_enter - inform RCU of entry to hard irq context
+ * rcu_is_cpu_idle - see if RCU thinks that the current CPU is idle
  *
- * If the CPU was idle with dynamic ticks active, this updates the
- * rdtp->dynticks to let the RCU handling know that the CPU is active.
+ * If the current CPU is in its idle loop and is neither in an interrupt
+ * or NMI handler, return true.  The caller must have at least disabled
+ * preemption.
  */
-void rcu_irq_enter(void)
+int rcu_is_cpu_idle(void)
 {
-	rcu_exit_nohz();
+	return (atomic_read(&__get_cpu_var(rcu_dynticks).dynticks) & 0x1) == 0;
 }
 
+#endif /* #ifdef CONFIG_PROVE_RCU */
+
 /**
- * rcu_irq_exit - inform RCU of exit from hard irq context
+ * rcu_is_cpu_rrupt_from_idle - see if idle or immediately interrupted from idle
  *
- * If the CPU was idle with dynamic ticks active, update the rdp->dynticks
- * to put let the RCU handling be aware that the CPU is going back to idle
- * with no ticks.
+ * If the current CPU is idle or running at a first-level (not nested)
+ * interrupt from idle, return true.  The caller must have at least
+ * disabled preemption.
  */
-void rcu_irq_exit(void)
+int rcu_is_cpu_rrupt_from_idle(void)
 {
-	rcu_enter_nohz();
+	return __get_cpu_var(rcu_dynticks).dynticks_nesting <= 1;
 }
 
 #ifdef CONFIG_SMP
@@ -512,24 +625,6 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
 
 #endif /* #ifdef CONFIG_SMP */
 
-#else /* #ifdef CONFIG_NO_HZ */
-
-#ifdef CONFIG_SMP
-
-static int dyntick_save_progress_counter(struct rcu_data *rdp)
-{
-	return 0;
-}
-
-static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
-{
-	return rcu_implicit_offline_qs(rdp);
-}
-
-#endif /* #ifdef CONFIG_SMP */
-
-#endif /* #else #ifdef CONFIG_NO_HZ */
-
 int rcu_cpu_stall_suppress __read_mostly;
 
 static void record_gp_stall_check_time(struct rcu_state *rsp)
@@ -1334,16 +1429,14 @@ static void rcu_do_batch(struct rcu_state *rsp, struct rcu_data *rdp)
  * (user mode or idle loop for rcu, non-softirq execution for rcu_bh).
  * Also schedule RCU core processing.
  *
- * This function must be called with hardirqs disabled.  It is normally
+ * This function must be called from hardirq context.  It is normally
  * invoked from the scheduling-clock interrupt.  If rcu_pending returns
  * false, there is no point in invoking rcu_check_callbacks().
  */
 void rcu_check_callbacks(int cpu, int user)
 {
 	trace_rcu_utilization("Start scheduler-tick");
-	if (user ||
-	    (idle_cpu(cpu) && rcu_scheduler_active &&
-	     !in_softirq() && hardirq_count() <= (1 << HARDIRQ_SHIFT))) {
+	if (user || rcu_is_cpu_rrupt_from_idle()) {
 
 		/*
 		 * Get here if this CPU took its interrupt from user
@@ -1913,9 +2006,9 @@ rcu_boot_init_percpu_data(int cpu, struct rcu_state *rsp)
 	for (i = 0; i < RCU_NEXT_SIZE; i++)
 		rdp->nxttail[i] = &rdp->nxtlist;
 	rdp->qlen = 0;
-#ifdef CONFIG_NO_HZ
 	rdp->dynticks = &per_cpu(rcu_dynticks, cpu);
-#endif /* #ifdef CONFIG_NO_HZ */
+	WARN_ON_ONCE(rdp->dynticks->dynticks_nesting != LLONG_MAX / 2);
+	WARN_ON_ONCE(atomic_read(&rdp->dynticks->dynticks) != 1);
 	rdp->cpu = cpu;
 	rdp->rsp = rsp;
 	raw_spin_unlock_irqrestore(&rnp->lock, flags);
@@ -1942,6 +2035,8 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp, int preemptible)
 	rdp->qlen_last_fqs_check = 0;
 	rdp->n_force_qs_snap = rsp->n_force_qs;
 	rdp->blimit = blimit;
+	WARN_ON_ONCE(rdp->dynticks->dynticks_nesting != LLONG_MAX / 2);
+	WARN_ON_ONCE((atomic_read(&rdp->dynticks->dynticks) & 0x1) != 1);
 	raw_spin_unlock(&rnp->lock);		/* irqs remain disabled. */
 
 	/*
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index 517f2f8..0963fa1 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -84,9 +84,10 @@
  * Dynticks per-CPU state.
  */
 struct rcu_dynticks {
-	int dynticks_nesting;	/* Track irq/process nesting level. */
-	int dynticks_nmi_nesting; /* Track NMI nesting level. */
-	atomic_t dynticks;	/* Even value for dynticks-idle, else odd. */
+	long long dynticks_nesting; /* Track irq/process nesting level. */
+				    /* Process level is worth LLONG_MAX/2. */
+	int dynticks_nmi_nesting;   /* Track NMI nesting level. */
+	atomic_t dynticks;	    /* Even value for idle, else odd. */
 };
 
 /* RCU's kthread states for tracing. */
@@ -274,16 +275,12 @@ struct rcu_data {
 					/* did other CPU force QS recently? */
 	long		blimit;		/* Upper limit on a processed batch */
 
-#ifdef CONFIG_NO_HZ
 	/* 3) dynticks interface. */
 	struct rcu_dynticks *dynticks;	/* Shared per-CPU dynticks state. */
 	int dynticks_snap;		/* Per-GP tracking for dynticks. */
-#endif /* #ifdef CONFIG_NO_HZ */
 
 	/* 4) reasons this CPU needed to be kicked by force_quiescent_state */
-#ifdef CONFIG_NO_HZ
 	unsigned long dynticks_fqs;	/* Kicked due to dynticks idle. */
-#endif /* #ifdef CONFIG_NO_HZ */
 	unsigned long offline_fqs;	/* Kicked due to being offline. */
 	unsigned long resched_ipi;	/* Sent a resched IPI. */
 
@@ -307,11 +304,7 @@ struct rcu_data {
 #define RCU_GP_INIT		1	/* Grace period being initialized. */
 #define RCU_SAVE_DYNTICK	2	/* Need to scan dyntick state. */
 #define RCU_FORCE_QS		3	/* Need to force quiescent state. */
-#ifdef CONFIG_NO_HZ
 #define RCU_SIGNAL_INIT		RCU_SAVE_DYNTICK
-#else /* #ifdef CONFIG_NO_HZ */
-#define RCU_SIGNAL_INIT		RCU_FORCE_QS
-#endif /* #else #ifdef CONFIG_NO_HZ */
 
 #define RCU_JIFFIES_TILL_FORCE_QS	 3	/* for rsp->jiffies_force_qs */
 
diff --git a/kernel/rcutree_trace.c b/kernel/rcutree_trace.c
index 59c7bee..654cfe6 100644
--- a/kernel/rcutree_trace.c
+++ b/kernel/rcutree_trace.c
@@ -67,13 +67,11 @@ static void print_one_rcu_data(struct seq_file *m, struct rcu_data *rdp)
 		   rdp->completed, rdp->gpnum,
 		   rdp->passed_quiesce, rdp->passed_quiesce_gpnum,
 		   rdp->qs_pending);
-#ifdef CONFIG_NO_HZ
-	seq_printf(m, " dt=%d/%d/%d df=%lu",
+	seq_printf(m, " dt=%d/%llx/%d df=%lu",
 		   atomic_read(&rdp->dynticks->dynticks),
 		   rdp->dynticks->dynticks_nesting,
 		   rdp->dynticks->dynticks_nmi_nesting,
 		   rdp->dynticks_fqs);
-#endif /* #ifdef CONFIG_NO_HZ */
 	seq_printf(m, " of=%lu ri=%lu", rdp->offline_fqs, rdp->resched_ipi);
 	seq_printf(m, " ql=%ld qs=%c%c%c%c",
 		   rdp->qlen,
@@ -141,13 +139,11 @@ static void print_one_rcu_data_csv(struct seq_file *m, struct rcu_data *rdp)
 		   rdp->completed, rdp->gpnum,
 		   rdp->passed_quiesce, rdp->passed_quiesce_gpnum,
 		   rdp->qs_pending);
-#ifdef CONFIG_NO_HZ
-	seq_printf(m, ",%d,%d,%d,%lu",
+	seq_printf(m, ",%d,%llx,%d,%lu",
 		   atomic_read(&rdp->dynticks->dynticks),
 		   rdp->dynticks->dynticks_nesting,
 		   rdp->dynticks->dynticks_nmi_nesting,
 		   rdp->dynticks_fqs);
-#endif /* #ifdef CONFIG_NO_HZ */
 	seq_printf(m, ",%lu,%lu", rdp->offline_fqs, rdp->resched_ipi);
 	seq_printf(m, ",%ld,\"%c%c%c%c\"", rdp->qlen,
 		   ".N"[rdp->nxttail[RCU_NEXT_READY_TAIL] !=
@@ -171,9 +167,7 @@ static void print_one_rcu_data_csv(struct seq_file *m, struct rcu_data *rdp)
 static int show_rcudata_csv(struct seq_file *m, void *unused)
 {
 	seq_puts(m, "\"CPU\",\"Online?\",\"c\",\"g\",\"pq\",\"pgp\",\"pq\",");
-#ifdef CONFIG_NO_HZ
 	seq_puts(m, "\"dt\",\"dt nesting\",\"dt NMI nesting\",\"df\",");
-#endif /* #ifdef CONFIG_NO_HZ */
 	seq_puts(m, "\"of\",\"ri\",\"ql\",\"qs\"");
 #ifdef CONFIG_RCU_BOOST
 	seq_puts(m, "\"kt\",\"ktl\"");
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index eb98e55..4692907 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -405,7 +405,6 @@ void tick_nohz_stop_sched_tick(int inidle)
 			ts->idle_tick = hrtimer_get_expires(&ts->sched_timer);
 			ts->tick_stopped = 1;
 			ts->idle_jiffies = last_jiffies;
-			rcu_enter_nohz();
 		}
 
 		ts->idle_sleeps++;
@@ -444,6 +443,8 @@ out:
 	ts->last_jiffies = last_jiffies;
 	ts->sleep_length = ktime_sub(dev->next_event, now);
 end:
+	if (inidle)
+		rcu_idle_enter();
 	local_irq_restore(flags);
 }
 
@@ -500,6 +501,7 @@ void tick_nohz_restart_sched_tick(void)
 	ktime_t now;
 
 	local_irq_disable();
+	rcu_idle_exit();
 	if (ts->idle_active || (ts->inidle && ts->tick_stopped))
 		now = ktime_get();
 
@@ -514,8 +516,6 @@ void tick_nohz_restart_sched_tick(void)
 
 	ts->inidle = 0;
 
-	rcu_exit_nohz();
-
 	/* Update jiffies first */
 	select_nohz_load_balancer(0);
 	tick_do_update_jiffies64(now);
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH RFC tip/core/rcu 07/28] trace: Allow ftrace_dump() to be called from modules
  2011-11-02 20:30 [PATCH RFC tip/core/rcu 0/28] Preview of RCU changes for 3.3 Paul E. McKenney
                   ` (5 preceding siblings ...)
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 06/28] rcu: Track idleness independent of idle tasks Paul E. McKenney
@ 2011-11-02 20:30 ` Paul E. McKenney
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 08/28] rcu: Add failure tracing to rcutorture Paul E. McKenney
                   ` (21 subsequent siblings)
  28 siblings, 0 replies; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-02 20:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, Paul E. McKenney, Paul E. McKenney

From: Paul E. McKenney <paul.mckenney@linaro.org>

Add an EXPORT_SYMBOL_GPL() so that rcutorture can dump the trace buffer
upon detection of an RCU error.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/trace/trace.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index e5df02c..a8ccb84 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4658,6 +4658,7 @@ void ftrace_dump(enum ftrace_dump_mode oops_dump_mode)
 {
 	__ftrace_dump(true, oops_dump_mode);
 }
+EXPORT_SYMBOL_GPL(ftrace_dump);
 
 __init static int tracer_alloc_buffers(void)
 {
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH RFC tip/core/rcu 08/28] rcu: Add failure tracing to rcutorture
  2011-11-02 20:30 [PATCH RFC tip/core/rcu 0/28] Preview of RCU changes for 3.3 Paul E. McKenney
                   ` (6 preceding siblings ...)
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 07/28] trace: Allow ftrace_dump() to be called from modules Paul E. McKenney
@ 2011-11-02 20:30 ` Paul E. McKenney
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 09/28] rcu: Document failing tick as cause of RCU CPU stall warning Paul E. McKenney
                   ` (20 subsequent siblings)
  28 siblings, 0 replies; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-02 20:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, Paul E. McKenney, Paul E. McKenney

From: Paul E. McKenney <paul.mckenney@linaro.org>

Trace the rcutorture RCU accesses and dump the trace buffer when the
first failure is detected.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/rcupdate.h   |    8 ++++++++
 include/trace/events/rcu.h |   26 ++++++++++++++++++++++++++
 kernel/rcupdate.c          |   10 ++++++++++
 kernel/rcutorture.c        |   18 ++++++++++++++++++
 4 files changed, 62 insertions(+), 0 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index cd1ad4b..8d315b0 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -51,6 +51,8 @@ extern int rcutorture_runnable; /* for sysctl */
 #if defined(CONFIG_TREE_RCU) || defined(CONFIG_TREE_PREEMPT_RCU)
 extern void rcutorture_record_test_transition(void);
 extern void rcutorture_record_progress(unsigned long vernum);
+extern void do_trace_rcu_torture_read(char *rcutorturename,
+				      struct rcu_head *rhp);
 #else
 static inline void rcutorture_record_test_transition(void)
 {
@@ -58,6 +60,12 @@ static inline void rcutorture_record_test_transition(void)
 static inline void rcutorture_record_progress(unsigned long vernum)
 {
 }
+#ifdef CONFIG_RCU_TRACE
+extern void do_trace_rcu_torture_read(char *rcutorturename,
+				      struct rcu_head *rhp);
+#else
+#define do_trace_rcu_torture_read(rcutorturename, rhp) do { } while (0)
+#endif
 #endif
 
 #define UINT_CMP_GE(a, b)	(UINT_MAX / 2 >= (a) - (b))
diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index e577180..172620a 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -437,6 +437,31 @@ TRACE_EVENT(rcu_batch_end,
 		  __entry->rcuname, __entry->callbacks_invoked)
 );
 
+/*
+ * Tracepoint for rcutorture readers.  The first argument is the name
+ * of the RCU flavor from rcutorture's viewpoint and the second argument
+ * is the callback address.
+ */
+TRACE_EVENT(rcu_torture_read,
+
+	TP_PROTO(char *rcutorturename, struct rcu_head *rhp),
+
+	TP_ARGS(rcutorturename, rhp),
+
+	TP_STRUCT__entry(
+		__field(char *, rcutorturename)
+		__field(struct rcu_head *, rhp)
+	),
+
+	TP_fast_assign(
+		__entry->rcutorturename = rcutorturename;
+		__entry->rhp = rhp;
+	),
+
+	TP_printk("%s torture read %p",
+		  __entry->rcutorturename, __entry->rhp)
+);
+
 #else /* #ifdef CONFIG_RCU_TRACE */
 
 #define trace_rcu_grace_period(rcuname, gpnum, gpevent) do { } while (0)
@@ -452,6 +477,7 @@ TRACE_EVENT(rcu_batch_end,
 #define trace_rcu_invoke_callback(rcuname, rhp) do { } while (0)
 #define trace_rcu_invoke_kfree_callback(rcuname, rhp, offset) do { } while (0)
 #define trace_rcu_batch_end(rcuname, callbacks_invoked) do { } while (0)
+#define trace_rcu_torture_read(rcutorturename, rhp) do { } while (0)
 
 #endif /* #else #ifdef CONFIG_RCU_TRACE */
 
diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
index ca0d23b..34a02da 100644
--- a/kernel/rcupdate.c
+++ b/kernel/rcupdate.c
@@ -316,3 +316,13 @@ struct debug_obj_descr rcuhead_debug_descr = {
 };
 EXPORT_SYMBOL_GPL(rcuhead_debug_descr);
 #endif /* #ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD */
+
+#if defined(CONFIG_TREE_RCU) || defined(CONFIG_TREE_PREEMPT_RCU) || defined(CONFIG_RCU_TRACE)
+void do_trace_rcu_torture_read(char *rcutorturename, struct rcu_head *rhp)
+{
+	trace_rcu_torture_read(rcutorturename, rhp);
+}
+EXPORT_SYMBOL_GPL(do_trace_rcu_torture_read);
+#else
+#define do_trace_rcu_torture_read(rcutorturename, rhp) do { } while (0)
+#endif
diff --git a/kernel/rcutorture.c b/kernel/rcutorture.c
index 764825c..df35228 100644
--- a/kernel/rcutorture.c
+++ b/kernel/rcutorture.c
@@ -913,6 +913,18 @@ rcu_torture_fakewriter(void *arg)
 	return 0;
 }
 
+void rcutorture_trace_dump(void)
+{
+	static atomic_t beenhere = ATOMIC_INIT(0);
+
+	if (atomic_read(&beenhere))
+		return;
+	if (atomic_xchg(&beenhere, 1) != 0)
+		return;
+	do_trace_rcu_torture_read(cur_ops->name, (struct rcu_head *)~0UL);
+	ftrace_dump(DUMP_ALL);
+}
+
 /*
  * RCU torture reader from timer handler.  Dereferences rcu_torture_current,
  * incrementing the corresponding element of the pipeline array.  The
@@ -934,6 +946,7 @@ static void rcu_torture_timer(unsigned long unused)
 				  rcu_read_lock_bh_held() ||
 				  rcu_read_lock_sched_held() ||
 				  srcu_read_lock_held(&srcu_ctl));
+	do_trace_rcu_torture_read(cur_ops->name, &p->rtort_rcu);
 	if (p == NULL) {
 		/* Leave because rcu_torture_writer is not yet underway */
 		cur_ops->readunlock(idx);
@@ -951,6 +964,8 @@ static void rcu_torture_timer(unsigned long unused)
 		/* Should not happen, but... */
 		pipe_count = RCU_TORTURE_PIPE_LEN;
 	}
+	if (pipe_count > 1)
+		rcutorture_trace_dump();
 	__this_cpu_inc(rcu_torture_count[pipe_count]);
 	completed = cur_ops->completed() - completed;
 	if (completed > RCU_TORTURE_PIPE_LEN) {
@@ -994,6 +1009,7 @@ rcu_torture_reader(void *arg)
 					  rcu_read_lock_bh_held() ||
 					  rcu_read_lock_sched_held() ||
 					  srcu_read_lock_held(&srcu_ctl));
+		do_trace_rcu_torture_read(cur_ops->name, &p->rtort_rcu);
 		if (p == NULL) {
 			/* Wait for rcu_torture_writer to get underway */
 			cur_ops->readunlock(idx);
@@ -1009,6 +1025,8 @@ rcu_torture_reader(void *arg)
 			/* Should not happen, but... */
 			pipe_count = RCU_TORTURE_PIPE_LEN;
 		}
+		if (pipe_count > 1)
+			rcutorture_trace_dump();
 		__this_cpu_inc(rcu_torture_count[pipe_count]);
 		completed = cur_ops->completed() - completed;
 		if (completed > RCU_TORTURE_PIPE_LEN) {
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH RFC tip/core/rcu 09/28] rcu: Document failing tick as cause of RCU CPU stall warning
  2011-11-02 20:30 [PATCH RFC tip/core/rcu 0/28] Preview of RCU changes for 3.3 Paul E. McKenney
                   ` (7 preceding siblings ...)
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 08/28] rcu: Add failure tracing to rcutorture Paul E. McKenney
@ 2011-11-02 20:30 ` Paul E. McKenney
  2011-11-03  3:07   ` Josh Triplett
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 10/28] rcu: Disable preemption in rcu_is_cpu_idle() Paul E. McKenney
                   ` (19 subsequent siblings)
  28 siblings, 1 reply; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-02 20:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, Paul E. McKenney

One of lclaudio's systems was seeing RCU CPU stall warnings from idle.
These turned out to be caused by a bug that stopped scheduling-clock
tick interrupts from being sent to a given CPU for several hundred seconds.
This commit therefore updates the documentation to call this out as a
possible cause for RCU CPU stall warnings.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 Documentation/RCU/stallwarn.txt |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
index 4e95920..f3e0625 100644
--- a/Documentation/RCU/stallwarn.txt
+++ b/Documentation/RCU/stallwarn.txt
@@ -101,6 +101,11 @@ o	A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
 	CONFIG_TREE_PREEMPT_RCU case, you might see stall-warning
 	messages.
 
+o	A hardware or software issue shuts off the scheduler-clock
+	interrupt on a CPU that is not in dyntick-idle mode.  This
+	problem really has happened, and seems to be most likely to
+	result in RCU CPU stall warnings for CONFIG_NO_HZ=n kernels.
+
 o	A bug in the RCU implementation.
 
 o	A hardware failure.  This is quite unlikely, but has occurred
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH RFC tip/core/rcu 10/28] rcu: Disable preemption in rcu_is_cpu_idle()
  2011-11-02 20:30 [PATCH RFC tip/core/rcu 0/28] Preview of RCU changes for 3.3 Paul E. McKenney
                   ` (8 preceding siblings ...)
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 09/28] rcu: Document failing tick as cause of RCU CPU stall warning Paul E. McKenney
@ 2011-11-02 20:30 ` Paul E. McKenney
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 11/28] rcu: Omit self-awaken when setting up expedited grace period Paul E. McKenney
                   ` (18 subsequent siblings)
  28 siblings, 0 replies; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-02 20:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, Paul E. McKenney, Paul E. McKenney

From: Paul E. McKenney <paul.mckenney@linaro.org>

Because rcu_is_cpu_idle() is to be used to check for extended quiescent
states in RCU-preempt read-side critical sections, it cannot assume that
preemption is disabled.  And preemption must be disabled when accessing
the dyntick-idle state, because otherwise the following sequence of events
could occur:

1.	Task A on CPU 1 enters rcu_is_cpu_idle() and picks up the pointer
	to CPU 1's per-CPU variables.

2.	Task B preempts Task A and starts running on CPU 1.

3.	Task A migrates to CPU 2.

4.	Task B blocks, leaving CPU 1 idle.

5.	Task A continues execution on CPU 2, accessing CPU 1's dyntick-idle
	information using the pointer fetched in step 1 above, and finds
	that CPU 1 is idle.

6.	Task A therefore incorrectly concludes that it is executing in
	an extended quiescent state, possibly issuing a spurious splat.

Therefore, this commit disables preemption within the rcu_is_cpu_idle()
function.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcutree.c |   10 +++++++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index abb5167..c097394 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -556,12 +556,16 @@ void rcu_nmi_exit(void)
  * rcu_is_cpu_idle - see if RCU thinks that the current CPU is idle
  *
  * If the current CPU is in its idle loop and is neither in an interrupt
- * or NMI handler, return true.  The caller must have at least disabled
- * preemption.
+ * or NMI handler, return true.
  */
 int rcu_is_cpu_idle(void)
 {
-	return (atomic_read(&__get_cpu_var(rcu_dynticks).dynticks) & 0x1) == 0;
+	int ret;
+
+	preempt_disable();
+	ret = (atomic_read(&__get_cpu_var(rcu_dynticks).dynticks) & 0x1) == 0;
+	preempt_enable();
+	return ret;
 }
 
 #endif /* #ifdef CONFIG_PROVE_RCU */
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH RFC tip/core/rcu 11/28] rcu: Omit self-awaken when setting up expedited grace period
  2011-11-02 20:30 [PATCH RFC tip/core/rcu 0/28] Preview of RCU changes for 3.3 Paul E. McKenney
                   ` (9 preceding siblings ...)
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 10/28] rcu: Disable preemption in rcu_is_cpu_idle() Paul E. McKenney
@ 2011-11-02 20:30 ` Paul E. McKenney
  2011-11-03  3:16   ` Josh Triplett
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 12/28] rcu: Detect illegal rcu dereference in extended quiescent state Paul E. McKenney
                   ` (17 subsequent siblings)
  28 siblings, 1 reply; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-02 20:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, Paul E. McKenney

From: Thomas Gleixner <tglx@linutronix.de>

When setting up an expedited grace period, if there were no readers, the
task will awaken itself.  This commit removes this useless self-awakening.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcutree.c        |    2 +-
 kernel/rcutree.h        |    3 ++-
 kernel/rcutree_plugin.h |   17 +++++++++++------
 3 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index c097394..bbcafba 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -1320,7 +1320,7 @@ static void __rcu_offline_cpu(int cpu, struct rcu_state *rsp)
 	else
 		raw_spin_unlock_irqrestore(&rnp->lock, flags);
 	if (need_report & RCU_OFL_TASKS_EXP_GP)
-		rcu_report_exp_rnp(rsp, rnp);
+		rcu_report_exp_rnp(rsp, rnp, true);
 	rcu_node_kthread_setaffinity(rnp, -1);
 }
 
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index 0963fa1..fd2f87d 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -444,7 +444,8 @@ static void rcu_preempt_check_callbacks(int cpu);
 static void rcu_preempt_process_callbacks(void);
 void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu));
 #if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_TREE_PREEMPT_RCU)
-static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp);
+static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
+			       bool wake);
 #endif /* #if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_TREE_PREEMPT_RCU) */
 static int rcu_preempt_pending(int cpu);
 static int rcu_preempt_needs_cpu(int cpu);
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 708dc57..7a7961f 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -410,7 +410,7 @@ static noinline void rcu_read_unlock_special(struct task_struct *t)
 		 * then we need to report up the rcu_node hierarchy.
 		 */
 		if (!empty_exp && empty_exp_now)
-			rcu_report_exp_rnp(&rcu_preempt_state, rnp);
+			rcu_report_exp_rnp(&rcu_preempt_state, rnp, true);
 	} else {
 		local_irq_restore(flags);
 	}
@@ -732,9 +732,13 @@ static int sync_rcu_preempt_exp_done(struct rcu_node *rnp)
  * recursively up the tree.  (Calm down, calm down, we do the recursion
  * iteratively!)
  *
+ * Most callers will set the "wake" flag, but the task initiating the
+ * expedited grace period need not wake itself.
+ *
  * Caller must hold sync_rcu_preempt_exp_mutex.
  */
-static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp)
+static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
+			       bool wake)
 {
 	unsigned long flags;
 	unsigned long mask;
@@ -747,7 +751,8 @@ static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp)
 		}
 		if (rnp->parent == NULL) {
 			raw_spin_unlock_irqrestore(&rnp->lock, flags);
-			wake_up(&sync_rcu_preempt_exp_wq);
+			if (wake)
+				wake_up(&sync_rcu_preempt_exp_wq);
 			break;
 		}
 		mask = rnp->grpmask;
@@ -780,7 +785,7 @@ sync_rcu_preempt_exp_init(struct rcu_state *rsp, struct rcu_node *rnp)
 		must_wait = 1;
 	}
 	if (!must_wait)
-		rcu_report_exp_rnp(rsp, rnp);
+		rcu_report_exp_rnp(rsp, rnp, false); /* Don't wake self. */
 }
 
 /*
@@ -1072,9 +1077,9 @@ EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
  * report on tasks preempted in RCU read-side critical sections during
  * expedited RCU grace periods.
  */
-static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp)
+static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
+			       bool wake)
 {
-	return;
 }
 
 #endif /* #ifdef CONFIG_HOTPLUG_CPU */
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH RFC tip/core/rcu 12/28] rcu: Detect illegal rcu dereference in extended quiescent state
  2011-11-02 20:30 [PATCH RFC tip/core/rcu 0/28] Preview of RCU changes for 3.3 Paul E. McKenney
                   ` (10 preceding siblings ...)
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 11/28] rcu: Omit self-awaken when setting up expedited grace period Paul E. McKenney
@ 2011-11-02 20:30 ` Paul E. McKenney
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 13/28] rcu: Inform the user about extended quiescent state on PROVE_RCU warning Paul E. McKenney
                   ` (16 subsequent siblings)
  28 siblings, 0 replies; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-02 20:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, Frederic Weisbecker, Paul E. McKenney,
	Peter Zijlstra

From: Frederic Weisbecker <fweisbec@gmail.com>

Report that none of the rcu read lock maps are held while in an RCU
extended quiescent state (the section between rcu_idle_enter()
and rcu_idle_exit()). This helps detect any use of rcu_dereference()
and friends from within the section in idle where RCU is not allowed.

This way we can guarantee an extended quiescent window where the CPU
can be put in dyntick idle mode or can simply aoid to be part of any
global grace period completion while in the idle loop.

Uses of RCU from such mode are totally ignored by RCU, hence the
importance of these checks.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/rcupdate.h |   26 ++++++++++++++++++++++++++
 kernel/rcupdate.c        |    2 ++
 kernel/rcutiny.c         |    1 +
 kernel/rcutree.c         |    1 +
 4 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 8d315b0..bf91fcf 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -228,6 +228,15 @@ static inline void destroy_rcu_head_on_stack(struct rcu_head *head)
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 
+#ifdef CONFIG_PROVE_RCU
+extern int rcu_is_cpu_idle(void);
+#else /* !CONFIG_PROVE_RCU */
+static inline int rcu_is_cpu_idle(void)
+{
+	return 0;
+}
+#endif /* else !CONFIG_PROVE_RCU */
+
 extern struct lockdep_map rcu_lock_map;
 # define rcu_read_acquire() \
 		lock_acquire(&rcu_lock_map, 0, 0, 2, 1, NULL, _THIS_IP_)
@@ -262,6 +271,8 @@ static inline int rcu_read_lock_held(void)
 {
 	if (!debug_lockdep_rcu_enabled())
 		return 1;
+	if (rcu_is_cpu_idle())
+		return 0;
 	return lock_is_held(&rcu_lock_map);
 }
 
@@ -285,6 +296,19 @@ extern int rcu_read_lock_bh_held(void);
  *
  * Check debug_lockdep_rcu_enabled() to prevent false positives during boot
  * and while lockdep is disabled.
+ *
+ * Note that if the CPU is in the idle loop from an RCU point of
+ * view (ie: that we are in the section between rcu_idle_enter() and
+ * rcu_idle_exit()) then rcu_read_lock_held() returns false even if the CPU
+ * did an rcu_read_lock().  The reason for this is that RCU ignores CPUs
+ * that are in such a section, considering these as in extended quiescent
+ * state, so such a CPU is effectively never in an RCU read-side critical
+ * section regardless of what RCU primitives it invokes.  This state of
+ * affairs is required --- we need to keep an RCU-free window in idle
+ * where the CPU may possibly enter into low power mode. This way we can
+ * notice an extended quiescent state to other CPUs that started a grace
+ * period. Otherwise we would delay any grace period as long as we run in
+ * the idle task.
  */
 #ifdef CONFIG_PREEMPT_COUNT
 static inline int rcu_read_lock_sched_held(void)
@@ -293,6 +317,8 @@ static inline int rcu_read_lock_sched_held(void)
 
 	if (!debug_lockdep_rcu_enabled())
 		return 1;
+	if (rcu_is_cpu_idle())
+		return 0;
 	if (debug_locks)
 		lockdep_opinion = lock_is_held(&rcu_sched_lock_map);
 	return lockdep_opinion || preempt_count() != 0 || irqs_disabled();
diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
index 34a02da..05633da 100644
--- a/kernel/rcupdate.c
+++ b/kernel/rcupdate.c
@@ -93,6 +93,8 @@ int rcu_read_lock_bh_held(void)
 {
 	if (!debug_lockdep_rcu_enabled())
 		return 1;
+	if (rcu_is_cpu_idle())
+		return 0;
 	return in_softirq() || irqs_disabled();
 }
 EXPORT_SYMBOL_GPL(rcu_read_lock_bh_held);
diff --git a/kernel/rcutiny.c b/kernel/rcutiny.c
index 6b0ace4..089820d 100644
--- a/kernel/rcutiny.c
+++ b/kernel/rcutiny.c
@@ -158,6 +158,7 @@ int rcu_is_cpu_idle(void)
 {
 	return !rcu_dynticks_nesting;
 }
+EXPORT_SYMBOL(rcu_is_cpu_idle);
 
 #endif /* #ifdef CONFIG_PROVE_RCU */
 
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index bbcafba..28f8f92 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -567,6 +567,7 @@ int rcu_is_cpu_idle(void)
 	preempt_enable();
 	return ret;
 }
+EXPORT_SYMBOL(rcu_is_cpu_idle);
 
 #endif /* #ifdef CONFIG_PROVE_RCU */
 
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH RFC tip/core/rcu 13/28] rcu: Inform the user about extended quiescent state on PROVE_RCU warning
  2011-11-02 20:30 [PATCH RFC tip/core/rcu 0/28] Preview of RCU changes for 3.3 Paul E. McKenney
                   ` (11 preceding siblings ...)
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 12/28] rcu: Detect illegal rcu dereference in extended quiescent state Paul E. McKenney
@ 2011-11-02 20:30 ` Paul E. McKenney
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 14/28] rcu: Warn when rcu_read_lock() is used in extended quiescent state Paul E. McKenney
                   ` (15 subsequent siblings)
  28 siblings, 0 replies; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-02 20:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, Frederic Weisbecker, Paul E. McKenney,
	Peter Zijlstra

From: Frederic Weisbecker <fweisbec@gmail.com>

Inform the user if an RCU usage error is detected by lockdep while in
an extended quiescent state (in this case, the RCU-free window in idle).
This is accomplished by adding a line to the RCU lockdep splat indicating
whether or not the splat occurred in extended quiescent state.

Uses of RCU from within extended quiescent state mode are totally ignored
by RCU, hence the importance of this diagnostic.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/lockdep.c |   22 ++++++++++++++++++++++
 1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 1e48f1c..8873f6e 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -4026,6 +4026,28 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
 	printk("%s:%d %s!\n", file, line, s);
 	printk("\nother info that might help us debug this:\n\n");
 	printk("\nrcu_scheduler_active = %d, debug_locks = %d\n", rcu_scheduler_active, debug_locks);
+
+	/*
+	 * If a CPU is in the RCU-free window in idle (ie: in the section
+	 * between rcu_idle_enter() and rcu_idle_exit(), then RCU
+	 * considers that CPU to be in an "extended quiescent state",
+	 * which means that RCU will be completely ignoring that CPU.
+	 * Therefore, rcu_read_lock() and friends have absolutely no
+	 * effect on a CPU running in that state. In other words, even if
+	 * such an RCU-idle CPU has called rcu_read_lock(), RCU might well
+	 * delete data structures out from under it.  RCU really has no
+	 * choice here: we need to keep an RCU-free window in idle where
+	 * the CPU may possibly enter into low power mode. This way we can
+	 * notice an extended quiescent state to other CPUs that started a grace
+	 * period. Otherwise we would delay any grace period as long as we run
+	 * in the idle task.
+	 *
+	 * So complain bitterly if someone does call rcu_read_lock(),
+	 * rcu_read_lock_bh() and so on from extended quiescent states.
+	 */
+	if (rcu_is_cpu_idle())
+		printk("RCU used illegally from extended quiescent state!\n");
+
 	lockdep_print_held_locks(curr);
 	printk("\nstack backtrace:\n");
 	dump_stack();
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH RFC tip/core/rcu 14/28] rcu: Warn when rcu_read_lock() is used in extended quiescent state
  2011-11-02 20:30 [PATCH RFC tip/core/rcu 0/28] Preview of RCU changes for 3.3 Paul E. McKenney
                   ` (12 preceding siblings ...)
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 13/28] rcu: Inform the user about extended quiescent state on PROVE_RCU warning Paul E. McKenney
@ 2011-11-02 20:30 ` Paul E. McKenney
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 15/28] rcu: Remove one layer of abstraction from PROVE_RCU checking Paul E. McKenney
                   ` (14 subsequent siblings)
  28 siblings, 0 replies; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-02 20:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, Frederic Weisbecker, Paul E. McKenney,
	Peter Zijlstra

From: Frederic Weisbecker <fweisbec@gmail.com>

We are currently able to detect uses of rcu_dereference_check() inside
extended quiescent states (such as the RCU-free window in idle).
But rcu_read_lock() and friends can be used without rcu_dereference(),
so that the earlier commit checking for use of rcu_dereference() and
friends while in RCU idle mode miss some error conditions.  This commit
therefore adds extended quiescent state checking to rcu_read_lock() and
friends.

Uses of RCU from within RCU-idle mode are totally ignored by
RCU, hence the importance of these checks.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/rcupdate.h |   52 +++++++++++++++++++++++++++++++++++++--------
 1 files changed, 42 insertions(+), 10 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index bf91fcf..d201c15 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -237,21 +237,53 @@ static inline int rcu_is_cpu_idle(void)
 }
 #endif /* else !CONFIG_PROVE_RCU */
 
+static inline void rcu_lock_acquire(struct lockdep_map *map)
+{
+	WARN_ON_ONCE(rcu_is_cpu_idle());
+	lock_acquire(map, 0, 0, 2, 1, NULL, _THIS_IP_);
+}
+
+static inline void rcu_lock_release(struct lockdep_map *map)
+{
+	WARN_ON_ONCE(rcu_is_cpu_idle());
+	lock_release(map, 1, _THIS_IP_);
+}
+
 extern struct lockdep_map rcu_lock_map;
-# define rcu_read_acquire() \
-		lock_acquire(&rcu_lock_map, 0, 0, 2, 1, NULL, _THIS_IP_)
-# define rcu_read_release()	lock_release(&rcu_lock_map, 1, _THIS_IP_)
+
+static inline void rcu_read_acquire(void)
+{
+	rcu_lock_acquire(&rcu_lock_map);
+}
+
+static inline void rcu_read_release(void)
+{
+	rcu_lock_release(&rcu_lock_map);
+}
 
 extern struct lockdep_map rcu_bh_lock_map;
-# define rcu_read_acquire_bh() \
-		lock_acquire(&rcu_bh_lock_map, 0, 0, 2, 1, NULL, _THIS_IP_)
-# define rcu_read_release_bh()	lock_release(&rcu_bh_lock_map, 1, _THIS_IP_)
+
+static inline void rcu_read_acquire_bh(void)
+{
+	rcu_lock_acquire(&rcu_bh_lock_map);
+}
+
+static inline void rcu_read_release_bh(void)
+{
+	rcu_lock_release(&rcu_bh_lock_map);
+}
 
 extern struct lockdep_map rcu_sched_lock_map;
-# define rcu_read_acquire_sched() \
-		lock_acquire(&rcu_sched_lock_map, 0, 0, 2, 1, NULL, _THIS_IP_)
-# define rcu_read_release_sched() \
-		lock_release(&rcu_sched_lock_map, 1, _THIS_IP_)
+
+static inline void rcu_read_acquire_sched(void)
+{
+	rcu_lock_acquire(&rcu_sched_lock_map);
+}
+
+static inline void rcu_read_release_sched(void)
+{
+	rcu_lock_release(&rcu_sched_lock_map);
+}
 
 extern int debug_lockdep_rcu_enabled(void);
 
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH RFC tip/core/rcu 15/28] rcu: Remove one layer of abstraction from PROVE_RCU checking
  2011-11-02 20:30 [PATCH RFC tip/core/rcu 0/28] Preview of RCU changes for 3.3 Paul E. McKenney
                   ` (13 preceding siblings ...)
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 14/28] rcu: Warn when rcu_read_lock() is used in extended quiescent state Paul E. McKenney
@ 2011-11-02 20:30 ` Paul E. McKenney
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 16/28] rcu: Warn when srcu_read_lock() is used in an extended quiescent state Paul E. McKenney
                   ` (13 subsequent siblings)
  28 siblings, 0 replies; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-02 20:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, Paul E. McKenney

Simplify things a bit by substituting the definitions of the single-line
rcu_read_acquire(), rcu_read_release(), rcu_read_acquire_bh(),
rcu_read_release_bh(), rcu_read_acquire_sched(), and
rcu_read_release_sched() functions at their call points.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/rcupdate.h |   53 +++++++---------------------------------------
 1 files changed, 8 insertions(+), 45 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index d201c15..5dd6fd8 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -250,41 +250,8 @@ static inline void rcu_lock_release(struct lockdep_map *map)
 }
 
 extern struct lockdep_map rcu_lock_map;
-
-static inline void rcu_read_acquire(void)
-{
-	rcu_lock_acquire(&rcu_lock_map);
-}
-
-static inline void rcu_read_release(void)
-{
-	rcu_lock_release(&rcu_lock_map);
-}
-
 extern struct lockdep_map rcu_bh_lock_map;
-
-static inline void rcu_read_acquire_bh(void)
-{
-	rcu_lock_acquire(&rcu_bh_lock_map);
-}
-
-static inline void rcu_read_release_bh(void)
-{
-	rcu_lock_release(&rcu_bh_lock_map);
-}
-
 extern struct lockdep_map rcu_sched_lock_map;
-
-static inline void rcu_read_acquire_sched(void)
-{
-	rcu_lock_acquire(&rcu_sched_lock_map);
-}
-
-static inline void rcu_read_release_sched(void)
-{
-	rcu_lock_release(&rcu_sched_lock_map);
-}
-
 extern int debug_lockdep_rcu_enabled(void);
 
 /**
@@ -364,12 +331,8 @@ static inline int rcu_read_lock_sched_held(void)
 
 #else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
 
-# define rcu_read_acquire()		do { } while (0)
-# define rcu_read_release()		do { } while (0)
-# define rcu_read_acquire_bh()		do { } while (0)
-# define rcu_read_release_bh()		do { } while (0)
-# define rcu_read_acquire_sched()	do { } while (0)
-# define rcu_read_release_sched()	do { } while (0)
+# define rcu_lock_acquire(a)		do { } while (0)
+# define rcu_lock_release(a)		do { } while (0)
 
 static inline int rcu_read_lock_held(void)
 {
@@ -690,7 +653,7 @@ static inline void rcu_read_lock(void)
 {
 	__rcu_read_lock();
 	__acquire(RCU);
-	rcu_read_acquire();
+	rcu_lock_acquire(&rcu_lock_map);
 }
 
 /*
@@ -710,7 +673,7 @@ static inline void rcu_read_lock(void)
  */
 static inline void rcu_read_unlock(void)
 {
-	rcu_read_release();
+	rcu_lock_release(&rcu_lock_map);
 	__release(RCU);
 	__rcu_read_unlock();
 }
@@ -731,7 +694,7 @@ static inline void rcu_read_lock_bh(void)
 {
 	local_bh_disable();
 	__acquire(RCU_BH);
-	rcu_read_acquire_bh();
+	rcu_lock_acquire(&rcu_bh_lock_map);
 }
 
 /*
@@ -741,7 +704,7 @@ static inline void rcu_read_lock_bh(void)
  */
 static inline void rcu_read_unlock_bh(void)
 {
-	rcu_read_release_bh();
+	rcu_lock_release(&rcu_bh_lock_map);
 	__release(RCU_BH);
 	local_bh_enable();
 }
@@ -758,7 +721,7 @@ static inline void rcu_read_lock_sched(void)
 {
 	preempt_disable();
 	__acquire(RCU_SCHED);
-	rcu_read_acquire_sched();
+	rcu_lock_acquire(&rcu_sched_lock_map);
 }
 
 /* Used by lockdep and tracing: cannot be traced, cannot call lockdep. */
@@ -775,7 +738,7 @@ static inline notrace void rcu_read_lock_sched_notrace(void)
  */
 static inline void rcu_read_unlock_sched(void)
 {
-	rcu_read_release_sched();
+	rcu_lock_release(&rcu_sched_lock_map);
 	__release(RCU_SCHED);
 	preempt_enable();
 }
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH RFC tip/core/rcu 16/28] rcu: Warn when srcu_read_lock() is used in an extended quiescent state
  2011-11-02 20:30 [PATCH RFC tip/core/rcu 0/28] Preview of RCU changes for 3.3 Paul E. McKenney
                   ` (14 preceding siblings ...)
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 15/28] rcu: Remove one layer of abstraction from PROVE_RCU checking Paul E. McKenney
@ 2011-11-02 20:30 ` Paul E. McKenney
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 17/28] rcu: Make srcu_read_lock_held() call common lockdep-enabled function Paul E. McKenney
                   ` (12 subsequent siblings)
  28 siblings, 0 replies; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-02 20:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, Paul E. McKenney, Frederic Weisbecker

Catch SRCU up to the other variants of RCU by making PROVE_RCU
complain if either srcu_read_lock() or srcu_read_lock_held() are
used from within RCU-idle mode.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/srcu.h |   36 +++++++++++++++++++++++-------------
 1 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index 58971e8..4e0a3d4 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -28,6 +28,7 @@
 #define _LINUX_SRCU_H
 
 #include <linux/mutex.h>
+#include <linux/rcupdate.h>
 
 struct srcu_struct_array {
 	int c[2];
@@ -60,18 +61,10 @@ int __init_srcu_struct(struct srcu_struct *sp, const char *name,
 	__init_srcu_struct((sp), #sp, &__srcu_key); \
 })
 
-# define srcu_read_acquire(sp) \
-		lock_acquire(&(sp)->dep_map, 0, 0, 2, 1, NULL, _THIS_IP_)
-# define srcu_read_release(sp) \
-		lock_release(&(sp)->dep_map, 1, _THIS_IP_)
-
 #else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
 
 int init_srcu_struct(struct srcu_struct *sp);
 
-# define srcu_read_acquire(sp)  do { } while (0)
-# define srcu_read_release(sp)  do { } while (0)
-
 #endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */
 
 void cleanup_srcu_struct(struct srcu_struct *sp);
@@ -90,12 +83,29 @@ long srcu_batches_completed(struct srcu_struct *sp);
  * read-side critical section.  In absence of CONFIG_DEBUG_LOCK_ALLOC,
  * this assumes we are in an SRCU read-side critical section unless it can
  * prove otherwise.
+ *
+ * Note that if the CPU is in the idle loop from an RCU point of view
+ * (ie: that we are in the section between rcu_idle_enter() and
+ * rcu_idle_exit()) then srcu_read_lock_held() returns false even if
+ * the CPU did an srcu_read_lock().  The reason for this is that RCU
+ * ignores CPUs that are in such a section, considering these as in
+ * extended quiescent state, so such a CPU is effectively never in an
+ * RCU read-side critical section regardless of what RCU primitives it
+ * invokes.  This state of affairs is required --- we need to keep an
+ * RCU-free window in idle where the CPU may possibly enter into low
+ * power mode. This way we can notice an extended quiescent state to
+ * other CPUs that started a grace period. Otherwise we would delay any
+ * grace period as long as we run in the idle task.
  */
 static inline int srcu_read_lock_held(struct srcu_struct *sp)
 {
-	if (debug_locks)
-		return lock_is_held(&sp->dep_map);
-	return 1;
+	if (rcu_is_cpu_idle())
+		return 0;
+
+	if (!debug_locks)
+		return 1;
+
+	return lock_is_held(&sp->dep_map);
 }
 
 #else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
@@ -150,7 +160,7 @@ static inline int srcu_read_lock(struct srcu_struct *sp) __acquires(sp)
 {
 	int retval = __srcu_read_lock(sp);
 
-	srcu_read_acquire(sp);
+	rcu_lock_acquire(&(sp)->dep_map);
 	return retval;
 }
 
@@ -164,7 +174,7 @@ static inline int srcu_read_lock(struct srcu_struct *sp) __acquires(sp)
 static inline void srcu_read_unlock(struct srcu_struct *sp, int idx)
 	__releases(sp)
 {
-	srcu_read_release(sp);
+	rcu_lock_release(&(sp)->dep_map);
 	__srcu_read_unlock(sp, idx);
 }
 
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH RFC tip/core/rcu 17/28] rcu: Make srcu_read_lock_held() call common lockdep-enabled function
  2011-11-02 20:30 [PATCH RFC tip/core/rcu 0/28] Preview of RCU changes for 3.3 Paul E. McKenney
                   ` (15 preceding siblings ...)
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 16/28] rcu: Warn when srcu_read_lock() is used in an extended quiescent state Paul E. McKenney
@ 2011-11-02 20:30 ` Paul E. McKenney
  2011-11-03  3:48   ` Josh Triplett
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 18/28] nohz: Separate out irq exit and idle loop dyntick logic Paul E. McKenney
                   ` (11 subsequent siblings)
  28 siblings, 1 reply; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-02 20:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, Frederic Weisbecker, Paul E. McKenney

From: Frederic Weisbecker <fweisbec@gmail.com>

A common debug_lockdep_rcu_enabled() function is used to check whether
RCU lockdep splats should be reported, but srcu_read_lock() does not
use it.  This commit therefore brings srcu_read_lock_held() up to date.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/srcu.h |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index 4e0a3d4..d4b1244 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -84,6 +84,9 @@ long srcu_batches_completed(struct srcu_struct *sp);
  * this assumes we are in an SRCU read-side critical section unless it can
  * prove otherwise.
  *
+ * Checks debug_lockdep_rcu_enabled() to prevent false positives during boot
+ * and while lockdep is disabled.
+ *
  * Note that if the CPU is in the idle loop from an RCU point of view
  * (ie: that we are in the section between rcu_idle_enter() and
  * rcu_idle_exit()) then srcu_read_lock_held() returns false even if
@@ -102,7 +105,7 @@ static inline int srcu_read_lock_held(struct srcu_struct *sp)
 	if (rcu_is_cpu_idle())
 		return 0;
 
-	if (!debug_locks)
+	if (!debug_lockdep_rcu_enabled())
 		return 1;
 
 	return lock_is_held(&sp->dep_map);
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH RFC tip/core/rcu 18/28] nohz: Separate out irq exit and idle loop dyntick logic
  2011-11-02 20:30 [PATCH RFC tip/core/rcu 0/28] Preview of RCU changes for 3.3 Paul E. McKenney
                   ` (16 preceding siblings ...)
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 17/28] rcu: Make srcu_read_lock_held() call common lockdep-enabled function Paul E. McKenney
@ 2011-11-02 20:30 ` Paul E. McKenney
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 19/28] nohz: Allow rcu extended quiescent state handling seperately from tick stop Paul E. McKenney
                   ` (10 subsequent siblings)
  28 siblings, 0 replies; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-02 20:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, Frederic Weisbecker, Mike Frysinger,
	Guan Xuetao, David Miller, Chris Metcalf, Hans-Christian Egtvedt,
	Ralf Baechle, Paul E. McKenney, Ingo Molnar, Peter Zijlstra,
	H. Peter Anvin, Russell King, Paul Mackerras, Heiko Carstens,
	Paul Mundt

From: Frederic Weisbecker <fweisbec@gmail.com>

The tick_nohz_stop_sched_tick() function, which tries to delay
the next timer tick as long as possible, can be called from two
places:

- From the idle loop to start the dytick idle mode
- From interrupt exit if we have interrupted the dyntick
idle mode, so that we reprogram the next tick event in
case the irq changed some internal state that requires this
action.

There are only few minor differences between both that
are handled by that function, driven by the ts->inidle
cpu variable and the inidle parameter. The whole guarantees
that we only update the dyntick mode on irq exit if we actually
interrupted the dyntick idle mode, and that we enter in RCU extended
quiescent state from idle loop entry only.

Split this function into:

- tick_nohz_idle_enter(), which sets ts->inidle to 1, enters
dynticks idle mode unconditionally if it can, and enters into RCU
extended quiescent state.

- tick_nohz_irq_exit() which only updates the dynticks idle mode
when ts->inidle is set (ie: if tick_nohz_idle_enter() has been called).

To maintain symmetry, tick_nohz_restart_sched_tick() has been renamed
into tick_nohz_idle_exit().

This simplifies the code and micro-optimize the irq exit path (no need
for local_irq_save there). This also prepares for the split between
dynticks and rcu extended quiescent state logics. We'll need this split to
further fix illegal uses of RCU in extended quiescent states in the idle
loop.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: David Miller <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 arch/arm/kernel/process.c              |    4 +-
 arch/avr32/kernel/process.c            |    4 +-
 arch/blackfin/kernel/process.c         |    4 +-
 arch/microblaze/kernel/process.c       |    4 +-
 arch/mips/kernel/process.c             |    4 +-
 arch/openrisc/kernel/idle.c            |    4 +-
 arch/powerpc/kernel/idle.c             |    4 +-
 arch/powerpc/platforms/iseries/setup.c |    8 ++--
 arch/s390/kernel/process.c             |    4 +-
 arch/sh/kernel/idle.c                  |    4 +-
 arch/sparc/kernel/process_64.c         |    4 +-
 arch/tile/kernel/process.c             |    4 +-
 arch/um/kernel/process.c               |    4 +-
 arch/unicore32/kernel/process.c        |    4 +-
 arch/x86/kernel/process_32.c           |    4 +-
 arch/x86/kernel/process_64.c           |    4 +-
 include/linux/tick.h                   |   13 +++--
 kernel/softirq.c                       |    2 +-
 kernel/time/tick-sched.c               |   93 +++++++++++++++++++------------
 19 files changed, 99 insertions(+), 77 deletions(-)

diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index 1a347f4..f9261d0 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -183,7 +183,7 @@ void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
 		leds_event(led_idle_start);
 		while (!need_resched()) {
 #ifdef CONFIG_HOTPLUG_CPU
@@ -210,7 +210,7 @@ void cpu_idle(void)
 			}
 		}
 		leds_event(led_idle_end);
-		tick_nohz_restart_sched_tick();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/avr32/kernel/process.c b/arch/avr32/kernel/process.c
index ef5a2a0..6ee7952 100644
--- a/arch/avr32/kernel/process.c
+++ b/arch/avr32/kernel/process.c
@@ -34,10 +34,10 @@ void cpu_idle(void)
 {
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
 		while (!need_resched())
 			cpu_idle_sleep();
-		tick_nohz_restart_sched_tick();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/blackfin/kernel/process.c b/arch/blackfin/kernel/process.c
index 6a80a9e..7b141b5 100644
--- a/arch/blackfin/kernel/process.c
+++ b/arch/blackfin/kernel/process.c
@@ -88,10 +88,10 @@ void cpu_idle(void)
 #endif
 		if (!idle)
 			idle = default_idle;
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
 		while (!need_resched())
 			idle();
-		tick_nohz_restart_sched_tick();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/microblaze/kernel/process.c b/arch/microblaze/kernel/process.c
index dbb8124..6dc123e 100644
--- a/arch/microblaze/kernel/process.c
+++ b/arch/microblaze/kernel/process.c
@@ -103,10 +103,10 @@ void cpu_idle(void)
 		if (!idle)
 			idle = default_idle;
 
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
 		while (!need_resched())
 			idle();
-		tick_nohz_restart_sched_tick();
+		tick_nohz_idle_exit();
 
 		preempt_enable_no_resched();
 		schedule();
diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
index b30cb25..d50a005 100644
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -56,7 +56,7 @@ void __noreturn cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
 		while (!need_resched() && cpu_online(cpu)) {
 #ifdef CONFIG_MIPS_MT_SMTC
 			extern void smtc_idle_loop_hook(void);
@@ -77,7 +77,7 @@ void __noreturn cpu_idle(void)
 		     system_state == SYSTEM_BOOTING))
 			play_dead();
 #endif
-		tick_nohz_restart_sched_tick();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/openrisc/kernel/idle.c b/arch/openrisc/kernel/idle.c
index d5bc5f8..fb6a9bf 100644
--- a/arch/openrisc/kernel/idle.c
+++ b/arch/openrisc/kernel/idle.c
@@ -51,7 +51,7 @@ void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
 
 		while (!need_resched()) {
 			check_pgt_cache();
@@ -69,7 +69,7 @@ void cpu_idle(void)
 			set_thread_flag(TIF_POLLING_NRFLAG);
 		}
 
-		tick_nohz_restart_sched_tick();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
index 39a2baa..878572f 100644
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -56,7 +56,7 @@ void cpu_idle(void)
 
 	set_thread_flag(TIF_POLLING_NRFLAG);
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
 		while (!need_resched() && !cpu_should_die()) {
 			ppc64_runlatch_off();
 
@@ -93,7 +93,7 @@ void cpu_idle(void)
 
 		HMT_medium();
 		ppc64_runlatch_on();
-		tick_nohz_restart_sched_tick();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		if (cpu_should_die())
 			cpu_die();
diff --git a/arch/powerpc/platforms/iseries/setup.c b/arch/powerpc/platforms/iseries/setup.c
index c25a081..e2f5fad 100644
--- a/arch/powerpc/platforms/iseries/setup.c
+++ b/arch/powerpc/platforms/iseries/setup.c
@@ -562,7 +562,7 @@ static void yield_shared_processor(void)
 static void iseries_shared_idle(void)
 {
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
 		while (!need_resched() && !hvlpevent_is_pending()) {
 			local_irq_disable();
 			ppc64_runlatch_off();
@@ -576,7 +576,7 @@ static void iseries_shared_idle(void)
 		}
 
 		ppc64_runlatch_on();
-		tick_nohz_restart_sched_tick();
+		tick_nohz_idle_exit();
 
 		if (hvlpevent_is_pending())
 			process_iSeries_events();
@@ -592,7 +592,7 @@ static void iseries_dedicated_idle(void)
 	set_thread_flag(TIF_POLLING_NRFLAG);
 
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
 		if (!need_resched()) {
 			while (!need_resched()) {
 				ppc64_runlatch_off();
@@ -609,7 +609,7 @@ static void iseries_dedicated_idle(void)
 		}
 
 		ppc64_runlatch_on();
-		tick_nohz_restart_sched_tick();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/s390/kernel/process.c b/arch/s390/kernel/process.c
index 541a750..db3e930 100644
--- a/arch/s390/kernel/process.c
+++ b/arch/s390/kernel/process.c
@@ -90,10 +90,10 @@ static void default_idle(void)
 void cpu_idle(void)
 {
 	for (;;) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
 		while (!need_resched())
 			default_idle();
-		tick_nohz_restart_sched_tick();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/sh/kernel/idle.c b/arch/sh/kernel/idle.c
index db4ecd7..6015743 100644
--- a/arch/sh/kernel/idle.c
+++ b/arch/sh/kernel/idle.c
@@ -89,7 +89,7 @@ void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
 
 		while (!need_resched()) {
 			check_pgt_cache();
@@ -111,7 +111,7 @@ void cpu_idle(void)
 			start_critical_timings();
 		}
 
-		tick_nohz_restart_sched_tick();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/sparc/kernel/process_64.c b/arch/sparc/kernel/process_64.c
index c158a95..1235f63 100644
--- a/arch/sparc/kernel/process_64.c
+++ b/arch/sparc/kernel/process_64.c
@@ -95,12 +95,12 @@ void cpu_idle(void)
 	set_thread_flag(TIF_POLLING_NRFLAG);
 
 	while(1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
 
 		while (!need_resched() && !cpu_is_offline(cpu))
 			sparc64_yield(cpu);
 
-		tick_nohz_restart_sched_tick();
+		tick_nohz_idle_exit();
 
 		preempt_enable_no_resched();
 
diff --git a/arch/tile/kernel/process.c b/arch/tile/kernel/process.c
index 9c45d8b..920e674 100644
--- a/arch/tile/kernel/process.c
+++ b/arch/tile/kernel/process.c
@@ -85,7 +85,7 @@ void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
 		while (!need_resched()) {
 			if (cpu_is_offline(cpu))
 				BUG();  /* no HOTPLUG_CPU */
@@ -105,7 +105,7 @@ void cpu_idle(void)
 				local_irq_enable();
 			current_thread_info()->status |= TS_POLLING;
 		}
-		tick_nohz_restart_sched_tick();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
index 21c1ae7..41acf59 100644
--- a/arch/um/kernel/process.c
+++ b/arch/um/kernel/process.c
@@ -245,10 +245,10 @@ void default_idle(void)
 		if (need_resched())
 			schedule();
 
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
 		nsecs = disable_timer();
 		idle_sleep(nsecs);
-		tick_nohz_restart_sched_tick();
+		tick_nohz_idle_exit();
 	}
 }
 
diff --git a/arch/unicore32/kernel/process.c b/arch/unicore32/kernel/process.c
index ba401df..9999b9a 100644
--- a/arch/unicore32/kernel/process.c
+++ b/arch/unicore32/kernel/process.c
@@ -55,7 +55,7 @@ void cpu_idle(void)
 {
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
 		while (!need_resched()) {
 			local_irq_disable();
 			stop_critical_timings();
@@ -63,7 +63,7 @@ void cpu_idle(void)
 			local_irq_enable();
 			start_critical_timings();
 		}
-		tick_nohz_restart_sched_tick();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 7a3b651..ad93205 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -98,7 +98,7 @@ void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
 		while (!need_resched()) {
 
 			check_pgt_cache();
@@ -114,7 +114,7 @@ void cpu_idle(void)
 				pm_idle();
 			start_critical_timings();
 		}
-		tick_nohz_restart_sched_tick();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index f693e44..9ca714e 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -121,7 +121,7 @@ void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_stop_sched_tick(1);
+		tick_nohz_idle_enter();
 		while (!need_resched()) {
 
 			rmb();
@@ -147,7 +147,7 @@ void cpu_idle(void)
 			__exit_idle();
 		}
 
-		tick_nohz_restart_sched_tick();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/include/linux/tick.h b/include/linux/tick.h
index ca40838..0df1d50 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -121,21 +121,22 @@ static inline int tick_oneshot_mode_active(void) { return 0; }
 #endif /* !CONFIG_GENERIC_CLOCKEVENTS */
 
 # ifdef CONFIG_NO_HZ
-extern void tick_nohz_stop_sched_tick(int inidle);
-extern void tick_nohz_restart_sched_tick(void);
+extern void tick_nohz_idle_enter(void);
+extern void tick_nohz_idle_exit(void);
+extern void tick_nohz_irq_exit(void);
 extern ktime_t tick_nohz_get_sleep_length(void);
 extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
 extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
 # else
-static inline void tick_nohz_stop_sched_tick(int inidle)
+static inline void tick_nohz_idle_enter(void)
 {
-	if (inidle)
-		rcu_idle_enter();
+	rcu_idle_enter();
 }
-static inline void tick_nohz_restart_sched_tick(void)
+static inline void tick_nohz_idle_exit(void)
 {
 	rcu_idle_exit();
 }
+
 static inline ktime_t tick_nohz_get_sleep_length(void)
 {
 	ktime_t len = { .tv64 = NSEC_PER_SEC/HZ };
diff --git a/kernel/softirq.c b/kernel/softirq.c
index fca82c3..d2be0e0 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -351,7 +351,7 @@ void irq_exit(void)
 #ifdef CONFIG_NO_HZ
 	/* Make sure that timer wheel updates are propagated */
 	if (idle_cpu(smp_processor_id()) && !in_interrupt() && !need_resched())
-		tick_nohz_stop_sched_tick(0);
+		tick_nohz_irq_exit();
 #endif
 	preempt_enable_no_resched();
 }
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 4692907..52b7ace 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -246,42 +246,17 @@ u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
 }
 EXPORT_SYMBOL_GPL(get_cpu_iowait_time_us);
 
-/**
- * tick_nohz_stop_sched_tick - stop the idle tick from the idle task
- *
- * When the next event is more than a tick into the future, stop the idle tick
- * Called either from the idle loop or from irq_exit() when an idle period was
- * just interrupted by an interrupt which did not cause a reschedule.
- */
-void tick_nohz_stop_sched_tick(int inidle)
+static void tick_nohz_stop_sched_tick(struct tick_sched *ts)
 {
-	unsigned long seq, last_jiffies, next_jiffies, delta_jiffies, flags;
-	struct tick_sched *ts;
+	unsigned long seq, last_jiffies, next_jiffies, delta_jiffies;
 	ktime_t last_update, expires, now;
 	struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
 	u64 time_delta;
 	int cpu;
 
-	local_irq_save(flags);
-
 	cpu = smp_processor_id();
 	ts = &per_cpu(tick_cpu_sched, cpu);
 
-	/*
-	 * Call to tick_nohz_start_idle stops the last_update_time from being
-	 * updated. Thus, it must not be called in the event we are called from
-	 * irq_exit() with the prior state different than idle.
-	 */
-	if (!inidle && !ts->inidle)
-		goto end;
-
-	/*
-	 * Set ts->inidle unconditionally. Even if the system did not
-	 * switch to NOHZ mode the cpu frequency governers rely on the
-	 * update of the idle time accounting in tick_nohz_start_idle().
-	 */
-	ts->inidle = 1;
-
 	now = tick_nohz_start_idle(cpu, ts);
 
 	/*
@@ -297,10 +272,10 @@ void tick_nohz_stop_sched_tick(int inidle)
 	}
 
 	if (unlikely(ts->nohz_mode == NOHZ_MODE_INACTIVE))
-		goto end;
+		return;
 
 	if (need_resched())
-		goto end;
+		return;
 
 	if (unlikely(local_softirq_pending() && cpu_online(cpu))) {
 		static int ratelimit;
@@ -310,7 +285,7 @@ void tick_nohz_stop_sched_tick(int inidle)
 			       (unsigned int) local_softirq_pending());
 			ratelimit++;
 		}
-		goto end;
+		return;
 	}
 
 	ts->idle_calls++;
@@ -442,10 +417,54 @@ out:
 	ts->next_jiffies = next_jiffies;
 	ts->last_jiffies = last_jiffies;
 	ts->sleep_length = ktime_sub(dev->next_event, now);
-end:
-	if (inidle)
-		rcu_idle_enter();
-	local_irq_restore(flags);
+}
+
+/**
+ * tick_nohz_idle_enter - stop the idle tick from the idle task
+ *
+ * When the next event is more than a tick into the future, stop the idle tick
+ * Called when we start the idle loop.
+ * This also enters into RCU extended quiescent state so that this CPU doesn't
+ * need anymore to be part of any global grace period completion. This way
+ * the tick can be stopped safely as we don't need to report quiescent states.
+ */
+void tick_nohz_idle_enter(void)
+{
+	struct tick_sched *ts;
+
+	WARN_ON_ONCE(irqs_disabled());
+
+	local_irq_disable();
+
+	ts = &__get_cpu_var(tick_cpu_sched);
+	/*
+	 * set ts->inidle unconditionally. even if the system did not
+	 * switch to nohz mode the cpu frequency governers rely on the
+	 * update of the idle time accounting in tick_nohz_start_idle().
+	 */
+	ts->inidle = 1;
+	tick_nohz_stop_sched_tick(ts);
+	rcu_idle_enter();
+
+	local_irq_enable();
+}
+
+/**
+ * tick_nohz_irq_exit - update next tick event from interrupt exit
+ *
+ * When an interrupt fires while we are idle and it doesn't cause
+ * a reschedule, it may still add, modify or delete a timer, enqueue
+ * an RCU callback, etc...
+ * So we need to re-calculate and reprogram the next tick event.
+ */
+void tick_nohz_irq_exit(void)
+{
+	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
+
+	if (!ts->inidle)
+		return;
+
+	tick_nohz_stop_sched_tick(ts);
 }
 
 /**
@@ -487,11 +506,13 @@ static void tick_nohz_restart(struct tick_sched *ts, ktime_t now)
 }
 
 /**
- * tick_nohz_restart_sched_tick - restart the idle tick from the idle task
+ * tick_nohz_idle_exit - restart the idle tick from the idle task
  *
  * Restart the idle tick when the CPU is woken up from idle
+ * This also exit the RCU extended quiescent state. The CPU
+ * can use RCU again after this function is called.
  */
-void tick_nohz_restart_sched_tick(void)
+void tick_nohz_idle_exit(void)
 {
 	int cpu = smp_processor_id();
 	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH RFC tip/core/rcu 19/28] nohz: Allow rcu extended quiescent state handling seperately from tick stop
  2011-11-02 20:30 [PATCH RFC tip/core/rcu 0/28] Preview of RCU changes for 3.3 Paul E. McKenney
                   ` (17 preceding siblings ...)
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 18/28] nohz: Separate out irq exit and idle loop dyntick logic Paul E. McKenney
@ 2011-11-02 20:30 ` Paul E. McKenney
  2011-11-03  4:00   ` Josh Triplett
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 20/28] x86: Enter rcu extended qs after idle notifier call Paul E. McKenney
                   ` (9 subsequent siblings)
  28 siblings, 1 reply; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-02 20:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, Frederic Weisbecker, Mike Frysinger,
	Guan Xuetao, David Miller, Chris Metcalf, Hans-Christian Egtvedt,
	Ralf Baechle, Paul E. McKenney, Ingo Molnar, Peter Zijlstra,
	H. Peter Anvin, Russell King, Paul Mackerras, Heiko Carstens,
	Paul Mundt

From: Frederic Weisbecker <fweisbec@gmail.com>

It is assumed that rcu won't be used once we switch to tickless
mode and until we restart the tick. However this is not always
true, as in x86-64 where we dereference the idle notifiers after
the tick is stopped.

To prepare for fixing this, add two new APIs:
tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().

If no use of RCU is made in the idle loop between
tick_nohz_enter_idle() and tick_nohz_exit_idle() calls, the arch
must instead call the new *_norcu() version such that the arch doesn't
need to call rcu_idle_enter() and rcu_idle_exit().

Otherwise the arch must call tick_nohz_enter_idle() and
tick_nohz_exit_idle() and also call explicitly:

- rcu_idle_enter() after its last use of RCU before the CPU is put
to sleep.
- rcu_idle_exit() before the first use of RCU after the CPU is woken
up.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: David Miller <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 arch/arm/kernel/process.c              |    4 +-
 arch/avr32/kernel/process.c            |    4 +-
 arch/blackfin/kernel/process.c         |    4 +-
 arch/microblaze/kernel/process.c       |    4 +-
 arch/mips/kernel/process.c             |    4 +-
 arch/openrisc/kernel/idle.c            |    4 +-
 arch/powerpc/kernel/idle.c             |    4 +-
 arch/powerpc/platforms/iseries/setup.c |    8 +++---
 arch/s390/kernel/process.c             |    4 +-
 arch/sh/kernel/idle.c                  |    4 +-
 arch/sparc/kernel/process_64.c         |    4 +-
 arch/tile/kernel/process.c             |    4 +-
 arch/um/kernel/process.c               |    4 +-
 arch/unicore32/kernel/process.c        |    4 +-
 arch/x86/kernel/process_32.c           |    4 +-
 arch/x86/kernel/process_64.c           |    4 +-
 include/linux/tick.h                   |   46 +++++++++++++++++++++++++++++--
 kernel/time/tick-sched.c               |   25 +++++++++--------
 18 files changed, 90 insertions(+), 49 deletions(-)

diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index f9261d0..4f83362 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -183,7 +183,7 @@ void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter_norcu();
 		leds_event(led_idle_start);
 		while (!need_resched()) {
 #ifdef CONFIG_HOTPLUG_CPU
@@ -210,7 +210,7 @@ void cpu_idle(void)
 			}
 		}
 		leds_event(led_idle_end);
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit_norcu();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/avr32/kernel/process.c b/arch/avr32/kernel/process.c
index 6ee7952..34c8c70 100644
--- a/arch/avr32/kernel/process.c
+++ b/arch/avr32/kernel/process.c
@@ -34,10 +34,10 @@ void cpu_idle(void)
 {
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter_norcu();
 		while (!need_resched())
 			cpu_idle_sleep();
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit_norcu();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/blackfin/kernel/process.c b/arch/blackfin/kernel/process.c
index 7b141b5..57e0749 100644
--- a/arch/blackfin/kernel/process.c
+++ b/arch/blackfin/kernel/process.c
@@ -88,10 +88,10 @@ void cpu_idle(void)
 #endif
 		if (!idle)
 			idle = default_idle;
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter_norcu();
 		while (!need_resched())
 			idle();
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit_norcu();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/microblaze/kernel/process.c b/arch/microblaze/kernel/process.c
index 6dc123e..c6ece38 100644
--- a/arch/microblaze/kernel/process.c
+++ b/arch/microblaze/kernel/process.c
@@ -103,10 +103,10 @@ void cpu_idle(void)
 		if (!idle)
 			idle = default_idle;
 
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter_norcu();
 		while (!need_resched())
 			idle();
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit_norcu();
 
 		preempt_enable_no_resched();
 		schedule();
diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
index d50a005..7df2ffc 100644
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -56,7 +56,7 @@ void __noreturn cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter_norcu();
 		while (!need_resched() && cpu_online(cpu)) {
 #ifdef CONFIG_MIPS_MT_SMTC
 			extern void smtc_idle_loop_hook(void);
@@ -77,7 +77,7 @@ void __noreturn cpu_idle(void)
 		     system_state == SYSTEM_BOOTING))
 			play_dead();
 #endif
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit_norcu();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/openrisc/kernel/idle.c b/arch/openrisc/kernel/idle.c
index fb6a9bf..2e82cd0 100644
--- a/arch/openrisc/kernel/idle.c
+++ b/arch/openrisc/kernel/idle.c
@@ -51,7 +51,7 @@ void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter_norcu();
 
 		while (!need_resched()) {
 			check_pgt_cache();
@@ -69,7 +69,7 @@ void cpu_idle(void)
 			set_thread_flag(TIF_POLLING_NRFLAG);
 		}
 
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit_norcu();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
index 878572f..2e782a3 100644
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -56,7 +56,7 @@ void cpu_idle(void)
 
 	set_thread_flag(TIF_POLLING_NRFLAG);
 	while (1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter_norcu();
 		while (!need_resched() && !cpu_should_die()) {
 			ppc64_runlatch_off();
 
@@ -93,7 +93,7 @@ void cpu_idle(void)
 
 		HMT_medium();
 		ppc64_runlatch_on();
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit_norcu();
 		preempt_enable_no_resched();
 		if (cpu_should_die())
 			cpu_die();
diff --git a/arch/powerpc/platforms/iseries/setup.c b/arch/powerpc/platforms/iseries/setup.c
index e2f5fad..77ff6eb 100644
--- a/arch/powerpc/platforms/iseries/setup.c
+++ b/arch/powerpc/platforms/iseries/setup.c
@@ -562,7 +562,7 @@ static void yield_shared_processor(void)
 static void iseries_shared_idle(void)
 {
 	while (1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter_norcu();
 		while (!need_resched() && !hvlpevent_is_pending()) {
 			local_irq_disable();
 			ppc64_runlatch_off();
@@ -576,7 +576,7 @@ static void iseries_shared_idle(void)
 		}
 
 		ppc64_runlatch_on();
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit_norcu();
 
 		if (hvlpevent_is_pending())
 			process_iSeries_events();
@@ -592,7 +592,7 @@ static void iseries_dedicated_idle(void)
 	set_thread_flag(TIF_POLLING_NRFLAG);
 
 	while (1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter_norcu();
 		if (!need_resched()) {
 			while (!need_resched()) {
 				ppc64_runlatch_off();
@@ -609,7 +609,7 @@ static void iseries_dedicated_idle(void)
 		}
 
 		ppc64_runlatch_on();
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit_norcu();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/s390/kernel/process.c b/arch/s390/kernel/process.c
index db3e930..44028ae 100644
--- a/arch/s390/kernel/process.c
+++ b/arch/s390/kernel/process.c
@@ -90,10 +90,10 @@ static void default_idle(void)
 void cpu_idle(void)
 {
 	for (;;) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter_norcu();
 		while (!need_resched())
 			default_idle();
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit_norcu();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/sh/kernel/idle.c b/arch/sh/kernel/idle.c
index 6015743..ad58e75 100644
--- a/arch/sh/kernel/idle.c
+++ b/arch/sh/kernel/idle.c
@@ -89,7 +89,7 @@ void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter_norcu();
 
 		while (!need_resched()) {
 			check_pgt_cache();
@@ -111,7 +111,7 @@ void cpu_idle(void)
 			start_critical_timings();
 		}
 
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit_norcu();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/sparc/kernel/process_64.c b/arch/sparc/kernel/process_64.c
index 1235f63..78b1bc0 100644
--- a/arch/sparc/kernel/process_64.c
+++ b/arch/sparc/kernel/process_64.c
@@ -95,12 +95,12 @@ void cpu_idle(void)
 	set_thread_flag(TIF_POLLING_NRFLAG);
 
 	while(1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter_norcu();
 
 		while (!need_resched() && !cpu_is_offline(cpu))
 			sparc64_yield(cpu);
 
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit_norcu();
 
 		preempt_enable_no_resched();
 
diff --git a/arch/tile/kernel/process.c b/arch/tile/kernel/process.c
index 920e674..53ac895 100644
--- a/arch/tile/kernel/process.c
+++ b/arch/tile/kernel/process.c
@@ -85,7 +85,7 @@ void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter_norcu();
 		while (!need_resched()) {
 			if (cpu_is_offline(cpu))
 				BUG();  /* no HOTPLUG_CPU */
@@ -105,7 +105,7 @@ void cpu_idle(void)
 				local_irq_enable();
 			current_thread_info()->status |= TS_POLLING;
 		}
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit_norcu();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
index 41acf59..9e7176b 100644
--- a/arch/um/kernel/process.c
+++ b/arch/um/kernel/process.c
@@ -245,10 +245,10 @@ void default_idle(void)
 		if (need_resched())
 			schedule();
 
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter_norcu();
 		nsecs = disable_timer();
 		idle_sleep(nsecs);
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit_norcu();
 	}
 }
 
diff --git a/arch/unicore32/kernel/process.c b/arch/unicore32/kernel/process.c
index 9999b9a..095ff5a 100644
--- a/arch/unicore32/kernel/process.c
+++ b/arch/unicore32/kernel/process.c
@@ -55,7 +55,7 @@ void cpu_idle(void)
 {
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter_norcu();
 		while (!need_resched()) {
 			local_irq_disable();
 			stop_critical_timings();
@@ -63,7 +63,7 @@ void cpu_idle(void)
 			local_irq_enable();
 			start_critical_timings();
 		}
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit_norcu();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index ad93205..f311d096 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -98,7 +98,7 @@ void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter_norcu();
 		while (!need_resched()) {
 
 			check_pgt_cache();
@@ -114,7 +114,7 @@ void cpu_idle(void)
 				pm_idle();
 			start_critical_timings();
 		}
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit_norcu();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 9ca714e..e72daf9 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -121,7 +121,7 @@ void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter_norcu();
 		while (!need_resched()) {
 
 			rmb();
@@ -147,7 +147,7 @@ void cpu_idle(void)
 			__exit_idle();
 		}
 
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit_norcu();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/include/linux/tick.h b/include/linux/tick.h
index 0df1d50..327434a 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -7,6 +7,7 @@
 #define _LINUX_TICK_H
 
 #include <linux/clockchips.h>
+#include <linux/irqflags.h>
 
 #ifdef CONFIG_GENERIC_CLOCKEVENTS
 
@@ -121,18 +122,57 @@ static inline int tick_oneshot_mode_active(void) { return 0; }
 #endif /* !CONFIG_GENERIC_CLOCKEVENTS */
 
 # ifdef CONFIG_NO_HZ
-extern void tick_nohz_idle_enter(void);
+extern void __tick_nohz_idle_enter(void);
+static inline void tick_nohz_idle_enter(void)
+{
+	local_irq_disable();
+	__tick_nohz_idle_enter();
+	local_irq_enable();
+}
 extern void tick_nohz_idle_exit(void);
+
+/*
+ * Call this pair of function if the arch doesn't make any use
+ * of RCU in-between. You won't need to call rcu_idle_enter() and
+ * rcu_idle_exit().
+ * Otherwise you need to call tick_nohz_idle_enter() and tick_nohz_idle_exit()
+ * and explicitly tell RCU about the window around the place the CPU enters low
+ * power mode where no RCU use is made. This is done by calling rcu_idle_enter()
+ * after the last use of RCU before the CPU is put to sleep and by calling
+ * rcu_idle_exit() before the first use of RCU after the CPU woke up.
+ */
+static inline void tick_nohz_idle_enter_norcu(void)
+{
+	/*
+	 * Also call rcu_idle_enter() in the irq disabled section even
+	 * if it disables irq itself.
+	 * Just an optimization that prevents from an interrupt happening
+	 * between it and __tick_nohz_idle_enter() to lose time to help
+	 * completing a grace period while we could be in extended grace
+	 * period already.
+	 */
+	local_irq_disable();
+	__tick_nohz_idle_enter();
+	rcu_idle_enter();
+	local_irq_enable();
+}
+static inline void tick_nohz_idle_exit_norcu(void)
+{
+	rcu_idle_exit();
+	tick_nohz_idle_exit();
+}
 extern void tick_nohz_irq_exit(void);
 extern ktime_t tick_nohz_get_sleep_length(void);
 extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
 extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
 # else
-static inline void tick_nohz_idle_enter(void)
+static inline void tick_nohz_idle_enter(void) { }
+static inline void tick_nohz_idle_exit(void) { }
+static inline void tick_nohz_idle_enter_norcu(void)
 {
 	rcu_idle_enter();
 }
-static inline void tick_nohz_idle_exit(void)
+static inline void tick_nohz_idle_exit_norcu(void)
 {
 	rcu_idle_exit();
 }
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 52b7ace..360d028 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -424,18 +424,22 @@ out:
  *
  * When the next event is more than a tick into the future, stop the idle tick
  * Called when we start the idle loop.
- * This also enters into RCU extended quiescent state so that this CPU doesn't
- * need anymore to be part of any global grace period completion. This way
- * the tick can be stopped safely as we don't need to report quiescent states.
+ *
+ * If no use of RCU is made in the idle loop between
+ * tick_nohz_idle_enter() and tick_nohz_idle_exit() calls, then
+ * tick_nohz_idle_enter_norcu() should be called instead and the arch
+ * doesn't need to call rcu_idle_enter() and rcu_idle_exit() explicitly.
+ *
+ * Otherwise the arch is responsible of calling:
+ *
+ * - rcu_idle_enter() after its last use of RCU before the CPU is put
+ *  to sleep.
+ * - rcu_idle_exit() before the first use of RCU after the CPU is woken up.
  */
-void tick_nohz_idle_enter(void)
+void __tick_nohz_idle_enter(void)
 {
 	struct tick_sched *ts;
 
-	WARN_ON_ONCE(irqs_disabled());
-
-	local_irq_disable();
-
 	ts = &__get_cpu_var(tick_cpu_sched);
 	/*
 	 * set ts->inidle unconditionally. even if the system did not
@@ -444,9 +448,6 @@ void tick_nohz_idle_enter(void)
 	 */
 	ts->inidle = 1;
 	tick_nohz_stop_sched_tick(ts);
-	rcu_idle_enter();
-
-	local_irq_enable();
 }
 
 /**
@@ -522,7 +523,7 @@ void tick_nohz_idle_exit(void)
 	ktime_t now;
 
 	local_irq_disable();
-	rcu_idle_exit();
+
 	if (ts->idle_active || (ts->inidle && ts->tick_stopped))
 		now = ktime_get();
 
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH RFC tip/core/rcu 20/28] x86: Enter rcu extended qs after idle notifier call
  2011-11-02 20:30 [PATCH RFC tip/core/rcu 0/28] Preview of RCU changes for 3.3 Paul E. McKenney
                   ` (18 preceding siblings ...)
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 19/28] nohz: Allow rcu extended quiescent state handling seperately from tick stop Paul E. McKenney
@ 2011-11-02 20:30 ` Paul E. McKenney
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 21/28] x86: Call idle notifier after irq_enter() Paul E. McKenney
                   ` (8 subsequent siblings)
  28 siblings, 0 replies; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-02 20:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, Frederic Weisbecker, Paul E. McKenney,
	Ingo Molnar, H. Peter Anvin

From: Frederic Weisbecker <fweisbec@gmail.com>

The idle notifier, called by enter_idle(), enters into rcu read
side critical section but at that time we already switched into
the RCU-idle window (rcu_idle_enter() has been called). And it's
illegal to use rcu_read_lock() in that state.

This results in rcu reporting its bad mood:

[    1.275635] WARNING: at include/linux/rcupdate.h:194 __atomic_notifier_call_chain+0xd2/0x110()
[    1.275635] Hardware name: AMD690VM-FMH
[    1.275635] Modules linked in:
[    1.275635] Pid: 0, comm: swapper Not tainted 3.0.0-rc6+ #252
[    1.275635] Call Trace:
[    1.275635]  [<ffffffff81051c8a>] warn_slowpath_common+0x7a/0xb0
[    1.275635]  [<ffffffff81051cd5>] warn_slowpath_null+0x15/0x20
[    1.275635]  [<ffffffff817d6f22>] __atomic_notifier_call_chain+0xd2/0x110
[    1.275635]  [<ffffffff817d6f71>] atomic_notifier_call_chain+0x11/0x20
[    1.275635]  [<ffffffff810018a0>] enter_idle+0x20/0x30
[    1.275635]  [<ffffffff81001995>] cpu_idle+0xa5/0x110
[    1.275635]  [<ffffffff817a7465>] rest_init+0xe5/0x140
[    1.275635]  [<ffffffff817a73c8>] ? rest_init+0x48/0x140
[    1.275635]  [<ffffffff81cc5ca3>] start_kernel+0x3d1/0x3dc
[    1.275635]  [<ffffffff81cc5321>] x86_64_start_reservations+0x131/0x135
[    1.275635]  [<ffffffff81cc5412>] x86_64_start_kernel+0xed/0xf4
[    1.275635] ---[ end trace a22d306b065d4a66 ]---

Fix this by entering rcu extended quiescent state later, just before
the CPU goes to sleep.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 arch/x86/kernel/process_64.c |   10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index e72daf9..4a1535a 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -121,7 +121,7 @@ void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_idle_enter_norcu();
+		tick_nohz_idle_enter();
 		while (!need_resched()) {
 
 			rmb();
@@ -137,8 +137,14 @@ void cpu_idle(void)
 			enter_idle();
 			/* Don't trace irqs off for idle */
 			stop_critical_timings();
+
+			/* enter_idle() needs rcu for notifiers */
+			rcu_idle_enter();
+
 			if (cpuidle_idle_call())
 				pm_idle();
+
+			rcu_idle_exit();
 			start_critical_timings();
 
 			/* In many cases the interrupt that ended idle
@@ -147,7 +153,7 @@ void cpu_idle(void)
 			__exit_idle();
 		}
 
-		tick_nohz_idle_exit_norcu();
+		tick_nohz_idle_exit();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH RFC tip/core/rcu 21/28] x86: Call idle notifier after irq_enter()
  2011-11-02 20:30 [PATCH RFC tip/core/rcu 0/28] Preview of RCU changes for 3.3 Paul E. McKenney
                   ` (19 preceding siblings ...)
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 20/28] x86: Enter rcu extended qs after idle notifier call Paul E. McKenney
@ 2011-11-02 20:30 ` Paul E. McKenney
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 22/28] rcu: Fix early call to rcu_idle_enter() Paul E. McKenney
                   ` (7 subsequent siblings)
  28 siblings, 0 replies; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-02 20:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, Frederic Weisbecker, Paul E. McKenney,
	Ingo Molnar, H. Peter Anvin, Andy Henroid

From: Frederic Weisbecker <fweisbec@gmail.com>

Interrupts notify the idle exit state before calling irq_enter().
But the notifier code calls rcu_read_lock() and this is not
allowed while rcu is in an extended quiescent state. We need
to wait for irq_enter() -> rcu_idle_exit() to be called before
doing so otherwise this results in a grumpy RCU:

[    0.099991] WARNING: at include/linux/rcupdate.h:194 __atomic_notifier_call_chain+0xd2/0x110()
[    0.099991] Hardware name: AMD690VM-FMH
[    0.099991] Modules linked in:
[    0.099991] Pid: 0, comm: swapper Not tainted 3.0.0-rc6+ #255
[    0.099991] Call Trace:
[    0.099991]  <IRQ>  [<ffffffff81051c8a>] warn_slowpath_common+0x7a/0xb0
[    0.099991]  [<ffffffff81051cd5>] warn_slowpath_null+0x15/0x20
[    0.099991]  [<ffffffff817d6fa2>] __atomic_notifier_call_chain+0xd2/0x110
[    0.099991]  [<ffffffff817d6ff1>] atomic_notifier_call_chain+0x11/0x20
[    0.099991]  [<ffffffff81001873>] exit_idle+0x43/0x50
[    0.099991]  [<ffffffff81020439>] smp_apic_timer_interrupt+0x39/0xa0
[    0.099991]  [<ffffffff817da253>] apic_timer_interrupt+0x13/0x20
[    0.099991]  <EOI>  [<ffffffff8100ae67>] ? default_idle+0xa7/0x350
[    0.099991]  [<ffffffff8100ae65>] ? default_idle+0xa5/0x350
[    0.099991]  [<ffffffff8100b19b>] amd_e400_idle+0x8b/0x110
[    0.099991]  [<ffffffff810cb01f>] ? rcu_enter_nohz+0x8f/0x160
[    0.099991]  [<ffffffff810019a0>] cpu_idle+0xb0/0x110
[    0.099991]  [<ffffffff817a7505>] rest_init+0xe5/0x140
[    0.099991]  [<ffffffff817a7468>] ? rest_init+0x48/0x140
[    0.099991]  [<ffffffff81cc5ca3>] start_kernel+0x3d1/0x3dc
[    0.099991]  [<ffffffff81cc5321>] x86_64_start_reservations+0x131/0x135
[    0.099991]  [<ffffffff81cc5412>] x86_64_start_kernel+0xed/0xf4

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Henroid <andrew.d.henroid@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 arch/x86/kernel/apic/apic.c              |    6 +++---
 arch/x86/kernel/apic/io_apic.c           |    2 +-
 arch/x86/kernel/cpu/mcheck/therm_throt.c |    2 +-
 arch/x86/kernel/cpu/mcheck/threshold.c   |    2 +-
 arch/x86/kernel/irq.c                    |    6 +++---
 5 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 52fa563..54e3472 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -857,8 +857,8 @@ void __irq_entry smp_apic_timer_interrupt(struct pt_regs *regs)
 	 * Besides, if we don't timer interrupts ignore the global
 	 * interrupt lock, which is the WrongThing (tm) to do.
 	 */
-	exit_idle();
 	irq_enter();
+	exit_idle();
 	local_apic_timer_interrupt();
 	irq_exit();
 
@@ -1791,8 +1791,8 @@ void smp_spurious_interrupt(struct pt_regs *regs)
 {
 	u32 v;
 
-	exit_idle();
 	irq_enter();
+	exit_idle();
 	/*
 	 * Check if this really is a spurious interrupt and ACK it
 	 * if it is a vectored one.  Just in case...
@@ -1828,8 +1828,8 @@ void smp_error_interrupt(struct pt_regs *regs)
 		"Illegal register address",	/* APIC Error Bit 7 */
 	};
 
-	exit_idle();
 	irq_enter();
+	exit_idle();
 	/* First tickle the hardware, only then report what went on. -- REW */
 	v0 = apic_read(APIC_ESR);
 	apic_write(APIC_ESR, 0);
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 8eb863e..3f4a706 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2316,8 +2316,8 @@ asmlinkage void smp_irq_move_cleanup_interrupt(void)
 	unsigned vector, me;
 
 	ack_APIC_irq();
-	exit_idle();
 	irq_enter();
+	exit_idle();
 
 	me = smp_processor_id();
 	for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
index 27c6251..f6bbc64 100644
--- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
+++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
@@ -396,8 +396,8 @@ static void (*smp_thermal_vector)(void) = unexpected_thermal_interrupt;
 
 asmlinkage void smp_thermal_interrupt(struct pt_regs *regs)
 {
-	exit_idle();
 	irq_enter();
+	exit_idle();
 	inc_irq_stat(irq_thermal_count);
 	smp_thermal_vector();
 	irq_exit();
diff --git a/arch/x86/kernel/cpu/mcheck/threshold.c b/arch/x86/kernel/cpu/mcheck/threshold.c
index d746df2..aa578ca 100644
--- a/arch/x86/kernel/cpu/mcheck/threshold.c
+++ b/arch/x86/kernel/cpu/mcheck/threshold.c
@@ -19,8 +19,8 @@ void (*mce_threshold_vector)(void) = default_threshold_interrupt;
 
 asmlinkage void smp_threshold_interrupt(void)
 {
-	exit_idle();
 	irq_enter();
+	exit_idle();
 	inc_irq_stat(irq_threshold_count);
 	mce_threshold_vector();
 	irq_exit();
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 6c0802e..73cf928 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -180,8 +180,8 @@ unsigned int __irq_entry do_IRQ(struct pt_regs *regs)
 	unsigned vector = ~regs->orig_ax;
 	unsigned irq;
 
-	exit_idle();
 	irq_enter();
+	exit_idle();
 
 	irq = __this_cpu_read(vector_irq[vector]);
 
@@ -208,10 +208,10 @@ void smp_x86_platform_ipi(struct pt_regs *regs)
 
 	ack_APIC_irq();
 
-	exit_idle();
-
 	irq_enter();
 
+	exit_idle();
+
 	inc_irq_stat(x86_platform_ipis);
 
 	if (x86_platform_ipi_callback)
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH RFC tip/core/rcu 22/28] rcu: Fix early call to rcu_idle_enter()
  2011-11-02 20:30 [PATCH RFC tip/core/rcu 0/28] Preview of RCU changes for 3.3 Paul E. McKenney
                   ` (20 preceding siblings ...)
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 21/28] x86: Call idle notifier after irq_enter() Paul E. McKenney
@ 2011-11-02 20:30 ` Paul E. McKenney
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 23/28] powerpc: Tell RCU about idle after hcall tracing Paul E. McKenney
                   ` (6 subsequent siblings)
  28 siblings, 0 replies; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-02 20:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, Frederic Weisbecker, Ingo Molnar,
	Peter Zijlstra, Paul E. McKenney

From: Frederic Weisbecker <fweisbec@gmail.com>

On the irq exit path, tick_nohz_irq_exit()
may raise a softirq, which action leads to the wake up
path and select_task_rq_fair() that makes use of rcu
to iterate the domains.

This is an illegal use of RCU because we may be in RCU
extended quiescent state if we interrupted an RCU-idle
window in the idle loop:

[  132.978883] ===============================
[  132.978883] [ INFO: suspicious RCU usage. ]
[  132.978883] -------------------------------
[  132.978883] kernel/sched_fair.c:1707 suspicious rcu_dereference_check() usage!
[  132.978883]
[  132.978883] other info that might help us debug this:
[  132.978883]
[  132.978883]
[  132.978883] rcu_scheduler_active = 1, debug_locks = 0
[  132.978883] RCU used illegally from extended quiescent state!
[  132.978883] 2 locks held by swapper/0:
[  132.978883]  #0:  (&p->pi_lock){-.-.-.}, at: [<ffffffff8105a729>] try_to_wake_up+0x39/0x2f0
[  132.978883]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff8105556a>] select_task_rq_fair+0x6a/0xec0
[  132.978883]
[  132.978883] stack backtrace:
[  132.978883] Pid: 0, comm: swapper Tainted: G        W   3.0.0+ #178
[  132.978883] Call Trace:
[  132.978883]  <IRQ>  [<ffffffff810a01f6>] lockdep_rcu_suspicious+0xe6/0x100
[  132.978883]  [<ffffffff81055c49>] select_task_rq_fair+0x749/0xec0
[  132.978883]  [<ffffffff8105556a>] ? select_task_rq_fair+0x6a/0xec0
[  132.978883]  [<ffffffff812fe494>] ? do_raw_spin_lock+0x54/0x150
[  132.978883]  [<ffffffff810a1f2d>] ? trace_hardirqs_on+0xd/0x10
[  132.978883]  [<ffffffff8105a7c3>] try_to_wake_up+0xd3/0x2f0
[  132.978883]  [<ffffffff81094f98>] ? ktime_get+0x68/0xf0
[  132.978883]  [<ffffffff8105aa35>] wake_up_process+0x15/0x20
[  132.978883]  [<ffffffff81069dd5>] raise_softirq_irqoff+0x65/0x110
[  132.978883]  [<ffffffff8108eb65>] __hrtimer_start_range_ns+0x415/0x5a0
[  132.978883]  [<ffffffff812fe3ee>] ? do_raw_spin_unlock+0x5e/0xb0
[  132.978883]  [<ffffffff8108ed08>] hrtimer_start+0x18/0x20
[  132.978883]  [<ffffffff8109c9c3>] tick_nohz_stop_sched_tick+0x393/0x450
[  132.978883]  [<ffffffff810694f2>] irq_exit+0xd2/0x100
[  132.978883]  [<ffffffff81829e96>] do_IRQ+0x66/0xe0
[  132.978883]  [<ffffffff81820d53>] common_interrupt+0x13/0x13
[  132.978883]  <EOI>  [<ffffffff8103434b>] ? native_safe_halt+0xb/0x10
[  132.978883]  [<ffffffff810a1f2d>] ? trace_hardirqs_on+0xd/0x10
[  132.978883]  [<ffffffff810144ea>] default_idle+0xba/0x370
[  132.978883]  [<ffffffff810147fe>] amd_e400_idle+0x5e/0x130
[  132.978883]  [<ffffffff8100a9f6>] cpu_idle+0xb6/0x120
[  132.978883]  [<ffffffff817f217f>] rest_init+0xef/0x150
[  132.978883]  [<ffffffff817f20e2>] ? rest_init+0x52/0x150
[  132.978883]  [<ffffffff81ed9cf3>] start_kernel+0x3da/0x3e5
[  132.978883]  [<ffffffff81ed9346>] x86_64_start_reservations+0x131/0x135
[  132.978883]  [<ffffffff81ed944d>] x86_64_start_kernel+0x103/0x112

Fix this by calling rcu_idle_enter() after tick_nohz_irq_exit().

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/softirq.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index d2be0e0..328aabb 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -347,12 +347,12 @@ void irq_exit(void)
 	if (!in_interrupt() && local_softirq_pending())
 		invoke_softirq();
 
-	rcu_irq_exit();
 #ifdef CONFIG_NO_HZ
 	/* Make sure that timer wheel updates are propagated */
 	if (idle_cpu(smp_processor_id()) && !in_interrupt() && !need_resched())
 		tick_nohz_irq_exit();
 #endif
+	rcu_irq_exit();
 	preempt_enable_no_resched();
 }
 
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH RFC tip/core/rcu 23/28] powerpc: Tell RCU about idle after hcall tracing
  2011-11-02 20:30 [PATCH RFC tip/core/rcu 0/28] Preview of RCU changes for 3.3 Paul E. McKenney
                   ` (21 preceding siblings ...)
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 22/28] rcu: Fix early call to rcu_idle_enter() Paul E. McKenney
@ 2011-11-02 20:30 ` Paul E. McKenney
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 24/28] rcu: Introduce bulk reference count Paul E. McKenney
                   ` (5 subsequent siblings)
  28 siblings, 0 replies; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-02 20:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, Paul E. McKenney, Paul E. McKenney

From: Paul E. McKenney <paul.mckenney@linaro.org>

The PowerPC pSeries platform (CONFIG_PPC_PSERIES=y) enables
hypervisor-call tracing for CONFIG_TRACEPOINTS=y kernels.  One of the
hypervisor calls that is traced is the H_CEDE call in the idle loop
that tells the hypervisor that this OS instance no longer needs the
current CPU.  However, tracing uses RCU, so this combination of kernel
configuration variables needs to avoid telling RCU about the current CPU's
idleness until after the H_CEDE-entry tracing completes on the one hand,
and must tell RCU that the the current CPU is no longer idle before the
H_CEDE-exit tracing starts.

In all other cases, it suffices to inform RCU of CPU idleness upon
idle-loop entry and exit.

This commit makes the required adjustments.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/idle.c            |   16 ++++++++++++++--
 arch/powerpc/platforms/pseries/lpar.c |    4 ++++
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
index 2e782a3..3cd73d1 100644
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -46,6 +46,12 @@ static int __init powersave_off(char *arg)
 }
 __setup("powersave=off", powersave_off);
 
+#if defined(CONFIG_PPC_PSERIES) && defined(CONFIG_TRACEPOINTS)
+static const bool idle_uses_rcu = 1;
+#else
+static const bool idle_uses_rcu;
+#endif
+
 /*
  * The body of the idle task.
  */
@@ -56,7 +62,10 @@ void cpu_idle(void)
 
 	set_thread_flag(TIF_POLLING_NRFLAG);
 	while (1) {
-		tick_nohz_idle_enter_norcu();
+		if (idle_uses_rcu)
+			tick_nohz_idle_enter();
+		else
+			tick_nohz_idle_enter_norcu();
 		while (!need_resched() && !cpu_should_die()) {
 			ppc64_runlatch_off();
 
@@ -93,7 +102,10 @@ void cpu_idle(void)
 
 		HMT_medium();
 		ppc64_runlatch_on();
-		tick_nohz_idle_exit_norcu();
+		if (idle_uses_rcu)
+			tick_nohz_idle_exit();
+		else
+			tick_nohz_idle_exit_norcu();
 		preempt_enable_no_resched();
 		if (cpu_should_die())
 			cpu_die();
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index c9a29da..be2a026 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -554,6 +554,8 @@ void __trace_hcall_entry(unsigned long opcode, unsigned long *args)
 
 	(*depth)++;
 	trace_hcall_entry(opcode, args);
+	if (opcode == H_CEDE)
+		rcu_idle_enter();
 	(*depth)--;
 
 out:
@@ -574,6 +576,8 @@ void __trace_hcall_exit(long opcode, unsigned long retval,
 		goto out;
 
 	(*depth)++;
+	if (opcode == H_CEDE)
+		rcu_idle_exit();
 	trace_hcall_exit(opcode, retval, retbuf);
 	(*depth)--;
 
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH RFC tip/core/rcu 24/28] rcu: Introduce bulk reference count
  2011-11-02 20:30 [PATCH RFC tip/core/rcu 0/28] Preview of RCU changes for 3.3 Paul E. McKenney
                   ` (22 preceding siblings ...)
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 23/28] powerpc: Tell RCU about idle after hcall tracing Paul E. McKenney
@ 2011-11-02 20:30 ` Paul E. McKenney
  2011-11-03  4:34   ` Josh Triplett
  2011-11-28 12:41   ` Peter Zijlstra
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 25/28] rcu: Deconfuse dynticks entry-exit tracing Paul E. McKenney
                   ` (4 subsequent siblings)
  28 siblings, 2 replies; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-02 20:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, Paul E. McKenney

The RCU implementations, including SRCU, are designed to be used in a
lock-like fashion, so that the read-side lock and unlock primitives must
execute in the same context for any given read-side critical section.
This constraint is enforced by lockdep-RCU.  However, there is a need for
something that acts more like a reference count than a lock, in order
to allow (for example) the reference to be acquired within the context
of an exception, while that same reference is released in the context of
the task that encountered the exception.  The cost of this capability is
that the read-side operations incur the overhead of disabling interrupts.
Some optimization is possible, and will be carried out if warranted.

Note that although the current implementation allows a given reference to
be acquired by one task and then released by another, all known possible
implementations that allow this have scalability problems.  Therefore,
a given reference must be released by the same task that acquired it,
though perhaps from an interrupt or exception handler running within
that task's context.

Requested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 include/linux/srcu.h |   50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/srcu.c        |    3 ++-
 2 files changed, 52 insertions(+), 1 deletions(-)

diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index d4b1244..d5334d0 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -181,4 +181,54 @@ static inline void srcu_read_unlock(struct srcu_struct *sp, int idx)
 	__srcu_read_unlock(sp, idx);
 }
 
+/* Definitions for bulkref_t, currently defined in terms of SRCU. */
+
+typedef struct srcu_struct bulkref_t;
+int init_srcu_struct_fields(struct srcu_struct *sp);
+
+static inline int init_bulkref(bulkref_t *brp)
+{
+	return init_srcu_struct_fields(brp);
+}
+
+static inline void cleanup_bulkref(bulkref_t *brp)
+{
+	cleanup_srcu_struct(brp);
+}
+
+static inline int bulkref_get(bulkref_t *brp)
+{
+	unsigned long flags;
+	int ret;
+
+	local_irq_save(flags);
+	ret =  __srcu_read_lock(brp);
+	local_irq_restore(flags);
+	return ret;
+}
+
+static inline void bulkref_put(bulkref_t *brp, int idx)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	__srcu_read_unlock(brp, idx);
+	local_irq_restore(flags);
+}
+
+static inline void bulkref_wait_old(bulkref_t *brp)
+{
+	synchronize_srcu(brp);
+}
+
+static inline void bulkref_wait_old_expedited(bulkref_t *brp)
+{
+	synchronize_srcu_expedited(brp);
+}
+
+static inline long bulkref_batches_completed(bulkref_t *brp)
+{
+	return srcu_batches_completed(brp);
+}
+
 #endif
diff --git a/kernel/srcu.c b/kernel/srcu.c
index 73ce23f..10214c8 100644
--- a/kernel/srcu.c
+++ b/kernel/srcu.c
@@ -34,13 +34,14 @@
 #include <linux/delay.h>
 #include <linux/srcu.h>
 
-static int init_srcu_struct_fields(struct srcu_struct *sp)
+int init_srcu_struct_fields(struct srcu_struct *sp)
 {
 	sp->completed = 0;
 	mutex_init(&sp->mutex);
 	sp->per_cpu_ref = alloc_percpu(struct srcu_struct_array);
 	return sp->per_cpu_ref ? 0 : -ENOMEM;
 }
+EXPORT_SYMBOL_GPL(init_srcu_struct_fields);
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH RFC tip/core/rcu 25/28] rcu: Deconfuse dynticks entry-exit tracing
  2011-11-02 20:30 [PATCH RFC tip/core/rcu 0/28] Preview of RCU changes for 3.3 Paul E. McKenney
                   ` (23 preceding siblings ...)
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 24/28] rcu: Introduce bulk reference count Paul E. McKenney
@ 2011-11-02 20:30 ` Paul E. McKenney
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 26/28] rcu: Add more information to the wrong-idle-task complaint Paul E. McKenney
                   ` (3 subsequent siblings)
  28 siblings, 0 replies; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-02 20:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, Paul E. McKenney, Paul E. McKenney

From: Paul E. McKenney <paul.mckenney@linaro.org>

The trace_rcu_dyntick() trace event did not print both the old and
the new value of the nesting level, and furthermore printed only
the low-order 32 bits of it.  This could result in some confusion
when interpreting trace-event dumps, so this commit prints both
the old and the new value, prints the full 64 bits, and also selects
the process-entry/exit increment to print nicely in hexadecimal.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/trace/events/rcu.h |   15 +++++++++------
 kernel/rcu.h               |    7 +++++++
 kernel/rcutiny.c           |   28 +++++++++++++++++-----------
 kernel/rcutree.c           |   35 ++++++++++++++++++++---------------
 4 files changed, 53 insertions(+), 32 deletions(-)

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index 172620a..c29fb2f 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -246,21 +246,24 @@ TRACE_EVENT(rcu_fqs,
  */
 TRACE_EVENT(rcu_dyntick,
 
-	TP_PROTO(char *polarity, int nesting),
+	TP_PROTO(char *polarity, long long oldnesting, long long newnesting),
 
-	TP_ARGS(polarity, nesting),
+	TP_ARGS(polarity, oldnesting, newnesting),
 
 	TP_STRUCT__entry(
 		__field(char *, polarity)
-		__field(int, nesting)
+		__field(long long, oldnesting)
+		__field(long long, newnesting)
 	),
 
 	TP_fast_assign(
 		__entry->polarity = polarity;
-		__entry->nesting = nesting;
+		__entry->oldnesting = oldnesting;
+		__entry->newnesting = newnesting;
 	),
 
-	TP_printk("%s %d", __entry->polarity, __entry->nesting)
+	TP_printk("%s %llx %llx", __entry->polarity,
+		  __entry->oldnesting, __entry->newnesting)
 );
 
 /*
@@ -470,7 +473,7 @@ TRACE_EVENT(rcu_torture_read,
 #define trace_rcu_unlock_preempted_task(rcuname, gpnum, pid) do { } while (0)
 #define trace_rcu_quiescent_state_report(rcuname, gpnum, mask, qsmask, level, grplo, grphi, gp_tasks) do { } while (0)
 #define trace_rcu_fqs(rcuname, gpnum, cpu, qsevent) do { } while (0)
-#define trace_rcu_dyntick(polarity, nesting) do { } while (0)
+#define trace_rcu_dyntick(polarity, oldnesting, newnesting) do { } while (0)
 #define trace_rcu_callback(rcuname, rhp, qlen) do { } while (0)
 #define trace_rcu_kfree_callback(rcuname, rhp, offset, qlen) do { } while (0)
 #define trace_rcu_batch_start(rcuname, qlen, blimit) do { } while (0)
diff --git a/kernel/rcu.h b/kernel/rcu.h
index f600868..aa88baa 100644
--- a/kernel/rcu.h
+++ b/kernel/rcu.h
@@ -30,6 +30,13 @@
 #endif /* #else #ifdef CONFIG_RCU_TRACE */
 
 /*
+ * Process-level increment to ->dynticks_nesting field.  This allows for
+ * architectures that use half-interrupts and half-exceptions from
+ * process context.
+ */
+#define DYNTICK_TASK_NESTING (LLONG_MAX / 2 - 1)
+
+/*
  * debug_rcu_head_queue()/debug_rcu_head_unqueue() are used internally
  * by call_rcu() and rcu callback execution, and are therefore not part of the
  * RCU API. Leaving in rcupdate.h because they are used by all RCU flavors.
diff --git a/kernel/rcutiny.c b/kernel/rcutiny.c
index 089820d..e0df33f 100644
--- a/kernel/rcutiny.c
+++ b/kernel/rcutiny.c
@@ -54,20 +54,21 @@ static void __call_rcu(struct rcu_head *head,
 
 #include "rcutiny_plugin.h"
 
-static long long rcu_dynticks_nesting = LLONG_MAX / 2;
+static long long rcu_dynticks_nesting = DYNTICK_TASK_NESTING;
 
 /* Common code for rcu_idle_enter() and rcu_irq_exit(), see kernel/rcutree.c. */
-static void rcu_idle_enter_common(void)
+static void rcu_idle_enter_common(long long oldval)
 {
 	if (rcu_dynticks_nesting) {
-		RCU_TRACE(trace_rcu_dyntick("--=", rcu_dynticks_nesting));
+		RCU_TRACE(trace_rcu_dyntick("--=",
+					    oldval, rcu_dynticks_nesting));
 		return;
 	}
-	RCU_TRACE(trace_rcu_dyntick("Start", rcu_dynticks_nesting));
+	RCU_TRACE(trace_rcu_dyntick("Start", oldval, rcu_dynticks_nesting));
 	if (!idle_cpu(smp_processor_id())) {
 		WARN_ON_ONCE(1);	/* must be idle task! */
 		RCU_TRACE(trace_rcu_dyntick("Error on entry: not idle task",
-					    rcu_dynticks_nesting));
+					    oldval, rcu_dynticks_nesting));
 		ftrace_dump(DUMP_ALL);
 	}
 	rcu_sched_qs(0); /* implies rcu_bh_qsctr_inc(0) */
@@ -80,10 +81,12 @@ static void rcu_idle_enter_common(void)
 void rcu_idle_enter(void)
 {
 	unsigned long flags;
+	long long oldval;
 
 	local_irq_save(flags);
+	oldval = rcu_dynticks_nesting;
 	rcu_dynticks_nesting = 0;
-	rcu_idle_enter_common();
+	rcu_idle_enter_common(oldval);
 	local_irq_restore(flags);
 }
 
@@ -93,11 +96,13 @@ void rcu_idle_enter(void)
 void rcu_irq_exit(void)
 {
 	unsigned long flags;
+	long long oldval;
 
 	local_irq_save(flags);
+	oldval = rcu_dynticks_nesting;
 	rcu_dynticks_nesting--;
 	WARN_ON_ONCE(rcu_dynticks_nesting < 0);
-	rcu_idle_enter_common();
+	rcu_idle_enter_common(oldval);
 	local_irq_restore(flags);
 }
 
@@ -105,14 +110,15 @@ void rcu_irq_exit(void)
 static void rcu_idle_exit_common(long long oldval)
 {
 	if (oldval) {
-		RCU_TRACE(trace_rcu_dyntick("++=", rcu_dynticks_nesting));
+		RCU_TRACE(trace_rcu_dyntick("++=",
+					    oldval, rcu_dynticks_nesting));
 		return;
 	}
-	RCU_TRACE(trace_rcu_dyntick("End", oldval));
+	RCU_TRACE(trace_rcu_dyntick("End", oldval, rcu_dynticks_nesting));
 	if (!idle_cpu(smp_processor_id())) {
 		WARN_ON_ONCE(1);	/* must be idle task! */
 		RCU_TRACE(trace_rcu_dyntick("Error on exit: not idle task",
-			  oldval));
+			  oldval, rcu_dynticks_nesting));
 		ftrace_dump(DUMP_ALL);
 	}
 }
@@ -128,7 +134,7 @@ void rcu_idle_exit(void)
 	local_irq_save(flags);
 	oldval = rcu_dynticks_nesting;
 	WARN_ON_ONCE(oldval != 0);
-	rcu_dynticks_nesting = LLONG_MAX / 2;
+	rcu_dynticks_nesting = DYNTICK_TASK_NESTING;
 	rcu_idle_exit_common(oldval);
 	local_irq_restore(flags);
 }
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 28f8f92..cc04876 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -196,7 +196,7 @@ void rcu_note_context_switch(int cpu)
 EXPORT_SYMBOL_GPL(rcu_note_context_switch);
 
 DEFINE_PER_CPU(struct rcu_dynticks, rcu_dynticks) = {
-	.dynticks_nesting = LLONG_MAX / 2,
+	.dynticks_nesting = DYNTICK_TASK_NESTING,
 	.dynticks = ATOMIC_INIT(1),
 };
 
@@ -348,17 +348,17 @@ static int rcu_implicit_offline_qs(struct rcu_data *rdp)
  * we really have entered idle, and must do the appropriate accounting.
  * The caller must have disabled interrupts.
  */
-static void rcu_idle_enter_common(struct rcu_dynticks *rdtp)
+static void rcu_idle_enter_common(struct rcu_dynticks *rdtp, long long oldval)
 {
 	if (rdtp->dynticks_nesting) {
-		trace_rcu_dyntick("--=", rdtp->dynticks_nesting);
+		trace_rcu_dyntick("--=", oldval, rdtp->dynticks_nesting);
 		return;
 	}
-	trace_rcu_dyntick("Start", rdtp->dynticks_nesting);
+	trace_rcu_dyntick("Start", oldval, rdtp->dynticks_nesting);
 	if (!idle_cpu(smp_processor_id())) {
 		WARN_ON_ONCE(1);	/* must be idle task! */
 		trace_rcu_dyntick("Error on entry: not idle task",
-				   rdtp->dynticks_nesting);
+				   oldval, rdtp->dynticks_nesting);
 		ftrace_dump(DUMP_ALL);
 	}
 	/* CPUs seeing atomic_inc() must see prior RCU read-side crit sects */
@@ -383,12 +383,14 @@ static void rcu_idle_enter_common(struct rcu_dynticks *rdtp)
 void rcu_idle_enter(void)
 {
 	unsigned long flags;
+	long long oldval;
 	struct rcu_dynticks *rdtp;
 
 	local_irq_save(flags);
 	rdtp = &__get_cpu_var(rcu_dynticks);
+	oldval = rdtp->dynticks_nesting;
 	rdtp->dynticks_nesting = 0;
-	rcu_idle_enter_common(rdtp);
+	rcu_idle_enter_common(rdtp, oldval);
 	local_irq_restore(flags);
 }
 
@@ -411,13 +413,15 @@ void rcu_idle_enter(void)
 void rcu_irq_exit(void)
 {
 	unsigned long flags;
+	long long oldval;
 	struct rcu_dynticks *rdtp;
 
 	local_irq_save(flags);
 	rdtp = &__get_cpu_var(rcu_dynticks);
+	oldval = rdtp->dynticks_nesting;
 	rdtp->dynticks_nesting--;
 	WARN_ON_ONCE(rdtp->dynticks_nesting < 0);
-	rcu_idle_enter_common(rdtp);
+	rcu_idle_enter_common(rdtp, oldval);
 	local_irq_restore(flags);
 }
 
@@ -431,7 +435,7 @@ void rcu_irq_exit(void)
 static void rcu_idle_exit_common(struct rcu_dynticks *rdtp, long long oldval)
 {
 	if (oldval) {
-		trace_rcu_dyntick("++=", rdtp->dynticks_nesting);
+		trace_rcu_dyntick("++=", oldval, rdtp->dynticks_nesting);
 		return;
 	}
 	smp_mb__before_atomic_inc();  /* Force ordering w/previous sojourn. */
@@ -439,10 +443,11 @@ static void rcu_idle_exit_common(struct rcu_dynticks *rdtp, long long oldval)
 	/* CPUs seeing atomic_inc() must see later RCU read-side crit sects */
 	smp_mb__after_atomic_inc();  /* See above. */
 	WARN_ON_ONCE(!(atomic_read(&rdtp->dynticks) & 0x1));
-	trace_rcu_dyntick("End", oldval);
+	trace_rcu_dyntick("End", oldval, rdtp->dynticks_nesting);
 	if (!idle_cpu(smp_processor_id())) {
 		WARN_ON_ONCE(1);	/* must be idle task! */
-		trace_rcu_dyntick("Error on exit: not idle task", oldval);
+		trace_rcu_dyntick("Error on exit: not idle task",
+				  oldval, rdtp->dynticks_nesting);
 		ftrace_dump(DUMP_ALL);
 	}
 }
@@ -453,8 +458,8 @@ static void rcu_idle_exit_common(struct rcu_dynticks *rdtp, long long oldval)
  * Exit idle mode, in other words, -enter- the mode in which RCU
  * read-side critical sections can occur.
  *
- * We crowbar the ->dynticks_nesting field to LLONG_MAX/2 to allow for
- * the possibility of usermode upcalls messing up our count
+ * We crowbar the ->dynticks_nesting field to DYNTICK_TASK_NESTING to
+ * allow for the possibility of usermode upcalls messing up our count
  * of interrupt nesting level during the busy period that is just
  * now starting.
  */
@@ -468,7 +473,7 @@ void rcu_idle_exit(void)
 	rdtp = &__get_cpu_var(rcu_dynticks);
 	oldval = rdtp->dynticks_nesting;
 	WARN_ON_ONCE(oldval != 0);
-	rdtp->dynticks_nesting = LLONG_MAX / 2;
+	rdtp->dynticks_nesting = DYNTICK_TASK_NESTING;
 	rcu_idle_exit_common(rdtp, oldval);
 	local_irq_restore(flags);
 }
@@ -2012,7 +2017,7 @@ rcu_boot_init_percpu_data(int cpu, struct rcu_state *rsp)
 		rdp->nxttail[i] = &rdp->nxtlist;
 	rdp->qlen = 0;
 	rdp->dynticks = &per_cpu(rcu_dynticks, cpu);
-	WARN_ON_ONCE(rdp->dynticks->dynticks_nesting != LLONG_MAX / 2);
+	WARN_ON_ONCE(rdp->dynticks->dynticks_nesting != DYNTICK_TASK_NESTING);
 	WARN_ON_ONCE(atomic_read(&rdp->dynticks->dynticks) != 1);
 	rdp->cpu = cpu;
 	rdp->rsp = rsp;
@@ -2040,7 +2045,7 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp, int preemptible)
 	rdp->qlen_last_fqs_check = 0;
 	rdp->n_force_qs_snap = rsp->n_force_qs;
 	rdp->blimit = blimit;
-	WARN_ON_ONCE(rdp->dynticks->dynticks_nesting != LLONG_MAX / 2);
+	WARN_ON_ONCE(rdp->dynticks->dynticks_nesting != DYNTICK_TASK_NESTING);
 	WARN_ON_ONCE((atomic_read(&rdp->dynticks->dynticks) & 0x1) != 1);
 	raw_spin_unlock(&rnp->lock);		/* irqs remain disabled. */
 
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH RFC tip/core/rcu 26/28] rcu: Add more information to the wrong-idle-task complaint
  2011-11-02 20:30 [PATCH RFC tip/core/rcu 0/28] Preview of RCU changes for 3.3 Paul E. McKenney
                   ` (24 preceding siblings ...)
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 25/28] rcu: Deconfuse dynticks entry-exit tracing Paul E. McKenney
@ 2011-11-02 20:30 ` Paul E. McKenney
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 27/28] rcu: Allow dyntick-idle mode for CPUs with callbacks Paul E. McKenney
                   ` (2 subsequent siblings)
  28 siblings, 0 replies; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-02 20:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, Paul E. McKenney, Paul E. McKenney

From: Paul E. McKenney <paul.mckenney@linaro.org>

The current code just complains if the current task is not the idle task.
This commit therefore adds printing of the identity of the idle task.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcutiny.c |   12 ++++++++++--
 kernel/rcutree.c |   12 ++++++++++--
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/kernel/rcutiny.c b/kernel/rcutiny.c
index e0df33f..f4e7bc3 100644
--- a/kernel/rcutiny.c
+++ b/kernel/rcutiny.c
@@ -66,10 +66,14 @@ static void rcu_idle_enter_common(long long oldval)
 	}
 	RCU_TRACE(trace_rcu_dyntick("Start", oldval, rcu_dynticks_nesting));
 	if (!idle_cpu(smp_processor_id())) {
-		WARN_ON_ONCE(1);	/* must be idle task! */
+		struct task_struct *idle = idle_task(smp_processor_id());
+
 		RCU_TRACE(trace_rcu_dyntick("Error on entry: not idle task",
 					    oldval, rcu_dynticks_nesting));
 		ftrace_dump(DUMP_ALL);
+		WARN_ONCE(1, "Current pid: %d comm: %s / Idle pid: %d comm: %s",
+			  current->pid, current->comm,
+			  idle->pid, idle->comm); /* must be idle task! */
 	}
 	rcu_sched_qs(0); /* implies rcu_bh_qsctr_inc(0) */
 }
@@ -116,10 +120,14 @@ static void rcu_idle_exit_common(long long oldval)
 	}
 	RCU_TRACE(trace_rcu_dyntick("End", oldval, rcu_dynticks_nesting));
 	if (!idle_cpu(smp_processor_id())) {
-		WARN_ON_ONCE(1);	/* must be idle task! */
+		struct task_struct *idle = idle_task(smp_processor_id());
+
 		RCU_TRACE(trace_rcu_dyntick("Error on exit: not idle task",
 			  oldval, rcu_dynticks_nesting));
 		ftrace_dump(DUMP_ALL);
+		WARN_ONCE(1, "Current pid: %d comm: %s / Idle pid: %d comm: %s",
+			  current->pid, current->comm,
+			  idle->pid, idle->comm); /* must be idle task! */
 	}
 }
 
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index cc04876..2a8d9a6 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -356,10 +356,14 @@ static void rcu_idle_enter_common(struct rcu_dynticks *rdtp, long long oldval)
 	}
 	trace_rcu_dyntick("Start", oldval, rdtp->dynticks_nesting);
 	if (!idle_cpu(smp_processor_id())) {
-		WARN_ON_ONCE(1);	/* must be idle task! */
+		struct task_struct *idle = idle_task(smp_processor_id());
+
 		trace_rcu_dyntick("Error on entry: not idle task",
 				   oldval, rdtp->dynticks_nesting);
 		ftrace_dump(DUMP_ALL);
+		WARN_ONCE(1, "Current pid: %d comm: %s / Idle pid: %d comm: %s",
+			  current->pid, current->comm,
+			  idle->pid, idle->comm); /* must be idle task! */
 	}
 	/* CPUs seeing atomic_inc() must see prior RCU read-side crit sects */
 	smp_mb__before_atomic_inc();  /* See above. */
@@ -445,10 +449,14 @@ static void rcu_idle_exit_common(struct rcu_dynticks *rdtp, long long oldval)
 	WARN_ON_ONCE(!(atomic_read(&rdtp->dynticks) & 0x1));
 	trace_rcu_dyntick("End", oldval, rdtp->dynticks_nesting);
 	if (!idle_cpu(smp_processor_id())) {
-		WARN_ON_ONCE(1);	/* must be idle task! */
+		struct task_struct *idle = idle_task(smp_processor_id());
+
 		trace_rcu_dyntick("Error on exit: not idle task",
 				  oldval, rdtp->dynticks_nesting);
 		ftrace_dump(DUMP_ALL);
+		WARN_ONCE(1, "Current pid: %d comm: %s / Idle pid: %d comm: %s",
+			  current->pid, current->comm,
+			  idle->pid, idle->comm); /* must be idle task! */
 	}
 }
 
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH RFC tip/core/rcu 27/28] rcu: Allow dyntick-idle mode for CPUs with callbacks
  2011-11-02 20:30 [PATCH RFC tip/core/rcu 0/28] Preview of RCU changes for 3.3 Paul E. McKenney
                   ` (25 preceding siblings ...)
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 26/28] rcu: Add more information to the wrong-idle-task complaint Paul E. McKenney
@ 2011-11-02 20:30 ` Paul E. McKenney
  2011-11-03  4:47   ` Josh Triplett
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 28/28] rcu: Fix idle-task checks Paul E. McKenney
  2011-11-03  4:55 ` [PATCH RFC tip/core/rcu 0/28] Preview of RCU changes for 3.3 Josh Triplett
  28 siblings, 1 reply; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-02 20:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, Paul E. McKenney, Paul E. McKenney

From: Paul E. McKenney <paul.mckenney@linaro.org>

Currently, RCU does not permit a CPU to enter dyntick-idle mode if that
CPU has any RCU callbacks queued.  This means that workloads for which
each CPU wakes up and does some RCU updates every few ticks will never
enter dyntick-idle mode.  This can result in significant unnecessary power
consumption, so this patch permits a given to enter dyntick-idle mode if
it has callbacks, but only if that same CPU has completed all current
work for the RCU core.  We determine use rcu_pending() to determine
whether a given CPU has completed all current work for the RCU core.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcutree.c        |    5 +-
 kernel/rcutree.h        |    4 +
 kernel/rcutree_plugin.h |  156 +++++++++++++++++++++++++++++++++++++----------
 3 files changed, 132 insertions(+), 33 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 2a8d9a6..3d7b474 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -365,6 +365,7 @@ static void rcu_idle_enter_common(struct rcu_dynticks *rdtp, long long oldval)
 			  current->pid, current->comm,
 			  idle->pid, idle->comm); /* must be idle task! */
 	}
+	rcu_prepare_for_idle(smp_processor_id());
 	/* CPUs seeing atomic_inc() must see prior RCU read-side crit sects */
 	smp_mb__before_atomic_inc();  /* See above. */
 	atomic_inc(&rdtp->dynticks);
@@ -1085,6 +1086,7 @@ static void rcu_report_qs_rsp(struct rcu_state *rsp, unsigned long flags)
 	 * callbacks are waiting on the grace period that just now
 	 * completed.
 	 */
+	rcu_schedule_wake_gp_end();
 	if (*rdp->nxttail[RCU_WAIT_TAIL] == NULL) {
 		raw_spin_unlock(&rnp->lock);	 /* irqs remain disabled. */
 
@@ -1670,6 +1672,7 @@ static void rcu_process_callbacks(struct softirq_action *unused)
 				&__get_cpu_var(rcu_sched_data));
 	__rcu_process_callbacks(&rcu_bh_state, &__get_cpu_var(rcu_bh_data));
 	rcu_preempt_process_callbacks();
+	rcu_wake_cpus_for_gp_end();
 	trace_rcu_utilization("End RCU core");
 }
 
@@ -1923,7 +1926,7 @@ static int rcu_pending(int cpu)
  * by the current CPU, even if none need be done immediately, returning
  * 1 if so.
  */
-static int rcu_needs_cpu_quick_check(int cpu)
+static int rcu_cpu_has_callbacks(int cpu)
 {
 	/* RCU callbacks either ready or pending? */
 	return per_cpu(rcu_sched_data, cpu).nxtlist ||
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index fd2f87d..ea32405 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -88,6 +88,7 @@ struct rcu_dynticks {
 				    /* Process level is worth LLONG_MAX/2. */
 	int dynticks_nmi_nesting;   /* Track NMI nesting level. */
 	atomic_t dynticks;	    /* Even value for idle, else odd. */
+	int wake_gp_end;	    /* A GP ended, need to wake up CPUs. */
 };
 
 /* RCU's kthread states for tracing. */
@@ -467,5 +468,8 @@ static void rcu_yield(void (*f)(unsigned long), unsigned long arg);
 #endif /* #ifdef CONFIG_RCU_BOOST */
 static void rcu_cpu_kthread_setrt(int cpu, int to_rt);
 static void __cpuinit rcu_prepare_kthreads(int cpu);
+static void rcu_prepare_for_idle(int cpu);
+static void rcu_wake_cpus_for_gp_end(void);
+static void rcu_schedule_wake_gp_end(void);
 
 #endif /* #ifndef RCU_TREE_NONCORE */
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 7a7961f..5509147 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -1953,7 +1953,31 @@ EXPORT_SYMBOL_GPL(synchronize_sched_expedited);
  */
 int rcu_needs_cpu(int cpu)
 {
-	return rcu_needs_cpu_quick_check(cpu);
+	return rcu_cpu_has_callbacks(cpu);
+}
+
+/*
+ * Do the idle-entry grace-period work, which, because CONFIG_RCU_FAST_NO_HZ=y,
+ * is nothing.
+ */
+static void rcu_prepare_for_idle(int cpu)
+{
+}
+
+/*
+ * CPUs are never putting themselves to sleep with callbacks pending,
+ * so there is no need to awaken them.
+ */
+static void rcu_wake_cpus_for_gp_end(void)
+{
+}
+
+/*
+ * CPUs are never putting themselves to sleep with callbacks pending,
+ * so there is no need to schedule the act of awakening them.
+ */
+static void rcu_schedule_wake_gp_end(void)
+{
 }
 
 #else /* #if !defined(CONFIG_RCU_FAST_NO_HZ) */
@@ -1961,47 +1985,56 @@ int rcu_needs_cpu(int cpu)
 #define RCU_NEEDS_CPU_FLUSHES 5
 static DEFINE_PER_CPU(int, rcu_dyntick_drain);
 static DEFINE_PER_CPU(unsigned long, rcu_dyntick_holdoff);
+static DEFINE_PER_CPU(bool, rcu_awake_at_gp_end);
 
 /*
- * Check to see if any future RCU-related work will need to be done
- * by the current CPU, even if none need be done immediately, returning
- * 1 if so.  This function is part of the RCU implementation; it is -not-
- * an exported member of the RCU API.
+ * Allow the CPU to enter dyntick-idle mode if either: (1) There are no
+ * callbacks on this CPU, (2) this CPU has not yet attempted to enter
+ * dyntick-idle mode, and (3) this CPU is in the process of attempting to
+ * enter dyntick-idle mode.  Otherwise, if we have recently tried and failed
+ * to enter dyntick-idle mode, we refuse to try to enter it.  After all,
+ * it is better to incur scheduling-clock interrupts than to spin
+ * continuously for the same time duration!
+ */
+int rcu_needs_cpu(int cpu)
+{
+	/* If no callbacks, RCU doesn't need the CPU. */
+	if (!rcu_cpu_has_callbacks(cpu))
+		return 0;
+	/* Otherwise, RCU needs the CPU only if it recently tried and failed. */
+	return per_cpu(rcu_dyntick_holdoff, cpu) == jiffies;
+}
+
+/*
+ * Check to see if any RCU-related work can be done by the current CPU,
+ * and if so, schedule a softirq to get it done.  This function is part
+ * of the RCU implementation; it is -not- an exported member of the RCU API.
  *
- * Because we are not supporting preemptible RCU, attempt to accelerate
- * any current grace periods so that RCU no longer needs this CPU, but
- * only if all other CPUs are already in dynticks-idle mode.  This will
- * allow the CPU cores to be powered down immediately, as opposed to after
- * waiting many milliseconds for grace periods to elapse.
+ * The idea is for the current CPU to clear out all work required by the
+ * RCU core for the current grace period, so that this CPU can be permitted
+ * to enter dyntick-idle mode.  In some cases, it will need to be awakened
+ * at the end of the grace period by whatever CPU ends the grace period.
+ * This allows CPUs to go dyntick-idle more quickly, and to reduce the
+ * number of wakeups by a modest integer factor.
  *
  * Because it is not legal to invoke rcu_process_callbacks() with irqs
  * disabled, we do one pass of force_quiescent_state(), then do a
  * invoke_rcu_core() to cause rcu_process_callbacks() to be invoked
  * later.  The per-cpu rcu_dyntick_drain variable controls the sequencing.
+ *
+ * The caller must have disabled interrupts.
  */
-int rcu_needs_cpu(int cpu)
+static void rcu_prepare_for_idle(int cpu)
 {
 	int c = 0;
-	int snap;
-	int thatcpu;
 
-	/* Check for being in the holdoff period. */
-	if (per_cpu(rcu_dyntick_holdoff, cpu) == jiffies)
-		return rcu_needs_cpu_quick_check(cpu);
-
-	/* Don't bother unless we are the last non-dyntick-idle CPU. */
-	for_each_online_cpu(thatcpu) {
-		if (thatcpu == cpu)
-			continue;
-		snap = atomic_add_return(0, &per_cpu(rcu_dynticks,
-						     thatcpu).dynticks);
-		smp_mb(); /* Order sampling of snap with end of grace period. */
-		if ((snap & 0x1) != 0) {
-			per_cpu(rcu_dyntick_drain, cpu) = 0;
-			per_cpu(rcu_dyntick_holdoff, cpu) = jiffies - 1;
-			return rcu_needs_cpu_quick_check(cpu);
-		}
+	/* If no callbacks or in the holdoff period, enter dyntick-idle. */
+	if (!rcu_cpu_has_callbacks(cpu)) {
+		per_cpu(rcu_dyntick_holdoff, cpu) = jiffies - 1;
+		return;
 	}
+	if (per_cpu(rcu_dyntick_holdoff, cpu) == jiffies)
+		return;
 
 	/* Check and update the rcu_dyntick_drain sequencing. */
 	if (per_cpu(rcu_dyntick_drain, cpu) <= 0) {
@@ -2010,10 +2043,25 @@ int rcu_needs_cpu(int cpu)
 	} else if (--per_cpu(rcu_dyntick_drain, cpu) <= 0) {
 		/* We have hit the limit, so time to give up. */
 		per_cpu(rcu_dyntick_holdoff, cpu) = jiffies;
-		return rcu_needs_cpu_quick_check(cpu);
+		if (!rcu_pending(cpu)) {
+			per_cpu(rcu_awake_at_gp_end, cpu) = 1;
+			return;  /* Nothing to do immediately. */
+		}
+		invoke_rcu_core();  /* Force the CPU out of dyntick-idle. */
+		return;
 	}
 
-	/* Do one step pushing remaining RCU callbacks through. */
+	/*
+	 * Do one step of pushing the remaining RCU callbacks through
+	 * the RCU core state machine.
+	 */
+#ifdef CONFIG_TREE_PREEMPT_RCU
+	if (per_cpu(rcu_preempt_data, cpu).nxtlist) {
+		rcu_preempt_qs(cpu);
+		force_quiescent_state(&rcu_preempt_state, 0);
+		c = c || per_cpu(rcu_preempt_data, cpu).nxtlist;
+	}
+#endif /* #ifdef CONFIG_TREE_PREEMPT_RCU */
 	if (per_cpu(rcu_sched_data, cpu).nxtlist) {
 		rcu_sched_qs(cpu);
 		force_quiescent_state(&rcu_sched_state, 0);
@@ -2028,7 +2076,51 @@ int rcu_needs_cpu(int cpu)
 	/* If RCU callbacks are still pending, RCU still needs this CPU. */
 	if (c)
 		invoke_rcu_core();
-	return c;
 }
 
+/*
+ * Wake up a CPU by invoking the RCU core.  Intended for use by
+ * rcu_wake_cpus_for_gp_end(), which passes this function to
+ * smp_call_function_single().
+ */
+static void rcu_wake_cpu(void *unused)
+{
+	invoke_rcu_core();
+}
+
+/*
+ * If an RCU grace period ended recently, scan the rcu_awake_at_gp_end
+ * per-CPU variables, and wake up any CPUs that requested a wakeup.
+ */
+static void rcu_wake_cpus_for_gp_end(void)
+{
+	int cpu;
+	struct rcu_dynticks *rdtp = &__get_cpu_var(rcu_dynticks);
+
+	if (!rdtp->wake_gp_end)
+		return;
+	rdtp->wake_gp_end = 0;
+	for_each_online_cpu(cpu) {
+		if (per_cpu(rcu_awake_at_gp_end, cpu)) {
+			per_cpu(rcu_awake_at_gp_end, cpu) = 0;
+			smp_call_function_single(cpu, rcu_wake_cpu, NULL, 0);
+		}
+	}
+}
+
+/*
+ * A grace period has just ended, and so we will need to awaken CPUs
+ * that now have work to do.  But we cannot send IPIs with interrupts
+ * disabled, so just set a flag so that this will happen upon exit
+ * from RCU core processing.
+ */
+static void rcu_schedule_wake_gp_end(void)
+{
+	struct rcu_dynticks *rdtp = &__get_cpu_var(rcu_dynticks);
+
+	rdtp->wake_gp_end = 1;
+}
+
+/* @@@ need tracing as well. */
+
 #endif /* #else #if !defined(CONFIG_RCU_FAST_NO_HZ) */
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH RFC tip/core/rcu 28/28] rcu: Fix idle-task checks
  2011-11-02 20:30 [PATCH RFC tip/core/rcu 0/28] Preview of RCU changes for 3.3 Paul E. McKenney
                   ` (26 preceding siblings ...)
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 27/28] rcu: Allow dyntick-idle mode for CPUs with callbacks Paul E. McKenney
@ 2011-11-02 20:30 ` Paul E. McKenney
  2011-11-03  4:55   ` Josh Triplett
  2011-11-03  4:55 ` [PATCH RFC tip/core/rcu 0/28] Preview of RCU changes for 3.3 Josh Triplett
  28 siblings, 1 reply; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-02 20:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, Paul E. McKenney, Paul E. McKenney

From: Paul E. McKenney <paul.mckenney@linaro.org>

RCU has traditionally relied on idle_cpu() to determine whether a given
CPU is running in the context of an idle task, but recent changes have
invalidated this approach.  This commit therefore switches from idle_cpu
to "current->pid != 0".

Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Suggested-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcutiny.c |    4 ++--
 kernel/rcutree.c |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/rcutiny.c b/kernel/rcutiny.c
index f4e7bc3..35f8a07 100644
--- a/kernel/rcutiny.c
+++ b/kernel/rcutiny.c
@@ -65,7 +65,7 @@ static void rcu_idle_enter_common(long long oldval)
 		return;
 	}
 	RCU_TRACE(trace_rcu_dyntick("Start", oldval, rcu_dynticks_nesting));
-	if (!idle_cpu(smp_processor_id())) {
+	if (current->pid != 0) {
 		struct task_struct *idle = idle_task(smp_processor_id());
 
 		RCU_TRACE(trace_rcu_dyntick("Error on entry: not idle task",
@@ -119,7 +119,7 @@ static void rcu_idle_exit_common(long long oldval)
 		return;
 	}
 	RCU_TRACE(trace_rcu_dyntick("End", oldval, rcu_dynticks_nesting));
-	if (!idle_cpu(smp_processor_id())) {
+	if (!current->pid != 0) {
 		struct task_struct *idle = idle_task(smp_processor_id());
 
 		RCU_TRACE(trace_rcu_dyntick("Error on exit: not idle task",
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 3d7b474..414af68 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -355,7 +355,7 @@ static void rcu_idle_enter_common(struct rcu_dynticks *rdtp, long long oldval)
 		return;
 	}
 	trace_rcu_dyntick("Start", oldval, rdtp->dynticks_nesting);
-	if (!idle_cpu(smp_processor_id())) {
+	if (current->pid != 0) {
 		struct task_struct *idle = idle_task(smp_processor_id());
 
 		trace_rcu_dyntick("Error on entry: not idle task",
@@ -449,7 +449,7 @@ static void rcu_idle_exit_common(struct rcu_dynticks *rdtp, long long oldval)
 	smp_mb__after_atomic_inc();  /* See above. */
 	WARN_ON_ONCE(!(atomic_read(&rdtp->dynticks) & 0x1));
 	trace_rcu_dyntick("End", oldval, rdtp->dynticks_nesting);
-	if (!idle_cpu(smp_processor_id())) {
+	if (current->pid != 0) {
 		struct task_struct *idle = idle_task(smp_processor_id());
 
 		trace_rcu_dyntick("Error on exit: not idle task",
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 05/28] lockdep: Update documentation for lock-class leak detection
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 05/28] lockdep: Update documentation for lock-class leak detection Paul E. McKenney
@ 2011-11-03  2:57   ` Josh Triplett
  2011-11-03 19:42     ` Paul E. McKenney
  0 siblings, 1 reply; 74+ messages in thread
From: Josh Triplett @ 2011-11-03  2:57 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, darren, patches

On Wed, Nov 02, 2011 at 01:30:26PM -0700, Paul E. McKenney wrote:
> There are a number of bugs that can leak or overuse lock classes,
> which can cause the maximum number of lock classes (currently 8191)
> to be exceeded.  However, the documentation does not tell you how to
> track down these problems.  This commit addresses this shortcoming.
> 
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> ---
>  Documentation/lockdep-design.txt |   61 ++++++++++++++++++++++++++++++++++++++
>  1 files changed, 61 insertions(+), 0 deletions(-)
> 
> diff --git a/Documentation/lockdep-design.txt b/Documentation/lockdep-design.txt
> index abf768c..383bb23 100644
> --- a/Documentation/lockdep-design.txt
> +++ b/Documentation/lockdep-design.txt
> @@ -221,3 +221,64 @@ when the chain is validated for the first time, is then put into a hash
>  table, which hash-table can be checked in a lockfree manner. If the
>  locking chain occurs again later on, the hash table tells us that we
>  dont have to validate the chain again.
> +
> +Troubleshooting:
> +----------------
> +
> +The validator tracks a maximum of MAX_LOCKDEP_KEYS number of lock classes.
> +Exceeding this number will trigger the following lockdep warning:
> +
> +	(DEBUG_LOCKS_WARN_ON(id >= MAX_LOCKDEP_KEYS))
> +
> +By default, MAX_LOCKDEP_KEYS is currently set to 8191, and typical
> +desktop systems have less than 1,000 lock classes, so this warning
> +normally results from lock-class leakage or failure to properly
> +initialize locks.  These two problems are illustrated below:
> +
> +1.	Repeated module loading and unloading while running the validator
> +	will result in lock-class leakage.  The issue here is that each
> +	load of the module will create a new set of lock classes for that
> +	module's locks, but module unloading does not remove old classes.

I'd explicitly add a parenthetical here: (see below about reusing lock
classes for why).  I stared at this for a minute trying to think about
why the old classes couldn't go away, before realizing this fell into
the case you described below: removing them would require cleaning up
any dependency chains involving them.

> +	Therefore, if that module is loaded and unloaded repeatedly,
> +	the number of lock classes will eventually reach the maximum.
> +
> +2.	Using structures such as arrays that have large numbers of
> +	locks that are not explicitly initialized.  For example,
> +	a hash table with 8192 buckets where each bucket has its
> +	own spinlock_t will consume 8192 lock classes -unless- each
> +	spinlock is initialized, for example, using spin_lock_init().
> +	Failure to properly initialize the per-bucket spinlocks would
> +	guarantee lock-class overflow.	In contrast, a loop that called
> +	spin_lock_init() on each lock would place all 8192 locks into a
> +	single lock class.
> +
> +	The moral of this story is that you should always explicitly
> +	initialize your locks.

Spin locks *require* initialization, right?  Doesn't this constitute a
bug regardless of lockdep?

If so, could we simply arrange to have lockdep scream when it encounters
an uninitialized spinlock?

> +One might argue that the validator should be modified to allow lock
> +classes to be reused.  However, if you are tempted to make this argument,
> +first review the code and think through the changes that would be
> +required, keeping in mind that the lock classes to be removed are likely
> +to be linked into the lock-dependency graph.  This turns out to be a
> +harder to do than to say.

Typo fix: s/to be a harder/to be harder/.

> +Of course, if you do run out of lock classes, the next thing to do is
> +to find the offending lock classes.  First, the following command gives
> +you the number of lock classes currently in use along with the maximum:
> +
> +	grep "lock-classes" /proc/lockdep_stats
> +
> +This command produces the following output on a modest Power system:
> +
> +	 lock-classes:                          748 [max: 8191]

Does Power matter here?  Could this just say "a modest system"?

> +If the number allocated (748 above) increases continually over time,
> +then there is likely a leak.  The following command can be used to
> +identify the leaking lock classes:
> +
> +	grep "BD" /proc/lockdep
> +
> +Run the command and save the output, then compare against the output
> +from a later run of this command to identify the leakers.  This same
> +output can also help you find situations where lock initialization
> +has been omitted.

You might consider giving an example of what a lack of lock
initialization would look like here.

- Josh Triplett

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 09/28] rcu: Document failing tick as cause of RCU CPU stall warning
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 09/28] rcu: Document failing tick as cause of RCU CPU stall warning Paul E. McKenney
@ 2011-11-03  3:07   ` Josh Triplett
  2011-11-03 13:25     ` Paul E. McKenney
  0 siblings, 1 reply; 74+ messages in thread
From: Josh Triplett @ 2011-11-03  3:07 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, darren, patches

On Wed, Nov 02, 2011 at 01:30:30PM -0700, Paul E. McKenney wrote:
> One of lclaudio's systems was seeing RCU CPU stall warnings from idle.
> These turned out to be caused by a bug that stopped scheduling-clock
> tick interrupts from being sent to a given CPU for several hundred seconds.

Out of curiosity, what caused this bug?

- Josh Triplett

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 11/28] rcu: Omit self-awaken when setting up expedited grace period
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 11/28] rcu: Omit self-awaken when setting up expedited grace period Paul E. McKenney
@ 2011-11-03  3:16   ` Josh Triplett
  2011-11-03 19:43     ` Paul E. McKenney
  0 siblings, 1 reply; 74+ messages in thread
From: Josh Triplett @ 2011-11-03  3:16 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, darren, patches

On Wed, Nov 02, 2011 at 01:30:32PM -0700, Paul E. McKenney wrote:
> -static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp)
> +static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
> +			       bool wake)
>  {
> -	return;
>  }

Removing this return represents a separate cleanup, which ought to go in
a separate commit.

- Josh Triplett

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 17/28] rcu: Make srcu_read_lock_held() call common lockdep-enabled function
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 17/28] rcu: Make srcu_read_lock_held() call common lockdep-enabled function Paul E. McKenney
@ 2011-11-03  3:48   ` Josh Triplett
  2011-11-03 11:14     ` Frederic Weisbecker
  0 siblings, 1 reply; 74+ messages in thread
From: Josh Triplett @ 2011-11-03  3:48 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, darren, patches, Frederic Weisbecker

On Wed, Nov 02, 2011 at 01:30:38PM -0700, Paul E. McKenney wrote:
> From: Frederic Weisbecker <fweisbec@gmail.com>
> 
> A common debug_lockdep_rcu_enabled() function is used to check whether
> RCU lockdep splats should be reported, but srcu_read_lock() does not
> use it.  This commit therefore brings srcu_read_lock_held() up to date.
> 
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Just how signed off does this patch need to be? ;)

- Josh Triplett

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 19/28] nohz: Allow rcu extended quiescent state handling seperately from tick stop
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 19/28] nohz: Allow rcu extended quiescent state handling seperately from tick stop Paul E. McKenney
@ 2011-11-03  4:00   ` Josh Triplett
  2011-11-03 11:54     ` Frederic Weisbecker
  0 siblings, 1 reply; 74+ messages in thread
From: Josh Triplett @ 2011-11-03  4:00 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, darren, patches, Frederic Weisbecker,
	Mike Frysinger, Guan Xuetao, David Miller, Chris Metcalf,
	Hans-Christian Egtvedt, Ralf Baechle, Ingo Molnar,
	Peter Zijlstra, H. Peter Anvin, Russell King, Paul Mackerras,
	Heiko Carstens, Paul Mundt

On Wed, Nov 02, 2011 at 01:30:40PM -0700, Paul E. McKenney wrote:
> From: Frederic Weisbecker <fweisbec@gmail.com>
> 
> It is assumed that rcu won't be used once we switch to tickless
> mode and until we restart the tick. However this is not always
> true, as in x86-64 where we dereference the idle notifiers after
> the tick is stopped.
> 
> To prepare for fixing this, add two new APIs:
> tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().
> 
> If no use of RCU is made in the idle loop between
> tick_nohz_enter_idle() and tick_nohz_exit_idle() calls, the arch
> must instead call the new *_norcu() version such that the arch doesn't
> need to call rcu_idle_enter() and rcu_idle_exit().

The _norcu names confused me a bit.  At first, I thought they meant
"idle but not RCU idle, so you can use RCU", but from re-reading the
commit message, apparently they mean "idle and RCU idle, so don't use
RCU".  What about something like _forbid_rcu instead?  Or,
alternatively, why not just go ahead and separate the two types of idle
entirely rather than introducing the _norcu variants first?

- Josh Triplett

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 24/28] rcu: Introduce bulk reference count
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 24/28] rcu: Introduce bulk reference count Paul E. McKenney
@ 2011-11-03  4:34   ` Josh Triplett
  2011-11-03 13:34     ` Paul E. McKenney
  2011-11-28 12:41   ` Peter Zijlstra
  1 sibling, 1 reply; 74+ messages in thread
From: Josh Triplett @ 2011-11-03  4:34 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, darren, patches

On Wed, Nov 02, 2011 at 01:30:45PM -0700, Paul E. McKenney wrote:
> The RCU implementations, including SRCU, are designed to be used in a
> lock-like fashion, so that the read-side lock and unlock primitives must
> execute in the same context for any given read-side critical section.
> This constraint is enforced by lockdep-RCU.  However, there is a need for
> something that acts more like a reference count than a lock, in order
> to allow (for example) the reference to be acquired within the context
> of an exception, while that same reference is released in the context of
> the task that encountered the exception.  The cost of this capability is
> that the read-side operations incur the overhead of disabling interrupts.
> Some optimization is possible, and will be carried out if warranted.
> 
> Note that although the current implementation allows a given reference to
> be acquired by one task and then released by another, all known possible
> implementations that allow this have scalability problems.  Therefore,
> a given reference must be released by the same task that acquired it,
> though perhaps from an interrupt or exception handler running within
> that task's context.

This new bulkref API seems in dire need of documentation. :)

> --- a/include/linux/srcu.h
> +++ b/include/linux/srcu.h
> @@ -181,4 +181,54 @@ static inline void srcu_read_unlock(struct srcu_struct *sp, int idx)
>  	__srcu_read_unlock(sp, idx);
>  }
>  
> +/* Definitions for bulkref_t, currently defined in terms of SRCU. */
> +
> +typedef struct srcu_struct bulkref_t;
> +int init_srcu_struct_fields(struct srcu_struct *sp);
> +
> +static inline int init_bulkref(bulkref_t *brp)
> +{
> +	return init_srcu_struct_fields(brp);
> +}

Why can't this call init_srcu_struct and avoid the need to use the
previously unexported internal function?

- Josh Triplett

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 27/28] rcu: Allow dyntick-idle mode for CPUs with callbacks
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 27/28] rcu: Allow dyntick-idle mode for CPUs with callbacks Paul E. McKenney
@ 2011-11-03  4:47   ` Josh Triplett
  2011-11-03 19:53     ` Paul E. McKenney
  0 siblings, 1 reply; 74+ messages in thread
From: Josh Triplett @ 2011-11-03  4:47 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, darren, patches, Paul E. McKenney

On Wed, Nov 02, 2011 at 01:30:48PM -0700, Paul E. McKenney wrote:
>  /*
> - * Check to see if any future RCU-related work will need to be done
> - * by the current CPU, even if none need be done immediately, returning
> - * 1 if so.  This function is part of the RCU implementation; it is -not-
> - * an exported member of the RCU API.
> + * Allow the CPU to enter dyntick-idle mode if either: (1) There are no
> + * callbacks on this CPU, (2) this CPU has not yet attempted to enter
> + * dyntick-idle mode, and (3) this CPU is in the process of attempting to
> + * enter dyntick-idle mode.  Otherwise, if we have recently tried and failed

This sentence doesn't quite work; "if either...and..." should become
"either...or" or "if...and".

> + * to enter dyntick-idle mode, we refuse to try to enter it.  After all,
> + * it is better to incur scheduling-clock interrupts than to spin
> + * continuously for the same time duration!
> + */
> +int rcu_needs_cpu(int cpu)
> +{
> +	/* If no callbacks, RCU doesn't need the CPU. */
> +	if (!rcu_cpu_has_callbacks(cpu))
> +		return 0;
> +	/* Otherwise, RCU needs the CPU only if it recently tried and failed. */
> +	return per_cpu(rcu_dyntick_holdoff, cpu) == jiffies;

Sigh, one more use of jiffies. :(

- Josh Triplett

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 28/28] rcu: Fix idle-task checks
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 28/28] rcu: Fix idle-task checks Paul E. McKenney
@ 2011-11-03  4:55   ` Josh Triplett
  2011-11-03 21:00     ` Paul E. McKenney
  2011-11-09 14:52     ` Peter Zijlstra
  0 siblings, 2 replies; 74+ messages in thread
From: Josh Triplett @ 2011-11-03  4:55 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, darren, patches, Paul E. McKenney

On Wed, Nov 02, 2011 at 01:30:49PM -0700, Paul E. McKenney wrote:
> From: Paul E. McKenney <paul.mckenney@linaro.org>
> 
> RCU has traditionally relied on idle_cpu() to determine whether a given
> CPU is running in the context of an idle task, but recent changes have
> invalidated this approach.  This commit therefore switches from idle_cpu
> to "current->pid != 0".

Could you elaborate a bit on "recent changes"?  It looks like you mean
commit 908a3283728d92df36e0c7cd63304fd35e93a8a9; if so, could you add
that reference to the commit message?

Also, the hard-coded use of "current->pid != 0" concerns me.  Could this
use some existing function?  Does idle_task() help?  If no appropriate
predicate exists, perhaps it should.  is_idle_task(current)?

- Josh Triplett

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 0/28] Preview of RCU changes for 3.3
  2011-11-02 20:30 [PATCH RFC tip/core/rcu 0/28] Preview of RCU changes for 3.3 Paul E. McKenney
                   ` (27 preceding siblings ...)
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 28/28] rcu: Fix idle-task checks Paul E. McKenney
@ 2011-11-03  4:55 ` Josh Triplett
  2011-11-03 21:45   ` Paul E. McKenney
  28 siblings, 1 reply; 74+ messages in thread
From: Josh Triplett @ 2011-11-03  4:55 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, darren, patches

On Wed, Nov 02, 2011 at 01:30:17PM -0700, Paul E. McKenney wrote:
> This patchset permits idle tasks to use RCU read-side critical sections,
> although they are still prohibited between tick_nohz_idle_exit_norcu()
> and tick_nohz_idle_exit_norcu(); makes synchronize_sched_expedited()
> better able to share work among concurrent callers, allows ftrace_dump()
> to be invoked from modules, dumps tracing upon detection of an rcutorture
> failure, detects illegal use of RCU read-side critical sections from
> extended quiescent states, legitimizes the pre-existin use of RCU in the
> idle notifiers, fixes a memory-barrier botch, introduces an SRCU-like bulk
> reference count, improve dynticks entry/exit tracing, further improves
> RCU's ability to allow a given CPU to enter dyntick-idle mode quickly,
> fixes idle-task checks, updates documentation, and additional fixes
> from a still-ongoing top-to-bottom inspection of RCU.  The patches are
> as follows:

I've reviewed all of these patches.  For all of them except those
indicated below:
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

> 1.	Strengthen memory barriers used in PowerPC value-returning
> 	atomics and locking primitives.  It is likely that this
> 	commit will be superseded by something from the powerpc
> 	maintainers.  The need for this strengthening was validated
> 	by tooling from Peter Sewell's group at the University of
> 	Cambridge.

As before, I don't have the background on powerpc to provide review on
this one.  However, I trust Peter Sewell's group to have gotten the
details right. :)

> 5.	Document the troubleshooting of lockdep lock-class leaks.

I replied with a few comments and a typo fix.

> 11.	Remove a useless self-awaken when setting up expedited grace
> 	periods, courtesy of Thomas Gleixner and the -rt effort.

Replied with a fix: commit needs splitting.

> 12-17.	Make lockdep-RCU warn when RCU read-side primitives are
> 	invoked from an idle RCU extended quiescent state, mostly
> 	courtesy of Frederic Weisbecker.

Replied to 17 with a minor nit, but Reviewed-by still applies to all
five with or without that nit fixed.

> 18-23.	Separate out the scheduler-clock tick's idea of dyntick
> 	idle from RCU's notion of an idle extended quiescent state, mostly
> 	courtesy of Frederic Weisbecker.  These commits are needed for
> 	Frederic's work to suppress the scheduler-clock tick when there
> 	is but one runnable task on a given CPU.

Very much looking forward to that work.  Any pointer to more information
on tickless-when-one-task?

Replied to patch 19 with some naming comments; the rest seem fine.

> 24.	Introduce a bulk reference count, which is related to SRCU,
> 	but which allows a reference to be acquired in an irq handler
> 	and released by the task that was interrupted.

Replied with comments.

> 27.	Allow CPUs with pending RCU callbacks to enter dyntick-idle
> 	mode.  Beware this commit, as it compiled and passed rcutorture
> 	on the first try, which historically has indicated the presence
> 	of subtle and highly destructive bugs.

Heh.  I reviewed this one particularly carefully, then, but I didn't
find any logic errors.  I did reply with a couple of comments, though.

> 28.	Fix RCU's determination of whether or not it is running in the
> 	context of an idle task.

Replied with concerns.

- Josh Triplett

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 17/28] rcu: Make srcu_read_lock_held() call common lockdep-enabled function
  2011-11-03  3:48   ` Josh Triplett
@ 2011-11-03 11:14     ` Frederic Weisbecker
  2011-11-03 13:19       ` Steven Rostedt
  2011-11-03 13:29       ` Paul E. McKenney
  0 siblings, 2 replies; 74+ messages in thread
From: Frederic Weisbecker @ 2011-11-03 11:14 UTC (permalink / raw)
  To: Josh Triplett
  Cc: Paul E. McKenney, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, niv, tglx, peterz, rostedt, Valdis.Kletnieks,
	dhowells, eric.dumazet, darren, patches

On Wed, Nov 02, 2011 at 08:48:54PM -0700, Josh Triplett wrote:
> On Wed, Nov 02, 2011 at 01:30:38PM -0700, Paul E. McKenney wrote:
> > From: Frederic Weisbecker <fweisbec@gmail.com>
> > 
> > A common debug_lockdep_rcu_enabled() function is used to check whether
> > RCU lockdep splats should be reported, but srcu_read_lock() does not
> > use it.  This commit therefore brings srcu_read_lock_held() up to date.
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> Just how signed off does this patch need to be? ;)

Dunno but I feel uncomfortable now with that strange feeling I'm walking
on the street with two Paul holding my hand on each side.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 19/28] nohz: Allow rcu extended quiescent state handling seperately from tick stop
  2011-11-03  4:00   ` Josh Triplett
@ 2011-11-03 11:54     ` Frederic Weisbecker
  2011-11-03 13:32       ` Paul E. McKenney
  0 siblings, 1 reply; 74+ messages in thread
From: Frederic Weisbecker @ 2011-11-03 11:54 UTC (permalink / raw)
  To: Josh Triplett
  Cc: Paul E. McKenney, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, niv, tglx, peterz, rostedt, Valdis.Kletnieks,
	dhowells, eric.dumazet, darren, patches, Mike Frysinger,
	Guan Xuetao, David Miller, Chris Metcalf, Hans-Christian Egtvedt,
	Ralf Baechle, Ingo Molnar, Peter Zijlstra, H. Peter Anvin,
	Russell King, Paul Mackerras, Heiko Carstens, Paul Mundt

On Wed, Nov 02, 2011 at 09:00:03PM -0700, Josh Triplett wrote:
> On Wed, Nov 02, 2011 at 01:30:40PM -0700, Paul E. McKenney wrote:
> > From: Frederic Weisbecker <fweisbec@gmail.com>
> > 
> > It is assumed that rcu won't be used once we switch to tickless
> > mode and until we restart the tick. However this is not always
> > true, as in x86-64 where we dereference the idle notifiers after
> > the tick is stopped.
> > 
> > To prepare for fixing this, add two new APIs:
> > tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().
> > 
> > If no use of RCU is made in the idle loop between
> > tick_nohz_enter_idle() and tick_nohz_exit_idle() calls, the arch
> > must instead call the new *_norcu() version such that the arch doesn't
> > need to call rcu_idle_enter() and rcu_idle_exit().
> 
> The _norcu names confused me a bit.  At first, I thought they meant
> "idle but not RCU idle, so you can use RCU", but from re-reading the
> commit message, apparently they mean "idle and RCU idle, so don't use
> RCU".  What about something like _forbid_rcu instead?  Or,
> alternatively, why not just go ahead and separate the two types of idle
> entirely rather than introducing the _norcu variants first?

Or tick_nohz_idle_enter_rcu_stop() and tick_nohz_idle_exit_rcu_restart()?

Sounds clear but too long. May be we can shorten the tick_nohz thing in the
beginning.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 17/28] rcu: Make srcu_read_lock_held() call common lockdep-enabled function
  2011-11-03 11:14     ` Frederic Weisbecker
@ 2011-11-03 13:19       ` Steven Rostedt
  2011-11-03 13:30         ` Paul E. McKenney
  2011-11-03 13:29       ` Paul E. McKenney
  1 sibling, 1 reply; 74+ messages in thread
From: Steven Rostedt @ 2011-11-03 13:19 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Josh Triplett, Paul E. McKenney, linux-kernel, mingo, laijs,
	dipankar, akpm, mathieu.desnoyers, niv, tglx, peterz,
	Valdis.Kletnieks, dhowells, eric.dumazet, darren, patches

On Thu, 2011-11-03 at 12:14 +0100, Frederic Weisbecker wrote:
> On Wed, Nov 02, 2011 at 08:48:54PM -0700, Josh Triplett wrote:
> > On Wed, Nov 02, 2011 at 01:30:38PM -0700, Paul E. McKenney wrote:
> > > From: Frederic Weisbecker <fweisbec@gmail.com>
> > > 
> > > A common debug_lockdep_rcu_enabled() function is used to check whether
> > > RCU lockdep splats should be reported, but srcu_read_lock() does not
> > > use it.  This commit therefore brings srcu_read_lock_held() up to date.
> > > 
> > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > 
> > Just how signed off does this patch need to be? ;)
> 
> Dunno but I feel uncomfortable now with that strange feeling I'm walking
> on the street with two Paul holding my hand on each side.

We already established the split Paul, but what is really scarey, is
that the Pauls on both sides of you, are the evil Pauls. ;)

-- Steve



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 09/28] rcu: Document failing tick as cause of RCU CPU stall warning
  2011-11-03  3:07   ` Josh Triplett
@ 2011-11-03 13:25     ` Paul E. McKenney
  0 siblings, 0 replies; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-03 13:25 UTC (permalink / raw)
  To: Josh Triplett
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, darren, patches

On Wed, Nov 02, 2011 at 08:07:50PM -0700, Josh Triplett wrote:
> On Wed, Nov 02, 2011 at 01:30:30PM -0700, Paul E. McKenney wrote:
> > One of lclaudio's systems was seeing RCU CPU stall warnings from idle.
> > These turned out to be caused by a bug that stopped scheduling-clock
> > tick interrupts from being sent to a given CPU for several hundred seconds.
> 
> Out of curiosity, what caused this bug?

If I remember correctly, software/configuration bugs in the clock code.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 17/28] rcu: Make srcu_read_lock_held() call common lockdep-enabled function
  2011-11-03 11:14     ` Frederic Weisbecker
  2011-11-03 13:19       ` Steven Rostedt
@ 2011-11-03 13:29       ` Paul E. McKenney
  2011-11-03 13:59         ` Steven Rostedt
  1 sibling, 1 reply; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-03 13:29 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Josh Triplett, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, niv, tglx, peterz, rostedt, Valdis.Kletnieks,
	dhowells, eric.dumazet, darren, patches

On Thu, Nov 03, 2011 at 12:14:20PM +0100, Frederic Weisbecker wrote:
> On Wed, Nov 02, 2011 at 08:48:54PM -0700, Josh Triplett wrote:
> > On Wed, Nov 02, 2011 at 01:30:38PM -0700, Paul E. McKenney wrote:
> > > From: Frederic Weisbecker <fweisbec@gmail.com>
> > > 
> > > A common debug_lockdep_rcu_enabled() function is used to check whether
> > > RCU lockdep splats should be reported, but srcu_read_lock() does not
> > > use it.  This commit therefore brings srcu_read_lock_held() up to date.
> > > 
> > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > 
> > Just how signed off does this patch need to be? ;)

If you have sufficient patience to scroll past the Signed-off-by's
to see the patch, then there clearly are not enough.  ;-)

> Dunno but I feel uncomfortable now with that strange feeling I'm walking
> on the street with two Paul holding my hand on each side.

I did catch one of these, but missed the other.  Here is the history:

o	Paul wrote the patch.

o	Frederic reworked the patches that this one depended on,
	and then resent the patch.

o	Paul did "git am -s" on the series that Frederic sent,
	which added the extra Signed-off-by.
	
It is not clear to me what the Signed-off-by chain should look like in
this case.  My default action would be to remove my second Signed-off-by.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 17/28] rcu: Make srcu_read_lock_held() call common lockdep-enabled function
  2011-11-03 13:19       ` Steven Rostedt
@ 2011-11-03 13:30         ` Paul E. McKenney
  0 siblings, 0 replies; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-03 13:30 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Frederic Weisbecker, Josh Triplett, linux-kernel, mingo, laijs,
	dipankar, akpm, mathieu.desnoyers, niv, tglx, peterz,
	Valdis.Kletnieks, dhowells, eric.dumazet, darren, patches

On Thu, Nov 03, 2011 at 09:19:26AM -0400, Steven Rostedt wrote:
> On Thu, 2011-11-03 at 12:14 +0100, Frederic Weisbecker wrote:
> > On Wed, Nov 02, 2011 at 08:48:54PM -0700, Josh Triplett wrote:
> > > On Wed, Nov 02, 2011 at 01:30:38PM -0700, Paul E. McKenney wrote:
> > > > From: Frederic Weisbecker <fweisbec@gmail.com>
> > > > 
> > > > A common debug_lockdep_rcu_enabled() function is used to check whether
> > > > RCU lockdep splats should be reported, but srcu_read_lock() does not
> > > > use it.  This commit therefore brings srcu_read_lock_held() up to date.
> > > > 
> > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > 
> > > Just how signed off does this patch need to be? ;)
> > 
> > Dunno but I feel uncomfortable now with that strange feeling I'm walking
> > on the street with two Paul holding my hand on each side.
> 
> We already established the split Paul, but what is really scarey, is
> that the Pauls on both sides of you, are the evil Pauls. ;)

Well, I -could- make the first one by my linaro.org address and the
second one be my linux.vnet.ibm.com address...  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 19/28] nohz: Allow rcu extended quiescent state handling seperately from tick stop
  2011-11-03 11:54     ` Frederic Weisbecker
@ 2011-11-03 13:32       ` Paul E. McKenney
  2011-11-03 15:31         ` Josh Triplett
  0 siblings, 1 reply; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-03 13:32 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Josh Triplett, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, niv, tglx, peterz, rostedt, Valdis.Kletnieks,
	dhowells, eric.dumazet, darren, patches, Mike Frysinger,
	Guan Xuetao, David Miller, Chris Metcalf, Hans-Christian Egtvedt,
	Ralf Baechle, Ingo Molnar, Peter Zijlstra, H. Peter Anvin,
	Russell King, Paul Mackerras, Heiko Carstens, Paul Mundt

On Thu, Nov 03, 2011 at 12:54:33PM +0100, Frederic Weisbecker wrote:
> On Wed, Nov 02, 2011 at 09:00:03PM -0700, Josh Triplett wrote:
> > On Wed, Nov 02, 2011 at 01:30:40PM -0700, Paul E. McKenney wrote:
> > > From: Frederic Weisbecker <fweisbec@gmail.com>
> > > 
> > > It is assumed that rcu won't be used once we switch to tickless
> > > mode and until we restart the tick. However this is not always
> > > true, as in x86-64 where we dereference the idle notifiers after
> > > the tick is stopped.
> > > 
> > > To prepare for fixing this, add two new APIs:
> > > tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().
> > > 
> > > If no use of RCU is made in the idle loop between
> > > tick_nohz_enter_idle() and tick_nohz_exit_idle() calls, the arch
> > > must instead call the new *_norcu() version such that the arch doesn't
> > > need to call rcu_idle_enter() and rcu_idle_exit().
> > 
> > The _norcu names confused me a bit.  At first, I thought they meant
> > "idle but not RCU idle, so you can use RCU", but from re-reading the
> > commit message, apparently they mean "idle and RCU idle, so don't use
> > RCU".  What about something like _forbid_rcu instead?  Or,
> > alternatively, why not just go ahead and separate the two types of idle
> > entirely rather than introducing the _norcu variants first?
> 
> Or tick_nohz_idle_enter_rcu_stop() and tick_nohz_idle_exit_rcu_restart()?
> 
> Sounds clear but too long. May be we can shorten the tick_nohz thing in the
> beginning.

How about tick_nohz_rcu_idle_enter() vs. tick_nohz_idle_enter() on
entry to the idle loop and tick_nohz_rcu_idle_exit() vs
tick_nohz_idle_exit() on exit?

That said, I don't feel all that strongly on this naming topic.

								Thanx, Paul


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 24/28] rcu: Introduce bulk reference count
  2011-11-03  4:34   ` Josh Triplett
@ 2011-11-03 13:34     ` Paul E. McKenney
  2011-11-03 20:19       ` Paul E. McKenney
  0 siblings, 1 reply; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-03 13:34 UTC (permalink / raw)
  To: Josh Triplett
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, darren, patches

On Wed, Nov 02, 2011 at 09:34:41PM -0700, Josh Triplett wrote:
> On Wed, Nov 02, 2011 at 01:30:45PM -0700, Paul E. McKenney wrote:
> > The RCU implementations, including SRCU, are designed to be used in a
> > lock-like fashion, so that the read-side lock and unlock primitives must
> > execute in the same context for any given read-side critical section.
> > This constraint is enforced by lockdep-RCU.  However, there is a need for
> > something that acts more like a reference count than a lock, in order
> > to allow (for example) the reference to be acquired within the context
> > of an exception, while that same reference is released in the context of
> > the task that encountered the exception.  The cost of this capability is
> > that the read-side operations incur the overhead of disabling interrupts.
> > Some optimization is possible, and will be carried out if warranted.
> > 
> > Note that although the current implementation allows a given reference to
> > be acquired by one task and then released by another, all known possible
> > implementations that allow this have scalability problems.  Therefore,
> > a given reference must be released by the same task that acquired it,
> > though perhaps from an interrupt or exception handler running within
> > that task's context.
> 
> This new bulkref API seems in dire need of documentation. :)
> 
> > --- a/include/linux/srcu.h
> > +++ b/include/linux/srcu.h
> > @@ -181,4 +181,54 @@ static inline void srcu_read_unlock(struct srcu_struct *sp, int idx)
> >  	__srcu_read_unlock(sp, idx);
> >  }
> >  
> > +/* Definitions for bulkref_t, currently defined in terms of SRCU. */
> > +
> > +typedef struct srcu_struct bulkref_t;
> > +int init_srcu_struct_fields(struct srcu_struct *sp);
> > +
> > +static inline int init_bulkref(bulkref_t *brp)
> > +{
> > +	return init_srcu_struct_fields(brp);
> > +}
> 
> Why can't this call init_srcu_struct and avoid the need to use the
> previously unexported internal function?

Seems reasonable now that you mention it.  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 17/28] rcu: Make srcu_read_lock_held() call common lockdep-enabled function
  2011-11-03 13:29       ` Paul E. McKenney
@ 2011-11-03 13:59         ` Steven Rostedt
  2011-11-03 20:14           ` Paul E. McKenney
  0 siblings, 1 reply; 74+ messages in thread
From: Steven Rostedt @ 2011-11-03 13:59 UTC (permalink / raw)
  To: paulmck
  Cc: Frederic Weisbecker, Josh Triplett, linux-kernel, mingo, laijs,
	dipankar, akpm, mathieu.desnoyers, niv, tglx, peterz,
	Valdis.Kletnieks, dhowells, eric.dumazet, darren, patches

On Thu, 2011-11-03 at 06:29 -0700, Paul E. McKenney wrote:
> On Thu, Nov 03, 2011 at 12:14:20PM +0100, Frederic Weisbecker wrote:
> > On Wed, Nov 02, 2011 at 08:48:54PM -0700, Josh Triplett wrote:
> > > On Wed, Nov 02, 2011 at 01:30:38PM -0700, Paul E. McKenney wrote:
> > > > From: Frederic Weisbecker <fweisbec@gmail.com>
> > > > 
> > > > A common debug_lockdep_rcu_enabled() function is used to check whether
> > > > RCU lockdep splats should be reported, but srcu_read_lock() does not
> > > > use it.  This commit therefore brings srcu_read_lock_held() up to date.
> > > > 
> > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > 
> > > Just how signed off does this patch need to be? ;)
> 
> If you have sufficient patience to scroll past the Signed-off-by's
> to see the patch, then there clearly are not enough.  ;-)
> 
> > Dunno but I feel uncomfortable now with that strange feeling I'm walking
> > on the street with two Paul holding my hand on each side.
> 
> I did catch one of these, but missed the other.  Here is the history:
> 
> o	Paul wrote the patch.
> 
> o	Frederic reworked the patches that this one depended on,
> 	and then resent the patch.
> 
> o	Paul did "git am -s" on the series that Frederic sent,
> 	which added the extra Signed-off-by.
> 	
> It is not clear to me what the Signed-off-by chain should look like in
> this case.  My default action would be to remove my second Signed-off-by.

The author should be you (change the From: to you not Frederic), and
then the first SoB would be Frederic, and yours at the end as you
committed it.

I would also state in the change log what Frederic did to the original
patch.

-- Steve



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 19/28] nohz: Allow rcu extended quiescent state handling seperately from tick stop
  2011-11-03 13:32       ` Paul E. McKenney
@ 2011-11-03 15:31         ` Josh Triplett
  2011-11-03 16:06           ` Paul E. McKenney
  0 siblings, 1 reply; 74+ messages in thread
From: Josh Triplett @ 2011-11-03 15:31 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Frederic Weisbecker, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, niv, tglx, peterz, rostedt, Valdis.Kletnieks,
	dhowells, eric.dumazet, darren, patches, Mike Frysinger,
	Guan Xuetao, David Miller, Chris Metcalf, Hans-Christian Egtvedt,
	Ralf Baechle, Ingo Molnar, Peter Zijlstra, H. Peter Anvin,
	Russell King, Paul Mackerras, Heiko Carstens, Paul Mundt

On Thu, Nov 03, 2011 at 06:32:31AM -0700, Paul E. McKenney wrote:
> On Thu, Nov 03, 2011 at 12:54:33PM +0100, Frederic Weisbecker wrote:
> > On Wed, Nov 02, 2011 at 09:00:03PM -0700, Josh Triplett wrote:
> > > On Wed, Nov 02, 2011 at 01:30:40PM -0700, Paul E. McKenney wrote:
> > > > From: Frederic Weisbecker <fweisbec@gmail.com>
> > > > 
> > > > It is assumed that rcu won't be used once we switch to tickless
> > > > mode and until we restart the tick. However this is not always
> > > > true, as in x86-64 where we dereference the idle notifiers after
> > > > the tick is stopped.
> > > > 
> > > > To prepare for fixing this, add two new APIs:
> > > > tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().
> > > > 
> > > > If no use of RCU is made in the idle loop between
> > > > tick_nohz_enter_idle() and tick_nohz_exit_idle() calls, the arch
> > > > must instead call the new *_norcu() version such that the arch doesn't
> > > > need to call rcu_idle_enter() and rcu_idle_exit().
> > > 
> > > The _norcu names confused me a bit.  At first, I thought they meant
> > > "idle but not RCU idle, so you can use RCU", but from re-reading the
> > > commit message, apparently they mean "idle and RCU idle, so don't use
> > > RCU".  What about something like _forbid_rcu instead?  Or,
> > > alternatively, why not just go ahead and separate the two types of idle
> > > entirely rather than introducing the _norcu variants first?
> > 
> > Or tick_nohz_idle_enter_rcu_stop() and tick_nohz_idle_exit_rcu_restart()?
> > 
> > Sounds clear but too long. May be we can shorten the tick_nohz thing in the
> > beginning.
> 
> How about tick_nohz_rcu_idle_enter() vs. tick_nohz_idle_enter() on
> entry to the idle loop and tick_nohz_rcu_idle_exit() vs
> tick_nohz_idle_exit() on exit?
> 
> That said, I don't feel all that strongly on this naming topic.

Mostly I think that since this series tries to separate the concepts of
"idle nohz" and "rcu extended quiescent state", we should end up with
two entirely separate functions delimiting those two, without any
functions that poke both with correspondingly complex compound names.

- Josh Triplett

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 19/28] nohz: Allow rcu extended quiescent state handling seperately from tick stop
  2011-11-03 15:31         ` Josh Triplett
@ 2011-11-03 16:06           ` Paul E. McKenney
  2011-11-09 14:28             ` Peter Zijlstra
  2011-11-09 16:48             ` Frederic Weisbecker
  0 siblings, 2 replies; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-03 16:06 UTC (permalink / raw)
  To: Josh Triplett
  Cc: Frederic Weisbecker, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, niv, tglx, peterz, rostedt, Valdis.Kletnieks,
	dhowells, eric.dumazet, darren, patches, Mike Frysinger,
	Guan Xuetao, David Miller, Chris Metcalf, Hans-Christian Egtvedt,
	Ralf Baechle, Ingo Molnar, Peter Zijlstra, H. Peter Anvin,
	Russell King, Paul Mackerras, Heiko Carstens, Paul Mundt

On Thu, Nov 03, 2011 at 08:31:02AM -0700, Josh Triplett wrote:
> On Thu, Nov 03, 2011 at 06:32:31AM -0700, Paul E. McKenney wrote:
> > On Thu, Nov 03, 2011 at 12:54:33PM +0100, Frederic Weisbecker wrote:
> > > On Wed, Nov 02, 2011 at 09:00:03PM -0700, Josh Triplett wrote:
> > > > On Wed, Nov 02, 2011 at 01:30:40PM -0700, Paul E. McKenney wrote:
> > > > > From: Frederic Weisbecker <fweisbec@gmail.com>
> > > > > 
> > > > > It is assumed that rcu won't be used once we switch to tickless
> > > > > mode and until we restart the tick. However this is not always
> > > > > true, as in x86-64 where we dereference the idle notifiers after
> > > > > the tick is stopped.
> > > > > 
> > > > > To prepare for fixing this, add two new APIs:
> > > > > tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().
> > > > > 
> > > > > If no use of RCU is made in the idle loop between
> > > > > tick_nohz_enter_idle() and tick_nohz_exit_idle() calls, the arch
> > > > > must instead call the new *_norcu() version such that the arch doesn't
> > > > > need to call rcu_idle_enter() and rcu_idle_exit().
> > > > 
> > > > The _norcu names confused me a bit.  At first, I thought they meant
> > > > "idle but not RCU idle, so you can use RCU", but from re-reading the
> > > > commit message, apparently they mean "idle and RCU idle, so don't use
> > > > RCU".  What about something like _forbid_rcu instead?  Or,
> > > > alternatively, why not just go ahead and separate the two types of idle
> > > > entirely rather than introducing the _norcu variants first?
> > > 
> > > Or tick_nohz_idle_enter_rcu_stop() and tick_nohz_idle_exit_rcu_restart()?
> > > 
> > > Sounds clear but too long. May be we can shorten the tick_nohz thing in the
> > > beginning.
> > 
> > How about tick_nohz_rcu_idle_enter() vs. tick_nohz_idle_enter() on
> > entry to the idle loop and tick_nohz_rcu_idle_exit() vs
> > tick_nohz_idle_exit() on exit?
> > 
> > That said, I don't feel all that strongly on this naming topic.
> 
> Mostly I think that since this series tries to separate the concepts of
> "idle nohz" and "rcu extended quiescent state", we should end up with
> two entirely separate functions delimiting those two, without any
> functions that poke both with correspondingly complex compound names.

Having four API members rather than the current six does seem quite
attractive to me.  Frederic, any reason why this approach won't work?

						Thanx, Paul


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 05/28] lockdep: Update documentation for lock-class leak detection
  2011-11-03  2:57   ` Josh Triplett
@ 2011-11-03 19:42     ` Paul E. McKenney
  2011-11-09 14:02       ` Peter Zijlstra
  0 siblings, 1 reply; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-03 19:42 UTC (permalink / raw)
  To: Josh Triplett
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, darren, patches

On Wed, Nov 02, 2011 at 07:57:16PM -0700, Josh Triplett wrote:
> On Wed, Nov 02, 2011 at 01:30:26PM -0700, Paul E. McKenney wrote:
> > There are a number of bugs that can leak or overuse lock classes,
> > which can cause the maximum number of lock classes (currently 8191)
> > to be exceeded.  However, the documentation does not tell you how to
> > track down these problems.  This commit addresses this shortcoming.
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > ---
> >  Documentation/lockdep-design.txt |   61 ++++++++++++++++++++++++++++++++++++++
> >  1 files changed, 61 insertions(+), 0 deletions(-)
> > 
> > diff --git a/Documentation/lockdep-design.txt b/Documentation/lockdep-design.txt
> > index abf768c..383bb23 100644
> > --- a/Documentation/lockdep-design.txt
> > +++ b/Documentation/lockdep-design.txt
> > @@ -221,3 +221,64 @@ when the chain is validated for the first time, is then put into a hash
> >  table, which hash-table can be checked in a lockfree manner. If the
> >  locking chain occurs again later on, the hash table tells us that we
> >  dont have to validate the chain again.
> > +
> > +Troubleshooting:
> > +----------------
> > +
> > +The validator tracks a maximum of MAX_LOCKDEP_KEYS number of lock classes.
> > +Exceeding this number will trigger the following lockdep warning:
> > +
> > +	(DEBUG_LOCKS_WARN_ON(id >= MAX_LOCKDEP_KEYS))
> > +
> > +By default, MAX_LOCKDEP_KEYS is currently set to 8191, and typical
> > +desktop systems have less than 1,000 lock classes, so this warning
> > +normally results from lock-class leakage or failure to properly
> > +initialize locks.  These two problems are illustrated below:
> > +
> > +1.	Repeated module loading and unloading while running the validator
> > +	will result in lock-class leakage.  The issue here is that each
> > +	load of the module will create a new set of lock classes for that
> > +	module's locks, but module unloading does not remove old classes.
> 
> I'd explicitly add a parenthetical here: (see below about reusing lock
> classes for why).  I stared at this for a minute trying to think about
> why the old classes couldn't go away, before realizing this fell into
> the case you described below: removing them would require cleaning up
> any dependency chains involving them.

Done!

> > +	Therefore, if that module is loaded and unloaded repeatedly,
> > +	the number of lock classes will eventually reach the maximum.
> > +
> > +2.	Using structures such as arrays that have large numbers of
> > +	locks that are not explicitly initialized.  For example,
> > +	a hash table with 8192 buckets where each bucket has its
> > +	own spinlock_t will consume 8192 lock classes -unless- each
> > +	spinlock is initialized, for example, using spin_lock_init().
> > +	Failure to properly initialize the per-bucket spinlocks would
> > +	guarantee lock-class overflow.	In contrast, a loop that called
> > +	spin_lock_init() on each lock would place all 8192 locks into a
> > +	single lock class.
> > +
> > +	The moral of this story is that you should always explicitly
> > +	initialize your locks.
> 
> Spin locks *require* initialization, right?  Doesn't this constitute a
> bug regardless of lockdep?
> 
> If so, could we simply arrange to have lockdep scream when it encounters
> an uninitialized spinlock?

I reworded to distinguish between compile-time initialization (which will
cause lockdep to have a separate class per instance) and run-time
initialization (which will cause lockdep to have one class total).

Making lockdep scream in this case might be useful, but if I understand
correctly, that would give false positives for compile-time initialized
global locks.

> > +One might argue that the validator should be modified to allow lock
> > +classes to be reused.  However, if you are tempted to make this argument,
> > +first review the code and think through the changes that would be
> > +required, keeping in mind that the lock classes to be removed are likely
> > +to be linked into the lock-dependency graph.  This turns out to be a
> > +harder to do than to say.
> 
> Typo fix: s/to be a harder/to be harder/.

Fixed.

> > +Of course, if you do run out of lock classes, the next thing to do is
> > +to find the offending lock classes.  First, the following command gives
> > +you the number of lock classes currently in use along with the maximum:
> > +
> > +	grep "lock-classes" /proc/lockdep_stats
> > +
> > +This command produces the following output on a modest Power system:
> > +
> > +	 lock-classes:                          748 [max: 8191]
> 
> Does Power matter here?  Could this just say "a modest system"?

Good point -- true but irrelevant.  Removed "Power".

> > +If the number allocated (748 above) increases continually over time,
> > +then there is likely a leak.  The following command can be used to
> > +identify the leaking lock classes:
> > +
> > +	grep "BD" /proc/lockdep
> > +
> > +Run the command and save the output, then compare against the output
> > +from a later run of this command to identify the leakers.  This same
> > +output can also help you find situations where lock initialization
> > +has been omitted.
> 
> You might consider giving an example of what a lack of lock
> initialization would look like here.

Hopefully the compile-time vs. run-time clears this up.

								Thanx, Paul


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 11/28] rcu: Omit self-awaken when setting up expedited grace period
  2011-11-03  3:16   ` Josh Triplett
@ 2011-11-03 19:43     ` Paul E. McKenney
  0 siblings, 0 replies; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-03 19:43 UTC (permalink / raw)
  To: Josh Triplett
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, darren, patches

On Wed, Nov 02, 2011 at 08:16:13PM -0700, Josh Triplett wrote:
> On Wed, Nov 02, 2011 at 01:30:32PM -0700, Paul E. McKenney wrote:
> > -static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp)
> > +static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
> > +			       bool wake)
> >  {
> > -	return;
> >  }
> 
> Removing this return represents a separate cleanup, which ought to go in
> a separate commit.

I split this out.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 27/28] rcu: Allow dyntick-idle mode for CPUs with callbacks
  2011-11-03  4:47   ` Josh Triplett
@ 2011-11-03 19:53     ` Paul E. McKenney
  0 siblings, 0 replies; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-03 19:53 UTC (permalink / raw)
  To: Josh Triplett
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, darren, patches, Paul E. McKenney

On Wed, Nov 02, 2011 at 09:47:44PM -0700, Josh Triplett wrote:
> On Wed, Nov 02, 2011 at 01:30:48PM -0700, Paul E. McKenney wrote:
> >  /*
> > - * Check to see if any future RCU-related work will need to be done
> > - * by the current CPU, even if none need be done immediately, returning
> > - * 1 if so.  This function is part of the RCU implementation; it is -not-
> > - * an exported member of the RCU API.
> > + * Allow the CPU to enter dyntick-idle mode if either: (1) There are no
> > + * callbacks on this CPU, (2) this CPU has not yet attempted to enter
> > + * dyntick-idle mode, and (3) this CPU is in the process of attempting to
> > + * enter dyntick-idle mode.  Otherwise, if we have recently tried and failed
> 
> This sentence doesn't quite work; "if either...and..." should become
> "either...or" or "if...and".

Good eyes -- "or" it is!

> > + * to enter dyntick-idle mode, we refuse to try to enter it.  After all,
> > + * it is better to incur scheduling-clock interrupts than to spin
> > + * continuously for the same time duration!
> > + */
> > +int rcu_needs_cpu(int cpu)
> > +{
> > +	/* If no callbacks, RCU doesn't need the CPU. */
> > +	if (!rcu_cpu_has_callbacks(cpu))
> > +		return 0;
> > +	/* Otherwise, RCU needs the CPU only if it recently tried and failed. */
> > +	return per_cpu(rcu_dyntick_holdoff, cpu) == jiffies;
> 
> Sigh, one more use of jiffies. :(

Your suggested alternative?  I need something cheap, doesn't need to
be accurate to more than a few milliseconds, needs to be synchronized
across all CPUs.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 17/28] rcu: Make srcu_read_lock_held() call common lockdep-enabled function
  2011-11-03 13:59         ` Steven Rostedt
@ 2011-11-03 20:14           ` Paul E. McKenney
  0 siblings, 0 replies; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-03 20:14 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Frederic Weisbecker, Josh Triplett, linux-kernel, mingo, laijs,
	dipankar, akpm, mathieu.desnoyers, niv, tglx, peterz,
	Valdis.Kletnieks, dhowells, eric.dumazet, darren, patches

On Thu, Nov 03, 2011 at 09:59:02AM -0400, Steven Rostedt wrote:
> On Thu, 2011-11-03 at 06:29 -0700, Paul E. McKenney wrote:
> > On Thu, Nov 03, 2011 at 12:14:20PM +0100, Frederic Weisbecker wrote:
> > > On Wed, Nov 02, 2011 at 08:48:54PM -0700, Josh Triplett wrote:
> > > > On Wed, Nov 02, 2011 at 01:30:38PM -0700, Paul E. McKenney wrote:
> > > > > From: Frederic Weisbecker <fweisbec@gmail.com>
> > > > > 
> > > > > A common debug_lockdep_rcu_enabled() function is used to check whether
> > > > > RCU lockdep splats should be reported, but srcu_read_lock() does not
> > > > > use it.  This commit therefore brings srcu_read_lock_held() up to date.
> > > > > 
> > > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > 
> > > > Just how signed off does this patch need to be? ;)
> > 
> > If you have sufficient patience to scroll past the Signed-off-by's
> > to see the patch, then there clearly are not enough.  ;-)
> > 
> > > Dunno but I feel uncomfortable now with that strange feeling I'm walking
> > > on the street with two Paul holding my hand on each side.
> > 
> > I did catch one of these, but missed the other.  Here is the history:
> > 
> > o	Paul wrote the patch.
> > 
> > o	Frederic reworked the patches that this one depended on,
> > 	and then resent the patch.
> > 
> > o	Paul did "git am -s" on the series that Frederic sent,
> > 	which added the extra Signed-off-by.
> > 	
> > It is not clear to me what the Signed-off-by chain should look like in
> > this case.  My default action would be to remove my second Signed-off-by.
> 
> The author should be you (change the From: to you not Frederic), and
> then the first SoB would be Frederic, and yours at the end as you
> committed it.
> 
> I would also state in the change log what Frederic did to the original
> patch.

Fair enough!

							Thanx, Paul


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 24/28] rcu: Introduce bulk reference count
  2011-11-03 13:34     ` Paul E. McKenney
@ 2011-11-03 20:19       ` Paul E. McKenney
  0 siblings, 0 replies; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-03 20:19 UTC (permalink / raw)
  To: Josh Triplett
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, darren, patches

On Thu, Nov 03, 2011 at 06:34:35AM -0700, Paul E. McKenney wrote:
> On Wed, Nov 02, 2011 at 09:34:41PM -0700, Josh Triplett wrote:
> > On Wed, Nov 02, 2011 at 01:30:45PM -0700, Paul E. McKenney wrote:
> > > The RCU implementations, including SRCU, are designed to be used in a
> > > lock-like fashion, so that the read-side lock and unlock primitives must
> > > execute in the same context for any given read-side critical section.
> > > This constraint is enforced by lockdep-RCU.  However, there is a need for
> > > something that acts more like a reference count than a lock, in order
> > > to allow (for example) the reference to be acquired within the context
> > > of an exception, while that same reference is released in the context of
> > > the task that encountered the exception.  The cost of this capability is
> > > that the read-side operations incur the overhead of disabling interrupts.
> > > Some optimization is possible, and will be carried out if warranted.
> > > 
> > > Note that although the current implementation allows a given reference to
> > > be acquired by one task and then released by another, all known possible
> > > implementations that allow this have scalability problems.  Therefore,
> > > a given reference must be released by the same task that acquired it,
> > > though perhaps from an interrupt or exception handler running within
> > > that task's context.
> > 
> > This new bulkref API seems in dire need of documentation. :)
> > 
> > > --- a/include/linux/srcu.h
> > > +++ b/include/linux/srcu.h
> > > @@ -181,4 +181,54 @@ static inline void srcu_read_unlock(struct srcu_struct *sp, int idx)
> > >  	__srcu_read_unlock(sp, idx);
> > >  }
> > >  
> > > +/* Definitions for bulkref_t, currently defined in terms of SRCU. */
> > > +
> > > +typedef struct srcu_struct bulkref_t;
> > > +int init_srcu_struct_fields(struct srcu_struct *sp);
> > > +
> > > +static inline int init_bulkref(bulkref_t *brp)
> > > +{
> > > +	return init_srcu_struct_fields(brp);
> > > +}
> > 
> > Why can't this call init_srcu_struct and avoid the need to use the
> > previously unexported internal function?
> 
> Seems reasonable now that you mention it.  ;-)

Except that doing so results in lockdep initialization that cannot be
used.  :-(

							Thanx, Paul


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 28/28] rcu: Fix idle-task checks
  2011-11-03  4:55   ` Josh Triplett
@ 2011-11-03 21:00     ` Paul E. McKenney
  2011-11-03 23:05       ` Josh Triplett
  2011-11-09 14:52     ` Peter Zijlstra
  1 sibling, 1 reply; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-03 21:00 UTC (permalink / raw)
  To: Josh Triplett
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, darren, patches, Paul E. McKenney

On Wed, Nov 02, 2011 at 09:55:09PM -0700, Josh Triplett wrote:
> On Wed, Nov 02, 2011 at 01:30:49PM -0700, Paul E. McKenney wrote:
> > From: Paul E. McKenney <paul.mckenney@linaro.org>
> > 
> > RCU has traditionally relied on idle_cpu() to determine whether a given
> > CPU is running in the context of an idle task, but recent changes have
> > invalidated this approach.  This commit therefore switches from idle_cpu
> > to "current->pid != 0".
> 
> Could you elaborate a bit on "recent changes"?  It looks like you mean
> commit 908a3283728d92df36e0c7cd63304fd35e93a8a9; if so, could you add
> that reference to the commit message?

Will do!

> Also, the hard-coded use of "current->pid != 0" concerns me.  Could this
> use some existing function?  Does idle_task() help?  If no appropriate
> predicate exists, perhaps it should.  is_idle_task(current)?

I could use idle_task(), but that does quite a bit more work.
The hard-coded "current->pid != 0" is used in a number of other places
in the kernel, so there is precedent.  Might be worth fixing globally
as a separate fix, though.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 0/28] Preview of RCU changes for 3.3
  2011-11-03  4:55 ` [PATCH RFC tip/core/rcu 0/28] Preview of RCU changes for 3.3 Josh Triplett
@ 2011-11-03 21:45   ` Paul E. McKenney
  0 siblings, 0 replies; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-03 21:45 UTC (permalink / raw)
  To: Josh Triplett
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, darren, patches

On Wed, Nov 02, 2011 at 09:55:35PM -0700, Josh Triplett wrote:
> On Wed, Nov 02, 2011 at 01:30:17PM -0700, Paul E. McKenney wrote:
> > This patchset permits idle tasks to use RCU read-side critical sections,
> > although they are still prohibited between tick_nohz_idle_exit_norcu()
> > and tick_nohz_idle_exit_norcu(); makes synchronize_sched_expedited()
> > better able to share work among concurrent callers, allows ftrace_dump()
> > to be invoked from modules, dumps tracing upon detection of an rcutorture
> > failure, detects illegal use of RCU read-side critical sections from
> > extended quiescent states, legitimizes the pre-existin use of RCU in the
> > idle notifiers, fixes a memory-barrier botch, introduces an SRCU-like bulk
> > reference count, improve dynticks entry/exit tracing, further improves
> > RCU's ability to allow a given CPU to enter dyntick-idle mode quickly,
> > fixes idle-task checks, updates documentation, and additional fixes
> > from a still-ongoing top-to-bottom inspection of RCU.  The patches are
> > as follows:
> 
> I've reviewed all of these patches.  For all of them except those
> indicated below:
> Reviewed-by: Josh Triplett <josh@joshtriplett.org>

Thank you for your careful review and thoughtful comments!

I have pushed the changes to -rcu at:

git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git rcu/dev

Could you please check to make sure that I got your Reviewed-by
on the right commits?

							Thanx, Paul

> > 1.	Strengthen memory barriers used in PowerPC value-returning
> > 	atomics and locking primitives.  It is likely that this
> > 	commit will be superseded by something from the powerpc
> > 	maintainers.  The need for this strengthening was validated
> > 	by tooling from Peter Sewell's group at the University of
> > 	Cambridge.
> 
> As before, I don't have the background on powerpc to provide review on
> this one.  However, I trust Peter Sewell's group to have gotten the
> details right. :)
> 
> > 5.	Document the troubleshooting of lockdep lock-class leaks.
> 
> I replied with a few comments and a typo fix.
> 
> > 11.	Remove a useless self-awaken when setting up expedited grace
> > 	periods, courtesy of Thomas Gleixner and the -rt effort.
> 
> Replied with a fix: commit needs splitting.
> 
> > 12-17.	Make lockdep-RCU warn when RCU read-side primitives are
> > 	invoked from an idle RCU extended quiescent state, mostly
> > 	courtesy of Frederic Weisbecker.
> 
> Replied to 17 with a minor nit, but Reviewed-by still applies to all
> five with or without that nit fixed.
> 
> > 18-23.	Separate out the scheduler-clock tick's idea of dyntick
> > 	idle from RCU's notion of an idle extended quiescent state, mostly
> > 	courtesy of Frederic Weisbecker.  These commits are needed for
> > 	Frederic's work to suppress the scheduler-clock tick when there
> > 	is but one runnable task on a given CPU.
> 
> Very much looking forward to that work.  Any pointer to more information
> on tickless-when-one-task?
> 
> Replied to patch 19 with some naming comments; the rest seem fine.
> 
> > 24.	Introduce a bulk reference count, which is related to SRCU,
> > 	but which allows a reference to be acquired in an irq handler
> > 	and released by the task that was interrupted.
> 
> Replied with comments.
> 
> > 27.	Allow CPUs with pending RCU callbacks to enter dyntick-idle
> > 	mode.  Beware this commit, as it compiled and passed rcutorture
> > 	on the first try, which historically has indicated the presence
> > 	of subtle and highly destructive bugs.
> 
> Heh.  I reviewed this one particularly carefully, then, but I didn't
> find any logic errors.  I did reply with a couple of comments, though.
> 
> > 28.	Fix RCU's determination of whether or not it is running in the
> > 	context of an idle task.
> 
> Replied with concerns.
> 
> - Josh Triplett
> 


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 28/28] rcu: Fix idle-task checks
  2011-11-03 21:00     ` Paul E. McKenney
@ 2011-11-03 23:05       ` Josh Triplett
  0 siblings, 0 replies; 74+ messages in thread
From: Josh Triplett @ 2011-11-03 23:05 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, darren, patches, Paul E. McKenney

On Thu, Nov 03, 2011 at 02:00:05PM -0700, Paul E. McKenney wrote:
> On Wed, Nov 02, 2011 at 09:55:09PM -0700, Josh Triplett wrote:
> > On Wed, Nov 02, 2011 at 01:30:49PM -0700, Paul E. McKenney wrote:
> > > From: Paul E. McKenney <paul.mckenney@linaro.org>
> > > 
> > > RCU has traditionally relied on idle_cpu() to determine whether a given
> > > CPU is running in the context of an idle task, but recent changes have
> > > invalidated this approach.  This commit therefore switches from idle_cpu
> > > to "current->pid != 0".
> > 
> > Could you elaborate a bit on "recent changes"?  It looks like you mean
> > commit 908a3283728d92df36e0c7cd63304fd35e93a8a9; if so, could you add
> > that reference to the commit message?
> 
> Will do!
> 
> > Also, the hard-coded use of "current->pid != 0" concerns me.  Could this
> > use some existing function?  Does idle_task() help?  If no appropriate
> > predicate exists, perhaps it should.  is_idle_task(current)?
> 
> I could use idle_task(), but that does quite a bit more work.

Doesn't seem that high-overhead, but *shrug*.

> The hard-coded "current->pid != 0" is used in a number of other places
> in the kernel, so there is precedent. 

Well, 2 is a number, yes:

arch/sparc/kernel/setup_32.c:   if(current->pid != 0) {
kernel/events/core.c:           if (!(event->attr.exclude_idle && current->pid == 0))

> Might be worth fixing globally
> as a separate fix, though.

Fair enough.

- Josh Triplett

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 05/28] lockdep: Update documentation for lock-class leak detection
  2011-11-03 19:42     ` Paul E. McKenney
@ 2011-11-09 14:02       ` Peter Zijlstra
  2011-11-10 17:22         ` Paul E. McKenney
  0 siblings, 1 reply; 74+ messages in thread
From: Peter Zijlstra @ 2011-11-09 14:02 UTC (permalink / raw)
  To: paulmck
  Cc: Josh Triplett, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, niv, tglx, rostedt, Valdis.Kletnieks,
	dhowells, eric.dumazet, darren, patches

On Thu, 2011-11-03 at 12:42 -0700, Paul E. McKenney wrote:
> > If so, could we simply arrange to have lockdep scream when it encounters
> > an uninitialized spinlock?
> 
> I reworded to distinguish between compile-time initialization (which will
> cause lockdep to have a separate class per instance) and run-time
> initialization (which will cause lockdep to have one class total).

Right, runtime init will key off of the call-site, compile-time init
will key off of the static data address.

> Making lockdep scream in this case might be useful, but if I understand
> correctly, that would give false positives for compile-time initialized
> global locks. 

Yeah, that's going to bring a lot of pain with it, in particular all the
early stuff like the init task etc. are all statically initialized.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 19/28] nohz: Allow rcu extended quiescent state handling seperately from tick stop
  2011-11-03 16:06           ` Paul E. McKenney
@ 2011-11-09 14:28             ` Peter Zijlstra
  2011-11-09 16:48             ` Frederic Weisbecker
  1 sibling, 0 replies; 74+ messages in thread
From: Peter Zijlstra @ 2011-11-09 14:28 UTC (permalink / raw)
  To: paulmck
  Cc: Josh Triplett, Frederic Weisbecker, linux-kernel, mingo, laijs,
	dipankar, akpm, mathieu.desnoyers, niv, tglx, rostedt,
	Valdis.Kletnieks, dhowells, eric.dumazet, darren, patches,
	Mike Frysinger, Guan Xuetao, David Miller, Chris Metcalf,
	Hans-Christian Egtvedt, Ralf Baechle, Ingo Molnar,
	H. Peter Anvin, Russell King, Paul Mackerras, Heiko Carstens,
	Paul Mundt

On Thu, 2011-11-03 at 09:06 -0700, Paul E. McKenney wrote:
> > Mostly I think that since this series tries to separate the concepts of
> > "idle nohz" and "rcu extended quiescent state", we should end up with
> > two entirely separate functions delimiting those two, without any
> > functions that poke both with correspondingly complex compound names.
> 
> Having four API members rather than the current six does seem quite
> attractive to me.  Frederic, any reason why this approach won't work? 

Quite agreed. And since you seem to be touching most archs anyway,
touching them all isn't much more extra work.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 28/28] rcu: Fix idle-task checks
  2011-11-03  4:55   ` Josh Triplett
  2011-11-03 21:00     ` Paul E. McKenney
@ 2011-11-09 14:52     ` Peter Zijlstra
  1 sibling, 0 replies; 74+ messages in thread
From: Peter Zijlstra @ 2011-11-09 14:52 UTC (permalink / raw)
  To: Josh Triplett
  Cc: Paul E. McKenney, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, niv, tglx, rostedt, Valdis.Kletnieks,
	dhowells, eric.dumazet, darren, patches, Paul E. McKenney

On Wed, 2011-11-02 at 21:55 -0700, Josh Triplett wrote:
> On Wed, Nov 02, 2011 at 01:30:49PM -0700, Paul E. McKenney wrote:
> > From: Paul E. McKenney <paul.mckenney@linaro.org>
> > 
> > RCU has traditionally relied on idle_cpu() to determine whether a given
> > CPU is running in the context of an idle task, but recent changes have
> > invalidated this approach.  This commit therefore switches from idle_cpu
> > to "current->pid != 0".
> 
> Could you elaborate a bit on "recent changes"?  It looks like you mean
> commit 908a3283728d92df36e0c7cd63304fd35e93a8a9; if so, could you add
> that reference to the commit message?

Oh, that was unintended fallout, idle_cpu() was taken to mean is this
cpu currently idle, and was changed to not return true when there's
pending wakeups, since in that case the cpu isn't actually idle, even
though it might still be running the idle task.

> Also, the hard-coded use of "current->pid != 0" concerns me.  Could this
> use some existing function?  Does idle_task() help?  If no appropriate
> predicate exists, perhaps it should.  is_idle_task(current)?

Right, current == idle_task(smp_processor_id()) will test if the current
task is the idle task for the current cpu, regardless of whether the cpu
is actually idle or not.

Then again, the ->pid == 0 thing seems to be fairly solid as well,
having just looked at the fork_idle() code etc.. 

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 19/28] nohz: Allow rcu extended quiescent state handling seperately from tick stop
  2011-11-03 16:06           ` Paul E. McKenney
  2011-11-09 14:28             ` Peter Zijlstra
@ 2011-11-09 16:48             ` Frederic Weisbecker
  2011-11-10 10:52               ` Peter Zijlstra
  2011-11-10 17:22               ` Paul E. McKenney
  1 sibling, 2 replies; 74+ messages in thread
From: Frederic Weisbecker @ 2011-11-09 16:48 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Josh Triplett, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, niv, tglx, peterz, rostedt, Valdis.Kletnieks,
	dhowells, eric.dumazet, darren, patches, Mike Frysinger,
	Guan Xuetao, David Miller, Chris Metcalf, Hans-Christian Egtvedt,
	Ralf Baechle, Ingo Molnar, Peter Zijlstra, H. Peter Anvin,
	Russell King, Paul Mackerras, Heiko Carstens, Paul Mundt

On Thu, Nov 03, 2011 at 09:06:56AM -0700, Paul E. McKenney wrote:
> On Thu, Nov 03, 2011 at 08:31:02AM -0700, Josh Triplett wrote:
> > On Thu, Nov 03, 2011 at 06:32:31AM -0700, Paul E. McKenney wrote:
> > > On Thu, Nov 03, 2011 at 12:54:33PM +0100, Frederic Weisbecker wrote:
> > > > On Wed, Nov 02, 2011 at 09:00:03PM -0700, Josh Triplett wrote:
> > > > > On Wed, Nov 02, 2011 at 01:30:40PM -0700, Paul E. McKenney wrote:
> > > > > > From: Frederic Weisbecker <fweisbec@gmail.com>
> > > > > > 
> > > > > > It is assumed that rcu won't be used once we switch to tickless
> > > > > > mode and until we restart the tick. However this is not always
> > > > > > true, as in x86-64 where we dereference the idle notifiers after
> > > > > > the tick is stopped.
> > > > > > 
> > > > > > To prepare for fixing this, add two new APIs:
> > > > > > tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().
> > > > > > 
> > > > > > If no use of RCU is made in the idle loop between
> > > > > > tick_nohz_enter_idle() and tick_nohz_exit_idle() calls, the arch
> > > > > > must instead call the new *_norcu() version such that the arch doesn't
> > > > > > need to call rcu_idle_enter() and rcu_idle_exit().
> > > > > 
> > > > > The _norcu names confused me a bit.  At first, I thought they meant
> > > > > "idle but not RCU idle, so you can use RCU", but from re-reading the
> > > > > commit message, apparently they mean "idle and RCU idle, so don't use
> > > > > RCU".  What about something like _forbid_rcu instead?  Or,
> > > > > alternatively, why not just go ahead and separate the two types of idle
> > > > > entirely rather than introducing the _norcu variants first?
> > > > 
> > > > Or tick_nohz_idle_enter_rcu_stop() and tick_nohz_idle_exit_rcu_restart()?
> > > > 
> > > > Sounds clear but too long. May be we can shorten the tick_nohz thing in the
> > > > beginning.
> > > 
> > > How about tick_nohz_rcu_idle_enter() vs. tick_nohz_idle_enter() on
> > > entry to the idle loop and tick_nohz_rcu_idle_exit() vs
> > > tick_nohz_idle_exit() on exit?
> > > 
> > > That said, I don't feel all that strongly on this naming topic.
> > 
> > Mostly I think that since this series tries to separate the concepts of
> > "idle nohz" and "rcu extended quiescent state", we should end up with
> > two entirely separate functions delimiting those two, without any
> > functions that poke both with correspondingly complex compound names.
> 
> Having four API members rather than the current six does seem quite
> attractive to me.  Frederic, any reason why this approach won't work?

The approach I took might sound silly but it's mostly an optimization:

I did the *_norcu() variant mostly to be able to keep rcu_idle_enter()
call under the same local_irq_disable() section.

This way we can't have an interrupt in between that can needlessly perform
RCU work (and trigger the softirq in the worst case), delaying the point
where we actually put the CPU to sleep.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 19/28] nohz: Allow rcu extended quiescent state handling seperately from tick stop
  2011-11-09 16:48             ` Frederic Weisbecker
@ 2011-11-10 10:52               ` Peter Zijlstra
  2011-11-10 17:22               ` Paul E. McKenney
  1 sibling, 0 replies; 74+ messages in thread
From: Peter Zijlstra @ 2011-11-10 10:52 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E. McKenney, Josh Triplett, linux-kernel, mingo, laijs,
	dipankar, akpm, mathieu.desnoyers, niv, tglx, rostedt,
	Valdis.Kletnieks, dhowells, eric.dumazet, darren, patches,
	Mike Frysinger, Guan Xuetao, David Miller, Chris Metcalf,
	Hans-Christian Egtvedt, Ralf Baechle, Ingo Molnar,
	H. Peter Anvin, Russell King, Paul Mackerras, Heiko Carstens,
	Paul Mundt

On Wed, 2011-11-09 at 17:48 +0100, Frederic Weisbecker wrote:
> > Having four API members rather than the current six does seem quite
> > attractive to me.  Frederic, any reason why this approach won't work?
> 
> The approach I took might sound silly but it's mostly an optimization:
> 
> I did the *_norcu() variant mostly to be able to keep rcu_idle_enter()
> call under the same local_irq_disable() section.
> 
> This way we can't have an interrupt in between that can needlessly perform
> RCU work (and trigger the softirq in the worst case), delaying the point
> where we actually put the CPU to sleep. 

I'm not sure I get what you're saying. A fully decoupled RCU/NO_HZ API
looks like:

  rcu_idle_enter();
  rcu_idle_exit();

  tick_nohz_idle_enter();
  tick_nohz_idle_exit();

And done you are, no funny interactions, 4 functions.

There is no _norcu variant simply because nohz will never touch rcu. If
you want the old coupled behaviour simply call both
tick_nohz_idle_enter() and rcu_idle_enter().



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 19/28] nohz: Allow rcu extended quiescent state handling seperately from tick stop
  2011-11-09 16:48             ` Frederic Weisbecker
  2011-11-10 10:52               ` Peter Zijlstra
@ 2011-11-10 17:22               ` Paul E. McKenney
  2011-11-15 18:30                 ` Frederic Weisbecker
  1 sibling, 1 reply; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-10 17:22 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Josh Triplett, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, niv, tglx, peterz, rostedt, Valdis.Kletnieks,
	dhowells, eric.dumazet, darren, patches, Mike Frysinger,
	Guan Xuetao, David Miller, Chris Metcalf, Hans-Christian Egtvedt,
	Ralf Baechle, Ingo Molnar, Peter Zijlstra, H. Peter Anvin,
	Russell King, Paul Mackerras, Heiko Carstens, Paul Mundt

On Wed, Nov 09, 2011 at 05:48:11PM +0100, Frederic Weisbecker wrote:
> On Thu, Nov 03, 2011 at 09:06:56AM -0700, Paul E. McKenney wrote:
> > On Thu, Nov 03, 2011 at 08:31:02AM -0700, Josh Triplett wrote:
> > > On Thu, Nov 03, 2011 at 06:32:31AM -0700, Paul E. McKenney wrote:
> > > > On Thu, Nov 03, 2011 at 12:54:33PM +0100, Frederic Weisbecker wrote:
> > > > > On Wed, Nov 02, 2011 at 09:00:03PM -0700, Josh Triplett wrote:
> > > > > > On Wed, Nov 02, 2011 at 01:30:40PM -0700, Paul E. McKenney wrote:
> > > > > > > From: Frederic Weisbecker <fweisbec@gmail.com>
> > > > > > > 
> > > > > > > It is assumed that rcu won't be used once we switch to tickless
> > > > > > > mode and until we restart the tick. However this is not always
> > > > > > > true, as in x86-64 where we dereference the idle notifiers after
> > > > > > > the tick is stopped.
> > > > > > > 
> > > > > > > To prepare for fixing this, add two new APIs:
> > > > > > > tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().
> > > > > > > 
> > > > > > > If no use of RCU is made in the idle loop between
> > > > > > > tick_nohz_enter_idle() and tick_nohz_exit_idle() calls, the arch
> > > > > > > must instead call the new *_norcu() version such that the arch doesn't
> > > > > > > need to call rcu_idle_enter() and rcu_idle_exit().
> > > > > > 
> > > > > > The _norcu names confused me a bit.  At first, I thought they meant
> > > > > > "idle but not RCU idle, so you can use RCU", but from re-reading the
> > > > > > commit message, apparently they mean "idle and RCU idle, so don't use
> > > > > > RCU".  What about something like _forbid_rcu instead?  Or,
> > > > > > alternatively, why not just go ahead and separate the two types of idle
> > > > > > entirely rather than introducing the _norcu variants first?
> > > > > 
> > > > > Or tick_nohz_idle_enter_rcu_stop() and tick_nohz_idle_exit_rcu_restart()?
> > > > > 
> > > > > Sounds clear but too long. May be we can shorten the tick_nohz thing in the
> > > > > beginning.
> > > > 
> > > > How about tick_nohz_rcu_idle_enter() vs. tick_nohz_idle_enter() on
> > > > entry to the idle loop and tick_nohz_rcu_idle_exit() vs
> > > > tick_nohz_idle_exit() on exit?
> > > > 
> > > > That said, I don't feel all that strongly on this naming topic.
> > > 
> > > Mostly I think that since this series tries to separate the concepts of
> > > "idle nohz" and "rcu extended quiescent state", we should end up with
> > > two entirely separate functions delimiting those two, without any
> > > functions that poke both with correspondingly complex compound names.
> > 
> > Having four API members rather than the current six does seem quite
> > attractive to me.  Frederic, any reason why this approach won't work?
> 
> The approach I took might sound silly but it's mostly an optimization:
> 
> I did the *_norcu() variant mostly to be able to keep rcu_idle_enter()
> call under the same local_irq_disable() section.
> 
> This way we can't have an interrupt in between that can needlessly perform
> RCU work (and trigger the softirq in the worst case), delaying the point
> where we actually put the CPU to sleep.

But we have to tolerate this sort of thing on some architectures (x86
and Power) in order to allow idle-task use of RCU read-side primitives,
right?

So consolidating from six to four APIs doesn't expand the overall state
space.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 05/28] lockdep: Update documentation for lock-class leak detection
  2011-11-09 14:02       ` Peter Zijlstra
@ 2011-11-10 17:22         ` Paul E. McKenney
  0 siblings, 0 replies; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-10 17:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Josh Triplett, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, niv, tglx, rostedt, Valdis.Kletnieks,
	dhowells, eric.dumazet, darren, patches

On Wed, Nov 09, 2011 at 03:02:08PM +0100, Peter Zijlstra wrote:
> On Thu, 2011-11-03 at 12:42 -0700, Paul E. McKenney wrote:
> > > If so, could we simply arrange to have lockdep scream when it encounters
> > > an uninitialized spinlock?
> > 
> > I reworded to distinguish between compile-time initialization (which will
> > cause lockdep to have a separate class per instance) and run-time
> > initialization (which will cause lockdep to have one class total).
> 
> Right, runtime init will key off of the call-site, compile-time init
> will key off of the static data address.
> 
> > Making lockdep scream in this case might be useful, but if I understand
> > correctly, that would give false positives for compile-time initialized
> > global locks. 
> 
> Yeah, that's going to bring a lot of pain with it, in particular all the
> early stuff like the init task etc. are all statically initialized.

OK, will stick with the current approach, then.

								Thanx, Paul


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 19/28] nohz: Allow rcu extended quiescent state handling seperately from tick stop
  2011-11-10 17:22               ` Paul E. McKenney
@ 2011-11-15 18:30                 ` Frederic Weisbecker
  2011-11-16 19:41                   ` Paul E. McKenney
  0 siblings, 1 reply; 74+ messages in thread
From: Frederic Weisbecker @ 2011-11-15 18:30 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Josh Triplett, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, niv, tglx, peterz, rostedt, Valdis.Kletnieks,
	dhowells, eric.dumazet, darren, patches, Mike Frysinger,
	Guan Xuetao, David Miller, Chris Metcalf, Hans-Christian Egtvedt,
	Ralf Baechle, Ingo Molnar, Peter Zijlstra, H. Peter Anvin,
	Russell King, Paul Mackerras, Heiko Carstens, Paul Mundt

On Thu, Nov 10, 2011 at 09:22:19AM -0800, Paul E. McKenney wrote:
> On Wed, Nov 09, 2011 at 05:48:11PM +0100, Frederic Weisbecker wrote:
> > On Thu, Nov 03, 2011 at 09:06:56AM -0700, Paul E. McKenney wrote:
> > > On Thu, Nov 03, 2011 at 08:31:02AM -0700, Josh Triplett wrote:
> > > > On Thu, Nov 03, 2011 at 06:32:31AM -0700, Paul E. McKenney wrote:
> > > > > On Thu, Nov 03, 2011 at 12:54:33PM +0100, Frederic Weisbecker wrote:
> > > > > > On Wed, Nov 02, 2011 at 09:00:03PM -0700, Josh Triplett wrote:
> > > > > > > On Wed, Nov 02, 2011 at 01:30:40PM -0700, Paul E. McKenney wrote:
> > > > > > > > From: Frederic Weisbecker <fweisbec@gmail.com>
> > > > > > > > 
> > > > > > > > It is assumed that rcu won't be used once we switch to tickless
> > > > > > > > mode and until we restart the tick. However this is not always
> > > > > > > > true, as in x86-64 where we dereference the idle notifiers after
> > > > > > > > the tick is stopped.
> > > > > > > > 
> > > > > > > > To prepare for fixing this, add two new APIs:
> > > > > > > > tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().
> > > > > > > > 
> > > > > > > > If no use of RCU is made in the idle loop between
> > > > > > > > tick_nohz_enter_idle() and tick_nohz_exit_idle() calls, the arch
> > > > > > > > must instead call the new *_norcu() version such that the arch doesn't
> > > > > > > > need to call rcu_idle_enter() and rcu_idle_exit().
> > > > > > > 
> > > > > > > The _norcu names confused me a bit.  At first, I thought they meant
> > > > > > > "idle but not RCU idle, so you can use RCU", but from re-reading the
> > > > > > > commit message, apparently they mean "idle and RCU idle, so don't use
> > > > > > > RCU".  What about something like _forbid_rcu instead?  Or,
> > > > > > > alternatively, why not just go ahead and separate the two types of idle
> > > > > > > entirely rather than introducing the _norcu variants first?
> > > > > > 
> > > > > > Or tick_nohz_idle_enter_rcu_stop() and tick_nohz_idle_exit_rcu_restart()?
> > > > > > 
> > > > > > Sounds clear but too long. May be we can shorten the tick_nohz thing in the
> > > > > > beginning.
> > > > > 
> > > > > How about tick_nohz_rcu_idle_enter() vs. tick_nohz_idle_enter() on
> > > > > entry to the idle loop and tick_nohz_rcu_idle_exit() vs
> > > > > tick_nohz_idle_exit() on exit?
> > > > > 
> > > > > That said, I don't feel all that strongly on this naming topic.
> > > > 
> > > > Mostly I think that since this series tries to separate the concepts of
> > > > "idle nohz" and "rcu extended quiescent state", we should end up with
> > > > two entirely separate functions delimiting those two, without any
> > > > functions that poke both with correspondingly complex compound names.
> > > 
> > > Having four API members rather than the current six does seem quite
> > > attractive to me.  Frederic, any reason why this approach won't work?
> > 
> > The approach I took might sound silly but it's mostly an optimization:
> > 
> > I did the *_norcu() variant mostly to be able to keep rcu_idle_enter()
> > call under the same local_irq_disable() section.
> > 
> > This way we can't have an interrupt in between that can needlessly perform
> > RCU work (and trigger the softirq in the worst case), delaying the point
> > where we actually put the CPU to sleep.
> 
> But we have to tolerate this sort of thing on some architectures (x86
> and Power) in order to allow idle-task use of RCU read-side primitives,
> right?
> 
> So consolidating from six to four APIs doesn't expand the overall state
> space.

Well, we tolerate that, the two more APIs are there for optimization, not
to provide correctness.
But if you want me to remove the optimization and keep only the four APIs I can do it.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 19/28] nohz: Allow rcu extended quiescent state handling seperately from tick stop
  2011-11-15 18:30                 ` Frederic Weisbecker
@ 2011-11-16 19:41                   ` Paul E. McKenney
  0 siblings, 0 replies; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-16 19:41 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Josh Triplett, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, niv, tglx, peterz, rostedt, Valdis.Kletnieks,
	dhowells, eric.dumazet, darren, patches, Mike Frysinger,
	Guan Xuetao, David Miller, Chris Metcalf, Hans-Christian Egtvedt,
	Ralf Baechle, Ingo Molnar, Peter Zijlstra, H. Peter Anvin,
	Russell King, Paul Mackerras, Heiko Carstens, Paul Mundt

On Tue, Nov 15, 2011 at 07:30:29PM +0100, Frederic Weisbecker wrote:
> On Thu, Nov 10, 2011 at 09:22:19AM -0800, Paul E. McKenney wrote:
> > On Wed, Nov 09, 2011 at 05:48:11PM +0100, Frederic Weisbecker wrote:
> > > On Thu, Nov 03, 2011 at 09:06:56AM -0700, Paul E. McKenney wrote:
> > > > On Thu, Nov 03, 2011 at 08:31:02AM -0700, Josh Triplett wrote:
> > > > > On Thu, Nov 03, 2011 at 06:32:31AM -0700, Paul E. McKenney wrote:
> > > > > > On Thu, Nov 03, 2011 at 12:54:33PM +0100, Frederic Weisbecker wrote:
> > > > > > > On Wed, Nov 02, 2011 at 09:00:03PM -0700, Josh Triplett wrote:
> > > > > > > > On Wed, Nov 02, 2011 at 01:30:40PM -0700, Paul E. McKenney wrote:
> > > > > > > > > From: Frederic Weisbecker <fweisbec@gmail.com>
> > > > > > > > > 
> > > > > > > > > It is assumed that rcu won't be used once we switch to tickless
> > > > > > > > > mode and until we restart the tick. However this is not always
> > > > > > > > > true, as in x86-64 where we dereference the idle notifiers after
> > > > > > > > > the tick is stopped.
> > > > > > > > > 
> > > > > > > > > To prepare for fixing this, add two new APIs:
> > > > > > > > > tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().
> > > > > > > > > 
> > > > > > > > > If no use of RCU is made in the idle loop between
> > > > > > > > > tick_nohz_enter_idle() and tick_nohz_exit_idle() calls, the arch
> > > > > > > > > must instead call the new *_norcu() version such that the arch doesn't
> > > > > > > > > need to call rcu_idle_enter() and rcu_idle_exit().
> > > > > > > > 
> > > > > > > > The _norcu names confused me a bit.  At first, I thought they meant
> > > > > > > > "idle but not RCU idle, so you can use RCU", but from re-reading the
> > > > > > > > commit message, apparently they mean "idle and RCU idle, so don't use
> > > > > > > > RCU".  What about something like _forbid_rcu instead?  Or,
> > > > > > > > alternatively, why not just go ahead and separate the two types of idle
> > > > > > > > entirely rather than introducing the _norcu variants first?
> > > > > > > 
> > > > > > > Or tick_nohz_idle_enter_rcu_stop() and tick_nohz_idle_exit_rcu_restart()?
> > > > > > > 
> > > > > > > Sounds clear but too long. May be we can shorten the tick_nohz thing in the
> > > > > > > beginning.
> > > > > > 
> > > > > > How about tick_nohz_rcu_idle_enter() vs. tick_nohz_idle_enter() on
> > > > > > entry to the idle loop and tick_nohz_rcu_idle_exit() vs
> > > > > > tick_nohz_idle_exit() on exit?
> > > > > > 
> > > > > > That said, I don't feel all that strongly on this naming topic.
> > > > > 
> > > > > Mostly I think that since this series tries to separate the concepts of
> > > > > "idle nohz" and "rcu extended quiescent state", we should end up with
> > > > > two entirely separate functions delimiting those two, without any
> > > > > functions that poke both with correspondingly complex compound names.
> > > > 
> > > > Having four API members rather than the current six does seem quite
> > > > attractive to me.  Frederic, any reason why this approach won't work?
> > > 
> > > The approach I took might sound silly but it's mostly an optimization:
> > > 
> > > I did the *_norcu() variant mostly to be able to keep rcu_idle_enter()
> > > call under the same local_irq_disable() section.
> > > 
> > > This way we can't have an interrupt in between that can needlessly perform
> > > RCU work (and trigger the softirq in the worst case), delaying the point
> > > where we actually put the CPU to sleep.
> > 
> > But we have to tolerate this sort of thing on some architectures (x86
> > and Power) in order to allow idle-task use of RCU read-side primitives,
> > right?
> > 
> > So consolidating from six to four APIs doesn't expand the overall state
> > space.
> 
> Well, we tolerate that, the two more APIs are there for optimization, not
> to provide correctness.
> But if you want me to remove the optimization and keep only the four APIs I can do it.

Probably best to start with the simpler API and expand it if performance
considerations suggest that this is approapriate.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 24/28] rcu: Introduce bulk reference count
  2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 24/28] rcu: Introduce bulk reference count Paul E. McKenney
  2011-11-03  4:34   ` Josh Triplett
@ 2011-11-28 12:41   ` Peter Zijlstra
  2011-11-28 17:15     ` Paul E. McKenney
  1 sibling, 1 reply; 74+ messages in thread
From: Peter Zijlstra @ 2011-11-28 12:41 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, niv, tglx, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, darren, patches

On Wed, 2011-11-02 at 13:30 -0700, Paul E. McKenney wrote:
> The RCU implementations, including SRCU, are designed to be used in a
> lock-like fashion, so that the read-side lock and unlock primitives must
> execute in the same context for any given read-side critical section.
> This constraint is enforced by lockdep-RCU.  However, there is a need for
> something that acts more like a reference count than a lock, in order
> to allow (for example) the reference to be acquired within the context
> of an exception, while that same reference is released in the context of
> the task that encountered the exception.  The cost of this capability is
> that the read-side operations incur the overhead of disabling interrupts.
> Some optimization is possible, and will be carried out if warranted.
> 
> Note that although the current implementation allows a given reference to
> be acquired by one task and then released by another, all known possible
> implementations that allow this have scalability problems.  Therefore,
> a given reference must be released by the same task that acquired it,
> though perhaps from an interrupt or exception handler running within
> that task's context.

I'm having trouble with the naming as well as the need for an explicit
new API.

To me this looks like a regular (S)RCU variant, nothing to do with
references per-se (aside from the fact that SRCU is a refcounted rcu
variant). Also WTF is this bulk stuff about? Its still a single ref at a
time, not 10s or 100s or whatnot.

> +static inline int bulkref_get(bulkref_t *brp)
> +{
> +	unsigned long flags;
> +	int ret;
> +
> +	local_irq_save(flags);
> +	ret =  __srcu_read_lock(brp);
> +	local_irq_restore(flags);
> +	return ret;
> +}
> +
> +static inline void bulkref_put(bulkref_t *brp, int idx)
> +{
> +	unsigned long flags;
> +
> +	local_irq_save(flags);
> +	__srcu_read_unlock(brp, idx);
> +	local_irq_restore(flags);
> +}

This seems to be the main gist of the patch, which to me sounds utterly
ridiculous. Why not document that srcu_read_{un,}lock() aren't IRQ safe
and if you want to use it from those contexts you have to fix it up
yourself.

RCU lockdep doesn't do the full validation so it won't actually catch it
if you mess up the irq states, but I guess if you want we could look at
adding that.

> diff --git a/kernel/srcu.c b/kernel/srcu.c
> index 73ce23f..10214c8 100644
> --- a/kernel/srcu.c
> +++ b/kernel/srcu.c
> @@ -34,13 +34,14 @@
>  #include <linux/delay.h>
>  #include <linux/srcu.h>
>  
> -static int init_srcu_struct_fields(struct srcu_struct *sp)
> +int init_srcu_struct_fields(struct srcu_struct *sp)
>  {
>  	sp->completed = 0;
>  	mutex_init(&sp->mutex);
>  	sp->per_cpu_ref = alloc_percpu(struct srcu_struct_array);
>  	return sp->per_cpu_ref ? 0 : -ENOMEM;
>  }
> +EXPORT_SYMBOL_GPL(init_srcu_struct_fields);

What do we need this export for? Usually we don't add exports unless
there's a use-case. Since Srikar requested this nonsense, I guess the
user is uprobes, but that isn't a module, so no export needed.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 24/28] rcu: Introduce bulk reference count
  2011-11-28 12:41   ` Peter Zijlstra
@ 2011-11-28 17:15     ` Paul E. McKenney
  2011-11-28 18:17       ` Peter Zijlstra
  0 siblings, 1 reply; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-28 17:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, niv, tglx, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, darren, patches

On Mon, Nov 28, 2011 at 01:41:11PM +0100, Peter Zijlstra wrote:
> On Wed, 2011-11-02 at 13:30 -0700, Paul E. McKenney wrote:
> > The RCU implementations, including SRCU, are designed to be used in a
> > lock-like fashion, so that the read-side lock and unlock primitives must
> > execute in the same context for any given read-side critical section.
> > This constraint is enforced by lockdep-RCU.  However, there is a need for
> > something that acts more like a reference count than a lock, in order
> > to allow (for example) the reference to be acquired within the context
> > of an exception, while that same reference is released in the context of
> > the task that encountered the exception.  The cost of this capability is
> > that the read-side operations incur the overhead of disabling interrupts.
> > Some optimization is possible, and will be carried out if warranted.
> > 
> > Note that although the current implementation allows a given reference to
> > be acquired by one task and then released by another, all known possible
> > implementations that allow this have scalability problems.  Therefore,
> > a given reference must be released by the same task that acquired it,
> > though perhaps from an interrupt or exception handler running within
> > that task's context.
> 
> I'm having trouble with the naming as well as the need for an explicit
> new API.
> 
> To me this looks like a regular (S)RCU variant, nothing to do with
> references per-se (aside from the fact that SRCU is a refcounted rcu
> variant). Also WTF is this bulk stuff about? Its still a single ref at a
> time, not 10s or 100s or whatnot.

It is a bulk reference in comparison to a conventional atomic_inc()-style
reference count, which is normally associated with a specific structure.
In contrast, doing a bulkref_get() normally protects a group of structures,
everything covered by the bulkref_t.

Yes, in theory you could have a global reference counter that protected
a group of structures, but in practice we both know that this would not
end well.  ;-)

> > +static inline int bulkref_get(bulkref_t *brp)
> > +{
> > +	unsigned long flags;
> > +	int ret;
> > +
> > +	local_irq_save(flags);
> > +	ret =  __srcu_read_lock(brp);
> > +	local_irq_restore(flags);
> > +	return ret;
> > +}
> > +
> > +static inline void bulkref_put(bulkref_t *brp, int idx)
> > +{
> > +	unsigned long flags;
> > +
> > +	local_irq_save(flags);
> > +	__srcu_read_unlock(brp, idx);
> > +	local_irq_restore(flags);
> > +}
> 
> This seems to be the main gist of the patch, which to me sounds utterly
> ridiculous. Why not document that srcu_read_{un,}lock() aren't IRQ safe
> and if you want to use it from those contexts you have to fix it up
> yourself.

I thought I had documented this, but I guess not.  I will add that.

I lost you on the "fix it up yourself" -- what are you suggesting that
someone needing to use RCU in this manner actually do?

> RCU lockdep doesn't do the full validation so it won't actually catch it
> if you mess up the irq states, but I guess if you want we could look at
> adding that.

Ah, I had missed that.  Yes, it would be very good if that could be added.
The vast majority of the uses exit the RCU read-side critical section in
the same context that they enter it, so it would be good to check.

> > diff --git a/kernel/srcu.c b/kernel/srcu.c
> > index 73ce23f..10214c8 100644
> > --- a/kernel/srcu.c
> > +++ b/kernel/srcu.c
> > @@ -34,13 +34,14 @@
> >  #include <linux/delay.h>
> >  #include <linux/srcu.h>
> >  
> > -static int init_srcu_struct_fields(struct srcu_struct *sp)
> > +int init_srcu_struct_fields(struct srcu_struct *sp)
> >  {
> >  	sp->completed = 0;
> >  	mutex_init(&sp->mutex);
> >  	sp->per_cpu_ref = alloc_percpu(struct srcu_struct_array);
> >  	return sp->per_cpu_ref ? 0 : -ENOMEM;
> >  }
> > +EXPORT_SYMBOL_GPL(init_srcu_struct_fields);
> 
> What do we need this export for? Usually we don't add exports unless
> there's a use-case. Since Srikar requested this nonsense, I guess the
> user is uprobes, but that isn't a module, so no export needed.

Yep, the user is uprobes.  The export is for rcutorture, which can run
as a module.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 24/28] rcu: Introduce bulk reference count
  2011-11-28 17:15     ` Paul E. McKenney
@ 2011-11-28 18:17       ` Peter Zijlstra
  2011-11-28 18:31         ` Paul E. McKenney
  0 siblings, 1 reply; 74+ messages in thread
From: Peter Zijlstra @ 2011-11-28 18:17 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, niv, tglx, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, darren, patches

On Mon, 2011-11-28 at 09:15 -0800, Paul E. McKenney wrote:

> > I'm having trouble with the naming as well as the need for an explicit
> > new API.
> > 
> > To me this looks like a regular (S)RCU variant, nothing to do with
> > references per-se (aside from the fact that SRCU is a refcounted rcu
> > variant). Also WTF is this bulk stuff about? Its still a single ref at a
> > time, not 10s or 100s or whatnot.
> 
> It is a bulk reference in comparison to a conventional atomic_inc()-style
> reference count, which is normally associated with a specific structure.
> In contrast, doing a bulkref_get() normally protects a group of structures,
> everything covered by the bulkref_t.
> 
> Yes, in theory you could have a global reference counter that protected
> a group of structures, but in practice we both know that this would not
> end well.  ;-)

Well, all the counter based RCUs are basically that. And yes, making
them scale is 'interesting', however you've done pretty well so far ;-)

I just hate the name in that it totally obscures the fact that its
regular SRCU.

> > > +static inline int bulkref_get(bulkref_t *brp)
> > > +{
> > > +	unsigned long flags;
> > > +	int ret;
> > > +
> > > +	local_irq_save(flags);
> > > +	ret =  __srcu_read_lock(brp);
> > > +	local_irq_restore(flags);
> > > +	return ret;
> > > +}
> > > +
> > > +static inline void bulkref_put(bulkref_t *brp, int idx)
> > > +{
> > > +	unsigned long flags;
> > > +
> > > +	local_irq_save(flags);
> > > +	__srcu_read_unlock(brp, idx);
> > > +	local_irq_restore(flags);
> > > +}
> > 
> > This seems to be the main gist of the patch, which to me sounds utterly
> > ridiculous. Why not document that srcu_read_{un,}lock() aren't IRQ safe
> > and if you want to use it from those contexts you have to fix it up
> > yourself.
> 
> I thought I had documented this, but I guess not.  I will add that.

Oh, I hadn't checked, it could be.

> I lost you on the "fix it up yourself" -- what are you suggesting that
> someone needing to use RCU in this manner actually do?

  local_irq_save(flags);
  srcu_read_lock(&my_srcu_domain);
  local_irq_restore(flags);

and

  local_irq_save(flags);
  srcu_read_unlock(&my_srcu_domain);
  local_irq_restore(flags)

Doesn't look to be too hard, or confusing.

> > RCU lockdep doesn't do the full validation so it won't actually catch it
> > if you mess up the irq states, but I guess if you want we could look at
> > adding that.
> 
> Ah, I had missed that.  Yes, it would be very good if that could be added.
> The vast majority of the uses exit the RCU read-side critical section in
> the same context that they enter it, so it would be good to check.

/me adds to TODO list.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 24/28] rcu: Introduce bulk reference count
  2011-11-28 18:17       ` Peter Zijlstra
@ 2011-11-28 18:31         ` Paul E. McKenney
  2011-11-28 18:35           ` Peter Zijlstra
  2011-11-28 18:36           ` Peter Zijlstra
  0 siblings, 2 replies; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-28 18:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, niv, tglx, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, darren, patches

On Mon, Nov 28, 2011 at 07:17:59PM +0100, Peter Zijlstra wrote:
> On Mon, 2011-11-28 at 09:15 -0800, Paul E. McKenney wrote:
> 
> > > I'm having trouble with the naming as well as the need for an explicit
> > > new API.
> > > 
> > > To me this looks like a regular (S)RCU variant, nothing to do with
> > > references per-se (aside from the fact that SRCU is a refcounted rcu
> > > variant). Also WTF is this bulk stuff about? Its still a single ref at a
> > > time, not 10s or 100s or whatnot.
> > 
> > It is a bulk reference in comparison to a conventional atomic_inc()-style
> > reference count, which is normally associated with a specific structure.
> > In contrast, doing a bulkref_get() normally protects a group of structures,
> > everything covered by the bulkref_t.
> > 
> > Yes, in theory you could have a global reference counter that protected
> > a group of structures, but in practice we both know that this would not
> > end well.  ;-)
> 
> Well, all the counter based RCUs are basically that. And yes, making
> them scale is 'interesting', however you've done pretty well so far ;-)

Fair point, and thank you for the vote of confidence.  ;-)

Nevertheless, when most people talk to me about explicit reference
counters, they are thinking in terms of a reference counter within a
structure protecting that structure.

> I just hate the name in that it totally obscures the fact that its
> regular SRCU.

OK, what names would you suggest?

> > > > +static inline int bulkref_get(bulkref_t *brp)
> > > > +{
> > > > +	unsigned long flags;
> > > > +	int ret;
> > > > +
> > > > +	local_irq_save(flags);
> > > > +	ret =  __srcu_read_lock(brp);
> > > > +	local_irq_restore(flags);
> > > > +	return ret;
> > > > +}
> > > > +
> > > > +static inline void bulkref_put(bulkref_t *brp, int idx)
> > > > +{
> > > > +	unsigned long flags;
> > > > +
> > > > +	local_irq_save(flags);
> > > > +	__srcu_read_unlock(brp, idx);
> > > > +	local_irq_restore(flags);
> > > > +}
> > > 
> > > This seems to be the main gist of the patch, which to me sounds utterly
> > > ridiculous. Why not document that srcu_read_{un,}lock() aren't IRQ safe
> > > and if you want to use it from those contexts you have to fix it up
> > > yourself.
> > 
> > I thought I had documented this, but I guess not.  I will add that.
> 
> Oh, I hadn't checked, it could be.

It wasn't.  I just now fixed it in my local git tree.  ;-)

> > I lost you on the "fix it up yourself" -- what are you suggesting that
> > someone needing to use RCU in this manner actually do?
> 
>   local_irq_save(flags);
>   srcu_read_lock(&my_srcu_domain);
>   local_irq_restore(flags);
> 
> and
> 
>   local_irq_save(flags);
>   srcu_read_unlock(&my_srcu_domain);
>   local_irq_restore(flags)
> 
> Doesn't look to be too hard, or confusing.

Ah, OK, I was under the mistaken impression that lockdep would splat
if you did (for example) srcu_read_lock() in an exception handler and
srcu_read_unlock() in the context of the task that took the exception.

> > > RCU lockdep doesn't do the full validation so it won't actually catch it
> > > if you mess up the irq states, but I guess if you want we could look at
> > > adding that.
> > 
> > Ah, I had missed that.  Yes, it would be very good if that could be added.
> > The vast majority of the uses exit the RCU read-side critical section in
> > the same context that they enter it, so it would be good to check.
> 
> /me adds to TODO list.

Thank you!  Please CC me on this one -- the above fixup would start
failing once lockdep checked for this, right?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 24/28] rcu: Introduce bulk reference count
  2011-11-28 18:31         ` Paul E. McKenney
@ 2011-11-28 18:35           ` Peter Zijlstra
  2011-11-29 13:33             ` Peter Zijlstra
  2011-11-28 18:36           ` Peter Zijlstra
  1 sibling, 1 reply; 74+ messages in thread
From: Peter Zijlstra @ 2011-11-28 18:35 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, niv, tglx, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, darren, patches

On Mon, 2011-11-28 at 10:31 -0800, Paul E. McKenney wrote:
> >   local_irq_save(flags);
> >   srcu_read_lock(&my_srcu_domain);
> >   local_irq_restore(flags);
> > 
> > and
> > 
> >   local_irq_save(flags);
> >   srcu_read_unlock(&my_srcu_domain);
> >   local_irq_restore(flags)
> > 
> > Doesn't look to be too hard, or confusing.
> 
> Ah, OK, I was under the mistaken impression that lockdep would splat
> if you did (for example) srcu_read_lock() in an exception handler and
> srcu_read_unlock() in the context of the task that took the exception. 

I don't think it will, lockdep does very little actual validation on the
RCU locks other than recording they're held. But if they do, the planned
TODO item will get inversed.

Should be easy enough to test I guess.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 24/28] rcu: Introduce bulk reference count
  2011-11-28 18:31         ` Paul E. McKenney
  2011-11-28 18:35           ` Peter Zijlstra
@ 2011-11-28 18:36           ` Peter Zijlstra
  1 sibling, 0 replies; 74+ messages in thread
From: Peter Zijlstra @ 2011-11-28 18:36 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, niv, tglx, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, darren, patches

On Mon, 2011-11-28 at 10:31 -0800, Paul E. McKenney wrote:
> Nevertheless, when most people talk to me about explicit reference
> counters, they are thinking in terms of a reference counter within a
> structure protecting that structure.

Right and when I see bulk I think of exactly those but think +=,-=
instead of ++,--.

> > I just hate the name in that it totally obscures the fact that its
> > regular SRCU.
> 
> OK, what names would you suggest? 

How about nothing at all? Simply use the existing SRCU API?

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 24/28] rcu: Introduce bulk reference count
  2011-11-28 18:35           ` Peter Zijlstra
@ 2011-11-29 13:33             ` Peter Zijlstra
  2011-11-29 17:41               ` Paul E. McKenney
  0 siblings, 1 reply; 74+ messages in thread
From: Peter Zijlstra @ 2011-11-29 13:33 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, niv, tglx, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, darren, patches

On Mon, 2011-11-28 at 19:35 +0100, Peter Zijlstra wrote:
> On Mon, 2011-11-28 at 10:31 -0800, Paul E. McKenney wrote:
> > >   local_irq_save(flags);
> > >   srcu_read_lock(&my_srcu_domain);
> > >   local_irq_restore(flags);
> > > 
> > > and
> > > 
> > >   local_irq_save(flags);
> > >   srcu_read_unlock(&my_srcu_domain);
> > >   local_irq_restore(flags)
> > > 
> > > Doesn't look to be too hard, or confusing.
> > 
> > Ah, OK, I was under the mistaken impression that lockdep would splat
> > if you did (for example) srcu_read_lock() in an exception handler and
> > srcu_read_unlock() in the context of the task that took the exception. 
> 
> I don't think it will, lockdep does very little actual validation on the
> RCU locks other than recording they're held. But if they do, the planned
> TODO item will get inversed.
> 
> Should be easy enough to test I guess.

OK, so I had me a little peek at lockdep and you're right, it will
complain.

Still uprobes can do:

  local_irq_save(flags);
  __srcu_read_lock(&mr_srcu_domain);
  local_irq_restore(flags);

However if you object to exposing the __srcu functions (which I can
understand) you could expose these two functions as
srcu_read_{,un}lock_raw() or so, to mirror the non-validation also found
in rcu_dereference_raw()



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH RFC tip/core/rcu 24/28] rcu: Introduce bulk reference count
  2011-11-29 13:33             ` Peter Zijlstra
@ 2011-11-29 17:41               ` Paul E. McKenney
  0 siblings, 0 replies; 74+ messages in thread
From: Paul E. McKenney @ 2011-11-29 17:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, niv, tglx, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, darren, patches

On Tue, Nov 29, 2011 at 02:33:35PM +0100, Peter Zijlstra wrote:
> On Mon, 2011-11-28 at 19:35 +0100, Peter Zijlstra wrote:
> > On Mon, 2011-11-28 at 10:31 -0800, Paul E. McKenney wrote:
> > > >   local_irq_save(flags);
> > > >   srcu_read_lock(&my_srcu_domain);
> > > >   local_irq_restore(flags);
> > > > 
> > > > and
> > > > 
> > > >   local_irq_save(flags);
> > > >   srcu_read_unlock(&my_srcu_domain);
> > > >   local_irq_restore(flags)
> > > > 
> > > > Doesn't look to be too hard, or confusing.
> > > 
> > > Ah, OK, I was under the mistaken impression that lockdep would splat
> > > if you did (for example) srcu_read_lock() in an exception handler and
> > > srcu_read_unlock() in the context of the task that took the exception. 
> > 
> > I don't think it will, lockdep does very little actual validation on the
> > RCU locks other than recording they're held. But if they do, the planned
> > TODO item will get inversed.
> > 
> > Should be easy enough to test I guess.
> 
> OK, so I had me a little peek at lockdep and you're right, it will
> complain.

OK, I will cross that test off my list for today.  ;-)

> Still uprobes can do:
> 
>   local_irq_save(flags);
>   __srcu_read_lock(&mr_srcu_domain);
>   local_irq_restore(flags);

And this is exactly what the bulkref stuff does, so we at least agree
on the implementation.

> However if you object to exposing the __srcu functions (which I can
> understand) you could expose these two functions as
> srcu_read_{,un}lock_raw() or so, to mirror the non-validation also found
> in rcu_dereference_raw()

Good point, the _raw suffix is used elsewhere in RCU for "turn off lockdep",
so it makes sense to use it here as well.

I will change to srcu_read_lock_raw() and srcu_read_unlock_raw().  And
that has the added benefit of getting rid of the alternative names for
the initialization and cleanup functions, so sounds good!  Thank you!

							Thanx, Paul


^ permalink raw reply	[flat|nested] 74+ messages in thread

end of thread, other threads:[~2011-11-29 17:42 UTC | newest]

Thread overview: 74+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-11-02 20:30 [PATCH RFC tip/core/rcu 0/28] Preview of RCU changes for 3.3 Paul E. McKenney
2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 01/28] powerpc: Strengthen value-returning-atomics memory barriers Paul E. McKenney
2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 02/28] rcu: ->signaled better named ->fqs_state Paul E. McKenney
2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 03/28] rcu: Avoid RCU-preempt expedited grace-period botch Paul E. McKenney
2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 04/28] rcu: Make synchronize_sched_expedited() better at work sharing Paul E. McKenney
2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 05/28] lockdep: Update documentation for lock-class leak detection Paul E. McKenney
2011-11-03  2:57   ` Josh Triplett
2011-11-03 19:42     ` Paul E. McKenney
2011-11-09 14:02       ` Peter Zijlstra
2011-11-10 17:22         ` Paul E. McKenney
2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 06/28] rcu: Track idleness independent of idle tasks Paul E. McKenney
2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 07/28] trace: Allow ftrace_dump() to be called from modules Paul E. McKenney
2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 08/28] rcu: Add failure tracing to rcutorture Paul E. McKenney
2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 09/28] rcu: Document failing tick as cause of RCU CPU stall warning Paul E. McKenney
2011-11-03  3:07   ` Josh Triplett
2011-11-03 13:25     ` Paul E. McKenney
2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 10/28] rcu: Disable preemption in rcu_is_cpu_idle() Paul E. McKenney
2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 11/28] rcu: Omit self-awaken when setting up expedited grace period Paul E. McKenney
2011-11-03  3:16   ` Josh Triplett
2011-11-03 19:43     ` Paul E. McKenney
2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 12/28] rcu: Detect illegal rcu dereference in extended quiescent state Paul E. McKenney
2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 13/28] rcu: Inform the user about extended quiescent state on PROVE_RCU warning Paul E. McKenney
2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 14/28] rcu: Warn when rcu_read_lock() is used in extended quiescent state Paul E. McKenney
2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 15/28] rcu: Remove one layer of abstraction from PROVE_RCU checking Paul E. McKenney
2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 16/28] rcu: Warn when srcu_read_lock() is used in an extended quiescent state Paul E. McKenney
2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 17/28] rcu: Make srcu_read_lock_held() call common lockdep-enabled function Paul E. McKenney
2011-11-03  3:48   ` Josh Triplett
2011-11-03 11:14     ` Frederic Weisbecker
2011-11-03 13:19       ` Steven Rostedt
2011-11-03 13:30         ` Paul E. McKenney
2011-11-03 13:29       ` Paul E. McKenney
2011-11-03 13:59         ` Steven Rostedt
2011-11-03 20:14           ` Paul E. McKenney
2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 18/28] nohz: Separate out irq exit and idle loop dyntick logic Paul E. McKenney
2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 19/28] nohz: Allow rcu extended quiescent state handling seperately from tick stop Paul E. McKenney
2011-11-03  4:00   ` Josh Triplett
2011-11-03 11:54     ` Frederic Weisbecker
2011-11-03 13:32       ` Paul E. McKenney
2011-11-03 15:31         ` Josh Triplett
2011-11-03 16:06           ` Paul E. McKenney
2011-11-09 14:28             ` Peter Zijlstra
2011-11-09 16:48             ` Frederic Weisbecker
2011-11-10 10:52               ` Peter Zijlstra
2011-11-10 17:22               ` Paul E. McKenney
2011-11-15 18:30                 ` Frederic Weisbecker
2011-11-16 19:41                   ` Paul E. McKenney
2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 20/28] x86: Enter rcu extended qs after idle notifier call Paul E. McKenney
2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 21/28] x86: Call idle notifier after irq_enter() Paul E. McKenney
2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 22/28] rcu: Fix early call to rcu_idle_enter() Paul E. McKenney
2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 23/28] powerpc: Tell RCU about idle after hcall tracing Paul E. McKenney
2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 24/28] rcu: Introduce bulk reference count Paul E. McKenney
2011-11-03  4:34   ` Josh Triplett
2011-11-03 13:34     ` Paul E. McKenney
2011-11-03 20:19       ` Paul E. McKenney
2011-11-28 12:41   ` Peter Zijlstra
2011-11-28 17:15     ` Paul E. McKenney
2011-11-28 18:17       ` Peter Zijlstra
2011-11-28 18:31         ` Paul E. McKenney
2011-11-28 18:35           ` Peter Zijlstra
2011-11-29 13:33             ` Peter Zijlstra
2011-11-29 17:41               ` Paul E. McKenney
2011-11-28 18:36           ` Peter Zijlstra
2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 25/28] rcu: Deconfuse dynticks entry-exit tracing Paul E. McKenney
2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 26/28] rcu: Add more information to the wrong-idle-task complaint Paul E. McKenney
2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 27/28] rcu: Allow dyntick-idle mode for CPUs with callbacks Paul E. McKenney
2011-11-03  4:47   ` Josh Triplett
2011-11-03 19:53     ` Paul E. McKenney
2011-11-02 20:30 ` [PATCH RFC tip/core/rcu 28/28] rcu: Fix idle-task checks Paul E. McKenney
2011-11-03  4:55   ` Josh Triplett
2011-11-03 21:00     ` Paul E. McKenney
2011-11-03 23:05       ` Josh Triplett
2011-11-09 14:52     ` Peter Zijlstra
2011-11-03  4:55 ` [PATCH RFC tip/core/rcu 0/28] Preview of RCU changes for 3.3 Josh Triplett
2011-11-03 21:45   ` Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).