linux-kernel.vger.kernel.org archive mirror
* [PATCH RFC tip/core/rcu 0/3] Tiny RCU and expedited SRCU
@ 2009-10-09 22:49 Paul E. McKenney
  2009-10-09 22:50 ` [PATCH RFC tip/core/rcu 1/3] rcu: The Bloatwatch Edition, v7 Paul E. McKenney
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Paul E. McKenney @ 2009-10-09 22:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, dvhltc,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells, avi,
	mtosatti, torvalds

This series rebases TINY_RCU and creates an expedited SRCU along with
corresponding RCU torture tests.  These are post-2.6.32 material.

							Thanx, Paul


 include/linux/hardirq.h  |   24 ++++
 include/linux/rcupdate.h |    6 +
 include/linux/rcutiny.h  |  103 +++++++++++++++++
 include/linux/srcu.h     |    1 
 init/Kconfig             |    9 +
 kernel/Makefile          |    1 
 kernel/rcupdate.c        |    4 
 kernel/rcutiny.c         |  282 ++++++++++++++++++++++++++++++++++++++++++++++-
 kernel/rcutorture.c      |   26 +++-
 kernel/srcu.c            |   75 ++++++++----
 10 files changed, 502 insertions(+), 29 deletions(-)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH RFC tip/core/rcu 1/3] rcu: The Bloatwatch Edition, v7
  2009-10-09 22:49 [PATCH RFC tip/core/rcu 0/3] Tiny RCU and expedited SRCU Paul E. McKenney
@ 2009-10-09 22:50 ` Paul E. McKenney
  2009-10-12  9:29   ` Lai Jiangshan
  2009-10-13  7:44   ` Lai Jiangshan
  2009-10-09 22:50 ` [PATCH RFC tip/core/rcu 2/3] rcu: Add synchronize_srcu_expedited() Paul E. McKenney
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 18+ messages in thread
From: Paul E. McKenney @ 2009-10-09 22:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, dvhltc,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells, avi,
	mtosatti, torvalds, Paul E. McKenney

This patch provides a small-footprint RCU implementation designed for
!SMP kernels.  In particular, the implementation of synchronize_rcu()
is extremely lightweight and high-performance.
It passes rcutorture testing in each of the four relevant configurations
(combinations of NO_HZ and PREEMPT) on x86.  This saves about 1K bytes
compared to old Classic RCU (which is no longer in mainline), and more
than three kilobytes compared to Hierarchical RCU (updated to 2.6.30):

	CONFIG_TREE_RCU:

	   text	   data	    bss	    dec	    filename
	    663      32      20     715     kernel/rcupdate.o
	   3278     528      44    3850     kernel/rcutree.o
				   4565 Total (vs 4045 for v4)

	CONFIG_TREE_PREEMPT_RCU:

	   text	   data	    bss	    dec	    filename
	    743      32      20     795     kernel/rcupdate.o
	   4548     752      60    5360     kernel/rcutree.o
	   			   6155 Total (N/A for v4)

	CONFIG_TINY_RCU:

	   text	   data	    bss	    dec	    filename
	     96       4       0     100     kernel/rcupdate.o
	    720      28       0     748     kernel/rcutiny.o
	    			    848 Total (vs 1140 for v6)

The above is for x86.  Your mileage may vary on other platforms.
Further compression is possible, but is being procrastinated.

Changes from v6 (http://lkml.org/lkml/2009/9/23/293).

o	Forward ported to put it into the 2.6.33 stream.

o	Added lockdep support.

o	Made rcu_barrier() lightweight.

Changes from v5 (http://lkml.org/lkml/2009/6/23/12).

o	Ported to latest pre-2.6.32 merge window kernel.

	- Renamed rcu_qsctr_inc() to rcu_sched_qs().
	- Renamed rcu_bh_qsctr_inc() to rcu_bh_qs().
	- Provided trivial rcu_cpu_notify().
	- Provided trivial exit_rcu().
	- Provided trivial rcu_needs_cpu().
	- Fixed up the rcu_*_enter/exit() functions in linux/hardirq.h.

o	Removed the dependence on EMBEDDED, with a view to making
	TINY_RCU default for !SMP at some time in the future.

o	Added (trivial) support for expedited grace periods.

Changes from v4 (http://lkml.org/lkml/2009/5/2/91) include:

o	Squeeze the size down a bit further by removing the
	->completed field from struct rcu_ctrlblk.

o	This permits synchronize_rcu() to become the empty function.
	Previous concerns about rcutorture were unfounded, as
	rcutorture correctly handles a constant value from
	rcu_batches_completed() and rcu_batches_completed_bh().

Changes from v3 (http://lkml.org/lkml/2009/3/29/221) include:

o	Changed rcu_batches_completed(), rcu_batches_completed_bh()
	rcu_enter_nohz(), rcu_exit_nohz(), rcu_nmi_enter(), and
	rcu_nmi_exit(), to be static inlines, as suggested by David
	Howells.  Doing this saves about 100 bytes from rcutiny.o.
	(The numbers between v3 and this v4 of the patch are not directly
	comparable, since they are against different versions of Linux.)

Changes from v2 (http://lkml.org/lkml/2009/2/3/333) include:

o	Fix whitespace issues.

o	Change short-circuit "||" operator to instead be "+" in order to
	fix performance bug noted by "kraai" on LWN.

		(http://lwn.net/Articles/324348/)

Changes from v1 (http://lkml.org/lkml/2009/1/13/440) include:

o	This version depends on EMBEDDED as well as !SMP, as suggested
	by Ingo.

o	Updated rcu_needs_cpu() to unconditionally return zero,
	permitting the CPU to enter dynticks-idle mode at any time.
	This works because callbacks can be invoked upon entry to
	dynticks-idle mode.

o	Paul is now OK with this being included, based on a poll at the
	Kernel Miniconf at linux.conf.au, where about ten people said
	that they cared about saving 900 bytes on single-CPU systems.

o	Applies to both mainline and tip/core/rcu.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/hardirq.h  |   24 ++++
 include/linux/rcupdate.h |    6 +
 include/linux/rcutiny.h  |  103 +++++++++++++++++
 init/Kconfig             |    9 ++
 kernel/Makefile          |    1 +
 kernel/rcupdate.c        |    4 +
 kernel/rcutiny.c         |  281 ++++++++++++++++++++++++++++++++++++++++++++++
 7 files changed, 428 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/rcutiny.h
 create mode 100644 kernel/rcutiny.c

diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index 6d527ee..d5b3876 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -139,10 +139,34 @@ static inline void account_system_vtime(struct task_struct *tsk)
 #endif
 
 #if defined(CONFIG_NO_HZ)
+#if defined(CONFIG_TINY_RCU)
+extern void rcu_enter_nohz(void);
+extern void rcu_exit_nohz(void);
+
+static inline void rcu_irq_enter(void)
+{
+	rcu_exit_nohz();
+}
+
+static inline void rcu_irq_exit(void)
+{
+	rcu_enter_nohz();
+}
+
+static inline void rcu_nmi_enter(void)
+{
+}
+
+static inline void rcu_nmi_exit(void)
+{
+}
+
+#else
 extern void rcu_irq_enter(void);
 extern void rcu_irq_exit(void);
 extern void rcu_nmi_enter(void);
 extern void rcu_nmi_exit(void);
+#endif
 #else
 # define rcu_irq_enter() do { } while (0)
 # define rcu_irq_exit() do { } while (0)
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 3ebd0b7..6dd71fa 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -68,11 +68,17 @@ extern int sched_expedited_torture_stats(char *page);
 /* Internal to kernel */
 extern void rcu_init(void);
 extern void rcu_scheduler_starting(void);
+#ifndef CONFIG_TINY_RCU
 extern int rcu_needs_cpu(int cpu);
+#else
+static inline int rcu_needs_cpu(int cpu) { return 0; }
+#endif
 extern int rcu_scheduler_active;
 
 #if defined(CONFIG_TREE_RCU) || defined(CONFIG_TREE_PREEMPT_RCU)
 #include <linux/rcutree.h>
+#elif CONFIG_TINY_RCU
+#include <linux/rcutiny.h>
 #else
 #error "Unknown RCU implementation specified to kernel configuration"
 #endif
diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
new file mode 100644
index 0000000..08f17ab
--- /dev/null
+++ b/include/linux/rcutiny.h
@@ -0,0 +1,103 @@
+/*
+ * Read-Copy Update mechanism for mutual exclusion, the Bloatwatch edition.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright IBM Corporation, 2008
+ *
+ * Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
+ *
+ * For detailed explanation of Read-Copy Update mechanism see -
+ * 		Documentation/RCU
+ */
+
+#ifndef __LINUX_TINY_H
+#define __LINUX_TINY_H
+
+#include <linux/cache.h>
+
+/* Global control variables for rcupdate callback mechanism. */
+struct rcu_ctrlblk {
+	struct rcu_head *rcucblist;	/* List of pending callbacks (CBs). */
+	struct rcu_head **donetail;	/* ->next pointer of last "done" CB. */
+	struct rcu_head **curtail;	/* ->next pointer of last CB. */
+};
+
+extern struct rcu_ctrlblk rcu_ctrlblk;
+extern struct rcu_ctrlblk rcu_bh_ctrlblk;
+
+void rcu_sched_qs(int cpu);
+void rcu_bh_qs(int cpu);
+
+#define __rcu_read_lock()	preempt_disable()
+#define __rcu_read_unlock()	preempt_enable()
+#define __rcu_read_lock_bh()	local_bh_disable()
+#define __rcu_read_unlock_bh()	local_bh_enable()
+#define call_rcu_sched		call_rcu
+
+#define rcu_init_sched()	do { } while (0)
+extern void rcu_check_callbacks(int cpu, int user);
+extern void __rcu_init(void);
+/* extern void rcu_restart_cpu(int cpu); */
+
+/*
+ * Return the number of grace periods.
+ */
+static inline long rcu_batches_completed(void)
+{
+	return 0;
+}
+
+/*
+ * Return the number of bottom-half grace periods.
+ */
+static inline long rcu_batches_completed_bh(void)
+{
+	return 0;
+}
+
+extern int rcu_expedited_torture_stats(char *page);
+
+static inline int rcu_pending(int cpu)
+{
+	return 1;
+}
+
+struct notifier_block;
+extern int rcu_cpu_notify(struct notifier_block *self,
+			  unsigned long action, void *hcpu);
+
+#ifdef CONFIG_NO_HZ
+
+extern void rcu_enter_nohz(void);
+extern void rcu_exit_nohz(void);
+
+#else /* #ifdef CONFIG_NO_HZ */
+
+static inline void rcu_enter_nohz(void)
+{
+}
+
+static inline void rcu_exit_nohz(void)
+{
+}
+
+#endif /* #else #ifdef CONFIG_NO_HZ */
+
+static inline void exit_rcu(void)
+{
+}
+
+#endif /* __LINUX_TINY_H */
diff --git a/init/Kconfig b/init/Kconfig
index 0121c0e..4fecb53 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -334,6 +334,15 @@ config TREE_PREEMPT_RCU
 	  is also required.  It also scales down nicely to
 	  smaller systems.
 
+config TINY_RCU
+	bool "UP-only small-memory-footprint RCU"
+	depends on !SMP
+	help
+	  This option selects the RCU implementation that is
+	  designed for UP systems on which real-time response
+	  is not required.  This option greatly reduces the
+	  memory footprint of RCU.
+
 endchoice
 
 config RCU_TRACE
diff --git a/kernel/Makefile b/kernel/Makefile
index 7c9b0a5..0098bcf 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -83,6 +83,7 @@ obj-$(CONFIG_RCU_TORTURE_TEST) += rcutorture.o
 obj-$(CONFIG_TREE_RCU) += rcutree.o
 obj-$(CONFIG_TREE_PREEMPT_RCU) += rcutree.o
 obj-$(CONFIG_TREE_RCU_TRACE) += rcutree_trace.o
+obj-$(CONFIG_TINY_RCU) += rcutiny.o
 obj-$(CONFIG_RELAY) += relay.o
 obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
 obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
index 4001833..7625f20 100644
--- a/kernel/rcupdate.c
+++ b/kernel/rcupdate.c
@@ -67,6 +67,8 @@ void wakeme_after_rcu(struct rcu_head  *head)
 	complete(&rcu->completion);
 }
 
+#ifndef CONFIG_TINY_RCU
+
 #ifdef CONFIG_TREE_PREEMPT_RCU
 
 /**
@@ -157,6 +159,8 @@ void synchronize_rcu_bh(void)
 }
 EXPORT_SYMBOL_GPL(synchronize_rcu_bh);
 
+#endif /* #ifndef CONFIG_TINY_RCU */
+
 static int __cpuinit rcu_barrier_cpu_hotplug(struct notifier_block *self,
 		unsigned long action, void *hcpu)
 {
diff --git a/kernel/rcutiny.c b/kernel/rcutiny.c
new file mode 100644
index 0000000..89124b0
--- /dev/null
+++ b/kernel/rcutiny.c
@@ -0,0 +1,281 @@
+/*
+ * Read-Copy Update mechanism for mutual exclusion, the Bloatwatch edition.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright IBM Corporation, 2008
+ *
+ * Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
+ *
+ * For detailed explanation of Read-Copy Update mechanism see -
+ * 		Documentation/RCU
+ */
+
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/rcupdate.h>
+#include <linux/interrupt.h>
+#include <linux/sched.h>
+#include <linux/module.h>
+#include <linux/completion.h>
+#include <linux/moduleparam.h>
+#include <linux/notifier.h>
+#include <linux/cpu.h>
+#include <linux/mutex.h>
+#include <linux/time.h>
+
+/* Definition for rcupdate control block. */
+struct rcu_ctrlblk rcu_ctrlblk = {
+	.rcucblist = NULL,
+	.donetail = &rcu_ctrlblk.rcucblist,
+	.curtail = &rcu_ctrlblk.rcucblist,
+};
+EXPORT_SYMBOL_GPL(rcu_ctrlblk);
+struct rcu_ctrlblk rcu_bh_ctrlblk = {
+	.rcucblist = NULL,
+	.donetail = &rcu_bh_ctrlblk.rcucblist,
+	.curtail = &rcu_bh_ctrlblk.rcucblist,
+};
+EXPORT_SYMBOL_GPL(rcu_bh_ctrlblk);
+
+#ifdef CONFIG_NO_HZ
+
+static long rcu_dynticks_nesting = 1;
+
+/*
+ * Enter dynticks-idle mode, which is an extended quiescent state
+ * if we have fully entered that mode (i.e., if the new value of
+ * dynticks_nesting is zero).
+ */
+void rcu_enter_nohz(void)
+{
+	if (--rcu_dynticks_nesting == 0)
+		rcu_sched_qs(0); /* implies rcu_bh_qs(0) */
+}
+
+/*
+ * Exit dynticks-idle mode, so that we are no longer in an extended
+ * quiescent state.
+ */
+void rcu_exit_nohz(void)
+{
+	rcu_dynticks_nesting++;
+}
+
+#endif /* #ifdef CONFIG_NO_HZ */
+
+/*
+ * Helper function for rcu_sched_qs() and rcu_bh_qs().
+ */
+static int rcu_qsctr_help(struct rcu_ctrlblk *rcp)
+{
+	if (rcp->rcucblist != NULL &&
+	    rcp->donetail != rcp->curtail) {
+		rcp->donetail = rcp->curtail;
+		return 1;
+	}
+	return 0;
+}
+
+/*
+ * Record an rcu quiescent state.  And an rcu_bh quiescent state while we
+ * are at it, given that any rcu quiescent state is also an rcu_bh
+ * quiescent state.  Use "+" instead of "||" to defeat short circuiting.
+ */
+void rcu_sched_qs(int cpu)
+{
+	if (rcu_qsctr_help(&rcu_ctrlblk) + rcu_qsctr_help(&rcu_bh_ctrlblk))
+		raise_softirq(RCU_SOFTIRQ);
+}
+
+/*
+ * Record an rcu_bh quiescent state.
+ */
+void rcu_bh_qs(int cpu)
+{
+	if (rcu_qsctr_help(&rcu_bh_ctrlblk))
+		raise_softirq(RCU_SOFTIRQ);
+}
+
+/*
+ * Check to see if the scheduling-clock interrupt came from an extended
+ * quiescent state, and, if so, tell RCU about it.
+ */
+void rcu_check_callbacks(int cpu, int user)
+{
+	if (!rcu_needs_cpu(0))
+		return;	/* RCU doesn't need anything to be done. */
+	if (user ||
+	    (idle_cpu(cpu) &&
+	     !in_softirq() &&
+	     hardirq_count() <= (1 << HARDIRQ_SHIFT)))
+		rcu_sched_qs(cpu);
+	else if (!in_softirq())
+		rcu_bh_qs(cpu);
+}
+
+/*
+ * Helper function for rcu_process_callbacks() that operates on the
+ * specified rcu_ctrlblk structure.
+ */
+static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp)
+{
+	unsigned long flags;
+	struct rcu_head *next, *list;
+
+	/* If no RCU callbacks ready to invoke, just return. */
+	if (&rcp->rcucblist == rcp->donetail)
+		return;
+
+	/* Move the ready-to-invoke callbacks to a local list. */
+	local_irq_save(flags);
+	list = rcp->rcucblist;
+	rcp->rcucblist = *rcp->donetail;
+	*rcp->donetail = NULL;
+	if (rcp->curtail == rcp->donetail)
+		rcp->curtail = &rcp->rcucblist;
+	rcp->donetail = &rcp->rcucblist;
+	local_irq_restore(flags);
+
+	/* Invoke the callbacks on the local list. */
+	while (list) {
+		next = list->next;
+		prefetch(next);
+		list->func(list);
+		list = next;
+	}
+}
+
+/*
+ * Invoke any callbacks whose grace period has completed.
+ */
+static void rcu_process_callbacks(struct softirq_action *unused)
+{
+	__rcu_process_callbacks(&rcu_ctrlblk);
+	__rcu_process_callbacks(&rcu_bh_ctrlblk);
+}
+
+/*
+ * Null function to handle CPU being onlined.  Longer term, we want to
+ * make TINY_RCU avoid using rcupdate.c, but later...
+ */
+int rcu_cpu_notify(struct notifier_block *self,
+		   unsigned long action, void *hcpu)
+{
+	return NOTIFY_OK;
+}
+
+/*
+ * Wait for a grace period to elapse.  But it is illegal to invoke
+ * synchronize_sched() from within an RCU read-side critical section.
+ * Therefore, any legal call to synchronize_sched() is a quiescent
+ * state, and so on a UP system, synchronize_sched() need do nothing.
+ * Ditto for synchronize_rcu_bh().
+ *
+ * Cool, huh?  (Due to Josh Triplett.)
+ *
+ * But we want to make this a static inline later.
+ */
+void synchronize_sched(void)
+{
+}
+EXPORT_SYMBOL_GPL(synchronize_sched);
+
+void synchronize_rcu_bh(void)
+{
+}
+EXPORT_SYMBOL_GPL(synchronize_rcu_bh);
+
+/*
+ * Helper function for call_rcu() and call_rcu_bh().
+ */
+static void __call_rcu(struct rcu_head *head,
+		       void (*func)(struct rcu_head *rcu),
+		       struct rcu_ctrlblk *rcp)
+{
+	unsigned long flags;
+
+	head->func = func;
+	head->next = NULL;
+	local_irq_save(flags);
+	*rcp->curtail = head;
+	rcp->curtail = &head->next;
+	local_irq_restore(flags);
+}
+
+/*
+ * Post an RCU callback to be invoked after the end of an RCU grace
+ * period.  But since we have but one CPU, that would be after any
+ * quiescent state.
+ */
+void call_rcu(struct rcu_head *head,
+	      void (*func)(struct rcu_head *rcu))
+{
+	__call_rcu(head, func, &rcu_ctrlblk);
+}
+EXPORT_SYMBOL_GPL(call_rcu);
+
+/*
+ * Post an RCU bottom-half callback to be invoked after any subsequent
+ * quiescent state.
+ */
+void call_rcu_bh(struct rcu_head *head,
+		 void (*func)(struct rcu_head *rcu))
+{
+	__call_rcu(head, func, &rcu_bh_ctrlblk);
+}
+EXPORT_SYMBOL_GPL(call_rcu_bh);
+
+void rcu_barrier(void)
+{
+	struct rcu_synchronize rcu;
+
+	init_completion(&rcu.completion);
+	/* Will wake me after RCU finished. */
+	call_rcu(&rcu.head, wakeme_after_rcu);
+	/* Wait for it. */
+	wait_for_completion(&rcu.completion);
+}
+EXPORT_SYMBOL_GPL(rcu_barrier);
+
+void rcu_barrier_bh(void)
+{
+	struct rcu_synchronize rcu;
+
+	init_completion(&rcu.completion);
+	/* Will wake me after RCU finished. */
+	call_rcu_bh(&rcu.head, wakeme_after_rcu);
+	/* Wait for it. */
+	wait_for_completion(&rcu.completion);
+}
+EXPORT_SYMBOL_GPL(rcu_barrier_bh);
+
+void rcu_barrier_sched(void)
+{
+	struct rcu_synchronize rcu;
+
+	init_completion(&rcu.completion);
+	/* Will wake me after RCU finished. */
+	call_rcu_sched(&rcu.head, wakeme_after_rcu);
+	/* Wait for it. */
+	wait_for_completion(&rcu.completion);
+}
+EXPORT_SYMBOL_GPL(rcu_barrier_sched);
+
+void __rcu_init(void)
+{
+	open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
+}
-- 
1.5.2.5



* [PATCH RFC tip/core/rcu 2/3] rcu: Add synchronize_srcu_expedited()
  2009-10-09 22:49 [PATCH RFC tip/core/rcu 0/3] Tiny RCU and expedited SRCU Paul E. McKenney
  2009-10-09 22:50 ` [PATCH RFC tip/core/rcu 1/3] rcu: The Bloatwatch Edition, v7 Paul E. McKenney
@ 2009-10-09 22:50 ` Paul E. McKenney
  2009-10-09 22:50 ` [PATCH RFC tip/core/rcu 3/3] rcu: add synchronize_srcu_expedited() to the rcutorture test suite Paul E. McKenney
  2009-10-10  3:47 ` [PATCH RFC tip/core/rcu 0/3] Tiny RCU and expedited SRCU Josh Triplett
  3 siblings, 0 replies; 18+ messages in thread
From: Paul E. McKenney @ 2009-10-09 22:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, dvhltc,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells, avi,
	mtosatti, torvalds, Paul E. McKenney

From: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

This patch creates a synchronize_srcu_expedited() that uses
synchronize_sched_expedited() where synchronize_srcu()
uses synchronize_sched().  The synchronize_srcu() and
synchronize_srcu_expedited() functions become one-liners that pass
synchronize_sched() or synchronize_sched_expedited(), respectively,
to a new __synchronize_srcu() function.

While in the file, move the EXPORT_SYMBOL_GPL()s to immediately
follow the corresponding functions.

Requested-by: Avi Kivity <avi@redhat.com>
Tested-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/srcu.h |    1 +
 kernel/srcu.c        |   74 ++++++++++++++++++++++++++++++++++---------------
 2 files changed, 52 insertions(+), 23 deletions(-)

diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index aca0eee..4765d97 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -48,6 +48,7 @@ void cleanup_srcu_struct(struct srcu_struct *sp);
 int srcu_read_lock(struct srcu_struct *sp) __acquires(sp);
 void srcu_read_unlock(struct srcu_struct *sp, int idx) __releases(sp);
 void synchronize_srcu(struct srcu_struct *sp);
+void synchronize_srcu_expedited(struct srcu_struct *sp);
 long srcu_batches_completed(struct srcu_struct *sp);
 
 #endif
diff --git a/kernel/srcu.c b/kernel/srcu.c
index b0aeeaf..818d7d9 100644
--- a/kernel/srcu.c
+++ b/kernel/srcu.c
@@ -49,6 +49,7 @@ int init_srcu_struct(struct srcu_struct *sp)
 	sp->per_cpu_ref = alloc_percpu(struct srcu_struct_array);
 	return (sp->per_cpu_ref ? 0 : -ENOMEM);
 }
+EXPORT_SYMBOL_GPL(init_srcu_struct);
 
 /*
  * srcu_readers_active_idx -- returns approximate number of readers
@@ -97,6 +98,7 @@ void cleanup_srcu_struct(struct srcu_struct *sp)
 	free_percpu(sp->per_cpu_ref);
 	sp->per_cpu_ref = NULL;
 }
+EXPORT_SYMBOL_GPL(cleanup_srcu_struct);
 
 /**
  * srcu_read_lock - register a new reader for an SRCU-protected structure.
@@ -118,6 +120,7 @@ int srcu_read_lock(struct srcu_struct *sp)
 	preempt_enable();
 	return idx;
 }
+EXPORT_SYMBOL_GPL(srcu_read_lock);
 
 /**
  * srcu_read_unlock - unregister a old reader from an SRCU-protected structure.
@@ -136,22 +139,12 @@ void srcu_read_unlock(struct srcu_struct *sp, int idx)
 	per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]--;
 	preempt_enable();
 }
+EXPORT_SYMBOL_GPL(srcu_read_unlock);
 
-/**
- * synchronize_srcu - wait for prior SRCU read-side critical-section completion
- * @sp: srcu_struct with which to synchronize.
- *
- * Flip the completed counter, and wait for the old count to drain to zero.
- * As with classic RCU, the updater must use some separate means of
- * synchronizing concurrent updates.  Can block; must be called from
- * process context.
- *
- * Note that it is illegal to call synchornize_srcu() from the corresponding
- * SRCU read-side critical section; doing so will result in deadlock.
- * However, it is perfectly legal to call synchronize_srcu() on one
- * srcu_struct from some other srcu_struct's read-side critical section.
+/*
+ * Helper function for synchronize_srcu() and synchronize_srcu_expedited().
  */
-void synchronize_srcu(struct srcu_struct *sp)
+void __synchronize_srcu(struct srcu_struct *sp, void (*sync_func)(void))
 {
 	int idx;
 
@@ -173,7 +166,7 @@ void synchronize_srcu(struct srcu_struct *sp)
 		return;
 	}
 
-	synchronize_sched();  /* Force memory barrier on all CPUs. */
+	sync_func();  /* Force memory barrier on all CPUs. */
 
 	/*
 	 * The preceding synchronize_sched() ensures that any CPU that
@@ -190,7 +183,7 @@ void synchronize_srcu(struct srcu_struct *sp)
 	idx = sp->completed & 0x1;
 	sp->completed++;
 
-	synchronize_sched();  /* Force memory barrier on all CPUs. */
+	sync_func();  /* Force memory barrier on all CPUs. */
 
 	/*
 	 * At this point, because of the preceding synchronize_sched(),
@@ -203,7 +196,7 @@ void synchronize_srcu(struct srcu_struct *sp)
 	while (srcu_readers_active_idx(sp, idx))
 		schedule_timeout_interruptible(1);
 
-	synchronize_sched();  /* Force memory barrier on all CPUs. */
+	sync_func();  /* Force memory barrier on all CPUs. */
 
 	/*
 	 * The preceding synchronize_sched() forces all srcu_read_unlock()
@@ -237,6 +230,47 @@ void synchronize_srcu(struct srcu_struct *sp)
 }
 
 /**
+ * synchronize_srcu - wait for prior SRCU read-side critical-section completion
+ * @sp: srcu_struct with which to synchronize.
+ *
+ * Flip the completed counter, and wait for the old count to drain to zero.
+ * As with classic RCU, the updater must use some separate means of
+ * synchronizing concurrent updates.  Can block; must be called from
+ * process context.
+ *
+ * Note that it is illegal to call synchronize_srcu() from the corresponding
+ * SRCU read-side critical section; doing so will result in deadlock.
+ * However, it is perfectly legal to call synchronize_srcu() on one
+ * srcu_struct from some other srcu_struct's read-side critical section.
+ */
+void synchronize_srcu(struct srcu_struct *sp)
+{
+	__synchronize_srcu(sp, synchronize_sched);
+}
+EXPORT_SYMBOL_GPL(synchronize_srcu);
+
+/**
+ * synchronize_srcu_expedited - like synchronize_srcu, but less patient
+ * @sp: srcu_struct with which to synchronize.
+ *
+ * Flip the completed counter, and wait for the old count to drain to zero.
+ * As with classic RCU, the updater must use some separate means of
+ * synchronizing concurrent updates.  Can block; must be called from
+ * process context.
+ *
+ * Note that it is illegal to call synchronize_srcu_expedited()
+ * from the corresponding SRCU read-side critical section; doing so
+ * will result in deadlock.  However, it is perfectly legal to call
+ * synchronize_srcu_expedited() on one srcu_struct from some other
+ * srcu_struct's read-side critical section.
+ */
+void synchronize_srcu_expedited(struct srcu_struct *sp)
+{
+	__synchronize_srcu(sp, synchronize_sched_expedited);
+}
+EXPORT_SYMBOL_GPL(synchronize_srcu_expedited);
+
+/**
  * srcu_batches_completed - return batches completed.
  * @sp: srcu_struct on which to report batch completion.
  *
@@ -248,10 +282,4 @@ long srcu_batches_completed(struct srcu_struct *sp)
 {
 	return sp->completed;
 }
-
-EXPORT_SYMBOL_GPL(init_srcu_struct);
-EXPORT_SYMBOL_GPL(cleanup_srcu_struct);
-EXPORT_SYMBOL_GPL(srcu_read_lock);
-EXPORT_SYMBOL_GPL(srcu_read_unlock);
-EXPORT_SYMBOL_GPL(synchronize_srcu);
 EXPORT_SYMBOL_GPL(srcu_batches_completed);
-- 
1.5.2.5



* [PATCH RFC tip/core/rcu 3/3] rcu: add synchronize_srcu_expedited() to the rcutorture test suite
  2009-10-09 22:49 [PATCH RFC tip/core/rcu 0/3] Tiny RCU and expedited SRCU Paul E. McKenney
  2009-10-09 22:50 ` [PATCH RFC tip/core/rcu 1/3] rcu: The Bloatwatch Edition, v7 Paul E. McKenney
  2009-10-09 22:50 ` [PATCH RFC tip/core/rcu 2/3] rcu: Add synchronize_srcu_expedited() Paul E. McKenney
@ 2009-10-09 22:50 ` Paul E. McKenney
  2009-10-10  3:47 ` [PATCH RFC tip/core/rcu 0/3] Tiny RCU and expedited SRCU Josh Triplett
  3 siblings, 0 replies; 18+ messages in thread
From: Paul E. McKenney @ 2009-10-09 22:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, dvhltc,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells, avi,
	mtosatti, torvalds, Paul E. McKenney

From: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Adds the "srcu_expedited" torture type, and also renames sched_ops_sync
to sched_sync_ops for consistency while we are in this file.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcutorture.c |   25 ++++++++++++++++++++++---
 1 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/kernel/rcutorture.c b/kernel/rcutorture.c
index 697c0a0..14480e8 100644
--- a/kernel/rcutorture.c
+++ b/kernel/rcutorture.c
@@ -547,6 +547,25 @@ static struct rcu_torture_ops srcu_ops = {
 	.name		= "srcu"
 };
 
+static void srcu_torture_synchronize_expedited(void)
+{
+	synchronize_srcu_expedited(&srcu_ctl);
+}
+
+static struct rcu_torture_ops srcu_expedited_ops = {
+	.init		= srcu_torture_init,
+	.cleanup	= srcu_torture_cleanup,
+	.readlock	= srcu_torture_read_lock,
+	.read_delay	= srcu_read_delay,
+	.readunlock	= srcu_torture_read_unlock,
+	.completed	= srcu_torture_completed,
+	.deferred_free	= rcu_sync_torture_deferred_free,
+	.sync		= srcu_torture_synchronize_expedited,
+	.cb_barrier	= NULL,
+	.stats		= srcu_torture_stats,
+	.name		= "srcu_expedited"
+};
+
 /*
  * Definitions for sched torture testing.
  */
@@ -592,7 +611,7 @@ static struct rcu_torture_ops sched_ops = {
 	.name		= "sched"
 };
 
-static struct rcu_torture_ops sched_ops_sync = {
+static struct rcu_torture_ops sched_sync_ops = {
 	.init		= rcu_sync_torture_init,
 	.cleanup	= NULL,
 	.readlock	= sched_torture_read_lock,
@@ -1098,8 +1117,8 @@ rcu_torture_init(void)
 	int firsterr = 0;
 	static struct rcu_torture_ops *torture_ops[] =
 		{ &rcu_ops, &rcu_sync_ops, &rcu_bh_ops, &rcu_bh_sync_ops,
-		  &sched_expedited_ops,
-		  &srcu_ops, &sched_ops, &sched_ops_sync, };
+		  &srcu_ops, &srcu_expedited_ops,
+		  &sched_ops, &sched_sync_ops, &sched_expedited_ops, };
 
 	mutex_lock(&fullstop_mutex);
 
-- 
1.5.2.5



* Re: [PATCH RFC tip/core/rcu 0/3] Tiny RCU and expedited SRCU
  2009-10-09 22:49 [PATCH RFC tip/core/rcu 0/3] Tiny RCU and expedited SRCU Paul E. McKenney
                   ` (2 preceding siblings ...)
  2009-10-09 22:50 ` [PATCH RFC tip/core/rcu 3/3] rcu: add synchronize_srcu_expedited() to the rcutorture test suite Paul E. McKenney
@ 2009-10-10  3:47 ` Josh Triplett
  3 siblings, 0 replies; 18+ messages in thread
From: Josh Triplett @ 2009-10-10  3:47 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	dvhltc, niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	avi, mtosatti, torvalds

On Fri, Oct 09, 2009 at 03:49:54PM -0700, Paul E. McKenney wrote:
> This series rebases TINY_RCU and creates an expedited SRCU along with
> corresponding RCU torture tests.  These are post-2.6.32 material.
> 
> 							Thanx, Paul
> 
> 
>  include/linux/hardirq.h  |   24 ++++
>  include/linux/rcupdate.h |    6 +
>  include/linux/rcutiny.h  |  103 +++++++++++++++++
>  include/linux/srcu.h     |    1 
>  init/Kconfig             |    9 +
>  kernel/Makefile          |    1 
>  kernel/rcupdate.c        |    4 
>  kernel/rcutiny.c         |  282 ++++++++++++++++++++++++++++++++++++++++++++++-
>  kernel/rcutorture.c      |   26 +++-
>  kernel/srcu.c            |   75 ++++++++----
>  10 files changed, 502 insertions(+), 29 deletions(-)

For all three patches:
Acked-by: Josh Triplett <josh@joshtriplett.org>


* Re: [PATCH RFC tip/core/rcu 1/3] rcu: The Bloatwatch Edition, v7
  2009-10-09 22:50 ` [PATCH RFC tip/core/rcu 1/3] rcu: The Bloatwatch Edition, v7 Paul E. McKenney
@ 2009-10-12  9:29   ` Lai Jiangshan
  2009-10-12 16:40     ` Linus Torvalds
  2009-10-12 17:30     ` Paul E. McKenney
  2009-10-13  7:44   ` Lai Jiangshan
  1 sibling, 2 replies; 18+ messages in thread
From: Lai Jiangshan @ 2009-10-12  9:29 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, dipankar, akpm, mathieu.desnoyers, josh,
	dvhltc, niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	avi, mtosatti, torvalds



Paul E. McKenney wrote:
> This patch is a version of RCU designed for !SMP provided for a
> small-footprint RCU implementation.  In particular, the implementation
> of synchronize_rcu() is extremely lightweight and high performance.
> It passes rcutorture testing in each of the four relevant configurations
> (combinations of NO_HZ and PREEMPT) on x86.  This saves about 1K bytes
> compared to old Classic RCU (which is no longer in mainline), and more
> than three kilobytes compared to Hierarchical RCU (updated to 2.6.30):
> 
> 	CONFIG_TREE_RCU:
> 
> 	   text	   data	    bss	    dec	    filename
> 	    663      32      20     715     kernel/rcupdate.o
> 	   3278     528      44    3850     kernel/rcutree.o
> 				   4565 Total (vs 4045 for v4)
> 
> 	CONFIG_TREE_PREEMPT_RCU:
> 
> 	   text	   data	    bss	    dec	    filename
> 	    743      32      20     795     kernel/rcupdate.o
> 	   4548     752      60    5360     kernel/rcutree.o
> 	   			   6155 Total (N/A for v4)
> 
> 	CONFIG_TINY_RCU:
> 
> 	   text	   data	    bss	    dec	    filename
> 	     96       4       0     100     kernel/rcupdate.o
> 	    720      28       0     748     kernel/rcutiny.o
> 	    			    848 Total (vs 1140 for v6)
> 
> The above is for x86.  Your mileage may vary on other platforms.
> Further compression is possible, but is being procrastinated.
> 
> Changes from v6 (http://lkml.org/lkml/2009/9/23/293).
> 
> o	Forward ported to put it into the 2.6.33 stream.
> 
> o	Added lockdep support.
> 
> o	Make lightweight rcu_barrier.
> 
> Changes from v5 (http://lkml.org/lkml/2009/6/23/12).
> 
> o	Ported to latest pre-2.6.32 merge window kernel.
> 
> 	- Renamed rcu_qsctr_inc() to rcu_sched_qs().
> 	- Renamed rcu_bh_qsctr_inc() to rcu_bh_qs().
> 	- Provided trivial rcu_cpu_notify().
> 	- Provided trivial exit_rcu().
> 	- Provided trivial rcu_needs_cpu().
> 	- Fixed up the rcu_*_enter/exit() functions in linux/hardirq.h.
> 
> o	Removed the dependence on EMBEDDED, with a view to making
> 	TINY_RCU default for !SMP at some time in the future.
> 
> o	Added (trivial) support for expedited grace periods.
> 
> Changes from v4 (http://lkml.org/lkml/2009/5/2/91) include:
> 
> o	Squeeze the size down a bit further by removing the
> 	->completed field from struct rcu_ctrlblk.
> 
> o	This permits synchronize_rcu() to become the empty function.
> 	Previous concerns about rcutorture were unfounded, as
> 	rcutorture correctly handles a constant value from
> 	rcu_batches_completed() and rcu_batches_completed_bh().
> 
> Changes from v3 (http://lkml.org/lkml/2009/3/29/221) include:
> 
> o	Changed rcu_batches_completed(), rcu_batches_completed_bh()
> 	rcu_enter_nohz(), rcu_exit_nohz(), rcu_nmi_enter(), and
> 	rcu_nmi_exit(), to be static inlines, as suggested by David
> 	Howells.  Doing this saves about 100 bytes from rcutiny.o.
> 	(The numbers between v3 and this v4 of the patch are not directly
> 	comparable, since they are against different versions of Linux.)
> 
> Changes from v2 (http://lkml.org/lkml/2009/2/3/333) include:
> 
> o	Fix whitespace issues.
> 
> o	Change short-circuit "||" operator to instead be "+" in order to
> 	fix performance bug noted by "kraai" on LWN.
> 
> 		(http://lwn.net/Articles/324348/)
> 
> Changes from v1 (http://lkml.org/lkml/2009/1/13/440) include:
> 
> o	This version depends on EMBEDDED as well as !SMP, as suggested
> 	by Ingo.
> 
> o	Updated rcu_needs_cpu() to unconditionally return zero,
> 	permitting the CPU to enter dynticks-idle mode at any time.
> 	This works because callbacks can be invoked upon entry to
> 	dynticks-idle mode.
> 
> o	Paul is now OK with this being included, based on a poll at the
> 	Kernel Miniconf at linux.conf.au, where about ten people said
> 	that they cared about saving 900 bytes on single-CPU systems.
> 
> o	Applies to both mainline and tip/core/rcu.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> ---
>  include/linux/hardirq.h  |   24 ++++
>  include/linux/rcupdate.h |    6 +
>  include/linux/rcutiny.h  |  103 +++++++++++++++++
>  init/Kconfig             |    9 ++
>  kernel/Makefile          |    1 +
>  kernel/rcupdate.c        |    4 +
>  kernel/rcutiny.c         |  281 ++++++++++++++++++++++++++++++++++++++++++++++
>  7 files changed, 428 insertions(+), 0 deletions(-)
>  create mode 100644 include/linux/rcutiny.h
>  create mode 100644 kernel/rcutiny.c
> 
> diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
> index 6d527ee..d5b3876 100644
> --- a/include/linux/hardirq.h
> +++ b/include/linux/hardirq.h
> @@ -139,10 +139,34 @@ static inline void account_system_vtime(struct task_struct *tsk)
>  #endif
>  
>  #if defined(CONFIG_NO_HZ)
> +#if defined(CONFIG_TINY_RCU)
> +extern void rcu_enter_nohz(void);
> +extern void rcu_exit_nohz(void);
> +
> +static inline void rcu_irq_enter(void)
> +{
> +	rcu_exit_nohz();
> +}
> +
> +static inline void rcu_irq_exit(void)
> +{
> +	rcu_enter_nohz();
> +}
> +
> +static inline void rcu_nmi_enter(void)
> +{
> +}
> +
> +static inline void rcu_nmi_exit(void)
> +{
> +}
> +
> +#else
>  extern void rcu_irq_enter(void);
>  extern void rcu_irq_exit(void);
>  extern void rcu_nmi_enter(void);
>  extern void rcu_nmi_exit(void);
> +#endif
>  #else
>  # define rcu_irq_enter() do { } while (0)
>  # define rcu_irq_exit() do { } while (0)
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 3ebd0b7..6dd71fa 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -68,11 +68,17 @@ extern int sched_expedited_torture_stats(char *page);
>  /* Internal to kernel */
>  extern void rcu_init(void);
>  extern void rcu_scheduler_starting(void);
> +#ifndef CONFIG_TINY_RCU
>  extern int rcu_needs_cpu(int cpu);
> +#else
> +static inline int rcu_needs_cpu(int cpu) { return 0; }
> +#endif
>  extern int rcu_scheduler_active;
>  
>  #if defined(CONFIG_TREE_RCU) || defined(CONFIG_TREE_PREEMPT_RCU)
>  #include <linux/rcutree.h>
> +#elif CONFIG_TINY_RCU
> +#include <linux/rcutiny.h>
>  #else
>  #error "Unknown RCU implementation specified to kernel configuration"
>  #endif
> diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
> new file mode 100644
> index 0000000..08f17ab
> --- /dev/null
> +++ b/include/linux/rcutiny.h
> @@ -0,0 +1,103 @@
> +/*
> + * Read-Copy Update mechanism for mutual exclusion, the Bloatwatch edition.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
> + *
> + * Copyright IBM Corporation, 2008
> + *
> + * Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> + *
> + * For detailed explanation of Read-Copy Update mechanism see -
> + * 		Documentation/RCU
> + */
> +
> +#ifndef __LINUX_TINY_H
> +#define __LINUX_TINY_H
> +
> +#include <linux/cache.h>
> +
> +/* Global control variables for rcupdate callback mechanism. */
> +struct rcu_ctrlblk {
> +	struct rcu_head *rcucblist;	/* List of pending callbacks (CBs). */
> +	struct rcu_head **donetail;	/* ->next pointer of last "done" CB. */
> +	struct rcu_head **curtail;	/* ->next pointer of last CB. */
> +};
> +
> +extern struct rcu_ctrlblk rcu_ctrlblk;
> +extern struct rcu_ctrlblk rcu_bh_ctrlblk;
> +
> +void rcu_sched_qs(int cpu);
> +void rcu_bh_qs(int cpu);
> +
> +#define __rcu_read_lock()	preempt_disable()
> +#define __rcu_read_unlock()	preempt_enable()
> +#define __rcu_read_lock_bh()	local_bh_disable()
> +#define __rcu_read_unlock_bh()	local_bh_enable()
> +#define call_rcu_sched		call_rcu
> +
> +#define rcu_init_sched()	do { } while (0)
> +extern void rcu_check_callbacks(int cpu, int user);
> +extern void __rcu_init(void);
> +/* extern void rcu_restart_cpu(int cpu); */
> +
> +/*
> + * Return the number of grace periods.
> + */
> +static inline long rcu_batches_completed(void)
> +{
> +	return 0;
> +}
> +
> +/*
> + * Return the number of bottom-half grace periods.
> + */
> +static inline long rcu_batches_completed_bh(void)
> +{
> +	return 0;
> +}
> +
> +extern int rcu_expedited_torture_stats(char *page);
> +
> +static inline int rcu_pending(int cpu)
> +{
> +	return 1;
> +}
> +
> +struct notifier_block;
> +extern int rcu_cpu_notify(struct notifier_block *self,
> +			  unsigned long action, void *hcpu);
> +
> +#ifdef CONFIG_NO_HZ
> +
> +extern void rcu_enter_nohz(void);
> +extern void rcu_exit_nohz(void);
> +
> +#else /* #ifdef CONFIG_NO_HZ */
> +
> +static inline void rcu_enter_nohz(void)
> +{
> +}
> +
> +static inline void rcu_exit_nohz(void)
> +{
> +}
> +
> +#endif /* #else #ifdef CONFIG_NO_HZ */
> +
> +static inline void exit_rcu(void)
> +{
> +}
> +
> +#endif /* __LINUX_RCUTINY_H */
> diff --git a/init/Kconfig b/init/Kconfig
> index 0121c0e..4fecb53 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -334,6 +334,15 @@ config TREE_PREEMPT_RCU
>  	  is also required.  It also scales down nicely to
>  	  smaller systems.
>  
> +config TINY_RCU
> +	bool "UP-only small-memory-footprint RCU"
> +	depends on !SMP
> +	help
> +	  This option selects the RCU implementation that is
> +	  designed for UP systems from which real-time response
> +	  is not required.  This option greatly reduces the
> +	  memory footprint of RCU.
> +
>  endchoice
>  
>  config RCU_TRACE
> diff --git a/kernel/Makefile b/kernel/Makefile
> index 7c9b0a5..0098bcf 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -83,6 +83,7 @@ obj-$(CONFIG_RCU_TORTURE_TEST) += rcutorture.o
>  obj-$(CONFIG_TREE_RCU) += rcutree.o
>  obj-$(CONFIG_TREE_PREEMPT_RCU) += rcutree.o
>  obj-$(CONFIG_TREE_RCU_TRACE) += rcutree_trace.o
> +obj-$(CONFIG_TINY_RCU) += rcutiny.o
>  obj-$(CONFIG_RELAY) += relay.o
>  obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
>  obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
> diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
> index 4001833..7625f20 100644
> --- a/kernel/rcupdate.c
> +++ b/kernel/rcupdate.c
> @@ -67,6 +67,8 @@ void wakeme_after_rcu(struct rcu_head  *head)
>  	complete(&rcu->completion);
>  }
>  
> +#ifndef CONFIG_TINY_RCU
> +
>  #ifdef CONFIG_TREE_PREEMPT_RCU
>  
>  /**
> @@ -157,6 +159,8 @@ void synchronize_rcu_bh(void)
>  }
>  EXPORT_SYMBOL_GPL(synchronize_rcu_bh);
>  
> +#endif /* #ifndef CONFIG_TINY_RCU */
> +
>  static int __cpuinit rcu_barrier_cpu_hotplug(struct notifier_block *self,
>  		unsigned long action, void *hcpu)
>  {
> diff --git a/kernel/rcutiny.c b/kernel/rcutiny.c
> new file mode 100644
> index 0000000..89124b0
> --- /dev/null
> +++ b/kernel/rcutiny.c
> @@ -0,0 +1,281 @@
> +/*
> + * Read-Copy Update mechanism for mutual exclusion, the Bloatwatch edition.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
> + *
> + * Copyright IBM Corporation, 2008
> + *
> + * Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> + *
> + * For detailed explanation of Read-Copy Update mechanism see -
> + * 		Documentation/RCU
> + */
> +
> +#include <linux/types.h>
> +#include <linux/kernel.h>
> +#include <linux/init.h>
> +#include <linux/rcupdate.h>
> +#include <linux/interrupt.h>
> +#include <linux/sched.h>
> +#include <linux/module.h>
> +#include <linux/completion.h>
> +#include <linux/moduleparam.h>
> +#include <linux/notifier.h>
> +#include <linux/cpu.h>
> +#include <linux/mutex.h>
> +#include <linux/time.h>
> +
> +/* Definition for rcupdate control block. */
> +struct rcu_ctrlblk rcu_ctrlblk = {
> +	.rcucblist = NULL,
> +	.donetail = &rcu_ctrlblk.rcucblist,
> +	.curtail = &rcu_ctrlblk.rcucblist,
> +};
> +EXPORT_SYMBOL_GPL(rcu_ctrlblk);
> +struct rcu_ctrlblk rcu_bh_ctrlblk = {
> +	.rcucblist = NULL,
> +	.donetail = &rcu_bh_ctrlblk.rcucblist,
> +	.curtail = &rcu_bh_ctrlblk.rcucblist,
> +};
> +EXPORT_SYMBOL_GPL(rcu_bh_ctrlblk);
> +
> +#ifdef CONFIG_NO_HZ
> +
> +static long rcu_dynticks_nesting = 1;
> +
> +/*
> + * Enter dynticks-idle mode, which is an extended quiescent state
> + * if we have fully entered that mode (i.e., if the new value of
> + * dynticks_nesting is zero).
> + */
> +void rcu_enter_nohz(void)
> +{
> +	if (--rcu_dynticks_nesting == 0)
> +		rcu_sched_qs(0); /* implies rcu_bh_qsctr_inc(0) */
> +}
> +
> +/*
> + * Exit dynticks-idle mode, so that we are no longer in an extended
> + * quiescent state.
> + */
> +void rcu_exit_nohz(void)
> +{
> +	rcu_dynticks_nesting++;
> +}
> +
> +#endif /* #ifdef CONFIG_NO_HZ */
> +
> +/*
> + * Helper function for rcu_qsctr_inc() and rcu_bh_qsctr_inc().
> + */
> +static int rcu_qsctr_help(struct rcu_ctrlblk *rcp)
> +{
> +	if (rcp->rcucblist != NULL &&
> +	    rcp->donetail != rcp->curtail) {
> +		rcp->donetail = rcp->curtail;
> +		return 1;
> +	}
> +	return 0;
> +}
> +
> +/*
> + * Record an rcu quiescent state.  And an rcu_bh quiescent state while we
> + * are at it, given that any rcu quiescent state is also an rcu_bh
> + * quiescent state.  Use "+" instead of "||" to defeat short circuiting.
> + */
> +void rcu_sched_qs(int cpu)
> +{
> +	if (rcu_qsctr_help(&rcu_ctrlblk) + rcu_qsctr_help(&rcu_bh_ctrlblk))
> +		raise_softirq(RCU_SOFTIRQ);


local_irq_disable() (better) or local_bh_disable() is needed here.

see here:
schedule() {
	...
	preempt_disable();
	....
	rcu_sched_qs(cpu); /* nothing protects the access to rcp->donetail */
	.....
}

> +}
> +
> +/*
> + * Record an rcu_bh quiescent state.
> + */
> +void rcu_bh_qs(int cpu)
> +{
> +	if (rcu_qsctr_help(&rcu_bh_ctrlblk))
> +		raise_softirq(RCU_SOFTIRQ);


It needs neither local_irq_disable() nor local_bh_disable().
It is only called from __do_softirq(), but maybe a comment is needed.

> +}
> +
> +/*
> + * Check to see if the scheduling-clock interrupt came from an extended
> + * quiescent state, and, if so, tell RCU about it.
> + */
> +void rcu_check_callbacks(int cpu, int user)
> +{
> +	if (!rcu_needs_cpu(0))
> +		return;	/* RCU doesn't need anything to be done. */

rcu_needs_cpu(0) always returns 0 for TINY_RCU, so this test is always
true and the statements below are never executed.

> +	if (user ||
> +	    (idle_cpu(cpu) &&
> +	     !in_softirq() &&
> +	     hardirq_count() <= (1 << HARDIRQ_SHIFT)))
> +		rcu_sched_qs(cpu);
> +	else if (!in_softirq())
> +		rcu_bh_qs(cpu);
> +}
> +
> +/*
> + * Helper function for rcu_process_callbacks() that operates on the
> + * specified rcu_ctrlkblk structure.
> + */
> +static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp)
> +{
> +	unsigned long flags;
> +	struct rcu_head *next, *list;
> +
> +	/* If no RCU callbacks ready to invoke, just return. */
> +	if (&rcp->rcucblist == rcp->donetail)
> +		return;
> +
> +	/* Move the ready-to-invoke callbacks to a local list. */
> +	local_irq_save(flags);
> +	list = rcp->rcucblist;
> +	rcp->rcucblist = *rcp->donetail;
> +	*rcp->donetail = NULL;
> +	if (rcp->curtail == rcp->donetail)
> +		rcp->curtail = &rcp->rcucblist;
> +	rcp->donetail = &rcp->rcucblist;
> +	local_irq_restore(flags);
> +
> +	/* Invoke the callbacks on the local list. */
> +	while (list) {
> +		next = list->next;
> +		prefetch(next);
> +		list->func(list);
> +		list = next;
> +	}
> +}
> +
> +/*
> + * Invoke any callbacks whose grace period has completed.
> + */
> +static void rcu_process_callbacks(struct softirq_action *unused)
> +{
> +	__rcu_process_callbacks(&rcu_ctrlblk);
> +	__rcu_process_callbacks(&rcu_bh_ctrlblk);
> +}
> +
> +/*
> + * Null function to handle CPU being onlined.  Longer term, we want to
> + * make TINY_RCU avoid using rcupdate.c, but later...
> + */
> +int rcu_cpu_notify(struct notifier_block *self,
> +		   unsigned long action, void *hcpu)
> +{
> +	return NOTIFY_OK;
> +}
> +
> +/*
> + * Wait for a grace period to elapse.  But it is illegal to invoke
> + * synchronize_sched() from within an RCU read-side critical section.
> + * Therefore, any legal call to synchronize_sched() is a quiescent
> + * state, and so on a UP system, synchronize_sched() need do nothing.
> + * Ditto for synchronize_rcu_bh().
> + *
> + * Cool, huh?  (Due to Josh Triplett.)
> + *
> + * But we want to make this a static inline later.
> + */
> +void synchronize_sched(void)
> +{

I stubbornly recommend adding a cond_resched()/might_sleep() here.

It reduces latency (for !CONFIG_PREEMPT).
It prevents someone from calling it in a nonsleepable context.

> +}
> +EXPORT_SYMBOL_GPL(synchronize_sched);
> +
> +void synchronize_rcu_bh(void)
> +{

Ditto.

> +}
> +EXPORT_SYMBOL_GPL(synchronize_rcu_bh);
> +
> +/*
> + * Helper function for call_rcu() and call_rcu_bh().
> + */
> +static void __call_rcu(struct rcu_head *head,
> +		       void (*func)(struct rcu_head *rcu),
> +		       struct rcu_ctrlblk *rcp)
> +{
> +	unsigned long flags;
> +
> +	head->func = func;
> +	head->next = NULL;
> +	local_irq_save(flags);
> +	*rcp->curtail = head;
> +	rcp->curtail = &head->next;
> +	local_irq_restore(flags);
> +}
> +
> +/*
> + * Post an RCU callback to be invoked after the end of an RCU grace
> + * period.  But since we have but one CPU, that would be after any
> + * quiescent state.
> + */
> +void call_rcu(struct rcu_head *head,
> +	      void (*func)(struct rcu_head *rcu))
> +{
> +	__call_rcu(head, func, &rcu_ctrlblk);
> +}
> +EXPORT_SYMBOL_GPL(call_rcu);
> +
> +/*
> + * Post an RCU bottom-half callback to be invoked after any subsequent
> + * quiescent state.
> + */
> +void call_rcu_bh(struct rcu_head *head,
> +		 void (*func)(struct rcu_head *rcu))
> +{
> +	__call_rcu(head, func, &rcu_bh_ctrlblk);
> +}
> +EXPORT_SYMBOL_GPL(call_rcu_bh);
> +
> +void rcu_barrier(void)
> +{
> +	struct rcu_synchronize rcu;
> +
> +	init_completion(&rcu.completion);
> +	/* Will wake me after RCU finished. */
> +	call_rcu(&rcu.head, wakeme_after_rcu);
> +	/* Wait for it. */
> +	wait_for_completion(&rcu.completion);
> +}
> +EXPORT_SYMBOL_GPL(rcu_barrier);
> +
> +void rcu_barrier_bh(void)
> +{
> +	struct rcu_synchronize rcu;
> +
> +	init_completion(&rcu.completion);
> +	/* Will wake me after RCU finished. */
> +	call_rcu_bh(&rcu.head, wakeme_after_rcu);
> +	/* Wait for it. */
> +	wait_for_completion(&rcu.completion);
> +}
> +EXPORT_SYMBOL_GPL(rcu_barrier_bh);
> +
> +void rcu_barrier_sched(void)
> +{
> +	struct rcu_synchronize rcu;
> +
> +	init_completion(&rcu.completion);
> +	/* Will wake me after RCU finished. */
> +	call_rcu_sched(&rcu.head, wakeme_after_rcu);
> +	/* Wait for it. */
> +	wait_for_completion(&rcu.completion);


alternative implementation (nonsleeping implementation):

{
	unsigned long flags;
	struct rcu_ctrlblk *rcp = &rcu_ctrlblk;

	cond_resched();

	local_irq_save(flags);
	if (rcp->rcucblist != NULL) {
		rcp->donetail = rcp->curtail;
		local_irq_restore(flags);

		local_bh_disable();
		__rcu_process_callbacks(rcp);
		local_bh_enable();
	} else
		local_irq_restore(flags);
}

Ditto for other rcu_barrier*()

> +}
> +EXPORT_SYMBOL_GPL(rcu_barrier_sched);
> +
> +void __rcu_init(void)
> +{
> +	open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
> +}


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC tip/core/rcu 1/3] rcu: The Bloatwatch Edition, v7
  2009-10-12  9:29   ` Lai Jiangshan
@ 2009-10-12 16:40     ` Linus Torvalds
  2009-10-12 17:30     ` Paul E. McKenney
  1 sibling, 0 replies; 18+ messages in thread
From: Linus Torvalds @ 2009-10-12 16:40 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Paul E. McKenney, linux-kernel, mingo, dipankar, akpm,
	mathieu.desnoyers, josh, dvhltc, niv, tglx, peterz, rostedt,
	Valdis.Kletnieks, dhowells, avi, mtosatti



On Mon, 12 Oct 2009, Lai Jiangshan wrote:
>
> [ snip snip ]

Please don't quote the whole email just to comment on a small part of it.

It makes it almost completely impossible to see what your comments are, 
because they are hidden by a _much_ larger quoted section.

So please just edit out the parts of the original email that you are _not_ 
directly commenting on.

		Linus


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC tip/core/rcu 1/3] rcu: The Bloatwatch Edition, v7
  2009-10-12  9:29   ` Lai Jiangshan
  2009-10-12 16:40     ` Linus Torvalds
@ 2009-10-12 17:30     ` Paul E. McKenney
  2009-10-13  6:05       ` Lai Jiangshan
  1 sibling, 1 reply; 18+ messages in thread
From: Paul E. McKenney @ 2009-10-12 17:30 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, mingo, dipankar, akpm, mathieu.desnoyers, josh,
	dvhltc, niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	avi, mtosatti, torvalds

On Mon, Oct 12, 2009 at 05:29:25PM +0800, Lai Jiangshan wrote:

First, thank you very much for looking this over so carefully!!!

> Paul E. McKenney wrote:
> > This patch is a version of RCU designed for !SMP provided for a
> > small-footprint RCU implementation.  In particular, the implementation
> > of synchronize_rcu() is extremely lightweight and high performance.
> > It passes rcutorture testing in each of the four relevant configurations
> > (combinations of NO_HZ and PREEMPT) on x86.  This saves about 1K bytes
> > compared to old Classic RCU (which is no longer in mainline), and more
> > than three kilobytes compared to Hierarchical RCU (updated to 2.6.30):
> > 
> > 	CONFIG_TREE_RCU:
> > 
> > 	   text	   data	    bss	    dec	    filename
> > 	    663      32      20     715     kernel/rcupdate.o
> > 	   3278     528      44    3850     kernel/rcutree.o
> > 				   4565 Total (vs 4045 for v4)
> > 
> > 	CONFIG_TREE_PREEMPT_RCU:
> > 
> > 	   text	   data	    bss	    dec	    filename
> > 	    743      32      20     795     kernel/rcupdate.o
> > 	   4548     752      60    5360     kernel/rcutree.o
> > 	   			   6155 Total (N/A for v4)
> > 
> > 	CONFIG_TINY_RCU:
> > 
> > 	   text	   data	    bss	    dec	    filename
> > 	     96       4       0     100     kernel/rcupdate.o
> > 	    720      28       0     748     kernel/rcutiny.o
> > 	    			    848 Total (vs 1140 for v6)
> > 
> > The above is for x86.  Your mileage may vary on other platforms.
> > Further compression is possible, but is being procrastinated.
> > 
> > Changes from v6 (http://lkml.org/lkml/2009/9/23/293).
> > 
> > o	Forward ported to put it into the 2.6.33 stream.
> > 
> > o	Added lockdep support.
> > 
> > o	Make lightweight rcu_barrier.
> > 
> > Changes from v5 (http://lkml.org/lkml/2009/6/23/12).
> > 
> > o	Ported to latest pre-2.6.32 merge window kernel.
> > 
> > 	- Renamed rcu_qsctr_inc() to rcu_sched_qs().
> > 	- Renamed rcu_bh_qsctr_inc() to rcu_bh_qs().
> > 	- Provided trivial rcu_cpu_notify().
> > 	- Provided trivial exit_rcu().
> > 	- Provided trivial rcu_needs_cpu().
> > 	- Fixed up the rcu_*_enter/exit() functions in linux/hardirq.h.
> > 
> > o	Removed the dependence on EMBEDDED, with a view to making
> > 	TINY_RCU default for !SMP at some time in the future.
> > 
> > o	Added (trivial) support for expedited grace periods.
> > 
> > Changes from v4 (http://lkml.org/lkml/2009/5/2/91) include:
> > 
> > o	Squeeze the size down a bit further by removing the
> > 	->completed field from struct rcu_ctrlblk.
> > 
> > o	This permits synchronize_rcu() to become the empty function.
> > 	Previous concerns about rcutorture were unfounded, as
> > 	rcutorture correctly handles a constant value from
> > 	rcu_batches_completed() and rcu_batches_completed_bh().
> > 
> > Changes from v3 (http://lkml.org/lkml/2009/3/29/221) include:
> > 
> > o	Changed rcu_batches_completed(), rcu_batches_completed_bh()
> > 	rcu_enter_nohz(), rcu_exit_nohz(), rcu_nmi_enter(), and
> > 	rcu_nmi_exit(), to be static inlines, as suggested by David
> > 	Howells.  Doing this saves about 100 bytes from rcutiny.o.
> > 	(The numbers between v3 and this v4 of the patch are not directly
> > 	comparable, since they are against different versions of Linux.)
> > 
> > Changes from v2 (http://lkml.org/lkml/2009/2/3/333) include:
> > 
> > o	Fix whitespace issues.
> > 
> > o	Change short-circuit "||" operator to instead be "+" in order to
> > 	fix performance bug noted by "kraai" on LWN.
> > 
> > 		(http://lwn.net/Articles/324348/)
> > 
> > Changes from v1 (http://lkml.org/lkml/2009/1/13/440) include:
> > 
> > o	This version depends on EMBEDDED as well as !SMP, as suggested
> > 	by Ingo.
> > 
> > o	Updated rcu_needs_cpu() to unconditionally return zero,
> > 	permitting the CPU to enter dynticks-idle mode at any time.
> > 	This works because callbacks can be invoked upon entry to
> > 	dynticks-idle mode.
> > 
> > o	Paul is now OK with this being included, based on a poll at the
> > 	Kernel Miniconf at linux.conf.au, where about ten people said
> > 	that they cared about saving 900 bytes on single-CPU systems.
> > 
> > o	Applies to both mainline and tip/core/rcu.
> > 
> > Signed-off-by: David Howells <dhowells@redhat.com>
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > ---
> >  include/linux/hardirq.h  |   24 ++++
> >  include/linux/rcupdate.h |    6 +
> >  include/linux/rcutiny.h  |  103 +++++++++++++++++
> >  init/Kconfig             |    9 ++
> >  kernel/Makefile          |    1 +
> >  kernel/rcupdate.c        |    4 +
> >  kernel/rcutiny.c         |  281 ++++++++++++++++++++++++++++++++++++++++++++++
> >  7 files changed, 428 insertions(+), 0 deletions(-)
> >  create mode 100644 include/linux/rcutiny.h
> >  create mode 100644 kernel/rcutiny.c
> > 
> > diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
> > index 6d527ee..d5b3876 100644
> > --- a/include/linux/hardirq.h
> > +++ b/include/linux/hardirq.h
> > @@ -139,10 +139,34 @@ static inline void account_system_vtime(struct task_struct *tsk)
> >  #endif
> >  
> >  #if defined(CONFIG_NO_HZ)
> > +#if defined(CONFIG_TINY_RCU)
> > +extern void rcu_enter_nohz(void);
> > +extern void rcu_exit_nohz(void);
> > +
> > +static inline void rcu_irq_enter(void)
> > +{
> > +	rcu_exit_nohz();
> > +}
> > +
> > +static inline void rcu_irq_exit(void)
> > +{
> > +	rcu_enter_nohz();
> > +}
> > +
> > +static inline void rcu_nmi_enter(void)
> > +{
> > +}
> > +
> > +static inline void rcu_nmi_exit(void)
> > +{
> > +}
> > +
> > +#else
> >  extern void rcu_irq_enter(void);
> >  extern void rcu_irq_exit(void);
> >  extern void rcu_nmi_enter(void);
> >  extern void rcu_nmi_exit(void);
> > +#endif
> >  #else
> >  # define rcu_irq_enter() do { } while (0)
> >  # define rcu_irq_exit() do { } while (0)
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index 3ebd0b7..6dd71fa 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -68,11 +68,17 @@ extern int sched_expedited_torture_stats(char *page);
> >  /* Internal to kernel */
> >  extern void rcu_init(void);
> >  extern void rcu_scheduler_starting(void);
> > +#ifndef CONFIG_TINY_RCU
> >  extern int rcu_needs_cpu(int cpu);
> > +#else
> > +static inline int rcu_needs_cpu(int cpu) { return 0; }
> > +#endif
> >  extern int rcu_scheduler_active;
> >  
> >  #if defined(CONFIG_TREE_RCU) || defined(CONFIG_TREE_PREEMPT_RCU)
> >  #include <linux/rcutree.h>
> > +#elif CONFIG_TINY_RCU
> > +#include <linux/rcutiny.h>
> >  #else
> >  #error "Unknown RCU implementation specified to kernel configuration"
> >  #endif
> > diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
> > new file mode 100644
> > index 0000000..08f17ab
> > --- /dev/null
> > +++ b/include/linux/rcutiny.h
> > @@ -0,0 +1,103 @@
> > +/*
> > + * Read-Copy Update mechanism for mutual exclusion, the Bloatwatch edition.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + * the Free Software Foundation; either version 2 of the License, or
> > + * (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write to the Free Software
> > + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
> > + *
> > + * Copyright IBM Corporation, 2008
> > + *
> > + * Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > + *
> > + * For detailed explanation of Read-Copy Update mechanism see -
> > + * 		Documentation/RCU
> > + */
> > +
> > +#ifndef __LINUX_TINY_H
> > +#define __LINUX_TINY_H
> > +
> > +#include <linux/cache.h>
> > +
> > +/* Global control variables for rcupdate callback mechanism. */
> > +struct rcu_ctrlblk {
> > +	struct rcu_head *rcucblist;	/* List of pending callbacks (CBs). */
> > +	struct rcu_head **donetail;	/* ->next pointer of last "done" CB. */
> > +	struct rcu_head **curtail;	/* ->next pointer of last CB. */
> > +};
> > +
> > +extern struct rcu_ctrlblk rcu_ctrlblk;
> > +extern struct rcu_ctrlblk rcu_bh_ctrlblk;
> > +
> > +void rcu_sched_qs(int cpu);
> > +void rcu_bh_qs(int cpu);
> > +
> > +#define __rcu_read_lock()	preempt_disable()
> > +#define __rcu_read_unlock()	preempt_enable()
> > +#define __rcu_read_lock_bh()	local_bh_disable()
> > +#define __rcu_read_unlock_bh()	local_bh_enable()
> > +#define call_rcu_sched		call_rcu
> > +
> > +#define rcu_init_sched()	do { } while (0)
> > +extern void rcu_check_callbacks(int cpu, int user);
> > +extern void __rcu_init(void);
> > +/* extern void rcu_restart_cpu(int cpu); */
> > +
> > +/*
> > + * Return the number of grace periods.
> > + */
> > +static inline long rcu_batches_completed(void)
> > +{
> > +	return 0;
> > +}
> > +
> > +/*
> > + * Return the number of bottom-half grace periods.
> > + */
> > +static inline long rcu_batches_completed_bh(void)
> > +{
> > +	return 0;
> > +}
> > +
> > +extern int rcu_expedited_torture_stats(char *page);
> > +
> > +static inline int rcu_pending(int cpu)
> > +{
> > +	return 1;
> > +}
> > +
> > +struct notifier_block;
> > +extern int rcu_cpu_notify(struct notifier_block *self,
> > +			  unsigned long action, void *hcpu);
> > +
> > +#ifdef CONFIG_NO_HZ
> > +
> > +extern void rcu_enter_nohz(void);
> > +extern void rcu_exit_nohz(void);
> > +
> > +#else /* #ifdef CONFIG_NO_HZ */
> > +
> > +static inline void rcu_enter_nohz(void)
> > +{
> > +}
> > +
> > +static inline void rcu_exit_nohz(void)
> > +{
> > +}
> > +
> > +#endif /* #else #ifdef CONFIG_NO_HZ */
> > +
> > +static inline void exit_rcu(void)
> > +{
> > +}
> > +
> > +#endif /* __LINUX_RCUTINY_H */
> > diff --git a/init/Kconfig b/init/Kconfig
> > index 0121c0e..4fecb53 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -334,6 +334,15 @@ config TREE_PREEMPT_RCU
> >  	  is also required.  It also scales down nicely to
> >  	  smaller systems.
> >  
> > +config TINY_RCU
> > +	bool "UP-only small-memory-footprint RCU"
> > +	depends on !SMP
> > +	help
> > +	  This option selects the RCU implementation that is
> > +	  designed for UP systems from which real-time response
> > +	  is not required.  This option greatly reduces the
> > +	  memory footprint of RCU.
> > +
> >  endchoice
> >  
> >  config RCU_TRACE
> > diff --git a/kernel/Makefile b/kernel/Makefile
> > index 7c9b0a5..0098bcf 100644
> > --- a/kernel/Makefile
> > +++ b/kernel/Makefile
> > @@ -83,6 +83,7 @@ obj-$(CONFIG_RCU_TORTURE_TEST) += rcutorture.o
> >  obj-$(CONFIG_TREE_RCU) += rcutree.o
> >  obj-$(CONFIG_TREE_PREEMPT_RCU) += rcutree.o
> >  obj-$(CONFIG_TREE_RCU_TRACE) += rcutree_trace.o
> > +obj-$(CONFIG_TINY_RCU) += rcutiny.o
> >  obj-$(CONFIG_RELAY) += relay.o
> >  obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
> >  obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
> > diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
> > index 4001833..7625f20 100644
> > --- a/kernel/rcupdate.c
> > +++ b/kernel/rcupdate.c
> > @@ -67,6 +67,8 @@ void wakeme_after_rcu(struct rcu_head  *head)
> >  	complete(&rcu->completion);
> >  }
> >  
> > +#ifndef CONFIG_TINY_RCU
> > +
> >  #ifdef CONFIG_TREE_PREEMPT_RCU
> >  
> >  /**
> > @@ -157,6 +159,8 @@ void synchronize_rcu_bh(void)
> >  }
> >  EXPORT_SYMBOL_GPL(synchronize_rcu_bh);
> >  
> > +#endif /* #ifndef CONFIG_TINY_RCU */
> > +
> >  static int __cpuinit rcu_barrier_cpu_hotplug(struct notifier_block *self,
> >  		unsigned long action, void *hcpu)
> >  {
> > diff --git a/kernel/rcutiny.c b/kernel/rcutiny.c
> > new file mode 100644
> > index 0000000..89124b0
> > --- /dev/null
> > +++ b/kernel/rcutiny.c
> > @@ -0,0 +1,281 @@
> > +/*
> > + * Read-Copy Update mechanism for mutual exclusion, the Bloatwatch edition.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + * the Free Software Foundation; either version 2 of the License, or
> > + * (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write to the Free Software
> > + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
> > + *
> > + * Copyright IBM Corporation, 2008
> > + *
> > + * Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > + *
> > + * For detailed explanation of Read-Copy Update mechanism see -
> > + * 		Documentation/RCU
> > + */
> > +
> > +#include <linux/types.h>
> > +#include <linux/kernel.h>
> > +#include <linux/init.h>
> > +#include <linux/rcupdate.h>
> > +#include <linux/interrupt.h>
> > +#include <linux/sched.h>
> > +#include <linux/module.h>
> > +#include <linux/completion.h>
> > +#include <linux/moduleparam.h>
> > +#include <linux/notifier.h>
> > +#include <linux/cpu.h>
> > +#include <linux/mutex.h>
> > +#include <linux/time.h>
> > +
> > +/* Definition for rcupdate control block. */
> > +struct rcu_ctrlblk rcu_ctrlblk = {
> > +	.rcucblist = NULL,
> > +	.donetail = &rcu_ctrlblk.rcucblist,
> > +	.curtail = &rcu_ctrlblk.rcucblist,
> > +};
> > +EXPORT_SYMBOL_GPL(rcu_ctrlblk);
> > +struct rcu_ctrlblk rcu_bh_ctrlblk = {
> > +	.rcucblist = NULL,
> > +	.donetail = &rcu_bh_ctrlblk.rcucblist,
> > +	.curtail = &rcu_bh_ctrlblk.rcucblist,
> > +};
> > +EXPORT_SYMBOL_GPL(rcu_bh_ctrlblk);
> > +
> > +#ifdef CONFIG_NO_HZ
> > +
> > +static long rcu_dynticks_nesting = 1;
> > +
> > +/*
> > + * Enter dynticks-idle mode, which is an extended quiescent state
> > + * if we have fully entered that mode (i.e., if the new value of
> > + * dynticks_nesting is zero).
> > + */
> > +void rcu_enter_nohz(void)
> > +{
> > +	if (--rcu_dynticks_nesting == 0)
> > +		rcu_sched_qs(0); /* implies rcu_bh_qsctr_inc(0) */
> > +}
> > +
> > +/*
> > + * Exit dynticks-idle mode, so that we are no longer in an extended
> > + * quiescent state.
> > + */
> > +void rcu_exit_nohz(void)
> > +{
> > +	rcu_dynticks_nesting++;
> > +}
> > +
> > +#endif /* #ifdef CONFIG_NO_HZ */
> > +
> > +/*
> > + * Helper function for rcu_qsctr_inc() and rcu_bh_qsctr_inc().
> > + */
> > +static int rcu_qsctr_help(struct rcu_ctrlblk *rcp)
> > +{
> > +	if (rcp->rcucblist != NULL &&
> > +	    rcp->donetail != rcp->curtail) {
> > +		rcp->donetail = rcp->curtail;
> > +		return 1;
> > +	}
> > +	return 0;
> > +}
> > +
> > +/*
> > + * Record an rcu quiescent state.  And an rcu_bh quiescent state while we
> > + * are at it, given that any rcu quiescent state is also an rcu_bh
> > + * quiescent state.  Use "+" instead of "||" to defeat short circuiting.
> > + */
> > +void rcu_sched_qs(int cpu)
> > +{
> > +	if (rcu_qsctr_help(&rcu_ctrlblk) + rcu_qsctr_help(&rcu_bh_ctrlblk))
> > +		raise_softirq(RCU_SOFTIRQ);
> 
> 
> local_irq_disable() (better) or local_bh_disable() is needed here.
> 
> see here:
> schedule() {
> 	...
> 	preempt_disable();
> 	....
> 	rcu_sched_qs(cpu); /* nothing to protect accessing rcp->donetail */
> 	.....
> }

Good eyes!!!

Otherwise, an interrupt might be taken from within rcu_qsctr_help(),
and the interrupt handler might invoke call_rcu(), which could fatally
confuse rcu_qsctr_help().  Not needed for treercu, since treercu's
version of rcu_sched_qs() just mucks with flags (famous last words!).

Fixed.

> > +}
> > +
> > +/*
> > + * Record an rcu_bh quiescent state.
> > + */
> > +void rcu_bh_qs(int cpu)
> > +{
> > +	if (rcu_qsctr_help(&rcu_bh_ctrlblk))
> > +		raise_softirq(RCU_SOFTIRQ);
> 
> 
> It doesn't need local_irq_disable() nor local_bh_disable().
> It's only called from __do_softirq(), but maybe a comment is needed.

Hmmmm...  Does this apply even when called from ksoftirqd?

Adding the local_irq_save() just out of paranoia for the moment.
And therefore moving the local_irq_save() down to the common
rcu_qsctr_help() function.  Which I am renaming to rcu_qs_help()
for consistency.

Or am I missing something here?

> > +}
> > +
> > +/*
> > + * Check to see if the scheduling-clock interrupt came from an extended
> > + * quiescent state, and, if so, tell RCU about it.
> > + */
> > +void rcu_check_callbacks(int cpu, int user)
> > +{
> > +	if (!rcu_needs_cpu(0))
> > +		return;	/* RCU doesn't need anything to be done. */
> 
> rcu_needs_cpu(0) always returns 0, so the statements that follow will
> never be executed.

Indeed!  Should instead be rcu_pending() -- which always returns 1.
So just deleting the above "if" statement.

The theory behind rcu_needs_cpu() always returning zero is that
rcu_enter_nohz() will be invoked on the way to no_hz mode, which
will invoke rcu_sched_qs(), which will update callbacks and do
raise_softirq(), causing any extant callbacks to be invoked.

Seem reasonable?

> > +	if (user ||
> > +	    (idle_cpu(cpu) &&
> > +	     !in_softirq() &&
> > +	     hardirq_count() <= (1 << HARDIRQ_SHIFT)))
> > +		rcu_sched_qs(cpu);
> > +	else if (!in_softirq())
> > +		rcu_bh_qs(cpu);
> > +}
> > +
> > +/*
> > + * Helper function for rcu_process_callbacks() that operates on the
> > + * specified rcu_ctrlblk structure.
> > + */
> > +static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp)
> > +{
> > +	unsigned long flags;
> > +	struct rcu_head *next, *list;
> > +
> > +	/* If no RCU callbacks ready to invoke, just return. */
> > +	if (&rcp->rcucblist == rcp->donetail)
> > +		return;
> > +
> > +	/* Move the ready-to-invoke callbacks to a local list. */
> > +	local_irq_save(flags);
> > +	list = rcp->rcucblist;
> > +	rcp->rcucblist = *rcp->donetail;
> > +	*rcp->donetail = NULL;
> > +	if (rcp->curtail == rcp->donetail)
> > +		rcp->curtail = &rcp->rcucblist;
> > +	rcp->donetail = &rcp->rcucblist;
> > +	local_irq_restore(flags);
> > +
> > +	/* Invoke the callbacks on the local list. */
> > +	while (list) {
> > +		next = list->next;
> > +		prefetch(next);
> > +		list->func(list);
> > +		list = next;
> > +	}
> > +}
> > +
> > +/*
> > + * Invoke any callbacks whose grace period has completed.
> > + */
> > +static void rcu_process_callbacks(struct softirq_action *unused)
> > +{
> > +	__rcu_process_callbacks(&rcu_ctrlblk);
> > +	__rcu_process_callbacks(&rcu_bh_ctrlblk);
> > +}
> > +
> > +/*
> > + * Null function to handle CPU being onlined.  Longer term, we want to
> > + * make TINY_RCU avoid using rcupdate.c, but later...
> > + */
> > +int rcu_cpu_notify(struct notifier_block *self,
> > +		   unsigned long action, void *hcpu)
> > +{
> > +	return NOTIFY_OK;
> > +}
> > +
> > +/*
> > + * Wait for a grace period to elapse.  But it is illegal to invoke
> > + * synchronize_sched() from within an RCU read-side critical section.
> > + * Therefore, any legal call to synchronize_sched() is a quiescent
> > + * state, and so on a UP system, synchronize_sched() need do nothing.
> > + * Ditto for synchronize_rcu_bh().
> > + *
> > + * Cool, huh?  (Due to Josh Triplett.)
> > + *
> > + * But we want to make this a static inline later.
> > + */
> > +void synchronize_sched(void)
> > +{
> 
> I stubbornly recommend adding a cond_resched()/might_sleep() here.
> 
> It reduces latency (for !CONFIG_PREEMPT).
> It prevents someone from calling it in a nonsleepable context.

Good point, fixed.

> > +}
> > +EXPORT_SYMBOL_GPL(synchronize_sched);
> > +
> > +void synchronize_rcu_bh(void)
> > +{
> 
> Ditto.

Just made this invoke synchronize_sched().

> > +}
> > +EXPORT_SYMBOL_GPL(synchronize_rcu_bh);
> > +
> > +/*
> > + * Helper function for call_rcu() and call_rcu_bh().
> > + */
> > +static void __call_rcu(struct rcu_head *head,
> > +		       void (*func)(struct rcu_head *rcu),
> > +		       struct rcu_ctrlblk *rcp)
> > +{
> > +	unsigned long flags;
> > +
> > +	head->func = func;
> > +	head->next = NULL;
> > +	local_irq_save(flags);
> > +	*rcp->curtail = head;
> > +	rcp->curtail = &head->next;
> > +	local_irq_restore(flags);
> > +}
> > +
> > +/*
> > + * Post an RCU callback to be invoked after the end of an RCU grace
> > + * period.  But since we have but one CPU, that would be after any
> > + * quiescent state.
> > + */
> > +void call_rcu(struct rcu_head *head,
> > +	      void (*func)(struct rcu_head *rcu))
> > +{
> > +	__call_rcu(head, func, &rcu_ctrlblk);
> > +}
> > +EXPORT_SYMBOL_GPL(call_rcu);
> > +
> > +/*
> > + * Post an RCU bottom-half callback to be invoked after any subsequent
> > + * quiescent state.
> > + */
> > +void call_rcu_bh(struct rcu_head *head,
> > +		 void (*func)(struct rcu_head *rcu))
> > +{
> > +	__call_rcu(head, func, &rcu_bh_ctrlblk);
> > +}
> > +EXPORT_SYMBOL_GPL(call_rcu_bh);
> > +
> > +void rcu_barrier(void)
> > +{
> > +	struct rcu_synchronize rcu;
> > +
> > +	init_completion(&rcu.completion);
> > +	/* Will wake me after RCU finished. */
> > +	call_rcu(&rcu.head, wakeme_after_rcu);
> > +	/* Wait for it. */
> > +	wait_for_completion(&rcu.completion);
> > +}
> > +EXPORT_SYMBOL_GPL(rcu_barrier);
> > +
> > +void rcu_barrier_bh(void)
> > +{
> > +	struct rcu_synchronize rcu;
> > +
> > +	init_completion(&rcu.completion);
> > +	/* Will wake me after RCU finished. */
> > +	call_rcu_bh(&rcu.head, wakeme_after_rcu);
> > +	/* Wait for it. */
> > +	wait_for_completion(&rcu.completion);
> > +}
> > +EXPORT_SYMBOL_GPL(rcu_barrier_bh);
> > +
> > +void rcu_barrier_sched(void)
> > +{
> > +	struct rcu_synchronize rcu;
> > +
> > +	init_completion(&rcu.completion);
> > +	/* Will wake me after RCU finished. */
> > +	call_rcu_sched(&rcu.head, wakeme_after_rcu);
> > +	/* Wait for it. */
> > +	wait_for_completion(&rcu.completion);
> 
> 
> alternative implementation(nonsleep implementation)
> 
> {
> 	cond_resched();
> 
> 	rcp = &rcu_ctrlblk;
> 	local_irq_save(flags);
> 	if (rcp->rcucblist != NULL) {
> 		rcp->donetail = rcp->curtail;
> 		local_irq_restore(flags);
> 
> 		local_bh_disable();
> 		__rcu_process_callbacks(rcp);
> 		local_bh_enable();
> 	} else
> 		local_irq_restore(flags);
> }
> 
> Ditto for other rcu_barrier*()

I certainly can check for there being no callbacks present, and just
return in that case (with the cond_resched()).  Except that this is
TINY_RCU, where code size is the biggest issue, and people had better
not be using rcu_barrier() on latency-sensitive code paths.  (In
contrast, the low-latency synchronize_rcu() trick results in both
smaller code and lower latency.)

The concern I have with simply executing the callbacks directly is that
it might one day be necessary to throttle callback execution.  I do not
believe that this will happen because there is only a single CPU, but I
would like to make it easy to switch back to throttling should someone
prove me wrong.

So I am holding off on this for the moment, but if it turns out that
throttling is never necessary, this would be an attractive optimization.

> > +}
> > +EXPORT_SYMBOL_GPL(rcu_barrier_sched);
> > +
> > +void __rcu_init(void)
> > +{
> > +	open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
> > +}

Again, thank you for your careful review and thoughtful comments!!!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC tip/core/rcu 1/3] rcu: The Bloatwatch Edition, v7
  2009-10-12 17:30     ` Paul E. McKenney
@ 2009-10-13  6:05       ` Lai Jiangshan
  0 siblings, 0 replies; 18+ messages in thread
From: Lai Jiangshan @ 2009-10-13  6:05 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, mingo, dipankar, akpm, mathieu.desnoyers, josh,
	dvhltc, niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	avi, mtosatti, torvalds

Paul E. McKenney wrote:
> Hmmmm...  Does this apply even when called from ksoftirqd?
> 
> Adding the local_irq_save() just out of paranoia for the moment.
> And therefore moving the local_irq_save() down to the common
> rcu_qsctr_help() function.  Which I am renaming to rcu_qs_help()
> for consistency.
> 
> Or am I missing something here?
> 


You are right! local_irq_save() is needed.

Lai


* Re: [PATCH RFC tip/core/rcu 1/3] rcu: The Bloatwatch Edition, v7
  2009-10-09 22:50 ` [PATCH RFC tip/core/rcu 1/3] rcu: The Bloatwatch Edition, v7 Paul E. McKenney
  2009-10-12  9:29   ` Lai Jiangshan
@ 2009-10-13  7:44   ` Lai Jiangshan
  2009-10-13 17:00     ` Paul E. McKenney
  1 sibling, 1 reply; 18+ messages in thread
From: Lai Jiangshan @ 2009-10-13  7:44 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, dipankar, akpm, mathieu.desnoyers, josh,
	dvhltc, niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	avi, mtosatti, torvalds

Again, trivial code-beautifying, except for the last one.

Paul E. McKenney wrote:

[...]

> +
> +/* Global control variables for rcupdate callback mechanism. */
> +struct rcu_ctrlblk {
> +	struct rcu_head *rcucblist;	/* List of pending callbacks (CBs). */
> +	struct rcu_head **donetail;	/* ->next pointer of last "done" CB. */
> +	struct rcu_head **curtail;	/* ->next pointer of last CB. */
> +};
> +
> +extern struct rcu_ctrlblk rcu_ctrlblk;
> +extern struct rcu_ctrlblk rcu_bh_ctrlblk;

Since rcu_batches_completed() returns a constant, this piece of
code is not needed here. We can move it to rcutiny.c.

and remove "EXPORT_SYMBOL_GPL(rcu_ctrlblk);" &
"EXPORT_SYMBOL_GPL(rcu_bh_ctrlblk);".

[...]

> +/* extern void rcu_restart_cpu(int cpu); */
> +

redundant comments.

[...]

> +
> +static inline int rcu_pending(int cpu)
> +{
> +	return 1;
> +}

It seems that no one uses it.

[...]

> +/* Definition for rcupdate control block. */
> +struct rcu_ctrlblk rcu_ctrlblk = {
> +	.rcucblist = NULL,
> +	.donetail = &rcu_ctrlblk.rcucblist,
> +	.curtail = &rcu_ctrlblk.rcucblist,
> +};
> +EXPORT_SYMBOL_GPL(rcu_ctrlblk);

remove "EXPORT_SYMBOL_GPL(rcu_ctrlblk);"

> +struct rcu_ctrlblk rcu_bh_ctrlblk = {
> +	.rcucblist = NULL,
> +	.donetail = &rcu_bh_ctrlblk.rcucblist,
> +	.curtail = &rcu_bh_ctrlblk.rcucblist,
> +};
> +EXPORT_SYMBOL_GPL(rcu_bh_ctrlblk);

remove "EXPORT_SYMBOL_GPL(rcu_bh_ctrlblk);"

> +
> +#ifdef CONFIG_NO_HZ
> +
> +static long rcu_dynticks_nesting = 1;
> +
> +/*
> + * Enter dynticks-idle mode, which is an extended quiescent state
> + * if we have fully entered that mode (i.e., if the new value of
> + * dynticks_nesting is zero).
> + */
> +void rcu_enter_nohz(void)
> +{
> +	if (--rcu_dynticks_nesting == 0)
> +		rcu_sched_qs(0); /* implies rcu_bh_qsctr_inc(0) */
> +}
> +
> +/*
> + * Exit dynticks-idle mode, so that we are no longer in an extended
> + * quiescent state.
> + */
> +void rcu_exit_nohz(void)
> +{
> +	rcu_dynticks_nesting++;
> +}
> +
> +#endif /* #ifdef CONFIG_NO_HZ */

It's an old issue.
It's not only about RCUTINY, it's also about other rcu implementations:

rcu_enter_nohz()/rcu_exit_nohz() are not called in pairs.

irq_exit() calls tick_nohz_stop_sched_tick() which calls rcu_enter_nohz(),
where is the corresponding rcu_exit_nohz()?
(or tick_nohz_restart_sched_tick())?


* Re: [PATCH RFC tip/core/rcu 1/3] rcu: The Bloatwatch Edition, v7
  2009-10-13  7:44   ` Lai Jiangshan
@ 2009-10-13 17:00     ` Paul E. McKenney
  2009-10-14  0:37       ` Lai Jiangshan
  0 siblings, 1 reply; 18+ messages in thread
From: Paul E. McKenney @ 2009-10-13 17:00 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, mingo, dipankar, akpm, mathieu.desnoyers, josh,
	dvhltc, niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	avi, mtosatti, torvalds

On Tue, Oct 13, 2009 at 03:44:53PM +0800, Lai Jiangshan wrote:
> Again, trivial code-beautifying, except for the last one.
> 
> Paul E. McKenney wrote:
> 
> [...]
> 
> > +
> > +/* Global control variables for rcupdate callback mechanism. */
> > +struct rcu_ctrlblk {
> > +	struct rcu_head *rcucblist;	/* List of pending callbacks (CBs). */
> > +	struct rcu_head **donetail;	/* ->next pointer of last "done" CB. */
> > +	struct rcu_head **curtail;	/* ->next pointer of last CB. */
> > +};
> > +
> > +extern struct rcu_ctrlblk rcu_ctrlblk;
> > +extern struct rcu_ctrlblk rcu_bh_ctrlblk;
> 
> Since rcu_batches_completed() returns a constant, this piece of
> code is not needed here. We can move it to rcutiny.c.

Good point, moved.  And removed the "extern" declarations.

> and remove "EXPORT_SYMBOL_GPL(rcu_ctrlblk);" &
> "EXPORT_SYMBOL_GPL(rcu_bh_ctrlblk);".

Fixed, thank you!  And marked them "static".

> [...]
> 
> > +/* extern void rcu_restart_cpu(int cpu); */
> > +
> 
> redundant comments.

Good eyes!  I believe I have long since proven to myself that
rcu_restart_cpu() is no longer used.  :-)

> [...]
> 
> > +
> > +static inline int rcu_pending(int cpu)
> > +{
> > +	return 1;
> > +}
> 
> It seems that no one uses it.

Indeed!  Removed.

> [...]
> 
> > +/* Definition for rcupdate control block. */
> > +struct rcu_ctrlblk rcu_ctrlblk = {
> > +	.rcucblist = NULL,
> > +	.donetail = &rcu_ctrlblk.rcucblist,
> > +	.curtail = &rcu_ctrlblk.rcucblist,
> > +};
> > +EXPORT_SYMBOL_GPL(rcu_ctrlblk);
> 
> remove "EXPORT_SYMBOL_GPL(rcu_ctrlblk);"

Done, above.

> > +struct rcu_ctrlblk rcu_bh_ctrlblk = {
> > +	.rcucblist = NULL,
> > +	.donetail = &rcu_bh_ctrlblk.rcucblist,
> > +	.curtail = &rcu_bh_ctrlblk.rcucblist,
> > +};
> > +EXPORT_SYMBOL_GPL(rcu_bh_ctrlblk);
> 
> remove "EXPORT_SYMBOL_GPL(rcu_bh_ctrlblk);"

Done, above.

> > +
> > +#ifdef CONFIG_NO_HZ
> > +
> > +static long rcu_dynticks_nesting = 1;
> > +
> > +/*
> > + * Enter dynticks-idle mode, which is an extended quiescent state
> > + * if we have fully entered that mode (i.e., if the new value of
> > + * dynticks_nesting is zero).
> > + */
> > +void rcu_enter_nohz(void)
> > +{
> > +	if (--rcu_dynticks_nesting == 0)
> > +		rcu_sched_qs(0); /* implies rcu_bh_qsctr_inc(0) */
> > +}
> > +
> > +/*
> > + * Exit dynticks-idle mode, so that we are no longer in an extended
> > + * quiescent state.
> > + */
> > +void rcu_exit_nohz(void)
> > +{
> > +	rcu_dynticks_nesting++;
> > +}
> > +
> > +#endif /* #ifdef CONFIG_NO_HZ */
> 
> It's an old issue.
> It's not only about RCUTINY, it's also about other rcu implementations:
> 
> rcu_enter_nohz()/rcu_exit_nohz() are not called in pairs.
> 
> irq_exit() calls tick_nohz_stop_sched_tick() which calls rcu_enter_nohz(),
> where is the corresponding rcu_exit_nohz()?
> (or tick_nohz_restart_sched_tick())?

The tick_nohz_restart_sched_tick() function is called from the various
per-architecture cpu_idle() functions (or default_idle() or whatever
name the architecture uses).  For example, in:

	arch/x86/kernel/process_64.c

the cpu_idle() function invokes tick_nohz_restart_sched_tick() just
before invoking schedule() to exit the idle loop.

And, as you say, tick_nohz_restart_sched_tick() invokes rcu_exit_nohz().

							Thanx, Paul


* Re: [PATCH RFC tip/core/rcu 1/3] rcu: The Bloatwatch Edition, v7
  2009-10-13 17:00     ` Paul E. McKenney
@ 2009-10-14  0:37       ` Lai Jiangshan
  2009-10-14  1:09         ` Paul E. McKenney
  0 siblings, 1 reply; 18+ messages in thread
From: Lai Jiangshan @ 2009-10-14  0:37 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, mingo, dipankar, akpm, mathieu.desnoyers, josh,
	dvhltc, niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	avi, mtosatti, torvalds

Paul E. McKenney wrote:
>> It's an old issue.
>> It's not only about RCUTINY, it's also about other rcu implementations:
>>
>> rcu_enter_nohz()/rcu_exit_nohz() are not called in pairs.
>>
>> irq_exit() calls tick_nohz_stop_sched_tick() which calls rcu_enter_nohz(),
>> where is the corresponding rcu_exit_nohz()?
>> (or tick_nohz_restart_sched_tick())?
> 
> The tick_nohz_restart_sched_tick() function is called from the various
> per-architecture cpu_idle() functions (or default_idle() or whatever
> name that the architecture uses).  For example, in:
> 
> 	arch/x86/kernel/process_64.c
> 
> the cpu_idle() function invokes tick_nohz_restart_sched_tick() just
> before invoking schedule() to exit the idle loop.
> 
> And, as you say, tick_nohz_restart_sched_tick() invokes rcu_exit_nohz().
> 
> 							Thanx, Paul
> 
> 

The tick_nohz_restart_sched_tick() calls made from the various
per-architecture cpu_idle() functions are not the counterpart of
the tick_nohz_stop_sched_tick() call in *irq_exit()*. So I figure that
rcu_enter_nohz()/rcu_exit_nohz() are not called in pairs.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC tip/core/rcu 1/3] rcu: The Bloatwatch Edition, v7
  2009-10-14  0:37       ` Lai Jiangshan
@ 2009-10-14  1:09         ` Paul E. McKenney
  2009-10-14  2:05           ` Lai Jiangshan
  0 siblings, 1 reply; 18+ messages in thread
From: Paul E. McKenney @ 2009-10-14  1:09 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, mingo, dipankar, akpm, mathieu.desnoyers, josh,
	dvhltc, niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	avi, mtosatti, torvalds

On Wed, Oct 14, 2009 at 08:37:18AM +0800, Lai Jiangshan wrote:
> Paul E. McKenney wrote:
> >> It's an old issue.
> >> It's not only about RCUTINY, it's also about other rcu implementations:
> >>
> >> rcu_enter_nohz()/rcu_exit_nohz() are not called in pairs.
> >>
> >> irq_exit() calls tick_nohz_stop_sched_tick() which calls rcu_enter_nohz(),
> >> where is the corresponding rcu_exit_nohz()?
> >> (or tick_nohz_restart_sched_tick())?
> > 
> > The tick_nohz_restart_sched_tick() function is called from the various
> > per-architecture cpu_idle() functions (or default_idle() or whatever
> > name that the architecture uses).  For example, in:
> > 
> > 	arch/x86/kernel/process_64.c
> > 
> > the cpu_idle() function invokes tick_nohz_restart_sched_tick() just
> > before invoking schedule() to exit the idle loop.
> > 
> > And, as you say, tick_nohz_restart_sched_tick() invokes rcu_exit_nohz().
> 
> These tick_nohz_restart_sched_tick() which are called from the various
> per-architecture cpu_idle() functions are not the opposite of
> the tick_nohz_stop_sched_tick() in *irq_exit()*. So I figure that 
> rcu_enter_nohz()/rcu_exit_nohz() are not called in pairs.

OK, let's start with rcu_enter_nohz(), which tells RCU that the running
CPU is going into dyntick-idle mode, and thus should be ignored by RCU.
Let's do the idle loop first:

o	Upon entry to the idle() loop (using cpu_idle() in
	arch/x86/kernel/process_64.c for this exercise),
	we invoke tick_nohz_stop_sched_tick(1), which says we
	are in an idle loop.  (This is in contrast to the call
	from irq_exit(), where we are not in the idle loop.)

o	tick_nohz_stop_sched_tick() invokes rcu_enter_nohz(),
	does a bunch of timer checking, and returns.  If anything
	indicated that entering dyntick-idle mode would be bad,
	we raise TIMER_SOFTIRQ to kick us out of this mode.

	Either way, we return to the idle loop.

o	The idle loops until need_resched().  Upon exit from the
	idle loop, we call tick_nohz_restart_sched_tick(), which
	invokes rcu_exit_nohz(), which tells RCU to start paying
	attention to this CPU once more.

OK, now for interrupts.

o	The hardware interrupt handlers invoke irq_enter(), which in
	turn invokes rcu_irq_enter().  This has no real effect (other
	than incrementing a counter) if the interrupt did not come
	from dyntick-idle mode.

	Either way, RCU is now paying attention to RCU read-side
	critical sections on this CPU.

o	Upon return from interrupt, the hardware interrupt handlers
	invoke irq_exit(), which in turn invokes rcu_irq_exit().
	This has no real effect (other than decrementing a counter)
	if the interrupt is not returning to dyntick-idle mode.

	However, if the interrupt -is- returning to dyntick-idle
	mode, then RCU will stop paying attention to RCU read-side
	critical sections on this CPU.

So I do believe that rcu_enter_nohz() and rcu_exit_nohz() are in fact
invoked in pairs.  One strange thing about this is that the idle loop
first invokes rcu_enter_nohz(), then invokes rcu_exit_nohz(), while
an interrupt handler first invokes rcu_irq_enter() and then invokes
rcu_irq_exit().  So the idle loop enters dyntick-idle mode and then
leaves it, while an interrupt handler might leave dyntick-idle mode and
then re-enter it.

Or am I still missing something here?

							Thanx, Paul


* Re: [PATCH RFC tip/core/rcu 1/3] rcu: The Bloatwatch Edition, v7
  2009-10-14  1:09         ` Paul E. McKenney
@ 2009-10-14  2:05           ` Lai Jiangshan
  2009-10-14  2:49             ` Steven Rostedt
  2009-10-14  2:52             ` Paul E. McKenney
  0 siblings, 2 replies; 18+ messages in thread
From: Lai Jiangshan @ 2009-10-14  2:05 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, mingo, dipankar, akpm, mathieu.desnoyers, josh,
	dvhltc, niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	avi, mtosatti, torvalds

Paul E. McKenney wrote:
> On Wed, Oct 14, 2009 at 08:37:18AM +0800, Lai Jiangshan wrote:
>> Paul E. McKenney wrote:
>>>> It's an old issue.
>>>> It's not only about RCUTINY, it's also about other rcu implementations:
>>>>
>>>> rcu_enter_nohz()/rcu_exit_nohz() are not called in pairs.
>>>>
>>>> irq_exit() calls tick_nohz_stop_sched_tick() which calls rcu_enter_nohz(),
>>>> where is the corresponding rcu_exit_nohz()?
>>>> (or tick_nohz_restart_sched_tick())?
>>> The tick_nohz_restart_sched_tick() function is called from the various
>>> per-architecture cpu_idle() functions (or default_idle() or whatever
>>> name that the architecture uses).  For example, in:
>>>
>>> 	arch/x86/kernel/process_64.c
>>>
>>> the cpu_idle() function invokes tick_nohz_restart_sched_tick() just
>>> before invoking schedule() to exit the idle loop.
>>>
>>> And, as you say, tick_nohz_restart_sched_tick() invokes rcu_exit_nohz().
>> These tick_nohz_restart_sched_tick() which are called from the various
>> per-architecture cpu_idle() functions are not the opposite of
>> the tick_nohz_stop_sched_tick() in *irq_exit()*. So I figure that 
>> rcu_enter_nohz()/rcu_exit_nohz() are not called in pairs.
> 
> OK, let's start with rcu_enter_nohz(), which tells RCU that the running
> CPU is going into dyntick-idle mode, and thus should be ignored by RCU.
> Let's do the idle loop first:
> 
> o	Upon entry to the idle() loop (using cpu_idle() in
> 	arch/x86/kernel/process_64.c for this exercise),
> 	we invoke tick_nohz_stop_sched_tick(1), which says we
> 	are in an idle loop.  (This is in contrast to the call
> 	from irq_exit(), where we are not in the idle loop.)
> 
> o	tick_nohz_stop_sched_tick() invokes rcu_enter_nohz(),
> 	does a bunch of timer checking, and returns.  If anything
> 	indicated that entering dyntick-idle mode would be bad,
> 	we raise TIMER_SOFTIRQ to kick us out of this mode.
> 
> 	Either way, we return to the idle loop.
> 
> o	The idle loops until need_resched().  Upon exit from the
> 	idle loop, we call tick_nohz_restart_sched_tick(), which
> 	invokes rcu_exit_nohz(), which tells RCU to start paying
> 	attention to this CPU once more.
> 
> OK, now for interrupts.
> 
> o	The hardware interrupt handlers invoke irq_enter(), which in
> 	turn invokes rcu_irq_enter().  This has no real effect (other
> 	than incrementing a counter) if the interrupt did not come
> 	from dyntick-idle mode.
> 
> 	Either way, RCU is now paying attention to RCU read-side
> 	critical sections on this CPU.
> 
> o	Upon return from interrupt, the hardware interrupt handlers
> 	invoke irq_exit(), which in turn invokes rcu_irq_exit().
> 	This has no real effect (other than decrementing a counter)
> 	if the interrupt is not returning to dyntick-idle mode.
> 
> 	However, if the interrupt -is- returning to dyntick-idle
> 	mode, then RCU will stop paying attention to RCU read-side
> 	critical sections on this CPU.


You haven't explained the tick_nohz_stop_sched_tick() call in *irq_exit()*.
(tick_nohz_stop_sched_tick() calls rcu_enter_nohz())

void irq_exit(void)
{
	....
	rcu_irq_exit(); /* This is OK, the opposite is in irq_enter() */
	if (idle_cpu(smp_processor_id()) && !in_interrupt() && !need_resched())
		tick_nohz_stop_sched_tick(0); /* where is the opposite ??? */
	....
}

This means if the interrupt -is- returning to dyntick-idle mode,
rcu_enter_nohz() is called again.

Take this flow as example:

cpu_idle():
while(1) {

  tick_nohz_stop_sched_tick()
     rcu_enter_nohz()                 *****
------->interrupt happen
        irq_enter()
        irq_exit()
           tick_nohz_stop_sched_tick()
              rcu_enter_nohz()       *****
<-------interrupt returns
  tick_nohz_restart_sched_tick()
     rcu_exit_nohz()                  *****

} /* while(1) */


You can see that rcu_enter_nohz() is called twice and
rcu_exit_nohz() is only called once in this flow.

It's because tick_nohz_stop_sched_tick()/tick_nohz_restart_sched_tick()
are not called in pairs, so rcu_enter_nohz() and rcu_exit_nohz()
are not called in pairs either.

Lai

> 
> So I do believe that rcu_enter_nohz() and rcu_exit_nohz() are in fact
> invoked in pairs.  One strange thing about this is that the idle loop
> first invokes rcu_enter_nohz(), then invokes rcu_exit_nohz(), while
> an interrupt handler first invokes rcu_irq_enter() and then invokes
> rcu_irq_exit().  So the idle loop enters dyntick-idle mode and then
> leaves it, while an interrupt handler might leave dyntick-idle mode and
> then re-enter it.
> 
> Or am I still missing something here?
> 
> 							Thanx, Paul
> 
> 



* Re: [PATCH RFC tip/core/rcu 1/3] rcu: The Bloatwatch Edition, v7
  2009-10-14  2:05           ` Lai Jiangshan
@ 2009-10-14  2:49             ` Steven Rostedt
  2009-10-27  7:26               ` Lai Jiangshan
  2009-10-14  2:52             ` Paul E. McKenney
  1 sibling, 1 reply; 18+ messages in thread
From: Steven Rostedt @ 2009-10-14  2:49 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: paulmck, linux-kernel, mingo, dipankar, akpm, mathieu.desnoyers,
	josh, dvhltc, niv, tglx, peterz, Valdis.Kletnieks, dhowells, avi,
	mtosatti, torvalds

On Wed, 2009-10-14 at 10:05 +0800, Lai Jiangshan wrote:
> Paul E. McKenney wrote:

> You haven't explained the tick_nohz_stop_sched_tick() call in *irq_exit()*.
> (tick_nohz_stop_sched_tick() calls rcu_enter_nohz())
> 
> void irq_exit(void)
> {
> 	....
> 	rcu_irq_exit(); /* This is OK, the opposite is in irq_enter() */
> 	if (idle_cpu(smp_processor_id()) && !in_interrupt() && !need_resched())
> 		tick_nohz_stop_sched_tick(0); /* where is the opposite ??? */
> 	....
> }
> 
> This means if the interrupt -is- returning to dyntick-idle mode,
> rcu_enter_nohz() is called again.
> 
> Take this flow as example:
> 
> cpu_idle():
> while(1) {
> 
>   tick_nohz_stop_sched_tick()
>      rcu_enter_nohz()                 *****
> ------->interrupt happen
>         irq_enter()
>         irq_exit()
>            tick_nohz_stop_sched_tick()
>               rcu_enter_nohz()       *****
> <-------interrupt returns
>   tick_nohz_restart_sched_tick()
>      rcu_exit_nohz()                  *****
> 
> } /* while(1) */
> 
> 
> You can see that rcu_enter_nohz() is called twice and
> rcu_exit_nohz() is only called once in this flow.
> 
> It's because tick_nohz_stop_sched_tick()/tick_nohz_restart_sched_tick()
> are not called in pairs, so rcu_enter_nohz() and rcu_exit_nohz()
> are not called in pairs either.


But doesn't tick_nohz_stop_sched_tick() have several early exits?

void tick_nohz_stop_sched_tick(int inidle)
{
[..]

	if (!inidle && !ts->inidle)
		goto end;

	ts->inidle = 1;

[..]

		if (!ts->tick_stopped) {
[..]
			ts->tick_stopped = 1;
			ts->idle_jiffies = last_jiffies;
			rcu_enter_nohz();
		}
[..]


So I'm not sure that calling tick_nohz_stop_sched_tick() twice means
rcu_enter_nohz() gets called twice.

-- Steve

> > 
> > So I do believe that rcu_enter_nohz() and rcu_exit_nohz() are in fact
> > invoked in pairs.  One strange thing about this is that the idle loop
> > first invokes rcu_enter_nohz(), then invokes rcu_exit_nohz(), while
> > an interrupt handler first invokes rcu_irq_enter() and then invokes
> > rcu_irq_exit().  So the idle loop enters dyntick-idle mode and then
> > leaves it, while an interrupt handler might leave dyntick-idle mode and
> > then re-enter it.
> > 
> > Or am I still missing something here?
> > 
> > 							Thanx, Paul
> > 
> > 
> 



* Re: [PATCH RFC tip/core/rcu 1/3] rcu: The Bloatwatch Edition, v7
  2009-10-14  2:05           ` Lai Jiangshan
  2009-10-14  2:49             ` Steven Rostedt
@ 2009-10-14  2:52             ` Paul E. McKenney
  1 sibling, 0 replies; 18+ messages in thread
From: Paul E. McKenney @ 2009-10-14  2:52 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, mingo, dipankar, akpm, mathieu.desnoyers, josh,
	dvhltc, niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	avi, mtosatti, torvalds

On Wed, Oct 14, 2009 at 10:05:25AM +0800, Lai Jiangshan wrote:
> Paul E. McKenney wrote:
> > On Wed, Oct 14, 2009 at 08:37:18AM +0800, Lai Jiangshan wrote:
> >> Paul E. McKenney wrote:
> >>>> It's an old issue.
> >>>> It's not only about RCUTINY, it's also about other rcu implementations:
> >>>>
> >>>> rcu_enter_nohz()/rcu_exit_nohz() are not called in pairs.
> >>>>
> >>>> irq_exit() calls tick_nohz_stop_sched_tick() which calls rcu_enter_nohz(),
> >>>> where is the corresponding rcu_exit_nohz()?
> >>>> (or tick_nohz_restart_sched_tick())?
> >>> The tick_nohz_restart_sched_tick() function is called from the various
> >>> per-architecture cpu_idle() functions (or default_idle() or whatever
> >>> name that the architecture uses).  For example, in:
> >>>
> >>> 	arch/x86/kernel/process_64.c
> >>>
> >>> the cpu_idle() function invokes tick_nohz_restart_sched_tick() just
> >>> before invoking schedule() to exit the idle loop.
> >>>
> >>> And, as you say, tick_nohz_restart_sched_tick() invokes rcu_exit_nohz().
> >> These tick_nohz_restart_sched_tick() which are called from the various
> >> per-architecture cpu_idle() functions are not the opposite of
> >> the tick_nohz_stop_sched_tick() in *irq_exit()*. So I figure that 
> >> rcu_enter_nohz()/rcu_exit_nohz() are not called in pairs.
> > 
> > OK, let's start with rcu_enter_nohz(), which tells RCU that the running
> > CPU is going into dyntick-idle mode, and thus should be ignored by RCU.
> > Let's do the idle loop first:
> > 
> > o	Upon entry to the idle() loop (using cpu_idle() in
> > 	arch/x86/kernel/process_64.c for this exercise),
> > 	we invoke tick_nohz_stop_sched_tick(1), which says we
> > 	are in an idle loop.  (This is in contrast to the call
> > 	from irq_exit(), where we are not in the idle loop.)
> > 
> > o	tick_nohz_stop_sched_tick() invokes rcu_enter_nohz(),
> > 	does a bunch of timer checking, and returns.  If anything
> > 	indicated that entering dyntick-idle mode would be bad,
> > 	we raise TIMER_SOFTIRQ to kick us out of this mode.
> > 
> > 	Either way, we return to the idle loop.
> > 
> > o	The idle loops until need_resched().  Upon exit from the
> > 	idle loop, we call tick_nohz_restart_sched_tick(), which
> > 	invokes rcu_exit_nohz(), which tells RCU to start paying
> > 	attention to this CPU once more.
> > 
> > OK, now for interrupts.
> > 
> > o	The hardware interrupt handlers invoke irq_enter(), which in
> > 	turn invokes rcu_irq_enter().  This has no real effect (other
> > 	than incrementing a counter) if the interrupt did not come
> > 	from dyntick-idle mode.
> > 
> > 	Either way, RCU is now paying attention to RCU read-side
> > 	critical sections on this CPU.
> > 
> > o	Upon return from interrupt, the hardware interrupt handlers
> > 	invoke irq_exit(), which in turn invokes rcu_irq_exit().
> > 	This has no real effect (other than decrementing a counter)
> > 	if the interrupt is not returning to dyntick-idle mode.
> > 
> > 	However, if the interrupt -is- returning to dyntick-idle
> > 	mode, then RCU will stop paying attention to RCU read-side
> > 	critical sections on this CPU.
> 
> 
> You haven't explained the tick_nohz_stop_sched_tick() call in *irq_exit()*.
> (tick_nohz_stop_sched_tick() calls rcu_enter_nohz())
> 
> void irq_exit(void)
> {
> 	....
> 	rcu_irq_exit(); /* This is OK, the opposite is in irq_enter() */
> 	if (idle_cpu(smp_processor_id()) && !in_interrupt() && !need_resched())
> 		tick_nohz_stop_sched_tick(0); /* where is the opposite ??? */
> 	....
> }
> 
> This means if the interrupt -is- returning to dyntick-idle mode,
> rcu_enter_nohz() is called again.
> 
> Take this flow as example:
> 
> cpu_idle():
> while(1) {
> 
>   tick_nohz_stop_sched_tick()
>      rcu_enter_nohz()                 *****

=== now RCU is in no_hz mode.

> ------->interrupt happen
>         irq_enter()

		rcu_irq_enter()

=== now RCU is no longer in no_hz mode.

>         irq_exit()

		rcu_irq_exit()

=== now RCU is in no_hz mode again.

>            tick_nohz_stop_sched_tick()

=== but tick_nohz_stop_sched_tick() is passed "0" as the argument.  
=== I might be missing something, but doesn't this prevent
=== rcu_enter_nohz() from being called at this point?

>               rcu_enter_nohz()       *****
> <-------interrupt returns
>   tick_nohz_restart_sched_tick()
>      rcu_exit_nohz()                  *****

=== now RCU is no longer in no_hz mode.

> } /* while(1) */
> 
> 
> You can see that rcu_enter_nohz() is called twice and
> rcu_exit_nohz() is only called once in this flow.
> 
> It's because tick_nohz_stop_sched_tick()/tick_nohz_restart_sched_tick()
> are not called in pairs, so rcu_enter_nohz() and rcu_exit_nohz()
> are not called in pairs either.

I believe that the checks in tick_nohz_stop_sched_tick() prevent this
scenario from happening, but I could easily be mistaken.  I am not seeing
the WARN_ON_RATELIMIT() in rcu_exit_nohz(), however.

							Thanx, Paul

> Lai
> 
> > 
> > So I do believe that rcu_enter_nohz() and rcu_exit_nohz() are in fact
> > invoked in pairs.  One strange thing about this is that the idle loop
> > first invokes rcu_enter_nohz(), then invokes rcu_exit_nohz(), while
> > an interrupt handler first invokes rcu_irq_enter() and then invokes
> > rcu_irq_exit().  So the idle loop enters dyntick-idle mode and then
> > leaves it, while an interrupt handler might leave dyntick-idle mode and
> > then re-enter it.
> > 
> > Or am I still missing something here?
> > 
> > 							Thanx, Paul
> > 
> > 
> 


* Re: [PATCH RFC tip/core/rcu 1/3] rcu: The Bloatwatch Edition, v7
  2009-10-14  2:49             ` Steven Rostedt
@ 2009-10-27  7:26               ` Lai Jiangshan
  2009-10-27 19:56                 ` Steven Rostedt
  0 siblings, 1 reply; 18+ messages in thread
From: Lai Jiangshan @ 2009-10-27  7:26 UTC (permalink / raw)
  To: rostedt, tglx
  Cc: paulmck, linux-kernel, mingo, dipankar, akpm, mathieu.desnoyers,
	josh, dvhltc, niv, peterz, Valdis.Kletnieks, dhowells, avi,
	mtosatti, torvalds

Steven Rostedt wrote:
> But doesn't tick_nohz_stop_sched_tick() have several early exits?
> 
> void tick_nohz_stop_sched_tick(int inidle)
> {
> [..]
> 
> 	if (!inidle && !ts->inidle)
> 		goto end;
> 
> 	ts->inidle = 1;
> 
> [..]
> 
> 		if (!ts->tick_stopped) {
> [..]
> 			ts->tick_stopped = 1;
> 			ts->idle_jiffies = last_jiffies;
> 			rcu_enter_nohz();
> 		}
> [..]
> 
> 
> So I'm not sure calling tick_nohz_stop_sched_tick twice equals calling
> rcu_enter_nohz twice.
> 

Hi, tglx, steven,

(Thanks to tglx for helping me at the Japan Linux Symposium)

I found something weird about NO_HZ; maybe I have misunderstood the code.

see this flow:

cpu idle
  enter nohz
  cpu halt
---->interrupt happens
     irq_enter()
        we don't reprogram the clock device                         #1
     irq_exit()
       tick_nohz_stop_sched_tick(inidle = 0)
         something disallow this cpu reenter nohz                   #2
         we don't reprogram the clock device                        #3
<----interrupt return
   cpu halts again and waits for an interrupt longer than expected  #4
   exit nohz


#1 tick_nohz_kick_tick() is disabled in the current mainline kernel,
   so we don't call tick_nohz_restart(ts, now) from irq_enter()

static void tick_nohz_kick_tick(int cpu)
{
#if 0                                            <------------- here
	/* Switch back to 2.6.27 behaviour */

	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
	ktime_t delta, now;

	if (!ts->tick_stopped)
		return;

	/*
	 * Do not touch the tick device, when the next expiry is either
	 * already reached or less/equal than the tick period.
	 */
	now = ktime_get();
	delta =	ktime_sub(hrtimer_get_expires(&ts->sched_timer), now);
	if (delta.tv64 <= tick_period.tv64)
		return;

	tick_nohz_restart(ts, now);               <----------- here
#endif
}


#2 When rcu_needs_cpu() or printk_needs_cpu() returns true,
   tick_nohz_stop_sched_tick() will just return.

#3 And we don't reprogram the clock device when #2 happens

#4 So we may stay in nohz for longer than expected, even though we
    actually have some work to do (rcu, printk, etc.)

So I think we need to reprogram the clock device and restart the tick
when #2 happens, unless there is something I have misunderstood.

Thanks, Lai



> -- Steve
> 
>>> So I do believe that rcu_enter_nohz() and rcu_exit_nohz() are in fact
>>> invoked in pairs.  One strange thing about this is that the idle loop
>>> first invokes rcu_enter_nohz(), then invokes rcu_exit_nohz(), while
>>> an interrupt handler first invokes rcu_irq_enter() and then invokes
>>> rcu_irq_exit().  So the idle loop enters dyntick-idle mode and then
>>> leaves it, while an interrupt handler might leave dyntick-idle mode and
>>> then re-enter it.
>>>
>>> Or am I still missing something here?
>>>
>>> 							Thanx, Paul
>>>
>>>
> 
> 
> 




* Re: [PATCH RFC tip/core/rcu 1/3] rcu: The Bloatwatch Edition, v7
  2009-10-27  7:26               ` Lai Jiangshan
@ 2009-10-27 19:56                 ` Steven Rostedt
  0 siblings, 0 replies; 18+ messages in thread
From: Steven Rostedt @ 2009-10-27 19:56 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: tglx, paulmck, linux-kernel, mingo, dipankar, akpm,
	mathieu.desnoyers, josh, dvhltc, niv, peterz, Valdis.Kletnieks,
	dhowells, avi, mtosatti, torvalds

On Tue, 2009-10-27 at 15:26 +0800, Lai Jiangshan wrote:
> Steven Rostedt wrote:

> I found something weird about NO_HZ; maybe I have misunderstood the code.
> 
> see this flow:
> 
> cpu idle
>   enter nohz
>   cpu halt
> ---->interrupt happens
>      irq_enter()
>         we don't reprogram the clock device                         #1
>      irq_exit()
>        tick_nohz_stop_sched_tick(inidle = 0)
>          something disallow this cpu reenter nohz                   #2
>          we don't reprogram the clock device                        #3
> <----interrupt return
>    cpu halts again and waits for an interrupt longer than expected  #4
>    exit nohz
> 
> 
> #1 tick_nohz_kick_tick() is disabled in the current mainline kernel,
>    so we don't call tick_nohz_restart(ts, now) from irq_enter()
> 
> static void tick_nohz_kick_tick(int cpu)
> {
> #if 0                                            <------------- here
> 	/* Switch back to 2.6.27 behaviour */
> 
> 	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
> 	ktime_t delta, now;
> 
> 	if (!ts->tick_stopped)
> 		return;
> 
> 	/*
> 	 * Do not touch the tick device, when the next expiry is either
> 	 * already reached or less/equal than the tick period.
> 	 */
> 	now = ktime_get();
> 	delta =	ktime_sub(hrtimer_get_expires(&ts->sched_timer), now);
> 	if (delta.tv64 <= tick_period.tv64)
> 		return;
> 
> 	tick_nohz_restart(ts, now);               <----------- here
> #endif
> }
> 
> 
> #2 When rcu_needs_cpu() or  printk_needs_cpu()
>    returns true then tick_nohz_stop_sched_tick() will just return.
> 
> #3 And we don't reprogram the clock device when #2 happens
> 
> #4 So we may stay in nohz for longer than expected, even though we
>     actually have some work to do (rcu, printk, etc.)
> 
> So I think we need to reprogram the clock device and restart the tick
> when #2 happens, unless there is something I have misunderstood.


This looks like a different question than you asked before. And
something I don't know the answer to ;-)

-- Steve




end of thread, other threads:[~2009-10-27 20:19 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
2009-10-09 22:49 [PATCH RFC tip/core/rcu 0/3] Tiny RCU and expedited SRCU Paul E. McKenney
2009-10-09 22:50 ` [PATCH RFC tip/core/rcu 1/3] rcu: The Bloatwatch Edition, v7 Paul E. McKenney
2009-10-12  9:29   ` Lai Jiangshan
2009-10-12 16:40     ` Linus Torvalds
2009-10-12 17:30     ` Paul E. McKenney
2009-10-13  6:05       ` Lai Jiangshan
2009-10-13  7:44   ` Lai Jiangshan
2009-10-13 17:00     ` Paul E. McKenney
2009-10-14  0:37       ` Lai Jiangshan
2009-10-14  1:09         ` Paul E. McKenney
2009-10-14  2:05           ` Lai Jiangshan
2009-10-14  2:49             ` Steven Rostedt
2009-10-27  7:26               ` Lai Jiangshan
2009-10-27 19:56                 ` Steven Rostedt
2009-10-14  2:52             ` Paul E. McKenney
2009-10-09 22:50 ` [PATCH RFC tip/core/rcu 2/3] rcu: Add synchronize_srcu_expedited() Paul E. McKenney
2009-10-09 22:50 ` [PATCH RFC tip/core/rcu 3/3] rcu: add synchronize_srcu_expedited() to the rcutorture test suite Paul E. McKenney
2009-10-10  3:47 ` [PATCH RFC tip/core/rcu 0/3] Tiny RCU and expedited SRCU Josh Triplett
