* [PATCH 0/3] Optimize the cpu hotplug locking
@ 2013-10-02 14:56 Peter Zijlstra
  2013-10-02 14:56 ` [PATCH 1/3] hotplug: Optimize {get,put}_online_cpus() Peter Zijlstra
                   ` (2 more replies)
  0 siblings, 3 replies; 46+ messages in thread
From: Peter Zijlstra @ 2013-10-02 14:56 UTC (permalink / raw)
  To: Oleg Nesterov, Paul McKenney
  Cc: Mel Gorman, Rik van Riel, Srikar Dronamraju, Ingo Molnar,
	Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel, Peter Zijlstra

The current cpu hotplug lock is a single global lock; therefore excluding
hotplug is a very expensive proposition even though it is a rare occurrence
under normal operation.

There is a desire for a more lightweight implementation of
{get,put}_online_cpus() from both the NUMA scheduling side and the -RT side.

The current hotplug lock is a full reader preference lock -- and thus supports
reader recursion. However, since we're making the read side lock much cheaper,
the expectation is that it will also be used far more, which in turn would
lead to writer starvation.

Therefore the proposed new lock is completely fair, albeit somewhat expensive
on the write side. This in turn means that we need a per-task nesting count to
support reader recursion.

This patch-set is in 3 parts:

 1) The new hotplug lock implementation, very fast on the read side,
    very expensive on the write side.

 2) Some new rcu_sync primitive by Oleg

 3) Employment of the rcu_sync thingy to optimize the write-side of the
    new hotplug lock.
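
For illustration, the read side is used exactly as today; a reader section,
including the recursion that the per-task count allows, would look like the
sketch below (walk_online_cpus() is a made-up example, not part of the
patches):

	static void walk_online_cpus(void)
	{
		int cpu;

		get_online_cpus();		/* fast path: per-cpu increment */
		for_each_online_cpu(cpu) {
			get_online_cpus();	/* recursion: only current->cpuhp_ref++ */
			/* per-cpu work; this cpu cannot be unplugged here */
			put_online_cpus();
		}
		put_online_cpus();		/* last put: per-cpu decrement */
	}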






^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 1/3] hotplug: Optimize {get,put}_online_cpus()
  2013-10-02 14:56 [PATCH 0/3] Optimize the cpu hotplug locking Peter Zijlstra
@ 2013-10-02 14:56 ` Peter Zijlstra
  2013-10-03 14:01   ` Peter Zijlstra
  2013-10-03 16:26   ` Paul E. McKenney
  2013-10-02 14:56 ` [PATCH 2/3] rcu: Create rcu_sync infrastructure Peter Zijlstra
  2013-10-02 14:56 ` [PATCH 3/3] hotplug: Optimize cpu_hotplug_{begin,done}() using rcu_sync Peter Zijlstra
  2 siblings, 2 replies; 46+ messages in thread
From: Peter Zijlstra @ 2013-10-02 14:56 UTC (permalink / raw)
  To: Oleg Nesterov, Paul McKenney
  Cc: Mel Gorman, Rik van Riel, Srikar Dronamraju, Ingo Molnar,
	Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel, Peter Zijlstra

[-- Attachment #1: peterz-per-cpu-hotplug.patch --]
[-- Type: text/plain, Size: 11830 bytes --]

The current implementation of get_online_cpus() is global in nature
and thus not suited for any kind of common usage.

Re-implement the current recursive r/w cpu hotplug lock such that the
read side locks are as light as possible.

The current cpu hotplug lock is entirely reader biased; but since
readers are expensive there aren't a lot of them about and writer
starvation isn't a particular problem.

However by making the reader side more usable there is a fair chance
it will get used more and thus the starvation issue becomes a real
possibility.

Therefore this new implementation is fair, alternating readers and
writers; this however requires per-task state to allow the reader
recursion -- this new task_struct member is placed in a 4 byte hole on
64bit builds.

Many comments are contributed by Paul McKenney, and many previous
attempts were shown to be inadequate by both Paul and Oleg; many
thanks to them for persisting in poking holes in my attempts.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 include/linux/cpu.h   |   67 ++++++++++++++
 include/linux/sched.h |    3 
 kernel/cpu.c          |  226 ++++++++++++++++++++++++++++++++++++--------------
 kernel/sched/core.c   |    2 
 4 files changed, 235 insertions(+), 63 deletions(-)

--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -16,6 +16,8 @@
 #include <linux/node.h>
 #include <linux/compiler.h>
 #include <linux/cpumask.h>
+#include <linux/percpu.h>
+#include <linux/sched.h>
 
 struct device;
 
@@ -173,10 +175,69 @@ extern struct bus_type cpu_subsys;
 #ifdef CONFIG_HOTPLUG_CPU
 /* Stop CPUs going up and down. */
 
+extern void cpu_hotplug_init_task(struct task_struct *p);
+
 extern void cpu_hotplug_begin(void);
 extern void cpu_hotplug_done(void);
-extern void get_online_cpus(void);
-extern void put_online_cpus(void);
+
+extern int __cpuhp_state;
+DECLARE_PER_CPU(unsigned int, __cpuhp_refcount);
+
+extern void __get_online_cpus(void);
+
+static inline void get_online_cpus(void)
+{
+	might_sleep();
+
+	/* Support reader recursion */
+	/* The value was >= 1 and remains so, reordering causes no harm. */
+	if (current->cpuhp_ref++)
+		return;
+
+	preempt_disable();
+	/*
+	 * We are in an RCU-sched read-side critical section, so the writer
+	 * cannot both change __cpuhp_state from readers_fast and start
+	 * checking counters while we are here. So if we see !__cpuhp_state,
+	 * we know that the writer won't be checking until we past the
+	 * preempt_enable() and that once the synchronize_sched() is done, the
+	 * writer will see anything we did within this RCU-sched read-side
+	 * critical section.
+	 */
+	if (likely(!__cpuhp_state))
+		__this_cpu_inc(__cpuhp_refcount);
+	else
+		__get_online_cpus(); /* Unconditional memory barrier. */
+	preempt_enable();
+	/*
+	 * The barrier() from preempt_enable() prevents the compiler from
+	 * bleeding the critical section out.
+	 */
+}
+
+extern void __put_online_cpus(void);
+
+static inline void put_online_cpus(void)
+{
+	/* The value was >= 1 and remains so, reordering causes no harm. */
+	if (--current->cpuhp_ref)
+		return;
+
+	/*
+	 * The barrier() in preempt_disable() prevents the compiler from
+	 * bleeding the critical section out.
+	 */
+	preempt_disable();
+	/*
+	 * Same as in get_online_cpus().
+	 */
+	if (likely(!__cpuhp_state))
+		__this_cpu_dec(__cpuhp_refcount);
+	else
+		__put_online_cpus(); /* Unconditional memory barrier. */
+	preempt_enable();
+}
+
 extern void cpu_hotplug_disable(void);
 extern void cpu_hotplug_enable(void);
 #define hotcpu_notifier(fn, pri)	cpu_notifier(fn, pri)
@@ -200,6 +261,8 @@ static inline void cpu_hotplug_driver_un
 
 #else		/* CONFIG_HOTPLUG_CPU */
 
+static inline void cpu_hotplug_init_task(struct task_struct *p) {}
+
 static inline void cpu_hotplug_begin(void) {}
 static inline void cpu_hotplug_done(void) {}
 #define get_online_cpus()	do { } while (0)
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1039,6 +1039,9 @@ struct task_struct {
 #ifdef CONFIG_SMP
 	struct llist_node wake_entry;
 	int on_cpu;
+#ifdef CONFIG_HOTPLUG_CPU
+	int cpuhp_ref;
+#endif
 	struct task_struct *last_wakee;
 	unsigned long wakee_flips;
 	unsigned long wakee_flip_decay_ts;
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -49,88 +49,192 @@ static int cpu_hotplug_disabled;
 
 #ifdef CONFIG_HOTPLUG_CPU
 
-static struct {
-	struct task_struct *active_writer;
-	struct mutex lock; /* Synchronizes accesses to refcount, */
-	/*
-	 * Also blocks the new readers during
-	 * an ongoing cpu hotplug operation.
-	 */
-	int refcount;
-} cpu_hotplug = {
-	.active_writer = NULL,
-	.lock = __MUTEX_INITIALIZER(cpu_hotplug.lock),
-	.refcount = 0,
-};
+enum { readers_fast = 0, readers_slow, readers_block };
 
-void get_online_cpus(void)
+int __cpuhp_state;
+EXPORT_SYMBOL_GPL(__cpuhp_state);
+
+DEFINE_PER_CPU(unsigned int, __cpuhp_refcount);
+EXPORT_PER_CPU_SYMBOL_GPL(__cpuhp_refcount);
+
+static atomic_t cpuhp_waitcount;
+static DECLARE_WAIT_QUEUE_HEAD(cpuhp_readers);
+static DECLARE_WAIT_QUEUE_HEAD(cpuhp_writer);
+
+void cpu_hotplug_init_task(struct task_struct *p)
 {
-	might_sleep();
-	if (cpu_hotplug.active_writer == current)
+	p->cpuhp_ref = 0;
+}
+
+void __get_online_cpus(void)
+{
+again:
+	__this_cpu_inc(__cpuhp_refcount);
+
+	/*
+	 * Due to having preemption disabled the decrement happens on
+	 * the same CPU as the increment, avoiding the
+	 * increment-on-one-CPU-and-decrement-on-another problem.
+	 *
+	 * And yes, if the reader misses the writer's assignment of
+	 * readers_block to __cpuhp_state, then the writer is
+	 * guaranteed to see the reader's increment.  Conversely, any
+	 * readers that increment their __cpuhp_refcount after the
+	 * writer looks are guaranteed to see the readers_block value,
+	 * which in turn means that they are guaranteed to immediately
+	 * decrement their __cpuhp_refcount, so that it doesn't matter
+	 * that the writer missed them.
+	 */
+
+	smp_mb(); /* A matches D */
+
+	if (likely(__cpuhp_state != readers_block))
 		return;
-	mutex_lock(&cpu_hotplug.lock);
-	cpu_hotplug.refcount++;
-	mutex_unlock(&cpu_hotplug.lock);
 
+	/*
+	 * Make sure an outgoing writer sees the waitcount to ensure we
+	 * make progress.
+	 */
+	atomic_inc(&cpuhp_waitcount);
+
+	/*
+	 * Per the above comment; we still have preemption disabled and
+	 * will thus decrement on the same CPU as we incremented.
+	 */
+	__put_online_cpus();
+
+	/*
+	 * We either call schedule() in the wait, or we'll fall through
+	 * and reschedule on the preempt_enable() in get_online_cpus().
+	 */
+	preempt_enable_no_resched();
+	__wait_event(cpuhp_readers, __cpuhp_state != readers_block);
+	preempt_disable();
+
+	/*
+	 * Given we've still got preempt_disabled and new cpu_hotplug_begin()
+	 * must do a synchronize_sched() we're guaranteed a successfull
+	 * acquisition this time -- even if we wake the current
+	 * cpu_hotplug_end() now.
+	 */
+	if (atomic_dec_and_test(&cpuhp_waitcount))
+		wake_up(&cpuhp_writer);
+
+	goto again:
 }
-EXPORT_SYMBOL_GPL(get_online_cpus);
+EXPORT_SYMBOL_GPL(__get_online_cpus);
 
-void put_online_cpus(void)
+void __put_online_cpus(void)
 {
-	if (cpu_hotplug.active_writer == current)
-		return;
-	mutex_lock(&cpu_hotplug.lock);
+	smp_mb(); /* B matches C */
+	/*
+	 * In other words, if they see our decrement (presumably to aggregate
+	 * zero, as that is the only time it matters) they will also see our
+	 * critical section.
+	 */
+	this_cpu_dec(__cpuhp_refcount);
+
+	/* Prod writer to recheck readers_active */
+	wake_up(&cpuhp_writer);
+}
+EXPORT_SYMBOL_GPL(__put_online_cpus);
+
+#define per_cpu_sum(var)						\
+({ 									\
+ 	typeof(var) __sum = 0;						\
+ 	int cpu;							\
+ 	for_each_possible_cpu(cpu)					\
+ 		__sum += per_cpu(var, cpu);				\
+ 	__sum;								\
+)}
 
-	if (WARN_ON(!cpu_hotplug.refcount))
-		cpu_hotplug.refcount++; /* try to fix things up */
+/*
+ * See srcu_readers_active_idx_check() for a rather more detailed explanation.
+ */
+static bool cpuhp_readers_active_check(void)
+{
+	if (per_cpu_sum(__cpuhp_refcount) != 0)
+		return false;
 
-	if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
-		wake_up_process(cpu_hotplug.active_writer);
-	mutex_unlock(&cpu_hotplug.lock);
+	/*
+	 * If we observed the decrement; ensure we see the entire critical
+	 * section.
+	 */
+
+	smp_mb(); /* C matches B */
 
+	return true;
 }
-EXPORT_SYMBOL_GPL(put_online_cpus);
 
 /*
- * This ensures that the hotplug operation can begin only when the
- * refcount goes to zero.
- *
- * Note that during a cpu-hotplug operation, the new readers, if any,
- * will be blocked by the cpu_hotplug.lock
- *
- * Since cpu_hotplug_begin() is always called after invoking
- * cpu_maps_update_begin(), we can be sure that only one writer is active.
- *
- * Note that theoretically, there is a possibility of a livelock:
- * - Refcount goes to zero, last reader wakes up the sleeping
- *   writer.
- * - Last reader unlocks the cpu_hotplug.lock.
- * - A new reader arrives at this moment, bumps up the refcount.
- * - The writer acquires the cpu_hotplug.lock finds the refcount
- *   non zero and goes to sleep again.
- *
- * However, this is very difficult to achieve in practice since
- * get_online_cpus() not an api which is called all that often.
- *
+ * This will notify new readers to block and wait for all active readers to
+ * complete.
  */
 void cpu_hotplug_begin(void)
 {
-	cpu_hotplug.active_writer = current;
+	/*
+	 * Since cpu_hotplug_begin() is always called after invoking
+	 * cpu_maps_update_begin(), we can be sure that only one writer is
+	 * active.
+	 */
+	lockdep_assert_held(&cpu_add_remove_lock);
+
+	/* Allow reader-in-writer recursion. */
+	current->cpuhp_ref++;
+
+	/* Notify readers to take the slow path. */
+	__cpuhp_state = readers_slow;
+
+	/* See percpu_down_write(); guarantees all readers take the slow path */
+	synchronize_sched();
+
+	/*
+	 * Notify new readers to block; up until now, and thus throughout the
+	 * longish synchronize_sched() above, new readers could still come in.
+	 */
+	__cpuhp_state = readers_block;
 
-	for (;;) {
-		mutex_lock(&cpu_hotplug.lock);
-		if (likely(!cpu_hotplug.refcount))
-			break;
-		__set_current_state(TASK_UNINTERRUPTIBLE);
-		mutex_unlock(&cpu_hotplug.lock);
-		schedule();
-	}
+	smp_mb(); /* D matches A */
+
+	/*
+	 * If they don't see our writer of readers_block to __cpuhp_state,
+	 * then we are guaranteed to see their __cpuhp_refcount increment, and
+	 * therefore will wait for them.
+	 */
+
+	/* Wait for all now active readers to complete. */
+	wait_event(cpuhp_writer, cpuhp_readers_active_check());
 }
 
 void cpu_hotplug_done(void)
 {
-	cpu_hotplug.active_writer = NULL;
-	mutex_unlock(&cpu_hotplug.lock);
+	/*
+	 * Signal the writer is done, no fast path yet.
+	 *
+	 * One reason that we cannot just immediately flip to readers_fast is
+	 * that new readers might fail to see the results of this writer's
+	 * critical section.
+	 */
+	__cpuhp_state = readers_slow;
+	wake_up_all(&cpuhp_readers);
+
+	/*
+	 * The wait_event()/wake_up_all() prevents the race where the readers
+	 * are delayed between fetching __cpuhp_state and blocking.
+	 */
+
+	/* See percpu_up_write(); readers will no longer attempt to block. */
+	synchronize_sched();
+
+	/* Let 'em rip */
+	__cpuhp_state = readers_fast;
+	current->cpuhp_ref--;
+
+	/*
+	 * Wait for any pending readers to be running. This ensures readers
+	 * after writer and avoids writers starving readers.
+	 */
+	wait_event(cpuhp_writer, !atomic_read(&cpuhp_waitcount));
 }
 
 /*
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1635,6 +1635,8 @@ static void __sched_fork(struct task_str
 	p->numa_scan_period = sysctl_numa_balancing_scan_delay;
 	p->numa_work.next = &p->numa_work;
 #endif /* CONFIG_NUMA_BALANCING */
+
+	cpu_hotplug_init_task(p);
 }
 
 #ifdef CONFIG_NUMA_BALANCING



^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 2/3] rcu: Create rcu_sync infrastructure
  2013-10-02 14:56 [PATCH 0/3] Optimize the cpu hotplug locking Peter Zijlstra
  2013-10-02 14:56 ` [PATCH 1/3] hotplug: Optimize {get,put}_online_cpus() Peter Zijlstra
@ 2013-10-02 14:56 ` Peter Zijlstra
  2013-10-02 15:49   ` Oleg Nesterov
  2013-10-03 16:41   ` Paul E. McKenney
  2013-10-02 14:56 ` [PATCH 3/3] hotplug: Optimize cpu_hotplug_{begin,done}() using rcu_sync Peter Zijlstra
  2 siblings, 2 replies; 46+ messages in thread
From: Peter Zijlstra @ 2013-10-02 14:56 UTC (permalink / raw)
  To: Oleg Nesterov, Paul McKenney
  Cc: Mel Gorman, Rik van Riel, Srikar Dronamraju, Ingo Molnar,
	Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel, Peter Zijlstra

[-- Attachment #1: oleg-sync_sched.patch --]
[-- Type: text/plain, Size: 5941 bytes --]

From: Oleg Nesterov <oleg@redhat.com>

It is functionally equivalent to

        struct xxx_struct {
                atomic_t counter;
        };

        static inline bool xxx_is_idle(struct xxx_struct *xxx)
        {
                return atomic_read(&xxx->counter) == 0;
        }

        static inline void xxx_enter(struct xxx_struct *xxx)
        {
                atomic_inc(&xxx->counter);
                synchronize_sched();
        }

        static inline void xxx_exit(struct xxx_struct *xxx)
        {
                synchronize_sched();
                atomic_dec(&xxx->counter);
        }

except: it records the state and synchronize_sched() is only called by
xxx_enter() and only if necessary.
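
As an illustration only -- the names below are made up, and patch 3/3 does
the real conversion for cpu hotplug -- a user combining rcu_sync with a
per-cpu refcount would look roughly like:

	static DEFINE_RCU_SCHED_SYNC(my_rss);
	static DEFINE_PER_CPU(unsigned int, my_refcount);

	void my_read_lock(void)			/* called with preemption disabled */
	{
		if (likely(rcu_sync_is_idle(&my_rss)))
			__this_cpu_inc(my_refcount);	/* no barriers, no atomics */
		else
			my_read_lock_slow();		/* writer active; slow path */
	}

	void my_write_lock(void)
	{
		/* Blocks the readers' fast path; calls ->sync() only if needed. */
		rcu_sync_enter(&my_rss);
		/* ... wait for my_refcount to drain, do the update ... */
	}

	void my_write_unlock(void)
	{
		/* The fast path is re-enabled a grace period later, via ->call(). */
		rcu_sync_exit(&my_rss);
	}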

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130929183634.GA15563@redhat.com
---
 include/linux/rcusync.h |   64 ++++++++++++++++++++++++++++
 kernel/Makefile         |    3 -
 kernel/rcusync.c        |  108 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 174 insertions(+), 1 deletion(-)

--- /dev/null
+++ b/include/linux/rcusync.h
@@ -0,0 +1,64 @@
+#ifndef _LINUX_RCUSYNC_H_
+#define _LINUX_RCUSYNC_H_
+
+#include <linux/wait.h>
+#include <linux/rcupdate.h>
+
+struct rcu_sync_struct {
+	int			gp_state;
+	int			gp_count;
+	wait_queue_head_t	gp_wait;
+
+	int			cb_state;
+	struct rcu_head		cb_head;
+
+	void (*sync)(void);
+	void (*call)(struct rcu_head *, void (*)(struct rcu_head *));
+};
+
+#define ___RCU_SYNC_INIT(name)						\
+	.gp_state = 0,							\
+	.gp_count = 0,							\
+	.gp_wait = __WAIT_QUEUE_HEAD_INITIALIZER(name.gp_wait),		\
+	.cb_state = 0
+
+#define __RCU_SCHED_SYNC_INIT(name) {					\
+	___RCU_SYNC_INIT(name),						\
+	.sync = synchronize_sched,					\
+	.call = call_rcu_sched,						\
+}
+
+#define __RCU_BH_SYNC_INIT(name) {					\
+	___RCU_SYNC_INIT(name),						\
+	.sync = synchronize_rcu_bh,					\
+	.call = call_rcu_bh,						\
+}
+
+#define __RCU_SYNC_INIT(name) {						\
+	___RCU_SYNC_INIT(name),						\
+	.sync = synchronize_rcu,					\
+	.call = call_rcu,						\
+}
+
+#define DEFINE_RCU_SCHED_SYNC(name)					\
+	struct rcu_sync_struct name = __RCU_SCHED_SYNC_INIT(name)
+
+#define DEFINE_RCU_BH_SYNC(name)					\
+	struct rcu_sync_struct name = __RCU_BH_SYNC_INIT(name)
+
+#define DEFINE_RCU_SYNC(name)						\
+	struct rcu_sync_struct name = __RCU_SYNC_INIT(name)
+
+static inline bool rcu_sync_is_idle(struct rcu_sync_struct *rss)
+{
+	return !rss->gp_state; /* GP_IDLE */
+}
+
+enum rcu_sync_type { RCU_SYNC, RCU_SCHED_SYNC, RCU_BH_SYNC };
+
+extern void rcu_sync_init(struct rcu_sync_struct *, enum rcu_sync_type);
+extern void rcu_sync_enter(struct rcu_sync_struct *);
+extern void rcu_sync_exit(struct rcu_sync_struct *);
+
+#endif /* _LINUX_RCUSYNC_H_ */
+
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -10,7 +10,8 @@ obj-y     = fork.o exec_domain.o panic.o
 	    kthread.o wait.o sys_ni.o posix-cpu-timers.o mutex.o \
 	    hrtimer.o rwsem.o nsproxy.o srcu.o semaphore.o \
 	    notifier.o ksysfs.o cred.o reboot.o \
-	    async.o range.o groups.o lglock.o smpboot.o
+	    async.o range.o groups.o lglock.o smpboot.o \
+	    rcusync.o
 
 ifdef CONFIG_FUNCTION_TRACER
 # Do not trace debug files and internal ftrace files
--- /dev/null
+++ b/kernel/rcusync.c
@@ -0,0 +1,108 @@
+
+#include <linux/rcusync.h>
+#include <linux/sched.h>
+
+enum { GP_IDLE = 0, GP_PENDING, GP_PASSED };
+enum { CB_IDLE = 0, CB_PENDING, CB_REPLAY };
+
+#define	rss_lock	gp_wait.lock
+
+void rcu_sync_init(struct rcu_sync_struct *rss, enum rcu_sync_type type)
+{
+	memset(rss, 0, sizeof(*rss));
+	init_waitqueue_head(&rss->gp_wait);
+
+	switch (type) {
+	case RCU_SYNC:
+		rss->sync = synchronize_rcu;
+		rss->call = call_rcu;
+		break;
+
+	case RCU_SCHED_SYNC:
+		rss->sync = synchronize_sched;
+		rss->call = call_rcu_sched;
+		break;
+
+	case RCU_BH_SYNC:
+		rss->sync = synchronize_rcu_bh;
+		rss->call = call_rcu_bh;
+		break;
+	}
+}
+
+void rcu_sync_enter(struct rcu_sync_struct *rss)
+{
+	bool need_wait, need_sync;
+
+	spin_lock_irq(&rss->rss_lock);
+	need_wait = rss->gp_count++;
+	need_sync = rss->gp_state == GP_IDLE;
+	if (need_sync)
+		rss->gp_state = GP_PENDING;
+	spin_unlock_irq(&rss->rss_lock);
+
+	BUG_ON(need_wait && need_sync);
+
+	if (need_sync) {
+		rss->sync();
+		rss->gp_state = GP_PASSED;
+		wake_up_all(&rss->gp_wait);
+	} else if (need_wait) {
+		wait_event(rss->gp_wait, rss->gp_state == GP_PASSED);
+	} else {
+		/*
+		 * Possible when there's a pending CB from a rcu_sync_exit().
+		 * Nobody has yet been allowed the 'fast' path and thus we can
+		 * avoid doing any sync(). The callback will get 'dropped'.
+		 */
+		BUG_ON(rss->gp_state != GP_PASSED);
+	}
+}
+
+static void rcu_sync_func(struct rcu_head *rcu)
+{
+	struct rcu_sync_struct *rss =
+		container_of(rcu, struct rcu_sync_struct, cb_head);
+	unsigned long flags;
+
+
+	BUG_ON(rss->gp_state != GP_PASSED);
+	BUG_ON(rss->cb_state == CB_IDLE);
+
+	spin_lock_irqsave(&rss->rss_lock, flags);
+	if (rss->gp_count) {
+		/*
+		 * A new rcu_sync_begin() has happened; drop the callback.
+		 */
+		rss->cb_state = CB_IDLE;
+	} else if (rss->cb_state == CB_REPLAY) {
+		/*
+		 * A new rcu_sync_exit() has happened; requeue the callback
+		 * to catch a later GP.
+		 */
+		rss->cb_state = CB_PENDING;
+		rss->call(&rss->cb_head, rcu_sync_func);
+	} else {
+		/*
+		 * We're at least a GP after rcu_sync_exit(); eveybody will now
+		 * have observed the write side critical section. Let 'em rip!.
+		 */
+		rss->cb_state = CB_IDLE;
+		rss->gp_state = GP_IDLE;
+	}
+	spin_unlock_irqrestore(&rss->rss_lock, flags);
+}
+
+void rcu_sync_exit(struct rcu_sync_struct *rss)
+{
+	spin_lock_irq(&rss->rss_lock);
+	if (!--rss->gp_count) {
+		if (rss->cb_state == CB_IDLE) {
+			rss->cb_state = CB_PENDING;
+			rss->call(&rss->cb_head, rcu_sync_func);
+		} else if (rss->cb_state == CB_PENDING) {
+			rss->cb_state = CB_REPLAY;
+		}
+	}
+	spin_unlock_irq(&rss->rss_lock);
+}



^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 3/3] hotplug: Optimize cpu_hotplug_{begin,done}() using rcu_sync
  2013-10-02 14:56 [PATCH 0/3] Optimize the cpu hotplug locking Peter Zijlstra
  2013-10-02 14:56 ` [PATCH 1/3] hotplug: Optimize {get,put}_online_cpus() Peter Zijlstra
  2013-10-02 14:56 ` [PATCH 2/3] rcu: Create rcu_sync infrastructure Peter Zijlstra
@ 2013-10-02 14:56 ` Peter Zijlstra
  2013-10-03 16:48   ` Paul E. McKenney
  2 siblings, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2013-10-02 14:56 UTC (permalink / raw)
  To: Oleg Nesterov, Paul McKenney
  Cc: Mel Gorman, Rik van Riel, Srikar Dronamraju, Ingo Molnar,
	Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel, Peter Zijlstra

[-- Attachment #1: peterz-cpu-hotplug-2.patch --]
[-- Type: text/plain, Size: 5593 bytes --]

Use the fancy new rcu_sync bits from Oleg to optimize the fancy new
hotplug lock implementation.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 include/linux/cpu.h |    7 ++++---
 kernel/cpu.c        |   52 +++++++++++++++++++++++-----------------------------
 2 files changed, 27 insertions(+), 32 deletions(-)

--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -18,6 +18,7 @@
 #include <linux/cpumask.h>
 #include <linux/percpu.h>
 #include <linux/sched.h>
+#include <linux/rcusync.h>
 
 struct device;
 
@@ -180,7 +181,7 @@ extern void cpu_hotplug_init_task(struct
 extern void cpu_hotplug_begin(void);
 extern void cpu_hotplug_done(void);
 
-extern int __cpuhp_state;
+extern struct rcu_sync_struct __cpuhp_rss;
 DECLARE_PER_CPU(unsigned int, __cpuhp_refcount);
 
 extern void __get_online_cpus(void);
@@ -204,7 +205,7 @@ static inline void get_online_cpus(void)
 	 * writer will see anything we did within this RCU-sched read-side
 	 * critical section.
 	 */
-	if (likely(!__cpuhp_state))
+	if (likely(rcu_sync_is_idle(&__cpuhp_rss)))
 		__this_cpu_inc(__cpuhp_refcount);
 	else
 		__get_online_cpus(); /* Unconditional memory barrier. */
@@ -231,7 +232,7 @@ static inline void put_online_cpus(void)
 	/*
 	 * Same as in get_online_cpus().
 	 */
-	if (likely(!__cpuhp_state))
+	if (likely(rcu_sync_is_idle(&__cpuhp_rss)))
 		__this_cpu_dec(__cpuhp_refcount);
 	else
 		__put_online_cpus(); /* Unconditional memory barrier. */
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -49,14 +49,15 @@ static int cpu_hotplug_disabled;
 
 #ifdef CONFIG_HOTPLUG_CPU
 
-enum { readers_fast = 0, readers_slow, readers_block };
+enum { readers_slow, readers_block };
 
-int __cpuhp_state;
-EXPORT_SYMBOL_GPL(__cpuhp_state);
+DEFINE_RCU_SCHED_SYNC(__cpuhp_rss);
+EXPORT_SYMBOL_GPL(__cpuhp_rss);
 
 DEFINE_PER_CPU(unsigned int, __cpuhp_refcount);
 EXPORT_PER_CPU_SYMBOL_GPL(__cpuhp_refcount);
 
+static int cpuhp_state = readers_slow;
 static atomic_t cpuhp_waitcount;
 static DECLARE_WAIT_QUEUE_HEAD(cpuhp_readers);
 static DECLARE_WAIT_QUEUE_HEAD(cpuhp_writer);
@@ -68,7 +69,6 @@ void cpu_hotplug_init_task(struct task_s
 
 void __get_online_cpus(void)
 {
-again:
 	__this_cpu_inc(__cpuhp_refcount);
 
 	/*
@@ -77,7 +77,7 @@ void __get_online_cpus(void)
 	 * increment-on-one-CPU-and-decrement-on-another problem.
 	 *
 	 * And yes, if the reader misses the writer's assignment of
-	 * readers_block to __cpuhp_state, then the writer is
+	 * readers_block to cpuhp_state, then the writer is
 	 * guaranteed to see the reader's increment.  Conversely, any
 	 * readers that increment their __cpuhp_refcount after the
 	 * writer looks are guaranteed to see the readers_block value,
@@ -88,7 +88,7 @@ void __get_online_cpus(void)
 
 	smp_mb(); /* A matches D */
 
-	if (likely(__cpuhp_state != readers_block))
+	if (likely(cpuhp_state != readers_block))
 		return;
 
 	/*
@@ -108,19 +108,19 @@ void __get_online_cpus(void)
 	 * and reschedule on the preempt_enable() in get_online_cpus().
 	 */
 	preempt_enable_no_resched();
-	__wait_event(cpuhp_readers, __cpuhp_state != readers_block);
+	__wait_event(cpuhp_readers, cpuhp_state != readers_block);
 	preempt_disable();
 
+	__this_cpu_inc(__cpuhp_refcount);
+
 	/*
-	 * Given we've still got preempt_disabled and new cpu_hotplug_begin()
-	 * must do a synchronize_sched() we're guaranteed a successfull
-	 * acquisition this time -- even if we wake the current
-	 * cpu_hotplug_end() now.
+	 * cpu_hotplug_done() waits until all pending readers are gone;
+	 * this means that a new cpu_hotplug_begin() must observe our
+	 * refcount increment and wait for it to go away.
 	 */
-	if (atomic_dec_and_test(&cpuhp_waitcount))
-		wake_up(&cpuhp_writer);
 
-	goto again;
+	if (atomic_dec_and_test(&cpuhp_waitcount)) /* A */
+		wake_up(&cpuhp_writer);
 }
 EXPORT_SYMBOL_GPL(__get_online_cpus);
 
@@ -183,21 +183,18 @@ void cpu_hotplug_begin(void)
 	current->cpuhp_ref++;
 
 	/* Notify readers to take the slow path. */
-	__cpuhp_state = readers_slow;
-
-	/* See percpu_down_write(); guarantees all readers take the slow path */
-	synchronize_sched();
+	rcu_sync_enter(&__cpuhp_rss);
 
 	/*
 	 * Notify new readers to block; up until now, and thus throughout the
 	 * longish synchronize_sched() above, new readers could still come in.
 	 */
-	__cpuhp_state = readers_block;
+	cpuhp_state = readers_block;
 
 	smp_mb(); /* D matches A */
 
 	/*
-	 * If they don't see our writer of readers_block to __cpuhp_state,
+	 * If they don't see our writer of readers_block to cpuhp_state,
 	 * then we are guaranteed to see their __cpuhp_refcount increment, and
 	 * therefore will wait for them.
 	 */
@@ -215,26 +212,23 @@ void cpu_hotplug_done(void)
 	 * that new readers might fail to see the results of this writer's
 	 * critical section.
 	 */
-	__cpuhp_state = readers_slow;
+	cpuhp_state = readers_slow;
 	wake_up_all(&cpuhp_readers);
 
 	/*
 	 * The wait_event()/wake_up_all() prevents the race where the readers
-	 * are delayed between fetching __cpuhp_state and blocking.
+	 * are delayed between fetching cpuhp_state and blocking.
 	 */
 
-	/* See percpu_up_write(); readers will no longer attempt to block. */
-	synchronize_sched();
-
-	/* Let 'em rip */
-	__cpuhp_state = readers_fast;
 	current->cpuhp_ref--;
 
 	/*
-	 * Wait for any pending readers to be running. This ensures readers
-	 * after writer and avoids writers starving readers.
+	 * Wait for any pending readers to be running. This avoids writers
+	 * starving readers.
 	 */
 	wait_event(cpuhp_writer, !atomic_read(&cpuhp_waitcount));
+
+	rcu_sync_exit(&__cpuhp_rss);
 }
 
 /*



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rcu: Create rcu_sync infrastructure
  2013-10-02 14:56 ` [PATCH 2/3] rcu: Create rcu_sync infrastructure Peter Zijlstra
@ 2013-10-02 15:49   ` Oleg Nesterov
  2013-10-03 16:42     ` Paul E. McKenney
  2013-10-08  8:18     ` Peter Zijlstra
  2013-10-03 16:41   ` Paul E. McKenney
  1 sibling, 2 replies; 46+ messages in thread
From: Oleg Nesterov @ 2013-10-02 15:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul McKenney, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On 10/02, Peter Zijlstra wrote:
>
> From: Oleg Nesterov <oleg@redhat.com>

Thanks! I was writing the patch, and I chose almost the same naming ;)

> Signed-off-by: Peter Zijlstra <peterz@infradead.org>

Signed-off-by: Oleg Nesterov <oleg@redhat.com>

In fact I'd like to add my sob to 1/3 and 3/3 as well.


Paul, to remind, this is only the first step. I am going to send
the following improvements:

	1. Add rcu_sync->exclusive. The change is simple; we just
	   need s/wait_queue_head_t/completion/ in rcu_sync_struct
	   and a couple of "if (rss->exclusive)" checks in enter/exit.

	2. rcu_sync_enter() should return !!need_sync. This can help
	   in exclusive mode.

	3. rcu_sync_struct needs more function pointers (perhaps we
	   should add a single rcu_sync_struct->ops pointer but this
	   is minor). See below.

But let me repeat just in case, we should do this later.
And once this series is applied, I'll change percpu_rw_semaphore.


> +struct rcu_sync_struct {
> +	int			gp_state;
> +	int			gp_count;
> +	wait_queue_head_t	gp_wait;
> +
> +	int			cb_state;
> +	struct rcu_head		cb_head;
> +
> +	void (*sync)(void);
> +	void (*call)(struct rcu_head *, void (*)(struct rcu_head *));

Yes, and we also need rcu_sync_struct->barrier(). From the patch I was
working on:

	void rcu_sync_wait_for_callback(struct rcu_sync *sync)
	{
		int cb_state;

		BUG_ON(sync->gp_count);

		spin_lock_irq(&sync->state_lock);
		if (sync->cb_state == CB_REPLAY)
			sync->cb_state = CB_PENDING;
		cb_state = sync->cb_state;
		spin_unlock_irq(&sync->state_lock);

		if (cb_state != CB_IDLE) {
			rcu_barrier_sched();
			BUG_ON(sync->cb_state != CB_IDLE);
		}
	}

It should be called if you are going to kfree the object.

Perhaps another rcu_sync_struct->state_change(new_state) callback (set
by the user) makes sense too; this can help, for example, to implement
an array of semaphores with a single rcu_sync_struct (freeze_super).
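
A minimal sketch of what such a hook could look like -- purely illustrative;
the ops pointer and the state_change() member below don't exist yet and are
just assumptions about a possible shape:

	struct rcu_sync_ops {
		void (*sync)(void);
		void (*call)(struct rcu_head *, void (*)(struct rcu_head *));
		void (*state_change)(struct rcu_sync_struct *rss, int new_state);
	};

	/* every internal state transition would then go through a helper */
	static void rcu_sync_set_state(struct rcu_sync_struct *rss, int new_state)
	{
		rss->gp_state = new_state;
		if (rss->ops->state_change)
			rss->ops->state_change(rss, new_state);
	}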

Thanks.

Oleg.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 1/3] hotplug: Optimize {get,put}_online_cpus()
  2013-10-02 14:56 ` [PATCH 1/3] hotplug: Optimize {get,put}_online_cpus() Peter Zijlstra
@ 2013-10-03 14:01   ` Peter Zijlstra
  2013-10-03 16:27     ` Paul E. McKenney
  2013-10-03 16:26   ` Paul E. McKenney
  1 sibling, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2013-10-03 14:01 UTC (permalink / raw)
  To: Oleg Nesterov, Paul McKenney
  Cc: Mel Gorman, Rik van Riel, Srikar Dronamraju, Ingo Molnar,
	Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On Wed, Oct 02, 2013 at 04:56:56PM +0200, Peter Zijlstra wrote:
> +	if (atomic_dec_and_test(&cpuhp_waitcount))
> +		wake_up(&cpuhp_writer);
> +
> +	goto again:
>  }

> +
> +#define per_cpu_sum(var)						\
> +({ 									\
> + 	typeof(var) __sum = 0;						\
> + 	int cpu;							\
> + 	for_each_possible_cpu(cpu)					\
> + 		__sum += per_cpu(var, cpu);				\
> + 	__sum;								\
> +)}
>  

I just noticed I forgot a refresh..


+++ b/kernel/cpu.c
@@ -120,7 +120,7 @@ void __get_online_cpus(void)
        if (atomic_dec_and_test(&cpuhp_waitcount))
                wake_up(&cpuhp_writer);
 
-       goto again:
+       goto again;
 }
 EXPORT_SYMBOL_GPL(__get_online_cpus);
 
@@ -146,7 +146,7 @@ EXPORT_SYMBOL_GPL(__put_online_cpus);
        for_each_possible_cpu(cpu)                                      \
                __sum += per_cpu(var, cpu);                             \
        __sum;                                                          \
-)}
+})
 
 /*
  * See srcu_readers_active_idx_check() for a rather more detailed explanation.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 1/3] hotplug: Optimize {get,put}_online_cpus()
  2013-10-02 14:56 ` [PATCH 1/3] hotplug: Optimize {get,put}_online_cpus() Peter Zijlstra
  2013-10-03 14:01   ` Peter Zijlstra
@ 2013-10-03 16:26   ` Paul E. McKenney
  1 sibling, 0 replies; 46+ messages in thread
From: Paul E. McKenney @ 2013-10-03 16:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Oleg Nesterov, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On Wed, Oct 02, 2013 at 04:56:56PM +0200, Peter Zijlstra wrote:
> The current implementation of get_online_cpus() is global of nature
> and thus not suited for any kind of common usage.
> 
> Re-implement the current recursive r/w cpu hotplug lock such that the
> read side locks are as light as possible.
> 
> The current cpu hotplug lock is entirely reader biased; but since
> readers are expensive there aren't a lot of them about and writer
> starvation isn't a particular problem.
> 
> However by making the reader side more usable there is a fair chance
> it will get used more and thus the starvation issue becomes a real
> possibility.
> 
> Therefore this new implementation is fair, alternating readers and
> writers; this however requires per-task state to allow the reader
> recursion -- this new task_struct member is placed in a 4 byte hole on
> 64bit builds.
> 
> Many comments are contributed by Paul McKenney, and many previous
> attempts were shown to be inadequate by both Paul and Oleg; many
> thanks to them for persisting to poke holes in my attempts.
> 
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>

One change to a now-obsolete comment called out below, with that
change:

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> ---
>  include/linux/cpu.h   |   67 ++++++++++++++
>  include/linux/sched.h |    3 
>  kernel/cpu.c          |  226 ++++++++++++++++++++++++++++++++++++--------------
>  kernel/sched/core.c   |    2 
>  4 files changed, 235 insertions(+), 63 deletions(-)
> 
> --- a/include/linux/cpu.h
> +++ b/include/linux/cpu.h
> @@ -16,6 +16,8 @@
>  #include <linux/node.h>
>  #include <linux/compiler.h>
>  #include <linux/cpumask.h>
> +#include <linux/percpu.h>
> +#include <linux/sched.h>
> 
>  struct device;
> 
> @@ -173,10 +175,69 @@ extern struct bus_type cpu_subsys;
>  #ifdef CONFIG_HOTPLUG_CPU
>  /* Stop CPUs going up and down. */
> 
> +extern void cpu_hotplug_init_task(struct task_struct *p);
> +
>  extern void cpu_hotplug_begin(void);
>  extern void cpu_hotplug_done(void);
> -extern void get_online_cpus(void);
> -extern void put_online_cpus(void);
> +
> +extern int __cpuhp_state;
> +DECLARE_PER_CPU(unsigned int, __cpuhp_refcount);
> +
> +extern void __get_online_cpus(void);
> +
> +static inline void get_online_cpus(void)
> +{
> +	might_sleep();
> +
> +	/* Support reader recursion */
> +	/* The value was >= 1 and remains so, reordering causes no harm. */
> +	if (current->cpuhp_ref++)
> +		return;
> +
> +	preempt_disable();
> +	/*
> +	 * We are in an RCU-sched read-side critical section, so the writer
> +	 * cannot both change __cpuhp_state from readers_fast and start
> +	 * checking counters while we are here. So if we see !__cpuhp_state,
> +	 * we know that the writer won't be checking until we past the
> +	 * preempt_enable() and that once the synchronize_sched() is done, the
> +	 * writer will see anything we did within this RCU-sched read-side
> +	 * critical section.
> +	 */
> +	if (likely(!__cpuhp_state))
> +		__this_cpu_inc(__cpuhp_refcount);
> +	else
> +		__get_online_cpus(); /* Unconditional memory barrier. */
> +	preempt_enable();
> +	/*
> +	 * The barrier() from preempt_enable() prevents the compiler from
> +	 * bleeding the critical section out.
> +	 */
> +}
> +
> +extern void __put_online_cpus(void);
> +
> +static inline void put_online_cpus(void)
> +{
> +	/* The value was >= 1 and remains so, reordering causes no harm. */
> +	if (--current->cpuhp_ref)
> +		return;
> +
> +	/*
> +	 * The barrier() in preempt_disable() prevents the compiler from
> +	 * bleeding the critical section out.
> +	 */
> +	preempt_disable();
> +	/*
> +	 * Same as in get_online_cpus().
> +	 */
> +	if (likely(!__cpuhp_state))
> +		__this_cpu_dec(__cpuhp_refcount);
> +	else
> +		__put_online_cpus(); /* Unconditional memory barrier. */
> +	preempt_enable();
> +}
> +
>  extern void cpu_hotplug_disable(void);
>  extern void cpu_hotplug_enable(void);
>  #define hotcpu_notifier(fn, pri)	cpu_notifier(fn, pri)
> @@ -200,6 +261,8 @@ static inline void cpu_hotplug_driver_un
> 
>  #else		/* CONFIG_HOTPLUG_CPU */
> 
> +static inline void cpu_hotplug_init_task(struct task_struct *p) {}
> +
>  static inline void cpu_hotplug_begin(void) {}
>  static inline void cpu_hotplug_done(void) {}
>  #define get_online_cpus()	do { } while (0)
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1039,6 +1039,9 @@ struct task_struct {
>  #ifdef CONFIG_SMP
>  	struct llist_node wake_entry;
>  	int on_cpu;
> +#ifdef CONFIG_HOTPLUG_CPU
> +	int cpuhp_ref;
> +#endif
>  	struct task_struct *last_wakee;
>  	unsigned long wakee_flips;
>  	unsigned long wakee_flip_decay_ts;
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -49,88 +49,192 @@ static int cpu_hotplug_disabled;
> 
>  #ifdef CONFIG_HOTPLUG_CPU
> 
> -static struct {
> -	struct task_struct *active_writer;
> -	struct mutex lock; /* Synchronizes accesses to refcount, */
> -	/*
> -	 * Also blocks the new readers during
> -	 * an ongoing cpu hotplug operation.
> -	 */
> -	int refcount;
> -} cpu_hotplug = {
> -	.active_writer = NULL,
> -	.lock = __MUTEX_INITIALIZER(cpu_hotplug.lock),
> -	.refcount = 0,
> -};
> +enum { readers_fast = 0, readers_slow, readers_block };
> 
> -void get_online_cpus(void)
> +int __cpuhp_state;
> +EXPORT_SYMBOL_GPL(__cpuhp_state);
> +
> +DEFINE_PER_CPU(unsigned int, __cpuhp_refcount);
> +EXPORT_PER_CPU_SYMBOL_GPL(__cpuhp_refcount);
> +
> +static atomic_t cpuhp_waitcount;
> +static DECLARE_WAIT_QUEUE_HEAD(cpuhp_readers);
> +static DECLARE_WAIT_QUEUE_HEAD(cpuhp_writer);
> +
> +void cpu_hotplug_init_task(struct task_struct *p)
>  {
> -	might_sleep();
> -	if (cpu_hotplug.active_writer == current)
> +	p->cpuhp_ref = 0;
> +}
> +
> +void __get_online_cpus(void)
> +{
> +again:
> +	__this_cpu_inc(__cpuhp_refcount);
> +
> +	/*
> +	 * Due to having preemption disabled the decrement happens on
> +	 * the same CPU as the increment, avoiding the
> +	 * increment-on-one-CPU-and-decrement-on-another problem.
> +	 *
> +	 * And yes, if the reader misses the writer's assignment of
> +	 * readers_block to __cpuhp_state, then the writer is
> +	 * guaranteed to see the reader's increment.  Conversely, any
> +	 * readers that increment their __cpuhp_refcount after the
> +	 * writer looks are guaranteed to see the readers_block value,
> +	 * which in turn means that they are guaranteed to immediately
> +	 * decrement their __cpuhp_refcount, so that it doesn't matter
> +	 * that the writer missed them.
> +	 */
> +
> +	smp_mb(); /* A matches D */
> +
> +	if (likely(__cpuhp_state != readers_block))
>  		return;
> -	mutex_lock(&cpu_hotplug.lock);
> -	cpu_hotplug.refcount++;
> -	mutex_unlock(&cpu_hotplug.lock);
> 
> +	/*
> +	 * Make sure an outgoing writer sees the waitcount to ensure we
> +	 * make progress.
> +	 */
> +	atomic_inc(&cpuhp_waitcount);
> +
> +	/*
> +	 * Per the above comment; we still have preemption disabled and
> +	 * will thus decrement on the same CPU as we incremented.
> +	 */
> +	__put_online_cpus();
> +
> +	/*
> +	 * We either call schedule() in the wait, or we'll fall through
> +	 * and reschedule on the preempt_enable() in get_online_cpus().
> +	 */
> +	preempt_enable_no_resched();
> +	__wait_event(cpuhp_readers, __cpuhp_state != readers_block);
> +	preempt_disable();
> +
> +	/*
> +	 * Given we've still got preempt_disabled and new cpu_hotplug_begin()
> +	 * must do a synchronize_sched() we're guaranteed a successfull
> +	 * acquisition this time -- even if we wake the current
> +	 * cpu_hotplug_end() now.
> +	 */
> +	if (atomic_dec_and_test(&cpuhp_waitcount))
> +		wake_up(&cpuhp_writer);
> +
> +	goto again:
>  }
> -EXPORT_SYMBOL_GPL(get_online_cpus);
> +EXPORT_SYMBOL_GPL(__get_online_cpus);
> 
> -void put_online_cpus(void)
> +void __put_online_cpus(void)
>  {
> -	if (cpu_hotplug.active_writer == current)
> -		return;
> -	mutex_lock(&cpu_hotplug.lock);
> +	smp_mb(); /* B matches C */
> +	/*
> +	 * In other words, if they see our decrement (presumably to aggregate
> +	 * zero, as that is the only time it matters) they will also see our
> +	 * critical section.
> +	 */
> +	this_cpu_dec(__cpuhp_refcount);
> +
> +	/* Prod writer to recheck readers_active */
> +	wake_up(&cpuhp_writer);
> +}
> +EXPORT_SYMBOL_GPL(__put_online_cpus);
> +
> +#define per_cpu_sum(var)						\
> +({ 									\
> + 	typeof(var) __sum = 0;						\
> + 	int cpu;							\
> + 	for_each_possible_cpu(cpu)					\
> + 		__sum += per_cpu(var, cpu);				\
> + 	__sum;								\
> +)}
> 
> -	if (WARN_ON(!cpu_hotplug.refcount))
> -		cpu_hotplug.refcount++; /* try to fix things up */
> +/*
> + * See srcu_readers_active_idx_check() for a rather more detailed explanation.
> + */

The above comment is now obsolete, suggest something like:

/*
 * Return true if the modular sum of the __cpuhp_refcount per-CPU
 * variables is zero.  If this sum is zero, then it is stable
 * due to the fact that if any newly arriving readers increment
 * a given counter, they will immediately decrement that same
 * counter.
 */

> +static bool cpuhp_readers_active_check(void)
> +{
> +	if (per_cpu_sum(__cpuhp_refcount) != 0)
> +		return false;
> 
> -	if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
> -		wake_up_process(cpu_hotplug.active_writer);
> -	mutex_unlock(&cpu_hotplug.lock);
> +	/*
> +	 * If we observed the decrement; ensure we see the entire critical
> +	 * section.
> +	 */
> +
> +	smp_mb(); /* C matches B */
> 
> +	return true;
>  }
> -EXPORT_SYMBOL_GPL(put_online_cpus);
> 
>  /*
> - * This ensures that the hotplug operation can begin only when the
> - * refcount goes to zero.
> - *
> - * Note that during a cpu-hotplug operation, the new readers, if any,
> - * will be blocked by the cpu_hotplug.lock
> - *
> - * Since cpu_hotplug_begin() is always called after invoking
> - * cpu_maps_update_begin(), we can be sure that only one writer is active.
> - *
> - * Note that theoretically, there is a possibility of a livelock:
> - * - Refcount goes to zero, last reader wakes up the sleeping
> - *   writer.
> - * - Last reader unlocks the cpu_hotplug.lock.
> - * - A new reader arrives at this moment, bumps up the refcount.
> - * - The writer acquires the cpu_hotplug.lock finds the refcount
> - *   non zero and goes to sleep again.
> - *
> - * However, this is very difficult to achieve in practice since
> - * get_online_cpus() not an api which is called all that often.
> - *
> + * This will notify new readers to block and wait for all active readers to
> + * complete.
>   */
>  void cpu_hotplug_begin(void)
>  {
> -	cpu_hotplug.active_writer = current;
> +	/*
> +	 * Since cpu_hotplug_begin() is always called after invoking
> +	 * cpu_maps_update_begin(), we can be sure that only one writer is
> +	 * active.
> +	 */
> +	lockdep_assert_held(&cpu_add_remove_lock);
> +
> +	/* Allow reader-in-writer recursion. */
> +	current->cpuhp_ref++;
> +
> +	/* Notify readers to take the slow path. */
> +	__cpuhp_state = readers_slow;
> +
> +	/* See percpu_down_write(); guarantees all readers take the slow path */
> +	synchronize_sched();
> +
> +	/*
> +	 * Notify new readers to block; up until now, and thus throughout the
> +	 * longish synchronize_sched() above, new readers could still come in.
> +	 */
> +	__cpuhp_state = readers_block;
> 
> -	for (;;) {
> -		mutex_lock(&cpu_hotplug.lock);
> -		if (likely(!cpu_hotplug.refcount))
> -			break;
> -		__set_current_state(TASK_UNINTERRUPTIBLE);
> -		mutex_unlock(&cpu_hotplug.lock);
> -		schedule();
> -	}
> +	smp_mb(); /* D matches A */
> +
> +	/*
> +	 * If they don't see our writer of readers_block to __cpuhp_state,
> +	 * then we are guaranteed to see their __cpuhp_refcount increment, and
> +	 * therefore will wait for them.
> +	 */
> +
> +	/* Wait for all now active readers to complete. */
> +	wait_event(cpuhp_writer, cpuhp_readers_active_check());
>  }
> 
>  void cpu_hotplug_done(void)
>  {
> -	cpu_hotplug.active_writer = NULL;
> -	mutex_unlock(&cpu_hotplug.lock);
> +	/*
> +	 * Signal the writer is done, no fast path yet.
> +	 *
> +	 * One reason that we cannot just immediately flip to readers_fast is
> +	 * that new readers might fail to see the results of this writer's
> +	 * critical section.
> +	 */
> +	__cpuhp_state = readers_slow;
> +	wake_up_all(&cpuhp_readers);
> +
> +	/*
> +	 * The wait_event()/wake_up_all() prevents the race where the readers
> +	 * are delayed between fetching __cpuhp_state and blocking.
> +	 */
> +
> +	/* See percpu_up_write(); readers will no longer attempt to block. */
> +	synchronize_sched();
> +
> +	/* Let 'em rip */
> +	__cpuhp_state = readers_fast;
> +	current->cpuhp_ref--;
> +
> +	/*
> +	 * Wait for any pending readers to be running. This ensures readers
> +	 * after writer and avoids writers starving readers.
> +	 */
> +	wait_event(cpuhp_writer, !atomic_read(&cpuhp_waitcount));
>  }
> 
>  /*
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1635,6 +1635,8 @@ static void __sched_fork(struct task_str
>  	p->numa_scan_period = sysctl_numa_balancing_scan_delay;
>  	p->numa_work.next = &p->numa_work;
>  #endif /* CONFIG_NUMA_BALANCING */
> +
> +	cpu_hotplug_init_task(p);
>  }
> 
>  #ifdef CONFIG_NUMA_BALANCING
> 
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 1/3] hotplug: Optimize {get,put}_online_cpus()
  2013-10-03 14:01   ` Peter Zijlstra
@ 2013-10-03 16:27     ` Paul E. McKenney
  0 siblings, 0 replies; 46+ messages in thread
From: Paul E. McKenney @ 2013-10-03 16:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Oleg Nesterov, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On Thu, Oct 03, 2013 at 04:01:50PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 02, 2013 at 04:56:56PM +0200, Peter Zijlstra wrote:
> > +	if (atomic_dec_and_test(&cpuhp_waitcount))
> > +		wake_up(&cpuhp_writer);
> > +
> > +	goto again:
> >  }
> 
> > +
> > +#define per_cpu_sum(var)						\
> > +({ 									\
> > + 	typeof(var) __sum = 0;						\
> > + 	int cpu;							\
> > + 	for_each_possible_cpu(cpu)					\
> > + 		__sum += per_cpu(var, cpu);				\
> > + 	__sum;								\
> > +)}
> >  
> 
> I just noticed I forgot a refresh..

And including these changes as well.  ;-)

							Thanx, Paul

> +++ b/kernel/cpu.c
> @@ -120,7 +120,7 @@ void __get_online_cpus(void)
>         if (atomic_dec_and_test(&cpuhp_waitcount))
>                 wake_up(&cpuhp_writer);
> 
> -       goto again:
> +       goto again;
>  }
>  EXPORT_SYMBOL_GPL(__get_online_cpus);
> 
> @@ -146,7 +146,7 @@ EXPORT_SYMBOL_GPL(__put_online_cpus);
>         for_each_possible_cpu(cpu)                                      \
>                 __sum += per_cpu(var, cpu);                             \
>         __sum;                                                          \
> -)}
> +})
> 
>  /*
>   * See srcu_readers_active_idx_check() for a rather more detailed explanation.
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rcu: Create rcu_sync infrastructure
  2013-10-02 14:56 ` [PATCH 2/3] rcu: Create rcu_sync infrastructure Peter Zijlstra
  2013-10-02 15:49   ` Oleg Nesterov
@ 2013-10-03 16:41   ` Paul E. McKenney
  2013-10-03 17:00     ` Oleg Nesterov
  2013-10-03 18:40     ` Peter Zijlstra
  1 sibling, 2 replies; 46+ messages in thread
From: Paul E. McKenney @ 2013-10-03 16:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Oleg Nesterov, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On Wed, Oct 02, 2013 at 04:56:57PM +0200, Peter Zijlstra wrote:
> From: Oleg Nesterov <oleg@redhat.com>
> 
> It is functionally equivalent to
> 
>         struct xxx_struct {
>                 atomic_t counter;
>         };
> 
>         static inline bool xxx_is_idle(struct xxx_struct *xxx)
>         {
>                 return atomic_read(&xxx->counter) == 0;
>         }
> 
>         static inline void xxx_enter(struct xxx_struct *xxx)
>         {
>                 atomic_inc(&xxx->counter);
>                 synchronize_sched();
>         }
> 
>         static inline void xxx_enter(struct xxx_struct *xxx)
>         {
>                 synchronize_sched();
>                 atomic_dec(&xxx->counter);
>         }
> 
> except: it records the state and synchronize_sched() is only called by
> xxx_enter() and only if necessary.

Could you please make the "xxx_" above match the prefix below?

One comment below for the code's self-defense.

							Thanx, Paul

> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> Link: http://lkml.kernel.org/r/20130929183634.GA15563@redhat.com
> ---
>  include/linux/rcusync.h |   64 ++++++++++++++++++++++++++++
>  kernel/Makefile         |    3 -
>  kernel/rcusync.c        |  108 ++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 174 insertions(+), 1 deletion(-)
> 
> --- /dev/null
> +++ b/include/linux/rcusync.h
> @@ -0,0 +1,64 @@
> +#ifndef _LINUX_RCUSYNC_H_
> +#define _LINUX_RCUSYNC_H_
> +
> +#include <linux/wait.h>
> +#include <linux/rcupdate.h>
> +
> +struct rcu_sync_struct {
> +	int			gp_state;
> +	int			gp_count;
> +	wait_queue_head_t	gp_wait;
> +
> +	int			cb_state;
> +	struct rcu_head		cb_head;
> +
> +	void (*sync)(void);
> +	void (*call)(struct rcu_head *, void (*)(struct rcu_head *));
> +};
> +
> +#define ___RCU_SYNC_INIT(name)						\
> +	.gp_state = 0,							\
> +	.gp_count = 0,							\
> +	.gp_wait = __WAIT_QUEUE_HEAD_INITIALIZER(name.gp_wait),		\
> +	.cb_state = 0
> +
> +#define __RCU_SCHED_SYNC_INIT(name) {					\
> +	___RCU_SYNC_INIT(name),						\
> +	.sync = synchronize_sched,					\
> +	.call = call_rcu_sched,						\
> +}
> +
> +#define __RCU_BH_SYNC_INIT(name) {					\
> +	___RCU_SYNC_INIT(name),						\
> +	.sync = synchronize_rcu_bh,					\
> +	.call = call_rcu_bh,						\
> +}
> +
> +#define __RCU_SYNC_INIT(name) {						\
> +	___RCU_SYNC_INIT(name),						\
> +	.sync = synchronize_rcu,					\
> +	.call = call_rcu,						\
> +}
> +
> +#define DEFINE_RCU_SCHED_SYNC(name)					\
> +	struct rcu_sync_struct name = __RCU_SCHED_SYNC_INIT(name)
> +
> +#define DEFINE_RCU_BH_SYNC(name)					\
> +	struct rcu_sync_struct name = __RCU_BH_SYNC_INIT(name)
> +
> +#define DEFINE_RCU_SYNC(name)						\
> +	struct rcu_sync_struct name = __RCU_SYNC_INIT(name)
> +
> +static inline bool rcu_sync_is_idle(struct rcu_sync_struct *rss)
> +{

It would be nice to validate that we are in the corresponding type of
RCU read-side critical section, but I would hate to pay a function call
through a pointer for this if !PROVE_RCU.  But even more, I would hate
rare bugs due to someone forgetting an rcu_read_lock().

How about something like the following, where ->read_side_check()
gets rcu_read_lock_held(), rcu_read_lock_bh_held(), or
rcu_read_lock_sched_held(), as appropriate?

#ifdef CONFIG_PROVE_RCU
#define rcu_sync_is_idle_check(rss) BUG_ON(!rss->read_side_check())
#else
#define rcu_sync_is_idle_check(rss) do { } while (0)
#endif

	rcu_sync_is_idle_check(rss);

> +	return !rss->gp_state; /* GP_IDLE */
> +}
> +
> +enum rcu_sync_type { RCU_SYNC, RCU_SCHED_SYNC, RCU_BH_SYNC };
> +
> +extern void rcu_sync_init(struct rcu_sync_struct *, enum rcu_sync_type);
> +extern void rcu_sync_enter(struct rcu_sync_struct *);
> +extern void rcu_sync_exit(struct rcu_sync_struct *);
> +
> +#endif /* _LINUX_RCUSYNC_H_ */
> +
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -10,7 +10,8 @@ obj-y     = fork.o exec_domain.o panic.o
>  	    kthread.o wait.o sys_ni.o posix-cpu-timers.o mutex.o \
>  	    hrtimer.o rwsem.o nsproxy.o srcu.o semaphore.o \
>  	    notifier.o ksysfs.o cred.o reboot.o \
> -	    async.o range.o groups.o lglock.o smpboot.o
> +	    async.o range.o groups.o lglock.o smpboot.o \
> +	    rcusync.o
> 
>  ifdef CONFIG_FUNCTION_TRACER
>  # Do not trace debug files and internal ftrace files
> --- /dev/null
> +++ b/kernel/rcusync.c
> @@ -0,0 +1,108 @@
> +
> +#include <linux/rcusync.h>
> +#include <linux/sched.h>
> +
> +enum { GP_IDLE = 0, GP_PENDING, GP_PASSED };
> +enum { CB_IDLE = 0, CB_PENDING, CB_REPLAY };
> +
> +#define	rss_lock	gp_wait.lock
> +
> +void rcu_sync_init(struct rcu_sync_struct *rss, enum rcu_sync_type type)
> +{
> +	memset(rss, 0, sizeof(*rss));
> +	init_waitqueue_head(&rss->gp_wait);
> +
> +	switch (type) {
> +	case RCU_SYNC:
> +		rss->sync = synchronize_rcu;
> +		rss->call = call_rcu;
> +		break;
> +
> +	case RCU_SCHED_SYNC:
> +		rss->sync = synchronize_sched;
> +		rss->call = call_rcu_sched;
> +		break;
> +
> +	case RCU_BH_SYNC:
> +		rss->sync = synchronize_rcu_bh;
> +		rss->call = call_rcu_bh;
> +		break;
> +	}
> +}
> +
> +void rcu_sync_enter(struct rcu_sync_struct *rss)
> +{
> +	bool need_wait, need_sync;
> +
> +	spin_lock_irq(&rss->rss_lock);
> +	need_wait = rss->gp_count++;
> +	need_sync = rss->gp_state == GP_IDLE;
> +	if (need_sync)
> +		rss->gp_state = GP_PENDING;
> +	spin_unlock_irq(&rss->rss_lock);
> +
> +	BUG_ON(need_wait && need_sync);
> +
> +	if (need_sync) {
> +		rss->sync();
> +		rss->gp_state = GP_PASSED;
> +		wake_up_all(&rss->gp_wait);
> +	} else if (need_wait) {
> +		wait_event(rss->gp_wait, rss->gp_state == GP_PASSED);
> +	} else {
> +		/*
> +		 * Possible when there's a pending CB from a rcu_sync_exit().
> +		 * Nobody has yet been allowed the 'fast' path and thus we can
> +		 * avoid doing any sync(). The callback will get 'dropped'.
> +		 */
> +		BUG_ON(rss->gp_state != GP_PASSED);
> +	}
> +}
> +
> +static void rcu_sync_func(struct rcu_head *rcu)
> +{
> +	struct rcu_sync_struct *rss =
> +		container_of(rcu, struct rcu_sync_struct, cb_head);
> +	unsigned long flags;
> +
> +
> +	BUG_ON(rss->gp_state != GP_PASSED);
> +	BUG_ON(rss->cb_state == CB_IDLE);
> +
> +	spin_lock_irqsave(&rss->rss_lock, flags);
> +	if (rss->gp_count) {
> +		/*
> +		 * A new rcu_sync_begin() has happened; drop the callback.
> +		 */
> +		rss->cb_state = CB_IDLE;
> +	} else if (rss->cb_state == CB_REPLAY) {
> +		/*
> +		 * A new rcu_sync_exit() has happened; requeue the callback
> +		 * to catch a later GP.
> +		 */
> +		rss->cb_state = CB_PENDING;
> +		rss->call(&rss->cb_head, rcu_sync_func);
> +	} else {
> +		/*
> +		 * We're at least a GP after rcu_sync_exit(); eveybody will now
> +		 * have observed the write side critical section. Let 'em rip!.
> +		 */
> +		rss->cb_state = CB_IDLE;
> +		rss->gp_state = GP_IDLE;
> +	}
> +	spin_unlock_irqrestore(&rss->rss_lock, flags);
> +}
> +
> +void rcu_sync_exit(struct rcu_sync_struct *rss)
> +{
> +	spin_lock_irq(&rss->rss_lock);
> +	if (!--rss->gp_count) {
> +		if (rss->cb_state == CB_IDLE) {
> +			rss->cb_state = CB_PENDING;
> +			rss->call(&rss->cb_head, rcu_sync_func);
> +		} else if (rss->cb_state == CB_PENDING) {
> +			rss->cb_state = CB_REPLAY;
> +		}
> +	}
> +	spin_unlock_irq(&rss->rss_lock);
> +}
> 
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rcu: Create rcu_sync infrastructure
  2013-10-02 15:49   ` Oleg Nesterov
@ 2013-10-03 16:42     ` Paul E. McKenney
  2013-10-08  8:18     ` Peter Zijlstra
  1 sibling, 0 replies; 46+ messages in thread
From: Paul E. McKenney @ 2013-10-03 16:42 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Peter Zijlstra, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On Wed, Oct 02, 2013 at 05:49:01PM +0200, Oleg Nesterov wrote:
> On 10/02, Peter Zijlstra wrote:
> >
> > From: Oleg Nesterov <oleg@redhat.com>
> 
> Thanks! I was writing the patch, and I chose almost the same naming ;)
> 
> > Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> 
> In fact I'd like to add my sob to 1/3 and 3/3 as well.
> 
> 
> Paul, to remind, this is only the first step. I am going to send
> the following improvements:
> 
> 	1. Add rcu_sync->exlusive. The change is simple, just we
> 	   need s/wait_queue_head_t/completion/ in rcu_sync_struct
> 	   and a couple of "if (rss->exclusive)" checks in enter/exit.
> 
> 	2. rcu_sync_enter() should return !!need_sync. This can help
> 	   in exclusive mode.
> 
> 	3. rcu_sync_struct needs more function pointers (perhaps we
> 	   should add a single rcu_sync_struct->ops pointer but this
> 	   is minor). See below.
> 
> But let me repeat just in case, we should do this later.
> And once this series is applied, I'll change percpu_rw_semaphore.

I took this into account in my review; the upgrades would be good!  ;-)

							Thanx, Paul

> > +struct rcu_sync_struct {
> > +	int			gp_state;
> > +	int			gp_count;
> > +	wait_queue_head_t	gp_wait;
> > +
> > +	int			cb_state;
> > +	struct rcu_head		cb_head;
> > +
> > +	void (*sync)(void);
> > +	void (*call)(struct rcu_head *, void (*)(struct rcu_head *));
> 
> Yes, and we also need rcu_sync_struct->barrier(). From the patch I was
> working on:
> 
> 	void rcu_sync_wait_for_callback(struct rcu_sync *sync)
> 	{
> 		int cb_state;
> 
> 		BUG_ON(sync->gp_count);
> 
> 		spin_lock_irq(&sync->state_lock);
> 		if (sync->cb_state == CB_REPLAY)
> 			sync->cb_state = CB_PENDING;
> 		cb_state = sync->cb_state;
> 		spin_unlock_irq(&sync->state_lock);
> 
> 		if (cb_state != CB_IDLE) {
> 			rcu_barrier_sched();
> 			BUG_ON(sync->cb_state != CB_IDLE);
> 		}
> 	}
> 
> It should be called if you are going to kfree the object.
> 
> Perhaps another rcu_sync_struct->state_change(new_state) callback (set
> by the user) makes sense too; this can help (for example) to implement
> the array of semaphores with a single rcu_sync_struct (freeze_super).
> 
> Thanks.
> 
> Oleg.
> 
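
To make the kfree point in the quoted text concrete, here is a teardown sketch for a hypothetical dynamically allocated user. Everything named foo is made up; rcu_sync_wait_for_callback() is the helper proposed above (Oleg's draft spells the type struct rcu_sync, this series spells it rcu_sync_struct):

#include <linux/rcusync.h>
#include <linux/slab.h>

struct foo {
	struct rcu_sync_struct	rss;
	/* ... payload ... */
};

static void foo_destroy(struct foo *foo)
{
	/*
	 * All rcu_sync_enter()/rcu_sync_exit() pairs are done by now, but
	 * the last rcu_sync_exit() may still have cb_head queued; wait it
	 * out so rcu_sync_func() cannot run against freed memory.
	 */
	rcu_sync_wait_for_callback(&foo->rss);
	kfree(foo);
}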


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 3/3] hotplug: Optimize cpu_hotplug_{begin,done}() using rcu_sync
  2013-10-02 14:56 ` [PATCH 3/3] hotplug: Optimize cpu_hotplug_{begin,done}() using rcu_sync Peter Zijlstra
@ 2013-10-03 16:48   ` Paul E. McKenney
  2013-10-03 18:41     ` Peter Zijlstra
  0 siblings, 1 reply; 46+ messages in thread
From: Paul E. McKenney @ 2013-10-03 16:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Oleg Nesterov, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On Wed, Oct 02, 2013 at 04:56:58PM +0200, Peter Zijlstra wrote:
> Use the fancy new rcu_sync bits from Oleg to optimize the fancy new
> hotplug lock implementation.
> 
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>

Looks good, just a couple of comments on comments below...

							Thanx, Paul

> ---
>  include/linux/cpu.h |    7 ++++---
>  kernel/cpu.c        |   52 +++++++++++++++++++++++-----------------------------
>  2 files changed, 27 insertions(+), 32 deletions(-)
> 
> --- a/include/linux/cpu.h
> +++ b/include/linux/cpu.h
> @@ -18,6 +18,7 @@
>  #include <linux/cpumask.h>
>  #include <linux/percpu.h>
>  #include <linux/sched.h>
> +#include <linux/rcusync.h>
> 
>  struct device;
> 
> @@ -180,7 +181,7 @@ extern void cpu_hotplug_init_task(struct
>  extern void cpu_hotplug_begin(void);
>  extern void cpu_hotplug_done(void);
> 
> -extern int __cpuhp_state;
> +extern struct rcu_sync_struct __cpuhp_rss;
>  DECLARE_PER_CPU(unsigned int, __cpuhp_refcount);
> 
>  extern void __get_online_cpus(void);
> @@ -204,7 +205,7 @@ static inline void get_online_cpus(void)
>  	 * writer will see anything we did within this RCU-sched read-side
>  	 * critical section.
>  	 */
> -	if (likely(!__cpuhp_state))
> +	if (likely(rcu_sync_is_idle(&__cpuhp_rss)))
>  		__this_cpu_inc(__cpuhp_refcount);
>  	else
>  		__get_online_cpus(); /* Unconditional memory barrier. */
> @@ -231,7 +232,7 @@ static inline void put_online_cpus(void)
>  	/*
>  	 * Same as in get_online_cpus().
>  	 */
> -	if (likely(!__cpuhp_state))
> +	if (likely(rcu_sync_is_idle(&__cpuhp_rss)))
>  		__this_cpu_dec(__cpuhp_refcount);
>  	else
>  		__put_online_cpus(); /* Unconditional memory barrier. */
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -49,14 +49,15 @@ static int cpu_hotplug_disabled;
> 
>  #ifdef CONFIG_HOTPLUG_CPU
> 
> -enum { readers_fast = 0, readers_slow, readers_block };
> +enum { readers_slow, readers_block };

It took me a bit to realize that readers_fast is obsoleted by the
rcu_sync_is_idle() above.  ;-)

> 
> -int __cpuhp_state;
> -EXPORT_SYMBOL_GPL(__cpuhp_state);
> +DEFINE_RCU_SCHED_SYNC(__cpuhp_rss);

OK, matched RCU flavors.  ;-)

> +EXPORT_SYMBOL_GPL(__cpuhp_rss);
> 
>  DEFINE_PER_CPU(unsigned int, __cpuhp_refcount);
>  EXPORT_PER_CPU_SYMBOL_GPL(__cpuhp_refcount);
> 
> +static int cpuhp_state = readers_slow;
>  static atomic_t cpuhp_waitcount;
>  static DECLARE_WAIT_QUEUE_HEAD(cpuhp_readers);
>  static DECLARE_WAIT_QUEUE_HEAD(cpuhp_writer);
> @@ -68,7 +69,6 @@ void cpu_hotplug_init_task(struct task_s
> 
>  void __get_online_cpus(void)
>  {
> -again:
>  	__this_cpu_inc(__cpuhp_refcount);
> 
>  	/*
> @@ -77,7 +77,7 @@ void __get_online_cpus(void)
>  	 * increment-on-one-CPU-and-decrement-on-another problem.
>  	 *
>  	 * And yes, if the reader misses the writer's assignment of
> -	 * readers_block to __cpuhp_state, then the writer is
> +	 * readers_block to cpuhp_state, then the writer is
>  	 * guaranteed to see the reader's increment.  Conversely, any
>  	 * readers that increment their __cpuhp_refcount after the
>  	 * writer looks are guaranteed to see the readers_block value,
> @@ -88,7 +88,7 @@ void __get_online_cpus(void)
> 
>  	smp_mb(); /* A matches D */
> 
> -	if (likely(__cpuhp_state != readers_block))
> +	if (likely(cpuhp_state != readers_block))
>  		return;
> 
>  	/*
> @@ -108,19 +108,19 @@ void __get_online_cpus(void)
>  	 * and reschedule on the preempt_enable() in get_online_cpus().
>  	 */
>  	preempt_enable_no_resched();
> -	__wait_event(cpuhp_readers, __cpuhp_state != readers_block);
> +	__wait_event(cpuhp_readers, cpuhp_state != readers_block);
>  	preempt_disable();
> 
> +	__this_cpu_inc(__cpuhp_refcount);
> +
>  	/*
> -	 * Given we've still got preempt_disabled and new cpu_hotplug_begin()
> -	 * must do a synchronize_sched() we're guaranteed a successfull
> -	 * acquisition this time -- even if we wake the current
> -	 * cpu_hotplug_end() now.
> +	 * cpu_hotplug_done() waits until all pending readers are gone;
> +	 * this means that a new cpu_hotplug_begin() must observe our
> +	 * refcount increment and wait for it to go away.
>  	 */
> -	if (atomic_dec_and_test(&cpuhp_waitcount))
> -		wake_up(&cpuhp_writer);
> 
> -	goto again;
> +	if (atomic_dec_and_test(&cpuhp_waitcount)) /* A */
> +		wake_up(&cpuhp_writer);
>  }
>  EXPORT_SYMBOL_GPL(__get_online_cpus);
> 
> @@ -183,21 +183,18 @@ void cpu_hotplug_begin(void)
>  	current->cpuhp_ref++;
> 
>  	/* Notify readers to take the slow path. */
> -	__cpuhp_state = readers_slow;
> -
> -	/* See percpu_down_write(); guarantees all readers take the slow path */
> -	synchronize_sched();
> +	rcu_sync_enter(&__cpuhp_rss);
> 
>  	/*
>  	 * Notify new readers to block; up until now, and thus throughout the
>  	 * longish synchronize_sched() above, new readers could still come in.

s/synchronize_sched/rcu_sync_enter/

>  	 */
> -	__cpuhp_state = readers_block;
> +	cpuhp_state = readers_block;
> 
>  	smp_mb(); /* D matches A */
> 
>  	/*
> -	 * If they don't see our writer of readers_block to __cpuhp_state,
> +	 * If they don't see our writer of readers_block to cpuhp_state,
>  	 * then we are guaranteed to see their __cpuhp_refcount increment, and
>  	 * therefore will wait for them.
>  	 */
> @@ -215,26 +212,23 @@ void cpu_hotplug_done(void)
>  	 * that new readers might fail to see the results of this writer's
>  	 * critical section.
>  	 */
> -	__cpuhp_state = readers_slow;
> +	cpuhp_state = readers_slow;
>  	wake_up_all(&cpuhp_readers);
> 
>  	/*
>  	 * The wait_event()/wake_up_all() prevents the race where the readers
> -	 * are delayed between fetching __cpuhp_state and blocking.
> +	 * are delayed between fetching cpuhp_state and blocking.
>  	 */
> 
> -	/* See percpu_up_write(); readers will no longer attempt to block. */
> -	synchronize_sched();
> -
> -	/* Let 'em rip */
> -	__cpuhp_state = readers_fast;
>  	current->cpuhp_ref--;
> 
>  	/*
> -	 * Wait for any pending readers to be running. This ensures readers
> -	 * after writer and avoids writers starving readers.
> +	 * Wait for any pending readers to be running. This avoids writers
> +	 * starving readers.
>  	 */
>  	wait_event(cpuhp_writer, !atomic_read(&cpuhp_waitcount));
> +
> +	rcu_sync_exit(&__cpuhp_rss);
>  }
> 
>  /*
> 
> 
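
For anyone following the A/D barrier pairing in the hunks above, this is the store-buffering pattern it relies on, reduced to its core (illustrative pseudocode of the existing scheme, nothing new):

/*
 * reader (get_online_cpus)                writer (cpu_hotplug_begin)
 *
 *   __this_cpu_inc(__cpuhp_refcount);       cpuhp_state = readers_block;
 *   smp_mb();               // A            smp_mb();               // D
 *   r1 = cpuhp_state;                       r2 = sum of __cpuhp_refcount;
 *
 * With both full barriers in place it is impossible for the reader to
 * miss readers_block *and* the writer to miss the refcount increment:
 * at least one side observes the other's store, so either the reader
 * falls back to the slow path or the writer waits for it.
 */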


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rcu: Create rcu_sync infrastructure
  2013-10-03 16:41   ` Paul E. McKenney
@ 2013-10-03 17:00     ` Oleg Nesterov
  2013-10-03 17:15       ` Paul E. McKenney
  2013-10-03 18:40     ` Peter Zijlstra
  1 sibling, 1 reply; 46+ messages in thread
From: Oleg Nesterov @ 2013-10-03 17:00 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On 10/03, Paul E. McKenney wrote:
>
> How about something like the following, where ->read_side_check()
> gets rcu_read_lock_held(), rcu_read_lock_bh_held(), or
> rcu_read_lock_sched_held(), as appropriate?
>
> #ifdef CONFIG_PROVE_RCU
> #define rcu_sync_is_idle_check(rss) BUG_ON(!rss->read_side_check())
> #else
> #define rcu_sync_is_idle_check(rss) do { } while (0)
> #endif
>
> 	rcu_sync_is_idle_check(rss);

Agreed!

but can't we do this in a separate patch? (I will be happy to do
this trivial exercise ;)

This change is trivial, but perhaps it would be better to keep the
initial patch as simple as possible. And discuss the potential
"cosmetic" issues (like naming) separately. Say, rcu_lockdep_assert.
We can't use it directly, we need the new helper.

Oleg.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rcu: Create rcu_sync infrastructure
  2013-10-03 17:00     ` Oleg Nesterov
@ 2013-10-03 17:15       ` Paul E. McKenney
  0 siblings, 0 replies; 46+ messages in thread
From: Paul E. McKenney @ 2013-10-03 17:15 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Peter Zijlstra, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On Thu, Oct 03, 2013 at 07:00:52PM +0200, Oleg Nesterov wrote:
> On 10/03, Paul E. McKenney wrote:
> >
> > How about the something like the following, where ->read_side_check()
> > gets rcu_read_lock_held(), rcu_read_lock_bh_held(), or
> > rcu_read_lock_sched_held(), as appropriate?
> >
> > #ifdef CONFIG_PROVE_RCU
> > #define rcu_sync_is_idle_check(rss) BUG_ON(!rss->read_side_check())
> > #else
> > #define rcu_sync_is_idle_check(rss) do { } while (0)
> > #endif
> >
> > 	rcu_sync_is_idle_check(rss);
> 
> Agreed!
> 
> but can't we do this in a separate patch? (I will be happy to do
> this trivial exercise ;)

I am good with that.

> This change is trivial, but perhaps it would be better to keep the
> initial patch as simple as possible. And discuss the potential
> "cosmetic" issues (like naming) separately. Say, rcu_lockdep_assert.
> We can't use it directly, we need the new helper.

OK, with s/xxx_/rcu_sync_/ in the commit log:

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

But someone else needs to lock all the candidate names in a room
overnight and tend to the resulting wounds.  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rcu: Create rcu_sync infrastructure
  2013-10-03 16:41   ` Paul E. McKenney
  2013-10-03 17:00     ` Oleg Nesterov
@ 2013-10-03 18:40     ` Peter Zijlstra
  2013-10-03 18:45       ` Paul E. McKenney
                         ` (2 more replies)
  1 sibling, 3 replies; 46+ messages in thread
From: Peter Zijlstra @ 2013-10-03 18:40 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Oleg Nesterov, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On Thu, Oct 03, 2013 at 09:41:17AM -0700, Paul E. McKenney wrote:
> On Wed, Oct 02, 2013 at 04:56:57PM +0200, Peter Zijlstra wrote:
> #ifdef CONFIG_PROVE_RCU
> #define rcu_sync_is_idle_check(rss) BUG_ON(!rss->read_side_check())
> #else
> #define rcu_sync_is_idle_check(rss) do { } while (0)
> #endif
> 
> 	rcu_sync_is_idle_check(rss);

The below actually seems to build, although I didn't test the config
permutations, only defconfig.

---
--- a/include/linux/rcusync.h
+++ b/include/linux/rcusync.h
@@ -14,6 +14,7 @@ struct rcu_sync_struct {
 
 	void (*sync)(void);
 	void (*call)(struct rcu_head *, void (*)(struct rcu_head *));
+	int  (*held)(void);
 };
 
 #define ___RCU_SYNC_INIT(name)						\
@@ -26,18 +27,21 @@ struct rcu_sync_struct {
 	___RCU_SYNC_INIT(name),						\
 	.sync = synchronize_sched,					\
 	.call = call_rcu_sched,						\
+	.held = rcu_read_lock_sched_held,				\
 }
 
 #define __RCU_BH_SYNC_INIT(name) {					\
 	___RCU_SYNC_INIT(name),						\
 	.sync = synchronize_rcu_bh,					\
 	.call = call_rcu_bh,						\
+	.held = rcu_read_lock_bh_held,					\
 }
 
 #define __RCU_SYNC_INIT(name) {						\
 	___RCU_SYNC_INIT(name),						\
 	.sync = synchronize_rcu,					\
 	.call = call_rcu,						\
+	.held = rcu_read_lock_held,					\
 }
 
 #define DEFINE_RCU_SCHED_SYNC(name)					\
@@ -51,6 +55,9 @@ struct rcu_sync_struct {
 
 static inline bool rcu_sync_is_idle(struct rcu_sync_struct *rss)
 {
+#ifdef CONFIG_PROVE_RCU
+	BUG_ON(!rss->held());
+#endif
 	return !rss->gp_state; /* GP_IDLE */
 }
 

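As an aside, the kind of misuse the new check catches looks roughly like this (hypothetical caller with a made-up foo_* name; with CONFIG_PROVE_RCU and lockdep active, rcu_read_lock_sched_held() returns 0 here because nothing disables preemption around the call):

/* rss is assumed to have been defined with DEFINE_RCU_SCHED_SYNC() */
static bool foo_can_take_fast_path(struct rcu_sync_struct *rss)
{
	/*
	 * Wrong: no preempt_disable()/rcu_read_lock_sched() around the
	 * check, so a concurrent rcu_sync_enter()'s synchronize_sched()
	 * is not guaranteed to wait for whatever fast-path work follows.
	 * With the hunk above, BUG_ON(!rss->held()) fires here.
	 */
	return rcu_sync_is_idle(rss);
}
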
^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 3/3] hotplug: Optimize cpu_hotplug_{begin,done}() using rcu_sync
  2013-10-03 16:48   ` Paul E. McKenney
@ 2013-10-03 18:41     ` Peter Zijlstra
  2013-10-03 18:46       ` Paul E. McKenney
  2013-10-03 19:05       ` Oleg Nesterov
  0 siblings, 2 replies; 46+ messages in thread
From: Peter Zijlstra @ 2013-10-03 18:41 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Oleg Nesterov, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On Thu, Oct 03, 2013 at 09:48:38AM -0700, Paul E. McKenney wrote:
> > -enum { readers_fast = 0, readers_slow, readers_block };
> > +enum { readers_slow, readers_block };
> 
> It took me a bit to realize that readers_fast is obsoleted by the
> rcu_sync_is_idle() above.  ;-)

Yeah.. I pondered changing/adding to the rcu_sync interface to allow
using gp_count like status to avoid the extra variable, but decided
against it for now.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rcu: Create rcu_sync infrastructure
  2013-10-03 18:40     ` Peter Zijlstra
@ 2013-10-03 18:45       ` Paul E. McKenney
  2013-10-03 18:47       ` Oleg Nesterov
  2013-10-03 20:14       ` Paolo Bonzini
  2 siblings, 0 replies; 46+ messages in thread
From: Paul E. McKenney @ 2013-10-03 18:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Oleg Nesterov, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On Thu, Oct 03, 2013 at 08:40:01PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 03, 2013 at 09:41:17AM -0700, Paul E. McKenney wrote:
> > On Wed, Oct 02, 2013 at 04:56:57PM +0200, Peter Zijlstra wrote:
> > #ifdef CONFIG_PROVE_RCU
> > #define rcu_sync_is_idle_check(rss) BUG_ON(!rss->read_side_check())
> > #else
> > #define rcu_sync_is_idle_check(rss) do { } while (0)
> > #endif
> > 
> > 	rcu_sync_is_idle_check(rss);
> 
> The below actually seems to build, although I didn't test the config
> permutations, only defconfig.

Looks good to me!

							Thanx, Paul

> ---
> --- a/include/linux/rcusync.h
> +++ b/include/linux/rcusync.h
> @@ -14,6 +14,7 @@ struct rcu_sync_struct {
> 
>  	void (*sync)(void);
>  	void (*call)(struct rcu_head *, void (*)(struct rcu_head *));
> +	int  (*held)(void);
>  };
> 
>  #define ___RCU_SYNC_INIT(name)						\
> @@ -26,18 +27,21 @@ struct rcu_sync_struct {
>  	___RCU_SYNC_INIT(name),						\
>  	.sync = synchronize_sched,					\
>  	.call = call_rcu_sched,						\
> +	.held = rcu_read_lock_sched_held,				\
>  }
> 
>  #define __RCU_BH_SYNC_INIT(name) {					\
>  	___RCU_SYNC_INIT(name),						\
>  	.sync = synchronize_rcu_bh,					\
>  	.call = call_rcu_bh,						\
> +	.held = rcu_read_lock_bh_held,					\
>  }
> 
>  #define __RCU_SYNC_INIT(name) {						\
>  	___RCU_SYNC_INIT(name),						\
>  	.sync = synchronize_rcu,					\
>  	.call = call_rcu,						\
> +	.held = rcu_read_lock_held,					\
>  }
> 
>  #define DEFINE_RCU_SCHED_SYNC(name)					\
> @@ -51,6 +55,9 @@ struct rcu_sync_struct {
> 
>  static inline bool rcu_sync_is_idle(struct rcu_sync_struct *rss)
>  {
> +#ifdef CONFIG_PROVE_RCU
> +	BUG_ON(!rss->held());
> +#endif
>  	return !rss->gp_state; /* GP_IDLE */
>  }
> 
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 3/3] hotplug: Optimize cpu_hotplug_{begin,done}() using rcu_sync
  2013-10-03 18:41     ` Peter Zijlstra
@ 2013-10-03 18:46       ` Paul E. McKenney
  2013-10-03 19:05       ` Oleg Nesterov
  1 sibling, 0 replies; 46+ messages in thread
From: Paul E. McKenney @ 2013-10-03 18:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Oleg Nesterov, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On Thu, Oct 03, 2013 at 08:41:51PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 03, 2013 at 09:48:38AM -0700, Paul E. McKenney wrote:
> > > -enum { readers_fast = 0, readers_slow, readers_block };
> > > +enum { readers_slow, readers_block };
> > 
> > It took me a bit to realize that readers_fast is obsoleted by the
> > rcu_sync_is_idle() above.  ;-)
> 
> Yeah.. I pondered changing/adding to the rcu_sync interface to allow
> using gp_count like status to avoid the extra variable, but decided
> against it for now.

Should be OK as is -- people who haven't reviewed the old version with
readers_fast are unlikely to be confused by the current setup.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rcu: Create rcu_sync infrastructure
  2013-10-03 18:40     ` Peter Zijlstra
  2013-10-03 18:45       ` Paul E. McKenney
@ 2013-10-03 18:47       ` Oleg Nesterov
  2013-10-03 19:21         ` Paul E. McKenney
  2013-10-03 20:14       ` Paolo Bonzini
  2 siblings, 1 reply; 46+ messages in thread
From: Oleg Nesterov @ 2013-10-03 18:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On 10/03, Peter Zijlstra wrote:
>
> On Thu, Oct 03, 2013 at 09:41:17AM -0700, Paul E. McKenney wrote:
> > On Wed, Oct 02, 2013 at 04:56:57PM +0200, Peter Zijlstra wrote:
> > #ifdef CONFIG_PROVE_RCU
> > #define rcu_sync_is_idle_check(rss) BUG_ON(!rss->read_side_check())
> > #else
> > #define rcu_sync_is_idle_check(rss) do { } while (0)
> > #endif
> >
> > 	rcu_sync_is_idle_check(rss);
>
> The below actually seems to build, although I didn't test the config
> permutations, only defconfig.

Yes, but...

I think it would be better to start with the patch below; this way
it would be easier a) to add the new ops (we need more anyway) and b)
to use CONFIG_PROVE_RCU to avoid the unused members, even if this is
really minor.

Oleg.

rcusync: introduce struct rcu_sync_ops

CHANGELOG

---
 include/linux/rcusync.h |   63 +++++++++++++++++++++--------------------------
 kernel/rcusync.c        |   40 ++++++++++++++---------------
 2 files changed, 47 insertions(+), 56 deletions(-)

diff --git a/include/linux/rcusync.h b/include/linux/rcusync.h
index 7858491..30c6037 100644
--- a/include/linux/rcusync.h
+++ b/include/linux/rcusync.h
@@ -4,6 +4,11 @@
 #include <linux/wait.h>
 #include <linux/rcupdate.h>
 
+struct rcu_sync_ops {
+	void (*sync)(void);
+	void (*call)(struct rcu_head *, void (*)(struct rcu_head *));
+};
+
 struct rcu_sync_struct {
 	int			gp_state;
 	int			gp_count;
@@ -12,43 +17,9 @@ struct rcu_sync_struct {
 	int			cb_state;
 	struct rcu_head		cb_head;
 
-	void (*sync)(void);
-	void (*call)(struct rcu_head *, void (*)(struct rcu_head *));
+	struct rcu_sync_ops	*ops;
 };
 
-#define ___RCU_SYNC_INIT(name)						\
-	.gp_state = 0,							\
-	.gp_count = 0,							\
-	.gp_wait = __WAIT_QUEUE_HEAD_INITIALIZER(name.gp_wait),		\
-	.cb_state = 0
-
-#define __RCU_SCHED_SYNC_INIT(name) {					\
-	___RCU_SYNC_INIT(name),						\
-	.sync = synchronize_sched,					\
-	.call = call_rcu_sched,						\
-}
-
-#define __RCU_BH_SYNC_INIT(name) {					\
-	___RCU_SYNC_INIT(name),						\
-	.sync = synchronize_rcu_bh,					\
-	.call = call_rcu_bh,						\
-}
-
-#define __RCU_SYNC_INIT(name) {						\
-	___RCU_SYNC_INIT(name),						\
-	.sync = synchronize_rcu,					\
-	.call = call_rcu,						\
-}
-
-#define DEFINE_RCU_SCHED_SYNC(name)					\
-	struct rcu_sync_struct name = __RCU_SCHED_SYNC_INIT(name)
-
-#define DEFINE_RCU_BH_SYNC(name)					\
-	struct rcu_sync_struct name = __RCU_BH_SYNC_INIT(name)
-
-#define DEFINE_RCU_SYNC(name)						\
-	struct rcu_sync_struct name = __RCU_SYNC_INIT(name)
-
 static inline bool rcu_sync_is_idle(struct rcu_sync_struct *rss)
 {
 	return !rss->gp_state; /* GP_IDLE */
@@ -60,5 +31,27 @@ extern void rcu_sync_init(struct rcu_sync_struct *, enum rcu_sync_type);
 extern void rcu_sync_enter(struct rcu_sync_struct *);
 extern void rcu_sync_exit(struct rcu_sync_struct *);
 
+extern struct rcu_sync_ops rcu_sync_ops_array[];
+
+#define __RCU_SYNC_INITIALIZER(name, type) {				\
+		.gp_state = 0,						\
+		.gp_count = 0,						\
+		.gp_wait = __WAIT_QUEUE_HEAD_INITIALIZER(name.gp_wait),	\
+		.cb_state = 0,						\
+		.ops = rcu_sync_ops_array + (type),			\
+	}
+
+#define	__DEFINE_RCU_SYNC(name, type)	\
+	struct rcu_sync_struct name = __RCU_SYNC_INITIALIZER(name, type)
+
+#define DEFINE_RCU_SYNC(name)		\
+	__DEFINE_RCU_SYNC(name, RCU_SYNC)
+
+#define DEFINE_RCU_SCHED_SYNC(name)	\
+	__DEFINE_RCU_SYNC(name, RCU_SCHED_SYNC)
+
+#define DEFINE_RCU_BH_SYNC(name)	\
+	__DEFINE_RCU_SYNC(name, RCU_BH_SYNC)
+
 #endif /* _LINUX_RCUSYNC_H_ */
 
diff --git a/kernel/rcusync.c b/kernel/rcusync.c
index f84176a..1cefb83 100644
--- a/kernel/rcusync.c
+++ b/kernel/rcusync.c
@@ -1,4 +1,3 @@
-
 #include <linux/rcusync.h>
 #include <linux/sched.h>
 
@@ -7,27 +6,26 @@ enum { CB_IDLE = 0, CB_PENDING, CB_REPLAY };
 
 #define	rss_lock	gp_wait.lock
 
+struct rcu_sync_ops rcu_sync_ops_array[] = {
+	[RCU_SYNC] = {
+		.sync = synchronize_rcu,
+		.call = call_rcu,
+	},
+	[RCU_SCHED_SYNC] = {
+		.sync = synchronize_sched,
+		.call = call_rcu_sched,
+	},
+	[RCU_BH_SYNC] = {
+		.sync = synchronize_rcu_bh,
+		.call = call_rcu_bh,
+	},
+};
+
 void rcu_sync_init(struct rcu_sync_struct *rss, enum rcu_sync_type type)
 {
 	memset(rss, 0, sizeof(*rss));
 	init_waitqueue_head(&rss->gp_wait);
-
-	switch (type) {
-	case RCU_SYNC:
-		rss->sync = synchronize_rcu;
-		rss->call = call_rcu;
-		break;
-
-	case RCU_SCHED_SYNC:
-		rss->sync = synchronize_sched;
-		rss->call = call_rcu_sched;
-		break;
-
-	case RCU_BH_SYNC:
-		rss->sync = synchronize_rcu_bh;
-		rss->call = call_rcu_bh;
-		break;
-	}
+	rss->ops = rcu_sync_ops_array + type;
 }
 
 void rcu_sync_enter(struct rcu_sync_struct *rss)
@@ -44,7 +42,7 @@ void rcu_sync_enter(struct rcu_sync_struct *rss)
 	BUG_ON(need_wait && need_sync);
 
 	if (need_sync) {
-		rss->sync();
+		rss->ops->sync();
 		rss->gp_state = GP_PASSED;
 		wake_up_all(&rss->gp_wait);
 	} else if (need_wait) {
@@ -81,7 +79,7 @@ static void rcu_sync_func(struct rcu_head *rcu)
 		 * to catch a later GP.
 		 */
 		rss->cb_state = CB_PENDING;
-		rss->call(&rss->cb_head, rcu_sync_func);
+		rss->ops->call(&rss->cb_head, rcu_sync_func);
 	} else {
 		/*
 		 * We're at least a GP after rcu_sync_exit(); eveybody will now
@@ -99,7 +97,7 @@ void rcu_sync_exit(struct rcu_sync_struct *rss)
 	if (!--rss->gp_count) {
 		if (rss->cb_state == CB_IDLE) {
 			rss->cb_state = CB_PENDING;
-			rss->call(&rss->cb_head, rcu_sync_func);
+			rss->ops->call(&rss->cb_head, rcu_sync_func);
 		} else if (rss->cb_state == CB_PENDING) {
 			rss->cb_state = CB_REPLAY;
 		}


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH 3/3] hotplug: Optimize cpu_hotplug_{begin,done}() using rcu_sync
  2013-10-03 18:41     ` Peter Zijlstra
  2013-10-03 18:46       ` Paul E. McKenney
@ 2013-10-03 19:05       ` Oleg Nesterov
  1 sibling, 0 replies; 46+ messages in thread
From: Oleg Nesterov @ 2013-10-03 19:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On 10/03, Peter Zijlstra wrote:
>
> On Thu, Oct 03, 2013 at 09:48:38AM -0700, Paul E. McKenney wrote:
> > > -enum { readers_fast = 0, readers_slow, readers_block };
> > > +enum { readers_slow, readers_block };
> >
> > It took me a bit to realize that readers_fast is obsoleted by the
> > rcu_sync_is_idle() above.  ;-)
>
> Yeah.. I pondered changing/adding to the rcu_sync interface to allow
> using gp_count like status to avoid the extra variable, but decided
> against it for now.

Agreed, it looks simple enough even if get/put has to read ->gp_state
and/or cpuhp_state.

But, just in case, this is one example of why it probably makes sense
to add an rcu_sync_struct->state_changed(new_state, void *data) callback. In
this case it could simply do

	static void cpuhp_rcu_sync_state_cb(state, ...)
	{
		switch (state) {
		case GP_IDLE:
			cpuhp_state = readers_fast;
			break;
		case GP_PENDING:
			cpuhp_state = readers_slow;
			break;
		}
	}

Doesn't make sense in this particular case, but perhaps in general.

Oleg.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rcu: Create rcu_sync infrastructure
  2013-10-03 18:47       ` Oleg Nesterov
@ 2013-10-03 19:21         ` Paul E. McKenney
  2013-10-03 19:32           ` Oleg Nesterov
  0 siblings, 1 reply; 46+ messages in thread
From: Paul E. McKenney @ 2013-10-03 19:21 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Peter Zijlstra, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On Thu, Oct 03, 2013 at 08:47:19PM +0200, Oleg Nesterov wrote:
> On 10/03, Peter Zijlstra wrote:
> >
> > On Thu, Oct 03, 2013 at 09:41:17AM -0700, Paul E. McKenney wrote:
> > > On Wed, Oct 02, 2013 at 04:56:57PM +0200, Peter Zijlstra wrote:
> > > #ifdef CONFIG_PROVE_RCU
> > > #define rcu_sync_is_idle_check(rss) BUG_ON(!rss->read_side_check())
> > > #else
> > > #define rcu_sync_is_idle_check(rss) do { } while (0)
> > > #endif
> > >
> > > 	rcu_sync_is_idle_check(rss);
> >
> > The below actually seems to build, although I didn't test the config
> > permutations, only defconfig.
> 
> Yes, but...
> 
> I think it would be better to start with the patch below, this way
> it would be easier a) to add the new ops (we need more anyway) and b)
> use CONFIG_PROVE_RCU to avoid the unused members even if this is
> really minor.

I am fine with this as well, but I don't see the code that checks for
the required RCU read-side critical section.

							Thanx, Paul

> Oleg.
> 
> rcusync: introdude struct rcu_sync_ops
> 
> CHANGELOG
> 
> ---
>  include/linux/rcusync.h |   63 +++++++++++++++++++++--------------------------
>  kernel/rcusync.c        |   40 ++++++++++++++---------------
>  2 files changed, 47 insertions(+), 56 deletions(-)
> 
> diff --git a/include/linux/rcusync.h b/include/linux/rcusync.h
> index 7858491..30c6037 100644
> --- a/include/linux/rcusync.h
> +++ b/include/linux/rcusync.h
> @@ -4,6 +4,11 @@
>  #include <linux/wait.h>
>  #include <linux/rcupdate.h>
> 
> +struct rcu_sync_ops {
> +	void (*sync)(void);
> +	void (*call)(struct rcu_head *, void (*)(struct rcu_head *));
> +};
> +
>  struct rcu_sync_struct {
>  	int			gp_state;
>  	int			gp_count;
> @@ -12,43 +17,9 @@ struct rcu_sync_struct {
>  	int			cb_state;
>  	struct rcu_head		cb_head;
> 
> -	void (*sync)(void);
> -	void (*call)(struct rcu_head *, void (*)(struct rcu_head *));
> +	struct rcu_sync_ops	*ops;
>  };
> 
> -#define ___RCU_SYNC_INIT(name)						\
> -	.gp_state = 0,							\
> -	.gp_count = 0,							\
> -	.gp_wait = __WAIT_QUEUE_HEAD_INITIALIZER(name.gp_wait),		\
> -	.cb_state = 0
> -
> -#define __RCU_SCHED_SYNC_INIT(name) {					\
> -	___RCU_SYNC_INIT(name),						\
> -	.sync = synchronize_sched,					\
> -	.call = call_rcu_sched,						\
> -}
> -
> -#define __RCU_BH_SYNC_INIT(name) {					\
> -	___RCU_SYNC_INIT(name),						\
> -	.sync = synchronize_rcu_bh,					\
> -	.call = call_rcu_bh,						\
> -}
> -
> -#define __RCU_SYNC_INIT(name) {						\
> -	___RCU_SYNC_INIT(name),						\
> -	.sync = synchronize_rcu,					\
> -	.call = call_rcu,						\
> -}
> -
> -#define DEFINE_RCU_SCHED_SYNC(name)					\
> -	struct rcu_sync_struct name = __RCU_SCHED_SYNC_INIT(name)
> -
> -#define DEFINE_RCU_BH_SYNC(name)					\
> -	struct rcu_sync_struct name = __RCU_BH_SYNC_INIT(name)
> -
> -#define DEFINE_RCU_SYNC(name)						\
> -	struct rcu_sync_struct name = __RCU_SYNC_INIT(name)
> -
>  static inline bool rcu_sync_is_idle(struct rcu_sync_struct *rss)
>  {
>  	return !rss->gp_state; /* GP_IDLE */
> @@ -60,5 +31,27 @@ extern void rcu_sync_init(struct rcu_sync_struct *, enum rcu_sync_type);
>  extern void rcu_sync_enter(struct rcu_sync_struct *);
>  extern void rcu_sync_exit(struct rcu_sync_struct *);
> 
> +extern struct rcu_sync_ops rcu_sync_ops_array[];
> +
> +#define __RCU_SYNC_INITIALIZER(name, type) {				\
> +		.gp_state = 0,						\
> +		.gp_count = 0,						\
> +		.gp_wait = __WAIT_QUEUE_HEAD_INITIALIZER(name.gp_wait),	\
> +		.cb_state = 0,						\
> +		.ops = rcu_sync_ops_array + (type),			\
> +	}
> +
> +#define	__DEFINE_RCU_SYNC(name, type)	\
> +	struct rcu_sync_struct name = __RCU_SYNC_INITIALIZER(name, type)
> +
> +#define DEFINE_RCU_SYNC(name)		\
> +	__DEFINE_RCU_SYNC(name, RCU_SYNC)
> +
> +#define DEFINE_RCU_SCHED_SYNC(name)	\
> +	__DEFINE_RCU_SYNC(name, RCU_SCHED_SYNC)
> +
> +#define DEFINE_RCU_BH_SYNC(name)	\
> +	__DEFINE_RCU_SYNC(name, RCU_BH_SYNC)
> +
>  #endif /* _LINUX_RCUSYNC_H_ */
> 
> diff --git a/kernel/rcusync.c b/kernel/rcusync.c
> index f84176a..1cefb83 100644
> --- a/kernel/rcusync.c
> +++ b/kernel/rcusync.c
> @@ -1,4 +1,3 @@
> -
>  #include <linux/rcusync.h>
>  #include <linux/sched.h>
> 
> @@ -7,27 +6,26 @@ enum { CB_IDLE = 0, CB_PENDING, CB_REPLAY };
> 
>  #define	rss_lock	gp_wait.lock
> 
> +struct rcu_sync_ops rcu_sync_ops_array[] = {
> +	[RCU_SYNC] = {
> +		.sync = synchronize_rcu,
> +		.call = call_rcu,
> +	},
> +	[RCU_SCHED_SYNC] = {
> +		.sync = synchronize_sched,
> +		.call = call_rcu_sched,
> +	},
> +	[RCU_BH_SYNC] = {
> +		.sync = synchronize_rcu_bh,
> +		.call = call_rcu_bh,
> +	},
> +};
> +
>  void rcu_sync_init(struct rcu_sync_struct *rss, enum rcu_sync_type type)
>  {
>  	memset(rss, 0, sizeof(*rss));
>  	init_waitqueue_head(&rss->gp_wait);
> -
> -	switch (type) {
> -	case RCU_SYNC:
> -		rss->sync = synchronize_rcu;
> -		rss->call = call_rcu;
> -		break;
> -
> -	case RCU_SCHED_SYNC:
> -		rss->sync = synchronize_sched;
> -		rss->call = call_rcu_sched;
> -		break;
> -
> -	case RCU_BH_SYNC:
> -		rss->sync = synchronize_rcu_bh;
> -		rss->call = call_rcu_bh;
> -		break;
> -	}
> +	rss->ops = rcu_sync_ops_array + type;
>  }
> 
>  void rcu_sync_enter(struct rcu_sync_struct *rss)
> @@ -44,7 +42,7 @@ void rcu_sync_enter(struct rcu_sync_struct *rss)
>  	BUG_ON(need_wait && need_sync);
> 
>  	if (need_sync) {
> -		rss->sync();
> +		rss->ops->sync();
>  		rss->gp_state = GP_PASSED;
>  		wake_up_all(&rss->gp_wait);
>  	} else if (need_wait) {
> @@ -81,7 +79,7 @@ static void rcu_sync_func(struct rcu_head *rcu)
>  		 * to catch a later GP.
>  		 */
>  		rss->cb_state = CB_PENDING;
> -		rss->call(&rss->cb_head, rcu_sync_func);
> +		rss->ops->call(&rss->cb_head, rcu_sync_func);
>  	} else {
>  		/*
>  		 * We're at least a GP after rcu_sync_exit(); eveybody will now
> @@ -99,7 +97,7 @@ void rcu_sync_exit(struct rcu_sync_struct *rss)
>  	if (!--rss->gp_count) {
>  		if (rss->cb_state == CB_IDLE) {
>  			rss->cb_state = CB_PENDING;
> -			rss->call(&rss->cb_head, rcu_sync_func);
> +			rss->ops->call(&rss->cb_head, rcu_sync_func);
>  		} else if (rss->cb_state == CB_PENDING) {
>  			rss->cb_state = CB_REPLAY;
>  		}
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rcu: Create rcu_sync infrastructure
  2013-10-03 19:21         ` Paul E. McKenney
@ 2013-10-03 19:32           ` Oleg Nesterov
  2013-10-03 19:33             ` Oleg Nesterov
  0 siblings, 1 reply; 46+ messages in thread
From: Oleg Nesterov @ 2013-10-03 19:32 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On 10/03, Paul E. McKenney wrote:
>
> On Thu, Oct 03, 2013 at 08:47:19PM +0200, Oleg Nesterov wrote:
> >
> > Yes, but...
> >
> > I think it would be better to start with the patch below, this way
> > it would be easier a) to add the new ops (we need more anyway) and b)
> > use CONFIG_PROVE_RCU to avoid the unused members even if this is
> > really minor.
>
> I am fine with this as well, but I don't see the code that checks for
> the required RCU read-side critical section.

Because I believe this needs another patch ;) see below, didn't test
it yet.

(and yes, until I saw the patch from Peter I didn't realize we already
 have all the necessary helpers as functions; somehow I thought they
 were macros).

Oleg.

rcusync: add the CONFIG_PROVE_RCU checks

CHANGELOG
---

diff --git a/include/linux/rcusync.h b/include/linux/rcusync.h
index 30c6037..aa7e9e0 100644
--- a/include/linux/rcusync.h
+++ b/include/linux/rcusync.h
@@ -7,6 +7,9 @@
 struct rcu_sync_ops {
 	void (*sync)(void);
 	void (*call)(struct rcu_head *, void (*)(struct rcu_head *));
+#ifdef CONFIG_PROVE_RCU
+	bool (*held)(void);
+#endif
 };
 
 struct rcu_sync_struct {
@@ -22,6 +25,9 @@ struct rcu_sync_struct {
 
 static inline bool rcu_sync_is_idle(struct rcu_sync_struct *rss)
 {
+#ifdef CONFIG_PROVE_RCU
+	WARN_ON(!rss->ops->held());
+#endif
 	return !rss->gp_state; /* GP_IDLE */
 }
 
diff --git a/kernel/rcusync.c b/kernel/rcusync.c
index 1cefb83..21f191f 100644
--- a/kernel/rcusync.c
+++ b/kernel/rcusync.c
@@ -6,18 +6,27 @@ enum { CB_IDLE = 0, CB_PENDING, CB_REPLAY };
 
 #define	rss_lock	gp_wait.lock
 
+#ifdef CONFIG_PROVE_RCU
+#define __INIT_HELD(func)	.held = func,
+#else
+#define	__INIT_HELD(func)
+#endif
+
 struct rcu_sync_ops rcu_sync_ops_array[] = {
 	[RCU_SYNC] = {
 		.sync = synchronize_rcu,
 		.call = call_rcu,
+		__INIT_HELD(rcu_read_lock_held)
 	},
 	[RCU_SCHED_SYNC] = {
 		.sync = synchronize_sched,
 		.call = call_rcu_sched,
+		__INIT_HELD(rcu_read_lock_sched_held)
 	},
 	[RCU_BH_SYNC] = {
 		.sync = synchronize_rcu_bh,
 		.call = call_rcu_bh,
+		__INIT_HELD(rcu_read_lock_bh_held)
 	},
 };
 


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rcu: Create rcu_sync infrastructure
  2013-10-03 19:32           ` Oleg Nesterov
@ 2013-10-03 19:33             ` Oleg Nesterov
  2013-10-03 19:50               ` Paul E. McKenney
  0 siblings, 1 reply; 46+ messages in thread
From: Oleg Nesterov @ 2013-10-03 19:33 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On 10/03, Oleg Nesterov wrote:
>
> Because I believe this needs another patch ;) see below, didn't test
> it yet.
> ...
>  struct rcu_sync_ops {
>  	void (*sync)(void);
>  	void (*call)(struct rcu_head *, void (*)(struct rcu_head *));
> +#ifdef CONFIG_PROVE_RCU
> +	bool (*held)(void);
        ^^^^
OK, it has to return "int".

Oleg.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rcu: Create rcu_sync infrastructure
  2013-10-03 19:33             ` Oleg Nesterov
@ 2013-10-03 19:50               ` Paul E. McKenney
  2013-10-03 20:00                 ` Oleg Nesterov
  0 siblings, 1 reply; 46+ messages in thread
From: Paul E. McKenney @ 2013-10-03 19:50 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Peter Zijlstra, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On Thu, Oct 03, 2013 at 09:33:19PM +0200, Oleg Nesterov wrote:
> On 10/03, Oleg Nesterov wrote:
> >
> > Because I believe this needs another patch ;) see below, didn't test
> > it yet.
> > ...
> >  struct rcu_sync_ops {
> >  	void (*sync)(void);
> >  	void (*call)(struct rcu_head *, void (*)(struct rcu_head *));
> > +#ifdef CONFIG_PROVE_RCU
> > +	bool (*held)(void);
>         ^^^^
> OK, it has to return "int".

I missed this, but the rest looked good.  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rcu: Create rcu_sync infrastructure
  2013-10-03 19:50               ` Paul E. McKenney
@ 2013-10-03 20:00                 ` Oleg Nesterov
  2013-10-03 21:10                   ` Oleg Nesterov
  2013-10-04  7:00                   ` Peter Zijlstra
  0 siblings, 2 replies; 46+ messages in thread
From: Oleg Nesterov @ 2013-10-03 20:00 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On 10/03, Paul E. McKenney wrote:
>
> On Thu, Oct 03, 2013 at 09:33:19PM +0200, Oleg Nesterov wrote:
> > On 10/03, Oleg Nesterov wrote:
> > >
> > > Because I believe this needs another patch ;) see below, didn't test
> > > it yet.
> > > ...
> > >  struct rcu_sync_ops {
> > >  	void (*sync)(void);
> > >  	void (*call)(struct rcu_head *, void (*)(struct rcu_head *));
> > > +#ifdef CONFIG_PROVE_RCU
> > > +	bool (*held)(void);
> >         ^^^^
> > OK, it has to return "int".
>
> I missed this, but the rest looked good.  ;-)

OK thanks ;)

So unless Peter objects I'll write the changelogs (always a nontrivial task),
test, and send these 2 patches + "add ops->barr() / rcu_sync_wait_for_cb"
tomorrow.

Oleg.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rcu: Create rcu_sync infrastructure
  2013-10-03 18:40     ` Peter Zijlstra
  2013-10-03 18:45       ` Paul E. McKenney
  2013-10-03 18:47       ` Oleg Nesterov
@ 2013-10-03 20:14       ` Paolo Bonzini
  2013-10-04  7:01         ` Peter Zijlstra
  2 siblings, 1 reply; 46+ messages in thread
From: Paolo Bonzini @ 2013-10-03 20:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, Oleg Nesterov, Mel Gorman, Rik van Riel,
	Srikar Dronamraju, Ingo Molnar, Andrea Arcangeli,
	Johannes Weiner, Thomas Gleixner, Steven Rostedt, Linus Torvalds,
	linux-kernel

Il 03/10/2013 20:40, Peter Zijlstra ha scritto:
> On Thu, Oct 03, 2013 at 09:41:17AM -0700, Paul E. McKenney wrote:
>> On Wed, Oct 02, 2013 at 04:56:57PM +0200, Peter Zijlstra wrote:
>> #ifdef CONFIG_PROVE_RCU
>> #define rcu_sync_is_idle_check(rss) BUG_ON(!rss->read_side_check())
>> #else
>> #define rcu_sync_is_idle_check(rss) do { } while (0)
>> #endif
>>
>> 	rcu_sync_is_idle_check(rss);
> 
> The below actually seems to build, although I didn't test the config
> permutations, only defconfig.
> 
> ---
> --- a/include/linux/rcusync.h
> +++ b/include/linux/rcusync.h
> @@ -14,6 +14,7 @@ struct rcu_sync_struct {
>  
>  	void (*sync)(void);
>  	void (*call)(struct rcu_head *, void (*)(struct rcu_head *));
> +	int  (*held)(void);
>  };
>  
>  #define ___RCU_SYNC_INIT(name)						\
> @@ -26,18 +27,21 @@ struct rcu_sync_struct {
>  	___RCU_SYNC_INIT(name),						\
>  	.sync = synchronize_sched,					\
>  	.call = call_rcu_sched,						\
> +	.held = rcu_read_lock_sched_held,				\
>  }
>  
>  #define __RCU_BH_SYNC_INIT(name) {					\
>  	___RCU_SYNC_INIT(name),						\
>  	.sync = synchronize_rcu_bh,					\
>  	.call = call_rcu_bh,						\
> +	.held = rcu_read_lock_bh_held,					\
>  }
>  
>  #define __RCU_SYNC_INIT(name) {						\
>  	___RCU_SYNC_INIT(name),						\
>  	.sync = synchronize_rcu,					\
>  	.call = call_rcu,						\
> +	.held = rcu_read_lock_held,					\
>  }
>  
>  #define DEFINE_RCU_SCHED_SYNC(name)					\
> @@ -51,6 +55,9 @@ struct rcu_sync_struct {
>  
>  static inline bool rcu_sync_is_idle(struct rcu_sync_struct *rss)
>  {
> +#ifdef CONFIG_PROVE_RCU
> +	BUG_ON(!rss->held());
> +#endif
>  	return !rss->gp_state; /* GP_IDLE */
>  }
>  
> 

Do you need to do the same in rcu_sync_init?

Paolo

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rcu: Create rcu_sync infrastructure
  2013-10-03 20:00                 ` Oleg Nesterov
@ 2013-10-03 21:10                   ` Oleg Nesterov
  2013-10-03 22:00                     ` Paul E. McKenney
  2013-10-04  7:18                     ` Peter Zijlstra
  2013-10-04  7:00                   ` Peter Zijlstra
  1 sibling, 2 replies; 46+ messages in thread
From: Oleg Nesterov @ 2013-10-03 21:10 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On 10/03, Oleg Nesterov wrote:
>
> So unless Peter objects I'll write the changelogs (always nontrivial task),
> test, and send these 2 patches + "add ops->barr() / rcu_sync_wait_for_cb"
> tomorrow.

And, can't resist, probably another patch below (incomplete, needs the
obvious change in cpu.c and the new trivial __complete_locked() helper).

Hmm. But when I look at wait_for_completion() I think it is buggy. Will
try to send the fix tomorrow.

Oleg.

rcusync: introduce rcu_sync_struct->exclusive mode

CHANGELOG.
---

diff --git a/include/linux/rcusync.h b/include/linux/rcusync.h
index ab787c1..265875c 100644
--- a/include/linux/rcusync.h
+++ b/include/linux/rcusync.h
@@ -1,8 +1,8 @@
 #ifndef _LINUX_RCUSYNC_H_
 #define _LINUX_RCUSYNC_H_
 
-#include <linux/wait.h>
 #include <linux/rcupdate.h>
+#include <linux/completion.h>
 
 struct rcu_sync_ops {
 	void (*sync)(void);
@@ -15,11 +15,12 @@ struct rcu_sync_ops {
 struct rcu_sync_struct {
 	int			gp_state;
 	int			gp_count;
-	wait_queue_head_t	gp_wait;
+	struct completion	gp_comp;
 
 	int			cb_state;
 	struct rcu_head		cb_head;
 
+	bool			exclusive;
 	struct rcu_sync_ops	*ops;
 };
 
@@ -33,31 +34,33 @@ static inline bool rcu_sync_is_idle(struct rcu_sync_struct *rss)
 
 enum rcu_sync_type { RCU_SYNC, RCU_SCHED_SYNC, RCU_BH_SYNC };
 
-extern void rcu_sync_init(struct rcu_sync_struct *, enum rcu_sync_type);
+extern void rcu_sync_init(struct rcu_sync_struct *,
+				enum rcu_sync_type, bool excl);
 extern void rcu_sync_enter(struct rcu_sync_struct *);
 extern void rcu_sync_exit(struct rcu_sync_struct *);
 
 extern struct rcu_sync_ops rcu_sync_ops_array[];
 
-#define __RCU_SYNC_INITIALIZER(name, type) {				\
+#define __RCU_SYNC_INITIALIZER(name, type, excl) {			\
 		.gp_state = 0,						\
 		.gp_count = 0,						\
-		.gp_wait = __WAIT_QUEUE_HEAD_INITIALIZER(name.gp_wait),	\
+		.gp_comp = COMPLETION_INITIALIZER(name.gp_comp),	\
 		.cb_state = 0,						\
+		.exclusive = excl,					\
 		.ops = rcu_sync_ops_array + (type),			\
 	}
 
-#define	__DEFINE_RCU_SYNC(name, type)	\
-	struct rcu_sync_struct name = __RCU_SYNC_INITIALIZER(name, type)
+#define	__DEFINE_RCU_SYNC(name, type, excl)	\
+	struct rcu_sync_struct name = __RCU_SYNC_INITIALIZER(name, type, excl)
 
-#define DEFINE_RCU_SYNC(name)		\
-	__DEFINE_RCU_SYNC(name, RCU_SYNC)
+#define DEFINE_RCU_SYNC(name, excl)		\
+	__DEFINE_RCU_SYNC(name, RCU_SYNC, excl)
 
-#define DEFINE_RCU_SCHED_SYNC(name)	\
-	__DEFINE_RCU_SYNC(name, RCU_SCHED_SYNC)
+#define DEFINE_RCU_SCHED_SYNC(name, excl)	\
+	__DEFINE_RCU_SYNC(name, RCU_SCHED_SYNC, excl)
 
-#define DEFINE_RCU_BH_SYNC(name)	\
-	__DEFINE_RCU_SYNC(name, RCU_BH_SYNC)
+#define DEFINE_RCU_BH_SYNC(name, excl)		\
+	__DEFINE_RCU_SYNC(name, RCU_BH_SYNC, excl)
 
 #endif /* _LINUX_RCUSYNC_H_ */
 
diff --git a/kernel/rcusync.c b/kernel/rcusync.c
index 21cde9b..d8068df 100644
--- a/kernel/rcusync.c
+++ b/kernel/rcusync.c
@@ -4,7 +4,7 @@
 enum { GP_IDLE = 0, GP_PENDING, GP_PASSED };
 enum { CB_IDLE = 0, CB_PENDING, CB_REPLAY };
 
-#define	rss_lock	gp_wait.lock
+#define	rss_lock	gp_comp.wait.lock
 
 #ifdef CONFIG_PROVE_RCU
 #define __INIT_HELD(func)	.held = func,
@@ -30,11 +30,13 @@ struct rcu_sync_ops rcu_sync_ops_array[] = {
 	},
 };
 
-void rcu_sync_init(struct rcu_sync_struct *rss, enum rcu_sync_type type)
+void rcu_sync_init(struct rcu_sync_struct *rss,
+			enum rcu_sync_type type, bool excl)
 {
 	memset(rss, 0, sizeof(*rss));
-	init_waitqueue_head(&rss->gp_wait);
+	init_completion(&rss->gp_comp);
 	rss->ops = rcu_sync_ops_array + type;
+	rss->exclusive = excl;
 }
 
 void rcu_sync_enter(struct rcu_sync_struct *rss)
@@ -53,9 +55,13 @@ void rcu_sync_enter(struct rcu_sync_struct *rss)
 	if (need_sync) {
 		rss->ops->sync();
 		rss->gp_state = GP_PASSED;
-		wake_up_all(&rss->gp_wait);
+		if (!rss->exclusive)
+			wake_up_all(&rss->gp_comp.wait);
 	} else if (need_wait) {
-		wait_event(rss->gp_wait, rss->gp_state == GP_PASSED);
+		if (!rss->exclusive)
+			wait_event(rss->gp_comp.wait, rss->gp_state == GP_PASSED);
+		else
+			wait_for_completion(&rss->gp_comp);
 	} else {
 		/*
 		 * Possible when there's a pending CB from a rcu_sync_exit().
@@ -110,6 +116,8 @@ void rcu_sync_exit(struct rcu_sync_struct *rss)
 		} else if (rss->cb_state == CB_PENDING) {
 			rss->cb_state = CB_REPLAY;
 		}
+	} else if (rss->exclusive) {
+		__complete_locked(&rss->gp_comp);
 	}
 	spin_unlock_irq(&rss->rss_lock);
 }
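
For reference, the "trivial __complete_locked() helper" mentioned above would presumably look something like the following. It is not part of the patch; it simply mirrors complete() from kernel/sched/core.c minus the lock acquisition, since rcu_sync_exit() already holds ->gp_comp.wait.lock via rss_lock:

/* hypothetical helper: complete() with ->wait.lock already held */
static inline void __complete_locked(struct completion *x)
{
	x->done++;
	__wake_up_locked(&x->wait, TASK_NORMAL, 1);
}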


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rcu: Create rcu_sync infrastructure
  2013-10-03 21:10                   ` Oleg Nesterov
@ 2013-10-03 22:00                     ` Paul E. McKenney
  2013-10-04 11:29                       ` Oleg Nesterov
  2013-10-04  7:18                     ` Peter Zijlstra
  1 sibling, 1 reply; 46+ messages in thread
From: Paul E. McKenney @ 2013-10-03 22:00 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Peter Zijlstra, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On Thu, Oct 03, 2013 at 11:10:09PM +0200, Oleg Nesterov wrote:
> On 10/03, Oleg Nesterov wrote:
> >
> > So unless Peter objects I'll write the changelogs (always nontrivial task),
> > test, and send these 2 patches + "add ops->barr() / rcu_sync_wait_for_cb"
> > tomorrow.
> 
> And, can't resist, probably another patch below (incomplete, needs the
> obvious change in cpu.c and the new trivial __complete_locked() helper).
> 
> Hmm. But when I look at wait_for_completion() I think it is buggy. Will
> try to send the fix tomorrow.
> 
> Oleg.
> 
> rcusync: introduce rcu_sync_struct->exclusive mode
> 
> CHANGELOG.

Should the changelog really be in all caps?  (Sorry, couldn't resist...)

One confusion below (mine), otherwise cannot see any obvious problems
with it.  Not that this should count for much, getting a bit punch-drunk
reviewing this sort of code.  ;-)

							Thanx, Paul

> ---
> 
> diff --git a/include/linux/rcusync.h b/include/linux/rcusync.h
> index ab787c1..265875c 100644
> --- a/include/linux/rcusync.h
> +++ b/include/linux/rcusync.h
> @@ -1,8 +1,8 @@
>  #ifndef _LINUX_RCUSYNC_H_
>  #define _LINUX_RCUSYNC_H_
> 
> -#include <linux/wait.h>
>  #include <linux/rcupdate.h>
> +#include <linux/completion.h>
> 
>  struct rcu_sync_ops {
>  	void (*sync)(void);
> @@ -15,11 +15,12 @@ struct rcu_sync_ops {
>  struct rcu_sync_struct {
>  	int			gp_state;
>  	int			gp_count;
> -	wait_queue_head_t	gp_wait;
> +	struct completion	gp_comp;
> 
>  	int			cb_state;
>  	struct rcu_head		cb_head;
> 
> +	bool			exclusive;
>  	struct rcu_sync_ops	*ops;
>  };
> 
> @@ -33,31 +34,33 @@ static inline bool rcu_sync_is_idle(struct rcu_sync_struct *rss)
> 
>  enum rcu_sync_type { RCU_SYNC, RCU_SCHED_SYNC, RCU_BH_SYNC };
> 
> -extern void rcu_sync_init(struct rcu_sync_struct *, enum rcu_sync_type);
> +extern void rcu_sync_init(struct rcu_sync_struct *,
> +				enum rcu_sync_type, bool excl);
>  extern void rcu_sync_enter(struct rcu_sync_struct *);
>  extern void rcu_sync_exit(struct rcu_sync_struct *);
> 
>  extern struct rcu_sync_ops rcu_sync_ops_array[];
> 
> -#define __RCU_SYNC_INITIALIZER(name, type) {				\
> +#define __RCU_SYNC_INITIALIZER(name, type, excl) {			\
>  		.gp_state = 0,						\
>  		.gp_count = 0,						\
> -		.gp_wait = __WAIT_QUEUE_HEAD_INITIALIZER(name.gp_wait),	\
> +		.gp_comp = COMPLETION_INITIALIZER(name.gp_comp),	\
>  		.cb_state = 0,						\
> +		.exclusive = excl,					\
>  		.ops = rcu_sync_ops_array + (type),			\
>  	}
> 
> -#define	__DEFINE_RCU_SYNC(name, type)	\
> -	struct rcu_sync_struct name = __RCU_SYNC_INITIALIZER(name, type)
> +#define	__DEFINE_RCU_SYNC(name, type, excl)	\
> +	struct rcu_sync_struct name = __RCU_SYNC_INITIALIZER(name, type, excl)
> 
> -#define DEFINE_RCU_SYNC(name)		\
> -	__DEFINE_RCU_SYNC(name, RCU_SYNC)
> +#define DEFINE_RCU_SYNC(name, excl)		\
> +	__DEFINE_RCU_SYNC(name, RCU_SYNC, excl)
> 
> -#define DEFINE_RCU_SCHED_SYNC(name)	\
> -	__DEFINE_RCU_SYNC(name, RCU_SCHED_SYNC)
> +#define DEFINE_RCU_SCHED_SYNC(name, excl)	\
> +	__DEFINE_RCU_SYNC(name, RCU_SCHED_SYNC, excl)
> 
> -#define DEFINE_RCU_BH_SYNC(name)	\
> -	__DEFINE_RCU_SYNC(name, RCU_BH_SYNC)
> +#define DEFINE_RCU_BH_SYNC(name, excl)		\
> +	__DEFINE_RCU_SYNC(name, RCU_BH_SYNC, excl)
> 
>  #endif /* _LINUX_RCUSYNC_H_ */
> 
> diff --git a/kernel/rcusync.c b/kernel/rcusync.c
> index 21cde9b..d8068df 100644
> --- a/kernel/rcusync.c
> +++ b/kernel/rcusync.c
> @@ -4,7 +4,7 @@
>  enum { GP_IDLE = 0, GP_PENDING, GP_PASSED };
>  enum { CB_IDLE = 0, CB_PENDING, CB_REPLAY };
> 
> -#define	rss_lock	gp_wait.lock
> +#define	rss_lock	gp_comp.wait.lock
> 
>  #ifdef CONFIG_PROVE_RCU
>  #define __INIT_HELD(func)	.held = func,
> @@ -30,11 +30,13 @@ struct rcu_sync_ops rcu_sync_ops_array[] = {
>  	},
>  };
> 
> -void rcu_sync_init(struct rcu_sync_struct *rss, enum rcu_sync_type type)
> +void rcu_sync_init(struct rcu_sync_struct *rss,
> +			enum rcu_sync_type type, bool excl)
>  {
>  	memset(rss, 0, sizeof(*rss));
> -	init_waitqueue_head(&rss->gp_wait);
> +	init_completion(&rss->gp_comp);
>  	rss->ops = rcu_sync_ops_array + type;
> +	rss->exclusive = excl;
>  }
> 
>  void rcu_sync_enter(struct rcu_sync_struct *rss)
> @@ -53,9 +55,13 @@ void rcu_sync_enter(struct rcu_sync_struct *rss)
>  	if (need_sync) {
>  		rss->ops->sync();
>  		rss->gp_state = GP_PASSED;
> -		wake_up_all(&rss->gp_wait);
> +		if (!rss->exclusive)
> +			wake_up_all(&rss->gp_comp.wait);

Not sure about the wake_up_all() on a completion, but if we are exclusive,
don't we need to complete() the completion here?

Oh, I guess gp_comp.wait is exactly a wait_queue_head_t, so I guess
you can do wake_up_all() on it...  And the thing we would wake up
is ourselves, and we are already awake.

Never mind!!!

>  	} else if (need_wait) {
> -		wait_event(rss->gp_wait, rss->gp_state == GP_PASSED);
> +		if (!rss->exclusive)
> +			wait_event(rss->gp_comp.wait, rss->gp_state == GP_PASSED);
> +		else
> +			wait_for_completion(&rss->gp_comp);
>  	} else {
>  		/*
>  		 * Possible when there's a pending CB from a rcu_sync_exit().
> @@ -110,6 +116,8 @@ void rcu_sync_exit(struct rcu_sync_struct *rss)
>  		} else if (rss->cb_state == CB_PENDING) {
>  			rss->cb_state = CB_REPLAY;
>  		}
> +	} else if (rss->exclusive) {
> +		__complete_locked(&rss->gp_comp);
>  	}
>  	spin_unlock_irq(&rss->rss_lock);
>  }
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rcu: Create rcu_sync infrastructure
  2013-10-03 20:00                 ` Oleg Nesterov
  2013-10-03 21:10                   ` Oleg Nesterov
@ 2013-10-04  7:00                   ` Peter Zijlstra
  1 sibling, 0 replies; 46+ messages in thread
From: Peter Zijlstra @ 2013-10-04  7:00 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Paul E. McKenney, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On Thu, Oct 03, 2013 at 10:00:02PM +0200, Oleg Nesterov wrote:
> On 10/03, Paul E. McKenney wrote:
> >
> > On Thu, Oct 03, 2013 at 09:33:19PM +0200, Oleg Nesterov wrote:
> > > On 10/03, Oleg Nesterov wrote:
> > > >
> > > > Because I believe this needs another patch ;) see below, didn't test
> > > > it yet.
> > > > ...
> > > >  struct rcu_sync_ops {
> > > >  	void (*sync)(void);
> > > >  	void (*call)(struct rcu_head *, void (*)(struct rcu_head *));
> > > > +#ifdef CONFIG_PROVE_RCU
> > > > +	bool (*held)(void);
> > >         ^^^^
> > > OK, it has to return "int".
> >
> > I missed this, but the rest looked good.  ;-)
> 
> OK thanks ;)
> 
> So unless Peter objects I'll write the changelogs (always nontrivial task),
> test, and send these 2 patches + "add ops->barr() / rcu_sync_wait_for_cb"
> tomorrow.

No, I'm fine with that.. it's sad that the ops thing requires a double
deref and an extra cache miss, but I don't suppose that's a real issue
since we're calling synchronize_*() things which take forever anyway
:-)



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rcu: Create rcu_sync infrastructure
  2013-10-03 20:14       ` Paolo Bonzini
@ 2013-10-04  7:01         ` Peter Zijlstra
  0 siblings, 0 replies; 46+ messages in thread
From: Peter Zijlstra @ 2013-10-04  7:01 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Paul E. McKenney, Oleg Nesterov, Mel Gorman, Rik van Riel,
	Srikar Dronamraju, Ingo Molnar, Andrea Arcangeli,
	Johannes Weiner, Thomas Gleixner, Steven Rostedt, Linus Torvalds,
	linux-kernel

On Thu, Oct 03, 2013 at 10:14:57PM +0200, Paolo Bonzini wrote:
> Do you need to do the same in rcu_sync_init?

Yes.. I actually did, but only sent the diff for the header file :/

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rcu: Create rcu_sync infrastructure
  2013-10-03 21:10                   ` Oleg Nesterov
  2013-10-03 22:00                     ` Paul E. McKenney
@ 2013-10-04  7:18                     ` Peter Zijlstra
  2013-10-04 11:15                       ` Oleg Nesterov
  1 sibling, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2013-10-04  7:18 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Paul E. McKenney, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On Thu, Oct 03, 2013 at 11:10:09PM +0200, Oleg Nesterov wrote:
> On 10/03, Oleg Nesterov wrote:
> >
> > So unless Peter objects I'll write the changelogs (always nontrivial task),
> > test, and send these 2 patches + "add ops->barr() / rcu_sync_wait_for_cb"
> > tomorrow.
> 
> And, can't resist, probably another patch below (incomplete, needs the
> obvious change in cpu.c and the new trivial __complete_locked() helper).
> 
> Hmm. But when I look at wait_for_completion() I think it is buggy. Will
> try to send the fix tomorrow.
> 
> Oleg.
> 
> rcusync: introduce rcu_sync_struct->exclusive mode

What is exclusive supposed to mean? One writer at a time? Why not use the
waitqueue in exclusive mode and a single wake_up() instead of
wake_up_all()?


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rcu: Create rcu_sync infrastructure
  2013-10-04  7:18                     ` Peter Zijlstra
@ 2013-10-04 11:15                       ` Oleg Nesterov
  2013-10-04 11:36                         ` Peter Zijlstra
  2013-10-04 11:44                         ` Peter Zijlstra
  0 siblings, 2 replies; 46+ messages in thread
From: Oleg Nesterov @ 2013-10-04 11:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On 10/04, Peter Zijlstra wrote:
>
> On Thu, Oct 03, 2013 at 11:10:09PM +0200, Oleg Nesterov wrote:
> > On 10/03, Oleg Nesterov wrote:
> > >
> > > So unless Peter objects I'll write the changelogs (always nontrivial task),
> > > test, and send these 2 patches + "add ops->barr() / rcu_sync_wait_for_cb"
> > > tomorrow.
> >
> > And, can't resist, probably another patch below (incomplete, needs the
> > obvious change in cpu.c and the new trivial __complete_locked() helper).
> >
> > Hmm. But when I look at wait_for_completion() I think it is buggy.

No, sorry for the noise, it is fine.

> > rcusync: introduce rcu_sync_struct->exclusive mode
>
> What's exclusive to mean? One writer at a time?

Yes,

> Why not use the
> waitqueue in exclusive mode and use a single wake_up instead of
> wake_up_all()?

But this won't work: wait_event_exclusive(wq, CONDITION)-like code
obviously can't guarantee that only a single thread sees CONDITION,
even if "unlock" does __wake_up(nr_exclusive => 1).

Of course we can fix this, but wait_for_completion/complete already
does the necessary work: x->done acts as a resource counter which is
always checked/incremented/decremented under the same lock.
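
To illustrate, a standalone sketch (not the rcusync patch itself; the
names below are made up) of a completion used as a one-slot semaphore:

	#include <linux/completion.h>

	struct one_slot {
		struct completion free;	/* ->done counts free slots */
	};

	static void one_slot_init(struct one_slot *s)
	{
		init_completion(&s->free);
		complete(&s->free);		/* start with one free slot */
	}

	static void one_slot_acquire(struct one_slot *s)
	{
		/* checks and decrements ->done under ->wait.lock */
		wait_for_completion(&s->free);
	}

	static void one_slot_release(struct one_slot *s)
	{
		/* increments ->done and wakes exactly one waiter */
		complete(&s->free);
	}

Per complete() exactly one acquirer gets past wait_for_completion(),
which is exactly the "one writer at a time" behaviour we want.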

Oleg.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rcu: Create rcu_sync infrastructure
  2013-10-03 22:00                     ` Paul E. McKenney
@ 2013-10-04 11:29                       ` Oleg Nesterov
  2013-10-04 16:22                         ` Paul E. McKenney
  0 siblings, 1 reply; 46+ messages in thread
From: Oleg Nesterov @ 2013-10-04 11:29 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On 10/03, Paul E. McKenney wrote:
>
> On Thu, Oct 03, 2013 at 11:10:09PM +0200, Oleg Nesterov wrote:
> >
> > rcusync: introduce rcu_sync_struct->exclusive mode
> >
> > CHANGELOG.
>
> Should the changelog really be in all caps?  (Sorry, couldn't resist...)

Apparently you do not realize it is going to be an EXCELLENT changelog!

> > @@ -53,9 +55,13 @@ void rcu_sync_enter(struct rcu_sync_struct *rss)
> >  	if (need_sync) {
> >  		rss->ops->sync();
> >  		rss->gp_state = GP_PASSED;
> > -		wake_up_all(&rss->gp_wait);
> > +		if (!rss->exclusive)
> > +			wake_up_all(&rss->gp_comp.wait);
>
> Not sure about the wake_up_all() on a completion,

Yes, we reuse completion->wait in the !exclusive case. Like we reuse
its spinlock as rss_lock.

We can add a completion/complete union, but this will complicate the
code a bit and imo doesn't make sense.

> but if we are exclusive,
> don't we need to complete() the completion here?

No, if we are exclusive we should delay the "wake up the next writer"
till rcu_sync_exit().

> Oh, I guess gp_comp.wait is exactly a wait_queue_head_t, so I guess
> you can do wake_up_all() on it...

Yes, and we never "mix" completion/wait_queue_head_t operations/members.
IOW, we always use ->gp_comp if "exclusive", and only ->gp_comp.wait is
used otherwise.
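
Roughly, the writer side then looks like this (a sketch of the intent
only, not the actual patch; the _sketch name and the writer_wait /
writer_wake helpers are made up):

	#include <linux/types.h>
	#include <linux/completion.h>
	#include <linux/wait.h>

	struct rcu_sync_sketch {
		bool			exclusive;
		int			gp_state;	/* nonzero == GP_PASSED */
		struct completion	gp_comp;
	};

	static void writer_wait(struct rcu_sync_sketch *rss)
	{
		if (rss->exclusive)
			/* whole completion: one writer per complete() */
			wait_for_completion(&rss->gp_comp);
		else
			/* only the embedded waitqueue: all writers proceed */
			wait_event(rss->gp_comp.wait, rss->gp_state);
	}

	static void writer_wake(struct rcu_sync_sketch *rss)
	{
		if (rss->exclusive)
			complete(&rss->gp_comp);		/* from rcu_sync_exit() */
		else
			wake_up_all(&rss->gp_comp.wait);	/* after ->sync() */
	}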

> Never mind!!!

Agreed ;)

Oleg.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rcu: Create rcu_sync infrastructure
  2013-10-04 11:15                       ` Oleg Nesterov
@ 2013-10-04 11:36                         ` Peter Zijlstra
  2013-10-04 11:50                           ` Oleg Nesterov
  2013-10-04 11:44                         ` Peter Zijlstra
  1 sibling, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2013-10-04 11:36 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Paul E. McKenney, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On Fri, Oct 04, 2013 at 01:15:13PM +0200, Oleg Nesterov wrote:
> Of course we can fix this, but wait_for_completion/complete already
> does the necessary work: x->done acts as a resource counter which is
> always checked/incremented/decremented under the same lock.

Urgh, so now you're not using a semaphore as completion but using a
completion as semaphore.



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rcu: Create rcu_sync infrastructure
  2013-10-04 11:15                       ` Oleg Nesterov
  2013-10-04 11:36                         ` Peter Zijlstra
@ 2013-10-04 11:44                         ` Peter Zijlstra
  2013-10-04 12:13                           ` Oleg Nesterov
  1 sibling, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2013-10-04 11:44 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Paul E. McKenney, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On Fri, Oct 04, 2013 at 01:15:13PM +0200, Oleg Nesterov wrote:
> > What's exclusive to mean? One writer at a time?
> 
> Yes,

I'm not entirely sure what the advantage is of having that logic in this
primitive. Shouldn't that be something the user of this rcu_sync stuff
does (or not) depending on its needs.



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rcu: Create rcu_sync infrastructure
  2013-10-04 11:36                         ` Peter Zijlstra
@ 2013-10-04 11:50                           ` Oleg Nesterov
  0 siblings, 0 replies; 46+ messages in thread
From: Oleg Nesterov @ 2013-10-04 11:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On 10/04, Peter Zijlstra wrote:
>
> On Fri, Oct 04, 2013 at 01:15:13PM +0200, Oleg Nesterov wrote:
> > Of course we can fix this, but wait_for_completion/complete already
> > does the necessary work: x->done acts as a resource counter which is
> > always checked/incremented/decremented under the same lock.
>
> Urgh, so now you're not using a semaphore as completion but using a
> completion as semaphore.

Yes. And we can use it as a semaphore only because we already have gp_count.

Of course, we could add ->gp_mutex and change enter/exit to lock/unlock
it if exclusive. But I do not really like it even if this can make the
code a bit (only a little bit) simpler. This will increase sizeof(), and
for no reason, I think, because to some degree this will also duplicate
the synchronization logic. And while this doesn't really matter, this
will penalize the contending writers a bit: it is not easy (but of
course possible) to take this mutex after rss->gp_count++, and we do
want to increment this counter "asap".

Oleg.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rcu: Create rcu_sync infrastructure
  2013-10-04 11:44                         ` Peter Zijlstra
@ 2013-10-04 12:13                           ` Oleg Nesterov
  2013-10-04 12:38                             ` Peter Zijlstra
  0 siblings, 1 reply; 46+ messages in thread
From: Oleg Nesterov @ 2013-10-04 12:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On 10/04, Peter Zijlstra wrote:
>
> On Fri, Oct 04, 2013 at 01:15:13PM +0200, Oleg Nesterov wrote:
> > > What's exclusive to mean? One writer at a time?
> >
> > Yes,
>
> I'm not entirely sure what the advantage is of having that logic in this
> primitive. Shouldn't that be something the user of this rcu_sync stuff
> does (or not) depending on its needs.

Yes, the user can do the locking itself. But I think this option can help.
If nothing else it can help to avoid another mutex/whatever and unnecessary
wakeups/schedules, even if this is minor.

And. rcu_sync_enter() should be "bool", it should return "need_sync". IOW,
rcu_sync_enter() == T means that this thread has done the FAST -> SLOW
transition, this is particularly useful in "exclusive" mode.

Consider percpu_down_write(). It takes rw_sem for writing (and this blocks
the readers) before clear_fast_ctr(), but we only need to do this
after sync_sched(), so it could do

	if (rcu_sync_enter(&brw->rcu_sync))
		atomic_add(clear_fast_ctr(brw), &brw->slow_read_ctr);
	else
		; /* the above was already done */

	/* exclude readers */
	down_write(&brw->rw_sem);

and now ->rw_sem is only needed to serialize readers/writer.

Sure, this all is minor (and we will probably copy the "pending writer"
logic from cpu_hotplug_begin/get_online_cpus).

But we can get this feature almost for free, so I think it makes sense.

Oleg.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rcu: Create rcu_sync infrastructure
  2013-10-04 12:13                           ` Oleg Nesterov
@ 2013-10-04 12:38                             ` Peter Zijlstra
  2013-10-04 13:31                               ` Oleg Nesterov
  0 siblings, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2013-10-04 12:38 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Paul E. McKenney, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On Fri, Oct 04, 2013 at 02:13:00PM +0200, Oleg Nesterov wrote:
> On 10/04, Peter Zijlstra wrote:
> >
> > On Fri, Oct 04, 2013 at 01:15:13PM +0200, Oleg Nesterov wrote:
> > > > What's exclusive to mean? One writer at a time?
> > >
> > > Yes,
> >
> > I'm not entirely sure what the advantage is of having that logic in this
> > primitive. Shouldn't that be something the user of this rcu_sync stuff
> > does (or not) depending on its needs.
> 
> Yes, the user can do the locking itself. But I think this option can help.
> If nothing else it can help to avoid another mutex/whatever and unnecessary
> wakeups/schedules, even if this is minor.
> 
> And. rcu_sync_enter() should be "bool", it should return "need_sync". IOW,
> rcu_sync_enter() == T means that this thread has done the FAST -> SLOW
> transition, this is particularly useful in "exclusive" mode.
> 
> Consider percpu_down_write(). It takes rw_sem for writing (and this blocks
> the readers) before clear_fast_ctr(), but we only need to do this
> after sync_sched(), so it could do
> 
> 	if (rcu_sync_enter(&brw->rcu_sync))
> 		atomic_add(clear_fast_ctr(brw), &brw->slow_read_ctr);
> 	else
> 		; /* the above was already done */
> 
> 	/* exclude readers */
> 	down_write(&brw->rw_sem);
> 
> and now ->rw_sem is only needed to serialize readers/writer.
> 
> Sure, this all is minor (and we will probably copy the "pending writer"
> logic from cpu_hotplug_begin/get_online_cpus).
> 
> But we can get this feature almost for free, so I think it makes sense.

Well, the whole reason I asked is because adding that completion in
there didn't at all smell like free to me; not to mention that I hadn't
at all realized you're using it as a semaphore.

Also; what would be the use once you convert the per-cpu rwsem over to
the scheme I used with hotplug?

I'm really starting to think we shouldn't do this in rcu_sync at all.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rcu: Create rcu_sync infrastructure
  2013-10-04 12:38                             ` Peter Zijlstra
@ 2013-10-04 13:31                               ` Oleg Nesterov
  2013-10-04 14:43                                 ` Peter Zijlstra
  0 siblings, 1 reply; 46+ messages in thread
From: Oleg Nesterov @ 2013-10-04 13:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On 10/04, Peter Zijlstra wrote:
>
> On Fri, Oct 04, 2013 at 02:13:00PM +0200, Oleg Nesterov wrote:
> > On 10/04, Peter Zijlstra wrote:
> > >
> > > I'm not entirely sure what the advantage is of having that logic in this
> > > primitive. Shouldn't that be something the user of this rcu_sync stuff
> > > does (or not) depending on its needs.
> >
> > Yes, the user can do the locking itself. But I think this option can help.
> > If nothing else it can help to avoid another mutex/whatever and unnecessary
> > wakeups/schedules, even if this is minor.
> >
> > And. rcu_sync_enter() should be "bool", it should return "need_sync". IOW,
> > rcu_sync_enter() == T means that this thread has done the FAST -> SLOW
> > transition, this is particularly useful in "exclusive" mode.
> >
> > Consider percpu_down_write(). It takes rw_sem for writing (and this blocks
> > the readers) before clear_fast_ctr(), but we only need to do this
> > after sync_sched(), so it could do
> >
> > 	if (rcu_sync_enter(&brw->rcu_sync))
> > 		atomic_add(clear_fast_ctr(brw), &brw->slow_read_ctr);
> > 	else
> > 		; /* the above was already done */
> >
> > 	/* exclude readers */
> > 	down_write(&brw->rw_sem);
> >
> > and now ->rw_sem is only needed to serialize readers/writer.
> >
> > Sure, this all is minor (and we will probably copy the "pending writer"
> > logic from cpu_hotplug_begin/get_online_cpus).
> >
> > But we can get this feature almost for free, so I think it makes sense.
>
> Well, the whole reason I asked is because adding that completion in
> there didn't at all smell like free to me;

Why? This only adds sizeof(long).

If you dislike the idea of adding the new __complete_locked() one-liner,
it is not strictly necessary; it just makes the code a bit simpler and
more understandable.

> not to mention that I hadn't
> at all realized you're using it as a semaphore.

And the logic is trivial.

> Also; what would be the use once you convert the per-cpu rwsem over to
> the scheme I used with hotplug?

It will still use the exclusive mode to block other writers? This avoids
another mutex and simplifies the code.

> I'm really starting to think we shouldn't do this in rcu_sync at all.

I do not really understand why you insist that rcu_sync() should not
try to help to the users which need the exclusive mode.

rcu_sync_enter/exit have to do some work to serialize with each other
anyway, we already have ->nr_writers, so why can't we add 3 simple
"if exclusive" checks?

Oleg.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rcu: Create rcu_sync infrastructure
  2013-10-04 13:31                               ` Oleg Nesterov
@ 2013-10-04 14:43                                 ` Peter Zijlstra
  2013-10-04 15:13                                   ` Oleg Nesterov
  0 siblings, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2013-10-04 14:43 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Paul E. McKenney, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On Fri, Oct 04, 2013 at 03:31:35PM +0200, Oleg Nesterov wrote:
> > I'm really starting to think we shouldn't do this in rcu_sync at all.
> 
> I do not really understand why you insist that rcu_sync() should not
> try to help to the users which need the exclusive mode.

But what other users; you yourself said that percpu-rwsem isn't one for
long if ever.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rcu: Create rcu_sync infrastructure
  2013-10-04 14:43                                 ` Peter Zijlstra
@ 2013-10-04 15:13                                   ` Oleg Nesterov
  2013-10-04 16:25                                     ` Peter Zijlstra
  0 siblings, 1 reply; 46+ messages in thread
From: Oleg Nesterov @ 2013-10-04 15:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On 10/04, Peter Zijlstra wrote:
>
> On Fri, Oct 04, 2013 at 03:31:35PM +0200, Oleg Nesterov wrote:
> > > I'm really starting to think we shouldn't do this in rcu_sync at all.
> >
> > I do not really understand why you insist that rcu_sync() should not
> > try to help to the users which need the exclusive mode.
>
> But what other users; you yourself said that percpu-rwsem isn't one for
> long if ever.

Not sure I understand...

percpu-rwsem will use exclusive mode, with or without the possible
improvements we can copy from cpuhp. sb_writers can probably use it
too (along with other helpers from percpu-rwsem we do not have yet).

If you mean that the "raw" rcu_sync infrastructure will never have
more users, well, I am not sure, it looks "natural". But I cannot
know for sure.

OK, I am not going to argue too much, let me send other patches
you seem to agree with. I am also going to send this one for the
record/review (without __complete_locked), please ack-or-nack it
explicitly.

I am not trying to say this feature is a "must have", of course it
is not. The only problem is that I am a bit puzzled why you dislike it
that much.

Oleg.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rcu: Create rcu_sync infrastructure
  2013-10-04 11:29                       ` Oleg Nesterov
@ 2013-10-04 16:22                         ` Paul E. McKenney
  0 siblings, 0 replies; 46+ messages in thread
From: Paul E. McKenney @ 2013-10-04 16:22 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Peter Zijlstra, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On Fri, Oct 04, 2013 at 01:29:45PM +0200, Oleg Nesterov wrote:
> On 10/03, Paul E. McKenney wrote:
> >
> > On Thu, Oct 03, 2013 at 11:10:09PM +0200, Oleg Nesterov wrote:
> > >
> > > rcusync: introduce rcu_sync_struct->exclusive mode
> > >
> > > CHANGELOG.
> >
> > Should the changelog really be in all caps?  (Sorry, couldn't resist...)
> 
> Apparently you do not realize it is going to be an EXCELLENT changelog!

;-)

> > > @@ -53,9 +55,13 @@ void rcu_sync_enter(struct rcu_sync_struct *rss)
> > >  	if (need_sync) {
> > >  		rss->ops->sync();
> > >  		rss->gp_state = GP_PASSED;
> > > -		wake_up_all(&rss->gp_wait);
> > > +		if (!rss->exclusive)
> > > +			wake_up_all(&rss->gp_comp.wait);
> >
> > Not sure about the wake_up_all() on a completion,
> 
> Yes, we reuse completion->wait in the !exclusive case. Like we reuse
> its spinlock as rss_lock.
> 
> We can add a completion/complete union, but this will complicate the
> code a bit and imo doesn't make sense.

Fair enough!

							Thanx, Paul

> > but if we are exclusive,
> > don't we need to complete() the completion here?
> 
> No, if we are exclusive we should delay the "wake up the next writer"
> till rcu_sync_exit().
> 
> > Oh, I guess gp_comp.wait is exactly a wait_queue_head_t, so I guess
> > you can do wake_up_all() on it...
> 
> Yes, and we never "mix" completion/wait_queue_head_t operations/members.
> IOW, we always use ->gp_comp if "exclusive", and only ->gp_comp.wait is
> used otherwise.
> 
> > Never mind!!!
> 
> Agreed ;)
> 
> Oleg.
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rcu: Create rcu_sync infrastructure
  2013-10-04 15:13                                   ` Oleg Nesterov
@ 2013-10-04 16:25                                     ` Peter Zijlstra
  2013-10-04 19:06                                       ` Oleg Nesterov
  0 siblings, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2013-10-04 16:25 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Paul E. McKenney, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On Fri, Oct 04, 2013 at 05:13:23PM +0200, Oleg Nesterov wrote:
> Not sure I understand...
> 
> percpu-rwsem will use exclusive mode, with or without the possible
> improvements we can copy from cpuhp. sb_writers can probably use it
> too (along with other helpers from percpu-rwsem we do not have yet).

Oh; I thought I had understood it would go away when you switched to
the reader scheme from the hotplug bits.

But thinking about it a little more: yes, you'd still need something to
serialize writers.

> If you mean that the "raw" rcu_sync infrastructure will never have
> more users, well, I am not sure, it looks "natural". But I cannot
> know for sure.

Right, so this would be something I'd forgo if there were no immediate
users. We could always reconsider if there was one; but apparently
percpu-rwsem is one..

> I am not trying to say this feature is a "must have", of course it
> is not. The only problem is that I am a bit puzzled why you dislike it
> that much.

The reason I dislike it is because I feel we're now mixing two objects
into one; one object doing mutual exclusion and one object being
terribly smart with sync_rcu.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rcu: Create rcu_sync infrastructure
  2013-10-04 16:25                                     ` Peter Zijlstra
@ 2013-10-04 19:06                                       ` Oleg Nesterov
  2013-10-04 19:41                                         ` Peter Zijlstra
  0 siblings, 1 reply; 46+ messages in thread
From: Oleg Nesterov @ 2013-10-04 19:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On 10/04, Peter Zijlstra wrote:
>
> On Fri, Oct 04, 2013 at 05:13:23PM +0200, Oleg Nesterov wrote:
>
> > I am not trying saying this feature is "must have", of course it
> > is not. The only problem, I am a bit puzzled why you dislike it
> > that much.
>
> The reason I dislike it is because I feel we're now mixing two objects
> into one; one object doing mutual exclusion and one object being
> terribly smart with sync_rcu.

OK, I see your point.

But rcu_sync_struct has to serialize the writers anyway. The only
question is how many other writers the thread doing ->sync() should
wake up, and when.

And OTOH, currently nobody needs the non-exclusive mode (cpu-hotplug
doesn't care because it is always exclusive itself). And in fact you
initially argued against wake_up_all ;) "exclusive" is more natural, it
is like rw_semaphore.

However, yes-yes-yes, I do think that we need the non-exclusive mode
too, at least for percpu_down_write_nonexclusive() which I think we
need as well.

Oleg.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rcu: Create rcu_sync infrastructure
  2013-10-04 19:06                                       ` Oleg Nesterov
@ 2013-10-04 19:41                                         ` Peter Zijlstra
  2013-10-05 17:31                                           ` Oleg Nesterov
  0 siblings, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2013-10-04 19:41 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Paul E. McKenney, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On Fri, Oct 04, 2013 at 09:06:53PM +0200, Oleg Nesterov wrote:
> However, yes-yes-yes, I do think that we need the non-exclusive mode
> too, at least for percpu_down_write_nonexclusive() which I think we
> need as well.

I just need to disagree with the write_nonexclusive() name; the
construct I quite understand and could even agree with.

How about something like:

State excluding 'writers', but not itself:
	percpu_read_lock()
	percpu_read_unlock()

State excluding readers, but not itself:
	percpu_non_read_lock();
	percpu_non_read_unlock();

Full exclusive state:
	percpu_write_lock();
	percpu_write_unlock();

At which point I start to have doubts about the percpu prefix.. ;-)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rcu: Create rcu_sync infrastructure
  2013-10-04 19:41                                         ` Peter Zijlstra
@ 2013-10-05 17:31                                           ` Oleg Nesterov
  0 siblings, 0 replies; 46+ messages in thread
From: Oleg Nesterov @ 2013-10-05 17:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On 10/04, Peter Zijlstra wrote:
>
> On Fri, Oct 04, 2013 at 09:06:53PM +0200, Oleg Nesterov wrote:
> > However, yes-yes-yes, I do think that we need the non-exclusive mode
> > too, at least for percpu_down_write_nonexclusive() which I think we
> > need as well.
>
> I just need to disagree with the write_nonexclusive() name; the
> construct I quite understand and could even agree with.
>
> How about something like:

I am fine either way ;)

> At which point I start to have doubts about the percpu prefix.. ;-)

Yes... and "percpu" just refers to implementation details, so it is
pointless.

Oleg.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rcu: Create rcu_sync infrastructure
  2013-10-02 15:49   ` Oleg Nesterov
  2013-10-03 16:42     ` Paul E. McKenney
@ 2013-10-08  8:18     ` Peter Zijlstra
  1 sibling, 0 replies; 46+ messages in thread
From: Peter Zijlstra @ 2013-10-08  8:18 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Paul McKenney, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Ingo Molnar, Andrea Arcangeli, Johannes Weiner, Thomas Gleixner,
	Steven Rostedt, Linus Torvalds, linux-kernel

On Wed, Oct 02, 2013 at 05:49:01PM +0200, Oleg Nesterov wrote:
> On 10/02, Peter Zijlstra wrote:
> >
> > From: Oleg Nesterov <oleg@redhat.com>
> 
> Thanks! I was writing the patch, and I chose almost the same naming ;)
> 
> > Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> 
> In fact I'd like to add my sob to 1/3 and 3/3 as well.

I think the rules for SOB are that you either have to write or pass the
patch along in an aggregate work; you did neither -- although I much
value your contributions.

Can I make that a reviewed-by for both instead?

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2013-10-08  8:18 UTC | newest]

Thread overview: 46+ messages
2013-10-02 14:56 [PATCH 0/3] Optimize the cpu hotplug locking Peter Zijlstra
2013-10-02 14:56 ` [PATCH 1/3] hotplug: Optimize {get,put}_online_cpus() Peter Zijlstra
2013-10-03 14:01   ` Peter Zijlstra
2013-10-03 16:27     ` Paul E. McKenney
2013-10-03 16:26   ` Paul E. McKenney
2013-10-02 14:56 ` [PATCH 2/3] rcu: Create rcu_sync infrastructure Peter Zijlstra
2013-10-02 15:49   ` Oleg Nesterov
2013-10-03 16:42     ` Paul E. McKenney
2013-10-08  8:18     ` Peter Zijlstra
2013-10-03 16:41   ` Paul E. McKenney
2013-10-03 17:00     ` Oleg Nesterov
2013-10-03 17:15       ` Paul E. McKenney
2013-10-03 18:40     ` Peter Zijlstra
2013-10-03 18:45       ` Paul E. McKenney
2013-10-03 18:47       ` Oleg Nesterov
2013-10-03 19:21         ` Paul E. McKenney
2013-10-03 19:32           ` Oleg Nesterov
2013-10-03 19:33             ` Oleg Nesterov
2013-10-03 19:50               ` Paul E. McKenney
2013-10-03 20:00                 ` Oleg Nesterov
2013-10-03 21:10                   ` Oleg Nesterov
2013-10-03 22:00                     ` Paul E. McKenney
2013-10-04 11:29                       ` Oleg Nesterov
2013-10-04 16:22                         ` Paul E. McKenney
2013-10-04  7:18                     ` Peter Zijlstra
2013-10-04 11:15                       ` Oleg Nesterov
2013-10-04 11:36                         ` Peter Zijlstra
2013-10-04 11:50                           ` Oleg Nesterov
2013-10-04 11:44                         ` Peter Zijlstra
2013-10-04 12:13                           ` Oleg Nesterov
2013-10-04 12:38                             ` Peter Zijlstra
2013-10-04 13:31                               ` Oleg Nesterov
2013-10-04 14:43                                 ` Peter Zijlstra
2013-10-04 15:13                                   ` Oleg Nesterov
2013-10-04 16:25                                     ` Peter Zijlstra
2013-10-04 19:06                                       ` Oleg Nesterov
2013-10-04 19:41                                         ` Peter Zijlstra
2013-10-05 17:31                                           ` Oleg Nesterov
2013-10-04  7:00                   ` Peter Zijlstra
2013-10-03 20:14       ` Paolo Bonzini
2013-10-04  7:01         ` Peter Zijlstra
2013-10-02 14:56 ` [PATCH 3/3] hotplug: Optimize cpu_hotplug_{begin,done}() using rcu_sync Peter Zijlstra
2013-10-03 16:48   ` Paul E. McKenney
2013-10-03 18:41     ` Peter Zijlstra
2013-10-03 18:46       ` Paul E. McKenney
2013-10-03 19:05       ` Oleg Nesterov
