From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>, Mel Gorman <mgorman@suse.de>,
	Rik van Riel <riel@redhat.com>,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	Ingo Molnar <mingo@kernel.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Steven Rostedt <rostedt@goodmis.org>
Subject: Re: [PATCH] hotplug: Optimize {get,put}_online_cpus()
Date: Tue, 1 Oct 2013 13:40:07 -0700	[thread overview]
Message-ID: <20131001204007.GA13320@linux.vnet.ibm.com> (raw)
In-Reply-To: <20130926111042.GS3081@twins.programming.kicks-ass.net>

On Thu, Sep 26, 2013 at 01:10:42PM +0200, Peter Zijlstra wrote:
> On Wed, Sep 25, 2013 at 02:22:00PM -0700, Paul E. McKenney wrote:
> > A couple of nits and some commentary, but if there are races, they are
> > quite subtle.  ;-)
> 
> *whee*..
> 
> I made one little change in the logic; I moved the waitcount increment
> to before the __put_online_cpus() call, such that the writer will have
> to wait for us to wake up before trying again -- not for us to actually
> have acquired the read lock, for that we'd need to mess up
> __get_online_cpus() a bit more.
> 
> Complete patch below.

OK, looks like Oleg is correct, the cpuhp_seq can be dispensed with.

I still don't see anything wrong with it, so time for a serious stress
test on a large system.  ;-)

Additional commentary interspersed.

							Thanx, Paul

> ---
> Subject: hotplug: Optimize {get,put}_online_cpus()
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Tue Sep 17 16:17:11 CEST 2013
> 
> The current implementation of get_online_cpus() is global in nature
> and thus not suited for any kind of common usage.
> 
> Re-implement the current recursive r/w cpu hotplug lock such that the
> read side locks are as light as possible.
> 
> The current cpu hotplug lock is entirely reader biased; but since
> readers are expensive there aren't a lot of them about and writer
> starvation isn't a particular problem.
> 
> However by making the reader side more usable there is a fair chance
> it will get used more and thus the starvation issue becomes a real
> possibility.
> 
> Therefore this new implementation is fair, alternating readers and
> writers; this however requires per-task state to allow the reader
> recursion.
> 
> Many comments are contributed by Paul McKenney, and many previous
> attempts were shown to be inadequate by both Paul and Oleg; many
> thanks to them for persisting in poking holes in my attempts.
> 
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> ---
>  include/linux/cpu.h   |   58 +++++++++++++
>  include/linux/sched.h |    3 
>  kernel/cpu.c          |  209 +++++++++++++++++++++++++++++++++++---------------
>  kernel/sched/core.c   |    2 
>  4 files changed, 208 insertions(+), 64 deletions(-)

I stripped the removed lines to keep my eyes from going buggy.

> --- a/include/linux/cpu.h
> +++ b/include/linux/cpu.h
> @@ -16,6 +16,7 @@
>  #include <linux/node.h>
>  #include <linux/compiler.h>
>  #include <linux/cpumask.h>
> +#include <linux/percpu.h>
> 
>  struct device;
> 
> @@ -173,10 +174,61 @@ extern struct bus_type cpu_subsys;
>  #ifdef CONFIG_HOTPLUG_CPU
>  /* Stop CPUs going up and down. */
> 
> +extern void cpu_hotplug_init_task(struct task_struct *p);
> +
>  extern void cpu_hotplug_begin(void);
>  extern void cpu_hotplug_done(void);
> +
> +extern int __cpuhp_state;
> +DECLARE_PER_CPU(unsigned int, __cpuhp_refcount);
> +
> +extern void __get_online_cpus(void);
> +
> +static inline void get_online_cpus(void)
> +{
> +	might_sleep();
> +
> +	/* Support reader recursion */
> +	/* The value was >= 1 and remains so, reordering causes no harm. */
> +	if (current->cpuhp_ref++)
> +		return;
> +
> +	preempt_disable();
> +	if (likely(!__cpuhp_state)) {
> +		/* The barrier here is supplied by synchronize_sched(). */

I guess I shouldn't complain about the comment given where it came
from, but...

A more accurate comment would say that we are in an RCU-sched read-side
critical section, so the writer cannot both change __cpuhp_state from
readers_fast and start checking counters while we are here.  So if we see
!__cpuhp_state, we know that the writer won't be checking until we are past
the preempt_enable() and that once the synchronize_sched() is done,
the writer will see anything we did within this RCU-sched read-side
critical section.

(The writer -can- change __cpuhp_state from readers_slow to readers_block
while we are in this read-side critical section and then start summing
counters, but that corresponds to a different "if" statement.)

> +		__this_cpu_inc(__cpuhp_refcount);
> +	} else {
> +		__get_online_cpus(); /* Unconditional memory barrier. */
> +	}
> +	preempt_enable();
> +	/*
> +	 * The barrier() from preempt_enable() prevents the compiler from
> +	 * bleeding the critical section out.
> +	 */
> +}
> +
> +extern void __put_online_cpus(void);
> +
> +static inline void put_online_cpus(void)
> +{
> +	/* The value was >= 1 and remains so, reordering causes no harm. */
> +	if (--current->cpuhp_ref)
> +		return;
> +
> +	/*
> +	 * The barrier() in preempt_disable() prevents the compiler from
> +	 * bleeding the critical section out.
> +	 */
> +	preempt_disable();
> +	if (likely(!__cpuhp_state)) {
> +		/* The barrier here is supplied by synchronize_sched().  */

Same here, both for the implied self-criticism and the more complete story.

Due to the basic RCU guarantee, the writer cannot both change __cpuhp_state
and start checking counters while we are in this RCU-sched read-side
critical section.  And again, if the synchronize_sched() had to wait on
us (or if we were early enough that no waiting was needed), then once
the synchronize_sched() completes, the writer will see anything that we
did within this RCU-sched read-side critical section.

> +		__this_cpu_dec(__cpuhp_refcount);
> +	} else {
> +		__put_online_cpus(); /* Unconditional memory barrier. */
> +	}
> +	preempt_enable();
> +}
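
As a usage note (a sketch of caller code, not part of the patch;
reader_example() is an invented name): only the outermost get/put pair
touches the per-CPU refcount or the slow path, the nested pair is pure
task-local counting via current->cpuhp_ref.

	static void reader_example(void)
	{
		get_online_cpus();	/* outermost: takes the refcount (fast or slow path) */
		get_online_cpus();	/* nested: only current->cpuhp_ref++ */

		/* CPU hotplug is excluded in here */

		put_online_cpus();	/* nested: only current->cpuhp_ref-- */
		put_online_cpus();	/* outermost: drops the refcount */
	}
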
> +
>  extern void cpu_hotplug_disable(void);
>  extern void cpu_hotplug_enable(void);
>  #define hotcpu_notifier(fn, pri)	cpu_notifier(fn, pri)
> @@ -200,6 +252,8 @@ static inline void cpu_hotplug_driver_un
> 
>  #else		/* CONFIG_HOTPLUG_CPU */
> 
> +static inline void cpu_hotplug_init_task(struct task_struct *p) {}
> +
>  static inline void cpu_hotplug_begin(void) {}
>  static inline void cpu_hotplug_done(void) {}
>  #define get_online_cpus()	do { } while (0)
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1454,6 +1454,9 @@ struct task_struct {
>  	unsigned int	sequential_io;
>  	unsigned int	sequential_io_avg;
>  #endif
> +#ifdef CONFIG_HOTPLUG_CPU
> +	int		cpuhp_ref;
> +#endif
>  };
> 
>  /* Future-safe accessor for struct task_struct's cpus_allowed. */
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -49,88 +49,173 @@ static int cpu_hotplug_disabled;
> 
>  #ifdef CONFIG_HOTPLUG_CPU
> 
> +enum { readers_fast = 0, readers_slow, readers_block };
> +
> +int __cpuhp_state;
> +EXPORT_SYMBOL_GPL(__cpuhp_state);
> +
> +DEFINE_PER_CPU(unsigned int, __cpuhp_refcount);
> +EXPORT_PER_CPU_SYMBOL_GPL(__cpuhp_refcount);
> +
> +static DEFINE_PER_CPU(unsigned int, cpuhp_seq);
> +static atomic_t cpuhp_waitcount;
> +static DECLARE_WAIT_QUEUE_HEAD(cpuhp_readers);
> +static DECLARE_WAIT_QUEUE_HEAD(cpuhp_writer);
> +
> +void cpu_hotplug_init_task(struct task_struct *p)
> +{
> +	p->cpuhp_ref = 0;
> +}
> +
> +void __get_online_cpus(void)
> +{
> +again:
> +	/* See __srcu_read_lock() */
> +	__this_cpu_inc(__cpuhp_refcount);
> +	smp_mb(); /* A matches B, E */
> +	// __this_cpu_inc(cpuhp_seq);

Deleting the above per Oleg's suggestion.  We still need the preceding
memory barrier.

> +
> +	if (unlikely(__cpuhp_state == readers_block)) {
> +		/*
> +		 * Make sure an outgoing writer sees the waitcount to ensure
> +		 * we make progress.
> +		 */
> +		atomic_inc(&cpuhp_waitcount);
> +		__put_online_cpus();

The decrement happens on the same CPU as the increment, avoiding the
increment-on-one-CPU-and-decrement-on-another problem.

And yes, if the reader misses the writer's assignment of readers_block
to __cpuhp_state, then the writer is guaranteed to see the reader's
increment.  Conversely, any readers that increment their __cpuhp_refcount
after the writer looks are guaranteed to see the readers_block value,
which in turn means that they are guaranteed to immediately decrement
their __cpuhp_refcount, so that it doesn't matter that the writer
missed them.

Unfortunately, this trick does not apply back to SRCU, at least not
without adding a second memory barrier to the srcu_read_lock() path
(one to separate reading the index from incrementing the counter and
another to separate incrementing the counter from the critical section).
Can't have everything, I guess!
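
To make that concrete, a purely hypothetical sketch (the hyp_* names are
invented, this is not the real SRCU data layout) of where those two
barriers would have to sit in a cpuhp_seq-free read lock:

	static DEFINE_PER_CPU(unsigned long, hyp_read_count[2]);
	static unsigned long hyp_completed;

	static int hyp_srcu_read_lock(void)
	{
		int idx = ACCESS_ONCE(hyp_completed) & 0x1;

		smp_mb();	/* separate reading the index from the increment */
		__this_cpu_inc(hyp_read_count[idx]);
		smp_mb();	/* separate the increment from the critical section */
		return idx;
	}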

> +
> +		/*
> +		 * We either call schedule() in the wait, or we'll fall through
> +		 * and reschedule on the preempt_enable() in get_online_cpus().
> +		 */
> +		preempt_enable_no_resched();
> +		__wait_event(cpuhp_readers, __cpuhp_state != readers_block);
> +		preempt_disable();
> +
> +		if (atomic_dec_and_test(&cpuhp_waitcount))
> +			wake_up_all(&cpuhp_writer);

I still don't see why this is a wake_up_all() given that there can be
only one writer.  Not that it makes much difference, but...

> +
> +		goto again;
> +	}
> +}
> +EXPORT_SYMBOL_GPL(__get_online_cpus);
> 
> +void __put_online_cpus(void)
>  {
> +	/* See __srcu_read_unlock() */
> +	smp_mb(); /* C matches D */
> +	/*
> +	 * In other words, if they see our decrement (presumably to aggregate
> +	 * zero, as that is the only time it matters) they will also see our
> +	 * critical section.
> +	 */
> +	this_cpu_dec(__cpuhp_refcount);
> 
> +	/* Prod writer to recheck readers_active */
> +	wake_up_all(&cpuhp_writer);
>  }
> +EXPORT_SYMBOL_GPL(__put_online_cpus);
> +
> +#define per_cpu_sum(var)						\
> +({ 									\
> + 	typeof(var) __sum = 0;						\
> + 	int cpu;							\
> + 	for_each_possible_cpu(cpu)					\
> + 		__sum += per_cpu(var, cpu);				\
> + 	__sum;								\
> +})
> 
> +/*
> + * See srcu_readers_active_idx_check() for a rather more detailed explanation.
> + */
> +static bool cpuhp_readers_active_check(void)
>  {
> +	// unsigned int seq = per_cpu_sum(cpuhp_seq);

Delete the above per Oleg's suggestion.

> +
> +	smp_mb(); /* B matches A */
> +
> +	/*
> +	 * In other words, if we see __get_online_cpus() cpuhp_seq increment,
> +	 * we are guaranteed to also see its __cpuhp_refcount increment.
> +	 */
> 
> +	if (per_cpu_sum(__cpuhp_refcount) != 0)
> +		return false;
> 
> +	smp_mb(); /* D matches C */
> 
> +	/*
> +	 * On equality, we know that there could not be any "sneak path" pairs
> +	 * where we see a decrement but not the corresponding increment for a
> +	 * given reader. If we saw its decrement, the memory barriers guarantee
> +	 * that we now see its cpuhp_seq increment.
> +	 */
> +
> +	// return per_cpu_sum(cpuhp_seq) == seq;

Delete the above per Oleg's suggestion, but we actually need to replace it
with "return true;".  We should be able to get rid of the first memory
barrier (B matches A) because the smp_mb() in cpu_hotplug_begin() covers it,
but we cannot get rid of the second memory barrier (D matches C).
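
In other words, cpuhp_readers_active_check() ends up roughly like this
(a sketch of the discussed end state, not a replacement hunk):

	static bool cpuhp_readers_active_check(void)
	{
		smp_mb(); /* B matches A; possibly subsumed by the smp_mb()
			   * in cpu_hotplug_begin(), as noted above. */

		if (per_cpu_sum(__cpuhp_refcount) != 0)
			return false;

		smp_mb(); /* D matches C; this one must stay. */

		return true;
	}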

>  }
> 
>  /*
> + * This will notify new readers to block and wait for all active readers to
> + * complete.
>   */
>  void cpu_hotplug_begin(void)
>  {
> +	/*
> +	 * Since cpu_hotplug_begin() is always called after invoking
> +	 * cpu_maps_update_begin(), we can be sure that only one writer is
> +	 * active.
> +	 */
> +	lockdep_assert_held(&cpu_add_remove_lock);
> 
> +	/* Allow reader-in-writer recursion. */
> +	current->cpuhp_ref++;
> +
> +	/* Notify readers to take the slow path. */
> +	__cpuhp_state = readers_slow;
> +
> +	/* See percpu_down_write(); guarantees all readers take the slow path */
> +	synchronize_sched();
> +
> +	/*
> +	 * Notify new readers to block; up until now, and thus throughout the
> +	 * longish synchronize_sched() above, new readers could still come in.
> +	 */
> +	__cpuhp_state = readers_block;
> +
> +	smp_mb(); /* E matches A */
> +
> +	/*
> +	 * If they don't see our write of readers_block to __cpuhp_state,
> +	 * then we are guaranteed to see their __cpuhp_refcount increment, and
> +	 * therefore will wait for them.
> +	 */
> +
> +	/* Wait for all now active readers to complete. */
> +	wait_event(cpuhp_writer, cpuhp_readers_active_check());
>  }
> 
>  void cpu_hotplug_done(void)
>  {
> +	/* Signal the writer is done, no fast path yet. */
> +	__cpuhp_state = readers_slow;
> +	wake_up_all(&cpuhp_readers);

And one reason that we cannot just immediately flip to readers_fast
is that new readers might fail to see the results of this writer's
critical section.

> +
> +	/*
> +	 * The wait_event()/wake_up_all() prevents the race where the readers
> +	 * are delayed between fetching __cpuhp_state and blocking.
> +	 */
> +
> +	/* See percpu_up_write(); readers will no longer attempt to block. */
> +	synchronize_sched();
> +
> +	/* Let 'em rip */
> +	__cpuhp_state = readers_fast;
> +	current->cpuhp_ref--;
> +
> +	/*
> +	 * Wait for any pending readers to be running. This ensures readers
> +	 * after writer and avoids writers starving readers.
> +	 */
> +	wait_event(cpuhp_writer, !atomic_read(&cpuhp_waitcount));
>  }
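
For completeness, the writer-side calling pattern these two hooks slot
into (a sketch; writer_example() is an invented name, the other calls are
the existing hotplug interfaces used by _cpu_up()/_cpu_down() style
callers):

	static void writer_example(void)
	{
		cpu_maps_update_begin();  /* cpu_add_remove_lock: one writer at a time */
		cpu_hotplug_begin();      /* readers_slow, synchronize_sched(),
					   * readers_block, wait for readers */

		/* ... actually bring a CPU up or down ... */

		cpu_hotplug_done();       /* readers_slow, wake readers,
					   * synchronize_sched(), readers_fast */
		cpu_maps_update_done();
	}
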
> 
>  /*
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1736,6 +1736,8 @@ static void __sched_fork(unsigned long c
>  	INIT_LIST_HEAD(&p->numa_entry);
>  	p->numa_group = NULL;
>  #endif /* CONFIG_NUMA_BALANCING */
> +
> +	cpu_hotplug_init_task(p);
>  }
> 
>  #ifdef CONFIG_NUMA_BALANCING
> 

