linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 00/10] CPU hotplug: stop_machine()-free CPU hotplug
@ 2012-12-04  8:53 Srivatsa S. Bhat
  2012-12-04  8:53 ` [RFC PATCH 01/10] CPU hotplug: Introduce "stable" cpu online mask, for atomic hotplug readers Srivatsa S. Bhat
                   ` (9 more replies)
  0 siblings, 10 replies; 21+ messages in thread
From: Srivatsa S. Bhat @ 2012-12-04  8:53 UTC (permalink / raw)
  To: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung, vincent.guittot
  Cc: sbw, tj, amit.kucheria, rostedt, rjw, srivatsa.bhat, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel

Hi,

This patchset removes CPU hotplug's dependence on stop_machine() in the CPU
offline path, and provides an alternative set of APIs (which can be invoked
from atomic context) to replace preempt_disable() as the way to prevent CPUs
from going offline.

This is an RFC patchset with only a few call-sites of preempt_disable()
converted to the new APIs for now. The main goal is to get feedback on the
design of the new atomic APIs, and to see whether they can serve as a viable
basis for stop_machine()-free CPU hotplug.

Overview of the patches:
-----------------------

Patch 1 introduces the new APIs that can be used from atomic context, to
prevent CPUs from going offline.
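
For reference, the intended usage pattern for the new APIs looks roughly like
this (a minimal sketch based on patch 1; do_something() and target_cpu are
just illustrative placeholders):

	unsigned int cpu;

	get_online_cpus_stable_atomic();

	for_each_online_cpu_stable(cpu) {
		/* CPUs in the stable mask are guaranteed to stay online here */
		do_something(cpu);
	}

	if (cpu_online_stable(target_cpu))
		do_something(target_cpu);

	put_online_cpus_stable_atomic();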

Patches 2 to 6 convert various call-sites to use the new APIs.

Patches 7, 8 and 9 fix a KVM issue that comes into the picture when we remove
stop_machine() from the CPU hotplug path. (Actually, patches 7 and 8 are
already in the kvm tree. Patch 9 is the fix we need, but I preserved the
other two as well so that the patches apply easily without external
dependencies.)

Patch 10 is the one which actually removes stop_machine() from the CPU
offline path.

Comments and suggestions welcome!

--
 Michael Wang (2):
      CPU hotplug: Introduce "stable" cpu online mask, for atomic hotplug readers
      smp, cpu hotplug: Fix smp_call_function_*() to prevent CPU offline properly

Paul E. McKenney (1):
      cpu: No more __stop_machine() in _cpu_down()

Srivatsa S. Bhat (4):
      smp, cpu hotplug: Fix on_each_cpu_*() to prevent CPU offline properly
      sched, cpu hotplug: Use stable online cpus in try_to_wake_up() & select_task_rq()
      kick_process(), cpu-hotplug: Prevent offlining of target CPU properly
      yield_to(), cpu-hotplug: Prevent offlining of other CPUs properly

Xiao Guangrong (3):
      KVM: VMX: fix invalid cpu passed to smp_call_function_single
      KVM: VMX: fix memory order between loading vmcs and clearing vmcs
      KVM: VMX: fix unsyc vmcs status when cpu is going down


 arch/x86/kvm/vmx.c      |   32 ++++++++++-
 include/linux/cpu.h     |    4 +
 include/linux/cpumask.h |    5 ++
 kernel/cpu.c            |  138 ++++++++++++++++++++++++++++++++++++++++++++++-
 kernel/sched/core.c     |   28 +++++++---
 kernel/smp.c            |   84 +++++++++++++++++------------
 6 files changed, 246 insertions(+), 45 deletions(-)



Thanks,
Srivatsa S. Bhat
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [RFC PATCH 01/10] CPU hotplug: Introduce "stable" cpu online mask, for atomic hotplug readers
  2012-12-04  8:53 [RFC PATCH 00/10] CPU hotplug: stop_machine()-free CPU hotplug Srivatsa S. Bhat
@ 2012-12-04  8:53 ` Srivatsa S. Bhat
  2012-12-04 15:17   ` Tejun Heo
  2012-12-04 22:10   ` Andrew Morton
  2012-12-04  8:54 ` [RFC PATCH 02/10] smp, cpu hotplug: Fix smp_call_function_*() to prevent CPU offline properly Srivatsa S. Bhat
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 21+ messages in thread
From: Srivatsa S. Bhat @ 2012-12-04  8:53 UTC (permalink / raw)
  To: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung, vincent.guittot
  Cc: sbw, tj, amit.kucheria, rostedt, rjw, srivatsa.bhat, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel

From: Michael Wang <wangyun@linux.vnet.ibm.com>

There are places where preempt_disable() is used to prevent any CPU from
going offline during the critical section. Let us call them "atomic
hotplug readers" (atomic because they run in atomic contexts).

Often, these atomic hotplug readers have a simple need: they want the cpu
online mask that they work with (inside their critical section) to be
stable, i.e., it should be guaranteed that CPUs in that mask won't go
offline during the critical section. An important point here is that they
don't really care whether such a "stable" mask is a subset of the actual
cpu_online_mask.

The intent of this patch is to provide such a "stable" cpu online mask
for that class of atomic hotplug readers.

Fundamental idea behind the design:
-----------------------------------

Simply put, have a separate mask called the stable cpu online mask; and
at the hotplug writer (cpu_down()), note down the CPU that is about to go
offline, and remove it from the stable cpu online mask. Then, feel free
to take that CPU offline, since the atomic hotplug readers won't see it
from now on. Also, don't start any new cpu_down() operations until all
existing atomic hotplug readers have completed (because they might still
be working with the old value of the stable online mask).

Some important design requirements and considerations:
-----------------------------------------------------

1. The atomic hotplug readers should ideally *never* wait for the hotplug
   writer (cpu_down()) for *anything*, because these atomic hotplug readers
   can be in very hot paths like interrupt handling/IPI, and hence, if they
   had to wait for an ongoing cpu_down() to complete, it would pretty much
   introduce the same performance/latency problems as stop_machine().

2. Any synchronization on the atomic hotplug readers' side must be highly
   scalable - avoid global locks/counters etc. These paths currently use
   the extremely fast preempt_disable(); our replacement for
   preempt_disable() should not become ridiculously costly.

3. preempt_disable() was recursive. The replacement should also be recursive.

Implementation of the design:
----------------------------

Atomic hotplug reader side:

We use per-cpu counters to mark the presence of atomic hotplug readers.
A reader simply increments its per-cpu counter and continues, without waiting
for anything. No locks are used in this path. Together, these satisfy all
three requirements mentioned above.

The hotplug writer reads the per-cpu counters of all CPUs in order to
ensure that all existing atomic hotplug readers have completed. Only then
does it proceed to actually take the CPU offline.
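
In condensed form, the synchronization implemented by the patch below looks
roughly like this (a simplified sketch; the actual patch also adds detailed
comments, EXPORT_SYMBOL_GPL and the !CONFIG_HOTPLUG_CPU stubs):

	static DEFINE_PER_CPU(int, hotplug_reader_refcount);

	/* Atomic hotplug reader: mark our presence and continue; never block. */
	void get_online_cpus_stable_atomic(void)
	{
		preempt_disable();
		this_cpu_inc(hotplug_reader_refcount);
		smp_mb();	/* order the refcount write vs stable-mask reads */
	}

	void put_online_cpus_stable_atomic(void)
	{
		smp_mb();	/* order stable-mask reads vs the refcount write */
		this_cpu_dec(hotplug_reader_refcount);
		preempt_enable();
	}

	/* Hotplug writer: hide the CPU from readers, then drain old readers. */
	static void prepare_cpu_take_down(unsigned int cpu)
	{
		unsigned int i;

		set_cpu_online_stable(cpu, false);
		smp_mb();	/* order the stable-mask write vs refcount reads */

		for_each_online_cpu(i)
			while (per_cpu(hotplug_reader_refcount, i))
				cpu_relax();
	}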

[ Michael: Designed the synchronization for the IPI case ]
Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
[ Srivatsa: Generalized it to work for all cases and wrote the changelog ]
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 include/linux/cpu.h     |    4 +
 include/linux/cpumask.h |    5 ++
 kernel/cpu.c            |  129 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 137 insertions(+), 1 deletion(-)

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index ce7a074..c64b6ed 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -175,6 +175,8 @@ extern struct bus_type cpu_subsys;
 
 extern void get_online_cpus(void);
 extern void put_online_cpus(void);
+extern void get_online_cpus_stable_atomic(void);
+extern void put_online_cpus_stable_atomic(void);
 #define hotcpu_notifier(fn, pri)	cpu_notifier(fn, pri)
 #define register_hotcpu_notifier(nb)	register_cpu_notifier(nb)
 #define unregister_hotcpu_notifier(nb)	unregister_cpu_notifier(nb)
@@ -198,6 +200,8 @@ static inline void cpu_hotplug_driver_unlock(void)
 
 #define get_online_cpus()	do { } while (0)
 #define put_online_cpus()	do { } while (0)
+#define get_online_cpus_stable_atomic()	do { } while (0)
+#define put_online_cpus_stable_atomic()	do { } while (0)
 #define hotcpu_notifier(fn, pri)	do { (void)(fn); } while (0)
 /* These aren't inline functions due to a GCC bug. */
 #define register_hotcpu_notifier(nb)	({ (void)(nb); 0; })
diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 0325602..2148e60 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -78,6 +78,7 @@ extern int nr_cpu_ids;
 
 extern const struct cpumask *const cpu_possible_mask;
 extern const struct cpumask *const cpu_online_mask;
+extern const struct cpumask *const cpu_online_stable_mask;
 extern const struct cpumask *const cpu_present_mask;
 extern const struct cpumask *const cpu_active_mask;
 
@@ -87,6 +88,7 @@ extern const struct cpumask *const cpu_active_mask;
 #define num_present_cpus()	cpumask_weight(cpu_present_mask)
 #define num_active_cpus()	cpumask_weight(cpu_active_mask)
 #define cpu_online(cpu)		cpumask_test_cpu((cpu), cpu_online_mask)
+#define cpu_online_stable(cpu)	cpumask_test_cpu((cpu), cpu_online_stable_mask)
 #define cpu_possible(cpu)	cpumask_test_cpu((cpu), cpu_possible_mask)
 #define cpu_present(cpu)	cpumask_test_cpu((cpu), cpu_present_mask)
 #define cpu_active(cpu)		cpumask_test_cpu((cpu), cpu_active_mask)
@@ -705,12 +707,15 @@ extern const DECLARE_BITMAP(cpu_all_bits, NR_CPUS);
 
 #define for_each_possible_cpu(cpu) for_each_cpu((cpu), cpu_possible_mask)
 #define for_each_online_cpu(cpu)   for_each_cpu((cpu), cpu_online_mask)
+#define for_each_online_cpu_stable(cpu)					  \
+				for_each_cpu((cpu), cpu_online_stable_mask)
 #define for_each_present_cpu(cpu)  for_each_cpu((cpu), cpu_present_mask)
 
 /* Wrappers for arch boot code to manipulate normally-constant masks */
 void set_cpu_possible(unsigned int cpu, bool possible);
 void set_cpu_present(unsigned int cpu, bool present);
 void set_cpu_online(unsigned int cpu, bool online);
+void set_cpu_online_stable(unsigned int cpu, bool online);
 void set_cpu_active(unsigned int cpu, bool active);
 void init_cpu_present(const struct cpumask *src);
 void init_cpu_possible(const struct cpumask *src);
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 42bd331..aaf2393 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -49,6 +49,73 @@ static int cpu_hotplug_disabled;
 
 #ifdef CONFIG_HOTPLUG_CPU
 
+/*
+ * Per-cpu counter to mark the presence of active atomic hotplug readers
+ * (those that run in atomic context and want to prevent CPUs from going
+ * offline).
+ */
+static DEFINE_PER_CPU(int, hotplug_reader_refcount);
+
+/*
+ * Hotplug readers (those that want to prevent CPUs from going offline)
+ * sometimes run from atomic contexts, and hence can't use
+ * get/put_online_cpus() because they can sleep. And often-times, all
+ * they really want is a stable (unchanging) online mask to work with, which
+ * could be a subset of the actual cpu_online_mask, but with a guarantee
+ * that all the CPUs in that stable mask stay online throughout the
+ * hotplug-read-side critical section.
+ *
+ * In such cases, these atomic hotplug readers can use the pair
+ * get/put_online_cpus_stable_atomic() around their critical section to
+ * ensure that the stable mask 'cpu_online_stable_mask' remains unaltered
+ * throughout that critical section. And of course, they should only use
+ * the stable mask in their critical section, and not the actual
+ * cpu_online_mask!
+ *
+ * Eg:
+ *
+ * Atomic hotplug read-side critical section:
+ * -----------------------------------------
+ *
+ * get_online_cpus_stable_atomic();
+ *
+ * for_each_online_cpu_stable(cpu) {
+ *         ... Do something...
+ * }
+ *    ...
+ *
+ * if (cpu_online_stable(other_cpu))
+ *         do_something();
+ *
+ * put_online_cpus_stable_atomic();
+ *
+ * You can call this function recursively.
+ */
+void get_online_cpus_stable_atomic(void)
+{
+	preempt_disable();
+	this_cpu_inc(hotplug_reader_refcount);
+
+	/*
+	 * Prevent reordering of writes to hotplug_reader_refcount and
+	 * reads from cpu_online_stable_mask.
+	 */
+	smp_mb();
+}
+EXPORT_SYMBOL_GPL(get_online_cpus_stable_atomic);
+
+void put_online_cpus_stable_atomic(void)
+{
+	/*
+	 * Prevent reordering of reads from cpu_online_stable_mask and
+	 * writes to hotplug_reader_refcount.
+	 */
+	smp_mb();
+	this_cpu_dec(hotplug_reader_refcount);
+	preempt_enable();
+}
+EXPORT_SYMBOL_GPL(put_online_cpus_stable_atomic);
+
 static struct {
 	struct task_struct *active_writer;
 	struct mutex lock; /* Synchronizes accesses to refcount, */
@@ -237,6 +304,44 @@ static inline void check_for_tasks(int cpu)
 	write_unlock_irq(&tasklist_lock);
 }
 
+/*
+ * We want all atomic hotplug readers to refer to the new value of
+ * cpu_online_stable_mask. So wait for the existing atomic hotplug readers
+ * to complete. Any new atomic hotplug readers will see the updated mask
+ * and hence pose no problems.
+ */
+static void sync_hotplug_readers(void)
+{
+	unsigned int cpu;
+
+	for_each_online_cpu(cpu) {
+		while (per_cpu(hotplug_reader_refcount, cpu))
+			cpu_relax();
+	}
+}
+
+/*
+ * We are serious about taking this CPU down. So clear the CPU from the
+ * stable online mask.
+ */
+static void prepare_cpu_take_down(unsigned int cpu)
+{
+	set_cpu_online_stable(cpu, false);
+
+	/*
+	 * Prevent reordering of writes to cpu_online_stable_mask and reads
+	 * from hotplug_reader_refcount.
+	 */
+	smp_mb();
+
+	/*
+	 * Wait for all active hotplug readers to complete, to ensure that
+	 * there are no critical sections still referring to the old stable
+	 * online mask.
+	 */
+	sync_hotplug_readers();
+}
+
 struct take_cpu_down_param {
 	unsigned long mod;
 	void *hcpu;
@@ -246,7 +351,9 @@ struct take_cpu_down_param {
 static int __ref take_cpu_down(void *_param)
 {
 	struct take_cpu_down_param *param = _param;
-	int err;
+	int err, cpu = (long)(param->hcpu);
+
+	prepare_cpu_take_down(cpu);
 
 	/* Ensure this CPU doesn't handle any more interrupts. */
 	err = __cpu_disable();
@@ -670,6 +777,11 @@ static DECLARE_BITMAP(cpu_online_bits, CONFIG_NR_CPUS) __read_mostly;
 const struct cpumask *const cpu_online_mask = to_cpumask(cpu_online_bits);
 EXPORT_SYMBOL(cpu_online_mask);
 
+static DECLARE_BITMAP(cpu_online_stable_bits, CONFIG_NR_CPUS) __read_mostly;
+const struct cpumask *const cpu_online_stable_mask =
+					to_cpumask(cpu_online_stable_bits);
+EXPORT_SYMBOL(cpu_online_stable_mask);
+
 static DECLARE_BITMAP(cpu_present_bits, CONFIG_NR_CPUS) __read_mostly;
 const struct cpumask *const cpu_present_mask = to_cpumask(cpu_present_bits);
 EXPORT_SYMBOL(cpu_present_mask);
@@ -694,12 +806,26 @@ void set_cpu_present(unsigned int cpu, bool present)
 		cpumask_clear_cpu(cpu, to_cpumask(cpu_present_bits));
 }
 
+void set_cpu_online_stable(unsigned int cpu, bool online)
+{
+	if (online)
+		cpumask_set_cpu(cpu, to_cpumask(cpu_online_stable_bits));
+	else
+		cpumask_clear_cpu(cpu, to_cpumask(cpu_online_stable_bits));
+}
+
 void set_cpu_online(unsigned int cpu, bool online)
 {
 	if (online)
 		cpumask_set_cpu(cpu, to_cpumask(cpu_online_bits));
 	else
 		cpumask_clear_cpu(cpu, to_cpumask(cpu_online_bits));
+
+	/*
+	 * Any changes to the online mask must also be propagated to the
+	 * stable online mask.
+	 */
+	set_cpu_online_stable(cpu, online);
 }
 
 void set_cpu_active(unsigned int cpu, bool active)
@@ -723,4 +849,5 @@ void init_cpu_possible(const struct cpumask *src)
 void init_cpu_online(const struct cpumask *src)
 {
 	cpumask_copy(to_cpumask(cpu_online_bits), src);
+	cpumask_copy(to_cpumask(cpu_online_stable_bits), src);
 }


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH 02/10] smp, cpu hotplug: Fix smp_call_function_*() to prevent CPU offline properly
  2012-12-04  8:53 [RFC PATCH 00/10] CPU hotplug: stop_machine()-free CPU hotplug Srivatsa S. Bhat
  2012-12-04  8:53 ` [RFC PATCH 01/10] CPU hotplug: Introduce "stable" cpu online mask, for atomic hotplug readers Srivatsa S. Bhat
@ 2012-12-04  8:54 ` Srivatsa S. Bhat
  2012-12-04 22:17   ` Andrew Morton
  2012-12-04 23:39   ` Rusty Russell
  2012-12-04  8:54 ` [RFC PATCH 03/10] smp, cpu hotplug: Fix on_each_cpu_*() " Srivatsa S. Bhat
                   ` (7 subsequent siblings)
  9 siblings, 2 replies; 21+ messages in thread
From: Srivatsa S. Bhat @ 2012-12-04  8:54 UTC (permalink / raw)
  To: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung, vincent.guittot
  Cc: sbw, tj, amit.kucheria, rostedt, rjw, srivatsa.bhat, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel

From: Michael Wang <wangyun@linux.vnet.ibm.com>

With stop_machine() gone from the CPU offline path, we can't depend on
preempt_disable() to prevent CPUs from going offline from under us.

Use the get/put_online_cpus_stable_atomic() APIs to prevent CPUs from going
offline; these can be invoked even from atomic context.
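
As an illustration of the conversion pattern (send_ipi_example() is a
hypothetical caller, not code from this patch), an atomic reader that sends
an IPI changes roughly like this:

	/* Before: relied on get_cpu()/put_cpu() to keep the target CPU online */
	static void send_ipi_example(int cpu, smp_call_func_t func, void *info)
	{
		int this_cpu = get_cpu();

		if (cpu != this_cpu && cpu_online(cpu))
			smp_call_function_single(cpu, func, info, 1);
		put_cpu();
	}

	/* After: pin the stable online mask and consult it instead */
	static void send_ipi_example(int cpu, smp_call_func_t func, void *info)
	{
		int this_cpu;

		get_online_cpus_stable_atomic();
		this_cpu = smp_processor_id();

		if (cpu != this_cpu && cpu_online_stable(cpu))
			smp_call_function_single(cpu, func, info, 1);
		put_online_cpus_stable_atomic();
	}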

[ Michael: Designed the synchronization for the IPI case ]
Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
[ Srivatsa: Generalized it to work for all cases and wrote the changelog ]
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/smp.c |   54 +++++++++++++++++++++++++++++++++---------------------
 1 file changed, 33 insertions(+), 21 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 29dd40a..581727c 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -310,7 +310,8 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 	 * prevent preemption and reschedule on another processor,
 	 * as well as CPU removal
 	 */
-	this_cpu = get_cpu();
+	get_online_cpus_stable_atomic();
+	this_cpu = smp_processor_id();
 
 	/*
 	 * Can deadlock when called with interrupts disabled.
@@ -326,7 +327,7 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 		func(info);
 		local_irq_restore(flags);
 	} else {
-		if ((unsigned)cpu < nr_cpu_ids && cpu_online(cpu)) {
+		if ((unsigned)cpu < nr_cpu_ids && cpu_online_stable(cpu)) {
 			struct call_single_data *data = &d;
 
 			if (!wait)
@@ -342,7 +343,7 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 		}
 	}
 
-	put_cpu();
+	put_online_cpus_stable_atomic();
 
 	return err;
 }
@@ -371,8 +372,10 @@ int smp_call_function_any(const struct cpumask *mask,
 	const struct cpumask *nodemask;
 	int ret;
 
+	get_online_cpus_stable_atomic();
 	/* Try for same CPU (cheapest) */
-	cpu = get_cpu();
+	cpu = smp_processor_id();
+
 	if (cpumask_test_cpu(cpu, mask))
 		goto call;
 
@@ -380,15 +383,15 @@ int smp_call_function_any(const struct cpumask *mask,
 	nodemask = cpumask_of_node(cpu_to_node(cpu));
 	for (cpu = cpumask_first_and(nodemask, mask); cpu < nr_cpu_ids;
 	     cpu = cpumask_next_and(cpu, nodemask, mask)) {
-		if (cpu_online(cpu))
+		if (cpu_online_stable(cpu))
 			goto call;
 	}
 
 	/* Any online will do: smp_call_function_single handles nr_cpu_ids. */
-	cpu = cpumask_any_and(mask, cpu_online_mask);
+	cpu = cpumask_any_and(mask, cpu_online_stable_mask);
 call:
 	ret = smp_call_function_single(cpu, func, info, wait);
-	put_cpu();
+	put_online_cpus_stable_atomic();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(smp_call_function_any);
@@ -409,14 +412,17 @@ void __smp_call_function_single(int cpu, struct call_single_data *data,
 	unsigned int this_cpu;
 	unsigned long flags;
 
-	this_cpu = get_cpu();
+	get_online_cpus_stable_atomic();
+
+	this_cpu = smp_processor_id();
+
 	/*
 	 * Can deadlock when called with interrupts disabled.
 	 * We allow cpu's that are not yet online though, as no one else can
 	 * send smp call function interrupt to this cpu and as such deadlocks
 	 * can't happen.
 	 */
-	WARN_ON_ONCE(cpu_online(smp_processor_id()) && wait && irqs_disabled()
+	WARN_ON_ONCE(cpu_online(this_cpu) && wait && irqs_disabled()
 		     && !oops_in_progress);
 
 	if (cpu == this_cpu) {
@@ -427,7 +433,7 @@ void __smp_call_function_single(int cpu, struct call_single_data *data,
 		csd_lock(data);
 		generic_exec_single(cpu, data, wait);
 	}
-	put_cpu();
+	put_online_cpus_stable_atomic();
 }
 
 /**
@@ -451,6 +457,8 @@ void smp_call_function_many(const struct cpumask *mask,
 	unsigned long flags;
 	int refs, cpu, next_cpu, this_cpu = smp_processor_id();
 
+	get_online_cpus_stable_atomic();
+
 	/*
 	 * Can deadlock when called with interrupts disabled.
 	 * We allow cpu's that are not yet online though, as no one else can
@@ -461,23 +469,24 @@ void smp_call_function_many(const struct cpumask *mask,
 		     && !oops_in_progress && !early_boot_irqs_disabled);
 
 	/* Try to fastpath.  So, what's a CPU they want? Ignoring this one. */
-	cpu = cpumask_first_and(mask, cpu_online_mask);
+	cpu = cpumask_first_and(mask, cpu_online_stable_mask);
 	if (cpu == this_cpu)
-		cpu = cpumask_next_and(cpu, mask, cpu_online_mask);
+		cpu = cpumask_next_and(cpu, mask, cpu_online_stable_mask);
 
 	/* No online cpus?  We're done. */
 	if (cpu >= nr_cpu_ids)
-		return;
+		goto out_unlock;
 
 	/* Do we have another CPU which isn't us? */
-	next_cpu = cpumask_next_and(cpu, mask, cpu_online_mask);
+	next_cpu = cpumask_next_and(cpu, mask, cpu_online_stable_mask);
 	if (next_cpu == this_cpu)
-		next_cpu = cpumask_next_and(next_cpu, mask, cpu_online_mask);
+		next_cpu = cpumask_next_and(next_cpu, mask,
+						cpu_online_stable_mask);
 
 	/* Fastpath: do that cpu by itself. */
 	if (next_cpu >= nr_cpu_ids) {
 		smp_call_function_single(cpu, func, info, wait);
-		return;
+		goto out_unlock;
 	}
 
 	data = &__get_cpu_var(cfd_data);
@@ -516,14 +525,14 @@ void smp_call_function_many(const struct cpumask *mask,
 	smp_wmb();
 
 	/* We rely on the "and" being processed before the store */
-	cpumask_and(data->cpumask, mask, cpu_online_mask);
+	cpumask_and(data->cpumask, mask, cpu_online_stable_mask);
 	cpumask_clear_cpu(this_cpu, data->cpumask);
 	refs = cpumask_weight(data->cpumask);
 
 	/* Some callers race with other cpus changing the passed mask */
 	if (unlikely(!refs)) {
 		csd_unlock(&data->csd);
-		return;
+		goto out_unlock;
 	}
 
 	raw_spin_lock_irqsave(&call_function.lock, flags);
@@ -554,6 +563,9 @@ void smp_call_function_many(const struct cpumask *mask,
 	/* Optionally wait for the CPUs to complete */
 	if (wait)
 		csd_lock_wait(&data->csd);
+
+out_unlock:
+	put_online_cpus_stable_atomic();
 }
 EXPORT_SYMBOL(smp_call_function_many);
 
@@ -574,9 +586,9 @@ EXPORT_SYMBOL(smp_call_function_many);
  */
 int smp_call_function(smp_call_func_t func, void *info, int wait)
 {
-	preempt_disable();
-	smp_call_function_many(cpu_online_mask, func, info, wait);
-	preempt_enable();
+	get_online_cpus_stable_atomic();
+	smp_call_function_many(cpu_online_stable_mask, func, info, wait);
+	put_online_cpus_stable_atomic();
 
 	return 0;
 }


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH 03/10] smp, cpu hotplug: Fix on_each_cpu_*() to prevent CPU offline properly
  2012-12-04  8:53 [RFC PATCH 00/10] CPU hotplug: stop_machine()-free CPU hotplug Srivatsa S. Bhat
  2012-12-04  8:53 ` [RFC PATCH 01/10] CPU hotplug: Introduce "stable" cpu online mask, for atomic hotplug readers Srivatsa S. Bhat
  2012-12-04  8:54 ` [RFC PATCH 02/10] smp, cpu hotplug: Fix smp_call_function_*() to prevent CPU offline properly Srivatsa S. Bhat
@ 2012-12-04  8:54 ` Srivatsa S. Bhat
  2012-12-04  8:54 ` [RFC PATCH 04/10] sched, cpu hotplug: Use stable online cpus in try_to_wake_up() & select_task_rq() Srivatsa S. Bhat
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Srivatsa S. Bhat @ 2012-12-04  8:54 UTC (permalink / raw)
  To: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung, vincent.guittot
  Cc: sbw, tj, amit.kucheria, rostedt, rjw, srivatsa.bhat, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel

With stop_machine() gone from the CPU offline path, we can't depend on
preempt_disable() to prevent CPUs from going offline from under us.

Use the get/put_online_cpus_stable_atomic() APIs to prevent CPUs from going
offline; these can be invoked even from atomic context.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/smp.c |   30 ++++++++++++++++++------------
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 581727c..ea81293 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -688,12 +688,12 @@ int on_each_cpu(void (*func) (void *info), void *info, int wait)
 	unsigned long flags;
 	int ret = 0;
 
-	preempt_disable();
+	get_online_cpus_stable_atomic();
 	ret = smp_call_function(func, info, wait);
 	local_irq_save(flags);
 	func(info);
 	local_irq_restore(flags);
-	preempt_enable();
+	put_online_cpus_stable_atomic();
 	return ret;
 }
 EXPORT_SYMBOL(on_each_cpu);
@@ -715,7 +715,11 @@ EXPORT_SYMBOL(on_each_cpu);
 void on_each_cpu_mask(const struct cpumask *mask, smp_call_func_t func,
 			void *info, bool wait)
 {
-	int cpu = get_cpu();
+	int cpu;
+
+	get_online_cpus_stable_atomic();
+
+	cpu = smp_processor_id();
 
 	smp_call_function_many(mask, func, info, wait);
 	if (cpumask_test_cpu(cpu, mask)) {
@@ -723,7 +727,7 @@ void on_each_cpu_mask(const struct cpumask *mask, smp_call_func_t func,
 		func(info);
 		local_irq_enable();
 	}
-	put_cpu();
+	put_online_cpus_stable_atomic();
 }
 EXPORT_SYMBOL(on_each_cpu_mask);
 
@@ -748,8 +752,10 @@ EXPORT_SYMBOL(on_each_cpu_mask);
  * The function might sleep if the GFP flags indicates a non
  * atomic allocation is allowed.
  *
- * Preemption is disabled to protect against CPUs going offline but not online.
- * CPUs going online during the call will not be seen or sent an IPI.
+ * We use get/put_online_cpus_stable_atomic() to have a stable online mask
+ * to work with, whose CPUs won't go offline in-between our operation.
+ * And we will skip those CPUs which have already begun their offline journey.
+ * CPUs coming online during the call will not be seen or sent an IPI.
  *
  * You must not call this function with disabled interrupts or
  * from a hardware interrupt handler or from a bottom half handler.
@@ -764,26 +770,26 @@ void on_each_cpu_cond(bool (*cond_func)(int cpu, void *info),
 	might_sleep_if(gfp_flags & __GFP_WAIT);
 
 	if (likely(zalloc_cpumask_var(&cpus, (gfp_flags|__GFP_NOWARN)))) {
-		preempt_disable();
-		for_each_online_cpu(cpu)
+		get_online_cpus_stable_atomic();
+		for_each_online_cpu_stable(cpu)
 			if (cond_func(cpu, info))
 				cpumask_set_cpu(cpu, cpus);
 		on_each_cpu_mask(cpus, func, info, wait);
-		preempt_enable();
+		put_online_cpus_stable_atomic();
 		free_cpumask_var(cpus);
 	} else {
 		/*
 		 * No free cpumask, bother. No matter, we'll
 		 * just have to IPI them one by one.
 		 */
-		preempt_disable();
-		for_each_online_cpu(cpu)
+		get_online_cpus_stable_atomic();
+		for_each_online_cpu_stable(cpu)
 			if (cond_func(cpu, info)) {
 				ret = smp_call_function_single(cpu, func,
 								info, wait);
 				WARN_ON_ONCE(!ret);
 			}
-		preempt_enable();
+		put_online_cpus_stable_atomic();
 	}
 }
 EXPORT_SYMBOL(on_each_cpu_cond);


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH 04/10] sched, cpu hotplug: Use stable online cpus in try_to_wake_up() & select_task_rq()
  2012-12-04  8:53 [RFC PATCH 00/10] CPU hotplug: stop_machine()-free CPU hotplug Srivatsa S. Bhat
                   ` (2 preceding siblings ...)
  2012-12-04  8:54 ` [RFC PATCH 03/10] smp, cpu hotplug: Fix on_each_cpu_*() " Srivatsa S. Bhat
@ 2012-12-04  8:54 ` Srivatsa S. Bhat
  2012-12-04  8:55 ` [RFC PATCH 05/10] kick_process(), cpu-hotplug: Prevent offlining of target CPU properly Srivatsa S. Bhat
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Srivatsa S. Bhat @ 2012-12-04  8:54 UTC (permalink / raw)
  To: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung, vincent.guittot
  Cc: sbw, tj, amit.kucheria, rostedt, rjw, srivatsa.bhat, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel

With stop_machine() gone from the CPU offline path, we can't depend on
preempt_disable() to prevent CPUs from going offline from under us.

Use the get/put_online_cpus_stable_atomic() APIs to prevent CPUs from going
offline; these can be invoked even from atomic context.

Scheduler functions such as try_to_wake_up() and select_task_rq() (and even
select_fallback_rq()) deal with picking new CPUs to run tasks. It would be
better if they picked those CPUs from the set of CPUs that are going to
remain online. So use the cpu_online_stable_mask while making those decisions.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/sched/core.c |   20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2d8927f..ef6ada4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1103,6 +1103,10 @@ EXPORT_SYMBOL_GPL(kick_process);
 #ifdef CONFIG_SMP
 /*
  * ->cpus_allowed is protected by both rq->lock and p->pi_lock
+ *
+ *  Must be called under get/put_online_cpus_stable_atomic() or
+ *  equivalent, to avoid CPUs from going offline from underneath
+ *  us.
  */
 static int select_fallback_rq(int cpu, struct task_struct *p)
 {
@@ -1112,7 +1116,7 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
 
 	/* Look for allowed, online CPU in same node. */
 	for_each_cpu(dest_cpu, nodemask) {
-		if (!cpu_online(dest_cpu))
+		if (!cpu_online_stable(dest_cpu))
 			continue;
 		if (!cpu_active(dest_cpu))
 			continue;
@@ -1123,7 +1127,7 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
 	for (;;) {
 		/* Any allowed, online CPU? */
 		for_each_cpu(dest_cpu, tsk_cpus_allowed(p)) {
-			if (!cpu_online(dest_cpu))
+			if (!cpu_online_stable(dest_cpu))
 				continue;
 			if (!cpu_active(dest_cpu))
 				continue;
@@ -1166,6 +1170,9 @@ out:
 
 /*
  * The caller (fork, wakeup) owns p->pi_lock, ->cpus_allowed is stable.
+ *
+ * Must be called under get/put_online_cpus_stable_atomic(), to prevent
+ * CPUs from going offline from underneath us.
  */
 static inline
 int select_task_rq(struct task_struct *p, int sd_flags, int wake_flags)
@@ -1183,7 +1190,7 @@ int select_task_rq(struct task_struct *p, int sd_flags, int wake_flags)
 	 *   not worry about this generic constraint ]
 	 */
 	if (unlikely(!cpumask_test_cpu(cpu, tsk_cpus_allowed(p)) ||
-		     !cpu_online(cpu)))
+		     !cpu_online_stable(cpu)))
 		cpu = select_fallback_rq(task_cpu(p), p);
 
 	return cpu;
@@ -1406,6 +1413,7 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 	int cpu, success = 0;
 
 	smp_wmb();
+	get_online_cpus_stable_atomic();
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
 	if (!(p->state & state))
 		goto out;
@@ -1446,6 +1454,7 @@ stat:
 	ttwu_stat(p, cpu, wake_flags);
 out:
 	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+	put_online_cpus_stable_atomic();
 
 	return success;
 }
@@ -1624,6 +1633,7 @@ void wake_up_new_task(struct task_struct *p)
 	unsigned long flags;
 	struct rq *rq;
 
+	get_online_cpus_stable_atomic();
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
 #ifdef CONFIG_SMP
 	/*
@@ -1644,6 +1654,7 @@ void wake_up_new_task(struct task_struct *p)
 		p->sched_class->task_woken(rq, p);
 #endif
 	task_rq_unlock(rq, p, &flags);
+	put_online_cpus_stable_atomic();
 }
 
 #ifdef CONFIG_PREEMPT_NOTIFIERS
@@ -2541,6 +2552,7 @@ void sched_exec(void)
 	unsigned long flags;
 	int dest_cpu;
 
+	get_online_cpus_stable_atomic();
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
 	dest_cpu = p->sched_class->select_task_rq(p, SD_BALANCE_EXEC, 0);
 	if (dest_cpu == smp_processor_id())
@@ -2550,11 +2562,13 @@ void sched_exec(void)
 		struct migration_arg arg = { p, dest_cpu };
 
 		raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+		put_online_cpus_stable_atomic();
 		stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg);
 		return;
 	}
 unlock:
 	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+	put_online_cpus_stable_atomic();
 }
 
 #endif


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH 05/10] kick_process(), cpu-hotplug: Prevent offlining of target CPU properly
  2012-12-04  8:53 [RFC PATCH 00/10] CPU hotplug: stop_machine()-free CPU hotplug Srivatsa S. Bhat
                   ` (3 preceding siblings ...)
  2012-12-04  8:54 ` [RFC PATCH 04/10] sched, cpu hotplug: Use stable online cpus in try_to_wake_up() & select_task_rq() Srivatsa S. Bhat
@ 2012-12-04  8:55 ` Srivatsa S. Bhat
  2012-12-04  8:55 ` [RFC PATCH 06/10] yield_to(), cpu-hotplug: Prevent offlining of other CPUs properly Srivatsa S. Bhat
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Srivatsa S. Bhat @ 2012-12-04  8:55 UTC (permalink / raw)
  To: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung, vincent.guittot
  Cc: sbw, tj, amit.kucheria, rostedt, rjw, srivatsa.bhat, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel

With stop_machine() gone from the CPU offline path, we can't depend on
preempt_disable() to prevent CPUs from going offline from under us.

Use the get/put_online_cpus_stable_atomic() APIs to prevent CPUs from going
offline; these can be invoked even from atomic context.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/sched/core.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ef6ada4..c1a5d93 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1091,11 +1091,11 @@ void kick_process(struct task_struct *p)
 {
 	int cpu;
 
-	preempt_disable();
+	get_online_cpus_stable_atomic();
 	cpu = task_cpu(p);
 	if ((cpu != smp_processor_id()) && task_curr(p))
 		smp_send_reschedule(cpu);
-	preempt_enable();
+	put_online_cpus_stable_atomic();
 }
 EXPORT_SYMBOL_GPL(kick_process);
 #endif /* CONFIG_SMP */


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH 06/10] yield_to(), cpu-hotplug: Prevent offlining of other CPUs properly
  2012-12-04  8:53 [RFC PATCH 00/10] CPU hotplug: stop_machine()-free CPU hotplug Srivatsa S. Bhat
                   ` (4 preceding siblings ...)
  2012-12-04  8:55 ` [RFC PATCH 05/10] kick_process(), cpu-hotplug: Prevent offlining of target CPU properly Srivatsa S. Bhat
@ 2012-12-04  8:55 ` Srivatsa S. Bhat
  2012-12-04  8:55 ` [RFC PATCH 07/10] KVM: VMX: fix invalid cpu passed to smp_call_function_single Srivatsa S. Bhat
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Srivatsa S. Bhat @ 2012-12-04  8:55 UTC (permalink / raw)
  To: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung, vincent.guittot
  Cc: sbw, tj, amit.kucheria, rostedt, rjw, srivatsa.bhat, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel

With stop_machine() gone from the CPU offline path, we can't depend on
local_irq_save() to prevent CPUs from going offline from under us.

Use the get/put_online_cpus_stable_atomic() APIs to prevent CPUs from
going offline; these can be invoked even from atomic context. And use the
stable online mask while interacting with other CPUs.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/sched/core.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c1a5d93..d0b159b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4312,6 +4312,7 @@ bool __sched yield_to(struct task_struct *p, bool preempt)
 	unsigned long flags;
 	bool yielded = 0;
 
+	get_online_cpus_stable_atomic();
 	local_irq_save(flags);
 	rq = this_rq();
 
@@ -4339,13 +4340,14 @@ again:
 		 * Make p's CPU reschedule; pick_next_entity takes care of
 		 * fairness.
 		 */
-		if (preempt && rq != p_rq)
+		if (preempt && rq != p_rq && cpu_online_stable(task_cpu(p)))
 			resched_task(p_rq->curr);
 	}
 
 out:
 	double_rq_unlock(rq, p_rq);
 	local_irq_restore(flags);
+	put_online_cpus_stable_atomic();
 
 	if (yielded)
 		schedule();


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH 07/10] KVM: VMX: fix invalid cpu passed to smp_call_function_single
  2012-12-04  8:53 [RFC PATCH 00/10] CPU hotplug: stop_machine()-free CPU hotplug Srivatsa S. Bhat
                   ` (5 preceding siblings ...)
  2012-12-04  8:55 ` [RFC PATCH 06/10] yield_to(), cpu-hotplug: Prevent offlining of other CPUs properly Srivatsa S. Bhat
@ 2012-12-04  8:55 ` Srivatsa S. Bhat
  2012-12-04  8:55 ` [RFC PATCH 08/10] KVM: VMX: fix memory order between loading vmcs and clearing vmcs Srivatsa S. Bhat
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Srivatsa S. Bhat @ 2012-12-04  8:55 UTC (permalink / raw)
  To: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung, vincent.guittot
  Cc: sbw, tj, amit.kucheria, rostedt, rjw, srivatsa.bhat, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel

From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>

In loaded_vmcs_clear(), loaded_vmcs->cpu is the first parameter passed to
smp_call_function_single(). If the target cpu is going down (being
hot-removed), loaded_vmcs->cpu can become -1, and then -1 gets passed to
smp_call_function_single().

This can be triggered when a vcpu is being destroyed, since
loaded_vmcs_clear() is called in preemptible context.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/x86/kvm/vmx.c |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f858159..9dc562a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1007,9 +1007,11 @@ static void __loaded_vmcs_clear(void *arg)
 
 static void loaded_vmcs_clear(struct loaded_vmcs *loaded_vmcs)
 {
-	if (loaded_vmcs->cpu != -1)
-		smp_call_function_single(
-			loaded_vmcs->cpu, __loaded_vmcs_clear, loaded_vmcs, 1);
+	int cpu = loaded_vmcs->cpu;
+
+	if (cpu != -1)
+		smp_call_function_single(cpu,
+			 __loaded_vmcs_clear, loaded_vmcs, 1);
 }
 
 static inline void vpid_sync_vcpu_single(struct vcpu_vmx *vmx)


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH 08/10] KVM: VMX: fix memory order between loading vmcs and clearing vmcs
  2012-12-04  8:53 [RFC PATCH 00/10] CPU hotplug: stop_machine()-free CPU hotplug Srivatsa S. Bhat
                   ` (6 preceding siblings ...)
  2012-12-04  8:55 ` [RFC PATCH 07/10] KVM: VMX: fix invalid cpu passed to smp_call_function_single Srivatsa S. Bhat
@ 2012-12-04  8:55 ` Srivatsa S. Bhat
  2012-12-04  8:56 ` [RFC PATCH 09/10] KVM: VMX: fix unsyc vmcs status when cpu is going down Srivatsa S. Bhat
  2012-12-04  8:56 ` [RFC PATCH 10/10] cpu: No more __stop_machine() in _cpu_down() Srivatsa S. Bhat
  9 siblings, 0 replies; 21+ messages in thread
From: Srivatsa S. Bhat @ 2012-12-04  8:55 UTC (permalink / raw)
  To: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung, vincent.guittot
  Cc: sbw, tj, amit.kucheria, rostedt, rjw, srivatsa.bhat, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel

From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>

vmcs->cpu indicates which cpu the vmcs is loaded on; -1 means the vmcs is
not loaded on any cpu.

If a vcpu loads a vmcs whose vmcs->cpu is -1, the vmcs can be added directly
to that cpu's percpu list. The list can get corrupted if the cpu prefetches
the vmcs's list link before reading vmcs->cpu. Correspondingly, we should
remove the vmcs from the list before making vmcs->cpu == -1 visible.
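
The required barrier pairing, schematically (clear_side()/load_side() are
condensed stand-ins for __loaded_vmcs_clear() and vmx_vcpu_load(); the actual
barriers are the ones added in the diff below):

	/* Clearing side: runs on the cpu the vmcs is currently loaded on */
	static void clear_side(struct loaded_vmcs *v)
	{
		list_del(&v->loaded_vmcss_on_cpu_link);	/* A: unlink */
		smp_wmb();				/* order A before B */
		v->cpu = -1;				/* B: publish "not loaded" */
	}

	/* Loading side: runs on the cpu that wants to load the vmcs */
	static void load_side(struct loaded_vmcs *v, int cpu)
	{
		if (v->cpu != cpu) {			/* C: may observe B */
			smp_rmb();			/* order C before D */
			list_add(&v->loaded_vmcss_on_cpu_link,	/* D: link */
				 &per_cpu(loaded_vmcss_on_cpu, cpu));
		}
	}

If the loading side observes v->cpu == -1 at C, the smp_wmb()/smp_rmb() pair
guarantees that the list_del() at A is also visible, so the vmcs cannot end
up on the new cpu's list while a stale link into the old list is still used.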

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/x86/kvm/vmx.c |   17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9dc562a..95e502b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1002,6 +1002,15 @@ static void __loaded_vmcs_clear(void *arg)
 	if (per_cpu(current_vmcs, cpu) == loaded_vmcs->vmcs)
 		per_cpu(current_vmcs, cpu) = NULL;
 	list_del(&loaded_vmcs->loaded_vmcss_on_cpu_link);
+
+	/*
+	 * We should ensure that updating loaded_vmcs->loaded_vmcss_on_cpu_link
+	 * happens before setting loaded_vmcs->cpu to -1, which is done in
+	 * loaded_vmcs_init(). Otherwise, another cpu can see cpu == -1 first
+	 * and then add the vmcs into its percpu list before it is deleted.
+	 */
+	smp_wmb();
+
 	loaded_vmcs_init(loaded_vmcs);
 }
 
@@ -1537,6 +1546,14 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
 		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
 		local_irq_disable();
+
+		/*
+		 * Read loaded_vmcs->cpu should be before fetching
+		 * loaded_vmcs->loaded_vmcss_on_cpu_link.
+		 * See the comments in __loaded_vmcs_clear().
+		 */
+		smp_rmb();
+
 		list_add(&vmx->loaded_vmcs->loaded_vmcss_on_cpu_link,
 			 &per_cpu(loaded_vmcss_on_cpu, cpu));
 		local_irq_enable();


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH 09/10] KVM: VMX: fix unsyc vmcs status when cpu is going down
  2012-12-04  8:53 [RFC PATCH 00/10] CPU hotplug: stop_machine()-free CPU hotplug Srivatsa S. Bhat
                   ` (7 preceding siblings ...)
  2012-12-04  8:55 ` [RFC PATCH 08/10] KVM: VMX: fix memory order between loading vmcs and clearing vmcs Srivatsa S. Bhat
@ 2012-12-04  8:56 ` Srivatsa S. Bhat
  2012-12-04  8:56 ` [RFC PATCH 10/10] cpu: No more __stop_machine() in _cpu_down() Srivatsa S. Bhat
  9 siblings, 0 replies; 21+ messages in thread
From: Srivatsa S. Bhat @ 2012-12-04  8:56 UTC (permalink / raw)
  To: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung, vincent.guittot
  Cc: sbw, tj, amit.kucheria, rostedt, rjw, srivatsa.bhat, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel

From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>

When a vcpu is scheduled onto a different cpu, it should send an IPI to
the cpu on which the vcpu last ran, in order to clear that vcpu's vmcs there.

Today this is safe, since the cpu-offline path cannot run concurrently with
the other cpus. After implementing stop_machine()-free hotplug,
smp_call_function_single() will return -ENXIO immediately when the target cpu
is going down. That means we can load the vmcs while it has not been cleared
on the target cpu, corrupting per_cpu(loaded_vmcss_on_cpu, cpu). This bug can
be triggered in a case like this:

# general protection fault: 0000 [#1] PREEMPT SMP

[......]

Call Trace:
 [<ffffffffa052980f>] kvm_arch_hardware_disable+0x1f/0x50 [kvm]
 [<ffffffffa050ef43>] hardware_disable_nolock+0x33/0x40 [kvm]
 [<ffffffffa050efa3>] kvm_cpu_hotplug+0x53/0xb0 [kvm]
 [<ffffffff81548b1d>] notifier_call_chain+0x4d/0x70
 [<ffffffff81517fe0>] ? spp_getpage+0xb0/0xb0
 [<ffffffff8108459e>] __raw_notifier_call_chain+0xe/0x10
 [<ffffffff810599f0>] __cpu_notify+0x20/0x40
 [<ffffffff8151802e>] take_cpu_down+0x4e/0x90
 [<ffffffff810d184b>] cpu_stopper_thread+0xdb/0x1d0
 [<ffffffff8108b3ce>] ? finish_task_switch+0x4e/0xe0
 [<ffffffff815438d0>] ? __schedule+0x460/0x740
 [<ffffffff810d1770>] ? cpu_stop_signal_done+0x40/0x40
 [<ffffffff8107de30>] kthread+0xc0/0xd0
 [<ffffffff8107dd70>] ? flush_kthread_worker+0xb0/0xb0
 [<ffffffff8154cc6c>] ret_from_fork+0x7c/0xb0
 [<ffffffff8107dd70>] ? flush_kthread_worker+0xb0/0xb0

Fix it by waiting for the target cpu to clear the vmcs

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/x86/kvm/vmx.c |   11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 95e502b..4fb4e51 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1019,8 +1019,15 @@ static void loaded_vmcs_clear(struct loaded_vmcs *loaded_vmcs)
 	int cpu = loaded_vmcs->cpu;
 
 	if (cpu != -1)
-		smp_call_function_single(cpu,
-			 __loaded_vmcs_clear, loaded_vmcs, 1);
+		if (smp_call_function_single(cpu,
+			 __loaded_vmcs_clear, loaded_vmcs, 1))
+
+			/*
+			 * The target cpu is going down, we should
+			 * wait for it to clear the vmcs status.
+			 */
+			while (ACCESS_ONCE(loaded_vmcs->cpu) != -1)
+				cpu_relax();
 }
 
 static inline void vpid_sync_vcpu_single(struct vcpu_vmx *vmx)


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH 10/10] cpu: No more __stop_machine() in _cpu_down()
  2012-12-04  8:53 [RFC PATCH 00/10] CPU hotplug: stop_machine()-free CPU hotplug Srivatsa S. Bhat
                   ` (8 preceding siblings ...)
  2012-12-04  8:56 ` [RFC PATCH 09/10] KVM: VMX: fix unsyc vmcs status when cpu is going down Srivatsa S. Bhat
@ 2012-12-04  8:56 ` Srivatsa S. Bhat
  9 siblings, 0 replies; 21+ messages in thread
From: Srivatsa S. Bhat @ 2012-12-04  8:56 UTC (permalink / raw)
  To: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung, vincent.guittot
  Cc: sbw, tj, amit.kucheria, rostedt, rjw, srivatsa.bhat, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel

From: Paul E. McKenney <paul.mckenney@linaro.org>

The _cpu_down() function invoked as part of the CPU-hotplug offlining
process currently invokes __stop_machine(), which is slow and inflicts
substantial real-time latencies on the entire system.  This patch
substitutes stop_cpus() for __stop_machine() in order to improve
both performance and real-time latency.

This is currently unsafe, because there are a number of uses of
preempt_disable() that are intended to block CPU-hotplug offlining.
These will be fixed by using get/put_online_cpus_stable_atomic(), but in the
meantime, this commit is one way to help locate them.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ Srivatsa: Refer to get/put_online_cpus_stable_atomic() in the changelog ]
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/cpu.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index aaf2393..59592e5 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -352,15 +352,20 @@ static int __ref take_cpu_down(void *_param)
 {
 	struct take_cpu_down_param *param = _param;
 	int err, cpu = (long)(param->hcpu);
+	unsigned long flags;
 
 	prepare_cpu_take_down(cpu);
 
 	/* Ensure this CPU doesn't handle any more interrupts. */
+	local_irq_save(flags);
 	err = __cpu_disable();
-	if (err < 0)
+	if (err < 0) {
+		local_irq_restore(flags);
 		return err;
+	}
 
 	cpu_notify(CPU_DYING | param->mod, param->hcpu);
+	local_irq_restore(flags);
 	return 0;
 }
 
@@ -393,7 +398,7 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
 	}
 	smpboot_park_threads(cpu);
 
-	err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
+	err = stop_cpus(cpumask_of(cpu), take_cpu_down, &tcd_param);
 	if (err) {
 		/* CPU didn't die: tell everyone.  Can't complain. */
 		smpboot_unpark_threads(cpu);


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 01/10] CPU hotplug: Introduce "stable" cpu online mask, for atomic hotplug readers
  2012-12-04  8:53 ` [RFC PATCH 01/10] CPU hotplug: Introduce "stable" cpu online mask, for atomic hotplug readers Srivatsa S. Bhat
@ 2012-12-04 15:17   ` Tejun Heo
  2012-12-04 21:14     ` Srivatsa S. Bhat
  2012-12-04 22:10   ` Andrew Morton
  1 sibling, 1 reply; 21+ messages in thread
From: Tejun Heo @ 2012-12-04 15:17 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, sbw, amit.kucheria, rostedt, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel

Hello, Srivatsa.

On Tue, Dec 04, 2012 at 02:23:41PM +0530, Srivatsa S. Bhat wrote:
>  extern const struct cpumask *const cpu_possible_mask;
>  extern const struct cpumask *const cpu_online_mask;
> +extern const struct cpumask *const cpu_online_stable_mask;
>  extern const struct cpumask *const cpu_present_mask;
>  extern const struct cpumask *const cpu_active_mask;

This is a bit nasty.  The distinction between cpu_online_mask and the
stable one is quite subtle and there's no mechanism to verify the
right one is in use.  IIUC, the only time cpu_online_mask and
cpu_online_stable_mask can deviate is during the final stage CPU take
down, right?  If so, why not just make cpu_online_mask the stable one
and the few cases where the actual online state matters deal with the
internal version?  Anyone outside cpu hotplug proper should be happy
with the stable version anyway, no?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 01/10] CPU hotplug: Introduce "stable" cpu online mask, for atomic hotplug readers
  2012-12-04 15:17   ` Tejun Heo
@ 2012-12-04 21:14     ` Srivatsa S. Bhat
  0 siblings, 0 replies; 21+ messages in thread
From: Srivatsa S. Bhat @ 2012-12-04 21:14 UTC (permalink / raw)
  To: Tejun Heo
  Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, sbw, amit.kucheria, rostedt, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel

Hi Tejun,

On 12/04/2012 08:47 PM, Tejun Heo wrote:
> Hello, Srivatsa.
> 
> On Tue, Dec 04, 2012 at 02:23:41PM +0530, Srivatsa S. Bhat wrote:
>>  extern const struct cpumask *const cpu_possible_mask;
>>  extern const struct cpumask *const cpu_online_mask;
>> +extern const struct cpumask *const cpu_online_stable_mask;
>>  extern const struct cpumask *const cpu_present_mask;
>>  extern const struct cpumask *const cpu_active_mask;
> 
> This is a bit nasty.  The distinction between cpu_online_mask and the
> stable one is quite subtle and there's no mechanism to verify the
> right one is in use.  IIUC, the only time cpu_online_mask and
> cpu_online_stable_mask can deviate is during the final stage CPU take
> down, right?

No, actually they deviate in the initial stage itself. We flip the bit
in the stable mask right in the beginning, and then flip the bit in the
online mask slightly later, in __cpu_disable().

...which makes it look stupid to have a separate "stable" mask in the
first place! Hmm...

Thinking in this direction a bit more, I have written a patchset that
doesn't need a separate stable mask, but which works with the existing
cpu_online_mask itself. I'll post it tomorrow after testing and updating
the patch descriptions.

One of the things I'm trying to achieve is to identify 2 types of
hotplug readers: 

1. Readers who care only about synchronizing with the updates to
cpu_online_mask (light-weight readers)

2. Readers who really want full synchronization with the entire CPU
tear-down sequence.

The reason for doing this, instead of assuming every reader to be of
type 2, is that if we don't make this distinction, we can end up with the
very same latency issues and performance problems that we hit when
using stop_machine(), without even using stop_machine()!

[The readers can be in very hot paths, like interrupt handlers. So if
there is no distinction between light-weight readers and full-readers,
we can potentially slow down the entire machine unnecessarily, effectively
creating the same effect as stop_machine()]

IOW, IMHO, one of the goals of the replacement to stop_machine() should
be that it should not indirectly induce the "stop_machine() effect".

The new patchset that I have written takes care of this requirement
and provides APIs for both types of readers, and also doesn't use
any extra cpu masks. I'll post this patchset tomorrow, after taking a
careful look at it again.

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 01/10] CPU hotplug: Introduce "stable" cpu online mask, for atomic hotplug readers
  2012-12-04  8:53 ` [RFC PATCH 01/10] CPU hotplug: Introduce "stable" cpu online mask, for atomic hotplug readers Srivatsa S. Bhat
  2012-12-04 15:17   ` Tejun Heo
@ 2012-12-04 22:10   ` Andrew Morton
  2012-12-05  2:56     ` Michael Wang
  2012-12-05 12:38     ` Srivatsa S. Bhat
  1 sibling, 2 replies; 21+ messages in thread
From: Andrew Morton @ 2012-12-04 22:10 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: tglx, peterz, paulmck, rusty, mingo, namhyung, vincent.guittot,
	sbw, tj, amit.kucheria, rostedt, rjw, wangyun, xiaoguangrong,
	nikunj, linux-pm, linux-kernel

On Tue, 04 Dec 2012 14:23:41 +0530
"Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> wrote:

> From: Michael Wang <wangyun@linux.vnet.ibm.com>
> 
> There are places where preempt_disable() is used to prevent any CPU from
> going offline during the critical section. Let us call them as "atomic
> hotplug readers" (atomic because they run in atomic contexts).
> 
> Often, these atomic hotplug readers have a simple need : they want the cpu
> online mask that they work with (inside their critical section), to be
> stable, i.e., it should be guaranteed that CPUs in that mask won't go
> offline during the critical section. An important point here is that they
> don't really care if such a "stable" mask is a subset of the actual
> cpu_online_mask.
> 
> The intent of this patch is to provide such a "stable" cpu online mask
> for that class of atomic hotplug readers.
> 
> Fundamental idea behind the design:
> -----------------------------------
> 
> Simply put, have a separate mask called the stable cpu online mask; and
> at the hotplug writer (cpu_down()), note down the CPU that is about to go
> offline, and remove it from the stable cpu online mask. Then, feel free
> to take that CPU offline, since the atomic hotplug readers won't see it
> from now on. Also, don't start any new cpu_down() operations until all
> existing atomic hotplug readers have completed (because they might still
> be working with the old value of the stable online mask).
> 
> Some important design requirements and considerations:
> -----------------------------------------------------
> 
> 1. The atomic hotplug readers should ideally *never* wait for the hotplug
>    writer (cpu_down()) for *anything*. Because, these atomic hotplug readers
>    can be in very hot-paths like interrupt handling/IPI and hence, if they
>    have to wait for an ongoing cpu_down() to complete, it would pretty much
>    introduce the same performance/latency problems as stop_machine().
> 
> 2. Any synchronization at the atomic hotplug readers side must be highly
>    scalable - avoid global locks/counters etc. Because, these paths currently
>    use the extremely fast preempt_disable(); our replacement to
>    preempt_disable() should not become ridiculously costly.
> 
> 3. preempt_disable() was recursive. The replacement should also be recursive.
> 
> Implementation of the design:
> ----------------------------
> 
> Atomic hotplug reader side:
> 
> We use per-cpu counters to mark the presence of atomic hotplug readers.
> A reader would increment its per-cpu counter and continue, without waiting
> for anything. And no locks are used in this path. Together, these satisfy
> all the 3 requirements mentioned above.
> 
> The hotplug writer uses (reads) the per-cpu counters of all CPUs in order to
> ensure that all existing atomic hotplug readers have completed. Only after
> that, it will proceed to actually take the CPU offline.
> 
> [ Michael: Designed the synchronization for the IPI case ]

Like this:

[wangyun@linux.vnet.ibm.com: designed the synchronization for the IPI case]

> Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
> [ Srivatsa: Generalized it to work for all cases and wrote the changelog ]
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>
> ...
>
> --- a/include/linux/cpu.h
> +++ b/include/linux/cpu.h
> @@ -175,6 +175,8 @@ extern struct bus_type cpu_subsys;
>  
>  extern void get_online_cpus(void);
>  extern void put_online_cpus(void);
> +extern void get_online_cpus_stable_atomic(void);
> +extern void put_online_cpus_stable_atomic(void);
>  #define hotcpu_notifier(fn, pri)	cpu_notifier(fn, pri)
>  #define register_hotcpu_notifier(nb)	register_cpu_notifier(nb)
>  #define unregister_hotcpu_notifier(nb)	unregister_cpu_notifier(nb)
> @@ -198,6 +200,8 @@ static inline void cpu_hotplug_driver_unlock(void)
>  
>  #define get_online_cpus()	do { } while (0)
>  #define put_online_cpus()	do { } while (0)
> +#define get_online_cpus_stable_atomic()	do { } while (0)
> +#define put_online_cpus_stable_atomic()	do { } while (0)

static inline C functions would be preferred if possible.  Feel free to
fix up the wrong crufty surrounding code as well ;)

>
> ...
>
> @@ -705,12 +707,15 @@ extern const DECLARE_BITMAP(cpu_all_bits, NR_CPUS);
>  
>  #define for_each_possible_cpu(cpu) for_each_cpu((cpu), cpu_possible_mask)
>  #define for_each_online_cpu(cpu)   for_each_cpu((cpu), cpu_online_mask)
> +#define for_each_online_cpu_stable(cpu)					  \
> +				for_each_cpu((cpu), cpu_online_stable_mask)
>  #define for_each_present_cpu(cpu)  for_each_cpu((cpu), cpu_present_mask)
>  
>  /* Wrappers for arch boot code to manipulate normally-constant masks */
>  void set_cpu_possible(unsigned int cpu, bool possible);
>  void set_cpu_present(unsigned int cpu, bool present);
>  void set_cpu_online(unsigned int cpu, bool online);
> +void set_cpu_online_stable(unsigned int cpu, bool online);

The naming is inconsistent.  This is "cpu_online_stable", but
for_each_online_cpu_stable() is "online_cpu_stable".  Can we make
everything the same?

>  void set_cpu_active(unsigned int cpu, bool active);
>  void init_cpu_present(const struct cpumask *src);
>  void init_cpu_possible(const struct cpumask *src);
>
> ...
>
> +void get_online_cpus_stable_atomic(void)
> +{
> +	preempt_disable();
> +	this_cpu_inc(hotplug_reader_refcount);
> +
> +	/*
> +	 * Prevent reordering of writes to hotplug_reader_refcount and
> +	 * reads from cpu_online_stable_mask.
> +	 */
> +	smp_mb();
> +}
> +EXPORT_SYMBOL_GPL(get_online_cpus_stable_atomic);
> +
> +void put_online_cpus_stable_atomic(void)
> +{
> +	/*
> +	 * Prevent reordering of reads from cpu_online_stable_mask and
> +	 * writes to hotplug_reader_refcount.
> +	 */
> +	smp_mb();
> +	this_cpu_dec(hotplug_reader_refcount);
> +	preempt_enable();
> +}
> +EXPORT_SYMBOL_GPL(put_online_cpus_stable_atomic);
> +
>  static struct {
>  	struct task_struct *active_writer;
>  	struct mutex lock; /* Synchronizes accesses to refcount, */
> @@ -237,6 +304,44 @@ static inline void check_for_tasks(int cpu)
>  	write_unlock_irq(&tasklist_lock);
>  }
>  
> +/*
> + * We want all atomic hotplug readers to refer to the new value of
> + * cpu_online_stable_mask. So wait for the existing atomic hotplug readers
> + * to complete. Any new atomic hotplug readers will see the updated mask
> + * and hence pose no problems.
> + */
> +static void sync_hotplug_readers(void)
> +{
> +	unsigned int cpu;
> +
> +	for_each_online_cpu(cpu) {
> +		while (per_cpu(hotplug_reader_refcount, cpu))
> +			cpu_relax();
> +	}
> +}

That all looks solid to me.

> +/*
> + * We are serious about taking this CPU down. So clear the CPU from the
> + * stable online mask.
> + */
> +static void prepare_cpu_take_down(unsigned int cpu)
> +{
> +	set_cpu_online_stable(cpu, false);
> +
> +	/*
> +	 * Prevent reordering of writes to cpu_online_stable_mask and reads
> +	 * from hotplug_reader_refcount.
> +	 */
> +	smp_mb();
> +
> +	/*
> +	 * Wait for all active hotplug readers to complete, to ensure that
> +	 * there are no critical sections still referring to the old stable
> +	 * online mask.
> +	 */
> +	sync_hotplug_readers();
> +}

I wonder about the cpu-online case.  A typical caller might want to do:


/*
 * Set each online CPU's "foo" to "bar"
 */

int global_bar;

void set_cpu_foo(int bar)
{
	get_online_cpus_stable_atomic();
	global_bar = bar;
	for_each_online_cpu_stable()
		cpu->foo = bar;
	put_online_cpus_stable_atomic()
}

void cpu_online_notifier_handler(void)
{
	cpu->foo = global_bar;
}

And I think that set_cpu_foo() would be buggy, because a CPU could come
online before global_bar was altered, and that newly-online CPU would
pick up the old value of `bar'.

So what's the rule here?  global_bar must be written before we run
get_online_cpus_stable_atomic()?

Anyway, please have a think and spell all this out?
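(For illustration only: if the rule ends up being "publish the shared state
before pinning the readers", set_cpu_foo() above would look something like

	void set_cpu_foo(int bar)
	{
		global_bar = bar;	/* publish before a new CPU can read it */
		smp_mb();		/* pairs with a barrier in the online notifier */
		get_online_cpus_stable_atomic();
		for_each_online_cpu_stable(cpu)
			cpu->foo = bar;
		put_online_cpus_stable_atomic();
	}

but that is only one guess at what the convention should be.)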

>  struct take_cpu_down_param {
>  	unsigned long mod;
>  	void *hcpu;
> @@ -246,7 +351,9 @@ struct take_cpu_down_param {
>  static int __ref take_cpu_down(void *_param)
>  {
>  	struct take_cpu_down_param *param = _param;
> -	int err;
> +	int err, cpu = (long)(param->hcpu);
> +

Like this please:

	int err;
	int cpu = (long)(param->hcpu);

> +	prepare_cpu_take_down(cpu);
>  
>  	/* Ensure this CPU doesn't handle any more interrupts. */
>  	err = __cpu_disable();
>
> ...
>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 02/10] smp, cpu hotplug: Fix smp_call_function_*() to prevent CPU offline properly
  2012-12-04  8:54 ` [RFC PATCH 02/10] smp, cpu hotplug: Fix smp_call_function_*() to prevent CPU offline properly Srivatsa S. Bhat
@ 2012-12-04 22:17   ` Andrew Morton
  2012-12-05 12:41     ` Srivatsa S. Bhat
  2012-12-04 23:39   ` Rusty Russell
  1 sibling, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2012-12-04 22:17 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: tglx, peterz, paulmck, rusty, mingo, namhyung, vincent.guittot,
	sbw, tj, amit.kucheria, rostedt, rjw, wangyun, xiaoguangrong,
	nikunj, linux-pm, linux-kernel

On Tue, 04 Dec 2012 14:24:28 +0530
"Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> wrote:

> From: Michael Wang <wangyun@linux.vnet.ibm.com>
> 
> With stop_machine() gone from the CPU offline path, we can't depend on
> preempt_disable() to prevent CPUs from going offline from under us.
> 
> Use the get/put_online_cpus_stable_atomic() APIs to prevent CPUs from going
> offline, while invoking from atomic context.
>
> ...
>
>  	 */
> -	this_cpu = get_cpu();
> +	get_online_cpus_stable_atomic();
> +	this_cpu = smp_processor_id();

I wonder if get_online_cpus_stable_atomic() should return the local CPU
ID.  Just as a little convenience thing.  Time will tell.
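Something like this, perhaps (just sketching the idea; the body is what the
patch already does, plus the return value):

	unsigned int get_online_cpus_stable_atomic(void)
	{
		preempt_disable();
		this_cpu_inc(hotplug_reader_refcount);
		smp_mb();	/* order refcount write vs. subsequent mask reads */
		return smp_processor_id();
	}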
 
>  	/*
>  	 * Can deadlock when called with interrupts disabled.
>
> ...
>
> @@ -380,15 +383,15 @@ int smp_call_function_any(const struct cpumask *mask,
>  	nodemask = cpumask_of_node(cpu_to_node(cpu));
>  	for (cpu = cpumask_first_and(nodemask, mask); cpu < nr_cpu_ids;
>  	     cpu = cpumask_next_and(cpu, nodemask, mask)) {
> -		if (cpu_online(cpu))
> +		if (cpu_online_stable(cpu))
>  			goto call;
>  	}
>  
>  	/* Any online will do: smp_call_function_single handles nr_cpu_ids. */
> -	cpu = cpumask_any_and(mask, cpu_online_mask);
> +	cpu = cpumask_any_and(mask, cpu_online_stable_mask);
>  call:
>  	ret = smp_call_function_single(cpu, func, info, wait);
> -	put_cpu();
> +	put_online_cpus_stable_atomic();
>  	return ret;
>  }
>  EXPORT_SYMBOL_GPL(smp_call_function_any);

So smp_call_function_any() has no synchronization against CPUs coming
online.  Hence callers of smp_call_function_any() are responsible for
ensuring that CPUs which are concurrently coming online will adopt the
required state?

I guess that has always been the case...


>
> ...
>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 02/10] smp, cpu hotplug: Fix smp_call_function_*() to prevent CPU offline properly
  2012-12-04  8:54 ` [RFC PATCH 02/10] smp, cpu hotplug: Fix smp_call_function_*() to prevent CPU offline properly Srivatsa S. Bhat
  2012-12-04 22:17   ` Andrew Morton
@ 2012-12-04 23:39   ` Rusty Russell
  2012-12-05 12:44     ` Srivatsa S. Bhat
  1 sibling, 1 reply; 21+ messages in thread
From: Rusty Russell @ 2012-12-04 23:39 UTC (permalink / raw)
  To: Srivatsa S. Bhat, tglx, peterz, paulmck, mingo, akpm, namhyung,
	vincent.guittot
  Cc: sbw, tj, amit.kucheria, rostedt, rjw, srivatsa.bhat, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel

"Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> writes:
> From: Michael Wang <wangyun@linux.vnet.ibm.com>
>
> With stop_machine() gone from the CPU offline path, we can't depend on
> preempt_disable() to prevent CPUs from going offline from under us.

Minor gripe: I'd prefer this paragraph to use the future rather than
past tense, like: "Once stop_machine() is gone ... we won't be able to
depend".

Since you're not supposed to use the _stable() accessors without calling
get_online_cpus_stable_atomic(), why not have
get_online_cpus_stable_atomic() return a pointer to the stable cpumask?
(Which is otherwise static, at least for debug).
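i.e. something along these lines (sketch only):

	const struct cpumask *get_online_cpus_stable_atomic(void)
	{
		preempt_disable();
		this_cpu_inc(hotplug_reader_refcount);
		smp_mb();
		return cpu_online_stable_mask;
	}

so callers just iterate the returned mask with for_each_cpu().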

Might make the patches messier though...

Oh, and I'd love to see actual benchmarks to make sure we've actually
fixed a problem with this ;)

Rusty.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 01/10] CPU hotplug: Introduce "stable" cpu online mask, for atomic hotplug readers
  2012-12-04 22:10   ` Andrew Morton
@ 2012-12-05  2:56     ` Michael Wang
  2012-12-05  3:28       ` Michael Wang
  2012-12-05 12:38     ` Srivatsa S. Bhat
  1 sibling, 1 reply; 21+ messages in thread
From: Michael Wang @ 2012-12-05  2:56 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Srivatsa S. Bhat, tglx, peterz, paulmck, rusty, mingo, namhyung,
	vincent.guittot, sbw, tj, amit.kucheria, rostedt, rjw,
	xiaoguangrong, nikunj, linux-pm, linux-kernel

On 12/05/2012 06:10 AM, Andrew Morton wrote:
> On Tue, 04 Dec 2012 14:23:41 +0530
> "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> wrote:
> 
>> From: Michael Wang <wangyun@linux.vnet.ibm.com>
>>
>> There are places where preempt_disable() is used to prevent any CPU from
>> going offline during the critical section. Let us call them as "atomic
>> hotplug readers" (atomic because they run in atomic contexts).
>>
>> Often, these atomic hotplug readers have a simple need : they want the cpu
>> online mask that they work with (inside their critical section), to be
>> stable, i.e., it should be guaranteed that CPUs in that mask won't go
>> offline during the critical section. An important point here is that they
>> don't really care if such a "stable" mask is a subset of the actual
>> cpu_online_mask.
>>
>> The intent of this patch is to provide such a "stable" cpu online mask
>> for that class of atomic hotplug readers.
>>
>> Fundamental idea behind the design:
>> -----------------------------------
>>
>> Simply put, have a separate mask called the stable cpu online mask; and
>> at the hotplug writer (cpu_down()), note down the CPU that is about to go
>> offline, and remove it from the stable cpu online mask. Then, feel free
>> to take that CPU offline, since the atomic hotplug readers won't see it
>> from now on. Also, don't start any new cpu_down() operations until all
>> existing atomic hotplug readers have completed (because they might still
>> be working with the old value of the stable online mask).
>>
>> Some important design requirements and considerations:
>> -----------------------------------------------------
>>
>> 1. The atomic hotplug readers should ideally *never* wait for the hotplug
>>    writer (cpu_down()) for *anything*. Because, these atomic hotplug readers
>>    can be in very hot-paths like interrupt handling/IPI and hence, if they
>>    have to wait for an ongoing cpu_down() to complete, it would pretty much
>>    introduce the same performance/latency problems as stop_machine().
>>
>> 2. Any synchronization at the atomic hotplug readers side must be highly
>>    scalable - avoid global locks/counters etc. Because, these paths currently
>>    use the extremely fast preempt_disable(); our replacement to
>>    preempt_disable() should not become ridiculously costly.
>>
>> 3. preempt_disable() was recursive. The replacement should also be recursive.
>>
>> Implementation of the design:
>> ----------------------------
>>
>> Atomic hotplug reader side:
>>
>> We use per-cpu counters to mark the presence of atomic hotplug readers.
>> A reader would increment its per-cpu counter and continue, without waiting
>> for anything. And no locks are used in this path. Together, these satisfy
>> all the 3 requirements mentioned above.
>>
>> The hotplug writer uses (reads) the per-cpu counters of all CPUs in order to
>> ensure that all existing atomic hotplug readers have completed. Only after
>> that, it will proceed to actually take the CPU offline.
>>
>> [ Michael: Designed the synchronization for the IPI case ]
> 
> Like this:
> 
> [wangyun@linux.vnet.ibm.com: designed the synchronization for the IPI case]
> 
>> Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
>> [ Srivatsa: Generalized it to work for all cases and wrote the changelog ]
>> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>>
>> ...
>>
>> --- a/include/linux/cpu.h
>> +++ b/include/linux/cpu.h
>> @@ -175,6 +175,8 @@ extern struct bus_type cpu_subsys;
>>  
>>  extern void get_online_cpus(void);
>>  extern void put_online_cpus(void);
>> +extern void get_online_cpus_stable_atomic(void);
>> +extern void put_online_cpus_stable_atomic(void);
>>  #define hotcpu_notifier(fn, pri)	cpu_notifier(fn, pri)
>>  #define register_hotcpu_notifier(nb)	register_cpu_notifier(nb)
>>  #define unregister_hotcpu_notifier(nb)	unregister_cpu_notifier(nb)
>> @@ -198,6 +200,8 @@ static inline void cpu_hotplug_driver_unlock(void)
>>  
>>  #define get_online_cpus()	do { } while (0)
>>  #define put_online_cpus()	do { } while (0)
>> +#define get_online_cpus_stable_atomic()	do { } while (0)
>> +#define put_online_cpus_stable_atomic()	do { } while (0)
> 
> static inline C functions would be preferred if possible.  Feel free to
> fix up the wrong crufty surrounding code as well ;)
> 
>>
>> ...
>>
>> @@ -705,12 +707,15 @@ extern const DECLARE_BITMAP(cpu_all_bits, NR_CPUS);
>>  
>>  #define for_each_possible_cpu(cpu) for_each_cpu((cpu), cpu_possible_mask)
>>  #define for_each_online_cpu(cpu)   for_each_cpu((cpu), cpu_online_mask)
>> +#define for_each_online_cpu_stable(cpu)					  \
>> +				for_each_cpu((cpu), cpu_online_stable_mask)
>>  #define for_each_present_cpu(cpu)  for_each_cpu((cpu), cpu_present_mask)
>>  
>>  /* Wrappers for arch boot code to manipulate normally-constant masks */
>>  void set_cpu_possible(unsigned int cpu, bool possible);
>>  void set_cpu_present(unsigned int cpu, bool present);
>>  void set_cpu_online(unsigned int cpu, bool online);
>> +void set_cpu_online_stable(unsigned int cpu, bool online);
> 
> The naming is inconsistent.  This is "cpu_online_stable", but
> for_each_online_cpu_stable() is "online_cpu_stable".  Can we make
> everything the same?
> 
>>  void set_cpu_active(unsigned int cpu, bool active);
>>  void init_cpu_present(const struct cpumask *src);
>>  void init_cpu_possible(const struct cpumask *src);
>>
>> ...
>>
>> +void get_online_cpus_stable_atomic(void)
>> +{
>> +	preempt_disable();
>> +	this_cpu_inc(hotplug_reader_refcount);
>> +
>> +	/*
>> +	 * Prevent reordering of writes to hotplug_reader_refcount and
>> +	 * reads from cpu_online_stable_mask.
>> +	 */
>> +	smp_mb();
>> +}
>> +EXPORT_SYMBOL_GPL(get_online_cpus_stable_atomic);
>> +
>> +void put_online_cpus_stable_atomic(void)
>> +{
>> +	/*
>> +	 * Prevent reordering of reads from cpu_online_stable_mask and
>> +	 * writes to hotplug_reader_refcount.
>> +	 */
>> +	smp_mb();
>> +	this_cpu_dec(hotplug_reader_refcount);
>> +	preempt_enable();
>> +}
>> +EXPORT_SYMBOL_GPL(put_online_cpus_stable_atomic);
>> +
>>  static struct {
>>  	struct task_struct *active_writer;
>>  	struct mutex lock; /* Synchronizes accesses to refcount, */
>> @@ -237,6 +304,44 @@ static inline void check_for_tasks(int cpu)
>>  	write_unlock_irq(&tasklist_lock);
>>  }
>>  
>> +/*
>> + * We want all atomic hotplug readers to refer to the new value of
>> + * cpu_online_stable_mask. So wait for the existing atomic hotplug readers
>> + * to complete. Any new atomic hotplug readers will see the updated mask
>> + * and hence pose no problems.
>> + */
>> +static void sync_hotplug_readers(void)
>> +{
>> +	unsigned int cpu;
>> +
>> +	for_each_online_cpu(cpu) {
>> +		while (per_cpu(hotplug_reader_refcount, cpu))
>> +			cpu_relax();
>> +	}
>> +}
> 
> That all looks solid to me.
> 
>> +/*
>> + * We are serious about taking this CPU down. So clear the CPU from the
>> + * stable online mask.
>> + */
>> +static void prepare_cpu_take_down(unsigned int cpu)
>> +{
>> +	set_cpu_online_stable(cpu, false);
>> +
>> +	/*
>> +	 * Prevent reordering of writes to cpu_online_stable_mask and reads
>> +	 * from hotplug_reader_refcount.
>> +	 */
>> +	smp_mb();
>> +
>> +	/*
>> +	 * Wait for all active hotplug readers to complete, to ensure that
>> +	 * there are no critical sections still referring to the old stable
>> +	 * online mask.
>> +	 */
>> +	sync_hotplug_readers();
>> +}
> 
> I wonder about the cpu-online case.  A typical caller might want to do:
> 
> 
> /*
>  * Set each online CPU's "foo" to "bar"
>  */
> 
> int global_bar;
> 
> void set_cpu_foo(int bar)
> {
> 	get_online_cpus_stable_atomic();
> 	global_bar = bar;
> 	for_each_online_cpu_stable()
> 		cpu->foo = bar;
> 	put_online_cpus_stable_atomic()
> }
> 
> void_cpu_online_notifier_handler(void)
> {
> 	cpu->foo = global_bar;
> }
> 
> And I think that set_cpu_foo() would be buggy, because a CPU could come
> online before global_bar was altered, and that newly-online CPU would
> pick up the old value of `bar'.
> 
> So what's the rule here?  global_bar must be written before we run
> get_online_cpus_stable_atomic()?
> 
> Anyway, please have a think and spell all this out?

That's right. Actually, this comes down to one question: should hotplug be
prevented from happening between get_online() and put_online()?

Going by the old mutex-based API, the answer is yes: hotplug won't happen
inside the critical section. But the cost is that get_online() will block,
which kills the performance.

So we designed this mechanism as an acceleration. But, as you pointed out,
although the result will never be wrong, the 'stable' mask is not really
stable, since it can change while a critical section is running.

We have two possible solutions.

One is from Srivatsa, using 'read_lock' and 'write_lock'. It prevents hotplug
from happening, just like the old rule; the cost is that we need a global
'rw_lock', which performs badly on NUMA systems, and of course get_online()
will block for a short time while a hotplug operation is in progress.
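Roughly like this, just to illustrate the idea (not an actual patch):

	static DEFINE_RWLOCK(hotplug_rwlock);

	void get_online_cpus_stable_atomic(void)
	{
		preempt_disable();
		read_lock(&hotplug_rwlock);
	}

	void put_online_cpus_stable_atomic(void)
	{
		read_unlock(&hotplug_rwlock);
		preempt_enable();
	}

	/* cpu_down() takes write_lock(&hotplug_rwlock) around the offline sequence */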

The other is to maintain a per-cpu cached mask that is updated only in
get_online() and used throughout the critical section. That gives a truly
stable mask, but one flaw is that different CPUs may see different online
masks in their critical sections.
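A rough sketch of that second idea (the names here are made up, and the
details would need more thought):

	static DEFINE_PER_CPU(struct cpumask, cached_online_mask);

	void get_online_cpus_stable_atomic(void)
	{
		preempt_disable();
		this_cpu_inc(hotplug_reader_refcount);
		smp_mb();
		/* private snapshot; the critical section iterates only this copy */
		cpumask_copy(this_cpu_ptr(&cached_online_mask), cpu_online_mask);
	}

	#define for_each_online_cpu_stable(cpu) \
		for_each_cpu((cpu), this_cpu_ptr(&cached_online_mask))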

We would appreciate comments on which one should be used in the next version.

Regards,
Michael Wang

> 
>>  struct take_cpu_down_param {
>>  	unsigned long mod;
>>  	void *hcpu;
>> @@ -246,7 +351,9 @@ struct take_cpu_down_param {
>>  static int __ref take_cpu_down(void *_param)
>>  {
>>  	struct take_cpu_down_param *param = _param;
>> -	int err;
>> +	int err, cpu = (long)(param->hcpu);
>> +
> 
> Like this please:
> 
> 	int err;
> 	int cpu = (long)(param->hcpu);
> 
>> +	prepare_cpu_take_down(cpu);
>>  
>>  	/* Ensure this CPU doesn't handle any more interrupts. */
>>  	err = __cpu_disable();
>>
>> ...
>>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 01/10] CPU hotplug: Introduce "stable" cpu online mask, for atomic hotplug readers
  2012-12-05  2:56     ` Michael Wang
@ 2012-12-05  3:28       ` Michael Wang
  0 siblings, 0 replies; 21+ messages in thread
From: Michael Wang @ 2012-12-05  3:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Srivatsa S. Bhat, tglx, peterz, paulmck, rusty, mingo, namhyung,
	vincent.guittot, sbw, tj, amit.kucheria, rostedt, rjw,
	xiaoguangrong, nikunj, linux-pm, linux-kernel

On 12/05/2012 10:56 AM, Michael Wang wrote:
[...]
>>
>> I wonder about the cpu-online case.  A typical caller might want to do:
>>
>>
>> /*
>>  * Set each online CPU's "foo" to "bar"
>>  */
>>
>> int global_bar;
>>
>> void set_cpu_foo(int bar)
>> {
>> 	get_online_cpus_stable_atomic();
>> 	global_bar = bar;
>> 	for_each_online_cpu_stable()
>> 		cpu->foo = bar;
>> 	put_online_cpus_stable_atomic()
>> }
>>
>> void_cpu_online_notifier_handler(void)
>> {
>> 	cpu->foo = global_bar;
>> }

Oh, forgive me for misunderstanding your question :(

In this case, we have to prevent hotplug from happening, not just ensure
that the online mask is correct.

Hmm... this needs more consideration.

Regards,
Michael Wang

>>
>> And I think that set_cpu_foo() would be buggy, because a CPU could come
>> online before global_bar was altered, and that newly-online CPU would
>> pick up the old value of `bar'.
>>
>> So what's the rule here?  global_bar must be written before we run
>> get_online_cpus_stable_atomic()?
>>
>> Anyway, please have a think and spell all this out?
> 
> That's right, actually this related to one question, should the hotplug
> happen during get_online and put_online?
> 
> Answer will be YES according to old API which using mutex, the hotplug
> won't happen in critical section, but the cost is get_online() will
> block, which will kill the performance.
> 
> So we designed this mechanism to do acceleration, but as you pointed
> out, although the result will never be wrong, but the 'stable' mask is
> not stable since it could be changed in critical section.
> 
> And we have two solution.
> 
> One is from Srivatsa, using 'read_lock' and 'write_lock', it will
> prevent hotplug happen just like the old rule, the cost is we need a
> global 'rw_lock' which perform bad on NUMA system, and no doubt,
> get_online() will block for short time when doing hotplug.
> 
> Another is to maintain a per-cpu cache mask, this mask will only be
> updated in get_online(), and be used in critical section, then we will
> get a real stable mask, but one flaw is, on different cpu in critical
> section, online mask will be different.
> 
> We will be appreciate if we could collect some comments on which one to
> be used in next version.
> 
> Regards,
> Michael Wang
> 
>>
>>>  struct take_cpu_down_param {
>>>  	unsigned long mod;
>>>  	void *hcpu;
>>> @@ -246,7 +351,9 @@ struct take_cpu_down_param {
>>>  static int __ref take_cpu_down(void *_param)
>>>  {
>>>  	struct take_cpu_down_param *param = _param;
>>> -	int err;
>>> +	int err, cpu = (long)(param->hcpu);
>>> +
>>
>> Like this please:
>>
>> 	int err;
>> 	int cpu = (long)(param->hcpu);
>>
>>> +	prepare_cpu_take_down(cpu);
>>>  
>>>  	/* Ensure this CPU doesn't handle any more interrupts. */
>>>  	err = __cpu_disable();
>>>
>>> ...
>>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
>>
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 01/10] CPU hotplug: Introduce "stable" cpu online mask, for atomic hotplug readers
  2012-12-04 22:10   ` Andrew Morton
  2012-12-05  2:56     ` Michael Wang
@ 2012-12-05 12:38     ` Srivatsa S. Bhat
  1 sibling, 0 replies; 21+ messages in thread
From: Srivatsa S. Bhat @ 2012-12-05 12:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: tglx, peterz, paulmck, rusty, mingo, namhyung, vincent.guittot,
	sbw, tj, amit.kucheria, rostedt, rjw, wangyun, xiaoguangrong,
	nikunj, linux-pm, linux-kernel

On 12/05/2012 03:40 AM, Andrew Morton wrote:
> On Tue, 04 Dec 2012 14:23:41 +0530
> "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> wrote:
> 
>> From: Michael Wang <wangyun@linux.vnet.ibm.com>
>>
>> There are places where preempt_disable() is used to prevent any CPU from
>> going offline during the critical section. Let us call them as "atomic
>> hotplug readers" (atomic because they run in atomic contexts).
>>
>> Often, these atomic hotplug readers have a simple need : they want the cpu
>> online mask that they work with (inside their critical section), to be
>> stable, i.e., it should be guaranteed that CPUs in that mask won't go
>> offline during the critical section. An important point here is that they
>> don't really care if such a "stable" mask is a subset of the actual
>> cpu_online_mask.
>>
>> The intent of this patch is to provide such a "stable" cpu online mask
>> for that class of atomic hotplug readers.
>>
[...]
>>
>> ...
>>
>> @@ -705,12 +707,15 @@ extern const DECLARE_BITMAP(cpu_all_bits, NR_CPUS);
>>  
>>  #define for_each_possible_cpu(cpu) for_each_cpu((cpu), cpu_possible_mask)
>>  #define for_each_online_cpu(cpu)   for_each_cpu((cpu), cpu_online_mask)
>> +#define for_each_online_cpu_stable(cpu)					  \
>> +				for_each_cpu((cpu), cpu_online_stable_mask)
>>  #define for_each_present_cpu(cpu)  for_each_cpu((cpu), cpu_present_mask)
>>  
>>  /* Wrappers for arch boot code to manipulate normally-constant masks */
>>  void set_cpu_possible(unsigned int cpu, bool possible);
>>  void set_cpu_present(unsigned int cpu, bool present);
>>  void set_cpu_online(unsigned int cpu, bool online);
>> +void set_cpu_online_stable(unsigned int cpu, bool online);
> 
> The naming is inconsistent.  This is "cpu_online_stable", but
> for_each_online_cpu_stable() is "online_cpu_stable".  Can we make
> everything the same?
> 

I've actually gotten rid of the cpu_online_stable_mask in my new version
(which I'll post shortly), based on Tejun's suggestion.

>>  void set_cpu_active(unsigned int cpu, bool active);
>>  void init_cpu_present(const struct cpumask *src);
>>  void init_cpu_possible(const struct cpumask *src);
>>
>> ...
>>
>> +void get_online_cpus_stable_atomic(void)
>> +{
>> +	preempt_disable();
>> +	this_cpu_inc(hotplug_reader_refcount);
>> +
>> +	/*
>> +	 * Prevent reordering of writes to hotplug_reader_refcount and
>> +	 * reads from cpu_online_stable_mask.
>> +	 */
>> +	smp_mb();
>> +}
>> +EXPORT_SYMBOL_GPL(get_online_cpus_stable_atomic);
>> +
>> +void put_online_cpus_stable_atomic(void)
>> +{
>> +	/*
>> +	 * Prevent reordering of reads from cpu_online_stable_mask and
>> +	 * writes to hotplug_reader_refcount.
>> +	 */
>> +	smp_mb();
>> +	this_cpu_dec(hotplug_reader_refcount);
>> +	preempt_enable();
>> +}
>> +EXPORT_SYMBOL_GPL(put_online_cpus_stable_atomic);
>> +
>>  static struct {
>>  	struct task_struct *active_writer;
>>  	struct mutex lock; /* Synchronizes accesses to refcount, */
>> @@ -237,6 +304,44 @@ static inline void check_for_tasks(int cpu)
>>  	write_unlock_irq(&tasklist_lock);
>>  }
>>  
>> +/*
>> + * We want all atomic hotplug readers to refer to the new value of
>> + * cpu_online_stable_mask. So wait for the existing atomic hotplug readers
>> + * to complete. Any new atomic hotplug readers will see the updated mask
>> + * and hence pose no problems.
>> + */
>> +static void sync_hotplug_readers(void)
>> +{
>> +	unsigned int cpu;
>> +
>> +	for_each_online_cpu(cpu) {
>> +		while (per_cpu(hotplug_reader_refcount, cpu))
>> +			cpu_relax();
>> +	}
>> +}
> 
> That all looks solid to me.

Actually it isn't fully correct. For example, consider a reader such as this:

get_online_cpus_stable_atomic();

for_each_online_cpu_stable(cpu)
	do_operation_X();

for_each_online_cpu_stable(cpu)
	undo_operation_X();

put_online_cpus_stable_atomic();

Here, the stable mask is supposed to remain *unchanged* throughout the
entire duration of get/put_online_cpus_stable_atomic(). However, since the
hotplug writer updates the stable online mask without waiting for anything,
the reader can see the stable mask change *in the middle of* its critical
section! So it can end up doing operation_X() on CPUs 1, 2 and 3, but undoing
the operation on only CPUs 1 and 2, for example, because CPU 3 was removed
from the stable mask in the meantime by the hotplug writer!

IOW, it actually breaks the fundamental guarantee that we set out to provide
in the first place! Of course, the usecase I gave above is hypothetical, but
it _is_ valid and important nevertheless, IMHO.

Anyway, the new version (which gets rid of the extra cpumask) won't get
into such issues. I actually have a version of the "extra cpumask" patchset
that fixes this particular issue using rwlocks (like Michael mentioned), but
I won't post it because, IMHO, it is far better to provide the necessary
synchronization without using such extra cpumasks at all.

> 
>> +/*
>> + * We are serious about taking this CPU down. So clear the CPU from the
>> + * stable online mask.
>> + */
>> +static void prepare_cpu_take_down(unsigned int cpu)
>> +{
>> +	set_cpu_online_stable(cpu, false);
>> +
>> +	/*
>> +	 * Prevent reordering of writes to cpu_online_stable_mask and reads
>> +	 * from hotplug_reader_refcount.
>> +	 */
>> +	smp_mb();
>> +
>> +	/*
>> +	 * Wait for all active hotplug readers to complete, to ensure that
>> +	 * there are no critical sections still referring to the old stable
>> +	 * online mask.
>> +	 */
>> +	sync_hotplug_readers();
>> +}
> 
> I wonder about the cpu-online case.  A typical caller might want to do:
> 
> 
> /*
>  * Set each online CPU's "foo" to "bar"
>  */
> 
> int global_bar;
> 
> void set_cpu_foo(int bar)
> {
> 	get_online_cpus_stable_atomic();
> 	global_bar = bar;
> 	for_each_online_cpu_stable()
> 		cpu->foo = bar;
> 	put_online_cpus_stable_atomic()
> }
> 
> void_cpu_online_notifier_handler(void)
> {
> 	cpu->foo = global_bar;
> }
> 
> And I think that set_cpu_foo() would be buggy, because a CPU could come
> online before global_bar was altered, and that newly-online CPU would
> pick up the old value of `bar'.
> 
> So what's the rule here?  global_bar must be written before we run
> get_online_cpus_stable_atomic()?
> 
> Anyway, please have a think and spell all this out?
> 

Can I please skip this issue of CPUs coming online for now?

preempt_disable() along with stop_machine() never prevented CPUs from coming
online. It only prevented CPUs from going offline. The drop-in replacement
to stop_machine() should also provide that guarantee, at minimum. Later we
can either extend it to also prevent CPU online, or come up with suitable
rules/conventions to deal with it.

In summary, I would like to have a solid replacement for stop_machine() as
the first goal. I hope that sounds reasonable...?

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 02/10] smp, cpu hotplug: Fix smp_call_function_*() to prevent CPU offline properly
  2012-12-04 22:17   ` Andrew Morton
@ 2012-12-05 12:41     ` Srivatsa S. Bhat
  0 siblings, 0 replies; 21+ messages in thread
From: Srivatsa S. Bhat @ 2012-12-05 12:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: tglx, peterz, paulmck, rusty, mingo, namhyung, vincent.guittot,
	sbw, tj, amit.kucheria, rostedt, rjw, wangyun, xiaoguangrong,
	nikunj, linux-pm, linux-kernel

On 12/05/2012 03:47 AM, Andrew Morton wrote:
> On Tue, 04 Dec 2012 14:24:28 +0530
> "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> wrote:
> 
>> From: Michael Wang <wangyun@linux.vnet.ibm.com>
>>
>> With stop_machine() gone from the CPU offline path, we can't depend on
>> preempt_disable() to prevent CPUs from going offline from under us.
>>
>> Use the get/put_online_cpus_stable_atomic() APIs to prevent CPUs from going
>> offline, while invoking from atomic context.
>>
>> ...
>>
>>  	 */
>> -	this_cpu = get_cpu();
>> +	get_online_cpus_stable_atomic();
>> +	this_cpu = smp_processor_id();
> 
> I wonder if get_online_cpus_stable_atomic() should return the local CPU
> ID.  Just as a little convenience thing.  Time will tell.
>

With the new version which doesn't use extra cpumasks, we won't have to
bother about this..
 
>>  	/*
>>  	 * Can deadlock when called with interrupts disabled.
>>
>> ...
>>
>> @@ -380,15 +383,15 @@ int smp_call_function_any(const struct cpumask *mask,
>>  	nodemask = cpumask_of_node(cpu_to_node(cpu));
>>  	for (cpu = cpumask_first_and(nodemask, mask); cpu < nr_cpu_ids;
>>  	     cpu = cpumask_next_and(cpu, nodemask, mask)) {
>> -		if (cpu_online(cpu))
>> +		if (cpu_online_stable(cpu))
>>  			goto call;
>>  	}
>>  
>>  	/* Any online will do: smp_call_function_single handles nr_cpu_ids. */
>> -	cpu = cpumask_any_and(mask, cpu_online_mask);
>> +	cpu = cpumask_any_and(mask, cpu_online_stable_mask);
>>  call:
>>  	ret = smp_call_function_single(cpu, func, info, wait);
>> -	put_cpu();
>> +	put_online_cpus_stable_atomic();
>>  	return ret;
>>  }
>>  EXPORT_SYMBOL_GPL(smp_call_function_any);
> 
> So smp_call_function_any() has no synchronization against CPUs coming
> online.  Hence callers of smp_call_function_any() are responsible for
> ensuring that CPUs which are concurrently coming online will adopt the
> required state?
>

Yes.
 
> I guess that has always been the case...
> 

Right.

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 02/10] smp, cpu hotplug: Fix smp_call_function_*() to prevent CPU offline properly
  2012-12-04 23:39   ` Rusty Russell
@ 2012-12-05 12:44     ` Srivatsa S. Bhat
  0 siblings, 0 replies; 21+ messages in thread
From: Srivatsa S. Bhat @ 2012-12-05 12:44 UTC (permalink / raw)
  To: Rusty Russell
  Cc: tglx, peterz, paulmck, mingo, akpm, namhyung, vincent.guittot,
	sbw, tj, amit.kucheria, rostedt, rjw, wangyun, xiaoguangrong,
	nikunj, linux-pm, linux-kernel

On 12/05/2012 05:09 AM, Rusty Russell wrote:
> "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> writes:
>> From: Michael Wang <wangyun@linux.vnet.ibm.com>
>>
>> With stop_machine() gone from the CPU offline path, we can't depend on
>> preempt_disable() to prevent CPUs from going offline from under us.
> 
> Minor gripe: I'd prefer this paragraph to use the future rather than
> past tense, like: "Once stop_machine() is gone ... we won't be able to
> depend".
> 

Fixed in the new version.

> Since you're not supposed to use the _stable() accessors without calling
> get_online_cpus_stable_atomic(), why not have
> get_online_cpus_stable_atomic() return a pointer to the stable cpumask?
> (Which is otherwise static, at least for debug).
>

We don't have to worry about this now, because the new version doesn't
use the stable cpumask.
 
> Might make the patches messier though...
> 
> Oh, and I'd love to see actual benchmarks to make sure we've actually
> fixed a problem with this ;)
> 

Of course :-) They will follow, once we have a proper implementation
that we are happy with :-)

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2012-12-05 12:45 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-12-04  8:53 [RFC PATCH 00/10] CPU hotplug: stop_machine()-free CPU hotplug Srivatsa S. Bhat
2012-12-04  8:53 ` [RFC PATCH 01/10] CPU hotplug: Introduce "stable" cpu online mask, for atomic hotplug readers Srivatsa S. Bhat
2012-12-04 15:17   ` Tejun Heo
2012-12-04 21:14     ` Srivatsa S. Bhat
2012-12-04 22:10   ` Andrew Morton
2012-12-05  2:56     ` Michael Wang
2012-12-05  3:28       ` Michael Wang
2012-12-05 12:38     ` Srivatsa S. Bhat
2012-12-04  8:54 ` [RFC PATCH 02/10] smp, cpu hotplug: Fix smp_call_function_*() to prevent CPU offline properly Srivatsa S. Bhat
2012-12-04 22:17   ` Andrew Morton
2012-12-05 12:41     ` Srivatsa S. Bhat
2012-12-04 23:39   ` Rusty Russell
2012-12-05 12:44     ` Srivatsa S. Bhat
2012-12-04  8:54 ` [RFC PATCH 03/10] smp, cpu hotplug: Fix on_each_cpu_*() " Srivatsa S. Bhat
2012-12-04  8:54 ` [RFC PATCH 04/10] sched, cpu hotplug: Use stable online cpus in try_to_wake_up() & select_task_rq() Srivatsa S. Bhat
2012-12-04  8:55 ` [RFC PATCH 05/10] kick_process(), cpu-hotplug: Prevent offlining of target CPU properly Srivatsa S. Bhat
2012-12-04  8:55 ` [RFC PATCH 06/10] yield_to(), cpu-hotplug: Prevent offlining of other CPUs properly Srivatsa S. Bhat
2012-12-04  8:55 ` [RFC PATCH 07/10] KVM: VMX: fix invalid cpu passed to smp_call_function_single Srivatsa S. Bhat
2012-12-04  8:55 ` [RFC PATCH 08/10] KVM: VMX: fix memory order between loading vmcs and clearing vmcs Srivatsa S. Bhat
2012-12-04  8:56 ` [RFC PATCH 09/10] KVM: VMX: fix unsyc vmcs status when cpu is going down Srivatsa S. Bhat
2012-12-04  8:56 ` [RFC PATCH 10/10] cpu: No more __stop_machine() in _cpu_down() Srivatsa S. Bhat
