* [patch v3 0/2] enhance stop machine infrastructure for MTRR rendezvous sequence
@ 2011-06-09 22:32 Suresh Siddha
  2011-06-09 22:32 ` [patch v3 1/2] stop_machine: enable __stop_machine() to be called from the cpu online path Suresh Siddha
  2011-06-09 22:32 ` [patch v3 2/2] x86, mtrr: use __stop_machine() for doing MTRR rendezvous Suresh Siddha
  0 siblings, 2 replies; 5+ messages in thread
From: Suresh Siddha @ 2011-06-09 22:32 UTC (permalink / raw)
  To: mingo, tglx, hpa, trenn, prarit, tj, rusty
  Cc: linux-kernel, suresh.b.siddha, youquan.song

The first patch enhances the stop machine infrastructure so that we can
call __stop_machine() from the cpu hotplug path, where the calling cpu is
not yet online. We already allow this for !CONFIG_SMP, so this patch
brings the CONFIG_SMP behavior in line with !CONFIG_SMP.

The second patch uses the enhanced __stop_machine() to implement the x86
MTRR init rendezvous sequence and thus removes the duplicate
implementation of stop machine using stop_one_cpu_nowait(). This
duplicate implementation can potentially lead to deadlock if there is a
parallel system wide rendezvous using __stop_machine().

Both of these address the deadlock mentioned in
https://bugzilla.novell.com/show_bug.cgi?id=672008

Changes from v1:
* Use stop_cpu thread enabled status to find out if the cpu is online/offline,
  instead of using cpu_online(smp_processor_id()). This avoids a false
  positive warning from using smp_processor_id() in a preemptible section.

Changes from v2:
* Use cpu_all_mask instead of a NULL mask, to include the calling cpu
  that is offline.
* Make __stop_cpus static
* Consolidate wait_for_completion/polling code into
  cpu_stop_wait_for_completion() based on Tejun's review.

thanks,
suresh




* [patch v3 1/2] stop_machine: enable __stop_machine() to be called from the cpu online path
  2011-06-09 22:32 [patch v3 0/2] enhance stop machine infrastructure for MTRR rendezvous sequence Suresh Siddha
@ 2011-06-09 22:32 ` Suresh Siddha
  2011-06-10 13:05   ` Tejun Heo
  2011-06-09 22:32 ` [patch v3 2/2] x86, mtrr: use __stop_machine() for doing MTRR rendezvous Suresh Siddha
  1 sibling, 1 reply; 5+ messages in thread
From: Suresh Siddha @ 2011-06-09 22:32 UTC (permalink / raw)
  To: mingo, tglx, hpa, trenn, prarit, tj, rusty
  Cc: linux-kernel, suresh.b.siddha, youquan.song, stable

[-- Attachment #1: stop_machine_from_cpu_hotplug_path.patch --]
[-- Type: text/plain, Size: 5616 bytes --]

Currently the stop machine infrastructure can be called only from a cpu
that is online. But for !CONFIG_SMP, we do allow calling __stop_machine()
before the cpu is online.

x86, for example, requires the stop machine infrastructure in the cpu
online path and currently implements its own stop machine (using
stop_one_cpu_nowait()) for MTRR initialization there.

Enhance __stop_machine() so that it can be called before the cpu is
onlined. This paves the way for code consolidation and addresses
potential deadlocks caused by having multiple mechanisms for doing a
system wide rendezvous.
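
With this, the cpu online path can use the same rendezvous mechanism as
everyone else; the intended call site (see patch 2/2) boils down to:

	__stop_machine(mtrr_work_handler, &data, cpu_callout_mask);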

This will also address the behavioral differences of __stop_machine()
between SMP and UP builds.

Also mark __stop_cpus() static, as no one else uses it.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: stable@kernel.org    # v2.6.35+
---
 kernel/stop_machine.c |   58 ++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 49 insertions(+), 9 deletions(-)

Index: linux-2.6-tip/kernel/stop_machine.c
===================================================================
--- linux-2.6-tip.orig/kernel/stop_machine.c
+++ linux-2.6-tip/kernel/stop_machine.c
@@ -28,6 +28,7 @@
 struct cpu_stop_done {
 	atomic_t		nr_todo;	/* nr left to execute */
 	bool			executed;	/* actually executed? */
+	bool			offline_ctxt;	/* stop_cpu from offline ctxt */
 	int			ret;		/* collected return value */
 	struct completion	completion;	/* fired if nr_todo reaches 0 */
 };
@@ -47,15 +48,32 @@ static void cpu_stop_init_done(struct cp
 	memset(done, 0, sizeof(*done));
 	atomic_set(&done->nr_todo, nr_todo);
 	init_completion(&done->completion);
+	done->offline_ctxt = !percpu_read(cpu_stopper.enabled);
+}
+
+static inline void cpu_stop_wait_for_completion(struct cpu_stop_done *done)
+{
+	if (!done->offline_ctxt)
+		wait_for_completion(&done->completion);
+	else {
+		/*
+		 * If the calling cpu is not online, then we can't afford to
+		 * sleep, so poll till the work is completed on the target
+		 * cpu's.
+		 */
+		while (atomic_read(&done->nr_todo))
+			cpu_relax();
+	}
 }
 
 /* signal completion unless @done is NULL */
 static void cpu_stop_signal_done(struct cpu_stop_done *done, bool executed)
 {
 	if (done) {
+		bool offline_ctxt = done->offline_ctxt;
 		if (executed)
 			done->executed = true;
-		if (atomic_dec_and_test(&done->nr_todo))
+		if (atomic_dec_and_test(&done->nr_todo) &&  !offline_ctxt)
 			complete(&done->completion);
 	}
 }
@@ -108,7 +126,7 @@ int stop_one_cpu(unsigned int cpu, cpu_s
 
 	cpu_stop_init_done(&done, 1);
 	cpu_stop_queue_work(&per_cpu(cpu_stopper, cpu), &work);
-	wait_for_completion(&done.completion);
+	cpu_stop_wait_for_completion(&done);
 	return done.executed ? done.ret : -ENOENT;
 }
 
@@ -136,20 +154,24 @@ void stop_one_cpu_nowait(unsigned int cp
 static DEFINE_MUTEX(stop_cpus_mutex);
 static DEFINE_PER_CPU(struct cpu_stop_work, stop_cpus_work);
 
+static
 int __stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg)
 {
+	int online = percpu_read(cpu_stopper.enabled);
 	struct cpu_stop_work *work;
 	struct cpu_stop_done done;
+	unsigned int weight = 0;
 	unsigned int cpu;
 
 	/* initialize works and done */
-	for_each_cpu(cpu, cpumask) {
+	for_each_cpu_and(cpu, cpumask, cpu_online_mask) {
 		work = &per_cpu(stop_cpus_work, cpu);
 		work->fn = fn;
 		work->arg = arg;
 		work->done = &done;
+		weight++;
 	}
-	cpu_stop_init_done(&done, cpumask_weight(cpumask));
+	cpu_stop_init_done(&done, weight);
 
 	/*
 	 * Disable preemption while queueing to avoid getting
@@ -157,12 +179,19 @@ int __stop_cpus(const struct cpumask *cp
 	 * to enter @fn which can lead to deadlock.
 	 */
 	preempt_disable();
-	for_each_cpu(cpu, cpumask)
+	for_each_cpu_and(cpu, cpumask, cpu_online_mask)
 		cpu_stop_queue_work(&per_cpu(cpu_stopper, cpu),
 				    &per_cpu(stop_cpus_work, cpu));
+
+	/*
+	 * This cpu is not yet online. If @fn needs to be run on this
+	 * cpu, run it now.
+	 */
+	if (!online && cpu_isset(smp_processor_id(), *cpumask))
+		fn(arg);
 	preempt_enable();
 
-	wait_for_completion(&done.completion);
+	cpu_stop_wait_for_completion(&done);
 	return done.executed ? done.ret : -ENOENT;
 }
 
@@ -431,6 +460,7 @@ static int stop_machine_cpu_stop(void *d
 	struct stop_machine_data *smdata = data;
 	enum stopmachine_state curstate = STOPMACHINE_NONE;
 	int cpu = smp_processor_id(), err = 0;
+	unsigned long flags = 0;
 	bool is_active;
 
 	if (!smdata->active_cpus)
@@ -446,7 +476,7 @@ static int stop_machine_cpu_stop(void *d
 			curstate = smdata->state;
 			switch (curstate) {
 			case STOPMACHINE_DISABLE_IRQ:
-				local_irq_disable();
+				local_irq_save(flags);
 				hard_irq_disable();
 				break;
 			case STOPMACHINE_RUN:
@@ -460,7 +490,7 @@ static int stop_machine_cpu_stop(void *d
 		}
 	} while (curstate != STOPMACHINE_EXIT);
 
-	local_irq_enable();
+	local_irq_restore(flags);
 	return err;
 }
 
@@ -470,9 +500,19 @@ int __stop_machine(int (*fn)(void *), vo
 					    .num_threads = num_online_cpus(),
 					    .active_cpus = cpus };
 
+	/* Include the calling cpu that might not be online yet. */
+	if (!percpu_read(cpu_stopper.enabled))
+		smdata.num_threads++;
+
 	/* Set the initial state and stop all online cpus. */
 	set_state(&smdata, STOPMACHINE_PREPARE);
-	return stop_cpus(cpu_online_mask, stop_machine_cpu_stop, &smdata);
+
+	if (percpu_read(cpu_stopper.enabled))
+		return stop_cpus(cpu_online_mask, stop_machine_cpu_stop,
+				 &smdata);
+	else
+		return __stop_cpus(cpu_all_mask, stop_machine_cpu_stop,
+				   &smdata);
 }
 
 int stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus)




* [patch v3 2/2] x86, mtrr: use __stop_machine() for doing MTRR rendezvous
  2011-06-09 22:32 [patch v3 0/2] enhance stop machine infrastructure for MTRR rendezvous sequence Suresh Siddha
  2011-06-09 22:32 ` [patch v3 1/2] stop_machine: enable __stop_machine() to be called from the cpu online path Suresh Siddha
@ 2011-06-09 22:32 ` Suresh Siddha
  1 sibling, 0 replies; 5+ messages in thread
From: Suresh Siddha @ 2011-06-09 22:32 UTC (permalink / raw)
  To: mingo, tglx, hpa, trenn, prarit, tj, rusty
  Cc: linux-kernel, suresh.b.siddha, youquan.song, stable

[-- Attachment #1: use_stop_machine_for_mtrr_rendezvous.patch --]
[-- Type: text/plain, Size: 7633 bytes --]

The MTRR rendezvous sequence using stop_one_cpu_nowait() can potentially
run in parallel with another system wide rendezvous using
stop_machine(). This can lead to deadlock: the order in which the works
are queued can be different on different cpus, so some cpus end up
running the first rendezvous handler while others run the second one,
with each set waiting for the other set to join the system wide
rendezvous.
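
Roughly, the problematic interleaving looks like this (the two works end
up queued in opposite order on two cpus, and each stopper thread runs
its queue in order):

	cpu A stopper: mtrr_work_handler      -> spins waiting for the
	                                         stop_machine cpus to join
	cpu B stopper: stop_machine_cpu_stop  -> spins waiting for the
	                                         mtrr rendezvous cpus to join

Neither handler can finish, so neither stopper thread ever gets to its
second work.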

The MTRR rendezvous sequence was not implemented using stop_machine()
before, as it gets called both from process context as well as from the
cpu online path (where the cpu is not yet online and interrupts are
disabled, etc).

Now that __stop_machine() works even when the calling cpu is not online,
use __stop_machine() to implement the MTRR rendezvous sequence. This
consolidates the code as well as avoids the above mentioned deadlock.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: stable@kernel.org    # v2.6.35+
---
 arch/x86/kernel/cpu/mtrr/main.c |  154 +++++++---------------------------------
 1 file changed, 27 insertions(+), 127 deletions(-)

Index: linux-2.6-tip/arch/x86/kernel/cpu/mtrr/main.c
===================================================================
--- linux-2.6-tip.orig/arch/x86/kernel/cpu/mtrr/main.c
+++ linux-2.6-tip/arch/x86/kernel/cpu/mtrr/main.c
@@ -137,18 +137,15 @@ static void __init init_table(void)
 }
 
 struct set_mtrr_data {
-	atomic_t	count;
-	atomic_t	gate;
 	unsigned long	smp_base;
 	unsigned long	smp_size;
 	unsigned int	smp_reg;
 	mtrr_type	smp_type;
 };
 
-static DEFINE_PER_CPU(struct cpu_stop_work, mtrr_work);
-
 /**
- * mtrr_work_handler - Synchronisation handler. Executed by "other" CPUs.
+ * mtrr_work_handler - Work done in the synchronisation handler. Executed by
+ * all the CPUs.
  * @info: pointer to mtrr configuration data
  *
  * Returns nothing.
@@ -157,35 +154,26 @@ static int mtrr_work_handler(void *info)
 {
 #ifdef CONFIG_SMP
 	struct set_mtrr_data *data = info;
-	unsigned long flags;
-
-	atomic_dec(&data->count);
-	while (!atomic_read(&data->gate))
-		cpu_relax();
-
-	local_irq_save(flags);
 
-	atomic_dec(&data->count);
-	while (atomic_read(&data->gate))
-		cpu_relax();
-
-	/*  The master has cleared me to execute  */
+	/*
+	 * We use this same function to initialize the mtrrs during boot,
+	 * resume, runtime cpu online and on an explicit request to set a
+	 * specific MTRR.
+	 *
+	 * During boot or suspend, the state of the boot cpu's mtrrs has been
+	 * saved, and we want to replicate that across all the cpus that come
+	 * online (either at the end of boot or resume or during a runtime cpu
+	 * online). If we're doing that, @reg is set to something special and on
+	 * all the cpu's we do mtrr_if->set_all() (On the logical cpu that
+	 * started the boot/resume sequence, this might be a duplicate
+	 * set_all()).
+	 */
 	if (data->smp_reg != ~0U) {
 		mtrr_if->set(data->smp_reg, data->smp_base,
 			     data->smp_size, data->smp_type);
-	} else if (mtrr_aps_delayed_init) {
-		/*
-		 * Initialize the MTRRs inaddition to the synchronisation.
-		 */
+	} else if (mtrr_aps_delayed_init || !cpu_online(smp_processor_id())) {
 		mtrr_if->set_all();
 	}
-
-	atomic_dec(&data->count);
-	while (!atomic_read(&data->gate))
-		cpu_relax();
-
-	atomic_dec(&data->count);
-	local_irq_restore(flags);
 #endif
 	return 0;
 }
@@ -223,20 +211,11 @@ static inline int types_compatible(mtrr_
  * 14. Wait for buddies to catch up
  * 15. Enable interrupts.
  *
- * What does that mean for us? Well, first we set data.count to the number
- * of CPUs. As each CPU announces that it started the rendezvous handler by
- * decrementing the count, We reset data.count and set the data.gate flag
- * allowing all the cpu's to proceed with the work. As each cpu disables
- * interrupts, it'll decrement data.count once. We wait until it hits 0 and
- * proceed. We clear the data.gate flag and reset data.count. Meanwhile, they
- * are waiting for that flag to be cleared. Once it's cleared, each
- * CPU goes through the transition of updating MTRRs.
- * The CPU vendors may each do it differently,
- * so we call mtrr_if->set() callback and let them take care of it.
- * When they're done, they again decrement data->count and wait for data.gate
- * to be set.
- * When we finish, we wait for data.count to hit 0 and toggle the data.gate flag
- * Everyone then enables interrupts and we all continue on.
+ * What does that mean for us? Well, __stop_machine() will ensure that
+ * the rendezvous handler is started on each CPU. And in lockstep they
+ * do the state transition of disabling interrupts, updating MTRR's
+ * (the CPU vendors may each do it differently, so we call mtrr_if->set()
+ * callback and let them take care of it.) and enabling interrupts.
  *
  * Note that the mechanism is the same for UP systems, too; all the SMP stuff
  * becomes nops.
@@ -244,92 +223,13 @@ static inline int types_compatible(mtrr_
 static void
 set_mtrr(unsigned int reg, unsigned long base, unsigned long size, mtrr_type type)
 {
-	struct set_mtrr_data data;
-	unsigned long flags;
-	int cpu;
-
-	preempt_disable();
-
-	data.smp_reg = reg;
-	data.smp_base = base;
-	data.smp_size = size;
-	data.smp_type = type;
-	atomic_set(&data.count, num_booting_cpus() - 1);
-
-	/* Make sure data.count is visible before unleashing other CPUs */
-	smp_wmb();
-	atomic_set(&data.gate, 0);
-
-	/* Start the ball rolling on other CPUs */
-	for_each_online_cpu(cpu) {
-		struct cpu_stop_work *work = &per_cpu(mtrr_work, cpu);
-
-		if (cpu == smp_processor_id())
-			continue;
-
-		stop_one_cpu_nowait(cpu, mtrr_work_handler, &data, work);
-	}
-
-
-	while (atomic_read(&data.count))
-		cpu_relax();
-
-	/* Ok, reset count and toggle gate */
-	atomic_set(&data.count, num_booting_cpus() - 1);
-	smp_wmb();
-	atomic_set(&data.gate, 1);
-
-	local_irq_save(flags);
-
-	while (atomic_read(&data.count))
-		cpu_relax();
-
-	/* Ok, reset count and toggle gate */
-	atomic_set(&data.count, num_booting_cpus() - 1);
-	smp_wmb();
-	atomic_set(&data.gate, 0);
-
-	/* Do our MTRR business */
-
-	/*
-	 * HACK!
-	 *
-	 * We use this same function to initialize the mtrrs during boot,
-	 * resume, runtime cpu online and on an explicit request to set a
-	 * specific MTRR.
-	 *
-	 * During boot or suspend, the state of the boot cpu's mtrrs has been
-	 * saved, and we want to replicate that across all the cpus that come
-	 * online (either at the end of boot or resume or during a runtime cpu
-	 * online). If we're doing that, @reg is set to something special and on
-	 * this cpu we still do mtrr_if->set_all(). During boot/resume, this
-	 * is unnecessary if at this point we are still on the cpu that started
-	 * the boot/resume sequence. But there is no guarantee that we are still
-	 * on the same cpu. So we do mtrr_if->set_all() on this cpu aswell to be
-	 * sure that we are in sync with everyone else.
-	 */
-	if (reg != ~0U)
-		mtrr_if->set(reg, base, size, type);
-	else
-		mtrr_if->set_all();
-
-	/* Wait for the others */
-	while (atomic_read(&data.count))
-		cpu_relax();
-
-	atomic_set(&data.count, num_booting_cpus() - 1);
-	smp_wmb();
-	atomic_set(&data.gate, 1);
-
-	/*
-	 * Wait here for everyone to have seen the gate change
-	 * So we're the last ones to touch 'data'
-	 */
-	while (atomic_read(&data.count))
-		cpu_relax();
+	struct set_mtrr_data data = { .smp_reg = reg,
+				      .smp_base = base,
+				      .smp_size = size,
+				      .smp_type = type
+				    };
 
-	local_irq_restore(flags);
-	preempt_enable();
+	__stop_machine(mtrr_work_handler, &data, cpu_callout_mask);
 }
 
 /**




* Re: [patch v3 1/2] stop_machine: enable __stop_machine() to be called from the cpu online path
  2011-06-09 22:32 ` [patch v3 1/2] stop_machine: enable __stop_machine() to be called from the cpu online path Suresh Siddha
@ 2011-06-10 13:05   ` Tejun Heo
  2011-06-13 17:58     ` Suresh Siddha
  0 siblings, 1 reply; 5+ messages in thread
From: Tejun Heo @ 2011-06-10 13:05 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: mingo, tglx, hpa, trenn, prarit, rusty, linux-kernel,
	youquan.song, stable

Hello, Suresh.

On Thu, Jun 09, 2011 at 03:32:12PM -0700, Suresh Siddha wrote:
> Index: linux-2.6-tip/kernel/stop_machine.c
> ===================================================================
> --- linux-2.6-tip.orig/kernel/stop_machine.c
> +++ linux-2.6-tip/kernel/stop_machine.c
> @@ -28,6 +28,7 @@
>  struct cpu_stop_done {
>  	atomic_t		nr_todo;	/* nr left to execute */
>  	bool			executed;	/* actually executed? */
> +	bool			offline_ctxt;	/* stop_cpu from offline ctxt */

Maybe something a bit more explicit is better?  Say, from_offline_cpu?

>  	int			ret;		/* collected return value */
>  	struct completion	completion;	/* fired if nr_todo reaches 0 */
>  };
> @@ -47,15 +48,32 @@ static void cpu_stop_init_done(struct cp
>  	memset(done, 0, sizeof(*done));
>  	atomic_set(&done->nr_todo, nr_todo);
>  	init_completion(&done->completion);
> +	done->offline_ctxt = !percpu_read(cpu_stopper.enabled);
> +}

I'm not sure the above test is safe.  CPU_ONLINE notification is not
guaranteed to execute on the CPU being brought online.  It's likely to
happen on the CPU which is bringing up the target CPU, so
cpu_stopper.enabled may change asynchronously.  I think the correct
test to perform is querying local CPU onlineness with accompanying
WARN_ON_ONCE() on preemption enable state.

> +static inline void cpu_stop_wait_for_completion(struct cpu_stop_done *done)
> +{
> +	if (!done->offline_ctxt)
> +		wait_for_completion(&done->completion);

Missing {}'s.

> +	else {

Shouldn't this function execute the function on the local CPU?

> +		/*
> +		 * If the calling cpu is not online, then we can't afford to
> +		 * sleep, so poll till the work is completed on the target
> +		 * cpu's.
> +		 */
> +		while (atomic_read(&done->nr_todo))
> +			cpu_relax();
> +	}
>  }
>  
>  /* signal completion unless @done is NULL */
>  static void cpu_stop_signal_done(struct cpu_stop_done *done, bool executed)
>  {
>  	if (done) {
> +		bool offline_ctxt = done->offline_ctxt;
>  		if (executed)
>  			done->executed = true;
> -		if (atomic_dec_and_test(&done->nr_todo))
> +		if (atomic_dec_and_test(&done->nr_todo) &&  !offline_ctxt)
>  			complete(&done->completion);

What does the local variable achieve?  Also, an extra space.

> +static
>  int __stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg)

Please put static on the same line.

>  {
> +	int online = percpu_read(cpu_stopper.enabled);
>  	struct cpu_stop_work *work;
>  	struct cpu_stop_done done;
> +	unsigned int weight = 0;
>  	unsigned int cpu;
>  
>  	/* initialize works and done */
> -	for_each_cpu(cpu, cpumask) {
> +	for_each_cpu_and(cpu, cpumask, cpu_online_mask) {
>  		work = &per_cpu(stop_cpus_work, cpu);
>  		work->fn = fn;
>  		work->arg = arg;
>  		work->done = &done;
> +		weight++;

This seems a bit dangerous.  Up to now, whether to execute or not is
solely determined by testing stopper->enabled while holding
stopper->lock.  There's no reason to change that behavior for this
feature and even if necessary it should be done in a separate patch
with ample explanation why that's necessary and why it's safe.

>  	preempt_disable();
> -	for_each_cpu(cpu, cpumask)
> +	for_each_cpu_and(cpu, cpumask, cpu_online_mask)
>  		cpu_stop_queue_work(&per_cpu(cpu_stopper, cpu),
>  				    &per_cpu(stop_cpus_work, cpu));

Ditto.

> +	/*
> +	 * This cpu is not yet online. If @fn needs to be run on this
> +	 * cpu, run it now.
> +	 */
> +	if (!online && cpu_isset(smp_processor_id(), *cpumask))
> +		fn(arg);

Can't we put this in cpu_stop_wait_for_completion()?  And please
factor out fn() execution from cpu_stopper_thread() and call that.  I
know you objected to that in the other reply but please do it that
way.  All that's necessary is using a different context and avoiding
sleeping.  It's best to deviate only there.  Please note that the
above change already breaks the return value.

> @@ -431,6 +460,7 @@ static int stop_machine_cpu_stop(void *d
>  	struct stop_machine_data *smdata = data;
>  	enum stopmachine_state curstate = STOPMACHINE_NONE;
>  	int cpu = smp_processor_id(), err = 0;
> +	unsigned long flags = 0;
>  	bool is_active;
>  
>  	if (!smdata->active_cpus)
> @@ -446,7 +476,7 @@ static int stop_machine_cpu_stop(void *d
>  			curstate = smdata->state;
>  			switch (curstate) {
>  			case STOPMACHINE_DISABLE_IRQ:
> -				local_irq_disable();
> +				local_irq_save(flags);
>  				hard_irq_disable();
>  				break;
>  			case STOPMACHINE_RUN:
> @@ -460,7 +490,7 @@ static int stop_machine_cpu_stop(void *d
>  		}
>  	} while (curstate != STOPMACHINE_EXIT);
>  
> -	local_irq_enable();
> +	local_irq_restore(flags);

It would be nice to explain how the function may be called with irqs
disabled.  It would also be nice to have the special offline case
explained in the docbook comments as well.

> @@ -470,9 +500,19 @@ int __stop_machine(int (*fn)(void *), vo
>  					    .num_threads = num_online_cpus(),
>  					    .active_cpus = cpus };
>  
> +	/* Include the calling cpu that might not be online yet. */
> +	if (!percpu_read(cpu_stopper.enabled))
> +		smdata.num_threads++;
> +
>  	/* Set the initial state and stop all online cpus. */
>  	set_state(&smdata, STOPMACHINE_PREPARE);
> -	return stop_cpus(cpu_online_mask, stop_machine_cpu_stop, &smdata);
> +
> +	if (percpu_read(cpu_stopper.enabled))
> +		return stop_cpus(cpu_online_mask, stop_machine_cpu_stop,
> +				 &smdata);
> +	else
> +		return __stop_cpus(cpu_all_mask, stop_machine_cpu_stop,
> +				   &smdata);

Ummm... I'm completely lost here.  How is this supposed to work?  As
the local CPU can't sleep, we skip synchronization altogether?  What if
another stop machine is already in progress?  Shouldn't you be busy
looping w/ trylock instead?

Thanks.

-- 
tejun


* Re: [patch v3 1/2] stop_machine: enable __stop_machine() to be called from the cpu online path
  2011-06-10 13:05   ` Tejun Heo
@ 2011-06-13 17:58     ` Suresh Siddha
  0 siblings, 0 replies; 5+ messages in thread
From: Suresh Siddha @ 2011-06-13 17:58 UTC (permalink / raw)
  To: Tejun Heo
  Cc: mingo, tglx, hpa, trenn, prarit, rusty, linux-kernel, Song,
	Youquan, stable

On Fri, 2011-06-10 at 06:05 -0700, Tejun Heo wrote:
> > @@ -47,15 +48,32 @@ static void cpu_stop_init_done(struct cp
> >  	memset(done, 0, sizeof(*done));
> >  	atomic_set(&done->nr_todo, nr_todo);
> >  	init_completion(&done->completion);
> > +	done->offline_ctxt = !percpu_read(cpu_stopper.enabled);
> > +}
> 
> I'm not sure the above test is safe.  CPU_ONLINE notification is not
> guaranteed to execute on the CPU being brought online.  It's likely to
> happen on the CPU which is bringing up the target CPU, so
> cpu_stopper.enabled may change asynchronously.  I think the correct
> test to perform is querying local CPU onlineness with accompanying
> WARN_ON_ONCE() on preemption enable state.

Thinking a bit more, I think using raw_smp_processor_id() is safe and
easier to understand. So I went back to cpu_online() checks here. We
don't need to worry about the preemption state, as we are only checking
whether the cpu is online or offline, and no load balancing happens
between an online and an offline cpu.
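
I.e., roughly (a sketch of the direction I'm taking for v4, not the
final code):

	done->offline_ctxt = !cpu_online(raw_smp_processor_id());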

> >  {
> >  	if (done) {
> > +		bool offline_ctxt = done->offline_ctxt;
> >  		if (executed)
> >  			done->executed = true;
> > -		if (atomic_dec_and_test(&done->nr_todo))
> > +		if (atomic_dec_and_test(&done->nr_todo) &&  !offline_ctxt)
> >  			complete(&done->completion);
> 
> What does the local variable achieve?  Also, an extra space.

If the caller is polling on nr_todo, we need to read the offline status
before decrementing nr_todo, as the caller's stack holding 'done' may
no longer be active (the caller can return as soon as it sees nr_todo
hit zero). The memory barrier in atomic_dec_and_test() ensures that the
offline status is read before the decrement.

> >  	/* initialize works and done */
> > -	for_each_cpu(cpu, cpumask) {
> > +	for_each_cpu_and(cpu, cpumask, cpu_online_mask) {
> >  		work = &per_cpu(stop_cpus_work, cpu);
> >  		work->fn = fn;
> >  		work->arg = arg;
> >  		work->done = &done;
> > +		weight++;
> 
> This seems a bit dangerous.  Up to now, whether to execute or not is
> solely determined by testing stopper->enabled while holding
> stopper->lock.  There's no reason to change that behavior for this
> feature and even if necessary it should be done in a separate patch
> with ample explanation why that's necessary and why it's safe.

Making stop_cpus() work in the offline case requires a bit more work for
both the CONFIG_SMP and !CONFIG_SMP cases, and there is no user for it
currently.

So in the new version, only __stop_machine() can be called from the cpu
online path; stop_cpus() is unchanged and can still only be called from
a cpu that is already online.

If there is a need, we can make stop_cpus() behave similarly in the
future. But for now, only __stop_machine() is to be used from the cpu
online path (as done in the second patch by the x86 mtrr code).

> > +	/*
> > +	 * This cpu is not yet online. If @fn needs to be run on this
> > +	 * cpu, run it now.
> > +	 */
> > +	if (!online && cpu_isset(smp_processor_id(), *cpumask))
> > +		fn(arg);
> 
> Can't we put this in cpu_stop_wait_for_completion()?  And please
> factor out fn() execution from cpu_stopper_thread() and call that.  I
> know you objected to that in the other reply but please do it that
> way.  All that's necessary is using a different context and avoiding
> sleeping.

The calling cpu is not yet online (with irqs disabled etc), so it can't
do any context switches.

I fixed the return value checking you mentioned.

> > @@ -470,9 +500,19 @@ int __stop_machine(int (*fn)(void *), vo
> >  					    .num_threads = num_online_cpus(),
> >  					    .active_cpus = cpus };
> >  
> > +	/* Include the calling cpu that might not be online yet. */
> > +	if (!percpu_read(cpu_stopper.enabled))
> > +		smdata.num_threads++;
> > +
> >  	/* Set the initial state and stop all online cpus. */
> >  	set_state(&smdata, STOPMACHINE_PREPARE);
> > -	return stop_cpus(cpu_online_mask, stop_machine_cpu_stop, &smdata);
> > +
> > +	if (percpu_read(cpu_stopper.enabled))
> > +		return stop_cpus(cpu_online_mask, stop_machine_cpu_stop,
> > +				 &smdata);
> > +	else
> > +		return __stop_cpus(cpu_all_mask, stop_machine_cpu_stop,
> > +				   &smdata);
> 
> Ummm... I'm completely lost here.  How is this supposed to work?  As
> the local CPU can't sleep, we skip synchronization altogether?  What if
> another stop machine is already in progress? 

There can't be another stop_machine() in parallel, as stop_machine()
does get_online_cpus() and will wait there while a cpu that is coming
online is doing __stop_machine().
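
For reference, stop_machine() is just a wrapper that serializes against
cpu hotplug before doing the rendezvous; roughly:

	int stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus)
	{
		int ret;

		/* No CPUs can come up or down during this. */
		get_online_cpus();
		ret = __stop_machine(fn, data, cpus);
		put_online_cpus();
		return ret;
	}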

> Shouldn't you be busy
> looping w/ trylock instead?

Reviewing more closely, stop_cpus() can run in parallel with a
__stop_machine(). So I ended up busy looping with trylock in this path
as well.
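
Something along these lines (a sketch of the v4 direction, not the
final code):

	/*
	 * From the cpu online path we can't sleep in mutex_lock(), so
	 * poll for stop_cpus_mutex instead.
	 */
	while (!mutex_trylock(&stop_cpus_mutex))
		cpu_relax();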

v4 patchset will follow shortly.

thanks,
suresh



