* [PATCH/RFC 0/4] Add stop_machine_get/put_threads to stop_machine infrastructure.
@ 2008-10-03 10:56 Heiko Carstens
  2008-10-03 10:56 ` [PATCH/RFC 1/4] stop_machine: atomic update for combined return value Heiko Carstens
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Heiko Carstens @ 2008-10-03 10:56 UTC (permalink / raw)
  To: rusty, jens.axboe, schwidefsky; +Cc: linux-kernel

This patch series would allow converting s390 to the generic IPI interface.
We can't do that currently since our etr/stp code relies on the old
semantics of smp_call_function, which guaranteed that the function returns
only after all receiving cpus have acknowledged the IPI. That way it is
known that all other cpus are running in an interrupt handler with
interrupts disabled.
This is no longer true with the generic IPI infrastructure.

So one idea was to use stop_machine in order to synchronize all cpus. Rusty
was kind enough to extend it so that it is now possible to run a function
on several cpus, instead of just one.
However we need to be able to do that without allocating any memory. That's
what this patch set is about. It introduces two new interfaces which allow
creating the kstop threads in advance, so that a subsequent stop_machine
call can run without creating any threads and therefore without allocating
any memory.

The two interfaces (see patch 2) are:

stop_machine_get_threads() - creates all needed kstop threads in advance.
If called multiple times it just increases an internal usecount.

stop_machine_put_threads() - kills all previously created kstop threads
once the internal usecount drops to zero.
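
For illustration, a caller of the new interfaces might look like this
(just a sketch; example_fn and the init/exit functions are made up, error
handling trimmed):

#include <linux/stop_machine.h>

/* Runs on the selected cpus with interrupts disabled. */
static int example_fn(void *data)
{
	return 0;
}

static int __init example_init(void)
{
	/* Create the kstop threads up front; may sleep and may fail. */
	return stop_machine_get_threads();
}

static void example_sync(void)
{
	/* No memory allocation happens here, the threads already exist. */
	stop_machine(example_fn, NULL, &cpu_online_map);
}

static void __exit example_exit(void)
{
	/* Threads are killed once the usecount drops to zero. */
	stop_machine_put_threads();
}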

Patch 1 is a stop_machine bugfix and is independent of the rest
Patch 2 introduces the new proposed interface
Patch 3 converts the s390 etr and stp code to the new API
Patch 4 converts s390 to the generic IPI interface

Thanks,
Heiko


* [PATCH/RFC 1/4] stop_machine: atomic update for combined return value
  2008-10-03 10:56 [PATCH/RFC 0/4] Add stop_machine_get/put_threads to stop_machine infrastructure Heiko Carstens
@ 2008-10-03 10:56 ` Heiko Carstens
  2008-10-03 10:56 ` [PATCH/RFC 2/4] stop_machine: add stop_machine_get/put_threads interface Heiko Carstens
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Heiko Carstens @ 2008-10-03 10:56 UTC (permalink / raw)
  To: rusty, jens.axboe, schwidefsky; +Cc: linux-kernel, Heiko Carstens

[-- Attachment #1: stopmachine_atomic.diff --]
[-- Type: text/plain, Size: 1207 bytes --]

From: Heiko Carstens <heiko.carstens@de.ibm.com>

Using |= to update a value which might be updated concurrently on several
cpus will not always work, since the plain read-modify-write is not atomic
and concurrent updates can get lost. So let's use a cmpxchg loop instead.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
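The race this fixes, spelled out: a plain |= is a non-atomic
read-modify-write, so two cpus can read the same old value and one
return code gets lost. The loop below merges the value atomically
(a sketch; addr and ret stand for &smdata->fnret and the function's
return value):

	int old, new;

	do {
		old = *addr;		/* snapshot the current value */
		new = old | ret;	/* merge in our return code */
	} while (cmpxchg(addr, old, new) != old);	/* retry if we raced */
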
 kernel/stop_machine.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Index: linux-2.6/kernel/stop_machine.c
===================================================================
--- linux-2.6.orig/kernel/stop_machine.c
+++ linux-2.6/kernel/stop_machine.c
@@ -65,6 +65,7 @@ static void ack_state(void)
 static int stop_cpu(struct stop_machine_data *smdata)
 {
 	enum stopmachine_state curstate = STOPMACHINE_NONE;
+	int *frp, fnret, old;
 
 	/* Simple state machine */
 	do {
@@ -80,7 +81,11 @@ static int stop_cpu(struct stop_machine_
 			case STOPMACHINE_RUN:
 				/* |= allows error detection if functions on
 				 * multiple CPUs. */
-				smdata->fnret |= smdata->fn(smdata->data);
+				fnret = smdata->fn(smdata->data);
+				frp = &smdata->fnret;
+				do {
+					old = *frp;
+				} while (cmpxchg(frp, old, old | fnret) != old);
 				break;
 			default:
 				break;

-- 


* [PATCH/RFC 2/4] stop_machine: add stop_machine_get/put_threads interface
  2008-10-03 10:56 [PATCH/RFC 0/4] Add stop_machine_get/put_threads to stop_machine infrastructure Heiko Carstens
  2008-10-03 10:56 ` [PATCH/RFC 1/4] stop_machine: atomic update for combined return value Heiko Carstens
@ 2008-10-03 10:56 ` Heiko Carstens
  2008-10-03 10:56 ` [PATCH/RFC 3/4] s390: convert etr/stp to stop_machine interface Heiko Carstens
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Heiko Carstens @ 2008-10-03 10:56 UTC (permalink / raw)
  To: rusty, jens.axboe, schwidefsky; +Cc: linux-kernel, Heiko Carstens

[-- Attachment #1: stopmachine_getput.diff --]
[-- Type: text/plain, Size: 8734 bytes --]

From: Heiko Carstens <heiko.carstens@de.ibm.com>

Please note that this patch is not final. It contains bugs and doesn't care
much about memory barriers etc. It is just intended to get feedback on
whether the proposed interface makes sense.

This patch adds two new interfaces to the stop_machine infrastructure:

stop_machine_get_threads() will create all needed kstop threads in advance.
If it is called multiple times it will just increase an internal usecount.

stop_machine_put_threads() will kill all previously created kstop threads
once the internal usecount drops to zero.

This new interface can be used if there is a need to synchronize all cpus
without allocating any memory. This is achieved by simply creating the
kstop threads early and letting them sleep until they are finally needed.

It could also make sense to use this interface if plenty of stop_machine
calls are going to happen (e.g. kprobes). This would probably speed things
up.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
 include/linux/stop_machine.h |    8 +
 kernel/stop_machine.c        |  206 +++++++++++++++++++++++++++++++++----------
 2 files changed, 167 insertions(+), 47 deletions(-)

Index: linux-2.6/kernel/stop_machine.c
===================================================================
--- linux-2.6.orig/kernel/stop_machine.c
+++ linux-2.6/kernel/stop_machine.c
@@ -4,6 +4,7 @@
 #include <linux/cpu.h>
 #include <linux/err.h>
 #include <linux/kthread.h>
+#include <linux/init.h>
 #include <linux/module.h>
 #include <linux/sched.h>
 #include <linux/stop_machine.h>
@@ -39,6 +40,10 @@ static unsigned int num_threads;
 static atomic_t thread_ack;
 static struct completion finished;
 static DEFINE_MUTEX(lock);
+static struct stop_machine_data active, idle;
+static cpumask_t active_cpus;
+static struct task_struct *threads[NR_CPUS];
+static int usecount;
 
 static void set_state(enum stopmachine_state newstate)
 {
@@ -48,6 +53,13 @@ static void set_state(enum stopmachine_s
 	state = newstate;
 }
 
+static enum stopmachine_state read_state(void)
+{
+	/* Force read of state. */
+	barrier();
+	return state;
+}
+
 /* Last one to ack a state moves to the next state. */
 static void ack_state(void)
 {
@@ -62,7 +74,7 @@ static void ack_state(void)
 
 /* This is the actual thread which stops the CPU.  It exits by itself rather
  * than waiting for kthread_stop(), because it's easier for hotplug CPU. */
-static int stop_cpu(struct stop_machine_data *smdata)
+static void __stop_cpu(struct stop_machine_data *smdata)
 {
 	enum stopmachine_state curstate = STOPMACHINE_NONE;
 	int *frp, fnret, old;
@@ -95,7 +107,27 @@ static int stop_cpu(struct stop_machine_
 	} while (curstate != STOPMACHINE_EXIT);
 
 	local_irq_enable();
-	do_exit(0);
+}
+
+static int stop_cpu(void *smcpu)
+{
+	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wait);
+	struct stop_machine_data *smdata;
+	int cpu;
+
+	cpu = (long)smcpu;
+	while (1) {
+		if (kthread_should_stop())
+			break;
+		/* active_cpus mask might have changed. */
+		barrier();
+		smdata =  cpu_isset(cpu, active_cpus) ? &active : &idle;
+		__stop_cpu(smdata);
+		wait_event_interruptible(wait,
+					 kthread_should_stop() ||
+					 read_state() == STOPMACHINE_PREPARE);
+	}
+	return 0;
 }
 
 /* Callback for CPUs which aren't supposed to do anything. */
@@ -104,55 +136,103 @@ static int chill(void *unused)
 	return 0;
 }
 
+static int create_kstop_thread(int cpu)
+{
+	struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
+	struct task_struct *k;
+	int err;
+
+	k = kthread_create((void *)stop_cpu, (void *)(long)cpu, "kstop%u", cpu);
+	err = IS_ERR(k) ? PTR_ERR(k) : 0;
+	if (err)
+		return err;
+	threads[cpu] = k;
+	/* Place it onto correct cpu. */
+	kthread_bind(k, cpu);
+
+	/* Make it highest prio. */
+	if (sched_setscheduler_nocheck(k, SCHED_FIFO, &param))
+		BUG();
+	return 0;
+}
+
+static void kill_kstop_thread(int cpu)
+{
+	if (!threads[cpu])
+		return;
+	kthread_stop(threads[cpu]);
+	threads[cpu] = NULL;
+}
+
+static int __stop_machine_get_threads(void)
+{
+	int i, err;
+
+	if (usecount++)
+		return 0;
+	for_each_online_cpu(i) {
+		err = create_kstop_thread(i);
+		if (err)
+			goto kill_threads;
+	}
+	return 0;
+kill_threads:
+	for_each_online_cpu(i)
+		kill_kstop_thread(i);
+	usecount--;
+	return err;
+}
+
+int stop_machine_get_threads(void)
+{
+	int err;
+
+	mutex_lock(&lock);
+	/* FIXME: All created tasks will be in state UNINTERRUPTIBLE
+	 * but we want INTERRUPTIBLE. */
+	err = __stop_machine_get_threads();
+	mutex_unlock(&lock);
+	return err;
+}
+
+static void __stop_machine_put_threads(void)
+{
+	int i;
+
+	if (--usecount)
+		return;
+	for_each_online_cpu(i)
+		kill_kstop_thread(i);
+}
+
+void stop_machine_put_threads(void)
+{
+	mutex_lock(&lock);
+	__stop_machine_put_threads();
+	mutex_unlock(&lock);
+}
+
 int __stop_machine(int (*fn)(void *), void *data, const cpumask_t *cpus)
 {
 	int i, err;
-	struct stop_machine_data active, idle;
-	struct task_struct **threads;
 
+	active_cpus = cpus ? *cpus : cpumask_of_cpu(first_cpu(cpu_online_map));
 	active.fn = fn;
 	active.data = data;
 	active.fnret = 0;
 	idle.fn = chill;
 	idle.data = NULL;
 
-	/* This could be too big for stack on large machines. */
-	threads = kcalloc(NR_CPUS, sizeof(threads[0]), GFP_KERNEL);
-	if (!threads)
-		return -ENOMEM;
-
 	/* Set up initial state. */
 	mutex_lock(&lock);
 	init_completion(&finished);
 	num_threads = num_online_cpus();
 	set_state(STOPMACHINE_PREPARE);
 
-	for_each_online_cpu(i) {
-		struct stop_machine_data *smdata = &idle;
-		struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
-
-		if (!cpus) {
-			if (i == first_cpu(cpu_online_map))
-				smdata = &active;
-		} else {
-			if (cpu_isset(i, *cpus))
-				smdata = &active;
-		}
-
-		threads[i] = kthread_create((void *)stop_cpu, smdata, "kstop%u",
-					    i);
-		if (IS_ERR(threads[i])) {
-			err = PTR_ERR(threads[i]);
-			threads[i] = NULL;
-			goto kill_threads;
-		}
-
-		/* Place it onto correct cpu. */
-		kthread_bind(threads[i], i);
-
-		/* Make it highest prio. */
-		if (sched_setscheduler_nocheck(threads[i], SCHED_FIFO, &param))
-			BUG();
+	err = __stop_machine_get_threads();
+	if (err) {
+		mutex_unlock(&lock);
+		return err;
 	}
 
 	/* We've created all the threads.  Wake them all: hold this CPU so one
@@ -164,20 +244,9 @@ int __stop_machine(int (*fn)(void *), vo
 	/* This will release the thread on our CPU. */
 	put_cpu();
 	wait_for_completion(&finished);
+	__stop_machine_put_threads();
 	mutex_unlock(&lock);
-
-	kfree(threads);
-
 	return active.fnret;
-
-kill_threads:
-	for_each_online_cpu(i)
-		if (threads[i])
-			kthread_stop(threads[i]);
-	mutex_unlock(&lock);
-
-	kfree(threads);
-	return err;
 }
 
 int stop_machine(int (*fn)(void *), void *data, const cpumask_t *cpus)
@@ -192,3 +261,46 @@ int stop_machine(int (*fn)(void *), void
 	return ret;
 }
 EXPORT_SYMBOL_GPL(stop_machine);
+
+static int __cpuinit stop_machine_notify(struct notifier_block *self,
+					 unsigned long action, void *hcpu)
+{
+	int rc = 0;
+
+	switch (action) {
+	case CPU_UP_PREPARE:
+	case CPU_UP_PREPARE_FROZEN:
+		mutex_lock(&lock);
+		if (usecount)
+			/* FIXME: new thread will be in state UNINTERRUPTIBLE
+			 * but we want INTERRUPTIBLE. */
+			rc = create_kstop_thread((long)hcpu);
+		mutex_unlock(&lock);
+		break;
+	case CPU_UP_CANCELED:
+	case CPU_UP_CANCELED_FROZEN:
+		mutex_lock(&lock);
+		if (usecount)
+			kill_kstop_thread((long)hcpu);
+		mutex_unlock(&lock);
+		break;
+	case CPU_DEAD:
+	case CPU_DEAD_FROZEN:
+		mutex_lock(&lock);
+		kill_kstop_thread((long)hcpu);
+		mutex_unlock(&lock);
+		break;
+	}
+	return rc ? NOTIFY_BAD : NOTIFY_OK;
+}
+
+static struct notifier_block __cpuinitdata stop_machine_nb = {
+	.notifier_call = stop_machine_notify,
+};
+
+static int __init stop_machine_init(void)
+{
+	register_hotcpu_notifier(&stop_machine_nb);
+	return 0;
+}
+early_initcall(stop_machine_init);
Index: linux-2.6/include/linux/stop_machine.h
===================================================================
--- linux-2.6.orig/include/linux/stop_machine.h
+++ linux-2.6/include/linux/stop_machine.h
@@ -35,6 +35,10 @@ int stop_machine(int (*fn)(void *), void
  * won't come or go while it's being called.  Used by hotplug cpu.
  */
 int __stop_machine(int (*fn)(void *), void *data, const cpumask_t *cpus);
+
+int stop_machine_get_threads(void);
+void stop_machine_put_threads(void);
+
 #else
 
 static inline int stop_machine(int (*fn)(void *), void *data,
@@ -46,5 +50,9 @@ static inline int stop_machine(int (*fn)
 	local_irq_enable();
 	return ret;
 }
+
+static inline int stop_machine_get_threads(void) { return 0; }
+static inline void stop_machine_put_threads(void) { }
+
 #endif /* CONFIG_SMP */
 #endif /* _LINUX_STOP_MACHINE */

-- 


* [PATCH/RFC 3/4] s390: convert etr/stp to stop_machine interface
  2008-10-03 10:56 [PATCH/RFC 0/4] Add stop_machine_get/put_threads to stop_machine infrastructure Heiko Carstens
  2008-10-03 10:56 ` [PATCH/RFC 1/4] stop_machine: atomic update for combined return value Heiko Carstens
  2008-10-03 10:56 ` [PATCH/RFC 2/4] stop_machine: add stop_machine_get/put_threads interface Heiko Carstens
@ 2008-10-03 10:56 ` Heiko Carstens
  2008-10-03 10:56 ` [PATCH/RFC 4/4] s390: convert to generic IPI infrastructure Heiko Carstens
  2008-10-06  4:42 ` [PATCH/RFC 0/4] Add stop_machine_get/put_threads to stop_machine infrastructure Rusty Russell
  4 siblings, 0 replies; 13+ messages in thread
From: Heiko Carstens @ 2008-10-03 10:56 UTC (permalink / raw)
  To: rusty, jens.axboe, schwidefsky; +Cc: linux-kernel, Heiko Carstens

[-- Attachment #1: etr.diff --]
[-- Type: text/plain, Size: 9222 bytes --]

From: Heiko Carstens <heiko.carstens@de.ibm.com>

The s390 etr and stp code uses smp_call_function in order to synchronize
all cpus without allocating any memory.

The semantics of the old smp_call_function guaranteed that it returned
only after the IPIs had been delivered to all other cpus. That way it was
known that all other cpus were running in an interrupt handler with
interrupts disabled.
This guarantee is gone with the new generic IPI infrastructure, which has
prevented s390 from using it so far.
With the new stop_machine_get_threads interface we can create the kstop
threads in advance. This allows using the stop_machine interface to
synchronize all cpus without allocating any memory.
So convert the etr/stp code to use stop_machine instead.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
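The synchronization pattern used below, distilled (a sketch, not the
exact code): the first cpu to flip the static flag becomes the master
and performs the clock adjustment, all other cpus act as slaves and
wait until the master is done.

	static int sync_fn(void *data)
	{
		struct clock_sync_data *sync = data;
		static int first;

		if (xchg(&first, 1) == 1) {
			/* Slave: wait until the master synced the clock. */
			clock_sync_cpu(sync);
			return 0;
		}
		/* Master: wait for all slaves to arrive, then sync. */
		while (atomic_read(&sync->cpus) != 0)
			cpu_relax();
		/* ... perform the actual clock adjustment here ... */
		xchg(&first, 0);
		return 0;
	}
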
 arch/s390/kernel/time.c |  182 ++++++++++++++++++++++++++++++------------------
 1 file changed, 115 insertions(+), 67 deletions(-)

Index: linux-2.6/arch/s390/kernel/time.c
===================================================================
--- linux-2.6.orig/arch/s390/kernel/time.c
+++ linux-2.6/arch/s390/kernel/time.c
@@ -20,6 +20,8 @@
 #include <linux/string.h>
 #include <linux/mm.h>
 #include <linux/interrupt.h>
+#include <linux/cpu.h>
+#include <linux/stop_machine.h>
 #include <linux/time.h>
 #include <linux/sysdev.h>
 #include <linux/delay.h>
@@ -450,9 +452,13 @@ static void etr_reset(void)
 static int __init etr_init(void)
 {
 	struct etr_aib aib;
+	int rc;
 
 	if (!test_bit(CLOCK_SYNC_HAS_ETR, &clock_sync_flags))
 		return 0;
+	rc = stop_machine_get_threads();
+	if (rc)
+		return rc;
 	/* Check if this machine has the steai instruction. */
 	if (etr_steai(&aib, ETR_STEAI_STEPPING_PORT) == 0)
 		etr_steai_available = 1;
@@ -643,14 +649,16 @@ static int etr_aib_follows(struct etr_ai
 }
 
 struct clock_sync_data {
+	atomic_t cpus;
 	int in_sync;
 	unsigned long long fixup_cc;
+	int etr_port;
+	struct etr_aib *etr_aib;
 };
 
-static void clock_sync_cpu_start(void *dummy)
+static void clock_sync_cpu(struct clock_sync_data *sync)
 {
-	struct clock_sync_data *sync = dummy;
-
+	atomic_dec(&sync->cpus);
 	enable_sync_clock();
 	/*
 	 * This looks like a busy wait loop but it isn't. etr_sync_cpus
@@ -676,39 +684,35 @@ static void clock_sync_cpu_start(void *d
 	fixup_clock_comparator(sync->fixup_cc);
 }
 
-static void clock_sync_cpu_end(void *dummy)
-{
-}
-
 /*
  * Sync the TOD clock using the port refered to by aibp. This port
  * has to be enabled and the other port has to be disabled. The
  * last eacr update has to be more than 1.6 seconds in the past.
  */
-static int etr_sync_clock(struct etr_aib *aib, int port)
+static int etr_sync_clock(void *data)
 {
-	struct etr_aib *sync_port;
-	struct clock_sync_data etr_sync;
+	static int first;
 	unsigned long long clock, old_clock, delay, delta;
-	int follows;
+	struct clock_sync_data *etr_sync;
+	struct etr_aib *sync_port, *aib;
+	int port;
 	int rc;
 
-	/* Check if the current aib is adjacent to the sync port aib. */
-	sync_port = (port == 0) ? &etr_port0 : &etr_port1;
-	follows = etr_aib_follows(sync_port, aib, port);
-	memcpy(sync_port, aib, sizeof(*aib));
-	if (!follows)
-		return -EAGAIN;
+	etr_sync = data;
 
-	/*
-	 * Catch all other cpus and make them wait until we have
-	 * successfully synced the clock. smp_call_function will
-	 * return after all other cpus are in etr_sync_cpu_start.
-	 */
-	memset(&etr_sync, 0, sizeof(etr_sync));
-	preempt_disable();
-	smp_call_function(clock_sync_cpu_start, &etr_sync, 0);
-	local_irq_disable();
+	if (xchg(&first, 1) == 1) {
+		/* Slave */
+		clock_sync_cpu(etr_sync);
+		return 0;
+	}
+
+	/* Wait until all other cpus entered the sync function. */
+	while (atomic_read(&etr_sync->cpus) != 0)
+		cpu_relax();
+
+	port = etr_sync->etr_port;
+	aib = etr_sync->etr_aib;
+	sync_port = (port == 0) ? &etr_port0 : &etr_port1;
 	enable_sync_clock();
 
 	/* Set clock to next OTE. */
@@ -725,16 +729,16 @@ static int etr_sync_clock(struct etr_aib
 		delay = (unsigned long long)
 			(aib->edf2.etv - sync_port->edf2.etv) << 32;
 		delta = adjust_time(old_clock, clock, delay);
-		etr_sync.fixup_cc = delta;
+		etr_sync->fixup_cc = delta;
 		fixup_clock_comparator(delta);
 		/* Verify that the clock is properly set. */
 		if (!etr_aib_follows(sync_port, aib, port)) {
 			/* Didn't work. */
 			disable_sync_clock(NULL);
-			etr_sync.in_sync = -EAGAIN;
+			etr_sync->in_sync = -EAGAIN;
 			rc = -EAGAIN;
 		} else {
-			etr_sync.in_sync = 1;
+			etr_sync->in_sync = 1;
 			rc = 0;
 		}
 	} else {
@@ -742,12 +746,33 @@ static int etr_sync_clock(struct etr_aib
 		__ctl_clear_bit(0, 29);
 		__ctl_clear_bit(14, 21);
 		disable_sync_clock(NULL);
-		etr_sync.in_sync = -EAGAIN;
+		etr_sync->in_sync = -EAGAIN;
 		rc = -EAGAIN;
 	}
-	local_irq_enable();
-	smp_call_function(clock_sync_cpu_end, NULL, 0);
-	preempt_enable();
+	xchg(&first, 0);
+	return rc;
+}
+
+static int etr_sync_clock_stop(struct etr_aib *aib, int port)
+{
+	struct clock_sync_data etr_sync;
+	struct etr_aib *sync_port;
+	int follows;
+	int rc;
+
+	/* Check if the current aib is adjacent to the sync port aib. */
+	sync_port = (port == 0) ? &etr_port0 : &etr_port1;
+	follows = etr_aib_follows(sync_port, aib, port);
+	memcpy(sync_port, aib, sizeof(*aib));
+	if (!follows)
+		return -EAGAIN;
+	memset(&etr_sync, 0, sizeof(etr_sync));
+	etr_sync.etr_aib = aib;
+	etr_sync.etr_port = port;
+	get_online_cpus();
+	atomic_set(&etr_sync.cpus, num_online_cpus() - 1);
+	rc = stop_machine(etr_sync_clock, &etr_sync, &cpu_online_map);
+	put_online_cpus();
 	return rc;
 }
 
@@ -904,7 +929,7 @@ static void etr_update_eacr(struct etr_e
 }
 
 /*
- * ETR tasklet. In this function you'll find the main logic. In
+ * ETR work. In this function you'll find the main logic. In
  * particular this is the only function that calls etr_update_eacr(),
  * it "controls" the etr control register.
  */
@@ -1037,7 +1062,7 @@ static void etr_work_fn(struct work_stru
 	etr_update_eacr(eacr);
 	set_bit(CLOCK_SYNC_ETR, &clock_sync_flags);
 	if (now < etr_tolec + (1600000 << 12) ||
-	    etr_sync_clock(&aib, sync_port) != 0) {
+	    etr_sync_clock_stop(&aib, sync_port) != 0) {
 		/* Sync failed. Try again in 1/2 second. */
 		eacr.es = 0;
 		etr_update_eacr(eacr);
@@ -1366,8 +1391,16 @@ static void __init stp_reset(void)
 
 static int __init stp_init(void)
 {
-	if (test_bit(CLOCK_SYNC_HAS_STP, &clock_sync_flags) && stp_online)
-		schedule_work(&stp_work);
+	int rc;
+
+	if (!test_bit(CLOCK_SYNC_HAS_STP, &clock_sync_flags))
+		return 0;
+	rc = stop_machine_get_threads();
+	if (rc)
+		return rc;
+	if (!stp_online)
+		return 0;
+	schedule_work(&stp_work);
 	return 0;
 }
 
@@ -1415,38 +1448,26 @@ void stp_island_check(void)
 	schedule_work(&stp_work);
 }
 
-/*
- * STP tasklet. Check for the STP state and take over the clock
- * synchronization if the STP clock source is usable.
- */
-static void stp_work_fn(struct work_struct *work)
+
+static int stp_sync_clock(void *data)
 {
-	struct clock_sync_data stp_sync;
+	static int first;
 	unsigned long long old_clock, delta;
+	struct clock_sync_data *stp_sync;
 	int rc;
 
-	if (!stp_online) {
-		chsc_sstpc(stp_page, STP_OP_CTRL, 0x0000);
-		return;
-	}
+	stp_sync = data;
 
-	rc = chsc_sstpc(stp_page, STP_OP_CTRL, 0xb0e0);
-	if (rc)
-		return;
+	if (xchg(&first, 1) == 1) {
+		/* Slave */
+		clock_sync_cpu(stp_sync);
+		return 0;
+	}
 
-	rc = chsc_sstpi(stp_page, &stp_info, sizeof(struct stp_sstpi));
-	if (rc || stp_info.c == 0)
-		return;
+	/* Wait until all other cpus entered the sync function. */
+	while (atomic_read(&stp_sync->cpus) != 0)
+		cpu_relax();
 
-	/*
-	 * Catch all other cpus and make them wait until we have
-	 * successfully synced the clock. smp_call_function will
-	 * return after all other cpus are in clock_sync_cpu_start.
-	 */
-	memset(&stp_sync, 0, sizeof(stp_sync));
-	preempt_disable();
-	smp_call_function(clock_sync_cpu_start, &stp_sync, 0);
-	local_irq_disable();
 	enable_sync_clock();
 
 	set_bit(CLOCK_SYNC_STP, &clock_sync_flags);
@@ -1470,16 +1491,43 @@ static void stp_work_fn(struct work_stru
 	}
 	if (rc) {
 		disable_sync_clock(NULL);
-		stp_sync.in_sync = -EAGAIN;
+		stp_sync->in_sync = -EAGAIN;
 		clear_bit(CLOCK_SYNC_STP, &clock_sync_flags);
 		if (etr_port0_online || etr_port1_online)
 			schedule_work(&etr_work);
 	} else
-		stp_sync.in_sync = 1;
+		stp_sync->in_sync = 1;
+	xchg(&first, 0);
+	return 0;
+}
+
+/*
+ * STP work. Check for the STP state and take over the clock
+ * synchronization if the STP clock source is usable.
+ */
+static void stp_work_fn(struct work_struct *work)
+{
+	struct clock_sync_data stp_sync;
+	int rc;
 
-	local_irq_enable();
-	smp_call_function(clock_sync_cpu_end, NULL, 0);
-	preempt_enable();
+	if (!stp_online) {
+		chsc_sstpc(stp_page, STP_OP_CTRL, 0x0000);
+		return;
+	}
+
+	rc = chsc_sstpc(stp_page, STP_OP_CTRL, 0xb0e0);
+	if (rc)
+		return;
+
+	rc = chsc_sstpi(stp_page, &stp_info, sizeof(struct stp_sstpi));
+	if (rc || stp_info.c == 0)
+		return;
+
+	memset(&stp_sync, 0, sizeof(stp_sync));
+	get_online_cpus();
+	atomic_set(&stp_sync.cpus, num_online_cpus() - 1);
+	stop_machine(stp_sync_clock, &stp_sync, &cpu_online_map);
+	put_online_cpus();
 }
 
 /*

-- 


* [PATCH/RFC 4/4] s390: convert to generic IPI infrastructure
  2008-10-03 10:56 [PATCH/RFC 0/4] Add stop_machine_get/put_threads to stop_machine infrastructure Heiko Carstens
                   ` (2 preceding siblings ...)
  2008-10-03 10:56 ` [PATCH/RFC 3/4] s390: convert etr/stp to stop_machine interface Heiko Carstens
@ 2008-10-03 10:56 ` Heiko Carstens
  2008-10-06  4:42 ` [PATCH/RFC 0/4] Add stop_machine_get/put_threads to stop_machine infrastructure Rusty Russell
  4 siblings, 0 replies; 13+ messages in thread
From: Heiko Carstens @ 2008-10-03 10:56 UTC (permalink / raw)
  To: rusty, jens.axboe, schwidefsky; +Cc: linux-kernel, Heiko Carstens

[-- Attachment #1: generic_ipi.diff --]
[-- Type: text/plain, Size: 7408 bytes --]

From: Heiko Carstens <heiko.carstens@de.ibm.com>

Since etr/stp don't need the old smp_call_function semantics anymore,
we can convert s390 to the generic IPI infrastructure.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
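The contract with the generic code, distilled from the patch below: the
architecture provides the two IPI senders declared in asm/smp.h and
dispatches incoming IPIs to the generic handlers, i.e. (sketch):

	void arch_send_call_function_ipi(cpumask_t mask);
	void arch_send_call_function_single_ipi(int cpu);

	/* ... and in the external interrupt handler: */
	if (test_bit(ec_call_function, &bits))
		generic_smp_call_function_interrupt();
	if (test_bit(ec_call_function_single, &bits))
		generic_smp_call_function_single_interrupt();
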
 arch/s390/Kconfig            |    1 
 arch/s390/include/asm/sigp.h |    1 
 arch/s390/include/asm/smp.h  |    5 -
 arch/s390/kernel/smp.c       |  175 ++++---------------------------------------
 4 files changed, 24 insertions(+), 158 deletions(-)

Index: linux-2.6/arch/s390/include/asm/sigp.h
===================================================================
--- linux-2.6.orig/arch/s390/include/asm/sigp.h
+++ linux-2.6/arch/s390/include/asm/sigp.h
@@ -61,6 +61,7 @@ typedef enum
 {
 	ec_schedule=0,
 	ec_call_function,
+	ec_call_function_single,
 	ec_bit_last
 } ec_bit_sig;
 
Index: linux-2.6/arch/s390/include/asm/smp.h
===================================================================
--- linux-2.6.orig/arch/s390/include/asm/smp.h
+++ linux-2.6/arch/s390/include/asm/smp.h
@@ -91,8 +91,9 @@ extern int __cpu_up (unsigned int cpu);
 extern struct mutex smp_cpu_state_mutex;
 extern int smp_cpu_polarization[];
 
-extern int smp_call_function_mask(cpumask_t mask, void (*func)(void *),
-	void *info, int wait);
+extern void arch_send_call_function_single_ipi(int cpu);
+extern void arch_send_call_function_ipi(cpumask_t mask);
+
 #endif
 
 #ifndef CONFIG_SMP
Index: linux-2.6/arch/s390/Kconfig
===================================================================
--- linux-2.6.orig/arch/s390/Kconfig
+++ linux-2.6/arch/s390/Kconfig
@@ -74,6 +74,7 @@ config S390
 	select HAVE_KPROBES
 	select HAVE_KRETPROBES
 	select HAVE_KVM if 64BIT
+	select USE_GENERIC_SMP_HELPERS if SMP
 
 source "init/Kconfig"
 
Index: linux-2.6/arch/s390/kernel/smp.c
===================================================================
--- linux-2.6.orig/arch/s390/kernel/smp.c
+++ linux-2.6/arch/s390/kernel/smp.c
@@ -77,159 +77,6 @@ static DEFINE_PER_CPU(struct cpu, cpu_de
 
 static void smp_ext_bitcall(int, ec_bit_sig);
 
-/*
- * Structure and data for __smp_call_function_map(). This is designed to
- * minimise static memory requirements. It also looks cleaner.
- */
-static DEFINE_SPINLOCK(call_lock);
-
-struct call_data_struct {
-	void (*func) (void *info);
-	void *info;
-	cpumask_t started;
-	cpumask_t finished;
-	int wait;
-};
-
-static struct call_data_struct *call_data;
-
-/*
- * 'Call function' interrupt callback
- */
-static void do_call_function(void)
-{
-	void (*func) (void *info) = call_data->func;
-	void *info = call_data->info;
-	int wait = call_data->wait;
-
-	cpu_set(smp_processor_id(), call_data->started);
-	(*func)(info);
-	if (wait)
-		cpu_set(smp_processor_id(), call_data->finished);;
-}
-
-static void __smp_call_function_map(void (*func) (void *info), void *info,
-				    int wait, cpumask_t map)
-{
-	struct call_data_struct data;
-	int cpu, local = 0;
-
-	/*
-	 * Can deadlock when interrupts are disabled or if in wrong context.
-	 */
-	WARN_ON(irqs_disabled() || in_irq());
-
-	/*
-	 * Check for local function call. We have to have the same call order
-	 * as in on_each_cpu() because of machine_restart_smp().
-	 */
-	if (cpu_isset(smp_processor_id(), map)) {
-		local = 1;
-		cpu_clear(smp_processor_id(), map);
-	}
-
-	cpus_and(map, map, cpu_online_map);
-	if (cpus_empty(map))
-		goto out;
-
-	data.func = func;
-	data.info = info;
-	data.started = CPU_MASK_NONE;
-	data.wait = wait;
-	if (wait)
-		data.finished = CPU_MASK_NONE;
-
-	call_data = &data;
-
-	for_each_cpu_mask(cpu, map)
-		smp_ext_bitcall(cpu, ec_call_function);
-
-	/* Wait for response */
-	while (!cpus_equal(map, data.started))
-		cpu_relax();
-	if (wait)
-		while (!cpus_equal(map, data.finished))
-			cpu_relax();
-out:
-	if (local) {
-		local_irq_disable();
-		func(info);
-		local_irq_enable();
-	}
-}
-
-/*
- * smp_call_function:
- * @func: the function to run; this must be fast and non-blocking
- * @info: an arbitrary pointer to pass to the function
- * @wait: if true, wait (atomically) until function has completed on other CPUs
- *
- * Run a function on all other CPUs.
- *
- * You must not call this function with disabled interrupts, from a
- * hardware interrupt handler or from a bottom half.
- */
-int smp_call_function(void (*func) (void *info), void *info, int wait)
-{
-	cpumask_t map;
-
-	spin_lock(&call_lock);
-	map = cpu_online_map;
-	cpu_clear(smp_processor_id(), map);
-	__smp_call_function_map(func, info, wait, map);
-	spin_unlock(&call_lock);
-	return 0;
-}
-EXPORT_SYMBOL(smp_call_function);
-
-/*
- * smp_call_function_single:
- * @cpu: the CPU where func should run
- * @func: the function to run; this must be fast and non-blocking
- * @info: an arbitrary pointer to pass to the function
- * @wait: if true, wait (atomically) until function has completed on other CPUs
- *
- * Run a function on one processor.
- *
- * You must not call this function with disabled interrupts, from a
- * hardware interrupt handler or from a bottom half.
- */
-int smp_call_function_single(int cpu, void (*func) (void *info), void *info,
-			     int wait)
-{
-	spin_lock(&call_lock);
-	__smp_call_function_map(func, info, wait, cpumask_of_cpu(cpu));
-	spin_unlock(&call_lock);
-	return 0;
-}
-EXPORT_SYMBOL(smp_call_function_single);
-
-/**
- * smp_call_function_mask(): Run a function on a set of other CPUs.
- * @mask: The set of cpus to run on.  Must not include the current cpu.
- * @func: The function to run. This must be fast and non-blocking.
- * @info: An arbitrary pointer to pass to the function.
- * @wait: If true, wait (atomically) until function has completed on other CPUs.
- *
- * Returns 0 on success, else a negative status code.
- *
- * If @wait is true, then returns once @func has returned; otherwise
- * it returns just before the target cpu calls @func.
- *
- * You must not call this function with disabled interrupts or from a
- * hardware interrupt handler or from a bottom half handler.
- */
-int smp_call_function_mask(cpumask_t mask, void (*func)(void *), void *info,
-			   int wait)
-{
-	spin_lock(&call_lock);
-	cpu_clear(smp_processor_id(), mask);
-	__smp_call_function_map(func, info, wait, mask);
-	spin_unlock(&call_lock);
-	return 0;
-}
-EXPORT_SYMBOL(smp_call_function_mask);
-
 void smp_send_stop(void)
 {
 	int cpu, rc;
@@ -271,7 +118,10 @@ static void do_ext_call_interrupt(__u16 
 	bits = xchg(&S390_lowcore.ext_call_fast, 0);
 
 	if (test_bit(ec_call_function, &bits))
-		do_call_function();
+		generic_smp_call_function_interrupt();
+
+	if (test_bit(ec_call_function_single, &bits))
+		generic_smp_call_function_single_interrupt();
 }
 
 /*
@@ -288,6 +138,19 @@ static void smp_ext_bitcall(int cpu, ec_
 		udelay(10);
 }
 
+void arch_send_call_function_ipi(cpumask_t mask)
+{
+	int cpu;
+
+	for_each_cpu_mask(cpu, mask)
+		smp_ext_bitcall(cpu, ec_call_function);
+}
+
+void arch_send_call_function_single_ipi(int cpu)
+{
+	smp_ext_bitcall(cpu, ec_call_function_single);
+}
+
 #ifndef CONFIG_64BIT
 /*
  * this function sends a 'purge tlb' signal to another CPU.
@@ -586,9 +449,9 @@ int __cpuinit start_secondary(void *cpuv
 	pfault_init();
 
 	/* Mark this cpu as online */
-	spin_lock(&call_lock);
+	ipi_call_lock();
 	cpu_set(smp_processor_id(), cpu_online_map);
-	spin_unlock(&call_lock);
+	ipi_call_unlock();
 	/* Switch on interrupts */
 	local_irq_enable();
 	/* Print info about this processor */

-- 


* Re: [PATCH/RFC 0/4] Add stop_machine_get/put_threads to stop_machine infrastructure.
  2008-10-03 10:56 [PATCH/RFC 0/4] Add stop_machine_get/put_threads to stop_machine infrastructure Heiko Carstens
                   ` (3 preceding siblings ...)
  2008-10-03 10:56 ` [PATCH/RFC 4/4] s390: convert to generic IPI infrastructure Heiko Carstens
@ 2008-10-06  4:42 ` Rusty Russell
  2008-10-06 20:16   ` Heiko Carstens
  4 siblings, 1 reply; 13+ messages in thread
From: Rusty Russell @ 2008-10-06  4:42 UTC (permalink / raw)
  To: Heiko Carstens; +Cc: jens.axboe, schwidefsky, linux-kernel

On Friday 03 October 2008 20:56:32 you wrote:
> However we need to be able to do that without allocating any memory.

Nice work Heiko!

See free_module(), which calls stop_machine and, well, just hopes it works.  
So we've needed this for a while.

> Patch 1 is a stop_machine bugfix and is independent of the rest

Hmm, do you actually need this?  It was a whim (and clearly a dumb one).  I'm 
tempted to change it to:

	err = smdata->fn(smdata->data);
	if (err)
		smdata->fnret = err;

> Patch 2 introduces the new proposed interface

Could we just encapsulate the threads etc. into a "struct stopmachine" which 
is returned from stop_machine_prepare(), then implement everything in terms 
of that?

Thanks,
Rusty.


* Re: [PATCH/RFC 0/4] Add stop_machine_get/put_threads to stop_machine infrastructure.
  2008-10-06  4:42 ` [PATCH/RFC 0/4] Add stop_machine_get/put_threads to stop_machine infrastructure Rusty Russell
@ 2008-10-06 20:16   ` Heiko Carstens
  2008-10-07  1:39     ` Rusty Russell
  0 siblings, 1 reply; 13+ messages in thread
From: Heiko Carstens @ 2008-10-06 20:16 UTC (permalink / raw)
  To: Rusty Russell; +Cc: jens.axboe, schwidefsky, linux-kernel

On Mon, Oct 06, 2008 at 02:42:40PM +1000, Rusty Russell wrote:
> On Friday 03 October 2008 20:56:32 you wrote:
> > However we need to be able to do that without allocating any memory.
> 
> Nice work Heiko!
> 
> See free_module(), which calls stop_machine and, well, just hopes it works.  
> So we've needed this for a while.

Ah, good. At least there is one other user then :)

> > Patch 1 is a stop_machine bugfix and is independent of the rest
> Hmm, do you actually need this?  It was a whim (and clearly a dumb one).  I'm 
> tempted to change it to:
> 
> 	err = smdata->fn(smdata->data);
> 	if (err)
> 		smdata->fnret = err;

That looks much better than the cmpxchg loop I came up with. All we need to
know is that 'something' went wrong. Any return code != 0 should be enough.

> > Patch 2 introduces the new proposed interface
> 
> Could we just encapsulate the threads etc. into a "struct stopmachine" which 
> is returned from stop_machine_prepare(), then implement everything in terms 
> of that?

You mean that we put the pointers to the threads, the cpu mask, etc. in this
structure, instead of wasting bss size?
That would be just a kmalloc call in __stop_machine_get_threads().
Or do you think of something different?

Anyway, I'm going to send a hopefully better patch tomorrow.


* Re: [PATCH/RFC 0/4] Add stop_machine_get/put_threads to stop_machine infrastructure.
  2008-10-06 20:16   ` Heiko Carstens
@ 2008-10-07  1:39     ` Rusty Russell
  2008-10-07 15:38       ` Heiko Carstens
  0 siblings, 1 reply; 13+ messages in thread
From: Rusty Russell @ 2008-10-07  1:39 UTC (permalink / raw)
  To: Heiko Carstens; +Cc: jens.axboe, schwidefsky, linux-kernel

On Tuesday 07 October 2008 07:16:50 Heiko Carstens wrote:
> > > Patch 2 introduces the new proposed interface
> >
> > Could we just encapsulate the threads etc. into a "struct stopmachine"
> > which is returned from stop_machine_prepare(), then implement everything
> > in terms of that?
>
> You mean that we put the pointers to the threads, the cpu mask, etc. in
> this structure, instead of wasting bss size?
> That would be just a kmalloc call in __stop_machine_get_threads().
> Or do you think of something different?

That's exactly my idea.  We kmalloc already because NR_CPUS might be too big 
for the stack.  This version would just kmalloc a struct containing 
everything we need.

I prefer _prepare() / _run() / _destroy() as nomenclature BTW.  prepare comes 
from wait.h's prepare_to_wait; I don't like alloc() since it does more than 
allocate memory, yet _get_threads unnecessarily reveals too much about the 
implementation.

Then we have the simple case:

static inline int stop_machine(int (*fn)(void *), void *data,
			       const struct cpumask *cpus)
{
	struct stop_machine *sm = stop_machine_prepare();
	int err;

	if (!sm)
		return -ENOMEM;

	err = stop_machine_run(sm, fn, data, cpus);
	stop_machine_destroy(sm);
	return err;
}

I think you want to be able to call stop_machine_run() with the same "sm" 
multiple times, but that should be pretty easy to ensure.

Cheers!
Rusty.


* Re: [PATCH/RFC 0/4] Add stop_machine_get/put_threads to stop_machine infrastructure.
  2008-10-07  1:39     ` Rusty Russell
@ 2008-10-07 15:38       ` Heiko Carstens
  2008-10-08  0:27         ` Rusty Russell
  0 siblings, 1 reply; 13+ messages in thread
From: Heiko Carstens @ 2008-10-07 15:38 UTC (permalink / raw)
  To: Rusty Russell; +Cc: jens.axboe, schwidefsky, linux-kernel

On Tue, Oct 07, 2008 at 11:39:58AM +1000, Rusty Russell wrote:
> On Tuesday 07 October 2008 07:16:50 Heiko Carstens wrote:
> > > > Patch 2 introduces the new proposed interface
> > >
> > > Could we just encapsulate the threads etc. into a "struct stopmachine"
> > > which is returned from stop_machine_prepare(), then implement everything
> > > in terms of that?
> >
> > You mean that we put the pointers to the threads, the cpu mask, etc. in
> > this structure, instead of wasting bss size?
> > That would be just a kmalloc call in __stop_machine_get_threads().
> > Or do you think of something different?
> 
> That's exactly my idea.  We kmalloc already because NR_CPUS might be too big 
> for the stack.  This version would just kmalloc a struct containing 
> everything we need.

Ok, I did that but the resulting code is astonishingly ugly, so I thought I
should share it :)
 
> I prefer _prepare() / _run() / _destroy() as nomenclature BTW.  prepare comes 
> from wait.h's prepare_to_wait; I don't like alloc() since it does more than 
> allocate memory, yet _get_threads unnecessarily reveals too much about the 
> implementation.
> 
> Then we have the simple case:
> 
> static inline int stop_machine(int (*fn)(void *), void *data,
> 			       const struct cpumask *cpus)
> {
> 	struct stop_machine *sm = stop_machine_prepare();
> 	int err;
> 
> 	if (!sm)
> 		return -ENOMEM;
> 
> 	err = stop_machine_run(sm, fn, data, cpus);
> 	stop_machine_destroy(sm);
> 	return err;
> }
> I think you want to be able to call stop_machine_run() with the same "sm" 
> multiple times, but that should be pretty easy to ensure.

Actually there should be at most a single "sm" present. stop_machine_prepare()
also is supposed to create the kstop threads. So there is no point in having
several of them. Which again makes me ask: why should it return a pointer
to a (the) stop_machine structure at all?
Imho an error code should be sufficient.

Another thing that comes to mind is cpu hotplug: if somebody issued
stop_machine_prepare() and then a cpu hotplug operation gets started we need
to create or kill a kstop thread. For that we need the "sm" so we can
save/find the task_struct pointer of the thread.

And yet another ugly detail: I decided to kill all kstop threads with
kthread_stop(). In case of cpu hot unplug this is a bit of a problem, since
the thread in question hasn't been migrated yet (yet == when
stop_machine_destroy gets called). So I have to wait until the cpu hotplug
notifier list gets called... and hence I need a reference to the "sm"
structure before it can be freed, because that's where the pointer to the
task_struct is stored.
This all leads to very ugly reference counting.

Hmm.. while thinking about it.. maybe it would make sense to do something like

	wait_task_inactive(p, 0);
	set_task_cpu(p, any_online_cpu());
	kthread_stop(p);

within stop_machine_destroy() in case of cpu hot unplug for the thread that
was on the dead cpu?
That would ease the ugly reference counting in the patch below a lot.

Anyway, the patch below is what I currently have. It does work and should
give you an idea of _what_ I want. However the implementation does suck
currently, no question about that.

---
 include/linux/stop_machine.h |    8 +
 kernel/stop_machine.c        |  250 +++++++++++++++++++++++++++++++++----------
 2 files changed, 204 insertions(+), 54 deletions(-)

Index: linux-2.6/kernel/stop_machine.c
===================================================================
--- linux-2.6.orig/kernel/stop_machine.c
+++ linux-2.6/kernel/stop_machine.c
@@ -4,6 +4,7 @@
 #include <linux/cpu.h>
 #include <linux/err.h>
 #include <linux/kthread.h>
+#include <linux/init.h>
 #include <linux/module.h>
 #include <linux/sched.h>
 #include <linux/stop_machine.h>
@@ -34,11 +35,20 @@ struct stop_machine_data {
 	int fnret;
 };
 
+struct stop_machine {
+	struct task_struct *threads[NR_CPUS];
+	int usecount;
+	int threadcount;
+	struct stop_machine_data active, idle;
+	cpumask_t active_cpus;
+};
+
 /* Like num_online_cpus(), but hotplug cpu uses us, so we need this. */
 static unsigned int num_threads;
 static atomic_t thread_ack;
 static struct completion finished;
 static DEFINE_MUTEX(lock);
+static struct stop_machine *smh;
 
 static void set_state(enum stopmachine_state newstate)
 {
@@ -48,6 +58,13 @@ static void set_state(enum stopmachine_s
 	state = newstate;
 }
 
+static enum stopmachine_state read_state(void)
+{
+	/* Force read of state. */
+	barrier();
+	return state;
+}
+
 /* Last one to ack a state moves to the next state. */
 static void ack_state(void)
 {
@@ -62,7 +79,7 @@ static void ack_state(void)
 
 /* This is the actual thread which stops the CPU.  It exits by itself rather
  * than waiting for kthread_stop(), because it's easier for hotplug CPU. */
-static int stop_cpu(struct stop_machine_data *smdata)
+static void __stop_cpu(struct stop_machine_data *smdata)
 {
 	enum stopmachine_state curstate = STOPMACHINE_NONE;
 
@@ -90,7 +107,30 @@ static int stop_cpu(struct stop_machine_
 	} while (curstate != STOPMACHINE_EXIT);
 
 	local_irq_enable();
-	do_exit(0);
+}
+
+static int stop_cpu(void *smcpu)
+{
+	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wait);
+	struct stop_machine_data *smdata;
+	int cpu;
+
+	cpu = (long)smcpu;
+	while (1) {
+		wait_event_interruptible(wait,
+					 kthread_should_stop() ||
+					 read_state() == STOPMACHINE_PREPARE);
+		if (kthread_should_stop())
+			break;
+		/* active_cpus mask might have changed. */
+		barrier();
+		if (cpu_isset(cpu, smh->active_cpus))
+			smdata = &smh->active;
+		else
+			smdata = &smh->idle;
+		__stop_cpu(smdata);
+	}
+	return 0;
 }
 
 /* Callback for CPUs which aren't supposed to do anything. */
@@ -99,79 +139,142 @@ static int chill(void *unused)
 	return 0;
 }
 
-int __stop_machine(int (*fn)(void *), void *data, const cpumask_t *cpus)
+static void put_smh(int is_thread)
 {
-	int i, err;
-	struct stop_machine_data active, idle;
-	struct task_struct **threads;
+	if (is_thread)
+		smh->threadcount--;
+	else
+		smh->usecount--;
+	if (smh->threadcount || smh->usecount)
+		return;
+	kfree(smh);
+	smh = NULL;
+}
 
-	active.fn = fn;
-	active.data = data;
-	active.fnret = 0;
-	idle.fn = chill;
-	idle.data = NULL;
-
-	/* This could be too big for stack on large machines. */
-	threads = kcalloc(NR_CPUS, sizeof(threads[0]), GFP_KERNEL);
-	if (!threads)
-		return -ENOMEM;
+static int create_kstop_thread(int cpu)
+{
+	struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
+	struct task_struct *k;
+	int err;
+
+	if (!smh || !smh->usecount)
+		return 0;
+	k = kthread_create((void *)stop_cpu, (void *)(long)cpu, "kstop%u", cpu);
+	err = IS_ERR(k) ? PTR_ERR(k) : 0;
+	if (err)
+		return err;
+	smh->threads[cpu] = k;
+	/* Place it onto correct cpu. */
+	kthread_bind(k, cpu);
+
+	/* Make it highest prio. */
+	if (sched_setscheduler_nocheck(k, SCHED_FIFO, &param))
+		BUG();
+	/* Move it into state INTERRUPTIBLE. */
+	wake_up_process(k);
+	smh->threadcount++;
+	return 0;
+}
 
-	/* Set up initial state. */
+static void kill_kstop_thread(int cpu)
+{
+	if (!smh || !smh->threads[cpu])
+		return;
+	kthread_stop(smh->threads[cpu]);
+	smh->threads[cpu] = NULL;
+	put_smh(1);
+}
+
+static void __stop_machine_destroy(void)
+{
+	int i;
+
+	if (smh->usecount > 1) {
+		put_smh(0);
+		return;
+	}
+	for_each_online_cpu(i)
+		kill_kstop_thread(i);
+	put_smh(0);
+}
+
+void stop_machine_destroy(void)
+{
 	mutex_lock(&lock);
-	init_completion(&finished);
-	num_threads = num_online_cpus();
-	set_state(STOPMACHINE_PREPARE);
+	__stop_machine_destroy();
+	mutex_unlock(&lock);
+}
+
+static int __stop_machine_prepare(void)
+{
+	int i, err;
 
+	if (!smh)
+		smh = kzalloc(sizeof(*smh), GFP_KERNEL);
+	if (!smh)
+		return -ENOMEM;
+	if (smh->usecount++)
+		return 0;
 	for_each_online_cpu(i) {
-		struct stop_machine_data *smdata = &idle;
-		struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
+		err = create_kstop_thread(i);
+		if (err)
+			goto destroy;
+	}
+	return 0;
+destroy:
+	__stop_machine_destroy();
+	return err;
+}
 
-		if (!cpus) {
-			if (i == first_cpu(cpu_online_map))
-				smdata = &active;
-		} else {
-			if (cpu_isset(i, *cpus))
-				smdata = &active;
-		}
+int stop_machine_prepare(void)
+{
+	int err;
 
-		threads[i] = kthread_create((void *)stop_cpu, smdata, "kstop%u",
-					    i);
-		if (IS_ERR(threads[i])) {
-			err = PTR_ERR(threads[i]);
-			threads[i] = NULL;
-			goto kill_threads;
-		}
+	mutex_lock(&lock);
+	err = __stop_machine_prepare();
+	mutex_unlock(&lock);
+	return err;
+}
+
+int __stop_machine(int (*fn)(void *), void *data, const cpumask_t *cpus)
+{
+	int i, err;
 
-		/* Place it onto correct cpu. */
-		kthread_bind(threads[i], i);
+	/* Set up initial state. */
+	mutex_lock(&lock);
+	init_completion(&finished);
+	num_threads = num_online_cpus();
 
-		/* Make it highest prio. */
-		if (sched_setscheduler_nocheck(threads[i], SCHED_FIFO, &param))
-			BUG();
+	set_state(STOPMACHINE_NONE);
+	err = __stop_machine_prepare();
+	if (err) {
+		mutex_unlock(&lock);
+		return err;
 	}
 
+	if (cpus)
+		smh->active_cpus = *cpus;
+	else
+		smh->active_cpus = cpumask_of_cpu(first_cpu(cpu_online_map));
+	smh->active.fn = fn;
+	smh->active.data = data;
+	smh->active.fnret = 0;
+	smh->idle.fn = chill;
+	smh->idle.data = NULL;
+
 	/* We've created all the threads.  Wake them all: hold this CPU so one
 	 * doesn't hit this CPU until we're ready. */
 	get_cpu();
+	set_state(STOPMACHINE_PREPARE);
 	for_each_online_cpu(i)
-		wake_up_process(threads[i]);
+		wake_up_process(smh->threads[i]);
 
 	/* This will release the thread on our CPU. */
 	put_cpu();
 	wait_for_completion(&finished);
+	err = smh->active.fnret;
+	__stop_machine_destroy();
 	mutex_unlock(&lock);
-
-	kfree(threads);
-
-	return active.fnret;
-
-kill_threads:
-	for_each_online_cpu(i)
-		if (threads[i])
-			kthread_stop(threads[i]);
-	mutex_unlock(&lock);
-
-	kfree(threads);
 	return err;
 }
 
@@ -187,3 +290,42 @@ int stop_machine(int (*fn)(void *), void
 	return ret;
 }
 EXPORT_SYMBOL_GPL(stop_machine);
+
+static int __cpuinit stop_machine_notify(struct notifier_block *self,
+					 unsigned long action, void *hcpu)
+{
+	int rc = 0;
+
+	switch (action) {
+	case CPU_UP_PREPARE:
+	case CPU_UP_PREPARE_FROZEN:
+		mutex_lock(&lock);
+		rc = create_kstop_thread((long)hcpu);
+		mutex_unlock(&lock);
+		break;
+	case CPU_UP_CANCELED:
+	case CPU_UP_CANCELED_FROZEN:
+		mutex_lock(&lock);
+		kill_kstop_thread((long)hcpu);
+		mutex_unlock(&lock);
+		break;
+	case CPU_DEAD:
+	case CPU_DEAD_FROZEN:
+		mutex_lock(&lock);
+		kill_kstop_thread((long)hcpu);
+		mutex_unlock(&lock);
+		break;
+	}
+	return rc ? NOTIFY_BAD : NOTIFY_OK;
+}
+
+static struct notifier_block __cpuinitdata stop_machine_nb = {
+	.notifier_call = stop_machine_notify,
+};
+
+static int __init stop_machine_init(void)
+{
+	register_hotcpu_notifier(&stop_machine_nb);
+	return 0;
+}
+early_initcall(stop_machine_init);
Index: linux-2.6/include/linux/stop_machine.h
===================================================================
--- linux-2.6.orig/include/linux/stop_machine.h
+++ linux-2.6/include/linux/stop_machine.h
@@ -35,6 +35,10 @@ int stop_machine(int (*fn)(void *), void
  * won't come or go while it's being called.  Used by hotplug cpu.
  */
 int __stop_machine(int (*fn)(void *), void *data, const cpumask_t *cpus);
+
+int stop_machine_prepare(void);
+void stop_machine_destroy(void);
+
 #else
 
 static inline int stop_machine(int (*fn)(void *), void *data,
@@ -46,5 +50,9 @@ static inline int stop_machine(int (*fn)
 	local_irq_enable();
 	return ret;
 }
+
+static inline int stop_machine_prepare(void) { return 0; }
+static inline void stop_machine_destroy(void) { }
+
 #endif /* CONFIG_SMP */
 #endif /* _LINUX_STOP_MACHINE */


* Re: [PATCH/RFC 0/4] Add stop_machine_get/put_threads to stop_machine infrastructure.
  2008-10-07 15:38       ` Heiko Carstens
@ 2008-10-08  0:27         ` Rusty Russell
  2008-10-08 10:14           ` Heiko Carstens
  0 siblings, 1 reply; 13+ messages in thread
From: Rusty Russell @ 2008-10-08  0:27 UTC (permalink / raw)
  To: Heiko Carstens; +Cc: jens.axboe, schwidefsky, linux-kernel, Ingo Molnar

On Wednesday 08 October 2008 02:38:54 Heiko Carstens wrote:
> On Tue, Oct 07, 2008 at 11:39:58AM +1000, Rusty Russell wrote:
> > That's exactly my idea.  We kmalloc already because NR_CPUS might be too
> > big for the stack.  This version would just kmalloc a struct containing
> > everything we need.
>
> Ok, I did that but the resulting code is astonishingly ugly, so I thought I
> should share it :)

Yeah, the diffstat tells the story.

> Another thing that comes to mind is cpu hotplug: if somebody issued
> stop_machine_prepare() and then a cpu hotplug operation gets started we
> need to create or kill a kstop thread. For that we need the "sm" so we can
> save/find the task_struct pointer of the thread.

Erk, good point.  Suckage.

OK, idea #2.  Let's just always have a kstopmachine thread running on every 
online cpu.  Is there a sane way to reuse the workqueue threads for this?

Thanks,
Rusty.



* Re: [PATCH/RFC 0/4] Add stop_machine_get/put_threads to stop_machine infrastructure.
  2008-10-08  0:27         ` Rusty Russell
@ 2008-10-08 10:14           ` Heiko Carstens
  2008-10-09  0:18             ` Rusty Russell
  0 siblings, 1 reply; 13+ messages in thread
From: Heiko Carstens @ 2008-10-08 10:14 UTC (permalink / raw)
  To: Rusty Russell
  Cc: jens.axboe, schwidefsky, linux-kernel, Ingo Molnar, Andrew Morton

On Wed, Oct 08, 2008 at 10:27:04AM +1000, Rusty Russell wrote:
> 
> > Another thing that comes to mind is cpu hotplug: if somebody issued
> > stop_machine_prepare() and then a cpu hotplug operation gets started we
> > need to create or kill a kstop thread. For that we need the "sm" so we can
> > save/find the task_struct pointer of the thread.
> 
> Erk, good point.  Suckage.
> 
> OK, idea #2.  Let's just always have a kstopmachine thread running on every 
> online cpu.  Is there a sane way to reuse the workqueue threads for this?

That's a very good idea and what the patch below does. It even simplifies
the stop_machine code and it does work on an otherwise idle system.
The only thing that needs to be addressed is that workqueue threads aka
stop_machine threads are not real-time threads now.
We would need something like create_workqueue_prio() or create_workqueue_rt().
Would that be acceptable?
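
Alternatively, instead of new workqueue API each worker could raise its
own priority from a one-off work item queued on every cpu at init time,
something like this (just a sketch of the idea, stop_machine_set_rt is
made up):

	static void stop_machine_set_rt(struct work_struct *unused)
	{
		struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };

		/* Runs in the kstop worker, so current is that thread. */
		sched_setscheduler_nocheck(current, SCHED_FIFO, &param);
	}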

---
 kernel/stop_machine.c |   97 ++++++++++++++++----------------------------------
 1 file changed, 32 insertions(+), 65 deletions(-)

Index: linux-2.6/kernel/stop_machine.c
===================================================================
--- linux-2.6.orig/kernel/stop_machine.c
+++ linux-2.6/kernel/stop_machine.c
@@ -37,9 +37,13 @@ struct stop_machine_data {
 /* Like num_online_cpus(), but hotplug cpu uses us, so we need this. */
 static unsigned int num_threads;
 static atomic_t thread_ack;
-static struct completion finished;
 static DEFINE_MUTEX(lock);
 
+static struct workqueue_struct *stop_machine_wq;
+static struct work_struct *stop_machine_work;
+static struct stop_machine_data active, idle;
+static cpumask_t active_cpus;
+
 static void set_state(enum stopmachine_state newstate)
 {
 	/* Reset ack counter. */
@@ -51,21 +55,21 @@ static void set_state(enum stopmachine_s
 /* Last one to ack a state moves to the next state. */
 static void ack_state(void)
 {
-	if (atomic_dec_and_test(&thread_ack)) {
-		/* If we're the last one to ack the EXIT, we're finished. */
-		if (state == STOPMACHINE_EXIT)
-			complete(&finished);
-		else
-			set_state(state + 1);
-	}
+	if (atomic_dec_and_test(&thread_ack))
+		set_state(state + 1);
 }
 
 /* This is the actual thread which stops the CPU.  It exits by itself rather
  * than waiting for kthread_stop(), because it's easier for hotplug CPU. */
-static int stop_cpu(struct stop_machine_data *smdata)
+static void stop_cpu(struct work_struct *unused)
 {
 	enum stopmachine_state curstate = STOPMACHINE_NONE;
+	struct stop_machine_data *smdata;
 
+	if (cpu_isset(smp_processor_id(), active_cpus))
+		smdata = &active;
+	else
+		smdata = &idle;
 	/* Simple state machine */
 	do {
 		/* Chill out and ensure we re-read stopmachine_state. */
@@ -90,7 +94,6 @@ static int stop_cpu(struct stop_machine_
 	} while (curstate != STOPMACHINE_EXIT);
 
 	local_irq_enable();
-	do_exit(0);
 }
 
 /* Callback for CPUs which aren't supposed to do anything. */
@@ -101,78 +104,33 @@ static int chill(void *unused)
 
 int __stop_machine(int (*fn)(void *), void *data, const cpumask_t *cpus)
 {
-	int i, err;
-	struct stop_machine_data active, idle;
-	struct task_struct **threads;
+	int i;
 
+	/* Set up initial state. */
+	mutex_lock(&lock);
+	num_threads = num_online_cpus();
+	active_cpus = cpus ? *cpus : cpumask_of_cpu(first_cpu(cpu_online_map));
 	active.fn = fn;
 	active.data = data;
 	active.fnret = 0;
 	idle.fn = chill;
 	idle.data = NULL;
 
-	/* This could be too big for stack on large machines. */
-	threads = kcalloc(NR_CPUS, sizeof(threads[0]), GFP_KERNEL);
-	if (!threads)
-		return -ENOMEM;
-
-	/* Set up initial state. */
-	mutex_lock(&lock);
-	init_completion(&finished);
-	num_threads = num_online_cpus();
 	set_state(STOPMACHINE_PREPARE);
 
-	for_each_online_cpu(i) {
-		struct stop_machine_data *smdata = &idle;
-		struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
-
-		if (!cpus) {
-			if (i == first_cpu(cpu_online_map))
-				smdata = &active;
-		} else {
-			if (cpu_isset(i, *cpus))
-				smdata = &active;
-		}
-
-		threads[i] = kthread_create((void *)stop_cpu, smdata, "kstop%u",
-					    i);
-		if (IS_ERR(threads[i])) {
-			err = PTR_ERR(threads[i]);
-			threads[i] = NULL;
-			goto kill_threads;
-		}
-
-		/* Place it onto correct cpu. */
-		kthread_bind(threads[i], i);
-
-		/* Make it highest prio. */
-		if (sched_setscheduler_nocheck(threads[i], SCHED_FIFO, &param))
-			BUG();
-	}
-
 	/* We've created all the threads.  Wake them all: hold this CPU so one
 	 * doesn't hit this CPU until we're ready. */
 	get_cpu();
-	for_each_online_cpu(i)
-		wake_up_process(threads[i]);
-
+	for_each_online_cpu(i) {
+		INIT_WORK(&stop_machine_work[i], stop_cpu);
+		queue_work_on(i, stop_machine_wq, &stop_machine_work[i]);
+	}
 	/* This will release the thread on our CPU. */
 	put_cpu();
-	wait_for_completion(&finished);
+	flush_workqueue(stop_machine_wq);
 	mutex_unlock(&lock);
 
-	kfree(threads);
-
 	return active.fnret;
-
-kill_threads:
-	for_each_online_cpu(i)
-		if (threads[i])
-			kthread_stop(threads[i]);
-	mutex_unlock(&lock);
-
-	kfree(threads);
-	return err;
 }
 
 int stop_machine(int (*fn)(void *), void *data, const cpumask_t *cpus)
@@ -187,3 +145,12 @@ int stop_machine(int (*fn)(void *), void
 	return ret;
 }
 EXPORT_SYMBOL_GPL(stop_machine);
+
+static int __init stop_machine_init(void)
+{
+	stop_machine_wq = create_workqueue("kstop");
+	stop_machine_work = kcalloc(NR_CPUS, sizeof(struct work_struct),
+				    GFP_KERNEL);
+	return 0;
+}
+device_initcall(stop_machine_init);


* Re: [PATCH/RFC 0/4] Add stop_machine_get/put_threads to stop_machine infrastructure.
  2008-10-08 10:14           ` Heiko Carstens
@ 2008-10-09  0:18             ` Rusty Russell
  2008-10-09 16:25               ` Heiko Carstens
  0 siblings, 1 reply; 13+ messages in thread
From: Rusty Russell @ 2008-10-09  0:18 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: jens.axboe, schwidefsky, linux-kernel, Ingo Molnar, Andrew Morton

On Wednesday 08 October 2008 21:14:50 Heiko Carstens wrote:
> On Wed, Oct 08, 2008 at 10:27:04AM +1000, Rusty Russell wrote:
> > OK, idea #2.  Let's just always have a kstopmachine thread running on
> > every online cpu.  Is there a sane way to reuse the workqueue threads for
> > this?
>
> That's a very good idea and what the patch below does. It even simplifies
> the stop_machine code and it does work on an otherwise idle system.
> The only thing that needs to be addressed is that workqueue threads, aka
> stop_machine threads, are not real-time threads now.
> We would need something like create_workqueue_prio() or
> create_workqueue_rt(). Would that be acceptable?

Hmm, I was hoping to reuse the kevent threads rather than create YA set
of threads.  But hey, everyone else is doing it.

> +static struct workqueue_struct *stop_machine_wq;
> +static struct work_struct *stop_machine_work;
> +static struct stop_machine_data active, idle;
> +static cpumask_t active_cpus;

Hmm, please make active_cpus a const cpumask_t pointer.  I'm trying to
get rid of these kinds of decls in another patch series :)
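
I.e. something like this (untested sketch; cpu_is_active() is just an
illustrative name, nothing that exists in the tree):

static const cpumask_t *active_cpus;	/* NULL: first online cpu only */

static int cpu_is_active(int cpu)
{
	if (!active_cpus)
		return cpu == first_cpu(cpu_online_map);
	return cpu_isset(cpu, *active_cpus);
}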

>  /* This is the actual thread which stops the CPU.  It exits by itself rather
>   * than waiting for kthread_stop(), because it's easier for hotplug CPU. */

This comment is no longer valid...

> +static int __init stop_machine_init(void)
> +{
> +	stop_machine_wq = create_workqueue("kstop");
> +	stop_machine_work = kcalloc(NR_CPUS, sizeof(struct work_struct),
> +				    GFP_KERNEL);

Perhaps make stop_machine_work a per-cpu array of struct work_struct
instead of initializing it here.  Or at least make it a percpu pointer and
only alloc possible cpus.
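
E.g. an untested sketch of the static per-cpu variant; the loop is what
__stop_machine() would then do instead of indexing a kcalloc'ed array:

static DEFINE_PER_CPU(struct work_struct, stop_machine_work);

	for_each_online_cpu(i) {
		struct work_struct *sm_work = &per_cpu(stop_machine_work, i);

		INIT_WORK(sm_work, stop_cpu);
		queue_work_on(i, stop_machine_wq, sm_work);
	}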

Does it break cpu hotplug BTW?  That's usually the problem.

But it looks nice!
Rusty.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH/RFC 0/4] Add stop_machine_get/put_threads to stop_machine infrastructure.
  2008-10-09  0:18             ` Rusty Russell
@ 2008-10-09 16:25               ` Heiko Carstens
  0 siblings, 0 replies; 13+ messages in thread
From: Heiko Carstens @ 2008-10-09 16:25 UTC (permalink / raw)
  To: Rusty Russell
  Cc: jens.axboe, schwidefsky, linux-kernel, Ingo Molnar, Andrew Morton

On Thu, Oct 09, 2008 at 11:18:27AM +1100, Rusty Russell wrote:
> On Wednesday 08 October 2008 21:14:50 Heiko Carstens wrote:
> > The only thing that needs to be addressed is that workqueue threads, aka
> > stop_machine threads, are not real-time threads now.
> > We would need something like create_workqueue_prio() or
> > create_workqueue_rt(). Would that be acceptable?
> 
> Hmm, I was hoping to reuse the kevent threads rather than create YA set
> of threads.  But hey, everyone else is doing it.

Reusing the kevent threads wouldn't work: the stop_machine work might get
queued behind different work that could block on I/O, which is exactly what
I'm trying to avoid.
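
A contrived example of the hazard (hypothetical code, not from any patch;
stop_work stands for the stop_cpu work item):

#include <linux/delay.h>
#include <linux/workqueue.h>

static void slow_work_fn(struct work_struct *work)
{
	msleep(10000);	/* blocks this cpu's keventd thread for 10s */
}
static DECLARE_WORK(slow_work, slow_work_fn);

	/* somewhere on the same cpu: */
	schedule_work(&slow_work);	/* queued on keventd first... */
	schedule_work(&stop_work);	/* ...stop_cpu would wait behind it */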

> > +static struct workqueue_struct *stop_machine_wq;
> > +static struct work_struct *stop_machine_work;
> > +static struct stop_machine_data active, idle;
> > +static cpumask_t active_cpus;
> Hmm, please make active cpus a const cpumask_t pointer.  I'm trying to
> get rid of these kind of decls in another patch series :)

done.

> >  /* This is the actual thread which stops the CPU.  It exits by itself rather
> >   * than waiting for kthread_stop(), because it's easier for hotplug CPU. */
> This comment is no longer valid...

fixed.
 
> > +static int __init stop_machine_init(void)
> > +{
> > +	stop_machine_wq = create_workqueue("kstop");
> > +	stop_machine_work = kcalloc(NR_CPUS, sizeof(struct work_struct),
> > +				    GFP_KERNEL);
> Perhaps make stop_machine_work a per-cpu array of struct work_struct
> instead of initializing it here.  Or at least make it a percpu pointer and
> only alloc possible cpus.

For the time being I converted it to alloc_percpu. It might be worth using
percpu_alloc instead and registering a cpu hotplug notifier...
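
A sketch of what such a notifier might look like (untested, allocation and
error handling omitted; it would be registered via register_cpu_notifier()
from stop_machine_init()):

static int __cpuinit stop_machine_cpu_callback(struct notifier_block *nfb,
					       unsigned long action, void *hcpu)
{
	switch (action) {
	case CPU_UP_PREPARE:
		/* allocate the work_struct for cpu (unsigned long)hcpu */
		break;
	case CPU_UP_CANCELED:
	case CPU_DEAD:
		/* and free it again */
		break;
	}
	return NOTIFY_OK;
}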
 
> Does it break cpu hotplug BTW?  That's usually the problem.

It works. CPU hotplug is my primary test case, because it usually breaks ;)

So, how about the patch below then? I converted the device_initcall() to an
early_initcall(), so stop_machine works before SMP is up. For that I had to
move the init_workqueues() call a bit earlier.

There are two more things that probably need to be addressed: the priority
of the workqueue threads should be changed to MAX_RT_PRIO-1, as it was for
the kstop threads.
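
Something like a create_workqueue_rt() would then have to do for each
worker thread what __stop_machine() used to do for the kstop threads
(sketch; such a helper does not exist yet):

	struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };

	/* after kthread_create() of the workqueue thread p: */
	if (sched_setscheduler_nocheck(p, SCHED_FIFO, &param))
		BUG();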

And for num_online_cpus() == 1 we might as well call the simple stop_machine()
version which is defined in include/linux/stop_machine.h for !CONFIG_SMP.
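
For reference, that !CONFIG_SMP version is essentially just the following
(quoted from memory, so check include/linux/stop_machine.h for the real
thing):

static inline int stop_machine(int (*fn)(void *), void *data,
			       const cpumask_t *cpus)
{
	int ret;

	local_irq_disable();
	ret = fn(data);
	local_irq_enable();
	return ret;
}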

---
 init/main.c           |    4 -
 kernel/stop_machine.c |  109 ++++++++++++++++++--------------------------------
 2 files changed, 42 insertions(+), 71 deletions(-)

Index: linux-2.6/init/main.c
===================================================================
--- linux-2.6.orig/init/main.c
+++ linux-2.6/init/main.c
@@ -767,8 +767,6 @@ static void __init do_initcalls(void)
 static void __init do_basic_setup(void)
 {
 	rcu_init_sched(); /* needed by module_init stage. */
-	/* drivers will send hotplug events */
-	init_workqueues();
 	usermodehelper_init();
 	driver_init();
 	init_irq_proc();
@@ -852,6 +850,8 @@ static int __init kernel_init(void * unu
 
 	cad_pid = task_pid(current);
 
+	init_workqueues();
+
 	smp_prepare_cpus(setup_max_cpus);
 
 	do_pre_smp_initcalls();
Index: linux-2.6/kernel/stop_machine.c
===================================================================
--- linux-2.6.orig/kernel/stop_machine.c
+++ linux-2.6/kernel/stop_machine.c
@@ -37,9 +37,13 @@ struct stop_machine_data {
 /* Like num_online_cpus(), but hotplug cpu uses us, so we need this. */
 static unsigned int num_threads;
 static atomic_t thread_ack;
-static struct completion finished;
 static DEFINE_MUTEX(lock);
 
+static struct workqueue_struct *stop_machine_wq;
+static struct stop_machine_data active, idle;
+static const cpumask_t *active_cpus;
+static void *stop_machine_work;
+
 static void set_state(enum stopmachine_state newstate)
 {
 	/* Reset ack counter. */
@@ -51,21 +55,25 @@ static void set_state(enum stopmachine_s
 /* Last one to ack a state moves to the next state. */
 static void ack_state(void)
 {
-	if (atomic_dec_and_test(&thread_ack)) {
-		/* If we're the last one to ack the EXIT, we're finished. */
-		if (state == STOPMACHINE_EXIT)
-			complete(&finished);
-		else
-			set_state(state + 1);
-	}
+	if (atomic_dec_and_test(&thread_ack))
+		set_state(state + 1);
 }
 
-/* This is the actual thread which stops the CPU.  It exits by itself rather
- * than waiting for kthread_stop(), because it's easier for hotplug CPU. */
-static int stop_cpu(struct stop_machine_data *smdata)
+/* This is the actual function which stops the CPU. It runs
+ * in the context of a dedicated stopmachine workqueue. */
+static void stop_cpu(struct work_struct *unused)
 {
 	enum stopmachine_state curstate = STOPMACHINE_NONE;
+	struct stop_machine_data *smdata = &idle;
+	int cpu = smp_processor_id();
 
+	if (!active_cpus) {
+		if (cpu == first_cpu(cpu_online_map))
+			smdata = &active;
+	} else {
+		if (cpu_isset(cpu, *active_cpus))
+			smdata = &active;
+	}
 	/* Simple state machine */
 	do {
 		/* Chill out and ensure we re-read stopmachine_state. */
@@ -90,7 +98,6 @@ static int stop_cpu(struct stop_machine_
 	} while (curstate != STOPMACHINE_EXIT);
 
 	local_irq_enable();
-	do_exit(0);
 }
 
 /* Callback for CPUs which aren't supposed to do anything. */
@@ -101,78 +108,34 @@ static int chill(void *unused)
 
 int __stop_machine(int (*fn)(void *), void *data, const cpumask_t *cpus)
 {
-	int i, err;
-	struct stop_machine_data active, idle;
-	struct task_struct **threads;
+	struct work_struct *sm_work;
+	int i;
 
+	/* Set up initial state. */
+	mutex_lock(&lock);
+	num_threads = num_online_cpus();
+	active_cpus = cpus;
 	active.fn = fn;
 	active.data = data;
 	active.fnret = 0;
 	idle.fn = chill;
 	idle.data = NULL;
 
-	/* This could be too big for stack on large machines. */
-	threads = kcalloc(NR_CPUS, sizeof(threads[0]), GFP_KERNEL);
-	if (!threads)
-		return -ENOMEM;
-
-	/* Set up initial state. */
-	mutex_lock(&lock);
-	init_completion(&finished);
-	num_threads = num_online_cpus();
 	set_state(STOPMACHINE_PREPARE);
 
-	for_each_online_cpu(i) {
-		struct stop_machine_data *smdata = &idle;
-		struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
-
-		if (!cpus) {
-			if (i == first_cpu(cpu_online_map))
-				smdata = &active;
-		} else {
-			if (cpu_isset(i, *cpus))
-				smdata = &active;
-		}
-
-		threads[i] = kthread_create((void *)stop_cpu, smdata, "kstop%u",
-					    i);
-		if (IS_ERR(threads[i])) {
-			err = PTR_ERR(threads[i]);
-			threads[i] = NULL;
-			goto kill_threads;
-		}
-
-		/* Place it onto correct cpu. */
-		kthread_bind(threads[i], i);
-
-		/* Make it highest prio. */
-		if (sched_setscheduler_nocheck(threads[i], SCHED_FIFO, &param))
-			BUG();
-	}
-
-	/* We've created all the threads.  Wake them all: hold this CPU so one
+	/* Schedule the stop_cpu work on all cpus: hold this CPU so one
 	 * doesn't hit this CPU until we're ready. */
 	get_cpu();
-	for_each_online_cpu(i)
-		wake_up_process(threads[i]);
-
+	for_each_online_cpu(i) {
+		sm_work = percpu_ptr(stop_machine_work, i);
+		INIT_WORK(sm_work, stop_cpu);
+		queue_work_on(i, stop_machine_wq, sm_work);
+	}
 	/* This will release the thread on our CPU. */
 	put_cpu();
-	wait_for_completion(&finished);
+	flush_workqueue(stop_machine_wq);
 	mutex_unlock(&lock);
-
-	kfree(threads);
-
 	return active.fnret;
-
-kill_threads:
-	for_each_online_cpu(i)
-		if (threads[i])
-			kthread_stop(threads[i]);
-	mutex_unlock(&lock);
-
-	kfree(threads);
-	return err;
 }
 
 int stop_machine(int (*fn)(void *), void *data, const cpumask_t *cpus)
@@ -187,3 +150,11 @@ int stop_machine(int (*fn)(void *), void
 	return ret;
 }
 EXPORT_SYMBOL_GPL(stop_machine);
+
+static int __init stop_machine_init(void)
+{
+	stop_machine_wq = create_workqueue("kstop");
+	stop_machine_work = alloc_percpu(struct work_struct);
+	return 0;
+}
+early_initcall(stop_machine_init);

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2008-10-09 16:25 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-10-03 10:56 [PATCH/RFC 0/4] Add stop_machine_get/put_threads to stop_machine infrastructure Heiko Carstens
2008-10-03 10:56 ` [PATCH/RFC 1/4] stop_machine: atomic update for combined return value Heiko Carstens
2008-10-03 10:56 ` [PATCH/RFC 2/4] stop_machine: add stop_machine_get/put_threads interface Heiko Carstens
2008-10-03 10:56 ` [PATCH/RFC 3/4] s390: convert etr/stp to stop_machine interface Heiko Carstens
2008-10-03 10:56 ` [PATCH/RFC 4/4] s390: convert to generic IPI infrastructure Heiko Carstens
2008-10-06  4:42 ` [PATCH/RFC 0/4] Add stop_machine_get/put_threads to stop_machine infrastructure Rusty Russell
2008-10-06 20:16   ` Heiko Carstens
2008-10-07  1:39     ` Rusty Russell
2008-10-07 15:38       ` Heiko Carstens
2008-10-08  0:27         ` Rusty Russell
2008-10-08 10:14           ` Heiko Carstens
2008-10-09  0:18             ` Rusty Russell
2008-10-09 16:25               ` Heiko Carstens
