* [PATCHSET sched/core] cpu_stop: implement and use cpu_stop
@ 2010-04-22 16:09 Tejun Heo
2010-04-22 16:09 ` [PATCH 1/4] cpu_stop: implement stop_cpu[s]() Tejun Heo
` (3 more replies)
0 siblings, 4 replies; 18+ messages in thread
From: Tejun Heo @ 2010-04-22 16:09 UTC (permalink / raw)
To: mingo, linux-kernel, x86, oleg, peterz, rusty, sivanich,
heiko.carstens, dipankar, josh, paulmck, akpm, arjan, torvalds
Hello, all.
cpu_hog has been renamed to cpu_stop and moved into
kernel/stop_machine.c per Peter Zijlstra's suggestion. This patchset
is feature-wise identical to the second take of cpuhog[L]. The only
changes are the rename, relocation and refresh against the current
sched/core.
The following API renames took place.
- hog_one_cpu() -> stop_one_cpu()
- hog_one_cpu_nowait() -> stop_one_cpu_nowait()
- hog_cpus() -> stop_cpus()
- try_hog_cpus() -> try_stop_cpus()
- *_hog() callbacks -> *_cpu_stop()
Internal names have been renamed accordingly. e.g. cpuhog thread
became cpu_stopper thread and so on.
This patchset contains the following four patches.
0001-cpu_stop-implement-stop_cpu-s.patch
0002-stop_machine-reimplement-using-cpu_stop.patch
0003-scheduler-replace-migration_thread-with-cpu_stop.patch
0004-scheduler-kill-paranoia-check-in-synchronize_sched_e.patch
The patches are against the current linux-2.6-tip/sched/core
(09a40af5240de02d848247ab82440ad75b31ab11) and are available in the
following git tree.
git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git cpu_stop
I retained the original acked/reviewed-by's as the changes are mostly
cosmetic. If you disagree, please let me know. I'll try to push this
through sched/core again once Peter acks.
diffstat follows.
Documentation/RCU/torture.txt | 10
arch/s390/kernel/time.c | 1
drivers/xen/manage.c | 14 -
include/linux/rcutiny.h | 2
include/linux/rcutree.h | 1
include/linux/stop_machine.h | 59 ++--
kernel/cpu.c | 8
kernel/module.c | 14 -
kernel/rcutorture.c | 2
kernel/sched.c | 271 +++------------------
kernel/sched_fair.c | 42 ++-
kernel/stop_machine.c | 525 ++++++++++++++++++++++++++++++++----------
12 files changed, 514 insertions(+), 435 deletions(-)
Thanks.
--
tejun
[L] http://thread.gmane.org/gmane.linux.kernel/962635
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 1/4] cpu_stop: implement stop_cpu[s]()
2010-04-22 16:09 [PATCHSET sched/core] cpu_stop: implement and use cpu_stop Tejun Heo
@ 2010-04-22 16:09 ` Tejun Heo
2010-05-03 13:26 ` Peter Zijlstra
` (2 more replies)
2010-04-22 16:09 ` [PATCH 2/4] stop_machine: reimplement using cpu_stop Tejun Heo
` (2 subsequent siblings)
3 siblings, 3 replies; 18+ messages in thread
From: Tejun Heo @ 2010-04-22 16:09 UTC (permalink / raw)
To: mingo, linux-kernel, x86, oleg, peterz, rusty, sivanich,
heiko.carstens, dipankar, josh, paulmck, akpm, arjan, torvalds
Cc: Tejun Heo
Implement a simplistic per-cpu maximum priority cpu monopolization
mechanism. A non-sleeping callback can be scheduled to run on one or
multiple cpus with maximum priority monopolozing those cpus. This is
primarily to replace and unify RT workqueue usage in stop_machine and
scheduler migration_thread which currently is serving multiple
purposes.
Four functions are provided - stop_one_cpu(), stop_one_cpu_nowait(),
stop_cpus() and try_stop_cpus().
This is to allow clean sharing of resources among stop_cpu and all the
migration thread users. One stopper thread per cpu is created which
is currently named "stopper/CPU". This will eventually replace the
migration thread and take on its name.
* This facility was originally named cpuhog and lived in separate
files but Peter Zijlstra nacked the name and thus got renamed to
cpu_stop and moved into stop_machine.c.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
include/linux/stop_machine.h | 39 ++++-
kernel/stop_machine.c | 367 +++++++++++++++++++++++++++++++++++++++++-
2 files changed, 397 insertions(+), 9 deletions(-)
diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
index baba3a2..efcbd6c 100644
--- a/include/linux/stop_machine.h
+++ b/include/linux/stop_machine.h
@@ -1,15 +1,46 @@
#ifndef _LINUX_STOP_MACHINE
#define _LINUX_STOP_MACHINE
-/* "Bogolock": stop the entire machine, disable interrupts. This is a
- very heavy lock, which is equivalent to grabbing every spinlock
- (and more). So the "read" side to such a lock is anything which
- disables preeempt. */
+
#include <linux/cpu.h>
#include <linux/cpumask.h>
+#include <linux/list.h>
#include <asm/system.h>
#if defined(CONFIG_STOP_MACHINE) && defined(CONFIG_SMP)
+/*
+ * stop_cpu[s]() is simplistic per-cpu maximum priority cpu
+ * monopolization mechanism. The caller can specify a non-sleeping
+ * function to be executed on a single or multiple cpus preempting all
+ * other processes and monopolizing those cpus until it finishes.
+ *
+ * Resources for this mechanism are preallocated when a cpu is brought
+ * up and requests are guaranteed to be served as long as the target
+ * cpus are online.
+ */
+
+typedef int (*cpu_stop_fn_t)(void *arg);
+
+struct cpu_stop_work {
+ struct list_head list; /* cpu_stopper->works */
+ cpu_stop_fn_t fn;
+ void *arg;
+ struct cpu_stop_done *done;
+};
+
+int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg);
+void stop_one_cpu_nowait(unsigned int cpu, cpu_stop_fn_t fn, void *arg,
+ struct cpu_stop_work *work_buf);
+int stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg);
+int try_stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg);
+
+/*
+ * stop_machine "Bogolock": stop the entire machine, disable
+ * interrupts. This is a very heavy lock, which is equivalent to
+ * grabbing every spinlock (and more). So the "read" side to such a
+ * lock is anything which disables preeempt.
+ */
+
/**
* stop_machine: freeze the machine on all CPUs and run this function
* @fn: the function to run
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 9bb9fb1..3ae2485 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -1,17 +1,374 @@
-/* Copyright 2008, 2005 Rusty Russell rusty@rustcorp.com.au IBM Corporation.
- * GPL v2 and any later version.
+/*
+ * kernel/stop_machine.c
+ *
+ * Copyright (C) 2008, 2005 IBM Corporation.
+ * Copyright (C) 2008, 2005 Rusty Russell rusty@rustcorp.com.au
+ * Copyright (C) 2010 SUSE Linux Products GmbH
+ * Copyright (C) 2010 Tejun Heo <tj@kernel.org>
+ *
+ * This file is released under the GPLv2 and any later version.
*/
+#include <linux/completion.h>
#include <linux/cpu.h>
-#include <linux/err.h>
+#include <linux/init.h>
#include <linux/kthread.h>
#include <linux/module.h>
+#include <linux/percpu.h>
#include <linux/sched.h>
#include <linux/stop_machine.h>
-#include <linux/syscalls.h>
#include <linux/interrupt.h>
#include <asm/atomic.h>
-#include <asm/uaccess.h>
+
+/*
+ * Structure to determine completion condition and record errors. May
+ * be shared by works on different cpus.
+ */
+struct cpu_stop_done {
+ atomic_t nr_todo; /* nr left to execute */
+ bool executed; /* actually executed? */
+ int ret; /* collected return value */
+ struct completion completion; /* fired if nr_todo reaches 0 */
+};
+
+/* the actual stopper, one per every possible cpu, enabled on online cpus */
+struct cpu_stopper {
+ spinlock_t lock;
+ struct list_head works; /* list of pending works */
+ struct task_struct *thread; /* stopper thread */
+ bool enabled; /* is this stopper enabled? */
+};
+
+static DEFINE_PER_CPU(struct cpu_stopper, cpu_stopper);
+
+static void cpu_stop_init_done(struct cpu_stop_done *done, unsigned int nr_todo)
+{
+ memset(done, 0, sizeof(*done));
+ atomic_set(&done->nr_todo, nr_todo);
+ init_completion(&done->completion);
+}
+
+/* signal completion unless @done is NULL */
+static void cpu_stop_signal_done(struct cpu_stop_done *done, bool executed)
+{
+ if (done) {
+ if (executed)
+ done->executed = true;
+ if (atomic_dec_and_test(&done->nr_todo))
+ complete(&done->completion);
+ }
+}
+
+/* queue @work to @stopper. if offline, @work is completed immediately */
+static void cpu_stop_queue_work(struct cpu_stopper *stopper,
+ struct cpu_stop_work *work)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&stopper->lock, flags);
+
+ if (stopper->enabled) {
+ list_add_tail(&work->list, &stopper->works);
+ wake_up_process(stopper->thread);
+ } else
+ cpu_stop_signal_done(work->done, false);
+
+ spin_unlock_irqrestore(&stopper->lock, flags);
+}
+
+/**
+ * stop_one_cpu - stop a cpu
+ * @cpu: cpu to stop
+ * @fn: function to execute
+ * @arg: argument to @fn
+ *
+ * Execute @fn(@arg) on @cpu. @fn is run in a process context with
+ * the highest priority preempting any task on the cpu and
+ * monopolizing it. This function returns after the execution is
+ * complete.
+ *
+ * This function doesn't guarantee @cpu stays online till @fn
+ * completes. If @cpu goes down in the middle, execution may happen
+ * partially or fully on different cpus. @fn should either be ready
+ * for that or the caller should ensure that @cpu stays online until
+ * this function completes.
+ *
+ * CONTEXT:
+ * Might sleep.
+ *
+ * RETURNS:
+ * -ENOENT if @fn(@arg) was not executed because @cpu was offline;
+ * otherwise, the return value of @fn.
+ */
+int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
+{
+ struct cpu_stop_done done;
+ struct cpu_stop_work work = { .fn = fn, .arg = arg, .done = &done };
+
+ cpu_stop_init_done(&done, 1);
+ cpu_stop_queue_work(&per_cpu(cpu_stopper, cpu), &work);
+ wait_for_completion(&done.completion);
+ return done.executed ? done.ret : -ENOENT;
+}
+
+/**
+ * stop_one_cpu_nowait - stop a cpu but don't wait for completion
+ * @cpu: cpu to stop
+ * @fn: function to execute
+ * @arg: argument to @fn
+ *
+ * Similar to stop_one_cpu() but doesn't wait for completion. The
+ * caller is responsible for ensuring @work_buf is currently unused
+ * and will remain untouched until stopper starts executing @fn.
+ *
+ * CONTEXT:
+ * Don't care.
+ */
+void stop_one_cpu_nowait(unsigned int cpu, cpu_stop_fn_t fn, void *arg,
+ struct cpu_stop_work *work_buf)
+{
+ memset(work_buf, 0, sizeof(*work_buf));
+ work_buf->fn = fn;
+ work_buf->arg = arg;
+ cpu_stop_queue_work(&per_cpu(cpu_stopper, cpu), work_buf);
+}
+
+/* static data for stop_cpus */
+static DEFINE_MUTEX(stop_cpus_mutex);
+static DEFINE_PER_CPU(struct cpu_stop_work, stop_cpus_work);
+
+int __stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg)
+{
+ struct cpu_stop_work *work;
+ struct cpu_stop_done done;
+ unsigned int cpu;
+
+ /* initialize works and done */
+ for_each_cpu(cpu, cpumask) {
+ work = &per_cpu(stop_cpus_work, cpu);
+ work->fn = fn;
+ work->arg = arg;
+ work->done = &done;
+ }
+ cpu_stop_init_done(&done, cpumask_weight(cpumask));
+
+ /*
+ * Disable preemption while queueing to avoid getting
+ * preempted by a stopper which might wait for other stoppers
+ * to enter @fn which can lead to deadlock.
+ */
+ preempt_disable();
+ for_each_cpu(cpu, cpumask)
+ cpu_stop_queue_work(&per_cpu(cpu_stopper, cpu),
+ &per_cpu(stop_cpus_work, cpu));
+ preempt_enable();
+
+ wait_for_completion(&done.completion);
+ return done.executed ? done.ret : -ENOENT;
+}
+
+/**
+ * stop_cpus - stop multiple cpus
+ * @cpumask: cpus to stop
+ * @fn: function to execute
+ * @arg: argument to @fn
+ *
+ * Execute @fn(@arg) on online cpus in @cpumask. On each target cpu,
+ * @fn is run in a process context with the highest priority
+ * preempting any task on the cpu and monopolizing it. This function
+ * returns after all executions are complete.
+ *
+ * This function doesn't guarantee the cpus in @cpumask stay online
+ * till @fn completes. If some cpus go down in the middle, execution
+ * on the cpu may happen partially or fully on different cpus. @fn
+ * should either be ready for that or the caller should ensure that
+ * the cpus stay online until this function completes.
+ *
+ * All stop_cpus() calls are serialized making it safe for @fn to wait
+ * for all cpus to start executing it.
+ *
+ * CONTEXT:
+ * Might sleep.
+ *
+ * RETURNS:
+ * -ENOENT if @fn(@arg) was not executed at all because all cpus in
+ * @cpumask were offline; otherwise, 0 if all executions of @fn
+ * returned 0, any non zero return value if any returned non zero.
+ */
+int stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg)
+{
+ int ret;
+
+ /* static works are used, process one request at a time */
+ mutex_lock(&stop_cpus_mutex);
+ ret = __stop_cpus(cpumask, fn, arg);
+ mutex_unlock(&stop_cpus_mutex);
+ return ret;
+}
+
+/**
+ * try_stop_cpus - try to stop multiple cpus
+ * @cpumask: cpus to stop
+ * @fn: function to execute
+ * @arg: argument to @fn
+ *
+ * Identical to stop_cpus() except that it fails with -EAGAIN if
+ * someone else is already using the facility.
+ *
+ * CONTEXT:
+ * Might sleep.
+ *
+ * RETURNS:
+ * -EAGAIN if someone else is already stopping cpus, -ENOENT if
+ * @fn(@arg) was not executed at all because all cpus in @cpumask were
+ * offline; otherwise, 0 if all executions of @fn returned 0, any non
+ * zero return value if any returned non zero.
+ */
+int try_stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg)
+{
+ int ret;
+
+ /* static works are used, process one request at a time */
+ if (!mutex_trylock(&stop_cpus_mutex))
+ return -EAGAIN;
+ ret = __stop_cpus(cpumask, fn, arg);
+ mutex_unlock(&stop_cpus_mutex);
+ return ret;
+}
+
+static int cpu_stopper_thread(void *data)
+{
+ struct cpu_stopper *stopper = data;
+ struct cpu_stop_work *work;
+ int ret;
+
+repeat:
+ set_current_state(TASK_INTERRUPTIBLE); /* mb paired w/ kthread_stop */
+
+ if (kthread_should_stop()) {
+ __set_current_state(TASK_RUNNING);
+ return 0;
+ }
+
+ work = NULL;
+ spin_lock_irq(&stopper->lock);
+ if (!list_empty(&stopper->works)) {
+ work = list_first_entry(&stopper->works,
+ struct cpu_stop_work, list);
+ list_del_init(&work->list);
+ }
+ spin_unlock_irq(&stopper->lock);
+
+ if (work) {
+ struct cpu_stop_done *done = work->done;
+
+ __set_current_state(TASK_RUNNING);
+
+ /* cpu stop callbacks are not allowed to sleep */
+ preempt_disable();
+
+ ret = work->fn(work->arg);
+ if (ret)
+ done->ret = ret;
+
+ /* restore preemption and check it's still balanced */
+ preempt_enable();
+ WARN_ON_ONCE(preempt_count());
+
+ cpu_stop_signal_done(done, true);
+ } else
+ schedule();
+
+ goto repeat;
+}
+
+/* manage stopper for a cpu, mostly lifted from sched migration thread mgmt */
+static int __cpuinit cpu_stop_cpu_callback(struct notifier_block *nfb,
+ unsigned long action, void *hcpu)
+{
+ struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };
+ unsigned int cpu = (unsigned long)hcpu;
+ struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu);
+ struct cpu_stop_work *work;
+ struct task_struct *p;
+
+ switch (action & ~CPU_TASKS_FROZEN) {
+ case CPU_UP_PREPARE:
+ BUG_ON(stopper->thread || stopper->enabled ||
+ !list_empty(&stopper->works));
+ p = kthread_create(cpu_stopper_thread, stopper, "stopper/%d",
+ cpu);
+ if (IS_ERR(p))
+ return NOTIFY_BAD;
+ sched_setscheduler_nocheck(p, SCHED_FIFO, ¶m);
+ get_task_struct(p);
+ stopper->thread = p;
+ break;
+
+ case CPU_ONLINE:
+ kthread_bind(stopper->thread, cpu);
+ /* strictly unnecessary, as first user will wake it */
+ wake_up_process(stopper->thread);
+ /* mark enabled */
+ spin_lock_irq(&stopper->lock);
+ stopper->enabled = true;
+ spin_unlock_irq(&stopper->lock);
+ break;
+
+#ifdef CONFIG_HOTPLUG_CPU
+ case CPU_UP_CANCELED:
+ case CPU_DEAD:
+ /* kill the stopper */
+ kthread_stop(stopper->thread);
+ /* drain remaining works */
+ spin_lock_irq(&stopper->lock);
+ list_for_each_entry(work, &stopper->works, list)
+ cpu_stop_signal_done(work->done, false);
+ stopper->enabled = false;
+ spin_unlock_irq(&stopper->lock);
+ /* release the stopper */
+ put_task_struct(stopper->thread);
+ stopper->thread = NULL;
+ break;
+#endif
+ }
+
+ return NOTIFY_OK;
+}
+
+/*
+ * Give it a higher priority so that cpu stopper is available to other
+ * cpu notifiers. It currently shares the same priority as sched
+ * migration_notifier.
+ */
+static struct notifier_block __cpuinitdata cpu_stop_cpu_notifier = {
+ .notifier_call = cpu_stop_cpu_callback,
+ .priority = 10,
+};
+
+static int __init cpu_stop_init(void)
+{
+ void *bcpu = (void *)(long)smp_processor_id();
+ unsigned int cpu;
+ int err;
+
+ for_each_possible_cpu(cpu) {
+ struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu);
+
+ spin_lock_init(&stopper->lock);
+ INIT_LIST_HEAD(&stopper->works);
+ }
+
+ /* start one for the boot cpu */
+ err = cpu_stop_cpu_callback(&cpu_stop_cpu_notifier, CPU_UP_PREPARE,
+ bcpu);
+ BUG_ON(err == NOTIFY_BAD);
+ cpu_stop_cpu_callback(&cpu_stop_cpu_notifier, CPU_ONLINE, bcpu);
+ register_cpu_notifier(&cpu_stop_cpu_notifier);
+
+ return 0;
+}
+early_initcall(cpu_stop_init);
/* This controls the threads on each CPU. */
enum stopmachine_state {
--
1.6.4.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 2/4] stop_machine: reimplement using cpu_stop
2010-04-22 16:09 [PATCHSET sched/core] cpu_stop: implement and use cpu_stop Tejun Heo
2010-04-22 16:09 ` [PATCH 1/4] cpu_stop: implement stop_cpu[s]() Tejun Heo
@ 2010-04-22 16:09 ` Tejun Heo
2010-04-22 16:09 ` [PATCH 3/4] scheduler: replace migration_thread with cpu_stop Tejun Heo
2010-04-22 16:09 ` [PATCH 4/4] scheduler: kill paranoia check in synchronize_sched_expedited() Tejun Heo
3 siblings, 0 replies; 18+ messages in thread
From: Tejun Heo @ 2010-04-22 16:09 UTC (permalink / raw)
To: mingo, linux-kernel, x86, oleg, peterz, rusty, sivanich,
heiko.carstens, dipankar, josh, paulmck, akpm, arjan, torvalds
Cc: Tejun Heo
Reimplement stop_machine using cpu_stop. As cpu stoppers are
guaranteed to be available for all online cpus,
stop_machine_create/destroy() are no longer necessary and removed.
With resource management and synchronization handled by cpu_stop, the
new implementation is much simpler. Asking the cpu_stop to execute
the stop_cpu() state machine on all online cpus with cpu hotplug
disabled is enough.
stop_machine itself doesn't need to manage any global resources
anymore, so all per-instance information is rolled into struct
stop_machine_data and the mutex and all static data variables are
removed.
The previous implementation created and destroyed RT workqueues as
necessary which made stop_machine() calls highly expensive on very
large machines. According to Dimitri Sivanich, preventing the dynamic
creation/destruction makes booting faster more than twice on very
large machines. cpu_stop resources are preallocated for all online
cpus and should have the same effect.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
---
arch/s390/kernel/time.c | 1 -
drivers/xen/manage.c | 14 +---
include/linux/stop_machine.h | 20 -----
kernel/cpu.c | 8 --
kernel/module.c | 14 +---
kernel/stop_machine.c | 158 ++++++++++--------------------------------
6 files changed, 42 insertions(+), 173 deletions(-)
diff --git a/arch/s390/kernel/time.c b/arch/s390/kernel/time.c
index fba6dec..03d9656 100644
--- a/arch/s390/kernel/time.c
+++ b/arch/s390/kernel/time.c
@@ -390,7 +390,6 @@ static void __init time_init_wq(void)
if (time_sync_wq)
return;
time_sync_wq = create_singlethread_workqueue("timesync");
- stop_machine_create();
}
/*
diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
index 2ac4440..8943b8c 100644
--- a/drivers/xen/manage.c
+++ b/drivers/xen/manage.c
@@ -80,12 +80,6 @@ static void do_suspend(void)
shutting_down = SHUTDOWN_SUSPEND;
- err = stop_machine_create();
- if (err) {
- printk(KERN_ERR "xen suspend: failed to setup stop_machine %d\n", err);
- goto out;
- }
-
#ifdef CONFIG_PREEMPT
/* If the kernel is preemptible, we need to freeze all the processes
to prevent them from being in the middle of a pagetable update
@@ -93,7 +87,7 @@ static void do_suspend(void)
err = freeze_processes();
if (err) {
printk(KERN_ERR "xen suspend: freeze failed %d\n", err);
- goto out_destroy_sm;
+ goto out;
}
#endif
@@ -136,12 +130,8 @@ out_resume:
out_thaw:
#ifdef CONFIG_PREEMPT
thaw_processes();
-
-out_destroy_sm:
-#endif
- stop_machine_destroy();
-
out:
+#endif
shutting_down = SHUTDOWN_INVALID;
}
#endif /* CONFIG_PM_SLEEP */
diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
index efcbd6c..0e552e7 100644
--- a/include/linux/stop_machine.h
+++ b/include/linux/stop_machine.h
@@ -67,23 +67,6 @@ int stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus);
*/
int __stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus);
-/**
- * stop_machine_create: create all stop_machine threads
- *
- * Description: This causes all stop_machine threads to be created before
- * stop_machine actually gets called. This can be used by subsystems that
- * need a non failing stop_machine infrastructure.
- */
-int stop_machine_create(void);
-
-/**
- * stop_machine_destroy: destroy all stop_machine threads
- *
- * Description: This causes all stop_machine threads which were created with
- * stop_machine_create to be destroyed again.
- */
-void stop_machine_destroy(void);
-
#else
static inline int stop_machine(int (*fn)(void *), void *data,
@@ -96,8 +79,5 @@ static inline int stop_machine(int (*fn)(void *), void *data,
return ret;
}
-static inline int stop_machine_create(void) { return 0; }
-static inline void stop_machine_destroy(void) { }
-
#endif /* CONFIG_SMP */
#endif /* _LINUX_STOP_MACHINE */
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 914aedc..5457775 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -266,9 +266,6 @@ int __ref cpu_down(unsigned int cpu)
{
int err;
- err = stop_machine_create();
- if (err)
- return err;
cpu_maps_update_begin();
if (cpu_hotplug_disabled) {
@@ -280,7 +277,6 @@ int __ref cpu_down(unsigned int cpu)
out:
cpu_maps_update_done();
- stop_machine_destroy();
return err;
}
EXPORT_SYMBOL(cpu_down);
@@ -361,9 +357,6 @@ int disable_nonboot_cpus(void)
{
int cpu, first_cpu, error;
- error = stop_machine_create();
- if (error)
- return error;
cpu_maps_update_begin();
first_cpu = cpumask_first(cpu_online_mask);
/*
@@ -394,7 +387,6 @@ int disable_nonboot_cpus(void)
printk(KERN_ERR "Non-boot CPUs are not disabled\n");
}
cpu_maps_update_done();
- stop_machine_destroy();
return error;
}
diff --git a/kernel/module.c b/kernel/module.c
index 1016b75..0838246 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -723,16 +723,8 @@ SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
return -EFAULT;
name[MODULE_NAME_LEN-1] = '\0';
- /* Create stop_machine threads since free_module relies on
- * a non-failing stop_machine call. */
- ret = stop_machine_create();
- if (ret)
- return ret;
-
- if (mutex_lock_interruptible(&module_mutex) != 0) {
- ret = -EINTR;
- goto out_stop;
- }
+ if (mutex_lock_interruptible(&module_mutex) != 0)
+ return -EINTR;
mod = find_module(name);
if (!mod) {
@@ -792,8 +784,6 @@ SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
out:
mutex_unlock(&module_mutex);
-out_stop:
- stop_machine_destroy();
return ret;
}
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 3ae2485..e7c1929 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -383,174 +383,92 @@ enum stopmachine_state {
/* Exit */
STOPMACHINE_EXIT,
};
-static enum stopmachine_state state;
struct stop_machine_data {
- int (*fn)(void *);
- void *data;
- int fnret;
+ int (*fn)(void *);
+ void *data;
+ /* Like num_online_cpus(), but hotplug cpu uses us, so we need this. */
+ unsigned int num_threads;
+ const struct cpumask *active_cpus;
+
+ enum stopmachine_state state;
+ atomic_t thread_ack;
};
-/* Like num_online_cpus(), but hotplug cpu uses us, so we need this. */
-static unsigned int num_threads;
-static atomic_t thread_ack;
-static DEFINE_MUTEX(lock);
-/* setup_lock protects refcount, stop_machine_wq and stop_machine_work. */
-static DEFINE_MUTEX(setup_lock);
-/* Users of stop_machine. */
-static int refcount;
-static struct workqueue_struct *stop_machine_wq;
-static struct stop_machine_data active, idle;
-static const struct cpumask *active_cpus;
-static void __percpu *stop_machine_work;
-
-static void set_state(enum stopmachine_state newstate)
+static void set_state(struct stop_machine_data *smdata,
+ enum stopmachine_state newstate)
{
/* Reset ack counter. */
- atomic_set(&thread_ack, num_threads);
+ atomic_set(&smdata->thread_ack, smdata->num_threads);
smp_wmb();
- state = newstate;
+ smdata->state = newstate;
}
/* Last one to ack a state moves to the next state. */
-static void ack_state(void)
+static void ack_state(struct stop_machine_data *smdata)
{
- if (atomic_dec_and_test(&thread_ack))
- set_state(state + 1);
+ if (atomic_dec_and_test(&smdata->thread_ack))
+ set_state(smdata, smdata->state + 1);
}
-/* This is the actual function which stops the CPU. It runs
- * in the context of a dedicated stopmachine workqueue. */
-static void stop_cpu(struct work_struct *unused)
+/* This is the cpu_stop function which stops the CPU. */
+static int stop_machine_cpu_stop(void *data)
{
+ struct stop_machine_data *smdata = data;
enum stopmachine_state curstate = STOPMACHINE_NONE;
- struct stop_machine_data *smdata = &idle;
- int cpu = smp_processor_id();
- int err;
+ int cpu = smp_processor_id(), err = 0;
+ bool is_active;
+
+ if (!smdata->active_cpus)
+ is_active = cpu == cpumask_first(cpu_online_mask);
+ else
+ is_active = cpumask_test_cpu(cpu, smdata->active_cpus);
- if (!active_cpus) {
- if (cpu == cpumask_first(cpu_online_mask))
- smdata = &active;
- } else {
- if (cpumask_test_cpu(cpu, active_cpus))
- smdata = &active;
- }
/* Simple state machine */
do {
/* Chill out and ensure we re-read stopmachine_state. */
cpu_relax();
- if (state != curstate) {
- curstate = state;
+ if (smdata->state != curstate) {
+ curstate = smdata->state;
switch (curstate) {
case STOPMACHINE_DISABLE_IRQ:
local_irq_disable();
hard_irq_disable();
break;
case STOPMACHINE_RUN:
- /* On multiple CPUs only a single error code
- * is needed to tell that something failed. */
- err = smdata->fn(smdata->data);
- if (err)
- smdata->fnret = err;
+ if (is_active)
+ err = smdata->fn(smdata->data);
break;
default:
break;
}
- ack_state();
+ ack_state(smdata);
}
} while (curstate != STOPMACHINE_EXIT);
local_irq_enable();
+ return err;
}
-/* Callback for CPUs which aren't supposed to do anything. */
-static int chill(void *unused)
-{
- return 0;
-}
-
-int stop_machine_create(void)
-{
- mutex_lock(&setup_lock);
- if (refcount)
- goto done;
- stop_machine_wq = create_rt_workqueue("kstop");
- if (!stop_machine_wq)
- goto err_out;
- stop_machine_work = alloc_percpu(struct work_struct);
- if (!stop_machine_work)
- goto err_out;
-done:
- refcount++;
- mutex_unlock(&setup_lock);
- return 0;
-
-err_out:
- if (stop_machine_wq)
- destroy_workqueue(stop_machine_wq);
- mutex_unlock(&setup_lock);
- return -ENOMEM;
-}
-EXPORT_SYMBOL_GPL(stop_machine_create);
-
-void stop_machine_destroy(void)
-{
- mutex_lock(&setup_lock);
- refcount--;
- if (refcount)
- goto done;
- destroy_workqueue(stop_machine_wq);
- free_percpu(stop_machine_work);
-done:
- mutex_unlock(&setup_lock);
-}
-EXPORT_SYMBOL_GPL(stop_machine_destroy);
-
int __stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus)
{
- struct work_struct *sm_work;
- int i, ret;
-
- /* Set up initial state. */
- mutex_lock(&lock);
- num_threads = num_online_cpus();
- active_cpus = cpus;
- active.fn = fn;
- active.data = data;
- active.fnret = 0;
- idle.fn = chill;
- idle.data = NULL;
-
- set_state(STOPMACHINE_PREPARE);
-
- /* Schedule the stop_cpu work on all cpus: hold this CPU so one
- * doesn't hit this CPU until we're ready. */
- get_cpu();
- for_each_online_cpu(i) {
- sm_work = per_cpu_ptr(stop_machine_work, i);
- INIT_WORK(sm_work, stop_cpu);
- queue_work_on(i, stop_machine_wq, sm_work);
- }
- /* This will release the thread on our CPU. */
- put_cpu();
- flush_workqueue(stop_machine_wq);
- ret = active.fnret;
- mutex_unlock(&lock);
- return ret;
+ struct stop_machine_data smdata = { .fn = fn, .data = data,
+ .num_threads = num_online_cpus(),
+ .active_cpus = cpus };
+
+ /* Set the initial state and stop all online cpus. */
+ set_state(&smdata, STOPMACHINE_PREPARE);
+ return stop_cpus(cpu_online_mask, stop_machine_cpu_stop, &smdata);
}
int stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus)
{
int ret;
- ret = stop_machine_create();
- if (ret)
- return ret;
/* No CPUs can come up or down during this. */
get_online_cpus();
ret = __stop_machine(fn, data, cpus);
put_online_cpus();
- stop_machine_destroy();
return ret;
}
EXPORT_SYMBOL_GPL(stop_machine);
--
1.6.4.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 3/4] scheduler: replace migration_thread with cpu_stop
2010-04-22 16:09 [PATCHSET sched/core] cpu_stop: implement and use cpu_stop Tejun Heo
2010-04-22 16:09 ` [PATCH 1/4] cpu_stop: implement stop_cpu[s]() Tejun Heo
2010-04-22 16:09 ` [PATCH 2/4] stop_machine: reimplement using cpu_stop Tejun Heo
@ 2010-04-22 16:09 ` Tejun Heo
2010-05-03 13:26 ` Peter Zijlstra
2010-04-22 16:09 ` [PATCH 4/4] scheduler: kill paranoia check in synchronize_sched_expedited() Tejun Heo
3 siblings, 1 reply; 18+ messages in thread
From: Tejun Heo @ 2010-04-22 16:09 UTC (permalink / raw)
To: mingo, linux-kernel, x86, oleg, peterz, rusty, sivanich,
heiko.carstens, dipankar, josh, paulmck, akpm, arjan, torvalds
Cc: Tejun Heo
Currently migration_thread is serving three purposes - migration
pusher, context to execute active_load_balance() and forced context
switcher for expedited RCU synchronize_sched. All three roles are
hardcoded into migration_thread() and determining which job is
scheduled is slightly messy.
This patch kills migration_thread and replaces all three uses with
cpu_stop. The three different roles of migration_thread() are
splitted into three separate cpu_stop callbacks -
migration_cpu_stop(), active_load_balance_cpu_stop() and
synchronize_sched_expedited_cpu_stop() - and each use case now simply
asks cpu_stop to execute the callback as necessary.
synchronize_sched_expedited() was implemented with private
preallocated resources and custom multi-cpu queueing and waiting
logic, both of which are provided by cpu_stop.
synchronize_sched_expedited_count is made atomic and all other shared
resources along with the mutex are dropped.
synchronize_sched_expedited() also implemented a check to detect cases
where not all the callback got executed on their assigned cpus and
fall back to synchronize_sched(). If called with cpu hotplug blocked,
cpu_stop already guarantees that and the condition cannot happen;
otherwise, stop_machine() would break. However, this patch preserves
the paranoid check using a cpumask to record on which cpus the stopper
ran so that it can serve as a bisection point if something actually
goes wrong theree.
Because the internal execution state is no longer visible,
rcu_expedited_torture_stats() is removed.
This patch also renames cpu_stop threads to from "stopper/%d" to
"migration/%d". The names of these threads ultimately don't matter
and there's no reason to make unnecessary userland visible changes.
With this patch applied, stop_machine() and sched now share the same
resources. stop_machine() is faster without wasting any resources and
sched migration users are much cleaner.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: Josh Triplett <josh@freedesktop.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
---
Documentation/RCU/torture.txt | 10 --
include/linux/rcutiny.h | 2 -
include/linux/rcutree.h | 1 -
kernel/rcutorture.c | 2 +-
kernel/sched.c | 303 +++++++++++------------------------------
kernel/sched_fair.c | 42 ++++--
kernel/stop_machine.c | 2 +-
7 files changed, 109 insertions(+), 253 deletions(-)
diff --git a/Documentation/RCU/torture.txt b/Documentation/RCU/torture.txt
index 0e50bc2..5d90167 100644
--- a/Documentation/RCU/torture.txt
+++ b/Documentation/RCU/torture.txt
@@ -182,16 +182,6 @@ Similarly, sched_expedited RCU provides the following:
sched_expedited-torture: Reader Pipe: 12660320201 95875 0 0 0 0 0 0 0 0 0
sched_expedited-torture: Reader Batch: 12660424885 0 0 0 0 0 0 0 0 0 0
sched_expedited-torture: Free-Block Circulation: 1090795 1090795 1090794 1090793 1090792 1090791 1090790 1090789 1090788 1090787 0
- state: -1 / 0:0 3:0 4:0
-
-As before, the first four lines are similar to those for RCU.
-The last line shows the task-migration state. The first number is
--1 if synchronize_sched_expedited() is idle, -2 if in the process of
-posting wakeups to the migration kthreads, and N when waiting on CPU N.
-Each of the colon-separated fields following the "/" is a CPU:state pair.
-Valid states are "0" for idle, "1" for waiting for quiescent state,
-"2" for passed through quiescent state, and "3" when a race with a
-CPU-hotplug event forces use of the synchronize_sched() primitive.
USAGE
diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index a519587..0006b2d 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -60,8 +60,6 @@ static inline long rcu_batches_completed_bh(void)
return 0;
}
-extern int rcu_expedited_torture_stats(char *page);
-
static inline void rcu_force_quiescent_state(void)
{
}
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index 42cc3a0..24e467e 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -35,7 +35,6 @@ struct notifier_block;
extern void rcu_sched_qs(int cpu);
extern void rcu_bh_qs(int cpu);
extern int rcu_needs_cpu(int cpu);
-extern int rcu_expedited_torture_stats(char *page);
#ifdef CONFIG_TREE_PREEMPT_RCU
diff --git a/kernel/rcutorture.c b/kernel/rcutorture.c
index 58df55b..2b676f3 100644
--- a/kernel/rcutorture.c
+++ b/kernel/rcutorture.c
@@ -669,7 +669,7 @@ static struct rcu_torture_ops sched_expedited_ops = {
.sync = synchronize_sched_expedited,
.cb_barrier = NULL,
.fqs = rcu_sched_force_quiescent_state,
- .stats = rcu_expedited_torture_stats,
+ .stats = NULL,
.irq_capable = 1,
.name = "sched_expedited"
};
diff --git a/kernel/sched.c b/kernel/sched.c
index de0da71..947b2bb 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -55,9 +55,9 @@
#include <linux/cpu.h>
#include <linux/cpuset.h>
#include <linux/percpu.h>
-#include <linux/kthread.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
+#include <linux/stop_machine.h>
#include <linux/sysctl.h>
#include <linux/syscalls.h>
#include <linux/times.h>
@@ -539,15 +539,13 @@ struct rq {
int post_schedule;
int active_balance;
int push_cpu;
+ struct cpu_stop_work active_balance_work;
/* cpu of this runqueue: */
int cpu;
int online;
unsigned long avg_load_per_task;
- struct task_struct *migration_thread;
- struct list_head migration_queue;
-
u64 rt_avg;
u64 age_stamp;
u64 idle_stamp;
@@ -2037,21 +2035,18 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
__set_task_cpu(p, new_cpu);
}
-struct migration_req {
- struct list_head list;
-
+struct migration_arg {
struct task_struct *task;
int dest_cpu;
-
- struct completion done;
};
+static int migration_cpu_stop(void *data);
+
/*
* The task's runqueue lock must be held.
* Returns true if you have to wait for migration thread.
*/
-static int
-migrate_task(struct task_struct *p, int dest_cpu, struct migration_req *req)
+static bool migrate_task(struct task_struct *p, int dest_cpu)
{
struct rq *rq = task_rq(p);
@@ -2059,15 +2054,7 @@ migrate_task(struct task_struct *p, int dest_cpu, struct migration_req *req)
* If the task is not on a runqueue (and not running), then
* the next wake-up will properly place the task.
*/
- if (!p->se.on_rq && !task_running(rq, p))
- return 0;
-
- init_completion(&req->done);
- req->task = p;
- req->dest_cpu = dest_cpu;
- list_add(&req->list, &rq->migration_queue);
-
- return 1;
+ return p->se.on_rq || task_running(rq, p);
}
/*
@@ -3056,7 +3043,6 @@ static void update_cpu_load(struct rq *this_rq)
void sched_exec(void)
{
struct task_struct *p = current;
- struct migration_req req;
unsigned long flags;
struct rq *rq;
int dest_cpu;
@@ -3070,17 +3056,11 @@ void sched_exec(void)
* select_task_rq() can race against ->cpus_allowed
*/
if (cpumask_test_cpu(dest_cpu, &p->cpus_allowed) &&
- likely(cpu_active(dest_cpu)) &&
- migrate_task(p, dest_cpu, &req)) {
- /* Need to wait for migration thread (might exit: take ref). */
- struct task_struct *mt = rq->migration_thread;
+ likely(cpu_active(dest_cpu)) && migrate_task(p, dest_cpu)) {
+ struct migration_arg arg = { p, dest_cpu };
- get_task_struct(mt);
task_rq_unlock(rq, &flags);
- wake_up_process(mt);
- put_task_struct(mt);
- wait_for_completion(&req.done);
-
+ stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
return;
}
unlock:
@@ -5236,17 +5216,15 @@ static inline void sched_init_granularity(void)
/*
* This is how migration works:
*
- * 1) we queue a struct migration_req structure in the source CPU's
- * runqueue and wake up that CPU's migration thread.
- * 2) we down() the locked semaphore => thread blocks.
- * 3) migration thread wakes up (implicitly it forces the migrated
- * thread off the CPU)
- * 4) it gets the migration request and checks whether the migrated
- * task is still in the wrong runqueue.
- * 5) if it's in the wrong runqueue then the migration thread removes
+ * 1) we invoke migration_cpu_stop() on the target CPU using
+ * stop_one_cpu().
+ * 2) stopper starts to run (implicitly forcing the migrated thread
+ * off the CPU)
+ * 3) it checks whether the migrated task is still in the wrong runqueue.
+ * 4) if it's in the wrong runqueue then the migration thread removes
* it and puts it into the right queue.
- * 6) migration thread up()s the semaphore.
- * 7) we wake up and the migration is done.
+ * 5) stopper completes and stop_one_cpu() returns and the migration
+ * is done.
*/
/*
@@ -5260,9 +5238,9 @@ static inline void sched_init_granularity(void)
*/
int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
{
- struct migration_req req;
unsigned long flags;
struct rq *rq;
+ unsigned int dest_cpu;
int ret = 0;
/*
@@ -5300,15 +5278,12 @@ again:
if (cpumask_test_cpu(task_cpu(p), new_mask))
goto out;
- if (migrate_task(p, cpumask_any_and(cpu_active_mask, new_mask), &req)) {
+ dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
+ if (migrate_task(p, dest_cpu)) {
+ struct migration_arg arg = { p, dest_cpu };
/* Need help from migration thread: drop lock and wait. */
- struct task_struct *mt = rq->migration_thread;
-
- get_task_struct(mt);
task_rq_unlock(rq, &flags);
- wake_up_process(mt);
- put_task_struct(mt);
- wait_for_completion(&req.done);
+ stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
tlb_migrate_finish(p->mm);
return 0;
}
@@ -5366,70 +5341,22 @@ fail:
return ret;
}
-#define RCU_MIGRATION_IDLE 0
-#define RCU_MIGRATION_NEED_QS 1
-#define RCU_MIGRATION_GOT_QS 2
-#define RCU_MIGRATION_MUST_SYNC 3
-
/*
- * migration_thread - this is a highprio system thread that performs
- * thread migration by bumping thread off CPU then 'pushing' onto
- * another runqueue.
+ * migration_cpu_stop - this will be executed by a highprio stopper thread
+ * and performs thread migration by bumping thread off CPU then
+ * 'pushing' onto another runqueue.
*/
-static int migration_thread(void *data)
+static int migration_cpu_stop(void *data)
{
- int badcpu;
- int cpu = (long)data;
- struct rq *rq;
-
- rq = cpu_rq(cpu);
- BUG_ON(rq->migration_thread != current);
-
- set_current_state(TASK_INTERRUPTIBLE);
- while (!kthread_should_stop()) {
- struct migration_req *req;
- struct list_head *head;
-
- raw_spin_lock_irq(&rq->lock);
-
- if (cpu_is_offline(cpu)) {
- raw_spin_unlock_irq(&rq->lock);
- break;
- }
-
- if (rq->active_balance) {
- active_load_balance(rq, cpu);
- rq->active_balance = 0;
- }
-
- head = &rq->migration_queue;
-
- if (list_empty(head)) {
- raw_spin_unlock_irq(&rq->lock);
- schedule();
- set_current_state(TASK_INTERRUPTIBLE);
- continue;
- }
- req = list_entry(head->next, struct migration_req, list);
- list_del_init(head->next);
-
- if (req->task != NULL) {
- raw_spin_unlock(&rq->lock);
- __migrate_task(req->task, cpu, req->dest_cpu);
- } else if (likely(cpu == (badcpu = smp_processor_id()))) {
- req->dest_cpu = RCU_MIGRATION_GOT_QS;
- raw_spin_unlock(&rq->lock);
- } else {
- req->dest_cpu = RCU_MIGRATION_MUST_SYNC;
- raw_spin_unlock(&rq->lock);
- WARN_ONCE(1, "migration_thread() on CPU %d, expected %d\n", badcpu, cpu);
- }
- local_irq_enable();
-
- complete(&req->done);
- }
- __set_current_state(TASK_RUNNING);
+ struct migration_arg *arg = data;
+ /*
+ * The original target cpu might have gone down and we might
+ * be on another cpu but it doesn't matter.
+ */
+ local_irq_disable();
+ __migrate_task(arg->task, raw_smp_processor_id(), arg->dest_cpu);
+ local_irq_enable();
return 0;
}
@@ -5796,35 +5723,20 @@ static void set_rq_offline(struct rq *rq)
static int __cpuinit
migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
{
- struct task_struct *p;
int cpu = (long)hcpu;
unsigned long flags;
- struct rq *rq;
+ struct rq *rq = cpu_rq(cpu);
switch (action) {
case CPU_UP_PREPARE:
case CPU_UP_PREPARE_FROZEN:
- p = kthread_create(migration_thread, hcpu, "migration/%d", cpu);
- if (IS_ERR(p))
- return NOTIFY_BAD;
- kthread_bind(p, cpu);
- /* Must be high prio: stop_machine expects to yield to it. */
- rq = task_rq_lock(p, &flags);
- __setscheduler(rq, p, SCHED_FIFO, MAX_RT_PRIO-1);
- task_rq_unlock(rq, &flags);
- get_task_struct(p);
- cpu_rq(cpu)->migration_thread = p;
rq->calc_load_update = calc_load_update;
break;
case CPU_ONLINE:
case CPU_ONLINE_FROZEN:
- /* Strictly unnecessary, as first user will wake it. */
- wake_up_process(cpu_rq(cpu)->migration_thread);
-
/* Update our root-domain */
- rq = cpu_rq(cpu);
raw_spin_lock_irqsave(&rq->lock, flags);
if (rq->rd) {
BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
@@ -5835,25 +5747,9 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
break;
#ifdef CONFIG_HOTPLUG_CPU
- case CPU_UP_CANCELED:
- case CPU_UP_CANCELED_FROZEN:
- if (!cpu_rq(cpu)->migration_thread)
- break;
- /* Unbind it from offline cpu so it can run. Fall thru. */
- kthread_bind(cpu_rq(cpu)->migration_thread,
- cpumask_any(cpu_online_mask));
- kthread_stop(cpu_rq(cpu)->migration_thread);
- put_task_struct(cpu_rq(cpu)->migration_thread);
- cpu_rq(cpu)->migration_thread = NULL;
- break;
-
case CPU_DEAD:
case CPU_DEAD_FROZEN:
migrate_live_tasks(cpu);
- rq = cpu_rq(cpu);
- kthread_stop(rq->migration_thread);
- put_task_struct(rq->migration_thread);
- rq->migration_thread = NULL;
/* Idle task back to normal (off runqueue, low prio) */
raw_spin_lock_irq(&rq->lock);
deactivate_task(rq, rq->idle, 0);
@@ -5864,29 +5760,11 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
migrate_nr_uninterruptible(rq);
BUG_ON(rq->nr_running != 0);
calc_global_load_remove(rq);
- /*
- * No need to migrate the tasks: it was best-effort if
- * they didn't take sched_hotcpu_mutex. Just wake up
- * the requestors.
- */
- raw_spin_lock_irq(&rq->lock);
- while (!list_empty(&rq->migration_queue)) {
- struct migration_req *req;
-
- req = list_entry(rq->migration_queue.next,
- struct migration_req, list);
- list_del_init(&req->list);
- raw_spin_unlock_irq(&rq->lock);
- complete(&req->done);
- raw_spin_lock_irq(&rq->lock);
- }
- raw_spin_unlock_irq(&rq->lock);
break;
case CPU_DYING:
case CPU_DYING_FROZEN:
/* Update our root-domain */
- rq = cpu_rq(cpu);
raw_spin_lock_irqsave(&rq->lock, flags);
if (rq->rd) {
BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
@@ -7700,10 +7578,8 @@ void __init sched_init(void)
rq->push_cpu = 0;
rq->cpu = i;
rq->online = 0;
- rq->migration_thread = NULL;
rq->idle_stamp = 0;
rq->avg_idle = 2*sysctl_sched_migration_cost;
- INIT_LIST_HEAD(&rq->migration_queue);
rq_attach_root(rq, &def_root_domain);
#endif
init_rq_hrtick(rq);
@@ -8997,12 +8873,6 @@ struct cgroup_subsys cpuacct_subsys = {
#ifndef CONFIG_SMP
-int rcu_expedited_torture_stats(char *page)
-{
- return 0;
-}
-EXPORT_SYMBOL_GPL(rcu_expedited_torture_stats);
-
void synchronize_sched_expedited(void)
{
}
@@ -9010,30 +8880,20 @@ EXPORT_SYMBOL_GPL(synchronize_sched_expedited);
#else /* #ifndef CONFIG_SMP */
-static DEFINE_PER_CPU(struct migration_req, rcu_migration_req);
-static DEFINE_MUTEX(rcu_sched_expedited_mutex);
-
-#define RCU_EXPEDITED_STATE_POST -2
-#define RCU_EXPEDITED_STATE_IDLE -1
-
-static int rcu_expedited_state = RCU_EXPEDITED_STATE_IDLE;
+static atomic_t synchronize_sched_expedited_count = ATOMIC_INIT(0);
-int rcu_expedited_torture_stats(char *page)
+static int synchronize_sched_expedited_cpu_stop(void *data)
{
- int cnt = 0;
- int cpu;
+ static DEFINE_SPINLOCK(done_mask_lock);
+ struct cpumask *done_mask = data;
- cnt += sprintf(&page[cnt], "state: %d /", rcu_expedited_state);
- for_each_online_cpu(cpu) {
- cnt += sprintf(&page[cnt], " %d:%d",
- cpu, per_cpu(rcu_migration_req, cpu).dest_cpu);
+ if (done_mask) {
+ spin_lock(&done_mask_lock);
+ cpumask_set_cpu(smp_processor_id(), done_mask);
+ spin_unlock(&done_mask_lock);
}
- cnt += sprintf(&page[cnt], "\n");
- return cnt;
+ return 0;
}
-EXPORT_SYMBOL_GPL(rcu_expedited_torture_stats);
-
-static long synchronize_sched_expedited_count;
/*
* Wait for an rcu-sched grace period to elapse, but use "big hammer"
@@ -9047,60 +8907,55 @@ static long synchronize_sched_expedited_count;
*/
void synchronize_sched_expedited(void)
{
- int cpu;
- unsigned long flags;
- bool need_full_sync = 0;
- struct rq *rq;
- struct migration_req *req;
- long snap;
- int trycount = 0;
+ cpumask_var_t done_mask_var;
+ struct cpumask *done_mask = NULL;
+ int snap, trycount = 0;
+
+ /*
+ * done_mask is used to check that all cpus actually have
+ * finished running the stopper, which is guaranteed by
+ * stop_cpus() if it's called with cpu hotplug blocked. Keep
+ * the paranoia for now but it's best effort if cpumask is off
+ * stack.
+ */
+ if (zalloc_cpumask_var(&done_mask_var, GFP_ATOMIC))
+ done_mask = done_mask_var;
smp_mb(); /* ensure prior mod happens before capturing snap. */
- snap = ACCESS_ONCE(synchronize_sched_expedited_count) + 1;
+ snap = atomic_read(&synchronize_sched_expedited_count) + 1;
get_online_cpus();
- while (!mutex_trylock(&rcu_sched_expedited_mutex)) {
+ while (try_stop_cpus(cpu_online_mask,
+ synchronize_sched_expedited_cpu_stop,
+ done_mask) == -EAGAIN) {
put_online_cpus();
if (trycount++ < 10)
udelay(trycount * num_online_cpus());
else {
synchronize_sched();
- return;
+ goto free_out;
}
- if (ACCESS_ONCE(synchronize_sched_expedited_count) - snap > 0) {
+ if (atomic_read(&synchronize_sched_expedited_count) - snap > 0) {
smp_mb(); /* ensure test happens before caller kfree */
- return;
+ goto free_out;
}
get_online_cpus();
}
- rcu_expedited_state = RCU_EXPEDITED_STATE_POST;
- for_each_online_cpu(cpu) {
- rq = cpu_rq(cpu);
- req = &per_cpu(rcu_migration_req, cpu);
- init_completion(&req->done);
- req->task = NULL;
- req->dest_cpu = RCU_MIGRATION_NEED_QS;
- raw_spin_lock_irqsave(&rq->lock, flags);
- list_add(&req->list, &rq->migration_queue);
- raw_spin_unlock_irqrestore(&rq->lock, flags);
- wake_up_process(rq->migration_thread);
- }
- for_each_online_cpu(cpu) {
- rcu_expedited_state = cpu;
- req = &per_cpu(rcu_migration_req, cpu);
- rq = cpu_rq(cpu);
- wait_for_completion(&req->done);
- raw_spin_lock_irqsave(&rq->lock, flags);
- if (unlikely(req->dest_cpu == RCU_MIGRATION_MUST_SYNC))
- need_full_sync = 1;
- req->dest_cpu = RCU_MIGRATION_IDLE;
- raw_spin_unlock_irqrestore(&rq->lock, flags);
- }
- rcu_expedited_state = RCU_EXPEDITED_STATE_IDLE;
- synchronize_sched_expedited_count++;
- mutex_unlock(&rcu_sched_expedited_mutex);
+ atomic_inc(&synchronize_sched_expedited_count);
+ if (done_mask)
+ cpumask_xor(done_mask, done_mask, cpu_online_mask);
put_online_cpus();
- if (need_full_sync)
+
+ /* paranoia - this can't happen */
+ if (done_mask && cpumask_weight(done_mask)) {
+ char buf[80];
+
+ cpulist_scnprintf(buf, sizeof(buf), done_mask);
+ WARN_ONCE(1, "synchronize_sched_expedited: cpu online and done masks disagree on %d cpus: %s\n",
+ cpumask_weight(done_mask), buf);
synchronize_sched();
+ }
+free_out:
+ free_cpumask_var(done_mask_var);
}
EXPORT_SYMBOL_GPL(synchronize_sched_expedited);
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 88d3053..8df55da 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -2802,6 +2802,8 @@ static int need_active_balance(struct sched_domain *sd, int sd_idle, int idle)
return unlikely(sd->nr_balance_failed > sd->cache_nice_tries+2);
}
+static int active_load_balance_cpu_stop(void *data);
+
/*
* Check this_cpu to ensure it is balanced within domain. Attempt to move
* tasks if there is an imbalance.
@@ -2891,8 +2893,9 @@ redo:
if (need_active_balance(sd, sd_idle, idle)) {
raw_spin_lock_irqsave(&busiest->lock, flags);
- /* don't kick the migration_thread, if the curr
- * task on busiest cpu can't be moved to this_cpu
+ /* don't kick the active_load_balance_cpu_stop,
+ * if the curr task on busiest cpu can't be
+ * moved to this_cpu
*/
if (!cpumask_test_cpu(this_cpu,
&busiest->curr->cpus_allowed)) {
@@ -2909,7 +2912,9 @@ redo:
}
raw_spin_unlock_irqrestore(&busiest->lock, flags);
if (active_balance)
- wake_up_process(busiest->migration_thread);
+ stop_one_cpu_nowait(cpu_of(busiest),
+ active_load_balance_cpu_stop, busiest,
+ &busiest->active_balance_work);
/*
* We've kicked active balancing, reset the failure
@@ -3016,24 +3021,29 @@ static void idle_balance(int this_cpu, struct rq *this_rq)
}
/*
- * active_load_balance is run by migration threads. It pushes running tasks
- * off the busiest CPU onto idle CPUs. It requires at least 1 task to be
- * running on each physical CPU where possible, and avoids physical /
- * logical imbalances.
- *
- * Called with busiest_rq locked.
+ * active_load_balance_cpu_stop is run by cpu stopper. It pushes
+ * running tasks off the busiest CPU onto idle CPUs. It requires at
+ * least 1 task to be running on each physical CPU where possible, and
+ * avoids physical / logical imbalances.
*/
-static void active_load_balance(struct rq *busiest_rq, int busiest_cpu)
+static int active_load_balance_cpu_stop(void *data)
{
+ struct rq *busiest_rq = data;
+ int busiest_cpu = cpu_of(busiest_rq);
int target_cpu = busiest_rq->push_cpu;
+ struct rq *target_rq = cpu_rq(target_cpu);
struct sched_domain *sd;
- struct rq *target_rq;
+
+ raw_spin_lock_irq(&busiest_rq->lock);
+
+ /* make sure the requested cpu hasn't gone down in the meantime */
+ if (unlikely(busiest_cpu != smp_processor_id() ||
+ !busiest_rq->active_balance))
+ goto out_unlock;
/* Is there any task to move? */
if (busiest_rq->nr_running <= 1)
- return;
-
- target_rq = cpu_rq(target_cpu);
+ goto out_unlock;
/*
* This condition is "impossible", if it occurs
@@ -3062,6 +3072,10 @@ static void active_load_balance(struct rq *busiest_rq, int busiest_cpu)
schedstat_inc(sd, alb_failed);
}
double_unlock_balance(busiest_rq, target_rq);
+out_unlock:
+ busiest_rq->active_balance = 0;
+ raw_spin_unlock_irq(&busiest_rq->lock);
+ return 0;
}
#ifdef CONFIG_NO_HZ
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index e7c1929..4b9a573 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -296,7 +296,7 @@ static int __cpuinit cpu_stop_cpu_callback(struct notifier_block *nfb,
case CPU_UP_PREPARE:
BUG_ON(stopper->thread || stopper->enabled ||
!list_empty(&stopper->works));
- p = kthread_create(cpu_stopper_thread, stopper, "stopper/%d",
+ p = kthread_create(cpu_stopper_thread, stopper, "migration/%d",
cpu);
if (IS_ERR(p))
return NOTIFY_BAD;
--
1.6.4.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 4/4] scheduler: kill paranoia check in synchronize_sched_expedited()
2010-04-22 16:09 [PATCHSET sched/core] cpu_stop: implement and use cpu_stop Tejun Heo
` (2 preceding siblings ...)
2010-04-22 16:09 ` [PATCH 3/4] scheduler: replace migration_thread with cpu_stop Tejun Heo
@ 2010-04-22 16:09 ` Tejun Heo
3 siblings, 0 replies; 18+ messages in thread
From: Tejun Heo @ 2010-04-22 16:09 UTC (permalink / raw)
To: mingo, linux-kernel, x86, oleg, peterz, rusty, sivanich,
heiko.carstens, dipankar, josh, paulmck, akpm, arjan, torvalds
Cc: Tejun Heo
The paranoid check which verifies that the cpu_stop callback is
actually called on all online cpus is completely superflous. It's
guaranteed by cpu_stop facility and if it didn't work as advertised
other things would go horribly wrong and trying to recover using
synchronize_sched() wouldn't be very meaningful.
Kill the paranoid check. Removal of this feature is done as a
separate step so that it can serve as a bisection point if something
actually goes wrong.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: Josh Triplett <josh@freedesktop.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
---
kernel/sched.c | 40 +++-------------------------------------
1 files changed, 3 insertions(+), 37 deletions(-)
diff --git a/kernel/sched.c b/kernel/sched.c
index 947b2bb..5b88521 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -8884,14 +8884,6 @@ static atomic_t synchronize_sched_expedited_count = ATOMIC_INIT(0);
static int synchronize_sched_expedited_cpu_stop(void *data)
{
- static DEFINE_SPINLOCK(done_mask_lock);
- struct cpumask *done_mask = data;
-
- if (done_mask) {
- spin_lock(&done_mask_lock);
- cpumask_set_cpu(smp_processor_id(), done_mask);
- spin_unlock(&done_mask_lock);
- }
return 0;
}
@@ -8907,55 +8899,29 @@ static int synchronize_sched_expedited_cpu_stop(void *data)
*/
void synchronize_sched_expedited(void)
{
- cpumask_var_t done_mask_var;
- struct cpumask *done_mask = NULL;
int snap, trycount = 0;
- /*
- * done_mask is used to check that all cpus actually have
- * finished running the stopper, which is guaranteed by
- * stop_cpus() if it's called with cpu hotplug blocked. Keep
- * the paranoia for now but it's best effort if cpumask is off
- * stack.
- */
- if (zalloc_cpumask_var(&done_mask_var, GFP_ATOMIC))
- done_mask = done_mask_var;
-
smp_mb(); /* ensure prior mod happens before capturing snap. */
snap = atomic_read(&synchronize_sched_expedited_count) + 1;
get_online_cpus();
while (try_stop_cpus(cpu_online_mask,
synchronize_sched_expedited_cpu_stop,
- done_mask) == -EAGAIN) {
+ NULL) == -EAGAIN) {
put_online_cpus();
if (trycount++ < 10)
udelay(trycount * num_online_cpus());
else {
synchronize_sched();
- goto free_out;
+ return;
}
if (atomic_read(&synchronize_sched_expedited_count) - snap > 0) {
smp_mb(); /* ensure test happens before caller kfree */
- goto free_out;
+ return;
}
get_online_cpus();
}
atomic_inc(&synchronize_sched_expedited_count);
- if (done_mask)
- cpumask_xor(done_mask, done_mask, cpu_online_mask);
put_online_cpus();
-
- /* paranoia - this can't happen */
- if (done_mask && cpumask_weight(done_mask)) {
- char buf[80];
-
- cpulist_scnprintf(buf, sizeof(buf), done_mask);
- WARN_ONCE(1, "synchronize_sched_expedited: cpu online and done masks disagree on %d cpus: %s\n",
- cpumask_weight(done_mask), buf);
- synchronize_sched();
- }
-free_out:
- free_cpumask_var(done_mask_var);
}
EXPORT_SYMBOL_GPL(synchronize_sched_expedited);
--
1.6.4.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 1/4] cpu_stop: implement stop_cpu[s]()
2010-04-22 16:09 ` [PATCH 1/4] cpu_stop: implement stop_cpu[s]() Tejun Heo
@ 2010-05-03 13:26 ` Peter Zijlstra
2010-05-04 6:36 ` Tejun Heo
2010-05-03 13:26 ` Peter Zijlstra
2010-05-03 13:26 ` Peter Zijlstra
2 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2010-05-03 13:26 UTC (permalink / raw)
To: Tejun Heo
Cc: mingo, linux-kernel, x86, oleg, rusty, sivanich, heiko.carstens,
dipankar, josh, paulmck, akpm, arjan, torvalds
On Thu, 2010-04-22 at 18:09 +0200, Tejun Heo wrote:
> +static int cpu_stopper_thread(void *data)
> +{
> + struct cpu_stopper *stopper = data;
BUG_ON(stopper != __get_cpu_var(cpu_stopper)); ?
> + work = NULL;
> + spin_lock_irq(&stopper->lock);
> + if (!list_empty(&stopper->works)) {
> + work = list_first_entry(&stopper->works,
> + struct cpu_stop_work, list);
> + list_del_init(&work->list);
> + }
> + spin_unlock_irq(&stopper->lock);
Not sure if its worth the hassle, but you could list_splice_init() the
complete pending list onto a local list, possible avoiding some locks.
But since this isn't supposed to be used much, I doubt we'll ever see
the difference.
> + /* restore preemption and check it's still balanced */
> + preempt_enable();
> + WARN_ON_ONCE(preempt_count());
You would use WARN_ONCE() and print the function that last ran and
leaked the preempt count.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/4] cpu_stop: implement stop_cpu[s]()
2010-04-22 16:09 ` [PATCH 1/4] cpu_stop: implement stop_cpu[s]() Tejun Heo
2010-05-03 13:26 ` Peter Zijlstra
@ 2010-05-03 13:26 ` Peter Zijlstra
2010-05-04 6:36 ` Tejun Heo
2010-05-03 13:26 ` Peter Zijlstra
2 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2010-05-03 13:26 UTC (permalink / raw)
To: Tejun Heo
Cc: mingo, linux-kernel, x86, oleg, rusty, sivanich, heiko.carstens,
dipankar, josh, paulmck, akpm, arjan, torvalds
On Thu, 2010-04-22 at 18:09 +0200, Tejun Heo wrote:
> +void stop_one_cpu_nowait(unsigned int cpu, cpu_stop_fn_t fn, void
> *arg,
> + struct cpu_stop_work *work_buf)
> +{
> + memset(work_buf, 0, sizeof(*work_buf));
> + work_buf->fn = fn;
> + work_buf->arg = arg;
*work_buf = (struct cpu_stop_work){ .fn = fn, .arg = arg, };
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/4] cpu_stop: implement stop_cpu[s]()
2010-04-22 16:09 ` [PATCH 1/4] cpu_stop: implement stop_cpu[s]() Tejun Heo
2010-05-03 13:26 ` Peter Zijlstra
2010-05-03 13:26 ` Peter Zijlstra
@ 2010-05-03 13:26 ` Peter Zijlstra
2010-05-04 6:40 ` Tejun Heo
2 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2010-05-03 13:26 UTC (permalink / raw)
To: Tejun Heo
Cc: mingo, linux-kernel, x86, oleg, rusty, sivanich, heiko.carstens,
dipankar, josh, paulmck, akpm, arjan, torvalds
On Thu, 2010-04-22 at 18:09 +0200, Tejun Heo wrote:
> +struct cpu_stop_done {
> + atomic_t nr_todo; /* nr left to execute */
> + bool executed; /* actually executed? */
> + int ret; /* collected return value */
> + struct completion completion; /* fired if nr_todo reaches 0 */
> +};
> +
> +/* signal completion unless @done is NULL */
> +static void cpu_stop_signal_done(struct cpu_stop_done *done, bool executed)
> +{
> + if (done) {
> + if (executed)
> + done->executed = true;
> + if (atomic_dec_and_test(&done->nr_todo))
> + complete(&done->completion);
> + }
> +}
> +
> +/* queue @work to @stopper. if offline, @work is completed immediately */
> +static void cpu_stop_queue_work(struct cpu_stopper *stopper,
> + struct cpu_stop_work *work)
> +{
> + unsigned long flags;
> +
> + spin_lock_irqsave(&stopper->lock, flags);
> +
> + if (stopper->enabled) {
> + list_add_tail(&work->list, &stopper->works);
> + wake_up_process(stopper->thread);
> + } else
> + cpu_stop_signal_done(work->done, false);
> +
> + spin_unlock_irqrestore(&stopper->lock, flags);
> +}
> +
> +int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
> +{
> + struct cpu_stop_done done;
> + struct cpu_stop_work work = { .fn = fn, .arg = arg, .done = &done };
> +
> + cpu_stop_init_done(&done, 1);
> + cpu_stop_queue_work(&per_cpu(cpu_stopper, cpu), &work);
> + wait_for_completion(&done.completion);
> + return done.executed ? done.ret : -ENOENT;
> +}
If you do:
done = { .ret = -ENOENT, };
And remove that if()
> + ret = work->fn(work->arg);
> + if (ret)
> + done->ret = ret;
> +
You can do away with all the ->executed bits.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/4] scheduler: replace migration_thread with cpu_stop
2010-04-22 16:09 ` [PATCH 3/4] scheduler: replace migration_thread with cpu_stop Tejun Heo
@ 2010-05-03 13:26 ` Peter Zijlstra
2010-05-04 7:17 ` Tejun Heo
0 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2010-05-03 13:26 UTC (permalink / raw)
To: Tejun Heo
Cc: mingo, linux-kernel, x86, oleg, rusty, sivanich, heiko.carstens,
dipankar, josh, paulmck, akpm, arjan, torvalds
On Thu, 2010-04-22 at 18:09 +0200, Tejun Heo wrote:
> @@ -2909,7 +2912,9 @@ redo:
> }
> raw_spin_unlock_irqrestore(&busiest->lock, flags);
> if (active_balance)
> - wake_up_process(busiest->migration_thread);
> + stop_one_cpu_nowait(cpu_of(busiest),
> + active_load_balance_cpu_stop, busiest,
> + &busiest->active_balance_work);
So who guarantees busiest->active_balance_work isn't already enqueued by
some other cpu's load-balancer run?
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/4] cpu_stop: implement stop_cpu[s]()
2010-05-03 13:26 ` Peter Zijlstra
@ 2010-05-04 6:36 ` Tejun Heo
2010-05-04 7:03 ` Tejun Heo
0 siblings, 1 reply; 18+ messages in thread
From: Tejun Heo @ 2010-05-04 6:36 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, linux-kernel, x86, oleg, rusty, sivanich, heiko.carstens,
dipankar, josh, paulmck, akpm, arjan, torvalds
Hello,
On 05/03/2010 03:26 PM, Peter Zijlstra wrote:
> On Thu, 2010-04-22 at 18:09 +0200, Tejun Heo wrote:
>> +static int cpu_stopper_thread(void *data)
>> +{
>> + struct cpu_stopper *stopper = data;
>
> BUG_ON(stopper != __get_cpu_var(cpu_stopper)); ?
Added.
>> + work = NULL;
>> + spin_lock_irq(&stopper->lock);
>> + if (!list_empty(&stopper->works)) {
>> + work = list_first_entry(&stopper->works,
>> + struct cpu_stop_work, list);
>> + list_del_init(&work->list);
>> + }
>> + spin_unlock_irq(&stopper->lock);
>
> Not sure if its worth the hassle, but you could list_splice_init() the
> complete pending list onto a local list, possible avoiding some locks.
>
> But since this isn't supposed to be used much, I doubt we'll ever see
> the difference.
Yeah, I think it would be better to keep the code simple there.
>> + /* restore preemption and check it's still balanced */
>> + preempt_enable();
>> + WARN_ON_ONCE(preempt_count());
>
> You would use WARN_ONCE() and print the function that last ran and
> leaked the preempt count.
Updated to use WARN_ONCE() w/ print the function symbol and argument.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/4] cpu_stop: implement stop_cpu[s]()
2010-05-03 13:26 ` Peter Zijlstra
@ 2010-05-04 6:36 ` Tejun Heo
0 siblings, 0 replies; 18+ messages in thread
From: Tejun Heo @ 2010-05-04 6:36 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, linux-kernel, x86, oleg, rusty, sivanich, heiko.carstens,
dipankar, josh, paulmck, akpm, arjan, torvalds
On 05/03/2010 03:26 PM, Peter Zijlstra wrote:
> On Thu, 2010-04-22 at 18:09 +0200, Tejun Heo wrote:
>> +void stop_one_cpu_nowait(unsigned int cpu, cpu_stop_fn_t fn, void
>> *arg,
>> + struct cpu_stop_work *work_buf)
>> +{
>> + memset(work_buf, 0, sizeof(*work_buf));
>> + work_buf->fn = fn;
>> + work_buf->arg = arg;
>
> *work_buf = (struct cpu_stop_work){ .fn = fn, .arg = arg, };
>
Updated.
--
tejun
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/4] cpu_stop: implement stop_cpu[s]()
2010-05-03 13:26 ` Peter Zijlstra
@ 2010-05-04 6:40 ` Tejun Heo
2010-05-04 6:55 ` Tejun Heo
0 siblings, 1 reply; 18+ messages in thread
From: Tejun Heo @ 2010-05-04 6:40 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, linux-kernel, x86, oleg, rusty, sivanich, heiko.carstens,
dipankar, josh, paulmck, akpm, arjan, torvalds
Hello,
On 05/03/2010 03:26 PM, Peter Zijlstra wrote:
> If you do:
>
> done = { .ret = -ENOENT, };
>
> And remove that if()
>
>> + ret = work->fn(work->arg);
>> + if (ret)
>> + done->ret = ret;
>> +
>
> You can do away with all the ->executed bits.
Oh, I had code piece which wanted to discern between -ENOENT from
non-excution and -ENOENT return from the work function which seems
gone now. I'll check things again and drop ->executed if everything
looks okay.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/4] cpu_stop: implement stop_cpu[s]()
2010-05-04 6:40 ` Tejun Heo
@ 2010-05-04 6:55 ` Tejun Heo
0 siblings, 0 replies; 18+ messages in thread
From: Tejun Heo @ 2010-05-04 6:55 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, linux-kernel, x86, oleg, rusty, sivanich, heiko.carstens,
dipankar, josh, paulmck, akpm, arjan, torvalds
Hello, again.
On 05/04/2010 08:40 AM, Tejun Heo wrote:
> Oh, I had code piece which wanted to discern between -ENOENT from
> non-excution and -ENOENT return from the work function which seems
> gone now. I'll check things again and drop ->executed if everything
> looks okay.
Eh... now I remember. If we start with ->ret = 0, stop_cpus() can't
return -ENOENT when none of the specified cpus executed without
tracking execution status (so the current code). If we start with
->ret = -ENOENT, we can't tell whether all cpus executed successfully
or none has executed unless we BUG_ON() -ENOENT return from work
functions and let 0 return override -ENOENT.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/4] cpu_stop: implement stop_cpu[s]()
2010-05-04 6:36 ` Tejun Heo
@ 2010-05-04 7:03 ` Tejun Heo
2010-05-04 8:43 ` Peter Zijlstra
0 siblings, 1 reply; 18+ messages in thread
From: Tejun Heo @ 2010-05-04 7:03 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, linux-kernel, x86, oleg, rusty, sivanich, heiko.carstens,
dipankar, josh, paulmck, akpm, arjan, torvalds
On 05/04/2010 08:36 AM, Tejun Heo wrote:
> On 05/03/2010 03:26 PM, Peter Zijlstra wrote:
>> On Thu, 2010-04-22 at 18:09 +0200, Tejun Heo wrote:
>>> +static int cpu_stopper_thread(void *data)
>>> +{
>>> + struct cpu_stopper *stopper = data;
>>
>> BUG_ON(stopper != __get_cpu_var(cpu_stopper)); ?
>
> Added.
Now that I think more about it, there's a subtle race condition with
the above BUG_ON(). Stoppers are prepared by CPU_UP_PREPARE and
started by CPU_ONLINE but brought down by CPU_DEAD. IOW, they're
allowed to run detached from their designated CPUs between CPU_DYING
and CPU_DEAD (the reponsibility of guaranteeing target cpus's onliness
is on the callers). So, the above BUG_ON() might trigger spuriously
if a cpu goes down after brought online before its cpu_stopper had a
chance to pass through the BUG_ON() test.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/4] scheduler: replace migration_thread with cpu_stop
2010-05-03 13:26 ` Peter Zijlstra
@ 2010-05-04 7:17 ` Tejun Heo
2010-05-04 12:45 ` Peter Zijlstra
0 siblings, 1 reply; 18+ messages in thread
From: Tejun Heo @ 2010-05-04 7:17 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, linux-kernel, x86, oleg, rusty, sivanich, heiko.carstens,
dipankar, josh, paulmck, akpm, arjan, torvalds
Hello,
On 05/03/2010 03:26 PM, Peter Zijlstra wrote:
> On Thu, 2010-04-22 at 18:09 +0200, Tejun Heo wrote:
>> @@ -2909,7 +2912,9 @@ redo:
>> }
>> raw_spin_unlock_irqrestore(&busiest->lock, flags);
>> if (active_balance)
>> - wake_up_process(busiest->migration_thread);
>> + stop_one_cpu_nowait(cpu_of(busiest),
>> + active_load_balance_cpu_stop, busiest,
>> + &busiest->active_balance_work);
>
> So who guarantees busiest->active_balance_work isn't already enqueued by
> some other cpu's load-balancer run?
>
Hmmm... maybe I'm mistaken but isn't that guaranteed by
busiest->active_balance which is protected by the rq lock?
active_load_balance_cpu_stop is scheduled iff busiest->active_balance
was changed from zero and only active_load_balance_cpu_stop() can
clear it at the end of its execution at which point the
active_balance_work is safe to reuse.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/4] cpu_stop: implement stop_cpu[s]()
2010-05-04 7:03 ` Tejun Heo
@ 2010-05-04 8:43 ` Peter Zijlstra
0 siblings, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2010-05-04 8:43 UTC (permalink / raw)
To: Tejun Heo
Cc: mingo, linux-kernel, x86, oleg, rusty, sivanich, heiko.carstens,
dipankar, josh, paulmck, akpm, arjan, torvalds
On Tue, 2010-05-04 at 09:03 +0200, Tejun Heo wrote:
> On 05/04/2010 08:36 AM, Tejun Heo wrote:
> > On 05/03/2010 03:26 PM, Peter Zijlstra wrote:
> >> On Thu, 2010-04-22 at 18:09 +0200, Tejun Heo wrote:
> >>> +static int cpu_stopper_thread(void *data)
> >>> +{
> >>> + struct cpu_stopper *stopper = data;
> >>
> >> BUG_ON(stopper != __get_cpu_var(cpu_stopper)); ?
> >
> > Added.
>
> Now that I think more about it, there's a subtle race condition with
> the above BUG_ON(). Stoppers are prepared by CPU_UP_PREPARE and
> started by CPU_ONLINE but brought down by CPU_DEAD. IOW, they're
> allowed to run detached from their designated CPUs between CPU_DYING
> and CPU_DEAD (the reponsibility of guaranteeing target cpus's onliness
> is on the callers). So, the above BUG_ON() might trigger spuriously
> if a cpu goes down after brought online before its cpu_stopper had a
> chance to pass through the BUG_ON() test.
Ah indeed. A well, drop it then, its not worth making a more complicated
test.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/4] scheduler: replace migration_thread with cpu_stop
2010-05-04 7:17 ` Tejun Heo
@ 2010-05-04 12:45 ` Peter Zijlstra
2010-05-04 12:49 ` Tejun Heo
0 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2010-05-04 12:45 UTC (permalink / raw)
To: Tejun Heo
Cc: mingo, linux-kernel, x86, oleg, rusty, sivanich, heiko.carstens,
dipankar, josh, paulmck, akpm, arjan, torvalds
On Tue, 2010-05-04 at 09:17 +0200, Tejun Heo wrote:
> Hello,
>
> On 05/03/2010 03:26 PM, Peter Zijlstra wrote:
> > On Thu, 2010-04-22 at 18:09 +0200, Tejun Heo wrote:
> >> @@ -2909,7 +2912,9 @@ redo:
> >> }
> >> raw_spin_unlock_irqrestore(&busiest->lock, flags);
> >> if (active_balance)
> >> - wake_up_process(busiest->migration_thread);
> >> + stop_one_cpu_nowait(cpu_of(busiest),
> >> + active_load_balance_cpu_stop, busiest,
> >> + &busiest->active_balance_work);
> >
> > So who guarantees busiest->active_balance_work isn't already enqueued by
> > some other cpu's load-balancer run?
> >
>
> Hmmm... maybe I'm mistaken but isn't that guaranteed by
> busiest->active_balance which is protected by the rq lock?
> active_load_balance_cpu_stop is scheduled iff busiest->active_balance
> was changed from zero and only active_load_balance_cpu_stop() can
> clear it at the end of its execution at which point the
> active_balance_work is safe to reuse.
Ah, indeed. It wasn't obvious from looking at the patch, but when
looking at the full code it fairly easy to see.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/4] scheduler: replace migration_thread with cpu_stop
2010-05-04 12:45 ` Peter Zijlstra
@ 2010-05-04 12:49 ` Tejun Heo
0 siblings, 0 replies; 18+ messages in thread
From: Tejun Heo @ 2010-05-04 12:49 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, linux-kernel, x86, oleg, rusty, sivanich, heiko.carstens,
dipankar, josh, paulmck, akpm, arjan, torvalds
On 05/04/2010 02:45 PM, Peter Zijlstra wrote:
>> Hmmm... maybe I'm mistaken but isn't that guaranteed by
>> busiest->active_balance which is protected by the rq lock?
>> active_load_balance_cpu_stop is scheduled iff busiest->active_balance
>> was changed from zero and only active_load_balance_cpu_stop() can
>> clear it at the end of its execution at which point the
>> active_balance_work is safe to reuse.
>
> Ah, indeed. It wasn't obvious from looking at the patch, but when
> looking at the full code it fairly easy to see.
Hmmm... it's probably worthwhile to note tho. I'll add a comment and
send out the updated patches soon.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2010-05-04 12:50 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-04-22 16:09 [PATCHSET sched/core] cpu_stop: implement and use cpu_stop Tejun Heo
2010-04-22 16:09 ` [PATCH 1/4] cpu_stop: implement stop_cpu[s]() Tejun Heo
2010-05-03 13:26 ` Peter Zijlstra
2010-05-04 6:36 ` Tejun Heo
2010-05-04 7:03 ` Tejun Heo
2010-05-04 8:43 ` Peter Zijlstra
2010-05-03 13:26 ` Peter Zijlstra
2010-05-04 6:36 ` Tejun Heo
2010-05-03 13:26 ` Peter Zijlstra
2010-05-04 6:40 ` Tejun Heo
2010-05-04 6:55 ` Tejun Heo
2010-04-22 16:09 ` [PATCH 2/4] stop_machine: reimplement using cpu_stop Tejun Heo
2010-04-22 16:09 ` [PATCH 3/4] scheduler: replace migration_thread with cpu_stop Tejun Heo
2010-05-03 13:26 ` Peter Zijlstra
2010-05-04 7:17 ` Tejun Heo
2010-05-04 12:45 ` Peter Zijlstra
2010-05-04 12:49 ` Tejun Heo
2010-04-22 16:09 ` [PATCH 4/4] scheduler: kill paranoia check in synchronize_sched_expedited() Tejun Heo
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.