* [RFC][PATCH] sched: Start stopper early
@ 2015-10-07 8:41 Peter Zijlstra
2015-10-07 12:30 ` Oleg Nesterov
` (4 more replies)
0 siblings, 5 replies; 39+ messages in thread
From: Peter Zijlstra @ 2015-10-07 8:41 UTC (permalink / raw)
To: heiko.carstens
Cc: linux-kernel, Tejun Heo, Oleg Nesterov, Ingo Molnar, Rik van Riel
Hi,
So Heiko reported an 'interesting' failure where stop_two_cpus() got
stuck in multi_cpu_stop(), with one CPU waiting for another that never
arrives.
It _looks_ like the 'other' cpu isn't running and the current best
theory is that we race on cpu-up and get the stop_two_cpus() call in
before the stopper task is running.
This _is_ possible because we set 'online && active' _before_ we do the
smpboot_unpark thing because of ONLINE notifier order.
The below test patch manually starts the stopper task early.
It boots and hotplugs a cpu on my test box, so it's not instantly broken.
---
kernel/sched/core.c | 7 ++++++-
kernel/stop_machine.c | 5 +++++
2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1764a0f..9a56ef7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5542,14 +5542,19 @@ static void set_cpu_rq_start_time(void)
rq->age_stamp = sched_clock_cpu(cpu);
}
+extern void cpu_stopper_unpark(unsigned int cpu);
+
static int sched_cpu_active(struct notifier_block *nfb,
unsigned long action, void *hcpu)
{
+ int cpu = (long)hcpu;
+
switch (action & ~CPU_TASKS_FROZEN) {
case CPU_STARTING:
set_cpu_rq_start_time();
return NOTIFY_OK;
case CPU_ONLINE:
+ cpu_stopper_unpark(cpu);
/*
* At this point a starting CPU has marked itself as online via
* set_cpu_online(). But it might not yet have marked itself
@@ -5558,7 +5563,7 @@ static int sched_cpu_active(struct notifier_block *nfb,
* Thus, fall-through and help the starting CPU along.
*/
case CPU_DOWN_FAILED:
- set_cpu_active((long)hcpu, true);
+ set_cpu_active(cpu, true);
return NOTIFY_OK;
default:
return NOTIFY_DONE;
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 12484e5..c674371 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -496,6 +496,11 @@ static struct smp_hotplug_thread cpu_stop_threads = {
.selfparking = true,
};
+void cpu_stopper_unpark(unsigned int cpu)
+{
+ kthread_unpark(per_cpu(cpu_stopper.thread, cpu));
+}
+
static int __init cpu_stop_init(void)
{
unsigned int cpu;
^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [RFC][PATCH] sched: Start stopper early
2015-10-07 8:41 [RFC][PATCH] sched: Start stopper early Peter Zijlstra
@ 2015-10-07 12:30 ` Oleg Nesterov
2015-10-07 12:38 ` Peter Zijlstra
2015-10-08 14:50 ` [PATCH 0/3] (Was: [RFC][PATCH] sched: Start stopper early) Oleg Nesterov
` (3 subsequent siblings)
4 siblings, 1 reply; 39+ messages in thread
From: Oleg Nesterov @ 2015-10-07 12:30 UTC (permalink / raw)
To: Peter Zijlstra
Cc: heiko.carstens, linux-kernel, Tejun Heo, Ingo Molnar, Rik van Riel
On 10/07, Peter Zijlstra wrote:
>
> So Heiko reported an 'interesting' failure where stop_two_cpus() got
> stuck in multi_cpu_stop(), with one CPU waiting for another that never
> arrives.
>
> It _looks_ like the 'other' cpu isn't running and the current best
> theory is that we race on cpu-up and get the stop_two_cpus() call in
> before the stopper task is running.
>
> This _is_ possible because we set 'online && active'
Argh. Can't really comment this change right now, but this reminds me
that stop_two_cpus() path should not rely on cpu_active() at all. I mean
we should not use this check to avoid the deadlock, migrate_swap_stop()
can check it itself. And cpu_stop_park()->cpu_stop_signal_done() should
be replaced by BUG_ON().
Probably slightly off-topic, but what do you finally think about the old
"[PATCH v2 6/6] stop_machine: kill stop_cpus_lock and lg_double_lock/unlock()"
we discussed in http://marc.info/?t=143750670300014 ?
I won't really insist if you still dislike it, but it seems we both
agree that "lg_lock stop_cpus_lock" must die in any case, and after that
we can do the cleanups mentioned above.
And, Peter, I see a lot of interesting emails from you, but currently
can't even read them. I hope very much I will read them later and perhaps
even reply ;)
Oleg.
* Re: [RFC][PATCH] sched: Start stopper early
2015-10-07 12:30 ` Oleg Nesterov
@ 2015-10-07 12:38 ` Peter Zijlstra
2015-10-07 13:20 ` Oleg Nesterov
0 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2015-10-07 12:38 UTC (permalink / raw)
To: Oleg Nesterov
Cc: heiko.carstens, linux-kernel, Tejun Heo, Ingo Molnar, Rik van Riel
On Wed, Oct 07, 2015 at 02:30:46PM +0200, Oleg Nesterov wrote:
> On 10/07, Peter Zijlstra wrote:
> >
> > So Heiko reported an 'interesting' failure where stop_two_cpus() got
> > stuck in multi_cpu_stop(), with one CPU waiting for another that never
> > arrives.
> >
> > It _looks_ like the 'other' cpu isn't running and the current best
> > theory is that we race on cpu-up and get the stop_two_cpus() call in
> > before the stopper task is running.
> >
> > This _is_ possible because we set 'online && active'
>
> Argh. Can't really comment this change right now, but this reminds me
> that stop_two_cpus() path should not rely on cpu_active() at all. I mean
> we should not use this check to avoid the deadlock, migrate_swap_stop()
> can check it itself. And cpu_stop_park()->cpu_stop_signal_done() should
> be replaced by BUG_ON().
>
> Probably slightly off-topic, but what do you finally think about the old
> "[PATCH v2 6/6] stop_machine: kill stop_cpus_lock and lg_double_lock/unlock()"
> we discussed in http://marc.info/?t=143750670300014 ?
>
> I won't really insist if you still dislike it, but it seems we both
> agree that "lg_lock stop_cpus_lock" must die in any case, and after that
> we can do the cleanups mentioned above.
Yes, I was looking at that, this issue reminded me we still had that
issue open.
> And, Peter, I see a lot of interesting emails from you, but currently
> can't even read them. I hope very much I will read them later and perhaps
> even reply ;)
Sure, take your time.
* Re: [RFC][PATCH] sched: Start stopper early
2015-10-07 12:38 ` Peter Zijlstra
@ 2015-10-07 13:20 ` Oleg Nesterov
2015-10-07 13:24 ` Oleg Nesterov
2015-10-07 13:36 ` kbuild test robot
0 siblings, 2 replies; 39+ messages in thread
From: Oleg Nesterov @ 2015-10-07 13:20 UTC (permalink / raw)
To: Peter Zijlstra
Cc: heiko.carstens, linux-kernel, Tejun Heo, Ingo Molnar, Rik van Riel
On 10/07, Peter Zijlstra wrote:
>
> On Wed, Oct 07, 2015 at 02:30:46PM +0200, Oleg Nesterov wrote:
> > On 10/07, Peter Zijlstra wrote:
> > >
> > > So Heiko reported an 'interesting' failure where stop_two_cpus() got
> > > stuck in multi_cpu_stop(), with one CPU waiting for another that never
> > > arrives.
> > >
> > > It _looks_ like the 'other' cpu isn't running and the current best
> > > theory is that we race on cpu-up and get the stop_two_cpus() call in
> > > before the stopper task is running.
> > >
> > > This _is_ possible because we set 'online && active'
> >
> > Argh. Can't really comment this change right now, but this reminds me
> > that stop_two_cpus() path should not rely on cpu_active() at all. I mean
> > we should not use this check to avoid the deadlock, migrate_swap_stop()
> > can check it itself. And cpu_stop_park()->cpu_stop_signal_done() should
> > be replaced by BUG_ON().
> >
> > Probably slightly off-topic, but what do you finally think about the old
> > "[PATCH v2 6/6] stop_machine: kill stop_cpus_lock and lg_double_lock/unlock()"
> > we discussed in http://marc.info/?t=143750670300014 ?
> >
> > I won't really insist if you still dislike it, but it seems we both
> > agree that "lg_lock stop_cpus_lock" must die in any case, and after that
> > we can do the cleanups mentioned above.
>
> Yes, I was looking at that, this issue reminded me we still had that
> issue open.
Great, thanks!
But let me add that I tried to confuse you because I forgot what I was
actually going to do... I meant something like the (incomplete) patch
below, and after that we can change stop_two_cpus() to rely on ->enabled
and remove the cpu_active() checks (again, ignoring the fact that we do
not want to migrate to an inactive CPU). Although I need to
recall/recheck all this, perhaps I missed something...
So while I think we should kill lg_lock in any case, this and the patch
above are absolutely off-topic; we can do this with or without the
lg_lock removal.
Oleg.
--- x/kernel/cpu.c
+++ x/kernel/cpu.c
@@ -344,7 +344,7 @@ static int take_cpu_down(void *_param)
/* Give up timekeeping duties */
tick_handover_do_timer();
/* Park the stopper thread */
- kthread_park(current);
+ stop_machine_park(param->hcpu);
return 0;
}
--- x/kernel/stop_machine.c
+++ x/kernel/stop_machine.c
@@ -452,6 +452,15 @@ repeat:
}
}
+void stop_machine_park(int cpu)
+{
+ struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu);
+
+ spin_lock(&stopper->lock);
+ stopper->enabled = false;
+ spin_unlock(&stopper->lock);
+}
+
extern void sched_set_stop_task(int cpu, struct task_struct *stop);
static void cpu_stop_create(unsigned int cpu)
@@ -468,10 +477,10 @@ static void cpu_stop_park(unsigned int c
/* drain remaining works */
spin_lock_irqsave(&stopper->lock, flags);
list_for_each_entry_safe(work, tmp, &stopper->works, list) {
+ WARN_ON(1);
list_del_init(&work->list);
cpu_stop_signal_done(work->done, false);
}
- stopper->enabled = false;
spin_unlock_irqrestore(&stopper->lock, flags);
}
* Re: [RFC][PATCH] sched: Start stopper early
2015-10-07 13:20 ` Oleg Nesterov
@ 2015-10-07 13:24 ` Oleg Nesterov
2015-10-07 13:36 ` kbuild test robot
1 sibling, 0 replies; 39+ messages in thread
From: Oleg Nesterov @ 2015-10-07 13:24 UTC (permalink / raw)
To: Peter Zijlstra
Cc: heiko.carstens, linux-kernel, Tejun Heo, Ingo Molnar, Rik van Riel
Damn sorry for noise ;)
On 10/07, Oleg Nesterov wrote:
>
> +void stop_machine_park(int cpu)
> +{
> + struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu);
> +
> + spin_lock(&stopper->lock);
> + stopper->enabled = false;
> + spin_unlock(&stopper->lock);
Of course, it should also do kthread_park(current).
Oleg.
* Re: Re: [RFC][PATCH] sched: Start stopper early
2015-10-07 13:20 ` Oleg Nesterov
2015-10-07 13:24 ` Oleg Nesterov
@ 2015-10-07 13:36 ` kbuild test robot
1 sibling, 0 replies; 39+ messages in thread
From: kbuild test robot @ 2015-10-07 13:36 UTC (permalink / raw)
To: Oleg Nesterov
Cc: kbuild-all, Peter Zijlstra, heiko.carstens, linux-kernel,
Tejun Heo, Ingo Molnar, Rik van Riel
Hi Oleg,
[auto build test ERROR on v4.3-rc4 -- if it's inappropriate base, please ignore]
config: x86_64-randconfig-x019-201540 (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64
All errors (new ones prefixed by >>):
kernel/cpu.c: In function 'take_cpu_down':
>> kernel/cpu.c:347:2: error: implicit declaration of function 'stop_machine_park' [-Werror=implicit-function-declaration]
stop_machine_park(param->hcpu);
^
cc1: some warnings being treated as errors
vim +/stop_machine_park +347 kernel/cpu.c
341 return err;
342
343 cpu_notify(CPU_DYING | param->mod, param->hcpu);
344 /* Give up timekeeping duties */
345 tick_handover_do_timer();
346 /* Park the stopper thread */
> 347 stop_machine_park(param->hcpu);
348 return 0;
349 }
350
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
* [PATCH 0/3] (Was: [RFC][PATCH] sched: Start stopper early)
2015-10-07 8:41 [RFC][PATCH] sched: Start stopper early Peter Zijlstra
2015-10-07 12:30 ` Oleg Nesterov
@ 2015-10-08 14:50 ` Oleg Nesterov
2015-10-08 14:51 ` [PATCH 1/3] stop_machine: ensure that a queued callback will be called before cpu_stop_park() Oleg Nesterov
` (2 more replies)
2015-10-08 18:05 ` [RFC][PATCH] sched: Start stopper early Oleg Nesterov
` (2 subsequent siblings)
4 siblings, 3 replies; 39+ messages in thread
From: Oleg Nesterov @ 2015-10-08 14:50 UTC (permalink / raw)
To: Peter Zijlstra
Cc: heiko.carstens, linux-kernel, Tejun Heo, Ingo Molnar,
Rik van Riel, Thomas Gleixner
On 10/07, Peter Zijlstra wrote:
>
> So Heiko reported an 'interesting' failure where stop_two_cpus() got
> stuck in multi_cpu_stop(), with one CPU waiting for another that never
> arrives.
>
> It _looks_ like the 'other' cpu isn't running and the current best
> theory is that we race on cpu-up and get the stop_two_cpus() call in
> before the stopper task is running.
How about this series? Slightly tested.
Note:
- To me this also looks like a preparation for lg lock removal, no
matter how exactly we will do this.
- We can do more cleanups on top of this. Say remove preempt_disable
in stop_two_cpus().
Oleg.
include/linux/stop_machine.h | 1
kernel/cpu.c | 2 -
kernel/stop_machine.c | 83 +++++++++++++++++++++++++++++--------------
3 files changed, 58 insertions(+), 28 deletions(-)
* [PATCH 1/3] stop_machine: ensure that a queued callback will be called before cpu_stop_park()
2015-10-08 14:50 ` [PATCH 0/3] (Was: [RFC][PATCH] sched: Start stopper early) Oleg Nesterov
@ 2015-10-08 14:51 ` Oleg Nesterov
2015-10-14 15:34 ` Peter Zijlstra
2015-10-20 9:32 ` [tip:sched/core] stop_machine: Ensure " tip-bot for Oleg Nesterov
2015-10-08 14:51 ` [PATCH 2/3] stop_machine: introduce __cpu_stop_queue_work() and cpu_stop_queue_two_works() Oleg Nesterov
2015-10-08 14:51 ` [PATCH 3/3] stop_machine: change cpu_stop_queue_two_works() to rely on stopper->enabled Oleg Nesterov
2 siblings, 2 replies; 39+ messages in thread
From: Oleg Nesterov @ 2015-10-08 14:51 UTC (permalink / raw)
To: Peter Zijlstra
Cc: heiko.carstens, Tejun Heo, Ingo Molnar, Rik van Riel,
Thomas Gleixner, linux-kernel
cpu_stop_queue_work() checks stopper->enabled before it queues the
work, but ->enabled == T can only guarantee cpu_stop_signal_done()
if we race with cpu_down().
This is not enough for stop_two_cpus() or stop_machine(), they will
deadlock if multi_cpu_stop() won't be called by one of the target
CPUs. stop_machine/stop_cpus are fine, they rely on stop_cpus_mutex.
But stop_two_cpus() has to check cpu_active() to avoid the same race
with hotplug, and this check is very unobvious and probably not even
correct if we race with cpu_up().
Change the cpu_down() path to clear ->enabled before cpu_stopper_thread()
flushes the pending ->works and returns with KTHREAD_SHOULD_PARK set.
Note also that smpboot_thread_call() calls cpu_stop_unpark() which
sets enabled == T at CPU_ONLINE stage, so this CPU can't go away until
cpu_stopper_thread() is called at least once. This all means that if
cpu_stop_queue_work() succeeds, we know that work->fn() will be called.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
include/linux/stop_machine.h | 1 +
kernel/cpu.c | 2 +-
kernel/stop_machine.c | 23 +++++++++++++----------
3 files changed, 15 insertions(+), 11 deletions(-)
diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
index 414d924..7b76362 100644
--- a/include/linux/stop_machine.h
+++ b/include/linux/stop_machine.h
@@ -33,6 +33,7 @@ void stop_one_cpu_nowait(unsigned int cpu, cpu_stop_fn_t fn, void *arg,
struct cpu_stop_work *work_buf);
int stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg);
int try_stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg);
+void stop_machine_park(int cpu);
#else /* CONFIG_SMP */
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 1f4566c..8a7225a 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -342,7 +342,7 @@ static int __ref take_cpu_down(void *_param)
/* Give up timekeeping duties */
tick_handover_do_timer();
/* Park the stopper thread */
- kthread_park(current);
+ stop_machine_park((long)param->hcpu);
return 0;
}
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 12484e5..6a40209 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -452,6 +452,18 @@ repeat:
}
}
+void stop_machine_park(int cpu)
+{
+ struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu);
+ /*
+ * Lockless. cpu_stopper_thread() will take stopper->lock and flush
+ * the pending works before it parks, until then it is fine to queue
+ * the new works.
+ */
+ stopper->enabled = false;
+ kthread_park(stopper->thread);
+}
+
extern void sched_set_stop_task(int cpu, struct task_struct *stop);
static void cpu_stop_create(unsigned int cpu)
@@ -462,17 +474,8 @@ static void cpu_stop_create(unsigned int cpu)
static void cpu_stop_park(unsigned int cpu)
{
struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu);
- struct cpu_stop_work *work, *tmp;
- unsigned long flags;
- /* drain remaining works */
- spin_lock_irqsave(&stopper->lock, flags);
- list_for_each_entry_safe(work, tmp, &stopper->works, list) {
- list_del_init(&work->list);
- cpu_stop_signal_done(work->done, false);
- }
- stopper->enabled = false;
- spin_unlock_irqrestore(&stopper->lock, flags);
+ WARN_ON(!list_empty(&stopper->works));
}
static void cpu_stop_unpark(unsigned int cpu)
--
1.5.5.1
* [PATCH 2/3] stop_machine: introduce __cpu_stop_queue_work() and cpu_stop_queue_two_works()
2015-10-08 14:50 ` [PATCH 0/3] (Was: [RFC][PATCH] sched: Start stopper early) Oleg Nesterov
2015-10-08 14:51 ` [PATCH 1/3] stop_machine: ensure that a queued callback will be called before cpu_stop_park() Oleg Nesterov
@ 2015-10-08 14:51 ` Oleg Nesterov
2015-10-20 9:33 ` [tip:sched/core] stop_machine: Introduce " tip-bot for Oleg Nesterov
2015-10-08 14:51 ` [PATCH 3/3] stop_machine: change cpu_stop_queue_two_works() to rely on stopper->enabled Oleg Nesterov
2 siblings, 1 reply; 39+ messages in thread
From: Oleg Nesterov @ 2015-10-08 14:51 UTC (permalink / raw)
To: Peter Zijlstra
Cc: heiko.carstens, Tejun Heo, Ingo Molnar, Rik van Riel,
Thomas Gleixner, linux-kernel
Preparation to simplify the review of the next change. Add two simple
helpers, __cpu_stop_queue_work() and cpu_stop_queue_two_works() which
simply take a bit of code from their callers.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
kernel/stop_machine.c | 37 ++++++++++++++++++++++++++-----------
1 files changed, 26 insertions(+), 11 deletions(-)
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 6a40209..688d6b3 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -73,21 +73,24 @@ static void cpu_stop_signal_done(struct cpu_stop_done *done, bool executed)
}
}
+static void __cpu_stop_queue_work(struct cpu_stopper *stopper,
+ struct cpu_stop_work *work)
+{
+ list_add_tail(&work->list, &stopper->works);
+ wake_up_process(stopper->thread);
+}
+
/* queue @work to @stopper. if offline, @work is completed immediately */
static void cpu_stop_queue_work(unsigned int cpu, struct cpu_stop_work *work)
{
struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu);
-
unsigned long flags;
spin_lock_irqsave(&stopper->lock, flags);
-
- if (stopper->enabled) {
- list_add_tail(&work->list, &stopper->works);
- wake_up_process(stopper->thread);
- } else
+ if (stopper->enabled)
+ __cpu_stop_queue_work(stopper, work);
+ else
cpu_stop_signal_done(work->done, false);
-
spin_unlock_irqrestore(&stopper->lock, flags);
}
@@ -213,6 +216,16 @@ static int multi_cpu_stop(void *data)
return err;
}
+static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
+ int cpu2, struct cpu_stop_work *work2)
+{
+ lg_double_lock(&stop_cpus_lock, cpu1, cpu2);
+ cpu_stop_queue_work(cpu1, work1);
+ cpu_stop_queue_work(cpu2, work2);
+ lg_double_unlock(&stop_cpus_lock, cpu1, cpu2);
+
+ return 0;
+}
/**
* stop_two_cpus - stops two cpus
* @cpu1: the cpu to stop
@@ -260,10 +273,12 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
return -ENOENT;
}
- lg_double_lock(&stop_cpus_lock, cpu1, cpu2);
- cpu_stop_queue_work(cpu1, &work1);
- cpu_stop_queue_work(cpu2, &work2);
- lg_double_unlock(&stop_cpus_lock, cpu1, cpu2);
+ if (cpu1 > cpu2)
+ swap(cpu1, cpu2);
+ if (cpu_stop_queue_two_works(cpu1, &work1, cpu2, &work2)) {
+ preempt_enable();
+ return -ENOENT;
+ }
preempt_enable();
--
1.5.5.1
* [PATCH 3/3] stop_machine: change cpu_stop_queue_two_works() to rely on stopper->enabled
2015-10-08 14:50 ` [PATCH 0/3] (Was: [RFC][PATCH] sched: Start stopper early) Oleg Nesterov
2015-10-08 14:51 ` [PATCH 1/3] stop_machine: ensure that a queued callback will be called before cpu_stop_park() Oleg Nesterov
2015-10-08 14:51 ` [PATCH 2/3] stop_machine: introduce __cpu_stop_queue_work() and cpu_stop_queue_two_works() Oleg Nesterov
@ 2015-10-08 14:51 ` Oleg Nesterov
2015-10-08 15:04 ` Peter Zijlstra
2015-10-08 17:01 ` [PATCH v2 " Oleg Nesterov
2 siblings, 2 replies; 39+ messages in thread
From: Oleg Nesterov @ 2015-10-08 14:51 UTC (permalink / raw)
To: Peter Zijlstra
Cc: heiko.carstens, Tejun Heo, Ingo Molnar, Rik van Riel,
Thomas Gleixner, linux-kernel
Change cpu_stop_queue_two_works() to ensure that both CPUs have
stopper->enabled == T or fail otherwise.
This way stop_two_cpus() no longer needs to check cpu_active() to
avoid the deadlock. This patch doesn't remove these checks, we will
do this later.
Note: we need to take both stopper->lock's at the same time, but this
will also help to remove lglock from stop_machine.c, so I hope this
is fine.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
kernel/stop_machine.c | 29 ++++++++++++++++++++---------
1 files changed, 20 insertions(+), 9 deletions(-)
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 688d6b3..6d85d27 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -219,12 +219,27 @@ static int multi_cpu_stop(void *data)
static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
int cpu2, struct cpu_stop_work *work2)
{
+ struct cpu_stopper *stopper1 = per_cpu_ptr(&cpu_stopper, cpu1);
+ struct cpu_stopper *stopper2 = per_cpu_ptr(&cpu_stopper, cpu2);
+ int err;
+
lg_double_lock(&stop_cpus_lock, cpu1, cpu2);
- cpu_stop_queue_work(cpu1, work1);
- cpu_stop_queue_work(cpu2, work2);
+ spin_lock_irq(&stopper1->lock);
+ spin_lock_nested(&stopper2->lock, SINGLE_DEPTH_NESTING);
+
+ err = -ENOENT;
+ if (!stopper1->enabled || !stopper2->enabled)
+ goto unlock;
+
+ err = 0;
+ __cpu_stop_queue_work(stopper1, work1);
+ __cpu_stop_queue_work(stopper2, work2);
+unlock:
+ spin_unlock(&stopper2->lock);
+ spin_unlock_irq(&stopper1->lock);
lg_double_unlock(&stop_cpus_lock, cpu1, cpu2);
- return 0;
+ return err;
}
/**
* stop_two_cpus - stops two cpus
@@ -261,12 +276,8 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
set_state(&msdata, MULTI_STOP_PREPARE);
/*
- * If we observe both CPUs active we know _cpu_down() cannot yet have
- * queued its stop_machine works and therefore ours will get executed
- * first. Or its not either one of our CPUs that's getting unplugged,
- * in which case we don't care.
- *
- * This relies on the stopper workqueues to be FIFO.
+ * We do not want to migrate to inactive CPU. FIXME: move this
+ * into the caller.
*/
if (!cpu_active(cpu1) || !cpu_active(cpu2)) {
preempt_enable();
--
1.5.5.1
* Re: [PATCH 3/3] stop_machine: change cpu_stop_queue_two_works() to rely on stopper->enabled
2015-10-08 14:51 ` [PATCH 3/3] stop_machine: change cpu_stop_queue_two_works() to rely on stopper->enabled Oleg Nesterov
@ 2015-10-08 15:04 ` Peter Zijlstra
2015-10-08 15:59 ` Oleg Nesterov
2015-10-08 17:01 ` [PATCH v2 " Oleg Nesterov
1 sibling, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2015-10-08 15:04 UTC (permalink / raw)
To: Oleg Nesterov
Cc: heiko.carstens, Tejun Heo, Ingo Molnar, Rik van Riel,
Thomas Gleixner, linux-kernel
On Thu, Oct 08, 2015 at 04:51:36PM +0200, Oleg Nesterov wrote:
> @@ -261,12 +276,8 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
> set_state(&msdata, MULTI_STOP_PREPARE);
>
> /*
> + * We do not want to migrate to inactive CPU. FIXME: move this
> + * into the caller.
> */
> if (!cpu_active(cpu1) || !cpu_active(cpu2)) {
> preempt_enable();
So we cannot move that into the caller.. because this function sleeps
with wait_for_completion().
Or rather, it would force the caller to use get_online_cpus(), which we
worked really hard to avoid.
Also, I think we still want the patch I proposed which ensures the
stopper thread is active 'early', because the load balancer pretty much
assumes that its available. And when 'online && active' the
load-balancer is fully available.
Not only the numa balancing stop_two_cpus() caller relies on it, but
also the self migration stuff does, and at CPU_ONLINE time the cpu
really is 'free' to run anything.
* Re: [PATCH 3/3] stop_machine: change cpu_stop_queue_two_works() to rely on stopper->enabled
2015-10-08 15:04 ` Peter Zijlstra
@ 2015-10-08 15:59 ` Oleg Nesterov
2015-10-08 16:08 ` Oleg Nesterov
0 siblings, 1 reply; 39+ messages in thread
From: Oleg Nesterov @ 2015-10-08 15:59 UTC (permalink / raw)
To: Peter Zijlstra
Cc: heiko.carstens, Tejun Heo, Ingo Molnar, Rik van Riel,
Thomas Gleixner, linux-kernel
On 10/08, Peter Zijlstra wrote:
>
> On Thu, Oct 08, 2015 at 04:51:36PM +0200, Oleg Nesterov wrote:
> > @@ -261,12 +276,8 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
> > set_state(&msdata, MULTI_STOP_PREPARE);
> >
> > /*
> > + * We do not want to migrate to inactive CPU. FIXME: move this
> > + * into the caller.
> > */
> > if (!cpu_active(cpu1) || !cpu_active(cpu2)) {
> > preempt_enable();
>
> So we cannot move that into the caller..
Why?
> because this function sleeps
> with wait_for_completion().
>
> Or rather, it would force the caller to use get_online_cpus(), which we
> worked really hard to avoid.
Aaah wait. Sorry for confusion!
I meant "move this into the callback, migrate_swap_stop()".
> Also, I think we still want the patch I proposed which ensures the
> stopper thread is active 'early', because the load balancer pretty much
Perhaps. Although I do not really understand why it is important.
I mean, either way we unpark it at CPU_ONLINE stage, just
sched_cpu_active() has a higher priority.
But this is off-topic in a sense that the main point of this patch
is that stop_two_cpus() no longer needs to abuse cpu_active() checks
to avoid the race with cpu_up/down, we can simply rely on ->enabled.
And again, we need to take both locks to remove "lglock stop_cpus_lock".
So I think your change can be applied after this series too. Or I missed
something?
Oleg.
* Re: [PATCH 3/3] stop_machine: change cpu_stop_queue_two_works() to rely on stopper->enabled
2015-10-08 15:59 ` Oleg Nesterov
@ 2015-10-08 16:08 ` Oleg Nesterov
0 siblings, 0 replies; 39+ messages in thread
From: Oleg Nesterov @ 2015-10-08 16:08 UTC (permalink / raw)
To: Peter Zijlstra
Cc: heiko.carstens, Tejun Heo, Ingo Molnar, Rik van Riel,
Thomas Gleixner, linux-kernel
On 10/08, Oleg Nesterov wrote:
>
> On 10/08, Peter Zijlstra wrote:
> >
> > On Thu, Oct 08, 2015 at 04:51:36PM +0200, Oleg Nesterov wrote:
> > > @@ -261,12 +276,8 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
> > > set_state(&msdata, MULTI_STOP_PREPARE);
> > >
> > > /*
> > > + * We do not want to migrate to inactive CPU. FIXME: move this
> > > + * into the caller.
> > > */
> > > if (!cpu_active(cpu1) || !cpu_active(cpu2)) {
> > > preempt_enable();
> >
> > So we cannot move that into the caller..
>
> Why?
>
> > because this function sleeps
> > with wait_for_completion().
> >
> > Or rather, it would force the caller to use get_online_cpus(), which we
> > worked really hard to avoid.
>
> Aaah wait. Sorry for confusion!
>
> I meant "move this into the callback, migrate_swap_stop()".
Forgot to mention... And note that both these cpu_active() checks are
obviously racy: CPU_DOWN_PREPARE can make the CPU inactive right after
the check.
> > Also, I think we still want the patch I proposed which ensures the
> > stopper thread is active 'early', because the load balancer pretty much
>
> Perhaps. Although I do not really understand why it is important.
> I mean, either way we unpark it at CPU_ONLINE stage, just
> sched_cpu_active() has a higher priority.
>
> But this is off-topic in a sense that the main point of this patch
> is that stop_two_cpus() no longer needs to abuse cpu_active() checks
> to avoid the race with cpu_up/down, we can simply rely on ->enabled.
>
> And again, we need to take both locks to remove "lglock stop_cpus_lock".
>
> So I think your change can be applied after this series too. Or I missed
> something?
So I think this series makes sense anyway and hopefully should fix the
problem with or without your change.
Oleg.
* [PATCH v2 3/3] stop_machine: change cpu_stop_queue_two_works() to rely on stopper->enabled
2015-10-08 14:51 ` [PATCH 3/3] stop_machine: change cpu_stop_queue_two_works() to rely on stopper->enabled Oleg Nesterov
2015-10-08 15:04 ` Peter Zijlstra
@ 2015-10-08 17:01 ` Oleg Nesterov
2015-10-09 16:37 ` Peter Zijlstra
2015-10-20 9:33 ` [tip:sched/core] stop_machine: Change " tip-bot for Oleg Nesterov
1 sibling, 2 replies; 39+ messages in thread
From: Oleg Nesterov @ 2015-10-08 17:01 UTC (permalink / raw)
To: Peter Zijlstra
Cc: heiko.carstens, Tejun Heo, Ingo Molnar, Rik van Riel,
Thomas Gleixner, linux-kernel
On 10/08, Oleg Nesterov wrote:
>
> + * We do not want to migrate to inactive CPU. FIXME: move this
> + * into the caller.
> */
> if (!cpu_active(cpu1) || !cpu_active(cpu2)) {
> preempt_enable();
Of course, this comment is indeed wrong, thanks. Please see V2.
-------------------------------------------------------------------------------
From 41d6d14e318335212ffac093c09de0b197235b90 Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <oleg@redhat.com>
Date: Thu, 8 Oct 2015 16:22:06 +0200
Subject: [PATCH v2 3/3] stop_machine: change cpu_stop_queue_two_works() to rely on stopper->enabled
Change cpu_stop_queue_two_works() to ensure that both CPUs have
stopper->enabled == T or fail otherwise.
This way stop_two_cpus() no longer needs to check cpu_active() to
avoid the deadlock. This patch doesn't remove these checks, we will
do this later.
Note: we need to take both stopper->lock's at the same time, but this
will also help to remove lglock from stop_machine.c, so I hope this
is fine.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
kernel/stop_machine.c | 29 ++++++++++++++++++++---------
1 files changed, 20 insertions(+), 9 deletions(-)
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 688d6b3..91fbb10 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -219,12 +219,27 @@ static int multi_cpu_stop(void *data)
static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
int cpu2, struct cpu_stop_work *work2)
{
+ struct cpu_stopper *stopper1 = per_cpu_ptr(&cpu_stopper, cpu1);
+ struct cpu_stopper *stopper2 = per_cpu_ptr(&cpu_stopper, cpu2);
+ int err;
+
lg_double_lock(&stop_cpus_lock, cpu1, cpu2);
- cpu_stop_queue_work(cpu1, work1);
- cpu_stop_queue_work(cpu2, work2);
+ spin_lock_irq(&stopper1->lock);
+ spin_lock_nested(&stopper2->lock, SINGLE_DEPTH_NESTING);
+
+ err = -ENOENT;
+ if (!stopper1->enabled || !stopper2->enabled)
+ goto unlock;
+
+ err = 0;
+ __cpu_stop_queue_work(stopper1, work1);
+ __cpu_stop_queue_work(stopper2, work2);
+unlock:
+ spin_unlock(&stopper2->lock);
+ spin_unlock_irq(&stopper1->lock);
lg_double_unlock(&stop_cpus_lock, cpu1, cpu2);
- return 0;
+ return err;
}
/**
* stop_two_cpus - stops two cpus
@@ -261,12 +276,8 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
set_state(&msdata, MULTI_STOP_PREPARE);
/*
- * If we observe both CPUs active we know _cpu_down() cannot yet have
- * queued its stop_machine works and therefore ours will get executed
- * first. Or its not either one of our CPUs that's getting unplugged,
- * in which case we don't care.
- *
- * This relies on the stopper workqueues to be FIFO.
+ * We do not want to migrate to inactive CPU. FIXME: move this
+ * into migrate_swap_stop() callback.
*/
if (!cpu_active(cpu1) || !cpu_active(cpu2)) {
preempt_enable();
--
1.5.5.1
^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [RFC][PATCH] sched: Start stopper early
2015-10-07 8:41 [RFC][PATCH] sched: Start stopper early Peter Zijlstra
2015-10-07 12:30 ` Oleg Nesterov
2015-10-08 14:50 ` [PATCH 0/3] (Was: [RFC][PATCH] sched: Start stopper early) Oleg Nesterov
@ 2015-10-08 18:05 ` Oleg Nesterov
2015-10-08 18:47 ` Oleg Nesterov
2015-10-09 16:00 ` [PATCH 0/3] make stopper threads more "selfparking" Oleg Nesterov
2015-10-16 8:22 ` [RFC][PATCH] " Heiko Carstens
4 siblings, 1 reply; 39+ messages in thread
From: Oleg Nesterov @ 2015-10-08 18:05 UTC (permalink / raw)
To: Peter Zijlstra
Cc: heiko.carstens, linux-kernel, Tejun Heo, Ingo Molnar, Rik van Riel
To avoid confusion, let me repeat that I am not arguing against
this change; perhaps it makes sense too.
But unless I missed something, it is not really correct and can't
fix the problem. So I still think the series I sent should be
applied first.
On 10/07, Peter Zijlstra wrote:
> static int sched_cpu_active(struct notifier_block *nfb,
> unsigned long action, void *hcpu)
> {
> + int cpu = (long)hcpu;
> +
> switch (action & ~CPU_TASKS_FROZEN) {
> case CPU_STARTING:
> set_cpu_rq_start_time();
> return NOTIFY_OK;
> case CPU_ONLINE:
> + cpu_stopper_unpark(cpu);
> /*
> * At this point a starting CPU has marked itself as online via
> * set_cpu_online(). But it might not yet have marked itself
> @@ -5558,7 +5563,7 @@ static int sched_cpu_active(struct notifier_block *nfb,
> * Thus, fall-through and help the starting CPU along.
> */
> case CPU_DOWN_FAILED:
> - set_cpu_active((long)hcpu, true);
> + set_cpu_active(cpu, true);
> return NOTIFY_OK;
> default:
> return NOTIFY_DONE;
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index 12484e5..c674371 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -496,6 +496,11 @@ static struct smp_hotplug_thread cpu_stop_threads = {
> .selfparking = true,
> };
>
> +void cpu_stopper_unpark(unsigned int cpu)
> +{
> + kthread_unpark(per_cpu(cpu_stopper.thread, cpu));
> +}
But note that kthread_unpark() will only wake the stopper thread up.
cpu_stopper->enabled is still false, and it will be false until
smpboot_unpark_thread() calls ->pre_unpark() later. And this means
that stop_two_cpus()->cpu_stop_queue_work() can silently fail until
then. So I don't think this patch can fix the problem.
Oleg.
* Re: [RFC][PATCH] sched: Start stopper early
2015-10-08 18:05 ` [RFC][PATCH] sched: Start stopper early Oleg Nesterov
@ 2015-10-08 18:47 ` Oleg Nesterov
0 siblings, 0 replies; 39+ messages in thread
From: Oleg Nesterov @ 2015-10-08 18:47 UTC (permalink / raw)
To: Peter Zijlstra
Cc: heiko.carstens, linux-kernel, Tejun Heo, Ingo Molnar, Rik van Riel
On 10/08, Oleg Nesterov wrote:
>
> To avoid the confusion, let me repeat that I am not arguing with
> this change, perhaps it makes sense too.
>
> But unless I missed something it is not really correct and can't
> fix the problem. So I still think the series I sent should be
> applied first.
...
> But note that kthread_unpark() will only wake the stopper thread up.
>
> cpu_stopper->enabled is still false, and it will be false until
> smpboot_unpark_thread() calls ->pre_unpark() later. And this means
> that stop_two_cpus()->cpu_stop_queue_work() can silently fail until
> then. So I don't think this patch can fix the problem.
So I'd suggest something like the patch below (uncompiled/untested)
on top of the series I sent.
Note also that (at least IMO) it looks like a cleanup, and even
->selfparking becomes more consistent.
What do you think?
Oleg.
--- a/include/linux/smpboot.h
+++ b/include/linux/smpboot.h
@@ -40,7 +40,6 @@ struct smp_hotplug_thread {
void (*cleanup)(unsigned int cpu, bool online);
void (*park)(unsigned int cpu);
void (*unpark)(unsigned int cpu);
- void (*pre_unpark)(unsigned int cpu);
bool selfparking;
const char *thread_comm;
};
diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
index 7b76362..0adedca 100644
--- a/include/linux/stop_machine.h
+++ b/include/linux/stop_machine.h
@@ -34,6 +34,7 @@ void stop_one_cpu_nowait(unsigned int cpu, cpu_stop_fn_t fn, void *arg,
int stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg);
int try_stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg);
void stop_machine_park(int cpu);
+void stop_machine_unpark(int cpu);
#else /* CONFIG_SMP */
--- a/kernel/smpboot.c
+++ b/kernel/smpboot.c
@@ -221,9 +221,8 @@ static void smpboot_unpark_thread(struct smp_hotplug_thread *ht, unsigned int cp
{
struct task_struct *tsk = *per_cpu_ptr(ht->store, cpu);
- if (ht->pre_unpark)
- ht->pre_unpark(cpu);
- kthread_unpark(tsk);
+ if (!ht->selfparking)
+ kthread_unpark(tsk);
}
void smpboot_unpark_threads(unsigned int cpu)
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -513,6 +513,14 @@ static void cpu_stop_unpark(unsigned int cpu)
spin_unlock_irq(&stopper->lock);
}
+void stop_machine_unpark(int cpu)
+{
+ struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu);
+
+ cpu_stop_unpark(cpu);
+ kthread_unpark(stopper->thread);
+}
+
static struct smp_hotplug_thread cpu_stop_threads = {
.store = &cpu_stopper.thread,
.thread_should_run = cpu_stop_should_run,
@@ -521,7 +529,6 @@ static struct smp_hotplug_thread cpu_stop_threads = {
.create = cpu_stop_create,
.setup = cpu_stop_unpark,
.park = cpu_stop_park,
- .pre_unpark = cpu_stop_unpark,
.selfparking = true,
};
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5515,11 +5515,14 @@ static void set_cpu_rq_start_time(void)
static int sched_cpu_active(struct notifier_block *nfb,
unsigned long action, void *hcpu)
{
+ int cpu = (long)hcpu;
+
switch (action & ~CPU_TASKS_FROZEN) {
case CPU_STARTING:
set_cpu_rq_start_time();
return NOTIFY_OK;
case CPU_ONLINE:
+ stop_machine_unpark(cpu);
/*
* At this point a starting CPU has marked itself as online via
* set_cpu_online(). But it might not yet have marked itself
@@ -5528,7 +5531,7 @@ static int sched_cpu_active(struct notif
* Thus, fall-through and help the starting CPU along.
*/
case CPU_DOWN_FAILED:
- set_cpu_active((long)hcpu, true);
+ set_cpu_active(cpu, true);
return NOTIFY_OK;
default:
return NOTIFY_DONE;
* [PATCH 0/3] make stopper threads more "selfparking"
2015-10-07 8:41 [RFC][PATCH] sched: Start stopper early Peter Zijlstra
` (2 preceding siblings ...)
2015-10-08 18:05 ` [RFC][PATCH] sched: Start stopper early Oleg Nesterov
@ 2015-10-09 16:00 ` Oleg Nesterov
2015-10-09 16:00 ` [PATCH 1/3] stop_machine: kill smp_hotplug_thread->pre_unpark, introduce stop_machine_unpark() Oleg Nesterov
` (2 more replies)
2015-10-16 8:22 ` [RFC][PATCH] " Heiko Carstens
4 siblings, 3 replies; 39+ messages in thread
From: Oleg Nesterov @ 2015-10-09 16:00 UTC (permalink / raw)
To: Peter Zijlstra
Cc: heiko.carstens, linux-kernel, Tejun Heo, Ingo Molnar,
Rik van Riel, Thomas Gleixner
On top of "[PATCH 0/3]" I sent yesterday, although it doesn't
really depend on that series.
Peter, note that 3/3 is from you ;) I hope you won't object.
Thomas, could you please comment? 1/3 looks like a cleanup to me
(if correct), but cleanups are always subjective, so please nack
if you don't like it for any reason.
Oleg.
include/linux/smpboot.h | 4 ----
include/linux/stop_machine.h | 1 +
kernel/sched/core.c | 5 ++++-
kernel/smpboot.c | 5 ++---
kernel/stop_machine.c | 8 +++-----
5 files changed, 10 insertions(+), 13 deletions(-)
* [PATCH 1/3] stop_machine: kill smp_hotplug_thread->pre_unpark, introduce stop_machine_unpark()
2015-10-09 16:00 ` [PATCH 0/3] make stopper threads more "selfparking" Oleg Nesterov
@ 2015-10-09 16:00 ` Oleg Nesterov
2015-10-20 9:33 ` [tip:sched/core] stop_machine: Kill smp_hotplug_thread-> pre_unpark, " tip-bot for Oleg Nesterov
2015-10-09 16:00 ` [PATCH 2/3] stop_machine: kill cpu_stop_threads->setup() and cpu_stop_unpark() Oleg Nesterov
2015-10-09 16:00 ` [PATCH 3/3] sched: start stopper early Oleg Nesterov
2 siblings, 1 reply; 39+ messages in thread
From: Oleg Nesterov @ 2015-10-09 16:00 UTC (permalink / raw)
To: Peter Zijlstra
Cc: heiko.carstens, Tejun Heo, Ingo Molnar, Rik van Riel,
Thomas Gleixner, linux-kernel
1. Change smpboot_unpark_thread() to check ->selfparking, just
like smpboot_park_thread() does.
2. Introduce stop_machine_unpark() which sets ->enabled and calls
kthread_unpark().
3. Change smpboot_thread_call() and cpu_stop_init() to call
stop_machine_unpark() by hand.
This way:
- IMO the ->selfparking logic becomes more consistent.
- We can kill the smp_hotplug_thread->pre_unpark() method.
- We can easily unpark the stopper thread earlier. Say, we
can move stop_machine_unpark() from smpboot_thread_call()
to sched_cpu_active() as Peter suggests.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
include/linux/smpboot.h | 4 ----
include/linux/stop_machine.h | 1 +
kernel/cpu.c | 1 +
kernel/smpboot.c | 5 ++---
kernel/stop_machine.c | 10 +++++++++-
5 files changed, 13 insertions(+), 8 deletions(-)
diff --git a/include/linux/smpboot.h b/include/linux/smpboot.h
index da3c593..255661e 100644
--- a/include/linux/smpboot.h
+++ b/include/linux/smpboot.h
@@ -24,9 +24,6 @@ struct smpboot_thread_data;
* parked (cpu offline)
* @unpark: Optional unpark function, called when the thread is
* unparked (cpu online)
- * @pre_unpark: Optional unpark function, called before the thread is
- * unparked (cpu online). This is not guaranteed to be
- * called on the target cpu of the thread. Careful!
* @cpumask: Internal state. To update which threads are unparked,
* call smpboot_update_cpumask_percpu_thread().
* @selfparking: Thread is not parked by the park function.
@@ -42,7 +39,6 @@ struct smp_hotplug_thread {
void (*cleanup)(unsigned int cpu, bool online);
void (*park)(unsigned int cpu);
void (*unpark)(unsigned int cpu);
- void (*pre_unpark)(unsigned int cpu);
cpumask_var_t cpumask;
bool selfparking;
const char *thread_comm;
diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
index 7b76362..0adedca 100644
--- a/include/linux/stop_machine.h
+++ b/include/linux/stop_machine.h
@@ -34,6 +34,7 @@ void stop_one_cpu_nowait(unsigned int cpu, cpu_stop_fn_t fn, void *arg,
int stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg);
int try_stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg);
void stop_machine_park(int cpu);
+void stop_machine_unpark(int cpu);
#else /* CONFIG_SMP */
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 8a7225a..ef52b3b 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -464,6 +464,7 @@ static int smpboot_thread_call(struct notifier_block *nfb,
switch (action & ~CPU_TASKS_FROZEN) {
case CPU_ONLINE:
+ stop_machine_unpark(cpu);
smpboot_unpark_threads(cpu);
break;
diff --git a/kernel/smpboot.c b/kernel/smpboot.c
index c697f73..cfd29cb 100644
--- a/kernel/smpboot.c
+++ b/kernel/smpboot.c
@@ -221,9 +221,8 @@ static void smpboot_unpark_thread(struct smp_hotplug_thread *ht, unsigned int cp
{
struct task_struct *tsk = *per_cpu_ptr(ht->store, cpu);
- if (ht->pre_unpark)
- ht->pre_unpark(cpu);
- kthread_unpark(tsk);
+ if (!ht->selfparking)
+ kthread_unpark(tsk);
}
void smpboot_unpark_threads(unsigned int cpu)
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 91fbb10..59096a5 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -513,6 +513,14 @@ static void cpu_stop_unpark(unsigned int cpu)
spin_unlock_irq(&stopper->lock);
}
+void stop_machine_unpark(int cpu)
+{
+ struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu);
+
+ cpu_stop_unpark(cpu);
+ kthread_unpark(stopper->thread);
+}
+
static struct smp_hotplug_thread cpu_stop_threads = {
.store = &cpu_stopper.thread,
.thread_should_run = cpu_stop_should_run,
@@ -521,7 +529,6 @@ static struct smp_hotplug_thread cpu_stop_threads = {
.create = cpu_stop_create,
.setup = cpu_stop_unpark,
.park = cpu_stop_park,
- .pre_unpark = cpu_stop_unpark,
.selfparking = true,
};
@@ -537,6 +544,7 @@ static int __init cpu_stop_init(void)
}
BUG_ON(smpboot_register_percpu_thread(&cpu_stop_threads));
+ stop_machine_unpark(raw_smp_processor_id());
stop_machine_initialized = true;
return 0;
}
--
1.5.5.1
* [PATCH 2/3] stop_machine: kill cpu_stop_threads->setup() and cpu_stop_unpark()
2015-10-09 16:00 ` [PATCH 0/3] make stopper threads more "selfparking" Oleg Nesterov
2015-10-09 16:00 ` [PATCH 1/3] stop_machine: kill smp_hotplug_thread->pre_unpark, introduce stop_machine_unpark() Oleg Nesterov
@ 2015-10-09 16:00 ` Oleg Nesterov
2015-10-20 9:34 ` [tip:sched/core] stop_machine: Kill " tip-bot for Oleg Nesterov
2015-10-09 16:00 ` [PATCH 3/3] sched: start stopper early Oleg Nesterov
2 siblings, 1 reply; 39+ messages in thread
From: Oleg Nesterov @ 2015-10-09 16:00 UTC (permalink / raw)
To: Peter Zijlstra
Cc: heiko.carstens, Tejun Heo, Ingo Molnar, Rik van Riel,
Thomas Gleixner, linux-kernel
Now that we always use stop_machine_unpark() to wake the stopper
threads up, we can kill ->setup() and fold cpu_stop_unpark() into
stop_machine_unpark().
And we do not need stopper->lock to set stopper->enabled = true.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
kernel/stop_machine.c | 12 +-----------
1 files changed, 1 insertions(+), 11 deletions(-)
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 59096a5..e5a09d2 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -504,20 +504,11 @@ static void cpu_stop_park(unsigned int cpu)
WARN_ON(!list_empty(&stopper->works));
}
-static void cpu_stop_unpark(unsigned int cpu)
-{
- struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu);
-
- spin_lock_irq(&stopper->lock);
- stopper->enabled = true;
- spin_unlock_irq(&stopper->lock);
-}
-
void stop_machine_unpark(int cpu)
{
struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu);
- cpu_stop_unpark(cpu);
+ stopper->enabled = true;
kthread_unpark(stopper->thread);
}
@@ -527,7 +518,6 @@ static struct smp_hotplug_thread cpu_stop_threads = {
.thread_fn = cpu_stopper_thread,
.thread_comm = "migration/%u",
.create = cpu_stop_create,
- .setup = cpu_stop_unpark,
.park = cpu_stop_park,
.selfparking = true,
};
--
1.5.5.1
* [PATCH 3/3] sched: start stopper early
2015-10-09 16:00 ` [PATCH 0/3] make stopper threads more "selfparking" Oleg Nesterov
2015-10-09 16:00 ` [PATCH 1/3] stop_machine: kill smp_hotplug_thread->pre_unpark, introduce stop_machine_unpark() Oleg Nesterov
2015-10-09 16:00 ` [PATCH 2/3] stop_machine: kill cpu_stop_threads->setup() and cpu_stop_unpark() Oleg Nesterov
@ 2015-10-09 16:00 ` Oleg Nesterov
2015-10-09 16:49 ` Oleg Nesterov
2015-10-20 9:34 ` [tip:sched/core] sched: Start " tip-bot for Peter Zijlstra
2 siblings, 2 replies; 39+ messages in thread
From: Oleg Nesterov @ 2015-10-09 16:00 UTC (permalink / raw)
To: Peter Zijlstra
Cc: heiko.carstens, Tejun Heo, Ingo Molnar, Rik van Riel,
Thomas Gleixner, linux-kernel
From: Peter Zijlstra <peterz@infradead.org>
Ensure the stopper thread is active 'early', because the load balancer
pretty much assumes that it's available. And when 'online && active' the
load-balancer is fully available.
Not only does the numa balancing stop_two_cpus() caller rely on it, the
self migration stuff does too, and at CPU_ONLINE time the cpu really is
'free' to run anything.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
kernel/cpu.c | 1 -
kernel/sched/core.c | 5 ++++-
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/kernel/cpu.c b/kernel/cpu.c
index ef52b3b..8a7225a 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -464,7 +464,6 @@ static int smpboot_thread_call(struct notifier_block *nfb,
switch (action & ~CPU_TASKS_FROZEN) {
case CPU_ONLINE:
- stop_machine_unpark(cpu);
smpboot_unpark_threads(cpu);
break;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e691052..955c181 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5324,11 +5324,14 @@ static void __cpuinit set_cpu_rq_start_time(void)
static int sched_cpu_active(struct notifier_block *nfb,
unsigned long action, void *hcpu)
{
+ int cpu = (long)hcpu;
+
switch (action & ~CPU_TASKS_FROZEN) {
case CPU_STARTING:
set_cpu_rq_start_time();
return NOTIFY_OK;
case CPU_ONLINE:
+ stop_machine_unpark(cpu);
/*
* At this point a starting CPU has marked itself as online via
* set_cpu_online(). But it might not yet have marked itself
@@ -5337,7 +5340,7 @@ static int sched_cpu_active(struct notifier_block *nfb,
* Thus, fall-through and help the starting CPU along.
*/
case CPU_DOWN_FAILED:
- set_cpu_active((long)hcpu, true);
+ set_cpu_active(cpu, true);
return NOTIFY_OK;
default:
return NOTIFY_DONE;
--
1.5.5.1
* Re: [PATCH v2 3/3] stop_machine: change cpu_stop_queue_two_works() to rely on stopper->enabled
2015-10-08 17:01 ` [PATCH v2 " Oleg Nesterov
@ 2015-10-09 16:37 ` Peter Zijlstra
2015-10-09 16:40 ` Oleg Nesterov
2015-10-20 9:33 ` [tip:sched/core] stop_machine: Change " tip-bot for Oleg Nesterov
1 sibling, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2015-10-09 16:37 UTC (permalink / raw)
To: Oleg Nesterov
Cc: heiko.carstens, Tejun Heo, Ingo Molnar, Rik van Riel,
Thomas Gleixner, linux-kernel
On Thu, Oct 08, 2015 at 07:01:41PM +0200, Oleg Nesterov wrote:
> @@ -261,12 +276,8 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
> set_state(&msdata, MULTI_STOP_PREPARE);
>
> /*
> - * If we observe both CPUs active we know _cpu_down() cannot yet have
> - * queued its stop_machine works and therefore ours will get executed
> - * first. Or its not either one of our CPUs that's getting unplugged,
> - * in which case we don't care.
> - *
> - * This relies on the stopper workqueues to be FIFO.
> + * We do not want to migrate to inactive CPU. FIXME: move this
> + * into migrate_swap_stop() callback.
> */
> if (!cpu_active(cpu1) || !cpu_active(cpu2)) {
> preempt_enable();
I stuck that on top. I'll have a closer look at the 7 patches later,
when I might be more coherent (mad headache atm).
---
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1335,12 +1335,16 @@ static int migrate_swap_stop(void *data)
struct rq *src_rq, *dst_rq;
int ret = -EAGAIN;
+ if (!cpu_active(arg->src_cpu) || !cpu_active(arg->dst_cpu))
+ return -EAGAIN;
+
src_rq = cpu_rq(arg->src_cpu);
dst_rq = cpu_rq(arg->dst_cpu);
double_raw_lock(&arg->src_task->pi_lock,
&arg->dst_task->pi_lock);
double_rq_lock(src_rq, dst_rq);
+
if (task_cpu(arg->dst_task) != arg->dst_cpu)
goto unlock;
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -275,15 +275,6 @@ int stop_two_cpus(unsigned int cpu1, uns
cpu_stop_init_done(&done, 2);
set_state(&msdata, MULTI_STOP_PREPARE);
- /*
- * We do not want to migrate to inactive CPU. FIXME: move this
- * into migrate_swap_stop() callback.
- */
- if (!cpu_active(cpu1) || !cpu_active(cpu2)) {
- preempt_enable();
- return -ENOENT;
- }
-
if (cpu1 > cpu2)
swap(cpu1, cpu2);
if (cpu_stop_queue_two_works(cpu1, &work1, cpu2, &work2)) {
* Re: [PATCH v2 3/3] stop_machine: change cpu_stop_queue_two_works() to rely on stopper->enabled
2015-10-09 16:37 ` Peter Zijlstra
@ 2015-10-09 16:40 ` Oleg Nesterov
0 siblings, 0 replies; 39+ messages in thread
From: Oleg Nesterov @ 2015-10-09 16:40 UTC (permalink / raw)
To: Peter Zijlstra
Cc: heiko.carstens, Tejun Heo, Ingo Molnar, Rik van Riel,
Thomas Gleixner, linux-kernel
On 10/09, Peter Zijlstra wrote:
>
> I stuck that on top..
Yes, exactly! Plus I'll send another (minor) cleanup on top of
this if everything goes right.
> I'll have a closer look at the 7 patches later
> when I might be more coherent (mad head-ache atm.)
Thanks!
Hmm... but please, please see another email I'll send in a minute
in reply to today's 3/3...
Oleg.
* Re: [PATCH 3/3] sched: start stopper early
2015-10-09 16:00 ` [PATCH 3/3] sched: start stopper early Oleg Nesterov
@ 2015-10-09 16:49 ` Oleg Nesterov
2015-10-20 9:34 ` [tip:sched/core] sched: Start " tip-bot for Peter Zijlstra
1 sibling, 0 replies; 39+ messages in thread
From: Oleg Nesterov @ 2015-10-09 16:49 UTC (permalink / raw)
To: Peter Zijlstra
Cc: heiko.carstens, Tejun Heo, Ingo Molnar, Rik van Riel,
Thomas Gleixner, linux-kernel, Vitaly Kuznetsov
On 10/09, Oleg Nesterov wrote:
>
> From: Peter Zijlstra <peterz@infradead.org>
Peter, I tried to compromise you.
> case CPU_ONLINE:
> + stop_machine_unpark(cpu);
> /*
> * At this point a starting CPU has marked itself as online via
> * set_cpu_online(). But it might not yet have marked itself
> @@ -5337,7 +5340,7 @@ static int sched_cpu_active(struct notifier_block *nfb,
> * Thus, fall-through and help the starting CPU along.
> */
> case CPU_DOWN_FAILED:
> - set_cpu_active((long)hcpu, true);
> + set_cpu_active(cpu, true);
On second thought, we can't do this (and your initial change has
the same problem).
We cannot wake it up before set_cpu_active(). This can lead to the
same problem fixed by dd9d3843755da95f6 "sched: Fix cpu_active_mask/
cpu_online_mask race". The stopper thread can hit
BUG_ON(td->cpu != smp_processor_id()) in smpboot_thread_fn().
Easy to fix: CPU_ONLINE should do set_cpu_active() itself and not
fall through to CPU_DOWN_FAILED:

	case CPU_ONLINE:
		set_cpu_active(cpu, true);
		stop_machine_unpark(cpu);
		break;
But. This is another proof that stop_two_cpus() must not rely on
cpu_active().
Right?
Oleg.
* Re: [PATCH 1/3] stop_machine: ensure that a queued callback will be called before cpu_stop_park()
2015-10-08 14:51 ` [PATCH 1/3] stop_machine: ensure that a queued callback will be called before cpu_stop_park() Oleg Nesterov
@ 2015-10-14 15:34 ` Peter Zijlstra
2015-10-14 19:03 ` Oleg Nesterov
2015-10-20 9:32 ` [tip:sched/core] stop_machine: Ensure " tip-bot for Oleg Nesterov
1 sibling, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2015-10-14 15:34 UTC (permalink / raw)
To: Oleg Nesterov
Cc: heiko.carstens, Tejun Heo, Ingo Molnar, Rik van Riel,
Thomas Gleixner, linux-kernel
On Thu, Oct 08, 2015 at 04:51:31PM +0200, Oleg Nesterov wrote:
> cpu_stop_queue_work() checks stopper->enabled before it queues the
> work, but ->enabled == T can only guarantee cpu_stop_signal_done()
> if we race with cpu_down().
>
> This is not enough for stop_two_cpus() or stop_machine(), they will
> deadlock if multi_cpu_stop() won't be called by one of the target
> CPU's. stop_machine/stop_cpus are fine, they rely on stop_cpus_mutex.
> But stop_two_cpus() has to check cpu_active() to avoid the same race
> with hotplug, and this check is very unobvious and probably not even
> correct if we race with cpu_up().
>
> Change cpu_down() pass to clear ->enabled before cpu_stopper_thread()
> flushes the pending ->works and returns with KTHREAD_SHOULD_PARK set.
>
> Note also that smpboot_thread_call() calls cpu_stop_unpark() which
> sets enabled == T at CPU_ONLINE stage, so this CPU can't go away until
> cpu_stopper_thread() is called at least once. This all means that if
> cpu_stop_queue_work() succeeds, we know that work->fn() will be called.
This hard relies on the fact that cpu_down uses stop machine, right?
IIRC part of the hotplug rework Thomas is doing is geared towards
breaking away from stop machine. There is nothing fundamental about
hot-unplug that requires stop machine.
* Re: [PATCH 1/3] stop_machine: ensure that a queued callback will be called before cpu_stop_park()
2015-10-14 15:34 ` Peter Zijlstra
@ 2015-10-14 19:03 ` Oleg Nesterov
2015-10-14 20:32 ` Peter Zijlstra
0 siblings, 1 reply; 39+ messages in thread
From: Oleg Nesterov @ 2015-10-14 19:03 UTC (permalink / raw)
To: Peter Zijlstra
Cc: heiko.carstens, Tejun Heo, Ingo Molnar, Rik van Riel,
Thomas Gleixner, linux-kernel
On 10/14, Peter Zijlstra wrote:
>
> On Thu, Oct 08, 2015 at 04:51:31PM +0200, Oleg Nesterov wrote:
> > cpu_stop_queue_work() checks stopper->enabled before it queues the
> > work, but ->enabled == T can only guarantee cpu_stop_signal_done()
> > if we race with cpu_down().
> >
> > This is not enough for stop_two_cpus() or stop_machine(), they will
> > deadlock if multi_cpu_stop() won't be called by one of the target
> > CPU's. stop_machine/stop_cpus are fine, they rely on stop_cpus_mutex.
> > But stop_two_cpus() has to check cpu_active() to avoid the same race
> > with hotplug, and this check is very unobvious and probably not even
> > correct if we race with cpu_up().
> >
> > Change cpu_down() pass to clear ->enabled before cpu_stopper_thread()
> > flushes the pending ->works and returns with KTHREAD_SHOULD_PARK set.
> >
> > Note also that smpboot_thread_call() calls cpu_stop_unpark() which
> > sets enabled == T at CPU_ONLINE stage, so this CPU can't go away until
> > cpu_stopper_thread() is called at least once. This all means that if
> > cpu_stop_queue_work() succeeds, we know that work->fn() will be called.
>
> This hard relies on the fact that cpu_down uses stop machine, right?
Not really.
> IIRC part of the hotplug rework Thomas is doing is geared towards
> breaking away from stop machine. There is nothing fundamental about
> hot-unplug that requires stop machine.
cpu_down() should park/kill/whatever the percpu stopper thread anyway.
And this path should clear ->enabled; it can also flush the pending
works.
And we need this anyway even if cpu_down() won't use stop_machine(),
I think.
Oleg.
* Re: [PATCH 1/3] stop_machine: ensure that a queued callback will be called before cpu_stop_park()
2015-10-14 19:03 ` Oleg Nesterov
@ 2015-10-14 20:32 ` Peter Zijlstra
2015-10-15 17:02 ` Oleg Nesterov
0 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2015-10-14 20:32 UTC (permalink / raw)
To: Oleg Nesterov
Cc: heiko.carstens, Tejun Heo, Ingo Molnar, Rik van Riel,
Thomas Gleixner, linux-kernel
On Wed, Oct 14, 2015 at 09:03:56PM +0200, Oleg Nesterov wrote:
> On 10/14, Peter Zijlstra wrote:
> >
> > On Thu, Oct 08, 2015 at 04:51:31PM +0200, Oleg Nesterov wrote:
> > > cpu_stop_queue_work() checks stopper->enabled before it queues the
> > > work, but ->enabled == T can only guarantee cpu_stop_signal_done()
> > > if we race with cpu_down().
> > >
> > > This is not enough for stop_two_cpus() or stop_machine(), they will
> > > deadlock if multi_cpu_stop() won't be called by one of the target
> > > CPU's. stop_machine/stop_cpus are fine, they rely on stop_cpus_mutex.
> > > But stop_two_cpus() has to check cpu_active() to avoid the same race
> > > with hotplug, and this check is very unobvious and probably not even
> > > correct if we race with cpu_up().
> > >
> > > Change cpu_down() pass to clear ->enabled before cpu_stopper_thread()
> > > flushes the pending ->works and returns with KTHREAD_SHOULD_PARK set.
> > >
> > > Note also that smpboot_thread_call() calls cpu_stop_unpark() which
> > > sets enabled == T at CPU_ONLINE stage, so this CPU can't go away until
> > > cpu_stopper_thread() is called at least once. This all means that if
> > > cpu_stop_queue_work() succeeds, we know that work->fn() will be called.
> >
> > This hard relies on the fact that cpu_down uses stop machine, right?
>
> Not really.
>
> > IIRC part of the hotplug rework Thomas is doing is geared towards
> > breaking away from stop machine. There is nothing fundamental about
> > hot-unplug that requires stop machine.
>
> cpu_down() should park/kill/whatever the percpu stopper thread anyway.
> And this path should clear ->enabled, it can also flush the pending
> works.
So the proposed patch does: ->enabled=false; park();, which can race
with if (->enabled) wake();
smpboot_thread_fn() will not call ->thread_fn() when should_park(), and
thus any pending work will not get flushed.
It only works now because the stopper task calls park(), which means
cpu_stopper_thread() will flush, but that very much relies on the
stopper thread calling park in itself.
Or I'm just terminally confused.. which is entirely possible.
* Re: [PATCH 1/3] stop_machine: ensure that a queued callback will be called before cpu_stop_park()
2015-10-14 20:32 ` Peter Zijlstra
@ 2015-10-15 17:02 ` Oleg Nesterov
2015-10-16 10:49 ` Peter Zijlstra
0 siblings, 1 reply; 39+ messages in thread
From: Oleg Nesterov @ 2015-10-15 17:02 UTC (permalink / raw)
To: Peter Zijlstra
Cc: heiko.carstens, Tejun Heo, Ingo Molnar, Rik van Riel,
Thomas Gleixner, linux-kernel
On 10/14, Peter Zijlstra wrote:
>
> On Wed, Oct 14, 2015 at 09:03:56PM +0200, Oleg Nesterov wrote:
> > On 10/14, Peter Zijlstra wrote:
> > >
> > > On Thu, Oct 08, 2015 at 04:51:31PM +0200, Oleg Nesterov wrote:
> > > >
> > > > Note also that smpboot_thread_call() calls cpu_stop_unpark() which
> > > > sets enabled == T at CPU_ONLINE stage, so this CPU can't go away until
> > > > cpu_stopper_thread() is called at least once. This all means that if
> > > > cpu_stop_queue_work() succeeds, we know that work->fn() will be called.
> > >
> > > This hard relies on the fact that cpu_down uses stop machine, right?
> >
> > Not really.
> >
> > > IIRC part of the hotplug rework Thomas is doing is geared towards
> > > breaking away from stop machine. There is nothing fundamental about
> > > hot-unplug that requires stop machine.
> >
> > cpu_down() should park/kill/whatever the percpu stopper thread anyway.
> > And this path should clear ->enabled, it can also flush the pending
> > works.
>
> So the proposed patch does: ->enabled=false; park();, which can race
> with if (->enabled) wake();
Yes, so I added the comment to explain that this is fine.
> smpboot_thread_fn() will not call ->thread_fn() when should_park(), and
> thus any pending work will not get flushed.
>
> It only works now because the stopper task calls park(), which means
> cpu_stopper_thread() will flush, but that very much relies on the
> stopper thread calling park in itself.
Yes. IOW, this relies on ->selfparking == T which implies "flush before
park".
But even if we change cpu_down() to avoid stop_machine(), I think we
need to keep this "selfparking" logic. In the sense that, for example,
this code
	void func(int cpu, cpu_stop_fn_t fn)
	{
		get_online_cpus();
		if (cpu_online(cpu)) {
			int ret = stop_one_cpu(cpu, fn, NULL);
			BUG_ON(ret == -ENOENT);
		}
		put_online_cpus();
	}
should be correct. Actually this example is not very good; it would
be better to use stop_one_cpu_nowait(), but currently it returns "void"
and, hmm, it looks buggy ;) I'll send the fix on top of this series if
you accept it.
Oleg.
* Re: [RFC][PATCH] sched: Start stopper early
2015-10-07 8:41 [RFC][PATCH] sched: Start stopper early Peter Zijlstra
` (3 preceding siblings ...)
2015-10-09 16:00 ` [PATCH 0/3] make stopper threads more "selfparking" Oleg Nesterov
@ 2015-10-16 8:22 ` Heiko Carstens
2015-10-16 9:57 ` Peter Zijlstra
4 siblings, 1 reply; 39+ messages in thread
From: Heiko Carstens @ 2015-10-16 8:22 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, Tejun Heo, Oleg Nesterov, Ingo Molnar,
Rik van Riel, Martin Schwidefsky, Michael Holzheu,
Tobias Orlamuende
On Wed, Oct 07, 2015 at 10:41:10AM +0200, Peter Zijlstra wrote:
> Hi,
>
> So Heiko reported some 'interesting' fail where stop_two_cpus() got
> stuck in multi_cpu_stop() with one cpu waiting for another that never
> happens.
>
> It _looks_ like the 'other' cpu isn't running and the current best
> theory is that we race on cpu-up and get the stop_two_cpus() call in
> before the stopper task is running.
>
> This _is_ possible because we set 'online && active' _before_ we do the
> smpboot_unpark thing because of ONLINE notifier order.
>
> The below test patch manually starts the stopper task early.
>
> It boots and hotplugs a cpu on my test box so its not insta broken.
>
> ---
> kernel/sched/core.c | 7 ++++++-
> kernel/stop_machine.c | 5 +++++
> 2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 1764a0f..9a56ef7 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5542,14 +5542,19 @@ static void set_cpu_rq_start_time(void)
> rq->age_stamp = sched_clock_cpu(cpu);
> }
>
> +extern void cpu_stopper_unpark(unsigned int cpu);
> +
> static int sched_cpu_active(struct notifier_block *nfb,
> unsigned long action, void *hcpu)
> {
> + int cpu = (long)hcpu;
> +
> switch (action & ~CPU_TASKS_FROZEN) {
> case CPU_STARTING:
> set_cpu_rq_start_time();
> return NOTIFY_OK;
> case CPU_ONLINE:
> + cpu_stopper_unpark(cpu);
> /*
> * At this point a starting CPU has marked itself as online via
> * set_cpu_online(). But it might not yet have marked itself
> @@ -5558,7 +5563,7 @@ static int sched_cpu_active(struct notifier_block *nfb,
> * Thus, fall-through and help the starting CPU along.
> */
> case CPU_DOWN_FAILED:
> - set_cpu_active((long)hcpu, true);
> + set_cpu_active(cpu, true);
> return NOTIFY_OK;
> default:
> return NOTIFY_DONE;
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index 12484e5..c674371 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -496,6 +496,11 @@ static struct smp_hotplug_thread cpu_stop_threads = {
> .selfparking = true,
> };
>
> +void cpu_stopper_unpark(unsigned int cpu)
> +{
> + kthread_unpark(per_cpu(cpu_stopper.thread, cpu));
> +}
> +
So, actually this doesn't fix the bug and it _seems_ to be reproducible.
[ FWIW, I will be offline for the next two weeks ]
The bug was reproduced with your patch applied to 4.2.0 (+ couple of
unrelated internal patches).
In addition I cherry-picked these two upstream commits:
dd9d3843755d "sched: Fix cpu_active_mask/cpu_online_mask race"
02cb7aa923ec "stop_machine: Move 'cpu_stopper_task' and
'stop_cpus_work' into 'struct cpu_stopper'"
The new dump again shows one cpu looping in multi_cpu_stop() triggered by
stop_two_cpus(), and the second one will never enter multi_cpu_stop() since
the corresponding cpu_stop_work was never enqueued:
The two cpu_stop_work structures on the stack of the process that invoked
stop_two_cpus() look like this:
crash> struct cpu_stop_work 0x8ad8afa78
struct cpu_stop_work {
list = {
next = 0x8ad8afa78,
prev = 0x8ad8afa78
},
fn = 0x2091b0 <multi_cpu_stop>,
arg = 0x8ad8afac8,
done = 0x8ad8afaf0
}
crash> struct cpu_stop_work 0x8ad8afaa0
struct cpu_stop_work {
list = {
next = 0x0, <---- NULL indicates it was never enqueued
prev = 0x0
},
fn = 0x2091b0 <multi_cpu_stop>,
arg = 0x8ad8afac8,
done = 0x8ad8afaf0
}
The corresponding struct cpu_stop_done below indicates that at least for
one of them cpu_stop_signal_done() was called (nr_todo == 1). So the idea
is still that this happened when cpu_stop_queue_work() was being called,
but the corresponding stopper was not enabled.
crash> struct -x cpu_stop_done 00000008ad8afaf0
struct cpu_stop_done {
nr_todo = {
counter = 0x1
},
executed = 0x0,
ret = 0x0,
completion = {
done = 0x0,
wait = {
lock = {
{
rlock = {
raw_lock = {
lock = 0x0
},
break_lock = 0x0,
magic = 0xdead4ead,
owner_cpu = 0xffffffff,
owner = 0xffffffffffffffff,
dep_map = {
key = 0x1e901e0 <__key.5629>,
class_cache = {0x188fec0 <lock_classes+298096>, 0x0},
name = 0xb40d0c "&x->wait",
cpu = 0xb,
ip = 0x94e5b2
}
},
{
__padding = "\000\000\000\000\000\000\000\000 ޭN\255\377\377\377\377\377\377\377\377\377\377\377\377",
dep_map = {
key = 0x1e901e0 <__key.5629>,
class_cache = {0x188fec0 <lock_classes+298096>, 0x0},
name = 0xb40d0c "&x->wait",
cpu = 0xb,
ip = 0x94e5b2
}
}
}
},
task_list = {
next = 0x8ad8afa20,
prev = 0x8ad8afa20
}
}
}
}
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [RFC][PATCH] sched: Start stopper early
2015-10-16 8:22 ` [RFC][PATCH] " Heiko Carstens
@ 2015-10-16 9:57 ` Peter Zijlstra
2015-10-16 12:01 ` Heiko Carstens
0 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2015-10-16 9:57 UTC (permalink / raw)
To: Heiko Carstens
Cc: linux-kernel, Tejun Heo, Oleg Nesterov, Ingo Molnar,
Rik van Riel, Martin Schwidefsky, Michael Holzheu,
Tobias Orlamuende
On Fri, Oct 16, 2015 at 10:22:12AM +0200, Heiko Carstens wrote:
> So, actually this doesn't fix the bug and it _seems_ to be reproducible.
>
> [ FWIW, I will be offline for the next two weeks ]
So the series from Oleg would be good to try; I can make a git tree for
you, or otherwise stuff the lot into a single patch.
Should I be talking to someone else whilst you're having down time?
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 1/3] stop_machine: ensure that a queued callback will be called before cpu_stop_park()
2015-10-15 17:02 ` Oleg Nesterov
@ 2015-10-16 10:49 ` Peter Zijlstra
0 siblings, 0 replies; 39+ messages in thread
From: Peter Zijlstra @ 2015-10-16 10:49 UTC (permalink / raw)
To: Oleg Nesterov
Cc: heiko.carstens, Tejun Heo, Ingo Molnar, Rik van Riel,
Thomas Gleixner, linux-kernel
On Thu, Oct 15, 2015 at 07:02:47PM +0200, Oleg Nesterov wrote:
> Yes. IOW, this relies on ->selfparking == T which implies "flush before
> park".
Ah, we have a different reading of 'selfparking'. OK.
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [RFC][PATCH] sched: Start stopper early
2015-10-16 9:57 ` Peter Zijlstra
@ 2015-10-16 12:01 ` Heiko Carstens
2015-10-26 14:24 ` Michael Holzheu
0 siblings, 1 reply; 39+ messages in thread
From: Heiko Carstens @ 2015-10-16 12:01 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, Tejun Heo, Oleg Nesterov, Ingo Molnar,
Rik van Riel, Martin Schwidefsky, Michael Holzheu,
Tobias Orlamuende
On Fri, Oct 16, 2015 at 11:57:06AM +0200, Peter Zijlstra wrote:
> On Fri, Oct 16, 2015 at 10:22:12AM +0200, Heiko Carstens wrote:
> > So, actually this doesn't fix the bug and it _seems_ to be reproducible.
> >
> > [ FWIW, I will be offline for the next two weeks ]
>
> So the series from Oleg would be good to try; I can make a git tree for
> you, or otherwise stuff the lot into a single patch.
>
> Should I be talking to someone else whilst you're having down time?
Yes Michael Holzheu (on cc), can take care of this.
Thanks,
Heiko
^ permalink raw reply [flat|nested] 39+ messages in thread
* [tip:sched/core] stop_machine: Ensure that a queued callback will be called before cpu_stop_park()
2015-10-08 14:51 ` [PATCH 1/3] stop_machine: ensure that a queued callback will be called before cpu_stop_park() Oleg Nesterov
2015-10-14 15:34 ` Peter Zijlstra
@ 2015-10-20 9:32 ` tip-bot for Oleg Nesterov
1 sibling, 0 replies; 39+ messages in thread
From: tip-bot for Oleg Nesterov @ 2015-10-20 9:32 UTC (permalink / raw)
To: linux-tip-commits
Cc: efault, tglx, akpm, paulmck, linux-kernel, oleg, mingo, hpa,
torvalds, tj, riel, peterz
Commit-ID: 233e7f267e580fefdeb36628b7efe8bfe056d27c
Gitweb: http://git.kernel.org/tip/233e7f267e580fefdeb36628b7efe8bfe056d27c
Author: Oleg Nesterov <oleg@redhat.com>
AuthorDate: Thu, 8 Oct 2015 16:51:31 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 20 Oct 2015 10:23:53 +0200
stop_machine: Ensure that a queued callback will be called before cpu_stop_park()
cpu_stop_queue_work() checks stopper->enabled before it queues the
work, but ->enabled == T can only guarantee cpu_stop_signal_done()
if we race with cpu_down().
This is not enough for stop_two_cpus() or stop_machine(), they will
deadlock if multi_cpu_stop() won't be called by one of the target
CPU's. stop_machine/stop_cpus are fine, they rely on stop_cpus_mutex.
But stop_two_cpus() has to check cpu_active() to avoid the same race
with hotplug, and this check is very unobvious and probably not even
correct if we race with cpu_up().
Change the cpu_down() path to clear ->enabled before cpu_stopper_thread()
flushes the pending ->works and returns with KTHREAD_SHOULD_PARK set.
Note also that smpboot_thread_call() calls cpu_stop_unpark() which
sets enabled == T at CPU_ONLINE stage, so this CPU can't go away until
cpu_stopper_thread() is called at least once. This all means that if
cpu_stop_queue_work() succeeds, we know that work->fn() will be called.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: heiko.carstens@de.ibm.com
Link: http://lkml.kernel.org/r/20151008145131.GA18139@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
include/linux/stop_machine.h | 1 +
kernel/cpu.c | 2 +-
kernel/stop_machine.c | 23 +++++++++++++----------
3 files changed, 15 insertions(+), 11 deletions(-)
diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
index 414d924..7b76362 100644
--- a/include/linux/stop_machine.h
+++ b/include/linux/stop_machine.h
@@ -33,6 +33,7 @@ void stop_one_cpu_nowait(unsigned int cpu, cpu_stop_fn_t fn, void *arg,
struct cpu_stop_work *work_buf);
int stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg);
int try_stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg);
+void stop_machine_park(int cpu);
#else /* CONFIG_SMP */
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 050c634..c85df27 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -344,7 +344,7 @@ static int take_cpu_down(void *_param)
/* Give up timekeeping duties */
tick_handover_do_timer();
/* Park the stopper thread */
- kthread_park(current);
+ stop_machine_park((long)param->hcpu);
return 0;
}
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 12484e5..6a40209 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -452,6 +452,18 @@ repeat:
}
}
+void stop_machine_park(int cpu)
+{
+ struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu);
+ /*
+ * Lockless. cpu_stopper_thread() will take stopper->lock and flush
+ * the pending works before it parks, until then it is fine to queue
+ * the new works.
+ */
+ stopper->enabled = false;
+ kthread_park(stopper->thread);
+}
+
extern void sched_set_stop_task(int cpu, struct task_struct *stop);
static void cpu_stop_create(unsigned int cpu)
@@ -462,17 +474,8 @@ static void cpu_stop_create(unsigned int cpu)
static void cpu_stop_park(unsigned int cpu)
{
struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu);
- struct cpu_stop_work *work, *tmp;
- unsigned long flags;
- /* drain remaining works */
- spin_lock_irqsave(&stopper->lock, flags);
- list_for_each_entry_safe(work, tmp, &stopper->works, list) {
- list_del_init(&work->list);
- cpu_stop_signal_done(work->done, false);
- }
- stopper->enabled = false;
- spin_unlock_irqrestore(&stopper->lock, flags);
+ WARN_ON(!list_empty(&stopper->works));
}
static void cpu_stop_unpark(unsigned int cpu)
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [tip:sched/core] stop_machine: Introduce __cpu_stop_queue_work() and cpu_stop_queue_two_works()
2015-10-08 14:51 ` [PATCH 2/3] stop_machine: introduce __cpu_stop_queue_work() and cpu_stop_queue_two_works() Oleg Nesterov
@ 2015-10-20 9:33 ` tip-bot for Oleg Nesterov
0 siblings, 0 replies; 39+ messages in thread
From: tip-bot for Oleg Nesterov @ 2015-10-20 9:33 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, efault, akpm, mingo, tglx, tj, peterz, riel,
torvalds, hpa, paulmck, oleg
Commit-ID: 5caa1c089aebcb83ccd5b79a3b88b0aa58288d05
Gitweb: http://git.kernel.org/tip/5caa1c089aebcb83ccd5b79a3b88b0aa58288d05
Author: Oleg Nesterov <oleg@redhat.com>
AuthorDate: Thu, 8 Oct 2015 16:51:34 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 20 Oct 2015 10:23:54 +0200
stop_machine: Introduce __cpu_stop_queue_work() and cpu_stop_queue_two_works()
Preparation to simplify the review of the next change. Add two simple
helpers, __cpu_stop_queue_work() and cpu_stop_queue_two_works() which
simply take a bit of code from their callers.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: heiko.carstens@de.ibm.com
Link: http://lkml.kernel.org/r/20151008145134.GA18146@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
kernel/stop_machine.c | 37 ++++++++++++++++++++++++++-----------
1 file changed, 26 insertions(+), 11 deletions(-)
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 6a40209..688d6b3 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -73,21 +73,24 @@ static void cpu_stop_signal_done(struct cpu_stop_done *done, bool executed)
}
}
+static void __cpu_stop_queue_work(struct cpu_stopper *stopper,
+ struct cpu_stop_work *work)
+{
+ list_add_tail(&work->list, &stopper->works);
+ wake_up_process(stopper->thread);
+}
+
/* queue @work to @stopper. if offline, @work is completed immediately */
static void cpu_stop_queue_work(unsigned int cpu, struct cpu_stop_work *work)
{
struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu);
-
unsigned long flags;
spin_lock_irqsave(&stopper->lock, flags);
-
- if (stopper->enabled) {
- list_add_tail(&work->list, &stopper->works);
- wake_up_process(stopper->thread);
- } else
+ if (stopper->enabled)
+ __cpu_stop_queue_work(stopper, work);
+ else
cpu_stop_signal_done(work->done, false);
-
spin_unlock_irqrestore(&stopper->lock, flags);
}
@@ -213,6 +216,16 @@ static int multi_cpu_stop(void *data)
return err;
}
+static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
+ int cpu2, struct cpu_stop_work *work2)
+{
+ lg_double_lock(&stop_cpus_lock, cpu1, cpu2);
+ cpu_stop_queue_work(cpu1, work1);
+ cpu_stop_queue_work(cpu2, work2);
+ lg_double_unlock(&stop_cpus_lock, cpu1, cpu2);
+
+ return 0;
+}
/**
* stop_two_cpus - stops two cpus
* @cpu1: the cpu to stop
@@ -260,10 +273,12 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
return -ENOENT;
}
- lg_double_lock(&stop_cpus_lock, cpu1, cpu2);
- cpu_stop_queue_work(cpu1, &work1);
- cpu_stop_queue_work(cpu2, &work2);
- lg_double_unlock(&stop_cpus_lock, cpu1, cpu2);
+ if (cpu1 > cpu2)
+ swap(cpu1, cpu2);
+ if (cpu_stop_queue_two_works(cpu1, &work1, cpu2, &work2)) {
+ preempt_enable();
+ return -ENOENT;
+ }
preempt_enable();
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [tip:sched/core] stop_machine: Change cpu_stop_queue_two_works() to rely on stopper->enabled
2015-10-08 17:01 ` [PATCH v2 " Oleg Nesterov
2015-10-09 16:37 ` Peter Zijlstra
@ 2015-10-20 9:33 ` tip-bot for Oleg Nesterov
1 sibling, 0 replies; 39+ messages in thread
From: tip-bot for Oleg Nesterov @ 2015-10-20 9:33 UTC (permalink / raw)
To: linux-tip-commits
Cc: peterz, tglx, mingo, oleg, riel, akpm, linux-kernel, tj, hpa,
efault, paulmck, torvalds
Commit-ID: d8bc853582bfd81a9c08ca6922aeb01570080ccc
Gitweb: http://git.kernel.org/tip/d8bc853582bfd81a9c08ca6922aeb01570080ccc
Author: Oleg Nesterov <oleg@redhat.com>
AuthorDate: Thu, 8 Oct 2015 19:01:41 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 20 Oct 2015 10:23:55 +0200
stop_machine: Change cpu_stop_queue_two_works() to rely on stopper->enabled
Change cpu_stop_queue_two_works() to ensure that both CPU's have
stopper->enabled == T or fail otherwise.
This way stop_two_cpus() no longer needs to check cpu_active() to
avoid the deadlock. This patch doesn't remove these checks, we will
do this later.
Note: we need to take both stopper->locks at the same time, but this
will also help to remove lglock from stop_machine.c, so I hope this
is fine.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: heiko.carstens@de.ibm.com
Link: http://lkml.kernel.org/r/20151008170141.GA25537@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
kernel/stop_machine.c | 29 ++++++++++++++++++++---------
1 file changed, 20 insertions(+), 9 deletions(-)
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 688d6b3..91fbb10 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -219,12 +219,27 @@ static int multi_cpu_stop(void *data)
static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
int cpu2, struct cpu_stop_work *work2)
{
+ struct cpu_stopper *stopper1 = per_cpu_ptr(&cpu_stopper, cpu1);
+ struct cpu_stopper *stopper2 = per_cpu_ptr(&cpu_stopper, cpu2);
+ int err;
+
lg_double_lock(&stop_cpus_lock, cpu1, cpu2);
- cpu_stop_queue_work(cpu1, work1);
- cpu_stop_queue_work(cpu2, work2);
+ spin_lock_irq(&stopper1->lock);
+ spin_lock_nested(&stopper2->lock, SINGLE_DEPTH_NESTING);
+
+ err = -ENOENT;
+ if (!stopper1->enabled || !stopper2->enabled)
+ goto unlock;
+
+ err = 0;
+ __cpu_stop_queue_work(stopper1, work1);
+ __cpu_stop_queue_work(stopper2, work2);
+unlock:
+ spin_unlock(&stopper2->lock);
+ spin_unlock_irq(&stopper1->lock);
lg_double_unlock(&stop_cpus_lock, cpu1, cpu2);
- return 0;
+ return err;
}
/**
* stop_two_cpus - stops two cpus
@@ -261,12 +276,8 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
set_state(&msdata, MULTI_STOP_PREPARE);
/*
- * If we observe both CPUs active we know _cpu_down() cannot yet have
- * queued its stop_machine works and therefore ours will get executed
- * first. Or its not either one of our CPUs that's getting unplugged,
- * in which case we don't care.
- *
- * This relies on the stopper workqueues to be FIFO.
+ * We do not want to migrate to inactive CPU. FIXME: move this
+ * into migrate_swap_stop() callback.
*/
if (!cpu_active(cpu1) || !cpu_active(cpu2)) {
preempt_enable();
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [tip:sched/core] stop_machine: Kill smp_hotplug_thread-> pre_unpark, introduce stop_machine_unpark()
2015-10-09 16:00 ` [PATCH 1/3] stop_machine: kill smp_hotplug_thread->pre_unpark, introduce stop_machine_unpark() Oleg Nesterov
@ 2015-10-20 9:33 ` tip-bot for Oleg Nesterov
0 siblings, 0 replies; 39+ messages in thread
From: tip-bot for Oleg Nesterov @ 2015-10-20 9:33 UTC (permalink / raw)
To: linux-tip-commits
Cc: tj, tglx, mingo, riel, paulmck, akpm, oleg, efault, linux-kernel,
torvalds, peterz, hpa
Commit-ID: c00166d87e730088d919814020e96ffed129d0d1
Gitweb: http://git.kernel.org/tip/c00166d87e730088d919814020e96ffed129d0d1
Author: Oleg Nesterov <oleg@redhat.com>
AuthorDate: Fri, 9 Oct 2015 18:00:49 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 20 Oct 2015 10:23:55 +0200
stop_machine: Kill smp_hotplug_thread->pre_unpark, introduce stop_machine_unpark()
1. Change smpboot_unpark_thread() to check ->selfparking, just
like smpboot_park_thread() does.
2. Introduce stop_machine_unpark() which sets ->enabled and calls
kthread_unpark().
3. Change smpboot_thread_call() and cpu_stop_init() to call
stop_machine_unpark() by hand.
This way:
- IMO the ->selfparking logic becomes more consistent.
- We can kill the smp_hotplug_thread->pre_unpark() method.
- We can easily unpark the stopper thread earlier. Say, we
can move stop_machine_unpark() from smpboot_thread_call()
to sched_cpu_active() as Peter suggests.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: heiko.carstens@de.ibm.com
Link: http://lkml.kernel.org/r/20151009160049.GA10166@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
include/linux/smpboot.h | 4 ----
include/linux/stop_machine.h | 1 +
kernel/cpu.c | 1 +
kernel/smpboot.c | 5 ++---
kernel/stop_machine.c | 10 +++++++++-
5 files changed, 13 insertions(+), 8 deletions(-)
diff --git a/include/linux/smpboot.h b/include/linux/smpboot.h
index e6109a6..12910cf 100644
--- a/include/linux/smpboot.h
+++ b/include/linux/smpboot.h
@@ -24,9 +24,6 @@ struct smpboot_thread_data;
* parked (cpu offline)
* @unpark: Optional unpark function, called when the thread is
* unparked (cpu online)
- * @pre_unpark: Optional unpark function, called before the thread is
- * unparked (cpu online). This is not guaranteed to be
- * called on the target cpu of the thread. Careful!
* @cpumask: Internal state. To update which threads are unparked,
* call smpboot_update_cpumask_percpu_thread().
* @selfparking: Thread is not parked by the park function.
@@ -42,7 +39,6 @@ struct smp_hotplug_thread {
void (*cleanup)(unsigned int cpu, bool online);
void (*park)(unsigned int cpu);
void (*unpark)(unsigned int cpu);
- void (*pre_unpark)(unsigned int cpu);
cpumask_var_t cpumask;
bool selfparking;
const char *thread_comm;
diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
index 7b76362..0adedca 100644
--- a/include/linux/stop_machine.h
+++ b/include/linux/stop_machine.h
@@ -34,6 +34,7 @@ void stop_one_cpu_nowait(unsigned int cpu, cpu_stop_fn_t fn, void *arg,
int stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg);
int try_stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg);
void stop_machine_park(int cpu);
+void stop_machine_unpark(int cpu);
#else /* CONFIG_SMP */
diff --git a/kernel/cpu.c b/kernel/cpu.c
index c85df27..6467521 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -475,6 +475,7 @@ static int smpboot_thread_call(struct notifier_block *nfb,
case CPU_DOWN_FAILED:
case CPU_ONLINE:
+ stop_machine_unpark(cpu);
smpboot_unpark_threads(cpu);
break;
diff --git a/kernel/smpboot.c b/kernel/smpboot.c
index a818cbc..d264f59 100644
--- a/kernel/smpboot.c
+++ b/kernel/smpboot.c
@@ -222,9 +222,8 @@ static void smpboot_unpark_thread(struct smp_hotplug_thread *ht, unsigned int cp
{
struct task_struct *tsk = *per_cpu_ptr(ht->store, cpu);
- if (ht->pre_unpark)
- ht->pre_unpark(cpu);
- kthread_unpark(tsk);
+ if (!ht->selfparking)
+ kthread_unpark(tsk);
}
void smpboot_unpark_threads(unsigned int cpu)
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 91fbb10..59096a5 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -513,6 +513,14 @@ static void cpu_stop_unpark(unsigned int cpu)
spin_unlock_irq(&stopper->lock);
}
+void stop_machine_unpark(int cpu)
+{
+ struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu);
+
+ cpu_stop_unpark(cpu);
+ kthread_unpark(stopper->thread);
+}
+
static struct smp_hotplug_thread cpu_stop_threads = {
.store = &cpu_stopper.thread,
.thread_should_run = cpu_stop_should_run,
@@ -521,7 +529,6 @@ static struct smp_hotplug_thread cpu_stop_threads = {
.create = cpu_stop_create,
.setup = cpu_stop_unpark,
.park = cpu_stop_park,
- .pre_unpark = cpu_stop_unpark,
.selfparking = true,
};
@@ -537,6 +544,7 @@ static int __init cpu_stop_init(void)
}
BUG_ON(smpboot_register_percpu_thread(&cpu_stop_threads));
+ stop_machine_unpark(raw_smp_processor_id());
stop_machine_initialized = true;
return 0;
}
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [tip:sched/core] stop_machine: Kill cpu_stop_threads->setup() and cpu_stop_unpark()
2015-10-09 16:00 ` [PATCH 2/3] stop_machine: kill cpu_stop_threads->setup() and cpu_stop_unpark() Oleg Nesterov
@ 2015-10-20 9:34 ` tip-bot for Oleg Nesterov
0 siblings, 0 replies; 39+ messages in thread
From: tip-bot for Oleg Nesterov @ 2015-10-20 9:34 UTC (permalink / raw)
To: linux-tip-commits
Cc: oleg, tglx, paulmck, akpm, mingo, torvalds, tj, linux-kernel,
hpa, riel, efault, peterz
Commit-ID: f0cf16cbd0659d2dd21352da9f06f3fab7a51596
Gitweb: http://git.kernel.org/tip/f0cf16cbd0659d2dd21352da9f06f3fab7a51596
Author: Oleg Nesterov <oleg@redhat.com>
AuthorDate: Fri, 9 Oct 2015 18:00:51 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 20 Oct 2015 10:23:56 +0200
stop_machine: Kill cpu_stop_threads->setup() and cpu_stop_unpark()
Now that we always use stop_machine_unpark() to wake the stopper
threads up, we can kill ->setup() and fold cpu_stop_unpark() into
stop_machine_unpark().
And we do not need stopper->lock to set stopper->enabled = true.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: heiko.carstens@de.ibm.com
Link: http://lkml.kernel.org/r/20151009160051.GA10169@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
kernel/stop_machine.c | 12 +-----------
1 file changed, 1 insertion(+), 11 deletions(-)
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 59096a5..e5a09d2 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -504,20 +504,11 @@ static void cpu_stop_park(unsigned int cpu)
WARN_ON(!list_empty(&stopper->works));
}
-static void cpu_stop_unpark(unsigned int cpu)
-{
- struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu);
-
- spin_lock_irq(&stopper->lock);
- stopper->enabled = true;
- spin_unlock_irq(&stopper->lock);
-}
-
void stop_machine_unpark(int cpu)
{
struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu);
- cpu_stop_unpark(cpu);
+ stopper->enabled = true;
kthread_unpark(stopper->thread);
}
@@ -527,7 +518,6 @@ static struct smp_hotplug_thread cpu_stop_threads = {
.thread_fn = cpu_stopper_thread,
.thread_comm = "migration/%u",
.create = cpu_stop_create,
- .setup = cpu_stop_unpark,
.park = cpu_stop_park,
.selfparking = true,
};
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [tip:sched/core] sched: Start stopper early
2015-10-09 16:00 ` [PATCH 3/3] sched: start stopper early Oleg Nesterov
2015-10-09 16:49 ` Oleg Nesterov
@ 2015-10-20 9:34 ` tip-bot for Peter Zijlstra
1 sibling, 0 replies; 39+ messages in thread
From: tip-bot for Peter Zijlstra @ 2015-10-20 9:34 UTC (permalink / raw)
To: linux-tip-commits
Cc: mingo, riel, tglx, efault, hpa, linux-kernel, paulmck, oleg,
torvalds, tj, peterz, akpm
Commit-ID: 07f06cb3b5f6bd21374a48dbefdb431d71d53974
Gitweb: http://git.kernel.org/tip/07f06cb3b5f6bd21374a48dbefdb431d71d53974
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Fri, 9 Oct 2015 18:00:54 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 20 Oct 2015 10:25:55 +0200
sched: Start stopper early
Ensure the stopper thread is active 'early', because the load balancer
pretty much assumes that it's available. And when 'online && active' the
load-balancer is fully available.
Not only the numa balancing stop_two_cpus() caller relies on it, but
also the self migration stuff does, and at CPU_ONLINE time the cpu
really is 'free' to run anything.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: heiko.carstens@de.ibm.com
Link: http://lkml.kernel.org/r/20151009160054.GA10176@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
kernel/cpu.c | 1 -
kernel/sched/core.c | 12 +++++++++---
2 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 6467521..c85df27 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -475,7 +475,6 @@ static int smpboot_thread_call(struct notifier_block *nfb,
case CPU_DOWN_FAILED:
case CPU_ONLINE:
- stop_machine_unpark(cpu);
smpboot_unpark_threads(cpu);
break;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f45a7c7..7ee8cae 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5545,21 +5545,27 @@ static void set_cpu_rq_start_time(void)
static int sched_cpu_active(struct notifier_block *nfb,
unsigned long action, void *hcpu)
{
+ int cpu = (long)hcpu;
+
switch (action & ~CPU_TASKS_FROZEN) {
case CPU_STARTING:
set_cpu_rq_start_time();
return NOTIFY_OK;
+
case CPU_ONLINE:
/*
* At this point a starting CPU has marked itself as online via
* set_cpu_online(). But it might not yet have marked itself
* as active, which is essential from here on.
- *
- * Thus, fall-through and help the starting CPU along.
*/
+ set_cpu_active(cpu, true);
+ stop_machine_unpark(cpu);
+ return NOTIFY_OK;
+
case CPU_DOWN_FAILED:
- set_cpu_active((long)hcpu, true);
+ set_cpu_active(cpu, true);
return NOTIFY_OK;
+
default:
return NOTIFY_DONE;
}
^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [RFC][PATCH] sched: Start stopper early
2015-10-16 12:01 ` Heiko Carstens
@ 2015-10-26 14:24 ` Michael Holzheu
2015-10-26 20:20 ` Peter Zijlstra
0 siblings, 1 reply; 39+ messages in thread
From: Michael Holzheu @ 2015-10-26 14:24 UTC (permalink / raw)
To: Heiko Carstens
Cc: Peter Zijlstra, linux-kernel, Tejun Heo, Oleg Nesterov,
Ingo Molnar, Rik van Riel, Martin Schwidefsky, Tobias Orlamuende
On Fri, 16 Oct 2015 14:01:25 +0200
Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> On Fri, Oct 16, 2015 at 11:57:06AM +0200, Peter Zijlstra wrote:
> > On Fri, Oct 16, 2015 at 10:22:12AM +0200, Heiko Carstens wrote:
> > > So, actually this doesn't fix the bug and it _seems_ to be reproducible.
> > >
> > > [ FWIW, I will be offline for the next two weeks ]
> >
> > So the series from Oleg would be good to try; I can make a git tree for
> > you, or otherwise stuff the lot into a single patch.
> >
> > Should I be talking to someone else whilst you're having down time?
>
> Yes Michael Holzheu (on cc), can take care of this.
I tested Peter's "tip/master" and "tip/sched/core". With the following
commit our issue seems to be fixed:
2b621a085a ("stop_machine: Change cpu_stop_queue_two_works() to rely
on stopper->enabled")
When do you plan to merge the patch series into the mainline kernel?
Regards,
Michael
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [RFC][PATCH] sched: Start stopper early
2015-10-26 14:24 ` Michael Holzheu
@ 2015-10-26 20:20 ` Peter Zijlstra
0 siblings, 0 replies; 39+ messages in thread
From: Peter Zijlstra @ 2015-10-26 20:20 UTC (permalink / raw)
To: Michael Holzheu
Cc: Heiko Carstens, linux-kernel, Tejun Heo, Oleg Nesterov,
Ingo Molnar, Rik van Riel, Martin Schwidefsky, Tobias Orlamuende
On Mon, Oct 26, 2015 at 03:24:36PM +0100, Michael Holzheu wrote:
> On Fri, 16 Oct 2015 14:01:25 +0200
> Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
>
> > On Fri, Oct 16, 2015 at 11:57:06AM +0200, Peter Zijlstra wrote:
> > > On Fri, Oct 16, 2015 at 10:22:12AM +0200, Heiko Carstens wrote:
> > > > So, actually this doesn't fix the bug and it _seems_ to be reproducible.
> > > >
> > > > [ FWIW, I will be offline for the next two weeks ]
> > >
> > > So the series from Oleg would be good to try; I can make a git tree for
> > > you, or otherwise stuff the lot into a single patch.
> > >
> > > Should I be talking to someone else whilst you're having down time?
> >
> > Yes Michael Holzheu (on cc), can take care of this.
>
> I tested Peter's "tip/master" and "tip/sched/core". With the following
> commit our issue seems to be fixed:
>
> 2b621a085a ("stop_machine: Change cpu_stop_queue_two_works() to rely
> on stopper->enabled")
>
> When do you plan to merge the patch series into the mainline kernel?
They're slated for the next merge window. Thanks for testing!
Thread overview: 39+ messages
2015-10-07 8:41 [RFC][PATCH] sched: Start stopper early Peter Zijlstra
2015-10-07 12:30 ` Oleg Nesterov
2015-10-07 12:38 ` Peter Zijlstra
2015-10-07 13:20 ` Oleg Nesterov
2015-10-07 13:24 ` Oleg Nesterov
2015-10-07 13:36 ` kbuild test robot
2015-10-08 14:50 ` [PATCH 0/3] (Was: [RFC][PATCH] sched: Start stopper early) Oleg Nesterov
2015-10-08 14:51 ` [PATCH 1/3] stop_machine: ensure that a queued callback will be called before cpu_stop_park() Oleg Nesterov
2015-10-14 15:34 ` Peter Zijlstra
2015-10-14 19:03 ` Oleg Nesterov
2015-10-14 20:32 ` Peter Zijlstra
2015-10-15 17:02 ` Oleg Nesterov
2015-10-16 10:49 ` Peter Zijlstra
2015-10-20 9:32 ` [tip:sched/core] stop_machine: Ensure " tip-bot for Oleg Nesterov
2015-10-08 14:51 ` [PATCH 2/3] stop_machine: introduce __cpu_stop_queue_work() and cpu_stop_queue_two_works() Oleg Nesterov
2015-10-20 9:33 ` [tip:sched/core] stop_machine: Introduce " tip-bot for Oleg Nesterov
2015-10-08 14:51 ` [PATCH 3/3] stop_machine: change cpu_stop_queue_two_works() to rely on stopper->enabled Oleg Nesterov
2015-10-08 15:04 ` Peter Zijlstra
2015-10-08 15:59 ` Oleg Nesterov
2015-10-08 16:08 ` Oleg Nesterov
2015-10-08 17:01 ` [PATCH v2 " Oleg Nesterov
2015-10-09 16:37 ` Peter Zijlstra
2015-10-09 16:40 ` Oleg Nesterov
2015-10-20 9:33 ` [tip:sched/core] stop_machine: Change " tip-bot for Oleg Nesterov
2015-10-08 18:05 ` [RFC][PATCH] sched: Start stopper early Oleg Nesterov
2015-10-08 18:47 ` Oleg Nesterov
2015-10-09 16:00 ` [PATCH 0/3] make stopper threads more "selfparking" Oleg Nesterov
2015-10-09 16:00 ` [PATCH 1/3] stop_machine: kill smp_hotplug_thread->pre_unpark, introduce stop_machine_unpark() Oleg Nesterov
2015-10-20 9:33 ` [tip:sched/core] stop_machine: Kill smp_hotplug_thread->pre_unpark, " tip-bot for Oleg Nesterov
2015-10-09 16:00 ` [PATCH 2/3] stop_machine: kill cpu_stop_threads->setup() and cpu_stop_unpark() Oleg Nesterov
2015-10-20 9:34 ` [tip:sched/core] stop_machine: Kill " tip-bot for Oleg Nesterov
2015-10-09 16:00 ` [PATCH 3/3] sched: start stopper early Oleg Nesterov
2015-10-09 16:49 ` Oleg Nesterov
2015-10-20 9:34 ` [tip:sched/core] sched: Start " tip-bot for Peter Zijlstra
2015-10-16 8:22 ` [RFC][PATCH] " Heiko Carstens
2015-10-16 9:57 ` Peter Zijlstra
2015-10-16 12:01 ` Heiko Carstens
2015-10-26 14:24 ` Michael Holzheu
2015-10-26 20:20 ` Peter Zijlstra