* [PATCH 0/2] smp/hotplug annotations
@ 2017-09-05  7:52 Peter Zijlstra
  2017-09-05  7:52 ` [PATCH 1/2] smp/hotplug,lockdep: Annotate st->done Peter Zijlstra
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Peter Zijlstra @ 2017-09-05  7:52 UTC (permalink / raw)
  To: mingo, tglx; +Cc: linux-kernel, peterz

These two patches appear to make hotplug work again without tripping lockdep.


* [PATCH 1/2] smp/hotplug,lockdep: Annotate st->done
  2017-09-05  7:52 [PATCH 0/2] smp/hotplug annotations Peter Zijlstra
@ 2017-09-05  7:52 ` Peter Zijlstra
  2017-09-05  7:52 ` [PATCH 2/2] smp/hotplug,lockdep: Annotate cpuhp_state Peter Zijlstra
  2017-09-05 13:31 ` [PATCH 0/2] smp/hotplug annotations Thomas Gleixner
  2 siblings, 0 replies; 8+ messages in thread
From: Peter Zijlstra @ 2017-09-05  7:52 UTC (permalink / raw)
  To: mingo, tglx
  Cc: linux-kernel, peterz, Byungchul Park, Sebastian Andrzej Siewior,
	Mike Galbraith


With the new lockdep cross-release feature, cpu hotplug reports the
following deadlock:

  takedown_cpu()
    irq_lock_sparse()
    wait_for_completion(&st->done)

				cpuhp_thread_fun
				  cpuhp_up_callbacks
				    cpuhp_invoke_callback
				      irq_affinity_online_cpu
				        irq_lock_sparse()
				        irq_unlock_sparse()
				  complete(&st->done)

However, CPU-up and CPU-down are globally serialized, so the above
scenario cannot in fact happen.

Annotate this by splitting the st->done dependency chain for up and
down.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Byungchul Park <max.byungchul.park@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reported-by: Mike Galbraith <efault@gmx.de>
Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/cpu.c | 35 +++++++++++++++++++++++++++++------
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index acf5308fad51..0f44ddf64f24 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -533,6 +533,28 @@ void __init cpuhp_threads_init(void)
 	kthread_unpark(this_cpu_read(cpuhp_state.thread));
 }
 
+/*
+ * _cpu_down() and _cpu_up() have different lock ordering wrt st->done, but
+ * because these two functions are globally serialized and st->done is private
+ * to them, we can simply re-init st->done for each of them to separate the
+ * lock chains.
+ *
+ * Must be a macro to ensure we have two different call sites.
+ */
+#ifdef CONFIG_LOCKDEP
+#define lockdep_reinit_st_done()				\
+do {								\
+	int __cpu;						\
+	for_each_possible_cpu(__cpu) {				\
+		struct cpuhp_cpu_state *st =			\
+			per_cpu_ptr(&cpuhp_state, __cpu);	\
+		init_completion(&st->done);			\
+	}							\
+} while (0)
+#else
+#define lockdep_reinit_st_done()
+#endif
+
 #ifdef CONFIG_HOTPLUG_CPU
 /**
  * clear_tasks_mm_cpumask - Safely clear tasks' mm_cpumask for a CPU
@@ -676,12 +698,6 @@ void cpuhp_report_idle_dead(void)
 				 cpuhp_complete_idle_dead, st, 0);
 }
 
-#else
-#define takedown_cpu		NULL
-#endif
-
-#ifdef CONFIG_HOTPLUG_CPU
-
 /* Requires cpu_add_remove_lock to be held */
 static int __ref _cpu_down(unsigned int cpu, int tasks_frozen,
 			   enum cpuhp_state target)
@@ -697,6 +713,8 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen,
 
 	cpus_write_lock();
 
+	lockdep_reinit_st_done();
+
 	cpuhp_tasks_frozen = tasks_frozen;
 
 	prev_state = st->state;
@@ -759,6 +777,9 @@ int cpu_down(unsigned int cpu)
 	return do_cpu_down(cpu, CPUHP_OFFLINE);
 }
 EXPORT_SYMBOL(cpu_down);
+
+#else
+#define takedown_cpu		NULL
 #endif /*CONFIG_HOTPLUG_CPU*/
 
 /**
@@ -806,6 +827,8 @@ static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target)
 
 	cpus_write_lock();
 
+	lockdep_reinit_st_done();
+
 	if (!cpu_present(cpu)) {
 		ret = -EINVAL;
 		goto out;

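A note on why re-initializing st->done works, and why the helper must be a
macro: with the cross-release feature, init_completion() is itself a macro
that allocates one static lock_class_key per textual expansion site, so
every call site gets its own lockdep class. A hedged sketch of the
4.14-era <linux/completion.h> (from memory; exact details may differ):

    #ifdef CONFIG_LOCKDEP_COMPLETIONS
    /*
     * Each expansion gets its own static __key, hence its own lockdep
     * class: two call sites == two classes.
     */
    #define init_completion(x)                                          \
    do {                                                                \
            static struct lock_class_key __key;                         \
            lockdep_init_map_crosslock((struct lockdep_map *)&(x)->map, \
                            "(complete)" #x, &__key, 0);                \
            __init_completion(x);                                       \
    } while (0)
    #endif

Expanding lockdep_reinit_st_done() once in _cpu_down() and once in
_cpu_up() therefore gives the up and down waiters two distinct classes,
which is exactly the split the changelog asks for.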

* [PATCH 2/2] smp/hotplug,lockdep: Annotate cpuhp_state
  2017-09-05  7:52 [PATCH 0/2] smp/hotplug annotations Peter Zijlstra
  2017-09-05  7:52 ` [PATCH 1/2] smp/hotplug,lockdep: Annotate st->done Peter Zijlstra
@ 2017-09-05  7:52 ` Peter Zijlstra
  2017-09-05 12:59   ` Mike Galbraith
  2017-09-19 17:57   ` Paul E. McKenney
  2017-09-05 13:31 ` [PATCH 0/2] smp/hotplug annotations Thomas Gleixner
  2 siblings, 2 replies; 8+ messages in thread
From: Peter Zijlstra @ 2017-09-05  7:52 UTC (permalink / raw)
  To: mingo, tglx
  Cc: linux-kernel, peterz, Byungchul Park, Sebastian Andrzej Siewior,
	Mike Galbraith


After the st->done annotation, lockdep cross-release now complains
about:

  CPU0                  CPU1                    CPU2
  cpuhp_up_callbacks:	takedown_cpu:		cpuhp_thread_fun:

  cpuhp_state
                        irq_lock_sparse()
    irq_lock_sparse()
                        wait_for_completion()
                                                cpuhp_state
                                                complete()

which again spells deadlock, because CPU0 needs to wait for CPU1's
irq_lock_sparse which will wait for CPU2's completion, which in turn
waits for CPU0's cpuhp_state.

This again mixes the up and down chains, but now on cpuhp_state.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Byungchul Park <max.byungchul.park@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reported-by: Mike Galbraith <efault@gmx.de>
Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/cpu.c |   12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -67,11 +67,14 @@ struct cpuhp_cpu_state {
 static DEFINE_PER_CPU(struct cpuhp_cpu_state, cpuhp_state);
 
 #if defined(CONFIG_LOCKDEP) && defined(CONFIG_SMP)
-static struct lock_class_key cpuhp_state_key;
+static struct lock_class_key cpuhp_state_up_key;
+#ifdef CONFIG_HOTPLUG_CPU
+static struct lock_class_key cpuhp_state_down_key;
+#endif
 static struct lockdep_map cpuhp_state_lock_map =
-	STATIC_LOCKDEP_MAP_INIT("cpuhp_state", &cpuhp_state_key);
+	STATIC_LOCKDEP_MAP_INIT("cpuhp_state-up", &cpuhp_state_up_key);
 #endif
 
 /**
  * cpuhp_step - Hotplug state machine step
  * @name:	Name of the step
@@ -714,6 +718,8 @@ static int __ref _cpu_down(unsigned int
 	cpus_write_lock();
 
 	lockdep_reinit_st_done();
+	lockdep_init_map(&cpuhp_state_lock_map, "cpuhp_state-down",
+			 &cpuhp_state_down_key, 0);
 
 	cpuhp_tasks_frozen = tasks_frozen;
 
@@ -828,6 +834,8 @@ static int _cpu_up(unsigned int cpu, int
 	cpus_write_lock();
 
 	lockdep_reinit_st_done();
+	lockdep_init_map(&cpuhp_state_lock_map, "cpuhp_state-up",
+			 &cpuhp_state_up_key, 0);
 
 	if (!cpu_present(cpu)) {
 		ret = -EINVAL;

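The cpuhp_state map is a pure annotation: no real lock sits behind it. The
hotplug thread brackets its callback work with lock_map_acquire() and
lock_map_release(), while waiters do an empty acquire/release pair before
blocking, so lockdep records the dependency edges without anything being
held across the wait. A minimal, hedged sketch of the pattern (names are
hypothetical; note the map's own address doubling as the class key, an
idiom that also appears later in this thread):

    static struct lockdep_map demo_work_map =
            STATIC_LOCKDEP_MAP_INIT("demo-work", &demo_work_map);

    static void demo_run_work(void (*fn)(void))
    {
            lock_map_acquire(&demo_work_map);  /* lockdep: section entered */
            fn();                              /* deps taken here nest inside */
            lock_map_release(&demo_work_map);
    }

    static void demo_wait_for_work(struct completion *done)
    {
            /* empty pair: links the waiter into the same class */
            lock_map_acquire(&demo_work_map);
            lock_map_release(&demo_work_map);
            wait_for_completion(done);
    }

Splitting the map into an -up and a -down class, as the patch does, keeps
the two directions in separate dependency chains.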

* Re: [PATCH 2/2] smp/hotplug,lockdep: Annotate cpuhp_state
  2017-09-05  7:52 ` [PATCH 2/2] smp/hotplug,lockdep: Annotate cpuhp_state Peter Zijlstra
@ 2017-09-05 12:59   ` Mike Galbraith
  2017-09-19 17:57   ` Paul E. McKenney
  1 sibling, 0 replies; 8+ messages in thread
From: Mike Galbraith @ 2017-09-05 12:59 UTC (permalink / raw)
  To: Peter Zijlstra, mingo, tglx
  Cc: linux-kernel, Byungchul Park, Sebastian Andrzej Siewior

Mm, wants a build bandaid for !CONFIG_LOCKDEP:

kernel/cpu.c: In function ‘_cpu_down’:
kernel/cpu.c:720:43: error: ‘cpuhp_state_down_key’ undeclared (first use in this function)
  lockdep_init_map(&cpuhp_state_lock_map, "cpuhp_state-down",
                                           ^
kernel/cpu.c:720:43: note: each undeclared identifier is reported only once for each function it appears in
kernel/cpu.c: In function ‘_cpu_up’:
kernel/cpu.c:836:41: error: ‘cpuhp_state_up_key’ undeclared (first use in this function)
  lockdep_init_map(&cpuhp_state_lock_map, "cpuhp_state-up",
                                         ^
scripts/Makefile.build:311: recipe for target 'kernel/cpu.o' failed
make[1]: *** [kernel/cpu.o] Error 1
Makefile:1666: recipe for target 'kernel/cpu.o' failed
make: *** [kernel/cpu.o] Error 2

(mine below, fold or vaporize as you see fit)

---
 kernel/cpu.c |   20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -545,7 +545,7 @@ void __init cpuhp_threads_init(void)
  * Must be a macro to ensure we have two different call sites.
  */
 #ifdef CONFIG_LOCKDEP
-#define lockdep_reinit_st_done()				\
+#define lockdep_reinit_st_done(up)				\
 do {								\
 	int __cpu;						\
 	for_each_possible_cpu(__cpu) {				\
@@ -553,9 +553,17 @@ do {								\
 			per_cpu_ptr(&cpuhp_state, __cpu);	\
 		init_completion(&st->done);			\
 	}							\
+	if (up)							\
+		lockdep_init_map(&cpuhp_state_lock_map,		\
+				 "cpuhp_state-up",		\
+				 &cpuhp_state_up_key, 0);	\
+	else							\
+		lockdep_init_map(&cpuhp_state_lock_map,		\
+				 "cpuhp_state-down",		\
+				 &cpuhp_state_down_key, 0);	\
 } while (0)
 #else
-#define lockdep_reinit_st_done()
+#define lockdep_reinit_st_done(up)
 #endif
 
 #ifdef CONFIG_HOTPLUG_CPU
@@ -716,9 +724,7 @@ static int __ref _cpu_down(unsigned int
 
 	cpus_write_lock();
 
-	lockdep_reinit_st_done();
-	lockdep_init_map(&cpuhp_state_lock_map, "cpuhp_state-down",
-			 &cpuhp_state_down_key, 0);
+	lockdep_reinit_st_done(0);
 
 	cpuhp_tasks_frozen = tasks_frozen;
 
@@ -832,9 +838,7 @@ static int _cpu_up(unsigned int cpu, int
 
 	cpus_write_lock();
 
-	lockdep_reinit_st_done();
-	lockdep_init_map(&cpuhp_state_lock_map, "cpuhp_state-up",
-			 &cpuhp_state_up_key, 0);
+	lockdep_reinit_st_done(1);
 
 	if (!cpu_present(cpu)) {
 		ret = -EINVAL;


* Re: [PATCH 0/2] smp/hotplug annotations
  2017-09-05  7:52 [PATCH 0/2] smp/hotplug annotations Peter Zijlstra
  2017-09-05  7:52 ` [PATCH 1/2] smp/hotplug,lockdep: Annotate st->done Peter Zijlstra
  2017-09-05  7:52 ` [PATCH 2/2] smp/hotplug,lockdep: Annotate cpuhp_state Peter Zijlstra
@ 2017-09-05 13:31 ` Thomas Gleixner
  2017-09-05 13:36   ` Thomas Gleixner
  2 siblings, 1 reply; 8+ messages in thread
From: Thomas Gleixner @ 2017-09-05 13:31 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, linux-kernel

On Tue, 5 Sep 2017, Peter Zijlstra wrote:

> These two patches appear to make hotplug work again without tripping lockdep.

They cover the case where the plug/unplug succeeds, but they will not work
when a plug/unplug operation fails, because after a failure it rolls back
automatically. So in case UP fails, it will go down again, but the
initiator side still waits on the 'UP' completion. Same issue on down.

I think that extra lockdep magic can be avoided completely by splitting the
completions into an 'up' and a 'down' completion, but that only solves part
of the problem. The current failure handling does an automated rollback, so
if UP fails somewhere the AP rolls back, which means it invokes the down
callbacks. DOWN works the other way round.

We can solve that by changing the way rollback is handled so it does not
roll back automatically.

    if (callback() < 0) {
       store_state();
       complete(UP);
       wait_for_being_kicked_again()
    }

and on the control side have

    wait_for_completion(UP);

    if (UP->failed) {
       kick(DOWN);
       wait_for_completion(DOWN);
    }

It's not entirely trivial, but I haven't seen a real problem with it yet.

Thanks,

	tglx

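For concreteness, a hedged sketch of the control-side handshake outlined
above, using the split up/down completions that eventually appear later in
this thread (all names illustrative, not a tested implementation):

    static int kick_and_wait(struct cpuhp_cpu_state *st, bool bringup)
    {
            st->bringup = bringup;
            wake_up_process(st->thread);
            wait_for_completion(bringup ? &st->done_up : &st->done_down);

            if (st->result) {
                    /*
                     * The AP stored its state and parked instead of
                     * rolling back on its own. Kick it in the opposite
                     * direction and wait on the *other* completion, so
                     * the up and down wait chains never mix.
                     */
                    st->bringup = !bringup;
                    wake_up_process(st->thread);
                    wait_for_completion(bringup ? &st->done_down
                                                : &st->done_up);
            }
            return st->result;
    }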

* Re: [PATCH 0/2] smp/hotplug annotations
  2017-09-05 13:31 ` [PATCH 0/2] smp/hotplug annotations Thomas Gleixner
@ 2017-09-05 13:36   ` Thomas Gleixner
  2017-09-06 17:08     ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Gleixner @ 2017-09-05 13:36 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, linux-kernel

On Tue, 5 Sep 2017, Thomas Gleixner wrote:
> On Tue, 5 Sep 2017, Peter Zijlstra wrote:
> 
> > These two patches appear to make hotplug work again without tripping lockdep.
> 
> They cover the case where the plug/unplug succeeds, but they will not work
> when a plug/unplug operation fails, because after a failure it rolls back
> automatically. So in case UP fails, it will go down again, but the
> initiator side still waits on the 'UP' completion. Same issue on down.
> 
> I think that extra lockdep magic can be avoided completely by splitting the
> completions into an 'up' and a 'down' completion, but that only solves part
> of the problem. The current failure handling does an automated rollback, so
> if UP fails somewhere the AP rolls back, which means it invokes the down
> callbacks. DOWN works the other way round.
> 
> We can solve that by changing the way rollback is handled so it does not
> roll back automatically.
> 
>     if (callback() < 0) {
>        store_state();
>        complete(UP);
>        wait_for_being_kicked_again()
>     }
> 
> and on the control side have
> 
>     wait_for_completion(UP);
> 
>     if (UP->failed) {
>        kick(DOWN);
>        wait_for_completion(DOWN);
>     }
> 
> It's not entirely trivial, but I haven't seen a real problem with it yet.

Now I found one. It's the multi-instance rollback. This is a nested
rollback mechanism deep in the call chain. Separating that one is going to
be a major pain.

Thanks,

	tglx


* Re: [PATCH 0/2] smp/hotplug annotations
  2017-09-05 13:36   ` Thomas Gleixner
@ 2017-09-06 17:08     ` Peter Zijlstra
  0 siblings, 0 replies; 8+ messages in thread
From: Peter Zijlstra @ 2017-09-06 17:08 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: mingo, linux-kernel

On Tue, Sep 05, 2017 at 03:36:11PM +0200, Thomas Gleixner wrote:
> On Tue, 5 Sep 2017, Thomas Gleixner wrote:
> > On Tue, 5 Sep 2017, Peter Zijlstra wrote:
> > 
> > > These two patches appear to make hotplug work again without tripping lockdep.
> > 
> > They cover the case where the plug/unplug succeeds, but they will not work
> > when a plug/unplug operation fails, because after a failure it rolls back
> > automatically. So in case UP fails, it will go down again, but the
> > initiator side still waits on the 'UP' completion. Same issue on down.
> > 
> > I think that extra lockdep magic can be avoided completely by splitting the
> > completions into an 'up' and a 'down' completion, but that only solves part
> > of the problem. The current failure handling does an automated rollback, so
> > if UP fails somewhere the AP rolls back, which means it invokes the down
> > callbacks. DOWN works the other way round.
> > 
> > We can solve that by changing the way rollback is handled so it does not
> > roll back automatically.
> > 
> >     if (callback() < 0) {
> >        store_state();
> >        complete(UP);
> >        wait_for_being_kicked_again()
> >     }
> > 
> > and on the control side have
> > 
> >     wait_for_completion(UP);
> > 
> >     if (UP->failed) {
> >        kick(DOWN);
> >        wait_for_completion(DOWN);
> >     }
> > 
> > It's not entirely trivial, but I haven't seen a real problem with it yet.
> 
> Now I found one. It's the multi-instance rollback. This is a nested
> rollback mechanism deep in the call chain. Separating that one is going to
> be a major pain.

Yes, *ouch*.. Does something like the below look like the right direction
to you?

It appears to boot and to offline/online cycle a CPU. But this might just
be a fluke; tricky stuff, this.

Of course, the rollback is 100% untested... because that doesn't
normally trigger. I'll have to write some kernel modules for that. And,
as we discussed, the failure during rollback is 'interesting' and I
simply BUG() on that for now.

---
 kernel/cpu.c | 335 +++++++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 232 insertions(+), 103 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index acf5308fad51..02edb0f1d786 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -46,7 +46,8 @@
  * @bringup:	Single callback bringup or teardown selector
  * @cb_state:	The state for a single callback (install/uninstall)
  * @result:	Result of the operation
- * @done:	Signal completion to the issuer of the task
+ * @done_up:	Signal completion to the issuer of the task for cpu-up
+ * @done_down:	Signal completion to the issuer of the task for cpu-down
  */
 struct cpuhp_cpu_state {
 	enum cpuhp_state	state;
@@ -58,20 +59,51 @@ struct cpuhp_cpu_state {
 	bool			single;
 	bool			bringup;
 	struct hlist_node	*node;
+	struct hlist_node	*last;
 	enum cpuhp_state	cb_state;
 	int			result;
-	struct completion	done;
+	struct completion	done_up;
+	struct completion	done_down;
 #endif
 };
 
 static DEFINE_PER_CPU(struct cpuhp_cpu_state, cpuhp_state);
 
 #if defined(CONFIG_LOCKDEP) && defined(CONFIG_SMP)
-static struct lock_class_key cpuhp_state_key;
-static struct lockdep_map cpuhp_state_lock_map =
-	STATIC_LOCKDEP_MAP_INIT("cpuhp_state", &cpuhp_state_key);
+static struct lockdep_map cpuhp_state_up_map =
+	STATIC_LOCKDEP_MAP_INIT("cpuhp_state-up", &cpuhp_state_up_map);
+static struct lockdep_map cpuhp_state_down_map =
+	STATIC_LOCKDEP_MAP_INIT("cpuhp_state-down", &cpuhp_state_down_map);
+
+
+static inline void cpuhp_lock_acquire(bool bringup)
+{
+	lock_map_acquire(bringup ? &cpuhp_state_up_map : &cpuhp_state_down_map);
+}
+
+static inline void cpuhp_lock_release(bool bringup)
+{
+	lock_map_release(bringup ? &cpuhp_state_up_map : &cpuhp_state_down_map);
+}
+#else
+
+static inline void cpuhp_lock_acquire(bool bringup) { }
+static inline void cpuhp_lock_release(bool bringup) { }
+
 #endif
 
+static inline void wait_for_ap_thread(struct cpuhp_cpu_state *st, bool bringup)
+{
+	struct completion *done = bringup ? &st->done_up : &st->done_down;
+	wait_for_completion(done);
+}
+
+static inline void complete_ap_thread(struct cpuhp_cpu_state *st, bool bringup)
+{
+	struct completion *done = bringup ? &st->done_up : &st->done_down;
+	complete(done);
+}
+
 /**
  * cpuhp_step - Hotplug state machine step
  * @name:	Name of the step
@@ -129,7 +161,8 @@ static struct cpuhp_step *cpuhp_get_step(enum cpuhp_state state)
  * Called from cpu hotplug and from the state register machinery.
  */
 static int cpuhp_invoke_callback(unsigned int cpu, enum cpuhp_state state,
-				 bool bringup, struct hlist_node *node)
+				 bool bringup, struct hlist_node *node,
+				 struct hlist_node **lastp)
 {
 	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
 	struct cpuhp_step *step = cpuhp_get_step(state);
@@ -138,6 +171,7 @@ static int cpuhp_invoke_callback(unsigned int cpu, enum cpuhp_state state,
 	int ret, cnt;
 
 	if (!step->multi_instance) {
+		WARN_ON_ONCE(lastp && *lastp);
 		cb = bringup ? step->startup.single : step->teardown.single;
 		if (!cb)
 			return 0;
@@ -152,6 +186,7 @@ static int cpuhp_invoke_callback(unsigned int cpu, enum cpuhp_state state,
 
 	/* Single invocation for instance add/remove */
 	if (node) {
+		WARN_ON_ONCE(lastp && *lastp);
 		trace_cpuhp_multi_enter(cpu, st->target, state, cbm, node);
 		ret = cbm(cpu, node);
 		trace_cpuhp_exit(cpu, st->state, state, ret);
@@ -161,14 +196,26 @@ static int cpuhp_invoke_callback(unsigned int cpu, enum cpuhp_state state,
 	/* State transition. Invoke on all instances */
 	cnt = 0;
 	hlist_for_each(node, &step->list) {
+		if (lastp && node == *lastp)
+			break;
+
 		trace_cpuhp_multi_enter(cpu, st->target, state, cbm, node);
 		ret = cbm(cpu, node);
 		trace_cpuhp_exit(cpu, st->state, state, ret);
-		if (ret)
-			goto err;
+
+		if (ret) {
+			if (!lastp)
+				goto err;
+
+			*lastp = node;
+			return ret;
+		}
 		cnt++;
 	}
+	if (lastp)
+		*lastp = NULL;
 	return 0;
+
 err:
 	/* Rollback the instances if one failed */
 	cbm = !bringup ? step->startup.multi : step->teardown.multi;
@@ -271,14 +318,73 @@ void cpu_hotplug_enable(void)
 EXPORT_SYMBOL_GPL(cpu_hotplug_enable);
 #endif	/* CONFIG_HOTPLUG_CPU */
 
-static void __cpuhp_kick_ap_work(struct cpuhp_cpu_state *st);
+static inline enum cpuhp_state
+cpuhp_set_state(struct cpuhp_cpu_state *st, enum cpuhp_state target)
+{
+	enum cpuhp_state prev_state = st->state;
+
+	st->rollback = false;
+	st->last = NULL;
+	st->target = target;
+	st->bringup = st->state < target;
+
+	return prev_state;
+}
+
+static inline void
+cpuhp_reset_state(struct cpuhp_cpu_state *st, enum cpuhp_state prev_state)
+{
+	st->rollback = true;
+	/*
+	 * Unless we have st->last set, we've failed to reach st->state
+	 * and must start by undoing the previous state. If we have st->last
+	 * we need to undo partial multi_instance of this state first.
+	 */
+	if (!st->last)
+		st->state--;
+	st->target = prev_state;
+	st->bringup = st->state < prev_state;
+}
+
+/* Regular hotplug invocation of the AP hotplug thread */
+static void __cpuhp_kick_ap(struct cpuhp_cpu_state *st)
+{
+	if (st->state == st->target)
+		return;
+
+	st->result = 0;
+	st->single = false;
+	/*
+	 * Make sure the above stores are visible before should_run becomes
+	 * true. Paired with the mb() above in cpuhp_thread_fun()
+	 */
+	smp_mb();
+	st->should_run = true;
+	wake_up_process(st->thread);
+	wait_for_ap_thread(st, st->bringup);
+}
+
+static int cpuhp_kick_ap(struct cpuhp_cpu_state *st, enum cpuhp_state target)
+{
+	enum cpuhp_state prev_state = cpuhp_set_state(st, target);
+	int ret;
+
+	__cpuhp_kick_ap(st);
+
+	if ((ret = st->result)) {
+		cpuhp_reset_state(st, prev_state);
+		__cpuhp_kick_ap(st);
+	}
+
+	return ret;
+}
 
 static int bringup_wait_for_ap(unsigned int cpu)
 {
 	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
 
 	/* Wait for the CPU to reach CPUHP_AP_ONLINE_IDLE */
-	wait_for_completion(&st->done);
+	wait_for_ap_thread(st, true);
 	if (WARN_ON_ONCE((!cpu_online(cpu))))
 		return -ECANCELED;
 
@@ -286,12 +392,10 @@ static int bringup_wait_for_ap(unsigned int cpu)
 	stop_machine_unpark(cpu);
 	kthread_unpark(st->thread);
 
-	/* Should we go further up ? */
-	if (st->target > CPUHP_AP_ONLINE_IDLE) {
-		__cpuhp_kick_ap_work(st);
-		wait_for_completion(&st->done);
-	}
-	return st->result;
+	if (st->target <= CPUHP_AP_ONLINE_IDLE)
+		return 0;
+
+	return cpuhp_kick_ap(st, st->target);
 }
 
 static int bringup_cpu(unsigned int cpu)
@@ -323,7 +427,7 @@ static void undo_cpu_down(unsigned int cpu, struct cpuhp_cpu_state *st)
 		struct cpuhp_step *step = cpuhp_get_step(st->state);
 
 		if (!step->skip_onerr)
-			cpuhp_invoke_callback(cpu, st->state, true, NULL);
+			cpuhp_invoke_callback(cpu, st->state, true, NULL, NULL);
 	}
 }
 
@@ -334,7 +438,7 @@ static int cpuhp_down_callbacks(unsigned int cpu, struct cpuhp_cpu_state *st,
 	int ret = 0;
 
 	for (; st->state > target; st->state--) {
-		ret = cpuhp_invoke_callback(cpu, st->state, false, NULL);
+		ret = cpuhp_invoke_callback(cpu, st->state, false, NULL, NULL);
 		if (ret) {
 			st->target = prev_state;
 			undo_cpu_down(cpu, st);
@@ -350,7 +454,7 @@ static void undo_cpu_up(unsigned int cpu, struct cpuhp_cpu_state *st)
 		struct cpuhp_step *step = cpuhp_get_step(st->state);
 
 		if (!step->skip_onerr)
-			cpuhp_invoke_callback(cpu, st->state, false, NULL);
+			cpuhp_invoke_callback(cpu, st->state, false, NULL, NULL);
 	}
 }
 
@@ -362,7 +466,7 @@ static int cpuhp_up_callbacks(unsigned int cpu, struct cpuhp_cpu_state *st,
 
 	while (st->state < target) {
 		st->state++;
-		ret = cpuhp_invoke_callback(cpu, st->state, true, NULL);
+		ret = cpuhp_invoke_callback(cpu, st->state, true, NULL, NULL);
 		if (ret) {
 			st->target = prev_state;
 			undo_cpu_up(cpu, st);
@@ -379,7 +483,8 @@ static void cpuhp_create(unsigned int cpu)
 {
 	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
 
-	init_completion(&st->done);
+	init_completion(&st->done_up);
+	init_completion(&st->done_down);
 }
 
 static int cpuhp_should_run(unsigned int cpu)
@@ -389,20 +494,6 @@ static int cpuhp_should_run(unsigned int cpu)
 	return st->should_run;
 }
 
-/* Execute the teardown callbacks. Used to be CPU_DOWN_PREPARE */
-static int cpuhp_ap_offline(unsigned int cpu, struct cpuhp_cpu_state *st)
-{
-	enum cpuhp_state target = max((int)st->target, CPUHP_TEARDOWN_CPU);
-
-	return cpuhp_down_callbacks(cpu, st, target);
-}
-
-/* Execute the online startup callbacks. Used to be CPU_ONLINE */
-static int cpuhp_ap_online(unsigned int cpu, struct cpuhp_cpu_state *st)
-{
-	return cpuhp_up_callbacks(cpu, st, st->target);
-}
-
 /*
  * Execute teardown/startup callbacks on the plugged cpu. Also used to invoke
  * callbacks when a state gets [un]installed at runtime.
@@ -410,48 +501,70 @@ static int cpuhp_ap_online(unsigned int cpu, struct cpuhp_cpu_state *st)
 static void cpuhp_thread_fun(unsigned int cpu)
 {
 	struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state);
-	int ret = 0;
+	bool bringup = st->bringup;
+	enum cpuhp_state state;
 
 	/*
-	 * Paired with the mb() in cpuhp_kick_ap_work and
-	 * cpuhp_invoke_ap_callback, so the work set is consistent visible.
+	 * ACQUIRE for the cpuhp_should_run() load of ->should_run. Ensures
+	 * that if we see ->should_run we also see the rest of the state.
 	 */
 	smp_mb();
-	if (!st->should_run)
+
+	if (WARN_ON_ONCE(!st->should_run))
 		return;
 
-	st->should_run = false;
+	cpuhp_lock_acquire(bringup);
 
-	lock_map_acquire(&cpuhp_state_lock_map);
-	/* Single callback invocation for [un]install ? */
 	if (st->single) {
-		if (st->cb_state < CPUHP_AP_ONLINE) {
-			local_irq_disable();
-			ret = cpuhp_invoke_callback(cpu, st->cb_state,
-						    st->bringup, st->node);
-			local_irq_enable();
+		state = st->cb_state;
+		st->should_run = false;
+	} else {
+		if (bringup) {
+			st->state++;
+			st->should_run = (st->state < st->target);
+			BUG_ON(st->state > st->target);
+			state = st->state;
 		} else {
-			ret = cpuhp_invoke_callback(cpu, st->cb_state,
-						    st->bringup, st->node);
+			state = st->state;
+			st->state--;
+			st->should_run = (st->state >
+					  max((int)st->target, CPUHP_TEARDOWN_CPU));
+			BUG_ON(st->state < st->target);
 		}
-	} else if (st->rollback) {
-		BUG_ON(st->state < CPUHP_AP_ONLINE_IDLE);
+	}
 
-		undo_cpu_down(cpu, st);
-		st->rollback = false;
+//	BUG_ON(!cpuhp_is_ap_state(state));
+	WARN_ONCE(!cpuhp_is_ap_state(state), "invalid AP state: %d\n", state);
+
+	if (st->rollback) {
+		struct cpuhp_step *step = cpuhp_get_step(state);
+		if (step->skip_onerr)
+			goto next;
+	}
+
+	if (state < CPUHP_AP_ONLINE) {
+		local_irq_disable();
+		st->result = cpuhp_invoke_callback(cpu, state, bringup, st->node, &st->last);
+		local_irq_enable();
 	} else {
-		/* Cannot happen .... */
-		BUG_ON(st->state < CPUHP_AP_ONLINE_IDLE);
-
-		/* Regular hotplug work */
-		if (st->state < st->target)
-			ret = cpuhp_ap_online(cpu, st);
-		else if (st->state > st->target)
-			ret = cpuhp_ap_offline(cpu, st);
+		st->result = cpuhp_invoke_callback(cpu, state, bringup, st->node, &st->last);
+	}
+
+	if (st->result) {
+		/*
+		 * If we fail on a rollback, we're up a creek without a
+		 * paddle, no way forward, no way back. We lose, thanks for
+		 * playing.
+		 */
+		BUG_ON(st->rollback);
+		st->should_run = false;
 	}
-	lock_map_release(&cpuhp_state_lock_map);
-	st->result = ret;
-	complete(&st->done);
+
+next:
+	cpuhp_lock_release(bringup);
+
+	if (!st->should_run)
+		complete_ap_thread(st, bringup);
 }
 
 /* Invoke a single callback on a remote cpu */
@@ -460,62 +573,75 @@ cpuhp_invoke_ap_callback(int cpu, enum cpuhp_state state, bool bringup,
 			 struct hlist_node *node)
 {
 	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
+	int ret;
 
 	if (!cpu_online(cpu))
 		return 0;
 
-	lock_map_acquire(&cpuhp_state_lock_map);
-	lock_map_release(&cpuhp_state_lock_map);
+	cpuhp_lock_acquire(false);
+	cpuhp_lock_release(false);
+
+	cpuhp_lock_acquire(true);
+	cpuhp_lock_release(true);
 
 	/*
 	 * If we are up and running, use the hotplug thread. For early calls
 	 * we invoke the thread function directly.
 	 */
 	if (!st->thread)
-		return cpuhp_invoke_callback(cpu, state, bringup, node);
+		return cpuhp_invoke_callback(cpu, state, bringup, node, NULL);
 
-	st->cb_state = state;
+	st->rollback = false;
 	st->single = true;
 	st->bringup = bringup;
 	st->node = node;
+	st->last = NULL;
+	st->cb_state = state;
+	st->result = 0;
 
 	/*
-	 * Make sure the above stores are visible before should_run becomes
-	 * true. Paired with the mb() above in cpuhp_thread_fun()
+	 * RELEASE - ensures the above stores are visible when should_run
+	 * becomes true. Paired with the smp_mb() in cpuhp_thread_fun().
 	 */
 	smp_mb();
 	st->should_run = true;
 	wake_up_process(st->thread);
-	wait_for_completion(&st->done);
-	return st->result;
-}
+	wait_for_ap_thread(st, bringup);
 
-/* Regular hotplug invocation of the AP hotplug thread */
-static void __cpuhp_kick_ap_work(struct cpuhp_cpu_state *st)
-{
-	st->result = 0;
-	st->single = false;
 	/*
-	 * Make sure the above stores are visible before should_run becomes
-	 * true. Paired with the mb() above in cpuhp_thread_fun()
+	 * If we failed and did a partial, do a rollback.
 	 */
-	smp_mb();
-	st->should_run = true;
-	wake_up_process(st->thread);
+	if ((ret = st->result) && st->last) {
+		st->rollback = true;
+		st->bringup = !bringup;
+
+		smp_mb();
+		st->should_run = true;
+		wake_up_process(st->thread);
+		wait_for_ap_thread(st, !bringup);
+	}
+
+	return ret;
 }
 
 static int cpuhp_kick_ap_work(unsigned int cpu)
 {
 	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
-	enum cpuhp_state state = st->state;
+	enum cpuhp_state prev_state = st->state;
+	int ret;
+
+	trace_cpuhp_enter(cpu, st->target, prev_state, cpuhp_kick_ap_work);
+
+	cpuhp_lock_acquire(false);
+	cpuhp_lock_release(false);
+
+	cpuhp_lock_acquire(true);
+	cpuhp_lock_release(true);
+
+	ret = cpuhp_kick_ap(st, st->target);
 
-	trace_cpuhp_enter(cpu, st->target, state, cpuhp_kick_ap_work);
-	lock_map_acquire(&cpuhp_state_lock_map);
-	lock_map_release(&cpuhp_state_lock_map);
-	__cpuhp_kick_ap_work(st);
-	wait_for_completion(&st->done);
-	trace_cpuhp_exit(cpu, st->state, state, st->result);
-	return st->result;
+	trace_cpuhp_exit(cpu, st->state, prev_state, ret);
+	return ret;
 }
 
 static struct smp_hotplug_thread cpuhp_threads = {
@@ -595,7 +721,7 @@ static int take_cpu_down(void *_param)
 	st->state--;
 	/* Invoke the former CPU_DYING callbacks */
 	for (; st->state > target; st->state--)
-		cpuhp_invoke_callback(cpu, st->state, false, NULL);
+		cpuhp_invoke_callback(cpu, st->state, false, NULL, NULL);
 
 	/* Give up timekeeping duties */
 	tick_handover_do_timer();
@@ -639,7 +765,7 @@ static int takedown_cpu(unsigned int cpu)
 	 *
 	 * Wait for the stop thread to go away.
 	 */
-	wait_for_completion(&st->done);
+	wait_for_ap_thread(st, false);
 	BUG_ON(st->state != CPUHP_AP_IDLE_DEAD);
 
 	/* Interrupts are moved away from the dying cpu, reenable alloc/free */
@@ -658,7 +784,7 @@ static void cpuhp_complete_idle_dead(void *arg)
 {
 	struct cpuhp_cpu_state *st = arg;
 
-	complete(&st->done);
+	complete_ap_thread(st, false);
 }
 
 void cpuhp_report_idle_dead(void)
@@ -680,6 +806,7 @@ void cpuhp_report_idle_dead(void)
 #define takedown_cpu		NULL
 #endif
 
+
 #ifdef CONFIG_HOTPLUG_CPU
 
 /* Requires cpu_add_remove_lock to be held */
@@ -699,8 +826,7 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen,
 
 	cpuhp_tasks_frozen = tasks_frozen;
 
-	prev_state = st->state;
-	st->target = target;
+	prev_state = cpuhp_set_state(st, target);
 	/*
 	 * If the current CPU state is in the range of the AP hotplug thread,
 	 * then we need to kick the thread.
@@ -727,8 +853,7 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen,
 	 */
 	ret = cpuhp_down_callbacks(cpu, st, target);
 	if (ret && st->state > CPUHP_TEARDOWN_CPU && st->state < prev_state) {
-		st->target = prev_state;
-		st->rollback = true;
+		cpuhp_reset_state(st, prev_state);
 		cpuhp_kick_ap_work(cpu);
 	}
 
@@ -776,7 +901,7 @@ void notify_cpu_starting(unsigned int cpu)
 	rcu_cpu_starting(cpu);	/* Enables RCU usage on this CPU. */
 	while (st->state < target) {
 		st->state++;
-		cpuhp_invoke_callback(cpu, st->state, true, NULL);
+		cpuhp_invoke_callback(cpu, st->state, true, NULL, NULL);
 	}
 }
 
@@ -794,7 +919,7 @@ void cpuhp_online_idle(enum cpuhp_state state)
 		return;
 
 	st->state = CPUHP_AP_ONLINE_IDLE;
-	complete(&st->done);
+	complete_ap_thread(st, true);
 }
 
 /* Requires cpu_add_remove_lock to be held */
@@ -829,7 +954,7 @@ static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target)
 
 	cpuhp_tasks_frozen = tasks_frozen;
 
-	st->target = target;
+	cpuhp_set_state(st, target);
 	/*
 	 * If the current CPU state is in the range of the AP hotplug thread,
 	 * then we need to kick the thread once more.
@@ -1296,6 +1421,10 @@ static int cpuhp_issue_call(int cpu, enum cpuhp_state state, bool bringup,
 	struct cpuhp_step *sp = cpuhp_get_step(state);
 	int ret;
 
+	/*
+	 * If there's nothing to do, we're done.
+	 * Relies on the union for multi_instance.
+	 */
 	if ((bringup && !sp->startup.single) ||
 	    (!bringup && !sp->teardown.single))
 		return 0;
@@ -1307,9 +1436,9 @@ static int cpuhp_issue_call(int cpu, enum cpuhp_state state, bool bringup,
 	if (cpuhp_is_ap_state(state))
 		ret = cpuhp_invoke_ap_callback(cpu, state, bringup, node);
 	else
-		ret = cpuhp_invoke_callback(cpu, state, bringup, node);
+		ret = cpuhp_invoke_callback(cpu, state, bringup, node, NULL);
 #else
-	ret = cpuhp_invoke_callback(cpu, state, bringup, node);
+	ret = cpuhp_invoke_callback(cpu, state, bringup, node, NULL);
 #endif
 	BUG_ON(ret && !bringup);
 	return ret;

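The subtle piece above is the new 'lastp' cursor threaded through
cpuhp_invoke_callback() for multi-instance states. Reduced to its
essentials it looks roughly like this (a hedged sketch with hypothetical
names, not the kernel function itself):

    #include <linux/list.h>

    /*
     * Walk the instance list; on failure record where we stopped, so a
     * later pass in the opposite direction can undo exactly the
     * instances already processed and nothing more.
     */
    static int invoke_instances(struct hlist_head *list,
                                int (*cb)(struct hlist_node *node),
                                struct hlist_node **lastp)
    {
            struct hlist_node *node;
            int ret;

            hlist_for_each(node, list) {
                    if (lastp && node == *lastp)
                            break;          /* old failure point reached */
                    ret = cb(node);
                    if (ret) {
                            if (lastp)
                                    *lastp = node;  /* partial progress */
                            return ret;
                    }
            }
            if (lastp)
                    *lastp = NULL;          /* full pass completed */
            return 0;
    }

This is what lets cpuhp_thread_fun() hand st->last back in on the rollback
pass, instead of relying on the old automatic nested rollback.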

* Re: [PATCH 2/2] smp/hotplug,lockdep: Annotate cpuhp_state
  2017-09-05  7:52 ` [PATCH 2/2] smp/hotplug,lockdep: Annotate cpuhp_state Peter Zijlstra
  2017-09-05 12:59   ` Mike Galbraith
@ 2017-09-19 17:57   ` Paul E. McKenney
  1 sibling, 0 replies; 8+ messages in thread
From: Paul E. McKenney @ 2017-09-19 17:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, linux-kernel, Byungchul Park,
	Sebastian Andrzej Siewior, Mike Galbraith

On Tue, Sep 05, 2017 at 09:52:20AM +0200, Peter Zijlstra wrote:
> After the st->done annotation, lockdep cross-release now complains
> about:
> 
>   CPU0                  CPU1                    CPU2
>   cpuhp_up_callbacks:	takedown_cpu:		cpuhp_thread_fun:
> 
>   cpuhp_state
>                         irq_lock_sparse()
>     irq_lock_sparse()
>                         wait_for_completion()
>                                                 cpuhp_state
>                                                 complete()
> 
> which again spells deadlock, because CPU0 needs to wait for CPU1's
> irq_lock_sparse which will wait for CPU2's completion, which in turn
> waits for CPU0's cpuhp_state.
> 
> This again mixes the up and down chains, but now on cpuhp_state.
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Byungchul Park <max.byungchul.park@gmail.com>
> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Reported-by: Mike Galbraith <efault@gmx.de>
> Tested-by: Mike Galbraith <efault@gmx.de>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/cpu.c |   12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -67,11 +67,14 @@ struct cpuhp_cpu_state {
>  static DEFINE_PER_CPU(struct cpuhp_cpu_state, cpuhp_state);
>  
>  #if defined(CONFIG_LOCKDEP) && defined(CONFIG_SMP)
> -static struct lock_class_key cpuhp_state_key;
> +static struct lock_class_key cpuhp_state_up_key;
> +#ifdef CONFIG_HOTPLUG_CPU
> +static struct lock_class_key cpuhp_state_down_key;
> +#endif

These two patches work for me if CONFIG_PROVE_LOCKING=y, but
lockdep_init_map() wants its key argument to exist if
CONFIG_PROVE_LOCKING=n.

Not sure whether it is better to remove the CONFIG_LOCKDEP #if
on the one hand or to make lockdep_init_map() lose the "(void)"
things on the other...

My tests didn't involve failing CPU hotplug operations, so I
didn't run into the issue Thomas was concerned about.

						Thanx, Paul

>  static struct lockdep_map cpuhp_state_lock_map =
> -	STATIC_LOCKDEP_MAP_INIT("cpuhp_state", &cpuhp_state_key);
> +	STATIC_LOCKDEP_MAP_INIT("cpuhp_state-up", &cpuhp_state_up_key);
>  #endif
>  
>  /**
>   * cpuhp_step - Hotplug state machine step
>   * @name:	Name of the step
> @@ -714,6 +718,8 @@ static int __ref _cpu_down(unsigned int
>  	cpus_write_lock();
>  
>  	lockdep_reinit_st_done();
> +	lockdep_init_map(&cpuhp_state_lock_map, "cpuhp_state-down",
> +			 &cpuhp_state_down_key, 0);
>  
>  	cpuhp_tasks_frozen = tasks_frozen;
>  
> @@ -828,6 +834,8 @@ static int _cpu_up(unsigned int cpu, int
>  	cpus_write_lock();
>  
>  	lockdep_reinit_st_done();
> +	lockdep_init_map(&cpuhp_state_lock_map, "cpuhp_state-up",
> +			 &cpuhp_state_up_key, 0);
>  
>  	if (!cpu_present(cpu)) {
>  		ret = -EINVAL;
> 
> 

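To make that concrete: with CONFIG_LOCKDEP=n (the usual case when
CONFIG_PROVE_LOCKING=n), lockdep_init_map() becomes a stub that still
evaluates its arguments, roughly (from memory; details may differ):

    #define lockdep_init_map(lock, name, key, sub) \
            do { (void)(name); (void)(key); } while (0)

so the key identifiers must be declared even when unused. One hedged way
out, per the first option above, is to declare the keys unconditionally;
an unreferenced static lock_class_key costs almost nothing:

    static struct lock_class_key cpuhp_state_up_key;
    #ifdef CONFIG_HOTPLUG_CPU
    static struct lock_class_key cpuhp_state_down_key;
    #endif

The other option, dropping the (void) casts from the stub, would instead
risk "defined but not used" warnings at sites that declare keys only for
lockdep's benefit, which is presumably why the casts are there.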
