LKML Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH 0/7] smp/hotplug rework / lockdep annotate
@ 2017-09-20 17:00 Peter Zijlstra
  2017-09-20 17:00 ` [PATCH 1/7] smp/hotplug: Add state diagram Peter Zijlstra
                   ` (6 more replies)
  0 siblings, 7 replies; 12+ messages in thread
From: Peter Zijlstra @ 2017-09-20 17:00 UTC (permalink / raw)
  To: mingo, tglx
  Cc: linux-kernel, peterz, rostedt, bigeasy, efault, max.byungchul.park

These here patches rework the cpu hotplug state machine and add
two lockdep (crossrelease) annotations.

The rework is quite invasive, but was suggested by Thomas as being the right
way to go about doing things :-)

These patches have been tested with Steve's hotplug stress script, as well as
two new hotplug test scripts which test cpu up/down rollback through explicit
failure injection (see last patch).

Where cpu-down used to insta-trigger a lockdep splat, I am no longer able to
trigger anything with these patches applied.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/7] smp/hotplug: Add state diagram
  2017-09-20 17:00 [PATCH 0/7] smp/hotplug rework / lockdep annotate Peter Zijlstra
@ 2017-09-20 17:00 ` Peter Zijlstra
  2017-09-20 17:00 ` [PATCH 2/7] smp/hotplug: Allow external multi-instance rollback Peter Zijlstra
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2017-09-20 17:00 UTC (permalink / raw)
  To: mingo, tglx
  Cc: linux-kernel, peterz, rostedt, bigeasy, efault, max.byungchul.park


[-- Attachment #0: peterz-hotplug-1.patch --]
[-- Type: text/plain, Size: 793 bytes --]

Add a state diagram to clarify when which states are ran where.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/cpuhotplug.h |   18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -3,6 +3,24 @@
 
 #include <linux/types.h>
 
+/*
+ * CPU-up			CPU-down
+ *
+ * BP		AP		BP		AP
+ *
+ * OFFLINE			OFFLINE
+ *   |				  ^
+ *   v				  |
+ * BRINGUP_CPU->AP_OFFLINE	BRINGUP_CPU  <- AP_IDLE_DEAD (idle thread/play_dead)
+ *		  |				AP_OFFLINE
+ *		  v (IRQ-off)	  ,---------------^
+ *		AP_ONLNE	  | (stop_machine)
+ *		  |		TEARDOWN_CPU <-	AP_ONLINE_IDLE
+ *		  |				  ^
+ *		  v				  |
+ *              AP_ACTIVE			AP_ACTIVE
+ */
+
 enum cpuhp_state {
 	CPUHP_OFFLINE,
 	CPUHP_CREATE_THREADS,

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 2/7] smp/hotplug: Allow external multi-instance rollback
  2017-09-20 17:00 [PATCH 0/7] smp/hotplug rework / lockdep annotate Peter Zijlstra
  2017-09-20 17:00 ` [PATCH 1/7] smp/hotplug: Add state diagram Peter Zijlstra
@ 2017-09-20 17:00 ` Peter Zijlstra
  2017-09-20 17:00 ` [PATCH 3/7] smp/hotplug: Rewrite AP state machine core Peter Zijlstra
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2017-09-20 17:00 UTC (permalink / raw)
  To: mingo, tglx
  Cc: linux-kernel, peterz, rostedt, bigeasy, efault, max.byungchul.park


[-- Attachment #0: peterz-hotplug-5.patch --]
[-- Type: text/plain, Size: 5658 bytes --]

Currently the rollback of multi-instance states is handled inside
cpuhp_invoke_callback(). The problem is that when we want to allow an
explicit state change for rollback, we need to return from the
function without doing the rollback.

Change cpuhp_invoke_callback() to optionally return the multi-instance
state, such that rollback can be done from a subsequent call.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/cpu.c |   47 ++++++++++++++++++++++++++++++++---------------
 1 file changed, 32 insertions(+), 15 deletions(-)

--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -123,13 +123,16 @@ static struct cpuhp_step *cpuhp_get_step
 /**
  * cpuhp_invoke_callback _ Invoke the callbacks for a given state
  * @cpu:	The cpu for which the callback should be invoked
- * @step:	The step in the state machine
+ * @state:	The state to do callbacks for
  * @bringup:	True if the bringup callback should be invoked
+ * @node:	For multi-instance, do a single entry callback for install/remove
+ * @lastp:	For multi-instance rollback, remember how far we got
  *
  * Called from cpu hotplug and from the state register machinery.
  */
 static int cpuhp_invoke_callback(unsigned int cpu, enum cpuhp_state state,
-				 bool bringup, struct hlist_node *node)
+				 bool bringup, struct hlist_node *node,
+				 struct hlist_node **lastp)
 {
 	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
 	struct cpuhp_step *step = cpuhp_get_step(state);
@@ -138,6 +141,7 @@ static int cpuhp_invoke_callback(unsigne
 	int ret, cnt;
 
 	if (!step->multi_instance) {
+		WARN_ON_ONCE(lastp && *lastp);
 		cb = bringup ? step->startup.single : step->teardown.single;
 		if (!cb)
 			return 0;
@@ -152,6 +156,7 @@ static int cpuhp_invoke_callback(unsigne
 
 	/* Single invocation for instance add/remove */
 	if (node) {
+		WARN_ON_ONCE(lastp && *lastp);
 		trace_cpuhp_multi_enter(cpu, st->target, state, cbm, node);
 		ret = cbm(cpu, node);
 		trace_cpuhp_exit(cpu, st->state, state, ret);
@@ -161,13 +166,23 @@ static int cpuhp_invoke_callback(unsigne
 	/* State transition. Invoke on all instances */
 	cnt = 0;
 	hlist_for_each(node, &step->list) {
+		if (lastp && node == *lastp)
+			break;
+
 		trace_cpuhp_multi_enter(cpu, st->target, state, cbm, node);
 		ret = cbm(cpu, node);
 		trace_cpuhp_exit(cpu, st->state, state, ret);
-		if (ret)
-			goto err;
+		if (ret) {
+			if (!lastp)
+				goto err;
+
+			*lastp = node;
+			return ret;
+		}
 		cnt++;
 	}
+	if (lastp)
+		*lastp = NULL;
 	return 0;
 err:
 	/* Rollback the instances if one failed */
@@ -323,7 +338,7 @@ static void undo_cpu_down(unsigned int c
 		struct cpuhp_step *step = cpuhp_get_step(st->state);
 
 		if (!step->skip_onerr)
-			cpuhp_invoke_callback(cpu, st->state, true, NULL);
+			cpuhp_invoke_callback(cpu, st->state, true, NULL, NULL);
 	}
 }
 
@@ -334,7 +349,7 @@ static int cpuhp_down_callbacks(unsigned
 	int ret = 0;
 
 	for (; st->state > target; st->state--) {
-		ret = cpuhp_invoke_callback(cpu, st->state, false, NULL);
+		ret = cpuhp_invoke_callback(cpu, st->state, false, NULL, NULL);
 		if (ret) {
 			st->target = prev_state;
 			undo_cpu_down(cpu, st);
@@ -350,7 +365,7 @@ static void undo_cpu_up(unsigned int cpu
 		struct cpuhp_step *step = cpuhp_get_step(st->state);
 
 		if (!step->skip_onerr)
-			cpuhp_invoke_callback(cpu, st->state, false, NULL);
+			cpuhp_invoke_callback(cpu, st->state, false, NULL, NULL);
 	}
 }
 
@@ -362,7 +377,7 @@ static int cpuhp_up_callbacks(unsigned i
 
 	while (st->state < target) {
 		st->state++;
-		ret = cpuhp_invoke_callback(cpu, st->state, true, NULL);
+		ret = cpuhp_invoke_callback(cpu, st->state, true, NULL, NULL);
 		if (ret) {
 			st->target = prev_state;
 			undo_cpu_up(cpu, st);
@@ -428,11 +443,13 @@ static void cpuhp_thread_fun(unsigned in
 		if (st->cb_state < CPUHP_AP_ONLINE) {
 			local_irq_disable();
 			ret = cpuhp_invoke_callback(cpu, st->cb_state,
-						    st->bringup, st->node);
+						    st->bringup, st->node,
+						    NULL);
 			local_irq_enable();
 		} else {
 			ret = cpuhp_invoke_callback(cpu, st->cb_state,
-						    st->bringup, st->node);
+						    st->bringup, st->node,
+						    NULL);
 		}
 	} else if (st->rollback) {
 		BUG_ON(st->state < CPUHP_AP_ONLINE_IDLE);
@@ -472,7 +489,7 @@ cpuhp_invoke_ap_callback(int cpu, enum c
 	 * we invoke the thread function directly.
 	 */
 	if (!st->thread)
-		return cpuhp_invoke_callback(cpu, state, bringup, node);
+		return cpuhp_invoke_callback(cpu, state, bringup, node, NULL);
 
 	st->cb_state = state;
 	st->single = true;
@@ -595,7 +612,7 @@ static int take_cpu_down(void *_param)
 	st->state--;
 	/* Invoke the former CPU_DYING callbacks */
 	for (; st->state > target; st->state--)
-		cpuhp_invoke_callback(cpu, st->state, false, NULL);
+		cpuhp_invoke_callback(cpu, st->state, false, NULL, NULL);
 
 	/* Give up timekeeping duties */
 	tick_handover_do_timer();
@@ -776,7 +793,7 @@ void notify_cpu_starting(unsigned int cp
 	rcu_cpu_starting(cpu);	/* Enables RCU usage on this CPU. */
 	while (st->state < target) {
 		st->state++;
-		cpuhp_invoke_callback(cpu, st->state, true, NULL);
+		cpuhp_invoke_callback(cpu, st->state, true, NULL, NULL);
 	}
 }
 
@@ -1307,9 +1324,9 @@ static int cpuhp_issue_call(int cpu, enu
 	if (cpuhp_is_ap_state(state))
 		ret = cpuhp_invoke_ap_callback(cpu, state, bringup, node);
 	else
-		ret = cpuhp_invoke_callback(cpu, state, bringup, node);
+		ret = cpuhp_invoke_callback(cpu, state, bringup, node, NULL);
 #else
-	ret = cpuhp_invoke_callback(cpu, state, bringup, node);
+	ret = cpuhp_invoke_callback(cpu, state, bringup, node, NULL);
 #endif
 	BUG_ON(ret && !bringup);
 	return ret;

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 3/7] smp/hotplug: Rewrite AP state machine core
  2017-09-20 17:00 [PATCH 0/7] smp/hotplug rework / lockdep annotate Peter Zijlstra
  2017-09-20 17:00 ` [PATCH 1/7] smp/hotplug: Add state diagram Peter Zijlstra
  2017-09-20 17:00 ` [PATCH 2/7] smp/hotplug: Allow external multi-instance rollback Peter Zijlstra
@ 2017-09-20 17:00 ` Peter Zijlstra
  2017-09-20 17:00 ` [PATCH 4/7] smp/hotplug: Callback vs state-machine consistency Peter Zijlstra
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2017-09-20 17:00 UTC (permalink / raw)
  To: mingo, tglx
  Cc: linux-kernel, peterz, rostedt, bigeasy, efault, max.byungchul.park


[-- Attachment #0: peterz-hotplug-2.patch --]
[-- Type: text/plain, Size: 11480 bytes --]

There is currently no explicit state change on rollback. That is,
st->bringup, st->rollback and st->target are not consistent when doing
the rollback.

Rework the AP state handling to be more coherent. This does mean we
have to do a second AP kick-and-wait for rollback, but since rollback
is the slow path of a slowpath, this really should not matter.

Take this opportunity to simplify the AP thread function to only run a
single callback per invocation. This unifies the three single/up/down
modes is supports. The looping it used to do for up/down are achieved
by retaining should_run and relying on the main smpboot_thread_fn()
loop.

(I have most of a patch that does the same for the BP state handling,
but that's not critical and gets a little complicated because
CPUHP_BRINGUP_CPU does the AP handoff from a callback, which gets
recursive @st usage, I still have de-fugly that.)

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/cpu.c |  260 +++++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 176 insertions(+), 84 deletions(-)

--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -58,6 +58,7 @@ struct cpuhp_cpu_state {
 	bool			single;
 	bool			bringup;
 	struct hlist_node	*node;
+	struct hlist_node	*last;
 	enum cpuhp_state	cb_state;
 	int			result;
 	struct completion	done;
@@ -112,6 +113,14 @@ static bool cpuhp_is_ap_state(enum cpuhp
 	return state > CPUHP_BRINGUP_CPU && state != CPUHP_TEARDOWN_CPU;
 }
 
+/*
+ * The former STARTING/DYING states, ran with IRQs disabled and must not fail.
+ */
+static bool cpuhp_is_atomic_state(enum cpuhp_state state)
+{
+	return CPUHP_AP_IDLE_DEAD <= state && state < CPUHP_AP_ONLINE;
+}
+
 static struct cpuhp_step *cpuhp_get_step(enum cpuhp_state state)
 {
 	struct cpuhp_step *sp;
@@ -286,7 +295,72 @@ void cpu_hotplug_enable(void)
 EXPORT_SYMBOL_GPL(cpu_hotplug_enable);
 #endif	/* CONFIG_HOTPLUG_CPU */
 
-static void __cpuhp_kick_ap_work(struct cpuhp_cpu_state *st);
+static inline enum cpuhp_state
+cpuhp_set_state(struct cpuhp_cpu_state *st, enum cpuhp_state target)
+{
+	enum cpuhp_state prev_state = st->state;
+
+	st->rollback = false;
+	st->last = NULL;
+
+	st->target = target;
+	st->single = false;
+	st->bringup = st->state < target;
+
+	return prev_state;
+}
+
+static inline void
+cpuhp_reset_state(struct cpuhp_cpu_state *st, enum cpuhp_state prev_state)
+{
+	st->rollback = true;
+
+	/*
+	 * If we have st->last we need to undo partial multi_instance of this
+	 * state first. Otherwise start undo at the previous state.
+	 */
+	if (!st->last) {
+		if (st->bringup)
+			st->state--;
+		else
+			st->state++;
+	}
+
+	st->target = prev_state;
+	st->bringup = !st->bringup;
+}
+
+/* Regular hotplug invocation of the AP hotplug thread */
+static void __cpuhp_kick_ap(struct cpuhp_cpu_state *st)
+{
+	if (!st->single && st->state == st->target)
+		return;
+
+	st->result = 0;
+	/*
+	 * Make sure the above stores are visible before should_run becomes
+	 * true. Paired with the mb() above in cpuhp_thread_fun()
+	 */
+	smp_mb();
+	st->should_run = true;
+	wake_up_process(st->thread);
+	wait_for_completion(&st->done);
+}
+
+static int cpuhp_kick_ap(struct cpuhp_cpu_state *st, enum cpuhp_state target)
+{
+	enum cpuhp_state prev_state;
+	int ret;
+
+	prev_state = cpuhp_set_state(st, target);
+	__cpuhp_kick_ap(st);
+	if ((ret = st->result)) {
+		cpuhp_reset_state(st, prev_state);
+		__cpuhp_kick_ap(st);
+	}
+
+	return ret;
+}
 
 static int bringup_wait_for_ap(unsigned int cpu)
 {
@@ -301,12 +375,10 @@ static int bringup_wait_for_ap(unsigned
 	stop_machine_unpark(cpu);
 	kthread_unpark(st->thread);
 
-	/* Should we go further up ? */
-	if (st->target > CPUHP_AP_ONLINE_IDLE) {
-		__cpuhp_kick_ap_work(st);
-		wait_for_completion(&st->done);
-	}
-	return st->result;
+	if (st->target <= CPUHP_AP_ONLINE_IDLE)
+		return 0;
+
+	return cpuhp_kick_ap(st, st->target);
 }
 
 static int bringup_cpu(unsigned int cpu)
@@ -404,71 +476,90 @@ static int cpuhp_should_run(unsigned int
 	return st->should_run;
 }
 
-/* Execute the teardown callbacks. Used to be CPU_DOWN_PREPARE */
-static int cpuhp_ap_offline(unsigned int cpu, struct cpuhp_cpu_state *st)
-{
-	enum cpuhp_state target = max((int)st->target, CPUHP_TEARDOWN_CPU);
-
-	return cpuhp_down_callbacks(cpu, st, target);
-}
-
-/* Execute the online startup callbacks. Used to be CPU_ONLINE */
-static int cpuhp_ap_online(unsigned int cpu, struct cpuhp_cpu_state *st)
-{
-	return cpuhp_up_callbacks(cpu, st, st->target);
-}
-
 /*
  * Execute teardown/startup callbacks on the plugged cpu. Also used to invoke
  * callbacks when a state gets [un]installed at runtime.
+ *
+ * Each invocation of this function by the smpboot thread does a single AP
+ * state callback.
+ *
+ * It has 3 modes of operation:
+ *  - single: runs st->cb_state
+ *  - up:     runs ++st->state, while st->state < st->target
+ *  - down:   runs st->state--, while st->state > st->target
+ *
+ * When complete or on error, should_run is cleared and the completion is fired.
  */
 static void cpuhp_thread_fun(unsigned int cpu)
 {
 	struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state);
-	int ret = 0;
+	bool bringup = st->bringup;
+	enum cpuhp_state state;
 
 	/*
-	 * Paired with the mb() in cpuhp_kick_ap_work and
-	 * cpuhp_invoke_ap_callback, so the work set is consistent visible.
+	 * ACQUIRE for the cpuhp_should_run() load of ->should_run. Ensures
+	 * that if we see ->should_run we also see the rest of the state.
 	 */
 	smp_mb();
-	if (!st->should_run)
-		return;
 
-	st->should_run = false;
+	if (WARN_ON_ONCE(!st->should_run))
+		return;
 
 	lock_map_acquire(&cpuhp_state_lock_map);
-	/* Single callback invocation for [un]install ? */
+
 	if (st->single) {
-		if (st->cb_state < CPUHP_AP_ONLINE) {
-			local_irq_disable();
-			ret = cpuhp_invoke_callback(cpu, st->cb_state,
-						    st->bringup, st->node,
-						    NULL);
-			local_irq_enable();
+		state = st->cb_state;
+		st->should_run = false;
+	} else {
+		if (bringup) {
+			st->state++;
+			state = st->state;
+			st->should_run = (st->state < st->target);
+			WARN_ON_ONCE(st->state > st->target);
 		} else {
-			ret = cpuhp_invoke_callback(cpu, st->cb_state,
-						    st->bringup, st->node,
-						    NULL);
+			state = st->state;
+			st->state--;
+			st->should_run = (st->state > st->target);
+			WARN_ON_ONCE(st->state < st->target);
 		}
-	} else if (st->rollback) {
-		BUG_ON(st->state < CPUHP_AP_ONLINE_IDLE);
+	}
+
+	WARN_ON_ONCE(!cpuhp_is_ap_state(state));
+
+	if (st->rollback) {
+		struct cpuhp_step *step = cpuhp_get_step(state);
+		if (step->skip_onerr)
+			goto next;
+	}
 
-		undo_cpu_down(cpu, st);
-		st->rollback = false;
+	if (cpuhp_is_atomic_state(state)) {
+		local_irq_disable();
+		st->result = cpuhp_invoke_callback(cpu, state, bringup, st->node, &st->last);
+		local_irq_enable();
+
+		/*
+		 * STARTING/DYING must not fail!
+		 */
+		WARN_ON_ONCE(st->result);
 	} else {
-		/* Cannot happen .... */
-		BUG_ON(st->state < CPUHP_AP_ONLINE_IDLE);
+		st->result = cpuhp_invoke_callback(cpu, state, bringup, st->node, &st->last);
+	}
 
-		/* Regular hotplug work */
-		if (st->state < st->target)
-			ret = cpuhp_ap_online(cpu, st);
-		else if (st->state > st->target)
-			ret = cpuhp_ap_offline(cpu, st);
+	if (st->result) {
+		/*
+		 * If we fail on a rollback, we're up a creek without no
+		 * paddle, no way forward, no way back. We loose, thanks for
+		 * playing.
+		 */
+		WARN_ON_ONCE(st->rollback);
+		st->should_run = false;
 	}
+
+next:
 	lock_map_release(&cpuhp_state_lock_map);
-	st->result = ret;
-	complete(&st->done);
+
+	if (!st->should_run)
+		complete(&st->done);
 }
 
 /* Invoke a single callback on a remote cpu */
@@ -477,6 +568,7 @@ cpuhp_invoke_ap_callback(int cpu, enum c
 			 struct hlist_node *node)
 {
 	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
+	int ret;
 
 	if (!cpu_online(cpu))
 		return 0;
@@ -491,48 +583,43 @@ cpuhp_invoke_ap_callback(int cpu, enum c
 	if (!st->thread)
 		return cpuhp_invoke_callback(cpu, state, bringup, node, NULL);
 
+	st->rollback = false;
+	st->last = NULL;
+
+	st->node = node;
+	st->bringup = bringup;
 	st->cb_state = state;
 	st->single = true;
-	st->bringup = bringup;
-	st->node = node;
 
-	/*
-	 * Make sure the above stores are visible before should_run becomes
-	 * true. Paired with the mb() above in cpuhp_thread_fun()
-	 */
-	smp_mb();
-	st->should_run = true;
-	wake_up_process(st->thread);
-	wait_for_completion(&st->done);
-	return st->result;
-}
+	__cpuhp_kick_ap(st);
 
-/* Regular hotplug invocation of the AP hotplug thread */
-static void __cpuhp_kick_ap_work(struct cpuhp_cpu_state *st)
-{
-	st->result = 0;
-	st->single = false;
 	/*
-	 * Make sure the above stores are visible before should_run becomes
-	 * true. Paired with the mb() above in cpuhp_thread_fun()
+	 * If we failed and did a partial, do a rollback.
 	 */
-	smp_mb();
-	st->should_run = true;
-	wake_up_process(st->thread);
+	if ((ret = st->result) && st->last) {
+		st->rollback = true;
+		st->bringup = !bringup;
+
+		__cpuhp_kick_ap(st);
+	}
+
+	return ret;
 }
 
 static int cpuhp_kick_ap_work(unsigned int cpu)
 {
 	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
-	enum cpuhp_state state = st->state;
+	enum cpuhp_state prev_state = st->state;
+	int ret;
 
-	trace_cpuhp_enter(cpu, st->target, state, cpuhp_kick_ap_work);
 	lock_map_acquire(&cpuhp_state_lock_map);
 	lock_map_release(&cpuhp_state_lock_map);
-	__cpuhp_kick_ap_work(st);
-	wait_for_completion(&st->done);
-	trace_cpuhp_exit(cpu, st->state, state, st->result);
-	return st->result;
+
+	trace_cpuhp_enter(cpu, st->target, prev_state, cpuhp_kick_ap_work);
+	ret = cpuhp_kick_ap(st, st->target);
+	trace_cpuhp_exit(cpu, st->state, prev_state, ret);
+
+	return ret;
 }
 
 static struct smp_hotplug_thread cpuhp_threads = {
@@ -716,13 +803,13 @@ static int __ref _cpu_down(unsigned int
 
 	cpuhp_tasks_frozen = tasks_frozen;
 
-	prev_state = st->state;
-	st->target = target;
+	prev_state = cpuhp_set_state(st, target);
 	/*
 	 * If the current CPU state is in the range of the AP hotplug thread,
 	 * then we need to kick the thread.
 	 */
 	if (st->state > CPUHP_TEARDOWN_CPU) {
+		st->target = max((int)target, CPUHP_TEARDOWN_CPU);
 		ret = cpuhp_kick_ap_work(cpu);
 		/*
 		 * The AP side has done the error rollback already. Just
@@ -737,6 +824,8 @@ static int __ref _cpu_down(unsigned int
 		 */
 		if (st->state > CPUHP_TEARDOWN_CPU)
 			goto out;
+
+		st->target = target;
 	}
 	/*
 	 * The AP brought itself down to CPUHP_TEARDOWN_CPU. So we need
@@ -744,9 +833,8 @@ static int __ref _cpu_down(unsigned int
 	 */
 	ret = cpuhp_down_callbacks(cpu, st, target);
 	if (ret && st->state > CPUHP_TEARDOWN_CPU && st->state < prev_state) {
-		st->target = prev_state;
-		st->rollback = true;
-		cpuhp_kick_ap_work(cpu);
+		cpuhp_reset_state(st, prev_state);
+		__cpuhp_kick_ap(st);
 	}
 
 out:
@@ -846,7 +934,7 @@ static int _cpu_up(unsigned int cpu, int
 
 	cpuhp_tasks_frozen = tasks_frozen;
 
-	st->target = target;
+	cpuhp_set_state(st, target);
 	/*
 	 * If the current CPU state is in the range of the AP hotplug thread,
 	 * then we need to kick the thread once more.
@@ -1313,6 +1401,10 @@ static int cpuhp_issue_call(int cpu, enu
 	struct cpuhp_step *sp = cpuhp_get_step(state);
 	int ret;
 
+	/*
+	 * If there's nothing to do, we done.
+	 * Relies on the union for multi_instance.
+	 */
 	if ((bringup && !sp->startup.single) ||
 	    (!bringup && !sp->teardown.single))
 		return 0;

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 4/7] smp/hotplug: Callback vs state-machine consistency
  2017-09-20 17:00 [PATCH 0/7] smp/hotplug rework / lockdep annotate Peter Zijlstra
                   ` (2 preceding siblings ...)
  2017-09-20 17:00 ` [PATCH 3/7] smp/hotplug: Rewrite AP state machine core Peter Zijlstra
@ 2017-09-20 17:00 ` Peter Zijlstra
  2017-09-20 17:00 ` [PATCH 5/7] smp/hotplug: Differentiate the AP completion between up and down Peter Zijlstra
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2017-09-20 17:00 UTC (permalink / raw)
  To: mingo, tglx
  Cc: linux-kernel, peterz, rostedt, bigeasy, efault, max.byungchul.park


[-- Attachment #0: peterz-hotplug-7.patch --]
[-- Type: text/plain, Size: 2794 bytes --]

While the generic callback functions have an 'int' return and thus
appear to be allowed to return error, this is not true for all states.

Specifically, what used to be STARTING/DYING are ran with IRQs
disabled from critical parts of CPU bringup/teardown and are not
allowed to fail. Add WARNs to enforce this rule.

But since some callbacks are indeed allowed to fail, we have the
situation where a state-machine rollback encounters a failure, in this
case we're stuck, we can't go forward and we can't go back. Also add a
WARN for that case.

AFAICT this is a fundamental 'problem' with no real obvious solution.
We want the 'prepare' callbacks to allow failure on either up or down.
Typically on prepare-up this would be things like -ENOMEM from
resource allocations, and the typical usage in prepare-down would be
something like -EBUSY to avoid CPUs being taken away.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/cpu.c |   26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -202,7 +202,14 @@ static int cpuhp_invoke_callback(unsigne
 	hlist_for_each(node, &step->list) {
 		if (!cnt--)
 			break;
-		cbm(cpu, node);
+
+		trace_cpuhp_multi_enter(cpu, st->target, state, cbm, node);
+		ret = cbm(cpu, node);
+		trace_cpuhp_exit(cpu, st->state, state, ret);
+		/*
+		 * Rollback must not fail,
+		 */
+		WARN_ON_ONCE(ret);
 	}
 	return ret;
 }
@@ -695,6 +702,7 @@ static int take_cpu_down(void *_param)
 	struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state);
 	enum cpuhp_state target = max((int)st->target, CPUHP_AP_OFFLINE);
 	int err, cpu = smp_processor_id();
+	int ret;
 
 	/* Ensure this CPU doesn't handle any more interrupts. */
 	err = __cpu_disable();
@@ -708,8 +716,13 @@ static int take_cpu_down(void *_param)
 	WARN_ON(st->state != CPUHP_TEARDOWN_CPU);
 	st->state--;
 	/* Invoke the former CPU_DYING callbacks */
-	for (; st->state > target; st->state--)
-		cpuhp_invoke_callback(cpu, st->state, false, NULL, NULL);
+	for (; st->state > target; st->state--) {
+		ret = cpuhp_invoke_callback(cpu, st->state, false, NULL, NULL);
+		/*
+		 * DYING must not fail!
+		 */
+		WARN_ON_ONCE(ret);
+	}
 
 	/* Give up timekeeping duties */
 	tick_handover_do_timer();
@@ -884,11 +897,16 @@ void notify_cpu_starting(unsigned int cp
 {
 	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
 	enum cpuhp_state target = min((int)st->target, CPUHP_AP_ONLINE);
+	int ret;
 
 	rcu_cpu_starting(cpu);	/* Enables RCU usage on this CPU. */
 	while (st->state < target) {
 		st->state++;
-		cpuhp_invoke_callback(cpu, st->state, true, NULL, NULL);
+		ret = cpuhp_invoke_callback(cpu, st->state, true, NULL, NULL);
+		/*
+		 * STARTING must not fail!
+		 */
+		WARN_ON_ONCE(ret);
 	}
 }
 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 5/7] smp/hotplug: Differentiate the AP completion between up and down
  2017-09-20 17:00 [PATCH 0/7] smp/hotplug rework / lockdep annotate Peter Zijlstra
                   ` (3 preceding siblings ...)
  2017-09-20 17:00 ` [PATCH 4/7] smp/hotplug: Callback vs state-machine consistency Peter Zijlstra
@ 2017-09-20 17:00 ` Peter Zijlstra
  2017-09-25  8:49   ` Byungchul Park
  2017-09-20 17:00 ` [PATCH 6/7] smp/hotplug: Differentiate the AP-work lockdep class " Peter Zijlstra
  2017-09-20 17:00 ` [PATCH 7/7] smp/hotplug: Hotplug state fail injection Peter Zijlstra
  6 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2017-09-20 17:00 UTC (permalink / raw)
  To: mingo, tglx
  Cc: linux-kernel, peterz, rostedt, bigeasy, efault, max.byungchul.park


[-- Attachment #0: peterz-hotplug-3.patch --]
[-- Type: text/plain, Size: 3961 bytes --]

With lockdep-crossrelease we get deadlock reports that span cpu-up and
cpu-down chains. Such deadlocks cannot possibly happen because cpu-up
and cpu-down are globally serialized.

  takedown_cpu()
    irq_lock_sparse()
    wait_for_completion(&st->done)

                                cpuhp_thread_fun
                                  cpuhp_up_callback
                                    cpuhp_invoke_callback
                                      irq_affinity_online_cpu
                                        irq_local_spare()
                                        irq_unlock_sparse()
                                  complete(&st->done)

Now that we have consistent AP state, we can trivially separate the
AP completion between up and down using st->bringup.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/cpu.c |   33 ++++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)

--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -46,7 +46,8 @@
  * @bringup:	Single callback bringup or teardown selector
  * @cb_state:	The state for a single callback (install/uninstall)
  * @result:	Result of the operation
- * @done:	Signal completion to the issuer of the task
+ * @done_up:	Signal completion to the issuer of the task for cpu-up
+ * @done_down:	Signal completion to the issuer of the task for cpu-down
  */
 struct cpuhp_cpu_state {
 	enum cpuhp_state	state;
@@ -61,7 +62,8 @@ struct cpuhp_cpu_state {
 	struct hlist_node	*last;
 	enum cpuhp_state	cb_state;
 	int			result;
-	struct completion	done;
+	struct completion	done_up;
+	struct completion	done_down;
 #endif
 };
 
@@ -90,6 +92,18 @@ static void inline cpuhp_lock_release(bo
 
 #endif
 
+static inline void wait_for_ap_thread(struct cpuhp_cpu_state *st, bool bringup)
+{
+	struct completion *done = bringup ? &st->done_up : &st->done_down;
+	wait_for_completion(done);
+}
+
+static inline void complete_ap_thread(struct cpuhp_cpu_state *st, bool bringup)
+{
+	struct completion *done = bringup ? &st->done_up : &st->done_down;
+	complete(done);
+}
+
 /**
  * cpuhp_step - Hotplug state machine step
  * @name:	Name of the step
@@ -368,7 +382,7 @@ static void __cpuhp_kick_ap(struct cpuhp
 	smp_mb();
 	st->should_run = true;
 	wake_up_process(st->thread);
-	wait_for_completion(&st->done);
+	wait_for_ap_thread(st, st->bringup);
 }
 
 static int cpuhp_kick_ap(struct cpuhp_cpu_state *st, enum cpuhp_state target)
@@ -391,7 +405,7 @@ static int bringup_wait_for_ap(unsigned
 	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
 
 	/* Wait for the CPU to reach CPUHP_AP_ONLINE_IDLE */
-	wait_for_completion(&st->done);
+	wait_for_ap_thread(st, true);
 	if (WARN_ON_ONCE((!cpu_online(cpu))))
 		return -ECANCELED;
 
@@ -490,7 +504,8 @@ static void cpuhp_create(unsigned int cp
 {
 	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
 
-	init_completion(&st->done);
+	init_completion(&st->done_up);
+	init_completion(&st->done_down);
 }
 
 static int cpuhp_should_run(unsigned int cpu)
@@ -584,7 +599,7 @@ static void cpuhp_thread_fun(unsigned in
 	cpuhp_lock_release(bringup);
 
 	if (!st->should_run)
-		complete(&st->done);
+		complete_ap_thread(st, bringup);
 }
 
 /* Invoke a single callback on a remote cpu */
@@ -780,7 +795,7 @@ static int takedown_cpu(unsigned int cpu
 	 *
 	 * Wait for the stop thread to go away.
 	 */
-	wait_for_completion(&st->done);
+	wait_for_ap_thread(st, false);
 	BUG_ON(st->state != CPUHP_AP_IDLE_DEAD);
 
 	/* Interrupts are moved away from the dying cpu, reenable alloc/free */
@@ -799,7 +814,7 @@ static void cpuhp_complete_idle_dead(voi
 {
 	struct cpuhp_cpu_state *st = arg;
 
-	complete(&st->done);
+	complete_ap_thread(st, false);
 }
 
 void cpuhp_report_idle_dead(void)
@@ -938,7 +953,7 @@ void cpuhp_online_idle(enum cpuhp_state
 		return;
 
 	st->state = CPUHP_AP_ONLINE_IDLE;
-	complete(&st->done);
+	complete_ap_thread(st, true);
 }
 
 /* Requires cpu_add_remove_lock to be held */

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 6/7] smp/hotplug: Differentiate the AP-work lockdep class between up and down
  2017-09-20 17:00 [PATCH 0/7] smp/hotplug rework / lockdep annotate Peter Zijlstra
                   ` (4 preceding siblings ...)
  2017-09-20 17:00 ` [PATCH 5/7] smp/hotplug: Differentiate the AP completion between up and down Peter Zijlstra
@ 2017-09-20 17:00 ` Peter Zijlstra
  2017-09-25  8:54   ` Byungchul Park
  2017-11-30 14:43   ` Lai Jiangshan
  2017-09-20 17:00 ` [PATCH 7/7] smp/hotplug: Hotplug state fail injection Peter Zijlstra
  6 siblings, 2 replies; 12+ messages in thread
From: Peter Zijlstra @ 2017-09-20 17:00 UTC (permalink / raw)
  To: mingo, tglx
  Cc: linux-kernel, peterz, rostedt, bigeasy, efault, max.byungchul.park


[-- Attachment #0: peterz-hotplug-4.patch --]
[-- Type: text/plain, Size: 3088 bytes --]

With lockdep-crossrelease we get deadlock reports that span cpu-up and
cpu-down chains. Such deadlocks cannot possibly happen because cpu-up
and cpu-down are globally serialized.

  CPU0                  CPU1                    CPU2
  cpuhp_up_callbacks:   takedown_cpu:           cpuhp_thread_fun:

  cpuhp_state
                        irq_lock_sparse()
    irq_lock_sparse()
                        wait_for_completion()
                                                cpuhp_state
                                                complete()

Now that we have consistent AP state, we can trivially separate the
AP-work class between up and down using st->bringup.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/cpu.c |   41 ++++++++++++++++++++++++++++++++---------
 1 file changed, 32 insertions(+), 9 deletions(-)

--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -68,9 +68,26 @@ struct cpuhp_cpu_state {
 static DEFINE_PER_CPU(struct cpuhp_cpu_state, cpuhp_state);
 
 #if defined(CONFIG_LOCKDEP) && defined(CONFIG_SMP)
-static struct lock_class_key cpuhp_state_key;
-static struct lockdep_map cpuhp_state_lock_map =
-	STATIC_LOCKDEP_MAP_INIT("cpuhp_state", &cpuhp_state_key);
+static struct lockdep_map cpuhp_state_up_map =
+	STATIC_LOCKDEP_MAP_INIT("cpuhp_state-up", &cpuhp_state_up_map);
+static struct lockdep_map cpuhp_state_down_map =
+	STATIC_LOCKDEP_MAP_INIT("cpuhp_state-down", &cpuhp_state_down_map);
+
+
+static void inline cpuhp_lock_acquire(bool bringup)
+{
+	lock_map_acquire(bringup ? &cpuhp_state_up_map : &cpuhp_state_down_map);
+}
+
+static void inline cpuhp_lock_release(bool bringup)
+{
+	lock_map_release(bringup ? &cpuhp_state_up_map : &cpuhp_state_down_map);
+}
+#else
+
+static void inline cpuhp_lock_acquire(bool bringup) { }
+static void inline cpuhp_lock_release(bool bringup) { }
+
 #endif
 
 /**
@@ -512,7 +529,7 @@ static void cpuhp_thread_fun(unsigned in
 	if (WARN_ON_ONCE(!st->should_run))
 		return;
 
-	lock_map_acquire(&cpuhp_state_lock_map);
+	cpuhp_lock_acquire(bringup);
 
 	if (st->single) {
 		state = st->cb_state;
@@ -564,7 +581,7 @@ static void cpuhp_thread_fun(unsigned in
 	}
 
 next:
-	lock_map_release(&cpuhp_state_lock_map);
+	cpuhp_lock_release(bringup);
 
 	if (!st->should_run)
 		complete(&st->done);
@@ -581,8 +598,11 @@ cpuhp_invoke_ap_callback(int cpu, enum c
 	if (!cpu_online(cpu))
 		return 0;
 
-	lock_map_acquire(&cpuhp_state_lock_map);
-	lock_map_release(&cpuhp_state_lock_map);
+	cpuhp_lock_acquire(false);
+	cpuhp_lock_release(false);
+
+	cpuhp_lock_acquire(true);
+	cpuhp_lock_release(true);
 
 	/*
 	 * If we are up and running, use the hotplug thread. For early calls
@@ -620,8 +640,11 @@ static int cpuhp_kick_ap_work(unsigned i
 	enum cpuhp_state prev_state = st->state;
 	int ret;
 
-	lock_map_acquire(&cpuhp_state_lock_map);
-	lock_map_release(&cpuhp_state_lock_map);
+	cpuhp_lock_acquire(false);
+	cpuhp_lock_release(false);
+
+	cpuhp_lock_acquire(true);
+	cpuhp_lock_release(true);
 
 	trace_cpuhp_enter(cpu, st->target, prev_state, cpuhp_kick_ap_work);
 	ret = cpuhp_kick_ap(st, st->target);

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 7/7] smp/hotplug: Hotplug state fail injection
  2017-09-20 17:00 [PATCH 0/7] smp/hotplug rework / lockdep annotate Peter Zijlstra
                   ` (5 preceding siblings ...)
  2017-09-20 17:00 ` [PATCH 6/7] smp/hotplug: Differentiate the AP-work lockdep class " Peter Zijlstra
@ 2017-09-20 17:00 ` Peter Zijlstra
  6 siblings, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2017-09-20 17:00 UTC (permalink / raw)
  To: mingo, tglx
  Cc: linux-kernel, peterz, rostedt, bigeasy, efault, max.byungchul.park


[-- Attachment #0: peterz-hotplug-6.patch --]
[-- Type: text/plain, Size: 3562 bytes --]

Add a sysfs file to one-time fail a specific state. This can be used
to test the state rollback code paths.

Something like this (hotplug-up.sh):

  #!/bin/bash

  echo 0 > /debug/sched_debug
  echo 1 > /debug/tracing/events/cpuhp/enable

  ALL_STATES=`cat /sys/devices/system/cpu/hotplug/states | cut -d':' -f1`
  STATES=${1:-$ALL_STATES}

  for state in $STATES
  do
	  echo 0 > /sys/devices/system/cpu/cpu1/online
	  echo 0 > /debug/tracing/trace
	  echo Fail state: $state
	  echo $state > /sys/devices/system/cpu/cpu1/hotplug/fail
	  cat /sys/devices/system/cpu/cpu1/hotplug/fail
	  echo 1 > /sys/devices/system/cpu/cpu1/online

	  cat /debug/tracing/trace > hotfail-${state}.trace

	  sleep 1
  done

Can be used to test for all possible rollback (barring multi-instance)
scenarios on CPU-up, CPU-down is a trivial modification of the above.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/cpuhotplug.h |    3 +-
 kernel/cpu.c               |   60 ++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 61 insertions(+), 2 deletions(-)

--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -22,7 +22,8 @@
  */
 
 enum cpuhp_state {
-	CPUHP_OFFLINE,
+	CPUHP_INVALID = -1,
+	CPUHP_OFFLINE = 0,
 	CPUHP_CREATE_THREADS,
 	CPUHP_PERF_PREPARE,
 	CPUHP_PERF_X86_PREPARE,
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -52,6 +52,7 @@
 struct cpuhp_cpu_state {
 	enum cpuhp_state	state;
 	enum cpuhp_state	target;
+	enum cpuhp_state	fail;
 #ifdef CONFIG_SMP
 	struct task_struct	*thread;
 	bool			should_run;
@@ -67,7 +68,9 @@ struct cpuhp_cpu_state {
 #endif
 };
 
-static DEFINE_PER_CPU(struct cpuhp_cpu_state, cpuhp_state);
+static DEFINE_PER_CPU(struct cpuhp_cpu_state, cpuhp_state) = {
+	.fail = CPUHP_INVALID,
+};
 
 #if defined(CONFIG_LOCKDEP) && defined(CONFIG_SMP)
 static struct lockdep_map cpuhp_state_up_map =
@@ -180,6 +183,15 @@ static int cpuhp_invoke_callback(unsigne
 	int (*cb)(unsigned int cpu);
 	int ret, cnt;
 
+	if (st->fail == state) {
+		st->fail = CPUHP_INVALID;
+
+		if (!(bringup ? step->startup.single : step->teardown.single))
+			return 0;
+
+		return -EAGAIN;
+	}
+
 	if (!step->multi_instance) {
 		WARN_ON_ONCE(lastp && *lastp);
 		cb = bringup ? step->startup.single : step->teardown.single;
@@ -1806,9 +1818,55 @@ static ssize_t show_cpuhp_target(struct
 }
 static DEVICE_ATTR(target, 0644, show_cpuhp_target, write_cpuhp_target);
 
+
+static ssize_t write_cpuhp_fail(struct device *dev,
+				struct device_attribute *attr,
+				const char *buf, size_t count)
+{
+	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, dev->id);
+	struct cpuhp_step *sp;
+	int fail, ret;
+
+	ret = kstrtoint(buf, 10, &fail);
+	if (ret)
+		return ret;
+
+	/*
+	 * Cannot fail STARTING/DYING callbacks.
+	 */
+	if (cpuhp_is_atomic_state(fail))
+		return -EINVAL;
+
+	/*
+	 * Cannot fail anything that doesn't have callbacks.
+	 */
+	mutex_lock(&cpuhp_state_mutex);
+	sp = cpuhp_get_step(fail);
+	if (!sp->startup.single && !sp->teardown.single)
+		ret = -EINVAL;
+	mutex_unlock(&cpuhp_state_mutex);
+	if (ret)
+		return ret;
+
+	st->fail = fail;
+
+	return count;
+}
+
+static ssize_t show_cpuhp_fail(struct device *dev,
+			       struct device_attribute *attr, char *buf)
+{
+	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, dev->id);
+
+	return sprintf(buf, "%d\n", st->fail);
+}
+
+static DEVICE_ATTR(fail, 0644, show_cpuhp_fail, write_cpuhp_fail);
+
 static struct attribute *cpuhp_cpu_attrs[] = {
 	&dev_attr_state.attr,
 	&dev_attr_target.attr,
+	&dev_attr_fail.attr,
 	NULL
 };
 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 5/7] smp/hotplug: Differentiate the AP completion between up and down
  2017-09-20 17:00 ` [PATCH 5/7] smp/hotplug: Differentiate the AP completion between up and down Peter Zijlstra
@ 2017-09-25  8:49   ` Byungchul Park
  0 siblings, 0 replies; 12+ messages in thread
From: Byungchul Park @ 2017-09-25  8:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, linux-kernel, rostedt, bigeasy, efault,
	max.byungchul.park, kernel-team

On Wed, Sep 20, 2017 at 07:00:19PM +0200, Peter Zijlstra wrote:
> With lockdep-crossrelease we get deadlock reports that span cpu-up and
> cpu-down chains. Such deadlocks cannot possibly happen because cpu-up
> and cpu-down are globally serialized.
> 
>   takedown_cpu()
>     irq_lock_sparse()
>     wait_for_completion(&st->done)
> 
>                                 cpuhp_thread_fun
>                                   cpuhp_up_callback
>                                     cpuhp_invoke_callback
>                                       irq_affinity_online_cpu
>                                         irq_local_spare()
>                                         irq_unlock_sparse()
>                                   complete(&st->done)
> 
> Now that we have consistent AP state, we can trivially separate the
> AP completion between up and down using st->bringup.

Acked-by: Byungchul Park <byungchul.park@lge.com>

> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/cpu.c |   33 ++++++++++++++++++++++++---------
>  1 file changed, 24 insertions(+), 9 deletions(-)
> 
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -46,7 +46,8 @@
>   * @bringup:	Single callback bringup or teardown selector
>   * @cb_state:	The state for a single callback (install/uninstall)
>   * @result:	Result of the operation
> - * @done:	Signal completion to the issuer of the task
> + * @done_up:	Signal completion to the issuer of the task for cpu-up
> + * @done_down:	Signal completion to the issuer of the task for cpu-down
>   */
>  struct cpuhp_cpu_state {
>  	enum cpuhp_state	state;
> @@ -61,7 +62,8 @@ struct cpuhp_cpu_state {
>  	struct hlist_node	*last;
>  	enum cpuhp_state	cb_state;
>  	int			result;
> -	struct completion	done;
> +	struct completion	done_up;
> +	struct completion	done_down;
>  #endif
>  };
>  
> @@ -90,6 +92,18 @@ static void inline cpuhp_lock_release(bo
>  
>  #endif
>  
> +static inline void wait_for_ap_thread(struct cpuhp_cpu_state *st, bool bringup)
> +{
> +	struct completion *done = bringup ? &st->done_up : &st->done_down;
> +	wait_for_completion(done);
> +}
> +
> +static inline void complete_ap_thread(struct cpuhp_cpu_state *st, bool bringup)
> +{
> +	struct completion *done = bringup ? &st->done_up : &st->done_down;
> +	complete(done);
> +}
> +
>  /**
>   * cpuhp_step - Hotplug state machine step
>   * @name:	Name of the step
> @@ -368,7 +382,7 @@ static void __cpuhp_kick_ap(struct cpuhp
>  	smp_mb();
>  	st->should_run = true;
>  	wake_up_process(st->thread);
> -	wait_for_completion(&st->done);
> +	wait_for_ap_thread(st, st->bringup);
>  }
>  
>  static int cpuhp_kick_ap(struct cpuhp_cpu_state *st, enum cpuhp_state target)
> @@ -391,7 +405,7 @@ static int bringup_wait_for_ap(unsigned
>  	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
>  
>  	/* Wait for the CPU to reach CPUHP_AP_ONLINE_IDLE */
> -	wait_for_completion(&st->done);
> +	wait_for_ap_thread(st, true);
>  	if (WARN_ON_ONCE((!cpu_online(cpu))))
>  		return -ECANCELED;
>  
> @@ -490,7 +504,8 @@ static void cpuhp_create(unsigned int cp
>  {
>  	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
>  
> -	init_completion(&st->done);
> +	init_completion(&st->done_up);
> +	init_completion(&st->done_down);
>  }
>  
>  static int cpuhp_should_run(unsigned int cpu)
> @@ -584,7 +599,7 @@ static void cpuhp_thread_fun(unsigned in
>  	cpuhp_lock_release(bringup);
>  
>  	if (!st->should_run)
> -		complete(&st->done);
> +		complete_ap_thread(st, bringup);
>  }
>  
>  /* Invoke a single callback on a remote cpu */
> @@ -780,7 +795,7 @@ static int takedown_cpu(unsigned int cpu
>  	 *
>  	 * Wait for the stop thread to go away.
>  	 */
> -	wait_for_completion(&st->done);
> +	wait_for_ap_thread(st, false);
>  	BUG_ON(st->state != CPUHP_AP_IDLE_DEAD);
>  
>  	/* Interrupts are moved away from the dying cpu, reenable alloc/free */
> @@ -799,7 +814,7 @@ static void cpuhp_complete_idle_dead(voi
>  {
>  	struct cpuhp_cpu_state *st = arg;
>  
> -	complete(&st->done);
> +	complete_ap_thread(st, false);
>  }
>  
>  void cpuhp_report_idle_dead(void)
> @@ -938,7 +953,7 @@ void cpuhp_online_idle(enum cpuhp_state
>  		return;
>  
>  	st->state = CPUHP_AP_ONLINE_IDLE;
> -	complete(&st->done);
> +	complete_ap_thread(st, true);
>  }
>  
>  /* Requires cpu_add_remove_lock to be held */
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 6/7] smp/hotplug: Differentiate the AP-work lockdep class between up and down
  2017-09-20 17:00 ` [PATCH 6/7] smp/hotplug: Differentiate the AP-work lockdep class " Peter Zijlstra
@ 2017-09-25  8:54   ` Byungchul Park
  2017-09-25  9:16     ` Peter Zijlstra
  2017-11-30 14:43   ` Lai Jiangshan
  1 sibling, 1 reply; 12+ messages in thread
From: Byungchul Park @ 2017-09-25  8:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, linux-kernel, rostedt, bigeasy, efault,
	max.byungchul.park, kernel-team

On Wed, Sep 20, 2017 at 07:00:20PM +0200, Peter Zijlstra wrote:
> With lockdep-crossrelease we get deadlock reports that span cpu-up and
> cpu-down chains. Such deadlocks cannot possibly happen because cpu-up
> and cpu-down are globally serialized.
> 
>   CPU0                  CPU1                    CPU2
>   cpuhp_up_callbacks:   takedown_cpu:           cpuhp_thread_fun:
> 
>   cpuhp_state
>                         irq_lock_sparse()
>     irq_lock_sparse()
>                         wait_for_completion()
>                                                 cpuhp_state
>                                                 complete()
> 
> Now that we have consistent AP state, we can trivially separate the
> AP-work class between up and down using st->bringup.

Could you tell me what branch you worked the patches based on?
This is similar to the problem of workqueue so I want to fix it on
top of yours, as well.

> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/cpu.c |   41 ++++++++++++++++++++++++++++++++---------
>  1 file changed, 32 insertions(+), 9 deletions(-)
> 
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -68,9 +68,26 @@ struct cpuhp_cpu_state {
>  static DEFINE_PER_CPU(struct cpuhp_cpu_state, cpuhp_state);
>  
>  #if defined(CONFIG_LOCKDEP) && defined(CONFIG_SMP)
> -static struct lock_class_key cpuhp_state_key;
> -static struct lockdep_map cpuhp_state_lock_map =
> -	STATIC_LOCKDEP_MAP_INIT("cpuhp_state", &cpuhp_state_key);
> +static struct lockdep_map cpuhp_state_up_map =
> +	STATIC_LOCKDEP_MAP_INIT("cpuhp_state-up", &cpuhp_state_up_map);
> +static struct lockdep_map cpuhp_state_down_map =
> +	STATIC_LOCKDEP_MAP_INIT("cpuhp_state-down", &cpuhp_state_down_map);
> +
> +
> +static void inline cpuhp_lock_acquire(bool bringup)
> +{
> +	lock_map_acquire(bringup ? &cpuhp_state_up_map : &cpuhp_state_down_map);
> +}
> +
> +static void inline cpuhp_lock_release(bool bringup)
> +{
> +	lock_map_release(bringup ? &cpuhp_state_up_map : &cpuhp_state_down_map);
> +}
> +#else
> +
> +static void inline cpuhp_lock_acquire(bool bringup) { }
> +static void inline cpuhp_lock_release(bool bringup) { }
> +
>  #endif
>  
>  /**
> @@ -512,7 +529,7 @@ static void cpuhp_thread_fun(unsigned in
>  	if (WARN_ON_ONCE(!st->should_run))
>  		return;
>  
> -	lock_map_acquire(&cpuhp_state_lock_map);
> +	cpuhp_lock_acquire(bringup);
>  
>  	if (st->single) {
>  		state = st->cb_state;
> @@ -564,7 +581,7 @@ static void cpuhp_thread_fun(unsigned in
>  	}
>  
>  next:
> -	lock_map_release(&cpuhp_state_lock_map);
> +	cpuhp_lock_release(bringup);
>  
>  	if (!st->should_run)
>  		complete(&st->done);
> @@ -581,8 +598,11 @@ cpuhp_invoke_ap_callback(int cpu, enum c
>  	if (!cpu_online(cpu))
>  		return 0;
>  
> -	lock_map_acquire(&cpuhp_state_lock_map);
> -	lock_map_release(&cpuhp_state_lock_map);
> +	cpuhp_lock_acquire(false);
> +	cpuhp_lock_release(false);
> +
> +	cpuhp_lock_acquire(true);
> +	cpuhp_lock_release(true);
>  
>  	/*
>  	 * If we are up and running, use the hotplug thread. For early calls
> @@ -620,8 +640,11 @@ static int cpuhp_kick_ap_work(unsigned i
>  	enum cpuhp_state prev_state = st->state;
>  	int ret;
>  
> -	lock_map_acquire(&cpuhp_state_lock_map);
> -	lock_map_release(&cpuhp_state_lock_map);
> +	cpuhp_lock_acquire(false);
> +	cpuhp_lock_release(false);
> +
> +	cpuhp_lock_acquire(true);
> +	cpuhp_lock_release(true);
>  
>  	trace_cpuhp_enter(cpu, st->target, prev_state, cpuhp_kick_ap_work);
>  	ret = cpuhp_kick_ap(st, st->target);
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 6/7] smp/hotplug: Differentiate the AP-work lockdep class between up and down
  2017-09-25  8:54   ` Byungchul Park
@ 2017-09-25  9:16     ` Peter Zijlstra
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2017-09-25  9:16 UTC (permalink / raw)
  To: Byungchul Park
  Cc: mingo, tglx, linux-kernel, rostedt, bigeasy, efault,
	max.byungchul.park, kernel-team

On Mon, Sep 25, 2017 at 05:54:59PM +0900, Byungchul Park wrote:
> On Wed, Sep 20, 2017 at 07:00:20PM +0200, Peter Zijlstra wrote:
> > With lockdep-crossrelease we get deadlock reports that span cpu-up and
> > cpu-down chains. Such deadlocks cannot possibly happen because cpu-up
> > and cpu-down are globally serialized.
> > 
> >   CPU0                  CPU1                    CPU2
> >   cpuhp_up_callbacks:   takedown_cpu:           cpuhp_thread_fun:
> > 
> >   cpuhp_state
> >                         irq_lock_sparse()
> >     irq_lock_sparse()
> >                         wait_for_completion()
> >                                                 cpuhp_state
> >                                                 complete()
> > 
> > Now that we have consistent AP state, we can trivially separate the
> > AP-work class between up and down using st->bringup.
> 
> Could you tell me what branch you worked the patches based on?
> This is similar to the problem of workqueue so I want to fix it on
> top of yours, as well.

I wrote the patches on top of tip/master. Thomas maintains these bits so
hopefully he'll eventually merge them in the right tip tree.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 6/7] smp/hotplug: Differentiate the AP-work lockdep class between up and down
  2017-09-20 17:00 ` [PATCH 6/7] smp/hotplug: Differentiate the AP-work lockdep class " Peter Zijlstra
  2017-09-25  8:54   ` Byungchul Park
@ 2017-11-30 14:43   ` Lai Jiangshan
  1 sibling, 0 replies; 12+ messages in thread
From: Lai Jiangshan @ 2017-11-30 14:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Thomas Gleixner, LKML, Steven Rostedt,
	Sebastian Andrzej Siewior, efault, max.byungchul.park

On Thu, Sep 21, 2017 at 1:00 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> With lockdep-crossrelease we get deadlock reports that span cpu-up and
> cpu-down chains. Such deadlocks cannot possibly happen because cpu-up
> and cpu-down are globally serialized.
>
>   CPU0                  CPU1                    CPU2
>   cpuhp_up_callbacks:   takedown_cpu:           cpuhp_thread_fun:
>
>   cpuhp_state
>                         irq_lock_sparse()
>     irq_lock_sparse()
>                         wait_for_completion()
>                                                 cpuhp_state
>                                                 complete()
>
> Now that we have consistent AP state, we can trivially separate the
> AP-work class between up and down using st->bringup.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/cpu.c |   41 ++++++++++++++++++++++++++++++++---------
>  1 file changed, 32 insertions(+), 9 deletions(-)
>
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -68,9 +68,26 @@ struct cpuhp_cpu_state {
>  static DEFINE_PER_CPU(struct cpuhp_cpu_state, cpuhp_state);
>
>  #if defined(CONFIG_LOCKDEP) && defined(CONFIG_SMP)
> -static struct lock_class_key cpuhp_state_key;
> -static struct lockdep_map cpuhp_state_lock_map =
> -       STATIC_LOCKDEP_MAP_INIT("cpuhp_state", &cpuhp_state_key);
> +static struct lockdep_map cpuhp_state_up_map =
> +       STATIC_LOCKDEP_MAP_INIT("cpuhp_state-up", &cpuhp_state_up_map);
> +static struct lockdep_map cpuhp_state_down_map =
> +       STATIC_LOCKDEP_MAP_INIT("cpuhp_state-down", &cpuhp_state_down_map);
> +
> +
> +static void inline cpuhp_lock_acquire(bool bringup)
> +{
> +       lock_map_acquire(bringup ? &cpuhp_state_up_map : &cpuhp_state_down_map);
> +}
> +
> +static void inline cpuhp_lock_release(bool bringup)
> +{
> +       lock_map_release(bringup ? &cpuhp_state_up_map : &cpuhp_state_down_map);
> +}
> +#else
> +
> +static void inline cpuhp_lock_acquire(bool bringup) { }
> +static void inline cpuhp_lock_release(bool bringup) { }
> +
>  #endif
>
>  /**
> @@ -512,7 +529,7 @@ static void cpuhp_thread_fun(unsigned in
>         if (WARN_ON_ONCE(!st->should_run))
>                 return;
>
> -       lock_map_acquire(&cpuhp_state_lock_map);
> +       cpuhp_lock_acquire(bringup);
>
>         if (st->single) {
>                 state = st->cb_state;
> @@ -564,7 +581,7 @@ static void cpuhp_thread_fun(unsigned in
>         }
>
>  next:
> -       lock_map_release(&cpuhp_state_lock_map);
> +       cpuhp_lock_release(bringup);
>
>         if (!st->should_run)
>                 complete(&st->done);
> @@ -581,8 +598,11 @@ cpuhp_invoke_ap_callback(int cpu, enum c
>         if (!cpu_online(cpu))
>                 return 0;
>
> -       lock_map_acquire(&cpuhp_state_lock_map);
> -       lock_map_release(&cpuhp_state_lock_map);
> +       cpuhp_lock_acquire(false);
> +       cpuhp_lock_release(false);
> +
> +       cpuhp_lock_acquire(true);
> +       cpuhp_lock_release(true);

Hello, Peter,

I'm reading the code in kernel/cpu.c.
I couldn't understand why both lockep_map are acquired here?
Is the lockep_map matching for the argument @bringup enough here?

The log shows that the argument @bringup had been added
when the time this commit was applied. But it was quite probably
non-existed when you wrote the patch since the time was close.

thanks,
Lai.

>
>         /*
>          * If we are up and running, use the hotplug thread. For early calls
> @@ -620,8 +640,11 @@ static int cpuhp_kick_ap_work(unsigned i
>         enum cpuhp_state prev_state = st->state;
>         int ret;
>
> -       lock_map_acquire(&cpuhp_state_lock_map);
> -       lock_map_release(&cpuhp_state_lock_map);
> +       cpuhp_lock_acquire(false);
> +       cpuhp_lock_release(false);
> +
> +       cpuhp_lock_acquire(true);
> +       cpuhp_lock_release(true);
>
>         trace_cpuhp_enter(cpu, st->target, prev_state, cpuhp_kick_ap_work);
>         ret = cpuhp_kick_ap(st, st->target);
>
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, back to index

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-20 17:00 [PATCH 0/7] smp/hotplug rework / lockdep annotate Peter Zijlstra
2017-09-20 17:00 ` [PATCH 1/7] smp/hotplug: Add state diagram Peter Zijlstra
2017-09-20 17:00 ` [PATCH 2/7] smp/hotplug: Allow external multi-instance rollback Peter Zijlstra
2017-09-20 17:00 ` [PATCH 3/7] smp/hotplug: Rewrite AP state machine core Peter Zijlstra
2017-09-20 17:00 ` [PATCH 4/7] smp/hotplug: Callback vs state-machine consistency Peter Zijlstra
2017-09-20 17:00 ` [PATCH 5/7] smp/hotplug: Differentiate the AP completion between up and down Peter Zijlstra
2017-09-25  8:49   ` Byungchul Park
2017-09-20 17:00 ` [PATCH 6/7] smp/hotplug: Differentiate the AP-work lockdep class " Peter Zijlstra
2017-09-25  8:54   ` Byungchul Park
2017-09-25  9:16     ` Peter Zijlstra
2017-11-30 14:43   ` Lai Jiangshan
2017-09-20 17:00 ` [PATCH 7/7] smp/hotplug: Hotplug state fail injection Peter Zijlstra

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git