LKML Archive on lore.kernel.org
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: Heiko Carstens <heiko.carstens@de.ibm.com>,
	Thomas Gleixner <tglx@linutronix.de>
Cc: linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org,
	rt@linutronix.de, Martin Schwidefsky <schwidefsky@de.ibm.com>,
	Anna-Maria Gleixner <anna-maria@linutronix.de>
Subject: [PATCH v2] cpu/hotplug: fix rollback during error-out in __cpu_disable()
Date: Fri, 8 Apr 2016 14:40:15 +0200
Message-ID: <20160408124015.GA21960@linutronix.de>
In-Reply-To: <20160408061949.GA3433@osiris>

If we error out in __cpu_disable() (via takedown_cpu(), which is
currently the last step that can fail) we don't roll back entirely to
CPUHP_ONLINE (where we started) but only to CPUHP_AP_ONLINE_IDLE. This
happens because the earlier states ran on the target CPU (the AP states)
and the rollback only goes back as far as the first BP state we started
with. The next cpu_down attempt (on the same failed CPU) will take
forever because the cpuhp thread is still down (same goes for the
smpboot threads).

To fix this I roll back to where we started in _cpu_down(). For this I
add a ->rollback flag so we can invoke the states on the target CPU via
undo_cpu_down() (otherwise cpuhp_ap_online() would roll back to
CPUHP_AP_ONLINE_IDLE in case of an error).

notify_online() has been marked as ->skip_onerr because otherwise we
would see the CPU_ONLINE notifier in addition to CPU_DOWN_FAILED.
However, with ->skip_onerr we see neither CPU_ONLINE nor CPU_DOWN_FAILED
if something in between fails (CPU_DOWN_FAILED … CPUHP_TEARDOWN_CPU).
Currently there is nothing in between.

This regression was probably introduced in the rework in which we
introduced the hotplug thread to offload the work to the target CPU.

Fixes: 4cb28ced23c4 ("cpu/hotplug: Create hotplug threads")
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
v1…v2: replace the workqueue with cpuhp thread

CPU_DOWN_FAILED is still invoked on the "wrong" CPU; this patch is just
about fixing the regression.
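
For readers who want the rollback logic without the kernel around it,
here is a rough userspace toy model. It is only a sketch: every toy_*
name and the shortened state ladder are made up for illustration and are
not kernel API. It assumes the AP-side teardown steps always succeed and
that only the takedown step can fail.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative state ladder; NOT the kernel's enum cpuhp_state. */
enum toy_state {
	TOY_OFFLINE,
	TOY_TEARDOWN_CPU,	/* BP-side step that may fail */
	TOY_AP_ONLINE_IDLE,	/* first state handled on the target CPU */
	TOY_AP_SMPBOOT_THREADS,
	TOY_AP_NOTIFY_ONLINE,
	TOY_ONLINE,
};

struct toy_cpu {
	enum toy_state state;
	bool rollback;		/* models the ->rollback flag from the patch */
};

/* BP-side undo: can only walk back up to the first AP-side state. */
static void toy_bp_undo(struct toy_cpu *st)
{
	while (st->state < TOY_AP_ONLINE_IDLE)
		st->state++;
}

/* Stands in for the hotplug thread finishing the rollback on the AP. */
static void toy_ap_undo(struct toy_cpu *st, enum toy_state target)
{
	assert(st->state >= TOY_AP_ONLINE_IDLE);
	while (st->state < target)
		st->state++;
	st->rollback = false;
}

/* Toy version of the down path; with_fix toggles the added rollback. */
static int toy_cpu_down(struct toy_cpu *st, bool teardown_fails, bool with_fix)
{
	enum toy_state prev_state = st->state;
	int ret = teardown_fails ? -1 : 0;

	/* AP-side teardown steps are assumed to succeed in this model. */
	st->state = TOY_TEARDOWN_CPU;

	if (ret)
		toy_bp_undo(st);
	else
		st->state = TOY_OFFLINE;

	/* Same shape as the condition added to _cpu_down() in the diff. */
	if (with_fix && ret && st->state > TOY_TEARDOWN_CPU &&
	    st->state < prev_state) {
		st->rollback = true;
		toy_ap_undo(st, prev_state);
	}
	return ret;
}
```

Without the fix, a failed takedown leaves the toy CPU stuck at
TOY_AP_ONLINE_IDLE; with it, the state returns to where the down
operation started.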

 kernel/cpu.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 6ea42e8da861..6433b9639946 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -36,6 +36,7 @@
  * @target:	The target state
  * @thread:	Pointer to the hotplug thread
  * @should_run:	Thread should execute
+ * @rollback:	Perform a rollback
  * @cb_stat:	The state for a single callback (install/uninstall)
  * @cb:		Single callback function (install/uninstall)
  * @result:	Result of the operation
@@ -47,6 +48,7 @@ struct cpuhp_cpu_state {
 #ifdef CONFIG_SMP
 	struct task_struct	*thread;
 	bool			should_run;
+	bool			rollback;
 	enum cpuhp_state	cb_state;
 	int			(*cb)(unsigned int cpu);
 	int			result;
@@ -477,6 +479,11 @@ static void cpuhp_thread_fun(unsigned int cpu)
 		} else {
 			ret = cpuhp_invoke_callback(cpu, st->cb_state, st->cb);
 		}
+	} else if (st->rollback) {
+		BUG_ON(st->state < CPUHP_AP_ONLINE_IDLE);
+
+		undo_cpu_down(cpu, st, cpuhp_ap_states);
+		st->rollback = false;
 	} else {
 		/* Cannot happen .... */
 		BUG_ON(st->state < CPUHP_AP_ONLINE_IDLE);
@@ -724,6 +731,8 @@ static int takedown_cpu(unsigned int cpu)
 		/* CPU didn't die: tell everyone.  Can't complain. */
 		cpu_notify_nofail(CPU_DOWN_FAILED, cpu);
 		irq_unlock_sparse();
+		kthread_unpark(per_cpu_ptr(&cpuhp_state, cpu)->thread);
+		/* smpboot threads are up via CPUHP_AP_SMPBOOT_THREADS */
 		return err;
 	}
 	BUG_ON(cpu_online(cpu));
@@ -832,6 +841,12 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen,
 	 * to do the further cleanups.
 	 */
 	ret = cpuhp_down_callbacks(cpu, st, cpuhp_bp_states, target);
+	if (ret && st->state > CPUHP_TEARDOWN_CPU && st->state < prev_state) {
+
+		st->target = prev_state;
+		st->rollback = true;
+		cpuhp_kick_ap_work(cpu);
+	}
 
 	hasdied = prev_state != st->state && st->state == CPUHP_OFFLINE;
 out:
@@ -1249,6 +1264,7 @@ static struct cpuhp_step cpuhp_ap_states[] = {
 		.name			= "notify:online",
 		.startup		= notify_online,
 		.teardown		= notify_down_prepare,
+		.skip_onerr		= true,
 	},
 #endif
 	/*
-- 
2.8.0.rc3

Thread overview: 16+ messages
2016-04-04 10:27 [PATCH] s390/cpum_sf: Remove superfluous SMP function call Anna-Maria Gleixner
2016-04-05 10:49 ` Heiko Carstens
2016-04-05 11:13   ` [PREEMPT-RT] " Sebastian Andrzej Siewior
2016-04-05 11:23     ` Heiko Carstens
2016-04-05 11:36       ` Heiko Carstens
2016-04-05 11:51         ` rcochran
2016-04-05 11:55           ` Heiko Carstens
2016-04-05 11:57           ` Sebastian Andrzej Siewior
2016-04-05 12:11             ` Heiko Carstens
2016-04-05 12:19               ` Sebastian Andrzej Siewior
2016-04-05 15:59               ` [PATCH] cpu/hotplug: fix rollback during error-out in __cpu_disable() Sebastian Andrzej Siewior
2016-04-06 19:51                 ` Heiko Carstens
2016-04-07 15:14                   ` Sebastian Andrzej Siewior
2016-04-08  6:19                     ` Heiko Carstens
2016-04-08 12:40                       ` Sebastian Andrzej Siewior [this message]
2016-04-22  7:54                         ` [tip:smp/urgent] cpu/hotplug: Fix " tip-bot for Sebastian Andrzej Siewior
