From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S965923AbcKJRsH (ORCPT );
	Thu, 10 Nov 2016 12:48:07 -0500
Received: from Galois.linutronix.de ([146.0.238.70]:34001 "EHLO
	Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S965461AbcKJRq2 (ORCPT );
	Thu, 10 Nov 2016 12:46:28 -0500
From: Sebastian Andrzej Siewior
To: linux-kernel@vger.kernel.org
Cc: tglx@linutronix.de, rt@linutronix.de,
	Sebastian Andrzej Siewior, Tony Luck, Borislav Petkov,
	linux-edac@vger.kernel.org, x86@kernel.org
Subject: [PATCH 5/7] x86/mcheck: reorganize the hotplug callbacks
Date: Thu, 10 Nov 2016 18:44:45 +0100
Message-Id: <20161110174447.11848-6-bigeasy@linutronix.de>
X-Mailer: git-send-email 2.10.2
In-Reply-To: <20161110174447.11848-1-bigeasy@linutronix.de>
References: <20161110091809.vxyf3yiuxtjy3vqv@pd.tnic>
	<20161110174447.11848-1-bigeasy@linutronix.de>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Initially I wanted to remove mcheck_cpu_init() from identify_cpu() and let
it become an independent early hotplug callback. The main problem here was
that the init on the boot CPU may happen too late
(device_initcall_sync(mcheck_init_device)) and nobody wanted to risk
receiving an MCE event at boot time leading to a shutdown (if the MCE
feature is not yet enabled).

Here is attempt two: the timing stays as-is but the ordering of the
functions is changed:

- mcheck_cpu_init() (which is run from identify_cpu()) will set up the
  timer struct but won't fire the timer. Firing is moved to CPU_ONLINE
  since its cleanup part is in CPU_DOWN_PREPARE. So if it is okay to stop
  the timer early in the shutdown phase, it should be okay to start it
  late in the bring-up phase.

- CPU_DOWN_PREPARE disables the MCE feature flags for !INTEL CPUs in
  mce_disable_cpu().
  If a failure occurs, it would be re-enabled on all vendor CPUs
  (including Intel, where it was not disabled during shutdown). To keep
  this working, I am moving it to CPU_ONLINE. smp_call_function_single()
  is dropped because the notifier nowadays runs on the target CPU.

- CPU_ONLINE invokes mce_device_create() + mce_threshold_create_device()
  but its cleanup part is in CPU_DEAD (mce_threshold_remove_device() and
  mce_device_remove()). In order to keep this symmetrical, I am moving
  the cleanup from CPU_DEAD to CPU_DOWN_PREPARE.

Cc: Tony Luck
Cc: Borislav Petkov
Cc: linux-edac@vger.kernel.org
Cc: x86@kernel.org
Signed-off-by: Sebastian Andrzej Siewior
---
 arch/x86/kernel/cpu/mcheck/mce.c | 31 +++++++++++++++----------------
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 052b5e05c3c4..3da6fd94fa2e 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1771,6 +1771,9 @@ void (*machine_check_vector)(struct pt_regs *, long error_code) =
  */
 void mcheck_cpu_init(struct cpuinfo_x86 *c)
 {
+	struct timer_list *t = this_cpu_ptr(&mce_timer);
+	unsigned int cpu = smp_processor_id();
+
 	if (mca_cfg.disabled)
 		return;
 
@@ -1796,7 +1799,7 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c)
 	__mcheck_cpu_init_generic();
 	__mcheck_cpu_init_vendor(c);
 	__mcheck_cpu_init_clear_banks();
-	__mcheck_cpu_init_timer();
+	setup_pinned_timer(t, mce_timer_fn, cpu);
 }
 
 /*
@@ -2470,28 +2473,25 @@ static void mce_device_remove(unsigned int cpu)
 }
 
 /* Make sure there are no machine checks on offlined CPUs. */
-static void mce_disable_cpu(void *h)
+static void mce_disable_cpu(void)
 {
-	unsigned long action = *(unsigned long *)h;
-
 	if (!mce_available(raw_cpu_ptr(&cpu_info)))
 		return;
 
-	if (!(action & CPU_TASKS_FROZEN))
+	if (!cpuhp_tasks_frozen)
 		cmci_clear();
 
 	vendor_disable_error_reporting();
 }
 
-static void mce_reenable_cpu(void *h)
+static void mce_reenable_cpu(void)
 {
-	unsigned long action = *(unsigned long *)h;
 	int i;
 
 	if (!mce_available(raw_cpu_ptr(&cpu_info)))
 		return;
 
-	if (!(action & CPU_TASKS_FROZEN))
+	if (!cpuhp_tasks_frozen)
 		cmci_reenable();
 	for (i = 0; i < mca_cfg.banks; i++) {
 		struct mce_bank *b = &mce_banks[i];
@@ -2510,6 +2510,7 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
 
 	switch (action & ~CPU_TASKS_FROZEN) {
 	case CPU_ONLINE:
+	case CPU_DOWN_FAILED:
 
 		mce_device_create(cpu);
 
@@ -2517,11 +2518,10 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
 			mce_device_remove(cpu);
 			return NOTIFY_BAD;
 		}
-
+		mce_reenable_cpu();
+		mce_start_timer(cpu, t);
 		break;
 	case CPU_DEAD:
-		mce_threshold_remove_device(cpu);
-		mce_device_remove(cpu);
 		mce_intel_hcpu_update(cpu);
 
 		/* intentionally ignoring frozen here */
@@ -2529,12 +2529,11 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
 		cmci_rediscover();
 		break;
 	case CPU_DOWN_PREPARE:
-		smp_call_function_single(cpu, mce_disable_cpu, &action, 1);
+		mce_disable_cpu();
 		del_timer_sync(t);
-		break;
-	case CPU_DOWN_FAILED:
-		smp_call_function_single(cpu, mce_reenable_cpu, &action, 1);
-		mce_start_timer(cpu, t);
+
+		mce_threshold_remove_device(cpu);
+		mce_device_remove(cpu);
 		break;
 	}
 
-- 
2.10.2