From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933408AbcKNKrX (ORCPT ); Mon, 14 Nov 2016 05:47:23 -0500 Received: from Galois.linutronix.de ([146.0.238.70]:39206 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751750AbcKNKrW (ORCPT ); Mon, 14 Nov 2016 05:47:22 -0500 Date: Mon, 14 Nov 2016 11:47:11 +0100 From: Sebastian Andrzej Siewior To: Borislav Petkov Cc: linux-kernel@vger.kernel.org, tglx@linutronix.de, rt@linutronix.de, Tony Luck , linux-edac@vger.kernel.org, x86@kernel.org Subject: [PATCH 5/7 v2] x86/mcheck: reorganize the hotplug callbacks Message-ID: <20161114104711.fkmgpaiklnkcoxnb@linutronix.de> References: <20161110091809.vxyf3yiuxtjy3vqv@pd.tnic> <20161110174447.11848-1-bigeasy@linutronix.de> <20161110174447.11848-6-bigeasy@linutronix.de> <20161111184453.ax5getrj5y3i4fuc@pd.tnic> <20161111193631.srv7sxqzykvymgbf@linutronix.de> <20161111195737.i2p35ax6km2ilcng@pd.tnic> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20161111195737.i2p35ax6km2ilcng@pd.tnic> User-Agent: NeoMutt/20161104 (1.7.1) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Initially I wanted to remove mcheck_cpu_init() from identify_cpu() and let it become an independent early hotplug callback. The main problem here was that the init on the boot CPU may happen too late (device_initcall_sync(mcheck_init_device)) and nobody wanted to risk receiving and MCE event at boot time leading to a shutdown (if the MCE feature is not yet enabled). Here is attempt two: the timing stays as-is but the ordering of the functions is changed: - mcheck_cpu_init() (which is run from identify_cpu()) will setup the timer struct but won't fire the timer. This is moved to CPU_ONLINE since its cleanup part is in CPU_DOWN_PREPARE. So if it is okay to stop the timer early in the shutdown phase, it should be okay to start it late in the bring up phase. - CPU_DOWN_PREPARE disables the MCE feature flags for !INTEL CPUs in mce_disable_cpu(). If a failure occurs it would be re-enabled on all vendor CPUs (including Intel where it was not disabled during shutdown). To keep this working I am moving it to CPU_ONLINE. smp_call_function_single() is dropped because the notifier runs on the target CPU (since core code rework). - CPU_ONLINE is invoking mce_device_create() + mce_threshold_create_device() but its cleanup part is in CPU_DEAD (mce_threshold_remove_device() and mce_device_remove()). In order to keep this symmetrical I am moving the clean up from CPU_DEAD to CPU_DOWN_PREPARE. Cc: Tony Luck Cc: Borislav Petkov Cc: linux-edac@vger.kernel.org Cc: x86@kernel.org Signed-off-by: Sebastian Andrzej Siewior --- On 2016-11-11 20:57:37 [+0100], Borislav Petkov wrote: > Then please do it right: > > static void __mcheck_cap_setup_timer(void) > { > struct timer_list *t = this_cpu_ptr(&mce_timer); > unsigned int cpu = smp_processor_id(); > > setup_pinned_timer(t, mce_timer_fn, cpu); > } > > and call that function then. I can't believe that you ask for this but here it is. arch/x86/kernel/cpu/mcheck/mce.c | 36 ++++++++++++++++++++---------------- 1 file changed, 20 insertions(+), 16 deletions(-) --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -1745,6 +1745,14 @@ static void mce_start_timer(unsigned int add_timer_on(t, cpu); } +static void __mcheck_cap_setup_timer(void) +{ + struct timer_list *t = this_cpu_ptr(&mce_timer); + unsigned int cpu = smp_processor_id(); + + setup_pinned_timer(t, mce_timer_fn, cpu); +} + static void __mcheck_cpu_init_timer(void) { struct timer_list *t = this_cpu_ptr(&mce_timer); @@ -1796,7 +1804,7 @@ void mcheck_cpu_init(struct cpuinfo_x86 __mcheck_cpu_init_generic(); __mcheck_cpu_init_vendor(c); __mcheck_cpu_init_clear_banks(); - __mcheck_cpu_init_timer(); + __mcheck_cap_setup_timer(); } /* @@ -2470,28 +2478,25 @@ static void mce_device_remove(unsigned i } /* Make sure there are no machine checks on offlined CPUs. */ -static void mce_disable_cpu(void *h) +static void mce_disable_cpu(void) { - unsigned long action = *(unsigned long *)h; - if (!mce_available(raw_cpu_ptr(&cpu_info))) return; - if (!(action & CPU_TASKS_FROZEN)) + if (!cpuhp_tasks_frozen) cmci_clear(); vendor_disable_error_reporting(); } -static void mce_reenable_cpu(void *h) +static void mce_reenable_cpu(void) { - unsigned long action = *(unsigned long *)h; int i; if (!mce_available(raw_cpu_ptr(&cpu_info))) return; - if (!(action & CPU_TASKS_FROZEN)) + if (!cpuhp_tasks_frozen) cmci_reenable(); for (i = 0; i < mca_cfg.banks; i++) { struct mce_bank *b = &mce_banks[i]; @@ -2510,6 +2515,7 @@ mce_cpu_callback(struct notifier_block * switch (action & ~CPU_TASKS_FROZEN) { case CPU_ONLINE: + case CPU_DOWN_FAILED: mce_device_create(cpu); @@ -2517,11 +2523,10 @@ mce_cpu_callback(struct notifier_block * mce_device_remove(cpu); return NOTIFY_BAD; } - + mce_reenable_cpu(); + mce_start_timer(cpu, t); break; case CPU_DEAD: - mce_threshold_remove_device(cpu); - mce_device_remove(cpu); mce_intel_hcpu_update(cpu); /* intentionally ignoring frozen here */ @@ -2529,12 +2534,11 @@ mce_cpu_callback(struct notifier_block * cmci_rediscover(); break; case CPU_DOWN_PREPARE: - smp_call_function_single(cpu, mce_disable_cpu, &action, 1); + mce_disable_cpu(); del_timer_sync(t); - break; - case CPU_DOWN_FAILED: - smp_call_function_single(cpu, mce_reenable_cpu, &action, 1); - mce_start_timer(cpu, t); + + mce_threshold_remove_device(cpu); + mce_device_remove(cpu); break; }