From: Prarit Bhargava
To: Borislav Petkov
Cc: linux-kernel@vger.kernel.org, Tony Luck, Thomas Gleixner, Ingo Molnar, "H. Peter Anvin", x86@kernel.org, linux-edac@vger.kernel.org
Date: Mon, 16 Jan 2017 18:13:39 -0500
Subject: Re: [PATCH] x86/mce: Fix initialization error warning

On 01/16/2017 05:43 PM, Borislav Petkov wrote:
> On Mon, Jan 16, 2017 at 05:06:02PM -0500, Prarit Bhargava wrote:
>> Yes, it was loud enough to generate a bug report from a user.
>
> Yeah, because all users are sane and we should do whatever they want -
> no questions asked. Especially those who boot with "mce=off".
>
> Did you actually ask that user why she/he is even booting with
> "mce=off"?

Yes. kdump boots with mce=off by default:

KDUMP_COMMANDLINE_APPEND="irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug transparent_hugepage=never"

There is a race between an NMI completing on a CPU and the MCE synchronization timing out. When it hits, the result is a kernel panic in the kdump kernel and the loss of the dump image. There have been a few attempts to fix it over the years.

The fix seems as simple as setting a flag in native_machine_crash_shutdown() and querying it in do_machine_check() to avoid the MCE/NMI race.

P.
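A minimal user-space sketch of the flag idea, for illustration only. Everything in it is hypothetical: crash_shutdown_started, sim_crash_shutdown(), sim_do_machine_check() and sim_mce_rendezvous() are stand-ins for native_machine_crash_shutdown(), do_machine_check() and the MCE synchronization, not kernel symbols or a proposed patch. It models only the control flow: once the crash path has raised the flag, the machine-check path skips the cross-CPU wait that would otherwise time out.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* HYPOTHETICAL flag, not a kernel symbol: set by the crash path before it
 * starts NMI-stopping the other CPUs. Static storage, so it starts false. */
static atomic_bool crash_shutdown_started;

enum mce_outcome {
	MCE_HANDLED,       /* normal machine-check handling completed */
	MCE_SKIPPED_SYNC,  /* crash in progress: cross-CPU wait skipped */
	MCE_TIMEOUT_PANIC, /* synchronization timed out, handler would panic */
};

/* Simplified model of the MCE synchronization: it succeeds only if every
 * online CPU checks in before the timeout. */
static bool sim_mce_rendezvous(int cpus_online, int cpus_checked_in)
{
	return cpus_checked_in == cpus_online;
}

/* Hypothetical stand-in for native_machine_crash_shutdown(): raise the flag
 * before stopping the other CPUs. */
static void sim_crash_shutdown(void)
{
	atomic_store(&crash_shutdown_started, true);
	/* ...the real code would now NMI the other CPUs to stop them... */
}

/* Hypothetical stand-in for do_machine_check(): query the flag before
 * waiting on the other CPUs. */
static enum mce_outcome sim_do_machine_check(int cpus_online, int cpus_checked_in)
{
	if (atomic_load(&crash_shutdown_started)) {
		/* CPUs stopped by NMI will never check in, so waiting for
		 * them could only time out. */
		return MCE_SKIPPED_SYNC;
	}
	if (!sim_mce_rendezvous(cpus_online, cpus_checked_in))
		return MCE_TIMEOUT_PANIC;
	return MCE_HANDLED;
}
```

With the flag clear, sim_do_machine_check(4, 3) models one CPU that never checks in and returns MCE_TIMEOUT_PANIC; after sim_crash_shutdown(), the same situation returns MCE_SKIPPED_SYNC. Whether skipping the synchronization is actually safe in the real handler is what a patch would still have to justify.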