From mboxrd@z Thu Jan 1 00:00:00 1970
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>, stable@vger.kernel.org, Borislav Petkov, Xunlei Pang, Borislav Petkov, Tony Luck, Naoya Horiguchi, kexec@lists.infradead.org, linux-edac, Thomas Gleixner, Sasha Levin
Subject: [PATCH 4.9 022/241] x86/mce: Handle broadcasted MCE gracefully with kexec
Date: Mon, 19 Mar 2018 19:04:47 +0100
Message-Id: <20180319180752.087801059@linuxfoundation.org>
In-Reply-To: <20180319180751.172155436@linuxfoundation.org>
References: <20180319180751.172155436@linuxfoundation.org>

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Xunlei Pang <xlpang@redhat.com>

[ Upstream commit 5bc329503e8191c91c4c40836f062ef771d8ba83 ]

When we are about to kexec a crash kernel and right then and there a
broadcasted MCE fires while we're still in the first kernel and while
the other CPUs remain in a holding pattern, the #MC handler of the
first kernel will time out and then panic due to never completing MCE
synchronization.

Handle this similarly to the case where a broadcasted MCE arrives
while CPUs are offline.

[ Boris: rewrote commit message and comments. ]

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: kexec@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1487857012-9059-1-git-send-email-xlpang@redhat.com
Link: http://lkml.kernel.org/r/20170313095019.19351-1-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/reboot.h    |    1 +
 arch/x86/kernel/cpu/mcheck/mce.c |   18 ++++++++++++++++--
 arch/x86/kernel/reboot.c         |    5 +++--
 3 files changed, 20 insertions(+), 4 deletions(-)

--- a/arch/x86/include/asm/reboot.h
+++ b/arch/x86/include/asm/reboot.h
@@ -15,6 +15,7 @@ struct machine_ops {
 };
 
 extern struct machine_ops machine_ops;
+extern int crashing_cpu;
 
 void native_machine_crash_shutdown(struct pt_regs *regs);
 void native_machine_shutdown(void);
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -48,6 +48,7 @@
 #include <asm/tlbflush.h>
 #include <asm/mce.h>
 #include <asm/msr.h>
+#include <asm/reboot.h>
 
 #include "mce-internal.h"
 
@@ -1081,9 +1082,22 @@ void do_machine_check(struct pt_regs *re
 	 * on Intel.
 	 */
 	int lmce = 1;
+	int cpu = smp_processor_id();
 
-	/* If this CPU is offline, just bail out. */
-	if (cpu_is_offline(smp_processor_id())) {
+	/*
+	 * Cases where we avoid rendezvous handler timeout:
+	 * 1) If this CPU is offline.
+	 *
+	 * 2) If crashing_cpu was set, e.g. we're entering kdump and we need to
+	 *  skip those CPUs which remain looping in the 1st kernel - see
+	 *  crash_nmi_callback().
+	 *
+	 * Note: there still is a small window between kexec-ing and the new,
+	 * kdump kernel establishing a new #MC handler where a broadcasted MCE
+	 * might not get handled properly.
+	 */
+	if (cpu_is_offline(cpu) ||
+	    (crashing_cpu != -1 && crashing_cpu != cpu)) {
 		u64 mcgstatus;
 
 		mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -769,10 +769,11 @@ void machine_crash_shutdown(struct pt_re
 #endif
 
 
+/* This is the CPU performing the emergency shutdown work. */
+int crashing_cpu = -1;
+
 #if defined(CONFIG_SMP)
 
-/* This keeps a track of which one is crashing cpu. */
-static int crashing_cpu;
 static nmi_shootdown_cb shootdown_callback;
 
 static atomic_t waiting_for_crash_ipi;
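The new early-exit test in do_machine_check() is easier to follow with a small userspace model. This is a sketch under stated assumptions, not kernel code. SIM_NR_CPUS, sim_cpu_offline[] and sim_skips_rendezvous() are hypothetical names that stand in for the real CPU count, cpu_is_offline() and the inline condition. Only crashing_cpu and its -1 "no crash in progress" sentinel come from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical CPU count for this userspace model (not a kernel symbol). */
#define SIM_NR_CPUS 4

/* Mirrors the patch: -1 means no crash shutdown is in progress. */
static int crashing_cpu = -1;

/* Hypothetical stand-in for the kernel's cpu_is_offline(). */
static bool sim_cpu_offline[SIM_NR_CPUS];

/*
 * Model of the condition the patch adds to do_machine_check():
 * a CPU takes the early exit (and so does not wait in the MCE
 * rendezvous) if it is offline, or if a crash shutdown is in
 * progress and it is not the crashing CPU, i.e. it is one of the
 * CPUs parked in crash_nmi_callback().
 */
static bool sim_skips_rendezvous(int cpu)
{
	assert(cpu >= 0 && cpu < SIM_NR_CPUS);

	return sim_cpu_offline[cpu] ||
	       (crashing_cpu != -1 && crashing_cpu != cpu);
}
```

With crashing_cpu left at -1, only a CPU marked offline takes the early exit. After setting crashing_cpu = 1, CPUs 0, 2 and 3 all take it while CPU 1 does not. Turning crashing_cpu from a zero-initialized static into a global initialized to -1 is what makes this possible: with the old default of 0, "no crash yet" could not be told apart from "CPU 0 is crashing". The model covers only the condition. Whatever the handler does after reading MSR_IA32_MCG_STATUS lies outside the mce.c hunk in this patch and is not modeled.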