From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sasha Levin
To: "linux-kernel@vger.kernel.org", "stable@vger.kernel.org"
CC: Xunlei Pang, Borislav Petkov, Naoya Horiguchi, "kexec@lists.infradead.org", linux-edac, Thomas Gleixner, Sasha Levin
Subject: [PATCH AUTOSEL for 4.9 024/219] x86/mce: Handle broadcasted MCE gracefully with kexec
Date: Sat, 3 Mar 2018 22:28:09 +0000
Message-ID: <20180303222716.26640-24-alexander.levin@microsoft.com>
References: <20180303222716.26640-1-alexander.levin@microsoft.com>
In-Reply-To: <20180303222716.26640-1-alexander.levin@microsoft.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
X-Mailing-List: stable@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

From: Xunlei Pang <xlpang@redhat.com>

[ Upstream commit 5bc329503e8191c91c4c40836f062ef771d8ba83 ]

When we are about to kexec a crash kernel and right then and there a
broadcasted MCE fires while we're still in the first kernel and while
the other CPUs remain in a holding pattern, the #MC handler of the
first kernel will timeout and then panic due to never completing MCE
synchronization.

Handle this in a similar way as to when the CPUs are offlined when that
broadcasted MCE happens.

[ Boris: rewrote commit message and comments. ]

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: kexec@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1487857012-9059-1-git-send-email-xlpang@redhat.com
Link: http://lkml.kernel.org/r/20170313095019.19351-1-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
---
 arch/x86/include/asm/reboot.h    |  1 +
 arch/x86/kernel/cpu/mcheck/mce.c | 18 ++++++++++++++++--
 arch/x86/kernel/reboot.c         |  5 +++--
 3 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
index 2cb1cc253d51..fc62ba8dce93 100644
--- a/arch/x86/include/asm/reboot.h
+++ b/arch/x86/include/asm/reboot.h
@@ -15,6 +15,7 @@ struct machine_ops {
 };
 
 extern struct machine_ops machine_ops;
+extern int crashing_cpu;
 
 void native_machine_crash_shutdown(struct pt_regs *regs);
 void native_machine_shutdown(void);
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index fe5cd6ea1f0e..b0cc7bd26ab5 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -48,6 +48,7 @@
 #include <asm/tlbflush.h>
 #include <asm/mce.h>
 #include <asm/msr.h>
+#include <asm/reboot.h>
 
 #include "mce-internal.h"
 
@@ -1078,9 +1079,22 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 	 * on Intel.
 	 */
 	int lmce = 1;
+	int cpu = smp_processor_id();
 
-	/* If this CPU is offline, just bail out. */
-	if (cpu_is_offline(smp_processor_id())) {
+	/*
+	 * Cases where we avoid rendezvous handler timeout:
+	 * 1) If this CPU is offline.
+	 *
+	 * 2) If crashing_cpu was set, e.g. we're entering kdump and we need to
+	 *  skip those CPUs which remain looping in the 1st kernel - see
+	 *  crash_nmi_callback().
+	 *
+	 * Note: there still is a small window between kexec-ing and the new,
+	 * kdump kernel establishing a new #MC handler where a broadcasted MCE
+	 * might not get handled properly.
+	 */
+	if (cpu_is_offline(cpu) ||
+	    (crashing_cpu != -1 && crashing_cpu != cpu)) {
 		u64 mcgstatus;
 
 		mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index ce020a69bba9..03f21dbfaa9d 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -769,10 +769,11 @@ void machine_crash_shutdown(struct pt_regs *regs)
 #endif
 
 
+/* This is the CPU performing the emergency shutdown work. */
+int crashing_cpu = -1;
+
 #if defined(CONFIG_SMP)
 
-/* This keeps a track of which one is crashing cpu. */
-static int crashing_cpu;
 static nmi_shootdown_cb shootdown_callback;
 
 static atomic_t waiting_for_crash_ipi;
-- 
2.14.1