From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mga06.intel.com ([134.134.136.31]:62226 "EHLO mga06.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S965855AbeF1WJg (ORCPT ); Thu, 28 Jun 2018 18:09:36 -0400
Date: Thu, 28 Jun 2018 15:09:31 -0700
From: "Luck, Tony"
To: gregkh@linuxfoundation.org
Cc: ashok.raj@intel.com, bp@suse.de, dan.j.williams@intel.com,
	linux-edac@vger.kernel.org, qiuxu.zhuo@intel.com, tglx@linutronix.de,
	stable@vger.kernel.org
Subject: Re: FAILED: patch "[PATCH] x86/mce: Fix incorrect "Machine check from unknown source"" failed to apply to 4.4-stable tree
Message-ID: <20180628220931.GA569@agluck-desk>
References: <1530151642162195@kroah.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1530151642162195@kroah.com>
Sender: stable-owner@vger.kernel.org
List-ID:

On Thu, Jun 28, 2018 at 11:07:22AM +0900, gregkh@linuxfoundation.org wrote:
> 
> The patch below does not apply to the 4.4-stable tree.
> If someone wants it applied there, or to any other stable or longterm
> tree, then please email the backport, including the original git commit
> id to <stable@vger.kernel.org>.
> 
> thanks,

This patch relies on:

	3acb431b84d8 ("x86/mce: Detect local MCEs properly")

cherry pick that (and fix up the trivial merge problem around the
change to initialize "lmce = 1;" instead of "lmce = 0;")

Then this will merge cleanly.

-Tony
> 
> ------------------ original commit in Linus's tree ------------------
> 
> From 40c36e2741d7fe1e66d6ec55477ba5fd19c9c5d2 Mon Sep 17 00:00:00 2001
> From: Tony Luck <tony.luck@intel.com>
> Date: Fri, 22 Jun 2018 11:54:23 +0200
> Subject: [PATCH] x86/mce: Fix incorrect "Machine check from unknown source"
>  message
> 
> Some injection testing resulted in the following console log:
> 
>   mce: [Hardware Error]: CPU 22: Machine Check Exception: f Bank 1: bd80000000100134
>   mce: [Hardware Error]: RIP 10:<ffffffffc05292dd> {pmem_do_bvec+0x11d/0x330 [nd_pmem]}
>   mce: [Hardware Error]: TSC c51a63035d52 ADDR 3234bc4000 MISC 88
>   mce: [Hardware Error]: PROCESSOR 0:50654 TIME 1526502199 SOCKET 0 APIC 38 microcode 2000043
>   mce: [Hardware Error]: Run the above through 'mcelog --ascii'
>   Kernel panic - not syncing: Machine check from unknown source
> 
> This confused everybody because the first line quite clearly shows
> that we found a logged error in "Bank 1", while the last line says
> "unknown source".
> 
> The problem is that the Linux code doesn't do the right thing
> for a local machine check that results in a fatal error.
> 
> It turns out that we know very early in the handler whether the
> machine check is fatal. The call to mce_no_way_out() has checked
> all the banks for the CPU that took the local machine check. If
> it says we must crash, we can do so right away with the right
> messages.
> 
> We do scan all the banks again. This means that we might initially
> not see a problem, but during the second scan find something fatal.
> If this happens we print a slightly different message (so I can
> see if it actually ever happens).
> 
> [ bp: Remove unneeded severity assignment. ]
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> Signed-off-by: Borislav Petkov <bp@suse.de>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ashok Raj <ashok.raj@intel.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> Cc: linux-edac <linux-edac@vger.kernel.org>
> Cc: stable@vger.kernel.org # 4.2
> Link: http://lkml.kernel.org/r/52e049a497e86fd0b71c529651def8871c804df0.1527283897.git.tony.luck@intel.com
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 7e6f51a9d917..e93670d736a6 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -1207,13 +1207,18 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>  		lmce = m.mcgstatus & MCG_STATUS_LMCES;
>  
>  	/*
> +	 * Local machine check may already know that we have to panic.
> +	 * Broadcast machine check begins rendezvous in mce_start()
>  	 * Go through all banks in exclusion of the other CPUs. This way we
>  	 * don't report duplicated events on shared banks because the first one
> -	 * to see it will clear it. If this is a Local MCE, then no need to
> -	 * perform rendezvous.
> +	 * to see it will clear it.
>  	 */
> -	if (!lmce)
> +	if (lmce) {
> +		if (no_way_out)
> +			mce_panic("Fatal local machine check", &m, msg);
> +	} else {
>  		order = mce_start(&no_way_out);
> +	}
>  
>  	for (i = 0; i < cfg->banks; i++) {
>  		__clear_bit(i, toclear);
> @@ -1289,12 +1294,17 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>  			no_way_out = worst >= MCE_PANIC_SEVERITY;
>  	} else {
>  		/*
> -		 * Local MCE skipped calling mce_reign()
> -		 * If we found a fatal error, we need to panic here.
> +		 * If there was a fatal machine check we should have
> +		 * already called mce_panic earlier in this function.
> +		 * Since we re-read the banks, we might have found
> +		 * something new. Check again to see if we found a
> +		 * fatal error. We call "mce_severity()" again to
> +		 * make sure we have the right "msg".
>  		 */
> -		 if (worst >= MCE_PANIC_SEVERITY && mca_cfg.tolerant < 3)
> -			mce_panic("Machine check from unknown source",
> -				NULL, NULL);
> +		if (worst >= MCE_PANIC_SEVERITY && mca_cfg.tolerant < 3) {
> +			mce_severity(&m, cfg->tolerant, &msg, true);
> +			mce_panic("Local fatal machine check!", &m, msg);
> +		}
>  	}
>  
>  	/*
>