From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Alex G."
Subject: Re: [RFC PATCH v2 3/4] acpi: apei: Do not panic() when correctable errors are marked as fatal.
Date: Wed, 25 Apr 2018 10:00:53 -0500
Message-ID: <48944beb-4e29-05cc-857b-7698e3dbe89b@gmail.com>
References: <20180418175415.GJ4795@pd.tnic>
 <20180419154006.GE3600@pd.tnic>
 <977608e6-9f5d-c523-a78a-993ac5bfd55f@gmail.com>
 <20180419164528.GD5635@pd.tnic>
 <20180419190323.GF5635@pd.tnic>
 <20180422104849.GA32754@pd.tnic>
 <70c43399-e8e5-5061-b5a5-451deb5f02fa@gmail.com>
 <20180425140108.GA2597@pd.tnic>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <20180425140108.GA2597@pd.tnic>
Content-Language: en-US
Sender: linux-kernel-owner@vger.kernel.org
To: Borislav Petkov
Cc: linux-acpi@vger.kernel.org, linux-edac@vger.kernel.org,
 rjw@rjwysocki.net, lenb@kernel.org, tony.luck@intel.com,
 tbaicar@codeaurora.org, will.deacon@arm.com, james.morse@arm.com,
 shiju.jose@huawei.com, zjzhang@codeaurora.org, gengdongjiu@huawei.com,
 linux-kernel@vger.kernel.org, alex_gagniuc@dellteam.com,
 austin_bolen@dell.com, shyam_iyer@dell.com, devel@acpica.org,
 mchehab@kernel.org, robert.moore@intel.com, erik.schmauss@intel.com,
 Yazen Ghannam, Ard Biesheuvel
List-Id: linux-acpi@vger.kernel.org

On 04/25/2018 09:01 AM, Borislav Petkov wrote:
> On Mon, Apr 23, 2018 at 11:19:25PM -0500, Alex G. wrote:
>> That tells you what FFS said about the error.
>
> I betcha those status and command values have a human-readable counterparts.
>
> Btw, what do you abbreviate with "FFS"?

Firmware-first.

>> It's immediately obvious if there's a glaring FFS bug and if we get bogus
>> data. If you distrust firmware as much as I do, then you will find great
>> value in having such info in the logs. It's probably not too useful to a
>> casual user, but then neither is a majority of the system log.
>
> No no, you're missing the point - I *want* all data in the error log
> which helps debug a hardware issue. I just want it humanly readable so
> that I don't have to jot down the values and go scour the manuals to map
> what it actually means.

We could probably use more of the native AER print functions, but that's
beyond the scope of this patch. I tried something like this [1], but
have given up following the PCI maintainer's radio silence. I don't care
_that_ much about the log format.

[1] http://www.spinics.net/lists/linux-pci/msg71422.html

>> You're missing the timing and assuming you will get the hotplug interrupt.
>> In this example, you have 22ms between the link down and presence detect
>> state change. This is a fairly fast removal.
>>
>> Hotplug dependencies aside (you can have the kernel run without PCIe hotplug
>> support), I don't think you want to just linger in NMI for dozens of
>> milliseconds waiting for presence detect confirmation.
>
> No, I don't mean that. I mean something like deferred processing:

Like the exact thing that this patch series implements? :)

> you
> get an error, you notice it is a device which supports physical removal
> so you exit the NMI handler and process the error in normal, process
> context which allows you to query the device and say, "Hey device, are
> you still there?"

Like the exact way the AER handler works?

> If it is not, you drop all the hw I/O errors reported for it.

Like the PCI error recovery mechanisms that AER invokes?

> Hmmm?
Hmmm
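The deferred-processing flow argued over in this thread (queue the error in NMI context, then check later whether the device is still there and drop its errors if it is not) can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the code from the patch series or from the AER driver: the names err_record, nmi_report_error(), process_deferred_errors() and device_present[] are all hypothetical, and a simple array stands in for whatever NMI-safe queueing the kernel actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_PENDING 8	/* hypothetical queue depth */
#define MAX_DEVICES 4	/* hypothetical number of hot-pluggable slots */

/* Hypothetical error record: what firmware claimed about one error. */
struct err_record {
	int dev_id;	/* device the error was reported against */
	bool fatal;	/* severity as reported by firmware */
};

static struct err_record pending[MAX_PENDING];
static size_t n_pending;

/* Stand-in for presence detect: true means the device is still plugged in. */
static bool device_present[MAX_DEVICES] = { true, true, true, true };

/*
 * "NMI" side: no device access and no waiting, only record the error.
 * Returns false if the queue is full and the record was not stored.
 */
static bool nmi_report_error(int dev_id, bool fatal)
{
	if (n_pending >= MAX_PENDING)
		return false;
	pending[n_pending].dev_id = dev_id;
	pending[n_pending].fatal = fatal;
	n_pending++;
	return true;
}

/*
 * Deferred side: runs later, in a context where querying the device is
 * allowed. Errors for devices that are gone are dropped; the return value
 * is the number of errors that still need real handling.
 */
static size_t process_deferred_errors(void)
{
	size_t handled = 0;

	for (size_t i = 0; i < n_pending; i++) {
		const struct err_record *rec = &pending[i];

		assert(rec->dev_id >= 0 && rec->dev_id < MAX_DEVICES);

		if (!device_present[rec->dev_id]) {
			printf("dev %d is gone, dropping error\n", rec->dev_id);
			continue;
		}
		printf("dev %d present, handling %s error\n", rec->dev_id,
		       rec->fatal ? "fatal" : "non-fatal");
		handled++;
	}
	n_pending = 0;	/* queue is drained either way */
	return handled;
}
```

A caller would invoke nmi_report_error() once per reported error, clear the device_present[] entry for a slot to model a surprise removal, and then call process_deferred_errors(). The point of the split is that the decision to drop or handle an error is made only after the presence check, never inside the NMI path.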