From: Borislav Petkov <bp@alien8.de>
To: "Alex G." <mr.nuke.me@gmail.com>
Cc: linux-acpi@vger.kernel.org, linux-edac@vger.kernel.org,
	rjw@rjwysocki.net, lenb@kernel.org, tony.luck@intel.com,
	tbaicar@codeaurora.org, will.deacon@arm.com, james.morse@arm.com,
	shiju.jose@huawei.com, zjzhang@codeaurora.org,
	gengdongjiu@huawei.com, linux-kernel@vger.kernel.org,
	alex_gagniuc@dellteam.com, austin_bolen@dell.com,
	shyam_iyer@dell.com, devel@acpica.org, mchehab@kernel.org,
	robert.moore@intel.com, erik.schmauss@intel.com,
	Yazen Ghannam <yazen.ghannam@amd.com>,
	Ard Biesheuvel <ard.biesheuvel@linaro.org>
Subject: Re: [RFC PATCH v2 3/4] acpi: apei: Do not panic() when correctable errors are marked as fatal.
Date: Wed, 25 Apr 2018 16:01:08 +0200
Message-ID: <20180425140108.GA2597@pd.tnic>
In-Reply-To: <70c43399-e8e5-5061-b5a5-451deb5f02fa@gmail.com>

On Mon, Apr 23, 2018 at 11:19:25PM -0500, Alex G. wrote:
> That tells you what FFS said about the error.

I betcha those status and command values have human-readable
counterparts.

Btw, what do you abbreviate with "FFS"?

> It's immediately obvious if there's a glaring FFS bug and if we get bogus
> data. If you distrust firmware as much as I do, then you will find great
> value in having such info in the logs. It's probably not too useful to a
> casual user, but then neither is a majority of the system log.

No no, you're missing the point - I *want* all data in the error log
which helps debug a hardware issue. I just want it human-readable so
that I don't have to jot down the values and go scour the manuals to map
what they actually mean.

> You're missing the timing and assuming you will get the hotplug interrupt.
> In this example, you have 22ms between the link down and presence detect
> state change. This is a fairly fast removal.
>
> Hotplug dependencies aside (you can have the kernel run without PCIe hotplug
> support), I don't think you want to just linger in NMI for dozens of
> milliseconds waiting for presence detect confirmation.

No, I don't mean that. I mean something like deferred processing: you
get an error, you notice it is a device which supports physical removal
so you exit the NMI handler and process the error in normal, process
context which allows you to query the device and say, "Hey device, are
you still there?"

If it is not, you drop all the hw I/O errors reported for it.

Hmmm?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
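The deferred-processing idea above can be sketched as a small user-space simulation. This is only an illustration under stated assumptions, not kernel code and not the patch under discussion: `ErrorRecord`, `DeferredErrorHandler`, the `is_present` callback, and the device addresses are all hypothetical names invented for the example. The "NMI handler" only queues records for removable devices and returns at once; a later "process context" pass asks whether each device is still there and drops the errors of those that are gone.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorRecord:
    device: str         # hypothetical device identifier (e.g. a PCI address)
    hotpluggable: bool  # whether the device supports physical removal
    description: str


class DeferredErrorHandler:
    """Toy model: queue errors in 'NMI context', judge them later."""

    def __init__(self, is_present):
        # is_present(device) -> bool stands in for querying the hardware;
        # it is an assumption of this sketch, not a real kernel interface.
        self._is_present = is_present
        self._pending = deque()
        self.reported = []

    def nmi_handler(self, record):
        # 'NMI context': never wait for the device here.
        if record.hotpluggable:
            self._pending.append(record)  # defer and return immediately
        else:
            self.reported.append(record)  # non-removable: report right away

    def process_deferred(self):
        # 'Process context': asking "are you still there?" is allowed here.
        dropped = []
        while self._pending:
            record = self._pending.popleft()
            if self._is_present(record.device):
                self.reported.append(record)
            else:
                dropped.append(record)    # device was removed: drop its errors
        return dropped


# Example: one device is still plugged in, the other was pulled.
present = {"0000:3b:00.0"}
handler = DeferredErrorHandler(lambda dev: dev in present)
handler.nmi_handler(ErrorRecord("0000:3b:00.0", True, "link down"))
handler.nmi_handler(ErrorRecord("0000:5e:00.0", True, "link down"))
dropped = handler.process_deferred()
```

In this toy model the error for the removed device ends up in `dropped` and never reaches `reported`, while the error for the device that is still present is reported as usual. How long to wait before running the deferred pass is left open here, as it is in the message.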