From: Alexandru Gagniuc <mr.nuke.me@gmail.com>
To: linux-acpi@vger.kernel.org, linux-edac@vger.kernel.org
Cc: rjw@rjwysocki.net, lenb@kernel.org, tony.luck@intel.com, bp@alien8.de,
	tbaicar@codeaurora.org, will.deacon@arm.com, james.morse@arm.com,
	shiju.jose@huawei.com, zjzhang@codeaurora.org, gengdongjiu@huawei.com,
	linux-kernel@vger.kernel.org, alex_gagniuc@dellteam.com,
	austin_bolen@dell.com, shyam_iyer@dell.com, devel@acpica.org,
	mchehab@kernel.org, robert.moore@intel.com, erik.schmauss@intel.com,
	Alexandru Gagniuc <mr.nuke.me@gmail.com>
Subject: [RFC PATCH v2 4/4] acpi: apei: Warn when GHES marks correctable errors as "fatal"
Date: Mon, 16 Apr 2018 16:59:03 -0500
Message-ID: <20180416215903.7318-5-mr.nuke.me@gmail.com>
In-Reply-To: <20180416215903.7318-1-mr.nuke.me@gmail.com>

There seems to be a culture amongst BIOS teams to want to crash the OS
when an error can't be handled in firmware. Marking GHES errors as
"fatal" is a very common way to do this.

However, a number of errors reported by GHES may be fatal in the sense
that a device or link is lost, but are not fatal to the system. When
there is a disagreement with firmware about the handleability of an
error, print a warning message.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
 drivers/acpi/apei/ghes.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index e0528da4e8f8..6a117825611d 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -535,13 +535,14 @@ static const struct ghes_handler *get_handler(const guid_t *type)
 static void ghes_do_proc(struct ghes *ghes,
 			 const struct acpi_hest_generic_status *estatus)
 {
-	int sev, sec_sev;
+	int sev, sec_sev, corrected_sev;
 	struct acpi_hest_generic_data *gdata;
 	const struct ghes_handler *handler;
 	guid_t *sec_type;
 	guid_t *fru_id = &NULL_UUID_LE;
 	char *fru_text = "";
 
+	corrected_sev = GHES_SEV_NO;
 	sev = ghes_severity(estatus->error_severity);
 	apei_estatus_for_each_section(estatus, gdata) {
 		sec_type = (guid_t *)gdata->section_type;
@@ -563,6 +564,13 @@ static void ghes_do_proc(struct ghes *ghes,
 					       sec_sev, err,
 					       gdata->error_data_length);
 		}
+
+		corrected_sev = max(corrected_sev, sec_sev);
+	}
+
+	if ((sev >= GHES_SEV_PANIC) && (corrected_sev < sev)) {
+		pr_warn("FIRMWARE BUG: Firmware sent fatal error that we were able to correct");
+		pr_warn("BROKEN FIRMWARE: Complain to your hardware vendor");
 	}
 }

-- 
2.14.3
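
The check added by the hunk above can be read in isolation as a comparison
between two severities: the one firmware put in the status block, and the
worst one left over after each section has been handled. What follows is a
minimal userspace sketch of that comparison, not the kernel code. The helper
names max_section_severity() and firmware_overstated_severity() are
hypothetical, the enum values are assumed to order from least to most severe
as the GHES_SEV_* names suggest, and the array of per-section severities
stands in for the sec_sev values seen inside the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Severity levels, ordered from least to most severe. The names follow
 * the GHES_SEV_* identifiers used in the patch; the numeric values are
 * an assumption made for this sketch.
 */
enum ghes_sev {
	GHES_SEV_NO = 0,
	GHES_SEV_CORRECTED = 1,
	GHES_SEV_RECOVERABLE = 2,
	GHES_SEV_PANIC = 3,
};

/*
 * Hypothetical helper: return the highest severity among the sections,
 * starting from GHES_SEV_NO the way corrected_sev does in the patch.
 */
static int max_section_severity(const int *sec_sevs, size_t n)
{
	int worst = GHES_SEV_NO;

	assert(sec_sevs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (sec_sevs[i] > worst)
			worst = sec_sevs[i];
	}
	return worst;
}

/*
 * Hypothetical helper: true when the status-block severity is fatal
 * but no individual section ended up that severe, which is the
 * condition under which the patch prints its warning.
 */
static bool firmware_overstated_severity(int sev, const int *sec_sevs,
					 size_t n)
{
	return sev >= GHES_SEV_PANIC &&
	       max_section_severity(sec_sevs, n) < sev;
}
```

With this sketch, a status block marked GHES_SEV_PANIC whose sections come
out as corrected and recoverable is flagged, while one that really contains
a GHES_SEV_PANIC section is not. A fatal status block with no sections at
all is also flagged, because the running maximum stays at GHES_SEV_NO.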