From: "Alex G." <mr.nuke.me@gmail.com>
To: Borislav Petkov <bp@alien8.de>
Cc: linux-acpi@vger.kernel.org, linux-edac@vger.kernel.org,
	rjw@rjwysocki.net, lenb@kernel.org, tony.luck@intel.com,
	tbaicar@codeaurora.org, will.deacon@arm.com, james.morse@arm.com,
	shiju.jose@huawei.com, zjzhang@codeaurora.org,
	gengdongjiu@huawei.com, linux-kernel@vger.kernel.org,
	alex_gagniuc@dellteam.com, austin_bolen@dell.com,
	shyam_iyer@dell.com, devel@acpica.org, mchehab@kernel.org,
	robert.moore@intel.com, erik.schmauss@intel.com
Subject: Re: [RFC PATCH v2 3/4] acpi: apei: Do not panic() when correctable errors are marked as fatal.
Date: Thu, 19 Apr 2018 09:57:07 -0500
Message-ID: <bdc1d19b-3cc1-3f90-d276-414aaacdbd3a@gmail.com>
In-Reply-To: <20180418175415.GJ4795@pd.tnic>

On 04/18/2018 12:54 PM, Borislav Petkov wrote:
> On Mon, Apr 16, 2018 at 04:59:02PM -0500, Alexandru Gagniuc wrote:
>> Firmware is evil:
>>  - ACPI was created to "try and make the 'ACPI' extensions somehow
>>    Windows specific" in order to "work well with NT and not the others
>>    even if they are open"
>>  - EFI was created to hide "secret" registers from the OS.
>>  - UEFI was created to allow compromising an otherwise secure OS.
>>
>> Never has firmware been created to solve a problem or simplify an
>> otherwise cumbersome process. It is of no surprise then, that
>> firmware nowadays intentionally crashes an OS.
>
> I don't believe I'm saying this but, get rid of that rant. Even though I
> agree, it doesn't belong in a commit message.

Of course.

(snip)
> Well, Tyler touched that AER error severity handling recently and we had
> it all nicely documented in the comment above ghes_handle_aer().
>
> Your ghes_handle_aer_irqsafe() graft basically bypasses
> ghes_handle_aer() instead of incorporating in it.
>
> If all you wanna say is, the severity computation should go through all
> the sections and look at each error's severity before making a decision,
> then add that to ghes_severity() instead of doing that "deferrable"
> severity dance.

ghes_severity() is a one-to-one mapping from a set of unsorted
severities to monotonically increasing numbers. The "one-to-one" mapping
part of the sentence is obvious from the function name. Changing it to
parse the entire GHES would completely destroy this, and I think it
would apply policy in the wrong place.

Should I do that, I might have to call it something like
ghes_parse_and_apply_policy_to_severity(). But that misses the whole
point of these changes.

I would like to get to the handlers first, and then decide if things are
okay or not, but the ARM guys didn't exactly like this approach. It
seems there are quite a few per-error-type considerations. The logical
step is to associate these considerations with the specific error type
they apply to, rather than hide them as a decision under an innocent
ghes_severity().

> And add the changes to the policy to the comment above
> ghes_handle_aer(). I don't want any changes from people coming and going
> and leaving us scratching heads why we did it this way.
>
> And no need for those handlers and so on - make it simple first - then we
> can talk more complex handling.

I don't want to leave people scratching their heads, but I also don't
want to make AER a special case without having a generic way to handle
these cases. People are just as likely to scratch their heads wondering
why AER is a special case and everything else crashes.

Maybe it's better to move the AER handling to NMI/IRQ context, since
ghes_handle_aer() is only scheduling the real AER handler, and is irq
safe. I'm scratching my head about why we're messing with IRQ work from
NMI context, instead of just scheduling a regular handler to take care
of things.

Alex
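
The distinction argued above, between a pure severity translation and a per-error-type policy layered on top of it, can be illustrated with a minimal standalone sketch. This is not the code from the patch. The enum values are assumed to mirror the kernel's CPER and GHES severity constants, and `struct section`, `enum sec_type`, `section_effective_severity()` and `status_block_severity()` are hypothetical names invented for illustration, as is the rule that downgrades a "fatal" AER section.

```c
#include <stddef.h>

/* Firmware-reported (CPER) severities. Assumed values: note that the
 * numeric order does not track how bad the error is. */
enum cper_sev {
	CPER_SEV_RECOVERABLE   = 0,
	CPER_SEV_FATAL         = 1,
	CPER_SEV_CORRECTED     = 2,
	CPER_SEV_INFORMATIONAL = 3,
};

/* OS-side severities, monotonically increasing with badness (assumed values). */
enum ghes_sev {
	GHES_SEV_NO          = 0,
	GHES_SEV_CORRECTED   = 1,
	GHES_SEV_RECOVERABLE = 2,
	GHES_SEV_PANIC       = 3,
};

/* Pure translation: one input severity, one output severity, no policy. */
static int ghes_severity(int severity)
{
	switch (severity) {
	case CPER_SEV_INFORMATIONAL:
		return GHES_SEV_NO;
	case CPER_SEV_CORRECTED:
		return GHES_SEV_CORRECTED;
	case CPER_SEV_RECOVERABLE:
		return GHES_SEV_RECOVERABLE;
	case CPER_SEV_FATAL:
	default:
		/* Unknown values are treated as the worst case. */
		return GHES_SEV_PANIC;
	}
}

/* Hypothetical section descriptor, for illustration only. */
enum sec_type { SEC_PCIE_AER, SEC_MEMORY, SEC_OTHER };

struct section {
	enum sec_type type;
	int cper_sev;
};

/* Hypothetical per-error-type policy: an AER section that firmware marked
 * fatal is treated as recoverable, because a handler can deal with it. */
static int section_effective_severity(const struct section *s)
{
	int sev = ghes_severity(s->cper_sev);

	if (s->type == SEC_PCIE_AER && sev == GHES_SEV_PANIC)
		return GHES_SEV_RECOVERABLE;
	return sev;
}

/* Walk every section and keep the worst effective severity. */
static int status_block_severity(const struct section *secs, size_t n)
{
	int worst = GHES_SEV_NO;
	size_t i;

	for (i = 0; i < n; i++) {
		int sev = section_effective_severity(&secs[i]);

		if (sev > worst)
			worst = sev;
	}
	return worst;
}
```

In this sketch the policy lives next to the error type it applies to, and ghes_severity() stays a plain lookup. Folding the section walk into ghes_severity() would instead make the translation depend on the contents of the whole status block.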