* [PATCH] RAS/CEC: Add debugfs switch to disable at run time @ 2019-04-18 22:02 Tony Luck 2019-04-18 22:51 ` Cong Wang 2019-04-20 19:50 ` Borislav Petkov 0 siblings, 2 replies; 25+ messages in thread From: Tony Luck @ 2019-04-18 22:02 UTC (permalink / raw) To: Borislav Petkov; +Cc: Tony Luck, linux-kernel Useful when running error injection tests that want to see all of the MCi_(STATUS|ADDR|MISC) data via /dev/mcelog. Signed-off-by: Tony Luck <tony.luck@intel.com> --- drivers/ras/cec.c | 20 +++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c index 2d9ec378a8bc..a2ceedcd8516 100644 --- a/drivers/ras/cec.c +++ b/drivers/ras/cec.c @@ -123,6 +123,9 @@ static u64 dfs_pfn; /* Amount of errors after which we offline */ static unsigned int count_threshold = COUNT_MASK; +/* debugfs switch to enable/disable CEC */ +static u64 cec_enabled = 1; + /* * The timer "decays" element count each timer_interval which is 24hrs by * default. @@ -400,6 +403,14 @@ static int count_threshold_set(void *data, u64 val) } DEFINE_DEBUGFS_ATTRIBUTE(count_threshold_ops, u64_get, count_threshold_set, "%lld\n"); +static int enable_set(void *data, u64 val) +{ + ce_arr.disabled = !val; + return 0; +} + +DEFINE_DEBUGFS_ATTRIBUTE(enable_ops, u64_get, enable_set, "%lld\n"); + static int array_dump(struct seq_file *m, void *v) { struct ce_array *ca = &ce_arr; @@ -451,7 +462,7 @@ static const struct file_operations array_ops = { static int __init create_debugfs_nodes(void) { - struct dentry *d, *pfn, *decay, *count, *array; + struct dentry *d, *pfn, *decay, *count, *array, *enable; d = debugfs_create_dir("cec", ras_debugfs_dir); if (!d) { @@ -485,6 +496,13 @@ static int __init create_debugfs_nodes(void) goto err; } + enable = debugfs_create_file("enable", S_IRUSR | S_IWUSR, d, + &cec_enabled, &enable_ops); + if (!enable) { + pr_warn("Error creating enable debugfs node!\n"); + goto err; + } + return 0; -- 2.19.1 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH] RAS/CEC: Add debugfs switch to disable at run time 2019-04-18 22:02 [PATCH] RAS/CEC: Add debugfs switch to disable at run time Tony Luck @ 2019-04-18 22:51 ` Cong Wang 2019-04-18 23:29 ` Borislav Petkov 2019-04-20 19:50 ` Borislav Petkov 1 sibling, 1 reply; 25+ messages in thread From: Cong Wang @ 2019-04-18 22:51 UTC (permalink / raw) To: Tony Luck; +Cc: Borislav Petkov, LKML On Thu, Apr 18, 2019 at 3:02 PM Tony Luck <tony.luck@intel.com> wrote: > > Useful when running error injection tests that want to > see all of the MCi_(STATUS|ADDR|MISC) data via /dev/mcelog. > > Signed-off-by: Tony Luck <tony.luck@intel.com> We saw the same problem, CONFIG_RAS hijacks all the correctable memory errors, which leaves mcelog "broken" silently. I know it is arguable, but until we can switch from mcelog to rasdaemon, mcelog should still work as before. I like this idea of disabling it at runtime, so: Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Thanks. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] RAS/CEC: Add debugfs switch to disable at run time 2019-04-18 22:51 ` Cong Wang @ 2019-04-18 23:29 ` Borislav Petkov 2019-04-18 23:58 ` Cong Wang 2019-04-19 0:07 ` Luck, Tony 0 siblings, 2 replies; 25+ messages in thread From: Borislav Petkov @ 2019-04-18 23:29 UTC (permalink / raw) To: Cong Wang, Tony Luck; +Cc: LKML On Thu, Apr 18, 2019 at 03:51:07PM -0700, Cong Wang wrote: > On Thu, Apr 18, 2019 at 3:02 PM Tony Luck <tony.luck@intel.com> wrote: > > > > Useful when running error injection tests that want to > > see all of the MCi_(STATUS|ADDR|MISC) data via /dev/mcelog. > > > > Signed-off-by: Tony Luck <tony.luck@intel.com> > > We saw the same problem, CONFIG_RAS hijacks all the > correctable memory errors, which leaves mcelog "broken" > silently. I know it is arguable, but until we can switch from > mcelog to rasdaemon, mcelog should still work as before. It is "arguable" because this is not how the CEC is supposed to be used. If you want to collect errors with mcelog, you don't use the CEC at all. And there's ras=cec_disable for that or you simply don't enable it in your .config. As Tony says in the commit message, the enable should be used only for injection tests. Which is where that thing should only be used for - debugging the CEC itself. Which reminds me, Tony, I think all those debugging files "pfn" and "array" and the one you add now, should all be under a CONFIG_RAS_CEC_DEBUG which is default off and used only for development. Mind adding that too pls? Thx. -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] RAS/CEC: Add debugfs switch to disable at run time 2019-04-18 23:29 ` Borislav Petkov @ 2019-04-18 23:58 ` Cong Wang 2019-04-19 0:26 ` Borislav Petkov 2019-04-19 0:07 ` Luck, Tony 1 sibling, 1 reply; 25+ messages in thread From: Cong Wang @ 2019-04-18 23:58 UTC (permalink / raw) To: Borislav Petkov; +Cc: Tony Luck, LKML On Thu, Apr 18, 2019 at 4:29 PM Borislav Petkov <bp@alien8.de> wrote: > > On Thu, Apr 18, 2019 at 03:51:07PM -0700, Cong Wang wrote: > > On Thu, Apr 18, 2019 at 3:02 PM Tony Luck <tony.luck@intel.com> wrote: > > > > > > Useful when running error injection tests that want to > > > see all of the MCi_(STATUS|ADDR|MISC) data via /dev/mcelog. > > > > > > Signed-off-by: Tony Luck <tony.luck@intel.com> > > > > We saw the same problem, CONFIG_RAS hijacks all the > > correctable memory errors, which leaves mcelog "broken" > > silently. I know it is arguable, but until we can switch from > > mcelog to rasdaemon, mcelog should still work as before. > > It is "arguable" because this is not how the CEC is supposed to be used. No, it is all about whether we should break users' expectation. > > If you want to collect errors with mcelog, you don't use the CEC at all. > And there's ras=cec_disable for that or you simply don't enable it in > your .config. > > As Tony says in the commit message, the enable should be used only for > injection tests. Which is where that thing should only be used for - > debugging the CEC itself. This doesn't sounds like a valid reason for us to break users' expectation. Prior to CONFIG_RAS, mcelog just works fine for users (at least Intel users). Suddenly after enabling CONFIG_RAS in kernel, mcelog will no longer receive any correctable memory errors _silently_. What's more, we don't even have rasdaemon running in our system, so there is no consumer of RAS CEC, these errors just simply disappear from users' expected place. I know CONFIG_RAS is new feature supposed to replace MCELOG, but they can co-exist in kernel config, which means mcelog should continue to work as before until it gets fully replaced. Even the following PoC change could make this situation better, because with this change when we enable CONFIG_RAS,mcelog will break _loudly_ rather than just silently, users will notice mcelog is no longer supported and will look for its alternative choice. diff --git a/drivers/ras/Kconfig b/drivers/ras/Kconfig index b834ff555188..f2e2b75fffbe 100644 --- a/drivers/ras/Kconfig +++ b/drivers/ras/Kconfig @@ -1,5 +1,6 @@ menuconfig RAS bool "Reliability, Availability and Serviceability (RAS) features" + depends on !X86_MCELOG_LEGACY help Reliability, availability and serviceability (RAS) is a computer hardware engineering term. Computers designed with higher levels Just my 2 cents. Thanks. ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH] RAS/CEC: Add debugfs switch to disable at run time 2019-04-18 23:58 ` Cong Wang @ 2019-04-19 0:26 ` Borislav Petkov 2019-04-20 5:43 ` Cong Wang 0 siblings, 1 reply; 25+ messages in thread From: Borislav Petkov @ 2019-04-19 0:26 UTC (permalink / raw) To: Cong Wang; +Cc: Tony Luck, LKML On Thu, Apr 18, 2019 at 04:58:22PM -0700, Cong Wang wrote: > No, it is all about whether we should break users' expectation. What user expectation? > This doesn't sounds like a valid reason for us to break users' > expectation. I think it is *you* who has some sort of "expectation" but that "expectation" is wrong. > Prior to CONFIG_RAS, mcelog just works fine for users (at least Intel > users). Suddenly after enabling CONFIG_RAS in kernel, mcelog will > no longer receive any correctable memory errors _silently_. That is, of course, wrong too. > What's more, we don't even have rasdaemon running in our system, so Are you saying "we" to mean "we the users" or some company "we"? And that is wrong too, there's at least one rasdaemon: http://git.infradead.org/users/mchehab/rasdaemon.git > there is no consumer of RAS CEC, RAS CEC doesn't need a consumer. You're misunderstanding the whole concept of the error collector. > these errors just simply disappear from users' expected place. They "disappear" because you have CONFIG_RAS_CEC enabled. But they don't really disappear - they're collected by the thing to filter out only the pages which keep generating errors constantly and those get soft-offlined. The sporadic ones simply get ignored because they don't happen again and are only result of alpha particles or overheating conditions or whatever. Now here's the CEC help text: config RAS_CEC bool "Correctable Errors Collector" depends on X86_MCE && MEMORY_FAILURE && DEBUG_FS ---help--- This is a small cache which collects correctable memory errors per 4K page PFN and counts their repeated occurrence. Once the counter for a PFN overflows, we try to soft-offline that page as we take it to mean that it has reached a relatively high error count and would probably be best if we don't use it anymore. Bear in mind that this is absolutely useless if your platform doesn't have ECC DIMMs and doesn't have DRAM ECC checking enabled in the BIOS. you can tell me what in that text is not clear so that I can make it more clear and obvious what that thing is. > I know CONFIG_RAS is new feature supposed to replace MCELOG, No, it isn't. CONFIG_RAS is supposed to collect all the RAS-related functionality in the kernel and it looks like you have some misconceptions about it. > but they can co-exist in kernel config, which means mcelog should > continue to work as before until it gets fully replaced. For that you need to enable X86_MCELOG_LEGACY. And let me repeat it again - if you want to collect errors in userspace, do not enable RAS_CEC at all. > Even the following PoC change could make this situation better, > because with this change when we enable CONFIG_RAS,mcelog > will break _loudly_ rather than just silently, users will notice mcelog > is no longer supported and will look for its alternative choice. You have somehow put in your head that CONFIG_RAS is the counterpart of CONFIG_X86_MCELOG_LEGACY. Which is *simply* *not* *true*. And the moment you realize that, then you'll be a step further in the right direction. So enable X86_MCELOG_LEGACY and you can collect all the errors you wish. And there's a rasdaemon which you can use too, as I pointed above, if you don't want mcelog. CEC is something *completely* different and its purpose is to run in the kernel and prevent users and admins from upsetting unnecessarily with every sporadic correctable error and just because an alpha particle flew through their DIMMs, they all start running in headless chicken mode, trying to RMA perfectly good hardware. Now, if any of that above still doesn't make it clear, please state what you're trying to achieve and I'll try to help. -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] RAS/CEC: Add debugfs switch to disable at run time 2019-04-19 0:26 ` Borislav Petkov @ 2019-04-20 5:43 ` Cong Wang 2019-04-20 9:13 ` Borislav Petkov 0 siblings, 1 reply; 25+ messages in thread From: Cong Wang @ 2019-04-20 5:43 UTC (permalink / raw) To: Borislav Petkov; +Cc: Tony Luck, LKML On Thu, Apr 18, 2019 at 5:26 PM Borislav Petkov <bp@alien8.de> wrote: > > Now, if any of that above still doesn't make it clear, please state what > you're trying to achieve and I'll try to help. Sorry that I misled you to believe we don't even enable CONFIG_X86_MCELOG_LEGACY. Here is what we have and what we have tried: 1. We have CONFIG_X86_MCELOG_LEGACY=y 2. We also have CONFIG_RAS=y and CONFIG_RAS_CEC=y 3. mcelog started as a daemon successfully, like before 4. Some real correctable memory errors happened, as logged in dmesg 5. mcelog couldn't receive any of them, reported 0 errors 6. Admin's complained to us as they believe this is a kernel bug 7. We dug into kernel source code and found out CONFIG_RAS hijacks all these errors, by stopping there in the notification chain: static int mce_first_notifier(struct notifier_block *nb, unsigned long val, void *data) { struct mce *m = (struct mce *)data; if (!m) return NOTIFY_DONE; if (cec_add_mce(m)) return NOTIFY_STOP; // <=== Returns and stops here /* Emit the trace record: */ trace_mce_record(m); set_bit(0, &mce_need_notify); mce_notify_irq(); // <=== There is where MCELOG receives return NOTIFY_DONE; } 8. I noticed rasdaemon, and tried to start it instead of mcelog. 9. I injected some memory error and could successfully read them via ras-mc-ctl. To demonstrate what I think we should have, here is the PoC code ONLY to show the idea (please don't judge it): @ -567,12 +567,12 @@ static int mce_first_notifier(struct notifier_block *nb, unsigned long val, void *data) { struct mce *m = (struct mce *)data; + bool consumed; if (!m) return NOTIFY_DONE; - if (cec_add_mce(m)) - return NOTIFY_STOP; + consumed = cec_add_mce(m); /* Emit the trace record: */ trace_mce_record(m); @@ -581,7 +581,7 @@ static int mce_first_notifier(struct notifier_block *nb, unsigned long val, mce_notify_irq(); - return NOTIFY_DONE; + return consumed ? NOTIFY_STOP : NOTIFY_DONE; } With this change, although not even compiled, mcelog should still receive correctable memory errors like before, even when we have CONFIG_RAS_CEC=y. Does this make any sense to you? Thanks! ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] RAS/CEC: Add debugfs switch to disable at run time 2019-04-20 5:43 ` Cong Wang @ 2019-04-20 9:13 ` Borislav Petkov 2019-04-20 18:18 ` Cong Wang 0 siblings, 1 reply; 25+ messages in thread From: Borislav Petkov @ 2019-04-20 9:13 UTC (permalink / raw) To: Cong Wang; +Cc: Tony Luck, LKML On Fri, Apr 19, 2019 at 10:43:03PM -0700, Cong Wang wrote: > With this change, although not even compiled, mcelog should still > receive correctable memory errors like before, even when we have > CONFIG_RAS_CEC=y. > > Does this make any sense to you? Yes, the answer is in the mail you snipped. Did you read it? Hint: disable CONFIG_RAS_CEC. -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] RAS/CEC: Add debugfs switch to disable at run time 2019-04-20 9:13 ` Borislav Petkov @ 2019-04-20 18:18 ` Cong Wang 2019-04-20 18:47 ` Borislav Petkov 0 siblings, 1 reply; 25+ messages in thread From: Cong Wang @ 2019-04-20 18:18 UTC (permalink / raw) To: Borislav Petkov; +Cc: Tony Luck, LKML On Sat, Apr 20, 2019 at 2:13 AM Borislav Petkov <bp@alien8.de> wrote: > > On Fri, Apr 19, 2019 at 10:43:03PM -0700, Cong Wang wrote: > > With this change, although not even compiled, mcelog should still > > receive correctable memory errors like before, even when we have > > CONFIG_RAS_CEC=y. > > > > Does this make any sense to you? > > Yes, the answer is in the mail you snipped. Did you read it? I read it, all of your response is based on your speculation that I don't have CONFIG_X86_MCELOG_LEGACY=y, which is clearly a misunderstanding. You didn't answer my question here, because I asked you whether the following change (PoC only) makes sense: @ -567,12 +567,12 @@ static int mce_first_notifier(struct notifier_block *nb, unsigned long val, void *data) { struct mce *m = (struct mce *)data; + bool consumed; if (!m) return NOTIFY_DONE; - if (cec_add_mce(m)) - return NOTIFY_STOP; + consumed = cec_add_mce(m); /* Emit the trace record: */ trace_mce_record(m); @@ -581,7 +581,7 @@ static int mce_first_notifier(struct notifier_block *nb, unsigned long val, mce_notify_irq(); - return NOTIFY_DONE; + return consumed ? NOTIFY_STOP : NOTIFY_DONE; } > > Hint: disable CONFIG_RAS_CEC. I knew disabling it could cure the problem from the beginning, please save your own time by not repeating things we both already knew. :) Once again, I still don't think it is the right answer, which is also why I keep finding different solutions. I know you disagree, but you never explain why you disagree, you speculated CONFIG_X86_MCELOG_LEGACY, which is completely a misunderstanding. I brought up CONFIG_X86_MCELOG_LEGACY simply to show you how we could break mcelog _LOUDLY_ if we really decide to break it, currently it just breaks silently. You misinterpret it as if I understand CONFIG_RAS as a replacement for CONFIG_X86_MCELOG_LEGACY, which is a very sad misunderstanding. Thanks. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] RAS/CEC: Add debugfs switch to disable at run time 2019-04-20 18:18 ` Cong Wang @ 2019-04-20 18:47 ` Borislav Petkov 2019-04-20 19:08 ` Cong Wang 2019-04-22 16:29 ` Luck, Tony 0 siblings, 2 replies; 25+ messages in thread From: Borislav Petkov @ 2019-04-20 18:47 UTC (permalink / raw) To: Cong Wang; +Cc: Tony Luck, LKML On Sat, Apr 20, 2019 at 11:18:46AM -0700, Cong Wang wrote: > You didn't answer my question here, because I asked you whether > the following change (PoC only) makes sense: I answered it - the answer is to disable CONFIG_RAS_CEC. But let me do a more detailed answer, maybe that'll help. The PoC doesn't make sense. Why? Because if you don't return early from the notifier when the CEC has consumed the error, you don't need the CEC at all. Ergo, you can just as well disable it. Because, let me paste from a couple of mails ago what the CEC is: "CEC is something *completely* different and its purpose is to run in the kernel and prevent users and admins from upsetting unnecessarily with every sporadic correctable error and just because an alpha particle flew through their DIMMs, they all start running in headless chicken mode, trying to RMA perfectly good hardware." IOW, when you have the CEC enabled, you don't need to log memory errors with a userspace agent. The CEC collects them and discards them if they don't repeat. If they do repeat, then it offlines the page. Without user intervention and interference. Now, if you still want to know how many errors and where they happened and when they happened and yadda yadda, you *disable* the CEC. I hope this makes more sense now. > I knew disabling it could cure the problem from the beginning, please > save your own time by not repeating things we both already knew. :) > > Once again, I still don't think it is the right answer, which is also why I > keep finding different solutions. This is where you come in and say "it is not the right answer because..." and give your arguments why. I gave mine a couple of times already. I never said this functionality is cast in stone the way it is but there has to be a *good* *reason* why it needs to be changed. I.e., basic kernel deveopment. People come with ideas and they *justify* those ideas with arguments why they're better. > I know you disagree, but you never explain why you disagree, You're kidding, right? https://lkml.kernel.org/r/20190419002645.GA559@zn.tnic -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] RAS/CEC: Add debugfs switch to disable at run time 2019-04-20 18:47 ` Borislav Petkov @ 2019-04-20 19:08 ` Cong Wang 2019-04-22 16:29 ` Luck, Tony 1 sibling, 0 replies; 25+ messages in thread From: Cong Wang @ 2019-04-20 19:08 UTC (permalink / raw) To: Borislav Petkov; +Cc: Tony Luck, LKML On Sat, Apr 20, 2019 at 11:47 AM Borislav Petkov <bp@alien8.de> wrote: > IOW, when you have the CEC enabled, you don't need to log memory errors > with a userspace agent. The CEC collects them and discards them if they > don't repeat. So, you mean breaking mcelog is intentionally, if so, why not break it loudly? That is, for example, preventing mcelog from starting by disabling CONFIG_X86_MCELOG_LEGACY in Kconfig _automatically_ when CONFIG_RAS is enabled? (Like what I showed in my PoC change.) Or, for another example, print a kernel warning and let users know this behavior is intentional? > > If they do repeat, then it offlines the page. > > Without user intervention and interference. > > Now, if you still want to know how many errors and where they happened > and when they happened and yadda yadda, you *disable* the CEC. Well, I believe rasdaemon has the counters too, it is not hard to count the trace events at all. I don't worry about this at all. What I worry is how we treat mcelog when having CONFIG_RAS=y. > > I hope this makes more sense now. Yes, thanks for the information. It is kinda what I expected, as I keep saying, I believe we can improve this situation to avoid users' confusion, rather than just saying CONFIG_RAS=n is the answer. Thanks. ^ permalink raw reply [flat|nested] 25+ messages in thread
* RE: [PATCH] RAS/CEC: Add debugfs switch to disable at run time 2019-04-20 18:47 ` Borislav Petkov 2019-04-20 19:08 ` Cong Wang @ 2019-04-22 16:29 ` Luck, Tony 2019-04-22 16:31 ` Borislav Petkov 1 sibling, 1 reply; 25+ messages in thread From: Luck, Tony @ 2019-04-22 16:29 UTC (permalink / raw) To: Borislav Petkov, Cong Wang; +Cc: LKML > Now, if you still want to know how many errors and where they happened > and when they happened and yadda yadda, you *disable* the CEC. Rebooting isn't popular in many end user situations. Many CSP (cloud service providers) vehemently hate the idea of rebooting. -Tony ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] RAS/CEC: Add debugfs switch to disable at run time 2019-04-22 16:29 ` Luck, Tony @ 2019-04-22 16:31 ` Borislav Petkov 2019-04-22 16:43 ` Luck, Tony 0 siblings, 1 reply; 25+ messages in thread From: Borislav Petkov @ 2019-04-22 16:31 UTC (permalink / raw) To: Luck, Tony; +Cc: Cong Wang, LKML On Mon, Apr 22, 2019 at 04:29:35PM +0000, Luck, Tony wrote: > > Now, if you still want to know how many errors and where they happened > > and when they happened and yadda yadda, you *disable* the CEC. > > Rebooting isn't popular in many end user situations. Many CSP (cloud > service providers) vehemently hate the idea of rebooting. I meant disable in Kconfig - not build it in at all. -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply. ^ permalink raw reply [flat|nested] 25+ messages in thread
* RE: [PATCH] RAS/CEC: Add debugfs switch to disable at run time 2019-04-22 16:31 ` Borislav Petkov @ 2019-04-22 16:43 ` Luck, Tony 2019-04-22 17:05 ` Borislav Petkov 0 siblings, 1 reply; 25+ messages in thread From: Luck, Tony @ 2019-04-22 16:43 UTC (permalink / raw) To: Borislav Petkov; +Cc: Cong Wang, LKML >> Rebooting isn't popular in many end user situations. Many CSP (cloud >> service providers) vehemently hate the idea of rebooting. > > I meant disable in Kconfig - not build it in at all. If rebooting is bad, then re-compiling and rebooting is 100x worse. :-) -Tony ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] RAS/CEC: Add debugfs switch to disable at run time 2019-04-22 16:43 ` Luck, Tony @ 2019-04-22 17:05 ` Borislav Petkov 2019-04-22 17:23 ` Luck, Tony 0 siblings, 1 reply; 25+ messages in thread From: Borislav Petkov @ 2019-04-22 17:05 UTC (permalink / raw) To: Luck, Tony; +Cc: Cong Wang, LKML On Mon, Apr 22, 2019 at 04:43:58PM +0000, Luck, Tony wrote: > >> Rebooting isn't popular in many end user situations. Many CSP (cloud > >> service providers) vehemently hate the idea of rebooting. > > > > I meant disable in Kconfig - not build it in at all. > > If rebooting is bad, then re-compiling and rebooting is 100x worse. :-) I think we're talking past each other here: I mean disable the CEC *forever* and *never* use it. Use only a userspace agent and log errors with it. Makes sense? -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply. ^ permalink raw reply [flat|nested] 25+ messages in thread
* RE: [PATCH] RAS/CEC: Add debugfs switch to disable at run time 2019-04-22 17:05 ` Borislav Petkov @ 2019-04-22 17:23 ` Luck, Tony 0 siblings, 0 replies; 25+ messages in thread From: Luck, Tony @ 2019-04-22 17:23 UTC (permalink / raw) To: Borislav Petkov; +Cc: Cong Wang, LKML > I think we're talking past each other here: I mean disable the CEC > *forever* and *never* use it. Use only a userspace agent and log errors > with it. > > Makes sense? Not really. We want pretty much everyone to enable and use CEC. That way people don't bother use about the occasional neutron strike flipping a bit. So we need to make sure there are no excuses for disabling it. People may come up with other cases where they say they would need to disable (perhaps genuine, perhaps bogus) ... if our only answers are reboot, or recompile and reboot ... then they'll disable forever. If they do that, why do we even have it in the kernel? -Tony ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] RAS/CEC: Add debugfs switch to disable at run time 2019-04-18 23:29 ` Borislav Petkov 2019-04-18 23:58 ` Cong Wang @ 2019-04-19 0:07 ` Luck, Tony 2019-04-19 0:29 ` Borislav Petkov 2019-04-20 5:50 ` Cong Wang 1 sibling, 2 replies; 25+ messages in thread From: Luck, Tony @ 2019-04-19 0:07 UTC (permalink / raw) To: Borislav Petkov; +Cc: Cong Wang, LKML On Fri, Apr 19, 2019 at 01:29:10AM +0200, Borislav Petkov wrote: > Which reminds me, Tony, I think all those debugging files "pfn" > and "array" and the one you add now, should all be under a > CONFIG_RAS_CEC_DEBUG which is default off and used only for development. > Mind adding that too pls? Patch below, on top of previous patch. Note that I didn't move "enable" into the RAS_CEC_DEBUG code. I think it has some value even on production systems. It is still in debugfs (which many production systems don't mount) so I don't see that people are going to be randomly using it to disable the CEC. -Tony From ac9d8c9bf7b38e18dcffdd41f8fcf0f07c632cd3 Mon Sep 17 00:00:00 2001 From: Tony Luck <tony.luck@intel.com> Date: Thu, 18 Apr 2019 16:46:55 -0700 Subject: [PATCH] RAS/CEC: Move CEC debug features under a CONFIG_RAS_CEC_DEBUG option The pfn and array files in /sys/kernel/debug/ras/cec are intended for debugging the CEC code itself. They are not needed on production systems, so the default setting for this CONFIG option is "n". Signed-off-by: Tony Luck <tony.luck@intel.com> --- arch/x86/ras/Kconfig | 10 ++++++++++ drivers/ras/cec.c | 13 ++++++++++++- 2 files changed, 22 insertions(+), 1 deletion(-) diff --git a/arch/x86/ras/Kconfig b/arch/x86/ras/Kconfig index a9c3db125222..7fde8d55e394 100644 --- a/arch/x86/ras/Kconfig +++ b/arch/x86/ras/Kconfig @@ -11,3 +11,13 @@ config RAS_CEC Bear in mind that this is absolutely useless if your platform doesn't have ECC DIMMs and doesn't have DRAM ECC checking enabled in the BIOS. + +config RAS_CEC_DEBUG + bool "Debugging" + default n + depends on RAS_CEC + ---help--- + Add extra files to /sys/kernel/debug/ras/cec to test the correctable + memory error feature. "pfn" is a writable file that allows user to + simulate an error in a particular page frame. "array" is a read-only + file that dumps out the current state of all pages logged so far. diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c index a2ceedcd8516..ff880a5c289a 100644 --- a/drivers/ras/cec.c +++ b/drivers/ras/cec.c @@ -118,7 +118,9 @@ static struct ce_array { } ce_arr; static DEFINE_MUTEX(ce_mutex); +#ifdef CONFIG_RAS_CEC_DEBUG static u64 dfs_pfn; +#endif /* Amount of errors after which we offline */ static unsigned int count_threshold = COUNT_MASK; @@ -364,6 +366,7 @@ static int u64_get(void *data, u64 *val) return 0; } +#ifdef CONFIG_RAS_CEC_DEBUG static int pfn_set(void *data, u64 val) { *(u64 *)data = val; @@ -372,6 +375,7 @@ static int pfn_set(void *data, u64 val) } DEFINE_DEBUGFS_ATTRIBUTE(pfn_ops, u64_get, pfn_set, "0x%llx\n"); +#endif static int decay_interval_set(void *data, u64 val) { @@ -411,6 +415,7 @@ static int enable_set(void *data, u64 val) DEFINE_DEBUGFS_ATTRIBUTE(enable_ops, u64_get, enable_set, "%lld\n"); +#ifdef CONFIG_RAS_CEC_DEBUG static int array_dump(struct seq_file *m, void *v) { struct ce_array *ca = &ce_arr; @@ -459,10 +464,14 @@ static const struct file_operations array_ops = { .llseek = seq_lseek, .release = single_release, }; +#endif static int __init create_debugfs_nodes(void) { - struct dentry *d, *pfn, *decay, *count, *array, *enable; + struct dentry *d, *decay, *count, *enable; +#ifdef CONFIG_RAS_CEC_DEBUG + struct dentry *pfn, *array; +#endif d = debugfs_create_dir("cec", ras_debugfs_dir); if (!d) { @@ -470,6 +479,7 @@ static int __init create_debugfs_nodes(void) return -1; } +#ifdef CONFIG_RAS_CEC_DEBUG pfn = debugfs_create_file("pfn", S_IRUSR | S_IWUSR, d, &dfs_pfn, &pfn_ops); if (!pfn) { pr_warn("Error creating pfn debugfs node!\n"); @@ -481,6 +491,7 @@ static int __init create_debugfs_nodes(void) pr_warn("Error creating array debugfs node!\n"); goto err; } +#endif decay = debugfs_create_file("decay_interval", S_IRUSR | S_IWUSR, d, &timer_interval, &decay_interval_ops); -- 2.19.1 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH] RAS/CEC: Add debugfs switch to disable at run time 2019-04-19 0:07 ` Luck, Tony @ 2019-04-19 0:29 ` Borislav Petkov 2019-04-19 15:04 ` Luck, Tony 2019-04-20 5:50 ` Cong Wang 1 sibling, 1 reply; 25+ messages in thread From: Borislav Petkov @ 2019-04-19 0:29 UTC (permalink / raw) To: Luck, Tony; +Cc: Cong Wang, LKML On Thu, Apr 18, 2019 at 05:07:45PM -0700, Luck, Tony wrote: > On Fri, Apr 19, 2019 at 01:29:10AM +0200, Borislav Petkov wrote: > > Which reminds me, Tony, I think all those debugging files "pfn" > > and "array" and the one you add now, should all be under a > > CONFIG_RAS_CEC_DEBUG which is default off and used only for development. > > Mind adding that too pls? > > Patch below, on top of previous patch. Note that I didn't move "enable" > into the RAS_CEC_DEBUG code. I think it has some value even on > production systems. And that value is? Usecase? -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] RAS/CEC: Add debugfs switch to disable at run time 2019-04-19 0:29 ` Borislav Petkov @ 2019-04-19 15:04 ` Luck, Tony 2019-04-20 9:41 ` Borislav Petkov 0 siblings, 1 reply; 25+ messages in thread From: Luck, Tony @ 2019-04-19 15:04 UTC (permalink / raw) To: Borislav Petkov; +Cc: Cong Wang, LKML On Fri, Apr 19, 2019 at 02:29:11AM +0200, Borislav Petkov wrote: > On Thu, Apr 18, 2019 at 05:07:45PM -0700, Luck, Tony wrote: > > On Fri, Apr 19, 2019 at 01:29:10AM +0200, Borislav Petkov wrote: > > > Which reminds me, Tony, I think all those debugging files "pfn" > > > and "array" and the one you add now, should all be under a > > > CONFIG_RAS_CEC_DEBUG which is default off and used only for development. > > > Mind adding that too pls? > > > > Patch below, on top of previous patch. Note that I didn't move "enable" > > into the RAS_CEC_DEBUG code. I think it has some value even on > > production systems. > > And that value is? Usecase? Suppose that an entire device on a DIMM fails. Systems with the right type of DIMM (X4) and a memory controller that implements https://en.wikipedia.org/wiki/Chipkill (Intel calls this "SDDC") can continue running ... but there will be a lot of corrected errors from a vast range of different pages. After fifteen or so errors Linux will trigger storm mode and the user will see: mce: CMCI storm detected: switching to poll mode on the console. As we poll we'll find errors and hand them to CEC. But because the errors come from far more than 512 distinct pages CEC will never manage to get a count above 1 before it drops the entry to make space for a new log. So the only indication that the user sees that something is wrong is that storm warning (and the lack of a following "storm subsided" message) tells them that errors are still happening. This amounts to a serviceability failure ... lack of useful diagnostics about a problem. Now there isn't really anything better that CEC can do in this situation. It won't help to have a bigger array. Taking pages offline wouldn't solve the problem (though if that did happen at least it would break the silence). Same situation for other DRAM failure modes that affect a wide range of pages (rank, bank, perhaps row ... though all the errors from a single row failure might fit in the CEC array). Allowing the user to bypass CEC (without a reboot ... cloud folks hate to reboot their systems) would allow the sysadmin to see what is happening (either via /dev/mcelog, or via EDAC driver). -Tony ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] RAS/CEC: Add debugfs switch to disable at run time 2019-04-19 15:04 ` Luck, Tony @ 2019-04-20 9:41 ` Borislav Petkov 2019-04-22 15:59 ` Luck, Tony 0 siblings, 1 reply; 25+ messages in thread From: Borislav Petkov @ 2019-04-20 9:41 UTC (permalink / raw) To: Luck, Tony; +Cc: Cong Wang, LKML On Fri, Apr 19, 2019 at 08:04:01AM -0700, Luck, Tony wrote: > Now there isn't really anything better that CEC can do in > this situation. It won't help to have a bigger array. Taking > pages offline wouldn't solve the problem (though if that > did happen at least it would break the silence). > > Same situation for other DRAM failure modes that affect a > wide range of pages (rank, bank, perhaps row ... though all > the errors from a single row failure might fit in the CEC array). > > Allowing the user to bypass CEC (without a reboot ... cloud folks > hate to reboot their systems) would allow the sysadmin to see > what is happening (either via /dev/mcelog, or via EDAC driver). Err, this all sounds to me like the storm detection code should *automatically* disable the CEC in such cases, I'd say. Because I don't see a cloud admin going into the debugfs and turning it off. Rather, if the detection heuristic we use is smart enough, disabling it automatically should be a lot better serviceability action. Hmmm? -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply. ^ permalink raw reply [flat|nested] 25+ messages in thread
* RE: [PATCH] RAS/CEC: Add debugfs switch to disable at run time 2019-04-20 9:41 ` Borislav Petkov @ 2019-04-22 15:59 ` Luck, Tony 2019-04-22 17:15 ` Borislav Petkov 0 siblings, 1 reply; 25+ messages in thread From: Luck, Tony @ 2019-04-22 15:59 UTC (permalink / raw) To: Borislav Petkov; +Cc: Cong Wang, LKML > Err, this all sounds to me like the storm detection code should > *automatically* disable the CEC in such cases, I'd say. Sounds good. But we should distinguish storms that have many different addresses from storms that just ping a few addresses. CEC will see counts hit the threshold in the latter case, but it might not be able to take the pages offline (because they are locked, or in-use by kernel). So I think the change might be to the return value from NOTIFY_STOP to NOTIFY_DONE ... but only if we are in the middle of a storm AND the CEC array is full. -Tony ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] RAS/CEC: Add debugfs switch to disable at run time 2019-04-22 15:59 ` Luck, Tony @ 2019-04-22 17:15 ` Borislav Petkov 2019-04-22 17:44 ` Luck, Tony 0 siblings, 1 reply; 25+ messages in thread From: Borislav Petkov @ 2019-04-22 17:15 UTC (permalink / raw) To: Luck, Tony; +Cc: Cong Wang, LKML On Mon, Apr 22, 2019 at 03:59:16PM +0000, Luck, Tony wrote: > > Err, this all sounds to me like the storm detection code should > > *automatically* disable the CEC in such cases, I'd say. > > Sounds good. But we should distinguish storms that have many different > addresses from storms that just ping a few addresses. CEC will see counts > hit the threshold in the latter case, but it might not be able to take the pages > offline (because they are locked, or in-use by kernel). > > So I think the change might be to the return value from NOTIFY_STOP to NOTIFY_DONE > ... but only if we are in the middle of a storm AND the CEC array is full. Well, regardless of this specific use case, isn't that a generic enough action that we should do always? I mean, the aspect of falling back to logging to external agent. However, currently we don't signal that the CEC is full - we simply remove the LRU element in cec_add_elem() before we insert the new one. We can either return a specific retval to say, CEC is full and we had to delete an elem or we can add a cec_is_full() accessor... -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] RAS/CEC: Add debugfs switch to disable at run time 2019-04-22 17:15 ` Borislav Petkov @ 2019-04-22 17:44 ` Luck, Tony 2019-04-22 18:08 ` Borislav Petkov 0 siblings, 1 reply; 25+ messages in thread From: Luck, Tony @ 2019-04-22 17:44 UTC (permalink / raw) To: Borislav Petkov; +Cc: Cong Wang, LKML On Mon, Apr 22, 2019 at 07:15:32PM +0200, Borislav Petkov wrote: > On Mon, Apr 22, 2019 at 03:59:16PM +0000, Luck, Tony wrote: > > > Err, this all sounds to me like the storm detection code should > > > *automatically* disable the CEC in such cases, I'd say. > > > > Sounds good. But we should distinguish storms that have many different > > addresses from storms that just ping a few addresses. CEC will see counts > > hit the threshold in the latter case, but it might not be able to take the pages > > offline (because they are locked, or in-use by kernel). > > > > So I think the change might be to the return value from NOTIFY_STOP to NOTIFY_DONE > > ... but only if we are in the middle of a storm AND the CEC array is full. > > Well, regardless of this specific use case, isn't that a generic enough > action that we should do always? I mean, the aspect of falling back to > logging to external agent. Yes. Automating this would be a very good idea. > However, currently we don't signal that the CEC is full - we simply > remove the LRU element in cec_add_elem() before we insert the new one. > > We can either return a specific retval to say, CEC is full and we had to > delete an elem or we can add a cec_is_full() accessor... A lot depends on why the CEC is full, and which entry is being deleted to make room. In the case of many errors at different addresses we are deleting the entry with the lowest count. But all of the entries have low counts because we are just thrashing the array with many different addresses. In this situation a warning would be helpful. But in the case where the system has been up for months and we very slowly accumlated logs of bit flips. The periodic spring cleaning means they all have generation "00", but we never actually drop an old entry because of age. In this case dropping one entry to make space for a new one is fine and doesn't need any action. Perhaps we can distinguish the cases by the generation? If we are dropping an entry that was recently added, then it will still have generation "11" (or at least not "00"). Use that to trigger an action? -Tony ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] RAS/CEC: Add debugfs switch to disable at run time 2019-04-22 17:44 ` Luck, Tony @ 2019-04-22 18:08 ` Borislav Petkov 0 siblings, 0 replies; 25+ messages in thread From: Borislav Petkov @ 2019-04-22 18:08 UTC (permalink / raw) To: Luck, Tony; +Cc: Cong Wang, LKML On Mon, Apr 22, 2019 at 10:44:15AM -0700, Luck, Tony wrote: > Yes. Automating this would be a very good idea. Yeah, in general integrating the CEC better with the rest of the error chain is something we still need to discuss and do. > In the case of many errors at different addresses we are deleting > the entry with the lowest count. But all of the entries have low > counts because we are just thrashing the array with many different > addresses. In this situation a warning would be helpful. Can we detect that situation reliably even? You can have many errors at different addresses which have accumulated over time, due to a slow but constant stream of errors. Dunno if that is possible though... someone needs to analyze error occurrence patterns :-\ > But in the case where the system has been up for months and > we very slowly accumlated logs of bit flips. The periodic > spring cleaning means they all have generation "00", but > we never actually drop an old entry because of age. Yes, we drop only on insertion and when the array is full or when we soft-offline. > In this case dropping one entry to make space for a new one is fine > and doesn't need any action. > > Perhaps we can distinguish the cases by the generation? If > we are dropping an entry that was recently added, then it > will still have generation "11" (or at least not "00"). > Use that to trigger an action? That and the fact that we're in an error storm is probably a good enough heurstic. And then when the storm subsides, we reenable it? We basically say, error storm is over, the error rate should go back to normal so we can stick the CEC in front of it again. Hmmm. -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] RAS/CEC: Add debugfs switch to disable at run time 2019-04-19 0:07 ` Luck, Tony 2019-04-19 0:29 ` Borislav Petkov @ 2019-04-20 5:50 ` Cong Wang 1 sibling, 0 replies; 25+ messages in thread From: Cong Wang @ 2019-04-20 5:50 UTC (permalink / raw) To: Luck, Tony; +Cc: Borislav Petkov, LKML On Thu, Apr 18, 2019 at 5:07 PM Luck, Tony <tony.luck@intel.com> wrote: > > On Fri, Apr 19, 2019 at 01:29:10AM +0200, Borislav Petkov wrote: > > Which reminds me, Tony, I think all those debugging files "pfn" > > and "array" and the one you add now, should all be under a > > CONFIG_RAS_CEC_DEBUG which is default off and used only for development. > > Mind adding that too pls? > > Patch below, on top of previous patch. Note that I didn't move "enable" > into the RAS_CEC_DEBUG code. I think it has some value even on > production systems. It is still in debugfs (which many production > systems don't mount) so I don't see that people are going to be randomly > using it to disable the CEC. For me, "enable" is useful to make mcelog work like before. Please see the other email from me for all the details. For debugfs, I believe many productions mount it, as tracing still uses debugfs rather than tracefs (at least on CentOS7), and rasdaemon also uses trace events to collect different errors. Therefore, I believe your patch is fine as it is. Thanks. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] RAS/CEC: Add debugfs switch to disable at run time 2019-04-18 22:02 [PATCH] RAS/CEC: Add debugfs switch to disable at run time Tony Luck 2019-04-18 22:51 ` Cong Wang @ 2019-04-20 19:50 ` Borislav Petkov 1 sibling, 0 replies; 25+ messages in thread From: Borislav Petkov @ 2019-04-20 19:50 UTC (permalink / raw) To: Tony Luck; +Cc: linux-kernel On Thu, Apr 18, 2019 at 03:02:29PM -0700, Tony Luck wrote: > Useful when running error injection tests that want to > see all of the MCi_(STATUS|ADDR|MISC) data via /dev/mcelog. > > Signed-off-by: Tony Luck <tony.luck@intel.com> > --- > drivers/ras/cec.c | 20 +++++++++++++++++++- > 1 file changed, 19 insertions(+), 1 deletion(-) > > diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c > index 2d9ec378a8bc..a2ceedcd8516 100644 > --- a/drivers/ras/cec.c > +++ b/drivers/ras/cec.c > @@ -123,6 +123,9 @@ static u64 dfs_pfn; > /* Amount of errors after which we offline */ > static unsigned int count_threshold = COUNT_MASK; > > +/* debugfs switch to enable/disable CEC */ > +static u64 cec_enabled = 1; > + > /* > * The timer "decays" element count each timer_interval which is 24hrs by > * default. > @@ -400,6 +403,14 @@ static int count_threshold_set(void *data, u64 val) > } > DEFINE_DEBUGFS_ATTRIBUTE(count_threshold_ops, u64_get, count_threshold_set, "%lld\n"); > > +static int enable_set(void *data, u64 val) > +{ > + ce_arr.disabled = !val; Btw, regardless of what we end up doing wrt hiding this switch under RAS_CEC_DEBUG or exposing it, that ce_arr.disabled flag is not needed anymore and you can remove it along with the flags union in ce_array with the next version of this. Because one of the cec_enabled or ce_arr.disabled is now redundant and I think you want to have cec_enabled. :) Thx. -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply. ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2019-04-22 18:08 UTC | newest] Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-04-18 22:02 [PATCH] RAS/CEC: Add debugfs switch to disable at run time Tony Luck 2019-04-18 22:51 ` Cong Wang 2019-04-18 23:29 ` Borislav Petkov 2019-04-18 23:58 ` Cong Wang 2019-04-19 0:26 ` Borislav Petkov 2019-04-20 5:43 ` Cong Wang 2019-04-20 9:13 ` Borislav Petkov 2019-04-20 18:18 ` Cong Wang 2019-04-20 18:47 ` Borislav Petkov 2019-04-20 19:08 ` Cong Wang 2019-04-22 16:29 ` Luck, Tony 2019-04-22 16:31 ` Borislav Petkov 2019-04-22 16:43 ` Luck, Tony 2019-04-22 17:05 ` Borislav Petkov 2019-04-22 17:23 ` Luck, Tony 2019-04-19 0:07 ` Luck, Tony 2019-04-19 0:29 ` Borislav Petkov 2019-04-19 15:04 ` Luck, Tony 2019-04-20 9:41 ` Borislav Petkov 2019-04-22 15:59 ` Luck, Tony 2019-04-22 17:15 ` Borislav Petkov 2019-04-22 17:44 ` Luck, Tony 2019-04-22 18:08 ` Borislav Petkov 2019-04-20 5:50 ` Cong Wang 2019-04-20 19:50 ` Borislav Petkov
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.