* [PATCH 0/3] enable ghes_edac on selected platforms
From: Toshi Kani @ 2017-07-17 21:59 UTC
To: rjw, bp
Cc: mchehab, tglx, srinivas.pandruvada, lenb, linux-acpi, linux-edac, linux-kernel

The ghes_edac driver was introduced in 2013 [1], but it has not been
enabled by any distro yet.  This is because the driver obtains error
info from firmware interfaces, which are not properly implemented on
many platforms.

To get out of this situation, add a platform check to selectively
enable the driver on the platforms that are known to have a proper
firmware implementation.  Platform vendors can add their platforms to
the list when they support ghes_edac.

Patch 1 moves the platform check in acpi_blacklisted() to a common
utility func, acpi_match_oemlist().

Patch 2 converts the intel_pstate driver to use acpi_match_oemlist().

Patch 3 introduces a platform check to the ghes_edac driver.

---
Toshi Kani (3):
 1/3 ACPI / blacklist: add acpi_match_oemlist() interface
 2/3 intel_pstate: convert to use acpi_match_oemlist()
 3/3 ghes_edac: add platform check to enable ghes_edac

---
 drivers/acpi/blacklist.c       | 84 ++++++++----------------------------------
 drivers/acpi/utils.c           | 40 ++++++++++++++++++++
 drivers/cpufreq/intel_pstate.c | 64 +++++++++++++-------------------
 drivers/edac/ghes_edac.c       | 28 +++++++++++---
 include/linux/acpi.h           | 19 ++++++++++
 5 files changed, 122 insertions(+), 113 deletions(-)

* [PATCH 1/3] ACPI / blacklist: add acpi_match_oemlist() interface
From: Toshi Kani @ 2017-07-17 21:59 UTC
To: rjw, bp
Cc: mchehab, tglx, srinivas.pandruvada, lenb, linux-acpi, linux-edac, linux-kernel, Toshi Kani

ACPI OEM ID / OEM Table ID / Revision can be used to identify
platform type based on ACPI firmware.  acpi_blacklisted(),
intel_pstate_platform_pwr_mgmt_exists() and some other funcs
have been using this type of check to detect a list of platforms
that require special handling.

Move the platform type check in acpi_blacklisted() to a common
utility function, acpi_match_oemlist(), so that other drivers
do not have to implement their own.

There is no change in functionality.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 drivers/acpi/blacklist.c | 84 ++++++++--------------------------------------
 drivers/acpi/utils.c     | 40 ++++++++++++++++++++++
 include/linux/acpi.h     | 19 ++++++++++
 3 files changed, 74 insertions(+), 69 deletions(-)

diff --git a/drivers/acpi/blacklist.c b/drivers/acpi/blacklist.c
index bb542ac..288fe4d 100644
--- a/drivers/acpi/blacklist.c
+++ b/drivers/acpi/blacklist.c
@@ -30,30 +30,13 @@
 
 #include "internal.h"
 
-enum acpi_blacklist_predicates {
-	all_versions,
-	less_than_or_equal,
-	equal,
-	greater_than_or_equal,
-};
-
-struct acpi_blacklist_item {
-	char oem_id[7];
-	char oem_table_id[9];
-	u32 oem_revision;
-	char *table;
-	enum acpi_blacklist_predicates oem_revision_predicate;
-	char *reason;
-	u32 is_critical_error;
-};
-
 static struct dmi_system_id acpi_rev_dmi_table[] __initdata;
 
 /*
  * POLICY: If *anything* doesn't work, put it on the blacklist.
  *	   If they are critical errors, mark it critical, and abort driver load.
  */
-static struct acpi_blacklist_item acpi_blacklist[] __initdata = {
+static struct acpi_oemlist acpi_blacklist[] __initdata = {
 	/* Compaq Presario 1700 */
 	{"PTLTD ", " DSDT ", 0x06040000, ACPI_SIG_DSDT, less_than_or_equal,
 	 "Multiple problems", 1},
@@ -67,65 +50,28 @@ static struct acpi_blacklist_item acpi_blacklist[] __initdata = {
 	{"IBM ", "TP600E ", 0x00000105, ACPI_SIG_DSDT, less_than_or_equal,
 	 "Incorrect _ADR", 1},
 
-	{""}
+	{ }
 };
 
 int __init acpi_blacklisted(void)
 {
-	int i = 0;
+	int i;
 	int blacklisted = 0;
-	struct acpi_table_header table_header;
-
-	while (acpi_blacklist[i].oem_id[0] != '\0') {
-		if (acpi_get_table_header(acpi_blacklist[i].table, 0, &table_header)) {
-			i++;
-			continue;
-		}
-
-		if (strncmp(acpi_blacklist[i].oem_id, table_header.oem_id, 6)) {
-			i++;
-			continue;
-		}
-
-		if (strncmp
-		    (acpi_blacklist[i].oem_table_id, table_header.oem_table_id,
-		     8)) {
-			i++;
-			continue;
-		}
-
-		if ((acpi_blacklist[i].oem_revision_predicate == all_versions)
-		    || (acpi_blacklist[i].oem_revision_predicate ==
-			less_than_or_equal
-			&& table_header.oem_revision <=
-			acpi_blacklist[i].oem_revision)
-		    || (acpi_blacklist[i].oem_revision_predicate ==
-			greater_than_or_equal
-			&& table_header.oem_revision >=
-			acpi_blacklist[i].oem_revision)
-		    || (acpi_blacklist[i].oem_revision_predicate == equal
-			&& table_header.oem_revision ==
-			acpi_blacklist[i].oem_revision)) {
 
-			printk(KERN_ERR PREFIX
-			       "Vendor \"%6.6s\" System \"%8.8s\" "
-			       "Revision 0x%x has a known ACPI BIOS problem.\n",
-			       acpi_blacklist[i].oem_id,
-			       acpi_blacklist[i].oem_table_id,
-			       acpi_blacklist[i].oem_revision);
+	i = acpi_match_oemlist(acpi_blacklist);
+	if (i >= 0) {
+		pr_err(PREFIX "Vendor \"%6.6s\" System \"%8.8s\" "
+		       "Revision 0x%x has a known ACPI BIOS problem.\n",
+		       acpi_blacklist[i].oem_id,
+		       acpi_blacklist[i].oem_table_id,
+		       acpi_blacklist[i].oem_revision);
 
-			printk(KERN_ERR PREFIX
-			       "Reason: %s. This is a %s error\n",
-			       acpi_blacklist[i].reason,
-			       (acpi_blacklist[i].
-				is_critical_error ? "non-recoverable" :
-				"recoverable"));
+		pr_err(PREFIX "Reason: %s. This is a %s error\n",
+		       acpi_blacklist[i].reason,
+		       (acpi_blacklist[i].data ?
+			"non-recoverable" : "recoverable"));
 
-			blacklisted = acpi_blacklist[i].is_critical_error;
-			break;
-		} else {
-			i++;
-		}
+		blacklisted = acpi_blacklist[i].data;
 	}
 
 	(void)early_acpi_osi_init();
diff --git a/drivers/acpi/utils.c b/drivers/acpi/utils.c
index b9d956c..e5909d5 100644
--- a/drivers/acpi/utils.c
+++ b/drivers/acpi/utils.c
@@ -816,3 +816,43 @@ static int __init acpi_backlight(char *str)
 	return 1;
 }
 __setup("acpi_backlight=", acpi_backlight);
+
+/**
+ * acpi_match_oemlist - Check if the system matches with an oem list
+ * @oem: pointer to acpi_oemlist table terminated by a NULL entry
+ *
+ * Return the matched index if the system is found in the oem list.
+ * Otherwise, return a negative error code.
+ */
+int acpi_match_oemlist(const struct acpi_oemlist *oem)
+{
+	struct acpi_table_header hdr;
+	int idx = 0;
+
+	if (acpi_disabled)
+		return -ENODEV;
+
+	for (; oem->oem_id[0]; oem++, idx++) {
+		if (ACPI_FAILURE(acpi_get_table_header(oem->table, 0, &hdr)))
+			continue;
+
+		if (strncmp(oem->oem_id, hdr.oem_id, ACPI_OEM_ID_SIZE))
+			continue;
+
+		if (strncmp(oem->oem_table_id, hdr.oem_table_id,
+			    ACPI_OEM_TABLE_ID_SIZE))
+			continue;
+
+		if ((oem->oem_revision_predicate == all_versions) ||
+		    (oem->oem_revision_predicate == less_than_or_equal
+		     && hdr.oem_revision <= oem->oem_revision) ||
+		    (oem->oem_revision_predicate == greater_than_or_equal
+		     && hdr.oem_revision >= oem->oem_revision) ||
+		    (oem->oem_revision_predicate == equal
+		     && hdr.oem_revision == oem->oem_revision))
+			return idx;
+	}
+
+	return -ENODEV;
+}
+EXPORT_SYMBOL(acpi_match_oemlist);
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index c749eef..86479b5 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -556,6 +556,25 @@ extern acpi_status acpi_pci_osc_control_set(acpi_handle handle,
 #define ACPI_OST_SC_DRIVER_LOAD_FAILURE		0x81
 #define ACPI_OST_SC_INSERT_NOT_SUPPORTED	0x82
 
+enum acpi_oemlist_predicates {
+	all_versions,
+	less_than_or_equal,
+	equal,
+	greater_than_or_equal,
+};
+
+/* Table must be terminated by a NULL entry */
+struct acpi_oemlist {
+	char	oem_id[ACPI_OEM_ID_SIZE];
+	char	oem_table_id[ACPI_OEM_TABLE_ID_SIZE];
+	u32	oem_revision;
+	char	*table;
+	enum acpi_oemlist_predicates oem_revision_predicate;
+	char	*reason;
+	u32	data;
+};
+int acpi_match_oemlist(const struct acpi_oemlist *oem);
+
 extern void acpi_early_init(void);
 extern void acpi_subsystem_init(void);
 

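The matching rule in the patch above (OEM ID, then OEM Table ID, then a revision predicate, returning the index of the first hit) can be tried outside the kernel. What follows is a minimal userspace sketch under stated assumptions, not the kernel code: `struct table_header`, `struct oem_entry`, `revision_matches()` and `match_oemlist()` are hypothetical names, the table header is passed in by the caller instead of being fetched with acpi_get_table_header(), and the ID arrays carry one extra byte for a terminating NUL.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define OEM_ID_SIZE		6	/* same width as ACPI_OEM_ID_SIZE */
#define OEM_TABLE_ID_SIZE	8	/* same width as ACPI_OEM_TABLE_ID_SIZE */

enum oem_pred {
	all_versions,
	less_than_or_equal,
	equal,
	greater_than_or_equal,
};

/* Hypothetical stand-in for the header fields the match looks at. */
struct table_header {
	char oem_id[OEM_ID_SIZE + 1];
	char oem_table_id[OEM_TABLE_ID_SIZE + 1];
	uint32_t oem_revision;
};

/* Hypothetical list entry; the table ends with an entry whose oem_id is "". */
struct oem_entry {
	char oem_id[OEM_ID_SIZE + 1];
	char oem_table_id[OEM_TABLE_ID_SIZE + 1];
	uint32_t oem_revision;
	enum oem_pred pred;
	uint32_t data;		/* meaning is left to the caller */
};

/* Evaluate the entry's revision predicate against the firmware revision. */
static int revision_matches(const struct oem_entry *e, uint32_t rev)
{
	switch (e->pred) {
	case all_versions:
		return 1;
	case less_than_or_equal:
		return rev <= e->oem_revision;
	case equal:
		return rev == e->oem_revision;
	case greater_than_or_equal:
		return rev >= e->oem_revision;
	}
	return 0;
}

/* Return the index of the first matching entry, or -ENODEV if none match. */
int match_oemlist(const struct oem_entry *list, const struct table_header *hdr)
{
	int idx;

	assert(list != NULL && hdr != NULL);

	for (idx = 0; list[idx].oem_id[0]; idx++) {
		if (strncmp(list[idx].oem_id, hdr->oem_id, OEM_ID_SIZE))
			continue;
		if (strncmp(list[idx].oem_table_id, hdr->oem_table_id,
			    OEM_TABLE_ID_SIZE))
			continue;
		if (revision_matches(&list[idx], hdr->oem_revision))
			return idx;
	}

	return -ENODEV;
}
```

As in the patch, the caller gets an index back and decides for itself what the `data` field of the matched entry means.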
* Re: [PATCH 1/3] ACPI / blacklist: add acpi_match_oemlist() interface
From: Borislav Petkov @ 2017-07-18 5:34 UTC
To: Toshi Kani
Cc: rjw, mchehab, tglx, srinivas.pandruvada, lenb, linux-acpi, linux-edac, linux-kernel

On Mon, Jul 17, 2017 at 03:59:10PM -0600, Toshi Kani wrote:
> ACPI OEM ID / OEM Table ID / Revision can be used to identify
> platform type based on ACPI firmware.  acpi_blacklisted(),
> intel_pstate_platform_pwr_mgmt_exists() and some other funcs
> have been using this type of check to detect a list of platforms
> that require special handling.
>
> Move the platform type check in acpi_blacklisted() to a common
> utility function, acpi_match_oemlist(), so that other drivers
> do not have to implement their own.
>
> There is no change in functionality.
>
> Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> ---
>  drivers/acpi/blacklist.c | 84 ++++++++--------------------------------------
>  drivers/acpi/utils.c     | 40 ++++++++++++++++++++++
>  include/linux/acpi.h     | 19 ++++++++++
>  3 files changed, 74 insertions(+), 69 deletions(-)
>
> diff --git a/drivers/acpi/blacklist.c b/drivers/acpi/blacklist.c
> index bb542ac..288fe4d 100644
> --- a/drivers/acpi/blacklist.c
> +++ b/drivers/acpi/blacklist.c
> @@ -30,30 +30,13 @@
>
>  #include "internal.h"
>
> -enum acpi_blacklist_predicates {
> -	all_versions,
> -	less_than_or_equal,
> -	equal,
> -	greater_than_or_equal,
> -};
> -
> -struct acpi_blacklist_item {
> -	char oem_id[7];
> -	char oem_table_id[9];
> -	u32 oem_revision;
> -	char *table;
> -	enum acpi_blacklist_predicates oem_revision_predicate;
> -	char *reason;
> -	u32 is_critical_error;
> -};
> -
>  static struct dmi_system_id acpi_rev_dmi_table[] __initdata;
>
>  /*
>   * POLICY: If *anything* doesn't work, put it on the blacklist.
>   *	   If they are critical errors, mark it critical, and abort driver load.
>   */
> -static struct acpi_blacklist_item acpi_blacklist[] __initdata = {
> +static struct acpi_oemlist acpi_blacklist[] __initdata = {

Why the arbitrary rename?

If anything, you should shorten that

	enum acpi_blacklist_predicates oem_revision_predicate;

unreadable insanity.

>  	/* Compaq Presario 1700 */
>  	{"PTLTD ", " DSDT ", 0x06040000, ACPI_SIG_DSDT, less_than_or_equal,
>  	 "Multiple problems", 1},
> @@ -67,65 +50,28 @@ static struct acpi_blacklist_item acpi_blacklist[] __initdata = {
>  	{"IBM ", "TP600E ", 0x00000105, ACPI_SIG_DSDT, less_than_or_equal,
>  	 "Incorrect _ADR", 1},
>
> -	{""}
> +	{ }
>  };
>
>  int __init acpi_blacklisted(void)
>  {
> -	int i = 0;
> +	int i;
>  	int blacklisted = 0;
> -	struct acpi_table_header table_header;
> -
> -	while (acpi_blacklist[i].oem_id[0] != '\0') {
> -		if (acpi_get_table_header(acpi_blacklist[i].table, 0, &table_header)) {
> -			i++;
> -			continue;
> -		}
> -
> -		if (strncmp(acpi_blacklist[i].oem_id, table_header.oem_id, 6)) {
> -			i++;
> -			continue;
> -		}
> -
> -		if (strncmp
> -		    (acpi_blacklist[i].oem_table_id, table_header.oem_table_id,
> -		     8)) {
> -			i++;
> -			continue;
> -		}
> -
> -		if ((acpi_blacklist[i].oem_revision_predicate == all_versions)
> -		    || (acpi_blacklist[i].oem_revision_predicate ==
> -			less_than_or_equal
> -			&& table_header.oem_revision <=
> -			acpi_blacklist[i].oem_revision)
> -		    || (acpi_blacklist[i].oem_revision_predicate ==
> -			greater_than_or_equal
> -			&& table_header.oem_revision >=
> -			acpi_blacklist[i].oem_revision)
> -		    || (acpi_blacklist[i].oem_revision_predicate == equal
> -			&& table_header.oem_revision ==
> -			acpi_blacklist[i].oem_revision)) {
>
> -			printk(KERN_ERR PREFIX
> -			       "Vendor \"%6.6s\" System \"%8.8s\" "
> -			       "Revision 0x%x has a known ACPI BIOS problem.\n",
> -			       acpi_blacklist[i].oem_id,
> -			       acpi_blacklist[i].oem_table_id,
> -			       acpi_blacklist[i].oem_revision);
> +	i = acpi_match_oemlist(acpi_blacklist);
> +	if (i >= 0) {
> +		pr_err(PREFIX "Vendor \"%6.6s\" System \"%8.8s\" "
> +		       "Revision 0x%x has a known ACPI BIOS problem.\n",

Put that string on a single line for grepping. checkpatch catches that
error, didn't you see it?

> +		       acpi_blacklist[i].oem_id,
> +		       acpi_blacklist[i].oem_table_id,
> +		       acpi_blacklist[i].oem_revision);
>
> -			printk(KERN_ERR PREFIX
> -			       "Reason: %s. This is a %s error\n",
> -			       acpi_blacklist[i].reason,
> -			       (acpi_blacklist[i].
> -				is_critical_error ? "non-recoverable" :
> -				"recoverable"));
> +		pr_err(PREFIX "Reason: %s. This is a %s error\n",
> +		       acpi_blacklist[i].reason,
> +		       (acpi_blacklist[i].data ?
> +			"non-recoverable" : "recoverable"));
>
> -			blacklisted = acpi_blacklist[i].is_critical_error;
> -			break;
> -		} else {
> -			i++;
> -		}
> +		blacklisted = acpi_blacklist[i].data;
>  	}
>
>  	(void)early_acpi_osi_init();

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

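The review comment about grepping can be shown in a few lines: when a user-visible message is kept as one string literal, searching the source tree for the text seen in a log finds it, whereas a literal split across two source lines does not match. The following is only a userspace sketch of that convention; `format_blacklist_msg()` is a hypothetical helper that uses snprintf() in place of the kernel's pr_err().

```c
#include <stdio.h>

/*
 * Hypothetical helper: format the blacklist message into buf.
 * The format string is deliberately kept on one (long) source line so
 * that the message text from a log can be found with a plain grep.
 */
int format_blacklist_msg(char *buf, size_t len, const char *oem_id,
			 const char *oem_table_id, unsigned int revision)
{
	return snprintf(buf, len,
			"Vendor \"%6.6s\" System \"%8.8s\" Revision 0x%x has a known ACPI BIOS problem.\n",
			oem_id, oem_table_id, revision);
}
```

The long line exceeds the usual column limit, which is the trade-off the comment accepts in exchange for a searchable message.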
* Re: [PATCH 1/3] ACPI / blacklist: add acpi_match_oemlist() interface
From: Kani, Toshimitsu @ 2017-07-18 15:48 UTC
To: bp
Cc: linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, lenb, linux-acpi, linux-edac

On Tue, 2017-07-18 at 07:34 +0200, Borislav Petkov wrote:
> On Mon, Jul 17, 2017 at 03:59:10PM -0600, Toshi Kani wrote:
> > ACPI OEM ID / OEM Table ID / Revision can be used to identify
> > platform type based on ACPI firmware.  acpi_blacklisted(),
> > intel_pstate_platform_pwr_mgmt_exists() and some other funcs
> > have been using this type of check to detect a list of platforms
> > that require special handling.
> >
> > Move the platform type check in acpi_blacklisted() to a common
> > utility function, acpi_match_oemlist(), so that other drivers
> > do not have to implement their own.
> >
> > There is no change in functionality.
 :
> >  /*
> >   * POLICY: If *anything* doesn't work, put it on the blacklist.
> >   *	   If they are critical errors, mark it critical, and abort driver load.
> >   */
> > -static struct acpi_blacklist_item acpi_blacklist[] __initdata = {
> > +static struct acpi_oemlist acpi_blacklist[] __initdata = {
>
> Why the arbitrary rename?

This patch defines 'struct acpi_oemlist' in "include/linux/acpi.h" as a
common structure, and replaces this specific 'struct acpi_blacklist'.

> If anything, you should shorten that
>
> 	enum acpi_blacklist_predicates oem_revision_predicate;
>
> unreadable insanity.

Agreed.  Will change to a shorter name like below.

	enum acpi_oemlist_pred predicate;

> > +	i = acpi_match_oemlist(acpi_blacklist);
> > +	if (i >= 0) {
> > +		pr_err(PREFIX "Vendor \"%6.6s\" System \"%8.8s\" "
> > +		       "Revision 0x%x has a known ACPI BIOS problem.\n",
>
> Put that string on a single line for grepping. checkpatch catches
> that error, didn't you see it?

Will do.

Thanks!
-Toshi

* Re: [PATCH 1/3] ACPI / blacklist: add acpi_match_oemlist() interface
From: Borislav Petkov @ 2017-07-18 16:43 UTC
To: Kani, Toshimitsu
Cc: linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, lenb, linux-acpi, linux-edac

On Tue, Jul 18, 2017 at 03:48:54PM +0000, Kani, Toshimitsu wrote:
> This patch defines 'struct acpi_oemlist' in "include/linux/acpi.h" as a

I see that.

> common structure, and replaces this specific 'struct acpi_blacklist'.

And what makes acpi_oemlist "common" and acpi_blacklist "specific"?

So let me save you some time - "oemlist" is more specific than
"blacklist" and I can imagine a blacklist item not always being
oem-specific.

What I'm hinting at is, don't change that name. acpi_blacklist is just
fine.

> Agreed.  Will change to a shorter name like below.
>
> 	enum acpi_oemlist_pred predicate;

	enum acpi_predicate pred;

is even better.

* Re: [PATCH 1/3] ACPI / blacklist: add acpi_match_oemlist() interface 2017-07-18 16:43 ` [PATCH 1/3] " Borislav Petkov (?) @ 2017-07-18 17:24 ` Kani, Toshimitsu -1 siblings, 0 replies; 238+ messages in thread From: Kani, Toshimitsu @ 2017-07-18 17:24 UTC (permalink / raw) To: bp Cc: linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, lenb, linux-acpi, linux-edac On Tue, 2017-07-18 at 18:43 +0200, Borislav Petkov wrote: > On Tue, Jul 18, 2017 at 03:48:54PM +0000, Kani, Toshimitsu wrote: > > This patch defines 'struct acpi_oemlist' in "include/linux/acpi.h" > > as a > > I see that. > > > common structure, and replaces this specific 'struct > > acpi_blacklist'. > > And what makes acpi_oemlist "common" and acpi_blacklist "specific"? > > So let me save you some time - "oemlist" is more specific than > "blacklist" and I can imagine a blacklist item not always being > oem-specific. > > What I'm hinting at is, don't change that name. acpi_blacklist is > just fine. Well, a list does not need to be a black-list. It can be a white-list or anything that matters. The caller defines the usage of a list. So, I tried to avoid putting any usage to the structure name. > > Agreed. Will change to a shorter name like below. > > > > enum acpi_oemlist_pred predicate; > > enum acpi_predicate pred; > > is even better. Sounds good. Thanks, -Toshi ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: [PATCH 1/3] ACPI / blacklist: add acpi_match_oemlist() interface 2017-07-18 17:24 ` [PATCH 1/3] " Kani, Toshimitsu (?) @ 2017-07-18 17:42 ` Borislav Petkov -1 siblings, 0 replies; 238+ messages in thread From: Borislav Petkov @ 2017-07-18 17:42 UTC (permalink / raw) To: Kani, Toshimitsu Cc: linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, lenb, linux-acpi, linux-edac On Tue, Jul 18, 2017 at 05:24:50PM +0000, Kani, Toshimitsu wrote: > Well, a list does not need to be a black-list. But this one *is* a blacklist. > So, I tried to avoid putting any usage to the structure name. So OEM is a usage. The moment you need to use it for something else besides an OEM, it is not an OEM list anymore - it is a generic blacklist which blacklists OEMs too. -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. -- ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: [PATCH 1/3] ACPI / blacklist: add acpi_match_oemlist() interface 2017-07-18 17:42 ` [PATCH 1/3] " Borislav Petkov (?) @ 2017-07-18 18:49 ` Kani, Toshimitsu -1 siblings, 0 replies; 238+ messages in thread From: Kani, Toshimitsu @ 2017-07-18 18:49 UTC (permalink / raw) To: bp Cc: linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, lenb, linux-acpi, linux-edac On Tue, 2017-07-18 at 19:42 +0200, Borislav Petkov wrote: > On Tue, Jul 18, 2017 at 05:24:50PM +0000, Kani, Toshimitsu wrote: > > Well, a list does not need to be a black-list. > > But this one *is* a blacklist. Right. Hence, acpi_blacklisted() still declares the list as 'acpi_blacklist[]'. > > So, I tried to avoid putting any usage to the structure name. > > So OEM is a usage. The moment you need to use it for something else > besides an OEM, it is not an OEM list anymore - it is a generic > blacklist which blacklists OEMs too. The term "oem" represents data types of the structure, oem_id[], oem_table_id[], and oem_revision, which are defined by the ACPI spec. ghes_edac uses this structure as a white-list, so the term blacklist is misleading. intel_pstate also uses it to list the platforms that do not need OS control. Thanks, -Toshi ^ permalink raw reply [flat|nested] 238+ messages in thread
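For readers following the "oem" argument above: the three fields named in this reply sit in the common header that begins every ACPI system description table. The layout below is a minimal userspace sketch of that 36-byte header, not the kernel's own definition; the struct name and field names are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch of the header that starts every ACPI system description
 * table. The struct and field names are illustrative, not the
 * kernel's. Natural alignment already yields the 36-byte layout,
 * so no packing attribute is needed here.
 */
struct sdt_header_sketch {
	char     signature[4];     /* e.g. "FACP" for the FADT */
	uint32_t length;           /* length of the whole table */
	uint8_t  revision;
	uint8_t  checksum;
	char     oem_id[6];        /* space padded, not NUL terminated */
	char     oem_table_id[8];  /* space padded, not NUL terminated */
	uint32_t oem_revision;
	char     creator_id[4];
	uint32_t creator_revision;
};

/* Compile-time check that the sketch has the expected size. */
static_assert(sizeof(struct sdt_header_sketch) == 36,
	      "ACPI table header is 36 bytes");
```

Because the two ID fields are fixed width and space padded, list entries that match against them have to carry the padding as well.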
* Re: [PATCH 1/3] ACPI / blacklist: add acpi_match_oemlist() interface 2017-07-18 18:49 ` [PATCH 1/3] " Kani, Toshimitsu (?) @ 2017-07-18 19:32 ` Borislav Petkov -1 siblings, 0 replies; 238+ messages in thread From: Borislav Petkov @ 2017-07-18 19:32 UTC (permalink / raw) To: Kani, Toshimitsu Cc: linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, lenb, linux-acpi, linux-edac On Tue, Jul 18, 2017 at 06:49:51PM +0000, Kani, Toshimitsu wrote: > ghes_edac uses this structure as a white-list, so the term blacklist is > misleading. So this matching function gets both blacklists and whitelists. No wonder it is confusing. Now I finally understand what you wanna do: you want to call all those lists something agnostic as platform_list or so because they contain exactly that: platforms - not OEMs. And then you want to match *platforms*. *Not* OEMs. *Now* I understand what you're trying to tell me. -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. -- ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: [PATCH 1/3] ACPI / blacklist: add acpi_match_oemlist() interface 2017-07-18 19:32 ` [PATCH 1/3] " Borislav Petkov (?) @ 2017-07-18 20:17 ` Kani, Toshimitsu -1 siblings, 0 replies; 238+ messages in thread From: Kani, Toshimitsu @ 2017-07-18 20:17 UTC (permalink / raw) To: bp Cc: linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, lenb, linux-acpi, linux-edac On Tue, 2017-07-18 at 21:32 +0200, Borislav Petkov wrote: > On Tue, Jul 18, 2017 at 06:49:51PM +0000, Kani, Toshimitsu wrote: > > ghes_edac uses this structure as a white-list, so the term blacklist > > is misleading. > > So this matching function gets both blacklists and whitelists. No > wonder it is confusing. Now I finally understand what you wanna do: > you want to call all those lists something agnostic as platform_list > or so because they contain exactly that: platforms - not OEMs. Right. > And then you want to match *platforms*. *Not* OEMs. True, there is some stretch to use OEMIDs for detecting platforms. But we do not have other standard interfaces better than this one. > *Now* I understand what you're trying to tell me. :-) Thanks, -Toshi ^ permalink raw reply [flat|nested] 238+ messages in thread
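The point the two sides converge on above, one matcher serving both blacklists and whitelists, can be sketched in userspace C. This is only an illustration under stated assumptions: `platform_entry`, `oem_info`, `match_platform_list()` and the "ACME" entry are hypothetical names and data invented here, and the firmware IDs are passed in directly rather than read from an ACPI table. Only the predicate names come from the enum that patch 1/3 moves out of blacklist.c.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define OEM_ID_SIZE       6	/* OEM ID width in an ACPI table header */
#define OEM_TABLE_ID_SIZE 8	/* OEM Table ID width in an ACPI table header */

enum acpi_predicate {
	all_versions,
	less_than_or_equal,
	equal,
	greater_than_or_equal,
};

/* Hypothetical usage-neutral entry: the caller decides what a hit means. */
struct platform_entry {
	char oem_id[OEM_ID_SIZE + 1];
	char oem_table_id[OEM_TABLE_ID_SIZE + 1];
	uint32_t oem_revision;
	enum acpi_predicate pred;
	uint32_t data;		/* caller-defined payload */
};

/* Stand-in for the OEM fields a real caller would read from firmware. */
struct oem_info {
	char oem_id[OEM_ID_SIZE + 1];
	char oem_table_id[OEM_TABLE_ID_SIZE + 1];
	uint32_t oem_revision;
};

/* Compare the firmware revision against the entry using its predicate. */
static int revision_ok(const struct platform_entry *e, uint32_t rev)
{
	switch (e->pred) {
	case all_versions:
		return 1;
	case less_than_or_equal:
		return rev <= e->oem_revision;
	case equal:
		return rev == e->oem_revision;
	case greater_than_or_equal:
		return rev >= e->oem_revision;
	}
	return 0;
}

/* Return the index of the first matching entry, or -1 if none matches. */
int match_platform_list(const struct platform_entry *list,
			const struct oem_info *fw)
{
	int idx;

	assert(list != NULL && fw != NULL);
	/* An entry with an empty oem_id terminates the list. */
	for (idx = 0; list[idx].oem_id[0] != '\0'; idx++) {
		if (strncmp(list[idx].oem_id, fw->oem_id, OEM_ID_SIZE) == 0 &&
		    strncmp(list[idx].oem_table_id, fw->oem_table_id,
			    OEM_TABLE_ID_SIZE) == 0 &&
		    revision_ok(&list[idx], fw->oem_revision))
			return idx;
	}
	return -1;
}

/* The same matcher can serve an allow-list ... */
const struct platform_entry known_good[] = {
	{ "HPE   ", "Server  ", 0, all_versions, 0 },
	{ "", "", 0, all_versions, 0 }	/* End */
};

/* ... and a deny-list; only the caller's reaction to a hit differs. */
const struct platform_entry known_bad[] = {
	{ "ACME  ", "Board   ", 2, less_than_or_equal, 0 },
	{ "", "", 0, all_versions, 0 }	/* End */
};
```

A caller treating `known_good` as an allow-list proceeds when the result is non-negative, while a caller treating `known_bad` as a deny-list bails out on the same condition. That difference in the caller is the reason for keeping "blacklist" out of the type name.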
* [PATCH 2/3] intel_pstate: convert to use acpi_match_oemlist() @ 2017-07-17 21:59 ` Toshi Kani 0 siblings, 0 replies; 238+ messages in thread From: Toshi Kani @ 2017-07-17 21:59 UTC (permalink / raw) To: rjw, bp Cc: mchehab, tglx, srinivas.pandruvada, lenb, linux-acpi, linux-edac, linux-kernel, Toshi Kani Convert to use acpi_match_oemlist() for the platform type check. There is no change in functionality. Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: Len Brown <lenb@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> --- drivers/cpufreq/intel_pstate.c | 64 ++++++++++++++++------------------------ 1 file changed, 25 insertions(+), 39 deletions(-) diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index b7fb8b7..8f7703c 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -2458,39 +2458,31 @@ enum { PPC, }; -struct hw_vendor_info { - u16 valid; - char oem_id[ACPI_OEM_ID_SIZE]; - char oem_table_id[ACPI_OEM_TABLE_ID_SIZE]; - int oem_pwr_table; -}; - /* Hardware vendor-specific info that has its own power management modes */ -static struct hw_vendor_info vendor_info[] __initdata = { - {1, "HP ", "ProLiant", PSS}, - {1, "ORACLE", "X4-2 ", PPC}, - {1, "ORACLE", "X4-2L ", PPC}, - {1, "ORACLE", "X4-2B ", PPC}, - {1, "ORACLE", "X3-2 ", PPC}, - {1, "ORACLE", "X3-2L ", PPC}, - {1, "ORACLE", "X3-2B ", PPC}, - {1, "ORACLE", "X4470M2 ", PPC}, - {1, "ORACLE", "X4270M3 ", PPC}, - {1, "ORACLE", "X4270M2 ", PPC}, - {1, "ORACLE", "X4170M2 ", PPC}, - {1, "ORACLE", "X4170 M3", PPC}, - {1, "ORACLE", "X4275 M3", PPC}, - {1, "ORACLE", "X6-2 ", PPC}, - {1, "ORACLE", "Sudbury ", PPC}, - {0, "", ""}, +static struct acpi_oemlist oemlist[] __initdata = { + {"HP ", "ProLiant", 0, ACPI_SIG_FADT, all_versions, 0, PSS}, + {"ORACLE", "X4-2 ", 0, ACPI_SIG_FADT, all_versions, 0, PPC}, + {"ORACLE", 
"X4-2L ", 0, ACPI_SIG_FADT, all_versions, 0, PPC}, + {"ORACLE", "X4-2B ", 0, ACPI_SIG_FADT, all_versions, 0, PPC}, + {"ORACLE", "X3-2 ", 0, ACPI_SIG_FADT, all_versions, 0, PPC}, + {"ORACLE", "X3-2L ", 0, ACPI_SIG_FADT, all_versions, 0, PPC}, + {"ORACLE", "X3-2B ", 0, ACPI_SIG_FADT, all_versions, 0, PPC}, + {"ORACLE", "X4470M2 ", 0, ACPI_SIG_FADT, all_versions, 0, PPC}, + {"ORACLE", "X4270M3 ", 0, ACPI_SIG_FADT, all_versions, 0, PPC}, + {"ORACLE", "X4270M2 ", 0, ACPI_SIG_FADT, all_versions, 0, PPC}, + {"ORACLE", "X4170M2 ", 0, ACPI_SIG_FADT, all_versions, 0, PPC}, + {"ORACLE", "X4170 M3", 0, ACPI_SIG_FADT, all_versions, 0, PPC}, + {"ORACLE", "X4275 M3", 0, ACPI_SIG_FADT, all_versions, 0, PPC}, + {"ORACLE", "X6-2 ", 0, ACPI_SIG_FADT, all_versions, 0, PPC}, + {"ORACLE", "Sudbury ", 0, ACPI_SIG_FADT, all_versions, 0, PPC}, + { } /* End */ }; static bool __init intel_pstate_platform_pwr_mgmt_exists(void) { - struct acpi_table_header hdr; - struct hw_vendor_info *v_info; const struct x86_cpu_id *id; u64 misc_pwr; + int idx; id = x86_match_cpu(intel_pstate_cpu_oob_ids); if (id) { @@ -2499,21 +2491,15 @@ static bool __init intel_pstate_platform_pwr_mgmt_exists(void) return true; } - if (acpi_disabled || - ACPI_FAILURE(acpi_get_table_header(ACPI_SIG_FADT, 0, &hdr))) + idx = acpi_match_oemlist(oemlist); + if (idx < 0) return false; - for (v_info = vendor_info; v_info->valid; v_info++) { - if (!strncmp(hdr.oem_id, v_info->oem_id, ACPI_OEM_ID_SIZE) && - !strncmp(hdr.oem_table_id, v_info->oem_table_id, - ACPI_OEM_TABLE_ID_SIZE)) - switch (v_info->oem_pwr_table) { - case PSS: - return intel_pstate_no_acpi_pss(); - case PPC: - return intel_pstate_has_acpi_ppc() && - (!force_load); - } + switch (oemlist[idx].data) { + case PSS: + return intel_pstate_no_acpi_pss(); + case PPC: + return intel_pstate_has_acpi_ppc() && !force_load; } return false; ^ permalink raw reply related [flat|nested] 238+ messages in thread
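The control flow that patch 2/3 leaves behind in intel_pstate_platform_pwr_mgmt_exists() is a lookup that returns an index, followed by a switch on the matched entry's data field. Here is a minimal userspace sketch of that shape. The `entry` and `fw_state` types and `match_entry()` are hypothetical, the two list entries are abbreviated from the patch without their space padding, and the results of the driver's ACPI helpers are modelled as plain booleans.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { PSS, PPC };

/* Hypothetical, simplified entry: IDs plus a caller-defined payload. */
struct entry {
	const char *oem_id;
	const char *oem_table_id;
	int data;
};

static const struct entry vendors[] = {
	{ "HP",     "ProLiant", PSS },
	{ "ORACLE", "X4-2",     PPC },
	{ NULL,     NULL,       0   },	/* End */
};

/* Return the index of the first matching entry, or -1 if none matches. */
int match_entry(const struct entry *list, const char *oem_id,
		const char *oem_table_id)
{
	int idx;

	assert(list != NULL && oem_id != NULL && oem_table_id != NULL);
	for (idx = 0; list[idx].oem_id != NULL; idx++) {
		if (strcmp(list[idx].oem_id, oem_id) == 0 &&
		    strcmp(list[idx].oem_table_id, oem_table_id) == 0)
			return idx;
	}
	return -1;
}

/*
 * Stand-ins for what the driver obtains from
 * intel_pstate_no_acpi_pss(), intel_pstate_has_acpi_ppc() and the
 * force_load parameter; here they are plain inputs.
 */
struct fw_state {
	bool no_acpi_pss;
	bool has_acpi_ppc;
	bool force_load;
};

/* Mirrors the index-then-switch flow on the matched entry's payload. */
bool platform_pwr_mgmt_exists(const char *oem_id, const char *oem_table_id,
			      const struct fw_state *fw)
{
	int idx = match_entry(vendors, oem_id, oem_table_id);

	if (idx < 0)
		return false;

	switch (vendors[idx].data) {
	case PSS:
		return fw->no_acpi_pss;
	case PPC:
		return fw->has_acpi_ppc && !fw->force_load;
	}
	return false;
}
```

Returning an index rather than a boolean is what lets the caller reach the per-entry payload without a second walk over the list.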
* [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac @ 2017-07-17 21:59 ` Toshi Kani 0 siblings, 0 replies; 238+ messages in thread From: Toshi Kani @ 2017-07-17 21:59 UTC (permalink / raw) To: rjw, bp Cc: mchehab, tglx, srinivas.pandruvada, lenb, linux-acpi, linux-edac, linux-kernel, Toshi Kani The ghes_edac driver was introduced in 2013 [1], but it has not been enabled by any distro yet. This driver obtains error info from firmware interfaces, which are not properly implemented on many platforms, as the driver always emits the messages below: This EDAC driver relies on BIOS to enumerate memory and get error reports. Unfortunately, not all BIOSes reflect the memory layout correctly. So, the end result of using this driver varies from vendor to vendor. If you find incorrect reports, please contact your hardware vendor to correct its BIOS. To get out of this situation, add a platform type check to selectively enable the driver on the platforms that are known to have proper firmware implementation. Platform vendors can add their platforms to the list when they support ghes_edac. "ghes_edac.any_oem=1" skips the platform type check. 
[1]: https://lwn.net/Articles/538438/ Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> --- drivers/edac/ghes_edac.c | 28 +++++++++++++++++++++++----- 1 file changed, 23 insertions(+), 5 deletions(-) diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c index 4e61a62..00588a3 100644 --- a/drivers/edac/ghes_edac.c +++ b/drivers/edac/ghes_edac.c @@ -34,6 +34,9 @@ static LIST_HEAD(ghes_reglist); static DEFINE_MUTEX(ghes_edac_lock); static int ghes_edac_mc_num; +/* Set 1 to skip the platform check */ +static bool __read_mostly ghes_edac_any_oem; +module_param_named(any_oem, ghes_edac_any_oem, bool, 0); /* Memory Device - Type 17 of SMBIOS spec */ struct memdev_dmi_entry { @@ -405,6 +408,15 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev, } EXPORT_SYMBOL_GPL(ghes_edac_report_mem_error); +/* + * Known systems that are safe to enable this module. + * "ghes_edac.any_oem=1" skips this check if necessary. + */ +static struct acpi_oemlist oemlist[] = { + {"HPE ", "Server ", 0, ACPI_SIG_FADT, all_versions}, + { } /* End */ +}; + int ghes_edac_register(struct ghes *ghes, struct device *dev) { bool fake = false; @@ -413,6 +425,12 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev) struct edac_mc_layer layers[1]; struct ghes_edac_pvt *pvt; struct ghes_edac_dimm_fill dimm_fill; + int idx; + + /* Check if safe to enable on this system */ + idx = acpi_match_oemlist(oemlist); + if (!ghes_edac_any_oem && idx < 0) + return 0; /* Get the number of DIMMs */ dmi_walk(ghes_edac_count_dimms, &num_dimm); @@ -456,7 +474,11 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev) mci->dev_name = "ghes"; if (!ghes_edac_mc_num) { - if (!fake) { + if (fake) { + pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n"); + pr_info("Its SMBIOS info is wrong. 
It is doubtful that the error report would\n"); + pr_info("work on such system. Use this driver with caution\n"); + } else if (idx < 0) { pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n"); pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n"); pr_info("So, the end result of using this driver varies from vendor to vendor.\n"); @@ -464,10 +486,6 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev) pr_info("to correct its BIOS.\n"); pr_info("This system has %d DIMM sockets.\n", num_dimm); - } else { - pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n"); - pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n"); - pr_info("work on such system. Use this driver with caution\n"); } } ^ permalink raw reply related [flat|nested] 238+ messages in thread
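The decision logic this patch adds to ghes_edac_register() reduces to a small truth table over three inputs: the list lookup result, the any_oem override, and the fake-DIMM flag. The sketch below restates it as a pure function. The `gate` struct, the `ghes_notice` enum and `ghes_edac_gate()` are hypothetical names used only for illustration. In the patch itself the notices are printed only for the first registered controller, which this sketch does not model.

```c
#include <assert.h>
#include <stdbool.h>

/* Which startup notice, if any, the driver would print. */
enum ghes_notice {
	NOTICE_NONE,		/* platform is on the known-good list */
	NOTICE_FAKE_DIMM,	/* SMBIOS lists no DIMMs */
	NOTICE_UNVERIFIED_BIOS,	/* enabled via any_oem on an unlisted platform */
};

struct gate {
	bool enable;		/* false: register nothing, report success */
	enum ghes_notice notice;
};

/*
 * idx:     lookup result, negative when the platform is not listed
 * any_oem: value of the "ghes_edac.any_oem" override
 * fake:    true when no DIMM information was found
 */
struct gate ghes_edac_gate(int idx, bool any_oem, bool fake)
{
	struct gate g = { false, NOTICE_NONE };

	/* Unlisted platform and no override: stay out of the way. */
	if (!any_oem && idx < 0)
		return g;

	g.enable = true;
	if (fake)
		g.notice = NOTICE_FAKE_DIMM;
	else if (idx < 0)
		g.notice = NOTICE_UNVERIFIED_BIOS;
	return g;
}
```

Listed platforms with usable SMBIOS data therefore load without the long BIOS caveat, and the caveat survives only for systems forced on through the override.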
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
From: Borislav Petkov @ 2017-07-18 6:00 UTC
To: Toshi Kani, Tony Luck
Cc: rjw, mchehab, tglx, srinivas.pandruvada, lenb, linux-acpi, linux-edac, linux-kernel

On Mon, Jul 17, 2017 at 03:59:12PM -0600, Toshi Kani wrote:
> The ghes_edac driver was introduced in 2013 [1], but it has not
> been enabled by any distro yet. This driver obtains error info
> from firmware interfaces, which are not properly implemented on
> many platforms, as the driver always emits the messages below:
>
> This EDAC driver relies on BIOS to enumerate memory and get error reports.
> Unfortunately, not all BIOSes reflect the memory layout correctly
> So, the end result of using this driver varies from vendor to vendor
> If you find incorrect reports, please contact your hardware vendor
> to correct its BIOS.
>
> To get out from this situation, add a platform type check to
> selectively enable the driver on the platforms that are known to
> have proper firmware implementation. Platform vendors can add
> their platforms to the list when they support ghes_edac.

So maintaining whitelists for things has always been a PITA and we
should try to avoid it, if possible. (We can always do it if nothing
saner comes along.)

Now, below is a dirty patch converting ghes_edac to a normal module. On
systems where we have GHES, the firmware generally disables the
detection of the presence of ECC hardware, thus preventing the platform
EDAC driver from loading.

Let me clarify: I have an AMD HP box which, when GHES is enabled in the
BIOS, says that ECC is disabled in the memory controller and the
amd64_edac driver doesn't load for that memory controller.

And I think we should try this first: have the firmware disable
detection methods so that the platform drivers don't load. Then,
ghes_edac can be a simple module and no other driver would attempt
loading.

The question is: does the platform do this disabling now?

Tony, I'm looking at sb_edac and there we don't do something like that
or maybe I'm missing it.

Hmmm.

---
From: Borislav Petkov <bp@suse.de>
Date: Thu, 29 Jun 2017 10:28:32 +0200
Subject: [PATCH] WIP

Not-Signed-off-by: Borislav Petkov <bp@suse.de>
---
 drivers/acpi/apei/ghes.c |  32 ++++++-----
 drivers/edac/Kconfig     |   4 +-
 drivers/edac/edac_mc.h   |   3 ++
 drivers/edac/ghes_edac.c | 137 ++++++++++++++++++++++++-----------------------
 include/acpi/ghes.h      |  27 +---------
 5 files changed, 98 insertions(+), 105 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index d661d452b238..37cd698cacd2 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -140,6 +140,20 @@ static atomic_t ghes_estatus_cache_alloced;

 static int ghes_panic_timeout __read_mostly = 30;

+static ATOMIC_NOTIFIER_HEAD(ghes_edac_chain);
+
+void ghes_register_edac_chain(struct notifier_block *nb)
+{
+	atomic_notifier_chain_register(&ghes_edac_chain, nb);
+}
+EXPORT_SYMBOL_GPL(ghes_register_edac_chain);
+
+void ghes_unregister_edac_chain(struct notifier_block *nb)
+{
+	atomic_notifier_chain_unregister(&ghes_edac_chain, nb);
+}
+EXPORT_SYMBOL_GPL(ghes_unregister_edac_chain);
+
 static int ghes_ioremap_init(void)
 {
 	ghes_ioremap_area = __get_vm_area(PAGE_SIZE * GHES_IOREMAP_PAGES,
@@ -461,11 +475,11 @@ static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int
 static void ghes_do_proc(struct ghes *ghes,
 			 const struct acpi_hest_generic_status *estatus)
 {
-	int sev, sec_sev;
 	struct acpi_hest_generic_data *gdata;
 	guid_t *sec_type;
 	guid_t *fru_id = &NULL_UUID_LE;
 	char *fru_text = "";
+	int sev, sec_sev;

 	sev = ghes_severity(estatus->error_severity);
 	apei_estatus_for_each_section(estatus, gdata) {
@@ -480,7 +494,8 @@ static void ghes_do_proc(struct ghes *ghes,

 		if (guid_equal(sec_type, &CPER_SEC_PLATFORM_MEM)) {
 			struct cper_sec_mem_err *mem_err = acpi_hest_get_payload(gdata);
-			ghes_edac_report_mem_error(ghes, sev, mem_err);
+
+			atomic_notifier_call_chain(&ghes_edac_chain, sev, &mem_err);

 			arch_apei_report_mem_error(sev, mem_err);
 			ghes_handle_memory_failure(gdata, sev);
@@ -1139,10 +1154,6 @@ static int ghes_probe(struct platform_device *ghes_dev)
 		goto err;
 	}

-	rc = ghes_edac_register(ghes, &ghes_dev->dev);
-	if (rc < 0)
-		goto err;
-
 	switch (generic->notify.type) {
 	case ACPI_HEST_NOTIFY_POLLED:
 		setup_deferrable_timer(&ghes->timer, ghes_poll_func,
@@ -1155,13 +1166,13 @@ static int ghes_probe(struct platform_device *ghes_dev)
 		if (rc) {
 			pr_err(GHES_PFX "Failed to map GSI to IRQ for generic hardware error source: %d\n",
 			       generic->header.source_id);
-			goto err_edac_unreg;
+			goto err;
 		}
 		rc = request_irq(ghes->irq, ghes_irq_func, 0, "GHES IRQ", ghes);
 		if (rc) {
 			pr_err(GHES_PFX "Failed to register IRQ for generic hardware error source: %d\n",
 			       generic->header.source_id);
-			goto err_edac_unreg;
+			goto err;
 		}
 		break;

@@ -1190,8 +1201,7 @@ static int ghes_probe(struct platform_device *ghes_dev)
 	ghes_proc(ghes);

 	return 0;
-err_edac_unreg:
-	ghes_edac_unregister(ghes);
+
 err:
 	if (ghes) {
 		ghes_fini(ghes);
@@ -1241,8 +1251,6 @@ static int ghes_remove(struct platform_device *ghes_dev)

 	ghes_fini(ghes);

-	ghes_edac_unregister(ghes);
-
 	kfree(ghes);

 	platform_set_drvdata(ghes_dev, NULL);
diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
index 96afb2aeed18..fdd8278ca89a 100644
--- a/drivers/edac/Kconfig
+++ b/drivers/edac/Kconfig
@@ -53,8 +53,8 @@ config EDAC_DECODE_MCE
 	  has been initialized.

 config EDAC_GHES
-	bool "Output ACPI APEI/GHES BIOS detected errors via EDAC"
-	depends on ACPI_APEI_GHES && (EDAC=y)
+	tristate "Output ACPI APEI/GHES BIOS detected errors via EDAC"
+	depends on ACPI_APEI_GHES
 	help
 	  Not all machines support hardware-driven error report. Some of those
 	  provide a BIOS-driven error report mechanism via ACPI, using the
diff --git a/drivers/edac/edac_mc.h b/drivers/edac/edac_mc.h
index 5357800e418d..6d46f30dc657 100644
--- a/drivers/edac/edac_mc.h
+++ b/drivers/edac/edac_mc.h
@@ -60,6 +60,9 @@
 #define edac_pci_printk(ctl, level, fmt, arg...) \
 	printk(level "EDAC PCI%d: " fmt, ctl->pci_idx, ##arg)

+#define edac_pr_err(fmt, arg...)	edac_printk(KERN_ERR, "", fmt, ##arg)
+#define edac_pr_info(fmt, arg...)	edac_printk(KERN_INFO, "", fmt, ##arg)
+
 /* prefixes for edac_printk() and edac_mc_printk() */
 #define EDAC_MC "MC"
 #define EDAC_PCI "PCI"
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 4e61a6229dd2..20fafc55eb2d 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -5,6 +5,9 @@
  * License version 2.
  *
  * Copyright (c) 2013 by Mauro Carvalho Chehab
+ *		 (c) 2017 Borislav Petkov
+ *
+ * Borislav Petkov: turn it into a proper module.
  *
  * Red Hat Inc. http://www.redhat.com
  */
@@ -17,7 +20,14 @@
 #include "edac_module.h"
 #include <ras/ras_event.h>

-#define GHES_EDAC_REVISION " Ver: 1.0.0"
+#define GHES_EDAC_REVISION " Ver: 2.0.0"
+
+/*
+ * Hand it into EDAC's core so that we have a device to operate on.
+ */
+static struct device dummy_dev;
+
+struct ghes_edac_pvt *ghes_pvt;

 struct ghes_edac_pvt {
 	struct list_head list;
@@ -30,11 +40,6 @@ struct ghes_edac_pvt {
 	char msg[80];
 };

-static LIST_HEAD(ghes_reglist);
-static DEFINE_MUTEX(ghes_edac_lock);
-static int ghes_edac_mc_num;
-
-
 /* Memory Device - Type 17 of SMBIOS spec */
 struct memdev_dmi_entry {
 	u8 type;
@@ -165,24 +170,21 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
 	}
 }

-void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
-				struct cper_sec_mem_err *mem_err)
+static int report_mem_error(struct notifier_block *nb, unsigned long sev, void *data)
 {
+	struct cper_sec_mem_err *mem_err = data;
 	enum hw_event_mc_err_type type;
 	struct edac_raw_error_desc *e;
 	struct mem_ctl_info *mci;
-	struct ghes_edac_pvt *pvt = NULL;
-	char *p;
+	struct ghes_edac_pvt *pvt = ghes_pvt;
 	u8 grain_bits;
+	char *p;

-	list_for_each_entry(pvt, &ghes_reglist, list) {
-		if (ghes == pvt->ghes)
-			break;
-	}
 	if (!pvt) {
-		pr_err("Internal error: Can't find EDAC structure\n");
-		return;
+		edac_pr_err("Internal error: Can't find EDAC structure\n");
+		return NOTIFY_DONE;
 	}
+
 	mci = pvt->mci;
 	e = &mci->error_desc;

@@ -402,23 +404,40 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,

 	/* Report the error via EDAC API */
 	edac_raw_mc_handle_error(type, mci, e);
+
+	return NOTIFY_DONE;
 }
-EXPORT_SYMBOL_GPL(ghes_edac_report_mem_error);

-int ghes_edac_register(struct ghes *ghes, struct device *dev)
+static struct notifier_block ghes_nb = {
+	.notifier_call = report_mem_error,
+};
+
+static const char * const fake_msg =
+"This EDAC driver relies on BIOS to enumerate memory and get error reports.\n"
+"Unfortunately, not all BIOSes reflect the memory layout correctly.\n"
+"So, the end result of using this driver varies from vendor to vendor.\n"
+"If you find incorrect reports, please contact your hardware vendor\n"
+"to correct its BIOS.";
+
+static const char * const super_crap_msg =
+"This system has a very crappy BIOS: It doesn't even list the DIMMS.\n"
+"Its SMBIOS info is wrong. It is doubtful that the error report would\n"
+"work on such system. Use this driver with caution.";
+
+static int __init ghes_edac_register(void)
 {
+	struct ghes_edac_pvt *pvt = ghes_pvt;
 	bool fake = false;
 	int rc, num_dimm = 0;
 	struct mem_ctl_info *mci;
 	struct edac_mc_layer layers[1];
-	struct ghes_edac_pvt *pvt;
 	struct ghes_edac_dimm_fill dimm_fill;

 	/* Get the number of DIMMs */
 	dmi_walk(ghes_edac_count_dimms, &num_dimm);

 	/* Check if we've got a bogus BIOS */
-	if (num_dimm == 0) {
+	if (!num_dimm) {
 		fake = true;
 		num_dimm = 1;
 	}
@@ -431,21 +450,17 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	 * We need to serialize edac_mc_alloc() and edac_mc_add_mc(),
 	 * to avoid duplicated memory controller numbers
 	 */
-	mutex_lock(&ghes_edac_lock);
-	mci = edac_mc_alloc(ghes_edac_mc_num, ARRAY_SIZE(layers), layers,
-			    sizeof(*pvt));
+	mci = edac_mc_alloc(1, ARRAY_SIZE(layers), layers, sizeof(*pvt));
 	if (!mci) {
-		pr_info("Can't allocate memory for EDAC data\n");
-		mutex_unlock(&ghes_edac_lock);
+		edac_pr_err("Can't allocate memory for EDAC data\n");
 		return -ENOMEM;
 	}

 	pvt = mci->pvt_info;
 	memset(pvt, 0, sizeof(*pvt));
-	list_add_tail(&pvt->list, &ghes_reglist);
-	pvt->ghes = ghes;
 	pvt->mci = mci;
-	mci->pdev = dev;
+
+	mci->pdev = &dummy_dev;

 	mci->mtype_cap = MEM_FLAG_EMPTY;
 	mci->edac_ctl_cap = EDAC_FLAG_NONE;
@@ -455,21 +470,12 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	mci->ctl_name = "ghes_edac";
 	mci->dev_name = "ghes";

-	if (!ghes_edac_mc_num) {
-		if (!fake) {
-			pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
-			pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
-			pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
-			pr_info("If you find incorrect reports, please contact your hardware vendor\n");
-			pr_info("to correct its BIOS.\n");
-			pr_info("This system has %d DIMM sockets.\n",
-				num_dimm);
-		} else {
-			pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
-			pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
-			pr_info("work on such system. Use this driver with caution\n");
-		}
-	}
+	if (!fake)
+		edac_pr_info("%s\n", fake_msg);
+	else
+		edac_pr_info("%s\n", super_crap_msg);
+
+	edac_pr_info("This system has %d DIMM sockets.\n", num_dimm);

 	if (!fake) {
 		/*
@@ -478,13 +484,11 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 		 * Keep it in blank for the other memory controllers, as
 		 * there's no reliable way to properly credit each DIMM to
 		 * the memory controller, as different BIOSes fill the
-		 * DMI bank location fields on different ways
+		 * DMI bank location fields in different ways.
 		 */
-		if (!ghes_edac_mc_num) {
-			dimm_fill.count = 0;
-			dimm_fill.mci = mci;
-			dmi_walk(ghes_edac_dmidecode, &dimm_fill);
-		}
+		dimm_fill.count = 0;
+		dimm_fill.mci = mci;
+		dmi_walk(ghes_edac_dmidecode, &dimm_fill);
 	} else {
 		struct dimm_info *dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
 						       mci->n_layers, 0, 0, 0);
@@ -498,30 +502,31 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)

 	rc = edac_mc_add_mc(mci);
 	if (rc < 0) {
-		pr_info("Can't register at EDAC core\n");
+		edac_pr_err("Can't register with EDAC core\n");
 		edac_mc_free(mci);
-		mutex_unlock(&ghes_edac_lock);
 		return -ENODEV;
 	}

-	ghes_edac_mc_num++;
-	mutex_unlock(&ghes_edac_lock);
+	ghes_register_edac_chain(&ghes_nb);
+
 	return 0;
 }
-EXPORT_SYMBOL_GPL(ghes_edac_register);
+module_init(ghes_edac_register);

-void ghes_edac_unregister(struct ghes *ghes)
+static void __exit ghes_edac_unregister(void)
 {
 	struct mem_ctl_info *mci;
-	struct ghes_edac_pvt *pvt, *tmp;
-
-	list_for_each_entry_safe(pvt, tmp, &ghes_reglist, list) {
-		if (ghes == pvt->ghes) {
-			mci = pvt->mci;
-			edac_mc_del_mc(mci->pdev);
-			edac_mc_free(mci);
-			list_del(&pvt->list);
-		}
-	}
+
+	ghes_unregister_edac_chain(&ghes_nb);
+
+	mci = find_mci_by_dev(&dummy_dev);
+	WARN_ON(!mci);
+
+	edac_mc_del_mc(mci->pdev);
+	edac_mc_free(mci);
+
 }
-EXPORT_SYMBOL_GPL(ghes_edac_unregister);
+module_exit(ghes_edac_unregister);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("GHES error decoding module - " GHES_EDAC_REVISION);
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 9f26e01186ae..c02b8eb91bd6 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -51,31 +51,8 @@ enum {
 	GHES_SEV_PANIC = 0x3,
 };

-/* From drivers/edac/ghes_edac.c */
-
-#ifdef CONFIG_EDAC_GHES
-void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
-				struct cper_sec_mem_err *mem_err);
-
-int ghes_edac_register(struct ghes *ghes, struct device *dev);
-
-void ghes_edac_unregister(struct ghes *ghes);
-
-#else
-static inline void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
-				       struct cper_sec_mem_err *mem_err)
-{
-}
-
-static inline int ghes_edac_register(struct ghes *ghes, struct device *dev)
-{
-	return 0;
-}
-
-static inline void ghes_edac_unregister(struct ghes *ghes)
-{
-}
-#endif
+void ghes_register_edac_chain(struct notifier_block *nb);
+void ghes_unregister_edac_chain(struct notifier_block *nb);

 static inline int acpi_hest_get_version(struct acpi_hest_generic_data *gdata)
 {
--
2.14.0.rc0

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--
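The decoupling in the WIP patch rests on a notifier chain: ghes.c publishes memory error records and ghes_edac subscribes as a module. A minimal userspace sketch of that pattern follows; it is not the kernel's atomic notifier API. `struct notifier`, `struct chain`, `chain_register()`, `chain_unregister()`, `chain_call()` and `struct mem_err` are hypothetical names, and there is no locking or priority ordering. The WIP hunk passes `&mem_err` while report_mem_error() assigns `data` directly to a `struct cper_sec_mem_err *`, which looks like one level of indirection too many; the sketch passes the record pointer itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subscriber node: callback plus link to the next node. */
struct notifier {
	int (*call)(struct notifier *nb, unsigned long sev, void *data);
	struct notifier *next;
};

struct chain {
	struct notifier *head;
};

/* Simplified: prepend to the chain, no locking, no priorities. */
static void chain_register(struct chain *c, struct notifier *nb)
{
	assert(c && nb);
	nb->next = c->head;
	c->head = nb;
}

static void chain_unregister(struct chain *c, struct notifier *nb)
{
	for (struct notifier **p = &c->head; *p; p = &(*p)->next) {
		if (*p == nb) {
			*p = nb->next;
			nb->next = NULL;
			return;
		}
	}
}

/* Call every subscriber; return how many were called. */
static int chain_call(struct chain *c, unsigned long sev, void *data)
{
	int called = 0;

	for (struct notifier *nb = c->head; nb; nb = nb->next) {
		nb->call(nb, sev, data);
		called++;
	}
	return called;
}

/* Hypothetical, heavily reduced error record. */
struct mem_err {
	unsigned long long phys_addr;
};

static unsigned long long last_addr;
static unsigned long last_sev;

/* Subscriber: converts data straight back to the record type. */
static int report_mem_error(struct notifier *nb, unsigned long sev, void *data)
{
	struct mem_err *err = data;

	(void)nb;
	last_addr = err->phys_addr;
	last_sev = sev;
	return 0;
}
```

The producer would call `chain_call(&chain, sev, err)` where `err` already is a `struct mem_err *`; with no subscriber registered the call is a no-op, which is what lets the consumer be built as a module.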
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
From: Borislav Petkov @ 2017-07-18 8:08 UTC
To: Toshi Kani, Tony Luck
Cc: rjw, mchehab, tglx, srinivas.pandruvada, lenb, linux-acpi, linux-edac, linux-kernel

On Tue, Jul 18, 2017 at 08:00:07AM +0200, Borislav Petkov wrote:
> And I think we should try this first: have the firmware disable
> detection methods so that the platform drivers don't load.

Btw, in looking at this more, what about the firmware-first thing?

I.e., the firmware-first detection with apei_osc_setup() at the end of
ghes_init().

Can we make ghes_edac loading dependent on that? I mean, that was *the*
predicate for exactly that - to have the firmware look at the errors
first. No need for platform whitelisting and so on.

I'd still decouple ghes_edac loading from ghes_probe() even though
loading the platform driver should've been done *after* the
firmware-first detection regardless.

So what we could do is make ghes_edac a normal module and have the
relevant x86 EDAC modules query FF mode and if enabled, fail loading.

Hmmm?

My gut feeling tells me I'm on the right track here but who knows...

Thx.

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--
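The proposal above can be pictured as two init paths keyed on a single firmware-first predicate. The following is a userspace sketch under assumptions, not kernel code: `struct platform_state`, `platform_edac_init()`, `ghes_edac_init()` and `pick_owner()` are hypothetical, and how firmware-first mode would actually be queried is left open, as it is in the mail.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical snapshot of what the init paths would need to know. */
struct platform_state {
	bool firmware_first;	/* assumed result of firmware-first detection */
};

enum edac_owner {
	EDAC_OWNER_PLATFORM,	/* e.g. a platform EDAC driver */
	EDAC_OWNER_GHES,	/* ghes_edac */
};

/* Platform EDAC driver: fail loading when firmware handles errors first. */
static int platform_edac_init(const struct platform_state *st)
{
	assert(st);
	return st->firmware_first ? -ENODEV : 0;
}

/* ghes_edac as a plain module: load only in firmware-first mode. */
static int ghes_edac_init(const struct platform_state *st)
{
	assert(st);
	return st->firmware_first ? 0 : -ENODEV;
}

/* Exactly one of the two init paths succeeds for any given state. */
static enum edac_owner pick_owner(const struct platform_state *st)
{
	return ghes_edac_init(st) == 0 ? EDAC_OWNER_GHES : EDAC_OWNER_PLATFORM;
}
```

Keying both paths on the same predicate is what would make a platform whitelist unnecessary in this scheme; it does not by itself address firmware whose DMI/SMBIOS data is wrong.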
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
From: Kani, Toshimitsu @ 2017-07-18 21:20 UTC
To: tony.luck, bp
Cc: linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, lenb, linux-acpi, linux-edac

On Tue, 2017-07-18 at 10:08 +0200, Borislav Petkov wrote:
> On Tue, Jul 18, 2017 at 08:00:07AM +0200, Borislav Petkov wrote:
> > And I think we should try this first: have the firmware disable
> > detection methods so that the platform drivers don't load.
>
> Btw, in looking at this more, what about the firmware-first thing?
>
> I.e., the firmware-first detection with apei_osc_setup() at the end
> of ghes_init().
>
> Can we make ghes_edac loading dependent on that? I mean, that was
> *the* predicate for exactly that - to have the firmware look at the
> errors first. No need for platform whitelisting and so on.

I agree that 'osc_sb_apei_support_acked' should be checked when
enabling ghes_edac. I do not know the details of existing issues, but
it sounds unlikely that this will address all of them since bugs can be
everywhere. For instance, ghes_edac relies on DMI/SMBIOS info, unlike
other EDAC drivers, which can be buggy regardless of this _OSC info.

> I'd still decouple ghes_edac loading from ghes_probe() even though
> loading the platform driver should've been done *after* the
> firmware-first detection regardless.
>
> So what we could do is make ghes_edac a normal module and have the
> relevant x86 EDAC modules query FF mode and if enabled, fail loading.

I agree that making ghes_edac as a normal module is a good thing, but I
do not think it's going to solve this issue.

Thanks,
-Toshi
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
From: Borislav Petkov @ 2017-07-19 5:52 UTC
To: Kani, Toshimitsu
Cc: tony.luck, linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, lenb, linux-acpi, linux-edac

On Tue, Jul 18, 2017 at 09:20:44PM +0000, Kani, Toshimitsu wrote:
> I agree that 'osc_sb_apei_support_acked' should be checked when
> enabling ghes_edac. I do not know the details of existing issues, but
> it sounds unlikely that this will address all of them since bugs can be
> everywhere.

No, see below.

> For instance, ghes_edac relies on DMI/SMBIOS info, unlike
> other EDAC drivers, which can be buggy regardless of this _OSC info.

That's the problem with firmware. You can't really fix it and it is
buggy as hell.

> I agree that making ghes_edac as a normal module is a good thing, but I
> do not think it's going to solve this issue.

Of course it will - if the firmware says it wants to look at the errors
first, then it gets to do so. This is the whole handling of hardware
errors in the firmware deal. I admit, sometimes it makes sense because
the firmware has the most intimate knowledge of the platform and, in a
perfect world, we won't ever need to have platform-specific EDAC
drivers.

But, we don't live in a perfect world. And the vendor execution of the
whole firmware-error-handling deal is an abomination at best.

So, if we realize that the firmware is buggy, we can use a platform list
to blacklist it (^hint hint^) and have a parameter to disable ghes_edac
from loading.

But we'll deal with that when we get to cross that bridge. Right now,
I'd like to do the loading spec-conform and not fiddle with white-,
black-, or any-other-color lists.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
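The fallback described in the message above (load according to the spec, with a blacklist and a disable parameter as escape hatches) might combine as in the sketch below. Everything here is an assumption made for illustration: struct ghes_edac_policy, its fields and ghes_edac_should_load() are hypothetical names, and no such module parameter or blacklist is claimed to exist in the kernel.

```c
#include <stdbool.h>

/* Hypothetical inputs to the load decision; none of these are kernel names. */
struct ghes_edac_policy {
	bool firmware_first;	/* firmware-first mode was detected */
	bool fw_blacklisted;	/* platform is on a known-buggy-firmware list */
	bool param_disable;	/* administrator disabled the driver */
};

/*
 * Spec-conforming default: load whenever firmware-first mode is active.
 * The blacklist and the parameter can only veto loading, never force it.
 */
static bool ghes_edac_should_load(const struct ghes_edac_policy *p)
{
	if (p->param_disable || p->fw_blacklisted)
		return false;
	return p->firmware_first;
}
```

The design point is that no list has to be maintained for well-behaved platforms; entries are only needed once a specific firmware is known to misbehave.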
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
From: Kani, Toshimitsu @ 2017-07-19 16:10 UTC
To: bp
Cc: linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, tony.luck, lenb, linux-acpi, linux-edac

On Wed, 2017-07-19 at 07:52 +0200, Borislav Petkov wrote:
> On Tue, Jul 18, 2017 at 09:20:44PM +0000, Kani, Toshimitsu wrote:
> > I agree that 'osc_sb_apei_support_acked' should be checked when
> > enabling ghes_edac. I do not know the details of existing issues,
> > but it sounds unlikely that this will address all of them since
> > bugs can be everywhere.
>
> No, see below.
>
> > For instance, ghes_edac relies on DMI/SMBIOS info, unlike
> > other EDAC drivers, which can be buggy regardless of this _OSC
> > info.
>
> That's the problem with firmware. You can't really fix it and it is
> buggy as hell.

Right, and that's what I was told as an issue for ghes_edac. This is
why this patch introduces a white-list to preclude all buggy firmwares
that are unknown to us...

> > I agree that making ghes_edac as a normal module is a good thing,
> > but I do not think it's going to solve this issue.
>
> Of course it will - if the firmware says it wants to look at the
> errors first, then it gets to do so. This is the whole handling of
> hardware errors in the firmware deal. I admit, sometimes it makes
> sense because the firmware has the most intimate knowledge of the
> platform and, in a perfect world, we won't ever need to have
> platform-specific EDAC drivers.
>
> But, we don't live in a perfect world. And the vendor execution of
> the whole firmware-error-handling deal is an abomination at best.
>
> So, if we realize that the firmware is buggy, we can use a platform
> list to blacklist it (^hint hint^) and have a parameter to disable
> ghes_edac from loading.

Setting blacklist needs us to enable ghes_edac and find all buggy
firmwares to date. I think this is too disturbing for people who are
happily using regular edac drivers today even though their platforms
have GHES.

> But we'll deal with that when we get to cross that bridge. Right now,
> I'd like to do the loading spec-conform and not fiddle with white-,
> black-, or any-other-color lists.

I do prefer to avoid any white / black listing. But I do not see how
it solves the buggy DMI/SMBIOS info as an example of firmware bugs we
may have to deal with.

Thanks,
-Toshi
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
From: Borislav Petkov @ 2017-07-19 16:22 UTC
To: Kani, Toshimitsu
Cc: linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, tony.luck, lenb, linux-acpi, linux-edac

On Wed, Jul 19, 2017 at 04:10:07PM +0000, Kani, Toshimitsu wrote:
> I do prefer to avoid any white / black listing. But I do not see how
> it solves the buggy DMI/SMBIOS info as an example of firmware bugs we
> may have to deal with.

So how do you want to deal with this?

Maintain an evergrowing whitelist of platforms which are OK and then the
moment a new platform comes along, you send a patch to add it to that
whitelist?

I'm sure you can see the problems with that approach.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
From: Kani, Toshimitsu @ 2017-07-19 16:56 UTC
To: bp
Cc: linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, tony.luck, lenb, linux-acpi, linux-edac

On Wed, 2017-07-19 at 18:22 +0200, Borislav Petkov wrote:
> On Wed, Jul 19, 2017 at 04:10:07PM +0000, Kani, Toshimitsu wrote:
> > I do prefer to avoid any white / black listing. But I do not see
> > how it solves the buggy DMI/SMBIOS info as an example of firmware
> > bugs we may have to deal with.
>
> So how do you want to deal with this?
>
> Maintain an evergrowing whitelist of platforms which are OK and then
> the moment a new platform comes along, you send a patch to add it to
> that whitelist?
>
> I'm sure you can see the problems with that approach.

Since ghes_edac has not been used for a long time, I have a feeling
that not so many vendors want to use it. In the case of HPE, we do not
need to update with each platform since "HPE" "Server" will cover all
platforms we need.

Thanks,
-Toshi
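To make the "HPE" "Server" remark concrete, here is a minimal userspace sketch of matching a whitelist by OEM ID and OEM Table ID. It is not the acpi_match_oemlist() code from patch 1: struct oem_entry, match_oemlist() and the list contents are hypothetical, and the space padding of the two strings is an assumption about how the firmware fills the fixed-width ACPI header fields (6 bytes for OEM ID, 8 bytes for OEM Table ID).

```c
#include <string.h>

#define OEM_ID_SIZE		6	/* width of the ACPI header OEM ID field */
#define OEM_TABLE_ID_SIZE	8	/* width of the ACPI header OEM Table ID field */

/* Hypothetical list entry, modelled on the fields the patch series compares. */
struct oem_entry {
	char oem_id[OEM_ID_SIZE + 1];
	char oem_table_id[OEM_TABLE_ID_SIZE + 1];
};

/* Hypothetical whitelist; an empty OEM ID terminates the list. */
static const struct oem_entry ghes_edac_whitelist[] = {
	{ "HPE   ", "Server  " },	/* padding is an assumption */
	{ "", "" },
};

/* Return the index of the first matching entry, or -1 if none matches. */
static int match_oemlist(const struct oem_entry *list,
			 const char *oem_id, const char *oem_table_id)
{
	for (int i = 0; list[i].oem_id[0] != '\0'; i++) {
		if (!strncmp(list[i].oem_id, oem_id, OEM_ID_SIZE) &&
		    !strncmp(list[i].oem_table_id, oem_table_id,
			     OEM_TABLE_ID_SIZE))
			return i;
	}
	return -1;
}
```

One entry keyed on vendor-wide strings covers a whole product line, which is why the list would not need a patch per platform for a vendor that uses the same IDs everywhere.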
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
From: Borislav Petkov @ 2017-07-20 4:16 UTC
To: Kani, Toshimitsu
Cc: linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, tony.luck, lenb, linux-acpi, linux-edac

On Wed, Jul 19, 2017 at 04:56:17PM +0000, Kani, Toshimitsu wrote:
> Since ghes_edac has not been used for a long time, I have a feeling
> that not so many vendors want to use it. In the case of HPE, we do not
> need to update with each platform since "HPE" "Server" will cover all
> platforms we need.

Does the apei_osc_setup() detection with the uuid work on HP systems?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
From: Kani, Toshimitsu @ 2017-07-20 14:42 UTC
To: bp
Cc: linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, tony.luck, lenb, linux-acpi, linux-edac

On Thu, 2017-07-20 at 06:16 +0200, Borislav Petkov wrote:
> On Wed, Jul 19, 2017 at 04:56:17PM +0000, Kani, Toshimitsu wrote:
> > Since ghes_edac has not been used for a long time, I have a feeling
> > that not so many vendors want to use it. In the case of HPE, we do
> > not need to update with each platform since "HPE" "Server" will
> > cover all platforms we need.
>
> Does the apei_osc_setup() detection with the uuid work on HP systems?

Yes, the following message is shown on HP systems. Please note that
WHEA is a Windows-defined interface.

  "GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC."

Thanks,
-Toshi
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
From: Borislav Petkov @ 2017-07-20 15:04 UTC
To: Kani, Toshimitsu, tony.luck
Cc: linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, lenb, linux-acpi, linux-edac

On Thu, Jul 20, 2017 at 02:42:25PM +0000, Kani, Toshimitsu wrote:
> Yes, the following message is shown on HP systems. Please note that
> WHEA is a Windows-defined interface.

Ok, so let's couple ghes_edac loading to that and see how far we could
go. I guess we should add checks for that to the major x86 EDAC drivers
to not load and this way ghes_edac will be the only driver loading.

Tony, how does that sound?

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--
* RE: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
From: Luck, Tony @ 2017-07-20 16:55 UTC
To: Borislav Petkov, Kani, Toshimitsu
Cc: linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, lenb, linux-acpi, linux-edac

>> Yes, the following message is shown on HP systems. Please note that
>> WHEA is a Windows-defined interface.
>
> Ok, so let's couple ghes_edac loading to that and see how far we could
> go. I guess we should add checks for that to the major x86 EDAC drivers
> to not load and this way ghes_edac will be the only driver loading.
>
> Tony, how does that sound?

Add a module parameter to those edac drivers that can override the check
and let them load anyway. I'm not paranoid, I just assume that there is a BIOS
out there that sets the OSC/WHEA bits, but isn't generating useful GHES logs.

-Tony
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
From: Borislav Petkov @ 2017-07-20 17:05 UTC
To: Luck, Tony
Cc: Kani, Toshimitsu, linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, lenb, linux-acpi, linux-edac

On Thu, Jul 20, 2017 at 04:55:59PM +0000, Luck, Tony wrote:
> Add a module parameter to those edac drivers that can override the check
> and let them load anyway. I'm not paranoid, I just assume that there is a BIOS
> out there that sets the OSC/WHEA bits, but isn't generating useful GHES logs.

Or add that parameter to edac_core.ko and let it control which EDAC
driver gets loaded? Something like

	edac=ignore_ghes

or so. And then the other EDAC drivers query it.

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--
* RE: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
From: Luck, Tony @ 2017-07-20 17:10 UTC
To: Borislav Petkov
Cc: Kani, Toshimitsu, linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, lenb, linux-acpi, linux-edac

> Or add that parameter to edac_core.ko and let it control which EDAC
> driver gets loaded? Something like
>
> edac=ignore_ghes
>
> or so. And then the other EDAC drivers query it.

Sure ... one central place is better than adding code to each
driver.

-Tony
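
The policy the thread converges on here (ghes_edac owns error reporting when firmware-first mode is advertised, unless a central override says otherwise) can be sketched in user-space C. This is only an illustration under stated assumptions, not kernel code: `edac_pick_owner()`, `platform_edac_may_load()` and the `edac_owner` enum are hypothetical names, and `edac=ignore_ghes` is the spelling floated in the discussion, not an existing parameter.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical: which EDAC driver should own error reporting. */
enum edac_owner { EDAC_OWNER_PLATFORM, EDAC_OWNER_GHES };

/*
 * Hypothetical central policy, standing in for a check in edac_core.
 * fw_first:   firmware advertised firmware-first mode (APEI bit / WHEA _OSC).
 * edac_param: value of the proposed "edac=" parameter, or NULL if unset.
 */
enum edac_owner edac_pick_owner(bool fw_first, const char *edac_param)
{
	bool ignore_ghes = edac_param && strcmp(edac_param, "ignore_ghes") == 0;

	/* ghes_edac wins only if firmware-first is on and nobody overrode it. */
	if (fw_first && !ignore_ghes)
		return EDAC_OWNER_GHES;
	return EDAC_OWNER_PLATFORM;
}

/* What a platform driver (sb_edac, skx_edac, amd64_edac) would ask at init. */
bool platform_edac_may_load(bool fw_first, const char *edac_param)
{
	return edac_pick_owner(fw_first, edac_param) == EDAC_OWNER_PLATFORM;
}
```

Keeping the decision in one function mirrors the "one central place" argument: each platform driver only needs a single query at init time, and the override for a BIOS that sets the OSC/WHEA bits without producing useful GHES logs lives in exactly one spot.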
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
From: Mauro Carvalho Chehab @ 2017-07-20 18:16 UTC
To: Borislav Petkov
Cc: Luck, Tony, Kani, Toshimitsu, linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, lenb, linux-acpi, linux-edac

On Thu, 20 Jul 2017 19:05:04 +0200, Borislav Petkov <bp@alien8.de> wrote:

> On Thu, Jul 20, 2017 at 04:55:59PM +0000, Luck, Tony wrote:
> > Add a module parameter to those edac drivers that can override the check
> > and let them load anyway. I'm not paranoid, I just assume that there is a BIOS
> > out there that sets the OSC/WHEA bits, but isn't generating useful GHES logs.
>
> Or add that parameter to edac_core.ko and let it control which EDAC
> driver gets loaded? Something like
>
> edac=ignore_ghes
>
> or so. And then the other EDAC drivers query it.

Works for me.

Thanks,
Mauro
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
From: Aristeu Rozanski @ 2017-07-19 18:55 UTC
To: Borislav Petkov
Cc: Kani, Toshimitsu, linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, tony.luck, lenb, linux-acpi, linux-edac

On Wed, Jul 19, 2017 at 06:22:04PM +0200, Borislav Petkov wrote:
> On Wed, Jul 19, 2017 at 04:10:07PM +0000, Kani, Toshimitsu wrote:
> > I do prefer to avoid any white / black listing. But I do not see how
> > it solves the buggy DMI/SMBIOS info as an example of firmware bugs we
> > may have to deal with.
>
> So how do you want to deal with this?
>
> Maintain an evergrowing whitelist of platforms which are OK and then the
> moment a new platform comes along, you send a patch to add it to that
> whitelist?

That would also need to keep an eye on versions. A newer version of BIOS
on a whitelisted platform might be broken.

--
Aristeu
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
From: Kani, Toshimitsu @ 2017-07-19 20:13 UTC
To: aris, bp
Cc: linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, tony.luck, lenb, linux-acpi, linux-edac

On Wed, 2017-07-19 at 14:55 -0400, Aristeu Rozanski wrote:
> On Wed, Jul 19, 2017 at 06:22:04PM +0200, Borislav Petkov wrote:
> > On Wed, Jul 19, 2017 at 04:10:07PM +0000, Kani, Toshimitsu wrote:
> > > I do prefer to avoid any white / black listing. But I do not see
> > > how it solves the buggy DMI/SMBIOS info as an example of firmware
> > > bugs we may have to deal with.
> >
> > So how do you want to deal with this?
> >
> > Maintain an evergrowing whitelist of platforms which are OK and
> > then the moment a new platform comes along, you send a patch to add
> > it to that whitelist?
>
> That would also need to keep an eye on versions. A newer version of
> BIOS on a whitelisted platform might be broken.

Right. I think a question comes to who broke a running system -- OS
update or BIOS update. This whitelist attempts to protect the former
case by not introducing ghes_edac on arbitrary platforms. The latter
case should be vendor's responsibility.

Thanks,
-Toshi
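
To make the whitelist-versus-BIOS-version concern concrete, here is a small user-space sketch of an OEM whitelist lookup that also honours a revision predicate. It is not the `acpi_match_oemlist()` from patch 1: the function and struct names are hypothetical, the predicate names simply echo the `acpi_blacklist_predicates` enum shown in that patch, plain `strcmp()` stands in for the kernel's fixed-length ACPI header comparison, and the "ACME" entry and all revision numbers are invented for illustration. Only the "HPE" / "Server" pair comes from the thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum rev_pred { ALL_VERSIONS, LESS_THAN_OR_EQUAL, EQUAL, GREATER_THAN_OR_EQUAL };

/* Hypothetical whitelist entry keyed on ACPI OEM ID / OEM Table ID / Revision. */
struct oem_entry {
	const char *oem_id;
	const char *oem_table_id;
	uint32_t oem_revision;
	enum rev_pred pred;
};

/* Illustrative data only: the second entry and the revisions are made up. */
const struct oem_entry sample_whitelist[] = {
	{ "HPE",  "Server", 0, ALL_VERSIONS },
	{ "ACME", "Board",  3, LESS_THAN_OR_EQUAL },
};

/* Does the firmware's revision satisfy the entry's predicate? */
bool rev_matches(uint32_t actual, uint32_t wanted, enum rev_pred pred)
{
	switch (pred) {
	case ALL_VERSIONS:          return true;
	case LESS_THAN_OR_EQUAL:    return actual <= wanted;
	case EQUAL:                 return actual == wanted;
	case GREATER_THAN_OR_EQUAL: return actual >= wanted;
	}
	return false;
}

/* Returns the index of the first matching entry, or -1 if none matches. */
int match_oemlist(const struct oem_entry *list, size_t n,
		  const char *oem_id, const char *oem_table_id, uint32_t rev)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(list[i].oem_id, oem_id) == 0 &&
		    strcmp(list[i].oem_table_id, oem_table_id) == 0 &&
		    rev_matches(rev, list[i].oem_revision, list[i].pred))
			return (int)i;
	}
	return -1;
}
```

With an `ALL_VERSIONS` entry, a later BIOS revision on the same platform still matches, which is the case Toshi assigns to the vendor; a `LESS_THAN_OR_EQUAL` entry would instead stop matching once the revision moves past the validated one, at the cost of a list update per firmware release.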
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
From: Borislav Petkov @ 2017-07-20 4:19 UTC
To: Aristeu Rozanski
Cc: Kani, Toshimitsu, linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, tony.luck, lenb, linux-acpi, linux-edac

On Wed, Jul 19, 2017 at 02:55:08PM -0400, Aristeu Rozanski wrote:
> That would also need to keep an eye on versions. A newer version of BIOS
> on a whitelisted platform might be broken.

Yeah, that would be a nasty, back-stabbing SNAFU.

So I'm thinking of adding a bunch of FW_ERR sanity checks to that whole
ghes_edac and ghes init code to hopefully catch issues during platform
validation. I.e., early enough for them to get fixed.

But that's the same problem as with UEFI - vendors need to try to boot
Linux on their platforms early enough.

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
From: Kani, Toshimitsu @ 2017-07-18 19:58 UTC
To: tony.luck, bp
Cc: linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, lenb, linux-acpi, linux-edac

On Tue, 2017-07-18 at 08:00 +0200, Borislav Petkov wrote:
> On Mon, Jul 17, 2017 at 03:59:12PM -0600, Toshi Kani wrote:
> > The ghes_edac driver was introduced in 2013 [1], but it has not
> > been enabled by any distro yet. This driver obtains error info
> > from firmware interfaces, which are not properly implemented on
> > many platforms, as the driver always emits the messages below:
> >
> > This EDAC driver relies on BIOS to enumerate memory and get error
> > reports. Unfortunately, not all BIOSes reflect the memory layout
> > correctly So, the end result of using this driver varies from
> > vendor to vendor If you find incorrect reports, please contact
> > your hardware vendor to correct its BIOS.
> >
> > To get out from this situation, add a platform type check to
> > selectively enable the driver on the platforms that are known to
> > have proper firmware implementation. Platform vendors can add
> > their platforms to the list when they support ghes_edac.
>
> So maintaining whitelists for things has always been a PITA and we
> should try to avoid it, if possible. (We can always do it if nothing
> saner comes along.)

Agreed.

> Now, below is a dirty patch converting ghes_edac to a normal module.
> On systems where we have GHES, the firmware generally disables the
> detection of the presence of ECC hardware, thus preventing the
> platform EDAC driver from loading.

I have HPE Haswell and Skylake test systems with GHES, but they do not
hide IMCs from the OS. So, the sb_edac and skx_edac drivers get
attached on these systems when ghes_edac is disabled.

> Let me clarify: I have an AMD HP box which, when GHES is enabled in
> the BIOS, says that ECC is disabled in the memory controller and the
> amd64_edac driver doesn't load for that memory controller.

Hmm... what's the platform name of this box? I can look into this case
if you need.

> And I think we should try this first: have the firmware disable
> detection methods so that the platform drivers don't load.

I do not think we can rely on this method.

> Then, ghes_edac can be a simple module and no other driver would
> attempt loading.

I like the use of notifier chain, which is much cleaner.

> The question is: does the platform do this disabling now?

Unfortunately, that is not the case today. The IMCs cannot be hidden
with the Device Hide registers for Skylake at least.

Thanks,
-Toshi
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac 2017-07-18 19:58 ` [PATCH 3/3] " Kani, Toshimitsu (?) @ 2017-07-18 21:15 ` Mauro Carvalho Chehab -1 siblings, 0 replies; 238+ messages in thread From: Mauro Carvalho Chehab @ 2017-07-18 21:15 UTC (permalink / raw) To: Kani, Toshimitsu Cc: tony.luck, bp, linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, lenb, linux-acpi, linux-edac Em Tue, 18 Jul 2017 19:58:54 +0000 "Kani, Toshimitsu" <toshi.kani@hpe.com> escreveu: > On Tue, 2017-07-18 at 08:00 +0200, Borislav Petkov wrote: > > On Mon, Jul 17, 2017 at 03:59:12PM -0600, Toshi Kani wrote: > > > The ghes_edac driver was introduced in 2013 [1], but it has not > > > been enabled by any distro yet. This driver obtains error info > > > from firmware interfaces, which are not properly implemented on > > > many platforms, as the driver always emits the messages below: > > > > > > This EDAC driver relies on BIOS to enumerate memory and get error > > > reports. Unfortunately, not all BIOSes reflect the memory layout > > > correctly So, the end result of using this driver varies from > > > vendor to vendor If you find incorrect reports, please contact > > > your hardware vendor to correct its BIOS. > > > > > > To get out from this situation, add a platform type check to > > > selectively enable the driver on the platforms that are known to > > > have proper firmware implementation. Platform vendors can add > > > their platforms to the list when they support ghes_edac. > > > > So maintaining whitelists for things has always been a PITA and we > > should try to avoid it, if possible. (We can always do it if nothing > > saner comes along.) > > Agreed. > > > Now, below is a dirty patch converting ghes_edac to a normal module. > > On systems where we have GHES, the firmware generally disables the > > detection of the presence of ECC hardware, thus preventing the > > platform EDAC driver from loading. 
>
> I have HPE Haswell and Skylake test systems with GHES, but they do not
> hide IMCs from the OS. So, the sb_edac and skx_edac drivers get
> attached on these systems when ghes_edac is disabled.
>
> > Let me clarify: I have an AMD HP box which, when GHES is enabled in
> > the BIOS, says that ECC is disabled in the memory controller and the
> > amd64_edac driver doesn't load for that memory controller.
>
> Hmm... what's the platform name of this box? I can look into this case
> if you need.
>
> > And I think we should try this first: have the firmware disable
> > detection methods so that the platform drivers don't load.
>
> I do not think we can rely on this method.
>
> > Then, ghes_edac can be a simple module and no other driver would
> > attempt loading.
>
> I like the use of notifier chain, which is much cleaner.
>
> > The question is: does the platform do this disabling now?
>
> Unfortunately, that is not the case today. The IMCs cannot be hidden
> with the Device Hide registers for Skylake at least.

We had a similar discussion several years ago when I wrote this driver. At that time, I talked with Red Hat, HP, Dell and Intel people, and with some customers with large clusters.

As it stands, ghes_edac is a poor man's driver. What it hopefully provides is detection that an error happened, without really telling the user which component should be replaced.

OK, on machines with their own error reporting mechanism (like HP servers), a sysadmin can look at some proprietary software (or the BIOS) in order to identify what happened.

Yet the BIOS doesn't provide any clue about the memory architecture, as it maps memory as if it were one flat layer of DIMMs:

(from ghes_edac_register)
	layers[0].type = EDAC_MC_LAYER_ALL_MEM;
	layers[0].size = num_dimm;
	layers[0].is_virt_csrow = true;

So, even on systems where the BIOS actually knows how the memory cards are wired, it will mask the memory controller data.
Now, the EDAC driver can also be used to identify which channels are used. That helps the sysadmin to know whether the memories are connected in a way that uses multiple channels, helping to set up the machine for the maximum possible performance.

So, for example, on my Intel-based HP server, I can check such info with:

$ ras-mc-ctl --mainboard
ras-mc-ctl: mainboard: HP model ProLiant ML350 Gen9

$ ras-mc-ctl --layout
       +-----------------------------------------------------------------------+
       |                mc0                |                mc1                |
       | channel0  | channel1  | channel2  | channel0  | channel1  | channel2  |
-------+-----------------------------------------------------------------------+
slot2: |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |
slot1: |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |
slot0: | 16384 MB  |     0 MB  | 16384 MB  | 16384 MB  |     0 MB  | 16384 MB  |
-------+-----------------------------------------------------------------------+

So, I know that both CPUs are connected to my memories and that each is using 2 channels.

If I were using the ghes driver, that information would be hidden.

So, due to all the problems with ghes, it is enabled only if there is no better solution, e.g. on systems where there's no way to talk directly to the hardware (like on E7 Xeon machines, where the memory controller is actually on a separate chip that is controlled only by the BIOS).

Thanks,
Mauro

^ permalink raw reply [flat|nested] 238+ messages in thread
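The point about channel visibility can be illustrated with a small user-space sketch. It is not kernel code: the array layout, the macro names and both helper functions are assumptions made for this example, and the DIMM sizes are simply copied from the ras-mc-ctl --layout output in the message above. It contrasts a per-controller, per-channel view with a single flat layer, where only a DIMM count is left.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_MC       2
#define NUM_CHANNELS 3
#define NUM_SLOTS    3

/*
 * DIMM sizes in MB, indexed as [mc][channel][slot]. The values mirror the
 * layout table quoted above; the array itself is a hypothetical structure
 * for illustration, not something the EDAC core exposes.
 */
static const unsigned int dimm_mb[NUM_MC][NUM_CHANNELS][NUM_SLOTS] = {
	{ { 16384, 0, 0 }, { 0, 0, 0 }, { 16384, 0, 0 } },	/* mc0 */
	{ { 16384, 0, 0 }, { 0, 0, 0 }, { 16384, 0, 0 } },	/* mc1 */
};

/* Count the channels of one memory controller that hold at least one DIMM. */
static int populated_channels(int mc)
{
	int ch, slot, count = 0;

	assert(mc >= 0 && mc < NUM_MC);
	for (ch = 0; ch < NUM_CHANNELS; ch++) {
		for (slot = 0; slot < NUM_SLOTS; slot++) {
			if (dimm_mb[mc][ch][slot]) {
				count++;
				break;	/* one DIMM is enough for this channel */
			}
		}
	}
	return count;
}

/* The same DIMMs seen as one flat layer: only the total count survives. */
static int populated_dimms_flat(void)
{
	const unsigned int *flat = &dimm_mb[0][0][0];
	size_t i, n = NUM_MC * NUM_CHANNELS * NUM_SLOTS;
	int count = 0;

	for (i = 0; i < n; i++)
		if (flat[i])
			count++;
	return count;
}
```

With the hierarchical view, populated_channels() reports two channels in use on each controller, which is the conclusion drawn in the message. The flat view can only say that four DIMMs are present.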
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac 2017-07-18 21:15 ` [PATCH 3/3] " Mauro Carvalho Chehab (?) @ 2017-07-19 5:58 ` Borislav Petkov -1 siblings, 0 replies; 238+ messages in thread From: Borislav Petkov @ 2017-07-19 5:58 UTC (permalink / raw) To: Mauro Carvalho Chehab Cc: Kani, Toshimitsu, tony.luck, linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, lenb, linux-acpi, linux-edac On Tue, Jul 18, 2017 at 06:15:45PM -0300, Mauro Carvalho Chehab wrote: > The way it is, ghes_edac is a poor man's driver. What it hopefully > provide is a detection that an error happened, without really telling > the user what component should be replaced. I beg to differ. From the UEFI spec: "The module number of the memory error location. (NODE, CARD, and MODULE should provide the information necessary to identify the failing FRU)." So this tuple is sufficient to pinpoint the DIMM, IIUC. Which means, ghes_edac can have a single layer of DIMMs without channels. -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. -- ^ permalink raw reply [flat|nested] 238+ messages in thread
* RE: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac 2017-07-19 5:58 ` [PATCH 3/3] " Borislav Petkov (?) @ 2017-07-19 15:14 ` Luck, Tony -1 siblings, 0 replies; 238+ messages in thread From: Luck, Tony @ 2017-07-19 15:14 UTC (permalink / raw) To: Borislav Petkov, Mauro Carvalho Chehab Cc: Kani, Toshimitsu, linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, lenb, linux-acpi, linux-edac

> "The module number of the memory error location. (NODE, CARD, and MODULE
> should provide the information necessary to identify the failing FRU)."
>
> So this tuple is sufficient to pinpoint the DIMM, IIUC.
>
> Which means, ghes_edac can have a single layer of DIMMs without channels.

The tricky part is that you have to rely on SMBIOS/DMI to know what DIMMs are on the system when the driver initializes so you can populate /sys/.*/edac.

Later, when GHES gives you a (NODE/CARD/MODULE) in an error record, you need to match these up. But SMBIOS only gave you two strings, "Locator" and "Bank Locator", which have no defined syntax. You are at the mercy of the BIOS writer to put in something parseable. Some writers use zero-based counts, others are Fortran fans and use one-based. Still others use letters. About the only guarantee is that they will make almost no effort to match the silkscreen labels on the motherboard itself.

E.g. my Broadwell-EX has things like:

	Locator: CHANNEL D DIMM 1
	Bank Locator: Memriser8

Channel is A,B,C,D. DIMM is 0, 1, 2. Memriser is {1..8}, so this manages to use all three counting options!

-Tony

^ permalink raw reply [flat|nested] 238+ messages in thread
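To make the parsing problem concrete, here is a minimal user-space sketch that normalizes the two strings from the Broadwell-EX example above into zero-based indices. The struct, the function name and the accepted ranges are all assumptions for illustration. The format strings match only that one example, which is exactly the weakness being described: any other vendor's strings would need their own parser.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical normalized position; every field is zero-based. */
struct dimm_pos {
	int riser;
	int channel;
	int dimm;
};

/*
 * Parse strings of the form "CHANNEL D DIMM 1" / "Memriser8".
 * Returns 0 on success, or -1 if the strings do not follow this one
 * vendor-specific format or are out of the ranges given in the example.
 */
static int parse_locators(const char *locator, const char *bank_locator,
			  struct dimm_pos *pos)
{
	char ch;
	int dimm, riser;

	assert(locator && bank_locator && pos);

	if (sscanf(locator, "CHANNEL %c DIMM %d", &ch, &dimm) != 2)
		return -1;
	if (sscanf(bank_locator, "Memriser%d", &riser) != 1)
		return -1;
	if (ch < 'A' || ch > 'D' || dimm < 0 || dimm > 2 ||
	    riser < 1 || riser > 8)
		return -1;

	pos->channel = ch - 'A';	/* letters:    A..D -> 0..3 */
	pos->dimm = dimm;		/* zero-based: 0..2 kept as-is */
	pos->riser = riser - 1;		/* one-based:  1..8 -> 0..7 */
	return 0;
}
```

Three different conversions are needed for three fields of a single DIMM, and none of them is implied by the SMBIOS specification itself.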
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac 2017-07-19 15:14 ` [PATCH 3/3] " Luck, Tony (?) @ 2017-07-19 15:57 ` Borislav Petkov -1 siblings, 0 replies; 238+ messages in thread From: Borislav Petkov @ 2017-07-19 15:57 UTC (permalink / raw) To: Luck, Tony Cc: Mauro Carvalho Chehab, Kani, Toshimitsu, linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, lenb, linux-acpi, linux-edac On Wed, Jul 19, 2017 at 03:14:32PM +0000, Luck, Tony wrote: > Later when GHES gives you a NODE/CARD/MODULE) in an error record. You need > to match these up. But SMBIOS only gave you two strings "Locator" and "Bank > Locator" which have no defined syntax. You are at the mercy of the BIOS writer > to put in something parseable. Well, at some point it is only so much we can do, right? I mean, if FW says it wants to do firmware-first and we go and adhere to that, it should be expected that said FW vendor marks the silkscreen labels and DMI data accordingly. I mean, it is time for FW to put its money where its mouth is, no? How else would you do this? Firmware First but the kernel does the figuring out which DIMMs are where. So FW can't have the cake and eat it too. :-) -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. -- ^ permalink raw reply [flat|nested] 238+ messages in thread
* RE: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac 2017-07-19 15:57 ` [PATCH 3/3] " Borislav Petkov (?) @ 2017-07-19 18:06 ` Luck, Tony -1 siblings, 0 replies; 238+ messages in thread From: Luck, Tony @ 2017-07-19 18:06 UTC (permalink / raw) To: Borislav Petkov Cc: Mauro Carvalho Chehab, Kani, Toshimitsu, linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, lenb, linux-acpi, linux-edac >> Later when GHES gives you a NODE/CARD/MODULE) in an error record. You need >> to match these up. But SMBIOS only gave you two strings "Locator" and "Bank >> Locator" which have no defined syntax. You are at the mercy of the BIOS writer >> to put in something parseable. > > Well, at some point it is only so much we can do, right? > > I mean, if FW says it wants to do firmware-first and we go and adhere > to that, it should be expected that said FW vendor marks the silkscreen > labels and DMI data accordingly. > > I mean, it is time for FW to put its money where its mouth is, no? > > How else would you do this? By thinking a bit more and realizing that what I wrote up above misses that at byte offset 78 in the UEFI memory error section there is "Module Handle" which tells you which SMBIOS entry to use. So this should work just fine (as long as BIOS fills out all these fields ... there's a "Validation Bits" mask at the start of the error structure that says which fields have been populated). -Tony ^ permalink raw reply [flat|nested] 238+ messages in thread
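The lookup described in the message above can be sketched in user-space C as follows. This is an illustration under stated assumptions, not the kernel's definition: the struct and macro names are made up for the example (the kernel has its own version in include/linux/cper.h), and the field order and the valid-bit position (bit 17 for Module Handle) are written from a reading of the UEFI memory error section and should be checked against the specification. The compile-time check confirms that, with this field order, the module handle does land at byte offset 78.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit position in "Validation Bits"; verify against the UEFI spec. */
#define MEM_VALID_MODULE_HANDLE	(1ULL << 17)

#pragma pack(push, 1)
/* Hypothetical mirror of the UEFI memory error section layout. */
struct mem_err_section {
	uint64_t validation_bits;
	uint64_t error_status;
	uint64_t physical_addr;
	uint64_t physical_addr_mask;
	uint16_t node;
	uint16_t card;
	uint16_t module;
	uint16_t bank;
	uint16_t device;
	uint16_t row;
	uint16_t column;
	uint16_t bit_pos;
	uint64_t requestor_id;
	uint64_t responder_id;
	uint64_t target_id;
	uint8_t  error_type;
	uint8_t  extended;
	uint16_t rank;
	uint16_t card_handle;	/* SMBIOS handle of the memory array */
	uint16_t module_handle;	/* SMBIOS handle of the memory device */
};
#pragma pack(pop)

_Static_assert(offsetof(struct mem_err_section, module_handle) == 78,
	       "Module Handle is expected at byte offset 78");

/*
 * Return the SMBIOS handle of the failing module, or -1 when the firmware
 * did not mark the field as populated in the validation bits.
 */
static int mem_err_module_handle(const struct mem_err_section *err)
{
	assert(err);
	if (!(err->validation_bits & MEM_VALID_MODULE_HANDLE))
		return -1;
	return err->module_handle;
}
```

A caller would compare the returned handle against the handles of the SMBIOS memory device entries collected at driver initialization, and fall back to some other reporting path when -1 is returned.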
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac @ 2017-07-19 16:02 ` Mauro Carvalho Chehab 0 siblings, 0 replies; 238+ messages in thread From: Mauro Carvalho Chehab @ 2017-07-19 16:02 UTC (permalink / raw) To: Luck, Tony, Aristeu Rozanski Cc: rjw, srinivas.pandruvada, lenb, linux-acpi, linux-edac Tony/Aris, Yesterday I got an HP ML350 G9, equipped with Sandy Bridge EP CPUs (E5-2640v4). I'm running Kernel 4.11 there. AFAICT, Sandy Bridge EP has 4 channels per memory controller, right? That would match the number of memory slots on this machine (24 slots). Yet, EDAC is only identifying 3 channels per CPU: $ ras-mc-ctl --layout +-----------------------------------------------------------------------+ | mc0 | mc1 | | channel0 | channel1 | channel2 | channel0 | channel1 | channel2 | -------+-----------------------------------------------------------------------+ slot2: | 0 MB | 0 MB | 0 MB | 0 MB | 0 MB | 0 MB | slot1: | 0 MB | 0 MB | 0 MB | 0 MB | 0 MB | 0 MB | slot0: | 16384 MB | 0 MB | 16384 MB | 16384 MB | 0 MB | 16384 MB | -------+---------------------------------------------------------------------------+ So, it seems that either the BIOS is hiding the other channel or there's something wrong with the SandyBridge EP support in the sb_edac driver. Thanks, Mauro ^ permalink raw reply [flat|nested] 238+ messages in thread
* RE: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac @ 2017-07-19 20:06 ` Luck, Tony 0 siblings, 0 replies; 238+ messages in thread From: Luck, Tony @ 2017-07-19 20:06 UTC (permalink / raw) To: Mauro Carvalho Chehab, Aristeu Rozanski Cc: rjw, srinivas.pandruvada, lenb, linux-acpi, linux-edac > So, it seems that either the BIOS is hidden the other channel or > there's something wrong with SandyBridge EP support at sb_edac driver. Can you send me the output of "lspci -xxxx" (run as root)? -Tony ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac @ 2017-07-20 21:15 ` Luck, Tony 0 siblings, 0 replies; 238+ messages in thread From: Luck, Tony @ 2017-07-20 21:15 UTC (permalink / raw) To: Mauro Carvalho Chehab Cc: Aristeu Rozanski, rjw, srinivas.pandruvada, lenb, linux-acpi, linux-edac On Wed, Jul 19, 2017 at 01:02:45PM -0300, Mauro Carvalho Chehab wrote: > Tony/Aris, > > I got yesterday an HP ML350 G9, equipped with Sandy Bridge EP CPUs (E5-2640v4). > I'm running Kernel 4.11 there. > > AFAIKT, Sandy Bridge EP has 4 channels per memory controller, right? > That would match the number of memory slots on this machine (24 slots). > > Yet, EDAC is only identifying 3 channels per CPU: > > $ ras-mc-ctl --layout > +-----------------------------------------------------------------------+ > | mc0 | mc1 | > | channel0 | channel1 | channel2 | channel0 | channel1 | channel2 | > -------+-----------------------------------------------------------------------+ > slot2: | 0 MB | 0 MB | 0 MB | 0 MB | 0 MB | 0 MB | > slot1: | 0 MB | 0 MB | 0 MB | 0 MB | 0 MB | 0 MB | > slot0: | 16384 MB | 0 MB | 16384 MB | 16384 MB | 0 MB | 16384 MB | > -------+---------------------------------------------------------------------------+ > > So, it seems that either the BIOS is hidden the other channel or > there's something wrong with SandyBridge EP support at sb_edac driver. Does lspci show all four of these devices? include/linux/pci_ids.h:#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TAD0 0x3caa /* 15.2 */ include/linux/pci_ids.h:#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TAD1 0x3cab /* 15.3 */ include/linux/pci_ids.h:#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TAD2 0x3cac /* 15.4 */ include/linux/pci_ids.h:#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TAD3 0x3cad /* 15.5 */ There should be two of each (one on bus 7f, the other on bus ff). -Tony ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac @ 2017-07-21 0:00 ` Mauro Carvalho Chehab 0 siblings, 0 replies; 238+ messages in thread From: Mauro Carvalho Chehab @ 2017-07-21 0:00 UTC (permalink / raw) To: Luck, Tony Cc: Aristeu Rozanski, rjw, srinivas.pandruvada, lenb, linux-acpi, linux-edac Em Thu, 20 Jul 2017 14:15:54 -0700 "Luck, Tony" <tony.luck@intel.com> escreveu: > On Wed, Jul 19, 2017 at 01:02:45PM -0300, Mauro Carvalho Chehab wrote: > > Tony/Aris, > > > > I got yesterday an HP ML350 G9, equipped with Sandy Bridge EP CPUs (E5-2640v4). > > I'm running Kernel 4.11 there. > > > > AFAIKT, Sandy Bridge EP has 4 channels per memory controller, right? > > That would match the number of memory slots on this machine (24 slots). > > > > Yet, EDAC is only identifying 3 channels per CPU: > > > > $ ras-mc-ctl --layout > > +-----------------------------------------------------------------------+ > > | mc0 | mc1 | > > | channel0 | channel1 | channel2 | channel0 | channel1 | channel2 | > > -------+-----------------------------------------------------------------------+ > > slot2: | 0 MB | 0 MB | 0 MB | 0 MB | 0 MB | 0 MB | > > slot1: | 0 MB | 0 MB | 0 MB | 0 MB | 0 MB | 0 MB | > > slot0: | 16384 MB | 0 MB | 16384 MB | 16384 MB | 0 MB | 16384 MB | > > -------+---------------------------------------------------------------------------+ > > > > So, it seems that either the BIOS is hidden the other channel or > > there's something wrong with SandyBridge EP support at sb_edac driver. > > Does lspci show all four of these devices? > > include/linux/pci_ids.h:#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TAD0 0x3caa /* 15.2 */ > include/linux/pci_ids.h:#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TAD1 0x3cab /* 15.3 */ > include/linux/pci_ids.h:#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TAD2 0x3cac /* 15.4 */ > include/linux/pci_ids.h:#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TAD3 0x3cad /* 15.5 */ > > There should be two of each (one on bus 7f, the other on bus ff). 
It is getting all 4 TAD devices (Broadwell). This is what I'm getting (from the PCI IDs that it is supposed to be on Broadwell): 00:05.0 System peripheral [0880]: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Map/VTd_Misc/System Management [8086:6f28] (rev 01) 7f:0f.4 System peripheral [0880]: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent [8086:6ffc] (rev 01) 7f:0f.5 System peripheral [0880]: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent [8086:6ffd] (rev 01) 7f:12.0 System peripheral [0880]: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Home Agent 0 [8086:6fa0] (rev 01) 7f:13.0 System peripheral [0880]: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Target Address/Thermal/RAS [8086:6fa8] (rev 01) 7f:13.1 System peripheral [0880]: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Target Address/Thermal/RAS [8086:6f71] (rev 01) 7f:13.2 System peripheral [0880]: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel Target Address Decoder [8086:6faa] (rev 01) 7f:13.3 System peripheral [0880]: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel Target Address Decoder [8086:6fab] (rev 01) 7f:13.4 System peripheral [0880]: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel Target Address Decoder [8086:6fac] (rev 01) 7f:13.5 System peripheral [0880]: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel Target Address Decoder [8086:6fad] (rev 01) 7f:13.7 System peripheral [0880]: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DDRIO Global Broadcast [8086:6faf] (rev 01) 7f:16.0 System peripheral [0880]: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Target Address/Thermal/RAS [8086:6f68] (rev 01) 80:05.0 System peripheral [0880]: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 
v4/Xeon D Map/VTd_Misc/System Management [8086:6f28] (rev 01) ff:0f.4 System peripheral [0880]: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent [8086:6ffc] (rev 01) ff:0f.5 System peripheral [0880]: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent [8086:6ffd] (rev 01) ff:12.0 System peripheral [0880]: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Home Agent 0 [8086:6fa0] (rev 01) ff:13.0 System peripheral [0880]: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Target Address/Thermal/RAS [8086:6fa8] (rev 01) ff:13.1 System peripheral [0880]: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Target Address/Thermal/RAS [8086:6f71] (rev 01) ff:13.2 System peripheral [0880]: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel Target Address Decoder [8086:6faa] (rev 01) ff:13.3 System peripheral [0880]: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel Target Address Decoder [8086:6fab] (rev 01) ff:13.4 System peripheral [0880]: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel Target Address Decoder [8086:6fac] (rev 01) ff:13.5 System peripheral [0880]: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel Target Address Decoder [8086:6fad] (rev 01) ff:13.7 System peripheral [0880]: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DDRIO Global Broadcast [8086:6faf] (rev 01) ff:16.0 System peripheral [0880]: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Target Address/Thermal/RAS [8086:6f68] (rev 01) Thanks, Mauro ^ permalink raw reply [flat|nested] 238+ messages in thread
* RE: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac @ 2017-07-21 16:53 ` Luck, Tony 0 siblings, 0 replies; 238+ messages in thread From: Luck, Tony @ 2017-07-21 16:53 UTC (permalink / raw) To: Mauro Carvalho Chehab Cc: Aristeu Rozanski, rjw, srinivas.pandruvada, lenb, linux-acpi, linux-edac Hmmm, so the BIOS isn't hiding any devices. Can you read the MTR registers from each of those target address decoders? for i in a b d c; do for j in 0 4 8; do setpci -d 8086:6fa$i 0x8$j.L; done; done Bit 14 is the IS_DIMM_PRESENT one. So you should see values like 001c5050 for populated slots. I see 000f000c for empty slots. -Tony [If you send me "lspci -xxxx" output I can check why the driver isn't reporting the 4th channel] ^ permalink raw reply [flat|nested] 238+ messages in thread
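A small sketch of how the values read back by setpci could be decoded, using only what the message above states: bit 14 of an MTR register is IS_DIMM_PRESENT. The function names are hypothetical; the two sample values are the ones quoted in the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 14 of an MTR register is IS_DIMM_PRESENT, per the message above. */
#define MTR_DIMM_PRESENT_BIT 14

/* Returns 1 if the MTR value reports a populated slot, 0 otherwise. */
int mtr_dimm_present(uint32_t mtr)
{
    return (int)((mtr >> MTR_DIMM_PRESENT_BIT) & 1u);
}

/* Count populated slots among the MTR values read for one channel. */
size_t count_populated(const uint32_t *mtr, size_t n)
{
    size_t populated = 0;

    assert(mtr != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        populated += (size_t)mtr_dimm_present(mtr[i]);
    return populated;
}
```

For the quoted samples, 001c5050 has bit 14 set (populated) and 000f000c has it clear (empty), so a channel returning those two values plus another 000f000c would count one populated slot.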
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac 2017-07-18 21:15 ` [PATCH 3/3] " Mauro Carvalho Chehab (?) @ 2017-07-19 16:40 ` Kani, Toshimitsu -1 siblings, 0 replies; 238+ messages in thread From: Kani, Toshimitsu @ 2017-07-19 16:40 UTC (permalink / raw) To: mchehab Cc: linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, bp, tony.luck, lenb, linux-acpi, linux-edac On Tue, 2017-07-18 at 18:15 -0300, Mauro Carvalho Chehab wrote: > Em Tue, 18 Jul 2017 19:58:54 +0000 : > We had a similar discussion several years ago when I wrote this > driver. On that time, I talked with Red Hat, HP, Dell, Intel people > and with some customers with large clusters. > > The way it is, ghes_edac is a poor man's driver. What it hopefully > provide is a detection that an error happened, without really telling > the user what component should be replaced. "poor man's driver" is a bit misleading, but yes, firmware-first platforms have RAS features built into the platforms, and they do not need intelligence in EDAC drivers, which may conflict with the platform's RAS features. I cannot speak for other vendors, but HPE platforms log errors and provide FRU info. ghes_edac allows errors to be reported to OS management tools like rasdaemon in addition to platform-specific management. > Ok, on machines with their own error reporting mechanism (like > HP servers), a sys admin can look on some proprietary software > (or bios), in order to identify what happened. > > Yet, BIOS doesn't provide any glue about what's the memory > architecture, as it maps memory as if it was a single DIMM memory: > > (from ghes_edac_register) > > layers[0].type = EDAC_MC_LAYER_ALL_MEM; > layers[0].size = num_dimm; > layers[0].is_virt_csrow = true; > > So, even on systems where the BIOS actually knows how the memory > cards are wired, it will mask the memory controller data. > > Now, the EDAC driver can also be used to identify what > channels are used. 
That helps the sys admin to know if the > memories are connected in a way that it will be using multiple > channels, or not, helping to setup the machine to obtain > the maximum possible performance. > > So, for example, on my Intel-based HP server, I can check > such info with: > > $ ras-mc-ctl --mainboard > ras-mc-ctl: mainboard: HP model ProLiant ML350 Gen9 > $ ras-mc-ctl --layout > +-----------------------------------------------------------------------+ > | mc0 | mc1 | > | channel0 | channel1 | channel2 | channel0 | channel1 | channel2 | > -------+-----------------------------------------------------------------------+ > slot2: | 0 MB | 0 MB | 0 MB | 0 MB | 0 MB | 0 MB | > slot1: | 0 MB | 0 MB | 0 MB | 0 MB | 0 MB | 0 MB | > slot0: | 16384 MB | 0 MB | 16384 MB | 16384 MB | 0 MB | 16384 MB | > -------+---------------------------------------------------------------------------+ > > So, I know that both CPUs will be connected to my memories, and, > on both, it is using 2 channels. > > If I was using the ghes driver, that information would be hidden. > > So, due to all problems with ghes, it is enabled only if there are no > better solution, e. g. on systems where there's no way to talk > directly to the hardware (like on E7 Xeon machines, where the memory > controller is actually on a separate chip that are controlled only by > the BIOS). Thanks for the info! That's very helpful. I will check to see if ghes_edac provides enough info that we need. -Toshi ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
  2017-07-19 16:40 ` [PATCH 3/3] " Kani, Toshimitsu
  (?)
@ 2017-07-20  4:33 ` Borislav Petkov
  -1 siblings, 0 replies; 238+ messages in thread
From: Borislav Petkov @ 2017-07-20 4:33 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: mchehab, linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada,
      tony.luck, lenb, linux-acpi, linux-edac

On Wed, Jul 19, 2017 at 04:40:25PM +0000, Kani, Toshimitsu wrote:
> ghes_edac allows to report errors to OS management tools like
> rasdaemon in addition to platform-specific managements.

So ghes_edac *is* a poor man's driver in the sense that it doesn't do
anything fancy but repeat like a parrot data it has gotten from the
firmware and shoving it into the EDAC counters. At least that's the
intention. Nothing more.

All the action stuff like error detection and recovery should be done by
the firmware.

But considering how SNAFU'd firmware is, I wouldn't expect any great RAS
functionality there. Of course, I'd be delighted to be proven wrong.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
  2017-07-20  4:33 ` [PATCH 3/3] " Borislav Petkov
  (?)
@ 2017-07-20 19:50 ` Kani, Toshimitsu
  -1 siblings, 0 replies; 238+ messages in thread
From: Kani, Toshimitsu @ 2017-07-20 19:50 UTC (permalink / raw)
  To: bp
  Cc: linux-kernel, mchehab, tglx, mchehab, rjw, srinivas.pandruvada,
      tony.luck, lenb, linux-acpi, linux-edac

On Thu, 2017-07-20 at 06:33 +0200, Borislav Petkov wrote:
> On Wed, Jul 19, 2017 at 04:40:25PM +0000, Kani, Toshimitsu wrote:
> > ghes_edac allows to report errors to OS management tools like
> > rasdaemon in addition to platform-specific managements.
>
> So ghes_edac *is* a poor man's driver in the sense that it doesn't do
> anything fancy but repeat like a parrot data it has gotten from the
> firmware and shoving it into the EDAC counters. At least that's the
> intention. Nothing more.

Right for ghes_edac.

> All the action stuff like error detection and recovery should be done
> by the firmware.

GHES / firmware-first still requires OS recovery actions when an error
cannot be corrected by the platform. They are handled by ghes_proc(),
and ghes_edac remains its error-reporting wrapper.

> But considering how SNAFU'd firmware is, I wouldn't expect any great
> RAS functionality there. Of course, I'd be delighted to be proven
> wrong.

Firmware has better knowledge about the platform and can provide better
RAS when implemented properly. I agree that user experiences may vary
on platforms.

Thanks,
-Toshi

^ permalink raw reply [flat|nested] 238+ messages in thread
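The division of labor described in the message above (reporting on one side, OS recovery for errors the platform could not correct on the other) can be pictured as a dispatch on severity. The following is a toy userspace model under stated assumptions, not the kernel's ghes_proc(): the enums, the `struct counters` type, the function names, and the specific actions chosen per severity are all hypothetical.

```c
/* Hypothetical severities and OS-side actions for this model only. */
enum sev { SEV_CORRECTED, SEV_RECOVERABLE, SEV_FATAL };
enum action { ACT_NONE, ACT_OFFLINE_PAGE, ACT_PANIC };

/* Corrected-error and uncorrected-error counters, EDAC style. */
struct counters {
	unsigned long ce;
	unsigned long ue;
};

/* Reporting step (the ghes_edac role in this model): count, nothing more. */
void report_error(struct counters *c, enum sev s)
{
	if (s == SEV_CORRECTED)
		c->ce++;
	else
		c->ue++;
}

/* Recovery step (the OS role in this model): pick an action by severity. */
enum action recover(enum sev s)
{
	switch (s) {
	case SEV_CORRECTED:
		return ACT_NONE;         /* platform already fixed it */
	case SEV_RECOVERABLE:
		return ACT_OFFLINE_PAGE; /* assumed OS action */
	case SEV_FATAL:
		return ACT_PANIC;
	}
	return ACT_PANIC;                /* unknown severity: be conservative */
}

/* One error record: always reported, then handed to recovery. */
enum action process_error(struct counters *c, enum sev s)
{
	report_error(c, s);
	return recover(s);
}
```

Keeping `report_error()` free of any decision logic mirrors the "repeat what the firmware said" role discussed in the thread; all policy sits in `recover()`.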
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
  2017-07-20 19:50 ` [PATCH 3/3] " Kani, Toshimitsu
  (?)
@ 2017-07-20 20:15 ` Mauro Carvalho Chehab
  -1 siblings, 0 replies; 238+ messages in thread
From: Mauro Carvalho Chehab @ 2017-07-20 20:15 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: bp, linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada,
      tony.luck, lenb, linux-acpi, linux-edac

On Thu, 20 Jul 2017 19:50:03 +0000,
"Kani, Toshimitsu" <toshi.kani@hpe.com> wrote:

> On Thu, 2017-07-20 at 06:33 +0200, Borislav Petkov wrote:
> > On Wed, Jul 19, 2017 at 04:40:25PM +0000, Kani, Toshimitsu wrote:
> > > ghes_edac allows to report errors to OS management tools like
> > > rasdaemon in addition to platform-specific managements.
> >
> > So ghes_edac *is* a poor man's driver in the sense that it doesn't do
> > anything fancy but repeat like a parrot data it has gotten from the
> > firmware and shoving it into the EDAC counters. At least that's the
> > intention. Nothing more.
>
> Right for ghes_edac.
>
> > All the action stuff like error detection and recovery should be done
> > by the firmware.
>
> GHES / firmware-first still requires OS recovery actions when an error
> cannot be corrected by the platform. They are handled by ghes_proc(),
> and ghes_edac remains its error-reporting wrapper.
>
> > But considering how SNAFU'd firmware is, I wouldn't expect any great
> > RAS functionality there. Of course, I'd be delighted to be proven
> > wrong.
>
> Firmware has better knowledge about the platform and can provide better
> RAS when implemented properly. I agree that user experiences may vary
> on platforms.

It may have a better knowledge, when the vendor ships different BIOS
for platforms with different motherboard silkscreens, but a lot of
vendors just use the same BIOS on different models, with the same
information at "Locator" and "Bank Locator" data at DMI tables,
that don't match what's printed at the board's silkscreen.

So, GHES ends by exposing wrong data. Also, such BIOS fail
to properly expose such knowledge to drivers/userspace.

On the discussions I had with HP, back in 2012, the idea was to try
to have some sort of way for the GHES driver to query the BIOS
on a reliable way, in order to get its layout, in a way
that tools like ras-mc-ctl would properly report the memory
configuration (with --layout) and the motherboard silkscreen
labels (with --print-labels). Unfortunately, at least on that
time, the discussions with HP didn't proceed.

Thanks,
Mauro

^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
  2017-07-20 20:15 ` [PATCH 3/3] " Mauro Carvalho Chehab
  (?)
@ 2017-07-20 21:07 ` Kani, Toshimitsu
  -1 siblings, 0 replies; 238+ messages in thread
From: Kani, Toshimitsu @ 2017-07-20 21:07 UTC (permalink / raw)
  To: mchehab
  Cc: linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, bp,
      tony.luck, lenb, linux-acpi, linux-edac

On Thu, 2017-07-20 at 17:15 -0300, Mauro Carvalho Chehab wrote:
> On Thu, 20 Jul 2017 19:50:03 +0000,
> "Kani, Toshimitsu" <toshi.kani@hpe.com> wrote:
 :
> > Firmware has better knowledge about the platform and can provide
> > better RAS when implemented properly. I agree that user
> > experiences may vary on platforms.
>
> It may have a better knowledge, when the vendor ships different BIOS
> for platforms with different motherboard silkscreens, but a lot of
> vendors just use the same BIOS on different models, with the same
> information at "Locator" and "Bank Locator" data at DMI tables,
> that don't match what's printed at the board's silkscreen.
>
> So, GHES ends by exposing wrong data. Also, such BIOS fail
> to properly expose such knowledge to drivers/userspace.

I see. Yeah, I can see such problems could be overlooked since normal
tests run just fine even if there is a mismatch in such info...

> On the discussions I had with HP, back in 2012, the idea was to try
> to have some sort of way for the GHES driver to query the BIOS
> on a reliable way, in order to get its layout, in a way
> that tools like ras-mc-ctl would properly report the memory
> configuration (with --layout) and the motherboard silkscreen
> labels (with --print-labels). Unfortunately, at least on that
> time, the discussions with HP didn't proceed.

Thanks for the info. I hope we can enable it this time around.

-Toshi

^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
  2017-07-20 19:50 ` [PATCH 3/3] " Kani, Toshimitsu
  (?)
@ 2017-07-21 13:34 ` Borislav Petkov
  -1 siblings, 0 replies; 238+ messages in thread
From: Borislav Petkov @ 2017-07-21 13:34 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: linux-kernel, mchehab, tglx, mchehab, rjw, srinivas.pandruvada,
      tony.luck, lenb, linux-acpi, linux-edac

On Thu, Jul 20, 2017 at 07:50:03PM +0000, Kani, Toshimitsu wrote:
> GHES / firmware-first still requires OS recovery actions when an error
> cannot be corrected by the platform. They are handled by ghes_proc(),
> and ghes_edac remains its error-reporting wrapper.

I mean all the recovery actions the firmware does because it gets to see
the error first. Otherwise, Firmware First is the dumbest repeater
layer in the history of layers.

> Firmware has better knowledge about the platform and can provide better
> RAS when implemented properly.

s/when/if/

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
  2017-07-21 13:34 ` [PATCH 3/3] " Borislav Petkov
  (?)
@ 2017-07-21 13:40 ` Mauro Carvalho Chehab
  -1 siblings, 0 replies; 238+ messages in thread
From: Mauro Carvalho Chehab @ 2017-07-21 13:40 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Kani, Toshimitsu, linux-kernel, tglx, mchehab, rjw,
      srinivas.pandruvada, tony.luck, lenb, linux-acpi, linux-edac

On Fri, 21 Jul 2017 15:34:41 +0200,
Borislav Petkov <bp@alien8.de> wrote:

> On Thu, Jul 20, 2017 at 07:50:03PM +0000, Kani, Toshimitsu wrote:
> > GHES / firmware-first still requires OS recovery actions when an error
> > cannot be corrected by the platform. They are handled by ghes_proc(),
> > and ghes_edac remains its error-reporting wrapper.

What happens when the error can be corrected? Does it still report it to
userspace, or just silently hide the error?

If I remember well about a past discussion with some vendor, I was told
that the firmware can hide some errors from being reported. Is it
still the case?

Thanks,
Mauro

^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
  2017-07-21 13:40 ` [PATCH 3/3] " Mauro Carvalho Chehab
  (?)
@ 2017-07-21 13:47 ` Borislav Petkov
  -1 siblings, 0 replies; 238+ messages in thread
From: Borislav Petkov @ 2017-07-21 13:47 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Kani, Toshimitsu, linux-kernel, tglx, mchehab, rjw,
      srinivas.pandruvada, tony.luck, lenb, linux-acpi, linux-edac

On Fri, Jul 21, 2017 at 10:40:01AM -0300, Mauro Carvalho Chehab wrote:
> What happens when the error can be corrected? Does it still report it to
> userspace, or just silently hide the error?
>
> If I remember well about a past discussion with some vendor, I was told
> that the firmware can hide some errors from being reported. Is it
> still the case?

I've heard the same thing but I have no idea what they're actually
doing. But it would make sense because the intention is not to worry
users unnecessarily if it can hide the error and if there are no adverse
consequences from it.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
  2017-07-21 13:47 ` [PATCH 3/3] " Borislav Petkov
  (?)
@ 2017-07-21 15:08 ` Kani, Toshimitsu
  -1 siblings, 0 replies; 238+ messages in thread
From: Kani, Toshimitsu @ 2017-07-21 15:08 UTC (permalink / raw)
  To: mchehab, bp
  Cc: linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, tony.luck,
      lenb, linux-acpi, linux-edac

On Fri, 2017-07-21 at 15:47 +0200, Borislav Petkov wrote:
> On Fri, Jul 21, 2017 at 10:40:01AM -0300, Mauro Carvalho Chehab
> wrote:
> > What happens when the error can be corrected? Does it still report
> > it to userspace, or just silently hide the error?
> >
> > If I remember well about a past discussion with some vendor, I was
> > told that the firmware can hide some errors from being reported. Is
> > it still the case?
>
> I've heard the same thing but I have no idea what they're actually
> doing. But it would make sense because the intention is not to worry
> users unnecessarily if it can hide the error and if there are no
> adverse consequences from it.

Yes, that is correct. Corrected errors are reported to the OS when they
exceeded the platform's threshold.

Thanks,
-Toshi

^ permalink raw reply [flat|nested] 238+ messages in thread
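The thresholding behaviour described in the message above can be pictured with a small model: the platform counts corrected errors per DIMM and forwards a record to the OS only once the count goes past a limit. This is a sketch under assumptions, not a description of any vendor's firmware: the threshold value, the restart-after-report policy, and all names are hypothetical, and the thread does not state how real platforms implement this.

```c
#include <stdbool.h>

/* Hypothetical limit; real platforms choose their own value. */
#define CE_THRESHOLD 3u

/* Per-DIMM corrected-error count kept by the (modelled) firmware. */
struct dimm_state {
	unsigned int ce_count;
};

/*
 * Record one corrected error on a DIMM.
 * Returns true when the error should be forwarded to the OS, i.e. when
 * the running count has exceeded CE_THRESHOLD.
 */
bool firmware_log_ce(struct dimm_state *d)
{
	d->ce_count++;
	if (d->ce_count <= CE_THRESHOLD)
		return false;   /* absorbed by the platform, OS never sees it */

	d->ce_count = 0;        /* assumed policy: restart counting after a report */
	return true;
}
```

One consequence worth noting in this model: the corrected-error counters on the OS side only reflect the forwarded records, so they undercount what the hardware actually saw.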
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac 2017-07-21 15:08 ` [PATCH 3/3] " Kani, Toshimitsu (?) @ 2017-07-21 15:13 ` Borislav Petkov -1 siblings, 0 replies; 238+ messages in thread From: Borislav Petkov @ 2017-07-21 15:13 UTC (permalink / raw) To: Kani, Toshimitsu Cc: mchehab, linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, tony.luck, lenb, linux-acpi, linux-edac On Fri, Jul 21, 2017 at 03:08:41PM +0000, Kani, Toshimitsu wrote: > Yes, that is correct. Corrected errors are reported to the OS when > they exceeded the platform's threshold. Are those thresholds user-configurable? If not, what are you telling users who want to see *every* corrected error for measuring DIMM wear and so on...? -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. -- ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac 2017-07-21 15:13 ` [PATCH 3/3] " Borislav Petkov (?) @ 2017-07-21 15:34 ` Kani, Toshimitsu -1 siblings, 0 replies; 238+ messages in thread From: Kani, Toshimitsu @ 2017-07-21 15:34 UTC (permalink / raw) To: bp Cc: linux-kernel, mchehab, tglx, mchehab, rjw, srinivas.pandruvada, tony.luck, lenb, linux-acpi, linux-edac On Fri, 2017-07-21 at 17:13 +0200, Borislav Petkov wrote: > On Fri, Jul 21, 2017 at 03:08:41PM +0000, Kani, Toshimitsu wrote: > > Yes, that is correct. Corrected errors are reported to the OS when > > they exceeded the platform's threshold. > > Are those thresholds user-configurable? I suppose it'd depend on the vendor, but I do not think users can set them properly unless they have in-depth knowledge of the hardware. > If not, what are you telling users who want to see *every* corrected > error for measuring DIMM wear and so on...? Corrected errors are normal and expected to occur on healthy hardware. They do not need the user's attention until they occur repeatedly in the same place. Thanks, -Toshi ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac 2017-07-21 15:34 ` [PATCH 3/3] " Kani, Toshimitsu (?) @ 2017-07-21 15:44 ` Mauro Carvalho Chehab -1 siblings, 0 replies; 238+ messages in thread From: Mauro Carvalho Chehab @ 2017-07-21 15:44 UTC (permalink / raw) To: Kani, Toshimitsu Cc: bp, linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, tony.luck, lenb, linux-acpi, linux-edac On Fri, 21 Jul 2017 15:34:50 +0000 "Kani, Toshimitsu" <toshi.kani@hpe.com> wrote: > On Fri, 2017-07-21 at 17:13 +0200, Borislav Petkov wrote: > > On Fri, Jul 21, 2017 at 03:08:41PM +0000, Kani, Toshimitsu wrote: > > > Yes, that is correct. Corrected errors are reported to the OS when > > > they exceeded the platform's threshold. > > > > Are those thresholds user-configurable? > > I suppose it'd depend on the vendor, but I do not think users can set > them properly unless they have in-depth knowledge of the hardware. > > > If not, what are you telling users who want to see *every* corrected > > error for measuring DIMM wear and so on...? > > Corrected errors are normal and expected to occur on healthy hardware. > They do not need the user's attention until they occur repeatedly in > the same place. Yes, they're expected to happen. Still, some sysadmins have their own measure of what's "normal" for their scenario, and want to monitor every single corrected error, running their own algorithm to warn if the number of corrected errors is above their "normal" rate. Thanks, Mauro ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac 2017-07-21 15:44 ` [PATCH 3/3] " Mauro Carvalho Chehab (?) @ 2017-07-21 16:40 ` Kani, Toshimitsu -1 siblings, 0 replies; 238+ messages in thread From: Kani, Toshimitsu @ 2017-07-21 16:40 UTC (permalink / raw) To: mchehab Cc: linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, bp, tony.luck, lenb, linux-acpi, linux-edac On Fri, 2017-07-21 at 12:44 -0300, Mauro Carvalho Chehab wrote: > On Fri, 21 Jul 2017 15:34:50 +0000 > "Kani, Toshimitsu" <toshi.kani@hpe.com> wrote: > > > On Fri, 2017-07-21 at 17:13 +0200, Borislav Petkov wrote: > > > On Fri, Jul 21, 2017 at 03:08:41PM +0000, Kani, Toshimitsu > > > wrote: > > > > Yes, that is correct. Corrected errors are reported to the OS > > > > when they exceeded the platform's threshold. > > > > > > Are those thresholds user-configurable? > > > > I suppose it'd depend on the vendor, but I do not think users can > > set them properly unless they have in-depth knowledge of the > > hardware. > > > > > If not, what are you telling users who want to see *every* > > > corrected error for measuring DIMM wear and so on...? > > > > Corrected errors are normal and expected to occur on healthy > > hardware. They do not need the user's attention until they occur > > repeatedly in the same place. > > Yes, they're expected to happen. Still, some sysadmins have their > own measure of what's "normal" for their scenario, and want > to monitor every single corrected error, running their own > algorithm to warn if the number of corrected errors is above their > "normal" rate. I suppose these admins had to do it because their platforms reported all corrected errors. Firmware thresholding addresses that burden for such administrators. Thanks, -Toshi ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac 2017-07-21 16:40 ` [PATCH 3/3] " Kani, Toshimitsu (?) @ 2017-07-21 17:01 ` Mauro Carvalho Chehab -1 siblings, 0 replies; 238+ messages in thread From: Mauro Carvalho Chehab @ 2017-07-21 17:01 UTC (permalink / raw) To: Kani, Toshimitsu Cc: linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, bp, tony.luck, lenb, linux-acpi, linux-edac On Fri, 21 Jul 2017 16:40:20 +0000 "Kani, Toshimitsu" <toshi.kani@hpe.com> wrote: > On Fri, 2017-07-21 at 12:44 -0300, Mauro Carvalho Chehab wrote: > > On Fri, 21 Jul 2017 15:34:50 +0000 > > "Kani, Toshimitsu" <toshi.kani@hpe.com> wrote: > > > > > On Fri, 2017-07-21 at 17:13 +0200, Borislav Petkov wrote: > > > > On Fri, Jul 21, 2017 at 03:08:41PM +0000, Kani, Toshimitsu > > > > wrote: > > > > > Yes, that is correct. Corrected errors are reported to the OS > > > > > when they exceeded the platform's threshold. > > > > > > > > Are those thresholds user-configurable? > > > > > > I suppose it'd depend on the vendor, but I do not think users can > > > set them properly unless they have in-depth knowledge of the > > > hardware. > > > > > > > If not, what are you telling users who want to see *every* > > > > corrected error for measuring DIMM wear and so on...? > > > > > > Corrected errors are normal and expected to occur on healthy > > > hardware. They do not need the user's attention until they occur > > > repeatedly in the same place. > > > > Yes, they're expected to happen. Still, some sysadmins have their > > own measure of what's "normal" for their scenario, and want > > to monitor every single corrected error, running their own > > algorithm to warn if the number of corrected errors is above their > > "normal" rate. > > I suppose these admins had to do it because their platforms reported > all corrected errors. Firmware thresholding addresses that burden for > such administrators. 
I see the value of having a threshold in the BIOS, provided that it is well documented and that its value can be adjusted if needed. One of the things I wanted to implement in ras-daemon was an algorithm that would do such thresholding in software. The problem is that it would require field experience. So I talked with a few vendors to see if they could help with it but, at that time, none raised their hands :-) The thing with a BIOS threshold is that the user has no way to audit the algorithm. So, when the BIOS starts reporting such errors, it may already be too late: the system may be on the verge of losing data (or some data may already have been lost). That's critical on cluster systems with thousands of machines: while the impact of disabling a cluster node to do some maintenance is marginal, the impact of an uncorrected error on a single machine may compromise weeks of expensive processing. That's why some users prefer to monitor every single corrected error and compare the observed rate against a probability distribution for which they know the risk of uncorrected errors is acceptable. Thanks, Mauro ^ permalink raw reply [flat|nested] 238+ messages in thread
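For illustration only, the kind of software threshold discussed above could be sketched as a per-DIMM leaky bucket. This is a minimal sketch under stated assumptions, not code from ras-daemon, ghes_edac, or any vendor firmware: the names ce_bucket, ce_bucket_init() and ce_bucket_record(), the capacity, and the leak interval are all hypothetical, and picking sensible values is exactly the field-experience problem mentioned in the message.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical per-DIMM leaky bucket for corrected errors (CEs).
 * Nothing here is taken from ras-daemon or the kernel; it only
 * illustrates thresholding in software instead of in the BIOS.
 */
struct ce_bucket {
	uint64_t last_ts;   /* time of the last leak accounting, in seconds */
	uint32_t level;     /* CEs currently held in the bucket */
	uint32_t capacity;  /* warn once level exceeds this value */
	uint32_t leak_secs; /* one CE leaks out every leak_secs seconds */
};

static void ce_bucket_init(struct ce_bucket *b, uint32_t capacity,
			   uint32_t leak_secs)
{
	b->last_ts = 0;
	b->level = 0;
	b->capacity = capacity;
	b->leak_secs = leak_secs ? leak_secs : 1; /* avoid division by zero */
}

/*
 * Record one corrected error observed at time 'now' (seconds).
 * Returns true when the bucket level is above the configured capacity,
 * i.e. when the CE rate is higher than the admin's "normal" rate.
 */
static bool ce_bucket_record(struct ce_bucket *b, uint64_t now)
{
	if (now > b->last_ts) {
		uint64_t leaked = (now - b->last_ts) / b->leak_secs;

		if (leaked >= b->level) {
			/* Bucket drained completely; restart the clock. */
			b->level = 0;
			b->last_ts = now;
		} else if (leaked > 0) {
			b->level -= (uint32_t)leaked;
			b->last_ts += leaked * b->leak_secs;
		}
	}

	if (b->level < UINT32_MAX)
		b->level++;

	return b->level > b->capacity;
}
```

With a capacity of 3 and a leak interval of one hour, four errors within a few seconds would trip the warning, while four errors spread over several hours would not. Unlike a firmware threshold, both parameters stay visible to, and adjustable by, the administrator.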
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac 2017-07-21 17:01 ` [PATCH 3/3] " Mauro Carvalho Chehab (?) @ 2017-07-21 17:21 ` Kani, Toshimitsu -1 siblings, 0 replies; 238+ messages in thread From: Kani, Toshimitsu @ 2017-07-21 17:21 UTC (permalink / raw) To: mchehab Cc: linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, bp, tony.luck, lenb, linux-acpi, linux-edac On Fri, 2017-07-21 at 14:01 -0300, Mauro Carvalho Chehab wrote: > On Fri, 21 Jul 2017 16:40:20 +0000 > "Kani, Toshimitsu" <toshi.kani@hpe.com> wrote: > > > On Fri, 2017-07-21 at 12:44 -0300, Mauro Carvalho Chehab wrote: > > > On Fri, 21 Jul 2017 15:34:50 +0000 > > > "Kani, Toshimitsu" <toshi.kani@hpe.com> wrote: > > > > > > > On Fri, 2017-07-21 at 17:13 +0200, Borislav Petkov wrote: > > > > > On Fri, Jul 21, 2017 at 03:08:41PM +0000, Kani, Toshimitsu > > > > > wrote: > > > > > > Yes, that is correct. Corrected errors are reported to the > > > > > > OS when they exceeded the platform's threshold. > > > > > > > > > > Are those thresholds user-configurable? > > > > > > > > I suppose it'd depend on the vendor, but I do not think users > > > > can set them properly unless they have in-depth knowledge of > > > > the hardware. > > > > > > > > > If not, what are you telling users who want to see *every* > > > > > corrected error for measuring DIMM wear and so on...? > > > > > > > > Corrected errors are normal and expected to occur on healthy > > > > hardware. They do not need the user's attention until they > > > > occur repeatedly in the same place. > > > > > > Yes, they're expected to happen. Still, some sysadmins have > > > their own measure of what's "normal" for their scenario, > > > and want to monitor every single corrected error, running their > > > own algorithm to warn if the number of corrected errors is above > > > their "normal" rate. > > > > I suppose these admins had to do it because their platforms > > reported all corrected errors. 
Firmware thresholding addresses that burden for > > such administrators. > > I see the value of having a threshold in the BIOS, provided that it > is well documented and that its value can be adjusted if needed. > > One of the things I wanted to implement in ras-daemon was an > algorithm that would do such thresholding in software. > The problem is that it would require field experience. So > I talked with a few vendors to see if they could help with > it but, at that time, none raised their hands :-) I think it'd be very hard to keep it up to date. > The thing with a BIOS threshold is that the user has no way to > audit the algorithm. So, when the BIOS starts reporting such errors, > it may already be too late: the system may be on the verge of > losing data (or some data may already have been lost). > > That's critical on cluster systems with thousands of machines: > while the impact of disabling a cluster node to do some maintenance > is marginal, the impact of an uncorrected error on a single > machine may compromise weeks of expensive processing. > > That's why some users prefer to monitor every single corrected > error and compare the observed rate against a probability > distribution for which they know the risk of uncorrected errors is > acceptable. Right, I do not think all platforms need to be firmware-first. I do not want to talk like a salesperson, but we also offer lower-cost platforms that do not come with built-in RAS. Users can choose the right model for their needs. Thanks, -Toshi ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
From: Borislav Petkov @ 2017-07-21 17:23 UTC
To: Mauro Carvalho Chehab
Cc: Kani, Toshimitsu, linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, tony.luck, lenb, linux-acpi, linux-edac

On Fri, Jul 21, 2017 at 02:01:31PM -0300, Mauro Carvalho Chehab wrote:
> I see the value of having a threshold in BIOS, provided that it is
> well documented, and whose value can be adjusted, if needed.
>
> One of the things I wanted to implement in ras-daemon was an
> algorithm that would be doing such threshold in software.

We have that now in the kernel:

drivers/ras/cec.c

We did it exactly for that purpose - not upsetting users unnecessarily.

> The thing with a BIOS threshold is that the user has no way to
> audit the algorithm. So, when BIOS starts reporting such errors,
> it may be already too late: the systems may be on the verge of
> losing data (or some data was already lost).

Not only that: thresholds depend on the DIMM types, which means BIOS
must know what DIMM types are in there, which I doubt.

So exposing that to configuration instead of "deciding" for people would
be better.

> That's critical on cluster systems with thousands of machines:
> while the impact of disabling a cluster node to do some maintenance
> is marginal, the impact of an uncorrected error on a single
> machine may compromise weeks of expensive processing.
>
> That's why some users prefer to monitor every single corrected
> error, and compare with the probability distribution they
> know that the risk of uncorrected errors is acceptable.

Yap, you need to have stuff like that configurable - BIOS can't predict
all possible use cases.

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--
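[Editorial illustration, not part of the thread.] The software thresholding discussed in the message above can be sketched roughly as follows. This is a minimal userspace sketch in C under stated assumptions, and it is not the code in drivers/ras/cec.c: the names ce_tracker, ce_record() and ce_decay(), the constants CE_SLOTS and CE_THRESHOLD, and the eviction and decay policies are all hypothetical. It only shows the general idea of counting corrected errors per page, aging the counts, and notifying the user once a page crosses a limit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CE_SLOTS      64  /* hypothetical table size */
#define CE_THRESHOLD   4  /* hypothetical per-page limit, assumed > 1 */

/* One tracked page: page frame number plus its corrected-error count. */
struct ce_entry {
	uint64_t pfn;
	unsigned int count;
	int used;
};

struct ce_tracker {
	struct ce_entry slots[CE_SLOTS];
};

static void ce_init(struct ce_tracker *t)
{
	assert(t != NULL);
	memset(t, 0, sizeof(*t));
}

/*
 * Record one corrected error for @pfn.
 * Returns 1 when the page has reached CE_THRESHOLD (the caller would then
 * act, e.g. warn the admin), 0 otherwise. A reported page is forgotten.
 */
static int ce_record(struct ce_tracker *t, uint64_t pfn)
{
	struct ce_entry *free_slot = NULL;
	size_t i;

	assert(t != NULL);

	for (i = 0; i < CE_SLOTS; i++) {
		struct ce_entry *e = &t->slots[i];

		if (e->used && e->pfn == pfn) {
			if (++e->count >= CE_THRESHOLD) {
				e->used = 0;
				return 1;
			}
			return 0;
		}
		if (!e->used && !free_slot)
			free_slot = e;
	}

	/* Table full: overwrite slot 0 (deliberately simplistic eviction). */
	if (!free_slot)
		free_slot = &t->slots[0];

	free_slot->pfn = pfn;
	free_slot->count = 1;
	free_slot->used = 1;
	return 0;
}

/* Periodic aging: halve every count so that old, isolated errors fade. */
static void ce_decay(struct ce_tracker *t)
{
	size_t i;

	assert(t != NULL);

	for (i = 0; i < CE_SLOTS; i++) {
		struct ce_entry *e = &t->slots[i];

		if (!e->used)
			continue;
		e->count /= 2;
		if (e->count == 0)
			e->used = 0;
	}
}
```

A caller would invoke ce_record() for each corrected-error event and ce_decay() from a periodic timer. Making CE_THRESHOLD and the decay interval run-time tunables is one way to get the configurability argued for in this thread, since the right values depend on the workload and the DIMMs in use.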
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
From: Kani, Toshimitsu @ 2017-07-21 18:38 UTC
To: mchehab, bp
Cc: linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, tony.luck, lenb, linux-acpi, linux-edac

On Fri, 2017-07-21 at 19:23 +0200, Borislav Petkov wrote:
 :
> Not only that: thresholds depend on the DIMM types, which means BIOS
> must know what DIMM types are in there, which I doubt.

BIOS knows the DIMM model from the SPD data.

> So exposing that to configuration instead of "deciding" for people
> would be better.

Enterprise platforms have a very different model (I do not say it's
better for everyone from the cost perspective). Typically, such
platform vendors work with DIMM vendors directly to come up with their
supported DIMMs with their own part numbers, which are certified for the
platforms with extensive validation testing.

Thanks,
-Toshi
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
From: Borislav Petkov @ 2017-07-22 6:28 UTC
To: Kani, Toshimitsu
Cc: mchehab, linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, tony.luck, lenb, linux-acpi, linux-edac

On Fri, Jul 21, 2017 at 06:38:52PM +0000, Kani, Toshimitsu wrote:
> Enterprise platforms have very different model (I do not say it's
> better for everyone from the cost perspective). Typically, such

But you do tell your customers that the error counts they see are not
really what *actually* happens, right?

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
From: Kani, Toshimitsu @ 2017-07-24 14:49 UTC
To: bp
Cc: linux-kernel, mchehab, tglx, mchehab, rjw, srinivas.pandruvada, tony.luck, lenb, linux-acpi, linux-edac

On Sat, 2017-07-22 at 08:28 +0200, Borislav Petkov wrote:
> On Fri, Jul 21, 2017 at 06:38:52PM +0000, Kani, Toshimitsu wrote:
> > Enterprise platforms have very different model (I do not say it's
> > better for everyone from the cost perspective). Typically, such
>
> But you do tell your customers that the error counts they see are not
> really what *actually* happens, right?

We do not tell the error counts to customers. We tell customers when
they need attention and have actionable items, and we provide support
for that. Support gets all info necessary.

There are multiple models for multiple types of customers. I am not
saying one model is better than the other.

Thanks,
-Toshi
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
From: Borislav Petkov @ 2017-07-24 15:04 UTC
To: Kani, Toshimitsu
Cc: linux-kernel, mchehab, tglx, mchehab, rjw, srinivas.pandruvada, tony.luck, lenb, linux-acpi, linux-edac

On Mon, Jul 24, 2017 at 02:49:30PM +0000, Kani, Toshimitsu wrote:
> We do not tell the error counts to customers.

Please read what I said: do you tell your customers that the error
counts they're seeing (or are *not* seeing) are bogus because the BIOS is
hiding them? Not the *actual* numbers!

> We tell customers when they need attention and have actionable items,
> and we provide support for that. Support gets all info necessary.

Ok, good to know. I'll make sure to bounce such issues to you guys in
the future.

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
From: Kani, Toshimitsu @ 2017-07-24 15:25 UTC
To: bp
Cc: linux-kernel, mchehab, tglx, mchehab, rjw, srinivas.pandruvada, tony.luck, lenb, linux-acpi, linux-edac

On Mon, 2017-07-24 at 17:04 +0200, Borislav Petkov wrote:
> On Mon, Jul 24, 2017 at 02:49:30PM +0000, Kani, Toshimitsu wrote:
> > We do not tell the error counts to customers.
>
> Please read what I said: do you tell your customers that the error
> counts they're seeing (or are *not* seeing) is bogus because the BIOS
> is hiding them? Not the *actual* numbers!

Customers do not see error counts. I do not think it's bogus.

This model is basically the same as your car. You do not see error
counts or periodical normal errors from all kinds of controllers in the
car while you are driving. You get an attention lamp lit when you need
to bring it to a car dealer.

> > We tell customers when they need attention and have actionable
> > items, and we provide support for that. Support gets all info
> > necessary.
>
> Ok, good to know. I'll make sure to bounce such issues to you guys in
> the future.

We've been providing this model for many years now. I am just trying
to enable OS error reporting with ghes_edac.

Thanks,
-Toshi
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
From: Borislav Petkov @ 2017-07-24 15:37 UTC
To: Kani, Toshimitsu
Cc: linux-kernel, mchehab, tglx, mchehab, rjw, srinivas.pandruvada, tony.luck, lenb, linux-acpi, linux-edac

On Mon, Jul 24, 2017 at 03:25:34PM +0000, Kani, Toshimitsu wrote:
> Customers do not see error counts. I do not think it's bogus.

Not showing the real error counts but something contrived is the
definition of bogus numbers. But you're not showing anything - only when
some thresholds are being hit.

> This model is basically the same as your car. You do not see error

Oh jeez, we're talking about cars now.

> We've been providing this model for many years now.

Dude, relax, I'm only trying to point out to you that there are
customers who want to see *every* error and thus track how their
hardware behaves. And that for those customers it is probably worth
considering exposing that info and providing a switch to disable that
dumbing of the RAS functionality in the BIOS so that people can decide
for themselves. That's all.

I'm not questioning your model - I'm just saying that it could be
improved for certain customers.

Do me a favor and this time *actually* *read* my reply.

> I am just trying to enable OS error reporting with ghes_edac.

I know, you don't have to state the obvious constantly.

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac 2017-07-24 15:37 ` [PATCH 3/3] " Borislav Petkov (?) @ 2017-07-24 15:56 ` Kani, Toshimitsu -1 siblings, 0 replies; 238+ messages in thread From: Kani, Toshimitsu @ 2017-07-24 15:56 UTC (permalink / raw) To: bp Cc: linux-kernel, mchehab, tglx, mchehab, rjw, srinivas.pandruvada, tony.luck, lenb, linux-acpi, linux-edac On Mon, 2017-07-24 at 17:37 +0200, Borislav Petkov wrote: > On Mon, Jul 24, 2017 at 03:25:34PM +0000, Kani, Toshimitsu wrote: : > > > We've been providing this model for many years now. > > Dude, relax, I'm only trying to point out to you that there are > customers who want to see *every* error and thus track how their > hardware behaves. And that for those customers it is probably worth > considering exposing that info and providing a switch to disable that > dumbing of the RAS functionality in the BIOS so that people can > decide for themselves. That's all. Yes, Mauro has already pointed this out. As I replied to him, we do have a separate series of platforms that do not have built-in RAS, and report all errors. Such customers can simply choose them. They do not need to pay for built-in RAS. The model w/ built-in RAS provides warranty & full support. As I said, it's a different model. Thanks, -Toshi ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac 2017-07-24 15:56 ` [PATCH 3/3] " Kani, Toshimitsu (?) @ 2017-07-24 16:37 ` Borislav Petkov -1 siblings, 0 replies; 238+ messages in thread From: Borislav Petkov @ 2017-07-24 16:37 UTC (permalink / raw) To: Kani, Toshimitsu Cc: linux-kernel, mchehab, tglx, mchehab, rjw, srinivas.pandruvada, tony.luck, lenb, linux-acpi, linux-edac On Mon, Jul 24, 2017 at 03:56:27PM +0000, Kani, Toshimitsu wrote: > Yes, Mauro has already pointed this out. As I replied to him, we do > have a separate series of platforms that do not have built-in RAS, and So this whitelist entry +static struct acpi_oemlist oemlist[] = { + {"HPE ", "Server ", 0, ACPI_SIG_FADT, all_versions}, + { } /* End */ +}; looks like it'll match every HP server platform not only the ones with built-in RAS. -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. -- ^ permalink raw reply [flat|nested] 238+ messages in thread
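To make the concern above concrete, here is a minimal userspace sketch of OEM-ID list matching. It is not the kernel's acpi_match_oemlist(); struct oem_entry, struct table_ids and match_oem_list() are hypothetical names for illustration only, and the space-padded strings assume the 6-byte OEM ID and 8-byte OEM Table ID fields of an ACPI table header. The predicate names mirror the enum that patch 1/3 moves out of blacklist.c.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Revision predicates, named after the enum shown in patch 1/3. */
enum rev_predicate {
	all_versions,
	less_than_or_equal,
	equal,
	greater_than_or_equal,
};

/* Hypothetical stand-in for one whitelist entry. */
struct oem_entry {
	char oem_id[7];		/* 6-byte OEM ID plus NUL */
	char oem_table_id[9];	/* 8-byte OEM Table ID plus NUL */
	uint32_t oem_revision;
	enum rev_predicate pred;
};

/* Hypothetical stand-in for the identifying fields of a table header. */
struct table_ids {
	char oem_id[7];
	char oem_table_id[9];
	uint32_t oem_revision;
};

static bool rev_matches(const struct oem_entry *e, uint32_t rev)
{
	switch (e->pred) {
	case all_versions:
		return true;
	case less_than_or_equal:
		return rev <= e->oem_revision;
	case equal:
		return rev == e->oem_revision;
	case greater_than_or_equal:
		return rev >= e->oem_revision;
	}
	return false;
}

/*
 * Return the index of the first matching entry, or -1 if none matches.
 * The list is terminated by an entry with an empty oem_id.
 */
static int match_oem_list(const struct oem_entry *list,
			  const struct table_ids *t)
{
	for (int i = 0; list[i].oem_id[0] != '\0'; i++) {
		if (memcmp(list[i].oem_id, t->oem_id, 6) != 0)
			continue;
		if (memcmp(list[i].oem_table_id, t->oem_table_id, 8) != 0)
			continue;
		if (rev_matches(&list[i], t->oem_revision))
			return i;
	}
	return -1;
}
```

With an all_versions entry, the only inputs to the decision are the two ID strings. Two platforms that report the same IDs are indistinguishable to such a list, whatever their RAS capabilities, which is the point being raised about the "HPE" / "Server" entry.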
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac 2017-07-24 16:37 ` [PATCH 3/3] " Borislav Petkov (?) @ 2017-07-24 17:44 ` Kani, Toshimitsu -1 siblings, 0 replies; 238+ messages in thread From: Kani, Toshimitsu @ 2017-07-24 17:44 UTC (permalink / raw) To: bp Cc: linux-kernel, mchehab, tglx, mchehab, rjw, srinivas.pandruvada, tony.luck, lenb, linux-acpi, linux-edac On Mon, 2017-07-24 at 18:37 +0200, Borislav Petkov wrote: > On Mon, Jul 24, 2017 at 03:56:27PM +0000, Kani, Toshimitsu wrote: > > Yes, Mauro has already pointed this out. As I replied to him, we > > do have a separate series of platforms that do not have built-in > > RAS, and > > So this whitelist entry > > +static struct acpi_oemlist oemlist[] = { > + {"HPE ", "Server ", 0, ACPI_SIG_FADT, all_versions}, > + { } /* End */ > +}; > > looks like it'll match every HP server platform not only the ones > with built-in RAS. I assumed our platforms w/o built-in RAS do not implement GHES, but I will check for sure. Also, all our previous/current platforms have "HP". Thanks, -Toshi ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac 2017-07-24 17:44 ` [PATCH 3/3] " Kani, Toshimitsu (?) @ 2017-07-24 17:50 ` Boris Petkov -1 siblings, 0 replies; 238+ messages in thread From: Boris Petkov @ 2017-07-24 17:50 UTC (permalink / raw) To: Kani, Toshimitsu Cc: linux-kernel, mchehab, tglx, mchehab, rjw, srinivas.pandruvada, tony.luck, lenb, linux-acpi, linux-edac On July 24, 2017 8:44:03 PM GMT+03:00, "Kani, Toshimitsu" <toshi.kani@hpe.com> wrote: >I assumed our platforms w/o build-in RAS do not implement GHES, If we make it a normal module, it will be decoupled from GHES and it will rely only on the whitelist to load. -- Sent from a small device: formatting sux and brevity is inevitable. ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac 2017-07-24 17:50 ` [PATCH 3/3] " Boris Petkov (?) @ 2017-07-24 17:54 ` Kani, Toshimitsu -1 siblings, 0 replies; 238+ messages in thread From: Kani, Toshimitsu @ 2017-07-24 17:54 UTC (permalink / raw) To: bp Cc: linux-kernel, mchehab, tglx, mchehab, rjw, srinivas.pandruvada, tony.luck, lenb, linux-acpi, linux-edac On Mon, 2017-07-24 at 20:50 +0300, Boris Petkov wrote: > On July 24, 2017 8:44:03 PM GMT+03:00, "Kani, Toshimitsu" <toshi.kani > @hpe.com> wrote: > > I assumed our platforms w/o build-in RAS do not implement GHES, > > If we make it a normal module, it will be decoupled from GHES and it > will rely only on the whitelist to load. Umm... I was under the impression that we are adding the OSC bit check in addition to the current GHES filtering. Thanks, -Toshi ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac 2017-07-24 17:54 ` [PATCH 3/3] " Kani, Toshimitsu (?) @ 2017-07-24 18:18 ` Borislav Petkov -1 siblings, 0 replies; 238+ messages in thread From: Borislav Petkov @ 2017-07-24 18:18 UTC (permalink / raw) To: Kani, Toshimitsu Cc: linux-kernel, mchehab, tglx, mchehab, rjw, srinivas.pandruvada, tony.luck, lenb, linux-acpi, linux-edac On Mon, Jul 24, 2017 at 05:54:52PM +0000, Kani, Toshimitsu wrote: > Umm... I was under impression that we are adding the OSC bit check in > addition to the current GHES filtering. Read the parallel subthread again. -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. -- ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac 2017-07-24 15:56 ` [PATCH 3/3] " Kani, Toshimitsu (?) @ 2017-07-24 17:56 ` Mauro Carvalho Chehab -1 siblings, 0 replies; 238+ messages in thread From: Mauro Carvalho Chehab @ 2017-07-24 17:56 UTC (permalink / raw) To: Kani, Toshimitsu Cc: bp, linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, tony.luck, lenb, linux-acpi, linux-edac Em Mon, 24 Jul 2017 15:56:27 +0000 "Kani, Toshimitsu" <toshi.kani@hpe.com> escreveu: > On Mon, 2017-07-24 at 17:37 +0200, Borislav Petkov wrote: > > On Mon, Jul 24, 2017 at 03:25:34PM +0000, Kani, Toshimitsu wrote: > : > > > > > We've been providing this model for many years now. > > > > Dude, relax, I'm only trying to point out to you that there are > > customers who want to see *every* error and thus track how their > > hardware behaves. And that for those customers it is probably worth > > considering exposing that info and providing a switch to disable that > > dumbing of the RAS functionality in the BIOS so that people can > > decide for themselves. That's all. > > Yes, Mauro has already pointed this out. As I replied to him, we do > have a separate series of platforms that do not have built-in RAS, and > report all errors. Such customers can simply choose them. They do not > need to pay for built-in RAS. That's probably too late for me as I received a new HP machine we bought just last week, but for the next time I would need to get a new hardware, what would be the non-RAS equivalent to a ML 350 G9 tower-mounted machine with two Xeon v4 CPUs and iLO? Regards, Mauro > > The model w/ built-in RAS provides warranty & full support. As I said, > it's a different model. > > Thanks, > -Toshi Thanks, Mauro ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac 2017-07-24 17:56 ` [PATCH 3/3] " Mauro Carvalho Chehab (?) @ 2017-07-24 18:12 ` Kani, Toshimitsu -1 siblings, 0 replies; 238+ messages in thread From: Kani, Toshimitsu @ 2017-07-24 18:12 UTC (permalink / raw) To: mchehab Cc: linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, bp, tony.luck, lenb, linux-acpi, linux-edac On Mon, 2017-07-24 at 14:56 -0300, Mauro Carvalho Chehab wrote: > Em Mon, 24 Jul 2017 15:56:27 +0000 : > That's probably too late for me as I received a new HP machine > we bought just last week, but for the next time I would need to > get a new hardware, what would be the non-RAS equivalent to > a ML 350 G9 tower-mounted machine with two Xeon v4 CPUs and iLO? Such servers are called "HPE Cloudline". But I think they are all rack-mounted, not tower-mounted machines. HP Inc. (which is now a separate company for consumer-oriented products) probably has such machine. Thanks, -Toshi ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac 2017-07-24 15:37 ` [PATCH 3/3] " Borislav Petkov (?) @ 2017-07-24 16:04 ` Mauro Carvalho Chehab -1 siblings, 0 replies; 238+ messages in thread From: Mauro Carvalho Chehab @ 2017-07-24 16:04 UTC (permalink / raw) To: Borislav Petkov Cc: Kani, Toshimitsu, linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, tony.luck, lenb, linux-acpi, linux-edac Em Mon, 24 Jul 2017 17:37:16 +0200 Borislav Petkov <bp@alien8.de> escreveu: > > Customers do not see error counts. I do not think it's bogus. > > I am just trying to enable OS error reporting with ghes_edac. > > I know, you don't have to state the obvious constantly. The problem I see is that, currently, users that already have EDAC enabled get the errors directly from the hardware. If the Kernel forces those users to use ghes_edac by default, they won't see the error counts anymore, but, instead, hardware reports that the memories need to be replaced. Well, if such users are handling thresholds themselves, they won't see those errors anymore, as the errors will be masked. That's a regression. So, the right solution would be to keep hardware first, while providing a modprobe parameter to let them switch to software first. Thanks, Mauro ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac 2017-07-24 16:04 ` [PATCH 3/3] " Mauro Carvalho Chehab (?) @ 2017-07-24 16:44 ` Borislav Petkov -1 siblings, 0 replies; 238+ messages in thread From: Borislav Petkov @ 2017-07-24 16:44 UTC (permalink / raw) To: Mauro Carvalho Chehab Cc: Kani, Toshimitsu, linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, tony.luck, lenb, linux-acpi, linux-edac On Mon, Jul 24, 2017 at 01:04:13PM -0300, Mauro Carvalho Chehab wrote: > If the Kernel force those users to use ghes_edac by default, > they they won't see the error counts anymore, but, instead, > hardware reports that the memories need to be replaced. This is exactly why I'm trying to load ghes_edac only on those platforms which would really want it. > So, the right solution would be to keep hardware first, but > providing a modprobe parameter to let them switch to software > first. That's exactly the issue: if we make it spec-conform and adhere to FF setting, then it'll be clean. BUT(!), we will force ghes_edac on those platforms which potentially are using the platform-specific drivers until now. Not good. If we do the whitelisting, then we're stuck with maintaining a yucky whitelist and have to keep updating ghes_edac with it. So we're basically between a rock and a hard place. If I had to choose *right* *now*, I'd probably lean slightly towards the whitelist as it won't break existing users. A big grumpfy-grumbly hmmm. :-\ -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. -- ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac 2017-07-24 16:44 ` [PATCH 3/3] " Borislav Petkov (?) @ 2017-07-24 18:10 ` Mauro Carvalho Chehab -1 siblings, 0 replies; 238+ messages in thread From: Mauro Carvalho Chehab @ 2017-07-24 18:10 UTC (permalink / raw) To: Borislav Petkov Cc: Kani, Toshimitsu, linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, tony.luck, lenb, linux-acpi, linux-edac On Mon, 24 Jul 2017 18:44:00 +0200 Borislav Petkov <bp@alien8.de> wrote: > On Mon, Jul 24, 2017 at 01:04:13PM -0300, Mauro Carvalho Chehab wrote: > > If the Kernel forces those users to use ghes_edac by default, > > they won't see the error counts anymore, but, instead, > > hardware reports that the memories need to be replaced. > > This is exactly why I'm trying to load ghes_edac only on those platforms > which would really want it. > > > So, the right solution would be to keep hardware first, but > > provide a modprobe parameter to let them switch to software > > first. > > That's exactly the issue: if we make it spec-conform and adhere to the FF > setting, then it'll be clean. BUT(!), we will force ghes_edac on those > platforms which potentially are using the platform-specific drivers > until now. Not good. > > If we do the whitelisting, then we're stuck with maintaining a yucky > whitelist and have to keep updating ghes_edac with it. Yeah, having a whitelist is a maintenance burden, but, on the other hand, I suspect that there aren't many systems that implement FF, have a reliable BIOS mapping of the MB's silkscreen and don't filter out corrected errors using some sort of undocumented mechanism. So, I guess it is doable. Another alternative, which, IMO, is better would be to add a parameter like: edac=FF - firmware first; edac=hw - hardware first; edac=auto - honors FF if set in BIOS. Otherwise, hardware first. In order to avoid regressions, and to avoid the need of a whitelist, I would keep "edac=hw" as default. 
Thanks, Mauro ^ permalink raw reply [flat|nested] 238+ messages in thread
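For readers following the thread, the three-way "edac=" parameter proposed in the message above could be sketched roughly as below. This is a userspace illustration only, not code from the patch series: the enum, the function names edac_parse_mode() and edac_use_ghes(), and the firmware_ff flag are all hypothetical, and how the kernel would actually learn whether firmware-first is set is left out.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical modes for the proposed "edac=" parameter. */
enum edac_mode {
	EDAC_MODE_HW,	/* hardware first: platform-specific EDAC driver */
	EDAC_MODE_FF,	/* firmware first: ghes_edac */
	EDAC_MODE_AUTO,	/* follow whatever the firmware declares */
};

/*
 * Parse the parameter value. Missing or unknown values fall back to
 * "hw", the default suggested in the thread, so existing users see
 * no change in behaviour.
 */
static enum edac_mode edac_parse_mode(const char *arg)
{
	if (arg && !strcmp(arg, "FF"))
		return EDAC_MODE_FF;
	if (arg && !strcmp(arg, "auto"))
		return EDAC_MODE_AUTO;
	return EDAC_MODE_HW;
}

/*
 * Decide whether ghes_edac should be used. firmware_ff is an assumed
 * input meaning "the BIOS declares firmware-first error handling".
 */
static bool edac_use_ghes(enum edac_mode mode, bool firmware_ff)
{
	switch (mode) {
	case EDAC_MODE_FF:
		return true;
	case EDAC_MODE_AUTO:
		return firmware_ff;
	case EDAC_MODE_HW:
	default:
		return false;
	}
}
```

With "hw" as the default, a firmware-first BIOS alone never switches a machine over to ghes_edac; only an explicit edac=FF or edac=auto does, which is the regression-avoidance argument made above.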
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac 2017-07-24 18:10 ` [PATCH 3/3] " Mauro Carvalho Chehab (?) @ 2017-07-24 18:30 ` Borislav Petkov -1 siblings, 0 replies; 238+ messages in thread From: Borislav Petkov @ 2017-07-24 18:30 UTC (permalink / raw) To: Mauro Carvalho Chehab Cc: Mauro Carvalho Chehab, Kani, Toshimitsu, linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, tony.luck, lenb, linux-acpi, linux-edac (Sending to your other mail address because there's some temporary resolution issue: msmtp: recipient address mchehab@s-opensource.com not accepted by the server msmtp: server message: 451 4.3.0 <mchehab@s-opensource.com>: Temporary lookup failure msmtp: could not send mail (account alien8.de from /home/boris/.msmtprc) Maybe the problem is on my end.) On Mon, Jul 24, 2017 at 03:10:13PM -0300, Mauro Carvalho Chehab wrote: > Yeah, having a whitelist is a maintenance burden, but, on > the other hand, I suspect that there aren't many systems that > implement FF, have a reliable BIOS mapping of the MB's silkscreen > and don't filter out corrected errors using some sort of > undocumented mechanism. > > So, I guess it is doable. Right, let's hope. > Another alternative, which, IMO, is better would be to add a parameter like: > > edac=FF - firmware first; > edac=hw - hardware first; > edac=auto - honors FF if set in BIOS. Otherwise, hardware first. Or maybe edac=try_FF or so. But yeah, I guess we'll need something to tell the EDAC core to try FF first. > In order to avoid regressions, and to avoid the need of a whitelist, > I would keep "edac=hw" as default. So I don't want to break existing users and thus make only explicitly known platforms load ghes_edac. In the current case, the HPE machines. All the rest will simply use the platform drivers and nothing will change for them. 
Later we'll probably need to revisit this decision but right now and with all things considered, the whitelist seems - as ugly as it is - the most workable solution for all the different use cases and machines... -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. -- ^ permalink raw reply [flat|nested] 238+ messages in thread
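The whitelist approach settled on above amounts to comparing the OEM ID and OEM Table ID from an ACPI table header against a short list of known-good platforms. A minimal userspace sketch follows, assuming the 6-byte and 8-byte space-padded field layout used by the blacklist struct in patch 1/3. The struct, the list contents and platform_on_allowlist() are illustrative names, not the interface from the series, and the single entry is only an example value.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * OEM identification as carried in an ACPI table header: a 6-byte
 * OEM ID and an 8-byte OEM Table ID, both space-padded. The extra
 * byte in each array holds the terminating NUL.
 */
struct oem_entry {
	char oem_id[7];
	char oem_table_id[9];
};

/* Illustrative list; real entries would be added by platform vendors. */
static const struct oem_entry ghes_edac_allowlist[] = {
	{ "HPE   ", "Server  " },	/* example entry only */
	{ "", "" },			/* terminator: empty OEM ID */
};

/* Return true if the given OEM ID / OEM Table ID pair is on the list. */
static bool platform_on_allowlist(const char *oem_id,
				  const char *oem_table_id)
{
	const struct oem_entry *e;

	for (e = ghes_edac_allowlist; e->oem_id[0]; e++) {
		if (!strncmp(e->oem_id, oem_id, 6) &&
		    !strncmp(e->oem_table_id, oem_table_id, 8))
			return true;
	}
	return false;
}
```

Platforms that do not match simply keep using their platform-specific EDAC driver, which is why this option does not change anything for existing users.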
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac 2017-07-24 18:30 ` [PATCH 3/3] " Borislav Petkov (?) @ 2017-07-25 23:00 ` Kani, Toshimitsu -1 siblings, 0 replies; 238+ messages in thread From: Kani, Toshimitsu @ 2017-07-25 23:00 UTC (permalink / raw) To: mchehab, bp Cc: linux-kernel, mchehab, tglx, mchehab, rjw, srinivas.pandruvada, tony.luck, lenb, linux-acpi, linux-edac On Mon, 2017-07-24 at 20:30 +0200, Borislav Petkov wrote: : > > So I don't want to break existing users and thus make only explicitly > known platforms load ghes_edac. In the current case, the HPE > machines. All the rest will simply use the platform drivers and > nothing will change for them. > > Later we'll probably need to revisit this decision but right now and > with all things considered, the whitelist seems - as ugly as it is - > the most workable solution for all the different use cases and > machines... Agreed. I will verify OEMID info of our other platforms, and add APEI OSC check before calling ghes_edac_register(). Thanks, -Toshi ^ permalink raw reply [flat|nested] 238+ messages in thread
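The plan stated above combines two conditions before ghes_edac_register() is called: the platform is on the known-good list, and the APEI _OSC check passes. A rough sketch of that gate is below, under the assumption that "APEI OSC check" means the firmware acknowledged APEI support during _OSC negotiation. The struct, its field names and ghes_edac_allowed() are hypothetical and only model the decision, not the actual registration path.

```c
#include <stdbool.h>

/* Assumed inputs to the decision; both names are hypothetical. */
struct platform_state {
	bool oem_id_known_good;	/* OEM ID / Table ID matched the list */
	bool apei_osc_acked;	/* firmware acknowledged APEI via _OSC */
};

/*
 * Both conditions must hold before the driver registers. If either
 * fails, the caller would skip registration and leave the
 * platform-specific EDAC driver in charge.
 */
static bool ghes_edac_allowed(const struct platform_state *st)
{
	return st->oem_id_known_good && st->apei_osc_acked;
}
```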
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac 2017-07-21 15:34 ` [PATCH 3/3] " Kani, Toshimitsu (?) @ 2017-07-21 15:53 ` Borislav Petkov -1 siblings, 0 replies; 238+ messages in thread From: Borislav Petkov @ 2017-07-21 15:53 UTC (permalink / raw) To: Kani, Toshimitsu Cc: linux-kernel, mchehab, tglx, mchehab, rjw, srinivas.pandruvada, tony.luck, lenb, linux-acpi, linux-edac On Fri, Jul 21, 2017 at 03:34:50PM +0000, Kani, Toshimitsu wrote: > I suppose it'd depend on vendors, but I do not think users can do it > properly unless they have in-depth knowledge about the hardware. I'm talking about a menu in the BIOS where you can set the thresholding levels on the system. Does your BIOS have that? > Corrected errors are normal and expected to occur on healthy hardware. > They do not need the user's attention until they occur repeatedly at the > same place. Apparently, you haven't been on enough maintenance calls, trying to calm down the customer about the hardware error he sees in his logs... -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. -- ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac 2017-07-21 15:53 ` [PATCH 3/3] " Borislav Petkov (?) @ 2017-07-21 16:32 ` Kani, Toshimitsu -1 siblings, 0 replies; 238+ messages in thread From: Kani, Toshimitsu @ 2017-07-21 16:32 UTC (permalink / raw) To: bp Cc: linux-kernel, mchehab, tglx, mchehab, rjw, srinivas.pandruvada, tony.luck, lenb, linux-acpi, linux-edac On Fri, 2017-07-21 at 17:53 +0200, Borislav Petkov wrote: > On Fri, Jul 21, 2017 at 03:34:50PM +0000, Kani, Toshimitsu wrote: > > I suppose it'd depend on vendors, but I do not think users can do > > it properly unless they have in-depth knowledge about the hardware. > > I'm talking about a menu in the BIOS where you can set the > thresholding levels on the system. Does your BIOS have that? No, we don't offer such settings. > > Corrected errors are normal and expected to occur on healthy > > hardware. They do not need the user's attention until they occur > > repeatedly at the same place. > > Apparently, you haven't been on enough maintenance calls, trying to > calm down the customer about the hardware error he sees in his > logs... Actually, that's why. Reporting all corrected errors makes users worry, call support, and ask to replace healthy hardware... Thanks, -Toshi ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac 2017-07-18 19:58 ` [PATCH 3/3] " Kani, Toshimitsu (?) @ 2017-07-19 5:55 ` Borislav Petkov -1 siblings, 0 replies; 238+ messages in thread From: Borislav Petkov @ 2017-07-19 5:55 UTC (permalink / raw) To: Kani, Toshimitsu Cc: tony.luck, linux-kernel, tglx, mchehab, rjw, srinivas.pandruvada, lenb, linux-acpi, linux-edac On Tue, Jul 18, 2017 at 07:58:54PM +0000, Kani, Toshimitsu wrote: > I have HPE Haswell and Skylake test systems with GHES, but they do not > hide IMCs from the OS. So, the sb_edac and skx_edac drivers get > attached on these systems when ghes_edac is disabled. That's how it is supposed to work. The platform drivers are the fallback, practically. But this is the important piece of info I was looking for - having GHES enabled in the firmware does not prevent the platform drivers from loading. But I think we have a better solution, the FF thing. > Hmm... what's the platform name of this box? I can look into this case > if you need. You can but that's not addressing the issue as a whole so it'll be a waste of time. -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. -- ^ permalink raw reply [flat|nested] 238+ messages in thread
* RE: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac 2017-07-18 6:00 ` [3/3] " Borislav Petkov (?) @ 2017-07-18 22:13 ` Luck, Tony -1 siblings, 0 replies; 238+ messages in thread From: Luck, Tony @ 2017-07-18 22:13 UTC (permalink / raw) To: Borislav Petkov, Toshi Kani Cc: rjw, mchehab, tglx, srinivas.pandruvada, lenb, linux-acpi, linux-edac, linux-kernel > The question is: does the platform do this disabling now? > > Tony, I'm looking at sb_edac and there we don't do something like that > or maybe I'm missing it. Historically we've had complaints that sb_edac won't load that have been tracked to BIOS hiding one of the (many) PCI devices that it needs. But device hiding is orthogonal to providing GHES error records. A BIOS might do that, but I don't know that anyone intentionally does so. -Tony ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac 2017-07-18 22:13 ` [PATCH 3/3] " Luck, Tony (?) @ 2017-07-19 6:01 ` Borislav Petkov -1 siblings, 0 replies; 238+ messages in thread From: Borislav Petkov @ 2017-07-19 6:01 UTC (permalink / raw) To: Luck, Tony Cc: Toshi Kani, rjw, mchehab, tglx, srinivas.pandruvada, lenb, linux-acpi, linux-edac, linux-kernel On Tue, Jul 18, 2017 at 10:13:42PM +0000, Luck, Tony wrote: > Historically we've had complaints that sb_edac won't load that have been > tracked to BIOS hiding one of the (many) PCI devices that it needs. But > device hiding is orthogonal to providing GHES error records. A BIOS might > do that, but I don't know that anyone intentionally does so. Yeah, the hiding-devices path doesn't look like the optimal one. I think we should look at the firmware-first setting and load ghes if FF is being done by the firmware. -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. -- ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
From: Jeffrey Hugo @ 2017-07-18 14:39 UTC
To: Toshi Kani, rjw, bp
Cc: mchehab, tglx, srinivas.pandruvada, lenb, linux-acpi, linux-edac, linux-kernel

On 7/17/2017 3:59 PM, Toshi Kani wrote:
> The ghes_edac driver was introduced in 2013 [1], but it has not
> been enabled by any distro yet.

Ubuntu is expected to enable this soon.

--
Jeffrey Hugo
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
From: Kani, Toshimitsu @ 2017-07-18 15:36 UTC
To: bp, jhugo, rjw
Cc: tglx, srinivas.pandruvada, mchehab, lenb, linux-acpi, linux-edac, linux-kernel

On Tue, 2017-07-18 at 08:39 -0600, Jeffrey Hugo wrote:
> On 7/17/2017 3:59 PM, Toshi Kani wrote:
> > The ghes_edac driver was introduced in 2013 [1], but it has not
> > been enabled by any distro yet.
>
> Ubuntu is expected to enable this soon.

Interesting. I was told from other distro that there were many buggy
firmwares out there that prevented to enable ghes_edac. Do you know if
Ubuntu has any plan to address such issue? Or do they not see such
issue? I do not test with other vendors' platforms, so I cannot tell
exactly what those bugs are...

Thanks,
-Toshi
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
From: Jeffrey Hugo @ 2017-07-18 16:24 UTC
To: Kani, Toshimitsu, bp, rjw
Cc: tglx, srinivas.pandruvada, mchehab, lenb, linux-acpi, linux-edac, linux-kernel

On 7/18/2017 9:36 AM, Kani, Toshimitsu wrote:
> On Tue, 2017-07-18 at 08:39 -0600, Jeffrey Hugo wrote:
>> On 7/17/2017 3:59 PM, Toshi Kani wrote:
>>> The ghes_edac driver was introduced in 2013 [1], but it has not
>>> been enabled by any distro yet.
>>
>> Ubuntu is expected to enable this soon.
>
> Interesting. I was told from other distro that there were many buggy
> firmwares out there that prevented to enable ghes_edac. Do you know if
> Ubuntu has any plan to address such issue? Or do they not see such
> issue? I do not test with other vendors' platforms, so I cannot tell
> exactly what those bugs are...
>

I do not know if Ubuntu intends to address any "known issues". I
know a request was made to Canonical to enable the option, and it
appears the request is being considered, although the option may be
limited to ARM64, depending on how Canonical's evaluation goes. I am
not aware of any particular issues, so I cannot say what the side
effects are, or what platforms are considered to exhibit such issues.

--
Jeffrey Hugo
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
* Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
From: Kani, Toshimitsu @ 2017-07-18 16:42 UTC
To: rjw, jhugo, bp
Cc: tglx, srinivas.pandruvada, mchehab, lenb, linux-acpi, linux-edac, linux-kernel

On Tue, 2017-07-18 at 10:24 -0600, Jeffrey Hugo wrote:
> On 7/18/2017 9:36 AM, Kani, Toshimitsu wrote:
> > On Tue, 2017-07-18 at 08:39 -0600, Jeffrey Hugo wrote:
> > > On 7/17/2017 3:59 PM, Toshi Kani wrote:
> > > > The ghes_edac driver was introduced in 2013 [1], but it has not
> > > > been enabled by any distro yet.
> > >
> > > Ubuntu is expected to enable this soon.
> >
> > Interesting. I was told from other distro that there were many
> > buggy firmwares out there that prevented to enable ghes_edac. Do
> > you know if Ubuntu has any plan to address such issue? Or do they
> > not see such issue? I do not test with other vendors' platforms,
> > so I cannot tell exactly what those bugs are...
> >
>
> I do not know if Ubuntu intends to address any "known issues". I
> know a request was made to Canonical to enable the option, and it
> appears the request is being considered, although the option may be
> limited to ARM64, depending on how Canonical's evaluation goes. I am
> not aware of any particular issues, so I cannot say what the side
> effects are, or what platforms are considered to exhibit such issues.

I see. Thanks for the info! I hope someone from Canonical is on the
list.
-Toshi