linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] ACPI, APEI, Add APEI related _OSC support
@ 2011-06-21  7:16 Huang Ying
  2011-06-21  7:16 ` [PATCH 1/4] ACPI, APEI, GHES, Prevent GHES to be built as module Huang Ying
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Huang Ying @ 2011-06-21  7:16 UTC (permalink / raw)
  To: Len Brown
  Cc: linux-kernel, Andi Kleen, Tony Luck, ying.huang, linux-acpi, mjg59

[PATCH 1/4] ACPI, APEI, GHES, Prevent GHES to be built as module
[PATCH 2/4] ACPI, APEI, GHES, Support disable GHES at boot time
[PATCH 3/4] ACPI, APEI, Add APEI bit support in generic _OSC call
[PATCH 4/4] ACPI, APEI, Add APEI _OSC support

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/4] ACPI, APEI, GHES, Prevent GHES to be built as module
  2011-06-21  7:16 [PATCH 0/4] ACPI, APEI, Add APEI related _OSC support Huang Ying
@ 2011-06-21  7:16 ` Huang Ying
  2011-06-21  7:23   ` Andi Kleen
  2011-06-21 13:23   ` Matthew Garrett
  2011-06-21  7:16 ` [PATCH 2/4] ACPI, APEI, GHES, Support disable GHES at boot time Huang Ying
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 12+ messages in thread
From: Huang Ying @ 2011-06-21  7:16 UTC (permalink / raw)
  To: Len Brown
  Cc: linux-kernel, Andi Kleen, Tony Luck, ying.huang, linux-acpi, mjg59

GHES (Generic Hardware Error Source) is used to process hardware error
notification in firmware first mode.  But because firmware first mode
can be turned on but can not be turned off, it is unreasonable to
unload the GHES module with firmware first mode turned on.  To avoid
confusion, this patch makes GHES can be enable/disable in
configuration time, but not built as module and unload at run time.

Signed-off-by: Huang Ying <ying.huang@intel.com>
---
 drivers/acpi/apei/Kconfig |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/acpi/apei/Kconfig
+++ b/drivers/acpi/apei/Kconfig
@@ -10,7 +10,7 @@ config ACPI_APEI
 	  error injection.
 
 config ACPI_APEI_GHES
-	tristate "APEI Generic Hardware Error Source"
+	bool "APEI Generic Hardware Error Source"
 	depends on ACPI_APEI && X86
 	select ACPI_HED
 	help

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 2/4] ACPI, APEI, GHES, Support disable GHES at boot time
  2011-06-21  7:16 [PATCH 0/4] ACPI, APEI, Add APEI related _OSC support Huang Ying
  2011-06-21  7:16 ` [PATCH 1/4] ACPI, APEI, GHES, Prevent GHES to be built as module Huang Ying
@ 2011-06-21  7:16 ` Huang Ying
  2011-06-21  7:16 ` [PATCH 3/4] ACPI, APEI, Add APEI bit support in generic _OSC call Huang Ying
  2011-06-21  7:16 ` [PATCH 4/4] ACPI, APEI, Add APEI _OSC support Huang Ying
  3 siblings, 0 replies; 12+ messages in thread
From: Huang Ying @ 2011-06-21  7:16 UTC (permalink / raw)
  To: Len Brown
  Cc: linux-kernel, Andi Kleen, Tony Luck, ying.huang, linux-acpi, mjg59

Some machine may have broken firmware so that GHES and firmware first
mode should be disabled.  This patch adds support to that.

Signed-off-by: Huang Ying <ying.huang@intel.com>
---
 drivers/acpi/apei/ghes.c |    8 ++++++++
 drivers/acpi/apei/hest.c |   17 +++++++++--------
 include/acpi/apei.h      |    1 +
 3 files changed, 18 insertions(+), 8 deletions(-)

--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -77,6 +77,9 @@ struct ghes {
 	};
 };
 
+int ghes_disable;
+module_param_named(disable, ghes_disable, bool, 0);
+
 static int ghes_panic_timeout	__read_mostly = 30;
 
 /*
@@ -662,6 +665,11 @@ static int __init ghes_init(void)
 		return -EINVAL;
 	}
 
+	if (ghes_disable) {
+		pr_info(GHES_PFX "GHES is not enabled!\n");
+		return -EINVAL;
+	}
+
 	rc = ghes_ioremap_init();
 	if (rc)
 		goto err;
--- a/drivers/acpi/apei/hest.c
+++ b/drivers/acpi/apei/hest.c
@@ -230,16 +230,17 @@ void __init acpi_hest_init(void)
 		goto err;
 	}
 
-	rc = apei_hest_parse(hest_parse_ghes_count, &ghes_count);
-	if (rc)
-		goto err;
-
-	rc = hest_ghes_dev_register(ghes_count);
-	if (!rc) {
-		pr_info(HEST_PFX "Table parsing has been initialized.\n");
-		return;
+	if (!ghes_disable) {
+		rc = apei_hest_parse(hest_parse_ghes_count, &ghes_count);
+		if (rc)
+			goto err;
+		rc = hest_ghes_dev_register(ghes_count);
+		if (rc)
+			goto err;
 	}
 
+	pr_info(HEST_PFX "Table parsing has been initialized.\n");
+	return;
 err:
 	hest_disable = 1;
 }
--- a/include/acpi/apei.h
+++ b/include/acpi/apei.h
@@ -18,6 +18,7 @@
 
 extern int hest_disable;
 extern int erst_disable;
+extern int ghes_disable;
 
 #ifdef CONFIG_ACPI_APEI
 void __init acpi_hest_init(void);

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 3/4] ACPI, APEI, Add APEI bit support in generic _OSC call
  2011-06-21  7:16 [PATCH 0/4] ACPI, APEI, Add APEI related _OSC support Huang Ying
  2011-06-21  7:16 ` [PATCH 1/4] ACPI, APEI, GHES, Prevent GHES to be built as module Huang Ying
  2011-06-21  7:16 ` [PATCH 2/4] ACPI, APEI, GHES, Support disable GHES at boot time Huang Ying
@ 2011-06-21  7:16 ` Huang Ying
  2011-06-21  7:16 ` [PATCH 4/4] ACPI, APEI, Add APEI _OSC support Huang Ying
  3 siblings, 0 replies; 12+ messages in thread
From: Huang Ying @ 2011-06-21  7:16 UTC (permalink / raw)
  To: Len Brown
  Cc: linux-kernel, Andi Kleen, Tony Luck, ying.huang, linux-acpi, mjg59

In APEI firmware first mode, hardware error is reported by hardware to
firmware firstly, then firmware reports the error to Linux in a GHES
error record via POLL/SCI/IRQ/NMI etc.

This may result in some issues if OS has no full APEI support.  So
some firmware implementation will work in a back-compatible mode by
default.  Where firmware will only notify OS in old-fashion, without
GHES record.  For example, for a fatal hardware error, only NMI is
signaled, no GHES record.

To gain full APEI power on these machines, APEI bit in generic _OSC
call should be specified to tell firmware that Linux has full APEI
support.  This patch adds the APEI bit support in generic _OSC call.

Signed-off-by: Huang Ying <ying.huang@intel.com>
---
 drivers/acpi/bus.c |    6 ++++++
 1 file changed, 6 insertions(+)

--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -39,6 +39,7 @@
 #include <linux/pci.h>
 #include <acpi/acpi_bus.h>
 #include <acpi/acpi_drivers.h>
+#include <acpi/apei.h>
 #include <linux/dmi.h>
 #include <linux/suspend.h>
 
@@ -541,6 +542,11 @@ static void acpi_bus_osc_support(void)
 #if defined(CONFIG_ACPI_PROCESSOR) || defined(CONFIG_ACPI_PROCESSOR_MODULE)
 	capbuf[OSC_SUPPORT_TYPE] |= OSC_SB_PPC_OST_SUPPORT;
 #endif
+
+#ifdef CONFIG_ACPI_APEI_GHES
+	if (!ghes_disable)
+		capbuf[OSC_SUPPORT_TYPE] |= OSC_SB_APEI_SUPPORT;
+#endif
 	if (ACPI_FAILURE(acpi_get_handle(NULL, "\\_SB", &handle)))
 		return;
 	if (ACPI_SUCCESS(acpi_run_osc(handle, &context)))

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 4/4] ACPI, APEI, Add APEI _OSC support
  2011-06-21  7:16 [PATCH 0/4] ACPI, APEI, Add APEI related _OSC support Huang Ying
                   ` (2 preceding siblings ...)
  2011-06-21  7:16 ` [PATCH 3/4] ACPI, APEI, Add APEI bit support in generic _OSC call Huang Ying
@ 2011-06-21  7:16 ` Huang Ying
  2011-06-21 13:22   ` Matthew Garrett
  3 siblings, 1 reply; 12+ messages in thread
From: Huang Ying @ 2011-06-21  7:16 UTC (permalink / raw)
  To: Len Brown
  Cc: linux-kernel, Andi Kleen, Tony Luck, ying.huang, linux-acpi, mjg59

APEI firmware first mode must be turned on explicitly on some
machines, otherwise they may be no GHES hardware error record for
hardware error notification.  APEI bit in generic _OSC call can be
used to do that, but on some machine, a special APEI _OSC call must
be used.  This patch adds the support to that APEI _OSC call.

Signed-off-by: Huang Ying <ying.huang@intel.com>
---
 drivers/acpi/apei/apei-base.c     |   26 ++++++++++++++++++++++++++
 drivers/acpi/apei/apei-internal.h |    2 ++
 drivers/acpi/apei/ghes.c          |    4 ++++
 3 files changed, 32 insertions(+)

--- a/drivers/acpi/apei/apei-base.c
+++ b/drivers/acpi/apei/apei-base.c
@@ -604,3 +604,29 @@ struct dentry *apei_get_debugfs_dir(void
 	return dapei;
 }
 EXPORT_SYMBOL_GPL(apei_get_debugfs_dir);
+
+int apei_osc_setup(void)
+{
+	static u8 apei_uuid_str[] = "ed855e0c-6c90-47bf-a62a-26de0fc5ad5c";
+	acpi_handle handle;
+	u32 capbuf[3];
+	struct acpi_osc_context context = {
+		.uuid_str	= apei_uuid_str,
+		.rev		= 1,
+		.cap.length	= sizeof(capbuf),
+		.cap.pointer	= capbuf,
+	};
+
+	capbuf[OSC_QUERY_TYPE] = OSC_QUERY_ENABLE;
+	capbuf[OSC_SUPPORT_TYPE] = 0;
+	capbuf[OSC_CONTROL_TYPE] = 0;
+
+	if (ACPI_FAILURE(acpi_get_handle(NULL, "\\_SB", &handle))
+	    || ACPI_FAILURE(acpi_run_osc(handle, &context)))
+		return -EIO;
+	else {
+		kfree(context.ret.pointer);
+		return 0;
+	}
+}
+EXPORT_SYMBOL_GPL(apei_osc_setup);
--- a/drivers/acpi/apei/apei-internal.h
+++ b/drivers/acpi/apei/apei-internal.h
@@ -124,4 +124,6 @@ void apei_estatus_print(const char *pfx,
 			const struct acpi_hest_generic_status *estatus);
 int apei_estatus_check_header(const struct acpi_hest_generic_status *estatus);
 int apei_estatus_check(const struct acpi_hest_generic_status *estatus);
+
+int apei_osc_setup(void);
 #endif
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -678,6 +678,10 @@ static int __init ghes_init(void)
 	if (rc)
 		goto err_ioremap_exit;
 
+	rc = apei_osc_setup();
+	if (rc)
+		pr_info(GHES_PFX "Evaluate APEI _OSC failed!\n");
+
 	return 0;
 err_ioremap_exit:
 	ghes_ioremap_exit();

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/4] ACPI, APEI, GHES, Prevent GHES to be built as module
  2011-06-21  7:16 ` [PATCH 1/4] ACPI, APEI, GHES, Prevent GHES to be built as module Huang Ying
@ 2011-06-21  7:23   ` Andi Kleen
  2011-06-21  7:30     ` Huang Ying
  2011-06-21 13:18     ` Matthew Garrett
  2011-06-21 13:23   ` Matthew Garrett
  1 sibling, 2 replies; 12+ messages in thread
From: Andi Kleen @ 2011-06-21  7:23 UTC (permalink / raw)
  To: Huang Ying
  Cc: Len Brown, linux-kernel, Andi Kleen, Tony Luck, linux-acpi, mjg59

On Tue, Jun 21, 2011 at 03:16:24PM +0800, Huang Ying wrote:
> GHES (Generic Hardware Error Source) is used to process hardware error
> notification in firmware first mode.  But because firmware first mode
> can be turned on but can not be turned off, it is unreasonable to
> unload the GHES module with firmware first mode turned on.  To avoid
> confusion, this patch makes GHES can be enable/disable in
> configuration time, but not built as module and unload at run time.

It's better to keep it as a module, but disable unloading.
You can do that with a __module_get(THIS_MODULE) in the init code
when FFM is detected.

-Andi

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/4] ACPI, APEI, GHES, Prevent GHES to be built as module
  2011-06-21  7:23   ` Andi Kleen
@ 2011-06-21  7:30     ` Huang Ying
  2011-06-21 13:18     ` Matthew Garrett
  1 sibling, 0 replies; 12+ messages in thread
From: Huang Ying @ 2011-06-21  7:30 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Len Brown, linux-kernel, Luck, Tony, linux-acpi, mjg59

Hi, Andi,

Thanks for comments!

On 06/21/2011 03:23 PM, Andi Kleen wrote:
> On Tue, Jun 21, 2011 at 03:16:24PM +0800, Huang Ying wrote:
>> GHES (Generic Hardware Error Source) is used to process hardware error
>> notification in firmware first mode.  But because firmware first mode
>> can be turned on but can not be turned off, it is unreasonable to
>> unload the GHES module with firmware first mode turned on.  To avoid
>> confusion, this patch makes GHES can be enable/disable in
>> configuration time, but not built as module and unload at run time.
> 
> It's better to keep it as a module, but disable unloading.
> You can do that with a __module_get(THIS_MODULE) in the init code
> when FFM is detected.

There are two ways to turn on firmware first mode

1) APEI bit in generic _OSC call
2) Special APEI _OSC call

1) is run before GHES module loading.  If keeping GHES as a module, it
is possible that firmware first mode has been turned on with APEI bit in
generic _OSC call, but GHES module is prevented to be loaded via some
kind of module blacklist.  So I think it is better to prevent GHES to be
built as module.

Best Regards,
Huang Ying

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/4] ACPI, APEI, GHES, Prevent GHES to be built as module
  2011-06-21  7:23   ` Andi Kleen
  2011-06-21  7:30     ` Huang Ying
@ 2011-06-21 13:18     ` Matthew Garrett
  1 sibling, 0 replies; 12+ messages in thread
From: Matthew Garrett @ 2011-06-21 13:18 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Huang Ying, Len Brown, linux-kernel, Tony Luck, linux-acpi

On Tue, Jun 21, 2011 at 09:23:01AM +0200, Andi Kleen wrote:
> On Tue, Jun 21, 2011 at 03:16:24PM +0800, Huang Ying wrote:
> > GHES (Generic Hardware Error Source) is used to process hardware error
> > notification in firmware first mode.  But because firmware first mode
> > can be turned on but can not be turned off, it is unreasonable to
> > unload the GHES module with firmware first mode turned on.  To avoid
> > confusion, this patch makes GHES can be enable/disable in
> > configuration time, but not built as module and unload at run time.
> 
> It's better to keep it as a module, but disable unloading.
> You can do that with a __module_get(THIS_MODULE) in the init code
> when FFM is detected.

Anything that's enabled by an _OSC call is expected to be available 
immediately. So the choices are either to ensure that GHES support is 
built in, or to make a second _OSC call when the GHES code is loaded. 
We've seen in the PCIe case that many firmware implementations 
misinterpret multiple attempts to set _OSC with the same UUID, and a 
cursory examination of some implementations of the systemwide one 
suggest that we'd see the same issue there.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 4/4] ACPI, APEI, Add APEI _OSC support
  2011-06-21  7:16 ` [PATCH 4/4] ACPI, APEI, Add APEI _OSC support Huang Ying
@ 2011-06-21 13:22   ` Matthew Garrett
  2011-06-22  2:21     ` Huang Ying
  0 siblings, 1 reply; 12+ messages in thread
From: Matthew Garrett @ 2011-06-21 13:22 UTC (permalink / raw)
  To: Huang Ying; +Cc: Len Brown, linux-kernel, Andi Kleen, Tony Luck, linux-acpi

On Tue, Jun 21, 2011 at 03:16:27PM +0800, Huang Ying wrote:

> +	rc = apei_osc_setup();
> +	if (rc)
> +		pr_info(GHES_PFX "Evaluate APEI _OSC failed!\n");

Hm. This is maybe a little strong. It'd be valid for a machine to return 
an error here but still have the GHES functionality enabled via the 
generic call, but this message would still show up and potentially 
confuse the user. Can we keep a flag to check whether the generic method 
gave us control, and only give the error if both fail to enable it?

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/4] ACPI, APEI, GHES, Prevent GHES to be built as module
  2011-06-21  7:16 ` [PATCH 1/4] ACPI, APEI, GHES, Prevent GHES to be built as module Huang Ying
  2011-06-21  7:23   ` Andi Kleen
@ 2011-06-21 13:23   ` Matthew Garrett
  1 sibling, 0 replies; 12+ messages in thread
From: Matthew Garrett @ 2011-06-21 13:23 UTC (permalink / raw)
  To: Huang Ying; +Cc: Len Brown, linux-kernel, Andi Kleen, Tony Luck, linux-acpi

Other than my proviso in 4/4, this set looks good.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 4/4] ACPI, APEI, Add APEI _OSC support
  2011-06-21 13:22   ` Matthew Garrett
@ 2011-06-22  2:21     ` Huang Ying
  2011-06-22 15:49       ` Matthew Garrett
  0 siblings, 1 reply; 12+ messages in thread
From: Huang Ying @ 2011-06-22  2:21 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Len Brown, linux-kernel, Andi Kleen, Luck, Tony, linux-acpi

On 06/21/2011 09:22 PM, Matthew Garrett wrote:
> On Tue, Jun 21, 2011 at 03:16:27PM +0800, Huang Ying wrote:
> 
>> +	rc = apei_osc_setup();
>> +	if (rc)
>> +		pr_info(GHES_PFX "Evaluate APEI _OSC failed!\n");
> 
> Hm. This is maybe a little strong. It'd be valid for a machine to return 
> an error here but still have the GHES functionality enabled via the 
> generic call, but this message would still show up and potentially 
> confuse the user. Can we keep a flag to check whether the generic method 
> gave us control, and only give the error if both fail to enable it?

At least on some of my testing machine, generic _OSC call will not
return any error even it does not support APEI bit.  So I think
sometimes it may be helpful to printk something here.  To avoid
confusion, can we change the message as follow.

- generic _OSC succeeded, APEI _OSC failed: APEI firmware first mode is
enabled by APEI bit.
- generic _OSC failed, APEI _OSC succeeded: APEI firmware first mode is
enabled by APEI _OSC.
- both succeeded: APEI firmware first mode is enabled by APEI bit and
APEI _OSC.
- both failed: Failed to enable APEI firmware first mode!

Best Regards,
Huang Ying

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 4/4] ACPI, APEI, Add APEI _OSC support
  2011-06-22  2:21     ` Huang Ying
@ 2011-06-22 15:49       ` Matthew Garrett
  0 siblings, 0 replies; 12+ messages in thread
From: Matthew Garrett @ 2011-06-22 15:49 UTC (permalink / raw)
  To: Huang Ying; +Cc: Len Brown, linux-kernel, Andi Kleen, Luck, Tony, linux-acpi

On Wed, Jun 22, 2011 at 10:21:20AM +0800, Huang Ying wrote:

> At least on some of my testing machine, generic _OSC call will not
> return any error even it does not support APEI bit.  So I think
> sometimes it may be helpful to printk something here.  To avoid
> confusion, can we change the message as follow.
> 
> - generic _OSC succeeded, APEI _OSC failed: APEI firmware first mode is
> enabled by APEI bit.
> - generic _OSC failed, APEI _OSC succeeded: APEI firmware first mode is
> enabled by APEI _OSC.

"WHEA _OSC", I think. The UUID here is part of the Microsoft spec, not 
the APEI one.

> - both succeeded: APEI firmware first mode is enabled by APEI bit and
> APEI _OSC.
> - both failed: Failed to enable APEI firmware first mode!

Otherwise, looks good.
-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2011-06-22 15:50 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-06-21  7:16 [PATCH 0/4] ACPI, APEI, Add APEI related _OSC support Huang Ying
2011-06-21  7:16 ` [PATCH 1/4] ACPI, APEI, GHES, Prevent GHES to be built as module Huang Ying
2011-06-21  7:23   ` Andi Kleen
2011-06-21  7:30     ` Huang Ying
2011-06-21 13:18     ` Matthew Garrett
2011-06-21 13:23   ` Matthew Garrett
2011-06-21  7:16 ` [PATCH 2/4] ACPI, APEI, GHES, Support disable GHES at boot time Huang Ying
2011-06-21  7:16 ` [PATCH 3/4] ACPI, APEI, Add APEI bit support in generic _OSC call Huang Ying
2011-06-21  7:16 ` [PATCH 4/4] ACPI, APEI, Add APEI _OSC support Huang Ying
2011-06-21 13:22   ` Matthew Garrett
2011-06-22  2:21     ` Huang Ying
2011-06-22 15:49       ` Matthew Garrett

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).