* [PATCH V4 0/2] Restructure and fix GHES PCIe AER handling
From: Tyler Baicar @ 2017-11-28 21:48 UTC (permalink / raw)
  To: rjw, tony.luck, will.deacon, james.morse
  Cc: bp, linux-acpi, linux-kernel, Tyler Baicar

First, break the PCIe AER handling out into its own function to separate
it from the standard GHES processing.

Then fix the AER handling to process all errors in the AER driver rather
than only handling recoverable errors.

V4: Rebase to 4.15-rc1 and add Reviewed-by tags

Tyler Baicar (2):
  acpi: apei: handle PCIe AER errors in separate function
  acpi: apei: call into AER handling regardless of severity

 drivers/acpi/apei/ghes.c | 76 +++++++++++++++++++++++++++++-------------------
 1 file changed, 46 insertions(+), 30 deletions(-)

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.


* [PATCH V4 1/2] acpi: apei: handle PCIe AER errors in separate function
From: Tyler Baicar @ 2017-11-28 21:48 UTC (permalink / raw)
  To: rjw, tony.luck, will.deacon, james.morse
  Cc: bp, linux-acpi, linux-kernel, Tyler Baicar

Move the PCIe AER error handling code out of ghes_do_proc() into a
separate function, ghes_handle_aer().

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
---
 drivers/acpi/apei/ghes.c | 64 +++++++++++++++++++++++++-----------------------
 1 file changed, 34 insertions(+), 30 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 6402f7f..f67eb76 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -414,6 +414,39 @@ static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int
 #endif
 }
 
+static void ghes_handle_aer(struct acpi_hest_generic_data *gdata, int sev, int sec_sev)
+{
+#ifdef CONFIG_ACPI_APEI_PCIEAER
+	struct cper_sec_pcie *pcie_err = acpi_hest_get_payload(gdata);
+
+	if (sev == GHES_SEV_RECOVERABLE &&
+	    sec_sev == GHES_SEV_RECOVERABLE &&
+	    pcie_err->validation_bits & CPER_PCIE_VALID_DEVICE_ID &&
+	    pcie_err->validation_bits & CPER_PCIE_VALID_AER_INFO) {
+		unsigned int devfn;
+		int aer_severity;
+
+		devfn = PCI_DEVFN(pcie_err->device_id.device,
+				  pcie_err->device_id.function);
+		aer_severity = cper_severity_to_aer(gdata->error_severity);
+
+		/*
+		 * If firmware reset the component to contain
+		 * the error, we must reinitialize it before
+		 * use, so treat it as a fatal AER error.
+		 */
+		if (gdata->flags & CPER_SEC_RESET)
+			aer_severity = AER_FATAL;
+
+		aer_recover_queue(pcie_err->device_id.segment,
+				  pcie_err->device_id.bus,
+				  devfn, aer_severity,
+				  (struct aer_capability_regs *)
+				  pcie_err->aer_info);
+	}
+#endif
+}
+
 static void ghes_do_proc(struct ghes *ghes,
 			 const struct acpi_hest_generic_status *estatus)
 {
@@ -441,38 +474,9 @@ static void ghes_do_proc(struct ghes *ghes,
 			arch_apei_report_mem_error(sev, mem_err);
 			ghes_handle_memory_failure(gdata, sev);
 		}
-#ifdef CONFIG_ACPI_APEI_PCIEAER
 		else if (guid_equal(sec_type, &CPER_SEC_PCIE)) {
-			struct cper_sec_pcie *pcie_err = acpi_hest_get_payload(gdata);
-
-			if (sev == GHES_SEV_RECOVERABLE &&
-			    sec_sev == GHES_SEV_RECOVERABLE &&
-			    pcie_err->validation_bits & CPER_PCIE_VALID_DEVICE_ID &&
-			    pcie_err->validation_bits & CPER_PCIE_VALID_AER_INFO) {
-				unsigned int devfn;
-				int aer_severity;
-
-				devfn = PCI_DEVFN(pcie_err->device_id.device,
-						  pcie_err->device_id.function);
-				aer_severity = cper_severity_to_aer(gdata->error_severity);
-
-				/*
-				 * If firmware reset the component to contain
-				 * the error, we must reinitialize it before
-				 * use, so treat it as a fatal AER error.
-				 */
-				if (gdata->flags & CPER_SEC_RESET)
-					aer_severity = AER_FATAL;
-
-				aer_recover_queue(pcie_err->device_id.segment,
-						  pcie_err->device_id.bus,
-						  devfn, aer_severity,
-						  (struct aer_capability_regs *)
-						  pcie_err->aer_info);
-			}
-
+			ghes_handle_aer(gdata, sev, sec_sev);
 		}
-#endif
 		else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
 			struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
 

* [PATCH V4 2/2] acpi: apei: call into AER handling regardless of severity
From: Tyler Baicar @ 2017-11-28 21:48 UTC (permalink / raw)
  To: rjw, tony.luck, will.deacon, james.morse
  Cc: bp, linux-acpi, linux-kernel, Tyler Baicar

Currently, the GHES code calls into the AER driver only for
recoverable errors. This is incorrect: errors of other severities
are not logged by the AER driver and are not exposed to user space
via the AER trace event. Call into the AER driver for PCIe errors
regardless of severity.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
---
 drivers/acpi/apei/ghes.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index f67eb76..cc65d19 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -414,14 +414,26 @@ static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int
 #endif
 }
 
-static void ghes_handle_aer(struct acpi_hest_generic_data *gdata, int sev, int sec_sev)
+/*
+ * PCIe AER errors need to be sent to the AER driver for reporting and
+ * recovery. The GHES severities map to the following AER severities and
+ * require the following handling:
+ *
+ * GHES_SEV_CORRECTABLE -> AER_CORRECTABLE
+ *     These need to be reported by the AER driver but no recovery is
+ *     necessary.
+ * GHES_SEV_RECOVERABLE -> AER_NONFATAL
+ * GHES_SEV_RECOVERABLE && CPER_SEC_RESET -> AER_FATAL
+ *     These both need to be reported and recovered from by the AER driver.
+ * GHES_SEV_PANIC does not make it to this handling since the kernel must
+ *     panic.
+ */
+static void ghes_handle_aer(struct acpi_hest_generic_data *gdata)
 {
 #ifdef CONFIG_ACPI_APEI_PCIEAER
 	struct cper_sec_pcie *pcie_err = acpi_hest_get_payload(gdata);
 
-	if (sev == GHES_SEV_RECOVERABLE &&
-	    sec_sev == GHES_SEV_RECOVERABLE &&
-	    pcie_err->validation_bits & CPER_PCIE_VALID_DEVICE_ID &&
+	if (pcie_err->validation_bits & CPER_PCIE_VALID_DEVICE_ID &&
 	    pcie_err->validation_bits & CPER_PCIE_VALID_AER_INFO) {
 		unsigned int devfn;
 		int aer_severity;
@@ -475,7 +487,7 @@ static void ghes_do_proc(struct ghes *ghes,
 			ghes_handle_memory_failure(gdata, sev);
 		}
 		else if (guid_equal(sec_type, &CPER_SEC_PCIE)) {
-			ghes_handle_aer(gdata, sev, sec_sev);
+			ghes_handle_aer(gdata);
 		}
 		else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
 			struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);

* Re: [PATCH V4 0/2] Restructure and fix GHES PCIe AER handling
From: Rafael J. Wysocki @ 2017-12-06  0:56 UTC (permalink / raw)
  To: Tyler Baicar
  Cc: tony.luck, will.deacon, james.morse, bp, linux-acpi, linux-kernel

On Tuesday, November 28, 2017 10:48:07 PM CET Tyler Baicar wrote:
> First, break the PCIe AER handling out into its own function to separate
> it from the standard GHES processing
> 
> Then fix the AER handling to process all errors in the AER driver rather
> than only handling recoverable errors.
> 
> V4: Rebase to 4.15-rc1 and add reviewed-by
> 
> Tyler Baicar (2):
>   acpi: apei: handle PCIe AER errors in separate function
>   acpi: apei: call into AER handling regardless of severity
> 
>  drivers/acpi/apei/ghes.c | 76 +++++++++++++++++++++++++++++-------------------
>  1 file changed, 46 insertions(+), 30 deletions(-)

Both applied, thanks!
