linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6 1/3] aerdrv: Trace Event for AER
@ 2012-12-04 17:04 Lance Ortiz
  2012-12-04 17:04 ` [PATCH v6 2/3] aerdrv: Enhanced AER logging Lance Ortiz
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Lance Ortiz @ 2012-12-04 17:04 UTC (permalink / raw)
  To: bhelgaas, lance_ortiz, jiang.liu, tony.luck, bp, rostedt,
	mchehab, linux-acpi, linux-pci, linux-kernel

This header file will define a new trace event that will be triggered when
a AER event occurs.  The following data will be provided to the trace
event.

char * dev_name - The name of the slot where the device resides
                  ([domain:]bus:device.function).

u32 status - Either the correctable or uncorrectable register
             indicating what error or errors have been see.

u8 severity - error severity 0:NONFATAL 1:FATAL 2:CORRECTED

The trace event will also provide a trace string that may look like:

"0000:05:00.0 PCIe Bus Error:severity=Uncorrected (Non-Fatal), Poisoned
TLP"

v1-v2 Move header from include/ras/aer_event.h to
include/trace/events/ras.h
v3-v4 Cleaned up comments and commit header
v4-v5 More cleanup remove () from if statement in print.
      Renamed string define to be more specific.
v5-v6 change TRACE_SYSTEM define to be ras and not aer.
Signed-off-by: Lance Ortiz <lance.ortiz@hp.com>
---

 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/include/trace/events/ras.h b/include/trace/events/ras.h
new file mode 100644
index 0000000..88b8783
--- /dev/null
+++ b/include/trace/events/ras.h
@@ -0,0 +1,77 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM ras
+
+#if !defined(_TRACE_AER_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_AER_H
+
+#include <linux/tracepoint.h>
+#include <linux/edac.h>
+
+
+/*
+ * PCIe AER Trace event
+ *
+ * These events are generated when hardware detects a corrected or
+ * uncorrected event on a PCIe device. The event report has
+ * the following structure:
+ *
+ * char * dev_name -	The name of the slot where the device resides
+ *			([domain:]bus:device.function).
+ * u32 status -		Either the correctable or uncorrectable register
+ *			indicating what error or errors have been seen
+ * u8 severity -	error severity 0:NONFATAL 1:FATAL 2:CORRECTED
+ */
+
+#define aer_correctable_errors		\
+	{BIT(0),	"Receiver Error"},		\
+	{BIT(6),	"Bad TLP"},			\
+	{BIT(7),	"Bad DLLP"},			\
+	{BIT(8),	"RELAY_NUM Rollover"},		\
+	{BIT(12),	"Replay Timer Timeout"},	\
+	{BIT(13),	"Advisory Non-Fatal"}
+
+#define aer_uncorrectable_errors		\
+	{BIT(4),	"Data Link Protocol"},		\
+	{BIT(12),	"Poisoned TLP"},		\
+	{BIT(13),	"Flow Control Protocol"},	\
+	{BIT(14),	"Completion Timeout"},		\
+	{BIT(15),	"Completer Abort"},		\
+	{BIT(16),	"Unexpected Completion"},	\
+	{BIT(17),	"Receiver Overflow"},		\
+	{BIT(18),	"Malformed TLP"},		\
+	{BIT(19),	"ECRC"},			\
+	{BIT(20),	"Unsupported Request"}
+
+TRACE_EVENT(aer_event,
+	TP_PROTO(const char *dev_name,
+		 const u32 status,
+		 const u8 severity),
+
+	TP_ARGS(dev_name, status, severity),
+
+	TP_STRUCT__entry(
+		__string(	dev_name,	dev_name	)
+		__field(	u32,		status		)
+		__field(	u8,		severity	)
+	),
+
+	TP_fast_assign(
+		__assign_str(dev_name, dev_name);
+		__entry->status		= status;
+		__entry->severity	= severity;
+	),
+
+	TP_printk("%s PCIe Bus Error: severity=%s, %s\n",
+		__get_str(dev_name),
+		__entry->severity == HW_EVENT_ERR_CORRECTED ? "Corrected" :
+			__entry->severity == HW_EVENT_ERR_FATAL ?
+			"Fatal" : "Uncorrected",
+		__entry->severity == HW_EVENT_ERR_CORRECTED ?
+		__print_flags(__entry->status, "|", aer_correctable_errors) :
+		__print_flags(__entry->status, "|", aer_uncorrectable_errors))
+);
+
+#endif /* _TRACE_AER_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v6 2/3] aerdrv: Enhanced AER logging
  2012-12-04 17:04 [PATCH v6 1/3] aerdrv: Trace Event for AER Lance Ortiz
@ 2012-12-04 17:04 ` Lance Ortiz
  2012-12-04 18:41   ` Mauro Carvalho Chehab
  2012-12-04 17:04 ` [PATCH v6 3/3] aerdrv: Cleanup log output for CPER based AER Lance Ortiz
  2012-12-04 18:36 ` [PATCH v6 1/3] aerdrv: Trace Event for AER Mauro Carvalho Chehab
  2 siblings, 1 reply; 10+ messages in thread
From: Lance Ortiz @ 2012-12-04 17:04 UTC (permalink / raw)
  To: bhelgaas, lance_ortiz, jiang.liu, tony.luck, bp, rostedt,
	mchehab, linux-acpi, linux-pci, linux-kernel

This patch will provide a more reliable and easy way for user-space
applications to have access to AER logs rather than reading them from the
message buffer. It also provides a way to notify user-space when an AER
event occurs.

The aer driver is updated to generate a trace event of function 'aer_event'
when a PCIe error is reported over the AER interface.  The trace event was
added to both the interrupt based aer path and the firmware first path.

v1-v2 fix compile errors in ifdefs.
v2-v3 Update to new location of trace header. Update print to remove
warning.
v3-v4 Reworked logic when getting ready to call cper_print_aer
Signed-off-by: Lance Ortiz <lance.ortiz@hp.com>
---

 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/drivers/acpi/apei/cper.c b/drivers/acpi/apei/cper.c
index e6defd8..4a3e945 100644
--- a/drivers/acpi/apei/cper.c
+++ b/drivers/acpi/apei/cper.c
@@ -29,6 +29,7 @@
 #include <linux/time.h>
 #include <linux/cper.h>
 #include <linux/acpi.h>
+#include <linux/pci.h>
 #include <linux/aer.h>
 
 /*
@@ -249,6 +250,10 @@ static const char *cper_pcie_port_type_strs[] = {
 static void cper_print_pcie(const char *pfx, const struct cper_sec_pcie *pcie,
 			    const struct acpi_hest_generic_data *gdata)
 {
+#ifdef CONFIG_ACPI_APEI_PCIEAER
+	struct pci_dev *dev;
+#endif
+
 	if (pcie->validation_bits & CPER_PCIE_VALID_PORT_TYPE)
 		printk("%s""port_type: %d, %s\n", pfx, pcie->port_type,
 		       pcie->port_type < ARRAY_SIZE(cper_pcie_port_type_strs) ?
@@ -281,10 +286,18 @@ static void cper_print_pcie(const char *pfx, const struct cper_sec_pcie *pcie,
 	"%s""bridge: secondary_status: 0x%04x, control: 0x%04x\n",
 	pfx, pcie->bridge.secondary_status, pcie->bridge.control);
 #ifdef CONFIG_ACPI_APEI_PCIEAER
-	if (pcie->validation_bits & CPER_PCIE_VALID_AER_INFO) {
-		struct aer_capability_regs *aer_regs = (void *)pcie->aer_info;
-		cper_print_aer(pfx, gdata->error_severity, aer_regs);
+	dev = pci_get_domain_bus_and_slot(pcie->device_id.segment,
+			pcie->device_id.bus, pcie->device_id.function);
+	if (!dev) {
+		pr_info("PCI AER Cannot get PCI device %04x:%02x:%02x.%d\n",
+			pcie->device_id.segment, pcie->device_id.bus,
+			pcie->device_id.slot, pcie->device_id.function);
+		return;
 	}
+	if (pcie->validation_bits & CPER_PCIE_VALID_AER_INFO)
+		cper_print_aer(dev, gdata->error_severity,
+				(struct aer_capability_regs *) pcie->aer_info);
+	pci_dev_put(dev);
 #endif
 }
 
diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c
index 3ea5173..34d96e4 100644
--- a/drivers/pci/pcie/aer/aerdrv_errprint.c
+++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
@@ -23,6 +23,9 @@
 
 #include "aerdrv.h"
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/ras.h>
+
 #define AER_AGENT_RECEIVER		0
 #define AER_AGENT_REQUESTER		1
 #define AER_AGENT_COMPLETER		2
@@ -194,6 +197,8 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
 	if (info->id && info->error_dev_num > 1 && info->id == id)
 		printk("%s""  Error of this Agent(%04x) is reported first\n",
 			prefix, id);
+	trace_aer_event(dev_name(&dev->dev), (info->status & ~info->mask),
+			info->severity);
 }
 
 void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info)
@@ -217,12 +222,13 @@ int cper_severity_to_aer(int cper_severity)
 }
 EXPORT_SYMBOL_GPL(cper_severity_to_aer);
 
-void cper_print_aer(const char *prefix, int cper_severity,
+void cper_print_aer(struct pci_dev *dev, int cper_severity,
 		    struct aer_capability_regs *aer)
 {
 	int aer_severity, layer, agent, status_strs_size, tlp_header_valid = 0;
 	u32 status, mask;
 	const char **status_strs;
+	char *prefix = NULL;
 
 	aer_severity = cper_severity_to_aer(cper_severity);
 	if (aer_severity == AER_CORRECTABLE) {
@@ -259,5 +265,7 @@ void cper_print_aer(const char *prefix, int cper_severity,
 			*(tlp + 8), *(tlp + 15), *(tlp + 14),
 			*(tlp + 13), *(tlp + 12));
 	}
+	trace_aer_event(dev_name(&dev->dev), (status & ~mask),
+			aer_severity);
 }
 #endif
diff --git a/include/linux/aer.h b/include/linux/aer.h
index 544abdb..7b86dc6 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -49,7 +49,7 @@ static inline int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev)
 }
 #endif
 
-extern void cper_print_aer(const char *prefix, int cper_severity,
+extern void cper_print_aer(struct pci_dev *dev, int cper_severity,
 			   struct aer_capability_regs *aer);
 extern int cper_severity_to_aer(int cper_severity);
 extern void aer_recover_queue(int domain, unsigned int bus, unsigned int devfn,


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v6 3/3] aerdrv: Cleanup log output for CPER based AER
  2012-12-04 17:04 [PATCH v6 1/3] aerdrv: Trace Event for AER Lance Ortiz
  2012-12-04 17:04 ` [PATCH v6 2/3] aerdrv: Enhanced AER logging Lance Ortiz
@ 2012-12-04 17:04 ` Lance Ortiz
  2012-12-04 17:39   ` Joe Perches
  2012-12-04 18:36 ` [PATCH v6 1/3] aerdrv: Trace Event for AER Mauro Carvalho Chehab
  2 siblings, 1 reply; 10+ messages in thread
From: Lance Ortiz @ 2012-12-04 17:04 UTC (permalink / raw)
  To: bhelgaas, lance_ortiz, jiang.liu, tony.luck, bp, rostedt,
	mchehab, linux-acpi, linux-pci, linux-kernel

These changes make cper_print_aer more consistent with aer_print_error
which is called in the AER interrupt case. The string in the variable
'prefix' is printed at the beginning of each print statement in
cper_print_aer(). The prefix is a string containing the driver name
and the device's slot location.  From the callers the value of prefix
is never assigned and is NULL, so when cper_print_aer prints data the
initial string does not get printed.  This string is important because
it identifies the device that the error is on.

v1-v2 fix some compile errors withinn the #ifdef
v3-v4 remove agent id stuff and kept print the same to avoid
compatibility issues

Signed-off-by: Lance Ortiz <lance.ortiz@hp.com>
---

 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c
index 34d96e4..58ff4c0 100644
--- a/drivers/pci/pcie/aer/aerdrv_errprint.c
+++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
@@ -228,9 +228,14 @@ void cper_print_aer(struct pci_dev *dev, int cper_severity,
 	int aer_severity, layer, agent, status_strs_size, tlp_header_valid = 0;
 	u32 status, mask;
 	const char **status_strs;
-	char *prefix = NULL;
+	char prefix[44];
 
 	aer_severity = cper_severity_to_aer(cper_severity);
+	snprintf(prefix, sizeof(prefix), "%s%s %s: ",
+		(aer_severity == AER_CORRECTABLE) ?
+		KERN_WARNING : KERN_ERR,
+		dev_driver_string(&dev->dev), dev_name(&dev->dev));
+
 	if (aer_severity == AER_CORRECTABLE) {
 		status = aer->cor_status;
 		mask = aer->cor_mask;


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 3/3] aerdrv: Cleanup log output for CPER based AER
  2012-12-04 17:04 ` [PATCH v6 3/3] aerdrv: Cleanup log output for CPER based AER Lance Ortiz
@ 2012-12-04 17:39   ` Joe Perches
  2012-12-04 20:06     ` Ortiz, Lance E
  0 siblings, 1 reply; 10+ messages in thread
From: Joe Perches @ 2012-12-04 17:39 UTC (permalink / raw)
  To: Lance Ortiz
  Cc: bhelgaas, lance_ortiz, jiang.liu, tony.luck, bp, rostedt,
	mchehab, linux-acpi, linux-pci, linux-kernel

On Tue, 2012-12-04 at 10:04 -0700, Lance Ortiz wrote:
> These changes make cper_print_aer more consistent with aer_print_error
> which is called in the AER interrupt case. The string in the variable
> 'prefix' is printed at the beginning of each print statement in
> cper_print_aer(). The prefix is a string containing the driver name
> and the device's slot location.  From the callers the value of prefix
> is never assigned and is NULL, so when cper_print_aer prints data the
> initial string does not get printed.  This string is important because
> it identifies the device that the error is on.

Hi again Lance.

> diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c
[]
> @@ -228,9 +228,14 @@ void cper_print_aer(struct pci_dev *dev, int cper_severity,
>  	int aer_severity, layer, agent, status_strs_size, tlp_header_valid = 0;
>  	u32 status, mask;
>  	const char **status_strs;
> -	char *prefix = NULL;
> +	char prefix[44];
>  
>  	aer_severity = cper_severity_to_aer(cper_severity);
> +	snprintf(prefix, sizeof(prefix), "%s%s %s: ",
> +		(aer_severity == AER_CORRECTABLE) ?
> +		KERN_WARNING : KERN_ERR,
> +		dev_driver_string(&dev->dev), dev_name(&dev->dev));
> +
>  	if (aer_severity == AER_CORRECTABLE) {
>  		status = aer->cor_status;
>  		mask = aer->cor_mask;

Perhaps this would be better just using dev_printk
instead of snprintf into a prefix with printk to
emulate dev_printk.

Also, perhaps KERN_NOTICE is preferable to KERN_WARNING
in the CORRECTABLE case.

Maybe something like:

	const char *level = KERN_ERR;
	if (aer_severity == AER_CORRECTABLE)
		level = KERN_NOTICE;

	...

	dev_printk(level, &dev->dev, etc...);

Maybe do this after this series of patches is accepted.
Enough with the revisions for awhile...


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 1/3] aerdrv: Trace Event for AER
  2012-12-04 17:04 [PATCH v6 1/3] aerdrv: Trace Event for AER Lance Ortiz
  2012-12-04 17:04 ` [PATCH v6 2/3] aerdrv: Enhanced AER logging Lance Ortiz
  2012-12-04 17:04 ` [PATCH v6 3/3] aerdrv: Cleanup log output for CPER based AER Lance Ortiz
@ 2012-12-04 18:36 ` Mauro Carvalho Chehab
  2 siblings, 0 replies; 10+ messages in thread
From: Mauro Carvalho Chehab @ 2012-12-04 18:36 UTC (permalink / raw)
  To: Lance Ortiz
  Cc: bhelgaas, lance_ortiz, jiang.liu, tony.luck, bp, rostedt,
	linux-acpi, linux-pci, linux-kernel

Em Tue, 04 Dec 2012 10:04:30 -0700
Lance Ortiz <lance.ortiz@hp.com> escreveu:

> This header file will define a new trace event that will be triggered when
> a AER event occurs.  The following data will be provided to the trace
> event.
> 
> char * dev_name - The name of the slot where the device resides
>                   ([domain:]bus:device.function).
> 
> u32 status - Either the correctable or uncorrectable register
>              indicating what error or errors have been see.
> 
> u8 severity - error severity 0:NONFATAL 1:FATAL 2:CORRECTED
> 
> The trace event will also provide a trace string that may look like:
> 
> "0000:05:00.0 PCIe Bus Error:severity=Uncorrected (Non-Fatal), Poisoned
> TLP"
> 
> v1-v2 Move header from include/ras/aer_event.h to
> include/trace/events/ras.h
> v3-v4 Cleaned up comments and commit header
> v4-v5 More cleanup remove () from if statement in print.
>       Renamed string define to be more specific.
> v5-v6 change TRACE_SYSTEM define to be ras and not aer.
> Signed-off-by: Lance Ortiz <lance.ortiz@hp.com>
> ---
> 
>  0 files changed, 0 insertions(+), 0 deletions(-)
> 
> diff --git a/include/trace/events/ras.h b/include/trace/events/ras.h
> new file mode 100644
> index 0000000..88b8783
> --- /dev/null
> +++ b/include/trace/events/ras.h
> @@ -0,0 +1,77 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM ras
> +
> +#if !defined(_TRACE_AER_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_AER_H
> +
> +#include <linux/tracepoint.h>
> +#include <linux/edac.h>
> +
> +
> +/*
> + * PCIe AER Trace event
> + *
> + * These events are generated when hardware detects a corrected or
> + * uncorrected event on a PCIe device. The event report has
> + * the following structure:
> + *
> + * char * dev_name -	The name of the slot where the device resides
> + *			([domain:]bus:device.function).
> + * u32 status -		Either the correctable or uncorrectable register
> + *			indicating what error or errors have been seen
> + * u8 severity -	error severity 0:NONFATAL 1:FATAL 2:CORRECTED
> + */
> +
> +#define aer_correctable_errors		\
> +	{BIT(0),	"Receiver Error"},		\
> +	{BIT(6),	"Bad TLP"},			\
> +	{BIT(7),	"Bad DLLP"},			\
> +	{BIT(8),	"RELAY_NUM Rollover"},		\
> +	{BIT(12),	"Replay Timer Timeout"},	\
> +	{BIT(13),	"Advisory Non-Fatal"}
> +
> +#define aer_uncorrectable_errors		\
> +	{BIT(4),	"Data Link Protocol"},		\
> +	{BIT(12),	"Poisoned TLP"},		\
> +	{BIT(13),	"Flow Control Protocol"},	\
> +	{BIT(14),	"Completion Timeout"},		\
> +	{BIT(15),	"Completer Abort"},		\
> +	{BIT(16),	"Unexpected Completion"},	\
> +	{BIT(17),	"Receiver Overflow"},		\
> +	{BIT(18),	"Malformed TLP"},		\
> +	{BIT(19),	"ECRC"},			\
> +	{BIT(20),	"Unsupported Request"}
> +
> +TRACE_EVENT(aer_event,
> +	TP_PROTO(const char *dev_name,
> +		 const u32 status,
> +		 const u8 severity),
> +
> +	TP_ARGS(dev_name, status, severity),
> +
> +	TP_STRUCT__entry(
> +		__string(	dev_name,	dev_name	)
> +		__field(	u32,		status		)
> +		__field(	u8,		severity	)
> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(dev_name, dev_name);
> +		__entry->status		= status;
> +		__entry->severity	= severity;
> +	),
> +
> +	TP_printk("%s PCIe Bus Error: severity=%s, %s\n",
> +		__get_str(dev_name),
> +		__entry->severity == HW_EVENT_ERR_CORRECTED ? "Corrected" :
> +			__entry->severity == HW_EVENT_ERR_FATAL ?
> +			"Fatal" : "Uncorrected",
> +		__entry->severity == HW_EVENT_ERR_CORRECTED ?
> +		__print_flags(__entry->status, "|", aer_correctable_errors) :
> +		__print_flags(__entry->status, "|", aer_uncorrectable_errors))
> +);
> +
> +#endif /* _TRACE_AER_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>
> 

Provided that a latter patch adds the missing bits and joins the strings used
both here and at aer, as proposed by Boris:

Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>

-- 

Cheers,
Mauro

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 2/3] aerdrv: Enhanced AER logging
  2012-12-04 17:04 ` [PATCH v6 2/3] aerdrv: Enhanced AER logging Lance Ortiz
@ 2012-12-04 18:41   ` Mauro Carvalho Chehab
  2012-12-04 20:14     ` Ortiz, Lance E
  0 siblings, 1 reply; 10+ messages in thread
From: Mauro Carvalho Chehab @ 2012-12-04 18:41 UTC (permalink / raw)
  To: Lance Ortiz
  Cc: bhelgaas, lance_ortiz, jiang.liu, tony.luck, bp, rostedt,
	linux-acpi, linux-pci, linux-kernel

Em Tue, 04 Dec 2012 10:04:36 -0700
Lance Ortiz <lance.ortiz@hp.com> escreveu:

> This patch will provide a more reliable and easy way for user-space
> applications to have access to AER logs rather than reading them from the
> message buffer. It also provides a way to notify user-space when an AER
> event occurs.
> 
> The aer driver is updated to generate a trace event of function 'aer_event'
> when a PCIe error is reported over the AER interface.  The trace event was
> added to both the interrupt based aer path and the firmware first path.
> 
> v1-v2 fix compile errors in ifdefs.
> v2-v3 Update to new location of trace header. Update print to remove
> warning.
> v3-v4 Reworked logic when getting ready to call cper_print_aer
> Signed-off-by: Lance Ortiz <lance.ortiz@hp.com>
> ---
> 
>  0 files changed, 0 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/acpi/apei/cper.c b/drivers/acpi/apei/cper.c
> index e6defd8..4a3e945 100644
> --- a/drivers/acpi/apei/cper.c
> +++ b/drivers/acpi/apei/cper.c
> @@ -29,6 +29,7 @@
>  #include <linux/time.h>
>  #include <linux/cper.h>
>  #include <linux/acpi.h>
> +#include <linux/pci.h>
>  #include <linux/aer.h>
>  
>  /*
> @@ -249,6 +250,10 @@ static const char *cper_pcie_port_type_strs[] = {
>  static void cper_print_pcie(const char *pfx, const struct cper_sec_pcie *pcie,
>  			    const struct acpi_hest_generic_data *gdata)
>  {
> +#ifdef CONFIG_ACPI_APEI_PCIEAER
> +	struct pci_dev *dev;
> +#endif
> +
>  	if (pcie->validation_bits & CPER_PCIE_VALID_PORT_TYPE)
>  		printk("%s""port_type: %d, %s\n", pfx, pcie->port_type,
>  		       pcie->port_type < ARRAY_SIZE(cper_pcie_port_type_strs) ?
> @@ -281,10 +286,18 @@ static void cper_print_pcie(const char *pfx, const struct cper_sec_pcie *pcie,
>  	"%s""bridge: secondary_status: 0x%04x, control: 0x%04x\n",
>  	pfx, pcie->bridge.secondary_status, pcie->bridge.control);
>  #ifdef CONFIG_ACPI_APEI_PCIEAER
> -	if (pcie->validation_bits & CPER_PCIE_VALID_AER_INFO) {
> -		struct aer_capability_regs *aer_regs = (void *)pcie->aer_info;
> -		cper_print_aer(pfx, gdata->error_severity, aer_regs);
> +	dev = pci_get_domain_bus_and_slot(pcie->device_id.segment,
> +			pcie->device_id.bus, pcie->device_id.function);
> +	if (!dev) {
> +		pr_info("PCI AER Cannot get PCI device %04x:%02x:%02x.%d\n",
> +			pcie->device_id.segment, pcie->device_id.bus,
> +			pcie->device_id.slot, pcie->device_id.function);

Hmm... please correct if I'm wrong, but an error happened at PCI, and also a 
kernel bug that prevented it to get the proper PCI device... 

IMHO, the message here should be stronger, and likely printed via pr_err().

> +		return;
>  	}
> +	if (pcie->validation_bits & CPER_PCIE_VALID_AER_INFO)
> +		cper_print_aer(dev, gdata->error_severity,
> +				(struct aer_capability_regs *) pcie->aer_info);
> +	pci_dev_put(dev);
>  #endif
>  }
>  
> diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c
> index 3ea5173..34d96e4 100644
> --- a/drivers/pci/pcie/aer/aerdrv_errprint.c
> +++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
> @@ -23,6 +23,9 @@
>  
>  #include "aerdrv.h"
>  
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/ras.h>
> +
>  #define AER_AGENT_RECEIVER		0
>  #define AER_AGENT_REQUESTER		1
>  #define AER_AGENT_COMPLETER		2
> @@ -194,6 +197,8 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
>  	if (info->id && info->error_dev_num > 1 && info->id == id)
>  		printk("%s""  Error of this Agent(%04x) is reported first\n",
>  			prefix, id);
> +	trace_aer_event(dev_name(&dev->dev), (info->status & ~info->mask),
> +			info->severity);
>  }
>  
>  void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info)
> @@ -217,12 +222,13 @@ int cper_severity_to_aer(int cper_severity)
>  }
>  EXPORT_SYMBOL_GPL(cper_severity_to_aer);
>  
> -void cper_print_aer(const char *prefix, int cper_severity,
> +void cper_print_aer(struct pci_dev *dev, int cper_severity,
>  		    struct aer_capability_regs *aer)
>  {
>  	int aer_severity, layer, agent, status_strs_size, tlp_header_valid = 0;
>  	u32 status, mask;
>  	const char **status_strs;
> +	char *prefix = NULL;
>  
>  	aer_severity = cper_severity_to_aer(cper_severity);
>  	if (aer_severity == AER_CORRECTABLE) {
> @@ -259,5 +265,7 @@ void cper_print_aer(const char *prefix, int cper_severity,
>  			*(tlp + 8), *(tlp + 15), *(tlp + 14),
>  			*(tlp + 13), *(tlp + 12));
>  	}
> +	trace_aer_event(dev_name(&dev->dev), (status & ~mask),
> +			aer_severity);
>  }
>  #endif
> diff --git a/include/linux/aer.h b/include/linux/aer.h
> index 544abdb..7b86dc6 100644
> --- a/include/linux/aer.h
> +++ b/include/linux/aer.h
> @@ -49,7 +49,7 @@ static inline int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev)
>  }
>  #endif
>  
> -extern void cper_print_aer(const char *prefix, int cper_severity,
> +extern void cper_print_aer(struct pci_dev *dev, int cper_severity,
>  			   struct aer_capability_regs *aer);
>  extern int cper_severity_to_aer(int cper_severity);
>  extern void aer_recover_queue(int domain, unsigned int bus, unsigned int devfn,
> 

After addressing the above:

Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>


-- 

Cheers,
Mauro

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH v6 3/3] aerdrv: Cleanup log output for CPER based AER
  2012-12-04 17:39   ` Joe Perches
@ 2012-12-04 20:06     ` Ortiz, Lance E
  0 siblings, 0 replies; 10+ messages in thread
From: Ortiz, Lance E @ 2012-12-04 20:06 UTC (permalink / raw)
  To: Joe Perches
  Cc: bhelgaas, lance_ortiz, jiang.liu, tony.luck, bp, rostedt,
	mchehab, linux-acpi, linux-pci, linux-kernel

> Hi again Lance.
> 
> > diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c
> b/drivers/pci/pcie/aer/aerdrv_errprint.c
> []
> > @@ -228,9 +228,14 @@ void cper_print_aer(struct pci_dev *dev, int
> cper_severity,
> >  	int aer_severity, layer, agent, status_strs_size,
> tlp_header_valid = 0;
> >  	u32 status, mask;
> >  	const char **status_strs;
> > -	char *prefix = NULL;
> > +	char prefix[44];
> >
> >  	aer_severity = cper_severity_to_aer(cper_severity);
> > +	snprintf(prefix, sizeof(prefix), "%s%s %s: ",
> > +		(aer_severity == AER_CORRECTABLE) ?
> > +		KERN_WARNING : KERN_ERR,
> > +		dev_driver_string(&dev->dev), dev_name(&dev->dev));
> > +
> >  	if (aer_severity == AER_CORRECTABLE) {
> >  		status = aer->cor_status;
> >  		mask = aer->cor_mask;
> 
> Perhaps this would be better just using dev_printk
> instead of snprintf into a prefix with printk to
> emulate dev_printk.
> 
> Also, perhaps KERN_NOTICE is preferable to KERN_WARNING
> in the CORRECTABLE case.
> 
> Maybe something like:
> 
> 	const char *level = KERN_ERR;
> 	if (aer_severity == AER_CORRECTABLE)
> 		level = KERN_NOTICE;
> 
> 	...
> 
> 	dev_printk(level, &dev->dev, etc...);
> 
> Maybe do this after this series of patches is accepted.
> Enough with the revisions for awhile...

Hi Joe,

The reason I did it this way was I was trying to be consistent with what was done for the AER interrupt case.  The function aer_print_error()in the same file uses sprintf this way.  I'm not sure why they chose that instead of dev_printk.  I think if we did make the change we should do it in both places to keep them the same.  

Lance

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH v6 2/3] aerdrv: Enhanced AER logging
  2012-12-04 18:41   ` Mauro Carvalho Chehab
@ 2012-12-04 20:14     ` Ortiz, Lance E
  2012-12-04 20:31       ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 10+ messages in thread
From: Ortiz, Lance E @ 2012-12-04 20:14 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: bhelgaas, lance_ortiz, jiang.liu, tony.luck, bp, rostedt,
	linux-acpi, linux-pci, linux-kernel

> > +	if (!dev) {
> > +		pr_info("PCI AER Cannot get PCI device
> %04x:%02x:%02x.%d\n",
> > +			pcie->device_id.segment, pcie->device_id.bus,
> > +			pcie->device_id.slot, pcie->device_id.function);
> 
> Hmm... please correct if I'm wrong, but an error happened at PCI, and
> also a
> kernel bug that prevented it to get the proper PCI device...
> 
> IMHO, the message here should be stronger, and likely printed via
> pr_err().
> 

Mauro,

I modeled this message after other places in the kernel where this function failed.  So I figured it would be safe to be consistent there.  I agree though that it should be pr_err().  I can make that change.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 2/3] aerdrv: Enhanced AER logging
  2012-12-04 20:14     ` Ortiz, Lance E
@ 2012-12-04 20:31       ` Mauro Carvalho Chehab
  2012-12-04 20:36         ` Borislav Petkov
  0 siblings, 1 reply; 10+ messages in thread
From: Mauro Carvalho Chehab @ 2012-12-04 20:31 UTC (permalink / raw)
  To: Ortiz, Lance E
  Cc: bhelgaas, lance_ortiz, jiang.liu, tony.luck, bp, rostedt,
	linux-acpi, linux-pci, linux-kernel

Em Tue, 4 Dec 2012 20:14:10 +0000
"Ortiz, Lance E" <lance.oritz@hp.com> escreveu:

> > > +	if (!dev) {
> > > +		pr_info("PCI AER Cannot get PCI device
> > %04x:%02x:%02x.%d\n",
> > > +			pcie->device_id.segment, pcie->device_id.bus,
> > > +			pcie->device_id.slot, pcie->device_id.function);
> > 
> > Hmm... please correct if I'm wrong, but an error happened at PCI, and
> > also a
> > kernel bug that prevented it to get the proper PCI device...
> > 
> > IMHO, the message here should be stronger, and likely printed via
> > pr_err().
> > 
> 
> Mauro,
> 
> I modeled this message after other places in the kernel where this function failed.  So I figured it would be safe to be consistent there.  I agree though that it should be pr_err().  I can make that change.

I understand. On most cases, this may not be a critical issue.

However, in this particular case, if PCI AER got an error, but the device
is not found when trying to handle it, it can be an indication that 
the PCI device has a more serious issue. So, I'm in favor of changing it,
and likely be more verbose at the error message, saying that the device
was not found while trying to report an error condition that happened
there. It could make sense to even send a trace for the daemon to be
aware of the error, on some pci device that vanished likely due to the
error.

Regards,
Mauro

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 2/3] aerdrv: Enhanced AER logging
  2012-12-04 20:31       ` Mauro Carvalho Chehab
@ 2012-12-04 20:36         ` Borislav Petkov
  0 siblings, 0 replies; 10+ messages in thread
From: Borislav Petkov @ 2012-12-04 20:36 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Ortiz, Lance E, bhelgaas, lance_ortiz, jiang.liu, tony.luck,
	rostedt, linux-acpi, linux-pci, linux-kernel

On Tue, Dec 04, 2012 at 06:31:10PM -0200, Mauro Carvalho Chehab wrote:
> However, in this particular case, if PCI AER got an error, but the
> device is not found when trying to handle it, it can be an indication
> that the PCI device has a more serious issue. So, I'm in favor
> of changing it, and likely be more verbose at the error message,
> saying that the device was not found while trying to report an error
> condition that happened there. It could make sense to even send a
> trace for the daemon to be aware of the error, on some pci device that
> vanished likely due to the error.

Let's leave it at pr_err now and if we actually start seeing errors like
that and decide that they need more verbose reporting, then to change it
to whatever's best. IOW, do the empirical approach.

Thanks.

-- 
Regards/Gruss,
    Boris.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-12-04 20:36 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-12-04 17:04 [PATCH v6 1/3] aerdrv: Trace Event for AER Lance Ortiz
2012-12-04 17:04 ` [PATCH v6 2/3] aerdrv: Enhanced AER logging Lance Ortiz
2012-12-04 18:41   ` Mauro Carvalho Chehab
2012-12-04 20:14     ` Ortiz, Lance E
2012-12-04 20:31       ` Mauro Carvalho Chehab
2012-12-04 20:36         ` Borislav Petkov
2012-12-04 17:04 ` [PATCH v6 3/3] aerdrv: Cleanup log output for CPER based AER Lance Ortiz
2012-12-04 17:39   ` Joe Perches
2012-12-04 20:06     ` Ortiz, Lance E
2012-12-04 18:36 ` [PATCH v6 1/3] aerdrv: Trace Event for AER Mauro Carvalho Chehab

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).