All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] PCI/AER: Clean up logging
@ 2023-12-06 22:42 ` Bjorn Helgaas
  0 siblings, 0 replies; 29+ messages in thread
From: Bjorn Helgaas @ 2023-12-06 22:42 UTC (permalink / raw)
  To: linux-pci
  Cc: Mahesh J Salgaonkar, Oliver O'Halloran, Robert Richter,
	Terry Bowman, Kai-Heng Feng, linuxppc-dev, linux-kernel,
	Bjorn Helgaas

From: Bjorn Helgaas <bhelgaas@google.com>

Clean up some minor AER logging issues:

  - Log as "Correctable errors", not "Corrected errors"

  - Decode the Requester ID when we couldn't find detail error info

Bjorn Helgaas (3):
  PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors
  PCI/AER: Decode Requester ID when no error info found
  PCI/AER: Use explicit register sizes for struct members

 drivers/pci/pcie/aer.c | 19 ++++++++++++-------
 include/linux/aer.h    |  8 ++++----
 2 files changed, 16 insertions(+), 11 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 0/3] PCI/AER: Clean up logging
@ 2023-12-06 22:42 ` Bjorn Helgaas
  0 siblings, 0 replies; 29+ messages in thread
From: Bjorn Helgaas @ 2023-12-06 22:42 UTC (permalink / raw)
  To: linux-pci
  Cc: Robert Richter, Terry Bowman, Mahesh J Salgaonkar, linux-kernel,
	Kai-Heng Feng, Oliver O'Halloran, Bjorn Helgaas,
	linuxppc-dev

From: Bjorn Helgaas <bhelgaas@google.com>

Clean up some minor AER logging issues:

  - Log as "Correctable errors", not "Corrected errors"

  - Decode the Requester ID when we couldn't find detail error info

Bjorn Helgaas (3):
  PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors
  PCI/AER: Decode Requester ID when no error info found
  PCI/AER: Use explicit register sizes for struct members

 drivers/pci/pcie/aer.c | 19 ++++++++++++-------
 include/linux/aer.h    |  8 ++++----
 2 files changed, 16 insertions(+), 11 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 1/3] PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors
  2023-12-06 22:42 ` Bjorn Helgaas
@ 2023-12-06 22:42   ` Bjorn Helgaas
  -1 siblings, 0 replies; 29+ messages in thread
From: Bjorn Helgaas @ 2023-12-06 22:42 UTC (permalink / raw)
  To: linux-pci
  Cc: Mahesh J Salgaonkar, Oliver O'Halloran, Robert Richter,
	Terry Bowman, Kai-Heng Feng, linuxppc-dev, linux-kernel,
	Bjorn Helgaas

From: Bjorn Helgaas <bhelgaas@google.com>

The PCIe spec classifies errors as either "Correctable" or "Uncorrectable".
Previously we printed these as "Corrected" or "Uncorrected".  To avoid
confusion, use the same terms as the spec.

One confusing situation is when one agent detects an error, but another
agent is responsible for recovery, e.g., by re-attempting the operation.
The first agent may log a "correctable" error but it has not yet been
corrected.  The recovery agent must report an uncorrectable error if it is
unable to recover.  If we print the first agent's error as "Corrected", it
gives the false impression that it has already been resolved.

Sample message change:

  - pcieport 0000:00:1c.5: AER: Corrected error received: 0000:00:1c.5
  + pcieport 0000:00:1c.5: AER: Correctable error received: 0000:00:1c.5

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/pcie/aer.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 42a3bd35a3e1..20db80018b5d 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -436,9 +436,9 @@ void pci_aer_exit(struct pci_dev *dev)
  * AER error strings
  */
 static const char *aer_error_severity_string[] = {
-	"Uncorrected (Non-Fatal)",
-	"Uncorrected (Fatal)",
-	"Corrected"
+	"Uncorrectable (Non-Fatal)",
+	"Uncorrectable (Fatal)",
+	"Correctable"
 };
 
 static const char *aer_error_layer[] = {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 1/3] PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors
@ 2023-12-06 22:42   ` Bjorn Helgaas
  0 siblings, 0 replies; 29+ messages in thread
From: Bjorn Helgaas @ 2023-12-06 22:42 UTC (permalink / raw)
  To: linux-pci
  Cc: Robert Richter, Terry Bowman, Mahesh J Salgaonkar, linux-kernel,
	Kai-Heng Feng, Oliver O'Halloran, Bjorn Helgaas,
	linuxppc-dev

From: Bjorn Helgaas <bhelgaas@google.com>

The PCIe spec classifies errors as either "Correctable" or "Uncorrectable".
Previously we printed these as "Corrected" or "Uncorrected".  To avoid
confusion, use the same terms as the spec.

One confusing situation is when one agent detects an error, but another
agent is responsible for recovery, e.g., by re-attempting the operation.
The first agent may log a "correctable" error but it has not yet been
corrected.  The recovery agent must report an uncorrectable error if it is
unable to recover.  If we print the first agent's error as "Corrected", it
gives the false impression that it has already been resolved.

Sample message change:

  - pcieport 0000:00:1c.5: AER: Corrected error received: 0000:00:1c.5
  + pcieport 0000:00:1c.5: AER: Correctable error received: 0000:00:1c.5

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/pcie/aer.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 42a3bd35a3e1..20db80018b5d 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -436,9 +436,9 @@ void pci_aer_exit(struct pci_dev *dev)
  * AER error strings
  */
 static const char *aer_error_severity_string[] = {
-	"Uncorrected (Non-Fatal)",
-	"Uncorrected (Fatal)",
-	"Corrected"
+	"Uncorrectable (Non-Fatal)",
+	"Uncorrectable (Fatal)",
+	"Correctable"
 };
 
 static const char *aer_error_layer[] = {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 2/3] PCI/AER: Decode Requester ID when no error info found
  2023-12-06 22:42 ` Bjorn Helgaas
@ 2023-12-06 22:42   ` Bjorn Helgaas
  -1 siblings, 0 replies; 29+ messages in thread
From: Bjorn Helgaas @ 2023-12-06 22:42 UTC (permalink / raw)
  To: linux-pci
  Cc: Mahesh J Salgaonkar, Oliver O'Halloran, Robert Richter,
	Terry Bowman, Kai-Heng Feng, linuxppc-dev, linux-kernel,
	Bjorn Helgaas

From: Bjorn Helgaas <bhelgaas@google.com>

When a device with AER detects an error, it logs error information in its
own AER Error Status registers.  It may send an Error Message to the Root
Port (RCEC in the case of an RCiEP), which logs the fact that an Error
Message was received (Root Error Status) and the Requester ID of the
message source (Error Source Identification).

aer_print_port_info() prints the Requester ID from the Root Port Error
Source in the usual Linux "bb:dd.f" format, but when find_source_device()
finds no error details in the hierarchy below the Root Port, it printed the
raw Requester ID without decoding it.

Decode the Requester ID in the usual Linux format so it matches other
messages.

Sample message changes:

  - pcieport 0000:00:1c.5: AER: Correctable error received: 0000:00:1c.5
  - pcieport 0000:00:1c.5: AER: can't find device of ID00e5
  + pcieport 0000:00:1c.5: AER: Correctable error message received from 0000:00:1c.5
  + pcieport 0000:00:1c.5: AER: found no error details for 0000:00:1c.5

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/pcie/aer.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 20db80018b5d..2ff6bac9979f 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -740,7 +740,7 @@ static void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info)
 	u8 bus = info->id >> 8;
 	u8 devfn = info->id & 0xff;
 
-	pci_info(dev, "%s%s error received: %04x:%02x:%02x.%d\n",
+	pci_info(dev, "%s%s error message received from %04x:%02x:%02x.%d\n",
 		 info->multi_error_valid ? "Multiple " : "",
 		 aer_error_severity_string[info->severity],
 		 pci_domain_nr(dev->bus), bus, PCI_SLOT(devfn),
@@ -929,7 +929,12 @@ static bool find_source_device(struct pci_dev *parent,
 		pci_walk_bus(parent->subordinate, find_device_iter, e_info);
 
 	if (!e_info->error_dev_num) {
-		pci_info(parent, "can't find device of ID%04x\n", e_info->id);
+		u8 bus = e_info->id >> 8;
+		u8 devfn = e_info->id & 0xff;
+
+		pci_info(parent, "found no error details for %04x:%02x:%02x.%d\n",
+			 pci_domain_nr(parent->bus), bus, PCI_SLOT(devfn),
+			 PCI_FUNC(devfn));
 		return false;
 	}
 	return true;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 2/3] PCI/AER: Decode Requester ID when no error info found
@ 2023-12-06 22:42   ` Bjorn Helgaas
  0 siblings, 0 replies; 29+ messages in thread
From: Bjorn Helgaas @ 2023-12-06 22:42 UTC (permalink / raw)
  To: linux-pci
  Cc: Robert Richter, Terry Bowman, Mahesh J Salgaonkar, linux-kernel,
	Kai-Heng Feng, Oliver O'Halloran, Bjorn Helgaas,
	linuxppc-dev

From: Bjorn Helgaas <bhelgaas@google.com>

When a device with AER detects an error, it logs error information in its
own AER Error Status registers.  It may send an Error Message to the Root
Port (RCEC in the case of an RCiEP), which logs the fact that an Error
Message was received (Root Error Status) and the Requester ID of the
message source (Error Source Identification).

aer_print_port_info() prints the Requester ID from the Root Port Error
Source in the usual Linux "bb:dd.f" format, but when find_source_device()
finds no error details in the hierarchy below the Root Port, it printed the
raw Requester ID without decoding it.

Decode the Requester ID in the usual Linux format so it matches other
messages.

Sample message changes:

  - pcieport 0000:00:1c.5: AER: Correctable error received: 0000:00:1c.5
  - pcieport 0000:00:1c.5: AER: can't find device of ID00e5
  + pcieport 0000:00:1c.5: AER: Correctable error message received from 0000:00:1c.5
  + pcieport 0000:00:1c.5: AER: found no error details for 0000:00:1c.5

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/pcie/aer.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 20db80018b5d..2ff6bac9979f 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -740,7 +740,7 @@ static void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info)
 	u8 bus = info->id >> 8;
 	u8 devfn = info->id & 0xff;
 
-	pci_info(dev, "%s%s error received: %04x:%02x:%02x.%d\n",
+	pci_info(dev, "%s%s error message received from %04x:%02x:%02x.%d\n",
 		 info->multi_error_valid ? "Multiple " : "",
 		 aer_error_severity_string[info->severity],
 		 pci_domain_nr(dev->bus), bus, PCI_SLOT(devfn),
@@ -929,7 +929,12 @@ static bool find_source_device(struct pci_dev *parent,
 		pci_walk_bus(parent->subordinate, find_device_iter, e_info);
 
 	if (!e_info->error_dev_num) {
-		pci_info(parent, "can't find device of ID%04x\n", e_info->id);
+		u8 bus = e_info->id >> 8;
+		u8 devfn = e_info->id & 0xff;
+
+		pci_info(parent, "found no error details for %04x:%02x:%02x.%d\n",
+			 pci_domain_nr(parent->bus), bus, PCI_SLOT(devfn),
+			 PCI_FUNC(devfn));
 		return false;
 	}
 	return true;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 3/3] PCI/AER: Use explicit register sizes for struct members
  2023-12-06 22:42 ` Bjorn Helgaas
@ 2023-12-06 22:42   ` Bjorn Helgaas
  -1 siblings, 0 replies; 29+ messages in thread
From: Bjorn Helgaas @ 2023-12-06 22:42 UTC (permalink / raw)
  To: linux-pci
  Cc: Mahesh J Salgaonkar, Oliver O'Halloran, Robert Richter,
	Terry Bowman, Kai-Heng Feng, linuxppc-dev, linux-kernel,
	Bjorn Helgaas

From: Bjorn Helgaas <bhelgaas@google.com>

aer_irq() reads the AER Root Error Status and Error Source Identification
(PCI_ERR_ROOT_STATUS and PCI_ERR_ROOT_ERR_SRC) registers directly into
struct aer_err_source.  Both registers are 32 bits, so declare the members
explicitly as "u32" instead of "unsigned int".

Similarly, aer_get_device_error_info() reads the AER Header Log
(PCI_ERR_HEADER_LOG) registers, which are also 32 bits, into struct
aer_header_log_regs.  Declare those members as "u32" as well.

No functional changes intended.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/pcie/aer.c | 4 ++--
 include/linux/aer.h    | 8 ++++----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 2ff6bac9979f..60f84414ec2a 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -41,8 +41,8 @@
 #define AER_MAX_TYPEOF_UNCOR_ERRS	27	/* as per PCI_ERR_UNCOR_STATUS*/
 
 struct aer_err_source {
-	unsigned int status;
-	unsigned int id;
+	u32 status;			/* PCI_ERR_ROOT_STATUS */
+	u32 id;				/* PCI_ERR_ROOT_ERR_SRC */
 };
 
 struct aer_rpc {
diff --git a/include/linux/aer.h b/include/linux/aer.h
index f6ea2f57d808..ae0fae70d4bd 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -19,10 +19,10 @@
 struct pci_dev;
 
 struct aer_header_log_regs {
-	unsigned int dw0;
-	unsigned int dw1;
-	unsigned int dw2;
-	unsigned int dw3;
+	u32 dw0;
+	u32 dw1;
+	u32 dw2;
+	u32 dw3;
 };
 
 struct aer_capability_regs {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 3/3] PCI/AER: Use explicit register sizes for struct members
@ 2023-12-06 22:42   ` Bjorn Helgaas
  0 siblings, 0 replies; 29+ messages in thread
From: Bjorn Helgaas @ 2023-12-06 22:42 UTC (permalink / raw)
  To: linux-pci
  Cc: Robert Richter, Terry Bowman, Mahesh J Salgaonkar, linux-kernel,
	Kai-Heng Feng, Oliver O'Halloran, Bjorn Helgaas,
	linuxppc-dev

From: Bjorn Helgaas <bhelgaas@google.com>

aer_irq() reads the AER Root Error Status and Error Source Identification
(PCI_ERR_ROOT_STATUS and PCI_ERR_ROOT_ERR_SRC) registers directly into
struct aer_err_source.  Both registers are 32 bits, so declare the members
explicitly as "u32" instead of "unsigned int".

Similarly, aer_get_device_error_info() reads the AER Header Log
(PCI_ERR_HEADER_LOG) registers, which are also 32 bits, into struct
aer_header_log_regs.  Declare those members as "u32" as well.

No functional changes intended.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/pcie/aer.c | 4 ++--
 include/linux/aer.h    | 8 ++++----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 2ff6bac9979f..60f84414ec2a 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -41,8 +41,8 @@
 #define AER_MAX_TYPEOF_UNCOR_ERRS	27	/* as per PCI_ERR_UNCOR_STATUS*/
 
 struct aer_err_source {
-	unsigned int status;
-	unsigned int id;
+	u32 status;			/* PCI_ERR_ROOT_STATUS */
+	u32 id;				/* PCI_ERR_ROOT_ERR_SRC */
 };
 
 struct aer_rpc {
diff --git a/include/linux/aer.h b/include/linux/aer.h
index f6ea2f57d808..ae0fae70d4bd 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -19,10 +19,10 @@
 struct pci_dev;
 
 struct aer_header_log_regs {
-	unsigned int dw0;
-	unsigned int dw1;
-	unsigned int dw2;
-	unsigned int dw3;
+	u32 dw0;
+	u32 dw1;
+	u32 dw2;
+	u32 dw3;
 };
 
 struct aer_capability_regs {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/3] PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors
  2023-12-06 22:42   ` Bjorn Helgaas
@ 2023-12-08 14:36     ` Jonathan Cameron
  -1 siblings, 0 replies; 29+ messages in thread
From: Jonathan Cameron @ 2023-12-08 14:36 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Mahesh J Salgaonkar, Oliver O'Halloran,
	Robert Richter, Terry Bowman, Kai-Heng Feng, linuxppc-dev,
	linux-kernel, Bjorn Helgaas

On Wed,  6 Dec 2023 16:42:29 -0600
Bjorn Helgaas <helgaas@kernel.org> wrote:

> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> The PCIe spec classifies errors as either "Correctable" or "Uncorrectable".
> Previously we printed these as "Corrected" or "Uncorrected".  To avoid
> confusion, use the same terms as the spec.
> 
> One confusing situation is when one agent detects an error, but another
> agent is responsible for recovery, e.g., by re-attempting the operation.
> The first agent may log a "correctable" error but it has not yet been
> corrected.  The recovery agent must report an uncorrectable error if it is
> unable to recover.  If we print the first agent's error as "Corrected", it
> gives the false impression that it has already been resolved.
> 
> Sample message change:
> 
>   - pcieport 0000:00:1c.5: AER: Corrected error received: 0000:00:1c.5
>   + pcieport 0000:00:1c.5: AER: Correctable error received: 0000:00:1c.5
> 
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com
Good to tidy this up. FWIW
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/3] PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors
@ 2023-12-08 14:36     ` Jonathan Cameron
  0 siblings, 0 replies; 29+ messages in thread
From: Jonathan Cameron @ 2023-12-08 14:36 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Robert Richter, Terry Bowman, linux-pci, Mahesh J Salgaonkar,
	linux-kernel, Kai-Heng Feng, Oliver O'Halloran,
	Bjorn Helgaas, linuxppc-dev

On Wed,  6 Dec 2023 16:42:29 -0600
Bjorn Helgaas <helgaas@kernel.org> wrote:

> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> The PCIe spec classifies errors as either "Correctable" or "Uncorrectable".
> Previously we printed these as "Corrected" or "Uncorrected".  To avoid
> confusion, use the same terms as the spec.
> 
> One confusing situation is when one agent detects an error, but another
> agent is responsible for recovery, e.g., by re-attempting the operation.
> The first agent may log a "correctable" error but it has not yet been
> corrected.  The recovery agent must report an uncorrectable error if it is
> unable to recover.  If we print the first agent's error as "Corrected", it
> gives the false impression that it has already been resolved.
> 
> Sample message change:
> 
>   - pcieport 0000:00:1c.5: AER: Corrected error received: 0000:00:1c.5
>   + pcieport 0000:00:1c.5: AER: Correctable error received: 0000:00:1c.5
> 
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com
Good to tidy this up. FWIW
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] PCI/AER: Decode Requester ID when no error info found
  2023-12-06 22:42   ` Bjorn Helgaas
@ 2023-12-08 14:38     ` Jonathan Cameron
  -1 siblings, 0 replies; 29+ messages in thread
From: Jonathan Cameron @ 2023-12-08 14:38 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Mahesh J Salgaonkar, Oliver O'Halloran,
	Robert Richter, Terry Bowman, Kai-Heng Feng, linuxppc-dev,
	linux-kernel, Bjorn Helgaas

On Wed,  6 Dec 2023 16:42:30 -0600
Bjorn Helgaas <helgaas@kernel.org> wrote:

> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> When a device with AER detects an error, it logs error information in its
> own AER Error Status registers.  It may send an Error Message to the Root
> Port (RCEC in the case of an RCiEP), which logs the fact that an Error
> Message was received (Root Error Status) and the Requester ID of the
> message source (Error Source Identification).
> 
> aer_print_port_info() prints the Requester ID from the Root Port Error
> Source in the usual Linux "bb:dd.f" format, but when find_source_device()
> finds no error details in the hierarchy below the Root Port, it printed the
> raw Requester ID without decoding it.
> 
> Decode the Requester ID in the usual Linux format so it matches other
> messages.
> 
> Sample message changes:
> 
>   - pcieport 0000:00:1c.5: AER: Correctable error received: 0000:00:1c.5
>   - pcieport 0000:00:1c.5: AER: can't find device of ID00e5
>   + pcieport 0000:00:1c.5: AER: Correctable error message received from 0000:00:1c.5
>   + pcieport 0000:00:1c.5: AER: found no error details for 0000:00:1c.5
> 
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
LGTM
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] PCI/AER: Decode Requester ID when no error info found
@ 2023-12-08 14:38     ` Jonathan Cameron
  0 siblings, 0 replies; 29+ messages in thread
From: Jonathan Cameron @ 2023-12-08 14:38 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Robert Richter, Terry Bowman, linux-pci, Mahesh J Salgaonkar,
	linux-kernel, Kai-Heng Feng, Oliver O'Halloran,
	Bjorn Helgaas, linuxppc-dev

On Wed,  6 Dec 2023 16:42:30 -0600
Bjorn Helgaas <helgaas@kernel.org> wrote:

> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> When a device with AER detects an error, it logs error information in its
> own AER Error Status registers.  It may send an Error Message to the Root
> Port (RCEC in the case of an RCiEP), which logs the fact that an Error
> Message was received (Root Error Status) and the Requester ID of the
> message source (Error Source Identification).
> 
> aer_print_port_info() prints the Requester ID from the Root Port Error
> Source in the usual Linux "bb:dd.f" format, but when find_source_device()
> finds no error details in the hierarchy below the Root Port, it printed the
> raw Requester ID without decoding it.
> 
> Decode the Requester ID in the usual Linux format so it matches other
> messages.
> 
> Sample message changes:
> 
>   - pcieport 0000:00:1c.5: AER: Correctable error received: 0000:00:1c.5
>   - pcieport 0000:00:1c.5: AER: can't find device of ID00e5
>   + pcieport 0000:00:1c.5: AER: Correctable error message received from 0000:00:1c.5
>   + pcieport 0000:00:1c.5: AER: found no error details for 0000:00:1c.5
> 
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
LGTM
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/3] PCI/AER: Use explicit register sizes for struct members
  2023-12-06 22:42   ` Bjorn Helgaas
@ 2023-12-08 14:38     ` Jonathan Cameron
  -1 siblings, 0 replies; 29+ messages in thread
From: Jonathan Cameron @ 2023-12-08 14:38 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Mahesh J Salgaonkar, Oliver O'Halloran,
	Robert Richter, Terry Bowman, Kai-Heng Feng, linuxppc-dev,
	linux-kernel, Bjorn Helgaas

On Wed,  6 Dec 2023 16:42:31 -0600
Bjorn Helgaas <helgaas@kernel.org> wrote:

> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> aer_irq() reads the AER Root Error Status and Error Source Identification
> (PCI_ERR_ROOT_STATUS and PCI_ERR_ROOT_ERR_SRC) registers directly into
> struct aer_err_source.  Both registers are 32 bits, so declare the members
> explicitly as "u32" instead of "unsigned int".
> 
> Similarly, aer_get_device_error_info() reads the AER Header Log
> (PCI_ERR_HEADER_LOG) registers, which are also 32 bits, into struct
> aer_header_log_regs.  Declare those members as "u32" as well.
> 
> No functional changes intended.
> 
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Another sensible cleanup. FWIW on such simple patches
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/3] PCI/AER: Use explicit register sizes for struct members
@ 2023-12-08 14:38     ` Jonathan Cameron
  0 siblings, 0 replies; 29+ messages in thread
From: Jonathan Cameron @ 2023-12-08 14:38 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Robert Richter, Terry Bowman, linux-pci, Mahesh J Salgaonkar,
	linux-kernel, Kai-Heng Feng, Oliver O'Halloran,
	Bjorn Helgaas, linuxppc-dev

On Wed,  6 Dec 2023 16:42:31 -0600
Bjorn Helgaas <helgaas@kernel.org> wrote:

> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> aer_irq() reads the AER Root Error Status and Error Source Identification
> (PCI_ERR_ROOT_STATUS and PCI_ERR_ROOT_ERR_SRC) registers directly into
> struct aer_err_source.  Both registers are 32 bits, so declare the members
> explicitly as "u32" instead of "unsigned int".
> 
> Similarly, aer_get_device_error_info() reads the AER Header Log
> (PCI_ERR_HEADER_LOG) registers, which are also 32 bits, into struct
> aer_header_log_regs.  Declare those members as "u32" as well.
> 
> No functional changes intended.
> 
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Another sensible cleanup. FWIW on such simple patches
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/3] PCI/AER: Clean up logging
  2023-12-06 22:42 ` Bjorn Helgaas
@ 2023-12-08 16:54   ` Bjorn Helgaas
  -1 siblings, 0 replies; 29+ messages in thread
From: Bjorn Helgaas @ 2023-12-08 16:54 UTC (permalink / raw)
  To: linux-pci
  Cc: Mahesh J Salgaonkar, Oliver O'Halloran, Robert Richter,
	Terry Bowman, Kai-Heng Feng, linuxppc-dev, linux-kernel,
	Bjorn Helgaas, Jonathan Cameron

[+cc Jonathan]

On Wed, Dec 06, 2023 at 04:42:28PM -0600, Bjorn Helgaas wrote:
> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> Clean up some minor AER logging issues:
> 
>   - Log as "Correctable errors", not "Corrected errors"
> 
>   - Decode the Requester ID when we couldn't find detail error info
> 
> Bjorn Helgaas (3):
>   PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors
>   PCI/AER: Decode Requester ID when no error info found
>   PCI/AER: Use explicit register sizes for struct members
> 
>  drivers/pci/pcie/aer.c | 19 ++++++++++++-------
>  include/linux/aer.h    |  8 ++++----
>  2 files changed, 16 insertions(+), 11 deletions(-)

Applied to pci/aer for v6.8.  Thanks, Jonathan, for your time in
taking a look.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/3] PCI/AER: Clean up logging
@ 2023-12-08 16:54   ` Bjorn Helgaas
  0 siblings, 0 replies; 29+ messages in thread
From: Bjorn Helgaas @ 2023-12-08 16:54 UTC (permalink / raw)
  To: linux-pci
  Cc: Robert Richter, Terry Bowman, Mahesh J Salgaonkar, linux-kernel,
	Kai-Heng Feng, Oliver O'Halloran, Jonathan Cameron,
	Bjorn Helgaas, linuxppc-dev

[+cc Jonathan]

On Wed, Dec 06, 2023 at 04:42:28PM -0600, Bjorn Helgaas wrote:
> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> Clean up some minor AER logging issues:
> 
>   - Log as "Correctable errors", not "Corrected errors"
> 
>   - Decode the Requester ID when we couldn't find detail error info
> 
> Bjorn Helgaas (3):
>   PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors
>   PCI/AER: Decode Requester ID when no error info found
>   PCI/AER: Use explicit register sizes for struct members
> 
>  drivers/pci/pcie/aer.c | 19 ++++++++++++-------
>  include/linux/aer.h    |  8 ++++----
>  2 files changed, 16 insertions(+), 11 deletions(-)

Applied to pci/aer for v6.8.  Thanks, Jonathan, for your time in
taking a look.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/3] PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors
  2023-12-06 22:42   ` Bjorn Helgaas
@ 2023-12-12 15:00     ` Terry Bowman
  -1 siblings, 0 replies; 29+ messages in thread
From: Terry Bowman @ 2023-12-12 15:00 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci
  Cc: Mahesh J Salgaonkar, Oliver O'Halloran, Robert Richter,
	Kai-Heng Feng, linuxppc-dev, linux-kernel, Bjorn Helgaas

Hi Bjorn,

Will help prevent confusion. LGTM. 

On 12/6/23 16:42, Bjorn Helgaas wrote:
> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> The PCIe spec classifies errors as either "Correctable" or "Uncorrectable".
> Previously we printed these as "Corrected" or "Uncorrected".  To avoid
> confusion, use the same terms as the spec.
> 
> One confusing situation is when one agent detects an error, but another
> agent is responsible for recovery, e.g., by re-attempting the operation.
> The first agent may log a "correctable" error but it has not yet been
> corrected.  The recovery agent must report an uncorrectable error if it is
> unable to recover.  If we print the first agent's error as "Corrected", it
> gives the false impression that it has already been resolved.
> 
> Sample message change:
> 
>   - pcieport 0000:00:1c.5: AER: Corrected error received: 0000:00:1c.5
>   + pcieport 0000:00:1c.5: AER: Correctable error received: 0000:00:1c.5
> 
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
>  drivers/pci/pcie/aer.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 42a3bd35a3e1..20db80018b5d 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -436,9 +436,9 @@ void pci_aer_exit(struct pci_dev *dev)
>   * AER error strings
>   */
>  static const char *aer_error_severity_string[] = {
> -	"Uncorrected (Non-Fatal)",
> -	"Uncorrected (Fatal)",
> -	"Corrected"
> +	"Uncorrectable (Non-Fatal)",
> +	"Uncorrectable (Fatal)",
> +	"Correctable"
>  };
>  
>  static const char *aer_error_layer[] = {

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/3] PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors
@ 2023-12-12 15:00     ` Terry Bowman
  0 siblings, 0 replies; 29+ messages in thread
From: Terry Bowman @ 2023-12-12 15:00 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci
  Cc: Robert Richter, Mahesh J Salgaonkar, linux-kernel, Kai-Heng Feng,
	Oliver O'Halloran, Bjorn Helgaas, linuxppc-dev

Hi Bjorn,

Will help prevent confusion. LGTM. 

On 12/6/23 16:42, Bjorn Helgaas wrote:
> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> The PCIe spec classifies errors as either "Correctable" or "Uncorrectable".
> Previously we printed these as "Corrected" or "Uncorrected".  To avoid
> confusion, use the same terms as the spec.
> 
> One confusing situation is when one agent detects an error, but another
> agent is responsible for recovery, e.g., by re-attempting the operation.
> The first agent may log a "correctable" error but it has not yet been
> corrected.  The recovery agent must report an uncorrectable error if it is
> unable to recover.  If we print the first agent's error as "Corrected", it
> gives the false impression that it has already been resolved.
> 
> Sample message change:
> 
>   - pcieport 0000:00:1c.5: AER: Corrected error received: 0000:00:1c.5
>   + pcieport 0000:00:1c.5: AER: Correctable error received: 0000:00:1c.5
> 
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
>  drivers/pci/pcie/aer.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 42a3bd35a3e1..20db80018b5d 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -436,9 +436,9 @@ void pci_aer_exit(struct pci_dev *dev)
>   * AER error strings
>   */
>  static const char *aer_error_severity_string[] = {
> -	"Uncorrected (Non-Fatal)",
> -	"Uncorrected (Fatal)",
> -	"Corrected"
> +	"Uncorrectable (Non-Fatal)",
> +	"Uncorrectable (Fatal)",
> +	"Correctable"
>  };
>  
>  static const char *aer_error_layer[] = {

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] PCI/AER: Decode Requester ID when no error info found
  2023-12-06 22:42   ` Bjorn Helgaas
@ 2023-12-12 15:00     ` Terry Bowman
  -1 siblings, 0 replies; 29+ messages in thread
From: Terry Bowman @ 2023-12-12 15:00 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci
  Cc: Mahesh J Salgaonkar, Oliver O'Halloran, Robert Richter,
	Kai-Heng Feng, linuxppc-dev, linux-kernel, Bjorn Helgaas

LGTM

On 12/6/23 16:42, Bjorn Helgaas wrote:
> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> When a device with AER detects an error, it logs error information in its
> own AER Error Status registers.  It may send an Error Message to the Root
> Port (RCEC in the case of an RCiEP), which logs the fact that an Error
> Message was received (Root Error Status) and the Requester ID of the
> message source (Error Source Identification).
> 
> aer_print_port_info() prints the Requester ID from the Root Port Error
> Source in the usual Linux "bb:dd.f" format, but when find_source_device()
> finds no error details in the hierarchy below the Root Port, it printed the
> raw Requester ID without decoding it.
> 
> Decode the Requester ID in the usual Linux format so it matches other
> messages.
> 
> Sample message changes:
> 
>   - pcieport 0000:00:1c.5: AER: Correctable error received: 0000:00:1c.5
>   - pcieport 0000:00:1c.5: AER: can't find device of ID00e5
>   + pcieport 0000:00:1c.5: AER: Correctable error message received from 0000:00:1c.5
>   + pcieport 0000:00:1c.5: AER: found no error details for 0000:00:1c.5
> 
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
>  drivers/pci/pcie/aer.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 20db80018b5d..2ff6bac9979f 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -740,7 +740,7 @@ static void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info)
>  	u8 bus = info->id >> 8;
>  	u8 devfn = info->id & 0xff;
>  
> -	pci_info(dev, "%s%s error received: %04x:%02x:%02x.%d\n",
> +	pci_info(dev, "%s%s error message received from %04x:%02x:%02x.%d\n",
>  		 info->multi_error_valid ? "Multiple " : "",
>  		 aer_error_severity_string[info->severity],
>  		 pci_domain_nr(dev->bus), bus, PCI_SLOT(devfn),
> @@ -929,7 +929,12 @@ static bool find_source_device(struct pci_dev *parent,
>  		pci_walk_bus(parent->subordinate, find_device_iter, e_info);
>  
>  	if (!e_info->error_dev_num) {
> -		pci_info(parent, "can't find device of ID%04x\n", e_info->id);
> +		u8 bus = e_info->id >> 8;
> +		u8 devfn = e_info->id & 0xff;
> +
> +		pci_info(parent, "found no error details for %04x:%02x:%02x.%d\n",
> +			 pci_domain_nr(parent->bus), bus, PCI_SLOT(devfn),
> +			 PCI_FUNC(devfn));
>  		return false;
>  	}
>  	return true;

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] PCI/AER: Decode Requester ID when no error info found
@ 2023-12-12 15:00     ` Terry Bowman
  0 siblings, 0 replies; 29+ messages in thread
From: Terry Bowman @ 2023-12-12 15:00 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci
  Cc: Robert Richter, Mahesh J Salgaonkar, linux-kernel, Kai-Heng Feng,
	Oliver O'Halloran, Bjorn Helgaas, linuxppc-dev

LGTM

On 12/6/23 16:42, Bjorn Helgaas wrote:
> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> When a device with AER detects an error, it logs error information in its
> own AER Error Status registers.  It may send an Error Message to the Root
> Port (RCEC in the case of an RCiEP), which logs the fact that an Error
> Message was received (Root Error Status) and the Requester ID of the
> message source (Error Source Identification).
> 
> aer_print_port_info() prints the Requester ID from the Root Port Error
> Source in the usual Linux "bb:dd.f" format, but when find_source_device()
> finds no error details in the hierarchy below the Root Port, it printed the
> raw Requester ID without decoding it.
> 
> Decode the Requester ID in the usual Linux format so it matches other
> messages.
> 
> Sample message changes:
> 
>   - pcieport 0000:00:1c.5: AER: Correctable error received: 0000:00:1c.5
>   - pcieport 0000:00:1c.5: AER: can't find device of ID00e5
>   + pcieport 0000:00:1c.5: AER: Correctable error message received from 0000:00:1c.5
>   + pcieport 0000:00:1c.5: AER: found no error details for 0000:00:1c.5
> 
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
>  drivers/pci/pcie/aer.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 20db80018b5d..2ff6bac9979f 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -740,7 +740,7 @@ static void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info)
>  	u8 bus = info->id >> 8;
>  	u8 devfn = info->id & 0xff;
>  
> -	pci_info(dev, "%s%s error received: %04x:%02x:%02x.%d\n",
> +	pci_info(dev, "%s%s error message received from %04x:%02x:%02x.%d\n",
>  		 info->multi_error_valid ? "Multiple " : "",
>  		 aer_error_severity_string[info->severity],
>  		 pci_domain_nr(dev->bus), bus, PCI_SLOT(devfn),
> @@ -929,7 +929,12 @@ static bool find_source_device(struct pci_dev *parent,
>  		pci_walk_bus(parent->subordinate, find_device_iter, e_info);
>  
>  	if (!e_info->error_dev_num) {
> -		pci_info(parent, "can't find device of ID%04x\n", e_info->id);
> +		u8 bus = e_info->id >> 8;
> +		u8 devfn = e_info->id & 0xff;
> +
> +		pci_info(parent, "found no error details for %04x:%02x:%02x.%d\n",
> +			 pci_domain_nr(parent->bus), bus, PCI_SLOT(devfn),
> +			 PCI_FUNC(devfn));
>  		return false;
>  	}
>  	return true;

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/3] PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors
  2023-12-12 15:00     ` Terry Bowman
@ 2023-12-12 21:23       ` Bjorn Helgaas
  -1 siblings, 0 replies; 29+ messages in thread
From: Bjorn Helgaas @ 2023-12-12 21:23 UTC (permalink / raw)
  To: Terry Bowman
  Cc: linux-pci, Mahesh J Salgaonkar, Oliver O'Halloran,
	Robert Richter, Kai-Heng Feng, linuxppc-dev, linux-kernel,
	Bjorn Helgaas

On Tue, Dec 12, 2023 at 09:00:24AM -0600, Terry Bowman wrote:
> Hi Bjorn,
> 
> Will help prevent confusion. LGTM. 

Thanks a lot for taking a look at these!  I'd like to give you credit
in the log, e.g., "Reviewed-by: Terry Bowman <Terry.Bowman@amd.com>",
but I'm OCD enough that I don't want to translate "LGTM" into that all
by myself.

If you want that credit (and, I guess, the privilege of being cc'd
when we find that these patches break something :)), just reply again
with that actual "Reviewed-by:" text in it.

Bjorn

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/3] PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors
@ 2023-12-12 21:23       ` Bjorn Helgaas
  0 siblings, 0 replies; 29+ messages in thread
From: Bjorn Helgaas @ 2023-12-12 21:23 UTC (permalink / raw)
  To: Terry Bowman
  Cc: Robert Richter, linux-pci, Mahesh J Salgaonkar, linux-kernel,
	Kai-Heng Feng, Oliver O'Halloran, Bjorn Helgaas,
	linuxppc-dev

On Tue, Dec 12, 2023 at 09:00:24AM -0600, Terry Bowman wrote:
> Hi Bjorn,
> 
> Will help prevent confusion. LGTM. 

Thanks a lot for taking a look at these!  I'd like to give you credit
in the log, e.g., "Reviewed-by: Terry Bowman <Terry.Bowman@amd.com>",
but I'm OCD enough that I don't want to translate "LGTM" into that all
by myself.

If you want that credit (and, I guess, the privilege of being cc'd
when we find that these patches break something :)), just reply again
with that actual "Reviewed-by:" text in it.

Bjorn

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/3] PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors
  2023-12-12 21:23       ` Bjorn Helgaas
  (?)
@ 2023-12-12 22:42       ` Bowman, Terry
  -1 siblings, 0 replies; 29+ messages in thread
From: Bowman, Terry @ 2023-12-12 22:42 UTC (permalink / raw)
  To: Bjorn Helgaas, Terry Bowman
  Cc: Robert Richter, linux-pci, Mahesh J Salgaonkar, linux-kernel,
	Kai-Heng Feng, Oliver O'Halloran, Bjorn Helgaas,
	linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 655 bytes --]

No problem. You can add my "Reviewed-by". Thanks.

Regards,
Terry
On 12/12/2023 3:23 PM, Bjorn Helgaas wrote:
> On Tue, Dec 12, 2023 at 09:00:24AM -0600, Terry Bowman wrote:
>> Hi Bjorn,
>>
>> Will help prevent confusion. LGTM.
> Thanks a lot for taking a look at these!  I'd like to give you credit
> in the log, e.g., "Reviewed-by: Terry Bowman<Terry.Bowman@amd.com>",
> but I'm OCD enough that I don't want to translate "LGTM" into that all
> by myself.
>
> If you want that credit (and, I guess, the privilege of being cc'd
> when we find that these patches break something :)), just reply again
> with that actual "Reviewed-by:" text in it.
>
> Bjorn

[-- Attachment #2: Type: text/html, Size: 1295 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] PCI/AER: Decode Requester ID when no error info found
  2023-12-06 22:42   ` Bjorn Helgaas
@ 2024-01-02 19:22     ` Kuppuswamy Sathyanarayanan
  -1 siblings, 0 replies; 29+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2024-01-02 19:22 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci
  Cc: Mahesh J Salgaonkar, Oliver O'Halloran, Robert Richter,
	Terry Bowman, Kai-Heng Feng, linuxppc-dev, linux-kernel,
	Bjorn Helgaas



On 12/6/2023 2:42 PM, Bjorn Helgaas wrote:
> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> When a device with AER detects an error, it logs error information in its
> own AER Error Status registers.  It may send an Error Message to the Root
> Port (RCEC in the case of an RCiEP), which logs the fact that an Error
> Message was received (Root Error Status) and the Requester ID of the
> message source (Error Source Identification).
> 
> aer_print_port_info() prints the Requester ID from the Root Port Error
> Source in the usual Linux "bb:dd.f" format, but when find_source_device()
> finds no error details in the hierarchy below the Root Port, it printed the
> raw Requester ID without decoding it.
> 
> Decode the Requester ID in the usual Linux format so it matches other
> messages.
> 
> Sample message changes:
> 
>   - pcieport 0000:00:1c.5: AER: Correctable error received: 0000:00:1c.5
>   - pcieport 0000:00:1c.5: AER: can't find device of ID00e5
>   + pcieport 0000:00:1c.5: AER: Correctable error message received from 0000:00:1c.5
>   + pcieport 0000:00:1c.5: AER: found no error details for 0000:00:1c.5
> 
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Except for the suggestion given below, it looks good to me.

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

> ---
>  drivers/pci/pcie/aer.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 20db80018b5d..2ff6bac9979f 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -740,7 +740,7 @@ static void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info)
>  	u8 bus = info->id >> 8;
>  	u8 devfn = info->id & 0xff;
>  
> -	pci_info(dev, "%s%s error received: %04x:%02x:%02x.%d\n",
> +	pci_info(dev, "%s%s error message received from %04x:%02x:%02x.%d\n",
>  		 info->multi_error_valid ? "Multiple " : "",
>  		 aer_error_severity_string[info->severity],
>  		 pci_domain_nr(dev->bus), bus, PCI_SLOT(devfn),
> @@ -929,7 +929,12 @@ static bool find_source_device(struct pci_dev *parent,
>  		pci_walk_bus(parent->subordinate, find_device_iter, e_info);
>  
>  	if (!e_info->error_dev_num) {
> -		pci_info(parent, "can't find device of ID%04x\n", e_info->id);
> +		u8 bus = e_info->id >> 8;
> +		u8 devfn = e_info->id & 0xff;

You can use PCI_BUS_NUM(e_info->id) for getting bus number. Since you are
extracting this info in more than one place, maybe you can also define a
macro PCI_DEVFN(id) (following PCI_BUS_NUM()).

> +
> +		pci_info(parent, "found no error details for %04x:%02x:%02x.%d\n",
> +			 pci_domain_nr(parent->bus), bus, PCI_SLOT(devfn),
> +			 PCI_FUNC(devfn));
>  		return false;
>  	}
>  	return true;

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] PCI/AER: Decode Requester ID when no error info found
@ 2024-01-02 19:22     ` Kuppuswamy Sathyanarayanan
  0 siblings, 0 replies; 29+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2024-01-02 19:22 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci
  Cc: Robert Richter, Terry Bowman, Mahesh J Salgaonkar, linux-kernel,
	Kai-Heng Feng, Oliver O'Halloran, Bjorn Helgaas,
	linuxppc-dev



On 12/6/2023 2:42 PM, Bjorn Helgaas wrote:
> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> When a device with AER detects an error, it logs error information in its
> own AER Error Status registers.  It may send an Error Message to the Root
> Port (RCEC in the case of an RCiEP), which logs the fact that an Error
> Message was received (Root Error Status) and the Requester ID of the
> message source (Error Source Identification).
> 
> aer_print_port_info() prints the Requester ID from the Root Port Error
> Source in the usual Linux "bb:dd.f" format, but when find_source_device()
> finds no error details in the hierarchy below the Root Port, it printed the
> raw Requester ID without decoding it.
> 
> Decode the Requester ID in the usual Linux format so it matches other
> messages.
> 
> Sample message changes:
> 
>   - pcieport 0000:00:1c.5: AER: Correctable error received: 0000:00:1c.5
>   - pcieport 0000:00:1c.5: AER: can't find device of ID00e5
>   + pcieport 0000:00:1c.5: AER: Correctable error message received from 0000:00:1c.5
>   + pcieport 0000:00:1c.5: AER: found no error details for 0000:00:1c.5
> 
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Except for the suggestion given below, it looks good to me.

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

> ---
>  drivers/pci/pcie/aer.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 20db80018b5d..2ff6bac9979f 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -740,7 +740,7 @@ static void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info)
>  	u8 bus = info->id >> 8;
>  	u8 devfn = info->id & 0xff;
>  
> -	pci_info(dev, "%s%s error received: %04x:%02x:%02x.%d\n",
> +	pci_info(dev, "%s%s error message received from %04x:%02x:%02x.%d\n",
>  		 info->multi_error_valid ? "Multiple " : "",
>  		 aer_error_severity_string[info->severity],
>  		 pci_domain_nr(dev->bus), bus, PCI_SLOT(devfn),
> @@ -929,7 +929,12 @@ static bool find_source_device(struct pci_dev *parent,
>  		pci_walk_bus(parent->subordinate, find_device_iter, e_info);
>  
>  	if (!e_info->error_dev_num) {
> -		pci_info(parent, "can't find device of ID%04x\n", e_info->id);
> +		u8 bus = e_info->id >> 8;
> +		u8 devfn = e_info->id & 0xff;

You can use PCI_BUS_NUM(e_info->id) for getting bus number. Since you are
extracting this info in more than one place, maybe you can also define a
macro PCI_DEVFN(id) (following PCI_BUS_NUM()).

> +
> +		pci_info(parent, "found no error details for %04x:%02x:%02x.%d\n",
> +			 pci_domain_nr(parent->bus), bus, PCI_SLOT(devfn),
> +			 PCI_FUNC(devfn));
>  		return false;
>  	}
>  	return true;

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/3] PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors
  2023-12-06 22:42   ` Bjorn Helgaas
@ 2024-01-02 19:23     ` Kuppuswamy Sathyanarayanan
  -1 siblings, 0 replies; 29+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2024-01-02 19:23 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci
  Cc: Mahesh J Salgaonkar, Oliver O'Halloran, Robert Richter,
	Terry Bowman, Kai-Heng Feng, linuxppc-dev, linux-kernel,
	Bjorn Helgaas



On 12/6/2023 2:42 PM, Bjorn Helgaas wrote:
> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> The PCIe spec classifies errors as either "Correctable" or "Uncorrectable".
> Previously we printed these as "Corrected" or "Uncorrected".  To avoid
> confusion, use the same terms as the spec.
> 
> One confusing situation is when one agent detects an error, but another
> agent is responsible for recovery, e.g., by re-attempting the operation.
> The first agent may log a "correctable" error but it has not yet been
> corrected.  The recovery agent must report an uncorrectable error if it is
> unable to recover.  If we print the first agent's error as "Corrected", it
> gives the false impression that it has already been resolved.
> 
> Sample message change:
> 
>   - pcieport 0000:00:1c.5: AER: Corrected error received: 0000:00:1c.5
>   + pcieport 0000:00:1c.5: AER: Correctable error received: 0000:00:1c.5
> 
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> ---

Looks good to me.

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

>  drivers/pci/pcie/aer.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 42a3bd35a3e1..20db80018b5d 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -436,9 +436,9 @@ void pci_aer_exit(struct pci_dev *dev)
>   * AER error strings
>   */
>  static const char *aer_error_severity_string[] = {
> -	"Uncorrected (Non-Fatal)",
> -	"Uncorrected (Fatal)",
> -	"Corrected"
> +	"Uncorrectable (Non-Fatal)",
> +	"Uncorrectable (Fatal)",
> +	"Correctable"
>  };
>  
>  static const char *aer_error_layer[] = {

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/3] PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors
@ 2024-01-02 19:23     ` Kuppuswamy Sathyanarayanan
  0 siblings, 0 replies; 29+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2024-01-02 19:23 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci
  Cc: Robert Richter, Terry Bowman, Mahesh J Salgaonkar, linux-kernel,
	Kai-Heng Feng, Oliver O'Halloran, Bjorn Helgaas,
	linuxppc-dev



On 12/6/2023 2:42 PM, Bjorn Helgaas wrote:
> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> The PCIe spec classifies errors as either "Correctable" or "Uncorrectable".
> Previously we printed these as "Corrected" or "Uncorrected".  To avoid
> confusion, use the same terms as the spec.
> 
> One confusing situation is when one agent detects an error, but another
> agent is responsible for recovery, e.g., by re-attempting the operation.
> The first agent may log a "correctable" error but it has not yet been
> corrected.  The recovery agent must report an uncorrectable error if it is
> unable to recover.  If we print the first agent's error as "Corrected", it
> gives the false impression that it has already been resolved.
> 
> Sample message change:
> 
>   - pcieport 0000:00:1c.5: AER: Corrected error received: 0000:00:1c.5
>   + pcieport 0000:00:1c.5: AER: Correctable error received: 0000:00:1c.5
> 
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> ---

Looks good to me.

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

>  drivers/pci/pcie/aer.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 42a3bd35a3e1..20db80018b5d 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -436,9 +436,9 @@ void pci_aer_exit(struct pci_dev *dev)
>   * AER error strings
>   */
>  static const char *aer_error_severity_string[] = {
> -	"Uncorrected (Non-Fatal)",
> -	"Uncorrected (Fatal)",
> -	"Corrected"
> +	"Uncorrectable (Non-Fatal)",
> +	"Uncorrectable (Fatal)",
> +	"Correctable"
>  };
>  
>  static const char *aer_error_layer[] = {

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] PCI/AER: Decode Requester ID when no error info found
  2024-01-02 19:22     ` Kuppuswamy Sathyanarayanan
@ 2024-01-02 22:53       ` Bjorn Helgaas
  -1 siblings, 0 replies; 29+ messages in thread
From: Bjorn Helgaas @ 2024-01-02 22:53 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan
  Cc: linux-pci, Mahesh J Salgaonkar, Oliver O'Halloran,
	Robert Richter, Terry Bowman, Kai-Heng Feng, linuxppc-dev,
	linux-kernel, Bjorn Helgaas

On Tue, Jan 02, 2024 at 11:22:53AM -0800, Kuppuswamy Sathyanarayanan wrote:
> On 12/6/2023 2:42 PM, Bjorn Helgaas wrote:
> > From: Bjorn Helgaas <bhelgaas@google.com>
> > 
> > When a device with AER detects an error, it logs error information in its
> > own AER Error Status registers.  It may send an Error Message to the Root
> > Port (RCEC in the case of an RCiEP), which logs the fact that an Error
> > Message was received (Root Error Status) and the Requester ID of the
> > message source (Error Source Identification).
> > 
> > aer_print_port_info() prints the Requester ID from the Root Port Error
> > Source in the usual Linux "bb:dd.f" format, but when find_source_device()
> > finds no error details in the hierarchy below the Root Port, it printed the
> > raw Requester ID without decoding it.
> > 
> > Decode the Requester ID in the usual Linux format so it matches other
> > messages.
> > 
> > Sample message changes:
> > 
> >   - pcieport 0000:00:1c.5: AER: Correctable error received: 0000:00:1c.5
> >   - pcieport 0000:00:1c.5: AER: can't find device of ID00e5
> >   + pcieport 0000:00:1c.5: AER: Correctable error message received from 0000:00:1c.5
> >   + pcieport 0000:00:1c.5: AER: found no error details for 0000:00:1c.5
> > 
> > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> 
> Except for the suggestion given below, it looks good to me.
> 
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

Thanks for taking a look!

> > @@ -740,7 +740,7 @@ static void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info)
> >  	u8 bus = info->id >> 8;
> >  	u8 devfn = info->id & 0xff;
> >  
> > -	pci_info(dev, "%s%s error received: %04x:%02x:%02x.%d\n",
> > +	pci_info(dev, "%s%s error message received from %04x:%02x:%02x.%d\n",
> >  		 info->multi_error_valid ? "Multiple " : "",
> >  		 aer_error_severity_string[info->severity],
> >  		 pci_domain_nr(dev->bus), bus, PCI_SLOT(devfn),
> > @@ -929,7 +929,12 @@ static bool find_source_device(struct pci_dev *parent,
> >  		pci_walk_bus(parent->subordinate, find_device_iter, e_info);
> >  
> >  	if (!e_info->error_dev_num) {
> > -		pci_info(parent, "can't find device of ID%04x\n", e_info->id);
> > +		u8 bus = e_info->id >> 8;
> > +		u8 devfn = e_info->id & 0xff;
> 
> You can use PCI_BUS_NUM(e_info->id) for getting bus number.  Since
> you are extracting this info in more than one place, maybe you can
> also define a macro PCI_DEVFN(id) (following PCI_BUS_NUM()).

Thanks, both good ideas.

We already have a PCI_DEVFN() that *combines* slot + func into devfn,
so we'd have to come up with a different name.

I'll add a patch to use PCI_BUS_NUM() in the two places here and in
pme.c.

I think I'll wait with these until after the v6.7 release.

> > +		pci_info(parent, "found no error details for %04x:%02x:%02x.%d\n",
> > +			 pci_domain_nr(parent->bus), bus, PCI_SLOT(devfn),
> > +			 PCI_FUNC(devfn));
> >  		return false;
> >  	}
> >  	return true;
> 
> -- 
> Sathyanarayanan Kuppuswamy
> Linux Kernel Developer

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] PCI/AER: Decode Requester ID when no error info found
@ 2024-01-02 22:53       ` Bjorn Helgaas
  0 siblings, 0 replies; 29+ messages in thread
From: Bjorn Helgaas @ 2024-01-02 22:53 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan
  Cc: Robert Richter, Terry Bowman, linux-pci, Mahesh J Salgaonkar,
	linux-kernel, Kai-Heng Feng, Oliver O'Halloran,
	Bjorn Helgaas, linuxppc-dev

On Tue, Jan 02, 2024 at 11:22:53AM -0800, Kuppuswamy Sathyanarayanan wrote:
> On 12/6/2023 2:42 PM, Bjorn Helgaas wrote:
> > From: Bjorn Helgaas <bhelgaas@google.com>
> > 
> > When a device with AER detects an error, it logs error information in its
> > own AER Error Status registers.  It may send an Error Message to the Root
> > Port (RCEC in the case of an RCiEP), which logs the fact that an Error
> > Message was received (Root Error Status) and the Requester ID of the
> > message source (Error Source Identification).
> > 
> > aer_print_port_info() prints the Requester ID from the Root Port Error
> > Source in the usual Linux "bb:dd.f" format, but when find_source_device()
> > finds no error details in the hierarchy below the Root Port, it printed the
> > raw Requester ID without decoding it.
> > 
> > Decode the Requester ID in the usual Linux format so it matches other
> > messages.
> > 
> > Sample message changes:
> > 
> >   - pcieport 0000:00:1c.5: AER: Correctable error received: 0000:00:1c.5
> >   - pcieport 0000:00:1c.5: AER: can't find device of ID00e5
> >   + pcieport 0000:00:1c.5: AER: Correctable error message received from 0000:00:1c.5
> >   + pcieport 0000:00:1c.5: AER: found no error details for 0000:00:1c.5
> > 
> > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> 
> Except for the suggestion given below, it looks good to me.
> 
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

Thanks for taking a look!

> > @@ -740,7 +740,7 @@ static void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info)
> >  	u8 bus = info->id >> 8;
> >  	u8 devfn = info->id & 0xff;
> >  
> > -	pci_info(dev, "%s%s error received: %04x:%02x:%02x.%d\n",
> > +	pci_info(dev, "%s%s error message received from %04x:%02x:%02x.%d\n",
> >  		 info->multi_error_valid ? "Multiple " : "",
> >  		 aer_error_severity_string[info->severity],
> >  		 pci_domain_nr(dev->bus), bus, PCI_SLOT(devfn),
> > @@ -929,7 +929,12 @@ static bool find_source_device(struct pci_dev *parent,
> >  		pci_walk_bus(parent->subordinate, find_device_iter, e_info);
> >  
> >  	if (!e_info->error_dev_num) {
> > -		pci_info(parent, "can't find device of ID%04x\n", e_info->id);
> > +		u8 bus = e_info->id >> 8;
> > +		u8 devfn = e_info->id & 0xff;
> 
> You can use PCI_BUS_NUM(e_info->id) for getting bus number.  Since
> you are extracting this info in more than one place, maybe you can
> also define a macro PCI_DEVFN(id) (following PCI_BUS_NUM()).

Thanks, both good ideas.

We already have a PCI_DEVFN() that *combines* slot + func into devfn,
so we'd have to come up with a different name.

I'll add a patch to use PCI_BUS_NUM() in the two places here and in
pme.c.

I think I'll wait with these until after the v6.7 release.

> > +		pci_info(parent, "found no error details for %04x:%02x:%02x.%d\n",
> > +			 pci_domain_nr(parent->bus), bus, PCI_SLOT(devfn),
> > +			 PCI_FUNC(devfn));
> >  		return false;
> >  	}
> >  	return true;
> 
> -- 
> Sathyanarayanan Kuppuswamy
> Linux Kernel Developer

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2024-01-02 22:58 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-06 22:42 [PATCH 0/3] PCI/AER: Clean up logging Bjorn Helgaas
2023-12-06 22:42 ` Bjorn Helgaas
2023-12-06 22:42 ` [PATCH 1/3] PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors Bjorn Helgaas
2023-12-06 22:42   ` Bjorn Helgaas
2023-12-08 14:36   ` Jonathan Cameron
2023-12-08 14:36     ` Jonathan Cameron
2023-12-12 15:00   ` Terry Bowman
2023-12-12 15:00     ` Terry Bowman
2023-12-12 21:23     ` Bjorn Helgaas
2023-12-12 21:23       ` Bjorn Helgaas
2023-12-12 22:42       ` Bowman, Terry
2024-01-02 19:23   ` Kuppuswamy Sathyanarayanan
2024-01-02 19:23     ` Kuppuswamy Sathyanarayanan
2023-12-06 22:42 ` [PATCH 2/3] PCI/AER: Decode Requester ID when no error info found Bjorn Helgaas
2023-12-06 22:42   ` Bjorn Helgaas
2023-12-08 14:38   ` Jonathan Cameron
2023-12-08 14:38     ` Jonathan Cameron
2023-12-12 15:00   ` Terry Bowman
2023-12-12 15:00     ` Terry Bowman
2024-01-02 19:22   ` Kuppuswamy Sathyanarayanan
2024-01-02 19:22     ` Kuppuswamy Sathyanarayanan
2024-01-02 22:53     ` Bjorn Helgaas
2024-01-02 22:53       ` Bjorn Helgaas
2023-12-06 22:42 ` [PATCH 3/3] PCI/AER: Use explicit register sizes for struct members Bjorn Helgaas
2023-12-06 22:42   ` Bjorn Helgaas
2023-12-08 14:38   ` Jonathan Cameron
2023-12-08 14:38     ` Jonathan Cameron
2023-12-08 16:54 ` [PATCH 0/3] PCI/AER: Clean up logging Bjorn Helgaas
2023-12-08 16:54   ` Bjorn Helgaas

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.