All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] PCI/AER: don't call recovery process for correctable errors
@ 2017-08-28 17:09 Tyler Baicar
  2017-10-02 23:19 ` Bjorn Helgaas
  0 siblings, 1 reply; 7+ messages in thread
From: Tyler Baicar @ 2017-08-28 17:09 UTC (permalink / raw)
  To: bhelgaas, jonathan.derrick, keith.busch, linux-pci, linux-kernel
  Cc: Tyler Baicar

Correctable errors do not need any software intervention, so
avoid calling into the software recovery process for correctable
errors.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
---
 drivers/pci/pcie/aer/aerdrv_core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
index b1303b3..4765c11 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -626,7 +626,8 @@ static void aer_recover_work_func(struct work_struct *work)
 			continue;
 		}
 		cper_print_aer(pdev, entry.severity, entry.regs);
-		do_recovery(pdev, entry.severity);
+		if (entry.severity != AER_CORRECTABLE)
+			do_recovery(pdev, entry.severity);
 		pci_dev_put(pdev);
 	}
 }
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] PCI/AER: don't call recovery process for correctable errors
  2017-08-28 17:09 [PATCH] PCI/AER: don't call recovery process for correctable errors Tyler Baicar
@ 2017-10-02 23:19 ` Bjorn Helgaas
  2017-10-11 14:37   ` Tyler Baicar
  2017-11-15 14:46   ` Tyler Baicar
  0 siblings, 2 replies; 7+ messages in thread
From: Bjorn Helgaas @ 2017-10-02 23:19 UTC (permalink / raw)
  To: Tyler Baicar
  Cc: bhelgaas, jonathan.derrick, keith.busch, linux-pci, linux-kernel

On Mon, Aug 28, 2017 at 11:09:44AM -0600, Tyler Baicar wrote:
> Correctable errors do not need any software intervention, so
> avoid calling into the software recovery process for correctable
> errors.
> 
> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
> ---
>  drivers/pci/pcie/aer/aerdrv_core.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
> index b1303b3..4765c11 100644
> --- a/drivers/pci/pcie/aer/aerdrv_core.c
> +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> @@ -626,7 +626,8 @@ static void aer_recover_work_func(struct work_struct *work)
>  			continue;
>  		}
>  		cper_print_aer(pdev, entry.severity, entry.regs);
> -		do_recovery(pdev, entry.severity);
> +		if (entry.severity != AER_CORRECTABLE)
> +			do_recovery(pdev, entry.severity);

I think this is fine, and it mirrors what is done in
handle_error_source().

But I want to converge the APEI path and the "native" AER path, so as
one tiny step in that direction, can you look into doing this test
once, e.g., move the test from handle_error_source() into
do_recovery(), where one test will handle both paths?

>  		pci_dev_put(pdev);
>  	}
>  }
> -- 
> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project.
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] PCI/AER: don't call recovery process for correctable errors
  2017-10-02 23:19 ` Bjorn Helgaas
@ 2017-10-11 14:37   ` Tyler Baicar
  2017-10-11 17:09     ` Bjorn Helgaas
  2017-11-15 14:46   ` Tyler Baicar
  1 sibling, 1 reply; 7+ messages in thread
From: Tyler Baicar @ 2017-10-11 14:37 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: bhelgaas, jonathan.derrick, keith.busch, linux-pci, linux-kernel

On 10/2/2017 7:19 PM, Bjorn Helgaas wrote:
> On Mon, Aug 28, 2017 at 11:09:44AM -0600, Tyler Baicar wrote:
>> Correctable errors do not need any software intervention, so
>> avoid calling into the software recovery process for correctable
>> errors.
>>
>> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
>> ---
>>   drivers/pci/pcie/aer/aerdrv_core.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
>> index b1303b3..4765c11 100644
>> --- a/drivers/pci/pcie/aer/aerdrv_core.c
>> +++ b/drivers/pci/pcie/aer/aerdrv_core.c
>> @@ -626,7 +626,8 @@ static void aer_recover_work_func(struct work_struct *work)
>>   			continue;
>>   		}
>>   		cper_print_aer(pdev, entry.severity, entry.regs);
>> -		do_recovery(pdev, entry.severity);
>> +		if (entry.severity != AER_CORRECTABLE)
>> +			do_recovery(pdev, entry.severity);
> I think this is fine, and it mirrors what is done in
> handle_error_source().
>
> But I want to converge the APEI path and the "native" AER path, so as
> one tiny step in that direction, can you look into doing this test
> once, e.g., move the test from handle_error_source() into
> do_recovery(), where one test will handle both paths?
Hello Bjorn,

Thank you for the input!

I've looked into this and it seems there is still going to need to be 
two versions of this check. The native AER path goes through 
handle_error_source() and for correctable errors requires the write to 
PCI_ERR_COR_STATUS, but the APEI path does not require this write. I 
could move this check to the beginning of do_recovery() so it returns 
right away for correctable errors and remove the else line in 
handle_error_source() so it always calls into do_recovery(). That 
doesn't seem like a very clean solution though since then there are 
still two checks for correctable errors and now we're calling into 
do_recovery() in both cases for nothing :)

Thanks,
Tyler

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] PCI/AER: don't call recovery process for correctable errors
  2017-10-11 14:37   ` Tyler Baicar
@ 2017-10-11 17:09     ` Bjorn Helgaas
  2017-11-07 23:27       ` Tyler Baicar
  0 siblings, 1 reply; 7+ messages in thread
From: Bjorn Helgaas @ 2017-10-11 17:09 UTC (permalink / raw)
  To: Tyler Baicar
  Cc: bhelgaas, jonathan.derrick, keith.busch, linux-pci, linux-kernel

On Wed, Oct 11, 2017 at 10:37:47AM -0400, Tyler Baicar wrote:
> On 10/2/2017 7:19 PM, Bjorn Helgaas wrote:
> >On Mon, Aug 28, 2017 at 11:09:44AM -0600, Tyler Baicar wrote:
> >>Correctable errors do not need any software intervention, so
> >>avoid calling into the software recovery process for correctable
> >>errors.
> >>
> >>Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
> >>---
> >>  drivers/pci/pcie/aer/aerdrv_core.c | 3 ++-
> >>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >>diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
> >>index b1303b3..4765c11 100644
> >>--- a/drivers/pci/pcie/aer/aerdrv_core.c
> >>+++ b/drivers/pci/pcie/aer/aerdrv_core.c
> >>@@ -626,7 +626,8 @@ static void aer_recover_work_func(struct work_struct *work)
> >>  			continue;
> >>  		}
> >>  		cper_print_aer(pdev, entry.severity, entry.regs);
> >>-		do_recovery(pdev, entry.severity);
> >>+		if (entry.severity != AER_CORRECTABLE)
> >>+			do_recovery(pdev, entry.severity);
> >I think this is fine, and it mirrors what is done in
> >handle_error_source().
> >
> >But I want to converge the APEI path and the "native" AER path, so as
> >one tiny step in that direction, can you look into doing this test
> >once, e.g., move the test from handle_error_source() into
> >do_recovery(), where one test will handle both paths?
> 
> I've looked into this and it seems there is still going to need to
> be two versions of this check. The native AER path goes through
> handle_error_source() and for correctable errors requires the write
> to PCI_ERR_COR_STATUS, but the APEI path does not require this
> write. I could move this check to the beginning of do_recovery() so
> it returns right away for correctable errors and remove the else
> line in handle_error_source() so it always calls into do_recovery().
> That doesn't seem like a very clean solution though since then there
> are still two checks for correctable errors and now we're calling
> into do_recovery() in both cases for nothing :)

The PCI_ERR_COR_STATUS thing is part of what I see as the problem
here.  IMHO, the native AER path should collect up the log registers
(and do any acknowledgement, e.g., writing PCI_ERR_COR_STATUS)
*before* entering the common path.

In other words, the Linux code in the native part of AER should be
functionally the same as the BIOS code that implements the APEI path.

This is a bit of restructuring in the Linux AER code.  I haven't
looked enough to know how much.  If it's impractical, it's
impractical.  I thought this might be an opportunity for a tiny step
in that direction, but if it's not, I guess that's OK.

Bjorn

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] PCI/AER: don't call recovery process for correctable errors
  2017-10-11 17:09     ` Bjorn Helgaas
@ 2017-11-07 23:27       ` Tyler Baicar
  0 siblings, 0 replies; 7+ messages in thread
From: Tyler Baicar @ 2017-11-07 23:27 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: bhelgaas, jonathan.derrick, keith.busch, linux-pci, linux-kernel

On 10/11/2017 1:09 PM, Bjorn Helgaas wrote:
> On Wed, Oct 11, 2017 at 10:37:47AM -0400, Tyler Baicar wrote:
>> On 10/2/2017 7:19 PM, Bjorn Helgaas wrote:
>>> On Mon, Aug 28, 2017 at 11:09:44AM -0600, Tyler Baicar wrote:
>>>> Correctable errors do not need any software intervention, so
>>>> avoid calling into the software recovery process for correctable
>>>> errors.
>>>>
>>>> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
>>>> ---
>>>>   drivers/pci/pcie/aer/aerdrv_core.c | 3 ++-
>>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
>>>> index b1303b3..4765c11 100644
>>>> --- a/drivers/pci/pcie/aer/aerdrv_core.c
>>>> +++ b/drivers/pci/pcie/aer/aerdrv_core.c
>>>> @@ -626,7 +626,8 @@ static void aer_recover_work_func(struct work_struct *work)
>>>>   			continue;
>>>>   		}
>>>>   		cper_print_aer(pdev, entry.severity, entry.regs);
>>>> -		do_recovery(pdev, entry.severity);
>>>> +		if (entry.severity != AER_CORRECTABLE)
>>>> +			do_recovery(pdev, entry.severity);
>>> I think this is fine, and it mirrors what is done in
>>> handle_error_source().
>>>
>>> But I want to converge the APEI path and the "native" AER path, so as
>>> one tiny step in that direction, can you look into doing this test
>>> once, e.g., move the test from handle_error_source() into
>>> do_recovery(), where one test will handle both paths?
>> I've looked into this and it seems there is still going to need to
>> be two versions of this check. The native AER path goes through
>> handle_error_source() and for correctable errors requires the write
>> to PCI_ERR_COR_STATUS, but the APEI path does not require this
>> write. I could move this check to the beginning of do_recovery() so
>> it returns right away for correctable errors and remove the else
>> line in handle_error_source() so it always calls into do_recovery().
>> That doesn't seem like a very clean solution though since then there
>> are still two checks for correctable errors and now we're calling
>> into do_recovery() in both cases for nothing :)
> The PCI_ERR_COR_STATUS thing is part of what I see as the problem
> here.  IMHO, the native AER path should collect up the log registers
> (and do any acknowledgement, e.g., writing PCI_ERR_COR_STATUS)
> *before* entering the common path.
>
> In other words, the Linux code in the native part of AER should be
> functionally the same as the BIOS code that implements the APEI path.
>
> This is a bit of restructuring in the Linux AER code.  I haven't
> looked enough to know how much.  If it's impractical, it's
> impractical.  I thought this might be an opportunity for a tiny step
> in that direction, but if it's not, I guess that's OK.
Hello Bjorn,

That restructuring doesn't look trivial to do in this patch, so do you think 
this patch is
good for 4.15?

Thanks,
Tyler

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] PCI/AER: don't call recovery process for correctable errors
  2017-10-02 23:19 ` Bjorn Helgaas
  2017-10-11 14:37   ` Tyler Baicar
@ 2017-11-15 14:46   ` Tyler Baicar
  2017-11-17 18:22     ` Bjorn Helgaas
  1 sibling, 1 reply; 7+ messages in thread
From: Tyler Baicar @ 2017-11-15 14:46 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: bhelgaas, jonathan.derrick, keith.busch, linux-pci, linux-kernel

On 10/2/2017 7:19 PM, Bjorn Helgaas wrote:
> On Mon, Aug 28, 2017 at 11:09:44AM -0600, Tyler Baicar wrote:
>> Correctable errors do not need any software intervention, so
>> avoid calling into the software recovery process for correctable
>> errors.
>>
>> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
>> ---
>>   drivers/pci/pcie/aer/aerdrv_core.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
>> index b1303b3..4765c11 100644
>> --- a/drivers/pci/pcie/aer/aerdrv_core.c
>> +++ b/drivers/pci/pcie/aer/aerdrv_core.c
>> @@ -626,7 +626,8 @@ static void aer_recover_work_func(struct work_struct *work)
>>   			continue;
>>   		}
>>   		cper_print_aer(pdev, entry.severity, entry.regs);
>> -		do_recovery(pdev, entry.severity);
>> +		if (entry.severity != AER_CORRECTABLE)
>> +			do_recovery(pdev, entry.severity);
> I think this is fine, and it mirrors what is done in
> handle_error_source().
>
Hello,

Will this patch be pulled into 4.15?

Thanks,
Tyler

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] PCI/AER: don't call recovery process for correctable errors
  2017-11-15 14:46   ` Tyler Baicar
@ 2017-11-17 18:22     ` Bjorn Helgaas
  0 siblings, 0 replies; 7+ messages in thread
From: Bjorn Helgaas @ 2017-11-17 18:22 UTC (permalink / raw)
  To: Tyler Baicar
  Cc: bhelgaas, jonathan.derrick, keith.busch, linux-pci, linux-kernel

On Wed, Nov 15, 2017 at 09:46:42AM -0500, Tyler Baicar wrote:
> On 10/2/2017 7:19 PM, Bjorn Helgaas wrote:
> >On Mon, Aug 28, 2017 at 11:09:44AM -0600, Tyler Baicar wrote:
> >>Correctable errors do not need any software intervention, so
> >>avoid calling into the software recovery process for correctable
> >>errors.
> >>
> >>Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
> >>---
> >>  drivers/pci/pcie/aer/aerdrv_core.c | 3 ++-
> >>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >>diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
> >>index b1303b3..4765c11 100644
> >>--- a/drivers/pci/pcie/aer/aerdrv_core.c
> >>+++ b/drivers/pci/pcie/aer/aerdrv_core.c
> >>@@ -626,7 +626,8 @@ static void aer_recover_work_func(struct work_struct *work)
> >>  			continue;
> >>  		}
> >>  		cper_print_aer(pdev, entry.severity, entry.regs);
> >>-		do_recovery(pdev, entry.severity);
> >>+		if (entry.severity != AER_CORRECTABLE)
> >>+			do_recovery(pdev, entry.severity);
> >I think this is fine, and it mirrors what is done in
> >handle_error_source().
> >
> Hello,
> 
> Will this patch be pulled into 4.15?

Sorry, I didn't get to this in time for v4.15, but I put it on pci/aer
for v4.16.  I expanded the changelog to note that this means we won't
call the driver's callbacks or emit the "recovery successful" message
for correctable errors from APEI, and that this matches what we
already do for the non-APEI path.

Bjorn


commit 72761170e6d5519c91136fd6cc80805a74ef9cfd
Author: Tyler Baicar <tbaicar@codeaurora.org>
Date:   Mon Aug 28 11:09:44 2017 -0600

    PCI/AER: Skip recovery callbacks for correctable errors from ACPI APEI
    
    PCIe correctable errors are corrected by hardware.  Software may log them,
    but no other software intervention is required.
    
    There are two paths to enter the AER recovery code: (1) the native path
    where Linux fields the AER interrupt and reads the AER registers directly,
    and (2) the ACPI path where firmware reads the AER registers and hands them
    off to Linux via the ACPI APEI path.
    
    The AER do_recovery() function calls driver error reporting callbacks
    (error_detected(), mmio_enabled(), resume(), etc), attempts recovery (for
    fatal errors), and logs a "AER: Device recovery successful" message.
    
    Since there's nothing to recover for correctable errors, the native path
    already skips do_recovery(), so it doesn't call the driver callbacks and or
    emit the message.  Make the APEI path do the same.
    
    Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
    [bhelgaas: changelog]
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
index 744805232155..3e354f224422 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -633,7 +633,8 @@ static void aer_recover_work_func(struct work_struct *work)
 			continue;
 		}
 		cper_print_aer(pdev, entry.severity, entry.regs);
-		do_recovery(pdev, entry.severity);
+		if (entry.severity != AER_CORRECTABLE)
+			do_recovery(pdev, entry.severity);
 		pci_dev_put(pdev);
 	}
 }

^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-11-17 18:23 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-28 17:09 [PATCH] PCI/AER: don't call recovery process for correctable errors Tyler Baicar
2017-10-02 23:19 ` Bjorn Helgaas
2017-10-11 14:37   ` Tyler Baicar
2017-10-11 17:09     ` Bjorn Helgaas
2017-11-07 23:27       ` Tyler Baicar
2017-11-15 14:46   ` Tyler Baicar
2017-11-17 18:22     ` Bjorn Helgaas

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.