* [PATCH V2] acpi: apei: check for pending errors when probing HED type GHES entries
From: Tyler Baicar @ 2017-03-29 15:54 UTC
  To: rjw, lenb, bp, prarit, bhelgaas, punit.agrawal, mingo,
	linux-acpi, linux-kernel, shiju.jose, James.Morse, ahs3
  Cc: Tyler Baicar

If a HED type error occurs prior to GHES probing, the kernel will
never report the error. The HED driver will see that no notifiers
are registered, and clear the interrupt.

This becomes a more serious problem with firmware that supports
GHESv2 acknowledgements from the kernel. The firmware will populate
the error and wait for the kernel ack. But since the kernel will
never process the error we get into a state that the firmware will
not send any more errors and the kernel will never see or ack the
original error.

Check for pending errors when probing HED type GHES entries to
avoid the above situation.

This patch is based on Shiju's patch that adds support for GSIV
and GPIO notification types:
https://patchwork.kernel.org/patch/9628817/

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
---
 drivers/acpi/apei/ghes.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index fd39929..cf5e938 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -1035,6 +1035,7 @@ static int ghes_probe(struct platform_device *ghes_dev)
 			register_acpi_hed_notifier(&ghes_notifier_hed);
 		list_add_rcu(&ghes->list, &ghes_hed);
 		mutex_unlock(&ghes_list_mutex);
+		ghes_proc(ghes);
 		break;
 	case ACPI_HEST_NOTIFY_NMI:
 		ghes_nmi_add(ghes);
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.



* Re: [PATCH V2] acpi: apei: check for pending errors when probing HED type GHES entries
From: James Morse @ 2017-03-30 17:30 UTC
  To: Tyler Baicar
  Cc: rjw, lenb, bp, prarit, bhelgaas, punit.agrawal, mingo,
	linux-acpi, linux-kernel, shiju.jose, ahs3

Hi Tyler,

On 29/03/17 16:54, Tyler Baicar wrote:
> If a HED type error occurs prior to GHES probing, the kernel will
> never report the error. The HED driver will see that no notifiers
> are registered, and clear the interrupt.
> 
> This becomes a more serious problem with firmware that supports
> GHESv2 acknowledgements from the kernel. The firmware will populate
> the error and wait for the kernel ack. But since the kernel will
> never process the error we get into a state that the firmware will
> not send any more errors and the kernel will never see or ack the
> original error.
> 
> Check for pending errors when probing HED type GHES entries to
> avoid the above situation.

Isn't this a problem for the other notification types too?

It looks like SEI can indicate the notification is non-fatal even if we haven't
done the ghes_probe() yet and fail to find the CPER records.

Would moving the OSC call to set the APEI bit later solve this, or is it
specific to the way AML's Notify() works?


Thanks,

James





* Re: [PATCH V2] acpi: apei: check for pending errors when probing HED type GHES entries
From: Baicar, Tyler @ 2017-03-31 15:18 UTC
  To: James Morse
  Cc: rjw, lenb, bp, prarit, bhelgaas, punit.agrawal, mingo,
	linux-acpi, linux-kernel, shiju.jose, ahs3

Hello James,


On 3/30/2017 11:30 AM, James Morse wrote:
> On 29/03/17 16:54, Tyler Baicar wrote:
>> If a HED type error occurs prior to GHES probing, the kernel will
>> never report the error. The HED driver will see that no notifiers
>> are registered, and clear the interrupt.
>>
>> This becomes a more serious problem with firmware that supports
>> GHESv2 acknowledgements from the kernel. The firmware will populate
>> the error and wait for the kernel ack. But since the kernel will
>> never process the error we get into a state that the firmware will
>> not send any more errors and the kernel will never see or ack the
>> original error.
>>
>> Check for pending errors when probing HED type GHES entries to
>> avoid the above situation.
> Isn't this a problem for the other notification types too?
>
> It looks like SEI can indicate the notification is non-fatal even if we haven't
> done the ghes_probe() yet and fail to find the CPER records.
>
> Would moving the OSC call to set the APEI bit later solve this, or is it
> specific to the way AML's Notify() works?
For SEAs, the kernel would have already tried handling the SEA and gone 
through the GHES code unsuccessfully. If the SEA was non-fatal then we 
could print the GHES/CPER info here.

SEIs are probably similar to the SEA case I would imagine.

The notification interface I believe just sends the notification to 
anyone registered for that event and then clears the interrupt, so if no 
one is registered, the interrupt will be cleared.
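
To make the lost-notification case concrete, here is a small user-space C
model of the behaviour described above. It is only a sketch, not kernel
code: every name in it (register_listener, deliver_notification,
irq_pending, and so on) is hypothetical and chosen for illustration. It
assumes, as described above, that delivery clears the interrupt whether or
not anyone is registered.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; none of these names are kernel APIs. */
#define MAX_LISTENERS 4

typedef void (*listener_fn)(void);

static listener_fn listeners[MAX_LISTENERS];
static int n_listeners;
static bool irq_pending;   /* models the interrupt raised by firmware */
static int handled;        /* number of errors a listener actually saw */

/* Example listener: just counts the errors it is told about. */
static void on_error(void)
{
    handled++;
}

static void register_listener(listener_fn fn)
{
    assert(n_listeners < MAX_LISTENERS);
    listeners[n_listeners++] = fn;
}

/* Firmware side: an error is recorded and the interrupt is raised. */
static void raise_error(void)
{
    irq_pending = true;
}

/*
 * Deliver the event to whoever is registered, then clear the interrupt
 * unconditionally. With no listeners the event is dropped silently.
 */
static void deliver_notification(void)
{
    for (int i = 0; i < n_listeners; i++)
        listeners[i]();
    irq_pending = false;
}
```

In this model, calling raise_error() and deliver_notification() before
register_listener(on_error) leaves handled at 0 with the interrupt already
cleared, which is the situation the patch is trying to avoid.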

It may be more practical to add this call to ghes_proc() at the end of
the switch statement so that all notification types check for pending
errors right after probing. Adding this call shouldn't be an issue,
since if there is no error pending it will just return. The polled
notification type calls ghes_proc() every time its timer expires to
check for errors.
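
A rough user-space sketch of that idea follows. It is a simplified model
under stated assumptions, not the actual ghes.c change: struct
error_source, source_proc() and source_probe() are hypothetical stand-ins,
and a non-zero block_status is assumed to mean that firmware has left a
record waiting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; these are not the kernel's definitions. */
enum notify_type { NOTIFY_POLLED, NOTIFY_INTERRUPT, NOTIFY_NMI };

struct error_source {
    enum notify_type type;
    unsigned int block_status;  /* non-zero: a record is pending */
    bool registered;
    int reported;               /* records reported by the OS side */
    int acked;                  /* acknowledgements sent to firmware */
};

/* Process one pending record. Returns 0 if handled, -1 if none pending. */
static int source_proc(struct error_source *src)
{
    assert(src != NULL);
    if (src->block_status == 0)
        return -1;              /* nothing pending: cheap early return */
    src->reported++;
    src->block_status = 0;      /* clear the record */
    src->acked++;               /* let firmware send further errors */
    return 0;
}

static int source_probe(struct error_source *src)
{
    switch (src->type) {
    case NOTIFY_POLLED:
    case NOTIFY_INTERRUPT:
    case NOTIFY_NMI:
        src->registered = true; /* stands in for per-type registration */
        break;
    default:
        return -1;
    }

    /* After the switch: every type checks once for an early error. */
    source_proc(src);
    return 0;
}
```

Placing the check after the switch keeps the per-type cases unchanged, and
a source with nothing pending only pays for the early return.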

Thanks,
Tyler
