From: James Morse <james.morse@arm.com>
To: yaoaili126@163.com
Cc: rjw@rjwysocki.net, lenb@kernel.org, tony.luck@intel.com,
	bp@alien8.de, linux-acpi@vger.kernel.org, yangfeng1@kingsoft.com,
	CHENGUOMIN@kingsoft.com, yaoaili@kingsoft.com
Subject: Re: [PATCH] Don't do panic for memory Fatal UE in ghes of x86_mce platform
Date: Wed, 28 Oct 2020 19:19:30 +0000
Message-ID: <3942f335-b905-4c8f-bf81-7bdff6c025ad@arm.com>
In-Reply-To: <20201028073540.70136-1-yaoaili126@163.com>

Hi!

On 28/10/2020 07:35, yaoaili126@163.com wrote:
> From: Aili Yao <yaoaili@kingsoft.com>
> 
> For x86 with mce, when BIOS get its work done for memory UE,it will
> raise MCE exception, In MCE, it will do panic or recovery work there.
> But When BIOS option WHEA memory log is enabled,

This is GHES_ASSIST?


> BIOS also prepared one
> detailed error table which will be polled by ghes_notify_nmi from NMI
> watchdog,

heh, because the NMI notification chain has no idea what triggered it...


> ghes_notify_nmi will check the severity and do panic too, this
> panic action is not necessary and confusing, and may lead to unwanted
> results like core dump fail.
> 
> Downgrade CPER_SEV_FATAL to GHES_SEV_RECOVERABLE before panic is called
> for x86_mce

When you build the kernel, you can't know whether your platform will also generate an MCE.
Distros enable all the Kconfig options; they aren't tuned to the platform.

This also makes fatal memory errors recoverable, which isn't true for other platforms.


Assuming this is GHES_ASSIST, I think the simplest approach is to skip the panic() for
CPER records found for those GHES. APEI is only providing assistance in this case, so it's
unfair for it to take some terminal action. The machine-check handler should have the
final say in this case.
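
A rough user-space sketch of that idea, not actual ghes.c code: the struct, the
is_assist field and the sketch_* names are all hypothetical stand-ins, and how the
kernel would learn that a source is GHES_ASSIST is left as an assumption.

```c
#include <stdbool.h>

/* Hypothetical severity levels, loosely mirroring the GHES_SEV_* names. */
enum sketch_sev {
	SKETCH_SEV_NO,
	SKETCH_SEV_CORRECTED,
	SKETCH_SEV_RECOVERABLE,
	SKETCH_SEV_PANIC,
};

/* Hypothetical stand-in for an error source; not the kernel's struct ghes. */
struct sketch_ghes {
	bool is_assist;	/* assumption: set when firmware marks the source GHES_ASSIST */
};

/* Decide whether the GHES code itself should take the terminal action. */
bool sketch_ghes_should_panic(const struct sketch_ghes *src, enum sketch_sev sev)
{
	if (sev != SKETCH_SEV_PANIC)
		return false;

	/* For assist-only sources, leave the decision to the machine-check handler. */
	return !src->is_assist;
}
```

The point of keying this on the source rather than on a Kconfig option is that the
decision is then made at runtime, per platform.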


Section 18.7 of ACPI v6.3a says we're supposed to:
| consume the additional GHES_ASSIST information in the context of an error reported
| by hardware
which would be the MCE - so there is no reason for these GHES entries to be poked whenever
an unrelated NMI occurs.

Fixing this would need a separate list for the machine check handler to poke to dump any
data from GHES_ASSIST CPER, so it's more work than just skipping the register call.
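
To illustrate the "separate list" idea only: a minimal user-space model in which
sources are routed to one of two lists at registration time. Every name here is
hypothetical, the fixed-size arrays are a simplification, and none of this reflects
how ghes.c actually keeps its NMI list.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_SOURCES 8

/* Hypothetical error source descriptor. */
struct sketch_source {
	int id;
	bool is_assist;	/* assumption: source only assists a hardware-reported error */
};

/* Simple fixed-size list of registered sources. */
struct sketch_list {
	const struct sketch_source *items[SKETCH_MAX_SOURCES];
	size_t count;
};

/* Polled on every NMI. */
struct sketch_list sketch_nmi_list;
/* Poked only by the machine-check handler, to dump the extra CPER data. */
struct sketch_list sketch_mce_assist_list;

bool sketch_list_add(struct sketch_list *list, const struct sketch_source *src)
{
	if (list->count >= SKETCH_MAX_SOURCES)
		return false;
	list->items[list->count++] = src;
	return true;
}

/* Route a source to the list that matches how its records should be consumed. */
bool sketch_register_source(const struct sketch_source *src)
{
	struct sketch_list *list =
		src->is_assist ? &sketch_mce_assist_list : &sketch_nmi_list;

	return sketch_list_add(list, src);
}
```

With this split, an unrelated NMI would never walk the assist sources.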

(I've no idea how x86 prioritises MCE and NMI... does one block the other?)


Thanks,

James


> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 81bf71b10d44..e5e8a53beb5a 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -272,7 +272,27 @@ static inline int ghes_severity(int severity)
>  	case CPER_SEV_RECOVERABLE:
>  		return GHES_SEV_RECOVERABLE;
>  	case CPER_SEV_FATAL:
> +	{
> +#ifdef CONFIG_X86_MCE
> +		int sev, sec_sev;
> +		struct acpi_hest_generic_data *gdata;
> +		guid_t *sec_type;
> +
> +		if (estatus == NULL)
> +			return GHES_SEV_PANIC;
> +
> +		apei_estatus_for_each_section(estatus, gdata) {
> +			sec_type = (guid_t *)gdata->section_type;
> +			sec_sev = gdata->error_severity;
> +			if (sec_sev == CPER_SEV_FATAL &&
> +			 !guid_equal(sec_type, &CPER_SEC_PLATFORM_MEM))
> +				return GHES_SEV_PANIC;
> +		}
> +		return GHES_SEV_RECOVERABLE;
> +#else
>  		return GHES_SEV_PANIC;
> +#endif
> +	}
