linux-kernel.vger.kernel.org archive mirror
From: James Morse <james.morse@arm.com>
To: Tyler Baicar <tbaicar@codeaurora.org>, zjzhang@codeaurora.org
Cc: christoffer.dall@linaro.org, marc.zyngier@arm.com,
	pbonzini@redhat.com, rkrcmar@redhat.com, linux@armlinux.org.uk,
	catalin.marinas@arm.com, will.deacon@arm.com, rjw@rjwysocki.net,
	lenb@kernel.org, matt@codeblueprint.co.uk,
	robert.moore@intel.com, lv.zheng@intel.com, nkaje@codeaurora.org,
	mark.rutland@arm.com, akpm@linux-foundation.org,
	eun.taik.lee@samsung.com, sandeepa.s.prabhu@gmail.com,
	labbott@redhat.com, shijie.huang@arm.com,
	rruigrok@codeaurora.org, paul.gortmaker@windriver.com,
	tn@semihalf.com, fu.wei@linaro.org, rostedt@goodmis.org,
	bristot@redhat.com, linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-efi@vger.kernel.org, devel@acpica.org,
	Suzuki.Poulose@arm.com, punit.agrawal@arm.com, astone@redhat.com,
	harba@codeaurora.org, hanjun.guo@linaro.org,
	john.garry@huawei.com, shiju.jose@huawei.com
Subject: Re: [PATCH V8 06/10] acpi: apei: panic OS with fatal error status block
Date: Thu, 09 Feb 2017 10:48:42 +0000
Message-ID: <589C490A.9080109@arm.com>
In-Reply-To: <1485969413-23577-7-git-send-email-tbaicar@codeaurora.org>

Hi Jonathan, Tyler,

On 01/02/17 17:16, Tyler Baicar wrote:
> From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>
> 
> Even if an error status block's severity is fatal, the kernel does not
> honor the severity level and panic.
> 
> With the firmware first model, the platform could inform the OS about a
> fatal hardware error through the non-NMI GHES notification type. The OS
> should panic when a hardware error record is received with this
> severity.
> 
> Call panic() after CPER data in error status block is printed if
> severity is fatal, before each error section is handled.

> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 8756172..86c1f15 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -687,6 +689,13 @@ static int ghes_ack_error(struct acpi_hest_generic_v2 *generic_v2)
>  	return rc;
>  }
>  
> +static void __ghes_call_panic(void)
> +{
> +	if (panic_timeout == 0)
> +		panic_timeout = ghes_panic_timeout;
> +	panic("Fatal hardware error!");
> +}
> +

__ghes_panic() also has:
>	__ghes_print_estatus(KERN_EMERG, ghes->generic, ghes->estatus);

Which prints this estatus regardless of rate limiting and caching.

[ ... ]

> @@ -698,6 +707,10 @@ static int ghes_proc(struct ghes *ghes)
>  		if (ghes_print_estatus(NULL, ghes->generic, ghes->estatus))

ghes_print_estatus() uses some custom rate limiting, '2 messages every 5
seconds', and GHES_SEV_PANIC shares the same limit as GHES_SEV_RECOVERABLE.

I think it's possible to get 2 recoverable messages, then a panic, in a 5 second
window. The rate limit will kick in to stop the panic estatus block being
printed, but we still go on to call panic() without the real reason being printed...

(The caching only seems to consider identical messages. Given we would
never see two panic messages, I don't think that will cause any problems.)
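To make that window concrete, here is a simplified user-space model, not the kernel's ratelimit implementation, assuming as described above that recoverable and fatal records draw on one shared '2 messages every 5 seconds' budget while corrected records have their own. The names (enum sev, struct ratelimit, model_print_estatus()) are hypothetical, and time is in model seconds rather than jiffies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical severity levels, ordered like the GHES_SEV_* values. */
enum sev { SEV_NO, SEV_CORRECTED, SEV_RECOVERABLE, SEV_PANIC };

/* Simplified interval/burst limiter: at most `burst` prints per `interval`. */
struct ratelimit {
	long interval; /* window length, in model seconds */
	int burst;     /* prints allowed per window */
	long begin;    /* start of the current window */
	int printed;   /* prints already allowed in this window */
};

/* Assumed split: one limiter for corrected errors, one for everything worse. */
struct estatus_limits {
	struct ratelimit corrected;
	struct ratelimit uncorrected;
};

static void limits_init(struct estatus_limits *l, long interval, int burst)
{
	struct ratelimit init = { interval, burst, 0, 0 };

	l->corrected = init;
	l->uncorrected = init;
}

/* Returns true if a message at time `now` may be printed. */
static bool ratelimit_ok(struct ratelimit *rs, long now)
{
	assert(rs->burst > 0 && rs->interval > 0);
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;   /* window expired: start a new one */
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	return false;              /* suppressed */
}

/* RECOVERABLE and PANIC pick the same limiter, so they share one budget. */
static bool model_print_estatus(struct estatus_limits *l, enum sev s, long now)
{
	struct ratelimit *rs = (s <= SEV_CORRECTED) ? &l->corrected
						    : &l->uncorrected;

	return ratelimit_ok(rs, now);
}
```

With limits_init(&l, 5, 2), recoverable records at t=0 and t=1 use up the shared budget, so model_print_estatus(&l, SEV_PANIC, 2) returns false: the fatal record is not printed even though panic() follows. A corrected record at the same moment is still printed, and a fatal record at t=6 falls in a new window and is printed.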

>  			ghes_estatus_cache_add(ghes->generic, ghes->estatus);
>  	}
> +	if (ghes_severity(ghes->estatus->error_severity) >= GHES_SEV_PANIC) {
> +		__ghes_call_panic();
> +	}
> +

I think this ghes_severity() then panic() should go above the:
>	if (!ghes_estatus_cached(ghes->estatus)) {
and we should call __ghes_print_estatus() here too, to make sure the message
definitely got out!


With that,
Reviewed-by: James Morse <james.morse@arm.com>


Thanks,

James

Thread overview: 30+ messages
2017-02-01 17:16 [PATCH V8 00/10] Add UEFI 2.6 and ACPI 6.1 updates for RAS on ARM64 Tyler Baicar
2017-02-01 17:16 ` [PATCH V8 01/10] acpi: apei: read ack upon ghes record consumption Tyler Baicar
2017-02-01 17:16 ` [PATCH V8 02/10] ras: acpi/apei: cper: generic error data entry v3 per ACPI 6.1 Tyler Baicar
2017-02-01 17:16 ` [PATCH V8 03/10] efi: parse ARM processor error Tyler Baicar
2017-02-01 17:16 ` [PATCH V8 04/10] arm64: exception: handle Synchronous External Abort Tyler Baicar
2017-02-03 15:59   ` James Morse
2017-02-03 20:24     ` Baicar, Tyler
2017-02-01 17:16 ` [PATCH V8 05/10] acpi: apei: handle SEA notification type for ARMv8 Tyler Baicar
2017-02-01 22:26   ` kbuild test robot
2017-02-03 16:00   ` James Morse
2017-02-03 20:38     ` Baicar, Tyler
2017-02-15  6:24   ` Zhengqiang
2017-02-15 14:58     ` Baicar, Tyler
2017-02-01 17:16 ` [PATCH V8 06/10] acpi: apei: panic OS with fatal error status block Tyler Baicar
2017-02-09 10:48   ` James Morse [this message]
2017-02-13 22:45     ` Baicar, Tyler
2017-02-15 12:13       ` James Morse
2017-02-15 17:07         ` Baicar, Tyler
2017-02-01 17:16 ` [PATCH V8 07/10] efi: print unrecognized CPER section Tyler Baicar
2017-02-01 17:16 ` [PATCH V8 08/10] ras: acpi / apei: generate trace event for " Tyler Baicar
2017-02-01 23:20   ` kbuild test robot
2017-02-15 15:52   ` Steven Rostedt
2017-02-15 16:54     ` Baicar, Tyler
2017-02-15 17:03       ` Steven Rostedt
2017-02-15 17:06         ` Baicar, Tyler
2017-02-01 17:16 ` [PATCH V8 09/10] trace, ras: add ARM processor error trace event Tyler Baicar
2017-02-02  2:34   ` kbuild test robot
2017-02-02  3:15   ` Steven Rostedt
2017-02-03 20:18     ` Baicar, Tyler
2017-02-01 17:16 ` [PATCH V8 10/10] arm/arm64: KVM: add guest SEA support Tyler Baicar
