From mboxrd@z Thu Jan 1 00:00:00 1970 From: Tyler Baicar Subject: [PATCH V9 06/10] acpi: apei: panic OS with fatal error status block Date: Wed, 15 Feb 2017 10:44:48 -0700 Message-ID: <1487180692-16732-7-git-send-email-tbaicar@codeaurora.org> References: <1487180692-16732-1-git-send-email-tbaicar@codeaurora.org> Return-path: Received: from smtp.codeaurora.org ([198.145.29.96]:47298 "EHLO smtp.codeaurora.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752106AbdBORpq (ORCPT ); Wed, 15 Feb 2017 12:45:46 -0500 In-Reply-To: <1487180692-16732-1-git-send-email-tbaicar@codeaurora.org> Sender: linux-acpi-owner@vger.kernel.org List-Id: linux-acpi@vger.kernel.org To: christoffer.dall@linaro.org, marc.zyngier@arm.com, pbonzini@redhat.com, rkrcmar@redhat.com, linux@armlinux.org.uk, catalin.marinas@arm.com, will.deacon@arm.com, rjw@rjwysocki.net, lenb@kernel.org, matt@codeblueprint.co.uk, robert.moore@intel.com, lv.zheng@intel.com, nkaje@codeaurora.org, zjzhang@codeaurora.org, mark.rutland@arm.com, james.morse@arm.com, akpm@linux-foundation.org, eun.taik.lee@samsung.com, sandeepa.s.prabhu@gmail.com, labbott@redhat.com, shijie.huang@arm.com, rruigrok@codeaurora.org, paul.gortmaker@windriver.com, tn@semihalf.com, fu.wei@linaro.org, rostedt@goodmis.org, bristot@redhat.com, linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-efi@vger.kernel.org Cc: Tyler Baicar From: "Jonathan (Zhixiong) Zhang" Even if an error status block's severity is fatal, the kernel does not honor the severity level and panic. With the firmware first model, the platform could inform the OS about a fatal hardware error through the non-NMI GHES notification type. The OS should panic when a hardware error record is received with this severity. Call panic() after CPER data in error status block is printed if severity is fatal, before each error section is handled. Signed-off-by: Jonathan (Zhixiong) Zhang Signed-off-by: Tyler Baicar Reviewed-by: James Morse --- drivers/acpi/apei/ghes.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index 87045dc..9f105ce 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -133,6 +133,8 @@ static struct ghes_estatus_cache *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE]; static atomic_t ghes_estatus_cache_alloced; +static int ghes_panic_timeout __read_mostly = 30; + static int ghes_ioremap_init(void) { ghes_ioremap_area = __get_vm_area(PAGE_SIZE * GHES_IOREMAP_PAGES, @@ -688,6 +690,13 @@ static int ghes_ack_error(struct acpi_hest_generic_v2 *generic_v2) return rc; } +static void __ghes_call_panic(void) +{ + if (panic_timeout == 0) + panic_timeout = ghes_panic_timeout; + panic("Fatal hardware error!"); +} + static int ghes_proc(struct ghes *ghes) { int rc; @@ -695,6 +704,10 @@ static int ghes_proc(struct ghes *ghes) rc = ghes_read_estatus(ghes, 0); if (rc) goto out; + if (ghes_severity(ghes->estatus->error_severity) >= GHES_SEV_PANIC) { + __ghes_print_estatus(KERN_EMERG, ghes->generic, ghes->estatus); + __ghes_call_panic(); + } if (!ghes_estatus_cached(ghes->estatus)) { if (ghes_print_estatus(NULL, ghes->generic, ghes->estatus)) ghes_estatus_cache_add(ghes->generic, ghes->estatus); @@ -831,8 +844,6 @@ static inline void ghes_sea_remove(struct ghes *ghes) static LIST_HEAD(ghes_nmi); -static int ghes_panic_timeout __read_mostly = 30; - static void ghes_proc_in_irq(struct irq_work *irq_work) { struct llist_node *llnode, *next; @@ -925,9 +936,7 @@ static void __ghes_panic(struct ghes *ghes) __ghes_print_estatus(KERN_EMERG, ghes->generic, ghes->estatus); /* reboot to log the error! */ - if (panic_timeout == 0) - panic_timeout = ghes_panic_timeout; - panic("Fatal hardware error!"); + __ghes_call_panic(); } static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs) -- Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752587AbdBORpt (ORCPT ); Wed, 15 Feb 2017 12:45:49 -0500 Received: from smtp.codeaurora.org ([198.145.29.96]:47298 "EHLO smtp.codeaurora.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752106AbdBORpq (ORCPT ); Wed, 15 Feb 2017 12:45:46 -0500 DMARC-Filter: OpenDMARC Filter v1.3.2 smtp.codeaurora.org 378F460E77 Authentication-Results: pdx-caf-mail.web.codeaurora.org; dmarc=none (p=none dis=none) header.from=codeaurora.org Authentication-Results: pdx-caf-mail.web.codeaurora.org; spf=none smtp.mailfrom=tbaicar@codeaurora.org From: Tyler Baicar To: christoffer.dall@linaro.org, marc.zyngier@arm.com, pbonzini@redhat.com, rkrcmar@redhat.com, linux@armlinux.org.uk, catalin.marinas@arm.com, will.deacon@arm.com, rjw@rjwysocki.net, lenb@kernel.org, matt@codeblueprint.co.uk, robert.moore@intel.com, lv.zheng@intel.com, nkaje@codeaurora.org, zjzhang@codeaurora.org, mark.rutland@arm.com, james.morse@arm.com, akpm@linux-foundation.org, eun.taik.lee@samsung.com, sandeepa.s.prabhu@gmail.com, labbott@redhat.com, shijie.huang@arm.com, rruigrok@codeaurora.org, paul.gortmaker@windriver.com, tn@semihalf.com, fu.wei@linaro.org, rostedt@goodmis.org, bristot@redhat.com, linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-efi@vger.kernel.org, devel@acpica.org, Suzuki.Poulose@arm.com, punit.agrawal@arm.com, astone@redhat.com, harba@codeaurora.org, hanjun.guo@linaro.org, john.garry@huawei.com, shiju.jose@huawei.com Cc: Tyler Baicar Subject: [PATCH V9 06/10] acpi: apei: panic OS with fatal error status block Date: Wed, 15 Feb 2017 10:44:48 -0700 Message-Id: <1487180692-16732-7-git-send-email-tbaicar@codeaurora.org> X-Mailer: git-send-email 1.8.2.1 In-Reply-To: <1487180692-16732-1-git-send-email-tbaicar@codeaurora.org> References: <1487180692-16732-1-git-send-email-tbaicar@codeaurora.org> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: "Jonathan (Zhixiong) Zhang" Even if an error status block's severity is fatal, the kernel does not honor the severity level and panic. With the firmware first model, the platform could inform the OS about a fatal hardware error through the non-NMI GHES notification type. The OS should panic when a hardware error record is received with this severity. Call panic() after CPER data in error status block is printed if severity is fatal, before each error section is handled. Signed-off-by: Jonathan (Zhixiong) Zhang Signed-off-by: Tyler Baicar Reviewed-by: James Morse --- drivers/acpi/apei/ghes.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index 87045dc..9f105ce 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -133,6 +133,8 @@ static struct ghes_estatus_cache *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE]; static atomic_t ghes_estatus_cache_alloced; +static int ghes_panic_timeout __read_mostly = 30; + static int ghes_ioremap_init(void) { ghes_ioremap_area = __get_vm_area(PAGE_SIZE * GHES_IOREMAP_PAGES, @@ -688,6 +690,13 @@ static int ghes_ack_error(struct acpi_hest_generic_v2 *generic_v2) return rc; } +static void __ghes_call_panic(void) +{ + if (panic_timeout == 0) + panic_timeout = ghes_panic_timeout; + panic("Fatal hardware error!"); +} + static int ghes_proc(struct ghes *ghes) { int rc; @@ -695,6 +704,10 @@ static int ghes_proc(struct ghes *ghes) rc = ghes_read_estatus(ghes, 0); if (rc) goto out; + if (ghes_severity(ghes->estatus->error_severity) >= GHES_SEV_PANIC) { + __ghes_print_estatus(KERN_EMERG, ghes->generic, ghes->estatus); + __ghes_call_panic(); + } if (!ghes_estatus_cached(ghes->estatus)) { if (ghes_print_estatus(NULL, ghes->generic, ghes->estatus)) ghes_estatus_cache_add(ghes->generic, ghes->estatus); @@ -831,8 +844,6 @@ static inline void ghes_sea_remove(struct ghes *ghes) static LIST_HEAD(ghes_nmi); -static int ghes_panic_timeout __read_mostly = 30; - static void ghes_proc_in_irq(struct irq_work *irq_work) { struct llist_node *llnode, *next; @@ -925,9 +936,7 @@ static void __ghes_panic(struct ghes *ghes) __ghes_print_estatus(KERN_EMERG, ghes->generic, ghes->estatus); /* reboot to log the error! */ - if (panic_timeout == 0) - panic_timeout = ghes_panic_timeout; - panic("Fatal hardware error!"); + __ghes_call_panic(); } static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs) -- Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project. From mboxrd@z Thu Jan 1 00:00:00 1970 From: Tyler Baicar Subject: [PATCH V9 06/10] acpi: apei: panic OS with fatal error status block Date: Wed, 15 Feb 2017 10:44:48 -0700 Message-ID: <1487180692-16732-7-git-send-email-tbaicar@codeaurora.org> References: <1487180692-16732-1-git-send-email-tbaicar@codeaurora.org> Cc: Tyler Baicar To: christoffer.dall@linaro.org, marc.zyngier@arm.com, pbonzini@redhat.com, rkrcmar@redhat.com, linux@armlinux.org.uk, catalin.marinas@arm.com, will.deacon@arm.com, rjw@rjwysocki.net, lenb@kernel.org, matt@codeblueprint.co.uk, robert.moore@intel.com, lv.zheng@intel.com, nkaje@codeaurora.org, zjzhang@codeaurora.org, mark.rutland@arm.com, james.morse@arm.com, akpm@linux-foundation.org, eun.taik.lee@samsung.com, sandeepa.s.prabhu@gmail.com, labbott@redhat.com, shijie.huang@arm.com, rruigrok@codeaurora.org, paul.gortmaker@windriver.com, tn@semihalf.com, fu.wei@linaro.org, rostedt@goodmis.org, bristot@redhat.com, linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-efi@vger.kernel.org, Return-path: In-Reply-To: <1487180692-16732-1-git-send-email-tbaicar@codeaurora.org> Sender: linux-acpi-owner@vger.kernel.org List-Id: kvm.vger.kernel.org From: "Jonathan (Zhixiong) Zhang" Even if an error status block's severity is fatal, the kernel does not honor the severity level and panic. With the firmware first model, the platform could inform the OS about a fatal hardware error through the non-NMI GHES notification type. The OS should panic when a hardware error record is received with this severity. Call panic() after CPER data in error status block is printed if severity is fatal, before each error section is handled. Signed-off-by: Jonathan (Zhixiong) Zhang Signed-off-by: Tyler Baicar Reviewed-by: James Morse --- drivers/acpi/apei/ghes.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index 87045dc..9f105ce 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -133,6 +133,8 @@ static struct ghes_estatus_cache *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE]; static atomic_t ghes_estatus_cache_alloced; +static int ghes_panic_timeout __read_mostly = 30; + static int ghes_ioremap_init(void) { ghes_ioremap_area = __get_vm_area(PAGE_SIZE * GHES_IOREMAP_PAGES, @@ -688,6 +690,13 @@ static int ghes_ack_error(struct acpi_hest_generic_v2 *generic_v2) return rc; } +static void __ghes_call_panic(void) +{ + if (panic_timeout == 0) + panic_timeout = ghes_panic_timeout; + panic("Fatal hardware error!"); +} + static int ghes_proc(struct ghes *ghes) { int rc; @@ -695,6 +704,10 @@ static int ghes_proc(struct ghes *ghes) rc = ghes_read_estatus(ghes, 0); if (rc) goto out; + if (ghes_severity(ghes->estatus->error_severity) >= GHES_SEV_PANIC) { + __ghes_print_estatus(KERN_EMERG, ghes->generic, ghes->estatus); + __ghes_call_panic(); + } if (!ghes_estatus_cached(ghes->estatus)) { if (ghes_print_estatus(NULL, ghes->generic, ghes->estatus)) ghes_estatus_cache_add(ghes->generic, ghes->estatus); @@ -831,8 +844,6 @@ static inline void ghes_sea_remove(struct ghes *ghes) static LIST_HEAD(ghes_nmi); -static int ghes_panic_timeout __read_mostly = 30; - static void ghes_proc_in_irq(struct irq_work *irq_work) { struct llist_node *llnode, *next; @@ -925,9 +936,7 @@ static void __ghes_panic(struct ghes *ghes) __ghes_print_estatus(KERN_EMERG, ghes->generic, ghes->estatus); /* reboot to log the error! */ - if (panic_timeout == 0) - panic_timeout = ghes_panic_timeout; - panic("Fatal hardware error!"); + __ghes_call_panic(); } static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs) -- Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project. From mboxrd@z Thu Jan 1 00:00:00 1970 From: tbaicar@codeaurora.org (Tyler Baicar) Date: Wed, 15 Feb 2017 10:44:48 -0700 Subject: [PATCH V9 06/10] acpi: apei: panic OS with fatal error status block In-Reply-To: <1487180692-16732-1-git-send-email-tbaicar@codeaurora.org> References: <1487180692-16732-1-git-send-email-tbaicar@codeaurora.org> Message-ID: <1487180692-16732-7-git-send-email-tbaicar@codeaurora.org> To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org From: "Jonathan (Zhixiong) Zhang" Even if an error status block's severity is fatal, the kernel does not honor the severity level and panic. With the firmware first model, the platform could inform the OS about a fatal hardware error through the non-NMI GHES notification type. The OS should panic when a hardware error record is received with this severity. Call panic() after CPER data in error status block is printed if severity is fatal, before each error section is handled. Signed-off-by: Jonathan (Zhixiong) Zhang Signed-off-by: Tyler Baicar Reviewed-by: James Morse --- drivers/acpi/apei/ghes.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index 87045dc..9f105ce 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -133,6 +133,8 @@ static struct ghes_estatus_cache *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE]; static atomic_t ghes_estatus_cache_alloced; +static int ghes_panic_timeout __read_mostly = 30; + static int ghes_ioremap_init(void) { ghes_ioremap_area = __get_vm_area(PAGE_SIZE * GHES_IOREMAP_PAGES, @@ -688,6 +690,13 @@ static int ghes_ack_error(struct acpi_hest_generic_v2 *generic_v2) return rc; } +static void __ghes_call_panic(void) +{ + if (panic_timeout == 0) + panic_timeout = ghes_panic_timeout; + panic("Fatal hardware error!"); +} + static int ghes_proc(struct ghes *ghes) { int rc; @@ -695,6 +704,10 @@ static int ghes_proc(struct ghes *ghes) rc = ghes_read_estatus(ghes, 0); if (rc) goto out; + if (ghes_severity(ghes->estatus->error_severity) >= GHES_SEV_PANIC) { + __ghes_print_estatus(KERN_EMERG, ghes->generic, ghes->estatus); + __ghes_call_panic(); + } if (!ghes_estatus_cached(ghes->estatus)) { if (ghes_print_estatus(NULL, ghes->generic, ghes->estatus)) ghes_estatus_cache_add(ghes->generic, ghes->estatus); @@ -831,8 +844,6 @@ static inline void ghes_sea_remove(struct ghes *ghes) static LIST_HEAD(ghes_nmi); -static int ghes_panic_timeout __read_mostly = 30; - static void ghes_proc_in_irq(struct irq_work *irq_work) { struct llist_node *llnode, *next; @@ -925,9 +936,7 @@ static void __ghes_panic(struct ghes *ghes) __ghes_print_estatus(KERN_EMERG, ghes->generic, ghes->estatus); /* reboot to log the error! */ - if (panic_timeout == 0) - panic_timeout = ghes_panic_timeout; - panic("Fatal hardware error!"); + __ghes_call_panic(); } static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs) -- Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.