* [RFC PATCH v1 0/3] arm64/ras: support sea error recovery
@ 2017-09-01 10:31 ` Xie XiuQi
  0 siblings, 0 replies; 24+ messages in thread
From: Xie XiuQi @ 2017-09-01 10:31 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, mingo, x86, mark.rutland,
	ard.biesheuvel, james.morse, takahiro.akashi, tbaicar, bp,
	shiju.jose, zjzhang
  Cc: linux-arm-kernel, linux-kernel, linux-acpi, xiexiuqi,
	wangxiongfeng2, zhengqiang10, gengdongjiu

With the ARMv8.2 RAS Extension, an SEA (Synchronous External Abort) is
usually triggered when a memory error is consumed. In some cases, if the
error address is in a clean page or a read-only page, there is a chance to
recover. For example, if the error occurs in an instruction page, we can
reread the page from disk instead of killing the process.

Because memory_failure() may sleep, we cannot call it directly in SEA
exception context. So we save the faulting physical address associated with
the process in the ghes handler and set TIF_SEA_NOTIFY. When we return from
SEA exception context and get into do_notify_resume(), before the process
resumes running, we check the flag and call memory_failure() to do the
recovery. This is safe because we are in process context by then.
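
As a rough illustration of the deferral described above (a userspace sketch,
not code from this series): the "exception" side only claims a slot in a
fixed-size table and records the address, and the "process" side later drains
the table and does the slow work. The names slots, notify_pending,
save_addr(), drain() and recover_stub() are made up for this example;
recover_stub() merely stands in for memory_failure(), a single global flag
stands in for the per-thread TIF_SEA_NOTIFY, and 4 KiB pages are assumed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_MAX          16	/* same size as SEA_INFO_MAX in the patch */
#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

struct slot {
	atomic_int inuse;	/* 0 = free, 1 = claimed */
	int owner;		/* stand-in for the task pointer */
	uint64_t paddr;		/* faulting physical address */
};

struct slot slots[SLOT_MAX];
atomic_bool notify_pending;	/* stand-in for TIF_SEA_NOTIFY */
uint64_t last_pfn;		/* last pfn handed to recover_stub() */

/* Hypothetical stand-in for memory_failure(): remember the pfn, succeed. */
int recover_stub(uint64_t pfn)
{
	last_pfn = pfn;
	return 0;
}

/* "Exception context": must not sleep, so only record the address. */
int save_addr(int owner, uint64_t paddr)
{
	for (size_t i = 0; i < SLOT_MAX; i++) {
		int expected = 0;

		/* Claim the first free slot without taking a lock. */
		if (atomic_compare_exchange_strong(&slots[i].inuse,
						   &expected, 1)) {
			slots[i].owner = owner;
			slots[i].paddr = paddr;
			atomic_store(&notify_pending, true);
			return 0;
		}
	}
	return -1;	/* table full: too many concurrent errors */
}

/* "Process context": sleeping is allowed, so run the recovery here. */
int drain(int owner, int (*recover)(uint64_t pfn))
{
	int failed = 0;

	assert(recover != NULL);
	atomic_store(&notify_pending, false);
	for (size_t i = 0; i < SLOT_MAX; i++) {
		if (!atomic_load(&slots[i].inuse) || slots[i].owner != owner)
			continue;
		if (recover(slots[i].paddr >> SKETCH_PAGE_SHIFT) < 0)
			failed++;
		atomic_store(&slots[i].inuse, 0);	/* release the slot */
	}
	return failed;	/* nonzero would mean "send SIGBUS" */
}
```

A caller would use save_addr() from the error handler and drain() on the
return-to-user path, treating a nonzero result from drain() as "recovery
failed". The sketch ignores the ordering between claiming a slot and filling
in its owner, which real concurrent code would have to deal with.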

On some platforms, when an SEA is triggered, the physical address may be
reported by the memory section or by the processor section, so we save the
address in both places.
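
To make the "both places" point concrete, here is a small userspace sketch
(again, not code from this series) that picks up a physical address from
either kind of section, but only when that section marks the address as
valid. struct error_section, the two *_PA_VALID bits, section_phys_addr() and
collect_addrs() are all hypothetical; the real CPER structures and validation
bits are different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum section_kind { SECTION_MEMORY, SECTION_PROCESSOR };

/* Hypothetical validation bits; the real CPER definitions differ. */
#define MEM_PA_VALID	(1u << 0)
#define PROC_PA_VALID	(1u << 1)

/* Hypothetical, simplified view of one error section. */
struct error_section {
	enum section_kind kind;
	uint32_t valid_bits;
	uint64_t phys_addr;
};

/* Return true and fill *paddr only if the section carries a valid address. */
bool section_phys_addr(const struct error_section *sec, uint64_t *paddr)
{
	uint32_t need = (sec->kind == SECTION_MEMORY) ? MEM_PA_VALID
						      : PROC_PA_VALID;

	if (!(sec->valid_bits & need))
		return false;
	*paddr = sec->phys_addr;
	return true;
}

/* Collect up to cap addresses from n sections; return how many were stored. */
int collect_addrs(const struct error_section *secs, int n,
		  uint64_t *out, int cap)
{
	int found = 0;

	for (int i = 0; i < n && found < cap; i++) {
		uint64_t paddr;

		if (section_phys_addr(&secs[i], &paddr))
			out[found++] = paddr;
	}
	return found;
}
```

Either section type can therefore feed the same list of addresses that is
later handed to the recovery path.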

Xie XiuQi (3):
  arm64/ras: support sea error recovery
  apei: add ghes param for arch_apei_report_mem_error
  arm64/apei: get error address from memory section for recovery

 arch/arm64/Kconfig                   |  11 +++
 arch/arm64/include/asm/ras.h         |  27 ++++++
 arch/arm64/include/asm/thread_info.h |   4 +-
 arch/arm64/kernel/Makefile           |   1 +
 arch/arm64/kernel/ras.c              | 155 +++++++++++++++++++++++++++++++++++
 arch/arm64/kernel/signal.c           |   8 ++
 arch/arm64/mm/fault.c                |  27 ++++--
 arch/x86/kernel/acpi/apei.c          |   2 +-
 drivers/acpi/apei/apei-base.c        |   4 +-
 drivers/acpi/apei/ghes.c             |   4 +-
 include/acpi/apei.h                  |   4 +-
 include/acpi/ghes.h                  |   3 +-
 12 files changed, 236 insertions(+), 14 deletions(-)
 create mode 100644 arch/arm64/include/asm/ras.h
 create mode 100644 arch/arm64/kernel/ras.c

-- 
1.8.3.1

* [RFC PATCH v1 1/3] arm64/ras: support sea error recovery
  2017-09-01 10:31 ` Xie XiuQi
  (?)
@ 2017-09-01 10:31   ` Xie XiuQi
  -1 siblings, 0 replies; 24+ messages in thread
From: Xie XiuQi @ 2017-09-01 10:31 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, mingo, x86, mark.rutland,
	ard.biesheuvel, james.morse, takahiro.akashi, tbaicar, bp,
	shiju.jose, zjzhang
  Cc: linux-arm-kernel, linux-kernel, linux-acpi, xiexiuqi,
	wangxiongfeng2, zhengqiang10, gengdongjiu

With the ARMv8.2 RAS Extension, an SEA (Synchronous External Abort) is
usually triggered when a memory error is consumed. In some cases, if the
error address is in a clean page or a read-only page, there is a chance to
recover. For example, if the error occurs in an instruction page, we can
reread the page from disk instead of killing the process.

Because memory_failure() may sleep, we cannot call it directly in SEA
exception context. So we save the faulting physical address associated with
the process in the ghes handler and set TIF_SEA_NOTIFY. When we return from
SEA exception context and get into do_notify_resume(), before the process
resumes running, we check the flag and call memory_failure() to do the
recovery. This is safe because we are in process context by then.

Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Wang Xiongfeng <wangxiongfeng2@huawei.com>
---
 arch/arm64/Kconfig                   |  11 +++
 arch/arm64/include/asm/ras.h         |  27 +++++++
 arch/arm64/include/asm/thread_info.h |   4 +-
 arch/arm64/kernel/Makefile           |   1 +
 arch/arm64/kernel/ras.c              | 138 +++++++++++++++++++++++++++++++++++
 arch/arm64/kernel/signal.c           |   8 ++
 arch/arm64/mm/fault.c                |  27 +++++--
 drivers/acpi/apei/ghes.c             |   2 +
 8 files changed, 209 insertions(+), 9 deletions(-)
 create mode 100644 arch/arm64/include/asm/ras.h
 create mode 100644 arch/arm64/kernel/ras.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index dfd9086..7d44589 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -640,6 +640,17 @@ config HOTPLUG_CPU
 	  Say Y here to experiment with turning CPUs off and on.  CPUs
 	  can be controlled through /sys/devices/system/cpu.
 
+config ARM64_ERR_RECOV
+	bool "Support arm64 RAS error recovery"
+	depends on ACPI_APEI_SEA && MEMORY_FAILURE
+	help
+	  With the ARMv8.2 RAS Extension, an SEA is usually triggered when a
+	  memory error is consumed. If the error address is in a clean page or a
+	  read-only page, there is a chance to recover. For example, an instruction
+	  page can be reread from disk instead of killing the process.
+
+	  Say Y if unsure.
+
 # Common NUMA Features
 config NUMA
 	bool "Numa Memory Allocation and Scheduler Support"
diff --git a/arch/arm64/include/asm/ras.h b/arch/arm64/include/asm/ras.h
new file mode 100644
index 0000000..8c4f6a8
--- /dev/null
+++ b/arch/arm64/include/asm/ras.h
@@ -0,0 +1,27 @@
+/*
+ * ARM64 SEA error recovery support
+ *
+ * Copyright 2017 Huawei Technologies Co., Ltd.
+ *   Author: Xie XiuQi <xiexiuqi@huawei.com>
+ *   Author: Wang Xiongfeng <wangxiongfeng2@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version
+ * 2 as published by the Free Software Foundation;
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef _ASM_RAS_H
+#define _ASM_RAS_H
+
+#include <linux/cper.h>
+#include <acpi/ghes.h>
+
+extern void sea_notify_process(void);
+extern void arm_proc_error_check(struct ghes *ghes, struct cper_sec_proc_arm *err);
+
+#endif /*_ASM_RAS_H*/
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 46c3b93..4b10131 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -86,6 +86,7 @@ struct thread_info {
 #define TIF_NOTIFY_RESUME	2	/* callback before returning to user */
 #define TIF_FOREIGN_FPSTATE	3	/* CPU's FP state is not current's */
 #define TIF_UPROBE		4	/* uprobe breakpoint or singlestep */
+#define TIF_SEA_NOTIFY          5       /* notify to do an error recovery */
 #define TIF_NOHZ		7
 #define TIF_SYSCALL_TRACE	8
 #define TIF_SYSCALL_AUDIT	9
@@ -102,6 +103,7 @@ struct thread_info {
 #define _TIF_NOTIFY_RESUME	(1 << TIF_NOTIFY_RESUME)
 #define _TIF_FOREIGN_FPSTATE	(1 << TIF_FOREIGN_FPSTATE)
 #define _TIF_NOHZ		(1 << TIF_NOHZ)
+#define _TIF_SEA_NOTIFY         (1 << TIF_SEA_NOTIFY)
 #define _TIF_SYSCALL_TRACE	(1 << TIF_SYSCALL_TRACE)
 #define _TIF_SYSCALL_AUDIT	(1 << TIF_SYSCALL_AUDIT)
 #define _TIF_SYSCALL_TRACEPOINT	(1 << TIF_SYSCALL_TRACEPOINT)
@@ -111,7 +113,7 @@ struct thread_info {
 
 #define _TIF_WORK_MASK		(_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
 				 _TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE | \
-				 _TIF_UPROBE)
+				 _TIF_UPROBE | _TIF_SEA_NOTIFY)
 
 #define _TIF_SYSCALL_WORK	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
 				 _TIF_SYSCALL_TRACEPOINT | _TIF_SECCOMP | \
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index f2b4e81..ba3abf8 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -43,6 +43,7 @@ arm64-obj-$(CONFIG_EFI)			+= efi.o efi-entry.stub.o
 arm64-obj-$(CONFIG_PCI)			+= pci.o
 arm64-obj-$(CONFIG_ARMV8_DEPRECATED)	+= armv8_deprecated.o
 arm64-obj-$(CONFIG_ACPI)		+= acpi.o
+arm64-obj-$(CONFIG_ARM64_ERR_RECOV)	+= ras.o
 arm64-obj-$(CONFIG_ACPI_NUMA)		+= acpi_numa.o
 arm64-obj-$(CONFIG_ARM64_ACPI_PARKING_PROTOCOL)	+= acpi_parking_protocol.o
 arm64-obj-$(CONFIG_PARAVIRT)		+= paravirt.o
diff --git a/arch/arm64/kernel/ras.c b/arch/arm64/kernel/ras.c
new file mode 100644
index 0000000..8562ec7
--- /dev/null
+++ b/arch/arm64/kernel/ras.c
@@ -0,0 +1,138 @@
+/*
+ * ARM64 SEA error recovery support
+ *
+ * Copyright 2017 Huawei Technologies Co., Ltd.
+ *   Author: Xie XiuQi <xiexiuqi@huawei.com>
+ *   Author: Wang Xiongfeng <wangxiongfeng2@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version
+ * 2 as published by the Free Software Foundation;
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/kernel.h>
+#include <linux/cper.h>
+#include <linux/mm.h>
+#include <linux/preempt.h>
+#include <linux/acpi.h>
+#include <linux/sched/signal.h>
+
+#include <acpi/actbl1.h>
+#include <acpi/ghes.h>
+#include <acpi/apei.h>
+
+#include <asm/thread_info.h>
+#include <asm/atomic.h>
+#include <asm/ras.h>
+
+/*
+ * Need to save faulting physical address associated with a process
+ * in the sea ghes handler some place where we can grab it back
+ * later in sea_notify_process()
+ */
+#define SEA_INFO_MAX    16
+
+struct sea_info {
+        atomic_t                inuse;
+        struct task_struct      *t;
+        __u64                   paddr;
+} sea_info[SEA_INFO_MAX];
+
+static int sea_save_info(__u64 addr)
+{
+        struct sea_info *si;
+
+        for (si = sea_info; si < &sea_info[SEA_INFO_MAX]; si++) {
+                if (atomic_cmpxchg(&si->inuse, 0, 1) == 0) {
+                        si->t = current;
+                        si->paddr = addr;
+                        return 0;
+                }
+        }
+
+	pr_err("Too many concurrent recoverable errors\n");
+	return -ENOMEM;
+}
+
+static struct sea_info *sea_find_info(void)
+{
+        struct sea_info *si;
+
+        for (si = sea_info; si < &sea_info[SEA_INFO_MAX]; si++)
+                if (atomic_read(&si->inuse) && si->t == current)
+                        return si;
+        return NULL;
+}
+
+static void sea_clear_info(struct sea_info *si)
+{
+        atomic_set(&si->inuse, 0);
+}
+
+/*
+ * Called in the context of the process that was interrupted by an SEA and
+ * marked with TIF_SEA_NOTIFY, just before returning to userland.
+ * This code is allowed to sleep.
+ * Attempt recovery, such as calling the high-level VM handler to process
+ * any corrupted pages, and kill/signal the current process if required.
+ * Action-required errors are handled here.
+ */
+void sea_notify_process(void)
+{
+	unsigned long pfn;
+	int fail = 0, flags = MF_ACTION_REQUIRED;
+	struct sea_info *si = sea_find_info();
+
+	if (!si)
+		panic("Lost physical address for consumed uncorrectable error");
+
+	clear_thread_flag(TIF_SEA_NOTIFY);
+	do {
+		pfn = si->paddr >> PAGE_SHIFT;
+
+
+		pr_err("Uncorrected hardware memory error in user-access at %llx\n",
+			si->paddr);
+		/*
+		 * We must call memory_failure() here even if the current process is
+		 * doomed. We still need to mark the page as poisoned and alert any
+		 * other users of the page.
+		 */
+		if (memory_failure(pfn, 0, flags) < 0) {
+			fail++;
+		}
+		sea_clear_info(si);
+
+		si = sea_find_info();
+	} while (si);
+
+	if (fail) {
+		pr_err("Memory error not recovered\n");
+		force_sig(SIGBUS, current);
+	}
+}
+
+void arm_proc_error_check(struct ghes *ghes, struct cper_sec_proc_arm *err)
+{
+	int i, ret = -1;
+	struct cper_arm_err_info *err_info;
+
+	if ((ghes->generic->notify.type != ACPI_HEST_NOTIFY_SEA) ||
+	    (ghes->estatus->error_severity != CPER_SEV_RECOVERABLE))
+		return;
+
+	err_info = (struct cper_arm_err_info *)(err + 1);
+	for (i = 0; i < err->err_info_num; i++, err_info++) {
+		if ((err_info->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR) &&
+		    !sea_save_info(err_info->physical_fault_addr))
+			ret = 0;	/* at least one address was saved */
+	}
+
+	if (!ret)
+		set_thread_flag(TIF_SEA_NOTIFY);
+}
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 089c3747..71e314e 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -38,6 +38,7 @@
 #include <asm/fpsimd.h>
 #include <asm/signal32.h>
 #include <asm/vdso.h>
+#include <asm/ras.h>
 
 /*
  * Do a signal return; undo the signal stack. These are aligned to 128-bit.
@@ -749,6 +750,13 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
 	 * Update the trace code with the current status.
 	 */
 	trace_hardirqs_off();
+
+#ifdef CONFIG_ARM64_ERR_RECOV
+	/* notify userspace of pending SEAs */
+	if (thread_flags & _TIF_SEA_NOTIFY)
+		sea_notify_process();
+#endif /* CONFIG_ARM64_ERR_RECOV */
+
 	do {
 		if (thread_flags & _TIF_NEED_RESCHED) {
 			schedule();
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 1f22a41..b38476d 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -594,14 +594,25 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
 			nmi_exit();
 	}
 
-	info.si_signo = SIGBUS;
-	info.si_errno = 0;
-	info.si_code  = 0;
-	if (esr & ESR_ELx_FnV)
-		info.si_addr = NULL;
-	else
-		info.si_addr  = (void __user *)addr;
-	arm64_notify_die("", regs, &info, esr);
+	if (user_mode(regs)) {
+		if (test_thread_flag(TIF_SEA_NOTIFY))
+			return ret;
+
+		info.si_signo = SIGBUS;
+		info.si_errno = 0;
+		info.si_code  = 0;
+		if (esr & ESR_ELx_FnV)
+			info.si_addr = NULL;
+		else
+			info.si_addr  = (void __user *)addr;
+
+		current->thread.fault_address = 0;
+		current->thread.fault_code = esr;
+		force_sig_info(info.si_signo, &info, current);
+	} else {
+		die("Uncorrected hardware memory error in kernel-access\n",
+		    regs, esr);
+	}
 
 	return ret;
 }
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index d661d45..fa9400d 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -52,6 +52,7 @@
 #include <acpi/ghes.h>
 #include <acpi/apei.h>
 #include <asm/tlbflush.h>
+#include <asm/ras.h>
 #include <ras/ras_event.h>
 
 #include "apei-internal.h"
@@ -520,6 +521,7 @@ static void ghes_do_proc(struct ghes *ghes,
 		else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
 			struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
 
+			arm_proc_error_check(ghes, err);
 			log_arm_hw_error(err);
 		} else {
 			void *err = acpi_hest_get_payload(gdata);
-- 
1.8.3.1


* [RFC PATCH v1 1/3] arm64/ras: support sea error recovery
@ 2017-09-01 10:31   ` Xie XiuQi
  0 siblings, 0 replies; 24+ messages in thread
From: Xie XiuQi @ 2017-09-01 10:31 UTC (permalink / raw)
  To: linux-arm-kernel

With the ARMv8.2 RAS Extension, an SEA (Synchronous External Abort) is
usually triggered when a memory error is consumed. In some cases, if the
error address is in a clean page or a read-only page, there is a chance to
recover. For example, if the error occurs in an instruction page, we can
reread the page from disk instead of killing the process.

Because memory_failure() may sleep, we cannot call it directly in SEA
exception context. So we save the faulting physical address associated with
the process in the ghes handler and set TIF_SEA_NOTIFY. When we return from
SEA exception context and get into do_notify_resume(), before the process
resumes running, we check the flag and call memory_failure() to do the
recovery. This is safe because we are in process context by then.

Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Wang Xiongfeng <wangxiongfeng2@huawei.com>
---
 arch/arm64/Kconfig                   |  11 +++
 arch/arm64/include/asm/ras.h         |  27 +++++++
 arch/arm64/include/asm/thread_info.h |   4 +-
 arch/arm64/kernel/Makefile           |   1 +
 arch/arm64/kernel/ras.c              | 138 +++++++++++++++++++++++++++++++++++
 arch/arm64/kernel/signal.c           |   8 ++
 arch/arm64/mm/fault.c                |  27 +++++--
 drivers/acpi/apei/ghes.c             |   2 +
 8 files changed, 209 insertions(+), 9 deletions(-)
 create mode 100644 arch/arm64/include/asm/ras.h
 create mode 100644 arch/arm64/kernel/ras.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index dfd9086..7d44589 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -640,6 +640,17 @@ config HOTPLUG_CPU
 	  Say Y here to experiment with turning CPUs off and on.  CPUs
 	  can be controlled through /sys/devices/system/cpu.
 
+config ARM64_ERR_RECOV
+	bool "Support arm64 RAS error recovery"
+	depends on ACPI_APEI_SEA && MEMORY_FAILURE
+	help
+	  With ARM v8.2 RAS Extension, SEA are usually triggered when memory errors
+	  are consumed. In some cases, if the error address is in a clean page or a
+	  read-only page, there is a chance to recover. Such as error occurs in a
+	  instruction page, we can reread this page from disk instead of killing process.
+
+	  Say Y if unsure.
+
 # Common NUMA Features
 config NUMA
 	bool "Numa Memory Allocation and Scheduler Support"
diff --git a/arch/arm64/include/asm/ras.h b/arch/arm64/include/asm/ras.h
new file mode 100644
index 0000000..8c4f6a8
--- /dev/null
+++ b/arch/arm64/include/asm/ras.h
@@ -0,0 +1,27 @@
+/*
+ * ARM64 SEA error recoery support
+ *
+ * Copyright 2017 Huawei Technologies Co., Ltd.
+ *   Author: Xie XiuQi <xiexiuqi@huawei.com>
+ *   Author: Wang Xiongfeng <wangxiongfeng2@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version
+ * 2 as published by the Free Software Foundation;
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef _ASM_RAS_H
+#define _ASM_RAS_H
+
+#include <linux/cper.h>
+#include <acpi/ghes.h>
+
+extern void sea_notify_process(void);
+extern void arm_proc_error_check(struct ghes *ghes, struct cper_sec_proc_arm *err);
+
+#endif /*_ASM_RAS_H*/
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 46c3b93..4b10131 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -86,6 +86,7 @@ struct thread_info {
 #define TIF_NOTIFY_RESUME	2	/* callback before returning to user */
 #define TIF_FOREIGN_FPSTATE	3	/* CPU's FP state is not current's */
 #define TIF_UPROBE		4	/* uprobe breakpoint or singlestep */
+#define TIF_SEA_NOTIFY          5       /* notify to do an error recovery */
 #define TIF_NOHZ		7
 #define TIF_SYSCALL_TRACE	8
 #define TIF_SYSCALL_AUDIT	9
@@ -102,6 +103,7 @@ struct thread_info {
 #define _TIF_NOTIFY_RESUME	(1 << TIF_NOTIFY_RESUME)
 #define _TIF_FOREIGN_FPSTATE	(1 << TIF_FOREIGN_FPSTATE)
 #define _TIF_NOHZ		(1 << TIF_NOHZ)
+#define _TIF_SEA_NOTIFY         (1 << TIF_SEA_NOTIFY)
 #define _TIF_SYSCALL_TRACE	(1 << TIF_SYSCALL_TRACE)
 #define _TIF_SYSCALL_AUDIT	(1 << TIF_SYSCALL_AUDIT)
 #define _TIF_SYSCALL_TRACEPOINT	(1 << TIF_SYSCALL_TRACEPOINT)
@@ -111,7 +113,7 @@ struct thread_info {
 
 #define _TIF_WORK_MASK		(_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
 				 _TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE | \
-				 _TIF_UPROBE)
+				 _TIF_UPROBE|_TIF_SEA_NOTIFY)
 
 #define _TIF_SYSCALL_WORK	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
 				 _TIF_SYSCALL_TRACEPOINT | _TIF_SECCOMP | \
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index f2b4e81..ba3abf8 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -43,6 +43,7 @@ arm64-obj-$(CONFIG_EFI)			+= efi.o efi-entry.stub.o
 arm64-obj-$(CONFIG_PCI)			+= pci.o
 arm64-obj-$(CONFIG_ARMV8_DEPRECATED)	+= armv8_deprecated.o
 arm64-obj-$(CONFIG_ACPI)		+= acpi.o
+arm64-obj-$(CONFIG_ARM64_ERR_RECOV)	+= ras.o
 arm64-obj-$(CONFIG_ACPI_NUMA)		+= acpi_numa.o
 arm64-obj-$(CONFIG_ARM64_ACPI_PARKING_PROTOCOL)	+= acpi_parking_protocol.o
 arm64-obj-$(CONFIG_PARAVIRT)		+= paravirt.o
diff --git a/arch/arm64/kernel/ras.c b/arch/arm64/kernel/ras.c
new file mode 100644
index 0000000..8562ec7
--- /dev/null
+++ b/arch/arm64/kernel/ras.c
@@ -0,0 +1,138 @@
+/*
+ * ARM64 SEA error recoery support
+ *
+ * Copyright 2017 Huawei Technologies Co., Ltd.
+ *   Author: Xie XiuQi <xiexiuqi@huawei.com>
+ *   Author: Wang Xiongfeng <wangxiongfeng2@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version
+ * 2 as published by the Free Software Foundation;
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/kernel.h>
+#include <linux/cper.h>
+#include <linux/mm.h>
+#include <linux/preempt.h>
+#include <linux/acpi.h>
+#include <linux/sched/signal.h>
+
+#include <acpi/actbl1.h>
+#include <acpi/ghes.h>
+#include <acpi/apei.h>
+
+#include <asm/thread_info.h>
+#include <asm/atomic.h>
+#include <asm/ras.h>
+
+/*
+ * Need to save faulting physical address associated with a process
+ * in the sea ghes handler some place where we can grab it back
+ * later in sea_notify_process()
+ */
+#define SEA_INFO_MAX    16
+
+struct sea_info {
+        atomic_t                inuse;
+        struct task_struct      *t;
+        __u64                   paddr;
+} sea_info[SEA_INFO_MAX];
+
+static int sea_save_info(__u64 addr)
+{
+        struct sea_info *si;
+
+        for (si = sea_info; si < &sea_info[SEA_INFO_MAX]; si++) {
+                if (atomic_cmpxchg(&si->inuse, 0, 1) == 0) {
+                        si->t = current;
+                        si->paddr = addr;
+                        return 0;
+                }
+        }
+
+	pr_err("Too many concurrent recoverable errors\n");
+	return -ENOMEM;
+}
+
+static struct sea_info *sea_find_info(void)
+{
+        struct sea_info *si;
+
+        for (si = sea_info; si < &sea_info[SEA_INFO_MAX]; si++)
+                if (atomic_read(&si->inuse) && si->t == current)
+                        return si;
+        return NULL;
+}
+
+static void sea_clear_info(struct sea_info *si)
+{
+        atomic_set(&si->inuse, 0);
+}
+
+/*
+ * Called in process context that interrupted by SEA and marked with
+ * TIF_SEA_NOTIFY, just before returning to erroneous userland.
+ * This code is allowed to sleep.
+ * Attempt possible recovery such as calling the high level VM handler to
+ * process any corrupted pages, and kill/signal current process if required.
+ * Action required errors are handled here.
+ */
+void sea_notify_process(void)
+{
+	unsigned long pfn;
+	int fail = 0, flags = MF_ACTION_REQUIRED;
+	struct sea_info *si = sea_find_info();
+
+	if (!si)
+		panic("Lost physical address for consumed uncorrectable error");
+
+	clear_thread_flag(TIF_SEA_NOTIFY);
+	do {
+		pfn = si->paddr >> PAGE_SHIFT;
+
+
+		pr_err("Uncorrected hardware memory error in user-access at %llx\n",
+			si->paddr);
+		/*
+		 * We must call memory_failure() here even if the current process is
+		 * doomed. We still need to mark the page as poisoned and alert any
+		 * other users of the page.
+		 */
+		if (memory_failure(pfn, 0, flags) < 0) {
+			fail++;
+		}
+		sea_clear_info(si);
+
+		si = sea_find_info();
+	} while (si);
+
+	if (fail) {
+		pr_err("Memory error not recovered\n");
+		force_sig(SIGBUS, current);
+	}
+}
+
+void arm_proc_error_check(struct ghes *ghes, struct cper_sec_proc_arm *err)
+{
+	int i, ret = -1;
+	struct cper_arm_err_info *err_info;
+
+	if ((ghes->generic->notify.type != ACPI_HEST_NOTIFY_SEA) ||
+	    (ghes->estatus->error_severity != CPER_SEV_RECOVERABLE))
+		return;
+
+	err_info = (struct cper_arm_err_info *)(err + 1);
+	for (i = 0; i < err->err_info_num; i++, err_info++) {
+		if (err_info->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR) {
+			ret |= sea_save_info(err_info->physical_fault_addr);
+		}
+	}
+
+	if (!ret)
+		set_thread_flag(TIF_SEA_NOTIFY);
+}
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 089c3747..71e314e 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -38,6 +38,7 @@
 #include <asm/fpsimd.h>
 #include <asm/signal32.h>
 #include <asm/vdso.h>
+#include <asm/ras.h>
 
 /*
  * Do a signal return; undo the signal stack. These are aligned to 128-bit.
@@ -749,6 +750,13 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
 	 * Update the trace code with the current status.
 	 */
 	trace_hardirqs_off();
+
+#ifdef CONFIG_ARM64_ERR_RECOV
+		/* notify userspace of pending SEAs */
+		if (thread_flags & _TIF_SEA_NOTIFY)
+			sea_notify_process();
+#endif /* CONFIG_ARM64_ERR_RECOV */
+
 	do {
 		if (thread_flags & _TIF_NEED_RESCHED) {
 			schedule();
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 1f22a41..b38476d 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -594,14 +594,25 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
 			nmi_exit();
 	}
 
-	info.si_signo = SIGBUS;
-	info.si_errno = 0;
-	info.si_code  = 0;
-	if (esr & ESR_ELx_FnV)
-		info.si_addr = NULL;
-	else
-		info.si_addr  = (void __user *)addr;
-	arm64_notify_die("", regs, &info, esr);
+	if (user_mode(regs)) {
+		if (test_thread_flag(TIF_SEA_NOTIFY))
+			return ret;
+
+		info.si_signo = SIGBUS;
+		info.si_errno = 0;
+		info.si_code  = 0;
+		if (esr & ESR_ELx_FnV)
+			info.si_addr = NULL;
+		else
+			info.si_addr  = (void __user *)addr;
+
+		current->thread.fault_address = 0;
+		current->thread.fault_code = esr;
+		force_sig_info(info.si_signo, &info, current);
+	} else {
+		die("Uncorrected hardware memory error in kernel-access\n",
+		    regs, esr);
+	}
 
 	return ret;
 }
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index d661d45..fa9400d 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -52,6 +52,7 @@
 #include <acpi/ghes.h>
 #include <acpi/apei.h>
 #include <asm/tlbflush.h>
+#include <asm/ras.h>
 #include <ras/ras_event.h>
 
 #include "apei-internal.h"
@@ -520,6 +521,7 @@ static void ghes_do_proc(struct ghes *ghes,
 		else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
 			struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
 
+			arm_proc_error_check(ghes, err);
 			log_arm_hw_error(err);
 		} else {
 			void *err = acpi_hest_get_payload(gdata);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH v1 2/3] apei: add ghes param for arch_apei_report_mem_error
  2017-09-01 10:31 ` Xie XiuQi
  (?)
@ 2017-09-01 10:32   ` Xie XiuQi
  -1 siblings, 0 replies; 24+ messages in thread
From: Xie XiuQi @ 2017-09-01 10:32 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, mingo, x86, mark.rutland,
	ard.biesheuvel, james.morse, takahiro.akashi, tbaicar, bp,
	shiju.jose, zjzhang
  Cc: linux-arm-kernel, linux-kernel, linux-acpi, xiexiuqi,
	wangxiongfeng2, zhengqiang10, gengdongjiu

Add a struct ghes parameter to arch_apei_report_mem_error() so that
architectures can do more arch-specific processing.

Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
---
 arch/x86/kernel/acpi/apei.c   | 2 +-
 drivers/acpi/apei/apei-base.c | 4 +++-
 drivers/acpi/apei/ghes.c      | 2 +-
 include/acpi/apei.h           | 4 +++-
 include/acpi/ghes.h           | 3 ++-
 5 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/acpi/apei.c b/arch/x86/kernel/acpi/apei.c
index ea3046e..1bf1c9b 100644
--- a/arch/x86/kernel/acpi/apei.c
+++ b/arch/x86/kernel/acpi/apei.c
@@ -46,7 +46,7 @@ int arch_apei_enable_cmcff(struct acpi_hest_header *hest_hdr, void *data)
 	return 1;
 }
 
-void arch_apei_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
+void arch_apei_report_mem_error(struct ghes *ghes, int sev, struct cper_sec_mem_err *mem_err)
 {
 #ifdef CONFIG_X86_MCE
 	apei_mce_report_mem_error(sev, mem_err);
diff --git a/drivers/acpi/apei/apei-base.c b/drivers/acpi/apei/apei-base.c
index da370e1..317169b 100644
--- a/drivers/acpi/apei/apei-base.c
+++ b/drivers/acpi/apei/apei-base.c
@@ -38,6 +38,8 @@
 #include <linux/debugfs.h>
 #include <asm/unaligned.h>
 
+#include <acpi/ghes.h>
+
 #include "apei-internal.h"
 
 #define APEI_PFX "APEI: "
@@ -770,7 +772,7 @@ int __weak arch_apei_enable_cmcff(struct acpi_hest_header *hest_hdr,
 }
 EXPORT_SYMBOL_GPL(arch_apei_enable_cmcff);
 
-void __weak arch_apei_report_mem_error(int sev,
+void __weak arch_apei_report_mem_error(struct ghes *ghes, int sev,
 				       struct cper_sec_mem_err *mem_err)
 {
 }
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index fa9400d..996d16c4 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -483,7 +483,7 @@ static void ghes_do_proc(struct ghes *ghes,
 
 			ghes_edac_report_mem_error(ghes, sev, mem_err);
 
-			arch_apei_report_mem_error(sev, mem_err);
+			arch_apei_report_mem_error(ghes, sev, mem_err);
 			ghes_handle_memory_failure(gdata, sev);
 		}
 #ifdef CONFIG_ACPI_APEI_PCIEAER
diff --git a/include/acpi/apei.h b/include/acpi/apei.h
index 76284bb..586dfb0 100644
--- a/include/acpi/apei.h
+++ b/include/acpi/apei.h
@@ -9,6 +9,8 @@
 #include <linux/cper.h>
 #include <asm/ioctls.h>
 
+#include <acpi/ghes.h>
+
 #define APEI_ERST_INVALID_RECORD_ID	0xffffffffffffffffULL
 
 #define APEI_ERST_CLEAR_RECORD		_IOW('E', 1, u64)
@@ -43,7 +45,7 @@ ssize_t erst_read(u64 record_id, struct cper_record_header *record,
 int erst_clear(u64 record_id);
 
 int arch_apei_enable_cmcff(struct acpi_hest_header *hest_hdr, void *data);
-void arch_apei_report_mem_error(int sev, struct cper_sec_mem_err *mem_err);
+void arch_apei_report_mem_error(struct ghes *ghes, int sev, struct cper_sec_mem_err *mem_err);
 void arch_apei_flush_tlb_one(unsigned long addr);
 
 #endif
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 9f26e01..3100791 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -1,7 +1,8 @@
 #ifndef GHES_H
 #define GHES_H
 
-#include <acpi/apei.h>
+#include <linux/acpi.h>
+#include <linux/cper.h>
 #include <acpi/hed.h>
 
 /*
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH v1 3/3] arm64/apei: get error address from memory section for recovery
  2017-09-01 10:31 ` Xie XiuQi
  (?)
@ 2017-09-01 10:32   ` Xie XiuQi
  -1 siblings, 0 replies; 24+ messages in thread
From: Xie XiuQi @ 2017-09-01 10:32 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, mingo, x86, mark.rutland,
	ard.biesheuvel, james.morse, takahiro.akashi, tbaicar, bp,
	shiju.jose, zjzhang
  Cc: linux-arm-kernel, linux-kernel, linux-acpi, xiexiuqi,
	wangxiongfeng2, zhengqiang10, gengdongjiu

On some platforms, when an SEA is triggered, the physical address may be
reported in the memory error section, so save it for later error recovery.

Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
---
 arch/arm64/kernel/ras.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/arch/arm64/kernel/ras.c b/arch/arm64/kernel/ras.c
index 8562ec7..2b400b8 100644
--- a/arch/arm64/kernel/ras.c
+++ b/arch/arm64/kernel/ras.c
@@ -136,3 +136,20 @@ void arm_proc_error_check(struct ghes *ghes, struct cper_sec_proc_arm *err)
 	if (!ret)
 		set_thread_flag(TIF_SEA_NOTIFY);
 }
+
+void arch_apei_report_mem_error(struct ghes *ghes, int sev,
+				struct cper_sec_mem_err *mem)
+{
+	int ret = -1;
+
+	if ((ghes->generic->notify.type != ACPI_HEST_NOTIFY_SEA) ||
+	    (ghes->estatus->error_severity != CPER_SEV_RECOVERABLE))
+		return;
+
+	if (mem->validation_bits & CPER_MEM_VALID_PA) {
+		ret = sea_save_info(mem->physical_addr);
+	}
+
+	if (!ret)
+		set_thread_flag(TIF_SEA_NOTIFY);
+}
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v1 2/3] apei: add ghes param for arch_apei_report_mem_error
  2017-09-01 10:32   ` Xie XiuQi
@ 2017-09-01 11:15     ` Borislav Petkov
  -1 siblings, 0 replies; 24+ messages in thread
From: Borislav Petkov @ 2017-09-01 11:15 UTC (permalink / raw)
  To: Xie XiuQi
  Cc: catalin.marinas, will.deacon, mingo, x86, mark.rutland,
	ard.biesheuvel, james.morse, takahiro.akashi, tbaicar,
	shiju.jose, zjzhang, linux-arm-kernel, linux-kernel, linux-acpi,
	wangxiongfeng2, zhengqiang10, gengdongjiu

On Fri, Sep 01, 2017 at 06:32:00PM +0800, Xie XiuQi wrote:
> Add ghes param for arch_apei_report_mem_error, with which
> we could do more arch-specific processing.
> 
> Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
> ---
>  arch/x86/kernel/acpi/apei.c   | 2 +-
>  drivers/acpi/apei/apei-base.c | 4 +++-
>  drivers/acpi/apei/ghes.c      | 2 +-
>  include/acpi/apei.h           | 4 +++-
>  include/acpi/ghes.h           | 3 ++-
>  5 files changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kernel/acpi/apei.c b/arch/x86/kernel/acpi/apei.c
> index ea3046e..1bf1c9b 100644
> --- a/arch/x86/kernel/acpi/apei.c
> +++ b/arch/x86/kernel/acpi/apei.c
> @@ -46,7 +46,7 @@ int arch_apei_enable_cmcff(struct acpi_hest_header *hest_hdr, void *data)
>  	return 1;
>  }
>  
> -void arch_apei_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
> +void arch_apei_report_mem_error(struct ghes *ghes, int sev, struct cper_sec_mem_err *mem_err)
>  {
>  #ifdef CONFIG_X86_MCE
>  	apei_mce_report_mem_error(sev, mem_err);
> diff --git a/drivers/acpi/apei/apei-base.c b/drivers/acpi/apei/apei-base.c
> index da370e1..317169b 100644
> --- a/drivers/acpi/apei/apei-base.c
> +++ b/drivers/acpi/apei/apei-base.c
> @@ -38,6 +38,8 @@
>  #include <linux/debugfs.h>
>  #include <asm/unaligned.h>
>  
> +#include <acpi/ghes.h>
> +
>  #include "apei-internal.h"
>  
>  #define APEI_PFX "APEI: "
> @@ -770,7 +772,7 @@ int __weak arch_apei_enable_cmcff(struct acpi_hest_header *hest_hdr,
>  }
>  EXPORT_SYMBOL_GPL(arch_apei_enable_cmcff);
>  
> -void __weak arch_apei_report_mem_error(int sev,
> +void __weak arch_apei_report_mem_error(struct ghes *ghes, int sev,
>  				       struct cper_sec_mem_err *mem_err)
>  {
>  }
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index fa9400d..996d16c4 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -483,7 +483,7 @@ static void ghes_do_proc(struct ghes *ghes,
>  
>  			ghes_edac_report_mem_error(ghes, sev, mem_err);
>  
> -			arch_apei_report_mem_error(sev, mem_err);
> +			arch_apei_report_mem_error(ghes, sev, mem_err);

And next time you want to pass something else, you'll have to touch all
those files again...

Instead, make that a notifier to which consumers register and define
a separate struct mem_err or ghes_err or whatnot and populate it with
cper_sec_mem_err data and whatever else is needed by the consumers.
Instead of passing that struct ghes * which consumers don't need to
know.
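
Roughly along these lines. This is only a user-space sketch of the idea,
not kernel code: struct ghes_mem_err, struct mem_err_notifier and the
helper names are made up for illustration, and in the kernel the
hand-rolled list would presumably be replaced by the existing
notifier_block infrastructure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical payload: only what consumers need, no struct ghes. */
struct ghes_mem_err {
	int sev;                 /* severity passed down by the GHES core */
	int pa_valid;            /* nonzero if physical_addr is meaningful */
	uint64_t physical_addr;  /* copied from cper_sec_mem_err */
};

/* Hypothetical stand-in for the kernel's struct notifier_block. */
struct mem_err_notifier {
	int (*call)(struct mem_err_notifier *nb,
		    const struct ghes_mem_err *err);
	struct mem_err_notifier *next;
};

static struct mem_err_notifier *mem_err_chain;

/* Consumers register once, e.g. at init time. */
static void mem_err_notifier_register(struct mem_err_notifier *nb)
{
	assert(nb != NULL && nb->call != NULL);
	nb->next = mem_err_chain;
	mem_err_chain = nb;
}

/* Called by the producer; returns how many consumers were invoked. */
static int mem_err_notify(const struct ghes_mem_err *err)
{
	struct mem_err_notifier *nb;
	int called = 0;

	for (nb = mem_err_chain; nb != NULL; nb = nb->next) {
		nb->call(nb, err);
		called++;
	}
	return called;
}

/* Example consumer: remember the last valid physical address. */
static uint64_t last_saved_paddr;

static int sea_mem_err_consumer(struct mem_err_notifier *nb,
				const struct ghes_mem_err *err)
{
	(void)nb;
	if (err->pa_valid)
		last_saved_paddr = err->physical_addr;
	return 0;
}
```

With something like this, adding a new field later only touches the
payload struct and the consumers that care about it, not every
arch_apei_report_mem_error() prototype.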

Thx.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC PATCH v1 2/3] apei: add ghes param for arch_apei_report_mem_error
@ 2017-09-01 11:15     ` Borislav Petkov
  0 siblings, 0 replies; 24+ messages in thread
From: Borislav Petkov @ 2017-09-01 11:15 UTC (permalink / raw)
  To: linux-arm-kernel

n Fri, Sep 01, 2017 at 06:32:00PM +0800, Xie XiuQi wrote:
> Add ghes param for arch_apei_report_mem_error, with which
> we could do more arch-specific processing.
> 
> Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
> ---
>  arch/x86/kernel/acpi/apei.c   | 2 +-
>  drivers/acpi/apei/apei-base.c | 4 +++-
>  drivers/acpi/apei/ghes.c      | 2 +-
>  include/acpi/apei.h           | 4 +++-
>  include/acpi/ghes.h           | 3 ++-
>  5 files changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kernel/acpi/apei.c b/arch/x86/kernel/acpi/apei.c
> index ea3046e..1bf1c9b 100644
> --- a/arch/x86/kernel/acpi/apei.c
> +++ b/arch/x86/kernel/acpi/apei.c
> @@ -46,7 +46,7 @@ int arch_apei_enable_cmcff(struct acpi_hest_header *hest_hdr, void *data)
>  	return 1;
>  }
>  
> -void arch_apei_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
> +void arch_apei_report_mem_error(struct ghes *ghes, int sev, struct cper_sec_mem_err *mem_err)
>  {
>  #ifdef CONFIG_X86_MCE
>  	apei_mce_report_mem_error(sev, mem_err);
> diff --git a/drivers/acpi/apei/apei-base.c b/drivers/acpi/apei/apei-base.c
> index da370e1..317169b 100644
> --- a/drivers/acpi/apei/apei-base.c
> +++ b/drivers/acpi/apei/apei-base.c
> @@ -38,6 +38,8 @@
>  #include <linux/debugfs.h>
>  #include <asm/unaligned.h>
>  
> +#include <acpi/ghes.h>
> +
>  #include "apei-internal.h"
>  
>  #define APEI_PFX "APEI: "
> @@ -770,7 +772,7 @@ int __weak arch_apei_enable_cmcff(struct acpi_hest_header *hest_hdr,
>  }
>  EXPORT_SYMBOL_GPL(arch_apei_enable_cmcff);
>  
> -void __weak arch_apei_report_mem_error(int sev,
> +void __weak arch_apei_report_mem_error(struct ghes *ghes, int sev,
>  				       struct cper_sec_mem_err *mem_err)
>  {
>  }
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index fa9400d..996d16c4 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -483,7 +483,7 @@ static void ghes_do_proc(struct ghes *ghes,
>  
>  			ghes_edac_report_mem_error(ghes, sev, mem_err);
>  
> -			arch_apei_report_mem_error(sev, mem_err);
> +			arch_apei_report_mem_error(ghes, sev, mem_err);

And next time you want to pass something else, you'll have to touch all
those files again...

Instead, make that a notifier to which consumers register, and define
a separate struct mem_err or ghes_err or whatnot and populate it with
cper_sec_mem_err data and whatever else the consumers need, instead of
passing that struct ghes *, which consumers don't need to know about.
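
For illustration only, here is a rough user-space model of the notifier
idea. Every name in it (mem_err_event, mem_err_notifier,
mem_err_register_notifier, mem_err_notify, record_addr) is hypothetical
and not taken from the patch; in the kernel one would presumably build
this on the existing notifier chain helpers rather than a hand-rolled
list.

```c
#include <stddef.h>
#include <stdint.h>

/* Event handed to consumers: only the data they need, no struct ghes. */
struct mem_err_event {
	int      severity;     /* e.g. a CPER severity value */
	int      notify_type;  /* how the error was signalled */
	uint64_t phys_addr;    /* faulting physical address, if known */
};

/* One registered consumer (hypothetical type, not the kernel's notifier_block). */
struct mem_err_notifier {
	void (*call)(const struct mem_err_event *ev, void *priv);
	void *priv;
	struct mem_err_notifier *next;
};

static struct mem_err_notifier *mem_err_chain;

/* Consumers (arch code, EDAC, ...) register once at init time. */
void mem_err_register_notifier(struct mem_err_notifier *nb)
{
	nb->next = mem_err_chain;
	mem_err_chain = nb;
}

/*
 * The producer fills in one event and walks the chain.
 * Returns the number of consumers that were called.
 */
int mem_err_notify(const struct mem_err_event *ev)
{
	int called = 0;
	struct mem_err_notifier *nb;

	for (nb = mem_err_chain; nb; nb = nb->next) {
		nb->call(ev, nb->priv);
		called++;
	}
	return called;
}

/* Example consumer: remembers the last reported physical address. */
void record_addr(const struct mem_err_event *ev, void *priv)
{
	*(uint64_t *)priv = ev->phys_addr;
}
```

With this shape, passing one more piece of information later means adding
a field to the event struct, not changing the prototype in every file.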

Thx.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v1 1/3] arm64/ras: support sea error recovery
  2017-09-01 10:31   ` Xie XiuQi
@ 2017-09-01 11:16     ` Borislav Petkov
  -1 siblings, 0 replies; 24+ messages in thread
From: Borislav Petkov @ 2017-09-01 11:16 UTC (permalink / raw)
  To: Xie XiuQi
  Cc: catalin.marinas, will.deacon, mingo, x86, mark.rutland,
	ard.biesheuvel, james.morse, takahiro.akashi, tbaicar,
	shiju.jose, zjzhang, linux-arm-kernel, linux-kernel, linux-acpi,
	wangxiongfeng2, zhengqiang10, gengdongjiu

On Fri, Sep 01, 2017 at 06:31:59PM +0800, Xie XiuQi wrote:
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index d661d45..fa9400d 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -52,6 +52,7 @@
>  #include <acpi/ghes.h>
>  #include <acpi/apei.h>
>  #include <asm/tlbflush.h>
> +#include <asm/ras.h>
>  #include <ras/ras_event.h>
>  
>  #include "apei-internal.h"
> @@ -520,6 +521,7 @@ static void ghes_do_proc(struct ghes *ghes,
>  		else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
>  			struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
>  
> +			arm_proc_error_check(ghes, err);
>  			log_arm_hw_error(err);

Wrap those two in a single arm_process_error() which does everything
needed on ARM.
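
A minimal user-space sketch of that wrapper, for illustration only. The
two callees below are stand-ins that just count calls (the counters
checks_run and logs_written are hypothetical); the real
arm_proc_error_check() and log_arm_hw_error() do the actual work.

```c
#include <stddef.h>

struct ghes;               /* opaque in this sketch */
struct cper_sec_proc_arm;  /* opaque in this sketch */

static int checks_run;     /* hypothetical counters, only for the sketch */
static int logs_written;

/* Stand-in for the recovery check added by the patch. */
void arm_proc_error_check(struct ghes *ghes, struct cper_sec_proc_arm *err)
{
	(void)ghes;
	(void)err;
	checks_run++;
}

/* Stand-in for the existing logging helper. */
void log_arm_hw_error(struct cper_sec_proc_arm *err)
{
	(void)err;
	logs_written++;
}

/* Single entry point, so ghes_do_proc() needs only one ARM-specific call. */
void arm_process_error(struct ghes *ghes, struct cper_sec_proc_arm *err)
{
	arm_proc_error_check(ghes, err);
	log_arm_hw_error(err);
}
```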

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v1 1/3] arm64/ras: support sea error recovery
  2017-09-01 10:31   ` Xie XiuQi
@ 2017-09-01 15:51     ` Julien Thierry
  -1 siblings, 0 replies; 24+ messages in thread
From: Julien Thierry @ 2017-09-01 15:51 UTC (permalink / raw)
  To: Xie XiuQi, catalin.marinas, will.deacon, mingo, x86,
	mark.rutland, ard.biesheuvel, james.morse, takahiro.akashi,
	tbaicar, bp, shiju.jose, zjzhang
  Cc: linux-arm-kernel, linux-kernel, linux-acpi, wangxiongfeng2,
	zhengqiang10, gengdongjiu

Hi Xie,

On 01/09/17 11:31, Xie XiuQi wrote:
> With ARM v8.2 RAS Extension, SEA are usually triggered when memory errors
> are consumed. In some cases, if the error address is in a clean page or a
> read-only page, there is a chance to recover. Such as error occurs in a
> instruction page, we can reread this page from disk instead of killing process.
> 
> Because memory_failure() may sleep, we can not call it directly in SEA exception
> context. So we saved faulting physical address associated with a process in the
> ghes handler and set __TIF_SEA_NOTIFY. When we return from SEA exception context
> and get into do_notify_resume() before the process running, we could check it
> and call memory_failure() to do recovery. It's safe, because we are in process
> context.
> 
> Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
> Signed-off-by: Wang Xiongfeng <wangxiongfeng2@huawei.com>
> ---
>   arch/arm64/Kconfig                   |  11 +++
>   arch/arm64/include/asm/ras.h         |  27 +++++++
>   arch/arm64/include/asm/thread_info.h |   4 +-
>   arch/arm64/kernel/Makefile           |   1 +
>   arch/arm64/kernel/ras.c              | 138 +++++++++++++++++++++++++++++++++++
>   arch/arm64/kernel/signal.c           |   8 ++
>   arch/arm64/mm/fault.c                |  27 +++++--
>   drivers/acpi/apei/ghes.c             |   2 +
>   8 files changed, 209 insertions(+), 9 deletions(-)
>   create mode 100644 arch/arm64/include/asm/ras.h
>   create mode 100644 arch/arm64/kernel/ras.c
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index dfd9086..7d44589 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -640,6 +640,17 @@ config HOTPLUG_CPU
>   	  Say Y here to experiment with turning CPUs off and on.  CPUs
>   	  can be controlled through /sys/devices/system/cpu.
>   
> +config ARM64_ERR_RECOV
> +	bool "Support arm64 RAS error recovery"
> +	depends on ACPI_APEI_SEA && MEMORY_FAILURE
> +	help
> +	  With ARM v8.2 RAS Extension, SEA are usually triggered when memory errors
> +	  are consumed. In some cases, if the error address is in a clean page or a
> +	  read-only page, there is a chance to recover. Such as error occurs in a
> +	  instruction page, we can reread this page from disk instead of killing process.
> +
> +	  Say Y if unsure.
> +
>   # Common NUMA Features
>   config NUMA
>   	bool "Numa Memory Allocation and Scheduler Support"
> diff --git a/arch/arm64/include/asm/ras.h b/arch/arm64/include/asm/ras.h
> new file mode 100644
> index 0000000..8c4f6a8
> --- /dev/null
> +++ b/arch/arm64/include/asm/ras.h
> @@ -0,0 +1,27 @@
> +/*
> + * ARM64 SEA error recoery support
> + *
> + * Copyright 2017 Huawei Technologies Co., Ltd.
> + *   Author: Xie XiuQi <xiexiuqi@huawei.com>
> + *   Author: Wang Xiongfeng <wangxiongfeng2@huawei.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License version
> + * 2 as published by the Free Software Foundation;
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + */
> +
> +#ifndef _ASM_RAS_H
> +#define _ASM_RAS_H
> +
> +#include <linux/cper.h>
> +#include <acpi/ghes.h>
> +
> +extern void sea_notify_process(void);
> +extern void arm_proc_error_check(struct ghes *ghes, struct cper_sec_proc_arm *err);
> +
> +#endif /*_ASM_RAS_H*/
> diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
> index 46c3b93..4b10131 100644
> --- a/arch/arm64/include/asm/thread_info.h
> +++ b/arch/arm64/include/asm/thread_info.h
> @@ -86,6 +86,7 @@ struct thread_info {
>   #define TIF_NOTIFY_RESUME	2	/* callback before returning to user */
>   #define TIF_FOREIGN_FPSTATE	3	/* CPU's FP state is not current's */
>   #define TIF_UPROBE		4	/* uprobe breakpoint or singlestep */
> +#define TIF_SEA_NOTIFY          5       /* notify to do an error recovery */
>   #define TIF_NOHZ		7
>   #define TIF_SYSCALL_TRACE	8
>   #define TIF_SYSCALL_AUDIT	9
> @@ -102,6 +103,7 @@ struct thread_info {
>   #define _TIF_NOTIFY_RESUME	(1 << TIF_NOTIFY_RESUME)
>   #define _TIF_FOREIGN_FPSTATE	(1 << TIF_FOREIGN_FPSTATE)
>   #define _TIF_NOHZ		(1 << TIF_NOHZ)
> +#define _TIF_SEA_NOTIFY         (1 << TIF_SEA_NOTIFY)
>   #define _TIF_SYSCALL_TRACE	(1 << TIF_SYSCALL_TRACE)
>   #define _TIF_SYSCALL_AUDIT	(1 << TIF_SYSCALL_AUDIT)
>   #define _TIF_SYSCALL_TRACEPOINT	(1 << TIF_SYSCALL_TRACEPOINT)
> @@ -111,7 +113,7 @@ struct thread_info {
>   
>   #define _TIF_WORK_MASK		(_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
>   				 _TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE | \
> -				 _TIF_UPROBE)
> +				 _TIF_UPROBE|_TIF_SEA_NOTIFY)
>   
>   #define _TIF_SYSCALL_WORK	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
>   				 _TIF_SYSCALL_TRACEPOINT | _TIF_SECCOMP | \
> diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
> index f2b4e81..ba3abf8 100644
> --- a/arch/arm64/kernel/Makefile
> +++ b/arch/arm64/kernel/Makefile
> @@ -43,6 +43,7 @@ arm64-obj-$(CONFIG_EFI)			+= efi.o efi-entry.stub.o
>   arm64-obj-$(CONFIG_PCI)			+= pci.o
>   arm64-obj-$(CONFIG_ARMV8_DEPRECATED)	+= armv8_deprecated.o
>   arm64-obj-$(CONFIG_ACPI)		+= acpi.o
> +arm64-obj-$(CONFIG_ARM64_ERR_RECOV)	+= ras.o
>   arm64-obj-$(CONFIG_ACPI_NUMA)		+= acpi_numa.o
>   arm64-obj-$(CONFIG_ARM64_ACPI_PARKING_PROTOCOL)	+= acpi_parking_protocol.o
>   arm64-obj-$(CONFIG_PARAVIRT)		+= paravirt.o
> diff --git a/arch/arm64/kernel/ras.c b/arch/arm64/kernel/ras.c
> new file mode 100644
> index 0000000..8562ec7
> --- /dev/null
> +++ b/arch/arm64/kernel/ras.c
> @@ -0,0 +1,138 @@
> +/*
> + * ARM64 SEA error recoery support
> + *
> + * Copyright 2017 Huawei Technologies Co., Ltd.
> + *   Author: Xie XiuQi <xiexiuqi@huawei.com>
> + *   Author: Wang Xiongfeng <wangxiongfeng2@huawei.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License version
> + * 2 as published by the Free Software Foundation;
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/cper.h>
> +#include <linux/mm.h>
> +#include <linux/preempt.h>
> +#include <linux/acpi.h>
> +#include <linux/sched/signal.h>
> +
> +#include <acpi/actbl1.h>
> +#include <acpi/ghes.h>
> +#include <acpi/apei.h>
> +
> +#include <asm/thread_info.h>
> +#include <asm/atomic.h>
> +#include <asm/ras.h>
> +
> +/*
> + * Need to save faulting physical address associated with a process
> + * in the sea ghes handler some place where we can grab it back
> + * later in sea_notify_process()
> + */
> +#define SEA_INFO_MAX    16
> +
> +struct sea_info {
> +        atomic_t                inuse;
> +        struct task_struct      *t;
> +        __u64                   paddr;
> +} sea_info[SEA_INFO_MAX];
> +
> +static int sea_save_info(__u64 addr)
> +{
> +        struct sea_info *si;
> +
> +        for (si = sea_info; si < &sea_info[SEA_INFO_MAX]; si++) {
> +                if (atomic_cmpxchg(&si->inuse, 0, 1) == 0) {
> +                        si->t = current;
> +                        si->paddr = addr;
> +                        return 0;
> +                }
> +        }
> +
> +	pr_err("Too many concurrent recoverable errors\n");
> +	return -ENOMEM;
> +}
> +
> +static struct sea_info *sea_find_info(void)
> +{
> +        struct sea_info *si;
> +
> +        for (si = sea_info; si < &sea_info[SEA_INFO_MAX]; si++)
> +                if (atomic_read(&si->inuse) && si->t == current)
> +                        return si;
> +        return NULL;
> +}
> +
> +static void sea_clear_info(struct sea_info *si)
> +{
> +        atomic_set(&si->inuse, 0);
> +}
> +
> +/*
> + * Called in process context that interrupted by SEA and marked with
> + * TIF_SEA_NOTIFY, just before returning to erroneous userland.
> + * This code is allowed to sleep.
> + * Attempt possible recovery such as calling the high level VM handler to
> + * process any corrupted pages, and kill/signal current process if required.
> + * Action required errors are handled here.
> + */
> +void sea_notify_process(void)
> +{
> +	unsigned long pfn;
> +	int fail = 0, flags = MF_ACTION_REQUIRED;
> +	struct sea_info *si = sea_find_info();
> +
> +	if (!si)
> +		panic("Lost physical address for consumed uncorrectable error");
> +
> +	clear_thread_flag(TIF_SEA_NOTIFY);
> +	do {
> +		pfn = si->paddr >> PAGE_SHIFT;
> +
> +
> +		pr_err("Uncorrected hardware memory error in user-access at %llx\n",
> +			si->paddr);
> +		/*
> +		 * We must call memory_failure() here even if the current process is
> +		 * doomed. We still need to mark the page as poisoned and alert any
> +		 * other users of the page.
> +		 */
> +		if (memory_failure(pfn, 0, flags) < 0) {
> +			fail++;
> +		}
> +		sea_clear_info(si);
> +
> +		si = sea_find_info();
> +	} while (si);
> +
> +	if (fail) {
> +		pr_err("Memory error not recovered\n");
> +		force_sig(SIGBUS, current);
> +	}
> +}
> +
> +void arm_proc_error_check(struct ghes *ghes, struct cper_sec_proc_arm *err)
> +{
> +	int i, ret = -1;
> +	struct cper_arm_err_info *err_info;
> +
> +	if ((ghes->generic->notify.type != ACPI_HEST_NOTIFY_SEA) ||
> +	    (ghes->estatus->error_severity != CPER_SEV_RECOVERABLE))
> +		return;
> +
> +	err_info = (struct cper_arm_err_info *)(err + 1);
> +	for (i = 0; i < err->err_info_num; i++, err_info++) {
> +		if (err_info->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR) {
> +			ret |= sea_save_info(err_info->physical_fault_addr);
> +		}
> +	}
> +
> +	if (!ret)

If ret is initialized to -1, this is never true since you only OR bits 
in ret.

Should the body of the loop be:
	ret &= sea_save_info(err_info->physical_fault_addr);

so that, as long as you manage to store one sea_info, you set the thread flag?

But if that's the case a boolean might make more sense:

bool info_saved = false;
[...]
	info_saved |= !sea_save_info(err_info->physical_fault_addr);
[...]
if (info_saved)
		[...]
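
To make the difference concrete, here is a small user-space sketch of the
three accumulation styles. The results[] array is a stand-in for the
successive return values of sea_save_info() (0 on success, a negative
errno on failure); the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* As in the posted patch: starts at -1 and only ORs bits in. */
int accumulate_or(const int *results, size_t n)
{
	int ret = -1;

	for (size_t i = 0; i < n; i++)
		ret |= results[i];
	return ret;	/* stays -1: OR never clears a set bit */
}

/* The &= variant: becomes 0 once any single save returned 0. */
int accumulate_and(const int *results, size_t n)
{
	int ret = -1;

	for (size_t i = 0; i < n; i++)
		ret &= results[i];
	return ret;
}

/* The boolean variant: true if at least one save succeeded. */
bool any_saved(const int *results, size_t n)
{
	bool info_saved = false;

	for (size_t i = 0; i < n; i++)
		info_saved |= !results[i];
	return info_saved;
}
```

The boolean form states the intent directly and does not rely on the bit
patterns of negative error codes.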


> +		set_thread_flag(TIF_SEA_NOTIFY);
> +}
> diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
> index 089c3747..71e314e 100644
> --- a/arch/arm64/kernel/signal.c
> +++ b/arch/arm64/kernel/signal.c
> @@ -38,6 +38,7 @@
>   #include <asm/fpsimd.h>
>   #include <asm/signal32.h>
>   #include <asm/vdso.h>
> +#include <asm/ras.h>
>   
>   /*
>    * Do a signal return; undo the signal stack. These are aligned to 128-bit.
> @@ -749,6 +750,13 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
>   	 * Update the trace code with the current status.
>   	 */
>   	trace_hardirqs_off();
> +
> +#ifdef CONFIG_ARM64_ERR_RECOV
> +		/* notify userspace of pending SEAs */
> +		if (thread_flags & _TIF_SEA_NOTIFY)
> +			sea_notify_process();
> +#endif /* CONFIG_ARM64_ERR_RECOV */
> +
>   	do {
>   		if (thread_flags & _TIF_NEED_RESCHED) {
>   			schedule();
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 1f22a41..b38476d 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -594,14 +594,25 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
>   			nmi_exit();
>   	}
>   
> -	info.si_signo = SIGBUS;
> -	info.si_errno = 0;
> -	info.si_code  = 0;
> -	if (esr & ESR_ELx_FnV)
> -		info.si_addr = NULL;
> -	else
> -		info.si_addr  = (void __user *)addr;
> -	arm64_notify_die("", regs, &info, esr);
> +	if (user_mode(regs)) {
> +		if (test_thread_flag(TIF_SEA_NOTIFY))
> +			return ret;
> +
> +		info.si_signo = SIGBUS;
> +		info.si_errno = 0;
> +		info.si_code  = 0;
> +		if (esr & ESR_ELx_FnV)
> +			info.si_addr = NULL;
> +		else
> +			info.si_addr  = (void __user *)addr;
> +
> +		current->thread.fault_address = 0;
> +		current->thread.fault_code = esr;
> +		force_sig_info(info.si_signo, &info, current);
> +	} else {
> +		die("Uncorrected hardware memory error in kernel-access\n",
> +		    regs, esr);
> +	}
>   
>   	return ret;
>   }
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index d661d45..fa9400d 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -52,6 +52,7 @@
>   #include <acpi/ghes.h>
>   #include <acpi/apei.h>
>   #include <asm/tlbflush.h>
> +#include <asm/ras.h>
>   #include <ras/ras_event.h>
>   
>   #include "apei-internal.h"
> @@ -520,6 +521,7 @@ static void ghes_do_proc(struct ghes *ghes,
>   		else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
>   			struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
>   
> +			arm_proc_error_check(ghes, err);

If I understand the Makefile change correctly, arm_proc_error_check() is
compiled only when CONFIG_ARM64_ERR_RECOV is set. Don't you get a linker
error here if this config is not selected?
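
One common way to avoid that is an empty inline stub in the header for
the disabled case. A minimal sketch, assuming the header is changed along
these lines (the handle_arm_section() caller is hypothetical and only
models the call site in ghes_do_proc()):

```c
#include <stddef.h>

struct ghes;               /* opaque in this sketch */
struct cper_sec_proc_arm;  /* opaque in this sketch */

/* Define CONFIG_ARM64_ERR_RECOV to select the real, out-of-line version. */
#ifdef CONFIG_ARM64_ERR_RECOV
void arm_proc_error_check(struct ghes *ghes, struct cper_sec_proc_arm *err);
#else
/* Empty stub: callers still compile and link when ras.o is not built. */
static inline void arm_proc_error_check(struct ghes *ghes,
					struct cper_sec_proc_arm *err)
{
	(void)ghes;
	(void)err;
}
#endif

/* Hypothetical caller: no #ifdef is needed at the call site. */
int handle_arm_section(struct ghes *ghes, struct cper_sec_proc_arm *err)
{
	arm_proc_error_check(ghes, err);
	return 0;
}
```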

Otherwise patch looks fine.

Cheers,

-- 
Julien Thierry

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v1 1/3] arm64/ras: support sea error recovery
  2017-09-01 15:51     ` Julien Thierry
  (?)
@ 2017-09-04  2:58       ` Xie XiuQi
  -1 siblings, 0 replies; 24+ messages in thread
From: Xie XiuQi @ 2017-09-04  2:58 UTC (permalink / raw)
  To: Julien Thierry, catalin.marinas, will.deacon, mingo, x86,
	mark.rutland, ard.biesheuvel, james.morse, takahiro.akashi,
	tbaicar, bp, shiju.jose, zjzhang
  Cc: linux-arm-kernel, linux-kernel, linux-acpi, wangxiongfeng2,
	zhengqiang10, gengdongjiu

Hi Julien,

On 2017/9/1 23:51, Julien Thierry wrote:
> Hi Xie,
> 
> On 01/09/17 11:31, Xie XiuQi wrote:
>> With ARM v8.2 RAS Extension, SEA are usually triggered when memory errors
>> are consumed. In some cases, if the error address is in a clean page or a
>> read-only page, there is a chance to recover. Such as error occurs in a
>> instruction page, we can reread this page from disk instead of killing process.
>>
>> Because memory_failure() may sleep, we can not call it directly in SEA exception
>> context. So we saved faulting physical address associated with a process in the
>> ghes handler and set __TIF_SEA_NOTIFY. When we return from SEA exception context
>> and get into do_notify_resume() before the process running, we could check it
>> and call memory_failure() to do recovery. It's safe, because we are in process
>> context.
>>
>> Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
>> Signed-off-by: Wang Xiongfeng <wangxiongfeng2@huawei.com>

...

>> +
>> +void arm_proc_error_check(struct ghes *ghes, struct cper_sec_proc_arm *err)
>> +{
>> +    int i, ret = -1;
>> +    struct cper_arm_err_info *err_info;
>> +
>> +    if ((ghes->generic->notify.type != ACPI_HEST_NOTIFY_SEA) ||
>> +        (ghes->estatus->error_severity != CPER_SEV_RECOVERABLE))
>> +        return;
>> +
>> +    err_info = (struct cper_arm_err_info *)(err + 1);
>> +    for (i = 0; i < err->err_info_num; i++, err_info++) {
>> +        if (err_info->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR) {
>> +            ret |= sea_save_info(err_info->physical_fault_addr);
>> +        }
>> +    }
>> +
>> +    if (!ret)
> 
> If ret is initialized to -1, this is never true since you only OR bits in ret.
> 
> Should the body of the loop be:
>     ret &= sea_save_info(err_info->physical_fault_addr);
> 
> so as long as you manage to store 1 sea_info you set the thread flag?
> 
> But if that's the case a boolean might make more sense:
> 
> bool info_saved = false;
> [...]
>     info_saved |= !sea_save_info(err_info->physical_fault_addr);
> [...]
> if (info_saved)
>         [...]
> 

You are right, I'll fix this issue, thanks.
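
For reference, a minimal userspace sketch of the problem and of the boolean
variant suggested above. sea_save_info() is replaced by a stub that is assumed
to return 0 on success and non-zero on failure; the function names below are
hypothetical and only model the control flow, they are not the patch itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stub standing in for sea_save_info(): 0 on success, -1 on failure (assumed). */
static int sea_save_info_stub(unsigned long addr)
{
	return addr ? 0 : -1;
}

/* Original logic: ret starts at -1 and only has bits OR-ed in, so !ret is never true. */
static bool should_notify_buggy(const unsigned long *addrs, size_t n)
{
	int ret = -1;
	size_t i;

	for (i = 0; i < n; i++)
		ret |= sea_save_info_stub(addrs[i]);

	return !ret;
}

/* Suggested logic: remember whether at least one address was saved. */
static bool should_notify_fixed(const unsigned long *addrs, size_t n)
{
	bool info_saved = false;
	size_t i;

	for (i = 0; i < n; i++)
		info_saved |= !sea_save_info_stub(addrs[i]);

	return info_saved;
}
```

With one good and one bad address, the buggy variant never asks for the thread
flag to be set, while the fixed variant does as soon as one save succeeds.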

> 
>> +        set_thread_flag(TIF_SEA_NOTIFY);
>> +}
>> diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
>> index 089c3747..71e314e 100644
>> --- a/arch/arm64/kernel/signal.c
>> +++ b/arch/arm64/kernel/signal.c
>> @@ -38,6 +38,7 @@
>>   #include <asm/fpsimd.h>
>>   #include <asm/signal32.h>
>>   #include <asm/vdso.h>
>> +#include <asm/ras.h>
>>
>>   /*
>>    * Do a signal return; undo the signal stack. These are aligned to 128-bit.
>> @@ -749,6 +750,13 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
>>        * Update the trace code with the current status.
>>        */
>>       trace_hardirqs_off();
>> +
>> +#ifdef CONFIG_ARM64_ERR_RECOV
>> +        /* notify userspace of pending SEAs */
>> +        if (thread_flags & _TIF_SEA_NOTIFY)
>> +            sea_notify_process();
>> +#endif /* CONFIG_ARM64_ERR_RECOV */
>> +
>>       do {
>>           if (thread_flags & _TIF_NEED_RESCHED) {
>>               schedule();
>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>> index 1f22a41..b38476d 100644
>> --- a/arch/arm64/mm/fault.c
>> +++ b/arch/arm64/mm/fault.c
>> @@ -594,14 +594,25 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
>>               nmi_exit();
>>       }
>>
>> -    info.si_signo = SIGBUS;
>> -    info.si_errno = 0;
>> -    info.si_code  = 0;
>> -    if (esr & ESR_ELx_FnV)
>> -        info.si_addr = NULL;
>> -    else
>> -        info.si_addr  = (void __user *)addr;
>> -    arm64_notify_die("", regs, &info, esr);
>> +    if (user_mode(regs)) {
>> +        if (test_thread_flag(TIF_SEA_NOTIFY))
>> +            return ret;
>> +
>> +        info.si_signo = SIGBUS;
>> +        info.si_errno = 0;
>> +        info.si_code  = 0;
>> +        if (esr & ESR_ELx_FnV)
>> +            info.si_addr = NULL;
>> +        else
>> +            info.si_addr  = (void __user *)addr;
>> +
>> +        current->thread.fault_address = 0;
>> +        current->thread.fault_code = esr;
>> +        force_sig_info(info.si_signo, &info, current);
>> +    } else {
>> +        die("Uncorrected hardware memory error in kernel-access\n",
>> +            regs, esr);
>> +    }
>>
>>       return ret;
>>   }
>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>> index d661d45..fa9400d 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>> @@ -52,6 +52,7 @@
>>   #include <acpi/ghes.h>
>>   #include <acpi/apei.h>
>>   #include <asm/tlbflush.h>
>> +#include <asm/ras.h>
>>   #include <ras/ras_event.h>
>>
>>   #include "apei-internal.h"
>> @@ -520,6 +521,7 @@ static void ghes_do_proc(struct ghes *ghes,
>>           else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
>>               struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
>>
>> +            arm_proc_error_check(ghes, err);
> 
> If I understand the Makefile change correctly, arm_proc_error_check is compiled only when CONFIG_ARM64_ERR_RECOV, don't you get a linker error here if this config is not selected?
> 

Yes, it's a problem if CONFIG_ARM64_ERR_RECOV is not selected.
I'll fix it in the next version.
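
One common way to avoid this kind of link failure is to give the header a
static inline no-op when the option is off. The sketch below models that in
plain C. CONFIG_ARM64_ERR_RECOV and arm_proc_error_check() come from the patch;
the opaque struct declarations, the counter and the caller are assumptions made
for illustration, not the actual change planned for the next version.

```c
#include <stddef.h>

/* Forward declarations stand in for the real kernel types. */
struct ghes;
struct cper_sec_proc_arm;

static int proc_error_checks; /* hypothetical counter, for illustration only */

#ifdef CONFIG_ARM64_ERR_RECOV
/* Stand-in for the real implementation, which would live in ras.c. */
static inline void arm_proc_error_check(struct ghes *ghes,
					struct cper_sec_proc_arm *err)
{
	(void)ghes;
	(void)err;
	proc_error_checks++;
}
#else
/* Option disabled: callers still compile and link, the call is a no-op. */
static inline void arm_proc_error_check(struct ghes *ghes,
					struct cper_sec_proc_arm *err)
{
	(void)ghes;
	(void)err;
}
#endif

/* Caller modelled on ghes_do_proc(): no #ifdef needed at the call site. */
static int handle_proc_arm_section(struct ghes *ghes,
				   struct cper_sec_proc_arm *err)
{
	arm_proc_error_check(ghes, err);
	return proc_error_checks;
}
```

Built without -DCONFIG_ARM64_ERR_RECOV, the caller compiles and the check does
nothing, which is the behaviour wanted when the option is not selected.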

> Otherwise patch looks fine.
> 

Thanks for your comments.

> Cheers,
> 

-- 
Thanks,
Xie XiuQi

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v1 2/3] apei: add ghes param for arch_apei_report_mem_error
  2017-09-01 11:15     ` Borislav Petkov
  (?)
@ 2017-09-05  2:20       ` Xie XiuQi
  -1 siblings, 0 replies; 24+ messages in thread
From: Xie XiuQi @ 2017-09-05  2:20 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: catalin.marinas, will.deacon, mingo, x86, mark.rutland,
	ard.biesheuvel, james.morse, takahiro.akashi, tbaicar,
	shiju.jose, zjzhang, linux-arm-kernel, linux-kernel, linux-acpi,
	wangxiongfeng2, zhengqiang10, gengdongjiu

Hi Borislav,

On 2017/9/1 19:15, Borislav Petkov wrote:
> On Fri, Sep 01, 2017 at 06:32:00PM +0800, Xie XiuQi wrote:
>> Add ghes param for arch_apei_report_mem_error, with which
>> we could do more arch-specific processing.
>>
>> Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
>> ---
>>  arch/x86/kernel/acpi/apei.c   | 2 +-
>>  drivers/acpi/apei/apei-base.c | 4 +++-
>>  drivers/acpi/apei/ghes.c      | 2 +-
>>  include/acpi/apei.h           | 4 +++-
>>  include/acpi/ghes.h           | 3 ++-
>>  5 files changed, 10 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/x86/kernel/acpi/apei.c b/arch/x86/kernel/acpi/apei.c
>> index ea3046e..1bf1c9b 100644
>> --- a/arch/x86/kernel/acpi/apei.c
>> +++ b/arch/x86/kernel/acpi/apei.c
>> @@ -46,7 +46,7 @@ int arch_apei_enable_cmcff(struct acpi_hest_header *hest_hdr, void *data)
>>  	return 1;
>>  }
>>  
>> -void arch_apei_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
>> +void arch_apei_report_mem_error(struct ghes *ghes, int sev, struct cper_sec_mem_err *mem_err)
>>  {
>>  #ifdef CONFIG_X86_MCE
>>  	apei_mce_report_mem_error(sev, mem_err);
>> diff --git a/drivers/acpi/apei/apei-base.c b/drivers/acpi/apei/apei-base.c
>> index da370e1..317169b 100644
>> --- a/drivers/acpi/apei/apei-base.c
>> +++ b/drivers/acpi/apei/apei-base.c
>> @@ -38,6 +38,8 @@
>>  #include <linux/debugfs.h>
>>  #include <asm/unaligned.h>
>>  
>> +#include <acpi/ghes.h>
>> +
>>  #include "apei-internal.h"
>>  
>>  #define APEI_PFX "APEI: "
>> @@ -770,7 +772,7 @@ int __weak arch_apei_enable_cmcff(struct acpi_hest_header *hest_hdr,
>>  }
>>  EXPORT_SYMBOL_GPL(arch_apei_enable_cmcff);
>>  
>> -void __weak arch_apei_report_mem_error(int sev,
>> +void __weak arch_apei_report_mem_error(struct ghes *ghes, int sev,
>>  				       struct cper_sec_mem_err *mem_err)
>>  {
>>  }
>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>> index fa9400d..996d16c4 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>> @@ -483,7 +483,7 @@ static void ghes_do_proc(struct ghes *ghes,
>>  
>>  			ghes_edac_report_mem_error(ghes, sev, mem_err);
>>  
>> -			arch_apei_report_mem_error(sev, mem_err);
>> +			arch_apei_report_mem_error(ghes, sev, mem_err);
> 
> And next time you want to pass something else, you'll have to touch all
> those files again...
> 
> Instead, make that a notifier to which consumers register and define
> a separate struct mem_err or ghes_err or whatnot and populate it with
> cper_sec_mem_err data and whatever else is needed by the consumers.
> Instead of passing that struct ghes * which consumers don't need to
> know.

OK, I'll add a notifier chain here, thanks.
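
A rough userspace model of the notifier approach, to show the shape of it. The
kernel already provides notifier chains in <linux/notifier.h>; the types and
functions below (struct mem_err_event, mem_err_notifier_register(),
mem_err_notify() and the callback) are hypothetical stand-ins written for this
sketch. They are not the kernel API and not necessarily what the next version
will look like.

```c
#include <stddef.h>

/* Hypothetical event: only what consumers need, no struct ghes. */
struct mem_err_event {
	int sev;
	unsigned long long physical_addr;
};

struct mem_err_notifier {
	int (*call)(struct mem_err_notifier *nb, const struct mem_err_event *ev);
	struct mem_err_notifier *next;
};

static struct mem_err_notifier *mem_err_chain;

/* Append a consumer to the end of the chain. */
static void mem_err_notifier_register(struct mem_err_notifier *nb)
{
	struct mem_err_notifier **p = &mem_err_chain;

	while (*p)
		p = &(*p)->next;
	nb->next = NULL;
	*p = nb;
}

/* Call every registered consumer; returns how many were called. */
static int mem_err_notify(const struct mem_err_event *ev)
{
	struct mem_err_notifier *nb;
	int called = 0;

	for (nb = mem_err_chain; nb; nb = nb->next) {
		nb->call(nb, ev);
		called++;
	}
	return called;
}

/* Example consumer: remembers the last reported address. */
static unsigned long long last_reported_addr;

static int save_addr_cb(struct mem_err_notifier *nb,
			const struct mem_err_event *ev)
{
	(void)nb;
	last_reported_addr = ev->physical_addr;
	return 0;
}
```

The point of the design is that adding a field to the event structure later
does not change any function signature, so the producer and the existing
consumers do not all have to be touched again.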

> 
> Thx.
> 

-- 
Thanks,
Xie XiuQi

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2017-09-05  2:24 UTC | newest]

Thread overview: 24+ messages
2017-09-01 10:31 [RFC PATCH v1 0/3] arm64/ras: support sea error recovery Xie XiuQi
2017-09-01 10:31 ` Xie XiuQi
2017-09-01 10:31 ` Xie XiuQi
2017-09-01 10:31 ` [RFC PATCH v1 1/3] " Xie XiuQi
2017-09-01 10:31   ` Xie XiuQi
2017-09-01 10:31   ` Xie XiuQi
2017-09-01 11:16   ` Borislav Petkov
2017-09-01 11:16     ` Borislav Petkov
2017-09-01 15:51   ` Julien Thierry
2017-09-01 15:51     ` Julien Thierry
2017-09-04  2:58     ` Xie XiuQi
2017-09-04  2:58       ` Xie XiuQi
2017-09-04  2:58       ` Xie XiuQi
2017-09-01 10:32 ` [RFC PATCH v1 2/3] apei: add ghes param for arch_apei_report_mem_error Xie XiuQi
2017-09-01 10:32   ` Xie XiuQi
2017-09-01 10:32   ` Xie XiuQi
2017-09-01 11:15   ` Borislav Petkov
2017-09-01 11:15     ` Borislav Petkov
2017-09-05  2:20     ` Xie XiuQi
2017-09-05  2:20       ` Xie XiuQi
2017-09-05  2:20       ` Xie XiuQi
2017-09-01 10:32 ` [RFC PATCH v1 3/3] arm64/apei: get error address from memory section for recovery Xie XiuQi
2017-09-01 10:32   ` Xie XiuQi
2017-09-01 10:32   ` Xie XiuQi
