* [RFC PATCH -next V2 0/7] arm64: add machine check safe support
@ 2022-04-06  9:13 Tong Tiangen
  2022-04-06  9:13 ` [RFC PATCH -next V2 1/7] x86: fix copy_mc_to_user compile error Tong Tiangen
                   ` (7 more replies)
  0 siblings, 8 replies; 28+ messages in thread
From: Tong Tiangen @ 2022-04-06  9:13 UTC (permalink / raw)
  To: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Catalin Marinas, Will Deacon, Alexander Viro, x86,
	H. Peter Anvin
  Cc: linux-arm-kernel, linux-kernel, linux-mm, Tong Tiangen

This patchset is based on [1].

As memory capacity and density grow, so does the probability of memory
errors. The increasing size and density of server RAM in data centers
and the cloud have led to a higher rate of uncorrectable memory errors.

Currently, the kernel has a mechanism to recover from hardware memory
errors. This patchset provides a new recovery mechanism.

For ARM64, hardware memory errors are handled by do_sea(), which divides
them into two cases:
1. The error is consumed in user mode: the solution is to kill the
   user process and isolate the error page.
2. The error is consumed in kernel mode: the solution is to panic.

For kernelspace, an undifferentiated panic may not be the optimal choice;
some cases can be handled better.

This patchset deals with four scenarios in which a hardware memory error
is consumed in kernelspace:
1. copy_from_user.
2. get_user.
3. cow (copy on write).
4. pagecache reading.

These four scenarios share a common trait: although the error is consumed
in kernel mode, the consumed data belongs to user mode.

The processing scheme is based on CONFIG_ARCH_HAS_COPY_MC and replaces
the kernel panic with killing the affected process and isolating the
error page.
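
As a caller-side illustration of the intended pattern (a minimal sketch;
the exact call sites are as introduced by the individual patches, and
ubuf/kbuf/len are hypothetical):

	/*
	 * Copying kernel-resident user data out to userspace with the
	 * MC-safe helper: if a hardware memory error is consumed during
	 * the copy, the fixup makes the call return the number of bytes
	 * not copied instead of panicking; the SEA handler then sends
	 * SIGBUS to the task and the error page gets isolated.
	 */
	unsigned long left = copy_mc_to_user(ubuf, kbuf, len);
	if (left)
		return -EFAULT;	/* fail the operation; no kernel panic */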

[1] https://lore.kernel.org/lkml/20220323033705.3966643-1-tongtiangen@huawei.com/

Since V2:
 1. Consistent with PPC/x86, use CONFIG_ARCH_HAS_COPY_MC instead of
    ARM64_UCE_KERNEL_RECOVERY.
 2. Add two new scenarios: cow and pagecache reading.
 3. Fix two small bugs (the first two patches).

Tong Tiangen (7):
  x86: fix copy_mc_to_user compile error
  arm64: fix page_address return value in copy_highpage
  arm64: add support for machine check error safe
  arm64: add copy_from_user to machine check safe
  arm64: add get_user to machine check safe
  arm64: add cow to machine check safe
  arm64: add pagecache reading to machine check safe

 arch/arm64/Kconfig                   |  1 +
 arch/arm64/include/asm/asm-extable.h | 25 +++++++
 arch/arm64/include/asm/asm-uaccess.h | 16 +++++
 arch/arm64/include/asm/esr.h         |  5 ++
 arch/arm64/include/asm/extable.h     |  2 +-
 arch/arm64/include/asm/page.h        | 10 +++
 arch/arm64/include/asm/uaccess.h     | 17 ++++-
 arch/arm64/kernel/probes/kprobes.c   |  2 +-
 arch/arm64/lib/Makefile              |  2 +
 arch/arm64/lib/copy_from_user.S      | 11 ++--
 arch/arm64/lib/copy_page_mc.S        | 98 ++++++++++++++++++++++++++++
 arch/arm64/lib/copy_to_user_mc.S     | 78 ++++++++++++++++++++++
 arch/arm64/mm/copypage.c             | 36 ++++++++--
 arch/arm64/mm/extable.c              | 21 +++++-
 arch/arm64/mm/fault.c                | 30 ++++++++-
 arch/x86/include/asm/uaccess.h       |  1 +
 include/linux/highmem.h              |  8 +++
 include/linux/uaccess.h              |  8 +++
 include/linux/uio.h                  |  9 ++-
 lib/iov_iter.c                       | 85 +++++++++++++++++++-----
 mm/memory.c                          |  2 +-
 21 files changed, 432 insertions(+), 35 deletions(-)
 create mode 100644 arch/arm64/lib/copy_page_mc.S
 create mode 100644 arch/arm64/lib/copy_to_user_mc.S

-- 
2.18.0.huawei.25


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC PATCH -next V2 1/7] x86: fix copy_mc_to_user compile error
  2022-04-06  9:13 [RFC PATCH -next V2 0/7] arm64: add machine check safe support Tong Tiangen
@ 2022-04-06  9:13 ` Tong Tiangen
  2022-04-06  9:22   ` Borislav Petkov
  2022-04-06  9:13 ` [RFC PATCH -next V2 2/7] arm64: fix page_address return value in copy_highpage Tong Tiangen
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: Tong Tiangen @ 2022-04-06  9:13 UTC (permalink / raw)
  To: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Catalin Marinas, Will Deacon, Alexander Viro, x86,
	H. Peter Anvin
  Cc: linux-arm-kernel, linux-kernel, linux-mm, Tong Tiangen

The follow patch will add copy_mc_to_user to include/linux/uaccess.h, X86
must declare Implemented to avoid compile error.

Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
---
 arch/x86/include/asm/uaccess.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index f78e2b3501a1..e18c5f098025 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -415,6 +415,7 @@ copy_mc_to_kernel(void *to, const void *from, unsigned len);
 
 unsigned long __must_check
 copy_mc_to_user(void *to, const void *from, unsigned len);
+#define copy_mc_to_user copy_mc_to_user
 #endif
 
 /*
-- 
2.18.0.huawei.25


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH -next V2 2/7] arm64: fix page_address return value in copy_highpage
  2022-04-06  9:13 [RFC PATCH -next V2 0/7] arm64: add machine check safe support Tong Tiangen
  2022-04-06  9:13 ` [RFC PATCH -next V2 1/7] x86: fix copy_mc_to_user compile error Tong Tiangen
@ 2022-04-06  9:13 ` Tong Tiangen
  2022-04-06 10:22   ` Mark Rutland
  2022-04-06  9:13 ` [RFC PATCH -next V2 3/7] arm64: add support for machine check error safe Tong Tiangen
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: Tong Tiangen @ 2022-04-06  9:13 UTC (permalink / raw)
  To: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Catalin Marinas, Will Deacon, Alexander Viro, x86,
	H. Peter Anvin
  Cc: linux-arm-kernel, linux-kernel, linux-mm, Tong Tiangen

Function page_address return void, fix it.

Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
---
 arch/arm64/mm/copypage.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/copypage.c b/arch/arm64/mm/copypage.c
index b5447e53cd73..0dea80bf6de4 100644
--- a/arch/arm64/mm/copypage.c
+++ b/arch/arm64/mm/copypage.c
@@ -16,8 +16,8 @@
 
 void copy_highpage(struct page *to, struct page *from)
 {
-	struct page *kto = page_address(to);
-	struct page *kfrom = page_address(from);
+	void *kto = page_address(to);
+	void *kfrom = page_address(from);
 
 	copy_page(kto, kfrom);
 
-- 
2.18.0.huawei.25


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH -next V2 3/7] arm64: add support for machine check error safe
  2022-04-06  9:13 [RFC PATCH -next V2 0/7] arm64: add machine check safe support Tong Tiangen
  2022-04-06  9:13 ` [RFC PATCH -next V2 1/7] x86: fix copy_mc_to_user compile error Tong Tiangen
  2022-04-06  9:13 ` [RFC PATCH -next V2 2/7] arm64: fix page_address return value in copy_highpage Tong Tiangen
@ 2022-04-06  9:13 ` Tong Tiangen
  2022-04-06 10:58   ` Mark Rutland
  2022-04-06  9:13 ` [RFC PATCH -next V2 4/7] arm64: add copy_from_user to machine check safe Tong Tiangen
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: Tong Tiangen @ 2022-04-06  9:13 UTC (permalink / raw)
  To: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Catalin Marinas, Will Deacon, Alexander Viro, x86,
	H. Peter Anvin
  Cc: linux-arm-kernel, linux-kernel, linux-mm, Tong Tiangen

In the arm64 kernel hardware memory error handling path (do_sea()), if
the error is consumed in the kernel, the current processing is to panic.
However, this is not optimal: in some cases the page accessed in the
kernel is a user page (such as via copy_from_user/get_user), and killing
the user process and isolating the user page with hardware memory errors
is a better choice.

Consistent with PPC/x86, this is implemented via CONFIG_ARCH_HAS_COPY_MC.

This patch only enables the machine check safe framework: it adds an
exception fixup before the kernel panic in do_sea(), limited to hardware
memory errors consumed in kernel mode while operating on behalf of a
user mode process. If the fixup succeeds, there is no need to panic.

Also add the _asm_extable_mc macro, used to add the extable entries that
drive this fixup.
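
For orientation, the shape of the mechanism is sketched below; this is a
condensed restatement of the hunks that follow, not new code:

	/*
	 * 1. An access that may consume a memory error is annotated with
	 *    an EX_TYPE_FIXUP_MC extable entry (via _asm_extable_mc).
	 * 2. On an SEA, do_sea() calls arm64_process_kernel_sea(), which
	 *    claims the error through APEI and then tries
	 *    fixup_exception(regs, esr).
	 * 3. fixup_exception() only honours MC-marked entries for an SEA:
	 */
	if (esr_is_sea(esr) && !IS_EX_TYPE_MC(ex->type))
		return false;	/* plain fixups must not swallow the error */
	/*
	 * 4. ex_handler_fixup_mc() sets x0 (0 for an SEA, 1 otherwise) and
	 *    branches to the fixup; the task then receives SIGBUS instead
	 *    of the system panicking.
	 */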

Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
---
 arch/arm64/Kconfig                   |  1 +
 arch/arm64/include/asm/asm-extable.h | 13 ++++++++++++
 arch/arm64/include/asm/esr.h         |  5 +++++
 arch/arm64/include/asm/extable.h     |  2 +-
 arch/arm64/kernel/probes/kprobes.c   |  2 +-
 arch/arm64/mm/extable.c              | 20 ++++++++++++++++++-
 arch/arm64/mm/fault.c                | 30 +++++++++++++++++++++++++++-
 include/linux/uaccess.h              |  8 ++++++++
 8 files changed, 77 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index d9325dd95eba..012e38309955 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -19,6 +19,7 @@ config ARM64
 	select ARCH_ENABLE_SPLIT_PMD_PTLOCK if PGTABLE_LEVELS > 2
 	select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
 	select ARCH_HAS_CACHE_LINE_SIZE
+	select ARCH_HAS_COPY_MC if ACPI_APEI_GHES
 	select ARCH_HAS_CURRENT_STACK_POINTER
 	select ARCH_HAS_DEBUG_VIRTUAL
 	select ARCH_HAS_DEBUG_VM_PGTABLE
diff --git a/arch/arm64/include/asm/asm-extable.h b/arch/arm64/include/asm/asm-extable.h
index c39f2437e08e..74d1db74fd86 100644
--- a/arch/arm64/include/asm/asm-extable.h
+++ b/arch/arm64/include/asm/asm-extable.h
@@ -8,6 +8,11 @@
 #define EX_TYPE_UACCESS_ERR_ZERO	3
 #define EX_TYPE_LOAD_UNALIGNED_ZEROPAD	4
 
+/* _MC indicates that we can fix up from machine check errors */
+#define EX_TYPE_FIXUP_MC		5
+
+#define IS_EX_TYPE_MC(type) (type == EX_TYPE_FIXUP_MC)
+
 #ifdef __ASSEMBLY__
 
 #define __ASM_EXTABLE_RAW(insn, fixup, type, data)	\
@@ -27,6 +32,14 @@
 	__ASM_EXTABLE_RAW(\insn, \fixup, EX_TYPE_FIXUP, 0)
 	.endm
 
+/*
+ * Create an exception table entry for `insn`, which will branch to `fixup`
+ * when an unhandled fault (including an SEA) is taken.
+ */
+	.macro		_asm_extable_mc, insn, fixup
+	__ASM_EXTABLE_RAW(\insn, \fixup, EX_TYPE_FIXUP_MC, 0)
+	.endm
+
 /*
  * Create an exception table entry for `insn` if `fixup` is provided. Otherwise
  * do nothing.
diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index d52a0b269ee8..11fcfc002654 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -330,6 +330,11 @@
 #ifndef __ASSEMBLY__
 #include <asm/types.h>
 
+static inline bool esr_is_sea(u32 esr)
+{
+	return (esr & ESR_ELx_FSC) == ESR_ELx_FSC_EXTABT;
+}
+
 static inline bool esr_is_data_abort(u32 esr)
 {
 	const u32 ec = ESR_ELx_EC(esr);
diff --git a/arch/arm64/include/asm/extable.h b/arch/arm64/include/asm/extable.h
index 72b0e71cc3de..f7835b0f473b 100644
--- a/arch/arm64/include/asm/extable.h
+++ b/arch/arm64/include/asm/extable.h
@@ -45,5 +45,5 @@ bool ex_handler_bpf(const struct exception_table_entry *ex,
 }
 #endif /* !CONFIG_BPF_JIT */
 
-bool fixup_exception(struct pt_regs *regs);
+bool fixup_exception(struct pt_regs *regs, unsigned int esr);
 #endif
diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
index d9dfa82c1f18..16a069e8eec3 100644
--- a/arch/arm64/kernel/probes/kprobes.c
+++ b/arch/arm64/kernel/probes/kprobes.c
@@ -285,7 +285,7 @@ int __kprobes kprobe_fault_handler(struct pt_regs *regs, unsigned int fsr)
 		 * In case the user-specified fault handler returned
 		 * zero, try to fix up.
 		 */
-		if (fixup_exception(regs))
+		if (fixup_exception(regs, fsr))
 			return 1;
 	}
 	return 0;
diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c
index 489455309695..f1134c88e849 100644
--- a/arch/arm64/mm/extable.c
+++ b/arch/arm64/mm/extable.c
@@ -9,6 +9,7 @@
 
 #include <asm/asm-extable.h>
 #include <asm/ptrace.h>
+#include <asm/esr.h>
 
 static inline unsigned long
 get_ex_fixup(const struct exception_table_entry *ex)
@@ -23,6 +24,18 @@ static bool ex_handler_fixup(const struct exception_table_entry *ex,
 	return true;
 }
 
+static bool ex_handler_fixup_mc(const struct exception_table_entry *ex,
+				struct pt_regs *regs, unsigned int esr)
+{
+	if (esr_is_sea(esr))
+		regs->regs[0] = 0;
+	else
+		regs->regs[0] = 1;
+
+	regs->pc = get_ex_fixup(ex);
+	return true;
+}
+
 static bool ex_handler_uaccess_err_zero(const struct exception_table_entry *ex,
 					struct pt_regs *regs)
 {
@@ -63,7 +76,7 @@ ex_handler_load_unaligned_zeropad(const struct exception_table_entry *ex,
 	return true;
 }
 
-bool fixup_exception(struct pt_regs *regs)
+bool fixup_exception(struct pt_regs *regs, unsigned int esr)
 {
 	const struct exception_table_entry *ex;
 
@@ -71,9 +84,14 @@ bool fixup_exception(struct pt_regs *regs)
 	if (!ex)
 		return false;
 
+	if (esr_is_sea(esr) && !IS_EX_TYPE_MC(ex->type))
+		return false;
+
 	switch (ex->type) {
 	case EX_TYPE_FIXUP:
 		return ex_handler_fixup(ex, regs);
+	case EX_TYPE_FIXUP_MC:
+		return ex_handler_fixup_mc(ex, regs, esr);
 	case EX_TYPE_BPF:
 		return ex_handler_bpf(ex, regs);
 	case EX_TYPE_UACCESS_ERR_ZERO:
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 77341b160aca..ffdfab2fdd60 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -361,7 +361,7 @@ static void __do_kernel_fault(unsigned long addr, unsigned int esr,
 	 * Are we prepared to handle this kernel fault?
 	 * We are almost certainly not prepared to handle instruction faults.
 	 */
-	if (!is_el1_instruction_abort(esr) && fixup_exception(regs))
+	if (!is_el1_instruction_abort(esr) && fixup_exception(regs, esr))
 		return;
 
 	if (WARN_RATELIMIT(is_spurious_el1_translation_fault(addr, esr, regs),
@@ -695,6 +695,30 @@ static int do_bad(unsigned long far, unsigned int esr, struct pt_regs *regs)
 	return 1; /* "fault" */
 }
 
+static bool arm64_process_kernel_sea(unsigned long addr, unsigned int esr,
+				     struct pt_regs *regs, int sig, int code)
+{
+	if (!IS_ENABLED(CONFIG_ARCH_HAS_COPY_MC))
+		return false;
+
+	if (user_mode(regs) || !current->mm)
+		return false;
+
+	if (apei_claim_sea(regs) < 0)
+		return false;
+
+	current->thread.fault_address = 0;
+	current->thread.fault_code = esr;
+
+	if (!fixup_exception(regs, esr))
+		return false;
+
+	arm64_force_sig_fault(sig, code, addr,
+		"Uncorrected hardware memory error in kernel-access\n");
+
+	return true;
+}
+
 static int do_sea(unsigned long far, unsigned int esr, struct pt_regs *regs)
 {
 	const struct fault_info *inf;
@@ -720,6 +744,10 @@ static int do_sea(unsigned long far, unsigned int esr, struct pt_regs *regs)
 		 */
 		siaddr  = untagged_addr(far);
 	}
+
+	if (arm64_process_kernel_sea(siaddr, esr, regs, inf->sig, inf->code))
+		return 0;
+
 	arm64_notify_die(inf->name, regs, inf->sig, inf->code, siaddr, esr);
 
 	return 0;
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 546179418ffa..dd952aeecdc1 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -174,6 +174,14 @@ copy_mc_to_kernel(void *dst, const void *src, size_t cnt)
 }
 #endif
 
+#ifndef copy_mc_to_user
+static inline unsigned long __must_check
+copy_mc_to_user(void *dst, const void *src, size_t cnt)
+{
+	return raw_copy_to_user(dst, src, cnt);
+}
+#endif
+
 static __always_inline void pagefault_disabled_inc(void)
 {
 	current->pagefault_disabled++;
-- 
2.18.0.huawei.25


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH -next V2 4/7] arm64: add copy_from_user to machine check safe
  2022-04-06  9:13 [RFC PATCH -next V2 0/7] arm64: add machine check safe support Tong Tiangen
                   ` (2 preceding siblings ...)
  2022-04-06  9:13 ` [RFC PATCH -next V2 3/7] arm64: add support for machine check error safe Tong Tiangen
@ 2022-04-06  9:13 ` Tong Tiangen
  2022-04-06 11:19   ` Mark Rutland
  2022-04-06  9:13 ` [RFC PATCH -next V2 5/7] arm64: add get_user " Tong Tiangen
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: Tong Tiangen @ 2022-04-06  9:13 UTC (permalink / raw)
  To: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Catalin Marinas, Will Deacon, Alexander Viro, x86,
	H. Peter Anvin
  Cc: linux-arm-kernel, linux-kernel, linux-mm, Tong Tiangen

Add the copy_from_user scenario to machine check safe handling.

The copied data is user data, so when a hardware memory error is consumed
here it is enough to kill the user process and isolate the error page;
a panic is not necessary.
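
The exception fixup below relies on the contract established in patch 3:
ex_handler_fixup_mc() sets x0 to 0 for an SEA (memory error) and to 1 for
an ordinary fault before branching to the fixup label, hence the new cbz
(annotated restatement of the hunk below, not new code):

	// 9997:	cbz	x0, 9998f	// x0 == 0: a memory error was
	//					// consumed - report bytes not
	//					// copied, do NOT retry the access
	//		cmp	dst, dstin	// x0 != 0: ordinary fault -
	//		b.ne	9998f		// fall back to trying harder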

Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
---
 arch/arm64/include/asm/asm-uaccess.h | 16 ++++++++++++++++
 arch/arm64/lib/copy_from_user.S      | 11 ++++++-----
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/asm-uaccess.h b/arch/arm64/include/asm/asm-uaccess.h
index 0557af834e03..f31c8978e1af 100644
--- a/arch/arm64/include/asm/asm-uaccess.h
+++ b/arch/arm64/include/asm/asm-uaccess.h
@@ -92,4 +92,20 @@ alternative_else_nop_endif
 
 		_asm_extable	8888b,\l;
 	.endm
+
+	.macro user_ldp_mc l, reg1, reg2, addr, post_inc
+8888:		ldtr	\reg1, [\addr];
+8889:		ldtr	\reg2, [\addr, #8];
+		add	\addr, \addr, \post_inc;
+
+		_asm_extable_mc	8888b, \l;
+		_asm_extable_mc	8889b, \l;
+	.endm
+
+	.macro user_ldst_mc l, inst, reg, addr, post_inc
+8888:		\inst		\reg, [\addr];
+		add		\addr, \addr, \post_inc;
+
+		_asm_extable_mc	8888b, \l;
+	.endm
 #endif
diff --git a/arch/arm64/lib/copy_from_user.S b/arch/arm64/lib/copy_from_user.S
index 34e317907524..d9d7c5291871 100644
--- a/arch/arm64/lib/copy_from_user.S
+++ b/arch/arm64/lib/copy_from_user.S
@@ -21,7 +21,7 @@
  */
 
 	.macro ldrb1 reg, ptr, val
-	user_ldst 9998f, ldtrb, \reg, \ptr, \val
+	user_ldst_mc 9998f, ldtrb, \reg, \ptr, \val
 	.endm
 
 	.macro strb1 reg, ptr, val
@@ -29,7 +29,7 @@
 	.endm
 
 	.macro ldrh1 reg, ptr, val
-	user_ldst 9997f, ldtrh, \reg, \ptr, \val
+	user_ldst_mc 9997f, ldtrh, \reg, \ptr, \val
 	.endm
 
 	.macro strh1 reg, ptr, val
@@ -37,7 +37,7 @@
 	.endm
 
 	.macro ldr1 reg, ptr, val
-	user_ldst 9997f, ldtr, \reg, \ptr, \val
+	user_ldst_mc 9997f, ldtr, \reg, \ptr, \val
 	.endm
 
 	.macro str1 reg, ptr, val
@@ -45,7 +45,7 @@
 	.endm
 
 	.macro ldp1 reg1, reg2, ptr, val
-	user_ldp 9997f, \reg1, \reg2, \ptr, \val
+	user_ldp_mc 9997f, \reg1, \reg2, \ptr, \val
 	.endm
 
 	.macro stp1 reg1, reg2, ptr, val
@@ -62,7 +62,8 @@ SYM_FUNC_START(__arch_copy_from_user)
 	ret
 
 	// Exception fixups
-9997:	cmp	dst, dstin
+9997:	cbz	x0, 9998f			// Check machine check exception
+	cmp	dst, dstin
 	b.ne	9998f
 	// Before being absolutely sure we couldn't copy anything, try harder
 USER(9998f, ldtrb tmp1w, [srcin])
-- 
2.18.0.huawei.25


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH -next V2 5/7] arm64: add get_user to machine check safe
  2022-04-06  9:13 [RFC PATCH -next V2 0/7] arm64: add machine check safe support Tong Tiangen
                   ` (3 preceding siblings ...)
  2022-04-06  9:13 ` [RFC PATCH -next V2 4/7] arm64: add copy_from_user to machine check safe Tong Tiangen
@ 2022-04-06  9:13 ` Tong Tiangen
  2022-04-06 11:22   ` Mark Rutland
  2022-04-06  9:13 ` [RFC PATCH -next V2 6/7] arm64: add cow " Tong Tiangen
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: Tong Tiangen @ 2022-04-06  9:13 UTC (permalink / raw)
  To: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Catalin Marinas, Will Deacon, Alexander Viro, x86,
	H. Peter Anvin
  Cc: linux-arm-kernel, linux-kernel, linux-mm, Tong Tiangen

Add the get_user scenario to machine check safe handling. The processing
of EX_TYPE_UACCESS_ERR_ZERO and EX_TYPE_UACCESS_ERR_ZERO_MC is the same:
both return -EFAULT.
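
For context, both types funnel into the existing handler in
arch/arm64/mm/extable.c, which (roughly, as in mainline at the time; not
part of this diff) writes -EFAULT into the designated error register and
zeroes the destination register, so the two types can be handled
identically:

	static bool ex_handler_uaccess_err_zero(const struct exception_table_entry *ex,
						struct pt_regs *regs)
	{
		int reg_err = FIELD_GET(EX_DATA_REG_ERR, ex->data);
		int reg_zero = FIELD_GET(EX_DATA_REG_ZERO, ex->data);

		pt_regs_write_reg(regs, reg_err, -EFAULT);	/* error value   */
		pt_regs_write_reg(regs, reg_zero, 0);		/* zero the dest */
		regs->pc = get_ex_fixup(ex);
		return true;
	}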

Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
---
 arch/arm64/include/asm/asm-extable.h | 14 +++++++++++++-
 arch/arm64/include/asm/uaccess.h     |  2 +-
 arch/arm64/mm/extable.c              |  1 +
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/asm-extable.h b/arch/arm64/include/asm/asm-extable.h
index 74d1db74fd86..bfc2d224cbae 100644
--- a/arch/arm64/include/asm/asm-extable.h
+++ b/arch/arm64/include/asm/asm-extable.h
@@ -10,8 +10,11 @@
 
 /* _MC indicates that we can fix up from machine check errors */
 #define EX_TYPE_FIXUP_MC		5
+#define EX_TYPE_UACCESS_ERR_ZERO_MC	6
 
-#define IS_EX_TYPE_MC(type) (type == EX_TYPE_FIXUP_MC)
+#define IS_EX_TYPE_MC(type)			\
+	(type == EX_TYPE_FIXUP_MC ||		\
+	 type == EX_TYPE_UACCESS_ERR_ZERO_MC)
 
 #ifdef __ASSEMBLY__
 
@@ -77,6 +80,15 @@
 #define EX_DATA_REG(reg, gpr)						\
 	"((.L__gpr_num_" #gpr ") << " __stringify(EX_DATA_REG_##reg##_SHIFT) ")"
 
+#define _ASM_EXTABLE_UACCESS_ERR_ZERO_MC(insn, fixup, err, zero)		\
+	__DEFINE_ASM_GPR_NUMS							\
+	__ASM_EXTABLE_RAW(#insn, #fixup,					\
+			  __stringify(EX_TYPE_UACCESS_ERR_ZERO_MC),		\
+			  "("							\
+			    EX_DATA_REG(ERR, err) " | "				\
+			    EX_DATA_REG(ZERO, zero)				\
+			  ")")
+
 #define _ASM_EXTABLE_UACCESS_ERR_ZERO(insn, fixup, err, zero)		\
 	__DEFINE_ASM_GPR_NUMS						\
 	__ASM_EXTABLE_RAW(#insn, #fixup, 				\
diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index e8dce0cc5eaa..24b662407fbd 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -236,7 +236,7 @@ static inline void __user *__uaccess_mask_ptr(const void __user *ptr)
 	asm volatile(							\
 	"1:	" load "	" reg "1, [%2]\n"			\
 	"2:\n"								\
-	_ASM_EXTABLE_UACCESS_ERR_ZERO(1b, 2b, %w0, %w1)			\
+	_ASM_EXTABLE_UACCESS_ERR_ZERO_MC(1b, 2b, %w0, %w1)		\
 	: "+r" (err), "=&r" (x)						\
 	: "r" (addr))
 
diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c
index f1134c88e849..7c05f8d2bce0 100644
--- a/arch/arm64/mm/extable.c
+++ b/arch/arm64/mm/extable.c
@@ -95,6 +95,7 @@ bool fixup_exception(struct pt_regs *regs, unsigned int esr)
 	case EX_TYPE_BPF:
 		return ex_handler_bpf(ex, regs);
 	case EX_TYPE_UACCESS_ERR_ZERO:
+	case EX_TYPE_UACCESS_ERR_ZERO_MC:
 		return ex_handler_uaccess_err_zero(ex, regs);
 	case EX_TYPE_LOAD_UNALIGNED_ZEROPAD:
 		return ex_handler_load_unaligned_zeropad(ex, regs);
-- 
2.18.0.huawei.25


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH -next V2 6/7] arm64: add cow to machine check safe
  2022-04-06  9:13 [RFC PATCH -next V2 0/7] arm64: add machine check safe support Tong Tiangen
                   ` (4 preceding siblings ...)
  2022-04-06  9:13 ` [RFC PATCH -next V2 5/7] arm64: add get_user " Tong Tiangen
@ 2022-04-06  9:13 ` Tong Tiangen
  2022-04-06  9:13 ` [RFC PATCH -next V2 7/7] arm64: add pagecache reading " Tong Tiangen
  2022-04-06 10:04 ` [RFC PATCH -next V2 0/7] arm64: add machine check safe support Mark Rutland
  7 siblings, 0 replies; 28+ messages in thread
From: Tong Tiangen @ 2022-04-06  9:13 UTC (permalink / raw)
  To: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Catalin Marinas, Will Deacon, Alexander Viro, x86,
	H. Peter Anvin
  Cc: linux-arm-kernel, linux-kernel, linux-mm, Tong Tiangen

In cow (copy on write) processing, the data of the user process is
copied. When a machine check error is encountered during the copy,
killing the user process and isolating the user page with hardware
memory errors is a more reasonable choice than a kernel panic.

copy_page_mc() in copy_page_mc.S largely borrows from copy_page() in
copy_page.S; the main difference is that copy_page_mc() adds extable
entries to support machine check safety.
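
For orientation, the call path added by this patch (condensed from the
hunks below):

	/*
	 * cow_user_page()              mm/memory.c
	 *   copy_user_highpage_mc()    arch/arm64/mm/copypage.c
	 *     copy_highpage_mc()
	 *       copy_page_mc()         MC-safe copy loop in copy_page_mc.S
	 *       do_mte()               copy MTE tags, as before
	 *     flush_dcache_page()
	 *
	 * Architectures without __HAVE_ARCH_COPY_USER_HIGHPAGE_MC fall
	 * back to plain copy_user_highpage() via the highmem.h define.
	 */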

Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
---
 arch/arm64/include/asm/page.h | 10 ++++
 arch/arm64/lib/Makefile       |  2 +
 arch/arm64/lib/copy_page_mc.S | 98 +++++++++++++++++++++++++++++++++++
 arch/arm64/mm/copypage.c      | 36 ++++++++++---
 include/linux/highmem.h       |  8 +++
 mm/memory.c                   |  2 +-
 6 files changed, 149 insertions(+), 7 deletions(-)
 create mode 100644 arch/arm64/lib/copy_page_mc.S

diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index 993a27ea6f54..832571a7dddb 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -29,6 +29,16 @@ void copy_user_highpage(struct page *to, struct page *from,
 void copy_highpage(struct page *to, struct page *from);
 #define __HAVE_ARCH_COPY_HIGHPAGE
 
+#ifdef CONFIG_ARCH_HAS_COPY_MC
+extern void copy_page_mc(void *to, const void *from);
+void copy_highpage_mc(struct page *to, struct page *from);
+#define __HAVE_ARCH_COPY_HIGHPAGE_MC
+
+void copy_user_highpage_mc(struct page *to, struct page *from,
+		unsigned long vaddr, struct vm_area_struct *vma);
+#define __HAVE_ARCH_COPY_USER_HIGHPAGE_MC
+#endif
+
 struct page *alloc_zeroed_user_highpage_movable(struct vm_area_struct *vma,
 						unsigned long vaddr);
 #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE_MOVABLE
diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
index 29490be2546b..29c578414b12 100644
--- a/arch/arm64/lib/Makefile
+++ b/arch/arm64/lib/Makefile
@@ -22,3 +22,5 @@ obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
 obj-$(CONFIG_ARM64_MTE) += mte.o
 
 obj-$(CONFIG_KASAN_SW_TAGS) += kasan_sw_tags.o
+
+obj-$(CONFIG_ARCH_HAS_COPY_MC) += copy_page_mc.o
diff --git a/arch/arm64/lib/copy_page_mc.S b/arch/arm64/lib/copy_page_mc.S
new file mode 100644
index 000000000000..cbf56e661efe
--- /dev/null
+++ b/arch/arm64/lib/copy_page_mc.S
@@ -0,0 +1,98 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ */
+
+#include <linux/linkage.h>
+#include <linux/const.h>
+#include <asm/assembler.h>
+#include <asm/page.h>
+#include <asm/cpufeature.h>
+#include <asm/alternative.h>
+
+/*
+ * Copy a page from src to dest (both are page aligned), machine check safe
+ *
+ * Parameters:
+ *	x0 - dest
+ *	x1 - src
+ */
+SYM_FUNC_START(__pi_copy_page_mc)
+alternative_if ARM64_HAS_NO_HW_PREFETCH
+	// Prefetch three cache lines ahead.
+	prfm	pldl1strm, [x1, #128]
+	prfm	pldl1strm, [x1, #256]
+	prfm	pldl1strm, [x1, #384]
+alternative_else_nop_endif
+
+100:	ldp	x2, x3, [x1]
+101:	ldp	x4, x5, [x1, #16]
+102:	ldp	x6, x7, [x1, #32]
+103:	ldp	x8, x9, [x1, #48]
+104:	ldp	x10, x11, [x1, #64]
+105:	ldp	x12, x13, [x1, #80]
+106:	ldp	x14, x15, [x1, #96]
+107:	ldp	x16, x17, [x1, #112]
+
+	add	x0, x0, #256
+	add	x1, x1, #128
+1:
+	tst	x0, #(PAGE_SIZE - 1)
+
+alternative_if ARM64_HAS_NO_HW_PREFETCH
+	prfm	pldl1strm, [x1, #384]
+alternative_else_nop_endif
+
+	stnp	x2, x3, [x0, #-256]
+200:	ldp	x2, x3, [x1]
+	stnp	x4, x5, [x0, #16 - 256]
+201:	ldp	x4, x5, [x1, #16]
+	stnp	x6, x7, [x0, #32 - 256]
+202:	ldp	x6, x7, [x1, #32]
+	stnp	x8, x9, [x0, #48 - 256]
+203:	ldp	x8, x9, [x1, #48]
+	stnp	x10, x11, [x0, #64 - 256]
+204:	ldp	x10, x11, [x1, #64]
+	stnp	x12, x13, [x0, #80 - 256]
+205:	ldp	x12, x13, [x1, #80]
+	stnp	x14, x15, [x0, #96 - 256]
+206:	ldp	x14, x15, [x1, #96]
+	stnp	x16, x17, [x0, #112 - 256]
+207:	ldp	x16, x17, [x1, #112]
+
+	add	x0, x0, #128
+	add	x1, x1, #128
+
+	b.ne	1b
+
+	stnp	x2, x3, [x0, #-256]
+	stnp	x4, x5, [x0, #16 - 256]
+	stnp	x6, x7, [x0, #32 - 256]
+	stnp	x8, x9, [x0, #48 - 256]
+	stnp	x10, x11, [x0, #64 - 256]
+	stnp	x12, x13, [x0, #80 - 256]
+	stnp	x14, x15, [x0, #96 - 256]
+	stnp	x16, x17, [x0, #112 - 256]
+
+300:	ret
+
+_asm_extable_mc 100b, 300b
+_asm_extable_mc 101b, 300b
+_asm_extable_mc 102b, 300b
+_asm_extable_mc 103b, 300b
+_asm_extable_mc 104b, 300b
+_asm_extable_mc 105b, 300b
+_asm_extable_mc 106b, 300b
+_asm_extable_mc 107b, 300b
+_asm_extable_mc 200b, 300b
+_asm_extable_mc 201b, 300b
+_asm_extable_mc 202b, 300b
+_asm_extable_mc 203b, 300b
+_asm_extable_mc 204b, 300b
+_asm_extable_mc 205b, 300b
+_asm_extable_mc 206b, 300b
+_asm_extable_mc 207b, 300b
+
+SYM_FUNC_END(__pi_copy_page_mc)
+SYM_FUNC_ALIAS(copy_page_mc, __pi_copy_page_mc)
+EXPORT_SYMBOL(copy_page_mc)
diff --git a/arch/arm64/mm/copypage.c b/arch/arm64/mm/copypage.c
index 0dea80bf6de4..0f28edfcb234 100644
--- a/arch/arm64/mm/copypage.c
+++ b/arch/arm64/mm/copypage.c
@@ -14,13 +14,8 @@
 #include <asm/cpufeature.h>
 #include <asm/mte.h>
 
-void copy_highpage(struct page *to, struct page *from)
+static void do_mte(struct page *to, struct page *from, void *kto, void *kfrom)
 {
-	void *kto = page_address(to);
-	void *kfrom = page_address(from);
-
-	copy_page(kto, kfrom);
-
 	if (system_supports_mte() && test_bit(PG_mte_tagged, &from->flags)) {
 		set_bit(PG_mte_tagged, &to->flags);
 		page_kasan_tag_reset(to);
@@ -35,6 +30,15 @@ void copy_highpage(struct page *to, struct page *from)
 		mte_copy_page_tags(kto, kfrom);
 	}
 }
+
+void copy_highpage(struct page *to, struct page *from)
+{
+	void *kto = page_address(to);
+	void *kfrom = page_address(from);
+
+	copy_page(kto, kfrom);
+	do_mte(to, from, kto, kfrom);
+}
 EXPORT_SYMBOL(copy_highpage);
 
 void copy_user_highpage(struct page *to, struct page *from,
@@ -44,3 +48,23 @@ void copy_user_highpage(struct page *to, struct page *from,
 	flush_dcache_page(to);
 }
 EXPORT_SYMBOL_GPL(copy_user_highpage);
+
+#ifdef CONFIG_ARCH_HAS_COPY_MC
+void copy_highpage_mc(struct page *to, struct page *from)
+{
+	void *kto = page_address(to);
+	void *kfrom = page_address(from);
+
+	copy_page_mc(kto, kfrom);
+	do_mte(to, from, kto, kfrom);
+}
+EXPORT_SYMBOL(copy_highpage_mc);
+
+void copy_user_highpage_mc(struct page *to, struct page *from,
+			unsigned long vaddr, struct vm_area_struct *vma)
+{
+	copy_highpage_mc(to, from);
+	flush_dcache_page(to);
+}
+EXPORT_SYMBOL_GPL(copy_user_highpage_mc);
+#endif
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 39bb9b47fa9c..a9dbf331b038 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -283,6 +283,10 @@ static inline void copy_user_highpage(struct page *to, struct page *from,
 
 #endif
 
+#ifndef __HAVE_ARCH_COPY_USER_HIGHPAGE_MC
+#define copy_user_highpage_mc copy_user_highpage
+#endif
+
 #ifndef __HAVE_ARCH_COPY_HIGHPAGE
 
 static inline void copy_highpage(struct page *to, struct page *from)
@@ -298,6 +302,10 @@ static inline void copy_highpage(struct page *to, struct page *from)
 
 #endif
 
+#ifndef __HAVE_ARCH_COPY_HIGHPAGE_MC
+#define copy_highpage_mc copy_highpage
+#endif
+
 static inline void memcpy_page(struct page *dst_page, size_t dst_off,
 			       struct page *src_page, size_t src_off,
 			       size_t len)
diff --git a/mm/memory.c b/mm/memory.c
index 76e3af9639d9..d5f62234152d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2767,7 +2767,7 @@ static inline bool cow_user_page(struct page *dst, struct page *src,
 	unsigned long addr = vmf->address;
 
 	if (likely(src)) {
-		copy_user_highpage(dst, src, addr, vma);
+		copy_user_highpage_mc(dst, src, addr, vma);
 		return true;
 	}
 
-- 
2.18.0.huawei.25


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH -next V2 7/7] arm64: add pagecache reading to machine check safe
  2022-04-06  9:13 [RFC PATCH -next V2 0/7] arm64: add machine check safe support Tong Tiangen
                   ` (5 preceding siblings ...)
  2022-04-06  9:13 ` [RFC PATCH -next V2 6/7] arm64: add cow " Tong Tiangen
@ 2022-04-06  9:13 ` Tong Tiangen
  2022-04-06 11:27   ` Mark Rutland
  2022-04-06 10:04 ` [RFC PATCH -next V2 0/7] arm64: add machine check safe support Mark Rutland
  7 siblings, 1 reply; 28+ messages in thread
From: Tong Tiangen @ 2022-04-06  9:13 UTC (permalink / raw)
  To: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Catalin Marinas, Will Deacon, Alexander Viro, x86,
	H. Peter Anvin
  Cc: linux-arm-kernel, linux-kernel, linux-mm, Tong Tiangen

When a user process reads a file, the data is cached in the pagecache,
and that data belongs to the user process. When a machine check error is
encountered during pagecache reading, killing the user process and
isolating the user page with hardware memory errors is a more reasonable
choice than a kernel panic.

__arch_copy_mc_to_user() in copy_to_user_mc.S largely borrows from
__arch_copy_to_user() in copy_to_user.S; the main difference is that
__arch_copy_mc_to_user() adds extable entries to support machine check
safety.

In _copy_page_to_iter(), machine check safety is only considered for
ITER_IOVEC, which is the case used by pagecache reading.
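
For orientation, the read path covered by this patch (condensed from the
hunks below):

	/*
	 * filemap_read()
	 *   copy_folio_to_iter()      now routed to the MC-safe variant
	 *     copy_mc_page_to_iter()
	 *       __copy_page_to_iter(..., mc_safe = true)
	 *         copy_page_to_iter_iovec(..., mc_safe = true)
	 *           copyout_mc()      access_ok() + copy_mc_to_user()
	 */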

Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
---
 arch/arm64/include/asm/uaccess.h | 15 ++++++
 arch/arm64/lib/Makefile          |  2 +-
 arch/arm64/lib/copy_to_user_mc.S | 78 +++++++++++++++++++++++++++++
 include/linux/uio.h              |  9 +++-
 lib/iov_iter.c                   | 85 +++++++++++++++++++++++++-------
 5 files changed, 170 insertions(+), 19 deletions(-)
 create mode 100644 arch/arm64/lib/copy_to_user_mc.S

diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index 24b662407fbd..f0d5e811165a 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -448,6 +448,21 @@ extern long strncpy_from_user(char *dest, const char __user *src, long count);
 
 extern __must_check long strnlen_user(const char __user *str, long n);
 
+#ifdef CONFIG_ARCH_HAS_COPY_MC
+extern unsigned long __must_check __arch_copy_mc_to_user(void __user *to,
+							 const void *from, unsigned long n);
+static inline unsigned long __must_check
+copy_mc_to_user(void __user *to, const void *from, unsigned long n)
+{
+	uaccess_ttbr0_enable();
+	n = __arch_copy_mc_to_user(__uaccess_mask_ptr(to), from, n);
+	uaccess_ttbr0_disable();
+
+	return n;
+}
+#define copy_mc_to_user copy_mc_to_user
+#endif
+
 #ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
 struct page;
 void memcpy_page_flushcache(char *to, struct page *page, size_t offset, size_t len);
diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
index 29c578414b12..9b3571227fb4 100644
--- a/arch/arm64/lib/Makefile
+++ b/arch/arm64/lib/Makefile
@@ -23,4 +23,4 @@ obj-$(CONFIG_ARM64_MTE) += mte.o
 
 obj-$(CONFIG_KASAN_SW_TAGS) += kasan_sw_tags.o
 
-obj-$(CONFIG_ARCH_HAS_COPY_MC) += copy_page_mc.o
+obj-$(CONFIG_ARCH_HAS_COPY_MC) += copy_page_mc.o copy_to_user_mc.o
diff --git a/arch/arm64/lib/copy_to_user_mc.S b/arch/arm64/lib/copy_to_user_mc.S
new file mode 100644
index 000000000000..9d228ff15446
--- /dev/null
+++ b/arch/arm64/lib/copy_to_user_mc.S
@@ -0,0 +1,78 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ */
+
+#include <linux/linkage.h>
+
+#include <asm/asm-uaccess.h>
+#include <asm/assembler.h>
+#include <asm/cache.h>
+
+/*
+ * Copy to user space from a kernel buffer (alignment handled by the hardware)
+ *
+ * Parameters:
+ *	x0 - to
+ *	x1 - from
+ *	x2 - n
+ * Returns:
+ *	x0 - bytes not copied
+ */
+	.macro ldrb1 reg, ptr, val
+	1000: ldrb  \reg, [\ptr], \val;
+	_asm_extable_mc 1000b, 9998f;
+	.endm
+
+	.macro strb1 reg, ptr, val
+	user_ldst_mc 9998f, sttrb, \reg, \ptr, \val
+	.endm
+
+	.macro ldrh1 reg, ptr, val
+	1001: ldrh  \reg, [\ptr], \val;
+	_asm_extable_mc 1001b, 9998f;
+	.endm
+
+	.macro strh1 reg, ptr, val
+	user_ldst_mc 9997f, sttrh, \reg, \ptr, \val
+	.endm
+
+	.macro ldr1 reg, ptr, val
+	1002: ldr \reg, [\ptr], \val;
+	_asm_extable_mc 1002b, 9998f;
+	.endm
+
+	.macro str1 reg, ptr, val
+	user_ldst_mc 9997f, sttr, \reg, \ptr, \val
+	.endm
+
+	.macro ldp1 reg1, reg2, ptr, val
+	1003: ldp \reg1, \reg2, [\ptr], \val;
+	_asm_extable_mc 1003b, 9998f;
+	.endm
+
+	.macro stp1 reg1, reg2, ptr, val
+	user_stp 9997f, \reg1, \reg2, \ptr, \val
+	.endm
+
+end	.req	x5
+srcin	.req	x15
+SYM_FUNC_START(__arch_copy_mc_to_user)
+	add	end, x0, x2
+	mov	srcin, x1
+#include "copy_template.S"
+	mov	x0, #0
+	ret
+
+	// Exception fixups
+9997:	cbz	x0, 9998f			// Check machine check exception
+	cmp	dst, dstin
+	b.ne	9998f
+	// Before being absolutely sure we couldn't copy anything, try harder
+	ldrb	tmp1w, [srcin]
+USER(9998f, sttrb tmp1w, [dst])
+	add	dst, dst, #1
+9998:	sub	x0, end, dst			// bytes not copied
+	ret
+SYM_FUNC_END(__arch_copy_mc_to_user)
+EXPORT_SYMBOL(__arch_copy_mc_to_user)
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 739285fe5a2f..539d9ee9b032 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -147,10 +147,17 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
 size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i);
 size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i);
 
+#ifdef CONFIG_ARCH_HAS_COPY_MC
+size_t copy_mc_page_to_iter(struct page *page, size_t offset, size_t bytes,
+			    struct iov_iter *i);
+#else
+#define copy_mc_page_to_iter copy_page_to_iter
+#endif
+
 static inline size_t copy_folio_to_iter(struct folio *folio, size_t offset,
 		size_t bytes, struct iov_iter *i)
 {
-	return copy_page_to_iter(&folio->page, offset, bytes, i);
+	return copy_mc_page_to_iter(&folio->page, offset, bytes, i);
 }
 
 static __always_inline __must_check
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 6dd5330f7a99..2c5f3bb6391d 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -157,6 +157,19 @@ static int copyout(void __user *to, const void *from, size_t n)
 	return n;
 }
 
+#ifdef CONFIG_ARCH_HAS_COPY_MC
+static int copyout_mc(void __user *to, const void *from, size_t n)
+{
+	if (access_ok(to, n)) {
+		instrument_copy_to_user(to, from, n);
+		n = copy_mc_to_user((__force void *) to, from, n);
+	}
+	return n;
+}
+#else
+#define copyout_mc copyout
+#endif
+
 static int copyin(void *to, const void __user *from, size_t n)
 {
 	if (should_fail_usercopy())
@@ -169,7 +182,7 @@ static int copyin(void *to, const void __user *from, size_t n)
 }
 
 static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t bytes,
-			 struct iov_iter *i)
+			 struct iov_iter *i, bool mc_safe)
 {
 	size_t skip, copy, left, wanted;
 	const struct iovec *iov;
@@ -194,7 +207,10 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t b
 		from = kaddr + offset;
 
 		/* first chunk, usually the only one */
-		left = copyout(buf, from, copy);
+		if (mc_safe)
+			left = copyout_mc(buf, from, copy);
+		else
+			left = copyout(buf, from, copy);
 		copy -= left;
 		skip += copy;
 		from += copy;
@@ -204,7 +220,10 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t b
 			iov++;
 			buf = iov->iov_base;
 			copy = min(bytes, iov->iov_len);
-			left = copyout(buf, from, copy);
+			if (mc_safe)
+				left = copyout_mc(buf, from, copy);
+			else
+				left = copyout(buf, from, copy);
 			copy -= left;
 			skip = copy;
 			from += copy;
@@ -223,7 +242,10 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t b
 
 	kaddr = kmap(page);
 	from = kaddr + offset;
-	left = copyout(buf, from, copy);
+	if (mc_safe)
+		left = copyout_mc(buf, from, copy);
+	else
+		left = copyout(buf, from, copy);
 	copy -= left;
 	skip += copy;
 	from += copy;
@@ -232,7 +254,10 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t b
 		iov++;
 		buf = iov->iov_base;
 		copy = min(bytes, iov->iov_len);
-		left = copyout(buf, from, copy);
+		if (mc_safe)
+			left = copyout_mc(buf, from, copy);
+		else
+			left = copyout(buf, from, copy);
 		copy -= left;
 		skip = copy;
 		from += copy;
@@ -674,15 +699,6 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
 EXPORT_SYMBOL(_copy_to_iter);
 
 #ifdef CONFIG_ARCH_HAS_COPY_MC
-static int copyout_mc(void __user *to, const void *from, size_t n)
-{
-	if (access_ok(to, n)) {
-		instrument_copy_to_user(to, from, n);
-		n = copy_mc_to_user((__force void *) to, from, n);
-	}
-	return n;
-}
-
 static size_t copy_mc_pipe_to_iter(const void *addr, size_t bytes,
 				struct iov_iter *i)
 {
@@ -846,10 +862,10 @@ static inline bool page_copy_sane(struct page *page, size_t offset, size_t n)
 }
 
 static size_t __copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
-			 struct iov_iter *i)
+			 struct iov_iter *i, bool mc_safe)
 {
 	if (likely(iter_is_iovec(i)))
-		return copy_page_to_iter_iovec(page, offset, bytes, i);
+		return copy_page_to_iter_iovec(page, offset, bytes, i, mc_safe);
 	if (iov_iter_is_bvec(i) || iov_iter_is_kvec(i) || iov_iter_is_xarray(i)) {
 		void *kaddr = kmap_local_page(page);
 		size_t wanted = _copy_to_iter(kaddr + offset, bytes, i);
@@ -878,7 +894,7 @@ size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
 	offset %= PAGE_SIZE;
 	while (1) {
 		size_t n = __copy_page_to_iter(page, offset,
-				min(bytes, (size_t)PAGE_SIZE - offset), i);
+				min(bytes, (size_t)PAGE_SIZE - offset), i, false);
 		res += n;
 		bytes -= n;
 		if (!bytes || !n)
@@ -893,6 +909,41 @@ size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
 }
 EXPORT_SYMBOL(copy_page_to_iter);
 
+#ifdef CONFIG_ARCH_HAS_COPY_MC
+/**
+ * copy_mc_page_to_iter - copy page to iter with source memory error exception handling.
+ *
+ * filemap_read() deploys this for pagecache reading; the main difference from
+ * the typical copy_page_to_iter() is that __copy_page_to_iter() is called with mc_safe true.
+ *
+ * Return: number of bytes copied (may be %0)
+ */
+size_t copy_mc_page_to_iter(struct page *page, size_t offset, size_t bytes,
+			 struct iov_iter *i)
+{
+	size_t res = 0;
+
+	if (unlikely(!page_copy_sane(page, offset, bytes)))
+		return 0;
+	page += offset / PAGE_SIZE; // first subpage
+	offset %= PAGE_SIZE;
+	while (1) {
+		size_t n = __copy_page_to_iter(page, offset,
+				min(bytes, (size_t)PAGE_SIZE - offset), i, true);
+		res += n;
+		bytes -= n;
+		if (!bytes || !n)
+			break;
+		offset += n;
+		if (offset == PAGE_SIZE) {
+			page++;
+			offset = 0;
+		}
+	}
+	return res;
+}
+#endif
+
 size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
 			 struct iov_iter *i)
 {
-- 
2.18.0.huawei.25


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH -next V2 1/7] x86: fix copy_mc_to_user compile error
  2022-04-06  9:13 ` [RFC PATCH -next V2 1/7] x86: fix copy_mc_to_user compile error Tong Tiangen
@ 2022-04-06  9:22   ` Borislav Petkov
  2022-04-06 10:02     ` Tong Tiangen
  0 siblings, 1 reply; 28+ messages in thread
From: Borislav Petkov @ 2022-04-06  9:22 UTC (permalink / raw)
  To: Tong Tiangen
  Cc: Andrew Morton, Thomas Gleixner, Ingo Molnar, Dave Hansen,
	Catalin Marinas, Will Deacon, Alexander Viro, x86,
	H. Peter Anvin, linux-arm-kernel, linux-kernel, linux-mm

On Wed, Apr 06, 2022 at 09:13:05AM +0000, Tong Tiangen wrote:
> The follow patch

There's no concept of "follow patch" in git - you need to explain this
differently.

> will add copy_mc_to_user to include/linux/uaccess.h, X86
> must declare Implemented to avoid compile error.

I don't know what that means. Try again pls.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH -next V2 1/7] x86: fix copy_mc_to_user compile error
  2022-04-06  9:22   ` Borislav Petkov
@ 2022-04-06 10:02     ` Tong Tiangen
  0 siblings, 0 replies; 28+ messages in thread
From: Tong Tiangen @ 2022-04-06 10:02 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andrew Morton, Thomas Gleixner, Ingo Molnar, Dave Hansen,
	Catalin Marinas, Will Deacon, Alexander Viro, x86,
	H. Peter Anvin, linux-arm-kernel, linux-kernel, linux-mm



On 2022/4/6 17:22, Borislav Petkov wrote:
> On Wed, Apr 06, 2022 at 09:13:05AM +0000, Tong Tiangen wrote:
>> The follow patch
> 
> There's no concept of "follow patch" in git - you need to explain this
> differently.
> 
>> will add copy_mc_to_user to include/linux/uaccess.h, X86
>> must declare Implemented to avoid compile error.
> 
> I don't know what that means. Try again pls.
> 

This description is not good; I will reword it in the next version.

Here are the reasons for this change:

Patch 3/7 in patchset [1] introduces copy_mc_to_user() in
include/linux/uaccess.h, and the prototype is:

     static inline unsigned long __must_check
     copy_mc_to_user(void *dst, const void *src, size_t cnt)

The prototype on x86 is:

     unsigned long __must_check
     copy_mc_to_user(void *to, const void *from, unsigned len);

These two are a little different, so I added the following code to x86 to
avoid a prototype-conflict compile error:

     #define copy_mc_to_user copy_mc_to_user

In addition, I think this #define should be added here.
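
For reference, the resulting pattern is the usual one for arch-overridable
helpers; sketched here from the two hunks, slightly condensed:

	/* arch/x86/include/asm/uaccess.h: the self-referential define
	 * doubles as a flag telling the generic header that x86 already
	 * provides its own implementation. */
	unsigned long __must_check
	copy_mc_to_user(void *to, const void *from, unsigned len);
	#define copy_mc_to_user copy_mc_to_user

	/* include/linux/uaccess.h (patch 3/7): generic fallback, compiled
	 * only when no architecture definition is visible. */
	#ifndef copy_mc_to_user
	static inline unsigned long __must_check
	copy_mc_to_user(void *dst, const void *src, size_t cnt)
	{
		return raw_copy_to_user(dst, src, cnt);
	}
	#endif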

[1]https://patchwork.kernel.org/project/linux-mm/cover/20220406091311.3354723-1-tongtiangen@huawei.com/

Thanks.
Tong.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH -next V2 0/7] arm64: add machine check safe support
  2022-04-06  9:13 [RFC PATCH -next V2 0/7] arm64: add machine check safe support Tong Tiangen
                   ` (6 preceding siblings ...)
  2022-04-06  9:13 ` [RFC PATCH -next V2 7/7] arm64: add pagecache reading " Tong Tiangen
@ 2022-04-06 10:04 ` Mark Rutland
  2022-04-07  4:21   ` Tong Tiangen
  7 siblings, 1 reply; 28+ messages in thread
From: Mark Rutland @ 2022-04-06 10:04 UTC (permalink / raw)
  To: Tong Tiangen
  Cc: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Catalin Marinas, Will Deacon, Alexander Viro, x86,
	H. Peter Anvin, linux-arm-kernel, linux-kernel, linux-mm,
	james.morse

Hi,

In future, for the arm64 uaccess stuff, could you please CC me, and for the
arm64 RAS bits (e.g. the SEA handling), could you please CC James Morse?

On Wed, Apr 06, 2022 at 09:13:04AM +0000, Tong Tiangen wrote:
> This patchset is based on [1].

That link below appears to be a single patch. Sending that separately makes
this harder to review, so in future could you please send this as a combined
series?

> As memory capacity and density grow, so does the probability of memory
> errors. The increasing size and density of server RAM in data centers
> and the cloud have led to a higher rate of uncorrectable memory errors.
> 
> Currently, the kernel has a mechanism to recover from hardware memory
> errors. This patchset provides a new recovery mechanism.
> 
> For ARM64, hardware memory errors are handled by do_sea(), which divides
> them into two cases:
> 1. The error is consumed in user mode: the solution is to kill the
>    user process and isolate the error page.
> 2. The error is consumed in kernel mode: the solution is to panic.
> 
> For kernelspace, an undifferentiated panic may not be the optimal choice;
> some cases can be handled better.
> 
> This patchset deals with four scenarios in which a hardware memory error
> is consumed in kernelspace:
> 1. copy_from_user.
> 2. get_user.

What about atomics to user memory? e.g. futexes, or the armv8_deprecated
emulations?

It seems the assumption is that writing to user memory (e.g. copy_to_user() and
put_user()) doesn't matter? Could you please mention why? e.g. do we never take
an exception for writes to memory with errors?

> 3. cow (copy on write).
> 4. pagecache reading.

There are a bunch of other places where we'll access user memory via the linear
map, so I assume this is just a best-effort "try not to die" rather than "never
die" ?

Are there other places we might need/want to expand this to in future?

Thanks,
Mark.

> These four scenarios share a common trait: although the error is consumed
> in kernel mode, the consumed data belongs to user mode.
>
> The processing scheme is based on CONFIG_ARCH_HAS_COPY_MC and replaces
> the kernel panic with killing the affected process and isolating the
> error page.
> 
> [1] https://lore.kernel.org/lkml/20220323033705.3966643-1-tongtiangen@huawei.com/
> 
> Since V2:
>  1. Consistent with PPC/x86, use CONFIG_ARCH_HAS_COPY_MC instead of
>     ARM64_UCE_KERNEL_RECOVERY.
>  2. Add two new scenarios: cow and pagecache reading.
>  3. Fix two small bugs (the first two patches).
> 
> Tong Tiangen (7):
>   x86: fix copy_mc_to_user compile error
>   arm64: fix page_address return value in copy_highpage
>   arm64: add support for machine check error safe
>   arm64: add copy_from_user to machine check safe
>   arm64: add get_user to machine check safe
>   arm64: add cow to machine check safe
>   arm64: add pagecache reading to machine check safe
> 
>  arch/arm64/Kconfig                   |  1 +
>  arch/arm64/include/asm/asm-extable.h | 25 +++++++
>  arch/arm64/include/asm/asm-uaccess.h | 16 +++++
>  arch/arm64/include/asm/esr.h         |  5 ++
>  arch/arm64/include/asm/extable.h     |  2 +-
>  arch/arm64/include/asm/page.h        | 10 +++
>  arch/arm64/include/asm/uaccess.h     | 17 ++++-
>  arch/arm64/kernel/probes/kprobes.c   |  2 +-
>  arch/arm64/lib/Makefile              |  2 +
>  arch/arm64/lib/copy_from_user.S      | 11 ++--
>  arch/arm64/lib/copy_page_mc.S        | 98 ++++++++++++++++++++++++++++
>  arch/arm64/lib/copy_to_user_mc.S     | 78 ++++++++++++++++++++++
>  arch/arm64/mm/copypage.c             | 36 ++++++++--
>  arch/arm64/mm/extable.c              | 21 +++++-
>  arch/arm64/mm/fault.c                | 30 ++++++++-
>  arch/x86/include/asm/uaccess.h       |  1 +
>  include/linux/highmem.h              |  8 +++
>  include/linux/uaccess.h              |  8 +++
>  include/linux/uio.h                  |  9 ++-
>  lib/iov_iter.c                       | 85 +++++++++++++++++++-----
>  mm/memory.c                          |  2 +-
>  21 files changed, 432 insertions(+), 35 deletions(-)
>  create mode 100644 arch/arm64/lib/copy_page_mc.S
>  create mode 100644 arch/arm64/lib/copy_to_user_mc.S
> 
> -- 
> 2.18.0.huawei.25
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH -next V2 2/7] arm64: fix page_address return value in copy_highpage
  2022-04-06  9:13 ` [RFC PATCH -next V2 2/7] arm64: fix page_address return value in copy_highpage Tong Tiangen
@ 2022-04-06 10:22   ` Mark Rutland
  2022-04-06 12:47     ` Tong Tiangen
  0 siblings, 1 reply; 28+ messages in thread
From: Mark Rutland @ 2022-04-06 10:22 UTC (permalink / raw)
  To: Tong Tiangen
  Cc: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Catalin Marinas, Will Deacon, Alexander Viro, x86,
	H. Peter Anvin, linux-arm-kernel, linux-kernel, linux-mm,
	Vincenzo Frascino

On Wed, Apr 06, 2022 at 09:13:06AM +0000, Tong Tiangen wrote:
> Function page_address return void, fix it.
> 
> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>

This looks like a sensible cleanup, but the commit title and message aren't
that clear.

Can you please make this:

| arm64: fix types in copy_highpage()
|
| In copy_highpage() the `kto` and `kfrom` local variables are pointers to
| struct page, but these are used to hold arbitrary pointers to kernel memory.
| Each call to page_address() returns a void pointer to memory associated with
| the relevant page, and copy_page() expects void pointers to this memory.
|
| This inconsistency was introduced in commit:
|
|   2563776b41c31908 ("arm64: mte: Tags-aware copy_{user_,}highpage() implementations")
|
| ... and while this doesn't appear to be harmful in practice it is clearly wrong.
|
| Correct this by making `kto` and `kfrom` void pointers.
|
| Fixes: 2563776b41c31908 ("arm64: mte: Tags-aware copy_{user_,}highpage() implementations")

With that:

Acked-by: Mark Rutland <mark.rutland@arm.com>

Thanks,
Mark.

> ---
>  arch/arm64/mm/copypage.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/mm/copypage.c b/arch/arm64/mm/copypage.c
> index b5447e53cd73..0dea80bf6de4 100644
> --- a/arch/arm64/mm/copypage.c
> +++ b/arch/arm64/mm/copypage.c
> @@ -16,8 +16,8 @@
>  
>  void copy_highpage(struct page *to, struct page *from)
>  {
> -	struct page *kto = page_address(to);
> -	struct page *kfrom = page_address(from);
> +	void *kto = page_address(to);
> +	void *kfrom = page_address(from);
>  
>  	copy_page(kto, kfrom);
>  
> -- 
> 2.18.0.huawei.25
> 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH -next V2 3/7] arm64: add support for machine check error safe
  2022-04-06  9:13 ` [RFC PATCH -next V2 3/7] arm64: add support for machine check error safe Tong Tiangen
@ 2022-04-06 10:58   ` Mark Rutland
  2022-04-07 14:26     ` Tong Tiangen
  0 siblings, 1 reply; 28+ messages in thread
From: Mark Rutland @ 2022-04-06 10:58 UTC (permalink / raw)
  To: Tong Tiangen
  Cc: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Catalin Marinas, Will Deacon, Alexander Viro, x86,
	H. Peter Anvin, linux-arm-kernel, linux-kernel, linux-mm,
	james.morse

On Wed, Apr 06, 2022 at 09:13:07AM +0000, Tong Tiangen wrote:
> In the arm64 kernel hardware memory error handling path (do_sea()), if
> the error is consumed in the kernel, the current processing is to panic.
> However, this is not optimal: in some cases the page accessed in the
> kernel is a user page (such as via copy_from_user/get_user), and killing
> the user process and isolating the user page with hardware memory errors
> is a better choice.
>
> Consistent with PPC/x86, this is implemented via CONFIG_ARCH_HAS_COPY_MC.

Why do we need new helpers for this, rather than doing this for *any* uaccess?

I understand this is consistent with PPC & X86, but *why* is it done that way
today? e.g. are there cases where we access memory where we do not expect the
situation to be recoverable?

> This patch only enables the machine check safe framework: it adds an
> exception fixup before the kernel panic in do_sea(), limited to hardware
> memory errors consumed in kernel mode while operating on behalf of a
> user mode process. If the fixup succeeds, there is no need to panic.
>
> Also add the _asm_extable_mc macro, used to add the extable entries that
> drive this fixup.
> 
> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
> ---
>  arch/arm64/Kconfig                   |  1 +
>  arch/arm64/include/asm/asm-extable.h | 13 ++++++++++++
>  arch/arm64/include/asm/esr.h         |  5 +++++
>  arch/arm64/include/asm/extable.h     |  2 +-
>  arch/arm64/kernel/probes/kprobes.c   |  2 +-
>  arch/arm64/mm/extable.c              | 20 ++++++++++++++++++-
>  arch/arm64/mm/fault.c                | 30 +++++++++++++++++++++++++++-
>  include/linux/uaccess.h              |  8 ++++++++
>  8 files changed, 77 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index d9325dd95eba..012e38309955 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -19,6 +19,7 @@ config ARM64
>  	select ARCH_ENABLE_SPLIT_PMD_PTLOCK if PGTABLE_LEVELS > 2
>  	select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
>  	select ARCH_HAS_CACHE_LINE_SIZE
> +	select ARCH_HAS_COPY_MC if ACPI_APEI_GHES
>  	select ARCH_HAS_CURRENT_STACK_POINTER
>  	select ARCH_HAS_DEBUG_VIRTUAL
>  	select ARCH_HAS_DEBUG_VM_PGTABLE
> diff --git a/arch/arm64/include/asm/asm-extable.h b/arch/arm64/include/asm/asm-extable.h
> index c39f2437e08e..74d1db74fd86 100644
> --- a/arch/arm64/include/asm/asm-extable.h
> +++ b/arch/arm64/include/asm/asm-extable.h
> @@ -8,6 +8,11 @@
>  #define EX_TYPE_UACCESS_ERR_ZERO	3
>  #define EX_TYPE_LOAD_UNALIGNED_ZEROPAD	4
>  
> +/* _MC indicates that we can fix up from machine check errors */
> +#define EX_TYPE_FIXUP_MC		5
> +
> +#define IS_EX_TYPE_MC(type) (type == EX_TYPE_FIXUP_MC)

If we need this, I'd strongly prefer that we have an EX_TYPE_UACCESS_MC or
EX_TYPE_UACCESS_MC_ERR_ZERO for the uaccess cases, so that we can clearly
distinguish those from non-uaccess cases.

AFAICT the only remaining raw EX_TYPE_FIXUP cases we have today are in some
cache maintenance routines, and we should be able to convert those to a new
EX_TYPE_FIXUP_UACCESS, or EX_TYPE_UACCESS_ERR_ZERO.

> +
>  #ifdef __ASSEMBLY__
>  
>  #define __ASM_EXTABLE_RAW(insn, fixup, type, data)	\
> @@ -27,6 +32,14 @@
>  	__ASM_EXTABLE_RAW(\insn, \fixup, EX_TYPE_FIXUP, 0)
>  	.endm
>  
> +/*
> + * Create an exception table entry for `insn`, which will branch to `fixup`
> + * when an unhandled fault (including an SEA) is taken.
> + */
> +	.macro		_asm_extable_mc, insn, fixup
> +	__ASM_EXTABLE_RAW(\insn, \fixup, EX_TYPE_FIXUP_MC, 0)
> +	.endm
> +
>  /*
>   * Create an exception table entry for `insn` if `fixup` is provided. Otherwise
>   * do nothing.
> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
> index d52a0b269ee8..11fcfc002654 100644
> --- a/arch/arm64/include/asm/esr.h
> +++ b/arch/arm64/include/asm/esr.h
> @@ -330,6 +330,11 @@
>  #ifndef __ASSEMBLY__
>  #include <asm/types.h>
>  
> +static inline bool esr_is_sea(u32 esr)
> +{
> +	return (esr & ESR_ELx_FSC) == ESR_ELx_FSC_EXTABT;
> +}
> +
>  static inline bool esr_is_data_abort(u32 esr)
>  {
>  	const u32 ec = ESR_ELx_EC(esr);
> diff --git a/arch/arm64/include/asm/extable.h b/arch/arm64/include/asm/extable.h
> index 72b0e71cc3de..f7835b0f473b 100644
> --- a/arch/arm64/include/asm/extable.h
> +++ b/arch/arm64/include/asm/extable.h
> @@ -45,5 +45,5 @@ bool ex_handler_bpf(const struct exception_table_entry *ex,
>  }
>  #endif /* !CONFIG_BPF_JIT */
>  
> -bool fixup_exception(struct pt_regs *regs);
> +bool fixup_exception(struct pt_regs *regs, unsigned int esr);
>  #endif
> diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
> index d9dfa82c1f18..16a069e8eec3 100644
> --- a/arch/arm64/kernel/probes/kprobes.c
> +++ b/arch/arm64/kernel/probes/kprobes.c
> @@ -285,7 +285,7 @@ int __kprobes kprobe_fault_handler(struct pt_regs *regs, unsigned int fsr)
>  		 * In case the user-specified fault handler returned
>  		 * zero, try to fix up.
>  		 */
> -		if (fixup_exception(regs))
> +		if (fixup_exception(regs, fsr))
>  			return 1;
>  	}
>  	return 0;
> diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c
> index 489455309695..f1134c88e849 100644
> --- a/arch/arm64/mm/extable.c
> +++ b/arch/arm64/mm/extable.c
> @@ -9,6 +9,7 @@
>  
>  #include <asm/asm-extable.h>
>  #include <asm/ptrace.h>
> +#include <asm/esr.h>
>  
>  static inline unsigned long
>  get_ex_fixup(const struct exception_table_entry *ex)
> @@ -23,6 +24,18 @@ static bool ex_handler_fixup(const struct exception_table_entry *ex,
>  	return true;
>  }
>  
> +static bool ex_handler_fixup_mc(const struct exception_table_entry *ex,
> +				struct pt_regs *regs, unsigned int esr)
> +{
> +	if (esr_is_sea(esr))
> +		regs->regs[0] = 0;
> +	else
> +		regs->regs[0] = 1;

This needs more explanation.

Why does this hard-code an assumption that we can alter x0?

Why is the x0 value distinct for SEA or non-SEA? What is this meant to
represent specifically?

What if this SEA was taken for a reason other than a memory error?

> +
> +	regs->pc = get_ex_fixup(ex);
> +	return true;
> +}
> +
>  static bool ex_handler_uaccess_err_zero(const struct exception_table_entry *ex,
>  					struct pt_regs *regs)
>  {
> @@ -63,7 +76,7 @@ ex_handler_load_unaligned_zeropad(const struct exception_table_entry *ex,
>  	return true;
>  }
>  
> -bool fixup_exception(struct pt_regs *regs)
> +bool fixup_exception(struct pt_regs *regs, unsigned int esr)
>  {
>  	const struct exception_table_entry *ex;
>  
> @@ -71,9 +84,14 @@ bool fixup_exception(struct pt_regs *regs)
>  	if (!ex)
>  		return false;
>  
> +	if (esr_is_sea(esr) && !IS_EX_TYPE_MC(ex->type))
> +		return false;

I don't think this check belongs here.

Either this should be folded into ex_handler_fixup_mc(), or we should make the
judgement earlier in the fault handling path, and have a separate
fixup_exception_mc() that we can call specifically in the case of a memory
error.
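
To make that concrete, something like the below (untested, just to show the
shape I have in mind):

	bool fixup_exception_mc(struct pt_regs *regs)
	{
		const struct exception_table_entry *ex;

		ex = search_exception_tables(instruction_pointer(regs));
		if (!ex)
			return false;

		switch (ex->type) {
		case EX_TYPE_UACCESS_MC:
			return ex_handler_fixup(ex, regs);
		}

		return false;
	}

... with do_sea() calling that directly for memory errors, so the regular
fixup_exception() path doesn't need to know about the ESR at all.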

> +
>  	switch (ex->type) {
>  	case EX_TYPE_FIXUP:
>  		return ex_handler_fixup(ex, regs);
> +	case EX_TYPE_FIXUP_MC:
> +		return ex_handler_fixup_mc(ex, regs, esr);
>  	case EX_TYPE_BPF:
>  		return ex_handler_bpf(ex, regs);
>  	case EX_TYPE_UACCESS_ERR_ZERO:
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 77341b160aca..ffdfab2fdd60 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -361,7 +361,7 @@ static void __do_kernel_fault(unsigned long addr, unsigned int esr,
>  	 * Are we prepared to handle this kernel fault?
>  	 * We are almost certainly not prepared to handle instruction faults.
>  	 */
> -	if (!is_el1_instruction_abort(esr) && fixup_exception(regs))
> +	if (!is_el1_instruction_abort(esr) && fixup_exception(regs, esr))
>  		return;
>  
>  	if (WARN_RATELIMIT(is_spurious_el1_translation_fault(addr, esr, regs),
> @@ -695,6 +695,30 @@ static int do_bad(unsigned long far, unsigned int esr, struct pt_regs *regs)
>  	return 1; /* "fault" */
>  }
>  
> +static bool arm64_process_kernel_sea(unsigned long addr, unsigned int esr,
> +				     struct pt_regs *regs, int sig, int code)
> +{
> +	if (!IS_ENABLED(CONFIG_ARCH_HAS_COPY_MC))
> +		return false;
> +
> +	if (user_mode(regs) || !current->mm)
> +		return false;
> +
> +	if (apei_claim_sea(regs) < 0)
> +		return false;
> +
> +	current->thread.fault_address = 0;
> +	current->thread.fault_code = esr;
> +
> +	if (!fixup_exception(regs, esr))
> +		return false;
> +
> +	arm64_force_sig_fault(sig, code, addr,
> +		"Uncorrected hardware memory error in kernel-access\n");
> +
> +	return true;
> +}
> +
>  static int do_sea(unsigned long far, unsigned int esr, struct pt_regs *regs)
>  {
>  	const struct fault_info *inf;
> @@ -720,6 +744,10 @@ static int do_sea(unsigned long far, unsigned int esr, struct pt_regs *regs)
>  		 */
>  		siaddr  = untagged_addr(far);
>  	}
> +
> +	if (arm64_process_kernel_sea(siaddr, esr, regs, inf->sig, inf->code))
> +		return 0;
> +
>  	arm64_notify_die(inf->name, regs, inf->sig, inf->code, siaddr, esr);
>  
>  	return 0;
> diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
> index 546179418ffa..dd952aeecdc1 100644
> --- a/include/linux/uaccess.h
> +++ b/include/linux/uaccess.h
> @@ -174,6 +174,14 @@ copy_mc_to_kernel(void *dst, const void *src, size_t cnt)
>  }
>  #endif
>  
> +#ifndef copy_mc_to_user
> +static inline unsigned long __must_check
> +copy_mc_to_user(void *dst, const void *src, size_t cnt)
> +{
> +	return raw_copy_to_user(dst, src, cnt);
> +}

... this isn't using the new EX_TYPE_FIXUP_MC type, so isn't this just broken
as of this patch?

Thanks,
Mark.

> +#endif
> +
>  static __always_inline void pagefault_disabled_inc(void)
>  {
>  	current->pagefault_disabled++;
> -- 
> 2.18.0.huawei.25
> 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH -next V2 4/7] arm64: add copy_from_user to machine check safe
  2022-04-06  9:13 ` [RFC PATCH -next V2 4/7] arm64: add copy_from_user to machine check safe Tong Tiangen
@ 2022-04-06 11:19   ` Mark Rutland
  2022-04-07 14:28     ` Tong Tiangen
  0 siblings, 1 reply; 28+ messages in thread
From: Mark Rutland @ 2022-04-06 11:19 UTC (permalink / raw)
  To: Tong Tiangen
  Cc: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Catalin Marinas, Will Deacon, Alexander Viro, x86,
	H. Peter Anvin, linux-arm-kernel, linux-kernel, linux-mm

On Wed, Apr 06, 2022 at 09:13:08AM +0000, Tong Tiangen wrote:
> Add the copy_from_user scenario to machine check safe handling.
> 
> The data copied is user data, so it is machine check safe: just kill
> the user process and isolate the error page; a panic is not necessary.
> 
> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
> ---
>  arch/arm64/include/asm/asm-uaccess.h | 16 ++++++++++++++++
>  arch/arm64/lib/copy_from_user.S      | 11 ++++++-----
>  2 files changed, 22 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/asm-uaccess.h b/arch/arm64/include/asm/asm-uaccess.h
> index 0557af834e03..f31c8978e1af 100644
> --- a/arch/arm64/include/asm/asm-uaccess.h
> +++ b/arch/arm64/include/asm/asm-uaccess.h
> @@ -92,4 +92,20 @@ alternative_else_nop_endif
>  
>  		_asm_extable	8888b,\l;
>  	.endm
> +
> +	.macro user_ldp_mc l, reg1, reg2, addr, post_inc
> +8888:		ldtr	\reg1, [\addr];
> +8889:		ldtr	\reg2, [\addr, #8];
> +		add	\addr, \addr, \post_inc;
> +
> +		_asm_extable_mc	8888b, \l;
> +		_asm_extable_mc	8889b, \l;
> +	.endm
> +
> +	.macro user_ldst_mc l, inst, reg, addr, post_inc
> +8888:		\inst		\reg, [\addr];
> +		add		\addr, \addr, \post_inc;
> +
> +		_asm_extable_mc	8888b, \l;
> +	.endm
>  #endif
> diff --git a/arch/arm64/lib/copy_from_user.S b/arch/arm64/lib/copy_from_user.S
> index 34e317907524..d9d7c5291871 100644
> --- a/arch/arm64/lib/copy_from_user.S
> +++ b/arch/arm64/lib/copy_from_user.S
> @@ -21,7 +21,7 @@
>   */
>  
>  	.macro ldrb1 reg, ptr, val
> -	user_ldst 9998f, ldtrb, \reg, \ptr, \val
> +	user_ldst_mc 9998f, ldtrb, \reg, \ptr, \val
>  	.endm
>  
>  	.macro strb1 reg, ptr, val
> @@ -29,7 +29,7 @@
>  	.endm
>  
>  	.macro ldrh1 reg, ptr, val
> -	user_ldst 9997f, ldtrh, \reg, \ptr, \val
> +	user_ldst_mc 9997f, ldtrh, \reg, \ptr, \val
>  	.endm
>  
>  	.macro strh1 reg, ptr, val
> @@ -37,7 +37,7 @@
>  	.endm
>  
>  	.macro ldr1 reg, ptr, val
> -	user_ldst 9997f, ldtr, \reg, \ptr, \val
> +	user_ldst_mc 9997f, ldtr, \reg, \ptr, \val
>  	.endm
>  
>  	.macro str1 reg, ptr, val
> @@ -45,7 +45,7 @@
>  	.endm
>  
>  	.macro ldp1 reg1, reg2, ptr, val
> -	user_ldp 9997f, \reg1, \reg2, \ptr, \val
> +	user_ldp_mc 9997f, \reg1, \reg2, \ptr, \val
>  	.endm
>  
>  	.macro stp1 reg1, reg2, ptr, val
> @@ -62,7 +62,8 @@ SYM_FUNC_START(__arch_copy_from_user)
>  	ret
>  
>  	// Exception fixups
> -9997:	cmp	dst, dstin
> +9997:	cbz	x0, 9998f			// Check machine check exception
> +	cmp	dst, dstin
>  	b.ne	9998f

If you look at the copy template, you'd see that `dstin` *is* x0.

Consider if we took a non-SEA fault. Then the fixup handler will overwrite x0,
it's likely `dst` != `dstin`, and we'll branch to the byte-by-byte copy. Or if
we're doing something odd and mmap_min_addr is 0, we can do the wrong thing the
other way around and *not* branch to the byte-by-byte copy when we should.

So this is at best confusing, but likely broken too.
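
For reference, copy_template.S has:

	dstin	.req	x0

... so the `cbz x0, 9998f` check and the `cmp dst, dstin` which follows it
are both looking at the very register the fixup handler has just written.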

Thanks,
Mark.

>  	// Before being absolutely sure we couldn't copy anything, try harder
>  USER(9998f, ldtrb tmp1w, [srcin])
> -- 
> 2.18.0.huawei.25
> 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH -next V2 5/7] arm64: add get_user to machine check safe
  2022-04-06  9:13 ` [RFC PATCH -next V2 5/7] arm64: add get_user " Tong Tiangen
@ 2022-04-06 11:22   ` Mark Rutland
  2022-04-07 14:38     ` Tong Tiangen
  0 siblings, 1 reply; 28+ messages in thread
From: Mark Rutland @ 2022-04-06 11:22 UTC (permalink / raw)
  To: Tong Tiangen
  Cc: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Catalin Marinas, Will Deacon, Alexander Viro, x86,
	H. Peter Anvin, linux-arm-kernel, linux-kernel, linux-mm

On Wed, Apr 06, 2022 at 09:13:09AM +0000, Tong Tiangen wrote:
> Add the get_user scenario to machine check safe handling. The processing of
> EX_TYPE_UACCESS_ERR_ZERO and EX_TYPE_UACCESS_ERR_ZERO_UCE_RECOVERY is the
> same, and both return -EFAULT.

Which uaccess cases do we expect to *not* be recoverable?

Naively I would assume that if we're going to treat a memory error on a uaccess
as fatal to userspace we should be able to do that for *any* uaccesses.

The commit message should explain why we need the distinction between a
recoverable uaccess and a non-recoverable uaccess.

Thanks,
Mark.

> 
> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
> ---
>  arch/arm64/include/asm/asm-extable.h | 14 +++++++++++++-
>  arch/arm64/include/asm/uaccess.h     |  2 +-
>  arch/arm64/mm/extable.c              |  1 +
>  3 files changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/asm-extable.h b/arch/arm64/include/asm/asm-extable.h
> index 74d1db74fd86..bfc2d224cbae 100644
> --- a/arch/arm64/include/asm/asm-extable.h
> +++ b/arch/arm64/include/asm/asm-extable.h
> @@ -10,8 +10,11 @@
>  
>  /* _MC indicates that can fixup from machine check errors */
>  #define EX_TYPE_FIXUP_MC		5
> +#define EX_TYPE_UACCESS_ERR_ZERO_MC	6
>  
> -#define IS_EX_TYPE_MC(type) (type == EX_TYPE_FIXUP_MC)
> +#define IS_EX_TYPE_MC(type)			\
> +	(type == EX_TYPE_FIXUP_MC ||		\
> +	 type == EX_TYPE_UACCESS_ERR_ZERO_MC)
>  
>  #ifdef __ASSEMBLY__
>  
> @@ -77,6 +80,15 @@
>  #define EX_DATA_REG(reg, gpr)						\
>  	"((.L__gpr_num_" #gpr ") << " __stringify(EX_DATA_REG_##reg##_SHIFT) ")"
>  
> +#define _ASM_EXTABLE_UACCESS_ERR_ZERO_MC(insn, fixup, err, zero)		\
> +	__DEFINE_ASM_GPR_NUMS							\
> +	__ASM_EXTABLE_RAW(#insn, #fixup,					\
> +			  __stringify(EX_TYPE_UACCESS_ERR_ZERO_MC),		\
> +			  "("							\
> +			    EX_DATA_REG(ERR, err) " | "				\
> +			    EX_DATA_REG(ZERO, zero)				\
> +			  ")")
> +
>  #define _ASM_EXTABLE_UACCESS_ERR_ZERO(insn, fixup, err, zero)		\
>  	__DEFINE_ASM_GPR_NUMS						\
>  	__ASM_EXTABLE_RAW(#insn, #fixup, 				\
> diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
> index e8dce0cc5eaa..24b662407fbd 100644
> --- a/arch/arm64/include/asm/uaccess.h
> +++ b/arch/arm64/include/asm/uaccess.h
> @@ -236,7 +236,7 @@ static inline void __user *__uaccess_mask_ptr(const void __user *ptr)
>  	asm volatile(							\
>  	"1:	" load "	" reg "1, [%2]\n"			\
>  	"2:\n"								\
> -	_ASM_EXTABLE_UACCESS_ERR_ZERO(1b, 2b, %w0, %w1)			\
> +	_ASM_EXTABLE_UACCESS_ERR_ZERO_MC(1b, 2b, %w0, %w1)		\
>  	: "+r" (err), "=&r" (x)						\
>  	: "r" (addr))
>  
> diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c
> index f1134c88e849..7c05f8d2bce0 100644
> --- a/arch/arm64/mm/extable.c
> +++ b/arch/arm64/mm/extable.c
> @@ -95,6 +95,7 @@ bool fixup_exception(struct pt_regs *regs, unsigned int esr)
>  	case EX_TYPE_BPF:
>  		return ex_handler_bpf(ex, regs);
>  	case EX_TYPE_UACCESS_ERR_ZERO:
> +	case EX_TYPE_UACCESS_ERR_ZERO_MC:
>  		return ex_handler_uaccess_err_zero(ex, regs);
>  	case EX_TYPE_LOAD_UNALIGNED_ZEROPAD:
>  		return ex_handler_load_unaligned_zeropad(ex, regs);
> -- 
> 2.18.0.huawei.25
> 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH -next V2 7/7] arm64: add pagecache reading to machine check safe
  2022-04-06  9:13 ` [RFC PATCH -next V2 7/7] arm64: add pagecache reading " Tong Tiangen
@ 2022-04-06 11:27   ` Mark Rutland
  2022-04-07 14:56     ` Tong Tiangen
  0 siblings, 1 reply; 28+ messages in thread
From: Mark Rutland @ 2022-04-06 11:27 UTC (permalink / raw)
  To: Tong Tiangen
  Cc: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Catalin Marinas, Will Deacon, Alexander Viro, x86,
	H. Peter Anvin, linux-arm-kernel, linux-kernel, linux-mm

On Wed, Apr 06, 2022 at 09:13:11AM +0000, Tong Tiangen wrote:
> When a user process reads a file, the data is cached in the pagecache and
> that data belongs to the user process. When a machine check error is
> encountered during pagecache reading, killing the user process and
> isolating the user page with hardware memory errors is a more reasonable
> choice than a kernel panic.
> 
> The __arch_copy_mc_to_user() in copy_to_user_mc.S largely borrows
> from __arch_copy_to_user() in copy_to_user.S; the main difference is
> that __arch_copy_mc_to_user() adds the extable entries needed to be
> machine check safe.

As with prior patches, *why* is the distinction necessary?

This patch adds a bunch of conditional logic, but *structurally* it doesn't
alter the handling to be substantially different for the MC and non-MC cases.

This seems like pointless duplication that just makes it harder to maintain
this code.
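
For example (sketch only), the existing template macros could take the
extable type as a parameter instead of being forked wholesale:

	.macro user_ldst l, inst, reg, addr, post_inc, type=EX_TYPE_FIXUP
	8888:		\inst		\reg, [\addr];
			add		\addr, \addr, \post_inc;

			__ASM_EXTABLE_RAW(8888b, \l, \type, 0)
	.endm

... and then the MC and non-MC copy routines would differ only in the type
they pass, rather than duplicating the whole file.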

Thanks,
Mark.

> In _copy_page_to_iter(), machine check safety is only considered for
> ITER_IOVEC, which is what pagecache reading uses.
> 
> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
> ---
>  arch/arm64/include/asm/uaccess.h | 15 ++++++
>  arch/arm64/lib/Makefile          |  2 +-
>  arch/arm64/lib/copy_to_user_mc.S | 78 +++++++++++++++++++++++++++++
>  include/linux/uio.h              |  9 +++-
>  lib/iov_iter.c                   | 85 +++++++++++++++++++++++++-------
>  5 files changed, 170 insertions(+), 19 deletions(-)
>  create mode 100644 arch/arm64/lib/copy_to_user_mc.S
> 
> diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
> index 24b662407fbd..f0d5e811165a 100644
> --- a/arch/arm64/include/asm/uaccess.h
> +++ b/arch/arm64/include/asm/uaccess.h
> @@ -448,6 +448,21 @@ extern long strncpy_from_user(char *dest, const char __user *src, long count);
>  
>  extern __must_check long strnlen_user(const char __user *str, long n);
>  
> +#ifdef CONFIG_ARCH_HAS_COPY_MC
> +extern unsigned long __must_check __arch_copy_mc_to_user(void __user *to,
> +							 const void *from, unsigned long n);
> +static inline unsigned long __must_check
> +copy_mc_to_user(void __user *to, const void *from, unsigned long n)
> +{
> +	uaccess_ttbr0_enable();
> +	n = __arch_copy_mc_to_user(__uaccess_mask_ptr(to), from, n);
> +	uaccess_ttbr0_disable();
> +
> +	return n;
> +}
> +#define copy_mc_to_user copy_mc_to_user
> +#endif
> +
>  #ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
>  struct page;
>  void memcpy_page_flushcache(char *to, struct page *page, size_t offset, size_t len);
> diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
> index 29c578414b12..9b3571227fb4 100644
> --- a/arch/arm64/lib/Makefile
> +++ b/arch/arm64/lib/Makefile
> @@ -23,4 +23,4 @@ obj-$(CONFIG_ARM64_MTE) += mte.o
>  
>  obj-$(CONFIG_KASAN_SW_TAGS) += kasan_sw_tags.o
>  
> -obj-$(CONFIG_ARCH_HAS_CPY_MC) += copy_page_mc.o
> +obj-$(CONFIG_ARCH_HAS_COPY_MC) += copy_page_mc.o copy_to_user_mc.o
> diff --git a/arch/arm64/lib/copy_to_user_mc.S b/arch/arm64/lib/copy_to_user_mc.S
> new file mode 100644
> index 000000000000..9d228ff15446
> --- /dev/null
> +++ b/arch/arm64/lib/copy_to_user_mc.S
> @@ -0,0 +1,78 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2012 ARM Ltd.
> + */
> +
> +#include <linux/linkage.h>
> +
> +#include <asm/asm-uaccess.h>
> +#include <asm/assembler.h>
> +#include <asm/cache.h>
> +
> +/*
> + * Copy to user space from a kernel buffer (alignment handled by the hardware)
> + *
> + * Parameters:
> + *	x0 - to
> + *	x1 - from
> + *	x2 - n
> + * Returns:
> + *	x0 - bytes not copied
> + */
> +	.macro ldrb1 reg, ptr, val
> +	1000: ldrb  \reg, [\ptr], \val;
> +	_asm_extable_mc 1000b, 9998f;
> +	.endm
> +
> +	.macro strb1 reg, ptr, val
> +	user_ldst_mc 9998f, sttrb, \reg, \ptr, \val
> +	.endm
> +
> +	.macro ldrh1 reg, ptr, val
> +	1001: ldrh  \reg, [\ptr], \val;
> +	_asm_extable_mc 1001b, 9998f;
> +	.endm
> +
> +	.macro strh1 reg, ptr, val
> +	user_ldst_mc 9997f, sttrh, \reg, \ptr, \val
> +	.endm
> +
> +	.macro ldr1 reg, ptr, val
> +	1002: ldr \reg, [\ptr], \val;
> +	_asm_extable_mc 1002b, 9998f;
> +	.endm
> +
> +	.macro str1 reg, ptr, val
> +	user_ldst_mc 9997f, sttr, \reg, \ptr, \val
> +	.endm
> +
> +	.macro ldp1 reg1, reg2, ptr, val
> +	1003: ldp \reg1, \reg2, [\ptr], \val;
> +	_asm_extable_mc 1003b, 9998f;
> +	.endm
> +
> +	.macro stp1 reg1, reg2, ptr, val
> +	user_stp 9997f, \reg1, \reg2, \ptr, \val
> +	.endm
> +
> +end	.req	x5
> +srcin	.req	x15
> +SYM_FUNC_START(__arch_copy_mc_to_user)
> +	add	end, x0, x2
> +	mov	srcin, x1
> +#include "copy_template.S"
> +	mov	x0, #0
> +	ret
> +
> +	// Exception fixups
> +9997:	cbz	x0, 9998f			// Check machine check exception
> +	cmp	dst, dstin
> +	b.ne	9998f
> +	// Before being absolutely sure we couldn't copy anything, try harder
> +	ldrb	tmp1w, [srcin]
> +USER(9998f, sttrb tmp1w, [dst])
> +	add	dst, dst, #1
> +9998:	sub	x0, end, dst			// bytes not copied
> +	ret
> +SYM_FUNC_END(__arch_copy_mc_to_user)
> +EXPORT_SYMBOL(__arch_copy_mc_to_user)
> diff --git a/include/linux/uio.h b/include/linux/uio.h
> index 739285fe5a2f..539d9ee9b032 100644
> --- a/include/linux/uio.h
> +++ b/include/linux/uio.h
> @@ -147,10 +147,17 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
>  size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i);
>  size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i);
>  
> +#ifdef CONFIG_ARCH_HAS_COPY_MC
> +size_t copy_mc_page_to_iter(struct page *page, size_t offset, size_t bytes,
> +			    struct iov_iter *i);
> +#else
> +#define copy_mc_page_to_iter copy_page_to_iter
> +#endif
> +
>  static inline size_t copy_folio_to_iter(struct folio *folio, size_t offset,
>  		size_t bytes, struct iov_iter *i)
>  {
> -	return copy_page_to_iter(&folio->page, offset, bytes, i);
> +	return copy_mc_page_to_iter(&folio->page, offset, bytes, i);
>  }
>  
>  static __always_inline __must_check
> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> index 6dd5330f7a99..2c5f3bb6391d 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -157,6 +157,19 @@ static int copyout(void __user *to, const void *from, size_t n)
>  	return n;
>  }
>  
> +#ifdef CONFIG_ARCH_HAS_COPY_MC
> +static int copyout_mc(void __user *to, const void *from, size_t n)
> +{
> +	if (access_ok(to, n)) {
> +		instrument_copy_to_user(to, from, n);
> +		n = copy_mc_to_user((__force void *) to, from, n);
> +	}
> +	return n;
> +}
> +#else
> +#define copyout_mc copyout
> +#endif
> +
>  static int copyin(void *to, const void __user *from, size_t n)
>  {
>  	if (should_fail_usercopy())
> @@ -169,7 +182,7 @@ static int copyin(void *to, const void __user *from, size_t n)
>  }
>  
>  static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t bytes,
> -			 struct iov_iter *i)
> +			 struct iov_iter *i, bool mc_safe)
>  {
>  	size_t skip, copy, left, wanted;
>  	const struct iovec *iov;
> @@ -194,7 +207,10 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t b
>  		from = kaddr + offset;
>  
>  		/* first chunk, usually the only one */
> -		left = copyout(buf, from, copy);
> +		if (mc_safe)
> +			left = copyout_mc(buf, from, copy);
> +		else
> +			left = copyout(buf, from, copy);
>  		copy -= left;
>  		skip += copy;
>  		from += copy;
> @@ -204,7 +220,10 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t b
>  			iov++;
>  			buf = iov->iov_base;
>  			copy = min(bytes, iov->iov_len);
> -			left = copyout(buf, from, copy);
> +			if (mc_safe)
> +				left = copyout_mc(buf, from, copy);
> +			else
> +				left = copyout(buf, from, copy);
>  			copy -= left;
>  			skip = copy;
>  			from += copy;
> @@ -223,7 +242,10 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t b
>  
>  	kaddr = kmap(page);
>  	from = kaddr + offset;
> -	left = copyout(buf, from, copy);
> +	if (mc_safe)
> +		left = copyout_mc(buf, from, copy);
> +	else
> +		left = copyout(buf, from, copy);
>  	copy -= left;
>  	skip += copy;
>  	from += copy;
> @@ -232,7 +254,10 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t b
>  		iov++;
>  		buf = iov->iov_base;
>  		copy = min(bytes, iov->iov_len);
> -		left = copyout(buf, from, copy);
> +		if (mc_safe)
> +			left = copyout_mc(buf, from, copy);
> +		else
> +			left = copyout(buf, from, copy);
>  		copy -= left;
>  		skip = copy;
>  		from += copy;
> @@ -674,15 +699,6 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
>  EXPORT_SYMBOL(_copy_to_iter);
>  
>  #ifdef CONFIG_ARCH_HAS_COPY_MC
> -static int copyout_mc(void __user *to, const void *from, size_t n)
> -{
> -	if (access_ok(to, n)) {
> -		instrument_copy_to_user(to, from, n);
> -		n = copy_mc_to_user((__force void *) to, from, n);
> -	}
> -	return n;
> -}
> -
>  static size_t copy_mc_pipe_to_iter(const void *addr, size_t bytes,
>  				struct iov_iter *i)
>  {
> @@ -846,10 +862,10 @@ static inline bool page_copy_sane(struct page *page, size_t offset, size_t n)
>  }
>  
>  static size_t __copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
> -			 struct iov_iter *i)
> +			 struct iov_iter *i, bool mc_safe)
>  {
>  	if (likely(iter_is_iovec(i)))
> -		return copy_page_to_iter_iovec(page, offset, bytes, i);
> +		return copy_page_to_iter_iovec(page, offset, bytes, i, mc_safe);
>  	if (iov_iter_is_bvec(i) || iov_iter_is_kvec(i) || iov_iter_is_xarray(i)) {
>  		void *kaddr = kmap_local_page(page);
>  		size_t wanted = _copy_to_iter(kaddr + offset, bytes, i);
> @@ -878,7 +894,7 @@ size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
>  	offset %= PAGE_SIZE;
>  	while (1) {
>  		size_t n = __copy_page_to_iter(page, offset,
> -				min(bytes, (size_t)PAGE_SIZE - offset), i);
> +				min(bytes, (size_t)PAGE_SIZE - offset), i, false);
>  		res += n;
>  		bytes -= n;
>  		if (!bytes || !n)
> @@ -893,6 +909,41 @@ size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
>  }
>  EXPORT_SYMBOL(copy_page_to_iter);
>  
> +#ifdef CONFIG_ARCH_HAS_COPY_MC
> +/**
> + * copy_mc_page_to_iter - copy page to iter with source memory error exception handling.
> + *
> + * The filemap_read deploys this for pagecache reading and the main differences between
> + * this and typical copy_page_to_iter() is call __copy_page_to_iter with mc_safe true.
> + *
> + * Return: number of bytes copied (may be %0)
> + */
> +size_t copy_mc_page_to_iter(struct page *page, size_t offset, size_t bytes,
> +			 struct iov_iter *i)
> +{
> +	size_t res = 0;
> +
> +	if (unlikely(!page_copy_sane(page, offset, bytes)))
> +		return 0;
> +	page += offset / PAGE_SIZE; // first subpage
> +	offset %= PAGE_SIZE;
> +	while (1) {
> +		size_t n = __copy_page_to_iter(page, offset,
> +				min(bytes, (size_t)PAGE_SIZE - offset), i, true);
> +		res += n;
> +		bytes -= n;
> +		if (!bytes || !n)
> +			break;
> +		offset += n;
> +		if (offset == PAGE_SIZE) {
> +			page++;
> +			offset = 0;
> +		}
> +	}
> +	return res;
> +}
> +#endif
> +
>  size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
>  			 struct iov_iter *i)
>  {
> -- 
> 2.18.0.huawei.25
> 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH -next V2 2/7] arm64: fix page_address return value in copy_highpage
  2022-04-06 10:22   ` Mark Rutland
@ 2022-04-06 12:47     ` Tong Tiangen
  0 siblings, 0 replies; 28+ messages in thread
From: Tong Tiangen @ 2022-04-06 12:47 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Catalin Marinas, Will Deacon, Alexander Viro, x86,
	H. Peter Anvin, linux-arm-kernel, linux-kernel, linux-mm,
	Vincenzo Frascino, james.morse



On 2022/4/6 18:22, Mark Rutland wrote:
> On Wed, Apr 06, 2022 at 09:13:06AM +0000, Tong Tiangen wrote:
>> Function page_address return void, fix it.
>>
>> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
> 
> This looks like a sensible cleanup, but the commit title and message aren't
> that clear.
> 
> Can you please make this:
> 
> | arm64: fix types in copy_highpage()
> |
> | In copy_highpage() the `kto` and `kfrom` local variables are pointers to
> | struct page, but these are used to hold arbitrary pointers to kernel memory.
> | Each call to page_address() returns a void pointer to memory associated with
> | the relevant page, and copy_page() expects void pointers to this memory.
> |
> | This inconsistency was introduced in commit:
> |
> |   2563776b41c31908 ("arm64: mte: Tags-aware copy_{user_,}highpage() implementations")
> |
> | ... and while this doesn't appear to be harmful in practice it is clearly wrong.
> |
> | Correct this by making `kto` and `kfrom` void pointers.
> |
> | Fixes: 2563776b41c31908 ("arm64: mte: Tags-aware copy_{user_,}highpage() implementations")
> 
> With that:
> 
> Acked-by: Mark Rutland <mark.rutland@arm.com>
> 
> Thanks,
> Mark.
> 

OK, sure, will do in next version.

Thanks.
Tong.

>> ---
>>   arch/arm64/mm/copypage.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/mm/copypage.c b/arch/arm64/mm/copypage.c
>> index b5447e53cd73..0dea80bf6de4 100644
>> --- a/arch/arm64/mm/copypage.c
>> +++ b/arch/arm64/mm/copypage.c
>> @@ -16,8 +16,8 @@
>>   
>>   void copy_highpage(struct page *to, struct page *from)
>>   {
>> -	struct page *kto = page_address(to);
>> -	struct page *kfrom = page_address(from);
>> +	void *kto = page_address(to);
>> +	void *kfrom = page_address(from);
>>   
>>   	copy_page(kto, kfrom);
>>   
>> -- 
>> 2.18.0.huawei.25
>>
>>
> .

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH -next V2 0/7]arm64: add machine check safe support
  2022-04-06 10:04 ` [RFC PATCH -next V2 0/7]arm64: add machine check safe support Mark Rutland
@ 2022-04-07  4:21   ` Tong Tiangen
  0 siblings, 0 replies; 28+ messages in thread
From: Tong Tiangen @ 2022-04-07  4:21 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Catalin Marinas, Will Deacon, Alexander Viro, x86,
	H. Peter Anvin, linux-arm-kernel, linux-kernel, linux-mm,
	james.morse



On 2022/4/6 18:04, Mark Rutland wrote:
> Hi,
> 
> In future, for the arm64 uaccess stuff, could you please CC me, and for the
> arm64 RAS bits (e.g. the SEA handling), could you please CC James Morse?

ok :)

> 
> On Wed, Apr 06, 2022 at 09:13:04AM +0000, Tong Tiangen wrote:
>> This patchset is based on[1].
> 
> That link below appears to be a single patch. Sending that separately makes
> this harder to review, so in future could you please send this as a combined
> series?
> 
>> With the increase of memory capacity and density, the probability of
>> memory error increases. The increasing size and density of server RAM
>> in the data center and cloud have shown increased uncorrectable memory
>> errors.
>>
>> Currently, the kernel has a mechanism to recover from hardware memory
>> errors. This patchset provides an new recovery mechanism.
>>
>> For ARM64, the hardware error handling is do_sea() which divided into
>> two cases:
>> 1. The user state consumed the memory errors, the solution is kill th
>>       user process and isolate the error page.
>> 2. The kernel state consumed the memory errors, the solution is panic.
>>
>> For kernelspace, Undifferentiated panic maybe not the optimal choice,
>> it can be handled better.
>>
>> This patchset deals with four sscenarios of hardware memory error consumed
>> in kernelspace:
>> 1. copy_from_user.
>> 2. get_user.
> 
> What about atomics to user memory? e.g. futexes, or the armv8_deprecated
> emulations?
> 
> It seems the assumption is that writing to user memory (e.g. copy_to_user() and
> put_user()) doesn't matter? Could you please mention why? e.g. do we never take
> an exception for writes to memory with errors?

First, let me explain why we only pay attention to errors that occur when
reading memory, not when writing memory:

1. When Linux reads a page, Linux is the consumer[*] and the DDR controller
is the producer. If the page has a memory error, Linux consumes the error,
receives an error signal, and then processes that signal.

2. When Linux writes a page, Linux is the producer and the DDR controller is
the consumer; the DDR controller will process the memory error.

3. From the perspective of Linux, we only focus here on its role as a
consumer, i.e. on how Linux responds to errors when reading pages.

[*] For definitions of producers and consumers, refer to the documentation:
Reliability, Availability, and Serviceability (RAS) Architecture Extension

Second, let me explain why writing to user memory doesn't matter.

"Doesn't matter" means that we will not deal with it in this patchset, but
will follow the kernel's current strategy (kernel panic). Take
copy_from[to]_user/get[put]_user as an example:

1. copy_to_user()/put_user() read a kernel page and write to a user page. We
cannot judge the importance of the kernel data held in that kernel page, so
if a memory error is encountered while reading it, the normal operation of
the system after recovery cannot be guaranteed. The kernel's current
strategy here is to panic, and we do not change that.

2. copy_from_user()/get_user() read a user page and write to a kernel page
in user process context. That user data is only critical to the user process
itself and does not affect the operation of the whole system. Therefore, if
a memory error is encountered while reading, we can recover by killing the
process and isolating the bad user page instead of panicking. This patchset
is aimed at that situation.
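
Put differently, the two directions look like this:

  copy_to_user()/put_user():   read kernel page -> write user page
                               (error consumed reading kernel data: keep the panic)
  copy_from_user()/get_user(): read user page -> write kernel page
                               (error consumed reading user data: kill + isolate)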
> 
>> 3. cow(copy on write).
>> 4. pagecache reading.
> 
> There are a bunch of other places where we'll access user memory via the linear
> map, so I assume this is just a best-effort "try not to die" rather than "never
> die" ?
> 
> Are there other places we might need/want to expand this to in future?
> 
> Thanks,
> Mark.

Yes.

The strategy is "try not to die" in certain specific scenarios.

In both cases (cow and pagecache reading), when a page with a memory error
is read in user process context, the result is not fatal, because the data
in the bad page is only critical to that user process. Killing the process
and isolating the error page will not affect the normal operation of the
system.

I hope this explains it clearly.

Great thanks to Mark, and I hope James can help take a look at this idea.

Thanks.
Tong.
> 
>> These four scenarios have similarities. Although the error is consumed in
>> the kernel state, but the consumed data belongs to the user state.
>>
>> The processing scheme is based on CONFIG_ARCH_HAS_COPY_MC and uses the
>> process killing plus isolate error page to replace kernel panic.
>>
>> [1]https://lore.kernel.org/lkml/20220323033705.3966643-1-tongtiangen@huawei.com/
>>
>> Since V2:
>>   1.Consistent with PPC/x86, Using CONFIG_ARCH_HAS_COPY_MC instead of
>>     ARM64_UCE_KERNEL_RECOVERY.
>>   2.Add two new scenarios, cow and pagecache reading.
>>   3.Fix two small bug(the first two patch).
>>
>> Tong Tiangen (7):
>>    x86: fix copy_mc_to_user compile error
>>    arm64: fix page_address return value in copy_highpage
>>    arm64: add support for machine check error safe
>>    arm64: add copy_from_user to machine check safe
>>    arm64: add get_user to machine check safe
>>    arm64: add cow to machine check safe
>>    arm64: add pagecache reading to machine check safe
>>
>>   arch/arm64/Kconfig                   |  1 +
>>   arch/arm64/include/asm/asm-extable.h | 25 +++++++
>>   arch/arm64/include/asm/asm-uaccess.h | 16 +++++
>>   arch/arm64/include/asm/esr.h         |  5 ++
>>   arch/arm64/include/asm/extable.h     |  2 +-
>>   arch/arm64/include/asm/page.h        | 10 +++
>>   arch/arm64/include/asm/uaccess.h     | 17 ++++-
>>   arch/arm64/kernel/probes/kprobes.c   |  2 +-
>>   arch/arm64/lib/Makefile              |  2 +
>>   arch/arm64/lib/copy_from_user.S      | 11 ++--
>>   arch/arm64/lib/copy_page_mc.S        | 98 ++++++++++++++++++++++++++++
>>   arch/arm64/lib/copy_to_user_mc.S     | 78 ++++++++++++++++++++++
>>   arch/arm64/mm/copypage.c             | 36 ++++++++--
>>   arch/arm64/mm/extable.c              | 21 +++++-
>>   arch/arm64/mm/fault.c                | 30 ++++++++-
>>   arch/x86/include/asm/uaccess.h       |  1 +
>>   include/linux/highmem.h              |  8 +++
>>   include/linux/uaccess.h              |  8 +++
>>   include/linux/uio.h                  |  9 ++-
>>   lib/iov_iter.c                       | 85 +++++++++++++++++++-----
>>   mm/memory.c                          |  2 +-
>>   21 files changed, 432 insertions(+), 35 deletions(-)
>>   create mode 100644 arch/arm64/lib/copy_page_mc.S
>>   create mode 100644 arch/arm64/lib/copy_to_user_mc.S
>>
>> -- 
>> 2.18.0.huawei.25
>>
>>
> .

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH -next V2 3/7] arm64: add support for machine check error safe
  2022-04-06 10:58   ` Mark Rutland
@ 2022-04-07 14:26     ` Tong Tiangen
  0 siblings, 0 replies; 28+ messages in thread
From: Tong Tiangen @ 2022-04-07 14:26 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Catalin Marinas, Will Deacon, Alexander Viro, x86,
	H. Peter Anvin, linux-arm-kernel, linux-kernel, linux-mm,
	james.morse



On 2022/4/6 18:58, Mark Rutland wrote:
> On Wed, Apr 06, 2022 at 09:13:07AM +0000, Tong Tiangen wrote:
>> In the arm64 kernel's hardware memory error handling (do_sea()), if the
>> error is consumed in the kernel, the current handling is to panic. However,
>> that is not optimal: in some cases the page accessed in the kernel is a
>> user page (such as in copy_from_user/get_user), and killing the user
>> process and isolating the user page with hardware memory errors is a
>> better choice.
>>
>> Consistent with PPC/x86, it is implemented by CONFIG_ARCH_HAS_COPY_MC.
> 
> Why do we need new helpers for this, rather than doing this for *any* uaccess?
> I understand this is consistent with PPC & X86, but *why* is it done that way
> today? e.g. are there cases where we access memory where we do not expect the
> situation to be recoverable?

Here [1] I explain why not all uaccesses need to be recoverable.

[1]https://lore.kernel.org/lkml/bdf67ff6-9eb8-c2b4-2c2f-b160d4f879cd@huawei.com/

> 
>> This patch only enables the machine check error framework: it adds an
>> exception fixup before the kernel panic in do_sea(), limited to hardware
>> memory errors consumed in kernel mode that were triggered by user-mode
>> processes. If the fixup succeeds, there is no need to panic.
>>
>> Also add the _asm_extable_mc macro, used to add extable entries for this
>> fixup.
>>
>> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
>> ---
>>   arch/arm64/Kconfig                   |  1 +
>>   arch/arm64/include/asm/asm-extable.h | 13 ++++++++++++
>>   arch/arm64/include/asm/esr.h         |  5 +++++
>>   arch/arm64/include/asm/extable.h     |  2 +-
>>   arch/arm64/kernel/probes/kprobes.c   |  2 +-
>>   arch/arm64/mm/extable.c              | 20 ++++++++++++++++++-
>>   arch/arm64/mm/fault.c                | 30 +++++++++++++++++++++++++++-
>>   include/linux/uaccess.h              |  8 ++++++++
>>   8 files changed, 77 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index d9325dd95eba..012e38309955 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -19,6 +19,7 @@ config ARM64
>>   	select ARCH_ENABLE_SPLIT_PMD_PTLOCK if PGTABLE_LEVELS > 2
>>   	select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
>>   	select ARCH_HAS_CACHE_LINE_SIZE
>> +	select ARCH_HAS_COPY_MC if ACPI_APEI_GHES
>>   	select ARCH_HAS_CURRENT_STACK_POINTER
>>   	select ARCH_HAS_DEBUG_VIRTUAL
>>   	select ARCH_HAS_DEBUG_VM_PGTABLE
>> diff --git a/arch/arm64/include/asm/asm-extable.h b/arch/arm64/include/asm/asm-extable.h
>> index c39f2437e08e..74d1db74fd86 100644
>> --- a/arch/arm64/include/asm/asm-extable.h
>> +++ b/arch/arm64/include/asm/asm-extable.h
>> @@ -8,6 +8,11 @@
>>   #define EX_TYPE_UACCESS_ERR_ZERO	3
>>   #define EX_TYPE_LOAD_UNALIGNED_ZEROPAD	4
>>   
>> +/* _MC indicates that can fixup from machine check errors */
>> +#define EX_TYPE_FIXUP_MC		5
>> +
>> +#define IS_EX_TYPE_MC(type) (type == EX_TYPE_FIXUP_MC)
> 
> If we need this, I'd strongly prefer that we have a EX_TYPE_UACCESS_MC or
> EX_TYPE_UACCESS_MC_ERR_ZERO for the uaccess cases, so that we can clearly
> distinguish those from non-uaccess cases.

Agreed.

> 
> AFAICT the only remaining raw EX_TYPE_FIXUP cases we have today are in some
> cache maintenance routines, and we should be able to convert those to a new
> EX_TYPE_FIXUP_UACCESS, or EX_TYPE_UACCESS_ERR_ZERO.

I will redesign this part. The general idea is not to use x0 to save
exception information; at that point it may be merged into the raw
EX_TYPE_FIXUP processing.
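
Something like the below (just a rough sketch, reusing the ERR register
encoding that EX_TYPE_UACCESS_ERR_ZERO already has), so the handler reports
-EFAULT through whichever register the extable entry names instead of
clobbering a hard-coded x0:

	static bool ex_handler_uaccess_mc(const struct exception_table_entry *ex,
					  struct pt_regs *regs)
	{
		int reg_err = FIELD_GET(EX_DATA_REG_ERR, ex->data);

		pt_regs_write_reg(regs, reg_err, -EFAULT);
		regs->pc = get_ex_fixup(ex);
		return true;
	}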
> 
>> +
>>   #ifdef __ASSEMBLY__
>>   
>>   #define __ASM_EXTABLE_RAW(insn, fixup, type, data)	\
>> @@ -27,6 +32,14 @@
>>   	__ASM_EXTABLE_RAW(\insn, \fixup, EX_TYPE_FIXUP, 0)
>>   	.endm
>>   
>> +/*
>> + * Create an exception table entry for `insn`, which will branch to `fixup`
>> + * when an unhandled fault(include sea fault) is taken.
>> + */
>> +	.macro		_asm_extable_mc, insn, fixup
>> +	__ASM_EXTABLE_RAW(\insn, \fixup, EX_TYPE_FIXUP_MC, 0)
>> +	.endm
>> +
>>   /*
>>    * Create an exception table entry for `insn` if `fixup` is provided. Otherwise
>>    * do nothing.
>> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
>> index d52a0b269ee8..11fcfc002654 100644
>> --- a/arch/arm64/include/asm/esr.h
>> +++ b/arch/arm64/include/asm/esr.h
>> @@ -330,6 +330,11 @@
>>   #ifndef __ASSEMBLY__
>>   #include <asm/types.h>
>>   
>> +static inline bool esr_is_sea(u32 esr)
>> +{
>> +	return (esr & ESR_ELx_FSC) == ESR_ELx_FSC_EXTABT;
>> +}
>> +
>>   static inline bool esr_is_data_abort(u32 esr)
>>   {
>>   	const u32 ec = ESR_ELx_EC(esr);
>> diff --git a/arch/arm64/include/asm/extable.h b/arch/arm64/include/asm/extable.h
>> index 72b0e71cc3de..f7835b0f473b 100644
>> --- a/arch/arm64/include/asm/extable.h
>> +++ b/arch/arm64/include/asm/extable.h
>> @@ -45,5 +45,5 @@ bool ex_handler_bpf(const struct exception_table_entry *ex,
>>   }
>>   #endif /* !CONFIG_BPF_JIT */
>>   
>> -bool fixup_exception(struct pt_regs *regs);
>> +bool fixup_exception(struct pt_regs *regs, unsigned int esr);
>>   #endif
>> diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
>> index d9dfa82c1f18..16a069e8eec3 100644
>> --- a/arch/arm64/kernel/probes/kprobes.c
>> +++ b/arch/arm64/kernel/probes/kprobes.c
>> @@ -285,7 +285,7 @@ int __kprobes kprobe_fault_handler(struct pt_regs *regs, unsigned int fsr)
>>   		 * In case the user-specified fault handler returned
>>   		 * zero, try to fix up.
>>   		 */
>> -		if (fixup_exception(regs))
>> +		if (fixup_exception(regs, fsr))
>>   			return 1;
>>   	}
>>   	return 0;
>> diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c
>> index 489455309695..f1134c88e849 100644
>> --- a/arch/arm64/mm/extable.c
>> +++ b/arch/arm64/mm/extable.c
>> @@ -9,6 +9,7 @@
>>   
>>   #include <asm/asm-extable.h>
>>   #include <asm/ptrace.h>
>> +#include <asm/esr.h>
>>   
>>   static inline unsigned long
>>   get_ex_fixup(const struct exception_table_entry *ex)
>> @@ -23,6 +24,18 @@ static bool ex_handler_fixup(const struct exception_table_entry *ex,
>>   	return true;
>>   }
>>   
>> +static bool ex_handler_fixup_mc(const struct exception_table_entry *ex,
>> +				struct pt_regs *regs, unsigned int esr)
>> +{
>> +	if (esr_is_sea(esr))
>> +		regs->regs[0] = 0;
>> +	else
>> +		regs->regs[0] = 1;
> 
> This needs more explanation.
> 
> Why does this hard-code an assumption that we can alter x0?
> 
> Why is the x0 value distinct for SEA or non-SEA? What is this meant to
> represent specifically?
> 
> What if this SEA was taken for a reason other than a memory error?

There is a problem with the design here and it needs to be modified: the
value in x0 must not be overwritten. This will be fixed in the next
version.

> 
>> +
>> +	regs->pc = get_ex_fixup(ex);
>> +	return true;
>> +}
>> +
>>   static bool ex_handler_uaccess_err_zero(const struct exception_table_entry *ex,
>>   					struct pt_regs *regs)
>>   {
>> @@ -63,7 +76,7 @@ ex_handler_load_unaligned_zeropad(const struct exception_table_entry *ex,
>>   	return true;
>>   }
>>   
>> -bool fixup_exception(struct pt_regs *regs)
>> +bool fixup_exception(struct pt_regs *regs, unsigned int esr)
>>   {
>>   	const struct exception_table_entry *ex;
>>   
>> @@ -71,9 +84,14 @@ bool fixup_exception(struct pt_regs *regs)
>>   	if (!ex)
>>   		return false;
>>   
>> +	if (esr_is_sea(esr) && !IS_EX_TYPE_MC(ex->type))
>> +		return false;
> 
> I don't think this check belongs here.
> 
> Either this should be folded into ex_handler_fixup_mc(), or we should make the
> judgement earlier in the fault handling path, and have a separate
> fixup_exception_mc() that we can call specifically in the case of a memory
> error.

Agreed. Maybe it's better to use a separate fixup_exception_mc().

> 
>> +
>>   	switch (ex->type) {
>>   	case EX_TYPE_FIXUP:
>>   		return ex_handler_fixup(ex, regs);
>> +	case EX_TYPE_FIXUP_MC:
>> +		return ex_handler_fixup_mc(ex, regs, esr);
>>   	case EX_TYPE_BPF:
>>   		return ex_handler_bpf(ex, regs);
>>   	case EX_TYPE_UACCESS_ERR_ZERO:
>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>> index 77341b160aca..ffdfab2fdd60 100644
>> --- a/arch/arm64/mm/fault.c
>> +++ b/arch/arm64/mm/fault.c
>> @@ -361,7 +361,7 @@ static void __do_kernel_fault(unsigned long addr, unsigned int esr,
>>   	 * Are we prepared to handle this kernel fault?
>>   	 * We are almost certainly not prepared to handle instruction faults.
>>   	 */
>> -	if (!is_el1_instruction_abort(esr) && fixup_exception(regs))
>> +	if (!is_el1_instruction_abort(esr) && fixup_exception(regs, esr))
>>   		return;
>>   
>>   	if (WARN_RATELIMIT(is_spurious_el1_translation_fault(addr, esr, regs),
>> @@ -695,6 +695,30 @@ static int do_bad(unsigned long far, unsigned int esr, struct pt_regs *regs)
>>   	return 1; /* "fault" */
>>   }
>>   
>> +static bool arm64_process_kernel_sea(unsigned long addr, unsigned int esr,
>> +				     struct pt_regs *regs, int sig, int code)
>> +{
>> +	if (!IS_ENABLED(CONFIG_ARCH_HAS_COPY_MC))
>> +		return false;
>> +
>> +	if (user_mode(regs) || !current->mm)
>> +		return false;
>> +
>> +	if (apei_claim_sea(regs) < 0)
>> +		return false;
>> +
>> +	current->thread.fault_address = 0;
>> +	current->thread.fault_code = esr;
>> +
>> +	if (!fixup_exception(regs, esr))
>> +		return false;
>> +
>> +	arm64_force_sig_fault(sig, code, addr,
>> +		"Uncorrected hardware memory error in kernel-access\n");
>> +
>> +	return true;
>> +}
>> +
>>   static int do_sea(unsigned long far, unsigned int esr, struct pt_regs *regs)
>>   {
>>   	const struct fault_info *inf;
>> @@ -720,6 +744,10 @@ static int do_sea(unsigned long far, unsigned int esr, struct pt_regs *regs)
>>   		 */
>>   		siaddr  = untagged_addr(far);
>>   	}
>> +
>> +	if (arm64_process_kernel_sea(siaddr, esr, regs, inf->sig, inf->code))
>> +		return 0;
>> +
>>   	arm64_notify_die(inf->name, regs, inf->sig, inf->code, siaddr, esr);
>>   
>>   	return 0;
>> diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
>> index 546179418ffa..dd952aeecdc1 100644
>> --- a/include/linux/uaccess.h
>> +++ b/include/linux/uaccess.h
>> @@ -174,6 +174,14 @@ copy_mc_to_kernel(void *dst, const void *src, size_t cnt)
>>   }
>>   #endif
>>   
>> +#ifndef copy_mc_to_user
>> +static inline unsigned long __must_check
>> +copy_mc_to_user(void *dst, const void *src, size_t cnt)
>> +{
>> +	return raw_copy_to_user(dst, src, cnt);
>> +}
> 
> ... this isn't using the new EX_TYPE_FIXUP_MC type, so isn't this just broken
> as of this patch?
> 
> Thanks,
> Mark.

It is used in the later patches (which add the specific scenarios that use
this framework); here we just add the helper.

Great thanks,
Tong.
> 
>> +#endif
>> +
>>   static __always_inline void pagefault_disabled_inc(void)
>>   {
>>   	current->pagefault_disabled++;
>> -- 
>> 2.18.0.huawei.25
>>
>>
> .

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH -next V2 4/7] arm64: add copy_from_user to machine check safe
  2022-04-06 11:19   ` Mark Rutland
@ 2022-04-07 14:28     ` Tong Tiangen
  0 siblings, 0 replies; 28+ messages in thread
From: Tong Tiangen @ 2022-04-07 14:28 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Catalin Marinas, Will Deacon, Alexander Viro, x86,
	H. Peter Anvin, linux-arm-kernel, linux-kernel, linux-mm



On 2022/4/6 19:19, Mark Rutland wrote:
> On Wed, Apr 06, 2022 at 09:13:08AM +0000, Tong Tiangen wrote:
>> Add scenarios copy_from_user to machine check safe.
>>
>> The data copied is user data and is machine check safe, so just kill
>> the user process and isolate the error page, not necessary panic.
>>
>> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
>> ---
>>   arch/arm64/include/asm/asm-uaccess.h | 16 ++++++++++++++++
>>   arch/arm64/lib/copy_from_user.S      | 11 ++++++-----
>>   2 files changed, 22 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/asm-uaccess.h b/arch/arm64/include/asm/asm-uaccess.h
>> index 0557af834e03..f31c8978e1af 100644
>> --- a/arch/arm64/include/asm/asm-uaccess.h
>> +++ b/arch/arm64/include/asm/asm-uaccess.h
>> @@ -92,4 +92,20 @@ alternative_else_nop_endif
>>   
>>   		_asm_extable	8888b,\l;
>>   	.endm
>> +
>> +	.macro user_ldp_mc l, reg1, reg2, addr, post_inc
>> +8888:		ldtr	\reg1, [\addr];
>> +8889:		ldtr	\reg2, [\addr, #8];
>> +		add	\addr, \addr, \post_inc;
>> +
>> +		_asm_extable_mc	8888b, \l;
>> +		_asm_extable_mc	8889b, \l;
>> +	.endm
>> +
>> +	.macro user_ldst_mc l, inst, reg, addr, post_inc
>> +8888:		\inst		\reg, [\addr];
>> +		add		\addr, \addr, \post_inc;
>> +
>> +		_asm_extable_mc	8888b, \l;
>> +	.endm
>>   #endif
>> diff --git a/arch/arm64/lib/copy_from_user.S b/arch/arm64/lib/copy_from_user.S
>> index 34e317907524..d9d7c5291871 100644
>> --- a/arch/arm64/lib/copy_from_user.S
>> +++ b/arch/arm64/lib/copy_from_user.S
>> @@ -21,7 +21,7 @@
>>    */
>>   
>>   	.macro ldrb1 reg, ptr, val
>> -	user_ldst 9998f, ldtrb, \reg, \ptr, \val
>> +	user_ldst_mc 9998f, ldtrb, \reg, \ptr, \val
>>   	.endm
>>   
>>   	.macro strb1 reg, ptr, val
>> @@ -29,7 +29,7 @@
>>   	.endm
>>   
>>   	.macro ldrh1 reg, ptr, val
>> -	user_ldst 9997f, ldtrh, \reg, \ptr, \val
>> +	user_ldst_mc 9997f, ldtrh, \reg, \ptr, \val
>>   	.endm
>>   
>>   	.macro strh1 reg, ptr, val
>> @@ -37,7 +37,7 @@
>>   	.endm
>>   
>>   	.macro ldr1 reg, ptr, val
>> -	user_ldst 9997f, ldtr, \reg, \ptr, \val
>> +	user_ldst_mc 9997f, ldtr, \reg, \ptr, \val
>>   	.endm
>>   
>>   	.macro str1 reg, ptr, val
>> @@ -45,7 +45,7 @@
>>   	.endm
>>   
>>   	.macro ldp1 reg1, reg2, ptr, val
>> -	user_ldp 9997f, \reg1, \reg2, \ptr, \val
>> +	user_ldp_mc 9997f, \reg1, \reg2, \ptr, \val
>>   	.endm
>>   
>>   	.macro stp1 reg1, reg2, ptr, val
>> @@ -62,7 +62,8 @@ SYM_FUNC_START(__arch_copy_from_user)
>>   	ret
>>   
>>   	// Exception fixups
>> -9997:	cmp	dst, dstin
>> +9997:	cbz	x0, 9998f			// Check machine check exception
>> +	cmp	dst, dstin
>>   	b.ne	9998f
> 
> If you look at the copy template, you'd see that `dstin` *is* x0.
> 
> Consier if we took a non-SEA fault. The the fixup handler will overwrite x0,
> it's likely `dst` != `dstin`, and we'll branch to the byte-by-byte copy. Or if
> we're doing something odd and mmap_min_addr is 0, we can do the wrong thing the
> other way around and *not* branch to the byte-by-byte copy when we should.
> 
> So this is at best confusing, but likely broken too.
> 
> Thanks,
> Mark.

OK, I missed that; it will be fixed in the next version.

Thanks,
Tong.

> 
>>   	// Before being absolutely sure we couldn't copy anything, try harder
>>   USER(9998f, ldtrb tmp1w, [srcin])
>> -- 
>> 2.18.0.huawei.25
>>
>>
> .

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH -next V2 5/7] arm64: add get_user to machine check safe
  2022-04-06 11:22   ` Mark Rutland
@ 2022-04-07 14:38     ` Tong Tiangen
  2022-04-08 15:22       ` Mark Rutland
  0 siblings, 1 reply; 28+ messages in thread
From: Tong Tiangen @ 2022-04-07 14:38 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Catalin Marinas, Will Deacon, Alexander Viro, x86,
	H. Peter Anvin, linux-arm-kernel, linux-kernel, linux-mm



On 2022/4/6 19:22, Mark Rutland wrote:
> On Wed, Apr 06, 2022 at 09:13:09AM +0000, Tong Tiangen wrote:
>> Add the get_user scenario to machine check safe handling. The processing of
>> EX_TYPE_UACCESS_ERR_ZERO and EX_TYPE_UACCESS_ERR_ZERO_UCE_RECOVERY is the
>> same, and both return -EFAULT.
> 
> Which uaccess cases do we expect to *not* be recoverable?
> 
> Naively I would assume that if we're going to treat a memory error on a uaccess
> as fatal to userspace we should be able to do that for *any* uaccesses.
> 
> The commit message should explain why we need the distinction between a
> recoverable uaccess and a non-recoverable uaccess.
> 
> Thanks,
> Mark.
> 

Currently, any memory error consumed in kernel mode will lead to a panic
(do_sea()).

My idea is that not all memory errors consumed in kernel mode are fatal; for
example, copy_from_user/get_user consume memory errors while reading user
data in process context. In this case we need not let the kernel panic: just
kill the process, without affecting the operation of the system.

However, not every uaccess can be recovered without affecting the normal
operation of the system. The key is not whether it is a uaccess, but whether
the page being read holds data that is critical to the normal operation of
the system.

Thanks,
Tong.


>>
>> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
>> ---
>>   arch/arm64/include/asm/asm-extable.h | 14 +++++++++++++-
>>   arch/arm64/include/asm/uaccess.h     |  2 +-
>>   arch/arm64/mm/extable.c              |  1 +
>>   3 files changed, 15 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/asm-extable.h b/arch/arm64/include/asm/asm-extable.h
>> index 74d1db74fd86..bfc2d224cbae 100644
>> --- a/arch/arm64/include/asm/asm-extable.h
>> +++ b/arch/arm64/include/asm/asm-extable.h
>> @@ -10,8 +10,11 @@
>>   
>>   /* _MC indicates that can fixup from machine check errors */
>>   #define EX_TYPE_FIXUP_MC		5
>> +#define EX_TYPE_UACCESS_ERR_ZERO_MC	6
>>   
>> -#define IS_EX_TYPE_MC(type) (type == EX_TYPE_FIXUP_MC)
>> +#define IS_EX_TYPE_MC(type)			\
>> +	(type == EX_TYPE_FIXUP_MC ||		\
>> +	 type == EX_TYPE_UACCESS_ERR_ZERO_MC)
>>   
>>   #ifdef __ASSEMBLY__
>>   
>> @@ -77,6 +80,15 @@
>>   #define EX_DATA_REG(reg, gpr)						\
>>   	"((.L__gpr_num_" #gpr ") << " __stringify(EX_DATA_REG_##reg##_SHIFT) ")"
>>   
>> +#define _ASM_EXTABLE_UACCESS_ERR_ZERO_MC(insn, fixup, err, zero)		\
>> +	__DEFINE_ASM_GPR_NUMS							\
>> +	__ASM_EXTABLE_RAW(#insn, #fixup,					\
>> +			  __stringify(EX_TYPE_UACCESS_ERR_ZERO_MC),		\
>> +			  "("							\
>> +			    EX_DATA_REG(ERR, err) " | "				\
>> +			    EX_DATA_REG(ZERO, zero)				\
>> +			  ")")
>> +
>>   #define _ASM_EXTABLE_UACCESS_ERR_ZERO(insn, fixup, err, zero)		\
>>   	__DEFINE_ASM_GPR_NUMS						\
>>   	__ASM_EXTABLE_RAW(#insn, #fixup, 				\
>> diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
>> index e8dce0cc5eaa..24b662407fbd 100644
>> --- a/arch/arm64/include/asm/uaccess.h
>> +++ b/arch/arm64/include/asm/uaccess.h
>> @@ -236,7 +236,7 @@ static inline void __user *__uaccess_mask_ptr(const void __user *ptr)
>>   	asm volatile(							\
>>   	"1:	" load "	" reg "1, [%2]\n"			\
>>   	"2:\n"								\
>> -	_ASM_EXTABLE_UACCESS_ERR_ZERO(1b, 2b, %w0, %w1)			\
>> +	_ASM_EXTABLE_UACCESS_ERR_ZERO_MC(1b, 2b, %w0, %w1)		\
>>   	: "+r" (err), "=&r" (x)						\
>>   	: "r" (addr))
>>   
>> diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c
>> index f1134c88e849..7c05f8d2bce0 100644
>> --- a/arch/arm64/mm/extable.c
>> +++ b/arch/arm64/mm/extable.c
>> @@ -95,6 +95,7 @@ bool fixup_exception(struct pt_regs *regs, unsigned int esr)
>>   	case EX_TYPE_BPF:
>>   		return ex_handler_bpf(ex, regs);
>>   	case EX_TYPE_UACCESS_ERR_ZERO:
>> +	case EX_TYPE_UACCESS_ERR_ZERO_MC:
>>   		return ex_handler_uaccess_err_zero(ex, regs);
>>   	case EX_TYPE_LOAD_UNALIGNED_ZEROPAD:
>>   		return ex_handler_load_unaligned_zeropad(ex, regs);
>> -- 
>> 2.18.0.huawei.25
>>
>>
> .

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH -next V2 7/7] arm64: add pagecache reading to machine check safe
  2022-04-06 11:27   ` Mark Rutland
@ 2022-04-07 14:56     ` Tong Tiangen
  2022-04-07 15:53       ` Robin Murphy
  0 siblings, 1 reply; 28+ messages in thread
From: Tong Tiangen @ 2022-04-07 14:56 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Catalin Marinas, Will Deacon, Alexander Viro, x86,
	H. Peter Anvin, linux-arm-kernel, linux-kernel, linux-mm



On 2022/4/6 19:27, Mark Rutland wrote:
> On Wed, Apr 06, 2022 at 09:13:11AM +0000, Tong Tiangen wrote:
>> When a user process reads a file, the data is cached in the pagecache and
>> that data belongs to the user process. When a machine check error is
>> encountered during pagecache reading, killing the user process and
>> isolating the user page with hardware memory errors is a more reasonable
>> choice than a kernel panic.
>>
>> The __arch_copy_mc_to_user() in copy_to_user_mc.S largely borrows
>> from __arch_copy_to_user() in copy_to_user.S; the main difference is
>> that __arch_copy_mc_to_user() adds the extable entries needed to be
>> machine check safe.
> 
> As with prior patches, *why* is the distinction necessary?
> 
> This patch adds a bunch of conditional logic, but *structurally* it doesn't
> alter the handling to be substantially different for the MC and non-MC cases.
> 
> This seems like pointless duplication that just makes it harder to maintain
> this code.
> 
> Thanks,
> Mark.

Agreed, the implementation here looks a little ugly and is harder to
maintain.

The reason I did it this way is that not all copy_to_user() calls can
be recovered.

A memory error is consumed when reading the pagecache using
copy_to_user(). In this scenario I think only the process is affected,
because it cannot read the pagecache data correctly; just kill the
process, there is no need for a whole-kernel panic.

So I need two different copy_to_user() implementations. One is the
existing __arch_copy_to_user(), which still panics when it consumes a
memory error. The other is the new helper __arch_copy_mc_to_user(),
used when reading the pagecache; it can recover from a consumed memory
error.
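
To make the intended behaviour concrete, here is a minimal user-space
sketch (illustration only, not from the posted patches; the SIGBUS
outcome is an assumption based on how memory_failure() normally kills
an affected task):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	char buf[4096];
	int fd = open("data.bin", O_RDONLY);	/* hypothetical file */

	if (fd < 0)
		return 1;

	/*
	 * If the pagecache page backing this read contains an
	 * uncorrectable memory error, the unpatched kernel panics
	 * inside copy_to_user(). With the MC-safe path, only this
	 * process is killed (typically via SIGBUS), the poisoned page
	 * is isolated, and the rest of the system keeps running.
	 */
	if (read(fd, buf, sizeof(buf)) < 0)
		perror("read");

	close(fd);
	return 0;
}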

In the future, if we find another scenario where copy_to_user() can
also be recovered, we can use this MC-safe helper there as well.

Thanks,
Tong.

> 
>> In _copy_page_to_iter(), machine check safety is only considered for
>> ITER_IOVEC, which is used by pagecache reading.
>>
>> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
>> ---
>>   arch/arm64/include/asm/uaccess.h | 15 ++++++
>>   arch/arm64/lib/Makefile          |  2 +-
>>   arch/arm64/lib/copy_to_user_mc.S | 78 +++++++++++++++++++++++++++++
>>   include/linux/uio.h              |  9 +++-
>>   lib/iov_iter.c                   | 85 +++++++++++++++++++++++++-------
>>   5 files changed, 170 insertions(+), 19 deletions(-)
>>   create mode 100644 arch/arm64/lib/copy_to_user_mc.S
>>
>> diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
>> index 24b662407fbd..f0d5e811165a 100644
>> --- a/arch/arm64/include/asm/uaccess.h
>> +++ b/arch/arm64/include/asm/uaccess.h
>> @@ -448,6 +448,21 @@ extern long strncpy_from_user(char *dest, const char __user *src, long count);
>>   
>>   extern __must_check long strnlen_user(const char __user *str, long n);
>>   
>> +#ifdef CONFIG_ARCH_HAS_COPY_MC
>> +extern unsigned long __must_check __arch_copy_mc_to_user(void __user *to,
>> +							 const void *from, unsigned long n);
>> +static inline unsigned long __must_check
>> +copy_mc_to_user(void __user *to, const void *from, unsigned long n)
>> +{
>> +	uaccess_ttbr0_enable();
>> +	n = __arch_copy_mc_to_user(__uaccess_mask_ptr(to), from, n);
>> +	uaccess_ttbr0_disable();
>> +
>> +	return n;
>> +}
>> +#define copy_mc_to_user copy_mc_to_user
>> +#endif
>> +
>>   #ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
>>   struct page;
>>   void memcpy_page_flushcache(char *to, struct page *page, size_t offset, size_t len);
>> diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
>> index 29c578414b12..9b3571227fb4 100644
>> --- a/arch/arm64/lib/Makefile
>> +++ b/arch/arm64/lib/Makefile
>> @@ -23,4 +23,4 @@ obj-$(CONFIG_ARM64_MTE) += mte.o
>>   
>>   obj-$(CONFIG_KASAN_SW_TAGS) += kasan_sw_tags.o
>>   
>> -obj-$(CONFIG_ARCH_HAS_CPY_MC) += copy_page_mc.o
>> +obj-$(CONFIG_ARCH_HAS_COPY_MC) += copy_page_mc.o copy_to_user_mc.o
>> diff --git a/arch/arm64/lib/copy_to_user_mc.S b/arch/arm64/lib/copy_to_user_mc.S
>> new file mode 100644
>> index 000000000000..9d228ff15446
>> --- /dev/null
>> +++ b/arch/arm64/lib/copy_to_user_mc.S
>> @@ -0,0 +1,78 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/*
>> + * Copyright (C) 2012 ARM Ltd.
>> + */
>> +
>> +#include <linux/linkage.h>
>> +
>> +#include <asm/asm-uaccess.h>
>> +#include <asm/assembler.h>
>> +#include <asm/cache.h>
>> +
>> +/*
>> + * Copy to user space from a kernel buffer (alignment handled by the hardware)
>> + *
>> + * Parameters:
>> + *	x0 - to
>> + *	x1 - from
>> + *	x2 - n
>> + * Returns:
>> + *	x0 - bytes not copied
>> + */
>> +	.macro ldrb1 reg, ptr, val
>> +	1000: ldrb  \reg, [\ptr], \val;
>> +	_asm_extable_mc 1000b, 9998f;
>> +	.endm
>> +
>> +	.macro strb1 reg, ptr, val
>> +	user_ldst_mc 9998f, sttrb, \reg, \ptr, \val
>> +	.endm
>> +
>> +	.macro ldrh1 reg, ptr, val
>> +	1001: ldrh  \reg, [\ptr], \val;
>> +	_asm_extable_mc 1001b, 9998f;
>> +	.endm
>> +
>> +	.macro strh1 reg, ptr, val
>> +	user_ldst_mc 9997f, sttrh, \reg, \ptr, \val
>> +	.endm
>> +
>> +	.macro ldr1 reg, ptr, val
>> +	1002: ldr \reg, [\ptr], \val;
>> +	_asm_extable_mc 1002b, 9998f;
>> +	.endm
>> +
>> +	.macro str1 reg, ptr, val
>> +	user_ldst_mc 9997f, sttr, \reg, \ptr, \val
>> +	.endm
>> +
>> +	.macro ldp1 reg1, reg2, ptr, val
>> +	1003: ldp \reg1, \reg2, [\ptr], \val;
>> +	_asm_extable_mc 1003b, 9998f;
>> +	.endm
>> +
>> +	.macro stp1 reg1, reg2, ptr, val
>> +	user_stp 9997f, \reg1, \reg2, \ptr, \val
>> +	.endm
>> +
>> +end	.req	x5
>> +srcin	.req	x15
>> +SYM_FUNC_START(__arch_copy_mc_to_user)
>> +	add	end, x0, x2
>> +	mov	srcin, x1
>> +#include "copy_template.S"
>> +	mov	x0, #0
>> +	ret
>> +
>> +	// Exception fixups
>> +9997:	cbz	x0, 9998f			// Check machine check exception
>> +	cmp	dst, dstin
>> +	b.ne	9998f
>> +	// Before being absolutely sure we couldn't copy anything, try harder
>> +	ldrb	tmp1w, [srcin]
>> +USER(9998f, sttrb tmp1w, [dst])
>> +	add	dst, dst, #1
>> +9998:	sub	x0, end, dst			// bytes not copied
>> +	ret
>> +SYM_FUNC_END(__arch_copy_mc_to_user)
>> +EXPORT_SYMBOL(__arch_copy_mc_to_user)
>> diff --git a/include/linux/uio.h b/include/linux/uio.h
>> index 739285fe5a2f..539d9ee9b032 100644
>> --- a/include/linux/uio.h
>> +++ b/include/linux/uio.h
>> @@ -147,10 +147,17 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
>>   size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i);
>>   size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i);
>>   
>> +#ifdef CONFIG_ARCH_HAS_COPY_MC
>> +size_t copy_mc_page_to_iter(struct page *page, size_t offset, size_t bytes,
>> +			    struct iov_iter *i);
>> +#else
>> +#define copy_mc_page_to_iter copy_page_to_iter
>> +#endif
>> +
>>   static inline size_t copy_folio_to_iter(struct folio *folio, size_t offset,
>>   		size_t bytes, struct iov_iter *i)
>>   {
>> -	return copy_page_to_iter(&folio->page, offset, bytes, i);
>> +	return copy_mc_page_to_iter(&folio->page, offset, bytes, i);
>>   }
>>   
>>   static __always_inline __must_check
>> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
>> index 6dd5330f7a99..2c5f3bb6391d 100644
>> --- a/lib/iov_iter.c
>> +++ b/lib/iov_iter.c
>> @@ -157,6 +157,19 @@ static int copyout(void __user *to, const void *from, size_t n)
>>   	return n;
>>   }
>>   
>> +#ifdef CONFIG_ARCH_HAS_COPY_MC
>> +static int copyout_mc(void __user *to, const void *from, size_t n)
>> +{
>> +	if (access_ok(to, n)) {
>> +		instrument_copy_to_user(to, from, n);
>> +		n = copy_mc_to_user((__force void *) to, from, n);
>> +	}
>> +	return n;
>> +}
>> +#else
>> +#define copyout_mc copyout
>> +#endif
>> +
>>   static int copyin(void *to, const void __user *from, size_t n)
>>   {
>>   	if (should_fail_usercopy())
>> @@ -169,7 +182,7 @@ static int copyin(void *to, const void __user *from, size_t n)
>>   }
>>   
>>   static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t bytes,
>> -			 struct iov_iter *i)
>> +			 struct iov_iter *i, bool mc_safe)
>>   {
>>   	size_t skip, copy, left, wanted;
>>   	const struct iovec *iov;
>> @@ -194,7 +207,10 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t b
>>   		from = kaddr + offset;
>>   
>>   		/* first chunk, usually the only one */
>> -		left = copyout(buf, from, copy);
>> +		if (mc_safe)
>> +			left = copyout_mc(buf, from, copy);
>> +		else
>> +			left = copyout(buf, from, copy);
>>   		copy -= left;
>>   		skip += copy;
>>   		from += copy;
>> @@ -204,7 +220,10 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t b
>>   			iov++;
>>   			buf = iov->iov_base;
>>   			copy = min(bytes, iov->iov_len);
>> -			left = copyout(buf, from, copy);
>> +			if (mc_safe)
>> +				left = copyout_mc(buf, from, copy);
>> +			else
>> +				left = copyout(buf, from, copy);
>>   			copy -= left;
>>   			skip = copy;
>>   			from += copy;
>> @@ -223,7 +242,10 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t b
>>   
>>   	kaddr = kmap(page);
>>   	from = kaddr + offset;
>> -	left = copyout(buf, from, copy);
>> +	if (mc_safe)
>> +		left = copyout_mc(buf, from, copy);
>> +	else
>> +		left = copyout(buf, from, copy);
>>   	copy -= left;
>>   	skip += copy;
>>   	from += copy;
>> @@ -232,7 +254,10 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t b
>>   		iov++;
>>   		buf = iov->iov_base;
>>   		copy = min(bytes, iov->iov_len);
>> -		left = copyout(buf, from, copy);
>> +		if (mc_safe)
>> +			left = copyout_mc(buf, from, copy);
>> +		else
>> +			left = copyout(buf, from, copy);
>>   		copy -= left;
>>   		skip = copy;
>>   		from += copy;
>> @@ -674,15 +699,6 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
>>   EXPORT_SYMBOL(_copy_to_iter);
>>   
>>   #ifdef CONFIG_ARCH_HAS_COPY_MC
>> -static int copyout_mc(void __user *to, const void *from, size_t n)
>> -{
>> -	if (access_ok(to, n)) {
>> -		instrument_copy_to_user(to, from, n);
>> -		n = copy_mc_to_user((__force void *) to, from, n);
>> -	}
>> -	return n;
>> -}
>> -
>>   static size_t copy_mc_pipe_to_iter(const void *addr, size_t bytes,
>>   				struct iov_iter *i)
>>   {
>> @@ -846,10 +862,10 @@ static inline bool page_copy_sane(struct page *page, size_t offset, size_t n)
>>   }
>>   
>>   static size_t __copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
>> -			 struct iov_iter *i)
>> +			 struct iov_iter *i, bool mc_safe)
>>   {
>>   	if (likely(iter_is_iovec(i)))
>> -		return copy_page_to_iter_iovec(page, offset, bytes, i);
>> +		return copy_page_to_iter_iovec(page, offset, bytes, i, mc_safe);
>>   	if (iov_iter_is_bvec(i) || iov_iter_is_kvec(i) || iov_iter_is_xarray(i)) {
>>   		void *kaddr = kmap_local_page(page);
>>   		size_t wanted = _copy_to_iter(kaddr + offset, bytes, i);
>> @@ -878,7 +894,7 @@ size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
>>   	offset %= PAGE_SIZE;
>>   	while (1) {
>>   		size_t n = __copy_page_to_iter(page, offset,
>> -				min(bytes, (size_t)PAGE_SIZE - offset), i);
>> +				min(bytes, (size_t)PAGE_SIZE - offset), i, false);
>>   		res += n;
>>   		bytes -= n;
>>   		if (!bytes || !n)
>> @@ -893,6 +909,41 @@ size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
>>   }
>>   EXPORT_SYMBOL(copy_page_to_iter);
>>   
>> +#ifdef CONFIG_ARCH_HAS_COPY_MC
>> +/**
>> + * copy_mc_page_to_iter - copy page to iter with source memory error exception handling.
>> + *
>> + * The filemap_read deploys this for pagecache reading and the main differences between
>> + * this and typical copy_page_to_iter() is call __copy_page_to_iter with mc_safe true.
>> + *
>> + * Return: number of bytes copied (may be %0)
>> + */
>> +size_t copy_mc_page_to_iter(struct page *page, size_t offset, size_t bytes,
>> +			 struct iov_iter *i)
>> +{
>> +	size_t res = 0;
>> +
>> +	if (unlikely(!page_copy_sane(page, offset, bytes)))
>> +		return 0;
>> +	page += offset / PAGE_SIZE; // first subpage
>> +	offset %= PAGE_SIZE;
>> +	while (1) {
>> +		size_t n = __copy_page_to_iter(page, offset,
>> +				min(bytes, (size_t)PAGE_SIZE - offset), i, true);
>> +		res += n;
>> +		bytes -= n;
>> +		if (!bytes || !n)
>> +			break;
>> +		offset += n;
>> +		if (offset == PAGE_SIZE) {
>> +			page++;
>> +			offset = 0;
>> +		}
>> +	}
>> +	return res;
>> +}
>> +#endif
>> +
>>   size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
>>   			 struct iov_iter *i)
>>   {
>> -- 
>> 2.18.0.huawei.25


* Re: [RFC PATCH -next V2 7/7] arm64: add pagecache reading to machine check safe
  2022-04-07 14:56     ` Tong Tiangen
@ 2022-04-07 15:53       ` Robin Murphy
  2022-04-08  2:43         ` Tong Tiangen
  0 siblings, 1 reply; 28+ messages in thread
From: Robin Murphy @ 2022-04-07 15:53 UTC (permalink / raw)
  To: Tong Tiangen, Mark Rutland
  Cc: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Catalin Marinas, Will Deacon, Alexander Viro, x86,
	H. Peter Anvin, linux-arm-kernel, linux-kernel, linux-mm

On 2022-04-07 15:56, Tong Tiangen wrote:
> 
> 
> On 2022/4/6 19:27, Mark Rutland wrote:
>> On Wed, Apr 06, 2022 at 09:13:11AM +0000, Tong Tiangen wrote:
>>> When a user process reads a file, the data is cached in the pagecache
>>> and belongs to the user process. When a machine check error is
>>> encountered during pagecache reading, killing the user process and
>>> isolating the page with hardware memory errors is a more reasonable
>>> choice than a kernel panic.
>>>
>>> __arch_copy_mc_to_user() in copy_to_user_mc.S largely borrows from
>>> __arch_copy_to_user() in copy_to_user.S; the main difference is that
>>> __arch_copy_mc_to_user() adds the extable entries needed to support
>>> machine check safety.
>>
>> As with prior patches, *why* is the distinction necessary?
>>
>> This patch adds a bunch of conditional logic, but *structurally* it 
>> doesn't
>> alter the handling to be substantially different for the MC and non-MC 
>> cases.
>>
>> This seems like pointless duplication that just makes it harder to 
>> maintain
>> this code.
>>
>> Thanks,
>> Mark.
> 
> Agreed, the implementation here looks a little ugly and is harder to
> maintain.
>
> The reason I did it this way is that not all copy_to_user() calls can
> be recovered.
>
> A memory error is consumed when reading the pagecache using
> copy_to_user(). In this scenario I think only the process is affected,
> because it cannot read the pagecache data correctly; just kill the
> process, there is no need for a whole-kernel panic.
>
> So I need two different copy_to_user() implementations. One is the
> existing __arch_copy_to_user(), which still panics when it consumes a
> memory error. The other is the new helper __arch_copy_mc_to_user(),
> used when reading the pagecache; it can recover from a consumed memory
> error.

OK, but do we really need two almost-identical implementations of every 
function where the only difference is how the exception table entries 
are annotated? Could the exception handler itself just figure out who 
owns the page where the fault occurred and decide what action to take as 
appropriate?

Robin.
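
A minimal sketch of that idea, for illustration only -- the helper name
and the pagecache test are assumptions, not code from this thread:

/*
 * Hypothetical fault-side check: given the kernel address that took
 * the synchronous external abort, decide whether the error landed in
 * a pagecache page (recoverable by killing the reader) or somewhere
 * more critical.
 */
static bool sea_addr_is_pagecache(unsigned long addr)
{
	struct page *page;

	if (!virt_addr_valid(addr))
		return false;

	page = virt_to_page(addr);

	/* Pagecache pages carry a file mapping and are not anonymous. */
	return page_mapping(page) && !PageAnon(page);
}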


* Re: [RFC PATCH -next V2 7/7] arm64: add pagecache reading to machine check safe
  2022-04-07 15:53       ` Robin Murphy
@ 2022-04-08  2:43         ` Tong Tiangen
  2022-04-08 11:11           ` Robin Murphy
  0 siblings, 1 reply; 28+ messages in thread
From: Tong Tiangen @ 2022-04-08  2:43 UTC (permalink / raw)
  To: Robin Murphy, Mark Rutland, james.morse
  Cc: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Catalin Marinas, Will Deacon, Alexander Viro, x86,
	H. Peter Anvin, linux-arm-kernel, linux-kernel, linux-mm,
	Kefeng Wang



On 2022/4/7 23:53, Robin Murphy wrote:
> On 2022-04-07 15:56, Tong Tiangen wrote:
>>
>>
>> On 2022/4/6 19:27, Mark Rutland wrote:
>>> On Wed, Apr 06, 2022 at 09:13:11AM +0000, Tong Tiangen wrote:
>>>> When a user process reads a file, the data is cached in the pagecache
>>>> and belongs to the user process. When a machine check error is
>>>> encountered during pagecache reading, killing the user process and
>>>> isolating the page with hardware memory errors is a more reasonable
>>>> choice than a kernel panic.
>>>>
>>>> __arch_copy_mc_to_user() in copy_to_user_mc.S largely borrows from
>>>> __arch_copy_to_user() in copy_to_user.S; the main difference is that
>>>> __arch_copy_mc_to_user() adds the extable entries needed to support
>>>> machine check safety.
>>>
>>> As with prior patches, *why* is the distinction necessary?
>>>
>>> This patch adds a bunch of conditional logic, but *structurally* it 
>>> doesn't
>>> alter the handling to be substantially different for the MC and 
>>> non-MC cases.
>>>
>>> This seems like pointless duplication that just makes it harder to 
>>> maintain
>>> this code.
>>>
>>> Thanks,
>>> Mark.
>>
>> Agreed, the implementation here looks a little ugly and is harder to
>> maintain.
>>
>> The reason I did it this way is that not all copy_to_user() calls can
>> be recovered.
>>
>> A memory error is consumed when reading the pagecache using
>> copy_to_user(). In this scenario I think only the process is affected,
>> because it cannot read the pagecache data correctly; just kill the
>> process, there is no need for a whole-kernel panic.
>>
>> So I need two different copy_to_user() implementations. One is the
>> existing __arch_copy_to_user(), which still panics when it consumes a
>> memory error. The other is the new helper __arch_copy_mc_to_user(),
>> used when reading the pagecache; it can recover from a consumed memory
>> error.
> 
> OK, but do we really need two almost-identical implementations of every 
> function where the only difference is how the exception table entries 
> are annotated? Could the exception handler itself just figure out who 
> owns the page where the fault occurred and decide what action to take as 
> appropriate?
> 
> Robin.
> 

Thank you, Robin.

In this patchset I added the call path do_sea() -> fixup_exception();
the purpose is to give a SEA fault a chance to be fixed up (see patch
3/7).

If the fixup succeeds, the panic can be avoided; otherwise we panic
according to the original logic.

fixup_exception() sets regs->pc, and execution resumes at regs->pc when
the exception context is restored.

If the MC-safe fixup were added to __arch_copy_to_user(), then a *non
pagecache reading* scenario that hit a memory error in
__arch_copy_to_user() would have do_sea() -> fixup_exception() succeed,
and the panic logic in do_sea() would be skipped when it should not be.
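
Roughly, the gate being described looks like the following -- a sketch
in the shape of patch 3/7 as summarised above; esr_is_sea() is an
assumed helper name, not an interface from the posted patches:

bool fixup_exception(struct pt_regs *regs, unsigned int esr)
{
	const struct exception_table_entry *ex;

	ex = search_exception_tables(instruction_pointer(regs));
	if (!ex)
		return false;

	/*
	 * Only extable entries explicitly marked MC-safe may consume
	 * a synchronous external abort; a plain __arch_copy_to_user()
	 * entry falls through, so do_sea() still panics for it.
	 */
	if (esr_is_sea(esr) && !IS_EX_TYPE_MC(ex->type))
		return false;

	return ex_handler_fixup(ex, regs);	/* simplified dispatch */
}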

So I added the new helper __arch_copy_mc_to_user() and attached the
MC-safe fixup to it, so that it is used only in the scenarios that
require it. At present there is only the pagecache reading scenario;
other scenarios remain to be developed.

This is my current dilemma. Of course, I will think about a solution to
the duplicate-code problem.

Thanks,
Tong.

* Re: [RFC PATCH -next V2 7/7] arm64: add pagecache reading to machine check safe
  2022-04-08  2:43         ` Tong Tiangen
@ 2022-04-08 11:11           ` Robin Murphy
  2022-04-09  9:24             ` Tong Tiangen
  0 siblings, 1 reply; 28+ messages in thread
From: Robin Murphy @ 2022-04-08 11:11 UTC (permalink / raw)
  To: Tong Tiangen, Mark Rutland, james.morse
  Cc: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Catalin Marinas, Will Deacon, Alexander Viro, x86,
	H. Peter Anvin, linux-arm-kernel, linux-kernel, linux-mm,
	Kefeng Wang

On 2022-04-08 03:43, Tong Tiangen wrote:
> 
> 
> On 2022/4/7 23:53, Robin Murphy wrote:
>> On 2022-04-07 15:56, Tong Tiangen wrote:
>>>
>>>
>>> On 2022/4/6 19:27, Mark Rutland wrote:
>>>> On Wed, Apr 06, 2022 at 09:13:11AM +0000, Tong Tiangen wrote:
>>>>> When a user process reads a file, the data is cached in the pagecache
>>>>> and belongs to the user process. When a machine check error is
>>>>> encountered during pagecache reading, killing the user process and
>>>>> isolating the page with hardware memory errors is a more reasonable
>>>>> choice than a kernel panic.
>>>>>
>>>>> __arch_copy_mc_to_user() in copy_to_user_mc.S largely borrows from
>>>>> __arch_copy_to_user() in copy_to_user.S; the main difference is that
>>>>> __arch_copy_mc_to_user() adds the extable entries needed to support
>>>>> machine check safety.
>>>>
>>>> As with prior patches, *why* is the distinction necessary?
>>>>
>>>> This patch adds a bunch of conditional logic, but *structurally* it 
>>>> doesn't
>>>> alter the handling to be substantially different for the MC and 
>>>> non-MC cases.
>>>>
>>>> This seems like pointless duplication that just makes it harder to 
>>>> maintain
>>>> this code.
>>>>
>>>> Thanks,
>>>> Mark.
>>>
>>> Agreed, the implementation here looks a little ugly and is harder to
>>> maintain.
>>>
>>> The reason I did it this way is that not all copy_to_user() calls can
>>> be recovered.
>>>
>>> A memory error is consumed when reading the pagecache using
>>> copy_to_user(). In this scenario I think only the process is affected,
>>> because it cannot read the pagecache data correctly; just kill the
>>> process, there is no need for a whole-kernel panic.
>>>
>>> So I need two different copy_to_user() implementations. One is the
>>> existing __arch_copy_to_user(), which still panics when it consumes a
>>> memory error. The other is the new helper __arch_copy_mc_to_user(),
>>> used when reading the pagecache; it can recover from a consumed memory
>>> error.
>>
>> OK, but do we really need two almost-identical implementations of 
>> every function where the only difference is how the exception table 
>> entries are annotated? Could the exception handler itself just figure 
>> out who owns the page where the fault occurred and decide what action 
>> to take as appropriate?
>>
>> Robin.
>>
> 
> Thank you, Robin.
> 
> In this patchset I added the call path do_sea() -> fixup_exception();
> the purpose is to give a SEA fault a chance to be fixed up (see patch
> 3/7).
>
> If the fixup succeeds, the panic can be avoided; otherwise we panic
> according to the original logic.
>
> fixup_exception() sets regs->pc, and execution resumes at regs->pc
> when the exception context is restored.
>
> If the MC-safe fixup were added to __arch_copy_to_user(), then a *non
> pagecache reading* scenario that hit a memory error in
> __arch_copy_to_user() would have do_sea() -> fixup_exception()
> succeed, and the panic logic in do_sea() would be skipped when it
> should not be.
>
> So I added the new helper __arch_copy_mc_to_user() and attached the
> MC-safe fixup to it, so that it is used only in the scenarios that
> require it. At present there is only the pagecache reading scenario;
> other scenarios remain to be developed.
>
> This is my current dilemma. Of course, I will think about a solution
> to the duplicate-code problem.

Right, but if the point is that faults in pagecache pages are special, 
surely __do_kernel_fault() could ultimately figure out from the address 
whether that's the case or not?

In general, if the principle is that whether a fault is recoverable or 
not depends on what was being accessed, then to me it seems 
fundamentally more robust to base that decision on details of the fault 
that actually occurred, rather than what the caller thought it was 
supposed to be doing at the time.

Thanks,
Robin.


* Re: [RFC PATCH -next V2 5/7] arm64: add get_user to machine check safe
  2022-04-07 14:38     ` Tong Tiangen
@ 2022-04-08 15:22       ` Mark Rutland
  2022-04-09  9:17         ` Tong Tiangen
  0 siblings, 1 reply; 28+ messages in thread
From: Mark Rutland @ 2022-04-08 15:22 UTC (permalink / raw)
  To: Tong Tiangen
  Cc: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Catalin Marinas, Will Deacon, Alexander Viro, x86,
	H. Peter Anvin, linux-arm-kernel, linux-kernel, linux-mm

On Thu, Apr 07, 2022 at 10:38:04PM +0800, Tong Tiangen wrote:
> On 2022/4/6 19:22, Mark Rutland wrote:
> > On Wed, Apr 06, 2022 at 09:13:09AM +0000, Tong Tiangen wrote:
> > > Add the get_user scenario to machine check safe. The processing of
> > > EX_TYPE_UACCESS_ERR_ZERO and EX_TYPE_UACCESS_ERR_ZERO_UCE_RECOVERY
> > > is the same: both return -EFAULT.
> > 
> > Which uaccess cases do we expect to *not* be recoverable?
> > 
> > Naively I would assume that if we're going to treat a memory error on a uaccess
> > as fatal to userspace we should be able to do that for *any* uaccesses.
> > 
> > The commit message should explain why we need the distinction between a
> > recoverable uaccess and a non-recoverable uaccess.
> > 
> > Thanks,
> > Mark.
> 
> Currently, any memory error consumed in kernel mode leads to a panic
> (do_sea()).
>
> My idea is that not all memory errors consumed in kernel mode are
> fatal; for example, copy_from_user()/get_user() consume memory errors
> while reading user data in process context. In this case we need not
> let the kernel panic; just kill the process, without affecting the
> operation of the system.

I understood this part.

> However, not all uaccess can be recovered without affecting the normal
> operation of the system. The key is not whether it is a uaccess, but
> whether the page being read holds key data that affects the normal
> operation of the system.

Ok. Can you give an example of such a case, where a uaccess that hits
a memory error must be fatal?

I think you might be trying to say that for copy_{to,from}_user() we can
make that judgement, but those are combined user+kernel access
primitives, and the *uaccess* part should never be reading from a page
with "key data affecting the normal operation of the system", since
that's userspace memory.

Is there any *userspace access* (e.g. where we use LDTR/STTR today)
where we must treat a memory error as fatal to the system?
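
For reference, the kind of userspace access meant here is one whose
load is unprivileged (LDTR-class), so it can only touch user-accessible
memory. A simplified sketch modelled on the __get_mem_asm hunk quoted
earlier in the thread; sketch_get_user() is a made-up name:

/* Sketch only: an unprivileged load with an err/zero fixup entry. */
#define sketch_get_user(x, addr, err)					\
	asm volatile(							\
	"1:	ldtr	%w1, [%2]\n"	/* unprivileged load */		\
	"2:\n"								\
	_ASM_EXTABLE_UACCESS_ERR_ZERO(1b, 2b, %w0, %w1)			\
	: "+r" (err), "=&r" (x)						\
	: "r" (addr))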

Thanks,
Mark.


* Re: [RFC PATCH -next V2 5/7] arm64: add get_user to machine check safe
  2022-04-08 15:22       ` Mark Rutland
@ 2022-04-09  9:17         ` Tong Tiangen
  0 siblings, 0 replies; 28+ messages in thread
From: Tong Tiangen @ 2022-04-09  9:17 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Catalin Marinas, Will Deacon, Alexander Viro, x86,
	H. Peter Anvin, linux-arm-kernel, linux-kernel, linux-mm



On 2022/4/8 23:22, Mark Rutland wrote:
> On Thu, Apr 07, 2022 at 10:38:04PM +0800, Tong Tiangen wrote:
>> On 2022/4/6 19:22, Mark Rutland wrote:
>>> On Wed, Apr 06, 2022 at 09:13:09AM +0000, Tong Tiangen wrote:
>>>> Add the get_user scenario to machine check safe. The processing of
>>>> EX_TYPE_UACCESS_ERR_ZERO and EX_TYPE_UACCESS_ERR_ZERO_UCE_RECOVERY
>>>> is the same: both return -EFAULT.
>>>
>>> Which uaccess cases do we expect to *not* be recoverable?
>>>
>>> Naively I would assume that if we're going to treat a memory error on a uaccess
>>> as fatal to userspace we should be able to do that for *any* uaccesses.
>>>
>>> The commit message should explain why we need the distinction between a
>>> recoverable uaccess and a non-recoverable uaccess.
>>>
>>> Thanks,
>>> Mark.
>>
>> Currently, any memory error consumed in kernel mode leads to a panic
>> (do_sea()).
>>
>> My idea is that not all memory errors consumed in kernel mode are
>> fatal; for example, copy_from_user()/get_user() consume memory errors
>> while reading user data in process context. In this case we need not
>> let the kernel panic; just kill the process, without affecting the
>> operation of the system.
> 
> I understood this part.
> 
>> However, not all uaccess can be recovered without affecting the normal
>> operation of the system. The key is not whether it is a uaccess, but
>> whether the page being read holds key data that affects the normal
>> operation of the system.
> 
> Ok. Can you give an example of such a case, where a uaccess that hits
> a memory error must be fatal?
> 
> I think you might be trying to say that for copy_{to,from}_user() we can
> make that judgement, but those are combined user+kernel access
> primitives, and the *uaccess* part should never be reading from a page
> with "key data affecting the normal operation of the system", since
> that's userspace memory.
> 
> Is there any *userspace access* (e.g. where we use LDTR/STTR today)
> where we must treat a memory error as fatal to the system?
> 
> Thanks,
> Mark.

I think I understand what you mean now.
Take copy_to_user()/put_user() as an example: if they encounter a
memory error, only the related process is affected. By this
understanding, it seems that all uaccess can be recovered.
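
A sketch of where that conclusion could lead -- collapsing the separate
_MC uaccess annotation back into the ordinary one; this is an assumed
shape, not a posted patch:

bool fixup_exception_mc(struct pt_regs *regs)
{
	const struct exception_table_entry *ex;

	ex = search_exception_tables(instruction_pointer(regs));
	if (!ex)
		return false;

	switch (ex->type) {
	case EX_TYPE_UACCESS_ERR_ZERO:
		/*
		 * A uaccess only touches user-owned memory, so a
		 * memory error here is contained by failing the
		 * access and letting memory_failure() kill the task.
		 */
		return ex_handler_uaccess_err_zero(ex, regs);
	default:
		/* Other kernel accesses remain fatal for now. */
		return false;
	}
}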

Thanks,
Tong.


* Re: [RFC PATCH -next V2 7/7] arm64: add pagecache reading to machine check safe
  2022-04-08 11:11           ` Robin Murphy
@ 2022-04-09  9:24             ` Tong Tiangen
  0 siblings, 0 replies; 28+ messages in thread
From: Tong Tiangen @ 2022-04-09  9:24 UTC (permalink / raw)
  To: Robin Murphy, Mark Rutland, james.morse
  Cc: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Catalin Marinas, Will Deacon, Alexander Viro, x86,
	H. Peter Anvin, linux-arm-kernel, linux-kernel, linux-mm,
	Kefeng Wang, james.morse



On 2022/4/8 19:11, Robin Murphy wrote:
> On 2022-04-08 03:43, Tong Tiangen wrote:
>>
>>
>> On 2022/4/7 23:53, Robin Murphy wrote:
>>> On 2022-04-07 15:56, Tong Tiangen wrote:
>>>>
>>>>
>>>> On 2022/4/6 19:27, Mark Rutland wrote:
>>>>> On Wed, Apr 06, 2022 at 09:13:11AM +0000, Tong Tiangen wrote:
>>>>>> When a user process reads a file, the data is cached in the pagecache
>>>>>> and belongs to the user process. When a machine check error is
>>>>>> encountered during pagecache reading, killing the user process and
>>>>>> isolating the page with hardware memory errors is a more reasonable
>>>>>> choice than a kernel panic.
>>>>>>
>>>>>> __arch_copy_mc_to_user() in copy_to_user_mc.S largely borrows from
>>>>>> __arch_copy_to_user() in copy_to_user.S; the main difference is that
>>>>>> __arch_copy_mc_to_user() adds the extable entries needed to support
>>>>>> machine check safety.
>>>>>
>>>>> As with prior patches, *why* is the distinction necessary?
>>>>>
>>>>> This patch adds a bunch of conditional logic, but *structurally* it 
>>>>> doesn't
>>>>> alter the handling to be substantially different for the MC and 
>>>>> non-MC cases.
>>>>>
>>>>> This seems like pointless duplication that just makes it harder to 
>>>>> maintain
>>>>> this code.
>>>>>
>>>>> Thanks,
>>>>> Mark.
>>>>
>>>> Agreed, the implementation here looks a little ugly and is harder to
>>>> maintain.
>>>>
>>>> The reason I did it this way is that not all copy_to_user() calls can
>>>> be recovered.
>>>>
>>>> A memory error is consumed when reading the pagecache using
>>>> copy_to_user(). In this scenario I think only the process is affected,
>>>> because it cannot read the pagecache data correctly; just kill the
>>>> process, there is no need for a whole-kernel panic.
>>>>
>>>> So I need two different copy_to_user() implementations. One is the
>>>> existing __arch_copy_to_user(), which still panics when it consumes a
>>>> memory error. The other is the new helper __arch_copy_mc_to_user(),
>>>> used when reading the pagecache; it can recover from a consumed memory
>>>> error.
>>>
>>> OK, but do we really need two almost-identical implementations of 
>>> every function where the only difference is how the exception table 
>>> entries are annotated? Could the exception handler itself just figure 
>>> out who owns the page where the fault occurred and decide what action 
>>> to take as appropriate?
>>>
>>> Robin.
>>>
>>
>> Thank you, Robin.
>>
>> In this patchset I added the call path do_sea() -> fixup_exception();
>> the purpose is to give a SEA fault a chance to be fixed up (see patch
>> 3/7).
>>
>> If the fixup succeeds, the panic can be avoided; otherwise we panic
>> according to the original logic.
>>
>> fixup_exception() sets regs->pc, and execution resumes at regs->pc
>> when the exception context is restored.
>>
>> If the MC-safe fixup were added to __arch_copy_to_user(), then a *non
>> pagecache reading* scenario that hit a memory error in
>> __arch_copy_to_user() would have do_sea() -> fixup_exception()
>> succeed, and the panic logic in do_sea() would be skipped when it
>> should not be.
>>
>> So I added the new helper __arch_copy_mc_to_user() and attached the
>> MC-safe fixup to it, so that it is used only in the scenarios that
>> require it. At present there is only the pagecache reading scenario;
>> other scenarios remain to be developed.
>>
>> This is my current dilemma. Of course, I will think about a solution
>> to the duplicate-code problem.
> 
> Right, but if the point is that faults in pagecache pages are special, 
> surely __do_kernel_fault() could ultimately figure out from the address 
> whether that's the case or not?
> 
> In general, if the principle is that whether a fault is recoverable or 
> not depends on what was being accessed, then to me it seems 
> fundamentally more robust to base that decision on details of the fault 
> that actually occurred, rather than what the caller thought it was 
> supposed to be doing at the time.
> 
> Thanks,
> Robin.
According to Mark's suggestion, all uaccess can be recovered, including
copy_to_user(), so there is no need to add the new helper
__arch_copy_mc_to_user().

Thanks,
Tong.



Thread overview: 28+ messages
2022-04-06  9:13 [RFC PATCH -next V2 0/7]arm64: add machine check safe support Tong Tiangen
2022-04-06  9:13 ` [RFC PATCH -next V2 1/7] x86: fix copy_mc_to_user compile error Tong Tiangen
2022-04-06  9:22   ` Borislav Petkov
2022-04-06 10:02     ` Tong Tiangen
2022-04-06  9:13 ` [RFC PATCH -next V2 2/7] arm64: fix page_address return value in copy_highpage Tong Tiangen
2022-04-06 10:22   ` Mark Rutland
2022-04-06 12:47     ` Tong Tiangen
2022-04-06  9:13 ` [RFC PATCH -next V2 3/7] arm64: add support for machine check error safe Tong Tiangen
2022-04-06 10:58   ` Mark Rutland
2022-04-07 14:26     ` Tong Tiangen
2022-04-06  9:13 ` [RFC PATCH -next V2 4/7] arm64: add copy_from_user to machine check safe Tong Tiangen
2022-04-06 11:19   ` Mark Rutland
2022-04-07 14:28     ` Tong Tiangen
2022-04-06  9:13 ` [RFC PATCH -next V2 5/7] arm64: add get_user " Tong Tiangen
2022-04-06 11:22   ` Mark Rutland
2022-04-07 14:38     ` Tong Tiangen
2022-04-08 15:22       ` Mark Rutland
2022-04-09  9:17         ` Tong Tiangen
2022-04-06  9:13 ` [RFC PATCH -next V2 6/7] arm64: add cow " Tong Tiangen
2022-04-06  9:13 ` [RFC PATCH -next V2 7/7] arm64: add pagecache reading " Tong Tiangen
2022-04-06 11:27   ` Mark Rutland
2022-04-07 14:56     ` Tong Tiangen
2022-04-07 15:53       ` Robin Murphy
2022-04-08  2:43         ` Tong Tiangen
2022-04-08 11:11           ` Robin Murphy
2022-04-09  9:24             ` Tong Tiangen
2022-04-06 10:04 ` [RFC PATCH -next V2 0/7]arm64: add machine check safe support Mark Rutland
2022-04-07  4:21   ` Tong Tiangen
