linux-arch.vger.kernel.org archive mirror
* [PATCH 0/3] x86/dumpstack: Inline copy_from_user_nmi()
@ 2022-09-16 13:59 Kees Cook
  2022-09-16 13:59 ` [PATCH 1/3] x86/uaccess: Move nmi_uaccess_okay() into uaccess.h Kees Cook
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Kees Cook @ 2022-09-16 13:59 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Kees Cook, Uladzislau Rezki, Andrew Morton, Yu Zhao, dev,
	Peter Zijlstra, Ingo Molnar, linux-kernel, x86, linux-perf-users,
	linux-mm, linux-hardening, linux-arch

Hi,

This fixes a find_vmap_area() deadlock. The main fix is patch 2, repeated here:

    The check_object_size() helper under CONFIG_HARDENED_USERCOPY is
    designed to skip any checks where the length is known at compile time as
    a reasonable heuristic to avoid "likely known-good" cases. However, it can
    only do this when the copy_*_user() helpers are, themselves, inline too.

    Using find_vmap_area() requires taking a spinlock. The check_object_size()
    helper can call find_vmap_area() when the destination is in vmap memory.
    If show_regs() is called in interrupt context, it will attempt a call to
    copy_from_user_nmi(), which may call check_object_size() and then
    find_vmap_area(). If something in normal context happens to be in the
    middle of calling find_vmap_area() (with the spinlock held), the interrupt
    handler will hang forever.

    The copy_from_user_nmi() call is actually being called with a fixed-size
    length, so check_object_size() should never have been called in the
    first place. In order for check_object_size() to see that the length is
    a fixed size, inline copy_from_user_nmi(), as already done with all the
    other uaccess helpers.

    Reported-by: Yu Zhao <yuzhao@google.com>
    Link: https://lore.kernel.org/all/CAOUHufaPshtKrTWOz7T7QFYUNVGFm0JBjvM700Nhf9qEL9b3EQ@mail.gmail.com
    Reported-by: dev@der-flo.net

Patch 1 is a refactor for patch 2, and patch 3 should make sure we avoid
future deadlocks.
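
For reference, the compile-time skip is just a __builtin_constant_p() test
in the inline check_object_size() wrapper; roughly (simplified sketch, not
the verbatim kernel source):

    #ifdef CONFIG_HARDENED_USERCOPY
    extern void __check_object_size(const void *ptr, unsigned long n,
                                    bool to_user);

    /*
     * Skip the runtime check when the copy length is a compile-time
     * constant. __builtin_constant_p(n) can only evaluate true when this
     * is inlined into a caller that passes a constant length, which is
     * why an out-of-line copy_from_user_nmi() defeats the heuristic.
     */
    static __always_inline void check_object_size(const void *ptr,
                                                  unsigned long n,
                                                  bool to_user)
    {
            if (!__builtin_constant_p(n))
                    __check_object_size(ptr, n, to_user);
    }
    #else
    static inline void check_object_size(const void *ptr, unsigned long n,
                                         bool to_user)
    { }
    #endif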

Thanks,

-Kees


Kees Cook (3):
  x86/uaccess: Move nmi_uaccess_okay() into uaccess.h
  x86/dumpstack: Inline copy_from_user_nmi()
  usercopy: Add find_vmap_area_try() to avoid deadlocks

 arch/x86/events/core.c          |  1 -
 arch/x86/include/asm/tlbflush.h |  3 --
 arch/x86/include/asm/uaccess.h  |  5 ++--
 arch/x86/kernel/dumpstack.c     |  4 +--
 arch/x86/lib/Makefile           |  2 +-
 arch/x86/lib/usercopy.c         | 52 ---------------------------------
 include/asm-generic/tlb.h       |  9 ------
 include/linux/uaccess.h         | 50 +++++++++++++++++++++++++++++++
 include/linux/vmalloc.h         |  1 +
 kernel/trace/bpf_trace.c        |  2 --
 mm/usercopy.c                   | 11 ++++++-
 mm/vmalloc.c                    | 11 +++++++
 12 files changed, 78 insertions(+), 73 deletions(-)
 delete mode 100644 arch/x86/lib/usercopy.c

-- 
2.34.1



* [PATCH 1/3] x86/uaccess: Move nmi_uaccess_okay() into uaccess.h
  2022-09-16 13:59 [PATCH 0/3] x86/dumpstack: Inline copy_from_user_nmi() Kees Cook
@ 2022-09-16 13:59 ` Kees Cook
  2022-09-16 13:59 ` [PATCH 2/3] x86/dumpstack: Inline copy_from_user_nmi() Kees Cook
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Kees Cook @ 2022-09-16 13:59 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Kees Cook, Uladzislau Rezki, Peter Zijlstra, Ingo Molnar,
	Borislav Petkov, x86, linux-perf-users, bpf, Yu Zhao, dev,
	Andrew Morton, linux-kernel, linux-mm, linux-hardening,
	linux-arch

In preparation for inlining copy_from_user_nmi(), move the
nmi_uaccess_okay() declaration into uaccess.h, which makes a bit more
sense anyway. Additionally update all callers to remove the no longer
needed tlbflush.h include, which was only for the declaration of
nmi_uaccess_okay().

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org
Cc: linux-perf-users@vger.kernel.org
Cc: bpf@vger.kernel.org
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Yu Zhao <yuzhao@google.com>
Cc: dev@der-flo.net
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/events/core.c          | 1 -
 arch/x86/include/asm/tlbflush.h | 3 ---
 arch/x86/include/asm/uaccess.h  | 3 +++
 arch/x86/lib/usercopy.c         | 2 --
 include/asm-generic/tlb.h       | 9 ---------
 include/linux/uaccess.h         | 9 +++++++++
 kernel/trace/bpf_trace.c        | 2 --
 7 files changed, 12 insertions(+), 17 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index f969410d0c90..3e2bb6324ca3 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -36,7 +36,6 @@
 #include <asm/smp.h>
 #include <asm/alternative.h>
 #include <asm/mmu_context.h>
-#include <asm/tlbflush.h>
 #include <asm/timer.h>
 #include <asm/desc.h>
 #include <asm/ldt.h>
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index cda3118f3b27..233818bb72c6 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -157,9 +157,6 @@ struct tlb_state_shared {
 };
 DECLARE_PER_CPU_SHARED_ALIGNED(struct tlb_state_shared, cpu_tlbstate_shared);
 
-bool nmi_uaccess_okay(void);
-#define nmi_uaccess_okay nmi_uaccess_okay
-
 /* Initialize cr4 shadow for this CPU. */
 static inline void cr4_init_shadow(void)
 {
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 913e593a3b45..e9390eea861b 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -20,6 +20,9 @@ static inline bool pagefault_disabled(void);
 # define WARN_ON_IN_IRQ()
 #endif
 
+bool nmi_uaccess_okay(void);
+#define nmi_uaccess_okay nmi_uaccess_okay
+
 /**
  * access_ok - Checks if a user space pointer is valid
  * @addr: User space pointer to start of block to check
diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
index ad0139d25401..959489f2f814 100644
--- a/arch/x86/lib/usercopy.c
+++ b/arch/x86/lib/usercopy.c
@@ -7,8 +7,6 @@
 #include <linux/uaccess.h>
 #include <linux/export.h>
 
-#include <asm/tlbflush.h>
-
 /**
  * copy_from_user_nmi - NMI safe copy from user
  * @to:		Pointer to the destination buffer
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index 492dce43236e..14efd74f3e70 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -17,15 +17,6 @@
 #include <asm/tlbflush.h>
 #include <asm/cacheflush.h>
 
-/*
- * Blindly accessing user memory from NMI context can be dangerous
- * if we're in the middle of switching the current user task or switching
- * the loaded mm.
- */
-#ifndef nmi_uaccess_okay
-# define nmi_uaccess_okay() true
-#endif
-
 #ifdef CONFIG_MMU
 
 /*
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 47e5d374c7eb..065e121d2a86 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -10,6 +10,15 @@
 
 #include <asm/uaccess.h>
 
+/*
+ * Blindly accessing user memory from NMI context can be dangerous
+ * if we're in the middle of switching the current user task or switching
+ * the loaded mm.
+ */
+#ifndef nmi_uaccess_okay
+# define nmi_uaccess_okay() true
+#endif
+
 /*
  * Architectures should provide two primitives (raw_copy_{to,from}_user())
  * and get rid of their private instances of copy_{to,from}_user() and
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 68e5cdd24cef..0fd185c3d174 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -26,8 +26,6 @@
 #include <uapi/linux/bpf.h>
 #include <uapi/linux/btf.h>
 
-#include <asm/tlb.h>
-
 #include "trace_probe.h"
 #include "trace.h"
 
-- 
2.34.1



* [PATCH 2/3] x86/dumpstack: Inline copy_from_user_nmi()
  2022-09-16 13:59 [PATCH 0/3] x86/dumpstack: Inline copy_from_user_nmi() Kees Cook
  2022-09-16 13:59 ` [PATCH 1/3] x86/uaccess: Move nmi_uaccess_okay() into uaccess.h Kees Cook
@ 2022-09-16 13:59 ` Kees Cook
  2022-09-16 13:59 ` [PATCH 3/3] usercopy: Add find_vmap_area_try() to avoid deadlocks Kees Cook
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Kees Cook @ 2022-09-16 13:59 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Kees Cook, Uladzislau Rezki, Yu Zhao, dev, Peter Zijlstra,
	Josh Poimboeuf, Dave Hansen, x86, stable, Andrew Morton,
	Ingo Molnar, linux-kernel, linux-perf-users, linux-mm,
	linux-hardening, linux-arch

The check_object_size() helper under CONFIG_HARDENED_USERCOPY is
designed to skip any checks where the length is known at compile time as
a reasonable heuristic to avoid "likely known-good" cases. However, it can
only do this when the copy_*_user() helpers are, themselves, inline too.

Using find_vmap_area() requires taking a spinlock. The check_object_size()
helper can call find_vmap_area() when the destination is in vmap memory.
If show_regs() is called in interrupt context, it will attempt a call to
copy_from_user_nmi(), which may call check_object_size() and then
find_vmap_area(). If something in normal context happens to be in the
middle of calling find_vmap_area() (with the spinlock held), the interrupt
handler will hang forever.

The copy_from_user_nmi() call is actually being called with a fixed-size
length, so check_object_size() should never have been called in the
first place. In order for check_object_size() to see that the length is
a fixed size, inline copy_from_user_nmi(), as already done with all the
other uaccess helpers.

Reported-by: Yu Zhao <yuzhao@google.com>
Link: https://lore.kernel.org/all/CAOUHufaPshtKrTWOz7T7QFYUNVGFm0JBjvM700Nhf9qEL9b3EQ@mail.gmail.com
Reported-by: dev@der-flo.net
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Fixes: 0aef499f3172 ("mm/usercopy: Detect vmalloc overruns")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/include/asm/uaccess.h |  2 --
 arch/x86/kernel/dumpstack.c    |  4 +--
 arch/x86/lib/Makefile          |  2 +-
 arch/x86/lib/usercopy.c        | 50 ----------------------------------
 include/linux/uaccess.h        | 41 ++++++++++++++++++++++++++++
 5 files changed, 44 insertions(+), 55 deletions(-)
 delete mode 100644 arch/x86/lib/usercopy.c

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index e9390eea861b..f47c0c752e7a 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -498,8 +498,6 @@ struct __large_struct { unsigned long buf[100]; };
 		: : ltype(x), "m" (__m(addr))				\
 		: : label)
 
-extern unsigned long
-copy_from_user_nmi(void *to, const void __user *from, unsigned long n);
 extern __must_check long
 strncpy_from_user(char *dst, const char __user *src, long count);
 
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index afae4dd77495..b59d59ef10d2 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -72,8 +72,8 @@ static void printk_stack_address(unsigned long address, int reliable,
 	printk("%s %s%pBb\n", log_lvl, reliable ? "" : "? ", (void *)address);
 }
 
-static int copy_code(struct pt_regs *regs, u8 *buf, unsigned long src,
-		     unsigned int nbytes)
+static __always_inline int
+copy_code(struct pt_regs *regs, u8 *buf, unsigned long src, unsigned int nbytes)
 {
 	if (!user_mode(regs))
 		return copy_from_kernel_nofault(buf, (u8 *)src, nbytes);
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index f76747862bd2..aeb5cd634e27 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -42,7 +42,7 @@ clean-files := inat-tables.c
 obj-$(CONFIG_SMP) += msr-smp.o cache-smp.o
 
 lib-y := delay.o misc.o cmdline.o cpu.o
-lib-y += usercopy_$(BITS).o usercopy.o getuser.o putuser.o
+lib-y += usercopy_$(BITS).o getuser.o putuser.o
 lib-y += memcpy_$(BITS).o
 lib-y += pc-conf-reg.o
 lib-$(CONFIG_ARCH_HAS_COPY_MC) += copy_mc.o copy_mc_64.o
diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
deleted file mode 100644
index 959489f2f814..000000000000
--- a/arch/x86/lib/usercopy.c
+++ /dev/null
@@ -1,50 +0,0 @@
-/*
- * User address space access functions.
- *
- *  For licencing details see kernel-base/COPYING
- */
-
-#include <linux/uaccess.h>
-#include <linux/export.h>
-
-/**
- * copy_from_user_nmi - NMI safe copy from user
- * @to:		Pointer to the destination buffer
- * @from:	Pointer to a user space address of the current task
- * @n:		Number of bytes to copy
- *
- * Returns: The number of not copied bytes. 0 is success, i.e. all bytes copied
- *
- * Contrary to other copy_from_user() variants this function can be called
- * from NMI context. Despite the name it is not restricted to be called
- * from NMI context. It is safe to be called from any other context as
- * well. It disables pagefaults across the copy which means a fault will
- * abort the copy.
- *
- * For NMI context invocations this relies on the nested NMI work to allow
- * atomic faults from the NMI path; the nested NMI paths are careful to
- * preserve CR2.
- */
-unsigned long
-copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
-{
-	unsigned long ret;
-
-	if (!__access_ok(from, n))
-		return n;
-
-	if (!nmi_uaccess_okay())
-		return n;
-
-	/*
-	 * Even though this function is typically called from NMI/IRQ context
-	 * disable pagefaults so that its behaviour is consistent even when
-	 * called from other contexts.
-	 */
-	pagefault_disable();
-	ret = __copy_from_user_inatomic(to, from, n);
-	pagefault_enable();
-
-	return ret;
-}
-EXPORT_SYMBOL_GPL(copy_from_user_nmi);
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 065e121d2a86..fee141ed8f95 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -273,6 +273,47 @@ __copy_from_user_inatomic_nocache(void *to, const void __user *from,
 
 #endif		/* ARCH_HAS_NOCACHE_UACCESS */
 
+/**
+ * copy_from_user_nmi - NMI safe copy from user
+ * @to:		Pointer to the destination buffer
+ * @from:	Pointer to a user space address of the current task
+ * @n:		Number of bytes to copy
+ *
+ * Returns: The number of not copied bytes. 0 is success, i.e. all bytes copied
+ *
+ * Contrary to other copy_from_user() variants this function can be called
+ * from NMI context. Despite the name it is not restricted to be called
+ * from NMI context. It is safe to be called from any other context as
+ * well. It disables pagefaults across the copy which means a fault will
+ * abort the copy.
+ *
+ * For NMI context invocations this relies on the nested NMI work to allow
+ * atomic faults from the NMI path; the nested NMI paths are careful to
+ * preserve CR2.
+ */
+static __always_inline unsigned long
+copy_from_user_nmi(void *to, const void __user *from, const unsigned long n)
+{
+	unsigned long ret;
+
+	if (!__access_ok(from, n))
+		return n;
+
+	if (!nmi_uaccess_okay())
+		return n;
+
+	/*
+	 * Even though this function is typically called from NMI/IRQ context
+	 * disable pagefaults so that its behaviour is consistent even when
+	 * called from other contexts.
+	 */
+	pagefault_disable();
+	ret = __copy_from_user_inatomic(to, from, n);
+	pagefault_enable();
+
+	return ret;
+}
+
 extern __must_check int check_zeroed_user(const void __user *from, size_t size);
 
 /**
-- 
2.34.1



* [PATCH 3/3] usercopy: Add find_vmap_area_try() to avoid deadlocks
  2022-09-16 13:59 [PATCH 0/3] x86/dumpstack: Inline copy_from_user_nmi() Kees Cook
  2022-09-16 13:59 ` [PATCH 1/3] x86/uaccess: Move nmi_uaccess_okay() into uaccess.h Kees Cook
  2022-09-16 13:59 ` [PATCH 2/3] x86/dumpstack: Inline copy_from_user_nmi() Kees Cook
@ 2022-09-16 13:59 ` Kees Cook
  2022-09-16 14:46   ` Matthew Wilcox
  2022-09-16 17:29   ` Uladzislau Rezki
  2022-09-16 19:57 ` [PATCH 0/3] x86/dumpstack: Inline copy_from_user_nmi() Andrew Morton
  2022-09-17  2:20 ` Yu Zhao
  4 siblings, 2 replies; 13+ messages in thread
From: Kees Cook @ 2022-09-16 13:59 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Kees Cook, Uladzislau Rezki, Andrew Morton, Yu Zhao, dev,
	linux-mm, linux-hardening, Peter Zijlstra, Ingo Molnar,
	linux-kernel, x86, linux-perf-users, linux-arch

The check_object_size() checks under CONFIG_HARDENED_USERCOPY need to be
more defensive against running from interrupt context. Use a best-effort
check for VMAP areas when running in interrupt context.

Suggested-by: Matthew Wilcox <willy@infradead.org>
Link: https://lore.kernel.org/linux-mm/YyQ2CSdIJdvQPSPO@casper.infradead.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Yu Zhao <yuzhao@google.com>
Cc: dev@der-flo.net
Cc: linux-mm@kvack.org
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 include/linux/vmalloc.h |  1 +
 mm/usercopy.c           | 11 ++++++++++-
 mm/vmalloc.c            | 11 +++++++++++
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 096d48aa3437..c8a00f181a11 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -216,6 +216,7 @@ void free_vm_area(struct vm_struct *area);
 extern struct vm_struct *remove_vm_area(const void *addr);
 extern struct vm_struct *find_vm_area(const void *addr);
 struct vmap_area *find_vmap_area(unsigned long addr);
+struct vmap_area *find_vmap_area_try(unsigned long addr);
 
 static inline bool is_vm_area_hugepages(const void *addr)
 {
diff --git a/mm/usercopy.c b/mm/usercopy.c
index c1ee15a98633..4a371099ac64 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -173,7 +173,16 @@ static inline void check_heap_object(const void *ptr, unsigned long n,
 	}
 
 	if (is_vmalloc_addr(ptr)) {
-		struct vmap_area *area = find_vmap_area(addr);
+		struct vmap_area *area;
+
+		if (unlikely(in_interrupt())) {
+			area = find_vmap_area_try(addr);
+			/* Give up under interrupt to avoid deadlocks. */
+			if (!area)
+				return;
+		} else {
+			area = find_vmap_area(addr);
+		}
 
 		if (!area)
 			usercopy_abort("vmalloc", "no area", to_user, 0, n);
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index dd6cdb201195..f14f1902c2f6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1840,6 +1840,17 @@ struct vmap_area *find_vmap_area(unsigned long addr)
 	return va;
 }
 
+struct vmap_area *find_vmap_area_try(unsigned long addr)
+{
+	struct vmap_area *va = NULL;
+
+	if (spin_trylock(&vmap_area_lock)) {
+		va = __find_vmap_area(addr, &vmap_area_root);
+		spin_unlock(&vmap_area_lock);
+	}
+	return va;
+}
+
 /*** Per cpu kva allocator ***/
 
 /*
-- 
2.34.1



* Re: [PATCH 3/3] usercopy: Add find_vmap_area_try() to avoid deadlocks
  2022-09-16 13:59 ` [PATCH 3/3] usercopy: Add find_vmap_area_try() to avoid deadlocks Kees Cook
@ 2022-09-16 14:46   ` Matthew Wilcox
  2022-09-16 15:09     ` Kees Cook
  2022-09-16 17:29   ` Uladzislau Rezki
  1 sibling, 1 reply; 13+ messages in thread
From: Matthew Wilcox @ 2022-09-16 14:46 UTC (permalink / raw)
  To: Kees Cook
  Cc: Uladzislau Rezki, Andrew Morton, Yu Zhao, dev, linux-mm,
	linux-hardening, Peter Zijlstra, Ingo Molnar, linux-kernel, x86,
	linux-perf-users, linux-arch

On Fri, Sep 16, 2022 at 06:59:57AM -0700, Kees Cook wrote:
> The check_object_size() checks under CONFIG_HARDENED_USERCOPY need to be
> more defensive against running from interrupt context. Use a best-effort
> check for VMAP areas when running in interrupt context

I had something more like this in mind:

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 096d48aa3437..2b7c52e76856 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -215,7 +215,7 @@ extern struct vm_struct *__get_vm_area_caller(unsigned long size,
 void free_vm_area(struct vm_struct *area);
 extern struct vm_struct *remove_vm_area(const void *addr);
 extern struct vm_struct *find_vm_area(const void *addr);
-struct vmap_area *find_vmap_area(unsigned long addr);
+struct vmap_area *find_vmap_area_try(unsigned long addr);
 
 static inline bool is_vm_area_hugepages(const void *addr)
 {
diff --git a/mm/usercopy.c b/mm/usercopy.c
index c1ee15a98633..e0fb605c1b38 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -173,7 +173,11 @@ static inline void check_heap_object(const void *ptr, unsigned long n,
 	}
 
 	if (is_vmalloc_addr(ptr)) {
-		struct vmap_area *area = find_vmap_area(addr);
+		struct vmap_area *area = find_vmap_area_try(addr);
+
+		/* We may be in NMI context */
+		if (area == ERR_PTR(-EAGAIN))
+			return;
 
 		if (!area)
 			usercopy_abort("vmalloc", "no area", to_user, 0, n);
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index dd6cdb201195..2ea76cb56d4b 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1829,7 +1829,7 @@ static void free_unmap_vmap_area(struct vmap_area *va)
 	free_vmap_area_noflush(va);
 }
 
-struct vmap_area *find_vmap_area(unsigned long addr)
+static struct vmap_area *find_vmap_area(unsigned long addr)
 {
 	struct vmap_area *va;
 
@@ -1840,6 +1840,18 @@ struct vmap_area *find_vmap_area(unsigned long addr)
 	return va;
 }
 
+struct vmap_area *find_vmap_area_try(unsigned long addr)
+{
+	struct vmap_area *va;
+
+	if (!spin_lock(&vmap_area_lock))
+		return ERR_PTR(-EAGAIN);
+	va = __find_vmap_area(addr, &vmap_area_root);
+	spin_unlock(&vmap_area_lock);
+
+	return va;
+}
+
 /*** Per cpu kva allocator ***/
 
 /*


* Re: [PATCH 3/3] usercopy: Add find_vmap_area_try() to avoid deadlocks
  2022-09-16 14:46   ` Matthew Wilcox
@ 2022-09-16 15:09     ` Kees Cook
  2022-09-16 19:15       ` Matthew Wilcox
  0 siblings, 1 reply; 13+ messages in thread
From: Kees Cook @ 2022-09-16 15:09 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Uladzislau Rezki, Andrew Morton, Yu Zhao, dev, linux-mm,
	linux-hardening, Peter Zijlstra, Ingo Molnar, linux-kernel, x86,
	linux-perf-users, linux-arch

On Fri, Sep 16, 2022 at 03:46:07PM +0100, Matthew Wilcox wrote:
> On Fri, Sep 16, 2022 at 06:59:57AM -0700, Kees Cook wrote:
> > The check_object_size() checks under CONFIG_HARDENED_USERCOPY need to be
> > more defensive against running from interrupt context. Use a best-effort
> > check for VMAP areas when running in interrupt context
> 
> I had something more like this in mind:

Yeah, I like -EAGAIN. I'd like to keep the interrupt test to choose lock
vs trylock, otherwise it's trivial to bypass the hardening test by having
all the other CPUs beating on the spinlock.

Thanks!

-Kees

-- 
Kees Cook


* Re: [PATCH 3/3] usercopy: Add find_vmap_area_try() to avoid deadlocks
  2022-09-16 13:59 ` [PATCH 3/3] usercopy: Add find_vmap_area_try() to avoid deadlocks Kees Cook
  2022-09-16 14:46   ` Matthew Wilcox
@ 2022-09-16 17:29   ` Uladzislau Rezki
  1 sibling, 0 replies; 13+ messages in thread
From: Uladzislau Rezki @ 2022-09-16 17:29 UTC (permalink / raw)
  To: Kees Cook
  Cc: Matthew Wilcox, Uladzislau Rezki, Andrew Morton, Yu Zhao, dev,
	linux-mm, linux-hardening, Peter Zijlstra, Ingo Molnar,
	linux-kernel, x86, linux-perf-users, linux-arch

On Fri, Sep 16, 2022 at 06:59:57AM -0700, Kees Cook wrote:
> The check_object_size() checks under CONFIG_HARDENED_USERCOPY need to be
> more defensive against running from interrupt context. Use a best-effort
> check for VMAP areas when running in interrupt context
> 
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Link: https://lore.kernel.org/linux-mm/YyQ2CSdIJdvQPSPO@casper.infradead.org
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Yu Zhao <yuzhao@google.com>
> Cc: dev@der-flo.net
> Cc: linux-mm@kvack.org
> Cc: linux-hardening@vger.kernel.org
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  include/linux/vmalloc.h |  1 +
>  mm/usercopy.c           | 11 ++++++++++-
>  mm/vmalloc.c            | 11 +++++++++++
>  3 files changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 096d48aa3437..c8a00f181a11 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -216,6 +216,7 @@ void free_vm_area(struct vm_struct *area);
>  extern struct vm_struct *remove_vm_area(const void *addr);
>  extern struct vm_struct *find_vm_area(const void *addr);
>  struct vmap_area *find_vmap_area(unsigned long addr);
> +struct vmap_area *find_vmap_area_try(unsigned long addr);
>  
>  static inline bool is_vm_area_hugepages(const void *addr)
>  {
> diff --git a/mm/usercopy.c b/mm/usercopy.c
> index c1ee15a98633..4a371099ac64 100644
> --- a/mm/usercopy.c
> +++ b/mm/usercopy.c
> @@ -173,7 +173,16 @@ static inline void check_heap_object(const void *ptr, unsigned long n,
>  	}
>  
>  	if (is_vmalloc_addr(ptr)) {
> -		struct vmap_area *area = find_vmap_area(addr);
> +		struct vmap_area *area;
> +
> +		if (unlikely(in_interrupt())) {
> +			area = find_vmap_area_try(addr);
> +			/* Give up under interrupt to avoid deadlocks. */
> +			if (!area)
> +				return;
> +		} else {
> +			area = find_vmap_area(addr);
> +		}
>  
>  		if (!area)
>  			usercopy_abort("vmalloc", "no area", to_user, 0, n);
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index dd6cdb201195..f14f1902c2f6 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1840,6 +1840,17 @@ struct vmap_area *find_vmap_area(unsigned long addr)
>  	return va;
>  }
>  
> +struct vmap_area *find_vmap_area_try(unsigned long addr)
>
I think it makes sense to add a comment explaining why such a function
exists and from which context it is supposed to be used. Maybe rename it
to something with an "atomic" prefix.

Thanks.

--
Uladzislau Rezki


* Re: [PATCH 3/3] usercopy: Add find_vmap_area_try() to avoid deadlocks
  2022-09-16 15:09     ` Kees Cook
@ 2022-09-16 19:15       ` Matthew Wilcox
  2022-09-19  8:29         ` Uladzislau Rezki
  0 siblings, 1 reply; 13+ messages in thread
From: Matthew Wilcox @ 2022-09-16 19:15 UTC (permalink / raw)
  To: Kees Cook
  Cc: Uladzislau Rezki, Andrew Morton, Yu Zhao, dev, linux-mm,
	linux-hardening, Peter Zijlstra, Ingo Molnar, linux-kernel, x86,
	linux-perf-users, linux-arch

On Fri, Sep 16, 2022 at 08:09:16AM -0700, Kees Cook wrote:
> On Fri, Sep 16, 2022 at 03:46:07PM +0100, Matthew Wilcox wrote:
> > On Fri, Sep 16, 2022 at 06:59:57AM -0700, Kees Cook wrote:
> > > The check_object_size() checks under CONFIG_HARDENED_USERCOPY need to be
> > > more defensive against running from interrupt context. Use a best-effort
> > > check for VMAP areas when running in interrupt context
> > 
> > I had something more like this in mind:
> 
> Yeah, I like -EAGAIN. I'd like to keep the interrupt test to choose lock
> vs trylock, otherwise it's trivial to bypass the hardening test by having
> all the other CPUs beating on the spinlock.

I was thinking about this:

+++ b/mm/vmalloc.c
@@ -1844,12 +1844,19 @@
 {
 	struct vmap_area *va;

-	if (!spin_lock(&vmap_area_lock))
-		return ERR_PTR(-EAGAIN);
+	/*
+	 * It's safe to walk the rbtree under the RCU lock, but we may
+	 * incorrectly find no vmap_area if the tree is being modified.
+	 */
+	rcu_read_lock();
 	va = __find_vmap_area(addr, &vmap_area_root);
-	spin_unlock(&vmap_area_lock);
+	if (!va && in_interrupt())
+		va = ERR_PTR(-EAGAIN);
+	rcu_read_unlock();

-	return va;
+	if (va)
+		return va;
+	return find_vmap_area(addr);
 }

 /*** Per cpu kva allocator ***/

... but I don't think that works since vmap_areas aren't freed by RCU,
and I think they're reused without going through an RCU cycle.

So here's attempt #4, which actually compiles, and is, I think, what you
had in mind.

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 096d48aa3437..2b7c52e76856 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -215,7 +215,7 @@ extern struct vm_struct *__get_vm_area_caller(unsigned long size,
 void free_vm_area(struct vm_struct *area);
 extern struct vm_struct *remove_vm_area(const void *addr);
 extern struct vm_struct *find_vm_area(const void *addr);
-struct vmap_area *find_vmap_area(unsigned long addr);
+struct vmap_area *find_vmap_area_try(unsigned long addr);
 
 static inline bool is_vm_area_hugepages(const void *addr)
 {
diff --git a/mm/usercopy.c b/mm/usercopy.c
index c1ee15a98633..e0fb605c1b38 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -173,7 +173,11 @@ static inline void check_heap_object(const void *ptr, unsigned long n,
 	}
 
 	if (is_vmalloc_addr(ptr)) {
-		struct vmap_area *area = find_vmap_area(addr);
+		struct vmap_area *area = find_vmap_area_try(addr);
+
+		/* We may be in NMI context */
+		if (area == ERR_PTR(-EAGAIN))
+			return;
 
 		if (!area)
 			usercopy_abort("vmalloc", "no area", to_user, 0, n);
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index dd6cdb201195..c47b3b5d1c2d 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1829,7 +1829,7 @@ static void free_unmap_vmap_area(struct vmap_area *va)
 	free_vmap_area_noflush(va);
 }
 
-struct vmap_area *find_vmap_area(unsigned long addr)
+static struct vmap_area *find_vmap_area(unsigned long addr)
 {
 	struct vmap_area *va;
 
@@ -1840,6 +1840,26 @@ struct vmap_area *find_vmap_area(unsigned long addr)
 	return va;
 }
 
+/*
+ * The vmap_area_lock is not interrupt-safe, and we can end up here from
+ * NMI context, so it's not worth even trying to make it IRQ-safe.
+ */
+struct vmap_area *find_vmap_area_try(unsigned long addr)
+{
+	struct vmap_area *va;
+
+	if (in_interrupt()) {
+		if (!spin_trylock(&vmap_area_lock))
+			return ERR_PTR(-EAGAIN);
+	} else {
+		spin_lock(&vmap_area_lock);
+	}
+	va = __find_vmap_area(addr, &vmap_area_root);
+	spin_unlock(&vmap_area_lock);
+
+	return va;
+}
+
 /*** Per cpu kva allocator ***/
 
 /*


* Re: [PATCH 0/3] x86/dumpstack: Inline copy_from_user_nmi()
  2022-09-16 13:59 [PATCH 0/3] x86/dumpstack: Inline copy_from_user_nmi() Kees Cook
                   ` (2 preceding siblings ...)
  2022-09-16 13:59 ` [PATCH 3/3] usercopy: Add find_vmap_area_try() to avoid deadlocks Kees Cook
@ 2022-09-16 19:57 ` Andrew Morton
  2022-09-19 14:46   ` Peter Zijlstra
  2022-09-17  2:20 ` Yu Zhao
  4 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2022-09-16 19:57 UTC (permalink / raw)
  To: Kees Cook
  Cc: Matthew Wilcox, Uladzislau Rezki, Yu Zhao, dev, Peter Zijlstra,
	Ingo Molnar, linux-kernel, x86, linux-perf-users, linux-mm,
	linux-hardening, linux-arch

On Fri, 16 Sep 2022 06:59:51 -0700 Kees Cook <keescook@chromium.org> wrote:

> Hi,
> 
> This fixes a find_vmap_area() deadlock. The main fix is patch 2, repeated here:
> 
>     The check_object_size() helper under CONFIG_HARDENED_USERCOPY is
>     designed to skip any checks where the length is known at compile time as
>     a reasonable heuristic to avoid "likely known-good" cases. However, it can
>     only do this when the copy_*_user() helpers are, themselves, inline too.
> 
>     Using find_vmap_area() requires taking a spinlock. The check_object_size()
>     helper can call find_vmap_area() when the destination is in vmap memory.
>     If show_regs() is called in interrupt context, it will attempt a call to
>     copy_from_user_nmi(), which may call check_object_size() and then
>     find_vmap_area(). If something in normal context happens to be in the
>     middle of calling find_vmap_area() (with the spinlock held), the interrupt
>     handler will hang forever.
> 
>     The copy_from_user_nmi() call is actually being called with a fixed-size
>     length, so check_object_size() should never have been called in the
>     first place. In order for check_object_size() to see that the length is
>     a fixed size, inline copy_from_user_nmi(), as already done with all the
>     other uaccess helpers.
> 

Why is this so complicated.

There's virtually zero value in running all those debug checks from within
copy_from_user_nmi().

--- a/arch/x86/lib/usercopy.c~a
+++ a/arch/x86/lib/usercopy.c
@@ -44,7 +44,7 @@ copy_from_user_nmi(void *to, const void
 	 * called from other contexts.
 	 */
 	pagefault_disable();
-	ret = __copy_from_user_inatomic(to, from, n);
+	ret = raw_copy_from_user(to, from, n);
 	pagefault_enable();
 
 	return ret;
_



* Re: [PATCH 0/3] x86/dumpstack: Inline copy_from_user_nmi()
  2022-09-16 13:59 [PATCH 0/3] x86/dumpstack: Inline copy_from_user_nmi() Kees Cook
                   ` (3 preceding siblings ...)
  2022-09-16 19:57 ` [PATCH 0/3] x86/dumpstack: Inline copy_from_user_nmi() Andrew Morton
@ 2022-09-17  2:20 ` Yu Zhao
  4 siblings, 0 replies; 13+ messages in thread
From: Yu Zhao @ 2022-09-17  2:20 UTC (permalink / raw)
  To: Kees Cook
  Cc: Matthew Wilcox, Uladzislau Rezki, Andrew Morton, dev,
	Peter Zijlstra, Ingo Molnar, linux-kernel,
	the arch/x86 maintainers, linux-perf-users, Linux-MM,
	linux-hardening, Linux-Arch

On Fri, Sep 16, 2022 at 8:01 AM Kees Cook <keescook@chromium.org> wrote:
>
> Hi,
>
> This fixes a find_vmap_area() deadlock. The main fix is patch 2, repeated here:
>
>     The check_object_size() helper under CONFIG_HARDENED_USERCOPY is
>     designed to skip any checks where the length is known at compile time as
>     a reasonable heuristic to avoid "likely known-good" cases. However, it can
>     only do this when the copy_*_user() helpers are, themselves, inline too.
>
>     Using find_vmap_area() requires taking a spinlock. The check_object_size()
>     helper can call find_vmap_area() when the destination is in vmap memory.
>     If show_regs() is called in interrupt context, it will attempt a call to
>     copy_from_user_nmi(), which may call check_object_size() and then
>     find_vmap_area(). If something in normal context happens to be in the
>     middle of calling find_vmap_area() (with the spinlock held), the interrupt
>     handler will hang forever.
>
>     The copy_from_user_nmi() call is actually being called with a fixed-size
>     length, so check_object_size() should never have been called in the
>     first place. In order for check_object_size() to see that the length is
>     a fixed size, inline copy_from_user_nmi(), as already done with all the
>     other uaccess helpers.
>
>     Reported-by: Yu Zhao <yuzhao@google.com>
>     Link: https://lore.kernel.org/all/CAOUHufaPshtKrTWOz7T7QFYUNVGFm0JBjvM700Nhf9qEL9b3EQ@mail.gmail.com
>     Reported-by: dev@der-flo.net
>
> Patch 1 is a refactor for patch 2, and patch 3 should make sure we avoid
> future deadlocks.

Thanks!

Tested-by: Yu Zhao <yuzhao@google.com>

The same test has been holding up well for a few hours now. It used to
throw that warning in less than half an hour.


* Re: [PATCH 3/3] usercopy: Add find_vmap_area_try() to avoid deadlocks
  2022-09-16 19:15       ` Matthew Wilcox
@ 2022-09-19  8:29         ` Uladzislau Rezki
  0 siblings, 0 replies; 13+ messages in thread
From: Uladzislau Rezki @ 2022-09-19  8:29 UTC (permalink / raw)
  To: Matthew Wilcox, Kees Cook
  Cc: Kees Cook, Uladzislau Rezki, Andrew Morton, Yu Zhao, dev,
	linux-mm, linux-hardening, Peter Zijlstra, Ingo Molnar,
	linux-kernel, x86, linux-perf-users, linux-arch

On Fri, Sep 16, 2022 at 08:15:24PM +0100, Matthew Wilcox wrote:
> On Fri, Sep 16, 2022 at 08:09:16AM -0700, Kees Cook wrote:
> > On Fri, Sep 16, 2022 at 03:46:07PM +0100, Matthew Wilcox wrote:
> > > On Fri, Sep 16, 2022 at 06:59:57AM -0700, Kees Cook wrote:
> > > > The check_object_size() checks under CONFIG_HARDENED_USERCOPY need to be
> > > > more defensive against running from interrupt context. Use a best-effort
> > > > check for VMAP areas when running in interrupt context
> > > 
> > > I had something more like this in mind:
> > 
> > Yeah, I like -EAGAIN. I'd like to keep the interrupt test to choose lock
> > vs trylock, otherwise it's trivial to bypass the hardening test by having
> > all the other CPUs beating on the spinlock.
> 
> I was thinking about this:
> 
> +++ b/mm/vmalloc.c
> @@ -1844,12 +1844,19 @@
>  {
>  	struct vmap_area *va;
> 
> -	if (!spin_lock(&vmap_area_lock))
> -		return ERR_PTR(-EAGAIN);
> +	/*
> +	 * It's safe to walk the rbtree under the RCU lock, but we may
> +	 * incorrectly find no vmap_area if the tree is being modified.
> +	 */
> +	rcu_read_lock();
>  	va = __find_vmap_area(addr, &vmap_area_root);
> -	spin_unlock(&vmap_area_lock);
> +	if (!va && in_interrupt())
> +		va = ERR_PTR(-EAGAIN);
> +	rcu_read_unlock();
> 
> -	return va;
> +	if (va)
> +		return va;
> +	return find_vmap_area(addr);
>  }
> 
>  /*** Per cpu kva allocator ***/
> 
> ... but I don't think that works since vmap_areas aren't freed by RCU,
> and I think they're reused without going through an RCU cycle.
>
Right you are. The vmap_area would need to be freed via the RCU core for
that to be safe, so wrapping the lookup in rcu_* is useless here.

> 
> So here's attempt #4, which actually compiles, and is, I think, what you
> had in mind.
> 
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 096d48aa3437..2b7c52e76856 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -215,7 +215,7 @@ extern struct vm_struct *__get_vm_area_caller(unsigned long size,
>  void free_vm_area(struct vm_struct *area);
>  extern struct vm_struct *remove_vm_area(const void *addr);
>  extern struct vm_struct *find_vm_area(const void *addr);
> -struct vmap_area *find_vmap_area(unsigned long addr);
> +struct vmap_area *find_vmap_area_try(unsigned long addr);
>  
>  static inline bool is_vm_area_hugepages(const void *addr)
>  {
> diff --git a/mm/usercopy.c b/mm/usercopy.c
> index c1ee15a98633..e0fb605c1b38 100644
> --- a/mm/usercopy.c
> +++ b/mm/usercopy.c
> @@ -173,7 +173,11 @@ static inline void check_heap_object(const void *ptr, unsigned long n,
>  	}
>  
>  	if (is_vmalloc_addr(ptr)) {
> -		struct vmap_area *area = find_vmap_area(addr);
> +		struct vmap_area *area = find_vmap_area_try(addr);
> +
> +		/* We may be in NMI context */
> +		if (area == ERR_PTR(-EAGAIN))
> +			return;
>  
>  		if (!area)
>  			usercopy_abort("vmalloc", "no area", to_user, 0, n);
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index dd6cdb201195..c47b3b5d1c2d 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1829,7 +1829,7 @@ static void free_unmap_vmap_area(struct vmap_area *va)
>  	free_vmap_area_noflush(va);
>  }
>  
> -struct vmap_area *find_vmap_area(unsigned long addr)
> +static struct vmap_area *find_vmap_area(unsigned long addr)
>  {
>  	struct vmap_area *va;
>  
> @@ -1840,6 +1840,26 @@ struct vmap_area *find_vmap_area(unsigned long addr)
>  	return va;
>  }
>  
> +/*
> + * The vmap_area_lock is not interrupt-safe, and we can end up here from
> + * NMI context, so it's not worth even trying to make it IRQ-safe.
> + */
> +struct vmap_area *find_vmap_area_try(unsigned long addr)
> +{
> +	struct vmap_area *va;
> +
> +	if (in_interrupt()) {
> +		if (!spin_trylock(&vmap_area_lock))
> +			return ERR_PTR(-EAGAIN);
> +	} else {
> +		spin_lock(&vmap_area_lock);
> +	}
> +	va = __find_vmap_area(addr, &vmap_area_root);
> +	spin_unlock(&vmap_area_lock);
> +
> +	return va;
> +}
> +
>
If we look at it the other way around: there is a user that tries to access
the tree from IRQ context. Can we just extend find_vmap_area() itself with
the following (see the sketch below)?

   - in_interrupt();
   - use trylock if so.
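
Roughly something like this (untested sketch, just to illustrate the idea;
whether a NULL "could not take the lock" result is acceptable for the
other find_vmap_area() callers, versus the ERR_PTR(-EAGAIN) approach
above, would still need deciding):

    struct vmap_area *find_vmap_area(unsigned long addr)
    {
            struct vmap_area *va;

            /*
             * vmap_area_lock is not IRQ-safe and this can be reached from
             * NMI/IRQ context via hardened usercopy, so only try-lock
             * there and let the caller treat a failed lookup as "nothing
             * to check".
             */
            if (in_interrupt()) {
                    if (!spin_trylock(&vmap_area_lock))
                            return NULL;
            } else {
                    spin_lock(&vmap_area_lock);
            }

            va = __find_vmap_area(addr, &vmap_area_root);
            spin_unlock(&vmap_area_lock);

            return va;
    }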

--
Uladzislau Rezki


* Re: [PATCH 0/3] x86/dumpstack: Inline copy_from_user_nmi()
  2022-09-16 19:57 ` [PATCH 0/3] x86/dumpstack: Inline copy_from_user_nmi() Andrew Morton
@ 2022-09-19 14:46   ` Peter Zijlstra
  2022-09-19 19:26     ` Kees Cook
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2022-09-19 14:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kees Cook, Matthew Wilcox, Uladzislau Rezki, Yu Zhao, dev,
	Ingo Molnar, linux-kernel, x86, linux-perf-users, linux-mm,
	linux-hardening, linux-arch

On Fri, Sep 16, 2022 at 12:57:23PM -0700, Andrew Morton wrote:
> On Fri, 16 Sep 2022 06:59:51 -0700 Kees Cook <keescook@chromium.org> wrote:
> 
> > Hi,
> > 
> > This fixes a find_vmap_area() deadlock. The main fix is patch 2, repeated here:
> > 
> >     The check_object_size() helper under CONFIG_HARDENED_USERCOPY is
> >     designed to skip any checks where the length is known at compile time as
> >     a reasonable heuristic to avoid "likely known-good" cases. However, it can
> >     only do this when the copy_*_user() helpers are, themselves, inline too.
> > 
> >     Using find_vmap_area() requires taking a spinlock. The check_object_size()
> >     helper can call find_vmap_area() when the destination is in vmap memory.
> >     If show_regs() is called in interrupt context, it will attempt a call to
> >     copy_from_user_nmi(), which may call check_object_size() and then
> >     find_vmap_area(). If something in normal context happens to be in the
> >     middle of calling find_vmap_area() (with the spinlock held), the interrupt
> >     handler will hang forever.
> > 
> >     The copy_from_user_nmi() call is actually being called with a fixed-size
> >     length, so check_object_size() should never have been called in the
> >     first place. In order for check_object_size() to see that the length is
> >     a fixed size, inline copy_from_user_nmi(), as already done with all the
> >     other uaccess helpers.
> > 
> 
> Why is this so complicated.
> 
> There's virtually zero value in running all those debug checks from within
> copy_from_user_nmi().
> 
> --- a/arch/x86/lib/usercopy.c~a
> +++ a/arch/x86/lib/usercopy.c
> @@ -44,7 +44,7 @@ copy_from_user_nmi(void *to, const void
>  	 * called from other contexts.
>  	 */
>  	pagefault_disable();
> -	ret = __copy_from_user_inatomic(to, from, n);
> +	ret = raw_copy_from_user(to, from, n);
>  	pagefault_enable();
>  
>  	return ret;

I'm with Andrew here; this looks a *LOT* saner than all the other stuff.


* Re: [PATCH 0/3] x86/dumpstack: Inline copy_from_user_nmi()
  2022-09-19 14:46   ` Peter Zijlstra
@ 2022-09-19 19:26     ` Kees Cook
  0 siblings, 0 replies; 13+ messages in thread
From: Kees Cook @ 2022-09-19 19:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, Matthew Wilcox, Uladzislau Rezki, Yu Zhao, dev,
	Ingo Molnar, linux-kernel, x86, linux-perf-users, linux-mm,
	linux-hardening, linux-arch

On Mon, Sep 19, 2022 at 04:46:39PM +0200, Peter Zijlstra wrote:
> On Fri, Sep 16, 2022 at 12:57:23PM -0700, Andrew Morton wrote:
> > Why is this so complicated.
> > 
> > There's virtually zero value in running all those debug checks from within
> > copy_from_user_nmi().
> > 
> > --- a/arch/x86/lib/usercopy.c~a
> > +++ a/arch/x86/lib/usercopy.c
> > @@ -44,7 +44,7 @@ copy_from_user_nmi(void *to, const void
> >  	 * called from other contexts.
> >  	 */
> >  	pagefault_disable();
> > -	ret = __copy_from_user_inatomic(to, from, n);
> > +	ret = raw_copy_from_user(to, from, n);
> >  	pagefault_enable();
> >  
> >  	return ret;
> 
> I'm with Andrew here; this looks a *LOT* saner than all the other stuff.

Yeah, I'd agree -- it's a special case of a special case. I'll send a
new patch.

Thanks!

-- 
Kees Cook


end of thread

Thread overview: 13+ messages
2022-09-16 13:59 [PATCH 0/3] x86/dumpstack: Inline copy_from_user_nmi() Kees Cook
2022-09-16 13:59 ` [PATCH 1/3] x86/uaccess: Move nmi_uaccess_okay() into uaccess.h Kees Cook
2022-09-16 13:59 ` [PATCH 2/3] x86/dumpstack: Inline copy_from_user_nmi() Kees Cook
2022-09-16 13:59 ` [PATCH 3/3] usercopy: Add find_vmap_area_try() to avoid deadlocks Kees Cook
2022-09-16 14:46   ` Matthew Wilcox
2022-09-16 15:09     ` Kees Cook
2022-09-16 19:15       ` Matthew Wilcox
2022-09-19  8:29         ` Uladzislau Rezki
2022-09-16 17:29   ` Uladzislau Rezki
2022-09-16 19:57 ` [PATCH 0/3] x86/dumpstack: Inline copy_from_user_nmi() Andrew Morton
2022-09-19 14:46   ` Peter Zijlstra
2022-09-19 19:26     ` Kees Cook
2022-09-17  2:20 ` Yu Zhao
