All of lore.kernel.org
* [PATCH bpf 1/2] mm: Fix copy_from_user_nofault().
@ 2023-01-18  5:14 Alexei Starovoitov
  2023-01-18  5:14 ` [PATCH bpf 2/2] perf: Fix arch_perf_out_copy_user() Alexei Starovoitov
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Alexei Starovoitov @ 2023-01-18  5:14 UTC (permalink / raw)
  To: torvalds
  Cc: x86, davem, daniel, andrii, peterz, keescook, tglx, hsinweih,
	rostedt, vegard.nossum, gregkh, alan.maguire, dylany, riel, bpf,
	kernel-team

From: Alexei Starovoitov <ast@kernel.org>

There are several issues with copy_from_user_nofault():

- access_ok() is designed for user context only and for that reason
it has WARN_ON_IN_IRQ(), which triggers when bpf, kprobes, eprobes
and perf on ppc call it from irq context.

- it's missing nmi_uaccess_okay(), which is a nop on all architectures
except x86, where it's required.
The comment in arch/x86/mm/tlb.c explains why it's necessary.
Calling copy_from_user_nofault() from bpf or [ke]probes without this check is not safe.

- under CONFIG_HARDENED_USERCOPY, __copy_from_user_inatomic() calls
check_object_size()->__check_object_size()->check_heap_object()->find_vmap_area()->spin_lock(),
which is not safe to do from bpf, [ke]probes or perf due to potential deadlock.

Fix all three issues. With these changes copy_from_user_nofault()
becomes equivalent to copy_from_user_nmi() from a safety point of view,
differing only in its return value.

Reported-by: Hsin-Wei Hung <hsinweih@uci.edu>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 mm/maccess.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/mm/maccess.c b/mm/maccess.c
index 074f6b086671..6ee9b337c501 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -5,6 +5,7 @@
 #include <linux/export.h>
 #include <linux/mm.h>
 #include <linux/uaccess.h>
+#include <asm/tlb.h>
 
 bool __weak copy_from_kernel_nofault_allowed(const void *unsafe_src,
 		size_t size)
@@ -113,11 +114,18 @@ long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr, long count)
 long copy_from_user_nofault(void *dst, const void __user *src, size_t size)
 {
 	long ret = -EFAULT;
-	if (access_ok(src, size)) {
-		pagefault_disable();
-		ret = __copy_from_user_inatomic(dst, src, size);
-		pagefault_enable();
-	}
+
+	if (!__access_ok(src, size))
+		return ret;
+
+	if (!nmi_uaccess_okay())
+		return ret;
+
+	pagefault_disable();
+	instrument_copy_from_user_before(dst, src, size);
+	ret = raw_copy_from_user(dst, src, size);
+	instrument_copy_from_user_after(dst, src, size, ret);
+	pagefault_enable();
 
 	if (ret)
 		return -EFAULT;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH bpf 2/2] perf: Fix arch_perf_out_copy_user().
  2023-01-18  5:14 [PATCH bpf 1/2] mm: Fix copy_from_user_nofault() Alexei Starovoitov
@ 2023-01-18  5:14 ` Alexei Starovoitov
  2023-01-18 21:32 ` [PATCH bpf 1/2] mm: Fix copy_from_user_nofault() Hsin-Wei Hung
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Alexei Starovoitov @ 2023-01-18  5:14 UTC (permalink / raw)
  To: torvalds
  Cc: x86, davem, daniel, andrii, peterz, keescook, tglx, hsinweih,
	rostedt, vegard.nossum, gregkh, alan.maguire, dylany, riel, bpf,
	kernel-team

From: Alexei Starovoitov <ast@kernel.org>

There are several issues with arch_perf_out_copy_user().
On x86 it's the same as copy_from_user_nmi() and all is good,
but on other archs:

- __access_ok() is missing.
Only on m68k, s390, parisc and sparc64 does this function unconditionally
return 'true'; all other archs must call it before accessing user memory.
- nmi_uaccess_okay() is missing.
- __copy_from_user_inatomic() issues under CONFIG_HARDENED_USERCOPY.

The latter two issues existed in copy_from_user_nofault() as well and
were fixed in the previous patch.

This patch copies the comments from copy_from_user_nmi() into mm/maccess.c
and splits copy_from_user_nofault() into copy_from_user_nmi(), which
returns the number of bytes not copied, and copy_from_user_nofault(),
which returns -EFAULT or zero.
With that, copy_from_user_nmi() becomes generic and is used
by perf on all architectures.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 arch/x86/include/asm/perf_event.h |  2 --
 arch/x86/lib/Makefile             |  2 +-
 arch/x86/lib/usercopy.c           | 55 -------------------------------
 kernel/events/internal.h          | 16 +--------
 mm/maccess.c                      | 48 ++++++++++++++++++++++-----
 5 files changed, 41 insertions(+), 82 deletions(-)
 delete mode 100644 arch/x86/lib/usercopy.c

diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 5d0f6891ae61..2e5cada5f74e 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -590,6 +590,4 @@ static inline void perf_lopwr_cb(bool lopwr_in)
  static inline void amd_pmu_disable_virt(void) { }
 #endif
 
-#define arch_perf_out_copy_user copy_from_user_nmi
-
 #endif /* _ASM_X86_PERF_EVENT_H */
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 4f1a40a86534..e85937696afd 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -42,7 +42,7 @@ clean-files := inat-tables.c
 obj-$(CONFIG_SMP) += msr-smp.o cache-smp.o
 
 lib-y := delay.o misc.o cmdline.o cpu.o
-lib-y += usercopy_$(BITS).o usercopy.o getuser.o putuser.o
+lib-y += usercopy_$(BITS).o getuser.o putuser.o
 lib-y += memcpy_$(BITS).o
 lib-y += pc-conf-reg.o
 lib-$(CONFIG_ARCH_HAS_COPY_MC) += copy_mc.o copy_mc_64.o
diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
deleted file mode 100644
index 24b48af27417..000000000000
--- a/arch/x86/lib/usercopy.c
+++ /dev/null
@@ -1,55 +0,0 @@
-/*
- * User address space access functions.
- *
- *  For licencing details see kernel-base/COPYING
- */
-
-#include <linux/uaccess.h>
-#include <linux/export.h>
-#include <linux/instrumented.h>
-
-#include <asm/tlbflush.h>
-
-/**
- * copy_from_user_nmi - NMI safe copy from user
- * @to:		Pointer to the destination buffer
- * @from:	Pointer to a user space address of the current task
- * @n:		Number of bytes to copy
- *
- * Returns: The number of not copied bytes. 0 is success, i.e. all bytes copied
- *
- * Contrary to other copy_from_user() variants this function can be called
- * from NMI context. Despite the name it is not restricted to be called
- * from NMI context. It is safe to be called from any other context as
- * well. It disables pagefaults across the copy which means a fault will
- * abort the copy.
- *
- * For NMI context invocations this relies on the nested NMI work to allow
- * atomic faults from the NMI path; the nested NMI paths are careful to
- * preserve CR2.
- */
-unsigned long
-copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
-{
-	unsigned long ret;
-
-	if (!__access_ok(from, n))
-		return n;
-
-	if (!nmi_uaccess_okay())
-		return n;
-
-	/*
-	 * Even though this function is typically called from NMI/IRQ context
-	 * disable pagefaults so that its behaviour is consistent even when
-	 * called from other contexts.
-	 */
-	pagefault_disable();
-	instrument_copy_from_user_before(to, from, n);
-	ret = raw_copy_from_user(to, from, n);
-	instrument_copy_from_user_after(to, from, n, ret);
-	pagefault_enable();
-
-	return ret;
-}
-EXPORT_SYMBOL_GPL(copy_from_user_nmi);
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 5150d5f84c03..62fe2089a1f9 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -190,21 +190,7 @@ memcpy_skip(void *dst, const void *src, unsigned long n)
 
 DEFINE_OUTPUT_COPY(__output_skip, memcpy_skip)
 
-#ifndef arch_perf_out_copy_user
-#define arch_perf_out_copy_user arch_perf_out_copy_user
-
-static inline unsigned long
-arch_perf_out_copy_user(void *dst, const void *src, unsigned long n)
-{
-	unsigned long ret;
-
-	pagefault_disable();
-	ret = __copy_from_user_inatomic(dst, src, n);
-	pagefault_enable();
-
-	return ret;
-}
-#endif
+#define arch_perf_out_copy_user copy_from_user_nmi
 
 DEFINE_OUTPUT_COPY(__output_copy_user, arch_perf_out_copy_user)
 
diff --git a/mm/maccess.c b/mm/maccess.c
index 6ee9b337c501..aa7520bb64bf 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -103,17 +103,27 @@ long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr, long count)
 }
 
 /**
- * copy_from_user_nofault(): safely attempt to read from a user-space location
- * @dst: pointer to the buffer that shall take the data
- * @src: address to read from. This must be a user address.
- * @size: size of the data chunk
+ * copy_from_user_nmi - NMI safe copy from user
+ * @dst:	Pointer to the destination buffer
+ * @src:	Pointer to a user space address of the current task
+ * @size:	Number of bytes to copy
  *
- * Safely read from user address @src to the buffer at @dst. If a kernel fault
- * happens, handle that and return -EFAULT.
+ * Returns: The number of not copied bytes. 0 is success, i.e. all bytes copied
+ *
+ * Contrary to other copy_from_user() variants this function can be called
+ * from NMI context. Despite the name it is not restricted to be called
+ * from NMI context. It is safe to be called from any other context as
+ * well. It disables pagefaults across the copy which means a fault will
+ * abort the copy.
+ *
+ * For NMI context invocations this relies on the nested NMI work to allow
+ * atomic faults from the NMI path; the nested NMI paths are careful to
+ * preserve CR2 on X86 architecture.
  */
-long copy_from_user_nofault(void *dst, const void __user *src, size_t size)
+unsigned long
+copy_from_user_nmi(void *dst, const void __user *src, unsigned long size)
 {
-	long ret = -EFAULT;
+	unsigned long ret = size;
 
 	if (!__access_ok(src, size))
 		return ret;
@@ -121,13 +131,33 @@ long copy_from_user_nofault(void *dst, const void __user *src, size_t size)
 	if (!nmi_uaccess_okay())
 		return ret;
 
+	/*
+	 * Even though this function is typically called from NMI/IRQ context
+	 * disable pagefaults so that its behaviour is consistent even when
+	 * called from other contexts.
+	 */
 	pagefault_disable();
 	instrument_copy_from_user_before(dst, src, size);
 	ret = raw_copy_from_user(dst, src, size);
 	instrument_copy_from_user_after(dst, src, size, ret);
 	pagefault_enable();
 
-	if (ret)
+	return ret;
+}
+EXPORT_SYMBOL_GPL(copy_from_user_nmi);
+
+/**
+ * copy_from_user_nofault(): safely attempt to read from a user-space location
+ * @dst: pointer to the buffer that shall take the data
+ * @src: address to read from. This must be a user address.
+ * @size: size of the data chunk
+ *
+ * Safely read from user address @src to the buffer at @dst. If a kernel fault
+ * happens, handle that and return -EFAULT.
+ */
+long copy_from_user_nofault(void *dst, const void __user *src, size_t size)
+{
+	if (copy_from_user_nmi(dst, src, size))
 		return -EFAULT;
 	return 0;
 }
-- 
2.30.2



* Re: [PATCH bpf 1/2] mm: Fix copy_from_user_nofault().
  2023-01-18  5:14 [PATCH bpf 1/2] mm: Fix copy_from_user_nofault() Alexei Starovoitov
  2023-01-18  5:14 ` [PATCH bpf 2/2] perf: Fix arch_perf_out_copy_user() Alexei Starovoitov
@ 2023-01-18 21:32 ` Hsin-Wei Hung
  2023-01-19 16:52 ` Kees Cook
  2023-03-25 14:55 ` Florian Lehner
  3 siblings, 0 replies; 12+ messages in thread
From: Hsin-Wei Hung @ 2023-01-18 21:32 UTC (permalink / raw)
  To: alexei.starovoitov
  Cc: alan.maguire, andrii, bpf, daniel, davem, dylany, gregkh,
	hsinweih, keescook, kernel-team, peterz, riel, rostedt, tglx,
	torvalds, vegard.nossum, x86

After applying the patches, running the fuzzer with the BPF PoC program no 
longer triggers the warning.

Tested-by: Hsin-Wei Hung <hsinweih@uci.edu>


* Re: [PATCH bpf 1/2] mm: Fix copy_from_user_nofault().
  2023-01-18  5:14 [PATCH bpf 1/2] mm: Fix copy_from_user_nofault() Alexei Starovoitov
  2023-01-18  5:14 ` [PATCH bpf 2/2] perf: Fix arch_perf_out_copy_user() Alexei Starovoitov
  2023-01-18 21:32 ` [PATCH bpf 1/2] mm: Fix copy_from_user_nofault() Hsin-Wei Hung
@ 2023-01-19 16:52 ` Kees Cook
  2023-01-19 19:21   ` Alexei Starovoitov
  2023-03-25 14:55 ` Florian Lehner
  3 siblings, 1 reply; 12+ messages in thread
From: Kees Cook @ 2023-01-19 16:52 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: torvalds, x86, davem, daniel, andrii, peterz, tglx, hsinweih,
	rostedt, vegard.nossum, gregkh, alan.maguire, dylany, riel, bpf,
	kernel-team

On Tue, Jan 17, 2023 at 09:14:42PM -0800, Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast@kernel.org>
> 
> There are several issues with copy_from_user_nofault():
> 
> - access_ok() is designed for user context only and for that reason
> it has WARN_ON_IN_IRQ() which triggers when bpf, kprobe, eprobe
> and perf on ppc are calling it from irq.
> 
> - it's missing nmi_uaccess_okay() which is a nop on all architectures
> except x86 where it's required.
> The comment in arch/x86/mm/tlb.c explains the details why it's necessary.
> Calling copy_from_user_nofault() from bpf, [ke]probe without this check is not safe.
> 
> - __copy_from_user_inatomic() under CONFIG_HARDENED_USERCOPY is calling
> check_object_size()->__check_object_size()->check_heap_object()->find_vmap_area()->spin_lock()
> which is not safe to do from bpf, [ke]probe and perf due to potential deadlock.

Er, this drops check_object_size() -- that needs to stay. The vmap area
test in check_object_size is likely what needs fixing. It was discussed
before:
https://lore.kernel.org/lkml/YySML2HfqaE%2FwXBU@casper.infradead.org/

The only reason it was ultimately tolerable to remove the check from
the x86-only _nmi function was because it was being used on compile-time
sized copies.

We need to fix the vmap lookup so the checking doesn't regress --
especially for trace, bpf, etc, where we could have much more interesting
dest/source/size combinations. :)

-Kees

-- 
Kees Cook


* Re: [PATCH bpf 1/2] mm: Fix copy_from_user_nofault().
  2023-01-19 16:52 ` Kees Cook
@ 2023-01-19 19:21   ` Alexei Starovoitov
  2023-01-19 20:08     ` Kees Cook
  0 siblings, 1 reply; 12+ messages in thread
From: Alexei Starovoitov @ 2023-01-19 19:21 UTC (permalink / raw)
  To: Kees Cook
  Cc: Linus Torvalds, X86 ML, David S. Miller, Daniel Borkmann,
	Andrii Nakryiko, Peter Zijlstra, Thomas Gleixner, Hsin-Wei Hung,
	Steven Rostedt, Vegard Nossum, Greg Kroah-Hartman, Alan Maguire,
	dylany, Rik van Riel, bpf, Kernel Team

On Thu, Jan 19, 2023 at 8:52 AM Kees Cook <keescook@chromium.org> wrote:
>
> On Tue, Jan 17, 2023 at 09:14:42PM -0800, Alexei Starovoitov wrote:
> > From: Alexei Starovoitov <ast@kernel.org>
> >
> > There are several issues with copy_from_user_nofault():
> >
> > - access_ok() is designed for user context only and for that reason
> > it has WARN_ON_IN_IRQ() which triggers when bpf, kprobe, eprobe
> > and perf on ppc are calling it from irq.
> >
> > - it's missing nmi_uaccess_okay() which is a nop on all architectures
> > except x86 where it's required.
> > The comment in arch/x86/mm/tlb.c explains the details why it's necessary.
> > Calling copy_from_user_nofault() from bpf, [ke]probe without this check is not safe.
> >
> > - __copy_from_user_inatomic() under CONFIG_HARDENED_USERCOPY is calling
> > check_object_size()->__check_object_size()->check_heap_object()->find_vmap_area()->spin_lock()
> > which is not safe to do from bpf, [ke]probe and perf due to potential deadlock.
>
> Er, this drops check_object_size() -- that needs to stay. The vmap area
> test in check_object_size is likely what needs fixing. It was discussed
> before:
> https://lore.kernel.org/lkml/YySML2HfqaE%2FwXBU@casper.infradead.org/

Thanks for the link.
Unfortunately none of the options discussed in that link will work,
since all of them rely on in_interrupt(), which will not catch the condition:
[ke]probes, bpf and perf can run after a spin_lock is already taken,
e.g. via the trace_lock_release tracepoint
(that one only with lockdep=on, but still),
or via the trace_contention_begin tracepoint with lockdep=off.
In those contexts in_interrupt() inside check_object_size() returns false.

> The only reason it was ultimately tolerable to remove the check from
> the x86-only _nmi function was because it was being used on compile-time
> sized copies.

It doesn't look to be the case.
copy_from_user_nmi() is called via __output_copy_user by perf
with run-time 'size'.

> We need to fix the vmap lookup so the checking doesn't regress --
> especially for trace, bpf, etc, where we could have much more interesting
> dest/source/size combinations. :)

Well, for bpf the 'dst' is never a vmalloc area, so
is_vmalloc_addr() and later spin_lock() in check_heap_object()
won't trigger.
Also for bpf the 'dst' area is statically checked by the verifier
at program load time, so at run-time the dst pointer is
guaranteed to be valid and of correct dimensions.
So doing check_object_size() is pointless unless there is a bug
in the verifier; and if there is such a bug, kasan and friends
will find it sooner. The 'dst' checks are generic and
not copy_from_user_nofault() specific.

For trace, kprobe and perf would be nice to keep check_object_size()
working, of course.

What do you suggest?
I frankly don't see any option other than what's done in this patch,
though it's not great.
Happy to be proven otherwise.


* Re: [PATCH bpf 1/2] mm: Fix copy_from_user_nofault().
  2023-01-19 19:21   ` Alexei Starovoitov
@ 2023-01-19 20:08     ` Kees Cook
  2023-01-19 20:14       ` Alexei Starovoitov
  2023-01-19 20:28       ` Alexei Starovoitov
  0 siblings, 2 replies; 12+ messages in thread
From: Kees Cook @ 2023-01-19 20:08 UTC (permalink / raw)
  To: Alexei Starovoitov, Matthew Wilcox
  Cc: Linus Torvalds, X86 ML, David S. Miller, Daniel Borkmann,
	Andrii Nakryiko, Peter Zijlstra, Thomas Gleixner, Hsin-Wei Hung,
	Steven Rostedt, Vegard Nossum, Greg Kroah-Hartman, Alan Maguire,
	dylany, Rik van Riel, bpf, Kernel Team

On Thu, Jan 19, 2023 at 11:21:33AM -0800, Alexei Starovoitov wrote:
> On Thu, Jan 19, 2023 at 8:52 AM Kees Cook <keescook@chromium.org> wrote:
> >
> > On Tue, Jan 17, 2023 at 09:14:42PM -0800, Alexei Starovoitov wrote:
> > > From: Alexei Starovoitov <ast@kernel.org>
> > >
> > > There are several issues with copy_from_user_nofault():
> > >
> > > - access_ok() is designed for user context only and for that reason
> > > it has WARN_ON_IN_IRQ() which triggers when bpf, kprobe, eprobe
> > > and perf on ppc are calling it from irq.
> > >
> > > - it's missing nmi_uaccess_okay() which is a nop on all architectures
> > > except x86 where it's required.
> > > The comment in arch/x86/mm/tlb.c explains the details why it's necessary.
> > > Calling copy_from_user_nofault() from bpf, [ke]probe without this check is not safe.
> > >
> > > - __copy_from_user_inatomic() under CONFIG_HARDENED_USERCOPY is calling
> > > check_object_size()->__check_object_size()->check_heap_object()->find_vmap_area()->spin_lock()
> > > which is not safe to do from bpf, [ke]probe and perf due to potential deadlock.
> >
> > Er, this drops check_object_size() -- that needs to stay. The vmap area
> > test in check_object_size is likely what needs fixing. It was discussed
> > before:
> > https://lore.kernel.org/lkml/YySML2HfqaE%2FwXBU@casper.infradead.org/
> 
> Thanks for the link.
> Unfortunately all options discussed in that link won't work,
> since all of them rely on in_interrupt() which will not catch the condition.
> [ke]probe, bpf, perf can run after spin_lock is taken.
> Like via trace_lock_release tracepoint.
> It's only with lockdep=on, but still.
> Or via trace_contention_begin tracepoint with lockdep=off.
> check_object_size() will not execute in_interrupt().
> 
> > The only reason it was ultimately tolerable to remove the check from
> > the x86-only _nmi function was because it was being used on compile-time
> > sized copies.
> 
> It doesn't look to be the case.
> copy_from_user_nmi() is called via __output_copy_user by perf
> with run-time 'size'.

Perhaps this changed recently? It was only called in copy_code() before
when I looked last. Regardless, it still needs solving.

> > We need to fix the vmap lookup so the checking doesn't regress --
> > especially for trace, bpf, etc, where we could have much more interesting
> > dest/source/size combinations. :)
> 
> Well, for bpf the 'dst' is never a vmalloc area, so
> is_vmalloc_addr() and later spin_lock() in check_heap_object()
> won't trigger.
> Also for bpf the 'dst' area is statically checked by the verifier
> at program load time, so at run-time the dst pointer is
> guaranteed to be valid and of correct dimensions.
> So doing check_object_size() is pointless unless there is a bug
> in the verifier, but if there is a bug kasan and friends
> will find it sooner. The 'dst' checks are generic and
> not copy_from_user_nofault() specific.
> 
> For trace, kprobe and perf would be nice to keep check_object_size()
> working, of course.
> 
> What do you suggest?
> I frankly don't see other options other than done in this patch,
> though it's not great.
> Happy to be proven otherwise.

Matthew, do you have any thoughts on dealing with this? Can we use a
counter instead of a spin lock?

-Kees

-- 
Kees Cook


* Re: [PATCH bpf 1/2] mm: Fix copy_from_user_nofault().
  2023-01-19 20:08     ` Kees Cook
@ 2023-01-19 20:14       ` Alexei Starovoitov
  2023-01-19 20:28       ` Alexei Starovoitov
  1 sibling, 0 replies; 12+ messages in thread
From: Alexei Starovoitov @ 2023-01-19 20:14 UTC (permalink / raw)
  To: Kees Cook
  Cc: Matthew Wilcox, Linus Torvalds, X86 ML, David S. Miller,
	Daniel Borkmann, Andrii Nakryiko, Peter Zijlstra,
	Thomas Gleixner, Hsin-Wei Hung, Steven Rostedt, Vegard Nossum,
	Greg Kroah-Hartman, Alan Maguire, dylany, Rik van Riel, bpf,
	Kernel Team

On Thu, Jan 19, 2023 at 12:08 PM Kees Cook <keescook@chromium.org> wrote:
>
> On Thu, Jan 19, 2023 at 11:21:33AM -0800, Alexei Starovoitov wrote:
> > On Thu, Jan 19, 2023 at 8:52 AM Kees Cook <keescook@chromium.org> wrote:
> > >
> > > On Tue, Jan 17, 2023 at 09:14:42PM -0800, Alexei Starovoitov wrote:
> > > > From: Alexei Starovoitov <ast@kernel.org>
> > > >
> > > > There are several issues with copy_from_user_nofault():
> > > >
> > > > - access_ok() is designed for user context only and for that reason
> > > > it has WARN_ON_IN_IRQ() which triggers when bpf, kprobe, eprobe
> > > > and perf on ppc are calling it from irq.
> > > >
> > > > - it's missing nmi_uaccess_okay() which is a nop on all architectures
> > > > except x86 where it's required.
> > > > The comment in arch/x86/mm/tlb.c explains the details why it's necessary.
> > > > Calling copy_from_user_nofault() from bpf, [ke]probe without this check is not safe.
> > > >
> > > > - __copy_from_user_inatomic() under CONFIG_HARDENED_USERCOPY is calling
> > > > check_object_size()->__check_object_size()->check_heap_object()->find_vmap_area()->spin_lock()
> > > > which is not safe to do from bpf, [ke]probe and perf due to potential deadlock.
> > >
> > > Er, this drops check_object_size() -- that needs to stay. The vmap area
> > > test in check_object_size is likely what needs fixing. It was discussed
> > > before:
> > > https://lore.kernel.org/lkml/YySML2HfqaE%2FwXBU@casper.infradead.org/
> >
> > Thanks for the link.
> > Unfortunately all options discussed in that link won't work,
> > since all of them rely on in_interrupt() which will not catch the condition.
> > [ke]probe, bpf, perf can run after spin_lock is taken.
> > Like via trace_lock_release tracepoint.
> > It's only with lockdep=on, but still.
> > Or via trace_contention_begin tracepoint with lockdep=off.
> > check_object_size() will not execute in_interrupt().
> >
> > > The only reason it was ultimately tolerable to remove the check from
> > > the x86-only _nmi function was because it was being used on compile-time
> > > sized copies.
> >
> > It doesn't look to be the case.
> > copy_from_user_nmi() is called via __output_copy_user by perf
> > with run-time 'size'.
>
> Perhaps this changed recently? It was only called in copy_code() before
> when I looked last. Regardless, it still needs solving.

I think it was this way forever:
perf_output_sample_ustack(handle,
                          data->stack_user_size,
                          data->regs_user.regs);
__output_copy_user(handle, (void *) sp, dump_size);

kernel/events/internal.h:#define arch_perf_out_copy_user copy_from_user_nmi
kernel/events/internal.h:DEFINE_OUTPUT_COPY(__output_copy_user,
arch_perf_out_copy_user)


> > > We need to fix the vmap lookup so the checking doesn't regress --
> > > especially for trace, bpf, etc, where we could have much more interesting
> > > dest/source/size combinations. :)
> >
> > Well, for bpf the 'dst' is never a vmalloc area, so
> > is_vmalloc_addr() and later spin_lock() in check_heap_object()
> > won't trigger.
> > Also for bpf the 'dst' area is statically checked by the verifier
> > at program load time, so at run-time the dst pointer is
> > guaranteed to be valid and of correct dimensions.
> > So doing check_object_size() is pointless unless there is a bug
> > in the verifier, but if there is a bug kasan and friends
> > will find it sooner. The 'dst' checks are generic and
> > not copy_from_user_nofault() specific.
> >
> > For trace, kprobe and perf would be nice to keep check_object_size()
> > working, of course.
> >
> > What do you suggest?
> > I frankly don't see other options other than done in this patch,
> > though it's not great.
> > Happy to be proven otherwise.
>
> Matthew, do you have any thoughts on dealing with this? Can we use a
> counter instead of a spin lock?
>
> -Kees
>
> --
> Kees Cook


* Re: [PATCH bpf 1/2] mm: Fix copy_from_user_nofault().
  2023-01-19 20:08     ` Kees Cook
  2023-01-19 20:14       ` Alexei Starovoitov
@ 2023-01-19 20:28       ` Alexei Starovoitov
  1 sibling, 0 replies; 12+ messages in thread
From: Alexei Starovoitov @ 2023-01-19 20:28 UTC (permalink / raw)
  To: Kees Cook
  Cc: Matthew Wilcox, Linus Torvalds, X86 ML, David S. Miller,
	Daniel Borkmann, Andrii Nakryiko, Peter Zijlstra,
	Thomas Gleixner, Hsin-Wei Hung, Steven Rostedt, Vegard Nossum,
	Greg Kroah-Hartman, Alan Maguire, dylany, Rik van Riel, bpf,
	Kernel Team

On Thu, Jan 19, 2023 at 12:08 PM Kees Cook <keescook@chromium.org> wrote:
> >
> > What do you suggest?
> > I frankly don't see other options other than done in this patch,
> > though it's not great.
> > Happy to be proven otherwise.
>
> Matthew, do you have any thoughts on dealing with this? Can we use a
> counter instead of a spin lock?

Have you considered using pagefault_disabled() instead of in_interrupt()?

spin_trylock(), and bail out if (pagefault_disabled())?

or
diff --git a/mm/usercopy.c b/mm/usercopy.c
index 4c3164beacec..83c164aba6e0 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -173,7 +173,7 @@ static inline void check_heap_object(const void *ptr, unsigned long n,
                return;
        }

-       if (is_vmalloc_addr(ptr)) {
+       if (is_vmalloc_addr(ptr) && !pagefault_disabled()) {
                struct vmap_area *area = find_vmap_area(addr);

effectively gutting that part of the check for *_nofault() and *_nmi()?


* Re: [PATCH bpf 1/2] mm: Fix copy_from_user_nofault().
  2023-01-18  5:14 [PATCH bpf 1/2] mm: Fix copy_from_user_nofault() Alexei Starovoitov
                   ` (2 preceding siblings ...)
  2023-01-19 16:52 ` Kees Cook
@ 2023-03-25 14:55 ` Florian Lehner
  2023-03-25 19:47   ` Alexei Starovoitov
  3 siblings, 1 reply; 12+ messages in thread
From: Florian Lehner @ 2023-03-25 14:55 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: torvalds, x86, davem, daniel, andrii, peterz, keescook, tglx,
	hsinweih, rostedt, vegard.nossum, gregkh, alan.maguire, dylany,
	riel, bpf, kernel-team

With this patch applied on top of bpf/bpf-next (55fbae05) the system no longer runs into a total freeze as reported in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1033398.

Tested-by: Florian Lehner <dev@der-flo.net>



* Re: [PATCH bpf 1/2] mm: Fix copy_from_user_nofault().
  2023-03-25 14:55 ` Florian Lehner
@ 2023-03-25 19:47   ` Alexei Starovoitov
  2023-04-06 20:17     ` Salvatore Bonaccorso
  0 siblings, 1 reply; 12+ messages in thread
From: Alexei Starovoitov @ 2023-03-25 19:47 UTC (permalink / raw)
  To: Florian Lehner
  Cc: Linus Torvalds, X86 ML, David S. Miller, Daniel Borkmann,
	Andrii Nakryiko, Peter Zijlstra, Kees Cook, Thomas Gleixner,
	Hsin-Wei Hung, Steven Rostedt, Vegard Nossum, Greg Kroah-Hartman,
	Alan Maguire, Rik van Riel, bpf, Kernel Team

On Sat, Mar 25, 2023 at 7:55 AM Florian Lehner <dev@der-flo.net> wrote:
>
> With this patch applied on top of bpf/bpf-next (55fbae05) the system no longer runs into a total freeze as reported in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1033398.
>
> Tested-by: Florian Lehner <dev@der-flo.net>

Thanks for testing and for bumping the thread.
The fix slipped through the cracks.

Looking at the stack trace in bugzilla the patch set
should indeed fix the issue, since the kernel is deadlocking on:
copy_from_user_nofault -> check_object_size -> find_vmap_area -> spin_lock

I'm travelling this and next week, so if you can take over
the whole patch set, roll in the tweak that was proposed back in January:

-       if (is_vmalloc_addr(ptr)) {
+       if (is_vmalloc_addr(ptr) && !pagefault_disabled()) {

and respin it for the bpf tree, our group maintainers can review and apply it
while I'm travelling.


* Re: [PATCH bpf 1/2] mm: Fix copy_from_user_nofault().
  2023-03-25 19:47   ` Alexei Starovoitov
@ 2023-04-06 20:17     ` Salvatore Bonaccorso
  2023-04-06 20:24       ` Alexei Starovoitov
  0 siblings, 1 reply; 12+ messages in thread
From: Salvatore Bonaccorso @ 2023-04-06 20:17 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Florian Lehner, Linus Torvalds, X86 ML, David S. Miller,
	Daniel Borkmann, Andrii Nakryiko, Peter Zijlstra, Kees Cook,
	Thomas Gleixner, Hsin-Wei Hung, Steven Rostedt, Vegard Nossum,
	Greg Kroah-Hartman, Alan Maguire, Rik van Riel, bpf, Kernel Team

Hi,

On Sat, Mar 25, 2023 at 12:47:17PM -0700, Alexei Starovoitov wrote:
> On Sat, Mar 25, 2023 at 7:55 AM Florian Lehner <dev@der-flo.net> wrote:
> >
> > With this patch applied on top of bpf/bpf-next (55fbae05) the system no longer runs into a total freeze as reported in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1033398.
> >
> > Tested-by: Florian Lehner <dev@der-flo.net>
> 
> Thanks for testing and for bumping the thread.
> The fix slipped through the cracks.
> 
> Looking at the stack trace in bugzilla the patch set
> should indeed fix the issue, since the kernel is deadlocking on:
> copy_from_user_nofault -> check_object_size -> find_vmap_area -> spin_lock
> 
> I'm travelling this and next week, so if you can take over
> the whole patch set and roll in the tweak that was proposed back in January:
> 
> -       if (is_vmalloc_addr(ptr)) {
> +       if (is_vmalloc_addr(ptr) && !pagefault_disabled())
> 
> and respin for the bpf tree our group maintainers can review and apply
> while I'm travelling.

Can anyone pick this up as suggested by Alexei and propose it to the
bpf tree maintainers?

Regards,
Salvatore


* Re: [PATCH bpf 1/2] mm: Fix copy_from_user_nofault().
  2023-04-06 20:17     ` Salvatore Bonaccorso
@ 2023-04-06 20:24       ` Alexei Starovoitov
  0 siblings, 0 replies; 12+ messages in thread
From: Alexei Starovoitov @ 2023-04-06 20:24 UTC (permalink / raw)
  To: Salvatore Bonaccorso
  Cc: Florian Lehner, Linus Torvalds, X86 ML, David S. Miller,
	Daniel Borkmann, Andrii Nakryiko, Peter Zijlstra, Kees Cook,
	Thomas Gleixner, Hsin-Wei Hung, Steven Rostedt, Vegard Nossum,
	Greg Kroah-Hartman, Alan Maguire, Rik van Riel, bpf, Kernel Team

On Thu, Apr 6, 2023 at 1:17 PM Salvatore Bonaccorso <carnil@debian.org> wrote:
>
> Hi,
>
> On Sat, Mar 25, 2023 at 12:47:17PM -0700, Alexei Starovoitov wrote:
> > On Sat, Mar 25, 2023 at 7:55 AM Florian Lehner <dev@der-flo.net> wrote:
> > >
> > > With this patch applied on top of bpf/bpf-next (55fbae05) the system no longer runs into a total freeze as reported in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1033398.
> > >
> > > Tested-by: Florian Lehner <dev@der-flo.net>
> >
> > Thanks for testing and for bumping the thread.
> > The fix slipped through the cracks.
> >
> > Looking at the stack trace in bugzilla the patch set
> > should indeed fix the issue, since the kernel is deadlocking on:
> > copy_from_user_nofault -> check_object_size -> find_vmap_area -> spin_lock
> >
> > I'm travelling this and next week, so if you can take over
> > the whole patch set and roll in the tweak that was proposed back in January:
> >
> > -       if (is_vmalloc_addr(ptr)) {
> > +       if (is_vmalloc_addr(ptr) && !pagefault_disabled())
> >
> > and respin for the bpf tree our group maintainers can review and apply
> > while I'm travelling.
>
> Anyone can pick it up as suggested by Alexei, and propose that to the
> bpf tree maintainers?

Florian already did.
Changes were requested.
https://patchwork.kernel.org/project/netdevbpf/patch/20230329193931.320642-3-dev@der-flo.net/

