linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/6] kernel: introduce uaccess logging
@ 2021-12-08  4:48 Peter Collingbourne
  2021-12-08  4:48 ` [PATCH v3 1/6] include: split out uaccess instrumentation into a separate header Peter Collingbourne
                   ` (6 more replies)
  0 siblings, 7 replies; 22+ messages in thread
From: Peter Collingbourne @ 2021-12-08  4:48 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Thomas Gleixner, Andy Lutomirski, Kees Cook, Andrew Morton,
	Masahiro Yamada, Sami Tolvanen, YiFei Zhu, Mark Rutland,
	Frederic Weisbecker, Viresh Kumar, Andrey Konovalov,
	Peter Collingbourne, Gabriel Krisman Bertazi, Chris Hyser,
	Daniel Vetter, Chris Wilson, Arnd Bergmann, Dmitry Vyukov,
	Christian Brauner, Eric W. Biederman, Alexey Gladkov,
	Ran Xiaokai, David Hildenbrand, Xiaofeng Cao, Cyrill Gorcunov,
	Thomas Cedeno, Marco Elver, Alexander Potapenko
  Cc: linux-kernel, linux-arm-kernel, Evgenii Stepanov

This patch series introduces a kernel feature known as uaccess
logging, which allows userspace programs to be made aware of the
address and size of uaccesses performed by the kernel during
the servicing of a syscall. More details on the motivation
for and interface to this feature are available in the file
Documentation/admin-guide/uaccess-logging.rst added by the final
patch in the series.

Because we don't have a common kernel entry/exit code path that is used
on all architectures, uaccess logging is only implemented for arm64
and architectures that use CONFIG_GENERIC_ENTRY, i.e. x86 and s390.

The proposed interface is the result of numerous iterations and
prototyping and is based on a proposal by Dmitry Vyukov. The interface
preserves the correspondence between uaccess log identity and syscall
identity while tolerating incoming asynchronous signals in the interval
between setting up the logging and the actual syscall. We considered
a number of alternative designs but rejected them for various reasons:

- The design from v1 of this patch [1] proposed notifying the kernel
  of the address and size of the uaccess buffer via a prctl that
  would also automatically mask and unmask asynchronous signals as
  needed, but this would require multiple syscalls per "real" syscall,
  harming performance.

- We considered extending the syscall calling convention to
  designate currently-unused registers to be used to pass the
  location of the uaccess buffer, but this was rejected for being
  architecture-specific.

- One idea that we considered involved using the stack pointer address
  as a unique identifier for the syscall, but this currently would
  need to be arch-specific as we currently do not appear to have an
  arch-generic way of retrieving the stack pointer; the userspace
  side would also need some arch-specific code for this to work. It's
  also possible that a longjmp() past the signal handler would make
  the stack pointer address not unique enough for this purpose.

We also evaluated implementing this on top of the existing tracepoint
facility, but concluded that it is not suitable for this purpose:

- Tracepoints have a per-task granularity at best, whereas we really want
  to trace per-syscall. This is so that we can exclude syscalls that
  should not be traced, such as syscalls that make up part of the
  sanitizer implementation (to avoid infinite recursion when e.g. printing
  an error report).

- Tracing would need to be synchronous in order to produce useful
  stack traces. For example this could be achieved using the new SIGTRAP
  on perf events mechanism. However, this would require logging each
  access to the stack (in the form of a sigcontext) and this is more
  likely to overflow the stack due to being much larger than a uaccess
  buffer entry as well as being unbounded, in contrast to the bounded
  buffer size passed to prctl(). An approach based on signal handlers is
  also likely to fall foul of the asynchronous signal issues mentioned
  previously, together with needing sigreturn to be handled specially
  (because it copies a sigcontext from userspace) otherwise we could
  never return from the signal handler. Furthermore, arguments to the
  trace events are not available to SIGTRAP. (This on its own wouldn't
  be insurmountable though -- we could add the arguments as fields
  to siginfo.)

- The API in https://www.kernel.org/doc/Documentation/trace/ftrace.txt
  -- e.g. trace_pipe_raw gives access to the internal ring buffer, but
  I don't think it's usable because it's per-CPU and not per-task.

- Tracepoints can be used by eBPF programs, but eBPF programs may
  only be loaded as root, among other potential headaches.

[1] https://lore.kernel.org/all/20210922061809.736124-1-pcc@google.com/

Peter Collingbourne (6):
  include: split out uaccess instrumentation into a separate header
  uaccess-buffer: add core code
  fs: use copy_from_user_nolog() to copy mount() data
  uaccess-buffer: add CONFIG_GENERIC_ENTRY support
  arm64: add support for uaccess logging
  Documentation: document uaccess logging

 Documentation/admin-guide/index.rst           |   1 +
 Documentation/admin-guide/uaccess-logging.rst | 151 ++++++++++++++++++
 arch/Kconfig                                  |   6 +
 arch/arm64/Kconfig                            |   1 +
 arch/arm64/include/asm/thread_info.h          |   7 +-
 arch/arm64/kernel/ptrace.c                    |   7 +
 arch/arm64/kernel/signal.c                    |   5 +
 arch/arm64/kernel/syscall.c                   |   1 +
 fs/exec.c                                     |   3 +
 fs/namespace.c                                |   8 +-
 include/linux/entry-common.h                  |   2 +
 include/linux/instrumented-uaccess.h          |  53 ++++++
 include/linux/instrumented.h                  |  34 ----
 include/linux/sched.h                         |   5 +
 include/linux/thread_info.h                   |   4 +
 include/linux/uaccess-buffer-info.h           |  46 ++++++
 include/linux/uaccess-buffer.h                | 112 +++++++++++++
 include/linux/uaccess.h                       |   2 +-
 include/uapi/linux/prctl.h                    |   3 +
 include/uapi/linux/uaccess-buffer.h           |  27 ++++
 kernel/Makefile                               |   1 +
 kernel/bpf/helpers.c                          |   7 +-
 kernel/entry/common.c                         |  10 ++
 kernel/fork.c                                 |   4 +
 kernel/signal.c                               |   4 +-
 kernel/sys.c                                  |   6 +
 kernel/uaccess-buffer.c                       | 129 +++++++++++++++
 lib/iov_iter.c                                |   2 +-
 lib/usercopy.c                                |   2 +-
 29 files changed, 602 insertions(+), 41 deletions(-)
 create mode 100644 Documentation/admin-guide/uaccess-logging.rst
 create mode 100644 include/linux/instrumented-uaccess.h
 create mode 100644 include/linux/uaccess-buffer-info.h
 create mode 100644 include/linux/uaccess-buffer.h
 create mode 100644 include/uapi/linux/uaccess-buffer.h
 create mode 100644 kernel/uaccess-buffer.c

-- 
2.34.1.173.g76aa8bc2d0-goog


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v3 1/6] include: split out uaccess instrumentation into a separate header
  2021-12-08  4:48 [PATCH v3 0/6] kernel: introduce uaccess logging Peter Collingbourne
@ 2021-12-08  4:48 ` Peter Collingbourne
  2021-12-08  9:27   ` Dmitry Vyukov
  2021-12-08  4:48 ` [PATCH v3 2/6] uaccess-buffer: add core code Peter Collingbourne
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 22+ messages in thread
From: Peter Collingbourne @ 2021-12-08  4:48 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Thomas Gleixner, Andy Lutomirski, Kees Cook, Andrew Morton,
	Masahiro Yamada, Sami Tolvanen, YiFei Zhu, Mark Rutland,
	Frederic Weisbecker, Viresh Kumar, Andrey Konovalov,
	Peter Collingbourne, Gabriel Krisman Bertazi, Chris Hyser,
	Daniel Vetter, Chris Wilson, Arnd Bergmann, Dmitry Vyukov,
	Christian Brauner, Eric W. Biederman, Alexey Gladkov,
	Ran Xiaokai, David Hildenbrand, Xiaofeng Cao, Cyrill Gorcunov,
	Thomas Cedeno, Marco Elver, Alexander Potapenko
  Cc: linux-kernel, linux-arm-kernel, Evgenii Stepanov

In an upcoming change we are going to add uaccess instrumentation
that uses inline access to struct task_struct from the
instrumentation routines. Because instrumentation.h is included
from many places including (recursively) from sched.h this would
otherwise lead to a circular dependency. Break the dependency by
moving uaccess instrumentation routines into a separate header,
instrumentation-uaccess.h.

Link: https://linux-review.googlesource.com/id/I625728db0c8db374e13e4ebc54985ac5c79ace7d
Signed-off-by: Peter Collingbourne <pcc@google.com>
---
 include/linux/instrumented-uaccess.h | 49 ++++++++++++++++++++++++++++
 include/linux/instrumented.h         | 34 -------------------
 include/linux/uaccess.h              |  2 +-
 lib/iov_iter.c                       |  2 +-
 lib/usercopy.c                       |  2 +-
 5 files changed, 52 insertions(+), 37 deletions(-)
 create mode 100644 include/linux/instrumented-uaccess.h

diff --git a/include/linux/instrumented-uaccess.h b/include/linux/instrumented-uaccess.h
new file mode 100644
index 000000000000..ece549088e50
--- /dev/null
+++ b/include/linux/instrumented-uaccess.h
@@ -0,0 +1,49 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * This header provides generic wrappers for memory access instrumentation for
+ * uaccess routines that the compiler cannot emit for: KASAN, KCSAN.
+ */
+#ifndef _LINUX_INSTRUMENTED_UACCESS_H
+#define _LINUX_INSTRUMENTED_UACCESS_H
+
+#include <linux/compiler.h>
+#include <linux/kasan-checks.h>
+#include <linux/kcsan-checks.h>
+#include <linux/types.h>
+
+/**
+ * instrument_copy_to_user - instrument reads of copy_to_user
+ *
+ * Instrument reads from kernel memory, that are due to copy_to_user (and
+ * variants). The instrumentation must be inserted before the accesses.
+ *
+ * @to destination address
+ * @from source address
+ * @n number of bytes to copy
+ */
+static __always_inline void
+instrument_copy_to_user(void __user *to, const void *from, unsigned long n)
+{
+	kasan_check_read(from, n);
+	kcsan_check_read(from, n);
+}
+
+/**
+ * instrument_copy_from_user - instrument writes of copy_from_user
+ *
+ * Instrument writes to kernel memory, that are due to copy_from_user (and
+ * variants). The instrumentation should be inserted before the accesses.
+ *
+ * @to destination address
+ * @from source address
+ * @n number of bytes to copy
+ */
+static __always_inline void
+instrument_copy_from_user(const void *to, const void __user *from, unsigned long n)
+{
+	kasan_check_write(to, n);
+	kcsan_check_write(to, n);
+}
+
+#endif /* _LINUX_INSTRUMENTED_UACCESS_H */
diff --git a/include/linux/instrumented.h b/include/linux/instrumented.h
index 42faebbaa202..b68f415510c7 100644
--- a/include/linux/instrumented.h
+++ b/include/linux/instrumented.h
@@ -102,38 +102,4 @@ static __always_inline void instrument_atomic_read_write(const volatile void *v,
 	kcsan_check_atomic_read_write(v, size);
 }
 
-/**
- * instrument_copy_to_user - instrument reads of copy_to_user
- *
- * Instrument reads from kernel memory, that are due to copy_to_user (and
- * variants). The instrumentation must be inserted before the accesses.
- *
- * @to destination address
- * @from source address
- * @n number of bytes to copy
- */
-static __always_inline void
-instrument_copy_to_user(void __user *to, const void *from, unsigned long n)
-{
-	kasan_check_read(from, n);
-	kcsan_check_read(from, n);
-}
-
-/**
- * instrument_copy_from_user - instrument writes of copy_from_user
- *
- * Instrument writes to kernel memory, that are due to copy_from_user (and
- * variants). The instrumentation should be inserted before the accesses.
- *
- * @to destination address
- * @from source address
- * @n number of bytes to copy
- */
-static __always_inline void
-instrument_copy_from_user(const void *to, const void __user *from, unsigned long n)
-{
-	kasan_check_write(to, n);
-	kcsan_check_write(to, n);
-}
-
 #endif /* _LINUX_INSTRUMENTED_H */
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index ac0394087f7d..c0c467e39657 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -3,7 +3,7 @@
 #define __LINUX_UACCESS_H__
 
 #include <linux/fault-inject-usercopy.h>
-#include <linux/instrumented.h>
+#include <linux/instrumented-uaccess.h>
 #include <linux/minmax.h>
 #include <linux/sched.h>
 #include <linux/thread_info.h>
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 66a740e6e153..3f9dc6df7102 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -12,7 +12,7 @@
 #include <linux/compat.h>
 #include <net/checksum.h>
 #include <linux/scatterlist.h>
-#include <linux/instrumented.h>
+#include <linux/instrumented-uaccess.h>
 
 #define PIPE_PARANOIA /* for now */
 
diff --git a/lib/usercopy.c b/lib/usercopy.c
index 7413dd300516..1cd188e62d06 100644
--- a/lib/usercopy.c
+++ b/lib/usercopy.c
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <linux/bitops.h>
 #include <linux/fault-inject-usercopy.h>
-#include <linux/instrumented.h>
+#include <linux/instrumented-uaccess.h>
 #include <linux/uaccess.h>
 
 /* out-of-line parts */
-- 
2.34.1.173.g76aa8bc2d0-goog


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 2/6] uaccess-buffer: add core code
  2021-12-08  4:48 [PATCH v3 0/6] kernel: introduce uaccess logging Peter Collingbourne
  2021-12-08  4:48 ` [PATCH v3 1/6] include: split out uaccess instrumentation into a separate header Peter Collingbourne
@ 2021-12-08  4:48 ` Peter Collingbourne
  2021-12-08 10:21   ` Dmitry Vyukov
  2021-12-08 16:46   ` Marco Elver
  2021-12-08  4:48 ` [PATCH v3 3/6] fs: use copy_from_user_nolog() to copy mount() data Peter Collingbourne
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 22+ messages in thread
From: Peter Collingbourne @ 2021-12-08  4:48 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Thomas Gleixner, Andy Lutomirski, Kees Cook, Andrew Morton,
	Masahiro Yamada, Sami Tolvanen, YiFei Zhu, Mark Rutland,
	Frederic Weisbecker, Viresh Kumar, Andrey Konovalov,
	Peter Collingbourne, Gabriel Krisman Bertazi, Chris Hyser,
	Daniel Vetter, Chris Wilson, Arnd Bergmann, Dmitry Vyukov,
	Christian Brauner, Eric W. Biederman, Alexey Gladkov,
	Ran Xiaokai, David Hildenbrand, Xiaofeng Cao, Cyrill Gorcunov,
	Thomas Cedeno, Marco Elver, Alexander Potapenko
  Cc: linux-kernel, linux-arm-kernel, Evgenii Stepanov

Add the core code to support uaccess logging. Subsequent patches will
hook this up to the arch-specific kernel entry and exit code for
certain architectures.

Link: https://linux-review.googlesource.com/id/I6581765646501a5631b281d670903945ebadc57d
Signed-off-by: Peter Collingbourne <pcc@google.com>
---
v3:
- performance optimizations for entry/exit code
- don't use kcur == NULL to mean overflow
- fix potential double free in clone()
- don't allocate a new kernel-side uaccess buffer for each syscall
- fix uaccess buffer leak on exit
- fix some sparse warnings

v2:
- New interface that avoids multiple syscalls per real syscall and
  is arch-generic
- Avoid logging uaccesses done by BPF programs
- Add documentation
- Split up into multiple patches
- Various code moves, renames etc as requested by Marco

 fs/exec.c                            |   3 +
 include/linux/instrumented-uaccess.h |   6 +-
 include/linux/sched.h                |   5 ++
 include/linux/uaccess-buffer-info.h  |  46 ++++++++++
 include/linux/uaccess-buffer.h       | 112 +++++++++++++++++++++++
 include/uapi/linux/prctl.h           |   3 +
 include/uapi/linux/uaccess-buffer.h  |  27 ++++++
 kernel/Makefile                      |   1 +
 kernel/bpf/helpers.c                 |   7 +-
 kernel/fork.c                        |   4 +
 kernel/signal.c                      |   4 +-
 kernel/sys.c                         |   6 ++
 kernel/uaccess-buffer.c              | 129 +++++++++++++++++++++++++++
 13 files changed, 350 insertions(+), 3 deletions(-)
 create mode 100644 include/linux/uaccess-buffer-info.h
 create mode 100644 include/linux/uaccess-buffer.h
 create mode 100644 include/uapi/linux/uaccess-buffer.h
 create mode 100644 kernel/uaccess-buffer.c

diff --git a/fs/exec.c b/fs/exec.c
index 537d92c41105..c9975e790f30 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -65,6 +65,7 @@
 #include <linux/vmalloc.h>
 #include <linux/io_uring.h>
 #include <linux/syscall_user_dispatch.h>
+#include <linux/uaccess-buffer.h>
 
 #include <linux/uaccess.h>
 #include <asm/mmu_context.h>
@@ -1313,6 +1314,8 @@ int begin_new_exec(struct linux_binprm * bprm)
 	me->personality &= ~bprm->per_clear;
 
 	clear_syscall_work_syscall_user_dispatch(me);
+	uaccess_buffer_set_descriptor_addr_addr(0);
+	uaccess_buffer_free(current);
 
 	/*
 	 * We have to apply CLOEXEC before we change whether the process is
diff --git a/include/linux/instrumented-uaccess.h b/include/linux/instrumented-uaccess.h
index ece549088e50..b967f4436d15 100644
--- a/include/linux/instrumented-uaccess.h
+++ b/include/linux/instrumented-uaccess.h
@@ -2,7 +2,8 @@
 
 /*
  * This header provides generic wrappers for memory access instrumentation for
- * uaccess routines that the compiler cannot emit for: KASAN, KCSAN.
+ * uaccess routines that the compiler cannot emit for: KASAN, KCSAN,
+ * uaccess buffers.
  */
 #ifndef _LINUX_INSTRUMENTED_UACCESS_H
 #define _LINUX_INSTRUMENTED_UACCESS_H
@@ -11,6 +12,7 @@
 #include <linux/kasan-checks.h>
 #include <linux/kcsan-checks.h>
 #include <linux/types.h>
+#include <linux/uaccess-buffer.h>
 
 /**
  * instrument_copy_to_user - instrument reads of copy_to_user
@@ -27,6 +29,7 @@ instrument_copy_to_user(void __user *to, const void *from, unsigned long n)
 {
 	kasan_check_read(from, n);
 	kcsan_check_read(from, n);
+	uaccess_buffer_log_write(to, n);
 }
 
 /**
@@ -44,6 +47,7 @@ instrument_copy_from_user(const void *to, const void __user *from, unsigned long
 {
 	kasan_check_write(to, n);
 	kcsan_check_write(to, n);
+	uaccess_buffer_log_read(from, n);
 }
 
 #endif /* _LINUX_INSTRUMENTED_UACCESS_H */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 78c351e35fec..7c5278d7b57d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -34,6 +34,7 @@
 #include <linux/rseq.h>
 #include <linux/seqlock.h>
 #include <linux/kcsan.h>
+#include <linux/uaccess-buffer-info.h>
 #include <asm/kmap_size.h>
 
 /* task_struct member predeclarations (sorted alphabetically): */
@@ -1484,6 +1485,10 @@ struct task_struct {
 	struct callback_head		l1d_flush_kill;
 #endif
 
+#ifdef CONFIG_HAVE_ARCH_UACCESS_BUFFER
+	struct uaccess_buffer_info	uaccess_buffer;
+#endif
+
 	/*
 	 * New fields for task_struct should be added above here, so that
 	 * they are included in the randomized portion of task_struct.
diff --git a/include/linux/uaccess-buffer-info.h b/include/linux/uaccess-buffer-info.h
new file mode 100644
index 000000000000..15e2d8f7c074
--- /dev/null
+++ b/include/linux/uaccess-buffer-info.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_UACCESS_BUFFER_INFO_H
+#define _LINUX_UACCESS_BUFFER_INFO_H
+
+#ifdef CONFIG_HAVE_ARCH_UACCESS_BUFFER
+
+struct uaccess_buffer_info {
+	/*
+	 * The pointer to pointer to struct uaccess_descriptor. This is the
+	 * value controlled by prctl(PR_SET_UACCESS_DESCRIPTOR_ADDR_ADDR).
+	 */
+	struct uaccess_descriptor __user *__user *desc_ptr_ptr;
+
+	/*
+	 * The pointer to struct uaccess_descriptor read at syscall entry time.
+	 */
+	struct uaccess_descriptor __user *desc_ptr;
+
+	/*
+	 * A pointer to the kernel's temporary copy of the uaccess log for the
+	 * current syscall. We log to a kernel buffer in order to avoid leaking
+	 * timing information to userspace.
+	 */
+	struct uaccess_buffer_entry *kbegin;
+
+	/*
+	 * The position of the next uaccess buffer entry for the current
+	 * syscall, or NULL if we are not logging the current syscall.
+	 */
+	struct uaccess_buffer_entry *kcur;
+
+	/*
+	 * A pointer to the end of the kernel's uaccess log.
+	 */
+	struct uaccess_buffer_entry *kend;
+
+	/*
+	 * The pointer to the userspace uaccess log, as read from the
+	 * struct uaccess_descriptor.
+	 */
+	struct uaccess_buffer_entry __user *ubegin;
+};
+
+#endif
+
+#endif  /* _LINUX_UACCESS_BUFFER_INFO_H */
diff --git a/include/linux/uaccess-buffer.h b/include/linux/uaccess-buffer.h
new file mode 100644
index 000000000000..f2f46db274f3
--- /dev/null
+++ b/include/linux/uaccess-buffer.h
@@ -0,0 +1,112 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_UACCESS_BUFFER_H
+#define _LINUX_UACCESS_BUFFER_H
+
+#include <linux/sched.h>
+#include <uapi/linux/uaccess-buffer.h>
+
+#include <asm-generic/errno-base.h>
+
+#ifdef CONFIG_HAVE_ARCH_UACCESS_BUFFER
+
+static inline bool uaccess_buffer_maybe_blocked(struct task_struct *tsk)
+{
+	return test_task_syscall_work(tsk, UACCESS_BUFFER_ENTRY);
+}
+
+void __uaccess_buffer_syscall_entry(void);
+static inline void uaccess_buffer_syscall_entry(void)
+{
+	__uaccess_buffer_syscall_entry();
+}
+
+void __uaccess_buffer_syscall_exit(void);
+static inline void uaccess_buffer_syscall_exit(void)
+{
+	__uaccess_buffer_syscall_exit();
+}
+
+bool __uaccess_buffer_pre_exit_loop(void);
+static inline bool uaccess_buffer_pre_exit_loop(void)
+{
+	if (!test_syscall_work(UACCESS_BUFFER_ENTRY))
+		return false;
+	return __uaccess_buffer_pre_exit_loop();
+}
+
+void __uaccess_buffer_post_exit_loop(void);
+static inline void uaccess_buffer_post_exit_loop(bool pending)
+{
+	if (pending)
+		__uaccess_buffer_post_exit_loop();
+}
+
+static inline int uaccess_buffer_set_descriptor_addr_addr(unsigned long addr)
+{
+	current->uaccess_buffer.desc_ptr_ptr =
+		(struct uaccess_descriptor __user * __user *)addr;
+	if (addr)
+		set_syscall_work(UACCESS_BUFFER_ENTRY);
+	else
+		clear_syscall_work(UACCESS_BUFFER_ENTRY);
+	return 0;
+}
+
+size_t copy_from_user_nolog(void *to, const void __user *from, size_t len);
+
+void uaccess_buffer_free(struct task_struct *tsk);
+
+void __uaccess_buffer_log_read(const void __user *from, unsigned long n);
+static inline void uaccess_buffer_log_read(const void __user *from, unsigned long n)
+{
+	if (unlikely(test_syscall_work(UACCESS_BUFFER_EXIT)))
+		__uaccess_buffer_log_read(from, n);
+}
+
+void __uaccess_buffer_log_write(void __user *to, unsigned long n);
+static inline void uaccess_buffer_log_write(void __user *to, unsigned long n)
+{
+	if (unlikely(test_syscall_work(UACCESS_BUFFER_EXIT)))
+		__uaccess_buffer_log_write(to, n);
+}
+
+#else
+
+static inline bool uaccess_buffer_maybe_blocked(struct task_struct *tsk)
+{
+	return false;
+}
+static inline void uaccess_buffer_syscall_entry(void)
+{
+}
+static inline void uaccess_buffer_syscall_exit(void)
+{
+}
+static inline bool uaccess_buffer_pre_exit_loop(void)
+{
+	return false;
+}
+static inline void uaccess_buffer_post_exit_loop(bool pending)
+{
+}
+static inline int uaccess_buffer_set_descriptor_addr_addr(unsigned long addr)
+{
+	return -EINVAL;
+}
+static inline void uaccess_buffer_free(struct task_struct *tsk)
+{
+}
+
+#define copy_from_user_nolog(to, from, len) copy_from_user(to, from, len)
+
+static inline void uaccess_buffer_log_read(const void __user *from,
+					   unsigned long n)
+{
+}
+static inline void uaccess_buffer_log_write(void __user *to, unsigned long n)
+{
+}
+
+#endif
+
+#endif  /* _LINUX_UACCESS_BUFFER_H */
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index bb73e9a0b24f..74b37469c7b3 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -272,4 +272,7 @@ struct prctl_mm_map {
 # define PR_SCHED_CORE_SCOPE_THREAD_GROUP	1
 # define PR_SCHED_CORE_SCOPE_PROCESS_GROUP	2
 
+/* Configure uaccess logging feature */
+#define PR_SET_UACCESS_DESCRIPTOR_ADDR_ADDR	63
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/include/uapi/linux/uaccess-buffer.h b/include/uapi/linux/uaccess-buffer.h
new file mode 100644
index 000000000000..bf10f7c78857
--- /dev/null
+++ b/include/uapi/linux/uaccess-buffer.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _UAPI_LINUX_UACCESS_BUFFER_H
+#define _UAPI_LINUX_UACCESS_BUFFER_H
+
+#include <linux/types.h>
+
+/* Location of the uaccess log. */
+struct uaccess_descriptor {
+	/* Address of the uaccess_buffer_entry array. */
+	__u64 addr;
+	/* Size of the uaccess_buffer_entry array in number of elements. */
+	__u64 size;
+};
+
+/* Format of the entries in the uaccess log. */
+struct uaccess_buffer_entry {
+	/* Address being accessed. */
+	__u64 addr;
+	/* Number of bytes that were accessed. */
+	__u64 size;
+	/* UACCESS_BUFFER_* flags. */
+	__u64 flags;
+};
+
+#define UACCESS_BUFFER_FLAG_WRITE	1 /* access was a write */
+
+#endif /* _UAPI_LINUX_UACCESS_BUFFER_H */
diff --git a/kernel/Makefile b/kernel/Makefile
index 186c49582f45..d4d9be5146c3 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -114,6 +114,7 @@ obj-$(CONFIG_KCSAN) += kcsan/
 obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
 obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
 obj-$(CONFIG_CFI_CLANG) += cfi.o
+obj-$(CONFIG_HAVE_ARCH_UACCESS_BUFFER) += uaccess-buffer.o
 
 obj-$(CONFIG_PERF_EVENTS) += events/
 
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 649f07623df6..ab6520a633ef 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -15,6 +15,7 @@
 #include <linux/pid_namespace.h>
 #include <linux/proc_ns.h>
 #include <linux/security.h>
+#include <linux/uaccess-buffer.h>
 
 #include "../../lib/kstrtox.h"
 
@@ -637,7 +638,11 @@ const struct bpf_func_proto bpf_event_output_data_proto =  {
 BPF_CALL_3(bpf_copy_from_user, void *, dst, u32, size,
 	   const void __user *, user_ptr)
 {
-	int ret = copy_from_user(dst, user_ptr, size);
+	/*
+	 * Avoid logging uaccesses here as the BPF program may not be following
+	 * the uaccess log rules.
+	 */
+	int ret = copy_from_user_nolog(dst, user_ptr, size);
 
 	if (unlikely(ret)) {
 		memset(dst, 0, size);
diff --git a/kernel/fork.c b/kernel/fork.c
index 3244cc56b697..8be2ca528a65 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -96,6 +96,7 @@
 #include <linux/scs.h>
 #include <linux/io_uring.h>
 #include <linux/bpf.h>
+#include <linux/uaccess-buffer.h>
 
 #include <asm/pgalloc.h>
 #include <linux/uaccess.h>
@@ -754,6 +755,7 @@ void __put_task_struct(struct task_struct *tsk)
 	delayacct_tsk_free(tsk);
 	put_signal_struct(tsk->signal);
 	sched_core_free(tsk);
+	uaccess_buffer_free(tsk);
 
 	if (!profile_handoff_task(tsk))
 		free_task(tsk);
@@ -890,6 +892,8 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 	if (memcg_charge_kernel_stack(tsk))
 		goto free_stack;
 
+	uaccess_buffer_free(orig);
+
 	stack_vm_area = task_stack_vm_area(tsk);
 
 	err = arch_dup_task_struct(tsk, orig);
diff --git a/kernel/signal.c b/kernel/signal.c
index a629b11bf3e0..69bf21518bd0 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -45,6 +45,7 @@
 #include <linux/posix-timers.h>
 #include <linux/cgroup.h>
 #include <linux/audit.h>
+#include <linux/uaccess-buffer.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/signal.h>
@@ -1031,7 +1032,8 @@ static void complete_signal(int sig, struct task_struct *p, enum pid_type type)
 	if (sig_fatal(p, sig) &&
 	    !(signal->flags & SIGNAL_GROUP_EXIT) &&
 	    !sigismember(&t->real_blocked, sig) &&
-	    (sig == SIGKILL || !p->ptrace)) {
+	    (sig == SIGKILL ||
+	     !(p->ptrace || uaccess_buffer_maybe_blocked(p)))) {
 		/*
 		 * This signal will be fatal to the whole group.
 		 */
diff --git a/kernel/sys.c b/kernel/sys.c
index 8fdac0d90504..c71a9a9c0f68 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -42,6 +42,7 @@
 #include <linux/version.h>
 #include <linux/ctype.h>
 #include <linux/syscall_user_dispatch.h>
+#include <linux/uaccess-buffer.h>
 
 #include <linux/compat.h>
 #include <linux/syscalls.h>
@@ -2530,6 +2531,11 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		error = sched_core_share_pid(arg2, arg3, arg4, arg5);
 		break;
 #endif
+	case PR_SET_UACCESS_DESCRIPTOR_ADDR_ADDR:
+		if (arg3 || arg4 || arg5)
+			return -EINVAL;
+		error = uaccess_buffer_set_descriptor_addr_addr(arg2);
+		break;
 	default:
 		error = -EINVAL;
 		break;
diff --git a/kernel/uaccess-buffer.c b/kernel/uaccess-buffer.c
new file mode 100644
index 000000000000..088e43f7611c
--- /dev/null
+++ b/kernel/uaccess-buffer.c
@@ -0,0 +1,129 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Support for uaccess logging via uaccess buffers.
+ *
+ * Copyright (C) 2021, Google LLC.
+ */
+
+#include <linux/compat.h>
+#include <linux/mm.h>
+#include <linux/prctl.h>
+#include <linux/ptrace.h>
+#include <linux/sched.h>
+#include <linux/signal.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/uaccess-buffer.h>
+
+static void uaccess_buffer_log(unsigned long addr, unsigned long size,
+			      unsigned long flags)
+{
+	struct uaccess_buffer_info *buf = &current->uaccess_buffer;
+	struct uaccess_buffer_entry *entry = buf->kcur;
+
+	if (entry == buf->kend || unlikely(uaccess_kernel()))
+		return;
+	entry->addr = addr;
+	entry->size = size;
+	entry->flags = flags;
+
+	++buf->kcur;
+}
+
+void __uaccess_buffer_log_read(const void __user *from, unsigned long n)
+{
+	uaccess_buffer_log((unsigned long)from, n, 0);
+}
+EXPORT_SYMBOL(__uaccess_buffer_log_read);
+
+void __uaccess_buffer_log_write(void __user *to, unsigned long n)
+{
+	uaccess_buffer_log((unsigned long)to, n, UACCESS_BUFFER_FLAG_WRITE);
+}
+EXPORT_SYMBOL(__uaccess_buffer_log_write);
+
+bool __uaccess_buffer_pre_exit_loop(void)
+{
+	struct uaccess_buffer_info *buf = &current->uaccess_buffer;
+	struct uaccess_descriptor __user *desc_ptr;
+	sigset_t tmp_mask;
+
+	if (get_user(desc_ptr, buf->desc_ptr_ptr) || !desc_ptr)
+		return false;
+
+	current->real_blocked = current->blocked;
+	sigfillset(&tmp_mask);
+	set_current_blocked(&tmp_mask);
+	return true;
+}
+
+void __uaccess_buffer_post_exit_loop(void)
+{
+	spin_lock_irq(&current->sighand->siglock);
+	current->blocked = current->real_blocked;
+	recalc_sigpending();
+	spin_unlock_irq(&current->sighand->siglock);
+}
+
+void uaccess_buffer_free(struct task_struct *tsk)
+{
+	struct uaccess_buffer_info *buf = &tsk->uaccess_buffer;
+
+	kfree(buf->kbegin);
+	clear_syscall_work(UACCESS_BUFFER_EXIT);
+	buf->kbegin = buf->kcur = buf->kend = NULL;
+}
+
+void __uaccess_buffer_syscall_entry(void)
+{
+	struct uaccess_buffer_info *buf = &current->uaccess_buffer;
+	struct uaccess_descriptor desc;
+
+	if (get_user(buf->desc_ptr, buf->desc_ptr_ptr) || !buf->desc_ptr ||
+	    put_user(0, buf->desc_ptr_ptr) ||
+	    copy_from_user(&desc, buf->desc_ptr, sizeof(desc)))
+		return;
+
+	if (desc.size > 1024)
+		desc.size = 1024;
+
+	if (buf->kend - buf->kbegin != desc.size)
+		buf->kbegin =
+			krealloc_array(buf->kbegin, desc.size,
+				       sizeof(struct uaccess_buffer_entry),
+				       GFP_KERNEL);
+	if (!buf->kbegin)
+		return;
+
+	set_syscall_work(UACCESS_BUFFER_EXIT);
+	buf->kcur = buf->kbegin;
+	buf->kend = buf->kbegin + desc.size;
+	buf->ubegin =
+		(struct uaccess_buffer_entry __user *)(unsigned long)desc.addr;
+}
+
+void __uaccess_buffer_syscall_exit(void)
+{
+	struct uaccess_buffer_info *buf = &current->uaccess_buffer;
+	u64 num_entries = buf->kcur - buf->kbegin;
+	struct uaccess_descriptor desc;
+
+	clear_syscall_work(UACCESS_BUFFER_EXIT);
+	desc.addr = (u64)(unsigned long)(buf->ubegin + num_entries);
+	desc.size = buf->kend - buf->kcur;
+	buf->kcur = NULL;
+	if (copy_to_user(buf->ubegin, buf->kbegin,
+			 num_entries * sizeof(struct uaccess_buffer_entry)) == 0)
+		(void)copy_to_user(buf->desc_ptr, &desc, sizeof(desc));
+}
+
+size_t copy_from_user_nolog(void *to, const void __user *from, size_t len)
+{
+	size_t retval;
+
+	clear_syscall_work(UACCESS_BUFFER_EXIT);
+	retval = copy_from_user(to, from, len);
+	if (current->uaccess_buffer.kcur)
+		set_syscall_work(UACCESS_BUFFER_EXIT);
+	return retval;
+}
-- 
2.34.1.173.g76aa8bc2d0-goog


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 3/6] fs: use copy_from_user_nolog() to copy mount() data
  2021-12-08  4:48 [PATCH v3 0/6] kernel: introduce uaccess logging Peter Collingbourne
  2021-12-08  4:48 ` [PATCH v3 1/6] include: split out uaccess instrumentation into a separate header Peter Collingbourne
  2021-12-08  4:48 ` [PATCH v3 2/6] uaccess-buffer: add core code Peter Collingbourne
@ 2021-12-08  4:48 ` Peter Collingbourne
  2021-12-08  9:34   ` Dmitry Vyukov
  2021-12-08  4:48 ` [PATCH v3 4/6] uaccess-buffer: add CONFIG_GENERIC_ENTRY support Peter Collingbourne
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 22+ messages in thread
From: Peter Collingbourne @ 2021-12-08  4:48 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Thomas Gleixner, Andy Lutomirski, Kees Cook, Andrew Morton,
	Masahiro Yamada, Sami Tolvanen, YiFei Zhu, Mark Rutland,
	Frederic Weisbecker, Viresh Kumar, Andrey Konovalov,
	Peter Collingbourne, Gabriel Krisman Bertazi, Chris Hyser,
	Daniel Vetter, Chris Wilson, Arnd Bergmann, Dmitry Vyukov,
	Christian Brauner, Eric W. Biederman, Alexey Gladkov,
	Ran Xiaokai, David Hildenbrand, Xiaofeng Cao, Cyrill Gorcunov,
	Thomas Cedeno, Marco Elver, Alexander Potapenko
  Cc: linux-kernel, linux-arm-kernel, Evgenii Stepanov

With uaccess logging the contract is that the kernel must not report
accessing more data than necessary, as this can lead to false positive
reports in downstream consumers. This generally works out of the box
when instrumenting copy_{from,to}_user(), but with the data argument
to mount() we use copy_from_user() to copy PAGE_SIZE bytes (or as
much as we can, if the PAGE_SIZE sized access failed) and figure out
later how much we actually need.

To prevent this from leading to a false positive report, use
copy_from_user_nolog(), which will prevent the access from being logged.
Recall that it is valid for the kernel to report accessing less
data than it actually accessed, as uaccess logging is a best-effort
mechanism for reporting uaccesses.

Link: https://linux-review.googlesource.com/id/I5629b92a725c817acd9a861288338dd605cafee6
Signed-off-by: Peter Collingbourne <pcc@google.com>
---
 fs/namespace.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 659a8f39c61a..8f5f2aaca64e 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -31,6 +31,7 @@
 #include <uapi/linux/mount.h>
 #include <linux/fs_context.h>
 #include <linux/shmem_fs.h>
+#include <linux/uaccess-buffer.h>
 
 #include "pnode.h"
 #include "internal.h"
@@ -3197,7 +3198,12 @@ static void *copy_mount_options(const void __user * data)
 	if (!copy)
 		return ERR_PTR(-ENOMEM);
 
-	left = copy_from_user(copy, data, PAGE_SIZE);
+	/*
+	 * Use copy_from_user_nolog to avoid reporting overly large accesses in
+	 * the uaccess buffer, as this can lead to false positive reports in
+	 * downstream consumers.
+	 */
+	left = copy_from_user_nolog(copy, data, PAGE_SIZE);
 
 	/*
 	 * Not all architectures have an exact copy_from_user(). Resort to
-- 
2.34.1.173.g76aa8bc2d0-goog


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 4/6] uaccess-buffer: add CONFIG_GENERIC_ENTRY support
  2021-12-08  4:48 [PATCH v3 0/6] kernel: introduce uaccess logging Peter Collingbourne
                   ` (2 preceding siblings ...)
  2021-12-08  4:48 ` [PATCH v3 3/6] fs: use copy_from_user_nolog() to copy mount() data Peter Collingbourne
@ 2021-12-08  4:48 ` Peter Collingbourne
  2021-12-08  9:43   ` Dmitry Vyukov
  2021-12-08  4:48 ` [PATCH v3 5/6] arm64: add support for uaccess logging Peter Collingbourne
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 22+ messages in thread
From: Peter Collingbourne @ 2021-12-08  4:48 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Thomas Gleixner, Andy Lutomirski, Kees Cook, Andrew Morton,
	Masahiro Yamada, Sami Tolvanen, YiFei Zhu, Mark Rutland,
	Frederic Weisbecker, Viresh Kumar, Andrey Konovalov,
	Peter Collingbourne, Gabriel Krisman Bertazi, Chris Hyser,
	Daniel Vetter, Chris Wilson, Arnd Bergmann, Dmitry Vyukov,
	Christian Brauner, Eric W. Biederman, Alexey Gladkov,
	Ran Xiaokai, David Hildenbrand, Xiaofeng Cao, Cyrill Gorcunov,
	Thomas Cedeno, Marco Elver, Alexander Potapenko
  Cc: linux-kernel, linux-arm-kernel, Evgenii Stepanov

Add uaccess logging support on architectures that use
CONFIG_GENERIC_ENTRY (currently only s390 and x86).

Link: https://linux-review.googlesource.com/id/I3c5eb19a7e4a1dbe6095f6971f7826c4b0663f7d
Signed-off-by: Peter Collingbourne <pcc@google.com>
---
 arch/Kconfig                 |  6 ++++++
 include/linux/entry-common.h |  2 ++
 include/linux/thread_info.h  |  4 ++++
 kernel/entry/common.c        | 10 ++++++++++
 4 files changed, 22 insertions(+)

diff --git a/arch/Kconfig b/arch/Kconfig
index d3c4ab249e9c..c4dcab5279ac 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -31,6 +31,7 @@ config HOTPLUG_SMT
 	bool
 
 config GENERIC_ENTRY
+       select HAVE_ARCH_UACCESS_BUFFER
        bool
 
 config KPROBES
@@ -1312,6 +1313,11 @@ config ARCH_HAS_PARANOID_L1D_FLUSH
 config DYNAMIC_SIGFRAME
 	bool
 
+config HAVE_ARCH_UACCESS_BUFFER
+	bool
+	help
+	  Select if the architecture's syscall entry/exit code supports uaccess buffers.
+
 source "kernel/gcov/Kconfig"
 
 source "scripts/gcc-plugins/Kconfig"
diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index 2e2b8d6140ed..973fcd1d48a3 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -42,12 +42,14 @@
 				 SYSCALL_WORK_SYSCALL_EMU |		\
 				 SYSCALL_WORK_SYSCALL_AUDIT |		\
 				 SYSCALL_WORK_SYSCALL_USER_DISPATCH |	\
+				 SYSCALL_WORK_UACCESS_BUFFER_ENTRY |	\
 				 ARCH_SYSCALL_WORK_ENTER)
 #define SYSCALL_WORK_EXIT	(SYSCALL_WORK_SYSCALL_TRACEPOINT |	\
 				 SYSCALL_WORK_SYSCALL_TRACE |		\
 				 SYSCALL_WORK_SYSCALL_AUDIT |		\
 				 SYSCALL_WORK_SYSCALL_USER_DISPATCH |	\
 				 SYSCALL_WORK_SYSCALL_EXIT_TRAP	|	\
+				 SYSCALL_WORK_UACCESS_BUFFER_EXIT |	\
 				 ARCH_SYSCALL_WORK_EXIT)
 
 /*
diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index ad0c4e041030..b0f8ea86967f 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -46,6 +46,8 @@ enum syscall_work_bit {
 	SYSCALL_WORK_BIT_SYSCALL_AUDIT,
 	SYSCALL_WORK_BIT_SYSCALL_USER_DISPATCH,
 	SYSCALL_WORK_BIT_SYSCALL_EXIT_TRAP,
+	SYSCALL_WORK_BIT_UACCESS_BUFFER_ENTRY,
+	SYSCALL_WORK_BIT_UACCESS_BUFFER_EXIT,
 };
 
 #define SYSCALL_WORK_SECCOMP		BIT(SYSCALL_WORK_BIT_SECCOMP)
@@ -55,6 +57,8 @@ enum syscall_work_bit {
 #define SYSCALL_WORK_SYSCALL_AUDIT	BIT(SYSCALL_WORK_BIT_SYSCALL_AUDIT)
 #define SYSCALL_WORK_SYSCALL_USER_DISPATCH BIT(SYSCALL_WORK_BIT_SYSCALL_USER_DISPATCH)
 #define SYSCALL_WORK_SYSCALL_EXIT_TRAP	BIT(SYSCALL_WORK_BIT_SYSCALL_EXIT_TRAP)
+#define SYSCALL_WORK_UACCESS_BUFFER_ENTRY	BIT(SYSCALL_WORK_BIT_UACCESS_BUFFER_ENTRY)
+#define SYSCALL_WORK_UACCESS_BUFFER_EXIT	BIT(SYSCALL_WORK_BIT_UACCESS_BUFFER_EXIT)
 #endif
 
 #include <asm/thread_info.h>
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index d5a61d565ad5..57c4bb01a554 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -6,6 +6,7 @@
 #include <linux/livepatch.h>
 #include <linux/audit.h>
 #include <linux/tick.h>
+#include <linux/uaccess-buffer.h>
 
 #include "common.h"
 
@@ -70,6 +71,9 @@ static long syscall_trace_enter(struct pt_regs *regs, long syscall,
 			return ret;
 	}
 
+	if (work & SYSCALL_WORK_UACCESS_BUFFER_ENTRY)
+		uaccess_buffer_syscall_entry();
+
 	/* Either of the above might have changed the syscall number */
 	syscall = syscall_get_nr(current, regs);
 
@@ -197,14 +201,17 @@ static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
 static void exit_to_user_mode_prepare(struct pt_regs *regs)
 {
 	unsigned long ti_work = READ_ONCE(current_thread_info()->flags);
+	bool uaccess_buffer_pending;
 
 	lockdep_assert_irqs_disabled();
 
 	/* Flush pending rcuog wakeup before the last need_resched() check */
 	tick_nohz_user_enter_prepare();
 
+	uaccess_buffer_pending = uaccess_buffer_pre_exit_loop();
 	if (unlikely(ti_work & EXIT_TO_USER_MODE_WORK))
 		ti_work = exit_to_user_mode_loop(regs, ti_work);
+	uaccess_buffer_post_exit_loop(uaccess_buffer_pending);
 
 	arch_exit_to_user_mode_prepare(regs, ti_work);
 
@@ -247,6 +254,9 @@ static void syscall_exit_work(struct pt_regs *regs, unsigned long work)
 
 	audit_syscall_exit(regs);
 
+	if (work & SYSCALL_WORK_UACCESS_BUFFER_EXIT)
+		uaccess_buffer_syscall_exit();
+
 	if (work & SYSCALL_WORK_SYSCALL_TRACEPOINT)
 		trace_sys_exit(regs, syscall_get_return_value(current, regs));
 
-- 
2.34.1.173.g76aa8bc2d0-goog


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 5/6] arm64: add support for uaccess logging
  2021-12-08  4:48 [PATCH v3 0/6] kernel: introduce uaccess logging Peter Collingbourne
                   ` (3 preceding siblings ...)
  2021-12-08  4:48 ` [PATCH v3 4/6] uaccess-buffer: add CONFIG_GENERIC_ENTRY support Peter Collingbourne
@ 2021-12-08  4:48 ` Peter Collingbourne
  2021-12-08  9:48   ` Dmitry Vyukov
  2021-12-08  4:48 ` [PATCH v3 6/6] Documentation: document " Peter Collingbourne
  2021-12-08 15:33 ` [PATCH v3 0/6] kernel: introduce " Marco Elver
  6 siblings, 1 reply; 22+ messages in thread
From: Peter Collingbourne @ 2021-12-08  4:48 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Thomas Gleixner, Andy Lutomirski, Kees Cook, Andrew Morton,
	Masahiro Yamada, Sami Tolvanen, YiFei Zhu, Mark Rutland,
	Frederic Weisbecker, Viresh Kumar, Andrey Konovalov,
	Peter Collingbourne, Gabriel Krisman Bertazi, Chris Hyser,
	Daniel Vetter, Chris Wilson, Arnd Bergmann, Dmitry Vyukov,
	Christian Brauner, Eric W. Biederman, Alexey Gladkov,
	Ran Xiaokai, David Hildenbrand, Xiaofeng Cao, Cyrill Gorcunov,
	Thomas Cedeno, Marco Elver, Alexander Potapenko
  Cc: linux-kernel, linux-arm-kernel, Evgenii Stepanov

arm64 does not use CONFIG_GENERIC_ENTRY, so add the support for
uaccess logging directly to the architecture.

Link: https://linux-review.googlesource.com/id/I88de539fb9c4a9d27fa8cccbe201a6e4382faf89
Signed-off-by: Peter Collingbourne <pcc@google.com>
---
 arch/arm64/Kconfig                   | 1 +
 arch/arm64/include/asm/thread_info.h | 7 ++++++-
 arch/arm64/kernel/ptrace.c           | 7 +++++++
 arch/arm64/kernel/signal.c           | 5 +++++
 arch/arm64/kernel/syscall.c          | 1 +
 5 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c4207cf9bb17..6023946abe4a 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -161,6 +161,7 @@ config ARM64
 	select HAVE_ARCH_THREAD_STRUCT_WHITELIST
 	select HAVE_ARCH_TRACEHOOK
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
+	select HAVE_ARCH_UACCESS_BUFFER
 	select HAVE_ARCH_VMAP_STACK
 	select HAVE_ARM_SMCCC
 	select HAVE_ASM_MODVERSIONS
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index e1317b7c4525..0461b36251ea 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -82,6 +82,8 @@ int arch_dup_task_struct(struct task_struct *dst,
 #define TIF_SVE_VL_INHERIT	24	/* Inherit SVE vl_onexec across exec */
 #define TIF_SSBD		25	/* Wants SSB mitigation */
 #define TIF_TAGGED_ADDR		26	/* Allow tagged user addresses */
+#define TIF_UACCESS_BUFFER_ENTRY 27     /* thread has non-zero uaccess_desc_addr_addr */
+#define TIF_UACCESS_BUFFER_EXIT  28     /* thread has non-zero kcur */
 
 #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
 #define _TIF_NEED_RESCHED	(1 << TIF_NEED_RESCHED)
@@ -98,6 +100,8 @@ int arch_dup_task_struct(struct task_struct *dst,
 #define _TIF_SVE		(1 << TIF_SVE)
 #define _TIF_MTE_ASYNC_FAULT	(1 << TIF_MTE_ASYNC_FAULT)
 #define _TIF_NOTIFY_SIGNAL	(1 << TIF_NOTIFY_SIGNAL)
+#define _TIF_UACCESS_BUFFER_ENTRY	(1 << TIF_UACCESS_BUFFER_ENTRY)
+#define _TIF_UACCESS_BUFFER_EXIT	(1 << TIF_UACCESS_BUFFER_EXIT)
 
 #define _TIF_WORK_MASK		(_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
 				 _TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE | \
@@ -106,7 +110,8 @@ int arch_dup_task_struct(struct task_struct *dst,
 
 #define _TIF_SYSCALL_WORK	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
 				 _TIF_SYSCALL_TRACEPOINT | _TIF_SECCOMP | \
-				 _TIF_SYSCALL_EMU)
+				 _TIF_SYSCALL_EMU | _TIF_UACCESS_BUFFER_ENTRY | \
+				 _TIF_UACCESS_BUFFER_EXIT)
 
 #ifdef CONFIG_SHADOW_CALL_STACK
 #define INIT_SCS							\
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 88a9034fb9b5..283372eccaeb 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -29,6 +29,7 @@
 #include <linux/regset.h>
 #include <linux/tracehook.h>
 #include <linux/elf.h>
+#include <linux/uaccess-buffer.h>
 
 #include <asm/compat.h>
 #include <asm/cpufeature.h>
@@ -1854,6 +1855,9 @@ int syscall_trace_enter(struct pt_regs *regs)
 	if (test_thread_flag(TIF_SYSCALL_TRACEPOINT))
 		trace_sys_enter(regs, regs->syscallno);
 
+	if (flags & _TIF_UACCESS_BUFFER_ENTRY)
+		uaccess_buffer_syscall_entry();
+
 	audit_syscall_entry(regs->syscallno, regs->orig_x0, regs->regs[1],
 			    regs->regs[2], regs->regs[3]);
 
@@ -1866,6 +1870,9 @@ void syscall_trace_exit(struct pt_regs *regs)
 
 	audit_syscall_exit(regs);
 
+	if (flags & _TIF_UACCESS_BUFFER_EXIT)
+		uaccess_buffer_syscall_exit();
+
 	if (flags & _TIF_SYSCALL_TRACEPOINT)
 		trace_sys_exit(regs, syscall_get_return_value(current, regs));
 
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 8f6372b44b65..5bbd98e5c257 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -20,6 +20,7 @@
 #include <linux/tracehook.h>
 #include <linux/ratelimit.h>
 #include <linux/syscalls.h>
+#include <linux/uaccess-buffer.h>
 
 #include <asm/daifflags.h>
 #include <asm/debug-monitors.h>
@@ -919,6 +920,8 @@ static void do_signal(struct pt_regs *regs)
 
 void do_notify_resume(struct pt_regs *regs, unsigned long thread_flags)
 {
+	bool uaccess_buffer_pending = uaccess_buffer_pre_exit_loop();
+
 	do {
 		if (thread_flags & _TIF_NEED_RESCHED) {
 			/* Unmask Debug and SError for the next task */
@@ -950,6 +953,8 @@ void do_notify_resume(struct pt_regs *regs, unsigned long thread_flags)
 		local_daif_mask();
 		thread_flags = READ_ONCE(current_thread_info()->flags);
 	} while (thread_flags & _TIF_WORK_MASK);
+
+	uaccess_buffer_post_exit_loop(uaccess_buffer_pending);
 }
 
 unsigned long __ro_after_init signal_minsigstksz;
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index 50a0f1a38e84..d59022b594f2 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -7,6 +7,7 @@
 #include <linux/ptrace.h>
 #include <linux/randomize_kstack.h>
 #include <linux/syscalls.h>
+#include <linux/uaccess-buffer.h>
 
 #include <asm/daifflags.h>
 #include <asm/debug-monitors.h>
-- 
2.34.1.173.g76aa8bc2d0-goog


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 6/6] Documentation: document uaccess logging
  2021-12-08  4:48 [PATCH v3 0/6] kernel: introduce uaccess logging Peter Collingbourne
                   ` (4 preceding siblings ...)
  2021-12-08  4:48 ` [PATCH v3 5/6] arm64: add support for uaccess logging Peter Collingbourne
@ 2021-12-08  4:48 ` Peter Collingbourne
  2021-12-08 15:33 ` [PATCH v3 0/6] kernel: introduce " Marco Elver
  6 siblings, 0 replies; 22+ messages in thread
From: Peter Collingbourne @ 2021-12-08  4:48 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Thomas Gleixner, Andy Lutomirski, Kees Cook, Andrew Morton,
	Masahiro Yamada, Sami Tolvanen, YiFei Zhu, Mark Rutland,
	Frederic Weisbecker, Viresh Kumar, Andrey Konovalov,
	Peter Collingbourne, Gabriel Krisman Bertazi, Chris Hyser,
	Daniel Vetter, Chris Wilson, Arnd Bergmann, Dmitry Vyukov,
	Christian Brauner, Eric W. Biederman, Alexey Gladkov,
	Ran Xiaokai, David Hildenbrand, Xiaofeng Cao, Cyrill Gorcunov,
	Thomas Cedeno, Marco Elver, Alexander Potapenko
  Cc: linux-kernel, linux-arm-kernel, Evgenii Stepanov

Add documentation for the uaccess logging feature.

Link: https://linux-review.googlesource.com/id/Ia626c0ca91bc0a3d8067d7f28406aa40693b65a2
Signed-off-by: Peter Collingbourne <pcc@google.com>
---
v3:
- document what happens if passing NULL to prctl
- be explicit about meaning of addr and size

 Documentation/admin-guide/index.rst           |   1 +
 Documentation/admin-guide/uaccess-logging.rst | 151 ++++++++++++++++++
 2 files changed, 152 insertions(+)
 create mode 100644 Documentation/admin-guide/uaccess-logging.rst

diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index 1bedab498104..4f6ee447ab2f 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -54,6 +54,7 @@ ABI will be found here.
    :maxdepth: 1
 
    sysfs-rules
+   uaccess-logging
 
 The rest of this manual consists of various unordered guides on how to
 configure specific aspects of kernel behavior to your liking.
diff --git a/Documentation/admin-guide/uaccess-logging.rst b/Documentation/admin-guide/uaccess-logging.rst
new file mode 100644
index 000000000000..24def38bbdf8
--- /dev/null
+++ b/Documentation/admin-guide/uaccess-logging.rst
@@ -0,0 +1,151 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+Uaccess Logging
+===============
+
+Background
+----------
+
+Userspace tools such as sanitizers (ASan, MSan, HWASan) and tools
+making use of the ARM Memory Tagging Extension (MTE) need to
+monitor all memory accesses in a program so that they can detect
+memory errors. Furthermore, fuzzing tools such as syzkaller need to
+monitor all memory accesses so that they know which parts of memory
+to fuzz. For accesses made purely in userspace, this is achieved
+via compiler instrumentation, or for MTE, via direct hardware
+support. However, accesses made by the kernel on behalf of the user
+program via syscalls (i.e. uaccesses) are normally invisible to
+these tools.
+
+Traditionally, the sanitizers have handled this by interposing the libc
+syscall stubs with a wrapper that checks the memory based on what we
+believe the uaccesses will be. However, this creates a maintenance
+burden: each syscall must be annotated with its uaccesses in order
+to be recognized by the sanitizer, and these annotations must be
+continuously updated as the kernel changes.
+
+The kernel's uaccess logging feature provides userspace tools with
+the address and size of each userspace access, thereby allowing these
+tools to report memory errors involving these accesses without needing
+annotations for every syscall.
+
+By relying on the kernel's actual uaccesses, rather than a
+reimplementation of them, the userspace memory safety tools may
+play a dual role of verifying the validity of kernel accesses. Even
+a sanitizer whose syscall wrappers have complete knowledge of the
+kernel's intended API may vary from the kernel's actual uaccesses due
+to kernel bugs. A sanitizer with knowledge of the kernel's actual
+uaccesses may produce more accurate error reports that reveal such
+bugs. For example, a kernel that accesses more memory than expected
+by the userspace program could indicate that either userspace or the
+kernel has the wrong idea about which kernel functionality is being
+requested -- either way, there is a bug.
+
+Interface
+---------
+
+The feature may be used via the following prctl:
+
+.. code-block:: c
+
+  uint64_t addr = 0; /* Generally will be a TLS slot or equivalent */
+  prctl(PR_SET_UACCESS_DESCRIPTOR_ADDR_ADDR, &addr, 0, 0, 0);
+
+Supplying a non-zero address as the second argument to ``prctl``
+will cause the kernel to read an address (referred to as the *uaccess
+descriptor address*) from that address on each kernel entry. Specifying
+an address of NULL as the second argument will restore the kernel's
+default behavior, i.e. no uaccess descriptor address is read.
+
+When entering the kernel with a non-zero uaccess descriptor address
+to handle a syscall, the kernel will read a data structure of type
+``struct uaccess_descriptor`` from the uaccess descriptor address,
+which is defined as follows:
+
+.. code-block:: c
+
+  struct uaccess_descriptor {
+    uint64_t addr, size;
+  };
+
+This data structure contains the address and size (in array elements)
+of a *uaccess buffer*, which is an array of data structures of type
+``struct uaccess_buffer_entry``. Before returning to userspace, the
+kernel will log information about uaccesses to sequential entries
+in the uaccess buffer. It will also store ``NULL`` to the uaccess
+descriptor address, and store the address and size of the unused
+portion of the uaccess buffer to the uaccess descriptor.
+
+The format of a uaccess buffer entry is defined as follows:
+
+.. code-block:: c
+
+  struct uaccess_buffer_entry {
+    uint64_t addr, size, flags;
+  };
+
+``addr`` and ``size`` contain the address and size of the user memory
+access. On arm64, tag bits are preserved in the ``addr`` field. There
+is currently one flag bit assignment for the ``flags`` field:
+
+.. code-block:: c
+
+  #define UACCESS_BUFFER_FLAG_WRITE 1
+
+This flag is set if the access was a write, or clear if it was a
+read. The meaning of all other flag bits is reserved.
+
+When entering the kernel with a non-zero uaccess descriptor
+address for a reason other than a syscall (for example, when
+IPI'd due to an incoming asynchronous signal), any signals other
+than ``SIGKILL`` and ``SIGSTOP`` are masked as if by calling
+``sigprocmask(SIG_SETMASK, set, NULL)`` where ``set`` has been
+initialized with ``sigfillset(set)``. This is to prevent incoming
+signals from interfering with uaccess logging.
+
+Example
+-------
+
+Here is an example of a code snippet that will enumerate the accesses
+performed by a ``uname(2)`` syscall:
+
+.. code-block:: c
+
+  struct uaccess_buffer_entry entries[64];
+  struct uaccess_descriptor desc;
+  uint64_t desc_addr = 0;
+  prctl(PR_SET_UACCESS_DESCRIPTOR_ADDR_ADDR, &desc_addr, 0, 0, 0);
+
+  desc.addr = (uint64_t)&entries;
+  desc.size = 64;
+  desc_addr = (uint64_t)&desc;
+
+  struct utsname un;
+  uname(&un);
+
+  struct uaccess_buffer_entry* entries_end = (struct uaccess_buffer_entry*)desc.addr;
+  for (struct uaccess_buffer_entry* entry = entries; entry != entries_end; ++entry) {
+    printf("%s at 0x%lx size 0x%lx\n", entry->flags & UACCESS_BUFFER_FLAG_WRITE ? "WRITE" : "READ",
+           (unsigned long)entry->addr, (unsigned long)entry->size);
+  }
+
+Limitations
+-----------
+
+This feature is currently only supported on the arm64, s390 and x86
+architectures.
+
+Uaccess buffers are a "best-effort" mechanism for logging uaccesses. Of
+course, not all of the accesses may fit in the buffer, but aside from
+that, not all internal kernel APIs that access userspace memory are
+covered. Therefore, userspace programs should tolerate unreported
+accesses.
+
+On the other hand, the kernel guarantees that it will not
+(intentionally) report accessing more data than it is specified
+to read. For example, if the kernel implements a syscall that is
+specified to read a data structure of size ``N`` bytes by first
+reading a page's worth of data and then only using the first ``N``
+bytes from it, the kernel will either report reading ``N`` bytes or
+not report the access at all.
-- 
2.34.1.173.g76aa8bc2d0-goog


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 1/6] include: split out uaccess instrumentation into a separate header
  2021-12-08  4:48 ` [PATCH v3 1/6] include: split out uaccess instrumentation into a separate header Peter Collingbourne
@ 2021-12-08  9:27   ` Dmitry Vyukov
  0 siblings, 0 replies; 22+ messages in thread
From: Dmitry Vyukov @ 2021-12-08  9:27 UTC (permalink / raw)
  To: Peter Collingbourne
  Cc: Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Thomas Gleixner, Andy Lutomirski, Kees Cook, Andrew Morton,
	Masahiro Yamada, Sami Tolvanen, YiFei Zhu, Mark Rutland,
	Frederic Weisbecker, Viresh Kumar, Andrey Konovalov,
	Gabriel Krisman Bertazi, Chris Hyser, Daniel Vetter,
	Chris Wilson, Arnd Bergmann, Christian Brauner,
	Eric W. Biederman, Alexey Gladkov, Ran Xiaokai,
	David Hildenbrand, Xiaofeng Cao, Cyrill Gorcunov, Thomas Cedeno,
	Marco Elver, Alexander Potapenko, linux-kernel, linux-arm-kernel,
	Evgenii Stepanov

On Wed, 8 Dec 2021 at 05:48, Peter Collingbourne <pcc@google.com> wrote:
>
> In an upcoming change we are going to add uaccess instrumentation
> that uses inline access to struct task_struct from the
> instrumentation routines. Because instrumentation.h is included
> from many places including (recursively) from sched.h this would
> otherwise lead to a circular dependency. Break the dependency by
> moving uaccess instrumentation routines into a separate header,
> instrumentation-uaccess.h.
>
> Link: https://linux-review.googlesource.com/id/I625728db0c8db374e13e4ebc54985ac5c79ace7d
> Signed-off-by: Peter Collingbourne <pcc@google.com>

Acked-by: Dmitry Vyukov <dvyukov@google.com>

> ---
>  include/linux/instrumented-uaccess.h | 49 ++++++++++++++++++++++++++++
>  include/linux/instrumented.h         | 34 -------------------
>  include/linux/uaccess.h              |  2 +-
>  lib/iov_iter.c                       |  2 +-
>  lib/usercopy.c                       |  2 +-
>  5 files changed, 52 insertions(+), 37 deletions(-)
>  create mode 100644 include/linux/instrumented-uaccess.h
>
> diff --git a/include/linux/instrumented-uaccess.h b/include/linux/instrumented-uaccess.h
> new file mode 100644
> index 000000000000..ece549088e50
> --- /dev/null
> +++ b/include/linux/instrumented-uaccess.h
> @@ -0,0 +1,49 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +/*
> + * This header provides generic wrappers for memory access instrumentation for
> + * uaccess routines that the compiler cannot emit for: KASAN, KCSAN.
> + */
> +#ifndef _LINUX_INSTRUMENTED_UACCESS_H
> +#define _LINUX_INSTRUMENTED_UACCESS_H
> +
> +#include <linux/compiler.h>
> +#include <linux/kasan-checks.h>
> +#include <linux/kcsan-checks.h>
> +#include <linux/types.h>
> +
> +/**
> + * instrument_copy_to_user - instrument reads of copy_to_user
> + *
> + * Instrument reads from kernel memory, that are due to copy_to_user (and
> + * variants). The instrumentation must be inserted before the accesses.
> + *
> + * @to destination address
> + * @from source address
> + * @n number of bytes to copy
> + */
> +static __always_inline void
> +instrument_copy_to_user(void __user *to, const void *from, unsigned long n)
> +{
> +       kasan_check_read(from, n);
> +       kcsan_check_read(from, n);
> +}
> +
> +/**
> + * instrument_copy_from_user - instrument writes of copy_from_user
> + *
> + * Instrument writes to kernel memory, that are due to copy_from_user (and
> + * variants). The instrumentation should be inserted before the accesses.
> + *
> + * @to destination address
> + * @from source address
> + * @n number of bytes to copy
> + */
> +static __always_inline void
> +instrument_copy_from_user(const void *to, const void __user *from, unsigned long n)
> +{
> +       kasan_check_write(to, n);
> +       kcsan_check_write(to, n);
> +}
> +
> +#endif /* _LINUX_INSTRUMENTED_UACCESS_H */
> diff --git a/include/linux/instrumented.h b/include/linux/instrumented.h
> index 42faebbaa202..b68f415510c7 100644
> --- a/include/linux/instrumented.h
> +++ b/include/linux/instrumented.h
> @@ -102,38 +102,4 @@ static __always_inline void instrument_atomic_read_write(const volatile void *v,
>         kcsan_check_atomic_read_write(v, size);
>  }
>
> -/**
> - * instrument_copy_to_user - instrument reads of copy_to_user
> - *
> - * Instrument reads from kernel memory, that are due to copy_to_user (and
> - * variants). The instrumentation must be inserted before the accesses.
> - *
> - * @to destination address
> - * @from source address
> - * @n number of bytes to copy
> - */
> -static __always_inline void
> -instrument_copy_to_user(void __user *to, const void *from, unsigned long n)
> -{
> -       kasan_check_read(from, n);
> -       kcsan_check_read(from, n);
> -}
> -
> -/**
> - * instrument_copy_from_user - instrument writes of copy_from_user
> - *
> - * Instrument writes to kernel memory, that are due to copy_from_user (and
> - * variants). The instrumentation should be inserted before the accesses.
> - *
> - * @to destination address
> - * @from source address
> - * @n number of bytes to copy
> - */
> -static __always_inline void
> -instrument_copy_from_user(const void *to, const void __user *from, unsigned long n)
> -{
> -       kasan_check_write(to, n);
> -       kcsan_check_write(to, n);
> -}
> -
>  #endif /* _LINUX_INSTRUMENTED_H */
> diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
> index ac0394087f7d..c0c467e39657 100644
> --- a/include/linux/uaccess.h
> +++ b/include/linux/uaccess.h
> @@ -3,7 +3,7 @@
>  #define __LINUX_UACCESS_H__
>
>  #include <linux/fault-inject-usercopy.h>
> -#include <linux/instrumented.h>
> +#include <linux/instrumented-uaccess.h>
>  #include <linux/minmax.h>
>  #include <linux/sched.h>
>  #include <linux/thread_info.h>
> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> index 66a740e6e153..3f9dc6df7102 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -12,7 +12,7 @@
>  #include <linux/compat.h>
>  #include <net/checksum.h>
>  #include <linux/scatterlist.h>
> -#include <linux/instrumented.h>
> +#include <linux/instrumented-uaccess.h>
>
>  #define PIPE_PARANOIA /* for now */
>
> diff --git a/lib/usercopy.c b/lib/usercopy.c
> index 7413dd300516..1cd188e62d06 100644
> --- a/lib/usercopy.c
> +++ b/lib/usercopy.c
> @@ -1,7 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0
>  #include <linux/bitops.h>
>  #include <linux/fault-inject-usercopy.h>
> -#include <linux/instrumented.h>
> +#include <linux/instrumented-uaccess.h>
>  #include <linux/uaccess.h>
>
>  /* out-of-line parts */
> --
> 2.34.1.173.g76aa8bc2d0-goog
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 3/6] fs: use copy_from_user_nolog() to copy mount() data
  2021-12-08  4:48 ` [PATCH v3 3/6] fs: use copy_from_user_nolog() to copy mount() data Peter Collingbourne
@ 2021-12-08  9:34   ` Dmitry Vyukov
  2021-12-09 21:42     ` Peter Collingbourne
  0 siblings, 1 reply; 22+ messages in thread
From: Dmitry Vyukov @ 2021-12-08  9:34 UTC (permalink / raw)
  To: Peter Collingbourne
  Cc: Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Thomas Gleixner, Andy Lutomirski, Kees Cook, Andrew Morton,
	Masahiro Yamada, Sami Tolvanen, YiFei Zhu, Mark Rutland,
	Frederic Weisbecker, Viresh Kumar, Andrey Konovalov,
	Gabriel Krisman Bertazi, Chris Hyser, Daniel Vetter,
	Chris Wilson, Arnd Bergmann, Christian Brauner,
	Eric W. Biederman, Alexey Gladkov, Ran Xiaokai,
	David Hildenbrand, Xiaofeng Cao, Cyrill Gorcunov, Thomas Cedeno,
	Marco Elver, Alexander Potapenko, linux-kernel, linux-arm-kernel,
	Evgenii Stepanov

On Wed, 8 Dec 2021 at 05:48, Peter Collingbourne <pcc@google.com> wrote:
>
> With uaccess logging the contract is that the kernel must not report
> accessing more data than necessary, as this can lead to false positive
> reports in downstream consumers. This generally works out of the box
> when instrumenting copy_{from,to}_user(), but with the data argument
> to mount() we use copy_from_user() to copy PAGE_SIZE bytes (or as
> much as we can, if the PAGE_SIZE sized access failed) and figure out
> later how much we actually need.
>
> To prevent this from leading to a false positive report, use
> copy_from_user_nolog(), which will prevent the access from being logged.
> Recall that it is valid for the kernel to report accessing less
> data than it actually accessed, as uaccess logging is a best-effort
> mechanism for reporting uaccesses.
>
> Link: https://linux-review.googlesource.com/id/I5629b92a725c817acd9a861288338dd605cafee6
> Signed-off-by: Peter Collingbourne <pcc@google.com>
> ---
>  fs/namespace.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 659a8f39c61a..8f5f2aaca64e 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -31,6 +31,7 @@
>  #include <uapi/linux/mount.h>
>  #include <linux/fs_context.h>
>  #include <linux/shmem_fs.h>
> +#include <linux/uaccess-buffer.h>
>
>  #include "pnode.h"
>  #include "internal.h"
> @@ -3197,7 +3198,12 @@ static void *copy_mount_options(const void __user * data)
>         if (!copy)
>                 return ERR_PTR(-ENOMEM);
>
> -       left = copy_from_user(copy, data, PAGE_SIZE);
> +       /*
> +        * Use copy_from_user_nolog to avoid reporting overly large accesses in
> +        * the uaccess buffer, as this can lead to false positive reports in
> +        * downstream consumers.
> +        */
> +       left = copy_from_user_nolog(copy, data, PAGE_SIZE);

A late idea...
Maybe it's better to log them with a new UACCESS_BUFFER_FLAG_OVERREAD
flag. Better for user-space, at least can detect UAFs by checking the
first byte. And a more logical kernel annotation (maybe will be used
in some other tools? or if we ever check user tags in the kernel).

Probably not too important today since we use this only in 2 places,
but longer term may be better.

Btw, what's the story with BPF accesses? Can we log them theoretically?

Previously the comment said:

+       /*
+        * Avoid copy_from_user() here as it may leak information about the BPF
+        * program to userspace via the uaccess buffer.
+        */

but now it says something very generic:

/*
* Avoid logging uaccesses here as the BPF program may not be following
* the uaccess log rules.
*/






>         /*
>          * Not all architectures have an exact copy_from_user(). Resort to
> --
> 2.34.1.173.g76aa8bc2d0-goog

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 4/6] uaccess-buffer: add CONFIG_GENERIC_ENTRY support
  2021-12-08  4:48 ` [PATCH v3 4/6] uaccess-buffer: add CONFIG_GENERIC_ENTRY support Peter Collingbourne
@ 2021-12-08  9:43   ` Dmitry Vyukov
  0 siblings, 0 replies; 22+ messages in thread
From: Dmitry Vyukov @ 2021-12-08  9:43 UTC (permalink / raw)
  To: Peter Collingbourne
  Cc: Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Thomas Gleixner, Andy Lutomirski, Kees Cook, Andrew Morton,
	Masahiro Yamada, Sami Tolvanen, YiFei Zhu, Mark Rutland,
	Frederic Weisbecker, Viresh Kumar, Andrey Konovalov,
	Gabriel Krisman Bertazi, Chris Hyser, Daniel Vetter,
	Chris Wilson, Arnd Bergmann, Christian Brauner,
	Eric W. Biederman, Alexey Gladkov, Ran Xiaokai,
	David Hildenbrand, Xiaofeng Cao, Cyrill Gorcunov, Thomas Cedeno,
	Marco Elver, Alexander Potapenko, linux-kernel, linux-arm-kernel,
	Evgenii Stepanov

On Wed, 8 Dec 2021 at 05:48, Peter Collingbourne <pcc@google.com> wrote:
>
> Add uaccess logging support on architectures that use
> CONFIG_GENERIC_ENTRY (currently only s390 and x86).
>
> Link: https://linux-review.googlesource.com/id/I3c5eb19a7e4a1dbe6095f6971f7826c4b0663f7d
> Signed-off-by: Peter Collingbourne <pcc@google.com>

Acked-by: Dmitry Vyukov <dvyukov@google.com>

> ---
>  arch/Kconfig                 |  6 ++++++
>  include/linux/entry-common.h |  2 ++
>  include/linux/thread_info.h  |  4 ++++
>  kernel/entry/common.c        | 10 ++++++++++
>  4 files changed, 22 insertions(+)
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index d3c4ab249e9c..c4dcab5279ac 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -31,6 +31,7 @@ config HOTPLUG_SMT
>         bool
>
>  config GENERIC_ENTRY
> +       select HAVE_ARCH_UACCESS_BUFFER
>         bool
>
>  config KPROBES
> @@ -1312,6 +1313,11 @@ config ARCH_HAS_PARANOID_L1D_FLUSH
>  config DYNAMIC_SIGFRAME
>         bool
>
> +config HAVE_ARCH_UACCESS_BUFFER
> +       bool
> +       help
> +         Select if the architecture's syscall entry/exit code supports uaccess buffers.
> +
>  source "kernel/gcov/Kconfig"
>
>  source "scripts/gcc-plugins/Kconfig"
> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
> index 2e2b8d6140ed..973fcd1d48a3 100644
> --- a/include/linux/entry-common.h
> +++ b/include/linux/entry-common.h
> @@ -42,12 +42,14 @@
>                                  SYSCALL_WORK_SYSCALL_EMU |             \
>                                  SYSCALL_WORK_SYSCALL_AUDIT |           \
>                                  SYSCALL_WORK_SYSCALL_USER_DISPATCH |   \
> +                                SYSCALL_WORK_UACCESS_BUFFER_ENTRY |    \
>                                  ARCH_SYSCALL_WORK_ENTER)
>  #define SYSCALL_WORK_EXIT      (SYSCALL_WORK_SYSCALL_TRACEPOINT |      \
>                                  SYSCALL_WORK_SYSCALL_TRACE |           \
>                                  SYSCALL_WORK_SYSCALL_AUDIT |           \
>                                  SYSCALL_WORK_SYSCALL_USER_DISPATCH |   \
>                                  SYSCALL_WORK_SYSCALL_EXIT_TRAP |       \
> +                                SYSCALL_WORK_UACCESS_BUFFER_EXIT |     \
>                                  ARCH_SYSCALL_WORK_EXIT)
>
>  /*
> diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
> index ad0c4e041030..b0f8ea86967f 100644
> --- a/include/linux/thread_info.h
> +++ b/include/linux/thread_info.h
> @@ -46,6 +46,8 @@ enum syscall_work_bit {
>         SYSCALL_WORK_BIT_SYSCALL_AUDIT,
>         SYSCALL_WORK_BIT_SYSCALL_USER_DISPATCH,
>         SYSCALL_WORK_BIT_SYSCALL_EXIT_TRAP,
> +       SYSCALL_WORK_BIT_UACCESS_BUFFER_ENTRY,
> +       SYSCALL_WORK_BIT_UACCESS_BUFFER_EXIT,
>  };
>
>  #define SYSCALL_WORK_SECCOMP           BIT(SYSCALL_WORK_BIT_SECCOMP)
> @@ -55,6 +57,8 @@ enum syscall_work_bit {
>  #define SYSCALL_WORK_SYSCALL_AUDIT     BIT(SYSCALL_WORK_BIT_SYSCALL_AUDIT)
>  #define SYSCALL_WORK_SYSCALL_USER_DISPATCH BIT(SYSCALL_WORK_BIT_SYSCALL_USER_DISPATCH)
>  #define SYSCALL_WORK_SYSCALL_EXIT_TRAP BIT(SYSCALL_WORK_BIT_SYSCALL_EXIT_TRAP)
> +#define SYSCALL_WORK_UACCESS_BUFFER_ENTRY      BIT(SYSCALL_WORK_BIT_UACCESS_BUFFER_ENTRY)
> +#define SYSCALL_WORK_UACCESS_BUFFER_EXIT       BIT(SYSCALL_WORK_BIT_UACCESS_BUFFER_EXIT)
>  #endif
>
>  #include <asm/thread_info.h>
> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> index d5a61d565ad5..57c4bb01a554 100644
> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -6,6 +6,7 @@
>  #include <linux/livepatch.h>
>  #include <linux/audit.h>
>  #include <linux/tick.h>
> +#include <linux/uaccess-buffer.h>
>
>  #include "common.h"
>
> @@ -70,6 +71,9 @@ static long syscall_trace_enter(struct pt_regs *regs, long syscall,
>                         return ret;
>         }
>
> +       if (work & SYSCALL_WORK_UACCESS_BUFFER_ENTRY)
> +               uaccess_buffer_syscall_entry();
> +
>         /* Either of the above might have changed the syscall number */
>         syscall = syscall_get_nr(current, regs);
>
> @@ -197,14 +201,17 @@ static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
>  static void exit_to_user_mode_prepare(struct pt_regs *regs)
>  {
>         unsigned long ti_work = READ_ONCE(current_thread_info()->flags);
> +       bool uaccess_buffer_pending;
>
>         lockdep_assert_irqs_disabled();
>
>         /* Flush pending rcuog wakeup before the last need_resched() check */
>         tick_nohz_user_enter_prepare();
>
> +       uaccess_buffer_pending = uaccess_buffer_pre_exit_loop();
>         if (unlikely(ti_work & EXIT_TO_USER_MODE_WORK))
>                 ti_work = exit_to_user_mode_loop(regs, ti_work);
> +       uaccess_buffer_post_exit_loop(uaccess_buffer_pending);
>
>         arch_exit_to_user_mode_prepare(regs, ti_work);
>
> @@ -247,6 +254,9 @@ static void syscall_exit_work(struct pt_regs *regs, unsigned long work)
>
>         audit_syscall_exit(regs);
>
> +       if (work & SYSCALL_WORK_UACCESS_BUFFER_EXIT)
> +               uaccess_buffer_syscall_exit();
> +
>         if (work & SYSCALL_WORK_SYSCALL_TRACEPOINT)
>                 trace_sys_exit(regs, syscall_get_return_value(current, regs));
>
> --
> 2.34.1.173.g76aa8bc2d0-goog
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 5/6] arm64: add support for uaccess logging
  2021-12-08  4:48 ` [PATCH v3 5/6] arm64: add support for uaccess logging Peter Collingbourne
@ 2021-12-08  9:48   ` Dmitry Vyukov
  2021-12-09 21:44     ` Peter Collingbourne
  0 siblings, 1 reply; 22+ messages in thread
From: Dmitry Vyukov @ 2021-12-08  9:48 UTC (permalink / raw)
  To: Peter Collingbourne
  Cc: Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Thomas Gleixner, Andy Lutomirski, Kees Cook, Andrew Morton,
	Masahiro Yamada, Sami Tolvanen, YiFei Zhu, Mark Rutland,
	Frederic Weisbecker, Viresh Kumar, Andrey Konovalov,
	Gabriel Krisman Bertazi, Chris Hyser, Daniel Vetter,
	Chris Wilson, Arnd Bergmann, Christian Brauner,
	Eric W. Biederman, Alexey Gladkov, Ran Xiaokai,
	David Hildenbrand, Xiaofeng Cao, Cyrill Gorcunov, Thomas Cedeno,
	Marco Elver, Alexander Potapenko, linux-kernel, linux-arm-kernel,
	Evgenii Stepanov

On Wed, 8 Dec 2021 at 05:48, Peter Collingbourne <pcc@google.com> wrote:
>
> arm64 does not use CONFIG_GENERIC_ENTRY, so add the support for
> uaccess logging directly to the architecture.
>
> Link: https://linux-review.googlesource.com/id/I88de539fb9c4a9d27fa8cccbe201a6e4382faf89
> Signed-off-by: Peter Collingbourne <pcc@google.com>
> ---
>  arch/arm64/Kconfig                   | 1 +
>  arch/arm64/include/asm/thread_info.h | 7 ++++++-
>  arch/arm64/kernel/ptrace.c           | 7 +++++++
>  arch/arm64/kernel/signal.c           | 5 +++++
>  arch/arm64/kernel/syscall.c          | 1 +
>  5 files changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index c4207cf9bb17..6023946abe4a 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -161,6 +161,7 @@ config ARM64
>         select HAVE_ARCH_THREAD_STRUCT_WHITELIST
>         select HAVE_ARCH_TRACEHOOK
>         select HAVE_ARCH_TRANSPARENT_HUGEPAGE
> +       select HAVE_ARCH_UACCESS_BUFFER
>         select HAVE_ARCH_VMAP_STACK
>         select HAVE_ARM_SMCCC
>         select HAVE_ASM_MODVERSIONS
> diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
> index e1317b7c4525..0461b36251ea 100644
> --- a/arch/arm64/include/asm/thread_info.h
> +++ b/arch/arm64/include/asm/thread_info.h
> @@ -82,6 +82,8 @@ int arch_dup_task_struct(struct task_struct *dst,
>  #define TIF_SVE_VL_INHERIT     24      /* Inherit SVE vl_onexec across exec */
>  #define TIF_SSBD               25      /* Wants SSB mitigation */
>  #define TIF_TAGGED_ADDR                26      /* Allow tagged user addresses */
> +#define TIF_UACCESS_BUFFER_ENTRY 27     /* thread has non-zero uaccess_desc_addr_addr */
> +#define TIF_UACCESS_BUFFER_EXIT  28     /* thread has non-zero kcur */
>
>  #define _TIF_SIGPENDING                (1 << TIF_SIGPENDING)
>  #define _TIF_NEED_RESCHED      (1 << TIF_NEED_RESCHED)
> @@ -98,6 +100,8 @@ int arch_dup_task_struct(struct task_struct *dst,
>  #define _TIF_SVE               (1 << TIF_SVE)
>  #define _TIF_MTE_ASYNC_FAULT   (1 << TIF_MTE_ASYNC_FAULT)
>  #define _TIF_NOTIFY_SIGNAL     (1 << TIF_NOTIFY_SIGNAL)
> +#define _TIF_UACCESS_BUFFER_ENTRY      (1 << TIF_UACCESS_BUFFER_ENTRY)
> +#define _TIF_UACCESS_BUFFER_EXIT       (1 << TIF_UACCESS_BUFFER_EXIT)
>
>  #define _TIF_WORK_MASK         (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
>                                  _TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE | \
> @@ -106,7 +110,8 @@ int arch_dup_task_struct(struct task_struct *dst,
>
>  #define _TIF_SYSCALL_WORK      (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
>                                  _TIF_SYSCALL_TRACEPOINT | _TIF_SECCOMP | \
> -                                _TIF_SYSCALL_EMU)
> +                                _TIF_SYSCALL_EMU | _TIF_UACCESS_BUFFER_ENTRY | \
> +                                _TIF_UACCESS_BUFFER_EXIT)
>
>  #ifdef CONFIG_SHADOW_CALL_STACK
>  #define INIT_SCS                                                       \
> diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
> index 88a9034fb9b5..283372eccaeb 100644
> --- a/arch/arm64/kernel/ptrace.c
> +++ b/arch/arm64/kernel/ptrace.c
> @@ -29,6 +29,7 @@
>  #include <linux/regset.h>
>  #include <linux/tracehook.h>
>  #include <linux/elf.h>
> +#include <linux/uaccess-buffer.h>
>
>  #include <asm/compat.h>
>  #include <asm/cpufeature.h>
> @@ -1854,6 +1855,9 @@ int syscall_trace_enter(struct pt_regs *regs)
>         if (test_thread_flag(TIF_SYSCALL_TRACEPOINT))
>                 trace_sys_enter(regs, regs->syscallno);
>
> +       if (flags & _TIF_UACCESS_BUFFER_ENTRY)
> +               uaccess_buffer_syscall_entry();
> +
>         audit_syscall_entry(regs->syscallno, regs->orig_x0, regs->regs[1],
>                             regs->regs[2], regs->regs[3]);
>
> @@ -1866,6 +1870,9 @@ void syscall_trace_exit(struct pt_regs *regs)
>
>         audit_syscall_exit(regs);
>
> +       if (flags & _TIF_UACCESS_BUFFER_EXIT)
> +               uaccess_buffer_syscall_exit();
> +
>         if (flags & _TIF_SYSCALL_TRACEPOINT)
>                 trace_sys_exit(regs, syscall_get_return_value(current, regs));
>
> diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
> index 8f6372b44b65..5bbd98e5c257 100644
> --- a/arch/arm64/kernel/signal.c
> +++ b/arch/arm64/kernel/signal.c
> @@ -20,6 +20,7 @@
>  #include <linux/tracehook.h>
>  #include <linux/ratelimit.h>
>  #include <linux/syscalls.h>
> +#include <linux/uaccess-buffer.h>
>
>  #include <asm/daifflags.h>
>  #include <asm/debug-monitors.h>
> @@ -919,6 +920,8 @@ static void do_signal(struct pt_regs *regs)
>
>  void do_notify_resume(struct pt_regs *regs, unsigned long thread_flags)
>  {
> +       bool uaccess_buffer_pending = uaccess_buffer_pre_exit_loop();
> +
>         do {
>                 if (thread_flags & _TIF_NEED_RESCHED) {
>                         /* Unmask Debug and SError for the next task */
> @@ -950,6 +953,8 @@ void do_notify_resume(struct pt_regs *regs, unsigned long thread_flags)
>                 local_daif_mask();
>                 thread_flags = READ_ONCE(current_thread_info()->flags);
>         } while (thread_flags & _TIF_WORK_MASK);
> +
> +       uaccess_buffer_post_exit_loop(uaccess_buffer_pending);
>  }
>
>  unsigned long __ro_after_init signal_minsigstksz;
> diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
> index 50a0f1a38e84..d59022b594f2 100644
> --- a/arch/arm64/kernel/syscall.c
> +++ b/arch/arm64/kernel/syscall.c
> @@ -7,6 +7,7 @@
>  #include <linux/ptrace.h>
>  #include <linux/randomize_kstack.h>
>  #include <linux/syscalls.h>
> +#include <linux/uaccess-buffer.h>

This looks strange... Does some other header miss this include?

>
>  #include <asm/daifflags.h>
>  #include <asm/debug-monitors.h>
> --
> 2.34.1.173.g76aa8bc2d0-goog
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 2/6] uaccess-buffer: add core code
  2021-12-08  4:48 ` [PATCH v3 2/6] uaccess-buffer: add core code Peter Collingbourne
@ 2021-12-08 10:21   ` Dmitry Vyukov
  2021-12-09 22:13     ` Peter Collingbourne
  2021-12-08 16:46   ` Marco Elver
  1 sibling, 1 reply; 22+ messages in thread
From: Dmitry Vyukov @ 2021-12-08 10:21 UTC (permalink / raw)
  To: Peter Collingbourne
  Cc: Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Thomas Gleixner, Andy Lutomirski, Kees Cook, Andrew Morton,
	Masahiro Yamada, Sami Tolvanen, YiFei Zhu, Mark Rutland,
	Frederic Weisbecker, Viresh Kumar, Andrey Konovalov,
	Gabriel Krisman Bertazi, Chris Hyser, Daniel Vetter,
	Chris Wilson, Arnd Bergmann, Christian Brauner,
	Eric W. Biederman, Alexey Gladkov, Ran Xiaokai,
	David Hildenbrand, Xiaofeng Cao, Cyrill Gorcunov, Thomas Cedeno,
	Marco Elver, Alexander Potapenko, linux-kernel, linux-arm-kernel,
	Evgenii Stepanov

On Wed, 8 Dec 2021 at 05:48, Peter Collingbourne <pcc@google.com> wrote:
>
> Add the core code to support uaccess logging. Subsequent patches will
> hook this up to the arch-specific kernel entry and exit code for
> certain architectures.
>
> Link: https://linux-review.googlesource.com/id/I6581765646501a5631b281d670903945ebadc57d
> Signed-off-by: Peter Collingbourne <pcc@google.com>
> ---
> v3:
> - performance optimizations for entry/exit code
> - don't use kcur == NULL to mean overflow
> - fix potential double free in clone()
> - don't allocate a new kernel-side uaccess buffer for each syscall
> - fix uaccess buffer leak on exit
> - fix some sparse warnings
>
> v2:
> - New interface that avoids multiple syscalls per real syscall and
>   is arch-generic
> - Avoid logging uaccesses done by BPF programs
> - Add documentation
> - Split up into multiple patches
> - Various code moves, renames etc as requested by Marco
>
>  fs/exec.c                            |   3 +
>  include/linux/instrumented-uaccess.h |   6 +-
>  include/linux/sched.h                |   5 ++
>  include/linux/uaccess-buffer-info.h  |  46 ++++++++++
>  include/linux/uaccess-buffer.h       | 112 +++++++++++++++++++++++
>  include/uapi/linux/prctl.h           |   3 +
>  include/uapi/linux/uaccess-buffer.h  |  27 ++++++
>  kernel/Makefile                      |   1 +
>  kernel/bpf/helpers.c                 |   7 +-
>  kernel/fork.c                        |   4 +
>  kernel/signal.c                      |   4 +-
>  kernel/sys.c                         |   6 ++
>  kernel/uaccess-buffer.c              | 129 +++++++++++++++++++++++++++
>  13 files changed, 350 insertions(+), 3 deletions(-)
>  create mode 100644 include/linux/uaccess-buffer-info.h
>  create mode 100644 include/linux/uaccess-buffer.h
>  create mode 100644 include/uapi/linux/uaccess-buffer.h
>  create mode 100644 kernel/uaccess-buffer.c
>
> diff --git a/fs/exec.c b/fs/exec.c
> index 537d92c41105..c9975e790f30 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -65,6 +65,7 @@
>  #include <linux/vmalloc.h>
>  #include <linux/io_uring.h>
>  #include <linux/syscall_user_dispatch.h>
> +#include <linux/uaccess-buffer.h>
>
>  #include <linux/uaccess.h>
>  #include <asm/mmu_context.h>
> @@ -1313,6 +1314,8 @@ int begin_new_exec(struct linux_binprm * bprm)
>         me->personality &= ~bprm->per_clear;
>
>         clear_syscall_work_syscall_user_dispatch(me);
> +       uaccess_buffer_set_descriptor_addr_addr(0);
> +       uaccess_buffer_free(current);
>
>         /*
>          * We have to apply CLOEXEC before we change whether the process is
> diff --git a/include/linux/instrumented-uaccess.h b/include/linux/instrumented-uaccess.h
> index ece549088e50..b967f4436d15 100644
> --- a/include/linux/instrumented-uaccess.h
> +++ b/include/linux/instrumented-uaccess.h
> @@ -2,7 +2,8 @@
>
>  /*
>   * This header provides generic wrappers for memory access instrumentation for
> - * uaccess routines that the compiler cannot emit for: KASAN, KCSAN.
> + * uaccess routines that the compiler cannot emit for: KASAN, KCSAN,
> + * uaccess buffers.
>   */
>  #ifndef _LINUX_INSTRUMENTED_UACCESS_H
>  #define _LINUX_INSTRUMENTED_UACCESS_H
> @@ -11,6 +12,7 @@
>  #include <linux/kasan-checks.h>
>  #include <linux/kcsan-checks.h>
>  #include <linux/types.h>
> +#include <linux/uaccess-buffer.h>
>
>  /**
>   * instrument_copy_to_user - instrument reads of copy_to_user
> @@ -27,6 +29,7 @@ instrument_copy_to_user(void __user *to, const void *from, unsigned long n)
>  {
>         kasan_check_read(from, n);
>         kcsan_check_read(from, n);
> +       uaccess_buffer_log_write(to, n);
>  }
>
>  /**
> @@ -44,6 +47,7 @@ instrument_copy_from_user(const void *to, const void __user *from, unsigned long
>  {
>         kasan_check_write(to, n);
>         kcsan_check_write(to, n);
> +       uaccess_buffer_log_read(from, n);
>  }
>
>  #endif /* _LINUX_INSTRUMENTED_UACCESS_H */
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 78c351e35fec..7c5278d7b57d 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -34,6 +34,7 @@
>  #include <linux/rseq.h>
>  #include <linux/seqlock.h>
>  #include <linux/kcsan.h>
> +#include <linux/uaccess-buffer-info.h>
>  #include <asm/kmap_size.h>
>
>  /* task_struct member predeclarations (sorted alphabetically): */
> @@ -1484,6 +1485,10 @@ struct task_struct {
>         struct callback_head            l1d_flush_kill;
>  #endif
>
> +#ifdef CONFIG_HAVE_ARCH_UACCESS_BUFFER
> +       struct uaccess_buffer_info      uaccess_buffer;
> +#endif
> +
>         /*
>          * New fields for task_struct should be added above here, so that
>          * they are included in the randomized portion of task_struct.
> diff --git a/include/linux/uaccess-buffer-info.h b/include/linux/uaccess-buffer-info.h
> new file mode 100644
> index 000000000000..15e2d8f7c074
> --- /dev/null
> +++ b/include/linux/uaccess-buffer-info.h
> @@ -0,0 +1,46 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_UACCESS_BUFFER_INFO_H
> +#define _LINUX_UACCESS_BUFFER_INFO_H
> +
> +#ifdef CONFIG_HAVE_ARCH_UACCESS_BUFFER
> +
> +struct uaccess_buffer_info {
> +       /*
> +        * The pointer to pointer to struct uaccess_descriptor. This is the
> +        * value controlled by prctl(PR_SET_UACCESS_DESCRIPTOR_ADDR_ADDR).
> +        */
> +       struct uaccess_descriptor __user *__user *desc_ptr_ptr;
> +
> +       /*
> +        * The pointer to struct uaccess_descriptor read at syscall entry time.
> +        */
> +       struct uaccess_descriptor __user *desc_ptr;
> +
> +       /*
> +        * A pointer to the kernel's temporary copy of the uaccess log for the
> +        * current syscall. We log to a kernel buffer in order to avoid leaking
> +        * timing information to userspace.
> +        */
> +       struct uaccess_buffer_entry *kbegin;
> +
> +       /*
> +        * The position of the next uaccess buffer entry for the current
> +        * syscall, or NULL if we are not logging the current syscall.
> +        */
> +       struct uaccess_buffer_entry *kcur;
> +
> +       /*
> +        * A pointer to the end of the kernel's uaccess log.
> +        */
> +       struct uaccess_buffer_entry *kend;
> +
> +       /*
> +        * The pointer to the userspace uaccess log, as read from the
> +        * struct uaccess_descriptor.
> +        */
> +       struct uaccess_buffer_entry __user *ubegin;
> +};
> +
> +#endif
> +
> +#endif  /* _LINUX_UACCESS_BUFFER_INFO_H */
> diff --git a/include/linux/uaccess-buffer.h b/include/linux/uaccess-buffer.h
> new file mode 100644
> index 000000000000..f2f46db274f3
> --- /dev/null
> +++ b/include/linux/uaccess-buffer.h
> @@ -0,0 +1,112 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_UACCESS_BUFFER_H
> +#define _LINUX_UACCESS_BUFFER_H
> +
> +#include <linux/sched.h>
> +#include <uapi/linux/uaccess-buffer.h>
> +
> +#include <asm-generic/errno-base.h>
> +
> +#ifdef CONFIG_HAVE_ARCH_UACCESS_BUFFER
> +
> +static inline bool uaccess_buffer_maybe_blocked(struct task_struct *tsk)
> +{
> +       return test_task_syscall_work(tsk, UACCESS_BUFFER_ENTRY);
> +}
> +
> +void __uaccess_buffer_syscall_entry(void);
> +static inline void uaccess_buffer_syscall_entry(void)
> +{
> +       __uaccess_buffer_syscall_entry();
> +}
> +
> +void __uaccess_buffer_syscall_exit(void);
> +static inline void uaccess_buffer_syscall_exit(void)
> +{
> +       __uaccess_buffer_syscall_exit();
> +}
> +
> +bool __uaccess_buffer_pre_exit_loop(void);
> +static inline bool uaccess_buffer_pre_exit_loop(void)
> +{
> +       if (!test_syscall_work(UACCESS_BUFFER_ENTRY))
> +               return false;
> +       return __uaccess_buffer_pre_exit_loop();
> +}
> +
> +void __uaccess_buffer_post_exit_loop(void);
> +static inline void uaccess_buffer_post_exit_loop(bool pending)
> +{
> +       if (pending)
> +               __uaccess_buffer_post_exit_loop();
> +}
> +
> +static inline int uaccess_buffer_set_descriptor_addr_addr(unsigned long addr)

I would move the implementation to .c file. It's a rare path.

> +{
> +       current->uaccess_buffer.desc_ptr_ptr =
> +               (struct uaccess_descriptor __user * __user *)addr;
> +       if (addr)
> +               set_syscall_work(UACCESS_BUFFER_ENTRY);
> +       else
> +               clear_syscall_work(UACCESS_BUFFER_ENTRY);
> +       return 0;
> +}
> +
> +size_t copy_from_user_nolog(void *to, const void __user *from, size_t len);
> +
> +void uaccess_buffer_free(struct task_struct *tsk);
> +
> +void __uaccess_buffer_log_read(const void __user *from, unsigned long n);
> +static inline void uaccess_buffer_log_read(const void __user *from, unsigned long n)
> +{
> +       if (unlikely(test_syscall_work(UACCESS_BUFFER_EXIT)))

UACCESS_BUFFER_EXIT is only defined in future patches, so this won't compile.

> +               __uaccess_buffer_log_read(from, n);
> +}
> +
> +void __uaccess_buffer_log_write(void __user *to, unsigned long n);
> +static inline void uaccess_buffer_log_write(void __user *to, unsigned long n)
> +{
> +       if (unlikely(test_syscall_work(UACCESS_BUFFER_EXIT)))
> +               __uaccess_buffer_log_write(to, n);
> +}
> +
> +#else
> +
> +static inline bool uaccess_buffer_maybe_blocked(struct task_struct *tsk)
> +{
> +       return false;
> +}
> +static inline void uaccess_buffer_syscall_entry(void)
> +{
> +}
> +static inline void uaccess_buffer_syscall_exit(void)
> +{
> +}
> +static inline bool uaccess_buffer_pre_exit_loop(void)
> +{
> +       return false;
> +}
> +static inline void uaccess_buffer_post_exit_loop(bool pending)
> +{
> +}
> +static inline int uaccess_buffer_set_descriptor_addr_addr(unsigned long addr)
> +{
> +       return -EINVAL;
> +}
> +static inline void uaccess_buffer_free(struct task_struct *tsk)
> +{
> +}
> +
> +#define copy_from_user_nolog(to, from, len) copy_from_user(to, from, len)
> +
> +static inline void uaccess_buffer_log_read(const void __user *from,
> +                                          unsigned long n)
> +{
> +}
> +static inline void uaccess_buffer_log_write(void __user *to, unsigned long n)
> +{
> +}
> +
> +#endif
> +
> +#endif  /* _LINUX_UACCESS_BUFFER_H */
> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> index bb73e9a0b24f..74b37469c7b3 100644
> --- a/include/uapi/linux/prctl.h
> +++ b/include/uapi/linux/prctl.h
> @@ -272,4 +272,7 @@ struct prctl_mm_map {
>  # define PR_SCHED_CORE_SCOPE_THREAD_GROUP      1
>  # define PR_SCHED_CORE_SCOPE_PROCESS_GROUP     2
>
> +/* Configure uaccess logging feature */
> +#define PR_SET_UACCESS_DESCRIPTOR_ADDR_ADDR    63
> +
>  #endif /* _LINUX_PRCTL_H */
> diff --git a/include/uapi/linux/uaccess-buffer.h b/include/uapi/linux/uaccess-buffer.h
> new file mode 100644
> index 000000000000..bf10f7c78857
> --- /dev/null
> +++ b/include/uapi/linux/uaccess-buffer.h
> @@ -0,0 +1,27 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +#ifndef _UAPI_LINUX_UACCESS_BUFFER_H
> +#define _UAPI_LINUX_UACCESS_BUFFER_H
> +
> +#include <linux/types.h>
> +
> +/* Location of the uaccess log. */
> +struct uaccess_descriptor {
> +       /* Address of the uaccess_buffer_entry array. */
> +       __u64 addr;
> +       /* Size of the uaccess_buffer_entry array in number of elements. */
> +       __u64 size;
> +};
> +
> +/* Format of the entries in the uaccess log. */
> +struct uaccess_buffer_entry {
> +       /* Address being accessed. */
> +       __u64 addr;
> +       /* Number of bytes that were accessed. */
> +       __u64 size;
> +       /* UACCESS_BUFFER_* flags. */
> +       __u64 flags;
> +};
> +
> +#define UACCESS_BUFFER_FLAG_WRITE      1 /* access was a write */
> +
> +#endif /* _UAPI_LINUX_UACCESS_BUFFER_H */
> diff --git a/kernel/Makefile b/kernel/Makefile
> index 186c49582f45..d4d9be5146c3 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -114,6 +114,7 @@ obj-$(CONFIG_KCSAN) += kcsan/
>  obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
>  obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
>  obj-$(CONFIG_CFI_CLANG) += cfi.o
> +obj-$(CONFIG_HAVE_ARCH_UACCESS_BUFFER) += uaccess-buffer.o
>
>  obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 649f07623df6..ab6520a633ef 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -15,6 +15,7 @@
>  #include <linux/pid_namespace.h>
>  #include <linux/proc_ns.h>
>  #include <linux/security.h>
> +#include <linux/uaccess-buffer.h>
>
>  #include "../../lib/kstrtox.h"
>
> @@ -637,7 +638,11 @@ const struct bpf_func_proto bpf_event_output_data_proto =  {
>  BPF_CALL_3(bpf_copy_from_user, void *, dst, u32, size,
>            const void __user *, user_ptr)
>  {
> -       int ret = copy_from_user(dst, user_ptr, size);
> +       /*
> +        * Avoid logging uaccesses here as the BPF program may not be following
> +        * the uaccess log rules.
> +        */
> +       int ret = copy_from_user_nolog(dst, user_ptr, size);
>
>         if (unlikely(ret)) {
>                 memset(dst, 0, size);
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 3244cc56b697..8be2ca528a65 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -96,6 +96,7 @@
>  #include <linux/scs.h>
>  #include <linux/io_uring.h>
>  #include <linux/bpf.h>
> +#include <linux/uaccess-buffer.h>
>
>  #include <asm/pgalloc.h>
>  #include <linux/uaccess.h>
> @@ -754,6 +755,7 @@ void __put_task_struct(struct task_struct *tsk)
>         delayacct_tsk_free(tsk);
>         put_signal_struct(tsk->signal);
>         sched_core_free(tsk);
> +       uaccess_buffer_free(tsk);
>
>         if (!profile_handoff_task(tsk))
>                 free_task(tsk);
> @@ -890,6 +892,8 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
>         if (memcg_charge_kernel_stack(tsk))
>                 goto free_stack;
>
> +       uaccess_buffer_free(orig);
> +
>         stack_vm_area = task_stack_vm_area(tsk);
>
>         err = arch_dup_task_struct(tsk, orig);
> diff --git a/kernel/signal.c b/kernel/signal.c
> index a629b11bf3e0..69bf21518bd0 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -45,6 +45,7 @@
>  #include <linux/posix-timers.h>
>  #include <linux/cgroup.h>
>  #include <linux/audit.h>
> +#include <linux/uaccess-buffer.h>
>
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/signal.h>
> @@ -1031,7 +1032,8 @@ static void complete_signal(int sig, struct task_struct *p, enum pid_type type)
>         if (sig_fatal(p, sig) &&
>             !(signal->flags & SIGNAL_GROUP_EXIT) &&
>             !sigismember(&t->real_blocked, sig) &&
> -           (sig == SIGKILL || !p->ptrace)) {
> +           (sig == SIGKILL ||
> +            !(p->ptrace || uaccess_buffer_maybe_blocked(p)))) {
>                 /*
>                  * This signal will be fatal to the whole group.
>                  */
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 8fdac0d90504..c71a9a9c0f68 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -42,6 +42,7 @@
>  #include <linux/version.h>
>  #include <linux/ctype.h>
>  #include <linux/syscall_user_dispatch.h>
> +#include <linux/uaccess-buffer.h>
>
>  #include <linux/compat.h>
>  #include <linux/syscalls.h>
> @@ -2530,6 +2531,11 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
>                 error = sched_core_share_pid(arg2, arg3, arg4, arg5);
>                 break;
>  #endif
> +       case PR_SET_UACCESS_DESCRIPTOR_ADDR_ADDR:
> +               if (arg3 || arg4 || arg5)
> +                       return -EINVAL;
> +               error = uaccess_buffer_set_descriptor_addr_addr(arg2);

Does this miss uaccess_buffer_free() for the 0 case?
It seems that when we set it to 0 we always want to free as well (e.g.
in exec). I wonder if freeing should be done by
uaccess_buffer_set_descriptor_addr_addr() itself.
Both uaccess_buffer_set_descriptor_addr_addr() and
uaccess_buffer_free() reset task work, which is fine but is somewhat
suboptimal logically.
Then task exit could do uaccess_buffer_set_descriptor_addr_addr(0).


> +               break;
>         default:
>                 error = -EINVAL;
>                 break;
> diff --git a/kernel/uaccess-buffer.c b/kernel/uaccess-buffer.c
> new file mode 100644
> index 000000000000..088e43f7611c
> --- /dev/null
> +++ b/kernel/uaccess-buffer.c
> @@ -0,0 +1,129 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Support for uaccess logging via uaccess buffers.
> + *
> + * Copyright (C) 2021, Google LLC.
> + */
> +
> +#include <linux/compat.h>
> +#include <linux/mm.h>
> +#include <linux/prctl.h>
> +#include <linux/ptrace.h>
> +#include <linux/sched.h>
> +#include <linux/signal.h>
> +#include <linux/slab.h>
> +#include <linux/uaccess.h>
> +#include <linux/uaccess-buffer.h>
> +
> +static void uaccess_buffer_log(unsigned long addr, unsigned long size,
> +                             unsigned long flags)
> +{
> +       struct uaccess_buffer_info *buf = &current->uaccess_buffer;
> +       struct uaccess_buffer_entry *entry = buf->kcur;
> +
> +       if (entry == buf->kend || unlikely(uaccess_kernel()))
> +               return;
> +       entry->addr = addr;
> +       entry->size = size;
> +       entry->flags = flags;
> +
> +       ++buf->kcur;
> +}
> +
> +void __uaccess_buffer_log_read(const void __user *from, unsigned long n)
> +{
> +       uaccess_buffer_log((unsigned long)from, n, 0);
> +}
> +EXPORT_SYMBOL(__uaccess_buffer_log_read);
> +
> +void __uaccess_buffer_log_write(void __user *to, unsigned long n)
> +{
> +       uaccess_buffer_log((unsigned long)to, n, UACCESS_BUFFER_FLAG_WRITE);
> +}
> +EXPORT_SYMBOL(__uaccess_buffer_log_write);
> +
> +bool __uaccess_buffer_pre_exit_loop(void)
> +{
> +       struct uaccess_buffer_info *buf = &current->uaccess_buffer;
> +       struct uaccess_descriptor __user *desc_ptr;
> +       sigset_t tmp_mask;
> +
> +       if (get_user(desc_ptr, buf->desc_ptr_ptr) || !desc_ptr)
> +               return false;
> +
> +       current->real_blocked = current->blocked;
> +       sigfillset(&tmp_mask);

This and __uaccess_buffer_post_exit_loop() runs only when we have a
signal/timer interrupt between setting the descriptor address in
userspace and entering the next syscall, right?
Just want to make sure this code is not executed for normal uaccess
tracing for performance reasons.

> +       set_current_blocked(&tmp_mask);
> +       return true;
> +}
> +
> +void __uaccess_buffer_post_exit_loop(void)
> +{
> +       spin_lock_irq(&current->sighand->siglock);
> +       current->blocked = current->real_blocked;
> +       recalc_sigpending();
> +       spin_unlock_irq(&current->sighand->siglock);
> +}
> +
> +void uaccess_buffer_free(struct task_struct *tsk)
> +{
> +       struct uaccess_buffer_info *buf = &tsk->uaccess_buffer;
> +
> +       kfree(buf->kbegin);
> +       clear_syscall_work(UACCESS_BUFFER_EXIT);
> +       buf->kbegin = buf->kcur = buf->kend = NULL;
> +}
> +
> +void __uaccess_buffer_syscall_entry(void)
> +{
> +       struct uaccess_buffer_info *buf = &current->uaccess_buffer;
> +       struct uaccess_descriptor desc;
> +
> +       if (get_user(buf->desc_ptr, buf->desc_ptr_ptr) || !buf->desc_ptr ||
> +           put_user(0, buf->desc_ptr_ptr) ||
> +           copy_from_user(&desc, buf->desc_ptr, sizeof(desc)))
> +               return;
> +
> +       if (desc.size > 1024)
> +               desc.size = 1024;
> +
> +       if (buf->kend - buf->kbegin != desc.size)
> +               buf->kbegin =
> +                       krealloc_array(buf->kbegin, desc.size,
> +                                      sizeof(struct uaccess_buffer_entry),
> +                                      GFP_KERNEL);
> +       if (!buf->kbegin)

I think we also need to set at least buf->kend to NULL here.
I am not sure what can go wrong now, but it's a strange state. On next
iteration we will do "buf->kend - buf->kbegin", where kend is a
dangling pointer and kbegin is NULL.

> +               return;
> +
> +       set_syscall_work(UACCESS_BUFFER_EXIT);
> +       buf->kcur = buf->kbegin;
> +       buf->kend = buf->kbegin + desc.size;
> +       buf->ubegin =
> +               (struct uaccess_buffer_entry __user *)(unsigned long)desc.addr;
> +}
> +
> +void __uaccess_buffer_syscall_exit(void)
> +{
> +       struct uaccess_buffer_info *buf = &current->uaccess_buffer;
> +       u64 num_entries = buf->kcur - buf->kbegin;
> +       struct uaccess_descriptor desc;
> +
> +       clear_syscall_work(UACCESS_BUFFER_EXIT);
> +       desc.addr = (u64)(unsigned long)(buf->ubegin + num_entries);
> +       desc.size = buf->kend - buf->kcur;
> +       buf->kcur = NULL;
> +       if (copy_to_user(buf->ubegin, buf->kbegin,
> +                        num_entries * sizeof(struct uaccess_buffer_entry)) == 0)
> +               (void)copy_to_user(buf->desc_ptr, &desc, sizeof(desc));
> +}
> +
> +size_t copy_from_user_nolog(void *to, const void __user *from, size_t len)
> +{
> +       size_t retval;
> +
> +       clear_syscall_work(UACCESS_BUFFER_EXIT);
> +       retval = copy_from_user(to, from, len);
> +       if (current->uaccess_buffer.kcur)
> +               set_syscall_work(UACCESS_BUFFER_EXIT);
> +       return retval;
> +}
> --
> 2.34.1.173.g76aa8bc2d0-goog
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 0/6] kernel: introduce uaccess logging
  2021-12-08  4:48 [PATCH v3 0/6] kernel: introduce uaccess logging Peter Collingbourne
                   ` (5 preceding siblings ...)
  2021-12-08  4:48 ` [PATCH v3 6/6] Documentation: document " Peter Collingbourne
@ 2021-12-08 15:33 ` Marco Elver
  2021-12-09 22:12   ` Peter Collingbourne
  6 siblings, 1 reply; 22+ messages in thread
From: Marco Elver @ 2021-12-08 15:33 UTC (permalink / raw)
  To: Peter Collingbourne
  Cc: Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Thomas Gleixner, Andy Lutomirski, Kees Cook, Andrew Morton,
	Masahiro Yamada, Sami Tolvanen, YiFei Zhu, Mark Rutland,
	Frederic Weisbecker, Viresh Kumar, Andrey Konovalov,
	Gabriel Krisman Bertazi, Chris Hyser, Daniel Vetter,
	Chris Wilson, Arnd Bergmann, Dmitry Vyukov, Christian Brauner,
	Eric W. Biederman, Alexey Gladkov, Ran Xiaokai,
	David Hildenbrand, Xiaofeng Cao, Cyrill Gorcunov, Thomas Cedeno,
	Alexander Potapenko, linux-kernel, linux-arm-kernel,
	Evgenii Stepanov

On Wed, 8 Dec 2021 at 05:48, Peter Collingbourne <pcc@google.com> wrote:
[...]
> Peter Collingbourne (6):
>   include: split out uaccess instrumentation into a separate header
>   uaccess-buffer: add core code
>   fs: use copy_from_user_nolog() to copy mount() data
>   uaccess-buffer: add CONFIG_GENERIC_ENTRY support
>   arm64: add support for uaccess logging
>   Documentation: document uaccess logging

I think it needs to be possible to disable the feature via a Kconfig
option. Not all systems want or could even tolerate the additional
overheads -- even though you say they are minimal elsewhere. For
example, some embedded systems most likely have absolutely no use for
this feature, and the increase in .text might be unacceptable. Certain
features that we usually take for granted are no different (see
init/Kconfig: FUTEX, EPOLL, .. etc). If you'd like it enabled by
default, given the overheads are small enough, it can do "default y"
and be configurable only "if EXPERT".

Is it possible to add a kselftest-style test to
tools/testing/selftests? In addition to the basic tests, can certain
non-trivial properties, like masking of signals, also be tested? I
think that'd be extremely valuable, because I'm sure we'd have to
backport this to several older kernels.

Thanks,
-- Marco

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 2/6] uaccess-buffer: add core code
  2021-12-08  4:48 ` [PATCH v3 2/6] uaccess-buffer: add core code Peter Collingbourne
  2021-12-08 10:21   ` Dmitry Vyukov
@ 2021-12-08 16:46   ` Marco Elver
  2021-12-09 22:14     ` Peter Collingbourne
  1 sibling, 1 reply; 22+ messages in thread
From: Marco Elver @ 2021-12-08 16:46 UTC (permalink / raw)
  To: Peter Collingbourne
  Cc: Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Thomas Gleixner, Andy Lutomirski, Kees Cook, Andrew Morton,
	Masahiro Yamada, Sami Tolvanen, YiFei Zhu, Mark Rutland,
	Frederic Weisbecker, Viresh Kumar, Andrey Konovalov,
	Gabriel Krisman Bertazi, Chris Hyser, Daniel Vetter,
	Chris Wilson, Arnd Bergmann, Dmitry Vyukov, Christian Brauner,
	Eric W. Biederman, Alexey Gladkov, Ran Xiaokai,
	David Hildenbrand, Xiaofeng Cao, Cyrill Gorcunov, Thomas Cedeno,
	Alexander Potapenko, linux-kernel, linux-arm-kernel,
	Evgenii Stepanov

On Tue, Dec 07, 2021 at 08:48PM -0800, Peter Collingbourne wrote:
> Add the core code to support uaccess logging. Subsequent patches will
> hook this up to the arch-specific kernel entry and exit code for
> certain architectures.
> 
> Link: https://linux-review.googlesource.com/id/I6581765646501a5631b281d670903945ebadc57d
> Signed-off-by: Peter Collingbourne <pcc@google.com>
> ---
> v3:
> - performance optimizations for entry/exit code
> - don't use kcur == NULL to mean overflow
> - fix potential double free in clone()
> - don't allocate a new kernel-side uaccess buffer for each syscall
> - fix uaccess buffer leak on exit
> - fix some sparse warnings
> 
> v2:
> - New interface that avoids multiple syscalls per real syscall and
>   is arch-generic
> - Avoid logging uaccesses done by BPF programs
> - Add documentation
> - Split up into multiple patches
> - Various code moves, renames etc as requested by Marco
> 
>  fs/exec.c                            |   3 +
>  include/linux/instrumented-uaccess.h |   6 +-
>  include/linux/sched.h                |   5 ++
>  include/linux/uaccess-buffer-info.h  |  46 ++++++++++
>  include/linux/uaccess-buffer.h       | 112 +++++++++++++++++++++++
>  include/uapi/linux/prctl.h           |   3 +
>  include/uapi/linux/uaccess-buffer.h  |  27 ++++++
>  kernel/Makefile                      |   1 +
>  kernel/bpf/helpers.c                 |   7 +-
>  kernel/fork.c                        |   4 +
>  kernel/signal.c                      |   4 +-
>  kernel/sys.c                         |   6 ++
>  kernel/uaccess-buffer.c              | 129 +++++++++++++++++++++++++++
>  13 files changed, 350 insertions(+), 3 deletions(-)
>  create mode 100644 include/linux/uaccess-buffer-info.h
>  create mode 100644 include/linux/uaccess-buffer.h
>  create mode 100644 include/uapi/linux/uaccess-buffer.h
>  create mode 100644 kernel/uaccess-buffer.c
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 537d92c41105..c9975e790f30 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -65,6 +65,7 @@
>  #include <linux/vmalloc.h>
>  #include <linux/io_uring.h>
>  #include <linux/syscall_user_dispatch.h>
> +#include <linux/uaccess-buffer.h>
>  
>  #include <linux/uaccess.h>
>  #include <asm/mmu_context.h>
> @@ -1313,6 +1314,8 @@ int begin_new_exec(struct linux_binprm * bprm)
>  	me->personality &= ~bprm->per_clear;
>  
>  	clear_syscall_work_syscall_user_dispatch(me);
> +	uaccess_buffer_set_descriptor_addr_addr(0);
> +	uaccess_buffer_free(current);
>  
>  	/*
>  	 * We have to apply CLOEXEC before we change whether the process is
> diff --git a/include/linux/instrumented-uaccess.h b/include/linux/instrumented-uaccess.h
> index ece549088e50..b967f4436d15 100644
> --- a/include/linux/instrumented-uaccess.h
> +++ b/include/linux/instrumented-uaccess.h
> @@ -2,7 +2,8 @@
>  
>  /*
>   * This header provides generic wrappers for memory access instrumentation for
> - * uaccess routines that the compiler cannot emit for: KASAN, KCSAN.
> + * uaccess routines that the compiler cannot emit for: KASAN, KCSAN,
> + * uaccess buffers.
>   */
>  #ifndef _LINUX_INSTRUMENTED_UACCESS_H
>  #define _LINUX_INSTRUMENTED_UACCESS_H
> @@ -11,6 +12,7 @@
>  #include <linux/kasan-checks.h>
>  #include <linux/kcsan-checks.h>
>  #include <linux/types.h>
> +#include <linux/uaccess-buffer.h>
>  
>  /**
>   * instrument_copy_to_user - instrument reads of copy_to_user
> @@ -27,6 +29,7 @@ instrument_copy_to_user(void __user *to, const void *from, unsigned long n)
>  {
>  	kasan_check_read(from, n);
>  	kcsan_check_read(from, n);
> +	uaccess_buffer_log_write(to, n);
>  }
>  
>  /**
> @@ -44,6 +47,7 @@ instrument_copy_from_user(const void *to, const void __user *from, unsigned long
>  {
>  	kasan_check_write(to, n);
>  	kcsan_check_write(to, n);
> +	uaccess_buffer_log_read(from, n);
>  }
>  
>  #endif /* _LINUX_INSTRUMENTED_UACCESS_H */
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 78c351e35fec..7c5278d7b57d 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -34,6 +34,7 @@
>  #include <linux/rseq.h>
>  #include <linux/seqlock.h>
>  #include <linux/kcsan.h>
> +#include <linux/uaccess-buffer-info.h>
>  #include <asm/kmap_size.h>
>  
>  /* task_struct member predeclarations (sorted alphabetically): */
> @@ -1484,6 +1485,10 @@ struct task_struct {
>  	struct callback_head		l1d_flush_kill;
>  #endif
>  
> +#ifdef CONFIG_HAVE_ARCH_UACCESS_BUFFER
> +	struct uaccess_buffer_info	uaccess_buffer;
> +#endif
> +
>  	/*
>  	 * New fields for task_struct should be added above here, so that
>  	 * they are included in the randomized portion of task_struct.
> diff --git a/include/linux/uaccess-buffer-info.h b/include/linux/uaccess-buffer-info.h
> new file mode 100644
> index 000000000000..15e2d8f7c074
> --- /dev/null
> +++ b/include/linux/uaccess-buffer-info.h
> @@ -0,0 +1,46 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_UACCESS_BUFFER_INFO_H
> +#define _LINUX_UACCESS_BUFFER_INFO_H
> +
> +#ifdef CONFIG_HAVE_ARCH_UACCESS_BUFFER
> +
> +struct uaccess_buffer_info {
> +	/*
> +	 * The pointer to pointer to struct uaccess_descriptor. This is the
> +	 * value controlled by prctl(PR_SET_UACCESS_DESCRIPTOR_ADDR_ADDR).
> +	 */
> +	struct uaccess_descriptor __user *__user *desc_ptr_ptr;
> +
> +	/*
> +	 * The pointer to struct uaccess_descriptor read at syscall entry time.
> +	 */
> +	struct uaccess_descriptor __user *desc_ptr;
> +
> +	/*
> +	 * A pointer to the kernel's temporary copy of the uaccess log for the
> +	 * current syscall. We log to a kernel buffer in order to avoid leaking
> +	 * timing information to userspace.
> +	 */
> +	struct uaccess_buffer_entry *kbegin;
> +
> +	/*
> +	 * The position of the next uaccess buffer entry for the current
> +	 * syscall, or NULL if we are not logging the current syscall.
> +	 */
> +	struct uaccess_buffer_entry *kcur;
> +
> +	/*
> +	 * A pointer to the end of the kernel's uaccess log.
> +	 */
> +	struct uaccess_buffer_entry *kend;
> +
> +	/*
> +	 * The pointer to the userspace uaccess log, as read from the
> +	 * struct uaccess_descriptor.
> +	 */
> +	struct uaccess_buffer_entry __user *ubegin;
> +};
> +
> +#endif
> +
> +#endif  /* _LINUX_UACCESS_BUFFER_INFO_H */
> diff --git a/include/linux/uaccess-buffer.h b/include/linux/uaccess-buffer.h
> new file mode 100644
> index 000000000000..f2f46db274f3
> --- /dev/null
> +++ b/include/linux/uaccess-buffer.h
> @@ -0,0 +1,112 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_UACCESS_BUFFER_H
> +#define _LINUX_UACCESS_BUFFER_H
> +
> +#include <linux/sched.h>
> +#include <uapi/linux/uaccess-buffer.h>
> +
> +#include <asm-generic/errno-base.h>
> +
> +#ifdef CONFIG_HAVE_ARCH_UACCESS_BUFFER

Kernel-doc comments for each of the below would be useful (if __ prefixed
functions are implementation details, they can be left as-is).

> +
> +static inline bool uaccess_buffer_maybe_blocked(struct task_struct *tsk)
> +{
> +	return test_task_syscall_work(tsk, UACCESS_BUFFER_ENTRY);
> +}
> +
> +void __uaccess_buffer_syscall_entry(void);
> +static inline void uaccess_buffer_syscall_entry(void)
> +{
> +	__uaccess_buffer_syscall_entry();
> +}
> +
> +void __uaccess_buffer_syscall_exit(void);
> +static inline void uaccess_buffer_syscall_exit(void)
> +{
> +	__uaccess_buffer_syscall_exit();
> +}
> +
> +bool __uaccess_buffer_pre_exit_loop(void);
> +static inline bool uaccess_buffer_pre_exit_loop(void)
> +{
> +	if (!test_syscall_work(UACCESS_BUFFER_ENTRY))
> +		return false;
> +	return __uaccess_buffer_pre_exit_loop();
> +}
> +
> +void __uaccess_buffer_post_exit_loop(void);
> +static inline void uaccess_buffer_post_exit_loop(bool pending)
> +{
> +	if (pending)
> +		__uaccess_buffer_post_exit_loop();
> +}
> +
> +static inline int uaccess_buffer_set_descriptor_addr_addr(unsigned long addr)
> +{
> +	current->uaccess_buffer.desc_ptr_ptr =
> +		(struct uaccess_descriptor __user * __user *)addr;
> +	if (addr)
> +		set_syscall_work(UACCESS_BUFFER_ENTRY);
> +	else
> +		clear_syscall_work(UACCESS_BUFFER_ENTRY);
> +	return 0;
> +}
> +
> +size_t copy_from_user_nolog(void *to, const void __user *from, size_t len);

copy_from_user() has unsigned long for @len and also return type. I'd
make them match.

> +
> +void uaccess_buffer_free(struct task_struct *tsk);
> +
> +void __uaccess_buffer_log_read(const void __user *from, unsigned long n);
> +static inline void uaccess_buffer_log_read(const void __user *from, unsigned long n)
> +{
> +	if (unlikely(test_syscall_work(UACCESS_BUFFER_EXIT)))
> +		__uaccess_buffer_log_read(from, n);
> +}
> +
> +void __uaccess_buffer_log_write(void __user *to, unsigned long n);
> +static inline void uaccess_buffer_log_write(void __user *to, unsigned long n)
> +{
> +	if (unlikely(test_syscall_work(UACCESS_BUFFER_EXIT)))
> +		__uaccess_buffer_log_write(to, n);
> +}
> +
> +#else
> +
> +static inline bool uaccess_buffer_maybe_blocked(struct task_struct *tsk)
> +{
> +	return false;
> +}
> +static inline void uaccess_buffer_syscall_entry(void)
> +{
> +}
> +static inline void uaccess_buffer_syscall_exit(void)
> +{
> +}
> +static inline bool uaccess_buffer_pre_exit_loop(void)
> +{
> +	return false;
> +}
> +static inline void uaccess_buffer_post_exit_loop(bool pending)
> +{
> +}
> +static inline int uaccess_buffer_set_descriptor_addr_addr(unsigned long addr)
> +{
> +	return -EINVAL;
> +}
> +static inline void uaccess_buffer_free(struct task_struct *tsk)
> +{
> +}
> +
> +#define copy_from_user_nolog(to, from, len) copy_from_user(to, from, len)

"#define copy_from_user_nolog copy_from_user" ?

> +
> +static inline void uaccess_buffer_log_read(const void __user *from,
> +					   unsigned long n)
> +{
> +}
> +static inline void uaccess_buffer_log_write(void __user *to, unsigned long n)
> +{
> +}
> +
> +#endif
> +
> +#endif  /* _LINUX_UACCESS_BUFFER_H */
> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> index bb73e9a0b24f..74b37469c7b3 100644
> --- a/include/uapi/linux/prctl.h
> +++ b/include/uapi/linux/prctl.h
> @@ -272,4 +272,7 @@ struct prctl_mm_map {
>  # define PR_SCHED_CORE_SCOPE_THREAD_GROUP	1
>  # define PR_SCHED_CORE_SCOPE_PROCESS_GROUP	2
>  
> +/* Configure uaccess logging feature */
> +#define PR_SET_UACCESS_DESCRIPTOR_ADDR_ADDR	63
> +
>  #endif /* _LINUX_PRCTL_H */
> diff --git a/include/uapi/linux/uaccess-buffer.h b/include/uapi/linux/uaccess-buffer.h
> new file mode 100644
> index 000000000000..bf10f7c78857
> --- /dev/null
> +++ b/include/uapi/linux/uaccess-buffer.h
> @@ -0,0 +1,27 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +#ifndef _UAPI_LINUX_UACCESS_BUFFER_H
> +#define _UAPI_LINUX_UACCESS_BUFFER_H
> +
> +#include <linux/types.h>
> +
> +/* Location of the uaccess log. */
> +struct uaccess_descriptor {
> +	/* Address of the uaccess_buffer_entry array. */
> +	__u64 addr;
> +	/* Size of the uaccess_buffer_entry array in number of elements. */
> +	__u64 size;
> +};
> +
> +/* Format of the entries in the uaccess log. */
> +struct uaccess_buffer_entry {
> +	/* Address being accessed. */
> +	__u64 addr;
> +	/* Number of bytes that were accessed. */
> +	__u64 size;
> +	/* UACCESS_BUFFER_* flags. */
> +	__u64 flags;
> +};
> +
> +#define UACCESS_BUFFER_FLAG_WRITE	1 /* access was a write */
> +
> +#endif /* _UAPI_LINUX_UACCESS_BUFFER_H */
> diff --git a/kernel/Makefile b/kernel/Makefile
> index 186c49582f45..d4d9be5146c3 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -114,6 +114,7 @@ obj-$(CONFIG_KCSAN) += kcsan/
>  obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
>  obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
>  obj-$(CONFIG_CFI_CLANG) += cfi.o
> +obj-$(CONFIG_HAVE_ARCH_UACCESS_BUFFER) += uaccess-buffer.o
>  
>  obj-$(CONFIG_PERF_EVENTS) += events/
>  
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 649f07623df6..ab6520a633ef 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -15,6 +15,7 @@
>  #include <linux/pid_namespace.h>
>  #include <linux/proc_ns.h>
>  #include <linux/security.h>
> +#include <linux/uaccess-buffer.h>
>  
>  #include "../../lib/kstrtox.h"
>  
> @@ -637,7 +638,11 @@ const struct bpf_func_proto bpf_event_output_data_proto =  {
>  BPF_CALL_3(bpf_copy_from_user, void *, dst, u32, size,
>  	   const void __user *, user_ptr)
>  {
> -	int ret = copy_from_user(dst, user_ptr, size);
> +	/*
> +	 * Avoid logging uaccesses here as the BPF program may not be following
> +	 * the uaccess log rules.
> +	 */
> +	int ret = copy_from_user_nolog(dst, user_ptr, size);
>  
>  	if (unlikely(ret)) {
>  		memset(dst, 0, size);
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 3244cc56b697..8be2ca528a65 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -96,6 +96,7 @@
>  #include <linux/scs.h>
>  #include <linux/io_uring.h>
>  #include <linux/bpf.h>
> +#include <linux/uaccess-buffer.h>
>  
>  #include <asm/pgalloc.h>
>  #include <linux/uaccess.h>
> @@ -754,6 +755,7 @@ void __put_task_struct(struct task_struct *tsk)
>  	delayacct_tsk_free(tsk);
>  	put_signal_struct(tsk->signal);
>  	sched_core_free(tsk);
> +	uaccess_buffer_free(tsk);
>  
>  	if (!profile_handoff_task(tsk))
>  		free_task(tsk);
> @@ -890,6 +892,8 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
>  	if (memcg_charge_kernel_stack(tsk))
>  		goto free_stack;
>  
> +	uaccess_buffer_free(orig);
> +
>  	stack_vm_area = task_stack_vm_area(tsk);
>  
>  	err = arch_dup_task_struct(tsk, orig);
> diff --git a/kernel/signal.c b/kernel/signal.c
> index a629b11bf3e0..69bf21518bd0 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -45,6 +45,7 @@
>  #include <linux/posix-timers.h>
>  #include <linux/cgroup.h>
>  #include <linux/audit.h>
> +#include <linux/uaccess-buffer.h>
>  
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/signal.h>
> @@ -1031,7 +1032,8 @@ static void complete_signal(int sig, struct task_struct *p, enum pid_type type)
>  	if (sig_fatal(p, sig) &&
>  	    !(signal->flags & SIGNAL_GROUP_EXIT) &&
>  	    !sigismember(&t->real_blocked, sig) &&
> -	    (sig == SIGKILL || !p->ptrace)) {
> +	    (sig == SIGKILL ||
> +	     !(p->ptrace || uaccess_buffer_maybe_blocked(p)))) {
>  		/*
>  		 * This signal will be fatal to the whole group.
>  		 */
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 8fdac0d90504..c71a9a9c0f68 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -42,6 +42,7 @@
>  #include <linux/version.h>
>  #include <linux/ctype.h>
>  #include <linux/syscall_user_dispatch.h>
> +#include <linux/uaccess-buffer.h>
>  
>  #include <linux/compat.h>
>  #include <linux/syscalls.h>
> @@ -2530,6 +2531,11 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
>  		error = sched_core_share_pid(arg2, arg3, arg4, arg5);
>  		break;
>  #endif
> +	case PR_SET_UACCESS_DESCRIPTOR_ADDR_ADDR:
> +		if (arg3 || arg4 || arg5)
> +			return -EINVAL;
> +		error = uaccess_buffer_set_descriptor_addr_addr(arg2);
> +		break;
>  	default:
>  		error = -EINVAL;
>  		break;
> diff --git a/kernel/uaccess-buffer.c b/kernel/uaccess-buffer.c
> new file mode 100644
> index 000000000000..088e43f7611c
> --- /dev/null
> +++ b/kernel/uaccess-buffer.c
> @@ -0,0 +1,129 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Support for uaccess logging via uaccess buffers.
> + *
> + * Copyright (C) 2021, Google LLC.
> + */
> +
> +#include <linux/compat.h>
> +#include <linux/mm.h>
> +#include <linux/prctl.h>
> +#include <linux/ptrace.h>
> +#include <linux/sched.h>
> +#include <linux/signal.h>
> +#include <linux/slab.h>
> +#include <linux/uaccess.h>
> +#include <linux/uaccess-buffer.h>
> +
> +static void uaccess_buffer_log(unsigned long addr, unsigned long size,
> +			      unsigned long flags)
> +{
> +	struct uaccess_buffer_info *buf = &current->uaccess_buffer;
> +	struct uaccess_buffer_entry *entry = buf->kcur;
> +
> +	if (entry == buf->kend || unlikely(uaccess_kernel()))
> +		return;
> +	entry->addr = addr;
> +	entry->size = size;
> +	entry->flags = flags;
> +
> +	++buf->kcur;
> +}
> +
> +void __uaccess_buffer_log_read(const void __user *from, unsigned long n)
> +{
> +	uaccess_buffer_log((unsigned long)from, n, 0);
> +}
> +EXPORT_SYMBOL(__uaccess_buffer_log_read);
> +
> +void __uaccess_buffer_log_write(void __user *to, unsigned long n)
> +{
> +	uaccess_buffer_log((unsigned long)to, n, UACCESS_BUFFER_FLAG_WRITE);
> +}
> +EXPORT_SYMBOL(__uaccess_buffer_log_write);
> +
> +bool __uaccess_buffer_pre_exit_loop(void)
> +{
> +	struct uaccess_buffer_info *buf = &current->uaccess_buffer;
> +	struct uaccess_descriptor __user *desc_ptr;
> +	sigset_t tmp_mask;
> +
> +	if (get_user(desc_ptr, buf->desc_ptr_ptr) || !desc_ptr)
> +		return false;
> +
> +	current->real_blocked = current->blocked;
> +	sigfillset(&tmp_mask);
> +	set_current_blocked(&tmp_mask);
> +	return true;
> +}
> +
> +void __uaccess_buffer_post_exit_loop(void)
> +{
> +	spin_lock_irq(&current->sighand->siglock);
> +	current->blocked = current->real_blocked;
> +	recalc_sigpending();
> +	spin_unlock_irq(&current->sighand->siglock);
> +}
> +
> +void uaccess_buffer_free(struct task_struct *tsk)
> +{
> +	struct uaccess_buffer_info *buf = &tsk->uaccess_buffer;
> +
> +	kfree(buf->kbegin);
> +	clear_syscall_work(UACCESS_BUFFER_EXIT);
> +	buf->kbegin = buf->kcur = buf->kend = NULL;
> +}
> +
> +void __uaccess_buffer_syscall_entry(void)
> +{
> +	struct uaccess_buffer_info *buf = &current->uaccess_buffer;
> +	struct uaccess_descriptor desc;
> +
> +	if (get_user(buf->desc_ptr, buf->desc_ptr_ptr) || !buf->desc_ptr ||
> +	    put_user(0, buf->desc_ptr_ptr) ||
> +	    copy_from_user(&desc, buf->desc_ptr, sizeof(desc)))
> +		return;
> +
> +	if (desc.size > 1024)
> +		desc.size = 1024;
> +
> +	if (buf->kend - buf->kbegin != desc.size)
> +		buf->kbegin =
> +			krealloc_array(buf->kbegin, desc.size,
> +				       sizeof(struct uaccess_buffer_entry),
> +				       GFP_KERNEL);
> +	if (!buf->kbegin)
> +		return;
> +
> +	set_syscall_work(UACCESS_BUFFER_EXIT);
> +	buf->kcur = buf->kbegin;
> +	buf->kend = buf->kbegin + desc.size;
> +	buf->ubegin =
> +		(struct uaccess_buffer_entry __user *)(unsigned long)desc.addr;
> +}
> +
> +void __uaccess_buffer_syscall_exit(void)
> +{
> +	struct uaccess_buffer_info *buf = &current->uaccess_buffer;
> +	u64 num_entries = buf->kcur - buf->kbegin;
> +	struct uaccess_descriptor desc;
> +
> +	clear_syscall_work(UACCESS_BUFFER_EXIT);
> +	desc.addr = (u64)(unsigned long)(buf->ubegin + num_entries);
> +	desc.size = buf->kend - buf->kcur;
> +	buf->kcur = NULL;
> +	if (copy_to_user(buf->ubegin, buf->kbegin,
> +			 num_entries * sizeof(struct uaccess_buffer_entry)) == 0)
> +		(void)copy_to_user(buf->desc_ptr, &desc, sizeof(desc));
> +}
> +
> +size_t copy_from_user_nolog(void *to, const void __user *from, size_t len)
> +{
> +	size_t retval;
> +
> +	clear_syscall_work(UACCESS_BUFFER_EXIT);
> +	retval = copy_from_user(to, from, len);
> +	if (current->uaccess_buffer.kcur)
> +		set_syscall_work(UACCESS_BUFFER_EXIT);
> +	return retval;
> +}
> -- 
> 2.34.1.173.g76aa8bc2d0-goog
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 3/6] fs: use copy_from_user_nolog() to copy mount() data
  2021-12-08  9:34   ` Dmitry Vyukov
@ 2021-12-09 21:42     ` Peter Collingbourne
  2021-12-10  2:59       ` Dmitry Vyukov
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Collingbourne @ 2021-12-09 21:42 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Thomas Gleixner, Andy Lutomirski, Kees Cook, Andrew Morton,
	Masahiro Yamada, Sami Tolvanen, YiFei Zhu, Mark Rutland,
	Frederic Weisbecker, Viresh Kumar, Andrey Konovalov,
	Gabriel Krisman Bertazi, Chris Hyser, Daniel Vetter,
	Chris Wilson, Arnd Bergmann, Christian Brauner,
	Eric W. Biederman, Alexey Gladkov, Ran Xiaokai,
	David Hildenbrand, Xiaofeng Cao, Cyrill Gorcunov, Thomas Cedeno,
	Marco Elver, Alexander Potapenko, linux-kernel, linux-arm-kernel,
	Evgenii Stepanov

On Wed, Dec 8, 2021 at 1:35 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Wed, 8 Dec 2021 at 05:48, Peter Collingbourne <pcc@google.com> wrote:
> >
> > With uaccess logging the contract is that the kernel must not report
> > accessing more data than necessary, as this can lead to false positive
> > reports in downstream consumers. This generally works out of the box
> > when instrumenting copy_{from,to}_user(), but with the data argument
> > to mount() we use copy_from_user() to copy PAGE_SIZE bytes (or as
> > much as we can, if the PAGE_SIZE sized access failed) and figure out
> > later how much we actually need.
> >
> > To prevent this from leading to a false positive report, use
> > copy_from_user_nolog(), which will prevent the access from being logged.
> > Recall that it is valid for the kernel to report accessing less
> > data than it actually accessed, as uaccess logging is a best-effort
> > mechanism for reporting uaccesses.
> >
> > Link: https://linux-review.googlesource.com/id/I5629b92a725c817acd9a861288338dd605cafee6
> > Signed-off-by: Peter Collingbourne <pcc@google.com>
> > ---
> >  fs/namespace.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/namespace.c b/fs/namespace.c
> > index 659a8f39c61a..8f5f2aaca64e 100644
> > --- a/fs/namespace.c
> > +++ b/fs/namespace.c
> > @@ -31,6 +31,7 @@
> >  #include <uapi/linux/mount.h>
> >  #include <linux/fs_context.h>
> >  #include <linux/shmem_fs.h>
> > +#include <linux/uaccess-buffer.h>
> >
> >  #include "pnode.h"
> >  #include "internal.h"
> > @@ -3197,7 +3198,12 @@ static void *copy_mount_options(const void __user * data)
> >         if (!copy)
> >                 return ERR_PTR(-ENOMEM);
> >
> > -       left = copy_from_user(copy, data, PAGE_SIZE);
> > +       /*
> > +        * Use copy_from_user_nolog to avoid reporting overly large accesses in
> > +        * the uaccess buffer, as this can lead to false positive reports in
> > +        * downstream consumers.
> > +        */
> > +       left = copy_from_user_nolog(copy, data, PAGE_SIZE);
>
> A late idea...
> Maybe it's better to log them with a new UACCESS_BUFFER_FLAG_OVERREAD
> flag. Better for user-space, at least can detect UAFs by checking the
> first byte. And a more logical kernel annotation (maybe will be used
> in some other tools? or if we ever check user tags in the kernel).
>
> Probably not too important today since we use this only in 2 places,
> but longer term may be better.

I'm not sure about this. The overreads are basically an implementation
detail of the kernel, so I'm not sure it makes sense to expose them. A
scheme where we expose all overreads wouldn't necessarily help with
UAF, because what if for example the kernel reads *behind* the
user-provided pointer? I guess it could lead to false positives.

> Btw, what's the story with BPF accesses? Can we log them theoretically?
>
> Previously the comment said:
>
> +       /*
> +        * Avoid copy_from_user() here as it may leak information about the BPF
> +        * program to userspace via the uaccess buffer.
> +        */
>
> but now it says something very generic:
>
> /*
> * Avoid logging uaccesses here as the BPF program may not be following
> * the uaccess log rules.
> */

Yes we should be able to log them theoretically, but we don't need to
do that now. See my reply here:

https://lore.kernel.org/all/CAMn1gO5B5Q3hfN6kugv2wmdFGNhJb75iRX1zmCkw3wnueN1dtg@mail.gmail.com/#:~:text=This%20comment%20was,the%20comment%20accordingly.

Peter

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 5/6] arm64: add support for uaccess logging
  2021-12-08  9:48   ` Dmitry Vyukov
@ 2021-12-09 21:44     ` Peter Collingbourne
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Collingbourne @ 2021-12-09 21:44 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Thomas Gleixner, Andy Lutomirski, Kees Cook, Andrew Morton,
	Masahiro Yamada, Sami Tolvanen, YiFei Zhu, Mark Rutland,
	Frederic Weisbecker, Viresh Kumar, Andrey Konovalov,
	Gabriel Krisman Bertazi, Chris Hyser, Daniel Vetter,
	Chris Wilson, Arnd Bergmann, Christian Brauner,
	Eric W. Biederman, Alexey Gladkov, Ran Xiaokai,
	David Hildenbrand, Xiaofeng Cao, Cyrill Gorcunov, Thomas Cedeno,
	Marco Elver, Alexander Potapenko, linux-kernel, linux-arm-kernel,
	Evgenii Stepanov

On Wed, Dec 8, 2021 at 1:49 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Wed, 8 Dec 2021 at 05:48, Peter Collingbourne <pcc@google.com> wrote:
> >
> > arm64 does not use CONFIG_GENERIC_ENTRY, so add the support for
> > uaccess logging directly to the architecture.
> >
> > Link: https://linux-review.googlesource.com/id/I88de539fb9c4a9d27fa8cccbe201a6e4382faf89
> > Signed-off-by: Peter Collingbourne <pcc@google.com>
> > ---
> >  arch/arm64/Kconfig                   | 1 +
> >  arch/arm64/include/asm/thread_info.h | 7 ++++++-
> >  arch/arm64/kernel/ptrace.c           | 7 +++++++
> >  arch/arm64/kernel/signal.c           | 5 +++++
> >  arch/arm64/kernel/syscall.c          | 1 +
> >  5 files changed, 20 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index c4207cf9bb17..6023946abe4a 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -161,6 +161,7 @@ config ARM64
> >         select HAVE_ARCH_THREAD_STRUCT_WHITELIST
> >         select HAVE_ARCH_TRACEHOOK
> >         select HAVE_ARCH_TRANSPARENT_HUGEPAGE
> > +       select HAVE_ARCH_UACCESS_BUFFER
> >         select HAVE_ARCH_VMAP_STACK
> >         select HAVE_ARM_SMCCC
> >         select HAVE_ASM_MODVERSIONS
> > diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
> > index e1317b7c4525..0461b36251ea 100644
> > --- a/arch/arm64/include/asm/thread_info.h
> > +++ b/arch/arm64/include/asm/thread_info.h
> > @@ -82,6 +82,8 @@ int arch_dup_task_struct(struct task_struct *dst,
> >  #define TIF_SVE_VL_INHERIT     24      /* Inherit SVE vl_onexec across exec */
> >  #define TIF_SSBD               25      /* Wants SSB mitigation */
> >  #define TIF_TAGGED_ADDR                26      /* Allow tagged user addresses */
> > +#define TIF_UACCESS_BUFFER_ENTRY 27     /* thread has non-zero uaccess_desc_addr_addr */
> > +#define TIF_UACCESS_BUFFER_EXIT  28     /* thread has non-zero kcur */
> >
> >  #define _TIF_SIGPENDING                (1 << TIF_SIGPENDING)
> >  #define _TIF_NEED_RESCHED      (1 << TIF_NEED_RESCHED)
> > @@ -98,6 +100,8 @@ int arch_dup_task_struct(struct task_struct *dst,
> >  #define _TIF_SVE               (1 << TIF_SVE)
> >  #define _TIF_MTE_ASYNC_FAULT   (1 << TIF_MTE_ASYNC_FAULT)
> >  #define _TIF_NOTIFY_SIGNAL     (1 << TIF_NOTIFY_SIGNAL)
> > +#define _TIF_UACCESS_BUFFER_ENTRY      (1 << TIF_UACCESS_BUFFER_ENTRY)
> > +#define _TIF_UACCESS_BUFFER_EXIT       (1 << TIF_UACCESS_BUFFER_EXIT)
> >
> >  #define _TIF_WORK_MASK         (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
> >                                  _TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE | \
> > @@ -106,7 +110,8 @@ int arch_dup_task_struct(struct task_struct *dst,
> >
> >  #define _TIF_SYSCALL_WORK      (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
> >                                  _TIF_SYSCALL_TRACEPOINT | _TIF_SECCOMP | \
> > -                                _TIF_SYSCALL_EMU)
> > +                                _TIF_SYSCALL_EMU | _TIF_UACCESS_BUFFER_ENTRY | \
> > +                                _TIF_UACCESS_BUFFER_EXIT)
> >
> >  #ifdef CONFIG_SHADOW_CALL_STACK
> >  #define INIT_SCS                                                       \
> > diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
> > index 88a9034fb9b5..283372eccaeb 100644
> > --- a/arch/arm64/kernel/ptrace.c
> > +++ b/arch/arm64/kernel/ptrace.c
> > @@ -29,6 +29,7 @@
> >  #include <linux/regset.h>
> >  #include <linux/tracehook.h>
> >  #include <linux/elf.h>
> > +#include <linux/uaccess-buffer.h>
> >
> >  #include <asm/compat.h>
> >  #include <asm/cpufeature.h>
> > @@ -1854,6 +1855,9 @@ int syscall_trace_enter(struct pt_regs *regs)
> >         if (test_thread_flag(TIF_SYSCALL_TRACEPOINT))
> >                 trace_sys_enter(regs, regs->syscallno);
> >
> > +       if (flags & _TIF_UACCESS_BUFFER_ENTRY)
> > +               uaccess_buffer_syscall_entry();
> > +
> >         audit_syscall_entry(regs->syscallno, regs->orig_x0, regs->regs[1],
> >                             regs->regs[2], regs->regs[3]);
> >
> > @@ -1866,6 +1870,9 @@ void syscall_trace_exit(struct pt_regs *regs)
> >
> >         audit_syscall_exit(regs);
> >
> > +       if (flags & _TIF_UACCESS_BUFFER_EXIT)
> > +               uaccess_buffer_syscall_exit();
> > +
> >         if (flags & _TIF_SYSCALL_TRACEPOINT)
> >                 trace_sys_exit(regs, syscall_get_return_value(current, regs));
> >
> > diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
> > index 8f6372b44b65..5bbd98e5c257 100644
> > --- a/arch/arm64/kernel/signal.c
> > +++ b/arch/arm64/kernel/signal.c
> > @@ -20,6 +20,7 @@
> >  #include <linux/tracehook.h>
> >  #include <linux/ratelimit.h>
> >  #include <linux/syscalls.h>
> > +#include <linux/uaccess-buffer.h>
> >
> >  #include <asm/daifflags.h>
> >  #include <asm/debug-monitors.h>
> > @@ -919,6 +920,8 @@ static void do_signal(struct pt_regs *regs)
> >
> >  void do_notify_resume(struct pt_regs *regs, unsigned long thread_flags)
> >  {
> > +       bool uaccess_buffer_pending = uaccess_buffer_pre_exit_loop();
> > +
> >         do {
> >                 if (thread_flags & _TIF_NEED_RESCHED) {
> >                         /* Unmask Debug and SError for the next task */
> > @@ -950,6 +953,8 @@ void do_notify_resume(struct pt_regs *regs, unsigned long thread_flags)
> >                 local_daif_mask();
> >                 thread_flags = READ_ONCE(current_thread_info()->flags);
> >         } while (thread_flags & _TIF_WORK_MASK);
> > +
> > +       uaccess_buffer_post_exit_loop(uaccess_buffer_pending);
> >  }
> >
> >  unsigned long __ro_after_init signal_minsigstksz;
> > diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
> > index 50a0f1a38e84..d59022b594f2 100644
> > --- a/arch/arm64/kernel/syscall.c
> > +++ b/arch/arm64/kernel/syscall.c
> > @@ -7,6 +7,7 @@
> >  #include <linux/ptrace.h>
> >  #include <linux/randomize_kstack.h>
> >  #include <linux/syscalls.h>
> > +#include <linux/uaccess-buffer.h>
>
> This looks strange... Does some other header miss this include?

This was left in unintentionally. I'll remove it in v4.

Peter

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 0/6] kernel: introduce uaccess logging
  2021-12-08 15:33 ` [PATCH v3 0/6] kernel: introduce " Marco Elver
@ 2021-12-09 22:12   ` Peter Collingbourne
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Collingbourne @ 2021-12-09 22:12 UTC (permalink / raw)
  To: Marco Elver
  Cc: Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Thomas Gleixner, Andy Lutomirski, Kees Cook, Andrew Morton,
	Masahiro Yamada, Sami Tolvanen, YiFei Zhu, Mark Rutland,
	Frederic Weisbecker, Viresh Kumar, Andrey Konovalov,
	Gabriel Krisman Bertazi, Chris Hyser, Daniel Vetter,
	Chris Wilson, Arnd Bergmann, Dmitry Vyukov, Christian Brauner,
	Eric W. Biederman, Alexey Gladkov, Ran Xiaokai,
	David Hildenbrand, Xiaofeng Cao, Cyrill Gorcunov, Thomas Cedeno,
	Alexander Potapenko, linux-kernel, linux-arm-kernel,
	Evgenii Stepanov

On Wed, Dec 8, 2021 at 7:33 AM Marco Elver <elver@google.com> wrote:
>
> On Wed, 8 Dec 2021 at 05:48, Peter Collingbourne <pcc@google.com> wrote:
> [...]
> > Peter Collingbourne (6):
> >   include: split out uaccess instrumentation into a separate header
> >   uaccess-buffer: add core code
> >   fs: use copy_from_user_nolog() to copy mount() data
> >   uaccess-buffer: add CONFIG_GENERIC_ENTRY support
> >   arm64: add support for uaccess logging
> >   Documentation: document uaccess logging
>
> I think it needs to be possible to disable the feature via a Kconfig
> option. Not all systems want or could even tolerate the additional
> overheads -- even though you say they are minimal elsewhere. For
> example, some embedded systems most likely have absolutely no use for
> this feature, and the increase in .text might be unacceptable. Certain
> features that we usually take for granted are no different (see
> init/Kconfig: FUTEX, EPOLL, .. etc). If you'd like it enabled by
> default, given the overheads are small enough, it can do "default y"
> and be configurable only "if EXPERT".

Okay, done.

> Is it possible to add a kselftest-style test to
> tools/testing/selftests? In addition to the basic tests, can certain
> non-trivial properties, like masking of signals, also be tested? I
> think that'd be extremely valuable, because I'm sure we'd have to
> backport this to several older kernels.

Yes, I've added a new patch with a kselftest. (Good thing I did,
because it (together with DEBUG_PREEMPT) uncovered a bug in the
pre/post-exit-loop code. Fixed in v4.)

Peter

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 2/6] uaccess-buffer: add core code
  2021-12-08 10:21   ` Dmitry Vyukov
@ 2021-12-09 22:13     ` Peter Collingbourne
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Collingbourne @ 2021-12-09 22:13 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Thomas Gleixner, Andy Lutomirski, Kees Cook, Andrew Morton,
	Masahiro Yamada, Sami Tolvanen, YiFei Zhu, Mark Rutland,
	Frederic Weisbecker, Viresh Kumar, Andrey Konovalov,
	Gabriel Krisman Bertazi, Chris Hyser, Daniel Vetter,
	Chris Wilson, Arnd Bergmann, Christian Brauner,
	Eric W. Biederman, Alexey Gladkov, Ran Xiaokai,
	David Hildenbrand, Xiaofeng Cao, Cyrill Gorcunov, Thomas Cedeno,
	Marco Elver, Alexander Potapenko, linux-kernel, linux-arm-kernel,
	Evgenii Stepanov

On Wed, Dec 8, 2021 at 2:21 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Wed, 8 Dec 2021 at 05:48, Peter Collingbourne <pcc@google.com> wrote:
> >
> > Add the core code to support uaccess logging. Subsequent patches will
> > hook this up to the arch-specific kernel entry and exit code for
> > certain architectures.
> >
> > Link: https://linux-review.googlesource.com/id/I6581765646501a5631b281d670903945ebadc57d
> > Signed-off-by: Peter Collingbourne <pcc@google.com>
> > ---
> > v3:
> > - performance optimizations for entry/exit code
> > - don't use kcur == NULL to mean overflow
> > - fix potential double free in clone()
> > - don't allocate a new kernel-side uaccess buffer for each syscall
> > - fix uaccess buffer leak on exit
> > - fix some sparse warnings
> >
> > v2:
> > - New interface that avoids multiple syscalls per real syscall and
> >   is arch-generic
> > - Avoid logging uaccesses done by BPF programs
> > - Add documentation
> > - Split up into multiple patches
> > - Various code moves, renames etc as requested by Marco
> >
> >  fs/exec.c                            |   3 +
> >  include/linux/instrumented-uaccess.h |   6 +-
> >  include/linux/sched.h                |   5 ++
> >  include/linux/uaccess-buffer-info.h  |  46 ++++++++++
> >  include/linux/uaccess-buffer.h       | 112 +++++++++++++++++++++++
> >  include/uapi/linux/prctl.h           |   3 +
> >  include/uapi/linux/uaccess-buffer.h  |  27 ++++++
> >  kernel/Makefile                      |   1 +
> >  kernel/bpf/helpers.c                 |   7 +-
> >  kernel/fork.c                        |   4 +
> >  kernel/signal.c                      |   4 +-
> >  kernel/sys.c                         |   6 ++
> >  kernel/uaccess-buffer.c              | 129 +++++++++++++++++++++++++++
> >  13 files changed, 350 insertions(+), 3 deletions(-)
> >  create mode 100644 include/linux/uaccess-buffer-info.h
> >  create mode 100644 include/linux/uaccess-buffer.h
> >  create mode 100644 include/uapi/linux/uaccess-buffer.h
> >  create mode 100644 kernel/uaccess-buffer.c
> >
> > diff --git a/fs/exec.c b/fs/exec.c
> > index 537d92c41105..c9975e790f30 100644
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -65,6 +65,7 @@
> >  #include <linux/vmalloc.h>
> >  #include <linux/io_uring.h>
> >  #include <linux/syscall_user_dispatch.h>
> > +#include <linux/uaccess-buffer.h>
> >
> >  #include <linux/uaccess.h>
> >  #include <asm/mmu_context.h>
> > @@ -1313,6 +1314,8 @@ int begin_new_exec(struct linux_binprm * bprm)
> >         me->personality &= ~bprm->per_clear;
> >
> >         clear_syscall_work_syscall_user_dispatch(me);
> > +       uaccess_buffer_set_descriptor_addr_addr(0);
> > +       uaccess_buffer_free(current);
> >
> >         /*
> >          * We have to apply CLOEXEC before we change whether the process is
> > diff --git a/include/linux/instrumented-uaccess.h b/include/linux/instrumented-uaccess.h
> > index ece549088e50..b967f4436d15 100644
> > --- a/include/linux/instrumented-uaccess.h
> > +++ b/include/linux/instrumented-uaccess.h
> > @@ -2,7 +2,8 @@
> >
> >  /*
> >   * This header provides generic wrappers for memory access instrumentation for
> > - * uaccess routines that the compiler cannot emit for: KASAN, KCSAN.
> > + * uaccess routines that the compiler cannot emit for: KASAN, KCSAN,
> > + * uaccess buffers.
> >   */
> >  #ifndef _LINUX_INSTRUMENTED_UACCESS_H
> >  #define _LINUX_INSTRUMENTED_UACCESS_H
> > @@ -11,6 +12,7 @@
> >  #include <linux/kasan-checks.h>
> >  #include <linux/kcsan-checks.h>
> >  #include <linux/types.h>
> > +#include <linux/uaccess-buffer.h>
> >
> >  /**
> >   * instrument_copy_to_user - instrument reads of copy_to_user
> > @@ -27,6 +29,7 @@ instrument_copy_to_user(void __user *to, const void *from, unsigned long n)
> >  {
> >         kasan_check_read(from, n);
> >         kcsan_check_read(from, n);
> > +       uaccess_buffer_log_write(to, n);
> >  }
> >
> >  /**
> > @@ -44,6 +47,7 @@ instrument_copy_from_user(const void *to, const void __user *from, unsigned long
> >  {
> >         kasan_check_write(to, n);
> >         kcsan_check_write(to, n);
> > +       uaccess_buffer_log_read(from, n);
> >  }
> >
> >  #endif /* _LINUX_INSTRUMENTED_UACCESS_H */
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 78c351e35fec..7c5278d7b57d 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -34,6 +34,7 @@
> >  #include <linux/rseq.h>
> >  #include <linux/seqlock.h>
> >  #include <linux/kcsan.h>
> > +#include <linux/uaccess-buffer-info.h>
> >  #include <asm/kmap_size.h>
> >
> >  /* task_struct member predeclarations (sorted alphabetically): */
> > @@ -1484,6 +1485,10 @@ struct task_struct {
> >         struct callback_head            l1d_flush_kill;
> >  #endif
> >
> > +#ifdef CONFIG_HAVE_ARCH_UACCESS_BUFFER
> > +       struct uaccess_buffer_info      uaccess_buffer;
> > +#endif
> > +
> >         /*
> >          * New fields for task_struct should be added above here, so that
> >          * they are included in the randomized portion of task_struct.
> > diff --git a/include/linux/uaccess-buffer-info.h b/include/linux/uaccess-buffer-info.h
> > new file mode 100644
> > index 000000000000..15e2d8f7c074
> > --- /dev/null
> > +++ b/include/linux/uaccess-buffer-info.h
> > @@ -0,0 +1,46 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef _LINUX_UACCESS_BUFFER_INFO_H
> > +#define _LINUX_UACCESS_BUFFER_INFO_H
> > +
> > +#ifdef CONFIG_HAVE_ARCH_UACCESS_BUFFER
> > +
> > +struct uaccess_buffer_info {
> > +       /*
> > +        * The pointer to pointer to struct uaccess_descriptor. This is the
> > +        * value controlled by prctl(PR_SET_UACCESS_DESCRIPTOR_ADDR_ADDR).
> > +        */
> > +       struct uaccess_descriptor __user *__user *desc_ptr_ptr;
> > +
> > +       /*
> > +        * The pointer to struct uaccess_descriptor read at syscall entry time.
> > +        */
> > +       struct uaccess_descriptor __user *desc_ptr;
> > +
> > +       /*
> > +        * A pointer to the kernel's temporary copy of the uaccess log for the
> > +        * current syscall. We log to a kernel buffer in order to avoid leaking
> > +        * timing information to userspace.
> > +        */
> > +       struct uaccess_buffer_entry *kbegin;
> > +
> > +       /*
> > +        * The position of the next uaccess buffer entry for the current
> > +        * syscall, or NULL if we are not logging the current syscall.
> > +        */
> > +       struct uaccess_buffer_entry *kcur;
> > +
> > +       /*
> > +        * A pointer to the end of the kernel's uaccess log.
> > +        */
> > +       struct uaccess_buffer_entry *kend;
> > +
> > +       /*
> > +        * The pointer to the userspace uaccess log, as read from the
> > +        * struct uaccess_descriptor.
> > +        */
> > +       struct uaccess_buffer_entry __user *ubegin;
> > +};
> > +
> > +#endif
> > +
> > +#endif  /* _LINUX_UACCESS_BUFFER_INFO_H */
> > diff --git a/include/linux/uaccess-buffer.h b/include/linux/uaccess-buffer.h
> > new file mode 100644
> > index 000000000000..f2f46db274f3
> > --- /dev/null
> > +++ b/include/linux/uaccess-buffer.h
> > @@ -0,0 +1,112 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef _LINUX_UACCESS_BUFFER_H
> > +#define _LINUX_UACCESS_BUFFER_H
> > +
> > +#include <linux/sched.h>
> > +#include <uapi/linux/uaccess-buffer.h>
> > +
> > +#include <asm-generic/errno-base.h>
> > +
> > +#ifdef CONFIG_HAVE_ARCH_UACCESS_BUFFER
> > +
> > +static inline bool uaccess_buffer_maybe_blocked(struct task_struct *tsk)
> > +{
> > +       return test_task_syscall_work(tsk, UACCESS_BUFFER_ENTRY);
> > +}
> > +
> > +void __uaccess_buffer_syscall_entry(void);
> > +static inline void uaccess_buffer_syscall_entry(void)
> > +{
> > +       __uaccess_buffer_syscall_entry();
> > +}
> > +
> > +void __uaccess_buffer_syscall_exit(void);
> > +static inline void uaccess_buffer_syscall_exit(void)
> > +{
> > +       __uaccess_buffer_syscall_exit();
> > +}
> > +
> > +bool __uaccess_buffer_pre_exit_loop(void);
> > +static inline bool uaccess_buffer_pre_exit_loop(void)
> > +{
> > +       if (!test_syscall_work(UACCESS_BUFFER_ENTRY))
> > +               return false;
> > +       return __uaccess_buffer_pre_exit_loop();
> > +}
> > +
> > +void __uaccess_buffer_post_exit_loop(void);
> > +static inline void uaccess_buffer_post_exit_loop(bool pending)
> > +{
> > +       if (pending)
> > +               __uaccess_buffer_post_exit_loop();
> > +}
> > +
> > +static inline int uaccess_buffer_set_descriptor_addr_addr(unsigned long addr)
>
> I would move the implementation to .c file. It's a rare path.

Done.

> > +{
> > +       current->uaccess_buffer.desc_ptr_ptr =
> > +               (struct uaccess_descriptor __user * __user *)addr;
> > +       if (addr)
> > +               set_syscall_work(UACCESS_BUFFER_ENTRY);
> > +       else
> > +               clear_syscall_work(UACCESS_BUFFER_ENTRY);
> > +       return 0;
> > +}
> > +
> > +size_t copy_from_user_nolog(void *to, const void __user *from, size_t len);
> > +
> > +void uaccess_buffer_free(struct task_struct *tsk);
> > +
> > +void __uaccess_buffer_log_read(const void __user *from, unsigned long n);
> > +static inline void uaccess_buffer_log_read(const void __user *from, unsigned long n)
> > +{
> > +       if (unlikely(test_syscall_work(UACCESS_BUFFER_EXIT)))
>
> UACCESS_BUFFER_EXIT is only defined in future patches, so this won't compile.

Right, but there's no way for CONFIG_UACCESS_BUFFER to be defined at this point,
so this won't compile anyway. We define the constants for this
(TIF_UACCESS_BUFFER_EXIT for arm64, SYSCALL_WORK_UACCESS_BUFFER_EXIT for
GENERIC_ENTRY) at the same time as we enable the respective architecture
support.

>
> > +               __uaccess_buffer_log_read(from, n);
> > +}
> > +
> > +void __uaccess_buffer_log_write(void __user *to, unsigned long n);
> > +static inline void uaccess_buffer_log_write(void __user *to, unsigned long n)
> > +{
> > +       if (unlikely(test_syscall_work(UACCESS_BUFFER_EXIT)))
> > +               __uaccess_buffer_log_write(to, n);
> > +}
> > +
> > +#else
> > +
> > +static inline bool uaccess_buffer_maybe_blocked(struct task_struct *tsk)
> > +{
> > +       return false;
> > +}
> > +static inline void uaccess_buffer_syscall_entry(void)
> > +{
> > +}
> > +static inline void uaccess_buffer_syscall_exit(void)
> > +{
> > +}
> > +static inline bool uaccess_buffer_pre_exit_loop(void)
> > +{
> > +       return false;
> > +}
> > +static inline void uaccess_buffer_post_exit_loop(bool pending)
> > +{
> > +}
> > +static inline int uaccess_buffer_set_descriptor_addr_addr(unsigned long addr)
> > +{
> > +       return -EINVAL;
> > +}
> > +static inline void uaccess_buffer_free(struct task_struct *tsk)
> > +{
> > +}
> > +
> > +#define copy_from_user_nolog(to, from, len) copy_from_user(to, from, len)
> > +
> > +static inline void uaccess_buffer_log_read(const void __user *from,
> > +                                          unsigned long n)
> > +{
> > +}
> > +static inline void uaccess_buffer_log_write(void __user *to, unsigned long n)
> > +{
> > +}
> > +
> > +#endif
> > +
> > +#endif  /* _LINUX_UACCESS_BUFFER_H */
> > diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> > index bb73e9a0b24f..74b37469c7b3 100644
> > --- a/include/uapi/linux/prctl.h
> > +++ b/include/uapi/linux/prctl.h
> > @@ -272,4 +272,7 @@ struct prctl_mm_map {
> >  # define PR_SCHED_CORE_SCOPE_THREAD_GROUP      1
> >  # define PR_SCHED_CORE_SCOPE_PROCESS_GROUP     2
> >
> > +/* Configure uaccess logging feature */
> > +#define PR_SET_UACCESS_DESCRIPTOR_ADDR_ADDR    63
> > +
> >  #endif /* _LINUX_PRCTL_H */
> > diff --git a/include/uapi/linux/uaccess-buffer.h b/include/uapi/linux/uaccess-buffer.h
> > new file mode 100644
> > index 000000000000..bf10f7c78857
> > --- /dev/null
> > +++ b/include/uapi/linux/uaccess-buffer.h
> > @@ -0,0 +1,27 @@
> > +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> > +#ifndef _UAPI_LINUX_UACCESS_BUFFER_H
> > +#define _UAPI_LINUX_UACCESS_BUFFER_H
> > +
> > +#include <linux/types.h>
> > +
> > +/* Location of the uaccess log. */
> > +struct uaccess_descriptor {
> > +       /* Address of the uaccess_buffer_entry array. */
> > +       __u64 addr;
> > +       /* Size of the uaccess_buffer_entry array in number of elements. */
> > +       __u64 size;
> > +};
> > +
> > +/* Format of the entries in the uaccess log. */
> > +struct uaccess_buffer_entry {
> > +       /* Address being accessed. */
> > +       __u64 addr;
> > +       /* Number of bytes that were accessed. */
> > +       __u64 size;
> > +       /* UACCESS_BUFFER_* flags. */
> > +       __u64 flags;
> > +};
> > +
> > +#define UACCESS_BUFFER_FLAG_WRITE      1 /* access was a write */
> > +
> > +#endif /* _UAPI_LINUX_UACCESS_BUFFER_H */
> > diff --git a/kernel/Makefile b/kernel/Makefile
> > index 186c49582f45..d4d9be5146c3 100644
> > --- a/kernel/Makefile
> > +++ b/kernel/Makefile
> > @@ -114,6 +114,7 @@ obj-$(CONFIG_KCSAN) += kcsan/
> >  obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
> >  obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
> >  obj-$(CONFIG_CFI_CLANG) += cfi.o
> > +obj-$(CONFIG_HAVE_ARCH_UACCESS_BUFFER) += uaccess-buffer.o
> >
> >  obj-$(CONFIG_PERF_EVENTS) += events/
> >
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index 649f07623df6..ab6520a633ef 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -15,6 +15,7 @@
> >  #include <linux/pid_namespace.h>
> >  #include <linux/proc_ns.h>
> >  #include <linux/security.h>
> > +#include <linux/uaccess-buffer.h>
> >
> >  #include "../../lib/kstrtox.h"
> >
> > @@ -637,7 +638,11 @@ const struct bpf_func_proto bpf_event_output_data_proto =  {
> >  BPF_CALL_3(bpf_copy_from_user, void *, dst, u32, size,
> >            const void __user *, user_ptr)
> >  {
> > -       int ret = copy_from_user(dst, user_ptr, size);
> > +       /*
> > +        * Avoid logging uaccesses here as the BPF program may not be following
> > +        * the uaccess log rules.
> > +        */
> > +       int ret = copy_from_user_nolog(dst, user_ptr, size);
> >
> >         if (unlikely(ret)) {
> >                 memset(dst, 0, size);
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index 3244cc56b697..8be2ca528a65 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -96,6 +96,7 @@
> >  #include <linux/scs.h>
> >  #include <linux/io_uring.h>
> >  #include <linux/bpf.h>
> > +#include <linux/uaccess-buffer.h>
> >
> >  #include <asm/pgalloc.h>
> >  #include <linux/uaccess.h>
> > @@ -754,6 +755,7 @@ void __put_task_struct(struct task_struct *tsk)
> >         delayacct_tsk_free(tsk);
> >         put_signal_struct(tsk->signal);
> >         sched_core_free(tsk);
> > +       uaccess_buffer_free(tsk);
> >
> >         if (!profile_handoff_task(tsk))
> >                 free_task(tsk);
> > @@ -890,6 +892,8 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
> >         if (memcg_charge_kernel_stack(tsk))
> >                 goto free_stack;
> >
> > +       uaccess_buffer_free(orig);
> > +
> >         stack_vm_area = task_stack_vm_area(tsk);
> >
> >         err = arch_dup_task_struct(tsk, orig);
> > diff --git a/kernel/signal.c b/kernel/signal.c
> > index a629b11bf3e0..69bf21518bd0 100644
> > --- a/kernel/signal.c
> > +++ b/kernel/signal.c
> > @@ -45,6 +45,7 @@
> >  #include <linux/posix-timers.h>
> >  #include <linux/cgroup.h>
> >  #include <linux/audit.h>
> > +#include <linux/uaccess-buffer.h>
> >
> >  #define CREATE_TRACE_POINTS
> >  #include <trace/events/signal.h>
> > @@ -1031,7 +1032,8 @@ static void complete_signal(int sig, struct task_struct *p, enum pid_type type)
> >         if (sig_fatal(p, sig) &&
> >             !(signal->flags & SIGNAL_GROUP_EXIT) &&
> >             !sigismember(&t->real_blocked, sig) &&
> > -           (sig == SIGKILL || !p->ptrace)) {
> > +           (sig == SIGKILL ||
> > +            !(p->ptrace || uaccess_buffer_maybe_blocked(p)))) {
> >                 /*
> >                  * This signal will be fatal to the whole group.
> >                  */
> > diff --git a/kernel/sys.c b/kernel/sys.c
> > index 8fdac0d90504..c71a9a9c0f68 100644
> > --- a/kernel/sys.c
> > +++ b/kernel/sys.c
> > @@ -42,6 +42,7 @@
> >  #include <linux/version.h>
> >  #include <linux/ctype.h>
> >  #include <linux/syscall_user_dispatch.h>
> > +#include <linux/uaccess-buffer.h>
> >
> >  #include <linux/compat.h>
> >  #include <linux/syscalls.h>
> > @@ -2530,6 +2531,11 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
> >                 error = sched_core_share_pid(arg2, arg3, arg4, arg5);
> >                 break;
> >  #endif
> > +       case PR_SET_UACCESS_DESCRIPTOR_ADDR_ADDR:
> > +               if (arg3 || arg4 || arg5)
> > +                       return -EINVAL;
> > +               error = uaccess_buffer_set_descriptor_addr_addr(arg2);
>
> Does this miss uaccess_buffer_free() for the 0 case?
> It seems that when we set it to 0 we always want to free as well (e.g.
> in exec). I wonder if freeing should be done by
> uaccess_buffer_set_descriptor_addr_addr() itself.
> Both uaccess_buffer_set_descriptor_addr_addr() and
> uaccess_buffer_free() reset task work, which is fine but is somewhat
> suboptimal logically.
> Then task exit could do uaccess_buffer_set_descriptor_addr_addr(0).

We don't need to free in that case because we can just log to the original
location. I originally had uaccess_buffer_set_descriptor_addr_addr() do the
freeing, but you asked me to change it to avoid the free [1]. I guess I don't
have a strong opinion but slightly prefer having the two functions do orthogonal
things.

[1] https://lore.kernel.org/all/CACT4Y+aoiT+z+3CMBNmO0SwXBXpfDCsHY7pPLf54S8V=c-a8ag@mail.gmail.com/#:~:text=Is%20this%20necessary

>
> > +               break;
> >         default:
> >                 error = -EINVAL;
> >                 break;
> > diff --git a/kernel/uaccess-buffer.c b/kernel/uaccess-buffer.c
> > new file mode 100644
> > index 000000000000..088e43f7611c
> > --- /dev/null
> > +++ b/kernel/uaccess-buffer.c
> > @@ -0,0 +1,129 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Support for uaccess logging via uaccess buffers.
> > + *
> > + * Copyright (C) 2021, Google LLC.
> > + */
> > +
> > +#include <linux/compat.h>
> > +#include <linux/mm.h>
> > +#include <linux/prctl.h>
> > +#include <linux/ptrace.h>
> > +#include <linux/sched.h>
> > +#include <linux/signal.h>
> > +#include <linux/slab.h>
> > +#include <linux/uaccess.h>
> > +#include <linux/uaccess-buffer.h>
> > +
> > +static void uaccess_buffer_log(unsigned long addr, unsigned long size,
> > +                             unsigned long flags)
> > +{
> > +       struct uaccess_buffer_info *buf = &current->uaccess_buffer;
> > +       struct uaccess_buffer_entry *entry = buf->kcur;
> > +
> > +       if (entry == buf->kend || unlikely(uaccess_kernel()))
> > +               return;
> > +       entry->addr = addr;
> > +       entry->size = size;
> > +       entry->flags = flags;
> > +
> > +       ++buf->kcur;
> > +}
> > +
> > +void __uaccess_buffer_log_read(const void __user *from, unsigned long n)
> > +{
> > +       uaccess_buffer_log((unsigned long)from, n, 0);
> > +}
> > +EXPORT_SYMBOL(__uaccess_buffer_log_read);
> > +
> > +void __uaccess_buffer_log_write(void __user *to, unsigned long n)
> > +{
> > +       uaccess_buffer_log((unsigned long)to, n, UACCESS_BUFFER_FLAG_WRITE);
> > +}
> > +EXPORT_SYMBOL(__uaccess_buffer_log_write);
> > +
> > +bool __uaccess_buffer_pre_exit_loop(void)
> > +{
> > +       struct uaccess_buffer_info *buf = &current->uaccess_buffer;
> > +       struct uaccess_descriptor __user *desc_ptr;
> > +       sigset_t tmp_mask;
> > +
> > +       if (get_user(desc_ptr, buf->desc_ptr_ptr) || !desc_ptr)
> > +               return false;
> > +
> > +       current->real_blocked = current->blocked;
> > +       sigfillset(&tmp_mask);
>
> This and __uaccess_buffer_post_exit_loop() runs only when we have a
> signal/timer interrupt between setting the descriptor address in
> userspace and entering the next syscall, right?
> Just want to make sure this code is not executed for normal uaccess
> tracing for performance reasons.

They only need to run if something in _TIF_WORK_MASK (arm64) or
EXIT_TO_USER_MODE_WORK (GENERIC_ENTRY) is set, i.e. signals pending
or some kind of tracing enabled. That's how it works on the arm64
side (check is in the caller of do_notify_resume) but I had neglected
to move the calls into the if statement on the GENERIC_ENTRY side;
done in v4.

> > +       set_current_blocked(&tmp_mask);
> > +       return true;
> > +}
> > +
> > +void __uaccess_buffer_post_exit_loop(void)
> > +{
> > +       spin_lock_irq(&current->sighand->siglock);
> > +       current->blocked = current->real_blocked;
> > +       recalc_sigpending();
> > +       spin_unlock_irq(&current->sighand->siglock);
> > +}
> > +
> > +void uaccess_buffer_free(struct task_struct *tsk)
> > +{
> > +       struct uaccess_buffer_info *buf = &tsk->uaccess_buffer;
> > +
> > +       kfree(buf->kbegin);
> > +       clear_syscall_work(UACCESS_BUFFER_EXIT);
> > +       buf->kbegin = buf->kcur = buf->kend = NULL;
> > +}
> > +
> > +void __uaccess_buffer_syscall_entry(void)
> > +{
> > +       struct uaccess_buffer_info *buf = &current->uaccess_buffer;
> > +       struct uaccess_descriptor desc;
> > +
> > +       if (get_user(buf->desc_ptr, buf->desc_ptr_ptr) || !buf->desc_ptr ||
> > +           put_user(0, buf->desc_ptr_ptr) ||
> > +           copy_from_user(&desc, buf->desc_ptr, sizeof(desc)))
> > +               return;
> > +
> > +       if (desc.size > 1024)
> > +               desc.size = 1024;
> > +
> > +       if (buf->kend - buf->kbegin != desc.size)
> > +               buf->kbegin =
> > +                       krealloc_array(buf->kbegin, desc.size,
> > +                                      sizeof(struct uaccess_buffer_entry),
> > +                                      GFP_KERNEL);
> > +       if (!buf->kbegin)
>
> I think we also need to set at least buf->kend to NULL here.
> I am not sure what can go wrong now, but it's a strange state. On next
> iteration we will do "buf->kend - buf->kbegin", where kend is a
> dangling pointer and kbegin is NULL.

Done.

Peter

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 2/6] uaccess-buffer: add core code
  2021-12-08 16:46   ` Marco Elver
@ 2021-12-09 22:14     ` Peter Collingbourne
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Collingbourne @ 2021-12-09 22:14 UTC (permalink / raw)
  To: Marco Elver
  Cc: Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Thomas Gleixner, Andy Lutomirski, Kees Cook, Andrew Morton,
	Masahiro Yamada, Sami Tolvanen, YiFei Zhu, Mark Rutland,
	Frederic Weisbecker, Viresh Kumar, Andrey Konovalov,
	Gabriel Krisman Bertazi, Chris Hyser, Daniel Vetter,
	Chris Wilson, Arnd Bergmann, Dmitry Vyukov, Christian Brauner,
	Eric W. Biederman, Alexey Gladkov, Ran Xiaokai,
	David Hildenbrand, Xiaofeng Cao, Cyrill Gorcunov, Thomas Cedeno,
	Alexander Potapenko, linux-kernel, linux-arm-kernel,
	Evgenii Stepanov

On Wed, Dec 8, 2021 at 8:46 AM Marco Elver <elver@google.com> wrote:
>
> On Tue, Dec 07, 2021 at 08:48PM -0800, Peter Collingbourne wrote:
> > Add the core code to support uaccess logging. Subsequent patches will
> > hook this up to the arch-specific kernel entry and exit code for
> > certain architectures.
> >
> > Link: https://linux-review.googlesource.com/id/I6581765646501a5631b281d670903945ebadc57d
> > Signed-off-by: Peter Collingbourne <pcc@google.com>
> > ---
> > v3:
> > - performance optimizations for entry/exit code
> > - don't use kcur == NULL to mean overflow
> > - fix potential double free in clone()
> > - don't allocate a new kernel-side uaccess buffer for each syscall
> > - fix uaccess buffer leak on exit
> > - fix some sparse warnings
> >
> > v2:
> > - New interface that avoids multiple syscalls per real syscall and
> >   is arch-generic
> > - Avoid logging uaccesses done by BPF programs
> > - Add documentation
> > - Split up into multiple patches
> > - Various code moves, renames etc as requested by Marco
> >
> >  fs/exec.c                            |   3 +
> >  include/linux/instrumented-uaccess.h |   6 +-
> >  include/linux/sched.h                |   5 ++
> >  include/linux/uaccess-buffer-info.h  |  46 ++++++++++
> >  include/linux/uaccess-buffer.h       | 112 +++++++++++++++++++++++
> >  include/uapi/linux/prctl.h           |   3 +
> >  include/uapi/linux/uaccess-buffer.h  |  27 ++++++
> >  kernel/Makefile                      |   1 +
> >  kernel/bpf/helpers.c                 |   7 +-
> >  kernel/fork.c                        |   4 +
> >  kernel/signal.c                      |   4 +-
> >  kernel/sys.c                         |   6 ++
> >  kernel/uaccess-buffer.c              | 129 +++++++++++++++++++++++++++
> >  13 files changed, 350 insertions(+), 3 deletions(-)
> >  create mode 100644 include/linux/uaccess-buffer-info.h
> >  create mode 100644 include/linux/uaccess-buffer.h
> >  create mode 100644 include/uapi/linux/uaccess-buffer.h
> >  create mode 100644 kernel/uaccess-buffer.c
> >
> > diff --git a/fs/exec.c b/fs/exec.c
> > index 537d92c41105..c9975e790f30 100644
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -65,6 +65,7 @@
> >  #include <linux/vmalloc.h>
> >  #include <linux/io_uring.h>
> >  #include <linux/syscall_user_dispatch.h>
> > +#include <linux/uaccess-buffer.h>
> >
> >  #include <linux/uaccess.h>
> >  #include <asm/mmu_context.h>
> > @@ -1313,6 +1314,8 @@ int begin_new_exec(struct linux_binprm * bprm)
> >       me->personality &= ~bprm->per_clear;
> >
> >       clear_syscall_work_syscall_user_dispatch(me);
> > +     uaccess_buffer_set_descriptor_addr_addr(0);
> > +     uaccess_buffer_free(current);
> >
> >       /*
> >        * We have to apply CLOEXEC before we change whether the process is
> > diff --git a/include/linux/instrumented-uaccess.h b/include/linux/instrumented-uaccess.h
> > index ece549088e50..b967f4436d15 100644
> > --- a/include/linux/instrumented-uaccess.h
> > +++ b/include/linux/instrumented-uaccess.h
> > @@ -2,7 +2,8 @@
> >
> >  /*
> >   * This header provides generic wrappers for memory access instrumentation for
> > - * uaccess routines that the compiler cannot emit for: KASAN, KCSAN.
> > + * uaccess routines that the compiler cannot emit for: KASAN, KCSAN,
> > + * uaccess buffers.
> >   */
> >  #ifndef _LINUX_INSTRUMENTED_UACCESS_H
> >  #define _LINUX_INSTRUMENTED_UACCESS_H
> > @@ -11,6 +12,7 @@
> >  #include <linux/kasan-checks.h>
> >  #include <linux/kcsan-checks.h>
> >  #include <linux/types.h>
> > +#include <linux/uaccess-buffer.h>
> >
> >  /**
> >   * instrument_copy_to_user - instrument reads of copy_to_user
> > @@ -27,6 +29,7 @@ instrument_copy_to_user(void __user *to, const void *from, unsigned long n)
> >  {
> >       kasan_check_read(from, n);
> >       kcsan_check_read(from, n);
> > +     uaccess_buffer_log_write(to, n);
> >  }
> >
> >  /**
> > @@ -44,6 +47,7 @@ instrument_copy_from_user(const void *to, const void __user *from, unsigned long
> >  {
> >       kasan_check_write(to, n);
> >       kcsan_check_write(to, n);
> > +     uaccess_buffer_log_read(from, n);
> >  }
> >
> >  #endif /* _LINUX_INSTRUMENTED_UACCESS_H */
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 78c351e35fec..7c5278d7b57d 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -34,6 +34,7 @@
> >  #include <linux/rseq.h>
> >  #include <linux/seqlock.h>
> >  #include <linux/kcsan.h>
> > +#include <linux/uaccess-buffer-info.h>
> >  #include <asm/kmap_size.h>
> >
> >  /* task_struct member predeclarations (sorted alphabetically): */
> > @@ -1484,6 +1485,10 @@ struct task_struct {
> >       struct callback_head            l1d_flush_kill;
> >  #endif
> >
> > +#ifdef CONFIG_HAVE_ARCH_UACCESS_BUFFER
> > +     struct uaccess_buffer_info      uaccess_buffer;
> > +#endif
> > +
> >       /*
> >        * New fields for task_struct should be added above here, so that
> >        * they are included in the randomized portion of task_struct.
> > diff --git a/include/linux/uaccess-buffer-info.h b/include/linux/uaccess-buffer-info.h
> > new file mode 100644
> > index 000000000000..15e2d8f7c074
> > --- /dev/null
> > +++ b/include/linux/uaccess-buffer-info.h
> > @@ -0,0 +1,46 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef _LINUX_UACCESS_BUFFER_INFO_H
> > +#define _LINUX_UACCESS_BUFFER_INFO_H
> > +
> > +#ifdef CONFIG_HAVE_ARCH_UACCESS_BUFFER
> > +
> > +struct uaccess_buffer_info {
> > +     /*
> > +      * The pointer to pointer to struct uaccess_descriptor. This is the
> > +      * value controlled by prctl(PR_SET_UACCESS_DESCRIPTOR_ADDR_ADDR).
> > +      */
> > +     struct uaccess_descriptor __user *__user *desc_ptr_ptr;
> > +
> > +     /*
> > +      * The pointer to struct uaccess_descriptor read at syscall entry time.
> > +      */
> > +     struct uaccess_descriptor __user *desc_ptr;
> > +
> > +     /*
> > +      * A pointer to the kernel's temporary copy of the uaccess log for the
> > +      * current syscall. We log to a kernel buffer in order to avoid leaking
> > +      * timing information to userspace.
> > +      */
> > +     struct uaccess_buffer_entry *kbegin;
> > +
> > +     /*
> > +      * The position of the next uaccess buffer entry for the current
> > +      * syscall, or NULL if we are not logging the current syscall.
> > +      */
> > +     struct uaccess_buffer_entry *kcur;
> > +
> > +     /*
> > +      * A pointer to the end of the kernel's uaccess log.
> > +      */
> > +     struct uaccess_buffer_entry *kend;
> > +
> > +     /*
> > +      * The pointer to the userspace uaccess log, as read from the
> > +      * struct uaccess_descriptor.
> > +      */
> > +     struct uaccess_buffer_entry __user *ubegin;
> > +};
> > +
> > +#endif
> > +
> > +#endif  /* _LINUX_UACCESS_BUFFER_INFO_H */
> > diff --git a/include/linux/uaccess-buffer.h b/include/linux/uaccess-buffer.h
> > new file mode 100644
> > index 000000000000..f2f46db274f3
> > --- /dev/null
> > +++ b/include/linux/uaccess-buffer.h
> > @@ -0,0 +1,112 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef _LINUX_UACCESS_BUFFER_H
> > +#define _LINUX_UACCESS_BUFFER_H
> > +
> > +#include <linux/sched.h>
> > +#include <uapi/linux/uaccess-buffer.h>
> > +
> > +#include <asm-generic/errno-base.h>
> > +
> > +#ifdef CONFIG_HAVE_ARCH_UACCESS_BUFFER
>
> Kernel-doc comments for each of the below would be useful (if __ prefixed
> functions are implementation details, they can be left as-is).

Done.

> > +
> > +static inline bool uaccess_buffer_maybe_blocked(struct task_struct *tsk)
> > +{
> > +     return test_task_syscall_work(tsk, UACCESS_BUFFER_ENTRY);
> > +}
> > +
> > +void __uaccess_buffer_syscall_entry(void);
> > +static inline void uaccess_buffer_syscall_entry(void)
> > +{
> > +     __uaccess_buffer_syscall_entry();
> > +}
> > +
> > +void __uaccess_buffer_syscall_exit(void);
> > +static inline void uaccess_buffer_syscall_exit(void)
> > +{
> > +     __uaccess_buffer_syscall_exit();
> > +}
> > +
> > +bool __uaccess_buffer_pre_exit_loop(void);
> > +static inline bool uaccess_buffer_pre_exit_loop(void)
> > +{
> > +     if (!test_syscall_work(UACCESS_BUFFER_ENTRY))
> > +             return false;
> > +     return __uaccess_buffer_pre_exit_loop();
> > +}
> > +
> > +void __uaccess_buffer_post_exit_loop(void);
> > +static inline void uaccess_buffer_post_exit_loop(bool pending)
> > +{
> > +     if (pending)
> > +             __uaccess_buffer_post_exit_loop();
> > +}
> > +
> > +static inline int uaccess_buffer_set_descriptor_addr_addr(unsigned long addr)
> > +{
> > +     current->uaccess_buffer.desc_ptr_ptr =
> > +             (struct uaccess_descriptor __user * __user *)addr;
> > +     if (addr)
> > +             set_syscall_work(UACCESS_BUFFER_ENTRY);
> > +     else
> > +             clear_syscall_work(UACCESS_BUFFER_ENTRY);
> > +     return 0;
> > +}
> > +
> > +size_t copy_from_user_nolog(void *to, const void __user *from, size_t len);
>
> copy_from_user() has unsigned long for @len and also return type. I'd
> make them match.

Done.

> > +
> > +void uaccess_buffer_free(struct task_struct *tsk);
> > +
> > +void __uaccess_buffer_log_read(const void __user *from, unsigned long n);
> > +static inline void uaccess_buffer_log_read(const void __user *from, unsigned long n)
> > +{
> > +     if (unlikely(test_syscall_work(UACCESS_BUFFER_EXIT)))
> > +             __uaccess_buffer_log_read(from, n);
> > +}
> > +
> > +void __uaccess_buffer_log_write(void __user *to, unsigned long n);
> > +static inline void uaccess_buffer_log_write(void __user *to, unsigned long n)
> > +{
> > +     if (unlikely(test_syscall_work(UACCESS_BUFFER_EXIT)))
> > +             __uaccess_buffer_log_write(to, n);
> > +}
> > +
> > +#else
> > +
> > +static inline bool uaccess_buffer_maybe_blocked(struct task_struct *tsk)
> > +{
> > +     return false;
> > +}
> > +static inline void uaccess_buffer_syscall_entry(void)
> > +{
> > +}
> > +static inline void uaccess_buffer_syscall_exit(void)
> > +{
> > +}
> > +static inline bool uaccess_buffer_pre_exit_loop(void)
> > +{
> > +     return false;
> > +}
> > +static inline void uaccess_buffer_post_exit_loop(bool pending)
> > +{
> > +}
> > +static inline int uaccess_buffer_set_descriptor_addr_addr(unsigned long addr)
> > +{
> > +     return -EINVAL;
> > +}
> > +static inline void uaccess_buffer_free(struct task_struct *tsk)
> > +{
> > +}
> > +
> > +#define copy_from_user_nolog(to, from, len) copy_from_user(to, from, len)
>
> "#define copy_from_user_nolog copy_from_user" ?

Done.

Peter

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 3/6] fs: use copy_from_user_nolog() to copy mount() data
  2021-12-09 21:42     ` Peter Collingbourne
@ 2021-12-10  2:59       ` Dmitry Vyukov
  2021-12-10  4:02         ` Peter Collingbourne
  0 siblings, 1 reply; 22+ messages in thread
From: Dmitry Vyukov @ 2021-12-10  2:59 UTC (permalink / raw)
  To: Peter Collingbourne
  Cc: Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Thomas Gleixner, Andy Lutomirski, Kees Cook, Andrew Morton,
	Masahiro Yamada, Sami Tolvanen, YiFei Zhu, Mark Rutland,
	Frederic Weisbecker, Viresh Kumar, Andrey Konovalov,
	Gabriel Krisman Bertazi, Chris Hyser, Daniel Vetter,
	Chris Wilson, Arnd Bergmann, Christian Brauner,
	Eric W. Biederman, Alexey Gladkov, Ran Xiaokai,
	David Hildenbrand, Xiaofeng Cao, Cyrill Gorcunov, Thomas Cedeno,
	Marco Elver, Alexander Potapenko, linux-kernel, linux-arm-kernel,
	Evgenii Stepanov

On Thu, 9 Dec 2021 at 22:42, Peter Collingbourne <pcc@google.com> wrote:
> > > With uaccess logging the contract is that the kernel must not report
> > > accessing more data than necessary, as this can lead to false positive
> > > reports in downstream consumers. This generally works out of the box
> > > when instrumenting copy_{from,to}_user(), but with the data argument
> > > to mount() we use copy_from_user() to copy PAGE_SIZE bytes (or as
> > > much as we can, if the PAGE_SIZE sized access failed) and figure out
> > > later how much we actually need.
> > >
> > > To prevent this from leading to a false positive report, use
> > > copy_from_user_nolog(), which will prevent the access from being logged.
> > > Recall that it is valid for the kernel to report accessing less
> > > data than it actually accessed, as uaccess logging is a best-effort
> > > mechanism for reporting uaccesses.
> > >
> > > Link: https://linux-review.googlesource.com/id/I5629b92a725c817acd9a861288338dd605cafee6
> > > Signed-off-by: Peter Collingbourne <pcc@google.com>
> > > ---
> > >  fs/namespace.c | 8 +++++++-
> > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/namespace.c b/fs/namespace.c
> > > index 659a8f39c61a..8f5f2aaca64e 100644
> > > --- a/fs/namespace.c
> > > +++ b/fs/namespace.c
> > > @@ -31,6 +31,7 @@
> > >  #include <uapi/linux/mount.h>
> > >  #include <linux/fs_context.h>
> > >  #include <linux/shmem_fs.h>
> > > +#include <linux/uaccess-buffer.h>
> > >
> > >  #include "pnode.h"
> > >  #include "internal.h"
> > > @@ -3197,7 +3198,12 @@ static void *copy_mount_options(const void __user * data)
> > >         if (!copy)
> > >                 return ERR_PTR(-ENOMEM);
> > >
> > > -       left = copy_from_user(copy, data, PAGE_SIZE);
> > > +       /*
> > > +        * Use copy_from_user_nolog to avoid reporting overly large accesses in
> > > +        * the uaccess buffer, as this can lead to false positive reports in
> > > +        * downstream consumers.
> > > +        */
> > > +       left = copy_from_user_nolog(copy, data, PAGE_SIZE);
> >
> > A late idea...
> > Maybe it's better to log them with a new UACCESS_BUFFER_FLAG_OVERREAD
> > flag. Better for user-space, at least can detect UAFs by checking the
> > first byte. And a more logical kernel annotation (maybe will be used
> > in some other tools? or if we ever check user tags in the kernel).
> >
> > Probably not too important today since we use this only in 2 places,
> > but longer term may be better.
>
> I'm not sure about this. The overreads are basically an implementation
> detail of the kernel, so I'm not sure it makes sense to expose them. A
> scheme where we expose all overreads wouldn't necessarily help with
> UAF, because what if for example the kernel reads *behind* the
> user-provided pointer? I guess it could lead to false positives.

If user-space uses logging to check addressability, then it can safely
check only the first byte (right? there must be at least 1 byte passed
by user-space at that address). And that's enough to detect UAFs.

> > Btw, what's the story with BPF accesses? Can we log them theoretically?
> >
> > Previously the comment said:
> >
> > +       /*
> > +        * Avoid copy_from_user() here as it may leak information about the BPF
> > +        * program to userspace via the uaccess buffer.
> > +        */
> >
> > but now it says something very generic:
> >
> > /*
> > * Avoid logging uaccesses here as the BPF program may not be following
> > * the uaccess log rules.
> > */
>
> Yes we should be able to log them theoretically, but we don't need to
> do that now. See my reply here:
>
> https://lore.kernel.org/all/CAMn1gO5B5Q3hfN6kugv2wmdFGNhJb75iRX1zmCkw3wnueN1dtg@mail.gmail.com/#:~:text=This%20comment%20was,the%20comment%20accordingly.

I see. These could be marked with another flag.
I don't have a strong opinion about this. But I am mentioning this
because my experience is that it's better to expose more raw info from
kernel in these cases, rather than hardcoding policies into kernel
code (what's ignored/why/when) b/c a delay from another kernel change
to wide deployment is 5+ years and user-space code may need to detect
and deal with all various versions of the kernel logic.
Say, fuzzing may still want to know about the mount options (rather
than no signal that the kernel reads at least something at that
address). But adding them later with a flag is not really a backwards
compatible change b/c you now have addressability checking code that's
not checking the new flag and will produce false positives.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 3/6] fs: use copy_from_user_nolog() to copy mount() data
  2021-12-10  2:59       ` Dmitry Vyukov
@ 2021-12-10  4:02         ` Peter Collingbourne
  2021-12-10  4:23           ` Dmitry Vyukov
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Collingbourne @ 2021-12-10  4:02 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Thomas Gleixner, Andy Lutomirski, Kees Cook, Andrew Morton,
	Masahiro Yamada, Sami Tolvanen, YiFei Zhu, Mark Rutland,
	Frederic Weisbecker, Viresh Kumar, Andrey Konovalov,
	Gabriel Krisman Bertazi, Chris Hyser, Daniel Vetter,
	Chris Wilson, Arnd Bergmann, Christian Brauner,
	Eric W. Biederman, Alexey Gladkov, Ran Xiaokai,
	David Hildenbrand, Xiaofeng Cao, Cyrill Gorcunov, Thomas Cedeno,
	Marco Elver, Alexander Potapenko, linux-kernel, linux-arm-kernel,
	Evgenii Stepanov

On Thu, Dec 9, 2021 at 6:59 PM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Thu, 9 Dec 2021 at 22:42, Peter Collingbourne <pcc@google.com> wrote:
> > > > With uaccess logging the contract is that the kernel must not report
> > > > accessing more data than necessary, as this can lead to false positive
> > > > reports in downstream consumers. This generally works out of the box
> > > > when instrumenting copy_{from,to}_user(), but with the data argument
> > > > to mount() we use copy_from_user() to copy PAGE_SIZE bytes (or as
> > > > much as we can, if the PAGE_SIZE sized access failed) and figure out
> > > > later how much we actually need.
> > > >
> > > > To prevent this from leading to a false positive report, use
> > > > copy_from_user_nolog(), which will prevent the access from being logged.
> > > > Recall that it is valid for the kernel to report accessing less
> > > > data than it actually accessed, as uaccess logging is a best-effort
> > > > mechanism for reporting uaccesses.
> > > >
> > > > Link: https://linux-review.googlesource.com/id/I5629b92a725c817acd9a861288338dd605cafee6
> > > > Signed-off-by: Peter Collingbourne <pcc@google.com>
> > > > ---
> > > >  fs/namespace.c | 8 +++++++-
> > > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/fs/namespace.c b/fs/namespace.c
> > > > index 659a8f39c61a..8f5f2aaca64e 100644
> > > > --- a/fs/namespace.c
> > > > +++ b/fs/namespace.c
> > > > @@ -31,6 +31,7 @@
> > > >  #include <uapi/linux/mount.h>
> > > >  #include <linux/fs_context.h>
> > > >  #include <linux/shmem_fs.h>
> > > > +#include <linux/uaccess-buffer.h>
> > > >
> > > >  #include "pnode.h"
> > > >  #include "internal.h"
> > > > @@ -3197,7 +3198,12 @@ static void *copy_mount_options(const void __user * data)
> > > >         if (!copy)
> > > >                 return ERR_PTR(-ENOMEM);
> > > >
> > > > -       left = copy_from_user(copy, data, PAGE_SIZE);
> > > > +       /*
> > > > +        * Use copy_from_user_nolog to avoid reporting overly large accesses in
> > > > +        * the uaccess buffer, as this can lead to false positive reports in
> > > > +        * downstream consumers.
> > > > +        */
> > > > +       left = copy_from_user_nolog(copy, data, PAGE_SIZE);
> > >
> > > A late idea...
> > > Maybe it's better to log them with a new UACCESS_BUFFER_FLAG_OVERREAD
> > > flag. Better for user-space, at least can detect UAFs by checking the
> > > first byte. And a more logical kernel annotation (maybe will be used
> > > in some other tools? or if we ever check user tags in the kernel).
> > >
> > > Probably not too important today since we use this only in 2 places,
> > > but longer term may be better.
> >
> > I'm not sure about this. The overreads are basically an implementation
> > detail of the kernel, so I'm not sure it makes sense to expose them. A
> > scheme where we expose all overreads wouldn't necessarily help with
> > UAF, because what if for example the kernel reads *behind* the
> > user-provided pointer? I guess it could lead to false positives.
>
> If user-space uses logging to check addressability, then it can safely
> check only the first byte (right? there must be at least 1 byte passed
> by user-space at that address). And that's enough to detect UAFs.

I was thinking more e.g. what if the kernel reads an entire page with
copy_from_user() and takes a subset of it later. Then the first byte
could point to some other random allocation in the same page and lead
to a false UAF report if we just consider the first byte.

So I think the use cases for accesses with this flag set may be
limited to things like fuzzers.

> > > Btw, what's the story with BPF accesses? Can we log them theoretically?
> > >
> > > Previously the comment said:
> > >
> > > +       /*
> > > +        * Avoid copy_from_user() here as it may leak information about the BPF
> > > +        * program to userspace via the uaccess buffer.
> > > +        */
> > >
> > > but now it says something very generic:
> > >
> > > /*
> > > * Avoid logging uaccesses here as the BPF program may not be following
> > > * the uaccess log rules.
> > > */
> >
> > Yes we should be able to log them theoretically, but we don't need to
> > do that now. See my reply here:
> >
> > https://lore.kernel.org/all/CAMn1gO5B5Q3hfN6kugv2wmdFGNhJb75iRX1zmCkw3wnueN1dtg@mail.gmail.com/#:~:text=This%20comment%20was,the%20comment%20accordingly.
>
> I see. These could be marked with another flag.
> I don't have a strong opinion about this. But I am mentioning this
> because my experience is that it's better to expose more raw info from
> kernel in these cases, rather than hardcoding policies into kernel
> code (what's ignored/why/when) b/c a delay from another kernel change
> to wide deployment is 5+ years and user-space code may need to detect
> and deal with all various versions of the kernel logic.
> Say, fuzzing may still want to know about the mount options (rather
> than no signal that the kernel reads at least something at that
> address). But adding them later with a flag is not really a backwards
> compatible change b/c you now have addressability checking code that's
> not checking the new flag and will produce false positives.

I think this is a good point. I'll see about adding flags for the BPF
and overread cases.

Peter

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 3/6] fs: use copy_from_user_nolog() to copy mount() data
  2021-12-10  4:02         ` Peter Collingbourne
@ 2021-12-10  4:23           ` Dmitry Vyukov
  0 siblings, 0 replies; 22+ messages in thread
From: Dmitry Vyukov @ 2021-12-10  4:23 UTC (permalink / raw)
  To: Peter Collingbourne
  Cc: Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Thomas Gleixner, Andy Lutomirski, Kees Cook, Andrew Morton,
	Masahiro Yamada, Sami Tolvanen, YiFei Zhu, Mark Rutland,
	Frederic Weisbecker, Viresh Kumar, Andrey Konovalov,
	Gabriel Krisman Bertazi, Chris Hyser, Daniel Vetter,
	Chris Wilson, Arnd Bergmann, Christian Brauner,
	Eric W. Biederman, Alexey Gladkov, Ran Xiaokai,
	David Hildenbrand, Xiaofeng Cao, Cyrill Gorcunov, Thomas Cedeno,
	Marco Elver, Alexander Potapenko, linux-kernel, linux-arm-kernel,
	Evgenii Stepanov

On Fri, 10 Dec 2021 at 05:02, Peter Collingbourne <pcc@google.com> wrote:
> > On Thu, 9 Dec 2021 at 22:42, Peter Collingbourne <pcc@google.com> wrote:
> > > > > With uaccess logging the contract is that the kernel must not report
> > > > > accessing more data than necessary, as this can lead to false positive
> > > > > reports in downstream consumers. This generally works out of the box
> > > > > when instrumenting copy_{from,to}_user(), but with the data argument
> > > > > to mount() we use copy_from_user() to copy PAGE_SIZE bytes (or as
> > > > > much as we can, if the PAGE_SIZE sized access failed) and figure out
> > > > > later how much we actually need.
> > > > >
> > > > > To prevent this from leading to a false positive report, use
> > > > > copy_from_user_nolog(), which will prevent the access from being logged.
> > > > > Recall that it is valid for the kernel to report accessing less
> > > > > data than it actually accessed, as uaccess logging is a best-effort
> > > > > mechanism for reporting uaccesses.
> > > > >
> > > > > Link: https://linux-review.googlesource.com/id/I5629b92a725c817acd9a861288338dd605cafee6
> > > > > Signed-off-by: Peter Collingbourne <pcc@google.com>
> > > > > ---
> > > > >  fs/namespace.c | 8 +++++++-
> > > > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/fs/namespace.c b/fs/namespace.c
> > > > > index 659a8f39c61a..8f5f2aaca64e 100644
> > > > > --- a/fs/namespace.c
> > > > > +++ b/fs/namespace.c
> > > > > @@ -31,6 +31,7 @@
> > > > >  #include <uapi/linux/mount.h>
> > > > >  #include <linux/fs_context.h>
> > > > >  #include <linux/shmem_fs.h>
> > > > > +#include <linux/uaccess-buffer.h>
> > > > >
> > > > >  #include "pnode.h"
> > > > >  #include "internal.h"
> > > > > @@ -3197,7 +3198,12 @@ static void *copy_mount_options(const void __user * data)
> > > > >         if (!copy)
> > > > >                 return ERR_PTR(-ENOMEM);
> > > > >
> > > > > -       left = copy_from_user(copy, data, PAGE_SIZE);
> > > > > +       /*
> > > > > +        * Use copy_from_user_nolog to avoid reporting overly large accesses in
> > > > > +        * the uaccess buffer, as this can lead to false positive reports in
> > > > > +        * downstream consumers.
> > > > > +        */
> > > > > +       left = copy_from_user_nolog(copy, data, PAGE_SIZE);
> > > >
> > > > A late idea...
> > > > Maybe it's better to log them with a new UACCESS_BUFFER_FLAG_OVERREAD
> > > > flag. Better for user-space, at least can detect UAFs by checking the
> > > > first byte. And a more logical kernel annotation (maybe will be used
> > > > in some other tools? or if we ever check user tags in the kernel).
> > > >
> > > > Probably not too important today since we use this only in 2 places,
> > > > but longer term may be better.
> > >
> > > I'm not sure about this. The overreads are basically an implementation
> > > detail of the kernel, so I'm not sure it makes sense to expose them. A
> > > scheme where we expose all overreads wouldn't necessarily help with
> > > UAF, because what if for example the kernel reads *behind* the
> > > user-provided pointer? I guess it could lead to false positives.
> >
> > If user-space uses logging to check addressability, then it can safely
> > check only the first byte (right? there must be at least 1 byte passed
> > by user-space at that address). And that's enough to detect UAFs.
>
> I was thinking more e.g. what if the kernel reads an entire page with
> copy_from_user() and takes a subset of it later. Then the first byte
> could point to some other random allocation in the same page and lead
> to a false UAF report if we just consider the first byte.

Humm.. good point. As I said I am not strong about this. I don't know
what's the right answer.

> So I think the use cases for accesses with this flag set may be
> limited to things like fuzzers.
>
> > > > Btw, what's the story with BPF accesses? Can we log them theoretically?
> > > >
> > > > Previously the comment said:
> > > >
> > > > +       /*
> > > > +        * Avoid copy_from_user() here as it may leak information about the BPF
> > > > +        * program to userspace via the uaccess buffer.
> > > > +        */
> > > >
> > > > but now it says something very generic:
> > > >
> > > > /*
> > > > * Avoid logging uaccesses here as the BPF program may not be following
> > > > * the uaccess log rules.
> > > > */
> > >
> > > Yes we should be able to log them theoretically, but we don't need to
> > > do that now. See my reply here:
> > >
> > > https://lore.kernel.org/all/CAMn1gO5B5Q3hfN6kugv2wmdFGNhJb75iRX1zmCkw3wnueN1dtg@mail.gmail.com/#:~:text=This%20comment%20was,the%20comment%20accordingly.
> >
> > I see. These could be marked with another flag.
> > I don't have a strong opinion about this. But I am mentioning this
> > because my experience is that it's better to expose more raw info from
> > kernel in these cases, rather than hardcoding policies into kernel
> > code (what's ignored/why/when) b/c a delay from another kernel change
> > to wide deployment is 5+ years and user-space code may need to detect
> > and deal with all various versions of the kernel logic.
> > Say, fuzzing may still want to know about the mount options (rather
> > than no signal that the kernel reads at least something at that
> > address). But adding them later with a flag is not really a backwards
> > compatible change b/c you now have addressability checking code that's
> > not checking the new flag and will produce false positives.
>
> I think this is a good point. I'll see about adding flags for the BPF
> and overread cases.
>
> Peter

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2021-12-10  4:23 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-08  4:48 [PATCH v3 0/6] kernel: introduce uaccess logging Peter Collingbourne
2021-12-08  4:48 ` [PATCH v3 1/6] include: split out uaccess instrumentation into a separate header Peter Collingbourne
2021-12-08  9:27   ` Dmitry Vyukov
2021-12-08  4:48 ` [PATCH v3 2/6] uaccess-buffer: add core code Peter Collingbourne
2021-12-08 10:21   ` Dmitry Vyukov
2021-12-09 22:13     ` Peter Collingbourne
2021-12-08 16:46   ` Marco Elver
2021-12-09 22:14     ` Peter Collingbourne
2021-12-08  4:48 ` [PATCH v3 3/6] fs: use copy_from_user_nolog() to copy mount() data Peter Collingbourne
2021-12-08  9:34   ` Dmitry Vyukov
2021-12-09 21:42     ` Peter Collingbourne
2021-12-10  2:59       ` Dmitry Vyukov
2021-12-10  4:02         ` Peter Collingbourne
2021-12-10  4:23           ` Dmitry Vyukov
2021-12-08  4:48 ` [PATCH v3 4/6] uaccess-buffer: add CONFIG_GENERIC_ENTRY support Peter Collingbourne
2021-12-08  9:43   ` Dmitry Vyukov
2021-12-08  4:48 ` [PATCH v3 5/6] arm64: add support for uaccess logging Peter Collingbourne
2021-12-08  9:48   ` Dmitry Vyukov
2021-12-09 21:44     ` Peter Collingbourne
2021-12-08  4:48 ` [PATCH v3 6/6] Documentation: document " Peter Collingbourne
2021-12-08 15:33 ` [PATCH v3 0/6] kernel: introduce " Marco Elver
2021-12-09 22:12   ` Peter Collingbourne

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).