linux-parisc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* clean up and streamline probe_kernel_* and friends
@ 2020-05-06  6:22 Christoph Hellwig
  2020-05-06  6:22 ` [PATCH 01/15] maccess: unexport probe_kernel_write and probe_user_write Christoph Hellwig
                   ` (14 more replies)
  0 siblings, 15 replies; 26+ messages in thread
From: Christoph Hellwig @ 2020-05-06  6:22 UTC (permalink / raw)
  To: x86, Alexei Starovoitov, Daniel Borkmann, Masami Hiramatsu,
	Linus Torvalds, Andrew Morton
  Cc: linux-parisc, linux-um, netdev, bpf, linux-mm, linux-kernel

Hi all,

this series start cleaning up the safe kernel and user memory probing
helpers in mm/maccess.c, and then allows architectures to implement
the kernel probing without overriding the address space limit and
temporarily allowing access to user memory.  It then switches x86
over to this new mechanism by reusing the unsafe_* uaccess logic.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 01/15] maccess: unexport probe_kernel_write and probe_user_write
  2020-05-06  6:22 clean up and streamline probe_kernel_* and friends Christoph Hellwig
@ 2020-05-06  6:22 ` Christoph Hellwig
  2020-05-06  6:22 ` [PATCH 02/15] maccess: remove various unused weak aliases Christoph Hellwig
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2020-05-06  6:22 UTC (permalink / raw)
  To: x86, Alexei Starovoitov, Daniel Borkmann, Masami Hiramatsu,
	Linus Torvalds, Andrew Morton
  Cc: linux-parisc, linux-um, netdev, bpf, linux-mm, linux-kernel

These two functions are not used by any modular code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 mm/maccess.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/mm/maccess.c b/mm/maccess.c
index 3ca8d97e50106..cf21e604f78cb 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -121,7 +121,6 @@ long __probe_kernel_write(void *dst, const void *src, size_t size)
 
 	return ret;
 }
-EXPORT_SYMBOL_GPL(probe_kernel_write);
 
 /**
  * probe_user_write(): safely attempt to write to a user-space location
@@ -148,7 +147,6 @@ long __probe_user_write(void __user *dst, const void *src, size_t size)
 
 	return ret;
 }
-EXPORT_SYMBOL_GPL(probe_user_write);
 
 /**
  * strncpy_from_unsafe: - Copy a NUL terminated string from unsafe address.
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 02/15] maccess: remove various unused weak aliases
  2020-05-06  6:22 clean up and streamline probe_kernel_* and friends Christoph Hellwig
  2020-05-06  6:22 ` [PATCH 01/15] maccess: unexport probe_kernel_write and probe_user_write Christoph Hellwig
@ 2020-05-06  6:22 ` Christoph Hellwig
  2020-05-06  6:22 ` [PATCH 03/15] maccess: remove duplicate kerneldoc commens Christoph Hellwig
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2020-05-06  6:22 UTC (permalink / raw)
  To: x86, Alexei Starovoitov, Daniel Borkmann, Masami Hiramatsu,
	Linus Torvalds, Andrew Morton
  Cc: linux-parisc, linux-um, netdev, bpf, linux-mm, linux-kernel

maccess tends to define lots of underscore prefixed symbols that then
have other weak aliases.  But except for two cases they are never
actually used, so remove them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/uaccess.h |  3 ---
 mm/maccess.c            | 19 +++----------------
 2 files changed, 3 insertions(+), 19 deletions(-)

diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 67f016010aad5..a2c606a403745 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -324,7 +324,6 @@ extern long __probe_kernel_read(void *dst, const void *src, size_t size);
  * happens, handle that and return -EFAULT.
  */
 extern long probe_user_read(void *dst, const void __user *src, size_t size);
-extern long __probe_user_read(void *dst, const void __user *src, size_t size);
 
 /*
  * probe_kernel_write(): safely attempt to write to a location
@@ -336,7 +335,6 @@ extern long __probe_user_read(void *dst, const void __user *src, size_t size);
  * happens, handle that and return -EFAULT.
  */
 extern long notrace probe_kernel_write(void *dst, const void *src, size_t size);
-extern long notrace __probe_kernel_write(void *dst, const void *src, size_t size);
 
 /*
  * probe_user_write(): safely attempt to write to a location in user space
@@ -348,7 +346,6 @@ extern long notrace __probe_kernel_write(void *dst, const void *src, size_t size
  * happens, handle that and return -EFAULT.
  */
 extern long notrace probe_user_write(void __user *dst, const void *src, size_t size);
-extern long notrace __probe_user_write(void __user *dst, const void *src, size_t size);
 
 extern long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
 extern long strncpy_from_unsafe_strict(char *dst, const void *unsafe_addr,
diff --git a/mm/maccess.c b/mm/maccess.c
index cf21e604f78cb..4e7f3b6eb05ae 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -79,11 +79,7 @@ EXPORT_SYMBOL_GPL(probe_kernel_read);
  * Safely read from user address @src to the buffer at @dst. If a kernel fault
  * happens, handle that and return -EFAULT.
  */
-
-long __weak probe_user_read(void *dst, const void __user *src, size_t size)
-    __attribute__((alias("__probe_user_read")));
-
-long __probe_user_read(void *dst, const void __user *src, size_t size)
+long probe_user_read(void *dst, const void __user *src, size_t size)
 {
 	long ret = -EFAULT;
 	mm_segment_t old_fs = get_fs();
@@ -106,11 +102,7 @@ EXPORT_SYMBOL_GPL(probe_user_read);
  * Safely write to address @dst from the buffer at @src.  If a kernel fault
  * happens, handle that and return -EFAULT.
  */
-
-long __weak probe_kernel_write(void *dst, const void *src, size_t size)
-    __attribute__((alias("__probe_kernel_write")));
-
-long __probe_kernel_write(void *dst, const void *src, size_t size)
+long probe_kernel_write(void *dst, const void *src, size_t size)
 {
 	long ret;
 	mm_segment_t old_fs = get_fs();
@@ -131,11 +123,7 @@ long __probe_kernel_write(void *dst, const void *src, size_t size)
  * Safely write to address @dst from the buffer at @src.  If a kernel fault
  * happens, handle that and return -EFAULT.
  */
-
-long __weak probe_user_write(void __user *dst, const void *src, size_t size)
-    __attribute__((alias("__probe_user_write")));
-
-long __probe_user_write(void __user *dst, const void *src, size_t size)
+long probe_user_write(void __user *dst, const void *src, size_t size)
 {
 	long ret = -EFAULT;
 	mm_segment_t old_fs = get_fs();
@@ -171,7 +159,6 @@ long __probe_user_write(void __user *dst, const void *src, size_t size)
  * probing memory on a user address range where strncpy_from_unsafe_user() is
  * supposed to be used instead.
  */
-
 long __weak strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
     __attribute__((alias("__strncpy_from_unsafe")));
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 03/15] maccess: remove duplicate kerneldoc commens
  2020-05-06  6:22 clean up and streamline probe_kernel_* and friends Christoph Hellwig
  2020-05-06  6:22 ` [PATCH 01/15] maccess: unexport probe_kernel_write and probe_user_write Christoph Hellwig
  2020-05-06  6:22 ` [PATCH 02/15] maccess: remove various unused weak aliases Christoph Hellwig
@ 2020-05-06  6:22 ` Christoph Hellwig
  2020-05-06  6:22 ` [PATCH 04/15] maccess: clarify kerneldoc comments Christoph Hellwig
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2020-05-06  6:22 UTC (permalink / raw)
  To: x86, Alexei Starovoitov, Daniel Borkmann, Masami Hiramatsu,
	Linus Torvalds, Andrew Morton
  Cc: linux-parisc, linux-um, netdev, bpf, linux-mm, linux-kernel

Many of the maccess routines have a copy of the kerneldoc comment
in the header.  Remove it as it is not useful and will get out of
sync sooner or later.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/uaccess.h | 38 --------------------------------------
 1 file changed, 38 deletions(-)

diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index a2c606a403745..5a36a298a85f8 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -301,50 +301,12 @@ copy_struct_from_user(void *dst, size_t ksize, const void __user *src,
 	return 0;
 }
 
-/*
- * probe_kernel_read(): safely attempt to read from a location
- * @dst: pointer to the buffer that shall take the data
- * @src: address to read from
- * @size: size of the data chunk
- *
- * Safely read from address @src to the buffer at @dst.  If a kernel fault
- * happens, handle that and return -EFAULT.
- */
 extern long probe_kernel_read(void *dst, const void *src, size_t size);
 extern long probe_kernel_read_strict(void *dst, const void *src, size_t size);
 extern long __probe_kernel_read(void *dst, const void *src, size_t size);
-
-/*
- * probe_user_read(): safely attempt to read from a location in user space
- * @dst: pointer to the buffer that shall take the data
- * @src: address to read from
- * @size: size of the data chunk
- *
- * Safely read from address @src to the buffer at @dst.  If a kernel fault
- * happens, handle that and return -EFAULT.
- */
 extern long probe_user_read(void *dst, const void __user *src, size_t size);
 
-/*
- * probe_kernel_write(): safely attempt to write to a location
- * @dst: address to write to
- * @src: pointer to the data that shall be written
- * @size: size of the data chunk
- *
- * Safely write to address @dst from the buffer at @src.  If a kernel fault
- * happens, handle that and return -EFAULT.
- */
 extern long notrace probe_kernel_write(void *dst, const void *src, size_t size);
-
-/*
- * probe_user_write(): safely attempt to write to a location in user space
- * @dst: address to write to
- * @src: pointer to the data that shall be written
- * @size: size of the data chunk
- *
- * Safely write to address @dst from the buffer at @src.  If a kernel fault
- * happens, handle that and return -EFAULT.
- */
 extern long notrace probe_user_write(void __user *dst, const void *src, size_t size);
 
 extern long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 04/15] maccess: clarify kerneldoc comments
  2020-05-06  6:22 clean up and streamline probe_kernel_* and friends Christoph Hellwig
                   ` (2 preceding siblings ...)
  2020-05-06  6:22 ` [PATCH 03/15] maccess: remove duplicate kerneldoc commens Christoph Hellwig
@ 2020-05-06  6:22 ` Christoph Hellwig
  2020-05-06  6:22 ` [PATCH 05/15] maccess: update the top of file comment Christoph Hellwig
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2020-05-06  6:22 UTC (permalink / raw)
  To: x86, Alexei Starovoitov, Daniel Borkmann, Masami Hiramatsu,
	Linus Torvalds, Andrew Morton
  Cc: linux-parisc, linux-um, netdev, bpf, linux-mm, linux-kernel

Add proper kerneldoc comments for probe_kernel_read_strict and
probe_kernel_read strncpy_from_unsafe_strict and explain the different
versus the non-strict version.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 mm/maccess.c | 61 ++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 43 insertions(+), 18 deletions(-)

diff --git a/mm/maccess.c b/mm/maccess.c
index 4e7f3b6eb05ae..747581ac50dc9 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -31,29 +31,35 @@ probe_write_common(void __user *dst, const void *src, size_t size)
 }
 
 /**
- * probe_kernel_read(): safely attempt to read from a kernel-space location
+ * probe_kernel_read(): safely attempt to read from any location
  * @dst: pointer to the buffer that shall take the data
  * @src: address to read from
  * @size: size of the data chunk
  *
- * Safely read from address @src to the buffer at @dst.  If a kernel fault
- * happens, handle that and return -EFAULT.
+ * Same as probe_kernel_read_strict() except that for architectures with
+ * not fully separated user and kernel address spaces this function also works
+ * for user address tanges.
+ *
+ * DO NOT USE THIS FUNCTION - it is broken on architectures with entirely
+ * separate kernel and user address spaces, and also a bad idea otherwise.
+ */
+long __weak probe_kernel_read(void *dst, const void *src, size_t size)
+    __attribute__((alias("__probe_kernel_read")));
+
+/**
+ * probe_kernel_read_strict(): safely attempt to read from kernel-space
+ * @dst: pointer to the buffer that shall take the data
+ * @src: address to read from
+ * @size: size of the data chunk
+ *
+ * Safely read from kernel address @src to the buffer at @dst.  If a kernel
+ * fault happens, handle that and return -EFAULT.
  *
  * We ensure that the copy_from_user is executed in atomic context so that
  * do_page_fault() doesn't attempt to take mmap_sem.  This makes
  * probe_kernel_read() suitable for use within regions where the caller
  * already holds mmap_sem, or other locks which nest inside mmap_sem.
- *
- * probe_kernel_read_strict() is the same as probe_kernel_read() except for
- * the case where architectures have non-overlapping user and kernel address
- * ranges: probe_kernel_read_strict() will additionally return -EFAULT for
- * probing memory on a user address range where probe_user_read() is supposed
- * to be used instead.
  */
-
-long __weak probe_kernel_read(void *dst, const void *src, size_t size)
-    __attribute__((alias("__probe_kernel_read")));
-
 long __weak probe_kernel_read_strict(void *dst, const void *src, size_t size)
     __attribute__((alias("__probe_kernel_read")));
 
@@ -153,15 +159,34 @@ long probe_user_write(void __user *dst, const void *src, size_t size)
  * If @count is smaller than the length of the string, copies @count-1 bytes,
  * sets the last byte of @dst buffer to NUL and returns @count.
  *
- * strncpy_from_unsafe_strict() is the same as strncpy_from_unsafe() except
- * for the case where architectures have non-overlapping user and kernel address
- * ranges: strncpy_from_unsafe_strict() will additionally return -EFAULT for
- * probing memory on a user address range where strncpy_from_unsafe_user() is
- * supposed to be used instead.
+ * Same as strncpy_from_unsafe_strict() except that for architectures with
+ * not fully separated user and kernel address spaces this function also works
+ * for user address tanges.
+ *
+ * DO NOT USE THIS FUNCTION - it is broken on architectures with entirely
+ * separate kernel and user address spaces, and also a bad idea otherwise.
  */
 long __weak strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
     __attribute__((alias("__strncpy_from_unsafe")));
 
+/**
+ * strncpy_from_unsafe_strict: - Copy a NUL terminated string from unsafe
+ *				 address.
+ * @dst:   Destination address, in kernel space.  This buffer must be at
+ *         least @count bytes long.
+ * @unsafe_addr: Unsafe address.
+ * @count: Maximum number of bytes to copy, including the trailing NUL.
+ *
+ * Copies a NUL-terminated string from unsafe address to kernel buffer.
+ *
+ * On success, returns the length of the string INCLUDING the trailing NUL.
+ *
+ * If access fails, returns -EFAULT (some data may have been copied
+ * and the trailing NUL added).
+ *
+ * If @count is smaller than the length of the string, copies @count-1 bytes,
+ * sets the last byte of @dst buffer to NUL and returns @count.
+ */
 long __weak strncpy_from_unsafe_strict(char *dst, const void *unsafe_addr,
 				       long count)
     __attribute__((alias("__strncpy_from_unsafe")));
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 05/15] maccess: update the top of file comment
  2020-05-06  6:22 clean up and streamline probe_kernel_* and friends Christoph Hellwig
                   ` (3 preceding siblings ...)
  2020-05-06  6:22 ` [PATCH 04/15] maccess: clarify kerneldoc comments Christoph Hellwig
@ 2020-05-06  6:22 ` Christoph Hellwig
  2020-05-06  6:22 ` [PATCH 06/15] maccess: rename strncpy_from_unsafe_user to strncpy_from_user_unsafe Christoph Hellwig
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2020-05-06  6:22 UTC (permalink / raw)
  To: x86, Alexei Starovoitov, Daniel Borkmann, Masami Hiramatsu,
	Linus Torvalds, Andrew Morton
  Cc: linux-parisc, linux-um, netdev, bpf, linux-mm, linux-kernel

This file now also contains several helpers for accessing user memory.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 mm/maccess.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/maccess.c b/mm/maccess.c
index 747581ac50dc9..65880ba2ca376 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /*
- * Access kernel memory without faulting.
+ * Access kernel or user memory without faulting.
  */
 #include <linux/export.h>
 #include <linux/mm.h>
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 06/15] maccess: rename strncpy_from_unsafe_user to strncpy_from_user_unsafe
  2020-05-06  6:22 clean up and streamline probe_kernel_* and friends Christoph Hellwig
                   ` (4 preceding siblings ...)
  2020-05-06  6:22 ` [PATCH 05/15] maccess: update the top of file comment Christoph Hellwig
@ 2020-05-06  6:22 ` Christoph Hellwig
  2020-05-06  6:22 ` [PATCH 07/15] maccess: rename strncpy_from_unsafe_strict to strncpy_from_kernel_unsafe Christoph Hellwig
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2020-05-06  6:22 UTC (permalink / raw)
  To: x86, Alexei Starovoitov, Daniel Borkmann, Masami Hiramatsu,
	Linus Torvalds, Andrew Morton
  Cc: linux-parisc, linux-um, netdev, bpf, linux-mm, linux-kernel

This matches the convention of always having _unsafe as a suffix, as
well as match the naming of strncpy_from_user.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/uaccess.h     | 2 +-
 kernel/trace/bpf_trace.c    | 2 +-
 kernel/trace/trace_kprobe.c | 2 +-
 mm/maccess.c                | 4 ++--
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 5a36a298a85f8..74db16fac2a04 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -313,7 +313,7 @@ extern long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
 extern long strncpy_from_unsafe_strict(char *dst, const void *unsafe_addr,
 				       long count);
 extern long __strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
-extern long strncpy_from_unsafe_user(char *dst, const void __user *unsafe_addr,
+extern long strncpy_from_user_unsafe(char *dst, const void __user *unsafe_addr,
 				     long count);
 extern long strnlen_unsafe_user(const void __user *unsafe_addr, long count);
 
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index ca1796747a773..d3989384e515d 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -159,7 +159,7 @@ static const struct bpf_func_proto bpf_probe_read_user_proto = {
 BPF_CALL_3(bpf_probe_read_user_str, void *, dst, u32, size,
 	   const void __user *, unsafe_ptr)
 {
-	int ret = strncpy_from_unsafe_user(dst, unsafe_ptr, size);
+	int ret = strncpy_from_user_unsafe(dst, unsafe_ptr, size);
 
 	if (unlikely(ret < 0))
 		memset(dst, 0, size);
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index d0568af4a0ef6..67ed9655afc51 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1262,7 +1262,7 @@ fetch_store_string_user(unsigned long addr, void *dest, void *base)
 
 	__dest = get_loc_data(dest, base);
 
-	ret = strncpy_from_unsafe_user(__dest, uaddr, maxlen);
+	ret = strncpy_from_user_unsafe(__dest, uaddr, maxlen);
 	if (ret >= 0)
 		*(u32 *)dest = make_data_loc(ret, __dest - base);
 
diff --git a/mm/maccess.c b/mm/maccess.c
index 65880ba2ca376..76f9d4dd51300 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -215,7 +215,7 @@ long __strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
 }
 
 /**
- * strncpy_from_unsafe_user: - Copy a NUL terminated string from unsafe user
+ * strncpy_from_user_unsafe: - Copy a NUL terminated string from unsafe user
  *				address.
  * @dst:   Destination address, in kernel space.  This buffer must be at
  *         least @count bytes long.
@@ -232,7 +232,7 @@ long __strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
  * If @count is smaller than the length of the string, copies @count-1 bytes,
  * sets the last byte of @dst buffer to NUL and returns @count.
  */
-long strncpy_from_unsafe_user(char *dst, const void __user *unsafe_addr,
+long strncpy_from_user_unsafe(char *dst, const void __user *unsafe_addr,
 			      long count)
 {
 	mm_segment_t old_fs = get_fs();
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 07/15] maccess: rename strncpy_from_unsafe_strict to strncpy_from_kernel_unsafe
  2020-05-06  6:22 clean up and streamline probe_kernel_* and friends Christoph Hellwig
                   ` (5 preceding siblings ...)
  2020-05-06  6:22 ` [PATCH 06/15] maccess: rename strncpy_from_unsafe_user to strncpy_from_user_unsafe Christoph Hellwig
@ 2020-05-06  6:22 ` Christoph Hellwig
  2020-05-06  6:22 ` [PATCH 08/15] maccess: rename strnlen_unsafe_user to strnlen_user_unsafe Christoph Hellwig
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2020-05-06  6:22 UTC (permalink / raw)
  To: x86, Alexei Starovoitov, Daniel Borkmann, Masami Hiramatsu,
	Linus Torvalds, Andrew Morton
  Cc: linux-parisc, linux-um, netdev, bpf, linux-mm, linux-kernel

This matches the convention of always having _unsafe as a suffix, as
well as match the naming of strncpy_from_user_unsafe.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/x86/mm/maccess.c    | 2 +-
 include/linux/uaccess.h  | 2 +-
 kernel/trace/bpf_trace.c | 2 +-
 mm/maccess.c             | 6 +++---
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/maccess.c b/arch/x86/mm/maccess.c
index f5b85bdc0535c..6290e9894fb55 100644
--- a/arch/x86/mm/maccess.c
+++ b/arch/x86/mm/maccess.c
@@ -34,7 +34,7 @@ long probe_kernel_read_strict(void *dst, const void *src, size_t size)
 	return __probe_kernel_read(dst, src, size);
 }
 
-long strncpy_from_unsafe_strict(char *dst, const void *unsafe_addr, long count)
+long strncpy_from_kernel_unsafe(char *dst, const void *unsafe_addr, long count)
 {
 	if (unlikely(invalid_probe_range((unsigned long)unsafe_addr)))
 		return -EFAULT;
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 74db16fac2a04..221d703480b6e 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -310,7 +310,7 @@ extern long notrace probe_kernel_write(void *dst, const void *src, size_t size);
 extern long notrace probe_user_write(void __user *dst, const void *src, size_t size);
 
 extern long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
-extern long strncpy_from_unsafe_strict(char *dst, const void *unsafe_addr,
+extern long strncpy_from_kernel_unsafe(char *dst, const void *unsafe_addr,
 				       long count);
 extern long __strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
 extern long strncpy_from_user_unsafe(char *dst, const void __user *unsafe_addr,
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index d3989384e515d..e4e202f433903 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -240,7 +240,7 @@ bpf_probe_read_kernel_str_common(void *dst, u32 size, const void *unsafe_ptr,
 	 * is returned that can be used for bpf_perf_event_output() et al.
 	 */
 	ret = compat ? strncpy_from_unsafe(dst, unsafe_ptr, size) :
-	      strncpy_from_unsafe_strict(dst, unsafe_ptr, size);
+	      strncpy_from_kernel_unsafe(dst, unsafe_ptr, size);
 	if (unlikely(ret < 0))
 out:
 		memset(dst, 0, size);
diff --git a/mm/maccess.c b/mm/maccess.c
index 76f9d4dd51300..683b70518a896 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -159,7 +159,7 @@ long probe_user_write(void __user *dst, const void *src, size_t size)
  * If @count is smaller than the length of the string, copies @count-1 bytes,
  * sets the last byte of @dst buffer to NUL and returns @count.
  *
- * Same as strncpy_from_unsafe_strict() except that for architectures with
+ * Same as strncpy_from_kernel_unsafe() except that for architectures with
  * not fully separated user and kernel address spaces this function also works
  * for user address tanges.
  *
@@ -170,7 +170,7 @@ long __weak strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
     __attribute__((alias("__strncpy_from_unsafe")));
 
 /**
- * strncpy_from_unsafe_strict: - Copy a NUL terminated string from unsafe
+ * strncpy_from_kernel_unsafe: - Copy a NUL terminated string from unsafe
  *				 address.
  * @dst:   Destination address, in kernel space.  This buffer must be at
  *         least @count bytes long.
@@ -187,7 +187,7 @@ long __weak strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
  * If @count is smaller than the length of the string, copies @count-1 bytes,
  * sets the last byte of @dst buffer to NUL and returns @count.
  */
-long __weak strncpy_from_unsafe_strict(char *dst, const void *unsafe_addr,
+long __weak strncpy_from_kernel_unsafe(char *dst, const void *unsafe_addr,
 				       long count)
     __attribute__((alias("__strncpy_from_unsafe")));
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 08/15] maccess: rename strnlen_unsafe_user to strnlen_user_unsafe
  2020-05-06  6:22 clean up and streamline probe_kernel_* and friends Christoph Hellwig
                   ` (6 preceding siblings ...)
  2020-05-06  6:22 ` [PATCH 07/15] maccess: rename strncpy_from_unsafe_strict to strncpy_from_kernel_unsafe Christoph Hellwig
@ 2020-05-06  6:22 ` Christoph Hellwig
  2020-05-06 17:44   ` Linus Torvalds
  2020-05-06  6:22 ` [PATCH 09/15] maccess: remove probe_read_common and probe_write_common Christoph Hellwig
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2020-05-06  6:22 UTC (permalink / raw)
  To: x86, Alexei Starovoitov, Daniel Borkmann, Masami Hiramatsu,
	Linus Torvalds, Andrew Morton
  Cc: linux-parisc, linux-um, netdev, bpf, linux-mm, linux-kernel

This matches the convention of always having _unsafe as a suffix, as
well as match the naming of strncpy_from_user_unsafe.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/uaccess.h     | 2 +-
 kernel/trace/trace_kprobe.c | 2 +-
 mm/maccess.c                | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 221d703480b6e..77909cafde5a8 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -315,7 +315,7 @@ extern long strncpy_from_kernel_unsafe(char *dst, const void *unsafe_addr,
 extern long __strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
 extern long strncpy_from_user_unsafe(char *dst, const void __user *unsafe_addr,
 				     long count);
-extern long strnlen_unsafe_user(const void __user *unsafe_addr, long count);
+extern long strnlen_user_unsafe(const void __user *unsafe_addr, long count);
 
 /**
  * probe_kernel_address(): safely attempt to read from a location
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 67ed9655afc51..a7f43c7ec9880 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1215,7 +1215,7 @@ fetch_store_strlen_user(unsigned long addr)
 {
 	const void __user *uaddr =  (__force const void __user *)addr;
 
-	return strnlen_unsafe_user(uaddr, MAX_STRING_SIZE);
+	return strnlen_user_unsafe(uaddr, MAX_STRING_SIZE);
 }
 
 /*
diff --git a/mm/maccess.c b/mm/maccess.c
index 683b70518a896..95f2bf97e5ad7 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -258,7 +258,7 @@ long strncpy_from_user_unsafe(char *dst, const void __user *unsafe_addr,
 }
 
 /**
- * strnlen_unsafe_user: - Get the size of a user string INCLUDING final NUL.
+ * strnlen_user_unsafe: - Get the size of a user string INCLUDING final NUL.
  * @unsafe_addr: The string to measure.
  * @count: Maximum count (including NUL)
  *
@@ -273,7 +273,7 @@ long strncpy_from_user_unsafe(char *dst, const void __user *unsafe_addr,
  * Unlike strnlen_user, this can be used from IRQ handler etc. because
  * it disables pagefaults.
  */
-long strnlen_unsafe_user(const void __user *unsafe_addr, long count)
+long strnlen_user_unsafe(const void __user *unsafe_addr, long count)
 {
 	mm_segment_t old_fs = get_fs();
 	int ret;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 09/15] maccess: remove probe_read_common and probe_write_common
  2020-05-06  6:22 clean up and streamline probe_kernel_* and friends Christoph Hellwig
                   ` (7 preceding siblings ...)
  2020-05-06  6:22 ` [PATCH 08/15] maccess: rename strnlen_unsafe_user to strnlen_user_unsafe Christoph Hellwig
@ 2020-05-06  6:22 ` Christoph Hellwig
  2020-05-06  6:22 ` [PATCH 10/15] maccess: unify the probe kernel arch hooks Christoph Hellwig
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2020-05-06  6:22 UTC (permalink / raw)
  To: x86, Alexei Starovoitov, Daniel Borkmann, Masami Hiramatsu,
	Linus Torvalds, Andrew Morton
  Cc: linux-parisc, linux-um, netdev, bpf, linux-mm, linux-kernel

Each of the helpers has just two callers, which also different in
dealing with kernel or userspace pointers.  Just open code the logic
in the callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 mm/maccess.c | 63 ++++++++++++++++++++++++----------------------------
 1 file changed, 29 insertions(+), 34 deletions(-)

diff --git a/mm/maccess.c b/mm/maccess.c
index 95f2bf97e5ad7..c18f2dcdb1b88 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -6,30 +6,6 @@
 #include <linux/mm.h>
 #include <linux/uaccess.h>
 
-static __always_inline long
-probe_read_common(void *dst, const void __user *src, size_t size)
-{
-	long ret;
-
-	pagefault_disable();
-	ret = __copy_from_user_inatomic(dst, src, size);
-	pagefault_enable();
-
-	return ret ? -EFAULT : 0;
-}
-
-static __always_inline long
-probe_write_common(void __user *dst, const void *src, size_t size)
-{
-	long ret;
-
-	pagefault_disable();
-	ret = __copy_to_user_inatomic(dst, src, size);
-	pagefault_enable();
-
-	return ret ? -EFAULT : 0;
-}
-
 /**
  * probe_kernel_read(): safely attempt to read from any location
  * @dst: pointer to the buffer that shall take the data
@@ -69,10 +45,15 @@ long __probe_kernel_read(void *dst, const void *src, size_t size)
 	mm_segment_t old_fs = get_fs();
 
 	set_fs(KERNEL_DS);
-	ret = probe_read_common(dst, (__force const void __user *)src, size);
+	pagefault_disable();
+	ret = __copy_from_user_inatomic(dst, (__force const void __user *)src,
+			size);
+	pagefault_enable();
 	set_fs(old_fs);
 
-	return ret;
+	if (ret)
+		return -EFAULT;
+	return 0;
 }
 EXPORT_SYMBOL_GPL(probe_kernel_read);
 
@@ -91,11 +72,16 @@ long probe_user_read(void *dst, const void __user *src, size_t size)
 	mm_segment_t old_fs = get_fs();
 
 	set_fs(USER_DS);
-	if (access_ok(src, size))
-		ret = probe_read_common(dst, src, size);
+	if (access_ok(src, size)) {
+		pagefault_disable();
+		ret = __copy_from_user_inatomic(dst, src, size);
+		pagefault_enable();
+	}
 	set_fs(old_fs);
 
-	return ret;
+	if (ret)
+		return -EFAULT;
+	return 0;
 }
 EXPORT_SYMBOL_GPL(probe_user_read);
 
@@ -114,10 +100,14 @@ long probe_kernel_write(void *dst, const void *src, size_t size)
 	mm_segment_t old_fs = get_fs();
 
 	set_fs(KERNEL_DS);
-	ret = probe_write_common((__force void __user *)dst, src, size);
+	pagefault_disable();
+	ret = __copy_to_user_inatomic((__force void __user *)dst, src, size);
+	pagefault_enable();
 	set_fs(old_fs);
 
-	return ret;
+	if (ret)
+		return -EFAULT;
+	return 0;
 }
 
 /**
@@ -135,11 +125,16 @@ long probe_user_write(void __user *dst, const void *src, size_t size)
 	mm_segment_t old_fs = get_fs();
 
 	set_fs(USER_DS);
-	if (access_ok(dst, size))
-		ret = probe_write_common(dst, src, size);
+	if (access_ok(dst, size)) {
+		pagefault_disable();
+		ret = __copy_to_user_inatomic(dst, src, size);
+		pagefault_enable();
+	}
 	set_fs(old_fs);
 
-	return ret;
+	if (ret)
+		return -EFAULT;
+	return 0;
 }
 
 /**
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 10/15] maccess: unify the probe kernel arch hooks
  2020-05-06  6:22 clean up and streamline probe_kernel_* and friends Christoph Hellwig
                   ` (8 preceding siblings ...)
  2020-05-06  6:22 ` [PATCH 09/15] maccess: remove probe_read_common and probe_write_common Christoph Hellwig
@ 2020-05-06  6:22 ` Christoph Hellwig
  2020-05-06  6:22 ` [PATCH 11/15] maccess: remove strncpy_from_unsafe Christoph Hellwig
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2020-05-06  6:22 UTC (permalink / raw)
  To: x86, Alexei Starovoitov, Daniel Borkmann, Masami Hiramatsu,
	Linus Torvalds, Andrew Morton
  Cc: linux-parisc, linux-um, netdev, bpf, linux-mm, linux-kernel

Currently architectures have to override every routine that probes
kernel memory, which includes a pure read and strcpy, both in strict
and not strict variants.  Just provide a single arch hooks instead to
make sure all architectures cover all the cases.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/parisc/lib/memcpy.c | 13 ++++-------
 arch/um/kernel/maccess.c | 11 ++++-----
 arch/x86/mm/maccess.c    | 18 ++++-----------
 include/linux/uaccess.h  |  5 ++--
 mm/maccess.c             | 49 ++++++++++++++++++++++++++++++----------
 5 files changed, 55 insertions(+), 41 deletions(-)

diff --git a/arch/parisc/lib/memcpy.c b/arch/parisc/lib/memcpy.c
index beceaab34ecb7..5ef648bd33119 100644
--- a/arch/parisc/lib/memcpy.c
+++ b/arch/parisc/lib/memcpy.c
@@ -57,14 +57,11 @@ void * memcpy(void * dst,const void *src, size_t count)
 EXPORT_SYMBOL(raw_copy_in_user);
 EXPORT_SYMBOL(memcpy);
 
-long probe_kernel_read(void *dst, const void *src, size_t size)
+bool probe_kernel_read_allowed(void *dst, const void *unsafe_src, size_t size,
+		bool strict)
 {
-	unsigned long addr = (unsigned long)src;
-
-	if (addr < PAGE_SIZE)
-		return -EFAULT;
-
+	if ((unsigned long)unsafe_src < PAGE_SIZE)
+		return false;
 	/* check for I/O space F_EXTEND(0xfff00000) access as well? */
-
-	return __probe_kernel_read(dst, src, size);
+	return true;
 }
diff --git a/arch/um/kernel/maccess.c b/arch/um/kernel/maccess.c
index 67b2e0fa92bba..90a1bec923158 100644
--- a/arch/um/kernel/maccess.c
+++ b/arch/um/kernel/maccess.c
@@ -7,15 +7,14 @@
 #include <linux/kernel.h>
 #include <os.h>
 
-long probe_kernel_read(void *dst, const void *src, size_t size)
+bool probe_kernel_read_allowed(void *dst, const void *src, size_t size,
+		bool strict)
 {
 	void *psrc = (void *)rounddown((unsigned long)src, PAGE_SIZE);
 
 	if ((unsigned long)src < PAGE_SIZE || size <= 0)
-		return -EFAULT;
-
+		return false;
 	if (os_mincore(psrc, size + src - psrc) <= 0)
-		return -EFAULT;
-
-	return __probe_kernel_read(dst, src, size);
+		return false;
+	return true;
 }
diff --git a/arch/x86/mm/maccess.c b/arch/x86/mm/maccess.c
index 6290e9894fb55..5c323ab187b27 100644
--- a/arch/x86/mm/maccess.c
+++ b/arch/x86/mm/maccess.c
@@ -26,18 +26,10 @@ static __always_inline bool invalid_probe_range(u64 vaddr)
 }
 #endif
 
-long probe_kernel_read_strict(void *dst, const void *src, size_t size)
+bool probe_kernel_read_allowed(void *dst, const void *unsafe_src, size_t size,
+		bool strict)
 {
-	if (unlikely(invalid_probe_range((unsigned long)src)))
-		return -EFAULT;
-
-	return __probe_kernel_read(dst, src, size);
-}
-
-long strncpy_from_kernel_unsafe(char *dst, const void *unsafe_addr, long count)
-{
-	if (unlikely(invalid_probe_range((unsigned long)unsafe_addr)))
-		return -EFAULT;
-
-	return __strncpy_from_unsafe(dst, unsafe_addr, count);
+	if (!strict)
+		return true;
+	return !invalid_probe_range((unsigned long)unsafe_src);
 }
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 77909cafde5a8..f8c47395a92df 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -301,9 +301,11 @@ copy_struct_from_user(void *dst, size_t ksize, const void __user *src,
 	return 0;
 }
 
+bool probe_kernel_read_allowed(void *dst, const void *unsafe_src,
+		size_t size, bool strict);
+
 extern long probe_kernel_read(void *dst, const void *src, size_t size);
 extern long probe_kernel_read_strict(void *dst, const void *src, size_t size);
-extern long __probe_kernel_read(void *dst, const void *src, size_t size);
 extern long probe_user_read(void *dst, const void __user *src, size_t size);
 
 extern long notrace probe_kernel_write(void *dst, const void *src, size_t size);
@@ -312,7 +314,6 @@ extern long notrace probe_user_write(void __user *dst, const void *src, size_t s
 extern long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
 extern long strncpy_from_kernel_unsafe(char *dst, const void *unsafe_addr,
 				       long count);
-extern long __strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
 extern long strncpy_from_user_unsafe(char *dst, const void __user *unsafe_addr,
 				     long count);
 extern long strnlen_user_unsafe(const void __user *unsafe_addr, long count);
diff --git a/mm/maccess.c b/mm/maccess.c
index c18f2dcdb1b88..11563129cd490 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -6,6 +6,17 @@
 #include <linux/mm.h>
 #include <linux/uaccess.h>
 
+static long __probe_kernel_read(void *dst, const void *src, size_t size,
+		bool strict);
+static long __strncpy_from_unsafe(char *dst, const void *unsafe_addr,
+		long count, bool strict);
+
+bool __weak probe_kernel_read_allowed(void *dst, const void *unsafe_src,
+		size_t size, bool strict)
+{
+	return true;
+}
+
 /**
  * probe_kernel_read(): safely attempt to read from any location
  * @dst: pointer to the buffer that shall take the data
@@ -19,8 +30,11 @@
  * DO NOT USE THIS FUNCTION - it is broken on architectures with entirely
  * separate kernel and user address spaces, and also a bad idea otherwise.
  */
-long __weak probe_kernel_read(void *dst, const void *src, size_t size)
-    __attribute__((alias("__probe_kernel_read")));
+long probe_kernel_read(void *dst, const void *src, size_t size)
+{
+	return __probe_kernel_read(dst, src, size, false);
+}
+EXPORT_SYMBOL_GPL(probe_kernel_read);
 
 /**
  * probe_kernel_read_strict(): safely attempt to read from kernel-space
@@ -36,14 +50,20 @@ long __weak probe_kernel_read(void *dst, const void *src, size_t size)
  * probe_kernel_read() suitable for use within regions where the caller
  * already holds mmap_sem, or other locks which nest inside mmap_sem.
  */
-long __weak probe_kernel_read_strict(void *dst, const void *src, size_t size)
-    __attribute__((alias("__probe_kernel_read")));
+long probe_kernel_read_strict(void *dst, const void *src, size_t size)
+{
+	return __probe_kernel_read(dst, src, size, true);
+}
 
-long __probe_kernel_read(void *dst, const void *src, size_t size)
+static long __probe_kernel_read(void *dst, const void *src, size_t size,
+		bool strict)
 {
 	long ret;
 	mm_segment_t old_fs = get_fs();
 
+	if (!probe_kernel_read_allowed(dst, src, size, strict))
+		return -EFAULT;
+
 	set_fs(KERNEL_DS);
 	pagefault_disable();
 	ret = __copy_from_user_inatomic(dst, (__force const void __user *)src,
@@ -55,7 +75,6 @@ long __probe_kernel_read(void *dst, const void *src, size_t size)
 		return -EFAULT;
 	return 0;
 }
-EXPORT_SYMBOL_GPL(probe_kernel_read);
 
 /**
  * probe_user_read(): safely attempt to read from a user-space location
@@ -161,8 +180,10 @@ long probe_user_write(void __user *dst, const void *src, size_t size)
  * DO NOT USE THIS FUNCTION - it is broken on architectures with entirely
  * separate kernel and user address spaces, and also a bad idea otherwise.
  */
-long __weak strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
-    __attribute__((alias("__strncpy_from_unsafe")));
+long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
+{
+	return __strncpy_from_unsafe(dst, unsafe_addr, count, false);
+}
 
 /**
  * strncpy_from_kernel_unsafe: - Copy a NUL terminated string from unsafe
@@ -182,11 +203,13 @@ long __weak strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
  * If @count is smaller than the length of the string, copies @count-1 bytes,
  * sets the last byte of @dst buffer to NUL and returns @count.
  */
-long __weak strncpy_from_kernel_unsafe(char *dst, const void *unsafe_addr,
-				       long count)
-    __attribute__((alias("__strncpy_from_unsafe")));
+long strncpy_from_kernel_unsafe(char *dst, const void *unsafe_addr, long count)
+{
+	return __strncpy_from_unsafe(dst, unsafe_addr, count, true);
+}
 
-long __strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
+static long __strncpy_from_unsafe(char *dst, const void *unsafe_addr,
+		long count, bool strict)
 {
 	mm_segment_t old_fs = get_fs();
 	const void *src = unsafe_addr;
@@ -194,6 +217,8 @@ long __strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
 
 	if (unlikely(count <= 0))
 		return 0;
+	if (!probe_kernel_read_allowed(dst, unsafe_addr, count, strict))
+		return -EFAULT;
 
 	set_fs(KERNEL_DS);
 	pagefault_disable();
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 11/15] maccess: remove strncpy_from_unsafe
  2020-05-06  6:22 clean up and streamline probe_kernel_* and friends Christoph Hellwig
                   ` (9 preceding siblings ...)
  2020-05-06  6:22 ` [PATCH 10/15] maccess: unify the probe kernel arch hooks Christoph Hellwig
@ 2020-05-06  6:22 ` Christoph Hellwig
  2020-05-11  5:34   ` Masami Hiramatsu
  2020-05-06  6:22 ` [PATCH 12/15] maccess: always use strict semantics for probe_kernel_read Christoph Hellwig
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2020-05-06  6:22 UTC (permalink / raw)
  To: x86, Alexei Starovoitov, Daniel Borkmann, Masami Hiramatsu,
	Linus Torvalds, Andrew Morton
  Cc: linux-parisc, linux-um, netdev, bpf, linux-mm, linux-kernel

All three callers really should try the explicit kernel and user
copies instead.  One has already deprecated the somewhat dangerous
either kernel or user address concept, the other two still need to
follow up eventually.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/uaccess.h     |  1 -
 kernel/trace/bpf_trace.c    | 40 ++++++++++++++++++++++++++-----------
 kernel/trace/trace_kprobe.c |  5 ++++-
 mm/maccess.c                | 39 +-----------------------------------
 4 files changed, 33 insertions(+), 52 deletions(-)

diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index f8c47395a92df..09d6e358883cc 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -311,7 +311,6 @@ extern long probe_user_read(void *dst, const void __user *src, size_t size);
 extern long notrace probe_kernel_write(void *dst, const void *src, size_t size);
 extern long notrace probe_user_write(void __user *dst, const void *src, size_t size);
 
-extern long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
 extern long strncpy_from_kernel_unsafe(char *dst, const void *unsafe_addr,
 				       long count);
 extern long strncpy_from_user_unsafe(char *dst, const void __user *unsafe_addr,
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index e4e202f433903..ffe841433caa1 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -229,9 +229,10 @@ bpf_probe_read_kernel_str_common(void *dst, u32 size, const void *unsafe_ptr,
 	int ret = security_locked_down(LOCKDOWN_BPF_READ);
 
 	if (unlikely(ret < 0))
-		goto out;
+		goto fail;
+
 	/*
-	 * The strncpy_from_unsafe_*() call will likely not fill the entire
+	 * The strncpy_from_*_unsafe() call will likely not fill the entire
 	 * buffer, but that's okay in this circumstance as we're probing
 	 * arbitrary memory anyway similar to bpf_probe_read_*() and might
 	 * as well probe the stack. Thus, memory is explicitly cleared
@@ -239,11 +240,18 @@ bpf_probe_read_kernel_str_common(void *dst, u32 size, const void *unsafe_ptr,
 	 * code altogether don't copy garbage; otherwise length of string
 	 * is returned that can be used for bpf_perf_event_output() et al.
 	 */
-	ret = compat ? strncpy_from_unsafe(dst, unsafe_ptr, size) :
-	      strncpy_from_kernel_unsafe(dst, unsafe_ptr, size);
-	if (unlikely(ret < 0))
-out:
-		memset(dst, 0, size);
+	ret = strncpy_from_kernel_unsafe(dst, unsafe_ptr, size);
+	if (unlikely(ret < 0)) {
+		if (compat)
+			ret = strncpy_from_user_unsafe(dst,
+					(__force const void __user *)unsafe_ptr,
+					size);
+		if (ret < 0)
+			goto fail;
+	}
+	return 0;
+fail:
+	memset(dst, 0, size);
 	return ret;
 }
 
@@ -321,6 +329,17 @@ static const struct bpf_func_proto *bpf_get_probe_write_proto(void)
 	return &bpf_probe_write_user_proto;
 }
 
+#define BPF_STRNCPY_LEN 64
+
+static void bpf_strncpy(char *buf, long unsafe_addr)
+{
+	buf[0] = 0;
+	if (strncpy_from_kernel_unsafe(buf, (void *)unsafe_addr,
+			BPF_STRNCPY_LEN))
+		strncpy_from_user_unsafe(buf, (void __user *)unsafe_addr,
+				BPF_STRNCPY_LEN);
+}
+
 /*
  * Only limited trace_printk() conversion specifiers allowed:
  * %d %i %u %x %ld %li %lu %lx %lld %lli %llu %llx %p %s
@@ -332,7 +351,7 @@ BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1,
 	int mod[3] = {};
 	int fmt_cnt = 0;
 	u64 unsafe_addr;
-	char buf[64];
+	char buf[BPF_STRNCPY_LEN];
 	int i;
 
 	/*
@@ -387,10 +406,7 @@ BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1,
 					arg3 = (long) buf;
 					break;
 				}
-				buf[0] = 0;
-				strncpy_from_unsafe(buf,
-						    (void *) (long) unsafe_addr,
-						    sizeof(buf));
+				bpf_strncpy(buf, unsafe_addr);
 			}
 			continue;
 		}
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index a7f43c7ec9880..525d12137325c 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1238,7 +1238,10 @@ fetch_store_string(unsigned long addr, void *dest, void *base)
 	 * Try to get string again, since the string can be changed while
 	 * probing.
 	 */
-	ret = strncpy_from_unsafe(__dest, (void *)addr, maxlen);
+	ret = strncpy_from_kernel_unsafe(__dest, (void *)addr, maxlen);
+	if (ret < 0)
+		ret = strncpy_from_user_unsafe(__dest, (void __user *)addr,
+				maxlen);
 	if (ret >= 0)
 		*(u32 *)dest = make_data_loc(ret, __dest - base);
 
diff --git a/mm/maccess.c b/mm/maccess.c
index 11563129cd490..cbd9d668aa46e 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -8,8 +8,6 @@
 
 static long __probe_kernel_read(void *dst, const void *src, size_t size,
 		bool strict);
-static long __strncpy_from_unsafe(char *dst, const void *unsafe_addr,
-		long count, bool strict);
 
 bool __weak probe_kernel_read_allowed(void *dst, const void *unsafe_src,
 		size_t size, bool strict)
@@ -156,35 +154,6 @@ long probe_user_write(void __user *dst, const void *src, size_t size)
 	return 0;
 }
 
-/**
- * strncpy_from_unsafe: - Copy a NUL terminated string from unsafe address.
- * @dst:   Destination address, in kernel space.  This buffer must be at
- *         least @count bytes long.
- * @unsafe_addr: Unsafe address.
- * @count: Maximum number of bytes to copy, including the trailing NUL.
- *
- * Copies a NUL-terminated string from unsafe address to kernel buffer.
- *
- * On success, returns the length of the string INCLUDING the trailing NUL.
- *
- * If access fails, returns -EFAULT (some data may have been copied
- * and the trailing NUL added).
- *
- * If @count is smaller than the length of the string, copies @count-1 bytes,
- * sets the last byte of @dst buffer to NUL and returns @count.
- *
- * Same as strncpy_from_kernel_unsafe() except that for architectures with
- * not fully separated user and kernel address spaces this function also works
- * for user address tanges.
- *
- * DO NOT USE THIS FUNCTION - it is broken on architectures with entirely
- * separate kernel and user address spaces, and also a bad idea otherwise.
- */
-long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
-{
-	return __strncpy_from_unsafe(dst, unsafe_addr, count, false);
-}
-
 /**
  * strncpy_from_kernel_unsafe: - Copy a NUL terminated string from unsafe
  *				 address.
@@ -204,12 +173,6 @@ long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
  * sets the last byte of @dst buffer to NUL and returns @count.
  */
 long strncpy_from_kernel_unsafe(char *dst, const void *unsafe_addr, long count)
-{
-	return __strncpy_from_unsafe(dst, unsafe_addr, count, true);
-}
-
-static long __strncpy_from_unsafe(char *dst, const void *unsafe_addr,
-		long count, bool strict)
 {
 	mm_segment_t old_fs = get_fs();
 	const void *src = unsafe_addr;
@@ -217,7 +180,7 @@ static long __strncpy_from_unsafe(char *dst, const void *unsafe_addr,
 
 	if (unlikely(count <= 0))
 		return 0;
-	if (!probe_kernel_read_allowed(dst, unsafe_addr, count, strict))
+	if (!probe_kernel_read_allowed(dst, unsafe_addr, count, true))
 		return -EFAULT;
 
 	set_fs(KERNEL_DS);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 12/15] maccess: always use strict semantics for probe_kernel_read
  2020-05-06  6:22 clean up and streamline probe_kernel_* and friends Christoph Hellwig
                   ` (10 preceding siblings ...)
  2020-05-06  6:22 ` [PATCH 11/15] maccess: remove strncpy_from_unsafe Christoph Hellwig
@ 2020-05-06  6:22 ` Christoph Hellwig
  2020-05-11  5:05   ` Masami Hiramatsu
  2020-05-06  6:22 ` [PATCH 13/15] maccess: move user access routines together Christoph Hellwig
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2020-05-06  6:22 UTC (permalink / raw)
  To: x86, Alexei Starovoitov, Daniel Borkmann, Masami Hiramatsu,
	Linus Torvalds, Andrew Morton
  Cc: linux-parisc, linux-um, netdev, bpf, linux-mm, linux-kernel

Except for historical confusion in the kprobes/uprobes and bpf tracers
there is no good reason to ever allow user memory accesses from
probe_kernel_read.  Make the tracers fall back to a probe_user_read
if the probe_kernel_read falls to keep the core API clean.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/parisc/lib/memcpy.c    |  3 +--
 arch/um/kernel/maccess.c    |  3 +--
 arch/x86/mm/maccess.c       |  5 +----
 include/linux/uaccess.h     |  4 +---
 kernel/trace/bpf_trace.c    | 20 +++++++++++++------
 kernel/trace/trace_kprobe.c | 11 ++++++++++-
 mm/maccess.c                | 39 ++++++-------------------------------
 7 files changed, 34 insertions(+), 51 deletions(-)

diff --git a/arch/parisc/lib/memcpy.c b/arch/parisc/lib/memcpy.c
index 5ef648bd33119..9fe662b3b5604 100644
--- a/arch/parisc/lib/memcpy.c
+++ b/arch/parisc/lib/memcpy.c
@@ -57,8 +57,7 @@ void * memcpy(void * dst,const void *src, size_t count)
 EXPORT_SYMBOL(raw_copy_in_user);
 EXPORT_SYMBOL(memcpy);
 
-bool probe_kernel_read_allowed(void *dst, const void *unsafe_src, size_t size,
-		bool strict)
+bool probe_kernel_read_allowed(void *dst, const void *unsafe_src, size_t size)
 {
 	if ((unsigned long)unsafe_src < PAGE_SIZE)
 		return false;
diff --git a/arch/um/kernel/maccess.c b/arch/um/kernel/maccess.c
index 90a1bec923158..734f3d7e57c0f 100644
--- a/arch/um/kernel/maccess.c
+++ b/arch/um/kernel/maccess.c
@@ -7,8 +7,7 @@
 #include <linux/kernel.h>
 #include <os.h>
 
-bool probe_kernel_read_allowed(void *dst, const void *src, size_t size,
-		bool strict)
+bool probe_kernel_read_allowed(void *dst, const void *src, size_t size)
 {
 	void *psrc = (void *)rounddown((unsigned long)src, PAGE_SIZE);
 
diff --git a/arch/x86/mm/maccess.c b/arch/x86/mm/maccess.c
index 5c323ab187b27..a1bd81677aa72 100644
--- a/arch/x86/mm/maccess.c
+++ b/arch/x86/mm/maccess.c
@@ -26,10 +26,7 @@ static __always_inline bool invalid_probe_range(u64 vaddr)
 }
 #endif
 
-bool probe_kernel_read_allowed(void *dst, const void *unsafe_src, size_t size,
-		bool strict)
+bool probe_kernel_read_allowed(void *dst, const void *unsafe_src, size_t size)
 {
-	if (!strict)
-		return true;
 	return !invalid_probe_range((unsigned long)unsafe_src);
 }
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 09d6e358883cc..99e2c2a41164a 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -301,11 +301,9 @@ copy_struct_from_user(void *dst, size_t ksize, const void __user *src,
 	return 0;
 }
 
-bool probe_kernel_read_allowed(void *dst, const void *unsafe_src,
-		size_t size, bool strict);
+bool probe_kernel_read_allowed(void *dst, const void *unsafe_src, size_t size);
 
 extern long probe_kernel_read(void *dst, const void *src, size_t size);
-extern long probe_kernel_read_strict(void *dst, const void *src, size_t size);
 extern long probe_user_read(void *dst, const void __user *src, size_t size);
 
 extern long notrace probe_kernel_write(void *dst, const void *src, size_t size);
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index ffe841433caa1..f694befe8ec9b 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -183,12 +183,20 @@ bpf_probe_read_kernel_common(void *dst, u32 size, const void *unsafe_ptr,
 	int ret = security_locked_down(LOCKDOWN_BPF_READ);
 
 	if (unlikely(ret < 0))
-		goto out;
-	ret = compat ? probe_kernel_read(dst, unsafe_ptr, size) :
-	      probe_kernel_read_strict(dst, unsafe_ptr, size);
-	if (unlikely(ret < 0))
-out:
-		memset(dst, 0, size);
+		goto fail;
+
+	ret = probe_kernel_read(dst, unsafe_ptr, size);
+	if (unlikely(ret < 0)) {
+		if (compat)
+			ret = probe_user_read(dst,
+				(__force const void __user *)unsafe_ptr, size);
+		if (unlikely(ret < 0))
+			goto fail;
+	}
+
+	return 0;
+fail:
+	memset(dst, 0, size);
 	return ret;
 }
 
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 525d12137325c..1300c9fd5c755 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1203,6 +1203,9 @@ fetch_store_strlen(unsigned long addr)
 
 	do {
 		ret = probe_kernel_read(&c, (u8 *)addr + len, 1);
+		if (ret)
+			ret = probe_user_read(&c,
+				(__force u8 __user *)addr + len, 1);
 		len++;
 	} while (c && ret == 0 && len < MAX_STRING_SIZE);
 
@@ -1275,7 +1278,13 @@ fetch_store_string_user(unsigned long addr, void *dest, void *base)
 static nokprobe_inline int
 probe_mem_read(void *dest, void *src, size_t size)
 {
-	return probe_kernel_read(dest, src, size);
+	int ret;
+
+	ret = probe_kernel_read(dest, src, size);
+	if (ret)
+		ret = probe_user_read(dest, (__force const void __user *)src,
+				size);
+	return ret;
 }
 
 static nokprobe_inline int
diff --git a/mm/maccess.c b/mm/maccess.c
index cbd9d668aa46e..811f49e8de113 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -6,36 +6,14 @@
 #include <linux/mm.h>
 #include <linux/uaccess.h>
 
-static long __probe_kernel_read(void *dst, const void *src, size_t size,
-		bool strict);
-
 bool __weak probe_kernel_read_allowed(void *dst, const void *unsafe_src,
-		size_t size, bool strict)
+		size_t size)
 {
 	return true;
 }
 
 /**
- * probe_kernel_read(): safely attempt to read from any location
- * @dst: pointer to the buffer that shall take the data
- * @src: address to read from
- * @size: size of the data chunk
- *
- * Same as probe_kernel_read_strict() except that for architectures with
- * not fully separated user and kernel address spaces this function also works
- * for user address tanges.
- *
- * DO NOT USE THIS FUNCTION - it is broken on architectures with entirely
- * separate kernel and user address spaces, and also a bad idea otherwise.
- */
-long probe_kernel_read(void *dst, const void *src, size_t size)
-{
-	return __probe_kernel_read(dst, src, size, false);
-}
-EXPORT_SYMBOL_GPL(probe_kernel_read);
-
-/**
- * probe_kernel_read_strict(): safely attempt to read from kernel-space
+ * probe_kernel_read(): safely attempt to read from kernel-space
  * @dst: pointer to the buffer that shall take the data
  * @src: address to read from
  * @size: size of the data chunk
@@ -48,18 +26,12 @@ EXPORT_SYMBOL_GPL(probe_kernel_read);
  * probe_kernel_read() suitable for use within regions where the caller
  * already holds mmap_sem, or other locks which nest inside mmap_sem.
  */
-long probe_kernel_read_strict(void *dst, const void *src, size_t size)
-{
-	return __probe_kernel_read(dst, src, size, true);
-}
-
-static long __probe_kernel_read(void *dst, const void *src, size_t size,
-		bool strict)
+long probe_kernel_read(void *dst, const void *src, size_t size)
 {
 	long ret;
 	mm_segment_t old_fs = get_fs();
 
-	if (!probe_kernel_read_allowed(dst, src, size, strict))
+	if (!probe_kernel_read_allowed(dst, src, size))
 		return -EFAULT;
 
 	set_fs(KERNEL_DS);
@@ -73,6 +45,7 @@ static long __probe_kernel_read(void *dst, const void *src, size_t size,
 		return -EFAULT;
 	return 0;
 }
+EXPORT_SYMBOL_GPL(probe_kernel_read);
 
 /**
  * probe_user_read(): safely attempt to read from a user-space location
@@ -180,7 +153,7 @@ long strncpy_from_kernel_unsafe(char *dst, const void *unsafe_addr, long count)
 
 	if (unlikely(count <= 0))
 		return 0;
-	if (!probe_kernel_read_allowed(dst, unsafe_addr, count, true))
+	if (!probe_kernel_read_allowed(dst, unsafe_addr, count))
 		return -EFAULT;
 
 	set_fs(KERNEL_DS);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 13/15] maccess: move user access routines together
  2020-05-06  6:22 clean up and streamline probe_kernel_* and friends Christoph Hellwig
                   ` (11 preceding siblings ...)
  2020-05-06  6:22 ` [PATCH 12/15] maccess: always use strict semantics for probe_kernel_read Christoph Hellwig
@ 2020-05-06  6:22 ` Christoph Hellwig
  2020-05-06  6:22 ` [PATCH 14/15] maccess: allow architectures to provide kernel probing directly Christoph Hellwig
  2020-05-06  6:22 ` [PATCH 15/15] x86: use non-set_fs based maccess routines Christoph Hellwig
  14 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2020-05-06  6:22 UTC (permalink / raw)
  To: x86, Alexei Starovoitov, Daniel Borkmann, Masami Hiramatsu,
	Linus Torvalds, Andrew Morton
  Cc: linux-parisc, linux-um, netdev, bpf, linux-mm, linux-kernel

Move kernel access vs user access routines together to ease upcoming
ifdefs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 mm/maccess.c | 110 +++++++++++++++++++++++++--------------------------
 1 file changed, 55 insertions(+), 55 deletions(-)

diff --git a/mm/maccess.c b/mm/maccess.c
index 811f49e8de113..aa59967d9b658 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -47,34 +47,6 @@ long probe_kernel_read(void *dst, const void *src, size_t size)
 }
 EXPORT_SYMBOL_GPL(probe_kernel_read);
 
-/**
- * probe_user_read(): safely attempt to read from a user-space location
- * @dst: pointer to the buffer that shall take the data
- * @src: address to read from. This must be a user address.
- * @size: size of the data chunk
- *
- * Safely read from user address @src to the buffer at @dst. If a kernel fault
- * happens, handle that and return -EFAULT.
- */
-long probe_user_read(void *dst, const void __user *src, size_t size)
-{
-	long ret = -EFAULT;
-	mm_segment_t old_fs = get_fs();
-
-	set_fs(USER_DS);
-	if (access_ok(src, size)) {
-		pagefault_disable();
-		ret = __copy_from_user_inatomic(dst, src, size);
-		pagefault_enable();
-	}
-	set_fs(old_fs);
-
-	if (ret)
-		return -EFAULT;
-	return 0;
-}
-EXPORT_SYMBOL_GPL(probe_user_read);
-
 /**
  * probe_kernel_write(): safely attempt to write to a location
  * @dst: address to write to
@@ -100,33 +72,6 @@ long probe_kernel_write(void *dst, const void *src, size_t size)
 	return 0;
 }
 
-/**
- * probe_user_write(): safely attempt to write to a user-space location
- * @dst: address to write to
- * @src: pointer to the data that shall be written
- * @size: size of the data chunk
- *
- * Safely write to address @dst from the buffer at @src.  If a kernel fault
- * happens, handle that and return -EFAULT.
- */
-long probe_user_write(void __user *dst, const void *src, size_t size)
-{
-	long ret = -EFAULT;
-	mm_segment_t old_fs = get_fs();
-
-	set_fs(USER_DS);
-	if (access_ok(dst, size)) {
-		pagefault_disable();
-		ret = __copy_to_user_inatomic(dst, src, size);
-		pagefault_enable();
-	}
-	set_fs(old_fs);
-
-	if (ret)
-		return -EFAULT;
-	return 0;
-}
-
 /**
  * strncpy_from_kernel_unsafe: - Copy a NUL terminated string from unsafe
  *				 address.
@@ -170,6 +115,61 @@ long strncpy_from_kernel_unsafe(char *dst, const void *unsafe_addr, long count)
 	return ret ? -EFAULT : src - unsafe_addr;
 }
 
+/**
+ * probe_user_read(): safely attempt to read from a user-space location
+ * @dst: pointer to the buffer that shall take the data
+ * @src: address to read from. This must be a user address.
+ * @size: size of the data chunk
+ *
+ * Safely read from user address @src to the buffer at @dst. If a kernel fault
+ * happens, handle that and return -EFAULT.
+ */
+long probe_user_read(void *dst, const void __user *src, size_t size)
+{
+	long ret = -EFAULT;
+	mm_segment_t old_fs = get_fs();
+
+	set_fs(USER_DS);
+	if (access_ok(src, size)) {
+		pagefault_disable();
+		ret = __copy_from_user_inatomic(dst, src, size);
+		pagefault_enable();
+	}
+	set_fs(old_fs);
+
+	if (ret)
+		return -EFAULT;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(probe_user_read);
+
+/**
+ * probe_user_write(): safely attempt to write to a user-space location
+ * @dst: address to write to
+ * @src: pointer to the data that shall be written
+ * @size: size of the data chunk
+ *
+ * Safely write to address @dst from the buffer at @src.  If a kernel fault
+ * happens, handle that and return -EFAULT.
+ */
+long probe_user_write(void __user *dst, const void *src, size_t size)
+{
+	long ret = -EFAULT;
+	mm_segment_t old_fs = get_fs();
+
+	set_fs(USER_DS);
+	if (access_ok(dst, size)) {
+		pagefault_disable();
+		ret = __copy_to_user_inatomic(dst, src, size);
+		pagefault_enable();
+	}
+	set_fs(old_fs);
+
+	if (ret)
+		return -EFAULT;
+	return 0;
+}
+
 /**
  * strncpy_from_user_unsafe: - Copy a NUL terminated string from unsafe user
  *				address.
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 14/15] maccess: allow architectures to provide kernel probing directly
  2020-05-06  6:22 clean up and streamline probe_kernel_* and friends Christoph Hellwig
                   ` (12 preceding siblings ...)
  2020-05-06  6:22 ` [PATCH 13/15] maccess: move user access routines together Christoph Hellwig
@ 2020-05-06  6:22 ` Christoph Hellwig
  2020-05-06  6:22 ` [PATCH 15/15] x86: use non-set_fs based maccess routines Christoph Hellwig
  14 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2020-05-06  6:22 UTC (permalink / raw)
  To: x86, Alexei Starovoitov, Daniel Borkmann, Masami Hiramatsu,
	Linus Torvalds, Andrew Morton
  Cc: linux-parisc, linux-um, netdev, bpf, linux-mm, linux-kernel

Provide alternative versions of probe_kernel_read, probe_kernel_write
and strncpy_from_kernel_unsafe that don't need set_fs magic, but instead
use arch hooks that are modelled after unsafe_{get,put}_user to access
kernel memory in an exception safe way.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 mm/maccess.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)

diff --git a/mm/maccess.c b/mm/maccess.c
index aa59967d9b658..d99a5a67fa9b3 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -12,6 +12,81 @@ bool __weak probe_kernel_read_allowed(void *dst, const void *unsafe_src,
 	return true;
 }
 
+#ifdef HAVE_ARCH_PROBE_KERNEL
+
+#define probe_kernel_read_loop(dst, src, len, type, err_label)		\
+	while (len >= sizeof(type)) {					\
+		arch_kernel_read(dst, src, type, err_label);		\
+		dst += sizeof(type);					\
+		src += sizeof(type);					\
+		len -= sizeof(type);					\
+	}
+
+long probe_kernel_read(void *dst, const void *src, size_t size)
+{
+	if (!probe_kernel_read_allowed(dst, src, size))
+		return -EFAULT;
+
+	pagefault_disable();
+	probe_kernel_read_loop(dst, src, size, u64, Efault);
+	probe_kernel_read_loop(dst, src, size, u32, Efault);
+	probe_kernel_read_loop(dst, src, size, u16, Efault);
+	probe_kernel_read_loop(dst, src, size, u8, Efault);
+	pagefault_enable();
+	return 0;
+Efault:
+	pagefault_enable();
+	return -EFAULT;
+}
+EXPORT_SYMBOL_GPL(probe_kernel_read);
+
+#define probe_kernel_write_loop(dst, src, len, type, err_label)		\
+	while (len >= sizeof(type)) {					\
+		arch_kernel_write(dst, src, type, err_label);		\
+		dst += sizeof(type);					\
+		src += sizeof(type);					\
+		len -= sizeof(type);					\
+	}
+
+long probe_kernel_write(void *dst, const void *src, size_t size)
+{
+	pagefault_disable();
+	probe_kernel_write_loop(dst, src, size, u64, Efault);
+	probe_kernel_write_loop(dst, src, size, u32, Efault);
+	probe_kernel_write_loop(dst, src, size, u16, Efault);
+	probe_kernel_write_loop(dst, src, size, u8, Efault);
+	pagefault_enable();
+	return 0;
+Efault:
+	pagefault_enable();
+	return -EFAULT;
+}
+
+long strncpy_from_kernel_unsafe(char *dst, const void *unsafe_addr, long count)
+{
+	const void *src = unsafe_addr;
+
+	if (unlikely(count <= 0))
+		return 0;
+	if (!probe_kernel_read_allowed(dst, unsafe_addr, count))
+		return -EFAULT;
+
+	pagefault_disable();
+	do {
+		arch_kernel_read(dst, src, u8, Efault);
+		dst++;
+		src++;
+	} while (dst[-1] && src - unsafe_addr < count);
+	pagefault_enable();
+
+	dst[-1] = '\0';
+	return src - unsafe_addr;
+Efault:
+	pagefault_enable();
+	dst[-1] = '\0';
+	return -EFAULT;
+}
+#else /* HAVE_ARCH_PROBE_KERNEL */
 /**
  * probe_kernel_read(): safely attempt to read from kernel-space
  * @dst: pointer to the buffer that shall take the data
@@ -114,6 +189,7 @@ long strncpy_from_kernel_unsafe(char *dst, const void *unsafe_addr, long count)
 
 	return ret ? -EFAULT : src - unsafe_addr;
 }
+#endif /* HAVE_ARCH_PROBE_KERNEL */
 
 /**
  * probe_user_read(): safely attempt to read from a user-space location
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 15/15] x86: use non-set_fs based maccess routines
  2020-05-06  6:22 clean up and streamline probe_kernel_* and friends Christoph Hellwig
                   ` (13 preceding siblings ...)
  2020-05-06  6:22 ` [PATCH 14/15] maccess: allow architectures to provide kernel probing directly Christoph Hellwig
@ 2020-05-06  6:22 ` Christoph Hellwig
  2020-05-06 17:51   ` Linus Torvalds
  14 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2020-05-06  6:22 UTC (permalink / raw)
  To: x86, Alexei Starovoitov, Daniel Borkmann, Masami Hiramatsu,
	Linus Torvalds, Andrew Morton
  Cc: linux-parisc, linux-um, netdev, bpf, linux-mm, linux-kernel

Provide arch_kernel_read and arch_kernel_write routines to implement the
maccess routines without messing with set_fs and without stac/clac that
opens up access to user space.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/x86/include/asm/uaccess.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index d8f283b9a569c..765e18417b3ba 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -523,5 +523,21 @@ do {									\
 	unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u8, label);	\
 } while (0)
 
+#define HAVE_ARCH_PROBE_KERNEL
+
+#define arch_kernel_read(dst, src, type, err_label)			\
+do {									\
+        int __kr_err;							\
+									\
+	__get_user_size(*((type *)dst), (__force type __user *)src,	\
+			sizeof(type), __kr_err);			\
+        if (unlikely(__kr_err))						\
+		goto err_label;						\
+} while (0)
+
+#define arch_kernel_write(dst, src, type, err_label)			\
+	__put_user_size(*((type *)(src)), (__force type __user *)(dst),	\
+			sizeof(type), err_label)
+
 #endif /* _ASM_X86_UACCESS_H */
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 08/15] maccess: rename strnlen_unsafe_user to strnlen_user_unsafe
  2020-05-06  6:22 ` [PATCH 08/15] maccess: rename strnlen_unsafe_user to strnlen_user_unsafe Christoph Hellwig
@ 2020-05-06 17:44   ` Linus Torvalds
  2020-05-06 17:47     ` Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Linus Torvalds @ 2020-05-06 17:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: the arch/x86 maintainers, Alexei Starovoitov, Daniel Borkmann,
	Masami Hiramatsu, Andrew Morton, linux-parisc, linux-um, Netdev,
	bpf, Linux-MM, Linux Kernel Mailing List

On Tue, May 5, 2020 at 11:22 PM Christoph Hellwig <hch@lst.de> wrote:
>
> This matches the convention of always having _unsafe as a suffix, as
> well as match the naming of strncpy_from_user_unsafe.

Hmm. While renaming them, can we perhaps clarify what the rules are?

We now have two different kinds of "unsafe". We have the
"unsafe_get_user()" kind of unsafe: the user pointer itself is unsafe
because it isn't checked, and you need to use a "user_access_begin()"
to verify.

It's the new form of "__get_user()".

And then we have the strncpy_from_user_unsafe(), which is really more
like the "probe_kernel_read()" kind of funtion, in that it's about the
context, and not faulting.

Honestly, I don't like the "unsafe" in the second case, because
there's nothing "unsafe" about the function. It's used in odd
contexts. I guess to some degree those are "unsafe" contexts, but I
think it might be better to clarify.

So while I think using a consistent convention is good, and it's true
that there is a difference in the convention between the two cases
("unsafe" at the beginning vs end), one of them is actually about the
safety and security of the operation (and we have automated logic
these days to verify it on x86), the other has nothing to do with
"safety", really.

Would it be better to standardize around a "probe_xyz()" naming?

Or perhaps a "xyz_nofault()" naming?

I'm not a huge fan of the "probe" naming, but it sure is descriptive,
which is a really good thing.

Another option would be to make it explicitly about _what_ is
"unsafe", ie that it's about not having a process context that can be
faulted in. But "xyz_unsafe_context()" is much too verbose.
"xyz_noctx()" might work.

I realize this is nit-picky, and I think the patch series as-is is
already an improvement, but I do think our naming in this area is
really quite bad.

The fact that we have "probe_kernel_read()" but then
"strncpy_from_user_unsafe()" for the _same_ conceptual difference
really tells me how inconsistent the naming for these kinds of "we
can't take page faults" is. No?

                   Linus

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 08/15] maccess: rename strnlen_unsafe_user to strnlen_user_unsafe
  2020-05-06 17:44   ` Linus Torvalds
@ 2020-05-06 17:47     ` Christoph Hellwig
  2020-05-06 17:57       ` Linus Torvalds
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2020-05-06 17:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, the arch/x86 maintainers, Alexei Starovoitov,
	Daniel Borkmann, Masami Hiramatsu, Andrew Morton, linux-parisc,
	linux-um, Netdev, bpf, Linux-MM, Linux Kernel Mailing List

On Wed, May 06, 2020 at 10:44:15AM -0700, Linus Torvalds wrote:
> So while I think using a consistent convention is good, and it's true
> that there is a difference in the convention between the two cases
> ("unsafe" at the beginning vs end), one of them is actually about the
> safety and security of the operation (and we have automated logic
> these days to verify it on x86), the other has nothing to do with
> "safety", really.
> 
> Would it be better to standardize around a "probe_xyz()" naming?

So:

  probe_strncpy, probe_strncpy_user, probe_strnlen_user?

Sounds weird, but at least it is consistent.

> Or perhaps a "xyz_nofault()" naming?

That sounds a little better:

   strncpy_nofault, strncpy_user_nofault, strnlen_user_nofault

> I realize this is nit-picky, and I think the patch series as-is is
> already an improvement, but I do think our naming in this area is
> really quite bad.

Always open for improvements :)

> The fact that we have "probe_kernel_read()" but then
> "strncpy_from_user_unsafe()" for the _same_ conceptual difference
> really tells me how inconsistent the naming for these kinds of "we
> can't take page faults" is. No?

True.  If we wanted to do _nofaul, what would the basic read/write
versions be?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 15/15] x86: use non-set_fs based maccess routines
  2020-05-06  6:22 ` [PATCH 15/15] x86: use non-set_fs based maccess routines Christoph Hellwig
@ 2020-05-06 17:51   ` Linus Torvalds
  2020-05-06 18:15     ` Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Linus Torvalds @ 2020-05-06 17:51 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: the arch/x86 maintainers, Alexei Starovoitov, Daniel Borkmann,
	Masami Hiramatsu, Andrew Morton, linux-parisc, linux-um, Netdev,
	bpf, Linux-MM, Linux Kernel Mailing List

On Tue, May 5, 2020 at 11:23 PM Christoph Hellwig <hch@lst.de> wrote:
>
> +#define arch_kernel_read(dst, src, type, err_label)                    \
> +       __get_user_size(*((type *)dst), (__force type __user *)src,     \
> +                       sizeof(type), __kr_err);                        \
..
> +#define arch_kernel_write(dst, src, type, err_label)                   \
> +       __put_user_size(*((type *)(src)), (__force type __user *)(dst), \
> +                       sizeof(type), err_label)

My private tree no longer has those __get/put_user_size() things,
because "unsafe_get/put_user()" is the only thing that remains with my
conversion to asm goto.

And we're actively trying to get rid of the whole __get_user() mess.
Admittedly "__get_user_size()" is just the internal helper that
doesn't have the problem, but it really is an internal helper for a
legacy operation, and the new op that uses it is that
"unsafe_get_user()".

Also, because you use __get_user_size(), you then have to duplicate
the error handling logic that we already have in unsafe_get_user().

IOW - is there some reason why you didn't just make these use
"unsafe_get/put_user()" directly, and avoid both of those issues?

              Linus

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 08/15] maccess: rename strnlen_unsafe_user to strnlen_user_unsafe
  2020-05-06 17:47     ` Christoph Hellwig
@ 2020-05-06 17:57       ` Linus Torvalds
  0 siblings, 0 replies; 26+ messages in thread
From: Linus Torvalds @ 2020-05-06 17:57 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: the arch/x86 maintainers, Alexei Starovoitov, Daniel Borkmann,
	Masami Hiramatsu, Andrew Morton, linux-parisc, linux-um, Netdev,
	bpf, Linux-MM, Linux Kernel Mailing List

On Wed, May 6, 2020 at 10:47 AM Christoph Hellwig <hch@lst.de> wrote:
>
> > The fact that we have "probe_kernel_read()" but then
> > "strncpy_from_user_unsafe()" for the _same_ conceptual difference
> > really tells me how inconsistent the naming for these kinds of "we
> > can't take page faults" is. No?
>
> True.  If we wanted to do _nofaul, what would the basic read/write
> versions be?

I think "copy_to/from_kernel_nofault()" might be the most consistent
model, if we are looking to be kind of consistent with the user access
functions..

Unless we want to make "memcpy" be part of the name, but I think that
we really want to have that 'from/to' part anyway, which forces the
"copy_from/to_xyz" kind of naming.

I dunno. I don't want to be too nit-picky, I just would like us to be
more consistent and have the naming say what's up without having
multiple different versions of the same thing.

We've had this same discussion with the nvdimm case, but there the
issues are somewhat different (faulting is ok on user addresses - you
can sleep - but kernel address faults aren't about the _context_ any
more, they are about the data not being safe to access any more)

Anybody else?

             Linus

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 15/15] x86: use non-set_fs based maccess routines
  2020-05-06 17:51   ` Linus Torvalds
@ 2020-05-06 18:15     ` Christoph Hellwig
  2020-05-06 19:01       ` Linus Torvalds
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2020-05-06 18:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, the arch/x86 maintainers, Alexei Starovoitov,
	Daniel Borkmann, Masami Hiramatsu, Andrew Morton, linux-parisc,
	linux-um, Netdev, bpf, Linux-MM, Linux Kernel Mailing List

On Wed, May 06, 2020 at 10:51:51AM -0700, Linus Torvalds wrote:
> My private tree no longer has those __get/put_user_size() things,
> because "unsafe_get/put_user()" is the only thing that remains with my
> conversion to asm goto.
> 
> And we're actively trying to get rid of the whole __get_user() mess.
> Admittedly "__get_user_size()" is just the internal helper that
> doesn't have the problem, but it really is an internal helper for a
> legacy operation, and the new op that uses it is that
> "unsafe_get_user()".
> 
> Also, because you use __get_user_size(), you then have to duplicate
> the error handling logic that we already have in unsafe_get_user().
> 
> IOW - is there some reason why you didn't just make these use
> "unsafe_get/put_user()" directly, and avoid both of those issues?

That was the first prototype, and or x86 it works great, just the
__user cases in maccess.c are a little ugly.  And they point to
the real problem - for architectures like sparc and s390 that use
an entirely separate address space for the kernel vs userspace
I dont think just use unsafe_{get,put}_user will work, as they need
different instructions.

Btw, where is you magic private tree and what is the plan for it?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 15/15] x86: use non-set_fs based maccess routines
  2020-05-06 18:15     ` Christoph Hellwig
@ 2020-05-06 19:01       ` Linus Torvalds
  2020-05-07  5:12         ` Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Linus Torvalds @ 2020-05-06 19:01 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: the arch/x86 maintainers, Alexei Starovoitov, Daniel Borkmann,
	Masami Hiramatsu, Andrew Morton, linux-parisc, linux-um, Netdev,
	bpf, Linux-MM, Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 2833 bytes --]

On Wed, May 6, 2020 at 11:15 AM Christoph Hellwig <hch@lst.de> wrote:
>
> That was the first prototype, and or x86 it works great, just the
> __user cases in maccess.c are a little ugly.  And they point to
> the real problem - for architectures like sparc and s390 that use
> an entirely separate address space for the kernel vs userspace
> I dont think just use unsafe_{get,put}_user will work, as they need
> different instructions.

Oh, absolutely. I did *NOT* mean that you'd use "unsafe_get_user()" as
the actual interface. I just meant that as an implementation detail on
x86, using "unsafe_get_user()" instead of "__get_user_size()"
internally both simplifies the implementation, and means that it
doesn't clash horribly with my local changes.

Btw, that brings up another issue: so that people can't mis-use those
kernel accessors and use them for user addresses, they probably should
actually do something like

        if ((long)addr >= 0)
                goto error_label;

on x86. IOW, have the "strict" kernel pointer behavior.

Otherwise somebody will start using them for user pointers, and it
will happen to work on old x86 without CLAC/STAC support.

Of course, maybe CLAC/STAC is so common these days (at least with
developers) that we don't have to worry about it.

> Btw, where is you magic private tree and what is the plan for it?

I don't want to make it a public branch, but here's a private bundle.

It's based on top of my current -git tree - I just maintain a separate
tree that I keep up-to-date locally for testing. My "normal" tree I do
build tests on (allmodconfig etc), this separate tree I keep around to
actually do boot tests on, and I end up using "current Linus' tree
plus this" as the code I actually run om my main desktop.

But this *ONLY* works with clang, and only with current HEAD of the
clang development tree. So it's almosty entirely useless to anybody
else right now. You literally have to clone the llvm tree, build your
own clang, and install it to even _build_ this.

I'm not planning on going any further than my local testing until the
whole thing calms down. The llvm tree still has some known bugs in the
asm goto with output area, and I want there to be an actual release of
it before I actually merge anything like this (and I need to do the
small extra work to then have that conditional "does the compiler
support asm goto with outputs" so that it works with gcc too).

But here you see what it is, if you want to. __get_user_size()
technically still exists, but it has the "target branch" semantics in
here, so your patch clashes badly with it. (Well, those are the
semantics you want, so "badly" may not be the right word, but
basically it means that if you _had_ used unsafe_get_user(), there
wouldn't have been those kinds of semantic conflicts).

                Linus

[-- Attachment #2: asm-goto-outputs.bundle --]
[-- Type: application/octet-stream, Size: 6251 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 15/15] x86: use non-set_fs based maccess routines
  2020-05-06 19:01       ` Linus Torvalds
@ 2020-05-07  5:12         ` Christoph Hellwig
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2020-05-07  5:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, the arch/x86 maintainers, Alexei Starovoitov,
	Daniel Borkmann, Masami Hiramatsu, Andrew Morton, linux-parisc,
	linux-um, Netdev, bpf, Linux-MM, Linux Kernel Mailing List

On Wed, May 06, 2020 at 12:01:32PM -0700, Linus Torvalds wrote:
> Oh, absolutely. I did *NOT* mean that you'd use "unsafe_get_user()" as
> the actual interface. I just meant that as an implementation detail on
> x86, using "unsafe_get_user()" instead of "__get_user_size()"
> internally both simplifies the implementation, and means that it
> doesn't clash horribly with my local changes.

I had a version that just wrapped them, but somehow wasn't able to
make it work due to all the side effects vs macros issues.  Maybe I
need to try again, the current version seemed like a nice way out
as it avoided a lot of the silly casting.


> Btw, that brings up another issue: so that people can't mis-use those
> kernel accessors and use them for user addresses, they probably should
> actually do something like
> 
>         if ((long)addr >= 0)
>                 goto error_label;
> 
> on x86. IOW, have the "strict" kernel pointer behavior.
> 
> Otherwise somebody will start using them for user pointers, and it
> will happen to work on old x86 without CLAC/STAC support.
> 
> Of course, maybe CLAC/STAC is so common these days (at least with
> developers) that we don't have to worry about it.

The actual public routines (probe_kernel_read and co) get these
checks through probe_kernel_read_allowed, which is implemented by
the x86 code.  Doing this for every 1-8 byte access might be a little
slow, though.  Do you really fear drivers starting to use the low-level
helper?  Maybe we need to move those into a different header than
<asm/uaccess.h> that makes it more clear that they are internal?

> But here you see what it is, if you want to. __get_user_size()
> technically still exists, but it has the "target branch" semantics in
> here, so your patch clashes badly with it.

The target branch semantics actually are what I want, that is how the
maccess code is structured.  This is the diff I'd need for the calling
conventions in your bundle:


diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 765e18417b3ba..d1c8aacedade1 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -526,14 +526,8 @@ do {									\
 #define HAVE_ARCH_PROBE_KERNEL
 
 #define arch_kernel_read(dst, src, type, err_label)			\
-do {									\
-        int __kr_err;							\
-									\
 	__get_user_size(*((type *)dst), (__force type __user *)src,	\
-			sizeof(type), __kr_err);			\
-        if (unlikely(__kr_err))						\
-		goto err_label;						\
-} while (0)
+			sizeof(type), err_label);			\
 
 #define arch_kernel_write(dst, src, type, err_label)			\
 	__put_user_size(*((type *)(src)), (__force type __user *)(dst),	\

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 12/15] maccess: always use strict semantics for probe_kernel_read
  2020-05-06  6:22 ` [PATCH 12/15] maccess: always use strict semantics for probe_kernel_read Christoph Hellwig
@ 2020-05-11  5:05   ` Masami Hiramatsu
  2020-05-11  5:27     ` Masami Hiramatsu
  0 siblings, 1 reply; 26+ messages in thread
From: Masami Hiramatsu @ 2020-05-11  5:05 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: x86, Alexei Starovoitov, Daniel Borkmann, Linus Torvalds,
	Andrew Morton, linux-parisc, linux-um, netdev, bpf, linux-mm,
	linux-kernel

Hi Christoph,

At first, thank you for your work on cleaning up these functions!

On Wed,  6 May 2020 08:22:20 +0200
Christoph Hellwig <hch@lst.de> wrote:

> Except for historical confusion in the kprobes/uprobes and bpf tracers
> there is no good reason to ever allow user memory accesses from
> probe_kernel_read.

Yes, thus now trace_kprobe supports "ustring" type for accessing
user space memory. (If the address spaces are overwrapped, we have
no way to distinguish whether an address is kernel or user)

>  Make the tracers fall back to a probe_user_read
> if the probe_kernel_read falls to keep the core API clean.

For trace_kprobes doesn't need to fall back. User must specify
the probe should be read from user space or kernel space. This is
because it has  fetch_store_string_user() and probe_mem_read_user()
variants.

Thank you,


> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  arch/parisc/lib/memcpy.c    |  3 +--
>  arch/um/kernel/maccess.c    |  3 +--
>  arch/x86/mm/maccess.c       |  5 +----
>  include/linux/uaccess.h     |  4 +---
>  kernel/trace/bpf_trace.c    | 20 +++++++++++++------
>  kernel/trace/trace_kprobe.c | 11 ++++++++++-
>  mm/maccess.c                | 39 ++++++-------------------------------
>  7 files changed, 34 insertions(+), 51 deletions(-)
> 
> diff --git a/arch/parisc/lib/memcpy.c b/arch/parisc/lib/memcpy.c
> index 5ef648bd33119..9fe662b3b5604 100644
> --- a/arch/parisc/lib/memcpy.c
> +++ b/arch/parisc/lib/memcpy.c
> @@ -57,8 +57,7 @@ void * memcpy(void * dst,const void *src, size_t count)
>  EXPORT_SYMBOL(raw_copy_in_user);
>  EXPORT_SYMBOL(memcpy);
>  
> -bool probe_kernel_read_allowed(void *dst, const void *unsafe_src, size_t size,
> -		bool strict)
> +bool probe_kernel_read_allowed(void *dst, const void *unsafe_src, size_t size)
>  {
>  	if ((unsigned long)unsafe_src < PAGE_SIZE)
>  		return false;
> diff --git a/arch/um/kernel/maccess.c b/arch/um/kernel/maccess.c
> index 90a1bec923158..734f3d7e57c0f 100644
> --- a/arch/um/kernel/maccess.c
> +++ b/arch/um/kernel/maccess.c
> @@ -7,8 +7,7 @@
>  #include <linux/kernel.h>
>  #include <os.h>
>  
> -bool probe_kernel_read_allowed(void *dst, const void *src, size_t size,
> -		bool strict)
> +bool probe_kernel_read_allowed(void *dst, const void *src, size_t size)
>  {
>  	void *psrc = (void *)rounddown((unsigned long)src, PAGE_SIZE);
>  
> diff --git a/arch/x86/mm/maccess.c b/arch/x86/mm/maccess.c
> index 5c323ab187b27..a1bd81677aa72 100644
> --- a/arch/x86/mm/maccess.c
> +++ b/arch/x86/mm/maccess.c
> @@ -26,10 +26,7 @@ static __always_inline bool invalid_probe_range(u64 vaddr)
>  }
>  #endif
>  
> -bool probe_kernel_read_allowed(void *dst, const void *unsafe_src, size_t size,
> -		bool strict)
> +bool probe_kernel_read_allowed(void *dst, const void *unsafe_src, size_t size)
>  {
> -	if (!strict)
> -		return true;
>  	return !invalid_probe_range((unsigned long)unsafe_src);
>  }
> diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
> index 09d6e358883cc..99e2c2a41164a 100644
> --- a/include/linux/uaccess.h
> +++ b/include/linux/uaccess.h
> @@ -301,11 +301,9 @@ copy_struct_from_user(void *dst, size_t ksize, const void __user *src,
>  	return 0;
>  }
>  
> -bool probe_kernel_read_allowed(void *dst, const void *unsafe_src,
> -		size_t size, bool strict);
> +bool probe_kernel_read_allowed(void *dst, const void *unsafe_src, size_t size);
>  
>  extern long probe_kernel_read(void *dst, const void *src, size_t size);
> -extern long probe_kernel_read_strict(void *dst, const void *src, size_t size);
>  extern long probe_user_read(void *dst, const void __user *src, size_t size);
>  
>  extern long notrace probe_kernel_write(void *dst, const void *src, size_t size);
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index ffe841433caa1..f694befe8ec9b 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -183,12 +183,20 @@ bpf_probe_read_kernel_common(void *dst, u32 size, const void *unsafe_ptr,
>  	int ret = security_locked_down(LOCKDOWN_BPF_READ);
>  
>  	if (unlikely(ret < 0))
> -		goto out;
> -	ret = compat ? probe_kernel_read(dst, unsafe_ptr, size) :
> -	      probe_kernel_read_strict(dst, unsafe_ptr, size);
> -	if (unlikely(ret < 0))
> -out:
> -		memset(dst, 0, size);
> +		goto fail;
> +
> +	ret = probe_kernel_read(dst, unsafe_ptr, size);
> +	if (unlikely(ret < 0)) {
> +		if (compat)
> +			ret = probe_user_read(dst,
> +				(__force const void __user *)unsafe_ptr, size);
> +		if (unlikely(ret < 0))
> +			goto fail;
> +	}
> +
> +	return 0;
> +fail:
> +	memset(dst, 0, size);
>  	return ret;
>  }
>  
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index 525d12137325c..1300c9fd5c755 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -1203,6 +1203,9 @@ fetch_store_strlen(unsigned long addr)
>  
>  	do {
>  		ret = probe_kernel_read(&c, (u8 *)addr + len, 1);
> +		if (ret)
> +			ret = probe_user_read(&c,
> +				(__force u8 __user *)addr + len, 1);
>  		len++;
>  	} while (c && ret == 0 && len < MAX_STRING_SIZE);
>  
> @@ -1275,7 +1278,13 @@ fetch_store_string_user(unsigned long addr, void *dest, void *base)
>  static nokprobe_inline int
>  probe_mem_read(void *dest, void *src, size_t size)
>  {
> -	return probe_kernel_read(dest, src, size);
> +	int ret;
> +
> +	ret = probe_kernel_read(dest, src, size);
> +	if (ret)
> +		ret = probe_user_read(dest, (__force const void __user *)src,
> +				size);
> +	return ret;
>  }
>  
>  static nokprobe_inline int
> diff --git a/mm/maccess.c b/mm/maccess.c
> index cbd9d668aa46e..811f49e8de113 100644
> --- a/mm/maccess.c
> +++ b/mm/maccess.c
> @@ -6,36 +6,14 @@
>  #include <linux/mm.h>
>  #include <linux/uaccess.h>
>  
> -static long __probe_kernel_read(void *dst, const void *src, size_t size,
> -		bool strict);
> -
>  bool __weak probe_kernel_read_allowed(void *dst, const void *unsafe_src,
> -		size_t size, bool strict)
> +		size_t size)
>  {
>  	return true;
>  }
>  
>  /**
> - * probe_kernel_read(): safely attempt to read from any location
> - * @dst: pointer to the buffer that shall take the data
> - * @src: address to read from
> - * @size: size of the data chunk
> - *
> - * Same as probe_kernel_read_strict() except that for architectures with
> - * not fully separated user and kernel address spaces this function also works
> - * for user address tanges.
> - *
> - * DO NOT USE THIS FUNCTION - it is broken on architectures with entirely
> - * separate kernel and user address spaces, and also a bad idea otherwise.
> - */
> -long probe_kernel_read(void *dst, const void *src, size_t size)
> -{
> -	return __probe_kernel_read(dst, src, size, false);
> -}
> -EXPORT_SYMBOL_GPL(probe_kernel_read);
> -
> -/**
> - * probe_kernel_read_strict(): safely attempt to read from kernel-space
> + * probe_kernel_read(): safely attempt to read from kernel-space
>   * @dst: pointer to the buffer that shall take the data
>   * @src: address to read from
>   * @size: size of the data chunk
> @@ -48,18 +26,12 @@ EXPORT_SYMBOL_GPL(probe_kernel_read);
>   * probe_kernel_read() suitable for use within regions where the caller
>   * already holds mmap_sem, or other locks which nest inside mmap_sem.
>   */
> -long probe_kernel_read_strict(void *dst, const void *src, size_t size)
> -{
> -	return __probe_kernel_read(dst, src, size, true);
> -}
> -
> -static long __probe_kernel_read(void *dst, const void *src, size_t size,
> -		bool strict)
> +long probe_kernel_read(void *dst, const void *src, size_t size)
>  {
>  	long ret;
>  	mm_segment_t old_fs = get_fs();
>  
> -	if (!probe_kernel_read_allowed(dst, src, size, strict))
> +	if (!probe_kernel_read_allowed(dst, src, size))
>  		return -EFAULT;
>  
>  	set_fs(KERNEL_DS);
> @@ -73,6 +45,7 @@ static long __probe_kernel_read(void *dst, const void *src, size_t size,
>  		return -EFAULT;
>  	return 0;
>  }
> +EXPORT_SYMBOL_GPL(probe_kernel_read);
>  
>  /**
>   * probe_user_read(): safely attempt to read from a user-space location
> @@ -180,7 +153,7 @@ long strncpy_from_kernel_unsafe(char *dst, const void *unsafe_addr, long count)
>  
>  	if (unlikely(count <= 0))
>  		return 0;
> -	if (!probe_kernel_read_allowed(dst, unsafe_addr, count, true))
> +	if (!probe_kernel_read_allowed(dst, unsafe_addr, count))
>  		return -EFAULT;
>  
>  	set_fs(KERNEL_DS);
> -- 
> 2.26.2
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 12/15] maccess: always use strict semantics for probe_kernel_read
  2020-05-11  5:05   ` Masami Hiramatsu
@ 2020-05-11  5:27     ` Masami Hiramatsu
  0 siblings, 0 replies; 26+ messages in thread
From: Masami Hiramatsu @ 2020-05-11  5:27 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Christoph Hellwig, x86, Alexei Starovoitov, Daniel Borkmann,
	Linus Torvalds, Andrew Morton, linux-parisc, linux-um, netdev,
	bpf, linux-mm, linux-kernel

On Mon, 11 May 2020 14:05:36 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> Hi Christoph,
> 
> At first, thank you for your work on cleaning up these functions!
> 
> On Wed,  6 May 2020 08:22:20 +0200
> Christoph Hellwig <hch@lst.de> wrote:
> 
> > Except for historical confusion in the kprobes/uprobes and bpf tracers
> > there is no good reason to ever allow user memory accesses from
> > probe_kernel_read.
> 
> Yes, thus now trace_kprobe supports "ustring" type for accessing
> user space memory. (If the address spaces are overwrapped, we have
> no way to distinguish whether an address is kernel or user)
> 
> >  Make the tracers fall back to a probe_user_read
> > if the probe_kernel_read falls to keep the core API clean.
> 
> For trace_kprobes doesn't need to fall back. User must specify
> the probe should be read from user space or kernel space. This is
> because it has  fetch_store_string_user() and probe_mem_read_user()
> variants.

Hmm, wait, I changed my mind. The "string" type currently supports
kernel and user (on some archs, e.g. x86), there is no reason to
restrict it. So let's keep the behavior.
Only if users want to trace "user" data, they can use "ustring".

Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>

Thank you,


> 
> Thank you,
> 
> 
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  arch/parisc/lib/memcpy.c    |  3 +--
> >  arch/um/kernel/maccess.c    |  3 +--
> >  arch/x86/mm/maccess.c       |  5 +----
> >  include/linux/uaccess.h     |  4 +---
> >  kernel/trace/bpf_trace.c    | 20 +++++++++++++------
> >  kernel/trace/trace_kprobe.c | 11 ++++++++++-
> >  mm/maccess.c                | 39 ++++++-------------------------------
> >  7 files changed, 34 insertions(+), 51 deletions(-)
> > 
> > diff --git a/arch/parisc/lib/memcpy.c b/arch/parisc/lib/memcpy.c
> > index 5ef648bd33119..9fe662b3b5604 100644
> > --- a/arch/parisc/lib/memcpy.c
> > +++ b/arch/parisc/lib/memcpy.c
> > @@ -57,8 +57,7 @@ void * memcpy(void * dst,const void *src, size_t count)
> >  EXPORT_SYMBOL(raw_copy_in_user);
> >  EXPORT_SYMBOL(memcpy);
> >  
> > -bool probe_kernel_read_allowed(void *dst, const void *unsafe_src, size_t size,
> > -		bool strict)
> > +bool probe_kernel_read_allowed(void *dst, const void *unsafe_src, size_t size)
> >  {
> >  	if ((unsigned long)unsafe_src < PAGE_SIZE)
> >  		return false;
> > diff --git a/arch/um/kernel/maccess.c b/arch/um/kernel/maccess.c
> > index 90a1bec923158..734f3d7e57c0f 100644
> > --- a/arch/um/kernel/maccess.c
> > +++ b/arch/um/kernel/maccess.c
> > @@ -7,8 +7,7 @@
> >  #include <linux/kernel.h>
> >  #include <os.h>
> >  
> > -bool probe_kernel_read_allowed(void *dst, const void *src, size_t size,
> > -		bool strict)
> > +bool probe_kernel_read_allowed(void *dst, const void *src, size_t size)
> >  {
> >  	void *psrc = (void *)rounddown((unsigned long)src, PAGE_SIZE);
> >  
> > diff --git a/arch/x86/mm/maccess.c b/arch/x86/mm/maccess.c
> > index 5c323ab187b27..a1bd81677aa72 100644
> > --- a/arch/x86/mm/maccess.c
> > +++ b/arch/x86/mm/maccess.c
> > @@ -26,10 +26,7 @@ static __always_inline bool invalid_probe_range(u64 vaddr)
> >  }
> >  #endif
> >  
> > -bool probe_kernel_read_allowed(void *dst, const void *unsafe_src, size_t size,
> > -		bool strict)
> > +bool probe_kernel_read_allowed(void *dst, const void *unsafe_src, size_t size)
> >  {
> > -	if (!strict)
> > -		return true;
> >  	return !invalid_probe_range((unsigned long)unsafe_src);
> >  }
> > diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
> > index 09d6e358883cc..99e2c2a41164a 100644
> > --- a/include/linux/uaccess.h
> > +++ b/include/linux/uaccess.h
> > @@ -301,11 +301,9 @@ copy_struct_from_user(void *dst, size_t ksize, const void __user *src,
> >  	return 0;
> >  }
> >  
> > -bool probe_kernel_read_allowed(void *dst, const void *unsafe_src,
> > -		size_t size, bool strict);
> > +bool probe_kernel_read_allowed(void *dst, const void *unsafe_src, size_t size);
> >  
> >  extern long probe_kernel_read(void *dst, const void *src, size_t size);
> > -extern long probe_kernel_read_strict(void *dst, const void *src, size_t size);
> >  extern long probe_user_read(void *dst, const void __user *src, size_t size);
> >  
> >  extern long notrace probe_kernel_write(void *dst, const void *src, size_t size);
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index ffe841433caa1..f694befe8ec9b 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -183,12 +183,20 @@ bpf_probe_read_kernel_common(void *dst, u32 size, const void *unsafe_ptr,
> >  	int ret = security_locked_down(LOCKDOWN_BPF_READ);
> >  
> >  	if (unlikely(ret < 0))
> > -		goto out;
> > -	ret = compat ? probe_kernel_read(dst, unsafe_ptr, size) :
> > -	      probe_kernel_read_strict(dst, unsafe_ptr, size);
> > -	if (unlikely(ret < 0))
> > -out:
> > -		memset(dst, 0, size);
> > +		goto fail;
> > +
> > +	ret = probe_kernel_read(dst, unsafe_ptr, size);
> > +	if (unlikely(ret < 0)) {
> > +		if (compat)
> > +			ret = probe_user_read(dst,
> > +				(__force const void __user *)unsafe_ptr, size);
> > +		if (unlikely(ret < 0))
> > +			goto fail;
> > +	}
> > +
> > +	return 0;
> > +fail:
> > +	memset(dst, 0, size);
> >  	return ret;
> >  }
> >  
> > diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> > index 525d12137325c..1300c9fd5c755 100644
> > --- a/kernel/trace/trace_kprobe.c
> > +++ b/kernel/trace/trace_kprobe.c
> > @@ -1203,6 +1203,9 @@ fetch_store_strlen(unsigned long addr)
> >  
> >  	do {
> >  		ret = probe_kernel_read(&c, (u8 *)addr + len, 1);
> > +		if (ret)
> > +			ret = probe_user_read(&c,
> > +				(__force u8 __user *)addr + len, 1);
> >  		len++;
> >  	} while (c && ret == 0 && len < MAX_STRING_SIZE);
> >  
> > @@ -1275,7 +1278,13 @@ fetch_store_string_user(unsigned long addr, void *dest, void *base)
> >  static nokprobe_inline int
> >  probe_mem_read(void *dest, void *src, size_t size)
> >  {
> > -	return probe_kernel_read(dest, src, size);
> > +	int ret;
> > +
> > +	ret = probe_kernel_read(dest, src, size);
> > +	if (ret)
> > +		ret = probe_user_read(dest, (__force const void __user *)src,
> > +				size);
> > +	return ret;
> >  }
> >  
> >  static nokprobe_inline int
> > diff --git a/mm/maccess.c b/mm/maccess.c
> > index cbd9d668aa46e..811f49e8de113 100644
> > --- a/mm/maccess.c
> > +++ b/mm/maccess.c
> > @@ -6,36 +6,14 @@
> >  #include <linux/mm.h>
> >  #include <linux/uaccess.h>
> >  
> > -static long __probe_kernel_read(void *dst, const void *src, size_t size,
> > -		bool strict);
> > -
> >  bool __weak probe_kernel_read_allowed(void *dst, const void *unsafe_src,
> > -		size_t size, bool strict)
> > +		size_t size)
> >  {
> >  	return true;
> >  }
> >  
> >  /**
> > - * probe_kernel_read(): safely attempt to read from any location
> > - * @dst: pointer to the buffer that shall take the data
> > - * @src: address to read from
> > - * @size: size of the data chunk
> > - *
> > - * Same as probe_kernel_read_strict() except that for architectures with
> > - * not fully separated user and kernel address spaces this function also works
> > - * for user address tanges.
> > - *
> > - * DO NOT USE THIS FUNCTION - it is broken on architectures with entirely
> > - * separate kernel and user address spaces, and also a bad idea otherwise.
> > - */
> > -long probe_kernel_read(void *dst, const void *src, size_t size)
> > -{
> > -	return __probe_kernel_read(dst, src, size, false);
> > -}
> > -EXPORT_SYMBOL_GPL(probe_kernel_read);
> > -
> > -/**
> > - * probe_kernel_read_strict(): safely attempt to read from kernel-space
> > + * probe_kernel_read(): safely attempt to read from kernel-space
> >   * @dst: pointer to the buffer that shall take the data
> >   * @src: address to read from
> >   * @size: size of the data chunk
> > @@ -48,18 +26,12 @@ EXPORT_SYMBOL_GPL(probe_kernel_read);
> >   * probe_kernel_read() suitable for use within regions where the caller
> >   * already holds mmap_sem, or other locks which nest inside mmap_sem.
> >   */
> > -long probe_kernel_read_strict(void *dst, const void *src, size_t size)
> > -{
> > -	return __probe_kernel_read(dst, src, size, true);
> > -}
> > -
> > -static long __probe_kernel_read(void *dst, const void *src, size_t size,
> > -		bool strict)
> > +long probe_kernel_read(void *dst, const void *src, size_t size)
> >  {
> >  	long ret;
> >  	mm_segment_t old_fs = get_fs();
> >  
> > -	if (!probe_kernel_read_allowed(dst, src, size, strict))
> > +	if (!probe_kernel_read_allowed(dst, src, size))
> >  		return -EFAULT;
> >  
> >  	set_fs(KERNEL_DS);
> > @@ -73,6 +45,7 @@ static long __probe_kernel_read(void *dst, const void *src, size_t size,
> >  		return -EFAULT;
> >  	return 0;
> >  }
> > +EXPORT_SYMBOL_GPL(probe_kernel_read);
> >  
> >  /**
> >   * probe_user_read(): safely attempt to read from a user-space location
> > @@ -180,7 +153,7 @@ long strncpy_from_kernel_unsafe(char *dst, const void *unsafe_addr, long count)
> >  
> >  	if (unlikely(count <= 0))
> >  		return 0;
> > -	if (!probe_kernel_read_allowed(dst, unsafe_addr, count, true))
> > +	if (!probe_kernel_read_allowed(dst, unsafe_addr, count))
> >  		return -EFAULT;
> >  
> >  	set_fs(KERNEL_DS);
> > -- 
> > 2.26.2
> > 
> 
> 
> -- 
> Masami Hiramatsu <mhiramat@kernel.org>


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 11/15] maccess: remove strncpy_from_unsafe
  2020-05-06  6:22 ` [PATCH 11/15] maccess: remove strncpy_from_unsafe Christoph Hellwig
@ 2020-05-11  5:34   ` Masami Hiramatsu
  0 siblings, 0 replies; 26+ messages in thread
From: Masami Hiramatsu @ 2020-05-11  5:34 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: x86, Alexei Starovoitov, Daniel Borkmann, Linus Torvalds,
	Andrew Morton, linux-parisc, linux-um, netdev, bpf, linux-mm,
	linux-kernel

On Wed,  6 May 2020 08:22:19 +0200
Christoph Hellwig <hch@lst.de> wrote:

> All three callers really should try the explicit kernel and user
> copies instead.  One has already deprecated the somewhat dangerous
> either kernel or user address concept, the other two still need to
> follow up eventually.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

This looks good to me.

Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>

Thank you,

> ---
>  include/linux/uaccess.h     |  1 -
>  kernel/trace/bpf_trace.c    | 40 ++++++++++++++++++++++++++-----------
>  kernel/trace/trace_kprobe.c |  5 ++++-
>  mm/maccess.c                | 39 +-----------------------------------
>  4 files changed, 33 insertions(+), 52 deletions(-)
> 
> diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
> index f8c47395a92df..09d6e358883cc 100644
> --- a/include/linux/uaccess.h
> +++ b/include/linux/uaccess.h
> @@ -311,7 +311,6 @@ extern long probe_user_read(void *dst, const void __user *src, size_t size);
>  extern long notrace probe_kernel_write(void *dst, const void *src, size_t size);
>  extern long notrace probe_user_write(void __user *dst, const void *src, size_t size);
>  
> -extern long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
>  extern long strncpy_from_kernel_unsafe(char *dst, const void *unsafe_addr,
>  				       long count);
>  extern long strncpy_from_user_unsafe(char *dst, const void __user *unsafe_addr,
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index e4e202f433903..ffe841433caa1 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -229,9 +229,10 @@ bpf_probe_read_kernel_str_common(void *dst, u32 size, const void *unsafe_ptr,
>  	int ret = security_locked_down(LOCKDOWN_BPF_READ);
>  
>  	if (unlikely(ret < 0))
> -		goto out;
> +		goto fail;
> +
>  	/*
> -	 * The strncpy_from_unsafe_*() call will likely not fill the entire
> +	 * The strncpy_from_*_unsafe() call will likely not fill the entire
>  	 * buffer, but that's okay in this circumstance as we're probing
>  	 * arbitrary memory anyway similar to bpf_probe_read_*() and might
>  	 * as well probe the stack. Thus, memory is explicitly cleared
> @@ -239,11 +240,18 @@ bpf_probe_read_kernel_str_common(void *dst, u32 size, const void *unsafe_ptr,
>  	 * code altogether don't copy garbage; otherwise length of string
>  	 * is returned that can be used for bpf_perf_event_output() et al.
>  	 */
> -	ret = compat ? strncpy_from_unsafe(dst, unsafe_ptr, size) :
> -	      strncpy_from_kernel_unsafe(dst, unsafe_ptr, size);
> -	if (unlikely(ret < 0))
> -out:
> -		memset(dst, 0, size);
> +	ret = strncpy_from_kernel_unsafe(dst, unsafe_ptr, size);
> +	if (unlikely(ret < 0)) {
> +		if (compat)
> +			ret = strncpy_from_user_unsafe(dst,
> +					(__force const void __user *)unsafe_ptr,
> +					size);
> +		if (ret < 0)
> +			goto fail;
> +	}
> +	return 0;
> +fail:
> +	memset(dst, 0, size);
>  	return ret;
>  }
>  
> @@ -321,6 +329,17 @@ static const struct bpf_func_proto *bpf_get_probe_write_proto(void)
>  	return &bpf_probe_write_user_proto;
>  }
>  
> +#define BPF_STRNCPY_LEN 64
> +
> +static void bpf_strncpy(char *buf, long unsafe_addr)
> +{
> +	buf[0] = 0;
> +	if (strncpy_from_kernel_unsafe(buf, (void *)unsafe_addr,
> +			BPF_STRNCPY_LEN))
> +		strncpy_from_user_unsafe(buf, (void __user *)unsafe_addr,
> +				BPF_STRNCPY_LEN);
> +}
> +
>  /*
>   * Only limited trace_printk() conversion specifiers allowed:
>   * %d %i %u %x %ld %li %lu %lx %lld %lli %llu %llx %p %s
> @@ -332,7 +351,7 @@ BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1,
>  	int mod[3] = {};
>  	int fmt_cnt = 0;
>  	u64 unsafe_addr;
> -	char buf[64];
> +	char buf[BPF_STRNCPY_LEN];
>  	int i;
>  
>  	/*
> @@ -387,10 +406,7 @@ BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1,
>  					arg3 = (long) buf;
>  					break;
>  				}
> -				buf[0] = 0;
> -				strncpy_from_unsafe(buf,
> -						    (void *) (long) unsafe_addr,
> -						    sizeof(buf));
> +				bpf_strncpy(buf, unsafe_addr);
>  			}
>  			continue;
>  		}
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index a7f43c7ec9880..525d12137325c 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -1238,7 +1238,10 @@ fetch_store_string(unsigned long addr, void *dest, void *base)
>  	 * Try to get string again, since the string can be changed while
>  	 * probing.
>  	 */
> -	ret = strncpy_from_unsafe(__dest, (void *)addr, maxlen);
> +	ret = strncpy_from_kernel_unsafe(__dest, (void *)addr, maxlen);
> +	if (ret < 0)
> +		ret = strncpy_from_user_unsafe(__dest, (void __user *)addr,
> +				maxlen);
>  	if (ret >= 0)
>  		*(u32 *)dest = make_data_loc(ret, __dest - base);
>  
> diff --git a/mm/maccess.c b/mm/maccess.c
> index 11563129cd490..cbd9d668aa46e 100644
> --- a/mm/maccess.c
> +++ b/mm/maccess.c
> @@ -8,8 +8,6 @@
>  
>  static long __probe_kernel_read(void *dst, const void *src, size_t size,
>  		bool strict);
> -static long __strncpy_from_unsafe(char *dst, const void *unsafe_addr,
> -		long count, bool strict);
>  
>  bool __weak probe_kernel_read_allowed(void *dst, const void *unsafe_src,
>  		size_t size, bool strict)
> @@ -156,35 +154,6 @@ long probe_user_write(void __user *dst, const void *src, size_t size)
>  	return 0;
>  }
>  
> -/**
> - * strncpy_from_unsafe: - Copy a NUL terminated string from unsafe address.
> - * @dst:   Destination address, in kernel space.  This buffer must be at
> - *         least @count bytes long.
> - * @unsafe_addr: Unsafe address.
> - * @count: Maximum number of bytes to copy, including the trailing NUL.
> - *
> - * Copies a NUL-terminated string from unsafe address to kernel buffer.
> - *
> - * On success, returns the length of the string INCLUDING the trailing NUL.
> - *
> - * If access fails, returns -EFAULT (some data may have been copied
> - * and the trailing NUL added).
> - *
> - * If @count is smaller than the length of the string, copies @count-1 bytes,
> - * sets the last byte of @dst buffer to NUL and returns @count.
> - *
> - * Same as strncpy_from_kernel_unsafe() except that for architectures with
> - * not fully separated user and kernel address spaces this function also works
> - * for user address tanges.
> - *
> - * DO NOT USE THIS FUNCTION - it is broken on architectures with entirely
> - * separate kernel and user address spaces, and also a bad idea otherwise.
> - */
> -long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
> -{
> -	return __strncpy_from_unsafe(dst, unsafe_addr, count, false);
> -}
> -
>  /**
>   * strncpy_from_kernel_unsafe: - Copy a NUL terminated string from unsafe
>   *				 address.
> @@ -204,12 +173,6 @@ long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
>   * sets the last byte of @dst buffer to NUL and returns @count.
>   */
>  long strncpy_from_kernel_unsafe(char *dst, const void *unsafe_addr, long count)
> -{
> -	return __strncpy_from_unsafe(dst, unsafe_addr, count, true);
> -}
> -
> -static long __strncpy_from_unsafe(char *dst, const void *unsafe_addr,
> -		long count, bool strict)
>  {
>  	mm_segment_t old_fs = get_fs();
>  	const void *src = unsafe_addr;
> @@ -217,7 +180,7 @@ static long __strncpy_from_unsafe(char *dst, const void *unsafe_addr,
>  
>  	if (unlikely(count <= 0))
>  		return 0;
> -	if (!probe_kernel_read_allowed(dst, unsafe_addr, count, strict))
> +	if (!probe_kernel_read_allowed(dst, unsafe_addr, count, true))
>  		return -EFAULT;
>  
>  	set_fs(KERNEL_DS);
> -- 
> 2.26.2
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2020-05-11  5:34 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-06  6:22 clean up and streamline probe_kernel_* and friends Christoph Hellwig
2020-05-06  6:22 ` [PATCH 01/15] maccess: unexport probe_kernel_write and probe_user_write Christoph Hellwig
2020-05-06  6:22 ` [PATCH 02/15] maccess: remove various unused weak aliases Christoph Hellwig
2020-05-06  6:22 ` [PATCH 03/15] maccess: remove duplicate kerneldoc commens Christoph Hellwig
2020-05-06  6:22 ` [PATCH 04/15] maccess: clarify kerneldoc comments Christoph Hellwig
2020-05-06  6:22 ` [PATCH 05/15] maccess: update the top of file comment Christoph Hellwig
2020-05-06  6:22 ` [PATCH 06/15] maccess: rename strncpy_from_unsafe_user to strncpy_from_user_unsafe Christoph Hellwig
2020-05-06  6:22 ` [PATCH 07/15] maccess: rename strncpy_from_unsafe_strict to strncpy_from_kernel_unsafe Christoph Hellwig
2020-05-06  6:22 ` [PATCH 08/15] maccess: rename strnlen_unsafe_user to strnlen_user_unsafe Christoph Hellwig
2020-05-06 17:44   ` Linus Torvalds
2020-05-06 17:47     ` Christoph Hellwig
2020-05-06 17:57       ` Linus Torvalds
2020-05-06  6:22 ` [PATCH 09/15] maccess: remove probe_read_common and probe_write_common Christoph Hellwig
2020-05-06  6:22 ` [PATCH 10/15] maccess: unify the probe kernel arch hooks Christoph Hellwig
2020-05-06  6:22 ` [PATCH 11/15] maccess: remove strncpy_from_unsafe Christoph Hellwig
2020-05-11  5:34   ` Masami Hiramatsu
2020-05-06  6:22 ` [PATCH 12/15] maccess: always use strict semantics for probe_kernel_read Christoph Hellwig
2020-05-11  5:05   ` Masami Hiramatsu
2020-05-11  5:27     ` Masami Hiramatsu
2020-05-06  6:22 ` [PATCH 13/15] maccess: move user access routines together Christoph Hellwig
2020-05-06  6:22 ` [PATCH 14/15] maccess: allow architectures to provide kernel probing directly Christoph Hellwig
2020-05-06  6:22 ` [PATCH 15/15] x86: use non-set_fs based maccess routines Christoph Hellwig
2020-05-06 17:51   ` Linus Torvalds
2020-05-06 18:15     ` Christoph Hellwig
2020-05-06 19:01       ` Linus Torvalds
2020-05-07  5:12         ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).