Linux-parisc archive on lore.kernel.org
 help / color / Atom feed
* clean up and streamline probe_kernel_* and friends v2
@ 2020-05-13 16:00 Christoph Hellwig
  2020-05-13 16:00 ` [PATCH 01/18] maccess: unexport probe_kernel_write and probe_user_write Christoph Hellwig
                   ` (19 more replies)
  0 siblings, 20 replies; 46+ messages in thread
From: Christoph Hellwig @ 2020-05-13 16:00 UTC (permalink / raw)
  To: x86, Alexei Starovoitov, Daniel Borkmann, Masami Hiramatsu,
	Linus Torvalds, Andrew Morton
  Cc: linux-parisc, linux-um, netdev, bpf, linux-mm, linux-kernel

Hi all,

this series start cleaning up the safe kernel and user memory probing
helpers in mm/maccess.c, and then allows architectures to implement
the kernel probing without overriding the address space limit and
temporarily allowing access to user memory.  It then switches x86
over to this new mechanism by reusing the unsafe_* uaccess logic.

This version also switches to the saner copy_{from,to}_kernel_nofault
naming suggested by Linus.

I kept the x86 helprs as-is without calling unsage_{get,put}_user as
that avoids a number of hard to trace casts, and it will still work
with the asm-goto based version easily.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 01/18] maccess: unexport probe_kernel_write and probe_user_write
  2020-05-13 16:00 clean up and streamline probe_kernel_* and friends v2 Christoph Hellwig
@ 2020-05-13 16:00 ` Christoph Hellwig
  2020-05-13 16:00 ` [PATCH 02/18] maccess: remove various unused weak aliases Christoph Hellwig
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2020-05-13 16:00 UTC (permalink / raw)
  To: x86, Alexei Starovoitov, Daniel Borkmann, Masami Hiramatsu,
	Linus Torvalds, Andrew Morton
  Cc: linux-parisc, linux-um, netdev, bpf, linux-mm, linux-kernel

These two functions are not used by any modular code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 mm/maccess.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/mm/maccess.c b/mm/maccess.c
index 3ca8d97e50106..cf21e604f78cb 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -121,7 +121,6 @@ long __probe_kernel_write(void *dst, const void *src, size_t size)
 
 	return ret;
 }
-EXPORT_SYMBOL_GPL(probe_kernel_write);
 
 /**
  * probe_user_write(): safely attempt to write to a user-space location
@@ -148,7 +147,6 @@ long __probe_user_write(void __user *dst, const void *src, size_t size)
 
 	return ret;
 }
-EXPORT_SYMBOL_GPL(probe_user_write);
 
 /**
  * strncpy_from_unsafe: - Copy a NUL terminated string from unsafe address.
-- 
2.26.2


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 02/18] maccess: remove various unused weak aliases
  2020-05-13 16:00 clean up and streamline probe_kernel_* and friends v2 Christoph Hellwig
  2020-05-13 16:00 ` [PATCH 01/18] maccess: unexport probe_kernel_write and probe_user_write Christoph Hellwig
@ 2020-05-13 16:00 ` Christoph Hellwig
  2020-05-13 16:00 ` [PATCH 03/18] maccess: remove duplicate kerneldoc comments Christoph Hellwig
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2020-05-13 16:00 UTC (permalink / raw)
  To: x86, Alexei Starovoitov, Daniel Borkmann, Masami Hiramatsu,
	Linus Torvalds, Andrew Morton
  Cc: linux-parisc, linux-um, netdev, bpf, linux-mm, linux-kernel

maccess tends to define lots of underscore prefixed symbols that then
have other weak aliases.  But except for two cases they are never
actually used, so remove them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/uaccess.h |  3 ---
 mm/maccess.c            | 19 +++----------------
 2 files changed, 3 insertions(+), 19 deletions(-)

diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 67f016010aad5..a2c606a403745 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -324,7 +324,6 @@ extern long __probe_kernel_read(void *dst, const void *src, size_t size);
  * happens, handle that and return -EFAULT.
  */
 extern long probe_user_read(void *dst, const void __user *src, size_t size);
-extern long __probe_user_read(void *dst, const void __user *src, size_t size);
 
 /*
  * probe_kernel_write(): safely attempt to write to a location
@@ -336,7 +335,6 @@ extern long __probe_user_read(void *dst, const void __user *src, size_t size);
  * happens, handle that and return -EFAULT.
  */
 extern long notrace probe_kernel_write(void *dst, const void *src, size_t size);
-extern long notrace __probe_kernel_write(void *dst, const void *src, size_t size);
 
 /*
  * probe_user_write(): safely attempt to write to a location in user space
@@ -348,7 +346,6 @@ extern long notrace __probe_kernel_write(void *dst, const void *src, size_t size
  * happens, handle that and return -EFAULT.
  */
 extern long notrace probe_user_write(void __user *dst, const void *src, size_t size);
-extern long notrace __probe_user_write(void __user *dst, const void *src, size_t size);
 
 extern long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
 extern long strncpy_from_unsafe_strict(char *dst, const void *unsafe_addr,
diff --git a/mm/maccess.c b/mm/maccess.c
index cf21e604f78cb..4e7f3b6eb05ae 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -79,11 +79,7 @@ EXPORT_SYMBOL_GPL(probe_kernel_read);
  * Safely read from user address @src to the buffer at @dst. If a kernel fault
  * happens, handle that and return -EFAULT.
  */
-
-long __weak probe_user_read(void *dst, const void __user *src, size_t size)
-    __attribute__((alias("__probe_user_read")));
-
-long __probe_user_read(void *dst, const void __user *src, size_t size)
+long probe_user_read(void *dst, const void __user *src, size_t size)
 {
 	long ret = -EFAULT;
 	mm_segment_t old_fs = get_fs();
@@ -106,11 +102,7 @@ EXPORT_SYMBOL_GPL(probe_user_read);
  * Safely write to address @dst from the buffer at @src.  If a kernel fault
  * happens, handle that and return -EFAULT.
  */
-
-long __weak probe_kernel_write(void *dst, const void *src, size_t size)
-    __attribute__((alias("__probe_kernel_write")));
-
-long __probe_kernel_write(void *dst, const void *src, size_t size)
+long probe_kernel_write(void *dst, const void *src, size_t size)
 {
 	long ret;
 	mm_segment_t old_fs = get_fs();
@@ -131,11 +123,7 @@ long __probe_kernel_write(void *dst, const void *src, size_t size)
  * Safely write to address @dst from the buffer at @src.  If a kernel fault
  * happens, handle that and return -EFAULT.
  */
-
-long __weak probe_user_write(void __user *dst, const void *src, size_t size)
-    __attribute__((alias("__probe_user_write")));
-
-long __probe_user_write(void __user *dst, const void *src, size_t size)
+long probe_user_write(void __user *dst, const void *src, size_t size)
 {
 	long ret = -EFAULT;
 	mm_segment_t old_fs = get_fs();
@@ -171,7 +159,6 @@ long __probe_user_write(void __user *dst, const void *src, size_t size)
  * probing memory on a user address range where strncpy_from_unsafe_user() is
  * supposed to be used instead.
  */
-
 long __weak strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
     __attribute__((alias("__strncpy_from_unsafe")));
 
-- 
2.26.2


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 03/18] maccess: remove duplicate kerneldoc comments
  2020-05-13 16:00 clean up and streamline probe_kernel_* and friends v2 Christoph Hellwig
  2020-05-13 16:00 ` [PATCH 01/18] maccess: unexport probe_kernel_write and probe_user_write Christoph Hellwig
  2020-05-13 16:00 ` [PATCH 02/18] maccess: remove various unused weak aliases Christoph Hellwig
@ 2020-05-13 16:00 ` Christoph Hellwig
  2020-05-13 16:00 ` [PATCH 04/18] maccess: clarify " Christoph Hellwig
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2020-05-13 16:00 UTC (permalink / raw)
  To: x86, Alexei Starovoitov, Daniel Borkmann, Masami Hiramatsu,
	Linus Torvalds, Andrew Morton
  Cc: linux-parisc, linux-um, netdev, bpf, linux-mm, linux-kernel

Many of the maccess routines have a copy of the kerneldoc comment
in the header.  Remove it as it is not useful and will get out of
sync sooner or later.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/uaccess.h | 38 --------------------------------------
 1 file changed, 38 deletions(-)

diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index a2c606a403745..5a36a298a85f8 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -301,50 +301,12 @@ copy_struct_from_user(void *dst, size_t ksize, const void __user *src,
 	return 0;
 }
 
-/*
- * probe_kernel_read(): safely attempt to read from a location
- * @dst: pointer to the buffer that shall take the data
- * @src: address to read from
- * @size: size of the data chunk
- *
- * Safely read from address @src to the buffer at @dst.  If a kernel fault
- * happens, handle that and return -EFAULT.
- */
 extern long probe_kernel_read(void *dst, const void *src, size_t size);
 extern long probe_kernel_read_strict(void *dst, const void *src, size_t size);
 extern long __probe_kernel_read(void *dst, const void *src, size_t size);
-
-/*
- * probe_user_read(): safely attempt to read from a location in user space
- * @dst: pointer to the buffer that shall take the data
- * @src: address to read from
- * @size: size of the data chunk
- *
- * Safely read from address @src to the buffer at @dst.  If a kernel fault
- * happens, handle that and return -EFAULT.
- */
 extern long probe_user_read(void *dst, const void __user *src, size_t size);
 
-/*
- * probe_kernel_write(): safely attempt to write to a location
- * @dst: address to write to
- * @src: pointer to the data that shall be written
- * @size: size of the data chunk
- *
- * Safely write to address @dst from the buffer at @src.  If a kernel fault
- * happens, handle that and return -EFAULT.
- */
 extern long notrace probe_kernel_write(void *dst, const void *src, size_t size);
-
-/*
- * probe_user_write(): safely attempt to write to a location in user space
- * @dst: address to write to
- * @src: pointer to the data that shall be written
- * @size: size of the data chunk
- *
- * Safely write to address @dst from the buffer at @src.  If a kernel fault
- * happens, handle that and return -EFAULT.
- */
 extern long notrace probe_user_write(void __user *dst, const void *src, size_t size);
 
 extern long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
-- 
2.26.2


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 04/18] maccess: clarify kerneldoc comments
  2020-05-13 16:00 clean up and streamline probe_kernel_* and friends v2 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2020-05-13 16:00 ` [PATCH 03/18] maccess: remove duplicate kerneldoc comments Christoph Hellwig
@ 2020-05-13 16:00 ` Christoph Hellwig
  2020-05-13 16:00 ` [PATCH 05/18] maccess: update the top of file comment Christoph Hellwig
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2020-05-13 16:00 UTC (permalink / raw)
  To: x86, Alexei Starovoitov, Daniel Borkmann, Masami Hiramatsu,
	Linus Torvalds, Andrew Morton
  Cc: linux-parisc, linux-um, netdev, bpf, linux-mm, linux-kernel

Add proper kerneldoc comments for probe_kernel_read_strict and
probe_kernel_read strncpy_from_unsafe_strict and explain the different
versus the non-strict version.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 mm/maccess.c | 61 ++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 43 insertions(+), 18 deletions(-)

diff --git a/mm/maccess.c b/mm/maccess.c
index 4e7f3b6eb05ae..747581ac50dc9 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -31,29 +31,35 @@ probe_write_common(void __user *dst, const void *src, size_t size)
 }
 
 /**
- * probe_kernel_read(): safely attempt to read from a kernel-space location
+ * probe_kernel_read(): safely attempt to read from any location
  * @dst: pointer to the buffer that shall take the data
  * @src: address to read from
  * @size: size of the data chunk
  *
- * Safely read from address @src to the buffer at @dst.  If a kernel fault
- * happens, handle that and return -EFAULT.
+ * Same as probe_kernel_read_strict() except that for architectures with
+ * not fully separated user and kernel address spaces this function also works
+ * for user address tanges.
+ *
+ * DO NOT USE THIS FUNCTION - it is broken on architectures with entirely
+ * separate kernel and user address spaces, and also a bad idea otherwise.
+ */
+long __weak probe_kernel_read(void *dst, const void *src, size_t size)
+    __attribute__((alias("__probe_kernel_read")));
+
+/**
+ * probe_kernel_read_strict(): safely attempt to read from kernel-space
+ * @dst: pointer to the buffer that shall take the data
+ * @src: address to read from
+ * @size: size of the data chunk
+ *
+ * Safely read from kernel address @src to the buffer at @dst.  If a kernel
+ * fault happens, handle that and return -EFAULT.
  *
  * We ensure that the copy_from_user is executed in atomic context so that
  * do_page_fault() doesn't attempt to take mmap_sem.  This makes
  * probe_kernel_read() suitable for use within regions where the caller
  * already holds mmap_sem, or other locks which nest inside mmap_sem.
- *
- * probe_kernel_read_strict() is the same as probe_kernel_read() except for
- * the case where architectures have non-overlapping user and kernel address
- * ranges: probe_kernel_read_strict() will additionally return -EFAULT for
- * probing memory on a user address range where probe_user_read() is supposed
- * to be used instead.
  */
-
-long __weak probe_kernel_read(void *dst, const void *src, size_t size)
-    __attribute__((alias("__probe_kernel_read")));
-
 long __weak probe_kernel_read_strict(void *dst, const void *src, size_t size)
     __attribute__((alias("__probe_kernel_read")));
 
@@ -153,15 +159,34 @@ long probe_user_write(void __user *dst, const void *src, size_t size)
  * If @count is smaller than the length of the string, copies @count-1 bytes,
  * sets the last byte of @dst buffer to NUL and returns @count.
  *
- * strncpy_from_unsafe_strict() is the same as strncpy_from_unsafe() except
- * for the case where architectures have non-overlapping user and kernel address
- * ranges: strncpy_from_unsafe_strict() will additionally return -EFAULT for
- * probing memory on a user address range where strncpy_from_unsafe_user() is
- * supposed to be used instead.
+ * Same as strncpy_from_unsafe_strict() except that for architectures with
+ * not fully separated user and kernel address spaces this function also works
+ * for user address tanges.
+ *
+ * DO NOT USE THIS FUNCTION - it is broken on architectures with entirely
+ * separate kernel and user address spaces, and also a bad idea otherwise.
  */
 long __weak strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
     __attribute__((alias("__strncpy_from_unsafe")));
 
+/**
+ * strncpy_from_unsafe_strict: - Copy a NUL terminated string from unsafe
+ *				 address.
+ * @dst:   Destination address, in kernel space.  This buffer must be at
+ *         least @count bytes long.
+ * @unsafe_addr: Unsafe address.
+ * @count: Maximum number of bytes to copy, including the trailing NUL.
+ *
+ * Copies a NUL-terminated string from unsafe address to kernel buffer.
+ *
+ * On success, returns the length of the string INCLUDING the trailing NUL.
+ *
+ * If access fails, returns -EFAULT (some data may have been copied
+ * and the trailing NUL added).
+ *
+ * If @count is smaller than the length of the string, copies @count-1 bytes,
+ * sets the last byte of @dst buffer to NUL and returns @count.
+ */
 long __weak strncpy_from_unsafe_strict(char *dst, const void *unsafe_addr,
 				       long count)
     __attribute__((alias("__strncpy_from_unsafe")));
-- 
2.26.2


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 05/18] maccess: update the top of file comment
  2020-05-13 16:00 clean up and streamline probe_kernel_* and friends v2 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2020-05-13 16:00 ` [PATCH 04/18] maccess: clarify " Christoph Hellwig
@ 2020-05-13 16:00 ` Christoph Hellwig
  2020-05-13 16:00 ` [PATCH 06/18] maccess: rename strncpy_from_unsafe_user to strncpy_from_user_nofault Christoph Hellwig
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2020-05-13 16:00 UTC (permalink / raw)
  To: x86, Alexei Starovoitov, Daniel Borkmann, Masami Hiramatsu,
	Linus Torvalds, Andrew Morton
  Cc: linux-parisc, linux-um, netdev, bpf, linux-mm, linux-kernel

This file now also contains several helpers for accessing user memory.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 mm/maccess.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/maccess.c b/mm/maccess.c
index 747581ac50dc9..65880ba2ca376 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /*
- * Access kernel memory without faulting.
+ * Access kernel or user memory without faulting.
  */
 #include <linux/export.h>
 #include <linux/mm.h>
-- 
2.26.2


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 06/18] maccess: rename strncpy_from_unsafe_user to strncpy_from_user_nofault
  2020-05-13 16:00 clean up and streamline probe_kernel_* and friends v2 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2020-05-13 16:00 ` [PATCH 05/18] maccess: update the top of file comment Christoph Hellwig
@ 2020-05-13 16:00 ` Christoph Hellwig
  2020-05-13 16:00 ` [PATCH 07/18] maccess: rename strncpy_from_unsafe_strict to strncpy_from_kernel_nofault Christoph Hellwig
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2020-05-13 16:00 UTC (permalink / raw)
  To: x86, Alexei Starovoitov, Daniel Borkmann, Masami Hiramatsu,
	Linus Torvalds, Andrew Morton
  Cc: linux-parisc, linux-um, netdev, bpf, linux-mm, linux-kernel

This matches the naming of strncpy_from_user, and also makes it more
clear what the function is supposed to do.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/uaccess.h     | 4 ++--
 kernel/trace/bpf_trace.c    | 2 +-
 kernel/trace/trace_kprobe.c | 2 +-
 mm/maccess.c                | 4 ++--
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 5a36a298a85f8..b983cb1c1216a 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -313,8 +313,8 @@ extern long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
 extern long strncpy_from_unsafe_strict(char *dst, const void *unsafe_addr,
 				       long count);
 extern long __strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
-extern long strncpy_from_unsafe_user(char *dst, const void __user *unsafe_addr,
-				     long count);
+long strncpy_from_user_nofault(char *dst, const void __user *unsafe_addr,
+		long count);
 extern long strnlen_unsafe_user(const void __user *unsafe_addr, long count);
 
 /**
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index ca1796747a773..fcbc11040d956 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -159,7 +159,7 @@ static const struct bpf_func_proto bpf_probe_read_user_proto = {
 BPF_CALL_3(bpf_probe_read_user_str, void *, dst, u32, size,
 	   const void __user *, unsafe_ptr)
 {
-	int ret = strncpy_from_unsafe_user(dst, unsafe_ptr, size);
+	int ret = strncpy_from_user_nofault(dst, unsafe_ptr, size);
 
 	if (unlikely(ret < 0))
 		memset(dst, 0, size);
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 35989383ae113..d600f41fda1ca 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1268,7 +1268,7 @@ fetch_store_string_user(unsigned long addr, void *dest, void *base)
 
 	__dest = get_loc_data(dest, base);
 
-	ret = strncpy_from_unsafe_user(__dest, uaddr, maxlen);
+	ret = strncpy_from_user_nofault(__dest, uaddr, maxlen);
 	if (ret >= 0)
 		*(u32 *)dest = make_data_loc(ret, __dest - base);
 
diff --git a/mm/maccess.c b/mm/maccess.c
index 65880ba2ca376..457d8f9bf714f 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -215,7 +215,7 @@ long __strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
 }
 
 /**
- * strncpy_from_unsafe_user: - Copy a NUL terminated string from unsafe user
+ * strncpy_from_user_nofault: - Copy a NUL terminated string from unsafe user
  *				address.
  * @dst:   Destination address, in kernel space.  This buffer must be at
  *         least @count bytes long.
@@ -232,7 +232,7 @@ long __strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
  * If @count is smaller than the length of the string, copies @count-1 bytes,
  * sets the last byte of @dst buffer to NUL and returns @count.
  */
-long strncpy_from_unsafe_user(char *dst, const void __user *unsafe_addr,
+long strncpy_from_user_nofault(char *dst, const void __user *unsafe_addr,
 			      long count)
 {
 	mm_segment_t old_fs = get_fs();
-- 
2.26.2


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 07/18] maccess: rename strncpy_from_unsafe_strict to strncpy_from_kernel_nofault
  2020-05-13 16:00 clean up and streamline probe_kernel_* and friends v2 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2020-05-13 16:00 ` [PATCH 06/18] maccess: rename strncpy_from_unsafe_user to strncpy_from_user_nofault Christoph Hellwig
@ 2020-05-13 16:00 ` Christoph Hellwig
  2020-05-13 16:00 ` [PATCH 08/18] maccess: rename strnlen_unsafe_user to strnlen_user_nofault Christoph Hellwig
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2020-05-13 16:00 UTC (permalink / raw)
  To: x86, Alexei Starovoitov, Daniel Borkmann, Masami Hiramatsu,
	Linus Torvalds, Andrew Morton
  Cc: linux-parisc, linux-um, netdev, bpf, linux-mm, linux-kernel

This matches the naming of strncpy_from_user_nofault, and also makes it
more clear what the function is supposed to do.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/x86/mm/maccess.c    | 2 +-
 include/linux/uaccess.h  | 4 ++--
 kernel/trace/bpf_trace.c | 2 +-
 mm/maccess.c             | 6 +++---
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/maccess.c b/arch/x86/mm/maccess.c
index f5b85bdc0535c..62c4017a2473d 100644
--- a/arch/x86/mm/maccess.c
+++ b/arch/x86/mm/maccess.c
@@ -34,7 +34,7 @@ long probe_kernel_read_strict(void *dst, const void *src, size_t size)
 	return __probe_kernel_read(dst, src, size);
 }
 
-long strncpy_from_unsafe_strict(char *dst, const void *unsafe_addr, long count)
+long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr, long count)
 {
 	if (unlikely(invalid_probe_range((unsigned long)unsafe_addr)))
 		return -EFAULT;
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index b983cb1c1216a..134ff9c1c151b 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -310,8 +310,8 @@ extern long notrace probe_kernel_write(void *dst, const void *src, size_t size);
 extern long notrace probe_user_write(void __user *dst, const void *src, size_t size);
 
 extern long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
-extern long strncpy_from_unsafe_strict(char *dst, const void *unsafe_addr,
-				       long count);
+long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr,
+		long count);
 extern long __strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
 long strncpy_from_user_nofault(char *dst, const void __user *unsafe_addr,
 		long count);
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index fcbc11040d956..3dd4763c195bb 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -240,7 +240,7 @@ bpf_probe_read_kernel_str_common(void *dst, u32 size, const void *unsafe_ptr,
 	 * is returned that can be used for bpf_perf_event_output() et al.
 	 */
 	ret = compat ? strncpy_from_unsafe(dst, unsafe_ptr, size) :
-	      strncpy_from_unsafe_strict(dst, unsafe_ptr, size);
+	      strncpy_from_kernel_nofault(dst, unsafe_ptr, size);
 	if (unlikely(ret < 0))
 out:
 		memset(dst, 0, size);
diff --git a/mm/maccess.c b/mm/maccess.c
index 457d8f9bf714f..c8748c2809096 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -159,7 +159,7 @@ long probe_user_write(void __user *dst, const void *src, size_t size)
  * If @count is smaller than the length of the string, copies @count-1 bytes,
  * sets the last byte of @dst buffer to NUL and returns @count.
  *
- * Same as strncpy_from_unsafe_strict() except that for architectures with
+ * Same as strncpy_from_kernel_nofault() except that for architectures with
  * not fully separated user and kernel address spaces this function also works
  * for user address tanges.
  *
@@ -170,7 +170,7 @@ long __weak strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
     __attribute__((alias("__strncpy_from_unsafe")));
 
 /**
- * strncpy_from_unsafe_strict: - Copy a NUL terminated string from unsafe
+ * strncpy_from_kernel_nofault: - Copy a NUL terminated string from unsafe
  *				 address.
  * @dst:   Destination address, in kernel space.  This buffer must be at
  *         least @count bytes long.
@@ -187,7 +187,7 @@ long __weak strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
  * If @count is smaller than the length of the string, copies @count-1 bytes,
  * sets the last byte of @dst buffer to NUL and returns @count.
  */
-long __weak strncpy_from_unsafe_strict(char *dst, const void *unsafe_addr,
+long __weak strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr,
 				       long count)
     __attribute__((alias("__strncpy_from_unsafe")));
 
-- 
2.26.2


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 08/18] maccess: rename strnlen_unsafe_user to strnlen_user_nofault
  2020-05-13 16:00 clean up and streamline probe_kernel_* and friends v2 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2020-05-13 16:00 ` [PATCH 07/18] maccess: rename strncpy_from_unsafe_strict to strncpy_from_kernel_nofault Christoph Hellwig
@ 2020-05-13 16:00 ` Christoph Hellwig
  2020-05-13 16:00 ` [PATCH 09/18] maccess: remove probe_read_common and probe_write_common Christoph Hellwig
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2020-05-13 16:00 UTC (permalink / raw)
  To: x86, Alexei Starovoitov, Daniel Borkmann, Masami Hiramatsu,
	Linus Torvalds, Andrew Morton
  Cc: linux-parisc, linux-um, netdev, bpf, linux-mm, linux-kernel

This matches the naming of strnlen_user, and also makes it more clear
what the function is supposed to do.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/uaccess.h     | 2 +-
 kernel/trace/trace_kprobe.c | 2 +-
 mm/maccess.c                | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 134ff9c1c151b..d8366f8468664 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -315,7 +315,7 @@ long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr,
 extern long __strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
 long strncpy_from_user_nofault(char *dst, const void __user *unsafe_addr,
 		long count);
-extern long strnlen_unsafe_user(const void __user *unsafe_addr, long count);
+long strnlen_user_nofault(const void __user *unsafe_addr, long count);
 
 /**
  * probe_kernel_address(): safely attempt to read from a location
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index d600f41fda1ca..4325f9e7fadaa 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1221,7 +1221,7 @@ fetch_store_strlen_user(unsigned long addr)
 {
 	const void __user *uaddr =  (__force const void __user *)addr;
 
-	return strnlen_unsafe_user(uaddr, MAX_STRING_SIZE);
+	return strnlen_user_nofault(uaddr, MAX_STRING_SIZE);
 }
 
 /*
diff --git a/mm/maccess.c b/mm/maccess.c
index c8748c2809096..e783ebfccd542 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -258,7 +258,7 @@ long strncpy_from_user_nofault(char *dst, const void __user *unsafe_addr,
 }
 
 /**
- * strnlen_unsafe_user: - Get the size of a user string INCLUDING final NUL.
+ * strnlen_user_nofault: - Get the size of a user string INCLUDING final NUL.
  * @unsafe_addr: The string to measure.
  * @count: Maximum count (including NUL)
  *
@@ -273,7 +273,7 @@ long strncpy_from_user_nofault(char *dst, const void __user *unsafe_addr,
  * Unlike strnlen_user, this can be used from IRQ handler etc. because
  * it disables pagefaults.
  */
-long strnlen_unsafe_user(const void __user *unsafe_addr, long count)
+long strnlen_user_nofault(const void __user *unsafe_addr, long count)
 {
 	mm_segment_t old_fs = get_fs();
 	int ret;
-- 
2.26.2


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 09/18] maccess: remove probe_read_common and probe_write_common
  2020-05-13 16:00 clean up and streamline probe_kernel_* and friends v2 Christoph Hellwig
                   ` (7 preceding siblings ...)
  2020-05-13 16:00 ` [PATCH 08/18] maccess: rename strnlen_unsafe_user to strnlen_user_nofault Christoph Hellwig
@ 2020-05-13 16:00 ` Christoph Hellwig
  2020-05-13 16:00 ` [PATCH 10/18] maccess: unify the probe kernel arch hooks Christoph Hellwig
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2020-05-13 16:00 UTC (permalink / raw)
  To: x86, Alexei Starovoitov, Daniel Borkmann, Masami Hiramatsu,
	Linus Torvalds, Andrew Morton
  Cc: linux-parisc, linux-um, netdev, bpf, linux-mm, linux-kernel

Each of the helpers has just two callers, which also different in
dealing with kernel or userspace pointers.  Just open code the logic
in the callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 mm/maccess.c | 63 ++++++++++++++++++++++++----------------------------
 1 file changed, 29 insertions(+), 34 deletions(-)

diff --git a/mm/maccess.c b/mm/maccess.c
index e783ebfccd542..31cf6604e7fff 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -6,30 +6,6 @@
 #include <linux/mm.h>
 #include <linux/uaccess.h>
 
-static __always_inline long
-probe_read_common(void *dst, const void __user *src, size_t size)
-{
-	long ret;
-
-	pagefault_disable();
-	ret = __copy_from_user_inatomic(dst, src, size);
-	pagefault_enable();
-
-	return ret ? -EFAULT : 0;
-}
-
-static __always_inline long
-probe_write_common(void __user *dst, const void *src, size_t size)
-{
-	long ret;
-
-	pagefault_disable();
-	ret = __copy_to_user_inatomic(dst, src, size);
-	pagefault_enable();
-
-	return ret ? -EFAULT : 0;
-}
-
 /**
  * probe_kernel_read(): safely attempt to read from any location
  * @dst: pointer to the buffer that shall take the data
@@ -69,10 +45,15 @@ long __probe_kernel_read(void *dst, const void *src, size_t size)
 	mm_segment_t old_fs = get_fs();
 
 	set_fs(KERNEL_DS);
-	ret = probe_read_common(dst, (__force const void __user *)src, size);
+	pagefault_disable();
+	ret = __copy_from_user_inatomic(dst, (__force const void __user *)src,
+			size);
+	pagefault_enable();
 	set_fs(old_fs);
 
-	return ret;
+	if (ret)
+		return -EFAULT;
+	return 0;
 }
 EXPORT_SYMBOL_GPL(probe_kernel_read);
 
@@ -91,11 +72,16 @@ long probe_user_read(void *dst, const void __user *src, size_t size)
 	mm_segment_t old_fs = get_fs();
 
 	set_fs(USER_DS);
-	if (access_ok(src, size))
-		ret = probe_read_common(dst, src, size);
+	if (access_ok(src, size)) {
+		pagefault_disable();
+		ret = __copy_from_user_inatomic(dst, src, size);
+		pagefault_enable();
+	}
 	set_fs(old_fs);
 
-	return ret;
+	if (ret)
+		return -EFAULT;
+	return 0;
 }
 EXPORT_SYMBOL_GPL(probe_user_read);
 
@@ -114,10 +100,14 @@ long probe_kernel_write(void *dst, const void *src, size_t size)
 	mm_segment_t old_fs = get_fs();
 
 	set_fs(KERNEL_DS);
-	ret = probe_write_common((__force void __user *)dst, src, size);
+	pagefault_disable();
+	ret = __copy_to_user_inatomic((__force void __user *)dst, src, size);
+	pagefault_enable();
 	set_fs(old_fs);
 
-	return ret;
+	if (ret)
+		return -EFAULT;
+	return 0;
 }
 
 /**
@@ -135,11 +125,16 @@ long probe_user_write(void __user *dst, const void *src, size_t size)
 	mm_segment_t old_fs = get_fs();
 
 	set_fs(USER_DS);
-	if (access_ok(dst, size))
-		ret = probe_write_common(dst, src, size);
+	if (access_ok(dst, size)) {
+		pagefault_disable();
+		ret = __copy_to_user_inatomic(dst, src, size);
+		pagefault_enable();
+	}
 	set_fs(old_fs);
 
-	return ret;
+	if (ret)
+		return -EFAULT;
+	return 0;
 }
 
 /**
-- 
2.26.2


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 10/18] maccess: unify the probe kernel arch hooks
  2020-05-13 16:00 clean up and streamline probe_kernel_* and friends v2 Christoph Hellwig
                   ` (8 preceding siblings ...)
  2020-05-13 16:00 ` [PATCH 09/18] maccess: remove probe_read_common and probe_write_common Christoph Hellwig
@ 2020-05-13 16:00 ` Christoph Hellwig
  2020-05-14  1:13   ` Masami Hiramatsu
  2020-05-13 16:00 ` [PATCH 11/18] maccess: remove strncpy_from_unsafe Christoph Hellwig
                   ` (9 subsequent siblings)
  19 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2020-05-13 16:00 UTC (permalink / raw)
  To: x86, Alexei Starovoitov, Daniel Borkmann, Masami Hiramatsu,
	Linus Torvalds, Andrew Morton
  Cc: linux-parisc, linux-um, netdev, bpf, linux-mm, linux-kernel

Currently architectures have to override every routine that probes
kernel memory, which includes a pure read and strcpy, both in strict
and not strict variants.  Just provide a single arch hooks instead to
make sure all architectures cover all the cases.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/parisc/lib/memcpy.c | 13 ++++-------
 arch/um/kernel/maccess.c | 11 ++++-----
 arch/x86/mm/maccess.c    | 18 ++++-----------
 include/linux/uaccess.h  |  6 +++--
 mm/maccess.c             | 49 ++++++++++++++++++++++++++++++----------
 5 files changed, 56 insertions(+), 41 deletions(-)

diff --git a/arch/parisc/lib/memcpy.c b/arch/parisc/lib/memcpy.c
index beceaab34ecb7..5ef648bd33119 100644
--- a/arch/parisc/lib/memcpy.c
+++ b/arch/parisc/lib/memcpy.c
@@ -57,14 +57,11 @@ void * memcpy(void * dst,const void *src, size_t count)
 EXPORT_SYMBOL(raw_copy_in_user);
 EXPORT_SYMBOL(memcpy);
 
-long probe_kernel_read(void *dst, const void *src, size_t size)
+bool probe_kernel_read_allowed(void *dst, const void *unsafe_src, size_t size,
+		bool strict)
 {
-	unsigned long addr = (unsigned long)src;
-
-	if (addr < PAGE_SIZE)
-		return -EFAULT;
-
+	if ((unsigned long)unsafe_src < PAGE_SIZE)
+		return false;
 	/* check for I/O space F_EXTEND(0xfff00000) access as well? */
-
-	return __probe_kernel_read(dst, src, size);
+	return true;
 }
diff --git a/arch/um/kernel/maccess.c b/arch/um/kernel/maccess.c
index 67b2e0fa92bba..90a1bec923158 100644
--- a/arch/um/kernel/maccess.c
+++ b/arch/um/kernel/maccess.c
@@ -7,15 +7,14 @@
 #include <linux/kernel.h>
 #include <os.h>
 
-long probe_kernel_read(void *dst, const void *src, size_t size)
+bool probe_kernel_read_allowed(void *dst, const void *src, size_t size,
+		bool strict)
 {
 	void *psrc = (void *)rounddown((unsigned long)src, PAGE_SIZE);
 
 	if ((unsigned long)src < PAGE_SIZE || size <= 0)
-		return -EFAULT;
-
+		return false;
 	if (os_mincore(psrc, size + src - psrc) <= 0)
-		return -EFAULT;
-
-	return __probe_kernel_read(dst, src, size);
+		return false;
+	return true;
 }
diff --git a/arch/x86/mm/maccess.c b/arch/x86/mm/maccess.c
index 62c4017a2473d..5c323ab187b27 100644
--- a/arch/x86/mm/maccess.c
+++ b/arch/x86/mm/maccess.c
@@ -26,18 +26,10 @@ static __always_inline bool invalid_probe_range(u64 vaddr)
 }
 #endif
 
-long probe_kernel_read_strict(void *dst, const void *src, size_t size)
+bool probe_kernel_read_allowed(void *dst, const void *unsafe_src, size_t size,
+		bool strict)
 {
-	if (unlikely(invalid_probe_range((unsigned long)src)))
-		return -EFAULT;
-
-	return __probe_kernel_read(dst, src, size);
-}
-
-long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr, long count)
-{
-	if (unlikely(invalid_probe_range((unsigned long)unsafe_addr)))
-		return -EFAULT;
-
-	return __strncpy_from_unsafe(dst, unsafe_addr, count);
+	if (!strict)
+		return true;
+	return !invalid_probe_range((unsigned long)unsafe_src);
 }
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index d8366f8468664..7cfc10eb09c60 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -301,9 +301,11 @@ copy_struct_from_user(void *dst, size_t ksize, const void __user *src,
 	return 0;
 }
 
+bool probe_kernel_read_allowed(void *dst, const void *unsafe_src,
+		size_t size, bool strict);
+
 extern long probe_kernel_read(void *dst, const void *src, size_t size);
 extern long probe_kernel_read_strict(void *dst, const void *src, size_t size);
-extern long __probe_kernel_read(void *dst, const void *src, size_t size);
 extern long probe_user_read(void *dst, const void __user *src, size_t size);
 
 extern long notrace probe_kernel_write(void *dst, const void *src, size_t size);
@@ -312,7 +314,7 @@ extern long notrace probe_user_write(void __user *dst, const void *src, size_t s
 extern long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
 long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr,
 		long count);
-extern long __strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
+
 long strncpy_from_user_nofault(char *dst, const void __user *unsafe_addr,
 		long count);
 long strnlen_user_nofault(const void __user *unsafe_addr, long count);
diff --git a/mm/maccess.c b/mm/maccess.c
index 31cf6604e7fff..483a933b7d241 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -6,6 +6,17 @@
 #include <linux/mm.h>
 #include <linux/uaccess.h>
 
+static long __probe_kernel_read(void *dst, const void *src, size_t size,
+		bool strict);
+static long __strncpy_from_unsafe(char *dst, const void *unsafe_addr,
+		long count, bool strict);
+
+bool __weak probe_kernel_read_allowed(void *dst, const void *unsafe_src,
+		size_t size, bool strict)
+{
+	return true;
+}
+
 /**
  * probe_kernel_read(): safely attempt to read from any location
  * @dst: pointer to the buffer that shall take the data
@@ -19,8 +30,11 @@
  * DO NOT USE THIS FUNCTION - it is broken on architectures with entirely
  * separate kernel and user address spaces, and also a bad idea otherwise.
  */
-long __weak probe_kernel_read(void *dst, const void *src, size_t size)
-    __attribute__((alias("__probe_kernel_read")));
+long probe_kernel_read(void *dst, const void *src, size_t size)
+{
+	return __probe_kernel_read(dst, src, size, false);
+}
+EXPORT_SYMBOL_GPL(probe_kernel_read);
 
 /**
  * probe_kernel_read_strict(): safely attempt to read from kernel-space
@@ -36,14 +50,20 @@ long __weak probe_kernel_read(void *dst, const void *src, size_t size)
  * probe_kernel_read() suitable for use within regions where the caller
  * already holds mmap_sem, or other locks which nest inside mmap_sem.
  */
-long __weak probe_kernel_read_strict(void *dst, const void *src, size_t size)
-    __attribute__((alias("__probe_kernel_read")));
+long probe_kernel_read_strict(void *dst, const void *src, size_t size)
+{
+	return __probe_kernel_read(dst, src, size, true);
+}
 
-long __probe_kernel_read(void *dst, const void *src, size_t size)
+static long __probe_kernel_read(void *dst, const void *src, size_t size,
+		bool strict)
 {
 	long ret;
 	mm_segment_t old_fs = get_fs();
 
+	if (!probe_kernel_read_allowed(dst, src, size, strict))
+		return -EFAULT;
+
 	set_fs(KERNEL_DS);
 	pagefault_disable();
 	ret = __copy_from_user_inatomic(dst, (__force const void __user *)src,
@@ -55,7 +75,6 @@ long __probe_kernel_read(void *dst, const void *src, size_t size)
 		return -EFAULT;
 	return 0;
 }
-EXPORT_SYMBOL_GPL(probe_kernel_read);
 
 /**
  * probe_user_read(): safely attempt to read from a user-space location
@@ -161,8 +180,10 @@ long probe_user_write(void __user *dst, const void *src, size_t size)
  * DO NOT USE THIS FUNCTION - it is broken on architectures with entirely
  * separate kernel and user address spaces, and also a bad idea otherwise.
  */
-long __weak strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
-    __attribute__((alias("__strncpy_from_unsafe")));
+long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
+{
+	return __strncpy_from_unsafe(dst, unsafe_addr, count, false);
+}
 
 /**
  * strncpy_from_kernel_nofault: - Copy a NUL terminated string from unsafe
@@ -182,11 +203,13 @@ long __weak strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
  * If @count is smaller than the length of the string, copies @count-1 bytes,
  * sets the last byte of @dst buffer to NUL and returns @count.
  */
-long __weak strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr,
-				       long count)
-    __attribute__((alias("__strncpy_from_unsafe")));
+long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr, long count)
+{
+	return __strncpy_from_unsafe(dst, unsafe_addr, count, true);
+}
 
-long __strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
+static long __strncpy_from_unsafe(char *dst, const void *unsafe_addr,
+		long count, bool strict)
 {
 	mm_segment_t old_fs = get_fs();
 	const void *src = unsafe_addr;
@@ -194,6 +217,8 @@ long __strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
 
 	if (unlikely(count <= 0))
 		return 0;
+	if (!probe_kernel_read_allowed(dst, unsafe_addr, count, strict))
+		return -EFAULT;
 
 	set_fs(KERNEL_DS);
 	pagefault_disable();
-- 
2.26.2


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 11/18] maccess: remove strncpy_from_unsafe
  2020-05-13 16:00 clean up and streamline probe_kernel_* and friends v2 Christoph Hellwig
                   ` (9 preceding siblings ...)
  2020-05-13 16:00 ` [PATCH 10/18] maccess: unify the probe kernel arch hooks Christoph Hellwig
@ 2020-05-13 16:00 ` Christoph Hellwig
  2020-05-13 19:11   ` Linus Torvalds
  2020-05-13 16:00 ` [PATCH 12/18] maccess: always use strict semantics for probe_kernel_read Christoph Hellwig
                   ` (8 subsequent siblings)
  19 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2020-05-13 16:00 UTC (permalink / raw)
  To: x86, Alexei Starovoitov, Daniel Borkmann, Masami Hiramatsu,
	Linus Torvalds, Andrew Morton
  Cc: linux-parisc, linux-um, netdev, bpf, linux-mm, linux-kernel

All three callers really should try the explicit kernel and user
copies instead.  One has already deprecated the somewhat dangerous
either kernel or user address concept, the other two still need to
follow up eventually.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 include/linux/uaccess.h     |  1 -
 kernel/trace/bpf_trace.c    | 39 +++++++++++++++++++++++++------------
 kernel/trace/trace_kprobe.c |  5 ++++-
 mm/maccess.c                | 39 +------------------------------------
 4 files changed, 32 insertions(+), 52 deletions(-)

diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 7cfc10eb09c60..28944a14e0534 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -311,7 +311,6 @@ extern long probe_user_read(void *dst, const void __user *src, size_t size);
 extern long notrace probe_kernel_write(void *dst, const void *src, size_t size);
 extern long notrace probe_user_write(void __user *dst, const void *src, size_t size);
 
-extern long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
 long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr,
 		long count);
 
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 3dd4763c195bb..0d849acc9de38 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -226,12 +226,14 @@ static __always_inline int
 bpf_probe_read_kernel_str_common(void *dst, u32 size, const void *unsafe_ptr,
 				 const bool compat)
 {
+	const void __user *user_ptr = (__force const void __user *)unsafe_ptr;
 	int ret = security_locked_down(LOCKDOWN_BPF_READ);
 
 	if (unlikely(ret < 0))
-		goto out;
+		goto fail;
+
 	/*
-	 * The strncpy_from_unsafe_*() call will likely not fill the entire
+	 * The strncpy_from_*_nofault() calls will likely not fill the entire
 	 * buffer, but that's okay in this circumstance as we're probing
 	 * arbitrary memory anyway similar to bpf_probe_read_*() and might
 	 * as well probe the stack. Thus, memory is explicitly cleared
@@ -239,11 +241,16 @@ bpf_probe_read_kernel_str_common(void *dst, u32 size, const void *unsafe_ptr,
 	 * code altogether don't copy garbage; otherwise length of string
 	 * is returned that can be used for bpf_perf_event_output() et al.
 	 */
-	ret = compat ? strncpy_from_unsafe(dst, unsafe_ptr, size) :
-	      strncpy_from_kernel_nofault(dst, unsafe_ptr, size);
-	if (unlikely(ret < 0))
-out:
-		memset(dst, 0, size);
+	ret = strncpy_from_kernel_nofault(dst, unsafe_ptr, size);
+	if (unlikely(ret < 0)) {
+		if (compat)
+			ret = strncpy_from_user_nofault(dst, user_ptr, size);
+		if (unlikely(ret < 0))
+			goto fail;
+	}
+	return 0;
+fail:
+	memset(dst, 0, size);
 	return ret;
 }
 
@@ -321,6 +328,17 @@ static const struct bpf_func_proto *bpf_get_probe_write_proto(void)
 	return &bpf_probe_write_user_proto;
 }
 
+#define BPF_STRNCPY_LEN 64
+
+static void bpf_strncpy(char *buf, long unsafe_addr)
+{
+	buf[0] = 0;
+	if (strncpy_from_kernel_nofault(buf, (void *)unsafe_addr,
+			BPF_STRNCPY_LEN))
+		strncpy_from_user_nofault(buf, (void __user *)unsafe_addr,
+				BPF_STRNCPY_LEN);
+}
+
 /*
  * Only limited trace_printk() conversion specifiers allowed:
  * %d %i %u %x %ld %li %lu %lx %lld %lli %llu %llx %p %s
@@ -332,7 +350,7 @@ BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1,
 	int mod[3] = {};
 	int fmt_cnt = 0;
 	u64 unsafe_addr;
-	char buf[64];
+	char buf[BPF_STRNCPY_LEN];
 	int i;
 
 	/*
@@ -387,10 +405,7 @@ BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1,
 					arg3 = (long) buf;
 					break;
 				}
-				buf[0] = 0;
-				strncpy_from_unsafe(buf,
-						    (void *) (long) unsafe_addr,
-						    sizeof(buf));
+				bpf_strncpy(buf, unsafe_addr);
 			}
 			continue;
 		}
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 4325f9e7fadaa..8c456e30933d3 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1244,7 +1244,10 @@ fetch_store_string(unsigned long addr, void *dest, void *base)
 	 * Try to get string again, since the string can be changed while
 	 * probing.
 	 */
-	ret = strncpy_from_unsafe(__dest, (void *)addr, maxlen);
+	ret = strncpy_from_kernel_nofault(__dest, (void *)addr, maxlen);
+	if (ret < 0)
+		ret = strncpy_from_user_nofault(__dest, (void __user *)addr,
+				maxlen);
 	if (ret >= 0)
 		*(u32 *)dest = make_data_loc(ret, __dest - base);
 
diff --git a/mm/maccess.c b/mm/maccess.c
index 483a933b7d241..3d85e48013e6b 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -8,8 +8,6 @@
 
 static long __probe_kernel_read(void *dst, const void *src, size_t size,
 		bool strict);
-static long __strncpy_from_unsafe(char *dst, const void *unsafe_addr,
-		long count, bool strict);
 
 bool __weak probe_kernel_read_allowed(void *dst, const void *unsafe_src,
 		size_t size, bool strict)
@@ -156,35 +154,6 @@ long probe_user_write(void __user *dst, const void *src, size_t size)
 	return 0;
 }
 
-/**
- * strncpy_from_unsafe: - Copy a NUL terminated string from unsafe address.
- * @dst:   Destination address, in kernel space.  This buffer must be at
- *         least @count bytes long.
- * @unsafe_addr: Unsafe address.
- * @count: Maximum number of bytes to copy, including the trailing NUL.
- *
- * Copies a NUL-terminated string from unsafe address to kernel buffer.
- *
- * On success, returns the length of the string INCLUDING the trailing NUL.
- *
- * If access fails, returns -EFAULT (some data may have been copied
- * and the trailing NUL added).
- *
- * If @count is smaller than the length of the string, copies @count-1 bytes,
- * sets the last byte of @dst buffer to NUL and returns @count.
- *
- * Same as strncpy_from_kernel_nofault() except that for architectures with
- * not fully separated user and kernel address spaces this function also works
- * for user address tanges.
- *
- * DO NOT USE THIS FUNCTION - it is broken on architectures with entirely
- * separate kernel and user address spaces, and also a bad idea otherwise.
- */
-long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
-{
-	return __strncpy_from_unsafe(dst, unsafe_addr, count, false);
-}
-
 /**
  * strncpy_from_kernel_nofault: - Copy a NUL terminated string from unsafe
  *				 address.
@@ -204,12 +173,6 @@ long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
  * sets the last byte of @dst buffer to NUL and returns @count.
  */
 long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr, long count)
-{
-	return __strncpy_from_unsafe(dst, unsafe_addr, count, true);
-}
-
-static long __strncpy_from_unsafe(char *dst, const void *unsafe_addr,
-		long count, bool strict)
 {
 	mm_segment_t old_fs = get_fs();
 	const void *src = unsafe_addr;
@@ -217,7 +180,7 @@ static long __strncpy_from_unsafe(char *dst, const void *unsafe_addr,
 
 	if (unlikely(count <= 0))
 		return 0;
-	if (!probe_kernel_read_allowed(dst, unsafe_addr, count, strict))
+	if (!probe_kernel_read_allowed(dst, unsafe_addr, count, true))
 		return -EFAULT;
 
 	set_fs(KERNEL_DS);
-- 
2.26.2


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 12/18] maccess: always use strict semantics for probe_kernel_read
  2020-05-13 16:00 clean up and streamline probe_kernel_* and friends v2 Christoph Hellwig
                   ` (10 preceding siblings ...)
  2020-05-13 16:00 ` [PATCH 11/18] maccess: remove strncpy_from_unsafe Christoph Hellwig
@ 2020-05-13 16:00 ` Christoph Hellwig
  2020-05-13 16:00 ` [PATCH 13/18] maccess: move user access routines together Christoph Hellwig
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2020-05-13 16:00 UTC (permalink / raw)
  To: x86, Alexei Starovoitov, Daniel Borkmann, Masami Hiramatsu,
	Linus Torvalds, Andrew Morton
  Cc: linux-parisc, linux-um, netdev, bpf, linux-mm, linux-kernel

Except for historical confusion in the kprobes/uprobes and bpf tracers
there is no good reason to ever allow user memory accesses from
probe_kernel_read.  Make the tracers fall back to a probe_user_read
if the probe_kernel_read falls to keep the core API clean.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 arch/parisc/lib/memcpy.c    |  3 +--
 arch/um/kernel/maccess.c    |  3 +--
 arch/x86/mm/maccess.c       |  5 +----
 include/linux/uaccess.h     |  4 +---
 kernel/trace/bpf_trace.c    | 20 +++++++++++++------
 kernel/trace/trace_kprobe.c | 11 ++++++++++-
 mm/maccess.c                | 39 ++++++-------------------------------
 7 files changed, 34 insertions(+), 51 deletions(-)

diff --git a/arch/parisc/lib/memcpy.c b/arch/parisc/lib/memcpy.c
index 5ef648bd33119..9fe662b3b5604 100644
--- a/arch/parisc/lib/memcpy.c
+++ b/arch/parisc/lib/memcpy.c
@@ -57,8 +57,7 @@ void * memcpy(void * dst,const void *src, size_t count)
 EXPORT_SYMBOL(raw_copy_in_user);
 EXPORT_SYMBOL(memcpy);
 
-bool probe_kernel_read_allowed(void *dst, const void *unsafe_src, size_t size,
-		bool strict)
+bool probe_kernel_read_allowed(void *dst, const void *unsafe_src, size_t size)
 {
 	if ((unsigned long)unsafe_src < PAGE_SIZE)
 		return false;
diff --git a/arch/um/kernel/maccess.c b/arch/um/kernel/maccess.c
index 90a1bec923158..734f3d7e57c0f 100644
--- a/arch/um/kernel/maccess.c
+++ b/arch/um/kernel/maccess.c
@@ -7,8 +7,7 @@
 #include <linux/kernel.h>
 #include <os.h>
 
-bool probe_kernel_read_allowed(void *dst, const void *src, size_t size,
-		bool strict)
+bool probe_kernel_read_allowed(void *dst, const void *src, size_t size)
 {
 	void *psrc = (void *)rounddown((unsigned long)src, PAGE_SIZE);
 
diff --git a/arch/x86/mm/maccess.c b/arch/x86/mm/maccess.c
index 5c323ab187b27..a1bd81677aa72 100644
--- a/arch/x86/mm/maccess.c
+++ b/arch/x86/mm/maccess.c
@@ -26,10 +26,7 @@ static __always_inline bool invalid_probe_range(u64 vaddr)
 }
 #endif
 
-bool probe_kernel_read_allowed(void *dst, const void *unsafe_src, size_t size,
-		bool strict)
+bool probe_kernel_read_allowed(void *dst, const void *unsafe_src, size_t size)
 {
-	if (!strict)
-		return true;
 	return !invalid_probe_range((unsigned long)unsafe_src);
 }
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 28944a14e0534..78e0ff8641559 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -301,11 +301,9 @@ copy_struct_from_user(void *dst, size_t ksize, const void __user *src,
 	return 0;
 }
 
-bool probe_kernel_read_allowed(void *dst, const void *unsafe_src,
-		size_t size, bool strict);
+bool probe_kernel_read_allowed(void *dst, const void *unsafe_src, size_t size);
 
 extern long probe_kernel_read(void *dst, const void *src, size_t size);
-extern long probe_kernel_read_strict(void *dst, const void *src, size_t size);
 extern long probe_user_read(void *dst, const void __user *src, size_t size);
 
 extern long notrace probe_kernel_write(void *dst, const void *src, size_t size);
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 0d849acc9de38..5dea4169a8323 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -180,15 +180,23 @@ static __always_inline int
 bpf_probe_read_kernel_common(void *dst, u32 size, const void *unsafe_ptr,
 			     const bool compat)
 {
+	const void __user *user_ptr = (__force const void __user *)unsafe_ptr;
 	int ret = security_locked_down(LOCKDOWN_BPF_READ);
 
 	if (unlikely(ret < 0))
-		goto out;
-	ret = compat ? probe_kernel_read(dst, unsafe_ptr, size) :
-	      probe_kernel_read_strict(dst, unsafe_ptr, size);
-	if (unlikely(ret < 0))
-out:
-		memset(dst, 0, size);
+		goto fail;
+
+	ret = probe_kernel_read(dst, unsafe_ptr, size);
+	if (unlikely(ret < 0)) {
+		if (compat)
+			ret = probe_user_read(dst, user_ptr, size);
+		if (unlikely(ret < 0))
+			goto fail;
+	}
+
+	return 0;
+fail:
+	memset(dst, 0, size);
 	return ret;
 }
 
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 8c456e30933d3..7725146369731 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1209,6 +1209,9 @@ fetch_store_strlen(unsigned long addr)
 
 	do {
 		ret = probe_kernel_read(&c, (u8 *)addr + len, 1);
+		if (ret)
+			ret = probe_user_read(&c,
+				(__force u8 __user *)addr + len, 1);
 		len++;
 	} while (c && ret == 0 && len < MAX_STRING_SIZE);
 
@@ -1281,7 +1284,13 @@ fetch_store_string_user(unsigned long addr, void *dest, void *base)
 static nokprobe_inline int
 probe_mem_read(void *dest, void *src, size_t size)
 {
-	return probe_kernel_read(dest, src, size);
+	const void __user *user_ptr = (__force const void __user *)src;
+	int ret;
+
+	ret = probe_kernel_read(dest, src, size);
+	if (ret)
+		ret = probe_user_read(dest, user_ptr, size);
+	return ret;
 }
 
 static nokprobe_inline int
diff --git a/mm/maccess.c b/mm/maccess.c
index 3d85e48013e6b..05c44d490b4e3 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -6,36 +6,14 @@
 #include <linux/mm.h>
 #include <linux/uaccess.h>
 
-static long __probe_kernel_read(void *dst, const void *src, size_t size,
-		bool strict);
-
 bool __weak probe_kernel_read_allowed(void *dst, const void *unsafe_src,
-		size_t size, bool strict)
+		size_t size)
 {
 	return true;
 }
 
 /**
- * probe_kernel_read(): safely attempt to read from any location
- * @dst: pointer to the buffer that shall take the data
- * @src: address to read from
- * @size: size of the data chunk
- *
- * Same as probe_kernel_read_strict() except that for architectures with
- * not fully separated user and kernel address spaces this function also works
- * for user address tanges.
- *
- * DO NOT USE THIS FUNCTION - it is broken on architectures with entirely
- * separate kernel and user address spaces, and also a bad idea otherwise.
- */
-long probe_kernel_read(void *dst, const void *src, size_t size)
-{
-	return __probe_kernel_read(dst, src, size, false);
-}
-EXPORT_SYMBOL_GPL(probe_kernel_read);
-
-/**
- * probe_kernel_read_strict(): safely attempt to read from kernel-space
+ * probe_kernel_read(): safely attempt to read from kernel-space
  * @dst: pointer to the buffer that shall take the data
  * @src: address to read from
  * @size: size of the data chunk
@@ -48,18 +26,12 @@ EXPORT_SYMBOL_GPL(probe_kernel_read);
  * probe_kernel_read() suitable for use within regions where the caller
  * already holds mmap_sem, or other locks which nest inside mmap_sem.
  */
-long probe_kernel_read_strict(void *dst, const void *src, size_t size)
-{
-	return __probe_kernel_read(dst, src, size, true);
-}
-
-static long __probe_kernel_read(void *dst, const void *src, size_t size,
-		bool strict)
+long probe_kernel_read(void *dst, const void *src, size_t size)
 {
 	long ret;
 	mm_segment_t old_fs = get_fs();
 
-	if (!probe_kernel_read_allowed(dst, src, size, strict))
+	if (!probe_kernel_read_allowed(dst, src, size))
 		return -EFAULT;
 
 	set_fs(KERNEL_DS);
@@ -73,6 +45,7 @@ static long __probe_kernel_read(void *dst, const void *src, size_t size,
 		return -EFAULT;
 	return 0;
 }
+EXPORT_SYMBOL_GPL(probe_kernel_read);
 
 /**
  * probe_user_read(): safely attempt to read from a user-space location
@@ -180,7 +153,7 @@ long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr, long count)
 
 	if (unlikely(count <= 0))
 		return 0;
-	if (!probe_kernel_read_allowed(dst, unsafe_addr, count, true))
+	if (!probe_kernel_read_allowed(dst, unsafe_addr, count))
 		return -EFAULT;
 
 	set_fs(KERNEL_DS);
-- 
2.26.2


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 13/18] maccess: move user access routines together
  2020-05-13 16:00 clean up and streamline probe_kernel_* and friends v2 Christoph Hellwig
                   ` (11 preceding siblings ...)
  2020-05-13 16:00 ` [PATCH 12/18] maccess: always use strict semantics for probe_kernel_read Christoph Hellwig
@ 2020-05-13 16:00 ` Christoph Hellwig
  2020-05-13 16:00 ` [PATCH 14/18] maccess: allow architectures to provide kernel probing directly Christoph Hellwig
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2020-05-13 16:00 UTC (permalink / raw)
  To: x86, Alexei Starovoitov, Daniel Borkmann, Masami Hiramatsu,
	Linus Torvalds, Andrew Morton
  Cc: linux-parisc, linux-um, netdev, bpf, linux-mm, linux-kernel

Move kernel access vs user access routines together to ease upcoming
ifdefs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 mm/maccess.c | 110 +++++++++++++++++++++++++--------------------------
 1 file changed, 55 insertions(+), 55 deletions(-)

diff --git a/mm/maccess.c b/mm/maccess.c
index 05c44d490b4e3..9773e2253b495 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -47,34 +47,6 @@ long probe_kernel_read(void *dst, const void *src, size_t size)
 }
 EXPORT_SYMBOL_GPL(probe_kernel_read);
 
-/**
- * probe_user_read(): safely attempt to read from a user-space location
- * @dst: pointer to the buffer that shall take the data
- * @src: address to read from. This must be a user address.
- * @size: size of the data chunk
- *
- * Safely read from user address @src to the buffer at @dst. If a kernel fault
- * happens, handle that and return -EFAULT.
- */
-long probe_user_read(void *dst, const void __user *src, size_t size)
-{
-	long ret = -EFAULT;
-	mm_segment_t old_fs = get_fs();
-
-	set_fs(USER_DS);
-	if (access_ok(src, size)) {
-		pagefault_disable();
-		ret = __copy_from_user_inatomic(dst, src, size);
-		pagefault_enable();
-	}
-	set_fs(old_fs);
-
-	if (ret)
-		return -EFAULT;
-	return 0;
-}
-EXPORT_SYMBOL_GPL(probe_user_read);
-
 /**
  * probe_kernel_write(): safely attempt to write to a location
  * @dst: address to write to
@@ -100,33 +72,6 @@ long probe_kernel_write(void *dst, const void *src, size_t size)
 	return 0;
 }
 
-/**
- * probe_user_write(): safely attempt to write to a user-space location
- * @dst: address to write to
- * @src: pointer to the data that shall be written
- * @size: size of the data chunk
- *
- * Safely write to address @dst from the buffer at @src.  If a kernel fault
- * happens, handle that and return -EFAULT.
- */
-long probe_user_write(void __user *dst, const void *src, size_t size)
-{
-	long ret = -EFAULT;
-	mm_segment_t old_fs = get_fs();
-
-	set_fs(USER_DS);
-	if (access_ok(dst, size)) {
-		pagefault_disable();
-		ret = __copy_to_user_inatomic(dst, src, size);
-		pagefault_enable();
-	}
-	set_fs(old_fs);
-
-	if (ret)
-		return -EFAULT;
-	return 0;
-}
-
 /**
  * strncpy_from_kernel_nofault: - Copy a NUL terminated string from unsafe
  *				 address.
@@ -170,6 +115,61 @@ long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr, long count)
 	return ret ? -EFAULT : src - unsafe_addr;
 }
 
+/**
+ * probe_user_read(): safely attempt to read from a user-space location
+ * @dst: pointer to the buffer that shall take the data
+ * @src: address to read from. This must be a user address.
+ * @size: size of the data chunk
+ *
+ * Safely read from user address @src to the buffer at @dst. If a kernel fault
+ * happens, handle that and return -EFAULT.
+ */
+long probe_user_read(void *dst, const void __user *src, size_t size)
+{
+	long ret = -EFAULT;
+	mm_segment_t old_fs = get_fs();
+
+	set_fs(USER_DS);
+	if (access_ok(src, size)) {
+		pagefault_disable();
+		ret = __copy_from_user_inatomic(dst, src, size);
+		pagefault_enable();
+	}
+	set_fs(old_fs);
+
+	if (ret)
+		return -EFAULT;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(probe_user_read);
+
+/**
+ * probe_user_write(): safely attempt to write to a user-space location
+ * @dst: address to write to
+ * @src: pointer to the data that shall be written
+ * @size: size of the data chunk
+ *
+ * Safely write to address @dst from the buffer at @src.  If a kernel fault
+ * happens, handle that and return -EFAULT.
+ */
+long probe_user_write(void __user *dst, const void *src, size_t size)
+{
+	long ret = -EFAULT;
+	mm_segment_t old_fs = get_fs();
+
+	set_fs(USER_DS);
+	if (access_ok(dst, size)) {
+		pagefault_disable();
+		ret = __copy_to_user_inatomic(dst, src, size);
+		pagefault_enable();
+	}
+	set_fs(old_fs);
+
+	if (ret)
+		return -EFAULT;
+	return 0;
+}
+
 /**
  * strncpy_from_user_nofault: - Copy a NUL terminated string from unsafe user
  *				address.
-- 
2.26.2


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 14/18] maccess: allow architectures to provide kernel probing directly
  2020-05-13 16:00 clean up and streamline probe_kernel_* and friends v2 Christoph Hellwig
                   ` (12 preceding siblings ...)
  2020-05-13 16:00 ` [PATCH 13/18] maccess: move user access routines together Christoph Hellwig
@ 2020-05-13 16:00 ` Christoph Hellwig
  2020-05-13 19:36   ` Linus Torvalds
  2020-05-16  3:42   ` Masami Hiramatsu
  2020-05-13 16:00 ` [PATCH 15/18] x86: use non-set_fs based maccess routines Christoph Hellwig
                   ` (5 subsequent siblings)
  19 siblings, 2 replies; 46+ messages in thread
From: Christoph Hellwig @ 2020-05-13 16:00 UTC (permalink / raw)
  To: x86, Alexei Starovoitov, Daniel Borkmann, Masami Hiramatsu,
	Linus Torvalds, Andrew Morton
  Cc: linux-parisc, linux-um, netdev, bpf, linux-mm, linux-kernel

Provide alternative versions of probe_kernel_read, probe_kernel_write
and strncpy_from_kernel_unsafe that don't need set_fs magic, but instead
use arch hooks that are modelled after unsafe_{get,put}_user to access
kernel memory in an exception safe way.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 mm/maccess.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)

diff --git a/mm/maccess.c b/mm/maccess.c
index 9773e2253b495..e9efe2f98e34a 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -12,6 +12,81 @@ bool __weak probe_kernel_read_allowed(void *dst, const void *unsafe_src,
 	return true;
 }
 
+#ifdef HAVE_ARCH_PROBE_KERNEL
+
+#define probe_kernel_read_loop(dst, src, len, type, err_label)		\
+	while (len >= sizeof(type)) {					\
+		arch_kernel_read(dst, src, type, err_label);		\
+		dst += sizeof(type);					\
+		src += sizeof(type);					\
+		len -= sizeof(type);					\
+	}
+
+long probe_kernel_read(void *dst, const void *src, size_t size)
+{
+	if (!probe_kernel_read_allowed(dst, src, size))
+		return -EFAULT;
+
+	pagefault_disable();
+	probe_kernel_read_loop(dst, src, size, u64, Efault);
+	probe_kernel_read_loop(dst, src, size, u32, Efault);
+	probe_kernel_read_loop(dst, src, size, u16, Efault);
+	probe_kernel_read_loop(dst, src, size, u8, Efault);
+	pagefault_enable();
+	return 0;
+Efault:
+	pagefault_enable();
+	return -EFAULT;
+}
+EXPORT_SYMBOL_GPL(probe_kernel_read);
+
+#define probe_kernel_write_loop(dst, src, len, type, err_label)		\
+	while (len >= sizeof(type)) {					\
+		arch_kernel_write(dst, src, type, err_label);		\
+		dst += sizeof(type);					\
+		src += sizeof(type);					\
+		len -= sizeof(type);					\
+	}
+
+long probe_kernel_write(void *dst, const void *src, size_t size)
+{
+	pagefault_disable();
+	probe_kernel_write_loop(dst, src, size, u64, Efault);
+	probe_kernel_write_loop(dst, src, size, u32, Efault);
+	probe_kernel_write_loop(dst, src, size, u16, Efault);
+	probe_kernel_write_loop(dst, src, size, u8, Efault);
+	pagefault_enable();
+	return 0;
+Efault:
+	pagefault_enable();
+	return -EFAULT;
+}
+
+long strncpy_from_kernel_unsafe(char *dst, const void *unsafe_addr, long count)
+{
+	const void *src = unsafe_addr;
+
+	if (unlikely(count <= 0))
+		return 0;
+	if (!probe_kernel_read_allowed(dst, unsafe_addr, count))
+		return -EFAULT;
+
+	pagefault_disable();
+	do {
+		arch_kernel_read(dst, src, u8, Efault);
+		dst++;
+		src++;
+	} while (dst[-1] && src - unsafe_addr < count);
+	pagefault_enable();
+
+	dst[-1] = '\0';
+	return src - unsafe_addr;
+Efault:
+	pagefault_enable();
+	dst[-1] = '\0';
+	return -EFAULT;
+}
+#else /* HAVE_ARCH_PROBE_KERNEL */
 /**
  * probe_kernel_read(): safely attempt to read from kernel-space
  * @dst: pointer to the buffer that shall take the data
@@ -114,6 +189,7 @@ long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr, long count)
 
 	return ret ? -EFAULT : src - unsafe_addr;
 }
+#endif /* HAVE_ARCH_PROBE_KERNEL */
 
 /**
  * probe_user_read(): safely attempt to read from a user-space location
-- 
2.26.2


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 15/18] x86: use non-set_fs based maccess routines
  2020-05-13 16:00 clean up and streamline probe_kernel_* and friends v2 Christoph Hellwig
                   ` (13 preceding siblings ...)
  2020-05-13 16:00 ` [PATCH 14/18] maccess: allow architectures to provide kernel probing directly Christoph Hellwig
@ 2020-05-13 16:00 ` Christoph Hellwig
  2020-05-13 16:00 ` [PATCH 16/18] maccess: rename probe_kernel_{read,write} to copy_{from,to}_kernel_nofault Christoph Hellwig
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2020-05-13 16:00 UTC (permalink / raw)
  To: x86, Alexei Starovoitov, Daniel Borkmann, Masami Hiramatsu,
	Linus Torvalds, Andrew Morton
  Cc: linux-parisc, linux-um, netdev, bpf, linux-mm, linux-kernel

Provide arch_kernel_read and arch_kernel_write routines to implement the
maccess routines without messing with set_fs and without stac/clac that
opens up access to user space.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/x86/include/asm/uaccess.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index d8f283b9a569c..765e18417b3ba 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -523,5 +523,21 @@ do {									\
 	unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u8, label);	\
 } while (0)
 
+#define HAVE_ARCH_PROBE_KERNEL
+
+#define arch_kernel_read(dst, src, type, err_label)			\
+do {									\
+        int __kr_err;							\
+									\
+	__get_user_size(*((type *)dst), (__force type __user *)src,	\
+			sizeof(type), __kr_err);			\
+        if (unlikely(__kr_err))						\
+		goto err_label;						\
+} while (0)
+
+#define arch_kernel_write(dst, src, type, err_label)			\
+	__put_user_size(*((type *)(src)), (__force type __user *)(dst),	\
+			sizeof(type), err_label)
+
 #endif /* _ASM_X86_UACCESS_H */
 
-- 
2.26.2


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 16/18] maccess: rename probe_kernel_{read,write} to copy_{from,to}_kernel_nofault
  2020-05-13 16:00 clean up and streamline probe_kernel_* and friends v2 Christoph Hellwig
                   ` (14 preceding siblings ...)
  2020-05-13 16:00 ` [PATCH 15/18] x86: use non-set_fs based maccess routines Christoph Hellwig
@ 2020-05-13 16:00 ` Christoph Hellwig
  2020-05-13 16:00 ` [PATCH 17/18] maccess: rename probe_user_{read,write} to copy_{from,to}_user_nofault Christoph Hellwig
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2020-05-13 16:00 UTC (permalink / raw)
  To: x86, Alexei Starovoitov, Daniel Borkmann, Masami Hiramatsu,
	Linus Torvalds, Andrew Morton
  Cc: linux-parisc, linux-um, netdev, bpf, linux-mm, linux-kernel

Better describe what these functions do.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/arm/kernel/ftrace.c           |  3 +-
 arch/arm/kernel/kgdb.c             |  2 +-
 arch/arm64/kernel/insn.c           |  4 +--
 arch/csky/kernel/ftrace.c          |  5 ++--
 arch/ia64/kernel/ftrace.c          |  6 ++--
 arch/mips/kernel/kprobes.c         |  6 ++--
 arch/nds32/kernel/ftrace.c         |  5 ++--
 arch/parisc/kernel/ftrace.c        |  2 +-
 arch/parisc/kernel/kgdb.c          |  4 +--
 arch/parisc/lib/memcpy.c           |  3 +-
 arch/powerpc/kernel/module_64.c    |  5 ++--
 arch/powerpc/kernel/trace/ftrace.c | 24 +++++++--------
 arch/powerpc/perf/core-book3s.c    |  3 +-
 arch/riscv/kernel/ftrace.c         |  3 +-
 arch/riscv/kernel/patch.c          |  4 +--
 arch/s390/kernel/ftrace.c          |  4 +--
 arch/sh/kernel/ftrace.c            |  6 ++--
 arch/um/kernel/maccess.c           |  2 +-
 arch/x86/include/asm/ptrace.h      |  4 +--
 arch/x86/kernel/dumpstack.c        |  2 +-
 arch/x86/kernel/ftrace.c           |  8 ++---
 arch/x86/kernel/kgdb.c             |  6 ++--
 arch/x86/kernel/kprobes/core.c     |  5 ++--
 arch/x86/kernel/kprobes/opt.c      |  2 +-
 arch/x86/kernel/traps.c            |  3 +-
 arch/x86/mm/fault.c                |  2 +-
 arch/x86/mm/init_32.c              |  2 +-
 arch/x86/mm/maccess.c              |  3 +-
 arch/x86/xen/enlighten_pv.c        |  2 +-
 drivers/char/mem.c                 |  2 +-
 drivers/dio/dio.c                  |  6 ++--
 drivers/input/serio/hp_sdc.c       |  2 +-
 drivers/misc/kgdbts.c              |  6 ++--
 drivers/video/fbdev/hpfb.c         |  2 +-
 fs/proc/kcore.c                    |  3 +-
 include/linux/uaccess.h            | 14 +++++----
 kernel/debug/debug_core.c          |  6 ++--
 kernel/debug/gdbstub.c             |  6 ++--
 kernel/debug/kdb/kdb_main.c        |  3 +-
 kernel/debug/kdb/kdb_support.c     |  7 +++--
 kernel/kthread.c                   |  2 +-
 kernel/trace/bpf_trace.c           |  2 +-
 kernel/trace/trace_kprobe.c        |  4 +--
 kernel/workqueue.c                 | 10 +++----
 mm/maccess.c                       | 48 +++++++++++++++---------------
 mm/rodata_test.c                   |  2 +-
 mm/slub.c                          |  2 +-
 47 files changed, 137 insertions(+), 120 deletions(-)

diff --git a/arch/arm/kernel/ftrace.c b/arch/arm/kernel/ftrace.c
index 10499d44964a2..9a79ef6b1876c 100644
--- a/arch/arm/kernel/ftrace.c
+++ b/arch/arm/kernel/ftrace.c
@@ -84,7 +84,8 @@ static int ftrace_modify_code(unsigned long pc, unsigned long old,
 		old = __opcode_to_mem_arm(old);
 
 	if (validate) {
-		if (probe_kernel_read(&replaced, (void *)pc, MCOUNT_INSN_SIZE))
+		if (copy_from_kernel_nofault(&replaced, (void *)pc,
+				MCOUNT_INSN_SIZE))
 			return -EFAULT;
 
 		if (replaced != old)
diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c
index 6a95b92966406..7bd30c0a4280d 100644
--- a/arch/arm/kernel/kgdb.c
+++ b/arch/arm/kernel/kgdb.c
@@ -236,7 +236,7 @@ int kgdb_arch_set_breakpoint(struct kgdb_bkpt *bpt)
 	/* patch_text() only supports int-sized breakpoints */
 	BUILD_BUG_ON(sizeof(int) != BREAK_INSTR_SIZE);
 
-	err = probe_kernel_read(bpt->saved_instr, (char *)bpt->bpt_addr,
+	err = copy_from_kernel_nofault(bpt->saved_instr, (char *)bpt->bpt_addr,
 				BREAK_INSTR_SIZE);
 	if (err)
 		return err;
diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
index 4a9e773a177f0..c5f7a7b8d2c3e 100644
--- a/arch/arm64/kernel/insn.c
+++ b/arch/arm64/kernel/insn.c
@@ -123,7 +123,7 @@ int __kprobes aarch64_insn_read(void *addr, u32 *insnp)
 	int ret;
 	__le32 val;
 
-	ret = probe_kernel_read(&val, addr, AARCH64_INSN_SIZE);
+	ret = copy_from_kernel_nofault(&val, addr, AARCH64_INSN_SIZE);
 	if (!ret)
 		*insnp = le32_to_cpu(val);
 
@@ -139,7 +139,7 @@ static int __kprobes __aarch64_insn_write(void *addr, __le32 insn)
 	raw_spin_lock_irqsave(&patch_lock, flags);
 	waddr = patch_map(addr, FIX_TEXT_POKE0);
 
-	ret = probe_kernel_write(waddr, &insn, AARCH64_INSN_SIZE);
+	ret = copy_to_kernel_nofault(waddr, &insn, AARCH64_INSN_SIZE);
 
 	patch_unmap(FIX_TEXT_POKE0);
 	raw_spin_unlock_irqrestore(&patch_lock, flags);
diff --git a/arch/csky/kernel/ftrace.c b/arch/csky/kernel/ftrace.c
index 44628e3f7fa68..bded363eae714 100644
--- a/arch/csky/kernel/ftrace.c
+++ b/arch/csky/kernel/ftrace.c
@@ -72,7 +72,8 @@ static int ftrace_check_current_nop(unsigned long hook)
 	uint16_t olds[7];
 	unsigned long hook_pos = hook - 2;
 
-	if (probe_kernel_read((void *)olds, (void *)hook_pos, sizeof(nops)))
+	if (copy_from_kernel_nofault((void *)olds, (void *)hook_pos,
+			sizeof(nops)))
 		return -EFAULT;
 
 	if (memcmp((void *)nops, (void *)olds, sizeof(nops))) {
@@ -97,7 +98,7 @@ static int ftrace_modify_code(unsigned long hook, unsigned long target,
 
 	make_jbsr(target, hook, call, nolr);
 
-	ret = probe_kernel_write((void *)hook_pos, enable ? call : nops,
+	ret = copy_to_kernel_nofault((void *)hook_pos, enable ? call : nops,
 				 sizeof(nops));
 	if (ret)
 		return -EPERM;
diff --git a/arch/ia64/kernel/ftrace.c b/arch/ia64/kernel/ftrace.c
index cee411e647ca0..b2ab2d58fb30c 100644
--- a/arch/ia64/kernel/ftrace.c
+++ b/arch/ia64/kernel/ftrace.c
@@ -108,7 +108,7 @@ ftrace_modify_code(unsigned long ip, unsigned char *old_code,
 		goto skip_check;
 
 	/* read the text we want to modify */
-	if (probe_kernel_read(replaced, (void *)ip, MCOUNT_INSN_SIZE))
+	if (copy_from_kernel_nofault(replaced, (void *)ip, MCOUNT_INSN_SIZE))
 		return -EFAULT;
 
 	/* Make sure it is what we expect it to be */
@@ -117,7 +117,7 @@ ftrace_modify_code(unsigned long ip, unsigned char *old_code,
 
 skip_check:
 	/* replace the text with the new text */
-	if (probe_kernel_write(((void *)ip), new_code, MCOUNT_INSN_SIZE))
+	if (copy_to_kernel_nofault(((void *)ip), new_code, MCOUNT_INSN_SIZE))
 		return -EPERM;
 	flush_icache_range(ip, ip + MCOUNT_INSN_SIZE);
 
@@ -129,7 +129,7 @@ static int ftrace_make_nop_check(struct dyn_ftrace *rec, unsigned long addr)
 	unsigned char __attribute__((aligned(8))) replaced[MCOUNT_INSN_SIZE];
 	unsigned long ip = rec->ip;
 
-	if (probe_kernel_read(replaced, (void *)ip, MCOUNT_INSN_SIZE))
+	if (copy_from_kernel_nofault(replaced, (void *)ip, MCOUNT_INSN_SIZE))
 		return -EFAULT;
 	if (rec->flags & FTRACE_FL_CONVERTED) {
 		struct ftrace_call_insn *call_insn, *tmp_call;
diff --git a/arch/mips/kernel/kprobes.c b/arch/mips/kernel/kprobes.c
index 6cfae2411c044..d043c2f897fc2 100644
--- a/arch/mips/kernel/kprobes.c
+++ b/arch/mips/kernel/kprobes.c
@@ -86,9 +86,9 @@ int __kprobes arch_prepare_kprobe(struct kprobe *p)
 		goto out;
 	}
 
-	if ((probe_kernel_read(&prev_insn, p->addr - 1,
-				sizeof(mips_instruction)) == 0) &&
-				insn_has_delayslot(prev_insn)) {
+	if (copy_from_kernel_nofault(&prev_insn, p->addr - 1,
+			sizeof(mips_instruction)) == 0 &&
+	    insn_has_delayslot(prev_insn)) {
 		pr_notice("Kprobes for branch delayslot are not supported\n");
 		ret = -EINVAL;
 		goto out;
diff --git a/arch/nds32/kernel/ftrace.c b/arch/nds32/kernel/ftrace.c
index 22ab77ea27ad3..3763b3f8c3db5 100644
--- a/arch/nds32/kernel/ftrace.c
+++ b/arch/nds32/kernel/ftrace.c
@@ -131,13 +131,14 @@ static int __ftrace_modify_code(unsigned long pc, unsigned long *old_insn,
 	unsigned long orig_insn[3];
 
 	if (validate) {
-		if (probe_kernel_read(orig_insn, (void *)pc, MCOUNT_INSN_SIZE))
+		if (copy_from_kernel_nofault(orig_insn, (void *)pc,
+				MCOUNT_INSN_SIZE))
 			return -EFAULT;
 		if (memcmp(orig_insn, old_insn, MCOUNT_INSN_SIZE))
 			return -EINVAL;
 	}
 
-	if (probe_kernel_write((void *)pc, new_insn, MCOUNT_INSN_SIZE))
+	if (copy_to_kernel_nofault((void *)pc, new_insn, MCOUNT_INSN_SIZE))
 		return -EPERM;
 
 	return 0;
diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
index b836fc61a24f4..1df0f67ed6671 100644
--- a/arch/parisc/kernel/ftrace.c
+++ b/arch/parisc/kernel/ftrace.c
@@ -172,7 +172,7 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 
 	ip = (void *)(rec->ip + 4 - size);
 
-	ret = probe_kernel_read(insn, ip, size);
+	ret = copy_from_kernel_nofault(insn, ip, size);
 	if (ret)
 		return ret;
 
diff --git a/arch/parisc/kernel/kgdb.c b/arch/parisc/kernel/kgdb.c
index 664278db9b977..c4554ac13eac7 100644
--- a/arch/parisc/kernel/kgdb.c
+++ b/arch/parisc/kernel/kgdb.c
@@ -154,8 +154,8 @@ void kgdb_arch_set_pc(struct pt_regs *regs, unsigned long ip)
 
 int kgdb_arch_set_breakpoint(struct kgdb_bkpt *bpt)
 {
-	int ret = probe_kernel_read(bpt->saved_instr, (char *)bpt->bpt_addr,
-				BREAK_INSTR_SIZE);
+	int ret = copy_from_kernel_nofault(bpt->saved_instr,
+			(char *)bpt->bpt_addr, BREAK_INSTR_SIZE);
 	if (ret)
 		return ret;
 
diff --git a/arch/parisc/lib/memcpy.c b/arch/parisc/lib/memcpy.c
index 9fe662b3b5604..f51811d613d72 100644
--- a/arch/parisc/lib/memcpy.c
+++ b/arch/parisc/lib/memcpy.c
@@ -57,7 +57,8 @@ void * memcpy(void * dst,const void *src, size_t count)
 EXPORT_SYMBOL(raw_copy_in_user);
 EXPORT_SYMBOL(memcpy);
 
-bool probe_kernel_read_allowed(void *dst, const void *unsafe_src, size_t size)
+bool copy_from_kernel_nofault_allowed(void *dst, const void *unsafe_src,
+		size_t size)
 {
 	if ((unsigned long)unsafe_src < PAGE_SIZE)
 		return false;
diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 007606a48fd98..29067ef485411 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -159,7 +159,7 @@ int module_trampoline_target(struct module *mod, unsigned long addr,
 
 	stub = (struct ppc64_stub_entry *)addr;
 
-	if (probe_kernel_read(&magic, &stub->magic, sizeof(magic))) {
+	if (copy_from_kernel_nofault(&magic, &stub->magic, sizeof(magic))) {
 		pr_err("%s: fault reading magic for stub %lx for %s\n", __func__, addr, mod->name);
 		return -EFAULT;
 	}
@@ -169,7 +169,8 @@ int module_trampoline_target(struct module *mod, unsigned long addr,
 		return -EFAULT;
 	}
 
-	if (probe_kernel_read(&funcdata, &stub->funcdata, sizeof(funcdata))) {
+	if (copy_from_kernel_nofault(&funcdata, &stub->funcdata,
+			sizeof(funcdata))) {
 		pr_err("%s: fault reading funcdata for stub %lx for %s\n", __func__, addr, mod->name);
                 return -EFAULT;
 	}
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 7ea0ca044b650..60ee6813830c3 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -67,7 +67,7 @@ ftrace_modify_code(unsigned long ip, unsigned int old, unsigned int new)
 	 */
 
 	/* read the text we want to modify */
-	if (probe_kernel_read(&replaced, (void *)ip, MCOUNT_INSN_SIZE))
+	if (copy_from_kernel_nofault(&replaced, (void *)ip, MCOUNT_INSN_SIZE))
 		return -EFAULT;
 
 	/* Make sure it is what we expect it to be */
@@ -128,7 +128,7 @@ __ftrace_make_nop(struct module *mod,
 	unsigned int op, pop;
 
 	/* read where this goes */
-	if (probe_kernel_read(&op, (void *)ip, sizeof(int))) {
+	if (copy_from_kernel_nofault(&op, (void *)ip, sizeof(int))) {
 		pr_err("Fetching opcode failed.\n");
 		return -EFAULT;
 	}
@@ -162,7 +162,7 @@ __ftrace_make_nop(struct module *mod,
 	/* When using -mkernel_profile there is no load to jump over */
 	pop = PPC_INST_NOP;
 
-	if (probe_kernel_read(&op, (void *)(ip - 4), 4)) {
+	if (copy_from_kernel_nofault(&op, (void *)(ip - 4), 4)) {
 		pr_err("Fetching instruction at %lx failed.\n", ip - 4);
 		return -EFAULT;
 	}
@@ -193,7 +193,7 @@ __ftrace_make_nop(struct module *mod,
 	 * Check what is in the next instruction. We can see ld r2,40(r1), but
 	 * on first pass after boot we will see mflr r0.
 	 */
-	if (probe_kernel_read(&op, (void *)(ip+4), MCOUNT_INSN_SIZE)) {
+	if (copy_from_kernel_nofault(&op, (void *)(ip+4), MCOUNT_INSN_SIZE)) {
 		pr_err("Fetching op failed.\n");
 		return -EFAULT;
 	}
@@ -222,7 +222,7 @@ __ftrace_make_nop(struct module *mod,
 	unsigned long ip = rec->ip;
 	unsigned long tramp;
 
-	if (probe_kernel_read(&op, (void *)ip, MCOUNT_INSN_SIZE))
+	if (copy_from_kernel_nofault(&op, (void *)ip, MCOUNT_INSN_SIZE))
 		return -EFAULT;
 
 	/* Make sure that that this is still a 24bit jump */
@@ -245,7 +245,7 @@ __ftrace_make_nop(struct module *mod,
 	pr_devel("ip:%lx jumps to %lx", ip, tramp);
 
 	/* Find where the trampoline jumps to */
-	if (probe_kernel_read(jmp, (void *)tramp, sizeof(jmp))) {
+	if (copy_from_kernel_nofault(jmp, (void *)tramp, sizeof(jmp))) {
 		pr_err("Failed to read %lx\n", tramp);
 		return -EFAULT;
 	}
@@ -341,7 +341,7 @@ static int setup_mcount_compiler_tramp(unsigned long tramp)
 			return -1;
 
 	/* New trampoline -- read where this goes */
-	if (probe_kernel_read(&op, (void *)tramp, sizeof(int))) {
+	if (copy_from_kernel_nofault(&op, (void *)tramp, sizeof(int))) {
 		pr_debug("Fetching opcode failed.\n");
 		return -1;
 	}
@@ -391,7 +391,7 @@ static int __ftrace_make_nop_kernel(struct dyn_ftrace *rec, unsigned long addr)
 	unsigned int op;
 
 	/* Read where this goes */
-	if (probe_kernel_read(&op, (void *)ip, sizeof(int))) {
+	if (copy_from_kernel_nofault(&op, (void *)ip, sizeof(int))) {
 		pr_err("Fetching opcode failed.\n");
 		return -EFAULT;
 	}
@@ -516,7 +516,7 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 	struct module *mod = rec->arch.mod;
 
 	/* read where this goes */
-	if (probe_kernel_read(op, ip, sizeof(op)))
+	if (copy_from_kernel_nofault(op, ip, sizeof(op)))
 		return -EFAULT;
 
 	if (!expected_nop_sequence(ip, op[0], op[1])) {
@@ -578,7 +578,7 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 	unsigned long ip = rec->ip;
 
 	/* read where this goes */
-	if (probe_kernel_read(&op, (void *)ip, MCOUNT_INSN_SIZE))
+	if (copy_from_kernel_nofault(&op, (void *)ip, MCOUNT_INSN_SIZE))
 		return -EFAULT;
 
 	/* It should be pointing to a nop */
@@ -634,7 +634,7 @@ static int __ftrace_make_call_kernel(struct dyn_ftrace *rec, unsigned long addr)
 	}
 
 	/* Make sure we have a nop */
-	if (probe_kernel_read(&op, ip, sizeof(op))) {
+	if (copy_from_kernel_nofault(&op, ip, sizeof(op))) {
 		pr_err("Unable to read ftrace location %p\n", ip);
 		return -EFAULT;
 	}
@@ -712,7 +712,7 @@ __ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
 	}
 
 	/* read where this goes */
-	if (probe_kernel_read(&op, (void *)ip, sizeof(int))) {
+	if (copy_from_kernel_nofault(&op, (void *)ip, sizeof(int))) {
 		pr_err("Fetching opcode failed.\n");
 		return -EFAULT;
 	}
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 3dcfecf858f36..50bc9f0eb6be3 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -418,7 +418,8 @@ static __u64 power_pmu_bhrb_to(u64 addr)
 	__u64 target;
 
 	if (is_kernel_addr(addr)) {
-		if (probe_kernel_read(&instr, (void *)addr, sizeof(instr)))
+		if (copy_from_kernel_nofault(&instr, (void *)addr,
+				sizeof(instr)))
 			return 0;
 
 		return branch_target(&instr);
diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
index ce69b34ff55d0..143a22ab8853e 100644
--- a/arch/riscv/kernel/ftrace.c
+++ b/arch/riscv/kernel/ftrace.c
@@ -25,7 +25,8 @@ static int ftrace_check_current_call(unsigned long hook_pos,
 	 * Read the text we want to modify;
 	 * return must be -EFAULT on read error
 	 */
-	if (probe_kernel_read(replaced, (void *)hook_pos, MCOUNT_INSN_SIZE))
+	if (copy_from_kernel_nofault(replaced, (void *)hook_pos,
+			MCOUNT_INSN_SIZE))
 		return -EFAULT;
 
 	/*
diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
index 8a4fc65ee0222..9a0ade9fa10b0 100644
--- a/arch/riscv/kernel/patch.c
+++ b/arch/riscv/kernel/patch.c
@@ -57,7 +57,7 @@ static int __kprobes riscv_insn_write(void *addr, const void *insn, size_t len)
 
 	waddr = patch_map(addr, FIX_TEXT_POKE0);
 
-	ret = probe_kernel_write(waddr, insn, len);
+	ret = copy_to_kernel_nofault(waddr, insn, len);
 
 	patch_unmap(FIX_TEXT_POKE0);
 
@@ -71,7 +71,7 @@ static int __kprobes riscv_insn_write(void *addr, const void *insn, size_t len)
 #else
 static int __kprobes riscv_insn_write(void *addr, const void *insn, size_t len)
 {
-	return probe_kernel_write(addr, insn, len);
+	return copy_to_kernel_nofault(addr, insn, len);
 }
 #endif /* CONFIG_MMU */
 
diff --git a/arch/s390/kernel/ftrace.c b/arch/s390/kernel/ftrace.c
index 4cd9b1ada8340..1576f0eede771 100644
--- a/arch/s390/kernel/ftrace.c
+++ b/arch/s390/kernel/ftrace.c
@@ -99,7 +99,7 @@ int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
 {
 	struct ftrace_insn orig, new, old;
 
-	if (probe_kernel_read(&old, (void *) rec->ip, sizeof(old)))
+	if (copy_from_kernel_nofault(&old, (void *) rec->ip, sizeof(old)))
 		return -EFAULT;
 	if (addr == MCOUNT_ADDR) {
 		/* Initial code replacement */
@@ -121,7 +121,7 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 {
 	struct ftrace_insn orig, new, old;
 
-	if (probe_kernel_read(&old, (void *) rec->ip, sizeof(old)))
+	if (copy_from_kernel_nofault(&old, (void *) rec->ip, sizeof(old)))
 		return -EFAULT;
 	/* Replace nop with an ftrace call. */
 	ftrace_generate_nop_insn(&orig);
diff --git a/arch/sh/kernel/ftrace.c b/arch/sh/kernel/ftrace.c
index 1b04270e5460e..0646c59618466 100644
--- a/arch/sh/kernel/ftrace.c
+++ b/arch/sh/kernel/ftrace.c
@@ -119,7 +119,7 @@ static void ftrace_mod_code(void)
 	 * But if one were to fail, then they all should, and if one were
 	 * to succeed, then they all should.
 	 */
-	mod_code_status = probe_kernel_write(mod_code_ip, mod_code_newcode,
+	mod_code_status = copy_to_kernel_nofault(mod_code_ip, mod_code_newcode,
 					     MCOUNT_INSN_SIZE);
 
 	/* if we fail, then kill any new writers */
@@ -203,7 +203,7 @@ static int ftrace_modify_code(unsigned long ip, unsigned char *old_code,
 	 */
 
 	/* read the text we want to modify */
-	if (probe_kernel_read(replaced, (void *)ip, MCOUNT_INSN_SIZE))
+	if (copy_from_kernel_nofault(replaced, (void *)ip, MCOUNT_INSN_SIZE))
 		return -EFAULT;
 
 	/* Make sure it is what we expect it to be */
@@ -268,7 +268,7 @@ static int ftrace_mod(unsigned long ip, unsigned long old_addr,
 {
 	unsigned char code[MCOUNT_INSN_SIZE];
 
-	if (probe_kernel_read(code, (void *)ip, MCOUNT_INSN_SIZE))
+	if (copy_from_kernel_nofault(code, (void *)ip, MCOUNT_INSN_SIZE))
 		return -EFAULT;
 
 	if (old_addr != __raw_readl((unsigned long *)code))
diff --git a/arch/um/kernel/maccess.c b/arch/um/kernel/maccess.c
index 734f3d7e57c0f..cedf73e9e8ce2 100644
--- a/arch/um/kernel/maccess.c
+++ b/arch/um/kernel/maccess.c
@@ -7,7 +7,7 @@
 #include <linux/kernel.h>
 #include <os.h>
 
-bool probe_kernel_read_allowed(void *dst, const void *src, size_t size)
+bool copy_from_kernel_nofault_allowed(void *dst, const void *src, size_t size)
 {
 	void *psrc = (void *)rounddown((unsigned long)src, PAGE_SIZE);
 
diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 6d6475fdd3278..62ac40751276f 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -278,7 +278,7 @@ static inline unsigned long *regs_get_kernel_stack_nth_addr(struct pt_regs *regs
 }
 
 /* To avoid include hell, we can't include uaccess.h */
-extern long probe_kernel_read(void *dst, const void *src, size_t size);
+extern long copy_from_kernel_nofault(void *dst, const void *src, size_t size);
 
 /**
  * regs_get_kernel_stack_nth() - get Nth entry of the stack
@@ -298,7 +298,7 @@ static inline unsigned long regs_get_kernel_stack_nth(struct pt_regs *regs,
 
 	addr = regs_get_kernel_stack_nth_addr(regs, n);
 	if (addr) {
-		ret = probe_kernel_read(&val, addr, sizeof(val));
+		ret = copy_from_kernel_nofault(&val, addr, sizeof(val));
 		if (!ret)
 			return val;
 	}
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index ae64ec7f752f4..f8dd06ccd8d2e 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -106,7 +106,7 @@ void show_opcodes(struct pt_regs *regs, const char *loglvl)
 	bad_ip = user_mode(regs) &&
 		__chk_range_not_ok(prologue, OPCODE_BUFSIZE, TASK_SIZE_MAX);
 
-	if (bad_ip || probe_kernel_read(opcodes, (u8 *)prologue,
+	if (bad_ip || copy_from_kernel_nofault(opcodes, (u8 *)prologue,
 					OPCODE_BUFSIZE)) {
 		printk("%sCode: Bad RIP value.\n", loglvl);
 	} else {
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 37a0aeaf89e77..018adf1f18bea 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -86,7 +86,7 @@ static int ftrace_verify_code(unsigned long ip, const char *old_code)
 	 * sure what we read is what we expected it to be before modifying it.
 	 */
 	/* read the text we want to modify */
-	if (probe_kernel_read(cur_code, (void *)ip, MCOUNT_INSN_SIZE)) {
+	if (copy_from_kernel_nofault(cur_code, (void *)ip, MCOUNT_INSN_SIZE)) {
 		WARN_ON(1);
 		return -EFAULT;
 	}
@@ -354,7 +354,7 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
 	npages = DIV_ROUND_UP(*tramp_size, PAGE_SIZE);
 
 	/* Copy ftrace_caller onto the trampoline memory */
-	ret = probe_kernel_read(trampoline, (void *)start_offset, size);
+	ret = copy_from_kernel_nofault(trampoline, (void *)start_offset, size);
 	if (WARN_ON(ret < 0))
 		goto fail;
 
@@ -362,7 +362,7 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
 
 	/* The trampoline ends with ret(q) */
 	retq = (unsigned long)ftrace_stub;
-	ret = probe_kernel_read(ip, (void *)retq, RET_SIZE);
+	ret = copy_from_kernel_nofault(ip, (void *)retq, RET_SIZE);
 	if (WARN_ON(ret < 0))
 		goto fail;
 
@@ -471,7 +471,7 @@ static void *addr_from_call(void *ptr)
 	union text_poke_insn call;
 	int ret;
 
-	ret = probe_kernel_read(&call, ptr, CALL_INSN_SIZE);
+	ret = copy_from_kernel_nofault(&call, ptr, CALL_INSN_SIZE);
 	if (WARN_ON_ONCE(ret < 0))
 		return NULL;
 
diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index c44fe7d8d9a4e..68acd30c6b878 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -732,11 +732,11 @@ int kgdb_arch_set_breakpoint(struct kgdb_bkpt *bpt)
 	int err;
 
 	bpt->type = BP_BREAKPOINT;
-	err = probe_kernel_read(bpt->saved_instr, (char *)bpt->bpt_addr,
+	err = copy_from_kernel_nofault(bpt->saved_instr, (char *)bpt->bpt_addr,
 				BREAK_INSTR_SIZE);
 	if (err)
 		return err;
-	err = probe_kernel_write((char *)bpt->bpt_addr,
+	err = copy_to_kernel_nofault((char *)bpt->bpt_addr,
 				 arch_kgdb_ops.gdb_bpt_instr, BREAK_INSTR_SIZE);
 	if (!err)
 		return err;
@@ -768,7 +768,7 @@ int kgdb_arch_remove_breakpoint(struct kgdb_bkpt *bpt)
 	return 0;
 
 knl_write:
-	return probe_kernel_write((char *)bpt->bpt_addr,
+	return copy_to_kernel_nofault((char *)bpt->bpt_addr,
 				  (char *)bpt->saved_instr, BREAK_INSTR_SIZE);
 }
 
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 4d7022a740ab0..ecffda528f86f 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -243,7 +243,7 @@ __recover_probed_insn(kprobe_opcode_t *buf, unsigned long addr)
 	 * Fortunately, we know that the original code is the ideal 5-byte
 	 * long NOP.
 	 */
-	if (probe_kernel_read(buf, (void *)addr,
+	if (copy_from_kernel_nofault(buf, (void *)addr,
 		MAX_INSN_SIZE * sizeof(kprobe_opcode_t)))
 		return 0UL;
 
@@ -346,7 +346,8 @@ int __copy_instruction(u8 *dest, u8 *src, u8 *real, struct insn *insn)
 		return 0;
 
 	/* This can access kernel text if given address is not recovered */
-	if (probe_kernel_read(dest, (void *)recovered_insn, MAX_INSN_SIZE))
+	if (copy_from_kernel_nofault(dest, (void *)recovered_insn,
+			MAX_INSN_SIZE))
 		return 0;
 
 	kernel_insn_init(insn, dest, MAX_INSN_SIZE);
diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c
index ea13f68882849..85696911093f4 100644
--- a/arch/x86/kernel/kprobes/opt.c
+++ b/arch/x86/kernel/kprobes/opt.c
@@ -56,7 +56,7 @@ unsigned long __recover_optprobed_insn(kprobe_opcode_t *buf, unsigned long addr)
 	 * overwritten by jump destination address. In this case, original
 	 * bytes must be recovered from op->optinsn.copied_insn buffer.
 	 */
-	if (probe_kernel_read(buf, (void *)addr,
+	if (copy_from_kernel_nofault(buf, (void *)addr,
 		MAX_INSN_SIZE * sizeof(kprobe_opcode_t)))
 		return 0UL;
 
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index d54cffdc7cac2..96809f6aad67d 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -483,7 +483,8 @@ static enum kernel_gp_hint get_kernel_gp_address(struct pt_regs *regs,
 	u8 insn_buf[MAX_INSN_SIZE];
 	struct insn insn;
 
-	if (probe_kernel_read(insn_buf, (void *)regs->ip, MAX_INSN_SIZE))
+	if (copy_from_kernel_nofault(insn_buf, (void *)regs->ip,
+			MAX_INSN_SIZE))
 		return GP_NO_HINT;
 
 	kernel_insn_init(&insn, insn_buf, MAX_INSN_SIZE);
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index a51df516b87bf..994e207abdf64 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -590,7 +590,7 @@ static void show_ldttss(const struct desc_ptr *gdt, const char *name, u16 index)
 		return;
 	}
 
-	if (probe_kernel_read(&desc, (void *)(gdt->address + offset),
+	if (copy_from_kernel_nofault(&desc, (void *)(gdt->address + offset),
 			      sizeof(struct ldttss_desc))) {
 		pr_alert("%s: 0x%hx -- GDT entry is not readable\n",
 			 name, index);
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 4222a010057a9..4253c95968932 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -758,7 +758,7 @@ static void __init test_wp_bit(void)
 
 	__set_fixmap(FIX_WP_TEST, __pa_symbol(empty_zero_page), PAGE_KERNEL_RO);
 
-	if (probe_kernel_write((char *)fix_to_virt(FIX_WP_TEST), &z, 1)) {
+	if (copy_to_kernel_nofault((char *)fix_to_virt(FIX_WP_TEST), &z, 1)) {
 		clear_fixmap(FIX_WP_TEST);
 		printk(KERN_CONT "Ok.\n");
 		return;
diff --git a/arch/x86/mm/maccess.c b/arch/x86/mm/maccess.c
index a1bd81677aa72..a29111d25a990 100644
--- a/arch/x86/mm/maccess.c
+++ b/arch/x86/mm/maccess.c
@@ -26,7 +26,8 @@ static __always_inline bool invalid_probe_range(u64 vaddr)
 }
 #endif
 
-bool probe_kernel_read_allowed(void *dst, const void *unsafe_src, size_t size)
+bool copy_from_kernel_nofault_allowed(void *dst, const void *unsafe_src,
+		size_t size)
 {
 	return !invalid_probe_range((unsigned long)unsafe_src);
 }
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 507f4fb88fa7f..0a76fd298d896 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -387,7 +387,7 @@ static void set_aliased_prot(void *v, pgprot_t prot)
 
 	preempt_disable();
 
-	probe_kernel_read(&dummy, v, 1);
+	copy_from_kernel_nofault(&dummy, v, 1);
 
 	if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0))
 		BUG();
diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index 43dd0891ca1ed..12774655aff45 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -167,7 +167,7 @@ static ssize_t read_mem(struct file *file, char __user *buf,
 			if (!ptr)
 				goto failed;
 
-			probe = probe_kernel_read(bounce, ptr, sz);
+			probe = copy_from_kernel_nofault(bounce, ptr, sz);
 			unxlate_dev_mem_ptr(p, ptr);
 			if (probe)
 				goto failed;
diff --git a/drivers/dio/dio.c b/drivers/dio/dio.c
index c9aa15fb86a9a..193b40e7aec03 100644
--- a/drivers/dio/dio.c
+++ b/drivers/dio/dio.c
@@ -135,7 +135,8 @@ int __init dio_find(int deviceid)
 		else
 			va = ioremap(pa, PAGE_SIZE);
 
-                if (probe_kernel_read(&i, (unsigned char *)va + DIO_IDOFF, 1)) {
+		if (copy_from_kernel_nofault(&i,
+				(unsigned char *)va + DIO_IDOFF, 1)) {
 			if (scode >= DIOII_SCBASE)
 				iounmap(va);
                         continue;             /* no board present at that select code */
@@ -208,7 +209,8 @@ static int __init dio_init(void)
 		else
 			va = ioremap(pa, PAGE_SIZE);
 
-                if (probe_kernel_read(&i, (unsigned char *)va + DIO_IDOFF, 1)) {
+		if (copy_from_kernel_nofault(&i,
+				(unsigned char *)va + DIO_IDOFF, 1)) {
 			if (scode >= DIOII_SCBASE)
 				iounmap(va);
                         continue;              /* no board present at that select code */
diff --git a/drivers/input/serio/hp_sdc.c b/drivers/input/serio/hp_sdc.c
index 654252361653d..13eacf6ab4310 100644
--- a/drivers/input/serio/hp_sdc.c
+++ b/drivers/input/serio/hp_sdc.c
@@ -1021,7 +1021,7 @@ static int __init hp_sdc_register(void)
 	hp_sdc.base_io	 = (unsigned long) 0xf0428000;
 	hp_sdc.data_io	 = (unsigned long) hp_sdc.base_io + 1;
 	hp_sdc.status_io = (unsigned long) hp_sdc.base_io + 3;
-	if (!probe_kernel_read(&i, (unsigned char *)hp_sdc.data_io, 1))
+	if (!copy_from_kernel_nofault(&i, (unsigned char *)hp_sdc.data_io, 1))
 		hp_sdc.dev = (void *)1;
 	hp_sdc.dev_err   = hp_sdc_init();
 #endif
diff --git a/drivers/misc/kgdbts.c b/drivers/misc/kgdbts.c
index bccd341e9ae16..d5d2af4d10e66 100644
--- a/drivers/misc/kgdbts.c
+++ b/drivers/misc/kgdbts.c
@@ -828,7 +828,7 @@ static void run_plant_and_detach_test(int is_early)
 	char before[BREAK_INSTR_SIZE];
 	char after[BREAK_INSTR_SIZE];
 
-	probe_kernel_read(before, (char *)kgdbts_break_test,
+	copy_from_kernel_nofault(before, (char *)kgdbts_break_test,
 	  BREAK_INSTR_SIZE);
 	init_simple_test();
 	ts.tst = plant_and_detach_test;
@@ -836,8 +836,8 @@ static void run_plant_and_detach_test(int is_early)
 	/* Activate test with initial breakpoint */
 	if (!is_early)
 		kgdb_breakpoint();
-	probe_kernel_read(after, (char *)kgdbts_break_test,
-	  BREAK_INSTR_SIZE);
+	copy_from_kernel_nofault(after, (char *)kgdbts_break_test,
+			BREAK_INSTR_SIZE);
 	if (memcmp(before, after, BREAK_INSTR_SIZE)) {
 		printk(KERN_CRIT "kgdbts: ERROR kgdb corrupted memory\n");
 		panic("kgdb memory corruption");
diff --git a/drivers/video/fbdev/hpfb.c b/drivers/video/fbdev/hpfb.c
index f02be0db335e9..8d418abdd7678 100644
--- a/drivers/video/fbdev/hpfb.c
+++ b/drivers/video/fbdev/hpfb.c
@@ -402,7 +402,7 @@ int __init hpfb_init(void)
 	if (err)
 		return err;
 
-	err = probe_kernel_read(&i, (unsigned char *)INTFBVADDR + DIO_IDOFF, 1);
+	err = copy_from_kernel_nofault(&i, (unsigned char *)INTFBVADDR + DIO_IDOFF, 1);
 
 	if (!err && (i == DIO_ID_FBUFFER) && topcat_sid_ok(sid = DIO_SECID(INTFBVADDR))) {
 		if (!request_mem_region(INTFBPADDR, DIO_DEVSIZE, "Internal Topcat"))
diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 8ba492d44e689..e502414b35564 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -512,7 +512,8 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 				 * Using bounce buffer to bypass the
 				 * hardened user copy kernel text checks.
 				 */
-				if (probe_kernel_read(buf, (void *) start, tsz)) {
+				if (copy_from_kernel_nofault(buf, (void *)start,
+						tsz)) {
 					if (clear_user(buffer, tsz)) {
 						ret = -EFAULT;
 						goto out;
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 78e0ff8641559..849bc3dca54d6 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -301,13 +301,15 @@ copy_struct_from_user(void *dst, size_t ksize, const void __user *src,
 	return 0;
 }
 
-bool probe_kernel_read_allowed(void *dst, const void *unsafe_src, size_t size);
+bool copy_from_kernel_nofault_allowed(void *dst, const void *unsafe_src,
+		size_t size);
 
-extern long probe_kernel_read(void *dst, const void *src, size_t size);
-extern long probe_user_read(void *dst, const void __user *src, size_t size);
+long copy_from_kernel_nofault(void *dst, const void *src, size_t size);
+long notrace copy_to_kernel_nofault(void *dst, const void *src, size_t size);
 
-extern long notrace probe_kernel_write(void *dst, const void *src, size_t size);
-extern long notrace probe_user_write(void __user *dst, const void *src, size_t size);
+extern long probe_user_read(void *dst, const void __user *src, size_t size);
+extern long notrace probe_user_write(void __user *dst, const void *src,
+		size_t size);
 
 long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr,
 		long count);
@@ -324,7 +326,7 @@ long strnlen_user_nofault(const void __user *unsafe_addr, long count);
  * Returns 0 on success, or -EFAULT.
  */
 #define probe_kernel_address(addr, retval)		\
-	probe_kernel_read(&retval, addr, sizeof(retval))
+	copy_from_kernel_nofault(&retval, addr, sizeof(retval))
 
 #ifndef user_access_begin
 #define user_access_begin(ptr,len) access_ok(ptr, len)
diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index 2b7c9b67931d6..b5c2492e0b010 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -171,18 +171,18 @@ int __weak kgdb_arch_set_breakpoint(struct kgdb_bkpt *bpt)
 {
 	int err;
 
-	err = probe_kernel_read(bpt->saved_instr, (char *)bpt->bpt_addr,
+	err = copy_from_kernel_nofault(bpt->saved_instr, (char *)bpt->bpt_addr,
 				BREAK_INSTR_SIZE);
 	if (err)
 		return err;
-	err = probe_kernel_write((char *)bpt->bpt_addr,
+	err = copy_to_kernel_nofault((char *)bpt->bpt_addr,
 				 arch_kgdb_ops.gdb_bpt_instr, BREAK_INSTR_SIZE);
 	return err;
 }
 
 int __weak kgdb_arch_remove_breakpoint(struct kgdb_bkpt *bpt)
 {
-	return probe_kernel_write((char *)bpt->bpt_addr,
+	return copy_to_kernel_nofault((char *)bpt->bpt_addr,
 				  (char *)bpt->saved_instr, BREAK_INSTR_SIZE);
 }
 
diff --git a/kernel/debug/gdbstub.c b/kernel/debug/gdbstub.c
index 4b280fc7dd675..61774aec46b4c 100644
--- a/kernel/debug/gdbstub.c
+++ b/kernel/debug/gdbstub.c
@@ -247,7 +247,7 @@ char *kgdb_mem2hex(char *mem, char *buf, int count)
 	 */
 	tmp = buf + count;
 
-	err = probe_kernel_read(tmp, mem, count);
+	err = copy_from_kernel_nofault(tmp, mem, count);
 	if (err)
 		return NULL;
 	while (count > 0) {
@@ -283,7 +283,7 @@ int kgdb_hex2mem(char *buf, char *mem, int count)
 		*tmp_raw |= hex_to_bin(*tmp_hex--) << 4;
 	}
 
-	return probe_kernel_write(mem, tmp_raw, count);
+	return copy_to_kernel_nofault(mem, tmp_raw, count);
 }
 
 /*
@@ -335,7 +335,7 @@ static int kgdb_ebin2mem(char *buf, char *mem, int count)
 		size++;
 	}
 
-	return probe_kernel_write(mem, c, size);
+	return copy_to_kernel_nofault(mem, c, size);
 }
 
 #if DBG_MAX_REG_NUM > 0
diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
index 515379cbf2092..31858c3839ca5 100644
--- a/kernel/debug/kdb/kdb_main.c
+++ b/kernel/debug/kdb/kdb_main.c
@@ -2325,7 +2325,8 @@ void kdb_ps1(const struct task_struct *p)
 	int cpu;
 	unsigned long tmp;
 
-	if (!p || probe_kernel_read(&tmp, (char *)p, sizeof(unsigned long)))
+	if (!p ||
+	    copy_from_kernel_nofault(&tmp, (char *)p, sizeof(unsigned long)))
 		return;
 
 	cpu = kdb_process_cpu(p);
diff --git a/kernel/debug/kdb/kdb_support.c b/kernel/debug/kdb/kdb_support.c
index b8e6306e7e133..004c5b6c87f89 100644
--- a/kernel/debug/kdb/kdb_support.c
+++ b/kernel/debug/kdb/kdb_support.c
@@ -325,7 +325,7 @@ char *kdb_strdup(const char *str, gfp_t type)
  */
 int kdb_getarea_size(void *res, unsigned long addr, size_t size)
 {
-	int ret = probe_kernel_read((char *)res, (char *)addr, size);
+	int ret = copy_from_kernel_nofault((char *)res, (char *)addr, size);
 	if (ret) {
 		if (!KDB_STATE(SUPPRESS)) {
 			kdb_printf("kdb_getarea: Bad address 0x%lx\n", addr);
@@ -350,7 +350,7 @@ int kdb_getarea_size(void *res, unsigned long addr, size_t size)
  */
 int kdb_putarea_size(unsigned long addr, void *res, size_t size)
 {
-	int ret = probe_kernel_read((char *)addr, (char *)res, size);
+	int ret = copy_from_kernel_nofault((char *)addr, (char *)res, size);
 	if (ret) {
 		if (!KDB_STATE(SUPPRESS)) {
 			kdb_printf("kdb_putarea: Bad address 0x%lx\n", addr);
@@ -624,7 +624,8 @@ char kdb_task_state_char (const struct task_struct *p)
 	char state;
 	unsigned long tmp;
 
-	if (!p || probe_kernel_read(&tmp, (char *)p, sizeof(unsigned long)))
+	if (!p ||
+	    copy_from_kernel_nofault(&tmp, (char *)p, sizeof(unsigned long)))
 		return 'E';
 
 	cpu = kdb_process_cpu(p);
diff --git a/kernel/kthread.c b/kernel/kthread.c
index bfbfa481be3a5..e19ebe61232ca 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -179,7 +179,7 @@ void *kthread_probe_data(struct task_struct *task)
 	struct kthread *kthread = to_kthread(task);
 	void *data = NULL;
 
-	probe_kernel_read(&data, &kthread->data, sizeof(data));
+	copy_from_kernel_nofault(&data, &kthread->data, sizeof(data));
 	return data;
 }
 
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 5dea4169a8323..1b0b817eb3248 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -186,7 +186,7 @@ bpf_probe_read_kernel_common(void *dst, u32 size, const void *unsafe_ptr,
 	if (unlikely(ret < 0))
 		goto fail;
 
-	ret = probe_kernel_read(dst, unsafe_ptr, size);
+	ret = copy_from_kernel_nofault(dst, unsafe_ptr, size);
 	if (unlikely(ret < 0)) {
 		if (compat)
 			ret = probe_user_read(dst, user_ptr, size);
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 7725146369731..0e306983cd658 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1208,7 +1208,7 @@ fetch_store_strlen(unsigned long addr)
 	u8 c;
 
 	do {
-		ret = probe_kernel_read(&c, (u8 *)addr + len, 1);
+		ret = copy_from_kernel_nofault(&c, (u8 *)addr + len, 1);
 		if (ret)
 			ret = probe_user_read(&c,
 				(__force u8 __user *)addr + len, 1);
@@ -1287,7 +1287,7 @@ probe_mem_read(void *dest, void *src, size_t size)
 	const void __user *user_ptr = (__force const void __user *)src;
 	int ret;
 
-	ret = probe_kernel_read(dest, src, size);
+	ret = copy_from_kernel_nofault(dest, src, size);
 	if (ret)
 		ret = probe_user_read(dest, user_ptr, size);
 	return ret;
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 891ccad5f2716..a1345ffa3a0ba 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4627,11 +4627,11 @@ void print_worker_info(const char *log_lvl, struct task_struct *task)
 	 * Carefully copy the associated workqueue's workfn, name and desc.
 	 * Keep the original last '\0' in case the original is garbage.
 	 */
-	probe_kernel_read(&fn, &worker->current_func, sizeof(fn));
-	probe_kernel_read(&pwq, &worker->current_pwq, sizeof(pwq));
-	probe_kernel_read(&wq, &pwq->wq, sizeof(wq));
-	probe_kernel_read(name, wq->name, sizeof(name) - 1);
-	probe_kernel_read(desc, worker->desc, sizeof(desc) - 1);
+	copy_from_kernel_nofault(&fn, &worker->current_func, sizeof(fn));
+	copy_from_kernel_nofault(&pwq, &worker->current_pwq, sizeof(pwq));
+	copy_from_kernel_nofault(&wq, &pwq->wq, sizeof(wq));
+	copy_from_kernel_nofault(name, wq->name, sizeof(name) - 1);
+	copy_from_kernel_nofault(desc, worker->desc, sizeof(desc) - 1);
 
 	if (fn || name[0] || desc[0]) {
 		printk("%sWorkqueue: %s %ps", log_lvl, name, fn);
diff --git a/mm/maccess.c b/mm/maccess.c
index e9efe2f98e34a..4c342a69ae71d 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -6,7 +6,7 @@
 #include <linux/mm.h>
 #include <linux/uaccess.h>
 
-bool __weak probe_kernel_read_allowed(void *dst, const void *unsafe_src,
+bool __weak copy_from_kernel_nofault_allowed(void *dst, const void *unsafe_src,
 		size_t size)
 {
 	return true;
@@ -14,7 +14,7 @@ bool __weak probe_kernel_read_allowed(void *dst, const void *unsafe_src,
 
 #ifdef HAVE_ARCH_PROBE_KERNEL
 
-#define probe_kernel_read_loop(dst, src, len, type, err_label)		\
+#define copy_from_kernel_nofault_loop(dst, src, len, type, err_label)	\
 	while (len >= sizeof(type)) {					\
 		arch_kernel_read(dst, src, type, err_label);		\
 		dst += sizeof(type);					\
@@ -22,25 +22,25 @@ bool __weak probe_kernel_read_allowed(void *dst, const void *unsafe_src,
 		len -= sizeof(type);					\
 	}
 
-long probe_kernel_read(void *dst, const void *src, size_t size)
+long copy_from_kernel_nofault(void *dst, const void *src, size_t size)
 {
-	if (!probe_kernel_read_allowed(dst, src, size))
+	if (!copy_from_kernel_nofault_allowed(dst, src, size))
 		return -EFAULT;
 
 	pagefault_disable();
-	probe_kernel_read_loop(dst, src, size, u64, Efault);
-	probe_kernel_read_loop(dst, src, size, u32, Efault);
-	probe_kernel_read_loop(dst, src, size, u16, Efault);
-	probe_kernel_read_loop(dst, src, size, u8, Efault);
+	copy_from_kernel_nofault_loop(dst, src, size, u64, Efault);
+	copy_from_kernel_nofault_loop(dst, src, size, u32, Efault);
+	copy_from_kernel_nofault_loop(dst, src, size, u16, Efault);
+	copy_from_kernel_nofault_loop(dst, src, size, u8, Efault);
 	pagefault_enable();
 	return 0;
 Efault:
 	pagefault_enable();
 	return -EFAULT;
 }
-EXPORT_SYMBOL_GPL(probe_kernel_read);
+EXPORT_SYMBOL_GPL(copy_from_kernel_nofault);
 
-#define probe_kernel_write_loop(dst, src, len, type, err_label)		\
+#define copy_to_kernel_nofault_loop(dst, src, len, type, err_label)	\
 	while (len >= sizeof(type)) {					\
 		arch_kernel_write(dst, src, type, err_label);		\
 		dst += sizeof(type);					\
@@ -48,13 +48,13 @@ EXPORT_SYMBOL_GPL(probe_kernel_read);
 		len -= sizeof(type);					\
 	}
 
-long probe_kernel_write(void *dst, const void *src, size_t size)
+long copy_to_kernel_nofault(void *dst, const void *src, size_t size)
 {
 	pagefault_disable();
-	probe_kernel_write_loop(dst, src, size, u64, Efault);
-	probe_kernel_write_loop(dst, src, size, u32, Efault);
-	probe_kernel_write_loop(dst, src, size, u16, Efault);
-	probe_kernel_write_loop(dst, src, size, u8, Efault);
+	copy_to_kernel_nofault_loop(dst, src, size, u64, Efault);
+	copy_to_kernel_nofault_loop(dst, src, size, u32, Efault);
+	copy_to_kernel_nofault_loop(dst, src, size, u16, Efault);
+	copy_to_kernel_nofault_loop(dst, src, size, u8, Efault);
 	pagefault_enable();
 	return 0;
 Efault:
@@ -68,7 +68,7 @@ long strncpy_from_kernel_unsafe(char *dst, const void *unsafe_addr, long count)
 
 	if (unlikely(count <= 0))
 		return 0;
-	if (!probe_kernel_read_allowed(dst, unsafe_addr, count))
+	if (!copy_from_kernel_nofault_allowed(dst, unsafe_addr, count))
 		return -EFAULT;
 
 	pagefault_disable();
@@ -88,7 +88,7 @@ long strncpy_from_kernel_unsafe(char *dst, const void *unsafe_addr, long count)
 }
 #else /* HAVE_ARCH_PROBE_KERNEL */
 /**
- * probe_kernel_read(): safely attempt to read from kernel-space
+ * copy_from_kernel_nofault(): safely attempt to read from kernel-space
  * @dst: pointer to the buffer that shall take the data
  * @src: address to read from
  * @size: size of the data chunk
@@ -98,15 +98,15 @@ long strncpy_from_kernel_unsafe(char *dst, const void *unsafe_addr, long count)
  *
  * We ensure that the copy_from_user is executed in atomic context so that
  * do_page_fault() doesn't attempt to take mmap_sem.  This makes
- * probe_kernel_read() suitable for use within regions where the caller
+ * copy_from_kernel_nofault() suitable for use within regions where the caller
  * already holds mmap_sem, or other locks which nest inside mmap_sem.
  */
-long probe_kernel_read(void *dst, const void *src, size_t size)
+long copy_from_kernel_nofault(void *dst, const void *src, size_t size)
 {
 	long ret;
 	mm_segment_t old_fs = get_fs();
 
-	if (!probe_kernel_read_allowed(dst, src, size))
+	if (!copy_from_kernel_nofault_allowed(dst, src, size))
 		return -EFAULT;
 
 	set_fs(KERNEL_DS);
@@ -120,10 +120,10 @@ long probe_kernel_read(void *dst, const void *src, size_t size)
 		return -EFAULT;
 	return 0;
 }
-EXPORT_SYMBOL_GPL(probe_kernel_read);
+EXPORT_SYMBOL_GPL(copy_from_kernel_nofault);
 
 /**
- * probe_kernel_write(): safely attempt to write to a location
+ * copy_to_kernel_nofault(): safely attempt to write to a location
  * @dst: address to write to
  * @src: pointer to the data that shall be written
  * @size: size of the data chunk
@@ -131,7 +131,7 @@ EXPORT_SYMBOL_GPL(probe_kernel_read);
  * Safely write to address @dst from the buffer at @src.  If a kernel fault
  * happens, handle that and return -EFAULT.
  */
-long probe_kernel_write(void *dst, const void *src, size_t size)
+long copy_to_kernel_nofault(void *dst, const void *src, size_t size)
 {
 	long ret;
 	mm_segment_t old_fs = get_fs();
@@ -173,7 +173,7 @@ long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr, long count)
 
 	if (unlikely(count <= 0))
 		return 0;
-	if (!probe_kernel_read_allowed(dst, unsafe_addr, count))
+	if (!copy_from_kernel_nofault_allowed(dst, unsafe_addr, count))
 		return -EFAULT;
 
 	set_fs(KERNEL_DS);
diff --git a/mm/rodata_test.c b/mm/rodata_test.c
index 5e313fa93276d..2a99df7beeb35 100644
--- a/mm/rodata_test.c
+++ b/mm/rodata_test.c
@@ -25,7 +25,7 @@ void rodata_test(void)
 	}
 
 	/* test 2: write to the variable; this should fault */
-	if (!probe_kernel_write((void *)&rodata_test_data,
+	if (!copy_to_kernel_nofault((void *)&rodata_test_data,
 				(void *)&zero, sizeof(zero))) {
 		pr_err("test data was not read only\n");
 		return;
diff --git a/mm/slub.c b/mm/slub.c
index b762450fc9f07..4af02f7af7118 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -292,7 +292,7 @@ static inline void *get_freepointer_safe(struct kmem_cache *s, void *object)
 		return get_freepointer(s, object);
 
 	freepointer_addr = (unsigned long)object + s->offset;
-	probe_kernel_read(&p, (void **)freepointer_addr, sizeof(p));
+	copy_from_kernel_nofault(&p, (void **)freepointer_addr, sizeof(p));
 	return freelist_ptr(s, p, freepointer_addr);
 }
 
-- 
2.26.2


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 17/18] maccess: rename probe_user_{read,write} to copy_{from,to}_user_nofault
  2020-05-13 16:00 clean up and streamline probe_kernel_* and friends v2 Christoph Hellwig
                   ` (15 preceding siblings ...)
  2020-05-13 16:00 ` [PATCH 16/18] maccess: rename probe_kernel_{read,write} to copy_{from,to}_kernel_nofault Christoph Hellwig
@ 2020-05-13 16:00 ` Christoph Hellwig
  2020-05-13 16:00 ` [PATCH 18/18] maccess: rename probe_kernel_address to get_kernel_nofault Christoph Hellwig
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2020-05-13 16:00 UTC (permalink / raw)
  To: x86, Alexei Starovoitov, Daniel Borkmann, Masami Hiramatsu,
	Linus Torvalds, Andrew Morton
  Cc: linux-parisc, linux-um, netdev, bpf, linux-mm, linux-kernel

Better describe what these functions do.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/powerpc/kernel/process.c          |  3 ++-
 arch/powerpc/kvm/book3s_64_mmu_radix.c |  4 ++--
 arch/powerpc/mm/fault.c                |  2 +-
 arch/powerpc/oprofile/backtrace.c      |  6 ++++--
 arch/powerpc/perf/callchain_32.c       |  2 +-
 arch/powerpc/perf/callchain_64.c       |  2 +-
 arch/powerpc/perf/core-book3s.c        |  3 ++-
 arch/powerpc/sysdev/fsl_pci.c          |  4 ++--
 include/linux/uaccess.h                |  4 ++--
 kernel/trace/bpf_trace.c               |  6 +++---
 kernel/trace/trace_kprobe.c            |  6 +++---
 mm/maccess.c                           | 10 +++++-----
 12 files changed, 28 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 9c21288f86455..d5d6136b13480 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1294,7 +1294,8 @@ void show_user_instructions(struct pt_regs *regs)
 		for (i = 0; i < 8 && n; i++, n--, pc += sizeof(int)) {
 			int instr;
 
-			if (probe_user_read(&instr, (void __user *)pc, sizeof(instr))) {
+			if (copy_from_user_nofault(&instr, (void __user *)pc,
+					sizeof(instr))) {
 				seq_buf_printf(&s, "XXXXXXXX ");
 				continue;
 			}
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index aa12cd4078b32..9d25f2eb5a33a 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -64,9 +64,9 @@ unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int pid,
 	isync();
 
 	if (is_load)
-		ret = probe_user_read(to, (const void __user *)from, n);
+		ret = copy_from_user_nofault(to, (const void __user *)from, n);
 	else
-		ret = probe_user_write((void __user *)to, from, n);
+		ret = copy_to_user_nofault((void __user *)to, from, n);
 
 	/* switch the pid first to avoid running host with unallocated pid */
 	if (quadrant == 1 && pid != old_pid)
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 84af6c8eecf71..231664fe9d126 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -280,7 +280,7 @@ static bool bad_stack_expansion(struct pt_regs *regs, unsigned long address,
 		    access_ok(nip, sizeof(*nip))) {
 			unsigned int inst;
 
-			if (!probe_user_read(&inst, nip, sizeof(inst)))
+			if (!copy_from_user_nofault(&inst, nip, sizeof(inst)))
 				return !store_updates_sp(inst);
 			*must_retry = true;
 		}
diff --git a/arch/powerpc/oprofile/backtrace.c b/arch/powerpc/oprofile/backtrace.c
index 6f347fa29f41e..9db7ada79d10d 100644
--- a/arch/powerpc/oprofile/backtrace.c
+++ b/arch/powerpc/oprofile/backtrace.c
@@ -33,7 +33,8 @@ static unsigned int user_getsp32(unsigned int sp, int is_first)
 	 * which means that we've done all that we can do from
 	 * interrupt context.
 	 */
-	if (probe_user_read(stack_frame, (void __user *)p, sizeof(stack_frame)))
+	if (copy_from_user_nofault(stack_frame, (void __user *)p,
+			sizeof(stack_frame)))
 		return 0;
 
 	if (!is_first)
@@ -51,7 +52,8 @@ static unsigned long user_getsp64(unsigned long sp, int is_first)
 {
 	unsigned long stack_frame[3];
 
-	if (probe_user_read(stack_frame, (void __user *)sp, sizeof(stack_frame)))
+	if (copy_from_user_nofault(stack_frame, (void __user *)sp,
+			sizeof(stack_frame)))
 		return 0;
 
 	if (!is_first)
diff --git a/arch/powerpc/perf/callchain_32.c b/arch/powerpc/perf/callchain_32.c
index 8aa9510031415..2e21849f82b18 100644
--- a/arch/powerpc/perf/callchain_32.c
+++ b/arch/powerpc/perf/callchain_32.c
@@ -45,7 +45,7 @@ static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
 	    ((unsigned long)ptr & 3))
 		return -EFAULT;
 
-	rc = probe_user_read(ret, ptr, sizeof(*ret));
+	rc = copy_from_user_nofault(ret, ptr, sizeof(*ret));
 
 	if (IS_ENABLED(CONFIG_PPC64) && rc)
 		return read_user_stack_slow(ptr, ret, 4);
diff --git a/arch/powerpc/perf/callchain_64.c b/arch/powerpc/perf/callchain_64.c
index df1ffd8b20f21..7b0121694ebb7 100644
--- a/arch/powerpc/perf/callchain_64.c
+++ b/arch/powerpc/perf/callchain_64.c
@@ -71,7 +71,7 @@ static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
 	    ((unsigned long)ptr & 7))
 		return -EFAULT;
 
-	if (!probe_user_read(ret, ptr, sizeof(*ret)))
+	if (!copy_from_user_nofault(ret, ptr, sizeof(*ret)))
 		return 0;
 
 	return read_user_stack_slow(ptr, ret, 8);
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 50bc9f0eb6be3..f8072d1e5d172 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -426,7 +426,8 @@ static __u64 power_pmu_bhrb_to(u64 addr)
 	}
 
 	/* Userspace: need copy instruction here then translate it */
-	if (probe_user_read(&instr, (unsigned int __user *)addr, sizeof(instr)))
+	if (copy_from_user_nofault(&instr, (unsigned int __user *)addr,
+			sizeof(instr)))
 		return 0;
 
 	target = branch_target(&instr);
diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index 4a8874bc10574..73fa37ca40ef9 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -1066,8 +1066,8 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs)
 
 	if (is_in_pci_mem_space(addr)) {
 		if (user_mode(regs))
-			ret = probe_user_read(&inst, (void __user *)regs->nip,
-					      sizeof(inst));
+			ret = copy_from_user_nofault(&inst,
+					(void __user *)regs->nip, sizeof(inst));
 		else
 			ret = probe_kernel_address((void *)regs->nip, inst);
 
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 849bc3dca54d6..baef2e09b5ae9 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -307,8 +307,8 @@ bool copy_from_kernel_nofault_allowed(void *dst, const void *unsafe_src,
 long copy_from_kernel_nofault(void *dst, const void *src, size_t size);
 long notrace copy_to_kernel_nofault(void *dst, const void *src, size_t size);
 
-extern long probe_user_read(void *dst, const void __user *src, size_t size);
-extern long notrace probe_user_write(void __user *dst, const void *src,
+long copy_from_user_nofault(void *dst, const void __user *src, size_t size);
+long notrace copy_to_user_nofault(void __user *dst, const void *src,
 		size_t size);
 
 long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr,
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 1b0b817eb3248..759ce714c66e8 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -139,7 +139,7 @@ static const struct bpf_func_proto bpf_override_return_proto = {
 BPF_CALL_3(bpf_probe_read_user, void *, dst, u32, size,
 	   const void __user *, unsafe_ptr)
 {
-	int ret = probe_user_read(dst, unsafe_ptr, size);
+	int ret = copy_from_user_nofault(dst, unsafe_ptr, size);
 
 	if (unlikely(ret < 0))
 		memset(dst, 0, size);
@@ -189,7 +189,7 @@ bpf_probe_read_kernel_common(void *dst, u32 size, const void *unsafe_ptr,
 	ret = copy_from_kernel_nofault(dst, unsafe_ptr, size);
 	if (unlikely(ret < 0)) {
 		if (compat)
-			ret = probe_user_read(dst, user_ptr, size);
+			ret = copy_from_user_nofault(dst, user_ptr, size);
 		if (unlikely(ret < 0))
 			goto fail;
 	}
@@ -316,7 +316,7 @@ BPF_CALL_3(bpf_probe_write_user, void __user *, unsafe_ptr, const void *, src,
 	if (unlikely(!nmi_uaccess_okay()))
 		return -EPERM;
 
-	return probe_user_write(unsafe_ptr, src, size);
+	return copy_to_user_nofault(unsafe_ptr, src, size);
 }
 
 static const struct bpf_func_proto bpf_probe_write_user_proto = {
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 0e306983cd658..c03241f6fd682 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1210,7 +1210,7 @@ fetch_store_strlen(unsigned long addr)
 	do {
 		ret = copy_from_kernel_nofault(&c, (u8 *)addr + len, 1);
 		if (ret)
-			ret = probe_user_read(&c,
+			ret = copy_from_user_nofault(&c,
 				(__force u8 __user *)addr + len, 1);
 		len++;
 	} while (c && ret == 0 && len < MAX_STRING_SIZE);
@@ -1289,7 +1289,7 @@ probe_mem_read(void *dest, void *src, size_t size)
 
 	ret = copy_from_kernel_nofault(dest, src, size);
 	if (ret)
-		ret = probe_user_read(dest, user_ptr, size);
+		ret = copy_from_user_nofault(dest, user_ptr, size);
 	return ret;
 }
 
@@ -1298,7 +1298,7 @@ probe_mem_read_user(void *dest, void *src, size_t size)
 {
 	const void __user *uaddr =  (__force const void __user *)src;
 
-	return probe_user_read(dest, uaddr, size);
+	return copy_from_user_nofault(dest, uaddr, size);
 }
 
 /* Note that we don't verify it, since the code does not come from user space */
diff --git a/mm/maccess.c b/mm/maccess.c
index 4c342a69ae71d..23996dc381f71 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -192,7 +192,7 @@ long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr, long count)
 #endif /* HAVE_ARCH_PROBE_KERNEL */
 
 /**
- * probe_user_read(): safely attempt to read from a user-space location
+ * copy_from_user_nofault(): safely attempt to read from a user-space location
  * @dst: pointer to the buffer that shall take the data
  * @src: address to read from. This must be a user address.
  * @size: size of the data chunk
@@ -200,7 +200,7 @@ long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr, long count)
  * Safely read from user address @src to the buffer at @dst. If a kernel fault
  * happens, handle that and return -EFAULT.
  */
-long probe_user_read(void *dst, const void __user *src, size_t size)
+long copy_from_user_nofault(void *dst, const void __user *src, size_t size)
 {
 	long ret = -EFAULT;
 	mm_segment_t old_fs = get_fs();
@@ -217,10 +217,10 @@ long probe_user_read(void *dst, const void __user *src, size_t size)
 		return -EFAULT;
 	return 0;
 }
-EXPORT_SYMBOL_GPL(probe_user_read);
+EXPORT_SYMBOL_GPL(copy_from_user_nofault);
 
 /**
- * probe_user_write(): safely attempt to write to a user-space location
+ * copy_to_user_nofault(): safely attempt to write to a user-space location
  * @dst: address to write to
  * @src: pointer to the data that shall be written
  * @size: size of the data chunk
@@ -228,7 +228,7 @@ EXPORT_SYMBOL_GPL(probe_user_read);
  * Safely write to address @dst from the buffer at @src.  If a kernel fault
  * happens, handle that and return -EFAULT.
  */
-long probe_user_write(void __user *dst, const void *src, size_t size)
+long copy_to_user_nofault(void __user *dst, const void *src, size_t size)
 {
 	long ret = -EFAULT;
 	mm_segment_t old_fs = get_fs();
-- 
2.26.2


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 18/18] maccess: rename probe_kernel_address to get_kernel_nofault
  2020-05-13 16:00 clean up and streamline probe_kernel_* and friends v2 Christoph Hellwig
                   ` (16 preceding siblings ...)
  2020-05-13 16:00 ` [PATCH 17/18] maccess: rename probe_user_{read,write} to copy_{from,to}_user_nofault Christoph Hellwig
@ 2020-05-13 16:00 ` Christoph Hellwig
  2020-05-13 19:37 ` clean up and streamline probe_kernel_* and friends v2 Linus Torvalds
  2020-05-13 23:04 ` Daniel Borkmann
  19 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2020-05-13 16:00 UTC (permalink / raw)
  To: x86, Alexei Starovoitov, Daniel Borkmann, Masami Hiramatsu,
	Linus Torvalds, Andrew Morton
  Cc: linux-parisc, linux-um, netdev, bpf, linux-mm, linux-kernel

Better describe what this helper does, and match the naming of
copy_from_kernel_nofault.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/arm/kernel/traps.c             |  2 +-
 arch/arm/mm/alignment.c             |  4 ++--
 arch/arm64/kernel/traps.c           |  2 +-
 arch/ia64/include/asm/sections.h    |  2 +-
 arch/parisc/kernel/process.c        |  2 +-
 arch/powerpc/include/asm/sections.h |  2 +-
 arch/powerpc/kernel/kgdb.c          |  2 +-
 arch/powerpc/kernel/process.c       |  2 +-
 arch/powerpc/sysdev/fsl_pci.c       |  2 +-
 arch/riscv/kernel/traps.c           |  4 ++--
 arch/s390/mm/fault.c                |  2 +-
 arch/sh/kernel/traps.c              |  2 +-
 arch/x86/kernel/probe_roms.c        | 20 ++++++++++----------
 arch/x86/kernel/traps.c             |  2 +-
 arch/x86/mm/fault.c                 |  6 +++---
 arch/x86/pci/pcbios.c               |  2 +-
 include/linux/uaccess.h             |  4 ++--
 lib/test_lockup.c                   |  6 +++---
 18 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 1e70e7227f0ff..61b91872b3e74 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -391,7 +391,7 @@ int is_valid_bugaddr(unsigned long pc)
 	u32 insn = __opcode_to_mem_arm(BUG_INSTR_VALUE);
 #endif
 
-	if (probe_kernel_address((unsigned *)pc, bkpt))
+	if (get_kernel_nofault((unsigned *)pc, bkpt))
 		return 0;
 
 	return bkpt == insn;
diff --git a/arch/arm/mm/alignment.c b/arch/arm/mm/alignment.c
index 84718eddae603..236016f13bcd3 100644
--- a/arch/arm/mm/alignment.c
+++ b/arch/arm/mm/alignment.c
@@ -774,7 +774,7 @@ static int alignment_get_arm(struct pt_regs *regs, u32 *ip, u32 *inst)
 	if (user_mode(regs))
 		fault = get_user(instr, ip);
 	else
-		fault = probe_kernel_address(ip, instr);
+		fault = get_kernel_nofault(ip, instr);
 
 	*inst = __mem_to_opcode_arm(instr);
 
@@ -789,7 +789,7 @@ static int alignment_get_thumb(struct pt_regs *regs, u16 *ip, u16 *inst)
 	if (user_mode(regs))
 		fault = get_user(instr, ip);
 	else
-		fault = probe_kernel_address(ip, instr);
+		fault = get_kernel_nofault(ip, instr);
 
 	*inst = __mem_to_opcode_thumb16(instr);
 
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index cf402be5c573f..2245af23973a2 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -315,7 +315,7 @@ static int call_undef_hook(struct pt_regs *regs)
 
 	if (!user_mode(regs)) {
 		__le32 instr_le;
-		if (probe_kernel_address((__force __le32 *)pc, instr_le))
+		if (get_kernel_nofault((__force __le32 *)pc, instr_le))
 			goto exit;
 		instr = le32_to_cpu(instr_le);
 	} else if (compat_thumb_mode(regs)) {
diff --git a/arch/ia64/include/asm/sections.h b/arch/ia64/include/asm/sections.h
index cea15f2dd38df..ef03eec8666f8 100644
--- a/arch/ia64/include/asm/sections.h
+++ b/arch/ia64/include/asm/sections.h
@@ -35,7 +35,7 @@ static inline void *dereference_function_descriptor(void *ptr)
 	struct fdesc *desc = ptr;
 	void *p;
 
-	if (!probe_kernel_address(&desc->ip, p))
+	if (!get_kernel_nofault(&desc->ip, p))
 		ptr = p;
 	return ptr;
 }
diff --git a/arch/parisc/kernel/process.c b/arch/parisc/kernel/process.c
index 230a6422b99f3..c3b0f03bd0bf1 100644
--- a/arch/parisc/kernel/process.c
+++ b/arch/parisc/kernel/process.c
@@ -293,7 +293,7 @@ void *dereference_function_descriptor(void *ptr)
 	Elf64_Fdesc *desc = ptr;
 	void *p;
 
-	if (!probe_kernel_address(&desc->addr, p))
+	if (!get_kernel_nofault(&desc->addr, p))
 		ptr = p;
 	return ptr;
 }
diff --git a/arch/powerpc/include/asm/sections.h b/arch/powerpc/include/asm/sections.h
index d19871763ed4a..636bb1633de18 100644
--- a/arch/powerpc/include/asm/sections.h
+++ b/arch/powerpc/include/asm/sections.h
@@ -85,7 +85,7 @@ static inline void *dereference_function_descriptor(void *ptr)
 	struct ppc64_opd_entry *desc = ptr;
 	void *p;
 
-	if (!probe_kernel_address(&desc->funcaddr, p))
+	if (!get_kernel_nofault(&desc->funcaddr, p))
 		ptr = p;
 	return ptr;
 }
diff --git a/arch/powerpc/kernel/kgdb.c b/arch/powerpc/kernel/kgdb.c
index 7dd55eb1259dc..f0045a74e7cea 100644
--- a/arch/powerpc/kernel/kgdb.c
+++ b/arch/powerpc/kernel/kgdb.c
@@ -420,7 +420,7 @@ int kgdb_arch_set_breakpoint(struct kgdb_bkpt *bpt)
 	unsigned int instr;
 	unsigned int *addr = (unsigned int *)bpt->bpt_addr;
 
-	err = probe_kernel_address(addr, instr);
+	err = get_kernel_nofault(addr, instr);
 	if (err)
 		return err;
 
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index d5d6136b13480..b77f97073200c 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1260,7 +1260,7 @@ static void show_instructions(struct pt_regs *regs)
 #endif
 
 		if (!__kernel_text_address(pc) ||
-		    probe_kernel_address((const void *)pc, instr)) {
+		    get_kernel_nofault((const void *)pc, instr)) {
 			pr_cont("XXXXXXXX ");
 		} else {
 			if (regs->nip == pc)
diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index 73fa37ca40ef9..483412d5a1973 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -1069,7 +1069,7 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs)
 			ret = copy_from_user_nofault(&inst,
 					(void __user *)regs->nip, sizeof(inst));
 		else
-			ret = probe_kernel_address((void *)regs->nip, inst);
+			ret = get_kernel_nofault((void *)regs->nip, inst);
 
 		if (!ret && mcheck_handle_load(regs, inst)) {
 			regs->nip += 4;
diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
index 7f58fa53033f6..d807d507bd95a 100644
--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -137,7 +137,7 @@ static inline unsigned long get_break_insn_length(unsigned long pc)
 {
 	bug_insn_t insn;
 
-	if (probe_kernel_address((bug_insn_t *)pc, insn))
+	if (get_kernel_nofault((bug_insn_t *)pc, insn))
 		return 0;
 
 	return GET_INSN_LENGTH(insn);
@@ -160,7 +160,7 @@ int is_valid_bugaddr(unsigned long pc)
 
 	if (pc < VMALLOC_START)
 		return 0;
-	if (probe_kernel_address((bug_insn_t *)pc, insn))
+	if (get_kernel_nofault((bug_insn_t *)pc, insn))
 		return 0;
 	if ((insn & __INSN_LENGTH_MASK) == __INSN_LENGTH_32)
 		return (insn == __BUG_INSN_32);
diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index dedc28be27ab4..ab68eca6b2480 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -106,7 +106,7 @@ static int bad_address(void *p)
 {
 	unsigned long dummy;
 
-	return probe_kernel_address((unsigned long *)p, dummy);
+	return get_kernel_nofault((unsigned long *)p, dummy);
 }
 
 static void dump_pagetable(unsigned long asce, unsigned long address)
diff --git a/arch/sh/kernel/traps.c b/arch/sh/kernel/traps.c
index 63cf17bc760da..27c53c5b84e6b 100644
--- a/arch/sh/kernel/traps.c
+++ b/arch/sh/kernel/traps.c
@@ -118,7 +118,7 @@ int is_valid_bugaddr(unsigned long addr)
 
 	if (addr < PAGE_OFFSET)
 		return 0;
-	if (probe_kernel_address((insn_size_t *)addr, opcode))
+	if (get_kernel_nofault((insn_size_t *)addr, opcode))
 		return 0;
 	if (opcode == TRAPA_BUG_OPCODE)
 		return 1;
diff --git a/arch/x86/kernel/probe_roms.c b/arch/x86/kernel/probe_roms.c
index ee0286390a4c1..77f3341570e84 100644
--- a/arch/x86/kernel/probe_roms.c
+++ b/arch/x86/kernel/probe_roms.c
@@ -99,7 +99,7 @@ static bool probe_list(struct pci_dev *pdev, unsigned short vendor,
 	unsigned short device;
 
 	do {
-		if (probe_kernel_address(rom_list, device) != 0)
+		if (get_kernel_nofault(rom_list, device) != 0)
 			device = 0;
 
 		if (device && match_id(pdev, vendor, device))
@@ -125,13 +125,13 @@ static struct resource *find_oprom(struct pci_dev *pdev)
 			break;
 
 		rom = isa_bus_to_virt(res->start);
-		if (probe_kernel_address(rom + 0x18, offset) != 0)
+		if (get_kernel_nofault(rom + 0x18, offset) != 0)
 			continue;
 
-		if (probe_kernel_address(rom + offset + 0x4, vendor) != 0)
+		if (get_kernel_nofault(rom + offset + 0x4, vendor) != 0)
 			continue;
 
-		if (probe_kernel_address(rom + offset + 0x6, device) != 0)
+		if (get_kernel_nofault(rom + offset + 0x6, device) != 0)
 			continue;
 
 		if (match_id(pdev, vendor, device)) {
@@ -139,8 +139,8 @@ static struct resource *find_oprom(struct pci_dev *pdev)
 			break;
 		}
 
-		if (probe_kernel_address(rom + offset + 0x8, list) == 0 &&
-		    probe_kernel_address(rom + offset + 0xc, rev) == 0 &&
+		if (get_kernel_nofault(rom + offset + 0x8, list) == 0 &&
+		    get_kernel_nofault(rom + offset + 0xc, rev) == 0 &&
 		    rev >= 3 && list &&
 		    probe_list(pdev, vendor, rom + offset + list)) {
 			oprom = res;
@@ -183,14 +183,14 @@ static int __init romsignature(const unsigned char *rom)
 	const unsigned short * const ptr = (const unsigned short *)rom;
 	unsigned short sig;
 
-	return probe_kernel_address(ptr, sig) == 0 && sig == ROMSIGNATURE;
+	return get_kernel_nofault(ptr, sig) == 0 && sig == ROMSIGNATURE;
 }
 
 static int __init romchecksum(const unsigned char *rom, unsigned long length)
 {
 	unsigned char sum, c;
 
-	for (sum = 0; length && probe_kernel_address(rom++, c) == 0; length--)
+	for (sum = 0; length && get_kernel_nofault(rom++, c) == 0; length--)
 		sum += c;
 	return !length && !sum;
 }
@@ -211,7 +211,7 @@ void __init probe_roms(void)
 
 		video_rom_resource.start = start;
 
-		if (probe_kernel_address(rom + 2, c) != 0)
+		if (get_kernel_nofault(rom + 2, c) != 0)
 			continue;
 
 		/* 0 < length <= 0x7f * 512, historically */
@@ -249,7 +249,7 @@ void __init probe_roms(void)
 		if (!romsignature(rom))
 			continue;
 
-		if (probe_kernel_address(rom + 2, c) != 0)
+		if (get_kernel_nofault(rom + 2, c) != 0)
 			continue;
 
 		/* 0 < length <= 0x7f * 512, historically */
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 96809f6aad67d..07739b67da335 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -161,7 +161,7 @@ int is_valid_bugaddr(unsigned long addr)
 	if (addr < TASK_SIZE_MAX)
 		return 0;
 
-	if (probe_kernel_address((unsigned short *)addr, ud))
+	if (get_kernel_nofault((unsigned short *)addr, ud))
 		return 0;
 
 	return ud == INSN_UD0 || ud == INSN_UD2;
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 994e207abdf64..7c5e23e7407d8 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -98,7 +98,7 @@ check_prefetch_opcode(struct pt_regs *regs, unsigned char *instr,
 		return !instr_lo || (instr_lo>>1) == 1;
 	case 0x00:
 		/* Prefetch instruction is 0x0F0D or 0x0F18 */
-		if (probe_kernel_address(instr, opcode))
+		if (get_kernel_nofault(instr, opcode))
 			return 0;
 
 		*prefetch = (instr_lo == 0xF) &&
@@ -132,7 +132,7 @@ is_prefetch(struct pt_regs *regs, unsigned long error_code, unsigned long addr)
 	while (instr < max_instr) {
 		unsigned char opcode;
 
-		if (probe_kernel_address(instr, opcode))
+		if (get_kernel_nofault(instr, opcode))
 			break;
 
 		instr++;
@@ -441,7 +441,7 @@ static int bad_address(void *p)
 {
 	unsigned long dummy;
 
-	return probe_kernel_address((unsigned long *)p, dummy);
+	return get_kernel_nofault((unsigned long *)p, dummy);
 }
 
 static void dump_pagetable(unsigned long address)
diff --git a/arch/x86/pci/pcbios.c b/arch/x86/pci/pcbios.c
index 9c97d814125eb..b7f8699b18c1f 100644
--- a/arch/x86/pci/pcbios.c
+++ b/arch/x86/pci/pcbios.c
@@ -302,7 +302,7 @@ static const struct pci_raw_ops *__init pci_find_bios(void)
 	     check <= (union bios32 *) __va(0xffff0);
 	     ++check) {
 		long sig;
-		if (probe_kernel_address(&check->fields.signature, sig))
+		if (get_kernel_nofault(&check->fields.signature, sig))
 			continue;
 
 		if (check->fields.signature != BIOS32_SIGNATURE)
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index baef2e09b5ae9..5cc94a38a3984 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -319,13 +319,13 @@ long strncpy_from_user_nofault(char *dst, const void __user *unsafe_addr,
 long strnlen_user_nofault(const void __user *unsafe_addr, long count);
 
 /**
- * probe_kernel_address(): safely attempt to read from a location
+ * get_kernel_nofault(): safely attempt to read from a location
  * @addr: address to read from
  * @retval: read into this variable
  *
  * Returns 0 on success, or -EFAULT.
  */
-#define probe_kernel_address(addr, retval)		\
+#define get_kernel_nofault(addr, retval)		\
 	copy_from_kernel_nofault(&retval, addr, sizeof(retval))
 
 #ifndef user_access_begin
diff --git a/lib/test_lockup.c b/lib/test_lockup.c
index ea09ca335b214..71f7a4c89ed4f 100644
--- a/lib/test_lockup.c
+++ b/lib/test_lockup.c
@@ -419,8 +419,8 @@ static bool test_kernel_ptr(unsigned long addr, int size)
 	/* should be at least readable kernel address */
 	if (access_ok(ptr, 1) ||
 	    access_ok(ptr + size - 1, 1) ||
-	    probe_kernel_address(ptr, buf) ||
-	    probe_kernel_address(ptr + size - 1, buf)) {
+	    get_kernel_nofault(ptr, buf) ||
+	    get_kernel_nofault(ptr + size - 1, buf)) {
 		pr_err("invalid kernel ptr: %#lx\n", addr);
 		return true;
 	}
@@ -437,7 +437,7 @@ static bool __maybe_unused test_magic(unsigned long addr, int offset,
 	if (!addr)
 		return false;
 
-	if (probe_kernel_address(ptr, magic) || magic != expected) {
+	if (get_kernel_nofault(ptr, magic) || magic != expected) {
 		pr_err("invalid magic at %#lx + %#x = %#x, expected %#x\n",
 		       addr, offset, magic, expected);
 		return true;
-- 
2.26.2


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 11/18] maccess: remove strncpy_from_unsafe
  2020-05-13 16:00 ` [PATCH 11/18] maccess: remove strncpy_from_unsafe Christoph Hellwig
@ 2020-05-13 19:11   ` Linus Torvalds
  2020-05-13 19:28     ` Christoph Hellwig
  0 siblings, 1 reply; 46+ messages in thread
From: Linus Torvalds @ 2020-05-13 19:11 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: the arch/x86 maintainers, Alexei Starovoitov, Daniel Borkmann,
	Masami Hiramatsu, Andrew Morton, linux-parisc, linux-um, Netdev,
	bpf, Linux-MM, Linux Kernel Mailing List

On Wed, May 13, 2020 at 9:01 AM Christoph Hellwig <hch@lst.de> wrote:
>
> +static void bpf_strncpy(char *buf, long unsafe_addr)
> +{
> +       buf[0] = 0;
> +       if (strncpy_from_kernel_nofault(buf, (void *)unsafe_addr,
> +                       BPF_STRNCPY_LEN))
> +               strncpy_from_user_nofault(buf, (void __user *)unsafe_addr,
> +                               BPF_STRNCPY_LEN);
> +}

This seems buggy when I look at it.

It seems to think that strncpy_from_kernel_nofault() returns an error code.

Not so, unless I missed where you changed the rules.

It returns the length of the string for a successful copy. 0 is
actually an error case (for count being <= 0).

So the test for success seems entirely wrong.

Also, I do wonder if we shouldn't gate this on TASK_SIZE, and do the
user trial first. On architectures where this thing is valid in the
first place (ie kernel and user addresses are separate), the test for
address size would allow us to avoid a pointless fault due to an
invalid kernel access to user space.

So I think this function should look something like

  static void bpf_strncpy(char *buf, long unsafe_addr)
  {
          /* Try user address */
          if (unsafe_addr < TASK_SIZE) {
                  void __user *ptr = (void __user *)unsafe_addr;
                  if (strncpy_from_user_nofault(buf, ptr, BPF_STRNCPY_LEN) >= 0)
                          return;
          }

          /* .. fall back on trying kernel access */
          buf[0] = 0;
          strncpy_from_kernel_nofault(buf, (void *)unsafe_addr,
BPF_STRNCPY_LEN);
  }

or similar. No?

                   Linus

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 11/18] maccess: remove strncpy_from_unsafe
  2020-05-13 19:11   ` Linus Torvalds
@ 2020-05-13 19:28     ` Christoph Hellwig
  2020-05-13 22:36       ` Daniel Borkmann
  0 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2020-05-13 19:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, the arch/x86 maintainers, Alexei Starovoitov,
	Daniel Borkmann, Masami Hiramatsu, Andrew Morton, linux-parisc,
	linux-um, Netdev, bpf, Linux-MM, Linux Kernel Mailing List

On Wed, May 13, 2020 at 12:11:27PM -0700, Linus Torvalds wrote:
> On Wed, May 13, 2020 at 9:01 AM Christoph Hellwig <hch@lst.de> wrote:
> >
> > +static void bpf_strncpy(char *buf, long unsafe_addr)
> > +{
> > +       buf[0] = 0;
> > +       if (strncpy_from_kernel_nofault(buf, (void *)unsafe_addr,
> > +                       BPF_STRNCPY_LEN))
> > +               strncpy_from_user_nofault(buf, (void __user *)unsafe_addr,
> > +                               BPF_STRNCPY_LEN);
> > +}
> 
> This seems buggy when I look at it.
> 
> It seems to think that strncpy_from_kernel_nofault() returns an error code.
> 
> Not so, unless I missed where you changed the rules.

I didn't change the rules, so yes, this is wrong.

> Also, I do wonder if we shouldn't gate this on TASK_SIZE, and do the
> user trial first. On architectures where this thing is valid in the
> first place (ie kernel and user addresses are separate), the test for
> address size would allow us to avoid a pointless fault due to an
> invalid kernel access to user space.
> 
> So I think this function should look something like
> 
>   static void bpf_strncpy(char *buf, long unsafe_addr)
>   {
>           /* Try user address */
>           if (unsafe_addr < TASK_SIZE) {
>                   void __user *ptr = (void __user *)unsafe_addr;
>                   if (strncpy_from_user_nofault(buf, ptr, BPF_STRNCPY_LEN) >= 0)
>                           return;
>           }
> 
>           /* .. fall back on trying kernel access */
>           buf[0] = 0;
>           strncpy_from_kernel_nofault(buf, (void *)unsafe_addr,
> BPF_STRNCPY_LEN);
>   }
> 
> or similar. No?

So on say s390 TASK_SIZE_USUALLy is (-PAGE_SIZE), which means we'd alway
try the user copy first, which seems odd.

I'd really like to here from the bpf folks what the expected use case
is here, and if the typical argument is kernel or user memory.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 14/18] maccess: allow architectures to provide kernel probing directly
  2020-05-13 16:00 ` [PATCH 14/18] maccess: allow architectures to provide kernel probing directly Christoph Hellwig
@ 2020-05-13 19:36   ` Linus Torvalds
  2020-05-13 19:40     ` Christoph Hellwig
  2020-05-16  3:42   ` Masami Hiramatsu
  1 sibling, 1 reply; 46+ messages in thread
From: Linus Torvalds @ 2020-05-13 19:36 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: the arch/x86 maintainers, Alexei Starovoitov, Daniel Borkmann,
	Masami Hiramatsu, Andrew Morton, linux-parisc, linux-um, Netdev,
	bpf, Linux-MM, Linux Kernel Mailing List

On Wed, May 13, 2020 at 9:01 AM Christoph Hellwig <hch@lst.de> wrote:
>
> +               arch_kernel_read(dst, src, type, err_label);            \

I'm wondering if

 (a) we shouldn't expose this as an interface in general

 (b) it wouldn't be named differently..

The reason for (a) is that several users of the
"copy_from_kernel_nofault()" interfaces just seem to want a single
access from kernel mode.

The reason for (b) is that if we do expose this as a normal interface,
it shouldn't be called "arch_kernel_read", and it should have the same
semantics as "get_user_unsafe()".

IOW, maybe we should simply do exactly that: have a
"get_kernel_nofault()" thing that looks exactly like
unsafe_get_user().

On x86, it would basically be identical to unsafe_get_user().

And on architectures that only have the copy function, you'd just have
a fallback something like this:

  #define get_kernel_nofault(dst, src, err_label) do {  \
        typeof (*src) __gkn_result;                     \
        if (probe_kernel_read(&__gkn_result, src) < 0)  \
                goto err_label;                         \
        (dst) = __gkn_result;                           \
  } while (0)

and now the people who want to read a single kernel word can just do

        get_kernel_nofault(n, untrusted_pointer, error);

and they're done.

And some day - when we get reliably "asm goto" wiith outputs - that
"get_kernel_fault()" will literally be a single instruction asm with
the proper exception handler marker, the way "put_user_unsafe()"
already works (and the way "put_kernel_nofault()" would already work
if it does the above).

             Linus

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: clean up and streamline probe_kernel_* and friends v2
  2020-05-13 16:00 clean up and streamline probe_kernel_* and friends v2 Christoph Hellwig
                   ` (17 preceding siblings ...)
  2020-05-13 16:00 ` [PATCH 18/18] maccess: rename probe_kernel_address to get_kernel_nofault Christoph Hellwig
@ 2020-05-13 19:37 ` Linus Torvalds
  2020-05-13 23:04 ` Daniel Borkmann
  19 siblings, 0 replies; 46+ messages in thread
From: Linus Torvalds @ 2020-05-13 19:37 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: the arch/x86 maintainers, Alexei Starovoitov, Daniel Borkmann,
	Masami Hiramatsu, Andrew Morton, linux-parisc, linux-um, Netdev,
	bpf, Linux-MM, Linux Kernel Mailing List

On Wed, May 13, 2020 at 9:00 AM Christoph Hellwig <hch@lst.de> wrote:
>
> this series start cleaning up the safe kernel and user memory probing
> helpers in mm/maccess.c, and then allows architectures to implement
> the kernel probing without overriding the address space limit and
> temporarily allowing access to user memory.  It then switches x86
> over to this new mechanism by reusing the unsafe_* uaccess logic.

Ok, I think I found a bug, and I had one more suggestion, but other
than the two emails I sent this all looks like an improvement to me.

                 Linus

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 14/18] maccess: allow architectures to provide kernel probing directly
  2020-05-13 19:36   ` Linus Torvalds
@ 2020-05-13 19:40     ` Christoph Hellwig
  2020-05-13 19:48       ` Linus Torvalds
  0 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2020-05-13 19:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, the arch/x86 maintainers, Alexei Starovoitov,
	Daniel Borkmann, Masami Hiramatsu, Andrew Morton, linux-parisc,
	linux-um, Netdev, bpf, Linux-MM, Linux Kernel Mailing List

On Wed, May 13, 2020 at 12:36:18PM -0700, Linus Torvalds wrote:
> > +               arch_kernel_read(dst, src, type, err_label);            \
> 
> I'm wondering if
> 
>  (a) we shouldn't expose this as an interface in general

We do export something like it, currently it is called
probe_kernel_address, and the last patch renames it to
get_kernel_nofault.  However it is implemented as a wrapper
around probe_kernel_address / copy_from_kernel_nofault and thus
not quite as efficient and without the magic goto semantics.

>  (b) it wouldn't be named differently..

It probably should with all the renaming..

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 14/18] maccess: allow architectures to provide kernel probing directly
  2020-05-13 19:40     ` Christoph Hellwig
@ 2020-05-13 19:48       ` Linus Torvalds
  2020-05-13 19:54         ` Christoph Hellwig
  0 siblings, 1 reply; 46+ messages in thread
From: Linus Torvalds @ 2020-05-13 19:48 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: the arch/x86 maintainers, Alexei Starovoitov, Daniel Borkmann,
	Masami Hiramatsu, Andrew Morton, linux-parisc, linux-um, Netdev,
	bpf, Linux-MM, Linux Kernel Mailing List

On Wed, May 13, 2020 at 12:40 PM Christoph Hellwig <hch@lst.de> wrote:
>
> We do export something like it, currently it is called
> probe_kernel_address, and the last patch renames it to
> get_kernel_nofault.  However it is implemented as a wrapper
> around probe_kernel_address / copy_from_kernel_nofault and thus
> not quite as efficient and without the magic goto semantics.

Looking at the current users of "probe_kernel_read()", it looks like
it's almost mostly things that just want a single byte or word.

It's not 100% that: we definitely do several things that want the
"copy" semantics vs the "get" semantics: on the x86 side we have
CALL_INSN_SIZE and MAX_INSN_SIZE, and the ldttss_desc.

But the bulk of them do seem to be a single value.

I don't know if performance really matters here, but to me the whole
"most users seem to want to read a single value" is what makes me
think that maybe that should be the primary model, rather than have
the copy model be the primary one and then we implement the single
value case (badly) with a copy.

It probably doesn't matter that much. I certainly wouldn't hold this
series up over it - it can be a future thing.

         Linus

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 14/18] maccess: allow architectures to provide kernel probing directly
  2020-05-13 19:48       ` Linus Torvalds
@ 2020-05-13 19:54         ` Christoph Hellwig
  0 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2020-05-13 19:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, the arch/x86 maintainers, Alexei Starovoitov,
	Daniel Borkmann, Masami Hiramatsu, Andrew Morton, linux-parisc,
	linux-um, Netdev, bpf, Linux-MM, Linux Kernel Mailing List

On Wed, May 13, 2020 at 12:48:54PM -0700, Linus Torvalds wrote:
> Looking at the current users of "probe_kernel_read()", it looks like
> it's almost mostly things that just want a single byte or word.
> 
> It's not 100% that: we definitely do several things that want the
> "copy" semantics vs the "get" semantics: on the x86 side we have
> CALL_INSN_SIZE and MAX_INSN_SIZE, and the ldttss_desc.
> 
> But the bulk of them do seem to be a single value.
> 
> I don't know if performance really matters here, but to me the whole
> "most users seem to want to read a single value" is what makes me
> think that maybe that should be the primary model, rather than have
> the copy model be the primary one and then we implement the single
> value case (badly) with a copy.
> 
> It probably doesn't matter that much. I certainly wouldn't hold this
> series up over it - it can be a future thing.

I can make the get_kernel_nofault implementation suck a little less :)

Note that the arch helper (we could call it unsafe_get_kernel_nofault)
we still need to have a pagefault_disable / pagefault_enable pair
around the calls.  So maybe keep the get_kernel_nofault interface
as-is (without the goto label), and prepare the arch helpers for
being used similar to unsafe_get_user once all architectures are
converted.  And I can throw in a few patches to convert callers
from the copy semantics to the get semantics.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 11/18] maccess: remove strncpy_from_unsafe
  2020-05-13 19:28     ` Christoph Hellwig
@ 2020-05-13 22:36       ` Daniel Borkmann
  2020-05-13 23:03         ` Linus Torvalds
                           ` (2 more replies)
  0 siblings, 3 replies; 46+ messages in thread
From: Daniel Borkmann @ 2020-05-13 22:36 UTC (permalink / raw)
  To: Christoph Hellwig, Linus Torvalds
  Cc: the arch/x86 maintainers, Alexei Starovoitov, Masami Hiramatsu,
	Andrew Morton, linux-parisc, linux-um, Netdev, bpf, Linux-MM,
	Linux Kernel Mailing List

On 5/13/20 9:28 PM, Christoph Hellwig wrote:
> On Wed, May 13, 2020 at 12:11:27PM -0700, Linus Torvalds wrote:
>> On Wed, May 13, 2020 at 9:01 AM Christoph Hellwig <hch@lst.de> wrote:
>>>
>>> +static void bpf_strncpy(char *buf, long unsafe_addr)
>>> +{
>>> +       buf[0] = 0;
>>> +       if (strncpy_from_kernel_nofault(buf, (void *)unsafe_addr,
>>> +                       BPF_STRNCPY_LEN))
>>> +               strncpy_from_user_nofault(buf, (void __user *)unsafe_addr,
>>> +                               BPF_STRNCPY_LEN);
>>> +}
>>
>> This seems buggy when I look at it.
>>
>> It seems to think that strncpy_from_kernel_nofault() returns an error code.
>>
>> Not so, unless I missed where you changed the rules.
> 
> I didn't change the rules, so yes, this is wrong.
> 
>> Also, I do wonder if we shouldn't gate this on TASK_SIZE, and do the
>> user trial first. On architectures where this thing is valid in the
>> first place (ie kernel and user addresses are separate), the test for
>> address size would allow us to avoid a pointless fault due to an
>> invalid kernel access to user space.
>>
>> So I think this function should look something like
>>
>>    static void bpf_strncpy(char *buf, long unsafe_addr)
>>    {
>>            /* Try user address */
>>            if (unsafe_addr < TASK_SIZE) {
>>                    void __user *ptr = (void __user *)unsafe_addr;
>>                    if (strncpy_from_user_nofault(buf, ptr, BPF_STRNCPY_LEN) >= 0)
>>                            return;
>>            }
>>
>>            /* .. fall back on trying kernel access */
>>            buf[0] = 0;
>>            strncpy_from_kernel_nofault(buf, (void *)unsafe_addr,
>> BPF_STRNCPY_LEN);
>>    }
>>
>> or similar. No?
> 
> So on say s390 TASK_SIZE_USUALLy is (-PAGE_SIZE), which means we'd alway
> try the user copy first, which seems odd.
> 
> I'd really like to here from the bpf folks what the expected use case
> is here, and if the typical argument is kernel or user memory.

It's used for both. Given this is enabled on pretty much all program types, my
assumption would be that usage is still more often on kernel memory than user one.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 11/18] maccess: remove strncpy_from_unsafe
  2020-05-13 22:36       ` Daniel Borkmann
@ 2020-05-13 23:03         ` Linus Torvalds
  2020-05-13 23:24           ` Daniel Borkmann
  2020-05-13 23:20         ` Masami Hiramatsu
  2020-05-13 23:28         ` Al Viro
  2 siblings, 1 reply; 46+ messages in thread
From: Linus Torvalds @ 2020-05-13 23:03 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Christoph Hellwig, the arch/x86 maintainers, Alexei Starovoitov,
	Masami Hiramatsu, Andrew Morton, linux-parisc, linux-um, Netdev,
	bpf, Linux-MM, Linux Kernel Mailing List

On Wed, May 13, 2020 at 3:36 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> It's used for both.

Daniel, BPF real;ly needs to make up its mind about that.

You *cannot* use ti for both.

Yes, it happens to work on x86 and some other architectures.

But on other architectures, the exact same pointer value can be a
kernel pointer or a user pointer.

> Given this is enabled on pretty much all program types, my
> assumption would be that usage is still more often on kernel memory than user one.

You need to pick one.

If you know it is a user pointer, use strncpy_from_user() (possibly
with disable_pagefault() aka strncpy_from_user_nofault()).

And if you know it is a kernel pointer, use strncpy_from_unsafe() (aka
strncpy_from_kernel_nofault()).

You really can't pick the "randomly one or the other guess what I mean " option.

                  Linus

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: clean up and streamline probe_kernel_* and friends v2
  2020-05-13 16:00 clean up and streamline probe_kernel_* and friends v2 Christoph Hellwig
                   ` (18 preceding siblings ...)
  2020-05-13 19:37 ` clean up and streamline probe_kernel_* and friends v2 Linus Torvalds
@ 2020-05-13 23:04 ` Daniel Borkmann
  2020-05-13 23:20   ` Linus Torvalds
  2020-05-19  5:50   ` Christoph Hellwig
  19 siblings, 2 replies; 46+ messages in thread
From: Daniel Borkmann @ 2020-05-13 23:04 UTC (permalink / raw)
  To: Christoph Hellwig, x86, Alexei Starovoitov, Masami Hiramatsu,
	Linus Torvalds, Andrew Morton
  Cc: linux-parisc, linux-um, netdev, bpf, linux-mm, linux-kernel

On 5/13/20 6:00 PM, Christoph Hellwig wrote:
> Hi all,
> 
> this series start cleaning up the safe kernel and user memory probing
> helpers in mm/maccess.c, and then allows architectures to implement
> the kernel probing without overriding the address space limit and
> temporarily allowing access to user memory.  It then switches x86
> over to this new mechanism by reusing the unsafe_* uaccess logic.
> 
> This version also switches to the saner copy_{from,to}_kernel_nofault
> naming suggested by Linus.
> 
> I kept the x86 helprs as-is without calling unsage_{get,put}_user as
> that avoids a number of hard to trace casts, and it will still work
> with the asm-goto based version easily.

Aside from comments on list, the series looks reasonable to me. For BPF
the bpf_probe_read() helper would be slightly penalized for probing user
memory given we now test on copy_from_kernel_nofault() first and if that
fails only then fall back to copy_from_user_nofault(), but it seems
small enough that it shouldn't matter too much and aside from that we have
the newer bpf_probe_read_kernel() and bpf_probe_read_user() anyway that
BPF progs should use instead, so I think it's okay.

For patch 14 and patch 15, do you roughly know the performance gain with
the new probe_kernel_read_loop() + arch_kernel_read() approach?

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: clean up and streamline probe_kernel_* and friends v2
  2020-05-13 23:04 ` Daniel Borkmann
@ 2020-05-13 23:20   ` Linus Torvalds
  2020-05-19  5:50   ` Christoph Hellwig
  1 sibling, 0 replies; 46+ messages in thread
From: Linus Torvalds @ 2020-05-13 23:20 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Christoph Hellwig, the arch/x86 maintainers, Alexei Starovoitov,
	Masami Hiramatsu, Andrew Morton, linux-parisc, linux-um, Netdev,
	bpf, Linux-MM, Linux Kernel Mailing List

On Wed, May 13, 2020 at 4:04 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> Aside from comments on list, the series looks reasonable to me. For BPF
> the bpf_probe_read() helper would be slightly penalized for probing user
> memory given we now test on copy_from_kernel_nofault() first and if that
> fails only then fall back to copy_from_user_nofault(),

Again, no.

If you can't tell that one or the other is always the right thing,
then that function is simply buggy and wrong.

On sparc and on s390, address X can be _both_ a kernel address and a
user address. You need to specify which it is (by using the proper
function). The whole "try one first, then the other" doesn't work.
They may both "work", and by virtue of that, unless you can state
"yes, we always want user space" or "yes, we always want kernel", that
"try one or the other" isn't valid.

And it can be a real security issue. If a user program can be made to
read kernel memory when BPF validated things as a user pointer, it's
an obvious security issue.

But it can be a security issue the other way around too: if the BPF
code expects to get a kernel string, but user space can fool it into
reading a user string instead by mapping something of its own into the
user space address that aliases the kernel space address, then you can
presumably fool the BPF program to do bad things too (eg mess up any
BPF packet switching routines?).

So BPF really really really needs to specify which one it is. Not
specifying it and saying "whichever" is a bug, and a security issue.

             Linus

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 11/18] maccess: remove strncpy_from_unsafe
  2020-05-13 22:36       ` Daniel Borkmann
  2020-05-13 23:03         ` Linus Torvalds
@ 2020-05-13 23:20         ` Masami Hiramatsu
  2020-05-13 23:59           ` Linus Torvalds
  2020-05-13 23:28         ` Al Viro
  2 siblings, 1 reply; 46+ messages in thread
From: Masami Hiramatsu @ 2020-05-13 23:20 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Christoph Hellwig, Linus Torvalds, the arch/x86 maintainers,
	Alexei Starovoitov, Masami Hiramatsu, Andrew Morton,
	linux-parisc, linux-um, Netdev, bpf, Linux-MM,
	Linux Kernel Mailing List

On Thu, 14 May 2020 00:36:28 +0200
Daniel Borkmann <daniel@iogearbox.net> wrote:

> On 5/13/20 9:28 PM, Christoph Hellwig wrote:
> > On Wed, May 13, 2020 at 12:11:27PM -0700, Linus Torvalds wrote:
> >> On Wed, May 13, 2020 at 9:01 AM Christoph Hellwig <hch@lst.de> wrote:
> >>>
> >>> +static void bpf_strncpy(char *buf, long unsafe_addr)
> >>> +{
> >>> +       buf[0] = 0;
> >>> +       if (strncpy_from_kernel_nofault(buf, (void *)unsafe_addr,
> >>> +                       BPF_STRNCPY_LEN))
> >>> +               strncpy_from_user_nofault(buf, (void __user *)unsafe_addr,
> >>> +                               BPF_STRNCPY_LEN);
> >>> +}
> >>
> >> This seems buggy when I look at it.
> >>
> >> It seems to think that strncpy_from_kernel_nofault() returns an error code.
> >>
> >> Not so, unless I missed where you changed the rules.
> > 
> > I didn't change the rules, so yes, this is wrong.
> > 
> >> Also, I do wonder if we shouldn't gate this on TASK_SIZE, and do the
> >> user trial first. On architectures where this thing is valid in the
> >> first place (ie kernel and user addresses are separate), the test for
> >> address size would allow us to avoid a pointless fault due to an
> >> invalid kernel access to user space.
> >>
> >> So I think this function should look something like
> >>
> >>    static void bpf_strncpy(char *buf, long unsafe_addr)
> >>    {
> >>            /* Try user address */
> >>            if (unsafe_addr < TASK_SIZE) {
> >>                    void __user *ptr = (void __user *)unsafe_addr;
> >>                    if (strncpy_from_user_nofault(buf, ptr, BPF_STRNCPY_LEN) >= 0)
> >>                            return;
> >>            }
> >>
> >>            /* .. fall back on trying kernel access */
> >>            buf[0] = 0;
> >>            strncpy_from_kernel_nofault(buf, (void *)unsafe_addr,
> >> BPF_STRNCPY_LEN);
> >>    }
> >>
> >> or similar. No?
> > 
> > So on say s390 TASK_SIZE_USUALLy is (-PAGE_SIZE), which means we'd alway
> > try the user copy first, which seems odd.
> > 
> > I'd really like to here from the bpf folks what the expected use case
> > is here, and if the typical argument is kernel or user memory.
> 
> It's used for both. Given this is enabled on pretty much all program types, my
> assumption would be that usage is still more often on kernel memory than user one.

For trace_kprobe.c current order (kernel -> user fallback) is preferred
because it has another function dedicated for user memory.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 11/18] maccess: remove strncpy_from_unsafe
  2020-05-13 23:03         ` Linus Torvalds
@ 2020-05-13 23:24           ` Daniel Borkmann
  0 siblings, 0 replies; 46+ messages in thread
From: Daniel Borkmann @ 2020-05-13 23:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, the arch/x86 maintainers, Alexei Starovoitov,
	Masami Hiramatsu, Andrew Morton, linux-parisc, linux-um, Netdev,
	bpf, Linux-MM, Linux Kernel Mailing List, bgregg

On 5/14/20 1:03 AM, Linus Torvalds wrote:
> On Wed, May 13, 2020 at 3:36 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>>
>> It's used for both.
> 
> Daniel, BPF real;ly needs to make up its mind about that.
> 
> You *cannot* use ti for both.
> 
> Yes, it happens to work on x86 and some other architectures.
> 
> But on other architectures, the exact same pointer value can be a
> kernel pointer or a user pointer.

Right, it has the same issue as with the old probe helper. I was merely stating that
there are existing users (on x86) out there that use it this way, even though broken
generally.

>> Given this is enabled on pretty much all program types, my
>> assumption would be that usage is still more often on kernel memory than user one.
> 
> You need to pick one.
> 
> If you know it is a user pointer, use strncpy_from_user() (possibly
> with disable_pagefault() aka strncpy_from_user_nofault()).
> 
> And if you know it is a kernel pointer, use strncpy_from_unsafe() (aka
> strncpy_from_kernel_nofault()).
> 
> You really can't pick the "randomly one or the other guess what I mean " option.

My preference would be to have %s, %sK, %sU for bpf_trace_printk() where the latter two
result in an explicit strncpy_from_kernel_nofault() or strncpy_from_user_nofault()
choice while the %s is converted as per your suggestion and it would still allow for a
grace period to convert existing users to the new variants, similar with what we did on
the bpf_probe_read_kernel() and bpf_probe_read_user() helpers to get this sorted out.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 11/18] maccess: remove strncpy_from_unsafe
  2020-05-13 22:36       ` Daniel Borkmann
  2020-05-13 23:03         ` Linus Torvalds
  2020-05-13 23:20         ` Masami Hiramatsu
@ 2020-05-13 23:28         ` Al Viro
  2020-05-13 23:58           ` Daniel Borkmann
  2 siblings, 1 reply; 46+ messages in thread
From: Al Viro @ 2020-05-13 23:28 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Christoph Hellwig, Linus Torvalds, the arch/x86 maintainers,
	Alexei Starovoitov, Masami Hiramatsu, Andrew Morton,
	linux-parisc, linux-um, Netdev, bpf, Linux-MM,
	Linux Kernel Mailing List

On Thu, May 14, 2020 at 12:36:28AM +0200, Daniel Borkmann wrote:

> > So on say s390 TASK_SIZE_USUALLy is (-PAGE_SIZE), which means we'd alway
> > try the user copy first, which seems odd.
> > 
> > I'd really like to here from the bpf folks what the expected use case
> > is here, and if the typical argument is kernel or user memory.
> 
> It's used for both. Given this is enabled on pretty much all program types, my
> assumption would be that usage is still more often on kernel memory than user one.

Then it needs an argument telling it which one to use.  Look at sparc64.
Or s390.  Or parisc.  Et sodding cetera.

The underlying model is that the kernel lives in a separate address space.
Yes, on x86 it's actually sharing the page tables with userland, but that's
not universal.  The same address can be both a valid userland one _and_
a valid kernel one.  You need to tell which one do you want.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 11/18] maccess: remove strncpy_from_unsafe
  2020-05-13 23:28         ` Al Viro
@ 2020-05-13 23:58           ` Daniel Borkmann
  2020-05-14 10:01             ` David Laight
  0 siblings, 1 reply; 46+ messages in thread
From: Daniel Borkmann @ 2020-05-13 23:58 UTC (permalink / raw)
  To: Al Viro
  Cc: Christoph Hellwig, Linus Torvalds, the arch/x86 maintainers,
	Alexei Starovoitov, Masami Hiramatsu, Andrew Morton,
	linux-parisc, linux-um, Netdev, bpf, Linux-MM,
	Linux Kernel Mailing List, bgregg

On 5/14/20 1:28 AM, Al Viro wrote:
> On Thu, May 14, 2020 at 12:36:28AM +0200, Daniel Borkmann wrote:
> 
>>> So on say s390 TASK_SIZE_USUALLy is (-PAGE_SIZE), which means we'd alway
>>> try the user copy first, which seems odd.
>>>
>>> I'd really like to here from the bpf folks what the expected use case
>>> is here, and if the typical argument is kernel or user memory.
>>
>> It's used for both. Given this is enabled on pretty much all program types, my
>> assumption would be that usage is still more often on kernel memory than user one.
> 
> Then it needs an argument telling it which one to use.  Look at sparc64.
> Or s390.  Or parisc.  Et sodding cetera.
> 
> The underlying model is that the kernel lives in a separate address space.
> Yes, on x86 it's actually sharing the page tables with userland, but that's
> not universal.  The same address can be both a valid userland one _and_
> a valid kernel one.  You need to tell which one do you want.

Yes, see also 6ae08ae3dea2 ("bpf: Add probe_read_{user, kernel} and probe_read_{user,
kernel}_str helpers"), and my other reply wrt bpf_trace_printk() on how to address
this. All I'm trying to say is that both bpf_probe_read() and bpf_trace_printk() do
exist in this form since early [e]bpf days for ~5yrs now and while broken on non-x86
there are a lot of users on x86 for this in the wild, so they need to have a chance
to migrate over to the new facilities before they are fully removed.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 11/18] maccess: remove strncpy_from_unsafe
  2020-05-13 23:20         ` Masami Hiramatsu
@ 2020-05-13 23:59           ` Linus Torvalds
  2020-05-14  1:00             ` Masami Hiramatsu
  0 siblings, 1 reply; 46+ messages in thread
From: Linus Torvalds @ 2020-05-13 23:59 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Daniel Borkmann, Christoph Hellwig, the arch/x86 maintainers,
	Alexei Starovoitov, Andrew Morton, linux-parisc, linux-um,
	Netdev, bpf, Linux-MM, Linux Kernel Mailing List

On Wed, May 13, 2020 at 4:21 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
>
> For trace_kprobe.c current order (kernel -> user fallback) is preferred
> because it has another function dedicated for user memory.

Well, then it should just use the "strict" kernel-only one for the
non-user memory.

But yes, if there are legacy interfaces, then we might want to say
"these continue to work for the legacy case on platforms where we can
tell which kind of pointer it is from the bit pattern".

But we should likely at least disallow it entirely on platforms where
we really can't - or pick one hardcoded choice. On sparc, you really
_have_ to specify one or the other.

                  Linus

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 11/18] maccess: remove strncpy_from_unsafe
  2020-05-13 23:59           ` Linus Torvalds
@ 2020-05-14  1:00             ` Masami Hiramatsu
  2020-05-14  2:43               ` Linus Torvalds
  0 siblings, 1 reply; 46+ messages in thread
From: Masami Hiramatsu @ 2020-05-14  1:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Daniel Borkmann, Christoph Hellwig, the arch/x86 maintainers,
	Alexei Starovoitov, Andrew Morton, linux-parisc, linux-um,
	Netdev, bpf, Linux-MM, Linux Kernel Mailing List

On Wed, 13 May 2020 16:59:40 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Wed, May 13, 2020 at 4:21 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
> >
> >
> > For trace_kprobe.c current order (kernel -> user fallback) is preferred
> > because it has another function dedicated for user memory.
> 
> Well, then it should just use the "strict" kernel-only one for the
> non-user memory.
> 
> But yes, if there are legacy interfaces, then we might want to say
> "these continue to work for the legacy case on platforms where we can
> tell which kind of pointer it is from the bit pattern".

Yes, that was why I changed my mind and send reviewed-by last time.

https://lore.kernel.org/bpf/20200511142716.f1ff6fc55220012982c47fec@kernel.org/

> But we should likely at least disallow it entirely on platforms where
> we really can't - or pick one hardcoded choice. On sparc, you really
> _have_ to specify one or the other.

OK. BTW, is there any way to detect the kernel/user space overlap on
memory layout statically? If there, I can do it. (I don't like
"if (CONFIG_X86)" thing....)
Or, maybe we need CONFIG_ARCH_OVERLAP_ADDRESS_SPACE?

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 10/18] maccess: unify the probe kernel arch hooks
  2020-05-13 16:00 ` [PATCH 10/18] maccess: unify the probe kernel arch hooks Christoph Hellwig
@ 2020-05-14  1:13   ` Masami Hiramatsu
  2020-05-19  5:46     ` Christoph Hellwig
  0 siblings, 1 reply; 46+ messages in thread
From: Masami Hiramatsu @ 2020-05-14  1:13 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: x86, Alexei Starovoitov, Daniel Borkmann, Linus Torvalds,
	Andrew Morton, linux-parisc, linux-um, netdev, bpf, linux-mm,
	linux-kernel

Hi Christoph,

On Wed, 13 May 2020 18:00:30 +0200
Christoph Hellwig <hch@lst.de> wrote:

> @@ -36,14 +50,20 @@ long __weak probe_kernel_read(void *dst, const void *src, size_t size)
>   * probe_kernel_read() suitable for use within regions where the caller
>   * already holds mmap_sem, or other locks which nest inside mmap_sem.
>   */
> -long __weak probe_kernel_read_strict(void *dst, const void *src, size_t size)
> -    __attribute__((alias("__probe_kernel_read")));
> +long probe_kernel_read_strict(void *dst, const void *src, size_t size)
> +{
> +	return __probe_kernel_read(dst, src, size, true);
> +}
>  
> -long __probe_kernel_read(void *dst, const void *src, size_t size)
> +static long __probe_kernel_read(void *dst, const void *src, size_t size,
> +		bool strict)
>  {
>  	long ret;
>  	mm_segment_t old_fs = get_fs();
>  
> +	if (!probe_kernel_read_allowed(dst, src, size, strict))
> +		return -EFAULT;

Could you make this return -ERANGE instead of -EFAULT so that
the caller can notice that the address might be user space?

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 11/18] maccess: remove strncpy_from_unsafe
  2020-05-14  1:00             ` Masami Hiramatsu
@ 2020-05-14  2:43               ` Linus Torvalds
  2020-05-14  9:44                 ` Masami Hiramatsu
  0 siblings, 1 reply; 46+ messages in thread
From: Linus Torvalds @ 2020-05-14  2:43 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Daniel Borkmann, Christoph Hellwig, the arch/x86 maintainers,
	Alexei Starovoitov, Andrew Morton, linux-parisc, linux-um,
	Netdev, bpf, Linux-MM, Linux Kernel Mailing List

On Wed, May 13, 2020 at 6:00 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> > But we should likely at least disallow it entirely on platforms where
> > we really can't - or pick one hardcoded choice. On sparc, you really
> > _have_ to specify one or the other.
>
> OK. BTW, is there any way to detect the kernel/user space overlap on
> memory layout statically? If there, I can do it. (I don't like
> "if (CONFIG_X86)" thing....)
> Or, maybe we need CONFIG_ARCH_OVERLAP_ADDRESS_SPACE?

I think it would be better to have a CONFIG variable that
architectures can just 'select' to show that they are ok with separate
kernel and user addresses.

Because I don't think we have any way to say that right now as-is. You
can probably come up with hacky ways to approximate it, ie something
like

    if (TASK_SIZE_MAX > PAGE_OFFSET)
        .... they overlap ..

which would almost work, but..

                 Linus

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 11/18] maccess: remove strncpy_from_unsafe
  2020-05-14  2:43               ` Linus Torvalds
@ 2020-05-14  9:44                 ` Masami Hiramatsu
  2020-05-14 10:27                   ` Daniel Borkmann
  0 siblings, 1 reply; 46+ messages in thread
From: Masami Hiramatsu @ 2020-05-14  9:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Daniel Borkmann, Christoph Hellwig, the arch/x86 maintainers,
	Alexei Starovoitov, Andrew Morton, linux-parisc, linux-um,
	Netdev, bpf, Linux-MM, Linux Kernel Mailing List

On Wed, 13 May 2020 19:43:24 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Wed, May 13, 2020 at 6:00 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
> >
> > > But we should likely at least disallow it entirely on platforms where
> > > we really can't - or pick one hardcoded choice. On sparc, you really
> > > _have_ to specify one or the other.
> >
> > OK. BTW, is there any way to detect the kernel/user space overlap on
> > memory layout statically? If there, I can do it. (I don't like
> > "if (CONFIG_X86)" thing....)
> > Or, maybe we need CONFIG_ARCH_OVERLAP_ADDRESS_SPACE?
> 
> I think it would be better to have a CONFIG variable that
> architectures can just 'select' to show that they are ok with separate
> kernel and user addresses.
> 
> Because I don't think we have any way to say that right now as-is. You
> can probably come up with hacky ways to approximate it, ie something
> like
> 
>     if (TASK_SIZE_MAX > PAGE_OFFSET)
>         .... they overlap ..
> 
> which would almost work, but..

It seems TASK_SIZE_MAX is defined only on x86 and s390, what about
comparing STACK_TOP_MAX with PAGE_OFFSET ?
Anyway, I agree that the best way is introducing a CONFIG.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* RE: [PATCH 11/18] maccess: remove strncpy_from_unsafe
  2020-05-13 23:58           ` Daniel Borkmann
@ 2020-05-14 10:01             ` David Laight
  2020-05-14 10:21               ` Daniel Borkmann
  0 siblings, 1 reply; 46+ messages in thread
From: David Laight @ 2020-05-14 10:01 UTC (permalink / raw)
  To: 'Daniel Borkmann', Al Viro
  Cc: Christoph Hellwig, Linus Torvalds, the arch/x86 maintainers,
	Alexei Starovoitov, Masami Hiramatsu, Andrew Morton,
	linux-parisc, linux-um, Netdev, bpf, Linux-MM,
	Linux Kernel Mailing List, bgregg

From: Daniel Borkmann
> Sent: 14 May 2020 00:59
> On 5/14/20 1:28 AM, Al Viro wrote:
> > On Thu, May 14, 2020 at 12:36:28AM +0200, Daniel Borkmann wrote:
> >
> >>> So on say s390 TASK_SIZE_USUALLy is (-PAGE_SIZE), which means we'd alway
> >>> try the user copy first, which seems odd.
> >>>
> >>> I'd really like to here from the bpf folks what the expected use case
> >>> is here, and if the typical argument is kernel or user memory.
> >>
> >> It's used for both. Given this is enabled on pretty much all program types, my
> >> assumption would be that usage is still more often on kernel memory than user one.
> >
> > Then it needs an argument telling it which one to use.  Look at sparc64.
> > Or s390.  Or parisc.  Et sodding cetera.
> >
> > The underlying model is that the kernel lives in a separate address space.
> > Yes, on x86 it's actually sharing the page tables with userland, but that's
> > not universal.  The same address can be both a valid userland one _and_
> > a valid kernel one.  You need to tell which one do you want.
> 
> Yes, see also 6ae08ae3dea2 ("bpf: Add probe_read_{user, kernel} and probe_read_{user,
> kernel}_str helpers"), and my other reply wrt bpf_trace_printk() on how to address
> this. All I'm trying to say is that both bpf_probe_read() and bpf_trace_printk() do
> exist in this form since early [e]bpf days for ~5yrs now and while broken on non-x86
> there are a lot of users on x86 for this in the wild, so they need to have a chance
> to migrate over to the new facilities before they are fully removed.

If it's not a stupid question why is a BPF program allowed to get
into a situation where it might have an invalid kernel address.

It all stinks of a hole that allows all of kernel memory to be read
and copied to userspace.

Now you might want to something special so that BPF programs just
abort on OOPS instead of possibly paniking the kernel.
But that is different from a copy that expects to be passed garbage.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 11/18] maccess: remove strncpy_from_unsafe
  2020-05-14 10:01             ` David Laight
@ 2020-05-14 10:21               ` Daniel Borkmann
  0 siblings, 0 replies; 46+ messages in thread
From: Daniel Borkmann @ 2020-05-14 10:21 UTC (permalink / raw)
  To: David Laight, Al Viro
  Cc: Christoph Hellwig, Linus Torvalds, the arch/x86 maintainers,
	Alexei Starovoitov, Masami Hiramatsu, Andrew Morton,
	linux-parisc, linux-um, Netdev, bpf, Linux-MM,
	Linux Kernel Mailing List, bgregg

On 5/14/20 12:01 PM, David Laight wrote:
[...]
> If it's not a stupid question why is a BPF program allowed to get
> into a situation where it might have an invalid kernel address.
> 
> It all stinks of a hole that allows all of kernel memory to be read
> and copied to userspace.
> 
> Now you might want to something special so that BPF programs just
> abort on OOPS instead of possibly paniking the kernel.
> But that is different from a copy that expects to be passed garbage.

I suggest you read up on probe_kernel_read() and its uses in tracing in
general, looks like you haven't done that.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 11/18] maccess: remove strncpy_from_unsafe
  2020-05-14  9:44                 ` Masami Hiramatsu
@ 2020-05-14 10:27                   ` Daniel Borkmann
  0 siblings, 0 replies; 46+ messages in thread
From: Daniel Borkmann @ 2020-05-14 10:27 UTC (permalink / raw)
  To: Masami Hiramatsu, Linus Torvalds
  Cc: Christoph Hellwig, the arch/x86 maintainers, Alexei Starovoitov,
	Andrew Morton, linux-parisc, linux-um, Netdev, bpf, Linux-MM,
	Linux Kernel Mailing List

On 5/14/20 11:44 AM, Masami Hiramatsu wrote:
> On Wed, 13 May 2020 19:43:24 -0700
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
>> On Wed, May 13, 2020 at 6:00 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>>>
>>>> But we should likely at least disallow it entirely on platforms where
>>>> we really can't - or pick one hardcoded choice. On sparc, you really
>>>> _have_ to specify one or the other.
>>>
>>> OK. BTW, is there any way to detect the kernel/user space overlap on
>>> memory layout statically? If there, I can do it. (I don't like
>>> "if (CONFIG_X86)" thing....)
>>> Or, maybe we need CONFIG_ARCH_OVERLAP_ADDRESS_SPACE?
>>
>> I think it would be better to have a CONFIG variable that
>> architectures can just 'select' to show that they are ok with separate
>> kernel and user addresses.
>>
>> Because I don't think we have any way to say that right now as-is. You
>> can probably come up with hacky ways to approximate it, ie something
>> like
>>
>>      if (TASK_SIZE_MAX > PAGE_OFFSET)
>>          .... they overlap ..
>>
>> which would almost work, but..
> 
> It seems TASK_SIZE_MAX is defined only on x86 and s390, what about
> comparing STACK_TOP_MAX with PAGE_OFFSET ?
> Anyway, I agree that the best way is introducing a CONFIG.

Agree, CONFIG knob that archs can select feels cleanest. Fwiw, I've cooked
up fixes for bpf side locally here and finishing up testing, will push out
later today.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 14/18] maccess: allow architectures to provide kernel probing directly
  2020-05-13 16:00 ` [PATCH 14/18] maccess: allow architectures to provide kernel probing directly Christoph Hellwig
  2020-05-13 19:36   ` Linus Torvalds
@ 2020-05-16  3:42   ` Masami Hiramatsu
  2020-05-18 15:09     ` Christoph Hellwig
  1 sibling, 1 reply; 46+ messages in thread
From: Masami Hiramatsu @ 2020-05-16  3:42 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: x86, Alexei Starovoitov, Daniel Borkmann, Linus Torvalds,
	Andrew Morton, linux-parisc, linux-um, netdev, bpf, linux-mm,
	linux-kernel

Hi Christoph,

On Wed, 13 May 2020 18:00:34 +0200
Christoph Hellwig <hch@lst.de> wrote:

> Provide alternative versions of probe_kernel_read, probe_kernel_write
> and strncpy_from_kernel_unsafe that don't need set_fs magic, but instead
> use arch hooks that are modelled after unsafe_{get,put}_user to access
> kernel memory in an exception safe way.

This patch seems to introduce new implementation of probe_kernel_read/write()
and strncpy_from_kernel_unsafe(), but also drops copy_from/to_kernel_nofault()
and strncpy_from_kernel_nofault() if HAVE_ARCH_PROBE_KERNEL is defined.
In the result, this cause a link error with BPF and kprobe events.

BTW, what is the difference of *_unsafe() and *_nofault()?
(maybe we make those to *_nofault() finally?)

Thank you,

> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  mm/maccess.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 76 insertions(+)
> 
> diff --git a/mm/maccess.c b/mm/maccess.c
> index 9773e2253b495..e9efe2f98e34a 100644
> --- a/mm/maccess.c
> +++ b/mm/maccess.c
> @@ -12,6 +12,81 @@ bool __weak probe_kernel_read_allowed(void *dst, const void *unsafe_src,
>  	return true;
>  }
>  
> +#ifdef HAVE_ARCH_PROBE_KERNEL
> +
> +#define probe_kernel_read_loop(dst, src, len, type, err_label)		\
> +	while (len >= sizeof(type)) {					\
> +		arch_kernel_read(dst, src, type, err_label);		\
> +		dst += sizeof(type);					\
> +		src += sizeof(type);					\
> +		len -= sizeof(type);					\
> +	}
> +
> +long probe_kernel_read(void *dst, const void *src, size_t size)
> +{
> +	if (!probe_kernel_read_allowed(dst, src, size))
> +		return -EFAULT;
> +
> +	pagefault_disable();
> +	probe_kernel_read_loop(dst, src, size, u64, Efault);
> +	probe_kernel_read_loop(dst, src, size, u32, Efault);
> +	probe_kernel_read_loop(dst, src, size, u16, Efault);
> +	probe_kernel_read_loop(dst, src, size, u8, Efault);
> +	pagefault_enable();
> +	return 0;
> +Efault:
> +	pagefault_enable();
> +	return -EFAULT;
> +}
> +EXPORT_SYMBOL_GPL(probe_kernel_read);
> +
> +#define probe_kernel_write_loop(dst, src, len, type, err_label)		\
> +	while (len >= sizeof(type)) {					\
> +		arch_kernel_write(dst, src, type, err_label);		\
> +		dst += sizeof(type);					\
> +		src += sizeof(type);					\
> +		len -= sizeof(type);					\
> +	}
> +
> +long probe_kernel_write(void *dst, const void *src, size_t size)
> +{
> +	pagefault_disable();
> +	probe_kernel_write_loop(dst, src, size, u64, Efault);
> +	probe_kernel_write_loop(dst, src, size, u32, Efault);
> +	probe_kernel_write_loop(dst, src, size, u16, Efault);
> +	probe_kernel_write_loop(dst, src, size, u8, Efault);
> +	pagefault_enable();
> +	return 0;
> +Efault:
> +	pagefault_enable();
> +	return -EFAULT;
> +}
> +
> +long strncpy_from_kernel_unsafe(char *dst, const void *unsafe_addr, long count)
> +{
> +	const void *src = unsafe_addr;
> +
> +	if (unlikely(count <= 0))
> +		return 0;
> +	if (!probe_kernel_read_allowed(dst, unsafe_addr, count))
> +		return -EFAULT;
> +
> +	pagefault_disable();
> +	do {
> +		arch_kernel_read(dst, src, u8, Efault);
> +		dst++;
> +		src++;
> +	} while (dst[-1] && src - unsafe_addr < count);
> +	pagefault_enable();
> +
> +	dst[-1] = '\0';
> +	return src - unsafe_addr;
> +Efault:
> +	pagefault_enable();
> +	dst[-1] = '\0';
> +	return -EFAULT;
> +}
> +#else /* HAVE_ARCH_PROBE_KERNEL */
>  /**
>   * probe_kernel_read(): safely attempt to read from kernel-space
>   * @dst: pointer to the buffer that shall take the data
> @@ -114,6 +189,7 @@ long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr, long count)
>  
>  	return ret ? -EFAULT : src - unsafe_addr;
>  }
> +#endif /* HAVE_ARCH_PROBE_KERNEL */
>  
>  /**
>   * probe_user_read(): safely attempt to read from a user-space location
> -- 
> 2.26.2
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 14/18] maccess: allow architectures to provide kernel probing directly
  2020-05-16  3:42   ` Masami Hiramatsu
@ 2020-05-18 15:09     ` Christoph Hellwig
  0 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2020-05-18 15:09 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Christoph Hellwig, x86, Alexei Starovoitov, Daniel Borkmann,
	Linus Torvalds, Andrew Morton, linux-parisc, linux-um, netdev,
	bpf, linux-mm, linux-kernel

On Sat, May 16, 2020 at 12:42:59PM +0900, Masami Hiramatsu wrote:
> > Provide alternative versions of probe_kernel_read, probe_kernel_write
> > and strncpy_from_kernel_unsafe that don't need set_fs magic, but instead
> > use arch hooks that are modelled after unsafe_{get,put}_user to access
> > kernel memory in an exception safe way.
> 
> This patch seems to introduce new implementation of probe_kernel_read/write()
> and strncpy_from_kernel_unsafe(), but also drops copy_from/to_kernel_nofault()
> and strncpy_from_kernel_nofault() if HAVE_ARCH_PROBE_KERNEL is defined.
> In the result, this cause a link error with BPF and kprobe events.

That was just a bug as I didn't commit the changes to switch everything
to _nofault and remove _unsafe entirely, sorry.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 10/18] maccess: unify the probe kernel arch hooks
  2020-05-14  1:13   ` Masami Hiramatsu
@ 2020-05-19  5:46     ` Christoph Hellwig
  0 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2020-05-19  5:46 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Christoph Hellwig, x86, Alexei Starovoitov, Daniel Borkmann,
	Linus Torvalds, Andrew Morton, linux-parisc, linux-um, netdev,
	bpf, linux-mm, linux-kernel

On Thu, May 14, 2020 at 10:13:18AM +0900, Masami Hiramatsu wrote:
> > +		bool strict)
> >  {
> >  	long ret;
> >  	mm_segment_t old_fs = get_fs();
> >  
> > +	if (!probe_kernel_read_allowed(dst, src, size, strict))
> > +		return -EFAULT;
> 
> Could you make this return -ERANGE instead of -EFAULT so that
> the caller can notice that the address might be user space?

That is clearly a behavior change, so I don't want to mix it into
this patch.  But I can add it as another patch at the end.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: clean up and streamline probe_kernel_* and friends v2
  2020-05-13 23:04 ` Daniel Borkmann
  2020-05-13 23:20   ` Linus Torvalds
@ 2020-05-19  5:50   ` Christoph Hellwig
  1 sibling, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2020-05-19  5:50 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Christoph Hellwig, x86, Alexei Starovoitov, Masami Hiramatsu,
	Linus Torvalds, Andrew Morton, linux-parisc, linux-um, netdev,
	bpf, linux-mm, linux-kernel

On Thu, May 14, 2020 at 01:04:38AM +0200, Daniel Borkmann wrote:
> Aside from comments on list, the series looks reasonable to me. For BPF
> the bpf_probe_read() helper would be slightly penalized for probing user
> memory given we now test on copy_from_kernel_nofault() first and if that
> fails only then fall back to copy_from_user_nofault(), but it seems
> small enough that it shouldn't matter too much and aside from that we have
> the newer bpf_probe_read_kernel() and bpf_probe_read_user() anyway that
> BPF progs should use instead, so I think it's okay.
>
> For patch 14 and patch 15, do you roughly know the performance gain with
> the new probe_kernel_read_loop() + arch_kernel_read() approach?

I don't think there should be any measurable difference in performance
for typical use cases.  We'll save the stac/clac pair, but that's it.
The real eason is to avoid that stac/clac pair that opens up a window
for explots, and as a significant enabler for killing of set_fs based 
address limit overrides entirely.

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, back to index

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-13 16:00 clean up and streamline probe_kernel_* and friends v2 Christoph Hellwig
2020-05-13 16:00 ` [PATCH 01/18] maccess: unexport probe_kernel_write and probe_user_write Christoph Hellwig
2020-05-13 16:00 ` [PATCH 02/18] maccess: remove various unused weak aliases Christoph Hellwig
2020-05-13 16:00 ` [PATCH 03/18] maccess: remove duplicate kerneldoc comments Christoph Hellwig
2020-05-13 16:00 ` [PATCH 04/18] maccess: clarify " Christoph Hellwig
2020-05-13 16:00 ` [PATCH 05/18] maccess: update the top of file comment Christoph Hellwig
2020-05-13 16:00 ` [PATCH 06/18] maccess: rename strncpy_from_unsafe_user to strncpy_from_user_nofault Christoph Hellwig
2020-05-13 16:00 ` [PATCH 07/18] maccess: rename strncpy_from_unsafe_strict to strncpy_from_kernel_nofault Christoph Hellwig
2020-05-13 16:00 ` [PATCH 08/18] maccess: rename strnlen_unsafe_user to strnlen_user_nofault Christoph Hellwig
2020-05-13 16:00 ` [PATCH 09/18] maccess: remove probe_read_common and probe_write_common Christoph Hellwig
2020-05-13 16:00 ` [PATCH 10/18] maccess: unify the probe kernel arch hooks Christoph Hellwig
2020-05-14  1:13   ` Masami Hiramatsu
2020-05-19  5:46     ` Christoph Hellwig
2020-05-13 16:00 ` [PATCH 11/18] maccess: remove strncpy_from_unsafe Christoph Hellwig
2020-05-13 19:11   ` Linus Torvalds
2020-05-13 19:28     ` Christoph Hellwig
2020-05-13 22:36       ` Daniel Borkmann
2020-05-13 23:03         ` Linus Torvalds
2020-05-13 23:24           ` Daniel Borkmann
2020-05-13 23:20         ` Masami Hiramatsu
2020-05-13 23:59           ` Linus Torvalds
2020-05-14  1:00             ` Masami Hiramatsu
2020-05-14  2:43               ` Linus Torvalds
2020-05-14  9:44                 ` Masami Hiramatsu
2020-05-14 10:27                   ` Daniel Borkmann
2020-05-13 23:28         ` Al Viro
2020-05-13 23:58           ` Daniel Borkmann
2020-05-14 10:01             ` David Laight
2020-05-14 10:21               ` Daniel Borkmann
2020-05-13 16:00 ` [PATCH 12/18] maccess: always use strict semantics for probe_kernel_read Christoph Hellwig
2020-05-13 16:00 ` [PATCH 13/18] maccess: move user access routines together Christoph Hellwig
2020-05-13 16:00 ` [PATCH 14/18] maccess: allow architectures to provide kernel probing directly Christoph Hellwig
2020-05-13 19:36   ` Linus Torvalds
2020-05-13 19:40     ` Christoph Hellwig
2020-05-13 19:48       ` Linus Torvalds
2020-05-13 19:54         ` Christoph Hellwig
2020-05-16  3:42   ` Masami Hiramatsu
2020-05-18 15:09     ` Christoph Hellwig
2020-05-13 16:00 ` [PATCH 15/18] x86: use non-set_fs based maccess routines Christoph Hellwig
2020-05-13 16:00 ` [PATCH 16/18] maccess: rename probe_kernel_{read,write} to copy_{from,to}_kernel_nofault Christoph Hellwig
2020-05-13 16:00 ` [PATCH 17/18] maccess: rename probe_user_{read,write} to copy_{from,to}_user_nofault Christoph Hellwig
2020-05-13 16:00 ` [PATCH 18/18] maccess: rename probe_kernel_address to get_kernel_nofault Christoph Hellwig
2020-05-13 19:37 ` clean up and streamline probe_kernel_* and friends v2 Linus Torvalds
2020-05-13 23:04 ` Daniel Borkmann
2020-05-13 23:20   ` Linus Torvalds
2020-05-19  5:50   ` Christoph Hellwig

Linux-parisc archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-parisc/0 linux-parisc/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-parisc linux-parisc/ https://lore.kernel.org/linux-parisc \
		linux-parisc@vger.kernel.org
	public-inbox-index linux-parisc

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-parisc


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git