linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/4] tracing/probes: uaccess: Add support user-space access
@ 2019-02-25 14:04 Masami Hiramatsu
  2019-02-25 14:05 ` [RFC PATCH 1/4] uaccess: Make sure kernel_uaccess_faults_ok is updated before pagefault Masami Hiramatsu
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Masami Hiramatsu @ 2019-02-25 14:04 UTC (permalink / raw)
  To: Steven Rostedt, Linus Torvalds
  Cc: mhiramat, linux-kernel, Andy Lutomirski, Ingo Molnar,
	Andrew Morton, Changbin Du, Jann Horn, Kees Cook,
	Andy Lutomirski, Alexei Starovoitov, Nadav Amit, Peter Zijlstra

Hi,

Here is an RFC series of probe-event to support user-space access
methods, which we discussed in previous thread.

https://lkml.kernel.org/r/20190220171019.5e81a4946b56982f324f7c45@kernel.org

So in this thread, it is clear that current probe_kernel_read()
and strncpy_from_unsafe() are not enough to access user-space
variables from kprobe events on some arch. On such arch, user
address space and kernel address space can overlap so we have
to change the memory segment to user-mode before copying.
But probe_kernel_read() is designed to access primarily kernel
memory, it may fail to get, or get unexpected value on such
arch. So we need to expand kprobe fetcharg to support new options
for such case.

For user-space access extension, this series adds 2 features,
"ustring" type and user-space dereference syntax. "ustring" is
used for recording a null-terminated string in user-space from
kprobe events.

"ustring" type is easy, it is able to use instead of "string"
type, so if you want to record a user-space string via
"__user char *", you can use ustring type instead of string.
For example,

echo 'p do_sys_open path=+0($arg2):ustring' >> kprobe_events

will record the path string from user-space.

The user-space dereference syntax is also simple. Thi just
adds 'u' prefix before an offset value.

   +|-u<OFFSET>(<FETCHARG>)

e.g. +u8(%ax), +u0(+0(%si))

This is more generic. If you want to refer the variable in user-
space from its address or access a field in data structure in
user-space, you need to use this.

For example, if you probe do_sched_setscheduler(pid, policy,
param) and record param->sched_priority, you can add new
probe as below;
    
   p do_sched_setscheduler priority=+u0($arg3)

Actually, with this feature, "ustring" type is not absolutely
necessary, because these are same meanings.

  +0($arg2):ustring == +u0($arg2):string

Perhups, we may be better removing "ustring" and just introducing
this user-space dereference syntax...

Note that kprobe event provides these methods, but it doesn't
change it from kernel to user automatically because we do not
know whether the given address is in userspace or kernel on
some arch.
Moreover, from perf-probe, at this moment it is not able to
switch. Since __user is not for compiler but checker, we have
no clue which data structure is in user-space, in debuginfo.

BTW, according to Linus's comment, I implemented probe_user_read()
and strncpy_from_unsafe_user() APIs. And since those use
"access_ok()" inside it, if CONFIG_DEBUG_ATOMIC_SLEEP=y on x86,
it will get a warn message at once. It should be solved before
merging this series.

Thank you,

---

Masami Hiramatsu (4):
      uaccess: Make sure kernel_uaccess_faults_ok is updated before pagefault
      uaccess: Add non-pagefault user-space read functions
      tracing/probe: Add ustring type for user-space string
      tracing/probe: Support user-space dereference


 Documentation/trace/kprobetrace.rst  |   13 ++-
 Documentation/trace/uprobetracer.rst |    9 +-
 fs/namespace.c                       |    2 
 include/linux/uaccess.h              |   13 +++
 kernel/trace/trace.c                 |    7 +-
 kernel/trace/trace_kprobe.c          |   65 ++++++++++++++++
 kernel/trace/trace_probe.c           |   39 ++++++++--
 kernel/trace/trace_probe.h           |    3 +
 kernel/trace/trace_probe_tmpl.h      |   36 +++++++--
 kernel/trace/trace_uprobe.c          |   19 +++++
 mm/maccess.c                         |  138 ++++++++++++++++++++++++++++++----
 11 files changed, 302 insertions(+), 42 deletions(-)

--
Masami Hiramatsu (Linaro) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [RFC PATCH 1/4] uaccess: Make sure kernel_uaccess_faults_ok is updated before pagefault
  2019-02-25 14:04 [RFC PATCH 0/4] tracing/probes: uaccess: Add support user-space access Masami Hiramatsu
@ 2019-02-25 14:05 ` Masami Hiramatsu
  2019-02-25 14:05 ` [RFC PATCH 2/4] uaccess: Add non-pagefault user-space read functions Masami Hiramatsu
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 19+ messages in thread
From: Masami Hiramatsu @ 2019-02-25 14:05 UTC (permalink / raw)
  To: Steven Rostedt, Linus Torvalds
  Cc: mhiramat, linux-kernel, Andy Lutomirski, Ingo Molnar,
	Andrew Morton, Changbin Du, Jann Horn, Kees Cook,
	Andy Lutomirski, Alexei Starovoitov, Nadav Amit, Peter Zijlstra

Make sure the kernel_uaccess_faults_ok is updated before a
pagefault can hit and the last loads/stores issued before
resotring kernel_uaccess_faults_ok.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 fs/namespace.c |    2 ++
 mm/maccess.c   |   12 ++++++------
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index a677b59efd74..88a8e0eb6234 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2699,6 +2699,7 @@ static long exact_copy_from_user(void *to, const void __user * from,
 		return n;
 
 	current->kernel_uaccess_faults_ok++;
+	barrier();
 	while (n) {
 		if (__get_user(c, f)) {
 			memset(t, 0, n);
@@ -2708,6 +2709,7 @@ static long exact_copy_from_user(void *to, const void __user * from,
 		f++;
 		n--;
 	}
+	barrier();
 	current->kernel_uaccess_faults_ok--;
 	return n;
 }
diff --git a/mm/maccess.c b/mm/maccess.c
index f3416632e5a4..7a9752d59587 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -29,12 +29,12 @@ long __probe_kernel_read(void *dst, const void *src, size_t size)
 	mm_segment_t old_fs = get_fs();
 
 	set_fs(KERNEL_DS);
-	pagefault_disable();
 	current->kernel_uaccess_faults_ok++;
+	pagefault_disable();
 	ret = __copy_from_user_inatomic(dst,
 			(__force const void __user *)src, size);
-	current->kernel_uaccess_faults_ok--;
 	pagefault_enable();
+	current->kernel_uaccess_faults_ok--;
 	set_fs(old_fs);
 
 	return ret ? -EFAULT : 0;
@@ -59,11 +59,11 @@ long __probe_kernel_write(void *dst, const void *src, size_t size)
 	mm_segment_t old_fs = get_fs();
 
 	set_fs(KERNEL_DS);
-	pagefault_disable();
 	current->kernel_uaccess_faults_ok++;
+	pagefault_disable();
 	ret = __copy_to_user_inatomic((__force void __user *)dst, src, size);
-	current->kernel_uaccess_faults_ok--;
 	pagefault_enable();
+	current->kernel_uaccess_faults_ok--;
 	set_fs(old_fs);
 
 	return ret ? -EFAULT : 0;
@@ -97,16 +97,16 @@ long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
 		return 0;
 
 	set_fs(KERNEL_DS);
-	pagefault_disable();
 	current->kernel_uaccess_faults_ok++;
+	pagefault_disable();
 
 	do {
 		ret = __get_user(*dst++, (const char __user __force *)src++);
 	} while (dst[-1] && ret == 0 && src - unsafe_addr < count);
 
-	current->kernel_uaccess_faults_ok--;
 	dst[-1] = '\0';
 	pagefault_enable();
+	current->kernel_uaccess_faults_ok--;
 	set_fs(old_fs);
 
 	return ret ? -EFAULT : src - unsafe_addr;


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC PATCH 2/4] uaccess: Add non-pagefault user-space read functions
  2019-02-25 14:04 [RFC PATCH 0/4] tracing/probes: uaccess: Add support user-space access Masami Hiramatsu
  2019-02-25 14:05 ` [RFC PATCH 1/4] uaccess: Make sure kernel_uaccess_faults_ok is updated before pagefault Masami Hiramatsu
@ 2019-02-25 14:05 ` Masami Hiramatsu
  2019-02-25 15:06   ` Peter Zijlstra
  2019-02-25 17:06   ` Kees Cook
  2019-02-25 14:06 ` [RFC PATCH 3/4] tracing/probe: Add ustring type for user-space string Masami Hiramatsu
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 19+ messages in thread
From: Masami Hiramatsu @ 2019-02-25 14:05 UTC (permalink / raw)
  To: Steven Rostedt, Linus Torvalds
  Cc: mhiramat, linux-kernel, Andy Lutomirski, Ingo Molnar,
	Andrew Morton, Changbin Du, Jann Horn, Kees Cook,
	Andy Lutomirski, Alexei Starovoitov, Nadav Amit, Peter Zijlstra

Add probe_user_read() and strncpy_from_unsafe_user() which
will not involves mm_sem so we can use it for accessing
user-space in irq-handler.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 include/linux/uaccess.h |   13 +++++
 mm/maccess.c            |  134 +++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 131 insertions(+), 16 deletions(-)

diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 37b226e8df13..906573b8f02c 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -240,6 +240,17 @@ extern long probe_kernel_read(void *dst, const void *src, size_t size);
 extern long __probe_kernel_read(void *dst, const void *src, size_t size);
 
 /*
+ * probe_user_read(): safely attempt to read from a location in user space
+ * @dst: pointer to the buffer that shall take the data
+ * @src: address to read from
+ * @size: size of the data chunk
+ *
+ * Safely read from address @src to the buffer at @dst.  If a kernel fault
+ * happens, handle that and return -EFAULT.
+ */
+extern long probe_user_read(void *dst, const void __user *src, size_t size);
+
+/*
  * probe_kernel_write(): safely attempt to write to a location
  * @dst: address to write to
  * @src: pointer to the data that shall be written
@@ -252,6 +263,8 @@ extern long notrace probe_kernel_write(void *dst, const void *src, size_t size);
 extern long notrace __probe_kernel_write(void *dst, const void *src, size_t size);
 
 extern long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
+extern long strncpy_from_unsafe_user(char *dst, const void __user *unsafe_addr,
+				     long count);
 
 /**
  * probe_kernel_address(): safely attempt to read from a location
diff --git a/mm/maccess.c b/mm/maccess.c
index 7a9752d59587..7d38d783e0b1 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -5,8 +5,20 @@
 #include <linux/mm.h>
 #include <linux/uaccess.h>
 
+static __always_inline long
+probe_read_common(void *dst, const void __user *src, size_t size)
+{
+	long ret;
+
+	pagefault_disable();
+	ret = __copy_from_user_inatomic(dst, src, size);
+	pagefault_enable();
+
+	return ret ? -EFAULT : 0;
+}
+
 /**
- * probe_kernel_read(): safely attempt to read from a location
+ * probe_kernel_read(): safely attempt to read from a kernel-space location
  * @dst: pointer to the buffer that shall take the data
  * @src: address to read from
  * @size: size of the data chunk
@@ -30,18 +42,53 @@ long __probe_kernel_read(void *dst, const void *src, size_t size)
 
 	set_fs(KERNEL_DS);
 	current->kernel_uaccess_faults_ok++;
-	pagefault_disable();
-	ret = __copy_from_user_inatomic(dst,
-			(__force const void __user *)src, size);
-	pagefault_enable();
+	ret = probe_read_common(dst, (__force const void __user *)src, size);
 	current->kernel_uaccess_faults_ok--;
 	set_fs(old_fs);
 
-	return ret ? -EFAULT : 0;
+	return ret;
 }
 EXPORT_SYMBOL_GPL(probe_kernel_read);
 
 /**
+ * probe_user_read(): safely attempt to read from a user-space location
+ * @dst: pointer to the buffer that shall take the data
+ * @src: address to read from. This must be a user address.
+ * @size: size of the data chunk
+ *
+ * Safely read from user address @src to the buffer at @dst. If a kernel fault
+ * happens, handle that and return -EFAULT.
+ */
+
+long __weak probe_user_read(void *dst, const void __user *src, size_t size)
+    __attribute__((alias("__probe_user_read")));
+
+static long __normal_probe_user_read(void *dst, const void __user *src,
+				     size_t size)
+{
+	if (!access_ok(src, size))
+		return -EFAULT;
+
+	return probe_read_common(dst, src, size);
+}
+
+long __probe_user_read(void *dst, const void __user *src, size_t size)
+{
+	long ret;
+	mm_segment_t old_fs = get_fs();
+
+	if (segment_eq(old_fs, USER_DS)) {
+		ret = __normal_probe_user_read(dst, src, size);
+	} else {
+		set_fs(USER_DS);
+		ret = __normal_probe_user_read(dst, src, size);
+		set_fs(old_fs);
+	}
+	return ret ? -EFAULT : 0;
+}
+EXPORT_SYMBOL_GPL(probe_user_read);
+
+/**
  * probe_kernel_write(): safely attempt to write to a location
  * @dst: address to write to
  * @src: pointer to the data that shall be written
@@ -70,6 +117,22 @@ long __probe_kernel_write(void *dst, const void *src, size_t size)
 }
 EXPORT_SYMBOL_GPL(probe_kernel_write);
 
+
+static __always_inline long strncpy_from_unsafe_common(char *dst,
+				const char __user *unsafe_addr, long count)
+{
+	const char __user *src = unsafe_addr;
+	int ret;
+
+	pagefault_disable();
+	do {
+		ret = __get_user(*dst++, src++);
+	} while (dst[-1] && ret == 0 && src - unsafe_addr < count);
+	dst[-1] = '\0';
+	pagefault_enable();
+
+	return ret ? -EFAULT : src - unsafe_addr;
+}
 /**
  * strncpy_from_unsafe: - Copy a NUL terminated string from unsafe address.
  * @dst:   Destination address, in kernel space.  This buffer must be at
@@ -90,7 +153,6 @@ EXPORT_SYMBOL_GPL(probe_kernel_write);
 long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
 {
 	mm_segment_t old_fs = get_fs();
-	const void *src = unsafe_addr;
 	long ret;
 
 	if (unlikely(count <= 0))
@@ -98,16 +160,56 @@ long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
 
 	set_fs(KERNEL_DS);
 	current->kernel_uaccess_faults_ok++;
-	pagefault_disable();
-
-	do {
-		ret = __get_user(*dst++, (const char __user __force *)src++);
-	} while (dst[-1] && ret == 0 && src - unsafe_addr < count);
-
-	dst[-1] = '\0';
-	pagefault_enable();
+	ret = strncpy_from_unsafe_common(dst,
+			(const char __user __force *)unsafe_addr, count);
 	current->kernel_uaccess_faults_ok--;
 	set_fs(old_fs);
 
-	return ret ? -EFAULT : src - unsafe_addr;
+	return ret;
+}
+
+static __always_inline long __strncpy_from_unsafe_user(char *dst,
+			 const char __user *unsafe_addr, long count)
+{
+	if (!access_ok(unsafe_addr, count))
+		return -EFAULT;
+
+	return strncpy_from_unsafe_common(dst, unsafe_addr, count);
+}
+
+/**
+ * strncpy_from_unsafe_user: - Copy a NUL terminated string from unsafe user
+ *				address.
+ * @dst:   Destination address, in kernel space.  This buffer must be at
+ *         least @count bytes long.
+ * @unsafe_addr: Unsafe user address.
+ * @count: Maximum number of bytes to copy, including the trailing NUL.
+ *
+ * Copies a NUL-terminated string from unsafe user address to kernel buffer.
+ *
+ * On success, returns the length of the string INCLUDING the trailing NUL.
+ *
+ * If access fails, returns -EFAULT (some data may have been copied
+ * and the trailing NUL added).
+ *
+ * If @count is smaller than the length of the string, copies @count-1 bytes,
+ * sets the last byte of @dst buffer to NUL and returns @count.
+ */
+long strncpy_from_unsafe_user(char *dst, const void __user *unsafe_addr,
+			      long count)
+{
+	mm_segment_t old_fs = get_fs();
+	long ret;
+
+	if (unlikely(count <= 0))
+		return 0;
+
+	if (segment_eq(old_fs, USER_DS)) {
+		ret = __strncpy_from_unsafe_user(dst, unsafe_addr, count);
+	} else {
+		set_fs(USER_DS);
+		ret = __strncpy_from_unsafe_user(dst, unsafe_addr, count);
+		set_fs(old_fs);
+	}
+	return ret;
 }


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC PATCH 3/4] tracing/probe: Add ustring type for user-space string
  2019-02-25 14:04 [RFC PATCH 0/4] tracing/probes: uaccess: Add support user-space access Masami Hiramatsu
  2019-02-25 14:05 ` [RFC PATCH 1/4] uaccess: Make sure kernel_uaccess_faults_ok is updated before pagefault Masami Hiramatsu
  2019-02-25 14:05 ` [RFC PATCH 2/4] uaccess: Add non-pagefault user-space read functions Masami Hiramatsu
@ 2019-02-25 14:06 ` Masami Hiramatsu
  2019-02-25 14:06 ` [RFC PATCH 4/4] tracing/probe: Support user-space dereference Masami Hiramatsu
  2019-02-26 21:38 ` [RFC PATCH 0/4] tracing/probes: uaccess: Add support user-space access Joel Fernandes
  4 siblings, 0 replies; 19+ messages in thread
From: Masami Hiramatsu @ 2019-02-25 14:06 UTC (permalink / raw)
  To: Steven Rostedt, Linus Torvalds
  Cc: mhiramat, linux-kernel, Andy Lutomirski, Ingo Molnar,
	Andrew Morton, Changbin Du, Jann Horn, Kees Cook,
	Andy Lutomirski, Alexei Starovoitov, Nadav Amit, Peter Zijlstra

Add "ustring" type for fetching user-space string from kprobe event.
User can specify ustring type at uprobe event, and it is same as
"string" for uprobe.

Note that probe-event provides this option but it doesn't choose the
correct type automatically since we have not way to decide the address
is in user-space or not on some arch (and on some other arch, you can
fetch the string by "string" type). So user must carefully check the
target code (e.g. if you see __user on the target variable) and
use this new type.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 Documentation/trace/kprobetrace.rst |    9 ++++-
 kernel/trace/trace.c                |    2 +
 kernel/trace/trace_kprobe.c         |   59 +++++++++++++++++++++++++++++++++++
 kernel/trace/trace_probe.c          |   14 +++++++-
 kernel/trace/trace_probe.h          |    1 +
 kernel/trace/trace_probe_tmpl.h     |   14 ++++++++
 kernel/trace/trace_uprobe.c         |   12 +++++++
 7 files changed, 104 insertions(+), 7 deletions(-)

diff --git a/Documentation/trace/kprobetrace.rst b/Documentation/trace/kprobetrace.rst
index 235ce2ab131a..a3ac7c9ac242 100644
--- a/Documentation/trace/kprobetrace.rst
+++ b/Documentation/trace/kprobetrace.rst
@@ -55,7 +55,8 @@ Synopsis of kprobe_events
   NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
   FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
 		  (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types
-		  (x8/x16/x32/x64), "string" and bitfield are supported.
+		  (x8/x16/x32/x64), "string", "ustring" and bitfield
+		  are supported.
 
   (\*1) only for the probe on function entry (offs == 0).
   (\*2) only for return probe.
@@ -77,7 +78,11 @@ apply it to registers/stack-entries etc. (for example, '$stack1:x8[8]' is
 wrong, but '+8($stack):x8[8]' is OK.)
 String type is a special type, which fetches a "null-terminated" string from
 kernel space. This means it will fail and store NULL if the string container
-has been paged out.
+has been paged out. "ustring" type is an alternative of string for user-space.
+Note that kprobe-event provides string/ustring types, but doesn't change it
+automatically. So user has to decide if the targe string in kernel or in user
+space carefully. On some arch, if you choose wrong one, it always fails to
+record string data.
 The string array type is a bit different from other types. For other base
 types, <base-type>[1] is equal to <base-type> (e.g. +0(%di):x32[1] is same
 as +0(%di):x32.) But string[1] is not equal to string. The string type itself
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index c4238b441624..4cacbb0e1538 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4643,7 +4643,7 @@ static const char readme_msg[] =
 	"\t           $stack<index>, $stack, $retval, $comm\n"
 #endif
 	"\t     type: s8/16/32/64, u8/16/32/64, x8/16/32/64, string, symbol,\n"
-	"\t           b<bit-width>@<bit-offset>/<container-size>,\n"
+	"\t           b<bit-width>@<bit-offset>/<container-size>, ustring,\n"
 	"\t           <type>\\[<array-size>\\]\n"
 #ifdef CONFIG_HIST_TRIGGERS
 	"\t    field: <stype> <name>;\n"
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 9eaf07f99212..d50d937b6933 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -872,6 +872,44 @@ fetch_store_strlen(unsigned long addr)
 	return (ret < 0) ? ret : len;
 }
 
+static nokprobe_inline int __fetch_strlen_user(const char __user *addr)
+{
+	int ret, len = 0;
+	u8 c;
+
+	if (!access_ok(addr, MAX_STRING_SIZE))
+		return -EFAULT;
+
+	pagefault_disable();
+
+	do {
+		ret = __get_user(c, addr + len);
+		len++;
+	} while (c && ret == 0 && len < MAX_STRING_SIZE);
+
+	pagefault_enable();
+
+	return (ret < 0) ? ret : len;
+}
+
+/* Return the length of string -- including null terminal byte */
+static nokprobe_inline int
+fetch_store_strlen_user(unsigned long addr)
+{
+	mm_segment_t old_fs = get_fs();
+	int ret;
+
+	if (segment_eq(old_fs, USER_DS)) {
+		ret = __fetch_strlen_user((__force const char __user *)addr);
+	} else {
+		set_fs(USER_DS);
+		ret = __fetch_strlen_user((__force const char __user *)addr);
+		set_fs(old_fs);
+	}
+
+	return ret;
+}
+
 /*
  * Fetch a null-terminated string. Caller MUST set *(u32 *)buf with max
  * length and relative data location.
@@ -896,6 +934,27 @@ fetch_store_string(unsigned long addr, void *dest, void *base)
 	return ret;
 }
 
+/*
+ * Fetch a null-terminated string from user. Caller MUST set *(u32 *)buf with max
+ * length and relative data location.
+ */
+static nokprobe_inline int
+fetch_store_string_user(unsigned long addr, void *dest, void *base)
+{
+	int maxlen = get_loc_len(*(u32 *)dest);
+	u8 *dst = get_loc_data(dest, base);
+	long ret;
+
+	if (unlikely(!maxlen))
+		return -ENOMEM;
+	ret = strncpy_from_unsafe_user(dst, (__force const void __user *)addr,
+				       maxlen);
+
+	if (ret >= 0)
+		*(u32 *)dest = make_data_loc(ret, (void *)dst - base);
+	return ret;
+}
+
 static nokprobe_inline int
 probe_mem_read(void *dest, void *src, size_t size)
 {
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 9962cb5da8ac..a7012de37a00 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -73,6 +73,8 @@ static const struct fetch_type probe_fetch_types[] = {
 	/* Special types */
 	__ASSIGN_FETCH_TYPE("string", string, string, sizeof(u32), 1,
 			    "__data_loc char[]"),
+	__ASSIGN_FETCH_TYPE("ustring", string, string, sizeof(u32), 1,
+			    "__data_loc char[]"),
 	/* Basic types */
 	ASSIGN_FETCH_TYPE(u8,  u8,  0),
 	ASSIGN_FETCH_TYPE(u16, u16, 0),
@@ -440,7 +442,8 @@ static int traceprobe_parse_probe_arg_body(char *arg, ssize_t *size,
 		goto fail;
 
 	/* Store operation */
-	if (!strcmp(parg->type->name, "string")) {
+	if (!strcmp(parg->type->name, "string") ||
+	    !strcmp(parg->type->name, "ustring")) {
 		if (code->op != FETCH_OP_DEREF && code->op != FETCH_OP_IMM &&
 		    code->op != FETCH_OP_COMM) {
 			pr_info("string only accepts memory or address.\n");
@@ -459,7 +462,11 @@ static int traceprobe_parse_probe_arg_body(char *arg, ssize_t *size,
 				goto fail;
 			}
 		}
-		code->op = FETCH_OP_ST_STRING;	/* In DEREF case, replace it */
+		/* If op == DEREF, replace it with STRING */
+		if (!strcmp(parg->type->name, "ustring"))
+			code->op = FETCH_OP_ST_USTRING;
+		else
+			code->op = FETCH_OP_ST_STRING;
 		code->size = parg->type->size;
 		parg->dynamic = true;
 	} else if (code->op == FETCH_OP_DEREF) {
@@ -484,7 +491,8 @@ static int traceprobe_parse_probe_arg_body(char *arg, ssize_t *size,
 	/* Loop(Array) operation */
 	if (parg->count) {
 		if (scode->op != FETCH_OP_ST_MEM &&
-		    scode->op != FETCH_OP_ST_STRING) {
+		    scode->op != FETCH_OP_ST_STRING &&
+		    scode->op != FETCH_OP_ST_USTRING) {
 			pr_info("array only accepts memory or address\n");
 			ret = -EINVAL;
 			goto fail;
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 8a63f8bc01bc..cf4ba8bbb841 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -95,6 +95,7 @@ enum fetch_op {
 	FETCH_OP_ST_RAW,	/* Raw: .size */
 	FETCH_OP_ST_MEM,	/* Mem: .offset, .size */
 	FETCH_OP_ST_STRING,	/* String: .offset, .size */
+	FETCH_OP_ST_USTRING,	/* User String: .offset, .size */
 	// Stage 4 (modify) op
 	FETCH_OP_MOD_BF,	/* Bitfield: .basesize, .lshift, .rshift */
 	// Stage 5 (loop) op
diff --git a/kernel/trace/trace_probe_tmpl.h b/kernel/trace/trace_probe_tmpl.h
index 4737bb8c07a3..7526f6f8d7b0 100644
--- a/kernel/trace/trace_probe_tmpl.h
+++ b/kernel/trace/trace_probe_tmpl.h
@@ -59,6 +59,9 @@ process_fetch_insn(struct fetch_insn *code, struct pt_regs *regs,
 static nokprobe_inline int fetch_store_strlen(unsigned long addr);
 static nokprobe_inline int
 fetch_store_string(unsigned long addr, void *dest, void *base);
+static nokprobe_inline int fetch_store_strlen_user(unsigned long addr);
+static nokprobe_inline int
+fetch_store_string_user(unsigned long addr, void *dest, void *base);
 static nokprobe_inline int
 probe_mem_read(void *dest, void *src, size_t size);
 
@@ -91,6 +94,10 @@ process_fetch_insn_bottom(struct fetch_insn *code, unsigned long val,
 			ret += fetch_store_strlen(val + code->offset);
 			code++;
 			goto array;
+		} else if (code->op == FETCH_OP_ST_USTRING) {
+			ret += fetch_store_strlen_user(val + code->offset);
+			code++;
+			goto array;
 		} else
 			return -EILSEQ;
 	}
@@ -106,6 +113,10 @@ process_fetch_insn_bottom(struct fetch_insn *code, unsigned long val,
 		loc = *(u32 *)dest;
 		ret = fetch_store_string(val + code->offset, dest, base);
 		break;
+	case FETCH_OP_ST_USTRING:
+		loc = *(u32 *)dest;
+		ret = fetch_store_string_user(val + code->offset, dest, base);
+		break;
 	default:
 		return -EILSEQ;
 	}
@@ -123,7 +134,8 @@ process_fetch_insn_bottom(struct fetch_insn *code, unsigned long val,
 		total += ret;
 		if (++i < code->param) {
 			code = s3;
-			if (s3->op != FETCH_OP_ST_STRING) {
+			if (s3->op != FETCH_OP_ST_STRING &&
+			    s3->op != FETCH_OP_ST_USTRING) {
 				dest += s3->size;
 				val += s3->size;
 				goto stage3;
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 9bde07c06362..92facae8c3d8 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -173,6 +173,12 @@ fetch_store_string(unsigned long addr, void *dest, void *base)
 	return ret;
 }
 
+static nokprobe_inline int
+fetch_store_string_user(unsigned long addr, void *dest, void *base)
+{
+	return fetch_store_string(addr, dest, base);
+}
+
 /* Return the length of string -- including null terminal byte */
 static nokprobe_inline int
 fetch_store_strlen(unsigned long addr)
@@ -185,6 +191,12 @@ fetch_store_strlen(unsigned long addr)
 	return (len > MAX_STRING_SIZE) ? 0 : len;
 }
 
+static nokprobe_inline int
+fetch_store_strlen_user(unsigned long addr)
+{
+	return fetch_store_strlen(addr);
+}
+
 static unsigned long translate_user_vaddr(unsigned long file_offset)
 {
 	unsigned long base_addr;


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC PATCH 4/4] tracing/probe: Support user-space dereference
  2019-02-25 14:04 [RFC PATCH 0/4] tracing/probes: uaccess: Add support user-space access Masami Hiramatsu
                   ` (2 preceding siblings ...)
  2019-02-25 14:06 ` [RFC PATCH 3/4] tracing/probe: Add ustring type for user-space string Masami Hiramatsu
@ 2019-02-25 14:06 ` Masami Hiramatsu
  2019-02-26 21:38 ` [RFC PATCH 0/4] tracing/probes: uaccess: Add support user-space access Joel Fernandes
  4 siblings, 0 replies; 19+ messages in thread
From: Masami Hiramatsu @ 2019-02-25 14:06 UTC (permalink / raw)
  To: Steven Rostedt, Linus Torvalds
  Cc: mhiramat, linux-kernel, Andy Lutomirski, Ingo Molnar,
	Andrew Morton, Changbin Du, Jann Horn, Kees Cook,
	Andy Lutomirski, Alexei Starovoitov, Nadav Amit, Peter Zijlstra

Support user-space dereference syntax for probe event arguments
to dereference the data-structure or array in user-space.

The syntax is just adding 'u' before an offset value.

 +|-u<OFFSET>(<FETCHARG>)

e.g. +u8(%ax), +u0(+0(%si))

For example, if you probe do_sched_setscheduler(pid, policy,
param) and record param->sched_priority, you can add new
probe as below;

 p do_sched_setscheduler priority=+u0($arg3)

Note that kprobe event provides this and it doesn't change the
dereference method automatically because we do not know whether
the given address is in userspace or kernel on some arch.

So as same as "ustring", this is an option for user, who has to
carefully choose the dereference method.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 Documentation/trace/kprobetrace.rst  |    4 +++-
 Documentation/trace/uprobetracer.rst |    9 +++++----
 kernel/trace/trace.c                 |    5 +++--
 kernel/trace/trace_kprobe.c          |    6 ++++++
 kernel/trace/trace_probe.c           |   27 +++++++++++++++++++++------
 kernel/trace/trace_probe.h           |    2 ++
 kernel/trace/trace_probe_tmpl.h      |   22 +++++++++++++++++-----
 kernel/trace/trace_uprobe.c          |    7 +++++++
 8 files changed, 64 insertions(+), 18 deletions(-)

diff --git a/Documentation/trace/kprobetrace.rst b/Documentation/trace/kprobetrace.rst
index a3ac7c9ac242..036d8c5ba18c 100644
--- a/Documentation/trace/kprobetrace.rst
+++ b/Documentation/trace/kprobetrace.rst
@@ -51,7 +51,7 @@ Synopsis of kprobe_events
   $argN		: Fetch the Nth function argument. (N >= 1) (\*1)
   $retval	: Fetch return value.(\*2)
   $comm		: Fetch current task comm.
-  +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(\*3)
+  +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*3)(\*4)
   NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
   FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
 		  (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types
@@ -61,6 +61,8 @@ Synopsis of kprobe_events
   (\*1) only for the probe on function entry (offs == 0).
   (\*2) only for return probe.
   (\*3) this is useful for fetching a field of data structures.
+  (\*4) "u" means user-space dereference. So accessing data structure in
+        user-space, you have to use this "u" prefix. (e.g. +u0(%si))
 
 Types
 -----
diff --git a/Documentation/trace/uprobetracer.rst b/Documentation/trace/uprobetracer.rst
index 4c3bfde2ba47..6144423b2368 100644
--- a/Documentation/trace/uprobetracer.rst
+++ b/Documentation/trace/uprobetracer.rst
@@ -42,16 +42,17 @@ Synopsis of uprobe_tracer
    @+OFFSET	: Fetch memory at OFFSET (OFFSET from same file as PATH)
    $stackN	: Fetch Nth entry of stack (N >= 0)
    $stack	: Fetch stack address.
-   $retval	: Fetch return value.(*)
+   $retval	: Fetch return value.(\*1)
    $comm	: Fetch current task comm.
-   +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(**)
+   +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*2)(\*3)
    NAME=FETCHARG     : Set NAME as the argument name of FETCHARG.
    FETCHARG:TYPE     : Set TYPE as the type of FETCHARG. Currently, basic types
 		       (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types
 		       (x8/x16/x32/x64), "string" and bitfield are supported.
 
-  (*) only for return probe.
-  (**) this is useful for fetching a field of data structures.
+  (\*1) only for return probe.
+  (\*2) this is useful for fetching a field of data structures.
+  (\*3) Unlike kprobe event, "u" prefix will be just ignored.
 
 Types
 -----
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 4cacbb0e1538..5408a82a015d 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4638,10 +4638,11 @@ static const char readme_msg[] =
 	"\t     args: <name>=fetcharg[:type]\n"
 	"\t fetcharg: %<register>, @<address>, @<symbol>[+|-<offset>],\n"
 #ifdef CONFIG_HAVE_FUNCTION_ARG_ACCESS_API
-	"\t           $stack<index>, $stack, $retval, $comm, $arg<N>\n"
+	"\t           $stack<index>, $stack, $retval, $comm, $arg<N>,\n"
 #else
-	"\t           $stack<index>, $stack, $retval, $comm\n"
+	"\t           $stack<index>, $stack, $retval, $comm,\n"
 #endif
+	"\t           +|-[u]<offset>(<fetcharg>)\n"
 	"\t     type: s8/16/32/64, u8/16/32/64, x8/16/32/64, string, symbol,\n"
 	"\t           b<bit-width>@<bit-offset>/<container-size>, ustring,\n"
 	"\t           <type>\\[<array-size>\\]\n"
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index d50d937b6933..e650b9cc5fbd 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -961,6 +961,12 @@ probe_mem_read(void *dest, void *src, size_t size)
 	return probe_kernel_read(dest, src, size);
 }
 
+static nokprobe_inline int
+probe_mem_read_user(void *dest, void *src, size_t size)
+{
+	return probe_user_read(dest, src, size);
+}
+
 /* Note that we don't verify it, since the code does not come from user space */
 static int
 process_fetch_insn(struct fetch_insn *code, struct pt_regs *regs, void *dest,
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index a7012de37a00..0efef172db17 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -239,6 +239,7 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
 {
 	struct fetch_insn *code = *pcode;
 	unsigned long param;
+	int deref = FETCH_OP_DEREF;
 	long offset = 0;
 	char *tmp;
 	int ret = 0;
@@ -301,8 +302,17 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
 		break;
 
 	case '+':	/* deref memory */
-		arg++;	/* Skip '+', because kstrtol() rejects it. */
 	case '-':
+		if (arg[0] == '+') {
+			arg++;	/* Skip '+', because kstrtol() rejects it. */
+			if (arg[0] == 'u') {
+				deref = FETCH_OP_UDEREF;
+				arg++;
+			}
+		} else if (arg[1] == 'u') {	/* Start with "-u" */
+			deref = FETCH_OP_UDEREF;
+			*(++arg) = '-';
+		}
 		tmp = strchr(arg, '(');
 		if (!tmp)
 			return -EINVAL;
@@ -328,7 +338,7 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
 				return -E2BIG;
 			*pcode = code;
 
-			code->op = FETCH_OP_DEREF;
+			code->op = deref;
 			code->offset = offset;
 		}
 		break;
@@ -444,13 +454,14 @@ static int traceprobe_parse_probe_arg_body(char *arg, ssize_t *size,
 	/* Store operation */
 	if (!strcmp(parg->type->name, "string") ||
 	    !strcmp(parg->type->name, "ustring")) {
-		if (code->op != FETCH_OP_DEREF && code->op != FETCH_OP_IMM &&
-		    code->op != FETCH_OP_COMM) {
+		if (code->op != FETCH_OP_DEREF && code->op != FETCH_OP_UDEREF
+		    && code->op != FETCH_OP_IMM && code->op != FETCH_OP_COMM) {
 			pr_info("string only accepts memory or address.\n");
 			ret = -EINVAL;
 			goto fail;
 		}
-		if (code->op != FETCH_OP_DEREF || parg->count) {
+		if ((code->op == FETCH_OP_IMM && code->op == FETCH_OP_COMM)
+		    || parg->count) {
 			/*
 			 * IMM and COMM is pointing actual address, those must
 			 * be kept, and if parg->count != 0, this is an array
@@ -463,7 +474,8 @@ static int traceprobe_parse_probe_arg_body(char *arg, ssize_t *size,
 			}
 		}
 		/* If op == DEREF, replace it with STRING */
-		if (!strcmp(parg->type->name, "ustring"))
+		if (!strcmp(parg->type->name, "ustring") ||
+		    code->op == FETCH_OP_UDEREF)
 			code->op = FETCH_OP_ST_USTRING;
 		else
 			code->op = FETCH_OP_ST_STRING;
@@ -472,6 +484,9 @@ static int traceprobe_parse_probe_arg_body(char *arg, ssize_t *size,
 	} else if (code->op == FETCH_OP_DEREF) {
 		code->op = FETCH_OP_ST_MEM;
 		code->size = parg->type->size;
+	} else if (code->op == FETCH_OP_UDEREF) {
+		code->op = FETCH_OP_ST_UMEM;
+		code->size = parg->type->size;
 	} else {
 		code++;
 		if (code->op != FETCH_OP_NOP) {
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index cf4ba8bbb841..a5e8b2ac2c97 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -91,9 +91,11 @@ enum fetch_op {
 	FETCH_OP_FOFFS,		/* File offset: .immediate */
 	// Stage 2 (dereference) op
 	FETCH_OP_DEREF,		/* Dereference: .offset */
+	FETCH_OP_UDEREF,	/* User-space Dereference: .offset */
 	// Stage 3 (store) ops
 	FETCH_OP_ST_RAW,	/* Raw: .size */
 	FETCH_OP_ST_MEM,	/* Mem: .offset, .size */
+	FETCH_OP_ST_UMEM,	/* Mem: .offset, .size */
 	FETCH_OP_ST_STRING,	/* String: .offset, .size */
 	FETCH_OP_ST_USTRING,	/* User String: .offset, .size */
 	// Stage 4 (modify) op
diff --git a/kernel/trace/trace_probe_tmpl.h b/kernel/trace/trace_probe_tmpl.h
index 7526f6f8d7b0..a1b58ccdba9a 100644
--- a/kernel/trace/trace_probe_tmpl.h
+++ b/kernel/trace/trace_probe_tmpl.h
@@ -64,6 +64,8 @@ static nokprobe_inline int
 fetch_store_string_user(unsigned long addr, void *dest, void *base);
 static nokprobe_inline int
 probe_mem_read(void *dest, void *src, size_t size);
+static nokprobe_inline int
+probe_mem_read_user(void *dest, void *src, size_t size);
 
 /* From the 2nd stage, routine is same */
 static nokprobe_inline int
@@ -77,14 +79,21 @@ process_fetch_insn_bottom(struct fetch_insn *code, unsigned long val,
 
 stage2:
 	/* 2nd stage: dereference memory if needed */
-	while (code->op == FETCH_OP_DEREF) {
-		lval = val;
-		ret = probe_mem_read(&val, (void *)val + code->offset,
-					sizeof(val));
+	do {
+		if (code->op == FETCH_OP_DEREF) {
+			lval = val;
+			ret = probe_mem_read(&val, (void *)val + code->offset,
+					     sizeof(val));
+		} else if (code->op == FETCH_OP_UDEREF) {
+			lval = val;
+			ret = probe_mem_read_user(&val,
+				 (void *)val + code->offset, sizeof(val));
+		} else
+			break;
 		if (ret)
 			return ret;
 		code++;
-	}
+	} while (1);
 
 	s3 = code;
 stage3:
@@ -109,6 +118,9 @@ process_fetch_insn_bottom(struct fetch_insn *code, unsigned long val,
 	case FETCH_OP_ST_MEM:
 		probe_mem_read(dest, (void *)val + code->offset, code->size);
 		break;
+	case FETCH_OP_ST_UMEM:
+		probe_mem_read_user(dest, (void *)val + code->offset, code->size);
+		break;
 	case FETCH_OP_ST_STRING:
 		loc = *(u32 *)dest;
 		ret = fetch_store_string(val + code->offset, dest, base);
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 92facae8c3d8..a86afc9e2a6a 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -140,6 +140,13 @@ probe_mem_read(void *dest, void *src, size_t size)
 
 	return copy_from_user(dest, vaddr, size) ? -EFAULT : 0;
 }
+
+static nokprobe_inline int
+probe_mem_read_user(void *dest, void *src, size_t size)
+{
+	return probe_mem_read(dest, src, size);
+}
+
 /*
  * Fetch a null-terminated string. Caller MUST set *(u32 *)dest with max
  * length and relative data location.


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 2/4] uaccess: Add non-pagefault user-space read functions
  2019-02-25 14:05 ` [RFC PATCH 2/4] uaccess: Add non-pagefault user-space read functions Masami Hiramatsu
@ 2019-02-25 15:06   ` Peter Zijlstra
  2019-02-25 17:00     ` Linus Torvalds
  2019-02-26  3:01     ` [RFC PATCH 2/4] " Masami Hiramatsu
  2019-02-25 17:06   ` Kees Cook
  1 sibling, 2 replies; 19+ messages in thread
From: Peter Zijlstra @ 2019-02-25 15:06 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Linus Torvalds, linux-kernel, Andy Lutomirski,
	Ingo Molnar, Andrew Morton, Changbin Du, Jann Horn, Kees Cook,
	Andy Lutomirski, Alexei Starovoitov, Nadav Amit

On Mon, Feb 25, 2019 at 11:05:41PM +0900, Masami Hiramatsu wrote:

> +static __always_inline long __strncpy_from_unsafe_user(char *dst,
> +			 const char __user *unsafe_addr, long count)
> +{
> +	if (!access_ok(unsafe_addr, count))
> +		return -EFAULT;
> +
> +	return strncpy_from_unsafe_common(dst, unsafe_addr, count);
> +}

Would something like so work for people?

---
 arch/x86/include/asm/uaccess.h |  8 +++++++-
 include/linux/uaccess.h        | 18 ++++++++++++++++++
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 780f2b42c8ef..3125d129d3b6 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -92,12 +92,18 @@ static inline bool __chk_range_not_ok(unsigned long addr, unsigned long size, un
  * checks that the pointer is in the user space range - after calling
  * this function, memory access functions may still return -EFAULT.
  */
-#define access_ok(addr, size)					\
+#define access_ok(addr, size)						\
 ({									\
 	WARN_ON_IN_IRQ();						\
 	likely(!__range_not_ok(addr, size, user_addr_max()));		\
 })
 
+#define user_access_ok(addr, size)					\
+({									\
+	WARN_ON_ONCE(!segment_eq(get_fs(), USER_DS));			\
+	likely(!__range_not_ok(addr, size, user_addr_max()));		\
+})
+
 /*
  * These are the main single-value transfer routines.  They automatically
  * use the right size if we just have the right pointer type.
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 37b226e8df13..088f2ae09e14 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -10,6 +10,24 @@
 
 #include <asm/uaccess.h>
 
+/**
+ * user_access_ok: Checks if a user space pointer is valid
+ * @addr: User space pointer to start of block to check
+ * @size: Size of block to check
+ *
+ * Context: User context or explicit set_fs(USER_DS).
+ *
+ * This function is very much like access_ok(), except it (may) have different
+ * context validation. In general we must be very careful when using this.
+ */
+#ifndef user_access_ok
+#define user_access_ok(addr, size)					\
+({									\
+	WARN_ON_ONCE(!segment_eq(get_fs(), USER_DS));			\
+	access_ok(addr, size);						\
+})
+#endif
+
 /*
  * Architectures should provide two primitives (raw_copy_{to,from}_user())
  * and get rid of their private instances of copy_{to,from}_user() and

> +/**
> + * strncpy_from_unsafe_user: - Copy a NUL terminated string from unsafe user
> + *				address.
> + * @dst:   Destination address, in kernel space.  This buffer must be at
> + *         least @count bytes long.
> + * @unsafe_addr: Unsafe user address.
> + * @count: Maximum number of bytes to copy, including the trailing NUL.
> + *
> + * Copies a NUL-terminated string from unsafe user address to kernel buffer.
> + *
> + * On success, returns the length of the string INCLUDING the trailing NUL.
> + *
> + * If access fails, returns -EFAULT (some data may have been copied
> + * and the trailing NUL added).
> + *
> + * If @count is smaller than the length of the string, copies @count-1 bytes,
> + * sets the last byte of @dst buffer to NUL and returns @count.
> + */
> +long strncpy_from_unsafe_user(char *dst, const void __user *unsafe_addr,
> +			      long count)
> +{
> +	mm_segment_t old_fs = get_fs();
> +	long ret;
> +
> +	if (unlikely(count <= 0))
> +		return 0;
> +
> +	if (segment_eq(old_fs, USER_DS)) {
> +		ret = __strncpy_from_unsafe_user(dst, unsafe_addr, count);
> +	} else {
> +		set_fs(USER_DS);
> +		ret = __strncpy_from_unsafe_user(dst, unsafe_addr, count);
> +		set_fs(old_fs);
> +	}
> +	return ret;
>  }

Is that really worth the effort?

Why not keep it simple:

	mm_segment_t old_fs = get_fs();

	set_fs(USER_DS);
	ret = __strncpy...();
	set_fs(old_fd);

	return ret;
?

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 2/4] uaccess: Add non-pagefault user-space read functions
  2019-02-25 15:06   ` Peter Zijlstra
@ 2019-02-25 17:00     ` Linus Torvalds
  2019-02-25 18:16       ` Andy Lutomirski
  2019-02-26  4:16       ` Masami Hiramatsu
  2019-02-26  3:01     ` [RFC PATCH 2/4] " Masami Hiramatsu
  1 sibling, 2 replies; 19+ messages in thread
From: Linus Torvalds @ 2019-02-25 17:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Masami Hiramatsu, Steven Rostedt, Linux List Kernel Mailing,
	Andy Lutomirski, Ingo Molnar, Andrew Morton, Changbin Du,
	Jann Horn, Kees Cook, Andy Lutomirski, Alexei Starovoitov,
	Nadav Amit

On Mon, Feb 25, 2019 at 7:06 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> Would something like so work for people?

Looks reasonable to me.

> Why not keep it simple:
>
>         mm_segment_t old_fs = get_fs();
>
>         set_fs(USER_DS);
>         ret = __strncpy...();
>         set_fs(old_fd);
>
>         return ret;

So none of this code looks sane. First odd, there's no real reason to
use __get_user(). The thing should never be used. It does the whole
stac/clac for every byte.

In the copy_from_user() case, I suggested re-doing it as one common
routine without the set_fs() dance for the "already there" case to
simplify error handling. Here it doesn't do that.

But honestly, I think for the strncpy case, we could just do

  long strncpy_from_unsafe_user(char *dst, const void __user *src, long count)
  {
      long ret;
      mm_segment_t old_fs = get_fs();

      set_fs(USER_DS);
      pagefault_disable();
      ret = strncpy_from_user(dst, src, count);
      pagefault_enable();
      set_fs(old_fs);
      return ret;
  }

and be done with it. Efficient and simple.

Note: the above will *only* work for actual user addresses, because
strncpy_from_user() does that proper range check.

                  Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 2/4] uaccess: Add non-pagefault user-space read functions
  2019-02-25 14:05 ` [RFC PATCH 2/4] uaccess: Add non-pagefault user-space read functions Masami Hiramatsu
  2019-02-25 15:06   ` Peter Zijlstra
@ 2019-02-25 17:06   ` Kees Cook
  2019-02-26  4:07     ` Masami Hiramatsu
  1 sibling, 1 reply; 19+ messages in thread
From: Kees Cook @ 2019-02-25 17:06 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Linus Torvalds, LKML, Andy Lutomirski,
	Ingo Molnar, Andrew Morton, Changbin Du, Jann Horn,
	Andy Lutomirski, Alexei Starovoitov, Nadav Amit, Peter Zijlstra

On Mon, Feb 25, 2019 at 6:06 AM Masami Hiramatsu <mhiramat@kernel.org> wrote:
> +static __always_inline long strncpy_from_unsafe_common(char *dst,
> +                               const char __user *unsafe_addr, long count)
> +{
> +       const char __user *src = unsafe_addr;
> +       int ret;
> +
> +       pagefault_disable();
> +       do {
> +               ret = __get_user(*dst++, src++);
> +       } while (dst[-1] && ret == 0 && src - unsafe_addr < count);
> +       dst[-1] = '\0';
> +       pagefault_enable();
> +
> +       return ret ? -EFAULT : src - unsafe_addr;
> +}

I'm all for always NUL-truncating, but this isn't "strncpy" (which has
the buggy maybe-I-didn't-NUL-terminate behavior). Can we call this
strscpy_...() instead?

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 2/4] uaccess: Add non-pagefault user-space read functions
  2019-02-25 17:00     ` Linus Torvalds
@ 2019-02-25 18:16       ` Andy Lutomirski
  2019-02-26  4:16       ` Masami Hiramatsu
  1 sibling, 0 replies; 19+ messages in thread
From: Andy Lutomirski @ 2019-02-25 18:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Masami Hiramatsu, Steven Rostedt,
	Linux List Kernel Mailing, Ingo Molnar, Andrew Morton,
	Changbin Du, Jann Horn, Kees Cook, Andy Lutomirski,
	Alexei Starovoitov, Nadav Amit

On Mon, Feb 25, 2019 at 9:01 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Mon, Feb 25, 2019 at 7:06 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > Would something like so work for people?
>
> Looks reasonable to me.
>
> > Why not keep it simple:
> >
> >         mm_segment_t old_fs = get_fs();
> >
> >         set_fs(USER_DS);
> >         ret = __strncpy...();
> >         set_fs(old_fd);
> >
> >         return ret;
>
> So none of this code looks sane. First odd, there's no real reason to
> use __get_user(). The thing should never be used. It does the whole
> stac/clac for every byte.
>
> In the copy_from_user() case, I suggested re-doing it as one common
> routine without the set_fs() dance for the "already there" case to
> simplify error handling. Here it doesn't do that.
>
> But honestly, I think for the strncpy case, we could just do
>
>   long strncpy_from_unsafe_user(char *dst, const void __user *src, long count)
>   {
>       long ret;
>       mm_segment_t old_fs = get_fs();
>
>       set_fs(USER_DS);
>       pagefault_disable();
>       ret = strncpy_from_user(dst, src, count);
>       pagefault_enable();
>       set_fs(old_fs);
>       return ret;
>   }
>
> and be done with it. Efficient and simple.
>
> Note: the above will *only* work for actual user addresses, because
> strncpy_from_user() does that proper range check.
>

Can we also stick the nmi_uaccess_okay() thing in here and kill an
extra bird with the same stone?  Basically, this is "I have no idea
what context I'm in, but I want to try to read a string from current's
address space, so do your best."  In which case, we need to handle the
fact that we might be in KERNEL_DS *and* we need to handle the fact
that CR3 might be wrong.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 2/4] uaccess: Add non-pagefault user-space read functions
  2019-02-25 15:06   ` Peter Zijlstra
  2019-02-25 17:00     ` Linus Torvalds
@ 2019-02-26  3:01     ` Masami Hiramatsu
  1 sibling, 0 replies; 19+ messages in thread
From: Masami Hiramatsu @ 2019-02-26  3:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Linus Torvalds, linux-kernel, Andy Lutomirski,
	Ingo Molnar, Andrew Morton, Changbin Du, Jann Horn, Kees Cook,
	Andy Lutomirski, Alexei Starovoitov, Nadav Amit

Hi Peter,

On Mon, 25 Feb 2019 16:06:03 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, Feb 25, 2019 at 11:05:41PM +0900, Masami Hiramatsu wrote:
> 
> > +static __always_inline long __strncpy_from_unsafe_user(char *dst,
> > +			 const char __user *unsafe_addr, long count)
> > +{
> > +	if (!access_ok(unsafe_addr, count))
> > +		return -EFAULT;
> > +
> > +	return strncpy_from_unsafe_common(dst, unsafe_addr, count);
> > +}
> 
> Would something like so work for people?
> 
> ---
>  arch/x86/include/asm/uaccess.h |  8 +++++++-
>  include/linux/uaccess.h        | 18 ++++++++++++++++++
>  2 files changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
> index 780f2b42c8ef..3125d129d3b6 100644
> --- a/arch/x86/include/asm/uaccess.h
> +++ b/arch/x86/include/asm/uaccess.h
> @@ -92,12 +92,18 @@ static inline bool __chk_range_not_ok(unsigned long addr, unsigned long size, un
>   * checks that the pointer is in the user space range - after calling
>   * this function, memory access functions may still return -EFAULT.
>   */
> -#define access_ok(addr, size)					\
> +#define access_ok(addr, size)						\
>  ({									\
>  	WARN_ON_IN_IRQ();						\
>  	likely(!__range_not_ok(addr, size, user_addr_max()));		\
>  })
>  
> +#define user_access_ok(addr, size)					\
> +({									\
> +	WARN_ON_ONCE(!segment_eq(get_fs(), USER_DS));			\
> +	likely(!__range_not_ok(addr, size, user_addr_max()));		\
> +})
> +
>  /*
>   * These are the main single-value transfer routines.  They automatically
>   * use the right size if we just have the right pointer type.
> diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
> index 37b226e8df13..088f2ae09e14 100644
> --- a/include/linux/uaccess.h
> +++ b/include/linux/uaccess.h
> @@ -10,6 +10,24 @@
>  
>  #include <asm/uaccess.h>
>  
> +/**
> + * user_access_ok: Checks if a user space pointer is valid
> + * @addr: User space pointer to start of block to check
> + * @size: Size of block to check
> + *
> + * Context: User context or explicit set_fs(USER_DS).
> + *
> + * This function is very much like access_ok(), except it (may) have different
> + * context validation. In general we must be very careful when using this.
> + */
> +#ifndef user_access_ok
> +#define user_access_ok(addr, size)					\
> +({									\
> +	WARN_ON_ONCE(!segment_eq(get_fs(), USER_DS));			\
> +	access_ok(addr, size);						\
> +})
> +#endif
> +
>  /*
>   * Architectures should provide two primitives (raw_copy_{to,from}_user())
>   * and get rid of their private instances of copy_{to,from}_user() and

Yeah, looks good to me. 

Thank you,


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 2/4] uaccess: Add non-pagefault user-space read functions
  2019-02-25 17:06   ` Kees Cook
@ 2019-02-26  4:07     ` Masami Hiramatsu
  0 siblings, 0 replies; 19+ messages in thread
From: Masami Hiramatsu @ 2019-02-26  4:07 UTC (permalink / raw)
  To: Kees Cook
  Cc: Steven Rostedt, Linus Torvalds, LKML, Andy Lutomirski,
	Ingo Molnar, Andrew Morton, Changbin Du, Jann Horn,
	Andy Lutomirski, Alexei Starovoitov, Nadav Amit, Peter Zijlstra

Hi Kees,

On Mon, 25 Feb 2019 09:06:55 -0800
Kees Cook <keescook@chromium.org> wrote:

> On Mon, Feb 25, 2019 at 6:06 AM Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > +static __always_inline long strncpy_from_unsafe_common(char *dst,
> > +                               const char __user *unsafe_addr, long count)
> > +{
> > +       const char __user *src = unsafe_addr;
> > +       int ret;
> > +
> > +       pagefault_disable();
> > +       do {
> > +               ret = __get_user(*dst++, src++);
> > +       } while (dst[-1] && ret == 0 && src - unsafe_addr < count);
> > +       dst[-1] = '\0';
> > +       pagefault_enable();
> > +
> > +       return ret ? -EFAULT : src - unsafe_addr;
> > +}
> 
> I'm all for always NUL-truncating, but this isn't "strncpy" (which has
> the buggy maybe-I-didn't-NUL-terminate behavior). Can we call this
> strscpy_...() instead?

Yes, it is easy to me to fit it to strscpy spec and caller side too.
But if we reuse strncpy_from_user() as Linus suggested, it may be better
keep it or write a wrapper, since this function spec is still a bit
different from strscpy (this doesn't return -E2BIG but returns the
copied length of the string with NULL terminal byte).

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 2/4] uaccess: Add non-pagefault user-space read functions
  2019-02-25 17:00     ` Linus Torvalds
  2019-02-25 18:16       ` Andy Lutomirski
@ 2019-02-26  4:16       ` Masami Hiramatsu
  2019-02-26 12:24         ` Masami Hiramatsu
  1 sibling, 1 reply; 19+ messages in thread
From: Masami Hiramatsu @ 2019-02-26  4:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Masami Hiramatsu, Steven Rostedt,
	Linux List Kernel Mailing, Andy Lutomirski, Ingo Molnar,
	Andrew Morton, Changbin Du, Jann Horn, Kees Cook,
	Andy Lutomirski, Alexei Starovoitov, Nadav Amit

Hi Linus,

On Mon, 25 Feb 2019 09:00:57 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Mon, Feb 25, 2019 at 7:06 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > Would something like so work for people?
> 
> Looks reasonable to me.
> 
> > Why not keep it simple:
> >
> >         mm_segment_t old_fs = get_fs();
> >
> >         set_fs(USER_DS);
> >         ret = __strncpy...();
> >         set_fs(old_fd);
> >
> >         return ret;
> 
> So none of this code looks sane. First odd, there's no real reason to
> use __get_user(). The thing should never be used. It does the whole
> stac/clac for every byte.

Ah, I got it. I just followed the commit bd28b14591b9 ("x86: remove more
uaccess_32.h complexity") as same as strnlen_from_unsafe(). No special
reason.

> 
> In the copy_from_user() case, I suggested re-doing it as one common
> routine without the set_fs() dance for the "already there" case to
> simplify error handling. Here it doesn't do that.
> 
> But honestly, I think for the strncpy case, we could just do
> 
>   long strncpy_from_unsafe_user(char *dst, const void __user *src, long count)
>   {
>       long ret;
>       mm_segment_t old_fs = get_fs();
> 
>       set_fs(USER_DS);
>       pagefault_disable();
>       ret = strncpy_from_user(dst, src, count);
>       pagefault_enable();
>       set_fs(old_fs);
>       return ret;
>   }
> 
> and be done with it. Efficient and simple.

Yes, it looks good to me :)

> 
> Note: the above will *only* work for actual user addresses, because
> strncpy_from_user() does that proper range check.

I think we can reuse do_strncpy_from_user() for strncpy_from_unsafe().
(so maybe we should move it from mm/maccess.c to lib/strncpy_from_user.c?)

As Kees pointed out, I think it is a good chance to sort the behavior of
these strXcpy APIs to match their names.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 2/4] uaccess: Add non-pagefault user-space read functions
  2019-02-26  4:16       ` Masami Hiramatsu
@ 2019-02-26 12:24         ` Masami Hiramatsu
  2019-02-26 15:14           ` [RFC PATCH v2] " Masami Hiramatsu
  0 siblings, 1 reply; 19+ messages in thread
From: Masami Hiramatsu @ 2019-02-26 12:24 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Linus Torvalds, Peter Zijlstra, Steven Rostedt,
	Linux List Kernel Mailing, Andy Lutomirski, Ingo Molnar,
	Andrew Morton, Changbin Du, Jann Horn, Kees Cook,
	Andy Lutomirski, Alexei Starovoitov, Nadav Amit

On Tue, 26 Feb 2019 13:16:05 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> Hi Linus,
> 
> On Mon, 25 Feb 2019 09:00:57 -0800
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > On Mon, Feb 25, 2019 at 7:06 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > Would something like so work for people?
> > 
> > Looks reasonable to me.
> > 
> > > Why not keep it simple:
> > >
> > >         mm_segment_t old_fs = get_fs();
> > >
> > >         set_fs(USER_DS);
> > >         ret = __strncpy...();
> > >         set_fs(old_fd);
> > >
> > >         return ret;
> > 
> > So none of this code looks sane. First odd, there's no real reason to
> > use __get_user(). The thing should never be used. It does the whole
> > stac/clac for every byte.
> 
> Ah, I got it. I just followed the commit bd28b14591b9 ("x86: remove more
> uaccess_32.h complexity") as same as strnlen_from_unsafe(). No special
> reason.
> 
> > 
> > In the copy_from_user() case, I suggested re-doing it as one common
> > routine without the set_fs() dance for the "already there" case to
> > simplify error handling. Here it doesn't do that.
> > 
> > But honestly, I think for the strncpy case, we could just do
> > 
> >   long strncpy_from_unsafe_user(char *dst, const void __user *src, long count)
> >   {
> >       long ret;
> >       mm_segment_t old_fs = get_fs();
> > 
> >       set_fs(USER_DS);
> >       pagefault_disable();
> >       ret = strncpy_from_user(dst, src, count);
> >       pagefault_enable();
> >       set_fs(old_fs);
> >       return ret;
> >   }
> > 
> > and be done with it. Efficient and simple.
> 
> Yes, it looks good to me :)
> 
> > 
> > Note: the above will *only* work for actual user addresses, because
> > strncpy_from_user() does that proper range check.
> 
> I think we can reuse do_strncpy_from_user() for strncpy_from_unsafe().
> (so maybe we should move it from mm/maccess.c to lib/strncpy_from_user.c?)

Ah, no, since lib/strncpy_from_user.c depends on CONFIG_GENERIC_STRNCPY_FROM_USER
which is not selected on a half of supported architectures. Hmm, can we make
strncpy_from_user() an __weak function and build that always?

> As Kees pointed out, I think it is a good chance to sort the behavior of
> these strXcpy APIs to match their names.

I meant, at first rename strncpy_from_unsafe() to strscpy_from_unsafe()
and change the behavior (returning the length of copied string, exclude NULL).

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [RFC PATCH v2] uaccess: Add non-pagefault user-space read functions
  2019-02-26 12:24         ` Masami Hiramatsu
@ 2019-02-26 15:14           ` Masami Hiramatsu
  0 siblings, 0 replies; 19+ messages in thread
From: Masami Hiramatsu @ 2019-02-26 15:14 UTC (permalink / raw)
  To: Steven Rostedt, Linus Torvalds
  Cc: mhiramat, linux-kernel, Andy Lutomirski, Ingo Molnar,
	Andrew Morton, Changbin Du, Jann Horn, Kees Cook,
	Andy Lutomirski, Alexei Starovoitov, Nadav Amit, Peter Zijlstra

Add probe_user_read() and strncpy_from_unsafe_user() which
will not involves mm_sem so we can use it for accessing
user-space in irq-handler.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
  Changes in v2:
   - Simplify strncpy_from_unsafe_user() using strncpy_from_user()
     according to Linus's suggestion.
   - Simplify probe_user_read() not using intermediate function.
---
 include/linux/uaccess.h |   13 +++++++
 mm/maccess.c            |   91 ++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 98 insertions(+), 6 deletions(-)

diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 37b226e8df13..906573b8f02c 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -240,6 +240,17 @@ extern long probe_kernel_read(void *dst, const void *src, size_t size);
 extern long __probe_kernel_read(void *dst, const void *src, size_t size);
 
 /*
+ * probe_user_read(): safely attempt to read from a location in user space
+ * @dst: pointer to the buffer that shall take the data
+ * @src: address to read from
+ * @size: size of the data chunk
+ *
+ * Safely read from address @src to the buffer at @dst.  If a kernel fault
+ * happens, handle that and return -EFAULT.
+ */
+extern long probe_user_read(void *dst, const void __user *src, size_t size);
+
+/*
  * probe_kernel_write(): safely attempt to write to a location
  * @dst: address to write to
  * @src: pointer to the data that shall be written
@@ -252,6 +263,8 @@ extern long notrace probe_kernel_write(void *dst, const void *src, size_t size);
 extern long notrace __probe_kernel_write(void *dst, const void *src, size_t size);
 
 extern long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
+extern long strncpy_from_unsafe_user(char *dst, const void __user *unsafe_addr,
+				     long count);
 
 /**
  * probe_kernel_address(): safely attempt to read from a location
diff --git a/mm/maccess.c b/mm/maccess.c
index ec00be51a24f..0209c9eeaa70 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -5,8 +5,20 @@
 #include <linux/mm.h>
 #include <linux/uaccess.h>
 
+static __always_inline long
+probe_read_common(void *dst, const void __user *src, size_t size)
+{
+	long ret;
+
+	pagefault_disable();
+	ret = __copy_from_user_inatomic(dst, src, size);
+	pagefault_enable();
+
+	return ret ? -EFAULT : 0;
+}
+
 /**
- * probe_kernel_read(): safely attempt to read from a location
+ * probe_kernel_read(): safely attempt to read from a kernel-space location
  * @dst: pointer to the buffer that shall take the data
  * @src: address to read from
  * @size: size of the data chunk
@@ -29,17 +41,44 @@ long __probe_kernel_read(void *dst, const void *src, size_t size)
 	mm_segment_t old_fs = get_fs();
 
 	set_fs(KERNEL_DS);
-	pagefault_disable();
-	ret = __copy_from_user_inatomic(dst,
-			(__force const void __user *)src, size);
-	pagefault_enable();
+	ret = probe_read_common(dst, (__force const void __user *)src, size);
 	set_fs(old_fs);
 
-	return ret ? -EFAULT : 0;
+	return ret;
 }
 EXPORT_SYMBOL_GPL(probe_kernel_read);
 
 /**
+ * probe_user_read(): safely attempt to read from a user-space location
+ * @dst: pointer to the buffer that shall take the data
+ * @src: address to read from. This must be a user address.
+ * @size: size of the data chunk
+ *
+ * Safely read from user address @src to the buffer at @dst. If a kernel fault
+ * happens, handle that and return -EFAULT.
+ */
+
+long __weak probe_user_read(void *dst, const void __user *src, size_t size)
+    __attribute__((alias("__probe_user_read")));
+
+long __probe_user_read(void *dst, const void __user *src, size_t size)
+{
+	long ret;
+	mm_segment_t old_fs = get_fs();
+
+	set_fs(USER_DS);
+
+	if (!access_ok(src, size))
+		ret = -EFAULT;
+	else
+		ret = probe_read_common(dst, src, size);
+
+	set_fs(old_fs);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(probe_user_read);
+
+/**
  * probe_kernel_write(): safely attempt to write to a location
  * @dst: address to write to
  * @src: pointer to the data that shall be written
@@ -66,6 +105,7 @@ long __probe_kernel_write(void *dst, const void *src, size_t size)
 }
 EXPORT_SYMBOL_GPL(probe_kernel_write);
 
+
 /**
  * strncpy_from_unsafe: - Copy a NUL terminated string from unsafe address.
  * @dst:   Destination address, in kernel space.  This buffer must be at
@@ -105,3 +145,42 @@ long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
 
 	return ret ? -EFAULT : src - unsafe_addr;
 }
+
+/**
+ * strncpy_from_unsafe_user: - Copy a NUL terminated string from unsafe user
+ *				address.
+ * @dst:   Destination address, in kernel space.  This buffer must be at
+ *         least @count bytes long.
+ * @unsafe_addr: Unsafe user address.
+ * @count: Maximum number of bytes to copy, including the trailing NUL.
+ *
+ * Copies a NUL-terminated string from unsafe user address to kernel buffer.
+ *
+ * On success, returns the length of the string INCLUDING the trailing NUL.
+ *
+ * If access fails, returns -EFAULT (some data may have been copied
+ * and the trailing NUL added).
+ *
+ * If @count is smaller than the length of the string, copies @count-1 bytes,
+ * sets the last byte of @dst buffer to NUL and returns @count.
+ */
+long strncpy_from_unsafe_user(char *dst, const void __user *src, long count)
+{
+	mm_segment_t old_fs = get_fs();
+	long ret;
+
+	if (unlikely(count <= 0))
+		return 0;
+
+	set_fs(USER_DS);
+	pagefault_disable();
+	ret = strncpy_from_user(dst, src, count);
+	pagefault_enable();
+	set_fs(old_fs);
+	if (ret >= count) {
+		ret = count;
+		dst[ret - 1] = '\0';
+	} else if (ret > 0)
+		ret++;
+	return ret;
+}


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/4] tracing/probes: uaccess: Add support user-space access
  2019-02-25 14:04 [RFC PATCH 0/4] tracing/probes: uaccess: Add support user-space access Masami Hiramatsu
                   ` (3 preceding siblings ...)
  2019-02-25 14:06 ` [RFC PATCH 4/4] tracing/probe: Support user-space dereference Masami Hiramatsu
@ 2019-02-26 21:38 ` Joel Fernandes
  2019-02-27  7:41   ` Masami Hiramatsu
  4 siblings, 1 reply; 19+ messages in thread
From: Joel Fernandes @ 2019-02-26 21:38 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Linus Torvalds, linux-kernel, Andy Lutomirski,
	Ingo Molnar, Andrew Morton, Changbin Du, Jann Horn, Kees Cook,
	Andy Lutomirski, Alexei Starovoitov, Nadav Amit, Peter Zijlstra

On Mon, Feb 25, 2019 at 11:04:42PM +0900, Masami Hiramatsu wrote:
> Hi,
> 
> Here is an RFC series of probe-event to support user-space access
> methods, which we discussed in previous thread.
> 
> https://lkml.kernel.org/r/20190220171019.5e81a4946b56982f324f7c45@kernel.org
> 
> So in this thread, it is clear that current probe_kernel_read()
> and strncpy_from_unsafe() are not enough to access user-space
> variables from kprobe events on some arch. On such arch, user
> address space and kernel address space can overlap so we have
> to change the memory segment to user-mode before copying.
> But probe_kernel_read() is designed to access primarily kernel
> memory, it may fail to get, or get unexpected value on such
> arch. So we need to expand kprobe fetcharg to support new options
> for such case.
> 
> For user-space access extension, this series adds 2 features,
> "ustring" type and user-space dereference syntax. "ustring" is
> used for recording a null-terminated string in user-space from
> kprobe events.
> 
> "ustring" type is easy, it is able to use instead of "string"
> type, so if you want to record a user-space string via
> "__user char *", you can use ustring type instead of string.
> For example,
> 
> echo 'p do_sys_open path=+0($arg2):ustring' >> kprobe_events
> 
> will record the path string from user-space.
> 
> The user-space dereference syntax is also simple. Thi just
> adds 'u' prefix before an offset value.
> 
>    +|-u<OFFSET>(<FETCHARG>)
> 
> e.g. +u8(%ax), +u0(+0(%si))
> 
> This is more generic. If you want to refer the variable in user-
> space from its address or access a field in data structure in
> user-space, you need to use this.
> 
> For example, if you probe do_sched_setscheduler(pid, policy,
> param) and record param->sched_priority, you can add new
> probe as below;
>     
>    p do_sched_setscheduler priority=+u0($arg3)
> 
> Actually, with this feature, "ustring" type is not absolutely
> necessary, because these are same meanings.
> 
>   +0($arg2):ustring == +u0($arg2):string
> 
> Perhups, we may be better removing "ustring" and just introducing
> this user-space dereference syntax...
> 
> Note that kprobe event provides these methods, but it doesn't
> change it from kernel to user automatically because we do not
> know whether the given address is in userspace or kernel on
> some arch.
> Moreover, from perf-probe, at this moment it is not able to
> switch. Since __user is not for compiler but checker, we have
> no clue which data structure is in user-space, in debuginfo.
> 
> BTW, according to Linus's comment, I implemented probe_user_read()
> and strncpy_from_unsafe_user() APIs. And since those use
> "access_ok()" inside it, if CONFIG_DEBUG_ATOMIC_SLEEP=y on x86,
> it will get a warn message at once. It should be solved before
> merging this series.

I was wondering why access_ok() can sleep. In the arm64 and x86
implementation, I don't see access_ok() itself causing a user pointer
dereference access that can cause a page fault. It seems to just be checking
the validity of the ranges.

Any idea why the access_ok() code has these comments?
"Context: User context only. This function may sleep if pagefaults are
enabled."

My _guess_ is this is because whatever calls access_ok() may also call
something else that *does* fault next, if that's the case then that
WARN_ON_IN_IRQ() in access_ok() is fine, but at least I guess the comments
should be more clear that it is not access_ok() itself that sleeps.

thanks for any help on understanding this,

 - Joel


> 
> Thank you,
> 
> ---
> 
> Masami Hiramatsu (4):
>       uaccess: Make sure kernel_uaccess_faults_ok is updated before pagefault
>       uaccess: Add non-pagefault user-space read functions
>       tracing/probe: Add ustring type for user-space string
>       tracing/probe: Support user-space dereference
> 
> 
>  Documentation/trace/kprobetrace.rst  |   13 ++-
>  Documentation/trace/uprobetracer.rst |    9 +-
>  fs/namespace.c                       |    2 
>  include/linux/uaccess.h              |   13 +++
>  kernel/trace/trace.c                 |    7 +-
>  kernel/trace/trace_kprobe.c          |   65 ++++++++++++++++
>  kernel/trace/trace_probe.c           |   39 ++++++++--
>  kernel/trace/trace_probe.h           |    3 +
>  kernel/trace/trace_probe_tmpl.h      |   36 +++++++--
>  kernel/trace/trace_uprobe.c          |   19 +++++
>  mm/maccess.c                         |  138 ++++++++++++++++++++++++++++++----
>  11 files changed, 302 insertions(+), 42 deletions(-)
> 
> --
> Masami Hiramatsu (Linaro) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/4] tracing/probes: uaccess: Add support user-space access
  2019-02-26 21:38 ` [RFC PATCH 0/4] tracing/probes: uaccess: Add support user-space access Joel Fernandes
@ 2019-02-27  7:41   ` Masami Hiramatsu
  2019-02-27  8:00     ` Peter Zijlstra
  2019-02-27 21:33     ` Joel Fernandes
  0 siblings, 2 replies; 19+ messages in thread
From: Masami Hiramatsu @ 2019-02-27  7:41 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Steven Rostedt, Linus Torvalds, linux-kernel, Andy Lutomirski,
	Ingo Molnar, Andrew Morton, Changbin Du, Jann Horn, Kees Cook,
	Andy Lutomirski, Alexei Starovoitov, Nadav Amit, Peter Zijlstra

On Tue, 26 Feb 2019 16:38:50 -0500
Joel Fernandes <joel@joelfernandes.org> wrote:

> On Mon, Feb 25, 2019 at 11:04:42PM +0900, Masami Hiramatsu wrote:

> > Note that kprobe event provides these methods, but it doesn't
> > change it from kernel to user automatically because we do not
> > know whether the given address is in userspace or kernel on
> > some arch.
> > Moreover, from perf-probe, at this moment it is not able to
> > switch. Since __user is not for compiler but checker, we have
> > no clue which data structure is in user-space, in debuginfo.
> > 
> > BTW, according to Linus's comment, I implemented probe_user_read()
> > and strncpy_from_unsafe_user() APIs. And since those use
> > "access_ok()" inside it, if CONFIG_DEBUG_ATOMIC_SLEEP=y on x86,
> > it will get a warn message at once. It should be solved before
> > merging this series.
> 
> I was wondering why access_ok() can sleep. In the arm64 and x86
> implementation, I don't see access_ok() itself causing a user pointer
> dereference access that can cause a page fault. It seems to just be checking
> the validity of the ranges.
> 
> Any idea why the access_ok() code has these comments?
> "Context: User context only. This function may sleep if pagefaults are
> enabled."

Because access_ok() is used only for preparing accessing user-space,
and the user-space access may cause page-fault and sleep.

IMHO, checking in access_ok() inside it is reasonable, but as it
commented, it is for "if pagefaults are enabled.". What we need another 
access_ok() for the case when pagefaults are disabled, that is what PeterZ
suggested in below mail.

https://lore.kernel.org/lkml/20190225150603.GE32494@hirez.programming.kicks-ass.net/T/#u


Thank you,

> 
> My _guess_ is this is because whatever calls access_ok() may also call
> something else that *does* fault next, if that's the case then that
> WARN_ON_IN_IRQ() in access_ok() is fine, but at least I guess the comments
> should be more clear that it is not access_ok() itself that sleeps.
> 
> thanks for any help on understanding this,
> 
>  - Joel
> 
> 
> > 
> > Thank you,
> > 
> > ---
> > 
> > Masami Hiramatsu (4):
> >       uaccess: Make sure kernel_uaccess_faults_ok is updated before pagefault
> >       uaccess: Add non-pagefault user-space read functions
> >       tracing/probe: Add ustring type for user-space string
> >       tracing/probe: Support user-space dereference
> > 
> > 
> >  Documentation/trace/kprobetrace.rst  |   13 ++-
> >  Documentation/trace/uprobetracer.rst |    9 +-
> >  fs/namespace.c                       |    2 
> >  include/linux/uaccess.h              |   13 +++
> >  kernel/trace/trace.c                 |    7 +-
> >  kernel/trace/trace_kprobe.c          |   65 ++++++++++++++++
> >  kernel/trace/trace_probe.c           |   39 ++++++++--
> >  kernel/trace/trace_probe.h           |    3 +
> >  kernel/trace/trace_probe_tmpl.h      |   36 +++++++--
> >  kernel/trace/trace_uprobe.c          |   19 +++++
> >  mm/maccess.c                         |  138 ++++++++++++++++++++++++++++++----
> >  11 files changed, 302 insertions(+), 42 deletions(-)
> > 
> > --
> > Masami Hiramatsu (Linaro) <mhiramat@kernel.org>


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/4] tracing/probes: uaccess: Add support user-space access
  2019-02-27  7:41   ` Masami Hiramatsu
@ 2019-02-27  8:00     ` Peter Zijlstra
  2019-02-27 11:39       ` Masami Hiramatsu
  2019-02-27 21:33     ` Joel Fernandes
  1 sibling, 1 reply; 19+ messages in thread
From: Peter Zijlstra @ 2019-02-27  8:00 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Joel Fernandes, Steven Rostedt, Linus Torvalds, linux-kernel,
	Andy Lutomirski, Ingo Molnar, Andrew Morton, Changbin Du,
	Jann Horn, Kees Cook, Andy Lutomirski, Alexei Starovoitov,
	Nadav Amit

On Wed, Feb 27, 2019 at 04:41:04PM +0900, Masami Hiramatsu wrote:
> 
> IMHO, checking in access_ok() inside it is reasonable, but as it
> commented, it is for "if pagefaults are enabled.". What we need another 
> access_ok() for the case when pagefaults are disabled, that is what PeterZ
> suggested in below mail.
> 
> https://lore.kernel.org/lkml/20190225150603.GE32494@hirez.programming.kicks-ass.net/T/#u

FWIW, feel free to carry that patch with your other changes that require
it.

You can add my SoB if you write the Changelog ;-)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/4] tracing/probes: uaccess: Add support user-space access
  2019-02-27  8:00     ` Peter Zijlstra
@ 2019-02-27 11:39       ` Masami Hiramatsu
  0 siblings, 0 replies; 19+ messages in thread
From: Masami Hiramatsu @ 2019-02-27 11:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Joel Fernandes, Steven Rostedt, Linus Torvalds, linux-kernel,
	Andy Lutomirski, Ingo Molnar, Andrew Morton, Changbin Du,
	Jann Horn, Kees Cook, Andy Lutomirski, Alexei Starovoitov,
	Nadav Amit

On Wed, 27 Feb 2019 09:00:34 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Feb 27, 2019 at 04:41:04PM +0900, Masami Hiramatsu wrote:
> > 
> > IMHO, checking in access_ok() inside it is reasonable, but as it
> > commented, it is for "if pagefaults are enabled.". What we need another 
> > access_ok() for the case when pagefaults are disabled, that is what PeterZ
> > suggested in below mail.
> > 
> > https://lore.kernel.org/lkml/20190225150603.GE32494@hirez.programming.kicks-ass.net/T/#u
> 
> FWIW, feel free to carry that patch with your other changes that require
> it.
> 
> You can add my SoB if you write the Changelog ;-)

OK, I'll send v3 patch including your fix :)

Thank you!

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/4] tracing/probes: uaccess: Add support user-space access
  2019-02-27  7:41   ` Masami Hiramatsu
  2019-02-27  8:00     ` Peter Zijlstra
@ 2019-02-27 21:33     ` Joel Fernandes
  1 sibling, 0 replies; 19+ messages in thread
From: Joel Fernandes @ 2019-02-27 21:33 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Linus Torvalds, linux-kernel, Andy Lutomirski,
	Ingo Molnar, Andrew Morton, Changbin Du, Jann Horn, Kees Cook,
	Andy Lutomirski, Alexei Starovoitov, Nadav Amit, Peter Zijlstra

On Wed, Feb 27, 2019 at 04:41:04PM +0900, Masami Hiramatsu wrote:
> On Tue, 26 Feb 2019 16:38:50 -0500
> Joel Fernandes <joel@joelfernandes.org> wrote:
> 
> > On Mon, Feb 25, 2019 at 11:04:42PM +0900, Masami Hiramatsu wrote:
> 
> > > Note that kprobe event provides these methods, but it doesn't
> > > change it from kernel to user automatically because we do not
> > > know whether the given address is in userspace or kernel on
> > > some arch.
> > > Moreover, from perf-probe, at this moment it is not able to
> > > switch. Since __user is not for compiler but checker, we have
> > > no clue which data structure is in user-space, in debuginfo.
> > > 
> > > BTW, according to Linus's comment, I implemented probe_user_read()
> > > and strncpy_from_unsafe_user() APIs. And since those use
> > > "access_ok()" inside it, if CONFIG_DEBUG_ATOMIC_SLEEP=y on x86,
> > > it will get a warn message at once. It should be solved before
> > > merging this series.
> > 
> > I was wondering why access_ok() can sleep. In the arm64 and x86
> > implementation, I don't see access_ok() itself causing a user pointer
> > dereference access that can cause a page fault. It seems to just be checking
> > the validity of the ranges.
> > 
> > Any idea why the access_ok() code has these comments?
> > "Context: User context only. This function may sleep if pagefaults are
> > enabled."
> 
> Because access_ok() is used only for preparing accessing user-space,
> and the user-space access may cause page-fault and sleep.

Pedantically speaking, the access_ok() function itself in x86 implementation
does not sleep so the comment on the function saying "This function may
sleep" is odd to the code reader (and it confused someone else too so I'm not
the only one :)), but it could be that for some architectures it does sleep...

> IMHO, checking in access_ok() inside it is reasonable, but as it
> commented, it is for "if pagefaults are enabled.". What we need another 
> access_ok() for the case when pagefaults are disabled, that is what PeterZ
> suggested in below mail.
> 
> https://lore.kernel.org/lkml/20190225150603.GE32494@hirez.programming.kicks-ass.net/T/#u

Makes sense, thanks.

- Joel


> 
> 
> Thank you,
> 
> > 
> > My _guess_ is this is because whatever calls access_ok() may also call
> > something else that *does* fault next, if that's the case then that
> > WARN_ON_IN_IRQ() in access_ok() is fine, but at least I guess the comments
> > should be more clear that it is not access_ok() itself that sleeps.
> > 
> > thanks for any help on understanding this,
> > 
> >  - Joel
> > 
> > 
> > > 
> > > Thank you,
> > > 
> > > ---
> > > 
> > > Masami Hiramatsu (4):
> > >       uaccess: Make sure kernel_uaccess_faults_ok is updated before pagefault
> > >       uaccess: Add non-pagefault user-space read functions
> > >       tracing/probe: Add ustring type for user-space string
> > >       tracing/probe: Support user-space dereference
> > > 
> > > 
> > >  Documentation/trace/kprobetrace.rst  |   13 ++-
> > >  Documentation/trace/uprobetracer.rst |    9 +-
> > >  fs/namespace.c                       |    2 
> > >  include/linux/uaccess.h              |   13 +++
> > >  kernel/trace/trace.c                 |    7 +-
> > >  kernel/trace/trace_kprobe.c          |   65 ++++++++++++++++
> > >  kernel/trace/trace_probe.c           |   39 ++++++++--
> > >  kernel/trace/trace_probe.h           |    3 +
> > >  kernel/trace/trace_probe_tmpl.h      |   36 +++++++--
> > >  kernel/trace/trace_uprobe.c          |   19 +++++
> > >  mm/maccess.c                         |  138 ++++++++++++++++++++++++++++++----
> > >  11 files changed, 302 insertions(+), 42 deletions(-)
> > > 
> > > --
> > > Masami Hiramatsu (Linaro) <mhiramat@kernel.org>
> 
> 
> -- 
> Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2019-02-27 21:34 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-25 14:04 [RFC PATCH 0/4] tracing/probes: uaccess: Add support user-space access Masami Hiramatsu
2019-02-25 14:05 ` [RFC PATCH 1/4] uaccess: Make sure kernel_uaccess_faults_ok is updated before pagefault Masami Hiramatsu
2019-02-25 14:05 ` [RFC PATCH 2/4] uaccess: Add non-pagefault user-space read functions Masami Hiramatsu
2019-02-25 15:06   ` Peter Zijlstra
2019-02-25 17:00     ` Linus Torvalds
2019-02-25 18:16       ` Andy Lutomirski
2019-02-26  4:16       ` Masami Hiramatsu
2019-02-26 12:24         ` Masami Hiramatsu
2019-02-26 15:14           ` [RFC PATCH v2] " Masami Hiramatsu
2019-02-26  3:01     ` [RFC PATCH 2/4] " Masami Hiramatsu
2019-02-25 17:06   ` Kees Cook
2019-02-26  4:07     ` Masami Hiramatsu
2019-02-25 14:06 ` [RFC PATCH 3/4] tracing/probe: Add ustring type for user-space string Masami Hiramatsu
2019-02-25 14:06 ` [RFC PATCH 4/4] tracing/probe: Support user-space dereference Masami Hiramatsu
2019-02-26 21:38 ` [RFC PATCH 0/4] tracing/probes: uaccess: Add support user-space access Joel Fernandes
2019-02-27  7:41   ` Masami Hiramatsu
2019-02-27  8:00     ` Peter Zijlstra
2019-02-27 11:39       ` Masami Hiramatsu
2019-02-27 21:33     ` Joel Fernandes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).