linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v6 0/6] tracing/probes: uaccess: Add support user-space access
@ 2019-03-18  6:42 Masami Hiramatsu
  2019-03-18  6:43 ` [RFC PATCH v6 1/6] x86/uaccess: Allow access_ok() in irq context if pagefault_disabled Masami Hiramatsu
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Masami Hiramatsu @ 2019-03-18  6:42 UTC (permalink / raw)
  To: Steven Rostedt, Linus Torvalds, Shuah Khan,
	Arnaldo Carvalho de Melo, Peter Zijlstra
  Cc: mhiramat, linux-kernel, Andy Lutomirski, Ingo Molnar,
	Andrew Morton, Changbin Du, Jann Horn, Kees Cook,
	Andy Lutomirski, Alexei Starovoitov, Nadav Amit, Joel Fernandes,
	yhs

Hi,

Here is the v6 series of probe-event to support user-space access.

In this version, I replaced user_access_ok() patch with access_ok()
enhancement, which allows user to call access_ok() in IRQ context
if it disables pagefault. In the result of this change, I also
removed user_access_ok() related patch.
This version also extends perf-probe to handle user-space memory.
This is still not automated yet, but it can be done when __user
is encoded in debuginfo.

Changes in v6:
- [1/6]: (New) allow access_ok() in IRQ context if pagefault is
         disabled.
- [3/6]: Remove user_access_ok()
- [5/6]: Add $argN availablity check
- [6/6]: (New) extend perf-probe to handle "@user" attribute which
         allows user to specify "user-space local variable"

V5 series is here;

https://lkml.kernel.org/r/155136974478.2968.3105123100519786079.stgit@devbox

In summary, strncpy_from_user() should work as below

 - strncpy_from_user() can access user memory with set_fs(USER_DS)
   in task context

 - strncpy_from_user() can access kernel memory with set_fs(KERNEL_DS)
   in task context (e.g. devtmpfsd and init)

 - strncpy_from_user() can access user/kernel memory (depends on DS)
   in IRQ context if pagefault is disabled. (both verified)

PeterZ, would you still have any concern about this check?

====
Kprobe event user-space memory access features:

For user-space access extension, this series adds 2 features,
"ustring" type and user-space dereference syntax. "ustring" is
used for recording a null-terminated string in user-space from
kprobe events.

"ustring" type is easy, it is able to use instead of "string"
type, so if you want to record a user-space string via
"__user char *", you can use ustring type instead of string.
For example,

echo 'p do_sys_open path=+0($arg2):ustring' >> kprobe_events

will record the path string from user-space.

The user-space dereference syntax is also simple. Thi just
adds 'u' prefix before an offset value.

   +|-u<OFFSET>(<FETCHARG>)

e.g. +u8(%ax), +u0(+0(%si))

This is more generic. If you want to refer the variable in user-
space from its address or access a field in data structure in
user-space, you need to use this.

For example, if you probe do_sched_setscheduler(pid, policy,
param) and record param->sched_priority, you can add new
probe as below;
    
   p do_sched_setscheduler priority=+u0($arg3)

Actually, with this feature, "ustring" type is not absolutely
necessary, because these are same meanings.

  +0($arg2):ustring == +u0($arg2):string

Note that kprobe event provides these methods, but it doesn't
change it from kernel to user automatically because we do not
know whether the given address is in userspace or kernel on
some arch.


Thank you,

---

Masami Hiramatsu (6):
      x86/uaccess: Allow access_ok() in irq context if pagefault_disabled
      uaccess: Add non-pagefault user-space read functions
      tracing/probe: Add ustring type for user-space string
      tracing/probe: Support user-space dereference
      selftests/ftrace: Add user-memory access syntax testcase
      perf-probe: Add user memory access attribute support


 Documentation/trace/kprobetrace.rst                |   28 ++++-
 Documentation/trace/uprobetracer.rst               |    9 +-
 arch/x86/include/asm/uaccess.h                     |    4 +
 include/linux/uaccess.h                            |   19 +++
 kernel/trace/trace.c                               |    7 +
 kernel/trace/trace_kprobe.c                        |   35 ++++++
 kernel/trace/trace_probe.c                         |   37 +++++-
 kernel/trace/trace_probe.h                         |    3 +
 kernel/trace/trace_probe_tmpl.h                    |   37 +++++-
 kernel/trace/trace_uprobe.c                        |   19 +++
 mm/maccess.c                                       |  117 +++++++++++++++++++-
 tools/perf/Documentation/perf-probe.txt            |    3 -
 tools/perf/util/probe-event.c                      |   11 ++
 tools/perf/util/probe-event.h                      |    2 
 tools/perf/util/probe-file.c                       |    7 +
 tools/perf/util/probe-file.h                       |    1 
 tools/perf/util/probe-finder.c                     |   19 ++-
 .../ftrace/test.d/kprobe/kprobe_args_user.tc       |   32 +++++
 18 files changed, 349 insertions(+), 41 deletions(-)
 create mode 100644 tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_user.tc

--
Masami Hiramatsu (Linaro) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC PATCH v6 1/6] x86/uaccess: Allow access_ok() in irq context if pagefault_disabled
  2019-03-18  6:42 [RFC PATCH v6 0/6] tracing/probes: uaccess: Add support user-space access Masami Hiramatsu
@ 2019-03-18  6:43 ` Masami Hiramatsu
  2019-03-22  2:46   ` Steven Rostedt
  2019-03-18  6:43 ` [RFC PATCH v6 2/6] uaccess: Add non-pagefault user-space read functions Masami Hiramatsu
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Masami Hiramatsu @ 2019-03-18  6:43 UTC (permalink / raw)
  To: Steven Rostedt, Linus Torvalds, Shuah Khan,
	Arnaldo Carvalho de Melo, Peter Zijlstra
  Cc: mhiramat, linux-kernel, Andy Lutomirski, Ingo Molnar,
	Andrew Morton, Changbin Du, Jann Horn, Kees Cook,
	Andy Lutomirski, Alexei Starovoitov, Nadav Amit, Joel Fernandes,
	yhs

WARN_ON_IN_IRQ() assumes that the access_ok() and following
user memory access can sleep. But this assumption is not
always correct; when the pagefault is disabled, following
memory access will just returns -EFAULT and never sleep.

Add pagefault_disabled() check in WARN_ON_ONCE() so that
it can ignore the case we call it with disabling pagefault.
For this purpose, this modified pagefault_disabled() as
an inline function.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 arch/x86/include/asm/uaccess.h |    4 +++-
 include/linux/uaccess.h        |    5 ++++-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 1954dd5552a2..11581dc098d5 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -66,7 +66,9 @@ static inline bool __chk_range_not_ok(unsigned long addr, unsigned long size, un
 })
 
 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
-# define WARN_ON_IN_IRQ()	WARN_ON_ONCE(!in_task())
+static inline bool pagefault_disabled(void);
+# define WARN_ON_IN_IRQ()	\
+	WARN_ON_ONCE(!in_task() && !pagefault_disabled())
 #else
 # define WARN_ON_IN_IRQ()
 #endif
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 37b226e8df13..ef3032db1aef 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -203,7 +203,10 @@ static inline void pagefault_enable(void)
 /*
  * Is the pagefault handler disabled? If so, user access methods will not sleep.
  */
-#define pagefault_disabled() (current->pagefault_disabled != 0)
+static inline bool pagefault_disabled(void)
+{
+	return current->pagefault_disabled != 0;
+}
 
 /*
  * The pagefault handler is in general disabled by pagefault_disable() or


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH v6 2/6] uaccess: Add non-pagefault user-space read functions
  2019-03-18  6:42 [RFC PATCH v6 0/6] tracing/probes: uaccess: Add support user-space access Masami Hiramatsu
  2019-03-18  6:43 ` [RFC PATCH v6 1/6] x86/uaccess: Allow access_ok() in irq context if pagefault_disabled Masami Hiramatsu
@ 2019-03-18  6:43 ` Masami Hiramatsu
  2019-03-18  6:43 ` [RFC PATCH v6 3/6] tracing/probe: Add ustring type for user-space string Masami Hiramatsu
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Masami Hiramatsu @ 2019-03-18  6:43 UTC (permalink / raw)
  To: Steven Rostedt, Linus Torvalds, Shuah Khan,
	Arnaldo Carvalho de Melo, Peter Zijlstra
  Cc: mhiramat, linux-kernel, Andy Lutomirski, Ingo Molnar,
	Andrew Morton, Changbin Du, Jann Horn, Kees Cook,
	Andy Lutomirski, Alexei Starovoitov, Nadav Amit, Joel Fernandes,
	yhs

Add probe_user_read(), strncpy_from_unsafe_user() and
strnlen_unsafe_user() which allows caller to access user-space
in IRQ context.

Current probe_kernel_read() and strncpy_from_unsafe() are
not available for user-space memory, because it sets
KERNEL_DS while accessing data. On some arch, user address
space and kernel address space can be co-exist, but others
can not. In that case, setting KERNEL_DS means given
address is treated as a kernel address space.
Also strnlen_user() is only available from user context since
it can sleep if pagefault is enabled.

To access user-space memory without pagefault, we need
these new functions which sets USER_DS while accessing
the data.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
  Changes in v6:
   - Remove user_access_ok()
  Changes in v5:
   - Simplify probe_user_read() (Thanks, Peter!)
   - Add strnlen_unsafe_user()
  Changes in v3:
   - Use user_access_ok() for probe_user_read().
  Changes in v2:
   - Simplify strncpy_from_unsafe_user() using strncpy_from_user()
     according to Linus's suggestion.
   - Simplify probe_user_read() not using intermediate function.
---
 include/linux/uaccess.h |   14 ++++++
 mm/maccess.c            |  117 +++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 125 insertions(+), 6 deletions(-)

diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index ef3032db1aef..80d35d880f56 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -242,6 +242,17 @@ static inline unsigned long __copy_from_user_inatomic_nocache(void *to,
 extern long probe_kernel_read(void *dst, const void *src, size_t size);
 extern long __probe_kernel_read(void *dst, const void *src, size_t size);
 
+/*
+ * probe_user_read(): safely attempt to read from a location in user space
+ * @dst: pointer to the buffer that shall take the data
+ * @src: address to read from
+ * @size: size of the data chunk
+ *
+ * Safely read from address @src to the buffer at @dst.  If a kernel fault
+ * happens, handle that and return -EFAULT.
+ */
+extern long probe_user_read(void *dst, const void __user *src, size_t size);
+
 /*
  * probe_kernel_write(): safely attempt to write to a location
  * @dst: address to write to
@@ -255,6 +266,9 @@ extern long notrace probe_kernel_write(void *dst, const void *src, size_t size);
 extern long notrace __probe_kernel_write(void *dst, const void *src, size_t size);
 
 extern long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count);
+extern long strncpy_from_unsafe_user(char *dst, const void __user *unsafe_addr,
+				     long count);
+extern long strnlen_unsafe_user(const void __user *unsafe_addr, long count);
 
 /**
  * probe_kernel_address(): safely attempt to read from a location
diff --git a/mm/maccess.c b/mm/maccess.c
index ec00be51a24f..4e720b173d8a 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -5,8 +5,20 @@
 #include <linux/mm.h>
 #include <linux/uaccess.h>
 
+static __always_inline long
+probe_read_common(void *dst, const void __user *src, size_t size)
+{
+	long ret;
+
+	pagefault_disable();
+	ret = __copy_from_user_inatomic(dst, src, size);
+	pagefault_enable();
+
+	return ret ? -EFAULT : 0;
+}
+
 /**
- * probe_kernel_read(): safely attempt to read from a location
+ * probe_kernel_read(): safely attempt to read from a kernel-space location
  * @dst: pointer to the buffer that shall take the data
  * @src: address to read from
  * @size: size of the data chunk
@@ -29,16 +41,39 @@ long __probe_kernel_read(void *dst, const void *src, size_t size)
 	mm_segment_t old_fs = get_fs();
 
 	set_fs(KERNEL_DS);
-	pagefault_disable();
-	ret = __copy_from_user_inatomic(dst,
-			(__force const void __user *)src, size);
-	pagefault_enable();
+	ret = probe_read_common(dst, (__force const void __user *)src, size);
 	set_fs(old_fs);
 
-	return ret ? -EFAULT : 0;
+	return ret;
 }
 EXPORT_SYMBOL_GPL(probe_kernel_read);
 
+/**
+ * probe_user_read(): safely attempt to read from a user-space location
+ * @dst: pointer to the buffer that shall take the data
+ * @src: address to read from. This must be a user address.
+ * @size: size of the data chunk
+ *
+ * Safely read from user address @src to the buffer at @dst. If a kernel fault
+ * happens, handle that and return -EFAULT.
+ */
+
+long __weak probe_user_read(void *dst, const void __user *src, size_t size)
+    __attribute__((alias("__probe_user_read")));
+
+long __probe_user_read(void *dst, const void __user *src, size_t size)
+{
+	long ret = -EFAULT;
+	mm_segment_t old_fs = get_fs();
+
+	set_fs(USER_DS);
+	if (access_ok(src, size))
+		ret = probe_read_common(dst, src, size);
+	set_fs(old_fs);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(probe_user_read);
+
 /**
  * probe_kernel_write(): safely attempt to write to a location
  * @dst: address to write to
@@ -66,6 +101,7 @@ long __probe_kernel_write(void *dst, const void *src, size_t size)
 }
 EXPORT_SYMBOL_GPL(probe_kernel_write);
 
+
 /**
  * strncpy_from_unsafe: - Copy a NUL terminated string from unsafe address.
  * @dst:   Destination address, in kernel space.  This buffer must be at
@@ -105,3 +141,72 @@ long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count)
 
 	return ret ? -EFAULT : src - unsafe_addr;
 }
+
+/**
+ * strncpy_from_unsafe_user: - Copy a NUL terminated string from unsafe user
+ *				address.
+ * @dst:   Destination address, in kernel space.  This buffer must be at
+ *         least @count bytes long.
+ * @unsafe_addr: Unsafe user address.
+ * @count: Maximum number of bytes to copy, including the trailing NUL.
+ *
+ * Copies a NUL-terminated string from unsafe user address to kernel buffer.
+ *
+ * On success, returns the length of the string INCLUDING the trailing NUL.
+ *
+ * If access fails, returns -EFAULT (some data may have been copied
+ * and the trailing NUL added).
+ *
+ * If @count is smaller than the length of the string, copies @count-1 bytes,
+ * sets the last byte of @dst buffer to NUL and returns @count.
+ */
+long strncpy_from_unsafe_user(char *dst, const void __user *unsafe_addr,
+			      long count)
+{
+	mm_segment_t old_fs = get_fs();
+	long ret;
+
+	if (unlikely(count <= 0))
+		return 0;
+
+	set_fs(USER_DS);
+	pagefault_disable();
+	ret = strncpy_from_user(dst, unsafe_addr, count);
+	pagefault_enable();
+	set_fs(old_fs);
+	if (ret >= count) {
+		ret = count;
+		dst[ret - 1] = '\0';
+	} else if (ret > 0)
+		ret++;
+	return ret;
+}
+
+/**
+ * strnlen_unsafe_user: - Get the size of a user string INCLUDING final NUL.
+ * @unsafe_addr: The string to measure.
+ * @count: Maximum count (including NUL character)
+ *
+ * Get the size of a NUL-terminated string in user space without pagefault.
+ *
+ * Returns the size of the string INCLUDING the terminating NUL.
+ *
+ * If the string is too long, returns a number larger than @count. User
+ * has to check the return value against "> count".
+ * On exception (or invalid count), returns 0.
+ *
+ * Unlike strnlen_user, this can be used from IRQ handler etc. because
+ * it disables pagefaults.
+ */
+long strnlen_unsafe_user(const void __user *unsafe_addr, long count)
+{
+	mm_segment_t old_fs = get_fs();
+	int ret;
+
+	set_fs(USER_DS);
+	pagefault_disable();
+	ret = strnlen_user(unsafe_addr, count);
+	pagefault_enable();
+	set_fs(old_fs);
+	return ret;
+}


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH v6 3/6] tracing/probe: Add ustring type for user-space string
  2019-03-18  6:42 [RFC PATCH v6 0/6] tracing/probes: uaccess: Add support user-space access Masami Hiramatsu
  2019-03-18  6:43 ` [RFC PATCH v6 1/6] x86/uaccess: Allow access_ok() in irq context if pagefault_disabled Masami Hiramatsu
  2019-03-18  6:43 ` [RFC PATCH v6 2/6] uaccess: Add non-pagefault user-space read functions Masami Hiramatsu
@ 2019-03-18  6:43 ` Masami Hiramatsu
  2019-03-18  6:43 ` [RFC PATCH v6 4/6] tracing/probe: Support user-space dereference Masami Hiramatsu
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Masami Hiramatsu @ 2019-03-18  6:43 UTC (permalink / raw)
  To: Steven Rostedt, Linus Torvalds, Shuah Khan,
	Arnaldo Carvalho de Melo, Peter Zijlstra
  Cc: mhiramat, linux-kernel, Andy Lutomirski, Ingo Molnar,
	Andrew Morton, Changbin Du, Jann Horn, Kees Cook,
	Andy Lutomirski, Alexei Starovoitov, Nadav Amit, Joel Fernandes,
	yhs

Add "ustring" type for fetching user-space string from kprobe event.
User can specify ustring type at uprobe event, and it is same as
"string" for uprobe.

Note that probe-event provides this option but it doesn't choose the
correct type automatically since we have not way to decide the address
is in user-space or not on some arch (and on some other arch, you can
fetch the string by "string" type). So user must carefully check the
target code (e.g. if you see __user on the target variable) and
use this new type.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

---
 Changes in v5:
 - Use strnlen_unsafe_user() in fetch_store_strlen_user().
 Changes in v2:
 - Use strnlen_user() instead of open code for fetch_store_strlen_user().
---
 Documentation/trace/kprobetrace.rst |    9 +++++++--
 kernel/trace/trace.c                |    2 +-
 kernel/trace/trace_kprobe.c         |   29 +++++++++++++++++++++++++++++
 kernel/trace/trace_probe.c          |   14 +++++++++++---
 kernel/trace/trace_probe.h          |    1 +
 kernel/trace/trace_probe_tmpl.h     |   14 +++++++++++++-
 kernel/trace/trace_uprobe.c         |   12 ++++++++++++
 7 files changed, 74 insertions(+), 7 deletions(-)

diff --git a/Documentation/trace/kprobetrace.rst b/Documentation/trace/kprobetrace.rst
index 235ce2ab131a..a3ac7c9ac242 100644
--- a/Documentation/trace/kprobetrace.rst
+++ b/Documentation/trace/kprobetrace.rst
@@ -55,7 +55,8 @@ Synopsis of kprobe_events
   NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
   FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
 		  (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types
-		  (x8/x16/x32/x64), "string" and bitfield are supported.
+		  (x8/x16/x32/x64), "string", "ustring" and bitfield
+		  are supported.
 
   (\*1) only for the probe on function entry (offs == 0).
   (\*2) only for return probe.
@@ -77,7 +78,11 @@ apply it to registers/stack-entries etc. (for example, '$stack1:x8[8]' is
 wrong, but '+8($stack):x8[8]' is OK.)
 String type is a special type, which fetches a "null-terminated" string from
 kernel space. This means it will fail and store NULL if the string container
-has been paged out.
+has been paged out. "ustring" type is an alternative of string for user-space.
+Note that kprobe-event provides string/ustring types, but doesn't change it
+automatically. So user has to decide if the targe string in kernel or in user
+space carefully. On some arch, if you choose wrong one, it always fails to
+record string data.
 The string array type is a bit different from other types. For other base
 types, <base-type>[1] is equal to <base-type> (e.g. +0(%di):x32[1] is same
 as +0(%di):x32.) But string[1] is not equal to string. The string type itself
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 21153e64bf1c..7a6ed76ba104 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4820,7 +4820,7 @@ static const char readme_msg[] =
 	"\t           $stack<index>, $stack, $retval, $comm\n"
 #endif
 	"\t     type: s8/16/32/64, u8/16/32/64, x8/16/32/64, string, symbol,\n"
-	"\t           b<bit-width>@<bit-offset>/<container-size>,\n"
+	"\t           b<bit-width>@<bit-offset>/<container-size>, ustring,\n"
 	"\t           <type>\\[<array-size>\\]\n"
 #ifdef CONFIG_HIST_TRIGGERS
 	"\t    field: <stype> <name>;\n"
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 5d5129b05df7..e346229ddbba 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -871,6 +871,14 @@ fetch_store_strlen(unsigned long addr)
 	return (ret < 0) ? ret : len;
 }
 
+/* Return the length of string -- including null terminal byte */
+static nokprobe_inline int
+fetch_store_strlen_user(unsigned long addr)
+{
+	return strnlen_unsafe_user((__force const void __user *)addr,
+				   MAX_STRING_SIZE);
+}
+
 /*
  * Fetch a null-terminated string. Caller MUST set *(u32 *)buf with max
  * length and relative data location.
@@ -895,6 +903,27 @@ fetch_store_string(unsigned long addr, void *dest, void *base)
 	return ret;
 }
 
+/*
+ * Fetch a null-terminated string from user. Caller MUST set *(u32 *)buf
+ * with max length and relative data location.
+ */
+static nokprobe_inline int
+fetch_store_string_user(unsigned long addr, void *dest, void *base)
+{
+	int maxlen = get_loc_len(*(u32 *)dest);
+	u8 *dst = get_loc_data(dest, base);
+	long ret;
+
+	if (unlikely(!maxlen))
+		return -ENOMEM;
+	ret = strncpy_from_unsafe_user(dst, (__force const void __user *)addr,
+				       maxlen);
+
+	if (ret >= 0)
+		*(u32 *)dest = make_data_loc(ret, (void *)dst - base);
+	return ret;
+}
+
 static nokprobe_inline int
 probe_mem_read(void *dest, void *src, size_t size)
 {
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 8f8411e7835f..30054136cfde 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -73,6 +73,8 @@ static const struct fetch_type probe_fetch_types[] = {
 	/* Special types */
 	__ASSIGN_FETCH_TYPE("string", string, string, sizeof(u32), 1,
 			    "__data_loc char[]"),
+	__ASSIGN_FETCH_TYPE("ustring", string, string, sizeof(u32), 1,
+			    "__data_loc char[]"),
 	/* Basic types */
 	ASSIGN_FETCH_TYPE(u8,  u8,  0),
 	ASSIGN_FETCH_TYPE(u16, u16, 0),
@@ -455,7 +457,8 @@ static int traceprobe_parse_probe_arg_body(char *arg, ssize_t *size,
 		goto fail;
 
 	/* Store operation */
-	if (!strcmp(parg->type->name, "string")) {
+	if (!strcmp(parg->type->name, "string") ||
+	    !strcmp(parg->type->name, "ustring")) {
 		if (code->op != FETCH_OP_DEREF && code->op != FETCH_OP_IMM &&
 		    code->op != FETCH_OP_COMM) {
 			pr_info("string only accepts memory or address.\n");
@@ -474,7 +477,11 @@ static int traceprobe_parse_probe_arg_body(char *arg, ssize_t *size,
 				goto fail;
 			}
 		}
-		code->op = FETCH_OP_ST_STRING;	/* In DEREF case, replace it */
+		/* If op == DEREF, replace it with STRING */
+		if (!strcmp(parg->type->name, "ustring"))
+			code->op = FETCH_OP_ST_USTRING;
+		else
+			code->op = FETCH_OP_ST_STRING;
 		code->size = parg->type->size;
 		parg->dynamic = true;
 	} else if (code->op == FETCH_OP_DEREF) {
@@ -499,7 +506,8 @@ static int traceprobe_parse_probe_arg_body(char *arg, ssize_t *size,
 	/* Loop(Array) operation */
 	if (parg->count) {
 		if (scode->op != FETCH_OP_ST_MEM &&
-		    scode->op != FETCH_OP_ST_STRING) {
+		    scode->op != FETCH_OP_ST_STRING &&
+		    scode->op != FETCH_OP_ST_USTRING) {
 			pr_info("array only accepts memory or address\n");
 			ret = -EINVAL;
 			goto fail;
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 2177c206de15..94cdcfdaced0 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -96,6 +96,7 @@ enum fetch_op {
 	FETCH_OP_ST_RAW,	/* Raw: .size */
 	FETCH_OP_ST_MEM,	/* Mem: .offset, .size */
 	FETCH_OP_ST_STRING,	/* String: .offset, .size */
+	FETCH_OP_ST_USTRING,	/* User String: .offset, .size */
 	// Stage 4 (modify) op
 	FETCH_OP_MOD_BF,	/* Bitfield: .basesize, .lshift, .rshift */
 	// Stage 5 (loop) op
diff --git a/kernel/trace/trace_probe_tmpl.h b/kernel/trace/trace_probe_tmpl.h
index 4737bb8c07a3..7526f6f8d7b0 100644
--- a/kernel/trace/trace_probe_tmpl.h
+++ b/kernel/trace/trace_probe_tmpl.h
@@ -59,6 +59,9 @@ process_fetch_insn(struct fetch_insn *code, struct pt_regs *regs,
 static nokprobe_inline int fetch_store_strlen(unsigned long addr);
 static nokprobe_inline int
 fetch_store_string(unsigned long addr, void *dest, void *base);
+static nokprobe_inline int fetch_store_strlen_user(unsigned long addr);
+static nokprobe_inline int
+fetch_store_string_user(unsigned long addr, void *dest, void *base);
 static nokprobe_inline int
 probe_mem_read(void *dest, void *src, size_t size);
 
@@ -91,6 +94,10 @@ process_fetch_insn_bottom(struct fetch_insn *code, unsigned long val,
 			ret += fetch_store_strlen(val + code->offset);
 			code++;
 			goto array;
+		} else if (code->op == FETCH_OP_ST_USTRING) {
+			ret += fetch_store_strlen_user(val + code->offset);
+			code++;
+			goto array;
 		} else
 			return -EILSEQ;
 	}
@@ -106,6 +113,10 @@ process_fetch_insn_bottom(struct fetch_insn *code, unsigned long val,
 		loc = *(u32 *)dest;
 		ret = fetch_store_string(val + code->offset, dest, base);
 		break;
+	case FETCH_OP_ST_USTRING:
+		loc = *(u32 *)dest;
+		ret = fetch_store_string_user(val + code->offset, dest, base);
+		break;
 	default:
 		return -EILSEQ;
 	}
@@ -123,7 +134,8 @@ process_fetch_insn_bottom(struct fetch_insn *code, unsigned long val,
 		total += ret;
 		if (++i < code->param) {
 			code = s3;
-			if (s3->op != FETCH_OP_ST_STRING) {
+			if (s3->op != FETCH_OP_ST_STRING &&
+			    s3->op != FETCH_OP_ST_USTRING) {
 				dest += s3->size;
 				val += s3->size;
 				goto stage3;
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index be78d99ee6bc..f4e37c4f8a21 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -173,6 +173,12 @@ fetch_store_string(unsigned long addr, void *dest, void *base)
 	return ret;
 }
 
+static nokprobe_inline int
+fetch_store_string_user(unsigned long addr, void *dest, void *base)
+{
+	return fetch_store_string(addr, dest, base);
+}
+
 /* Return the length of string -- including null terminal byte */
 static nokprobe_inline int
 fetch_store_strlen(unsigned long addr)
@@ -185,6 +191,12 @@ fetch_store_strlen(unsigned long addr)
 	return (len > MAX_STRING_SIZE) ? 0 : len;
 }
 
+static nokprobe_inline int
+fetch_store_strlen_user(unsigned long addr)
+{
+	return fetch_store_strlen(addr);
+}
+
 static unsigned long translate_user_vaddr(unsigned long file_offset)
 {
 	unsigned long base_addr;


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH v6 4/6] tracing/probe: Support user-space dereference
  2019-03-18  6:42 [RFC PATCH v6 0/6] tracing/probes: uaccess: Add support user-space access Masami Hiramatsu
                   ` (2 preceding siblings ...)
  2019-03-18  6:43 ` [RFC PATCH v6 3/6] tracing/probe: Add ustring type for user-space string Masami Hiramatsu
@ 2019-03-18  6:43 ` Masami Hiramatsu
  2019-05-06 15:52   ` Steven Rostedt
  2019-03-18  6:44 ` [RFC PATCH v6 5/6] selftests/ftrace: Add user-memory access syntax testcase Masami Hiramatsu
  2019-03-18  6:44 ` [RFC PATCH v6 6/6] perf-probe: Add user memory access attribute support Masami Hiramatsu
  5 siblings, 1 reply; 14+ messages in thread
From: Masami Hiramatsu @ 2019-03-18  6:43 UTC (permalink / raw)
  To: Steven Rostedt, Linus Torvalds, Shuah Khan,
	Arnaldo Carvalho de Melo, Peter Zijlstra
  Cc: mhiramat, linux-kernel, Andy Lutomirski, Ingo Molnar,
	Andrew Morton, Changbin Du, Jann Horn, Kees Cook,
	Andy Lutomirski, Alexei Starovoitov, Nadav Amit, Joel Fernandes,
	yhs

Support user-space dereference syntax for probe event arguments
to dereference the data-structure or array in user-space.

The syntax is just adding 'u' before an offset value.

 +|-u<OFFSET>(<FETCHARG>)

e.g. +u8(%ax), +u0(+0(%si))

For example, if you probe do_sched_setscheduler(pid, policy,
param) and record param->sched_priority, you can add new
probe as below;

 p do_sched_setscheduler priority=+u0($arg3)

Note that kprobe event provides this and it doesn't change the
dereference method automatically because we do not know whether
the given address is in userspace or kernel on some arch.

So as same as "ustring", this is an option for user, who has to
carefully choose the dereference method.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 Changes in v4:
  - Fix a bug for parsing argument and simplify code.
  - Fix documents accroding to Steve's comment.
 Changes in v3:
  - Add a section for user memory access to the document.
---
 Documentation/trace/kprobetrace.rst  |   27 ++++++++++++++++++++++-----
 Documentation/trace/uprobetracer.rst |    9 +++++----
 kernel/trace/trace.c                 |    5 +++--
 kernel/trace/trace_kprobe.c          |    6 ++++++
 kernel/trace/trace_probe.c           |   25 ++++++++++++++++++-------
 kernel/trace/trace_probe.h           |    2 ++
 kernel/trace/trace_probe_tmpl.h      |   23 ++++++++++++++++++-----
 kernel/trace/trace_uprobe.c          |    7 +++++++
 8 files changed, 81 insertions(+), 23 deletions(-)

diff --git a/Documentation/trace/kprobetrace.rst b/Documentation/trace/kprobetrace.rst
index a3ac7c9ac242..0639ae492932 100644
--- a/Documentation/trace/kprobetrace.rst
+++ b/Documentation/trace/kprobetrace.rst
@@ -51,7 +51,7 @@ Synopsis of kprobe_events
   $argN		: Fetch the Nth function argument. (N >= 1) (\*1)
   $retval	: Fetch return value.(\*2)
   $comm		: Fetch current task comm.
-  +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(\*3)
+  +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*3)(\*4)
   NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
   FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
 		  (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types
@@ -61,6 +61,7 @@ Synopsis of kprobe_events
   (\*1) only for the probe on function entry (offs == 0).
   (\*2) only for return probe.
   (\*3) this is useful for fetching a field of data structures.
+  (\*4) "u" means user-space dereference. See :ref:`user_mem_access`.
 
 Types
 -----
@@ -79,10 +80,7 @@ wrong, but '+8($stack):x8[8]' is OK.)
 String type is a special type, which fetches a "null-terminated" string from
 kernel space. This means it will fail and store NULL if the string container
 has been paged out. "ustring" type is an alternative of string for user-space.
-Note that kprobe-event provides string/ustring types, but doesn't change it
-automatically. So user has to decide if the targe string in kernel or in user
-space carefully. On some arch, if you choose wrong one, it always fails to
-record string data.
+See :ref:`user_mem_access` for more info..
 The string array type is a bit different from other types. For other base
 types, <base-type>[1] is equal to <base-type> (e.g. +0(%di):x32[1] is same
 as +0(%di):x32.) But string[1] is not equal to string. The string type itself
@@ -97,6 +95,25 @@ Symbol type('symbol') is an alias of u32 or u64 type (depends on BITS_PER_LONG)
 which shows given pointer in "symbol+offset" style.
 For $comm, the default type is "string"; any other type is invalid.
 
+.. _user_mem_access:
+User Memory Access
+------------------
+Kprobe events supports user-space memory access. For that purpose, you can use
+either user-space dereference syntax or 'ustring' type.
+
+The user-space dereference syntax allows you to access a field of a data
+structure in user-space. This is done by adding the "u" prefix to the
+dereference syntax. For example, +u4(%si) means it will read memory from the
+address in the register %si offset by 4, and the mory is expected to be in
+user-space. You can use this for strings too, e.g. +u0(%si):string will read
+a string from the address in the register %si that is expected to be in user-
+space. 'ustring' is a shortcut way of performing the same task. That is,
++0(%si):ustring is equivalent to +u0(%si):string.
+
+Note that kprobe-event provides the user-memory access syntax but it doesn't
+use it transparently. This means if you use normal dereference or string type
+for user memory, it might fail, and always fails on some arch. So user has to
+check if the targe data is in kernel or in user space carefully.
 
 Per-Probe Event Filtering
 -------------------------
diff --git a/Documentation/trace/uprobetracer.rst b/Documentation/trace/uprobetracer.rst
index 4346e23e3ae7..de8812c932bc 100644
--- a/Documentation/trace/uprobetracer.rst
+++ b/Documentation/trace/uprobetracer.rst
@@ -42,16 +42,17 @@ Synopsis of uprobe_tracer
    @+OFFSET	: Fetch memory at OFFSET (OFFSET from same file as PATH)
    $stackN	: Fetch Nth entry of stack (N >= 0)
    $stack	: Fetch stack address.
-   $retval	: Fetch return value.(*)
+   $retval	: Fetch return value.(\*1)
    $comm	: Fetch current task comm.
-   +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(**)
+   +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*2)(\*3)
    NAME=FETCHARG     : Set NAME as the argument name of FETCHARG.
    FETCHARG:TYPE     : Set TYPE as the type of FETCHARG. Currently, basic types
 		       (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types
 		       (x8/x16/x32/x64), "string" and bitfield are supported.
 
-  (*) only for return probe.
-  (**) this is useful for fetching a field of data structures.
+  (\*1) only for return probe.
+  (\*2) this is useful for fetching a field of data structures.
+  (\*3) Unlike kprobe event, "u" prefix will just be ignored.
 
 Types
 -----
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 7a6ed76ba104..b595d5ef099a 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4815,10 +4815,11 @@ static const char readme_msg[] =
 	"\t     args: <name>=fetcharg[:type]\n"
 	"\t fetcharg: %<register>, @<address>, @<symbol>[+|-<offset>],\n"
 #ifdef CONFIG_HAVE_FUNCTION_ARG_ACCESS_API
-	"\t           $stack<index>, $stack, $retval, $comm, $arg<N>\n"
+	"\t           $stack<index>, $stack, $retval, $comm, $arg<N>,\n"
 #else
-	"\t           $stack<index>, $stack, $retval, $comm\n"
+	"\t           $stack<index>, $stack, $retval, $comm,\n"
 #endif
+	"\t           +|-[u]<offset>(<fetcharg>)\n"
 	"\t     type: s8/16/32/64, u8/16/32/64, x8/16/32/64, string, symbol,\n"
 	"\t           b<bit-width>@<bit-offset>/<container-size>, ustring,\n"
 	"\t           <type>\\[<array-size>\\]\n"
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index e346229ddbba..9456f4ca3b8a 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -930,6 +930,12 @@ probe_mem_read(void *dest, void *src, size_t size)
 	return probe_kernel_read(dest, src, size);
 }
 
+static nokprobe_inline int
+probe_mem_read_user(void *dest, void *src, size_t size)
+{
+	return probe_user_read(dest, src, size);
+}
+
 /* Note that we don't verify it, since the code does not come from user space */
 static int
 process_fetch_insn(struct fetch_insn *code, struct pt_regs *regs, void *dest,
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 30054136cfde..00771e7b6ef8 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -253,6 +253,7 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
 {
 	struct fetch_insn *code = *pcode;
 	unsigned long param;
+	int deref = FETCH_OP_DEREF;
 	long offset = 0;
 	char *tmp;
 	int ret = 0;
@@ -315,9 +316,14 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
 		break;
 
 	case '+':	/* deref memory */
-		arg++;	/* Skip '+', because kstrtol() rejects it. */
-		/* fall through */
 	case '-':
+		if (arg[1] == 'u') {
+			deref = FETCH_OP_UDEREF;
+			arg[1] = arg[0];
+			arg++;
+		}
+		if (arg[0] == '+')
+			arg++;	/* Skip '+', because kstrtol() rejects it. */
 		tmp = strchr(arg, '(');
 		if (!tmp)
 			return -EINVAL;
@@ -343,7 +349,7 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
 				return -E2BIG;
 			*pcode = code;
 
-			code->op = FETCH_OP_DEREF;
+			code->op = deref;
 			code->offset = offset;
 		}
 		break;
@@ -459,13 +465,14 @@ static int traceprobe_parse_probe_arg_body(char *arg, ssize_t *size,
 	/* Store operation */
 	if (!strcmp(parg->type->name, "string") ||
 	    !strcmp(parg->type->name, "ustring")) {
-		if (code->op != FETCH_OP_DEREF && code->op != FETCH_OP_IMM &&
-		    code->op != FETCH_OP_COMM) {
+		if (code->op != FETCH_OP_DEREF && code->op != FETCH_OP_UDEREF
+		    && code->op != FETCH_OP_IMM && code->op != FETCH_OP_COMM) {
 			pr_info("string only accepts memory or address.\n");
 			ret = -EINVAL;
 			goto fail;
 		}
-		if (code->op != FETCH_OP_DEREF || parg->count) {
+		if ((code->op == FETCH_OP_IMM || code->op == FETCH_OP_COMM)
+		    || parg->count) {
 			/*
 			 * IMM and COMM is pointing actual address, those must
 			 * be kept, and if parg->count != 0, this is an array
@@ -478,7 +485,8 @@ static int traceprobe_parse_probe_arg_body(char *arg, ssize_t *size,
 			}
 		}
 		/* If op == DEREF, replace it with STRING */
-		if (!strcmp(parg->type->name, "ustring"))
+		if (!strcmp(parg->type->name, "ustring") ||
+		    code->op == FETCH_OP_UDEREF)
 			code->op = FETCH_OP_ST_USTRING;
 		else
 			code->op = FETCH_OP_ST_STRING;
@@ -487,6 +495,9 @@ static int traceprobe_parse_probe_arg_body(char *arg, ssize_t *size,
 	} else if (code->op == FETCH_OP_DEREF) {
 		code->op = FETCH_OP_ST_MEM;
 		code->size = parg->type->size;
+	} else if (code->op == FETCH_OP_UDEREF) {
+		code->op = FETCH_OP_ST_UMEM;
+		code->size = parg->type->size;
 	} else {
 		code++;
 		if (code->op != FETCH_OP_NOP) {
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 94cdcfdaced0..0feac0a81f82 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -92,9 +92,11 @@ enum fetch_op {
 	FETCH_OP_FOFFS,		/* File offset: .immediate */
 	// Stage 2 (dereference) op
 	FETCH_OP_DEREF,		/* Dereference: .offset */
+	FETCH_OP_UDEREF,	/* User-space Dereference: .offset */
 	// Stage 3 (store) ops
 	FETCH_OP_ST_RAW,	/* Raw: .size */
 	FETCH_OP_ST_MEM,	/* Mem: .offset, .size */
+	FETCH_OP_ST_UMEM,	/* Mem: .offset, .size */
 	FETCH_OP_ST_STRING,	/* String: .offset, .size */
 	FETCH_OP_ST_USTRING,	/* User String: .offset, .size */
 	// Stage 4 (modify) op
diff --git a/kernel/trace/trace_probe_tmpl.h b/kernel/trace/trace_probe_tmpl.h
index 7526f6f8d7b0..06f2d901c4cf 100644
--- a/kernel/trace/trace_probe_tmpl.h
+++ b/kernel/trace/trace_probe_tmpl.h
@@ -64,6 +64,8 @@ static nokprobe_inline int
 fetch_store_string_user(unsigned long addr, void *dest, void *base);
 static nokprobe_inline int
 probe_mem_read(void *dest, void *src, size_t size);
+static nokprobe_inline int
+probe_mem_read_user(void *dest, void *src, size_t size);
 
 /* From the 2nd stage, routine is same */
 static nokprobe_inline int
@@ -77,14 +79,21 @@ process_fetch_insn_bottom(struct fetch_insn *code, unsigned long val,
 
 stage2:
 	/* 2nd stage: dereference memory if needed */
-	while (code->op == FETCH_OP_DEREF) {
-		lval = val;
-		ret = probe_mem_read(&val, (void *)val + code->offset,
-					sizeof(val));
+	do {
+		if (code->op == FETCH_OP_DEREF) {
+			lval = val;
+			ret = probe_mem_read(&val, (void *)val + code->offset,
+					     sizeof(val));
+		} else if (code->op == FETCH_OP_UDEREF) {
+			lval = val;
+			ret = probe_mem_read_user(&val,
+				 (void *)val + code->offset, sizeof(val));
+		} else
+			break;
 		if (ret)
 			return ret;
 		code++;
-	}
+	} while (1);
 
 	s3 = code;
 stage3:
@@ -109,6 +118,10 @@ process_fetch_insn_bottom(struct fetch_insn *code, unsigned long val,
 	case FETCH_OP_ST_MEM:
 		probe_mem_read(dest, (void *)val + code->offset, code->size);
 		break;
+	case FETCH_OP_ST_UMEM:
+		probe_mem_read_user(dest, (void *)val + code->offset,
+				    code->size);
+		break;
 	case FETCH_OP_ST_STRING:
 		loc = *(u32 *)dest;
 		ret = fetch_store_string(val + code->offset, dest, base);
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index f4e37c4f8a21..5bc8c3686f6f 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -140,6 +140,13 @@ probe_mem_read(void *dest, void *src, size_t size)
 
 	return copy_from_user(dest, vaddr, size) ? -EFAULT : 0;
 }
+
+static nokprobe_inline int
+probe_mem_read_user(void *dest, void *src, size_t size)
+{
+	return probe_mem_read(dest, src, size);
+}
+
 /*
  * Fetch a null-terminated string. Caller MUST set *(u32 *)dest with max
  * length and relative data location.


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH v6 5/6] selftests/ftrace: Add user-memory access syntax testcase
  2019-03-18  6:42 [RFC PATCH v6 0/6] tracing/probes: uaccess: Add support user-space access Masami Hiramatsu
                   ` (3 preceding siblings ...)
  2019-03-18  6:43 ` [RFC PATCH v6 4/6] tracing/probe: Support user-space dereference Masami Hiramatsu
@ 2019-03-18  6:44 ` Masami Hiramatsu
  2019-03-18  6:44 ` [RFC PATCH v6 6/6] perf-probe: Add user memory access attribute support Masami Hiramatsu
  5 siblings, 0 replies; 14+ messages in thread
From: Masami Hiramatsu @ 2019-03-18  6:44 UTC (permalink / raw)
  To: Steven Rostedt, Linus Torvalds, Shuah Khan,
	Arnaldo Carvalho de Melo, Peter Zijlstra
  Cc: mhiramat, linux-kernel, Andy Lutomirski, Ingo Molnar,
	Andrew Morton, Changbin Du, Jann Horn, Kees Cook,
	Andy Lutomirski, Alexei Starovoitov, Nadav Amit, Joel Fernandes,
	yhs

Add a user-memory access syntax testcase which checks
new user-memory access syntax and ustring type.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 Changes in v6:
  - Add $argN availability check
---
 .../ftrace/test.d/kprobe/kprobe_args_user.tc       |   32 ++++++++++++++++++++
 1 file changed, 32 insertions(+)
 create mode 100644 tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_user.tc

diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_user.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_user.tc
new file mode 100644
index 000000000000..0f60087583d8
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_user.tc
@@ -0,0 +1,32 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Kprobe event user-memory access
+
+[ -f kprobe_events ] || exit_unsupported # this is configurable
+
+grep -q '\$arg<N>' README || exit_unresolved # depends on arch
+grep -A10 "fetcharg:" README | grep -q 'ustring' || exit_unsupported
+grep -A10 "fetcharg:" README | grep -q '\[u\]<offset>' || exit_unsupported
+
+:;: "user-memory access syntax and ustring working on user memory";:
+echo 'p:myevent do_sys_open path=+0($arg2):ustring path2=+u0($arg2):string' \
+	> kprobe_events
+
+grep myevent kprobe_events | \
+	grep -q 'path=+0($arg2):ustring path2=+u0($arg2):string'
+echo 1 > events/kprobes/myevent/enable
+echo > /dev/null
+echo 0 > events/kprobes/myevent/enable
+
+grep myevent trace | grep -q 'path="/dev/null" path2="/dev/null"'
+
+:;: "user-memory access syntax and ustring not working with kernel memory";:
+echo 'p:myevent vfs_symlink path=+0($arg3):ustring path2=+u0($arg3):string' \
+	> kprobe_events
+echo 1 > events/kprobes/myevent/enable
+ln -s foo $TMPDIR/bar
+echo 0 > events/kprobes/myevent/enable
+
+grep myevent trace | grep -q 'path=(fault) path2=(fault)'
+
+exit 0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH v6 6/6] perf-probe: Add user memory access attribute support
  2019-03-18  6:42 [RFC PATCH v6 0/6] tracing/probes: uaccess: Add support user-space access Masami Hiramatsu
                   ` (4 preceding siblings ...)
  2019-03-18  6:44 ` [RFC PATCH v6 5/6] selftests/ftrace: Add user-memory access syntax testcase Masami Hiramatsu
@ 2019-03-18  6:44 ` Masami Hiramatsu
  5 siblings, 0 replies; 14+ messages in thread
From: Masami Hiramatsu @ 2019-03-18  6:44 UTC (permalink / raw)
  To: Steven Rostedt, Linus Torvalds, Shuah Khan,
	Arnaldo Carvalho de Melo, Peter Zijlstra
  Cc: mhiramat, linux-kernel, Andy Lutomirski, Ingo Molnar,
	Andrew Morton, Changbin Du, Jann Horn, Kees Cook,
	Andy Lutomirski, Alexei Starovoitov, Nadav Amit, Joel Fernandes,
	yhs

Add user memory access attribute for kprobe event arguments.
If a given 'local variable' is in user-space, User can
specify memory access method by '@user' suffix. This is
not only for string but also for data structure.

If we access a field of data structure in user memory from
kernel on some arch, it will fail. e.g.

 perf probe -a "sched_setscheduler param->sched_priority"

This will fail to access the "param->sched_priority" because
the param is __user pointer. Instead, we can now specify
@user suffix for such argument.

 perf probe -a "sched_setscheduler param->sched_priority@user"

Note that kernel memory access with "@user" must always fail
on any arch.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 tools/perf/Documentation/perf-probe.txt |    3 ++-
 tools/perf/util/probe-event.c           |   11 +++++++++++
 tools/perf/util/probe-event.h           |    2 ++
 tools/perf/util/probe-file.c            |    7 +++++++
 tools/perf/util/probe-file.h            |    1 +
 tools/perf/util/probe-finder.c          |   19 ++++++++++++-------
 6 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/tools/perf/Documentation/perf-probe.txt b/tools/perf/Documentation/perf-probe.txt
index b6866a05edd2..ed3ecfa422e1 100644
--- a/tools/perf/Documentation/perf-probe.txt
+++ b/tools/perf/Documentation/perf-probe.txt
@@ -194,12 +194,13 @@ PROBE ARGUMENT
 --------------
 Each probe argument follows below syntax.
 
- [NAME=]LOCALVAR|$retval|%REG|@SYMBOL[:TYPE]
+ [NAME=]LOCALVAR|$retval|%REG|@SYMBOL[:TYPE][@user]
 
 'NAME' specifies the name of this argument (optional). You can use the name of local variable, local data structure member (e.g. var->field, var.field2), local array with fixed index (e.g. array[1], var->array[0], var->pointer[2]), or kprobe-tracer argument format (e.g. $retval, %ax, etc). Note that the name of this argument will be set as the last member name if you specify a local data structure member (e.g. field2 for 'var->field1.field2'.)
 '$vars' and '$params' special arguments are also available for NAME, '$vars' is expanded to the local variables (including function parameters) which can access at given probe point. '$params' is expanded to only the function parameters.
 'TYPE' casts the type of this argument (optional). If omitted, perf probe automatically set the type based on debuginfo (*). Currently, basic types (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal integers (x/x8/x16/x32/x64), signedness casting (u/s), "string" and bitfield are supported. (see TYPES for detail)
 On x86 systems %REG is always the short form of the register: for example %AX. %RAX or %EAX is not valid.
+"@user" is a special attribute which means the LOCALVAR will be treated as a user-space memory. This is only valid for kprobe event.
 
 TYPES
 -----
diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index a1b8d9649ca7..845109aad686 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -1575,6 +1575,17 @@ static int parse_perf_probe_arg(char *str, struct perf_probe_arg *arg)
 		str = tmp + 1;
 	}
 
+	tmp = strchr(str, '@');
+	if (tmp && tmp != str && strcmp(tmp + 1, "user")) { /* user attr */
+		if (!user_access_is_supported()) {
+			semantic_error("ftrace does not support user access\n");
+			return -EINVAL;
+		}
+		*tmp = '\0';
+		arg->user = true;
+		pr_debug("user ");
+	}
+
 	tmp = strchr(str, ':');
 	if (tmp) {	/* Type setting */
 		*tmp = '\0';
diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h
index 05c8d571a901..155f8741b32e 100644
--- a/tools/perf/util/probe-event.h
+++ b/tools/perf/util/probe-event.h
@@ -37,6 +37,7 @@ struct probe_trace_point {
 struct probe_trace_arg_ref {
 	struct probe_trace_arg_ref	*next;	/* Next reference */
 	long				offset;	/* Offset value */
+	bool				user;	/* User-memory access */
 };
 
 /* kprobe-tracer and uprobe-tracer tracing argument */
@@ -82,6 +83,7 @@ struct perf_probe_arg {
 	char				*var;	/* Variable name */
 	char				*type;	/* Type name */
 	struct perf_probe_arg_field	*field;	/* Structure fields */
+	bool				user;	/* User-memory */
 };
 
 /* Perf probe probing event (point + arg) */
diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c
index 4062bc4412a9..89ce1a9c3798 100644
--- a/tools/perf/util/probe-file.c
+++ b/tools/perf/util/probe-file.c
@@ -1015,6 +1015,7 @@ enum ftrace_readme {
 	FTRACE_README_PROBE_TYPE_X = 0,
 	FTRACE_README_KRETPROBE_OFFSET,
 	FTRACE_README_UPROBE_REF_CTR,
+	FTRACE_README_USER_ACCESS,
 	FTRACE_README_END,
 };
 
@@ -1027,6 +1028,7 @@ static struct {
 	DEFINE_TYPE(FTRACE_README_PROBE_TYPE_X, "*type: * x8/16/32/64,*"),
 	DEFINE_TYPE(FTRACE_README_KRETPROBE_OFFSET, "*place (kretprobe): *"),
 	DEFINE_TYPE(FTRACE_README_UPROBE_REF_CTR, "*ref_ctr_offset*"),
+	DEFINE_TYPE(FTRACE_README_USER_ACCESS, "*[u]<offset>*"),
 };
 
 static bool scan_ftrace_readme(enum ftrace_readme type)
@@ -1087,3 +1089,8 @@ bool uprobe_ref_ctr_is_supported(void)
 {
 	return scan_ftrace_readme(FTRACE_README_UPROBE_REF_CTR);
 }
+
+bool user_access_is_supported(void)
+{
+	return scan_ftrace_readme(FTRACE_README_USER_ACCESS);
+}
diff --git a/tools/perf/util/probe-file.h b/tools/perf/util/probe-file.h
index 2a249182f2a6..986c1c94f64f 100644
--- a/tools/perf/util/probe-file.h
+++ b/tools/perf/util/probe-file.h
@@ -70,6 +70,7 @@ int probe_cache__show_all_caches(struct strfilter *filter);
 bool probe_type_is_available(enum probe_type type);
 bool kretprobe_offset_is_supported(void);
 bool uprobe_ref_ctr_is_supported(void);
+bool user_access_is_supported(void);
 #else	/* ! HAVE_LIBELF_SUPPORT */
 static inline struct probe_cache *probe_cache__new(const char *tgt __maybe_unused, struct nsinfo *nsi __maybe_unused)
 {
diff --git a/tools/perf/util/probe-finder.c b/tools/perf/util/probe-finder.c
index c37fbef1711d..390a69dd0887 100644
--- a/tools/perf/util/probe-finder.c
+++ b/tools/perf/util/probe-finder.c
@@ -294,7 +294,7 @@ static int convert_variable_location(Dwarf_Die *vr_die, Dwarf_Addr addr,
 
 static int convert_variable_type(Dwarf_Die *vr_die,
 				 struct probe_trace_arg *tvar,
-				 const char *cast)
+				 const char *cast, bool user)
 {
 	struct probe_trace_arg_ref **ref_ptr = &tvar->ref;
 	Dwarf_Die type;
@@ -334,7 +334,8 @@ static int convert_variable_type(Dwarf_Die *vr_die,
 	pr_debug("%s type is %s.\n",
 		 dwarf_diename(vr_die), dwarf_diename(&type));
 
-	if (cast && strcmp(cast, "string") == 0) {	/* String type */
+	if (cast && (!strcmp(cast, "string") || !strcmp(cast, "ustring"))) {
+		/* String type */
 		ret = dwarf_tag(&type);
 		if (ret != DW_TAG_pointer_type &&
 		    ret != DW_TAG_array_type) {
@@ -357,6 +358,7 @@ static int convert_variable_type(Dwarf_Die *vr_die,
 				pr_warning("Out of memory error\n");
 				return -ENOMEM;
 			}
+			(*ref_ptr)->user = user;
 		}
 		if (!die_compare_name(&type, "char") &&
 		    !die_compare_name(&type, "unsigned char")) {
@@ -411,7 +413,7 @@ static int convert_variable_type(Dwarf_Die *vr_die,
 static int convert_variable_fields(Dwarf_Die *vr_die, const char *varname,
 				    struct perf_probe_arg_field *field,
 				    struct probe_trace_arg_ref **ref_ptr,
-				    Dwarf_Die *die_mem)
+				    Dwarf_Die *die_mem, bool user)
 {
 	struct probe_trace_arg_ref *ref = *ref_ptr;
 	Dwarf_Die type;
@@ -448,6 +450,7 @@ static int convert_variable_fields(Dwarf_Die *vr_die, const char *varname,
 				*ref_ptr = ref;
 		}
 		ref->offset += dwarf_bytesize(&type) * field->index;
+		ref->user = user;
 		goto next;
 	} else if (tag == DW_TAG_pointer_type) {
 		/* Check the pointer and dereference */
@@ -519,17 +522,18 @@ static int convert_variable_fields(Dwarf_Die *vr_die, const char *varname,
 		}
 	}
 	ref->offset += (long)offs;
+	ref->user = user;
 
 	/* If this member is unnamed, we need to reuse this field */
 	if (!dwarf_diename(die_mem))
 		return convert_variable_fields(die_mem, varname, field,
-						&ref, die_mem);
+						&ref, die_mem, user);
 
 next:
 	/* Converting next field */
 	if (field->next)
 		return convert_variable_fields(die_mem, field->name,
-					field->next, &ref, die_mem);
+					field->next, &ref, die_mem, user);
 	else
 		return 0;
 }
@@ -555,11 +559,12 @@ static int convert_variable(Dwarf_Die *vr_die, struct probe_finder *pf)
 	else if (ret == 0 && pf->pvar->field) {
 		ret = convert_variable_fields(vr_die, pf->pvar->var,
 					      pf->pvar->field, &pf->tvar->ref,
-					      &die_mem);
+					      &die_mem, pf->pvar->user);
 		vr_die = &die_mem;
 	}
 	if (ret == 0)
-		ret = convert_variable_type(vr_die, pf->tvar, pf->pvar->type);
+		ret = convert_variable_type(vr_die, pf->tvar, pf->pvar->type,
+					    pf->pvar->user);
 	/* *expr will be cached in libdw. Don't free it. */
 	return ret;
 }


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v6 1/6] x86/uaccess: Allow access_ok() in irq context if pagefault_disabled
  2019-03-18  6:43 ` [RFC PATCH v6 1/6] x86/uaccess: Allow access_ok() in irq context if pagefault_disabled Masami Hiramatsu
@ 2019-03-22  2:46   ` Steven Rostedt
  2019-05-06 15:22     ` Masami Hiramatsu
  0 siblings, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2019-03-22  2:46 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Linus Torvalds, Shuah Khan, Arnaldo Carvalho de Melo,
	Peter Zijlstra, linux-kernel, Andy Lutomirski, Ingo Molnar,
	Andrew Morton, Changbin Du, Jann Horn, Kees Cook,
	Andy Lutomirski, Alexei Starovoitov, Nadav Amit, Joel Fernandes,
	yhs

On Mon, 18 Mar 2019 15:43:17 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> WARN_ON_IN_IRQ() assumes that the access_ok() and following
> user memory access can sleep. But this assumption is not
> always correct; when the pagefault is disabled, following
> memory access will just returns -EFAULT and never sleep.
> 
> Add pagefault_disabled() check in WARN_ON_ONCE() so that
> it can ignore the case we call it with disabling pagefault.
> For this purpose, this modified pagefault_disabled() as
> an inline function.
> 

Actually, accessing user space from an interrupt doesn't really make
sense. Now I'm differentiating a true interrupt (like a device handler)
from an exception. The difference is that an exception is synchronous
with the execution of the code, but an interrupt is something where you
don't know what task is running. A uaccess in this type of interrupt
will randomly grab some user space memory but have no idea what task is
running.

The one time this makes sense is if you are doing some kind of
profiling, where the randomness is fine.

I'm curious, what interrupt handler are kprobes executing in that needs
random user space addresses?

-- Steve

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v6 1/6] x86/uaccess: Allow access_ok() in irq context if pagefault_disabled
  2019-03-22  2:46   ` Steven Rostedt
@ 2019-05-06 15:22     ` Masami Hiramatsu
  2019-05-06 15:39       ` Steven Rostedt
  0 siblings, 1 reply; 14+ messages in thread
From: Masami Hiramatsu @ 2019-05-06 15:22 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Linus Torvalds, Shuah Khan, Arnaldo Carvalho de Melo,
	Peter Zijlstra, linux-kernel, Andy Lutomirski, Ingo Molnar,
	Andrew Morton, Changbin Du, Jann Horn, Kees Cook,
	Andy Lutomirski, Alexei Starovoitov, Nadav Amit, Joel Fernandes,
	yhs

Hi Steve,

It seems I missed this message...

On Thu, 21 Mar 2019 22:46:02 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Mon, 18 Mar 2019 15:43:17 +0900
> Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> > WARN_ON_IN_IRQ() assumes that the access_ok() and following
> > user memory access can sleep. But this assumption is not
> > always correct; when the pagefault is disabled, following
> > memory access will just returns -EFAULT and never sleep.
> > 
> > Add pagefault_disabled() check in WARN_ON_ONCE() so that
> > it can ignore the case we call it with disabling pagefault.
> > For this purpose, this modified pagefault_disabled() as
> > an inline function.
> > 
> 
> Actually, accessing user space from an interrupt doesn't really make
> sense. Now I'm differentiating a true interrupt (like a device handler)
> from an exception. The difference is that an exception is synchronous
> with the execution of the code, but an interrupt is something where you
> don't know what task is running. A uaccess in this type of interrupt
> will randomly grab some user space memory but have no idea what task is
> running.

I see. Would you mean the title is incorrect?

> 
> The one time this makes sense is if you are doing some kind of
> profiling, where the randomness is fine.

Agreed.

> 
> I'm curious, what interrupt handler are kprobes executing in that needs
> random user space addresses?

Sorry for confusion. Kprobes is using an exception (of course!). So the
title can mislead, it should be "in exception" instead of "in irq context",
However, current code checks it by "!in_task()", which includes both of
IRQ and exception. A better solution might change it to "in_irq()".

However, I could not find a way to distinguish the "exception" and
"external IRQ" by the execution context (based on the preempt count)
because exception is treated as a kind of IRQ.
Thus, in this patch, I changed it as not only checking what the context
is, but also whether it is appropriately called.


Thank you,


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v6 1/6] x86/uaccess: Allow access_ok() in irq context if pagefault_disabled
  2019-05-06 15:22     ` Masami Hiramatsu
@ 2019-05-06 15:39       ` Steven Rostedt
  0 siblings, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2019-05-06 15:39 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Linus Torvalds, Shuah Khan, Arnaldo Carvalho de Melo,
	Peter Zijlstra, linux-kernel, Andy Lutomirski, Ingo Molnar,
	Andrew Morton, Changbin Du, Jann Horn, Kees Cook,
	Andy Lutomirski, Alexei Starovoitov, Nadav Amit, Joel Fernandes,
	yhs

On Tue, 7 May 2019 00:22:03 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> Hi Steve,
> 
> It seems I missed this message...

No problem. The number of times I missed messages... ugh.

> 
> > 
> > I'm curious, what interrupt handler are kprobes executing in that needs
> > random user space addresses?  
> 
> Sorry for confusion. Kprobes is using an exception (of course!). So the
> title can mislead, it should be "in exception" instead of "in irq context",
> However, current code checks it by "!in_task()", which includes both of
> IRQ and exception. A better solution might change it to "in_irq()".

That makes sense.

> 
> However, I could not find a way to distinguish the "exception" and
> "external IRQ" by the execution context (based on the preempt count)
> because exception is treated as a kind of IRQ.
> Thus, in this patch, I changed it as not only checking what the context
> is, but also whether it is appropriately called.
> 

As exceptions typically disable interrupts, we treat them as their own
context. Especially for looking at recursion detection algorithms,
which allow for different contexts to recurse.

Normal-context -> softirq -> exception / IRQ -> NMI


Anyway, that WARN_ON_IN_IRQ() should come with a big comment about why
we allow it if we have pagefault_disable() set.

This will need to go through the x86 maintainers. I'll go and review
the tracing patches of this series and give an ack / reviewed-by if
there's no issues.

-- Steve

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v6 4/6] tracing/probe: Support user-space dereference
  2019-03-18  6:43 ` [RFC PATCH v6 4/6] tracing/probe: Support user-space dereference Masami Hiramatsu
@ 2019-05-06 15:52   ` Steven Rostedt
  2019-05-08  4:11     ` Masami Hiramatsu
  0 siblings, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2019-05-06 15:52 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Linus Torvalds, Shuah Khan, Arnaldo Carvalho de Melo,
	Peter Zijlstra, linux-kernel, Andy Lutomirski, Ingo Molnar,
	Andrew Morton, Changbin Du, Jann Horn, Kees Cook,
	Andy Lutomirski, Alexei Starovoitov, Nadav Amit, Joel Fernandes,
	yhs

On Mon, 18 Mar 2019 15:43:52 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> +.. _user_mem_access:
> +User Memory Access
> +------------------
> +Kprobe events supports user-space memory access. For that purpose, you can use
> +either user-space dereference syntax or 'ustring' type.
> +
> +The user-space dereference syntax allows you to access a field of a data
> +structure in user-space. This is done by adding the "u" prefix to the
> +dereference syntax. For example, +u4(%si) means it will read memory from the
> +address in the register %si offset by 4, and the mory is expected to be in

                                                    ^^^^
 "memory"

> +user-space. You can use this for strings too, e.g. +u0(%si):string will read
> +a string from the address in the register %si that is expected to be in user-
> +space. 'ustring' is a shortcut way of performing the same task. That is,
> ++0(%si):ustring is equivalent to +u0(%si):string.
> +
> +Note that kprobe-event provides the user-memory access syntax but it doesn't
> +use it transparently. This means if you use normal dereference or string type
> +for user memory, it might fail, and always fails on some arch. So user has to

  "and may always fail on some archs. The user has to carefully check
  if the target data is in kernel or user space."


> +check if the targe data is in kernel or in user space carefully.
>  
>  Per-Probe Event Filtering
>  -------------------------
> diff --git a/Documentation/trace/uprobetracer.rst b/Documentation/trace/uprobetracer.rst
> index 4346e23e3ae7..de8812c932bc 100644
> --- a/Documentation/trace/uprobetracer.rst
> +++ b/Documentation/trace/uprobetracer.rst
> @@ -42,16 +42,17 @@ Synopsis of uprobe_tracer
>     @+OFFSET	: Fetch memory at OFFSET (OFFSET from same file as PATH)
>     $stackN	: Fetch Nth entry of stack (N >= 0)
>     $stack	: Fetch stack address.
> -   $retval	: Fetch return value.(*)
> +   $retval	: Fetch return value.(\*1)
>     $comm	: Fetch current task comm.
> -   +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(**)
> +   +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*2)(\*3)
>     NAME=FETCHARG     : Set NAME as the argument name of FETCHARG.
>     FETCHARG:TYPE     : Set TYPE as the type of FETCHARG. Currently, basic types
>  		       (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types
>  		       (x8/x16/x32/x64), "string" and bitfield are supported.

Hmm, shouldn't uprobes default to userspace. Isn't the purpose mostly
to find out what's going on in userspace. Perhaps we should add a 'k'
annotation to uprobes to denote that it's for kernel space, as that
should be the exception and not the norm.

>  
> -  (*) only for return probe.
> -  (**) this is useful for fetching a field of data structures.
> +  (\*1) only for return probe.
> +  (\*2) this is useful for fetching a field of data structures.
> +  (\*3) Unlike kprobe event, "u" prefix will just be ignored.
>  
>  Types
>  -----
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 7a6ed76ba104..b595d5ef099a 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -4815,10 +4815,11 @@ static const char readme_msg[] =
>  	"\t     args: <name>=fetcharg[:type]\n"
>  	"\t fetcharg: %<register>, @<address>, @<symbol>[+|-<offset>],\n"
>  #ifdef CONFIG_HAVE_FUNCTION_ARG_ACCESS_API
> -	"\t           $stack<index>, $stack, $retval, $comm, $arg<N>\n"
> +	"\t           $stack<index>, $stack, $retval, $comm, $arg<N>,\n"
>  #else
> -	"\t           $stack<index>, $stack, $retval, $comm\n"
> +	"\t           $stack<index>, $stack, $retval, $comm,\n"
>  #endif
> +	"\t           +|-[u]<offset>(<fetcharg>)\n"
>  	"\t     type: s8/16/32/64, u8/16/32/64, x8/16/32/64, string, symbol,\n"
>  	"\t           b<bit-width>@<bit-offset>/<container-size>, ustring,\n"
>  	"\t           <type>\\[<array-size>\\]\n"
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index e346229ddbba..9456f4ca3b8a 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -930,6 +930,12 @@ probe_mem_read(void *dest, void *src, size_t size)
>  	return probe_kernel_read(dest, src, size);
>  }
>  
> +static nokprobe_inline int
> +probe_mem_read_user(void *dest, void *src, size_t size)
> +{
> +	return probe_user_read(dest, src, size);
> +}
> +
>  /* Note that we don't verify it, since the code does not come from user space */
>  static int
>  process_fetch_insn(struct fetch_insn *code, struct pt_regs *regs, void *dest,
> diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
> index 30054136cfde..00771e7b6ef8 100644
> --- a/kernel/trace/trace_probe.c
> +++ b/kernel/trace/trace_probe.c
> @@ -253,6 +253,7 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
>  {
>  	struct fetch_insn *code = *pcode;
>  	unsigned long param;
> +	int deref = FETCH_OP_DEREF;
>  	long offset = 0;
>  	char *tmp;
>  	int ret = 0;
> @@ -315,9 +316,14 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
>  		break;
>  
>  	case '+':	/* deref memory */
> -		arg++;	/* Skip '+', because kstrtol() rejects it. */
> -		/* fall through */
>  	case '-':
> +		if (arg[1] == 'u') {
> +			deref = FETCH_OP_UDEREF;
> +			arg[1] = arg[0];
> +			arg++;
> +		}

It should be fine to add a 'k' version here too.

> +		if (arg[0] == '+')
> +			arg++;	/* Skip '+', because kstrtol() rejects it. */
>  		tmp = strchr(arg, '(');
>  		if (!tmp)
>  			return -EINVAL;
> @@ -343,7 +349,7 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
>  				return -E2BIG;
>  			*pcode = code;
>  
> -			code->op = FETCH_OP_DEREF;
> +			code->op = deref;
>  			code->offset = offset;
>  		}
>  		break;
> @@ -459,13 +465,14 @@ static int traceprobe_parse_probe_arg_body(char *arg, ssize_t *size,
>  	/* Store operation */
>  	if (!strcmp(parg->type->name, "string") ||
>  	    !strcmp(parg->type->name, "ustring")) {
> -		if (code->op != FETCH_OP_DEREF && code->op != FETCH_OP_IMM &&
> -		    code->op != FETCH_OP_COMM) {
> +		if (code->op != FETCH_OP_DEREF && code->op != FETCH_OP_UDEREF
> +		    && code->op != FETCH_OP_IMM && code->op != FETCH_OP_COMM) {
>  			pr_info("string only accepts memory or address.\n");
>  			ret = -EINVAL;
>  			goto fail;
>  		}
> -		if (code->op != FETCH_OP_DEREF || parg->count) {
> +		if ((code->op == FETCH_OP_IMM || code->op == FETCH_OP_COMM)
> +		    || parg->count) {
>  			/*
>  			 * IMM and COMM is pointing actual address, those must
>  			 * be kept, and if parg->count != 0, this is an array
> @@ -478,7 +485,8 @@ static int traceprobe_parse_probe_arg_body(char *arg, ssize_t *size,
>  			}
>  		}
>  		/* If op == DEREF, replace it with STRING */
> -		if (!strcmp(parg->type->name, "ustring"))
> +		if (!strcmp(parg->type->name, "ustring") ||

Perhaps have a "kstring" for kernel strings in uprobes.

> +		    code->op == FETCH_OP_UDEREF)
>  			code->op = FETCH_OP_ST_USTRING;
>  		else
>  			code->op = FETCH_OP_ST_STRING;
> @@ -487,6 +495,9 @@ static int traceprobe_parse_probe_arg_body(char *arg, ssize_t *size,
>  	} else if (code->op == FETCH_OP_DEREF) {
>  		code->op = FETCH_OP_ST_MEM;
>  		code->size = parg->type->size;
> +	} else if (code->op == FETCH_OP_UDEREF) {
> +		code->op = FETCH_OP_ST_UMEM;
> +		code->size = parg->type->size;
>  	} else {
>  		code++;
>  		if (code->op != FETCH_OP_NOP) {
> diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
> index 94cdcfdaced0..0feac0a81f82 100644
> --- a/kernel/trace/trace_probe.h
> +++ b/kernel/trace/trace_probe.h
> @@ -92,9 +92,11 @@ enum fetch_op {
>  	FETCH_OP_FOFFS,		/* File offset: .immediate */
>  	// Stage 2 (dereference) op
>  	FETCH_OP_DEREF,		/* Dereference: .offset */
> +	FETCH_OP_UDEREF,	/* User-space Dereference: .offset */
>  	// Stage 3 (store) ops
>  	FETCH_OP_ST_RAW,	/* Raw: .size */
>  	FETCH_OP_ST_MEM,	/* Mem: .offset, .size */
> +	FETCH_OP_ST_UMEM,	/* Mem: .offset, .size */
>  	FETCH_OP_ST_STRING,	/* String: .offset, .size */
>  	FETCH_OP_ST_USTRING,	/* User String: .offset, .size */
>  	// Stage 4 (modify) op
> diff --git a/kernel/trace/trace_probe_tmpl.h b/kernel/trace/trace_probe_tmpl.h
> index 7526f6f8d7b0..06f2d901c4cf 100644
> --- a/kernel/trace/trace_probe_tmpl.h
> +++ b/kernel/trace/trace_probe_tmpl.h
> @@ -64,6 +64,8 @@ static nokprobe_inline int
>  fetch_store_string_user(unsigned long addr, void *dest, void *base);
>  static nokprobe_inline int
>  probe_mem_read(void *dest, void *src, size_t size);
> +static nokprobe_inline int
> +probe_mem_read_user(void *dest, void *src, size_t size);
>  
>  /* From the 2nd stage, routine is same */
>  static nokprobe_inline int
> @@ -77,14 +79,21 @@ process_fetch_insn_bottom(struct fetch_insn *code, unsigned long val,
>  
>  stage2:
>  	/* 2nd stage: dereference memory if needed */
> -	while (code->op == FETCH_OP_DEREF) {
> -		lval = val;
> -		ret = probe_mem_read(&val, (void *)val + code->offset,
> -					sizeof(val));
> +	do {
> +		if (code->op == FETCH_OP_DEREF) {
> +			lval = val;
> +			ret = probe_mem_read(&val, (void *)val + code->offset,
> +					     sizeof(val));
> +		} else if (code->op == FETCH_OP_UDEREF) {
> +			lval = val;
> +			ret = probe_mem_read_user(&val,
> +				 (void *)val + code->offset, sizeof(val));
> +		} else
> +			break;
>  		if (ret)
>  			return ret;
>  		code++;
> -	}
> +	} while (1);
>  
>  	s3 = code;
>  stage3:
> @@ -109,6 +118,10 @@ process_fetch_insn_bottom(struct fetch_insn *code, unsigned long val,
>  	case FETCH_OP_ST_MEM:
>  		probe_mem_read(dest, (void *)val + code->offset, code->size);
>  		break;
> +	case FETCH_OP_ST_UMEM:
> +		probe_mem_read_user(dest, (void *)val + code->offset,
> +				    code->size);
> +		break;
>  	case FETCH_OP_ST_STRING:
>  		loc = *(u32 *)dest;
>  		ret = fetch_store_string(val + code->offset, dest, base);
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index f4e37c4f8a21..5bc8c3686f6f 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -140,6 +140,13 @@ probe_mem_read(void *dest, void *src, size_t size)
>  
>  	return copy_from_user(dest, vaddr, size) ? -EFAULT : 0;
>  }
> +
> +static nokprobe_inline int
> +probe_mem_read_user(void *dest, void *src, size_t size)
> +{
> +	return probe_mem_read(dest, src, size);

Hmm, if probe_mem_read() is the same as probe_mem_read_user(), perhaps
not even have a 'u' version for uprobes.

-- Steve


> +}
> +
>  /*
>   * Fetch a null-terminated string. Caller MUST set *(u32 *)dest with max
>   * length and relative data location.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v6 4/6] tracing/probe: Support user-space dereference
  2019-05-06 15:52   ` Steven Rostedt
@ 2019-05-08  4:11     ` Masami Hiramatsu
  2019-05-08 15:22       ` Steven Rostedt
  0 siblings, 1 reply; 14+ messages in thread
From: Masami Hiramatsu @ 2019-05-08  4:11 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Linus Torvalds, Shuah Khan, Arnaldo Carvalho de Melo,
	Peter Zijlstra, linux-kernel, Andy Lutomirski, Ingo Molnar,
	Andrew Morton, Changbin Du, Jann Horn, Kees Cook,
	Andy Lutomirski, Alexei Starovoitov, Nadav Amit, Joel Fernandes,
	yhs

On Mon, 6 May 2019 11:52:26 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Mon, 18 Mar 2019 15:43:52 +0900
> Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> > +.. _user_mem_access:
> > +User Memory Access
> > +------------------
> > +Kprobe events supports user-space memory access. For that purpose, you can use
> > +either user-space dereference syntax or 'ustring' type.
> > +
> > +The user-space dereference syntax allows you to access a field of a data
> > +structure in user-space. This is done by adding the "u" prefix to the
> > +dereference syntax. For example, +u4(%si) means it will read memory from the
> > +address in the register %si offset by 4, and the mory is expected to be in
> 
>                                                     ^^^^
>  "memory"

OK, thanks!

> 
> > +user-space. You can use this for strings too, e.g. +u0(%si):string will read
> > +a string from the address in the register %si that is expected to be in user-
> > +space. 'ustring' is a shortcut way of performing the same task. That is,
> > ++0(%si):ustring is equivalent to +u0(%si):string.
> > +
> > +Note that kprobe-event provides the user-memory access syntax but it doesn't
> > +use it transparently. This means if you use normal dereference or string type
> > +for user memory, it might fail, and always fails on some arch. So user has to
> 
>   "and may always fail on some archs. The user has to carefully check
>   if the target data is in kernel or user space."

OK. I'll update.

> > +check if the targe data is in kernel or in user space carefully.
> >  
> >  Per-Probe Event Filtering
> >  -------------------------
> > diff --git a/Documentation/trace/uprobetracer.rst b/Documentation/trace/uprobetracer.rst
> > index 4346e23e3ae7..de8812c932bc 100644
> > --- a/Documentation/trace/uprobetracer.rst
> > +++ b/Documentation/trace/uprobetracer.rst
> > @@ -42,16 +42,17 @@ Synopsis of uprobe_tracer
> >     @+OFFSET	: Fetch memory at OFFSET (OFFSET from same file as PATH)
> >     $stackN	: Fetch Nth entry of stack (N >= 0)
> >     $stack	: Fetch stack address.
> > -   $retval	: Fetch return value.(*)
> > +   $retval	: Fetch return value.(\*1)
> >     $comm	: Fetch current task comm.
> > -   +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(**)
> > +   +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*2)(\*3)
> >     NAME=FETCHARG     : Set NAME as the argument name of FETCHARG.
> >     FETCHARG:TYPE     : Set TYPE as the type of FETCHARG. Currently, basic types
> >  		       (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types
> >  		       (x8/x16/x32/x64), "string" and bitfield are supported.
> 
> Hmm, shouldn't uprobes default to userspace. Isn't the purpose mostly
> to find out what's going on in userspace. Perhaps we should add a 'k'
> annotation to uprobes to denote that it's for kernel space, as that
> should be the exception and not the norm.

No, uprobe can not access kernel space, because it doesn't have the
current kernel context. Note that all registers, stacks which
can be accessed from uprobe handler are user-space. We can not access
kernel context from that. See below

> > -  (*) only for return probe.
> > -  (**) this is useful for fetching a field of data structures.
> > +  (\*1) only for return probe.
> > +  (\*2) this is useful for fetching a field of data structures.
> > +  (\*3) Unlike kprobe event, "u" prefix will just be ignored.

Thus the 'u' is just ignored on uprobe event.

> >  
> >  Types
> >  -----
> > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> > index 7a6ed76ba104..b595d5ef099a 100644
> > --- a/kernel/trace/trace.c
> > +++ b/kernel/trace/trace.c
> > @@ -4815,10 +4815,11 @@ static const char readme_msg[] =
> >  	"\t     args: <name>=fetcharg[:type]\n"
> >  	"\t fetcharg: %<register>, @<address>, @<symbol>[+|-<offset>],\n"
> >  #ifdef CONFIG_HAVE_FUNCTION_ARG_ACCESS_API
> > -	"\t           $stack<index>, $stack, $retval, $comm, $arg<N>\n"
> > +	"\t           $stack<index>, $stack, $retval, $comm, $arg<N>,\n"
> >  #else
> > -	"\t           $stack<index>, $stack, $retval, $comm\n"
> > +	"\t           $stack<index>, $stack, $retval, $comm,\n"
> >  #endif
> > +	"\t           +|-[u]<offset>(<fetcharg>)\n"
> >  	"\t     type: s8/16/32/64, u8/16/32/64, x8/16/32/64, string, symbol,\n"
> >  	"\t           b<bit-width>@<bit-offset>/<container-size>, ustring,\n"
> >  	"\t           <type>\\[<array-size>\\]\n"
> > diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> > index e346229ddbba..9456f4ca3b8a 100644
> > --- a/kernel/trace/trace_kprobe.c
> > +++ b/kernel/trace/trace_kprobe.c
> > @@ -930,6 +930,12 @@ probe_mem_read(void *dest, void *src, size_t size)
> >  	return probe_kernel_read(dest, src, size);
> >  }
> >  
> > +static nokprobe_inline int
> > +probe_mem_read_user(void *dest, void *src, size_t size)
> > +{
> > +	return probe_user_read(dest, src, size);
> > +}
> > +
> >  /* Note that we don't verify it, since the code does not come from user space */
> >  static int
> >  process_fetch_insn(struct fetch_insn *code, struct pt_regs *regs, void *dest,
> > diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
> > index 30054136cfde..00771e7b6ef8 100644
> > --- a/kernel/trace/trace_probe.c
> > +++ b/kernel/trace/trace_probe.c
> > @@ -253,6 +253,7 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
> >  {
> >  	struct fetch_insn *code = *pcode;
> >  	unsigned long param;
> > +	int deref = FETCH_OP_DEREF;
> >  	long offset = 0;
> >  	char *tmp;
> >  	int ret = 0;
> > @@ -315,9 +316,14 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
> >  		break;
> >  
> >  	case '+':	/* deref memory */
> > -		arg++;	/* Skip '+', because kstrtol() rejects it. */
> > -		/* fall through */
> >  	case '-':
> > +		if (arg[1] == 'u') {
> > +			deref = FETCH_OP_UDEREF;
> > +			arg[1] = arg[0];
> > +			arg++;
> > +		}
> 
> It should be fine to add a 'k' version here too.
> 
> > +		if (arg[0] == '+')
> > +			arg++;	/* Skip '+', because kstrtol() rejects it. */
> >  		tmp = strchr(arg, '(');
> >  		if (!tmp)
> >  			return -EINVAL;
> > @@ -343,7 +349,7 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
> >  				return -E2BIG;
> >  			*pcode = code;
> >  
> > -			code->op = FETCH_OP_DEREF;
> > +			code->op = deref;
> >  			code->offset = offset;
> >  		}
> >  		break;
> > @@ -459,13 +465,14 @@ static int traceprobe_parse_probe_arg_body(char *arg, ssize_t *size,
> >  	/* Store operation */
> >  	if (!strcmp(parg->type->name, "string") ||
> >  	    !strcmp(parg->type->name, "ustring")) {
> > -		if (code->op != FETCH_OP_DEREF && code->op != FETCH_OP_IMM &&
> > -		    code->op != FETCH_OP_COMM) {
> > +		if (code->op != FETCH_OP_DEREF && code->op != FETCH_OP_UDEREF
> > +		    && code->op != FETCH_OP_IMM && code->op != FETCH_OP_COMM) {
> >  			pr_info("string only accepts memory or address.\n");
> >  			ret = -EINVAL;
> >  			goto fail;
> >  		}
> > -		if (code->op != FETCH_OP_DEREF || parg->count) {
> > +		if ((code->op == FETCH_OP_IMM || code->op == FETCH_OP_COMM)
> > +		    || parg->count) {
> >  			/*
> >  			 * IMM and COMM is pointing actual address, those must
> >  			 * be kept, and if parg->count != 0, this is an array
> > @@ -478,7 +485,8 @@ static int traceprobe_parse_probe_arg_body(char *arg, ssize_t *size,
> >  			}
> >  		}
> >  		/* If op == DEREF, replace it with STRING */
> > -		if (!strcmp(parg->type->name, "ustring"))
> > +		if (!strcmp(parg->type->name, "ustring") ||
> 
> Perhaps have a "kstring" for kernel strings in uprobes.
> 
> > +		    code->op == FETCH_OP_UDEREF)
> >  			code->op = FETCH_OP_ST_USTRING;
> >  		else
> >  			code->op = FETCH_OP_ST_STRING;
> > @@ -487,6 +495,9 @@ static int traceprobe_parse_probe_arg_body(char *arg, ssize_t *size,
> >  	} else if (code->op == FETCH_OP_DEREF) {
> >  		code->op = FETCH_OP_ST_MEM;
> >  		code->size = parg->type->size;
> > +	} else if (code->op == FETCH_OP_UDEREF) {
> > +		code->op = FETCH_OP_ST_UMEM;
> > +		code->size = parg->type->size;
> >  	} else {
> >  		code++;
> >  		if (code->op != FETCH_OP_NOP) {
> > diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
> > index 94cdcfdaced0..0feac0a81f82 100644
> > --- a/kernel/trace/trace_probe.h
> > +++ b/kernel/trace/trace_probe.h
> > @@ -92,9 +92,11 @@ enum fetch_op {
> >  	FETCH_OP_FOFFS,		/* File offset: .immediate */
> >  	// Stage 2 (dereference) op
> >  	FETCH_OP_DEREF,		/* Dereference: .offset */
> > +	FETCH_OP_UDEREF,	/* User-space Dereference: .offset */
> >  	// Stage 3 (store) ops
> >  	FETCH_OP_ST_RAW,	/* Raw: .size */
> >  	FETCH_OP_ST_MEM,	/* Mem: .offset, .size */
> > +	FETCH_OP_ST_UMEM,	/* Mem: .offset, .size */
> >  	FETCH_OP_ST_STRING,	/* String: .offset, .size */
> >  	FETCH_OP_ST_USTRING,	/* User String: .offset, .size */
> >  	// Stage 4 (modify) op
> > diff --git a/kernel/trace/trace_probe_tmpl.h b/kernel/trace/trace_probe_tmpl.h
> > index 7526f6f8d7b0..06f2d901c4cf 100644
> > --- a/kernel/trace/trace_probe_tmpl.h
> > +++ b/kernel/trace/trace_probe_tmpl.h
> > @@ -64,6 +64,8 @@ static nokprobe_inline int
> >  fetch_store_string_user(unsigned long addr, void *dest, void *base);
> >  static nokprobe_inline int
> >  probe_mem_read(void *dest, void *src, size_t size);
> > +static nokprobe_inline int
> > +probe_mem_read_user(void *dest, void *src, size_t size);
> >  
> >  /* From the 2nd stage, routine is same */
> >  static nokprobe_inline int
> > @@ -77,14 +79,21 @@ process_fetch_insn_bottom(struct fetch_insn *code, unsigned long val,
> >  
> >  stage2:
> >  	/* 2nd stage: dereference memory if needed */
> > -	while (code->op == FETCH_OP_DEREF) {
> > -		lval = val;
> > -		ret = probe_mem_read(&val, (void *)val + code->offset,
> > -					sizeof(val));
> > +	do {
> > +		if (code->op == FETCH_OP_DEREF) {
> > +			lval = val;
> > +			ret = probe_mem_read(&val, (void *)val + code->offset,
> > +					     sizeof(val));
> > +		} else if (code->op == FETCH_OP_UDEREF) {
> > +			lval = val;
> > +			ret = probe_mem_read_user(&val,
> > +				 (void *)val + code->offset, sizeof(val));
> > +		} else
> > +			break;
> >  		if (ret)
> >  			return ret;
> >  		code++;
> > -	}
> > +	} while (1);
> >  
> >  	s3 = code;
> >  stage3:
> > @@ -109,6 +118,10 @@ process_fetch_insn_bottom(struct fetch_insn *code, unsigned long val,
> >  	case FETCH_OP_ST_MEM:
> >  		probe_mem_read(dest, (void *)val + code->offset, code->size);
> >  		break;
> > +	case FETCH_OP_ST_UMEM:
> > +		probe_mem_read_user(dest, (void *)val + code->offset,
> > +				    code->size);
> > +		break;
> >  	case FETCH_OP_ST_STRING:
> >  		loc = *(u32 *)dest;
> >  		ret = fetch_store_string(val + code->offset, dest, base);
> > diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> > index f4e37c4f8a21..5bc8c3686f6f 100644
> > --- a/kernel/trace/trace_uprobe.c
> > +++ b/kernel/trace/trace_uprobe.c
> > @@ -140,6 +140,13 @@ probe_mem_read(void *dest, void *src, size_t size)
> >  
> >  	return copy_from_user(dest, vaddr, size) ? -EFAULT : 0;
> >  }
> > +
> > +static nokprobe_inline int
> > +probe_mem_read_user(void *dest, void *src, size_t size)
> > +{
> > +	return probe_mem_read(dest, src, size);
> 
> Hmm, if probe_mem_read() is the same as probe_mem_read_user(), perhaps
> not even have a 'u' version for uprobes.

Yes, this is just a hack for sharing the template code.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v6 4/6] tracing/probe: Support user-space dereference
  2019-05-08  4:11     ` Masami Hiramatsu
@ 2019-05-08 15:22       ` Steven Rostedt
  2019-05-13 12:11         ` Masami Hiramatsu
  0 siblings, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2019-05-08 15:22 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Linus Torvalds, Shuah Khan, Arnaldo Carvalho de Melo,
	Peter Zijlstra, linux-kernel, Andy Lutomirski, Ingo Molnar,
	Andrew Morton, Changbin Du, Jann Horn, Kees Cook,
	Andy Lutomirski, Alexei Starovoitov, Nadav Amit, Joel Fernandes,
	yhs

On Wed, 8 May 2019 13:11:43 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> On Mon, 6 May 2019 11:52:26 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > On Mon, 18 Mar 2019 15:43:52 +0900
> > Masami Hiramatsu <mhiramat@kernel.org> wrote:
> >   
> > > +.. _user_mem_access:
> > > +User Memory Access
> > > +------------------
> > > +Kprobe events supports user-space memory access. For that purpose, you can use
> > > +either user-space dereference syntax or 'ustring' type.
> > > +
> > > +The user-space dereference syntax allows you to access a field of a data
> > > +structure in user-space. This is done by adding the "u" prefix to the
> > > +dereference syntax. For example, +u4(%si) means it will read memory from the
> > > +address in the register %si offset by 4, and the mory is expected to be in  
> > 
> >                                                     ^^^^
> >  "memory"  
> 
> OK, thanks!
> 
> >   
> > > +user-space. You can use this for strings too, e.g. +u0(%si):string will read
> > > +a string from the address in the register %si that is expected to be in user-
> > > +space. 'ustring' is a shortcut way of performing the same task. That is,
> > > ++0(%si):ustring is equivalent to +u0(%si):string.
> > > +
> > > +Note that kprobe-event provides the user-memory access syntax but it doesn't
> > > +use it transparently. This means if you use normal dereference or string type
> > > +for user memory, it might fail, and always fails on some arch. So user has to  
> > 
> >   "and may always fail on some archs. The user has to carefully check
> >   if the target data is in kernel or user space."  
> 
> OK. I'll update.
> 
> > > +check if the targe data is in kernel or in user space carefully.
> > >  
> > >  Per-Probe Event Filtering
> > >  -------------------------
> > > diff --git a/Documentation/trace/uprobetracer.rst b/Documentation/trace/uprobetracer.rst
> > > index 4346e23e3ae7..de8812c932bc 100644
> > > --- a/Documentation/trace/uprobetracer.rst
> > > +++ b/Documentation/trace/uprobetracer.rst
> > > @@ -42,16 +42,17 @@ Synopsis of uprobe_tracer
> > >     @+OFFSET	: Fetch memory at OFFSET (OFFSET from same file as PATH)
> > >     $stackN	: Fetch Nth entry of stack (N >= 0)
> > >     $stack	: Fetch stack address.
> > > -   $retval	: Fetch return value.(*)
> > > +   $retval	: Fetch return value.(\*1)
> > >     $comm	: Fetch current task comm.
> > > -   +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(**)
> > > +   +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*2)(\*3)
> > >     NAME=FETCHARG     : Set NAME as the argument name of FETCHARG.
> > >     FETCHARG:TYPE     : Set TYPE as the type of FETCHARG. Currently, basic types
> > >  		       (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types
> > >  		       (x8/x16/x32/x64), "string" and bitfield are supported.  
> > 
> > Hmm, shouldn't uprobes default to userspace. Isn't the purpose mostly
> > to find out what's going on in userspace. Perhaps we should add a 'k'
> > annotation to uprobes to denote that it's for kernel space, as that
> > should be the exception and not the norm.  
> 
> No, uprobe can not access kernel space, because it doesn't have the
> current kernel context. Note that all registers, stacks which
> can be accessed from uprobe handler are user-space. We can not access
> kernel context from that. See below
> 
> > > -  (*) only for return probe.
> > > -  (**) this is useful for fetching a field of data structures.
> > > +  (\*1) only for return probe.
> > > +  (\*2) this is useful for fetching a field of data structures.
> > > +  (\*3) Unlike kprobe event, "u" prefix will just be ignored.  
> 
> Thus the 'u' is just ignored on uprobe event.

I totally missed the footnote here. Can we stress this point more up in
the "User Memory Access" section. Specifically state something like:
"Uprobes only access userspace memory, thus the 'u' is not required,
and if it is added to a uprobe, it will simply be ignored".

Thanks!

-- Steve

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v6 4/6] tracing/probe: Support user-space dereference
  2019-05-08 15:22       ` Steven Rostedt
@ 2019-05-13 12:11         ` Masami Hiramatsu
  0 siblings, 0 replies; 14+ messages in thread
From: Masami Hiramatsu @ 2019-05-13 12:11 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Linus Torvalds, Shuah Khan, Arnaldo Carvalho de Melo,
	Peter Zijlstra, linux-kernel, Andy Lutomirski, Ingo Molnar,
	Andrew Morton, Changbin Du, Jann Horn, Kees Cook,
	Andy Lutomirski, Alexei Starovoitov, Nadav Amit, Joel Fernandes,
	yhs

On Wed, 8 May 2019 11:22:37 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> > > >  Per-Probe Event Filtering
> > > >  -------------------------
> > > > diff --git a/Documentation/trace/uprobetracer.rst b/Documentation/trace/uprobetracer.rst
> > > > index 4346e23e3ae7..de8812c932bc 100644
> > > > --- a/Documentation/trace/uprobetracer.rst
> > > > +++ b/Documentation/trace/uprobetracer.rst
> > > > @@ -42,16 +42,17 @@ Synopsis of uprobe_tracer
> > > >     @+OFFSET	: Fetch memory at OFFSET (OFFSET from same file as PATH)
> > > >     $stackN	: Fetch Nth entry of stack (N >= 0)
> > > >     $stack	: Fetch stack address.
> > > > -   $retval	: Fetch return value.(*)
> > > > +   $retval	: Fetch return value.(\*1)
> > > >     $comm	: Fetch current task comm.
> > > > -   +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(**)
> > > > +   +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*2)(\*3)
> > > >     NAME=FETCHARG     : Set NAME as the argument name of FETCHARG.
> > > >     FETCHARG:TYPE     : Set TYPE as the type of FETCHARG. Currently, basic types
> > > >  		       (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types
> > > >  		       (x8/x16/x32/x64), "string" and bitfield are supported.  
> > > 
> > > Hmm, shouldn't uprobes default to userspace. Isn't the purpose mostly
> > > to find out what's going on in userspace. Perhaps we should add a 'k'
> > > annotation to uprobes to denote that it's for kernel space, as that
> > > should be the exception and not the norm.  
> > 
> > No, uprobe can not access kernel space, because it doesn't have the
> > current kernel context. Note that all registers, stacks which
> > can be accessed from uprobe handler are user-space. We can not access
> > kernel context from that. See below
> > 
> > > > -  (*) only for return probe.
> > > > -  (**) this is useful for fetching a field of data structures.
> > > > +  (\*1) only for return probe.
> > > > +  (\*2) this is useful for fetching a field of data structures.
> > > > +  (\*3) Unlike kprobe event, "u" prefix will just be ignored.  
> > 
> > Thus the 'u' is just ignored on uprobe event.
> 
> I totally missed the footnote here. Can we stress this point more up in
> the "User Memory Access" section. Specifically state something like:
> "Uprobes only access userspace memory, thus the 'u' is not required,
> and if it is added to a uprobe, it will simply be ignored".

Sorry, I missed this mail. 

Since the "User Memory Access" section is only in kprobetrace.rst, I think
mentioning uprobe-events in kprobetrace.rst is meaningless. Uprobe user
might read uprobetracer.rst instead of kprobetrace.rst.
So I think it is enough to mention it as a footnote in uprobetracer.rst.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2019-05-13 12:11 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-18  6:42 [RFC PATCH v6 0/6] tracing/probes: uaccess: Add support user-space access Masami Hiramatsu
2019-03-18  6:43 ` [RFC PATCH v6 1/6] x86/uaccess: Allow access_ok() in irq context if pagefault_disabled Masami Hiramatsu
2019-03-22  2:46   ` Steven Rostedt
2019-05-06 15:22     ` Masami Hiramatsu
2019-05-06 15:39       ` Steven Rostedt
2019-03-18  6:43 ` [RFC PATCH v6 2/6] uaccess: Add non-pagefault user-space read functions Masami Hiramatsu
2019-03-18  6:43 ` [RFC PATCH v6 3/6] tracing/probe: Add ustring type for user-space string Masami Hiramatsu
2019-03-18  6:43 ` [RFC PATCH v6 4/6] tracing/probe: Support user-space dereference Masami Hiramatsu
2019-05-06 15:52   ` Steven Rostedt
2019-05-08  4:11     ` Masami Hiramatsu
2019-05-08 15:22       ` Steven Rostedt
2019-05-13 12:11         ` Masami Hiramatsu
2019-03-18  6:44 ` [RFC PATCH v6 5/6] selftests/ftrace: Add user-memory access syntax testcase Masami Hiramatsu
2019-03-18  6:44 ` [RFC PATCH v6 6/6] perf-probe: Add user memory access attribute support Masami Hiramatsu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).