All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
@ 2013-10-29  6:53 Namhyung Kim
  2013-10-29  6:53 ` [PATCH 01/13] tracing/uprobes: Fix documentation of uprobe registration syntax Namhyung Kim
                   ` (14 more replies)
  0 siblings, 15 replies; 92+ messages in thread
From: Namhyung Kim @ 2013-10-29  6:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee, Hemant Kumar,
	LKML, Srikar Dronamraju, Oleg Nesterov, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

Hello,

This patchset implements memory (address), stack[N], deference,
bitfield and retval (it needs uretprobe tho) fetch methods for
uprobes.  It's based on the previous work [1] done by Hyeoncheol Lee.

Now kprobes and uprobes have their own fetch_type_tables and, in turn,
memory and stack access methods.  Other fetch methods are shared.

For the dereference method, I added a new argument to fetch functions.
It's because for uprobes it needs to know whether the given address is
a file offset or a virtual address in an user process.  For instance,
in case of fetching from a memory directly (like @offset) it should
convert the address (offset) to a virtual address of the process, but
if it's a dereferencing, the given address already has the virtual
address.

To determine this in a fetch function, I passed a pointer to
trace_uprobe for direct fetch, and passed NULL for dereference.

The patch 1-2 are bug fixes and can be applied independently.

Please look at patch 10 that uses per-cpu buffer for accessing user
memory as suggested by Steven.  While I tried hard not to mess things
up there might be a chance I did something horrible.  It'd be great if
you guys take a look and give comments.


 * v6 changes:
  - add more Ack's from Masami
  - fix ref count of uprobe_cpu_buffer (thanks to Jovi)

 * v5 changes:
  - use user_stack_pointer() instead of GET_USP()
  - fix a bug in 'stack' fetch method of uprobes

 * v4 changes:
  - add Ack's from Masami
  - rearrange patches to make it easy for simple fixes to be applied
  - update documentation
  - use per-cpu buffer for storing args (thanks to Steve!)


[1] https://lkml.org/lkml/2012/11/14/84

A simple example:

  # cat foo.c
  int glob = -1;
  char str[] = "hello uprobe.";

  struct foo {
    unsigned int unused: 2;
    unsigned int foo: 20;
    unsigned int bar: 10;
  } foo = {
    .foo = 5,
  };

  int main(int argc, char *argv[])
  {
    long local = 0x1234;

    return 127;
  }

  # gcc -o foo -g foo.c

  # objdump -d foo | grep -A9 -F '<main>'
  00000000004004b0 <main>:
    4004b0:	55                   	push   %rbp
    4004b1:	48 89 e5             	mov    %rsp,%rbp
    4004b4:	89 7d ec             	mov    %edi,-0x14(%rbp)
    4004b7:	48 89 75 e0          	mov    %rsi,-0x20(%rbp)
    4004bb:	48 c7 45 f8 34 12 00 	movq   $0x1234,-0x8(%rbp)
    4004c2:	00 
    4004c3:	b8 7f 00 00 00       	mov    $0x7f,%eax
    4004c8:	5d                   	pop    %rbp
    4004c9:	c3                   	retq   

  # nm foo | grep -e glob$ -e str -e foo
  00000000006008bc D foo
  00000000006008a8 D glob
  00000000006008ac D str

  # perf probe -x /home/namhyung/tmp/foo -a 'foo=main+0x13 glob=@0x8a8:s32 \
  > str=@0x8ac:string bit=@0x8bc:b10@2/32 argc=%di local=-0x8(%bp)'
  Added new event:
    probe_foo:foo      (on 0x4c3 with glob=@0x8a8:s32 str=@0x8ac:string 
                                 bit=@0x8bc:b10@2/32 argc=%di local=-0x8(%bp))

  You can now use it in all perf tools, such as:

          perf record -e probe_foo:foo -aR sleep 1

  # perf record -e probe_foo:foo ./foo
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.001 MB perf.data (~33 samples) ]

  # perf script | grep -v ^#
               foo  2008 [002  2199.867154: probe_foo:foo (4004c3)
                   glob=-1 str="hello uprobe." bit=5 argc=1 local=1234


This patchset is based on the current for-next branch of the Steven
Rostedt's linux-trace tree.  I also put this on my 'uprobe/fetch-v6'
branch in my tree:

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git


Any comments are welcome, thanks.
Namhyung


Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Hemant Kumar <hkshaw@linux.vnet.ibm.com>


Hyeoncheol Lee (2):
  tracing/kprobes: Move fetch functions to trace_kprobe.c
  tracing/kprobes: Add fetch{,_size} member into deref fetch method

Namhyung Kim (11):
  tracing/uprobes: Fix documentation of uprobe registration syntax
  tracing/probes: Fix basic print type functions
  tracing/kprobes: Staticize stack and memory fetch functions
  tracing/kprobes: Factor out struct trace_probe
  tracing/uprobes: Convert to struct trace_probe
  tracing/kprobes: Move common functions to trace_probe.h
  tracing/kprobes: Integrate duplicate set_print_fmt()
  tracing/uprobes: Fetch args before reserving a ring buffer
  tracing/kprobes: Add priv argument to fetch functions
  tracing/uprobes: Add more fetch functions
  tracing/uprobes: Add support for full argument access methods

 Documentation/trace/uprobetracer.txt |  35 +-
 kernel/trace/trace_kprobe.c          | 642 +++++++++++++++++++----------------
 kernel/trace/trace_probe.c           | 453 +++++++++---------------
 kernel/trace/trace_probe.h           | 202 ++++++++++-
 kernel/trace/trace_uprobe.c          | 458 +++++++++++++++++--------
 5 files changed, 1063 insertions(+), 727 deletions(-)

-- 
1.7.11.7


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 01/13] tracing/uprobes: Fix documentation of uprobe registration syntax
  2013-10-29  6:53 [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6) Namhyung Kim
@ 2013-10-29  6:53 ` Namhyung Kim
  2013-10-29  6:53 ` [PATCH 02/13] tracing/probes: Fix basic print type functions Namhyung Kim
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 92+ messages in thread
From: Namhyung Kim @ 2013-10-29  6:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee, Hemant Kumar,
	LKML, Srikar Dronamraju, Oleg Nesterov, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

From: Namhyung Kim <namhyung.kim@lge.com>

The uprobe syntax requires an offset after a file path not a symbol.

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 Documentation/trace/uprobetracer.txt | 10 +++++-----
 kernel/trace/trace_uprobe.c          |  2 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/Documentation/trace/uprobetracer.txt b/Documentation/trace/uprobetracer.txt
index d9c3e682312c..8f1a8b8956fc 100644
--- a/Documentation/trace/uprobetracer.txt
+++ b/Documentation/trace/uprobetracer.txt
@@ -19,15 +19,15 @@ user to calculate the offset of the probepoint in the object.
 
 Synopsis of uprobe_tracer
 -------------------------
-  p[:[GRP/]EVENT] PATH:SYMBOL[+offs] [FETCHARGS] : Set a uprobe
-  r[:[GRP/]EVENT] PATH:SYMBOL[+offs] [FETCHARGS] : Set a return uprobe (uretprobe)
-  -:[GRP/]EVENT                                  : Clear uprobe or uretprobe event
+  p[:[GRP/]EVENT] PATH:OFFSET [FETCHARGS] : Set a uprobe
+  r[:[GRP/]EVENT] PATH:OFFSET [FETCHARGS] : Set a return uprobe (uretprobe)
+  -:[GRP/]EVENT                           : Clear uprobe or uretprobe event
 
   GRP           : Group name. If omitted, "uprobes" is the default value.
   EVENT         : Event name. If omitted, the event name is generated based
-                  on SYMBOL+offs.
+                  on PATH+OFFSET.
   PATH          : Path to an executable or a library.
-  SYMBOL[+offs] : Symbol+offset where the probe is inserted.
+  OFFSET        : Offset where the probe is inserted.
 
   FETCHARGS     : Arguments. Each probe can have up to 128 args.
    %REG         : Fetch register REG
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 272261b5f94f..a415c5867ec5 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -210,7 +210,7 @@ end:
 
 /*
  * Argument syntax:
- *  - Add uprobe: p|r[:[GRP/]EVENT] PATH:SYMBOL [FETCHARGS]
+ *  - Add uprobe: p|r[:[GRP/]EVENT] PATH:OFFSET [FETCHARGS]
  *
  *  - Remove uprobe: -:[GRP/]EVENT
  */
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH 02/13] tracing/probes: Fix basic print type functions
  2013-10-29  6:53 [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6) Namhyung Kim
  2013-10-29  6:53 ` [PATCH 01/13] tracing/uprobes: Fix documentation of uprobe registration syntax Namhyung Kim
@ 2013-10-29  6:53 ` Namhyung Kim
  2013-10-29  6:53 ` [PATCH 03/13] tracing/kprobes: Move fetch functions to trace_kprobe.c Namhyung Kim
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 92+ messages in thread
From: Namhyung Kim @ 2013-10-29  6:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee, Hemant Kumar,
	LKML, Srikar Dronamraju, Oleg Nesterov, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

From: Namhyung Kim <namhyung.kim@lge.com>

The print format of s32 type was "ld" and it's casted to "long".  So
it turned out to print 4294967295 for "-1" on 64-bit systems.  Not
sure whether it worked well on 32-bit systems.

Anyway, it'd be better if we have exact format and type cast for each
types on both of 32- and 64-bit systems.  In fact, the only difference
is on s64/u64 types.

Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 kernel/trace/trace_probe.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 412e959709b4..b571e4de0769 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -49,14 +49,19 @@ static __kprobes int PRINT_TYPE_FUNC_NAME(type)(struct trace_seq *s,	\
 }									\
 static const char PRINT_TYPE_FMT_NAME(type)[] = fmt;
 
-DEFINE_BASIC_PRINT_TYPE_FUNC(u8, "%x", unsigned int)
-DEFINE_BASIC_PRINT_TYPE_FUNC(u16, "%x", unsigned int)
-DEFINE_BASIC_PRINT_TYPE_FUNC(u32, "%lx", unsigned long)
+DEFINE_BASIC_PRINT_TYPE_FUNC(u8 , "%x", unsigned char)
+DEFINE_BASIC_PRINT_TYPE_FUNC(u16, "%x", unsigned short)
+DEFINE_BASIC_PRINT_TYPE_FUNC(u32, "%x", unsigned int)
+DEFINE_BASIC_PRINT_TYPE_FUNC(s8,  "%d", signed char)
+DEFINE_BASIC_PRINT_TYPE_FUNC(s16, "%d", short)
+DEFINE_BASIC_PRINT_TYPE_FUNC(s32, "%d", int)
+#if BITS_PER_LONG == 32
 DEFINE_BASIC_PRINT_TYPE_FUNC(u64, "%llx", unsigned long long)
-DEFINE_BASIC_PRINT_TYPE_FUNC(s8, "%d", int)
-DEFINE_BASIC_PRINT_TYPE_FUNC(s16, "%d", int)
-DEFINE_BASIC_PRINT_TYPE_FUNC(s32, "%ld", long)
 DEFINE_BASIC_PRINT_TYPE_FUNC(s64, "%lld", long long)
+#else /* BITS_PER_LONG == 64 */
+DEFINE_BASIC_PRINT_TYPE_FUNC(u64, "%lx", unsigned long)
+DEFINE_BASIC_PRINT_TYPE_FUNC(s64, "%ld", long)
+#endif
 
 static inline void *get_rloc_data(u32 *dl)
 {
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH 03/13] tracing/kprobes: Move fetch functions to trace_kprobe.c
  2013-10-29  6:53 [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6) Namhyung Kim
  2013-10-29  6:53 ` [PATCH 01/13] tracing/uprobes: Fix documentation of uprobe registration syntax Namhyung Kim
  2013-10-29  6:53 ` [PATCH 02/13] tracing/probes: Fix basic print type functions Namhyung Kim
@ 2013-10-29  6:53 ` Namhyung Kim
  2013-10-29  6:53 ` [PATCH 04/13] tracing/kprobes: Add fetch{,_size} member into deref fetch method Namhyung Kim
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 92+ messages in thread
From: Namhyung Kim @ 2013-10-29  6:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee, Hemant Kumar,
	LKML, Srikar Dronamraju, Oleg Nesterov, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

From: Hyeoncheol Lee <cheol.lee@lge.com>

Move kprobes-specific fetch functions to the trace_kprobe.c file.
Also define kprobes_fetch_type_table in the .c file.  This table is
shared with uprobes for now, but the uprobes will get its own table
in the later patch.

This is a preparation for supporting more fetch functions to uprobes
and no functional changes are intended.

Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Hyeoncheol Lee <cheol.lee@lge.com>
[namhyung@kernel.org: Split original patch into pieces as requested]
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 kernel/trace/trace_kprobe.c | 169 +++++++++++++++++++++++++
 kernel/trace/trace_probe.c  | 299 +++++---------------------------------------
 kernel/trace/trace_probe.h  | 132 +++++++++++++++++++
 3 files changed, 335 insertions(+), 265 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 243f6834d026..1eff166990c2 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -754,6 +754,175 @@ static const struct file_operations kprobe_profile_ops = {
 	.release        = seq_release,
 };
 
+/*
+ * kprobes-specific fetch functions
+ */
+#define DEFINE_FETCH_stack(type)					\
+__kprobes void FETCH_FUNC_NAME(stack, type)(struct pt_regs *regs,	\
+					  void *offset, void *dest)	\
+{									\
+	*(type *)dest = (type)regs_get_kernel_stack_nth(regs,		\
+				(unsigned int)((unsigned long)offset));	\
+}
+DEFINE_BASIC_FETCH_FUNCS(stack)
+/* No string on the stack entry */
+#define fetch_stack_string	NULL
+#define fetch_stack_string_size	NULL
+
+#define DEFINE_FETCH_memory(type)					\
+__kprobes void FETCH_FUNC_NAME(memory, type)(struct pt_regs *regs,	\
+					  void *addr, void *dest)	\
+{									\
+	type retval;							\
+	if (probe_kernel_address(addr, retval))				\
+		*(type *)dest = 0;					\
+	else								\
+		*(type *)dest = retval;					\
+}
+DEFINE_BASIC_FETCH_FUNCS(memory)
+/*
+ * Fetch a null-terminated string. Caller MUST set *(u32 *)dest with max
+ * length and relative data location.
+ */
+__kprobes void FETCH_FUNC_NAME(memory, string)(struct pt_regs *regs,
+					       void *addr, void *dest)
+{
+	long ret;
+	int maxlen = get_rloc_len(*(u32 *)dest);
+	u8 *dst = get_rloc_data(dest);
+	u8 *src = addr;
+	mm_segment_t old_fs = get_fs();
+
+	if (!maxlen)
+		return;
+
+	/*
+	 * Try to get string again, since the string can be changed while
+	 * probing.
+	 */
+	set_fs(KERNEL_DS);
+	pagefault_disable();
+
+	do
+		ret = __copy_from_user_inatomic(dst++, src++, 1);
+	while (dst[-1] && ret == 0 && src - (u8 *)addr < maxlen);
+
+	dst[-1] = '\0';
+	pagefault_enable();
+	set_fs(old_fs);
+
+	if (ret < 0) {	/* Failed to fetch string */
+		((u8 *)get_rloc_data(dest))[0] = '\0';
+		*(u32 *)dest = make_data_rloc(0, get_rloc_offs(*(u32 *)dest));
+	} else {
+		*(u32 *)dest = make_data_rloc(src - (u8 *)addr,
+					      get_rloc_offs(*(u32 *)dest));
+	}
+}
+
+/* Return the length of string -- including null terminal byte */
+__kprobes void FETCH_FUNC_NAME(memory, string_size)(struct pt_regs *regs,
+						    void *addr, void *dest)
+{
+	mm_segment_t old_fs;
+	int ret, len = 0;
+	u8 c;
+
+	old_fs = get_fs();
+	set_fs(KERNEL_DS);
+	pagefault_disable();
+
+	do {
+		ret = __copy_from_user_inatomic(&c, (u8 *)addr + len, 1);
+		len++;
+	} while (c && ret == 0 && len < MAX_STRING_SIZE);
+
+	pagefault_enable();
+	set_fs(old_fs);
+
+	if (ret < 0)	/* Failed to check the length */
+		*(u32 *)dest = 0;
+	else
+		*(u32 *)dest = len;
+}
+
+/* Memory fetching by symbol */
+struct symbol_cache {
+	char		*symbol;
+	long		offset;
+	unsigned long	addr;
+};
+
+unsigned long update_symbol_cache(struct symbol_cache *sc)
+{
+	sc->addr = (unsigned long)kallsyms_lookup_name(sc->symbol);
+
+	if (sc->addr)
+		sc->addr += sc->offset;
+
+	return sc->addr;
+}
+
+void free_symbol_cache(struct symbol_cache *sc)
+{
+	kfree(sc->symbol);
+	kfree(sc);
+}
+
+struct symbol_cache *alloc_symbol_cache(const char *sym, long offset)
+{
+	struct symbol_cache *sc;
+
+	if (!sym || strlen(sym) == 0)
+		return NULL;
+
+	sc = kzalloc(sizeof(struct symbol_cache), GFP_KERNEL);
+	if (!sc)
+		return NULL;
+
+	sc->symbol = kstrdup(sym, GFP_KERNEL);
+	if (!sc->symbol) {
+		kfree(sc);
+		return NULL;
+	}
+	sc->offset = offset;
+	update_symbol_cache(sc);
+
+	return sc;
+}
+
+#define DEFINE_FETCH_symbol(type)					\
+__kprobes void FETCH_FUNC_NAME(symbol, type)(struct pt_regs *regs,	\
+					  void *data, void *dest)	\
+{									\
+	struct symbol_cache *sc = data;					\
+	if (sc->addr)							\
+		fetch_memory_##type(regs, (void *)sc->addr, dest);	\
+	else								\
+		*(type *)dest = 0;					\
+}
+DEFINE_BASIC_FETCH_FUNCS(symbol)
+DEFINE_FETCH_symbol(string)
+DEFINE_FETCH_symbol(string_size)
+
+/* Fetch type information table */
+const struct fetch_type kprobes_fetch_type_table[] = {
+	/* Special types */
+	[FETCH_TYPE_STRING] = __ASSIGN_FETCH_TYPE("string", string, string,
+					sizeof(u32), 1, "__data_loc char[]"),
+	[FETCH_TYPE_STRSIZE] = __ASSIGN_FETCH_TYPE("string_size", u32,
+					string_size, sizeof(u32), 0, "u32"),
+	/* Basic types */
+	ASSIGN_FETCH_TYPE(u8,  u8,  0),
+	ASSIGN_FETCH_TYPE(u16, u16, 0),
+	ASSIGN_FETCH_TYPE(u32, u32, 0),
+	ASSIGN_FETCH_TYPE(u64, u64, 0),
+	ASSIGN_FETCH_TYPE(s8,  u8,  1),
+	ASSIGN_FETCH_TYPE(s16, u16, 1),
+	ASSIGN_FETCH_TYPE(s32, u32, 1),
+	ASSIGN_FETCH_TYPE(s64, u64, 1),
+};
+
 /* Sum up total data length for dynamic arraies (strings) */
 static __kprobes int __get_data_size(struct trace_probe *tp,
 				     struct pt_regs *regs)
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index b571e4de0769..41f654d24cd9 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -35,19 +35,15 @@ const char *reserved_field_names[] = {
 	FIELD_STRING_FUNC,
 };
 
-/* Printing function type */
-#define PRINT_TYPE_FUNC_NAME(type)	print_type_##type
-#define PRINT_TYPE_FMT_NAME(type)	print_type_format_##type
-
 /* Printing  in basic type function template */
 #define DEFINE_BASIC_PRINT_TYPE_FUNC(type, fmt, cast)			\
-static __kprobes int PRINT_TYPE_FUNC_NAME(type)(struct trace_seq *s,	\
+__kprobes int PRINT_TYPE_FUNC_NAME(type)(struct trace_seq *s,		\
 						const char *name,	\
-						void *data, void *ent)\
+						void *data, void *ent)	\
 {									\
 	return trace_seq_printf(s, " %s=" fmt, name, (cast)*(type *)data);\
 }									\
-static const char PRINT_TYPE_FMT_NAME(type)[] = fmt;
+const char PRINT_TYPE_FMT_NAME(type)[] = fmt;
 
 DEFINE_BASIC_PRINT_TYPE_FUNC(u8 , "%x", unsigned char)
 DEFINE_BASIC_PRINT_TYPE_FUNC(u16, "%x", unsigned short)
@@ -63,25 +59,10 @@ DEFINE_BASIC_PRINT_TYPE_FUNC(u64, "%lx", unsigned long)
 DEFINE_BASIC_PRINT_TYPE_FUNC(s64, "%ld", long)
 #endif
 
-static inline void *get_rloc_data(u32 *dl)
-{
-	return (u8 *)dl + get_rloc_offs(*dl);
-}
-
-/* For data_loc conversion */
-static inline void *get_loc_data(u32 *dl, void *ent)
-{
-	return (u8 *)ent + get_rloc_offs(*dl);
-}
-
-/* For defining macros, define string/string_size types */
-typedef u32 string;
-typedef u32 string_size;
-
 /* Print type function for string type */
-static __kprobes int PRINT_TYPE_FUNC_NAME(string)(struct trace_seq *s,
-						  const char *name,
-						  void *data, void *ent)
+__kprobes int PRINT_TYPE_FUNC_NAME(string)(struct trace_seq *s,
+					   const char *name,
+					   void *data, void *ent)
 {
 	int len = *(u32 *)data >> 16;
 
@@ -92,199 +73,25 @@ static __kprobes int PRINT_TYPE_FUNC_NAME(string)(struct trace_seq *s,
 					(const char *)get_loc_data(data, ent));
 }
 
-static const char PRINT_TYPE_FMT_NAME(string)[] = "\\\"%s\\\"";
-
-#define FETCH_FUNC_NAME(method, type)	fetch_##method##_##type
-/*
- * Define macro for basic types - we don't need to define s* types, because
- * we have to care only about bitwidth at recording time.
- */
-#define DEFINE_BASIC_FETCH_FUNCS(method) \
-DEFINE_FETCH_##method(u8)		\
-DEFINE_FETCH_##method(u16)		\
-DEFINE_FETCH_##method(u32)		\
-DEFINE_FETCH_##method(u64)
-
-#define CHECK_FETCH_FUNCS(method, fn)			\
-	(((FETCH_FUNC_NAME(method, u8) == fn) ||	\
-	  (FETCH_FUNC_NAME(method, u16) == fn) ||	\
-	  (FETCH_FUNC_NAME(method, u32) == fn) ||	\
-	  (FETCH_FUNC_NAME(method, u64) == fn) ||	\
-	  (FETCH_FUNC_NAME(method, string) == fn) ||	\
-	  (FETCH_FUNC_NAME(method, string_size) == fn)) \
-	 && (fn != NULL))
+const char PRINT_TYPE_FMT_NAME(string)[] = "\\\"%s\\\"";
 
 /* Data fetch function templates */
 #define DEFINE_FETCH_reg(type)						\
-static __kprobes void FETCH_FUNC_NAME(reg, type)(struct pt_regs *regs,	\
+__kprobes void FETCH_FUNC_NAME(reg, type)(struct pt_regs *regs,		\
 					void *offset, void *dest)	\
 {									\
 	*(type *)dest = (type)regs_get_register(regs,			\
 				(unsigned int)((unsigned long)offset));	\
 }
 DEFINE_BASIC_FETCH_FUNCS(reg)
-/* No string on the register */
-#define fetch_reg_string	NULL
-#define fetch_reg_string_size	NULL
-
-#define DEFINE_FETCH_stack(type)					\
-static __kprobes void FETCH_FUNC_NAME(stack, type)(struct pt_regs *regs,\
-					  void *offset, void *dest)	\
-{									\
-	*(type *)dest = (type)regs_get_kernel_stack_nth(regs,		\
-				(unsigned int)((unsigned long)offset));	\
-}
-DEFINE_BASIC_FETCH_FUNCS(stack)
-/* No string on the stack entry */
-#define fetch_stack_string	NULL
-#define fetch_stack_string_size	NULL
 
 #define DEFINE_FETCH_retval(type)					\
-static __kprobes void FETCH_FUNC_NAME(retval, type)(struct pt_regs *regs,\
+__kprobes void FETCH_FUNC_NAME(retval, type)(struct pt_regs *regs,	\
 					  void *dummy, void *dest)	\
 {									\
 	*(type *)dest = (type)regs_return_value(regs);			\
 }
 DEFINE_BASIC_FETCH_FUNCS(retval)
-/* No string on the retval */
-#define fetch_retval_string		NULL
-#define fetch_retval_string_size	NULL
-
-#define DEFINE_FETCH_memory(type)					\
-static __kprobes void FETCH_FUNC_NAME(memory, type)(struct pt_regs *regs,\
-					  void *addr, void *dest)	\
-{									\
-	type retval;							\
-	if (probe_kernel_address(addr, retval))				\
-		*(type *)dest = 0;					\
-	else								\
-		*(type *)dest = retval;					\
-}
-DEFINE_BASIC_FETCH_FUNCS(memory)
-/*
- * Fetch a null-terminated string. Caller MUST set *(u32 *)dest with max
- * length and relative data location.
- */
-static __kprobes void FETCH_FUNC_NAME(memory, string)(struct pt_regs *regs,
-						      void *addr, void *dest)
-{
-	long ret;
-	int maxlen = get_rloc_len(*(u32 *)dest);
-	u8 *dst = get_rloc_data(dest);
-	u8 *src = addr;
-	mm_segment_t old_fs = get_fs();
-
-	if (!maxlen)
-		return;
-
-	/*
-	 * Try to get string again, since the string can be changed while
-	 * probing.
-	 */
-	set_fs(KERNEL_DS);
-	pagefault_disable();
-
-	do
-		ret = __copy_from_user_inatomic(dst++, src++, 1);
-	while (dst[-1] && ret == 0 && src - (u8 *)addr < maxlen);
-
-	dst[-1] = '\0';
-	pagefault_enable();
-	set_fs(old_fs);
-
-	if (ret < 0) {	/* Failed to fetch string */
-		((u8 *)get_rloc_data(dest))[0] = '\0';
-		*(u32 *)dest = make_data_rloc(0, get_rloc_offs(*(u32 *)dest));
-	} else {
-		*(u32 *)dest = make_data_rloc(src - (u8 *)addr,
-					      get_rloc_offs(*(u32 *)dest));
-	}
-}
-
-/* Return the length of string -- including null terminal byte */
-static __kprobes void FETCH_FUNC_NAME(memory, string_size)(struct pt_regs *regs,
-							void *addr, void *dest)
-{
-	mm_segment_t old_fs;
-	int ret, len = 0;
-	u8 c;
-
-	old_fs = get_fs();
-	set_fs(KERNEL_DS);
-	pagefault_disable();
-
-	do {
-		ret = __copy_from_user_inatomic(&c, (u8 *)addr + len, 1);
-		len++;
-	} while (c && ret == 0 && len < MAX_STRING_SIZE);
-
-	pagefault_enable();
-	set_fs(old_fs);
-
-	if (ret < 0)	/* Failed to check the length */
-		*(u32 *)dest = 0;
-	else
-		*(u32 *)dest = len;
-}
-
-/* Memory fetching by symbol */
-struct symbol_cache {
-	char		*symbol;
-	long		offset;
-	unsigned long	addr;
-};
-
-static unsigned long update_symbol_cache(struct symbol_cache *sc)
-{
-	sc->addr = (unsigned long)kallsyms_lookup_name(sc->symbol);
-
-	if (sc->addr)
-		sc->addr += sc->offset;
-
-	return sc->addr;
-}
-
-static void free_symbol_cache(struct symbol_cache *sc)
-{
-	kfree(sc->symbol);
-	kfree(sc);
-}
-
-static struct symbol_cache *alloc_symbol_cache(const char *sym, long offset)
-{
-	struct symbol_cache *sc;
-
-	if (!sym || strlen(sym) == 0)
-		return NULL;
-
-	sc = kzalloc(sizeof(struct symbol_cache), GFP_KERNEL);
-	if (!sc)
-		return NULL;
-
-	sc->symbol = kstrdup(sym, GFP_KERNEL);
-	if (!sc->symbol) {
-		kfree(sc);
-		return NULL;
-	}
-	sc->offset = offset;
-	update_symbol_cache(sc);
-
-	return sc;
-}
-
-#define DEFINE_FETCH_symbol(type)					\
-static __kprobes void FETCH_FUNC_NAME(symbol, type)(struct pt_regs *regs,\
-					  void *data, void *dest)	\
-{									\
-	struct symbol_cache *sc = data;					\
-	if (sc->addr)							\
-		fetch_memory_##type(regs, (void *)sc->addr, dest);	\
-	else								\
-		*(type *)dest = 0;					\
-}
-DEFINE_BASIC_FETCH_FUNCS(symbol)
-DEFINE_FETCH_symbol(string)
-DEFINE_FETCH_symbol(string_size)
 
 /* Dereference memory access function */
 struct deref_fetch_param {
@@ -293,7 +100,7 @@ struct deref_fetch_param {
 };
 
 #define DEFINE_FETCH_deref(type)					\
-static __kprobes void FETCH_FUNC_NAME(deref, type)(struct pt_regs *regs,\
+__kprobes void FETCH_FUNC_NAME(deref, type)(struct pt_regs *regs,	\
 					    void *data, void *dest)	\
 {									\
 	struct deref_fetch_param *dprm = data;				\
@@ -334,7 +141,7 @@ struct bitfield_fetch_param {
 };
 
 #define DEFINE_FETCH_bitfield(type)					\
-static __kprobes void FETCH_FUNC_NAME(bitfield, type)(struct pt_regs *regs,\
+__kprobes void FETCH_FUNC_NAME(bitfield, type)(struct pt_regs *regs,\
 					    void *data, void *dest)	\
 {									\
 	struct bitfield_fetch_param *bprm = data;			\
@@ -348,8 +155,6 @@ static __kprobes void FETCH_FUNC_NAME(bitfield, type)(struct pt_regs *regs,\
 }
 
 DEFINE_BASIC_FETCH_FUNCS(bitfield)
-#define fetch_bitfield_string		NULL
-#define fetch_bitfield_string_size	NULL
 
 static __kprobes void
 update_bitfield_fetch_param(struct bitfield_fetch_param *data)
@@ -385,52 +190,8 @@ free_bitfield_fetch_param(struct bitfield_fetch_param *data)
 #define DEFAULT_FETCH_TYPE _DEFAULT_FETCH_TYPE(BITS_PER_LONG)
 #define DEFAULT_FETCH_TYPE_STR __stringify(DEFAULT_FETCH_TYPE)
 
-#define ASSIGN_FETCH_FUNC(method, type)	\
-	[FETCH_MTD_##method] = FETCH_FUNC_NAME(method, type)
-
-#define __ASSIGN_FETCH_TYPE(_name, ptype, ftype, _size, sign, _fmttype)	\
-	{.name = _name,				\
-	 .size = _size,					\
-	 .is_signed = sign,				\
-	 .print = PRINT_TYPE_FUNC_NAME(ptype),		\
-	 .fmt = PRINT_TYPE_FMT_NAME(ptype),		\
-	 .fmttype = _fmttype,				\
-	 .fetch = {					\
-ASSIGN_FETCH_FUNC(reg, ftype),				\
-ASSIGN_FETCH_FUNC(stack, ftype),			\
-ASSIGN_FETCH_FUNC(retval, ftype),			\
-ASSIGN_FETCH_FUNC(memory, ftype),			\
-ASSIGN_FETCH_FUNC(symbol, ftype),			\
-ASSIGN_FETCH_FUNC(deref, ftype),			\
-ASSIGN_FETCH_FUNC(bitfield, ftype),			\
-	  }						\
-	}
-
-#define ASSIGN_FETCH_TYPE(ptype, ftype, sign)			\
-	__ASSIGN_FETCH_TYPE(#ptype, ptype, ftype, sizeof(ftype), sign, #ptype)
-
-#define FETCH_TYPE_STRING	0
-#define FETCH_TYPE_STRSIZE	1
-
-/* Fetch type information table */
-static const struct fetch_type fetch_type_table[] = {
-	/* Special types */
-	[FETCH_TYPE_STRING] = __ASSIGN_FETCH_TYPE("string", string, string,
-					sizeof(u32), 1, "__data_loc char[]"),
-	[FETCH_TYPE_STRSIZE] = __ASSIGN_FETCH_TYPE("string_size", u32,
-					string_size, sizeof(u32), 0, "u32"),
-	/* Basic types */
-	ASSIGN_FETCH_TYPE(u8,  u8,  0),
-	ASSIGN_FETCH_TYPE(u16, u16, 0),
-	ASSIGN_FETCH_TYPE(u32, u32, 0),
-	ASSIGN_FETCH_TYPE(u64, u64, 0),
-	ASSIGN_FETCH_TYPE(s8,  u8,  1),
-	ASSIGN_FETCH_TYPE(s16, u16, 1),
-	ASSIGN_FETCH_TYPE(s32, u32, 1),
-	ASSIGN_FETCH_TYPE(s64, u64, 1),
-};
-
-static const struct fetch_type *find_fetch_type(const char *type)
+static const struct fetch_type *find_fetch_type(const char *type,
+						const struct fetch_type *ttbl)
 {
 	int i;
 
@@ -451,21 +212,21 @@ static const struct fetch_type *find_fetch_type(const char *type)
 
 		switch (bs) {
 		case 8:
-			return find_fetch_type("u8");
+			return find_fetch_type("u8", ttbl);
 		case 16:
-			return find_fetch_type("u16");
+			return find_fetch_type("u16", ttbl);
 		case 32:
-			return find_fetch_type("u32");
+			return find_fetch_type("u32", ttbl);
 		case 64:
-			return find_fetch_type("u64");
+			return find_fetch_type("u64", ttbl);
 		default:
 			goto fail;
 		}
 	}
 
-	for (i = 0; i < ARRAY_SIZE(fetch_type_table); i++)
-		if (strcmp(type, fetch_type_table[i].name) == 0)
-			return &fetch_type_table[i];
+	for (i = 0; i < NR_FETCH_TYPES; i++)
+		if (strcmp(type, ttbl[i].name) == 0)
+			return &ttbl[i];
 
 fail:
 	return NULL;
@@ -479,16 +240,17 @@ static __kprobes void fetch_stack_address(struct pt_regs *regs,
 }
 
 static fetch_func_t get_fetch_size_function(const struct fetch_type *type,
-					fetch_func_t orig_fn)
+					    fetch_func_t orig_fn,
+					    const struct fetch_type *ttbl)
 {
 	int i;
 
-	if (type != &fetch_type_table[FETCH_TYPE_STRING])
+	if (type != &ttbl[FETCH_TYPE_STRING])
 		return NULL;	/* Only string type needs size function */
 
 	for (i = 0; i < FETCH_MTD_END; i++)
 		if (type->fetch[i] == orig_fn)
-			return fetch_type_table[FETCH_TYPE_STRSIZE].fetch[i];
+			return ttbl[FETCH_TYPE_STRSIZE].fetch[i];
 
 	WARN_ON(1);	/* This should not happen */
 
@@ -561,6 +323,9 @@ static int parse_probe_arg(char *arg, const struct fetch_type *t,
 	long offset;
 	char *tmp;
 	int ret;
+	const struct fetch_type *ttbl;
+
+	ttbl = kprobes_fetch_type_table;
 
 	ret = 0;
 
@@ -621,7 +386,7 @@ static int parse_probe_arg(char *arg, const struct fetch_type *t,
 			struct deref_fetch_param	*dprm;
 			const struct fetch_type		*t2;
 
-			t2 = find_fetch_type(NULL);
+			t2 = find_fetch_type(NULL, ttbl);
 			*tmp = '\0';
 			dprm = kzalloc(sizeof(struct deref_fetch_param), GFP_KERNEL);
 
@@ -692,6 +457,9 @@ int traceprobe_parse_probe_arg(char *arg, ssize_t *size,
 {
 	const char *t;
 	int ret;
+	const struct fetch_type *ttbl;
+
+	ttbl = kprobes_fetch_type_table;
 
 	if (strlen(arg) > MAX_ARGSTR_LEN) {
 		pr_info("Argument is too long.: %s\n",  arg);
@@ -707,7 +475,7 @@ int traceprobe_parse_probe_arg(char *arg, ssize_t *size,
 		arg[t - parg->comm] = '\0';
 		t++;
 	}
-	parg->type = find_fetch_type(t);
+	parg->type = find_fetch_type(t, ttbl);
 	if (!parg->type) {
 		pr_info("Unsupported type: %s\n", t);
 		return -EINVAL;
@@ -721,7 +489,8 @@ int traceprobe_parse_probe_arg(char *arg, ssize_t *size,
 
 	if (ret >= 0) {
 		parg->fetch_size.fn = get_fetch_size_function(parg->type,
-							      parg->fetch.fn);
+							      parg->fetch.fn,
+							      ttbl);
 		parg->fetch_size.data = parg->fetch.data;
 	}
 
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 5c7e09d10d74..8c62746e5419 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -81,11 +81,46 @@
  */
 #define convert_rloc_to_loc(dl, offs)	((u32)(dl) + (offs))
 
+static inline void *get_rloc_data(u32 *dl)
+{
+	return (u8 *)dl + get_rloc_offs(*dl);
+}
+
+/* For data_loc conversion */
+static inline void *get_loc_data(u32 *dl, void *ent)
+{
+	return (u8 *)ent + get_rloc_offs(*dl);
+}
+
+
+/* For defining macros, define string/string_size types */
+typedef u32 string;
+typedef u32 string_size;
+
 /* Data fetch function type */
 typedef	void (*fetch_func_t)(struct pt_regs *, void *, void *);
 /* Printing function type */
 typedef int (*print_type_func_t)(struct trace_seq *, const char *, void *, void *);
 
+/* Printing function type */
+#define PRINT_TYPE_FUNC_NAME(type)	print_type_##type
+#define PRINT_TYPE_FMT_NAME(type)	print_type_format_##type
+
+#define DECLARE_PRINT_TYPE_FUNC(type)					\
+extern int PRINT_TYPE_FUNC_NAME(type)(struct trace_seq *, const char *, \
+				     void *, void *);			\
+extern const char PRINT_TYPE_FMT_NAME(type)[]
+
+DECLARE_PRINT_TYPE_FUNC(u8);
+DECLARE_PRINT_TYPE_FUNC(u16);
+DECLARE_PRINT_TYPE_FUNC(u32);
+DECLARE_PRINT_TYPE_FUNC(u64);
+DECLARE_PRINT_TYPE_FUNC(s8);
+DECLARE_PRINT_TYPE_FUNC(s16);
+DECLARE_PRINT_TYPE_FUNC(s32);
+DECLARE_PRINT_TYPE_FUNC(s64);
+DECLARE_PRINT_TYPE_FUNC(string);
+
 /* Fetch types */
 enum {
 	FETCH_MTD_reg = 0,
@@ -124,12 +159,109 @@ struct probe_arg {
 	const struct fetch_type	*type;	/* Type of this argument */
 };
 
+#define FETCH_FUNC_NAME(method, type)	fetch_##method##_##type
+
+#define DECLARE_FETCH_FUNC(method, type)				\
+extern void FETCH_FUNC_NAME(method, type)(struct pt_regs *, void *, void *)
+
+#define DECLARE_BASIC_FETCH_FUNCS(method) 	\
+DECLARE_FETCH_FUNC(method, u8);	  		\
+DECLARE_FETCH_FUNC(method, u16);		\
+DECLARE_FETCH_FUNC(method, u32);		\
+DECLARE_FETCH_FUNC(method, u64)
+
+/*
+ * Declare common fetch functions for both of kprobes and uprobes
+ */
+DECLARE_BASIC_FETCH_FUNCS(reg);
+#define fetch_reg_string		NULL
+#define fetch_reg_string_size		NULL
+
+DECLARE_BASIC_FETCH_FUNCS(stack);
+#define fetch_stack_string		NULL
+#define fetch_stack_string_size		NULL
+
+DECLARE_BASIC_FETCH_FUNCS(retval);
+#define fetch_retval_string		NULL
+#define fetch_retval_string_size	NULL
+
+DECLARE_BASIC_FETCH_FUNCS(memory);
+DECLARE_FETCH_FUNC(memory, string);
+DECLARE_FETCH_FUNC(memory, string_size);
+
+DECLARE_BASIC_FETCH_FUNCS(symbol);
+DECLARE_FETCH_FUNC(symbol, string);
+DECLARE_FETCH_FUNC(symbol, string_size);
+
+DECLARE_BASIC_FETCH_FUNCS(deref);
+DECLARE_FETCH_FUNC(deref, string);
+DECLARE_FETCH_FUNC(deref, string_size);
+
+DECLARE_BASIC_FETCH_FUNCS(bitfield);
+#define fetch_bitfield_string		NULL
+#define fetch_bitfield_string_size	NULL
+
+/*
+ * Define macro for basic types - we don't need to define s* types, because
+ * we have to care only about bitwidth at recording time.
+ */
+#define DEFINE_BASIC_FETCH_FUNCS(method) \
+DEFINE_FETCH_##method(u8)		\
+DEFINE_FETCH_##method(u16)		\
+DEFINE_FETCH_##method(u32)		\
+DEFINE_FETCH_##method(u64)
+
+#define CHECK_FETCH_FUNCS(method, fn)			\
+	(((FETCH_FUNC_NAME(method, u8) == fn) ||	\
+	  (FETCH_FUNC_NAME(method, u16) == fn) ||	\
+	  (FETCH_FUNC_NAME(method, u32) == fn) ||	\
+	  (FETCH_FUNC_NAME(method, u64) == fn) ||	\
+	  (FETCH_FUNC_NAME(method, string) == fn) ||	\
+	  (FETCH_FUNC_NAME(method, string_size) == fn)) \
+	 && (fn != NULL))
+
+#define ASSIGN_FETCH_FUNC(method, type)	\
+	[FETCH_MTD_##method] = FETCH_FUNC_NAME(method, type)
+
+#define __ASSIGN_FETCH_TYPE(_name, ptype, ftype, _size, sign, _fmttype)	\
+	{.name = _name,				\
+	 .size = _size,					\
+	 .is_signed = sign,				\
+	 .print = PRINT_TYPE_FUNC_NAME(ptype),		\
+	 .fmt = PRINT_TYPE_FMT_NAME(ptype),		\
+	 .fmttype = _fmttype,				\
+	 .fetch = {					\
+ASSIGN_FETCH_FUNC(reg, ftype),				\
+ASSIGN_FETCH_FUNC(stack, ftype),			\
+ASSIGN_FETCH_FUNC(retval, ftype),			\
+ASSIGN_FETCH_FUNC(memory, ftype),			\
+ASSIGN_FETCH_FUNC(symbol, ftype),			\
+ASSIGN_FETCH_FUNC(deref, ftype),			\
+ASSIGN_FETCH_FUNC(bitfield, ftype),			\
+	  }						\
+	}
+
+#define ASSIGN_FETCH_TYPE(ptype, ftype, sign)			\
+	__ASSIGN_FETCH_TYPE(#ptype, ptype, ftype, sizeof(ftype), sign, #ptype)
+
+#define FETCH_TYPE_STRING	0
+#define FETCH_TYPE_STRSIZE	1
+
+#define NR_FETCH_TYPES		10
+
+extern const struct fetch_type kprobes_fetch_type_table[];
+
 static inline __kprobes void call_fetch(struct fetch_param *fprm,
 				 struct pt_regs *regs, void *dest)
 {
 	return fprm->fn(regs, fprm->data, dest);
 }
 
+struct symbol_cache;
+unsigned long update_symbol_cache(struct symbol_cache *sc);
+void free_symbol_cache(struct symbol_cache *sc);
+struct symbol_cache *alloc_symbol_cache(const char *sym, long offset);
+
 /* Check the name is good for event/group/fields */
 static inline int is_good_name(const char *name)
 {
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH 04/13] tracing/kprobes: Add fetch{,_size} member into deref fetch method
  2013-10-29  6:53 [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6) Namhyung Kim
                   ` (2 preceding siblings ...)
  2013-10-29  6:53 ` [PATCH 03/13] tracing/kprobes: Move fetch functions to trace_kprobe.c Namhyung Kim
@ 2013-10-29  6:53 ` Namhyung Kim
  2013-10-29  6:53 ` [PATCH 05/13] tracing/kprobes: Staticize stack and memory fetch functions Namhyung Kim
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 92+ messages in thread
From: Namhyung Kim @ 2013-10-29  6:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee, Hemant Kumar,
	LKML, Srikar Dronamraju, Oleg Nesterov, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

From: Hyeoncheol Lee <cheol.lee@lge.com>

The deref fetch methods access a memory region but it assumes that
it's a kernel memory since uprobes does not support them.

Add ->fetch and ->fetch_size member in order to provide a proper
access methods for supporting uprobes.

Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Hyeoncheol Lee <cheol.lee@lge.com>
[namhyung@kernel.org: Split original patch into pieces as requested]
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 kernel/trace/trace_probe.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 41f654d24cd9..b7b8bda02d6e 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -97,6 +97,8 @@ DEFINE_BASIC_FETCH_FUNCS(retval)
 struct deref_fetch_param {
 	struct fetch_param	orig;
 	long			offset;
+	fetch_func_t		fetch;
+	fetch_func_t		fetch_size;
 };
 
 #define DEFINE_FETCH_deref(type)					\
@@ -108,13 +110,26 @@ __kprobes void FETCH_FUNC_NAME(deref, type)(struct pt_regs *regs,	\
 	call_fetch(&dprm->orig, regs, &addr);				\
 	if (addr) {							\
 		addr += dprm->offset;					\
-		fetch_memory_##type(regs, (void *)addr, dest);		\
+		dprm->fetch(regs, (void *)addr, dest);			\
 	} else								\
 		*(type *)dest = 0;					\
 }
 DEFINE_BASIC_FETCH_FUNCS(deref)
 DEFINE_FETCH_deref(string)
-DEFINE_FETCH_deref(string_size)
+
+__kprobes void FETCH_FUNC_NAME(deref, string_size)(struct pt_regs *regs,
+						   void *data, void *dest)
+{
+	struct deref_fetch_param *dprm = data;
+	unsigned long addr;
+
+	call_fetch(&dprm->orig, regs, &addr);
+	if (addr && dprm->fetch_size) {
+		addr += dprm->offset;
+		dprm->fetch_size(regs, (void *)addr, dest);
+	} else
+		*(string_size *)dest = 0;
+}
 
 static __kprobes void update_deref_fetch_param(struct deref_fetch_param *data)
 {
@@ -394,6 +409,9 @@ static int parse_probe_arg(char *arg, const struct fetch_type *t,
 				return -ENOMEM;
 
 			dprm->offset = offset;
+			dprm->fetch = t->fetch[FETCH_MTD_memory];
+			dprm->fetch_size = get_fetch_size_function(t,
+							dprm->fetch, ttbl);
 			ret = parse_probe_arg(arg, t2, &dprm->orig, is_return,
 							is_kprobe);
 			if (ret)
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH 05/13] tracing/kprobes: Staticize stack and memory fetch functions
  2013-10-29  6:53 [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6) Namhyung Kim
                   ` (3 preceding siblings ...)
  2013-10-29  6:53 ` [PATCH 04/13] tracing/kprobes: Add fetch{,_size} member into deref fetch method Namhyung Kim
@ 2013-10-29  6:53 ` Namhyung Kim
  2013-10-29  6:53 ` [PATCH 06/13] tracing/kprobes: Factor out struct trace_probe Namhyung Kim
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 92+ messages in thread
From: Namhyung Kim @ 2013-10-29  6:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee, Hemant Kumar,
	LKML, Srikar Dronamraju, Oleg Nesterov, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

From: Namhyung Kim <namhyung.kim@lge.com>

Those fetch functions need to be implemented differently for kprobes
and uprobes.  Since the deref fetch functions don't call those
directly anymore, we can make them static and implement them
separately.

Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 kernel/trace/trace_kprobe.c | 8 ++++----
 kernel/trace/trace_probe.h  | 8 --------
 2 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 1eff166990c2..fdb6dec11592 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -758,7 +758,7 @@ static const struct file_operations kprobe_profile_ops = {
  * kprobes-specific fetch functions
  */
 #define DEFINE_FETCH_stack(type)					\
-__kprobes void FETCH_FUNC_NAME(stack, type)(struct pt_regs *regs,	\
+static __kprobes void FETCH_FUNC_NAME(stack, type)(struct pt_regs *regs,\
 					  void *offset, void *dest)	\
 {									\
 	*(type *)dest = (type)regs_get_kernel_stack_nth(regs,		\
@@ -770,7 +770,7 @@ DEFINE_BASIC_FETCH_FUNCS(stack)
 #define fetch_stack_string_size	NULL
 
 #define DEFINE_FETCH_memory(type)					\
-__kprobes void FETCH_FUNC_NAME(memory, type)(struct pt_regs *regs,	\
+static __kprobes void FETCH_FUNC_NAME(memory, type)(struct pt_regs *regs,\
 					  void *addr, void *dest)	\
 {									\
 	type retval;							\
@@ -784,7 +784,7 @@ DEFINE_BASIC_FETCH_FUNCS(memory)
  * Fetch a null-terminated string. Caller MUST set *(u32 *)dest with max
  * length and relative data location.
  */
-__kprobes void FETCH_FUNC_NAME(memory, string)(struct pt_regs *regs,
+static __kprobes void FETCH_FUNC_NAME(memory, string)(struct pt_regs *regs,
 					       void *addr, void *dest)
 {
 	long ret;
@@ -821,7 +821,7 @@ __kprobes void FETCH_FUNC_NAME(memory, string)(struct pt_regs *regs,
 }
 
 /* Return the length of string -- including null terminal byte */
-__kprobes void FETCH_FUNC_NAME(memory, string_size)(struct pt_regs *regs,
+static __kprobes void FETCH_FUNC_NAME(memory, string_size)(struct pt_regs *regs,
 						    void *addr, void *dest)
 {
 	mm_segment_t old_fs;
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 8c62746e5419..9ac7bdf607cc 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -177,18 +177,10 @@ DECLARE_BASIC_FETCH_FUNCS(reg);
 #define fetch_reg_string		NULL
 #define fetch_reg_string_size		NULL
 
-DECLARE_BASIC_FETCH_FUNCS(stack);
-#define fetch_stack_string		NULL
-#define fetch_stack_string_size		NULL
-
 DECLARE_BASIC_FETCH_FUNCS(retval);
 #define fetch_retval_string		NULL
 #define fetch_retval_string_size	NULL
 
-DECLARE_BASIC_FETCH_FUNCS(memory);
-DECLARE_FETCH_FUNC(memory, string);
-DECLARE_FETCH_FUNC(memory, string_size);
-
 DECLARE_BASIC_FETCH_FUNCS(symbol);
 DECLARE_FETCH_FUNC(symbol, string);
 DECLARE_FETCH_FUNC(symbol, string_size);
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH 06/13] tracing/kprobes: Factor out struct trace_probe
  2013-10-29  6:53 [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6) Namhyung Kim
                   ` (4 preceding siblings ...)
  2013-10-29  6:53 ` [PATCH 05/13] tracing/kprobes: Staticize stack and memory fetch functions Namhyung Kim
@ 2013-10-29  6:53 ` Namhyung Kim
  2013-10-29  6:53 ` [PATCH 07/13] tracing/uprobes: Convert to " Namhyung Kim
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 92+ messages in thread
From: Namhyung Kim @ 2013-10-29  6:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee, Hemant Kumar,
	LKML, Srikar Dronamraju, Oleg Nesterov, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

From: Namhyung Kim <namhyung.kim@lge.com>

There are functions that can be shared to both of kprobes and uprobes.
Separate common data structure to struct trace_probe and use it from
the shared functions.

Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 kernel/trace/trace_kprobe.c | 396 +++++++++++++++++++++-----------------------
 kernel/trace/trace_probe.h  |  20 +++
 2 files changed, 213 insertions(+), 203 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index fdb6dec11592..6d33cfee9448 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -27,18 +27,12 @@
 /**
  * Kprobe event core functions
  */
-struct trace_probe {
+struct trace_kprobe {
 	struct list_head	list;
 	struct kretprobe	rp;	/* Use rp.kp for kprobe use */
 	unsigned long 		nhit;
-	unsigned int		flags;	/* For TP_FLAG_* */
 	const char		*symbol;	/* symbol name */
-	struct ftrace_event_class	class;
-	struct ftrace_event_call	call;
-	struct list_head	files;
-	ssize_t			size;		/* trace entry size */
-	unsigned int		nr_args;
-	struct probe_arg	args[];
+	struct trace_probe	p;
 };
 
 struct event_file_link {
@@ -46,56 +40,46 @@ struct event_file_link {
 	struct list_head		list;
 };
 
-#define SIZEOF_TRACE_PROBE(n)			\
-	(offsetof(struct trace_probe, args) +	\
+#define SIZEOF_TRACE_PROBE(n)				\
+	(offsetof(struct trace_kprobe, p.args) +	\
 	(sizeof(struct probe_arg) * (n)))
 
 
-static __kprobes bool trace_probe_is_return(struct trace_probe *tp)
+static __kprobes bool trace_kprobe_is_return(struct trace_kprobe *tk)
 {
-	return tp->rp.handler != NULL;
+	return tk->rp.handler != NULL;
 }
 
-static __kprobes const char *trace_probe_symbol(struct trace_probe *tp)
+static __kprobes const char *trace_kprobe_symbol(struct trace_kprobe *tk)
 {
-	return tp->symbol ? tp->symbol : "unknown";
+	return tk->symbol ? tk->symbol : "unknown";
 }
 
-static __kprobes unsigned long trace_probe_offset(struct trace_probe *tp)
+static __kprobes unsigned long trace_kprobe_offset(struct trace_kprobe *tk)
 {
-	return tp->rp.kp.offset;
+	return tk->rp.kp.offset;
 }
 
-static __kprobes bool trace_probe_is_enabled(struct trace_probe *tp)
+static __kprobes bool trace_kprobe_has_gone(struct trace_kprobe *tk)
 {
-	return !!(tp->flags & (TP_FLAG_TRACE | TP_FLAG_PROFILE));
+	return !!(kprobe_gone(&tk->rp.kp));
 }
 
-static __kprobes bool trace_probe_is_registered(struct trace_probe *tp)
-{
-	return !!(tp->flags & TP_FLAG_REGISTERED);
-}
-
-static __kprobes bool trace_probe_has_gone(struct trace_probe *tp)
-{
-	return !!(kprobe_gone(&tp->rp.kp));
-}
-
-static __kprobes bool trace_probe_within_module(struct trace_probe *tp,
-						struct module *mod)
+static __kprobes bool trace_kprobe_within_module(struct trace_kprobe *tk,
+						 struct module *mod)
 {
 	int len = strlen(mod->name);
-	const char *name = trace_probe_symbol(tp);
+	const char *name = trace_kprobe_symbol(tk);
 	return strncmp(mod->name, name, len) == 0 && name[len] == ':';
 }
 
-static __kprobes bool trace_probe_is_on_module(struct trace_probe *tp)
+static __kprobes bool trace_kprobe_is_on_module(struct trace_kprobe *tk)
 {
-	return !!strchr(trace_probe_symbol(tp), ':');
+	return !!strchr(trace_kprobe_symbol(tk), ':');
 }
 
-static int register_probe_event(struct trace_probe *tp);
-static int unregister_probe_event(struct trace_probe *tp);
+static int register_kprobe_event(struct trace_kprobe *tk);
+static int unregister_kprobe_event(struct trace_kprobe *tk);
 
 static DEFINE_MUTEX(probe_lock);
 static LIST_HEAD(probe_list);
@@ -107,14 +91,14 @@ static int kretprobe_dispatcher(struct kretprobe_instance *ri,
 /*
  * Allocate new trace_probe and initialize it (including kprobes).
  */
-static struct trace_probe *alloc_trace_probe(const char *group,
+static struct trace_kprobe *alloc_trace_kprobe(const char *group,
 					     const char *event,
 					     void *addr,
 					     const char *symbol,
 					     unsigned long offs,
 					     int nargs, bool is_return)
 {
-	struct trace_probe *tp;
+	struct trace_kprobe *tp;
 	int ret = -ENOMEM;
 
 	tp = kzalloc(SIZEOF_TRACE_PROBE(nargs), GFP_KERNEL);
@@ -140,9 +124,9 @@ static struct trace_probe *alloc_trace_probe(const char *group,
 		goto error;
 	}
 
-	tp->call.class = &tp->class;
-	tp->call.name = kstrdup(event, GFP_KERNEL);
-	if (!tp->call.name)
+	tp->p.call.class = &tp->p.class;
+	tp->p.call.name = kstrdup(event, GFP_KERNEL);
+	if (!tp->p.call.name)
 		goto error;
 
 	if (!group || !is_good_name(group)) {
@@ -150,41 +134,41 @@ static struct trace_probe *alloc_trace_probe(const char *group,
 		goto error;
 	}
 
-	tp->class.system = kstrdup(group, GFP_KERNEL);
-	if (!tp->class.system)
+	tp->p.class.system = kstrdup(group, GFP_KERNEL);
+	if (!tp->p.class.system)
 		goto error;
 
 	INIT_LIST_HEAD(&tp->list);
-	INIT_LIST_HEAD(&tp->files);
+	INIT_LIST_HEAD(&tp->p.files);
 	return tp;
 error:
-	kfree(tp->call.name);
+	kfree(tp->p.call.name);
 	kfree(tp->symbol);
 	kfree(tp);
 	return ERR_PTR(ret);
 }
 
-static void free_trace_probe(struct trace_probe *tp)
+static void free_trace_kprobe(struct trace_kprobe *tp)
 {
 	int i;
 
-	for (i = 0; i < tp->nr_args; i++)
-		traceprobe_free_probe_arg(&tp->args[i]);
+	for (i = 0; i < tp->p.nr_args; i++)
+		traceprobe_free_probe_arg(&tp->p.args[i]);
 
-	kfree(tp->call.class->system);
-	kfree(tp->call.name);
+	kfree(tp->p.call.class->system);
+	kfree(tp->p.call.name);
 	kfree(tp->symbol);
 	kfree(tp);
 }
 
-static struct trace_probe *find_trace_probe(const char *event,
-					    const char *group)
+static struct trace_kprobe *find_trace_kprobe(const char *event,
+					      const char *group)
 {
-	struct trace_probe *tp;
+	struct trace_kprobe *tp;
 
 	list_for_each_entry(tp, &probe_list, list)
-		if (strcmp(tp->call.name, event) == 0 &&
-		    strcmp(tp->call.class->system, group) == 0)
+		if (strcmp(tp->p.call.name, event) == 0 &&
+		    strcmp(tp->p.call.class->system, group) == 0)
 			return tp;
 	return NULL;
 }
@@ -194,7 +178,7 @@ static struct trace_probe *find_trace_probe(const char *event,
  * if the file is NULL, enable "perf" handler, or enable "trace" handler.
  */
 static int
-enable_trace_probe(struct trace_probe *tp, struct ftrace_event_file *file)
+enable_trace_kprobe(struct trace_kprobe *tp, struct ftrace_event_file *file)
 {
 	int ret = 0;
 
@@ -208,14 +192,14 @@ enable_trace_probe(struct trace_probe *tp, struct ftrace_event_file *file)
 		}
 
 		link->file = file;
-		list_add_tail_rcu(&link->list, &tp->files);
+		list_add_tail_rcu(&link->list, &tp->p.files);
 
-		tp->flags |= TP_FLAG_TRACE;
+		tp->p.flags |= TP_FLAG_TRACE;
 	} else
-		tp->flags |= TP_FLAG_PROFILE;
+		tp->p.flags |= TP_FLAG_PROFILE;
 
-	if (trace_probe_is_registered(tp) && !trace_probe_has_gone(tp)) {
-		if (trace_probe_is_return(tp))
+	if (trace_probe_is_registered(&tp->p) && !trace_kprobe_has_gone(tp)) {
+		if (trace_kprobe_is_return(tp))
 			ret = enable_kretprobe(&tp->rp);
 		else
 			ret = enable_kprobe(&tp->rp.kp);
@@ -241,14 +225,14 @@ find_event_file_link(struct trace_probe *tp, struct ftrace_event_file *file)
  * if the file is NULL, disable "perf" handler, or disable "trace" handler.
  */
 static int
-disable_trace_probe(struct trace_probe *tp, struct ftrace_event_file *file)
+disable_trace_kprobe(struct trace_kprobe *tp, struct ftrace_event_file *file)
 {
 	struct event_file_link *link = NULL;
 	int wait = 0;
 	int ret = 0;
 
 	if (file) {
-		link = find_event_file_link(tp, file);
+		link = find_event_file_link(&tp->p, file);
 		if (!link) {
 			ret = -EINVAL;
 			goto out;
@@ -256,15 +240,15 @@ disable_trace_probe(struct trace_probe *tp, struct ftrace_event_file *file)
 
 		list_del_rcu(&link->list);
 		wait = 1;
-		if (!list_empty(&tp->files))
+		if (!list_empty(&tp->p.files))
 			goto out;
 
-		tp->flags &= ~TP_FLAG_TRACE;
+		tp->p.flags &= ~TP_FLAG_TRACE;
 	} else
-		tp->flags &= ~TP_FLAG_PROFILE;
+		tp->p.flags &= ~TP_FLAG_PROFILE;
 
-	if (!trace_probe_is_enabled(tp) && trace_probe_is_registered(tp)) {
-		if (trace_probe_is_return(tp))
+	if (!trace_probe_is_enabled(&tp->p) && trace_probe_is_registered(&tp->p)) {
+		if (trace_kprobe_is_return(tp))
 			disable_kretprobe(&tp->rp);
 		else
 			disable_kprobe(&tp->rp.kp);
@@ -288,33 +272,33 @@ disable_trace_probe(struct trace_probe *tp, struct ftrace_event_file *file)
 }
 
 /* Internal register function - just handle k*probes and flags */
-static int __register_trace_probe(struct trace_probe *tp)
+static int __register_trace_kprobe(struct trace_kprobe *tp)
 {
 	int i, ret;
 
-	if (trace_probe_is_registered(tp))
+	if (trace_probe_is_registered(&tp->p))
 		return -EINVAL;
 
-	for (i = 0; i < tp->nr_args; i++)
-		traceprobe_update_arg(&tp->args[i]);
+	for (i = 0; i < tp->p.nr_args; i++)
+		traceprobe_update_arg(&tp->p.args[i]);
 
 	/* Set/clear disabled flag according to tp->flag */
-	if (trace_probe_is_enabled(tp))
+	if (trace_probe_is_enabled(&tp->p))
 		tp->rp.kp.flags &= ~KPROBE_FLAG_DISABLED;
 	else
 		tp->rp.kp.flags |= KPROBE_FLAG_DISABLED;
 
-	if (trace_probe_is_return(tp))
+	if (trace_kprobe_is_return(tp))
 		ret = register_kretprobe(&tp->rp);
 	else
 		ret = register_kprobe(&tp->rp.kp);
 
 	if (ret == 0)
-		tp->flags |= TP_FLAG_REGISTERED;
+		tp->p.flags |= TP_FLAG_REGISTERED;
 	else {
 		pr_warning("Could not insert probe at %s+%lu: %d\n",
-			   trace_probe_symbol(tp), trace_probe_offset(tp), ret);
-		if (ret == -ENOENT && trace_probe_is_on_module(tp)) {
+			   trace_kprobe_symbol(tp), trace_kprobe_offset(tp), ret);
+		if (ret == -ENOENT && trace_kprobe_is_on_module(tp)) {
 			pr_warning("This probe might be able to register after"
 				   "target module is loaded. Continue.\n");
 			ret = 0;
@@ -330,14 +314,14 @@ static int __register_trace_probe(struct trace_probe *tp)
 }
 
 /* Internal unregister function - just handle k*probes and flags */
-static void __unregister_trace_probe(struct trace_probe *tp)
+static void __unregister_trace_kprobe(struct trace_kprobe *tp)
 {
-	if (trace_probe_is_registered(tp)) {
-		if (trace_probe_is_return(tp))
+	if (trace_probe_is_registered(&tp->p)) {
+		if (trace_kprobe_is_return(tp))
 			unregister_kretprobe(&tp->rp);
 		else
 			unregister_kprobe(&tp->rp.kp);
-		tp->flags &= ~TP_FLAG_REGISTERED;
+		tp->p.flags &= ~TP_FLAG_REGISTERED;
 		/* Cleanup kprobe for reuse */
 		if (tp->rp.kp.symbol_name)
 			tp->rp.kp.addr = NULL;
@@ -345,50 +329,50 @@ static void __unregister_trace_probe(struct trace_probe *tp)
 }
 
 /* Unregister a trace_probe and probe_event: call with locking probe_lock */
-static int unregister_trace_probe(struct trace_probe *tp)
+static int unregister_trace_kprobe(struct trace_kprobe *tp)
 {
 	/* Enabled event can not be unregistered */
-	if (trace_probe_is_enabled(tp))
+	if (trace_probe_is_enabled(&tp->p))
 		return -EBUSY;
 
 	/* Will fail if probe is being used by ftrace or perf */
-	if (unregister_probe_event(tp))
+	if (unregister_kprobe_event(tp))
 		return -EBUSY;
 
-	__unregister_trace_probe(tp);
+	__unregister_trace_kprobe(tp);
 	list_del(&tp->list);
 
 	return 0;
 }
 
 /* Register a trace_probe and probe_event */
-static int register_trace_probe(struct trace_probe *tp)
+static int register_trace_kprobe(struct trace_kprobe *tp)
 {
-	struct trace_probe *old_tp;
+	struct trace_kprobe *old_tp;
 	int ret;
 
 	mutex_lock(&probe_lock);
 
 	/* Delete old (same name) event if exist */
-	old_tp = find_trace_probe(tp->call.name, tp->call.class->system);
+	old_tp = find_trace_kprobe(tp->p.call.name, tp->p.call.class->system);
 	if (old_tp) {
-		ret = unregister_trace_probe(old_tp);
+		ret = unregister_trace_kprobe(old_tp);
 		if (ret < 0)
 			goto end;
-		free_trace_probe(old_tp);
+		free_trace_kprobe(old_tp);
 	}
 
 	/* Register new event */
-	ret = register_probe_event(tp);
+	ret = register_kprobe_event(tp);
 	if (ret) {
 		pr_warning("Failed to register probe event(%d)\n", ret);
 		goto end;
 	}
 
 	/* Register k*probe */
-	ret = __register_trace_probe(tp);
+	ret = __register_trace_kprobe(tp);
 	if (ret < 0)
-		unregister_probe_event(tp);
+		unregister_kprobe_event(tp);
 	else
 		list_add_tail(&tp->list, &probe_list);
 
@@ -398,11 +382,11 @@ end:
 }
 
 /* Module notifier call back, checking event on the module */
-static int trace_probe_module_callback(struct notifier_block *nb,
+static int trace_kprobe_module_callback(struct notifier_block *nb,
 				       unsigned long val, void *data)
 {
 	struct module *mod = data;
-	struct trace_probe *tp;
+	struct trace_kprobe *tp;
 	int ret;
 
 	if (val != MODULE_STATE_COMING)
@@ -411,14 +395,14 @@ static int trace_probe_module_callback(struct notifier_block *nb,
 	/* Update probes on coming module */
 	mutex_lock(&probe_lock);
 	list_for_each_entry(tp, &probe_list, list) {
-		if (trace_probe_within_module(tp, mod)) {
+		if (trace_kprobe_within_module(tp, mod)) {
 			/* Don't need to check busy - this should have gone. */
-			__unregister_trace_probe(tp);
-			ret = __register_trace_probe(tp);
+			__unregister_trace_kprobe(tp);
+			ret = __register_trace_kprobe(tp);
 			if (ret)
 				pr_warning("Failed to re-register probe %s on"
 					   "%s: %d\n",
-					   tp->call.name, mod->name, ret);
+					   tp->p.call.name, mod->name, ret);
 		}
 	}
 	mutex_unlock(&probe_lock);
@@ -426,12 +410,12 @@ static int trace_probe_module_callback(struct notifier_block *nb,
 	return NOTIFY_DONE;
 }
 
-static struct notifier_block trace_probe_module_nb = {
-	.notifier_call = trace_probe_module_callback,
+static struct notifier_block trace_kprobe_module_nb = {
+	.notifier_call = trace_kprobe_module_callback,
 	.priority = 1	/* Invoked after kprobe module callback */
 };
 
-static int create_trace_probe(int argc, char **argv)
+static int create_trace_kprobe(int argc, char **argv)
 {
 	/*
 	 * Argument syntax:
@@ -451,7 +435,7 @@ static int create_trace_probe(int argc, char **argv)
 	 * Type of args:
 	 *  FETCHARG:TYPE : use TYPE instead of unsigned long.
 	 */
-	struct trace_probe *tp;
+	struct trace_kprobe *tp;
 	int i, ret = 0;
 	bool is_return = false, is_delete = false;
 	char *symbol = NULL, *event = NULL, *group = NULL;
@@ -498,16 +482,16 @@ static int create_trace_probe(int argc, char **argv)
 			return -EINVAL;
 		}
 		mutex_lock(&probe_lock);
-		tp = find_trace_probe(event, group);
+		tp = find_trace_kprobe(event, group);
 		if (!tp) {
 			mutex_unlock(&probe_lock);
 			pr_info("Event %s/%s doesn't exist.\n", group, event);
 			return -ENOENT;
 		}
 		/* delete an event */
-		ret = unregister_trace_probe(tp);
+		ret = unregister_trace_kprobe(tp);
 		if (ret == 0)
-			free_trace_probe(tp);
+			free_trace_kprobe(tp);
 		mutex_unlock(&probe_lock);
 		return ret;
 	}
@@ -554,7 +538,7 @@ static int create_trace_probe(int argc, char **argv)
 				 is_return ? 'r' : 'p', addr);
 		event = buf;
 	}
-	tp = alloc_trace_probe(group, event, addr, symbol, offset, argc,
+	tp = alloc_trace_kprobe(group, event, addr, symbol, offset, argc,
 			       is_return);
 	if (IS_ERR(tp)) {
 		pr_info("Failed to allocate trace_probe.(%d)\n",
@@ -565,36 +549,38 @@ static int create_trace_probe(int argc, char **argv)
 	/* parse arguments */
 	ret = 0;
 	for (i = 0; i < argc && i < MAX_TRACE_ARGS; i++) {
+		struct probe_arg *parg = &tp->p.args[i];
+
 		/* Increment count for freeing args in error case */
-		tp->nr_args++;
+		tp->p.nr_args++;
 
 		/* Parse argument name */
 		arg = strchr(argv[i], '=');
 		if (arg) {
 			*arg++ = '\0';
-			tp->args[i].name = kstrdup(argv[i], GFP_KERNEL);
+			parg->name = kstrdup(argv[i], GFP_KERNEL);
 		} else {
 			arg = argv[i];
 			/* If argument name is omitted, set "argN" */
 			snprintf(buf, MAX_EVENT_NAME_LEN, "arg%d", i + 1);
-			tp->args[i].name = kstrdup(buf, GFP_KERNEL);
+			parg->name = kstrdup(buf, GFP_KERNEL);
 		}
 
-		if (!tp->args[i].name) {
+		if (!parg->name) {
 			pr_info("Failed to allocate argument[%d] name.\n", i);
 			ret = -ENOMEM;
 			goto error;
 		}
 
-		if (!is_good_name(tp->args[i].name)) {
+		if (!is_good_name(parg->name)) {
 			pr_info("Invalid argument[%d] name: %s\n",
-				i, tp->args[i].name);
+				i, parg->name);
 			ret = -EINVAL;
 			goto error;
 		}
 
-		if (traceprobe_conflict_field_name(tp->args[i].name,
-							tp->args, i)) {
+		if (traceprobe_conflict_field_name(parg->name,
+							tp->p.args, i)) {
 			pr_info("Argument[%d] name '%s' conflicts with "
 				"another field.\n", i, argv[i]);
 			ret = -EINVAL;
@@ -602,7 +588,7 @@ static int create_trace_probe(int argc, char **argv)
 		}
 
 		/* Parse fetch argument */
-		ret = traceprobe_parse_probe_arg(arg, &tp->size, &tp->args[i],
+		ret = traceprobe_parse_probe_arg(arg, &tp->p.size, parg,
 						is_return, true);
 		if (ret) {
 			pr_info("Parse error at argument[%d]. (%d)\n", i, ret);
@@ -610,35 +596,35 @@ static int create_trace_probe(int argc, char **argv)
 		}
 	}
 
-	ret = register_trace_probe(tp);
+	ret = register_trace_kprobe(tp);
 	if (ret)
 		goto error;
 	return 0;
 
 error:
-	free_trace_probe(tp);
+	free_trace_kprobe(tp);
 	return ret;
 }
 
-static int release_all_trace_probes(void)
+static int release_all_trace_kprobes(void)
 {
-	struct trace_probe *tp;
+	struct trace_kprobe *tp;
 	int ret = 0;
 
 	mutex_lock(&probe_lock);
 	/* Ensure no probe is in use. */
 	list_for_each_entry(tp, &probe_list, list)
-		if (trace_probe_is_enabled(tp)) {
+		if (trace_probe_is_enabled(&tp->p)) {
 			ret = -EBUSY;
 			goto end;
 		}
 	/* TODO: Use batch unregistration */
 	while (!list_empty(&probe_list)) {
-		tp = list_entry(probe_list.next, struct trace_probe, list);
-		ret = unregister_trace_probe(tp);
+		tp = list_entry(probe_list.next, struct trace_kprobe, list);
+		ret = unregister_trace_kprobe(tp);
 		if (ret)
 			goto end;
-		free_trace_probe(tp);
+		free_trace_kprobe(tp);
 	}
 
 end:
@@ -666,22 +652,22 @@ static void probes_seq_stop(struct seq_file *m, void *v)
 
 static int probes_seq_show(struct seq_file *m, void *v)
 {
-	struct trace_probe *tp = v;
+	struct trace_kprobe *tp = v;
 	int i;
 
-	seq_printf(m, "%c", trace_probe_is_return(tp) ? 'r' : 'p');
-	seq_printf(m, ":%s/%s", tp->call.class->system, tp->call.name);
+	seq_printf(m, "%c", trace_kprobe_is_return(tp) ? 'r' : 'p');
+	seq_printf(m, ":%s/%s", tp->p.call.class->system, tp->p.call.name);
 
 	if (!tp->symbol)
 		seq_printf(m, " 0x%p", tp->rp.kp.addr);
 	else if (tp->rp.kp.offset)
-		seq_printf(m, " %s+%u", trace_probe_symbol(tp),
+		seq_printf(m, " %s+%u", trace_kprobe_symbol(tp),
 			   tp->rp.kp.offset);
 	else
-		seq_printf(m, " %s", trace_probe_symbol(tp));
+		seq_printf(m, " %s", trace_kprobe_symbol(tp));
 
-	for (i = 0; i < tp->nr_args; i++)
-		seq_printf(m, " %s=%s", tp->args[i].name, tp->args[i].comm);
+	for (i = 0; i < tp->p.nr_args; i++)
+		seq_printf(m, " %s=%s", tp->p.args[i].name, tp->p.args[i].comm);
 	seq_printf(m, "\n");
 
 	return 0;
@@ -699,7 +685,7 @@ static int probes_open(struct inode *inode, struct file *file)
 	int ret;
 
 	if ((file->f_mode & FMODE_WRITE) && (file->f_flags & O_TRUNC)) {
-		ret = release_all_trace_probes();
+		ret = release_all_trace_kprobes();
 		if (ret < 0)
 			return ret;
 	}
@@ -711,7 +697,7 @@ static ssize_t probes_write(struct file *file, const char __user *buffer,
 			    size_t count, loff_t *ppos)
 {
 	return traceprobe_probes_write(file, buffer, count, ppos,
-			create_trace_probe);
+			create_trace_kprobe);
 }
 
 static const struct file_operations kprobe_events_ops = {
@@ -726,9 +712,9 @@ static const struct file_operations kprobe_events_ops = {
 /* Probes profiling interfaces */
 static int probes_profile_seq_show(struct seq_file *m, void *v)
 {
-	struct trace_probe *tp = v;
+	struct trace_kprobe *tp = v;
 
-	seq_printf(m, "  %-44s %15lu %15lu\n", tp->call.name, tp->nhit,
+	seq_printf(m, "  %-44s %15lu %15lu\n", tp->p.call.name, tp->nhit,
 		   tp->rp.kp.nmissed);
 
 	return 0;
@@ -973,7 +959,7 @@ static __kprobes void store_trace_args(int ent_size, struct trace_probe *tp,
 
 /* Kprobe handler */
 static __kprobes void
-__kprobe_trace_func(struct trace_probe *tp, struct pt_regs *regs,
+__kprobe_trace_func(struct trace_kprobe *tp, struct pt_regs *regs,
 		    struct ftrace_event_file *ftrace_file)
 {
 	struct kprobe_trace_entry_head *entry;
@@ -981,7 +967,7 @@ __kprobe_trace_func(struct trace_probe *tp, struct pt_regs *regs,
 	struct ring_buffer *buffer;
 	int size, dsize, pc;
 	unsigned long irq_flags;
-	struct ftrace_event_call *call = &tp->call;
+	struct ftrace_event_call *call = &tp->p.call;
 
 	WARN_ON(call != ftrace_file->event_call);
 
@@ -991,8 +977,8 @@ __kprobe_trace_func(struct trace_probe *tp, struct pt_regs *regs,
 	local_save_flags(irq_flags);
 	pc = preempt_count();
 
-	dsize = __get_data_size(tp, regs);
-	size = sizeof(*entry) + tp->size + dsize;
+	dsize = __get_data_size(&tp->p, regs);
+	size = sizeof(*entry) + tp->p.size + dsize;
 
 	event = trace_event_buffer_lock_reserve(&buffer, ftrace_file,
 						call->event.type,
@@ -1002,7 +988,7 @@ __kprobe_trace_func(struct trace_probe *tp, struct pt_regs *regs,
 
 	entry = ring_buffer_event_data(event);
 	entry->ip = (unsigned long)tp->rp.kp.addr;
-	store_trace_args(sizeof(*entry), tp, regs, (u8 *)&entry[1], dsize);
+	store_trace_args(sizeof(*entry), &tp->p, regs, (u8 *)&entry[1], dsize);
 
 	if (!filter_current_check_discard(buffer, call, entry, event))
 		trace_buffer_unlock_commit_regs(buffer, event,
@@ -1010,17 +996,17 @@ __kprobe_trace_func(struct trace_probe *tp, struct pt_regs *regs,
 }
 
 static __kprobes void
-kprobe_trace_func(struct trace_probe *tp, struct pt_regs *regs)
+kprobe_trace_func(struct trace_kprobe *tp, struct pt_regs *regs)
 {
 	struct event_file_link *link;
 
-	list_for_each_entry_rcu(link, &tp->files, list)
+	list_for_each_entry_rcu(link, &tp->p.files, list)
 		__kprobe_trace_func(tp, regs, link->file);
 }
 
 /* Kretprobe handler */
 static __kprobes void
-__kretprobe_trace_func(struct trace_probe *tp, struct kretprobe_instance *ri,
+__kretprobe_trace_func(struct trace_kprobe *tp, struct kretprobe_instance *ri,
 		       struct pt_regs *regs,
 		       struct ftrace_event_file *ftrace_file)
 {
@@ -1029,7 +1015,7 @@ __kretprobe_trace_func(struct trace_probe *tp, struct kretprobe_instance *ri,
 	struct ring_buffer *buffer;
 	int size, pc, dsize;
 	unsigned long irq_flags;
-	struct ftrace_event_call *call = &tp->call;
+	struct ftrace_event_call *call = &tp->p.call;
 
 	WARN_ON(call != ftrace_file->event_call);
 
@@ -1039,8 +1025,8 @@ __kretprobe_trace_func(struct trace_probe *tp, struct kretprobe_instance *ri,
 	local_save_flags(irq_flags);
 	pc = preempt_count();
 
-	dsize = __get_data_size(tp, regs);
-	size = sizeof(*entry) + tp->size + dsize;
+	dsize = __get_data_size(&tp->p, regs);
+	size = sizeof(*entry) + tp->p.size + dsize;
 
 	event = trace_event_buffer_lock_reserve(&buffer, ftrace_file,
 						call->event.type,
@@ -1051,7 +1037,7 @@ __kretprobe_trace_func(struct trace_probe *tp, struct kretprobe_instance *ri,
 	entry = ring_buffer_event_data(event);
 	entry->func = (unsigned long)tp->rp.kp.addr;
 	entry->ret_ip = (unsigned long)ri->ret_addr;
-	store_trace_args(sizeof(*entry), tp, regs, (u8 *)&entry[1], dsize);
+	store_trace_args(sizeof(*entry), &tp->p, regs, (u8 *)&entry[1], dsize);
 
 	if (!filter_current_check_discard(buffer, call, entry, event))
 		trace_buffer_unlock_commit_regs(buffer, event,
@@ -1059,12 +1045,12 @@ __kretprobe_trace_func(struct trace_probe *tp, struct kretprobe_instance *ri,
 }
 
 static __kprobes void
-kretprobe_trace_func(struct trace_probe *tp, struct kretprobe_instance *ri,
+kretprobe_trace_func(struct trace_kprobe *tp, struct kretprobe_instance *ri,
 		     struct pt_regs *regs)
 {
 	struct event_file_link *link;
 
-	list_for_each_entry_rcu(link, &tp->files, list)
+	list_for_each_entry_rcu(link, &tp->p.files, list)
 		__kretprobe_trace_func(tp, ri, regs, link->file);
 }
 
@@ -1152,16 +1138,18 @@ static int kprobe_event_define_fields(struct ftrace_event_call *event_call)
 {
 	int ret, i;
 	struct kprobe_trace_entry_head field;
-	struct trace_probe *tp = (struct trace_probe *)event_call->data;
+	struct trace_kprobe *tp = (struct trace_kprobe *)event_call->data;
 
 	DEFINE_FIELD(unsigned long, ip, FIELD_STRING_IP, 0);
 	/* Set argument names as fields */
-	for (i = 0; i < tp->nr_args; i++) {
-		ret = trace_define_field(event_call, tp->args[i].type->fmttype,
-					 tp->args[i].name,
-					 sizeof(field) + tp->args[i].offset,
-					 tp->args[i].type->size,
-					 tp->args[i].type->is_signed,
+	for (i = 0; i < tp->p.nr_args; i++) {
+		struct probe_arg *parg = &tp->p.args[i];
+
+		ret = trace_define_field(event_call, parg->type->fmttype,
+					 parg->name,
+					 sizeof(field) + parg->offset,
+					 parg->type->size,
+					 parg->type->is_signed,
 					 FILTER_OTHER);
 		if (ret)
 			return ret;
@@ -1173,17 +1161,19 @@ static int kretprobe_event_define_fields(struct ftrace_event_call *event_call)
 {
 	int ret, i;
 	struct kretprobe_trace_entry_head field;
-	struct trace_probe *tp = (struct trace_probe *)event_call->data;
+	struct trace_kprobe *tp = (struct trace_kprobe *)event_call->data;
 
 	DEFINE_FIELD(unsigned long, func, FIELD_STRING_FUNC, 0);
 	DEFINE_FIELD(unsigned long, ret_ip, FIELD_STRING_RETIP, 0);
 	/* Set argument names as fields */
-	for (i = 0; i < tp->nr_args; i++) {
-		ret = trace_define_field(event_call, tp->args[i].type->fmttype,
-					 tp->args[i].name,
-					 sizeof(field) + tp->args[i].offset,
-					 tp->args[i].type->size,
-					 tp->args[i].type->is_signed,
+	for (i = 0; i < tp->p.nr_args; i++) {
+		struct probe_arg *parg = &tp->p.args[i];
+
+		ret = trace_define_field(event_call, parg->type->fmttype,
+					 parg->name,
+					 sizeof(field) + parg->offset,
+					 parg->type->size,
+					 parg->type->is_signed,
 					 FILTER_OTHER);
 		if (ret)
 			return ret;
@@ -1191,14 +1181,14 @@ static int kretprobe_event_define_fields(struct ftrace_event_call *event_call)
 	return 0;
 }
 
-static int __set_print_fmt(struct trace_probe *tp, char *buf, int len)
+static int __set_print_fmt(struct trace_kprobe *tp, char *buf, int len)
 {
 	int i;
 	int pos = 0;
 
 	const char *fmt, *arg;
 
-	if (!trace_probe_is_return(tp)) {
+	if (!trace_kprobe_is_return(tp)) {
 		fmt = "(%lx)";
 		arg = "REC->" FIELD_STRING_IP;
 	} else {
@@ -1211,21 +1201,21 @@ static int __set_print_fmt(struct trace_probe *tp, char *buf, int len)
 
 	pos += snprintf(buf + pos, LEN_OR_ZERO, "\"%s", fmt);
 
-	for (i = 0; i < tp->nr_args; i++) {
+	for (i = 0; i < tp->p.nr_args; i++) {
 		pos += snprintf(buf + pos, LEN_OR_ZERO, " %s=%s",
-				tp->args[i].name, tp->args[i].type->fmt);
+				tp->p.args[i].name, tp->p.args[i].type->fmt);
 	}
 
 	pos += snprintf(buf + pos, LEN_OR_ZERO, "\", %s", arg);
 
-	for (i = 0; i < tp->nr_args; i++) {
-		if (strcmp(tp->args[i].type->name, "string") == 0)
+	for (i = 0; i < tp->p.nr_args; i++) {
+		if (strcmp(tp->p.args[i].type->name, "string") == 0)
 			pos += snprintf(buf + pos, LEN_OR_ZERO,
 					", __get_str(%s)",
-					tp->args[i].name);
+					tp->p.args[i].name);
 		else
 			pos += snprintf(buf + pos, LEN_OR_ZERO, ", REC->%s",
-					tp->args[i].name);
+					tp->p.args[i].name);
 	}
 
 #undef LEN_OR_ZERO
@@ -1234,7 +1224,7 @@ static int __set_print_fmt(struct trace_probe *tp, char *buf, int len)
 	return pos;
 }
 
-static int set_print_fmt(struct trace_probe *tp)
+static int set_print_fmt(struct trace_kprobe *tp)
 {
 	int len;
 	char *print_fmt;
@@ -1247,7 +1237,7 @@ static int set_print_fmt(struct trace_probe *tp)
 
 	/* Second: actually write the @print_fmt */
 	__set_print_fmt(tp, print_fmt, len + 1);
-	tp->call.print_fmt = print_fmt;
+	tp->p.call.print_fmt = print_fmt;
 
 	return 0;
 }
@@ -1256,9 +1246,9 @@ static int set_print_fmt(struct trace_probe *tp)
 
 /* Kprobe profile handler */
 static __kprobes void
-kprobe_perf_func(struct trace_probe *tp, struct pt_regs *regs)
+kprobe_perf_func(struct trace_kprobe *tp, struct pt_regs *regs)
 {
-	struct ftrace_event_call *call = &tp->call;
+	struct ftrace_event_call *call = &tp->p.call;
 	struct kprobe_trace_entry_head *entry;
 	struct hlist_head *head;
 	int size, __size, dsize;
@@ -1268,8 +1258,8 @@ kprobe_perf_func(struct trace_probe *tp, struct pt_regs *regs)
 	if (hlist_empty(head))
 		return;
 
-	dsize = __get_data_size(tp, regs);
-	__size = sizeof(*entry) + tp->size + dsize;
+	dsize = __get_data_size(&tp->p, regs);
+	__size = sizeof(*entry) + tp->p.size + dsize;
 	size = ALIGN(__size + sizeof(u32), sizeof(u64));
 	size -= sizeof(u32);
 
@@ -1279,16 +1269,16 @@ kprobe_perf_func(struct trace_probe *tp, struct pt_regs *regs)
 
 	entry->ip = (unsigned long)tp->rp.kp.addr;
 	memset(&entry[1], 0, dsize);
-	store_trace_args(sizeof(*entry), tp, regs, (u8 *)&entry[1], dsize);
+	store_trace_args(sizeof(*entry), &tp->p, regs, (u8 *)&entry[1], dsize);
 	perf_trace_buf_submit(entry, size, rctx, 0, 1, regs, head, NULL);
 }
 
 /* Kretprobe profile handler */
 static __kprobes void
-kretprobe_perf_func(struct trace_probe *tp, struct kretprobe_instance *ri,
+kretprobe_perf_func(struct trace_kprobe *tp, struct kretprobe_instance *ri,
 		    struct pt_regs *regs)
 {
-	struct ftrace_event_call *call = &tp->call;
+	struct ftrace_event_call *call = &tp->p.call;
 	struct kretprobe_trace_entry_head *entry;
 	struct hlist_head *head;
 	int size, __size, dsize;
@@ -1298,8 +1288,8 @@ kretprobe_perf_func(struct trace_probe *tp, struct kretprobe_instance *ri,
 	if (hlist_empty(head))
 		return;
 
-	dsize = __get_data_size(tp, regs);
-	__size = sizeof(*entry) + tp->size + dsize;
+	dsize = __get_data_size(&tp->p, regs);
+	__size = sizeof(*entry) + tp->p.size + dsize;
 	size = ALIGN(__size + sizeof(u32), sizeof(u64));
 	size -= sizeof(u32);
 
@@ -1309,7 +1299,7 @@ kretprobe_perf_func(struct trace_probe *tp, struct kretprobe_instance *ri,
 
 	entry->func = (unsigned long)tp->rp.kp.addr;
 	entry->ret_ip = (unsigned long)ri->ret_addr;
-	store_trace_args(sizeof(*entry), tp, regs, (u8 *)&entry[1], dsize);
+	store_trace_args(sizeof(*entry), &tp->p, regs, (u8 *)&entry[1], dsize);
 	perf_trace_buf_submit(entry, size, rctx, 0, 1, regs, head, NULL);
 }
 #endif	/* CONFIG_PERF_EVENTS */
@@ -1324,20 +1314,20 @@ static __kprobes
 int kprobe_register(struct ftrace_event_call *event,
 		    enum trace_reg type, void *data)
 {
-	struct trace_probe *tp = (struct trace_probe *)event->data;
+	struct trace_kprobe *tp = (struct trace_kprobe *)event->data;
 	struct ftrace_event_file *file = data;
 
 	switch (type) {
 	case TRACE_REG_REGISTER:
-		return enable_trace_probe(tp, file);
+		return enable_trace_kprobe(tp, file);
 	case TRACE_REG_UNREGISTER:
-		return disable_trace_probe(tp, file);
+		return disable_trace_kprobe(tp, file);
 
 #ifdef CONFIG_PERF_EVENTS
 	case TRACE_REG_PERF_REGISTER:
-		return enable_trace_probe(tp, NULL);
+		return enable_trace_kprobe(tp, NULL);
 	case TRACE_REG_PERF_UNREGISTER:
-		return disable_trace_probe(tp, NULL);
+		return disable_trace_kprobe(tp, NULL);
 	case TRACE_REG_PERF_OPEN:
 	case TRACE_REG_PERF_CLOSE:
 	case TRACE_REG_PERF_ADD:
@@ -1351,14 +1341,14 @@ int kprobe_register(struct ftrace_event_call *event,
 static __kprobes
 int kprobe_dispatcher(struct kprobe *kp, struct pt_regs *regs)
 {
-	struct trace_probe *tp = container_of(kp, struct trace_probe, rp.kp);
+	struct trace_kprobe *tp = container_of(kp, struct trace_kprobe, rp.kp);
 
 	tp->nhit++;
 
-	if (tp->flags & TP_FLAG_TRACE)
+	if (tp->p.flags & TP_FLAG_TRACE)
 		kprobe_trace_func(tp, regs);
 #ifdef CONFIG_PERF_EVENTS
-	if (tp->flags & TP_FLAG_PROFILE)
+	if (tp->p.flags & TP_FLAG_PROFILE)
 		kprobe_perf_func(tp, regs);
 #endif
 	return 0;	/* We don't tweek kernel, so just return 0 */
@@ -1367,14 +1357,14 @@ int kprobe_dispatcher(struct kprobe *kp, struct pt_regs *regs)
 static __kprobes
 int kretprobe_dispatcher(struct kretprobe_instance *ri, struct pt_regs *regs)
 {
-	struct trace_probe *tp = container_of(ri->rp, struct trace_probe, rp);
+	struct trace_kprobe *tp = container_of(ri->rp, struct trace_kprobe, rp);
 
 	tp->nhit++;
 
-	if (tp->flags & TP_FLAG_TRACE)
+	if (tp->p.flags & TP_FLAG_TRACE)
 		kretprobe_trace_func(tp, ri, regs);
 #ifdef CONFIG_PERF_EVENTS
-	if (tp->flags & TP_FLAG_PROFILE)
+	if (tp->p.flags & TP_FLAG_PROFILE)
 		kretprobe_perf_func(tp, ri, regs);
 #endif
 	return 0;	/* We don't tweek kernel, so just return 0 */
@@ -1388,14 +1378,14 @@ static struct trace_event_functions kprobe_funcs = {
 	.trace		= print_kprobe_event
 };
 
-static int register_probe_event(struct trace_probe *tp)
+static int register_kprobe_event(struct trace_kprobe *tp)
 {
-	struct ftrace_event_call *call = &tp->call;
+	struct ftrace_event_call *call = &tp->p.call;
 	int ret;
 
 	/* Initialize ftrace_event_call */
 	INIT_LIST_HEAD(&call->class->fields);
-	if (trace_probe_is_return(tp)) {
+	if (trace_kprobe_is_return(tp)) {
 		call->event.funcs = &kretprobe_funcs;
 		call->class->define_fields = kretprobe_event_define_fields;
 	} else {
@@ -1421,14 +1411,14 @@ static int register_probe_event(struct trace_probe *tp)
 	return ret;
 }
 
-static int unregister_probe_event(struct trace_probe *tp)
+static int unregister_kprobe_event(struct trace_kprobe *tp)
 {
 	int ret;
 
 	/* tp->event is unregistered in trace_remove_event_call() */
-	ret = trace_remove_event_call(&tp->call);
+	ret = trace_remove_event_call(&tp->p.call);
 	if (!ret)
-		kfree(tp->call.print_fmt);
+		kfree(tp->p.call.print_fmt);
 	return ret;
 }
 
@@ -1438,7 +1428,7 @@ static __init int init_kprobe_trace(void)
 	struct dentry *d_tracer;
 	struct dentry *entry;
 
-	if (register_module_notifier(&trace_probe_module_nb))
+	if (register_module_notifier(&trace_kprobe_module_nb))
 		return -EINVAL;
 
 	d_tracer = tracing_init_dentry();
@@ -1478,12 +1468,12 @@ static __used int kprobe_trace_selftest_target(int a1, int a2, int a3,
 }
 
 static struct ftrace_event_file *
-find_trace_probe_file(struct trace_probe *tp, struct trace_array *tr)
+find_trace_probe_file(struct trace_kprobe *tp, struct trace_array *tr)
 {
 	struct ftrace_event_file *file;
 
 	list_for_each_entry(file, &tr->events, list)
-		if (file->event_call == &tp->call)
+		if (file->event_call == &tp->p.call)
 			return file;
 
 	return NULL;
@@ -1497,7 +1487,7 @@ static __init int kprobe_trace_self_tests_init(void)
 {
 	int ret, warn = 0;
 	int (*target)(int, int, int, int, int, int);
-	struct trace_probe *tp;
+	struct trace_kprobe *tp;
 	struct ftrace_event_file *file;
 
 	target = kprobe_trace_selftest_target;
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 9ac7bdf607cc..63e5da4e3073 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -159,6 +159,26 @@ struct probe_arg {
 	const struct fetch_type	*type;	/* Type of this argument */
 };
 
+struct trace_probe {
+	unsigned int			flags;	/* For TP_FLAG_* */
+	struct ftrace_event_class	class;
+	struct ftrace_event_call	call;
+	struct list_head 		files;
+	ssize_t				size;	/* trace entry size */
+	unsigned int			nr_args;
+	struct probe_arg		args[];
+};
+
+static inline bool trace_probe_is_enabled(struct trace_probe *tp)
+{
+	return !!(tp->flags & (TP_FLAG_TRACE | TP_FLAG_PROFILE));
+}
+
+static inline bool trace_probe_is_registered(struct trace_probe *tp)
+{
+	return !!(tp->flags & TP_FLAG_REGISTERED);
+}
+
 #define FETCH_FUNC_NAME(method, type)	fetch_##method##_##type
 
 #define DECLARE_FETCH_FUNC(method, type)				\
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH 07/13] tracing/uprobes: Convert to struct trace_probe
  2013-10-29  6:53 [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6) Namhyung Kim
                   ` (5 preceding siblings ...)
  2013-10-29  6:53 ` [PATCH 06/13] tracing/kprobes: Factor out struct trace_probe Namhyung Kim
@ 2013-10-29  6:53 ` Namhyung Kim
  2013-10-29  6:53 ` [PATCH 08/13] tracing/kprobes: Move common functions to trace_probe.h Namhyung Kim
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 92+ messages in thread
From: Namhyung Kim @ 2013-10-29  6:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee, Hemant Kumar,
	LKML, Srikar Dronamraju, Oleg Nesterov, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

From: Namhyung Kim <namhyung.kim@lge.com>

Convert struct trace_uprobe to make use of the common trace_probe
structure.

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 kernel/trace/trace_uprobe.c | 151 ++++++++++++++++++++++----------------------
 1 file changed, 75 insertions(+), 76 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index a415c5867ec5..abb95529d851 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -51,22 +51,17 @@ struct trace_uprobe_filter {
  */
 struct trace_uprobe {
 	struct list_head		list;
-	struct ftrace_event_class	class;
-	struct ftrace_event_call	call;
 	struct trace_uprobe_filter	filter;
 	struct uprobe_consumer		consumer;
 	struct inode			*inode;
 	char				*filename;
 	unsigned long			offset;
 	unsigned long			nhit;
-	unsigned int			flags;	/* For TP_FLAG_* */
-	ssize_t				size;	/* trace entry size */
-	unsigned int			nr_args;
-	struct probe_arg		args[];
+	struct trace_probe		p;
 };
 
-#define SIZEOF_TRACE_UPROBE(n)			\
-	(offsetof(struct trace_uprobe, args) +	\
+#define SIZEOF_TRACE_UPROBE(n)				\
+	(offsetof(struct trace_uprobe, p.args) +	\
 	(sizeof(struct probe_arg) * (n)))
 
 static int register_uprobe_event(struct trace_uprobe *tu);
@@ -114,13 +109,13 @@ alloc_trace_uprobe(const char *group, const char *event, int nargs, bool is_ret)
 	if (!tu)
 		return ERR_PTR(-ENOMEM);
 
-	tu->call.class = &tu->class;
-	tu->call.name = kstrdup(event, GFP_KERNEL);
-	if (!tu->call.name)
+	tu->p.call.class = &tu->p.class;
+	tu->p.call.name = kstrdup(event, GFP_KERNEL);
+	if (!tu->p.call.name)
 		goto error;
 
-	tu->class.system = kstrdup(group, GFP_KERNEL);
-	if (!tu->class.system)
+	tu->p.class.system = kstrdup(group, GFP_KERNEL);
+	if (!tu->p.class.system)
 		goto error;
 
 	INIT_LIST_HEAD(&tu->list);
@@ -131,7 +126,7 @@ alloc_trace_uprobe(const char *group, const char *event, int nargs, bool is_ret)
 	return tu;
 
 error:
-	kfree(tu->call.name);
+	kfree(tu->p.call.name);
 	kfree(tu);
 
 	return ERR_PTR(-ENOMEM);
@@ -141,12 +136,12 @@ static void free_trace_uprobe(struct trace_uprobe *tu)
 {
 	int i;
 
-	for (i = 0; i < tu->nr_args; i++)
-		traceprobe_free_probe_arg(&tu->args[i]);
+	for (i = 0; i < tu->p.nr_args; i++)
+		traceprobe_free_probe_arg(&tu->p.args[i]);
 
 	iput(tu->inode);
-	kfree(tu->call.class->system);
-	kfree(tu->call.name);
+	kfree(tu->p.call.class->system);
+	kfree(tu->p.call.name);
 	kfree(tu->filename);
 	kfree(tu);
 }
@@ -156,8 +151,8 @@ static struct trace_uprobe *find_probe_event(const char *event, const char *grou
 	struct trace_uprobe *tu;
 
 	list_for_each_entry(tu, &uprobe_list, list)
-		if (strcmp(tu->call.name, event) == 0 &&
-		    strcmp(tu->call.class->system, group) == 0)
+		if (strcmp(tu->p.call.name, event) == 0 &&
+		    strcmp(tu->p.call.class->system, group) == 0)
 			return tu;
 
 	return NULL;
@@ -186,7 +181,7 @@ static int register_trace_uprobe(struct trace_uprobe *tu)
 	mutex_lock(&uprobe_lock);
 
 	/* register as an event */
-	old_tp = find_probe_event(tu->call.name, tu->call.class->system);
+	old_tp = find_probe_event(tu->p.call.name, tu->p.call.class->system);
 	if (old_tp) {
 		/* delete old event */
 		ret = unregister_trace_uprobe(old_tp);
@@ -359,34 +354,36 @@ static int create_trace_uprobe(int argc, char **argv)
 	/* parse arguments */
 	ret = 0;
 	for (i = 0; i < argc && i < MAX_TRACE_ARGS; i++) {
+		struct probe_arg *parg = &tu->p.args[i];
+
 		/* Increment count for freeing args in error case */
-		tu->nr_args++;
+		tu->p.nr_args++;
 
 		/* Parse argument name */
 		arg = strchr(argv[i], '=');
 		if (arg) {
 			*arg++ = '\0';
-			tu->args[i].name = kstrdup(argv[i], GFP_KERNEL);
+			parg->name = kstrdup(argv[i], GFP_KERNEL);
 		} else {
 			arg = argv[i];
 			/* If argument name is omitted, set "argN" */
 			snprintf(buf, MAX_EVENT_NAME_LEN, "arg%d", i + 1);
-			tu->args[i].name = kstrdup(buf, GFP_KERNEL);
+			parg->name = kstrdup(buf, GFP_KERNEL);
 		}
 
-		if (!tu->args[i].name) {
+		if (!parg->name) {
 			pr_info("Failed to allocate argument[%d] name.\n", i);
 			ret = -ENOMEM;
 			goto error;
 		}
 
-		if (!is_good_name(tu->args[i].name)) {
-			pr_info("Invalid argument[%d] name: %s\n", i, tu->args[i].name);
+		if (!is_good_name(parg->name)) {
+			pr_info("Invalid argument[%d] name: %s\n", i, parg->name);
 			ret = -EINVAL;
 			goto error;
 		}
 
-		if (traceprobe_conflict_field_name(tu->args[i].name, tu->args, i)) {
+		if (traceprobe_conflict_field_name(parg->name, tu->p.args, i)) {
 			pr_info("Argument[%d] name '%s' conflicts with "
 				"another field.\n", i, argv[i]);
 			ret = -EINVAL;
@@ -394,7 +391,8 @@ static int create_trace_uprobe(int argc, char **argv)
 		}
 
 		/* Parse fetch argument */
-		ret = traceprobe_parse_probe_arg(arg, &tu->size, &tu->args[i], false, false);
+		ret = traceprobe_parse_probe_arg(arg, &tu->p.size, parg,
+						 false, false);
 		if (ret) {
 			pr_info("Parse error at argument[%d]. (%d)\n", i, ret);
 			goto error;
@@ -458,11 +456,11 @@ static int probes_seq_show(struct seq_file *m, void *v)
 	char c = is_ret_probe(tu) ? 'r' : 'p';
 	int i;
 
-	seq_printf(m, "%c:%s/%s", c, tu->call.class->system, tu->call.name);
+	seq_printf(m, "%c:%s/%s", c, tu->p.call.class->system, tu->p.call.name);
 	seq_printf(m, " %s:0x%p", tu->filename, (void *)tu->offset);
 
-	for (i = 0; i < tu->nr_args; i++)
-		seq_printf(m, " %s=%s", tu->args[i].name, tu->args[i].comm);
+	for (i = 0; i < tu->p.nr_args; i++)
+		seq_printf(m, " %s=%s", tu->p.args[i].name, tu->p.args[i].comm);
 
 	seq_printf(m, "\n");
 	return 0;
@@ -508,7 +506,7 @@ static int probes_profile_seq_show(struct seq_file *m, void *v)
 {
 	struct trace_uprobe *tu = v;
 
-	seq_printf(m, "  %s %-44s %15lu\n", tu->filename, tu->call.name, tu->nhit);
+	seq_printf(m, "  %s %-44s %15lu\n", tu->filename, tu->p.call.name, tu->nhit);
 	return 0;
 }
 
@@ -540,11 +538,11 @@ static void uprobe_trace_print(struct trace_uprobe *tu,
 	struct ring_buffer *buffer;
 	void *data;
 	int size, i;
-	struct ftrace_event_call *call = &tu->call;
+	struct ftrace_event_call *call = &tu->p.call;
 
 	size = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
 	event = trace_current_buffer_lock_reserve(&buffer, call->event.type,
-						  size + tu->size, 0, 0);
+						  size + tu->p.size, 0, 0);
 	if (!event)
 		return;
 
@@ -558,8 +556,10 @@ static void uprobe_trace_print(struct trace_uprobe *tu,
 		data = DATAOF_TRACE_ENTRY(entry, false);
 	}
 
-	for (i = 0; i < tu->nr_args; i++)
-		call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);
+	for (i = 0; i < tu->p.nr_args; i++) {
+		call_fetch(&tu->p.args[i].fetch, regs,
+			   data + tu->p.args[i].offset);
+	}
 
 	if (!filter_current_check_discard(buffer, call, entry, event))
 		trace_buffer_unlock_commit(buffer, event, 0, 0);
@@ -590,23 +590,24 @@ print_uprobe_event(struct trace_iterator *iter, int flags, struct trace_event *e
 	int i;
 
 	entry = (struct uprobe_trace_entry_head *)iter->ent;
-	tu = container_of(event, struct trace_uprobe, call.event);
+	tu = container_of(event, struct trace_uprobe, p.call.event);
 
 	if (is_ret_probe(tu)) {
-		if (!trace_seq_printf(s, "%s: (0x%lx <- 0x%lx)", tu->call.name,
+		if (!trace_seq_printf(s, "%s: (0x%lx <- 0x%lx)", tu->p.call.name,
 					entry->vaddr[1], entry->vaddr[0]))
 			goto partial;
 		data = DATAOF_TRACE_ENTRY(entry, true);
 	} else {
-		if (!trace_seq_printf(s, "%s: (0x%lx)", tu->call.name,
+		if (!trace_seq_printf(s, "%s: (0x%lx)", tu->p.call.name,
 					entry->vaddr[0]))
 			goto partial;
 		data = DATAOF_TRACE_ENTRY(entry, false);
 	}
 
-	for (i = 0; i < tu->nr_args; i++) {
-		if (!tu->args[i].type->print(s, tu->args[i].name,
-					     data + tu->args[i].offset, entry))
+	for (i = 0; i < tu->p.nr_args; i++) {
+		struct probe_arg *parg = &tu->p.args[i];
+
+		if (!parg->type->print(s, parg->name, data + parg->offset, entry))
 			goto partial;
 	}
 
@@ -617,11 +618,6 @@ partial:
 	return TRACE_TYPE_PARTIAL_LINE;
 }
 
-static inline bool is_trace_uprobe_enabled(struct trace_uprobe *tu)
-{
-	return tu->flags & (TP_FLAG_TRACE | TP_FLAG_PROFILE);
-}
-
 typedef bool (*filter_func_t)(struct uprobe_consumer *self,
 				enum uprobe_filter_ctx ctx,
 				struct mm_struct *mm);
@@ -631,29 +627,29 @@ probe_event_enable(struct trace_uprobe *tu, int flag, filter_func_t filter)
 {
 	int ret = 0;
 
-	if (is_trace_uprobe_enabled(tu))
+	if (trace_probe_is_enabled(&tu->p))
 		return -EINTR;
 
 	WARN_ON(!uprobe_filter_is_empty(&tu->filter));
 
-	tu->flags |= flag;
+	tu->p.flags |= flag;
 	tu->consumer.filter = filter;
 	ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
 	if (ret)
-		tu->flags &= ~flag;
+		tu->p.flags &= ~flag;
 
 	return ret;
 }
 
 static void probe_event_disable(struct trace_uprobe *tu, int flag)
 {
-	if (!is_trace_uprobe_enabled(tu))
+	if (!trace_probe_is_enabled(&tu->p))
 		return;
 
 	WARN_ON(!uprobe_filter_is_empty(&tu->filter));
 
 	uprobe_unregister(tu->inode, tu->offset, &tu->consumer);
-	tu->flags &= ~flag;
+	tu->p.flags &= ~flag;
 }
 
 static int uprobe_event_define_fields(struct ftrace_event_call *event_call)
@@ -671,12 +667,12 @@ static int uprobe_event_define_fields(struct ftrace_event_call *event_call)
 		size = SIZEOF_TRACE_ENTRY(false);
 	}
 	/* Set argument names as fields */
-	for (i = 0; i < tu->nr_args; i++) {
-		ret = trace_define_field(event_call, tu->args[i].type->fmttype,
-					 tu->args[i].name,
-					 size + tu->args[i].offset,
-					 tu->args[i].type->size,
-					 tu->args[i].type->is_signed,
+	for (i = 0; i < tu->p.nr_args; i++) {
+		struct probe_arg *parg = &tu->p.args[i];
+
+		ret = trace_define_field(event_call, parg->type->fmttype,
+					 parg->name, size + parg->offset,
+					 parg->type->size, parg->type->is_signed,
 					 FILTER_OTHER);
 
 		if (ret)
@@ -704,16 +700,16 @@ static int __set_print_fmt(struct trace_uprobe *tu, char *buf, int len)
 
 	pos += snprintf(buf + pos, LEN_OR_ZERO, "\"%s", fmt);
 
-	for (i = 0; i < tu->nr_args; i++) {
+	for (i = 0; i < tu->p.nr_args; i++) {
 		pos += snprintf(buf + pos, LEN_OR_ZERO, " %s=%s",
-				tu->args[i].name, tu->args[i].type->fmt);
+				tu->p.args[i].name, tu->p.args[i].type->fmt);
 	}
 
 	pos += snprintf(buf + pos, LEN_OR_ZERO, "\", %s", arg);
 
-	for (i = 0; i < tu->nr_args; i++) {
+	for (i = 0; i < tu->p.nr_args; i++) {
 		pos += snprintf(buf + pos, LEN_OR_ZERO, ", REC->%s",
-				tu->args[i].name);
+				tu->p.args[i].name);
 	}
 
 	return pos;	/* return the length of print_fmt */
@@ -733,7 +729,7 @@ static int set_print_fmt(struct trace_uprobe *tu)
 
 	/* Second: actually write the @print_fmt */
 	__set_print_fmt(tu, print_fmt, len + 1);
-	tu->call.print_fmt = print_fmt;
+	tu->p.call.print_fmt = print_fmt;
 
 	return 0;
 }
@@ -830,14 +826,14 @@ static bool uprobe_perf_filter(struct uprobe_consumer *uc,
 static void uprobe_perf_print(struct trace_uprobe *tu,
 				unsigned long func, struct pt_regs *regs)
 {
-	struct ftrace_event_call *call = &tu->call;
+	struct ftrace_event_call *call = &tu->p.call;
 	struct uprobe_trace_entry_head *entry;
 	struct hlist_head *head;
 	void *data;
 	int size, rctx, i;
 
 	size = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
-	size = ALIGN(size + tu->size + sizeof(u32), sizeof(u64)) - sizeof(u32);
+	size = ALIGN(size + tu->p.size + sizeof(u32), sizeof(u64)) - sizeof(u32);
 
 	preempt_disable();
 	head = this_cpu_ptr(call->perf_events);
@@ -857,8 +853,11 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
 		data = DATAOF_TRACE_ENTRY(entry, false);
 	}
 
-	for (i = 0; i < tu->nr_args; i++)
-		call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);
+	for (i = 0; i < tu->p.nr_args; i++) {
+		struct probe_arg *parg = &tu->p.args[i];
+
+		call_fetch(&parg->fetch, regs, data + parg->offset);
+	}
 
 	perf_trace_buf_submit(entry, size, rctx, 0, 1, regs, head, NULL);
  out:
@@ -925,11 +924,11 @@ static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs)
 	tu = container_of(con, struct trace_uprobe, consumer);
 	tu->nhit++;
 
-	if (tu->flags & TP_FLAG_TRACE)
+	if (tu->p.flags & TP_FLAG_TRACE)
 		ret |= uprobe_trace_func(tu, regs);
 
 #ifdef CONFIG_PERF_EVENTS
-	if (tu->flags & TP_FLAG_PROFILE)
+	if (tu->p.flags & TP_FLAG_PROFILE)
 		ret |= uprobe_perf_func(tu, regs);
 #endif
 	return ret;
@@ -942,11 +941,11 @@ static int uretprobe_dispatcher(struct uprobe_consumer *con,
 
 	tu = container_of(con, struct trace_uprobe, consumer);
 
-	if (tu->flags & TP_FLAG_TRACE)
+	if (tu->p.flags & TP_FLAG_TRACE)
 		uretprobe_trace_func(tu, func, regs);
 
 #ifdef CONFIG_PERF_EVENTS
-	if (tu->flags & TP_FLAG_PROFILE)
+	if (tu->p.flags & TP_FLAG_PROFILE)
 		uretprobe_perf_func(tu, func, regs);
 #endif
 	return 0;
@@ -958,7 +957,7 @@ static struct trace_event_functions uprobe_funcs = {
 
 static int register_uprobe_event(struct trace_uprobe *tu)
 {
-	struct ftrace_event_call *call = &tu->call;
+	struct ftrace_event_call *call = &tu->p.call;
 	int ret;
 
 	/* Initialize ftrace_event_call */
@@ -993,11 +992,11 @@ static int unregister_uprobe_event(struct trace_uprobe *tu)
 	int ret;
 
 	/* tu->event is unregistered in trace_remove_event_call() */
-	ret = trace_remove_event_call(&tu->call);
+	ret = trace_remove_event_call(&tu->p.call);
 	if (ret)
 		return ret;
-	kfree(tu->call.print_fmt);
-	tu->call.print_fmt = NULL;
+	kfree(tu->p.call.print_fmt);
+	tu->p.call.print_fmt = NULL;
 	return 0;
 }
 
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH 08/13] tracing/kprobes: Move common functions to trace_probe.h
  2013-10-29  6:53 [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6) Namhyung Kim
                   ` (6 preceding siblings ...)
  2013-10-29  6:53 ` [PATCH 07/13] tracing/uprobes: Convert to " Namhyung Kim
@ 2013-10-29  6:53 ` Namhyung Kim
  2013-10-29  6:53 ` [PATCH 09/13] tracing/kprobes: Integrate duplicate set_print_fmt() Namhyung Kim
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 92+ messages in thread
From: Namhyung Kim @ 2013-10-29  6:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee, Hemant Kumar,
	LKML, Srikar Dronamraju, Oleg Nesterov, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

From: Namhyung Kim <namhyung.kim@lge.com>

The __get_data_size() and store_trace_args() will be used by uprobes
too.  Move them to a common location.

Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 kernel/trace/trace_kprobe.c | 48 ---------------------------------------------
 kernel/trace/trace_probe.h  | 48 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 48 insertions(+), 48 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 6d33cfee9448..2a668516f0e4 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -909,54 +909,6 @@ const struct fetch_type kprobes_fetch_type_table[] = {
 	ASSIGN_FETCH_TYPE(s64, u64, 1),
 };
 
-/* Sum up total data length for dynamic arraies (strings) */
-static __kprobes int __get_data_size(struct trace_probe *tp,
-				     struct pt_regs *regs)
-{
-	int i, ret = 0;
-	u32 len;
-
-	for (i = 0; i < tp->nr_args; i++)
-		if (unlikely(tp->args[i].fetch_size.fn)) {
-			call_fetch(&tp->args[i].fetch_size, regs, &len);
-			ret += len;
-		}
-
-	return ret;
-}
-
-/* Store the value of each argument */
-static __kprobes void store_trace_args(int ent_size, struct trace_probe *tp,
-				       struct pt_regs *regs,
-				       u8 *data, int maxlen)
-{
-	int i;
-	u32 end = tp->size;
-	u32 *dl;	/* Data (relative) location */
-
-	for (i = 0; i < tp->nr_args; i++) {
-		if (unlikely(tp->args[i].fetch_size.fn)) {
-			/*
-			 * First, we set the relative location and
-			 * maximum data length to *dl
-			 */
-			dl = (u32 *)(data + tp->args[i].offset);
-			*dl = make_data_rloc(maxlen, end - tp->args[i].offset);
-			/* Then try to fetch string or dynamic array data */
-			call_fetch(&tp->args[i].fetch, regs, dl);
-			/* Reduce maximum length */
-			end += get_rloc_len(*dl);
-			maxlen -= get_rloc_len(*dl);
-			/* Trick here, convert data_rloc to data_loc */
-			*dl = convert_rloc_to_loc(*dl,
-				 ent_size + tp->args[i].offset);
-		} else
-			/* Just fetching data normally */
-			call_fetch(&tp->args[i].fetch, regs,
-				   data + tp->args[i].offset);
-	}
-}
-
 /* Kprobe handler */
 static __kprobes void
 __kprobe_trace_func(struct trace_kprobe *tp, struct pt_regs *regs,
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 63e5da4e3073..189a40baea98 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -302,3 +302,51 @@ extern ssize_t traceprobe_probes_write(struct file *file,
 		int (*createfn)(int, char**));
 
 extern int traceprobe_command(const char *buf, int (*createfn)(int, char**));
+
+/* Sum up total data length for dynamic arraies (strings) */
+static inline __kprobes int
+__get_data_size(struct trace_probe *tp, struct pt_regs *regs)
+{
+	int i, ret = 0;
+	u32 len;
+
+	for (i = 0; i < tp->nr_args; i++)
+		if (unlikely(tp->args[i].fetch_size.fn)) {
+			call_fetch(&tp->args[i].fetch_size, regs, &len);
+			ret += len;
+		}
+
+	return ret;
+}
+
+/* Store the value of each argument */
+static inline __kprobes void
+store_trace_args(int ent_size, struct trace_probe *tp, struct pt_regs *regs,
+		 u8 *data, int maxlen)
+{
+	int i;
+	u32 end = tp->size;
+	u32 *dl;	/* Data (relative) location */
+
+	for (i = 0; i < tp->nr_args; i++) {
+		if (unlikely(tp->args[i].fetch_size.fn)) {
+			/*
+			 * First, we set the relative location and
+			 * maximum data length to *dl
+			 */
+			dl = (u32 *)(data + tp->args[i].offset);
+			*dl = make_data_rloc(maxlen, end - tp->args[i].offset);
+			/* Then try to fetch string or dynamic array data */
+			call_fetch(&tp->args[i].fetch, regs, dl);
+			/* Reduce maximum length */
+			end += get_rloc_len(*dl);
+			maxlen -= get_rloc_len(*dl);
+			/* Trick here, convert data_rloc to data_loc */
+			*dl = convert_rloc_to_loc(*dl,
+				 ent_size + tp->args[i].offset);
+		} else
+			/* Just fetching data normally */
+			call_fetch(&tp->args[i].fetch, regs,
+				   data + tp->args[i].offset);
+	}
+}
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH 09/13] tracing/kprobes: Integrate duplicate set_print_fmt()
  2013-10-29  6:53 [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6) Namhyung Kim
                   ` (7 preceding siblings ...)
  2013-10-29  6:53 ` [PATCH 08/13] tracing/kprobes: Move common functions to trace_probe.h Namhyung Kim
@ 2013-10-29  6:53 ` Namhyung Kim
  2013-10-29  6:53 ` [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer Namhyung Kim
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 92+ messages in thread
From: Namhyung Kim @ 2013-10-29  6:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee, Hemant Kumar,
	LKML, Srikar Dronamraju, Oleg Nesterov, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

From: Namhyung Kim <namhyung.kim@lge.com>

The set_print_fmt() functions are implemented almost same for
[ku]probes.  Move it to a common place and get rid of the duplication.

Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 kernel/trace/trace_kprobe.c | 63 +--------------------------------------------
 kernel/trace/trace_probe.c  | 62 ++++++++++++++++++++++++++++++++++++++++++++
 kernel/trace/trace_probe.h  |  2 ++
 kernel/trace/trace_uprobe.c | 55 +--------------------------------------
 4 files changed, 66 insertions(+), 116 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 2a668516f0e4..3159b114f215 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1133,67 +1133,6 @@ static int kretprobe_event_define_fields(struct ftrace_event_call *event_call)
 	return 0;
 }
 
-static int __set_print_fmt(struct trace_kprobe *tp, char *buf, int len)
-{
-	int i;
-	int pos = 0;
-
-	const char *fmt, *arg;
-
-	if (!trace_kprobe_is_return(tp)) {
-		fmt = "(%lx)";
-		arg = "REC->" FIELD_STRING_IP;
-	} else {
-		fmt = "(%lx <- %lx)";
-		arg = "REC->" FIELD_STRING_FUNC ", REC->" FIELD_STRING_RETIP;
-	}
-
-	/* When len=0, we just calculate the needed length */
-#define LEN_OR_ZERO (len ? len - pos : 0)
-
-	pos += snprintf(buf + pos, LEN_OR_ZERO, "\"%s", fmt);
-
-	for (i = 0; i < tp->p.nr_args; i++) {
-		pos += snprintf(buf + pos, LEN_OR_ZERO, " %s=%s",
-				tp->p.args[i].name, tp->p.args[i].type->fmt);
-	}
-
-	pos += snprintf(buf + pos, LEN_OR_ZERO, "\", %s", arg);
-
-	for (i = 0; i < tp->p.nr_args; i++) {
-		if (strcmp(tp->p.args[i].type->name, "string") == 0)
-			pos += snprintf(buf + pos, LEN_OR_ZERO,
-					", __get_str(%s)",
-					tp->p.args[i].name);
-		else
-			pos += snprintf(buf + pos, LEN_OR_ZERO, ", REC->%s",
-					tp->p.args[i].name);
-	}
-
-#undef LEN_OR_ZERO
-
-	/* return the length of print_fmt */
-	return pos;
-}
-
-static int set_print_fmt(struct trace_kprobe *tp)
-{
-	int len;
-	char *print_fmt;
-
-	/* First: called with 0 length to calculate the needed length */
-	len = __set_print_fmt(tp, NULL, 0);
-	print_fmt = kmalloc(len + 1, GFP_KERNEL);
-	if (!print_fmt)
-		return -ENOMEM;
-
-	/* Second: actually write the @print_fmt */
-	__set_print_fmt(tp, print_fmt, len + 1);
-	tp->p.call.print_fmt = print_fmt;
-
-	return 0;
-}
-
 #ifdef CONFIG_PERF_EVENTS
 
 /* Kprobe profile handler */
@@ -1344,7 +1283,7 @@ static int register_kprobe_event(struct trace_kprobe *tp)
 		call->event.funcs = &kprobe_funcs;
 		call->class->define_fields = kprobe_event_define_fields;
 	}
-	if (set_print_fmt(tp) < 0)
+	if (set_print_fmt(&tp->p, trace_kprobe_is_return(tp)) < 0)
 		return -ENOMEM;
 	ret = register_ftrace_event(&call->event);
 	if (!ret) {
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index b7b8bda02d6e..1ab83d4c7775 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -629,3 +629,65 @@ out:
 
 	return ret;
 }
+
+static int __set_print_fmt(struct trace_probe *tp, char *buf, int len,
+			   bool is_return)
+{
+	int i;
+	int pos = 0;
+
+	const char *fmt, *arg;
+
+	if (!is_return) {
+		fmt = "(%lx)";
+		arg = "REC->" FIELD_STRING_IP;
+	} else {
+		fmt = "(%lx <- %lx)";
+		arg = "REC->" FIELD_STRING_FUNC ", REC->" FIELD_STRING_RETIP;
+	}
+
+	/* When len=0, we just calculate the needed length */
+#define LEN_OR_ZERO (len ? len - pos : 0)
+
+	pos += snprintf(buf + pos, LEN_OR_ZERO, "\"%s", fmt);
+
+	for (i = 0; i < tp->nr_args; i++) {
+		pos += snprintf(buf + pos, LEN_OR_ZERO, " %s=%s",
+				tp->args[i].name, tp->args[i].type->fmt);
+	}
+
+	pos += snprintf(buf + pos, LEN_OR_ZERO, "\", %s", arg);
+
+	for (i = 0; i < tp->nr_args; i++) {
+		if (strcmp(tp->args[i].type->name, "string") == 0)
+			pos += snprintf(buf + pos, LEN_OR_ZERO,
+					", __get_str(%s)",
+					tp->args[i].name);
+		else
+			pos += snprintf(buf + pos, LEN_OR_ZERO, ", REC->%s",
+					tp->args[i].name);
+	}
+
+#undef LEN_OR_ZERO
+
+	/* return the length of print_fmt */
+	return pos;
+}
+
+int set_print_fmt(struct trace_probe *tp, bool is_return)
+{
+	int len;
+	char *print_fmt;
+
+	/* First: called with 0 length to calculate the needed length */
+	len = __set_print_fmt(tp, NULL, 0, is_return);
+	print_fmt = kmalloc(len + 1, GFP_KERNEL);
+	if (!print_fmt)
+		return -ENOMEM;
+
+	/* Second: actually write the @print_fmt */
+	__set_print_fmt(tp, print_fmt, len + 1, is_return);
+	tp->call.print_fmt = print_fmt;
+
+	return 0;
+}
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 189a40baea98..325989f24dbf 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -350,3 +350,5 @@ store_trace_args(int ent_size, struct trace_probe *tp, struct pt_regs *regs,
 				   data + tp->args[i].offset);
 	}
 }
+
+extern int set_print_fmt(struct trace_probe *tp, bool is_return);
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index abb95529d851..9f2d12d2311d 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -681,59 +681,6 @@ static int uprobe_event_define_fields(struct ftrace_event_call *event_call)
 	return 0;
 }
 
-#define LEN_OR_ZERO		(len ? len - pos : 0)
-static int __set_print_fmt(struct trace_uprobe *tu, char *buf, int len)
-{
-	const char *fmt, *arg;
-	int i;
-	int pos = 0;
-
-	if (is_ret_probe(tu)) {
-		fmt = "(%lx <- %lx)";
-		arg = "REC->" FIELD_STRING_FUNC ", REC->" FIELD_STRING_RETIP;
-	} else {
-		fmt = "(%lx)";
-		arg = "REC->" FIELD_STRING_IP;
-	}
-
-	/* When len=0, we just calculate the needed length */
-
-	pos += snprintf(buf + pos, LEN_OR_ZERO, "\"%s", fmt);
-
-	for (i = 0; i < tu->p.nr_args; i++) {
-		pos += snprintf(buf + pos, LEN_OR_ZERO, " %s=%s",
-				tu->p.args[i].name, tu->p.args[i].type->fmt);
-	}
-
-	pos += snprintf(buf + pos, LEN_OR_ZERO, "\", %s", arg);
-
-	for (i = 0; i < tu->p.nr_args; i++) {
-		pos += snprintf(buf + pos, LEN_OR_ZERO, ", REC->%s",
-				tu->p.args[i].name);
-	}
-
-	return pos;	/* return the length of print_fmt */
-}
-#undef LEN_OR_ZERO
-
-static int set_print_fmt(struct trace_uprobe *tu)
-{
-	char *print_fmt;
-	int len;
-
-	/* First: called with 0 length to calculate the needed length */
-	len = __set_print_fmt(tu, NULL, 0);
-	print_fmt = kmalloc(len + 1, GFP_KERNEL);
-	if (!print_fmt)
-		return -ENOMEM;
-
-	/* Second: actually write the @print_fmt */
-	__set_print_fmt(tu, print_fmt, len + 1);
-	tu->p.call.print_fmt = print_fmt;
-
-	return 0;
-}
-
 #ifdef CONFIG_PERF_EVENTS
 static bool
 __uprobe_perf_filter(struct trace_uprobe_filter *filter, struct mm_struct *mm)
@@ -965,7 +912,7 @@ static int register_uprobe_event(struct trace_uprobe *tu)
 	call->event.funcs = &uprobe_funcs;
 	call->class->define_fields = uprobe_event_define_fields;
 
-	if (set_print_fmt(tu) < 0)
+	if (set_print_fmt(&tu->p, is_ret_probe(tu)) < 0)
 		return -ENOMEM;
 
 	ret = register_ftrace_event(&call->event);
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer
  2013-10-29  6:53 [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6) Namhyung Kim
                   ` (8 preceding siblings ...)
  2013-10-29  6:53 ` [PATCH 09/13] tracing/kprobes: Integrate duplicate set_print_fmt() Namhyung Kim
@ 2013-10-29  6:53 ` Namhyung Kim
  2013-10-31 18:16   ` Oleg Nesterov
  2013-11-01 15:09   ` Oleg Nesterov
  2013-10-29  6:53 ` [PATCH 11/13] tracing/kprobes: Add priv argument to fetch functions Namhyung Kim
                   ` (4 subsequent siblings)
  14 siblings, 2 replies; 92+ messages in thread
From: Namhyung Kim @ 2013-10-29  6:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee, Hemant Kumar,
	LKML, Srikar Dronamraju, Oleg Nesterov, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

From: Namhyung Kim <namhyung.kim@lge.com>

Fetching from user space should be done in a non-atomic context.  So
use a per-cpu buffer and copy its content to the ring buffer
atomically.  Note that we can migrate during accessing user memory
thus use a per-cpu mutex to protect concurrent accesses.

This is needed since we'll be able to fetch args from an user memory
which can be swapped out.  Before that uprobes could fetch args from
registers only which saved in a kernel space.

While at it, use __get_data_size() and store_trace_args() to reduce
code duplication.

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 kernel/trace/trace_uprobe.c | 98 +++++++++++++++++++++++++++++++++++++--------
 1 file changed, 82 insertions(+), 16 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 9f2d12d2311d..c32f8f2ddc11 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -530,21 +530,44 @@ static const struct file_operations uprobe_profile_ops = {
 	.release	= seq_release,
 };
 
+static atomic_t uprobe_buffer_ref = ATOMIC_INIT(0);
+static void __percpu *uprobe_cpu_buffer;
+static DEFINE_PER_CPU(struct mutex, uprobe_cpu_mutex);
+
 static void uprobe_trace_print(struct trace_uprobe *tu,
 				unsigned long func, struct pt_regs *regs)
 {
 	struct uprobe_trace_entry_head *entry;
 	struct ring_buffer_event *event;
 	struct ring_buffer *buffer;
-	void *data;
-	int size, i;
+	struct mutex *mutex;
+	void *data, *arg_buf;
+	int size, dsize, esize;
+	int cpu;
 	struct ftrace_event_call *call = &tu->p.call;
 
-	size = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
+	dsize = __get_data_size(&tu->p, regs);
+	esize = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
+
+	if (WARN_ON_ONCE(!uprobe_cpu_buffer || tu->p.size + dsize > PAGE_SIZE))
+		return;
+
+	cpu = raw_smp_processor_id();
+	mutex = &per_cpu(uprobe_cpu_mutex, cpu);
+	arg_buf = per_cpu_ptr(uprobe_cpu_buffer, cpu);
+
+	/*
+	 * Use per-cpu buffers for fastest access, but we might migrate
+	 * so the mutex makes sure we have sole access to it.
+	 */
+	mutex_lock(mutex);
+	store_trace_args(esize, &tu->p, regs, arg_buf, dsize);
+
+	size = esize + tu->p.size + dsize;
 	event = trace_current_buffer_lock_reserve(&buffer, call->event.type,
-						  size + tu->p.size, 0, 0);
+						  size, 0, 0);
 	if (!event)
-		return;
+		goto out;
 
 	entry = ring_buffer_event_data(event);
 	if (is_ret_probe(tu)) {
@@ -556,13 +579,13 @@ static void uprobe_trace_print(struct trace_uprobe *tu,
 		data = DATAOF_TRACE_ENTRY(entry, false);
 	}
 
-	for (i = 0; i < tu->p.nr_args; i++) {
-		call_fetch(&tu->p.args[i].fetch, regs,
-			   data + tu->p.args[i].offset);
-	}
+	memcpy(data, arg_buf, tu->p.size + dsize);
 
 	if (!filter_current_check_discard(buffer, call, entry, event))
 		trace_buffer_unlock_commit(buffer, event, 0, 0);
+
+out:
+	mutex_unlock(mutex);
 }
 
 /* uprobe handler */
@@ -630,6 +653,19 @@ probe_event_enable(struct trace_uprobe *tu, int flag, filter_func_t filter)
 	if (trace_probe_is_enabled(&tu->p))
 		return -EINTR;
 
+	if (atomic_inc_return(&uprobe_buffer_ref) == 1) {
+		int cpu;
+
+		uprobe_cpu_buffer = __alloc_percpu(PAGE_SIZE, PAGE_SIZE);
+		if (uprobe_cpu_buffer == NULL) {
+			atomic_dec(&uprobe_buffer_ref);
+			return -ENOMEM;
+		}
+
+		for_each_possible_cpu(cpu)
+			mutex_init(&per_cpu(uprobe_cpu_mutex, cpu));
+	}
+
 	WARN_ON(!uprobe_filter_is_empty(&tu->filter));
 
 	tu->p.flags |= flag;
@@ -646,6 +682,11 @@ static void probe_event_disable(struct trace_uprobe *tu, int flag)
 	if (!trace_probe_is_enabled(&tu->p))
 		return;
 
+	if (atomic_dec_and_test(&uprobe_buffer_ref)) {
+		free_percpu(uprobe_cpu_buffer);
+		uprobe_cpu_buffer = NULL;
+	}
+
 	WARN_ON(!uprobe_filter_is_empty(&tu->filter));
 
 	uprobe_unregister(tu->inode, tu->offset, &tu->consumer);
@@ -776,11 +817,33 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
 	struct ftrace_event_call *call = &tu->p.call;
 	struct uprobe_trace_entry_head *entry;
 	struct hlist_head *head;
-	void *data;
-	int size, rctx, i;
+	struct mutex *mutex;
+	void *data, *arg_buf;
+	int size, dsize, esize;
+	int cpu;
+	int rctx;
 
-	size = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
-	size = ALIGN(size + tu->p.size + sizeof(u32), sizeof(u64)) - sizeof(u32);
+	dsize = __get_data_size(&tu->p, regs);
+	esize = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
+
+	if (WARN_ON_ONCE(!uprobe_cpu_buffer))
+		return;
+
+	size = esize + tu->p.size + dsize;
+	size = ALIGN(size + sizeof(u32), sizeof(u64)) - sizeof(u32);
+	if (WARN_ONCE(size > PERF_MAX_TRACE_SIZE, "profile buffer not large enough"))
+		return;
+
+	cpu = raw_smp_processor_id();
+	mutex = &per_cpu(uprobe_cpu_mutex, cpu);
+	arg_buf = per_cpu_ptr(uprobe_cpu_buffer, cpu);
+
+	/*
+	 * Use per-cpu buffers for fastest access, but we might migrate
+	 * so the mutex makes sure we have sole access to it.
+	 */
+	mutex_lock(mutex);
+	store_trace_args(esize, &tu->p, regs, arg_buf, dsize);
 
 	preempt_disable();
 	head = this_cpu_ptr(call->perf_events);
@@ -800,15 +863,18 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
 		data = DATAOF_TRACE_ENTRY(entry, false);
 	}
 
-	for (i = 0; i < tu->p.nr_args; i++) {
-		struct probe_arg *parg = &tu->p.args[i];
+	memcpy(data, arg_buf, tu->p.size + dsize);
+
+	if (size - esize > tu->p.size + dsize) {
+		int len = tu->p.size + dsize;
 
-		call_fetch(&parg->fetch, regs, data + parg->offset);
+		memset(data + len, 0, size - esize - len);
 	}
 
 	perf_trace_buf_submit(entry, size, rctx, 0, 1, regs, head, NULL);
  out:
 	preempt_enable();
+	mutex_unlock(mutex);
 }
 
 /* uprobe profile handler */
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH 11/13] tracing/kprobes: Add priv argument to fetch functions
  2013-10-29  6:53 [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6) Namhyung Kim
                   ` (9 preceding siblings ...)
  2013-10-29  6:53 ` [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer Namhyung Kim
@ 2013-10-29  6:53 ` Namhyung Kim
  2013-11-04 16:09   ` Oleg Nesterov
  2013-10-29  6:53 ` [PATCH 12/13] tracing/uprobes: Add more " Namhyung Kim
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 92+ messages in thread
From: Namhyung Kim @ 2013-10-29  6:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee, Hemant Kumar,
	LKML, Srikar Dronamraju, Oleg Nesterov, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

From: Namhyung Kim <namhyung.kim@lge.com>

This argument is for passing private data structure to each fetch
function and will be used by uprobes.

Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 kernel/trace/trace_kprobe.c | 32 ++++++++++++++++++--------------
 kernel/trace/trace_probe.c  | 24 ++++++++++++------------
 kernel/trace/trace_probe.h  | 19 ++++++++++---------
 kernel/trace/trace_uprobe.c |  8 ++++----
 4 files changed, 44 insertions(+), 39 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 3159b114f215..c0f4c2dbdbb1 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -745,7 +745,7 @@ static const struct file_operations kprobe_profile_ops = {
  */
 #define DEFINE_FETCH_stack(type)					\
 static __kprobes void FETCH_FUNC_NAME(stack, type)(struct pt_regs *regs,\
-					  void *offset, void *dest)	\
+				  void *offset, void *dest, void *priv) \
 {									\
 	*(type *)dest = (type)regs_get_kernel_stack_nth(regs,		\
 				(unsigned int)((unsigned long)offset));	\
@@ -757,7 +757,7 @@ DEFINE_BASIC_FETCH_FUNCS(stack)
 
 #define DEFINE_FETCH_memory(type)					\
 static __kprobes void FETCH_FUNC_NAME(memory, type)(struct pt_regs *regs,\
-					  void *addr, void *dest)	\
+				    void *addr, void *dest, void *priv) \
 {									\
 	type retval;							\
 	if (probe_kernel_address(addr, retval))				\
@@ -771,7 +771,7 @@ DEFINE_BASIC_FETCH_FUNCS(memory)
  * length and relative data location.
  */
 static __kprobes void FETCH_FUNC_NAME(memory, string)(struct pt_regs *regs,
-					       void *addr, void *dest)
+					void *addr, void *dest, void *priv)
 {
 	long ret;
 	int maxlen = get_rloc_len(*(u32 *)dest);
@@ -808,7 +808,7 @@ static __kprobes void FETCH_FUNC_NAME(memory, string)(struct pt_regs *regs,
 
 /* Return the length of string -- including null terminal byte */
 static __kprobes void FETCH_FUNC_NAME(memory, string_size)(struct pt_regs *regs,
-						    void *addr, void *dest)
+					   void *addr, void *dest, void *priv)
 {
 	mm_segment_t old_fs;
 	int ret, len = 0;
@@ -879,11 +879,11 @@ struct symbol_cache *alloc_symbol_cache(const char *sym, long offset)
 
 #define DEFINE_FETCH_symbol(type)					\
 __kprobes void FETCH_FUNC_NAME(symbol, type)(struct pt_regs *regs,	\
-					  void *data, void *dest)	\
+				    void *data, void *dest, void *priv)	\
 {									\
 	struct symbol_cache *sc = data;					\
 	if (sc->addr)							\
-		fetch_memory_##type(regs, (void *)sc->addr, dest);	\
+		fetch_memory_##type(regs, (void *)sc->addr, dest, priv);\
 	else								\
 		*(type *)dest = 0;					\
 }
@@ -929,7 +929,7 @@ __kprobe_trace_func(struct trace_kprobe *tp, struct pt_regs *regs,
 	local_save_flags(irq_flags);
 	pc = preempt_count();
 
-	dsize = __get_data_size(&tp->p, regs);
+	dsize = __get_data_size(&tp->p, regs, NULL);
 	size = sizeof(*entry) + tp->p.size + dsize;
 
 	event = trace_event_buffer_lock_reserve(&buffer, ftrace_file,
@@ -940,7 +940,8 @@ __kprobe_trace_func(struct trace_kprobe *tp, struct pt_regs *regs,
 
 	entry = ring_buffer_event_data(event);
 	entry->ip = (unsigned long)tp->rp.kp.addr;
-	store_trace_args(sizeof(*entry), &tp->p, regs, (u8 *)&entry[1], dsize);
+	store_trace_args(sizeof(*entry), &tp->p, regs, (u8 *)&entry[1], dsize,
+			 NULL);
 
 	if (!filter_current_check_discard(buffer, call, entry, event))
 		trace_buffer_unlock_commit_regs(buffer, event,
@@ -977,7 +978,7 @@ __kretprobe_trace_func(struct trace_kprobe *tp, struct kretprobe_instance *ri,
 	local_save_flags(irq_flags);
 	pc = preempt_count();
 
-	dsize = __get_data_size(&tp->p, regs);
+	dsize = __get_data_size(&tp->p, regs, NULL);
 	size = sizeof(*entry) + tp->p.size + dsize;
 
 	event = trace_event_buffer_lock_reserve(&buffer, ftrace_file,
@@ -989,7 +990,8 @@ __kretprobe_trace_func(struct trace_kprobe *tp, struct kretprobe_instance *ri,
 	entry = ring_buffer_event_data(event);
 	entry->func = (unsigned long)tp->rp.kp.addr;
 	entry->ret_ip = (unsigned long)ri->ret_addr;
-	store_trace_args(sizeof(*entry), &tp->p, regs, (u8 *)&entry[1], dsize);
+	store_trace_args(sizeof(*entry), &tp->p, regs, (u8 *)&entry[1], dsize,
+			 NULL);
 
 	if (!filter_current_check_discard(buffer, call, entry, event))
 		trace_buffer_unlock_commit_regs(buffer, event,
@@ -1149,7 +1151,7 @@ kprobe_perf_func(struct trace_kprobe *tp, struct pt_regs *regs)
 	if (hlist_empty(head))
 		return;
 
-	dsize = __get_data_size(&tp->p, regs);
+	dsize = __get_data_size(&tp->p, regs, NULL);
 	__size = sizeof(*entry) + tp->p.size + dsize;
 	size = ALIGN(__size + sizeof(u32), sizeof(u64));
 	size -= sizeof(u32);
@@ -1160,7 +1162,8 @@ kprobe_perf_func(struct trace_kprobe *tp, struct pt_regs *regs)
 
 	entry->ip = (unsigned long)tp->rp.kp.addr;
 	memset(&entry[1], 0, dsize);
-	store_trace_args(sizeof(*entry), &tp->p, regs, (u8 *)&entry[1], dsize);
+	store_trace_args(sizeof(*entry), &tp->p, regs, (u8 *)&entry[1], dsize,
+			 NULL);
 	perf_trace_buf_submit(entry, size, rctx, 0, 1, regs, head, NULL);
 }
 
@@ -1179,7 +1182,7 @@ kretprobe_perf_func(struct trace_kprobe *tp, struct kretprobe_instance *ri,
 	if (hlist_empty(head))
 		return;
 
-	dsize = __get_data_size(&tp->p, regs);
+	dsize = __get_data_size(&tp->p, regs, NULL);
 	__size = sizeof(*entry) + tp->p.size + dsize;
 	size = ALIGN(__size + sizeof(u32), sizeof(u64));
 	size -= sizeof(u32);
@@ -1190,7 +1193,8 @@ kretprobe_perf_func(struct trace_kprobe *tp, struct kretprobe_instance *ri,
 
 	entry->func = (unsigned long)tp->rp.kp.addr;
 	entry->ret_ip = (unsigned long)ri->ret_addr;
-	store_trace_args(sizeof(*entry), &tp->p, regs, (u8 *)&entry[1], dsize);
+	store_trace_args(sizeof(*entry), &tp->p, regs, (u8 *)&entry[1], dsize,
+			 NULL);
 	perf_trace_buf_submit(entry, size, rctx, 0, 1, regs, head, NULL);
 }
 #endif	/* CONFIG_PERF_EVENTS */
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 1ab83d4c7775..eaee44d5d9d1 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -78,7 +78,7 @@ const char PRINT_TYPE_FMT_NAME(string)[] = "\\\"%s\\\"";
 /* Data fetch function templates */
 #define DEFINE_FETCH_reg(type)						\
 __kprobes void FETCH_FUNC_NAME(reg, type)(struct pt_regs *regs,		\
-					void *offset, void *dest)	\
+				  void *offset, void *dest, void *priv)	\
 {									\
 	*(type *)dest = (type)regs_get_register(regs,			\
 				(unsigned int)((unsigned long)offset));	\
@@ -87,7 +87,7 @@ DEFINE_BASIC_FETCH_FUNCS(reg)
 
 #define DEFINE_FETCH_retval(type)					\
 __kprobes void FETCH_FUNC_NAME(retval, type)(struct pt_regs *regs,	\
-					  void *dummy, void *dest)	\
+				   void *dummy, void *dest, void *priv)	\
 {									\
 	*(type *)dest = (type)regs_return_value(regs);			\
 }
@@ -103,14 +103,14 @@ struct deref_fetch_param {
 
 #define DEFINE_FETCH_deref(type)					\
 __kprobes void FETCH_FUNC_NAME(deref, type)(struct pt_regs *regs,	\
-					    void *data, void *dest)	\
+				    void *data, void *dest, void *priv)	\
 {									\
 	struct deref_fetch_param *dprm = data;				\
 	unsigned long addr;						\
-	call_fetch(&dprm->orig, regs, &addr);				\
+	call_fetch(&dprm->orig, regs, &addr, priv);			\
 	if (addr) {							\
 		addr += dprm->offset;					\
-		dprm->fetch(regs, (void *)addr, dest);			\
+		dprm->fetch(regs, (void *)addr, dest, priv);		\
 	} else								\
 		*(type *)dest = 0;					\
 }
@@ -118,15 +118,15 @@ DEFINE_BASIC_FETCH_FUNCS(deref)
 DEFINE_FETCH_deref(string)
 
 __kprobes void FETCH_FUNC_NAME(deref, string_size)(struct pt_regs *regs,
-						   void *data, void *dest)
+					void *data, void *dest, void *priv)
 {
 	struct deref_fetch_param *dprm = data;
 	unsigned long addr;
 
-	call_fetch(&dprm->orig, regs, &addr);
+	call_fetch(&dprm->orig, regs, &addr, priv);
 	if (addr && dprm->fetch_size) {
 		addr += dprm->offset;
-		dprm->fetch_size(regs, (void *)addr, dest);
+		dprm->fetch_size(regs, (void *)addr, dest, priv);
 	} else
 		*(string_size *)dest = 0;
 }
@@ -156,12 +156,12 @@ struct bitfield_fetch_param {
 };
 
 #define DEFINE_FETCH_bitfield(type)					\
-__kprobes void FETCH_FUNC_NAME(bitfield, type)(struct pt_regs *regs,\
-					    void *data, void *dest)	\
+__kprobes void FETCH_FUNC_NAME(bitfield, type)(struct pt_regs *regs,	\
+				    void *data, void *dest, void *priv)	\
 {									\
 	struct bitfield_fetch_param *bprm = data;			\
 	type buf = 0;							\
-	call_fetch(&bprm->orig, regs, &buf);				\
+	call_fetch(&bprm->orig, regs, &buf, priv);			\
 	if (buf) {							\
 		buf <<= bprm->hi_shift;					\
 		buf >>= bprm->low_shift;				\
@@ -249,7 +249,7 @@ fail:
 
 /* Special function : only accept unsigned long */
 static __kprobes void fetch_stack_address(struct pt_regs *regs,
-					void *dummy, void *dest)
+					  void *dummy, void *dest, void *priv)
 {
 	*(unsigned long *)dest = kernel_stack_pointer(regs);
 }
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 325989f24dbf..fc7edf3749ef 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -98,7 +98,7 @@ typedef u32 string;
 typedef u32 string_size;
 
 /* Data fetch function type */
-typedef	void (*fetch_func_t)(struct pt_regs *, void *, void *);
+typedef	void (*fetch_func_t)(struct pt_regs *, void *, void *, void *);
 /* Printing function type */
 typedef int (*print_type_func_t)(struct trace_seq *, const char *, void *, void *);
 
@@ -182,7 +182,8 @@ static inline bool trace_probe_is_registered(struct trace_probe *tp)
 #define FETCH_FUNC_NAME(method, type)	fetch_##method##_##type
 
 #define DECLARE_FETCH_FUNC(method, type)				\
-extern void FETCH_FUNC_NAME(method, type)(struct pt_regs *, void *, void *)
+extern void FETCH_FUNC_NAME(method, type)(struct pt_regs *, void *,	\
+					  void *, void *)
 
 #define DECLARE_BASIC_FETCH_FUNCS(method) 	\
 DECLARE_FETCH_FUNC(method, u8);	  		\
@@ -264,9 +265,9 @@ ASSIGN_FETCH_FUNC(bitfield, ftype),			\
 extern const struct fetch_type kprobes_fetch_type_table[];
 
 static inline __kprobes void call_fetch(struct fetch_param *fprm,
-				 struct pt_regs *regs, void *dest)
+				 struct pt_regs *regs, void *dest, void *priv)
 {
-	return fprm->fn(regs, fprm->data, dest);
+	return fprm->fn(regs, fprm->data, dest, priv);
 }
 
 struct symbol_cache;
@@ -305,14 +306,14 @@ extern int traceprobe_command(const char *buf, int (*createfn)(int, char**));
 
 /* Sum up total data length for dynamic arraies (strings) */
 static inline __kprobes int
-__get_data_size(struct trace_probe *tp, struct pt_regs *regs)
+__get_data_size(struct trace_probe *tp, struct pt_regs *regs, void *priv)
 {
 	int i, ret = 0;
 	u32 len;
 
 	for (i = 0; i < tp->nr_args; i++)
 		if (unlikely(tp->args[i].fetch_size.fn)) {
-			call_fetch(&tp->args[i].fetch_size, regs, &len);
+			call_fetch(&tp->args[i].fetch_size, regs, &len, priv);
 			ret += len;
 		}
 
@@ -322,7 +323,7 @@ __get_data_size(struct trace_probe *tp, struct pt_regs *regs)
 /* Store the value of each argument */
 static inline __kprobes void
 store_trace_args(int ent_size, struct trace_probe *tp, struct pt_regs *regs,
-		 u8 *data, int maxlen)
+		 u8 *data, int maxlen, void *priv)
 {
 	int i;
 	u32 end = tp->size;
@@ -337,7 +338,7 @@ store_trace_args(int ent_size, struct trace_probe *tp, struct pt_regs *regs,
 			dl = (u32 *)(data + tp->args[i].offset);
 			*dl = make_data_rloc(maxlen, end - tp->args[i].offset);
 			/* Then try to fetch string or dynamic array data */
-			call_fetch(&tp->args[i].fetch, regs, dl);
+			call_fetch(&tp->args[i].fetch, regs, dl, priv);
 			/* Reduce maximum length */
 			end += get_rloc_len(*dl);
 			maxlen -= get_rloc_len(*dl);
@@ -347,7 +348,7 @@ store_trace_args(int ent_size, struct trace_probe *tp, struct pt_regs *regs,
 		} else
 			/* Just fetching data normally */
 			call_fetch(&tp->args[i].fetch, regs,
-				   data + tp->args[i].offset);
+				   data + tp->args[i].offset, priv);
 	}
 }
 
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index c32f8f2ddc11..b41e11621aed 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -546,7 +546,7 @@ static void uprobe_trace_print(struct trace_uprobe *tu,
 	int cpu;
 	struct ftrace_event_call *call = &tu->p.call;
 
-	dsize = __get_data_size(&tu->p, regs);
+	dsize = __get_data_size(&tu->p, regs, NULL);
 	esize = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
 
 	if (WARN_ON_ONCE(!uprobe_cpu_buffer || tu->p.size + dsize > PAGE_SIZE))
@@ -561,7 +561,7 @@ static void uprobe_trace_print(struct trace_uprobe *tu,
 	 * so the mutex makes sure we have sole access to it.
 	 */
 	mutex_lock(mutex);
-	store_trace_args(esize, &tu->p, regs, arg_buf, dsize);
+	store_trace_args(esize, &tu->p, regs, arg_buf, dsize, NULL);
 
 	size = esize + tu->p.size + dsize;
 	event = trace_current_buffer_lock_reserve(&buffer, call->event.type,
@@ -823,7 +823,7 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
 	int cpu;
 	int rctx;
 
-	dsize = __get_data_size(&tu->p, regs);
+	dsize = __get_data_size(&tu->p, regs, NULL);
 	esize = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
 
 	if (WARN_ON_ONCE(!uprobe_cpu_buffer))
@@ -843,7 +843,7 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
 	 * so the mutex makes sure we have sole access to it.
 	 */
 	mutex_lock(mutex);
-	store_trace_args(esize, &tu->p, regs, arg_buf, dsize);
+	store_trace_args(esize, &tu->p, regs, arg_buf, dsize, NULL);
 
 	preempt_disable();
 	head = this_cpu_ptr(call->perf_events);
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH 12/13] tracing/uprobes: Add more fetch functions
  2013-10-29  6:53 [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6) Namhyung Kim
                   ` (10 preceding siblings ...)
  2013-10-29  6:53 ` [PATCH 11/13] tracing/kprobes: Add priv argument to fetch functions Namhyung Kim
@ 2013-10-29  6:53 ` Namhyung Kim
  2013-10-31 18:22   ` Oleg Nesterov
  2013-11-01 17:53   ` Oleg Nesterov
  2013-10-29  6:53 ` [PATCH 13/13] tracing/uprobes: Add support for full argument access methods Namhyung Kim
                   ` (2 subsequent siblings)
  14 siblings, 2 replies; 92+ messages in thread
From: Namhyung Kim @ 2013-10-29  6:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee, Hemant Kumar,
	LKML, Srikar Dronamraju, Oleg Nesterov, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

From: Namhyung Kim <namhyung.kim@lge.com>

Implement uprobe-specific stack and memory fetch functions and add
them to the uprobes_fetch_type_table.  Other fetch fucntions will be
shared with kprobes.

Original-patch-by: Hyeoncheol Lee <cheol.lee@lge.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 kernel/trace/trace_probe.c  |   9 ++-
 kernel/trace/trace_probe.h  |   1 +
 kernel/trace/trace_uprobe.c | 188 +++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 192 insertions(+), 6 deletions(-)

diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index eaee44d5d9d1..70cd3bfde5a6 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -101,6 +101,10 @@ struct deref_fetch_param {
 	fetch_func_t		fetch_size;
 };
 
+/*
+ * For uprobes, it'll get a vaddr from first call_fetch() so pass NULL
+ * as a priv on the second dprm->fetch() not to translate it to vaddr again.
+ */
 #define DEFINE_FETCH_deref(type)					\
 __kprobes void FETCH_FUNC_NAME(deref, type)(struct pt_regs *regs,	\
 				    void *data, void *dest, void *priv)	\
@@ -110,13 +114,14 @@ __kprobes void FETCH_FUNC_NAME(deref, type)(struct pt_regs *regs,	\
 	call_fetch(&dprm->orig, regs, &addr, priv);			\
 	if (addr) {							\
 		addr += dprm->offset;					\
-		dprm->fetch(regs, (void *)addr, dest, priv);		\
+		dprm->fetch(regs, (void *)addr, dest, NULL);		\
 	} else								\
 		*(type *)dest = 0;					\
 }
 DEFINE_BASIC_FETCH_FUNCS(deref)
 DEFINE_FETCH_deref(string)
 
+/* Same as above */
 __kprobes void FETCH_FUNC_NAME(deref, string_size)(struct pt_regs *regs,
 					void *data, void *dest, void *priv)
 {
@@ -126,7 +131,7 @@ __kprobes void FETCH_FUNC_NAME(deref, string_size)(struct pt_regs *regs,
 	call_fetch(&dprm->orig, regs, &addr, priv);
 	if (addr && dprm->fetch_size) {
 		addr += dprm->offset;
-		dprm->fetch_size(regs, (void *)addr, dest, priv);
+		dprm->fetch_size(regs, (void *)addr, dest, NULL);
 	} else
 		*(string_size *)dest = 0;
 }
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index fc7edf3749ef..b1e7d722c354 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -263,6 +263,7 @@ ASSIGN_FETCH_FUNC(bitfield, ftype),			\
 #define NR_FETCH_TYPES		10
 
 extern const struct fetch_type kprobes_fetch_type_table[];
+extern const struct fetch_type uprobes_fetch_type_table[];
 
 static inline __kprobes void call_fetch(struct fetch_param *fprm,
 				 struct pt_regs *regs, void *dest, void *priv)
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index b41e11621aed..b7664f6eb356 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -530,6 +530,186 @@ static const struct file_operations uprobe_profile_ops = {
 	.release	= seq_release,
 };
 
+#ifdef CONFIG_STACK_GROWSUP
+static unsigned long adjust_stack_addr(unsigned long addr, unsigned n)
+{
+	return addr - (n * sizeof(long));
+}
+
+static bool within_user_stack(struct vm_area_struct *vma, unsigned long addr,
+			      unsigned int n)
+{
+	return vma->vm_start <= adjust_stack_addr(addr, n);
+}
+#else
+static unsigned long adjust_stack_addr(unsigned long addr, unsigned n)
+{
+	return addr + (n * sizeof(long));
+}
+
+static bool within_user_stack(struct vm_area_struct *vma, unsigned long addr,
+			      unsigned int n)
+{
+	return vma->vm_end >= adjust_stack_addr(addr, n);
+}
+#endif
+
+static unsigned long get_user_stack_nth(struct pt_regs *regs, unsigned int n)
+{
+	struct vm_area_struct *vma;
+	unsigned long addr = user_stack_pointer(regs);
+	bool valid = false;
+	unsigned long ret = 0;
+
+	down_read(&current->mm->mmap_sem);
+	vma = find_vma(current->mm, addr);
+	if (vma && vma->vm_start <= addr) {
+		if (within_user_stack(vma, addr, n))
+			valid = true;
+	}
+	up_read(&current->mm->mmap_sem);
+
+	addr = adjust_stack_addr(addr, n);
+
+	if (valid && copy_from_user(&ret, (void __force __user *)addr,
+				    sizeof(ret)) == 0)
+		return ret;
+	return 0;
+}
+
+static unsigned long offset_to_vaddr(struct vm_area_struct *vma,
+				     unsigned long offset)
+{
+	return vma->vm_start + offset - ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
+}
+
+static void __user *get_user_vaddr(unsigned long addr, struct trace_uprobe *tu)
+{
+	unsigned long pgoff = addr >> PAGE_SHIFT;
+	struct vm_area_struct *vma;
+	struct address_space *mapping;
+	unsigned long vaddr = 0;
+
+	if (tu == NULL) {
+		/* A NULL tu means that we already got the vaddr */
+		return (void __force __user *) addr;
+	}
+
+	mapping = tu->inode->i_mapping;
+
+	mutex_lock(&mapping->i_mmap_mutex);
+	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
+		if (vma->vm_mm != current->mm)
+			continue;
+		if (!(vma->vm_flags & VM_READ))
+			continue;
+
+		vaddr = offset_to_vaddr(vma, addr);
+		break;
+	}
+	mutex_unlock(&mapping->i_mmap_mutex);
+
+	WARN_ON_ONCE(vaddr == 0);
+	return (void __force __user *) vaddr;
+}
+
+/*
+ * uprobes-specific fetch functions
+ */
+#define DEFINE_FETCH_stack(type)					\
+static __kprobes void FETCH_FUNC_NAME(stack, type)(struct pt_regs *regs,\
+				  void *offset, void *dest, void *priv) \
+{									\
+	*(type *)dest = (type)get_user_stack_nth(regs, 			\
+				(unsigned int)((unsigned long)offset)); \
+}
+DEFINE_BASIC_FETCH_FUNCS(stack)
+/* No string on the stack entry */
+#define fetch_stack_string		NULL
+#define fetch_stack_string_size		NULL
+
+#define DEFINE_FETCH_memory(type)					\
+static __kprobes void FETCH_FUNC_NAME(memory, type)(struct pt_regs *regs,\
+				    void *addr, void *dest, void *priv) \
+{									\
+	type retval;							\
+	void __user *uaddr = get_user_vaddr((unsigned long)addr, priv);	\
+									\
+	if (copy_from_user(&retval, uaddr, sizeof(type)))		\
+		*(type *)dest = 0;					\
+	else								\
+		*(type *)dest = retval;					\
+}
+DEFINE_BASIC_FETCH_FUNCS(memory)
+/*
+ * Fetch a null-terminated string. Caller MUST set *(u32 *)dest with max
+ * length and relative data location.
+ */
+static __kprobes void FETCH_FUNC_NAME(memory, string)(struct pt_regs *regs,
+					void *addr, void *dest, void *priv)
+{
+	long ret;
+	u32 rloc = *(u32 *)dest;
+	int maxlen = get_rloc_len(rloc);
+	u8 *dst = get_rloc_data(dest);
+	void __user *vaddr = get_user_vaddr((unsigned long)addr, priv);
+	void __user *src = vaddr;
+
+	if (!maxlen)
+		return;
+
+	do {
+		ret = copy_from_user(dst, src, sizeof(*dst));
+		dst++;
+		src++;
+	} while (dst[-1] && ret == 0 && (src - vaddr) < maxlen);
+
+	if (ret < 0) {  /* Failed to fetch string */
+		((u8 *)get_rloc_data(dest))[0] = '\0';
+		*(u32 *)dest = make_data_rloc(0, get_rloc_offs(rloc));
+	} else {
+		*(u32 *)dest = make_data_rloc(src - vaddr,
+					      get_rloc_offs(rloc));
+	}
+}
+
+/* Return the length of string -- including null terminal byte */
+static __kprobes void FETCH_FUNC_NAME(memory, string_size)(struct pt_regs *regs,
+					   void *addr, void *dest, void *priv)
+{
+	int ret, len = 0;
+	u8 c;
+	void __user *vaddr = get_user_vaddr((unsigned long)addr, priv);
+
+	do {
+		ret = __copy_from_user_inatomic(&c, vaddr + len, 1);
+		len++;
+	} while (c && ret == 0 && len < MAX_STRING_SIZE);
+
+	if (ret < 0)	/* Failed to check the length */
+		*(u32 *)dest = 0;
+	else
+		*(u32 *)dest = len;
+}
+
+/* Fetch type information table */
+const struct fetch_type uprobes_fetch_type_table[] = {
+	/* Special types */
+	[FETCH_TYPE_STRING] = __ASSIGN_FETCH_TYPE("string", string, string,
+					sizeof(u32), 1, "__data_loc char[]"),
+	[FETCH_TYPE_STRSIZE] = __ASSIGN_FETCH_TYPE("string_size", u32,
+					string_size, sizeof(u32), 0, "u32"),
+	/* Basic types */
+	ASSIGN_FETCH_TYPE(u8,  u8,  0),
+	ASSIGN_FETCH_TYPE(u16, u16, 0),
+	ASSIGN_FETCH_TYPE(u32, u32, 0),
+	ASSIGN_FETCH_TYPE(u64, u64, 0),
+	ASSIGN_FETCH_TYPE(s8,  u8,  1),
+	ASSIGN_FETCH_TYPE(s16, u16, 1),
+	ASSIGN_FETCH_TYPE(s32, u32, 1),
+	ASSIGN_FETCH_TYPE(s64, u64, 1),
+};
+
 static atomic_t uprobe_buffer_ref = ATOMIC_INIT(0);
 static void __percpu *uprobe_cpu_buffer;
 static DEFINE_PER_CPU(struct mutex, uprobe_cpu_mutex);
@@ -546,7 +726,7 @@ static void uprobe_trace_print(struct trace_uprobe *tu,
 	int cpu;
 	struct ftrace_event_call *call = &tu->p.call;
 
-	dsize = __get_data_size(&tu->p, regs, NULL);
+	dsize = __get_data_size(&tu->p, regs, tu);
 	esize = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
 
 	if (WARN_ON_ONCE(!uprobe_cpu_buffer || tu->p.size + dsize > PAGE_SIZE))
@@ -561,7 +741,7 @@ static void uprobe_trace_print(struct trace_uprobe *tu,
 	 * so the mutex makes sure we have sole access to it.
 	 */
 	mutex_lock(mutex);
-	store_trace_args(esize, &tu->p, regs, arg_buf, dsize, NULL);
+	store_trace_args(esize, &tu->p, regs, arg_buf, dsize, tu);
 
 	size = esize + tu->p.size + dsize;
 	event = trace_current_buffer_lock_reserve(&buffer, call->event.type,
@@ -823,7 +1003,7 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
 	int cpu;
 	int rctx;
 
-	dsize = __get_data_size(&tu->p, regs, NULL);
+	dsize = __get_data_size(&tu->p, regs, tu);
 	esize = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
 
 	if (WARN_ON_ONCE(!uprobe_cpu_buffer))
@@ -843,7 +1023,7 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
 	 * so the mutex makes sure we have sole access to it.
 	 */
 	mutex_lock(mutex);
-	store_trace_args(esize, &tu->p, regs, arg_buf, dsize, NULL);
+	store_trace_args(esize, &tu->p, regs, arg_buf, dsize, tu);
 
 	preempt_disable();
 	head = this_cpu_ptr(call->perf_events);
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH 13/13] tracing/uprobes: Add support for full argument access methods
  2013-10-29  6:53 [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6) Namhyung Kim
                   ` (11 preceding siblings ...)
  2013-10-29  6:53 ` [PATCH 12/13] tracing/uprobes: Add more " Namhyung Kim
@ 2013-10-29  6:53 ` Namhyung Kim
  2013-10-30 10:36 ` [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6) Masami Hiramatsu
  2013-11-02 15:54 ` Oleg Nesterov
  14 siblings, 0 replies; 92+ messages in thread
From: Namhyung Kim @ 2013-10-29  6:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee, Hemant Kumar,
	LKML, Srikar Dronamraju, Oleg Nesterov, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

From: Namhyung Kim <namhyung.kim@lge.com>

Enable to fetch other types of argument for the uprobes.  IOW, we can
access stack, memory, deref, bitfield and retval from uprobes now.

The format for the argument types are same as kprobes (but @SYMBOL
type is not supported for uprobes), i.e:

  @ADDR   : Fetch memory at ADDR
  $stackN : Fetch Nth entry of stack (N >= 0)
  $stack  : Fetch stack address
  $retval : Fetch return value
  +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address

Note that the retval only can be used with uretprobes.

Original-patch-by: Hyeoncheol Lee <cheol.lee@lge.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 Documentation/trace/uprobetracer.txt | 25 +++++++++++++++++++++++++
 kernel/trace/trace_probe.c           | 36 +++++++++++++++++++++++-------------
 2 files changed, 48 insertions(+), 13 deletions(-)

diff --git a/Documentation/trace/uprobetracer.txt b/Documentation/trace/uprobetracer.txt
index 8f1a8b8956fc..6e5cff263e2b 100644
--- a/Documentation/trace/uprobetracer.txt
+++ b/Documentation/trace/uprobetracer.txt
@@ -31,6 +31,31 @@ Synopsis of uprobe_tracer
 
   FETCHARGS     : Arguments. Each probe can have up to 128 args.
    %REG         : Fetch register REG
+   @ADDR	: Fetch memory at ADDR (ADDR should be in userspace)
+   $stackN	: Fetch Nth entry of stack (N >= 0)
+   $stack	: Fetch stack address.
+   $retval	: Fetch return value.(*)
+   +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(**)
+   NAME=FETCHARG     : Set NAME as the argument name of FETCHARG.
+   FETCHARG:TYPE     : Set TYPE as the type of FETCHARG. Currently, basic types
+		       (u8/u16/u32/u64/s8/s16/s32/s64), "string" and bitfield
+		       are supported.
+
+  (*) only for return probe.
+  (**) this is useful for fetching a field of data structures.
+
+Types
+-----
+Several types are supported for fetch-args. Uprobe tracer will access memory
+by given type. Prefix 's' and 'u' means those types are signed and unsigned
+respectively. Traced arguments are shown in decimal (signed) or hex (unsigned).
+String type is a special type, which fetches a "null-terminated" string from
+user space.
+Bitfield is another special type, which takes 3 parameters, bit-width, bit-
+offset, and container-size (usually 32). The syntax is;
+
+ b<bit-width>@<bit-offset>/<container-size>
+
 
 Event Profiling
 ---------------
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 70cd3bfde5a6..8c77825e87e6 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -253,12 +253,18 @@ fail:
 }
 
 /* Special function : only accept unsigned long */
-static __kprobes void fetch_stack_address(struct pt_regs *regs,
+static __kprobes void fetch_kernel_stack_address(struct pt_regs *regs,
 					  void *dummy, void *dest, void *priv)
 {
 	*(unsigned long *)dest = kernel_stack_pointer(regs);
 }
 
+static __kprobes void fetch_user_stack_address(struct pt_regs *regs,
+					  void *dummy, void *dest, void *priv)
+{
+	*(unsigned long *)dest = user_stack_pointer(regs);
+}
+
 static fetch_func_t get_fetch_size_function(const struct fetch_type *type,
 					    fetch_func_t orig_fn,
 					    const struct fetch_type *ttbl)
@@ -303,7 +309,8 @@ int traceprobe_split_symbol_offset(char *symbol, unsigned long *offset)
 #define PARAM_MAX_STACK (THREAD_SIZE / sizeof(unsigned long))
 
 static int parse_probe_vars(char *arg, const struct fetch_type *t,
-			    struct fetch_param *f, bool is_return)
+			    struct fetch_param *f, bool is_return,
+			    bool is_kprobe)
 {
 	int ret = 0;
 	unsigned long param;
@@ -315,13 +322,16 @@ static int parse_probe_vars(char *arg, const struct fetch_type *t,
 			ret = -EINVAL;
 	} else if (strncmp(arg, "stack", 5) == 0) {
 		if (arg[5] == '\0') {
-			if (strcmp(t->name, DEFAULT_FETCH_TYPE_STR) == 0)
-				f->fn = fetch_stack_address;
+			if (strcmp(t->name, DEFAULT_FETCH_TYPE_STR))
+				return -EINVAL;
+
+			if (is_kprobe)
+				f->fn = fetch_kernel_stack_address;
 			else
-				ret = -EINVAL;
+				f->fn = fetch_user_stack_address;
 		} else if (isdigit(arg[5])) {
 			ret = kstrtoul(arg + 5, 10, &param);
-			if (ret || param > PARAM_MAX_STACK)
+			if (ret || (is_kprobe && param > PARAM_MAX_STACK))
 				ret = -EINVAL;
 			else {
 				f->fn = t->fetch[FETCH_MTD_stack];
@@ -345,17 +355,13 @@ static int parse_probe_arg(char *arg, const struct fetch_type *t,
 	int ret;
 	const struct fetch_type *ttbl;
 
-	ttbl = kprobes_fetch_type_table;
+	ttbl = is_kprobe ? kprobes_fetch_type_table : uprobes_fetch_type_table;
 
 	ret = 0;
 
-	/* Until uprobe_events supports only reg arguments */
-	if (!is_kprobe && arg[0] != '%')
-		return -EINVAL;
-
 	switch (arg[0]) {
 	case '$':
-		ret = parse_probe_vars(arg + 1, t, f, is_return);
+		ret = parse_probe_vars(arg + 1, t, f, is_return, is_kprobe);
 		break;
 
 	case '%':	/* named register */
@@ -376,6 +382,10 @@ static int parse_probe_arg(char *arg, const struct fetch_type *t,
 			f->fn = t->fetch[FETCH_MTD_memory];
 			f->data = (void *)param;
 		} else {
+			/* uprobes don't support symbols */
+			if (!is_kprobe)
+				return -EINVAL;
+
 			ret = traceprobe_split_symbol_offset(arg + 1, &offset);
 			if (ret)
 				break;
@@ -482,7 +492,7 @@ int traceprobe_parse_probe_arg(char *arg, ssize_t *size,
 	int ret;
 	const struct fetch_type *ttbl;
 
-	ttbl = kprobes_fetch_type_table;
+	ttbl = is_kprobe ? kprobes_fetch_type_table : uprobes_fetch_type_table;
 
 	if (strlen(arg) > MAX_ARGSTR_LEN) {
 		pr_info("Argument is too long.: %s\n",  arg);
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-10-29  6:53 [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6) Namhyung Kim
                   ` (12 preceding siblings ...)
  2013-10-29  6:53 ` [PATCH 13/13] tracing/uprobes: Add support for full argument access methods Namhyung Kim
@ 2013-10-30 10:36 ` Masami Hiramatsu
  2013-11-02 15:54 ` Oleg Nesterov
  14 siblings, 0 replies; 92+ messages in thread
From: Masami Hiramatsu @ 2013-10-30 10:36 UTC (permalink / raw)
  To: Namhyung Kim, Steven Rostedt
  Cc: Namhyung Kim, Hyeoncheol Lee, Hemant Kumar, LKML,
	Srikar Dronamraju, Oleg Nesterov, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

(2013/10/29 15:53), Namhyung Kim wrote:
> Hello,
> 
> This patchset implements memory (address), stack[N], deference,
> bitfield and retval (it needs uretprobe tho) fetch methods for
> uprobes.  It's based on the previous work [1] done by Hyeoncheol Lee.
> 
> Now kprobes and uprobes have their own fetch_type_tables and, in turn,
> memory and stack access methods.  Other fetch methods are shared.
> 
> For the dereference method, I added a new argument to fetch functions.
> It's because for uprobes it needs to know whether the given address is
> a file offset or a virtual address in an user process.  For instance,
> in case of fetching from a memory directly (like @offset) it should
> convert the address (offset) to a virtual address of the process, but
> if it's a dereferencing, the given address already has the virtual
> address.
> 
> To determine this in a fetch function, I passed a pointer to
> trace_uprobe for direct fetch, and passed NULL for dereference.
> 
> The patch 1-2 are bug fixes and can be applied independently.

You'd better add [BUGFIX] and send those separately. ;)
But anyway, I'm OK to pull those first two (and others too).


> Please look at patch 10 that uses per-cpu buffer for accessing user
> memory as suggested by Steven.  While I tried hard not to mess things
> up there might be a chance I did something horrible.  It'd be great if
> you guys take a look and give comments.
> 
> 
>  * v6 changes:
>   - add more Ack's from Masami
>   - fix ref count of uprobe_cpu_buffer (thanks to Jovi)
> 
>  * v5 changes:
>   - use user_stack_pointer() instead of GET_USP()
>   - fix a bug in 'stack' fetch method of uprobes
> 
>  * v4 changes:
>   - add Ack's from Masami
>   - rearrange patches to make it easy for simple fixes to be applied
>   - update documentation
>   - use per-cpu buffer for storing args (thanks to Steve!)
> 
> 
> [1] https://lkml.org/lkml/2012/11/14/84
> 
> A simple example:
> 
>   # cat foo.c
>   int glob = -1;
>   char str[] = "hello uprobe.";
> 
>   struct foo {
>     unsigned int unused: 2;
>     unsigned int foo: 20;
>     unsigned int bar: 10;
>   } foo = {
>     .foo = 5,
>   };
> 
>   int main(int argc, char *argv[])
>   {
>     long local = 0x1234;
> 
>     return 127;
>   }
> 
>   # gcc -o foo -g foo.c
> 
>   # objdump -d foo | grep -A9 -F '<main>'
>   00000000004004b0 <main>:
>     4004b0:	55                   	push   %rbp
>     4004b1:	48 89 e5             	mov    %rsp,%rbp
>     4004b4:	89 7d ec             	mov    %edi,-0x14(%rbp)
>     4004b7:	48 89 75 e0          	mov    %rsi,-0x20(%rbp)
>     4004bb:	48 c7 45 f8 34 12 00 	movq   $0x1234,-0x8(%rbp)
>     4004c2:	00 
>     4004c3:	b8 7f 00 00 00       	mov    $0x7f,%eax
>     4004c8:	5d                   	pop    %rbp
>     4004c9:	c3                   	retq   
> 
>   # nm foo | grep -e glob$ -e str -e foo
>   00000000006008bc D foo
>   00000000006008a8 D glob
>   00000000006008ac D str
> 
>   # perf probe -x /home/namhyung/tmp/foo -a 'foo=main+0x13 glob=@0x8a8:s32 \
>   > str=@0x8ac:string bit=@0x8bc:b10@2/32 argc=%di local=-0x8(%bp)'
>   Added new event:
>     probe_foo:foo      (on 0x4c3 with glob=@0x8a8:s32 str=@0x8ac:string 
>                                  bit=@0x8bc:b10@2/32 argc=%di local=-0x8(%bp))
> 
>   You can now use it in all perf tools, such as:
> 
>           perf record -e probe_foo:foo -aR sleep 1
> 
>   # perf record -e probe_foo:foo ./foo
>   [ perf record: Woken up 1 times to write data ]
>   [ perf record: Captured and wrote 0.001 MB perf.data (~33 samples) ]
> 
>   # perf script | grep -v ^#
>                foo  2008 [002  2199.867154: probe_foo:foo (4004c3)
>                    glob=-1 str="hello uprobe." bit=5 argc=1 local=1234

Nice ! :)

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer
  2013-10-29  6:53 ` [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer Namhyung Kim
@ 2013-10-31 18:16   ` Oleg Nesterov
  2013-11-01  9:00     ` Namhyung Kim
  2013-11-04  8:06     ` Namhyung Kim
  2013-11-01 15:09   ` Oleg Nesterov
  1 sibling, 2 replies; 92+ messages in thread
From: Oleg Nesterov @ 2013-10-31 18:16 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On 10/29, Namhyung Kim wrote:
>
> @@ -630,6 +653,19 @@ probe_event_enable(struct trace_uprobe *tu, int flag, filter_func_t filter)
>  	if (trace_probe_is_enabled(&tu->p))
>  		return -EINTR;
>  
> +	if (atomic_inc_return(&uprobe_buffer_ref) == 1) {
> +		int cpu;
> +
> +		uprobe_cpu_buffer = __alloc_percpu(PAGE_SIZE, PAGE_SIZE);
> +		if (uprobe_cpu_buffer == NULL) {
> +			atomic_dec(&uprobe_buffer_ref);
> +			return -ENOMEM;
> +		}
> +
> +		for_each_possible_cpu(cpu)
> +			mutex_init(&per_cpu(uprobe_cpu_mutex, cpu));
> +	}
> +
>  	WARN_ON(!uprobe_filter_is_empty(&tu->filter));
>  
>  	tu->p.flags |= flag;
> @@ -646,6 +682,11 @@ static void probe_event_disable(struct trace_uprobe *tu, int flag)
>  	if (!trace_probe_is_enabled(&tu->p))
>  		return;
>  
> +	if (atomic_dec_and_test(&uprobe_buffer_ref)) {
> +		free_percpu(uprobe_cpu_buffer);
> +		uprobe_cpu_buffer = NULL;
> +	}
> +
>  	WARN_ON(!uprobe_filter_is_empty(&tu->filter));

Do we really need atomic_t? probe_event_enable/disable is called under
event_mutex and we rely on this fact anyway.

Otherwise this logic looks racy even with atomic_t, another thread could
use the uninitialized uprobe_cpu_buffer/mutex if it registers another probe
and the handler runs before we complete the initialization, no?

Oleg.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH 12/13] tracing/uprobes: Add more fetch functions
  2013-10-29  6:53 ` [PATCH 12/13] tracing/uprobes: Add more " Namhyung Kim
@ 2013-10-31 18:22   ` Oleg Nesterov
  2013-11-04  8:50     ` Namhyung Kim
  2013-11-01 17:53   ` Oleg Nesterov
  1 sibling, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2013-10-31 18:22 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On 10/29, Namhyung Kim wrote:
>
> +static void __user *get_user_vaddr(unsigned long addr, struct trace_uprobe *tu)
> +{
> +	unsigned long pgoff = addr >> PAGE_SHIFT;
> +	struct vm_area_struct *vma;
> +	struct address_space *mapping;
> +	unsigned long vaddr = 0;
> +
> +	if (tu == NULL) {
> +		/* A NULL tu means that we already got the vaddr */
> +		return (void __force __user *) addr;
> +	}
> +
> +	mapping = tu->inode->i_mapping;
> +
> +	mutex_lock(&mapping->i_mmap_mutex);
> +	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
> +		if (vma->vm_mm != current->mm)
> +			continue;
> +		if (!(vma->vm_flags & VM_READ))
> +			continue;
> +
> +		vaddr = offset_to_vaddr(vma, addr);
> +		break;
> +	}
> +	mutex_unlock(&mapping->i_mmap_mutex);
> +
> +	WARN_ON_ONCE(vaddr == 0);

Hmm. But unless I missed something this "addr" passed as an argument can
be wrong? And if nothing else this or another thread can unmap the vma?

Oleg.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer
  2013-10-31 18:16   ` Oleg Nesterov
@ 2013-11-01  9:00     ` Namhyung Kim
  2013-11-04  8:06     ` Namhyung Kim
  1 sibling, 0 replies; 92+ messages in thread
From: Namhyung Kim @ 2013-11-01  9:00 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

Hi Oleg,

Thank you for reviewing this patchset!

On Thu, 31 Oct 2013 19:16:54 +0100, Oleg Nesterov wrote:
> On 10/29, Namhyung Kim wrote:
>>
>> @@ -630,6 +653,19 @@ probe_event_enable(struct trace_uprobe *tu, int flag, filter_func_t filter)
>>  	if (trace_probe_is_enabled(&tu->p))
>>  		return -EINTR;
>>  
>> +	if (atomic_inc_return(&uprobe_buffer_ref) == 1) {
>> +		int cpu;
>> +
>> +		uprobe_cpu_buffer = __alloc_percpu(PAGE_SIZE, PAGE_SIZE);
>> +		if (uprobe_cpu_buffer == NULL) {
>> +			atomic_dec(&uprobe_buffer_ref);
>> +			return -ENOMEM;
>> +		}
>> +
>> +		for_each_possible_cpu(cpu)
>> +			mutex_init(&per_cpu(uprobe_cpu_mutex, cpu));
>> +	}
>> +
>>  	WARN_ON(!uprobe_filter_is_empty(&tu->filter));
>>  
>>  	tu->p.flags |= flag;
>> @@ -646,6 +682,11 @@ static void probe_event_disable(struct trace_uprobe *tu, int flag)
>>  	if (!trace_probe_is_enabled(&tu->p))
>>  		return;
>>  
>> +	if (atomic_dec_and_test(&uprobe_buffer_ref)) {
>> +		free_percpu(uprobe_cpu_buffer);
>> +		uprobe_cpu_buffer = NULL;
>> +	}
>> +
>>  	WARN_ON(!uprobe_filter_is_empty(&tu->filter));
>
> Do we really need atomic_t? probe_event_enable/disable is called under
> event_mutex and we rely on this fact anyway.
>
> Otherwise this logic looks racy even with atomic_t, another thread could
> use the uninitialized uprobe_cpu_buffer/mutex if it registers another probe
> and the handler runs before we complete the initialization, no?

It seems you're right.  I really need to have some time to look at this
code carefully again. ;-)  So many works are going on these days...

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer
  2013-10-29  6:53 ` [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer Namhyung Kim
  2013-10-31 18:16   ` Oleg Nesterov
@ 2013-11-01 15:09   ` Oleg Nesterov
  2013-11-01 15:22     ` Oleg Nesterov
  1 sibling, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2013-11-01 15:09 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

Hi Namhyung,

Sorry if this was already discussed. But I can't really understand
the idea of this per-cpu buffer...

On 10/29, Namhyung Kim wrote:
>
> Fetching from user space should be done in a non-atomic context.  So
> use a per-cpu buffer and copy its content to the ring buffer
> atomically.  Note that we can migrate during accessing user memory
> thus use a per-cpu mutex to protect concurrent accesses.

And if the task migrates or just sleeps in page fault, another task
which hits another uprobe on the same CPU should wait.

Why we can't simply add trace_uprobe->buffer instead? Only to save
some memory? But every uprobe is very expensive in this sense anyway.

Oleg.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer
  2013-11-01 15:09   ` Oleg Nesterov
@ 2013-11-01 15:22     ` Oleg Nesterov
  2013-11-03 20:20       ` Oleg Nesterov
  0 siblings, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2013-11-01 15:22 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On 11/01, Oleg Nesterov wrote:
>
> Hi Namhyung,
>
> Sorry if this was already discussed. But I can't really understand
> the idea of this per-cpu buffer...
>
> On 10/29, Namhyung Kim wrote:
> >
> > Fetching from user space should be done in a non-atomic context.  So
> > use a per-cpu buffer and copy its content to the ring buffer
> > atomically.  Note that we can migrate during accessing user memory
> > thus use a per-cpu mutex to protect concurrent accesses.
>
> And if the task migrates or just sleeps in page fault, another task
> which hits another uprobe on the same CPU should wait.
>
> Why we can't simply add trace_uprobe->buffer instead? Only to save
> some memory? But every uprobe is very expensive in this sense anyway.

Ah, please ignore... handler_chain() is not self-serialized, so
tu->buffer needs locking/waiting too.

Oleg.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH 12/13] tracing/uprobes: Add more fetch functions
  2013-10-29  6:53 ` [PATCH 12/13] tracing/uprobes: Add more " Namhyung Kim
  2013-10-31 18:22   ` Oleg Nesterov
@ 2013-11-01 17:53   ` Oleg Nesterov
  1 sibling, 0 replies; 92+ messages in thread
From: Oleg Nesterov @ 2013-11-01 17:53 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On 10/29, Namhyung Kim wrote:
>
> +static unsigned long get_user_stack_nth(struct pt_regs *regs, unsigned int n)
> +{
> +	struct vm_area_struct *vma;
> +	unsigned long addr = user_stack_pointer(regs);
> +	bool valid = false;
> +	unsigned long ret = 0;
> +
> +	down_read(&current->mm->mmap_sem);
> +	vma = find_vma(current->mm, addr);
> +	if (vma && vma->vm_start <= addr) {
> +		if (within_user_stack(vma, addr, n))
> +			valid = true;
> +	}
> +	up_read(&current->mm->mmap_sem);
> +
> +	addr = adjust_stack_addr(addr, n);
> +
> +	if (valid && copy_from_user(&ret, (void __force __user *)addr,
> +				    sizeof(ret)) == 0)
> +		return ret;
> +	return 0;
> +}

Namhyung, I am just curious, why do we need find_vma/within_user_stack?
copy_from_user() should fail or expand the stack. Yes, we can actually
look into the wrong vma, but do we really care?

> +static void __user *get_user_vaddr(unsigned long addr, struct trace_uprobe *tu)
> +{
> +	unsigned long pgoff = addr >> PAGE_SHIFT;
> +	struct vm_area_struct *vma;
> +	struct address_space *mapping;
> +	unsigned long vaddr = 0;
> +
> +	if (tu == NULL) {
> +		/* A NULL tu means that we already got the vaddr */
> +		return (void __force __user *) addr;
> +	}
> +
> +	mapping = tu->inode->i_mapping;
> +
> +	mutex_lock(&mapping->i_mmap_mutex);
> +	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
> +		if (vma->vm_mm != current->mm)
> +			continue;
> +		if (!(vma->vm_flags & VM_READ))
> +			continue;
> +
> +		vaddr = offset_to_vaddr(vma, addr);
> +		break;
> +	}
> +	mutex_unlock(&mapping->i_mmap_mutex);
> +
> +	WARN_ON_ONCE(vaddr == 0);
> +	return (void __force __user *) vaddr;

So. If I understand correctly, @addr cat only read the memory mmaped
to the probed binary, and we need to translate the address... And in
general we can't read the data from bss.

Right?

I'll probably ask another question about this later...

> +static __kprobes void FETCH_FUNC_NAME(memory, string)(struct pt_regs *regs,
> +					void *addr, void *dest, void *priv)
> +{
> +	long ret;
> +	u32 rloc = *(u32 *)dest;
> +	int maxlen = get_rloc_len(rloc);
> +	u8 *dst = get_rloc_data(dest);
> +	void __user *vaddr = get_user_vaddr((unsigned long)addr, priv);
> +	void __user *src = vaddr;
> +
> +	if (!maxlen)
> +		return;
> +
> +	do {
> +		ret = copy_from_user(dst, src, sizeof(*dst));
> +		dst++;
> +		src++;
> +	} while (dst[-1] && ret == 0 && (src - vaddr) < maxlen);

Can't we use strncpy_from_user() ?

> +static __kprobes void FETCH_FUNC_NAME(memory, string_size)(struct pt_regs *regs,
> +					   void *addr, void *dest, void *priv)
> +{
> +	int ret, len = 0;
> +	u8 c;
> +	void __user *vaddr = get_user_vaddr((unsigned long)addr, priv);
> +
> +	do {
> +		ret = __copy_from_user_inatomic(&c, vaddr + len, 1);

Hmm. I guess I need to actually apply this series ;)

Why inatomic? it seems that this is for uprobes, no? And perhaps
strnlen_user() should work just fine?

Oleg.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-10-29  6:53 [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6) Namhyung Kim
                   ` (13 preceding siblings ...)
  2013-10-30 10:36 ` [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6) Masami Hiramatsu
@ 2013-11-02 15:54 ` Oleg Nesterov
  2013-11-04  8:46   ` Namhyung Kim
  14 siblings, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2013-11-02 15:54 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

Hello,

Let me first apologize again if this was already discussed. And I also
need to mention that I know almost nothing about elf/randomization/etc.

However,

On 10/29, Namhyung Kim wrote:
>
>   # nm foo | grep -e glob$ -e str -e foo
>   00000000006008bc D foo
>   00000000006008a8 D glob
>   00000000006008ac D str
>
>   # perf probe -x /home/namhyung/tmp/foo -a 'foo=main+0x13 glob=@0x8a8:s32 \

This does not look right to me.

- get_user_vaddr() is costly, it does vma_interval_tree_foreach() under
  ->i_mmap_mutex.

- this only allows to read the data from the same binary.

- in particular, you can't read the data from bss

- get_user_vaddr() looks simply wrong. I blindly applied the whole series
  and did the test to ensure.

  Test-case:

	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	unsigned int global = 0x1234;

	void func(void)
	{
	}

	int main(void)
	{
		char cmd[64];

		global = 0x4321;
		func();

		printf("addr = %p\n", &global);

		sprintf(cmd, "cat /proc/%d/maps", getpid());
		system(cmd);

		return 0;
	}

	# nm foo | grep -w global
	0000000000600a04 D global

	# perf probe -x ./foo -a "func var=@0xa04:u32"
	# perf record -e probe_foo:func ./foo
	addr = 0x600a04
	00400000-00401000 r-xp 00000000 fe:01 20958                              /root/foo
	00600000-00601000 rw-p 00000000 fe:01 20958                              /root/foo
	...

	# perf script | tail -1
		foo   555 [000]  1302.345642: probe_foo:func: (40059c) var=1234

	Note that it reports "1234", not "4321". This is because
	get_user_vaddr() finds another (1st) read-only mapping, and
	prints the initial value of "global".

	IOW, it reads the memory from 0x400a04, not from 0x600a04.

-------------------------------------------------------------------------------
Can't we simply implement get_user_vaddr() as

	static void __user *get_user_vaddr(unsigned long addr, struct trace_uprobe *tu)
	{
		void __user *vaddr = (void __force __user *)addr;

		/* A NULL tu means that we already got the vaddr */
		if (tu)
			vaddr += (current->mm->start_data & PAGE_MASK);

		return vaddr;
	}

?

I did this change, and now the test-case above works. And it also works
with "cc -pie -fPIC",

	# nm foo | grep -w global
	0000000000200c9c D global

	# perf probe -x ./foo -a "func var=@0xc9c:u32"
	# perf record -e probe_foo:func ./foo
	...
	# perf script | tail -1
		foo   576 [001]   475.519940: probe_foo:func: (7ffe95ca3814) var=4321

What do you think?

-------------------------------------------------------------------------------
Note:
	- I think that /* A NULL tu means that we already got the vaddr */
	  needs more discussion... IOW, I am not sure about 11/13.

	- Perhaps it also makes sense to allow to pass the absolute address
	  (iow, += start_data should be conditional)

but lets ignore this for now.

Oleg.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer
  2013-11-01 15:22     ` Oleg Nesterov
@ 2013-11-03 20:20       ` Oleg Nesterov
  2013-11-04  8:11         ` Namhyung Kim
  0 siblings, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2013-11-03 20:20 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On 11/01, Oleg Nesterov wrote:
>
> Ah, please ignore... handler_chain() is not self-serialized, so
> tu->buffer needs locking/waiting too.

Still I have to admit that I strongly dislike this yet another
(and imho strange) memory pool. However, I am not going to argue
because I can't suggest something better right now.

But. Perhaps it makes sense to at least add a couple of trivial
helpers in 10/13? Something like arg_buf_get/put/init, just to
simplify the potential changes.

Also. I think we should not use this pool unconditionally but
again, this is another improvement we can do later.

Oleg.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer
  2013-10-31 18:16   ` Oleg Nesterov
  2013-11-01  9:00     ` Namhyung Kim
@ 2013-11-04  8:06     ` Namhyung Kim
  2013-11-04 14:35       ` Oleg Nesterov
  1 sibling, 1 reply; 92+ messages in thread
From: Namhyung Kim @ 2013-11-04  8:06 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

Hi Oleg,

On Thu, 31 Oct 2013 19:16:54 +0100, Oleg Nesterov wrote:
> On 10/29, Namhyung Kim wrote:
>>
>> @@ -630,6 +653,19 @@ probe_event_enable(struct trace_uprobe *tu, int flag, filter_func_t filter)
>>  	if (trace_probe_is_enabled(&tu->p))
>>  		return -EINTR;
>>  
>> +	if (atomic_inc_return(&uprobe_buffer_ref) == 1) {
>> +		int cpu;
>> +
>> +		uprobe_cpu_buffer = __alloc_percpu(PAGE_SIZE, PAGE_SIZE);
>> +		if (uprobe_cpu_buffer == NULL) {
>> +			atomic_dec(&uprobe_buffer_ref);
>> +			return -ENOMEM;
>> +		}
>> +
>> +		for_each_possible_cpu(cpu)
>> +			mutex_init(&per_cpu(uprobe_cpu_mutex, cpu));
>> +	}
>> +
>>  	WARN_ON(!uprobe_filter_is_empty(&tu->filter));
>>  
>>  	tu->p.flags |= flag;
>> @@ -646,6 +682,11 @@ static void probe_event_disable(struct trace_uprobe *tu, int flag)
>>  	if (!trace_probe_is_enabled(&tu->p))
>>  		return;
>>  
>> +	if (atomic_dec_and_test(&uprobe_buffer_ref)) {
>> +		free_percpu(uprobe_cpu_buffer);
>> +		uprobe_cpu_buffer = NULL;
>> +	}
>> +
>>  	WARN_ON(!uprobe_filter_is_empty(&tu->filter));
>
> Do we really need atomic_t? probe_event_enable/disable is called under
> event_mutex and we rely on this fact anyway.

Looking at the code, it seems probe_event_enable/disable() is called
without event_mutex when it called from sys_perf_event_open().  So we
still need to protect refcount from concurrent accesses.

>
> Otherwise this logic looks racy even with atomic_t, another thread could
> use the uninitialized uprobe_cpu_buffer/mutex if it registers another probe
> and the handler runs before we complete the initialization, no?

But yeah, this is indeed a problem.  Thanks for pointing it out.  I'll
put a mutex to prevent such cases.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer
  2013-11-03 20:20       ` Oleg Nesterov
@ 2013-11-04  8:11         ` Namhyung Kim
  2013-11-04 14:38           ` Oleg Nesterov
  0 siblings, 1 reply; 92+ messages in thread
From: Namhyung Kim @ 2013-11-04  8:11 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On Sun, 3 Nov 2013 21:20:37 +0100, Oleg Nesterov wrote:
> On 11/01, Oleg Nesterov wrote:
>>
>> Ah, please ignore... handler_chain() is not self-serialized, so
>> tu->buffer needs locking/waiting too.
>
> Still I have to admit that I strongly dislike this yet another
> (and imho strange) memory pool. However, I am not going to argue
> because I can't suggest something better right now.

Okay.

>
> But. Perhaps it makes sense to at least add a couple of trivial
> helpers in 10/13? Something like arg_buf_get/put/init, just to
> simplify the potential changes.

Good idea.  How about something like below?


struct uprobe_cpu_buffer {
	struct mutex mutex;
	void *buf;
};
static struct uprobe_cpu_buffer __percpu *uprobe_cpu_buffer;
static DEFINE_MUTEX(uprobe_buffer_mutex);
static int uprobe_buffer_refcnt;

static int uprobe_buffer_init(void)
{
	int cpu, err_cpu;

	uprobe_cpu_buffer = alloc_percpu(struct uprobe_cpu_buffer);
	if (uprobe_cpu_buffer == NULL)
		return -ENOMEM;

	for_each_possible_cpu(cpu) {
		struct page *p = alloc_pages_node(cpu_to_node(cpu),
						  GFP_KERNEL, 0);
		if (p == NULL) {
			err_cpu = cpu;
			goto err;
		}
		per_cpu_ptr(uprobe_cpu_buffer, cpu)->buf = page_address(p);
		mutex_init(&per_cpu_ptr(uprobe_cpu_buffer, cpu)->mutex);
	}

	return 0;

err:
	for_each_possible_cpu(cpu) {
		if (cpu == err_cpu)
			break;
		free_page((unsigned long)per_cpu_ptr(uprobe_cpu_buffer, cpu)->buf);
	}

	free_percpu(uprobe_cpu_buffer);
	return -ENOMEM;
}

static int uprobe_buffer_enable(void)
{
	int ret = 0;

	mutex_lock(&uprobe_buffer_mutex);
	if (uprobe_buffer_refcnt++ == 0) {
		ret = uprobe_buffer_init();
		if (ret < 0)
			uprobe_buffer_refcnt--;
	}
	mutex_unlock(&uprobe_buffer_mutex);

	return ret;
}

static void uprobe_buffer_disable(void)
{
	mutex_lock(&uprobe_buffer_mutex);
	if (--uprobe_buffer_refcnt == 0) {
		free_percpu(uprobe_cpu_buffer);
		uprobe_cpu_buffer = NULL;
	}
	mutex_unlock(&uprobe_buffer_mutex);
}

static struct uprobe_cpu_buffer *uprobe_buffer_get(void)
{
	struct uprobe_cpu_buffer *ucb;
	int cpu;

	cpu = raw_smp_processor_id();
	ucb = per_cpu_ptr(uprobe_cpu_buffer, cpu);

	/*
	 * Use per-cpu buffers for fastest access, but we might migrate
	 * so the mutex makes sure we have sole access to it.
	 */
	mutex_lock(&ucb->mutex);

	return ucb;
}

static void uprobe_buffer_put(struct uprobe_cpu_buffer *ucb)
{
	mutex_unlock(&ucb->mutex);
}


Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-11-02 15:54 ` Oleg Nesterov
@ 2013-11-04  8:46   ` Namhyung Kim
  2013-11-04  8:59     ` Namhyung Kim
  2013-11-04 15:01     ` Oleg Nesterov
  0 siblings, 2 replies; 92+ messages in thread
From: Namhyung Kim @ 2013-11-04  8:46 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On Sat, 2 Nov 2013 16:54:58 +0100, Oleg Nesterov wrote:
> Hello,
>
> Let me first apologize again if this was already discussed. And I also
> need to mention that I know almost nothing about elf/randomization/etc.

No no, this was not discussed enough.  And I really appreciate your
thorough review! :)

>
> However,
>
> On 10/29, Namhyung Kim wrote:
>>
>>   # nm foo | grep -e glob$ -e str -e foo
>>   00000000006008bc D foo
>>   00000000006008a8 D glob
>>   00000000006008ac D str
>>
>>   # perf probe -x /home/namhyung/tmp/foo -a 'foo=main+0x13 glob=@0x8a8:s32 \
>
> This does not look right to me.
>
> - get_user_vaddr() is costly, it does vma_interval_tree_foreach() under
>   ->i_mmap_mutex.

Hmm.. yes, I think this is not needed.  I guess it should lookup a
proper vma in current->mm with mmap_sem read-locked.

>
> - this only allows to read the data from the same binary.

Right.  This is also an unnecessary restriction.  We should be able to
access data in other binary.

>
> - in particular, you can't read the data from bss

I can't understand why..  The bss region should also be in a same vma of
normal data, no?

>
> - get_user_vaddr() looks simply wrong. I blindly applied the whole series
>   and did the test to ensure.
>
>   Test-case:
>
> 	#include <stdio.h>
> 	#include <stdlib.h>
> 	#include <unistd.h>
>
> 	unsigned int global = 0x1234;
>
> 	void func(void)
> 	{
> 	}
>
> 	int main(void)
> 	{
> 		char cmd[64];
>
> 		global = 0x4321;
> 		func();
>
> 		printf("addr = %p\n", &global);
>
> 		sprintf(cmd, "cat /proc/%d/maps", getpid());
> 		system(cmd);
>
> 		return 0;
> 	}
>
> 	# nm foo | grep -w global
> 	0000000000600a04 D global
>
> 	# perf probe -x ./foo -a "func var=@0xa04:u32"
> 	# perf record -e probe_foo:func ./foo
> 	addr = 0x600a04
> 	00400000-00401000 r-xp 00000000 fe:01 20958                              /root/foo
> 	00600000-00601000 rw-p 00000000 fe:01 20958                              /root/foo
> 	...
>
> 	# perf script | tail -1
> 		foo   555 [000]  1302.345642: probe_foo:func: (40059c) var=1234
>
> 	Note that it reports "1234", not "4321". This is because
> 	get_user_vaddr() finds another (1st) read-only mapping, and
> 	prints the initial value of "global".
>
> 	IOW, it reads the memory from 0x400a04, not from 0x600a04.

Argh..  This is a problem.

I thought the gcc somehow aligns data to next page boundary.  But if
it's not the case, we need to recognize which is the proper one..

Simply preferring a writable vma to a read-only vma is what's came to my
head now.  Do you have an idea?

>
> -------------------------------------------------------------------------------
> Can't we simply implement get_user_vaddr() as
>
> 	static void __user *get_user_vaddr(unsigned long addr, struct trace_uprobe *tu)
> 	{
> 		void __user *vaddr = (void __force __user *)addr;
>
> 		/* A NULL tu means that we already got the vaddr */
> 		if (tu)
> 			vaddr += (current->mm->start_data & PAGE_MASK);
>
> 		return vaddr;
> 	}
>
> ?
>
> I did this change, and now the test-case above works. And it also works
> with "cc -pie -fPIC",
>
> 	# nm foo | grep -w global
> 	0000000000200c9c D global
>
> 	# perf probe -x ./foo -a "func var=@0xc9c:u32"
> 	# perf record -e probe_foo:func ./foo
> 	...
> 	# perf script | tail -1
> 		foo   576 [001]   475.519940: probe_foo:func: (7ffe95ca3814) var=4321
>
> What do you think?

This can only work with the probes fetching data from the executable,
right?  But as I said it should support any other binaries too.

What about this?


static void __user *get_user_vaddr(unsigned long addr, struct trace_uprobe *tu)
{
	unsigned long pgoff = addr >> PAGE_SHIFT;
	struct vm_area_struct *vma, *orig_vma = NULL;
	unsigned long vaddr = 0;

	if (tu == NULL) {
		/* A NULL tu means that we already got the vaddr */
		return (void __force __user *) addr;
	}

	down_read(&current->mm->mmap_sem);

	vma = current->mm->mmap;
	do {
		if (!vma->vm_file || vma->vm_file->f_inode != tu->inode) {
			/*
			 * We found read-only mapping for this inode.
			 * (provided that all mappings for this inode
			 * have consecutive addresses)
			 */
			if (orig_vma)
				break;
			continue;
		}

		if (vma->vm_pgoff > pgoff ||
		    (vma->vm_pgoff + vma_pages(vma) <= pgoff))
			continue;

		orig_vma = vma;

		/*
		 * We prefer writable mapping over read-only since
		 * data is usually in read/write memory region.  But
		 * in case of read-only data, it only can be found in
		 * read-only mapping so we save orig_vma and check
		 * whether it also has writable mapping.
		 */
		if (vma->vm_flags & VM_WRITE)
			break;
	} while ((vma = vma->vm_next) != NULL);

	if (orig_vma)
		vaddr = offset_to_vaddr(orig_vma, addr);

	up_read(&current->mm->mmap_sem);

	return (void __force __user *) vaddr;
}


>
> -------------------------------------------------------------------------------
> Note:
> 	- I think that /* A NULL tu means that we already got the vaddr */
> 	  needs more discussion... IOW, I am not sure about 11/13.

Discussion and feedbacks are always more than welcome. :)

>
> 	- Perhaps it also makes sense to allow to pass the absolute address
> 	  (iow, += start_data should be conditional)

For per-process uprobe, it might be useful.

>
> but lets ignore this for now.

Okay.  Let's discuss again after solving current issues.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH 12/13] tracing/uprobes: Add more fetch functions
  2013-10-31 18:22   ` Oleg Nesterov
@ 2013-11-04  8:50     ` Namhyung Kim
  2013-11-04 16:44       ` Oleg Nesterov
  0 siblings, 1 reply; 92+ messages in thread
From: Namhyung Kim @ 2013-11-04  8:50 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On Thu, 31 Oct 2013 19:22:18 +0100, Oleg Nesterov wrote:
> On 10/29, Namhyung Kim wrote:
>>
>> +static void __user *get_user_vaddr(unsigned long addr, struct trace_uprobe *tu)
>> +{
>> +	unsigned long pgoff = addr >> PAGE_SHIFT;
>> +	struct vm_area_struct *vma;
>> +	struct address_space *mapping;
>> +	unsigned long vaddr = 0;
>> +
>> +	if (tu == NULL) {
>> +		/* A NULL tu means that we already got the vaddr */
>> +		return (void __force __user *) addr;
>> +	}
>> +
>> +	mapping = tu->inode->i_mapping;
>> +
>> +	mutex_lock(&mapping->i_mmap_mutex);
>> +	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
>> +		if (vma->vm_mm != current->mm)
>> +			continue;
>> +		if (!(vma->vm_flags & VM_READ))
>> +			continue;
>> +
>> +		vaddr = offset_to_vaddr(vma, addr);
>> +		break;
>> +	}
>> +	mutex_unlock(&mapping->i_mmap_mutex);
>> +
>> +	WARN_ON_ONCE(vaddr == 0);
>
> Hmm. But unless I missed something this "addr" passed as an argument can
> be wrong? And if nothing else this or another thread can unmap the vma?

You mean WARN_ON_ONCE here is superfluous?  I admit that it should
protect concurrent vma [un]mappings.  Please see my reply in other
thread for a new approach.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-11-04  8:46   ` Namhyung Kim
@ 2013-11-04  8:59     ` Namhyung Kim
  2013-11-04 15:51       ` Oleg Nesterov
  2013-11-04 15:01     ` Oleg Nesterov
  1 sibling, 1 reply; 92+ messages in thread
From: Namhyung Kim @ 2013-11-04  8:59 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On Mon, 04 Nov 2013 17:46:41 +0900, Namhyung Kim wrote:
> On Sat, 2 Nov 2013 16:54:58 +0100, Oleg Nesterov wrote:
>> - this only allows to read the data from the same binary.
>
> Right.  This is also an unnecessary restriction.  We should be able to
> access data in other binary.

Hmm.. I guess this gonna be not simple - perhaps it can only be
supported for per-process uprobe with known virtual addresses?

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer
  2013-11-04  8:06     ` Namhyung Kim
@ 2013-11-04 14:35       ` Oleg Nesterov
  2013-11-05  1:12         ` Namhyung Kim
  0 siblings, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2013-11-04 14:35 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

Hi Namhyung,

On 11/04, Namhyung Kim wrote:
>
> >>
> >> +	if (atomic_dec_and_test(&uprobe_buffer_ref)) {
> >> +		free_percpu(uprobe_cpu_buffer);
> >> +		uprobe_cpu_buffer = NULL;
> >> +	}
> >> +
> >>  	WARN_ON(!uprobe_filter_is_empty(&tu->filter));
> >
> > Do we really need atomic_t? probe_event_enable/disable is called under
> > event_mutex and we rely on this fact anyway.
>
> Looking at the code, it seems probe_event_enable/disable() is called
> without event_mutex when it called from sys_perf_event_open().

Where?

__ftrace_set_clr_event(), perf_trace_init() or perf_trace_destroy()
hold event_mutex. We rely on this fact anyway.

Oleg.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer
  2013-11-04  8:11         ` Namhyung Kim
@ 2013-11-04 14:38           ` Oleg Nesterov
  2013-11-05  1:17             ` Namhyung Kim
  0 siblings, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2013-11-04 14:38 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On 11/04, Namhyung Kim wrote:
>
> On Sun, 3 Nov 2013 21:20:37 +0100, Oleg Nesterov wrote:
> >
> > But. Perhaps it makes sense to at least add a couple of trivial
> > helpers in 10/13? Something like arg_buf_get/put/init, just to
> > simplify the potential changes.
>
> Good idea.  How about something like below?

Thanks, I agree with any implementation ;)

> static DEFINE_MUTEX(uprobe_buffer_mutex);

but see my previous email, afaics we don't need it.

Oleg.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-11-04  8:46   ` Namhyung Kim
  2013-11-04  8:59     ` Namhyung Kim
@ 2013-11-04 15:01     ` Oleg Nesterov
  2013-11-05  1:53       ` Namhyung Kim
  1 sibling, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2013-11-04 15:01 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On 11/04, Namhyung Kim wrote:
>
> On Sat, 2 Nov 2013 16:54:58 +0100, Oleg Nesterov wrote:
> >
> > This does not look right to me.
> >
> > - get_user_vaddr() is costly, it does vma_interval_tree_foreach() under
> >   ->i_mmap_mutex.
>
> Hmm.. yes, I think this is not needed.  I guess it should lookup a
> proper vma in current->mm with mmap_sem read-locked.
>
> >
> > - this only allows to read the data from the same binary.
>
> Right.  This is also an unnecessary restriction.  We should be able to
> access data in other binary.

Yes... but this needs another discussion. In general, we simply can not
do this with the suggested syntax.

Say you want to probe this "foo" binary and dump "stdin" from libc.so.
You can't do this. You simply can't know where libc.so will be mmaped.

But: if we attach the event to the already running process, or if we
disable the randomization, then we can probably do this, see below.

Or the syntax should be "name=probe @file/addr" or something like this.

> > - in particular, you can't read the data from bss
>
> I can't understand why..  The bss region should also be in a same vma of
> normal data, no?

No, no. bss is mmaped anonymously, at least in general. See set_brk() in
load_elf().

> > 	#include <stdio.h>
> > 	#include <stdlib.h>
> > 	#include <unistd.h>
> >
> > 	unsigned int global = 0x1234;
> >
> > 	void func(void)
> > 	{
> > 	}
> >
> > 	int main(void)
> > 	{
> > 		char cmd[64];
> >
> > 		global = 0x4321;
> > 		func();
> >
> > 		printf("addr = %p\n", &global);
> >
> > 		sprintf(cmd, "cat /proc/%d/maps", getpid());
> > 		system(cmd);
> >
> > 		return 0;
> > 	}
> >
> > 	# nm foo | grep -w global
> > 	0000000000600a04 D global
> >
> > 	# perf probe -x ./foo -a "func var=@0xa04:u32"
> > 	# perf record -e probe_foo:func ./foo
> > 	addr = 0x600a04
> > 	00400000-00401000 r-xp 00000000 fe:01 20958                              /root/foo
> > 	00600000-00601000 rw-p 00000000 fe:01 20958                              /root/foo
> > 	...
> >
> > 	# perf script | tail -1
> > 		foo   555 [000]  1302.345642: probe_foo:func: (40059c) var=1234
> >
> > 	Note that it reports "1234", not "4321". This is because
> > 	get_user_vaddr() finds another (1st) read-only mapping, and
> > 	prints the initial value of "global".
> >
> > 	IOW, it reads the memory from 0x400a04, not from 0x600a04.
>
> Argh..  This is a problem.
>
> I thought the gcc somehow aligns data to next page boundary.

And perhaps it even should, my system is old. But this doesn't really
matter, the process itself can create another mapping.

> But if
> it's not the case, we need to recognize which is the proper one..
>
> Simply preferring a writable vma to a read-only vma is what's came to my
> head now.  Do you have an idea?

So far I think that trace_uprobes.c should not play games with vma. At all.

> > -------------------------------------------------------------------------------
> > Can't we simply implement get_user_vaddr() as
> >
> > 	static void __user *get_user_vaddr(unsigned long addr, struct trace_uprobe *tu)
> > 	{
> > 		void __user *vaddr = (void __force __user *)addr;
> >
> > 		/* A NULL tu means that we already got the vaddr */
> > 		if (tu)
> > 			vaddr += (current->mm->start_data & PAGE_MASK);
> >
> > 		return vaddr;
> > 	}
> >
> > ?
> >
> > I did this change, and now the test-case above works. And it also works
> > with "cc -pie -fPIC",
> >
> > 	# nm foo | grep -w global
> > 	0000000000200c9c D global
> >
> > 	# perf probe -x ./foo -a "func var=@0xc9c:u32"
> > 	# perf record -e probe_foo:func ./foo
> > 	...
> > 	# perf script | tail -1
> > 		foo   576 [001]   475.519940: probe_foo:func: (7ffe95ca3814) var=4321
> >
> > What do you think?
>
> This can only work with the probes fetching data from the executable,
> right?  But as I said it should support any other binaries too.

See above, we can't in general read other binaries.

But: if we know know where it is mmapped we can do this, just we need
to calculate the right addr passed to trace_uprobes.

Or: we should support both absolute and relative addresses, this is what
I was going to discuss later.

> static void __user *get_user_vaddr(unsigned long addr, struct trace_uprobe *tu)
> {
> 	unsigned long pgoff = addr >> PAGE_SHIFT;
> 	struct vm_area_struct *vma, *orig_vma = NULL;
> 	unsigned long vaddr = 0;
>
> 	if (tu == NULL) {
> 		/* A NULL tu means that we already got the vaddr */
> 		return (void __force __user *) addr;
> 	}
>
> 	down_read(&current->mm->mmap_sem);
>
> 	vma = current->mm->mmap;

Cough, it can be null if another thread does munmap(0, TASK_SIZE) ;)

But this doesn't matter.

> 	do {
> 		if (!vma->vm_file || vma->vm_file->f_inode != tu->inode) {
> 			/*
> 			 * We found read-only mapping for this inode.
> 			 * (provided that all mappings for this inode
> 			 * have consecutive addresses)
> 			 */
> 			if (orig_vma)
> 				break;
> 			continue;
> 		}
>
> 		if (vma->vm_pgoff > pgoff ||
> 		    (vma->vm_pgoff + vma_pages(vma) <= pgoff))
> 			continue;
>
> 		orig_vma = vma;
>
> 		/*
> 		 * We prefer writable mapping over read-only since
> 		 * data is usually in read/write memory region.  But
> 		 * in case of read-only data, it only can be found in
> 		 * read-only mapping so we save orig_vma and check
> 		 * whether it also has writable mapping.
> 		 */
> 		if (vma->vm_flags & VM_WRITE)
> 			break;
> 	} while ((vma = vma->vm_next) != NULL);
>
> 	if (orig_vma)
> 		vaddr = offset_to_vaddr(orig_vma, addr);
>
> 	up_read(&current->mm->mmap_sem);
>
> 	return (void __force __user *) vaddr;
> }

For what? Why it is better then my suggestion?

How it can read bss? How it can read the data from other binaries?

How we can trust the result? This code relies on some guesses and
none of them are "strict".

If nothing else, elf can have the arbitrary number of mmaped sections,
this can't work in general?

Oleg.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-11-04  8:59     ` Namhyung Kim
@ 2013-11-04 15:51       ` Oleg Nesterov
  2013-11-04 16:22         ` Oleg Nesterov
  2013-11-05  1:59         ` Namhyung Kim
  0 siblings, 2 replies; 92+ messages in thread
From: Oleg Nesterov @ 2013-11-04 15:51 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On 11/04, Namhyung Kim wrote:
> On Mon, 04 Nov 2013 17:46:41 +0900, Namhyung Kim wrote:
> > On Sat, 2 Nov 2013 16:54:58 +0100, Oleg Nesterov wrote:
> >> - this only allows to read the data from the same binary.
> >
> > Right.  This is also an unnecessary restriction.  We should be able to
> > access data in other binary.
>
> Hmm.. I guess this gonna be not simple

Yes ;)

- perhaps it can only be
> supported for per-process uprobe with known virtual addresses?

"Known" is very limited. Even in the simplest case (like your test-case
from from 0/13), you simply can't know the address of "int glob" if you
compile it with "-pie -fPIC".

As for other binaries (say libc) the problem is even worse, and
randomize_va_space adds even more pain.

But in any case, I strongly believe that it doesn't make any sense to
rely on tu->inode in get_user_vaddr().

Oleg.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH 11/13] tracing/kprobes: Add priv argument to fetch functions
  2013-10-29  6:53 ` [PATCH 11/13] tracing/kprobes: Add priv argument to fetch functions Namhyung Kim
@ 2013-11-04 16:09   ` Oleg Nesterov
  2013-11-05  2:10     ` Namhyung Kim
  0 siblings, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2013-11-04 16:09 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

See my replies to 0/13. Lets assume that you agree that get_user_vaddr()
doesn't need tu->inode.

On 10/29, Namhyung Kim wrote:
>
> This argument is for passing private data structure to each fetch
> function and will be used by uprobes.

In this case, why do we need this "void *priv"? It actually becomes
"bool need_addr_translation".

Can't we avoid it? Can't we just add FETCH_MTD_memory_notranslate ?
kprobes should use the same methods for FETCH_MTD_memory*, uprobes
should obviously adjust the addr in FETCH_MTD_memory.

Then (afaics) we need a single change in parse_probe_arg(),

	-	dprm->fetch = t->fetch[FETCH_MTD_memory];
	+	dprm->fetch = t->fetch[FETCH_MTD_memory_notranslate];

Yes, this will blow *probes_fetch_type_table[], but looks simpler.

And. This way it would be simple to teach parse_probe_arg('@') to use
_notranslate, say, "@=addr" or whatever.

Oleg.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-11-04 15:51       ` Oleg Nesterov
@ 2013-11-04 16:22         ` Oleg Nesterov
  2013-11-04 18:47           ` Oleg Nesterov
  2013-11-05  2:15           ` Namhyung Kim
  2013-11-05  1:59         ` Namhyung Kim
  1 sibling, 2 replies; 92+ messages in thread
From: Oleg Nesterov @ 2013-11-04 16:22 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On 11/04, Oleg Nesterov wrote:
>
> But in any case, I strongly believe that it doesn't make any sense to
> rely on tu->inode in get_user_vaddr().

Hmm. But I forgot about the case when you probe the function in libc
and want to dump the variable in libc...

So probably I was wrong and this all needs more thinking. Damn.
Perhaps we really need to pass @file/offset, but it is not clear what
we can do with bss/anon-mapping.

Oleg.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH 12/13] tracing/uprobes: Add more fetch functions
  2013-11-04  8:50     ` Namhyung Kim
@ 2013-11-04 16:44       ` Oleg Nesterov
  2013-11-04 17:17         ` Steven Rostedt
  2013-11-05  2:17         ` Namhyung Kim
  0 siblings, 2 replies; 92+ messages in thread
From: Oleg Nesterov @ 2013-11-04 16:44 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On 11/04, Namhyung Kim wrote:
>
> On Thu, 31 Oct 2013 19:22:18 +0100, Oleg Nesterov wrote:
> > On 10/29, Namhyung Kim wrote:
> >>
> >> +static void __user *get_user_vaddr(unsigned long addr, struct trace_uprobe *tu)
> >> +{
> >> +	unsigned long pgoff = addr >> PAGE_SHIFT;
> >> +	struct vm_area_struct *vma;
> >> +	struct address_space *mapping;
> >> +	unsigned long vaddr = 0;
> >> +
> >> +	if (tu == NULL) {
> >> +		/* A NULL tu means that we already got the vaddr */
> >> +		return (void __force __user *) addr;
> >> +	}
> >> +
> >> +	mapping = tu->inode->i_mapping;
> >> +
> >> +	mutex_lock(&mapping->i_mmap_mutex);
> >> +	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
> >> +		if (vma->vm_mm != current->mm)
> >> +			continue;
> >> +		if (!(vma->vm_flags & VM_READ))
> >> +			continue;
> >> +
> >> +		vaddr = offset_to_vaddr(vma, addr);
> >> +		break;
> >> +	}
> >> +	mutex_unlock(&mapping->i_mmap_mutex);
> >> +
> >> +	WARN_ON_ONCE(vaddr == 0);
> >
> > Hmm. But unless I missed something this "addr" passed as an argument can
> > be wrong? And if nothing else this or another thread can unmap the vma?
>
> You mean WARN_ON_ONCE here is superfluous?  I admit that it should
> protect concurrent vma [un]mappings.  Please see my reply in other
> thread for a new approach.

Whatever we do this address can be unmapped. For example, just because of
@invalid_address passed to trace_uprobe.c.

We do not really care, copy_from_user() should fail. But we should not
WARN() in this case.

Oleg.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH 12/13] tracing/uprobes: Add more fetch functions
  2013-11-04 16:44       ` Oleg Nesterov
@ 2013-11-04 17:17         ` Steven Rostedt
  2013-11-05  2:19           ` Namhyung Kim
  2013-11-05  2:17         ` Namhyung Kim
  1 sibling, 1 reply; 92+ messages in thread
From: Steven Rostedt @ 2013-11-04 17:17 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Namhyung Kim, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On Mon, 4 Nov 2013 17:44:31 +0100
Oleg Nesterov <oleg@redhat.com> wrote:

> On 11/04, Namhyung Kim wrote:
> >
> > On Thu, 31 Oct 2013 19:22:18 +0100, Oleg Nesterov wrote:
> > > On 10/29, Namhyung Kim wrote:
> > >>
> > >> +static void __user *get_user_vaddr(unsigned long addr, struct trace_uprobe *tu)
> > >> +{
> > >> +	unsigned long pgoff = addr >> PAGE_SHIFT;
> > >> +	struct vm_area_struct *vma;
> > >> +	struct address_space *mapping;
> > >> +	unsigned long vaddr = 0;
> > >> +
> > >> +	if (tu == NULL) {
> > >> +		/* A NULL tu means that we already got the vaddr */
> > >> +		return (void __force __user *) addr;
> > >> +	}
> > >> +
> > >> +	mapping = tu->inode->i_mapping;
> > >> +
> > >> +	mutex_lock(&mapping->i_mmap_mutex);
> > >> +	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
> > >> +		if (vma->vm_mm != current->mm)
> > >> +			continue;
> > >> +		if (!(vma->vm_flags & VM_READ))
> > >> +			continue;
> > >> +
> > >> +		vaddr = offset_to_vaddr(vma, addr);
> > >> +		break;
> > >> +	}
> > >> +	mutex_unlock(&mapping->i_mmap_mutex);
> > >> +
> > >> +	WARN_ON_ONCE(vaddr == 0);
> > >
> > > Hmm. But unless I missed something this "addr" passed as an argument can
> > > be wrong? And if nothing else this or another thread can unmap the vma?
> >
> > You mean WARN_ON_ONCE here is superfluous?  I admit that it should
> > protect concurrent vma [un]mappings.  Please see my reply in other
> > thread for a new approach.
> 
> Whatever we do this address can be unmapped. For example, just because of
> @invalid_address passed to trace_uprobe.c.
> 
> We do not really care, copy_from_user() should fail. But we should not
> WARN() in this case.
> 

I agree, the WARN_ON_ONCE() above looks like it's uncalled for.
WARN()ings should only be used when an anomaly in the kernel logic is
detected. Can this trigger on bad input from user space, or something
else that userspace does? (a race with unmapping memory?). If so, error
out to the user process, but do not call any of the WARN() functions.

-- Steve

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-11-04 16:22         ` Oleg Nesterov
@ 2013-11-04 18:47           ` Oleg Nesterov
  2013-11-04 18:57             ` Oleg Nesterov
                               ` (2 more replies)
  2013-11-05  2:15           ` Namhyung Kim
  1 sibling, 3 replies; 92+ messages in thread
From: Oleg Nesterov @ 2013-11-04 18:47 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On 11/04, Oleg Nesterov wrote:
>
> On 11/04, Oleg Nesterov wrote:
> >
> > But in any case, I strongly believe that it doesn't make any sense to
> > rely on tu->inode in get_user_vaddr().
>
> Hmm. But I forgot about the case when you probe the function in libc
> and want to dump the variable in libc...
>
> So probably I was wrong and this all needs more thinking. Damn.
> Perhaps we really need to pass @file/offset, but it is not clear what
> we can do with bss/anon-mapping.

Or. Not that I really like this, but just for discussion...

How about

	static void __user *get_user_vaddr(struct pt_regs *regs, unsigned long addr)
	{
		return (void __force __user *)addr + instruction_pointer(regs);
	}

?

This should solve the problems with relocations/randomization/bss.

The obvious disadvantage is that it is not easy to calculate the
offset we need to pass as an argument, it depends on the probed
function.

And this still doesn't allow to, say, probe the executable but read
the data from libc. Unless, again, we attach to the running process
or randomize_va_space = 0, so we can know it in advance. But otherwise
I do not think there is any solution.

Oleg.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-11-04 18:47           ` Oleg Nesterov
@ 2013-11-04 18:57             ` Oleg Nesterov
  2013-11-05  2:51               ` Namhyung Kim
  2013-11-05  2:49             ` Namhyung Kim
  2013-11-05  6:58             ` Namhyung Kim
  2 siblings, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2013-11-04 18:57 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On 11/04, Oleg Nesterov wrote:
>
> On 11/04, Oleg Nesterov wrote:
> >
> > On 11/04, Oleg Nesterov wrote:
> > >
> > > But in any case, I strongly believe that it doesn't make any sense to
> > > rely on tu->inode in get_user_vaddr().
> >
> > Hmm. But I forgot about the case when you probe the function in libc
> > and want to dump the variable in libc...
> >
> > So probably I was wrong and this all needs more thinking. Damn.
> > Perhaps we really need to pass @file/offset, but it is not clear what
> > we can do with bss/anon-mapping.
>
> Or. Not that I really like this, but just for discussion...
>
> How about
>
> 	static void __user *get_user_vaddr(struct pt_regs *regs, unsigned long addr)
> 	{
> 		return (void __force __user *)addr + instruction_pointer(regs);
> 	}
>
> ?
>
> This should solve the problems with relocations/randomization/bss.
>
> The obvious disadvantage is that it is not easy to calculate the
> offset we need to pass as an argument, it depends on the probed
> function.

forgot to mention... and instruction_pointer() can't work in ret-probe,
we need to pass the "unsigned long func" arg somehow...

>
> And this still doesn't allow to, say, probe the executable but read
> the data from libc. Unless, again, we attach to the running process
> or randomize_va_space = 0, so we can know it in advance. But otherwise
> I do not think there is any solution.
>
> Oleg.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer
  2013-11-04 14:35       ` Oleg Nesterov
@ 2013-11-05  1:12         ` Namhyung Kim
  0 siblings, 0 replies; 92+ messages in thread
From: Namhyung Kim @ 2013-11-05  1:12 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

Hi Oleg,

On Mon, 4 Nov 2013 15:35:17 +0100, Oleg Nesterov wrote:
> Hi Namhyung,
>
> On 11/04, Namhyung Kim wrote:
>>
>> >>
>> >> +	if (atomic_dec_and_test(&uprobe_buffer_ref)) {
>> >> +		free_percpu(uprobe_cpu_buffer);
>> >> +		uprobe_cpu_buffer = NULL;
>> >> +	}
>> >> +
>> >>  	WARN_ON(!uprobe_filter_is_empty(&tu->filter));
>> >
>> > Do we really need atomic_t? probe_event_enable/disable is called under
>> > event_mutex and we rely on this fact anyway.
>>
>> Looking at the code, it seems probe_event_enable/disable() is called
>> without event_mutex when it called from sys_perf_event_open().
>
> Where?
>
> __ftrace_set_clr_event(), perf_trace_init() or perf_trace_destroy()
> hold event_mutex. We rely on this fact anyway.

Ah, you're right.  My eyes simply missed it. ;-)

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer
  2013-11-04 14:38           ` Oleg Nesterov
@ 2013-11-05  1:17             ` Namhyung Kim
  0 siblings, 0 replies; 92+ messages in thread
From: Namhyung Kim @ 2013-11-05  1:17 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On Mon, 4 Nov 2013 15:38:13 +0100, Oleg Nesterov wrote:
> On 11/04, Namhyung Kim wrote:
>>
>> On Sun, 3 Nov 2013 21:20:37 +0100, Oleg Nesterov wrote:
>> >
>> > But. Perhaps it makes sense to at least add a couple of trivial
>> > helpers in 10/13? Something like arg_buf_get/put/init, just to
>> > simplify the potential changes.
>>
>> Good idea.  How about something like below?
>
> Thanks, I agree with any implementation ;)
>
>> static DEFINE_MUTEX(uprobe_buffer_mutex);
>
> but see my previous email, afaics we don't need it.

I see.  Probably checking the event_mutex is locked in front of the
_enable/disable function would be enough.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-11-04 15:01     ` Oleg Nesterov
@ 2013-11-05  1:53       ` Namhyung Kim
  2013-11-05 16:28         ` Oleg Nesterov
  0 siblings, 1 reply; 92+ messages in thread
From: Namhyung Kim @ 2013-11-05  1:53 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On Mon, 4 Nov 2013 16:01:12 +0100, Oleg Nesterov wrote:
> On 11/04, Namhyung Kim wrote:
>>
>> On Sat, 2 Nov 2013 16:54:58 +0100, Oleg Nesterov wrote:
>> >
>> > This does not look right to me.
>> >
>> > - get_user_vaddr() is costly, it does vma_interval_tree_foreach() under
>> >   ->i_mmap_mutex.
>>
>> Hmm.. yes, I think this is not needed.  I guess it should lookup a
>> proper vma in current->mm with mmap_sem read-locked.
>>
>> >
>> > - this only allows to read the data from the same binary.
>>
>> Right.  This is also an unnecessary restriction.  We should be able to
>> access data in other binary.
>
> Yes... but this needs another discussion. In general, we simply can not
> do this with the suggested syntax.

Agreed.

>
> Say you want to probe this "foo" binary and dump "stdin" from libc.so.
> You can't do this. You simply can't know where libc.so will be mmaped.
>
> But: if we attach the event to the already running process, or if we
> disable the randomization, then we can probably do this, see below.
>
> Or the syntax should be "name=probe @file/addr" or something like this.

Okay.  Let's call this kind of thing "cross-fetch" (or a better name can
be suggested).  This is more complex situation and needs more discussion
as you said.  So let's skip the discussion for now. :)

>
>> > - in particular, you can't read the data from bss
>>
>> I can't understand why..  The bss region should also be in a same vma of
>> normal data, no?
>
> No, no. bss is mmaped anonymously, at least in general. See set_brk() in
> load_elf().

Ah, thanks for the pointer.  I also need to say that I'm not familiar
with the code base.

Looking at the code, it seems to add a anon mapping iff the bss region
spans on two or more pages - that's why I missed it from my simple
test. :/

>
>> I thought the gcc somehow aligns data to next page boundary.
>
> And perhaps it even should, my system is old. But this doesn't really
> matter, the process itself can create another mapping.

Right.

>
>> But if
>> it's not the case, we need to recognize which is the proper one..
>>
>> Simply preferring a writable vma to a read-only vma is what's came to my
>> head now.  Do you have an idea?
>
> So far I think that trace_uprobes.c should not play games with vma. At all.

Yes, playing with vma is fragile.  But otherwise how can we get the
address from the file+offset in random processes?

>
>> > -------------------------------------------------------------------------------
>> > Can't we simply implement get_user_vaddr() as
>> >
>> > 	static void __user *get_user_vaddr(unsigned long addr, struct trace_uprobe *tu)
>> > 	{
>> > 		void __user *vaddr = (void __force __user *)addr;
>> >
>> > 		/* A NULL tu means that we already got the vaddr */
>> > 		if (tu)
>> > 			vaddr += (current->mm->start_data & PAGE_MASK);
>> >
>> > 		return vaddr;
>> > 	}
>> >
>> > ?
>> >
>> > I did this change, and now the test-case above works. And it also works
>> > with "cc -pie -fPIC",
>> >
>> > 	# nm foo | grep -w global
>> > 	0000000000200c9c D global
>> >
>> > 	# perf probe -x ./foo -a "func var=@0xc9c:u32"
>> > 	# perf record -e probe_foo:func ./foo
>> > 	...
>> > 	# perf script | tail -1
>> > 		foo   576 [001]   475.519940: probe_foo:func: (7ffe95ca3814) var=4321
>> >
>> > What do you think?
>>
>> This can only work with the probes fetching data from the executable,
>> right?  But as I said it should support any other binaries too.
>
> See above, we can't in general read other binaries.

Okay, I need to clarify my words.  I'm not saying about "cross-fetch"
here, what I wanted to say is adding a probe in some dso and fetch data
from the dso.

Primary usecase I have in mind is supporting SDTs in the perf probe
tool.  Currently many libraries including glibc add tracepoints (SDTs)
within themselves to be traced/profilied easily.

You can see Hemant's work on this here:

  https://lkml.org/lkml/2013/10/18/274

>
> But: if we know know where it is mmapped we can do this, just we need
> to calculate the right addr passed to trace_uprobes.
>
> Or: we should support both absolute and relative addresses, this is what
> I was going to discuss later.

But I guess this "specifying address directly" is hard to apply to
multiple processes - like system-wide tracing in perf.

>
>> static void __user *get_user_vaddr(unsigned long addr, struct trace_uprobe *tu)
>> {
>> 	unsigned long pgoff = addr >> PAGE_SHIFT;
>> 	struct vm_area_struct *vma, *orig_vma = NULL;
>> 	unsigned long vaddr = 0;
>>
>> 	if (tu == NULL) {
>> 		/* A NULL tu means that we already got the vaddr */
>> 		return (void __force __user *) addr;
>> 	}
>>
>> 	down_read(&current->mm->mmap_sem);
>>
>> 	vma = current->mm->mmap;
>
> Cough, it can be null if another thread does munmap(0, TASK_SIZE) ;)
>
> But this doesn't matter.

:)

>
>> 	do {
>> 		if (!vma->vm_file || vma->vm_file->f_inode != tu->inode) {
>> 			/*
>> 			 * We found read-only mapping for this inode.
>> 			 * (provided that all mappings for this inode
>> 			 * have consecutive addresses)
>> 			 */
>> 			if (orig_vma)
>> 				break;
>> 			continue;
>> 		}
>>
>> 		if (vma->vm_pgoff > pgoff ||
>> 		    (vma->vm_pgoff + vma_pages(vma) <= pgoff))
>> 			continue;
>>
>> 		orig_vma = vma;
>>
>> 		/*
>> 		 * We prefer writable mapping over read-only since
>> 		 * data is usually in read/write memory region.  But
>> 		 * in case of read-only data, it only can be found in
>> 		 * read-only mapping so we save orig_vma and check
>> 		 * whether it also has writable mapping.
>> 		 */
>> 		if (vma->vm_flags & VM_WRITE)
>> 			break;
>> 	} while ((vma = vma->vm_next) != NULL);
>>
>> 	if (orig_vma)
>> 		vaddr = offset_to_vaddr(orig_vma, addr);
>>
>> 	up_read(&current->mm->mmap_sem);
>>
>> 	return (void __force __user *) vaddr;
>> }
>
> For what? Why it is better then my suggestion?

Just to support fetching (not cross-fetching!) from other binaries
(dsos) other than an executable.

>
> How it can read bss? How it can read the data from other binaries?

Yes, it'd fail if bss resides in a separate vma. :-/

>
> How we can trust the result? This code relies on some guesses and
> none of them are "strict".
>
> If nothing else, elf can have the arbitrary number of mmaped sections,
> this can't work in general?

These two are still problems to be solved.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-11-04 15:51       ` Oleg Nesterov
  2013-11-04 16:22         ` Oleg Nesterov
@ 2013-11-05  1:59         ` Namhyung Kim
  1 sibling, 0 replies; 92+ messages in thread
From: Namhyung Kim @ 2013-11-05  1:59 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On Mon, 4 Nov 2013 16:51:31 +0100, Oleg Nesterov wrote:
> On 11/04, Namhyung Kim wrote:
>> On Mon, 04 Nov 2013 17:46:41 +0900, Namhyung Kim wrote:
>> > On Sat, 2 Nov 2013 16:54:58 +0100, Oleg Nesterov wrote:
>> >> - this only allows to read the data from the same binary.
>> >
>> > Right.  This is also an unnecessary restriction.  We should be able to
>> > access data in other binary.
>>
>> Hmm.. I guess this gonna be not simple
>
> Yes ;)
>
> - perhaps it can only be
>> supported for per-process uprobe with known virtual addresses?
>
> "Known" is very limited. Even in the simplest case (like your test-case
> from from 0/13), you simply can't know the address of "int glob" if you
> compile it with "-pie -fPIC".
>
> As for other binaries (say libc) the problem is even worse, and
> randomize_va_space adds even more pain.

Hmm.. right.  We should deal with relative addresses from the base
mapping address of a binary.

>
> But in any case, I strongly believe that it doesn't make any sense to
> rely on tu->inode in get_user_vaddr().

I'll think about it more.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH 11/13] tracing/kprobes: Add priv argument to fetch functions
  2013-11-04 16:09   ` Oleg Nesterov
@ 2013-11-05  2:10     ` Namhyung Kim
  0 siblings, 0 replies; 92+ messages in thread
From: Namhyung Kim @ 2013-11-05  2:10 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On Mon, 4 Nov 2013 17:09:14 +0100, Oleg Nesterov wrote:
> See my replies to 0/13. Lets assume that you agree that get_user_vaddr()
> doesn't need tu->inode.

Okay.

>
> On 10/29, Namhyung Kim wrote:
>>
>> This argument is for passing private data structure to each fetch
>> function and will be used by uprobes.
>
> In this case, why do we need this "void *priv"? It actually becomes
> "bool need_addr_translation".

Right.  I added it because the deref method is used for both of kprobes
and uprobes and it only needed for uprobes.  And I thought if we need
something for kprobes later, it can be reused so make it general.

>
> Can't we avoid it? Can't we just add FETCH_MTD_memory_notranslate ?
> kprobes should use the same methods for FETCH_MTD_memory*, uprobes
> should obviously adjust the addr in FETCH_MTD_memory.
>
> Then (afaics) we need a single change in parse_probe_arg(),
>
> 	-	dprm->fetch = t->fetch[FETCH_MTD_memory];
> 	+	dprm->fetch = t->fetch[FETCH_MTD_memory_notranslate];
>
> Yes, this will blow *probes_fetch_type_table[], but looks simpler.

Looks good to me too, thanks! :)

>
> And. This way it would be simple to teach parse_probe_arg('@') to use
> _notranslate, say, "@=addr" or whatever.

Yeah, it'll be useful at least for fetching data in anon pages.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-11-04 16:22         ` Oleg Nesterov
  2013-11-04 18:47           ` Oleg Nesterov
@ 2013-11-05  2:15           ` Namhyung Kim
  2013-11-05 16:33             ` Oleg Nesterov
  1 sibling, 1 reply; 92+ messages in thread
From: Namhyung Kim @ 2013-11-05  2:15 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On Mon, 4 Nov 2013 17:22:29 +0100, Oleg Nesterov wrote:
> On 11/04, Oleg Nesterov wrote:
>>
>> But in any case, I strongly believe that it doesn't make any sense to
>> rely on tu->inode in get_user_vaddr().
>
> Hmm. But I forgot about the case when you probe the function in libc
> and want to dump the variable in libc...

Right.  Actually that's what I really wanted.

>
> So probably I was wrong and this all needs more thinking. Damn.

:)

> Perhaps we really need to pass @file/offset, but it is not clear what
> we can do with bss/anon-mapping.

The @file/offset should work with bss since data in bss is accessed via
an offset in the program, but still anon-mapping has nothing to do with
it.  Hmm...

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH 12/13] tracing/uprobes: Add more fetch functions
  2013-11-04 16:44       ` Oleg Nesterov
  2013-11-04 17:17         ` Steven Rostedt
@ 2013-11-05  2:17         ` Namhyung Kim
  1 sibling, 0 replies; 92+ messages in thread
From: Namhyung Kim @ 2013-11-05  2:17 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On Mon, 4 Nov 2013 17:44:31 +0100, Oleg Nesterov wrote:
> On 11/04, Namhyung Kim wrote:
>>
>> On Thu, 31 Oct 2013 19:22:18 +0100, Oleg Nesterov wrote:
>> > On 10/29, Namhyung Kim wrote:
>> >>
>> >> +static void __user *get_user_vaddr(unsigned long addr, struct trace_uprobe *tu)
>> >> +{
>> >> +	unsigned long pgoff = addr >> PAGE_SHIFT;
>> >> +	struct vm_area_struct *vma;
>> >> +	struct address_space *mapping;
>> >> +	unsigned long vaddr = 0;
>> >> +
>> >> +	if (tu == NULL) {
>> >> +		/* A NULL tu means that we already got the vaddr */
>> >> +		return (void __force __user *) addr;
>> >> +	}
>> >> +
>> >> +	mapping = tu->inode->i_mapping;
>> >> +
>> >> +	mutex_lock(&mapping->i_mmap_mutex);
>> >> +	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
>> >> +		if (vma->vm_mm != current->mm)
>> >> +			continue;
>> >> +		if (!(vma->vm_flags & VM_READ))
>> >> +			continue;
>> >> +
>> >> +		vaddr = offset_to_vaddr(vma, addr);
>> >> +		break;
>> >> +	}
>> >> +	mutex_unlock(&mapping->i_mmap_mutex);
>> >> +
>> >> +	WARN_ON_ONCE(vaddr == 0);
>> >
>> > Hmm. But unless I missed something this "addr" passed as an argument can
>> > be wrong? And if nothing else this or another thread can unmap the vma?
>>
>> You mean WARN_ON_ONCE here is superfluous?  I admit that it should
>> protect concurrent vma [un]mappings.  Please see my reply in other
>> thread for a new approach.
>
> Whatever we do this address can be unmapped. For example, just because of
> @invalid_address passed to trace_uprobe.c.
>
> We do not really care, copy_from_user() should fail. But we should not
> WARN() in this case.

Okay, I see.  Will remove it in the next spin.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH 12/13] tracing/uprobes: Add more fetch functions
  2013-11-04 17:17         ` Steven Rostedt
@ 2013-11-05  2:19           ` Namhyung Kim
  0 siblings, 0 replies; 92+ messages in thread
From: Namhyung Kim @ 2013-11-05  2:19 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Oleg Nesterov, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

Hi Steve,

On Mon, 4 Nov 2013 12:17:06 -0500, Steven Rostedt wrote:
> On Mon, 4 Nov 2013 17:44:31 +0100
> Oleg Nesterov <oleg@redhat.com> wrote:
>
>> On 11/04, Namhyung Kim wrote:
>> >
>> > On Thu, 31 Oct 2013 19:22:18 +0100, Oleg Nesterov wrote:
>> > > On 10/29, Namhyung Kim wrote:
>> > >>
>> > >> +static void __user *get_user_vaddr(unsigned long addr, struct trace_uprobe *tu)
>> > >> +{
>> > >> +	unsigned long pgoff = addr >> PAGE_SHIFT;
>> > >> +	struct vm_area_struct *vma;
>> > >> +	struct address_space *mapping;
>> > >> +	unsigned long vaddr = 0;
>> > >> +
>> > >> +	if (tu == NULL) {
>> > >> +		/* A NULL tu means that we already got the vaddr */
>> > >> +		return (void __force __user *) addr;
>> > >> +	}
>> > >> +
>> > >> +	mapping = tu->inode->i_mapping;
>> > >> +
>> > >> +	mutex_lock(&mapping->i_mmap_mutex);
>> > >> +	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
>> > >> +		if (vma->vm_mm != current->mm)
>> > >> +			continue;
>> > >> +		if (!(vma->vm_flags & VM_READ))
>> > >> +			continue;
>> > >> +
>> > >> +		vaddr = offset_to_vaddr(vma, addr);
>> > >> +		break;
>> > >> +	}
>> > >> +	mutex_unlock(&mapping->i_mmap_mutex);
>> > >> +
>> > >> +	WARN_ON_ONCE(vaddr == 0);
>> > >
>> > > Hmm. But unless I missed something this "addr" passed as an argument can
>> > > be wrong? And if nothing else this or another thread can unmap the vma?
>> >
>> > You mean WARN_ON_ONCE here is superfluous?  I admit that it should
>> > protect concurrent vma [un]mappings.  Please see my reply in other
>> > thread for a new approach.
>> 
>> Whatever we do this address can be unmapped. For example, just because of
>> @invalid_address passed to trace_uprobe.c.
>> 
>> We do not really care, copy_from_user() should fail. But we should not
>> WARN() in this case.
>> 
>
> I agree, the WARN_ON_ONCE() above looks like it's uncalled for.
> WARN()ings should only be used when an anomaly in the kernel logic is
> detected. Can this trigger on bad input from user space, or something
> else that userspace does? (a race with unmapping memory?). If so, error
> out to the user process, but do not call any of the WARN() functions.

Will do.  Thanks for the explanation.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-11-04 18:47           ` Oleg Nesterov
  2013-11-04 18:57             ` Oleg Nesterov
@ 2013-11-05  2:49             ` Namhyung Kim
  2013-11-05  6:58             ` Namhyung Kim
  2 siblings, 0 replies; 92+ messages in thread
From: Namhyung Kim @ 2013-11-05  2:49 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On Mon, 4 Nov 2013 19:47:41 +0100, Oleg Nesterov wrote:
> On 11/04, Oleg Nesterov wrote:
>>
>> On 11/04, Oleg Nesterov wrote:
>> >
>> > But in any case, I strongly believe that it doesn't make any sense to
>> > rely on tu->inode in get_user_vaddr().
>>
>> Hmm. But I forgot about the case when you probe the function in libc
>> and want to dump the variable in libc...
>>
>> So probably I was wrong and this all needs more thinking. Damn.
>> Perhaps we really need to pass @file/offset, but it is not clear what
>> we can do with bss/anon-mapping.
>
> Or. Not that I really like this, but just for discussion...
>
> How about
>
> 	static void __user *get_user_vaddr(struct pt_regs *regs, unsigned long addr)
> 	{
> 		return (void __force __user *)addr + instruction_pointer(regs);
> 	}
>
> ?
>
> This should solve the problems with relocations/randomization/bss.

Right.  I think this approach is more reliable than playing with vma.

>
> The obvious disadvantage is that it is not easy to calculate the
> offset we need to pass as an argument, it depends on the probed
> function.

Well, maybe it's not that hard if we use symbol address in elf image
rather than file offset for the data.

IOW we can get the base mapping address by subtracting tu->offset from
instruction pointer.  And then adding symbol address of the data should
give us the final virtual address, yay!

I'll try to play with it after lunch.

>
> And this still doesn't allow to, say, probe the executable but read
> the data from libc. Unless, again, we attach to the running process
> or randomize_va_space = 0, so we can know it in advance. But otherwise
> I do not think there is any solution.

Yes, cross-fetching is hard, let's go lunch. :)

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-11-04 18:57             ` Oleg Nesterov
@ 2013-11-05  2:51               ` Namhyung Kim
  2013-11-05 16:41                 ` Oleg Nesterov
  0 siblings, 1 reply; 92+ messages in thread
From: Namhyung Kim @ 2013-11-05  2:51 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On Mon, 4 Nov 2013 19:57:54 +0100, Oleg Nesterov wrote:
> On 11/04, Oleg Nesterov wrote:
>>
>> On 11/04, Oleg Nesterov wrote:
>> >
>> > On 11/04, Oleg Nesterov wrote:
>> > >
>> > > But in any case, I strongly believe that it doesn't make any sense to
>> > > rely on tu->inode in get_user_vaddr().
>> >
>> > Hmm. But I forgot about the case when you probe the function in libc
>> > and want to dump the variable in libc...
>> >
>> > So probably I was wrong and this all needs more thinking. Damn.
>> > Perhaps we really need to pass @file/offset, but it is not clear what
>> > we can do with bss/anon-mapping.
>>
>> Or. Not that I really like this, but just for discussion...
>>
>> How about
>>
>> 	static void __user *get_user_vaddr(struct pt_regs *regs, unsigned long addr)
>> 	{
>> 		return (void __force __user *)addr + instruction_pointer(regs);
>> 	}
>>
>> ?
>>
>> This should solve the problems with relocations/randomization/bss.
>>
>> The obvious disadvantage is that it is not easy to calculate the
>> offset we need to pass as an argument, it depends on the probed
>> function.
>
> forgot to mention... and instruction_pointer() can't work in ret-probe,
> we need to pass the "unsigned long func" arg somehow...

Hmm.. what's the value of tu->offset in this case?  Does it have the
offset of the return address or the start of the function?

Anyway, what we really need is the base address of the text mapping
IMHO.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-11-04 18:47           ` Oleg Nesterov
  2013-11-04 18:57             ` Oleg Nesterov
  2013-11-05  2:49             ` Namhyung Kim
@ 2013-11-05  6:58             ` Namhyung Kim
  2013-11-05 17:45               ` Oleg Nesterov
  2 siblings, 1 reply; 92+ messages in thread
From: Namhyung Kim @ 2013-11-05  6:58 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

This is what I have for now:

static void __user *get_user_vaddr(struct pt_regs *regs, unsigned long addr,
				   struct trace_uprobe *tu)
{
	unsigned long base_addr;
	unsigned long vaddr;

	base_addr = instruction_pointer(regs) - tu->offset;
	vaddr = base_addr + addr;

	return (void __force __user *) vaddr;
}

When I tested it, it was able to fetch global and bss data from both of
executable and library properly.  But it still doesn't work for uretprobes
as you said before.

  # perf probe -x ./uprobe-test -a "t1=test1 bss=@0x203000:s32 global=@0x201250:s32 str=@0x201254:string"
  # perf probe -x ./uprobe-test -a "t2=test2 bss=@0x203000:s32 global=@0x201250:s32 str=@0x201254:string"
  # perf probe -x ./uprobe-test -a "t3=test3 bss=@0x203000:s32 global=@0x201250:s32 str=@0x201254:string"
  # perf probe -x ./libfoo.so -a "t4=foo1 bar=@0x201258:s32 baz=@0x203000:s32"
  # perf probe -x ./libfoo.so -a "t5=foo2 bar=@0x201258:s32 baz=@0x203000:s32"
  # perf probe -x ./libfoo.so -a "t6=foo3 bar=@0x201258:s32 baz=@0x203000:s32"
  # perf record -e probe_uprobe:* -e probe_libfoo:* -- ./uprobe-test

  # perf script | grep -v ^#
     uprobe-test  2997 [002] 13108.308952: probe_uprobe:t1: (400660) bss=0 global=1 str="hello uprobe"
     uprobe-test  2997 [002] 13108.322479: probe_uprobe:t2: (400666) bss=0 global=2 str="hello uprobe"
     uprobe-test  2997 [002] 13108.335552: probe_uprobe:t3: (40066c) bss=1 global=2 str="hello uprobe"
     uprobe-test  2997 [002] 13108.342182: probe_libfoo:t4: (7f5eb977b798) bar=7 baz=0
     uprobe-test  2997 [002] 13108.348982: probe_libfoo:t5: (7f5eb977b79e) bar=8 baz=0
     uprobe-test  2997 [002] 13108.356041: probe_libfoo:t6: (7f5eb977b7a4) bar=8 baz=9


As you can see symbol offset passed to the uprobes now look like 0x203000
since it's the difference to the base mapping address.  For a dso, it's same
as the symbol value, but for an executable the symbol value would be larger
value like 0x603000 since the text segment would be mapped to 0x400000.
But still the difference is same, and I believe this applies to the
randomization too.

This symbol offset calculation was done in the getsymoff which implemented
like below (I'm sure there's a much simpler way to do this, but ...).

And I revised my toy test program like this:


/* ----- 8< ----- test.c ----- 8< ----- */
#include <stdio.h>
#include <stdlib.h>

int global = 1;
char str[] = "hello uprobe";
int bss __attribute__((aligned(4096)));

/* this came from libfoo.so */
extern void foo(void);

void test1(void)
{
  /* only for adding probe */
}

void test2(void)
{
  /* only for adding probe */
}

void test3(void)
{
  /* only for adding probe */
}

int main(void)
{
  int local = 3;
  char buf[128];

  test1();
  global = 2;
  test2();
  bss = 1;
  test3();
  foo();
  //  snprintf(buf, sizeof(buf), "cat /proc/%d/maps", getpid());
  //  system(buf);
  return 0;
}


/* ----- 8< ----- foo.c ----- 8< ----- */
int bar = 7;
int baz __attribute__((aligned(4096)));

void foo1(void)
{
  /* only for adding probe */
}

void foo2(void)
{
  /* only for adding probe */
}

void foo3(void)
{
  /* only for adding probe */
}

void foo(void)
{
  foo1();
  bar = 8;
  foo2();
  baz = 9;
  foo3();
}


/* ----- 8< ----- Makefile ----- 8< ----- */
PERF=/home/namhyung/project/linux/tools/perf/perf
GETSYMOFF=./getsymoff

define make-args
$(eval ARG1 := $(shell echo "bss=@`${GETSYMOFF} uprobe-test bss`:s32"))
$(eval ARG2 := $(shell echo "global=@`${GETSYMOFF} uprobe-test global`:s32"))
$(eval ARG3 := $(shell echo "str=@`${GETSYMOFF} uprobe-test str`:string"))
$(eval ARG4 := $(shell echo "bar=@`${GETSYMOFF} libfoo.so bar`:s32"))
$(eval ARG5 := $(shell echo "baz=@`${GETSYMOFF} libfoo.so baz`:s32"))
endef

all: uprobe-test

uprobe-test: test.c foo.c
	gcc -shared -g -fpic -o libfoo.so foo.c
	gcc -g -o $@ test.c -Wl,-rpath,. -L. -lfoo

getsymoff: getsymoff.c
	gcc -g -o $@ getsymoff.c -lelf

test: uprobe-test getsymoff
	$(call make-args)
	${PERF} probe -x ./uprobe-test -a "t1=test1 ${ARG1} ${ARG2} ${ARG3}"
	${PERF} probe -x ./uprobe-test -a "t2=test2 ${ARG1} ${ARG2} ${ARG3}"
	${PERF} probe -x ./uprobe-test -a "t3=test3 ${ARG1} ${ARG2} ${ARG3}"
	${PERF} probe -x ./libfoo.so -a "t4=foo1 ${ARG4} ${ARG5}"
	${PERF} probe -x ./libfoo.so -a "t5=foo2 ${ARG4} ${ARG5}"
	${PERF} probe -x ./libfoo.so -a "t6=foo3 ${ARG4} ${ARG5}"
	${PERF} record -e probe_uprobe:* -e probe_libfoo:* -- ./uprobe-test
	${PERF} script | grep -v ^#
	${PERF} probe -d probe_uprobe:*
	${PERF} probe -d probe_libfoo:*

clean:
	rm -f uprobe-test libfoo.so getsymoff *.o *~
#	${PERF} probe -d probe_uprobe:* -d probe_libfoo:*


/* ----- 8< ----- getsymoff.c ----- 8< ----- */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <errno.h>
#include <gelf.h>

struct sym {
	unsigned long addr;
	unsigned long size;
	char *name;
};

#define SYMTAB_GROW  16

struct symtab {
	struct sym *sym;
	size_t nr_sym;
	size_t nr_alloc;
};

static struct symtab symtab;

static unsigned long base_addr;

static void usage(void)
{
	printf("Usage: %s <file> <symbol>\n", program_invocation_short_name);
	exit(0);
}

static int symsort(const void *a, const void *b)
{
	const struct sym *syma = a;
	const struct sym *symb = b;
	return strcmp(syma->name, symb->name);
}

static int symfind(const void *a, const void *b)
{
	const struct sym *sym = b;
	return strcmp(a, sym->name);
}

static int get_base_addr(Elf *elf)
{
	GElf_Ehdr ehdr;
	GElf_Phdr phdr;
	size_t i;

	if (gelf_getehdr(elf, &ehdr) == NULL)
		return -1;

	for (i = 0; i < ehdr.e_phnum; i++) {
		if (gelf_getphdr(elf, i, &phdr) == NULL)
			return -1;
		if (phdr.p_type != PT_LOAD)
			continue;

		/* use first loadable segment for the base address */
		base_addr = phdr.p_vaddr - phdr.p_offset;
		return 0;
	}
	return -1;
}

static int load_symtab(Elf *elf)
{
	int ret = -1;
	size_t shstr_idx;
	Elf_Scn *shstr_sec, *sym_sec, *str_sec;
	Elf_Data *shstr_data, *sym_data, *str_data;
	Elf_Scn *sec;
	Elf_Data *data;
	size_t i, nr_sym;

	if (elf_getshdrstrndx(elf, &shstr_idx) < 0)
		goto error;

	shstr_sec = elf_getscn(elf, shstr_idx);
	if (shstr_sec == NULL)
		goto error;

	shstr_data = elf_getdata(shstr_sec, NULL);
	if (shstr_data == NULL)
		goto error;

	sec = sym_sec = str_sec = NULL;
	while ((sec = elf_nextscn(elf, sec)) != NULL) {
		char *shstr;
		GElf_Shdr shdr;

		if (gelf_getshdr(sec, &shdr) == NULL)
			goto error;

		shstr = ((char *)shstr_data->d_buf) + shdr.sh_name;

		if (strcmp(shstr, ".symtab") == 0) {
			sym_sec = sec;
			nr_sym = shdr.sh_size / shdr.sh_entsize;
		}
		if (strcmp(shstr, ".strtab") == 0)
			str_sec = sec;
	}

	if (sym_sec == NULL || str_sec == NULL) {
		printf("%s: cannot find symbol information.  Is it stripped?\n",
		       __func__);
		goto out;
	}

	sym_data = elf_getdata(sym_sec, NULL);
	str_data = elf_getdata(str_sec, NULL);

	if (sym_data == NULL || str_data == NULL) {
		printf("%s: cannot find symbol information\n", __func__);
		goto error;
	}

	symtab.sym = NULL;
	symtab.nr_sym = 0;
	symtab.nr_alloc = 0;

	for (i = 0; i < nr_sym; i++) {
		GElf_Sym elf_sym;
		struct sym *sym;
		char *name;

		if (symtab.nr_sym >= symtab.nr_alloc) {
			symtab.nr_alloc += SYMTAB_GROW;
			symtab.sym = realloc(symtab.sym,
					     symtab.nr_alloc * sizeof(*sym));

			if (symtab.sym == NULL) {
				perror("load_symtab: realloc");
				goto out;
			}
		}
		if (gelf_getsym(sym_data, i, &elf_sym) == NULL)
			goto error;
		if (elf_sym.st_size == 0)
			continue;

		sym = &symtab.sym[symtab.nr_sym++];

		name = ((char *)str_data->d_buf) + elf_sym.st_name;
		sym->addr = elf_sym.st_value;
		sym->size = elf_sym.st_size;
		sym->name = strdup(name);
		if (sym->name == NULL) {
			perror("load_symtab: strdup");
			goto out;
		}	
	}

	qsort(symtab.sym, symtab.nr_sym, sizeof(*symtab.sym), symsort);
	ret = 0;

out:
	return ret;

error:
	printf("%s: %s\n", __func__, elf_errmsg(elf_errno()));
	goto out;
}

static struct sym * find_symtab(const char *name)
{
	return bsearch(name, symtab.sym, symtab.nr_sym,
		       sizeof(*symtab.sym), symfind);
}

static void unload_symtab(void)
{
	size_t i;

	for (i = 0; i < symtab.nr_sym; i++) {
		struct sym *sym = symtab.sym + i;
		free(sym->name);
	}

	free(symtab.sym);
}

int main(int argc, char *argv[])
{
	int fd;
	char *filename;
	char *symbol;
	Elf *elf;
	struct sym *sym;

	if (argc < 3)
		usage();

	filename = argv[1];
	symbol = argv[2];

	elf_version(EV_CURRENT);

	fd = open(filename, O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	elf = elf_begin(fd, ELF_C_READ_MMAP, NULL);
	if (elf == NULL) {
		printf("%s: %s\n", __func__, elf_errmsg(elf_errno()));
		goto out;
	}

	if (get_base_addr(elf) < 0)
		goto out_error;

	if (load_symtab(elf) < 0)
		goto out_error;

	sym = find_symtab(symbol);

	if (sym)
		printf("%#lx\n", sym->addr - base_addr);
	else
		printf("cannot find symbol: %s\n", symbol);

out_error:
	unload_symtab();
	elf_end(elf);
out:
	close(fd);
	return 0;
}

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-11-05  1:53       ` Namhyung Kim
@ 2013-11-05 16:28         ` Oleg Nesterov
  2013-11-06  8:31           ` Namhyung Kim
  0 siblings, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2013-11-05 16:28 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On 11/05, Namhyung Kim wrote:
>
> On Mon, 4 Nov 2013 16:01:12 +0100, Oleg Nesterov wrote:
> > Or the syntax should be "name=probe @file/addr" or something like this.
>
> Okay.  Let's call this kind of thing "cross-fetch" (or a better name can
> be suggested).

Yes ;) and I am afraid there was some confusion in our discussion.
I probably confused "probe other binaries" with cross-fetch and vice
versa sometimes.

> This is more complex situation and needs more discussion
> as you said.  So let's skip the discussion for now. :)

Agreed.

> > So far I think that trace_uprobes.c should not play games with vma. At all.
>
> Yes, playing with vma is fragile.  But otherwise how can we get the
> address from the file+offset in random processes?

Yes, this is not as simple as I thought.

Let me repeat, somehow I completely forgot we need to probe other (libc)
binaries (not only the executable itself) and dump the data from that
binary. That is why I wrongly thought that that ->start_data trick can
work.

OK, I see other emails from you. Perhaps we can rely on instruction_pointer(),
(I'll write more emails on this), But if not, then we probably need tu->inode
and vma games in fetch/get_user_vaddr(). I'd still like to avoid this, if
possible, but I am no longer sure.


> >> > Can't we simply implement get_user_vaddr() as
> >> >
> >> > 	static void __user *get_user_vaddr(unsigned long addr, struct trace_uprobe *tu)
> >> > 	{
> >> > 		void __user *vaddr = (void __force __user *)addr;
> >> >
> >> > 		/* A NULL tu means that we already got the vaddr */
> >> > 		if (tu)
> >> > 			vaddr += (current->mm->start_data & PAGE_MASK);
> >> >
> >> > 		return vaddr;
> >> > 	}
> >> >
> >> This can only work with the probes fetching data from the executable,
> >> right?  But as I said it should support any other binaries too.
> >
> > See above, we can't in general read other binaries.
>
> Okay, I need to clarify my words.  I'm not saying about "cross-fetch"
> here, what I wanted to say is adding a probe in some dso and fetch data
> from the dso.
>
> [...snip...]

Yes, sorry for confusion, see above.

Oleg.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-11-05  2:15           ` Namhyung Kim
@ 2013-11-05 16:33             ` Oleg Nesterov
  2013-11-06  8:34               ` Namhyung Kim
  0 siblings, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2013-11-05 16:33 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On 11/05, Namhyung Kim wrote:
>
> On Mon, 4 Nov 2013 17:22:29 +0100, Oleg Nesterov wrote:
> >
> > So probably I was wrong and this all needs more thinking. Damn.
>
> :)

Don't laugh at me ;)

> > Perhaps we really need to pass @file/offset, but it is not clear what
> > we can do with bss/anon-mapping.
>
> The @file/offset should work with bss since data in bss is accessed via
> an offset in the program,

Yes, yes, but it is still not clear to me how we can identify the "right"
vma which doesn't cover the .bss address but can be used to calculate the
offset.

Oleg.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-11-05  2:51               ` Namhyung Kim
@ 2013-11-05 16:41                 ` Oleg Nesterov
  2013-11-06  8:37                   ` Namhyung Kim
  0 siblings, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2013-11-05 16:41 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On 11/05, Namhyung Kim wrote:
>
> On Mon, 4 Nov 2013 19:57:54 +0100, Oleg Nesterov wrote:
> >>
> >> 	static void __user *get_user_vaddr(struct pt_regs *regs, unsigned long addr)
> >> 	{
> >> 		return (void __force __user *)addr + instruction_pointer(regs);
> >> 	}
> >>
> >> ?
> >>
> >> This should solve the problems with relocations/randomization/bss.
> >>
> >> The obvious disadvantage is that it is not easy to calculate the
> >> offset we need to pass as an argument, it depends on the probed
> >> function.
> >
> > forgot to mention... and instruction_pointer() can't work in ret-probe,
> > we need to pass the "unsigned long func" arg somehow...
>
> Hmm.. what's the value of tu->offset in this case?  Does it have the
> offset of the return address or the start of the function?

It is the offest of function. IOW, it is the same regardless of
is_ret_probe().

Ignoring probes_seq_show() we only need it for uprobe_register().

(yes, it is also used in uprobe_unregister/apply, but this is only
 because this API ugly and should be cleanuped).

Oleg.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-11-05  6:58             ` Namhyung Kim
@ 2013-11-05 17:45               ` Oleg Nesterov
  2013-11-05 19:24                 ` Oleg Nesterov
  2013-11-06  8:48                 ` [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6) Namhyung Kim
  0 siblings, 2 replies; 92+ messages in thread
From: Oleg Nesterov @ 2013-11-05 17:45 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On 11/05, Namhyung Kim wrote:
>
> This is what I have for now:
>
> static void __user *get_user_vaddr(struct pt_regs *regs, unsigned long addr,
> 				   struct trace_uprobe *tu)
> {
> 	unsigned long base_addr;
> 	unsigned long vaddr;
>
> 	base_addr = instruction_pointer(regs) - tu->offset;
> 	vaddr = base_addr + addr;
>
> 	return (void __force __user *) vaddr;
> }
>
> When I tested it, it was able to fetch global and bss data from both of
> executable and library properly.

Heh ;) I didn't expect you will agree with this suggestion. But if you
think it can work - great!

Let me clarify just in case. Yes, _personally_ I think we should try
to avoid the vma games, and it looks better to me this way. But I won't
argue if you change your mind, I understand this approach has its own
disadvantages.

As for "-= tu->offset"... Can't we avoid it? User-space needs to calculate
the "@" argument anyway, why it can't also substruct this offset?

Or perhaps we can change parse_probe_arg("@") to update "param" ? Yes,
in this case it needs another argument, not sure...

> But it still doesn't work for uretprobes
> as you said before.

This looks simple,

	+	if (is_ret_probe(tu)) {
	+		saved_ip = instruction_pointer(regs);
	+		instruction_pointer_set(func);
	+	}
		store_trace_args(...);
	+	if (is_ret_probe(tu))
	+		instruction_pointer_set(saved_ip);

although not pretty.

> This symbol offset calculation was done in the getsymoff which implemented
> like below (I'm sure there's a much simpler way to do this, but ...).

Perhaps I'll even try to read/understand it later, but this elf stuff is
the black magic to me ;)

Oleg.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-11-05 17:45               ` Oleg Nesterov
@ 2013-11-05 19:24                 ` Oleg Nesterov
  2013-11-06  8:57                   ` Namhyung Kim
  2013-11-06  8:48                 ` [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6) Namhyung Kim
  1 sibling, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2013-11-05 19:24 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On 11/05, Oleg Nesterov wrote:
>
> As for "-= tu->offset"... Can't we avoid it? User-space needs to calculate
> the "@" argument anyway, why it can't also substruct this offset?
>
> Or perhaps we can change parse_probe_arg("@") to update "param" ? Yes,
> in this case it needs another argument, not sure...

Or,

> 	+	if (is_ret_probe(tu)) {
> 	+		saved_ip = instruction_pointer(regs);
> 	+		instruction_pointer_set(func);
> 	+	}
> 		store_trace_args(...);
> 	+	if (is_ret_probe(tu))
> 	+		instruction_pointer_set(saved_ip);

we can put "-= tu->offset" here.

> although not pretty.

Yes.

Or. Perhaps we can leave "case '@'" in parse_probe_arg() and
FETCH_MTD_memory alone. You seem to agree that "absolute address"
can be useful anyway.

Instead, perhaps we can add FETCH_MTD_memory_do_fancy_addr_translation,
and, say, the new "case '*'" in parse_probe_arg() should add all the
neccessary info as f->data (like, say, FETCH_MTD_symbol).

But, just in case, I do not have a strong opinion. Just I think it
is better to discuss every choice we have.

Oleg.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-11-05 16:28         ` Oleg Nesterov
@ 2013-11-06  8:31           ` Namhyung Kim
  0 siblings, 0 replies; 92+ messages in thread
From: Namhyung Kim @ 2013-11-06  8:31 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

Hi Oleg,

On Tue, 5 Nov 2013 17:28:20 +0100, Oleg Nesterov wrote:
> On 11/05, Namhyung Kim wrote:
>>
>> On Mon, 4 Nov 2013 16:01:12 +0100, Oleg Nesterov wrote:
>> > Or the syntax should be "name=probe @file/addr" or something like this.
>>
>> Okay.  Let's call this kind of thing "cross-fetch" (or a better name can
>> be suggested).
>
> Yes ;) and I am afraid there was some confusion in our discussion.
> I probably confused "probe other binaries" with cross-fetch and vice
> versa sometimes.

Sorry for not being clear enough.

>
>> > So far I think that trace_uprobes.c should not play games with vma. At all.
>>
>> Yes, playing with vma is fragile.  But otherwise how can we get the
>> address from the file+offset in random processes?
>
> Yes, this is not as simple as I thought.
>
> Let me repeat, somehow I completely forgot we need to probe other (libc)
> binaries (not only the executable itself) and dump the data from that
> binary. That is why I wrongly thought that that ->start_data trick can
> work.
>
> OK, I see other emails from you. Perhaps we can rely on instruction_pointer(),
> (I'll write more emails on this), But if not, then we probably need tu->inode
> and vma games in fetch/get_user_vaddr(). I'd still like to avoid this, if
> possible, but I am no longer sure.

Yes, I think it'll be necessary for the cross-fetch anyway.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-11-05 16:33             ` Oleg Nesterov
@ 2013-11-06  8:34               ` Namhyung Kim
  0 siblings, 0 replies; 92+ messages in thread
From: Namhyung Kim @ 2013-11-06  8:34 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On Tue, 5 Nov 2013 17:33:41 +0100, Oleg Nesterov wrote:
> On 11/05, Namhyung Kim wrote:
>>
>> On Mon, 4 Nov 2013 17:22:29 +0100, Oleg Nesterov wrote:
>> >
>> > So probably I was wrong and this all needs more thinking. Damn.
>>
>> :)
>
> Don't laugh at me ;)

:)

(And now I'm feeling guilty..)

>
>> > Perhaps we really need to pass @file/offset, but it is not clear what
>> > we can do with bss/anon-mapping.
>>
>> The @file/offset should work with bss since data in bss is accessed via
>> an offset in the program,
>
> Yes, yes, but it is still not clear to me how we can identify the "right"
> vma which doesn't cover the .bss address but can be used to calculate the
> offset.

Yes, that's the open problem for the cross-fetch.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-11-05 16:41                 ` Oleg Nesterov
@ 2013-11-06  8:37                   ` Namhyung Kim
  0 siblings, 0 replies; 92+ messages in thread
From: Namhyung Kim @ 2013-11-06  8:37 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On Tue, 5 Nov 2013 17:41:02 +0100, Oleg Nesterov wrote:
> On 11/05, Namhyung Kim wrote:
>>
>> On Mon, 4 Nov 2013 19:57:54 +0100, Oleg Nesterov wrote:
>> >>
>> >> 	static void __user *get_user_vaddr(struct pt_regs *regs, unsigned long addr)
>> >> 	{
>> >> 		return (void __force __user *)addr + instruction_pointer(regs);
>> >> 	}
>> >>
>> >> ?
>> >>
>> >> This should solve the problems with relocations/randomization/bss.
>> >>
>> >> The obvious disadvantage is that it is not easy to calculate the
>> >> offset we need to pass as an argument, it depends on the probed
>> >> function.
>> >
>> > forgot to mention... and instruction_pointer() can't work in ret-probe,
>> > we need to pass the "unsigned long func" arg somehow...
>>
>> Hmm.. what's the value of tu->offset in this case?  Does it have the
>> offset of the return address or the start of the function?
>
> It is the offest of function. IOW, it is the same regardless of
> is_ret_probe().

Ah, okay.  Thanks for the info.  Then yes, it'd probably better to pass
the func rather than regs since it's the only info we need, right?

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-11-05 17:45               ` Oleg Nesterov
  2013-11-05 19:24                 ` Oleg Nesterov
@ 2013-11-06  8:48                 ` Namhyung Kim
  2013-11-06 16:28                   ` Oleg Nesterov
  1 sibling, 1 reply; 92+ messages in thread
From: Namhyung Kim @ 2013-11-06  8:48 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On Tue, 5 Nov 2013 18:45:35 +0100, Oleg Nesterov wrote:
> On 11/05, Namhyung Kim wrote:
>>
>> This is what I have for now:
>>
>> static void __user *get_user_vaddr(struct pt_regs *regs, unsigned long addr,
>> 				   struct trace_uprobe *tu)
>> {
>> 	unsigned long base_addr;
>> 	unsigned long vaddr;
>>
>> 	base_addr = instruction_pointer(regs) - tu->offset;
>> 	vaddr = base_addr + addr;
>>
>> 	return (void __force __user *) vaddr;
>> }
>>
>> When I tested it, it was able to fetch global and bss data from both of
>> executable and library properly.
>
> Heh ;) I didn't expect you will agree with this suggestion. But if you
> think it can work - great!

It seems to work for me well except the cross-fetch.

But I'm not sure it'll work for every cases.  It would be great if some
elf gurus come up and give some feedbacks.

Masami?

>
> Let me clarify just in case. Yes, _personally_ I think we should try
> to avoid the vma games, and it looks better to me this way. But I won't
> argue if you change your mind, I understand this approach has its own
> disadvantages.
>
> As for "-= tu->offset"... Can't we avoid it? User-space needs to calculate
> the "@" argument anyway, why it can't also substruct this offset?

Hmm.. it makes sense too. :)

>
> Or perhaps we can change parse_probe_arg("@") to update "param" ? Yes,
> in this case it needs another argument, not sure...
>
>> But it still doesn't work for uretprobes
>> as you said before.
>
> This looks simple,
>
> 	+	if (is_ret_probe(tu)) {
> 	+		saved_ip = instruction_pointer(regs);
> 	+		instruction_pointer_set(func);
> 	+	}
> 		store_trace_args(...);
> 	+	if (is_ret_probe(tu))
> 	+		instruction_pointer_set(saved_ip);
>
> although not pretty.

So for normal non-uretprobes, func == instruction_pointer(), right?

If so, just passing func as you suggested looks better than this.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-11-05 19:24                 ` Oleg Nesterov
@ 2013-11-06  8:57                   ` Namhyung Kim
  2013-11-06 17:37                     ` Oleg Nesterov
  0 siblings, 1 reply; 92+ messages in thread
From: Namhyung Kim @ 2013-11-06  8:57 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On Tue, 5 Nov 2013 20:24:01 +0100, Oleg Nesterov wrote:
> On 11/05, Oleg Nesterov wrote:
>>
>> As for "-= tu->offset"... Can't we avoid it? User-space needs to calculate
>> the "@" argument anyway, why it can't also substruct this offset?
>>
>> Or perhaps we can change parse_probe_arg("@") to update "param" ? Yes,
>> in this case it needs another argument, not sure...
>
> Or,
>
>> 	+	if (is_ret_probe(tu)) {
>> 	+		saved_ip = instruction_pointer(regs);
>> 	+		instruction_pointer_set(func);
>> 	+	}
>> 		store_trace_args(...);
>> 	+	if (is_ret_probe(tu))
>> 	+		instruction_pointer_set(saved_ip);
>
> we can put "-= tu->offset" here.

I don't think I get the point.

>
> Or. Perhaps we can leave "case '@'" in parse_probe_arg() and
> FETCH_MTD_memory alone. You seem to agree that "absolute address"
> can be useful anyway.

Yes, but it's only meaningful to process-wide tracing sessions IMHO.

>
> Instead, perhaps we can add FETCH_MTD_memory_do_fancy_addr_translation,
> and, say, the new "case '*'" in parse_probe_arg() should add all the
> neccessary info as f->data (like, say, FETCH_MTD_symbol).

Could you elaborate this more?

>
> But, just in case, I do not have a strong opinion. Just I think it
> is better to discuss every choice we have.

Okay.  I really appreciate your reviews.


Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-11-06  8:48                 ` [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6) Namhyung Kim
@ 2013-11-06 16:28                   ` Oleg Nesterov
  2013-11-07  7:33                     ` Namhyung Kim
  0 siblings, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2013-11-06 16:28 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On 11/06, Namhyung Kim wrote:
>
> On Tue, 5 Nov 2013 18:45:35 +0100, Oleg Nesterov wrote:
> > On 11/05, Namhyung Kim wrote:
> >>
> >> This is what I have for now:
> >>
> >> static void __user *get_user_vaddr(struct pt_regs *regs, unsigned long addr,
> >> 				   struct trace_uprobe *tu)
> >> {
> >> 	unsigned long base_addr;
> >> 	unsigned long vaddr;
> >>
> >> 	base_addr = instruction_pointer(regs) - tu->offset;
> >> 	vaddr = base_addr + addr;
> >>
> >> 	return (void __force __user *) vaddr;
> >> }
> >>
> >> When I tested it, it was able to fetch global and bss data from both of
> >> executable and library properly.
> >
> > Heh ;) I didn't expect you will agree with this suggestion. But if you
> > think it can work - great!
>
> It seems to work for me well except the cross-fetch.

Yes, but cross-fetching needs something different anyway, so I think we
should discuss this separately.

> But I'm not sure it'll work for every cases.

I think "ip - tu->offset + vaddr" trick should always work, just we need
to calculate this "vaddr" passed as an argument correctly.

Except: user-space can create another executable mapping and call the
probed function via another address, but I think we can ignore this.
And I think we can do nothing in this case, because in this case we
can't even rely on tu->inode.

But,

> It would be great if some
> elf gurus come up and give some feedbacks.
>
> Masami?

Yes.

> > As for "-= tu->offset"... Can't we avoid it? User-space needs to calculate
> > the "@" argument anyway, why it can't also substruct this offset?
>
> Hmm.. it makes sense too. :)

I am no longer sure ;)

This way the "@" argument will look more confusing, it will depend on the
address/offset of the probed insn. But again, I do not know, this is up
to you.

> >> But it still doesn't work for uretprobes
> >> as you said before.
> >
> > This looks simple,
> >
> > 	+	if (is_ret_probe(tu)) {
> > 	+		saved_ip = instruction_pointer(regs);
> > 	+		instruction_pointer_set(func);
> > 	+	}
> > 		store_trace_args(...);
> > 	+	if (is_ret_probe(tu))
> > 	+		instruction_pointer_set(saved_ip);
> >
> > although not pretty.
>
> So for normal non-uretprobes, func == instruction_pointer(), right?

No, for normal non-uretprobes func == 0 (actually, undefined).

> If so, just passing func as you suggested looks better than this.

Not sure I understand... OK, we can change uprobe_trace_func() and
uprobe_perf_func()

		if (!is_ret_probe(tu))
	-		uprobe_trace_print(tu, 0, regs);
	+		uprobe_trace_print(tu, instruction_pointer(regs), regs);
		return 0;

but why?

We need the "saved_ip" ugly hack above only if is_ret_probe() == T and
thus instruction_pointer() doesn't match the address of the probed function.
And there is no way to pass some additional info to call_fetch/etc from
uprobe_*_print().

See also another email...

Oleg.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-11-06  8:57                   ` Namhyung Kim
@ 2013-11-06 17:37                     ` Oleg Nesterov
  2013-11-06 18:24                       ` Oleg Nesterov
  2013-11-07  8:48                       ` Namhyung Kim
  0 siblings, 2 replies; 92+ messages in thread
From: Oleg Nesterov @ 2013-11-06 17:37 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On 11/06, Namhyung Kim wrote:
>
> On Tue, 5 Nov 2013 20:24:01 +0100, Oleg Nesterov wrote:
> > On 11/05, Oleg Nesterov wrote:
> >>
> >> As for "-= tu->offset"... Can't we avoid it? User-space needs to calculate
> >> the "@" argument anyway, why it can't also substruct this offset?
> >>
> >> Or perhaps we can change parse_probe_arg("@") to update "param" ? Yes,
> >> in this case it needs another argument, not sure...
> >
> > Or,
> >
> >> 	+	if (is_ret_probe(tu)) {
> >> 	+		saved_ip = instruction_pointer(regs);
> >> 	+		instruction_pointer_set(func);
> >> 	+	}
> >> 		store_trace_args(...);
> >> 	+	if (is_ret_probe(tu))
> >> 	+		instruction_pointer_set(saved_ip);
> >
> > we can put "-= tu->offset" here.
>
> I don't think I get the point.

I meant,

		saved_ip = instruction_pointer(regs);

		// pass the "ip" which was used to calculate
		// the @addr argument to fetch_*() methods

		temp_ip = is_ret_probe(tu) ? func : saved_ip;
		temp_ip -= tu->offset;
		instruction_pointer_set(temp_ip);

		store_trace_args(...);

		instruction_pointer_set(saved_ip);

This way we can avoid the new "void *" argument for fetch_func_t,
we do not need it to calculate the address.

But: we still need the additional "bool translate_vaddr" to solve
the problems with FETCH_MTD_deref.

We already discussed this a bit, previously I suggested the new
FETCH_MTD_memory_notranslate and

        -       dprm->fetch = t->fetch[FETCH_MTD_memory];
        +       dprm->fetch = t->fetch[FETCH_MTD_memory_notranslate];

change in parse_probe_arg().

However, now I think it would be more clean to leave FETCH_MTD_memory
alone and add FETCH_MTD_memory_dotranslate instead.

So trace_uprobes.c should define

	void FETCH_FUNC_NAME(memory, type)(addr, ...)
	{
		copy_from_user((void __user *)addr);
	}

	void FETCH_FUNC_NAME(memory_dotranslate, type)(addr, ...)
	{
		void __user *uaddr = get_user_vaddr(regs, addr);
		copy_from_user(uaddr);
	}

Then,

> > Or. Perhaps we can leave "case '@'" in parse_probe_arg() and
> > FETCH_MTD_memory alone. You seem to agree that "absolute address"
> > can be useful anyway.
>
> Yes, but it's only meaningful to process-wide tracing sessions IMHO.

Yes, yes, sure.

I meant, we need both. Say, "perf probe "func global=@addr" means
FETCH_MTD_memory, and "perf probe "func global=*addr" means
FETCH_MTD_memory_dotranslate.

Just in case, of course I do not care about the syntax, for example we
can use "@~addr" for translate (or not translate) or whatever.

My only point: I think we need both to

	1. avoid the new argument in fetch_func_t

	2. allow the dump the data from the absolute address

And just to simplify the discussion, lets assume we use "*addr" for
FETCH_MTD_memory_dotranslate and thus parse_probe_arg() gets the new

	case '*':
		if (is_kprobe)
			return -EINVAL;

		kstrtoul(arg + 1, 0, &param);
		f->fn = t->fetch[FETCH_MTD_memory_dotranslate];
		f->data = (void *)param;
		break;
		
branch.

> > Instead, perhaps we can add FETCH_MTD_memory_do_fancy_addr_translation,
> > and, say, the new "case '*'" in parse_probe_arg() should add all the
> > neccessary info as f->data (like, say, FETCH_MTD_symbol).
>
> Could you elaborate this more?

Yes, I was confusing sorry.

As for FETCH_MTD_memory_do_fancy_addr_translation, please see above.

As for "neccessary info as f->data". Suppose that we still have a reason
for the additional argument in FETCH_MTD_memory_dotranslate method. Even
in this case I don't think we should change the signature of fetch_func_t.

What I think we can do is something like

	1. Changed parse_probe_arg() to accept "struct trace_uprobe *tu"
	   instead of is_kprobe. Naturally, !tu can be used instead.

	2. Introduce

		struct dotranslate_fetch_param {
			struct trace_uprobe	*tu;
			fetch_func_t		fetch;
			fetch_func_t		fetch_size;
		};

	3. Change the "case '*'" above to do

		case '*':
			if (!tu)
				return -EINVAL;

			struct dotranslate_fetch_param *xxx = kmalloc(..);

			xxx->fetch = t->fetch[FETCH_MTD_memory];

			// ... kstrtoul, fetch_size, etc, ...

			f->fn = t->fetch[FETCH_MTD_memory_dotranslate];
			f->data = (void *)xxx;

	4. Update traceprobe_free_probe_arg/etc.

	5. Now,
	
		void FETCH_FUNC_NAME(memory_dotranslate, type)(addr, ...)
		{
			struct dotranslate_fetch_param *xxx = data;
			void __user *uaddr = get_user_vaddr(regs, addr, tu);

			xxx->fetch(regs, addr, dest);
		}

Yes, yes, I am sure I missed something and this is not that simple,
I am new to this "fetch" code.

And even if I am right, let me repeat that I am not going to argue.
Well, at least too much ;) This looks better in my opinion, but this
is always subjective, so please free to ignore.

Oleg.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-11-06 17:37                     ` Oleg Nesterov
@ 2013-11-06 18:24                       ` Oleg Nesterov
  2013-11-07  9:00                         ` Namhyung Kim
  2013-11-07  8:48                       ` Namhyung Kim
  1 sibling, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2013-11-06 18:24 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

Forgot to mention,

On 11/06, Oleg Nesterov wrote:
>
> I meant,
>
> 		saved_ip = instruction_pointer(regs);
>
> 		// pass the "ip" which was used to calculate
> 		// the @addr argument to fetch_*() methods
>
> 		temp_ip = is_ret_probe(tu) ? func : saved_ip;
> 		temp_ip -= tu->offset;
> 		instruction_pointer_set(temp_ip);
>
> 		store_trace_args(...);

Note that instruction_pointer_set() is not really nice in any case,
this can obviously confuse FETCH_MTD_reg("ip").

But lets ignore this. The solution is simple, we can pass/use this
info via current->utask. We can either add the new member, or add
a union. Or simply reuse xol_vaddr. Doesn't matter.

So the only question is should we rely on instruction_pointer/func
to translate the address or we should do something else (say, vma).

So far I like this approach.

Oleg.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-11-06 16:28                   ` Oleg Nesterov
@ 2013-11-07  7:33                     ` Namhyung Kim
  2013-11-08 16:52                       ` Oleg Nesterov
  0 siblings, 1 reply; 92+ messages in thread
From: Namhyung Kim @ 2013-11-07  7:33 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

Hi Oleg,

On Wed, 6 Nov 2013 17:28:06 +0100, Oleg Nesterov wrote:
> On 11/06, Namhyung Kim wrote:
>>
>> On Tue, 5 Nov 2013 18:45:35 +0100, Oleg Nesterov wrote:
>> > On 11/05, Namhyung Kim wrote:
>> >>
>> >> This is what I have for now:
>> >>
>> >> static void __user *get_user_vaddr(struct pt_regs *regs, unsigned long addr,
>> >> 				   struct trace_uprobe *tu)
>> >> {
>> >> 	unsigned long base_addr;
>> >> 	unsigned long vaddr;
>> >>
>> >> 	base_addr = instruction_pointer(regs) - tu->offset;
>> >> 	vaddr = base_addr + addr;
>> >>
>> >> 	return (void __force __user *) vaddr;
>> >> }
>> >>
>> >> When I tested it, it was able to fetch global and bss data from both of
>> >> executable and library properly.
>> >
>> > Heh ;) I didn't expect you will agree with this suggestion. But if you
>> > think it can work - great!
>>
>> It seems to work for me well except the cross-fetch.
>
> Yes, but cross-fetching needs something different anyway, so I think we
> should discuss this separately.

Okay.

>
>> But I'm not sure it'll work for every cases.
>
> I think "ip - tu->offset + vaddr" trick should always work, just we need
> to calculate this "vaddr" passed as an argument correctly.

Right.

>
> Except: user-space can create another executable mapping and call the
> probed function via another address, but I think we can ignore this.
> And I think we can do nothing in this case, because in this case we
> can't even rely on tu->inode.

Agreed.


>> > As for "-= tu->offset"... Can't we avoid it? User-space needs to calculate
>> > the "@" argument anyway, why it can't also substruct this offset?
>>
>> Hmm.. it makes sense too. :)
>
> I am no longer sure ;)
>
> This way the "@" argument will look more confusing, it will depend on the
> address/offset of the probed insn. But again, I do not know, this is up
> to you.

That said, I'd prefer the original "-= -tu->offset" approach.  It'll
make debugging easier IMHO.

>
>> >> But it still doesn't work for uretprobes
>> >> as you said before.
>> >
>> > This looks simple,
>> >
>> > 	+	if (is_ret_probe(tu)) {
>> > 	+		saved_ip = instruction_pointer(regs);
>> > 	+		instruction_pointer_set(func);
>> > 	+	}
>> > 		store_trace_args(...);
>> > 	+	if (is_ret_probe(tu))
>> > 	+		instruction_pointer_set(saved_ip);
>> >
>> > although not pretty.
>>
>> So for normal non-uretprobes, func == instruction_pointer(), right?
>
> No, for normal non-uretprobes func == 0 (actually, undefined).

Ah, okay.

>
>> If so, just passing func as you suggested looks better than this.
>
> Not sure I understand... OK, we can change uprobe_trace_func() and
> uprobe_perf_func()
>
> 		if (!is_ret_probe(tu))
> 	-		uprobe_trace_print(tu, 0, regs);
> 	+		uprobe_trace_print(tu, instruction_pointer(regs), regs);
> 		return 0;
>
> but why?
>
> We need the "saved_ip" ugly hack above only if is_ret_probe() == T and
> thus instruction_pointer() doesn't match the address of the probed function.
> And there is no way to pass some additional info to call_fetch/etc from
> uprobe_*_print().

Ah, I was confused that the 'func' is somewhat available in trace_uprobe
and it can make to avoid passing regs to get_user_vaddr().

Sorry for the noise.


> See also another email...

Will do.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-11-06 17:37                     ` Oleg Nesterov
  2013-11-06 18:24                       ` Oleg Nesterov
@ 2013-11-07  8:48                       ` Namhyung Kim
  2013-11-09  3:18                         ` Masami Hiramatsu
  1 sibling, 1 reply; 92+ messages in thread
From: Namhyung Kim @ 2013-11-07  8:48 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On Wed, 6 Nov 2013 18:37:54 +0100, Oleg Nesterov wrote:
> On 11/06, Namhyung Kim wrote:
>>
>> On Tue, 5 Nov 2013 20:24:01 +0100, Oleg Nesterov wrote:
>> > On 11/05, Oleg Nesterov wrote:
>> >>
>> >> As for "-= tu->offset"... Can't we avoid it? User-space needs to calculate
>> >> the "@" argument anyway, why it can't also substruct this offset?
>> >>
>> >> Or perhaps we can change parse_probe_arg("@") to update "param" ? Yes,
>> >> in this case it needs another argument, not sure...
>> >
>> > Or,
>> >
>> >> 	+	if (is_ret_probe(tu)) {
>> >> 	+		saved_ip = instruction_pointer(regs);
>> >> 	+		instruction_pointer_set(func);
>> >> 	+	}
>> >> 		store_trace_args(...);
>> >> 	+	if (is_ret_probe(tu))
>> >> 	+		instruction_pointer_set(saved_ip);
>> >
>> > we can put "-= tu->offset" here.
>>
>> I don't think I get the point.
>
> I meant,
>
> 		saved_ip = instruction_pointer(regs);
>
> 		// pass the "ip" which was used to calculate
> 		// the @addr argument to fetch_*() methods
>
> 		temp_ip = is_ret_probe(tu) ? func : saved_ip;
> 		temp_ip -= tu->offset;
> 		instruction_pointer_set(temp_ip);
>
> 		store_trace_args(...);
>
> 		instruction_pointer_set(saved_ip);
>
> This way we can avoid the new "void *" argument for fetch_func_t,
> we do not need it to calculate the address.

Okay, but as I said before, subtracting tu->offset part can be removed.

>
> But: we still need the additional "bool translate_vaddr" to solve
> the problems with FETCH_MTD_deref.
>
> We already discussed this a bit, previously I suggested the new
> FETCH_MTD_memory_notranslate and
>
>         -       dprm->fetch = t->fetch[FETCH_MTD_memory];
>         +       dprm->fetch = t->fetch[FETCH_MTD_memory_notranslate];
>
> change in parse_probe_arg().

Okay, I agree with you that adding one more fetch method will make
things simpler.

>
> However, now I think it would be more clean to leave FETCH_MTD_memory
> alone and add FETCH_MTD_memory_dotranslate instead.
>
> So trace_uprobes.c should define
>
> 	void FETCH_FUNC_NAME(memory, type)(addr, ...)
> 	{
> 		copy_from_user((void __user *)addr);
> 	}
>
> 	void FETCH_FUNC_NAME(memory_dotranslate, type)(addr, ...)
> 	{
> 		void __user *uaddr = get_user_vaddr(regs, addr);
> 		copy_from_user(uaddr);
> 	}

Looks good.

>
> Then,
>
>> > Or. Perhaps we can leave "case '@'" in parse_probe_arg() and
>> > FETCH_MTD_memory alone. You seem to agree that "absolute address"
>> > can be useful anyway.
>>
>> Yes, but it's only meaningful to process-wide tracing sessions IMHO.
>
> Yes, yes, sure.
>
> I meant, we need both. Say, "perf probe "func global=@addr" means
> FETCH_MTD_memory, and "perf probe "func global=*addr" means
> FETCH_MTD_memory_dotranslate.
>
> Just in case, of course I do not care about the syntax, for example we
> can use "@~addr" for translate (or not translate) or whatever.

Yeah, and I want to hear from Masami.

>
> My only point: I think we need both to
>
> 	1. avoid the new argument in fetch_func_t
>
> 	2. allow the dump the data from the absolute address

I got it.

>
> And just to simplify the discussion, lets assume we use "*addr" for
> FETCH_MTD_memory_dotranslate and thus parse_probe_arg() gets the new
>
> 	case '*':
> 		if (is_kprobe)
> 			return -EINVAL;
>
> 		kstrtoul(arg + 1, 0, &param);
> 		f->fn = t->fetch[FETCH_MTD_memory_dotranslate];
> 		f->data = (void *)param;
> 		break;
> 		
> branch.

Looks good.

>
>> > Instead, perhaps we can add FETCH_MTD_memory_do_fancy_addr_translation,
>> > and, say, the new "case '*'" in parse_probe_arg() should add all the
>> > neccessary info as f->data (like, say, FETCH_MTD_symbol).
>>
>> Could you elaborate this more?
>
> Yes, I was confusing sorry.
>
> As for FETCH_MTD_memory_do_fancy_addr_translation, please see above.

Okay.

>
> As for "neccessary info as f->data". Suppose that we still have a reason
> for the additional argument in FETCH_MTD_memory_dotranslate method. Even
> in this case I don't think we should change the signature of fetch_func_t.
>
> What I think we can do is something like
>
> 	1. Changed parse_probe_arg() to accept "struct trace_uprobe *tu"
> 	   instead of is_kprobe. Naturally, !tu can be used instead.
>
> 	2. Introduce
>
> 		struct dotranslate_fetch_param {
> 			struct trace_uprobe	*tu;
> 			fetch_func_t		fetch;
> 			fetch_func_t		fetch_size;
> 		};
>
> 	3. Change the "case '*'" above to do
>
> 		case '*':
> 			if (!tu)
> 				return -EINVAL;
>
> 			struct dotranslate_fetch_param *xxx = kmalloc(..);
>
> 			xxx->fetch = t->fetch[FETCH_MTD_memory];
>
> 			// ... kstrtoul, fetch_size, etc, ...
>
> 			f->fn = t->fetch[FETCH_MTD_memory_dotranslate];
> 			f->data = (void *)xxx;
>
> 	4. Update traceprobe_free_probe_arg/etc.
>
> 	5. Now,
> 	
> 		void FETCH_FUNC_NAME(memory_dotranslate, type)(addr, ...)
> 		{
> 			struct dotranslate_fetch_param *xxx = data;
> 			void __user *uaddr = get_user_vaddr(regs, addr, tu);
>
> 			xxx->fetch(regs, addr, dest);
> 		}
>
> Yes, yes, I am sure I missed something and this is not that simple,
> I am new to this "fetch" code.
>
> And even if I am right, let me repeat that I am not going to argue.
> Well, at least too much ;) This looks better in my opinion, but this
> is always subjective, so please free to ignore.

Thank you very much for providing good review, suggestion and pseudo
code. :)  I indeed like this approach too.

I'll change the code this way in next version.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-11-06 18:24                       ` Oleg Nesterov
@ 2013-11-07  9:00                         ` Namhyung Kim
  2013-11-08 17:00                           ` Oleg Nesterov
  0 siblings, 1 reply; 92+ messages in thread
From: Namhyung Kim @ 2013-11-07  9:00 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On Wed, 6 Nov 2013 19:24:08 +0100, Oleg Nesterov wrote:
> Forgot to mention,
>
> On 11/06, Oleg Nesterov wrote:
>>
>> I meant,
>>
>> 		saved_ip = instruction_pointer(regs);
>>
>> 		// pass the "ip" which was used to calculate
>> 		// the @addr argument to fetch_*() methods
>>
>> 		temp_ip = is_ret_probe(tu) ? func : saved_ip;
>> 		temp_ip -= tu->offset;
>> 		instruction_pointer_set(temp_ip);
>>
>> 		store_trace_args(...);
>
> Note that instruction_pointer_set() is not really nice in any case,
> this can obviously confuse FETCH_MTD_reg("ip").

Yes.

>
> But lets ignore this. The solution is simple, we can pass/use this
> info via current->utask. We can either add the new member, or add
> a union. Or simply reuse xol_vaddr. Doesn't matter.

Okay, I'll take a look.

>
> So the only question is should we rely on instruction_pointer/func
> to translate the address or we should do something else (say, vma).
>
> So far I like this approach.

Me too. :)

So, I'll wait for more feedbacks from others and then send a new
version.  Actually I don't have much to do this these days - probably
until next week (there'll be the KLF).  Please forgive my brevity in
replies.

I'll be back with a new patchset then. :)

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-11-07  7:33                     ` Namhyung Kim
@ 2013-11-08 16:52                       ` Oleg Nesterov
  0 siblings, 0 replies; 92+ messages in thread
From: Oleg Nesterov @ 2013-11-08 16:52 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

Hi Namhyung,

sorry for delay.

On 11/07, Namhyung Kim wrote:
>
> >> > As for "-= tu->offset"... Can't we avoid it? User-space needs to calculate
> >> > the "@" argument anyway, why it can't also substruct this offset?
> >>
> >> Hmm.. it makes sense too. :)
> >
> > I am no longer sure ;)
> >
> > This way the "@" argument will look more confusing, it will depend on the
> > address/offset of the probed insn. But again, I do not know, this is up
> > to you.
>
> That said, I'd prefer the original "-= -tu->offset" approach.  It'll
> make debugging easier IMHO.

I do not really mind, and probably you are right. Actually it seems
that I was confused, if user-space does "-= -tu->offset" itself then
the "@" argument will look more consistent (contrary to what I said
above).

In any case we should make the calculation of "@" argument (in user
space) as simple/clear as possible, it is very easy to add the
additional hacks in kernel if necessary.

And this is very (if not most) important part, we can change the kernel
later, but it is not easy to change the already working semantics, so I'd
like to know what other reviewers think.

Oleg.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-11-07  9:00                         ` Namhyung Kim
@ 2013-11-08 17:00                           ` Oleg Nesterov
  2013-11-12  7:49                             ` Namhyung Kim
  0 siblings, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2013-11-08 17:00 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On 11/07, Namhyung Kim wrote:
>
> On Wed, 6 Nov 2013 19:24:08 +0100, Oleg Nesterov wrote:
> >
> > Note that instruction_pointer_set() is not really nice in any case,
> > this can obviously confuse FETCH_MTD_reg("ip").
>
> Yes.
>
> >
> > But lets ignore this. The solution is simple, we can pass/use this
> > info via current->utask. We can either add the new member, or add
> > a union. Or simply reuse xol_vaddr. Doesn't matter.
>
> Okay, I'll take a look.

I guess we need the union in uprobe_task anyway... I'll send the patch
soon.

Until we have the new members in uprobe_task, you can reuse utask->vaddr,
this is safe.

IOW, you can use current->utask->vaddr instead of regs->ip (as we did
in the previous discussin) to pass the necessary info to ->fetch().

Oleg.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-11-07  8:48                       ` Namhyung Kim
@ 2013-11-09  3:18                         ` Masami Hiramatsu
  2013-11-09 15:23                           ` Oleg Nesterov
  0 siblings, 1 reply; 92+ messages in thread
From: Masami Hiramatsu @ 2013-11-09  3:18 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Oleg Nesterov, Steven Rostedt, Namhyung Kim, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

(2013/11/07 17:48), Namhyung Kim wrote:
> On Wed, 6 Nov 2013 18:37:54 +0100, Oleg Nesterov wrote:
>> On 11/06, Namhyung Kim wrote:
>>>
>>> On Tue, 5 Nov 2013 20:24:01 +0100, Oleg Nesterov wrote:
>>>> On 11/05, Oleg Nesterov wrote:
>>>>>
>>>>> As for "-= tu->offset"... Can't we avoid it? User-space needs to calculate
>>>>> the "@" argument anyway, why it can't also substruct this offset?
>>>>>
>>>>> Or perhaps we can change parse_probe_arg("@") to update "param" ? Yes,
>>>>> in this case it needs another argument, not sure...
>>>>
>>>> Or,
>>>>
>>>>> 	+	if (is_ret_probe(tu)) {
>>>>> 	+		saved_ip = instruction_pointer(regs);
>>>>> 	+		instruction_pointer_set(func);
>>>>> 	+	}
>>>>> 		store_trace_args(...);
>>>>> 	+	if (is_ret_probe(tu))
>>>>> 	+		instruction_pointer_set(saved_ip);
>>>>
>>>> we can put "-= tu->offset" here.
>>>
>>> I don't think I get the point.
>>
>> I meant,
>>
>> 		saved_ip = instruction_pointer(regs);
>>
>> 		// pass the "ip" which was used to calculate
>> 		// the @addr argument to fetch_*() methods
>>
>> 		temp_ip = is_ret_probe(tu) ? func : saved_ip;
>> 		temp_ip -= tu->offset;
>> 		instruction_pointer_set(temp_ip);
>>
>> 		store_trace_args(...);
>>
>> 		instruction_pointer_set(saved_ip);
>>
>> This way we can avoid the new "void *" argument for fetch_func_t,
>> we do not need it to calculate the address.
> 
> Okay, but as I said before, subtracting tu->offset part can be removed.

Ah, that's good to me too :)


>>
>> However, now I think it would be more clean to leave FETCH_MTD_memory
>> alone and add FETCH_MTD_memory_dotranslate instead.
>>
>> So trace_uprobes.c should define
>>
>> 	void FETCH_FUNC_NAME(memory, type)(addr, ...)
>> 	{
>> 		copy_from_user((void __user *)addr);
>> 	}
>>
>> 	void FETCH_FUNC_NAME(memory_dotranslate, type)(addr, ...)
>> 	{
>> 		void __user *uaddr = get_user_vaddr(regs, addr);
>> 		copy_from_user(uaddr);
>> 	}
> 
> Looks good.
> 
>>
>> Then,
>>
>>>> Or. Perhaps we can leave "case '@'" in parse_probe_arg() and
>>>> FETCH_MTD_memory alone. You seem to agree that "absolute address"
>>>> can be useful anyway.
>>>
>>> Yes, but it's only meaningful to process-wide tracing sessions IMHO.
>>
>> Yes, yes, sure.
>>
>> I meant, we need both. Say, "perf probe "func global=@addr" means
>> FETCH_MTD_memory, and "perf probe "func global=*addr" means
>> FETCH_MTD_memory_dotranslate.
>>
>> Just in case, of course I do not care about the syntax, for example we
>> can use "@~addr" for translate (or not translate) or whatever.
> 
> Yeah, and I want to hear from Masami.

Hm, this part I need to clarify. So you mean the @addr is for referring
the absolute address in a user process, but @~addr is for referring
the relative address of a executable or library, correct?
In that case, I suggest you to use "@+addr" for the relative address,
since that is an offset, isn't that? :)

BTW, it seems that @addr syntax is hard to use for uprobes, because
current uprobes is based on a binary, not a process, we cannot specify
which process is probed when we define it.


>> My only point: I think we need both to
>>
>> 	1. avoid the new argument in fetch_func_t
>>
>> 	2. allow the dump the data from the absolute address

Looks good to me :)

Thank you!

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-11-09  3:18                         ` Masami Hiramatsu
@ 2013-11-09 15:23                           ` Oleg Nesterov
  2013-11-12  8:00                             ` Namhyung Kim
  0 siblings, 1 reply; 92+ messages in thread
From: Oleg Nesterov @ 2013-11-09 15:23 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Namhyung Kim, Steven Rostedt, Namhyung Kim, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On 11/09, Masami Hiramatsu wrote:
>
> In that case, I suggest you to use "@+addr" for the relative address,
> since that is an offset, isn't that? :)

Agreed, @+addr looks better!

> BTW, it seems that @addr syntax is hard to use for uprobes, because
> current uprobes is based on a binary, not a process, we cannot specify
> which process is probed when we define it.

Yes, exactly. That is why we suggest that user-space should pass the
ip-relative address (actually offset). This should hopefully solve all
problems with relocations.

Oleg.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-11-08 17:00                           ` Oleg Nesterov
@ 2013-11-12  7:49                             ` Namhyung Kim
  0 siblings, 0 replies; 92+ messages in thread
From: Namhyung Kim @ 2013-11-12  7:49 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Namhyung Kim, Masami Hiramatsu, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

Hi Oleg,

On Fri, 8 Nov 2013 18:00:17 +0100, Oleg Nesterov wrote:
> On 11/07, Namhyung Kim wrote:
>>
>> On Wed, 6 Nov 2013 19:24:08 +0100, Oleg Nesterov wrote:
>> >
>> > Note that instruction_pointer_set() is not really nice in any case,
>> > this can obviously confuse FETCH_MTD_reg("ip").
>>
>> Yes.
>>
>> >
>> > But lets ignore this. The solution is simple, we can pass/use this
>> > info via current->utask. We can either add the new member, or add
>> > a union. Or simply reuse xol_vaddr. Doesn't matter.
>>
>> Okay, I'll take a look.
>
> I guess we need the union in uprobe_task anyway... I'll send the patch
> soon.
>
> Until we have the new members in uprobe_task, you can reuse utask->vaddr,
> this is safe.
>
> IOW, you can use current->utask->vaddr instead of regs->ip (as we did
> in the previous discussin) to pass the necessary info to ->fetch().

Thanks for the info!
Namhyung

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-11-09 15:23                           ` Oleg Nesterov
@ 2013-11-12  8:00                             ` Namhyung Kim
  2013-11-12 18:44                               ` Oleg Nesterov
  2013-11-25  6:59                               ` Namhyung Kim
  0 siblings, 2 replies; 92+ messages in thread
From: Namhyung Kim @ 2013-11-12  8:00 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Masami Hiramatsu, Steven Rostedt, Namhyung Kim, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

Hi Oleg and Masami,

On Sat, 9 Nov 2013 16:23:13 +0100, Oleg Nesterov wrote:
> On 11/09, Masami Hiramatsu wrote:
>>
>> In that case, I suggest you to use "@+addr" for the relative address,
>> since that is an offset, isn't that? :)
>
> Agreed, @+addr looks better!

Looks good to me too.

>
>> BTW, it seems that @addr syntax is hard to use for uprobes, because
>> current uprobes is based on a binary, not a process, we cannot specify
>> which process is probed when we define it.
>
> Yes, exactly. That is why we suggest that user-space should pass the
> ip-relative address (actually offset). This should hopefully solve all
> problems with relocations.

Let me clarify what I understand.

For @addr syntax: kernel does no translation and uses given address

For @+addr syntax: user-space uses relative symbol address from a loaded
                   base address and kernel calculates the base address
                   using "current->utask->vaddr - tu->offset".

Is that right?

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-11-12  8:00                             ` Namhyung Kim
@ 2013-11-12 18:44                               ` Oleg Nesterov
  2013-11-25  6:59                               ` Namhyung Kim
  1 sibling, 0 replies; 92+ messages in thread
From: Oleg Nesterov @ 2013-11-12 18:44 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Masami Hiramatsu, Steven Rostedt, Namhyung Kim, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

Hi Namhyung,

On 11/12, Namhyung Kim wrote:
>
> Let me clarify what I understand.
>
> For @addr syntax: kernel does no translation and uses given address

Yes,

> For @+addr syntax: user-space uses relative symbol address from a loaded
>                    base address and kernel calculates the base address
>                    using "current->utask->vaddr - tu->offset".

Looks right to me.

IOW, when user-space calculates the "addr" for @+ argument, it should
assume that this binary will be mmaped at "NULL" (so that the virtual
address of the probed function is always equal to its offset in the
probed binary).

Masami, Steven, et al, do you agree?

Oleg.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6)
  2013-11-12  8:00                             ` Namhyung Kim
  2013-11-12 18:44                               ` Oleg Nesterov
@ 2013-11-25  6:59                               ` Namhyung Kim
  2013-11-25 14:12                                 ` [PATCH] uprobes: Allocate ->utask before handler_chain() for tracing handlers Oleg Nesterov
  1 sibling, 1 reply; 92+ messages in thread
From: Namhyung Kim @ 2013-11-25  6:59 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Masami Hiramatsu, Steven Rostedt, Namhyung Kim, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

Hi Oleg,

On Tue, 12 Nov 2013 17:00:01 +0900, Namhyung Kim wrote:
> For @+addr syntax: user-space uses relative symbol address from a loaded
>                    base address and kernel calculates the base address
>                    using "current->utask->vaddr - tu->offset".

I tried this approach and realized that current->utask is not set or has
an invalid vaddr when handler_chain() is called.  So I had to apply
following patch and it seems to work well for me.  Could you confirm it?

Thanks,
Namhyung


diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index ad8e1bdca70e..e63748d3520e 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1456,7 +1456,7 @@ static void prepare_uretprobe(struct uprobe *uprobe, struct pt_regs *regs)
 
 /* Prepare to single-step probed instruction out of line. */
 static int
-pre_ssout(struct uprobe *uprobe, struct pt_regs *regs, unsigned long bp_vaddr)
+pre_ssout(struct uprobe *uprobe, struct pt_regs *regs)
 {
 	struct uprobe_task *utask;
 	unsigned long xol_vaddr;
@@ -1471,7 +1471,6 @@ pre_ssout(struct uprobe *uprobe, struct pt_regs *regs, unsigned long bp_vaddr)
 		return -ENOMEM;
 
 	utask->xol_vaddr = xol_vaddr;
-	utask->vaddr = bp_vaddr;
 
 	err = arch_uprobe_pre_xol(&uprobe->arch, regs);
 	if (unlikely(err)) {
@@ -1701,6 +1700,7 @@ static bool handle_trampoline(struct pt_regs *regs)
 static void handle_swbp(struct pt_regs *regs)
 {
 	struct uprobe *uprobe;
+	struct uprobe_task *utask;
 	unsigned long bp_vaddr;
 	int uninitialized_var(is_swbp);
 
@@ -1744,11 +1744,17 @@ static void handle_swbp(struct pt_regs *regs)
 	if (unlikely(!test_bit(UPROBE_COPY_INSN, &uprobe->flags)))
 		goto out;
 
+	utask = get_utask();
+	if (!utask)
+		goto out;
+
+	utask->vaddr = bp_vaddr;
+
 	handler_chain(uprobe, regs);
 	if (can_skip_sstep(uprobe, regs))
 		goto out;
 
-	if (!pre_ssout(uprobe, regs, bp_vaddr))
+	if (!pre_ssout(uprobe, regs))
 		return;
 
 	/* can_skip_sstep() succeeded, or restart if can't singlestep */

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH] uprobes: Allocate ->utask before handler_chain() for tracing handlers
  2013-11-25  6:59                               ` Namhyung Kim
@ 2013-11-25 14:12                                 ` Oleg Nesterov
  0 siblings, 0 replies; 92+ messages in thread
From: Oleg Nesterov @ 2013-11-25 14:12 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Masami Hiramatsu, Steven Rostedt, Namhyung Kim, Hyeoncheol Lee,
	Hemant Kumar, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

Hi Namhyung,

On 11/25, Namhyung Kim wrote:
>
> Hi Oleg,
>
> On Tue, 12 Nov 2013 17:00:01 +0900, Namhyung Kim wrote:
> > For @+addr syntax: user-space uses relative symbol address from a loaded
> >                    base address and kernel calculates the base address
> >                    using "current->utask->vaddr - tu->offset".
>
> I tried this approach and realized that current->utask is not set

Aaah, I am stipid. Yes, I forgot that it is always NULL until the
task does xol or prepare_uretprobe() for the 1st time.... And on
powerpc it can be always NULL because it can likely emulate the
probed insn.

> or has
> an invalid vaddr when handler_chain() is called.

But this should not matter at all? you should not rely on the value of
->vaddr, you should use at as uninitialized scratchpad. And in fact we
were going to add another member into the union which should be used
instead later (but lets ignore this for now).

> So I had to apply
> following patch and it seems to work well for me.  Could you confirm it?

I don't think we need this patch, see below.

> @@ -1744,11 +1744,17 @@ static void handle_swbp(struct pt_regs *regs)
>  	if (unlikely(!test_bit(UPROBE_COPY_INSN, &uprobe->flags)))
>  		goto out;
>
> +	utask = get_utask();

Yes, we need this until we find another way to pass the additional info
to ->fetch() methods. This is a bit unfortunate, may be we fill find a
better solution later.

But until then we can probably tolerate the hack below, what do you
think?

Oleg.
---

Subject: [PATCH] uprobes: Allocate ->utask before handler_chain() for tracing handlers

uprobe_trace_print() and uprobe_perf_print() need to pass the additional
info to call_fetch() methods, currently there is no simple way to do this.

current->utask looks like a natural place to hold this info, but we need
to allocate it before handler_chain().

This is a bit unfortunate, perhaps we will find a better solution later,
but this is simple and should work right now.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/events/uprobes.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index b886a5e..307d87c 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1854,6 +1854,10 @@ static void handle_swbp(struct pt_regs *regs)
 	if (unlikely(!test_bit(UPROBE_COPY_INSN, &uprobe->flags)))
 		goto out;
 
+	/* Tracing handlers use ->utask to communicate with fetch methods */
+	if (!get_utask())
+		goto out;
+
 	handler_chain(uprobe, regs);
 	if (can_skip_sstep(uprobe, regs))
 		goto out;
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 92+ messages in thread

* Re: [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer
  2013-09-03 11:10   ` zhangwei(Jovi)
@ 2013-09-04  7:10     ` Namhyung Kim
  0 siblings, 0 replies; 92+ messages in thread
From: Namhyung Kim @ 2013-09-04  7:10 UTC (permalink / raw)
  To: zhangwei(Jovi)
  Cc: Steven Rostedt, Namhyung Kim, Hyeoncheol Lee, Masami Hiramatsu,
	LKML, Srikar Dronamraju, Oleg Nesterov, Arnaldo Carvalho de Melo

Hi Jovi,

[SNIP]

On Tue, 3 Sep 2013 19:10:04 +0800, zhangwei wrote:
>> +	if (atomic_inc_return(&uprobe_buffer_ref) == 1) {
>> +		int cpu;
>> +
>> +		uprobe_cpu_buffer = __alloc_percpu(PAGE_SIZE, PAGE_SIZE);
>> +		if (uprobe_cpu_buffer == NULL)
>> +			return -ENOMEM;
>> +
>
> Do we need add atomic_dec if allocate percpu buffer failed?

Good catch!  I'll fix it. :)

Thanks,
Namhyung


>
>> +		for_each_possible_cpu(cpu)
>> +			mutex_init(&per_cpu(uprobe_cpu_mutex, cpu));
>> +	}
>> +
>>  	WARN_ON(!uprobe_filter_is_empty(&tu->filter));

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer
  2013-09-03 10:50   ` Masami Hiramatsu
@ 2013-09-04  7:08     ` Namhyung Kim
  0 siblings, 0 replies; 92+ messages in thread
From: Namhyung Kim @ 2013-09-04  7:08 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Namhyung Kim, Hyeoncheol Lee, LKML,
	Srikar Dronamraju, Oleg Nesterov, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On Tue, 03 Sep 2013 19:50:28 +0900, Masami Hiramatsu wrote:
> (2013/09/03 14:44), Namhyung Kim wrote:
>> From: Namhyung Kim <namhyung.kim@lge.com>
>> 
>> Fetching from user space should be done in a non-atomic context.  So
>> use a per-cpu buffer and copy its content to the ring buffer
>> atomically.  Note that we can migrate during accessing user memory
>> thus use a per-cpu mutex to protect concurrent accesses.
>> 
>> This is needed since we'll be able to fetch args from an user memory
>> which can be swapped out.  Before that uprobes could fetch args from
>> registers only which saved in a kernel space.
>> 
>> While at it, use __get_data_size() and store_trace_args() to reduce
>> code duplication.

[SNIP]

>> +	size = esize + tu->p.size + dsize;
>>  	event = trace_current_buffer_lock_reserve(&buffer, call->event.type,
>> -						  size + tu->p.size, 0, 0);
>> -	if (!event)
>> +						  size, 0, 0);
>> +	if (!event) {
>> +		mutex_unlock(mutex);
>>  		return;
>
> Just for a maintenance reason, I personally like to use "goto" in this case
> to fold up the mutex_unlock. :)
>
> Other parts are good for me.
>
> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

Thank you for review!

I'll change it as you said and fix atomic_dec bug that Jovi found.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer
  2013-09-03  5:44 ` [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer Namhyung Kim
  2013-09-03 10:50   ` Masami Hiramatsu
@ 2013-09-03 11:10   ` zhangwei(Jovi)
  2013-09-04  7:10     ` Namhyung Kim
  1 sibling, 1 reply; 92+ messages in thread
From: zhangwei(Jovi) @ 2013-09-03 11:10 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Steven Rostedt, Namhyung Kim, Hyeoncheol Lee, Masami Hiramatsu,
	LKML, Srikar Dronamraju, Oleg Nesterov, Arnaldo Carvalho de Melo

On 2013/9/3 13:44, Namhyung Kim wrote:
> From: Namhyung Kim <namhyung.kim@lge.com>
> 
> Fetching from user space should be done in a non-atomic context.  So
> use a per-cpu buffer and copy its content to the ring buffer
> atomically.  Note that we can migrate during accessing user memory
> thus use a per-cpu mutex to protect concurrent accesses.
> 
> This is needed since we'll be able to fetch args from an user memory
> which can be swapped out.  Before that uprobes could fetch args from
> registers only which saved in a kernel space.
> 
> While at it, use __get_data_size() and store_trace_args() to reduce
> code duplication.
> 
> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  kernel/trace/trace_uprobe.c | 97 +++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 81 insertions(+), 16 deletions(-)
> 
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index 9f2d12d2311d..9ede401759ab 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -530,21 +530,46 @@ static const struct file_operations uprobe_profile_ops = {
>  	.release	= seq_release,
>  };
>  
> +static atomic_t uprobe_buffer_ref = ATOMIC_INIT(0);
> +static void __percpu *uprobe_cpu_buffer;
> +static DEFINE_PER_CPU(struct mutex, uprobe_cpu_mutex);
> +
>  static void uprobe_trace_print(struct trace_uprobe *tu,
>  				unsigned long func, struct pt_regs *regs)
>  {
>  	struct uprobe_trace_entry_head *entry;
>  	struct ring_buffer_event *event;
>  	struct ring_buffer *buffer;
> -	void *data;
> -	int size, i;
> +	struct mutex *mutex;
> +	void *data, *arg_buf;
> +	int size, dsize, esize;
> +	int cpu;
>  	struct ftrace_event_call *call = &tu->p.call;
>  
> -	size = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
> +	dsize = __get_data_size(&tu->p, regs);
> +	esize = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
> +
> +	if (WARN_ON_ONCE(!uprobe_cpu_buffer || tu->p.size + dsize > PAGE_SIZE))
> +		return;
> +
> +	cpu = raw_smp_processor_id();
> +	mutex = &per_cpu(uprobe_cpu_mutex, cpu);
> +	arg_buf = per_cpu_ptr(uprobe_cpu_buffer, cpu);
> +
> +	/*
> +	 * Use per-cpu buffers for fastest access, but we might migrate
> +	 * so the mutex makes sure we have sole access to it.
> +	 */
> +	mutex_lock(mutex);
> +	store_trace_args(esize, &tu->p, regs, arg_buf, dsize);
> +
> +	size = esize + tu->p.size + dsize;
>  	event = trace_current_buffer_lock_reserve(&buffer, call->event.type,
> -						  size + tu->p.size, 0, 0);
> -	if (!event)
> +						  size, 0, 0);
> +	if (!event) {
> +		mutex_unlock(mutex);
>  		return;
> +	}
>  
>  	entry = ring_buffer_event_data(event);
>  	if (is_ret_probe(tu)) {
> @@ -556,13 +581,12 @@ static void uprobe_trace_print(struct trace_uprobe *tu,
>  		data = DATAOF_TRACE_ENTRY(entry, false);
>  	}
>  
> -	for (i = 0; i < tu->p.nr_args; i++) {
> -		call_fetch(&tu->p.args[i].fetch, regs,
> -			   data + tu->p.args[i].offset);
> -	}
> +	memcpy(data, arg_buf, tu->p.size + dsize);
>  
>  	if (!filter_current_check_discard(buffer, call, entry, event))
>  		trace_buffer_unlock_commit(buffer, event, 0, 0);
> +
> +	mutex_unlock(mutex);
>  }
>  
>  /* uprobe handler */
> @@ -630,6 +654,17 @@ probe_event_enable(struct trace_uprobe *tu, int flag, filter_func_t filter)
>  	if (trace_probe_is_enabled(&tu->p))
>  		return -EINTR;
>  
> +	if (atomic_inc_return(&uprobe_buffer_ref) == 1) {
> +		int cpu;
> +
> +		uprobe_cpu_buffer = __alloc_percpu(PAGE_SIZE, PAGE_SIZE);
> +		if (uprobe_cpu_buffer == NULL)
> +			return -ENOMEM;
> +

Do we need add atomic_dec if allocate percpu buffer failed?

> +		for_each_possible_cpu(cpu)
> +			mutex_init(&per_cpu(uprobe_cpu_mutex, cpu));
> +	}
> +
>  	WARN_ON(!uprobe_filter_is_empty(&tu->filter));
>  
>  	tu->p.flags |= flag;
> @@ -646,6 +681,11 @@ static void probe_event_disable(struct trace_uprobe *tu, int flag)
>  	if (!trace_probe_is_enabled(&tu->p))
>  		return;
>  
> +	if (atomic_dec_and_test(&uprobe_buffer_ref)) {
> +		free_percpu(uprobe_cpu_buffer);
> +		uprobe_cpu_buffer = NULL;
> +	}
> +
>  	WARN_ON(!uprobe_filter_is_empty(&tu->filter));
>  
>  	uprobe_unregister(tu->inode, tu->offset, &tu->consumer);
> @@ -776,11 +816,33 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
>  	struct ftrace_event_call *call = &tu->p.call;
>  	struct uprobe_trace_entry_head *entry;
>  	struct hlist_head *head;
> -	void *data;
> -	int size, rctx, i;
> +	struct mutex *mutex;
> +	void *data, *arg_buf;
> +	int size, dsize, esize;
> +	int cpu;
> +	int rctx;
>  
> -	size = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
> -	size = ALIGN(size + tu->p.size + sizeof(u32), sizeof(u64)) - sizeof(u32);
> +	dsize = __get_data_size(&tu->p, regs);
> +	esize = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
> +
> +	if (WARN_ON_ONCE(!uprobe_cpu_buffer))
> +		return;
> +
> +	size = esize + tu->p.size + dsize;
> +	size = ALIGN(size + sizeof(u32), sizeof(u64)) - sizeof(u32);
> +	if (WARN_ONCE(size > PERF_MAX_TRACE_SIZE, "profile buffer not large enough"))
> +		return;
> +
> +	cpu = raw_smp_processor_id();
> +	mutex = &per_cpu(uprobe_cpu_mutex, cpu);
> +	arg_buf = per_cpu_ptr(uprobe_cpu_buffer, cpu);
> +
> +	/*
> +	 * Use per-cpu buffers for fastest access, but we might migrate
> +	 * so the mutex makes sure we have sole access to it.
> +	 */
> +	mutex_lock(mutex);
> +	store_trace_args(esize, &tu->p, regs, arg_buf, dsize);
>  
>  	preempt_disable();
>  	head = this_cpu_ptr(call->perf_events);
> @@ -800,15 +862,18 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
>  		data = DATAOF_TRACE_ENTRY(entry, false);
>  	}
>  
> -	for (i = 0; i < tu->p.nr_args; i++) {
> -		struct probe_arg *parg = &tu->p.args[i];
> +	memcpy(data, arg_buf, tu->p.size + dsize);
> +
> +	if (size - esize > tu->p.size + dsize) {
> +		int len = tu->p.size + dsize;
>  
> -		call_fetch(&parg->fetch, regs, data + parg->offset);
> +		memset(data + len, 0, size - esize - len);
>  	}
>  
>  	perf_trace_buf_submit(entry, size, rctx, 0, 1, regs, head, NULL);
>   out:
>  	preempt_enable();
> +	mutex_unlock(mutex);
>  }
>  
>  /* uprobe profile handler */
> 



^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer
  2013-09-03  5:44 ` [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer Namhyung Kim
@ 2013-09-03 10:50   ` Masami Hiramatsu
  2013-09-04  7:08     ` Namhyung Kim
  2013-09-03 11:10   ` zhangwei(Jovi)
  1 sibling, 1 reply; 92+ messages in thread
From: Masami Hiramatsu @ 2013-09-03 10:50 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Steven Rostedt, Namhyung Kim, Hyeoncheol Lee, LKML,
	Srikar Dronamraju, Oleg Nesterov, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

(2013/09/03 14:44), Namhyung Kim wrote:
> From: Namhyung Kim <namhyung.kim@lge.com>
> 
> Fetching from user space should be done in a non-atomic context.  So
> use a per-cpu buffer and copy its content to the ring buffer
> atomically.  Note that we can migrate during accessing user memory
> thus use a per-cpu mutex to protect concurrent accesses.
> 
> This is needed since we'll be able to fetch args from an user memory
> which can be swapped out.  Before that uprobes could fetch args from
> registers only which saved in a kernel space.
> 
> While at it, use __get_data_size() and store_trace_args() to reduce
> code duplication.
> 
> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  kernel/trace/trace_uprobe.c | 97 +++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 81 insertions(+), 16 deletions(-)
> 
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index 9f2d12d2311d..9ede401759ab 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -530,21 +530,46 @@ static const struct file_operations uprobe_profile_ops = {
>  	.release	= seq_release,
>  };
>  
> +static atomic_t uprobe_buffer_ref = ATOMIC_INIT(0);
> +static void __percpu *uprobe_cpu_buffer;
> +static DEFINE_PER_CPU(struct mutex, uprobe_cpu_mutex);
> +
>  static void uprobe_trace_print(struct trace_uprobe *tu,
>  				unsigned long func, struct pt_regs *regs)
>  {
>  	struct uprobe_trace_entry_head *entry;
>  	struct ring_buffer_event *event;
>  	struct ring_buffer *buffer;
> -	void *data;
> -	int size, i;
> +	struct mutex *mutex;
> +	void *data, *arg_buf;
> +	int size, dsize, esize;
> +	int cpu;
>  	struct ftrace_event_call *call = &tu->p.call;
>  
> -	size = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
> +	dsize = __get_data_size(&tu->p, regs);
> +	esize = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
> +
> +	if (WARN_ON_ONCE(!uprobe_cpu_buffer || tu->p.size + dsize > PAGE_SIZE))
> +		return;
> +
> +	cpu = raw_smp_processor_id();
> +	mutex = &per_cpu(uprobe_cpu_mutex, cpu);
> +	arg_buf = per_cpu_ptr(uprobe_cpu_buffer, cpu);
> +
> +	/*
> +	 * Use per-cpu buffers for fastest access, but we might migrate
> +	 * so the mutex makes sure we have sole access to it.
> +	 */
> +	mutex_lock(mutex);
> +	store_trace_args(esize, &tu->p, regs, arg_buf, dsize);
> +
> +	size = esize + tu->p.size + dsize;
>  	event = trace_current_buffer_lock_reserve(&buffer, call->event.type,
> -						  size + tu->p.size, 0, 0);
> -	if (!event)
> +						  size, 0, 0);
> +	if (!event) {
> +		mutex_unlock(mutex);
>  		return;

Just for a maintenance reason, I personally like to use "goto" in this case
to fold up the mutex_unlock. :)

Other parts are good for me.

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

Thank you!


-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer
  2013-09-03  5:44 [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v5) Namhyung Kim
@ 2013-09-03  5:44 ` Namhyung Kim
  2013-09-03 10:50   ` Masami Hiramatsu
  2013-09-03 11:10   ` zhangwei(Jovi)
  0 siblings, 2 replies; 92+ messages in thread
From: Namhyung Kim @ 2013-09-03  5:44 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Namhyung Kim, Hyeoncheol Lee, Masami Hiramatsu, LKML,
	Srikar Dronamraju, Oleg Nesterov, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

From: Namhyung Kim <namhyung.kim@lge.com>

Fetching from user space should be done in a non-atomic context.  So
use a per-cpu buffer and copy its content to the ring buffer
atomically.  Note that we can migrate during accessing user memory
thus use a per-cpu mutex to protect concurrent accesses.

This is needed since we'll be able to fetch args from an user memory
which can be swapped out.  Before that uprobes could fetch args from
registers only which saved in a kernel space.

While at it, use __get_data_size() and store_trace_args() to reduce
code duplication.

Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 kernel/trace/trace_uprobe.c | 97 +++++++++++++++++++++++++++++++++++++--------
 1 file changed, 81 insertions(+), 16 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 9f2d12d2311d..9ede401759ab 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -530,21 +530,46 @@ static const struct file_operations uprobe_profile_ops = {
 	.release	= seq_release,
 };
 
+static atomic_t uprobe_buffer_ref = ATOMIC_INIT(0);
+static void __percpu *uprobe_cpu_buffer;
+static DEFINE_PER_CPU(struct mutex, uprobe_cpu_mutex);
+
 static void uprobe_trace_print(struct trace_uprobe *tu,
 				unsigned long func, struct pt_regs *regs)
 {
 	struct uprobe_trace_entry_head *entry;
 	struct ring_buffer_event *event;
 	struct ring_buffer *buffer;
-	void *data;
-	int size, i;
+	struct mutex *mutex;
+	void *data, *arg_buf;
+	int size, dsize, esize;
+	int cpu;
 	struct ftrace_event_call *call = &tu->p.call;
 
-	size = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
+	dsize = __get_data_size(&tu->p, regs);
+	esize = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
+
+	if (WARN_ON_ONCE(!uprobe_cpu_buffer || tu->p.size + dsize > PAGE_SIZE))
+		return;
+
+	cpu = raw_smp_processor_id();
+	mutex = &per_cpu(uprobe_cpu_mutex, cpu);
+	arg_buf = per_cpu_ptr(uprobe_cpu_buffer, cpu);
+
+	/*
+	 * Use per-cpu buffers for fastest access, but we might migrate
+	 * so the mutex makes sure we have sole access to it.
+	 */
+	mutex_lock(mutex);
+	store_trace_args(esize, &tu->p, regs, arg_buf, dsize);
+
+	size = esize + tu->p.size + dsize;
 	event = trace_current_buffer_lock_reserve(&buffer, call->event.type,
-						  size + tu->p.size, 0, 0);
-	if (!event)
+						  size, 0, 0);
+	if (!event) {
+		mutex_unlock(mutex);
 		return;
+	}
 
 	entry = ring_buffer_event_data(event);
 	if (is_ret_probe(tu)) {
@@ -556,13 +581,12 @@ static void uprobe_trace_print(struct trace_uprobe *tu,
 		data = DATAOF_TRACE_ENTRY(entry, false);
 	}
 
-	for (i = 0; i < tu->p.nr_args; i++) {
-		call_fetch(&tu->p.args[i].fetch, regs,
-			   data + tu->p.args[i].offset);
-	}
+	memcpy(data, arg_buf, tu->p.size + dsize);
 
 	if (!filter_current_check_discard(buffer, call, entry, event))
 		trace_buffer_unlock_commit(buffer, event, 0, 0);
+
+	mutex_unlock(mutex);
 }
 
 /* uprobe handler */
@@ -630,6 +654,17 @@ probe_event_enable(struct trace_uprobe *tu, int flag, filter_func_t filter)
 	if (trace_probe_is_enabled(&tu->p))
 		return -EINTR;
 
+	if (atomic_inc_return(&uprobe_buffer_ref) == 1) {
+		int cpu;
+
+		uprobe_cpu_buffer = __alloc_percpu(PAGE_SIZE, PAGE_SIZE);
+		if (uprobe_cpu_buffer == NULL)
+			return -ENOMEM;
+
+		for_each_possible_cpu(cpu)
+			mutex_init(&per_cpu(uprobe_cpu_mutex, cpu));
+	}
+
 	WARN_ON(!uprobe_filter_is_empty(&tu->filter));
 
 	tu->p.flags |= flag;
@@ -646,6 +681,11 @@ static void probe_event_disable(struct trace_uprobe *tu, int flag)
 	if (!trace_probe_is_enabled(&tu->p))
 		return;
 
+	if (atomic_dec_and_test(&uprobe_buffer_ref)) {
+		free_percpu(uprobe_cpu_buffer);
+		uprobe_cpu_buffer = NULL;
+	}
+
 	WARN_ON(!uprobe_filter_is_empty(&tu->filter));
 
 	uprobe_unregister(tu->inode, tu->offset, &tu->consumer);
@@ -776,11 +816,33 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
 	struct ftrace_event_call *call = &tu->p.call;
 	struct uprobe_trace_entry_head *entry;
 	struct hlist_head *head;
-	void *data;
-	int size, rctx, i;
+	struct mutex *mutex;
+	void *data, *arg_buf;
+	int size, dsize, esize;
+	int cpu;
+	int rctx;
 
-	size = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
-	size = ALIGN(size + tu->p.size + sizeof(u32), sizeof(u64)) - sizeof(u32);
+	dsize = __get_data_size(&tu->p, regs);
+	esize = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
+
+	if (WARN_ON_ONCE(!uprobe_cpu_buffer))
+		return;
+
+	size = esize + tu->p.size + dsize;
+	size = ALIGN(size + sizeof(u32), sizeof(u64)) - sizeof(u32);
+	if (WARN_ONCE(size > PERF_MAX_TRACE_SIZE, "profile buffer not large enough"))
+		return;
+
+	cpu = raw_smp_processor_id();
+	mutex = &per_cpu(uprobe_cpu_mutex, cpu);
+	arg_buf = per_cpu_ptr(uprobe_cpu_buffer, cpu);
+
+	/*
+	 * Use per-cpu buffers for fastest access, but we might migrate
+	 * so the mutex makes sure we have sole access to it.
+	 */
+	mutex_lock(mutex);
+	store_trace_args(esize, &tu->p, regs, arg_buf, dsize);
 
 	preempt_disable();
 	head = this_cpu_ptr(call->perf_events);
@@ -800,15 +862,18 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
 		data = DATAOF_TRACE_ENTRY(entry, false);
 	}
 
-	for (i = 0; i < tu->p.nr_args; i++) {
-		struct probe_arg *parg = &tu->p.args[i];
+	memcpy(data, arg_buf, tu->p.size + dsize);
+
+	if (size - esize > tu->p.size + dsize) {
+		int len = tu->p.size + dsize;
 
-		call_fetch(&parg->fetch, regs, data + parg->offset);
+		memset(data + len, 0, size - esize - len);
 	}
 
 	perf_trace_buf_submit(entry, size, rctx, 0, 1, regs, head, NULL);
  out:
 	preempt_enable();
+	mutex_unlock(mutex);
 }
 
 /* uprobe profile handler */
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer
  2013-08-27  8:48 [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v4) Namhyung Kim
@ 2013-08-27  8:48 ` Namhyung Kim
  0 siblings, 0 replies; 92+ messages in thread
From: Namhyung Kim @ 2013-08-27  8:48 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Namhyung Kim, Hyeoncheol Lee, Masami Hiramatsu, LKML,
	Srikar Dronamraju, Oleg Nesterov, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

From: Namhyung Kim <namhyung.kim@lge.com>

Fetching from user space should be done in a non-atomic context.  So
use a per-cpu buffer and copy its content to the ring buffer
atomically.  Note that we can migrate during accessing user memory
thus use a per-cpu mutex to protect concurrent accesses.

This is needed since we'll be able to fetch args from an user memory
which can be swapped out.  Before that uprobes could fetch args from
registers only which saved in a kernel space.

While at it, use __get_data_size() and store_trace_args() to reduce
code duplication.

Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 kernel/trace/trace_uprobe.c | 97 +++++++++++++++++++++++++++++++++++++--------
 1 file changed, 81 insertions(+), 16 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 9f2d12d2311d..9ede401759ab 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -530,21 +530,46 @@ static const struct file_operations uprobe_profile_ops = {
 	.release	= seq_release,
 };
 
+static atomic_t uprobe_buffer_ref = ATOMIC_INIT(0);
+static void __percpu *uprobe_cpu_buffer;
+static DEFINE_PER_CPU(struct mutex, uprobe_cpu_mutex);
+
 static void uprobe_trace_print(struct trace_uprobe *tu,
 				unsigned long func, struct pt_regs *regs)
 {
 	struct uprobe_trace_entry_head *entry;
 	struct ring_buffer_event *event;
 	struct ring_buffer *buffer;
-	void *data;
-	int size, i;
+	struct mutex *mutex;
+	void *data, *arg_buf;
+	int size, dsize, esize;
+	int cpu;
 	struct ftrace_event_call *call = &tu->p.call;
 
-	size = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
+	dsize = __get_data_size(&tu->p, regs);
+	esize = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
+
+	if (WARN_ON_ONCE(!uprobe_cpu_buffer || tu->p.size + dsize > PAGE_SIZE))
+		return;
+
+	cpu = raw_smp_processor_id();
+	mutex = &per_cpu(uprobe_cpu_mutex, cpu);
+	arg_buf = per_cpu_ptr(uprobe_cpu_buffer, cpu);
+
+	/*
+	 * Use per-cpu buffers for fastest access, but we might migrate
+	 * so the mutex makes sure we have sole access to it.
+	 */
+	mutex_lock(mutex);
+	store_trace_args(esize, &tu->p, regs, arg_buf, dsize);
+
+	size = esize + tu->p.size + dsize;
 	event = trace_current_buffer_lock_reserve(&buffer, call->event.type,
-						  size + tu->p.size, 0, 0);
-	if (!event)
+						  size, 0, 0);
+	if (!event) {
+		mutex_unlock(mutex);
 		return;
+	}
 
 	entry = ring_buffer_event_data(event);
 	if (is_ret_probe(tu)) {
@@ -556,13 +581,12 @@ static void uprobe_trace_print(struct trace_uprobe *tu,
 		data = DATAOF_TRACE_ENTRY(entry, false);
 	}
 
-	for (i = 0; i < tu->p.nr_args; i++) {
-		call_fetch(&tu->p.args[i].fetch, regs,
-			   data + tu->p.args[i].offset);
-	}
+	memcpy(data, arg_buf, tu->p.size + dsize);
 
 	if (!filter_current_check_discard(buffer, call, entry, event))
 		trace_buffer_unlock_commit(buffer, event, 0, 0);
+
+	mutex_unlock(mutex);
 }
 
 /* uprobe handler */
@@ -630,6 +654,17 @@ probe_event_enable(struct trace_uprobe *tu, int flag, filter_func_t filter)
 	if (trace_probe_is_enabled(&tu->p))
 		return -EINTR;
 
+	if (atomic_inc_return(&uprobe_buffer_ref) == 1) {
+		int cpu;
+
+		uprobe_cpu_buffer = __alloc_percpu(PAGE_SIZE, PAGE_SIZE);
+		if (uprobe_cpu_buffer == NULL)
+			return -ENOMEM;
+
+		for_each_possible_cpu(cpu)
+			mutex_init(&per_cpu(uprobe_cpu_mutex, cpu));
+	}
+
 	WARN_ON(!uprobe_filter_is_empty(&tu->filter));
 
 	tu->p.flags |= flag;
@@ -646,6 +681,11 @@ static void probe_event_disable(struct trace_uprobe *tu, int flag)
 	if (!trace_probe_is_enabled(&tu->p))
 		return;
 
+	if (atomic_dec_and_test(&uprobe_buffer_ref)) {
+		free_percpu(uprobe_cpu_buffer);
+		uprobe_cpu_buffer = NULL;
+	}
+
 	WARN_ON(!uprobe_filter_is_empty(&tu->filter));
 
 	uprobe_unregister(tu->inode, tu->offset, &tu->consumer);
@@ -776,11 +816,33 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
 	struct ftrace_event_call *call = &tu->p.call;
 	struct uprobe_trace_entry_head *entry;
 	struct hlist_head *head;
-	void *data;
-	int size, rctx, i;
+	struct mutex *mutex;
+	void *data, *arg_buf;
+	int size, dsize, esize;
+	int cpu;
+	int rctx;
 
-	size = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
-	size = ALIGN(size + tu->p.size + sizeof(u32), sizeof(u64)) - sizeof(u32);
+	dsize = __get_data_size(&tu->p, regs);
+	esize = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
+
+	if (WARN_ON_ONCE(!uprobe_cpu_buffer))
+		return;
+
+	size = esize + tu->p.size + dsize;
+	size = ALIGN(size + sizeof(u32), sizeof(u64)) - sizeof(u32);
+	if (WARN_ONCE(size > PERF_MAX_TRACE_SIZE, "profile buffer not large enough"))
+		return;
+
+	cpu = raw_smp_processor_id();
+	mutex = &per_cpu(uprobe_cpu_mutex, cpu);
+	arg_buf = per_cpu_ptr(uprobe_cpu_buffer, cpu);
+
+	/*
+	 * Use per-cpu buffers for fastest access, but we might migrate
+	 * so the mutex makes sure we have sole access to it.
+	 */
+	mutex_lock(mutex);
+	store_trace_args(esize, &tu->p, regs, arg_buf, dsize);
 
 	preempt_disable();
 	head = this_cpu_ptr(call->perf_events);
@@ -800,15 +862,18 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
 		data = DATAOF_TRACE_ENTRY(entry, false);
 	}
 
-	for (i = 0; i < tu->p.nr_args; i++) {
-		struct probe_arg *parg = &tu->p.args[i];
+	memcpy(data, arg_buf, tu->p.size + dsize);
+
+	if (size - esize > tu->p.size + dsize) {
+		int len = tu->p.size + dsize;
 
-		call_fetch(&parg->fetch, regs, data + parg->offset);
+		memset(data + len, 0, size - esize - len);
 	}
 
 	perf_trace_buf_submit(entry, size, rctx, 0, 1, regs, head, NULL);
  out:
 	preempt_enable();
+	mutex_unlock(mutex);
 }
 
 /* uprobe profile handler */
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* Re: [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer
  2013-08-23  1:08         ` Steven Rostedt
@ 2013-08-27  8:07           ` Namhyung Kim
  0 siblings, 0 replies; 92+ messages in thread
From: Namhyung Kim @ 2013-08-27  8:07 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: zhangwei(Jovi),
	Masami Hiramatsu, Namhyung Kim, Hyeoncheol Lee, LKML,
	Srikar Dronamraju, Oleg Nesterov, Arnaldo Carvalho de Melo

Hi Steven,

On Thu, 22 Aug 2013 21:08:30 -0400, Steven Rostedt wrote:
> On Fri, 23 Aug 2013 07:57:15 +0800
> "zhangwei(Jovi)" <jovi.zhangwei@huawei.com> wrote:
>
>
>> > 
>> > What about creating a per cpu buffer when uprobes are registered, and
>> > delete them when they are finished? Basically what trace_printk() does
>> > if it detects that there are users of trace_printk() in the kernel.
>> > Note, it does not deallocate them when finished, as it is never
>> > finished until reboot ;-)
>> > 
>> > -- Steve
>> >
>> I also thought out this approach, but the issue is we cannot fetch user
>> memory into per-cpu buffer, because use per-cpu buffer should under
>> preempt disabled, and fetching user memory could sleep.
>
> Actually, we could create a per_cpu mutex to match the per_cpu buffers.
> This is not unlike what we do in -rt.
>
> 	int cpu;
> 	struct mutex *mutex;
> 	void *buf;
>
>
> 	/*
> 	 * Use per cpu buffers for fastest access, but we might migrate
> 	 * So the mutex makes sure we have sole access to it.
> 	 */
>
> 	cpu = raw_smp_processor_id();
> 	mutex = per_cpu(uprobe_cpu_mutex, cpu);
> 	buf = per_cpu(uprobe_cpu_buffer, cpu);
>
> 	mutex_lock(mutex);
> 	store_trace_args(..., buf,...);
> 	mutex_unlock(mutex);
>

Great!  I'll go with this approach.  Is it OK to you, Masami?

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer
  2013-08-22 23:57       ` zhangwei(Jovi)
  2013-08-23  1:08         ` Steven Rostedt
@ 2013-08-23  4:22         ` Masami Hiramatsu
  1 sibling, 0 replies; 92+ messages in thread
From: Masami Hiramatsu @ 2013-08-23  4:22 UTC (permalink / raw)
  To: zhangwei(Jovi)
  Cc: Steven Rostedt, Namhyung Kim, Namhyung Kim, Hyeoncheol Lee, LKML,
	Srikar Dronamraju, Oleg Nesterov, Arnaldo Carvalho de Melo

(2013/08/23 8:57), zhangwei(Jovi) wrote:
> On 2013/8/23 0:42, Steven Rostedt wrote:
>> On Fri, 09 Aug 2013 18:56:54 +0900
>> Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> wrote:
>>
>>> (2013/08/09 17:45), Namhyung Kim wrote:
>>>> From: Namhyung Kim <namhyung.kim@lge.com>
>>>>
>>>> Fetching from user space should be done in a non-atomic context.  So
>>>> use a temporary buffer and copy its content to the ring buffer
>>>> atomically.
>>>>
>>>> While at it, use __get_data_size() and store_trace_args() to reduce
>>>> code duplication.
>>>
>>> I just concern using kmalloc() in the event handler. For fetching user
>>> memory which can be swapped out, that is true. But most of the cases,
>>> we can presume that it exists on the physical memory.
>>>
>>
>>
>> What about creating a per cpu buffer when uprobes are registered, and
>> delete them when they are finished? Basically what trace_printk() does
>> if it detects that there are users of trace_printk() in the kernel.
>> Note, it does not deallocate them when finished, as it is never
>> finished until reboot ;-)
>>
>> -- Steve
>>
> I also thought out this approach, but the issue is we cannot fetch user
> memory into per-cpu buffer, because use per-cpu buffer should under
> preempt disabled, and fetching user memory could sleep.

Hm, perhaps, we just need a "hot" buffer pool which can be allocate/free
soon, and whan the pool shortage caller just wait or allocate new page
from "cold" area, this is a.k.a. kmem_cache :)

Anyway, kmem_cache/kmalloc looks so heavy to just allocate temporally
buffers for trace handler (and also, those have tracepoints), so I think
you may just need a memory pool whose has enough number of slots with
a semaphore (which will wait if the all slots are currently used).

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer
  2013-08-22 23:57       ` zhangwei(Jovi)
@ 2013-08-23  1:08         ` Steven Rostedt
  2013-08-27  8:07           ` Namhyung Kim
  2013-08-23  4:22         ` Masami Hiramatsu
  1 sibling, 1 reply; 92+ messages in thread
From: Steven Rostedt @ 2013-08-23  1:08 UTC (permalink / raw)
  To: zhangwei(Jovi)
  Cc: Masami Hiramatsu, Namhyung Kim, Namhyung Kim, Hyeoncheol Lee,
	LKML, Srikar Dronamraju, Oleg Nesterov, Arnaldo Carvalho de Melo

On Fri, 23 Aug 2013 07:57:15 +0800
"zhangwei(Jovi)" <jovi.zhangwei@huawei.com> wrote:


> > 
> > What about creating a per cpu buffer when uprobes are registered, and
> > delete them when they are finished? Basically what trace_printk() does
> > if it detects that there are users of trace_printk() in the kernel.
> > Note, it does not deallocate them when finished, as it is never
> > finished until reboot ;-)
> > 
> > -- Steve
> >
> I also thought out this approach, but the issue is we cannot fetch user
> memory into per-cpu buffer, because use per-cpu buffer should under
> preempt disabled, and fetching user memory could sleep.

Actually, we could create a per_cpu mutex to match the per_cpu buffers.
This is not unlike what we do in -rt.

	int cpu;
	struct mutex *mutex;
	void *buf;


	/*
	 * Use per cpu buffers for fastest access, but we might migrate
	 * So the mutex makes sure we have sole access to it.
	 */

	cpu = raw_smp_processor_id();
	mutex = per_cpu(uprobe_cpu_mutex, cpu);
	buf = per_cpu(uprobe_cpu_buffer, cpu);

	mutex_lock(mutex);
	store_trace_args(..., buf,...);
	mutex_unlock(mutex);

-- Steve

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer
  2013-08-22 16:42     ` Steven Rostedt
@ 2013-08-22 23:57       ` zhangwei(Jovi)
  2013-08-23  1:08         ` Steven Rostedt
  2013-08-23  4:22         ` Masami Hiramatsu
  0 siblings, 2 replies; 92+ messages in thread
From: zhangwei(Jovi) @ 2013-08-22 23:57 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, Namhyung Kim, Namhyung Kim, Hyeoncheol Lee,
	LKML, Srikar Dronamraju, Oleg Nesterov, Arnaldo Carvalho de Melo

On 2013/8/23 0:42, Steven Rostedt wrote:
> On Fri, 09 Aug 2013 18:56:54 +0900
> Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> wrote:
> 
>> (2013/08/09 17:45), Namhyung Kim wrote:
>>> From: Namhyung Kim <namhyung.kim@lge.com>
>>>
>>> Fetching from user space should be done in a non-atomic context.  So
>>> use a temporary buffer and copy its content to the ring buffer
>>> atomically.
>>>
>>> While at it, use __get_data_size() and store_trace_args() to reduce
>>> code duplication.
>>
>> I just concern using kmalloc() in the event handler. For fetching user
>> memory which can be swapped out, that is true. But most of the cases,
>> we can presume that it exists on the physical memory.
>>
> 
> 
> What about creating a per cpu buffer when uprobes are registered, and
> delete them when they are finished? Basically what trace_printk() does
> if it detects that there are users of trace_printk() in the kernel.
> Note, it does not deallocate them when finished, as it is never
> finished until reboot ;-)
> 
> -- Steve
>
I also thought out this approach, but the issue is we cannot fetch user
memory into per-cpu buffer, because use per-cpu buffer should under
preempt disabled, and fetching user memory could sleep.

jovi.



^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer
  2013-08-09  9:56   ` Masami Hiramatsu
  2013-08-09 16:20     ` Oleg Nesterov
@ 2013-08-22 16:42     ` Steven Rostedt
  2013-08-22 23:57       ` zhangwei(Jovi)
  1 sibling, 1 reply; 92+ messages in thread
From: Steven Rostedt @ 2013-08-22 16:42 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Namhyung Kim, Namhyung Kim, Hyeoncheol Lee, LKML,
	Srikar Dronamraju, Oleg Nesterov, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On Fri, 09 Aug 2013 18:56:54 +0900
Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> wrote:

> (2013/08/09 17:45), Namhyung Kim wrote:
> > From: Namhyung Kim <namhyung.kim@lge.com>
> > 
> > Fetching from user space should be done in a non-atomic context.  So
> > use a temporary buffer and copy its content to the ring buffer
> > atomically.
> > 
> > While at it, use __get_data_size() and store_trace_args() to reduce
> > code duplication.
> 
> I just concern using kmalloc() in the event handler. For fetching user
> memory which can be swapped out, that is true. But most of the cases,
> we can presume that it exists on the physical memory.
> 


What about creating a per cpu buffer when uprobes are registered, and
delete them when they are finished? Basically what trace_printk() does
if it detects that there are users of trace_printk() in the kernel.
Note, it does not deallocate them when finished, as it is never
finished until reboot ;-)

-- Steve

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer
  2013-08-10  1:41         ` zhangwei(Jovi)
@ 2013-08-10 14:06           ` zhangwei(Jovi)
  0 siblings, 0 replies; 92+ messages in thread
From: zhangwei(Jovi) @ 2013-08-10 14:06 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Masami Hiramatsu, Steven Rostedt, Namhyung Kim, Namhyung Kim,
	Hyeoncheol Lee, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On Sat, Aug 10, 2013 at 9:41 AM, zhangwei(Jovi) <jovi.zhangwei@gmail.com> wrote:
> On Sat, Aug 10, 2013 at 9:26 AM, zhangwei(Jovi) <jovi.zhangwei@gmail.com> wrote:
>> On Sat, Aug 10, 2013 at 12:20 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>>>
>>> Sorry, I didn't read this series yet. Not that I think this needs my
>>> help, but I'll try to do this a later...
>>>
>>> On 08/09, Masami Hiramatsu wrote:
>>> >
>>> > I just concern using kmalloc() in the event handler.
>>>
>>> GFP_KERNEL should be fine for uprobe handler.
>>>
>>> However, iirc this conflicts with the patches from Jovi,
>>> "Support ftrace_event_file base multibuffer" adds rcu_read_lock()
>>> around uprobe_trace_print().
>>
>> (Sorry about html text rejected by kernel.org, send again with plain text.)
>>
>> Then we might need to call kmalloc before rcu_read_lock, also call kfree
>> after rcu_read_unlock.
>>
>> And it's not needed to call kmalloc for each instances in multi-buffer
>> case, just
>> kmalloc once is enough.
>>
>> I also have same concern about use kmalloc in uprobe handler, use kmalloc
>> in uprobe handler seems have a little overhead, why not pre-allocate one page
>> static memory for temp buffer(perhaps trace_uprobe based)? one page size
>> would be enough for all uprobe args storage, then we don't need to call
>> kmalloc in that "fast path".
>>
> forgotten to say, that pre-allocated buffer would need to be per-cpu, to prevent
> buffer corruption.
>
> It's a memory space vs. performance trade-off problem. :)
>
Oops, I missed the per-cpu buffer operation still need
preempt_disable, ignore this part
of my comments, now I agree kmalloc in uprobe handler is needed.

jovi.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer
  2013-08-10  1:26       ` zhangwei(Jovi)
@ 2013-08-10  1:41         ` zhangwei(Jovi)
  2013-08-10 14:06           ` zhangwei(Jovi)
  0 siblings, 1 reply; 92+ messages in thread
From: zhangwei(Jovi) @ 2013-08-10  1:41 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Masami Hiramatsu, Steven Rostedt, Namhyung Kim, Namhyung Kim,
	Hyeoncheol Lee, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On Sat, Aug 10, 2013 at 9:26 AM, zhangwei(Jovi) <jovi.zhangwei@gmail.com> wrote:
> On Sat, Aug 10, 2013 at 12:20 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>>
>> Sorry, I didn't read this series yet. Not that I think this needs my
>> help, but I'll try to do this a later...
>>
>> On 08/09, Masami Hiramatsu wrote:
>> >
>> > I just concern using kmalloc() in the event handler.
>>
>> GFP_KERNEL should be fine for uprobe handler.
>>
>> However, iirc this conflicts with the patches from Jovi,
>> "Support ftrace_event_file base multibuffer" adds rcu_read_lock()
>> around uprobe_trace_print().
>
> (Sorry about html text rejected by kernel.org, send again with plain text.)
>
> Then we might need to call kmalloc before rcu_read_lock, also call kfree
> after rcu_read_unlock.
>
> And it's not needed to call kmalloc for each instances in multi-buffer
> case, just
> kmalloc once is enough.
>
> I also have same concern about use kmalloc in uprobe handler, use kmalloc
> in uprobe handler seems have a little overhead, why not pre-allocate one page
> static memory for temp buffer(perhaps trace_uprobe based)? one page size
> would be enough for all uprobe args storage, then we don't need to call
> kmalloc in that "fast path".
>
forgotten to say, that pre-allocated buffer would need to be per-cpu, to prevent
buffer corruption.

It's a memory space vs. performance trade-off problem. :)

>
> Thanks.
>>
>>
>> Steven, Jovi, what should we do with that patch? It seems that it
>> was forgotten.
>>
>> I can take these patches into my ubprobes branch and then ask Ingo
>> to pull. But this will complicate the routing of the new changes
>> like this.
>>
>> Oleg.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer
  2013-08-09 16:20     ` Oleg Nesterov
  2013-08-09 16:21       ` Oleg Nesterov
@ 2013-08-10  1:26       ` zhangwei(Jovi)
  2013-08-10  1:41         ` zhangwei(Jovi)
  1 sibling, 1 reply; 92+ messages in thread
From: zhangwei(Jovi) @ 2013-08-10  1:26 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Masami Hiramatsu, Steven Rostedt, Namhyung Kim, Namhyung Kim,
	Hyeoncheol Lee, LKML, Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On Sat, Aug 10, 2013 at 12:20 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>
> Sorry, I didn't read this series yet. Not that I think this needs my
> help, but I'll try to do this a later...
>
> On 08/09, Masami Hiramatsu wrote:
> >
> > I just concern using kmalloc() in the event handler.
>
> GFP_KERNEL should be fine for uprobe handler.
>
> However, iirc this conflicts with the patches from Jovi,
> "Support ftrace_event_file base multibuffer" adds rcu_read_lock()
> around uprobe_trace_print().

(Sorry about html text rejected by kernel.org, send again with plain text.)

Then we might need to call kmalloc before rcu_read_lock, also call kfree
after rcu_read_unlock.

And it's not needed to call kmalloc for each instances in multi-buffer
case, just
kmalloc once is enough.

I also have same concern about use kmalloc in uprobe handler, use kmalloc
in uprobe handler seems have a little overhead, why not pre-allocate one page
static memory for temp buffer(perhaps trace_uprobe based)? one page size
would be enough for all uprobe args storage, then we don't need to call
kmalloc in that "fast path".

Thanks.
>
>
> Steven, Jovi, what should we do with that patch? It seems that it
> was forgotten.
>
> I can take these patches into my ubprobes branch and then ask Ingo
> to pull. But this will complicate the routing of the new changes
> like this.
>
> Oleg.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer
  2013-08-09 16:20     ` Oleg Nesterov
@ 2013-08-09 16:21       ` Oleg Nesterov
  2013-08-10  1:26       ` zhangwei(Jovi)
  1 sibling, 0 replies; 92+ messages in thread
From: Oleg Nesterov @ 2013-08-09 16:21 UTC (permalink / raw)
  To: Masami Hiramatsu, Steven Rostedt
  Cc: Namhyung Kim, Namhyung Kim, Hyeoncheol Lee, LKML,
	Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

On 08/09, Oleg Nesterov wrote:
>
> Steven, Jovi, what should we do with that patch? It seems that it
> was forgotten.
>
> I can take these patches into my ubprobes branch

I meant that patch from Jovi, sorry for confusion.

Oleg.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer
  2013-08-09  9:56   ` Masami Hiramatsu
@ 2013-08-09 16:20     ` Oleg Nesterov
  2013-08-09 16:21       ` Oleg Nesterov
  2013-08-10  1:26       ` zhangwei(Jovi)
  2013-08-22 16:42     ` Steven Rostedt
  1 sibling, 2 replies; 92+ messages in thread
From: Oleg Nesterov @ 2013-08-09 16:20 UTC (permalink / raw)
  To: Masami Hiramatsu, Steven Rostedt
  Cc: Namhyung Kim, Namhyung Kim, Hyeoncheol Lee, LKML,
	Srikar Dronamraju, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

Sorry, I didn't read this series yet. Not that I think this needs my
help, but I'll try to do this a later...

On 08/09, Masami Hiramatsu wrote:
>
> I just concern using kmalloc() in the event handler.

GFP_KERNEL should be fine for uprobe handler.

However, iirc this conflicts with the patches from Jovi,
"Support ftrace_event_file base multibuffer" adds rcu_read_lock()
around uprobe_trace_print().

Steven, Jovi, what should we do with that patch? It seems that it
was forgotten.

I can take these patches into my ubprobes branch and then ask Ingo
to pull. But this will complicate the routing of the new changes
like this.

Oleg.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer
  2013-08-09  8:45 ` [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer Namhyung Kim
@ 2013-08-09  9:56   ` Masami Hiramatsu
  2013-08-09 16:20     ` Oleg Nesterov
  2013-08-22 16:42     ` Steven Rostedt
  0 siblings, 2 replies; 92+ messages in thread
From: Masami Hiramatsu @ 2013-08-09  9:56 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Steven Rostedt, Namhyung Kim, Hyeoncheol Lee, LKML,
	Srikar Dronamraju, Oleg Nesterov, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

(2013/08/09 17:45), Namhyung Kim wrote:
> From: Namhyung Kim <namhyung.kim@lge.com>
> 
> Fetching from user space should be done in a non-atomic context.  So
> use a temporary buffer and copy its content to the ring buffer
> atomically.
> 
> While at it, use __get_data_size() and store_trace_args() to reduce
> code duplication.

I just concern using kmalloc() in the event handler. For fetching user
memory which can be swapped out, that is true. But most of the cases,
we can presume that it exists on the physical memory.

I'd like to ask the opinions of Srikar and Oleg.

BTW, you'd better add to patch description why previously this is
not needed, and your series needs it too. :)

Thank you,

> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  kernel/trace/trace_uprobe.c | 69 ++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 53 insertions(+), 16 deletions(-)
> 
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index f991cac2b9ba..2888b95b063f 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -516,15 +516,31 @@ static void uprobe_trace_print(struct trace_uprobe *tu,
>  	struct uprobe_trace_entry_head *entry;
>  	struct ring_buffer_event *event;
>  	struct ring_buffer *buffer;
> -	void *data;
> -	int size, i;
> +	void *data, *tmp;
> +	int size, dsize, esize;
>  	struct ftrace_event_call *call = &tu->p.call;
>  
> -	size = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
> +	dsize = __get_data_size(&tu->p, regs);
> +	esize = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
> +
> +	/*
> +	 * A temporary buffer is used for storing fetched data before reserving
> +	 * the ring buffer because fetching from user space should be done in a
> +	 * non-atomic context.
> +	 */
> +	tmp = kmalloc(tu->p.size + dsize, GFP_KERNEL);
> +	if (tmp == NULL)
> +		return;
> +
> +	store_trace_args(esize, &tu->p, regs, tmp, dsize);
> +
> +	size = esize + tu->p.size + dsize;
>  	event = trace_current_buffer_lock_reserve(&buffer, call->event.type,
> -						  size + tu->p.size, 0, 0);
> -	if (!event)
> +						  size, 0, 0);
> +	if (!event) {
> +		kfree(tmp);
>  		return;
> +	}
>  
>  	entry = ring_buffer_event_data(event);
>  	if (is_ret_probe(tu)) {
> @@ -536,13 +552,12 @@ static void uprobe_trace_print(struct trace_uprobe *tu,
>  		data = DATAOF_TRACE_ENTRY(entry, false);
>  	}
>  
> -	for (i = 0; i < tu->p.nr_args; i++) {
> -		call_fetch(&tu->p.args[i].fetch, regs,
> -			   data + tu->p.args[i].offset);
> -	}
> +	memcpy(data, tmp, tu->p.size + dsize);
>  
>  	if (!filter_current_check_discard(buffer, call, entry, event))
>  		trace_buffer_unlock_commit(buffer, event, 0, 0);
> +
> +	kfree(tmp);
>  }
>  
>  /* uprobe handler */
> @@ -756,11 +771,30 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
>  	struct ftrace_event_call *call = &tu->p.call;
>  	struct uprobe_trace_entry_head *entry;
>  	struct hlist_head *head;
> -	void *data;
> -	int size, rctx, i;
> +	void *data, *tmp;
> +	int size, dsize, esize;
> +	int rctx;
> +
> +	dsize = __get_data_size(&tu->p, regs);
> +	esize = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
> +
> +	/*
> +	 * A temporary buffer is used for storing fetched data before reserving
> +	 * the ring buffer because fetching from user space should be done in a
> +	 * non-atomic context.
> +	 */
> +	tmp = kmalloc(tu->p.size + dsize, GFP_KERNEL);
> +	if (tmp == NULL)
> +		return;
>  
> -	size = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
> -	size = ALIGN(size + tu->p.size + sizeof(u32), sizeof(u64)) - sizeof(u32);
> +	store_trace_args(esize, &tu->p, regs, tmp, dsize);
> +
> +	size = esize + tu->p.size + dsize;
> +	size = ALIGN(size + sizeof(u32), sizeof(u64)) - sizeof(u32);
> +	if (WARN_ONCE(size > PERF_MAX_TRACE_SIZE, "profile buffer not large enough")) {
> +		kfree(tmp);
> +		return;
> +	}
>  
>  	preempt_disable();
>  	head = this_cpu_ptr(call->perf_events);
> @@ -780,15 +814,18 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
>  		data = DATAOF_TRACE_ENTRY(entry, false);
>  	}
>  
> -	for (i = 0; i < tu->p.nr_args; i++) {
> -		struct probe_arg *parg = &tu->p.args[i];
> +	memcpy(data, tmp, tu->p.size + dsize);
> +
> +	if (size - esize > tu->p.size + dsize) {
> +		int len = tu->p.size + dsize;
>  
> -		call_fetch(&parg->fetch, regs, data + parg->offset);
> +		memset(data + len, 0, size - esize - len);
>  	}
>  
>  	perf_trace_buf_submit(entry, size, rctx, 0, 1, regs, head, NULL);
>   out:
>  	preempt_enable();
> +	kfree(tmp);
>  }
>  
>  /* uprobe profile handler */
> 


-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer
  2013-08-09  8:44 [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v3) Namhyung Kim
@ 2013-08-09  8:45 ` Namhyung Kim
  2013-08-09  9:56   ` Masami Hiramatsu
  0 siblings, 1 reply; 92+ messages in thread
From: Namhyung Kim @ 2013-08-09  8:45 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Namhyung Kim, Hyeoncheol Lee, Masami Hiramatsu, LKML,
	Srikar Dronamraju, Oleg Nesterov, zhangwei(Jovi),
	Arnaldo Carvalho de Melo

From: Namhyung Kim <namhyung.kim@lge.com>

Fetching from user space should be done in a non-atomic context.  So
use a temporary buffer and copy its content to the ring buffer
atomically.

While at it, use __get_data_size() and store_trace_args() to reduce
code duplication.

Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 kernel/trace/trace_uprobe.c | 69 ++++++++++++++++++++++++++++++++++-----------
 1 file changed, 53 insertions(+), 16 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index f991cac2b9ba..2888b95b063f 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -516,15 +516,31 @@ static void uprobe_trace_print(struct trace_uprobe *tu,
 	struct uprobe_trace_entry_head *entry;
 	struct ring_buffer_event *event;
 	struct ring_buffer *buffer;
-	void *data;
-	int size, i;
+	void *data, *tmp;
+	int size, dsize, esize;
 	struct ftrace_event_call *call = &tu->p.call;
 
-	size = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
+	dsize = __get_data_size(&tu->p, regs);
+	esize = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
+
+	/*
+	 * A temporary buffer is used for storing fetched data before reserving
+	 * the ring buffer because fetching from user space should be done in a
+	 * non-atomic context.
+	 */
+	tmp = kmalloc(tu->p.size + dsize, GFP_KERNEL);
+	if (tmp == NULL)
+		return;
+
+	store_trace_args(esize, &tu->p, regs, tmp, dsize);
+
+	size = esize + tu->p.size + dsize;
 	event = trace_current_buffer_lock_reserve(&buffer, call->event.type,
-						  size + tu->p.size, 0, 0);
-	if (!event)
+						  size, 0, 0);
+	if (!event) {
+		kfree(tmp);
 		return;
+	}
 
 	entry = ring_buffer_event_data(event);
 	if (is_ret_probe(tu)) {
@@ -536,13 +552,12 @@ static void uprobe_trace_print(struct trace_uprobe *tu,
 		data = DATAOF_TRACE_ENTRY(entry, false);
 	}
 
-	for (i = 0; i < tu->p.nr_args; i++) {
-		call_fetch(&tu->p.args[i].fetch, regs,
-			   data + tu->p.args[i].offset);
-	}
+	memcpy(data, tmp, tu->p.size + dsize);
 
 	if (!filter_current_check_discard(buffer, call, entry, event))
 		trace_buffer_unlock_commit(buffer, event, 0, 0);
+
+	kfree(tmp);
 }
 
 /* uprobe handler */
@@ -756,11 +771,30 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
 	struct ftrace_event_call *call = &tu->p.call;
 	struct uprobe_trace_entry_head *entry;
 	struct hlist_head *head;
-	void *data;
-	int size, rctx, i;
+	void *data, *tmp;
+	int size, dsize, esize;
+	int rctx;
+
+	dsize = __get_data_size(&tu->p, regs);
+	esize = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
+
+	/*
+	 * A temporary buffer is used for storing fetched data before reserving
+	 * the ring buffer because fetching from user space should be done in a
+	 * non-atomic context.
+	 */
+	tmp = kmalloc(tu->p.size + dsize, GFP_KERNEL);
+	if (tmp == NULL)
+		return;
 
-	size = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
-	size = ALIGN(size + tu->p.size + sizeof(u32), sizeof(u64)) - sizeof(u32);
+	store_trace_args(esize, &tu->p, regs, tmp, dsize);
+
+	size = esize + tu->p.size + dsize;
+	size = ALIGN(size + sizeof(u32), sizeof(u64)) - sizeof(u32);
+	if (WARN_ONCE(size > PERF_MAX_TRACE_SIZE, "profile buffer not large enough")) {
+		kfree(tmp);
+		return;
+	}
 
 	preempt_disable();
 	head = this_cpu_ptr(call->perf_events);
@@ -780,15 +814,18 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
 		data = DATAOF_TRACE_ENTRY(entry, false);
 	}
 
-	for (i = 0; i < tu->p.nr_args; i++) {
-		struct probe_arg *parg = &tu->p.args[i];
+	memcpy(data, tmp, tu->p.size + dsize);
+
+	if (size - esize > tu->p.size + dsize) {
+		int len = tu->p.size + dsize;
 
-		call_fetch(&parg->fetch, regs, data + parg->offset);
+		memset(data + len, 0, size - esize - len);
 	}
 
 	perf_trace_buf_submit(entry, size, rctx, 0, 1, regs, head, NULL);
  out:
 	preempt_enable();
+	kfree(tmp);
 }
 
 /* uprobe profile handler */
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 92+ messages in thread

end of thread, other threads:[~2013-11-25 14:11 UTC | newest]

Thread overview: 92+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-10-29  6:53 [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6) Namhyung Kim
2013-10-29  6:53 ` [PATCH 01/13] tracing/uprobes: Fix documentation of uprobe registration syntax Namhyung Kim
2013-10-29  6:53 ` [PATCH 02/13] tracing/probes: Fix basic print type functions Namhyung Kim
2013-10-29  6:53 ` [PATCH 03/13] tracing/kprobes: Move fetch functions to trace_kprobe.c Namhyung Kim
2013-10-29  6:53 ` [PATCH 04/13] tracing/kprobes: Add fetch{,_size} member into deref fetch method Namhyung Kim
2013-10-29  6:53 ` [PATCH 05/13] tracing/kprobes: Staticize stack and memory fetch functions Namhyung Kim
2013-10-29  6:53 ` [PATCH 06/13] tracing/kprobes: Factor out struct trace_probe Namhyung Kim
2013-10-29  6:53 ` [PATCH 07/13] tracing/uprobes: Convert to " Namhyung Kim
2013-10-29  6:53 ` [PATCH 08/13] tracing/kprobes: Move common functions to trace_probe.h Namhyung Kim
2013-10-29  6:53 ` [PATCH 09/13] tracing/kprobes: Integrate duplicate set_print_fmt() Namhyung Kim
2013-10-29  6:53 ` [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer Namhyung Kim
2013-10-31 18:16   ` Oleg Nesterov
2013-11-01  9:00     ` Namhyung Kim
2013-11-04  8:06     ` Namhyung Kim
2013-11-04 14:35       ` Oleg Nesterov
2013-11-05  1:12         ` Namhyung Kim
2013-11-01 15:09   ` Oleg Nesterov
2013-11-01 15:22     ` Oleg Nesterov
2013-11-03 20:20       ` Oleg Nesterov
2013-11-04  8:11         ` Namhyung Kim
2013-11-04 14:38           ` Oleg Nesterov
2013-11-05  1:17             ` Namhyung Kim
2013-10-29  6:53 ` [PATCH 11/13] tracing/kprobes: Add priv argument to fetch functions Namhyung Kim
2013-11-04 16:09   ` Oleg Nesterov
2013-11-05  2:10     ` Namhyung Kim
2013-10-29  6:53 ` [PATCH 12/13] tracing/uprobes: Add more " Namhyung Kim
2013-10-31 18:22   ` Oleg Nesterov
2013-11-04  8:50     ` Namhyung Kim
2013-11-04 16:44       ` Oleg Nesterov
2013-11-04 17:17         ` Steven Rostedt
2013-11-05  2:19           ` Namhyung Kim
2013-11-05  2:17         ` Namhyung Kim
2013-11-01 17:53   ` Oleg Nesterov
2013-10-29  6:53 ` [PATCH 13/13] tracing/uprobes: Add support for full argument access methods Namhyung Kim
2013-10-30 10:36 ` [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6) Masami Hiramatsu
2013-11-02 15:54 ` Oleg Nesterov
2013-11-04  8:46   ` Namhyung Kim
2013-11-04  8:59     ` Namhyung Kim
2013-11-04 15:51       ` Oleg Nesterov
2013-11-04 16:22         ` Oleg Nesterov
2013-11-04 18:47           ` Oleg Nesterov
2013-11-04 18:57             ` Oleg Nesterov
2013-11-05  2:51               ` Namhyung Kim
2013-11-05 16:41                 ` Oleg Nesterov
2013-11-06  8:37                   ` Namhyung Kim
2013-11-05  2:49             ` Namhyung Kim
2013-11-05  6:58             ` Namhyung Kim
2013-11-05 17:45               ` Oleg Nesterov
2013-11-05 19:24                 ` Oleg Nesterov
2013-11-06  8:57                   ` Namhyung Kim
2013-11-06 17:37                     ` Oleg Nesterov
2013-11-06 18:24                       ` Oleg Nesterov
2013-11-07  9:00                         ` Namhyung Kim
2013-11-08 17:00                           ` Oleg Nesterov
2013-11-12  7:49                             ` Namhyung Kim
2013-11-07  8:48                       ` Namhyung Kim
2013-11-09  3:18                         ` Masami Hiramatsu
2013-11-09 15:23                           ` Oleg Nesterov
2013-11-12  8:00                             ` Namhyung Kim
2013-11-12 18:44                               ` Oleg Nesterov
2013-11-25  6:59                               ` Namhyung Kim
2013-11-25 14:12                                 ` [PATCH] uprobes: Allocate ->utask before handler_chain() for tracing handlers Oleg Nesterov
2013-11-06  8:48                 ` [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v6) Namhyung Kim
2013-11-06 16:28                   ` Oleg Nesterov
2013-11-07  7:33                     ` Namhyung Kim
2013-11-08 16:52                       ` Oleg Nesterov
2013-11-05  2:15           ` Namhyung Kim
2013-11-05 16:33             ` Oleg Nesterov
2013-11-06  8:34               ` Namhyung Kim
2013-11-05  1:59         ` Namhyung Kim
2013-11-04 15:01     ` Oleg Nesterov
2013-11-05  1:53       ` Namhyung Kim
2013-11-05 16:28         ` Oleg Nesterov
2013-11-06  8:31           ` Namhyung Kim
  -- strict thread matches above, loose matches on Subject: below --
2013-09-03  5:44 [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v5) Namhyung Kim
2013-09-03  5:44 ` [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer Namhyung Kim
2013-09-03 10:50   ` Masami Hiramatsu
2013-09-04  7:08     ` Namhyung Kim
2013-09-03 11:10   ` zhangwei(Jovi)
2013-09-04  7:10     ` Namhyung Kim
2013-08-27  8:48 [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v4) Namhyung Kim
2013-08-27  8:48 ` [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer Namhyung Kim
2013-08-09  8:44 [PATCHSET 00/13] tracing/uprobes: Add support for more fetch methods (v3) Namhyung Kim
2013-08-09  8:45 ` [PATCH 10/13] tracing/uprobes: Fetch args before reserving a ring buffer Namhyung Kim
2013-08-09  9:56   ` Masami Hiramatsu
2013-08-09 16:20     ` Oleg Nesterov
2013-08-09 16:21       ` Oleg Nesterov
2013-08-10  1:26       ` zhangwei(Jovi)
2013-08-10  1:41         ` zhangwei(Jovi)
2013-08-10 14:06           ` zhangwei(Jovi)
2013-08-22 16:42     ` Steven Rostedt
2013-08-22 23:57       ` zhangwei(Jovi)
2013-08-23  1:08         ` Steven Rostedt
2013-08-27  8:07           ` Namhyung Kim
2013-08-23  4:22         ` Masami Hiramatsu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.