linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/1] tracing: Introduce "pseudo registers" for FETCH_MTD_reg
@ 2013-11-23 20:15 Oleg Nesterov
  2013-11-23 20:16 ` [PATCH 1/1] " Oleg Nesterov
                   ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Oleg Nesterov @ 2013-11-23 20:15 UTC (permalink / raw)
  To: Steven Rostedt, Namhyung Kim, Masami Hiramatsu,
	Frederic Weisbecker, Ingo Molnar, Jiri Olsa
  Cc: linux-kernel

Hello,

After I looked at Namhyung's patches I learned that trace_probes
has a rich set of ->fetch methods.

But perhaps it makes sense to add a bit more? The patch is simple
and the idea looks natural. If this patch is accepted, I can try
to add a bit more hacks to fetch the per-cpu data, but this needs
a couple of simple preparations.

Oleg.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 1/1] tracing: Introduce "pseudo registers" for FETCH_MTD_reg
  2013-11-23 20:15 [PATCH 0/1] tracing: Introduce "pseudo registers" for FETCH_MTD_reg Oleg Nesterov
@ 2013-11-23 20:16 ` Oleg Nesterov
  2013-11-24  7:32   ` Masami Hiramatsu
  2013-11-25 17:21 ` [RFC PATCH 0/3] tracing: Teach FETCH_MTD_{symbol,deref} to handle per-cpu data Oleg Nesterov
  2013-11-25 19:29 ` [PATCH 0/2] tracing: Add $cpu and $current probe-vars Oleg Nesterov
  2 siblings, 1 reply; 32+ messages in thread
From: Oleg Nesterov @ 2013-11-23 20:16 UTC (permalink / raw)
  To: Steven Rostedt, Namhyung Kim, Masami Hiramatsu,
	Frederic Weisbecker, Ingo Molnar, Jiri Olsa
  Cc: linux-kernel

The probe can dump the registers or memory, but it is not possible
to dump, say, current->pid. This patch adds the pseudo regs table,
currently it has only two methods to get current/smp_processor_id
but it can be trivially extended.

This syntax is %%pseudo-reg-name, I agree in advance with any other
naming.

Test-case: 452 == offsetof(task_struct, pid), 39 == __NR_getpid,

  # perf probe 'sys_getpid%return ret=$retval:s32 pid=+452(%%current):s32'
  # perf record -e probe:sys_getpid perl -e 'syscall 39'
  # perf --no-pager script | tail -1
	perl   586 [001]   753.102549: probe:sys_getpid: \
	(ffffffff81052c00 <- ffffffff8134d012) ret=586 pid=586

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/trace/trace_probe.c |   57 +++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 54 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 412e959..e167c0a 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -109,13 +109,14 @@ DEFINE_FETCH_##method(u64)
 	  (FETCH_FUNC_NAME(method, string_size) == fn)) \
 	 && (fn != NULL))
 
+static unsigned long probe_get_register(struct pt_regs *, unsigned long);
+
 /* Data fetch function templates */
 #define DEFINE_FETCH_reg(type)						\
 static __kprobes void FETCH_FUNC_NAME(reg, type)(struct pt_regs *regs,	\
 					void *offset, void *dest)	\
 {									\
-	*(type *)dest = (type)regs_get_register(regs,			\
-				(unsigned int)((unsigned long)offset));	\
+	*(type *)dest = (type)probe_get_register(regs, (long)offset);	\
 }
 DEFINE_BASIC_FETCH_FUNCS(reg)
 /* No string on the register */
@@ -548,6 +549,52 @@ static int parse_probe_vars(char *arg, const struct fetch_type *t,
 	return ret;
 }
 
+#define PSEUDO_REG_OFFSET	4096	/* arbitrary value > MAX_REG_OFFSET */
+
+static unsigned long pseudo_reg_cpu(void)
+{
+	return (unsigned long)raw_smp_processor_id();
+}
+
+static unsigned long pseudo_reg_current(void)
+{
+	return (unsigned long)current;
+}
+
+static struct {
+	const char *name;
+	unsigned long (*fetch)(void);
+}
+const pseudo_reg_table[] = {
+	{
+		.name	= "cpu",
+		.fetch	= pseudo_reg_cpu,
+	},
+	{
+		.name	= "current",
+		.fetch	= pseudo_reg_current,
+	},
+};
+
+static unsigned long probe_get_register(struct pt_regs *regs, unsigned long offset)
+{
+	if (offset < PSEUDO_REG_OFFSET)
+		return regs_get_register(regs, offset);
+
+	return pseudo_reg_table[offset - PSEUDO_REG_OFFSET].fetch();
+}
+
+static int pseudo_reg_query_offset(const char *name)
+{
+	int nr;
+
+	for (nr = 0; nr < ARRAY_SIZE(pseudo_reg_table); ++nr)
+		if (strcmp(pseudo_reg_table[nr].name, name) == 0)
+			return PSEUDO_REG_OFFSET + nr;
+
+	return -EINVAL;
+}
+
 /* Recursive argument parser */
 static int parse_probe_arg(char *arg, const struct fetch_type *t,
 		     struct fetch_param *f, bool is_return, bool is_kprobe)
@@ -569,7 +616,11 @@ static int parse_probe_arg(char *arg, const struct fetch_type *t,
 		break;
 
 	case '%':	/* named register */
-		ret = regs_query_register_offset(arg + 1);
+		if (arg[1] == '%')
+			ret = pseudo_reg_query_offset(arg + 2);
+		else
+			ret = regs_query_register_offset(arg + 1);
+
 		if (ret >= 0) {
 			f->fn = t->fetch[FETCH_MTD_reg];
 			f->data = (void *)(unsigned long)ret;
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/1] tracing: Introduce "pseudo registers" for FETCH_MTD_reg
  2013-11-23 20:16 ` [PATCH 1/1] " Oleg Nesterov
@ 2013-11-24  7:32   ` Masami Hiramatsu
  2013-11-25  8:04     ` Namhyung Kim
  0 siblings, 1 reply; 32+ messages in thread
From: Masami Hiramatsu @ 2013-11-24  7:32 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Namhyung Kim, Frederic Weisbecker, Ingo Molnar,
	Jiri Olsa, linux-kernel

(2013/11/24 5:16), Oleg Nesterov wrote:
> The probe can dump the registers or memory, but it is not possible
> to dump, say, current->pid. This patch adds the pseudo regs table,
> currently it has only two methods to get current/smp_processor_id
> but it can be trivially extended.

Good catch! :)

> This syntax is %%pseudo-reg-name, I agree in advance with any other
> naming.

For this kind of use, kprobe-tracer already provides '$' prefix :)
e.g. $stack, $retval etc. Please see parse_probe_vars in trace_probe.c.

So, $current and $cpu is better for me.

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/1] tracing: Introduce "pseudo registers" for FETCH_MTD_reg
  2013-11-24  7:32   ` Masami Hiramatsu
@ 2013-11-25  8:04     ` Namhyung Kim
  2013-11-25 14:35       ` Oleg Nesterov
  0 siblings, 1 reply; 32+ messages in thread
From: Namhyung Kim @ 2013-11-25  8:04 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Oleg Nesterov, Steven Rostedt, Namhyung Kim, Frederic Weisbecker,
	Ingo Molnar, Jiri Olsa, linux-kernel

On Sun, 24 Nov 2013 16:32:53 +0900, Masami Hiramatsu wrote:
> (2013/11/24 5:16), Oleg Nesterov wrote:
>> The probe can dump the registers or memory, but it is not possible
>> to dump, say, current->pid. This patch adds the pseudo regs table,
>> currently it has only two methods to get current/smp_processor_id
>> but it can be trivially extended.
>
> Good catch! :)
>
>> This syntax is %%pseudo-reg-name, I agree in advance with any other
>> naming.
>
> For this kind of use, kprobe-tracer already provides '$' prefix :)
> e.g. $stack, $retval etc. Please see parse_probe_vars in trace_probe.c.
>
> So, $current and $cpu is better for me.

Agreed.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/1] tracing: Introduce "pseudo registers" for FETCH_MTD_reg
  2013-11-25  8:04     ` Namhyung Kim
@ 2013-11-25 14:35       ` Oleg Nesterov
  0 siblings, 0 replies; 32+ messages in thread
From: Oleg Nesterov @ 2013-11-25 14:35 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Masami Hiramatsu, Steven Rostedt, Namhyung Kim,
	Frederic Weisbecker, Ingo Molnar, Jiri Olsa, linux-kernel

On 11/25, Namhyung Kim wrote:
>
> On Sun, 24 Nov 2013 16:32:53 +0900, Masami Hiramatsu wrote:
> > (2013/11/24 5:16), Oleg Nesterov wrote:
> >
> >> This syntax is %%pseudo-reg-name, I agree in advance with any other
> >> naming.
> >
> > For this kind of use, kprobe-tracer already provides '$' prefix :)
> > e.g. $stack, $retval etc. Please see parse_probe_vars in trace_probe.c.
> >
> > So, $current and $cpu is better for me.
>
> Agreed.

OK, thanks, I'll redo.

But I hope you will agree with overloading of FETCH_MTD_reg, I do not
want to add yet another method.



In fact I thought about $current too, and I agree it looks better. But
this conflicts with other changes I had in mind. I was going to add the
support for @symbol+%reg and +%reg(...), then add the %%percpu_shift
pseudo reg. But on a second thought this doesn't look very nice, so
lets use $name.

Oleg.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC PATCH 0/3] tracing: Teach FETCH_MTD_{symbol,deref} to handle per-cpu data
  2013-11-23 20:15 [PATCH 0/1] tracing: Introduce "pseudo registers" for FETCH_MTD_reg Oleg Nesterov
  2013-11-23 20:16 ` [PATCH 1/1] " Oleg Nesterov
@ 2013-11-25 17:21 ` Oleg Nesterov
  2013-11-25 17:21   ` [RFC PATCH 1/3] tracing: Don't mangle sc->addr in update_symbol_cache() Oleg Nesterov
                     ` (2 more replies)
  2013-11-25 19:29 ` [PATCH 0/2] tracing: Add $cpu and $current probe-vars Oleg Nesterov
  2 siblings, 3 replies; 32+ messages in thread
From: Oleg Nesterov @ 2013-11-25 17:21 UTC (permalink / raw)
  To: Steven Rostedt, Namhyung Kim, Masami Hiramatsu,
	Frederic Weisbecker, Ingo Molnar, Jiri Olsa
  Cc: linux-kernel

On 11/23, Oleg Nesterov wrote:
>
> After I looked at Namhyung's patches I learned that trace_probes
> has a rich set of ->fetch methods.
>
> But perhaps it makes sense to add a bit more? The patch is simple
> and the idea looks natural. If this patch is accepted, I can try
> to add a bit more hacks to fetch the per-cpu data, but this needs
> a couple of simple preparations.

Before I redo/resend this patch, let me show the next changes which
allow to dump the per-cpu data.

Not for inclusion! I am not sure if this is really useful, I am not
sure about the syntax or implementation, etc. Just for discussion,
please comment.

Oleg.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC PATCH 1/3] tracing: Don't mangle sc->addr in update_symbol_cache()
  2013-11-25 17:21 ` [RFC PATCH 0/3] tracing: Teach FETCH_MTD_{symbol,deref} to handle per-cpu data Oleg Nesterov
@ 2013-11-25 17:21   ` Oleg Nesterov
  2013-11-25 17:21   ` [RFC PATCH 2/3] tracing: introduce {calc,parse}_probe_offset() for FETCH_MTD_{symbol,deref} Oleg Nesterov
  2013-11-25 17:22   ` [RFC PATCH 3/3] tracing: Teach FETCH_MTD_{symbol,deref} to handle per-cpu data Oleg Nesterov
  2 siblings, 0 replies; 32+ messages in thread
From: Oleg Nesterov @ 2013-11-25 17:21 UTC (permalink / raw)
  To: Steven Rostedt, Namhyung Kim, Masami Hiramatsu,
	Frederic Weisbecker, Ingo Molnar, Jiri Olsa
  Cc: linux-kernel

Preparation for the next changes. Shift "+= offset" from
update_symbol_cache() to ->fetch[FETCH_MTD_symbol], this
allows to overload the meaning of sc->offset.

Also remove the !offset check in traceprobe_split_symbol_offset(),
it is never called with offset == NULL.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/trace/trace_probe.c |   13 +++----------
 1 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index e167c0a..dcf7780 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -230,14 +230,9 @@ struct symbol_cache {
 	unsigned long	addr;
 };
 
-static unsigned long update_symbol_cache(struct symbol_cache *sc)
+static void update_symbol_cache(struct symbol_cache *sc)
 {
 	sc->addr = (unsigned long)kallsyms_lookup_name(sc->symbol);
-
-	if (sc->addr)
-		sc->addr += sc->offset;
-
-	return sc->addr;
 }
 
 static void free_symbol_cache(struct symbol_cache *sc)
@@ -273,8 +268,9 @@ static __kprobes void FETCH_FUNC_NAME(symbol, type)(struct pt_regs *regs,\
 					  void *data, void *dest)	\
 {									\
 	struct symbol_cache *sc = data;					\
+	long off = sc->offset;						\
 	if (sc->addr)							\
-		fetch_memory_##type(regs, (void *)sc->addr, dest);	\
+		fetch_memory_##type(regs, (void *)sc->addr + off, dest);\
 	else								\
 		*(type *)dest = 0;					\
 }
@@ -497,9 +493,6 @@ int traceprobe_split_symbol_offset(char *symbol, unsigned long *offset)
 	char *tmp;
 	int ret;
 
-	if (!offset)
-		return -EINVAL;
-
 	tmp = strchr(symbol, '+');
 	if (tmp) {
 		/* skip sign because kstrtoul doesn't accept '+' */
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 2/3] tracing: introduce {calc,parse}_probe_offset() for FETCH_MTD_{symbol,deref}
  2013-11-25 17:21 ` [RFC PATCH 0/3] tracing: Teach FETCH_MTD_{symbol,deref} to handle per-cpu data Oleg Nesterov
  2013-11-25 17:21   ` [RFC PATCH 1/3] tracing: Don't mangle sc->addr in update_symbol_cache() Oleg Nesterov
@ 2013-11-25 17:21   ` Oleg Nesterov
  2013-11-25 17:22   ` [RFC PATCH 3/3] tracing: Teach FETCH_MTD_{symbol,deref} to handle per-cpu data Oleg Nesterov
  2 siblings, 0 replies; 32+ messages in thread
From: Oleg Nesterov @ 2013-11-25 17:21 UTC (permalink / raw)
  To: Steven Rostedt, Namhyung Kim, Masami Hiramatsu,
	Frederic Weisbecker, Ingo Molnar, Jiri Olsa
  Cc: linux-kernel

Preparation to simplify the review of the next patch.

Add 2 "dummy" helpers used by FETCH_MTD_{symbol,deref} parse/fetch
code: calc_probe_offset(offset) which currently simply returns its
arg, and parse_probe_offset(name, offset) which calls kstrtol().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/trace/trace_probe.c |   27 +++++++++++++++++++++------
 1 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index dcf7780..723e6e9 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -263,12 +263,14 @@ static struct symbol_cache *alloc_symbol_cache(const char *sym, long offset)
 	return sc;
 }
 
+static long calc_probe_offset(unsigned long offset);
+
 #define DEFINE_FETCH_symbol(type)					\
 static __kprobes void FETCH_FUNC_NAME(symbol, type)(struct pt_regs *regs,\
 					  void *data, void *dest)	\
 {									\
 	struct symbol_cache *sc = data;					\
-	long off = sc->offset;						\
+	long off = calc_probe_offset(sc->offset);			\
 	if (sc->addr)							\
 		fetch_memory_##type(regs, (void *)sc->addr + off, dest);\
 	else								\
@@ -292,7 +294,7 @@ static __kprobes void FETCH_FUNC_NAME(deref, type)(struct pt_regs *regs,\
 	unsigned long addr;						\
 	call_fetch(&dprm->orig, regs, &addr);				\
 	if (addr) {							\
-		addr += dprm->offset;					\
+		addr += calc_probe_offset(dprm->offset);		\
 		fetch_memory_##type(regs, (void *)addr, dest);		\
 	} else								\
 		*(type *)dest = 0;					\
@@ -487,6 +489,8 @@ static fetch_func_t get_fetch_size_function(const struct fetch_type *type,
 	return NULL;
 }
 
+static int parse_probe_offset(const char *name, unsigned long *offset);
+
 /* Split symbol and offset. */
 int traceprobe_split_symbol_offset(char *symbol, unsigned long *offset)
 {
@@ -495,8 +499,7 @@ int traceprobe_split_symbol_offset(char *symbol, unsigned long *offset)
 
 	tmp = strchr(symbol, '+');
 	if (tmp) {
-		/* skip sign because kstrtoul doesn't accept '+' */
-		ret = kstrtoul(tmp + 1, 0, offset);
+		ret = parse_probe_offset(tmp, offset);
 		if (ret)
 			return ret;
 
@@ -588,6 +591,19 @@ static int pseudo_reg_query_offset(const char *name)
 	return -EINVAL;
 }
 
+static long calc_probe_offset(unsigned long offset)
+{
+	return offset;
+}
+
+static int parse_probe_offset(const char *name, unsigned long *offset)
+{
+	if (name[0] == '+')	/* kstrtol() rejects '+' */
+		name++;
+
+	return kstrtol(name, 0, offset);
+}
+
 /* Recursive argument parser */
 static int parse_probe_arg(char *arg, const struct fetch_type *t,
 		     struct fetch_param *f, bool is_return, bool is_kprobe)
@@ -641,14 +657,13 @@ static int parse_probe_arg(char *arg, const struct fetch_type *t,
 		break;
 
 	case '+':	/* deref memory */
-		arg++;	/* Skip '+', because kstrtol() rejects it. */
 	case '-':
 		tmp = strchr(arg, '(');
 		if (!tmp)
 			break;
 
 		*tmp = '\0';
-		ret = kstrtol(arg, 0, &offset);
+		ret = parse_probe_offset(arg, &offset);
 
 		if (ret)
 			break;
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 3/3] tracing: Teach FETCH_MTD_{symbol,deref} to handle per-cpu data
  2013-11-25 17:21 ` [RFC PATCH 0/3] tracing: Teach FETCH_MTD_{symbol,deref} to handle per-cpu data Oleg Nesterov
  2013-11-25 17:21   ` [RFC PATCH 1/3] tracing: Don't mangle sc->addr in update_symbol_cache() Oleg Nesterov
  2013-11-25 17:21   ` [RFC PATCH 2/3] tracing: introduce {calc,parse}_probe_offset() for FETCH_MTD_{symbol,deref} Oleg Nesterov
@ 2013-11-25 17:22   ` Oleg Nesterov
  2013-11-26  9:34     ` Masami Hiramatsu
  2 siblings, 1 reply; 32+ messages in thread
From: Oleg Nesterov @ 2013-11-25 17:22 UTC (permalink / raw)
  To: Steven Rostedt, Namhyung Kim, Masami Hiramatsu,
	Frederic Weisbecker, Ingo Molnar, Jiri Olsa
  Cc: linux-kernel

@symbol can't be used to dump the per-cpu variables. The same is
true for +offset(something) if "something" results in __percpu
pointer.

With this patch parse_probe_offset() treats "~" before the numeric
offset as "per cpu" mark and stores it in the lowest bit,
calc_probe_offset() simply adds per_cpu_offset(smp_processor_id())
if this bit is set.

We could turn {dprm,sc}->offset into the "struct probe_offset" which
holds offset + is_percpu, but this hack looks simple enough and I hope
that LONG_MAX/2 is enough for the numeric offset.

Test-case: 2088 == offsetof(struct rq, curr)

  # perf probe 'do_exit curr=%%current rq_curr=@runqueues+~2088:u64'
  # perf record -e probe:do_exit true
  # perf --no-pager script | tail -1
	true   537 [000]   521.282640: probe:do_exit: (ffffffff8103dd60) \
	curr=ffff88001da8c900 rq_curr=ffff88001da8c900

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/trace/trace_probe.c |   35 ++++++++++++++++++++++++++++++++---
 1 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 723e6e9..bcf6827 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -593,15 +593,44 @@ static int pseudo_reg_query_offset(const char *name)
 
 static long calc_probe_offset(unsigned long offset)
 {
-	return offset;
+	long off = offset >> 1;
+
+	if (offset & 1)
+		off += per_cpu_offset(raw_smp_processor_id());
+
+	return off;
 }
 
 static int parse_probe_offset(const char *name, unsigned long *offset)
 {
-	if (name[0] == '+')	/* kstrtol() rejects '+' */
+	bool negative = false;
+	long percpu_bit = 0;
+	long off = 0;
+
+	switch (name[0]) {
+	case '-':
+		negative = true;
+	case '+':
 		name++;
+	}
 
-	return kstrtol(name, 0, offset);
+	if (name[0] == '~') {
+		percpu_bit = 1;
+		name++;
+	}
+
+	if (name[0]) {
+		int err = kstrtoul(name, 0, &off);
+
+		if (err || off >= LONG_MAX/2)
+			return -EINVAL;
+
+		if (negative)
+			off = -off;
+	}
+
+	*offset = (off << 1) | percpu_bit;
+	return 0;
 }
 
 /* Recursive argument parser */
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 0/2] tracing: Add $cpu and $current probe-vars
  2013-11-23 20:15 [PATCH 0/1] tracing: Introduce "pseudo registers" for FETCH_MTD_reg Oleg Nesterov
  2013-11-23 20:16 ` [PATCH 1/1] " Oleg Nesterov
  2013-11-25 17:21 ` [RFC PATCH 0/3] tracing: Teach FETCH_MTD_{symbol,deref} to handle per-cpu data Oleg Nesterov
@ 2013-11-25 19:29 ` Oleg Nesterov
  2013-11-25 19:29   ` [PATCH 1/2] " Oleg Nesterov
  2013-11-25 19:30   ` [PATCH 2/2] tracing: Kill FETCH_MTD_retval, reimplement $retval via pseudo_reg_retval() Oleg Nesterov
  2 siblings, 2 replies; 32+ messages in thread
From: Oleg Nesterov @ 2013-11-25 19:29 UTC (permalink / raw)
  To: Steven Rostedt, Namhyung Kim, Masami Hiramatsu,
	Frederic Weisbecker, Ingo Molnar, Jiri Olsa
  Cc: linux-kernel

On 11/23, Oleg Nesterov wrote:
>
> Hello,
>
> After I looked at Namhyung's patches I learned that trace_probes
> has a rich set of ->fetch methods.
>
> But perhaps it makes sense to add a bit more? The patch is simple
> and the idea looks natural. If this patch is accepted, I can try
> to add a bit more hacks to fetch the per-cpu data, but this needs
> a couple of simple preparations.

OK, please see v2.

As Masami and Namhyung suggested it uses '$' prefix instead of '%%'.

Oleg.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 1/2] tracing: Add $cpu and $current probe-vars
  2013-11-25 19:29 ` [PATCH 0/2] tracing: Add $cpu and $current probe-vars Oleg Nesterov
@ 2013-11-25 19:29   ` Oleg Nesterov
  2013-11-26  2:21     ` Masami Hiramatsu
  2013-11-25 19:30   ` [PATCH 2/2] tracing: Kill FETCH_MTD_retval, reimplement $retval via pseudo_reg_retval() Oleg Nesterov
  1 sibling, 1 reply; 32+ messages in thread
From: Oleg Nesterov @ 2013-11-25 19:29 UTC (permalink / raw)
  To: Steven Rostedt, Namhyung Kim, Masami Hiramatsu,
	Frederic Weisbecker, Ingo Molnar, Jiri Olsa
  Cc: linux-kernel

The probe can dump the registers or memory, but it is not possible
to dump, say, current->pid. This patch adds the pseudo regs table,
currently it has only two methods to get current/smp_processor_id
but it can be trivially extended.

The syntax is '$cpu' and '$current', we overload FETCH_MTD_reg just
to avoid the new fetch method(s).

Test-case: 452 == offsetof(task_struct, pid), 39 == __NR_getpid,

  # perf probe 'sys_getpid%return ret=$retval:s32 pid=+452($current):s32'
  # perf record -e probe:sys_getpid perl -e 'syscall 39'
  # perf --no-pager script | tail -1
	perl   586 [001]   753.102549: probe:sys_getpid: \
	(ffffffff81052c00 <- ffffffff8134d012) ret=586 pid=586

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/trace/trace_probe.c |   61 +++++++++++++++++++++++++++++++++++++++++---
 1 files changed, 57 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 412e959..b066e9d 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -109,13 +109,14 @@ DEFINE_FETCH_##method(u64)
 	  (FETCH_FUNC_NAME(method, string_size) == fn)) \
 	 && (fn != NULL))
 
+static unsigned long probe_get_register(struct pt_regs *, unsigned long);
+
 /* Data fetch function templates */
 #define DEFINE_FETCH_reg(type)						\
 static __kprobes void FETCH_FUNC_NAME(reg, type)(struct pt_regs *regs,	\
 					void *offset, void *dest)	\
 {									\
-	*(type *)dest = (type)regs_get_register(regs,			\
-				(unsigned int)((unsigned long)offset));	\
+	*(type *)dest = (type)probe_get_register(regs, (long)offset);	\
 }
 DEFINE_BASIC_FETCH_FUNCS(reg)
 /* No string on the register */
@@ -513,6 +514,52 @@ int traceprobe_split_symbol_offset(char *symbol, unsigned long *offset)
 	return 0;
 }
 
+static unsigned long pseudo_reg_cpu(void)
+{
+	return (unsigned long)raw_smp_processor_id();
+}
+
+static unsigned long pseudo_reg_current(void)
+{
+	return (unsigned long)current;
+}
+
+static struct {
+	const char *name;
+	unsigned long (*fetch)(void);
+}
+const pseudo_reg_table[] = {
+	{
+		.name	= "cpu",
+		.fetch	= pseudo_reg_cpu,
+	},
+	{
+		.name	= "current",
+		.fetch	= pseudo_reg_current,
+	},
+};
+
+#define PSEUDO_REG_OFFSET	4096	/* arbitrary value > MAX_REG_OFFSET */
+
+static unsigned long probe_get_register(struct pt_regs *regs, unsigned long offset)
+{
+	if (offset < PSEUDO_REG_OFFSET)
+		return regs_get_register(regs, offset);
+
+	return pseudo_reg_table[offset - PSEUDO_REG_OFFSET].fetch();
+}
+
+static int pseudo_reg_query_offset(const char *name)
+{
+	int nr;
+
+	for (nr = 0; nr < ARRAY_SIZE(pseudo_reg_table); ++nr)
+		if (strcmp(pseudo_reg_table[nr].name, name) == 0)
+			return PSEUDO_REG_OFFSET + nr;
+
+	return -EINVAL;
+}
+
 #define PARAM_MAX_STACK (THREAD_SIZE / sizeof(unsigned long))
 
 static int parse_probe_vars(char *arg, const struct fetch_type *t,
@@ -542,8 +589,14 @@ static int parse_probe_vars(char *arg, const struct fetch_type *t,
 			}
 		} else
 			ret = -EINVAL;
-	} else
-		ret = -EINVAL;
+	} else {
+		ret = pseudo_reg_query_offset(arg);
+		if (ret >= 0) {
+			f->fn = t->fetch[FETCH_MTD_reg];
+			f->data = (void *)(unsigned long)ret;
+			ret = 0;
+		}
+	}
 
 	return ret;
 }
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 2/2] tracing: Kill FETCH_MTD_retval, reimplement $retval via pseudo_reg_retval()
  2013-11-25 19:29 ` [PATCH 0/2] tracing: Add $cpu and $current probe-vars Oleg Nesterov
  2013-11-25 19:29   ` [PATCH 1/2] " Oleg Nesterov
@ 2013-11-25 19:30   ` Oleg Nesterov
  1 sibling, 0 replies; 32+ messages in thread
From: Oleg Nesterov @ 2013-11-25 19:30 UTC (permalink / raw)
  To: Steven Rostedt, Namhyung Kim, Masami Hiramatsu,
	Frederic Weisbecker, Ingo Molnar, Jiri Olsa
  Cc: linux-kernel

Now that we have pseudo_reg_table[] we can kill FETCH_MTD_retval
and related code, we can simply add the new pseudo_reg_retval()
entry into the pseudo_reg_table[],

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/trace/trace_probe.c |   42 +++++++++++++++++++-----------------------
 kernel/trace/trace_probe.h |    1 -
 2 files changed, 19 insertions(+), 24 deletions(-)

diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index b066e9d..fce222f 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -135,17 +135,6 @@ DEFINE_BASIC_FETCH_FUNCS(stack)
 #define fetch_stack_string	NULL
 #define fetch_stack_string_size	NULL
 
-#define DEFINE_FETCH_retval(type)					\
-static __kprobes void FETCH_FUNC_NAME(retval, type)(struct pt_regs *regs,\
-					  void *dummy, void *dest)	\
-{									\
-	*(type *)dest = (type)regs_return_value(regs);			\
-}
-DEFINE_BASIC_FETCH_FUNCS(retval)
-/* No string on the retval */
-#define fetch_retval_string		NULL
-#define fetch_retval_string_size	NULL
-
 #define DEFINE_FETCH_memory(type)					\
 static __kprobes void FETCH_FUNC_NAME(memory, type)(struct pt_regs *regs,\
 					  void *addr, void *dest)	\
@@ -394,7 +383,6 @@ free_bitfield_fetch_param(struct bitfield_fetch_param *data)
 	 .fetch = {					\
 ASSIGN_FETCH_FUNC(reg, ftype),				\
 ASSIGN_FETCH_FUNC(stack, ftype),			\
-ASSIGN_FETCH_FUNC(retval, ftype),			\
 ASSIGN_FETCH_FUNC(memory, ftype),			\
 ASSIGN_FETCH_FUNC(symbol, ftype),			\
 ASSIGN_FETCH_FUNC(deref, ftype),			\
@@ -514,22 +502,31 @@ int traceprobe_split_symbol_offset(char *symbol, unsigned long *offset)
 	return 0;
 }
 
-static unsigned long pseudo_reg_cpu(void)
+static unsigned long pseudo_reg_retval(struct pt_regs *regs)
+{
+	return regs_return_value(regs);
+}
+
+static unsigned long pseudo_reg_cpu(struct pt_regs *regs)
 {
 	return (unsigned long)raw_smp_processor_id();
 }
 
-static unsigned long pseudo_reg_current(void)
+static unsigned long pseudo_reg_current(struct pt_regs *regs)
 {
 	return (unsigned long)current;
 }
 
 static struct {
 	const char *name;
-	unsigned long (*fetch)(void);
+	unsigned long (*fetch)(struct pt_regs *regs);
 }
 const pseudo_reg_table[] = {
 	{
+		.name	= "retval",
+		.fetch	= pseudo_reg_retval,
+	},
+	{
 		.name	= "cpu",
 		.fetch	= pseudo_reg_cpu,
 	},
@@ -546,7 +543,7 @@ static unsigned long probe_get_register(struct pt_regs *regs, unsigned long offs
 	if (offset < PSEUDO_REG_OFFSET)
 		return regs_get_register(regs, offset);
 
-	return pseudo_reg_table[offset - PSEUDO_REG_OFFSET].fetch();
+	return pseudo_reg_table[offset - PSEUDO_REG_OFFSET].fetch(regs);
 }
 
 static int pseudo_reg_query_offset(const char *name)
@@ -568,12 +565,7 @@ static int parse_probe_vars(char *arg, const struct fetch_type *t,
 	int ret = 0;
 	unsigned long param;
 
-	if (strcmp(arg, "retval") == 0) {
-		if (is_return)
-			f->fn = t->fetch[FETCH_MTD_retval];
-		else
-			ret = -EINVAL;
-	} else if (strncmp(arg, "stack", 5) == 0) {
+	if (strncmp(arg, "stack", 5) == 0) {
 		if (arg[5] == '\0') {
 			if (strcmp(t->name, DEFAULT_FETCH_TYPE_STR) == 0)
 				f->fn = fetch_stack_address;
@@ -590,7 +582,11 @@ static int parse_probe_vars(char *arg, const struct fetch_type *t,
 		} else
 			ret = -EINVAL;
 	} else {
-		ret = pseudo_reg_query_offset(arg);
+		if (strcmp(arg, "retval") == 0 && !is_return)
+			ret = -EINVAL;
+		else
+			ret = pseudo_reg_query_offset(arg);
+
 		if (ret >= 0) {
 			f->fn = t->fetch[FETCH_MTD_reg];
 			f->data = (void *)(unsigned long)ret;
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 5c7e09d..b6f33b1 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -90,7 +90,6 @@ typedef int (*print_type_func_t)(struct trace_seq *, const char *, void *, void
 enum {
 	FETCH_MTD_reg = 0,
 	FETCH_MTD_stack,
-	FETCH_MTD_retval,
 	FETCH_MTD_memory,
 	FETCH_MTD_symbol,
 	FETCH_MTD_deref,
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/2] tracing: Add $cpu and $current probe-vars
  2013-11-25 19:29   ` [PATCH 1/2] " Oleg Nesterov
@ 2013-11-26  2:21     ` Masami Hiramatsu
  2013-11-26 17:23       ` Oleg Nesterov
  0 siblings, 1 reply; 32+ messages in thread
From: Masami Hiramatsu @ 2013-11-26  2:21 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Namhyung Kim, Frederic Weisbecker, Ingo Molnar,
	Jiri Olsa, linux-kernel

(2013/11/26 4:29), Oleg Nesterov wrote:
> The probe can dump the registers or memory, but it is not possible
> to dump, say, current->pid. This patch adds the pseudo regs table,
> currently it has only two methods to get current/smp_processor_id
> but it can be trivially extended.
> 
> The syntax is '$cpu' and '$current', we overload FETCH_MTD_reg just
> to avoid the new fetch method(s).
> 
> Test-case: 452 == offsetof(task_struct, pid), 39 == __NR_getpid,
> 
>   # perf probe 'sys_getpid%return ret=$retval:s32 pid=+452($current):s32'
>   # perf record -e probe:sys_getpid perl -e 'syscall 39'
>   # perf --no-pager script | tail -1
> 	perl   586 [001]   753.102549: probe:sys_getpid: \
> 	(ffffffff81052c00 <- ffffffff8134d012) ret=586 pid=586
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  kernel/trace/trace_probe.c |   61 +++++++++++++++++++++++++++++++++++++++++---
>  1 files changed, 57 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
> index 412e959..b066e9d 100644
> --- a/kernel/trace/trace_probe.c
> +++ b/kernel/trace/trace_probe.c
> @@ -109,13 +109,14 @@ DEFINE_FETCH_##method(u64)
>  	  (FETCH_FUNC_NAME(method, string_size) == fn)) \
>  	 && (fn != NULL))
>  
> +static unsigned long probe_get_register(struct pt_regs *, unsigned long);
> +
>  /* Data fetch function templates */
>  #define DEFINE_FETCH_reg(type)						\
>  static __kprobes void FETCH_FUNC_NAME(reg, type)(struct pt_regs *regs,	\
>  					void *offset, void *dest)	\
>  {									\
> -	*(type *)dest = (type)regs_get_register(regs,			\
> -				(unsigned int)((unsigned long)offset));	\
> +	*(type *)dest = (type)probe_get_register(regs, (long)offset);	\
>  }
>  DEFINE_BASIC_FETCH_FUNCS(reg)
>  /* No string on the register */
> @@ -513,6 +514,52 @@ int traceprobe_split_symbol_offset(char *symbol, unsigned long *offset)
>  	return 0;
>  }
>  
> +static unsigned long pseudo_reg_cpu(void)
> +{
> +	return (unsigned long)raw_smp_processor_id();
> +}
> +
> +static unsigned long pseudo_reg_current(void)
> +{
> +	return (unsigned long)current;
> +}
> +
> +static struct {
> +	const char *name;
> +	unsigned long (*fetch)(void);
> +}
> +const pseudo_reg_table[] = {
> +	{
> +		.name	= "cpu",
> +		.fetch	= pseudo_reg_cpu,
> +	},
> +	{
> +		.name	= "current",
> +		.fetch	= pseudo_reg_current,
> +	},
> +};
> +
> +#define PSEUDO_REG_OFFSET	4096	/* arbitrary value > MAX_REG_OFFSET */
> +
> +static unsigned long probe_get_register(struct pt_regs *regs, unsigned long offset)
> +{
> +	if (offset < PSEUDO_REG_OFFSET)
> +		return regs_get_register(regs, offset);
> +
> +	return pseudo_reg_table[offset - PSEUDO_REG_OFFSET].fetch();
> +}


Hmm, I don't like this, since fetch_reg is the most frequently
used fetch method, and it actually uses the offset in different
way.

Why don't you make another fetch function for vars?
Perhaps, you can make fetch_retval more generic.

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 3/3] tracing: Teach FETCH_MTD_{symbol,deref} to handle per-cpu data
  2013-11-25 17:22   ` [RFC PATCH 3/3] tracing: Teach FETCH_MTD_{symbol,deref} to handle per-cpu data Oleg Nesterov
@ 2013-11-26  9:34     ` Masami Hiramatsu
  2013-11-26 17:43       ` [RFC PATCH 0/2] tracing: Teach FETCH_MTD_symbol " Oleg Nesterov
  0 siblings, 1 reply; 32+ messages in thread
From: Masami Hiramatsu @ 2013-11-26  9:34 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Namhyung Kim, Frederic Weisbecker, Ingo Molnar,
	Jiri Olsa, linux-kernel

(2013/11/26 2:22), Oleg Nesterov wrote:
> @symbol can't be used to dump the per-cpu variables. The same is
> true for +offset(something) if "something" results in __percpu
> pointer.
>
> With this patch parse_probe_offset() treats "~" before the numeric
> offset as "per cpu" mark and stores it in the lowest bit,
> calc_probe_offset() simply adds per_cpu_offset(smp_processor_id())
> if this bit is set.

IMHO, if the symbol is a per-cpu symbol, it is meaningless to
access the symbol itself. For the symbol(static) percpu, maybe
we can use is_kernel_percpu_address() to check. If the symbol is
percpu, it should be automatically translated to something like
FETCH_percpu, instead of such additional expression.

For the dynamic allocated per-cpu things, it is also good to give
a straight operation, like +10(percpu(%rdi)).

> We could turn {dprm,sc}->offset into the "struct probe_offset" which
> holds offset + is_percpu, but this hack looks simple enough and I hope
> that LONG_MAX/2 is enough for the numeric offset.

Also, please do not be afraid to add fetch operation, such kind of
is_percpu flag is not dynamically changed. Then we can use another
fetch function.

Thank you,



-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/2] tracing: Add $cpu and $current probe-vars
  2013-11-26  2:21     ` Masami Hiramatsu
@ 2013-11-26 17:23       ` Oleg Nesterov
  2013-11-27  8:22         ` Masami Hiramatsu
  0 siblings, 1 reply; 32+ messages in thread
From: Oleg Nesterov @ 2013-11-26 17:23 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Namhyung Kim, Frederic Weisbecker, Ingo Molnar,
	Jiri Olsa, linux-kernel

On 11/26, Masami Hiramatsu wrote:
>
> (2013/11/26 4:29), Oleg Nesterov wrote:
> > +#define PSEUDO_REG_OFFSET	4096	/* arbitrary value > MAX_REG_OFFSET */
> > +
> > +static unsigned long probe_get_register(struct pt_regs *regs, unsigned long offset)
> > +{
> > +	if (offset < PSEUDO_REG_OFFSET)
> > +		return regs_get_register(regs, offset);
> > +
> > +	return pseudo_reg_table[offset - PSEUDO_REG_OFFSET].fetch();
> > +}
>
>
> Hmm, I don't like this, since fetch_reg is the most frequently
> used fetch method, and it actually uses the offset in different
> way.

Sure, this overloads the usage of FETCH_MTD_reg/offset.

And btw, yes, the naming is ugly (I mean pseudo_*). But why this
is bad? This is cheap and simple.

And,

> Why don't you make another fetch function for vars?

This is what I tried to avoid ;) I do not want to add another
FETCH_MTD_. To me, this looks like unnecessary complication and
bloat (but see below).

> Perhaps, you can make fetch_retval more generic.

Please look at 2/2. It kills fetch_retval because we make
fetch_reg more generic (this patch), and this saves 300 bytes.

However. I understand that this is subjective, so please ignore.

Thanks,

Oleg.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC PATCH 0/2] tracing: Teach FETCH_MTD_symbol to handle per-cpu data
  2013-11-26  9:34     ` Masami Hiramatsu
@ 2013-11-26 17:43       ` Oleg Nesterov
  2013-11-26 17:44         ` [RFC PATCH 1/2] tracing: Don't update sc->addr in update_symbol_cache() Oleg Nesterov
                           ` (3 more replies)
  0 siblings, 4 replies; 32+ messages in thread
From: Oleg Nesterov @ 2013-11-26 17:43 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Namhyung Kim, Frederic Weisbecker, Ingo Molnar,
	Jiri Olsa, linux-kernel, Rusty Russell

On 11/26, Masami Hiramatsu wrote:
>
> (2013/11/26 2:22), Oleg Nesterov wrote:
> > @symbol can't be used to dump the per-cpu variables. The same is
> > true for +offset(something) if "something" results in __percpu
> > pointer.
> >
> > With this patch parse_probe_offset() treats "~" before the numeric
> > offset as "per cpu" mark and stores it in the lowest bit,
> > calc_probe_offset() simply adds per_cpu_offset(smp_processor_id())
> > if this bit is set.
>
> IMHO, if the symbol is a per-cpu symbol, it is meaningless to
> access the symbol itself. For the symbol(static) percpu, maybe
> we can use is_kernel_percpu_address() to check.

No, we can't use is_kernel_percpu_address(), this is another thing.
is_kernel_percpu_address(&my_pcpu_var) == F. It is true for
&per_cpu(my_pcpu_var), this is not what we need.

> If the symbol is
> percpu, it should be automatically translated to something like
> FETCH_percpu, instead of such additional expression.

OK. Probably yes, it should be automatically translated, please
see the patches.

This doesn't allow to read the data from other CPUs, but at least
the changes are simple and this_cpu_ is better than the reading
from the obviously wrong address.

> For the dynamic allocated per-cpu things, it is also good to give
> a straight operation, like +10(percpu(%rdi)).

Probably yes, but this needs more changes and I am still not sure
this is really useful. And if we do this, we probably also need
to allow to read from other CPUs...

> Also, please do not be afraid to add fetch operation,

Well, this is what I am trying to avoid ;)

Oleg.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC PATCH 1/2] tracing: Don't update sc->addr in update_symbol_cache()
  2013-11-26 17:43       ` [RFC PATCH 0/2] tracing: Teach FETCH_MTD_symbol " Oleg Nesterov
@ 2013-11-26 17:44         ` Oleg Nesterov
  2013-11-26 17:44         ` [RFC PATCH 2/2] tracing: Teach FETCH_MTD_symbol to handle per-cpu data Oleg Nesterov
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 32+ messages in thread
From: Oleg Nesterov @ 2013-11-26 17:44 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Namhyung Kim, Frederic Weisbecker, Ingo Molnar,
	Jiri Olsa, linux-kernel, Rusty Russell

Preparation for the next changes. Shift "+= offset" from
update_symbol_cache() to ->fetch[FETCH_MTD_symbol], this
allows to detect the per-cpu symbols.

Also remove the !offset check in traceprobe_split_symbol_offset(),
it is never called with offset == NULL.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/trace/trace_probe.c |   13 +++----------
 1 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index fce222f..097cc80 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -219,14 +219,9 @@ struct symbol_cache {
 	unsigned long	addr;
 };
 
-static unsigned long update_symbol_cache(struct symbol_cache *sc)
+static void update_symbol_cache(struct symbol_cache *sc)
 {
 	sc->addr = (unsigned long)kallsyms_lookup_name(sc->symbol);
-
-	if (sc->addr)
-		sc->addr += sc->offset;
-
-	return sc->addr;
 }
 
 static void free_symbol_cache(struct symbol_cache *sc)
@@ -262,8 +257,9 @@ static __kprobes void FETCH_FUNC_NAME(symbol, type)(struct pt_regs *regs,\
 					  void *data, void *dest)	\
 {									\
 	struct symbol_cache *sc = data;					\
+	long off = sc->offset;						\
 	if (sc->addr)							\
-		fetch_memory_##type(regs, (void *)sc->addr, dest);	\
+		fetch_memory_##type(regs, (void *)sc->addr + off, dest);\
 	else								\
 		*(type *)dest = 0;					\
 }
@@ -485,9 +481,6 @@ int traceprobe_split_symbol_offset(char *symbol, unsigned long *offset)
 	char *tmp;
 	int ret;
 
-	if (!offset)
-		return -EINVAL;
-
 	tmp = strchr(symbol, '+');
 	if (tmp) {
 		/* skip sign because kstrtoul doesn't accept '+' */
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 2/2] tracing: Teach FETCH_MTD_symbol to handle per-cpu data
  2013-11-26 17:43       ` [RFC PATCH 0/2] tracing: Teach FETCH_MTD_symbol " Oleg Nesterov
  2013-11-26 17:44         ` [RFC PATCH 1/2] tracing: Don't update sc->addr in update_symbol_cache() Oleg Nesterov
@ 2013-11-26 17:44         ` Oleg Nesterov
  2013-11-26 17:50           ` modules, add_kallsyms() && DEFINE_PER_CPU Oleg Nesterov
  2013-11-27 11:30           ` [RFC PATCH 2/2] tracing: Teach FETCH_MTD_symbol to handle per-cpu data Masami Hiramatsu
  2013-11-27  0:37         ` [RFC PATCH 0/2] " Namhyung Kim
  2013-11-27 10:01         ` Masami Hiramatsu
  3 siblings, 2 replies; 32+ messages in thread
From: Oleg Nesterov @ 2013-11-26 17:44 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Namhyung Kim, Frederic Weisbecker, Ingo Molnar,
	Jiri Olsa, linux-kernel, Rusty Russell

FETCH_FUNC_NAME(symbol) blindly dereferences sc->addr + sc->offset,
this is not what we want if this symbol is per-cpu. Change this code
to use this_cpu_ptr(sc->addr) in this case.

Note: this doesn't work for modules, but module's per-cpu data is
not visible for kallsyms_lookup_name() anyway.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/trace/trace_probe.c |   22 ++++++++++++++++++----
 1 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 097cc80..a0144b6 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -252,14 +252,28 @@ static struct symbol_cache *alloc_symbol_cache(const char *sym, long offset)
 	return sc;
 }
 
+static void *sc_calc_addr(struct symbol_cache *sc)
+{
+	void *addr = (void*)sc->addr;
+
+	if (addr) {
+		if (addr >= (void *)__per_cpu_start &&
+		    addr <  (void *)__per_cpu_end)
+			addr = this_cpu_ptr(addr);
+
+		addr += sc->offset;
+	}
+
+	return addr;
+}
+
 #define DEFINE_FETCH_symbol(type)					\
 static __kprobes void FETCH_FUNC_NAME(symbol, type)(struct pt_regs *regs,\
 					  void *data, void *dest)	\
 {									\
-	struct symbol_cache *sc = data;					\
-	long off = sc->offset;						\
-	if (sc->addr)							\
-		fetch_memory_##type(regs, (void *)sc->addr + off, dest);\
+	void *addr = sc_calc_addr(data);				\
+	if (addr)							\
+		fetch_memory_##type(regs, addr, dest);			\
 	else								\
 		*(type *)dest = 0;					\
 }
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* modules, add_kallsyms() && DEFINE_PER_CPU
  2013-11-26 17:44         ` [RFC PATCH 2/2] tracing: Teach FETCH_MTD_symbol to handle per-cpu data Oleg Nesterov
@ 2013-11-26 17:50           ` Oleg Nesterov
  2013-11-27  2:23             ` Masami Hiramatsu
  2013-11-27 11:30           ` [RFC PATCH 2/2] tracing: Teach FETCH_MTD_symbol to handle per-cpu data Masami Hiramatsu
  1 sibling, 1 reply; 32+ messages in thread
From: Oleg Nesterov @ 2013-11-26 17:50 UTC (permalink / raw)
  To: Masami Hiramatsu, Rusty Russell
  Cc: Steven Rostedt, Namhyung Kim, Frederic Weisbecker, Ingo Molnar,
	Jiri Olsa, linux-kernel

On 11/26, Oleg Nesterov wrote:
>
> Note: this doesn't work for modules, but module's per-cpu data is
> not visible for kallsyms_lookup_name() anyway.

Rusty, I am just curious if it makes sense to change this or not...

But DEFINE_PER_CPU'ed symbols are ignored by add_kallsyms(). I guess
this is because is_core_symbol() requires "sh_flags & SHF_ALLOC".
And probably because of INIT_OFFSET_MASK.

Oleg.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 0/2] tracing: Teach FETCH_MTD_symbol to handle per-cpu data
  2013-11-26 17:43       ` [RFC PATCH 0/2] tracing: Teach FETCH_MTD_symbol " Oleg Nesterov
  2013-11-26 17:44         ` [RFC PATCH 1/2] tracing: Don't update sc->addr in update_symbol_cache() Oleg Nesterov
  2013-11-26 17:44         ` [RFC PATCH 2/2] tracing: Teach FETCH_MTD_symbol to handle per-cpu data Oleg Nesterov
@ 2013-11-27  0:37         ` Namhyung Kim
  2013-11-27 10:01         ` Masami Hiramatsu
  3 siblings, 0 replies; 32+ messages in thread
From: Namhyung Kim @ 2013-11-27  0:37 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Masami Hiramatsu, Steven Rostedt, Frederic Weisbecker,
	Ingo Molnar, Jiri Olsa, linux-kernel, Rusty Russell

Hi Oleg,

(Please use my korg email address, otherwise I might not see - at least 
in time - the messages.)

2013-11-27 AM 2:43, Oleg Nesterov wrote:
> On 11/26, Masami Hiramatsu wrote:
>> If the symbol is
>> percpu, it should be automatically translated to something like
>> FETCH_percpu, instead of such additional expression.
>
> OK. Probably yes, it should be automatically translated, please
> see the patches.
>
> This doesn't allow to read the data from other CPUs, but at least
> the changes are simple and this_cpu_ is better than the reading
> from the obviously wrong address.

Agreed.  I also like the automatic translation.

>
>> For the dynamic allocated per-cpu things, it is also good to give
>> a straight operation, like +10(percpu(%rdi)).
>
> Probably yes, but this needs more changes and I am still not sure
> this is really useful. And if we do this, we probably also need
> to allow to read from other CPUs...

Right.  This should be another story.

Do we also need something similar to user-space TLS?

>
>> Also, please do not be afraid to add fetch operation,
>
> Well, this is what I am trying to avoid ;)

Well, it looks like a trade-off between code-size (or performance) and 
readability (or modularity).  For this specific case, I don't have any 
preference though.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: modules, add_kallsyms() && DEFINE_PER_CPU
  2013-11-26 17:50           ` modules, add_kallsyms() && DEFINE_PER_CPU Oleg Nesterov
@ 2013-11-27  2:23             ` Masami Hiramatsu
  2013-11-27  8:20               ` Namhyung Kim
  2013-11-27 13:35               ` Oleg Nesterov
  0 siblings, 2 replies; 32+ messages in thread
From: Masami Hiramatsu @ 2013-11-27  2:23 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Rusty Russell, Steven Rostedt, Namhyung Kim, Frederic Weisbecker,
	Ingo Molnar, Jiri Olsa, linux-kernel, yrl.pp-manager.tt

(2013/11/27 2:50), Oleg Nesterov wrote:
> On 11/26, Oleg Nesterov wrote:
>>
>> Note: this doesn't work for modules, but module's per-cpu data is
>> not visible for kallsyms_lookup_name() anyway.
> 
> Rusty, I am just curious if it makes sense to change this or not...
> 
> But DEFINE_PER_CPU'ed symbols are ignored by add_kallsyms(). I guess
> this is because is_core_symbol() requires "sh_flags & SHF_ALLOC".
> And probably because of INIT_OFFSET_MASK.

Oleg, I think you can do it by using is_module_percpu_address(). :)

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: modules, add_kallsyms() && DEFINE_PER_CPU
  2013-11-27  2:23             ` Masami Hiramatsu
@ 2013-11-27  8:20               ` Namhyung Kim
  2013-11-27 11:22                 ` Masami Hiramatsu
  2013-11-27 13:35               ` Oleg Nesterov
  1 sibling, 1 reply; 32+ messages in thread
From: Namhyung Kim @ 2013-11-27  8:20 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Oleg Nesterov, Rusty Russell, Steven Rostedt, Namhyung Kim,
	Frederic Weisbecker, Ingo Molnar, Jiri Olsa, linux-kernel,
	yrl.pp-manager.tt

Hi Masami,

On Wed, 27 Nov 2013 11:23:43 +0900, Masami Hiramatsu wrote:
> (2013/11/27 2:50), Oleg Nesterov wrote:
>> On 11/26, Oleg Nesterov wrote:
>>>
>>> Note: this doesn't work for modules, but module's per-cpu data is
>>> not visible for kallsyms_lookup_name() anyway.
>> 
>> Rusty, I am just curious if it makes sense to change this or not...
>> 
>> But DEFINE_PER_CPU'ed symbols are ignored by add_kallsyms(). I guess
>> this is because is_core_symbol() requires "sh_flags & SHF_ALLOC".
>> And probably because of INIT_OFFSET_MASK.
>
> Oleg, I think you can do it by using is_module_percpu_address(). :)

It seems the function only works for a translated address like
is_kernel_percpu_address() does.  But we want to check a not-yet-
translated symbol address, right?

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/2] tracing: Add $cpu and $current probe-vars
  2013-11-26 17:23       ` Oleg Nesterov
@ 2013-11-27  8:22         ` Masami Hiramatsu
  2013-11-27 17:05           ` Oleg Nesterov
  0 siblings, 1 reply; 32+ messages in thread
From: Masami Hiramatsu @ 2013-11-27  8:22 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Namhyung Kim, Frederic Weisbecker, Ingo Molnar,
	Jiri Olsa, linux-kernel

(2013/11/27 2:23), Oleg Nesterov wrote:
> On 11/26, Masami Hiramatsu wrote:
>>
>> (2013/11/26 4:29), Oleg Nesterov wrote:
>>> +#define PSEUDO_REG_OFFSET	4096	/* arbitrary value > MAX_REG_OFFSET */
>>> +
>>> +static unsigned long probe_get_register(struct pt_regs *regs, unsigned long offset)
>>> +{
>>> +	if (offset < PSEUDO_REG_OFFSET)
>>> +		return regs_get_register(regs, offset);
>>> +
>>> +	return pseudo_reg_table[offset - PSEUDO_REG_OFFSET].fetch();
>>> +}
>>
>>
>> Hmm, I don't like this, since fetch_reg is the most frequently
>> used fetch method, and it actually uses the offset in different
>> way.
> 
> Sure, this overloads the usage of FETCH_MTD_reg/offset.
> 
> And btw, yes, the naming is ugly (I mean pseudo_*). But why this
> is bad? This is cheap and simple.

I think it's not simple. The code looks short, but not well
self-described. It is "hidden" in the structure, and double-meaning.

> And,
> 
>> Why don't you make another fetch function for vars?
> 
> This is what I tried to avoid ;) I do not want to add another
> FETCH_MTD_. To me, this looks like unnecessary complication and
> bloat (but see below).

I see, perhaps, it is a time to introduce independent fetch
method implementation. Current one is not sophisticated
nor generic. We need a kind of "bytecode" implementation
for more complex expression, especially for the combination
of several registers/variables on stacks. This feature will
also be useful for some other scriptable tracers like ktap,
systemtap etc.

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 0/2] tracing: Teach FETCH_MTD_symbol to handle per-cpu data
  2013-11-26 17:43       ` [RFC PATCH 0/2] tracing: Teach FETCH_MTD_symbol " Oleg Nesterov
                           ` (2 preceding siblings ...)
  2013-11-27  0:37         ` [RFC PATCH 0/2] " Namhyung Kim
@ 2013-11-27 10:01         ` Masami Hiramatsu
  2013-11-27 17:41           ` Oleg Nesterov
  3 siblings, 1 reply; 32+ messages in thread
From: Masami Hiramatsu @ 2013-11-27 10:01 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Namhyung Kim, Frederic Weisbecker, Ingo Molnar,
	Jiri Olsa, linux-kernel, Rusty Russell

(2013/11/27 2:43), Oleg Nesterov wrote:
> On 11/26, Masami Hiramatsu wrote:
>>
>> (2013/11/26 2:22), Oleg Nesterov wrote:
>>> @symbol can't be used to dump the per-cpu variables. The same is
>>> true for +offset(something) if "something" results in __percpu
>>> pointer.
>>>
>>> With this patch parse_probe_offset() treats "~" before the numeric
>>> offset as "per cpu" mark and stores it in the lowest bit,
>>> calc_probe_offset() simply adds per_cpu_offset(smp_processor_id())
>>> if this bit is set.
>>
>> IMHO, if the symbol is a per-cpu symbol, it is meaningless to
>> access the symbol itself. For the symbol(static) percpu, maybe
>> we can use is_kernel_percpu_address() to check.
> 
> No, we can't use is_kernel_percpu_address(), this is another thing.
> is_kernel_percpu_address(&my_pcpu_var) == F. It is true for
> &per_cpu(my_pcpu_var), this is not what we need.

Ah OK.

> 
>> If the symbol is
>> percpu, it should be automatically translated to something like
>> FETCH_percpu, instead of such additional expression.
> 
> OK. Probably yes, it should be automatically translated, please
> see the patches.
> 
> This doesn't allow to read the data from other CPUs, but at least
> the changes are simple and this_cpu_ is better than the reading
> from the obviously wrong address.

Yeah, usually per_cpu variable is used in current cpu context.

>> For the dynamic allocated per-cpu things, it is also good to give
>> a straight operation, like +10(percpu(%rdi)).
> 
> Probably yes, but this needs more changes and I am still not sure
> this is really useful. And if we do this, we probably also need
> to allow to read from other CPUs...

No, it is enough to provide "percpu(FETCHARG)" which just returns
current cpu's percpu variable address. (Note that kprobes handler
runs in interrupt)

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: modules, add_kallsyms() && DEFINE_PER_CPU
  2013-11-27  8:20               ` Namhyung Kim
@ 2013-11-27 11:22                 ` Masami Hiramatsu
  0 siblings, 0 replies; 32+ messages in thread
From: Masami Hiramatsu @ 2013-11-27 11:22 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Oleg Nesterov, Rusty Russell, Steven Rostedt, Namhyung Kim,
	Frederic Weisbecker, Ingo Molnar, Jiri Olsa, linux-kernel,
	yrl.pp-manager.tt

(2013/11/27 17:20), Namhyung Kim wrote:
> Hi Masami,
> 
> On Wed, 27 Nov 2013 11:23:43 +0900, Masami Hiramatsu wrote:
>> (2013/11/27 2:50), Oleg Nesterov wrote:
>>> On 11/26, Oleg Nesterov wrote:
>>>>
>>>> Note: this doesn't work for modules, but module's per-cpu data is
>>>> not visible for kallsyms_lookup_name() anyway.
>>>
>>> Rusty, I am just curious if it makes sense to change this or not...
>>>
>>> But DEFINE_PER_CPU'ed symbols are ignored by add_kallsyms(). I guess
>>> this is because is_core_symbol() requires "sh_flags & SHF_ALLOC".
>>> And probably because of INIT_OFFSET_MASK.
>>
>> Oleg, I think you can do it by using is_module_percpu_address(). :)
> 
> It seems the function only works for a translated address like
> is_kernel_percpu_address() does.  But we want to check a not-yet-
> translated symbol address, right?

Ah, got it. Thanks to pointed! :)


-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 2/2] tracing: Teach FETCH_MTD_symbol to handle per-cpu data
  2013-11-26 17:44         ` [RFC PATCH 2/2] tracing: Teach FETCH_MTD_symbol to handle per-cpu data Oleg Nesterov
  2013-11-26 17:50           ` modules, add_kallsyms() && DEFINE_PER_CPU Oleg Nesterov
@ 2013-11-27 11:30           ` Masami Hiramatsu
  1 sibling, 0 replies; 32+ messages in thread
From: Masami Hiramatsu @ 2013-11-27 11:30 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Namhyung Kim, Frederic Weisbecker, Ingo Molnar,
	Jiri Olsa, linux-kernel, Rusty Russell

(2013/11/27 2:44), Oleg Nesterov wrote:
> FETCH_FUNC_NAME(symbol) blindly dereferences sc->addr + sc->offset,
> this is not what we want if this symbol is per-cpu. Change this code
> to use this_cpu_ptr(sc->addr) in this case.

This looks good for me :)

Thanks!

> Note: this doesn't work for modules, but module's per-cpu data is
> not visible for kallsyms_lookup_name() anyway.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  kernel/trace/trace_probe.c |   22 ++++++++++++++++++----
>  1 files changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
> index 097cc80..a0144b6 100644
> --- a/kernel/trace/trace_probe.c
> +++ b/kernel/trace/trace_probe.c
> @@ -252,14 +252,28 @@ static struct symbol_cache *alloc_symbol_cache(const char *sym, long offset)
>  	return sc;
>  }
>  
> +static void *sc_calc_addr(struct symbol_cache *sc)
> +{
> +	void *addr = (void*)sc->addr;
> +
> +	if (addr) {
> +		if (addr >= (void *)__per_cpu_start &&
> +		    addr <  (void *)__per_cpu_end)
> +			addr = this_cpu_ptr(addr);
> +
> +		addr += sc->offset;
> +	}
> +
> +	return addr;
> +}
> +
>  #define DEFINE_FETCH_symbol(type)					\
>  static __kprobes void FETCH_FUNC_NAME(symbol, type)(struct pt_regs *regs,\
>  					  void *data, void *dest)	\
>  {									\
> -	struct symbol_cache *sc = data;					\
> -	long off = sc->offset;						\
> -	if (sc->addr)							\
> -		fetch_memory_##type(regs, (void *)sc->addr + off, dest);\
> +	void *addr = sc_calc_addr(data);				\
> +	if (addr)							\
> +		fetch_memory_##type(regs, addr, dest);			\
>  	else								\
>  		*(type *)dest = 0;					\
>  }
> 


-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: modules, add_kallsyms() && DEFINE_PER_CPU
  2013-11-27  2:23             ` Masami Hiramatsu
  2013-11-27  8:20               ` Namhyung Kim
@ 2013-11-27 13:35               ` Oleg Nesterov
  2013-11-28  2:02                 ` Masami Hiramatsu
  1 sibling, 1 reply; 32+ messages in thread
From: Oleg Nesterov @ 2013-11-27 13:35 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Rusty Russell, Steven Rostedt, Namhyung Kim, Frederic Weisbecker,
	Ingo Molnar, Jiri Olsa, linux-kernel, yrl.pp-manager.tt

On 11/27, Masami Hiramatsu wrote:
>
> (2013/11/27 2:50), Oleg Nesterov wrote:
> > On 11/26, Oleg Nesterov wrote:
> >>
> >> Note: this doesn't work for modules, but module's per-cpu data is
> >> not visible for kallsyms_lookup_name() anyway.
> >
> > Rusty, I am just curious if it makes sense to change this or not...
> >
> > But DEFINE_PER_CPU'ed symbols are ignored by add_kallsyms(). I guess
> > this is because is_core_symbol() requires "sh_flags & SHF_ALLOC".
> > And probably because of INIT_OFFSET_MASK.
>
> Oleg, I think you can do it by using is_module_percpu_address(). :)

Not only is_module_percpu_address() can't help. We can solve the
is-it-percpu problem (although the check won't be cheap).

The problem is, sc->addr is always NULL if DEFINE_PER_CPU() was used
in a module. kallsyms_lookup_name() can never see it because it is
ignored by add_kallsyms(), it is nacked by is_core_symbol().

IOW, with or without this patch @modular_percpu_sym never reads the
memory.

Oleg.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/2] tracing: Add $cpu and $current probe-vars
  2013-11-27  8:22         ` Masami Hiramatsu
@ 2013-11-27 17:05           ` Oleg Nesterov
  2013-11-28  2:51             ` Masami Hiramatsu
  0 siblings, 1 reply; 32+ messages in thread
From: Oleg Nesterov @ 2013-11-27 17:05 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Namhyung Kim, Frederic Weisbecker, Ingo Molnar,
	Jiri Olsa, linux-kernel

On 11/27, Masami Hiramatsu wrote:
>
> (2013/11/27 2:23), Oleg Nesterov wrote:
> > On 11/26, Masami Hiramatsu wrote:
> >>
> >> (2013/11/26 4:29), Oleg Nesterov wrote:
> >>> +#define PSEUDO_REG_OFFSET	4096	/* arbitrary value > MAX_REG_OFFSET */
> >>> +
> >>> +static unsigned long probe_get_register(struct pt_regs *regs, unsigned long offset)
> >>> +{
> >>> +	if (offset < PSEUDO_REG_OFFSET)
> >>> +		return regs_get_register(regs, offset);
> >>> +
> >>> +	return pseudo_reg_table[offset - PSEUDO_REG_OFFSET].fetch();
> >>> +}
> >>
> >>
> >> Hmm, I don't like this, since fetch_reg is the most frequently
> >> used fetch method, and it actually uses the offset in different
> >> way.
> >
> > Sure, this overloads the usage of FETCH_MTD_reg/offset.
> >
> > And btw, yes, the naming is ugly (I mean pseudo_*). But why this
> > is bad? This is cheap and simple.
>
> I think it's not simple. The code looks short, but not well
> self-described. It is "hidden" in the structure, and double-meaning.

Yes, I agree. And the ugly naming doesn't make it more clear/clean.

I'll try to cleanup this somehow and resend. Perhaps I should start
with 2/2 which generalizes FETCH_MTD_reg and kills FETCH_MTD_retval.
Or at least generalizes FETCH_MTD_retval.

> >> Why don't you make another fetch function for vars?
> >
> > This is what I tried to avoid ;) I do not want to add another
> > FETCH_MTD_. To me, this looks like unnecessary complication and
> > bloat (but see below).
>
> I see, perhaps, it is a time to introduce independent fetch
> method implementation. Current one is not sophisticated
> nor generic

Yes, yes, I agree. But until then I do not want to complicate/blow
this code to implement $current.

Thanks.

Oleg.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 0/2] tracing: Teach FETCH_MTD_symbol to handle per-cpu data
  2013-11-27 10:01         ` Masami Hiramatsu
@ 2013-11-27 17:41           ` Oleg Nesterov
  2013-11-28  2:55             ` Masami Hiramatsu
  0 siblings, 1 reply; 32+ messages in thread
From: Oleg Nesterov @ 2013-11-27 17:41 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Namhyung Kim, Frederic Weisbecker, Ingo Molnar,
	Jiri Olsa, linux-kernel, Rusty Russell

On 11/27, Masami Hiramatsu wrote:
>
> (2013/11/27 2:43), Oleg Nesterov wrote:
> >
> > This doesn't allow to read the data from other CPUs, but at least
> > the changes are simple and this_cpu_ is better than the reading
> > from the obviously wrong address.
>
> Yeah, usually per_cpu variable is used in current cpu context.
>
> >> For the dynamic allocated per-cpu things, it is also good to give
> >> a straight operation, like +10(percpu(%rdi)).
> >
> > Probably yes, but this needs more changes and I am still not sure
> > this is really useful. And if we do this, we probably also need
> > to allow to read from other CPUs...
>
> No, it is enough to provide "percpu(FETCHARG)" which just returns
> current cpu's percpu variable address.

I don't really agree. I am not saying this is terribly useful, but:

> (Note that kprobes handler
> runs in interrupt)

but this doesn't matter at all, I think. The code can execute, say,
__percpu_counter_sum-like code.

And even if we dump the .data..percpu memory (@percpu_symbol) the
user might want to know the "global" state of this per-cpu object.

And note that sometimes DEFINE_PER_CPU doesn't really connect to
smp_processor_id(). Just for example, bp_cpuinfo[]. It is never
used as this-cpu-data. This means that @bp_cpuinfo+whatever is
always pointless.

But anyway I agree, this is not that important, lets ignore.

Oleg.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Re: modules, add_kallsyms() && DEFINE_PER_CPU
  2013-11-27 13:35               ` Oleg Nesterov
@ 2013-11-28  2:02                 ` Masami Hiramatsu
  0 siblings, 0 replies; 32+ messages in thread
From: Masami Hiramatsu @ 2013-11-28  2:02 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Rusty Russell, Steven Rostedt, Namhyung Kim, Frederic Weisbecker,
	Ingo Molnar, Jiri Olsa, linux-kernel, yrl.pp-manager.tt

(2013/11/27 22:35), Oleg Nesterov wrote:
> On 11/27, Masami Hiramatsu wrote:
>>
>> (2013/11/27 2:50), Oleg Nesterov wrote:
>>> On 11/26, Oleg Nesterov wrote:
>>>>
>>>> Note: this doesn't work for modules, but module's per-cpu data is
>>>> not visible for kallsyms_lookup_name() anyway.
>>>
>>> Rusty, I am just curious if it makes sense to change this or not...
>>>
>>> But DEFINE_PER_CPU'ed symbols are ignored by add_kallsyms(). I guess
>>> this is because is_core_symbol() requires "sh_flags & SHF_ALLOC".
>>> And probably because of INIT_OFFSET_MASK.
>>
>> Oleg, I think you can do it by using is_module_percpu_address(). :)
> 
> Not only is_module_percpu_address() can't help. We can solve the
> is-it-percpu problem (although the check won't be cheap).

I think you can do this at registering phase (by adding a flag
in symbol_cache). That is in a slow path.

> The problem is, sc->addr is always NULL if DEFINE_PER_CPU() was used
> in a module. kallsyms_lookup_name() can never see it because it is
> ignored by add_kallsyms(), it is nacked by is_core_symbol().
>
> IOW, with or without this patch @modular_percpu_sym never reads the
> memory.

Hmm, in that case, should we support it? It looks non-existed
symbol is given. I think until the module.c support per-cpu in
kallsyms, we can delay to support it. I mean we can do 3 steps
1. support percpu symbol in the kernel except for drivers
2. update module.c to store the percpu symbols into kallsyms.
3. support percpu symbol in module .


Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Re: [PATCH 1/2] tracing: Add $cpu and $current probe-vars
  2013-11-27 17:05           ` Oleg Nesterov
@ 2013-11-28  2:51             ` Masami Hiramatsu
  0 siblings, 0 replies; 32+ messages in thread
From: Masami Hiramatsu @ 2013-11-28  2:51 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Namhyung Kim, Frederic Weisbecker, Ingo Molnar,
	Jiri Olsa, linux-kernel

(2013/11/28 2:05), Oleg Nesterov wrote:
> On 11/27, Masami Hiramatsu wrote:
>>
>> (2013/11/27 2:23), Oleg Nesterov wrote:
>>> On 11/26, Masami Hiramatsu wrote:
>>>>
>>>> (2013/11/26 4:29), Oleg Nesterov wrote:
>>>>> +#define PSEUDO_REG_OFFSET	4096	/* arbitrary value > MAX_REG_OFFSET */
>>>>> +
>>>>> +static unsigned long probe_get_register(struct pt_regs *regs, unsigned long offset)
>>>>> +{
>>>>> +	if (offset < PSEUDO_REG_OFFSET)
>>>>> +		return regs_get_register(regs, offset);
>>>>> +
>>>>> +	return pseudo_reg_table[offset - PSEUDO_REG_OFFSET].fetch();
>>>>> +}
>>>>
>>>>
>>>> Hmm, I don't like this, since fetch_reg is the most frequently
>>>> used fetch method, and it actually uses the offset in different
>>>> way.
>>>
>>> Sure, this overloads the usage of FETCH_MTD_reg/offset.
>>>
>>> And btw, yes, the naming is ugly (I mean pseudo_*). But why this
>>> is bad? This is cheap and simple.
>>
>> I think it's not simple. The code looks short, but not well
>> self-described. It is "hidden" in the structure, and double-meaning.
> 
> Yes, I agree. And the ugly naming doesn't make it more clear/clean.
> 
> I'll try to cleanup this somehow and resend. Perhaps I should start
> with 2/2 which generalizes FETCH_MTD_reg and kills FETCH_MTD_retval.
> Or at least generalizes FETCH_MTD_retval.

Sounds good for me :)

>>>> Why don't you make another fetch function for vars?
>>>
>>> This is what I tried to avoid ;) I do not want to add another
>>> FETCH_MTD_. To me, this looks like unnecessary complication and
>>> bloat (but see below).
>>
>> I see, perhaps, it is a time to introduce independent fetch
>> method implementation. Current one is not sophisticated
>> nor generic
> 
> Yes, yes, I agree. But until then I do not want to complicate/blow
> this code to implement $current.

Yeah, that is another topic. Let's focus on making it more useful. :)

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 0/2] tracing: Teach FETCH_MTD_symbol to handle per-cpu data
  2013-11-27 17:41           ` Oleg Nesterov
@ 2013-11-28  2:55             ` Masami Hiramatsu
  0 siblings, 0 replies; 32+ messages in thread
From: Masami Hiramatsu @ 2013-11-28  2:55 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Namhyung Kim, Frederic Weisbecker, Ingo Molnar,
	Jiri Olsa, linux-kernel, Rusty Russell

(2013/11/28 2:41), Oleg Nesterov wrote:
> On 11/27, Masami Hiramatsu wrote:
>>
>> (2013/11/27 2:43), Oleg Nesterov wrote:
>>>
>>> This doesn't allow to read the data from other CPUs, but at least
>>> the changes are simple and this_cpu_ is better than the reading
>>> from the obviously wrong address.
>>
>> Yeah, usually per_cpu variable is used in current cpu context.
>>
>>>> For the dynamic allocated per-cpu things, it is also good to give
>>>> a straight operation, like +10(percpu(%rdi)).
>>>
>>> Probably yes, but this needs more changes and I am still not sure
>>> this is really useful. And if we do this, we probably also need
>>> to allow to read from other CPUs...
>>
>> No, it is enough to provide "percpu(FETCHARG)" which just returns
>> current cpu's percpu variable address.
> 
> I don't really agree. I am not saying this is terribly useful, but:
> 
>> (Note that kprobes handler
>> runs in interrupt)
> 
> but this doesn't matter at all, I think. The code can execute, say,
> __percpu_counter_sum-like code.
>
> And even if we dump the .data..percpu memory (@percpu_symbol) the
> user might want to know the "global" state of this per-cpu object.

I see, but it is the problem solved in the next step.
IMHO, giving people to access just to this cpu's variable is enough
useful. Expanding it to other cpu is another one.

> And note that sometimes DEFINE_PER_CPU doesn't really connect to
> smp_processor_id(). Just for example, bp_cpuinfo[]. It is never
> used as this-cpu-data. This means that @bp_cpuinfo+whatever is
> always pointless.

Yeah, there should be some exceptions.

> But anyway I agree, this is not that important, lets ignore.

 :)

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2013-11-28  2:55 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-11-23 20:15 [PATCH 0/1] tracing: Introduce "pseudo registers" for FETCH_MTD_reg Oleg Nesterov
2013-11-23 20:16 ` [PATCH 1/1] " Oleg Nesterov
2013-11-24  7:32   ` Masami Hiramatsu
2013-11-25  8:04     ` Namhyung Kim
2013-11-25 14:35       ` Oleg Nesterov
2013-11-25 17:21 ` [RFC PATCH 0/3] tracing: Teach FETCH_MTD_{symbol,deref} to handle per-cpu data Oleg Nesterov
2013-11-25 17:21   ` [RFC PATCH 1/3] tracing: Don't mangle sc->addr in update_symbol_cache() Oleg Nesterov
2013-11-25 17:21   ` [RFC PATCH 2/3] tracing: introduce {calc,parse}_probe_offset() for FETCH_MTD_{symbol,deref} Oleg Nesterov
2013-11-25 17:22   ` [RFC PATCH 3/3] tracing: Teach FETCH_MTD_{symbol,deref} to handle per-cpu data Oleg Nesterov
2013-11-26  9:34     ` Masami Hiramatsu
2013-11-26 17:43       ` [RFC PATCH 0/2] tracing: Teach FETCH_MTD_symbol " Oleg Nesterov
2013-11-26 17:44         ` [RFC PATCH 1/2] tracing: Don't update sc->addr in update_symbol_cache() Oleg Nesterov
2013-11-26 17:44         ` [RFC PATCH 2/2] tracing: Teach FETCH_MTD_symbol to handle per-cpu data Oleg Nesterov
2013-11-26 17:50           ` modules, add_kallsyms() && DEFINE_PER_CPU Oleg Nesterov
2013-11-27  2:23             ` Masami Hiramatsu
2013-11-27  8:20               ` Namhyung Kim
2013-11-27 11:22                 ` Masami Hiramatsu
2013-11-27 13:35               ` Oleg Nesterov
2013-11-28  2:02                 ` Masami Hiramatsu
2013-11-27 11:30           ` [RFC PATCH 2/2] tracing: Teach FETCH_MTD_symbol to handle per-cpu data Masami Hiramatsu
2013-11-27  0:37         ` [RFC PATCH 0/2] " Namhyung Kim
2013-11-27 10:01         ` Masami Hiramatsu
2013-11-27 17:41           ` Oleg Nesterov
2013-11-28  2:55             ` Masami Hiramatsu
2013-11-25 19:29 ` [PATCH 0/2] tracing: Add $cpu and $current probe-vars Oleg Nesterov
2013-11-25 19:29   ` [PATCH 1/2] " Oleg Nesterov
2013-11-26  2:21     ` Masami Hiramatsu
2013-11-26 17:23       ` Oleg Nesterov
2013-11-27  8:22         ` Masami Hiramatsu
2013-11-27 17:05           ` Oleg Nesterov
2013-11-28  2:51             ` Masami Hiramatsu
2013-11-25 19:30   ` [PATCH 2/2] tracing: Kill FETCH_MTD_retval, reimplement $retval via pseudo_reg_retval() Oleg Nesterov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).