linux-kernel.vger.kernel.org archive mirror
* [RFD 0/5] Add latency histogram
@ 2015-04-30 10:06 Daniel Wagner
  2015-04-30 10:06 ` [RFD 1/5] tracing: 'hist' triggers Daniel Wagner
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Daniel Wagner @ 2015-04-30 10:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, Tom Zanussi, Carsten Emde, linux-rt-users,
	linux-kernel, Daniel Wagner

Hi,

I would like to discuss a possible way of getting the feature of the
latency_hist.patch [1] added to mainline.

"Latency histograms are primarily relevant in the context of real-time
enabled kernels (CONFIG_PREEMPT/CONFIG_PREEMPT_RT) and are used in the
quality management of the Linux real-time capabilities."

Steven pointed out that this might be doable based on Tom Zanussi's
"[PATCH v4 0/7] tracing: 'hist' triggers" [2].

Here are my findings. It was not too complicated to get it working,
though I had to add some hacks. I have added comments to each patch.

cheers,
daniel

[1] https://git.kernel.org/cgit/linux/kernel/git/rt/linux-stable-rt.git/commit/?h=v3.14-rt-rebase&id=56d50cc34943bbba12b8c5942ee1ae3b29f73acb
[2] https://lkml.org/lkml/2015/4/10/591

Daniel Wagner (4):
  tracing: Add support to sort on the key
  tracing: Add option to quantize key values
  tracing: Dereference pointers without RCU checks
  tracing: Add trace_irqsoff tracepoints

Tom Zanussi (1):
  tracing: 'hist' triggers

 include/linux/rculist.h             | 36 +++++++++++++++++++++++++++++
 include/linux/tracepoint.h          |  4 ++--
 include/trace/events/latency.h      | 40 ++++++++++++++++++++++++++++++++
 kernel/trace/trace_events_hist.c    | 46 +++++++++++++++++++++++++++++++++----
 kernel/trace/trace_events_trigger.c | 18 +++++++++------
 kernel/trace/trace_irqsoff.c        | 38 ++++++++++++++++++++++++++++++
 6 files changed, 168 insertions(+), 14 deletions(-)
 create mode 100644 include/trace/events/latency.h

-- 
2.1.0



* [RFD 1/5] tracing: 'hist' triggers
  2015-04-30 10:06 [RFD 0/5] Add latency histogram Daniel Wagner
@ 2015-04-30 10:06 ` Daniel Wagner
  2015-04-30 10:06 ` [RFD 2/5] tracing: Add support to sort on the key Daniel Wagner
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Daniel Wagner @ 2015-04-30 10:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, Tom Zanussi, Carsten Emde, linux-rt-users,
	linux-kernel, Daniel Wagner

From: Tom Zanussi <tom.zanussi@linux.intel.com>

Hi Daniel,

On Mon, 2015-04-20 at 14:52 +0200, Daniel Wagner wrote:
> Hi Tom,
>
> On 04/10/2015 06:05 PM, Tom Zanussi wrote:
> > This is v4 of the 'hist triggers' patchset, following feedback from
> > v3.
> >
> > This version fixes the race in tracing_map_insert() noted in v3, where
> > map.val.key could be checked even if map.val wasn't yet set.  The
> > simple fix for that in tracing_map_insert() introduces the possibility
> > of duplicates in the map, which though rare, need to be accounted for
> > in the output.  To address that, duplicate-merging code was added to
> > the map-printing code.
>
> Note: I might be abusing your patch. So this could be a completely
> bogus question/feedback.
>
> The most important information here is that I placed additional tracepoints
> during state transitions. I needed to disable the rcu_dereference
> consistency checks to be able to get it working (not claiming that this
> is the right thing to do)
>
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index c728513..f194324 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -125,7 +125,7 @@ extern void syscall_unregfunc(void);
>                         return;                                         \
>                 prercu;                                                 \
>                 rcu_read_lock_sched_notrace();                          \
> -               it_func_ptr = rcu_dereference_sched((tp)->funcs);       \
> +               it_func_ptr = rcu_dereference_raw_notrace((tp)->funcs); \
>                 if (it_func_ptr) {                                      \
>                         do {                                            \
>                                 it_func = (it_func_ptr)->func;          \
> @@ -175,7 +175,7 @@ extern void syscall_unregfunc(void);
>                                 TP_CONDITION(cond),,);                  \
>                 if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) {             \
>                         rcu_read_lock_sched_notrace();                  \
> -                       rcu_dereference_sched(__tracepoint_##name.funcs);\
> +                       rcu_dereference_raw_notrace(__tracepoint_##name.funcs);\
>                         rcu_read_unlock_sched_notrace();                \
>                 }                                                       \
>         }                                                               \
>
>
> diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
> index 8523ea3..03f42b9 100644
> --- a/kernel/trace/trace_irqsoff.c
> +++ b/kernel/trace/trace_irqsoff.c
> @@ -14,6 +14,8 @@
>  #include <linux/module.h>
>  #include <linux/ftrace.h>
>
> +#include <trace/events/sched.h>
> +
>  #include "trace.h"
>
>  static struct trace_array              *irqsoff_trace __read_mostly;
> @@ -433,11 +435,13 @@ void start_critical_timings(void)
>  {
>         if (preempt_trace() || irq_trace())
>                 start_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
> +       trace_sched_latency_critical_timing(1);
>  }
>  EXPORT_SYMBOL_GPL(start_critical_timings);
>
>  void stop_critical_timings(void)
>  {
> +       trace_sched_latency_critical_timing(0);
>         if (preempt_trace() || irq_trace())
>                 stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
>  }
> @@ -447,6 +451,7 @@ EXPORT_SYMBOL_GPL(stop_critical_timings);
>  #ifdef CONFIG_PROVE_LOCKING
>  void time_hardirqs_on(unsigned long a0, unsigned long a1)
>  {
> +       trace_sched_latency_irqs(0);
>         if (!preempt_trace() && irq_trace())
>                 stop_critical_timing(a0, a1);
>  }
> @@ -455,6 +460,7 @@ void time_hardirqs_off(unsigned long a0, unsigned long a1)
>  {
>         if (!preempt_trace() && irq_trace())
>                 start_critical_timing(a0, a1);
> +       trace_sched_latency_irqs(1);
>  }
>
> Now when activating the hist trigger by
>
>   echo 'hist:key=common_pid:val=enabled' > \
>     /sys/kernel/debug/tracing/events/sched/sched_latency_preempt/trigger
>
> the system crashes reliably after a very short time. Those tracepoints
> do work normally, so I see them in /sys/kernel/debug/tracing/trace.
>
> After some investigation I found out that event_hist_trigger() gets
> called with rec=NULL. This is handed over to hist_field_s32() and
> that function dereferences the argument event (=NULL).
>
> #19 event_hist_trigger (data=0xffff88007934c780, rec=0x0 <irq_stack_union>) at kernel/trace/trace_events_hist.c:885
> #20 0xffffffff8113771b in event_triggers_call (file=<optimized out>, rec=0x0 <irq_stack_union>) at kernel/trace/trace_events_trigger.c:78
> #21 0xffffffff81079def in ftrace_trigger_soft_disabled (file=<optimized out>) at include/linux/ftrace_event.h:453
> #22 ftrace_raw_event_sched_latency_template (__data=0xffff88007c895aa8, enabled=0) at include/trace/events/sched.h:557
> #23 0xffffffff8112cbec in trace_sched_latency_preempt (enabled=<optimized out>) at include/trace/events/sched.h:587
> #24 trace_preempt_off (a0=18446744071579483412, a1=18446744071579485792) at kernel/trace/trace_irqsoff.c:532
> #25 0xffffffff8107f449 in preempt_count_add (val=1) at kernel/sched/core.c:2554
> #26 0xffffffff8109bd14 in get_lock_stats (class=0xffffffff82e235d0 <lock_classes+273296>) at kernel/locking/lockdep.c:249
> #27 0xffffffff8109c660 in lock_release_holdtime (hlock=0xffff880079c8ee50) at kernel/locking/lockdep.c:267
> #28 0xffffffff810a2cc9 in lock_release_holdtime (hlock=<optimized out>) at kernel/locking/lockdep.c:3464
> #29 lock_release_nested (ip=<optimized out>, lock=<optimized out>, curr=<optimized out>) at kernel/locking/lockdep.c:3481
> #30 __lock_release (ip=<optimized out>, nested=<optimized out>, lock=<optimized out>) at kernel/locking/lockdep.c:3507
> #31 lock_release (lock=0xffffffff8226e5f0 <kernfs_mutex+112>, nested=<optimized out>, ip=18446744071590106190) at kernel/locking/lockdep.c:3628
> #32 0xffffffff81abd306 in __mutex_unlock_common_slowpath (nested=<optimized out>, lock=<optimized out>) at kernel/locking/mutex.c:744
> #33 __mutex_unlock_slowpath (lock_count=0xffff88007934c780) at kernel/locking/mutex.c:769
> #34 0xffffffff81abd44e in mutex_unlock (lock=<optimized out>) at kernel/locking/mutex.c:446
> #35 0xffffffff81238e84 in kernfs_dop_revalidate (dentry=0xffff88007acede00, flags=<optimized out>) at fs/kernfs/dir.c:470
> #36 0xffffffff811c9813 in d_revalidate (flags=<optimized out>, dentry=<optimized out>) at fs/namei.c:607
> #37 lookup_fast (nd=0xffff8800790e3e48, path=0xffff8800790e3ca8, inode=0xffff8800790e3cb8) at fs/namei.c:1465
> #38 0xffffffff811cc05f in walk_component (follow=<optimized out>, path=<optimized out>, nd=<optimized out>) at fs/namei.c:1577
> #39 link_path_walk (name=0xffff880079fa602d "virtual/block/loop6/subsystem", nd=0xffff8800790e3e48) at fs/namei.c:1836
> #40 0xffffffff811cd327 in path_init (dfd=<optimized out>, name=0xffff880079fa6020 "/sys/devices/virtual/block/loop6/subsystem", flags=2051493056, nd=0xffff8800790e3e48)
>     at fs/namei.c:1952
> #41 0xffffffff811cda90 in path_lookupat (dfd=<optimized out>, name=<optimized out>, flags=16448, nd=0xffff8800790e3e48) at fs/namei.c:1995
> #42 0xffffffff811cec1a in filename_lookup (dfd=-100, name=0xffff880079fa6000, flags=16384, nd=0xffff8800790e3e48) at fs/namei.c:2030
> #43 0xffffffff811d0d34 in user_path_at_empty (dfd=-100, name=<optimized out>, flags=16384, path=0xffff8800790e3f38, empty=<optimized out>) at fs/namei.c:2197
> #44 0xffffffff811c285c in SYSC_readlinkat (bufsiz=<optimized out>, buf=<optimized out>, pathname=<optimized out>, dfd=<optimized out>) at fs/stat.c:327
> #45 SyS_readlinkat (bufsiz=<optimized out>, buf=<optimized out>, pathname=<optimized out>, dfd=<optimized out>) at fs/stat.c:315
> #46 SYSC_readlink (bufsiz=<optimized out>, buf=<optimized out>, path=<optimized out>) at fs/stat.c:352
> #47 SyS_readlink (path=-131939361831040, buf=0, bufsiz=<optimized out>) at fs/stat.c:349
>
> So I am wondering if the path from ftrace_trigger_soft_disabled()
> to event_triggers_call() is never supposed to happen? The comment
> on ftrace_trigger_soft_disabled() indicates this might happen during
> normal operation:
>

So I looked at this on the plane and you're right, this is a path that
should never be taken in this case: since the hist trigger does set the
post_trigger flag, TRIGGER_COND should be true and that block should
never be entered.

However, the code that registers the trigger first enables the event,
then sets the cond flag, which would allow this to happen - a trigger
expecting data would be passed a NULL rec.
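
For reference, the window is visible in ftrace_trigger_soft_disabled(),
which in this era looks roughly like the following (paraphrased from
include/linux/ftrace_event.h, slightly abridged):

static inline bool
ftrace_trigger_soft_disabled(struct ftrace_event_file *file)
{
	unsigned long eflags = file->flags;

	if (!(eflags & FTRACE_EVENT_FL_TRIGGER_COND)) {
		/* If a trigger is already live but TRIGGER_COND has
		 * not been set yet, the triggers run here without a
		 * trace record, i.e. with rec == NULL. */
		if (eflags & FTRACE_EVENT_FL_TRIGGER_MODE)
			event_triggers_call(file, NULL);
		if (eflags & FTRACE_EVENT_FL_SOFT_DISABLED)
			return true;
	}
	return false;
}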

Can you try the patch below and see if it fixes the problem? (untested
and I haven't even had a chance to compile-test it..)

Tom

[PATCH] tracing: Update cond flag before enabling or disabling a trigger

The cond flag should be set before a trigger is enabled, otherwise a
trigger that's expecting to process a trace record (e.g. one with
post_trigger set) could be invoked without one.

Likewise a trigger's cond flag should be reset after it's disabled,
not before.

[dw: added a couple more update_cond_flag() calls]

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
---
 kernel/trace/trace_events_hist.c    |  3 ++-
 kernel/trace/trace_events_trigger.c | 12 ++++++++----
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 200f0ad..9a7a675 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -1471,11 +1471,12 @@ static int hist_register_trigger(char *glob, struct event_trigger_ops *ops,
 	list_add_rcu(&data->list, &file->triggers);
 	ret++;
 
+	update_cond_flag(file);
 	if (trace_event_trigger_enable_disable(file, 1) < 0) {
 		list_del_rcu(&data->list);
+		update_cond_flag(file);
 		ret--;
 	}
-	update_cond_flag(file);
  out:
 	return ret;
 }
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index 7105f15..39aabaa 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -546,11 +546,12 @@ static int register_trigger(char *glob, struct event_trigger_ops *ops,
 	list_add_rcu(&data->list, &file->triggers);
 	ret++;
 
+	update_cond_flag(file);
 	if (trace_event_trigger_enable_disable(file, 1) < 0) {
 		list_del_rcu(&data->list);
+		update_cond_flag(file);
 		ret--;
 	}
-	update_cond_flag(file);
 out:
 	return ret;
 }
@@ -578,8 +579,8 @@ void unregister_trigger(char *glob, struct event_trigger_ops *ops,
 		if (data->cmd_ops->trigger_type == test->cmd_ops->trigger_type) {
 			unregistered = true;
 			list_del_rcu(&data->list);
-			update_cond_flag(file);
 			trace_event_trigger_enable_disable(file, 0);
+			update_cond_flag(file);
 			break;
 		}
 	}
@@ -1263,6 +1264,7 @@ int event_enable_trigger_func(struct event_command *cmd_ops,
 		goto out_free;
 	}
 
+	update_cond_flag(file);
 	ret = trace_event_enable_disable(event_enable_file, 1, 1);
 	if (ret < 0)
 		goto out_put;
@@ -1285,6 +1287,7 @@ int event_enable_trigger_func(struct event_command *cmd_ops,
  out_disable:
 	trace_event_enable_disable(event_enable_file, 0, 1);
  out_put:
+	update_cond_flag(file);
 	module_put(event_enable_file->event_call->mod);
  out_free:
 	if (cmd_ops->set_filter)
@@ -1324,11 +1327,12 @@ int event_enable_register_trigger(char *glob,
 	list_add_rcu(&data->list, &file->triggers);
 	ret++;
 
+	update_cond_flag(file);
 	if (trace_event_trigger_enable_disable(file, 1) < 0) {
 		list_del_rcu(&data->list);
+		update_cond_flag(file);
 		ret--;
 	}
-	update_cond_flag(file);
 out:
 	return ret;
 }
@@ -1351,8 +1355,8 @@ void event_enable_unregister_trigger(char *glob,
 		    (enable_data->file == test_enable_data->file)) {
 			unregistered = true;
 			list_del_rcu(&data->list);
-			update_cond_flag(file);
 			trace_event_trigger_enable_disable(file, 0);
+			update_cond_flag(file);
 			break;
 		}
 	}
-- 
2.1.0



* [RFD 2/5] tracing: Add support to sort on the key
  2015-04-30 10:06 [RFD 0/5] Add latency histogram Daniel Wagner
  2015-04-30 10:06 ` [RFD 1/5] tracing: 'hist' triggers Daniel Wagner
@ 2015-04-30 10:06 ` Daniel Wagner
  2015-05-01  2:02   ` Tom Zanussi
  2015-04-30 10:06 ` [RFD 3/5] tracing: Add option to quantize key values Daniel Wagner
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Daniel Wagner @ 2015-04-30 10:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, Tom Zanussi, Carsten Emde, linux-rt-users,
	linux-kernel, Daniel Wagner

The hist patch allows sorting on values only. By also allowing
sorting on the key, we can do something like this:

'hist:key=latency:val=hitcount:sort=latency'

latency:         16 hitcount:          3
latency:         17 hitcount:        171
latency:         18 hitcount:        626
latency:         19 hitcount:        594
latency:         20 hitcount:        306
latency:         21 hitcount:        214
latency:         22 hitcount:        232
latency:         23 hitcount:        283
latency:         24 hitcount:        235
latency:         25 hitcount:        105
latency:         26 hitcount:         54
latency:         27 hitcount:         79
latency:         28 hitcount:        214
latency:         29 hitcount:        895
latency:         30 hitcount:       1400
latency:         31 hitcount:        774
latency:         32 hitcount:        653
[...]

The obvious choice for the bool was already taken. I haven't found a
good name for the flag. I guess it would make sense to refactor the
sorting code so that it doesn't really matter what kind of field it
is.

BTW, I wonder if it would be possible to drop the need to always provide
the 'val' argument and just assume 'val=hitcount' in this case.
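
As a toy userspace model of what the comparator change below does
(illustrative only, not the kernel code; the entry layout and the
values are invented):

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

/* Toy model of cmp_entries(): sort histogram entries either on the
 * (numeric) key itself or on a summed value such as hitcount. */
struct entry { uint64_t key; uint64_t hitcount; };

static int sort_on_key;	/* mirrors sort_key->use_real_key */

static int cmp_entries(const void *pa, const void *pb)
{
	const struct entry *a = pa, *b = pb;
	uint64_t va = sort_on_key ? a->key : a->hitcount;
	uint64_t vb = sort_on_key ? b->key : b->hitcount;

	return (va > vb) - (va < vb);
}

int main(void)
{
	struct entry e[] = { { 18, 626 }, { 16, 3 }, { 17, 171 } };
	int i;

	sort_on_key = 1;
	qsort(e, 3, sizeof(e[0]), cmp_entries);
	for (i = 0; i < 3; i++)
		printf("latency: %10llu hitcount: %10llu\n",
		       (unsigned long long)e[i].key,
		       (unsigned long long)e[i].hitcount);
	return 0;
}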

Not for inclusion!

Not-Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
---
 kernel/trace/trace_events_hist.c | 35 +++++++++++++++++++++++++++++++----
 1 file changed, 31 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 9a7a675..fe06707 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -89,6 +89,7 @@ enum hist_field_flags {
 struct hist_trigger_sort_key {
 	bool		use_hitcount;
 	bool		use_key;
+	bool		use_real_key;
 	bool		descending;
 	unsigned int	idx;
 };
@@ -260,7 +261,7 @@ static void destroy_hist_fields(struct hist_trigger_data *hist_data)
 	}
 }
 
-static inline struct hist_trigger_sort_key *create_default_sort_key(void)
+static inline struct hist_trigger_sort_key *create_hitcount_sort_key(void)
 {
 	struct hist_trigger_sort_key *sort_key;
 
@@ -273,6 +274,19 @@ static inline struct hist_trigger_sort_key *create_default_sort_key(void)
 	return sort_key;
 }
 
+static inline struct hist_trigger_sort_key *create_real_key_sort_key(void)
+{
+	struct hist_trigger_sort_key *sort_key;
+
+	sort_key = kzalloc(sizeof(*sort_key), GFP_KERNEL);
+	if (!sort_key)
+		return ERR_PTR(-ENOMEM);
+
+	sort_key->use_real_key = true;
+
+	return sort_key;
+}
+
 static inline struct hist_trigger_sort_key *
 create_sort_key(char *field_name, struct hist_trigger_data *hist_data)
 {
@@ -280,7 +294,10 @@ create_sort_key(char *field_name, struct hist_trigger_data *hist_data)
 	unsigned int i;
 
 	if (!strcmp(field_name, "hitcount"))
-		return create_default_sort_key();
+		return create_hitcount_sort_key();
+
+	if (!strcmp(field_name, hist_data->key->field->name))
+		return create_real_key_sort_key();
 
 	for (i = 0; i < hist_data->n_vals; i++)
 		if (!strcmp(field_name, hist_data->vals[i]->field->name))
@@ -306,7 +323,7 @@ static int create_sort_keys(struct hist_trigger_data *hist_data)
 	int ret = 0;
 
 	if (!fields_str) {
-		sort_key = create_default_sort_key();
+		sort_key = create_hitcount_sort_key();
 		if (IS_ERR(sort_key)) {
 			ret = PTR_ERR(sort_key);
 			goto out;
@@ -984,6 +1001,12 @@ static int cmp_entries(const struct hist_trigger_sort_entry **a,
 	hist_data = entry_a->hist_data;
 	sort_key = hist_data->sort_key_cur;
 
+	if (sort_key->use_real_key) {
+		val_a = *(u64 *)entry_a->key;
+		val_b = *(u64 *)entry_b->key;
+		goto out;
+	}
+
 	if (sort_key->use_key) {
 		if (memcmp((*a)->key, (*b)->key, hist_data->map->key_size))
 			ret = 1;
@@ -998,6 +1021,7 @@ static int cmp_entries(const struct hist_trigger_sort_entry **a,
 		val_b = atomic64_read(&entry_b->sums[sort_key->idx]);
 	}
 
+out:
 	if (val_a > val_b)
 		ret = 1;
 	else if (val_a < val_b)
@@ -1372,7 +1396,10 @@ static int event_hist_trigger_print(struct seq_file *m,
 		else {
 			unsigned int idx = hist_data->sort_keys[i]->idx;
 
-			hist_field_print(m, hist_data->vals[idx]);
+			if (hist_data->sort_keys[i]->use_real_key)
+				hist_field_print(m, hist_data->key);
+			else
+				hist_field_print(m, hist_data->vals[idx]);
 		}
 	}
 
-- 
2.1.0



* [RFD 3/5] tracing: Add option to quantize key values
  2015-04-30 10:06 [RFD 0/5] Add latency histogram Daniel Wagner
  2015-04-30 10:06 ` [RFD 1/5] tracing: 'hist' triggers Daniel Wagner
  2015-04-30 10:06 ` [RFD 2/5] tracing: Add support to sort on the key Daniel Wagner
@ 2015-04-30 10:06 ` Daniel Wagner
  2015-05-01  2:12   ` Tom Zanussi
  2015-04-30 10:06 ` [RFD 4/5] tracing: Dereference pointers without RCU checks Daniel Wagner
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Daniel Wagner @ 2015-04-30 10:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, Tom Zanussi, Carsten Emde, linux-rt-users,
	linux-kernel, Daniel Wagner

Let's group some values together. This avoids an overly detailed
histogram. Some sort of logarithmic scale could be useful
for latency plots.

Now we can write something like:

'hist:key=latency.bucket:val=hitcount:sort=latency'

latency:          0 hitcount:     166440
latency:        256 hitcount:      21104
latency:        512 hitcount:       7754
latency:        768 hitcount:       3269
latency:       1024 hitcount:       1647
latency:       1280 hitcount:        841
latency:       1536 hitcount:        524
latency:       1792 hitcount:        371
latency:       2048 hitcount:        302
latency:       2304 hitcount:        240
latency:       2560 hitcount:        207
latency:       2816 hitcount:        149
latency:       3072 hitcount:        123
latency:       3328 hitcount:        119
latency:       3584 hitcount:        102
latency:       3840 hitcount:         94
latency:       4096 hitcount:         89
latency:       4352 hitcount:         79
latency:       4608 hitcount:         88
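
For reference, the quantization in the patch below simply masks off the
low 8 bits of the key, i.e. fixed 256-unit buckets. A minimal userspace
sketch of the same masking (sample values invented):

#include <stdio.h>
#include <stdint.h>

/* Same masking as the HIST_FIELD_BUCKET hack: clearing the low 8 bits
 * maps every value in [n*256, n*256+255] to the bucket start n*256. */
static uint64_t bucket(uint64_t val)
{
	return val & ~(uint64_t)0xff;
}

int main(void)
{
	uint64_t samples[] = { 3, 255, 256, 300, 700, 1090 };
	size_t i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		printf("latency=%4llu -> bucket %4llu\n",
		       (unsigned long long)samples[i],
		       (unsigned long long)bucket(samples[i]));
	return 0;
}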

One thing I struggled with in the grammar above is that I haven't found
a nice way to pass in arguments, for example the bucket size. There are
a lot of options for doing it. Just a couple of random ideas, not
necessarily consistent or clever:

'hist:key=latency.bucket[10,1.5]:val=hitcount:sort=latency'
   where [x,y]: x first bucket size, y scaling factor

'hist:key=latency:val=hitcount:sort=latency:bucket=latency,10,1.5'

Not for inclusion!

Not-Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
---
 kernel/trace/trace_events_hist.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index fe06707..cac94a6 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -84,6 +84,7 @@ enum hist_field_flags {
 	HIST_FIELD_STRING	= 8,
 	HIST_FIELD_EXECNAME	= 16,
 	HIST_FIELD_SYSCALL	= 32,
+	HIST_FIELD_BUCKET	= 64,
 };
 
 struct hist_trigger_sort_key {
@@ -400,6 +401,8 @@ static int create_key_field(struct hist_trigger_data *hist_data,
 				flags |= HIST_FIELD_EXECNAME;
 			else if (!strcmp(field_str, "syscall"))
 				flags |= HIST_FIELD_SYSCALL;
+			else if (!strcmp(field_str, "bucket"))
+				flags |= HIST_FIELD_BUCKET;
 		}
 
 		field = trace_find_event_field(file->event_call, field_name);
@@ -900,6 +903,9 @@ static void event_hist_trigger(struct event_trigger_data *data, void *rec)
 		key = entries;
 	} else {
 		field_contents = hist_data->key->fn(hist_data->key, rec);
+		if (hist_data->key->flags & HIST_FIELD_BUCKET)
+			field_contents &= ~0xff;
+
 		if (hist_data->key->flags & HIST_FIELD_STRING)
 			key = (void *)field_contents;
 		else
@@ -1343,6 +1349,8 @@ static const char *get_hist_field_flags(struct hist_field *hist_field)
 		flags_str = "hex";
 	else if (hist_field->flags & HIST_FIELD_SYSCALL)
 		flags_str = "syscall";
+	else if (hist_field->flags & HIST_FIELD_BUCKET)
+		flags_str = "bucket";
 	else if (hist_field->flags & HIST_FIELD_EXECNAME)
 		flags_str = "execname";
 
-- 
2.1.0



* [RFD 4/5] tracing: Dereference pointers without RCU checks
  2015-04-30 10:06 [RFD 0/5] Add latency histogram Daniel Wagner
                   ` (2 preceding siblings ...)
  2015-04-30 10:06 ` [RFD 3/5] tracing: Add option to quantize key values Daniel Wagner
@ 2015-04-30 10:06 ` Daniel Wagner
  2015-04-30 10:06 ` [RFD 5/5] tracing: Add trace_irqsoff tracepoints Daniel Wagner
  2015-05-01  1:52 ` [RFD 0/5] Add latency histogram Tom Zanussi
  5 siblings, 0 replies; 16+ messages in thread
From: Daniel Wagner @ 2015-04-30 10:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, Tom Zanussi, Carsten Emde, linux-rt-users,
	linux-kernel, Daniel Wagner

A tracepoint can't be used in a code section where we are in the
middle of a state transition.

For example, if we place a tracepoint inside start/stop_critical_timings(),
lockdep complains with

[    0.035589] WARNING: CPU: 0 PID: 3 at kernel/locking/lockdep.c:3560 check_flags.part.36+0x1bc/0x210()
[    0.036000] DEBUG_LOCKS_WARN_ON(current->softirqs_enabled)
[    0.036000] Kernel panic - not syncing: panic_on_warn set ...
[    0.036000]
[    0.036000] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.0.0-rc7+ #460
[    0.036000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
[    0.036000]  ffffffff81f2463a ffff88007c93bb98 ffffffff81afb317 0000000000000001
[    0.036000]  ffffffff81f212b3 ffff88007c93bc18 ffffffff81af7bc2 ffff88007c93bbb8
[    0.036000]  ffffffff00000008 ffff88007c93bc28 ffff88007c93bbc8 000000000093bbd8
[    0.036000] Call Trace:
[    0.036000]  [<ffffffff81afb317>] dump_stack+0x4f/0x7b
[    0.036000]  [<ffffffff81af7bc2>] panic+0xc0/0x1e9
[    0.036000]  [<ffffffff81b07188>] ? _raw_spin_unlock_irqrestore+0x38/0x80
[    0.036000]  [<ffffffff81052a60>] warn_slowpath_common+0xc0/0xc0
[    0.036000]  [<ffffffff81057116>] ? __local_bh_enable+0x36/0x70
[    0.036000]  [<ffffffff81052aa6>] warn_slowpath_fmt+0x46/0x50
[    0.036000]  [<ffffffff810a468c>] check_flags.part.36+0x1bc/0x210
[    0.036000]  [<ffffffff810a5168>] lock_is_held+0x78/0x90
[    0.036000]  [<ffffffff81057116>] ? __local_bh_enable+0x36/0x70
[    0.036000]  [<ffffffff810588ab>] ? __do_softirq+0x3db/0x500
[    0.036000]  [<ffffffff81134495>] trace_preempt_on+0x255/0x260
[    0.036000]  [<ffffffff81081c0b>] preempt_count_sub+0xab/0xf0
[    0.036000]  [<ffffffff81057116>] __local_bh_enable+0x36/0x70
[    0.036000]  [<ffffffff810588ab>] __do_softirq+0x3db/0x500
[    0.036000]  [<ffffffff810589ef>] run_ksoftirqd+0x1f/0x60
[    0.036000]  [<ffffffff81079653>] smpboot_thread_fn+0x193/0x2a0
[    0.036000]  [<ffffffff810794c0>] ? SyS_setgroups+0x150/0x150
[    0.036000]  [<ffffffff81075312>] kthread+0xf2/0x110
[    0.036000]  [<ffffffff81b026b3>] ? wait_for_completion+0xc3/0x120
[    0.036000]  [<ffffffff81081c0b>] ? preempt_count_sub+0xab/0xf0
[    0.036000]  [<ffffffff81075220>] ? kthread_create_on_node+0x240/0x240
[    0.036000]  [<ffffffff81b07cc8>] ret_from_fork+0x58/0x90
[    0.036000]  [<ffffffff81075220>] ? kthread_create_on_node+0x240/0x240
[    0.036000] ---[ end Kernel panic - not syncing: panic_on_warn set ...

PeterZ was so kind as to explain to me what is happening:

"__local_bh_enable() tests if this is the last SOFTIRQ_OFFSET, if so it
tells lockdep softirqs are enabled with trace_softirqs_on() after that
we go and actually modify the preempt_count with preempt_count_sub().
Then in preempt_count_sub() you call into trace_preempt_on() if this
was the last preempt_count increment but you do that _before_ you
actually change the preempt_count with __preempt_count_sub() at this
point lockdep and preempt_count think the world differs and *boom*"
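
In code terms, the ordering Peter describes looks roughly like this
(paraphrased from kernel/sched/core.c of that era, debug checks elided):

void preempt_count_sub(int val)
{
	/* The tracing hook fires while preempt_count still includes
	 * 'val', so lockdep's view and the real count disagree ... */
	if (preempt_count() == val)
		trace_preempt_on(CALLER_ADDR0, CALLER_ADDR1);
	/* ... because the count is only dropped afterwards. */
	__preempt_count_sub(val);
}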

So the simplest way to avoid this is by disabling the consistency
checks.

We also need to take care of the iteration in trace_events_trigger.c
to avoid a splat in conjunction with the hist trigger.

Not for inclusion!

Not-Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
---
 include/linux/rculist.h             | 36 ++++++++++++++++++++++++++++++++++++
 include/linux/tracepoint.h          |  4 ++--
 kernel/trace/trace_events_trigger.c |  6 +++---
 3 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index a18b16f..2fae75e 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -253,6 +253,24 @@ static inline void list_splice_init_rcu(struct list_head *list,
 })
 
 /**
+ * list_entry_rcu_notrace - get the struct for this entry (for tracing)
+ * @ptr:        the &struct list_head pointer.
+ * @type:       the type of the struct this is embedded in.
+ * @member:     the name of the list_head within the struct.
+ *
+ * This primitive may safely run concurrently with the _rcu list-mutation
+ * primitives such as list_add_rcu() as long as it's guarded by rcu_read_lock().
+ *
+ * This is the same as list_entry_rcu() except that it does
+ * not do any RCU debugging or tracing.
+ */
+#define list_entry_rcu_notrace(ptr, type, member) \
+({ \
+	typeof(*ptr) __rcu *__ptr = (typeof(*ptr) __rcu __force *)ptr; \
+	container_of((typeof(ptr))rcu_dereference_raw_notrace(__ptr), type, member); \
+})
+
+/**
  * Where are list_empty_rcu() and list_first_entry_rcu()?
  *
  * Implementing those functions following their counterparts list_empty() and
@@ -308,6 +326,24 @@ static inline void list_splice_init_rcu(struct list_head *list,
 		pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
 
 /**
+ * list_for_each_entry_rcu_notrace	-	iterate over rcu list of given type (for tracing)
+ * @pos:	the type * to use as a loop cursor.
+ * @head:	the head for your list.
+ * @member:	the name of the list_head within the struct.
+ *
+ * This list-traversal primitive may safely run concurrently with
+ * the _rcu list-mutation primitives such as list_add_rcu()
+ * as long as the traversal is guarded by rcu_read_lock().
+ *
+ * This is the same as list_for_each_entry_rcu() except that it does
+ * not do any RCU debugging or tracing.
+ */
+#define list_for_each_entry_rcu_notrace(pos, head, member) \
+	for (pos = list_entry_rcu_notrace((head)->next, typeof(*pos), member); \
+		&pos->member != (head); \
+		pos = list_entry_rcu_notrace(pos->member.next, typeof(*pos), member))
+
+/**
  * list_for_each_entry_continue_rcu - continue iteration over list of given type
  * @pos:	the type * to use as a loop cursor.
  * @head:	the head for your list.
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index c728513..f194324 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -125,7 +125,7 @@ extern void syscall_unregfunc(void);
 			return;						\
 		prercu;							\
 		rcu_read_lock_sched_notrace();				\
-		it_func_ptr = rcu_dereference_sched((tp)->funcs);	\
+		it_func_ptr = rcu_dereference_raw_notrace((tp)->funcs);	\
 		if (it_func_ptr) {					\
 			do {						\
 				it_func = (it_func_ptr)->func;		\
@@ -175,7 +175,7 @@ extern void syscall_unregfunc(void);
 				TP_CONDITION(cond),,);			\
 		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) {		\
 			rcu_read_lock_sched_notrace();			\
-			rcu_dereference_sched(__tracepoint_##name.funcs);\
+			rcu_dereference_raw_notrace(__tracepoint_##name.funcs);\
 			rcu_read_unlock_sched_notrace();		\
 		}							\
 	}								\
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index 39aabaa..e47a229 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -71,14 +71,14 @@ event_triggers_call(struct ftrace_event_file *file, void *rec)
 	if (list_empty(&file->triggers))
 		return tt;
 
-	list_for_each_entry_rcu(data, &file->triggers, list) {
+	list_for_each_entry_rcu_notrace(data, &file->triggers, list) {
 		if (data->paused)
 			continue;
 		if (!rec) {
 			data->ops->func(data, rec);
 			continue;
 		}
-		filter = rcu_dereference_sched(data->filter);
+		filter = rcu_dereference_raw_notrace(data->filter);
 		if (filter && !filter_match_preds(filter, rec))
 			continue;
 		if (data->cmd_ops->post_trigger) {
@@ -110,7 +110,7 @@ event_triggers_post_call(struct ftrace_event_file *file,
 {
 	struct event_trigger_data *data;
 
-	list_for_each_entry_rcu(data, &file->triggers, list) {
+	list_for_each_entry_rcu_notrace(data, &file->triggers, list) {
 		if (data->paused)
 			continue;
 		if (data->cmd_ops->trigger_type & tt)
-- 
2.1.0



* [RFD 5/5] tracing: Add trace_irqsoff tracepoints
  2015-04-30 10:06 [RFD 0/5] Add latency histogram Daniel Wagner
                   ` (3 preceding siblings ...)
  2015-04-30 10:06 ` [RFD 4/5] tracing: Dereference pointers without RCU checks Daniel Wagner
@ 2015-04-30 10:06 ` Daniel Wagner
  2015-05-01  2:14   ` Tom Zanussi
  2015-05-06  6:31   ` Daniel Wagner
  2015-05-01  1:52 ` [RFD 0/5] Add latency histogram Tom Zanussi
  5 siblings, 2 replies; 16+ messages in thread
From: Daniel Wagner @ 2015-04-30 10:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, Tom Zanussi, Carsten Emde, linux-rt-users,
	linux-kernel, Daniel Wagner

Finally we place a few tracepoints at the end of the critical sections.
With the hist trigger in place we can generate the plots.

There are a few drawbacks compared to the latency_hist.patch [1]

The latency plots contain the values from all CPUs. In theory you
can also filter with something like

'hist:key=latency.bucket:val=hitcount:sort=latency if cpu==0'

but I haven't got this working. I didn't spend much time figuring out
why this doesn't work. Even if the above were working, you still
wouldn't get the per-CPU breakdown of the events. I don't know if this
is a must-have feature.

Another point is the need to place at least one tracepoint so that
the hist code is able to do something. Also, I think it would be a good
idea to reuse the time diff from the tracer instead; some refactoring
would be necessary for this. For simplicity I just added a hack for
getting the time diff. And I am not really sure if it is okay to use
the *_enabled() helpers in this way, splitting the timestamping and
the tracepoint into two sections (the tracepoint could be enabled in
between, in which case get_delta() would see a stale timestamp).

Steven was tossing around the idea of introducing a 'variable' to the
tracepoints which could be used for timestamping etc. I'd like to avoid
placing two tracepoints and doing the time diff later; it sounds like
too much overhead.

Not for inclusion!

Not-Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>

[1] https://git.kernel.org/cgit/linux/kernel/git/rt/linux-stable-rt.git/commit/?h=v3.14-rt-rebase&id=56d50cc34943bbba12b8c5942ee1ae3b29f73acb
---
 include/trace/events/latency.h | 40 ++++++++++++++++++++++++++++++++++++++++
 kernel/trace/trace_irqsoff.c   | 38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 78 insertions(+)
 create mode 100644 include/trace/events/latency.h

diff --git a/include/trace/events/latency.h b/include/trace/events/latency.h
new file mode 100644
index 0000000..9659b08
--- /dev/null
+++ b/include/trace/events/latency.h
@@ -0,0 +1,40 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM latency
+
+#if !defined(_TRACE_HIST_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HIST_H
+
+#include <linux/tracepoint.h>
+
+DECLARE_EVENT_CLASS(latency_template,
+	TP_PROTO(cycles_t latency),
+
+	TP_ARGS(latency),
+
+	TP_STRUCT__entry(
+		__field(cycles_t,	latency)
+	),
+
+	TP_fast_assign(
+		__entry->latency	= latency;
+	),
+
+	TP_printk("latency=%llu", __entry->latency)
+);
+
+DEFINE_EVENT(latency_template, latency_irqs,
+	    TP_PROTO(cycles_t latency),
+	    TP_ARGS(latency));
+
+DEFINE_EVENT(latency_template, latency_preempt,
+	    TP_PROTO(cycles_t latency),
+	    TP_ARGS(latency));
+
+DEFINE_EVENT(latency_template, latency_critical_timings,
+	    TP_PROTO(cycles_t latency),
+	    TP_ARGS(latency));
+
+#endif /* _TRACE_HIST_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index 8523ea3..7465c72 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -14,12 +14,20 @@
 #include <linux/module.h>
 #include <linux/ftrace.h>
 
+#include <trace/events/sched.h>
+
 #include "trace.h"
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/latency.h>
+
 static struct trace_array		*irqsoff_trace __read_mostly;
 static int				tracer_enabled __read_mostly;
 
 static DEFINE_PER_CPU(int, tracing_cpu);
+static DEFINE_PER_CPU(cycle_t, ts_irqs);
+static DEFINE_PER_CPU(cycle_t, ts_preempt);
+static DEFINE_PER_CPU(cycle_t, ts_critical_timings);
 
 static DEFINE_RAW_SPINLOCK(max_trace_lock);
 
@@ -428,9 +436,24 @@ stop_critical_timing(unsigned long ip, unsigned long parent_ip)
 	atomic_dec(&data->disabled);
 }
 
+static void do_timestamp(cycle_t __percpu *ts)
+{
+	int cpu = raw_smp_processor_id();
+	per_cpu(*ts, cpu) = ftrace_now(cpu);
+}
+
+static cycle_t get_delta(cycle_t __percpu *ts)
+{
+	int cpu = raw_smp_processor_id();
+	return ftrace_now(cpu) - per_cpu(*ts, cpu);
+}
+
 /* start and stop critical timings used to for stoppage (in idle) */
 void start_critical_timings(void)
 {
+	if (trace_latency_critical_timings_enabled())
+		do_timestamp(&ts_critical_timings);
+
 	if (preempt_trace() || irq_trace())
 		start_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
 }
@@ -440,6 +463,9 @@ void stop_critical_timings(void)
 {
 	if (preempt_trace() || irq_trace())
 		stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
+
+	if (trace_latency_critical_timings_enabled())
+		trace_latency_critical_timings(get_delta(&ts_critical_timings));
 }
 EXPORT_SYMBOL_GPL(stop_critical_timings);
 
@@ -447,6 +473,9 @@ EXPORT_SYMBOL_GPL(stop_critical_timings);
 #ifdef CONFIG_PROVE_LOCKING
 void time_hardirqs_on(unsigned long a0, unsigned long a1)
 {
+	if (trace_latency_irqs_enabled())
+		do_timestamp(&ts_irqs);
+
 	if (!preempt_trace() && irq_trace())
 		stop_critical_timing(a0, a1);
 }
@@ -455,6 +484,9 @@ void time_hardirqs_off(unsigned long a0, unsigned long a1)
 {
 	if (!preempt_trace() && irq_trace())
 		start_critical_timing(a0, a1);
+
+	if (trace_latency_irqs_enabled())
+		trace_latency_irqs(get_delta(&ts_irqs));
 }
 
 #else /* !CONFIG_PROVE_LOCKING */
@@ -512,6 +544,9 @@ EXPORT_SYMBOL(trace_hardirqs_off_caller);
 #ifdef CONFIG_PREEMPT_TRACER
 void trace_preempt_on(unsigned long a0, unsigned long a1)
 {
+	if (trace_latency_preempt_enabled())
+		do_timestamp(&ts_preempt);
+
 	if (preempt_trace() && !irq_trace())
 		stop_critical_timing(a0, a1);
 }
@@ -520,6 +555,9 @@ void trace_preempt_off(unsigned long a0, unsigned long a1)
 {
 	if (preempt_trace() && !irq_trace())
 		start_critical_timing(a0, a1);
+
+	if (trace_latency_preempt_enabled())
+		trace_latency_preempt(get_delta(&ts_preempt));
 }
 #endif /* CONFIG_PREEMPT_TRACER */
 
-- 
2.1.0



* Re: [RFD 0/5] Add latency histogram
  2015-04-30 10:06 [RFD 0/5] Add latency histogram Daniel Wagner
                   ` (4 preceding siblings ...)
  2015-04-30 10:06 ` [RFD 5/5] tracing: Add trace_irqsoff tracepoints Daniel Wagner
@ 2015-05-01  1:52 ` Tom Zanussi
  2015-05-01  9:22   ` Daniel Wagner
  5 siblings, 1 reply; 16+ messages in thread
From: Tom Zanussi @ 2015-05-01  1:52 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: Steven Rostedt, Ingo Molnar, Carsten Emde, linux-rt-users, linux-kernel

Hi Daniel,

On Thu, 2015-04-30 at 12:06 +0200, Daniel Wagner wrote:
> Hi,
> 
> I would like to discuss a possible way of getting the feature of the
> latency_hist.patch [1] added to mainline.
> 
> "Latency histograms are primarily relevant in the context of real-time
> enabled kernels (CONFIG_PREEMPT/CONFIG_PREEMPT_RT) and are used in the
> quality management of the Linux real-time capabilities."
> 
> Steven pointed out that this might be doable based on Tom Zanussi's
> "[PATCH v4 0/7] tracing: 'hist' triggers" [2].
> 
> Here are my findings. It was not too complicated to get it working,
> though I had to add some hacks. I have added comments to each patch.
> 

It looks like you were able to do quite a bit here with not much code -
nice!

Just FYI, I'll be working on a v5 of the hist triggers patchset that
will incorporate the stuff from patch 1 (needs to be split into a
separate patch for the triggers code already upstream, and one for hist
triggers) and your comments from patch 2 (see comments in my reply to
that patch), along with a couple of other unrelated changes...

Tom

> cheers,
> daniel
> 
> [1] https://git.kernel.org/cgit/linux/kernel/git/rt/linux-stable-rt.git/commit/?h=v3.14-rt-rebase&id=56d50cc34943bbba12b8c5942ee1ae3b29f73acb
> [2] https://lkml.org/lkml/2015/4/10/591
> 
> Daniel Wagner (4):
>   tracing: Add support to sort on the key
>   tracing: Add option to quantize key values
>   tracing: Dereference pointers without RCU checks
>   tracing: Add trace_irqsoff tracepoints
> 
> Tom Zanussi (1):
>   tracing: 'hist' triggers
> 
>  include/linux/rculist.h             | 36 +++++++++++++++++++++++++++++
>  include/linux/tracepoint.h          |  4 ++--
>  include/trace/events/latency.h      | 40 ++++++++++++++++++++++++++++++++
>  kernel/trace/trace_events_hist.c    | 46 +++++++++++++++++++++++++++++++++----
>  kernel/trace/trace_events_trigger.c | 18 +++++++++------
>  kernel/trace/trace_irqsoff.c        | 38 ++++++++++++++++++++++++++++++
>  6 files changed, 168 insertions(+), 14 deletions(-)
>  create mode 100644 include/trace/events/latency.h
> 




* Re: [RFD 2/5] tracing: Add support to sort on the key
  2015-04-30 10:06 ` [RFD 2/5] tracing: Add support to sort on the key Daniel Wagner
@ 2015-05-01  2:02   ` Tom Zanussi
  0 siblings, 0 replies; 16+ messages in thread
From: Tom Zanussi @ 2015-05-01  2:02 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: Steven Rostedt, Ingo Molnar, Carsten Emde, linux-rt-users, linux-kernel

On Thu, 2015-04-30 at 12:06 +0200, Daniel Wagner wrote:
> The hist patch allows sorting on values only. By also allowing
> sorting on the key, we can do something like this:
> 
> 'hist:key=latency:val=hitcount:sort=latency'
> 
> latency:         16 hitcount:          3
> latency:         17 hitcount:        171
> latency:         18 hitcount:        626
> latency:         19 hitcount:        594
> latency:         20 hitcount:        306
> latency:         21 hitcount:        214
> latency:         22 hitcount:        232
> latency:         23 hitcount:        283
> latency:         24 hitcount:        235
> latency:         25 hitcount:        105
> latency:         26 hitcount:         54
> latency:         27 hitcount:         79
> latency:         28 hitcount:        214
> latency:         29 hitcount:        895
> latency:         30 hitcount:       1400
> latency:         31 hitcount:        774
> latency:         32 hitcount:        653
> [...]
> 
> The obvious choice for the bool was already taken. I haven't found a
> good name for the flag. I guess it would make sense to refactor the
> sorting code so that it doesn't really matter what kind of field it
> is.
> 

I think you're right - the current flag is kind of an internal thing to
the implementation, and uses a name that's too generic.  Of course, you
should be able to sort on keys as well as values, and the code shouldn't
care too much about which is specified.  The original code was more
capable wrt sorting and I probably simplified it a bit too much - I'll
refactor things taking all that into account for the next version.

> BTW, I wonder if it would be possible to drop the need to always provide
> the 'val' argument and just assume 'val=hitcount' in this case.
> 

That also makes a lot of sense - I'll make that change too.

Tom

> Not for inclusion!
> 
> Not-Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
> ---
>  kernel/trace/trace_events_hist.c | 35 +++++++++++++++++++++++++++++++----
>  1 file changed, 31 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
> index 9a7a675..fe06707 100644
> --- a/kernel/trace/trace_events_hist.c
> +++ b/kernel/trace/trace_events_hist.c
> @@ -89,6 +89,7 @@ enum hist_field_flags {
>  struct hist_trigger_sort_key {
>  	bool		use_hitcount;
>  	bool		use_key;
> +	bool		use_real_key;
>  	bool		descending;
>  	unsigned int	idx;
>  };
> @@ -260,7 +261,7 @@ static void destroy_hist_fields(struct hist_trigger_data *hist_data)
>  	}
>  }
>  
> -static inline struct hist_trigger_sort_key *create_default_sort_key(void)
> +static inline struct hist_trigger_sort_key *create_hitcount_sort_key(void)
>  {
>  	struct hist_trigger_sort_key *sort_key;
>  
> @@ -273,6 +274,19 @@ static inline struct hist_trigger_sort_key *create_default_sort_key(void)
>  	return sort_key;
>  }
>  
> +static inline struct hist_trigger_sort_key *create_real_key_sort_key(void)
> +{
> +	struct hist_trigger_sort_key *sort_key;
> +
> +	sort_key = kzalloc(sizeof(*sort_key), GFP_KERNEL);
> +	if (!sort_key)
> +		return ERR_PTR(-ENOMEM);
> +
> +	sort_key->use_real_key = true;
> +
> +	return sort_key;
> +}
> +
>  static inline struct hist_trigger_sort_key *
>  create_sort_key(char *field_name, struct hist_trigger_data *hist_data)
>  {
> @@ -280,7 +294,10 @@ create_sort_key(char *field_name, struct hist_trigger_data *hist_data)
>  	unsigned int i;
>  
>  	if (!strcmp(field_name, "hitcount"))
> -		return create_default_sort_key();
> +		return create_hitcount_sort_key();
> +
> +	if (!strcmp(field_name, hist_data->key->field->name))
> +		return create_real_key_sort_key();
>  
>  	for (i = 0; i < hist_data->n_vals; i++)
>  		if (!strcmp(field_name, hist_data->vals[i]->field->name))
> @@ -306,7 +323,7 @@ static int create_sort_keys(struct hist_trigger_data *hist_data)
>  	int ret = 0;
>  
>  	if (!fields_str) {
> -		sort_key = create_default_sort_key();
> +		sort_key = create_hitcount_sort_key();
>  		if (IS_ERR(sort_key)) {
>  			ret = PTR_ERR(sort_key);
>  			goto out;
> @@ -984,6 +1001,12 @@ static int cmp_entries(const struct hist_trigger_sort_entry **a,
>  	hist_data = entry_a->hist_data;
>  	sort_key = hist_data->sort_key_cur;
>  
> +	if (sort_key->use_real_key) {
> +		val_a = *(u64 *)entry_a->key;
> +		val_b = *(u64 *)entry_b->key;
> +		goto out;
> +	}
> +
>  	if (sort_key->use_key) {
>  		if (memcmp((*a)->key, (*b)->key, hist_data->map->key_size))
>  			ret = 1;
> @@ -998,6 +1021,7 @@ static int cmp_entries(const struct hist_trigger_sort_entry **a,
>  		val_b = atomic64_read(&entry_b->sums[sort_key->idx]);
>  	}
>  
> +out:
>  	if (val_a > val_b)
>  		ret = 1;
>  	else if (val_a < val_b)
> @@ -1372,7 +1396,10 @@ static int event_hist_trigger_print(struct seq_file *m,
>  		else {
>  			unsigned int idx = hist_data->sort_keys[i]->idx;
>  
> -			hist_field_print(m, hist_data->vals[idx]);
> +			if (hist_data->sort_keys[i]->use_real_key)
> +				hist_field_print(m, hist_data->key);
> +			else
> +				hist_field_print(m, hist_data->vals[idx]);
>  		}
>  	}
>  




* Re: [RFD 3/5] tracing: Add option to quantize key values
  2015-04-30 10:06 ` [RFD 3/5] tracing: Add option to quantize key values Daniel Wagner
@ 2015-05-01  2:12   ` Tom Zanussi
  0 siblings, 0 replies; 16+ messages in thread
From: Tom Zanussi @ 2015-05-01  2:12 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: Steven Rostedt, Ingo Molnar, Carsten Emde, linux-rt-users, linux-kernel

On Thu, 2015-04-30 at 12:06 +0200, Daniel Wagner wrote:
> Let's group some values together. This avoids an overly detailed
> histogram. Some sort of logarithmic scale could be useful
> for latency plots.
> 
> Now we can write something like:
> 
> 'hist:key=latency.bucket:val=hitcount:sort=latency'
> 
> latency:          0 hitcount:     166440
> latency:        256 hitcount:      21104
> latency:        512 hitcount:       7754
> latency:        768 hitcount:       3269
> latency:       1024 hitcount:       1647
> latency:       1280 hitcount:        841
> latency:       1536 hitcount:        524
> latency:       1792 hitcount:        371
> latency:       2048 hitcount:        302
> latency:       2304 hitcount:        240
> latency:       2560 hitcount:        207
> latency:       2816 hitcount:        149
> latency:       3072 hitcount:        123
> latency:       3328 hitcount:        119
> latency:       3584 hitcount:        102
> latency:       3840 hitcount:         94
> latency:       4096 hitcount:         89
> latency:       4352 hitcount:         79
> latency:       4608 hitcount:         88
> 

Nice addition!

> One thing I struggled with in the grammar above is that I haven't found
> a nice way to pass in arguments, for example the bucket size. There are
> a lot of options for doing it. Just a couple of random ideas, not
> necessarily consistent or clever:
> 
> 'hist:key=latency.bucket[10,1.5]:val=hitcount:sort=latency'
>    where [x,y]: x first bucket size, y scaling factor
> 

I like this notation - it's consistent with the other uses of the dot
notation in that it's modifying the way things are displayed, in this
case displaying latency as a histogram with specific [non-default]
parameters.
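
For what it's worth, the proposed [10,1.5] scheme could be modeled like
this in a userspace toy (purely illustrative; the function and the
parameters are invented, nothing here is kernel code):

#include <stdio.h>

/* Toy model of 'bucket[first,scale]': bucket 0 covers [0,first),
 * and each following bucket is 'scale' times wider. */
static unsigned long long bucket_start(unsigned long long val,
				       double first, double scale)
{
	double lo = 0, width = first;

	while (val >= lo + width) {
		lo += width;
		width *= scale;
	}
	return (unsigned long long)lo;
}

int main(void)
{
	unsigned long long samples[] = { 3, 12, 30, 80, 200 };
	int i;

	for (i = 0; i < 5; i++)
		printf("val=%llu -> bucket starting at %llu\n",
		       samples[i], bucket_start(samples[i], 10, 1.5));
	return 0;
}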

Tom

> 'hist:key=latency:val=hitcount:sort=latency:bucket=latency,10,1.5'
> 
> Not for inclusion!
> 
> Not-Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
> ---
>  kernel/trace/trace_events_hist.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
> index fe06707..cac94a6 100644
> --- a/kernel/trace/trace_events_hist.c
> +++ b/kernel/trace/trace_events_hist.c
> @@ -84,6 +84,7 @@ enum hist_field_flags {
>  	HIST_FIELD_STRING	= 8,
>  	HIST_FIELD_EXECNAME	= 16,
>  	HIST_FIELD_SYSCALL	= 32,
> +	HIST_FIELD_BUCKET	= 64,
>  };
>  
>  struct hist_trigger_sort_key {
> @@ -400,6 +401,8 @@ static int create_key_field(struct hist_trigger_data *hist_data,
>  				flags |= HIST_FIELD_EXECNAME;
>  			else if (!strcmp(field_str, "syscall"))
>  				flags |= HIST_FIELD_SYSCALL;
> +			else if (!strcmp(field_str, "bucket"))
> +				flags |= HIST_FIELD_BUCKET;
>  		}
>  
>  		field = trace_find_event_field(file->event_call, field_name);
> @@ -900,6 +903,9 @@ static void event_hist_trigger(struct event_trigger_data *data, void *rec)
>  		key = entries;
>  	} else {
>  		field_contents = hist_data->key->fn(hist_data->key, rec);
> +		if (hist_data->key->flags & HIST_FIELD_BUCKET)
> +			field_contents &= ~0xff;
> +
>  		if (hist_data->key->flags & HIST_FIELD_STRING)
>  			key = (void *)field_contents;
>  		else
> @@ -1343,6 +1349,8 @@ static const char *get_hist_field_flags(struct hist_field *hist_field)
>  		flags_str = "hex";
>  	else if (hist_field->flags & HIST_FIELD_SYSCALL)
>  		flags_str = "syscall";
> +	else if (hist_field->flags & HIST_FIELD_BUCKET)
> +		flags_str = "bucket";
>  	else if (hist_field->flags & HIST_FIELD_EXECNAME)
>  		flags_str = "execname";
>  




* Re: [RFD 5/5] tracing: Add trace_irqsoff tracepoints
  2015-04-30 10:06 ` [RFD 5/5] tracing: Add trace_irqsoff tracepoints Daniel Wagner
@ 2015-05-01  2:14   ` Tom Zanussi
  2015-05-01  2:54     ` Steven Rostedt
  2015-05-06  6:31   ` Daniel Wagner
  1 sibling, 1 reply; 16+ messages in thread
From: Tom Zanussi @ 2015-05-01  2:14 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: Steven Rostedt, Ingo Molnar, Carsten Emde, linux-rt-users, linux-kernel

On Thu, 2015-04-30 at 12:06 +0200, Daniel Wagner wrote:
> Finally we place a few tracepoints at the end of the critical sections.
> With the hist trigger in place we can generate the plots.
> 
> There are a few drawbacks compared to the latency_hist.patch [1]
> 
> The latency plots contain the values from all CPUs. In theory you
> can also filter with something like
> 
> 'hist:key=latency.bucket:val=hitcount:sort=latency if cpu==0'
> 
> but I haven't got this working. I didn't spend much time figuring out
> why this doesn't work. Even if the above were working, you still

I think it doesn't work because the tracepoint doesn't actually have a
'cpu' field to use in the filter...

Tom



* Re: [RFD 5/5] tracing: Add trace_irqsoff tracepoints
  2015-05-01  2:14   ` Tom Zanussi
@ 2015-05-01  2:54     ` Steven Rostedt
  2015-05-01  9:23       ` Daniel Wagner
  0 siblings, 1 reply; 16+ messages in thread
From: Steven Rostedt @ 2015-05-01  2:54 UTC (permalink / raw)
  To: Tom Zanussi
  Cc: Daniel Wagner, Ingo Molnar, Carsten Emde, linux-rt-users, linux-kernel

On Thu, 30 Apr 2015 21:14:52 -0500
Tom Zanussi <tom.zanussi@linux.intel.com> wrote:


> > 'hist:key=latency.bucket:val=hitcount:sort=latency if cpu==0'
> > 
> > but I haven't got this working. I didn't spend much time figuring out
> > why this doesn't work. Even if the above were working, you still
> 
> I think it doesn't work because the tracepoint doesn't actually have a
> 'cpu' field to use in the filter...

Perhaps we should add special fields that don't use the tracepoint
field, but can use generically known fields that are always known when
the tracepoint is triggered. COMM could be one, as well as CPU.

-- Steve


* Re: [RFD 0/5] Add latency histogram
  2015-05-01  1:52 ` [RFD 0/5] Add latency histogram Tom Zanussi
@ 2015-05-01  9:22   ` Daniel Wagner
  0 siblings, 0 replies; 16+ messages in thread
From: Daniel Wagner @ 2015-05-01  9:22 UTC (permalink / raw)
  To: Tom Zanussi, Daniel Wagner
  Cc: Steven Rostedt, Ingo Molnar, Carsten Emde, linux-rt-users, linux-kernel

Hi Tom,

On 05/01/2015 01:52 AM, Tom Zanussi wrote:
> Hi Daniel,
>
> On Thu, 2015-04-30 at 12:06 +0200, Daniel Wagner wrote:
>> Hi,
>>
>> I would like to discuss a possible way of getting the feature of the
>> latency_hist.patch [1] added to mainline.
>>
>> "Latency histograms are primarily relevant in the context of real-time
>> enabled kernels (CONFIG_PREEMPT/CONFIG_PREEMPT_RT) and are used in the
>> quality management of the Linux real-time capabilities."
>>
>> Steven pointed out that this might be doable based on Tom Zanussi's
>> "[PATCH v4 0/7] tracing: 'hist' triggers" [2].
>>
>> Here are my findings. It was not too complicated to get it working,
>> though I had to add some hacks. I have added comments to each patch.
>>
>
> It looks like you were able to do quite a bit here with not much code -
> nice!

Thanks :)

> Just FYI, I'll be working on a v5 of the hist triggers patchset that
> will incorporate the stuff from patch 1 (needs to be split into a
> separate patch for the triggers code already upstream, and one for hist
> triggers) and your comments from patch 2 (see comments in my reply to
> that patch), along with  a couple other unrelated changes...

Great to hear!

cheers,
daniel



* Re: [RFD 5/5] tracing: Add trace_irqsoff tracepoints
  2015-05-01  2:54     ` Steven Rostedt
@ 2015-05-01  9:23       ` Daniel Wagner
  2015-05-04 14:05         ` Daniel Wagner
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Wagner @ 2015-05-01  9:23 UTC (permalink / raw)
  To: Steven Rostedt, Tom Zanussi
  Cc: Daniel Wagner, Ingo Molnar, Carsten Emde, linux-rt-users, linux-kernel

On 05/01/2015 02:54 AM, Steven Rostedt wrote:
> On Thu, 30 Apr 2015 21:14:52 -0500
> Tom Zanussi <tom.zanussi@linux.intel.com> wrote:
>
>
>>> 'hist:key=latency.bucket:val=hitcount:sort=latency if cpu==0'
>>>
>>> but I haven't got this working. I didn't spend much time figuring out
>>> why this doesn't work. Even if the above were working, you still
>>
>> I think it doesn't work because the tracepoint doesn't actually have a
>> 'cpu' field to use in the filter...
>
> Perhaps we should add special fields that don't use the tracepoint
> field, but can use generically known fields that are always known when
> the tracepoint is triggered. COMM could be one, as well as CPU.

I'll give it a try if no one objects next week (public holiday today :))

cheers,
daniel



* Re: [RFD 5/5] tracing: Add trace_irqsoff tracepoints
  2015-05-01  9:23       ` Daniel Wagner
@ 2015-05-04 14:05         ` Daniel Wagner
  2015-05-04 15:41           ` Daniel Wagner
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Wagner @ 2015-05-04 14:05 UTC (permalink / raw)
  To: Daniel Wagner, Steven Rostedt, Tom Zanussi
  Cc: Ingo Molnar, Carsten Emde, linux-rt-users, linux-kernel

On 05/01/2015 11:23 AM, Daniel Wagner wrote:
> On 05/01/2015 02:54 AM, Steven Rostedt wrote:
>> On Thu, 30 Apr 2015 21:14:52 -0500
>> Tom Zanussi <tom.zanussi@linux.intel.com> wrote:
>>
>>
>>>> 'hist:key=latency.bucket:val=hitcount:sort=latency if cpu==0'
>>>>
>>>> but I haven't got this working. I didn't spend much time figuring out
>>>> why this doesn't work. Even if the above were working, you still
>>>
>>> I think it doesn't work because the tracepoint doesn't actually have a
>>> 'cpu' field to use in the filter...
>>
>> Perhaps we should add special fields that don't use the tracepoint
>> field, but can use generically known fields that are always known when
>> the tracepoint is triggered. COMM could be one, as well as CPU.

Here is a first attempt:

From fcd910836179dd738b3300dbf93a30f352e90996 Mon Sep 17 00:00:00 2001
From: Daniel Wagner <daniel.wagner@bmw-carit.de>
Date: Mon, 4 May 2015 15:44:24 +0200
Subject: [PATCH] tracing: Allow triggers to filter for CPUs

By extending the filter rules with more 'generically known' fields
we can write trigger filters like

'stacktrace:5 if cpu == 1'

We add a new element to ftrace_common_fields which describes
the new 'generically known' field. This change is visible to user space.
E.g. the format file for kmem:kmalloc changes to

name: kmalloc
ID: 395
format:
        field:int cpu;  offset:0;       size:0; signed:1;
        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;

        field:unsigned long call_site;  offset:8;       size:8; signed:0;
        field:const void * ptr; offset:16;      size:8; signed:0;
        field:size_t bytes_req; offset:24;      size:8; signed:0;
        field:size_t bytes_alloc;       offset:32;      size:8; signed:0;
        field:gfp_t gfp_flags;  offset:40;      size:4; signed:0;

I think that is not completely correct since the field cpu doesn't
really exist in the trace event entry. Probably moving those known fields
to their own list is a way to overcome this problem. What do you think?

And yes, I am still marveling at this comment

  /* If not of not match is equal to not of not, then it is a match */

from DEFINE_COMPARISON_PRED. This led me to copy it for
filter_pred_cpu() :)

Not for inclusion!

Not-Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
---
 kernel/trace/trace_events.c        |  8 ++++++++
 kernel/trace/trace_events_filter.c | 37 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index c4de47f..3f51f2b 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -144,6 +144,13 @@ int trace_define_field(struct ftrace_event_call *call, const char *type,
 }
 EXPORT_SYMBOL_GPL(trace_define_field);
 
+#define __trace_field(type, item)					\
+	ret = __trace_define_field(&ftrace_common_fields, #type,	\
+				   #item, 0, 0, is_signed_type(type),	\
+				   FILTER_OTHER);			\
+	if (ret)							\
+		return ret;
+
 #define __common_field(type, item)					\
 	ret = __trace_define_field(&ftrace_common_fields, #type,	\
 				   "common_" #item,			\
@@ -158,6 +165,7 @@ static int trace_define_common_fields(void)
 	int ret;
 	struct trace_entry ent;
 
+	__trace_field(int, cpu);
 	__common_field(unsigned short, type);
 	__common_field(unsigned char, flags);
 	__common_field(unsigned char, preempt_count);
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index ced69da..af2a6bb 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -252,6 +252,38 @@ static int filter_pred_strloc(struct filter_pred *pred, void *event)
 	return match;
 }
 
+/* Filter predicate for CPUs. */
+static int filter_pred_cpu(struct filter_pred *pred, void *event)
+{
+	int cpu, cmp;
+	int match = 0;
+
+	cpu = raw_smp_processor_id();
+	cmp = pred->val;
+
+	switch (pred->op) {
+	case OP_EQ:
+		match = cpu == cmp;
+		break;
+	case OP_LT:
+		match = cpu < cmp;
+		break;
+	case OP_LE:
+		match = cpu <= cmp;
+		break;
+	case OP_GT:
+		match = cpu > cmp;
+		break;
+	case OP_GE:
+		match = cpu >= cmp;
+		break;
+	default:
+		break;
+	}
+
+	return !!match == !pred->not;
+}
+
 static int filter_pred_none(struct filter_pred *pred, void *event)
 {
 	return 0;
@@ -1025,7 +1057,10 @@ static int init_pred(struct filter_parse_state *ps,
 		}
 		pred->val = val;
 
-		fn = select_comparison_fn(pred->op, field->size,
+		if (!strcmp(field->name, "cpu"))
+			fn = filter_pred_cpu;
+		else
+			fn = select_comparison_fn(pred->op, field->size,
 					  field->is_signed);
 		if (!fn) {
 			parse_error(ps, FILT_ERR_INVALID_OP, 0);
-- 
2.1.0
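For anyone puzzling over that return statement, here is a minimal
userspace sketch (hypothetical, not kernel code; struct pred stands in
for the relevant bits of struct filter_pred, and only the OP_EQ case is
shown) of how the "not of not" logic behaves for a negated predicate
such as 'cpu != 1':

#include <stdio.h>

/* Stand-in for the relevant bits of struct filter_pred. */
struct pred {
	int val;	/* value to compare against */
	int not;	/* non-zero for a negated predicate */
};

/* OP_EQ case only; filter_pred_cpu() switches over all comparison ops. */
static int pred_matches(const struct pred *p, int cpu)
{
	int match = (cpu == p->val);

	/*
	 * !!match normalizes match to 0/1; !p->not is 1 for a plain
	 * predicate and 0 for a negated one, so a negated predicate
	 * matches exactly when the raw comparison does not.
	 */
	return !!match == !p->not;
}

int main(void)
{
	struct pred eq  = { .val = 1, .not = 0 };	/* cpu == 1 */
	struct pred neq = { .val = 1, .not = 1 };	/* cpu != 1 */

	printf("cpu==1: on cpu 1 -> %d, on cpu 2 -> %d\n",
	       pred_matches(&eq, 1), pred_matches(&eq, 2));
	printf("cpu!=1: on cpu 1 -> %d, on cpu 2 -> %d\n",
	       pred_matches(&neq, 1), pred_matches(&neq, 2));
	return 0;
}

Compiled and run, the first line prints "1, 0" and the second "0, 1",
i.e. the negated predicate matches exactly where the plain one does not.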


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [RFD 5/5] tracing: Add trace_irqsoff tracepoints
  2015-05-04 14:05         ` Daniel Wagner
@ 2015-05-04 15:41           ` Daniel Wagner
  0 siblings, 0 replies; 16+ messages in thread
From: Daniel Wagner @ 2015-05-04 15:41 UTC (permalink / raw)
  To: Daniel Wagner, Steven Rostedt, Tom Zanussi
  Cc: Ingo Molnar, Carsten Emde, linux-rt-users, linux-kernel

On 05/04/2015 04:05 PM, Daniel Wagner wrote:
> On 05/01/2015 11:23 AM, Daniel Wagner wrote:
>> On 05/01/2015 02:54 AM, Steven Rostedt wrote:
>>> On Thu, 30 Apr 2015 21:14:52 -0500
>>> Tom Zanussi <tom.zanussi@linux.intel.com> wrote:
>>>
>>>
>>>>> 'hist:key=latency.bucket:val=hitcount:sort=latency if cpu==0'
>>>>>
>>>>> but I haven't got this working. I didn't spend much time figuring out
>>>>> why this doesn't work. Even if the above is working you still
>>>>
>>>> I think it doesn't work because the tracepoint doesn't actually have a
>>>> 'cpu' field to use in the filter...
>>>
>>> Perhaps we should add special fields that don't use the tracepoint
>>> field, but can use generically known fields that are always known when
>>> the tracepoint is triggered. COMM could be one, as well as CPU.

Here is my second attempt.

Changes since v0:
  - Added COMM filter predicate
  - Avoid API break in format by introducing a new list for
    generic fields



>From 10a58dcc9157ef29514b237aa5d5d8412365a389 Mon Sep 17 00:00:00 2001
From: Daniel Wagner <daniel.wagner@bmw-carit.de>
Date: Mon, 4 May 2015 15:44:24 +0200
Subject: [PATCH] tracing: Allow triggers to filter for CPUs and COMM

By extending the filter rules with more generic fields
we can write trigger filters like

  echo 'stacktrace if cpu == 1' > \
	/sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger

or

  echo 'stacktrace if comm == sshd'  > \
	/sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger

We don't want to change the API of the format files. Therefore a new
list (ftrace_generic_fields) is introduced, which keeps the generic
fields from being listed there.

Not for inclusion!

Not-Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
---
 kernel/trace/trace_events.c        | 25 ++++++++++++++++++
 kernel/trace/trace_events_filter.c | 54 ++++++++++++++++++++++++++++++++++++--
 2 files changed, 77 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index c4de47f..805cb75 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -30,6 +30,7 @@
 DEFINE_MUTEX(event_mutex);
 
 LIST_HEAD(ftrace_events);
+static LIST_HEAD(ftrace_generic_fields);
 static LIST_HEAD(ftrace_common_fields);
 
 #define GFP_TRACE (GFP_KERNEL | __GFP_ZERO)
@@ -94,6 +95,10 @@ trace_find_event_field(struct ftrace_event_call *call, char *name)
 	struct ftrace_event_field *field;
 	struct list_head *head;
 
+	field = __find_event_field(&ftrace_generic_fields, name);
+	if (field)
+		return field;
+
 	field = __find_event_field(&ftrace_common_fields, name);
 	if (field)
 		return field;
@@ -144,6 +149,13 @@ int trace_define_field(struct ftrace_event_call *call, const char *type,
 }
 EXPORT_SYMBOL_GPL(trace_define_field);
 
+#define __generic_field(type, item, filter_type)			\
+	ret = __trace_define_field(&ftrace_generic_fields, #type,	\
+				   #item, 0, 0, is_signed_type(type),	\
+				   filter_type);			\
+	if (ret)							\
+		return ret;
+
 #define __common_field(type, item)					\
 	ret = __trace_define_field(&ftrace_common_fields, #type,	\
 				   "common_" #item,			\
@@ -153,6 +165,16 @@ EXPORT_SYMBOL_GPL(trace_define_field);
 	if (ret)							\
 		return ret;
 
+static int trace_define_generic_fields(void)
+{
+	int ret;
+
+	__generic_field(int, cpu, FILTER_OTHER);
+	__generic_field(char *, comm, FILTER_PTR_STRING);
+
+	return ret;
+}
+
 static int trace_define_common_fields(void)
 {
 	int ret;
@@ -2671,6 +2693,9 @@ static __init int event_trace_init(void)
 	if (!entry)
 		pr_warn("Could not create tracefs 'available_events' entry\n");
 
+	if (trace_define_generic_fields())
+		pr_warn("tracing: Failed to allocate generic fields");
+
 	if (trace_define_common_fields())
 		pr_warn("tracing: Failed to allocate common fields");
 
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index ced69da..74feaf4 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -252,6 +252,50 @@ static int filter_pred_strloc(struct filter_pred *pred, void *event)
 	return match;
 }
 
+/* Filter predicate for CPUs. */
+static int filter_pred_cpu(struct filter_pred *pred, void *event)
+{
+	int cpu, cmp;
+	int match = 0;
+
+	cpu = raw_smp_processor_id();
+	cmp = pred->val;
+
+	switch (pred->op) {
+	case OP_EQ:
+		match = cpu == cmp;
+		break;
+	case OP_LT:
+		match = cpu < cmp;
+		break;
+	case OP_LE:
+		match = cpu <= cmp;
+		break;
+	case OP_GT:
+		match = cpu > cmp;
+		break;
+	case OP_GE:
+		match = cpu >= cmp;
+		break;
+	default:
+		break;
+	}
+
+	return !!match == !pred->not;
+}
+
+/* Filter predicate for COMM. */
+static int filter_pred_comm(struct filter_pred *pred, void *event)
+{
+	int cmp, match;
+
+	cmp = pred->regex.match(current->comm, &pred->regex,
+				pred->regex.field_len);
+	match = cmp ^ pred->not;
+
+	return match;
+}
+
 static int filter_pred_none(struct filter_pred *pred, void *event)
 {
 	return 0;
@@ -1002,7 +1046,10 @@ static int init_pred(struct filter_parse_state *ps,
 	if (is_string_field(field)) {
 		filter_build_regex(pred);
 
-		if (field->filter_type == FILTER_STATIC_STRING) {
+		if (!strcmp(field->name, "comm")) {
+			fn = filter_pred_comm;
+			pred->regex.field_len = TASK_COMM_LEN;
+		} else if (field->filter_type == FILTER_STATIC_STRING) {
 			fn = filter_pred_string;
 			pred->regex.field_len = field->size;
 		} else if (field->filter_type == FILTER_DYN_STRING)
@@ -1025,7 +1072,10 @@ static int init_pred(struct filter_parse_state *ps,
 		}
 		pred->val = val;
 
-		fn = select_comparison_fn(pred->op, field->size,
+		if (!strcmp(field->name, "cpu"))
+			fn = filter_pred_cpu;
+		else
+			fn = select_comparison_fn(pred->op, field->size,
 					  field->is_signed);
 		if (!fn) {
 			parse_error(ps, FILT_ERR_INVALID_OP, 0);
-- 
2.1.0
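If the generic fields combine with the usual boolean filter syntax
(an untested sketch; I haven't verified the combined form), the two
examples above could be merged into a single trigger:

  # echo 'stacktrace if cpu == 1 && comm == sshd' > \
	/sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger
  # cat /sys/kernel/debug/tracing/trace | head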




^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [RFD 5/5] tracing: Add trace_irqsoff tracepoints
  2015-04-30 10:06 ` [RFD 5/5] tracing: Add trace_irqsoff tracepoints Daniel Wagner
  2015-05-01  2:14   ` Tom Zanussi
@ 2015-05-06  6:31   ` Daniel Wagner
  1 sibling, 0 replies; 16+ messages in thread
From: Daniel Wagner @ 2015-05-06  6:31 UTC (permalink / raw)
  To: Daniel Wagner, Steven Rostedt
  Cc: Ingo Molnar, Tom Zanussi, Carsten Emde, linux-rt-users, linux-kernel

It is possible to avoid the hackery from patch 4 completely.
If the tracepoint is placed in stop_critical_timing(),
lockdep doesn't complain anymore.

I have changed this patch so that I don't need to do any additional
work to get the latency calculated. Instead I just use the values
from check_critical_timing().

That means the tracepoints will only be hit if the preemptirqsoff
tracer & co. are enabled. Instead of having several different
tracepoints, I just add a new field which describes where the
latency is coming from. Furthermore, this approach also avoids
adding additional overhead when the tracers are compiled in but
not used.

# cat /sys/kernel/debug/tracing/events/latency/latency/format 
name: latency
ID: 333
format:
        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;

        field:int source;       offset:8;       size:4; signed:1;
        field:cycles_t latency; offset:16;      size:8; signed:0;

print fmt: "source=%s latency=%llu", __print_symbolic(REC->source, {1, "critical section" }, {2, "irq"}, {3, "preempt"}), REC->latency


So in short you could do the following to get a histogram:

# echo preemptirqsoff > /sys/kernel/debug/tracing/current_tracer
# echo 1 > /sys/kernel/debug/tracing/tracing_on
# echo 'hist:key=latency.bucket:val=hitcount:sort=latency '	\
	'if source == 1' > 					\
	/sys/kernel/debug/tracing/events/latency/latency/trigger

# cat /sys/kernel/debug/tracing/events/latency/latency/hist | head -20
# trigger info: hist:keys=latency.bucket:vals=:sort=latency.bucket:size=2048 if source == 1 [active]

latency:       1792 hitcount:          7
latency:       2048 hitcount:         23
latency:       2304 hitcount:         37
latency:       2560 hitcount:         57
latency:       2816 hitcount:         33
latency:       3072 hitcount:         40
latency:       3328 hitcount:         16
latency:       3584 hitcount:         40
latency:       3840 hitcount:        133
latency:       4096 hitcount:        150
latency:       4352 hitcount:        196
latency:       4608 hitcount:        338
latency:       4864 hitcount:        159
latency:       5120 hitcount:         92
latency:       5376 hitcount:         58
latency:       5632 hitcount:         26
latency:       5888 hitcount:         25
latency:       6144 hitcount:         24


I'll do some more testing to see if I missed something.

cheers,
daniel



>From 449c6881425cc461dd7ea6b6f865d2efe99175b3 Mon Sep 17 00:00:00 2001
From: Daniel Wagner <daniel.wagner@bmw-carit.de>
Date: Wed, 29 Apr 2015 17:09:04 +0200
Subject: [PATCH] tracing: Add trace_irqsoff tracepoints

In order to create latency histograms, we place a tracepoint at the end
of the critical section.

Not for inclusion!

Not-Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
---
 include/trace/events/latency.h | 41 +++++++++++++++++++++++++++++++++++++++++
 kernel/trace/trace_irqsoff.c   | 22 ++++++++++++++--------
 2 files changed, 55 insertions(+), 8 deletions(-)
 create mode 100644 include/trace/events/latency.h

diff --git a/include/trace/events/latency.h b/include/trace/events/latency.h
new file mode 100644
index 0000000..1ed8b13
--- /dev/null
+++ b/include/trace/events/latency.h
@@ -0,0 +1,41 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM latency
+
+#if !defined(_TRACE_LATENCY_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_LATENCY_H
+
+#include <linux/tracepoint.h>
+
+#define LATENCY_CS	1
+#define LATENCY_IRQ	2
+#define LATENCY_PREEMPT	3
+
+#define latency_source_sym						\
+	{LATENCY_CS,		"critical section" },			\
+	{LATENCY_IRQ,		"irq"},					\
+	{LATENCY_PREEMPT,	"preempt"}
+
+TRACE_EVENT(latency,
+	TP_PROTO(int source, cycles_t latency),
+
+	TP_ARGS(source, latency),
+
+	TP_STRUCT__entry(
+		__field(int,		source)
+		__field(cycles_t,	latency)
+	),
+
+	TP_fast_assign(
+		__entry->source		= source;
+		__entry->latency	= latency;
+	),
+
+	TP_printk("source=%s latency=%llu",
+		__print_symbolic(__entry->source, latency_source_sym),
+		__entry->latency)
+);
+
+#endif /* _TRACE_LATENCY_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index 8523ea3..7a1d254 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -14,8 +14,13 @@
 #include <linux/module.h>
 #include <linux/ftrace.h>
 
+#include <trace/events/sched.h>
+
 #include "trace.h"
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/latency.h>
+
 static struct trace_array		*irqsoff_trace __read_mostly;
 static int				tracer_enabled __read_mostly;
 
@@ -311,7 +316,7 @@ static void
 check_critical_timing(struct trace_array *tr,
 		      struct trace_array_cpu *data,
 		      unsigned long parent_ip,
-		      int cpu)
+		      int cpu, int source)
 {
 	cycle_t T0, T1, delta;
 	unsigned long flags;
@@ -325,6 +330,7 @@ check_critical_timing(struct trace_array *tr,
 
 	pc = preempt_count();
 
+	trace_latency(source, delta);
 	if (!report_latency(tr, delta))
 		goto out;
 
@@ -396,7 +402,7 @@ start_critical_timing(unsigned long ip, unsigned long parent_ip)
 }
 
 static inline void
-stop_critical_timing(unsigned long ip, unsigned long parent_ip)
+stop_critical_timing(unsigned long ip, unsigned long parent_ip, int source)
 {
 	int cpu;
 	struct trace_array *tr = irqsoff_trace;
@@ -423,7 +429,7 @@ stop_critical_timing(unsigned long ip, unsigned long parent_ip)
 
 	local_save_flags(flags);
 	__trace_function(tr, ip, parent_ip, flags, preempt_count());
-	check_critical_timing(tr, data, parent_ip ? : ip, cpu);
+	check_critical_timing(tr, data, parent_ip ? : ip, cpu, source);
 	data->critical_start = 0;
 	atomic_dec(&data->disabled);
 }
@@ -439,7 +445,7 @@ EXPORT_SYMBOL_GPL(start_critical_timings);
 void stop_critical_timings(void)
 {
 	if (preempt_trace() || irq_trace())
-		stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
+		stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1, LATENCY_CS);
 }
 EXPORT_SYMBOL_GPL(stop_critical_timings);
 
@@ -448,7 +454,7 @@ EXPORT_SYMBOL_GPL(stop_critical_timings);
 void time_hardirqs_on(unsigned long a0, unsigned long a1)
 {
 	if (!preempt_trace() && irq_trace())
-		stop_critical_timing(a0, a1);
+		stop_critical_timing(a0, a1, LATENCY_IRQ);
 }
 
 void time_hardirqs_off(unsigned long a0, unsigned long a1)
@@ -481,7 +487,7 @@ inline void print_irqtrace_events(struct task_struct *curr)
 void trace_hardirqs_on(void)
 {
 	if (!preempt_trace() && irq_trace())
-		stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
+		stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1, LATENCY_IRQ);
 }
 EXPORT_SYMBOL(trace_hardirqs_on);
 
@@ -495,7 +501,7 @@ EXPORT_SYMBOL(trace_hardirqs_off);
 __visible void trace_hardirqs_on_caller(unsigned long caller_addr)
 {
 	if (!preempt_trace() && irq_trace())
-		stop_critical_timing(CALLER_ADDR0, caller_addr);
+		stop_critical_timing(CALLER_ADDR0, caller_addr, LATENCY_IRQ);
 }
 EXPORT_SYMBOL(trace_hardirqs_on_caller);
 
@@ -513,7 +519,7 @@ EXPORT_SYMBOL(trace_hardirqs_off_caller);
 void trace_preempt_on(unsigned long a0, unsigned long a1)
 {
 	if (preempt_trace() && !irq_trace())
-		stop_critical_timing(a0, a1);
+		stop_critical_timing(a0, a1, LATENCY_PREEMPT);
 }
 
 void trace_preempt_off(unsigned long a0, unsigned long a1)
-- 
2.1.0
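The other sources should work the same way; an untested sketch for the
irq-off case (source == 2, i.e. LATENCY_IRQ in the mapping above),
assuming the existing trigger is removed first with the usual '!'
prefix:

  # echo '!hist:key=latency.bucket:val=hitcount:sort=latency if source == 1' > \
	/sys/kernel/debug/tracing/events/latency/latency/trigger
  # echo 'hist:key=latency.bucket:val=hitcount:sort=latency if source == 2' > \
	/sys/kernel/debug/tracing/events/latency/latency/trigger
  # cat /sys/kernel/debug/tracing/events/latency/latency/hist | head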


^ permalink raw reply related	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2015-05-06  6:31 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-30 10:06 [RFD 0/5] Add latency histogram Daniel Wagner
2015-04-30 10:06 ` [RFD 1/5] tracing: 'hist' triggers Daniel Wagner
2015-04-30 10:06 ` [RFD 2/5] tracing: Add support to sort on the key Daniel Wagner
2015-05-01  2:02   ` Tom Zanussi
2015-04-30 10:06 ` [RFD 3/5] tracing: Add option to quantize key values Daniel Wagner
2015-05-01  2:12   ` Tom Zanussi
2015-04-30 10:06 ` [RFD 4/5] tracing: Deference pointers without RCU checks Daniel Wagner
2015-04-30 10:06 ` [RFD 5/5] tracing: Add trace_irqsoff tracepoints Daniel Wagner
2015-05-01  2:14   ` Tom Zanussi
2015-05-01  2:54     ` Steven Rostedt
2015-05-01  9:23       ` Daniel Wagner
2015-05-04 14:05         ` Daniel Wagner
2015-05-04 15:41           ` Daniel Wagner
2015-05-06  6:31   ` Daniel Wagner
2015-05-01  1:52 ` [RFD 0/5] Add latency histogram Tom Zanussi
2015-05-01  9:22   ` Daniel Wagner
