* [PATCH v5 0/4] *** Latency histograms - IRQSOFF,PREEMPTOFF ***
@ 2016-09-02 12:37 Binoy Jayan
  2016-09-02 12:37 ` [PATCH v5 1/4] tracing: Dereference pointers without RCU checks Binoy Jayan
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Binoy Jayan @ 2016-09-02 12:37 UTC (permalink / raw)
  To: Steven Rostedt (Red Hat)
  Cc: Ingo Molnar, Daniel Wagner, Arnd Bergmann, linux-kernel, Masami,
	Binoy Jayan

Hi,

Thank you, Daniel and Steven, for reviewing the code and for the comments.
I have made the changes mentioned below and have added the hack to mark
the field cpu as a key.

This set of patches [v5] captures latency events caused by interrupts and
preemption being disabled in the kernel. The patches are based on the hist
trigger feature developed by Tom Zanussi.

v4: https://lkml.org/lkml/2016/8/30/188
v3: https://lkml.org/lkml/2016/8/29/50
v2: https://lkml.org/lkml/2016/8/24/296

Changes from v4
  - Added unlikely() for less probable paths
  - Dropped field 'ccomm' for hrtimer latency
  - Patch to mark the generic field cpu as a key field and make it part
    of histogram output
  - Changed ambiguous function names

TODO:
1. kselftest test scripts
2. delayed hrtimer offset test scenario

Thanks,
Binoy

Binoy Jayan (3):
  tracing: Add cpu as a key field in histogram
  tracing: Add trace_irqsoff tracepoints
  tracing: Histogram for delayed hrtimer offsets

Daniel Wagner (1):
  tracing: Dereference pointers without RCU checks

 include/linux/hrtimer.h             |  3 ++
 include/linux/rculist.h             | 36 ++++++++++++++++++
 include/linux/tracepoint.h          |  4 +-
 include/trace/events/latency.h      | 73 +++++++++++++++++++++++++++++++++++++
 kernel/time/hrtimer.c               | 46 +++++++++++++++++++++++
 kernel/trace/trace_events.c         |  3 +-
 kernel/trace/trace_events_filter.c  |  4 +-
 kernel/trace/trace_events_hist.c    | 15 ++++++++
 kernel/trace/trace_events_trigger.c |  6 +--
 kernel/trace/trace_irqsoff.c        | 35 ++++++++++++++++++
 10 files changed, 217 insertions(+), 8 deletions(-)
 create mode 100644 include/trace/events/latency.h

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

* [PATCH v5 1/4] tracing: Dereference pointers without RCU checks
  2016-09-02 12:37 [PATCH v5 0/4] *** Latency histograms - IRQSOFF,PREEMPTOFF *** Binoy Jayan
@ 2016-09-02 12:37 ` Binoy Jayan
  2016-09-02 12:37 ` [PATCH v5 2/4] tracing: Add cpu as a key field in histogram Binoy Jayan
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Binoy Jayan @ 2016-09-02 12:37 UTC (permalink / raw)
  To: Steven Rostedt (Red Hat)
  Cc: Ingo Molnar, Daniel Wagner, Arnd Bergmann, linux-kernel, Masami,
	Binoy Jayan

From: Daniel Wagner <daniel.wagner@bmw-carit.de>

The tracepoint can't be used in code sections where we are in the
middle of a state transition.

For example, if we place a tracepoint inside start/stop_critical_section(),
lockdep complains with:

[    0.035589] WARNING: CPU: 0 PID: 3 at kernel/locking/lockdep.c:3560 check_flags.part.36+0x1bc/0x210()
[    0.036000] DEBUG_LOCKS_WARN_ON(current->softirqs_enabled)
[    0.036000] Kernel panic - not syncing: panic_on_warn set ...
[    0.036000]
[    0.036000] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.0.0-rc7+ #460
[    0.036000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
[    0.036000]  ffffffff81f2463a ffff88007c93bb98 ffffffff81afb317 0000000000000001
[    0.036000]  ffffffff81f212b3 ffff88007c93bc18 ffffffff81af7bc2 ffff88007c93bbb8
[    0.036000]  ffffffff00000008 ffff88007c93bc28 ffff88007c93bbc8 000000000093bbd8
[    0.036000] Call Trace:
[    0.036000]  [<ffffffff81afb317>] dump_stack+0x4f/0x7b
[    0.036000]  [<ffffffff81af7bc2>] panic+0xc0/0x1e9
[    0.036000]  [<ffffffff81b07188>] ? _raw_spin_unlock_irqrestore+0x38/0x80
[    0.036000]  [<ffffffff81052a60>] warn_slowpath_common+0xc0/0xc0
[    0.036000]  [<ffffffff81057116>] ? __local_bh_enable+0x36/0x70
[    0.036000]  [<ffffffff81052aa6>] warn_slowpath_fmt+0x46/0x50
[    0.036000]  [<ffffffff810a468c>] check_flags.part.36+0x1bc/0x210
[    0.036000]  [<ffffffff810a5168>] lock_is_held+0x78/0x90
[    0.036000]  [<ffffffff81057116>] ? __local_bh_enable+0x36/0x70
[    0.036000]  [<ffffffff810588ab>] ? __do_softirq+0x3db/0x500
[    0.036000]  [<ffffffff81134495>] trace_preempt_on+0x255/0x260
[    0.036000]  [<ffffffff81081c0b>] preempt_count_sub+0xab/0xf0
[    0.036000]  [<ffffffff81057116>] __local_bh_enable+0x36/0x70
[    0.036000]  [<ffffffff810588ab>] __do_softirq+0x3db/0x500
[    0.036000]  [<ffffffff810589ef>] run_ksoftirqd+0x1f/0x60
[    0.036000]  [<ffffffff81079653>] smpboot_thread_fn+0x193/0x2a0
[    0.036000]  [<ffffffff810794c0>] ? SyS_setgroups+0x150/0x150
[    0.036000]  [<ffffffff81075312>] kthread+0xf2/0x110
[    0.036000]  [<ffffffff81b026b3>] ? wait_for_completion+0xc3/0x120
[    0.036000]  [<ffffffff81081c0b>] ? preempt_count_sub+0xab/0xf0
[    0.036000]  [<ffffffff81075220>] ? kthread_create_on_node+0x240/0x240
[    0.036000]  [<ffffffff81b07cc8>] ret_from_fork+0x58/0x90
[    0.036000]  [<ffffffff81075220>] ? kthread_create_on_node+0x240/0x240
[    0.036000] ---[ end Kernel panic - not syncing: panic_on_warn set ...

PeterZ was so kind as to explain to me what is happening:

"__local_bh_enable() tests if this is the last SOFTIRQ_OFFSET, if so it
tells lockdep softirqs are enabled with trace_softirqs_on() after that
we go and actually modify the preempt_count with preempt_count_sub().
Then in preempt_count_sub() you call into trace_preempt_on() if this
was the last preempt_count increment but you do that _before_ you
actually change the preempt_count with __preempt_count_sub() at this
point lockdep and preempt_count think the world differs and *boom*"

So the simplest way to avoid this is by disabling the consistency
checks.

We also need to take care of the iteration in trace_events_trigger.c
to avoid a splat in conjunction with the hist trigger.

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Signed-off-by: Binoy Jayan <binoy.jayan@linaro.org>
---
 include/linux/rculist.h             | 36 ++++++++++++++++++++++++++++++++++++
 include/linux/tracepoint.h          |  4 ++--
 kernel/trace/trace_events_filter.c  |  4 ++--
 kernel/trace/trace_events_trigger.c |  6 +++---
 4 files changed, 43 insertions(+), 7 deletions(-)

diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index 8beb98d..bee836b 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -279,6 +279,24 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
 	container_of(lockless_dereference(ptr), type, member)
 
 /**
+ * list_entry_rcu_notrace - get the struct for this entry (for tracing)
+ * @ptr:        the &struct list_head pointer.
+ * @type:       the type of the struct this is embedded in.
+ * @member:     the name of the list_head within the struct.
+ *
+ * This primitive may safely run concurrently with the _rcu list-mutation
+ * primitives such as list_add_rcu() as long as it's guarded by rcu_read_lock().
+ *
+ * This is the same as list_entry_rcu() except that it does
+ * not do any RCU debugging or tracing.
+ */
+#define list_entry_rcu_notrace(ptr, type, member) \
+({ \
+	typeof(*ptr) __rcu *__ptr = (typeof(*ptr) __rcu __force *)ptr; \
+	container_of((typeof(ptr))rcu_dereference_raw_notrace(__ptr), type, member); \
+})
+
+/**
  * Where are list_empty_rcu() and list_first_entry_rcu()?
  *
  * Implementing those functions following their counterparts list_empty() and
@@ -391,6 +409,24 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
 	     pos = list_entry_lockless(pos->member.next, typeof(*pos), member))
 
 /**
+ * list_for_each_entry_rcu_notrace	-	iterate over rcu list of given type (for tracing)
+ * @pos:	the type * to use as a loop cursor.
+ * @head:	the head for your list.
+ * @member:	the name of the list_head within the struct.
+ *
+ * This list-traversal primitive may safely run concurrently with
+ * the _rcu list-mutation primitives such as list_add_rcu()
+ * as long as the traversal is guarded by rcu_read_lock().
+ *
+ * This is the same as list_for_each_entry_rcu() except that it does
+ * not do any RCU debugging or tracing.
+ */
+#define list_for_each_entry_rcu_notrace(pos, head, member) \
+	for (pos = list_entry_rcu_notrace((head)->next, typeof(*pos), member); \
+		&pos->member != (head); \
+		pos = list_entry_rcu_notrace(pos->member.next, typeof(*pos), member))
+
+/**
  * list_for_each_entry_continue_rcu - continue iteration over list of given type
  * @pos:	the type * to use as a loop cursor.
  * @head:	the head for your list.
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index be586c6..d0e2a82 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -138,7 +138,7 @@ extern void syscall_unregfunc(void);
 			return;						\
 		prercu;							\
 		rcu_read_lock_sched_notrace();				\
-		it_func_ptr = rcu_dereference_sched((tp)->funcs);	\
+		it_func_ptr = rcu_dereference_raw_notrace((tp)->funcs);	\
 		if (it_func_ptr) {					\
 			do {						\
 				it_func = (it_func_ptr)->func;		\
@@ -189,7 +189,7 @@ extern void syscall_unregfunc(void);
 				TP_CONDITION(cond),,);			\
 		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) {		\
 			rcu_read_lock_sched_notrace();			\
-			rcu_dereference_sched(__tracepoint_##name.funcs);\
+			rcu_dereference_raw_notrace(__tracepoint_##name.funcs);\
 			rcu_read_unlock_sched_notrace();		\
 		}							\
 	}								\
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 9daa9b3..a00a8d5 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -608,11 +608,11 @@ int filter_match_preds(struct event_filter *filter, void *rec)
 	/*
 	 * n_preds, root and filter->preds are protect with preemption disabled.
 	 */
-	root = rcu_dereference_sched(filter->root);
+	root = rcu_dereference_raw_notrace(filter->root);
 	if (!root)
 		return 1;
 
-	data.preds = preds = rcu_dereference_sched(filter->preds);
+	data.preds = preds = rcu_dereference_raw_notrace(filter->preds);
 	ret = walk_pred_tree(preds, root, filter_match_preds_cb, &data);
 	WARN_ON(ret);
 	return data.match;
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index a975571..4489fb4 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -71,14 +71,14 @@ event_triggers_call(struct trace_event_file *file, void *rec)
 	if (list_empty(&file->triggers))
 		return tt;
 
-	list_for_each_entry_rcu(data, &file->triggers, list) {
+	list_for_each_entry_rcu_notrace(data, &file->triggers, list) {
 		if (data->paused)
 			continue;
 		if (!rec) {
 			data->ops->func(data, rec);
 			continue;
 		}
-		filter = rcu_dereference_sched(data->filter);
+		filter = rcu_dereference_raw_notrace(data->filter);
 		if (filter && !filter_match_preds(filter, rec))
 			continue;
 		if (event_command_post_trigger(data->cmd_ops)) {
@@ -111,7 +111,7 @@ event_triggers_post_call(struct trace_event_file *file,
 {
 	struct event_trigger_data *data;
 
-	list_for_each_entry_rcu(data, &file->triggers, list) {
+	list_for_each_entry_rcu_notrace(data, &file->triggers, list) {
 		if (data->paused)
 			continue;
 		if (data->cmd_ops->trigger_type & tt)
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

* [PATCH v5 2/4] tracing: Add cpu as a key field in histogram
  2016-09-02 12:37 [PATCH v5 0/4] *** Latency histograms - IRQSOFF,PREEMPTOFF *** Binoy Jayan
  2016-09-02 12:37 ` [PATCH v5 1/4] tracing: Dereference pointers without RCU checks Binoy Jayan
@ 2016-09-02 12:37 ` Binoy Jayan
  2016-09-02 13:09   ` Steven Rostedt
  2016-09-05  5:40   ` Daniel Wagner
  2016-09-02 12:37 ` [PATCH v5 3/4] tracing: Add trace_irqsoff tracepoints Binoy Jayan
  2016-09-02 12:37 ` [PATCH v5 4/4] tracing: Histogram for delayed hrtimer offsets Binoy Jayan
  3 siblings, 2 replies; 9+ messages in thread
From: Binoy Jayan @ 2016-09-02 12:37 UTC (permalink / raw)
  To: Steven Rostedt (Red Hat)
  Cc: Ingo Molnar, Daniel Wagner, Arnd Bergmann, linux-kernel, Masami,
	Binoy Jayan

The field 'cpu', although part of the set of generic fields, is not made
part of the key fields when mentioned in the trigger command. This hack,
suggested by Daniel, marks it as one of the key fields and makes it
appear in the histogram output.
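
For example, once 'cpu' is usable as a key, a histogram keyed on it can
be set up as follows (a sketch, assuming tracefs is mounted at the usual
/sys/kernel/debug/tracing and using the existing sched:sched_switch
event):

  echo 'hist:key=cpu:val=hitcount' > \
      /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
  cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist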

Signed-off-by: Binoy Jayan <binoy.jayan@linaro.org>
---
 kernel/trace/trace_events.c      |  3 ++-
 kernel/trace/trace_events_hist.c | 15 +++++++++++++++
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 03c0a48..c395608 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -152,7 +152,8 @@ EXPORT_SYMBOL_GPL(trace_define_field);
 
 #define __generic_field(type, item, filter_type)			\
 	ret = __trace_define_field(&ftrace_generic_fields, #type,	\
-				   #item, 0, 0, is_signed_type(type),	\
+				   #item, 0, sizeof(type),              \
+				   is_signed_type(type),                \
 				   filter_type);			\
 	if (ret)							\
 		return ret;
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 0c05b8a..4e0a12e 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -75,6 +75,11 @@ static u64 hist_field_log2(struct hist_field *hist_field, void *event)
 	return (u64) ilog2(roundup_pow_of_two(val));
 }
 
+static u64 hist_field_cpu(struct hist_field *hist_field, void *event)
+{
+	return (u64) raw_smp_processor_id();
+}
+
 #define DEFINE_HIST_FIELD_FN(type)					\
 static u64 hist_field_##type(struct hist_field *hist_field, void *event)\
 {									\
@@ -119,6 +124,7 @@ enum hist_field_flags {
 	HIST_FIELD_FL_SYSCALL		= 128,
 	HIST_FIELD_FL_STACKTRACE	= 256,
 	HIST_FIELD_FL_LOG2		= 512,
+	HIST_FIELD_FL_CPU		= 1024,
 };
 
 struct hist_trigger_attrs {
@@ -371,6 +377,11 @@ static struct hist_field *create_hist_field(struct ftrace_event_field *field,
 		goto out;
 	}
 
+	if (flags & HIST_FIELD_FL_CPU) {
+		hist_field->fn = hist_field_cpu;
+		goto out;
+	}
+
 	if (WARN_ON_ONCE(!field))
 		goto out;
 
@@ -526,6 +537,9 @@ static int create_key_field(struct hist_trigger_data *hist_data,
 	} else {
 		char *field_name = strsep(&field_str, ".");
 
+		if (strcmp(field_name, "cpu") == 0)
+			flags |= HIST_FIELD_FL_CPU;
+
 		if (field_str) {
 			if (strcmp(field_str, "hex") == 0)
 				flags |= HIST_FIELD_FL_HEX;
@@ -556,6 +570,7 @@ static int create_key_field(struct hist_trigger_data *hist_data,
 			key_size = MAX_FILTER_STR_VAL;
 		else
 			key_size = field->size;
+
 	}
 
 	hist_data->fields[key_idx] = create_hist_field(field, flags);
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

* [PATCH v5 3/4] tracing: Add trace_irqsoff tracepoints
  2016-09-02 12:37 [PATCH v5 0/4] *** Latency histograms - IRQSOFF,PREEMPTOFF *** Binoy Jayan
  2016-09-02 12:37 ` [PATCH v5 1/4] tracing: Dereference pointers without RCU checks Binoy Jayan
  2016-09-02 12:37 ` [PATCH v5 2/4] tracing: Add cpu as a key field in histogram Binoy Jayan
@ 2016-09-02 12:37 ` Binoy Jayan
  2016-09-02 13:14   ` Steven Rostedt
  2016-09-02 12:37 ` [PATCH v5 4/4] tracing: Histogram for delayed hrtimer offsets Binoy Jayan
  3 siblings, 1 reply; 9+ messages in thread
From: Binoy Jayan @ 2016-09-02 12:37 UTC (permalink / raw)
  To: Steven Rostedt (Red Hat)
  Cc: Ingo Molnar, Daniel Wagner, Arnd Bergmann, linux-kernel, Masami,
	Binoy Jayan

This work is based on work by Daniel Wagner. A few tracepoints are added
at the end of the critical sections. With the hist trigger in place,
histogram plots may be generated with a per-cpu breakdown of the captured
events. It is based on the Linux kernel's event infrastructure.

The following filter(s) may be used

'hist:key=latency.log2:val=hitcount:sort=latency'
'hist:key=ltype,latency:val=hitcount:sort=latency if cpu==1'
'hist:key=ltype:val=latency:sort=ltype if ltype==0 && cpu==2'

Where ltype is:
0: IRQSOFF latency
1: PREEMPTOFF latency
2: Critical timings
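
One of these may be applied as follows (a sketch, assuming tracefs is
mounted at /sys/kernel/debug/tracing; the latency:latency_preempt event
is added by this patch):

  echo 'hist:key=ltype,latency:val=hitcount:sort=latency if cpu==1' > \
      /sys/kernel/debug/tracing/events/latency/latency_preempt/trigger
  cat /sys/kernel/debug/tracing/events/latency/latency_preempt/hist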

This captures only the latencies introduced by disabled irqs and
preemption. Additional per-process data has to be captured to calculate
the effective latencies introduced for individual processes.

Initial work - latency.patch

[1] https://git.kernel.org/cgit/linux/kernel/git/rt/linux-stable-rt.git/commit/?h=v3.14-rt-rebase&id=56d50cc34943bbba12b8c5942ee1ae3b29f73acb

Signed-off-by: Binoy Jayan <binoy.jayan@linaro.org>
---
 include/trace/events/latency.h | 50 ++++++++++++++++++++++++++++++++++++++++++
 kernel/trace/trace_irqsoff.c   | 35 +++++++++++++++++++++++++++++
 2 files changed, 85 insertions(+)
 create mode 100644 include/trace/events/latency.h

diff --git a/include/trace/events/latency.h b/include/trace/events/latency.h
new file mode 100644
index 0000000..ca57f06
--- /dev/null
+++ b/include/trace/events/latency.h
@@ -0,0 +1,50 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM latency
+
+#if !defined(_TRACE_HIST_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HIST_H
+
+#include <linux/tracepoint.h>
+
+#ifndef __TRACE_LATENCY_TYPE
+#define __TRACE_LATENCY_TYPE
+enum latency_type {
+	LT_IRQ,
+	LT_PREEMPT,
+	LT_CRITTIME,
+	LT_MAX
+};
+#define show_ltype(type)			\
+	__print_symbolic(type,			\
+		{ LT_IRQ,	"IRQ" },	\
+		{ LT_PREEMPT,	"PREEMPT" },	\
+		{ LT_PREEMPT,	"CRIT_TIME" })
+#endif
+
+DECLARE_EVENT_CLASS(latency_template,
+	TP_PROTO(int ltype, cycles_t latency),
+
+	TP_ARGS(ltype, latency),
+
+	TP_STRUCT__entry(
+		__field(int,		ltype)
+		__field(cycles_t,	latency)
+	),
+
+	TP_fast_assign(
+		__entry->ltype		= ltype;
+		__entry->latency	= latency;
+	),
+
+	TP_printk("ltype=%s(%d), latency=%lu", show_ltype(__entry->ltype),
+		  __entry->ltype, (unsigned long) __entry->latency)
+);
+
+DEFINE_EVENT(latency_template, latency_preempt,
+	    TP_PROTO(int ltype, cycles_t latency),
+	    TP_ARGS(ltype, latency));
+
+#endif /* _TRACE_HIST_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index 03cdff8..60ee660 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -13,13 +13,19 @@
 #include <linux/uaccess.h>
 #include <linux/module.h>
 #include <linux/ftrace.h>
+#include <linux/percpu-defs.h>
+#include <trace/events/sched.h>
 
 #include "trace.h"
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/latency.h>
+
 static struct trace_array		*irqsoff_trace __read_mostly;
 static int				tracer_enabled __read_mostly;
 
 static DEFINE_PER_CPU(int, tracing_cpu);
+static DEFINE_PER_CPU(cycle_t, lat_ts[LT_MAX]);
 
 static DEFINE_RAW_SPINLOCK(max_trace_lock);
 
@@ -419,9 +425,23 @@ stop_critical_timing(unsigned long ip, unsigned long parent_ip)
 	atomic_dec(&data->disabled);
 }
 
+static inline void latency_preempt_timing_start(enum latency_type ltype)
+{
+	this_cpu_write(lat_ts[ltype], (cycle_t) trace_clock_local());
+}
+
+static inline void latency_preempt_timing_stop(enum latency_type type)
+{
+	trace_latency_preempt(type,
+		(cycle_t) trace_clock_local() - this_cpu_read(lat_ts[type]));
+}
+
 /* start and stop critical timings used to for stoppage (in idle) */
 void start_critical_timings(void)
 {
+	if (unlikely(trace_latency_preempt_enabled()))
+		latency_preempt_timing_start(LT_CRITTIME);
+
 	if (preempt_trace() || irq_trace())
 		start_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
 }
@@ -431,6 +451,9 @@ void stop_critical_timings(void)
 {
 	if (preempt_trace() || irq_trace())
 		stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
+
+	if (unlikely(trace_latency_preempt_enabled()))
+		latency_preempt_timing_stop(LT_CRITTIME);
 }
 EXPORT_SYMBOL_GPL(stop_critical_timings);
 
@@ -438,6 +461,9 @@ EXPORT_SYMBOL_GPL(stop_critical_timings);
 #ifdef CONFIG_PROVE_LOCKING
 void time_hardirqs_on(unsigned long a0, unsigned long a1)
 {
+	if (unlikely(trace_latency_preempt_enabled()))
+		latency_preempt_timing_stop(LT_IRQ);
+
 	if (!preempt_trace() && irq_trace())
 		stop_critical_timing(a0, a1);
 }
@@ -446,6 +472,9 @@ void time_hardirqs_off(unsigned long a0, unsigned long a1)
 {
 	if (!preempt_trace() && irq_trace())
 		start_critical_timing(a0, a1);
+
+	if (unlikely(trace_latency_preempt_enabled()))
+		latency_preempt_timing_start(LT_IRQ);
 }
 
 #else /* !CONFIG_PROVE_LOCKING */
@@ -503,6 +532,9 @@ EXPORT_SYMBOL(trace_hardirqs_off_caller);
 #ifdef CONFIG_PREEMPT_TRACER
 void trace_preempt_on(unsigned long a0, unsigned long a1)
 {
+	if (unlikely(trace_latency_preempt_enabled()))
+		latency_preempt_timing_stop(LT_PREEMPT);
+
 	if (preempt_trace() && !irq_trace())
 		stop_critical_timing(a0, a1);
 }
@@ -511,6 +543,9 @@ void trace_preempt_off(unsigned long a0, unsigned long a1)
 {
 	if (preempt_trace() && !irq_trace())
 		start_critical_timing(a0, a1);
+
+	if (unlikely(trace_latency_preempt_enabled()))
+		latency_preempt_timing_start(LT_PREEMPT);
 }
 #endif /* CONFIG_PREEMPT_TRACER */
 
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

* [PATCH v5 4/4] tracing: Histogram for delayed hrtimer offsets
  2016-09-02 12:37 [PATCH v5 0/4] *** Latency histograms - IRQSOFF,PREEMPTOFF *** Binoy Jayan
                   ` (2 preceding siblings ...)
  2016-09-02 12:37 ` [PATCH v5 3/4] tracing: Add trace_irqsoff tracepoints Binoy Jayan
@ 2016-09-02 12:37 ` Binoy Jayan
  2016-09-02 13:20   ` Steven Rostedt
  3 siblings, 1 reply; 9+ messages in thread
From: Binoy Jayan @ 2016-09-02 12:37 UTC (permalink / raw)
  To: Steven Rostedt (Red Hat)
  Cc: Ingo Molnar, Daniel Wagner, Arnd Bergmann, linux-kernel, Masami,
	Binoy Jayan

Generate a histogram of the latencies of delayed timer offsets in
nanoseconds. It shows the latency captured due to a delayed timer expiry
event, which happens for example when a timer misses its deadline due to
disabled interrupts. A process scheduled as a result of the timer
expiration suffers this latency.

The following filter(s) may be used

'hist:key=common_pid.execname:val=toffset,hitcount'
'hist:key=cpu,tcomm:val=toffset:sort=tcomm'
'hist:key=common_pid.execname,tcomm'
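
For example (a sketch, assuming tracefs is mounted at
/sys/kernel/debug/tracing; the latency:latency_hrtimer_interrupt event is
added by this patch):

  echo 'hist:key=common_pid.execname:val=toffset,hitcount' > \
      /sys/kernel/debug/tracing/events/latency/latency_hrtimer_interrupt/trigger
  cat /sys/kernel/debug/tracing/events/latency/latency_hrtimer_interrupt/hist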

Signed-off-by: Binoy Jayan <binoy.jayan@linaro.org>
---
 include/linux/hrtimer.h        |  3 +++
 include/trace/events/latency.h | 23 +++++++++++++++++++++
 kernel/time/hrtimer.c          | 46 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 72 insertions(+)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 5e00f80..e09de14 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -104,6 +104,9 @@ struct hrtimer {
 	struct hrtimer_clock_base	*base;
 	u8				state;
 	u8				is_rel;
+#if defined(CONFIG_PREEMPT_TRACER) || defined(CONFIG_IRQSOFF_TRACER)
+	ktime_t				praecox;
+#endif
 #ifdef CONFIG_TIMER_STATS
 	int				start_pid;
 	void				*start_site;
diff --git a/include/trace/events/latency.h b/include/trace/events/latency.h
index ca57f06..d616db5 100644
--- a/include/trace/events/latency.h
+++ b/include/trace/events/latency.h
@@ -44,6 +44,29 @@ DEFINE_EVENT(latency_template, latency_preempt,
 	    TP_PROTO(int ltype, cycles_t latency),
 	    TP_ARGS(ltype, latency));
 
+TRACE_EVENT(latency_hrtimer_interrupt,
+
+	TP_PROTO(long long toffset, struct task_struct *task),
+
+	TP_ARGS(toffset, task),
+
+	TP_STRUCT__entry(
+		__field(long long,	toffset)
+		__array(char,		tcomm,	TASK_COMM_LEN)
+		__field(int,		tprio)
+	),
+
+	TP_fast_assign(
+		__entry->toffset = toffset;
+		memcpy(__entry->tcomm, task != NULL ? task->comm : "<none>",
+			task != NULL ? TASK_COMM_LEN : 7);
+		__entry->tprio  = task != NULL ? task->prio : -1;
+	),
+
+	TP_printk("toffset=%lld thread=%s[%d]",
+		__entry->toffset, __entry->tcomm, __entry->tprio)
+);
+
 #endif /* _TRACE_HIST_H */
 
 /* This part must be outside protection */
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 9ba7c82..04d936b 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -53,9 +53,12 @@
 #include <asm/uaccess.h>
 
 #include <trace/events/timer.h>
+#include <trace/events/latency.h>
 
 #include "tick-internal.h"
 
+static enum hrtimer_restart hrtimer_wakeup(struct hrtimer *timer);
+
 /*
  * The timer bases:
  *
@@ -960,6 +963,45 @@ static inline ktime_t hrtimer_update_lowres(struct hrtimer *timer, ktime_t tim,
 	return tim;
 }
 
+static inline void latency_hrtimer_timing_start(struct hrtimer *timer,
+					 struct hrtimer_clock_base *new_base,
+					 ktime_t tim)
+{
+#if defined(CONFIG_PREEMPT_TRACER) || defined(CONFIG_IRQSOFF_TRACER)
+	if (unlikely(trace_latency_hrtimer_interrupt_enabled())) {
+		ktime_t now = new_base->get_time();
+
+		if (ktime_to_ns(tim) < ktime_to_ns(now))
+			timer->praecox = now;
+		else
+			timer->praecox = ktime_set(0, 0);
+	}
+#endif
+}
+
+static inline void latency_hrtimer_timing_stop(struct hrtimer *timer,
+						ktime_t basenow)
+{
+#if defined(CONFIG_PREEMPT_TRACER) || defined(CONFIG_IRQSOFF_TRACER)
+	long latency;
+
+	struct task_struct *task;
+
+	if (likely(!trace_latency_hrtimer_interrupt_enabled()))
+		return;
+
+	latency = ktime_to_ns(ktime_sub(basenow,
+			      ktime_to_ns(timer->praecox) ?
+			      timer->praecox : hrtimer_get_expires(timer)));
+
+	task = timer->function == hrtimer_wakeup ?
+			container_of(timer, struct hrtimer_sleeper,
+				     timer)->task : NULL;
+	if (latency > 0)
+		trace_latency_hrtimer_interrupt((u64) latency, task);
+#endif
+}
+
 /**
  * hrtimer_start_range_ns - (re)start an hrtimer on the current CPU
  * @timer:	the timer to be added
@@ -992,6 +1034,8 @@ void hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
 
 	timer_stats_hrtimer_set_start_info(timer);
 
+	latency_hrtimer_timing_start(timer, new_base, tim);
+
 	leftmost = enqueue_hrtimer(timer, new_base);
 	if (!leftmost)
 		goto unlock;
@@ -1284,6 +1328,8 @@ static void __hrtimer_run_queues(struct hrtimer_cpu_base *cpu_base, ktime_t now)
 
 			timer = container_of(node, struct hrtimer, node);
 
+			latency_hrtimer_timing_stop(timer, basenow);
+
 			/*
 			 * The immediate goal for using the softexpires is
 			 * minimizing wakeups, not running timers at the
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

* Re: [PATCH v5 2/4] tracing: Add cpu as a key field in histogram
  2016-09-02 12:37 ` [PATCH v5 2/4] tracing: Add cpu as a key field in histogram Binoy Jayan
@ 2016-09-02 13:09   ` Steven Rostedt
  2016-09-05  5:40   ` Daniel Wagner
  1 sibling, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2016-09-02 13:09 UTC (permalink / raw)
  To: Binoy Jayan
  Cc: Ingo Molnar, Daniel Wagner, Arnd Bergmann, linux-kernel, Masami

On Fri,  2 Sep 2016 18:07:29 +0530
Binoy Jayan <binoy.jayan@linaro.org> wrote:

> The field 'cpu', although part of the set of generic fields, is not made
> part of the key fields when mentioned in the trigger command. This hack,
> suggested by Daniel, marks it as one of the key fields and makes it
> appear in the histogram output.
> 
> Signed-off-by: Binoy Jayan <binoy.jayan@linaro.org>
> ---
>  kernel/trace/trace_events.c      |  3 ++-
>  kernel/trace/trace_events_hist.c | 15 +++++++++++++++
>  2 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index 03c0a48..c395608 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -152,7 +152,8 @@ EXPORT_SYMBOL_GPL(trace_define_field);
>  
>  #define __generic_field(type, item, filter_type)			\
>  	ret = __trace_define_field(&ftrace_generic_fields, #type,	\
> -				   #item, 0, 0, is_signed_type(type),	\
> +				   #item, 0, sizeof(type),              \
> +				   is_signed_type(type),                \
>  				   filter_type);			\
>  	if (ret)							\
>  		return ret;
> diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
> index 0c05b8a..4e0a12e 100644
> --- a/kernel/trace/trace_events_hist.c
> +++ b/kernel/trace/trace_events_hist.c
> @@ -75,6 +75,11 @@ static u64 hist_field_log2(struct hist_field *hist_field, void *event)
>  	return (u64) ilog2(roundup_pow_of_two(val));
>  }
>  
> +static u64 hist_field_cpu(struct hist_field *hist_field, void *event)
> +{
> +	return (u64) raw_smp_processor_id();

Does this really need to be raw? I'm guessing that this would always be
called in preempt-disabled locations, but if it is not, wouldn't we
want to know about it?

> +}
> +
>  #define DEFINE_HIST_FIELD_FN(type)					\
>  static u64 hist_field_##type(struct hist_field *hist_field, void *event)\
>  {									\
> @@ -119,6 +124,7 @@ enum hist_field_flags {
>  	HIST_FIELD_FL_SYSCALL		= 128,
>  	HIST_FIELD_FL_STACKTRACE	= 256,
>  	HIST_FIELD_FL_LOG2		= 512,
> +	HIST_FIELD_FL_CPU		= 1024,
>  };
>  
>  struct hist_trigger_attrs {
> @@ -371,6 +377,11 @@ static struct hist_field *create_hist_field(struct ftrace_event_field *field,
>  		goto out;
>  	}
>  
> +	if (flags & HIST_FIELD_FL_CPU) {
> +		hist_field->fn = hist_field_cpu;
> +		goto out;
> +	}
> +
>  	if (WARN_ON_ONCE(!field))
>  		goto out;
>  
> @@ -526,6 +537,9 @@ static int create_key_field(struct hist_trigger_data *hist_data,
>  	} else {
>  		char *field_name = strsep(&field_str, ".");
>  
> +		if (strcmp(field_name, "cpu") == 0)
> +			flags |= HIST_FIELD_FL_CPU;
> +
>  		if (field_str) {
>  			if (strcmp(field_str, "hex") == 0)
>  				flags |= HIST_FIELD_FL_HEX;
> @@ -556,6 +570,7 @@ static int create_key_field(struct hist_trigger_data *hist_data,
>  			key_size = MAX_FILTER_STR_VAL;
>  		else
>  			key_size = field->size;
> +

Unnecessary white space.

-- Steve

>  	}
>  
>  	hist_data->fields[key_idx] = create_hist_field(field, flags);

* Re: [PATCH v5 3/4] tracing: Add trace_irqsoff tracepoints
  2016-09-02 12:37 ` [PATCH v5 3/4] tracing: Add trace_irqsoff tracepoints Binoy Jayan
@ 2016-09-02 13:14   ` Steven Rostedt
  0 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2016-09-02 13:14 UTC (permalink / raw)
  To: Binoy Jayan
  Cc: Ingo Molnar, Daniel Wagner, Arnd Bergmann, linux-kernel, Masami

On Fri,  2 Sep 2016 18:07:30 +0530
Binoy Jayan <binoy.jayan@linaro.org> wrote:

> This work is based on work by Daniel Wagner. A few tracepoints are added
> at the end of the critical sections. With the hist trigger in place,
> histogram plots may be generated with a per-cpu breakdown of the captured
> events. It is based on the Linux kernel's event infrastructure.
> 
> The following filter(s) may be used
> 
> 'hist:key=latency.log2:val=hitcount:sort=latency'
> 'hist:key=ltype,latency:val=hitcount:sort=latency if cpu==1'
> 'hist:key=ltype:val=latency:sort=ltype if ltype==0 && cpu==2'
> 
> Where ltype is:
> 0: IRQSOFF latency
> 1: PREEMPTOFF latency
> 2: Critical timings
> 
> This captures only the latencies introduced by disabled irqs and
> preemption. Additional per-process data has to be captured to calculate
> the effective latencies introduced for individual processes.
> 
> Initial work - latency.patch
> 
> [1] https://git.kernel.org/cgit/linux/kernel/git/rt/linux-stable-rt.git/commit/?h=v3.14-rt-rebase&id=56d50cc34943bbba12b8c5942ee1ae3b29f73acb
> 
> Signed-off-by: Binoy Jayan <binoy.jayan@linaro.org>
> ---
>  include/trace/events/latency.h | 50 ++++++++++++++++++++++++++++++++++++++++++
>  kernel/trace/trace_irqsoff.c   | 35 +++++++++++++++++++++++++++++
>  2 files changed, 85 insertions(+)
>  create mode 100644 include/trace/events/latency.h
> 
> diff --git a/include/trace/events/latency.h b/include/trace/events/latency.h
> new file mode 100644
> index 0000000..ca57f06
> --- /dev/null
> +++ b/include/trace/events/latency.h
> @@ -0,0 +1,50 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM latency
> +
> +#if !defined(_TRACE_HIST_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_HIST_H
> +
> +#include <linux/tracepoint.h>
> +
> +#ifndef __TRACE_LATENCY_TYPE
> +#define __TRACE_LATENCY_TYPE
> +enum latency_type {
> +	LT_IRQ,
> +	LT_PREEMPT,
> +	LT_CRITTIME,
> +	LT_MAX
> +};
> +#define show_ltype(type)			\
> +	__print_symbolic(type,			\
> +		{ LT_IRQ,	"IRQ" },	\
> +		{ LT_PREEMPT,	"PREEMPT" },	\
> +		{ LT_PREEMPT,	"CRIT_TIME" })

Hmm, you have LT_PREEMPT twice above.

Also please add:

TRACE_DEFINE_ENUM(LT_IRQ);
TRACE_DEFINE_ENUM(LT_PREEMPT);
TRACE_DEFINE_ENUM(LT_CRITTIME);

-- Steve

> +#endif
> +
> +DECLARE_EVENT_CLASS(latency_template,
> +	TP_PROTO(int ltype, cycles_t latency),
> +
> +	TP_ARGS(ltype, latency),
> +
> +	TP_STRUCT__entry(
> +		__field(int,		ltype)
> +		__field(cycles_t,	latency)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->ltype		= ltype;
> +		__entry->latency	= latency;
> +	),
> +
> +	TP_printk("ltype=%s(%d), latency=%lu", show_ltype(__entry->ltype),
> +		  __entry->ltype, (unsigned long) __entry->latency)
> +);
> +
> +DEFINE_EVENT(latency_template, latency_preempt,
> +	    TP_PROTO(int ltype, cycles_t latency),
> +	    TP_ARGS(ltype, latency));
> +
> +#endif /* _TRACE_HIST_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>
> diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
> index 03cdff8..60ee660 100644
> --- a/kernel/trace/trace_irqsoff.c
> +++ b/kernel/trace/trace_irqsoff.c
> @@ -13,13 +13,19 @@
>  #include <linux/uaccess.h>
>  #include <linux/module.h>
>  #include <linux/ftrace.h>
> +#include <linux/percpu-defs.h>
> +#include <trace/events/sched.h>
>  
>  #include "trace.h"
>  
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/latency.h>
> +
>  static struct trace_array		*irqsoff_trace __read_mostly;
>  static int				tracer_enabled __read_mostly;
>  
>  static DEFINE_PER_CPU(int, tracing_cpu);
> +static DEFINE_PER_CPU(cycle_t, lat_ts[LT_MAX]);
>  
>  static DEFINE_RAW_SPINLOCK(max_trace_lock);
>  
> @@ -419,9 +425,23 @@ stop_critical_timing(unsigned long ip, unsigned long parent_ip)
>  	atomic_dec(&data->disabled);
>  }
>  
> +static inline void latency_preempt_timing_start(enum latency_type ltype)
> +{
> +	this_cpu_write(lat_ts[ltype], (cycle_t) trace_clock_local());
> +}
> +
> +static inline void latency_preempt_timing_stop(enum latency_type type)
> +{
> +	trace_latency_preempt(type,
> +		(cycle_t) trace_clock_local() - this_cpu_read(lat_ts[type]));
> +}
> +
>  /* start and stop critical timings used to for stoppage (in idle) */
>  void start_critical_timings(void)
>  {
> +	if (unlikely(trace_latency_preempt_enabled()))
> +		latency_preempt_timing_start(LT_CRITTIME);
> +
>  	if (preempt_trace() || irq_trace())
>  		start_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
>  }
> @@ -431,6 +451,9 @@ void stop_critical_timings(void)
>  {
>  	if (preempt_trace() || irq_trace())
>  		stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
> +
> +	if (unlikely(trace_latency_preempt_enabled()))
> +		latency_preempt_timing_stop(LT_CRITTIME);
>  }
>  EXPORT_SYMBOL_GPL(stop_critical_timings);
>  
> @@ -438,6 +461,9 @@ EXPORT_SYMBOL_GPL(stop_critical_timings);
>  #ifdef CONFIG_PROVE_LOCKING
>  void time_hardirqs_on(unsigned long a0, unsigned long a1)
>  {
> +	if (unlikely(trace_latency_preempt_enabled()))
> +		latency_preempt_timing_stop(LT_IRQ);
> +
>  	if (!preempt_trace() && irq_trace())
>  		stop_critical_timing(a0, a1);
>  }
> @@ -446,6 +472,9 @@ void time_hardirqs_off(unsigned long a0, unsigned long a1)
>  {
>  	if (!preempt_trace() && irq_trace())
>  		start_critical_timing(a0, a1);
> +
> +	if (unlikely(trace_latency_preempt_enabled()))
> +		latency_preempt_timing_start(LT_IRQ);
>  }
>  
>  #else /* !CONFIG_PROVE_LOCKING */
> @@ -503,6 +532,9 @@ EXPORT_SYMBOL(trace_hardirqs_off_caller);
>  #ifdef CONFIG_PREEMPT_TRACER
>  void trace_preempt_on(unsigned long a0, unsigned long a1)
>  {
> +	if (unlikely(trace_latency_preempt_enabled()))
> +		latency_preempt_timing_stop(LT_PREEMPT);
> +
>  	if (preempt_trace() && !irq_trace())
>  		stop_critical_timing(a0, a1);
>  }
> @@ -511,6 +543,9 @@ void trace_preempt_off(unsigned long a0, unsigned long a1)
>  {
>  	if (preempt_trace() && !irq_trace())
>  		start_critical_timing(a0, a1);
> +
> +	if (unlikely(trace_latency_preempt_enabled()))
> +		latency_preempt_timing_start(LT_PREEMPT);
>  }
>  #endif /* CONFIG_PREEMPT_TRACER */
>  

* Re: [PATCH v5 4/4] tracing: Histogram for delayed hrtimer offsets
  2016-09-02 12:37 ` [PATCH v5 4/4] tracing: Histogram for delayed hrtimer offsets Binoy Jayan
@ 2016-09-02 13:20   ` Steven Rostedt
  0 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2016-09-02 13:20 UTC (permalink / raw)
  To: Binoy Jayan
  Cc: Ingo Molnar, Daniel Wagner, Arnd Bergmann, linux-kernel, Masami,
	Thomas Gleixner

[ Added Cc to hrtimer maintainer ]

On Fri,  2 Sep 2016 18:07:31 +0530
Binoy Jayan <binoy.jayan@linaro.org> wrote:

> Generate a histogram of the latencies of delayed timer offsets in
> nanoseconds. It shows the latency captured due to a delayed timer expiry
> event, which happens for example when a timer misses its deadline due to
> disabled interrupts. A process scheduled as a result of the timer
> expiration suffers this latency.
> 
> The following filter(s) may be used
> 
> 'hist:key=common_pid.execname:val=toffset,hitcount'
> 'hist:key=cpu,tcomm:val=toffset:sort=tcomm'
> 'hist:key=common_pid.execname,tcomm'
> 
> Signed-off-by: Binoy Jayan <binoy.jayan@linaro.org>
> ---
>  include/linux/hrtimer.h        |  3 +++
>  include/trace/events/latency.h | 23 +++++++++++++++++++++
>  kernel/time/hrtimer.c          | 46 ++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 72 insertions(+)
> 
> diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
> index 5e00f80..e09de14 100644
> --- a/include/linux/hrtimer.h
> +++ b/include/linux/hrtimer.h
> @@ -104,6 +104,9 @@ struct hrtimer {
>  	struct hrtimer_clock_base	*base;
>  	u8				state;
>  	u8				is_rel;
> +#if defined(CONFIG_PREEMPT_TRACER) || defined(CONFIG_IRQSOFF_TRACER)
> +	ktime_t				praecox;
> +#endif
>  #ifdef CONFIG_TIMER_STATS
>  	int				start_pid;
>  	void				*start_site;
> diff --git a/include/trace/events/latency.h b/include/trace/events/latency.h
> index ca57f06..d616db5 100644
> --- a/include/trace/events/latency.h
> +++ b/include/trace/events/latency.h
> @@ -44,6 +44,29 @@ DEFINE_EVENT(latency_template, latency_preempt,
>  	    TP_PROTO(int ltype, cycles_t latency),
>  	    TP_ARGS(ltype, latency));
>  
> +TRACE_EVENT(latency_hrtimer_interrupt,
> +
> +	TP_PROTO(long long toffset, struct task_struct *task),
> +
> +	TP_ARGS(toffset, task),
> +
> +	TP_STRUCT__entry(
> +		__field(long long,	toffset)
> +		__array(char,		tcomm,	TASK_COMM_LEN)
> +		__field(int,		tprio)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->toffset = toffset;
> +		memcpy(__entry->tcomm, task != NULL ? task->comm : "<none>",
> +			task != NULL ? TASK_COMM_LEN : 7);
> +		__entry->tprio  = task != NULL ? task->prio : -1;
> +	),
> +
> +	TP_printk("toffset=%lld thread=%s[%d]",
> +		__entry->toffset, __entry->tcomm, __entry->tprio)
> +);

This probably should be added into the hrtimer interrupt tracepoints
and not off in some special tracepoint header.

> +
>  #endif /* _TRACE_HIST_H */
>  
>  /* This part must be outside protection */
> diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
> index 9ba7c82..04d936b 100644
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -53,9 +53,12 @@
>  #include <asm/uaccess.h>
>  
>  #include <trace/events/timer.h>
> +#include <trace/events/latency.h>
>  
>  #include "tick-internal.h"
>  
> +static enum hrtimer_restart hrtimer_wakeup(struct hrtimer *timer);
> +
>  /*
>   * The timer bases:
>   *
> @@ -960,6 +963,45 @@ static inline ktime_t hrtimer_update_lowres(struct hrtimer *timer, ktime_t tim,
>  	return tim;
>  }
>  
> +static inline void latency_hrtimer_timing_start(struct hrtimer *timer,
> +					 struct hrtimer_clock_base *new_base,
> +					 ktime_t tim)
> +{
> +#if defined(CONFIG_PREEMPT_TRACER) || defined(CONFIG_IRQSOFF_TRACER)

The ideal approach to #ifdefs like this is not to add them inside
each function, as that is confusing to read, but simply to have one:

#if defined(CONFIG_PREEMPT_TRACER) || defined(CONFIG_IRQSOFF_TRACER)
static inline void latency_hrtimer_timing_start(..)
{
> +	if (unlikely(trace_latency_hrtimer_interrupt_enabled())) {
> +		ktime_t now = new_base->get_time();
> +
> +		if (ktime_to_ns(tim) < ktime_to_ns(now))
> +			timer->praecox = now;
> +		else
> +			timer->praecox = ktime_set(0, 0);
> +	}
}

static inline void latency_hrtimer_timing_stop(..)
{
	[...]
}
#else
static inline void latency_hrtimer_timing_start(..) { }
static inline void latency_hrtimer_timing_stop(..) { }
#endif

But something like this patch is going to need Thomas's approval.

-- Steve

> +#endif
> +}
> +
> +static inline void latency_hrtimer_timing_stop(struct hrtimer *timer,
> +						ktime_t basenow)
> +{
> +#if defined(CONFIG_PREEMPT_TRACER) || defined(CONFIG_IRQSOFF_TRACER)
> +	long latency;
> +
> +	struct task_struct *task;
> +
> +	if (likely(!trace_latency_hrtimer_interrupt_enabled()))
> +		return;
> +
> +	latency = ktime_to_ns(ktime_sub(basenow,
> +			      ktime_to_ns(timer->praecox) ?
> +			      timer->praecox : hrtimer_get_expires(timer)));
> +
> +	task = timer->function == hrtimer_wakeup ?
> +			container_of(timer, struct hrtimer_sleeper,
> +				     timer)->task : NULL;
> +	if (latency > 0)
> +		trace_latency_hrtimer_interrupt((u64) latency, task);
> +#endif
> +}
> +
>  /**
>   * hrtimer_start_range_ns - (re)start an hrtimer on the current CPU
>   * @timer:	the timer to be added
> @@ -992,6 +1034,8 @@ void hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
>  
>  	timer_stats_hrtimer_set_start_info(timer);
>  
> +	latency_hrtimer_timing_start(timer, new_base, tim);
> +
>  	leftmost = enqueue_hrtimer(timer, new_base);
>  	if (!leftmost)
>  		goto unlock;
> @@ -1284,6 +1328,8 @@ static void __hrtimer_run_queues(struct hrtimer_cpu_base *cpu_base, ktime_t now)
>  
>  			timer = container_of(node, struct hrtimer, node);
>  
> +			latency_hrtimer_timing_stop(timer, basenow);
> +
>  			/*
>  			 * The immediate goal for using the softexpires is
>  			 * minimizing wakeups, not running timers at the

* Re: [PATCH v5 2/4] tracing: Add cpu as a key field in histogram
  2016-09-02 12:37 ` [PATCH v5 2/4] tracing: Add cpu as a key field in histogram Binoy Jayan
  2016-09-02 13:09   ` Steven Rostedt
@ 2016-09-05  5:40   ` Daniel Wagner
  1 sibling, 0 replies; 9+ messages in thread
From: Daniel Wagner @ 2016-09-05  5:40 UTC (permalink / raw)
  To: Binoy Jayan, Steven Rostedt (Red Hat)
  Cc: Ingo Molnar, Arnd Bergmann, linux-kernel, Masami

Hi Binoy

On 09/02/2016 02:37 PM, Binoy Jayan wrote:
> The field 'cpu', although part of the set of generic fields, is not made
> part of the key fields when mentioned in the trigger command. This hack,
> suggested by Daniel, marks it as one of the key fields and makes it
> appear in the histogram output.

While you are at it, why not add COMM as well? I know there is
already common_pid.execname. With only a few extra lines we get COMM
as well, and it allows one to write something like

	hist:key=cpu,comm:sort=cpu,comm

for example.
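
Applied, that would look something like this (a sketch, assuming tracefs
at /sys/kernel/debug/tracing and an event such as sched:sched_wakeup):

	echo 'hist:key=cpu,comm:sort=cpu,comm' > \
	    /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger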

cheers,
daniel



From cc2f610078ebe9ba8d850cbf602aa529df690cdb Mon Sep 17 00:00:00 2001
From: Daniel Wagner <daniel.wagner@bmw-carit.de>
Date: Wed, 31 Aug 2016 09:37:38 +0200
Subject: [PATCH] tracing: Add hist trigger support for generic fields

Whenever a trace is printed the generic fields (CPU, COMM) are
reconstructed (see trace_print_context()). CPU is taken from the
trace_iterator and COMM is extracted from the savedcmd map (see
__trace_find_cmdline()).

We can't reconstruct this information for hist events. Therefore this
information needs to be stored when a new event is added to the hist
buffer.

There is already support for extracting the COMM for the common_pid
field. For this, the tracing_map_ops infrastructure is used.
Unfortunately, we can't reuse it because it extends an existing
hist_field. That means we first need to add a hist_field before we are
able to reuse tracing_map_ops.

Furthermore, it is quite easy to extend the current code to support
those two fields by adding hist_field_cpu() and hist_field_comm().

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Binoy Jayan <binoy.jayan@linaro.org>
---
 kernel/trace/trace.h             |  6 ++++++
 kernel/trace/trace_events.c      | 13 +++++++------
 kernel/trace/trace_events_hist.c | 31 ++++++++++++++++++++++++++++---
 3 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index f783df4..2b8889b 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1288,6 +1288,12 @@ static inline bool is_function_field(struct ftrace_event_field *field)
 	return field->filter_type == FILTER_TRACE_FN;
 }
 
+static inline bool is_generic_field(struct ftrace_event_field *field)
+{
+	return field->filter_type == FILTER_CPU ||
+	       field->filter_type == FILTER_COMM;
+}
+
 extern enum regex_type
 filter_parse_regex(char *buff, int len, char **search, int *not);
 extern void print_event_filter(struct trace_event_file *file,
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 03c0a48..7d9154d 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -150,9 +150,10 @@ int trace_define_field(struct trace_event_call *call, const char *type,
 }
 EXPORT_SYMBOL_GPL(trace_define_field);
 
-#define __generic_field(type, item, filter_type)			\
+#define __generic_field(type, item, filter_type, size)			\
 	ret = __trace_define_field(&ftrace_generic_fields, #type,	\
-				   #item, 0, 0, is_signed_type(type),	\
+				   #item, 0, size,			\
+				   is_signed_type(type),		\
 				   filter_type);			\
 	if (ret)							\
 		return ret;
@@ -170,10 +171,10 @@ static int trace_define_generic_fields(void)
 {
 	int ret;
 
-	__generic_field(int, CPU, FILTER_CPU);
-	__generic_field(int, cpu, FILTER_CPU);
-	__generic_field(char *, COMM, FILTER_COMM);
-	__generic_field(char *, comm, FILTER_COMM);
+	__generic_field(int, CPU, FILTER_CPU, sizeof(int));
+	__generic_field(int, cpu, FILTER_CPU, sizeof(int));
+	__generic_field(char *, COMM, FILTER_COMM, TASK_COMM_LEN + 1);
+	__generic_field(char *, comm, FILTER_COMM, TASK_COMM_LEN + 1);
 
 	return ret;
 }
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index f3a960e..0b40f22 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -75,6 +75,16 @@ static u64 hist_field_log2(struct hist_field *hist_field, void *event)
 	return (u64) ilog2(roundup_pow_of_two(val));
 }
 
+static u64 hist_field_cpu(struct hist_field *hist_field, void *event)
+{
+	return (u64) smp_processor_id();
+}
+
+static u64 hist_field_comm(struct hist_field *hist_field, void *event)
+{
+	return (u64)(unsigned long)current->comm;
+}
+
 #define DEFINE_HIST_FIELD_FN(type)					\
 static u64 hist_field_##type(struct hist_field *hist_field, void *event)\
 {									\
@@ -119,6 +129,7 @@ enum hist_field_flags {
 	HIST_FIELD_FL_SYSCALL		= 128,
 	HIST_FIELD_FL_STACKTRACE	= 256,
 	HIST_FIELD_FL_LOG2		= 512,
+	HIST_FIELD_FL_COMM		= 1024,
 };
 
 struct hist_trigger_attrs {
@@ -383,6 +394,13 @@ static struct hist_field *create_hist_field(struct ftrace_event_field *field,
 			hist_field->fn = hist_field_dynstring;
 		else
 			hist_field->fn = hist_field_pstring;
+	} else if (is_generic_field(field)) {
+		if (field->filter_type == FILTER_CPU)
+			hist_field->fn = hist_field_cpu;
+		else if (field->filter_type == FILTER_COMM) {
+			flags |= HIST_FIELD_FL_COMM;
+			hist_field->fn = hist_field_comm;
+		}
 	} else {
 		hist_field->fn = select_value_fn(field->size,
 						 field->is_signed);
@@ -748,7 +766,8 @@ static int create_tracing_map_fields(struct hist_trigger_data *hist_data)
 
 			if (hist_field->flags & HIST_FIELD_FL_STACKTRACE)
 				cmp_fn = tracing_map_cmp_none;
-			else if (is_string_field(field))
+			else if (is_string_field(field) ||
+					hist_field->flags & HIST_FIELD_FL_COMM)
 				cmp_fn = tracing_map_cmp_string;
 			else
 				cmp_fn = tracing_map_cmp_num(field->size,
@@ -871,7 +890,8 @@ static inline void add_to_key(char *compound_key, void *key,
 		/* ensure NULL-termination */
 		if (size > key_field->size - 1)
 			size = key_field->size - 1;
-	}
+	} else if (key_field->flags & HIST_FIELD_FL_COMM)
+		size = key_field->field->size;
 
 	memcpy(compound_key + key_field->offset, key, size);
 }
@@ -906,7 +926,8 @@ static void event_hist_trigger(struct event_trigger_data *data, void *rec)
 			key = entries;
 		} else {
 			field_contents = key_field->fn(key_field, rec);
-			if (key_field->flags & HIST_FIELD_FL_STRING) {
+			if (key_field->flags & (HIST_FIELD_FL_STRING |
+						HIST_FIELD_FL_COMM)) {
 				key = (void *)(unsigned long)field_contents;
 				use_compound_key = true;
 			} else
@@ -1004,6 +1025,10 @@ hist_trigger_entry_print(struct seq_file *m,
 		} else if (key_field->flags & HIST_FIELD_FL_STRING) {
 			seq_printf(m, "%s: %-50s", key_field->field->name,
 				   (char *)(key + key_field->offset));
+		} else if (key_field->flags & HIST_FIELD_FL_COMM) {
+			seq_printf(m, "%s: %-16s",
+				   key_field->field->name,
+				   (char *)(key + key_field->offset));
 		} else {
 			uval = *(u64 *)(key + key_field->offset);
 			seq_printf(m, "%s: %10llu", key_field->field->name,
-- 
2.7.4
