linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] *** Latency histograms - IRQSOFF,PREEMPTOFF ***
@ 2016-08-24 11:17 Binoy Jayan
  2016-08-24 11:17 ` [PATCH v2 1/3] tracing: Deference pointers without RCU checks Binoy Jayan
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Binoy Jayan @ 2016-08-24 11:17 UTC (permalink / raw)
  To: Arnd Bergmann, linaro-kernel
  Cc: Daniel Wagner, Carsten Emde, linux-kernel,
	Steven Rostedt (Red Hat),
	Binoy Jayan

Hi,

Thank you Steven and Daniel for reviewing v1 and providing suggestions.
These set of patches [v2] capture latency events caused by interrupts and
premption disabled in kernel. The patches are based on the hist trigger
feature developed by Tom Zanussi.

Git-commit: 7ef224d1d0e3a1ade02d02c01ce1dcffb736d2c3

As mentioned by Daniel, there is also a good write up in the following 
blog by Brendan Gregg:
http://www.brendangregg.com/blog/2016-06-08/linux-hist-triggers.html

The perf interface for the same have not been developed yet.
Related efforts: https://patchwork.kernel.org/patch/8439401

hwlat_detector tracer:
https://lkml.org/lkml/2016/8/4/348
https://lkml.org/lkml/2016/8/4/346

The patch series also contains histogram triggers for missed
hrtimer offsets.

Dependencies:
CONFIG_IRQSOFF_TRACER
CONFIG_PREEMPT_TRACER
CONFIG_PROVE_LOCKING
CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG

Usage of triggers to generate histograms:

mount -t debugfs nodev /sys/kernel/debug
echo 'hist:key=latency.log2:val=hitcount:sort=latency' > /sys/kernel/debug/tracing/events/latency/latency_irqs/trigger
echo 'hist:key=common_pid.execname' > /sys/kernel/debug/tracing/events/latency/latency_hrtimer_interrupt/trigger

CPU specific breakdown of events:

echo 'hist:key=cpu,latency:val=hitcount:sort=latency' > /sys/kernel/debug/tracing/events/latency/latency_preempt/trigger
echo 'hist:key=cpu,latency:val=hitcount:sort=latency if cpu==1' > /sys/kernel/debug/tracing/events/latency/latency_preempt/trigger

Histogram output:
cat /sys/kernel/debug/tracing/events/latency/latency_irqs/hist
cat /sys/kernel/debug/tracing/events/latency/latency_preempt/hist
cat /sys/kernel/debug/tracing/events/latency/latency_critical_timings/hist
cat /sys/kernel/debug/tracing/events/latency/latency_hrtimer_interrupt/hist

Disable a trigger:
echo '!hist:key=latency.log2' > /sys/kernel/debug/tracing/events/latency/latency_irqs/trigger

Changes from v1 as per comments from Steven/Daniel
  - Use single tracepoint for irq/preempt/critical timings by introducing
    a trace type field to differentiate trace type in the same tracepoint.
    A suspicious RCU usage error was introduced, while using the trigger
    command by mentioning the trace type as a key along with cpu.
    I couldn't figure out why. Also, this type of arrangement may also 
    be substandard performance vice.
  - Using a more accurate fast local clock instead of a global ftrace clock.

TODO:
1. perf interface. Not sure if this is needed
2. Latency histograms - process wakeup latency
  - Suggestion from Daniel to not introduce tracepoints in scheduler's hotpaths
  - Alternative to attach kprobes to functions which falls in critical paths
    and find diff of timestamps using event trigger commands.

    For example:

    echo "p:myprobe1 start_critical_timings" > /sys/kernel/debug/tracing/kprobe_events
    echo "p:myprobe2 stop_critical_timings" >>  /sys/kernel/debug/tracing/kprobe_events
    cat /sys/kernel/debug/tracing/kprobe_events
    echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe1/enable
    echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe2/enable
    cat /sys/kernel/debug/tracing/kprobe_events

    And somehow save the timestamps for 'myprobe1' and 'myprobe2' in
    'event_hist_trigger()'. This seems not feasible now as the histogram
    data is saved as a 'sum' only for the conditions met in the key definition.
    This makes it impossible to save timestamps for individual events.

    kernel/trace/trace_events_hist.c +840: hist_trigger_elt_update()

    Mhiramat and Steve, suggested an alternative to keep this timestamp is
    to create a new ftrace map, store the timestamp with context "key" on the
    named map upon event start. Then, at the event end trigger the histogram,
    pick timestamp from the map by using context "key" and calculate the
    difference. Basically this needs is a "map" which can be accessed from both
    the events, .i.e that is the "global variable".

Binoy

Binoy Jayan (2):
  tracing: Add trace_irqsoff tracepoints
  tracing: Histogram for missed timer offsets

Daniel Wagner (1):
  tracing: Deference pointers without RCU checks

 include/linux/hrtimer.h             |  3 ++
 include/linux/rculist.h             | 36 ++++++++++++++++++
 include/linux/tracepoint.h          |  4 +-
 include/trace/events/latency.h      | 74 +++++++++++++++++++++++++++++++++++++
 kernel/time/hrtimer.c               | 39 +++++++++++++++++++
 kernel/trace/trace_events_trigger.c |  6 +--
 kernel/trace/trace_irqsoff.c        | 42 ++++++++++++++++++++-
 7 files changed, 198 insertions(+), 6 deletions(-)
 create mode 100644 include/trace/events/latency.h

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 1/3] tracing: Deference pointers without RCU checks
  2016-08-24 11:17 [PATCH v2 0/3] *** Latency histograms - IRQSOFF,PREEMPTOFF *** Binoy Jayan
@ 2016-08-24 11:17 ` Binoy Jayan
  2016-08-26  1:49   ` Masami Hiramatsu
  2016-08-24 11:17 ` [PATCH v2 2/3] tracing: Add trace_irqsoff tracepoints Binoy Jayan
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Binoy Jayan @ 2016-08-24 11:17 UTC (permalink / raw)
  To: Arnd Bergmann, linaro-kernel
  Cc: Daniel Wagner, Carsten Emde, linux-kernel, Steven Rostedt (Red Hat)

From: Daniel Wagner <daniel.wagner@bmw-carit.de>

The tracepoint can't be used in code section where we are in the
middle of a state transition.

For example if we place a tracepoint inside start/stop_critical_section(),
lockdep complains with

[    0.035589] WARNING: CPU: 0 PID: 3 at kernel/locking/lockdep.c:3560 \
check_flags.part.36+0x1bc/0x210() [    0.036000] \
DEBUG_LOCKS_WARN_ON(current->softirqs_enabled) [    0.036000] Kernel panic - not \
syncing: panic_on_warn set ... [    0.036000]
[    0.036000] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.0.0-rc7+ #460
[    0.036000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS \
1.7.5-20140709_153950- 04/01/2014 [    0.036000]  ffffffff81f2463a ffff88007c93bb98 \
ffffffff81afb317 0000000000000001 [    0.036000]  ffffffff81f212b3 ffff88007c93bc18 \
ffffffff81af7bc2 ffff88007c93bbb8 [    0.036000]  ffffffff00000008 ffff88007c93bc28 \
ffff88007c93bbc8 000000000093bbd8 [    0.036000] Call Trace:
[    0.036000]  [<ffffffff81afb317>] dump_stack+0x4f/0x7b
[    0.036000]  [<ffffffff81af7bc2>] panic+0xc0/0x1e9
[    0.036000]  [<ffffffff81b07188>] ? _raw_spin_unlock_irqrestore+0x38/0x80
[    0.036000]  [<ffffffff81052a60>] warn_slowpath_common+0xc0/0xc0
[    0.036000]  [<ffffffff81057116>] ? __local_bh_enable+0x36/0x70
[    0.036000]  [<ffffffff81052aa6>] warn_slowpath_fmt+0x46/0x50
[    0.036000]  [<ffffffff810a468c>] check_flags.part.36+0x1bc/0x210
[    0.036000]  [<ffffffff810a5168>] lock_is_held+0x78/0x90
[    0.036000]  [<ffffffff81057116>] ? __local_bh_enable+0x36/0x70
[    0.036000]  [<ffffffff810588ab>] ? __do_softirq+0x3db/0x500
[    0.036000]  [<ffffffff81134495>] trace_preempt_on+0x255/0x260
[    0.036000]  [<ffffffff81081c0b>] preempt_count_sub+0xab/0xf0
[    0.036000]  [<ffffffff81057116>] __local_bh_enable+0x36/0x70
[    0.036000]  [<ffffffff810588ab>] __do_softirq+0x3db/0x500
[    0.036000]  [<ffffffff810589ef>] run_ksoftirqd+0x1f/0x60
[    0.036000]  [<ffffffff81079653>] smpboot_thread_fn+0x193/0x2a0
[    0.036000]  [<ffffffff810794c0>] ? SyS_setgroups+0x150/0x150
[    0.036000]  [<ffffffff81075312>] kthread+0xf2/0x110
[    0.036000]  [<ffffffff81b026b3>] ? wait_for_completion+0xc3/0x120
[    0.036000]  [<ffffffff81081c0b>] ? preempt_count_sub+0xab/0xf0
[    0.036000]  [<ffffffff81075220>] ? kthread_create_on_node+0x240/0x240
[    0.036000]  [<ffffffff81b07cc8>] ret_from_fork+0x58/0x90
[    0.036000]  [<ffffffff81075220>] ? kthread_create_on_node+0x240/0x240
[    0.036000] ---[ end Kernel panic - not syncing: panic_on_warn set ...

PeterZ was so kind to explain it to me what is happening:

"__local_bh_enable() tests if this is the last SOFTIRQ_OFFSET, if so it
tells lockdep softirqs are enabled with trace_softirqs_on() after that
we go an actually modify the preempt_count with preempt_count_sub().
Then in preempt_count_sub() you call into trace_preempt_on() if this
was the last preempt_count increment but you do that _before_ you
actually change the preempt_count with __preempt_count_sub() at this
point lockdep and preempt_count think the world differs and *boom*"

So the simplest way to avoid this is by disabling the consistency
checks.

We also need to take care of the iterating in trace_events_trigger.c
to avoid a splatter in conjunction with the hist trigger.

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
---
 include/linux/rculist.h             | 36 ++++++++++++++++++++++++++++++++++++
 include/linux/tracepoint.h          |  4 ++--
 kernel/trace/trace_events_trigger.c |  6 +++---
 3 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index 8beb98d..bee836b 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -279,6 +279,24 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
 	container_of(lockless_dereference(ptr), type, member)
 
 /**
+ * list_entry_rcu_notrace - get the struct for this entry (for tracing)
+ * @ptr:        the &struct list_head pointer.
+ * @type:       the type of the struct this is embedded in.
+ * @member:     the name of the list_head within the struct.
+ *
+ * This primitive may safely run concurrently with the _rcu list-mutation
+ * primitives such as list_add_rcu() as long as it's guarded by rcu_read_lock().
+ *
+ * This is the same as list_entry_rcu() except that it does
+ * not do any RCU debugging or tracing.
+ */
+#define list_entry_rcu_notrace(ptr, type, member) \
+({ \
+	typeof(*ptr) __rcu *__ptr = (typeof(*ptr) __rcu __force *)ptr; \
+	container_of((typeof(ptr))rcu_dereference_raw_notrace(__ptr), type, member); \
+})
+
+/**
  * Where are list_empty_rcu() and list_first_entry_rcu()?
  *
  * Implementing those functions following their counterparts list_empty() and
@@ -391,6 +409,24 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
 	     pos = list_entry_lockless(pos->member.next, typeof(*pos), member))
 
 /**
+ * list_for_each_entry_rcu_notrace	-	iterate over rcu list of given type (for tracing)
+ * @pos:	the type * to use as a loop cursor.
+ * @head:	the head for your list.
+ * @member:	the name of the list_head within the struct.
+ *
+ * This list-traversal primitive may safely run concurrently with
+ * the _rcu list-mutation primitives such as list_add_rcu()
+ * as long as the traversal is guarded by rcu_read_lock().
+ *
+ * This is the same as list_for_each_entry_rcu() except that it does
+ * not do any RCU debugging or tracing.
+ */
+#define list_for_each_entry_rcu_notrace(pos, head, member) \
+	for (pos = list_entry_rcu_notrace((head)->next, typeof(*pos), member); \
+		&pos->member != (head); \
+		pos = list_entry_rcu_notrace(pos->member.next, typeof(*pos), member))
+
+/**
  * list_for_each_entry_continue_rcu - continue iteration over list of given type
  * @pos:	the type * to use as a loop cursor.
  * @head:	the head for your list.
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index be586c6..d0e2a82 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -138,7 +138,7 @@ extern void syscall_unregfunc(void);
 			return;						\
 		prercu;							\
 		rcu_read_lock_sched_notrace();				\
-		it_func_ptr = rcu_dereference_sched((tp)->funcs);	\
+		it_func_ptr = rcu_dereference_raw_notrace((tp)->funcs);	\
 		if (it_func_ptr) {					\
 			do {						\
 				it_func = (it_func_ptr)->func;		\
@@ -189,7 +189,7 @@ extern void syscall_unregfunc(void);
 				TP_CONDITION(cond),,);			\
 		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) {		\
 			rcu_read_lock_sched_notrace();			\
-			rcu_dereference_sched(__tracepoint_##name.funcs);\
+			rcu_dereference_raw_notrace(__tracepoint_##name.funcs);\
 			rcu_read_unlock_sched_notrace();		\
 		}							\
 	}								\
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index a975571..4489fb4 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -71,14 +71,14 @@ event_triggers_call(struct trace_event_file *file, void *rec)
 	if (list_empty(&file->triggers))
 		return tt;
 
-	list_for_each_entry_rcu(data, &file->triggers, list) {
+	list_for_each_entry_rcu_notrace(data, &file->triggers, list) {
 		if (data->paused)
 			continue;
 		if (!rec) {
 			data->ops->func(data, rec);
 			continue;
 		}
-		filter = rcu_dereference_sched(data->filter);
+		filter = rcu_dereference_raw_notrace(data->filter);
 		if (filter && !filter_match_preds(filter, rec))
 			continue;
 		if (event_command_post_trigger(data->cmd_ops)) {
@@ -111,7 +111,7 @@ event_triggers_post_call(struct trace_event_file *file,
 {
 	struct event_trigger_data *data;
 
-	list_for_each_entry_rcu(data, &file->triggers, list) {
+	list_for_each_entry_rcu_notrace(data, &file->triggers, list) {
 		if (data->paused)
 			continue;
 		if (data->cmd_ops->trigger_type & tt)
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 2/3] tracing: Add trace_irqsoff tracepoints
  2016-08-24 11:17 [PATCH v2 0/3] *** Latency histograms - IRQSOFF,PREEMPTOFF *** Binoy Jayan
  2016-08-24 11:17 ` [PATCH v2 1/3] tracing: Deference pointers without RCU checks Binoy Jayan
@ 2016-08-24 11:17 ` Binoy Jayan
  2016-08-25  6:35   ` kbuild test robot
  2016-08-25  6:35   ` kbuild test robot
  2016-08-24 11:17 ` [PATCH v2 3/3] tracing: Histogram for missed timer offsets Binoy Jayan
  2016-08-25  5:26 ` [PATCH v2 0/3] *** Latency histograms - IRQSOFF,PREEMPTOFF *** Daniel Wagner
  3 siblings, 2 replies; 11+ messages in thread
From: Binoy Jayan @ 2016-08-24 11:17 UTC (permalink / raw)
  To: Arnd Bergmann, linaro-kernel
  Cc: Daniel Wagner, Carsten Emde, linux-kernel,
	Steven Rostedt (Red Hat),
	Binoy Jayan

This work is based on work by Daniel Wagner. A few tracepoints are added
at the end of the critical section. With the hist trigger in place, the
hist trigger plots may be generated, with per-cpu breakdown of events
captured. It is based on linux kernel's event infrastructure.

The following filter(s) may be used

'hist:key=latency.log2:val=hitcount:sort=latency'
'hist:key=cpu,latency:val=hitcount:sort=latency if cpu==1'
'hist:key=common_pid.execname'

This captures only the latencies introduced by disabled irqs and
preemption. Additional per process data has to be captured to calculate
the effective latencies introduced for individual processes.

Initial work - latency.patch

[1] https://git.kernel.org/cgit/linux/kernel/git/rt/linux-stable-rt.git/commit/?h=v3.14-rt-rebase&id=56d50cc34943bbba12b8c5942ee1ae3b29f73acb

Signed-off-by: Binoy Jayan <binoy.jayan@linaro.org>
---
 include/trace/events/latency.h | 43 ++++++++++++++++++++++++++++++++++++++++++
 kernel/trace/trace_irqsoff.c   | 42 ++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 84 insertions(+), 1 deletion(-)
 create mode 100644 include/trace/events/latency.h

diff --git a/include/trace/events/latency.h b/include/trace/events/latency.h
new file mode 100644
index 0000000..77896c7
--- /dev/null
+++ b/include/trace/events/latency.h
@@ -0,0 +1,43 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM latency
+
+#if !defined(_TRACE_HIST_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HIST_H
+
+#include <linux/tracepoint.h>
+
+DECLARE_EVENT_CLASS(latency_template,
+	TP_PROTO(int cpu, cycles_t latency),
+
+	TP_ARGS(cpu, latency),
+
+	TP_STRUCT__entry(
+		__field(int,		cpu)
+		__field(cycles_t,	latency)
+	),
+
+	TP_fast_assign(
+		__entry->cpu		= cpu;
+		__entry->latency	= latency;
+	),
+
+	TP_printk("cpu=%d, latency=%lu", __entry->cpu,
+			(unsigned long) __entry->latency)
+);
+
+DEFINE_EVENT(latency_template, latency_irqs,
+	    TP_PROTO(int cpu, cycles_t latency),
+	    TP_ARGS(cpu, latency));
+
+DEFINE_EVENT(latency_template, latency_preempt,
+	    TP_PROTO(int cpu, cycles_t latency),
+	    TP_ARGS(cpu, latency));
+
+DEFINE_EVENT(latency_template, latency_critical_timings,
+	    TP_PROTO(int cpu, cycles_t latency),
+	    TP_ARGS(cpu, latency));
+
+#endif /* _TRACE_HIST_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index 03cdff8..3fcf446 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -13,14 +13,22 @@
 #include <linux/uaccess.h>
 #include <linux/module.h>
 #include <linux/ftrace.h>
+#include <linux/percpu-defs.h>
+
+#include <trace/events/sched.h>
 
 #include "trace.h"
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/latency.h>
+
 static struct trace_array		*irqsoff_trace __read_mostly;
 static int				tracer_enabled __read_mostly;
 
 static DEFINE_PER_CPU(int, tracing_cpu);
-
+static DEFINE_PER_CPU(cycle_t __maybe_unused, ts_irqs);
+static DEFINE_PER_CPU(cycle_t __maybe_unused, ts_preempt);
+static DEFINE_PER_CPU(cycle_t __maybe_unused, ts_critical_timings);
 static DEFINE_RAW_SPINLOCK(max_trace_lock);
 
 enum {
@@ -419,9 +427,17 @@ stop_critical_timing(unsigned long ip, unsigned long parent_ip)
 	atomic_dec(&data->disabled);
 }
 
+static inline cycle_t get_delta(cycle_t __percpu *ts)
+{
+	return (cycle_t) trace_clock_local() - this_cpu_read(*ts);
+}
 /* start and stop critical timings used to for stoppage (in idle) */
 void start_critical_timings(void)
 {
+	if (trace_latency_critical_timings_enabled())
+		this_cpu_write(ts_critical_timings,
+			(cycle_t) trace_clock_local());
+
 	if (preempt_trace() || irq_trace())
 		start_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
 }
@@ -431,6 +447,13 @@ void stop_critical_timings(void)
 {
 	if (preempt_trace() || irq_trace())
 		stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
+
+	if (trace_latency_critical_timings_enabled()) {
+		trace_latency_critical_timings(
+			raw_smp_processor_id(),
+			get_delta(&ts_critical_timings));
+	}
+
 }
 EXPORT_SYMBOL_GPL(stop_critical_timings);
 
@@ -438,6 +461,10 @@ EXPORT_SYMBOL_GPL(stop_critical_timings);
 #ifdef CONFIG_PROVE_LOCKING
 void time_hardirqs_on(unsigned long a0, unsigned long a1)
 {
+	if (trace_latency_irqs_enabled()) {
+		trace_latency_irqs(raw_smp_processor_id(),
+			get_delta(&ts_irqs));
+	}
 	if (!preempt_trace() && irq_trace())
 		stop_critical_timing(a0, a1);
 }
@@ -446,6 +473,10 @@ void time_hardirqs_off(unsigned long a0, unsigned long a1)
 {
 	if (!preempt_trace() && irq_trace())
 		start_critical_timing(a0, a1);
+
+	if (trace_latency_irqs_enabled()) {
+                this_cpu_write(ts_irqs, (cycle_t) trace_clock_local());
+	}
 }
 
 #else /* !CONFIG_PROVE_LOCKING */
@@ -503,6 +534,11 @@ EXPORT_SYMBOL(trace_hardirqs_off_caller);
 #ifdef CONFIG_PREEMPT_TRACER
 void trace_preempt_on(unsigned long a0, unsigned long a1)
 {
+	if (trace_latency_preempt_enabled()) {
+		trace_latency_preempt(raw_smp_processor_id(),
+			get_delta(&ts_preempt));
+	}
+
 	if (preempt_trace() && !irq_trace())
 		stop_critical_timing(a0, a1);
 }
@@ -511,6 +547,10 @@ void trace_preempt_off(unsigned long a0, unsigned long a1)
 {
 	if (preempt_trace() && !irq_trace())
 		start_critical_timing(a0, a1);
+
+	if (trace_latency_preempt_enabled()) {
+                this_cpu_write(ts_preempt, (cycle_t) trace_clock_local());
+	}
 }
 #endif /* CONFIG_PREEMPT_TRACER */
 
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 3/3] tracing: Histogram for missed timer offsets
  2016-08-24 11:17 [PATCH v2 0/3] *** Latency histograms - IRQSOFF,PREEMPTOFF *** Binoy Jayan
  2016-08-24 11:17 ` [PATCH v2 1/3] tracing: Deference pointers without RCU checks Binoy Jayan
  2016-08-24 11:17 ` [PATCH v2 2/3] tracing: Add trace_irqsoff tracepoints Binoy Jayan
@ 2016-08-24 11:17 ` Binoy Jayan
  2016-08-25  5:26 ` [PATCH v2 0/3] *** Latency histograms - IRQSOFF,PREEMPTOFF *** Daniel Wagner
  3 siblings, 0 replies; 11+ messages in thread
From: Binoy Jayan @ 2016-08-24 11:17 UTC (permalink / raw)
  To: Arnd Bergmann, linaro-kernel
  Cc: Daniel Wagner, Carsten Emde, linux-kernel,
	Steven Rostedt (Red Hat),
	Binoy Jayan

Latencies of missed timer offsets. Generate a histogram of missed
timer offsets in microseconds. This will be a based along with irq
and preemption latencies to calculate the effective process wakeup
latencies.

The following filter(s) may be used

'hist:key=common_pid.execname'
'hist:key=common_pid.execname,cpu:val=toffset,hitcount'

Signed-off-by: Binoy Jayan <binoy.jayan@linaro.org>
---
 include/linux/hrtimer.h        |  3 +++
 include/trace/events/latency.h | 31 +++++++++++++++++++++++++++++++
 kernel/time/hrtimer.c          | 39 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 73 insertions(+)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 5e00f80..e09de14 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -104,6 +104,9 @@ struct hrtimer {
 	struct hrtimer_clock_base	*base;
 	u8				state;
 	u8				is_rel;
+#if defined(CONFIG_PREEMPT_TRACER) || defined(CONFIG_IRQSOFF_TRACER)
+	ktime_t				praecox;
+#endif
 #ifdef CONFIG_TIMER_STATS
 	int				start_pid;
 	void				*start_site;
diff --git a/include/trace/events/latency.h b/include/trace/events/latency.h
index 77896c7..24cf009 100644
--- a/include/trace/events/latency.h
+++ b/include/trace/events/latency.h
@@ -37,6 +37,37 @@ DEFINE_EVENT(latency_template, latency_critical_timings,
 	    TP_PROTO(int cpu, cycles_t latency),
 	    TP_ARGS(cpu, latency));
 
+TRACE_EVENT(latency_hrtimer_interrupt,
+
+	TP_PROTO(int cpu, long long toffset, struct task_struct *curr,
+		struct task_struct *task),
+
+	TP_ARGS(cpu, toffset, curr, task),
+
+	TP_STRUCT__entry(
+		__field(int,		cpu)
+		__field(long long,	toffset)
+		__array(char,		ccomm,	TASK_COMM_LEN)
+		__field(int,		cprio)
+		__array(char,		tcomm,	TASK_COMM_LEN)
+		__field(int,		tprio)
+	),
+
+	TP_fast_assign(
+		__entry->cpu	 = cpu;
+		__entry->toffset = toffset;
+		memcpy(__entry->ccomm, curr->comm, TASK_COMM_LEN);
+		__entry->cprio  = curr->prio;
+		memcpy(__entry->tcomm, task != NULL ? task->comm : "<none>",
+			task != NULL ? TASK_COMM_LEN : 7);
+		__entry->tprio  = task != NULL ? task->prio : -1;
+	),
+
+	TP_printk("cpu=%d toffset=%lld curr=%s[%d] thread=%s[%d]",
+		__entry->cpu, __entry->toffset, __entry->ccomm,
+		__entry->cprio, __entry->tcomm, __entry->tprio)
+);
+
 #endif /* _TRACE_HIST_H */
 
 /* This part must be outside protection */
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 9ba7c82..1a96e34 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -53,9 +53,12 @@
 #include <asm/uaccess.h>
 
 #include <trace/events/timer.h>
+#include <trace/events/latency.h>
 
 #include "tick-internal.h"
 
+static enum hrtimer_restart hrtimer_wakeup(struct hrtimer *timer);
+
 /*
  * The timer bases:
  *
@@ -960,6 +963,38 @@ static inline ktime_t hrtimer_update_lowres(struct hrtimer *timer, ktime_t tim,
 	return tim;
 }
 
+static inline void trace_latency_mark_ts(struct hrtimer *timer,
+					 struct hrtimer_clock_base *new_base,
+					 ktime_t tim)
+{
+#if defined(CONFIG_PREEMPT_TRACER) || defined(CONFIG_IRQSOFF_TRACER)
+	if (trace_latency_hrtimer_interrupt_enabled()) {
+		ktime_t now = new_base->get_time();
+
+		if (ktime_to_ns(tim) < ktime_to_ns(now))
+			timer->praecox = now;
+		else
+			timer->praecox = ktime_set(0, 0);
+	}
+#endif
+}
+
+static inline void trace_missed_hrtimer(struct hrtimer *timer, ktime_t basenow)
+{
+#if defined(CONFIG_PREEMPT_TRACER) || defined(CONFIG_IRQSOFF_TRACER)
+	if (trace_latency_hrtimer_interrupt_enabled())
+		trace_latency_hrtimer_interrupt(raw_smp_processor_id(),
+			ktime_to_ns(ktime_sub(ktime_to_ns(timer->praecox) ?
+				timer->praecox : hrtimer_get_expires(timer),
+				basenow)),
+			current,
+			timer->function == hrtimer_wakeup ?
+				container_of(timer, struct hrtimer_sleeper,
+					timer)->task : NULL);
+#endif
+
+}
+
 /**
  * hrtimer_start_range_ns - (re)start an hrtimer on the current CPU
  * @timer:	the timer to be added
@@ -992,6 +1027,8 @@ void hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
 
 	timer_stats_hrtimer_set_start_info(timer);
 
+	trace_latency_mark_ts(timer, new_base, tim);
+
 	leftmost = enqueue_hrtimer(timer, new_base);
 	if (!leftmost)
 		goto unlock;
@@ -1284,6 +1321,8 @@ static void __hrtimer_run_queues(struct hrtimer_cpu_base *cpu_base, ktime_t now)
 
 			timer = container_of(node, struct hrtimer, node);
 
+			trace_missed_hrtimer(timer, basenow);
+
 			/*
 			 * The immediate goal for using the softexpires is
 			 * minimizing wakeups, not running timers at the
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 0/3] *** Latency histograms - IRQSOFF,PREEMPTOFF ***
  2016-08-24 11:17 [PATCH v2 0/3] *** Latency histograms - IRQSOFF,PREEMPTOFF *** Binoy Jayan
                   ` (2 preceding siblings ...)
  2016-08-24 11:17 ` [PATCH v2 3/3] tracing: Histogram for missed timer offsets Binoy Jayan
@ 2016-08-25  5:26 ` Daniel Wagner
  2016-08-25  6:33   ` Binoy Jayan
  3 siblings, 1 reply; 11+ messages in thread
From: Daniel Wagner @ 2016-08-25  5:26 UTC (permalink / raw)
  To: Binoy Jayan, Arnd Bergmann, linaro-kernel
  Cc: Carsten Emde, linux-kernel, Steven Rostedt (Red Hat)

Hi Binoy,

On 08/24/2016 01:17 PM, Binoy Jayan wrote:
> Histogram output:
> cat /sys/kernel/debug/tracing/events/latency/latency_irqs/hist
> cat /sys/kernel/debug/tracing/events/latency/latency_preempt/hist
> cat /sys/kernel/debug/tracing/events/latency/latency_critical_timings/hist
> cat /sys/kernel/debug/tracing/events/latency/latency_hrtimer_interrupt/hist

[...]

> Changes from v1 as per comments from Steven/Daniel
>   - Use single tracepoint for irq/preempt/critical timings by introducing
>     a trace type field to differentiate trace type in the same tracepoint.

Did you send out the right patches? This version still looks like the 
previous one in this regard. And wouldn't be the 'Histogram output' have 
only one file? Maybe I just understood something wrong here.

cheers,
daniel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 0/3] *** Latency histograms - IRQSOFF,PREEMPTOFF ***
  2016-08-25  5:26 ` [PATCH v2 0/3] *** Latency histograms - IRQSOFF,PREEMPTOFF *** Daniel Wagner
@ 2016-08-25  6:33   ` Binoy Jayan
  0 siblings, 0 replies; 11+ messages in thread
From: Binoy Jayan @ 2016-08-25  6:33 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: Arnd Bergmann, linaro-kernel, Carsten Emde,
	Linux kernel mailing list, Steven Rostedt (Red Hat)

On 25 August 2016 at 10:56, Daniel Wagner <daniel.wagner@bmw-carit.de> wrote:
> Hi Binoy,
>
> On 08/24/2016 01:17 PM, Binoy Jayan wrote:
>>
>> Histogram output:
>> cat /sys/kernel/debug/tracing/events/latency/latency_irqs/hist
>> cat /sys/kernel/debug/tracing/events/latency/latency_preempt/hist
>> cat /sys/kernel/debug/tracing/events/latency/latency_critical_timings/hist
>> cat
>> /sys/kernel/debug/tracing/events/latency/latency_hrtimer_interrupt/hist
>
>
> [...]
>
>> Changes from v1 as per comments from Steven/Daniel
>>   - Use single tracepoint for irq/preempt/critical timings by introducing
>>     a trace type field to differentiate trace type in the same tracepoint.
>
>
> Did you send out the right patches? This version still looks like the
> previous one in this regard. And wouldn't be the 'Histogram output' have
> only one file? Maybe I just understood something wrong here.
>
> cheers,
> daniel

Hi Daniel,

This patch is after incorporating changes w.r.t. comments by steven.
And regarding
using one tracepoint, I have mentioned the same in the cover letter. I have sent
you (only) another patch with that change. When I do it like that I
get an RCU error,
the first time the "type" key is used. It is weird that it happens
only for the first time
something is echo-ed to the trigger file. I haven't been able to
figure out why yet.

Binoy

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/3] tracing: Add trace_irqsoff tracepoints
  2016-08-24 11:17 ` [PATCH v2 2/3] tracing: Add trace_irqsoff tracepoints Binoy Jayan
@ 2016-08-25  6:35   ` kbuild test robot
  2016-08-25  6:35   ` kbuild test robot
  1 sibling, 0 replies; 11+ messages in thread
From: kbuild test robot @ 2016-08-25  6:35 UTC (permalink / raw)
  To: Binoy Jayan
  Cc: kbuild-all, Arnd Bergmann, linaro-kernel, Daniel Wagner,
	Carsten Emde, linux-kernel, Steven Rostedt (Red Hat),
	Binoy Jayan

[-- Attachment #1: Type: text/plain, Size: 2584 bytes --]

Hi Binoy,

[auto build test ERROR on tip/perf/core]
[also build test ERROR on v4.8-rc3 next-20160824]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
[Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:    https://github.com/0day-ci/linux/commits/Binoy-Jayan/tracing-Deference-pointers-without-RCU-checks/20160824-192127
config: sparc64-allyesconfig (attached as .config)
compiler: sparc64-linux-gnu-gcc (Debian 5.4.0-6) 5.4.0 20160609
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=sparc64 

All errors (new ones prefixed by >>):

   kernel/trace/trace_irqsoff.c: In function 'stop_critical_timings':
>> kernel/trace/trace_irqsoff.c:454:14: error: passing argument 1 of 'get_delta' from incompatible pointer type [-Werror=incompatible-pointer-types]
       get_delta(&ts_critical_timings));
                 ^
   kernel/trace/trace_irqsoff.c:430:23: note: expected 'cycle_t * {aka long long unsigned int *}' but argument is of type 'cycle_t * {aka long long unsigned int *}'
    static inline cycle_t get_delta(cycle_t __percpu *ts)
                          ^
   kernel/trace/trace_irqsoff.c: In function 'time_hardirqs_on':
   kernel/trace/trace_irqsoff.c:466:14: error: passing argument 1 of 'get_delta' from incompatible pointer type [-Werror=incompatible-pointer-types]
       get_delta(&ts_irqs));
                 ^
   kernel/trace/trace_irqsoff.c:430:23: note: expected 'cycle_t * {aka long long unsigned int *}' but argument is of type 'cycle_t * {aka long long unsigned int *}'
    static inline cycle_t get_delta(cycle_t __percpu *ts)
                          ^
   cc1: some warnings being treated as errors

vim +/get_delta +454 kernel/trace/trace_irqsoff.c

   448		if (preempt_trace() || irq_trace())
   449			stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
   450	
   451		if (trace_latency_critical_timings_enabled()) {
   452			trace_latency_critical_timings(
   453				raw_smp_processor_id(),
 > 454				get_delta(&ts_critical_timings));
   455		}
   456	
   457	}

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 47053 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/3] tracing: Add trace_irqsoff tracepoints
  2016-08-24 11:17 ` [PATCH v2 2/3] tracing: Add trace_irqsoff tracepoints Binoy Jayan
  2016-08-25  6:35   ` kbuild test robot
@ 2016-08-25  6:35   ` kbuild test robot
  1 sibling, 0 replies; 11+ messages in thread
From: kbuild test robot @ 2016-08-25  6:35 UTC (permalink / raw)
  To: Binoy Jayan
  Cc: kbuild-all, Arnd Bergmann, linaro-kernel, Daniel Wagner,
	Carsten Emde, linux-kernel, Steven Rostedt (Red Hat),
	Binoy Jayan

[-- Attachment #1: Type: text/plain, Size: 2587 bytes --]

Hi Binoy,

[auto build test WARNING on tip/perf/core]
[also build test WARNING on v4.8-rc3 next-20160824]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
[Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:    https://github.com/0day-ci/linux/commits/Binoy-Jayan/tracing-Deference-pointers-without-RCU-checks/20160824-192127
config: tile-allyesconfig (attached as .config)
compiler: tilegx-linux-gcc (GCC) 4.6.2
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=tile 

All warnings (new ones prefixed by >>):

   kernel/trace/trace_irqsoff.c: In function 'stop_critical_timings':
>> kernel/trace/trace_irqsoff.c:454:4: warning: passing argument 1 of 'get_delta' from incompatible pointer type [enabled by default]
   kernel/trace/trace_irqsoff.c:430:23: note: expected 'cycle_t *' but argument is of type 'cycle_t *'
   kernel/trace/trace_irqsoff.c: In function 'time_hardirqs_on':
   kernel/trace/trace_irqsoff.c:466:4: warning: passing argument 1 of 'get_delta' from incompatible pointer type [enabled by default]
   kernel/trace/trace_irqsoff.c:430:23: note: expected 'cycle_t *' but argument is of type 'cycle_t *'

vim +/get_delta +454 kernel/trace/trace_irqsoff.c

   438			this_cpu_write(ts_critical_timings,
   439				(cycle_t) trace_clock_local());
   440	
   441		if (preempt_trace() || irq_trace())
   442			start_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
   443	}
   444	EXPORT_SYMBOL_GPL(start_critical_timings);
   445	
   446	void stop_critical_timings(void)
   447	{
   448		if (preempt_trace() || irq_trace())
   449			stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
   450	
   451		if (trace_latency_critical_timings_enabled()) {
   452			trace_latency_critical_timings(
   453				raw_smp_processor_id(),
 > 454				get_delta(&ts_critical_timings));
   455		}
   456	
   457	}
   458	EXPORT_SYMBOL_GPL(stop_critical_timings);
   459	
   460	#ifdef CONFIG_IRQSOFF_TRACER
   461	#ifdef CONFIG_PROVE_LOCKING
   462	void time_hardirqs_on(unsigned long a0, unsigned long a1)

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 45507 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 1/3] tracing: Deference pointers without RCU checks
  2016-08-24 11:17 ` [PATCH v2 1/3] tracing: Deference pointers without RCU checks Binoy Jayan
@ 2016-08-26  1:49   ` Masami Hiramatsu
  2016-08-26  5:26     ` Binoy Jayan
  0 siblings, 1 reply; 11+ messages in thread
From: Masami Hiramatsu @ 2016-08-26  1:49 UTC (permalink / raw)
  To: Binoy Jayan
  Cc: Arnd Bergmann, linaro-kernel, Daniel Wagner, Carsten Emde,
	linux-kernel, Steven Rostedt (Red Hat)

On Wed, 24 Aug 2016 16:47:28 +0530
Binoy Jayan <binoy.jayan@linaro.org> wrote:

> From: Daniel Wagner <daniel.wagner@bmw-carit.de>
> 
> The tracepoint can't be used in code section where we are in the
> middle of a state transition.
> 
> For example if we place a tracepoint inside start/stop_critical_section(),
> lockdep complains with
> 
> [    0.035589] WARNING: CPU: 0 PID: 3 at kernel/locking/lockdep.c:3560 \
> check_flags.part.36+0x1bc/0x210() [    0.036000] \
> DEBUG_LOCKS_WARN_ON(current->softirqs_enabled) [    0.036000] Kernel panic - not \
> syncing: panic_on_warn set ... [    0.036000]
> [    0.036000] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.0.0-rc7+ #460
> [    0.036000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS \
> 1.7.5-20140709_153950- 04/01/2014 [    0.036000]  ffffffff81f2463a ffff88007c93bb98 \
> ffffffff81afb317 0000000000000001 [    0.036000]  ffffffff81f212b3 ffff88007c93bc18 \
> ffffffff81af7bc2 ffff88007c93bbb8 [    0.036000]  ffffffff00000008 ffff88007c93bc28 \
> ffff88007c93bbc8 000000000093bbd8 [    0.036000] Call Trace:
> [    0.036000]  [<ffffffff81afb317>] dump_stack+0x4f/0x7b
> [    0.036000]  [<ffffffff81af7bc2>] panic+0xc0/0x1e9
> [    0.036000]  [<ffffffff81b07188>] ? _raw_spin_unlock_irqrestore+0x38/0x80
> [    0.036000]  [<ffffffff81052a60>] warn_slowpath_common+0xc0/0xc0
> [    0.036000]  [<ffffffff81057116>] ? __local_bh_enable+0x36/0x70
> [    0.036000]  [<ffffffff81052aa6>] warn_slowpath_fmt+0x46/0x50
> [    0.036000]  [<ffffffff810a468c>] check_flags.part.36+0x1bc/0x210
> [    0.036000]  [<ffffffff810a5168>] lock_is_held+0x78/0x90
> [    0.036000]  [<ffffffff81057116>] ? __local_bh_enable+0x36/0x70
> [    0.036000]  [<ffffffff810588ab>] ? __do_softirq+0x3db/0x500
> [    0.036000]  [<ffffffff81134495>] trace_preempt_on+0x255/0x260
> [    0.036000]  [<ffffffff81081c0b>] preempt_count_sub+0xab/0xf0
> [    0.036000]  [<ffffffff81057116>] __local_bh_enable+0x36/0x70
> [    0.036000]  [<ffffffff810588ab>] __do_softirq+0x3db/0x500
> [    0.036000]  [<ffffffff810589ef>] run_ksoftirqd+0x1f/0x60
> [    0.036000]  [<ffffffff81079653>] smpboot_thread_fn+0x193/0x2a0
> [    0.036000]  [<ffffffff810794c0>] ? SyS_setgroups+0x150/0x150
> [    0.036000]  [<ffffffff81075312>] kthread+0xf2/0x110
> [    0.036000]  [<ffffffff81b026b3>] ? wait_for_completion+0xc3/0x120
> [    0.036000]  [<ffffffff81081c0b>] ? preempt_count_sub+0xab/0xf0
> [    0.036000]  [<ffffffff81075220>] ? kthread_create_on_node+0x240/0x240
> [    0.036000]  [<ffffffff81b07cc8>] ret_from_fork+0x58/0x90
> [    0.036000]  [<ffffffff81075220>] ? kthread_create_on_node+0x240/0x240
> [    0.036000] ---[ end Kernel panic - not syncing: panic_on_warn set ...
> 
> PeterZ was so kind to explain it to me what is happening:
> 
> "__local_bh_enable() tests if this is the last SOFTIRQ_OFFSET, if so it
> tells lockdep softirqs are enabled with trace_softirqs_on() after that
> we go an actually modify the preempt_count with preempt_count_sub().
> Then in preempt_count_sub() you call into trace_preempt_on() if this
> was the last preempt_count increment but you do that _before_ you
> actually change the preempt_count with __preempt_count_sub() at this
> point lockdep and preempt_count think the world differs and *boom*"
> 
> So the simplest way to avoid this is by disabling the consistency
> checks.
> 
> We also need to take care of the iterating in trace_events_trigger.c
> to avoid a splatter in conjunction with the hist trigger.

Special care for lockdep inside tracepoint handler is reasonable.

Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>

Steven, since this seems a bugfix, could you pick this from the series?

Thank you,

> 
> Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
> ---
>  include/linux/rculist.h             | 36 ++++++++++++++++++++++++++++++++++++
>  include/linux/tracepoint.h          |  4 ++--
>  kernel/trace/trace_events_trigger.c |  6 +++---
>  3 files changed, 41 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/rculist.h b/include/linux/rculist.h
> index 8beb98d..bee836b 100644
> --- a/include/linux/rculist.h
> +++ b/include/linux/rculist.h
> @@ -279,6 +279,24 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
>  	container_of(lockless_dereference(ptr), type, member)
>  
>  /**
> + * list_entry_rcu_notrace - get the struct for this entry (for tracing)
> + * @ptr:        the &struct list_head pointer.
> + * @type:       the type of the struct this is embedded in.
> + * @member:     the name of the list_head within the struct.
> + *
> + * This primitive may safely run concurrently with the _rcu list-mutation
> + * primitives such as list_add_rcu() as long as it's guarded by rcu_read_lock().
> + *
> + * This is the same as list_entry_rcu() except that it does
> + * not do any RCU debugging or tracing.
> + */
> +#define list_entry_rcu_notrace(ptr, type, member) \
> +({ \
> +	typeof(*ptr) __rcu *__ptr = (typeof(*ptr) __rcu __force *)ptr; \
> +	container_of((typeof(ptr))rcu_dereference_raw_notrace(__ptr), type, member); \
> +})
> +
> +/**
>   * Where are list_empty_rcu() and list_first_entry_rcu()?
>   *
>   * Implementing those functions following their counterparts list_empty() and
> @@ -391,6 +409,24 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
>  	     pos = list_entry_lockless(pos->member.next, typeof(*pos), member))
>  
>  /**
> + * list_for_each_entry_rcu_notrace	-	iterate over rcu list of given type (for tracing)
> + * @pos:	the type * to use as a loop cursor.
> + * @head:	the head for your list.
> + * @member:	the name of the list_head within the struct.
> + *
> + * This list-traversal primitive may safely run concurrently with
> + * the _rcu list-mutation primitives such as list_add_rcu()
> + * as long as the traversal is guarded by rcu_read_lock().
> + *
> + * This is the same as list_for_each_entry_rcu() except that it does
> + * not do any RCU debugging or tracing.
> + */
> +#define list_for_each_entry_rcu_notrace(pos, head, member) \
> +	for (pos = list_entry_rcu_notrace((head)->next, typeof(*pos), member); \
> +		&pos->member != (head); \
> +		pos = list_entry_rcu_notrace(pos->member.next, typeof(*pos), member))
> +
> +/**
>   * list_for_each_entry_continue_rcu - continue iteration over list of given type
>   * @pos:	the type * to use as a loop cursor.
>   * @head:	the head for your list.
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index be586c6..d0e2a82 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -138,7 +138,7 @@ extern void syscall_unregfunc(void);
>  			return;						\
>  		prercu;							\
>  		rcu_read_lock_sched_notrace();				\
> -		it_func_ptr = rcu_dereference_sched((tp)->funcs);	\
> +		it_func_ptr = rcu_dereference_raw_notrace((tp)->funcs);	\
>  		if (it_func_ptr) {					\
>  			do {						\
>  				it_func = (it_func_ptr)->func;		\
> @@ -189,7 +189,7 @@ extern void syscall_unregfunc(void);
>  				TP_CONDITION(cond),,);			\
>  		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) {		\
>  			rcu_read_lock_sched_notrace();			\
> -			rcu_dereference_sched(__tracepoint_##name.funcs);\
> +			rcu_dereference_raw_notrace(__tracepoint_##name.funcs);\
>  			rcu_read_unlock_sched_notrace();		\
>  		}							\
>  	}								\
> diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
> index a975571..4489fb4 100644
> --- a/kernel/trace/trace_events_trigger.c
> +++ b/kernel/trace/trace_events_trigger.c
> @@ -71,14 +71,14 @@ event_triggers_call(struct trace_event_file *file, void *rec)
>  	if (list_empty(&file->triggers))
>  		return tt;
>  
> -	list_for_each_entry_rcu(data, &file->triggers, list) {
> +	list_for_each_entry_rcu_notrace(data, &file->triggers, list) {
>  		if (data->paused)
>  			continue;
>  		if (!rec) {
>  			data->ops->func(data, rec);
>  			continue;
>  		}
> -		filter = rcu_dereference_sched(data->filter);
> +		filter = rcu_dereference_raw_notrace(data->filter);
>  		if (filter && !filter_match_preds(filter, rec))
>  			continue;
>  		if (event_command_post_trigger(data->cmd_ops)) {
> @@ -111,7 +111,7 @@ event_triggers_post_call(struct trace_event_file *file,
>  {
>  	struct event_trigger_data *data;
>  
> -	list_for_each_entry_rcu(data, &file->triggers, list) {
> +	list_for_each_entry_rcu_notrace(data, &file->triggers, list) {
>  		if (data->paused)
>  			continue;
>  		if (data->cmd_ops->trigger_type & tt)
> -- 
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 1/3] tracing: Deference pointers without RCU checks
  2016-08-26  1:49   ` Masami Hiramatsu
@ 2016-08-26  5:26     ` Binoy Jayan
  2016-08-28 23:50       ` Masami Hiramatsu
  0 siblings, 1 reply; 11+ messages in thread
From: Binoy Jayan @ 2016-08-26  5:26 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Arnd Bergmann, linaro-kernel, Daniel Wagner, Carsten Emde,
	Linux kernel mailing list, Steven Rostedt (Red Hat)

On 26 August 2016 at 07:19, Masami Hiramatsu <mhiramat@kernel.org> wrote:
> On Wed, 24 Aug 2016 16:47:28 +0530
>> "__local_bh_enable() tests if this is the last SOFTIRQ_OFFSET, if so it
>> tells lockdep softirqs are enabled with trace_softirqs_on() after that
>> we go an actually modify the preempt_count with preempt_count_sub().
>> Then in preempt_count_sub() you call into trace_preempt_on() if this
>> was the last preempt_count increment but you do that _before_ you
>> actually change the preempt_count with __preempt_count_sub() at this
>> point lockdep and preempt_count think the world differs and *boom*"
>>
>> So the simplest way to avoid this is by disabling the consistency
>> checks.
>>
>> We also need to take care of the iterating in trace_events_trigger.c
>> to avoid a splatter in conjunction with the hist trigger.
>
> Special care for lockdep inside tracepoint handler is reasonable.
>
> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
>
> Steven, since this seems a bugfix, could you pick this from the series?
>
> Thank you,
>
> --
> Masami Hiramatsu <mhiramat@kernel.org>


Hi Daniel/Masami,

I ran into a similar rcu error while using same tracepoint for all
three latency types
and using a filter like below to trigger only events falling under a
specific type.

echo 'hist:key=ltype,cpu:val=latency:sort=ltype,cpu if ltype==0' > \
   /sys/kernel/debug/tracing/events/latency/latency_preempt/trigger

The error occurs only when I use the predicate 'if ltype==0' as filter.

It occurs in 'filter_match_preds' during a call to 'rcu_dereference_sched'.
kernel/trace/trace_events_filter.c +611 : filter_match_preds()

Surprisingly, this happens only the first time the echo command is used on
the trigger file after each boot.

Do you think it is similar to the bug you have fixed? May be i'll try using
"rcu_dereference_raw_notrace" instead of 'rcu_dereference_sched'.

 Binoy


[ 1029.324257] ===============================
[ 1029.324785] [ INFO: suspicious RCU usage. ]
[ 1029.328698] 4.7.0+ #49 Not tainted
[ 1029.332858] -------------------------------
[ 1029.336334] /local/mnt/workspace/src/korg/linux/kernel/trace/trace_events_filter.c:611
suspicious rcu_dereference_check() usage!
[ 1029.340423]
[ 1029.340423] other info that might help us debug this:
[ 1029.340423]
[ 1029.352226]
[ 1029.352226] RCU used illegally from idle CPU!
[ 1029.352226] rcu_scheduler_active = 1, debug_locks = 0
[ 1029.359953] RCU used illegally from extended quiescent state!
[ 1029.371057] no locks held by swapper/0/0.
[ 1029.376696]
[ 1029.376696] stack backtrace:
[ 1029.380693] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.7.0+ #49
[ 1029.385033] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[ 1029.391111] Call trace:
[ 1029.397798] [<ffff000008089ecc>] dump_backtrace+0x0/0x1e0
[ 1029.399967] [<ffff00000808a0d0>] show_stack+0x24/0x2c
[ 1029.405523] [<ffff000008431554>] dump_stack+0xb0/0xf0
[ 1029.410557] [<ffff000008115dd0>] lockdep_rcu_suspicious+0xe8/0x120
[ 1029.415595] [<ffff0000081a5d40>] filter_match_preds+0x108/0x118
[ 1029.421669] [<ffff0000081a8308>] event_triggers_call+0x5c/0xc0
[ 1029.427485] [<ffff0000081a2184>] trace_event_buffer_commit+0x11c/0x244
[ 1029.433390] [<ffff00000819a4c0>]
trace_event_raw_event_latency_template+0x58/0xa4
[ 1029.439902] [<ffff00000819b5ec>] time_hardirqs_on+0x264/0x290
[ 1029.447450] [<ffff0000081170d0>] trace_hardirqs_on_caller+0x20/0x180
[ 1029.453179] [<ffff000008117240>] trace_hardirqs_on+0x10/0x18
[ 1029.459604] [<ffff000008789290>] cpuidle_enter_state+0xc8/0x2e0
[ 1029.465246] [<ffff00000878951c>] cpuidle_enter+0x34/0x40
[ 1029.470888] [<ffff000008111188>] call_cpuidle+0x3c/0x5c
[ 1029.476442] [<ffff0000081114e8>] cpu_startup_entry+0x1c0/0x360
[ 1029.481392] [<ffff000008955bc8>] rest_init+0x150/0x160
[ 1029.487293] [<ffff000008d30b98>] start_kernel+0x3a4/0x3b8
[ 1029.492415] [<ffff000008d301bc>] __primary_switched+0x30/0x74

Binoy

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 1/3] tracing: Deference pointers without RCU checks
  2016-08-26  5:26     ` Binoy Jayan
@ 2016-08-28 23:50       ` Masami Hiramatsu
  0 siblings, 0 replies; 11+ messages in thread
From: Masami Hiramatsu @ 2016-08-28 23:50 UTC (permalink / raw)
  To: Binoy Jayan
  Cc: Arnd Bergmann, linaro-kernel, Daniel Wagner, Carsten Emde,
	Linux kernel mailing list, Steven Rostedt (Red Hat)

Hi Binoy,

On Fri, 26 Aug 2016 10:56:12 +0530
Binoy Jayan <binoy.jayan@linaro.org> wrote:

> On 26 August 2016 at 07:19, Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > On Wed, 24 Aug 2016 16:47:28 +0530
> >> "__local_bh_enable() tests if this is the last SOFTIRQ_OFFSET, if so it
> >> tells lockdep softirqs are enabled with trace_softirqs_on() after that
> >> we go an actually modify the preempt_count with preempt_count_sub().
> >> Then in preempt_count_sub() you call into trace_preempt_on() if this
> >> was the last preempt_count increment but you do that _before_ you
> >> actually change the preempt_count with __preempt_count_sub() at this
> >> point lockdep and preempt_count think the world differs and *boom*"
> >>
> >> So the simplest way to avoid this is by disabling the consistency
> >> checks.
> >>
> >> We also need to take care of the iterating in trace_events_trigger.c
> >> to avoid a splatter in conjunction with the hist trigger.
> >
> > Special care for lockdep inside tracepoint handler is reasonable.
> >
> > Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
> >
> > Steven, since this seems a bugfix, could you pick this from the series?
> >
> > Thank you,
> >
> > --
> > Masami Hiramatsu <mhiramat@kernel.org>
> 
> 
> Hi Daniel/Masami,
> 
> I ran into a similar rcu error while using same tracepoint for all
> three latency types
> and using a filter like below to trigger only events falling under a
> specific type.
> 
> echo 'hist:key=ltype,cpu:val=latency:sort=ltype,cpu if ltype==0' > \
>    /sys/kernel/debug/tracing/events/latency/latency_preempt/trigger
> 
> The error occurs only when I use the predicate 'if ltype==0' as filter.
> 
> It occurs in 'filter_match_preds' during a call to 'rcu_dereference_sched'.
> kernel/trace/trace_events_filter.c +611 : filter_match_preds()
> 
> Surprisingly, this happens only the first time the echo command is used on
> the trigger file after each boot.
> 
> Do you think it is similar to the bug you have fixed? May be i'll try using
> "rcu_dereference_raw_notrace" instead of 'rcu_dereference_sched'.

Yes, I think we should fix that too. No rcu_derefaernce_sched used in
trace callees. (except for kprobe events, since it is out of context)

Thank you,


> 
>  Binoy
> 
> 
> [ 1029.324257] ===============================
> [ 1029.324785] [ INFO: suspicious RCU usage. ]
> [ 1029.328698] 4.7.0+ #49 Not tainted
> [ 1029.332858] -------------------------------
> [ 1029.336334] /local/mnt/workspace/src/korg/linux/kernel/trace/trace_events_filter.c:611
> suspicious rcu_dereference_check() usage!
> [ 1029.340423]
> [ 1029.340423] other info that might help us debug this:
> [ 1029.340423]
> [ 1029.352226]
> [ 1029.352226] RCU used illegally from idle CPU!
> [ 1029.352226] rcu_scheduler_active = 1, debug_locks = 0
> [ 1029.359953] RCU used illegally from extended quiescent state!
> [ 1029.371057] no locks held by swapper/0/0.
> [ 1029.376696]
> [ 1029.376696] stack backtrace:
> [ 1029.380693] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.7.0+ #49
> [ 1029.385033] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
> [ 1029.391111] Call trace:
> [ 1029.397798] [<ffff000008089ecc>] dump_backtrace+0x0/0x1e0
> [ 1029.399967] [<ffff00000808a0d0>] show_stack+0x24/0x2c
> [ 1029.405523] [<ffff000008431554>] dump_stack+0xb0/0xf0
> [ 1029.410557] [<ffff000008115dd0>] lockdep_rcu_suspicious+0xe8/0x120
> [ 1029.415595] [<ffff0000081a5d40>] filter_match_preds+0x108/0x118
> [ 1029.421669] [<ffff0000081a8308>] event_triggers_call+0x5c/0xc0
> [ 1029.427485] [<ffff0000081a2184>] trace_event_buffer_commit+0x11c/0x244
> [ 1029.433390] [<ffff00000819a4c0>]
> trace_event_raw_event_latency_template+0x58/0xa4
> [ 1029.439902] [<ffff00000819b5ec>] time_hardirqs_on+0x264/0x290
> [ 1029.447450] [<ffff0000081170d0>] trace_hardirqs_on_caller+0x20/0x180
> [ 1029.453179] [<ffff000008117240>] trace_hardirqs_on+0x10/0x18
> [ 1029.459604] [<ffff000008789290>] cpuidle_enter_state+0xc8/0x2e0
> [ 1029.465246] [<ffff00000878951c>] cpuidle_enter+0x34/0x40
> [ 1029.470888] [<ffff000008111188>] call_cpuidle+0x3c/0x5c
> [ 1029.476442] [<ffff0000081114e8>] cpu_startup_entry+0x1c0/0x360
> [ 1029.481392] [<ffff000008955bc8>] rest_init+0x150/0x160
> [ 1029.487293] [<ffff000008d30b98>] start_kernel+0x3a4/0x3b8
> [ 1029.492415] [<ffff000008d301bc>] __primary_switched+0x30/0x74
> 
> Binoy


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-08-28 23:50 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-24 11:17 [PATCH v2 0/3] *** Latency histograms - IRQSOFF,PREEMPTOFF *** Binoy Jayan
2016-08-24 11:17 ` [PATCH v2 1/3] tracing: Deference pointers without RCU checks Binoy Jayan
2016-08-26  1:49   ` Masami Hiramatsu
2016-08-26  5:26     ` Binoy Jayan
2016-08-28 23:50       ` Masami Hiramatsu
2016-08-24 11:17 ` [PATCH v2 2/3] tracing: Add trace_irqsoff tracepoints Binoy Jayan
2016-08-25  6:35   ` kbuild test robot
2016-08-25  6:35   ` kbuild test robot
2016-08-24 11:17 ` [PATCH v2 3/3] tracing: Histogram for missed timer offsets Binoy Jayan
2016-08-25  5:26 ` [PATCH v2 0/3] *** Latency histograms - IRQSOFF,PREEMPTOFF *** Daniel Wagner
2016-08-25  6:33   ` Binoy Jayan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).