* [PATCH] tracing WIP patches
@ 2009-04-17  6:35 Jeremy Fitzhardinge
  2009-04-17  6:35 ` [PATCH 1/4] tracing: move __DO_TRACE out of line Jeremy Fitzhardinge
                   ` (3 more replies)
  0 siblings, 4 replies; 37+ messages in thread
From: Jeremy Fitzhardinge @ 2009-04-17  6:35 UTC (permalink / raw)
  To: mathieu.desnoyers; +Cc: Steven Rostedt, Ingo Molnar, Linux Kernel Mailing List

Hi,

Here are the patches I have against tip/tracing/core.  They
consist of:

- Move __DO_TRACE out of line, so that the inline code is just an
  if() and a call.  This reduces linux/tracepoint.h's include dependencies
  to just <linux/types.h>, which means it's safe to include in any context
  at all.

- Remove the use of the global CREATE_TRACE_POINTS, and institute
  a set of CREATE_subsys_TRACE_POINTS macros to cause targeted
  instantiation of the tracepoint code and data (see the sketch after
  this list).  Without this we end up in the situation where
	#define CREATE_TRACE_POINTS
	#include <trace/events/foo.h>
  instantiates not only foo's events but also bar's, if foo.h ends
  up directly or indirectly including trace/events/bar.h.
  Unfortunately it increases the amount of boilerplate in each
  event definition header by a bit.

- A followup to make out-of-lining __do_trace_##name work for
  direct users of DECLARE_TRACE/DEFINE_TRACE, as DEFINE_TRACE now needs
  a full arg and proto list.  This results in fairly ugly wholesale
  duplication of each tracepoint's arg lists.  This can be avoided if
  we migrate those users to use TRACE_EVENT(), etc.
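
For concreteness, here's a minimal sketch of how a subsystem ends up
instantiating only its own events under the second patch (the file and
macro names are the ones the patch touches; the comments just describe
the intended behaviour):

	/* kernel/irq/handle.c */
	#define CREATE_IRQ_TRACE_POINTS
	#include <trace/events/irq.h>	/* emits only irq's tracepoints */

	/*
	 * Even if irq.h pulls in trace/events/kmem.h directly or
	 * indirectly, kmem's tracepoints are not emitted here; they are
	 * only instantiated where CREATE_KMEM_TRACE_POINTS is defined
	 * (mm/util.c).
	 */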

I'll post my pvops patches for comment in a moment.  Unfortunately, even
with all this, I can't get the kernel to link due to duplicate kmem.h
events...

	J

 arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c |    4 +-
 arch/x86/kernel/process.c                  |    9 +++-
 block/blk-core.c                           |   56 +++++++++++++++++++++++------
 block/elevator.c                           |   13 +++++-
 drivers/md/dm.c                            |    4 +-
 fs/bio.c                                   |    4 +-
 include/linux/tracepoint.h                 |   33 +++++++++--------
 include/trace/define_trace.h               |   10 ++---
 include/trace/events/irq.h                 |    5 ++
 include/trace/events/kmem.h                |    5 ++
 include/trace/events/lockdep.h             |    5 ++
 include/trace/events/sched.h               |    5 ++
 include/trace/events/skb.h                 |    6 +++
 include/trace/ftrace.h                     |    4 +-
 include/trace/instantiate_trace.h          |    7 +++
 kernel/irq/handle.c                        |    2 -
 kernel/lockdep.c                           |    2 -
 kernel/sched.c                             |    2 -
 kernel/tracepoint.c                        |    6 +++
 kernel/workqueue.c                         |   16 ++++++--
 mm/bounce.c                                |    4 +-
 mm/util.c                                  |    2 -
 net/core/net-traces.c                      |    2 -
 samples/trace_events/trace-events-sample.c |    2 -
 samples/trace_events/trace-events-sample.h |    6 +++
 25 files changed, 162 insertions(+), 52 deletions(-)


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 1/4] tracing: move __DO_TRACE out of line
  2009-04-17  6:35 [PATCH] tracing WIP patches Jeremy Fitzhardinge
@ 2009-04-17  6:35 ` Jeremy Fitzhardinge
  2009-04-17 15:46   ` Ingo Molnar
  2009-04-17  6:35 ` [PATCH 2/4] x86/pvops: target CREATE_TRACE_POINTS to particular subsystems Jeremy Fitzhardinge
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 37+ messages in thread
From: Jeremy Fitzhardinge @ 2009-04-17  6:35 UTC (permalink / raw)
  To: mathieu.desnoyers
  Cc: Steven Rostedt, Ingo Molnar, Linux Kernel Mailing List,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Mainly simplify linux/tracepoint.h's include dependencies (removes
rcupdate.h), but it can't help with icache locality, since it
definitely moves the code out of line, rather than relying on gcc
to do it.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 include/linux/tracepoint.h   |   21 ++++++++++-----------
 include/trace/define_trace.h |    7 +++++--
 kernel/tracepoint.c          |    6 ++++++
 3 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 4353f3f..1052e33 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -15,7 +15,6 @@
  */
 
 #include <linux/types.h>
-#include <linux/rcupdate.h>
 
 struct module;
 struct tracepoint;
@@ -42,19 +41,20 @@ struct tracepoint {
  * it_func[0] is never NULL because there is at least one element in the array
  * when the array itself is non NULL.
  */
-#define __DO_TRACE(tp, proto, args)					\
-	do {								\
+#define DEFINE_DO_TRACE(name, proto, args)				\
+	void __do_trace_##name(struct tracepoint *tp, TP_PROTO(proto))	\
+	{								\
 		void **it_func;						\
 									\
 		rcu_read_lock_sched_notrace();				\
-		it_func = rcu_dereference((tp)->funcs);			\
+		it_func = rcu_dereference(tp->funcs);			\
 		if (it_func) {						\
 			do {						\
 				((void(*)(proto))(*it_func))(args);	\
 			} while (*(++it_func));				\
 		}							\
 		rcu_read_unlock_sched_notrace();			\
-	} while (0)
+	}
 
 /*
  * Make sure the alignment of the structure in the __tracepoints section will
@@ -63,11 +63,13 @@ struct tracepoint {
  */
 #define DECLARE_TRACE(name, proto, args)				\
 	extern struct tracepoint __tracepoint_##name;			\
+	extern void __do_trace_##name(struct tracepoint *tp,		\
+				      TP_PROTO(proto));			\
 	static inline void trace_##name(proto)				\
 	{								\
 		if (unlikely(__tracepoint_##name.state))		\
-			__DO_TRACE(&__tracepoint_##name,		\
-				TP_PROTO(proto), TP_ARGS(args));	\
+			__do_trace_##name(&__tracepoint_##name,		\
+					  TP_ARGS(args));		\
 	}								\
 	static inline int register_trace_##name(void (*probe)(proto))	\
 	{								\
@@ -151,10 +153,7 @@ extern int tracepoint_get_iter_range(struct tracepoint **tracepoint,
  * probe unregistration and the end of module exit to make sure there is no
  * caller executing a probe when it is freed.
  */
-static inline void tracepoint_synchronize_unregister(void)
-{
-	synchronize_sched();
-}
+extern void tracepoint_synchronize_unregister(void);
 
 #define PARAMS(args...) args
 
diff --git a/include/trace/define_trace.h b/include/trace/define_trace.h
index 1886941..fa46100 100644
--- a/include/trace/define_trace.h
+++ b/include/trace/define_trace.h
@@ -24,14 +24,17 @@
 
 #undef TRACE_EVENT
 #define TRACE_EVENT(name, proto, args, tstruct, assign, print)	\
+	DEFINE_DO_TRACE(name, TP_PROTO(proto), TP_ARGS(args))	\
 	DEFINE_TRACE(name)
 
 #undef TRACE_FORMAT
-#define TRACE_FORMAT(name, proto, args, print)	\
+#define TRACE_FORMAT(name, proto, args, print)			\
+	DEFINE_DO_TRACE(name, TP_PROTO(proto), TP_ARGS(args))	\
 	DEFINE_TRACE(name)
 
 #undef DECLARE_TRACE
-#define DECLARE_TRACE(name, proto, args)	\
+#define DECLARE_TRACE(name, proto, args)			\
+	DEFINE_DO_TRACE(name, TP_PROTO(proto), TP_ARGS(args))	\
 	DEFINE_TRACE(name)
 
 #undef TRACE_INCLUDE
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 1ef5d3a..6ac1f48 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -545,6 +545,12 @@ void tracepoint_iter_reset(struct tracepoint_iter *iter)
 }
 EXPORT_SYMBOL_GPL(tracepoint_iter_reset);
 
+void tracepoint_synchronize_unregister(void)
+{
+	synchronize_sched();
+}
+EXPORT_SYMBOL_GPL(tracepoint_synchronize_unregister);
+
 #ifdef CONFIG_MODULES
 
 int tracepoint_module_notify(struct notifier_block *self,
-- 
1.6.0.6


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 2/4] x86/pvops: target CREATE_TRACE_POINTS to particular subsystems
  2009-04-17  6:35 [PATCH] tracing WIP patches Jeremy Fitzhardinge
  2009-04-17  6:35 ` [PATCH 1/4] tracing: move __DO_TRACE out of line Jeremy Fitzhardinge
@ 2009-04-17  6:35 ` Jeremy Fitzhardinge
  2009-04-17 15:55   ` Steven Rostedt
  2009-04-17  6:35 ` [PATCH 3/4] tracing: pass proto and args to DEFINE_TRACE Jeremy Fitzhardinge
  2009-04-17  6:35 ` [PATCH 4/4] tracing: avoid warnings from zero-arg tracepoints Jeremy Fitzhardinge
  3 siblings, 1 reply; 37+ messages in thread
From: Jeremy Fitzhardinge @ 2009-04-17  6:35 UTC (permalink / raw)
  To: mathieu.desnoyers
  Cc: Steven Rostedt, Ingo Molnar, Linux Kernel Mailing List,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Rather than having a single CREATE_TRACE_POINTS which indiscriminately
creates tracepoints, use CREATE_FOO_TRACE_POINTS to make them for subsystem
foo.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 include/trace/events/irq.h                 |    5 +++++
 include/trace/events/kmem.h                |    5 +++++
 include/trace/events/lockdep.h             |    5 +++++
 include/trace/events/sched.h               |    5 +++++
 include/trace/events/skb.h                 |    6 ++++++
 include/trace/instantiate_trace.h          |    7 +++++++
 kernel/irq/handle.c                        |    2 +-
 kernel/lockdep.c                           |    2 +-
 kernel/sched.c                             |    2 +-
 mm/util.c                                  |    2 +-
 net/core/net-traces.c                      |    2 +-
 samples/trace_events/trace-events-sample.c |    2 +-
 samples/trace_events/trace-events-sample.h |    6 ++++++
 13 files changed, 45 insertions(+), 6 deletions(-)
 create mode 100644 include/trace/instantiate_trace.h

diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h
index 75e3468..ddc62f5 100644
--- a/include/trace/events/irq.h
+++ b/include/trace/events/irq.h
@@ -54,4 +54,9 @@ TRACE_FORMAT(softirq_exit,
 #endif /*  _TRACE_IRQ_H */
 
 /* This part must be outside protection */
+#ifdef CREATE_IRQ_TRACE_POINTS
+#undef CREATE_IRQ_TRACE_POINTS	/* avoid infinite recursion */
+#include <trace/instantiate_trace.h>
+#else
 #include <trace/define_trace.h>
+#endif
diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index c22c42f..80c24b4 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -191,4 +191,9 @@ TRACE_EVENT(kmem_cache_free,
 #endif /* _TRACE_KMEM_H */
 
 /* This part must be outside protection */
+#ifdef CREATE_KMEM_TRACE_POINTS
+#undef CREATE_KMEM_TRACE_POINTS	/* avoid infinite recursion */
+#include <trace/instantiate_trace.h>
+#else
 #include <trace/define_trace.h>
+#endif
diff --git a/include/trace/events/lockdep.h b/include/trace/events/lockdep.h
index 45e326b..d5540c8 100644
--- a/include/trace/events/lockdep.h
+++ b/include/trace/events/lockdep.h
@@ -57,4 +57,9 @@ TRACE_EVENT(lock_acquired,
 #endif /* _TRACE_LOCKDEP_H */
 
 /* This part must be outside protection */
+#ifdef CREATE_LOCKDEP_TRACE_POINTS
+#undef CREATE_LOCKDEP_TRACE_POINTS	/* avoid infinite recursion */
+#include <trace/instantiate_trace.h>
+#else
 #include <trace/define_trace.h>
+#endif
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index ffa1cab..bacdc54 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -336,4 +336,9 @@ TRACE_EVENT(sched_signal_send,
 #endif /* _TRACE_SCHED_H */
 
 /* This part must be outside protection */
+#ifdef CREATE_SCHED_TRACE_POINTS
+#undef CREATE_SCHED_TRACE_POINTS	/* avoid infinite recursion */
+#include <trace/instantiate_trace.h>
+#else
 #include <trace/define_trace.h>
+#endif
diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
index 1e8fabb..d316de9 100644
--- a/include/trace/events/skb.h
+++ b/include/trace/events/skb.h
@@ -37,4 +37,10 @@ TRACE_EVENT(kfree_skb,
 #endif /* _TRACE_SKB_H */
 
 /* This part must be outside protection */
+#ifdef CREATE_SKB_TRACE_POINTS
+#undef CREATE_SKB_TRACE_POINTS	/* avoid infinite recursion */
+#include <trace/instantiate_trace.h>
+#else
 #include <trace/define_trace.h>
+#endif
+
diff --git a/include/trace/instantiate_trace.h b/include/trace/instantiate_trace.h
new file mode 100644
index 0000000..9ccda3e
--- /dev/null
+++ b/include/trace/instantiate_trace.h
@@ -0,0 +1,7 @@
+/* 
+ * trace/events/foo.h include this when their subsystem-specific
+ * CREATE_FOO_TRACE_POINTS is defined.
+ */
+#define CREATE_TRACE_POINTS
+#include <trace/define_trace.h>
+#undef CREATE_TRACE_POINTS
diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c
index 37c6363..264f478 100644
--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -19,7 +19,7 @@
 #include <linux/hash.h>
 #include <linux/bootmem.h>
 
-#define CREATE_TRACE_POINTS
+#define CREATE_IRQ_TRACE_POINTS
 #include <trace/events/irq.h>
 
 #include "internals.h"
diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 47b201e..c2ddfb2 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -47,7 +47,7 @@
 
 #include "lockdep_internals.h"
 
-#define CREATE_TRACE_POINTS
+#define CREATE_LOCKDEP_TRACE_POINTS
 #include <trace/events/lockdep.h>
 
 #ifdef CONFIG_PROVE_LOCKING
diff --git a/kernel/sched.c b/kernel/sched.c
index 9f7ffd0..3093b44 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -78,7 +78,7 @@
 
 #include "sched_cpupri.h"
 
-#define CREATE_TRACE_POINTS
+#define CREATE_SCHED_TRACE_POINTS
 #include <trace/events/sched.h>
 
 /*
diff --git a/mm/util.c b/mm/util.c
index 6794a33..9bef53c 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -6,7 +6,7 @@
 #include <linux/sched.h>
 #include <asm/uaccess.h>
 
-#define CREATE_TRACE_POINTS
+#define CREATE_KMEM_TRACE_POINTS
 #include <trace/events/kmem.h>
 
 /**
diff --git a/net/core/net-traces.c b/net/core/net-traces.c
index 499a67e..df0b5fb 100644
--- a/net/core/net-traces.c
+++ b/net/core/net-traces.c
@@ -23,7 +23,7 @@
 #include <asm/unaligned.h>
 #include <asm/bitops.h>
 
-#define CREATE_TRACE_POINTS
+#define CREATE_SKB_TRACE_POINTS
 #include <trace/events/skb.h>
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(kfree_skb);
diff --git a/samples/trace_events/trace-events-sample.c b/samples/trace_events/trace-events-sample.c
index f33b3ba..fe47b6d 100644
--- a/samples/trace_events/trace-events-sample.c
+++ b/samples/trace_events/trace-events-sample.c
@@ -7,7 +7,7 @@
  * CREATE_TRACE_POINTS first.  This will make the C code that
  * creates the handles for the trace points.
  */
-#define CREATE_TRACE_POINTS
+#define CREATE_SAMPLE_TRACE_POINTS
 #include "trace-events-sample.h"
 
 
diff --git a/samples/trace_events/trace-events-sample.h b/samples/trace_events/trace-events-sample.h
index eab4644..42be370 100644
--- a/samples/trace_events/trace-events-sample.h
+++ b/samples/trace_events/trace-events-sample.h
@@ -121,4 +121,10 @@ TRACE_EVENT(foo_bar,
  */
 #undef TRACE_INCLUDE_PATH
 #define TRACE_INCLUDE_PATH .
+
+#ifdef CREATE_SAMPLE_TRACE_POINTS
+#undef CREATE_SAMPLE_TRACE_POINTS	/* avoid infinite recursion */
+#include <trace/instantiate_trace.h>
+#else
 #include <trace/define_trace.h>
+#endif
-- 
1.6.0.6


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 3/4] tracing: pass proto and args to DEFINE_TRACE
  2009-04-17  6:35 [PATCH] tracing WIP patches Jeremy Fitzhardinge
  2009-04-17  6:35 ` [PATCH 1/4] tracing: move __DO_TRACE out of line Jeremy Fitzhardinge
  2009-04-17  6:35 ` [PATCH 2/4] x86/pvops: target CREATE_TRACE_POINTS to particular subsystems Jeremy Fitzhardinge
@ 2009-04-17  6:35 ` Jeremy Fitzhardinge
  2009-04-17  6:48   ` Christoph Hellwig
  2009-04-17  6:35 ` [PATCH 4/4] tracing: avoid warnings from zero-arg tracepoints Jeremy Fitzhardinge
  3 siblings, 1 reply; 37+ messages in thread
From: Jeremy Fitzhardinge @ 2009-04-17  6:35 UTC (permalink / raw)
  To: mathieu.desnoyers
  Cc: Steven Rostedt, Ingo Molnar, Linux Kernel Mailing List,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

We need to pass the prototype and args to DEFINE_TRACE so that it can
instantiate __do_trace_X.  Unfortunately, this requires duplication
of information already passed to DECLARE_TRACE, so any change needs
to be updated in both places.  It doesn't help that uses of DEFINE_TRACE
are scattered throughout the sources.

(Moving to the new TRACE_EVENT() interface will solve this by putting
all the information into one place.)
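
For example, with this patch the power_mark signature has to be written
out twice (a sketch; the DECLARE_TRACE() reflects the existing
declaration in trace/power.h that the new DEFINE_TRACE() below was
copied from):

	/* trace/power.h - declaration, proto and args given once */
	DECLARE_TRACE(power_mark,
		TP_PROTO(struct power_trace *it, unsigned int type,
			 unsigned int state),
		TP_ARGS(it, type, state));

	/* acpi-cpufreq.c - the definition now repeats the same lists */
	DEFINE_TRACE(power_mark,
		TP_PROTO(struct power_trace *it, unsigned int type,
			 unsigned int state),
		TP_ARGS(it, type, state));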

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c |    4 +-
 arch/x86/kernel/process.c                  |    9 +++-
 block/blk-core.c                           |   56 ++++++++++++++++++++++-----
 block/elevator.c                           |   13 +++++-
 drivers/md/dm.c                            |    4 +-
 fs/bio.c                                   |    4 +-
 include/linux/tracepoint.h                 |   12 ++++--
 include/trace/define_trace.h               |    9 +---
 kernel/workqueue.c                         |   16 ++++++--
 mm/bounce.c                                |    4 +-
 10 files changed, 97 insertions(+), 34 deletions(-)

diff --git a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
index 3e3cd3d..07a6ce7 100644
--- a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
+++ b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
@@ -73,7 +73,9 @@ struct acpi_cpufreq_data {
 
 static DEFINE_PER_CPU(struct acpi_cpufreq_data *, drv_data);
 
-DEFINE_TRACE(power_mark);
+DEFINE_TRACE(power_mark,
+	TP_PROTO(struct power_trace *it, unsigned int type, unsigned int state),
+	      TP_ARGS(it, type, state));
 
 /* acpi_perf_data is a pointer to percpu data. */
 static struct acpi_processor_performance *acpi_perf_data;
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index ca98915..e3b0593 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -22,8 +22,13 @@ EXPORT_SYMBOL(idle_nomwait);
 
 struct kmem_cache *task_xstate_cachep;
 
-DEFINE_TRACE(power_start);
-DEFINE_TRACE(power_end);
+DEFINE_TRACE(power_start,
+	TP_PROTO(struct power_trace *it, unsigned int type, unsigned int state),
+	      TP_ARGS(it, type, state));
+
+DEFINE_TRACE(power_end,
+	TP_PROTO(struct power_trace *it),
+	      TP_ARGS(it));
 
 int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 {
diff --git a/block/blk-core.c b/block/blk-core.c
index 07ab754..dbd6391 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -32,17 +32,51 @@
 
 #include "blk.h"
 
-DEFINE_TRACE(block_plug);
-DEFINE_TRACE(block_unplug_io);
-DEFINE_TRACE(block_unplug_timer);
-DEFINE_TRACE(block_getrq);
-DEFINE_TRACE(block_sleeprq);
-DEFINE_TRACE(block_rq_requeue);
-DEFINE_TRACE(block_bio_backmerge);
-DEFINE_TRACE(block_bio_frontmerge);
-DEFINE_TRACE(block_bio_queue);
-DEFINE_TRACE(block_rq_complete);
-DEFINE_TRACE(block_remap);	/* Also used in drivers/md/dm.c */
+DEFINE_TRACE(block_rq_requeue,
+	TP_PROTO(struct request_queue *q, struct request *rq),
+	      TP_ARGS(q, rq));
+
+DEFINE_TRACE(block_rq_complete,
+	TP_PROTO(struct request_queue *q, struct request *rq),
+	      TP_ARGS(q, rq));
+
+DEFINE_TRACE(block_bio_backmerge,
+	TP_PROTO(struct request_queue *q, struct bio *bio),
+	      TP_ARGS(q, bio));
+
+DEFINE_TRACE(block_bio_frontmerge,
+	TP_PROTO(struct request_queue *q, struct bio *bio),
+	      TP_ARGS(q, bio));
+
+DEFINE_TRACE(block_bio_queue,
+	TP_PROTO(struct request_queue *q, struct bio *bio),
+	      TP_ARGS(q, bio));
+
+DEFINE_TRACE(block_getrq,
+	TP_PROTO(struct request_queue *q, struct bio *bio, int rw),
+	      TP_ARGS(q, bio, rw));
+
+DEFINE_TRACE(block_sleeprq,
+	TP_PROTO(struct request_queue *q, struct bio *bio, int rw),
+	      TP_ARGS(q, bio, rw));
+
+DEFINE_TRACE(block_plug,
+	TP_PROTO(struct request_queue *q),
+	      TP_ARGS(q));
+
+DEFINE_TRACE(block_unplug_timer,
+	TP_PROTO(struct request_queue *q),
+	      TP_ARGS(q));
+
+DEFINE_TRACE(block_unplug_io,
+	TP_PROTO(struct request_queue *q),
+	      TP_ARGS(q));
+
+DEFINE_TRACE(block_remap,	/* Also used in drivers/md/dm.c */
+	TP_PROTO(struct request_queue *q, struct bio *bio, dev_t dev,
+		 sector_t from, sector_t to),
+	      TP_ARGS(q, bio, dev, from, to));
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(block_remap);
 
 static int __make_request(struct request_queue *q, struct bio *bio);
diff --git a/block/elevator.c b/block/elevator.c
index fb81bcc..96c7849 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -42,7 +42,9 @@
 static DEFINE_SPINLOCK(elv_list_lock);
 static LIST_HEAD(elv_list);
 
-DEFINE_TRACE(block_rq_abort);
+DEFINE_TRACE(block_rq_abort,
+	TP_PROTO(struct request_queue *q, struct request *rq),
+	      TP_ARGS(q, rq));
 
 /*
  * Merge hash stuff.
@@ -55,8 +57,13 @@ static const int elv_hash_shift = 6;
 #define rq_hash_key(rq)		((rq)->sector + (rq)->nr_sectors)
 #define ELV_ON_HASH(rq)		(!hlist_unhashed(&(rq)->hash))
 
-DEFINE_TRACE(block_rq_insert);
-DEFINE_TRACE(block_rq_issue);
+DEFINE_TRACE(block_rq_insert,
+	TP_PROTO(struct request_queue *q, struct request *rq),
+	      TP_ARGS(q, rq));
+
+DEFINE_TRACE(block_rq_issue,
+	TP_PROTO(struct request_queue *q, struct request *rq),
+	      TP_ARGS(q, rq));
 
 /*
  * Query io scheduler to see if the current process issuing bio may be
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 8a994be..73efa35 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -54,7 +54,9 @@ struct dm_target_io {
 	union map_info info;
 };
 
-DEFINE_TRACE(block_bio_complete);
+DEFINE_TRACE(block_bio_complete,
+	TP_PROTO(struct request_queue *q, struct bio *bio),
+	      TP_ARGS(q, bio));
 
 /*
  * For request-based dm.
diff --git a/fs/bio.c b/fs/bio.c
index e0c9e54..07fa718 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -29,7 +29,9 @@
 #include <trace/block.h>
 #include <scsi/sg.h>		/* for struct sg_iovec */
 
-DEFINE_TRACE(block_split);
+DEFINE_TRACE(block_split,
+	TP_PROTO(struct request_queue *q, struct bio *bio, unsigned int pdu),
+	      TP_ARGS(q, bio, pdu));
 
 /*
  * Test patch to inline a certain number of bi_io_vec's inside the bio
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 1052e33..7451361 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -80,17 +80,21 @@ struct tracepoint {
 		return tracepoint_probe_unregister(#name, (void *)probe);\
 	}
 
-#define DEFINE_TRACE(name)						\
+#define DEFINE_TRACE(name, proto, args)					\
 	static const char __tpstrtab_##name[]				\
 	__attribute__((section("__tracepoints_strings"))) = #name;	\
 	struct tracepoint __tracepoint_##name				\
 	__attribute__((section("__tracepoints"), aligned(32))) =	\
-		{ __tpstrtab_##name, 0, NULL }
+	{ __tpstrtab_##name, 0, NULL };					\
+	DEFINE_DO_TRACE(name, TP_PROTO(proto), TP_ARGS(args))
 
 #define EXPORT_TRACEPOINT_SYMBOL_GPL(name)				\
-	EXPORT_SYMBOL_GPL(__tracepoint_##name)
+	EXPORT_SYMBOL_GPL(__tracepoint_##name);				\
+	EXPORT_SYMBOL_GPL(__do_trace_##name)
+
 #define EXPORT_TRACEPOINT_SYMBOL(name)					\
-	EXPORT_SYMBOL(__tracepoint_##name)
+	EXPORT_SYMBOL(__tracepoint_##name);				\
+	EXPORT_SYMBOL(__do_trace_##name)
 
 extern void tracepoint_update_probe_range(struct tracepoint *begin,
 	struct tracepoint *end);
diff --git a/include/trace/define_trace.h b/include/trace/define_trace.h
index fa46100..9887525 100644
--- a/include/trace/define_trace.h
+++ b/include/trace/define_trace.h
@@ -24,18 +24,15 @@
 
 #undef TRACE_EVENT
 #define TRACE_EVENT(name, proto, args, tstruct, assign, print)	\
-	DEFINE_DO_TRACE(name, TP_PROTO(proto), TP_ARGS(args))	\
-	DEFINE_TRACE(name)
+	DEFINE_TRACE(name, TP_PROTO(proto), TP_ARGS(args))
 
 #undef TRACE_FORMAT
 #define TRACE_FORMAT(name, proto, args, print)			\
-	DEFINE_DO_TRACE(name, TP_PROTO(proto), TP_ARGS(args))	\
-	DEFINE_TRACE(name)
+	DEFINE_TRACE(name, TP_PROTO(proto), TP_ARGS(args))
 
 #undef DECLARE_TRACE
 #define DECLARE_TRACE(name, proto, args)			\
-	DEFINE_DO_TRACE(name, TP_PROTO(proto), TP_ARGS(args))	\
-	DEFINE_TRACE(name)
+	DEFINE_TRACE(name, TP_PROTO(proto), TP_ARGS(args))
 
 #undef TRACE_INCLUDE
 #undef __TRACE_INCLUDE
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index f71fb2a..4e8b519 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -124,7 +124,9 @@ struct cpu_workqueue_struct *get_wq_data(struct work_struct *work)
 	return (void *) (atomic_long_read(&work->data) & WORK_STRUCT_WQ_DATA_MASK);
 }
 
-DEFINE_TRACE(workqueue_insertion);
+DEFINE_TRACE(workqueue_insertion,
+	   TP_PROTO(struct task_struct *wq_thread, struct work_struct *work),
+	   TP_ARGS(wq_thread, work));
 
 static void insert_work(struct cpu_workqueue_struct *cwq,
 			struct work_struct *work, struct list_head *head)
@@ -262,7 +264,9 @@ int queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
 }
 EXPORT_SYMBOL_GPL(queue_delayed_work_on);
 
-DEFINE_TRACE(workqueue_execution);
+DEFINE_TRACE(workqueue_execution,
+	     TP_PROTO(struct task_struct *wq_thread, struct work_struct *work),
+	     TP_ARGS(wq_thread, work));
 
 static void run_workqueue(struct cpu_workqueue_struct *cwq)
 {
@@ -753,7 +757,9 @@ init_cpu_workqueue(struct workqueue_struct *wq, int cpu)
 	return cwq;
 }
 
-DEFINE_TRACE(workqueue_creation);
+DEFINE_TRACE(workqueue_creation,
+	   TP_PROTO(struct task_struct *wq_thread, int cpu),
+	   TP_ARGS(wq_thread, cpu));
 
 static int create_workqueue_thread(struct cpu_workqueue_struct *cwq, int cpu)
 {
@@ -860,7 +866,9 @@ struct workqueue_struct *__create_workqueue_key(const char *name,
 }
 EXPORT_SYMBOL_GPL(__create_workqueue_key);
 
-DEFINE_TRACE(workqueue_destruction);
+DEFINE_TRACE(workqueue_destruction,
+	   TP_PROTO(struct task_struct *wq_thread),
+	   TP_ARGS(wq_thread));
 
 static void cleanup_workqueue_thread(struct cpu_workqueue_struct *cwq)
 {
diff --git a/mm/bounce.c b/mm/bounce.c
index e590272..8a7a59d 100644
--- a/mm/bounce.c
+++ b/mm/bounce.c
@@ -22,7 +22,9 @@
 
 static mempool_t *page_pool, *isa_page_pool;
 
-DEFINE_TRACE(block_bio_bounce);
+DEFINE_TRACE(block_bio_bounce,
+	TP_PROTO(struct request_queue *q, struct bio *bio),
+	      TP_ARGS(q, bio));
 
 #ifdef CONFIG_HIGHMEM
 static __init int init_emergency_pool(void)
-- 
1.6.0.6


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 4/4] tracing: avoid warnings from zero-arg tracepoints
  2009-04-17  6:35 [PATCH] tracing WIP patches Jeremy Fitzhardinge
                   ` (2 preceding siblings ...)
  2009-04-17  6:35 ` [PATCH 3/4] tracing: pass proto and args to DEFINE_TRACE Jeremy Fitzhardinge
@ 2009-04-17  6:35 ` Jeremy Fitzhardinge
  2009-04-17 15:53   ` Steven Rostedt
                     ` (2 more replies)
  3 siblings, 3 replies; 37+ messages in thread
From: Jeremy Fitzhardinge @ 2009-04-17  6:35 UTC (permalink / raw)
  To: mathieu.desnoyers
  Cc: Steven Rostedt, Ingo Molnar, Linux Kernel Mailing List,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Tracepoints with no arguments can issue two warnings:
	"field" defined by not used
	"ret" is uninitialized in this function

Mark field as being OK to leave unused, and initialize ret.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 include/trace/ftrace.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 60c5323..39a3351 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -160,8 +160,8 @@ ftrace_raw_output_##call(struct trace_iterator *iter, int flags)	\
 static int								\
 ftrace_format_##call(struct trace_seq *s)				\
 {									\
-	struct ftrace_raw_##call field;					\
-	int ret;							\
+	struct ftrace_raw_##call field __attribute__((unused));		\
+	int ret = 0;							\
 									\
 	tstruct;							\
 									\
-- 
1.6.0.6


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/4] tracing: pass proto and args to DEFINE_TRACE
  2009-04-17  6:35 ` [PATCH 3/4] tracing: pass proto and args to DEFINE_TRACE Jeremy Fitzhardinge
@ 2009-04-17  6:48   ` Christoph Hellwig
  2009-04-17  6:58     ` Jeremy Fitzhardinge
  2009-04-17 15:21     ` Mathieu Desnoyers
  0 siblings, 2 replies; 37+ messages in thread
From: Christoph Hellwig @ 2009-04-17  6:48 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: mathieu.desnoyers, Steven Rostedt, Ingo Molnar,
	Linux Kernel Mailing List, Jeremy Fitzhardinge

On Thu, Apr 16, 2009 at 11:35:38PM -0700, Jeremy Fitzhardinge wrote:
> -DEFINE_TRACE(power_mark);
> +DEFINE_TRACE(power_mark,
> +	TP_PROTO(struct power_trace *it, unsigned int type, unsigned int state),
> +	      TP_ARGS(it, type, state));

Wrong indentation, the TP_ARGS should have the same level of indentation
(one) as the TP_PROTO.  Also true for all others.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/4] tracing: pass proto and args to DEFINE_TRACE
  2009-04-17  6:48   ` Christoph Hellwig
@ 2009-04-17  6:58     ` Jeremy Fitzhardinge
  2009-04-17  7:05       ` Christoph Hellwig
  2009-04-17 15:21     ` Mathieu Desnoyers
  1 sibling, 1 reply; 37+ messages in thread
From: Jeremy Fitzhardinge @ 2009-04-17  6:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: mathieu.desnoyers, Steven Rostedt, Ingo Molnar,
	Linux Kernel Mailing List, Jeremy Fitzhardinge

Christoph Hellwig wrote:
> On Thu, Apr 16, 2009 at 11:35:38PM -0700, Jeremy Fitzhardinge wrote:
>   
>> -DEFINE_TRACE(power_mark);
>> +DEFINE_TRACE(power_mark,
>> +	TP_PROTO(struct power_trace *it, unsigned int type, unsigned int state),
>> +	      TP_ARGS(it, type, state));
>>     
>
> Wrong indentation, the TP_ARGS should have the same level of indentation
> (one) as the TP_PROTO.  Also true for all others.
>   

It's just cut+replace+paste from the DECLARE_TRACE() definitions in 
trace/power.h.  The proper fix is to not duplicate all that stuff in the 
first place, but I didn't want to introduce a gratuitous re-indent in 
the process.

    J


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/4] tracing: pass proto and args to DEFINE_TRACE
  2009-04-17  6:58     ` Jeremy Fitzhardinge
@ 2009-04-17  7:05       ` Christoph Hellwig
  2009-04-17 12:53         ` Ingo Molnar
  0 siblings, 1 reply; 37+ messages in thread
From: Christoph Hellwig @ 2009-04-17  7:05 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Christoph Hellwig, mathieu.desnoyers, Steven Rostedt,
	Ingo Molnar, Linux Kernel Mailing List, Jeremy Fitzhardinge

On Thu, Apr 16, 2009 at 11:58:56PM -0700, Jeremy Fitzhardinge wrote:
> Christoph Hellwig wrote:
>> On Thu, Apr 16, 2009 at 11:35:38PM -0700, Jeremy Fitzhardinge wrote:
>>   
>>> -DEFINE_TRACE(power_mark);
>>> +DEFINE_TRACE(power_mark,
>>> +	TP_PROTO(struct power_trace *it, unsigned int type, unsigned int state),
>>> +	      TP_ARGS(it, type, state));
>>>     
>>
>> Wrong indentation, the TP_ARGS should have the same level of indentation
>> (one) as the TP_PROTO.  Also true for all others.
>>   
>
> It's just cut+replace+paste from the DECLARE_TRACE() definitions in  
> trace/power.h.  The proper fix is to not duplicate all that stuff in the  
> first place, but I didn't want to introduce a gratuitous re-indent in  
> the process.

When copying it to a new place I don't think that's a good enough
excuse.


Then again I really wish we could get Steve's recent bits merged for
various reasons.  The whole DEFINE_TRACE thing only appeared in 2.6.30
and releasing that one kernel with the half-baked inferior version
sounds like a really bad idea.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/4] tracing: pass proto and args to DEFINE_TRACE
  2009-04-17  7:05       ` Christoph Hellwig
@ 2009-04-17 12:53         ` Ingo Molnar
  0 siblings, 0 replies; 37+ messages in thread
From: Ingo Molnar @ 2009-04-17 12:53 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jeremy Fitzhardinge, mathieu.desnoyers, Steven Rostedt,
	Linux Kernel Mailing List, Jeremy Fitzhardinge


* Christoph Hellwig <hch@infradead.org> wrote:

> Then again I'd really wish we could get Steve's recents bits 
> merged for various reasons. [...]

I sympathise with your desire to have the latest and greatest stuff 
upstream right now (I get asked this all the time - _everyone_ wants 
their important stuff upstream, yesterday (and everyone else's 
unstable unnecessary crap should wait)), but there's no way we'll 
rush these new tracing features upstream now. They are not yet 
complete and there's a set of new users whose needs have to be 
observed and designed in.

That is how the upstream development cycle works: stuff added after 
the merge window goes upstream in the next merge window, at the 
earliest. (I.e. in about 2 months from now.)

> [...] The whole DEFINE_TRACE thing only appeared in 2.6.30 and 
> releasing that one kernel with the half-baked inferior version 
> sounds like a really bad idea.

The current upstream code in .30 is fully functional and useful to 
all the subsystems that are making use of it - so your attack on it 
is a bit puzzling to me.

You showed up when, some two weeks ago or so, in the merge window? 
That's generally pretty much the worst time to ask for more 
features. The moral of the story is really: if you feel strongly 
about features in an area, get involved sooner.

Module support (which your complaint really boils down to) was never 
really popular with users of this stuff: the pre-.30 facilities were 
exported to modules all along, but were rarely used from that angle 
(for understandable reasons). Most tracing code contributions and 
enhancements came from the direction of non-modular core kernel 
facilities.

Linus doesn't even _use_ modular kernels - neither do I ;)

Yes, once a facility proves to be successful (this is about the 
fifth version - it all evolved gradually along several kernel 
releases), everyone jumps on it and wants new features, preferably 
yesterday.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/4] tracing: pass proto and args to DEFINE_TRACE
  2009-04-17  6:48   ` Christoph Hellwig
  2009-04-17  6:58     ` Jeremy Fitzhardinge
@ 2009-04-17 15:21     ` Mathieu Desnoyers
  1 sibling, 0 replies; 37+ messages in thread
From: Mathieu Desnoyers @ 2009-04-17 15:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jeremy Fitzhardinge, Steven Rostedt, Ingo Molnar,
	Linux Kernel Mailing List, Jeremy Fitzhardinge

* Christoph Hellwig (hch@infradead.org) wrote:
> On Thu, Apr 16, 2009 at 11:35:38PM -0700, Jeremy Fitzhardinge wrote:
> > -DEFINE_TRACE(power_mark);
> > +DEFINE_TRACE(power_mark,
> > +	TP_PROTO(struct power_trace *it, unsigned int type, unsigned int state),
> > +	      TP_ARGS(it, type, state));
> 
> Wrong indentation, the TP_ARGS should have the same level of indentation
> (one) as the TP_PROTO.  Also true for all others.
> 

Ingo actually proposed to add a supplementary indentation before TP_ARGS
in trace/sched.h to ease readability.

Mathieu

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/4] tracing: move __DO_TRACE out of line
  2009-04-17  6:35 ` [PATCH 1/4] tracing: move __DO_TRACE out of line Jeremy Fitzhardinge
@ 2009-04-17 15:46   ` Ingo Molnar
  2009-04-17 16:10     ` Mathieu Desnoyers
  0 siblings, 1 reply; 37+ messages in thread
From: Ingo Molnar @ 2009-04-17 15:46 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: mathieu.desnoyers, Steven Rostedt, Linux Kernel Mailing List,
	Jeremy Fitzhardinge


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> 
> Mainly simplify linux/tracepoint.h's include dependencies (removes 
> rcupdate.h), but it can't help with icache locality, since it 
> definitely moves the code out of line, rather than relying on gcc 
> to do it.

> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -15,7 +15,6 @@
>   */
>  
>  #include <linux/types.h>
> -#include <linux/rcupdate.h>

nice!

> +#define DEFINE_DO_TRACE(name, proto, args)				\
> +	void __do_trace_##name(struct tracepoint *tp, TP_PROTO(proto))	\
> +	{								\

that needs to be marked notrace, otherwise the function tracer 
becomes noisy. (or even lockupy.)
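
I.e. something like this - just a sketch of the patch's helper with the
annotation added (notrace being the existing attribute that keeps a
function out of the function tracer):

	#define DEFINE_DO_TRACE(name, proto, args)			\
	void notrace __do_trace_##name(struct tracepoint *tp,		\
				       TP_PROTO(proto))			\
	{								\
		void **it_func;						\
									\
		rcu_read_lock_sched_notrace();				\
		it_func = rcu_dereference(tp->funcs);			\
		if (it_func) {						\
			do {						\
				((void(*)(proto))(*it_func))(args);	\
			} while (*(++it_func));				\
		}							\
		rcu_read_unlock_sched_notrace();			\
	}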

	Ingo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/4] tracing: avoid warnings from zero-arg tracepoints
  2009-04-17  6:35 ` [PATCH 4/4] tracing: avoid warnings from zero-arg tracepoints Jeremy Fitzhardinge
@ 2009-04-17 15:53   ` Steven Rostedt
  2009-04-17 15:53   ` Ingo Molnar
  2009-04-17 16:10   ` [tip:tracing/core] " tip-bot for Jeremy Fitzhardinge
  2 siblings, 0 replies; 37+ messages in thread
From: Steven Rostedt @ 2009-04-17 15:53 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: mathieu.desnoyers, Ingo Molnar, Linux Kernel Mailing List,
	Jeremy Fitzhardinge



On Thu, 16 Apr 2009, Jeremy Fitzhardinge wrote:

> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> 
> Tracepoints with no arguments can issue two warnings:
> 	"field" defined by not used
> 	"ret" is uninitialized in this function
> 
> Mark field as being OK to leave unused, and initialize ret.
> 
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> ---
>  include/trace/ftrace.h |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
> index 60c5323..39a3351 100644
> --- a/include/trace/ftrace.h
> +++ b/include/trace/ftrace.h
> @@ -160,8 +160,8 @@ ftrace_raw_output_##call(struct trace_iterator *iter, int flags)	\
>  static int								\
>  ftrace_format_##call(struct trace_seq *s)				\
>  {									\
> -	struct ftrace_raw_##call field;					\
> -	int ret;							\
> +	struct ftrace_raw_##call field __attribute__((unused));		\
> +	int ret = 0;							\
>  									\
>  	tstruct;							\
>  									\

Acked-by: Steven Rostedt <rostedt@goodmis.org>

-- Steve


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/4] tracing: avoid warnings from zero-arg tracepoints
  2009-04-17  6:35 ` [PATCH 4/4] tracing: avoid warnings from zero-arg tracepoints Jeremy Fitzhardinge
  2009-04-17 15:53   ` Steven Rostedt
@ 2009-04-17 15:53   ` Ingo Molnar
  2009-04-17 16:10   ` [tip:tracing/core] " tip-bot for Jeremy Fitzhardinge
  2 siblings, 0 replies; 37+ messages in thread
From: Ingo Molnar @ 2009-04-17 15:53 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: mathieu.desnoyers, Steven Rostedt, Linux Kernel Mailing List,
	Jeremy Fitzhardinge


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> 
> Tracepoints with no arguments can issue two warnings:
> 	"field" defined by not used
> 	"ret" is uninitialized in this function
> 
> Mark field as being OK to leave unused, and initialize ret.
> 
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> ---
>  include/trace/ftrace.h |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)

> 
> diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
> index 60c5323..39a3351 100644
> --- a/include/trace/ftrace.h
> +++ b/include/trace/ftrace.h
> @@ -160,8 +160,8 @@ ftrace_raw_output_##call(struct trace_iterator *iter, int flags)	\
>  static int								\
>  ftrace_format_##call(struct trace_seq *s)				\
>  {									\
> -	struct ftrace_raw_##call field;					\
> -	int ret;							\
> +	struct ftrace_raw_##call field __attribute__((unused));		\
> +	int ret = 0;							\
>  									\
>  	tstruct;							\

This looks like a fix we should pick up straight away. I've applied 
it to tip:tracing/ftrace - Steve, is it fine with you too?

	Ingo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/4] x86/pvops: target CREATE_TRACE_POINTS to particular subsystems
  2009-04-17  6:35 ` [PATCH 2/4] x86/pvops: target CREATE_TRACE_POINTS to particular subsystems Jeremy Fitzhardinge
@ 2009-04-17 15:55   ` Steven Rostedt
  2009-04-17 16:14     ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 37+ messages in thread
From: Steven Rostedt @ 2009-04-17 15:55 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: mathieu.desnoyers, Ingo Molnar, Linux Kernel Mailing List,
	Jeremy Fitzhardinge


On Thu, 16 Apr 2009, Jeremy Fitzhardinge wrote:
> 
> diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h
> index 75e3468..ddc62f5 100644
> --- a/include/trace/events/irq.h
> +++ b/include/trace/events/irq.h
> @@ -54,4 +54,9 @@ TRACE_FORMAT(softirq_exit,
>  #endif /*  _TRACE_IRQ_H */
>  
>  /* This part must be outside protection */
> +#ifdef CREATE_IRQ_TRACE_POINTS
> +#undef CREATE_IRQ_TRACE_POINTS	/* avoid infinite recursion */
> +#include <trace/instantiate_trace.h>
> +#else
>  #include <trace/define_trace.h>
> +#endif




> --- /dev/null
> +++ b/include/trace/instantiate_trace.h
> @@ -0,0 +1,7 @@
> +/* 
> + * trace/events/foo.h include this when their subsystem-specific
> + * CREATE_FOO_TRACE_POINTS is defined.
> + */
> +#define CREATE_TRACE_POINTS
> +#include <trace/define_trace.h>
> +#undef CREATE_TRACE_POINTS

Instead of doing it this way, what about not having this new header, and 
just do:


#ifdef CREATE_IRQ_TRACE_POINTS
#define CREATE_TRACE_POINTS
#endif
#include <trace/define_trace.h>

Heck, define_trace.h only does anything when CREATE_TRACE_POINTS is defined, 
why not just remove that and have:

#ifdef CREATE_IRQ_TRACE_POINTS
#include <trace/define_trace.h>
#endif

The whole trickery with the CREATE_TRACE_POINTS was to avoid the #if in 
the trace headers anyway.  If we can't avoid it, we don't need to add more 
confusion to the mix.

-- Steve


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/4] tracing: move __DO_TRACE out of line
  2009-04-17 15:46   ` Ingo Molnar
@ 2009-04-17 16:10     ` Mathieu Desnoyers
  2009-04-17 16:23       ` Ingo Molnar
  0 siblings, 1 reply; 37+ messages in thread
From: Mathieu Desnoyers @ 2009-04-17 16:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jeremy Fitzhardinge, Steven Rostedt, Linux Kernel Mailing List,
	Jeremy Fitzhardinge

* Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> 
> > From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> > 
> > Mainly simplify linux/tracepoint.h's include dependencies (removes 
> > rcupdate.h), but it can't help with icache locality, since it 
> > definitely moves the code out of line, rather than relying on gcc 
> > to do it.
> 
> > --- a/include/linux/tracepoint.h
> > +++ b/include/linux/tracepoint.h
> > @@ -15,7 +15,6 @@
> >   */
> >  
> >  #include <linux/types.h>
> > -#include <linux/rcupdate.h>
> 
> nice!
> 
> > +#define DEFINE_DO_TRACE(name, proto, args)				\
> > +	void __do_trace_##name(struct tracepoint *tp, TP_PROTO(proto))	\
> > +	{								\
> 
> that needs to be marked notrace, otherwise the function tracer 
> becomes noisy. (or even lockupy.)
> 

I guess I'll have to put it more clearly: I am all for minimizing
tracepoint header dependency, but I'll be nacking this kind of
out-of-lining patch. Taking a function call, and moving it out-of-line
(thus duplicating the function call for nothing) seems *really*
pointless and will hurt tracer performance.
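
To make that concrete, here is roughly what the generated inline looks
like before and after patch 1, for a hypothetical tracepoint "foo"
taking a single int (only the name and argument are made up; the shape
comes straight from the patch):

	/* before: __DO_TRACE() expands the whole probe loop here */
	static inline void trace_foo(int a)
	{
		if (unlikely(__tracepoint_foo.state))
			__DO_TRACE(&__tracepoint_foo,
				   TP_PROTO(int a), TP_ARGS(a));
	}

	/* after: the probe loop lives in the out-of-line
	 * __do_trace_foo(), so every fired event pays an extra function
	 * call before the indirect call into each registered probe */
	static inline void trace_foo(int a)
	{
		if (unlikely(__tracepoint_foo.state))
			__do_trace_foo(&__tracepoint_foo, TP_ARGS(a));
	}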

If thread_info.h is now so big that it needs a cleanup, I guess we'll
just have to do it.

Mathieu

> 	Ingo

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [tip:tracing/core] tracing: avoid warnings from zero-arg tracepoints
  2009-04-17  6:35 ` [PATCH 4/4] tracing: avoid warnings from zero-arg tracepoints Jeremy Fitzhardinge
  2009-04-17 15:53   ` Steven Rostedt
  2009-04-17 15:53   ` Ingo Molnar
@ 2009-04-17 16:10   ` tip-bot for Jeremy Fitzhardinge
  2 siblings, 0 replies; 37+ messages in thread
From: tip-bot for Jeremy Fitzhardinge @ 2009-04-17 16:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, rostedt, tglx, mingo, jeremy.fitzhardinge

Commit-ID:  76aa81118ddfbb3dc31533030cf3ec329dd067a6
Gitweb:     http://git.kernel.org/tip/76aa81118ddfbb3dc31533030cf3ec329dd067a6
Author:     Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
AuthorDate: Thu, 16 Apr 2009 23:35:39 -0700
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 17 Apr 2009 17:52:26 +0200

tracing: avoid warnings from zero-arg tracepoints

Tracepoints with no arguments can issue two warnings:

	"field" defined by not used
	"ret" is uninitialized in this function

Mark field as being OK to leave unused, and initialize ret.

[ Impact: fix false positive compiler warnings. ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: mathieu.desnoyers@polymtl.ca
LKML-Reference: <1239950139-1119-5-git-send-email-jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


---
 include/trace/ftrace.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 60c5323..39a3351 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -160,8 +160,8 @@ ftrace_raw_output_##call(struct trace_iterator *iter, int flags)	\
 static int								\
 ftrace_format_##call(struct trace_seq *s)				\
 {									\
-	struct ftrace_raw_##call field;					\
-	int ret;							\
+	struct ftrace_raw_##call field __attribute__((unused));		\
+	int ret = 0;							\
 									\
 	tstruct;							\
 									\

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/4] x86/pvops: target CREATE_TRACE_POINTS to particular subsystems
  2009-04-17 15:55   ` Steven Rostedt
@ 2009-04-17 16:14     ` Jeremy Fitzhardinge
  2009-04-17 16:32       ` Steven Rostedt
  0 siblings, 1 reply; 37+ messages in thread
From: Jeremy Fitzhardinge @ 2009-04-17 16:14 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: mathieu.desnoyers, Ingo Molnar, Linux Kernel Mailing List,
	Jeremy Fitzhardinge

Steven Rostedt wrote:
> Instead of doing it this way, what about not having this new header, and 
> just do:
>
>
> #ifdef CREATE_IRQ_TRACE_POINTS
> #define CREATE_TRACE_POINTS
> #endif
> #include <trace/define_trace.h>
>
> Heck, define_trace.h is only defined when CREATE_TRACE_POINTS is defined, 
> why not just remove that and have:
>
> #ifdef CREATE_IRQ_TRACE_POINTS
> #define <trace/define_trace.h>
> #endif
>
> The whole trickery with the CREATE_TRACE_POINTS was to avoid the #if in 
> the trace headers anyway. If we can't avoid it. We don't need to add more 
> confusion to the mix.
>   

OK, and remove the test for CREATE_TRACE_POINTS in define_trace.h 
altogether?  Works for me.
Uh, still needs the #undef CREATE_IRQ_TRACE_POINTS.

    J

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/4] tracing: move __DO_TRACE out of line
  2009-04-17 16:10     ` Mathieu Desnoyers
@ 2009-04-17 16:23       ` Ingo Molnar
  2009-04-17 16:47         ` Jeremy Fitzhardinge
  2009-04-17 19:31         ` Jeremy Fitzhardinge
  0 siblings, 2 replies; 37+ messages in thread
From: Ingo Molnar @ 2009-04-17 16:23 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Jeremy Fitzhardinge, Steven Rostedt, Linux Kernel Mailing List,
	Jeremy Fitzhardinge


* Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:

> > > +#define DEFINE_DO_TRACE(name, proto, args)				\
> > > +	void __do_trace_##name(struct tracepoint *tp, TP_PROTO(proto))	\
> > > +	{								\
> > 
> > that needs to be marked notrace, otherwise the function tracer 
> > becomes noisy. (or even lockupy.)
> 
> I guess I'll have to put it more clearly : I am all for minimizing 
> tracepoint header dependency, but I'll be nacking this kind of 
> out-of-lining patch. Taking a function call, and moving it 
> out-of-line (thus duplicating the function call for nothing) seems 
> *really* pointless and will hurt tracer performance.

No need to nak - just say you don't like it and it gets fixed :)

I meant to suggest to Jeremy to measure the effect of this 
out-of-lining, in terms of instruction count in the hotpath.

> If thread_info.h is now so big that it needs a cleanup, I guess 
> we'll just have to do it.

Music to my ears ...

	Ingo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/4] x86/pvops: target CREATE_TRACE_POINTS to particular subsystems
  2009-04-17 16:14     ` Jeremy Fitzhardinge
@ 2009-04-17 16:32       ` Steven Rostedt
  2009-04-17 16:48         ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 37+ messages in thread
From: Steven Rostedt @ 2009-04-17 16:32 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: mathieu.desnoyers, Ingo Molnar, Linux Kernel Mailing List,
	Jeremy Fitzhardinge


On Fri, 17 Apr 2009, Jeremy Fitzhardinge wrote:

> Steven Rostedt wrote:
> > Instead of doing it this way, what about not having this new header, and
> > just do:
> > 
> > 
> > #ifdef CREATE_IRQ_TRACE_POINTS
> > #define CREATE_TRACE_POINTS
> > #endif
> > #include <trace/define_trace.h>
> > 
> > Heck, define_trace.h is only defined when CREATE_TRACE_POINTS is defined,
> > why not just remove that and have:
> > 
> > #ifdef CREATE_IRQ_TRACE_POINTS
> > #define <trace/define_trace.h>
> > #endif
> > 
> > The whole trickery with the CREATE_TRACE_POINTS was to avoid the #if in the
> > trace headers anyway. If we can't avoid it. We don't need to add more
> > confusion to the mix.
> >   
> 
> OK, and remove the test for CREATE_TRACE_POINTS in define_trace.h altogether?
> Works for me.
> Uh, still needs the #undef CREATE_IRQ_TRACE_POINTS.

Ah yes! It needs to be:

#ifdef CONFIG_IRQ_TRACE_POINTS
#undef CONFIG_IRQ_TRACE_POINTS
#include <trace/define_trace.h>
#endif

Otherwise we get into the recursion again.

-- Steve


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/4] tracing: move __DO_TRACE out of line
  2009-04-17 16:23       ` Ingo Molnar
@ 2009-04-17 16:47         ` Jeremy Fitzhardinge
  2009-04-17 19:31         ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 37+ messages in thread
From: Jeremy Fitzhardinge @ 2009-04-17 16:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Mathieu Desnoyers, Steven Rostedt, Linux Kernel Mailing List,
	Jeremy Fitzhardinge

Ingo Molnar wrote:
> No need to nak - just say you dont like it and it gets fixed :)
>
> I meant to suggest to Jeremy to measure the effect of this 
> out-of-lining, in terms of instruction count in the hotpath.

Once I get the pvops tracing to link properly, it should be fairly clear 
one way or the other.

    J

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/4] x86/pvops: target CREATE_TRACE_POINTS to particular subsystems
  2009-04-17 16:32       ` Steven Rostedt
@ 2009-04-17 16:48         ` Jeremy Fitzhardinge
  2009-04-17 16:57           ` Steven Rostedt
  0 siblings, 1 reply; 37+ messages in thread
From: Jeremy Fitzhardinge @ 2009-04-17 16:48 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: mathieu.desnoyers, Ingo Molnar, Linux Kernel Mailing List,
	Jeremy Fitzhardinge

Steven Rostedt wrote:
> Ah yes! It needs to be:
>
> #ifdef CONFIG_IRQ_TRACE_POINTS
> #undef CONFIG_IRQ_TRACE_POINTS
> #include <trace/define_trace.h>
> #endif
>
> Otherwise we get into the recursion again.
>   

We should probably also move the #define TRACE_SYS in there as well 
(without the #undef), as it should only have one definition at a time...

    J

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/4] x86/pvops: target CREATE_TRACE_POINTS to particular subsystems
  2009-04-17 16:48         ` Jeremy Fitzhardinge
@ 2009-04-17 16:57           ` Steven Rostedt
  2009-04-17 17:14             ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 37+ messages in thread
From: Steven Rostedt @ 2009-04-17 16:57 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: mathieu.desnoyers, Ingo Molnar, Linux Kernel Mailing List,
	Jeremy Fitzhardinge


On Fri, 17 Apr 2009, Jeremy Fitzhardinge wrote:

> Steven Rostedt wrote:
> > Ah yes! It needs to be:
> > 
> > #ifdef CONFIG_IRQ_TRACE_POINTS
> > #undef CONFIG_IRQ_TRACE_POINTS
> > #include <trace/define_trace.h>
> > #endif
> > 
> > Otherwise we get into the recursion again.
> >   
> 
> We should probably also move the #define TRACE_SYS in there as well (without
> the #undef), as it should only have one definition at a time...

Actually, I'm kind of against that. Just because as it stands, the 
TRACE_SYSTEM macro is up at the top, and it is easy to see.

Actually, we could do (from the top of the file)

#ifdef CONFIG_IRQ_TRACE_POINTS
#undef CONFIG_IRQ_TRACE_POINTS

#define TRACE_SYSTEM irq

#include <trace/define_trace.h>

#elif !defined(_TRACE_IRQ_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_IRQ_H

#include <linux/tracepoint.h>
#include <linux/interrupt.h>


[...]

#endif


-- Steve


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/4] x86/pvops: target CREATE_TRACE_POINTS to particular subsystems
  2009-04-17 16:57           ` Steven Rostedt
@ 2009-04-17 17:14             ` Jeremy Fitzhardinge
  2009-04-17 17:33               ` Steven Rostedt
  0 siblings, 1 reply; 37+ messages in thread
From: Jeremy Fitzhardinge @ 2009-04-17 17:14 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: mathieu.desnoyers, Ingo Molnar, Linux Kernel Mailing List,
	Jeremy Fitzhardinge

Steven Rostedt wrote:
> On Fri, 17 Apr 2009, Jeremy Fitzhardinge wrote:
>
>   
>> Steven Rostedt wrote:
>>     
>>> Ah yes! It needs to be:
>>>
>>> #ifdef CONFIG_IRQ_TRACE_POINTS
>>> #undef CONFIG_IRQ_TRACE_POINTS
>>> #include <trace/define_trace.h>
>>> #endif
>>>
>>> Otherwise we get into the recursion again.
>>>   
>>>       
>> We should probably also move the #define TRACE_SYS in there as well (without
>> the #undef), as it should only have one definition at a time...
>>     
>
> Actually, I'm kind of against that. Just because as it stands, the 
> TRACE_SYSTEM macro is up at the top, and it is easy to see.
>   

Yes, but it means that if you're in the middle of 
CREATE_FOO_TRACE_POINTS and foo.h happens to include bar.h, suddenly 
TRACE_SUBSYSTEM becomes bar...

> Actually, we could do (from the top of the file)
>
> #ifdef CONFIG_IRQ_TRACE_POINTS
> #undef CONFIG_IRQ_TRACE_POINTS
>
> #define TRACE_SYSTEM irq
>
> #include <trace/define_trace.h>
>
> #elif !defined(_TRACE_IRQ_H) || defined(TRACE_HEADER_MULTI_READ)
> #define _TRACE_IRQ_H
>
> #include <linux/tracepoint.h>
> #include <linux/interrupt.h>
>
>
> [...]
>
> #endif
>   

That's slightly different from what we have now.  At the moment it's

    #if !defined(_TRACE_IRQ_H)...
    ...
    #endif

    #ifdef CREATE_IRQ_TRACE_POINTS
    ...
    #endif

So we get both the main part of the file and the CREATE_X_TRACE_POINTS 
parts.  Your suggestion makes them exclusive:

    #ifdef CREATE_IRQ_TRACE_POINTS
    ...
    #elif !defined(_TRACE_IRQ_H)...
    ...
    #endif
      

Does that make a difference?

    J

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/4] x86/pvops: target CREATE_TRACE_POINTS to particular subsystems
  2009-04-17 17:14             ` Jeremy Fitzhardinge
@ 2009-04-17 17:33               ` Steven Rostedt
  2009-04-17 18:11                 ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 37+ messages in thread
From: Steven Rostedt @ 2009-04-17 17:33 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: mathieu.desnoyers, Ingo Molnar, Linux Kernel Mailing List,
	Jeremy Fitzhardinge



On Fri, 17 Apr 2009, Jeremy Fitzhardinge wrote:

> Steven Rostedt wrote:
> > On Fri, 17 Apr 2009, Jeremy Fitzhardinge wrote:
> > 
> >   
> > > Steven Rostedt wrote:
> > >     
> > > > Ah yes! It needs to be:
> > > > 
> > > > #ifdef CONFIG_IRQ_TRACE_POINTS
> > > > #undef CONFIG_IRQ_TRACE_POINTS
> > > > #include <trace/define_trace.h>
> > > > #endif
> > > > 
> > > > Otherwise we get into the recursion again.
> > > >         
> > > We should probably also move the #define TRACE_SYS in there as well
> > > (without
> > > the #undef), as it should only have one definition at a time...
> > >     
> > 
> > Actually, I'm kind of against that. Just because as it stands, the
> > TRACE_SYSTEM macro is up at the top, and it is easy to see.
> >   
> 
> Yes, but it means that if you're in the middle of CREATE_FOO_TRACE_POINTS and
> foo.h happens to include bar.h, suddenly TRACE_SUBSYSTEM becomes bar...

How so?

> 
> > Actually, we could do (from the top of the file)
> > 
> > #ifdef CONFIG_IRQ_TRACE_POINTS
> > #undef CONFIG_IRQ_TRACE_POINTS

The first thing the file does is undef CONFIG_FOO_TRACE_POINTS; anything 
else that gets included will not take this path.

> > 
> > #define TRACE_SYSTEM irq
> > 
> > #include <trace/define_trace.h>
> > 
> > #elif !defined(_TRACE_IRQ_H) || defined(TRACE_HEADER_MULTI_READ)
> > #define _TRACE_IRQ_H
> > 
> > #include <linux/tracepoint.h>
> > #include <linux/interrupt.h>
> > 
> > 
> > [...]
> > 
> > #endif
> >   
> 
> That's slightly different from what we have now.  At the moment its
> 
>    #if !defined(_TRACE_IRQ_H)...
>    ...
>    #endif
> 
>    #ifdef CREATE_IRQ_TRACE_POINTS
>    ...
>    #endif
> 
> So we get both the main part of the file and the CREATE_X_TRACE_POINTS parts.
> Your suggestion makes them exclusive:
> 
>    #ifdef CREATE_IRQ_TRACE_POINTS
>    ...
>    #elif !defined(_TRACE_IRQ_H)...
>    ...
>    #endif
>      
> Does that make a difference?

Hmm, I was about to disagree, but I think we need it separate, because we 
still need it to do the tracepoint conversion, and we also need to pull in 
tracepoint.h and TRACE_EVENT before including define_trace.h.

OK, scrap that idea ;-)

-- Steve


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/4] x86/pvops: target CREATE_TRACE_POINTS to particular subsystems
  2009-04-17 17:33               ` Steven Rostedt
@ 2009-04-17 18:11                 ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 37+ messages in thread
From: Jeremy Fitzhardinge @ 2009-04-17 18:11 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: mathieu.desnoyers, Ingo Molnar, Linux Kernel Mailing List,
	Jeremy Fitzhardinge

Steven Rostedt wrote:
> On Fri, 17 Apr 2009, Jeremy Fitzhardinge wrote:
>
>   
>> Steven Rostedt wrote:
>>     
>>> On Fri, 17 Apr 2009, Jeremy Fitzhardinge wrote:
>>>
>>>   
>>>       
>>>> Steven Rostedt wrote:
>>>>     
>>>>         
>>>>> Ah yes! It needs to be:
>>>>>
>>>>> #ifdef CONFIG_IRQ_TRACE_POINTS
>>>>> #undef CONFIG_IRQ_TRACE_POINTS
>>>>> #include <trace/define_trace.h>
>>>>> #endif
>>>>>
>>>>> Otherwise we get into the recursion again.
>>>>>         
>>>>>           
>>>> We should probably also move the #define TRACE_SYS in there as well
>>>> (without
>>>> the #undef), as it should only have one definition at a time...
>>>>     
>>>>         
>>> Actually, I'm kind of against that. Just because as it stands, the
>>> TRACE_SYSTEM macro is up at the top, and it is easy to see.
>>>   
>>>       
>> Yes, but it means that if you're in the middle of CREATE_FOO_TRACE_POINTS and
>> foo.h happens to include bar.h, suddenly TRACE_SYSTEM becomes bar...
>>     
>
> How so?
>   

Hm, not exactly sure - now that I think of it - but it fixed things when 
I made the change.  Before I was getting kmem definitions where I was 
expecting pvops ones...

    J

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/4] tracing: move __DO_TRACE out of line
  2009-04-17 16:23       ` Ingo Molnar
  2009-04-17 16:47         ` Jeremy Fitzhardinge
@ 2009-04-17 19:31         ` Jeremy Fitzhardinge
  2009-04-17 19:46           ` Ingo Molnar
  2009-04-18  6:53           ` Mathieu Desnoyers
  1 sibling, 2 replies; 37+ messages in thread
From: Jeremy Fitzhardinge @ 2009-04-17 19:31 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Mathieu Desnoyers, Steven Rostedt, Linux Kernel Mailing List,
	Jeremy Fitzhardinge

Ingo Molnar wrote:
> I meant to suggest to Jeremy to measure the effect of this 
> out-of-lining, in terms of instruction count in the hotpath.
>   

OK, here's a comparison for trace_sched_switch, comparing inline and out 
of line tracing functions, with CONFIG_PREEMPT enabled:

The inline __DO_TRACE version of trace_sched_switch inserts 20 
instructions, assembling to 114 bytes of code in the hot path:

        cmpl    $0, __tracepoint_sched_switch+8(%rip)   #, __tracepoint_sched_switch.state
        je      .L1582  #,
        movq %gs:per_cpu__kernel_stack,%rax     # per_cpu__kernel_stack, ret__
        incl    -8124(%rax)     # <variable>.preempt_count
        movq    __tracepoint_sched_switch+16(%rip), %r12        #, it_func
        testq   %r12, %r12      # it_func
.L1603:
        je      .L1583  #,
        movq    -136(%rbp), %rdx        # next,
        movq    -144(%rbp), %rsi        # prev,
        movq    %rbx, %rdi      # rq,
        call    *(%r12) #* it_func
        addq    $8, %r12        #, it_func
        cmpq    $0, (%r12)      #,* it_func
        jmp     .L1603  #
.L1583:
        movq %gs:per_cpu__kernel_stack,%rax     # per_cpu__kernel_stack, ret__
        decl    -8124(%rax)     # <variable>.preempt_count
        movq %gs:per_cpu__kernel_stack,%rax     # per_cpu__kernel_stack, ret__
        testb   $8, -8136(%rax) #,
        je      .L1582  #,
        call    preempt_schedule        #
.L1582:


Taking __do_trace_sched_switch out of line inserts this into the hot 
path (6 instructions, 31 bytes):

        cmpl    $0, __tracepoint_sched_switch+8(%rip)   #, __tracepoint_sched_switch.state
        je      .L1748  #,
        movq    -136(%rbp), %rdx        # next,
        movq    -144(%rbp), %rsi        # prev,
        movq    %rbx, %rdi      # rq,
        call    __do_trace_sched_switch #
.L1748:


__do_trace_sched_switch is a fair bit larger, mostly due to the function 
preamble (frame setup and register save/restore), and some unfortunate and 
unnecessary register thrashing (why not keep rdi,rsi,rdx where they 
are?).  But it isn't that much larger than the inline version: 34 
instructions, 118 bytes.  This code will also be shared among all 
instances of the tracepoint (not in this case, because sched_switch is 
unique, but other tracepoints have multiple users).

__do_trace_sched_switch:
        pushq   %rbp    #
        movq    %rsp, %rbp      #,
        pushq   %r14    #
        movq    %rdi, %r14      # rq, rq
        pushq   %r13    #
        movq    %rsi, %r13      # prev, prev
        pushq   %r12    #
        movq    %rdx, %r12      # next, next
        pushq   %rbx    #
        movq %gs:per_cpu__kernel_stack,%rax     # per_cpu__kernel_stack, ret__
        incl    -8124(%rax)     # <variable>.preempt_count
        movq    __tracepoint_sched_switch+16(%rip), %rax        #, _________p1
        testq   %rax, %rax      # _________p1
        je      .L2403  #,
        movq    %rax, %rbx      # _________p1, it_func
.L2404:
        movq    %r12, %rdx      # next,
        movq    %r13, %rsi      # prev,
        movq    %r14, %rdi      # rq,
        call    *(%rbx) #* it_func
        addq    $8, %rbx        #, it_func
        cmpq    $0, (%rbx)      #,* it_func
        jne     .L2404  #,
.L2403:
        movq %gs:per_cpu__kernel_stack,%rax     # per_cpu__kernel_stack, ret__
        decl    -8124(%rax)     # <variable>.preempt_count
        movq %gs:per_cpu__kernel_stack,%rax     # per_cpu__kernel_stack, ret__
        testb   $8, -8136(%rax) #,
        je      .L2406  #,
        call    preempt_schedule        #
.L2406:
        popq    %rbx    #
        popq    %r12    #
        popq    %r13    #
        popq    %r14    #
        leave
        ret


So, conclusion: putting the tracepoint code out of line significantly 
reduces the hot-path code size at each tracepoint (114 bytes down to 31 
in this case, 27% of the size).  This should reduce the overhead of having 
tracing configured but not enabled.  The saving won't be as large for 
tracepoints with fewer arguments or without CONFIG_PREEMPT, but I chose 
this example because it is realistic and undeniably a hot path.  And 
with pvops tracing adding 80 new events with hundreds of callsites 
around the kernel, this is really going to add up.

The tradeoff is that the actual tracing function is a little larger, but 
not dramatically so.  I would expect some performance hit when the 
tracepoint is actually enabled.  This may be mitigated by increased icache 
hits when a tracepoint has multiple sites.

(BTW, I realized that we don't need to pass &__tracepoint_FOO to 
__do_trace_FOO(), since it's always going to be the same; this simplifies 
the calling convention at the callsite, and it also makes void 
tracepoints work again.)
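
For anyone following along, here's roughly the shape of the change -- the 
struct layout and helper names below are illustrative, not the literal patch:

    /* Sketch only; the real macros live in linux/tracepoint.h and
     * include/trace/define_trace.h, and the struct layout here is
     * illustrative. */
    struct tracepoint {
            const char *name;
            int state;              /* non-zero when a probe is attached */
            void **funcs;           /* NULL-terminated probe array */
    };

    extern struct tracepoint __tracepoint_sched_switch;
    extern void __do_trace_sched_switch(struct rq *rq,
                                        struct task_struct *prev,
                                        struct task_struct *next);

    /* What DECLARE_TRACE() leaves at every callsite: a test and a call. */
    static inline void trace_sched_switch(struct rq *rq,
                                          struct task_struct *prev,
                                          struct task_struct *next)
    {
            if (unlikely(__tracepoint_sched_switch.state))
                    __do_trace_sched_switch(rq, prev, next);
    }

    /* What DEFINE_TRACE(sched_switch, proto, args) emits, once, out of
     * line: the probe-iteration loop. */
    void __do_trace_sched_switch(struct rq *rq,
                                 struct task_struct *prev,
                                 struct task_struct *next)
    {
            void **it_func;

            rcu_read_lock_sched_notrace();
            it_func = rcu_dereference(__tracepoint_sched_switch.funcs);
            if (it_func) {
                    do {
                            ((void (*)(struct rq *, struct task_struct *,
                                       struct task_struct *))
                             (*it_func))(rq, prev, next);
                    } while (*++it_func);
            }
            rcu_read_unlock_sched_notrace();
    }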

    J

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/4] tracing: move __DO_TRACE out of line
  2009-04-17 19:31         ` Jeremy Fitzhardinge
@ 2009-04-17 19:46           ` Ingo Molnar
  2009-04-17 19:57             ` Steven Rostedt
  2009-04-17 19:58             ` Jeremy Fitzhardinge
  2009-04-18  6:53           ` Mathieu Desnoyers
  1 sibling, 2 replies; 37+ messages in thread
From: Ingo Molnar @ 2009-04-17 19:46 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Mathieu Desnoyers, Steven Rostedt, Linux Kernel Mailing List,
	Jeremy Fitzhardinge


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> Taking __do_trace_sched_switch out of lines inserts this into the 
> hot path (6 instructions, 31 bytes):
>
>        cmpl    $0, __tracepoint_sched_switch+8(%rip)   #, __tracepoint_sched_switch.state
>        je      .L1748  #,
>        movq    -136(%rbp), %rdx        # next,
>        movq    -144(%rbp), %rsi        # prev,
>        movq    %rbx, %rdi      # rq,
>        call    __do_trace_sched_switch #
> .L1748:

Hm, why isn't this off-line in the function? It's marked unlikely(), 
isn't it?

also, did you investigate the effect on the _instrumented_ function 
itself? (i.e. the non-tracing related bits) A function call clobbers 
various registers and creates pressure on gcc to shuffle registers 
around.

	Ingo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/4] tracing: move __DO_TRACE out of line
  2009-04-17 19:46           ` Ingo Molnar
@ 2009-04-17 19:57             ` Steven Rostedt
  2009-04-17 19:58             ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 37+ messages in thread
From: Steven Rostedt @ 2009-04-17 19:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jeremy Fitzhardinge, Mathieu Desnoyers,
	Linux Kernel Mailing List, Jeremy Fitzhardinge


On Fri, 17 Apr 2009, Ingo Molnar wrote:

> 
> * Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> 
> > Taking __do_trace_sched_switch out of lines inserts this into the 
> > hot path (6 instructions, 31 bytes):
> >
> >        cmpl    $0, __tracepoint_sched_switch+8(%rip)   #, __tracepoint_sched_switch.state
> >        je      .L1748  #,
> >        movq    -136(%rbp), %rdx        # next,
> >        movq    -144(%rbp), %rsi        # prev,
> >        movq    %rbx, %rdi      # rq,
> >        call    __do_trace_sched_switch #
> > .L1748:
> 
> Hm, why isnt this off-line in the function? It's marked unlikely(), 
> isnt it?
> 
> also, did you investigate the effect on the _instrumented_ function 
> itself? (i.e. the non-tracing related bits) A function call clobbers 
> various registers and creates pressure on gcc to shuffle registers 
> around.

I doubt it will make much difference. The inline version still has the
function call to the trace point handler.

-- Steve


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/4] tracing: move __DO_TRACE out of line
  2009-04-17 19:46           ` Ingo Molnar
  2009-04-17 19:57             ` Steven Rostedt
@ 2009-04-17 19:58             ` Jeremy Fitzhardinge
  2009-04-17 20:06               ` Steven Rostedt
  1 sibling, 1 reply; 37+ messages in thread
From: Jeremy Fitzhardinge @ 2009-04-17 19:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Mathieu Desnoyers, Steven Rostedt, Linux Kernel Mailing List,
	Jeremy Fitzhardinge

Ingo Molnar wrote:
> * Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>
>   
>> Taking __do_trace_sched_switch out of lines inserts this into the 
>> hot path (6 instructions, 31 bytes):
>>
>>        cmpl    $0, __tracepoint_sched_switch+8(%rip)   #, __tracepoint_sched_switch.state
>>        je      .L1748  #,
>>        movq    -136(%rbp), %rdx        # next,
>>        movq    -144(%rbp), %rsi        # prev,
>>        movq    %rbx, %rdi      # rq,
>>        call    __do_trace_sched_switch #
>> .L1748:
>>     
>
> Hm, why isnt this off-line in the function? It's marked unlikely(), 
> isnt it?
>   

Yes, it's unlikely().  I don't know why it doesn't move it; I've never 
seen unlikely() do anything useful.

> also, did you investigate the effect on the _instrumented_ function 
> itself? (i.e. the non-tracing related bits) A function call clobbers 
> various registers and creates pressure on gcc to shuffle registers 
> around.
>   

Well, there's a function call in either case, so I don't think it makes 
much difference.

    J

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/4] tracing: move __DO_TRACE out of line
  2009-04-17 19:58             ` Jeremy Fitzhardinge
@ 2009-04-17 20:06               ` Steven Rostedt
  0 siblings, 0 replies; 37+ messages in thread
From: Steven Rostedt @ 2009-04-17 20:06 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Ingo Molnar, Mathieu Desnoyers, Linux Kernel Mailing List,
	Jeremy Fitzhardinge


On Fri, 17 Apr 2009, Jeremy Fitzhardinge wrote:

> Ingo Molnar wrote:
> > * Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> > 
> >   
> > > Taking __do_trace_sched_switch out of lines inserts this into the hot path
> > > (6 instructions, 31 bytes):
> > > 
> > >        cmpl    $0, __tracepoint_sched_switch+8(%rip)   #,
> > > __tracepoint_sched_switch.state
> > >        je      .L1748  #,
> > >        movq    -136(%rbp), %rdx        # next,
> > >        movq    -144(%rbp), %rsi        # prev,
> > >        movq    %rbx, %rdi      # rq,
> > >        call    __do_trace_sched_switch #
> > > .L1748:
> > >     
> > 
> > Hm, why isnt this off-line in the function? It's marked unlikely(), isnt it?
> >   
> 
> Yes, its unlikely().  I don't know why it doesn't move it; I've never seen
> unlikely() do anything useful.
> 

I wonder if it is because it is in context_switch(). Perhaps gcc moved it 
to the end of that function, but since context_switch() is only used 
in schedule(), it could have been inlined there. Was gcc simply not smart 
enough to move this unlikely branch down to the end of the schedule() 
function?

-- Steve


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/4] tracing: move __DO_TRACE out of line
  2009-04-17 19:31         ` Jeremy Fitzhardinge
  2009-04-17 19:46           ` Ingo Molnar
@ 2009-04-18  6:53           ` Mathieu Desnoyers
  2009-04-18 14:16             ` Steven Rostedt
  2009-04-19 23:40             ` Jeremy Fitzhardinge
  1 sibling, 2 replies; 37+ messages in thread
From: Mathieu Desnoyers @ 2009-04-18  6:53 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Ingo Molnar, Steven Rostedt, Linux Kernel Mailing List,
	Jeremy Fitzhardinge

* Jeremy Fitzhardinge (jeremy@goop.org) wrote:
> Ingo Molnar wrote:
>> I meant to suggest to Jeremy to measure the effect of this  
>> out-of-lining, in terms of instruction count in the hotpath.
>>   
>
> OK, here's a comparison for trace_sched_switch, comparing inline and out  
> of line tracing functions, with CONFIG_PREEMPT enabled:
>
> The inline __DO_TRACE version of trace_sched_switch inserts 20  
> instructions, assembling to 114 bytes of code in the hot path:
>

[...]

>
> __do_trace_sched_switch is a fair bit larger, mostly due to function  
> preamble frame and reg save/restore, and some unfortunate and  
> unnecessary register thrashing (why not keep rdi,rsi,rdx where they  
> are?).  But it isn't that much larger than the inline version: 34  
> instructions, 118 bytes.  This code will also be shared among all  
> instances of the tracepoint (not in this case, because sched_switch is  
> unique, but other tracepoints have multiple users).
>

[...]

> So, conclusion: putting the tracepoint code out of line significantly  
> reduces the hot-path code size at each tracepoint (114 bytes down to 31  
> in this case, 27% the size).  This should reduce the overhead of having  
> tracing configured but not enabled.  The saving won't be as large for  
> tracepoints with fewer arguments or without CONFIG_PREEMPT, but I chose  
> this example because it is realistic and undeniably a hot path.  And  
> when doing pvops tracing, 80 new events with hundreds of callsites  
> around the kernel, this is really going to add up.
>
> The tradeoff is that the actual tracing function is a little larger, but  
> not dramatically so.  I would expect some performance hit when the  
> tracepoint is actually enabled.  This may be mitigated increased icache  
> hits when a tracepoint has multiple sites.
>
> (BTW, I realized that we don't need to pass &__tracepoint_FOO to  
> __do_trace_FOO(), since its always going to be the same; this simplifies  
> the calling convention at the callsite, and it also makes void  
> tracepoints work again.)
>
>    J

Yep, keeping "void" working is a nicety I would like to keep. So, about
this supposed "near-zero function call impact", I decided to take LTTng
for a little test. I compared tracing the "core set" of Google
tracepoints with the tracepoints inline and out of line. Here is the
result:

tbench test

kernel : 2.6.30-rc1

running on a 8-cores x86_64, localhost server

tracepoints inactive :

2051.20 MB/sec

"google" tracepoints activated, flight recorder mode (overwrite) tracing

inline tracepoints

1704.70 MB/sec (16.9 % slower than baseline)

out-of-line tracepoints

1635.14 MB/sec (20.3 % slower than baseline)

So the overall tracer impact is 20 % bigger just by making the
tracepoints out-of-line. This is going to add up quickly if we add as
many function calls as we currently find in the event tracer fast path,
but LTTng, OTOH, has been designed to minimize the number of such
function calls, and you see a good example of why it's been such an
important design goal above.

About cache-line usage, I agree that in some cases gcc does not seem
intelligent enough to move those code paths away from the fast path.
What we would really want there is -freorder-blocks-and-partition, but
I doubt we want this for the whole kernel, as it makes some jumps
slightly larger. One thing we should maybe look into is to add some kind
of "very unlikely" builtin expect to gcc that would teach it to really
put the branch in a cache-cold location, no matter what.

Mathieu

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/4] tracing: move __DO_TRACE out of line
  2009-04-18  6:53           ` Mathieu Desnoyers
@ 2009-04-18 14:16             ` Steven Rostedt
  2009-04-19  3:59               ` Mathieu Desnoyers
  2009-04-19 23:40             ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 37+ messages in thread
From: Steven Rostedt @ 2009-04-18 14:16 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Jeremy Fitzhardinge, Ingo Molnar, Linux Kernel Mailing List,
	Jeremy Fitzhardinge


On Sat, 18 Apr 2009, Mathieu Desnoyers wrote:
> 
> tbench test
> 
> kernel : 2.6.30-rc1
> 
> running on a 8-cores x86_64, localhost server
> 
> tracepoints inactive :
> 
> 2051.20 MB/sec

Is this with or without inlined trace points?

It would be interesting to see:

time with tracepoints not configured at all

time with inline tracepoints (inactive)

time with out-of-line tracepoints (inactive)

Because if inline trace points affect the performance of normal operations 
when inactive, that would be an argument against trace points altogether.

-- Steve

> 
> "google" tracepoints activated, flight recorder mode (overwrite) tracing
> 
> inline tracepoints
> 
> 1704.70 MB/sec (16.9 % slower than baseline)
> 
> out-of-line tracepoints
> 
> 1635.14 MB/sec (20.3 % slower than baseline)
> 
> So the overall tracer impact is 20 % bigger just by making the
> tracepoints out-of-line. This is going to add up quickly if we add as
> much function calls as we currently find in the event tracer fast path,
> but LTTng, OTOH, has been designed to minimize the number of such
> function calls, and you see a good example of why it's been such an
> important design goal above.
> 
> About cache-line usage, I agree that in some cases gcc does not seem
> intelligent enough to move those code paths away from the fast path.
> What we would really whant there is -freorder-blocks-and-partition, but
> I doubt we want this for the whole kernel, as it makes some jumps
> slightly larger. One thing we should maybe look into is to add some kind
> of "very unlikely" builtin expect to gcc that would teach it to really
> put the branch in a cache-cold location, no matter what.
> 
> Mathieu
> 
> -- 
> Mathieu Desnoyers
> OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/4] tracing: move __DO_TRACE out of line
  2009-04-18 14:16             ` Steven Rostedt
@ 2009-04-19  3:59               ` Mathieu Desnoyers
  2009-04-19 23:38                 ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 37+ messages in thread
From: Mathieu Desnoyers @ 2009-04-19  3:59 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Jeremy Fitzhardinge, Ingo Molnar, Linux Kernel Mailing List,
	Jeremy Fitzhardinge, Christoph Hellwig, Andrew Morton

* Steven Rostedt (rostedt@goodmis.org) wrote:
> 
> On Sat, 18 Apr 2009, Mathieu Desnoyers wrote:
> > 
> > tbench test
> > 
> > kernel : 2.6.30-rc1
> > 
> > running on a 8-cores x86_64, localhost server
> > 
> > tracepoints inactive :
> > 
> > 2051.20 MB/sec
> 
> Is this with or without inlined trace points?
> 
> It would be interesting to see:
> 
> time with tracepoints not configured at all
> 
> time with inline tracepoints (inactive)
> 
> time with out-of-line tracepoints (inactive)
> 
> Because if inline trace points affect the use of normal operations when 
> inactive, that would be a cause against trace points all together.
> 

I've done a few performance measurements which lead to interesting
conclusions.

Here is the conclusions I gather from the following tbench tests on the LTTng
tree :

- Dormant tracepoints, when sprinkled all over the place, have a very small, but
  measurable, footprint on kernel stress-test workloads (3 % for the
  whole 2.6.30-rc1 LTTng tree).

- "Immediate values" help lessening this impact significantly (3 % -> 2.5 %).

- Static jump patching would diminish the impact even more, but would require
  gcc modifications to be acceptable. I did some prototypes using instruction
  pattern matching in the past, which were judged too complex.

- I strongly recommend adding a per-subsystem config-out option for heavy
  users like kmemtrace or pvops. Compiling out kmemtrace instrumentation
  brings the performance impact from 2.5 % down to a 1.9 % slowdown.

- Putting the tracepoint out-of-line is a no-go, as it slows down *both* the
  dormant (3 % -> 4.7 %) and the active (+20% to tracer overhead) tracepoints
  compared to inline tracepoints.

I also think someone should have a closer look at how GCC handles unlikely()
branches in static inline functions. For some reason, it does not seem to put
the code in a cache-cold cache line in some cases. Improving the compiler
optimisations could help in those cases, not only for tracepoints, but for all
other affected facilities (e.g. BUG_ON()).

In conclusion, adding tracepoints should at least come with performance impact
measurements. For instance, the scheduler tracepoints have been tested and
shown to have no significant performance impact, but that is not the case for
kmemtrace, as my tests show. It makes sense, in this case, to put such invasive
tracepoints (kmemtrace, pvops) under a per-subsystem config option (see the
sketch after this paragraph). I would also recommend merging immediate values
as a short-term performance improvement and putting some resources into adding
static jump patching support to gcc.
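
As an illustration of the kind of per-subsystem compile-out I have in mind --
the config symbol and the trimmed-down argument list are made up, not an
existing option:

    #ifdef CONFIG_KMEM_TRACE_POINTS         /* hypothetical option name */
    DECLARE_TRACE(kmalloc,
            TP_PROTO(unsigned long call_site, const void *ptr),
            TP_ARGS(call_site, ptr));
    #else
    /* Dormant cost drops to zero: the callsites compile away entirely. */
    static inline void trace_kmalloc(unsigned long call_site,
                                     const void *ptr)
    {
    }
    #endif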

Tracepoints are cheap, but not completely free.

Disclaimer : using tbench is probably the worse-case scenario we can
think of in terms of using the tracepoint code paths very frequently.
I dont expect more "balanced" workloads to exhibit such measurable
performance impact. Anyone willing to talk about the general performance
impact of tracepoints should _really_ run tests on a wider variety of
workloads.

Linux 2.6.30-rc1, LTTng tree
(includes both mainline tracepoints and LTTng tracepoints)

(tbench 8, in MB/sec)

Tracepoints all compiled-out :

run 1 :                2091.50
run 2 (after reboot) : 2089.50 (baseline)
run 3 (after reboot) : 2083.61

Dormant tracepoints :

inline, no immediate value optimization

run 1 :                1990.63
run 2 (after reboot) : 2025.38 (3 %)
run 3 (after reboot) : 2028.81

out-of-line, no immediate value optimization

run 1 :                1990.66
run 2 (after reboot) : 1990.19 (4.7 %)
run 3 (after reboot) : 1977.79

inline, immediate value optimization

run 1 :                2035.99 (2.5 %)
run 2 (after reboot) : 2036.11
run 3 (after reboot) : 2035.75

inline, immediate value optimization, configuring out kmemtrace tracepoints

run 1 :                2048.08 (1.9 %)
run 2 (after reboot) : 2055.53
run 3 (after reboot) : 2046.49

Mathieu


> -- Steve
> 
> > 
> > "google" tracepoints activated, flight recorder mode (overwrite) tracing
> > 
> > inline tracepoints
> > 
> > 1704.70 MB/sec (16.9 % slower than baseline)
> > 
> > out-of-line tracepoints
> > 
> > 1635.14 MB/sec (20.3 % slower than baseline)
> > 
> > So the overall tracer impact is 20 % bigger just by making the
> > tracepoints out-of-line. This is going to add up quickly if we add as
> > much function calls as we currently find in the event tracer fast path,
> > but LTTng, OTOH, has been designed to minimize the number of such
> > function calls, and you see a good example of why it's been such an
> > important design goal above.
> > 
> > About cache-line usage, I agree that in some cases gcc does not seem
> > intelligent enough to move those code paths away from the fast path.
> > What we would really whant there is -freorder-blocks-and-partition, but
> > I doubt we want this for the whole kernel, as it makes some jumps
> > slightly larger. One thing we should maybe look into is to add some kind
> > of "very unlikely" builtin expect to gcc that would teach it to really
> > put the branch in a cache-cold location, no matter what.
> > 
> > Mathieu
> > 
> > -- 
> > Mathieu Desnoyers
> > OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
> > 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/4] tracing: move __DO_TRACE out of line
  2009-04-19  3:59               ` Mathieu Desnoyers
@ 2009-04-19 23:38                 ` Jeremy Fitzhardinge
  2009-04-20 21:39                   ` Mathieu Desnoyers
  0 siblings, 1 reply; 37+ messages in thread
From: Jeremy Fitzhardinge @ 2009-04-19 23:38 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Steven Rostedt, Ingo Molnar, Linux Kernel Mailing List,
	Jeremy Fitzhardinge, Christoph Hellwig, Andrew Morton

Mathieu Desnoyers wrote:
> Here is the conclusions I gather from the following tbench tests on the LTTng
> tree :
>
> - Dormant tracepoints, when sprinkled all over the place, have a very small, but
>   measurable, footprint on kernel stress-test workloads (3 % for the
>   whole 2.6.30-rc1 LTTng tree).
>
> - "Immediate values" help lessening this impact significantly (3 % -> 2.5 %).
>
> - Static jump patching would diminish impact even more, but would require gcc
>   modifications to be acceptable. I did some prototypes using instruction
>   pattern matching in the past which was judged too complex.
>
> - I strongly recommend adding per-subsystem config-out option for heavy
>   users like kmemtrace or pvops. Compiling-out kmemtrace instrumentation
>   brings the performance impact from 2.5 % down to 1.9 % slowdown.
>
> - Putting the tracepoint out-of-line is a no-go, as it slows down *both* the
>   dormant (3 % -> 4.7 %) and the active (+20% to tracer overhead) tracepoints
>   compared to inline tracepoints.
>   

That's an interestingly counter-intuitive result.  Do you have any 
theories about how this might happen?  The only mechanism I can think of 
is that, because the inline code sections are smaller, gcc is less 
inclined to move the if(unlikely) code out of line, so the amount of 
hot-path code is higher.  But still, 1.7% is a massive increase in 
overhead, especially compared to the relative differences of the other 
changes.

> Tracepoints all compiled-out :
>
> run 1 :                2091.50
> run 2 (after reboot) : 2089.50 (baseline)
> run 3 (after reboot) : 2083.61
>
> Dormant tracepoints :
>
> inline, no immediate value optimization
>
> run 1 :                1990.63
> run 2 (after reboot) : 2025.38 (3 %)
> run 3 (after reboot) : 2028.81
>
> out-of-line, no immediate value optimization
>
> run 1 :                1990.66
> run 2 (after reboot) : 1990.19 (4.7 %)
> run 3 (after reboot) : 1977.79
>
> inline, immediate value optimization
>
> run 1 :                2035.99 (2.5 %)
> run 2 (after reboot) : 2036.11
> run 3 (after reboot) : 2035.75
>
> inline, immediate value optimization, configuring out kmemtrace tracepoints
>
> run 1 :                2048.08 (1.9 %)
> run 2 (after reboot) : 2055.53
> run 3 (after reboot) : 2046.49
>   

So what are you doing here?  Are you doing 3 runs, then comparing the 
median measurement in each case?

The trouble is that your run to run variations are at least as large as 
the difference you're trying to detect.  For example in run 1 of 
"inline, no immediate value optimization" you got 1990.6MB/s throughput, 
and then runs 2 & 3 both went up to ~2025.  Why?  That's a huge jump.

The "out-of-line, no immediate value optimization" runs 1&2 has the same 
throughput as run 1 of the previous test, 1990MB/s, while run 3 is a bit 
worse.  OK, so perhaps its slower.  But why are runs 1&2 more or less 
identical to inline/run1?

What would happen if you happened to do 10 iterations of these tests?  
There just seems to be too much run-to-run variation to make 3 runs 
statistically meaningful.

I'm not picking on you personally, because I had exactly the same 
problems when trying to benchmark the overhead of pvops.  The 
reboot/rerun variations were at least as large as the effects I'm trying 
to measure, and I'm just feeling suspicious of all the results.

I think there's something fundamentally off about this kind of 
kernel benchmark methodology.  The results are not stable and are not - 
I think - reliable.  Unfortunately I don't have enough of a background 
in statistics to really analyze what's going on here, or how we should 
change the test/measurement methodology to get results that we can 
really stand by.

I don't even have a good explanation for why there are such large 
boot-to-boot variations anyway.  The normal explanation is "cache 
effects", but what is actually changing here?  The kernel image is 
identical, loaded into the same physical pages each time, and mapped 
into the same virtual address.  So the I&D caches and tlb should get 
exactly the same access patterns for the kernel code itself.  The 
dynamically allocated memory is going to vary, and have different cache 
interactions, but is that enough to explain these kinds of variations?  
If so, we're going to need to do a lot more iterations to see any signal 
from our actual changes over the noise that "cache effects" are throwing 
our way...

    J

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/4] tracing: move __DO_TRACE out of line
  2009-04-18  6:53           ` Mathieu Desnoyers
  2009-04-18 14:16             ` Steven Rostedt
@ 2009-04-19 23:40             ` Jeremy Fitzhardinge
  2009-04-20 21:47               ` Mathieu Desnoyers
  1 sibling, 1 reply; 37+ messages in thread
From: Jeremy Fitzhardinge @ 2009-04-19 23:40 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Ingo Molnar, Steven Rostedt, Linux Kernel Mailing List,
	Jeremy Fitzhardinge

Mathieu Desnoyers wrote:
> * Jeremy Fitzhardinge (jeremy@goop.org) wrote:
>   
>> Ingo Molnar wrote:
>>     
>>> I meant to suggest to Jeremy to measure the effect of this  
>>> out-of-lining, in terms of instruction count in the hotpath.
>>>   
>>>       
>> OK, here's a comparison for trace_sched_switch, comparing inline and out  
>> of line tracing functions, with CONFIG_PREEMPT enabled:
>>
>> The inline __DO_TRACE version of trace_sched_switch inserts 20  
>> instructions, assembling to 114 bytes of code in the hot path:
>>
>>     
>
> [...]
>
>   
>> __do_trace_sched_switch is a fair bit larger, mostly due to function  
>> preamble frame and reg save/restore, and some unfortunate and  
>> unnecessary register thrashing (why not keep rdi,rsi,rdx where they  
>> are?).  But it isn't that much larger than the inline version: 34  
>> instructions, 118 bytes.  This code will also be shared among all  
>> instances of the tracepoint (not in this case, because sched_switch is  
>> unique, but other tracepoints have multiple users).
>>
>>     
>
> [...]
>
>   
>> So, conclusion: putting the tracepoint code out of line significantly  
>> reduces the hot-path code size at each tracepoint (114 bytes down to 31  
>> in this case, 27% the size).  This should reduce the overhead of having  
>> tracing configured but not enabled.  The saving won't be as large for  
>> tracepoints with fewer arguments or without CONFIG_PREEMPT, but I chose  
>> this example because it is realistic and undeniably a hot path.  And  
>> when doing pvops tracing, 80 new events with hundreds of callsites  
>> around the kernel, this is really going to add up.
>>
>> The tradeoff is that the actual tracing function is a little larger, but  
>> not dramatically so.  I would expect some performance hit when the  
>> tracepoint is actually enabled.  This may be mitigated increased icache  
>> hits when a tracepoint has multiple sites.
>>
>> (BTW, I realized that we don't need to pass &__tracepoint_FOO to  
>> __do_trace_FOO(), since its always going to be the same; this simplifies  
>> the calling convention at the callsite, and it also makes void  
>> tracepoints work again.)
>>
>>    J
>>     
>
> Yep, keeping "void" working is a niceness I would like to keep. So about
> this supposed "near-zero function call impact", I decided to take LTTng
> for a little test. I compare tracing the "core set" of Google
> tracepoints with the tracepoints inline and out-of line. Here is the
> result :
>
> tbench test
>
> kernel : 2.6.30-rc1
>
> running on a 8-cores x86_64, localhost server
>
> tracepoints inactive :
>
> 2051.20 MB/sec
>
> "google" tracepoints activated, flight recorder mode (overwrite) tracing
>
> inline tracepoints
>
> 1704.70 MB/sec (16.9 % slower than baseline)
>
> out-of-line tracepoints
>
> 1635.14 MB/sec (20.3 % slower than baseline)
>
> So the overall tracer impact is 20 % bigger just by making the
> tracepoints out-of-line. This is going to add up quickly if we add as
> much function calls as we currently find in the event tracer fast path,
> but LTTng, OTOH, has been designed to minimize the number of such
> function calls, and you see a good example of why it's been such an
> important design goal above.
>   

Yes, that is a surprising amount.  I fully expect to see some amount of 
cost associated with it, but 20% is surprising.

However, for me, the performance question is secondary.  The main issue 
is that I can't use the tracing infrastructure with inline tracepoints 
because of the #include dependency issue.  I think this may be solvable 
in the long term by restructuring all the headers to make it work out, 
but I'm still concerned that the result will be brittle and hard to 
maintain.

Putting the tracing code out of line and making tracepoint.h have 
trivial #include dependencies is "obviously correct" from a maintenance 
and correctness point of view, and will remain robust in the face of 
ongoing changes.  It allows the tracing machinery to be used in the 
deepest, darkest corners of the kernel without having to worry about new 
header interactions.

But I completely understand your concerns about performance.  Existing 
tracepoints which have no problem with the existing header dependencies 
are, apparently, faster with the inline tracing code.  And there's no 
real reason to penalize them.

So the compromise is this: we add (yet another) #ifdef so that a 
particular set of tracepoints can be emitted with either inline or 
out-of-line code, by defining two variants of __DO_TRACE: one inline, 
and one out of line.  trace/events/pvops.h could select the out-of-line 
variant, and everyone else could leave it as-is.  I don't think that 
would add very much additional complexity to tracepoint.h, but it means 
we can both get the outcomes we need.

(I haven't tried to prototype this, so maybe it all falls apart in the 
details, but I'll give it a go tomorrow.)
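
Roughly, I'm picturing something along these lines -- the macro and symbol 
names are invented on the spot, purely a sketch of the selection mechanism:

    /* tracepoint.h provides both forms: */
    #define __DO_TRACE_INLINE(name, proto, args)        \
            do { /* current inline probe-iteration loop */ } while (0)
    #define __DO_TRACE_OUTLINE(name, proto, args)       \
            __do_trace_##name(args)         /* just a call */

    /* and the events header (e.g. trace/events/pvops.h) says which one
     * its tracepoints should use, via a per-header switch: */
    #define TRACE_OUT_OF_LINE       1       /* name is made up */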

> About cache-line usage, I agree that in some cases gcc does not seem
> intelligent enough to move those code paths away from the fast path.
> What we would really whant there is -freorder-blocks-and-partition, but
> I doubt we want this for the whole kernel, as it makes some jumps
> slightly larger. One thing we should maybe look into is to add some kind
> of "very unlikely" builtin expect to gcc that would teach it to really
> put the branch in a cache-cold location, no matter what.
>   

I wonder if -fno-guess-branch-probabilities would help?  The 
documentation says that the interaction between __builtin_expect() and 
its normal branch probability heuristics is complex, and it might be 
interfering here.  But I think that in the general case we don't want 
to either 1) require external branch probability data, or 2) annotate 
every single branch with an expect, so we want gcc to be trying to guess 
for us.  The __cold annotation on functions is more useful anyway, I 
think (so gcc knows that any code path which results in calling a cold 
function is unlikely, without needing explicit annotations on every 
conditional).
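
E.g., something along these lines -- purely illustrative, not a patch:

    /* Marking the out-of-line helper __cold should let gcc treat every
     * path that reaches the call as unlikely, and keep the call out of
     * the hot cache lines, without per-branch annotations: */
    extern void __cold __do_trace_sched_switch(struct rq *rq,
                                               struct task_struct *prev,
                                               struct task_struct *next);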

    J

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/4] tracing: move __DO_TRACE out of line
  2009-04-19 23:38                 ` Jeremy Fitzhardinge
@ 2009-04-20 21:39                   ` Mathieu Desnoyers
  0 siblings, 0 replies; 37+ messages in thread
From: Mathieu Desnoyers @ 2009-04-20 21:39 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Steven Rostedt, Ingo Molnar, Linux Kernel Mailing List,
	Jeremy Fitzhardinge, Christoph Hellwig, Andrew Morton

* Jeremy Fitzhardinge (jeremy@goop.org) wrote:
> Mathieu Desnoyers wrote:
>> Here is the conclusions I gather from the following tbench tests on the LTTng
>> tree :
>>
>> - Dormant tracepoints, when sprinkled all over the place, have a very small, but
>>   measurable, footprint on kernel stress-test workloads (3 % for the
>>   whole 2.6.30-rc1 LTTng tree).
>>
>> - "Immediate values" help lessening this impact significantly (3 % -> 2.5 %).
>>
>> - Static jump patching would diminish impact even more, but would require gcc
>>   modifications to be acceptable. I did some prototypes using instruction
>>   pattern matching in the past which was judged too complex.
>>
>> - I strongly recommend adding per-subsystem config-out option for heavy
>>   users like kmemtrace or pvops. Compiling-out kmemtrace instrumentation
>>   brings the performance impact from 2.5 % down to 1.9 % slowdown.
>>
>> - Putting the tracepoint out-of-line is a no-go, as it slows down *both* the
>>   dormant (3 % -> 4.7 %) and the active (+20% to tracer overhead) tracepoints
>>   compared to inline tracepoints.
>>   
>
> That's an interestingly counter-intuitive result.  Do you have any  
> theories how this might happen?  The only mechanism I can think of is  
> that, because the inline code sections are smaller, gcc is less inclined  
> to put the if(unlikely) code out of line, so the amount of hot-patch  
> code is higher.  But still, 1.7% is a massive increase in overhead,  
> especially compared to the relative differences of the other changes.
>

Hrm, there is an approximation I've done in my test code to minimize the
development time, and it might explain it. I have simplistically changed the

static inline
for
static noinline

in DECLARE_TRACE(), and have not modified DEFINE_TRACE. Therefore,
some duplicated instances of the function are defined. We should clearly
re-do those tests with your approach of an extern prototype in
DECLARE_TRACE and proto and args arguments added to DEFINE_TRACE, where
the callback would be defined (see the sketch below). I'd be very
interested to see the result. For a limited instrumentation
modification, one could concentrate on kmemtrace instrumentation, given
I've shown it covers enough sites that its performance impact, under
tbench, seems to be consistently perceivable.
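
To spell out the difference between what I measured and what should be
measured -- sketch only, with a made-up tracepoint and argument list:

    #ifdef QUICK_HACK_I_MEASURED
    /* (a) The shortcut I used: DECLARE_TRACE() carries the body, marked
     *     noinline, so every .c file that calls the tracepoint gets its
     *     own static copy of the probe-iteration loop. */
    static noinline void __do_trace_foo(int arg)
    {
            /* probe-iteration loop, duplicated per translation unit */
    }
    #else
    /* (b) The approach to re-test: DECLARE_TRACE() only declares the
     *     prototype, and DEFINE_TRACE(foo, proto, args) emits the single
     *     shared definition in exactly one object file. */
    extern void __do_trace_foo(int arg);
    #endif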

However I have very limited time on my hands, and I won't be able to do
the modification required to test this in the LTTng setup applied to all
the instrumentation. I also don't have the hardware and cpu time to
perform the 10 runs of each you are talking about, given that the 3 runs
already monopolized my development machine for way too long.

Mathieu, who really has to focus back on his ph.d. thesis :/

>> Tracepoints all compiled-out :
>>
>> run 1 :                2091.50
>> run 2 (after reboot) : 2089.50 (baseline)
>> run 3 (after reboot) : 2083.61
>>
>> Dormant tracepoints :
>>
>> inline, no immediate value optimization
>>
>> run 1 :                1990.63
>> run 2 (after reboot) : 2025.38 (3 %)
>> run 3 (after reboot) : 2028.81
>>
>> out-of-line, no immediate value optimization
>>
>> run 1 :                1990.66
>> run 2 (after reboot) : 1990.19 (4.7 %)
>> run 3 (after reboot) : 1977.79
>>
>> inline, immediate value optimization
>>
>> run 1 :                2035.99 (2.5 %)
>> run 2 (after reboot) : 2036.11
>> run 3 (after reboot) : 2035.75
>>
>> inline, immediate value optimization, configuring out kmemtrace tracepoints
>>
>> run 1 :                2048.08 (1.9 %)
>> run 2 (after reboot) : 2055.53
>> run 3 (after reboot) : 2046.49
>>   
>
> So what are you doing here?  Are you doing 3 runs, then comparing he  
> median measurement in each case?
>
> The trouble is that your run to run variations are at least as large as  
> the difference you're trying to detect.  For example in run 1 of  
> "inline, no immediate value optimization" you got 1990.6MB/s throughput,  
> and then runs 2 & 3 both went up to ~2025.  Why?  That's a huge jump.
>
> The "out-of-line, no immediate value optimization" runs 1&2 has the same  
> throughput as run 1 of the previous test, 1990MB/s, while run 3 is a bit  
> worse.  OK, so perhaps its slower.  But why are runs 1&2 more or less  
> identical to inline/run1?
>
> What would happen if you happened to do 10 iterations of these tests?   
> There just seems like too much run to run variation to make 3 runs  
> statistically meaningful.
>
> I'm not picking on you personally, because I had exactly the same  
> problems when trying to benchmark the overhead of pvops.  The  
> reboot/rerun variations were at least as large as the effects I'm trying  
> to measure, and I'm just feeling suspicious of all the results.
>
> I think there's something fundimentally off about about this kind of  
> kernel benchmark methodology.  The results are not stable and are not -  
> I think - reliable.  Unfortunately I don't have enough of a background  
> in statistics to really analyze what's going on here, or how we should  
> change the test/measurement methodology to get results that we can  
> really stand by.
>
> I don't even have a good explanation for why there are such large  
> boot-to-boot variations anyway.  The normal explanation is "cache  
> effects", but what is actually changing here?  The kernel image is  
> identical, loaded into the same physical pages each time, and mapped  
> into the same virtual address.  So the I&D caches and tlb should get  
> exactly the same access patterns for the kernel code itself.  The  
> dynamically allocated memory is going to vary, and have different cache  
> interactions, but is that enough to explain these kinds of variations?   
> If so, we're going to need to do a lot more iterations to see any signal  
> from our actual changes over the noise that "cache effects" are throwing  
> our way...
>
>    J

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/4] tracing: move __DO_TRACE out of line
  2009-04-19 23:40             ` Jeremy Fitzhardinge
@ 2009-04-20 21:47               ` Mathieu Desnoyers
  0 siblings, 0 replies; 37+ messages in thread
From: Mathieu Desnoyers @ 2009-04-20 21:47 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Ingo Molnar, Steven Rostedt, Linux Kernel Mailing List,
	Jeremy Fitzhardinge

* Jeremy Fitzhardinge (jeremy@goop.org) wrote:
> Mathieu Desnoyers wrote:
>> * Jeremy Fitzhardinge (jeremy@goop.org) wrote:
>>   
>>> Ingo Molnar wrote:
>>>     
>>>> I meant to suggest to Jeremy to measure the effect of this   
>>>> out-of-lining, in terms of instruction count in the hotpath.
>>>>         
>>> OK, here's a comparison for trace_sched_switch, comparing inline and 
>>> out  of line tracing functions, with CONFIG_PREEMPT enabled:
>>>
>>> The inline __DO_TRACE version of trace_sched_switch inserts 20   
>>> instructions, assembling to 114 bytes of code in the hot path:
>>>
>>>     
>>
>> [...]
>>
>>   
>>> __do_trace_sched_switch is a fair bit larger, mostly due to function  
>>> preamble frame and reg save/restore, and some unfortunate and   
>>> unnecessary register thrashing (why not keep rdi,rsi,rdx where they   
>>> are?).  But it isn't that much larger than the inline version: 34   
>>> instructions, 118 bytes.  This code will also be shared among all   
>>> instances of the tracepoint (not in this case, because sched_switch 
>>> is  unique, but other tracepoints have multiple users).
>>>
>>>     
>>
>> [...]
>>
>>   
>>> So, conclusion: putting the tracepoint code out of line significantly 
>>>  reduces the hot-path code size at each tracepoint (114 bytes down to 
>>> 31  in this case, 27% the size).  This should reduce the overhead of 
>>> having  tracing configured but not enabled.  The saving won't be as 
>>> large for  tracepoints with fewer arguments or without 
>>> CONFIG_PREEMPT, but I chose  this example because it is realistic and 
>>> undeniably a hot path.  And  when doing pvops tracing, 80 new events 
>>> with hundreds of callsites  around the kernel, this is really going 
>>> to add up.
>>>
>>> The tradeoff is that the actual tracing function is a little larger, 
>>> but  not dramatically so.  I would expect some performance hit when 
>>> the  tracepoint is actually enabled.  This may be mitigated increased 
>>> icache  hits when a tracepoint has multiple sites.
>>>
>>> (BTW, I realized that we don't need to pass &__tracepoint_FOO to   
>>> __do_trace_FOO(), since its always going to be the same; this 
>>> simplifies  the calling convention at the callsite, and it also makes 
>>> void  tracepoints work again.)
>>>
>>>    J
>>>     
>>
>> Yep, keeping "void" working is a niceness I would like to keep. So about
>> this supposed "near-zero function call impact", I decided to take LTTng
>> for a little test. I compare tracing the "core set" of Google
>> tracepoints with the tracepoints inline and out-of line. Here is the
>> result :
>>
>> tbench test
>>
>> kernel : 2.6.30-rc1
>>
>> running on a 8-cores x86_64, localhost server
>>
>> tracepoints inactive :
>>
>> 2051.20 MB/sec
>>
>> "google" tracepoints activated, flight recorder mode (overwrite) tracing
>>
>> inline tracepoints
>>
>> 1704.70 MB/sec (16.9 % slower than baseline)
>>
>> out-of-line tracepoints
>>
>> 1635.14 MB/sec (20.3 % slower than baseline)
>>
>> So the overall tracer impact is 20 % bigger just by making the
>> tracepoints out-of-line. This is going to add up quickly if we add as
>> much function calls as we currently find in the event tracer fast path,
>> but LTTng, OTOH, has been designed to minimize the number of such
>> function calls, and you see a good example of why it's been such an
>> important design goal above.
>>   
>
> Yes, that is a surprising amount.  I fully expect to see some amount of  
> cost associated with it, but 20% is surprising.
>
> However, for me, the performance question is secondary.  The main issue  
> is that I can't use the tracing infrastructure with inline tracepoints  
> because of the #include dependency issue.  I think this may be solvable  
> in the long term by restructuring all the headers to make it work out,  
> but I'm still concerned that the result will be brittle and hard to  
> maintain.

Hrm, I'm not that sure about that. Once we get the thread_info headers
right, we just have to provide the build tests required so the
kernel autobuilders detect any inconsistency due to circular header
addition. If this is not enough, we can also add a comment at the
beginning of thread_info.h and other required headers saying that only
type headers should be included.

>
> Putting the tracing code out of line and making tracepoint.h have  
> trivial #include dependencies is "obviously correct" from a maintenance  
> and correctness point of view, and will remain robust in the face of  
> ongoing changes.  It allows the tracing machinery to be used in the  
> deepest, darkest corners of the kernel without having to worry about new  
> header interactions.

The other way around is to do a tiny, self-contained, "trace-rcu"
implementation specially for tracing. :-D But I really doubt there is
any gain in not using something as simple as the thread_info structure
and the rcu read-side, really.

>
> But I completely understand your concerns about performance.  Existing  
> tracepoints which have no problem with the existing header dependencies  
> are, apparently, faster with the inline tracing code.  And there's no  
> real reason to penalize them.
>
> So the compromise is this: we add (yet another) #ifdef so that a  
> particular set of tracepoints can be emitted with either inline or  
> out-of-line code, by defining two variants of __DO_TRACE: one inline,  
> and one out of line?  trace/events/pvops.h could select the out of line  
> variant, and everyone else could leave it as-is.  I don't think that  
> would add very much additional complexity to tracepoint.h, but it means  
> we can both get the outcomes we need.
>
> (I haven't tried to prototype this, so maybe it all falls apart in the  
> details, but I'll give it a go tomorrow.)
>

Hrm, my opinion is that _this_ would be a worse maintenance burden (two
tracepoint flavors instead of one) than trying to keep already thin
headers as thin as they should be. This might also bring very
interesting corner-cases with __builtin_return_address and friends in
the tracepoint probes.

Mathieu

>> About cache-line usage, I agree that in some cases gcc does not seem
>> intelligent enough to move those code paths away from the fast path.
>> What we would really whant there is -freorder-blocks-and-partition, but
>> I doubt we want this for the whole kernel, as it makes some jumps
>> slightly larger. One thing we should maybe look into is to add some kind
>> of "very unlikely" builtin expect to gcc that would teach it to really
>> put the branch in a cache-cold location, no matter what.
>>   
>
> I wonder if -fno-guess-branch-probabilities would help?  The  
> documentation says that the interaction between __builtin_expect() and  
> its normal branch probability heuristics is complex, and it might be  
> interfereing here.  But I think that in the general case we don't want  
> to either 1) require external branch probability data, or 2) annotate  
> every single branch with an expect, so we want gcc to be trying to guess  
> for us.  The __cold annotation on functions is more useful anyway, I  
> think (so gcc knows that any code path which results in calling a cold  
> function is unlikely, without needing explicit annotations on every  
> conditional).
>
>    J

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2009-04-20 21:47 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-04-17  6:35 [PATCH] tracing WIP patches Jeremy Fitzhardinge
2009-04-17  6:35 ` [PATCH 1/4] tracing: move __DO_TRACE out of line Jeremy Fitzhardinge
2009-04-17 15:46   ` Ingo Molnar
2009-04-17 16:10     ` Mathieu Desnoyers
2009-04-17 16:23       ` Ingo Molnar
2009-04-17 16:47         ` Jeremy Fitzhardinge
2009-04-17 19:31         ` Jeremy Fitzhardinge
2009-04-17 19:46           ` Ingo Molnar
2009-04-17 19:57             ` Steven Rostedt
2009-04-17 19:58             ` Jeremy Fitzhardinge
2009-04-17 20:06               ` Steven Rostedt
2009-04-18  6:53           ` Mathieu Desnoyers
2009-04-18 14:16             ` Steven Rostedt
2009-04-19  3:59               ` Mathieu Desnoyers
2009-04-19 23:38                 ` Jeremy Fitzhardinge
2009-04-20 21:39                   ` Mathieu Desnoyers
2009-04-19 23:40             ` Jeremy Fitzhardinge
2009-04-20 21:47               ` Mathieu Desnoyers
2009-04-17  6:35 ` [PATCH 2/4] x86/pvops: target CREATE_TRACE_POINTS to particular subsystems Jeremy Fitzhardinge
2009-04-17 15:55   ` Steven Rostedt
2009-04-17 16:14     ` Jeremy Fitzhardinge
2009-04-17 16:32       ` Steven Rostedt
2009-04-17 16:48         ` Jeremy Fitzhardinge
2009-04-17 16:57           ` Steven Rostedt
2009-04-17 17:14             ` Jeremy Fitzhardinge
2009-04-17 17:33               ` Steven Rostedt
2009-04-17 18:11                 ` Jeremy Fitzhardinge
2009-04-17  6:35 ` [PATCH 3/4] tracing: pass proto and args to DEFINE_TRACE Jeremy Fitzhardinge
2009-04-17  6:48   ` Christoph Hellwig
2009-04-17  6:58     ` Jeremy Fitzhardinge
2009-04-17  7:05       ` Christoph Hellwig
2009-04-17 12:53         ` Ingo Molnar
2009-04-17 15:21     ` Mathieu Desnoyers
2009-04-17  6:35 ` [PATCH 4/4] tracing: avoid warnings from zero-arg tracepoints Jeremy Fitzhardinge
2009-04-17 15:53   ` Steven Rostedt
2009-04-17 15:53   ` Ingo Molnar
2009-04-17 16:10   ` [tip:tracing/core] " tip-bot for Jeremy Fitzhardinge
