linux-kernel.vger.kernel.org archive mirror
* [patch 00/17] Tracepoints v4 for linux-next
@ 2008-07-15 22:26 Mathieu Desnoyers
  2008-07-15 22:26 ` [patch 01/17] RCU read sched Mathieu Desnoyers
                   ` (18 more replies)
  0 siblings, 19 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2008-07-15 22:26 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Masami Hiramatsu

Hi,

Here is the newest release of the Tracepoints patchset, following the feedback
from Peter Zijlstra. The main change is the creation of include/trace/ as a
placeholder for tracepoint headers.

The patchset applies over linux-next patch-v2.6.26-next-20080715 in this order:

# This is a separate RCU update upon which the tracepoints depend
rcu-read-sched.patch

tracepoints.patch
tracepoints-documentation.patch
tracepoints-samples.patch

lttng-instrumentation-irq.patch
lttng-instrumentation-scheduler.patch
lttng-instrumentation-timer.patch
lttng-instrumentation-kernel.patch

lttng-instrumentation-filemap.patch
lttng-instrumentation-swap.patch
lttng-instrumentation-memory.patch
lttng-instrumentation-page.patch
lttng-instrumentation-hugetlb.patch

lttng-instrumentation-net.patch
lttng-instrumentation-ipv4.patch
lttng-instrumentation-ipv6.patch

ftrace-port-to-tracepoints.patch

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [patch 01/17] RCU read sched
  2008-07-15 22:26 [patch 00/17] Tracepoints v4 for linux-next Mathieu Desnoyers
@ 2008-07-15 22:26 ` Mathieu Desnoyers
  2008-08-01 21:10   ` Paul E. McKenney
  2008-07-15 22:26 ` [patch 02/17] Kernel Tracepoints Mathieu Desnoyers
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2008-07-15 22:26 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Masami Hiramatsu
  Cc: Mathieu Desnoyers, Paul E McKenney

[-- Attachment #1: rcu-read-sched.patch --]
[-- Type: text/plain, Size: 1866 bytes --]

Add rcu_read_lock_sched() and rcu_read_unlock_sched() to rcupdate.h to match the
recently added write-side call_rcu_sched() and rcu_barrier_sched(). They also
match the not-so-recently-added synchronize_sched().

This makes it easier to follow matching uses of the update-side and read-side
primitives. The new read-side primitives will replace the preempt_disable()/
preempt_enable() pairs used with RCU-classic synchronization.
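
As an illustration of the intended pairing, here is a minimal userspace sketch
of a reader using the new primitives. It is only a model: the preemption
counter, struct config, current_config and read_threshold() are hypothetical
stand-ins that are not part of this patch, and a real kernel reader would fetch
the pointer with rcu_dereference().

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel preemption counter (hypothetical). */
static int model_preempt_count;

static void model_preempt_disable(void)
{
	model_preempt_count++;
}

static void model_preempt_enable(void)
{
	assert(model_preempt_count > 0);	/* catches an unbalanced unlock */
	model_preempt_count--;
}

/* Same shape as the definitions this patch adds to rcupdate.h. */
#define rcu_read_lock_sched()	model_preempt_disable()
#define rcu_read_unlock_sched()	model_preempt_enable()

/* Hypothetical RCU-protected data, published by some updater. */
struct config {
	int threshold;
};
static struct config *current_config;

/* Reader: the pointer is only dereferenced inside the read-side section. */
static int read_threshold(void)
{
	struct config *c;
	int val = -1;

	rcu_read_lock_sched();
	c = current_config;	/* rcu_dereference() in kernel code */
	if (c)
		val = c->threshold;
	rcu_read_unlock_sched();
	return val;
}
```

On the update side, the matching step would be synchronize_sched(), or
call_rcu_sched() with rcu_barrier_sched(), before freeing the old structure.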

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Paul E McKenney <paulmck@linux.vnet.ibm.com>
CC: akpm@linux-foundation.org
---
 include/linux/rcupdate.h |   18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

Index: linux-2.6-lttng/include/linux/rcupdate.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/rcupdate.h	2008-07-15 15:28:08.000000000 -0400
+++ linux-2.6-lttng/include/linux/rcupdate.h	2008-07-15 17:38:02.000000000 -0400
@@ -133,6 +133,24 @@ struct rcu_head {
 #define rcu_read_unlock_bh() __rcu_read_unlock_bh()
 
 /**
+ * rcu_read_lock_sched - mark the beginning of an RCU-classic critical section
+ *
+ * Should be used with either
+ * - synchronize_sched()
+ * or
+ * - call_rcu_sched() and rcu_barrier_sched()
+ * on the write-side to ensure proper synchronization.
+ */
+#define rcu_read_lock_sched() preempt_disable()
+
+/*
+ * rcu_read_unlock_sched - mark the end of an RCU-classic critical section
+ *
+ * See rcu_read_lock_sched for more information.
+ */
+#define rcu_read_unlock_sched() preempt_enable()
+
+/**
  * rcu_dereference - fetch an RCU-protected pointer in an
  * RCU read-side critical section.  This pointer may later
  * be safely dereferenced.


* [patch 02/17] Kernel Tracepoints
  2008-07-15 22:26 [patch 00/17] Tracepoints v4 for linux-next Mathieu Desnoyers
  2008-07-15 22:26 ` [patch 01/17] RCU read sched Mathieu Desnoyers
@ 2008-07-15 22:26 ` Mathieu Desnoyers
  2008-07-24 15:08   ` Steven Rostedt
                     ` (2 more replies)
  2008-07-15 22:26 ` [patch 03/17] Tracepoints Documentation Mathieu Desnoyers
                   ` (16 subsequent siblings)
  18 siblings, 3 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2008-07-15 22:26 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Masami Hiramatsu
  Cc: Mathieu Desnoyers, Frank Ch. Eigler, Hideo AOKI, Takashi Nishiie,
	Steven Rostedt, Alexander Viro, Eduard - Gabriel Munteanu

[-- Attachment #1: tracepoints.patch --]
[-- Type: text/plain, Size: 33854 bytes --]

Implementation of kernel tracepoints, inspired by the Linux Kernel Markers.
Allows complete type verification by declaring both the tracing statement inline
functions and the probe registration/unregistration static inline functions
within the same macro, "DEFINE_TRACE". No format string is required. See the
tracepoint Documentation and Samples patches for usage examples.

Taken from the documentation patch :

"A tracepoint placed in code provides a hook to call a function (probe) that you
can provide at runtime. A tracepoint can be "on" (a probe is connected to it) or
"off" (no probe is attached). When a tracepoint is "off" it has no effect,
except for adding a tiny time penalty (checking a condition for a branch) and
space penalty (adding a few bytes for the function call at the end of the
instrumented function and a data structure in a separate section).  When a
tracepoint is "on", the function you provide is called each time the tracepoint
is executed, in the execution context of the caller. When the function provided
ends its execution, it returns to the caller (continuing from the tracepoint
site).

You can put tracepoints at important locations in the code. They are lightweight
hooks that can pass an arbitrary number of parameters, whose prototypes are
described in a tracepoint declaration placed in a header file."
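
The on/off behaviour described in the quoted documentation can be sketched in
plain userspace C as follows. This is a simplified model, not the macro from
this patch: struct tracepoint_model, trace_model(), tracepoint_model_enable()
and count_probe() are hypothetical names, and the RCU protection and linker
sections are left out.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*probe_fn)(int value);

/* Simplified userspace analogue of struct tracepoint (no RCU, no sections). */
struct tracepoint_model {
	const char *name;
	int state;		/* 0 = "off", nonzero = "on" */
	probe_fn *funcs;	/* NULL-terminated probe array, or NULL */
};

static int probe_calls;
static int last_value;

/* Example probe: records how often and with what argument it was called. */
static void count_probe(int value)
{
	probe_calls++;
	last_value = value;
}

/* Turn the tracepoint "on"; the array must hold at least one probe. */
static void tracepoint_model_enable(struct tracepoint_model *tp,
	probe_fn *funcs)
{
	assert(funcs && funcs[0]);
	tp->funcs = funcs;
	tp->state = 1;
}

/* Tracepoint site: one flag test when off, a walk over the probes when on. */
static void trace_model(struct tracepoint_model *tp, int value)
{
	probe_fn *it;

	if (!tp->state || !tp->funcs)
		return;
	for (it = tp->funcs; *it; it++)
		(*it)(value);
}
```

When the tracepoint is off, trace_model() costs a load and a branch. When it
is on, every probe in the array runs in the caller's context and control then
returns to the tracepoint site.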

Addition and removal of tracepoints are synchronized by RCU using the
scheduler (and preempt_disable) as guarantees to find a quiescent state
(this is really RCU "classic"). The update side uses rcu_barrier_sched()
with call_rcu_sched() and the read/execute side uses
rcu_read_lock_sched()/rcu_read_unlock_sched(), which map to
preempt_disable()/preempt_enable().

We make sure the previous array containing probes, which has been scheduled for
deletion by the rcu callback, is indeed freed before we proceed to the next
update. It therefore limits the rate of modification of a single tracepoint to
one update per RCU period. The objective here is to permit fast batch
add/removal of probes on _different_ tracepoints.
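
The update scheme above relies on never modifying a probe array in place. A
minimal userspace sketch of the "add" step, assuming a hypothetical helper
probes_add() (loosely modelled on tracepoint_entry_add_probe() in this patch,
with calloc() standing in for kzalloc()), could look like this:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build a new NULL-terminated array holding the old probes plus `probe`.
 * On success, *funcs points to the new array and the previous array is
 * handed back through *old, so the caller can free it once no reader can
 * still be walking it (the patch defers that free with call_rcu_sched()).
 */
static int probes_add(void ***funcs, void *probe, void ***old)
{
	void **prev = *funcs, **next;
	size_t n = 0;

	assert(probe != NULL);
	if (prev) {
		for (n = 0; prev[n]; n++)
			if (prev[n] == probe)
				return -EEXIST;
	}
	/* + 2: one slot for the new probe, one for the NULL terminator */
	next = calloc(n + 2, sizeof(*next));
	if (!next)
		return -ENOMEM;
	if (prev)
		memcpy(next, prev, n * sizeof(*next));
	next[n] = probe;
	*old = prev;
	*funcs = next;		/* rcu_assign_pointer() in kernel code */
	return 0;
}
```

Because readers may still be iterating over the previous array, the caller
must wait for a quiescent state before freeing it. That wait is what limits
each tracepoint to one update per RCU period.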

Changelog :
- Use #name ":" #proto as string to identify the tracepoint in the
  tracepoint table. This makes sure no type mismatch happens due to
  connection of a probe with the wrong type to a tracepoint declared with
  the same name in a different header.
- Add tracepoint_entry_free_old.
- Change __TO_TRACE to get rid of the 'i' iterator.

Masami Hiramatsu <mhiramat@redhat.com> :
Tested on x86-64.

Performance impact of a tracepoint: same as markers, except that it adds about
70 bytes of instructions in an unlikely branch of each instrumented function
(the for loop, the stack setup and the function call). It currently adds a
memory read, a test and a conditional branch at the instrumentation site (in the
hot path). Immediate values will eventually change this into a load immediate,
test and branch. Removing the memory read makes the i-cache impact smaller
(replacing it with a load immediate removes 3-4 bytes per site on x86_32,
depending on mov prefixes, or 7-8 bytes on x86_64) and also saves the d-cache
hit.

About the performance impact of tracepoints (which is comparable to markers),
even without the immediate values optimization, tests done by Hideo Aoki on ia64
show no regression. His test case was using hackbench on a kernel where
scheduler instrumentation (about 5 events in the scheduler code) was added.


Quoting Hideo Aoki about Markers :

I evaluated overhead of kernel marker using linux-2.6-sched-fixes
git tree, which includes several markers for LTTng, using an ia64
server.

While the immediate trace mark feature isn't implemented on ia64,
there is no major performance regression. So, I think that we 
don't have any issues to propose merging marker point patches 
into Linus's tree from the viewpoint of performance impact.

I prepared two kernels to evaluate. The first one was compiled
without CONFIG_MARKERS. The second one was enabled CONFIG_MARKERS.

I downloaded the original hackbench from the following URL:
http://devresources.linux-foundation.org/craiger/hackbench/src/hackbench.c

I ran hackbench 5 times in each condition and calculated the
average and difference between the kernels.  

    The parameter of hackbench: every 50 from 50 to 800
    The number of CPUs of the server: 2, 4, and 8

Below are the results. As you can see, no major performance
regression was found in any case. Even if the number of processes
increases, the differences between the marker-enabled kernel and the
marker-disabled kernel don't increase. Moreover, if the number of CPUs
increases, the differences don't increase either.

Curiously, marker-enabled kernel is better than marker-disabled
kernel in more than half cases, although I guess it comes from
the difference of memory access pattern.


* 2 CPUs 

Number of | without      | with         | diff     | diff    |
processes | Marker [Sec] | Marker [Sec] |   [Sec]  |   [%]   |
--------------------------------------------------------------
       50 |      4.811   |       4.872  |  +0.061  |  +1.27  |
      100 |      9.854   |      10.309  |  +0.454  |  +4.61  |
       150 |     15.602   |      15.040  |  -0.562  |  -3.60  |
      200 |     20.489   |      20.380  |  -0.109  |  -0.53  |
      250 |     25.798   |      25.652  |  -0.146  |  -0.56  |
      300 |     31.260   |      30.797  |  -0.463  |  -1.48  |
      350 |     36.121   |      35.770  |  -0.351  |  -0.97  |
      400 |     42.288   |      42.102  |  -0.186  |  -0.44  |
       450 |     47.778   |      47.253  |  -0.526  |  -1.10  |
      500 |     51.953   |      52.278  |  +0.325  |  +0.63  |
       550 |     58.401   |      57.700  |  -0.701  |  -1.20  |
      600 |     63.334   |      63.222  |  -0.112  |  -0.18  |
      650 |     68.816   |      68.511  |  -0.306  |  -0.44  |
      700 |     74.667   |      74.088  |  -0.579  |  -0.78  |
      750 |     78.612   |      79.582  |  +0.970  |  +1.23  |
       800 |     85.431   |      85.263  |  -0.168  |  -0.20  |
--------------------------------------------------------------

* 4 CPUs 

Number of | without      | with         | diff     | diff    |
processes | Marker [Sec] | Marker [Sec] |   [Sec]  |   [%]   |
--------------------------------------------------------------
       50 |      2.586   |       2.584  |  -0.003  |  -0.1   |
      100 |      5.254   |       5.283  |  +0.030  |  +0.56  |
      150 |      8.012   |       8.074  |  +0.061  |  +0.76  |
      200 |     11.172   |      11.000  |  -0.172  |  -1.54  |
      250 |     13.917   |      14.036  |  +0.119  |  +0.86  |
      300 |     16.905   |      16.543  |  -0.362  |  -2.14  |
      350 |     19.901   |      20.036  |  +0.135  |  +0.68  |
      400 |     22.908   |      23.094  |  +0.186  |  +0.81  |
      450 |     26.273   |      26.101  |  -0.172  |  -0.66  |
      500 |     29.554   |      29.092  |  -0.461  |  -1.56  |
      550 |     32.377   |      32.274  |  -0.103  |  -0.32  |
      600 |     35.855   |      35.322  |  -0.533  |  -1.49  |
      650 |     39.192   |      38.388  |  -0.804  |  -2.05  |
      700 |     41.744   |      41.719  |  -0.025  |  -0.06  |
      750 |     45.016   |      44.496  |  -0.520  |  -1.16  |
      800 |     48.212   |      47.603  |  -0.609  |  -1.26  |
--------------------------------------------------------------

* 8 CPUs 

Number of | without      | with         | diff     | diff    |
processes | Marker [Sec] | Marker [Sec] |   [Sec]  |   [%]   |
--------------------------------------------------------------
       50 |      2.094   |       2.072  |  -0.022  |  -1.07  |
      100 |      4.162   |       4.273  |  +0.111  |  +2.66  |
      150 |      6.485   |       6.540  |  +0.055  |  +0.84  |
      200 |      8.556   |       8.478  |  -0.078  |  -0.91  |
      250 |     10.458   |      10.258  |  -0.200  |  -1.91  |
      300 |     12.425   |      12.750  |  +0.325  |  +2.62  |
      350 |     14.807   |      14.839  |  +0.032  |  +0.22  |
      400 |     16.801   |      16.959  |  +0.158  |  +0.94  |
      450 |     19.478   |      19.009  |  -0.470  |  -2.41  |
      500 |     21.296   |      21.504  |  +0.208  |  +0.98  |
      550 |     23.842   |      23.979  |  +0.137  |  +0.57  |
      600 |     26.309   |      26.111  |  -0.198  |  -0.75  |
       650 |     28.705   |      28.446  |  -0.259  |  -0.90  |
      700 |     31.233   |      31.394  |  +0.161  |  +0.52  |
      750 |     34.064   |      33.720  |  -0.344  |  -1.01  |
      800 |     36.320   |      36.114  |  -0.206  |  -0.57  |
--------------------------------------------------------------

Best regards,
Hideo


P.S. When I compiled the linux-2.6-sched-fixes tree on ia64, I
had to revert the following git commit since pteval_t is defined
on x86 only.

commit 8686f2b37e7394b51dd6593678cbfd85ecd28c65
Date:   Tue May 6 15:42:40 2008 -0700

    generic, x86, PAT: fix mprotect


Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
CC: 'Peter Zijlstra' <peterz@infradead.org>
CC: "Frank Ch. Eigler" <fche@redhat.com>
CC: 'Ingo Molnar' <mingo@elte.hu>
CC: 'Hideo AOKI' <haoki@redhat.com>
CC: Takashi Nishiie <t-nishiie@np.css.fujitsu.com>
CC: 'Steven Rostedt' <rostedt@goodmis.org>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 include/asm-generic/vmlinux.lds.h |    6 
 include/linux/module.h            |   17 +
 include/linux/tracepoint.h        |  127 ++++++++++
 init/Kconfig                      |    7 
 kernel/Makefile                   |    1 
 kernel/module.c                   |   66 +++++
 kernel/tracepoint.c               |  473 ++++++++++++++++++++++++++++++++++++++
 7 files changed, 695 insertions(+), 2 deletions(-)

Index: linux-2.6-lttng/init/Kconfig
===================================================================
--- linux-2.6-lttng.orig/init/Kconfig	2008-07-15 17:34:42.000000000 -0400
+++ linux-2.6-lttng/init/Kconfig	2008-07-15 17:35:00.000000000 -0400
@@ -782,6 +782,13 @@ config PROFILING
 	  Say Y here to enable the extended profiling support mechanisms used
 	  by profilers such as OProfile.
 
+config TRACEPOINTS
+	bool "Activate tracepoints"
+	default y
+	help
+	  Place an empty function call at each tracepoint site. Can be
+	  dynamically changed for a probe function.
+
 config MARKERS
 	bool "Activate markers"
 	help
Index: linux-2.6-lttng/kernel/Makefile
===================================================================
--- linux-2.6-lttng.orig/kernel/Makefile	2008-07-15 17:34:42.000000000 -0400
+++ linux-2.6-lttng/kernel/Makefile	2008-07-15 17:35:00.000000000 -0400
@@ -77,6 +77,7 @@ obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
 obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
 obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o
 obj-$(CONFIG_MARKERS) += marker.o
+obj-$(CONFIG_TRACEPOINTS) += tracepoint.o
 obj-$(CONFIG_LATENCYTOP) += latencytop.o
 obj-$(CONFIG_FTRACE) += trace/
 obj-$(CONFIG_TRACING) += trace/
Index: linux-2.6-lttng/include/linux/tracepoint.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/linux/tracepoint.h	2008-07-15 17:35:19.000000000 -0400
@@ -0,0 +1,127 @@
+#ifndef _LINUX_TRACEPOINT_H
+#define _LINUX_TRACEPOINT_H
+
+/*
+ * Kernel Tracepoint API.
+ *
+ * See Documentation/tracepoint.txt.
+ *
+ * (C) Copyright 2008 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * Heavily inspired from the Linux Kernel Markers.
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <linux/types.h>
+#include <linux/rcupdate.h>
+
+struct module;
+struct tracepoint;
+
+struct tracepoint {
+	const char *name;		/* Tracepoint name */
+	int state;			/* State. */
+	void **funcs;
+} __attribute__((aligned(8)));
+
+
+#define TPPROTO(args...)	args
+#define TPARGS(args...)		args
+
+#ifdef CONFIG_TRACEPOINTS
+
+/*
+ * it_func[0] is never NULL because there is at least one element in the array
+ * when the array itself is non NULL.
+ */
+#define __DO_TRACE(tp, proto, args)					\
+	do {								\
+		void **it_func;						\
+									\
+		rcu_read_lock_sched();					\
+		it_func = rcu_dereference((tp)->funcs);			\
+		if (it_func) {						\
+			do {						\
+				((void(*)(proto))(*it_func))(args);	\
+			} while (*(++it_func));				\
+		}							\
+		rcu_read_unlock_sched();				\
+	} while (0)
+
+/*
+ * Make sure the alignment of the structure in the __tracepoints section will
+ * not add unwanted padding between the beginning of the section and the
+ * structure. Force alignment to the same alignment as the section start.
+ */
+#define DEFINE_TRACE(name, proto, args)					\
+	static inline void trace_##name(proto)				\
+	{								\
+		static const char __tpstrtab_##name[]			\
+		__attribute__((section("__tracepoints_strings")))	\
+		= #name ":" #proto;					\
+		static struct tracepoint __tracepoint_##name		\
+		__attribute__((section("__tracepoints"), aligned(8))) =	\
+		{ __tpstrtab_##name, 0, NULL };				\
+		if (unlikely(__tracepoint_##name.state))		\
+			__DO_TRACE(&__tracepoint_##name,		\
+				TPPROTO(proto), TPARGS(args));		\
+	}								\
+	static inline int register_trace_##name(void (*probe)(proto))	\
+	{								\
+		return tracepoint_probe_register(#name ":" #proto,	\
+			(void *)probe);					\
+	}								\
+	static inline void unregister_trace_##name(void (*probe)(proto))\
+	{								\
+		tracepoint_probe_unregister(#name ":" #proto,		\
+			(void *)probe);					\
+	}
+
+extern void tracepoint_update_probe_range(struct tracepoint *begin,
+	struct tracepoint *end);
+
+#else /* !CONFIG_TRACEPOINTS */
+#define DEFINE_TRACE(name, proto, args)			\
+	static inline void _do_trace_##name(struct tracepoint *tp, proto) \
+	{ }								\
+	static inline void trace_##name(proto)				\
+	{ }								\
+	static inline int register_trace_##name(void (*probe)(proto))	\
+	{								\
+		return -ENOSYS;						\
+	}								\
+	static inline void unregister_trace_##name(void (*probe)(proto))\
+	{ }
+
+static inline void tracepoint_update_probe_range(struct tracepoint *begin,
+	struct tracepoint *end)
+{ }
+#endif /* CONFIG_TRACEPOINTS */
+
+/*
+ * Connect a probe to a tracepoint.
+ * Internal API, should not be used directly.
+ */
+extern int tracepoint_probe_register(const char *name, void *probe);
+
+/*
+ * Disconnect a probe from a tracepoint.
+ * Internal API, should not be used directly.
+ */
+extern int tracepoint_probe_unregister(const char *name, void *probe);
+
+struct tracepoint_iter {
+	struct module *module;
+	struct tracepoint *tracepoint;
+};
+
+extern void tracepoint_iter_start(struct tracepoint_iter *iter);
+extern void tracepoint_iter_next(struct tracepoint_iter *iter);
+extern void tracepoint_iter_stop(struct tracepoint_iter *iter);
+extern void tracepoint_iter_reset(struct tracepoint_iter *iter);
+extern int tracepoint_get_iter_range(struct tracepoint **tracepoint,
+	struct tracepoint *begin, struct tracepoint *end);
+
+#endif
Index: linux-2.6-lttng/include/asm-generic/vmlinux.lds.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-generic/vmlinux.lds.h	2008-07-15 17:34:42.000000000 -0400
+++ linux-2.6-lttng/include/asm-generic/vmlinux.lds.h	2008-07-15 17:35:00.000000000 -0400
@@ -52,7 +52,10 @@
 	. = ALIGN(8);							\
 	VMLINUX_SYMBOL(__start___markers) = .;				\
 	*(__markers)							\
-	VMLINUX_SYMBOL(__stop___markers) = .;
+	VMLINUX_SYMBOL(__stop___markers) = .;				\
+	VMLINUX_SYMBOL(__start___tracepoints) = .;			\
+	*(__tracepoints)						\
+	VMLINUX_SYMBOL(__stop___tracepoints) = .;
 
 #define RO_DATA(align)							\
 	. = ALIGN((align));						\
@@ -61,6 +64,7 @@
 		*(.rodata) *(.rodata.*)					\
 		*(__vermagic)		/* Kernel version magic */	\
 		*(__markers_strings)	/* Markers: strings */		\
+		*(__tracepoints_strings)/* Tracepoints: strings */	\
 	}								\
 									\
 	.rodata1          : AT(ADDR(.rodata1) - LOAD_OFFSET) {		\
Index: linux-2.6-lttng/kernel/tracepoint.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/kernel/tracepoint.c	2008-07-15 17:35:00.000000000 -0400
@@ -0,0 +1,473 @@
+/*
+ * Copyright (C) 2008 Mathieu Desnoyers
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/types.h>
+#include <linux/jhash.h>
+#include <linux/list.h>
+#include <linux/rcupdate.h>
+#include <linux/tracepoint.h>
+#include <linux/err.h>
+#include <linux/slab.h>
+
+extern struct tracepoint __start___tracepoints[];
+extern struct tracepoint __stop___tracepoints[];
+
+/* Set to 1 to enable tracepoint debug output */
+static const int tracepoint_debug;
+
+/*
+ * tracepoints_mutex nests inside module_mutex. Tracepoints mutex protects the
+ * builtin and module tracepoints and the hash table.
+ */
+static DEFINE_MUTEX(tracepoints_mutex);
+
+/*
+ * Tracepoint hash table, containing the active tracepoints.
+ * Protected by tracepoints_mutex.
+ */
+#define TRACEPOINT_HASH_BITS 6
+#define TRACEPOINT_TABLE_SIZE (1 << TRACEPOINT_HASH_BITS)
+
+/*
+ * Note about RCU :
+ * It is used to delay the free of multiple probes array until a quiescent
+ * state is reached.
+ * Tracepoint entries modifications are protected by the tracepoints_mutex.
+ */
+struct tracepoint_entry {
+	struct hlist_node hlist;
+	void **funcs;
+	int refcount;	/* Number of times armed. 0 if disarmed. */
+	struct rcu_head rcu;
+	void *oldptr;
+	unsigned char rcu_pending:1;
+	char name[0];
+};
+
+static struct hlist_head tracepoint_table[TRACEPOINT_TABLE_SIZE];
+
+static void free_old_closure(struct rcu_head *head)
+{
+	struct tracepoint_entry *entry = container_of(head,
+		struct tracepoint_entry, rcu);
+	kfree(entry->oldptr);
+	/* Make sure we free the data before setting the pending flag to 0 */
+	smp_wmb();
+	entry->rcu_pending = 0;
+}
+
+static void tracepoint_entry_free_old(struct tracepoint_entry *entry, void *old)
+{
+	if (!old)
+		return;
+	entry->oldptr = old;
+	entry->rcu_pending = 1;
+	/* write rcu_pending before calling the RCU callback */
+	smp_wmb();
+	call_rcu_sched(&entry->rcu, free_old_closure);
+}
+
+static void debug_print_probes(struct tracepoint_entry *entry)
+{
+	int i;
+
+	if (!tracepoint_debug)
+		return;
+
+	for (i = 0; entry->funcs[i]; i++)
+		printk(KERN_DEBUG "Probe %d : %p\n", i, entry->funcs[i]);
+}
+
+static void *
+tracepoint_entry_add_probe(struct tracepoint_entry *entry, void *probe)
+{
+	int nr_probes = 0;
+	void **old, **new;
+
+	WARN_ON(!probe);
+
+	debug_print_probes(entry);
+	old = entry->funcs;
+	if (old) {
+		/* (N -> N+1), (N != 0, 1) probes */
+		for (nr_probes = 0; old[nr_probes]; nr_probes++)
+			if (old[nr_probes] == probe)
+				return ERR_PTR(-EEXIST);
+	}
+	/* + 2 : one for new probe, one for NULL func */
+	new = kzalloc((nr_probes + 2) * sizeof(void *), GFP_KERNEL);
+	if (new == NULL)
+		return ERR_PTR(-ENOMEM);
+	if (old)
+		memcpy(new, old, nr_probes * sizeof(void *));
+	new[nr_probes] = probe;
+	entry->refcount = nr_probes + 1;
+	entry->funcs = new;
+	debug_print_probes(entry);
+	return old;
+}
+
+static void *
+tracepoint_entry_remove_probe(struct tracepoint_entry *entry, void *probe)
+{
+	int nr_probes = 0, nr_del = 0, i;
+	void **old, **new;
+
+	old = entry->funcs;
+
+	debug_print_probes(entry);
+	/* (N -> M), (N > 1, M >= 0) probes */
+	for (nr_probes = 0; old[nr_probes]; nr_probes++) {
+		if ((!probe || old[nr_probes] == probe))
+			nr_del++;
+	}
+
+	if (nr_probes - nr_del == 0) {
+		/* N -> 0, (N > 1) */
+		entry->funcs = NULL;
+		entry->refcount = 0;
+		debug_print_probes(entry);
+		return old;
+	} else {
+		int j = 0;
+		/* N -> M, (N > 1, M > 0) */
+		/* + 1 for NULL */
+		new = kzalloc((nr_probes - nr_del + 1)
+			* sizeof(void *), GFP_KERNEL);
+		if (new == NULL)
+			return ERR_PTR(-ENOMEM);
+		for (i = 0; old[i]; i++)
+			if ((probe && old[i] != probe))
+				new[j++] = old[i];
+		entry->refcount = nr_probes - nr_del;
+		entry->funcs = new;
+	}
+	debug_print_probes(entry);
+	return old;
+}
+
+/*
+ * Get tracepoint if the tracepoint is present in the tracepoint hash table.
+ * Must be called with tracepoints_mutex held.
+ * Returns NULL if not present.
+ */
+static struct tracepoint_entry *get_tracepoint(const char *name)
+{
+	struct hlist_head *head;
+	struct hlist_node *node;
+	struct tracepoint_entry *e;
+	u32 hash = jhash(name, strlen(name), 0);
+
+	head = &tracepoint_table[hash & ((1 << TRACEPOINT_HASH_BITS)-1)];
+	hlist_for_each_entry(e, node, head, hlist) {
+		if (!strcmp(name, e->name))
+			return e;
+	}
+	return NULL;
+}
+
+/*
+ * Add the tracepoint to the tracepoint hash table. Must be called with
+ * tracepoints_mutex held.
+ */
+static struct tracepoint_entry *add_tracepoint(const char *name)
+{
+	struct hlist_head *head;
+	struct hlist_node *node;
+	struct tracepoint_entry *e;
+	size_t name_len = strlen(name) + 1;
+	u32 hash = jhash(name, name_len-1, 0);
+
+	head = &tracepoint_table[hash & ((1 << TRACEPOINT_HASH_BITS)-1)];
+	hlist_for_each_entry(e, node, head, hlist) {
+		if (!strcmp(name, e->name)) {
+			printk(KERN_NOTICE
+				"tracepoint %s busy\n", name);
+			return ERR_PTR(-EEXIST);	/* Already there */
+		}
+	}
+	/*
+	 * Using kmalloc here to allocate a variable length element. Could
+	 * cause some memory fragmentation if overused.
+	 */
+	e = kmalloc(sizeof(struct tracepoint_entry) + name_len, GFP_KERNEL);
+	if (!e)
+		return ERR_PTR(-ENOMEM);
+	memcpy(&e->name[0], name, name_len);
+	e->funcs = NULL;
+	e->refcount = 0;
+	e->rcu_pending = 0;
+	hlist_add_head(&e->hlist, head);
+	return e;
+}
+
+/*
+ * Remove the tracepoint from the tracepoint hash table. Must be called with
+ * tracepoints_mutex held.
+ */
+static int remove_tracepoint(const char *name)
+{
+	struct hlist_head *head;
+	struct hlist_node *node;
+	struct tracepoint_entry *e;
+	int found = 0;
+	size_t len = strlen(name) + 1;
+	u32 hash = jhash(name, len-1, 0);
+
+	head = &tracepoint_table[hash & ((1 << TRACEPOINT_HASH_BITS)-1)];
+	hlist_for_each_entry(e, node, head, hlist) {
+		if (!strcmp(name, e->name)) {
+			found = 1;
+			break;
+		}
+	}
+	if (!found)
+		return -ENOENT;
+	if (e->refcount)
+		return -EBUSY;
+	hlist_del(&e->hlist);
+	/* Make sure the call_rcu has been executed */
+	if (e->rcu_pending)
+		rcu_barrier_sched();
+	kfree(e);
+	return 0;
+}
+
+/*
+ * Sets the probe callback corresponding to one tracepoint.
+ */
+static void set_tracepoint(struct tracepoint_entry **entry,
+	struct tracepoint *elem, int active)
+{
+	WARN_ON(strcmp((*entry)->name, elem->name) != 0);
+
+	/*
+	 * rcu_assign_pointer has a smp_wmb() which makes sure that the new
+	 * probe callbacks array is consistent before setting a pointer to it.
+	 * This array is referenced by __DO_TRACE from
+	 * include/linux/tracepoint.h. A matching smp_read_barrier_depends()
+	 * is used.
+	 */
+	rcu_assign_pointer(elem->funcs, (*entry)->funcs);
+	elem->state = active;
+}
+
+/*
+ * Disable a tracepoint and its probe callback.
+ * Note: only waiting an RCU period after setting elem->call to the empty
+ * function ensures that the original callback is not used anymore. This is
+ * ensured by preempt_disable around the call site.
+ */
+static void disable_tracepoint(struct tracepoint *elem)
+{
+	elem->state = 0;
+}
+
+/**
+ * tracepoint_update_probe_range - Update a probe range
+ * @begin: beginning of the range
+ * @end: end of the range
+ *
+ * Updates the probe callback corresponding to a range of tracepoints.
+ */
+void tracepoint_update_probe_range(struct tracepoint *begin,
+	struct tracepoint *end)
+{
+	struct tracepoint *iter;
+	struct tracepoint_entry *mark_entry;
+
+	mutex_lock(&tracepoints_mutex);
+	for (iter = begin; iter < end; iter++) {
+		mark_entry = get_tracepoint(iter->name);
+		if (mark_entry) {
+			set_tracepoint(&mark_entry, iter,
+					!!mark_entry->refcount);
+		} else {
+			disable_tracepoint(iter);
+		}
+	}
+	mutex_unlock(&tracepoints_mutex);
+}
+
+/*
+ * Update probes, removing the faulty probes.
+ */
+static void tracepoint_update_probes(void)
+{
+	/* Core kernel tracepoints */
+	tracepoint_update_probe_range(__start___tracepoints,
+		__stop___tracepoints);
+	/* tracepoints in modules. */
+	module_update_tracepoints();
+}
+
+/**
+ * tracepoint_probe_register -  Connect a probe to a tracepoint
+ * @name: tracepoint name
+ * @probe: probe handler
+ *
+ * Returns 0 if ok, error value on error.
+ * The probe address must at least be aligned on the architecture pointer size.
+ */
+int tracepoint_probe_register(const char *name, void *probe)
+{
+	struct tracepoint_entry *entry;
+	int ret = 0;
+	void *old;
+
+	mutex_lock(&tracepoints_mutex);
+	entry = get_tracepoint(name);
+	if (!entry) {
+		entry = add_tracepoint(name);
+		if (IS_ERR(entry)) {
+			ret = PTR_ERR(entry);
+			goto end;
+		}
+	}
+	/*
+	 * If we detect that a call_rcu is pending for this tracepoint,
+	 * make sure it's executed now.
+	 */
+	if (entry->rcu_pending)
+		rcu_barrier_sched();
+	old = tracepoint_entry_add_probe(entry, probe);
+	if (IS_ERR(old)) {
+		ret = PTR_ERR(old);
+		goto end;
+	}
+	mutex_unlock(&tracepoints_mutex);
+	tracepoint_update_probes();		/* may update entry */
+	mutex_lock(&tracepoints_mutex);
+	entry = get_tracepoint(name);
+	WARN_ON(!entry);
+	tracepoint_entry_free_old(entry, old);
+end:
+	mutex_unlock(&tracepoints_mutex);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(tracepoint_probe_register);
+
+/**
+ * tracepoint_probe_unregister -  Disconnect a probe from a tracepoint
+ * @name: tracepoint name
+ * @probe: probe function pointer
+ *
+ * We do not need to call a synchronize_sched to make sure the probes have
+ * finished running before doing a module unload, because the module unload
+ * itself uses stop_machine(), which ensures that every preempt disabled section
+ * has finished.
+ */
+int tracepoint_probe_unregister(const char *name, void *probe)
+{
+	struct tracepoint_entry *entry;
+	void *old;
+	int ret = -ENOENT;
+
+	mutex_lock(&tracepoints_mutex);
+	entry = get_tracepoint(name);
+	if (!entry)
+		goto end;
+	if (entry->rcu_pending)
+		rcu_barrier_sched();
+	old = tracepoint_entry_remove_probe(entry, probe);
+	mutex_unlock(&tracepoints_mutex);
+	tracepoint_update_probes();		/* may update entry */
+	mutex_lock(&tracepoints_mutex);
+	entry = get_tracepoint(name);
+	if (!entry)
+		goto end;
+	tracepoint_entry_free_old(entry, old);
+	remove_tracepoint(name);	/* Ignore busy error message */
+	ret = 0;
+end:
+	mutex_unlock(&tracepoints_mutex);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(tracepoint_probe_unregister);
+
+/**
+ * tracepoint_get_iter_range - Get a next tracepoint iterator given a range.
+ * @tracepoint: current tracepoints (in), next tracepoint (out)
+ * @begin: beginning of the range
+ * @end: end of the range
+ *
+ * Returns whether a next tracepoint has been found (1) or not (0).
+ * Will return the first tracepoint in the range if the input tracepoint is
+ * NULL.
+ */
+int tracepoint_get_iter_range(struct tracepoint **tracepoint,
+	struct tracepoint *begin, struct tracepoint *end)
+{
+	if (!*tracepoint && begin != end) {
+		*tracepoint = begin;
+		return 1;
+	}
+	if (*tracepoint >= begin && *tracepoint < end)
+		return 1;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(tracepoint_get_iter_range);
+
+static void tracepoint_get_iter(struct tracepoint_iter *iter)
+{
+	int found = 0;
+
+	/* Core kernel tracepoints */
+	if (!iter->module) {
+		found = tracepoint_get_iter_range(&iter->tracepoint,
+				__start___tracepoints, __stop___tracepoints);
+		if (found)
+			goto end;
+	}
+	/* tracepoints in modules. */
+	found = module_get_iter_tracepoints(iter);
+end:
+	if (!found)
+		tracepoint_iter_reset(iter);
+}
+
+void tracepoint_iter_start(struct tracepoint_iter *iter)
+{
+	tracepoint_get_iter(iter);
+}
+EXPORT_SYMBOL_GPL(tracepoint_iter_start);
+
+void tracepoint_iter_next(struct tracepoint_iter *iter)
+{
+	iter->tracepoint++;
+	/*
+	 * iter->tracepoint may be invalid because we blindly incremented it.
+	 * Make sure it is valid by marshalling on the tracepoints, getting the
+	 * tracepoints from following modules if necessary.
+	 */
+	tracepoint_get_iter(iter);
+}
+EXPORT_SYMBOL_GPL(tracepoint_iter_next);
+
+void tracepoint_iter_stop(struct tracepoint_iter *iter)
+{
+}
+EXPORT_SYMBOL_GPL(tracepoint_iter_stop);
+
+void tracepoint_iter_reset(struct tracepoint_iter *iter)
+{
+	iter->module = NULL;
+	iter->tracepoint = NULL;
+}
+EXPORT_SYMBOL_GPL(tracepoint_iter_reset);
Index: linux-2.6-lttng/kernel/module.c
===================================================================
--- linux-2.6-lttng.orig/kernel/module.c	2008-07-15 17:34:42.000000000 -0400
+++ linux-2.6-lttng/kernel/module.c	2008-07-15 17:35:00.000000000 -0400
@@ -47,6 +47,7 @@
 #include <asm/sections.h>
 #include <linux/license.h>
 #include <asm/sections.h>
+#include <linux/tracepoint.h>
 
 #if 0
 #define DEBUGP printk
@@ -1831,6 +1832,8 @@ static struct module *load_module(void _
 #endif
 	unsigned int markersindex;
 	unsigned int markersstringsindex;
+	unsigned int tracepointsindex;
+	unsigned int tracepointsstringsindex;
 	struct module *mod;
 	long err = 0;
 	void *percpu = NULL, *ptr = NULL; /* Stops spurious gcc warning */
@@ -2117,6 +2120,9 @@ static struct module *load_module(void _
 	markersindex = find_sec(hdr, sechdrs, secstrings, "__markers");
  	markersstringsindex = find_sec(hdr, sechdrs, secstrings,
 					"__markers_strings");
+	tracepointsindex = find_sec(hdr, sechdrs, secstrings, "__tracepoints");
+	tracepointsstringsindex = find_sec(hdr, sechdrs, secstrings,
+					"__tracepoints_strings");
 
 	/* Now do relocations. */
 	for (i = 1; i < hdr->e_shnum; i++) {
@@ -2144,6 +2150,12 @@ static struct module *load_module(void _
 	mod->num_markers =
 		sechdrs[markersindex].sh_size / sizeof(*mod->markers);
 #endif
+#ifdef CONFIG_TRACEPOINTS
+	mod->tracepoints = (void *)sechdrs[tracepointsindex].sh_addr;
+	mod->num_tracepoints =
+		sechdrs[tracepointsindex].sh_size / sizeof(*mod->tracepoints);
+#endif
+
 
         /* Find duplicate symbols */
 	err = verify_export_symbols(mod);
@@ -2162,11 +2174,16 @@ static struct module *load_module(void _
 
 	add_kallsyms(mod, sechdrs, symindex, strindex, secstrings);
 
+	if (!mod->taints) {
 #ifdef CONFIG_MARKERS
-	if (!mod->taints)
 		marker_update_probe_range(mod->markers,
 			mod->markers + mod->num_markers);
 #endif
+#ifdef CONFIG_TRACEPOINTS
+		tracepoint_update_probe_range(mod->tracepoints,
+			mod->tracepoints + mod->num_tracepoints);
+#endif
+	}
 	err = module_finalize(hdr, sechdrs, mod);
 	if (err < 0)
 		goto cleanup;
@@ -2717,3 +2734,50 @@ void module_update_markers(void)
 	mutex_unlock(&module_mutex);
 }
 #endif
+
+#ifdef CONFIG_TRACEPOINTS
+void module_update_tracepoints(void)
+{
+	struct module *mod;
+
+	mutex_lock(&module_mutex);
+	list_for_each_entry(mod, &modules, list)
+		if (!mod->taints)
+			tracepoint_update_probe_range(mod->tracepoints,
+				mod->tracepoints + mod->num_tracepoints);
+	mutex_unlock(&module_mutex);
+}
+
+/*
+ * Returns 0 if current not found.
+ * Returns 1 if current found.
+ */
+int module_get_iter_tracepoints(struct tracepoint_iter *iter)
+{
+	struct module *iter_mod;
+	int found = 0;
+
+	mutex_lock(&module_mutex);
+	list_for_each_entry(iter_mod, &modules, list) {
+		if (!iter_mod->taints) {
+			/*
+			 * Sorted module list
+			 */
+			if (iter_mod < iter->module)
+				continue;
+			else if (iter_mod > iter->module)
+				iter->tracepoint = NULL;
+			found = tracepoint_get_iter_range(&iter->tracepoint,
+				iter_mod->tracepoints,
+				iter_mod->tracepoints
+					+ iter_mod->num_tracepoints);
+			if (found) {
+				iter->module = iter_mod;
+				break;
+			}
+		}
+	}
+	mutex_unlock(&module_mutex);
+	return found;
+}
+#endif
Index: linux-2.6-lttng/include/linux/module.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/module.h	2008-07-15 17:34:42.000000000 -0400
+++ linux-2.6-lttng/include/linux/module.h	2008-07-15 17:35:00.000000000 -0400
@@ -16,6 +16,7 @@
 #include <linux/kobject.h>
 #include <linux/moduleparam.h>
 #include <linux/marker.h>
+#include <linux/tracepoint.h>
 #include <asm/local.h>
 
 #include <asm/module.h>
@@ -331,6 +332,10 @@ struct module
 	struct marker *markers;
 	unsigned int num_markers;
 #endif
+#ifdef CONFIG_TRACEPOINTS
+	struct tracepoint *tracepoints;
+	unsigned int num_tracepoints;
+#endif
 
 #ifdef CONFIG_MODULE_UNLOAD
 	/* What modules depend on me? */
@@ -454,6 +459,9 @@ extern void print_modules(void);
 
 extern void module_update_markers(void);
 
+extern void module_update_tracepoints(void);
+extern int module_get_iter_tracepoints(struct tracepoint_iter *iter);
+
 #else /* !CONFIG_MODULES... */
 #define EXPORT_SYMBOL(sym)
 #define EXPORT_SYMBOL_GPL(sym)
@@ -558,6 +566,15 @@ static inline void module_update_markers
 {
 }
 
+static inline void module_update_tracepoints(void)
+{
+}
+
+static inline int module_get_iter_tracepoints(struct tracepoint_iter *iter)
+{
+	return 0;
+}
+
 #endif /* CONFIG_MODULES */
 
 struct device_driver;
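
Note on the iterator above: tracepoint_iter_next() increments the cursor
blindly and relies on tracepoint_get_iter_range() to revalidate it. The
following is a minimal userspace sketch of that contract, not kernel code.
struct tp, tp_get_iter_range() and tp_count_range() are hypothetical names
used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct tracepoint; only the name matters for this sketch. */
struct tp {
	const char *name;
};

/*
 * Same logic as tracepoint_get_iter_range() in the patch: a NULL cursor
 * selects the first element of a non-empty range, a cursor inside
 * [begin, end) is accepted as is, anything else reports "not found".
 */
static int tp_get_iter_range(struct tp **cur, struct tp *begin, struct tp *end)
{
	if (!*cur && begin != end) {
		*cur = begin;
		return 1;
	}
	if (*cur >= begin && *cur < end)
		return 1;
	return 0;
}

/* Count the entries of one section by walking it the way the iterator does. */
static int tp_count_range(struct tp *begin, struct tp *end)
{
	struct tp *cur = NULL;
	int n = 0;

	assert(begin <= end);
	while (tp_get_iter_range(&cur, begin, end)) {
		n++;
		cur++;	/* blind increment, revalidated on the next call */
	}
	return n;
}
```

In the patch, a "not found" result on the core kernel section makes
tracepoint_get_iter() fall through to the module list rather than stop, which
this single-range sketch does not model.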

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

* [patch 03/17] Tracepoints Documentation
  2008-07-15 22:26 [patch 00/17] Tracepoints v4 for linux-next Mathieu Desnoyers
  2008-07-15 22:26 ` [patch 01/17] RCU read sched Mathieu Desnoyers
  2008-07-15 22:26 ` [patch 02/17] Kernel Tracepoints Mathieu Desnoyers
@ 2008-07-15 22:26 ` Mathieu Desnoyers
  2008-07-15 22:26 ` [patch 04/17] Tracepoints Samples Mathieu Desnoyers
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2008-07-15 22:26 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Masami Hiramatsu
  Cc: Mathieu Desnoyers, Frank Ch. Eigler, Hideo AOKI, Takashi Nishiie,
	Steven Rostedt, Eduard - Gabriel Munteanu

[-- Attachment #1: tracepoints-documentation.patch --]
[-- Type: text/plain, Size: 4878 bytes --]

Documentation of tracepoint usage.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Masami Hiramatsu <mhiramat@redhat.com>
CC: 'Peter Zijlstra' <peterz@infradead.org>
CC: "Frank Ch. Eigler" <fche@redhat.com>
CC: 'Ingo Molnar' <mingo@elte.hu>
CC: 'Hideo AOKI' <haoki@redhat.com>
CC: Takashi Nishiie <t-nishiie@np.css.fujitsu.com>
CC: 'Steven Rostedt' <rostedt@goodmis.org>
CC: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
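Note (not part of the patch): the "on"/"off" behaviour described in the
document can be modelled in userspace as follows. This is only a sketch under
simplifying assumptions: struct tp_site, tp_register(), tp_unregister() and
trace_site() are hypothetical names, the probe table is a fixed array, errors
are reported as a plain -1, and there is no RCU or preemption handling as in
the real implementation.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PROBES 4

/* Probe prototype, playing the role of TPPROTO() in this sketch. */
typedef void (*probe_fn)(int firstarg, const char *comm);

/* One tracepoint site: "off" while nr_probes == 0. */
struct tp_site {
	probe_fn probes[MAX_PROBES];
	int nr_probes;
};

/* Connect a probe; returns 0 on success, -1 if full or already connected. */
static int tp_register(struct tp_site *tp, probe_fn fn)
{
	int i;

	for (i = 0; i < tp->nr_probes; i++)
		if (tp->probes[i] == fn)
			return -1;
	if (tp->nr_probes == MAX_PROBES)
		return -1;
	tp->probes[tp->nr_probes++] = fn;
	return 0;
}

/* Disconnect a probe; returns 0 on success, -1 if it was not connected. */
static int tp_unregister(struct tp_site *tp, probe_fn fn)
{
	int i;

	for (i = 0; i < tp->nr_probes; i++) {
		if (tp->probes[i] == fn) {
			/* Fill the hole with the last probe; order is not kept. */
			tp->probes[i] = tp->probes[--tp->nr_probes];
			return 0;
		}
	}
	return -1;
}

/* The instrumented site: one branch when "off", probe calls when "on". */
static void trace_site(struct tp_site *tp, int firstarg, const char *comm)
{
	int i;

	if (tp->nr_probes == 0)
		return;
	for (i = 0; i < tp->nr_probes; i++)
		tp->probes[i](firstarg, comm);
}

/* Example probe: counts hits and remembers the last argument seen. */
static int hits;
static int last_arg;

static void count_probe(int firstarg, const char *comm)
{
	(void)comm;
	hits++;
	last_arg = firstarg;
}
```

Calling trace_site() before tp_register() leaves hits untouched; after
registering count_probe, each call increments it, and after tp_unregister()
the site is "off" again.
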
 Documentation/tracepoints.txt |  101 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 101 insertions(+)

Index: linux-2.6-lttng/Documentation/tracepoints.txt
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/Documentation/tracepoints.txt	2008-07-15 17:39:38.000000000 -0400
@@ -0,0 +1,101 @@
+ 	             Using the Linux Kernel Tracepoints
+
+			    Mathieu Desnoyers
+
+
+This document introduces Linux Kernel Tracepoints and their use. It provides
+examples of how to insert tracepoints in the kernel and connect probe functions
+to them and provides some examples of probe functions.
+
+
+* Purpose of tracepoints
+
+A tracepoint placed in code provides a hook to call a function (probe) that you
+can provide at runtime. A tracepoint can be "on" (a probe is connected to it) or
+"off" (no probe is attached). When a tracepoint is "off" it has no effect,
+except for adding a tiny time penalty (checking a condition for a branch) and
+space penalty (adding a few bytes for the function call at the end of the
+instrumented function and adding a data structure in a separate section).  When a
+tracepoint is "on", the function you provide is called each time the tracepoint
+is executed, in the execution context of the caller. When the function provided
+ends its execution, it returns to the caller (continuing from the tracepoint
+site).
+
+You can put tracepoints at important locations in the code. They are
+lightweight hooks that can pass an arbitrary number of parameters,
+whose prototypes are described in a tracepoint declaration placed in a header
+file.
+
+They can be used for tracing and performance accounting.
+
+
+* Usage
+
+Two elements are required for tracepoints :
+
+- A tracepoint definition, placed in a header file.
+- The tracepoint statement, in C code.
+
+In order to use tracepoints, you should include linux/tracepoint.h.
+
+In include/trace/subsys.h :
+
+#include <linux/tracepoint.h>
+
+DEFINE_TRACE(subsys_eventname,
+	TPPROTO(int firstarg, struct task_struct *p),
+	TPARGS(firstarg, p));
+
+In subsys/file.c (where the tracing statement must be added) :
+
+#include <trace/subsys.h>
+
+void somefct(void)
+{
+	...
+	trace_subsys_eventname(arg, task);
+	...
+}
+
+Where :
+- subsys_eventname is an identifier unique to your event
+    - subsys is the name of your subsystem.
+    - eventname is the name of the event to trace.
+- TPPROTO(int firstarg, struct task_struct *p) is the prototype of the function
+  called by this tracepoint.
+- TPARGS(firstarg, p) are the parameter names, same as found in the prototype.
+
+Connecting a function (probe) to a tracepoint is done by providing a probe
+(function to call) for the specific tracepoint through
+register_trace_subsys_eventname().  Removing a probe is done through
+unregister_trace_subsys_eventname(); it will remove the probe and make sure
+there is no caller left using the probe when it returns. Probe removal is
+preempt-safe because preemption is disabled around the probe call. See the
+"Probe / tracepoint example" section below for a sample probe module.
+
+The tracepoint mechanism supports inserting multiple instances of the same
+tracepoint, but a single definition must be made of a given tracepoint name over
+all the kernel to make sure no type conflict will occur. Name mangling of the
+tracepoints is done using the prototypes to make sure typing is correct.
+Verification of probe type correctness is done at the registration site by the
+compiler. Tracepoints can be put in inline functions, inlined static functions,
+and unrolled loops as well as regular functions.
+
+The naming scheme "subsys_event" is suggested here as a convention intended
+to limit collisions. Tracepoint names are global to the kernel: they are
+considered as being the same whether they are in the core kernel image or in
+modules.
+
+
+* Probe / tracepoint example
+
+See the example provided in samples/tracepoints
+
+Compile them with your kernel.
+
+Run, as root :
+modprobe tracepoint-sample (insmod order is not important)
+modprobe tracepoint-probe-sample
+cat /proc/tracepoint-example (returns an expected error)
+rmmod tracepoint-sample tracepoint-probe-sample
+dmesg

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

* [patch 04/17] Tracepoints Samples
  2008-07-15 22:26 [patch 00/17] Tracepoints v4 for linux-next Mathieu Desnoyers
                   ` (2 preceding siblings ...)
  2008-07-15 22:26 ` [patch 03/17] Tracepoints Documentation Mathieu Desnoyers
@ 2008-07-15 22:26 ` Mathieu Desnoyers
  2008-07-15 22:26 ` [patch 05/17] LTTng instrumentation - irq Mathieu Desnoyers
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2008-07-15 22:26 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Masami Hiramatsu
  Cc: Mathieu Desnoyers, Frank Ch. Eigler, Hideo AOKI, Takashi Nishiie,
	Steven Rostedt, Eduard - Gabriel Munteanu

[-- Attachment #1: tracepoints-samples.patch --]
[-- Type: text/plain, Size: 7644 bytes --]

Tracepoint example code under samples/.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Masami Hiramatsu <mhiramat@redhat.com>
CC: 'Peter Zijlstra' <peterz@infradead.org>
CC: "Frank Ch. Eigler" <fche@redhat.com>
CC: 'Ingo Molnar' <mingo@elte.hu>
CC: 'Hideo AOKI' <haoki@redhat.com>
CC: Takashi Nishiie <t-nishiie@np.css.fujitsu.com>
CC: 'Steven Rostedt' <rostedt@goodmis.org>
CC: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 samples/Kconfig                                |    6 ++
 samples/Makefile                               |    2 
 samples/tracepoints/Makefile                   |    6 ++
 samples/tracepoints/tp-samples-trace.h         |   13 +++++
 samples/tracepoints/tracepoint-probe-sample.c  |   55 +++++++++++++++++++++++++
 samples/tracepoints/tracepoint-probe-sample2.c |   42 +++++++++++++++++++
 samples/tracepoints/tracepoint-sample.c        |   53 ++++++++++++++++++++++++
 7 files changed, 176 insertions(+), 1 deletion(-)

Index: linux-2.6-lttng/samples/Kconfig
===================================================================
--- linux-2.6-lttng.orig/samples/Kconfig	2008-07-07 09:59:25.000000000 -0400
+++ linux-2.6-lttng/samples/Kconfig	2008-07-07 10:00:07.000000000 -0400
@@ -13,6 +13,12 @@ config SAMPLE_MARKERS
 	help
 	  This build markers example modules.
 
+config SAMPLE_TRACEPOINTS
+	tristate "Build tracepoints examples -- loadable modules only"
+	depends on TRACEPOINTS && m
+	help
+	  This builds tracepoints example modules.
+
 config SAMPLE_KOBJECT
 	tristate "Build kobject examples"
 	help
Index: linux-2.6-lttng/samples/tracepoints/Makefile
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/samples/tracepoints/Makefile	2008-07-07 10:53:09.000000000 -0400
@@ -0,0 +1,6 @@
+# builds the tracepoint example kernel modules;
+# then to use one (as root):  insmod <module_name.ko>
+
+obj-$(CONFIG_SAMPLE_TRACEPOINTS) += tracepoint-sample.o
+obj-$(CONFIG_SAMPLE_TRACEPOINTS) += tracepoint-probe-sample.o
+obj-$(CONFIG_SAMPLE_TRACEPOINTS) += tracepoint-probe-sample2.o
Index: linux-2.6-lttng/samples/tracepoints/tracepoint-probe-sample.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/samples/tracepoints/tracepoint-probe-sample.c	2008-07-07 10:50:26.000000000 -0400
@@ -0,0 +1,55 @@
+/*
+ * tracepoint-probe-sample.c
+ *
+ * sample tracepoint probes.
+ */
+
+#include <linux/module.h>
+#include <linux/file.h>
+#include <linux/dcache.h>
+#include "tp-samples-trace.h"
+
+/*
+ * Here the caller only guarantees locking for struct file and struct inode.
+ * Locking must therefore be done in the probe to use the dentry.
+ */
+static void probe_subsys_event(struct inode *inode, struct file *file)
+{
+	path_get(&file->f_path);
+	dget(file->f_path.dentry);
+	printk(KERN_INFO "Event is encountered with filename %s\n",
+		file->f_path.dentry->d_name.name);
+	dput(file->f_path.dentry);
+	path_put(&file->f_path);
+}
+
+static void probe_subsys_eventb(void)
+{
+	printk(KERN_INFO "Event B is encountered\n");
+}
+
+int __init tp_sample_trace_init(void)
+{
+	int ret;
+
+	ret = register_trace_subsys_event(probe_subsys_event);
+	WARN_ON(ret);
+	ret = register_trace_subsys_eventb(probe_subsys_eventb);
+	WARN_ON(ret);
+
+	return 0;
+}
+
+module_init(tp_sample_trace_init);
+
+void __exit tp_sample_trace_exit(void)
+{
+	unregister_trace_subsys_eventb(probe_subsys_eventb);
+	unregister_trace_subsys_event(probe_subsys_event);
+}
+
+module_exit(tp_sample_trace_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Mathieu Desnoyers");
+MODULE_DESCRIPTION("Tracepoint Probes Samples");
Index: linux-2.6-lttng/samples/tracepoints/tracepoint-sample.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/samples/tracepoints/tracepoint-sample.c	2008-07-07 10:04:16.000000000 -0400
@@ -0,0 +1,53 @@
+/* tracepoint-sample.c
+ *
+ * Executes a tracepoint when /proc/tracepoint-example is opened.
+ *
+ * (C) Copyright 2007 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/proc_fs.h>
+#include "tp-samples-trace.h"
+
+struct proc_dir_entry *pentry_example;
+
+static int my_open(struct inode *inode, struct file *file)
+{
+	int i;
+
+	trace_subsys_event(inode, file);
+	for (i = 0; i < 10; i++)
+		trace_subsys_eventb();
+	return -EPERM;
+}
+
+static struct file_operations mark_ops = {
+	.open = my_open,
+};
+
+static int example_init(void)
+{
+	printk(KERN_ALERT "example init\n");
+	pentry_example = proc_create("tracepoint-example", 0444, NULL,
+		&mark_ops);
+	if (!pentry_example)
+		return -EPERM;
+	return 0;
+}
+
+static void example_exit(void)
+{
+	printk(KERN_ALERT "example exit\n");
+	remove_proc_entry("tracepoint-example", NULL);
+}
+
+module_init(example_init)
+module_exit(example_exit)
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Mathieu Desnoyers");
+MODULE_DESCRIPTION("Tracepoint example");
Index: linux-2.6-lttng/samples/tracepoints/tp-samples-trace.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/samples/tracepoints/tp-samples-trace.h	2008-07-07 10:06:26.000000000 -0400
@@ -0,0 +1,13 @@
+#ifndef _TP_SAMPLES_TRACE_H
+#define _TP_SAMPLES_TRACE_H
+
+#include <linux/proc_fs.h>	/* for struct inode and struct file */
+#include <linux/tracepoint.h>
+
+DEFINE_TRACE(subsys_event,
+	TPPROTO(struct inode *inode, struct file *file),
+	TPARGS(inode, file));
+DEFINE_TRACE(subsys_eventb,
+	TPPROTO(void),
+	TPARGS());
+#endif
Index: linux-2.6-lttng/samples/Makefile
===================================================================
--- linux-2.6-lttng.orig/samples/Makefile	2008-07-07 10:44:50.000000000 -0400
+++ linux-2.6-lttng/samples/Makefile	2008-07-07 10:44:59.000000000 -0400
@@ -1,3 +1,3 @@
 # Makefile for Linux samples code
 
-obj-$(CONFIG_SAMPLES)	+= markers/ kobject/ kprobes/
+obj-$(CONFIG_SAMPLES)	+= markers/ kobject/ kprobes/ tracepoints/
Index: linux-2.6-lttng/samples/tracepoints/tracepoint-probe-sample2.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/samples/tracepoints/tracepoint-probe-sample2.c	2008-07-07 10:56:09.000000000 -0400
@@ -0,0 +1,42 @@
+/*
+ * tracepoint-probe-sample2.c
+ *
+ * 2nd sample tracepoint probes.
+ */
+
+#include <linux/module.h>
+#include <linux/fs.h>
+#include "tp-samples-trace.h"
+
+/*
+ * Here the caller only guarantees locking for struct file and struct inode.
+ * Locking must therefore be done in the probe to use the dentry.
+ */
+static void probe_subsys_event(struct inode *inode, struct file *file)
+{
+	printk(KERN_INFO "Event is encountered with inode number %lu\n",
+		inode->i_ino);
+}
+
+int __init tp_sample_trace_init(void)
+{
+	int ret;
+
+	ret = register_trace_subsys_event(probe_subsys_event);
+	WARN_ON(ret);
+
+	return 0;
+}
+
+module_init(tp_sample_trace_init);
+
+void __exit tp_sample_trace_exit(void)
+{
+	unregister_trace_subsys_event(probe_subsys_event);
+}
+
+module_exit(tp_sample_trace_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Mathieu Desnoyers");
+MODULE_DESCRIPTION("Tracepoint Probes Samples");

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

* [patch 05/17] LTTng instrumentation - irq
  2008-07-15 22:26 [patch 00/17] Tracepoints v4 for linux-next Mathieu Desnoyers
                   ` (3 preceding siblings ...)
  2008-07-15 22:26 ` [patch 04/17] Tracepoints Samples Mathieu Desnoyers
@ 2008-07-15 22:26 ` Mathieu Desnoyers
  2008-07-15 22:26 ` [patch 06/17] LTTng instrumentation - scheduler Mathieu Desnoyers
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2008-07-15 22:26 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Masami Hiramatsu
  Cc: Mathieu Desnoyers, Thomas Gleixner, Russell King,
	Frank Ch. Eigler, Hideo AOKI, Takashi Nishiie, Steven Rostedt,
	Eduard - Gabriel Munteanu

[-- Attachment #1: lttng-instrumentation-irq.patch --]
[-- Type: text/plain, Size: 5242 bytes --]

Instrumentation of IRQ related events : irq, softirq, tasklet entry and exit and
softirq "raise" events.

It allows tracers to perform latency analysis on those various types of
interrupts and to measure max/min/avg interrupt durations. It helps detect
driver or hardware problems which cause an ISR to take ages to execute. This
has been observed with faulty hardware causing an mmio read to take a few
milliseconds.

Those tracepoints are used by LTTng.

About the performance impact of tracepoints (which is comparable to markers),
even without immediate values optimizations, tests done by Hideo Aoki on ia64
show no regression. His test case was using hackbench on a kernel where
scheduler instrumentation (about 5 events in scheduler code) was added.
See the "Tracepoints" patch header for performance result detail.

Changelog:
- Add retval as irq_exit argument.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Russell King <rmk+lkml@arm.linux.org.uk>
CC: Masami Hiramatsu <mhiramat@redhat.com>
CC: 'Peter Zijlstra' <peterz@infradead.org>
CC: "Frank Ch. Eigler" <fche@redhat.com>
CC: 'Ingo Molnar' <mingo@elte.hu>
CC: 'Hideo AOKI' <haoki@redhat.com>
CC: Takashi Nishiie <t-nishiie@np.css.fujitsu.com>
CC: 'Steven Rostedt' <rostedt@goodmis.org>
CC: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
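Note (not part of the patch): the max/min/avg latency analysis mentioned in
the changelog could be built from a pair of probes on irq_entry and irq_exit.
The following is a userspace sketch of the accounting only. struct irq_stats
and the function names are hypothetical, timestamps are passed in explicitly,
and nesting and per-CPU state are ignored. Real probes would have to match the
TPPROTO() signatures in include/trace/irq.h and read a clock themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Running duration statistics for one interrupt source. */
struct irq_stats {
	uint64_t entry_ns;	/* timestamp taken by the entry probe */
	uint64_t min_ns;
	uint64_t max_ns;
	uint64_t total_ns;
	uint64_t count;
};

static void stats_init(struct irq_stats *s)
{
	s->entry_ns = 0;
	s->min_ns = UINT64_MAX;
	s->max_ns = 0;
	s->total_ns = 0;
	s->count = 0;
}

/* What a probe connected to irq_entry would do: remember the start time. */
static void probe_entry(struct irq_stats *s, uint64_t now_ns)
{
	s->entry_ns = now_ns;
}

/* What a probe connected to irq_exit would do: fold in one duration. */
static void probe_exit(struct irq_stats *s, uint64_t now_ns)
{
	uint64_t d;

	assert(now_ns >= s->entry_ns);
	d = now_ns - s->entry_ns;
	if (d < s->min_ns)
		s->min_ns = d;
	if (d > s->max_ns)
		s->max_ns = d;
	s->total_ns += d;
	s->count++;
}

/* Average duration so far, or 0 when nothing has been recorded. */
static uint64_t stats_avg_ns(const struct irq_stats *s)
{
	return s->count ? s->total_ns / s->count : 0;
}
```

A handler that occasionally takes milliseconds instead of microseconds shows
up as a max_ns far above stats_avg_ns(), which is the kind of outlier the
changelog refers to.
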
 include/trace/irq.h |   36 ++++++++++++++++++++++++++++++++++++
 kernel/irq/handle.c |    6 ++++++
 kernel/softirq.c    |    8 ++++++++
 3 files changed, 50 insertions(+)

Index: linux-2.6-lttng/kernel/irq/handle.c
===================================================================
--- linux-2.6-lttng.orig/kernel/irq/handle.c	2008-07-15 14:49:14.000000000 -0400
+++ linux-2.6-lttng/kernel/irq/handle.c	2008-07-15 15:12:48.000000000 -0400
@@ -15,6 +15,7 @@
 #include <linux/random.h>
 #include <linux/interrupt.h>
 #include <linux/kernel_stat.h>
+#include <trace/irq.h>
 
 #include "internals.h"
 
@@ -130,6 +131,9 @@ irqreturn_t handle_IRQ_event(unsigned in
 {
 	irqreturn_t ret, retval = IRQ_NONE;
 	unsigned int status = 0;
+	struct pt_regs *regs = get_irq_regs();
+
+	trace_irq_entry(irq, regs);
 
 	handle_dynamic_tick(action);
 
@@ -148,6 +152,8 @@ irqreturn_t handle_IRQ_event(unsigned in
 		add_interrupt_randomness(irq);
 	local_irq_disable();
 
+	trace_irq_exit(retval);
+
 	return retval;
 }
 
Index: linux-2.6-lttng/kernel/softirq.c
===================================================================
--- linux-2.6-lttng.orig/kernel/softirq.c	2008-07-15 14:51:50.000000000 -0400
+++ linux-2.6-lttng/kernel/softirq.c	2008-07-15 15:12:48.000000000 -0400
@@ -21,6 +21,7 @@
 #include <linux/rcupdate.h>
 #include <linux/smp.h>
 #include <linux/tick.h>
+#include <trace/irq.h>
 
 #include <asm/irq.h>
 /*
@@ -205,7 +206,9 @@ restart:
 
 	do {
 		if (pending & 1) {
+			trace_irq_softirq_entry(h, softirq_vec);
 			h->action(h);
+			trace_irq_softirq_exit(h, softirq_vec);
 			rcu_bh_qsctr_inc(cpu);
 		}
 		h++;
@@ -297,6 +300,7 @@ void irq_exit(void)
  */
 inline void raise_softirq_irqoff(unsigned int nr)
 {
+	trace_irq_softirq_raise(nr);
 	__raise_softirq_irqoff(nr);
 
 	/*
@@ -394,7 +398,9 @@ static void tasklet_action(struct softir
 			if (!atomic_read(&t->count)) {
 				if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
 					BUG();
+				trace_irq_tasklet_low_entry(t);
 				t->func(t->data);
+				trace_irq_tasklet_low_exit(t);
 				tasklet_unlock(t);
 				continue;
 			}
@@ -429,7 +435,9 @@ static void tasklet_hi_action(struct sof
 			if (!atomic_read(&t->count)) {
 				if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
 					BUG();
+				trace_irq_tasklet_high_entry(t);
 				t->func(t->data);
+				trace_irq_tasklet_high_exit(t);
 				tasklet_unlock(t);
 				continue;
 			}
Index: linux-2.6-lttng/include/trace/irq.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/trace/irq.h	2008-07-15 15:12:48.000000000 -0400
@@ -0,0 +1,36 @@
+#ifndef _TRACE_IRQ_H
+#define _TRACE_IRQ_H
+
+#include <linux/kdebug.h>
+#include <linux/interrupt.h>
+#include <linux/tracepoint.h>
+
+DEFINE_TRACE(irq_entry,
+	TPPROTO(unsigned int id, struct pt_regs *regs),
+	TPARGS(id, regs));
+DEFINE_TRACE(irq_exit,
+	TPPROTO(irqreturn_t retval),
+	TPARGS(retval));
+DEFINE_TRACE(irq_softirq_entry,
+	TPPROTO(struct softirq_action *h, struct softirq_action *softirq_vec),
+	TPARGS(h, softirq_vec));
+DEFINE_TRACE(irq_softirq_exit,
+	TPPROTO(struct softirq_action *h, struct softirq_action *softirq_vec),
+	TPARGS(h, softirq_vec));
+DEFINE_TRACE(irq_softirq_raise,
+	TPPROTO(unsigned int nr),
+	TPARGS(nr));
+DEFINE_TRACE(irq_tasklet_low_entry,
+	TPPROTO(struct tasklet_struct *t),
+	TPARGS(t));
+DEFINE_TRACE(irq_tasklet_low_exit,
+	TPPROTO(struct tasklet_struct *t),
+	TPARGS(t));
+DEFINE_TRACE(irq_tasklet_high_entry,
+	TPPROTO(struct tasklet_struct *t),
+	TPARGS(t));
+DEFINE_TRACE(irq_tasklet_high_exit,
+	TPPROTO(struct tasklet_struct *t),
+	TPARGS(t));
+
+#endif

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

* [patch 06/17] LTTng instrumentation - scheduler
  2008-07-15 22:26 [patch 00/17] Tracepoints v4 for linux-next Mathieu Desnoyers
                   ` (4 preceding siblings ...)
  2008-07-15 22:26 ` [patch 05/17] LTTng instrumentation - irq Mathieu Desnoyers
@ 2008-07-15 22:26 ` Mathieu Desnoyers
  2008-07-16  8:30   ` Peter Zijlstra
  2008-07-15 22:26 ` [patch 07/17] LTTng instrumentation - timer Mathieu Desnoyers
                   ` (12 subsequent siblings)
  18 siblings, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2008-07-15 22:26 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Masami Hiramatsu
  Cc: Mathieu Desnoyers, Steven Rostedt, Thomas Gleixner,
	Frank Ch. Eigler, Hideo AOKI, Takashi Nishiie,
	Eduard - Gabriel Munteanu

[-- Attachment #1: lttng-instrumentation-scheduler.patch --]
[-- Type: text/plain, Size: 9072 bytes --]

Instrument the scheduler activity (sched_switch, migration, wakeups, wait for a
task, signal delivery) and process/thread creation/destruction (fork, exit,
kthread stop). Kthread creation is not instrumented in this patch because it is
architecture dependent. This allows connecting tracers such as ftrace, which
detects scheduling latencies and good/bad scheduler decisions. Tools like LTTng
can export this scheduler information along with instrumentation of the rest of
the kernel activity to perform post-mortem analysis on the scheduler activity.

About the performance impact of tracepoints (which is comparable to markers),
even without immediate values optimizations, tests done by Hideo Aoki on ia64
show no regression. His test case was using hackbench on a kernel where
scheduler instrumentation (about 5 events in scheduler code) was added.
See the "Tracepoints" patch header for performance result detail.

Changelog :
- Change instrumentation location and parameter to match ftrace instrumentation,
  previously done with kernel markers.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: 'Peter Zijlstra' <peterz@infradead.org>
CC: 'Steven Rostedt' <rostedt@goodmis.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Masami Hiramatsu <mhiramat@redhat.com>
CC: "Frank Ch. Eigler" <fche@redhat.com>
CC: 'Ingo Molnar' <mingo@elte.hu>
CC: 'Hideo AOKI' <haoki@redhat.com>
CC: Takashi Nishiie <t-nishiie@np.css.fujitsu.com>
CC: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 include/trace/sched.h |   45 +++++++++++++++++++++++++++++++++++++++++++++
 kernel/exit.c         |    6 ++++++
 kernel/fork.c         |    3 +++
 kernel/kthread.c      |    5 +++++
 kernel/sched.c        |   17 ++++++-----------
 kernel/signal.c       |    3 +++
 6 files changed, 68 insertions(+), 11 deletions(-)

Index: linux-2.6-lttng/kernel/kthread.c
===================================================================
--- linux-2.6-lttng.orig/kernel/kthread.c	2008-07-15 14:51:49.000000000 -0400
+++ linux-2.6-lttng/kernel/kthread.c	2008-07-15 15:12:54.000000000 -0400
@@ -13,6 +13,7 @@
 #include <linux/file.h>
 #include <linux/module.h>
 #include <linux/mutex.h>
+#include <trace/sched.h>
 
 #define KTHREAD_NICE_LEVEL (-5)
 
@@ -187,6 +188,8 @@ int kthread_stop(struct task_struct *k)
 	/* It could exit after stop_info.k set, but before wake_up_process. */
 	get_task_struct(k);
 
+	trace_sched_kthread_stop(k);
+
 	/* Must init completion *before* thread sees kthread_stop_info.k */
 	init_completion(&kthread_stop_info.done);
 	smp_wmb();
@@ -202,6 +205,8 @@ int kthread_stop(struct task_struct *k)
 	ret = kthread_stop_info.err;
 	mutex_unlock(&kthread_stop_lock);
 
+	trace_sched_kthread_stop_ret(ret);
+
 	return ret;
 }
 EXPORT_SYMBOL(kthread_stop);
Index: linux-2.6-lttng/kernel/sched.c
===================================================================
--- linux-2.6-lttng.orig/kernel/sched.c	2008-07-15 14:51:50.000000000 -0400
+++ linux-2.6-lttng/kernel/sched.c	2008-07-15 15:13:49.000000000 -0400
@@ -71,6 +71,7 @@
 #include <linux/debugfs.h>
 #include <linux/ctype.h>
 #include <linux/ftrace.h>
+#include <trace/sched.h>
 
 #include <asm/tlb.h>
 #include <asm/irq_regs.h>
@@ -1987,6 +1988,7 @@ void wait_task_inactive(struct task_stru
 		 * just go back and repeat.
 		 */
 		rq = task_rq_lock(p, &flags);
+		trace_sched_wait_task(rq, p);
 		running = task_running(rq, p);
 		on_rq = p->se.on_rq;
 		task_rq_unlock(rq, &flags);
@@ -2337,9 +2339,7 @@ out_activate:
 	success = 1;
 
 out_running:
-	trace_mark(kernel_sched_wakeup,
-		"pid %d state %ld ## rq %p task %p rq->curr %p",
-		p->pid, p->state, rq, p, rq->curr);
+	trace_sched_wakeup(rq, p);
 	check_preempt_curr(rq, p);
 
 	p->state = TASK_RUNNING;
@@ -2472,9 +2472,7 @@ void wake_up_new_task(struct task_struct
 		p->sched_class->task_new(rq, p);
 		inc_nr_running(rq);
 	}
-	trace_mark(kernel_sched_wakeup_new,
-		"pid %d state %ld ## rq %p task %p rq->curr %p",
-		p->pid, p->state, rq, p, rq->curr);
+	trace_sched_wakeup_new(rq, p);
 	check_preempt_curr(rq, p);
 #ifdef CONFIG_SMP
 	if (p->sched_class->task_wake_up)
@@ -2647,11 +2645,7 @@ context_switch(struct rq *rq, struct tas
 	struct mm_struct *mm, *oldmm;
 
 	prepare_task_switch(rq, prev, next);
-	trace_mark(kernel_sched_schedule,
-		"prev_pid %d next_pid %d prev_state %ld "
-		"## rq %p prev %p next %p",
-		prev->pid, next->pid, prev->state,
-		rq, prev, next);
+	trace_sched_switch(rq, prev, next);
 	mm = next->mm;
 	oldmm = prev->active_mm;
 	/*
@@ -2884,6 +2878,7 @@ static void sched_migrate_task(struct ta
 	    || unlikely(cpu_is_offline(dest_cpu)))
 		goto out;
 
+	trace_sched_migrate_task(rq, p, dest_cpu);
 	/* force the process onto the specified CPU */
 	if (migrate_task(p, dest_cpu, &req)) {
 		/* Need to wait for migration thread (might exit: take ref). */
Index: linux-2.6-lttng/kernel/exit.c
===================================================================
--- linux-2.6-lttng.orig/kernel/exit.c	2008-07-15 14:51:49.000000000 -0400
+++ linux-2.6-lttng/kernel/exit.c	2008-07-15 15:12:54.000000000 -0400
@@ -46,6 +46,7 @@
 #include <linux/resource.h>
 #include <linux/blkdev.h>
 #include <linux/task_io_accounting_ops.h>
+#include <trace/sched.h>
 
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
@@ -149,6 +150,7 @@ static void __exit_signal(struct task_st
 
 static void delayed_put_task_struct(struct rcu_head *rhp)
 {
+	trace_sched_process_free(container_of(rhp, struct task_struct, rcu));
 	put_task_struct(container_of(rhp, struct task_struct, rcu));
 }
 
@@ -1040,6 +1042,8 @@ NORET_TYPE void do_exit(long code)
 
 	if (group_dead)
 		acct_process();
+	trace_sched_process_exit(tsk);
+
 	exit_sem(tsk);
 	exit_files(tsk);
 	exit_fs(tsk);
@@ -1524,6 +1528,8 @@ static long do_wait(enum pid_type type, 
 	struct task_struct *tsk;
 	int flag, retval;
 
+	trace_sched_process_wait(pid);
+
 	add_wait_queue(&current->signal->wait_chldexit,&wait);
 repeat:
 	/* If there is nothing that can match our critier just get out */
Index: linux-2.6-lttng/kernel/fork.c
===================================================================
--- linux-2.6-lttng.orig/kernel/fork.c	2008-07-15 14:51:49.000000000 -0400
+++ linux-2.6-lttng/kernel/fork.c	2008-07-15 15:14:23.000000000 -0400
@@ -56,6 +56,7 @@
 #include <linux/proc_fs.h>
 #include <linux/blkdev.h>
 #include <linux/magic.h>
+#include <trace/sched.h>
 
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
@@ -1362,6 +1363,8 @@ long do_fork(unsigned long clone_flags,
 	if (!IS_ERR(p)) {
 		struct completion vfork;
 
+		trace_sched_process_fork(current, p);
+
 		nr = task_pid_vnr(p);
 
 		if (clone_flags & CLONE_PARENT_SETTID)
Index: linux-2.6-lttng/kernel/signal.c
===================================================================
--- linux-2.6-lttng.orig/kernel/signal.c	2008-07-15 14:49:14.000000000 -0400
+++ linux-2.6-lttng/kernel/signal.c	2008-07-15 15:12:54.000000000 -0400
@@ -26,6 +26,7 @@
 #include <linux/freezer.h>
 #include <linux/pid_namespace.h>
 #include <linux/nsproxy.h>
+#include <trace/sched.h>
 
 #include <asm/param.h>
 #include <asm/uaccess.h>
@@ -807,6 +808,8 @@ static int send_signal(int sig, struct s
 	struct sigpending *pending;
 	struct sigqueue *q;
 
+	trace_sched_signal_send(sig, t);
+
 	assert_spin_locked(&t->sighand->siglock);
 	if (!prepare_signal(sig, t))
 		return 0;
Index: linux-2.6-lttng/include/trace/sched.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/trace/sched.h	2008-07-15 15:12:54.000000000 -0400
@@ -0,0 +1,45 @@
+#ifndef _TRACE_SCHED_H
+#define _TRACE_SCHED_H
+
+#include <linux/sched.h>
+#include <linux/tracepoint.h>
+
+DEFINE_TRACE(sched_kthread_stop,
+	TPPROTO(struct task_struct *t),
+	TPARGS(t));
+DEFINE_TRACE(sched_kthread_stop_ret,
+	TPPROTO(int ret),
+	TPARGS(ret));
+DEFINE_TRACE(sched_wait_task,
+	TPPROTO(struct rq *rq, struct task_struct *p),
+	TPARGS(rq, p));
+DEFINE_TRACE(sched_wakeup,
+	TPPROTO(struct rq *rq, struct task_struct *p),
+	TPARGS(rq, p));
+DEFINE_TRACE(sched_wakeup_new,
+	TPPROTO(struct rq *rq, struct task_struct *p),
+	TPARGS(rq, p));
+DEFINE_TRACE(sched_switch,
+	TPPROTO(struct rq *rq, struct task_struct *prev,
+		struct task_struct *next),
+	TPARGS(rq, prev, next));
+DEFINE_TRACE(sched_migrate_task,
+	TPPROTO(struct rq *rq, struct task_struct *p, int dest_cpu),
+	TPARGS(rq, p, dest_cpu));
+DEFINE_TRACE(sched_process_free,
+	TPPROTO(struct task_struct *p),
+	TPARGS(p));
+DEFINE_TRACE(sched_process_exit,
+	TPPROTO(struct task_struct *p),
+	TPARGS(p));
+DEFINE_TRACE(sched_process_wait,
+	TPPROTO(struct pid *pid),
+	TPARGS(pid));
+DEFINE_TRACE(sched_process_fork,
+	TPPROTO(struct task_struct *parent, struct task_struct *child),
+	TPARGS(parent, child));
+DEFINE_TRACE(sched_signal_send,
+	TPPROTO(int sig, struct task_struct *p),
+	TPARGS(sig, p));
+
+#endif

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [patch 07/17] LTTng instrumentation - timer
  2008-07-15 22:26 [patch 00/17] Tracepoints v4 for linux-next Mathieu Desnoyers
                   ` (5 preceding siblings ...)
  2008-07-15 22:26 ` [patch 06/17] LTTng instrumentation - scheduler Mathieu Desnoyers
@ 2008-07-15 22:26 ` Mathieu Desnoyers
  2008-07-16  8:34   ` Peter Zijlstra
  2008-07-15 22:26 ` [patch 08/17] LTTng instrumentation - kernel Mathieu Desnoyers
                   ` (11 subsequent siblings)
  18 siblings, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2008-07-15 22:26 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Masami Hiramatsu
  Cc: Mathieu Desnoyers, David S. Miller, Frank Ch. Eigler, Hideo AOKI,
	Takashi Nishiie, Steven Rostedt, Eduard - Gabriel Munteanu

[-- Attachment #1: lttng-instrumentation-timer.patch --]
[-- Type: text/plain, Size: 4614 bytes --]

Instrument timer activity (timer set, expired, current time updates) to keep
information about the "real time" flow within the kernel. It can be used by a
trace analysis tool to synchronize information coming from various sources, e.g.
to merge traces with system logs.
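
For readers unfamiliar with the mechanism, here is a simplified userspace
sketch of what a DEFINE_TRACE()-style declaration gives a call site such as
process_timeout(): a trace_<name>() function that only tests a pointer while
no probe is attached, plus a way to attach one. This is not the kernel
implementation. DEFINE_TRACE_SKETCH, struct task_sketch, the single probe
slot and the -1 return value are all assumptions made for illustration; the
real macros come from linux/tracepoint.h in the "Kernel Tracepoints" patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct task_struct. */
struct task_sketch {
	int pid;
};

/*
 * Hypothetical, simplified stand-in for DEFINE_TRACE(): a single probe
 * slot and no locking or RCU. It only illustrates the shape of the API.
 */
#define DEFINE_TRACE_SKETCH(name, type, arg)				\
	static void (*probe_slot_##name)(type arg);			\
	static inline void trace_##name(type arg)			\
	{								\
		if (probe_slot_##name)					\
			probe_slot_##name(arg);				\
	}								\
	static inline int register_trace_##name(void (*probe)(type arg)) \
	{								\
		if (probe_slot_##name)					\
			return -1;					\
		probe_slot_##name = probe;				\
		return 0;						\
	}								\
	static inline void unregister_trace_##name(void)		\
	{								\
		probe_slot_##name = NULL;				\
	}

DEFINE_TRACE_SKETCH(timer_timeout, struct task_sketch *, p)

static int timeouts_seen;	/* how many times the probe ran */
static int last_pid;		/* pid seen by the most recent probe call */

/* Example probe: record which task timed out. */
static void probe_timer_timeout(struct task_sketch *p)
{
	timeouts_seen++;
	last_pid = p->pid;
}

/* Same ordering as process_timeout() in this patch: trace, then act. */
static void process_timeout_sketch(struct task_sketch *task)
{
	trace_timer_timeout(task);
	/* the kernel would call wake_up_process(task) here */
}
```

With no probe registered, process_timeout_sketch() does nothing beyond the
pointer test; after register_trace_timer_timeout(probe_timer_timeout), each
call runs the probe once.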

Those tracepoints are used by LTTng.

About the performance impact of tracepoints (which is comparable to markers),
even without immediate values optimizations, tests done by Hideo Aoki on ia64
show no regression. His test case was using hackbench on a kernel where
scheduler instrumentation (about 5 events in the scheduler code) was added.
See the "Tracepoints" patch header for performance result detail.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: 'Ingo Molnar' <mingo@elte.hu>
CC: "David S. Miller" <davem@davemloft.net>
CC: Masami Hiramatsu <mhiramat@redhat.com>
CC: 'Peter Zijlstra' <peterz@infradead.org>
CC: "Frank Ch. Eigler" <fche@redhat.com>
CC: 'Hideo AOKI' <haoki@redhat.com>
CC: Takashi Nishiie <t-nishiie@np.css.fujitsu.com>
CC: 'Steven Rostedt' <rostedt@goodmis.org>
CC: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 include/trace/timer.h |   24 ++++++++++++++++++++++++
 kernel/itimer.c       |    5 +++++
 kernel/timer.c        |    8 +++++++-
 3 files changed, 36 insertions(+), 1 deletion(-)

Index: linux-2.6-lttng/kernel/itimer.c
===================================================================
--- linux-2.6-lttng.orig/kernel/itimer.c	2008-07-15 14:49:14.000000000 -0400
+++ linux-2.6-lttng/kernel/itimer.c	2008-07-15 15:14:28.000000000 -0400
@@ -12,6 +12,7 @@
 #include <linux/time.h>
 #include <linux/posix-timers.h>
 #include <linux/hrtimer.h>
+#include <trace/timer.h>
 
 #include <asm/uaccess.h>
 
@@ -132,6 +133,8 @@ enum hrtimer_restart it_real_fn(struct h
 	struct signal_struct *sig =
 		container_of(timer, struct signal_struct, real_timer);
 
+	trace_timer_itimer_expired(sig);
+
 	kill_pid_info(SIGALRM, SEND_SIG_PRIV, sig->leader_pid);
 
 	return HRTIMER_NORESTART;
@@ -157,6 +160,8 @@ int do_setitimer(int which, struct itime
 	    !timeval_valid(&value->it_interval))
 		return -EINVAL;
 
+	trace_timer_itimer_set(which, value);
+
 	switch (which) {
 	case ITIMER_REAL:
 again:
Index: linux-2.6-lttng/kernel/timer.c
===================================================================
--- linux-2.6-lttng.orig/kernel/timer.c	2008-07-15 14:51:50.000000000 -0400
+++ linux-2.6-lttng/kernel/timer.c	2008-07-15 15:14:28.000000000 -0400
@@ -37,12 +37,14 @@
 #include <linux/delay.h>
 #include <linux/tick.h>
 #include <linux/kallsyms.h>
+#include <trace/timer.h>
 
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
 #include <asm/div64.h>
 #include <asm/timex.h>
 #include <asm/io.h>
+#include <asm/irq_regs.h>
 
 u64 jiffies_64 __cacheline_aligned_in_smp = INITIAL_JIFFIES;
 
@@ -288,6 +290,7 @@ static void internal_add_timer(struct tv
 		i = (expires >> (TVR_BITS + 3 * TVN_BITS)) & TVN_MASK;
 		vec = base->tv5.vec + i;
 	}
+	trace_timer_set(timer);
 	/*
 	 * Timers are FIFO:
 	 */
@@ -1066,6 +1069,7 @@ void do_timer(unsigned long ticks)
 {
 	jiffies_64 += ticks;
 	update_times(ticks);
+	trace_timer_update_time(&xtime, &wall_to_monotonic);
 }
 
 #ifdef __ARCH_WANT_SYS_ALARM
@@ -1147,7 +1151,9 @@ asmlinkage long sys_getegid(void)
 
 static void process_timeout(unsigned long __data)
 {
-	wake_up_process((struct task_struct *)__data);
+	struct task_struct *task = (struct task_struct *)__data;
+	trace_timer_timeout(task);
+	wake_up_process(task);
 }
 
 /**
Index: linux-2.6-lttng/include/trace/timer.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/trace/timer.h	2008-07-15 15:14:28.000000000 -0400
@@ -0,0 +1,24 @@
+#ifndef _TRACE_TIMER_H
+#define _TRACE_TIMER_H
+
+#include <linux/tracepoint.h>
+
+DEFINE_TRACE(timer_itimer_expired,
+	TPPROTO(struct signal_struct *sig),
+	TPARGS(sig));
+DEFINE_TRACE(timer_itimer_set,
+	TPPROTO(int which, struct itimerval *value),
+	TPARGS(which, value));
+DEFINE_TRACE(timer_set,
+	TPPROTO(struct timer_list *timer),
+	TPARGS(timer));
+/*
+ * xtime_lock is taken when the timer_update_time tracepoint is reached.
+ */
+DEFINE_TRACE(timer_update_time,
+	TPPROTO(struct timespec *_xtime, struct timespec *_wall_to_monotonic),
+	TPARGS(_xtime, _wall_to_monotonic));
+DEFINE_TRACE(timer_timeout,
+	TPPROTO(struct task_struct *p),
+	TPARGS(p));
+#endif

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [patch 08/17] LTTng instrumentation - kernel
  2008-07-15 22:26 [patch 00/17] Tracepoints v4 for linux-next Mathieu Desnoyers
                   ` (6 preceding siblings ...)
  2008-07-15 22:26 ` [patch 07/17] LTTng instrumentation - timer Mathieu Desnoyers
@ 2008-07-15 22:26 ` Mathieu Desnoyers
  2008-07-24 13:57   ` Steven Rostedt
  2008-07-15 22:26 ` [patch 09/17] LTTng instrumentation - filemap Mathieu Desnoyers
                   ` (10 subsequent siblings)
  18 siblings, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2008-07-15 22:26 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Masami Hiramatsu
  Cc: Mathieu Desnoyers, Frank Ch. Eigler, Hideo AOKI, Takashi Nishiie,
	Steven Rostedt, Eduard - Gabriel Munteanu

[-- Attachment #1: lttng-instrumentation-kernel.patch --]
[-- Type: text/plain, Size: 4147 bytes --]

Instrument the core kernel: module load/free and printk events. This helps the
tracer keep track of module-related events and export valuable printk
information into the traces.

Those tracepoints are used by LTTng.

About the performance impact of tracepoints (which is comparable to markers),
even without immediate values optimizations, tests done by Hideo Aoki on ia64
show no regression. His test case was using hackbench on a kernel where
scheduler instrumentation (about 5 events in the scheduler code) was added.
See the "Tracepoints" patch header for performance result detail.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Masami Hiramatsu <mhiramat@redhat.com>
CC: 'Peter Zijlstra' <peterz@infradead.org>
CC: "Frank Ch. Eigler" <fche@redhat.com>
CC: 'Ingo Molnar' <mingo@elte.hu>
CC: 'Hideo AOKI' <haoki@redhat.com>
CC: Takashi Nishiie <t-nishiie@np.css.fujitsu.com>
CC: 'Steven Rostedt' <rostedt@goodmis.org>
CC: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 include/trace/kernel.h |   19 +++++++++++++++++++
 kernel/module.c        |    5 +++++
 kernel/printk.c        |    6 ++++++
 3 files changed, 30 insertions(+)

Index: linux-2.6-lttng/kernel/printk.c
===================================================================
--- linux-2.6-lttng.orig/kernel/printk.c	2008-07-15 14:51:50.000000000 -0400
+++ linux-2.6-lttng/kernel/printk.c	2008-07-15 15:14:31.000000000 -0400
@@ -32,6 +32,7 @@
 #include <linux/security.h>
 #include <linux/bootmem.h>
 #include <linux/syscalls.h>
+#include <trace/kernel.h>
 
 #include <asm/uaccess.h>
 
@@ -59,6 +60,7 @@ int console_printk[4] = {
 	MINIMUM_CONSOLE_LOGLEVEL,	/* minimum_console_loglevel */
 	DEFAULT_CONSOLE_LOGLEVEL,	/* default_console_loglevel */
 };
+EXPORT_SYMBOL_GPL(console_printk);
 
 /*
  * Low level drivers may need that to know if they can schedule in
@@ -601,6 +603,7 @@ asmlinkage int printk(const char *fmt, .
 	int r;
 
 	va_start(args, fmt);
+	trace_kernel_printk(__builtin_return_address(0));
 	r = vprintk(fmt, args);
 	va_end(args);
 
@@ -677,6 +680,9 @@ asmlinkage int vprintk(const char *fmt, 
 	raw_local_irq_save(flags);
 	this_cpu = smp_processor_id();
 
+	trace_kernel_vprintk(__builtin_return_address(0),
+		printk_buf, printed_len);
+
 	/*
 	 * Ouch, printk recursed into itself!
 	 */
Index: linux-2.6-lttng/kernel/module.c
===================================================================
--- linux-2.6-lttng.orig/kernel/module.c	2008-07-15 15:12:09.000000000 -0400
+++ linux-2.6-lttng/kernel/module.c	2008-07-15 15:14:31.000000000 -0400
@@ -48,6 +48,7 @@
 #include <linux/license.h>
 #include <asm/sections.h>
 #include <linux/tracepoint.h>
+#include <trace/kernel.h>
 
 #if 0
 #define DEBUGP printk
@@ -1429,6 +1430,8 @@ static int __unlink_module(void *_mod)
 /* Free a module, remove from lists, etc (must hold module_mutex). */
 static void free_module(struct module *mod)
 {
+	trace_kernel_module_free(mod);
+
 	/* Delete from various lists */
 	stop_machine(__unlink_module, mod, NULL);
 	remove_notes_attrs(mod);
@@ -2244,6 +2247,8 @@ static struct module *load_module(void _
 	/* Get rid of temporary copy */
 	vfree(hdr);
 
+	trace_kernel_module_load(mod);
+
 	/* Done! */
 	return mod;
 
Index: linux-2.6-lttng/include/trace/kernel.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/trace/kernel.h	2008-07-15 15:14:31.000000000 -0400
@@ -0,0 +1,19 @@
+#ifndef _TRACE_KERNEL_H
+#define _TRACE_KERNEL_H
+
+#include <linux/tracepoint.h>
+
+DEFINE_TRACE(kernel_printk,
+	TPPROTO(void *retaddr),
+	TPARGS(retaddr));
+DEFINE_TRACE(kernel_vprintk,
+	TPPROTO(void *retaddr, char *buf, int len),
+	TPARGS(retaddr, buf, len));
+DEFINE_TRACE(kernel_module_free,
+	TPPROTO(struct module *mod),
+	TPARGS(mod));
+DEFINE_TRACE(kernel_module_load,
+	TPPROTO(struct module *mod),
+	TPARGS(mod));
+
+#endif

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [patch 09/17] LTTng instrumentation - filemap
  2008-07-15 22:26 [patch 00/17] Tracepoints v4 for linux-next Mathieu Desnoyers
                   ` (7 preceding siblings ...)
  2008-07-15 22:26 ` [patch 08/17] LTTng instrumentation - kernel Mathieu Desnoyers
@ 2008-07-15 22:26 ` Mathieu Desnoyers
  2008-07-16  8:35   ` Peter Zijlstra
  2008-07-17  6:25   ` Nick Piggin
  2008-07-15 22:26 ` [patch 10/17] LTTng instrumentation - swap Mathieu Desnoyers
                   ` (9 subsequent siblings)
  18 siblings, 2 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2008-07-15 22:26 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Masami Hiramatsu
  Cc: Mathieu Desnoyers, linux-mm, Dave Hansen, Frank Ch. Eigler,
	Hideo AOKI, Takashi Nishiie, Steven Rostedt,
	Eduard - Gabriel Munteanu

[-- Attachment #1: lttng-instrumentation-filemap.patch --]
[-- Type: text/plain, Size: 2609 bytes --]

Instrumentation of waits caused by memory accesses on mmap regions.

Those tracepoints are used by LTTng.

About the performance impact of tracepoints (which is comparable to markers),
even without immediate values optimizations, tests done by Hideo Aoki on ia64
show no regression. His test case was using hackbench on a kernel where
scheduler instrumentation (about 5 events in the scheduler code) was added.
See the "Tracepoints" patch header for performance result detail.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: linux-mm@kvack.org
CC: Dave Hansen <haveblue@us.ibm.com>
CC: Masami Hiramatsu <mhiramat@redhat.com>
CC: 'Peter Zijlstra' <peterz@infradead.org>
CC: "Frank Ch. Eigler" <fche@redhat.com>
CC: 'Ingo Molnar' <mingo@elte.hu>
CC: 'Hideo AOKI' <haoki@redhat.com>
CC: Takashi Nishiie <t-nishiie@np.css.fujitsu.com>
CC: 'Steven Rostedt' <rostedt@goodmis.org>
CC: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 include/trace/filemap.h |   13 +++++++++++++
 mm/filemap.c            |    3 +++
 2 files changed, 16 insertions(+)

Index: linux-2.6-lttng/mm/filemap.c
===================================================================
--- linux-2.6-lttng.orig/mm/filemap.c	2008-07-15 14:51:50.000000000 -0400
+++ linux-2.6-lttng/mm/filemap.c	2008-07-15 15:14:46.000000000 -0400
@@ -33,6 +33,7 @@
 #include <linux/cpuset.h>
 #include <linux/hardirq.h> /* for BUG_ON(!in_atomic()) only */
 #include <linux/memcontrol.h>
+#include <trace/filemap.h>
 #include "internal.h"
 
 /*
@@ -541,9 +542,11 @@ void wait_on_page_bit(struct page *page,
 {
 	DEFINE_WAIT_BIT(wait, &page->flags, bit_nr);
 
+	trace_filemap_wait_start(page, bit_nr);
 	if (test_bit(bit_nr, &page->flags))
 		__wait_on_bit(page_waitqueue(page), &wait, sync_page,
 							TASK_UNINTERRUPTIBLE);
+	trace_filemap_wait_end(page, bit_nr);
 }
 EXPORT_SYMBOL(wait_on_page_bit);
 
Index: linux-2.6-lttng/include/trace/filemap.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/trace/filemap.h	2008-07-15 15:14:46.000000000 -0400
@@ -0,0 +1,13 @@
+#ifndef _TRACE_FILEMAP_H
+#define _TRACE_FILEMAP_H
+
+#include <linux/tracepoint.h>
+
+DEFINE_TRACE(filemap_wait_start,
+	TPPROTO(struct page *page, int bit_nr),
+	TPARGS(page, bit_nr));
+DEFINE_TRACE(filemap_wait_end,
+	TPPROTO(struct page *page, int bit_nr),
+	TPARGS(page, bit_nr));
+
+#endif

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [patch 10/17] LTTng instrumentation - swap
  2008-07-15 22:26 [patch 00/17] Tracepoints v4 for linux-next Mathieu Desnoyers
                   ` (8 preceding siblings ...)
  2008-07-15 22:26 ` [patch 09/17] LTTng instrumentation - filemap Mathieu Desnoyers
@ 2008-07-15 22:26 ` Mathieu Desnoyers
  2008-07-16  8:39   ` Peter Zijlstra
  2008-07-15 22:26 ` [patch 11/17] LTTng instrumentation - memory page faults Mathieu Desnoyers
                   ` (8 subsequent siblings)
  18 siblings, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2008-07-15 22:26 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Masami Hiramatsu
  Cc: Mathieu Desnoyers, linux-mm, Dave Hansen, Frank Ch. Eigler,
	Hideo AOKI, Takashi Nishiie, Steven Rostedt,
	Eduard - Gabriel Munteanu

[-- Attachment #1: lttng-instrumentation-swap.patch --]
[-- Type: text/plain, Size: 4579 bytes --]

Instrument waits caused by swap activity. Also instrument swapon/swapoff
events to keep track of active swap partitions.

Those tracepoints are used by LTTng.

About the performance impact of tracepoints (which is comparable to markers),
even without immediate values optimizations, tests done by Hideo Aoki on ia64
show no regression. His test case was using hackbench on a kernel where
scheduler instrumentation (about 5 events in the scheduler code) was added.
See the "Tracepoints" patch header for performance result detail.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: linux-mm@kvack.org
CC: Dave Hansen <haveblue@us.ibm.com>
CC: Masami Hiramatsu <mhiramat@redhat.com>
CC: 'Peter Zijlstra' <peterz@infradead.org>
CC: "Frank Ch. Eigler" <fche@redhat.com>
CC: 'Ingo Molnar' <mingo@elte.hu>
CC: 'Hideo AOKI' <haoki@redhat.com>
CC: Takashi Nishiie <t-nishiie@np.css.fujitsu.com>
CC: 'Steven Rostedt' <rostedt@goodmis.org>
CC: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 include/trace/swap.h |   20 ++++++++++++++++++++
 mm/memory.c          |    2 ++
 mm/page_io.c         |    2 ++
 mm/swapfile.c        |    4 ++++
 4 files changed, 28 insertions(+)

Index: linux-2.6-lttng/mm/memory.c
===================================================================
--- linux-2.6-lttng.orig/mm/memory.c	2008-07-15 13:54:46.000000000 -0400
+++ linux-2.6-lttng/mm/memory.c	2008-07-15 14:02:54.000000000 -0400
@@ -51,6 +51,7 @@
 #include <linux/init.h>
 #include <linux/writeback.h>
 #include <linux/memcontrol.h>
+#include <trace/swap.h>
 
 #include <asm/pgalloc.h>
 #include <asm/uaccess.h>
@@ -2213,6 +2214,7 @@ static int do_swap_page(struct mm_struct
 		/* Had to read the page from swap area: Major fault */
 		ret = VM_FAULT_MAJOR;
 		count_vm_event(PGMAJFAULT);
+		trace_swap_in(page, entry);
 	}
 
 	if (mem_cgroup_charge(page, mm, GFP_KERNEL)) {
Index: linux-2.6-lttng/mm/page_io.c
===================================================================
--- linux-2.6-lttng.orig/mm/page_io.c	2008-07-15 13:54:46.000000000 -0400
+++ linux-2.6-lttng/mm/page_io.c	2008-07-15 14:02:54.000000000 -0400
@@ -17,6 +17,7 @@
 #include <linux/bio.h>
 #include <linux/swapops.h>
 #include <linux/writeback.h>
+#include <trace/swap.h>
 #include <asm/pgtable.h>
 
 static struct bio *get_swap_bio(gfp_t gfp_flags, pgoff_t index,
@@ -114,6 +115,7 @@ int swap_writepage(struct page *page, st
 		rw |= (1 << BIO_RW_SYNC);
 	count_vm_event(PSWPOUT);
 	set_page_writeback(page);
+	trace_swap_out(page);
 	unlock_page(page);
 	submit_bio(rw, bio);
 out:
Index: linux-2.6-lttng/mm/swapfile.c
===================================================================
--- linux-2.6-lttng.orig/mm/swapfile.c	2008-07-15 13:54:46.000000000 -0400
+++ linux-2.6-lttng/mm/swapfile.c	2008-07-15 14:02:54.000000000 -0400
@@ -32,6 +32,7 @@
 #include <asm/pgtable.h>
 #include <asm/tlbflush.h>
 #include <linux/swapops.h>
+#include <trace/swap.h>
 
 DEFINE_SPINLOCK(swap_lock);
 unsigned int nr_swapfiles;
@@ -1310,6 +1311,7 @@ asmlinkage long sys_swapoff(const char _
 	swap_map = p->swap_map;
 	p->swap_map = NULL;
 	p->flags = 0;
+	trace_swap_file_close(swap_file);
 	spin_unlock(&swap_lock);
 	mutex_unlock(&swapon_mutex);
 	vfree(swap_map);
@@ -1695,6 +1697,7 @@ asmlinkage long sys_swapon(const char __
 	} else {
 		swap_info[prev].next = p - swap_info;
 	}
+	trace_swap_file_open(swap_file, name);
 	spin_unlock(&swap_lock);
 	mutex_unlock(&swapon_mutex);
 	error = 0;
@@ -1796,6 +1799,7 @@ get_swap_info_struct(unsigned type)
 {
 	return &swap_info[type];
 }
+EXPORT_SYMBOL_GPL(get_swap_info_struct);
 
 /*
  * swap_lock prevents swap_map being freed. Don't grab an extra
Index: linux-2.6-lttng/include/trace/swap.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/trace/swap.h	2008-07-15 14:02:54.000000000 -0400
@@ -0,0 +1,20 @@
+#ifndef _TRACE_SWAP_H
+#define _TRACE_SWAP_H
+
+#include <linux/swap.h>
+#include <linux/tracepoint.h>
+
+DEFINE_TRACE(swap_in,
+	TPPROTO(struct page *page, swp_entry_t entry),
+	TPARGS(page, entry));
+DEFINE_TRACE(swap_out,
+	TPPROTO(struct page *page),
+	TPARGS(page));
+DEFINE_TRACE(swap_file_open,
+	TPPROTO(struct file *file, char *filename),
+	TPARGS(file, filename));
+DEFINE_TRACE(swap_file_close,
+	TPPROTO(struct file *file),
+	TPARGS(file));
+
+#endif

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [patch 11/17] LTTng instrumentation - memory page faults
  2008-07-15 22:26 [patch 00/17] Tracepoints v4 for linux-next Mathieu Desnoyers
                   ` (9 preceding siblings ...)
  2008-07-15 22:26 ` [patch 10/17] LTTng instrumentation - swap Mathieu Desnoyers
@ 2008-07-15 22:26 ` Mathieu Desnoyers
  2008-07-15 22:26 ` [patch 12/17] LTTng instrumentation - page Mathieu Desnoyers
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2008-07-15 22:26 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Masami Hiramatsu
  Cc: Mathieu Desnoyers, Andi Kleen, linux-mm, Dave Hansen,
	Frank Ch. Eigler, Hideo AOKI, Takashi Nishiie, Steven Rostedt,
	Eduard - Gabriel Munteanu

[-- Attachment #1: lttng-instrumentation-memory.patch --]
[-- Type: text/plain, Size: 3673 bytes --]

Instrument the page fault entry and exit. Useful to detect delays caused by page
faults and bad memory usage patterns.
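
As an illustration of how a tracer might use the entry/exit pair to detect
such delays, here is a minimal userspace sketch. It is an assumption-laden
example, not LTTng code: struct fault_stats and both probe functions are
hypothetical, and the timestamp is passed in explicitly, whereas real probes
would receive the arguments declared in include/trace/memory.h and read a
clock themselves.

```c
#include <stdint.h>

/* Hypothetical per-task bookkeeping kept by the tracer. */
struct fault_stats {
	uint64_t entry_ns;	/* timestamp saved by the entry probe */
	uint64_t total_ns;	/* accumulated time spent in faults */
	uint64_t max_ns;	/* longest single fault seen so far */
	unsigned long count;	/* number of completed faults */
};

/* Would be attached to memory_handle_fault_entry. */
static void probe_fault_entry(struct fault_stats *s, uint64_t now_ns)
{
	s->entry_ns = now_ns;
}

/* Would be attached to memory_handle_fault_exit. */
static void probe_fault_exit(struct fault_stats *s, uint64_t now_ns)
{
	uint64_t delta = now_ns - s->entry_ns;

	s->total_ns += delta;
	if (delta > s->max_ns)
		s->max_ns = delta;
	s->count++;
}
```

Because the patch routes every return path of handle_mm_fault() through the
"end" label, each entry event is matched by exactly one exit event, which is
what makes this kind of pairing workable.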

Those tracepoints are used by LTTng.

About the performance impact of tracepoints (which is comparable to markers),
even without immediate values optimizations, tests done by Hideo Aoki on ia64
show no regression. His test case was using hackbench on a kernel where
scheduler instrumentation (about 5 events in the scheduler code) was added.
See the "Tracepoints" patch header for performance result detail.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andi Kleen <andi-suse@firstfloor.org>
CC: linux-mm@kvack.org
CC: Dave Hansen <haveblue@us.ibm.com>
CC: Masami Hiramatsu <mhiramat@redhat.com>
CC: 'Peter Zijlstra' <peterz@infradead.org>
CC: "Frank Ch. Eigler" <fche@redhat.com>
CC: 'Ingo Molnar' <mingo@elte.hu>
CC: 'Hideo AOKI' <haoki@redhat.com>
CC: Takashi Nishiie <t-nishiie@np.css.fujitsu.com>
CC: 'Steven Rostedt' <rostedt@goodmis.org>
CC: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 include/trace/memory.h |   14 ++++++++++++++
 mm/memory.c            |   33 ++++++++++++++++++++++++---------
 2 files changed, 38 insertions(+), 9 deletions(-)

Index: linux-2.6-lttng/mm/memory.c
===================================================================
--- linux-2.6-lttng.orig/mm/memory.c	2008-07-15 14:02:54.000000000 -0400
+++ linux-2.6-lttng/mm/memory.c	2008-07-15 14:03:47.000000000 -0400
@@ -61,6 +61,7 @@
 
 #include <linux/swapops.h>
 #include <linux/elf.h>
+#include <trace/memory.h>
 
 #ifndef CONFIG_NEED_MULTIPLE_NODES
 /* use the per-pgdat data instead for discontigmem - mbligh */
@@ -2664,30 +2665,44 @@ unlock:
 int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		unsigned long address, int write_access)
 {
+	int res;
 	pgd_t *pgd;
 	pud_t *pud;
 	pmd_t *pmd;
 	pte_t *pte;
 
+	trace_memory_handle_fault_entry(mm, vma, address, write_access);
+
 	__set_current_state(TASK_RUNNING);
 
 	count_vm_event(PGFAULT);
 
-	if (unlikely(is_vm_hugetlb_page(vma)))
-		return hugetlb_fault(mm, vma, address, write_access);
+	if (unlikely(is_vm_hugetlb_page(vma))) {
+		res = hugetlb_fault(mm, vma, address, write_access);
+		goto end;
+	}
 
 	pgd = pgd_offset(mm, address);
 	pud = pud_alloc(mm, pgd, address);
-	if (!pud)
-		return VM_FAULT_OOM;
+	if (!pud) {
+		res = VM_FAULT_OOM;
+		goto end;
+	}
 	pmd = pmd_alloc(mm, pud, address);
-	if (!pmd)
-		return VM_FAULT_OOM;
+	if (!pmd) {
+		res = VM_FAULT_OOM;
+		goto end;
+	}
 	pte = pte_alloc_map(mm, pmd, address);
-	if (!pte)
-		return VM_FAULT_OOM;
+	if (!pte) {
+		res = VM_FAULT_OOM;
+		goto end;
+	}
 
-	return handle_pte_fault(mm, vma, address, pte, pmd, write_access);
+	res = handle_pte_fault(mm, vma, address, pte, pmd, write_access);
+end:
+	trace_memory_handle_fault_exit(res);
+	return res;
 }
 
 #ifndef __PAGETABLE_PUD_FOLDED
Index: linux-2.6-lttng/include/trace/memory.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/trace/memory.h	2008-07-15 14:03:47.000000000 -0400
@@ -0,0 +1,14 @@
+#ifndef _TRACE_MEMORY_H
+#define _TRACE_MEMORY_H
+
+#include <linux/tracepoint.h>
+
+DEFINE_TRACE(memory_handle_fault_entry,
+	TPPROTO(struct mm_struct *mm, struct vm_area_struct *vma,
+		unsigned long address, int write_access),
+	TPARGS(mm, vma, address, write_access));
+DEFINE_TRACE(memory_handle_fault_exit,
+	TPPROTO(int res),
+	TPARGS(res));
+
+#endif

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [patch 12/17] LTTng instrumentation - page
  2008-07-15 22:26 [patch 00/17] Tracepoints v4 for linux-next Mathieu Desnoyers
                   ` (10 preceding siblings ...)
  2008-07-15 22:26 ` [patch 11/17] LTTng instrumentation - memory page faults Mathieu Desnoyers
@ 2008-07-15 22:26 ` Mathieu Desnoyers
  2008-07-16  8:41   ` Peter Zijlstra
  2008-07-15 22:26 ` [patch 13/17] LTTng instrumentation - hugetlb Mathieu Desnoyers
                   ` (6 subsequent siblings)
  18 siblings, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2008-07-15 22:26 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Masami Hiramatsu
  Cc: Mathieu Desnoyers, Martin Bligh, Frank Ch. Eigler, Hideo AOKI,
	Takashi Nishiie, Steven Rostedt, Eduard - Gabriel Munteanu

[-- Attachment #1: lttng-instrumentation-page.patch --]
[-- Type: text/plain, Size: 2930 bytes --]

Paging activity instrumentation. Instruments page allocation/free to keep track
of page allocation. This does not cover hugetlb activity, which is covered by a
separate patch.
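
A rough userspace sketch of what "keeping track of page allocation" could
look like on the probe side follows. The counters and both probe functions
are hypothetical names chosen for this example, and the page pointer is
typed as const void * so the sketch builds outside the kernel. The NULL
check reflects the comment in include/trace/page.h that page can be NULL at
the page_alloc tracepoint.

```c
#include <stddef.h>

static long pages_outstanding;		/* pages allocated minus pages freed */
static unsigned long failed_allocs;	/* page_alloc events with page == NULL */

/* Would be attached to page_alloc; an order-N event covers 2^N pages. */
static void probe_page_alloc(const void *page, unsigned int order)
{
	if (page == NULL) {
		failed_allocs++;
		return;
	}
	pages_outstanding += 1L << order;
}

/* Would be attached to page_free. */
static void probe_page_free(const void *page, unsigned int order)
{
	(void)page;	/* only the order matters for the running count */
	pages_outstanding -= 1L << order;
}
```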

Those tracepoints are used by LTTng.

About the performance impact of tracepoints (which is comparable to markers),
even without immediate values optimizations, tests done by Hideo Aoki on ia64
show no regression. His test case was using hackbench on a kernel where
scheduler instrumentation (about 5 events in the scheduler code) was added.
See the "Tracepoints" patch header for performance result detail.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Martin Bligh <mbligh@google.com>
CC: Masami Hiramatsu <mhiramat@redhat.com>
CC: 'Peter Zijlstra' <peterz@infradead.org>
CC: "Frank Ch. Eigler" <fche@redhat.com>
CC: 'Ingo Molnar' <mingo@elte.hu>
CC: 'Hideo AOKI' <haoki@redhat.com>
CC: Takashi Nishiie <t-nishiie@np.css.fujitsu.com>
CC: 'Steven Rostedt' <rostedt@goodmis.org>
CC: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 include/trace/page.h |   16 ++++++++++++++++
 mm/page_alloc.c      |    6 ++++++
 2 files changed, 22 insertions(+)

Index: linux-2.6-lttng/mm/page_alloc.c
===================================================================
--- linux-2.6-lttng.orig/mm/page_alloc.c	2008-07-15 13:54:46.000000000 -0400
+++ linux-2.6-lttng/mm/page_alloc.c	2008-07-15 14:04:38.000000000 -0400
@@ -46,6 +46,7 @@
 #include <linux/page-isolation.h>
 #include <linux/memcontrol.h>
 #include <linux/debugobjects.h>
+#include <trace/page.h>
 
 #include <asm/tlbflush.h>
 #include <asm/div64.h>
@@ -510,6 +511,8 @@ static void __free_pages_ok(struct page 
 	int i;
 	int reserved = 0;
 
+	trace_page_free(page, order);
+
 	for (i = 0 ; i < (1 << order) ; ++i)
 		reserved += free_pages_check(page + i);
 	if (reserved)
@@ -966,6 +969,8 @@ static void free_hot_cold_page(struct pa
 	struct per_cpu_pages *pcp;
 	unsigned long flags;
 
+	trace_page_free(page, 0);
+
 	if (PageAnon(page))
 		page->mapping = NULL;
 	if (free_pages_check(page))
@@ -1630,6 +1635,7 @@ nopage:
 		show_mem();
 	}
 got_pg:
+	trace_page_alloc(page, order);
 	return page;
 }
 
Index: linux-2.6-lttng/include/trace/page.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/trace/page.h	2008-07-15 14:04:38.000000000 -0400
@@ -0,0 +1,16 @@
+#ifndef _TRACE_PAGE_H
+#define _TRACE_PAGE_H
+
+#include <linux/tracepoint.h>
+
+/*
+ * page_alloc : page can be NULL.
+ */
+DEFINE_TRACE(page_alloc,
+	TPPROTO(struct page *page, unsigned int order),
+	TPARGS(page, order));
+DEFINE_TRACE(page_free,
+	TPPROTO(struct page *page, unsigned int order),
+	TPARGS(page, order));
+
+#endif

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [patch 13/17] LTTng instrumentation - hugetlb
  2008-07-15 22:26 [patch 00/17] Tracepoints v4 for linux-next Mathieu Desnoyers
                   ` (11 preceding siblings ...)
  2008-07-15 22:26 ` [patch 12/17] LTTng instrumentation - page Mathieu Desnoyers
@ 2008-07-15 22:26 ` Mathieu Desnoyers
  2008-07-15 22:26 ` [patch 14/17] LTTng instrumentation - net Mathieu Desnoyers
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2008-07-15 22:26 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Masami Hiramatsu
  Cc: Mathieu Desnoyers, William Lee Irwin III, Frank Ch. Eigler,
	Hideo AOKI, Takashi Nishiie, Steven Rostedt,
	Eduard - Gabriel Munteanu

[-- Attachment #1: lttng-instrumentation-hugetlb.patch --]
[-- Type: text/plain, Size: 5680 bytes --]

Instrumentation of hugetlb activity (alloc/free/reserve/grab/release).

Those tracepoints are used by LTTng.

About the performance impact of tracepoints (which is comparable to markers),
even without immediate values optimizations, tests done by Hideo Aoki on ia64
show no regression. His test case was using hackbench on a kernel where
scheduler instrumentation (about 5 events in the scheduler code) was added.
See the "Tracepoints" patch header for performance result detail.

Changelog :
- instrument page grab, buddy allocator alloc, page release.
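
To trace the return value of hugetlb_reserve_pages(), the early returns are
funnelled through a single exit label so that the tracepoint sees every
outcome. The following is a simplified userspace sketch of that pattern, not
the kernel code: reserve_pages(), trace_reserve(), the quota argument and the
-EINVAL error value are hypothetical stand-ins.

```c
#include <assert.h>
#include <errno.h>

/* State recorded by the stand-in tracepoint, for demonstration only. */
static int last_traced_ret;
static int trace_calls;

/* Stand-in for trace_hugetlb_pages_reserve(): records the final result. */
static void trace_reserve(long from, long to, int ret)
{
	(void)from;
	(void)to;
	last_traced_ret = ret;
	trace_calls++;
}

/* Hypothetical reserve routine with two failure paths and one success path. */
static int reserve_pages(long from, long to, long quota)
{
	int ret = 0;
	long chg = to - from;

	if (chg < 0) {
		ret = -EINVAL;
		goto end;
	}
	if (chg > quota) {
		ret = -ENOSPC;
		goto end;
	}
	/* ... the actual reservation work would happen here ... */
end:
	/* Single exit: the tracepoint fires exactly once per call. */
	trace_reserve(from, to, ret);
	return ret;
}

/* Usage example: every return path reaches the tracepoint. */
static void reserve_demo(void)
{
	assert(reserve_pages(0, 4, 8) == 0);
	assert(reserve_pages(4, 0, 8) == -EINVAL);
	assert(reserve_pages(0, 16, 8) == -ENOSPC);
	assert(trace_calls == 3);
}
```

The alternative, a trace call before each return statement, would work too but
is easier to get out of sync when a new return path is added later.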

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: William Lee Irwin III <wli@holomorphy.com>
CC: Masami Hiramatsu <mhiramat@redhat.com>
CC: 'Peter Zijlstra' <peterz@infradead.org>
CC: "Frank Ch. Eigler" <fche@redhat.com>
CC: 'Ingo Molnar' <mingo@elte.hu>
CC: 'Hideo AOKI' <haoki@redhat.com>
CC: Takashi Nishiie <t-nishiie@np.css.fujitsu.com>
CC: 'Steven Rostedt' <rostedt@goodmis.org>
CC: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 include/trace/hugetlb.h |   28 ++++++++++++++++++++++++++++
 mm/hugetlb.c            |   41 +++++++++++++++++++++++++++++------------
 2 files changed, 57 insertions(+), 12 deletions(-)

Index: linux-2.6-lttng/mm/hugetlb.c
===================================================================
--- linux-2.6-lttng.orig/mm/hugetlb.c	2008-07-15 13:54:45.000000000 -0400
+++ linux-2.6-lttng/mm/hugetlb.c	2008-07-15 14:05:36.000000000 -0400
@@ -14,6 +14,7 @@
 #include <linux/mempolicy.h>
 #include <linux/cpuset.h>
 #include <linux/mutex.h>
+#include <trace/hugetlb.h>
 
 #include <asm/page.h>
 #include <asm/pgtable.h>
@@ -123,6 +124,7 @@ static struct page *dequeue_huge_page_vm
 static void update_and_free_page(struct page *page)
 {
 	int i;
+	trace_hugetlb_page_release(page);
 	nr_huge_pages--;
 	nr_huge_pages_node[page_to_nid(page)]--;
 	for (i = 0; i < (HPAGE_SIZE / PAGE_SIZE); i++) {
@@ -141,6 +143,7 @@ static void free_huge_page(struct page *
 	int nid = page_to_nid(page);
 	struct address_space *mapping;
 
+	trace_hugetlb_page_free(page);
 	mapping = (struct address_space *) page_private(page);
 	set_page_private(page, 0);
 	BUG_ON(page_count(page));
@@ -205,7 +208,8 @@ static struct page *alloc_fresh_huge_pag
 	if (page) {
 		if (arch_prepare_hugepage(page)) {
 			__free_pages(page, HUGETLB_PAGE_ORDER);
-			return NULL;
+			page = NULL;
+			goto end;
 		}
 		set_compound_page_dtor(page, free_huge_page);
 		spin_lock(&hugetlb_lock);
@@ -214,7 +218,8 @@ static struct page *alloc_fresh_huge_pag
 		spin_unlock(&hugetlb_lock);
 		put_page(page); /* free it into the hugepage allocator */
 	}
-
+end:
+	trace_hugetlb_page_grab(page);
 	return page;
 }
 
@@ -288,7 +293,8 @@ static struct page *alloc_buddy_huge_pag
 	spin_lock(&hugetlb_lock);
 	if (surplus_huge_pages >= nr_overcommit_huge_pages) {
 		spin_unlock(&hugetlb_lock);
-		return NULL;
+		page = NULL;
+		goto end;
 	} else {
 		nr_huge_pages++;
 		surplus_huge_pages++;
@@ -321,7 +327,8 @@ static struct page *alloc_buddy_huge_pag
 		__count_vm_event(HTLB_BUDDY_PGALLOC_FAIL);
 	}
 	spin_unlock(&hugetlb_lock);
-
+end:
+	trace_hugetlb_buddy_pgalloc(page);
 	return page;
 }
 
@@ -510,6 +517,7 @@ static struct page *alloc_huge_page(stru
 		set_page_refcounted(page);
 		set_page_private(page, (unsigned long) mapping);
 	}
+	trace_hugetlb_page_alloc(page);
 	return page;
 }
 
@@ -1292,27 +1300,36 @@ out:
 
 int hugetlb_reserve_pages(struct inode *inode, long from, long to)
 {
-	long ret, chg;
+	int ret;
+	long chg;
 
 	chg = region_chg(&inode->i_mapping->private_list, from, to);
-	if (chg < 0)
-		return chg;
+	if (chg < 0) {
+		ret = chg;
+		goto end;
+	}
 
-	if (hugetlb_get_quota(inode->i_mapping, chg))
-		return -ENOSPC;
+	if (hugetlb_get_quota(inode->i_mapping, chg)) {
+		ret = -ENOSPC;
+		goto end;
+	}
 	ret = hugetlb_acct_memory(chg);
 	if (ret < 0) {
 		hugetlb_put_quota(inode->i_mapping, chg);
-		return ret;
+		goto end;
 	}
 	region_add(&inode->i_mapping->private_list, from, to);
-	return 0;
+end:
+	trace_hugetlb_pages_reserve(inode, from, to, ret);
+	return ret;
 }
 
 void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed)
 {
-	long chg = region_truncate(&inode->i_mapping->private_list, offset);
+	long chg;
 
+	trace_hugetlb_pages_unreserve(inode, offset, freed);
+	chg = region_truncate(&inode->i_mapping->private_list, offset);
 	spin_lock(&inode->i_lock);
 	inode->i_blocks -= BLOCKS_PER_HUGEPAGE * freed;
 	spin_unlock(&inode->i_lock);
Index: linux-2.6-lttng/include/trace/hugetlb.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/trace/hugetlb.h	2008-07-15 14:05:36.000000000 -0400
@@ -0,0 +1,28 @@
+#ifndef _TRACE_HUGETLB_H
+#define _TRACE_HUGETLB_H
+
+#include <linux/tracepoint.h>
+
+DEFINE_TRACE(hugetlb_page_release,
+	TPPROTO(struct page *page),
+	TPARGS(page));
+DEFINE_TRACE(hugetlb_page_grab,
+	TPPROTO(struct page *page),
+	TPARGS(page));
+DEFINE_TRACE(hugetlb_buddy_pgalloc,
+	TPPROTO(struct page *page),
+	TPARGS(page));
+DEFINE_TRACE(hugetlb_page_alloc,
+	TPPROTO(struct page *page),
+	TPARGS(page));
+DEFINE_TRACE(hugetlb_page_free,
+	TPPROTO(struct page *page),
+	TPARGS(page));
+DEFINE_TRACE(hugetlb_pages_reserve,
+	TPPROTO(struct inode *inode, long from, long to, int ret),
+	TPARGS(inode, from, to, ret));
+DEFINE_TRACE(hugetlb_pages_unreserve,
+	TPPROTO(struct inode *inode, long offset, long freed),
+	TPARGS(inode, offset, freed));
+
+#endif

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [patch 14/17] LTTng instrumentation - net
  2008-07-15 22:26 [patch 00/17] Tracepoints v4 for linux-next Mathieu Desnoyers
                   ` (12 preceding siblings ...)
  2008-07-15 22:26 ` [patch 13/17] LTTng instrumentation - hugetlb Mathieu Desnoyers
@ 2008-07-15 22:26 ` Mathieu Desnoyers
  2008-07-15 22:26 ` [patch 15/17] LTTng instrumentation - ipv4 Mathieu Desnoyers
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2008-07-15 22:26 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Masami Hiramatsu
  Cc: Mathieu Desnoyers, netdev, Jeff Garzik, Frank Ch. Eigler,
	Hideo AOKI, Takashi Nishiie, Steven Rostedt,
	Eduard - Gabriel Munteanu

[-- Attachment #1: lttng-instrumentation-net.patch --]
[-- Type: text/plain, Size: 3007 bytes --]

Network device activity instrumentation (xmit/receive). Makes it possible to
detect when a packet has arrived on the network card or when it is about to be
sent. This is the instrumentation point outside of the drivers that is the
closest to the hardware. It makes it possible to measure the time taken by a
packet to go through the kernel between the system call and the actual delivery
to the network card (given that system calls are instrumented).
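
As a rough illustration of the latency measurement described above, here is a
userspace sketch of the bookkeeping a trace analysis tool could do. It is not
part of the patch. The flow_id correlation key, the table size and both
function names are assumptions made for the example; how a real tool matches a
system call to the packet later seen at net_dev_xmit is left open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_INFLIGHT 16

/* One pending send: hypothetical correlation key plus syscall timestamp. */
struct pending {
	int used;
	uint64_t flow_id;
	uint64_t t_syscall_ns;
};

static struct pending inflight[MAX_INFLIGHT];

/* Record a system call entry event. Returns 0, or -1 if the table is full. */
static int note_syscall_entry(uint64_t flow_id, uint64_t now_ns)
{
	for (size_t i = 0; i < MAX_INFLIGHT; i++) {
		if (!inflight[i].used) {
			inflight[i].used = 1;
			inflight[i].flow_id = flow_id;
			inflight[i].t_syscall_ns = now_ns;
			return 0;
		}
	}
	return -1;
}

/*
 * Record a net_dev_xmit event. Returns the time spent in the kernel in
 * nanoseconds, or -1 if no matching system call was recorded.
 */
static int64_t note_dev_xmit(uint64_t flow_id, uint64_t now_ns)
{
	for (size_t i = 0; i < MAX_INFLIGHT; i++) {
		if (inflight[i].used && inflight[i].flow_id == flow_id) {
			inflight[i].used = 0;
			return (int64_t)(now_ns - inflight[i].t_syscall_ns);
		}
	}
	return -1;
}

/* Usage example: a send that takes 450 ns to reach the device queue. */
static void xmit_latency_demo(void)
{
	assert(note_syscall_entry(7, 1000) == 0);
	assert(note_dev_xmit(7, 1450) == 450);
	assert(note_dev_xmit(7, 2000) == -1);	/* already consumed */
}
```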

Those tracepoints are used by LTTng.

About the performance impact of tracepoints (which is comparable to markers),
even without immediate values optimizations, tests done by Hideo Aoki on ia64
show no regression. His test case was using hackbench on a kernel where
scheduler instrumentation (about 5 events in the scheduler code) was added.
See the "Tracepoints" patch header for performance result detail.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: netdev@vger.kernel.org
CC: Jeff Garzik <jgarzik@pobox.com>
CC: Masami Hiramatsu <mhiramat@redhat.com>
CC: 'Peter Zijlstra' <peterz@infradead.org>
CC: "Frank Ch. Eigler" <fche@redhat.com>
CC: 'Ingo Molnar' <mingo@elte.hu>
CC: 'Hideo AOKI' <haoki@redhat.com>
CC: Takashi Nishiie <t-nishiie@np.css.fujitsu.com>
CC: 'Steven Rostedt' <rostedt@goodmis.org>
CC: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 include/trace/net.h |   14 ++++++++++++++
 net/core/dev.c      |    3 +++
 2 files changed, 17 insertions(+)

Index: linux-2.6-lttng/net/core/dev.c
===================================================================
--- linux-2.6-lttng.orig/net/core/dev.c	2008-07-15 14:51:51.000000000 -0400
+++ linux-2.6-lttng/net/core/dev.c	2008-07-15 15:15:59.000000000 -0400
@@ -121,6 +121,7 @@
 #include <linux/ctype.h>
 #include <linux/if_arp.h>
 #include <linux/if_vlan.h>
+#include <trace/net.h>
 
 #include "net-sysfs.h"
 
@@ -1702,6 +1703,7 @@ int dev_queue_xmit(struct sk_buff *skb)
 	}
 
 gso:
+	trace_net_dev_xmit(skb);
 	txq = &dev->tx_queue;
 	spin_lock_prefetch(&txq->lock);
 
@@ -2107,6 +2109,7 @@ int netif_receive_skb(struct sk_buff *sk
 
 	__get_cpu_var(netdev_rx_stat).total++;
 
+	trace_net_dev_receive(skb);
 	skb_reset_network_header(skb);
 	skb_reset_transport_header(skb);
 	skb->mac_len = skb->network_header - skb->mac_header;
Index: linux-2.6-lttng/include/trace/net.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/trace/net.h	2008-07-15 15:14:54.000000000 -0400
@@ -0,0 +1,14 @@
+#ifndef _TRACE_NET_H
+#define _TRACE_NET_H
+
+#include <net/sock.h>
+#include <linux/tracepoint.h>
+
+DEFINE_TRACE(net_dev_xmit,
+	TPPROTO(struct sk_buff *skb),
+	TPARGS(skb));
+DEFINE_TRACE(net_dev_receive,
+	TPPROTO(struct sk_buff *skb),
+	TPARGS(skb));
+
+#endif

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [patch 15/17] LTTng instrumentation - ipv4
  2008-07-15 22:26 [patch 00/17] Tracepoints v4 for linux-next Mathieu Desnoyers
                   ` (13 preceding siblings ...)
  2008-07-15 22:26 ` [patch 14/17] LTTng instrumentation - net Mathieu Desnoyers
@ 2008-07-15 22:26 ` Mathieu Desnoyers
  2008-07-15 22:26 ` [patch 16/17] LTTng instrumentation - ipv6 Mathieu Desnoyers
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2008-07-15 22:26 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Masami Hiramatsu
  Cc: Mathieu Desnoyers, netdev, David S. Miller, Alexey Kuznetsov,
	Frank Ch. Eigler, Hideo AOKI, Takashi Nishiie, Steven Rostedt,
	Eduard - Gabriel Munteanu

[-- Attachment #1: lttng-instrumentation-ipv4.patch --]
[-- Type: text/plain, Size: 2800 bytes --]

Keep track of interface address additions and deletions for IPv4, so that
interface address changes can be followed in a trace.

Those tracepoints are used by LTTng.

About the performance impact of tracepoints (which is comparable to markers),
even without immediate values optimizations, tests done by Hideo Aoki on ia64
show no regression. His test case was using hackbench on a kernel where
scheduler instrumentation (about 5 events in the scheduler code) was added.
See the "Tracepoints" patch header for performance result detail.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: netdev@vger.kernel.org
CC: David S. Miller <davem@davemloft.net>
CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
CC: Masami Hiramatsu <mhiramat@redhat.com>
CC: 'Peter Zijlstra' <peterz@infradead.org>
CC: "Frank Ch. Eigler" <fche@redhat.com>
CC: 'Ingo Molnar' <mingo@elte.hu>
CC: 'Hideo AOKI' <haoki@redhat.com>
CC: Takashi Nishiie <t-nishiie@np.css.fujitsu.com>
CC: 'Steven Rostedt' <rostedt@goodmis.org>
CC: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 include/trace/ipv4.h |   14 ++++++++++++++
 net/ipv4/devinet.c   |    3 +++
 2 files changed, 17 insertions(+)

Index: linux-2.6-lttng/net/ipv4/devinet.c
===================================================================
--- linux-2.6-lttng.orig/net/ipv4/devinet.c	2008-07-15 14:51:51.000000000 -0400
+++ linux-2.6-lttng/net/ipv4/devinet.c	2008-07-15 15:16:11.000000000 -0400
@@ -61,6 +61,7 @@
 #include <net/ip_fib.h>
 #include <net/rtnetlink.h>
 #include <net/net_namespace.h>
+#include <trace/ipv4.h>
 
 static struct ipv4_devconf ipv4_devconf = {
 	.data = {
@@ -257,6 +258,7 @@ static void __inet_del_ifa(struct in_dev
 		struct in_ifaddr **ifap1 = &ifa1->ifa_next;
 
 		while ((ifa = *ifap1) != NULL) {
+			trace_ipv4_addr_del(ifa);
 			if (!(ifa->ifa_flags & IFA_F_SECONDARY) &&
 			    ifa1->ifa_scope <= ifa->ifa_scope)
 				last_prim = ifa;
@@ -363,6 +365,7 @@ static int __inet_insert_ifa(struct in_i
 			}
 			ifa->ifa_flags |= IFA_F_SECONDARY;
 		}
+		trace_ipv4_addr_add(ifa);
 	}
 
 	if (!(ifa->ifa_flags & IFA_F_SECONDARY)) {
Index: linux-2.6-lttng/include/trace/ipv4.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/trace/ipv4.h	2008-07-15 15:16:11.000000000 -0400
@@ -0,0 +1,14 @@
+#ifndef _TRACE_IPV4_H
+#define _TRACE_IPV4_H
+
+#include <linux/inetdevice.h>
+#include <linux/tracepoint.h>
+
+DEFINE_TRACE(ipv4_addr_add,
+	TPPROTO(struct in_ifaddr *ifa),
+	TPARGS(ifa));
+DEFINE_TRACE(ipv4_addr_del,
+	TPPROTO(struct in_ifaddr *ifa),
+	TPARGS(ifa));
+
+#endif

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [patch 16/17] LTTng instrumentation - ipv6
  2008-07-15 22:26 [patch 00/17] Tracepoints v4 for linux-next Mathieu Desnoyers
                   ` (14 preceding siblings ...)
  2008-07-15 22:26 ` [patch 15/17] LTTng instrumentation - ipv4 Mathieu Desnoyers
@ 2008-07-15 22:26 ` Mathieu Desnoyers
  2008-07-15 22:26 ` [patch 17/17] ftrace port to tracepoints Mathieu Desnoyers
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2008-07-15 22:26 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Masami Hiramatsu
  Cc: Mathieu Desnoyers, Pekka Savola, netdev, David S. Miller,
	Alexey Kuznetsov, Frank Ch. Eigler, Hideo AOKI, Takashi Nishiie,
	Steven Rostedt, Eduard - Gabriel Munteanu

[-- Attachment #1: lttng-instrumentation-ipv6.patch --]
[-- Type: text/plain, Size: 2759 bytes --]

Instrument address addition and deletion (addr_add/addr_del) on IPv6 network
interfaces, so that a tracer can follow interface address changes.

Those tracepoints are used by LTTng.

About the performance impact of tracepoints (which is comparable to markers),
even without immediate values optimizations, tests done by Hideo Aoki on ia64
show no regression. His test case was using hackbench on a kernel where
scheduler instrumentation (about 5 events in the scheduler code) was added.
See the "Tracepoints" patch header for performance result detail.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Pekka Savola <pekkas@netcore.fi>
CC: netdev@vger.kernel.org
CC: David S. Miller <davem@davemloft.net>
CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
CC: Masami Hiramatsu <mhiramat@redhat.com>
CC: 'Peter Zijlstra' <peterz@infradead.org>
CC: "Frank Ch. Eigler" <fche@redhat.com>
CC: 'Ingo Molnar' <mingo@elte.hu>
CC: 'Hideo AOKI' <haoki@redhat.com>
CC: Takashi Nishiie <t-nishiie@np.css.fujitsu.com>
CC: 'Steven Rostedt' <rostedt@goodmis.org>
CC: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 include/trace/ipv6.h |   14 ++++++++++++++
 net/ipv6/addrconf.c  |    4 ++++
 2 files changed, 18 insertions(+)

Index: linux-2.6-lttng/net/ipv6/addrconf.c
===================================================================
--- linux-2.6-lttng.orig/net/ipv6/addrconf.c	2008-07-15 14:51:51.000000000 -0400
+++ linux-2.6-lttng/net/ipv6/addrconf.c	2008-07-15 15:16:14.000000000 -0400
@@ -85,6 +85,7 @@
 
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
+#include <trace/ipv6.h>
 
 /* Set to 3 to get tracing... */
 #define ACONF_DEBUG 2
@@ -650,6 +651,8 @@ ipv6_add_addr(struct inet6_dev *idev, co
 	/* For caller */
 	in6_ifa_hold(ifa);
 
+	trace_ipv6_addr_add(ifa);
+
 	/* Add to big hash table */
 	hash = ipv6_addr_hash(addr);
 
@@ -2163,6 +2166,7 @@ static int inet6_addr_del(struct net *ne
 			in6_ifa_hold(ifp);
 			read_unlock_bh(&idev->lock);
 
+			trace_ipv6_addr_del(ifp);
 			ipv6_del_addr(ifp);
 
 			/* If the last address is deleted administratively,
Index: linux-2.6-lttng/include/trace/ipv6.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/trace/ipv6.h	2008-07-15 15:16:14.000000000 -0400
@@ -0,0 +1,14 @@
+#ifndef _TRACE_IPV6_H
+#define _TRACE_IPV6_H
+
+#include <net/if_inet6.h>
+#include <linux/tracepoint.h>
+
+DEFINE_TRACE(ipv6_addr_add,
+	TPPROTO(struct inet6_ifaddr *ifa),
+	TPARGS(ifa));
+DEFINE_TRACE(ipv6_addr_del,
+	TPPROTO(struct inet6_ifaddr *ifa),
+	TPARGS(ifa));
+
+#endif

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [patch 17/17] ftrace port to tracepoints
  2008-07-15 22:26 [patch 00/17] Tracepoints v4 for linux-next Mathieu Desnoyers
                   ` (15 preceding siblings ...)
  2008-07-15 22:26 ` [patch 16/17] LTTng instrumentation - ipv6 Mathieu Desnoyers
@ 2008-07-15 22:26 ` Mathieu Desnoyers
  2008-07-16  8:51 ` [patch 00/17] Tracepoints v4 for linux-next Peter Zijlstra
  2008-07-18 15:41 ` Masami Hiramatsu
  18 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2008-07-15 22:26 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Masami Hiramatsu
  Cc: Mathieu Desnoyers, Frank Ch. Eigler, Hideo AOKI, Takashi Nishiie,
	Steven Rostedt, Eduard - Gabriel Munteanu

[-- Attachment #1: ftrace-port-to-tracepoints.patch --]
[-- Type: text/plain, Size: 14284 bytes --]

Port the trace_mark() calls used by ftrace to tracepoints (cleanup).

Changelog :
- Change error messages : marker -> tracepoint
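
The bulk of the removed lines comes from dropping the va_list unpacking that
marker callbacks need. The userspace sketch below contrasts the two callback
styles. It is a simplified model, not the kernel implementation: struct rq and
struct task_struct are stand-ins, and marker_emit() is a hypothetical helper
that plays the role of the marker call site.

```c
#include <assert.h>
#include <stdarg.h>

/* Stand-ins for the kernel types. */
struct rq { int cpu; };
struct task_struct { int pid; };

static int last_prev_pid, last_next_pid;

/*
 * Marker style: arguments arrive as a va_list and must be pulled out in
 * exactly the order and with exactly the types of the format string.
 */
static void marker_style_callback(const char *format, va_list *args)
{
	struct task_struct *prev, *next;
	struct rq *rq;

	(void)format;
	/* skip prev_pid %d next_pid %d prev_state %ld */
	(void)va_arg(*args, int);
	(void)va_arg(*args, int);
	(void)va_arg(*args, long);
	rq = va_arg(*args, struct rq *);
	prev = va_arg(*args, struct task_struct *);
	next = va_arg(*args, struct task_struct *);

	(void)rq;
	last_prev_pid = prev->pid;
	last_next_pid = next->pid;
}

/* Hypothetical call site for the marker-style callback. */
static void marker_emit(const char *format, ...)
{
	va_list ap;

	va_start(ap, format);
	marker_style_callback(format, &ap);
	va_end(ap);
}

/* Tracepoint style: the probe has a typed prototype the compiler checks. */
static void probe_sched_switch(struct rq *rq, struct task_struct *prev,
			       struct task_struct *next)
{
	(void)rq;
	last_prev_pid = prev->pid;
	last_next_pid = next->pid;
}

/* Usage example: both styles deliver the same data to the callback. */
static void callback_styles_demo(void)
{
	struct rq rq = { 0 };
	struct task_struct prev = { 10 }, next = { 20 };

	marker_emit("prev_pid %d next_pid %d prev_state %ld "
		    "## rq %p prev %p next %p",
		    prev.pid, next.pid, 0L, &rq, &prev, &next);
	assert(last_prev_pid == 10 && last_next_pid == 20);

	probe_sched_switch(&rq, &next, &prev);
	assert(last_prev_pid == 20 && last_next_pid == 10);
}
```

With the typed form, a mismatch between the call site and the probe shows up
as a compile-time error rather than as a silently misread argument.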

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Masami Hiramatsu <mhiramat@redhat.com>
CC: 'Peter Zijlstra' <peterz@infradead.org>
CC: "Frank Ch. Eigler" <fche@redhat.com>
CC: 'Ingo Molnar' <mingo@elte.hu>
CC: 'Hideo AOKI' <haoki@redhat.com>
CC: Takashi Nishiie <t-nishiie@np.css.fujitsu.com>
CC: 'Steven Rostedt' <rostedt@goodmis.org>
CC: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 kernel/trace/trace_sched_switch.c |  120 ++++++---------------------------
 kernel/trace/trace_sched_wakeup.c |  135 +++++++++-----------------------------
 2 files changed, 58 insertions(+), 197 deletions(-)

Index: linux-2.6-lttng/kernel/trace/trace_sched_switch.c
===================================================================
--- linux-2.6-lttng.orig/kernel/trace/trace_sched_switch.c	2008-07-15 17:41:18.000000000 -0400
+++ linux-2.6-lttng/kernel/trace/trace_sched_switch.c	2008-07-15 17:41:59.000000000 -0400
@@ -9,8 +9,8 @@
 #include <linux/debugfs.h>
 #include <linux/kallsyms.h>
 #include <linux/uaccess.h>
-#include <linux/marker.h>
 #include <linux/ftrace.h>
+#include <trace/sched.h>
 
 #include "trace.h"
 
@@ -19,16 +19,17 @@ static int __read_mostly	tracer_enabled;
 static atomic_t			sched_ref;
 
 static void
-sched_switch_func(void *private, void *__rq, struct task_struct *prev,
+probe_sched_switch(struct rq *__rq, struct task_struct *prev,
 			struct task_struct *next)
 {
-	struct trace_array **ptr = private;
-	struct trace_array *tr = *ptr;
 	struct trace_array_cpu *data;
 	unsigned long flags;
 	long disabled;
 	int cpu;
 
+	if (!atomic_read(&sched_ref))
+		return;
+
 	tracing_record_cmdline(prev);
 	tracing_record_cmdline(next);
 
@@ -37,95 +38,42 @@ sched_switch_func(void *private, void *_
 
 	local_irq_save(flags);
 	cpu = raw_smp_processor_id();
-	data = tr->data[cpu];
+	data = ctx_trace->data[cpu];
 	disabled = atomic_inc_return(&data->disabled);
 
 	if (likely(disabled == 1))
-		tracing_sched_switch_trace(tr, data, prev, next, flags);
+		tracing_sched_switch_trace(ctx_trace, data, prev, next, flags);
 
 	atomic_dec(&data->disabled);
 	local_irq_restore(flags);
 }
 
-static notrace void
-sched_switch_callback(void *probe_data, void *call_data,
-		      const char *format, va_list *args)
-{
-	struct task_struct *prev;
-	struct task_struct *next;
-	struct rq *__rq;
-
-	if (!atomic_read(&sched_ref))
-		return;
-
-	/* skip prev_pid %d next_pid %d prev_state %ld */
-	(void)va_arg(*args, int);
-	(void)va_arg(*args, int);
-	(void)va_arg(*args, long);
-	__rq = va_arg(*args, typeof(__rq));
-	prev = va_arg(*args, typeof(prev));
-	next = va_arg(*args, typeof(next));
-
-	/*
-	 * If tracer_switch_func only points to the local
-	 * switch func, it still needs the ptr passed to it.
-	 */
-	sched_switch_func(probe_data, __rq, prev, next);
-}
-
 static void
-wakeup_func(void *private, void *__rq, struct task_struct *wakee, struct
-			task_struct *curr)
+probe_sched_wakeup(struct rq *__rq, struct task_struct *wakee)
 {
-	struct trace_array **ptr = private;
-	struct trace_array *tr = *ptr;
 	struct trace_array_cpu *data;
 	unsigned long flags;
 	long disabled;
 	int cpu;
 
-	if (!tracer_enabled)
+	if (!likely(tracer_enabled))
 		return;
 
-	tracing_record_cmdline(curr);
+	tracing_record_cmdline(current);
 
 	local_irq_save(flags);
 	cpu = raw_smp_processor_id();
-	data = tr->data[cpu];
+	data = ctx_trace->data[cpu];
 	disabled = atomic_inc_return(&data->disabled);
 
 	if (likely(disabled == 1))
-		tracing_sched_wakeup_trace(tr, data, wakee, curr, flags);
+		tracing_sched_wakeup_trace(ctx_trace, data, wakee, current,
+			flags);
 
 	atomic_dec(&data->disabled);
 	local_irq_restore(flags);
 }
 
-static notrace void
-wake_up_callback(void *probe_data, void *call_data,
-		 const char *format, va_list *args)
-{
-	struct task_struct *curr;
-	struct task_struct *task;
-	struct rq *__rq;
-
-	if (likely(!tracer_enabled))
-		return;
-
-	/* Skip pid %d state %ld */
-	(void)va_arg(*args, int);
-	(void)va_arg(*args, long);
-	/* now get the meat: "rq %p task %p rq->curr %p" */
-	__rq = va_arg(*args, typeof(__rq));
-	task = va_arg(*args, typeof(task));
-	curr = va_arg(*args, typeof(curr));
-
-	tracing_record_cmdline(task);
-	tracing_record_cmdline(curr);
-
-	wakeup_func(probe_data, __rq, task, curr);
-}
-
 static void sched_switch_reset(struct trace_array *tr)
 {
 	int cpu;
@@ -140,60 +88,40 @@ static int tracing_sched_register(void)
 {
 	int ret;
 
-	ret = marker_probe_register("kernel_sched_wakeup",
-			"pid %d state %ld ## rq %p task %p rq->curr %p",
-			wake_up_callback,
-			&ctx_trace);
+	ret = register_trace_sched_wakeup(probe_sched_wakeup);
 	if (ret) {
-		pr_info("wakeup trace: Couldn't add marker"
+		pr_info("wakeup trace: Couldn't activate tracepoint"
 			" probe to kernel_sched_wakeup\n");
 		return ret;
 	}
 
-	ret = marker_probe_register("kernel_sched_wakeup_new",
-			"pid %d state %ld ## rq %p task %p rq->curr %p",
-			wake_up_callback,
-			&ctx_trace);
+	ret = register_trace_sched_wakeup_new(probe_sched_wakeup);
 	if (ret) {
-		pr_info("wakeup trace: Couldn't add marker"
+		pr_info("wakeup trace: Couldn't activate tracepoint"
 			" probe to kernel_sched_wakeup_new\n");
 		goto fail_deprobe;
 	}
 
-	ret = marker_probe_register("kernel_sched_schedule",
-		"prev_pid %d next_pid %d prev_state %ld "
-		"## rq %p prev %p next %p",
-		sched_switch_callback,
-		&ctx_trace);
+	ret = register_trace_sched_switch(probe_sched_switch);
 	if (ret) {
-		pr_info("sched trace: Couldn't add marker"
+		pr_info("sched trace: Couldn't activate tracepoint"
 			" probe to kernel_sched_schedule\n");
 		goto fail_deprobe_wake_new;
 	}
 
 	return ret;
 fail_deprobe_wake_new:
-	marker_probe_unregister("kernel_sched_wakeup_new",
-				wake_up_callback,
-				&ctx_trace);
+	unregister_trace_sched_wakeup_new(probe_sched_wakeup);
 fail_deprobe:
-	marker_probe_unregister("kernel_sched_wakeup",
-				wake_up_callback,
-				&ctx_trace);
+	unregister_trace_sched_wakeup(probe_sched_wakeup);
 	return ret;
 }
 
 static void tracing_sched_unregister(void)
 {
-	marker_probe_unregister("kernel_sched_schedule",
-				sched_switch_callback,
-				&ctx_trace);
-	marker_probe_unregister("kernel_sched_wakeup_new",
-				wake_up_callback,
-				&ctx_trace);
-	marker_probe_unregister("kernel_sched_wakeup",
-				wake_up_callback,
-				&ctx_trace);
+	unregister_trace_sched_switch(probe_sched_switch);
+	unregister_trace_sched_wakeup_new(probe_sched_wakeup);
+	unregister_trace_sched_wakeup(probe_sched_wakeup);
 }
 
 static void tracing_start_sched_switch(void)
Index: linux-2.6-lttng/kernel/trace/trace_sched_wakeup.c
===================================================================
--- linux-2.6-lttng.orig/kernel/trace/trace_sched_wakeup.c	2008-07-15 17:41:18.000000000 -0400
+++ linux-2.6-lttng/kernel/trace/trace_sched_wakeup.c	2008-07-15 17:41:59.000000000 -0400
@@ -15,7 +15,7 @@
 #include <linux/kallsyms.h>
 #include <linux/uaccess.h>
 #include <linux/ftrace.h>
-#include <linux/marker.h>
+#include <trace/sched.h>
 
 #include "trace.h"
 
@@ -109,18 +109,18 @@ static int report_latency(cycle_t delta)
 }
 
 static void notrace
-wakeup_sched_switch(void *private, void *rq, struct task_struct *prev,
+probe_wakeup_sched_switch(struct rq *rq, struct task_struct *prev,
 	struct task_struct *next)
 {
 	unsigned long latency = 0, t0 = 0, t1 = 0;
-	struct trace_array **ptr = private;
-	struct trace_array *tr = *ptr;
 	struct trace_array_cpu *data;
 	cycle_t T0, T1, delta;
 	unsigned long flags;
 	long disabled;
 	int cpu;
 
+	tracing_record_cmdline(prev);
+
 	if (unlikely(!tracer_enabled))
 		return;
 
@@ -137,11 +137,11 @@ wakeup_sched_switch(void *private, void 
 		return;
 
 	/* The task we are waiting for is waking up */
-	data = tr->data[wakeup_cpu];
+	data = wakeup_trace->data[wakeup_cpu];
 
 	/* disable local data, not wakeup_cpu data */
 	cpu = raw_smp_processor_id();
-	disabled = atomic_inc_return(&tr->data[cpu]->disabled);
+	disabled = atomic_inc_return(&wakeup_trace->data[cpu]->disabled);
 	if (likely(disabled != 1))
 		goto out;
 
@@ -151,7 +151,7 @@ wakeup_sched_switch(void *private, void 
 	if (unlikely(!tracer_enabled || next != wakeup_task))
 		goto out_unlock;
 
-	trace_function(tr, data, CALLER_ADDR1, CALLER_ADDR2, flags);
+	trace_function(wakeup_trace, data, CALLER_ADDR1, CALLER_ADDR2, flags);
 
 	/*
 	 * usecs conversion is slow so we try to delay the conversion
@@ -170,38 +170,13 @@ wakeup_sched_switch(void *private, void 
 	t0 = nsecs_to_usecs(T0);
 	t1 = nsecs_to_usecs(T1);
 
-	update_max_tr(tr, wakeup_task, wakeup_cpu);
+	update_max_tr(wakeup_trace, wakeup_task, wakeup_cpu);
 
 out_unlock:
-	__wakeup_reset(tr);
+	__wakeup_reset(wakeup_trace);
 	spin_unlock_irqrestore(&wakeup_lock, flags);
 out:
-	atomic_dec(&tr->data[cpu]->disabled);
-}
-
-static notrace void
-sched_switch_callback(void *probe_data, void *call_data,
-		      const char *format, va_list *args)
-{
-	struct task_struct *prev;
-	struct task_struct *next;
-	struct rq *__rq;
-
-	/* skip prev_pid %d next_pid %d prev_state %ld */
-	(void)va_arg(*args, int);
-	(void)va_arg(*args, int);
-	(void)va_arg(*args, long);
-	__rq = va_arg(*args, typeof(__rq));
-	prev = va_arg(*args, typeof(prev));
-	next = va_arg(*args, typeof(next));
-
-	tracing_record_cmdline(prev);
-
-	/*
-	 * If tracer_switch_func only points to the local
-	 * switch func, it still needs the ptr passed to it.
-	 */
-	wakeup_sched_switch(probe_data, __rq, prev, next);
+	atomic_dec(&wakeup_trace->data[cpu]->disabled);
 }
 
 static void __wakeup_reset(struct trace_array *tr)
@@ -235,19 +210,24 @@ static void wakeup_reset(struct trace_ar
 }
 
 static void
-wakeup_check_start(struct trace_array *tr, struct task_struct *p,
-		   struct task_struct *curr)
+probe_wakeup(struct rq *rq, struct task_struct *p)
 {
 	int cpu = smp_processor_id();
 	unsigned long flags;
 	long disabled;
 
+	if (likely(!tracer_enabled))
+		return;
+
+	tracing_record_cmdline(p);
+	tracing_record_cmdline(current);
+
 	if (likely(!rt_task(p)) ||
 			p->prio >= wakeup_prio ||
-			p->prio >= curr->prio)
+			p->prio >= current->prio)
 		return;
 
-	disabled = atomic_inc_return(&tr->data[cpu]->disabled);
+	disabled = atomic_inc_return(&wakeup_trace->data[cpu]->disabled);
 	if (unlikely(disabled != 1))
 		goto out;
 
@@ -259,7 +239,7 @@ wakeup_check_start(struct trace_array *t
 		goto out_locked;
 
 	/* reset the trace */
-	__wakeup_reset(tr);
+	__wakeup_reset(wakeup_trace);
 
 	wakeup_cpu = task_cpu(p);
 	wakeup_prio = p->prio;
@@ -269,74 +249,37 @@ wakeup_check_start(struct trace_array *t
 
 	local_save_flags(flags);
 
-	tr->data[wakeup_cpu]->preempt_timestamp = ftrace_now(cpu);
-	trace_function(tr, tr->data[wakeup_cpu],
+	wakeup_trace->data[wakeup_cpu]->preempt_timestamp = ftrace_now(cpu);
+	trace_function(wakeup_trace, wakeup_trace->data[wakeup_cpu],
 		       CALLER_ADDR1, CALLER_ADDR2, flags);
 
 out_locked:
 	spin_unlock(&wakeup_lock);
 out:
-	atomic_dec(&tr->data[cpu]->disabled);
-}
-
-static notrace void
-wake_up_callback(void *probe_data, void *call_data,
-		 const char *format, va_list *args)
-{
-	struct trace_array **ptr = probe_data;
-	struct trace_array *tr = *ptr;
-	struct task_struct *curr;
-	struct task_struct *task;
-	struct rq *__rq;
-
-	if (likely(!tracer_enabled))
-		return;
-
-	/* Skip pid %d state %ld */
-	(void)va_arg(*args, int);
-	(void)va_arg(*args, long);
-	/* now get the meat: "rq %p task %p rq->curr %p" */
-	__rq = va_arg(*args, typeof(__rq));
-	task = va_arg(*args, typeof(task));
-	curr = va_arg(*args, typeof(curr));
-
-	tracing_record_cmdline(task);
-	tracing_record_cmdline(curr);
-
-	wakeup_check_start(tr, task, curr);
+	atomic_dec(&wakeup_trace->data[cpu]->disabled);
 }
 
 static void start_wakeup_tracer(struct trace_array *tr)
 {
 	int ret;
 
-	ret = marker_probe_register("kernel_sched_wakeup",
-			"pid %d state %ld ## rq %p task %p rq->curr %p",
-			wake_up_callback,
-			&wakeup_trace);
+	ret = register_trace_sched_wakeup(probe_wakeup);
 	if (ret) {
-		pr_info("wakeup trace: Couldn't add marker"
+		pr_info("wakeup trace: Couldn't activate tracepoint"
 			" probe to kernel_sched_wakeup\n");
 		return;
 	}
 
-	ret = marker_probe_register("kernel_sched_wakeup_new",
-			"pid %d state %ld ## rq %p task %p rq->curr %p",
-			wake_up_callback,
-			&wakeup_trace);
+	ret = register_trace_sched_wakeup_new(probe_wakeup);
 	if (ret) {
-		pr_info("wakeup trace: Couldn't add marker"
+		pr_info("wakeup trace: Couldn't activate tracepoint"
 			" probe to kernel_sched_wakeup_new\n");
 		goto fail_deprobe;
 	}
 
-	ret = marker_probe_register("kernel_sched_schedule",
-		"prev_pid %d next_pid %d prev_state %ld "
-		"## rq %p prev %p next %p",
-		sched_switch_callback,
-		&wakeup_trace);
+	ret = register_trace_sched_switch(probe_wakeup_sched_switch);
 	if (ret) {
-		pr_info("sched trace: Couldn't add marker"
+		pr_info("sched trace: Couldn't activate tracepoint"
 			" probe to kernel_sched_schedule\n");
 		goto fail_deprobe_wake_new;
 	}
@@ -358,28 +301,18 @@ static void start_wakeup_tracer(struct t
 
 	return;
 fail_deprobe_wake_new:
-	marker_probe_unregister("kernel_sched_wakeup_new",
-				wake_up_callback,
-				&wakeup_trace);
+	unregister_trace_sched_wakeup_new(probe_wakeup);
 fail_deprobe:
-	marker_probe_unregister("kernel_sched_wakeup",
-				wake_up_callback,
-				&wakeup_trace);
+	unregister_trace_sched_wakeup(probe_wakeup);
 }
 
 static void stop_wakeup_tracer(struct trace_array *tr)
 {
 	tracer_enabled = 0;
 	unregister_ftrace_function(&trace_ops);
-	marker_probe_unregister("kernel_sched_schedule",
-				sched_switch_callback,
-				&wakeup_trace);
-	marker_probe_unregister("kernel_sched_wakeup_new",
-				wake_up_callback,
-				&wakeup_trace);
-	marker_probe_unregister("kernel_sched_wakeup",
-				wake_up_callback,
-				&wakeup_trace);
+	unregister_trace_sched_switch(probe_wakeup_sched_switch);
+	unregister_trace_sched_wakeup_new(probe_wakeup);
+	unregister_trace_sched_wakeup(probe_wakeup);
 }
 
 static void wakeup_tracer_init(struct trace_array *tr)
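
The registration path above follows the usual goto-unwind idiom: register
the probes in order and, on failure, unregister the ones already registered
in reverse order. A minimal userspace sketch of that idiom follows. The
register_probe()/unregister_probe() stand-ins, the probe_id names and the
fail_at knob are hypothetical and exist only to exercise the error paths;
they are not the kernel tracepoint API.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the three tracepoints used by the wakeup tracer. */
enum probe_id { PROBE_WAKEUP, PROBE_WAKEUP_NEW, PROBE_SWITCH, PROBE_COUNT };

bool registered[PROBE_COUNT];
int fail_at = -1;	/* test knob: probe id whose registration fails */

/* Returns 0 on success, nonzero on failure, like register_trace_*(). */
int register_probe(enum probe_id id)
{
	if ((int)id == fail_at)
		return -1;
	registered[id] = true;
	return 0;
}

void unregister_probe(enum probe_id id)
{
	registered[id] = false;
}

/* Register all three probes, or none of them. */
int start_tracer_sketch(void)
{
	int ret;

	ret = register_probe(PROBE_WAKEUP);
	if (ret)
		return ret;

	ret = register_probe(PROBE_WAKEUP_NEW);
	if (ret)
		goto fail_deprobe;

	ret = register_probe(PROBE_SWITCH);
	if (ret)
		goto fail_deprobe_wake_new;

	return 0;

fail_deprobe_wake_new:
	unregister_probe(PROBE_WAKEUP_NEW);
fail_deprobe:
	unregister_probe(PROBE_WAKEUP);
	return ret;
}

/* Tear down in the reverse order of registration. */
void stop_tracer_sketch(void)
{
	unregister_probe(PROBE_SWITCH);
	unregister_probe(PROBE_WAKEUP_NEW);
	unregister_probe(PROBE_WAKEUP);
}
```

Unlike the void function in the patch, the sketch returns the error so a
caller can observe that a failed start leaves nothing registered.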

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [patch 06/17] LTTng instrumentation - scheduler
  2008-07-15 22:26 ` [patch 06/17] LTTng instrumentation - scheduler Mathieu Desnoyers
@ 2008-07-16  8:30   ` Peter Zijlstra
  2008-07-16 14:18     ` Mathieu Desnoyers
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2008-07-16  8:30 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: akpm, Ingo Molnar, linux-kernel, Masami Hiramatsu,
	Steven Rostedt, Thomas Gleixner, Frank Ch. Eigler, Hideo AOKI,
	Takashi Nishiie, Eduard - Gabriel Munteanu

On Tue, 2008-07-15 at 18:26 -0400, Mathieu Desnoyers wrote:
> plain text document attachment (lttng-instrumentation-scheduler.patch)
> Instrument the scheduler activity (sched_switch, migration, wakeups, wait for a
> task, signal delivery) and process/thread creation/destruction (fork, exit,
> kthread stop). Actually, kthread creation is not instrumented in this patch
> because it is architecture dependent. It allows connecting tracers such as
> ftrace, which detects scheduling latencies and good/bad scheduler decisions. Tools
> like LTTng can export this scheduler information along with instrumentation of
> the rest of the kernel activity to perform post-mortem analysis on the scheduler
> activity.
> 
> About the performance impact of tracepoints (which is comparable to markers),
> even without immediate values optimizations, tests done by Hideo Aoki on ia64
> show no regression. His test case was using hackbench on a kernel where
> scheduler instrumentation (about 5 events in the scheduler code) was added.
> See the "Tracepoints" patch header for performance result detail.
> 
> Changelog :
> - Change instrumentation location and parameter to match ftrace instrumentation,
>   previously done with kernel markers.
> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> CC: 'Peter Zijlstra' <peterz@infradead.org>
> CC: 'Steven Rostedt' <rostedt@goodmis.org>
> CC: Thomas Gleixner <tglx@linutronix.de>
> CC: Masami Hiramatsu <mhiramat@redhat.com>
> CC: "Frank Ch. Eigler" <fche@redhat.com>
> CC: 'Ingo Molnar' <mingo@elte.hu>
> CC: 'Hideo AOKI' <haoki@redhat.com>
> CC: Takashi Nishiie <t-nishiie@np.css.fujitsu.com>
> CC: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> ---
>  include/trace/sched.h |   45 +++++++++++++++++++++++++++++++++++++++++++++
>  kernel/exit.c         |    6 ++++++
>  kernel/fork.c         |    3 +++
>  kernel/kthread.c      |    5 +++++
>  kernel/sched.c        |   17 ++++++-----------
>  kernel/signal.c       |    3 +++
>  6 files changed, 68 insertions(+), 11 deletions(-)
> 
> Index: linux-2.6-lttng/kernel/kthread.c
> ===================================================================
> --- linux-2.6-lttng.orig/kernel/kthread.c	2008-07-15 14:51:49.000000000 -0400
> +++ linux-2.6-lttng/kernel/kthread.c	2008-07-15 15:12:54.000000000 -0400
> @@ -13,6 +13,7 @@
>  #include <linux/file.h>
>  #include <linux/module.h>
>  #include <linux/mutex.h>
> +#include <trace/sched.h>
>  
>  #define KTHREAD_NICE_LEVEL (-5)
>  
> @@ -187,6 +188,8 @@ int kthread_stop(struct task_struct *k)
>  	/* It could exit after stop_info.k set, but before wake_up_process. */
>  	get_task_struct(k);
>  
> +	trace_sched_kthread_stop(k);
> +
>  	/* Must init completion *before* thread sees kthread_stop_info.k */
>  	init_completion(&kthread_stop_info.done);
>  	smp_wmb();
> @@ -202,6 +205,8 @@ int kthread_stop(struct task_struct *k)
>  	ret = kthread_stop_info.err;
>  	mutex_unlock(&kthread_stop_lock);
>  
> +	trace_sched_kthread_stop_ret(ret);
> +
>  	return ret;
>  }
>  EXPORT_SYMBOL(kthread_stop);

Why do we need two trace points in this function?

> Index: linux-2.6-lttng/kernel/sched.c
> ===================================================================
> --- linux-2.6-lttng.orig/kernel/sched.c	2008-07-15 14:51:50.000000000 -0400
> +++ linux-2.6-lttng/kernel/sched.c	2008-07-15 15:13:49.000000000 -0400
> @@ -71,6 +71,7 @@
>  #include <linux/debugfs.h>
>  #include <linux/ctype.h>
>  #include <linux/ftrace.h>
> +#include <trace/sched.h>
>  
>  #include <asm/tlb.h>
>  #include <asm/irq_regs.h>
> @@ -1987,6 +1988,7 @@ void wait_task_inactive(struct task_stru
>  		 * just go back and repeat.
>  		 */
>  		rq = task_rq_lock(p, &flags);
> +		trace_sched_wait_task(rq, p);
>  		running = task_running(rq, p);
>  		on_rq = p->se.on_rq;
>  		task_rq_unlock(rq, &flags);
> @@ -2337,9 +2339,7 @@ out_activate:
>  	success = 1;
>  
>  out_running:
> -	trace_mark(kernel_sched_wakeup,
> -		"pid %d state %ld ## rq %p task %p rq->curr %p",
> -		p->pid, p->state, rq, p, rq->curr);
> +	trace_sched_wakeup(rq, p);
>  	check_preempt_curr(rq, p);
>  
>  	p->state = TASK_RUNNING;
> @@ -2472,9 +2472,7 @@ void wake_up_new_task(struct task_struct
>  		p->sched_class->task_new(rq, p);
>  		inc_nr_running(rq);
>  	}
> -	trace_mark(kernel_sched_wakeup_new,
> -		"pid %d state %ld ## rq %p task %p rq->curr %p",
> -		p->pid, p->state, rq, p, rq->curr);
> +	trace_sched_wakeup_new(rq, p);
>  	check_preempt_curr(rq, p);
>  #ifdef CONFIG_SMP
>  	if (p->sched_class->task_wake_up)
> @@ -2647,11 +2645,7 @@ context_switch(struct rq *rq, struct tas
>  	struct mm_struct *mm, *oldmm;
>  
>  	prepare_task_switch(rq, prev, next);
> -	trace_mark(kernel_sched_schedule,
> -		"prev_pid %d next_pid %d prev_state %ld "
> -		"## rq %p prev %p next %p",
> -		prev->pid, next->pid, prev->state,
> -		rq, prev, next);
> +	trace_sched_switch(rq, prev, next);
>  	mm = next->mm;
>  	oldmm = prev->active_mm;
>  	/*
> @@ -2884,6 +2878,7 @@ static void sched_migrate_task(struct ta
>  	    || unlikely(cpu_is_offline(dest_cpu)))
>  		goto out;
>  
> +	trace_sched_migrate_task(rq, p, dest_cpu);
>  	/* force the process onto the specified CPU */
>  	if (migrate_task(p, dest_cpu, &req)) {
>  		/* Need to wait for migration thread (might exit: take ref). */
> Index: linux-2.6-lttng/kernel/exit.c
> ===================================================================
> --- linux-2.6-lttng.orig/kernel/exit.c	2008-07-15 14:51:49.000000000 -0400
> +++ linux-2.6-lttng/kernel/exit.c	2008-07-15 15:12:54.000000000 -0400
> @@ -46,6 +46,7 @@
>  #include <linux/resource.h>
>  #include <linux/blkdev.h>
>  #include <linux/task_io_accounting_ops.h>
> +#include <trace/sched.h>
>  
>  #include <asm/uaccess.h>
>  #include <asm/unistd.h>
> @@ -149,6 +150,7 @@ static void __exit_signal(struct task_st
>  
>  static void delayed_put_task_struct(struct rcu_head *rhp)
>  {
> +	trace_sched_process_free(container_of(rhp, struct task_struct, rcu));
>  	put_task_struct(container_of(rhp, struct task_struct, rcu));
>  }

It might make sense to write it like:

static void delayed_put_task_struct(struct rcu_head *rhp)
{
	struct task_struct *tsk = container_of(rhp, struct task_struct, rcu);

	trace_sched_process_free(tsk);
	put_task_struct(tsk);
}
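
The rewrite suggested above evaluates container_of() once and reuses the
result. As a userspace illustration of what that macro computes, here is a
minimal sketch. The simplified container_of() (without the kernel's type
check) and the names task_sketch, rcu_head_sketch and the *_sketch helpers
are assumptions made for this example only.

```c
#include <stddef.h>

/* Simplified container_of(): step back from a member to its enclosing struct. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct rcu_head_sketch {
	void *next;
};

struct task_sketch {
	int pid;
	struct rcu_head_sketch rcu;	/* embedded member handed to the callback */
	int freed;
};

int traced_pid;	/* last pid seen by the stand-in trace hook */

void trace_process_free_sketch(struct task_sketch *tsk)
{
	traced_pid = tsk->pid;
}

void put_task_sketch(struct task_sketch *tsk)
{
	tsk->freed = 1;
}

/* Resolve the enclosing task once, then hand it to both consumers. */
void delayed_put_task_sketch(struct rcu_head_sketch *rhp)
{
	struct task_sketch *tsk = container_of(rhp, struct task_sketch, rcu);

	trace_process_free_sketch(tsk);
	put_task_sketch(tsk);
}
```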

> @@ -1040,6 +1042,8 @@ NORET_TYPE void do_exit(long code)
>  
>  	if (group_dead)
>  		acct_process();
> +	trace_sched_process_exit(tsk);
> +
>  	exit_sem(tsk);
>  	exit_files(tsk);
>  	exit_fs(tsk);
> @@ -1524,6 +1528,8 @@ static long do_wait(enum pid_type type, 
>  	struct task_struct *tsk;
>  	int flag, retval;
>  
> +	trace_sched_process_wait(pid);
> +
>  	add_wait_queue(&current->signal->wait_chldexit,&wait);
>  repeat:
>  	/* If there is nothing that can match our critier just get out */
> Index: linux-2.6-lttng/kernel/fork.c
> ===================================================================
> --- linux-2.6-lttng.orig/kernel/fork.c	2008-07-15 14:51:49.000000000 -0400
> +++ linux-2.6-lttng/kernel/fork.c	2008-07-15 15:14:23.000000000 -0400
> @@ -56,6 +56,7 @@
>  #include <linux/proc_fs.h>
>  #include <linux/blkdev.h>
>  #include <linux/magic.h>
> +#include <trace/sched.h>
>  
>  #include <asm/pgtable.h>
>  #include <asm/pgalloc.h>
> @@ -1362,6 +1363,8 @@ long do_fork(unsigned long clone_flags,
>  	if (!IS_ERR(p)) {
>  		struct completion vfork;
>  
> +		trace_sched_process_fork(current, p);
> +
>  		nr = task_pid_vnr(p);
>  
>  		if (clone_flags & CLONE_PARENT_SETTID)
> Index: linux-2.6-lttng/kernel/signal.c
> ===================================================================
> --- linux-2.6-lttng.orig/kernel/signal.c	2008-07-15 14:49:14.000000000 -0400
> +++ linux-2.6-lttng/kernel/signal.c	2008-07-15 15:12:54.000000000 -0400
> @@ -26,6 +26,7 @@
>  #include <linux/freezer.h>
>  #include <linux/pid_namespace.h>
>  #include <linux/nsproxy.h>
> +#include <trace/sched.h>
>  
>  #include <asm/param.h>
>  #include <asm/uaccess.h>
> @@ -807,6 +808,8 @@ static int send_signal(int sig, struct s
>  	struct sigpending *pending;
>  	struct sigqueue *q;
>  
> +	trace_sched_signal_send(sig, t);
> +
>  	assert_spin_locked(&t->sighand->siglock);
>  	if (!prepare_signal(sig, t))
>  		return 0;

Would it make sense to also put a trace point on receiving a signal?

/me utterly clueless about the whole signal stuff.

> Index: linux-2.6-lttng/include/trace/sched.h
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6-lttng/include/trace/sched.h	2008-07-15 15:12:54.000000000 -0400
> @@ -0,0 +1,45 @@
> +#ifndef _TRACE_SCHED_H
> +#define _TRACE_SCHED_H
> +
> +#include <linux/sched.h>
> +#include <linux/tracepoint.h>
> +
> +DEFINE_TRACE(sched_kthread_stop,
> +	TPPROTO(struct task_struct *t),
> +	TPARGS(t));
> +DEFINE_TRACE(sched_kthread_stop_ret,
> +	TPPROTO(int ret),
> +	TPARGS(ret));
> +DEFINE_TRACE(sched_wait_task,
> +	TPPROTO(struct rq *rq, struct task_struct *p),
> +	TPARGS(rq, p));
> +DEFINE_TRACE(sched_wakeup,
> +	TPPROTO(struct rq *rq, struct task_struct *p),
> +	TPARGS(rq, p));
> +DEFINE_TRACE(sched_wakeup_new,
> +	TPPROTO(struct rq *rq, struct task_struct *p),
> +	TPARGS(rq, p));
> +DEFINE_TRACE(sched_switch,
> +	TPPROTO(struct rq *rq, struct task_struct *prev,
> +		struct task_struct *next),
> +	TPARGS(rq, prev, next));
> +DEFINE_TRACE(sched_migrate_task,
> +	TPPROTO(struct rq *rq, struct task_struct *p, int dest_cpu),
> +	TPARGS(rq, p, dest_cpu));
> +DEFINE_TRACE(sched_process_free,
> +	TPPROTO(struct task_struct *p),
> +	TPARGS(p));
> +DEFINE_TRACE(sched_process_exit,
> +	TPPROTO(struct task_struct *p),
> +	TPARGS(p));
> +DEFINE_TRACE(sched_process_wait,
> +	TPPROTO(struct pid *pid),
> +	TPARGS(pid));
> +DEFINE_TRACE(sched_process_fork,
> +	TPPROTO(struct task_struct *parent, struct task_struct *child),
> +	TPARGS(parent, child));
> +DEFINE_TRACE(sched_signal_send,
> +	TPPROTO(int sig, struct task_struct *p),
> +	TPARGS(sig, p));
> +
> +#endif
> 
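
For readers new to the interface: each DEFINE_TRACE() declaration above
yields a typed trace_<name>() call site together with
register_trace_<name>() and unregister_trace_<name>(), which is how the
ftrace port in this series uses them. The following is a deliberately
reduced userspace model of that shape, not the implementation from the
tracepoints patch (which is more involved; the cover letter notes an RCU
dependency). DEFINE_TRACE_SKETCH, TPPROTO_SKETCH, TPARGS_SKETCH, the
single probe slot and the *_model types are all assumptions.

```c
#include <stddef.h>

/* Pass-through wrappers so a comma-separated list travels as one macro argument. */
#define TPPROTO_SKETCH(...)	__VA_ARGS__
#define TPARGS_SKETCH(...)	__VA_ARGS__

/*
 * Single-probe model: one function pointer per tracepoint and no locking.
 * The macro expands to function definitions, so no trailing semicolon is used.
 */
#define DEFINE_TRACE_SKETCH(name, proto, args)				\
	void (*probe_slot_##name)(proto);				\
	void trace_##name(proto)					\
	{								\
		if (probe_slot_##name)					\
			probe_slot_##name(args);			\
	}								\
	int register_trace_##name(void (*probe)(proto))		\
	{								\
		if (probe_slot_##name)					\
			return -1;	/* slot already taken */	\
		probe_slot_##name = probe;				\
		return 0;						\
	}								\
	void unregister_trace_##name(void (*probe)(proto))		\
	{								\
		if (probe_slot_##name == probe)				\
			probe_slot_##name = NULL;			\
	}

struct rq_model { int cpu; };
struct task_model { int pid; };

DEFINE_TRACE_SKETCH(sched_wakeup,
	TPPROTO_SKETCH(struct rq_model *rq, struct task_model *p),
	TPARGS_SKETCH(rq, p))

int last_cpu = -1, last_pid = -1;

/* Typed probe: arguments arrive directly, with no format string or va_list. */
void probe_wakeup_model(struct rq_model *rq, struct task_model *p)
{
	last_cpu = rq->cpu;
	last_pid = p->pid;
}
```

The point of the typed prototype is visible in the probe: compared with the
marker callback removed by the ftrace port, there is nothing to skip or
decode before reaching rq and p.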



* Re: [patch 07/17] LTTng instrumentation - timer
  2008-07-15 22:26 ` [patch 07/17] LTTng instrumentation - timer Mathieu Desnoyers
@ 2008-07-16  8:34   ` Peter Zijlstra
  2008-07-16 14:34     ` Mathieu Desnoyers
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2008-07-16  8:34 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: akpm, Ingo Molnar, linux-kernel, Masami Hiramatsu,
	David S. Miller, Frank Ch. Eigler, Hideo AOKI, Takashi Nishiie,
	Steven Rostedt, Eduard - Gabriel Munteanu, Thomas Gleixner

On Tue, 2008-07-15 at 18:26 -0400, Mathieu Desnoyers wrote:
> plain text document attachment (lttng-instrumentation-timer.patch)
> Instrument timer activity (timer set, expired, current time updates) to keep
> information about the "real time" flow within the kernel. It can be used by a
> trace analysis tool to synchronize information coming from various sources, e.g.
> to merge traces with system logs.
> 
> Those tracepoints are used by LTTng.
> 
> About the performance impact of tracepoints (which is comparable to markers),
> even without immediate values optimizations, tests done by Hideo Aoki on ia64
> show no regression. His test case was using hackbench on a kernel where
> scheduler instrumentation (about 5 events in the scheduler code) was added.
> See the "Tracepoints" patch header for performance result detail.
> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> CC: 'Ingo Molnar' <mingo@elte.hu>
> CC: "David S. Miller" <davem@davemloft.net>
> CC: Masami Hiramatsu <mhiramat@redhat.com>
> CC: 'Peter Zijlstra' <peterz@infradead.org>
> CC: "Frank Ch. Eigler" <fche@redhat.com>
> CC: 'Hideo AOKI' <haoki@redhat.com>
> CC: Takashi Nishiie <t-nishiie@np.css.fujitsu.com>
> CC: 'Steven Rostedt' <rostedt@goodmis.org>
> CC: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> ---
>  include/trace/timer.h |   24 ++++++++++++++++++++++++
>  kernel/itimer.c       |    5 +++++
>  kernel/timer.c        |    8 +++++++-
>  3 files changed, 36 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6-lttng/kernel/itimer.c
> ===================================================================
> --- linux-2.6-lttng.orig/kernel/itimer.c	2008-07-15 14:49:14.000000000 -0400
> +++ linux-2.6-lttng/kernel/itimer.c	2008-07-15 15:14:28.000000000 -0400
> @@ -12,6 +12,7 @@
>  #include <linux/time.h>
>  #include <linux/posix-timers.h>
>  #include <linux/hrtimer.h>
> +#include <trace/timer.h>
>  
>  #include <asm/uaccess.h>
>  
> @@ -132,6 +133,8 @@ enum hrtimer_restart it_real_fn(struct h
>  	struct signal_struct *sig =
>  		container_of(timer, struct signal_struct, real_timer);
>  
> +	trace_timer_itimer_expired(sig);
> +
>  	kill_pid_info(SIGALRM, SEND_SIG_PRIV, sig->leader_pid);
>  
>  	return HRTIMER_NORESTART;
> @@ -157,6 +160,8 @@ int do_setitimer(int which, struct itime
>  	    !timeval_valid(&value->it_interval))
>  		return -EINVAL;
>  
> +	trace_timer_itimer_set(which, value);
> +
>  	switch (which) {
>  	case ITIMER_REAL:
>  again:
> Index: linux-2.6-lttng/kernel/timer.c
> ===================================================================
> --- linux-2.6-lttng.orig/kernel/timer.c	2008-07-15 14:51:50.000000000 -0400
> +++ linux-2.6-lttng/kernel/timer.c	2008-07-15 15:14:28.000000000 -0400
> @@ -37,12 +37,14 @@
>  #include <linux/delay.h>
>  #include <linux/tick.h>
>  #include <linux/kallsyms.h>
> +#include <trace/timer.h>
>  
>  #include <asm/uaccess.h>
>  #include <asm/unistd.h>
>  #include <asm/div64.h>
>  #include <asm/timex.h>
>  #include <asm/io.h>
> +#include <asm/irq_regs.h>
>  
>  u64 jiffies_64 __cacheline_aligned_in_smp = INITIAL_JIFFIES;
>  
> @@ -288,6 +290,7 @@ static void internal_add_timer(struct tv
>  		i = (expires >> (TVR_BITS + 3 * TVN_BITS)) & TVN_MASK;
>  		vec = base->tv5.vec + i;
>  	}
> +	trace_timer_set(timer);
>  	/*
>  	 * Timers are FIFO:
>  	 */
> @@ -1066,6 +1069,7 @@ void do_timer(unsigned long ticks)
>  {
>  	jiffies_64 += ticks;
>  	update_times(ticks);
> +	trace_timer_update_time(&xtime, &wall_to_monotonic);
>  }

This is a very dangerous trace point - we're holding xtime lock here.

Ah, I see you make that comment below too. Are you sure you want to do
this?

Thomas, any input?
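
To make the concern concrete: assuming xtime_lock is a seqlock held for
writing at this point, a probe that tries to read the time through the usual
seqlock read loop on the same CPU would never see an even sequence count and
would spin forever. A minimal single-threaded userspace model of that hazard
follows. The seqcount_sketch type, its helpers and the bounded retry count
(added so the demonstration terminates) are all assumptions for
illustration, not kernel code.

```c
#include <stdbool.h>

/* Simplified sequence counter: odd means a writer is in progress. */
struct seqcount_sketch {
	unsigned int sequence;
};

void write_begin_sketch(struct seqcount_sketch *s) { s->sequence++; }
void write_end_sketch(struct seqcount_sketch *s) { s->sequence++; }

unsigned int read_begin_sketch(const struct seqcount_sketch *s)
{
	return s->sequence;
}

/* True if the read raced with a writer and must be retried. */
bool read_retry_sketch(const struct seqcount_sketch *s, unsigned int start)
{
	return (start & 1) || s->sequence != start;
}

struct seqcount_sketch xtime_seq_sketch;
long xtime_sec_sketch;

/*
 * Reader loop as a probe might run it. A real reader retries without bound;
 * max_tries exists only so this model can report the would-be livelock.
 * Returns true and stores the value on success, false if it gave up.
 */
bool probe_read_time_sketch(long *out, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		unsigned int seq = read_begin_sketch(&xtime_seq_sketch);
		long sec = xtime_sec_sketch;

		if (!read_retry_sketch(&xtime_seq_sketch, seq)) {
			*out = sec;
			return true;
		}
	}
	return false;
}

/*
 * Models the tick path: write side held (by the caller in the kernel),
 * time updated, tracepoint reached before the write side is released.
 */
bool do_timer_sketch(long *seen_by_probe)
{
	bool ok;

	write_begin_sketch(&xtime_seq_sketch);
	xtime_sec_sketch++;
	ok = probe_read_time_sketch(seen_by_probe, 1000);	/* tracepoint site */
	write_end_sketch(&xtime_seq_sketch);
	return ok;
}
```

In this model the probe can only succeed once the write side has been
released, which is why a probe attached here would have to use the values
passed as tracepoint arguments rather than re-reading the time itself.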

>  #ifdef __ARCH_WANT_SYS_ALARM
> @@ -1147,7 +1151,9 @@ asmlinkage long sys_getegid(void)
>  
>  static void process_timeout(unsigned long __data)
>  {
> -	wake_up_process((struct task_struct *)__data);
> +	struct task_struct *task = (struct task_struct *)__data;
> +	trace_timer_timeout(task);
> +	wake_up_process(task);
>  }
>  
>  /**
> Index: linux-2.6-lttng/include/trace/timer.h
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6-lttng/include/trace/timer.h	2008-07-15 15:14:28.000000000 -0400
> @@ -0,0 +1,24 @@
> +#ifndef _TRACE_TIMER_H
> +#define _TRACE_TIMER_H
> +
> +#include <linux/tracepoint.h>
> +
> +DEFINE_TRACE(timer_itimer_expired,
> +	TPPROTO(struct signal_struct *sig),
> +	TPARGS(sig));
> +DEFINE_TRACE(timer_itimer_set,
> +	TPPROTO(int which, struct itimerval *value),
> +	TPARGS(which, value));
> +DEFINE_TRACE(timer_set,
> +	TPPROTO(struct timer_list *timer),
> +	TPARGS(timer));
> +/*
> + * xtime_lock is taken when kernel_timer_update_time tracepoint is reached.
> + */
> +DEFINE_TRACE(timer_update_time,
> +	TPPROTO(struct timespec *_xtime, struct timespec *_wall_to_monotonic),
> +	TPARGS(_xtime, _wall_to_monotonic));
> +DEFINE_TRACE(timer_timeout,
> +	TPPROTO(struct task_struct *p),
> +	TPARGS(p));
> +#endif
> 



* Re: [patch 09/17] LTTng instrumentation - filemap
  2008-07-15 22:26 ` [patch 09/17] LTTng instrumentation - filemap Mathieu Desnoyers
@ 2008-07-16  8:35   ` Peter Zijlstra
  2008-07-16 14:37     ` Mathieu Desnoyers
  2008-07-17  6:25   ` Nick Piggin
  1 sibling, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2008-07-16  8:35 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: akpm, Ingo Molnar, linux-kernel, Masami Hiramatsu, linux-mm,
	Dave Hansen, Frank Ch. Eigler, Hideo AOKI, Takashi Nishiie,
	Steven Rostedt, Eduard - Gabriel Munteanu

On Tue, 2008-07-15 at 18:26 -0400, Mathieu Desnoyers wrote:
> plain text document attachment (lttng-instrumentation-filemap.patch)
> Instrumentation of waits caused by memory accesses on mmap regions.
> 
> Those tracepoints are used by LTTng.
> 
> About the performance impact of tracepoints (which is comparable to markers),
> even without immediate values optimizations, tests done by Hideo Aoki on ia64
> show no regression. His test case was using hackbench on a kernel where
> scheduler instrumentation (about 5 events in the scheduler code) was added.
> See the "Tracepoints" patch header for performance result detail.
> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> CC: linux-mm@kvack.org
> CC: Dave Hansen <haveblue@us.ibm.com>
> CC: Masami Hiramatsu <mhiramat@redhat.com>
> CC: 'Peter Zijlstra' <peterz@infradead.org>
> CC: "Frank Ch. Eigler" <fche@redhat.com>
> CC: 'Ingo Molnar' <mingo@elte.hu>
> CC: 'Hideo AOKI' <haoki@redhat.com>
> CC: Takashi Nishiie <t-nishiie@np.css.fujitsu.com>
> CC: 'Steven Rostedt' <rostedt@goodmis.org>
> CC: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> ---
>  include/trace/filemap.h |   13 +++++++++++++
>  mm/filemap.c            |    3 +++
>  2 files changed, 16 insertions(+)
> 
> Index: linux-2.6-lttng/mm/filemap.c
> ===================================================================
> --- linux-2.6-lttng.orig/mm/filemap.c	2008-07-15 14:51:50.000000000 -0400
> +++ linux-2.6-lttng/mm/filemap.c	2008-07-15 15:14:46.000000000 -0400
> @@ -33,6 +33,7 @@
>  #include <linux/cpuset.h>
>  #include <linux/hardirq.h> /* for BUG_ON(!in_atomic()) only */
>  #include <linux/memcontrol.h>
> +#include <trace/filemap.h>
>  #include "internal.h"
>  
>  /*
> @@ -541,9 +542,11 @@ void wait_on_page_bit(struct page *page,
>  {
>  	DEFINE_WAIT_BIT(wait, &page->flags, bit_nr);
>  
> +	trace_filemap_wait_start(page, bit_nr);
>  	if (test_bit(bit_nr, &page->flags))
>  		__wait_on_bit(page_waitqueue(page), &wait, sync_page,
>  							TASK_UNINTERRUPTIBLE);
> +	trace_filemap_wait_end(page, bit_nr);
>  }
>  EXPORT_SYMBOL(wait_on_page_bit);

I don't like the trace_filemap_wait_* naming..

trace_wait_on_page_* might make more sense

> Index: linux-2.6-lttng/include/trace/filemap.h
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6-lttng/include/trace/filemap.h	2008-07-15 15:14:46.000000000 -0400
> @@ -0,0 +1,13 @@
> +#ifndef _TRACE_FILEMAP_H
> +#define _TRACE_FILEMAP_H
> +
> +#include <linux/tracepoint.h>
> +
> +DEFINE_TRACE(filemap_wait_start,
> +	TPPROTO(struct page *page, int bit_nr),
> +	TPARGS(page, bit_nr));
> +DEFINE_TRACE(filemap_wait_end,
> +	TPPROTO(struct page *page, int bit_nr),
> +	TPARGS(page, bit_nr));
> +
> +#endif
> 



* Re: [patch 10/17] LTTng instrumentation - swap
  2008-07-15 22:26 ` [patch 10/17] LTTng instrumentation - swap Mathieu Desnoyers
@ 2008-07-16  8:39   ` Peter Zijlstra
  2008-07-16 14:40     ` Mathieu Desnoyers
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2008-07-16  8:39 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: akpm, Ingo Molnar, linux-kernel, Masami Hiramatsu, linux-mm,
	Dave Hansen, Frank Ch. Eigler, Hideo AOKI, Takashi Nishiie,
	Steven Rostedt, Eduard - Gabriel Munteanu

On Tue, 2008-07-15 at 18:26 -0400, Mathieu Desnoyers wrote:
> plain text document attachment (lttng-instrumentation-swap.patch)
> Instrumentation of waits caused by swap activity. Also instruments
> swapon/swapoff events to keep track of active swap partitions.
> 
> Those tracepoints are used by LTTng.
> 
> About the performance impact of tracepoints (which is comparable to markers),
> even without immediate values optimizations, tests done by Hideo Aoki on ia64
> show no regression. His test case was using hackbench on a kernel where
> scheduler instrumentation (about 5 events in the scheduler code) was added.
> See the "Tracepoints" patch header for performance result detail.
> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> CC: linux-mm@kvack.org
> CC: Dave Hansen <haveblue@us.ibm.com>
> CC: Masami Hiramatsu <mhiramat@redhat.com>
> CC: 'Peter Zijlstra' <peterz@infradead.org>
> CC: "Frank Ch. Eigler" <fche@redhat.com>
> CC: 'Ingo Molnar' <mingo@elte.hu>
> CC: 'Hideo AOKI' <haoki@redhat.com>
> CC: Takashi Nishiie <t-nishiie@np.css.fujitsu.com>
> CC: 'Steven Rostedt' <rostedt@goodmis.org>
> CC: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> ---
>  include/trace/swap.h |   20 ++++++++++++++++++++
>  mm/memory.c          |    2 ++
>  mm/page_io.c         |    2 ++
>  mm/swapfile.c        |    4 ++++
>  4 files changed, 28 insertions(+)
> 
> Index: linux-2.6-lttng/mm/memory.c
> ===================================================================
> --- linux-2.6-lttng.orig/mm/memory.c	2008-07-15 13:54:46.000000000 -0400
> +++ linux-2.6-lttng/mm/memory.c	2008-07-15 14:02:54.000000000 -0400
> @@ -51,6 +51,7 @@
>  #include <linux/init.h>
>  #include <linux/writeback.h>
>  #include <linux/memcontrol.h>
> +#include <trace/swap.h>
>  
>  #include <asm/pgalloc.h>
>  #include <asm/uaccess.h>
> @@ -2213,6 +2214,7 @@ static int do_swap_page(struct mm_struct
>  		/* Had to read the page from swap area: Major fault */
>  		ret = VM_FAULT_MAJOR;
>  		count_vm_event(PGMAJFAULT);
> +		trace_swap_in(page, entry);
>  	}
>  
>  	if (mem_cgroup_charge(page, mm, GFP_KERNEL)) {
> Index: linux-2.6-lttng/mm/page_io.c
> ===================================================================
> --- linux-2.6-lttng.orig/mm/page_io.c	2008-07-15 13:54:46.000000000 -0400
> +++ linux-2.6-lttng/mm/page_io.c	2008-07-15 14:02:54.000000000 -0400
> @@ -17,6 +17,7 @@
>  #include <linux/bio.h>
>  #include <linux/swapops.h>
>  #include <linux/writeback.h>
> +#include <trace/swap.h>
>  #include <asm/pgtable.h>
>  
>  static struct bio *get_swap_bio(gfp_t gfp_flags, pgoff_t index,
> @@ -114,6 +115,7 @@ int swap_writepage(struct page *page, st
>  		rw |= (1 << BIO_RW_SYNC);
>  	count_vm_event(PSWPOUT);
>  	set_page_writeback(page);
> +	trace_swap_out(page);
>  	unlock_page(page);
>  	submit_bio(rw, bio);
>  out:
> Index: linux-2.6-lttng/mm/swapfile.c
> ===================================================================
> --- linux-2.6-lttng.orig/mm/swapfile.c	2008-07-15 13:54:46.000000000 -0400
> +++ linux-2.6-lttng/mm/swapfile.c	2008-07-15 14:02:54.000000000 -0400
> @@ -32,6 +32,7 @@
>  #include <asm/pgtable.h>
>  #include <asm/tlbflush.h>
>  #include <linux/swapops.h>
> +#include <trace/swap.h>
>  
>  DEFINE_SPINLOCK(swap_lock);
>  unsigned int nr_swapfiles;
> @@ -1310,6 +1311,7 @@ asmlinkage long sys_swapoff(const char _
>  	swap_map = p->swap_map;
>  	p->swap_map = NULL;
>  	p->flags = 0;
> +	trace_swap_file_close(swap_file);
>  	spin_unlock(&swap_lock);
>  	mutex_unlock(&swapon_mutex);
>  	vfree(swap_map);
> @@ -1695,6 +1697,7 @@ asmlinkage long sys_swapon(const char __
>  	} else {
>  		swap_info[prev].next = p - swap_info;
>  	}
> +	trace_swap_file_open(swap_file, name);
>  	spin_unlock(&swap_lock);
>  	mutex_unlock(&swapon_mutex);
>  	error = 0;
> @@ -1796,6 +1799,7 @@ get_swap_info_struct(unsigned type)
>  {
>  	return &swap_info[type];
>  }
> +EXPORT_SYMBOL_GPL(get_swap_info_struct);

I'm not too happy with this export.

>  
>  /*
>   * swap_lock prevents swap_map being freed. Don't grab an extra
> Index: linux-2.6-lttng/include/trace/swap.h
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6-lttng/include/trace/swap.h	2008-07-15 14:02:54.000000000 -0400
> @@ -0,0 +1,20 @@
> +#ifndef _TRACE_SWAP_H
> +#define _TRACE_SWAP_H
> +
> +#include <linux/swap.h>
> +#include <linux/tracepoint.h>
> +
> +DEFINE_TRACE(swap_in,
> +	TPPROTO(struct page *page, swp_entry_t entry),
> +	TPARGS(page, entry));
> +DEFINE_TRACE(swap_out,
> +	TPPROTO(struct page *page),
> +	TPARGS(page));
> +DEFINE_TRACE(swap_file_open,
> +	TPPROTO(struct file *file, char *filename),
> +	TPARGS(file, filename));
> +DEFINE_TRACE(swap_file_close,
> +	TPPROTO(struct file *file),
> +	TPARGS(file));
> +
> +#endif
> 



* Re: [patch 12/17] LTTng instrumentation - page
  2008-07-15 22:26 ` [patch 12/17] LTTng instrumentation - page Mathieu Desnoyers
@ 2008-07-16  8:41   ` Peter Zijlstra
  2008-07-16 15:03     ` Mathieu Desnoyers
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2008-07-16  8:41 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: akpm, Ingo Molnar, linux-kernel, Masami Hiramatsu, Martin Bligh,
	Frank Ch. Eigler, Hideo AOKI, Takashi Nishiie, Steven Rostedt,
	Eduard - Gabriel Munteanu

On Tue, 2008-07-15 at 18:26 -0400, Mathieu Desnoyers wrote:
> plain text document attachment (lttng-instrumentation-page.patch)
> Paging activity instrumentation. Instruments page allocation/free to keep track
> of page allocation. This does not cover hugetlb activity, which is covered by a
> separate patch.
> 
> Those tracepoints are used by LTTng.
> 
> About the performance impact of tracepoints (which is comparable to markers),
> even without immediate values optimizations, tests done by Hideo Aoki on ia64
> show no regression. His test case was using hackbench on a kernel where
> scheduler instrumentation (about 5 events in the scheduler code) was added.
> See the "Tracepoints" patch header for performance result detail.
> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> CC: Martin Bligh <mbligh@google.com>
> CC: Masami Hiramatsu <mhiramat@redhat.com>
> CC: 'Peter Zijlstra' <peterz@infradead.org>
> CC: "Frank Ch. Eigler" <fche@redhat.com>
> CC: 'Ingo Molnar' <mingo@elte.hu>
> CC: 'Hideo AOKI' <haoki@redhat.com>
> CC: Takashi Nishiie <t-nishiie@np.css.fujitsu.com>
> CC: 'Steven Rostedt' <rostedt@goodmis.org>
> CC: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> ---
>  include/trace/page.h |   16 ++++++++++++++++
>  mm/page_alloc.c      |    6 ++++++
>  2 files changed, 22 insertions(+)
> 
> Index: linux-2.6-lttng/mm/page_alloc.c
> ===================================================================
> --- linux-2.6-lttng.orig/mm/page_alloc.c	2008-07-15 13:54:46.000000000 -0400
> +++ linux-2.6-lttng/mm/page_alloc.c	2008-07-15 14:04:38.000000000 -0400
> @@ -46,6 +46,7 @@
>  #include <linux/page-isolation.h>
>  #include <linux/memcontrol.h>
>  #include <linux/debugobjects.h>
> +#include <trace/page.h>
>  
>  #include <asm/tlbflush.h>
>  #include <asm/div64.h>
> @@ -510,6 +511,8 @@ static void __free_pages_ok(struct page 
>  	int i;
>  	int reserved = 0;
>  
> +	trace_page_free(page, order);
> +
>  	for (i = 0 ; i < (1 << order) ; ++i)
>  		reserved += free_pages_check(page + i);
>  	if (reserved)
> @@ -966,6 +969,8 @@ static void free_hot_cold_page(struct pa
>  	struct per_cpu_pages *pcp;
>  	unsigned long flags;
>  
> +	trace_page_free(page, 0);
> +
>  	if (PageAnon(page))
>  		page->mapping = NULL;
>  	if (free_pages_check(page))
> @@ -1630,6 +1635,7 @@ nopage:
>  		show_mem();
>  	}
>  got_pg:
> +	trace_page_alloc(page, order);
>  	return page;
>  }
>  
> Index: linux-2.6-lttng/include/trace/page.h
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6-lttng/include/trace/page.h	2008-07-15 14:04:38.000000000 -0400

This name seems inconsistent with your other choices.

include/trace/page_alloc.h comes to mind

> @@ -0,0 +1,16 @@
> +#ifndef _TRACE_PAGE_H
> +#define _TRACE_PAGE_H
> +
> +#include <linux/tracepoint.h>
> +
> +/*
> + * mm_page_alloc : page can be NULL.
> + */
> +DEFINE_TRACE(page_alloc,
> +	TPPROTO(struct page *page, unsigned int order),
> +	TPARGS(page, order));
> +DEFINE_TRACE(page_free,
> +	TPPROTO(struct page *page, unsigned int order),
> +	TPARGS(page, order));
> +
> +#endif
> 



* Re: [patch 00/17] Tracepoints v4 for linux-next
  2008-07-15 22:26 [patch 00/17] Tracepoints v4 for linux-next Mathieu Desnoyers
                   ` (16 preceding siblings ...)
  2008-07-15 22:26 ` [patch 17/17] ftrace port to tracepoints Mathieu Desnoyers
@ 2008-07-16  8:51 ` Peter Zijlstra
  2008-07-18 15:41 ` Masami Hiramatsu
  18 siblings, 0 replies; 51+ messages in thread
From: Peter Zijlstra @ 2008-07-16  8:51 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: akpm, Ingo Molnar, linux-kernel, Masami Hiramatsu

On Tue, 2008-07-15 at 18:26 -0400, Mathieu Desnoyers wrote:
> Hi,
> 
> Here is the newest release of the Tracepoints, following the feedback from Peter
> Zijlstra. The main change is the creation of include/trace/ as a placeholder
> for tracepoint headers.
> 
> The patchset applies over linux-next patch-v2.6.26-next-20080715 in this order :
> 
> #This a separate RCU update upon which the tracepoints depend
> rcu-read-sched.patch

ACK

> tracepoints.patch
> tracepoints-documentation.patch
> tracepoints-samples.patch

3x ACK

> lttng-instrumentation-irq.patch
> lttng-instrumentation-scheduler.patch
> lttng-instrumentation-timer.patch
> lttng-instrumentation-kernel.patch
> 
> lttng-instrumentation-filemap.patch
> lttng-instrumentation-swap.patch
> lttng-instrumentation-memory.patch
> lttng-instrumentation-page.patch
> lttng-instrumentation-hugetlb.patch
> 
> lttng-instrumentation-net.patch
> lttng-instrumentation-ipv4.patch
> lttng-instrumentation-ipv6.patch

posted individual comments where I saw some

> ftrace-port-to-tracepoints.patch

Nice cleanup..



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [patch 06/17] LTTng instrumentation - scheduler
  2008-07-16  8:30   ` Peter Zijlstra
@ 2008-07-16 14:18     ` Mathieu Desnoyers
  0 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2008-07-16 14:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: akpm, Ingo Molnar, linux-kernel, Masami Hiramatsu,
	Steven Rostedt, Thomas Gleixner, Frank Ch. Eigler, Hideo AOKI,
	Takashi Nishiie, Eduard - Gabriel Munteanu

* Peter Zijlstra (peterz@infradead.org) wrote:
> On Tue, 2008-07-15 at 18:26 -0400, Mathieu Desnoyers wrote:
> > plain text document attachment (lttng-instrumentation-scheduler.patch)
> > Instrument the scheduler activity (sched_switch, migration, wakeups, wait for a
> > task, signal delivery) and process/thread creation/destruction (fork, exit,
> > kthread stop). Actually, kthread creation is not instrumented in this patch
> > because it is architecture dependent. It allows to connect tracers such as
> > ftrace which detects scheduling latencies, good/bad scheduler decisions. Tools
> > like LTTng can export this scheduler information along with instrumentation of
> > the rest of the kernel activity to perform post-mortem analysis on the scheduler
> > activity.
> > 
> > About the performance impact of tracepoints (which is comparable to markers),
> > even without immediate values optimizations, tests done by Hideo Aoki on ia64
> > show no regression. His test case was using hackbench on a kernel where
> > scheduler instrumentation (about 5 events in code scheduler code) was added.
> > See the "Tracepoints" patch header for performance result detail.
> > 
> > Changelog :
> > - Change instrumentation location and parameter to match ftrace instrumentation,
> >   previously done with kernel markers.
> > 
> > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> > CC: 'Peter Zijlstra' <peterz@infradead.org>
> > CC: 'Steven Rostedt' <rostedt@goodmis.org>
> > CC: Thomas Gleixner <tglx@linutronix.de>
> > CC: Masami Hiramatsu <mhiramat@redhat.com>
> > CC: "Frank Ch. Eigler" <fche@redhat.com>
> > CC: 'Ingo Molnar' <mingo@elte.hu>
> > CC: 'Hideo AOKI' <haoki@redhat.com>
> > CC: Takashi Nishiie <t-nishiie@np.css.fujitsu.com>
> > CC: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> > ---
> >  include/trace/sched.h |   45 +++++++++++++++++++++++++++++++++++++++++++++
> >  kernel/exit.c         |    6 ++++++
> >  kernel/fork.c         |    3 +++
> >  kernel/kthread.c      |    5 +++++
> >  kernel/sched.c        |   17 ++++++-----------
> >  kernel/signal.c       |    3 +++
> >  6 files changed, 68 insertions(+), 11 deletions(-)
> > 
> > Index: linux-2.6-lttng/kernel/kthread.c
> > ===================================================================
> > --- linux-2.6-lttng.orig/kernel/kthread.c	2008-07-15 14:51:49.000000000 -0400
> > +++ linux-2.6-lttng/kernel/kthread.c	2008-07-15 15:12:54.000000000 -0400
> > @@ -13,6 +13,7 @@
> >  #include <linux/file.h>
> >  #include <linux/module.h>
> >  #include <linux/mutex.h>
> > +#include <trace/sched.h>
> >  
> >  #define KTHREAD_NICE_LEVEL (-5)
> >  
> > @@ -187,6 +188,8 @@ int kthread_stop(struct task_struct *k)
> >  	/* It could exit after stop_info.k set, but before wake_up_process. */
> >  	get_task_struct(k);
> >  
> > +	trace_sched_kthread_stop(k);
> > +
> >  	/* Must init completion *before* thread sees kthread_stop_info.k */
> >  	init_completion(&kthread_stop_info.done);
> >  	smp_wmb();
> > @@ -202,6 +205,8 @@ int kthread_stop(struct task_struct *k)
> >  	ret = kthread_stop_info.err;
> >  	mutex_unlock(&kthread_stop_lock);
> >  
> > +	trace_sched_kthread_stop_ret(ret);
> > +
> >  	return ret;
> >  }
> >  EXPORT_SYMBOL(kthread_stop);
> 
> Why do we need two trace points in this function?
> 

As I understand it, if we look at the complete function:

int kthread_stop(struct task_struct *k)
{
        int ret;

        mutex_lock(&kthread_stop_lock);

        /* It could exit after stop_info.k set, but before wake_up_process. */
        get_task_struct(k);

        trace_sched_kthread_stop(k);

        /* Must init completion *before* thread sees kthread_stop_info.k */
        init_completion(&kthread_stop_info.done);
        smp_wmb();

        /* Now set kthread_should_stop() to true, and wake it up. */
        kthread_stop_info.k = k;
        wake_up_process(k);
        put_task_struct(k);

--> starting from this point, k can be freed by the thread being stopped
and exiting. Therefore, if we want an identifier about the thread about
to stop, we have to instrument before this point.

        /* Once it dies, reset stop ptr, gather result and we're done. */
        wait_for_completion(&kthread_stop_info.done);
        kthread_stop_info.k = NULL;
        ret = kthread_stop_info.err;
        mutex_unlock(&kthread_stop_lock);

        trace_sched_kthread_stop_ret(ret);

---> the success of this operation is only known after "ret" is
assigned. So this is why I couldn't find one tracepoint to have both
valid "k" and "ret" at the same time.

        return ret;
}
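
To illustrate how a tracer could still tie the two events together, here is a
small user-space model (not kernel code). The names task_model,
probe_kthread_stop and probe_kthread_stop_ret are made up for this sketch, and
the thread-local variable stands in for per-caller storage, which I assume is
needed because the second tracepoint fires after kthread_stop_lock is released:

```c
#include <assert.h>

/* Hypothetical stand-in for struct task_struct; only the pid matters here. */
struct task_model {
	int pid;
};

/*
 * Tracer-side state, kept per calling thread. Both probes run in the
 * context of the kthread_stop() caller, so per-caller storage is enough.
 */
static _Thread_local int pending_stop_pid = -1;

/* Models a probe on sched_kthread_stop: k is still safe to dereference. */
static void probe_kthread_stop(const struct task_model *k)
{
	pending_stop_pid = k->pid;
}

/*
 * Models a probe on sched_kthread_stop_ret: only ret is available, so the
 * pid comes from what the first probe saved. Returns the paired pid.
 */
static int probe_kthread_stop_ret(int ret)
{
	int pid = pending_stop_pid;

	(void)ret;		/* a real tracer would log ret next to pid */
	assert(pid != -1);	/* a stop_ret event needs a preceding stop event */
	pending_stop_pid = -1;
	return pid;
}
```

The first probe copies what it needs out of k while the reference is still
held; the second one never touches k at all.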


> > Index: linux-2.6-lttng/kernel/sched.c
> > ===================================================================
> > --- linux-2.6-lttng.orig/kernel/sched.c	2008-07-15 14:51:50.000000000 -0400
> > +++ linux-2.6-lttng/kernel/sched.c	2008-07-15 15:13:49.000000000 -0400
> > @@ -71,6 +71,7 @@
> >  #include <linux/debugfs.h>
> >  #include <linux/ctype.h>
> >  #include <linux/ftrace.h>
> > +#include <trace/sched.h>
> >  
> >  #include <asm/tlb.h>
> >  #include <asm/irq_regs.h>
> > @@ -1987,6 +1988,7 @@ void wait_task_inactive(struct task_stru
> >  		 * just go back and repeat.
> >  		 */
> >  		rq = task_rq_lock(p, &flags);
> > +		trace_sched_wait_task(rq, p);
> >  		running = task_running(rq, p);
> >  		on_rq = p->se.on_rq;
> >  		task_rq_unlock(rq, &flags);
> > @@ -2337,9 +2339,7 @@ out_activate:
> >  	success = 1;
> >  
> >  out_running:
> > -	trace_mark(kernel_sched_wakeup,
> > -		"pid %d state %ld ## rq %p task %p rq->curr %p",
> > -		p->pid, p->state, rq, p, rq->curr);
> > +	trace_sched_wakeup(rq, p);
> >  	check_preempt_curr(rq, p);
> >  
> >  	p->state = TASK_RUNNING;
> > @@ -2472,9 +2472,7 @@ void wake_up_new_task(struct task_struct
> >  		p->sched_class->task_new(rq, p);
> >  		inc_nr_running(rq);
> >  	}
> > -	trace_mark(kernel_sched_wakeup_new,
> > -		"pid %d state %ld ## rq %p task %p rq->curr %p",
> > -		p->pid, p->state, rq, p, rq->curr);
> > +	trace_sched_wakeup_new(rq, p);
> >  	check_preempt_curr(rq, p);
> >  #ifdef CONFIG_SMP
> >  	if (p->sched_class->task_wake_up)
> > @@ -2647,11 +2645,7 @@ context_switch(struct rq *rq, struct tas
> >  	struct mm_struct *mm, *oldmm;
> >  
> >  	prepare_task_switch(rq, prev, next);
> > -	trace_mark(kernel_sched_schedule,
> > -		"prev_pid %d next_pid %d prev_state %ld "
> > -		"## rq %p prev %p next %p",
> > -		prev->pid, next->pid, prev->state,
> > -		rq, prev, next);
> > +	trace_sched_switch(rq, prev, next);
> >  	mm = next->mm;
> >  	oldmm = prev->active_mm;
> >  	/*
> > @@ -2884,6 +2878,7 @@ static void sched_migrate_task(struct ta
> >  	    || unlikely(cpu_is_offline(dest_cpu)))
> >  		goto out;
> >  
> > +	trace_sched_migrate_task(rq, p, dest_cpu);
> >  	/* force the process onto the specified CPU */
> >  	if (migrate_task(p, dest_cpu, &req)) {
> >  		/* Need to wait for migration thread (might exit: take ref). */
> > Index: linux-2.6-lttng/kernel/exit.c
> > ===================================================================
> > --- linux-2.6-lttng.orig/kernel/exit.c	2008-07-15 14:51:49.000000000 -0400
> > +++ linux-2.6-lttng/kernel/exit.c	2008-07-15 15:12:54.000000000 -0400
> > @@ -46,6 +46,7 @@
> >  #include <linux/resource.h>
> >  #include <linux/blkdev.h>
> >  #include <linux/task_io_accounting_ops.h>
> > +#include <trace/sched.h>
> >  
> >  #include <asm/uaccess.h>
> >  #include <asm/unistd.h>
> > @@ -149,6 +150,7 @@ static void __exit_signal(struct task_st
> >  
> >  static void delayed_put_task_struct(struct rcu_head *rhp)
> >  {
> > +	trace_sched_process_free(container_of(rhp, struct task_struct, rcu));
> >  	put_task_struct(container_of(rhp, struct task_struct, rcu));
> >  }
> 
> It might make sense to write it like:
> 
> static void delayed_put_task_struct(struct rcu_head *rhp)
> {
> 	struct task_struct *tsk = container_of(rhp, struct task_struct, rcu);
> 
> 	trace_sched_process_free(tsk);
> 	put_task_struct(tsk);
> }
> 

Yep, will fix.
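
For reference, a self-contained user-space sketch of that pattern. The
container_of macro is re-declared locally, and the *_model types and functions
are placeholders I made up for the example, not the kernel ones:

```c
#include <assert.h>
#include <stddef.h>

/* Local copy of the usual container_of idiom, for illustration only. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for struct rcu_head and struct task_struct. */
struct rcu_head_model {
	void *next;
};

struct task_model {
	int pid;
	int usage;
	struct rcu_head_model rcu;
};

static int last_freed_pid = -1;

/* Stands in for the tracepoint: records which task is being freed. */
static void trace_process_free_model(const struct task_model *tsk)
{
	last_freed_pid = tsk->pid;
}

/* Stands in for put_task_struct(): drops one reference. */
static void put_task_model(struct task_model *tsk)
{
	assert(tsk->usage > 0);
	tsk->usage--;
}

/* Resolve the enclosing task once, then use the pointer for both calls. */
static void delayed_put_task_model(struct rcu_head_model *rhp)
{
	struct task_model *tsk = container_of(rhp, struct task_model, rcu);

	trace_process_free_model(tsk);
	put_task_model(tsk);
}
```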

> > @@ -1040,6 +1042,8 @@ NORET_TYPE void do_exit(long code)
> >  
> >  	if (group_dead)
> >  		acct_process();
> > +	trace_sched_process_exit(tsk);
> > +
> >  	exit_sem(tsk);
> >  	exit_files(tsk);
> >  	exit_fs(tsk);
> > @@ -1524,6 +1528,8 @@ static long do_wait(enum pid_type type, 
> >  	struct task_struct *tsk;
> >  	int flag, retval;
> >  
> > +	trace_sched_process_wait(pid);
> > +
> >  	add_wait_queue(&current->signal->wait_chldexit,&wait);
> >  repeat:
> >  	/* If there is nothing that can match our critier just get out */
> > Index: linux-2.6-lttng/kernel/fork.c
> > ===================================================================
> > --- linux-2.6-lttng.orig/kernel/fork.c	2008-07-15 14:51:49.000000000 -0400
> > +++ linux-2.6-lttng/kernel/fork.c	2008-07-15 15:14:23.000000000 -0400
> > @@ -56,6 +56,7 @@
> >  #include <linux/proc_fs.h>
> >  #include <linux/blkdev.h>
> >  #include <linux/magic.h>
> > +#include <trace/sched.h>
> >  
> >  #include <asm/pgtable.h>
> >  #include <asm/pgalloc.h>
> > @@ -1362,6 +1363,8 @@ long do_fork(unsigned long clone_flags,
> >  	if (!IS_ERR(p)) {
> >  		struct completion vfork;
> >  
> > +		trace_sched_process_fork(current, p);
> > +
> >  		nr = task_pid_vnr(p);
> >  
> >  		if (clone_flags & CLONE_PARENT_SETTID)
> > Index: linux-2.6-lttng/kernel/signal.c
> > ===================================================================
> > --- linux-2.6-lttng.orig/kernel/signal.c	2008-07-15 14:49:14.000000000 -0400
> > +++ linux-2.6-lttng/kernel/signal.c	2008-07-15 15:12:54.000000000 -0400
> > @@ -26,6 +26,7 @@
> >  #include <linux/freezer.h>
> >  #include <linux/pid_namespace.h>
> >  #include <linux/nsproxy.h>
> > +#include <trace/sched.h>
> >  
> >  #include <asm/param.h>
> >  #include <asm/uaccess.h>
> > @@ -807,6 +808,8 @@ static int send_signal(int sig, struct s
> >  	struct sigpending *pending;
> >  	struct sigqueue *q;
> >  
> > +	trace_sched_signal_send(sig, t);
> > +
> >  	assert_spin_locked(&t->sighand->siglock);
> >  	if (!prepare_signal(sig, t))
> >  		return 0;
> 
> Would it make sense to also put a trace point on receiveing a signal?
> 
> /me utterly clueless about the whole signal stuff.
> 

When the signal is sent, it is added to the pending signal list of the
target process. Unless the process blocks the signal, its TIF_SIGPENDING
thread flag is set and the process (actually, all threads in the
process) is then woken up. If signals are blocked by that process, it
will set its own TIF_SIGPENDING flag when it unblocks the signals.
recalc_sigpending() (as well as a few other code paths) sets/clears the
thread flag TIF_SIGPENDING, which controls delivery of signals on
interrupt, trap and syscall return paths to userspace.

When TIF_SIGPENDING is set, do_signal(regs) is called. It is
architecture specific (I'm currently looking at x86_32). It delivers the
signal by calling the arch-specific handle_signal().

So I think that handle_signal() could be an interesting function to
instrument within each architecture to know exactly when the signal
stack is setup. However, since we are currently discussing
architecture-independent instrumentation, instrumenting all

  set_tsk_thread_flag(p, TIF_SIGPENDING);

statements would be good to learn whenever the signal is being seen by
the target process.

Since this thread flag is set at 3 locations in signal.c, putting a
wrapper around set_tsk_thread_flag(t, TIF_SIGPENDING); and inserting a
tracepoint with each call could make sense.

But we can also decide that knowing when a signal is sent is enough,
and that by simply inserting the correct architecture-specific
instrumentation we learn when the signal is delivered...
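
A rough user-space model of the wrapper idea, to make it concrete. None of
these names exist in the kernel: set_sigpending_traced, the *_model helpers
and the flag bit number are assumptions for the sketch only:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical bit number; the real TIF_SIGPENDING value is per-arch. */
#define TIF_SIGPENDING_MODEL 2

/* Hypothetical stand-in for the task and its thread flags. */
struct task_model {
	int pid;
	unsigned long flags;
};

static int sigpending_events;

/* Stands in for a tracepoint: counts how often the flag gets raised. */
static void trace_signal_pending_model(const struct task_model *t)
{
	(void)t;
	sigpending_events++;
}

/* Stands in for set_tsk_thread_flag(). */
static void set_tsk_thread_flag_model(struct task_model *t, int flag)
{
	assert(flag >= 0 && flag < (int)(sizeof(t->flags) * CHAR_BIT));
	t->flags |= 1UL << flag;
}

/* Single wrapper that every caller would use instead of the open-coded call. */
static void set_sigpending_traced(struct task_model *t)
{
	trace_signal_pending_model(t);
	set_tsk_thread_flag_model(t, TIF_SIGPENDING_MODEL);
}
```

With one wrapper, the instrumentation lives in a single place instead of being
repeated at each call site.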

Mathieu

> > Index: linux-2.6-lttng/include/trace/sched.h
> > ===================================================================
> > --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> > +++ linux-2.6-lttng/include/trace/sched.h	2008-07-15 15:12:54.000000000 -0400
> > @@ -0,0 +1,45 @@
> > +#ifndef _TRACE_SCHED_H
> > +#define _TRACE_SCHED_H
> > +
> > +#include <linux/sched.h>
> > +#include <linux/tracepoint.h>
> > +
> > +DEFINE_TRACE(sched_kthread_stop,
> > +	TPPROTO(struct task_struct *t),
> > +	TPARGS(t));
> > +DEFINE_TRACE(sched_kthread_stop_ret,
> > +	TPPROTO(int ret),
> > +	TPARGS(ret));
> > +DEFINE_TRACE(sched_wait_task,
> > +	TPPROTO(struct rq *rq, struct task_struct *p),
> > +	TPARGS(rq, p));
> > +DEFINE_TRACE(sched_wakeup,
> > +	TPPROTO(struct rq *rq, struct task_struct *p),
> > +	TPARGS(rq, p));
> > +DEFINE_TRACE(sched_wakeup_new,
> > +	TPPROTO(struct rq *rq, struct task_struct *p),
> > +	TPARGS(rq, p));
> > +DEFINE_TRACE(sched_switch,
> > +	TPPROTO(struct rq *rq, struct task_struct *prev,
> > +		struct task_struct *next),
> > +	TPARGS(rq, prev, next));
> > +DEFINE_TRACE(sched_migrate_task,
> > +	TPPROTO(struct rq *rq, struct task_struct *p, int dest_cpu),
> > +	TPARGS(rq, p, dest_cpu));
> > +DEFINE_TRACE(sched_process_free,
> > +	TPPROTO(struct task_struct *p),
> > +	TPARGS(p));
> > +DEFINE_TRACE(sched_process_exit,
> > +	TPPROTO(struct task_struct *p),
> > +	TPARGS(p));
> > +DEFINE_TRACE(sched_process_wait,
> > +	TPPROTO(struct pid *pid),
> > +	TPARGS(pid));
> > +DEFINE_TRACE(sched_process_fork,
> > +	TPPROTO(struct task_struct *parent, struct task_struct *child),
> > +	TPARGS(parent, child));
> > +DEFINE_TRACE(sched_signal_send,
> > +	TPPROTO(int sig, struct task_struct *p),
> > +	TPARGS(sig, p));
> > +
> > +#endif
> > 
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [patch 07/17] LTTng instrumentation - timer
  2008-07-16  8:34   ` Peter Zijlstra
@ 2008-07-16 14:34     ` Mathieu Desnoyers
  0 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2008-07-16 14:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: akpm, Ingo Molnar, linux-kernel, Masami Hiramatsu,
	David S. Miller, Frank Ch. Eigler, Hideo AOKI, Takashi Nishiie,
	Steven Rostedt, Eduard - Gabriel Munteanu, Thomas Gleixner

* Peter Zijlstra (peterz@infradead.org) wrote:
> On Tue, 2008-07-15 at 18:26 -0400, Mathieu Desnoyers wrote:
> > plain text document attachment (lttng-instrumentation-timer.patch)
> > Instrument timer activity (timer set, expired, current time updates) to keep
> > information about the "real time" flow within the kernel. It can be used by a
> > trace analysis tool to synchronize information coming from various sources, e.g.
> > to merge traces with system logs.
> > 
> > Those tracepoints are used by LTTng.
> > 
> > About the performance impact of tracepoints (which is comparable to markers),
> > even without immediate values optimizations, tests done by Hideo Aoki on ia64
> > show no regression. His test case was using hackbench on a kernel where
> > scheduler instrumentation (about 5 events in code scheduler code) was added.
> > See the "Tracepoints" patch header for performance result detail.
> > 
> > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> > CC: 'Ingo Molnar' <mingo@elte.hu>
> > CC: "David S. Miller" <davem@davemloft.net>
> > CC: Masami Hiramatsu <mhiramat@redhat.com>
> > CC: 'Peter Zijlstra' <peterz@infradead.org>
> > CC: "Frank Ch. Eigler" <fche@redhat.com>
> > CC: 'Hideo AOKI' <haoki@redhat.com>
> > CC: Takashi Nishiie <t-nishiie@np.css.fujitsu.com>
> > CC: 'Steven Rostedt' <rostedt@goodmis.org>
> > CC: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> > ---
> >  include/trace/timer.h |   24 ++++++++++++++++++++++++
> >  kernel/itimer.c       |    5 +++++
> >  kernel/timer.c        |    8 +++++++-
> >  3 files changed, 36 insertions(+), 1 deletion(-)
> > 
> > Index: linux-2.6-lttng/kernel/itimer.c
> > ===================================================================
> > --- linux-2.6-lttng.orig/kernel/itimer.c	2008-07-15 14:49:14.000000000 -0400
> > +++ linux-2.6-lttng/kernel/itimer.c	2008-07-15 15:14:28.000000000 -0400
> > @@ -12,6 +12,7 @@
> >  #include <linux/time.h>
> >  #include <linux/posix-timers.h>
> >  #include <linux/hrtimer.h>
> > +#include <trace/timer.h>
> >  
> >  #include <asm/uaccess.h>
> >  
> > @@ -132,6 +133,8 @@ enum hrtimer_restart it_real_fn(struct h
> >  	struct signal_struct *sig =
> >  		container_of(timer, struct signal_struct, real_timer);
> >  
> > +	trace_timer_itimer_expired(sig);
> > +
> >  	kill_pid_info(SIGALRM, SEND_SIG_PRIV, sig->leader_pid);
> >  
> >  	return HRTIMER_NORESTART;
> > @@ -157,6 +160,8 @@ int do_setitimer(int which, struct itime
> >  	    !timeval_valid(&value->it_interval))
> >  		return -EINVAL;
> >  
> > +	trace_timer_itimer_set(which, value);
> > +
> >  	switch (which) {
> >  	case ITIMER_REAL:
> >  again:
> > Index: linux-2.6-lttng/kernel/timer.c
> > ===================================================================
> > --- linux-2.6-lttng.orig/kernel/timer.c	2008-07-15 14:51:50.000000000 -0400
> > +++ linux-2.6-lttng/kernel/timer.c	2008-07-15 15:14:28.000000000 -0400
> > @@ -37,12 +37,14 @@
> >  #include <linux/delay.h>
> >  #include <linux/tick.h>
> >  #include <linux/kallsyms.h>
> > +#include <trace/timer.h>
> >  
> >  #include <asm/uaccess.h>
> >  #include <asm/unistd.h>
> >  #include <asm/div64.h>
> >  #include <asm/timex.h>
> >  #include <asm/io.h>
> > +#include <asm/irq_regs.h>
> >  
> >  u64 jiffies_64 __cacheline_aligned_in_smp = INITIAL_JIFFIES;
> >  
> > @@ -288,6 +290,7 @@ static void internal_add_timer(struct tv
> >  		i = (expires >> (TVR_BITS + 3 * TVN_BITS)) & TVN_MASK;
> >  		vec = base->tv5.vec + i;
> >  	}
> > +	trace_timer_set(timer);
> >  	/*
> >  	 * Timers are FIFO:
> >  	 */
> > @@ -1066,6 +1069,7 @@ void do_timer(unsigned long ticks)
> >  {
> >  	jiffies_64 += ticks;
> >  	update_times(ticks);
> > +	trace_timer_update_time(&xtime, &wall_to_monotonic);
> >  }
> 
> This is a very dangerous trace point - we're holding xtime lock here.
> 
> Ah, I see you make that comment below too, are you sure you want to do
> this? 
> 

Yes, this is done on purpose: I want to know the link between the evolution
of the "system time" and my more precise time source (usually the TSC),
so I can know how to interleave events between system logs and low-level
traces.

However, it implies that a tracer connected to this tracepoint cannot
take the xtime lock (or must be aware that it is already taken). Since
LTTng has been designed to support NMIs, and because seqlocks and NMIs
do not mix well together (they cause deadlocks), I have no such concern. I
created my own RCU-based 32-to-64 bits counter extension infrastructure
for that precise purpose (so I could support architectures which provide
hardware counters with fewer than 64 bits).

You'll notice that as tracepoints are added, one must be more and more
careful about which locks a tracer takes and which parts of the kernel
infrastructure it uses. But nobody said tracing was easy. ;-)
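
To give an idea of what extending a 32-bit counter to 64 bits means, here is a
deliberately simplified, single-threaded sketch. It is not the RCU-based
implementation mentioned above: it has no concurrency protection at all, and
it assumes the counter is read at least once per wrap-around period. The
ext_counter names are invented for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software state kept next to a 32-bit hardware counter. */
struct ext_counter {
	uint32_t last_low;	/* last raw 32-bit value observed */
	uint32_t high;		/* number of wrap-arounds seen so far */
};

/*
 * Fold a fresh 32-bit hardware reading into a 64-bit value. A reading
 * lower than the previous one is taken to mean the counter wrapped once.
 */
static uint64_t ext_counter_read(struct ext_counter *c, uint32_t hw_now)
{
	assert(c != NULL);
	if (hw_now < c->last_low)
		c->high++;
	c->last_low = hw_now;
	return ((uint64_t)c->high << 32) | hw_now;
}
```

The hard part, which this sketch skips, is updating last_low and high
consistently from any context (including NMIs) without taking a lock.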

Mathieu

> Thomas, any input?
> 
> >  #ifdef __ARCH_WANT_SYS_ALARM
> > @@ -1147,7 +1151,9 @@ asmlinkage long sys_getegid(void)
> >  
> >  static void process_timeout(unsigned long __data)
> >  {
> > -	wake_up_process((struct task_struct *)__data);
> > +	struct task_struct *task = (struct task_struct *)__data;
> > +	trace_timer_timeout(task);
> > +	wake_up_process(task);
> >  }
> >  
> >  /**
> > Index: linux-2.6-lttng/include/trace/timer.h
> > ===================================================================
> > --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> > +++ linux-2.6-lttng/include/trace/timer.h	2008-07-15 15:14:28.000000000 -0400
> > @@ -0,0 +1,24 @@
> > +#ifndef _TRACE_TIMER_H
> > +#define _TRACE_TIMER_H
> > +
> > +#include <linux/tracepoint.h>
> > +
> > +DEFINE_TRACE(timer_itimer_expired,
> > +	TPPROTO(struct signal_struct *sig),
> > +	TPARGS(sig));
> > +DEFINE_TRACE(timer_itimer_set,
> > +	TPPROTO(int which, struct itimerval *value),
> > +	TPARGS(which, value));
> > +DEFINE_TRACE(timer_set,
> > +	TPPROTO(struct timer_list *timer),
> > +	TPARGS(timer));
> > +/*
> > + * xtime_lock is taken when kernel_timer_update_time tracepoint is reached.
> > + */
> > +DEFINE_TRACE(timer_update_time,
> > +	TPPROTO(struct timespec *_xtime, struct timespec *_wall_to_monotonic),
> > +	TPARGS(_xtime, _wall_to_monotonic));
> > +DEFINE_TRACE(timer_timeout,
> > +	TPPROTO(struct task_struct *p),
> > +	TPARGS(p));
> > +#endif
> > 
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [patch 09/17] LTTng instrumentation - filemap
  2008-07-16  8:35   ` Peter Zijlstra
@ 2008-07-16 14:37     ` Mathieu Desnoyers
  0 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2008-07-16 14:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: akpm, Ingo Molnar, linux-kernel, Masami Hiramatsu, linux-mm,
	Dave Hansen, Frank Ch. Eigler, Hideo AOKI, Takashi Nishiie,
	Steven Rostedt, Eduard - Gabriel Munteanu

* Peter Zijlstra (peterz@infradead.org) wrote:
> On Tue, 2008-07-15 at 18:26 -0400, Mathieu Desnoyers wrote:
> > plain text document attachment (lttng-instrumentation-filemap.patch)
> > Instrumentation of waits caused by memory accesses on mmap regions.
> > 
> > Those tracepoints are used by LTTng.
> > 
> > About the performance impact of tracepoints (which is comparable to markers),
> > even without immediate values optimizations, tests done by Hideo Aoki on ia64
> > show no regression. His test case was using hackbench on a kernel where
> > scheduler instrumentation (about 5 events in code scheduler code) was added.
> > See the "Tracepoints" patch header for performance result detail.
> > 
> > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> > CC: linux-mm@kvack.org
> > CC: Dave Hansen <haveblue@us.ibm.com>
> > CC: Masami Hiramatsu <mhiramat@redhat.com>
> > CC: 'Peter Zijlstra' <peterz@infradead.org>
> > CC: "Frank Ch. Eigler" <fche@redhat.com>
> > CC: 'Ingo Molnar' <mingo@elte.hu>
> > CC: 'Hideo AOKI' <haoki@redhat.com>
> > CC: Takashi Nishiie <t-nishiie@np.css.fujitsu.com>
> > CC: 'Steven Rostedt' <rostedt@goodmis.org>
> > CC: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> > ---
> >  include/trace/filemap.h |   13 +++++++++++++
> >  mm/filemap.c            |    3 +++
> >  2 files changed, 16 insertions(+)
> > 
> > Index: linux-2.6-lttng/mm/filemap.c
> > ===================================================================
> > --- linux-2.6-lttng.orig/mm/filemap.c	2008-07-15 14:51:50.000000000 -0400
> > +++ linux-2.6-lttng/mm/filemap.c	2008-07-15 15:14:46.000000000 -0400
> > @@ -33,6 +33,7 @@
> >  #include <linux/cpuset.h>
> >  #include <linux/hardirq.h> /* for BUG_ON(!in_atomic()) only */
> >  #include <linux/memcontrol.h>
> > +#include <trace/filemap.h>
> >  #include "internal.h"
> >  
> >  /*
> > @@ -541,9 +542,11 @@ void wait_on_page_bit(struct page *page,
> >  {
> >  	DEFINE_WAIT_BIT(wait, &page->flags, bit_nr);
> >  
> > +	trace_filemap_wait_start(page, bit_nr);
> >  	if (test_bit(bit_nr, &page->flags))
> >  		__wait_on_bit(page_waitqueue(page), &wait, sync_page,
> >  							TASK_UNINTERRUPTIBLE);
> > +	trace_filemap_wait_end(page, bit_nr);
> >  }
> >  EXPORT_SYMBOL(wait_on_page_bit);
> 
> I don't like the trace_filemap_wait_* naming..
> 

Me neither :)

> trace_wait_on_page_* might make more sense
> 

Yep, agreed.

Mathieu

> > Index: linux-2.6-lttng/include/trace/filemap.h
> > ===================================================================
> > --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> > +++ linux-2.6-lttng/include/trace/filemap.h	2008-07-15 15:14:46.000000000 -0400
> > @@ -0,0 +1,13 @@
> > +#ifndef _TRACE_FILEMAP_H
> > +#define _TRACE_FILEMAP_H
> > +
> > +#include <linux/tracepoint.h>
> > +
> > +DEFINE_TRACE(filemap_wait_start,
> > +	TPPROTO(struct page *page, int bit_nr),
> > +	TPARGS(page, bit_nr));
> > +DEFINE_TRACE(filemap_wait_end,
> > +	TPPROTO(struct page *page, int bit_nr),
> > +	TPARGS(page, bit_nr));
> > +
> > +#endif
> > 
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [patch 10/17] LTTng instrumentation - swap
  2008-07-16  8:39   ` Peter Zijlstra
@ 2008-07-16 14:40     ` Mathieu Desnoyers
  2008-07-16 14:47       ` Peter Zijlstra
  0 siblings, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2008-07-16 14:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: akpm, Ingo Molnar, linux-kernel, Masami Hiramatsu, linux-mm,
	Dave Hansen, Frank Ch. Eigler, Hideo AOKI, Takashi Nishiie,
	Steven Rostedt, Eduard - Gabriel Munteanu

* Peter Zijlstra (peterz@infradead.org) wrote:
> On Tue, 2008-07-15 at 18:26 -0400, Mathieu Desnoyers wrote:
> > plain text document attachment (lttng-instrumentation-swap.patch)
> > Instrumentation of waits caused by swap activity. Also instrumentation
> > swapon/swapoff events to keep track of active swap partitions.
> > 
> > Those tracepoints are used by LTTng.
> > 
> > About the performance impact of tracepoints (which is comparable to markers),
> > even without immediate values optimizations, tests done by Hideo Aoki on ia64
> > show no regression. His test case was using hackbench on a kernel where
> > scheduler instrumentation (about 5 events in code scheduler code) was added.
> > See the "Tracepoints" patch header for performance result detail.
> > 
> > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> > CC: linux-mm@kvack.org
> > CC: Dave Hansen <haveblue@us.ibm.com>
> > CC: Masami Hiramatsu <mhiramat@redhat.com>
> > CC: 'Peter Zijlstra' <peterz@infradead.org>
> > CC: "Frank Ch. Eigler" <fche@redhat.com>
> > CC: 'Ingo Molnar' <mingo@elte.hu>
> > CC: 'Hideo AOKI' <haoki@redhat.com>
> > CC: Takashi Nishiie <t-nishiie@np.css.fujitsu.com>
> > CC: 'Steven Rostedt' <rostedt@goodmis.org>
> > CC: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> > ---
> >  include/trace/swap.h |   20 ++++++++++++++++++++
> >  mm/memory.c          |    2 ++
> >  mm/page_io.c         |    2 ++
> >  mm/swapfile.c        |    4 ++++
> >  4 files changed, 28 insertions(+)
> > 
> > Index: linux-2.6-lttng/mm/memory.c
> > ===================================================================
> > --- linux-2.6-lttng.orig/mm/memory.c	2008-07-15 13:54:46.000000000 -0400
> > +++ linux-2.6-lttng/mm/memory.c	2008-07-15 14:02:54.000000000 -0400
> > @@ -51,6 +51,7 @@
> >  #include <linux/init.h>
> >  #include <linux/writeback.h>
> >  #include <linux/memcontrol.h>
> > +#include <trace/swap.h>
> >  
> >  #include <asm/pgalloc.h>
> >  #include <asm/uaccess.h>
> > @@ -2213,6 +2214,7 @@ static int do_swap_page(struct mm_struct
> >  		/* Had to read the page from swap area: Major fault */
> >  		ret = VM_FAULT_MAJOR;
> >  		count_vm_event(PGMAJFAULT);
> > +		trace_swap_in(page, entry);
> >  	}
> >  
> >  	if (mem_cgroup_charge(page, mm, GFP_KERNEL)) {
> > Index: linux-2.6-lttng/mm/page_io.c
> > ===================================================================
> > --- linux-2.6-lttng.orig/mm/page_io.c	2008-07-15 13:54:46.000000000 -0400
> > +++ linux-2.6-lttng/mm/page_io.c	2008-07-15 14:02:54.000000000 -0400
> > @@ -17,6 +17,7 @@
> >  #include <linux/bio.h>
> >  #include <linux/swapops.h>
> >  #include <linux/writeback.h>
> > +#include <trace/swap.h>
> >  #include <asm/pgtable.h>
> >  
> >  static struct bio *get_swap_bio(gfp_t gfp_flags, pgoff_t index,
> > @@ -114,6 +115,7 @@ int swap_writepage(struct page *page, st
> >  		rw |= (1 << BIO_RW_SYNC);
> >  	count_vm_event(PSWPOUT);
> >  	set_page_writeback(page);
> > +	trace_swap_out(page);
> >  	unlock_page(page);
> >  	submit_bio(rw, bio);
> >  out:
> > Index: linux-2.6-lttng/mm/swapfile.c
> > ===================================================================
> > --- linux-2.6-lttng.orig/mm/swapfile.c	2008-07-15 13:54:46.000000000 -0400
> > +++ linux-2.6-lttng/mm/swapfile.c	2008-07-15 14:02:54.000000000 -0400
> > @@ -32,6 +32,7 @@
> >  #include <asm/pgtable.h>
> >  #include <asm/tlbflush.h>
> >  #include <linux/swapops.h>
> > +#include <trace/swap.h>
> >  
> >  DEFINE_SPINLOCK(swap_lock);
> >  unsigned int nr_swapfiles;
> > @@ -1310,6 +1311,7 @@ asmlinkage long sys_swapoff(const char _
> >  	swap_map = p->swap_map;
> >  	p->swap_map = NULL;
> >  	p->flags = 0;
> > +	trace_swap_file_close(swap_file);
> >  	spin_unlock(&swap_lock);
> >  	mutex_unlock(&swapon_mutex);
> >  	vfree(swap_map);
> > @@ -1695,6 +1697,7 @@ asmlinkage long sys_swapon(const char __
> >  	} else {
> >  		swap_info[prev].next = p - swap_info;
> >  	}
> > +	trace_swap_file_open(swap_file, name);
> >  	spin_unlock(&swap_lock);
> >  	mutex_unlock(&swapon_mutex);
> >  	error = 0;
> > @@ -1796,6 +1799,7 @@ get_swap_info_struct(unsigned type)
> >  {
> >  	return &swap_info[type];
> >  }
> > +EXPORT_SYMBOL_GPL(get_swap_info_struct);
> 
> I'm not too happy with this export.
> 

Would it make more sense to turn get_swap_info_struct into a static
inline in swap.h ?

Mathieu

> >  
> >  /*
> >   * swap_lock prevents swap_map being freed. Don't grab an extra
> > Index: linux-2.6-lttng/include/trace/swap.h
> > ===================================================================
> > --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> > +++ linux-2.6-lttng/include/trace/swap.h	2008-07-15 14:02:54.000000000 -0400
> > @@ -0,0 +1,20 @@
> > +#ifndef _TRACE_SWAP_H
> > +#define _TRACE_SWAP_H
> > +
> > +#include <linux/swap.h>
> > +#include <linux/tracepoint.h>
> > +
> > +DEFINE_TRACE(swap_in,
> > +	TPPROTO(struct page *page, swp_entry_t entry),
> > +	TPARGS(page, entry));
> > +DEFINE_TRACE(swap_out,
> > +	TPPROTO(struct page *page),
> > +	TPARGS(page));
> > +DEFINE_TRACE(swap_file_open,
> > +	TPPROTO(struct file *file, char *filename),
> > +	TPARGS(file, filename));
> > +DEFINE_TRACE(swap_file_close,
> > +	TPPROTO(struct file *file),
> > +	TPARGS(file));
> > +
> > +#endif
> > 
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [patch 10/17] LTTng instrumentation - swap
  2008-07-16 14:40     ` Mathieu Desnoyers
@ 2008-07-16 14:47       ` Peter Zijlstra
  2008-07-16 15:00         ` Mathieu Desnoyers
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2008-07-16 14:47 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: akpm, Ingo Molnar, linux-kernel, Masami Hiramatsu, linux-mm,
	Dave Hansen, Frank Ch. Eigler, Hideo AOKI, Takashi Nishiie,
	Steven Rostedt, Eduard - Gabriel Munteanu

On Wed, 2008-07-16 at 10:40 -0400, Mathieu Desnoyers wrote:
> * Peter Zijlstra (peterz@infradead.org) wrote:
> > On Tue, 2008-07-15 at 18:26 -0400, Mathieu Desnoyers wrote:

> > > @@ -1796,6 +1799,7 @@ get_swap_info_struct(unsigned type)
> > >  {
> > >  	return &swap_info[type];
> > >  }
> > > +EXPORT_SYMBOL_GPL(get_swap_info_struct);
> > 
> > I'm not too happy with this export.
> > 
> 
> Would it make more sense to turn get_swap_info_struct into a static
> inline in swap.h ?

Seeing a consumer of it would go a long way towards discussing it ;-)



* Re: [patch 10/17] LTTng instrumentation - swap
  2008-07-16 14:47       ` Peter Zijlstra
@ 2008-07-16 15:00         ` Mathieu Desnoyers
  2008-07-16 15:50           ` KOSAKI Motohiro
  0 siblings, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2008-07-16 15:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: akpm, Ingo Molnar, linux-kernel, Masami Hiramatsu, linux-mm,
	Dave Hansen, Frank Ch. Eigler, Hideo AOKI, Takashi Nishiie,
	Steven Rostedt, Eduard - Gabriel Munteanu

* Peter Zijlstra (peterz@infradead.org) wrote:
> On Wed, 2008-07-16 at 10:40 -0400, Mathieu Desnoyers wrote:
> > * Peter Zijlstra (peterz@infradead.org) wrote:
> > > On Tue, 2008-07-15 at 18:26 -0400, Mathieu Desnoyers wrote:
> 
> > > > @@ -1796,6 +1799,7 @@ get_swap_info_struct(unsigned type)
> > > >  {
> > > >  	return &swap_info[type];
> > > >  }
> > > > +EXPORT_SYMBOL_GPL(get_swap_info_struct);
> > > 
> > > I'm not too happy with this export.
> > > 
> > 
> > Would it make more sense to turn get_swap_info_struct into a static
> > inline in swap.h ?
> 
> Seeing a consumer of it would go a long way towards discussing it ;-)
> 

The LTTng probe which connects to this tracepoint looks like :

+static void probe_swap_out(struct page *page)
+{
+       trace_mark(mm_swap_out, "pfn %lu filp %p offset %lu",
+               page_to_pfn(page),
+               get_swap_info_struct(swp_type(
+                       page_swp_entry(page)))->swap_file,
+               swp_offset(page_swp_entry(page)));
+}

So, I need get_swap_info_struct to extract the swap file pointer from
the swap entry.

Mathieu

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [patch 12/17] LTTng instrumentation - page
  2008-07-16  8:41   ` Peter Zijlstra
@ 2008-07-16 15:03     ` Mathieu Desnoyers
  0 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2008-07-16 15:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: akpm, Ingo Molnar, linux-kernel, Masami Hiramatsu, Martin Bligh,
	Frank Ch. Eigler, Hideo AOKI, Takashi Nishiie, Steven Rostedt,
	Eduard - Gabriel Munteanu

* Peter Zijlstra (peterz@infradead.org) wrote:
> On Tue, 2008-07-15 at 18:26 -0400, Mathieu Desnoyers wrote:
> > plain text document attachment (lttng-instrumentation-page.patch)
> > Paging activity instrumentation. Instruments page allocation/free to keep track
> > of page allocation. This does not cover hugetlb activity, which is covered by a
> > separate patch.
> > 
> > Those tracepoints are used by LTTng.
> > 
> > About the performance impact of tracepoints (which is comparable to markers),
> > even without immediate values optimizations, tests done by Hideo Aoki on ia64
> > show no regression. His test case was using hackbench on a kernel where
> > > scheduler instrumentation (about 5 events in scheduler code) was added.
> > See the "Tracepoints" patch header for performance result detail.
> > 
> > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> > CC: Martin Bligh <mbligh@google.com>
> > CC: Masami Hiramatsu <mhiramat@redhat.com>
> > CC: 'Peter Zijlstra' <peterz@infradead.org>
> > CC: "Frank Ch. Eigler" <fche@redhat.com>
> > CC: 'Ingo Molnar' <mingo@elte.hu>
> > CC: 'Hideo AOKI' <haoki@redhat.com>
> > CC: Takashi Nishiie <t-nishiie@np.css.fujitsu.com>
> > CC: 'Steven Rostedt' <rostedt@goodmis.org>
> > CC: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> > ---
> >  include/trace/page.h |   16 ++++++++++++++++
> >  mm/page_alloc.c      |    6 ++++++
> >  2 files changed, 22 insertions(+)
> > 
> > Index: linux-2.6-lttng/mm/page_alloc.c
> > ===================================================================
> > --- linux-2.6-lttng.orig/mm/page_alloc.c	2008-07-15 13:54:46.000000000 -0400
> > +++ linux-2.6-lttng/mm/page_alloc.c	2008-07-15 14:04:38.000000000 -0400
> > @@ -46,6 +46,7 @@
> >  #include <linux/page-isolation.h>
> >  #include <linux/memcontrol.h>
> >  #include <linux/debugobjects.h>
> > +#include <trace/page.h>
> >  
> >  #include <asm/tlbflush.h>
> >  #include <asm/div64.h>
> > @@ -510,6 +511,8 @@ static void __free_pages_ok(struct page 
> >  	int i;
> >  	int reserved = 0;
> >  
> > +	trace_page_free(page, order);
> > +
> >  	for (i = 0 ; i < (1 << order) ; ++i)
> >  		reserved += free_pages_check(page + i);
> >  	if (reserved)
> > @@ -966,6 +969,8 @@ static void free_hot_cold_page(struct pa
> >  	struct per_cpu_pages *pcp;
> >  	unsigned long flags;
> >  
> > +	trace_page_free(page, 0);
> > +
> >  	if (PageAnon(page))
> >  		page->mapping = NULL;
> >  	if (free_pages_check(page))
> > @@ -1630,6 +1635,7 @@ nopage:
> >  		show_mem();
> >  	}
> >  got_pg:
> > +	trace_page_alloc(page, order);
> >  	return page;
> >  }
> >  
> > Index: linux-2.6-lttng/include/trace/page.h
> > ===================================================================
> > --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> > +++ linux-2.6-lttng/include/trace/page.h	2008-07-15 14:04:38.000000000 -0400
> 
> This name seems inconsistent with your other choices.
> 
> include/trace/page_alloc.h comes to mind
> 

Yes, good idea, will fix.

Mathieu

> > @@ -0,0 +1,16 @@
> > +#ifndef _TRACE_PAGE_H
> > +#define _TRACE_PAGE_H
> > +
> > +#include <linux/tracepoint.h>
> > +
> > +/*
> > + * mm_page_alloc : page can be NULL.
> > + */
> > +DEFINE_TRACE(page_alloc,
> > +	TPPROTO(struct page *page, unsigned int order),
> > +	TPARGS(page, order));
> > +DEFINE_TRACE(page_free,
> > +	TPPROTO(struct page *page, unsigned int order),
> > +	TPARGS(page, order));
> > +
> > +#endif
> > 
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [patch 10/17] LTTng instrumentation - swap
  2008-07-16 15:00         ` Mathieu Desnoyers
@ 2008-07-16 15:50           ` KOSAKI Motohiro
  2008-07-16 16:17             ` Mathieu Desnoyers
  0 siblings, 1 reply; 51+ messages in thread
From: KOSAKI Motohiro @ 2008-07-16 15:50 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: kosaki.motohiro, Peter Zijlstra, akpm, Ingo Molnar, linux-kernel,
	Masami Hiramatsu, linux-mm, Dave Hansen, Frank Ch. Eigler,
	Hideo AOKI, Takashi Nishiie, Steven Rostedt,
	Eduard - Gabriel Munteanu

Hi

> > > Would it make more sense to turn get_swap_info_struct into a static
> > > inline in swap.h ?
> > 
> > Seeing a consumer of it would go a long way towards discussing it ;-)
> 
> The LTTng probe which connects to this tracepoint looks like :

I have no objection to this exporting.

However, this is an LTTng requirement,
but tracepoints are a tracer-independent mechanism.
So splitting it out would be better, IMHO.






* Re: [patch 10/17] LTTng instrumentation - swap
  2008-07-16 15:50           ` KOSAKI Motohiro
@ 2008-07-16 16:17             ` Mathieu Desnoyers
  0 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2008-07-16 16:17 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Peter Zijlstra, akpm, Ingo Molnar, linux-kernel,
	Masami Hiramatsu, linux-mm, Dave Hansen, Frank Ch. Eigler,
	Hideo AOKI, Takashi Nishiie, Steven Rostedt,
	Eduard - Gabriel Munteanu

* KOSAKI Motohiro (kosaki.motohiro@jp.fujitsu.com) wrote:
> Hi
> 
> > > > Would it make more sense to turn get_swap_info_struct into a static
> > > > inline in swap.h ?
> > > 
> > > Seeing a consumer of it would go a long way towards discussing it ;-)
> > 
> > The LTTng probe which connects to this tracepoint looks like :
> 
> I have no objection to this exporting.
> 
> However, this is an LTTng requirement,
> but tracepoints are a tracer-independent mechanism.
> So splitting it out would be better, IMHO.
> 

Good point. I'll move it to my following lttng-specific patches.

Mathieu

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [patch 09/17] LTTng instrumentation - filemap
  2008-07-15 22:26 ` [patch 09/17] LTTng instrumentation - filemap Mathieu Desnoyers
  2008-07-16  8:35   ` Peter Zijlstra
@ 2008-07-17  6:25   ` Nick Piggin
  2008-07-17  7:02     ` Mathieu Desnoyers
  1 sibling, 1 reply; 51+ messages in thread
From: Nick Piggin @ 2008-07-17  6:25 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra,
	Masami Hiramatsu, linux-mm, Dave Hansen, Frank Ch. Eigler,
	Hideo AOKI, Takashi Nishiie, Steven Rostedt,
	Eduard - Gabriel Munteanu

On Wednesday 16 July 2008 08:26, Mathieu Desnoyers wrote:
> Instrumentation of waits caused by memory accesses on mmap regions.
>
> Those tracepoints are used by LTTng.
>
> About the performance impact of tracepoints (which is comparable to
> markers), even without immediate values optimizations, tests done by Hideo
> Aoki on ia64 show no regression. His test case was using hackbench on a
> kernel where scheduler instrumentation (about 5 events in scheduler
> code) was added. See the "Tracepoints" patch header for performance result
> detail.

BTW. this sort of test is practically useless to measure overhead. If
a modern CPU is executing out of primed insn/data and branch prediction
cache, then yes this sort of thing is pretty well free.

I see *real* workloads that have got continually and incrementally slower
eg from 2.6.5 to 2.6.20+ as "features" get added. Surprisingly, none of
them ever showed up individually on a microbenchmark.

OK, for this case if you can configure it out, I guess that's fine. But
let's not pretend that adding code and branches and function calls are
ever free.


* Re: [patch 09/17] LTTng instrumentation - filemap
  2008-07-17  6:25   ` Nick Piggin
@ 2008-07-17  7:02     ` Mathieu Desnoyers
  2008-07-17  7:11       ` Nick Piggin
  0 siblings, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2008-07-17  7:02 UTC (permalink / raw)
  To: Nick Piggin
  Cc: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra,
	Masami Hiramatsu, linux-mm, Dave Hansen, Frank Ch. Eigler,
	Hideo AOKI, Takashi Nishiie, Steven Rostedt,
	Eduard - Gabriel Munteanu

* Nick Piggin (nickpiggin@yahoo.com.au) wrote:
> On Wednesday 16 July 2008 08:26, Mathieu Desnoyers wrote:
> > Instrumentation of waits caused by memory accesses on mmap regions.
> >
> > Those tracepoints are used by LTTng.
> >
> > About the performance impact of tracepoints (which is comparable to
> > markers), even without immediate values optimizations, tests done by Hideo
> > Aoki on ia64 show no regression. His test case was using hackbench on a
> > kernel where scheduler instrumentation (about 5 events in scheduler
> > code) was added. See the "Tracepoints" patch header for performance result
> > detail.
> 
> BTW. this sort of test is practically useless to measure overhead. If
> a modern CPU is executing out of primed insn/data and branch prediction
> cache, then yes this sort of thing is pretty well free.
> 
> I see *real* workloads that have got continually and incrementally slower
> eg from 2.6.5 to 2.6.20+ as "features" get added. Surprisingly, none of
> them ever showed up individually on a microbenchmark.
> 
> OK, for this case if you can configure it out, I guess that's fine. But
> let's not pretend that adding code and branches and function calls are
> ever free.

I never pretended anything like that. Actually, that's what the
"immediate values" are for : they allow patching in a load immediate
instead of a memory read, to decrease d-cache impact. They now also allow
patching a jump in place of the memory read/immediate value read + test +
conditional branch used to skip the function call, with fairly minimal impact.
I agree with you that eating precious d-cache and jump prediction buffer
entries can eventually slow down the system. But this will be _hard_ to
show on a single macro benchmark, and the microbenchmark showing it will
have to be taken in conditions which will exacerbate the d-cache and BPB
impact.

Mathieu

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [patch 09/17] LTTng instrumentation - filemap
  2008-07-17  7:02     ` Mathieu Desnoyers
@ 2008-07-17  7:11       ` Nick Piggin
  0 siblings, 0 replies; 51+ messages in thread
From: Nick Piggin @ 2008-07-17  7:11 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra,
	Masami Hiramatsu, linux-mm, Dave Hansen, Frank Ch. Eigler,
	Hideo AOKI, Takashi Nishiie, Steven Rostedt,
	Eduard - Gabriel Munteanu

On Thursday 17 July 2008 17:02, Mathieu Desnoyers wrote:
> * Nick Piggin (nickpiggin@yahoo.com.au) wrote:
> > On Wednesday 16 July 2008 08:26, Mathieu Desnoyers wrote:
> > > Instrumentation of waits caused by memory accesses on mmap regions.
> > >
> > > Those tracepoints are used by LTTng.
> > >
> > > About the performance impact of tracepoints (which is comparable to
> > > markers), even without immediate values optimizations, tests done by
> > > Hideo Aoki on ia64 show no regression. His test case was using
> > > hackbench on a kernel where scheduler instrumentation (about 5 events
> > > in scheduler code) was added. See the "Tracepoints" patch header
> > > for performance result detail.
> >
> > BTW. this sort of test is practically useless to measure overhead. If
> > a modern CPU is executing out of primed insn/data and branch prediction
> > cache, then yes this sort of thing is pretty well free.
> >
> > I see *real* workloads that have got continually and incrementally slower
> > eg from 2.6.5 to 2.6.20+ as "features" get added. Surprisingly, none of
> > them ever showed up individually on a microbenchmark.
> >
> > OK, for this case if you can configure it out, I guess that's fine. But
> > let's not pretend that adding code and branches and function calls are
> > ever free.
>
> I never pretended anything like that. Actually, that's what the

OK but saying "there is no detectable impact when running hackbench" is
basically meaningless.


> "immediate values" are for : they allow to patch load immediate value
> instead of a memory read to decrease d-cache impact. They now allow to
> patch a jump instead of the memory read/immediate value read + test +
> conditional branch to skip the function call with fairly minimal impact.
> I agree with you that eating precious d-cache and jump prediction buffer
> entries can eventually slow down the system. But this will be _hard_ to
> show on a single macro benchmark, and the microbenchmark showing it will
> have to be taken in conditions which will exacerbate the d-cache and BPB
> impact.

I'm not saying you have to reproduce it (although Intel's Oracle OLTP
benchmark is very sensitive to that kind of thing and schedule() is near
the top). But just acknowledge that it adds some cost. OK you're one of
the few people really trying hard to count every cycle so I don't mean to
pick on this code in particular.


* Re: [patch 00/17] Tracepoints v4 for linux-next
  2008-07-15 22:26 [patch 00/17] Tracepoints v4 for linux-next Mathieu Desnoyers
                   ` (17 preceding siblings ...)
  2008-07-16  8:51 ` [patch 00/17] Tracepoints v4 for linux-next Peter Zijlstra
@ 2008-07-18 15:41 ` Masami Hiramatsu
  18 siblings, 0 replies; 51+ messages in thread
From: Masami Hiramatsu @ 2008-07-18 15:41 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra

Hi Mathieu,

As far as I can see, your patchset can be separated into
several series of patches. Do you think these series
should be pushed all at once, or could they be pushed
individually?

I think it is hard to push all of them into the kernel at once,
because that needs ACKs from the several developers who maintain
the traced subsystems.

So, I'd like to suggest that you separate the series of patches,
and push the framework and instrumentation patches step by step.
What do you think?

Thank you,

Mathieu Desnoyers wrote:
> Hi,
> 
> Here is the newest release of the Tracepoints, following the feedback from Peter
> Zijlstra. The main change is the creation of include/trace/ as a placeholder
> for tracepoint headers.
> 
> The patchset applies over linux-next patch-v2.6.26-next-20080715 in this order :
> 
> #This is a separate RCU update upon which the tracepoints depend
> rcu-read-sched.patch
> 
> tracepoints.patch
> tracepoints-documentation.patch
> tracepoints-samples.patch
> 
> lttng-instrumentation-irq.patch
> lttng-instrumentation-scheduler.patch
> lttng-instrumentation-timer.patch
> lttng-instrumentation-kernel.patch
> 
> lttng-instrumentation-filemap.patch
> lttng-instrumentation-swap.patch
> lttng-instrumentation-memory.patch
> lttng-instrumentation-page.patch
> lttng-instrumentation-hugetlb.patch
> 
> lttng-instrumentation-net.patch
> lttng-instrumentation-ipv4.patch
> lttng-instrumentation-ipv6.patch
> 
> ftrace-port-to-tracepoints.patch
> 
> Mathieu
> 

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat@redhat.com





* Re: [patch 08/17] LTTng instrumentation - kernel
  2008-07-15 22:26 ` [patch 08/17] LTTng instrumentation - kernel Mathieu Desnoyers
@ 2008-07-24 13:57   ` Steven Rostedt
  2008-07-24 14:30     ` Mathieu Desnoyers
  0 siblings, 1 reply; 51+ messages in thread
From: Steven Rostedt @ 2008-07-24 13:57 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra,
	Masami Hiramatsu, Frank Ch. Eigler, Hideo AOKI, Takashi Nishiie,
	Eduard - Gabriel Munteanu



On Tue, 15 Jul 2008, Mathieu Desnoyers wrote:
>
>  /*
>   * Low level drivers may need that to know if they can schedule in
> @@ -601,6 +603,7 @@ asmlinkage int printk(const char *fmt, .
>  	int r;
>
>  	va_start(args, fmt);
> +	trace_kernel_printk(__builtin_return_address(0));

BTW, ftrace.h has macros that let you use CALLER_ADDR0 for
__builtin_return_address. It also converts it from a pointer to a long,
but makes the code look prettier.


>  	r = vprintk(fmt, args);
>  	va_end(args);
>
> @@ -677,6 +680,9 @@ asmlinkage int vprintk(const char *fmt,
>  	raw_local_irq_save(flags);
>  	this_cpu = smp_processor_id();
>
> +	trace_kernel_vprintk(__builtin_return_address(0),
> +		printk_buf, printed_len);
> +
>  	/*
>  	 * Ouch, printk recursed into itself!
>  	 */

-- Steve



* Re: [patch 08/17] LTTng instrumentation - kernel
  2008-07-24 13:57   ` Steven Rostedt
@ 2008-07-24 14:30     ` Mathieu Desnoyers
  2008-07-24 15:13       ` Steven Rostedt
  0 siblings, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2008-07-24 14:30 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra,
	Masami Hiramatsu, Frank Ch. Eigler, Hideo AOKI, Takashi Nishiie,
	Eduard - Gabriel Munteanu

* Steven Rostedt (rostedt@goodmis.org) wrote:
> 
> 
> On Tue, 15 Jul 2008, Mathieu Desnoyers wrote:
> >
> >  /*
> >   * Low level drivers may need that to know if they can schedule in
> > @@ -601,6 +603,7 @@ asmlinkage int printk(const char *fmt, .
> >  	int r;
> >
> >  	va_start(args, fmt);
> > +	trace_kernel_printk(__builtin_return_address(0));
> 
> BTW, ftrace.h has macros that let you use CALLER_ADDR0 for
> __builtin_return_address. It also converts it from a pointer to a long,
> but makes the code look prettier.
> 

include/linux/kernel.h:#define _RET_IP_   (unsigned long)__builtin_return_address(0)

Hrm, did not see this one. So I guess we can both switch to it ?
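
For reference, a minimal userspace sketch of what such a macro yields,
assuming GCC or Clang. SKETCH_RET_IP, trace_sketch and printk_sketch are
hypothetical names used only for illustration; they are not kernel symbols,
and trace_sketch merely stands in for a tracepoint call.

```c
#include <assert.h>

/*
 * Same shape as the _RET_IP_ definition quoted above; renamed so the
 * sketch does not clash with any real header.
 */
#define SKETCH_RET_IP ((unsigned long)__builtin_return_address(0))

/* Last caller address seen by the stand-in hook. */
static unsigned long last_ip;

/* Hypothetical stand-in for a tracepoint call: records the address. */
static void trace_sketch(unsigned long ip)
{
	assert(ip != 0);
	last_ip = ip;
}

/* noinline so the function keeps a distinct caller to report. */
__attribute__((noinline)) static int printk_sketch(void)
{
	trace_sketch(SKETCH_RET_IP);
	return 0;
}
```

The cast to unsigned long is the only difference from passing
__builtin_return_address(0) directly, which is why the probe prototype
would take an unsigned long rather than a pointer.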

Thanks for pointing this out,

Mathieu

> 
> >  	r = vprintk(fmt, args);
> >  	va_end(args);
> >
> > @@ -677,6 +680,9 @@ asmlinkage int vprintk(const char *fmt,
> >  	raw_local_irq_save(flags);
> >  	this_cpu = smp_processor_id();
> >
> > +	trace_kernel_vprintk(__builtin_return_address(0),
> > +		printk_buf, printed_len);
> > +
> >  	/*
> >  	 * Ouch, printk recursed into itself!
> >  	 */
> 
> -- Steve
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [patch 02/17] Kernel Tracepoints
  2008-07-15 22:26 ` [patch 02/17] Kernel Tracepoints Mathieu Desnoyers
@ 2008-07-24 15:08   ` Steven Rostedt
  2008-07-24 20:18     ` Mathieu Desnoyers
  2008-07-24 15:34   ` Steven Rostedt
  2008-07-24 15:39   ` Steven Rostedt
  2 siblings, 1 reply; 51+ messages in thread
From: Steven Rostedt @ 2008-07-24 15:08 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Andrew Morton, Ingo Molnar, LKML, Peter Zijlstra,
	Masami Hiramatsu, Frank Ch. Eigler, Hideo AOKI, Takashi Nishiie,
	Alexander Viro, Eduard - Gabriel Munteanu, Paul E. McKenney


[Added Paul McKenney to CC]

On Tue, 15 Jul 2008, Mathieu Desnoyers wrote:
> +++ linux-2.6-lttng/include/linux/tracepoint.h	2008-07-15 17:35:19.000000000 -0400
> @@ -0,0 +1,127 @@
> +#ifndef _LINUX_TRACEPOINT_H
> +#define _LINUX_TRACEPOINT_H
> +
> +/*
> + * Kernel Tracepoint API.
> + *
> + * See Documentation/tracepoint.txt.
> + *
> + * (C) Copyright 2008 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> + *
> + * Heavily inspired from the Linux Kernel Markers.
> + *
> + * This file is released under the GPLv2.
> + * See the file COPYING for more details.
> + */
> +
> +#include <linux/types.h>
> +#include <linux/rcupdate.h>
> +
> +struct module;
> +struct tracepoint;
> +
> +struct tracepoint {
> +	const char *name;		/* Tracepoint name */
> +	int state;			/* State. */
> +	void **funcs;
> +} __attribute__((aligned(8)));
> +
> +
> +#define TPPROTO(args...)	args
> +#define TPARGS(args...)		args
> +
> +#ifdef CONFIG_TRACEPOINTS
> +
> +/*
> + * it_func[0] is never NULL because there is at least one element in the array
> + * when the array itself is non NULL.
> + */
> +#define __DO_TRACE(tp, proto, args)					\
> +	do {								\
> +		void **it_func;						\
> +									\
> +		rcu_read_lock_sched();					\
> +		it_func = rcu_dereference((tp)->funcs);			\
> +		if (it_func) {						\
> +			do {						\
> +				((void(*)(proto))(*it_func))(args);	\
> +			} while (*(++it_func));				\

OK, I still don't understand the concept of the rcu_dereference, but why
is it needed for the first assignment of it_func but not the ++? Is it
only needed with the (tp)->funcs?

-- Steve


> +		}							\
> +		rcu_read_unlock_sched();				\
> +	} while (0)



* Re: [patch 08/17] LTTng instrumentation - kernel
  2008-07-24 14:30     ` Mathieu Desnoyers
@ 2008-07-24 15:13       ` Steven Rostedt
  0 siblings, 0 replies; 51+ messages in thread
From: Steven Rostedt @ 2008-07-24 15:13 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra,
	Masami Hiramatsu, Frank Ch. Eigler, Hideo AOKI, Takashi Nishiie,
	Eduard - Gabriel Munteanu


On Thu, 24 Jul 2008, Mathieu Desnoyers wrote:
>
> * Steven Rostedt (rostedt@goodmis.org) wrote:
> >
> >
> > On Tue, 15 Jul 2008, Mathieu Desnoyers wrote:
> > >
> > >  /*
> > >   * Low level drivers may need that to know if they can schedule in
> > > @@ -601,6 +603,7 @@ asmlinkage int printk(const char *fmt, .
> > >  	int r;
> > >
> > >  	va_start(args, fmt);
> > > +	trace_kernel_printk(__builtin_return_address(0));
> >
> > BTW, ftrace.h has macros that let you use CALLER_ADDR0 for
> > __builtin_return_address. It also converts it from a pointer to a long,
> > but makes the code look prettier.
> >
>
> include/linux/kernel.h:#define _RET_IP_   (unsigned long)__builtin_return_address(0)
>
> Hrm, did not see this one. So I guess we can both switch to it ?

Hmm, yeah, I forgot about that. In some places I do use CALLER_ADDR1 and
higher, but when it is just CALLER_ADDR0, then _RET_IP_ would probably be
better.

>
> Thanks for pointing this out,

-- Steve



* Re: [patch 02/17] Kernel Tracepoints
  2008-07-15 22:26 ` [patch 02/17] Kernel Tracepoints Mathieu Desnoyers
  2008-07-24 15:08   ` Steven Rostedt
@ 2008-07-24 15:34   ` Steven Rostedt
  2008-07-24 20:30     ` Mathieu Desnoyers
  2008-07-24 15:39   ` Steven Rostedt
  2 siblings, 1 reply; 51+ messages in thread
From: Steven Rostedt @ 2008-07-24 15:34 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra,
	Masami Hiramatsu, Frank Ch. Eigler, Hideo AOKI, Takashi Nishiie,
	Alexander Viro, Eduard - Gabriel Munteanu



On Tue, 15 Jul 2008, Mathieu Desnoyers wrote:
> +static void *
> +tracepoint_entry_remove_probe(struct tracepoint_entry *entry, void *probe)
> +{
> +	int nr_probes = 0, nr_del = 0, i;
> +	void **old, **new;
> +
> +	old = entry->funcs;
> +
> +	debug_print_probes(entry);
> +	/* (N -> M), (N > 1, M >= 0) probes */
> +	for (nr_probes = 0; old[nr_probes]; nr_probes++) {
> +		if ((!probe || old[nr_probes] == probe))
> +			nr_del++;
> +	}
> +
> +	if (nr_probes - nr_del == 0) {
> +		/* N -> 0, (N > 1) */
> +		entry->funcs = NULL;
> +		entry->refcount = 0;
> +		debug_print_probes(entry);
> +		return old;
> +	} else {
> +		int j = 0;
> +		/* N -> M, (N > 1, M > 0) */
> +		/* + 1 for NULL */
> +		new = kzalloc((nr_probes - nr_del + 1)
> +			* sizeof(void *), GFP_KERNEL);
> +		if (new == NULL)
> +			return ERR_PTR(-ENOMEM);

Hmm, on failure of allocating a new array, we could simply use the
old array, and remove the one probe from it instead of just failing.

-- Steve

> +		for (i = 0; old[i]; i++)
> +			if ((probe && old[i] != probe))
> +				new[j++] = old[i];
> +		entry->refcount = nr_probes - nr_del;
> +		entry->funcs = new;
> +	}
> +	debug_print_probes(entry);
> +	return old;
> +}
> +


* Re: [patch 02/17] Kernel Tracepoints
  2008-07-15 22:26 ` [patch 02/17] Kernel Tracepoints Mathieu Desnoyers
  2008-07-24 15:08   ` Steven Rostedt
  2008-07-24 15:34   ` Steven Rostedt
@ 2008-07-24 15:39   ` Steven Rostedt
  2008-07-24 20:37     ` [PATCH] Tracepoints use TABLE_SIZE macro Mathieu Desnoyers
  2 siblings, 1 reply; 51+ messages in thread
From: Steven Rostedt @ 2008-07-24 15:39 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra,
	Masami Hiramatsu, Frank Ch. Eigler, Hideo AOKI, Takashi Nishiie,
	Alexander Viro, Eduard - Gabriel Munteanu




On Tue, 15 Jul 2008, Mathieu Desnoyers wrote:
> Index: linux-2.6-lttng/kernel/tracepoint.c
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6-lttng/kernel/tracepoint.c	2008-07-15 17:35:00.000000000 -0400

[...]

> +/*
> + * Tracepoint hash table, containing the active tracepoints.
> + * Protected by tracepoints_mutex.
> + */
> +#define TRACEPOINT_HASH_BITS 6
> +#define TRACEPOINT_TABLE_SIZE (1 << TRACEPOINT_HASH_BITS)
> +

[...]

> +/*
> + * Get tracepoint if the tracepoint is present in the tracepoint hash table.
> + * Must be called with tracepoints_mutex held.
> + * Returns NULL if not present.
> + */
> +static struct tracepoint_entry *get_tracepoint(const char *name)
> +{
> +	struct hlist_head *head;
> +	struct hlist_node *node;
> +	struct tracepoint_entry *e;
> +	u32 hash = jhash(name, strlen(name), 0);
> +
> +	head = &tracepoint_table[hash & ((1 << TRACEPOINT_HASH_BITS)-1)];

Wouldn't it look nicer to have:

  hash & (TRACEPOINT_TABLE_SIZE - 1) ?
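
A small standalone sketch of the two spellings, to show they select the same
bucket. The two macros are copied from the patch; bucket_open_coded and
bucket_table_size are hypothetical helper names, and the hash values are
arbitrary test inputs rather than real jhash() output.

```c
#include <assert.h>
#include <stdint.h>

#define TRACEPOINT_HASH_BITS 6
#define TRACEPOINT_TABLE_SIZE (1 << TRACEPOINT_HASH_BITS)

/* Bucket index as currently written in the patch. */
static uint32_t bucket_open_coded(uint32_t hash)
{
	return hash & ((1 << TRACEPOINT_HASH_BITS) - 1);
}

/* Bucket index using the existing table-size macro. */
static uint32_t bucket_table_size(uint32_t hash)
{
	uint32_t idx = hash & (TRACEPOINT_TABLE_SIZE - 1);

	/* Masking with (size - 1) only works for power-of-two sizes. */
	assert(idx < (uint32_t)TRACEPOINT_TABLE_SIZE);
	return idx;
}
```

Both forms rely on the table size being a power of two, which holds here
because the size is derived from TRACEPOINT_HASH_BITS.
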
> +	hlist_for_each_entry(e, node, head, hlist) {
> +		if (!strcmp(name, e->name))
> +			return e;
> +	}
> +	return NULL;
> +}
> +
> +/*
> + * Add the tracepoint to the tracepoint hash table. Must be called with
> + * tracepoints_mutex held.
> + */
> +static struct tracepoint_entry *add_tracepoint(const char *name)
> +{
> +	struct hlist_head *head;
> +	struct hlist_node *node;
> +	struct tracepoint_entry *e;
> +	size_t name_len = strlen(name) + 1;
> +	u32 hash = jhash(name, name_len-1, 0);
> +
> +	head = &tracepoint_table[hash & ((1 << TRACEPOINT_HASH_BITS)-1)];

here too.

> +	hlist_for_each_entry(e, node, head, hlist) {
> +		if (!strcmp(name, e->name)) {
> +			printk(KERN_NOTICE
> +				"tracepoint %s busy\n", name);
> +			return ERR_PTR(-EEXIST);	/* Already there */
> +		}
> +	}
> +	/*
> +	 * Using kmalloc here to allocate a variable length element. Could
> +	 * cause some memory fragmentation if overused.
> +	 */
> +	e = kmalloc(sizeof(struct tracepoint_entry) + name_len, GFP_KERNEL);
> +	if (!e)
> +		return ERR_PTR(-ENOMEM);
> +	memcpy(&e->name[0], name, name_len);
> +	e->funcs = NULL;
> +	e->refcount = 0;
> +	e->rcu_pending = 0;
> +	hlist_add_head(&e->hlist, head);
> +	return e;
> +}
> +
> +/*
> + * Remove the tracepoint from the tracepoint hash table. Must be called with
> + * mutex_lock held.
> + */
> +static int remove_tracepoint(const char *name)
> +{
> +	struct hlist_head *head;
> +	struct hlist_node *node;
> +	struct tracepoint_entry *e;
> +	int found = 0;
> +	size_t len = strlen(name) + 1;
> +	u32 hash = jhash(name, len-1, 0);
> +
> +	head = &tracepoint_table[hash & ((1 << TRACEPOINT_HASH_BITS)-1)];

here too.

> +	hlist_for_each_entry(e, node, head, hlist) {
> +		if (!strcmp(name, e->name)) {
> +			found = 1;
> +			break;
> +		}
> +	}
> +	if (!found)
> +		return -ENOENT;
> +	if (e->refcount)
> +		return -EBUSY;
> +	hlist_del(&e->hlist);
> +	/* Make sure the call_rcu has been executed */
> +	if (e->rcu_pending)
> +		rcu_barrier();
> +	kfree(e);
> +	return 0;
> +}

-- Steve


* Re: [patch 02/17] Kernel Tracepoints
  2008-07-24 15:08   ` Steven Rostedt
@ 2008-07-24 20:18     ` Mathieu Desnoyers
  2008-08-01 21:10       ` Paul E. McKenney
  0 siblings, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2008-07-24 20:18 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andrew Morton, Ingo Molnar, LKML, Peter Zijlstra,
	Masami Hiramatsu, Frank Ch. Eigler, Hideo AOKI, Takashi Nishiie,
	Alexander Viro, Eduard - Gabriel Munteanu, Paul E. McKenney

* Steven Rostedt (rostedt@goodmis.org) wrote:
> 
> [Added Paul McKenney to CC]
> 
> On Tue, 15 Jul 2008, Mathieu Desnoyers wrote:
> > +++ linux-2.6-lttng/include/linux/tracepoint.h	2008-07-15 17:35:19.000000000 -0400
> > @@ -0,0 +1,127 @@
> > +#ifndef _LINUX_TRACEPOINT_H
> > +#define _LINUX_TRACEPOINT_H
> > +
> > +/*
> > + * Kernel Tracepoint API.
> > + *
> > + * See Documentation/tracepoint.txt.
> > + *
> > + * (C) Copyright 2008 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> > + *
> > + * Heavily inspired from the Linux Kernel Markers.
> > + *
> > + * This file is released under the GPLv2.
> > + * See the file COPYING for more details.
> > + */
> > +
> > +#include <linux/types.h>
> > +#include <linux/rcupdate.h>
> > +
> > +struct module;
> > +struct tracepoint;
> > +
> > +struct tracepoint {
> > +	const char *name;		/* Tracepoint name */
> > +	int state;			/* State. */
> > +	void **funcs;
> > +} __attribute__((aligned(8)));
> > +
> > +
> > +#define TPPROTO(args...)	args
> > +#define TPARGS(args...)		args
> > +
> > +#ifdef CONFIG_TRACEPOINTS
> > +
> > +/*
> > + * it_func[0] is never NULL because there is at least one element in the array
> > + * when the array itself is non NULL.
> > + */
> > +#define __DO_TRACE(tp, proto, args)					\
> > +	do {								\
> > +		void **it_func;						\
> > +									\
> > +		rcu_read_lock_sched();					\
> > +		it_func = rcu_dereference((tp)->funcs);			\
> > +		if (it_func) {						\
> > +			do {						\
> > +				((void(*)(proto))(*it_func))(args);	\
> > +			} while (*(++it_func));				\
> 
> OK, I still don't understand the concept of the rcu_dereference, but why
> is it needed for the first assignment of it_func but not the ++? Is it
> only needed with the (tp)->funcs?
> 

rcu_dereference copies the tp->funcs pointer to a local variable and then
issues a smp_read_barrier_depends() to make sure that the tp->funcs read
occurs before any use of the data the pointer copy points to (here, the
array elements).

The tp->funcs pointer, which points to the beginning of the array, is
read only once. After that, the iterator lives on the stack, so
incrementing it needs no further barrier: only the original tp->funcs
read was an RCU pointer read.

Then, as you probably know, the update side performs a
rcu_assign_pointer, which does a smp_wmb before the pointer assignment to
make sure the array data has been populated before the new pointer
becomes visible.
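
As a rough user-space illustration of the pairing described above (not the
kernel code), the sketch below uses C11 atomics: a release store stands in
for rcu_assign_pointer() and an acquire load stands in for
rcu_dereference(). Acquire is stronger than the data-dependency barrier
the kernel relies on, and nothing here models grace periods or freeing the
old array. The names tp_sketch, tp_publish, tp_call and demo are made up
for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef void (*probe_fn)(int *counter);

/* Hypothetical user-space stand-in for struct tracepoint. */
struct tp_sketch {
	_Atomic(probe_fn *) funcs;	/* NULL, or a NULL-terminated array */
};

static void probe_add_one(int *counter) { *counter += 1; }
static void probe_add_ten(int *counter) { *counter += 10; }

/* Update side: the array is fully populated before it is published. */
static void tp_publish(struct tp_sketch *tp, probe_fn *new_funcs)
{
	/* Release store: stands in for rcu_assign_pointer(). */
	atomic_store_explicit(&tp->funcs, new_funcs, memory_order_release);
}

/* Read side: calls every probe and returns how many were called. */
static int tp_call(struct tp_sketch *tp, int *counter)
{
	int called = 0;
	/* Acquire load: stands in for rcu_dereference(); done exactly once. */
	probe_fn *it_func = atomic_load_explicit(&tp->funcs,
						 memory_order_acquire);

	if (it_func) {
		do {
			(*it_func)(counter);
			called++;
		} while (*(++it_func));	/* plain local pointer arithmetic */
	}
	return called;
}

static int demo(void)
{
	static probe_fn probes[] = { probe_add_one, probe_add_ten, NULL };
	struct tp_sketch tp;
	int counter = 0;

	atomic_init(&tp.funcs, NULL);
	assert(tp_call(&tp, &counter) == 0);	/* nothing published yet */
	tp_publish(&tp, probes);
	tp_call(&tp, &counter);
	return counter;
}
```

The shared pointer is loaded once in tp_call(); every later step works on
the local copy, which is why the ++ needs no extra ordering. With the two
probes above, demo() returns 11.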

Mathieu

> -- Steve
> 
> 
> > +		}							\
> > +		rcu_read_unlock_sched();				\
> > +	} while (0)
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [patch 02/17] Kernel Tracepoints
  2008-07-24 15:34   ` Steven Rostedt
@ 2008-07-24 20:30     ` Mathieu Desnoyers
  2008-07-24 22:22       ` Steven Rostedt
  0 siblings, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2008-07-24 20:30 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra,
	Masami Hiramatsu, Frank Ch. Eigler, Hideo AOKI, Takashi Nishiie,
	Alexander Viro, Eduard - Gabriel Munteanu

* Steven Rostedt (rostedt@goodmis.org) wrote:
> 
> 
> On Tue, 15 Jul 2008, Mathieu Desnoyers wrote:
> > +static void *
> > +tracepoint_entry_remove_probe(struct tracepoint_entry *entry, void *probe)
> > +{
> > +	int nr_probes = 0, nr_del = 0, i;
> > +	void **old, **new;
> > +
> > +	old = entry->funcs;
> > +
> > +	debug_print_probes(entry);
> > +	/* (N -> M), (N > 1, M >= 0) probes */
> > +	for (nr_probes = 0; old[nr_probes]; nr_probes++) {
> > +		if ((!probe || old[nr_probes] == probe))
> > +			nr_del++;
> > +	}
> > +
> > +	if (nr_probes - nr_del == 0) {
> > +		/* N -> 0, (N > 1) */
> > +		entry->funcs = NULL;
> > +		entry->refcount = 0;
> > +		debug_print_probes(entry);
> > +		return old;
> > +	} else {
> > +		int j = 0;
> > +		/* N -> M, (N > 1, M > 0) */
> > +		/* + 1 for NULL */
> > +		new = kzalloc((nr_probes - nr_del + 1)
> > +			* sizeof(void *), GFP_KERNEL);
> > +		if (new == NULL)
> > +			return ERR_PTR(-ENOMEM);
> 
> Hmm, on failure of allocating a new array, we could simply use the
> old array, and remove the one probe from it instead of just failing.
> 

Nay, because of RCU constraints. Readers in the current RCU window need
to see the old version, and readers in the following window need to see
the new version. Both can be live on the system at the same time. We
cannot reuse the same memory to shrink the array without corrupting the
data seen by the earlier readers, so we really have to perform a copy
here.
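
To make the copy requirement concrete, here is a simplified user-space
sketch (an illustration only, not the patch itself). The helper
remove_probe_copy and the demo function are hypothetical names, calloc
stands in for kzalloc, and the "NULL probe removes everything" case from
the patch is left out.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper: build a new NULL-terminated array without `probe`.
 * The old array is never written to, so a reader still walking it sees
 * consistent data. Returns NULL if no probe remains or on allocation
 * failure; *nr_left tells the two cases apart (-1 on failure).
 */
static void **remove_probe_copy(void **old, void *probe, int *nr_left)
{
	int nr_probes = 0, nr_del = 0, i, j = 0;
	void **new_funcs;

	for (nr_probes = 0; old[nr_probes]; nr_probes++)
		if (old[nr_probes] == probe)
			nr_del++;

	*nr_left = nr_probes - nr_del;
	if (*nr_left == 0)
		return NULL;

	/* + 1 for the NULL terminator; calloc zeroes it for us. */
	new_funcs = calloc((size_t)*nr_left + 1, sizeof(void *));
	if (!new_funcs) {
		*nr_left = -1;
		return NULL;
	}
	for (i = 0; old[i]; i++)
		if (old[i] != probe)
			new_funcs[j++] = old[i];
	return new_funcs;
}

/* Returns 1 if the new array is correct and the old one is unchanged. */
static int demo_remove(void)
{
	int a, b, c, nr_left, ok;
	void *old[] = { &a, &b, &c, NULL };
	void **new_funcs = remove_probe_copy(old, &b, &nr_left);

	assert(new_funcs != NULL);
	ok = nr_left == 2 &&
	     new_funcs[0] == (void *)&a && new_funcs[1] == (void *)&c &&
	     new_funcs[2] == NULL &&
	     old[1] == (void *)&b;	/* old version untouched */
	free(new_funcs);
	return ok;
}
```

Shrinking the old array in place would instead move entries underneath a
reader that is still iterating over it. The caller remains responsible for
releasing the old array only once the earlier readers are done, which this
sketch does not model.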

Mathieu


> -- Steve
> 
> > +		for (i = 0; old[i]; i++)
> > +			if ((probe && old[i] != probe))
> > +				new[j++] = old[i];
> > +		entry->refcount = nr_probes - nr_del;
> > +		entry->funcs = new;
> > +	}
> > +	debug_print_probes(entry);
> > +	return old;
> > +}
> > +

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [PATCH] Tracepoints use TABLE_SIZE macro
  2008-07-24 15:39   ` Steven Rostedt
@ 2008-07-24 20:37     ` Mathieu Desnoyers
  0 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2008-07-24 20:37 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra,
	Masami Hiramatsu, Frank Ch. Eigler, Hideo AOKI, Takashi Nishiie,
	Alexander Viro, Eduard - Gabriel Munteanu

Steven Rostedt <rostedt@goodmis.org> :

Wouldn't it look nicer to have: (TRACEPOINT_TABLE_SIZE - 1) ?

me :

Sure,

It applies on top of the "Tracepoints" patch, currently in the ftrace
tree.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: akpm@linux-foundation.org
CC: Ingo Molnar <mingo@elte.hu>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Masami Hiramatsu <mhiramat@redhat.com>
CC: "Frank Ch. Eigler" <fche@redhat.com>
CC: Hideo AOKI <haoki@redhat.com>
CC: Takashi Nishiie <t-nishiie@np.css.fujitsu.com>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 kernel/tracepoint.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Index: linux-2.6-lttng/kernel/tracepoint.c
===================================================================
--- linux-2.6-lttng.orig/kernel/tracepoint.c	2008-07-24 16:33:52.000000000 -0400
+++ linux-2.6-lttng/kernel/tracepoint.c	2008-07-24 16:34:57.000000000 -0400
@@ -177,7 +177,7 @@ static struct tracepoint_entry *get_trac
 	struct tracepoint_entry *e;
 	u32 hash = jhash(name, strlen(name), 0);
 
-	head = &tracepoint_table[hash & ((1 << TRACEPOINT_HASH_BITS)-1)];
+	head = &tracepoint_table[hash & (TRACEPOINT_TABLE_SIZE - 1)];
 	hlist_for_each_entry(e, node, head, hlist) {
 		if (!strcmp(name, e->name))
 			return e;
@@ -197,7 +197,7 @@ static struct tracepoint_entry *add_trac
 	size_t name_len = strlen(name) + 1;
 	u32 hash = jhash(name, name_len-1, 0);
 
-	head = &tracepoint_table[hash & ((1 << TRACEPOINT_HASH_BITS)-1)];
+	head = &tracepoint_table[hash & (TRACEPOINT_TABLE_SIZE - 1)];
 	hlist_for_each_entry(e, node, head, hlist) {
 		if (!strcmp(name, e->name)) {
 			printk(KERN_NOTICE
@@ -233,7 +233,7 @@ static int remove_tracepoint(const char 
 	size_t len = strlen(name) + 1;
 	u32 hash = jhash(name, len-1, 0);
 
-	head = &tracepoint_table[hash & ((1 << TRACEPOINT_HASH_BITS)-1)];
+	head = &tracepoint_table[hash & (TRACEPOINT_TABLE_SIZE - 1)];
 	hlist_for_each_entry(e, node, head, hlist) {
 		if (!strcmp(name, e->name)) {
 			found = 1;
-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [patch 02/17] Kernel Tracepoints
  2008-07-24 20:30     ` Mathieu Desnoyers
@ 2008-07-24 22:22       ` Steven Rostedt
  0 siblings, 0 replies; 51+ messages in thread
From: Steven Rostedt @ 2008-07-24 22:22 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra,
	Masami Hiramatsu, Frank Ch. Eigler, Hideo AOKI, Takashi Nishiie,
	Alexander Viro, Eduard - Gabriel Munteanu


On Thu, 24 Jul 2008, Mathieu Desnoyers wrote:
> > > +
> > > +	if (nr_probes - nr_del == 0) {
> > > +		/* N -> 0, (N > 1) */
> > > +		entry->funcs = NULL;
> > > +		entry->refcount = 0;
> > > +		debug_print_probes(entry);
> > > +		return old;
> > > +	} else {
> > > +		int j = 0;
> > > +		/* N -> M, (N > 1, M > 0) */
> > > +		/* + 1 for NULL */
> > > +		new = kzalloc((nr_probes - nr_del + 1)
> > > +			* sizeof(void *), GFP_KERNEL);
> > > +		if (new == NULL)
> > > +			return ERR_PTR(-ENOMEM);
> >
> > Hmm, on failure of allocating a new array, we could simply use the
> > old array, and remove the one probe from it instead of just failing.
> >
>
> Nay, because of RCU constraints. So we have the readers in the current
> RCU window who need to see the old version, and readers of the following
> window who need to see the next version. Both can live at the same time
> on the system. We cannot reuse the same memory to perform the array
> shrink without corrupting the data seen by the previous readers. We
> really have to perform a copy here.

Ah, good point. I forgot the whole RCU factor here.

-- Steve



* Re: [patch 01/17] RCU read sched
  2008-07-15 22:26 ` [patch 01/17] RCU read sched Mathieu Desnoyers
@ 2008-08-01 21:10   ` Paul E. McKenney
  2008-08-01 23:04     ` Peter Zijlstra
  0 siblings, 1 reply; 51+ messages in thread
From: Paul E. McKenney @ 2008-08-01 21:10 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: akpm, Ingo Molnar, linux-kernel, Peter Zijlstra, Masami Hiramatsu

On Tue, Jul 15, 2008 at 06:26:05PM -0400, Mathieu Desnoyers wrote:
> Add rcu_read_lock_sched() and rcu_read_unlock_sched() to rcupdate.h to match the
> recently added write-side call_rcu_sched() and rcu_barrier_sched(). They also
> match the not-so-recently-added synchronize_sched().
> 
> This will help in following the matching use of the update/read lock primitives.
> These new read locks will replace preempt_disable()/enable() used in pair with
> RCU-classic synchronization.

Looks good, but...

synchronize_sched(), call_rcu_sched(), and rcu_barrier_sched() can also
pair up with:

o	local_irq_save() and local_irq_restore()
o	local_irq_disable() and local_irq_enable()
o	spin_lock_irqsave() and spin_unlock_irqrestore()
o	etc. etc.

I do very much like the idea of marking the intent of matching with
RCU, but am getting a bit queasy about adding rcu_read_lock_sched_irq()
and so on.

Thoughts?  Other than having an rcu_read_lock_sched_nop() or some
other window-dressing macro that doesn't really do anything?  (Which
might really be the right thing to do...)

						Thanx, Paul

> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> CC: Peter Zijlstra <peterz@infradead.org>
> CC: Paul E McKenney <paulmck@linux.vnet.ibm.com>
> CC: akpm@linux-foundation.org
> ---
>  include/linux/rcupdate.h |   18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> Index: linux-2.6-lttng/include/linux/rcupdate.h
> ===================================================================
> --- linux-2.6-lttng.orig/include/linux/rcupdate.h	2008-07-15 15:28:08.000000000 -0400
> +++ linux-2.6-lttng/include/linux/rcupdate.h	2008-07-15 17:38:02.000000000 -0400
> @@ -133,6 +133,24 @@ struct rcu_head {
>  #define rcu_read_unlock_bh() __rcu_read_unlock_bh()
> 
>  /**
> + * rcu_read_lock_sched - mark the beginning of a RCU-classic critical section
> + *
> + * Should be used with either
> + * - synchronize_sched()
> + * or
> + * - call_rcu_sched() and rcu_barrier_sched()
> + * on the write-side to ensure proper synchronization.
> + */
> +#define rcu_read_lock_sched() preempt_disable()
> +
> +/*
> + * rcu_read_unlock_sched - marks the end of a RCU-classic critical section
> + *
> + * See rcu_read_lock_sched for more information.
> + */
> +#define rcu_read_unlock_sched() preempt_enable()
> +
> +/**
>   * rcu_dereference - fetch an RCU-protected pointer in an
>   * RCU read-side critical section.  This pointer may later
>   * be safely dereferenced.
> 
> -- 
> Mathieu Desnoyers
> Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
> OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [patch 02/17] Kernel Tracepoints
  2008-07-24 20:18     ` Mathieu Desnoyers
@ 2008-08-01 21:10       ` Paul E. McKenney
  2008-08-04 15:17         ` Mathieu Desnoyers
  0 siblings, 1 reply; 51+ messages in thread
From: Paul E. McKenney @ 2008-08-01 21:10 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Steven Rostedt, Andrew Morton, Ingo Molnar, LKML, Peter Zijlstra,
	Masami Hiramatsu, Frank Ch. Eigler, Hideo AOKI, Takashi Nishiie,
	Alexander Viro, Eduard - Gabriel Munteanu

On Thu, Jul 24, 2008 at 04:18:00PM -0400, Mathieu Desnoyers wrote:
> * Steven Rostedt (rostedt@goodmis.org) wrote:
> > 
> > [Added Paul McKenney to CC]
> > 
> > On Tue, 15 Jul 2008, Mathieu Desnoyers wrote:
> > > +++ linux-2.6-lttng/include/linux/tracepoint.h	2008-07-15 17:35:19.000000000 -0400
> > > @@ -0,0 +1,127 @@
> > > +#ifndef _LINUX_TRACEPOINT_H
> > > +#define _LINUX_TRACEPOINT_H
> > > +
> > > +/*
> > > + * Kernel Tracepoint API.
> > > + *
> > > + * See Documentation/tracepoint.txt.
> > > + *
> > > + * (C) Copyright 2008 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> > > + *
> > > + * Heavily inspired from the Linux Kernel Markers.
> > > + *
> > > + * This file is released under the GPLv2.
> > > + * See the file COPYING for more details.
> > > + */
> > > +
> > > +#include <linux/types.h>
> > > +#include <linux/rcupdate.h>
> > > +
> > > +struct module;
> > > +struct tracepoint;
> > > +
> > > +struct tracepoint {
> > > +	const char *name;		/* Tracepoint name */
> > > +	int state;			/* State. */
> > > +	void **funcs;
> > > +} __attribute__((aligned(8)));
> > > +
> > > +
> > > +#define TPPROTO(args...)	args
> > > +#define TPARGS(args...)		args
> > > +
> > > +#ifdef CONFIG_TRACEPOINTS
> > > +
> > > +/*
> > > + * it_func[0] is never NULL because there is at least one element in the array
> > > + * when the array itself is non NULL.
> > > + */
> > > +#define __DO_TRACE(tp, proto, args)					\
> > > +	do {								\
> > > +		void **it_func;						\
> > > +									\
> > > +		rcu_read_lock_sched();					\
> > > +		it_func = rcu_dereference((tp)->funcs);			\
> > > +		if (it_func) {						\
> > > +			do {						\
> > > +				((void(*)(proto))(*it_func))(args);	\
> > > +			} while (*(++it_func));				\
> > 
> > OK, I still don't understand the concept of the rcu_dereference, but why
> > is it needed for the first assignment of it_func but not the ++? Is it
> > only needed with the (tp)->funcs?
> > 
> 
> rcu_dereference copies the tp->funcs pointer on the local stack and then
> puts a smp_read_barrier_depends() to make sure that the tp->funcs read
> occurs before the actual use of the data (here, it is the array
> elements) where the tp->funcs pointer copy points to.
> 
> What happens here is that the tp->funcs pointer, pointing to the
> beginning of the array, is only read once. Afterward, the iterator is
> located on the stack and therefore incrementing it does not need to be
> protected by any other kind of barrier whatsoever because only the
> original tp->funcs read was a RCU pointer read.
> 
> Then, as you probably know, the update side performs a
> rcu_assign_pointer which does a smp_wmb before the pointer assignment to
> make sure the array data has been populated before the pointer
> assignment.

So the update side inserts a whole new array, rather than just the
first entry, correct?  If so, I am happy.

							Thanx, Paul

> Mathieu
> 
> > -- Steve
> > 
> > 
> > > +		}							\
> > > +		rcu_read_unlock_sched();				\
> > > +	} while (0)
> > 
> 
> -- 
> Mathieu Desnoyers
> OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [patch 01/17] RCU read sched
  2008-08-01 21:10   ` Paul E. McKenney
@ 2008-08-01 23:04     ` Peter Zijlstra
  0 siblings, 0 replies; 51+ messages in thread
From: Peter Zijlstra @ 2008-08-01 23:04 UTC (permalink / raw)
  To: paulmck
  Cc: Mathieu Desnoyers, akpm, Ingo Molnar, linux-kernel, Masami Hiramatsu

On Fri, 2008-08-01 at 14:10 -0700, Paul E. McKenney wrote:
> On Tue, Jul 15, 2008 at 06:26:05PM -0400, Mathieu Desnoyers wrote:
> > Add rcu_read_lock_sched() and rcu_read_unlock_sched() to rcupdate.h to match the
> > recently added write-side call_rcu_sched() and rcu_barrier_sched(). They also
> > match the not-so-recently-added synchronize_sched().
> > 
> > This will help in following the matching use of the update/read lock primitives.
> > These new read locks will replace preempt_disable()/enable() used in pair with
> > RCU-classic synchronization.
> 
> Looks good, but...
> 
> synchronize_sched(), call_rcu_sched(), and rcu_barrier_sched() can also
> pair up with:
> 
> o	local_irq_save() and local_irq_restore()
> o	local_irq_disable() and local_irq_enable()

> o	spin_lock_irqsave() and spin_unlock_irqrestore()

You can't actually, as on PREEMPT_RT these will not actually disable
preemption.

> o	etc. etc.
> 
> I do very much like the idea of marking the intent of matching with
> RCU, but am getting a bit queasy about adding rcu_read_lock_sched_irq()
> and so on.

I'm thinking that if you disable interrupts, you're doing that for
another reason than RCU, so I'm not seeing the need for
rcu_read_lock_sched_irq variants.

Also, we should be very careful with using the *sched* RCU variant as it
really relies on disabling preemption - we should only use it when there
really is no other option, as we generally prefer to keep stuff
preemptable.

> Thoughts?  Other than having an rcu_read_lock_sched_nop() or some
> other window-dressing macro that doesn't really do anything?  (Which
> might really be the right thing to do...)

Afraid you lost me here..



* Re: [patch 02/17] Kernel Tracepoints
  2008-08-01 21:10       ` Paul E. McKenney
@ 2008-08-04 15:17         ` Mathieu Desnoyers
  0 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2008-08-04 15:17 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, Andrew Morton, Ingo Molnar, LKML, Peter Zijlstra,
	Masami Hiramatsu, Frank Ch. Eigler, Hideo AOKI, Takashi Nishiie,
	Alexander Viro, Eduard - Gabriel Munteanu

* Paul E. McKenney (paulmck@linux.vnet.ibm.com) wrote:
> On Thu, Jul 24, 2008 at 04:18:00PM -0400, Mathieu Desnoyers wrote:
> > * Steven Rostedt (rostedt@goodmis.org) wrote:
> > > 
> > > [Added Paul McKenney to CC]
> > > 
> > > On Tue, 15 Jul 2008, Mathieu Desnoyers wrote:
> > > > +++ linux-2.6-lttng/include/linux/tracepoint.h	2008-07-15 17:35:19.000000000 -0400
> > > > @@ -0,0 +1,127 @@
> > > > +#ifndef _LINUX_TRACEPOINT_H
> > > > +#define _LINUX_TRACEPOINT_H
> > > > +
> > > > +/*
> > > > + * Kernel Tracepoint API.
> > > > + *
> > > > + * See Documentation/tracepoint.txt.
> > > > + *
> > > > + * (C) Copyright 2008 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> > > > + *
> > > > + * Heavily inspired from the Linux Kernel Markers.
> > > > + *
> > > > + * This file is released under the GPLv2.
> > > > + * See the file COPYING for more details.
> > > > + */
> > > > +
> > > > +#include <linux/types.h>
> > > > +#include <linux/rcupdate.h>
> > > > +
> > > > +struct module;
> > > > +struct tracepoint;
> > > > +
> > > > +struct tracepoint {
> > > > +	const char *name;		/* Tracepoint name */
> > > > +	int state;			/* State. */
> > > > +	void **funcs;
> > > > +} __attribute__((aligned(8)));
> > > > +
> > > > +
> > > > +#define TPPROTO(args...)	args
> > > > +#define TPARGS(args...)		args
> > > > +
> > > > +#ifdef CONFIG_TRACEPOINTS
> > > > +
> > > > +/*
> > > > + * it_func[0] is never NULL because there is at least one element in the array
> > > > + * when the array itself is non NULL.
> > > > + */
> > > > +#define __DO_TRACE(tp, proto, args)					\
> > > > +	do {								\
> > > > +		void **it_func;						\
> > > > +									\
> > > > +		rcu_read_lock_sched();					\
> > > > +		it_func = rcu_dereference((tp)->funcs);			\
> > > > +		if (it_func) {						\
> > > > +			do {						\
> > > > +				((void(*)(proto))(*it_func))(args);	\
> > > > +			} while (*(++it_func));				\
> > > 
> > > OK, I still don't understand the concept of the rcu_dereference, but why
> > > is it needed for the first assignment of it_func but not the ++? Is it
> > > only needed with the (tp)->funcs?
> > > 
> > 
> > rcu_dereference copies the tp->funcs pointer on the local stack and then
> > puts a smp_read_barrier_depends() to make sure that the tp->funcs read
> > occurs before the actual use of the data (here, it is the array
> > elements) where the tp->funcs pointer copy points to.
> > 
> > What happens here is that the tp->funcs pointer, pointing to the
> > beginning of the array, is only read once. Afterward, the iterator is
> > located on the stack and therefore incrementing it does not need to be
> > protected by any other kind of barrier whatsoever because only the
> > original tp->funcs read was a RCU pointer read.
> > 
> > Then, as you probably know, the update side performs a
> > rcu_assign_pointer which does a smp_wmb before the pointer assignment to
> > make sure the array data has been populated before the pointer
> > assignment.
> 
> So the update side inserts a whole new array, rather than just the
> first entry, correct?  If so, I am happy.
> 

Exactly.

Mathieu

> 							Thanx, Paul
> 
> > Mathieu
> > 
> > > -- Steve
> > > 
> > > 
> > > > +		}							\
> > > > +		rcu_read_unlock_sched();				\
> > > > +	} while (0)
> > > 
> > 
> > -- 
> > Mathieu Desnoyers
> > OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


end of thread, other threads:[~2008-08-04 15:17 UTC | newest]

Thread overview: 51+ messages
2008-07-15 22:26 [patch 00/17] Tracepoints v4 for linux-next Mathieu Desnoyers
2008-07-15 22:26 ` [patch 01/17] RCU read sched Mathieu Desnoyers
2008-08-01 21:10   ` Paul E. McKenney
2008-08-01 23:04     ` Peter Zijlstra
2008-07-15 22:26 ` [patch 02/17] Kernel Tracepoints Mathieu Desnoyers
2008-07-24 15:08   ` Steven Rostedt
2008-07-24 20:18     ` Mathieu Desnoyers
2008-08-01 21:10       ` Paul E. McKenney
2008-08-04 15:17         ` Mathieu Desnoyers
2008-07-24 15:34   ` Steven Rostedt
2008-07-24 20:30     ` Mathieu Desnoyers
2008-07-24 22:22       ` Steven Rostedt
2008-07-24 15:39   ` Steven Rostedt
2008-07-24 20:37     ` [PATCH] Tracepoints use TABLE_SIZE macro Mathieu Desnoyers
2008-07-15 22:26 ` [patch 03/17] Tracepoints Documentation Mathieu Desnoyers
2008-07-15 22:26 ` [patch 04/17] Tracepoints Samples Mathieu Desnoyers
2008-07-15 22:26 ` [patch 05/17] LTTng instrumentation - irq Mathieu Desnoyers
2008-07-15 22:26 ` [patch 06/17] LTTng instrumentation - scheduler Mathieu Desnoyers
2008-07-16  8:30   ` Peter Zijlstra
2008-07-16 14:18     ` Mathieu Desnoyers
2008-07-15 22:26 ` [patch 07/17] LTTng instrumentation - timer Mathieu Desnoyers
2008-07-16  8:34   ` Peter Zijlstra
2008-07-16 14:34     ` Mathieu Desnoyers
2008-07-15 22:26 ` [patch 08/17] LTTng instrumentation - kernel Mathieu Desnoyers
2008-07-24 13:57   ` Steven Rostedt
2008-07-24 14:30     ` Mathieu Desnoyers
2008-07-24 15:13       ` Steven Rostedt
2008-07-15 22:26 ` [patch 09/17] LTTng instrumentation - filemap Mathieu Desnoyers
2008-07-16  8:35   ` Peter Zijlstra
2008-07-16 14:37     ` Mathieu Desnoyers
2008-07-17  6:25   ` Nick Piggin
2008-07-17  7:02     ` Mathieu Desnoyers
2008-07-17  7:11       ` Nick Piggin
2008-07-15 22:26 ` [patch 10/17] LTTng instrumentation - swap Mathieu Desnoyers
2008-07-16  8:39   ` Peter Zijlstra
2008-07-16 14:40     ` Mathieu Desnoyers
2008-07-16 14:47       ` Peter Zijlstra
2008-07-16 15:00         ` Mathieu Desnoyers
2008-07-16 15:50           ` KOSAKI Motohiro
2008-07-16 16:17             ` Mathieu Desnoyers
2008-07-15 22:26 ` [patch 11/17] LTTng instrumentation - memory page faults Mathieu Desnoyers
2008-07-15 22:26 ` [patch 12/17] LTTng instrumentation - page Mathieu Desnoyers
2008-07-16  8:41   ` Peter Zijlstra
2008-07-16 15:03     ` Mathieu Desnoyers
2008-07-15 22:26 ` [patch 13/17] LTTng instrumentation - hugetlb Mathieu Desnoyers
2008-07-15 22:26 ` [patch 14/17] LTTng instrumentation - net Mathieu Desnoyers
2008-07-15 22:26 ` [patch 15/17] LTTng instrumentation - ipv4 Mathieu Desnoyers
2008-07-15 22:26 ` [patch 16/17] LTTng instrumentation - ipv6 Mathieu Desnoyers
2008-07-15 22:26 ` [patch 17/17] ftrace port to tracepoints Mathieu Desnoyers
2008-07-16  8:51 ` [patch 00/17] Tracepoints v4 for linux-next Peter Zijlstra
2008-07-18 15:41 ` Masami Hiramatsu
