linux-kernel.vger.kernel.org archive mirror
* [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9
@ 2009-03-05 22:47 Mathieu Desnoyers
  2009-03-05 22:47 ` [RFC patch 01/41] LTTng - core header Mathieu Desnoyers
                   ` (42 more replies)
  0 siblings, 43 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:47 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md

Hi,

I spent the last 4-5 months working with the Fujitsu team on implementing the
tracer elements identified as goals at Kernel Summit 2008 and at the following
Plumbers Conference. My idea was to incrementally adapt the LTTng tracer,
currently used in industry and well tested, to those requirements.

I spent the last few days rearranging/folding/inspecting the LTTng patchset
to prepare it for an LKML post. Version 0.105 in the LTTng git tree
corresponds to the patchset I am posting here. This patchset includes
only the core features of LTTng, excluding the timestamping
infrastructure (trace clock) and excluding the instrumentation.

The corresponding git tree also contains the trace clock patches and the lttng
instrumentation. The trace clock is required to use the tracer, but the tracer
can be used without the instrumentation: kprobes and userspace event support
are already included in this patchset.

This tracer exports binary data through buffers using splice(). The resulting
binary files can be parsed from userspace because the format string metadata is
exported in the files. The event set can be enhanced by adding tracepoints to
the kernel code and by creating probe modules, which connect callbacks to the
tracepoints and contain the format string metainformation. Those callbacks are
responsible for writing the data into the trace buffers. This separation between
the trace buffer format strings and the tracepoints is deliberate, so that the
core kernel instrumentation (tracepoints) is not exported to userspace, which
will make maintenance much easier.

The tree including the trace clock patches is available at:

git://git.kernel.org/pub/scm/linux/kernel/git/compudj/linux-2.6-lttng.git
branch : 2.6.29-rc7-lttng-0.105

Project website : http://www.lttng.org/

Information about how to install and use the tracer is available at:

http://ltt.polymtl.ca/svn/trunk/lttv/LTTngManual.html

The LTTng core patchset consists of 41 patches. The diffstat is as
follows:

 include/linux/ltt-core.h                                  |   35 
 include/linux/ltt-relay.h                                 |  161 +
 include/linux/ltt-tracer.h                                |   43 
 include/linux/marker.h                                    |  121 
 kernel/marker.c                                           |  353 ++
 kernel/module.c                                           |   31 
 linux-2.6-lttng/Documentation/markers.txt                 |   17 
 linux-2.6-lttng/MAINTAINERS                               |    7 
 linux-2.6-lttng/Makefile                                  |    2 
 linux-2.6-lttng/arch/powerpc/kernel/traps.c               |    5 
 linux-2.6-lttng/arch/powerpc/platforms/cell/spufs/spufs.h |    6 
 linux-2.6-lttng/arch/sparc/Makefile                       |    2 
 linux-2.6-lttng/arch/x86/kernel/dumpstack.c               |    5 
 linux-2.6-lttng/arch/x86/mm/fault.c                       |    1 
 linux-2.6-lttng/fs/ext4/fsync.c                           |    8 
 linux-2.6-lttng/fs/ext4/ialloc.c                          |   17 
 linux-2.6-lttng/fs/ext4/inode.c                           |   79 
 linux-2.6-lttng/fs/ext4/mballoc.c                         |   71 
 linux-2.6-lttng/fs/ext4/mballoc.h                         |    2 
 linux-2.6-lttng/fs/ext4/super.c                           |    6 
 linux-2.6-lttng/fs/jbd2/checkpoint.c                      |    7 
 linux-2.6-lttng/fs/jbd2/commit.c                          |   12 
 linux-2.6-lttng/fs/pipe.c                                 |    5 
 linux-2.6-lttng/fs/select.c                               |   41 
 linux-2.6-lttng/fs/seq_file.c                             |   45 
 linux-2.6-lttng/fs/splice.c                               |    1 
 linux-2.6-lttng/include/linux/immediate.h                 |   94 
 linux-2.6-lttng/include/linux/kvm_host.h                  |   12 
 linux-2.6-lttng/include/linux/ltt-channels.h              |   94 
 linux-2.6-lttng/include/linux/ltt-core.h                  |   47 
 linux-2.6-lttng/include/linux/ltt-relay.h                 |  186 +
 linux-2.6-lttng/include/linux/ltt-tracer.h                |  731 ++++++
 linux-2.6-lttng/include/linux/ltt-type-serializer.h       |  107 
 linux-2.6-lttng/include/linux/marker.h                    |   16 
 linux-2.6-lttng/include/linux/module.h                    |    6 
 linux-2.6-lttng/include/linux/poll.h                      |    2 
 linux-2.6-lttng/include/linux/seq_file.h                  |   20 
 linux-2.6-lttng/include/trace/ext4.h                      |  129 +
 linux-2.6-lttng/include/trace/jbd2.h                      |   19 
 linux-2.6-lttng/init/Kconfig                              |    2 
 linux-2.6-lttng/kernel/kallsyms.c                         |    1 
 linux-2.6-lttng/kernel/marker.c                           |   12 
 linux-2.6-lttng/kernel/module.c                           |   32 
 linux-2.6-lttng/ltt/Kconfig                               |  130 +
 linux-2.6-lttng/ltt/Makefile                              |   15 
 linux-2.6-lttng/ltt/ltt-channels.c                        |  338 ++
 linux-2.6-lttng/ltt/ltt-core.c                            |  101 
 linux-2.6-lttng/ltt/ltt-filter.c                          |   66 
 linux-2.6-lttng/ltt/ltt-kprobes.c                         |  479 +++
 linux-2.6-lttng/ltt/ltt-marker-control.c                  |  265 ++
 linux-2.6-lttng/ltt/ltt-relay-alloc.c                     |  715 +++++
 linux-2.6-lttng/ltt/ltt-relay-locked.c                    | 1704 ++++++++++++++
 linux-2.6-lttng/ltt/ltt-serialize.c                       |  685 +++++
 linux-2.6-lttng/ltt/ltt-trace-control.c                   | 1061 ++++++++
 linux-2.6-lttng/ltt/ltt-tracer.c                          | 1210 +++++++++
 linux-2.6-lttng/ltt/ltt-type-serializer.c                 |   96 
 linux-2.6-lttng/ltt/ltt-userspace-event.c                 |  131 +
 linux-2.6-lttng/samples/markers/Makefile                  |    2 
 linux-2.6-lttng/samples/markers/marker-example.c          |    4 
 linux-2.6-lttng/samples/markers/probe-example.c           |   10 
 linux-2.6-lttng/samples/markers/test-multi.c              |  120 
 linux-2.6-lttng/virt/kvm/kvm_trace.c                      |   12 
 ltt/Kconfig                                               |   24 
 ltt/Makefile                                              |    2 
 ltt/ltt-relay-alloc.c                                     |   80 
 65 files changed, 9445 insertions(+), 398 deletions(-)


Comments are welcome.

Mathieu


-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [RFC patch 01/41] LTTng - core header
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
@ 2009-03-05 22:47 ` Mathieu Desnoyers
  2009-03-06 18:37   ` Steven Rostedt
  2009-03-05 22:47 ` [RFC patch 02/41] LTTng - core data structures Mathieu Desnoyers
                   ` (41 subsequent siblings)
  42 siblings, 1 reply; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:47 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers

[-- Attachment #1: lttng-core-header.patch --]
[-- Type: text/plain, Size: 1917 bytes --]

Contains the structures required by the builtin part of the LTTng tracer.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 include/linux/ltt-core.h |   47 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

Index: linux-2.6-lttng/include/linux/ltt-core.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/linux/ltt-core.h	2009-03-04 13:37:26.000000000 -0500
@@ -0,0 +1,47 @@
+/*
+ * Copyright (C) 2005,2006 Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca)
+ *
+ * This contains the core definitions for the Linux Trace Toolkit.
+ */
+
+#ifndef LTT_CORE_H
+#define LTT_CORE_H
+
+#include <linux/list.h>
+#include <linux/percpu.h>
+
+/* ltt's root dir in debugfs */
+#define LTT_ROOT        "ltt"
+
+/*
+ * All modifications of ltt_traces must be done by ltt-tracer.c, while holding
+ * the semaphore. Only reading of this information can be done elsewhere, with
+ * the RCU mechanism : the preemption must be disabled while reading the
+ * list.
+ */
+struct ltt_traces {
+	struct list_head setup_head;	/* Pre-allocated traces list */
+	struct list_head head;		/* Allocated Traces list */
+	unsigned int num_active_traces;	/* Number of active traces */
+} ____cacheline_aligned;
+
+extern struct ltt_traces ltt_traces;
+
+/*
+ * get dentry of ltt's root dir
+ */
+struct dentry *get_ltt_root(void);
+
+void put_ltt_root(void);
+
+/* Keep track of trap nesting inside LTT */
+DECLARE_PER_CPU(unsigned int, ltt_nesting);
+
+typedef int (*ltt_run_filter_functor)(void *trace, uint16_t eID);
+
+extern ltt_run_filter_functor ltt_run_filter;
+
+extern void ltt_filter_register(ltt_run_filter_functor func);
+extern void ltt_filter_unregister(void);
+
+#endif /* LTT_CORE_H */

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [RFC patch 02/41] LTTng - core data structures
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
  2009-03-05 22:47 ` [RFC patch 01/41] LTTng - core header Mathieu Desnoyers
@ 2009-03-05 22:47 ` Mathieu Desnoyers
  2009-03-06 18:41   ` Steven Rostedt
  2009-03-05 22:47 ` [RFC patch 03/41] LTTng core x86 Mathieu Desnoyers
                   ` (40 subsequent siblings)
  42 siblings, 1 reply; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:47 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers

[-- Attachment #1: lttng-core.patch --]
[-- Type: text/plain, Size: 3839 bytes --]

Home of the traces data structures. Needs to be built into the kernel.

LTT heartbeat is a module specialized in firing periodic interrupts to
record events in traces (so cycle counter rollovers can be detected) and to
update the 64-bit "synthetic TSC" (extended from the CPU's 32-bit TSC on MIPS).
It also needs to be built into the kernel.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 MAINTAINERS              |    7 +++
 include/linux/ltt-core.h |   10 ++++
 ltt/ltt-core.c           |  101 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 118 insertions(+)

Index: linux-2.6-lttng/MAINTAINERS
===================================================================
--- linux-2.6-lttng.orig/MAINTAINERS	2009-03-04 13:24:38.000000000 -0500
+++ linux-2.6-lttng/MAINTAINERS	2009-03-04 13:24:59.000000000 -0500
@@ -2766,6 +2766,13 @@ P:	Eric Piel
 M:	eric.piel@tremplin-utc.net
 S:	Maintained
 
+LINUX TRACE TOOLKIT NEXT GENERATION
+P:	Mathieu Desnoyers
+M:	mathieu.desnoyers@polymtl.ca
+L:	ltt-dev@lttng.org
+W:	http://ltt.polymtl.ca
+S:	Maintained
+
 LM83 HARDWARE MONITOR DRIVER
 P:	Jean Delvare
 M:	khali@linux-fr.org
Index: linux-2.6-lttng/ltt/ltt-core.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/ltt/ltt-core.c	2009-03-04 13:36:17.000000000 -0500
@@ -0,0 +1,101 @@
+/*
+ * LTT core in-kernel infrastructure.
+ *
+ * Copyright 2006 - Mathieu Desnoyers mathieu.desnoyers@polymtl.ca
+ *
+ * Distributed under the GPL license
+ */
+
+#include <linux/ltt-core.h>
+#include <linux/percpu.h>
+#include <linux/module.h>
+#include <linux/debugfs.h>
+#include <linux/kref.h>
+
+/* Traces structures */
+struct ltt_traces ltt_traces = {
+	.setup_head = LIST_HEAD_INIT(ltt_traces.setup_head),
+	.head = LIST_HEAD_INIT(ltt_traces.head),
+};
+EXPORT_SYMBOL(ltt_traces);
+
+/* Traces list writer locking */
+static DEFINE_MUTEX(ltt_traces_mutex);
+
+/* root dentry mutex */
+static DEFINE_MUTEX(ltt_root_mutex);
+/* dentry of ltt's root dir */
+static struct dentry *ltt_root_dentry;
+static struct kref ltt_root_kref = {
+	.refcount = ATOMIC_INIT(0),
+};
+
+static void ltt_root_release(struct kref *ref)
+{
+	debugfs_remove(ltt_root_dentry);
+	ltt_root_dentry = NULL;
+}
+
+void put_ltt_root(void)
+{
+	mutex_lock(&ltt_root_mutex);
+	if (ltt_root_dentry)
+		kref_put(&ltt_root_kref, ltt_root_release);
+	mutex_unlock(&ltt_root_mutex);
+}
+EXPORT_SYMBOL_GPL(put_ltt_root);
+
+struct dentry *get_ltt_root(void)
+{
+	mutex_lock(&ltt_root_mutex);
+	if (!ltt_root_dentry) {
+		ltt_root_dentry = debugfs_create_dir(LTT_ROOT, NULL);
+		if (!ltt_root_dentry) {
+			printk(KERN_ERR "LTT : create ltt root dir failed\n");
+			goto out;
+		}
+		kref_init(&ltt_root_kref);
+		goto out;
+	}
+	kref_get(&ltt_root_kref);
+out:
+	mutex_unlock(&ltt_root_mutex);
+	return ltt_root_dentry;
+}
+EXPORT_SYMBOL_GPL(get_ltt_root);
+
+void ltt_lock_traces(void)
+{
+	mutex_lock(&ltt_traces_mutex);
+}
+EXPORT_SYMBOL_GPL(ltt_lock_traces);
+
+void ltt_unlock_traces(void)
+{
+	mutex_unlock(&ltt_traces_mutex);
+}
+EXPORT_SYMBOL_GPL(ltt_unlock_traces);
+
+DEFINE_PER_CPU(unsigned int, ltt_nesting);
+EXPORT_PER_CPU_SYMBOL(ltt_nesting);
+
+int ltt_run_filter_default(void *trace, uint16_t eID)
+{
+	return 1;
+}
+
+/* This function pointer is protected by a trace activation check */
+ltt_run_filter_functor ltt_run_filter = ltt_run_filter_default;
+EXPORT_SYMBOL_GPL(ltt_run_filter);
+
+void ltt_filter_register(ltt_run_filter_functor func)
+{
+	ltt_run_filter = func;
+}
+EXPORT_SYMBOL_GPL(ltt_filter_register);
+
+void ltt_filter_unregister(void)
+{
+	ltt_run_filter = ltt_run_filter_default;
+}
+EXPORT_SYMBOL_GPL(ltt_filter_unregister);

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [RFC patch 03/41] LTTng core x86
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
  2009-03-05 22:47 ` [RFC patch 01/41] LTTng - core header Mathieu Desnoyers
  2009-03-05 22:47 ` [RFC patch 02/41] LTTng - core data structures Mathieu Desnoyers
@ 2009-03-05 22:47 ` Mathieu Desnoyers
  2009-03-05 22:47 ` [RFC patch 04/41] LTTng core powerpc Mathieu Desnoyers
                   ` (39 subsequent siblings)
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:47 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers, Thomas Gleixner, Ingo Molnar

[-- Attachment #1: lttng-core-x86.patch --]
[-- Type: text/plain, Size: 1220 bytes --]

Adds an indication of nesting within tracer code to the OOPS output.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: H. Peter Anvin <hpa@zytor.com>
---
 arch/x86/kernel/dumpstack.c |    5 +++++
 1 file changed, 5 insertions(+)

Index: linux-2.6-lttng/arch/x86/kernel/dumpstack.c
===================================================================
--- linux-2.6-lttng.orig/arch/x86/kernel/dumpstack.c	2009-01-30 10:06:43.000000000 -0500
+++ linux-2.6-lttng/arch/x86/kernel/dumpstack.c	2009-01-30 10:07:35.000000000 -0500
@@ -14,6 +14,7 @@
 #include <linux/bug.h>
 #include <linux/nmi.h>
 #include <linux/sysfs.h>
+#include <linux/ltt-core.h>
 
 #include <asm/stacktrace.h>
 
@@ -254,6 +255,10 @@ int __kprobes __die(const char *str, str
 	printk("DEBUG_PAGEALLOC");
 #endif
 	printk("\n");
+#ifdef CONFIG_LTT
+	printk(KERN_EMERG "LTT NESTING LEVEL : %u", __get_cpu_var(ltt_nesting));
+	printk("\n");
+#endif
 	sysfs_printk_last_file();
 	if (notify_die(DIE_OOPS, str, regs, err,
 			current->thread.trap_no, SIGSEGV) == NOTIFY_STOP)

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [RFC patch 04/41] LTTng core powerpc
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (2 preceding siblings ...)
  2009-03-05 22:47 ` [RFC patch 03/41] LTTng core x86 Mathieu Desnoyers
@ 2009-03-05 22:47 ` Mathieu Desnoyers
  2009-03-05 22:47 ` [RFC patch 05/41] LTTng relay buffer allocation, read, write Mathieu Desnoyers
                   ` (38 subsequent siblings)
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:47 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers

[-- Attachment #1: lttng-core-powerpc.patch --]
[-- Type: text/plain, Size: 1072 bytes --]

Adds an indication of nesting within tracer code to the OOPS output.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 arch/powerpc/kernel/traps.c |    5 +++++
 1 file changed, 5 insertions(+)

Index: linux-2.6-lttng/arch/powerpc/kernel/traps.c
===================================================================
--- linux-2.6-lttng.orig/arch/powerpc/kernel/traps.c	2009-01-09 18:16:47.000000000 -0500
+++ linux-2.6-lttng/arch/powerpc/kernel/traps.c	2009-01-09 18:17:23.000000000 -0500
@@ -33,6 +33,7 @@
 #include <linux/backlight.h>
 #include <linux/bug.h>
 #include <linux/kdebug.h>
+#include <linux/ltt-core.h>
 
 #include <asm/pgtable.h>
 #include <asm/uaccess.h>
@@ -138,6 +139,10 @@ int die(const char *str, struct pt_regs 
 #ifdef CONFIG_NUMA
 		printk("NUMA ");
 #endif
+#ifdef CONFIG_LTT
+		printk("LTT NESTING LEVEL : %u ", __get_cpu_var(ltt_nesting));
+		printk("\n");
+#endif
 		printk("%s\n", ppc_md.name ? ppc_md.name : "");
 
 		print_modules();

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [RFC patch 05/41] LTTng relay buffer allocation, read, write
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (3 preceding siblings ...)
  2009-03-05 22:47 ` [RFC patch 04/41] LTTng core powerpc Mathieu Desnoyers
@ 2009-03-05 22:47 ` Mathieu Desnoyers
  2009-03-05 22:47 ` [RFC patch 06/41] LTTng optimize write to page function Mathieu Desnoyers
                   ` (37 subsequent siblings)
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:47 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers, Jens Axboe, Peter Zijlstra, Tom Zanussi,
	prasad, Thomas Gleixner, od, hch, David Wilder

[-- Attachment #1: lttng-relay-alloc.patch --]
[-- Type: text/plain, Size: 27395 bytes --]

As I told Martin, I was thinking about taking an axe and moving stuff around in
relay. Which I just did.

This patch reimplements relay with a linked list of pages. Provides read/write
wrappers which should be used to read or write from the buffers. It's the core
of a layered approach to the design requirements expressed by Martin and
discussed earlier.

It does not provide _any_ sort of locking on buffer data. Locking should be done
by the caller. Given that we might think of very lightweight locking schemes, it
makes sense to me that the underlying buffering infrastructure supports event
records larger than 1 page.

A cache of 4 pointers is used to keep track of the current write page, the
current read page, and two pages for contiguous subbuffer header pointer
lookup. The offset of each page within the buffer is saved in a structure
containing the offset, a linked list node and a page frame pointer, to permit
cache lookup without extra locking.

The offset and linked list are not placed in the page frame itself, to allow
using the pages directly for disk I/O or network I/O, or mapping them to
userspace for live processing.

Write and header address lookup have been tested through LTTng. This patch contains
self-test code which detects if a client is actually trying to use the
read/write/get header address API to do random buffer offset access. If such
behavior is detected, a warning message is issued and the random access is done
as requested.

TODO: No splice file operations are implemented yet; they should come soon.
The idea is to splice the buffers directly into files or to the network.
We have to make sure the page frame fields used are not also used by disk I/O
or the network.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Jens Axboe <jens.axboe@oracle.com>
CC: Martin Bligh <mbligh@google.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Tom Zanussi <zanussi@comcast.net>
CC: prasad@linux.vnet.ibm.com
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: od@suse.com
CC: "Frank Ch. Eigler" <fche@redhat.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: hch@lst.de
CC: David Wilder <dwilder@us.ibm.com>
---
 include/linux/ltt-relay.h |  182 +++++++++++
 ltt/ltt-relay-alloc.c     |  705 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 887 insertions(+)

Index: linux-2.6-lttng/ltt/ltt-relay-alloc.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/ltt/ltt-relay-alloc.c	2009-03-05 15:05:56.000000000 -0500
@@ -0,0 +1,705 @@
+/*
+ * Public API and common code for kernel->userspace relay file support.
+ *
+ * Copyright (C) 2002-2005 - Tom Zanussi (zanussi@us.ibm.com), IBM Corp
+ * Copyright (C) 1999-2005 - Karim Yaghmour (karim@opersys.com)
+ * Copyright (C) 2008 - Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca)
+ *
+ * Moved to kernel/relay.c by Paul Mundt, 2006.
+ * November 2006 - CPU hotplug support by Mathieu Desnoyers
+ * 	(mathieu.desnoyers@polymtl.ca)
+ *
+ * This file is released under the GPL.
+ */
+#include <linux/errno.h>
+#include <linux/stddef.h>
+#include <linux/slab.h>
+#include <linux/module.h>
+#include <linux/string.h>
+#include <linux/ltt-relay.h>
+#include <linux/vmalloc.h>
+#include <linux/mm.h>
+#include <linux/cpu.h>
+#include <linux/splice.h>
+#include <linux/bitops.h>
+
+/* list of open channels, for cpu hotplug */
+static DEFINE_MUTEX(relay_channels_mutex);
+static LIST_HEAD(relay_channels);
+
+/**
+ *	relay_alloc_buf - allocate a channel buffer
+ *	@buf: the buffer struct
+ *	@size: total size of the buffer
+ */
+static int relay_alloc_buf(struct rchan_buf *buf, size_t *size)
+{
+	unsigned int i, n_pages;
+	struct buf_page *buf_page, *n;
+
+	*size = PAGE_ALIGN(*size);
+	n_pages = *size >> PAGE_SHIFT;
+
+	INIT_LIST_HEAD(&buf->pages);
+
+	for (i = 0; i < n_pages; i++) {
+		buf_page = kmalloc_node(sizeof(*buf_page), GFP_KERNEL,
+			cpu_to_node(buf->cpu));
+		if (unlikely(!buf_page))
+			goto depopulate;
+		buf_page->page = alloc_pages_node(cpu_to_node(buf->cpu),
+			GFP_KERNEL | __GFP_ZERO, 0);
+		if (unlikely(!buf_page->page)) {
+			kfree(buf_page);
+			goto depopulate;
+		}
+		list_add_tail(&buf_page->list, &buf->pages);
+		buf_page->offset = (size_t)i << PAGE_SHIFT;
+		set_page_private(buf_page->page, (unsigned long)buf_page);
+		if (i == 0) {
+			buf->wpage = buf_page;
+			buf->hpage[0] = buf_page;
+			buf->hpage[1] = buf_page;
+			buf->rpage = buf_page;
+		}
+	}
+	buf->page_count = n_pages;
+	return 0;
+
+depopulate:
+	list_for_each_entry_safe(buf_page, n, &buf->pages, list) {
+		list_del_init(&buf_page->list);
+		__free_page(buf_page->page);
+		kfree(buf_page);
+	}
+	return -ENOMEM;
+}
+
+/**
+ *	relay_create_buf - allocate and initialize a channel buffer
+ *	@chan: the relay channel
+ *	@cpu: cpu the buffer belongs to
+ *
+ *	Returns channel buffer if successful, %NULL otherwise.
+ */
+static struct rchan_buf *relay_create_buf(struct rchan *chan, int cpu)
+{
+	int ret;
+	struct rchan_buf *buf = kzalloc(sizeof(struct rchan_buf), GFP_KERNEL);
+	if (!buf)
+		return NULL;
+
+	buf->cpu = cpu;
+	ret = relay_alloc_buf(buf, &chan->alloc_size);
+	if (ret)
+		goto free_buf;
+
+	buf->chan = chan;
+	kref_get(&buf->chan->kref);
+	return buf;
+
+free_buf:
+	kfree(buf);
+	return NULL;
+}
+
+/**
+ *	relay_destroy_channel - free the channel struct
+ *	@kref: target kernel reference that contains the relay channel
+ *
+ *	Should only be called from kref_put().
+ */
+static void relay_destroy_channel(struct kref *kref)
+{
+	struct rchan *chan = container_of(kref, struct rchan, kref);
+	kfree(chan);
+}
+
+void ltt_relay_get_chan(struct rchan *chan)
+{
+	kref_get(&chan->kref);
+}
+EXPORT_SYMBOL_GPL(ltt_relay_get_chan);
+
+void ltt_relay_put_chan(struct rchan *chan)
+{
+	kref_put(&chan->kref, relay_destroy_channel);
+}
+EXPORT_SYMBOL_GPL(ltt_relay_put_chan);
+
+/**
+ *	relay_destroy_buf - destroy an rchan_buf struct and associated buffer
+ *	@buf: the buffer struct
+ */
+static void relay_destroy_buf(struct rchan_buf *buf)
+{
+	struct rchan *chan = buf->chan;
+	struct buf_page *buf_page, *n;
+
+	list_for_each_entry_safe(buf_page, n, &buf->pages, list) {
+		list_del_init(&buf_page->list);
+		__free_page(buf_page->page);
+		kfree(buf_page);
+	}
+	chan->buf[buf->cpu] = NULL;
+	kfree(buf);
+	kref_put(&chan->kref, relay_destroy_channel);
+}
+
+/**
+ *	relay_remove_buf - remove a channel buffer
+ *	@kref: target kernel reference that contains the relay buffer
+ *
+ *	Removes the file from the filesystem, which also frees the
+ *	rchan_buf_struct and the channel buffer.  Should only be called from
+ *	kref_put().
+ */
+static void relay_remove_buf(struct kref *kref)
+{
+	struct rchan_buf *buf = container_of(kref, struct rchan_buf, kref);
+	buf->chan->cb->remove_buf_file(buf->dentry);
+	relay_destroy_buf(buf);
+}
+
+void ltt_relay_get_chan_buf(struct rchan_buf *buf)
+{
+	kref_get(&buf->kref);
+}
+EXPORT_SYMBOL_GPL(ltt_relay_get_chan_buf);
+
+void ltt_relay_put_chan_buf(struct rchan_buf *buf)
+{
+	kref_put(&buf->kref, relay_remove_buf);
+}
+EXPORT_SYMBOL_GPL(ltt_relay_put_chan_buf);
+
+/*
+ * High-level relay kernel API and associated functions.
+ */
+
+/*
+ * rchan_callback implementations defining default channel behavior.  Used
+ * in place of corresponding NULL values in client callback struct.
+ */
+
+/*
+ * create_buf_file_create() default callback.  Does nothing.
+ */
+static struct dentry *create_buf_file_default_callback(const char *filename,
+						       struct dentry *parent,
+						       int mode,
+						       struct rchan_buf *buf)
+{
+	return NULL;
+}
+
+/*
+ * remove_buf_file() default callback.  Does nothing.
+ */
+static int remove_buf_file_default_callback(struct dentry *dentry)
+{
+	return -EINVAL;
+}
+
+/* relay channel default callbacks */
+static struct rchan_callbacks default_channel_callbacks = {
+	.create_buf_file = create_buf_file_default_callback,
+	.remove_buf_file = remove_buf_file_default_callback,
+};
+
+/**
+ *	__relay_reset - reset a channel buffer
+ *	@buf: the channel buffer
+ *	@init: 1 if this is a first-time initialization
+ *
+ *	See relay_reset() for description of effect.
+ */
+static void __relay_reset(struct rchan_buf *buf, unsigned int init)
+{
+	if (init)
+		kref_init(&buf->kref);
+}
+
+/*
+ *	relay_open_buf - create a new relay channel buffer
+ *
+ *	used by relay_open() and CPU hotplug.
+ */
+static struct rchan_buf *relay_open_buf(struct rchan *chan, unsigned int cpu)
+{
+	struct rchan_buf *buf = NULL;
+	struct dentry *dentry;
+	char *tmpname;
+
+	tmpname = kzalloc(NAME_MAX + 1, GFP_KERNEL);
+	if (!tmpname)
+		goto end;
+	snprintf(tmpname, NAME_MAX, "%s%d", chan->base_filename, cpu);
+
+	buf = relay_create_buf(chan, cpu);
+	if (!buf)
+		goto free_name;
+
+	__relay_reset(buf, 1);
+
+	/* Create file in fs */
+	dentry = chan->cb->create_buf_file(tmpname, chan->parent, S_IRUSR,
+					   buf);
+	if (!dentry)
+		goto free_buf;
+
+	buf->dentry = dentry;
+
+	goto free_name;
+
+free_buf:
+	relay_destroy_buf(buf);
+	buf = NULL;
+free_name:
+	kfree(tmpname);
+end:
+	return buf;
+}
+
+/**
+ *	relay_close_buf - close a channel buffer
+ *	@buf: channel buffer
+ *
+ *	Restores the default callbacks.
+ *	The channel buffer and channel buffer data structure are then freed
+ *	automatically when the last reference is given up.
+ */
+static void relay_close_buf(struct rchan_buf *buf)
+{
+	kref_put(&buf->kref, relay_remove_buf);
+}
+
+static void setup_callbacks(struct rchan *chan,
+				   struct rchan_callbacks *cb)
+{
+	if (!cb) {
+		chan->cb = &default_channel_callbacks;
+		return;
+	}
+
+	if (!cb->create_buf_file)
+		cb->create_buf_file = create_buf_file_default_callback;
+	if (!cb->remove_buf_file)
+		cb->remove_buf_file = remove_buf_file_default_callback;
+	chan->cb = cb;
+}
+
+/**
+ * 	relay_hotcpu_callback - CPU hotplug callback
+ * 	@nb: notifier block
+ * 	@action: hotplug action to take
+ * 	@hcpu: CPU number
+ *
+ * 	Returns the success/failure of the operation. (%NOTIFY_OK, %NOTIFY_BAD)
+ */
+static int __cpuinit relay_hotcpu_callback(struct notifier_block *nb,
+				unsigned long action,
+				void *hcpu)
+{
+	unsigned int hotcpu = (unsigned long)hcpu;
+	struct rchan *chan;
+
+	switch (action) {
+	case CPU_UP_PREPARE:
+	case CPU_UP_PREPARE_FROZEN:
+		mutex_lock(&relay_channels_mutex);
+		list_for_each_entry(chan, &relay_channels, list) {
+			if (chan->buf[hotcpu])
+				continue;
+			chan->buf[hotcpu] = relay_open_buf(chan, hotcpu);
+			if (!chan->buf[hotcpu]) {
+				printk(KERN_ERR
+					"relay_hotcpu_callback: cpu %d buffer "
+					"creation failed\n", hotcpu);
+				mutex_unlock(&relay_channels_mutex);
+				return NOTIFY_BAD;
+			}
+		}
+		mutex_unlock(&relay_channels_mutex);
+		break;
+	case CPU_DEAD:
+	case CPU_DEAD_FROZEN:
+		/* No need to flush the cpu : will be flushed upon
+		 * final relay_flush() call. */
+		break;
+	}
+	return NOTIFY_OK;
+}
+
+/**
+ *	ltt_relay_open - create a new relay channel
+ *	@base_filename: base name of files to create
+ *	@parent: dentry of parent directory, %NULL for root directory
+ *	@subbuf_size: size of sub-buffers
+ *	@n_subbufs: number of sub-buffers
+ *	@cb: client callback functions
+ *	@private_data: user-defined data
+ *
+ *	Returns channel pointer if successful, %NULL otherwise.
+ *
+ *	Creates a channel buffer for each cpu using the sizes and
+ *	attributes specified.  The created channel buffer files
+ *	will be named base_filename0...base_filenameN-1.  File
+ *	permissions will be %S_IRUSR.
+ */
+struct rchan *ltt_relay_open(const char *base_filename,
+			 struct dentry *parent,
+			 size_t subbuf_size,
+			 size_t n_subbufs,
+			 struct rchan_callbacks *cb,
+			 void *private_data)
+{
+	unsigned int i;
+	struct rchan *chan;
+
+	if (!base_filename)
+		return NULL;
+
+	if (!(subbuf_size && n_subbufs))
+		return NULL;
+
+	chan = kzalloc(sizeof(struct rchan), GFP_KERNEL);
+	if (!chan)
+		return NULL;
+
+	chan->version = LTT_RELAY_CHANNEL_VERSION;
+	chan->n_subbufs = n_subbufs;
+	chan->subbuf_size = subbuf_size;
+	chan->subbuf_size_order = get_count_order(subbuf_size);
+	chan->alloc_size = FIX_SIZE(subbuf_size * n_subbufs);
+	chan->parent = parent;
+	chan->private_data = private_data;
+	strlcpy(chan->base_filename, base_filename, NAME_MAX);
+	setup_callbacks(chan, cb);
+	kref_init(&chan->kref);
+
+	mutex_lock(&relay_channels_mutex);
+	for_each_online_cpu(i) {
+		chan->buf[i] = relay_open_buf(chan, i);
+		if (!chan->buf[i])
+			goto free_bufs;
+	}
+	list_add(&chan->list, &relay_channels);
+	mutex_unlock(&relay_channels_mutex);
+
+	return chan;
+
+free_bufs:
+	for_each_possible_cpu(i) {
+		if (!chan->buf[i])
+			break;
+		relay_close_buf(chan->buf[i]);
+	}
+
+	kref_put(&chan->kref, relay_destroy_channel);
+	mutex_unlock(&relay_channels_mutex);
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(ltt_relay_open);
+
+/**
+ *	ltt_relay_close - close the channel
+ *	@chan: the channel
+ *
+ *	Closes all channel buffers and frees the channel.
+ */
+void ltt_relay_close(struct rchan *chan)
+{
+	unsigned int i;
+
+	if (!chan)
+		return;
+
+	mutex_lock(&relay_channels_mutex);
+	for_each_possible_cpu(i)
+		if (chan->buf[i])
+			relay_close_buf(chan->buf[i]);
+
+	list_del(&chan->list);
+	kref_put(&chan->kref, relay_destroy_channel);
+	mutex_unlock(&relay_channels_mutex);
+}
+EXPORT_SYMBOL_GPL(ltt_relay_close);
+
+/*
+ * Start iteration at the previous element. Skip the real list head.
+ */
+static struct buf_page *ltt_relay_find_prev_page(struct rchan_buf *buf,
+	struct buf_page *page, size_t offset, ssize_t diff_offset)
+{
+	struct buf_page *iter;
+	size_t orig_iter_off;
+	unsigned int i = 0;
+
+	orig_iter_off = page->offset;
+	list_for_each_entry_reverse(iter, &page->list, list) {
+		/*
+		 * Skip the real list head.
+		 */
+		if (&iter->list == &buf->pages)
+			continue;
+		i++;
+		if (offset >= iter->offset
+			&& offset < iter->offset + PAGE_SIZE) {
+#ifdef CONFIG_LTT_RELAY_CHECK_RANDOM_ACCESS
+			if (i > 1) {
+				printk(KERN_WARNING
+					"Backward random access detected in "
+					"ltt_relay. Iterations %u, "
+					"offset %zu, orig iter->off %zu, "
+					"iter->off %zu diff_offset %zd.\n", i,
+					offset, orig_iter_off, iter->offset,
+					diff_offset);
+				WARN_ON(1);
+			}
+#endif
+			return iter;
+		}
+	}
+	return NULL;
+}
+
+/*
+ * Start iteration at the next element. Skip the real list head.
+ */
+static struct buf_page *ltt_relay_find_next_page(struct rchan_buf *buf,
+	struct buf_page *page, size_t offset, ssize_t diff_offset)
+{
+	struct buf_page *iter;
+	unsigned int i = 0;
+	size_t orig_iter_off;
+
+	orig_iter_off = page->offset;
+	list_for_each_entry(iter, &page->list, list) {
+		/*
+		 * Skip the real list head.
+		 */
+		if (&iter->list == &buf->pages)
+			continue;
+		i++;
+		if (offset >= iter->offset
+			&& offset < iter->offset + PAGE_SIZE) {
+#ifdef CONFIG_LTT_RELAY_CHECK_RANDOM_ACCESS
+			if (i > 1) {
+				printk(KERN_WARNING
+					"Forward random access detected in "
+					"ltt_relay. Iterations %u, "
+					"offset %zu, orig iter->off %zu, "
+					"iter->off %zu diff_offset %zd.\n", i,
+					offset, orig_iter_off, iter->offset,
+					diff_offset);
+				WARN_ON(1);
+			}
+#endif
+			return iter;
+		}
+	}
+	return NULL;
+}
+
+/*
+ * Find the page containing "offset". Cache it if it is after the currently
+ * cached page.
+ */
+static struct buf_page *ltt_relay_cache_page(struct rchan_buf *buf,
+		struct buf_page **page_cache,
+		struct buf_page *page, size_t offset)
+{
+	ssize_t diff_offset;
+	ssize_t half_buf_size = buf->chan->alloc_size >> 1;
+
+	/*
+	 * Make sure this is the page we want to write into. The current
+	 * page is changed concurrently by other writers. [wrh]page are
+	 * used as a cache remembering the last page written
+	 * to/read/looked up for header address. No synchronization;
+	 * we may have to find the previous page if a nested write
+	 * occurred. Finding the right page is done by comparing the
+	 * dest_offset with the buf_page offsets.
+	 * When at the exact opposite of the buffer, bias towards forward search
+	 * because it will be cached.
+	 */
+
+	diff_offset = (ssize_t)offset - (ssize_t)page->offset;
+	if (diff_offset <= -(ssize_t)half_buf_size)
+		diff_offset += buf->chan->alloc_size;
+	else if (diff_offset > half_buf_size)
+		diff_offset -= buf->chan->alloc_size;
+
+	if (unlikely(diff_offset >= (ssize_t)PAGE_SIZE)) {
+		page = ltt_relay_find_next_page(buf, page, offset, diff_offset);
+		WARN_ON(!page);
+		*page_cache = page;
+	} else if (unlikely(diff_offset < 0)) {
+		page = ltt_relay_find_prev_page(buf, page, offset, diff_offset);
+		WARN_ON(!page);
+	}
+	return page;
+}
+
+/**
+ * ltt_relay_write - write data to a ltt_relay buffer.
+ * @buf : buffer
+ * @offset : offset within the buffer
+ * @src : source address
+ * @len : length to write
+ */
+int ltt_relay_write(struct rchan_buf *buf, size_t offset,
+	const void *src, size_t len)
+{
+	struct buf_page *page;
+	ssize_t pagecpy, orig_len;
+
+	orig_len = len;
+	offset &= buf->chan->alloc_size - 1;
+	page = buf->wpage;
+	if (unlikely(!len))
+		return 0;
+	for (;;) {
+		page = ltt_relay_cache_page(buf, &buf->wpage, page, offset);
+		pagecpy = min_t(size_t, len, PAGE_SIZE - (offset & ~PAGE_MASK));
+		memcpy(page_address(page->page)
+			+ (offset & ~PAGE_MASK), src, pagecpy);
+		len -= pagecpy;
+		if (likely(!len))
+			break;
+		src += pagecpy;
+		offset += pagecpy;
+		/*
+		 * Underlying layer should never ask for writes across
+		 * subbuffers.
+		 */
+		WARN_ON(offset >= buf->chan->alloc_size);
+	}
+	return orig_len;
+}
+EXPORT_SYMBOL_GPL(ltt_relay_write);
+
+/**
+ * ltt_relay_read - read data from ltt_relay_buffer.
+ * @buf : buffer
+ * @offset : offset within the buffer
+ * @dest : destination address
+ * @len : length to read
+ */
+int ltt_relay_read(struct rchan_buf *buf, size_t offset,
+	void *dest, size_t len)
+{
+	struct buf_page *page;
+	ssize_t pagecpy, orig_len;
+
+	orig_len = len;
+	offset &= buf->chan->alloc_size - 1;
+	page = buf->rpage;
+	if (unlikely(!len))
+		return 0;
+	for (;;) {
+		page = ltt_relay_cache_page(buf, &buf->rpage, page, offset);
+		pagecpy = min_t(size_t, len, PAGE_SIZE - (offset & ~PAGE_MASK));
+		memcpy(dest, page_address(page->page) + (offset & ~PAGE_MASK),
+			pagecpy);
+		len -= pagecpy;
+		if (likely(!len))
+			break;
+		dest += pagecpy;
+		offset += pagecpy;
+		/*
+		 * Underlying layer should never ask for reads across
+		 * subbuffers.
+		 */
+		WARN_ON(offset >= buf->chan->alloc_size);
+	}
+	return orig_len;
+}
+EXPORT_SYMBOL_GPL(ltt_relay_read);
+
+/**
+ * ltt_relay_read_get_page - Get a whole page to read from
+ * @buf : buffer
+ * @offset : offset within the buffer
+ */
+struct buf_page *ltt_relay_read_get_page(struct rchan_buf *buf, size_t offset)
+{
+	struct buf_page *page;
+
+	offset &= buf->chan->alloc_size - 1;
+	page = buf->rpage;
+	page = ltt_relay_cache_page(buf, &buf->rpage, page, offset);
+	return page;
+}
+EXPORT_SYMBOL_GPL(ltt_relay_read_get_page);
+
+/**
+ * ltt_relay_offset_address - get address of a location within the buffer
+ * @buf : buffer
+ * @offset : offset within the buffer.
+ *
+ * Return the address where a given offset is located.
+ * Should be used to get the current subbuffer header pointer. Given we know
+ * it's never on a page boundary, it's safe to write directly to this address,
+ * as long as the write is never bigger than a page size.
+ */
+void *ltt_relay_offset_address(struct rchan_buf *buf, size_t offset)
+{
+	struct buf_page *page;
+	unsigned int odd;
+
+	offset &= buf->chan->alloc_size - 1;
+	odd = !!(offset & buf->chan->subbuf_size);
+	page = buf->hpage[odd];
+	if (offset < page->offset || offset >= page->offset + PAGE_SIZE)
+		buf->hpage[odd] = page = buf->wpage;
+	page = ltt_relay_cache_page(buf, &buf->hpage[odd], page, offset);
+	return page_address(page->page) + (offset & ~PAGE_MASK);
+}
+EXPORT_SYMBOL_GPL(ltt_relay_offset_address);
+
+/**
+ *	relay_file_open - open file op for relay files
+ *	@inode: the inode
+ *	@filp: the file
+ *
+ *	Increments the channel buffer refcount.
+ */
+static int relay_file_open(struct inode *inode, struct file *filp)
+{
+	struct rchan_buf *buf = inode->i_private;
+
+	kref_get(&buf->kref);
+	filp->private_data = buf;
+
+	return nonseekable_open(inode, filp);
+}
+
+/**
+ *	relay_file_release - release file op for relay files
+ *	@inode: the inode
+ *	@filp: the file
+ *
+ *	Decrements the channel refcount, as the filesystem is
+ *	no longer using it.
+ */
+static int relay_file_release(struct inode *inode, struct file *filp)
+{
+	struct rchan_buf *buf = filp->private_data;
+
+	kref_put(&buf->kref, relay_remove_buf);
+
+	return 0;
+}
+
+const struct file_operations ltt_relay_file_operations = {
+	.open		= relay_file_open,
+	.release	= relay_file_release,
+};
+EXPORT_SYMBOL_GPL(ltt_relay_file_operations);
+
+static __init int relay_init(void)
+{
+	hotcpu_notifier(relay_hotcpu_callback, 5);
+	return 0;
+}
+
+module_init(relay_init);
Index: linux-2.6-lttng/include/linux/ltt-relay.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/linux/ltt-relay.h	2009-03-05 15:05:56.000000000 -0500
@@ -0,0 +1,182 @@
+/*
+ * linux/include/linux/ltt-relay.h
+ *
+ * Copyright (C) 2002, 2003 - Tom Zanussi (zanussi@us.ibm.com), IBM Corp
+ * Copyright (C) 1999, 2000, 2001, 2002 - Karim Yaghmour (karim@opersys.com)
+ * Copyright (C) 2008 - Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca)
+ *
+ * CONFIG_LTT_RELAY definitions and declarations
+ */
+
+#ifndef _LINUX_LTT_RELAY_H
+#define _LINUX_LTT_RELAY_H
+
+#include <linux/types.h>
+#include <linux/sched.h>
+#include <linux/timer.h>
+#include <linux/wait.h>
+#include <linux/list.h>
+#include <linux/fs.h>
+#include <linux/poll.h>
+#include <linux/kref.h>
+#include <linux/mm.h>
+
+/* Needs a _much_ better name... */
+#define FIX_SIZE(x) ((((x) - 1) & PAGE_MASK) + PAGE_SIZE)
+
+/*
+ * Tracks changes to rchan/rchan_buf structs
+ */
+#define LTT_RELAY_CHANNEL_VERSION		8
+
+struct rchan_buf;
+
+struct buf_page {
+	struct page *page;
+	size_t offset;		/* page offset in the buffer */
+	struct list_head list;	/* buffer linked list */
+};
+
+/*
+ * Per-cpu relay channel buffer
+ */
+struct rchan_buf {
+	void *chan_private;		/* private data for this buf */
+	struct rchan *chan;		/* associated channel */
+	struct dentry *dentry;		/* channel file dentry */
+	struct kref kref;		/* channel buffer refcount */
+	struct list_head pages;		/* list of buffer pages */
+	struct buf_page *wpage;		/* current write page (cache) */
+	struct buf_page *hpage[2];	/* current subbuf header page (cache) */
+	struct buf_page *rpage;		/* current subbuf read page (cache) */
+	unsigned int page_count;	/* number of current buffer pages */
+	unsigned int cpu;		/* this buf's cpu */
+} ____cacheline_aligned;
+
+/*
+ * Relay channel data structure
+ */
+struct rchan {
+	u32 version;			/* the version of this struct */
+	size_t subbuf_size;		/* sub-buffer size */
+	size_t n_subbufs;		/* number of sub-buffers per buffer */
+	size_t alloc_size;		/* total buffer size allocated */
+	struct rchan_callbacks *cb;	/* client callbacks */
+	struct kref kref;		/* channel refcount */
+	void *private_data;		/* for user-defined data */
+	struct rchan_buf *buf[NR_CPUS]; /* per-cpu channel buffers */
+	struct list_head list;		/* for channel list */
+	struct dentry *parent;		/* parent dentry passed to open */
+	int subbuf_size_order;		/* order of sub-buffer size */
+	char base_filename[NAME_MAX];	/* saved base filename */
+};
+
+/*
+ * Relay channel client callbacks
+ */
+struct rchan_callbacks {
+	/*
+	 * subbuf_start - called on buffer-switch to a new sub-buffer
+	 * @buf: the channel buffer containing the new sub-buffer
+	 * @subbuf: the start of the new sub-buffer
+	 * @prev_subbuf: the start of the previous sub-buffer
+	 * @prev_padding: unused space at the end of previous sub-buffer
+	 *
+	 * The client should return 1 to continue logging, 0 to stop
+	 * logging.
+	 *
+	 * NOTE: subbuf_start will also be invoked when the buffer is
+	 *       created, so that the first sub-buffer can be initialized
+	 *       if necessary.  In this case, prev_subbuf will be NULL.
+	 *
+	 * NOTE: the client can reserve bytes at the beginning of the new
+	 *       sub-buffer by calling subbuf_start_reserve() in this callback.
+	 */
+	int (*subbuf_start) (struct rchan_buf *buf,
+			     void *subbuf,
+			     void *prev_subbuf,
+			     size_t prev_padding);
+
+	/*
+	 * create_buf_file - create file to represent a relay channel buffer
+	 * @filename: the name of the file to create
+	 * @parent: the parent of the file to create
+	 * @mode: the mode of the file to create
+	 * @buf: the channel buffer
+	 *
+	 * Called during ltt_relay_open(), once for each per-cpu buffer,
+	 * to allow the client to create a file to be used to
+	 * represent the corresponding channel buffer.  If the file is
+	 * created outside of relay, the parent must also exist in
+	 * that filesystem.
+	 *
+	 * The callback should return the dentry of the file created
+	 * to represent the relay buffer.
+	 *
+	 * See Documentation/filesystems/relayfs.txt for more info.
+	 */
+	struct dentry *(*create_buf_file)(const char *filename,
+					  struct dentry *parent,
+					  int mode,
+					  struct rchan_buf *buf);
+
+	/*
+	 * remove_buf_file - remove file representing a relay channel buffer
+	 * @dentry: the dentry of the file to remove
+	 *
+	 * Called during ltt_relay_close(), once for each per-cpu buffer,
+	 * to allow the client to remove a file used to represent a
+	 * channel buffer.
+	 *
+	 * The callback should return 0 if successful, negative if not.
+	 */
+	int (*remove_buf_file)(struct dentry *dentry);
+};
+
+extern int ltt_relay_write(struct rchan_buf *buf, size_t offset,
+	const void *src, size_t len);
+
+extern int ltt_relay_read(struct rchan_buf *buf, size_t offset,
+	void *dest, size_t len);
+
+extern struct buf_page *ltt_relay_read_get_page(struct rchan_buf *buf,
+	size_t offset);
+
+/*
+ * Return the address where a given offset is located.
+ * Should be used to get the current subbuffer header pointer. Given we know
+ * it's never on a page boundary, it's safe to write directly to this address,
+ * as long as the write is never bigger than a page size.
+ */
+extern void *ltt_relay_offset_address(struct rchan_buf *buf,
+	size_t offset);
+
+/*
+ * CONFIG_LTT_RELAY kernel API, ltt/ltt-relay-alloc.c
+ */
+
+struct rchan *ltt_relay_open(const char *base_filename,
+			 struct dentry *parent,
+			 size_t subbuf_size,
+			 size_t n_subbufs,
+			 struct rchan_callbacks *cb,
+			 void *private_data);
+extern void ltt_relay_close(struct rchan *chan);
+
+void ltt_relay_get_chan(struct rchan *chan);
+void ltt_relay_put_chan(struct rchan *chan);
+
+void ltt_relay_get_chan_buf(struct rchan_buf *buf);
+void ltt_relay_put_chan_buf(struct rchan_buf *buf);
+
+/*
+ * exported ltt_relay file operations, ltt/ltt-relay-alloc.c
+ */
+extern const struct file_operations ltt_relay_file_operations;
+
+#endif /* _LINUX_LTT_RELAY_H */
+

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [RFC patch 06/41] LTTng optimize write to page function
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (4 preceding siblings ...)
  2009-03-05 22:47 ` [RFC patch 05/41] LTTng relay buffer allocation, read, write Mathieu Desnoyers
@ 2009-03-05 22:47 ` Mathieu Desnoyers
  2009-03-05 22:47 ` [RFC patch 07/41] LTTng dynamic channels Mathieu Desnoyers
                   ` (36 subsequent siblings)
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:47 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers, Zhaolei

[-- Attachment #1: lttng-optimize-write-to-page-function.patch --]
[-- Type: text/plain, Size: 12931 bytes --]

The functions in ltt-relay-alloc.c take care of writing the data into
the buffer pages. Those pages are allocated from the page allocator, and
no virtual mapping is done, so we save precious TLB entries.
ltt-relay-alloc.c is the abstraction layer that makes the buffers "look"
like a contiguous memory area, although they are made of physically
discontiguous pages chained in a linked list. A caching mechanism makes
sure we never walk over more than 1-2 entries of the list. We use a
linked list rather than a table so we do not depend on vmalloc to
allocate large pointer arrays.
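The page-granular copy this layer performs can be illustrated in plain
userspace C. The sketch below is a deliberately simplified stand-in (a
hypothetical `toy_buf` with an array of pages; the kernel code walks a
linked list of `struct buf_page` with a cached position instead, and
does real error handling):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define PG_SIZE 4096u	/* stand-in for PAGE_SIZE */
#define N_PAGES 4u

/* A "buffer" made of physically separate pages, indexed by offset. */
struct toy_buf {
	char *page[N_PAGES];
};

static struct toy_buf *toy_buf_create(void)
{
	struct toy_buf *buf = malloc(sizeof(*buf));	/* error handling omitted */
	unsigned int i;

	for (i = 0; i < N_PAGES; i++)
		buf->page[i] = calloc(1, PG_SIZE);
	return buf;
}

/* Copy len bytes to offset, splitting the copy at page boundaries,
 * the way ltt_relay_write computes pagecpy per iteration. */
static void toy_write(struct toy_buf *buf, size_t offset,
		      const void *src, size_t len)
{
	while (len) {
		size_t in_page = offset & (PG_SIZE - 1);
		size_t pagecpy = len < PG_SIZE - in_page ?
				 len : PG_SIZE - in_page;

		memcpy(buf->page[offset / PG_SIZE] + in_page, src, pagecpy);
		src = (const char *)src + pagecpy;
		offset += pagecpy;
		len -= pagecpy;
	}
}

static char toy_read_byte(struct toy_buf *buf, size_t offset)
{
	return buf->page[offset / PG_SIZE][offset & (PG_SIZE - 1)];
}
```

A write of 6000 bytes at offset 100 spans two of the 4096-byte pages and
is split into a 3996-byte and a 2004-byte memcpy.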

I did a bit of profiling with oprofile on LTTng and found out that write
functions in ltt-relay-alloc.c were taking a lot of CPU time. I thought it would
be good to improve them a bit.

Running a 2.6.29-rc3 kernel

Compiling a 2.6.25 kernel using make -j10 on a 8-cores x86_64 with a vanilla
2.6.29-rc3 kernel (all tests are cache-hot) :
real 1m22.103s

With dormant instrumentation
real 1m24.667s
(note : this 2s regression should eventually be identified by doing a
bisection of the LTTng tree.)

ltt-armall

Without modification, with flight recorder tracing active :
real 1m31.135s

Replacing the memcpy call with a specialized call for 1, 2, 4 and 8 bytes :
real 1m30.440s

Inlining the fast path of the write function (v2) :
real 1m29.349s
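The "inlining the fast path" variant follows a common structure: keep
the common single-chunk case inline at the call site and punt the rare
case to an out-of-line helper. A minimal userspace sketch of that shape
(hypothetical `copy_fast`/`copy_slow` names, not the patch's functions):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Out-of-line slow path for the uncommon case (here: large copies). */
static void copy_slow(char *dst, const char *src, size_t len)
{
	memcpy(dst, src, len);
}

/* Inline fast path: common small copies are expanded at the call site;
 * anything else costs one function call. This mirrors the split between
 * an inline write wrapper and an exported out-of-line helper. */
static inline void copy_fast(char *dst, const char *src, size_t len)
{
	size_t i;

	if (len > sizeof(long)) {
		copy_slow(dst, src, len);
		return;
	}
	for (i = 0; i < len; i++)
		dst[i] = src[i];
}
```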


* KOSAKI Motohiro (kosaki.motohiro@jp.fujitsu.com) wrote:
> Hi
>
> > +static inline void ltt_relay_do_copy(void *dest, const void *src, size_t len)
> > +{
> > +   switch (len) {
> > +   case 1: *(u8 *)dest = *(const u8 *)src;
> > +           break;
> > +   case 2: *(u16 *)dest = *(const u16 *)src;
> > +           break;
> > +   case 4: *(u32 *)dest = *(const u32 *)src;
> > +           break;
> > +#if (BITS_PER_LONG == 64)
> > +   case 8: *(u64 *)dest = *(const u64 *)src;
> > +           break;
> > +#endif
> > +   default:
> > +           memcpy(dest, src, len);
> > +   }
> > +}
>
> hm, interesting.
>
> IIRC, a few months ago, Linus said this optimization is not an
> optimization. The latest gcc does this inlining automatically.
> (but I can't point to its url, sorry)
>
> Is this result gcc version independent? And can you send
> the difference in gcc assembly output?



Here we go :

x86_64
gcc (Debian 4.3.2-1) 4.3.2 (haven't tried other compiler versions)
kernel 2.6.29-rc3

char dataout[100];
char datain[100];

int sizea = 8;

void testfct_ltt(void)
{
        asm ("/* begin */");
        ltt_relay_do_copy(dataout, datain, sizea);
        asm ("/* end*/");
}

Turns into a jump table :

        movslq  sizea(%rip),%rdx
        cmpq    $8, %rdx
        jbe     .L15
.L6:
        movq    $datain, %rsi
        movq    $dataout, %rdi
        call    memcpy
        .p2align 4,,10
        .p2align 3
.L7:
[...]
.L15:
        jmp     *.L12(,%rdx,8)

        .section        .rodata
        .align 8
        .align 4
.L12:
        .quad   .L7
        .quad   .L8
        .quad   .L9
        .quad   .L6
        .quad   .L10
        .quad   .L6
        .quad   .L6
        .quad   .L6
        .quad   .L11
        .text
        .p2align 4,,10
        .p2align 3
.L11:
        movq    datain(%rip), %rax
        movq    %rax, dataout(%rip)
        jmp     .L7
        .p2align 4,,10
        .p2align 3
.L8:
        movzbl  datain(%rip), %eax
        movb    %al, dataout(%rip)
        jmp     .L7
        .p2align 4,,10
        .p2align 3
.L9:
        movzwl  datain(%rip), %eax
        movw    %ax, dataout(%rip)
        jmp     .L7
        .p2align 4,,10
        .p2align 3
.L10:
        movl    datain(%rip), %eax
        movl    %eax, dataout(%rip)
        jmp     .L7
        .size   testfct_ltt, .-testfct_ltt
        .p2align 4,,15


void testfct_memcpy(void)
{
        asm ("/* begin */");
        memcpy(dataout, datain, sizea);
        asm ("/* end */");
}

Turns into a function call because the size is not statically known :

        movslq  sizea(%rip),%rdx
        movq    $datain, %rsi
        movq    $dataout, %rdi
        call    memcpy


Below, when a constant is passed, both behave similarly :

void testfct_ltt_const(void)
{
        asm ("/* begin */");
        ltt_relay_do_copy(dataout, datain, 8);
        asm ("/* end*/");
}

        movq    datain(%rip), %rax
        movq    %rax, dataout(%rip)


void testfct_memcpy_const(void)
{
        asm ("/* begin */");
        memcpy(dataout, datain, 8);
        asm ("/* end */");
}

        movq    datain(%rip), %rax
        movq    %rax, dataout(%rip)


Therefore, I agree that when memcpy is passed a constant, it will do
the same as my ltt_relay_do_copy. However, when we usually expect
sizes of 1, 2, 4 or 8 bytes that are unknown at compile time, the jump
table saves the costly function call to memcpy.
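The same behaviour can be reproduced in a userspace stand-in (a
hypothetical `do_copy`, mirroring ltt_relay_do_copy with C99 fixed-width
types; the casts assume pointers suitably aligned for the width copied,
as the kernel code does):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Specialize the copy for the sizes the tracer usually emits (1, 2, 4,
 * 8 bytes), known only at run time, and fall back to memcpy otherwise.
 * With a recent gcc the switch compiles to a jump table. */
static inline void do_copy(void *dest, const void *src, size_t len)
{
	switch (len) {
	case 1:
		*(uint8_t *)dest = *(const uint8_t *)src;
		break;
	case 2:
		*(uint16_t *)dest = *(const uint16_t *)src;
		break;
	case 4:
		*(uint32_t *)dest = *(const uint32_t *)src;
		break;
	case 8:
		*(uint64_t *)dest = *(const uint64_t *)src;
		break;
	default:
		memcpy(dest, src, len);
	}
}
```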


Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Martin Bligh <mbligh@google.com>
CC: Zhaolei <zhaolei@cn.fujitsu.com>
---
 include/linux/ltt-relay.h |   91 ++++++++++++++++++++++++++++++++++++++++++++--
 ltt/ltt-relay-alloc.c     |   80 +++++++++-------------------------------
 2 files changed, 107 insertions(+), 64 deletions(-)

Index: linux-2.6-lttng/ltt/ltt-relay-alloc.c
===================================================================
--- linux-2.6-lttng.orig/ltt/ltt-relay-alloc.c	2009-03-05 15:22:47.000000000 -0500
+++ linux-2.6-lttng/ltt/ltt-relay-alloc.c	2009-03-05 15:22:49.000000000 -0500
@@ -428,7 +428,7 @@ EXPORT_SYMBOL_GPL(ltt_relay_close);
 /*
  * Start iteration at the previous element. Skip the real list head.
  */
-static struct buf_page *ltt_relay_find_prev_page(struct rchan_buf *buf,
+struct buf_page *ltt_relay_find_prev_page(struct rchan_buf *buf,
 	struct buf_page *page, size_t offset, ssize_t diff_offset)
 {
 	struct buf_page *iter;
@@ -460,13 +460,15 @@ static struct buf_page *ltt_relay_find_p
 			return iter;
 		}
 	}
+	WARN_ON(1);
 	return NULL;
 }
+EXPORT_SYMBOL_GPL(ltt_relay_find_prev_page);
 
 /*
  * Start iteration at the next element. Skip the real list head.
  */
-static struct buf_page *ltt_relay_find_next_page(struct rchan_buf *buf,
+struct buf_page *ltt_relay_find_next_page(struct rchan_buf *buf,
 	struct buf_page *page, size_t offset, ssize_t diff_offset)
 {
 	struct buf_page *iter;
@@ -498,48 +500,10 @@ static struct buf_page *ltt_relay_find_n
 			return iter;
 		}
 	}
+	WARN_ON(1);
 	return NULL;
 }
-
-/*
- * Find the page containing "offset". Cache it if it is after the currently
- * cached page.
- */
-static struct buf_page *ltt_relay_cache_page(struct rchan_buf *buf,
-		struct buf_page **page_cache,
-		struct buf_page *page, size_t offset)
-{
-	ssize_t diff_offset;
-	ssize_t half_buf_size = buf->chan->alloc_size >> 1;
-
-	/*
-	 * Make sure this is the page we want to write into. The current
-	 * page is changed concurrently by other writers. [wrh]page are
-	 * used as a cache remembering the last page written
-	 * to/read/looked up for header address. No synchronization;
-	 * we may have to find the previous page if a nested write
-	 * occurred. Finding the right page is done by comparing the
-	 * dest_offset with the buf_page offsets.
-	 * When at the exact opposite of the buffer, bias towards forward search
-	 * because it will be cached.
-	 */
-
-	diff_offset = (ssize_t)offset - (ssize_t)page->offset;
-	if (diff_offset <= -(ssize_t)half_buf_size)
-		diff_offset += buf->chan->alloc_size;
-	else if (diff_offset > half_buf_size)
-		diff_offset -= buf->chan->alloc_size;
-
-	if (unlikely(diff_offset >= (ssize_t)PAGE_SIZE)) {
-		page = ltt_relay_find_next_page(buf, page, offset, diff_offset);
-		WARN_ON(!page);
-		*page_cache = page;
-	} else if (unlikely(diff_offset < 0)) {
-		page = ltt_relay_find_prev_page(buf, page, offset, diff_offset);
-		WARN_ON(!page);
-	}
-	return page;
-}
+EXPORT_SYMBOL_GPL(ltt_relay_find_next_page);
 
 /**
  * ltt_relay_write - write data to a ltt_relay buffer.
@@ -547,26 +511,14 @@ static struct buf_page *ltt_relay_cache_
  * @offset : offset within the buffer
  * @src : source address
  * @len : length to write
+ * @page : cached buffer page
+ * @pagecpy : page size copied so far
  */
-int ltt_relay_write(struct rchan_buf *buf, size_t offset,
-	const void *src, size_t len)
+void _ltt_relay_write(struct rchan_buf *buf, size_t offset,
+	const void *src, size_t len, struct buf_page *page, ssize_t pagecpy)
 {
-	struct buf_page *page;
-	ssize_t pagecpy, orig_len;
-
-	orig_len = len;
-	offset &= buf->chan->alloc_size - 1;
-	page = buf->wpage;
-	if (unlikely(!len))
-		return 0;
-	for (;;) {
-		page = ltt_relay_cache_page(buf, &buf->wpage, page, offset);
-		pagecpy = min_t(size_t, len, PAGE_SIZE - (offset & ~PAGE_MASK));
-		memcpy(page_address(page->page)
-			+ (offset & ~PAGE_MASK), src, pagecpy);
+	do {
 		len -= pagecpy;
-		if (likely(!len))
-			break;
 		src += pagecpy;
 		offset += pagecpy;
 		/*
@@ -574,10 +526,14 @@ int ltt_relay_write(struct rchan_buf *bu
 		 * subbuffers.
 		 */
 		WARN_ON(offset >= buf->chan->alloc_size);
-	}
-	return orig_len;
+
+		page = ltt_relay_cache_page(buf, &buf->wpage, page, offset);
+		pagecpy = min_t(size_t, len, PAGE_SIZE - (offset & ~PAGE_MASK));
+		ltt_relay_do_copy(page_address(page->page)
+			+ (offset & ~PAGE_MASK), src, pagecpy);
+	} while (unlikely(len != pagecpy));
 }
-EXPORT_SYMBOL_GPL(ltt_relay_write);
+EXPORT_SYMBOL_GPL(_ltt_relay_write);
 
 /**
  * ltt_relay_read - read data from ltt_relay_buffer.
Index: linux-2.6-lttng/include/linux/ltt-relay.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/ltt-relay.h	2009-03-05 15:22:47.000000000 -0500
+++ linux-2.6-lttng/include/linux/ltt-relay.h	2009-03-05 15:23:47.000000000 -0500
@@ -137,8 +137,14 @@ struct rchan_callbacks {
 	int (*remove_buf_file)(struct dentry *dentry);
 };
 
-extern int ltt_relay_write(struct rchan_buf *buf, size_t offset,
-	const void *src, size_t len);
+extern struct buf_page *ltt_relay_find_prev_page(struct rchan_buf *buf,
+	struct buf_page *page, size_t offset, ssize_t diff_offset);
+
+extern struct buf_page *ltt_relay_find_next_page(struct rchan_buf *buf,
+	struct buf_page *page, size_t offset, ssize_t diff_offset);
+
+extern void _ltt_relay_write(struct rchan_buf *buf, size_t offset,
+	const void *src, size_t len, struct buf_page *page, ssize_t pagecpy);
 
 extern int ltt_relay_read(struct rchan_buf *buf, size_t offset,
 	void *dest, size_t len);
@@ -156,6 +162,87 @@ extern void *ltt_relay_offset_address(st
 	size_t offset);
 
 /*
+ * Find the page containing "offset". Cache it if it is after the currently
+ * cached page.
+ */
+static inline struct buf_page *ltt_relay_cache_page(struct rchan_buf *buf,
+		struct buf_page **page_cache,
+		struct buf_page *page, size_t offset)
+{
+	ssize_t diff_offset;
+	ssize_t half_buf_size = buf->chan->alloc_size >> 1;
+
+	/*
+	 * Make sure this is the page we want to write into. The current
+	 * page is changed concurrently by other writers. [wrh]page are
+	 * used as a cache remembering the last page written
+	 * to/read/looked up for header address. No synchronization;
+	 * we may have to find the previous page if a nested write
+	 * occurred. Finding the right page is done by comparing the
+	 * dest_offset with the buf_page offsets.
+	 * When at the exact opposite of the buffer, bias towards forward search
+	 * because it will be cached.
+	 */
+
+	diff_offset = (ssize_t)offset - (ssize_t)page->offset;
+	if (diff_offset <= -(ssize_t)half_buf_size)
+		diff_offset += buf->chan->alloc_size;
+	else if (diff_offset > half_buf_size)
+		diff_offset -= buf->chan->alloc_size;
+
+	if (unlikely(diff_offset >= (ssize_t)PAGE_SIZE)) {
+		page = ltt_relay_find_next_page(buf, page, offset, diff_offset);
+		*page_cache = page;
+	} else if (unlikely(diff_offset < 0)) {
+		page = ltt_relay_find_prev_page(buf, page, offset, diff_offset);
+	}
+	return page;
+}
+
+static inline void ltt_relay_do_copy(void *dest, const void *src, size_t len)
+{
+	switch (len) {
+	case 0:
+		break;
+	case 1:
+		*(u8 *)dest = *(const u8 *)src;
+		break;
+	case 2:
+		*(u16 *)dest = *(const u16 *)src;
+		break;
+	case 4:
+		*(u32 *)dest = *(const u32 *)src;
+		break;
+#if (BITS_PER_LONG == 64)
+	case 8:
+		*(u64 *)dest = *(const u64 *)src;
+		break;
+#endif
+	default:
+		memcpy(dest, src, len);
+	}
+}
+
+static inline int ltt_relay_write(struct rchan_buf *buf, size_t offset,
+	const void *src, size_t len)
+{
+	struct buf_page *page;
+	ssize_t pagecpy;
+
+	offset &= buf->chan->alloc_size - 1;
+	page = buf->wpage;
+
+	page = ltt_relay_cache_page(buf, &buf->wpage, page, offset);
+	pagecpy = min_t(size_t, len, PAGE_SIZE - (offset & ~PAGE_MASK));
+	ltt_relay_do_copy(page_address(page->page)
+		+ (offset & ~PAGE_MASK), src, pagecpy);
+
+	if (unlikely(len != pagecpy))
+		_ltt_relay_write(buf, offset, src, len, page, pagecpy);
+	return len;
+}
+
+/*
  * CONFIG_LTT_RELAY kernel API, ltt/ltt-relay-alloc.c
  */
 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [RFC patch 07/41] LTTng dynamic channels
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (5 preceding siblings ...)
  2009-03-05 22:47 ` [RFC patch 06/41] LTTng optimize write to page function Mathieu Desnoyers
@ 2009-03-05 22:47 ` Mathieu Desnoyers
  2009-03-05 22:47 ` [RFC patch 08/41] LTTng - tracer header Mathieu Desnoyers
                   ` (35 subsequent siblings)
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:47 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers

[-- Attachment #1: lttng-dynamic-channels.patch --]
[-- Type: text/plain, Size: 13420 bytes --]

Make channels dynamic (a channel can be added at module load time).
When a trace is setup, its channel array stays fixed even if the channels
change.
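The "no index reuse while users exist" rule implemented below with
free_index/index_kref can be sketched with a toy userspace allocator
(hypothetical names; the kernel uses struct kref rather than a bare
counter, and compacts IDs instead of simply resetting):

```c
#include <assert.h>
#include <stddef.h>

/* Toy channel-index allocator: as long as any trace holds a reference,
 * freed indices are not reused; only when the reference count drops to
 * zero may the index space be reassigned (reset here for simplicity). */
static unsigned int toy_free_index;
static unsigned int toy_refcount;

static unsigned int toy_index_alloc(void)
{
	toy_refcount++;
	return toy_free_index++;
}

static void toy_index_release(void)
{
	if (--toy_refcount == 0)
		toy_free_index = 0;	/* no users left: indices may be reassigned */
}
```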

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 include/linux/ltt-channels.h |   94 +++++++++++
 ltt/ltt-channels.c           |  338 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 432 insertions(+)

Index: linux-2.6-lttng/ltt/ltt-channels.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/ltt/ltt-channels.c	2009-03-05 15:03:00.000000000 -0500
@@ -0,0 +1,338 @@
+/*
+ * ltt/ltt-channels.c
+ *
+ * (C) Copyright 2008 - Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca)
+ *
+ * LTTng channel management.
+ *
+ * Author:
+ *	Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca)
+ */
+
+#include <linux/module.h>
+#include <linux/ltt-channels.h>
+#include <linux/mutex.h>
+#include <linux/vmalloc.h>
+
+/*
+ * ltt_channel_mutex may be nested inside the LTT trace mutex.
+ * ltt_channel_mutex mutex may be nested inside markers mutex.
+ */
+static DEFINE_MUTEX(ltt_channel_mutex);
+static LIST_HEAD(ltt_channels);
+/*
+ * Index of next channel in array. Makes sure that as long as a trace channel is
+ * allocated, no array index will be re-used when a channel is freed and then
+ * another channel is allocated. This index is cleared and the array indexes
+ * get reassigned when the index_kref goes back to 0, which indicates that no
+ * more trace channels are allocated.
+ */
+static unsigned int free_index;
+static struct kref index_kref;	/* Keeps track of allocated trace channels */
+
+static struct ltt_channel_setting *lookup_channel(const char *name)
+{
+	struct ltt_channel_setting *iter;
+
+	list_for_each_entry(iter, &ltt_channels, list)
+		if (strcmp(name, iter->name) == 0)
+			return iter;
+	return NULL;
+}
+
+/*
+ * Must be called when channel refcount falls to 0 _and_ also when the last
+ * trace is freed. This function is responsible for compacting the channel and
+ * event IDs when no users are active.
+ *
+ * Called with lock_markers() and channels mutex held.
+ */
+static void release_channel_setting(struct kref *kref)
+{
+	struct ltt_channel_setting *setting = container_of(kref,
+		struct ltt_channel_setting, kref);
+	struct ltt_channel_setting *iter;
+
+	if (atomic_read(&index_kref.refcount) == 0
+	    && atomic_read(&setting->kref.refcount) == 0) {
+		list_del(&setting->list);
+		kfree(setting);
+
+		free_index = 0;
+		list_for_each_entry(iter, &ltt_channels, list) {
+			iter->index = free_index++;
+			iter->free_event_id = 0;
+		}
+		markers_compact_event_ids();
+	}
+}
+
+/*
+ * Perform channel index compaction when the last trace channel is freed.
+ *
+ * Called with lock_markers() and channels mutex held.
+ */
+static void release_trace_channel(struct kref *kref)
+{
+	struct ltt_channel_setting *iter, *n;
+
+	list_for_each_entry_safe(iter, n, &ltt_channels, list)
+		release_channel_setting(&iter->kref);
+}
+
+/**
+ * ltt_channels_register - Register a trace channel.
+ * @name: channel name
+ *
+ * Uses refcounting.
+ */
+int ltt_channels_register(const char *name)
+{
+	struct ltt_channel_setting *setting;
+	int ret = 0;
+
+	mutex_lock(&ltt_channel_mutex);
+	setting = lookup_channel(name);
+	if (setting) {
+		if (atomic_read(&setting->kref.refcount) == 0)
+			goto init_kref;
+		else {
+			kref_get(&setting->kref);
+			goto end;
+		}
+	}
+	setting = kzalloc(sizeof(*setting), GFP_KERNEL);
+	if (!setting) {
+		ret = -ENOMEM;
+		goto end;
+	}
+	list_add(&setting->list, &ltt_channels);
+	strncpy(setting->name, name, PATH_MAX-1);
+	setting->index = free_index++;
+init_kref:
+	kref_init(&setting->kref);
+end:
+	mutex_unlock(&ltt_channel_mutex);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ltt_channels_register);
+
+/**
+ * ltt_channels_unregister - Unregister a trace channel.
+ * @name: channel name
+ *
+ * Must be called with markers mutex held.
+ */
+int ltt_channels_unregister(const char *name)
+{
+	struct ltt_channel_setting *setting;
+	int ret = 0;
+
+	mutex_lock(&ltt_channel_mutex);
+	setting = lookup_channel(name);
+	if (!setting || atomic_read(&setting->kref.refcount) == 0) {
+		ret = -ENOENT;
+		goto end;
+	}
+	kref_put(&setting->kref, release_channel_setting);
+end:
+	mutex_unlock(&ltt_channel_mutex);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ltt_channels_unregister);
+
+/**
+ * ltt_channels_set_default - Set channel default behavior.
+ * @name: default channel name
+ * @subbuf_size: size of the subbuffers
+ * @subbuf_cnt: number of subbuffers
+ */
+int ltt_channels_set_default(const char *name,
+			     unsigned int subbuf_size,
+			     unsigned int subbuf_cnt)
+{
+	struct ltt_channel_setting *setting;
+	int ret = 0;
+
+	mutex_lock(&ltt_channel_mutex);
+	setting = lookup_channel(name);
+	if (!setting || atomic_read(&setting->kref.refcount) == 0) {
+		ret = -ENOENT;
+		goto end;
+	}
+	setting->subbuf_size = subbuf_size;
+	setting->subbuf_cnt = subbuf_cnt;
+end:
+	mutex_unlock(&ltt_channel_mutex);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ltt_channels_set_default);
+
+/**
+ * ltt_channels_get_name_from_index - get channel name from channel index
+ * @index: channel index
+ *
+ * Allows looking up the channel name given its index. Done to keep the name
+ * information outside of each trace channel instance.
+ */
+const char *ltt_channels_get_name_from_index(unsigned int index)
+{
+	struct ltt_channel_setting *iter;
+
+	list_for_each_entry(iter, &ltt_channels, list)
+		if (iter->index == index && atomic_read(&iter->kref.refcount))
+			return iter->name;
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(ltt_channels_get_name_from_index);
+
+static struct ltt_channel_setting *
+ltt_channels_get_setting_from_name(const char *name)
+{
+	struct ltt_channel_setting *iter;
+
+	list_for_each_entry(iter, &ltt_channels, list)
+		if (!strcmp(iter->name, name)
+		    && atomic_read(&iter->kref.refcount))
+			return iter;
+	return NULL;
+}
+
+/**
+ * ltt_channels_get_index_from_name - get channel index from channel name
+ * @name: channel name
+ *
+ * Allows looking up the channel index given its name. Done to keep the name
+ * information outside of each trace channel instance.
+ * Returns -1 if not found.
+ */
+int ltt_channels_get_index_from_name(const char *name)
+{
+	struct ltt_channel_setting *setting;
+
+	setting = ltt_channels_get_setting_from_name(name);
+	if (setting)
+		return setting->index;
+	else
+		return -1;
+}
+EXPORT_SYMBOL_GPL(ltt_channels_get_index_from_name);
+
+/**
+ * ltt_channels_trace_alloc - Allocate channel structures for a trace
+ * @subbuf_size: subbuffer size. 0 uses default.
+ * @subbuf_cnt: number of subbuffers per per-cpu buffer. 0 uses default.
+ * @flags: Default channel flags
+ *
+ * Use the current channel list to allocate the channels for a trace.
+ * Called with trace lock held. Does not perform the trace buffer allocation,
+ * because we must let the user overwrite specific channel sizes.
+ */
+struct ltt_channel_struct *ltt_channels_trace_alloc(unsigned int *nr_channels,
+						    int overwrite,
+						    int active)
+{
+	struct ltt_channel_struct *channel = NULL;
+	struct ltt_channel_setting *iter;
+
+	mutex_lock(&ltt_channel_mutex);
+	if (!free_index)
+		goto end;
+	if (!atomic_read(&index_kref.refcount))
+		kref_init(&index_kref);
+	else
+		kref_get(&index_kref);
+	*nr_channels = free_index;
+	channel = kzalloc(sizeof(struct ltt_channel_struct) * free_index,
+			  GFP_KERNEL);
+	if (!channel)
+		goto end;
+	list_for_each_entry(iter, &ltt_channels, list) {
+		if (!atomic_read(&iter->kref.refcount))
+			continue;
+		channel[iter->index].subbuf_size = iter->subbuf_size;
+		channel[iter->index].subbuf_cnt = iter->subbuf_cnt;
+		channel[iter->index].overwrite = overwrite;
+		channel[iter->index].active = active;
+		channel[iter->index].channel_name = iter->name;
+	}
+end:
+	mutex_unlock(&ltt_channel_mutex);
+	return channel;
+}
+EXPORT_SYMBOL_GPL(ltt_channels_trace_alloc);
+
+/**
+ * ltt_channels_trace_free - Free one trace's channels
+ * @channels: channels to free
+ *
+ * Called with trace lock held. The actual channel buffers must be freed before
+ * this function is called.
+ */
+void ltt_channels_trace_free(struct ltt_channel_struct *channels)
+{
+	lock_markers();
+	mutex_lock(&ltt_channel_mutex);
+	kfree(channels);
+	kref_put(&index_kref, release_trace_channel);
+	mutex_unlock(&ltt_channel_mutex);
+	unlock_markers();
+}
+EXPORT_SYMBOL_GPL(ltt_channels_trace_free);
+
+/**
+ * _ltt_channels_get_event_id - get next event ID for a marker
+ * @channel: channel name
+ * @name: event name
+ *
+ * Returns a unique event ID (for this channel) or < 0 on error.
+ * Must be called with channels mutex held.
+ */
+int _ltt_channels_get_event_id(const char *channel, const char *name)
+{
+	struct ltt_channel_setting *setting;
+	int ret;
+
+	setting = ltt_channels_get_setting_from_name(channel);
+	if (!setting) {
+		ret = -ENOENT;
+		goto end;
+	}
+	if (strcmp(channel, "metadata") == 0) {
+		if (strcmp(name, "core_marker_id") == 0)
+			ret = 0;
+		else if (strcmp(name, "core_marker_format") == 0)
+			ret = 1;
+		else
+			ret = -ENOENT;
+		goto end;
+	}
+	if (setting->free_event_id == EVENTS_PER_CHANNEL - 1) {
+		ret = -ENOSPC;
+		goto end;
+	}
+	ret = setting->free_event_id++;
+end:
+	return ret;
+}
+
+/**
+ * ltt_channels_get_event_id - get next event ID for a marker
+ * @channel: channel name
+ * @name: event name
+ *
+ * Returns a unique event ID (for this channel) or < 0 on error.
+ */
+int ltt_channels_get_event_id(const char *channel, const char *name)
+{
+	int ret;
+
+	mutex_lock(&ltt_channel_mutex);
+	ret = _ltt_channels_get_event_id(channel, name);
+	mutex_unlock(&ltt_channel_mutex);
+	return ret;
+}
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Mathieu Desnoyers");
+MODULE_DESCRIPTION("Linux Trace Toolkit Next Generation Channel Management");
Index: linux-2.6-lttng/include/linux/ltt-channels.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/linux/ltt-channels.h	2009-03-05 15:03:13.000000000 -0500
@@ -0,0 +1,94 @@
+#ifndef _LTT_CHANNELS_H
+#define _LTT_CHANNELS_H
+
+/*
+ * Copyright (C) 2008 Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca)
+ *
+ * Dynamic tracer channel allocation.
+ */
+
+#include <linux/limits.h>
+#include <linux/kref.h>
+#include <linux/list.h>
+
+#define EVENTS_PER_CHANNEL	65536
+
+struct ltt_trace_struct;
+struct rchan_buf;
+
+struct ltt_channel_struct {
+	/* First 32 bytes cache-hot cacheline */
+	struct ltt_trace_struct	*trace;
+	void *buf;
+	void *trans_channel_data;
+	int overwrite:1;
+	int active:1;
+	unsigned int n_subbufs_order;
+	unsigned long commit_count_mask;	/*
+						 * Commit count mask, removing
+						 * the MSBs corresponding to
+						 * bits used to represent the
+						 * subbuffer index.
+						 */
+	/* End of first 32 bytes cacheline */
+
+	/*
+	 * buffer_begin - called on buffer-switch to a new sub-buffer
+	 * @buf: the channel buffer containing the new sub-buffer
+	 */
+	void (*buffer_begin) (struct rchan_buf *buf,
+			u64 tsc, unsigned int subbuf_idx);
+	/*
+	 * buffer_end - called on buffer-switch to a new sub-buffer
+	 * @buf: the channel buffer containing the previous sub-buffer
+	 */
+	void (*buffer_end) (struct rchan_buf *buf,
+			u64 tsc, unsigned int offset, unsigned int subbuf_idx);
+	struct kref kref;	/* Channel transport reference count */
+	struct ltt_channel_buf_access_ops *buf_access_ops;
+	unsigned int subbuf_size;
+	unsigned int subbuf_cnt;
+	const char *channel_name;
+} ____cacheline_aligned;
+
+/*
+ * ops for accessing struct channel data.
+ * Only meant to be used in slow-path code (ascii formatter code,
+ * buffer read-side access through a system call).
+ */
+struct ltt_channel_buf_access_ops {
+	unsigned long (*get_offset)(struct rchan_buf *buf);
+	unsigned long (*get_consumed)(struct rchan_buf *buf);
+	int (*open)(struct rchan_buf *buf);
+	int (*release)(struct rchan_buf *buf);
+	int (*get_subbuf)(struct rchan_buf *buf, unsigned long *consumed);
+	int (*put_subbuf)(struct rchan_buf *buf, unsigned long consumed);
+	unsigned long (*get_n_subbufs)(struct rchan_buf *buf);
+	unsigned long (*get_subbuf_size)(struct rchan_buf *buf);
+};
+
+struct ltt_channel_setting {
+	unsigned int subbuf_size;
+	unsigned int subbuf_cnt;
+	struct kref kref;	/* Number of references to structure content */
+	struct list_head list;
+	unsigned int index;	/* index of channel in trace channel array */
+	u16 free_event_id;	/* Next event ID to allocate */
+	char name[PATH_MAX];
+};
+
+int ltt_channels_register(const char *name);
+int ltt_channels_unregister(const char *name);
+int ltt_channels_set_default(const char *name,
+			     unsigned int subbuf_size,
+			     unsigned int subbuf_cnt);
+const char *ltt_channels_get_name_from_index(unsigned int index);
+int ltt_channels_get_index_from_name(const char *name);
+struct ltt_channel_struct *ltt_channels_trace_alloc(unsigned int *nr_channels,
+						    int overwrite,
+						    int active);
+void ltt_channels_trace_free(struct ltt_channel_struct *channels);
+int _ltt_channels_get_event_id(const char *channel, const char *name);
+int ltt_channels_get_event_id(const char *channel, const char *name);
+
+#endif /* _LTT_CHANNELS_H */

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [RFC patch 08/41] LTTng - tracer header
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (6 preceding siblings ...)
  2009-03-05 22:47 ` [RFC patch 07/41] LTTng dynamic channels Mathieu Desnoyers
@ 2009-03-05 22:47 ` Mathieu Desnoyers
  2009-03-05 22:47 ` [RFC patch 09/41] LTTng optimize write to page function deal with unaligned access Mathieu Desnoyers
                   ` (34 subsequent siblings)
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:47 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers, Zhao Lei, Dennis W. Tokarski, Jan Kiszka

[-- Attachment #1: lttng-tracer-header.patch --]
[-- Type: text/plain, Size: 23582 bytes --]

Declaration of traces/channel structures. Tracer infrastructure API.

Contains the structures that express the traces and "channels" (sets of buffers
conveying part of the trace data, typically split into high, medium and low
data rate channels).

Also contains the tracing API : tracing behavior can be controlled from within
the kernel through calls to this API.

Thanks to Jan Kiszka <jan.kiszka@siemens.com> for cleanup of some home-brewed
macros.

Credits to Zhao Lei <zhaolei@cn.fujitsu.com> for a lot of fixes and extensions
which have been folded in this patch.

Credits to "Dennis W. Tokarski" <dwt@PolTec.COM> for compile fixes.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Zhao Lei <zhaolei@cn.fujitsu.com>
CC: "Dennis W. Tokarski" <dwt@PolTec.COM>
CC: Jan Kiszka <jan.kiszka@siemens.com>
---
 include/linux/ltt-tracer.h |  729 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 729 insertions(+)

Index: linux-2.6-lttng/include/linux/ltt-tracer.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/linux/ltt-tracer.h	2009-03-05 16:40:10.000000000 -0500
@@ -0,0 +1,729 @@
+/*
+ * Copyright (C) 2005,2006,2008 Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca)
+ *
+ * This contains the definitions for the Linux Trace Toolkit tracer.
+ */
+
+#ifndef _LTT_TRACER_H
+#define _LTT_TRACER_H
+
+#include <stdarg.h>
+#include <linux/types.h>
+#include <linux/limits.h>
+#include <linux/list.h>
+#include <linux/cache.h>
+#include <linux/kernel.h>
+#include <linux/timex.h>
+#include <linux/wait.h>
+#include <linux/ltt-relay.h>
+#include <linux/ltt-channels.h>
+#include <linux/ltt-core.h>
+#include <linux/marker.h>
+#include <linux/trace-clock.h>
+#include <asm/atomic.h>
+#include <asm/local.h>
+
+/* Number of bytes to log with a read/write event */
+#define LTT_LOG_RW_SIZE			32L
+
+/* Interval (in jiffies) at which the LTT per-CPU timer fires */
+#define LTT_PERCPU_TIMER_INTERVAL	1
+
+#ifndef LTT_ARCH_TYPE
+#define LTT_ARCH_TYPE			LTT_ARCH_TYPE_UNDEFINED
+#endif
+
+#ifndef LTT_ARCH_VARIANT
+#define LTT_ARCH_VARIANT		LTT_ARCH_VARIANT_NONE
+#endif
+
+struct ltt_active_marker;
+
+/* Maximum number of callbacks per marker */
+#define LTT_NR_CALLBACKS	10
+
+struct ltt_serialize_closure;
+struct ltt_probe_private_data;
+
+/* Serialization callback '%k' */
+typedef size_t (*ltt_serialize_cb)(struct rchan_buf *buf, size_t buf_offset,
+			struct ltt_serialize_closure *closure,
+			void *serialize_private, int *largest_align,
+			const char *fmt, va_list *args);
+
+struct ltt_serialize_closure {
+	ltt_serialize_cb *callbacks;
+	long cb_args[LTT_NR_CALLBACKS];
+	unsigned int cb_idx;
+};
+
+size_t ltt_serialize_data(struct rchan_buf *buf, size_t buf_offset,
+			struct ltt_serialize_closure *closure,
+			void *serialize_private,
+			int *largest_align, const char *fmt, va_list *args);
+
+struct ltt_available_probe {
+	const char *name;		/* probe name */
+	const char *format;
+	marker_probe_func *probe_func;
+	ltt_serialize_cb callbacks[LTT_NR_CALLBACKS];
+	struct list_head node;		/* registered probes list */
+};
+
+struct ltt_probe_private_data {
+	struct ltt_trace_struct *trace;	/*
+					 * Target trace, for metadata
+					 * or statedump.
+					 */
+	ltt_serialize_cb serializer;	/*
+					 * Serialization function override.
+					 */
+	void *serialize_private;	/*
+					 * Private data for serialization
+					 * functions.
+					 */
+};
+
+enum ltt_channels {
+	LTT_CHANNEL_METADATA,
+	LTT_CHANNEL_FD_STATE,
+	LTT_CHANNEL_GLOBAL_STATE,
+	LTT_CHANNEL_IRQ_STATE,
+	LTT_CHANNEL_MODULE_STATE,
+	LTT_CHANNEL_NETIF_STATE,
+	LTT_CHANNEL_SOFTIRQ_STATE,
+	LTT_CHANNEL_SWAP_STATE,
+	LTT_CHANNEL_SYSCALL_STATE,
+	LTT_CHANNEL_TASK_STATE,
+	LTT_CHANNEL_VM_STATE,
+	LTT_CHANNEL_FS,
+	LTT_CHANNEL_INPUT,
+	LTT_CHANNEL_IPC,
+	LTT_CHANNEL_KERNEL,
+	LTT_CHANNEL_MM,
+	LTT_CHANNEL_RCU,
+	LTT_CHANNEL_DEFAULT,
+};
+
+struct ltt_active_marker {
+	struct list_head node;		/* active markers list */
+	const char *channel;
+	const char *name;
+	const char *format;
+	struct ltt_available_probe *probe;
+};
+
+extern void ltt_vtrace(const struct marker *mdata, void *probe_data,
+	void *call_data, const char *fmt, va_list *args);
+extern void ltt_trace(const struct marker *mdata, void *probe_data,
+	void *call_data, const char *fmt, ...);
+
+/*
+ * Unique ID assigned to each registered probe.
+ */
+enum marker_id {
+	MARKER_ID_SET_MARKER_ID = 0,	/* Static IDs available (range 0-7) */
+	MARKER_ID_SET_MARKER_FORMAT,
+	MARKER_ID_COMPACT,		/* Compact IDs (range: 8-127)	    */
+	MARKER_ID_DYNAMIC,		/* Dynamic IDs (range: 128-65535)   */
+};
+
+/* static ids 0-1 reserved for internal use. */
+#define MARKER_CORE_IDS		2
+static inline enum marker_id marker_id_type(uint16_t id)
+{
+	if (id < MARKER_CORE_IDS)
+		return (enum marker_id)id;
+	else
+		return MARKER_ID_DYNAMIC;
+}
+
+#if defined(CONFIG_LTT) && defined(CONFIG_LTT_ALIGNMENT)
+
+/*
+ * Calculate the offset needed to align the type.
+ * size_of_type must be non-zero.
+ */
+static inline unsigned int ltt_align(size_t align_drift, size_t size_of_type)
+{
+	size_t alignment = min(sizeof(void *), size_of_type);
+	return (alignment - align_drift) & (alignment - 1);
+}
+/* Default arch alignment */
+#define LTT_ALIGN
+
+static inline int ltt_get_alignment(void)
+{
+	return sizeof(void *);
+}
+
+#else
+
+static inline unsigned int ltt_align(size_t align_drift,
+		 size_t size_of_type)
+{
+	return 0;
+}
+
+#define LTT_ALIGN __attribute__((packed))
+
+static inline int ltt_get_alignment(void)
+{
+	return 0;
+}
+#endif /* CONFIG_LTT_ALIGNMENT */
+
+#ifdef CONFIG_LTT
+
+struct user_dbg_data {
+	unsigned long avail_size;
+	unsigned long write;
+	unsigned long read;
+};
+
+struct ltt_trace_ops {
+	/* First 32 bytes cache-hot cacheline */
+	int (*reserve_slot) (struct ltt_trace_struct *trace,
+				struct ltt_channel_struct *channel,
+				void **transport_data, size_t data_size,
+				size_t *slot_size, long *buf_offset, u64 *tsc,
+				unsigned int *rflags,
+				int largest_align,
+				int cpu);
+	void (*commit_slot) (struct ltt_channel_struct *channel,
+				void **transport_data, long buf_offset,
+				size_t slot_size);
+	void (*wakeup_channel) (struct ltt_channel_struct *ltt_channel);
+	int (*user_blocking) (struct ltt_trace_struct *trace,
+				unsigned int index, size_t data_size,
+				struct user_dbg_data *dbg);
+	/* End of first 32 bytes cacheline */
+	int (*create_dirs) (struct ltt_trace_struct *new_trace);
+	void (*remove_dirs) (struct ltt_trace_struct *new_trace);
+	int (*create_channel) (const char *trace_name,
+				struct ltt_trace_struct *trace,
+				struct dentry *dir, const char *channel_name,
+				struct ltt_channel_struct *ltt_chan,
+				unsigned int subbuf_size,
+				unsigned int n_subbufs, int overwrite);
+	void (*finish_channel) (struct ltt_channel_struct *channel);
+	void (*remove_channel) (struct ltt_channel_struct *channel);
+	void (*user_errors) (struct ltt_trace_struct *trace,
+				unsigned int index, size_t data_size,
+				struct user_dbg_data *dbg, int cpu);
+#ifdef CONFIG_HOTPLUG_CPU
+	int (*handle_cpuhp) (struct notifier_block *nb,
+				unsigned long action, void *hcpu,
+				struct ltt_trace_struct *trace);
+#endif
+} ____cacheline_aligned;
+
+struct ltt_transport {
+	char *name;
+	struct module *owner;
+	struct list_head node;
+	struct ltt_trace_ops ops;
+};
+
+enum trace_mode { LTT_TRACE_NORMAL, LTT_TRACE_FLIGHT, LTT_TRACE_HYBRID };
+
+#define CHANNEL_FLAG_ENABLE	(1U<<0)
+#define CHANNEL_FLAG_OVERWRITE	(1U<<1)
+
+/* Per-trace information - each trace/flight recorder represented by one */
+struct ltt_trace_struct {
+	/* First 32 bytes cache-hot cacheline */
+	struct list_head list;
+	struct ltt_trace_ops *ops;
+	int active;
+	/* Second 32 bytes cache-hot cacheline */
+	struct ltt_channel_struct *channels;
+	unsigned int nr_channels;
+	u32 freq_scale;
+	u64 start_freq;
+	u64 start_tsc;
+	unsigned long long start_monotonic;
+	struct timeval		start_time;
+	struct ltt_channel_setting *settings;
+	struct {
+		struct dentry			*trace_root;
+	} dentry;
+	struct rchan_callbacks callbacks;
+	struct kref kref; /* Each channel has a kref of the trace struct */
+	struct ltt_transport *transport;
+	struct kref ltt_transport_kref;
+	wait_queue_head_t kref_wq; /* Place for ltt_trace_destroy to sleep */
+	char trace_name[NAME_MAX];
+} ____cacheline_aligned;
+
+/* Hardcoded event headers
+ *
+ * event header for a trace with active heartbeat : 27 bits timestamps
+ *
+ * Headers are 32-bit aligned. In order to ensure such alignment, a dynamic
+ * per-trace alignment calculation must be done.
+ *
+ * Remember that the C compiler aligns each member on a boundary equal to
+ * its own size.
+ *
+ * As relay subbuffers are aligned on pages, we are sure that they are 4 and 8
+ * bytes aligned, so the buffer header and trace header are aligned.
+ *
+ * Event headers are aligned depending on the trace alignment option.
+ *
+ * Note: C structure bitfields are not used here, due to cross-endianness and
+ * portability concerns.
+ */
+
+#define LTT_RESERVED_EVENTS	3
+#define LTT_EVENT_BITS		5
+#define LTT_FREE_EVENTS		((1 << LTT_EVENT_BITS) - LTT_RESERVED_EVENTS)
+#define LTT_TSC_BITS		27
+#define LTT_TSC_MASK		((1 << LTT_TSC_BITS) - 1)
+
+struct ltt_event_header {
+	u32 id_time;		/* 5 bits event id (MSB); 27 bits time (LSB) */
+};
+
+/* Reservation flags */
+#define	LTT_RFLAG_ID			(1 << 0)
+#define	LTT_RFLAG_ID_SIZE		(1 << 1)
+#define	LTT_RFLAG_ID_SIZE_TSC		(1 << 2)
+
+#define LTT_MAX_SMALL_SIZE		0xFFFFU
+
+/*
+ * We use asm/timex.h : cpu_khz/HZ variable in here : we might have to deal
+ * specifically with CPU frequency scaling someday, so using an interpolation
+ * between the start and end of buffer values is not flexible enough. Using an
+ * immediate frequency value permits to calculate directly the times for parts
+ * of a buffer that would be before a frequency change.
+ *
+ * Keep the natural field alignment for _each field_ within this structure if
+ * you ever add/remove a field from this header. Packed attribute is not used
+ * because gcc generates poor code on at least powerpc and mips. Don't ever
+ * let gcc add padding between the structure elements.
+ */
+struct ltt_subbuffer_header {
+	uint64_t cycle_count_begin;	/* Cycle count at subbuffer start */
+	uint64_t cycle_count_end;	/* Cycle count at subbuffer end */
+	uint32_t magic_number;		/*
+					 * Trace magic number.
+					 * contains endianness information.
+					 */
+	uint8_t major_version;
+	uint8_t minor_version;
+	uint8_t arch_size;		/* Architecture pointer size */
+	uint8_t alignment;		/* LTT data alignment */
+	uint64_t start_time_sec;	/* NTP-corrected start time */
+	uint64_t start_time_usec;
+	uint64_t start_freq;		/*
+					 * Frequency at trace start,
+					 * used all along the trace.
+					 */
+	uint32_t freq_scale;		/* Frequency scaling (divisor) */
+	uint32_t lost_size;		/* Size unused at end of subbuffer */
+	uint32_t buf_size;		/* Size of this subbuffer */
+	uint32_t events_lost;		/*
+					 * Events lost in this subbuffer since
+					 * the beginning of the trace.
+					 * (may overflow)
+					 */
+	uint32_t subbuf_corrupt;	/*
+					 * Corrupted (lost) subbuffers since
					 * the beginning of the trace.
+					 * (may overflow)
+					 */
+	uint8_t header_end[0];		/* End of header */
+};
+
+/**
+ * ltt_subbuffer_header_size - return the size of the subbuffer header
+ *
+ * Return header size without padding after the structure. Don't use packed
+ * structure because gcc generates inefficient code on some architectures
+ * (powerpc, mips..)
+ */
+static inline size_t ltt_subbuffer_header_size(void)
+{
+	return offsetof(struct ltt_subbuffer_header, header_end);
+}
+
+/*
+ * ltt_get_header_size
+ *
+ * Calculate alignment offset to 32-bits. This is the alignment offset of the
+ * event header.
+ *
+ * Important note :
+ * The event header must be 32-bits. The total offset calculated here :
+ *
+ * Alignment of header struct on 32 bits (min arch size, header size)
+ * + sizeof(header struct)  (32-bits)
+ * + (opt) u16 (ext. event id)
+ * + (opt) u16 (event_size)
+ *             (if event_size == LTT_MAX_SMALL_SIZE, has ext. event size)
+ * + (opt) u32 (ext. event size)
+ * + (opt) u64 full TSC (aligned on min(64-bits, arch size))
+ *
+ * The payload must itself determine its own alignment from the biggest type it
+ * contains.
+ */
+static inline unsigned char ltt_get_header_size(
+		struct ltt_channel_struct *channel,
+		size_t offset,
+		size_t data_size,
+		size_t *before_hdr_pad,
+		unsigned int rflags)
+{
+	size_t orig_offset = offset;
+	size_t padding;
+
+	BUILD_BUG_ON(sizeof(struct ltt_event_header) != sizeof(u32));
+
+	padding = ltt_align(offset, sizeof(struct ltt_event_header));
+	offset += padding;
+	offset += sizeof(struct ltt_event_header);
+
+	switch (rflags) {
+	case LTT_RFLAG_ID_SIZE_TSC:
+		offset += sizeof(u16) + sizeof(u16);
+		if (data_size >= LTT_MAX_SMALL_SIZE)
+			offset += sizeof(u32);
+		offset += ltt_align(offset, sizeof(u64));
+		offset += sizeof(u64);
+		break;
+	case LTT_RFLAG_ID_SIZE:
+		offset += sizeof(u16) + sizeof(u16);
+		if (data_size >= LTT_MAX_SMALL_SIZE)
+			offset += sizeof(u32);
+		break;
+	case LTT_RFLAG_ID:
+		offset += sizeof(u16);
+		break;
+	}
+
+	*before_hdr_pad = padding;
+	return offset - orig_offset;
+}
+
+/*
+ * ltt_write_event_header
+ *
+ * Writes the event header to the offset (already aligned on 32-bits).
+ *
+ * @trace : trace to write to.
+ * @channel : pointer to the channel structure.
+ * @buf : buffer to write to.
+ * @buf_offset : buffer offset to write to (aligned on 32 bits).
+ * @eID : event ID
+ * @event_size : size of the event, excluding the event header.
+ * @tsc : time stamp counter.
+ * @rflags : reservation flags.
+ *
+ * returns : offset where the event data must be written.
+ */
+static inline size_t ltt_write_event_header(struct ltt_trace_struct *trace,
+		struct ltt_channel_struct *channel,
+		struct rchan_buf *buf, long buf_offset,
+		u16 eID, size_t event_size,
+		u64 tsc, unsigned int rflags)
+{
+	struct ltt_event_header header;
+	size_t small_size;
+
+	switch (rflags) {
+	case LTT_RFLAG_ID_SIZE_TSC:
+		header.id_time = 29 << LTT_TSC_BITS;
+		break;
+	case LTT_RFLAG_ID_SIZE:
+		header.id_time = 30 << LTT_TSC_BITS;
+		break;
+	case LTT_RFLAG_ID:
+		header.id_time = 31 << LTT_TSC_BITS;
+		break;
+	default:
+		header.id_time = eID << LTT_TSC_BITS;
+		break;
+	}
+	header.id_time |= (u32)tsc & LTT_TSC_MASK;
+	ltt_relay_write(buf, buf_offset, &header, sizeof(header));
+	buf_offset += sizeof(header);
+
+	switch (rflags) {
+	case LTT_RFLAG_ID_SIZE_TSC:
+		small_size = min_t(size_t, event_size, LTT_MAX_SMALL_SIZE);
+		ltt_relay_write(buf, buf_offset,
+			(u16[]){ (u16)eID }, sizeof(u16));
+		buf_offset += sizeof(u16);
+		ltt_relay_write(buf, buf_offset,
+			(u16[]){ (u16)small_size }, sizeof(u16));
+		buf_offset += sizeof(u16);
+		if (small_size == LTT_MAX_SMALL_SIZE) {
+			ltt_relay_write(buf, buf_offset,
+				(u32[]){ (u32)event_size }, sizeof(u32));
+			buf_offset += sizeof(u32);
+		}
+		buf_offset += ltt_align(buf_offset, sizeof(u64));
+		ltt_relay_write(buf, buf_offset,
+			(u64[]){ (u64)tsc }, sizeof(u64));
+		buf_offset += sizeof(u64);
+		break;
+	case LTT_RFLAG_ID_SIZE:
+		small_size = min_t(size_t, event_size, LTT_MAX_SMALL_SIZE);
+		ltt_relay_write(buf, buf_offset,
+			(u16[]){ (u16)eID }, sizeof(u16));
+		buf_offset += sizeof(u16);
+		ltt_relay_write(buf, buf_offset,
+			(u16[]){ (u16)small_size }, sizeof(u16));
+		buf_offset += sizeof(u16);
+		if (small_size == LTT_MAX_SMALL_SIZE) {
+			ltt_relay_write(buf, buf_offset,
+				(u32[]){ (u32)event_size }, sizeof(u32));
+			buf_offset += sizeof(u32);
+		}
+		break;
+	case LTT_RFLAG_ID:
+		ltt_relay_write(buf, buf_offset,
+			(u16[]){ (u16)eID }, sizeof(u16));
+		buf_offset += sizeof(u16);
+		break;
+	default:
+		break;
+	}
+
+	return buf_offset;
+}
+
+/* Lockless LTTng */
+
+/* Buffer offset macros */
+
+/*
+ * BUFFER_TRUNC zeroes the subbuffer offset and the subbuffer number parts of
+ * the offset, which leaves only the buffer number.
+ */
+#define BUFFER_TRUNC(offset, chan) \
+	((offset) & (~((chan)->alloc_size-1)))
+#define BUFFER_OFFSET(offset, chan) ((offset) & ((chan)->alloc_size - 1))
+#define SUBBUF_OFFSET(offset, chan) ((offset) & ((chan)->subbuf_size - 1))
+#define SUBBUF_ALIGN(offset, chan) \
+	(((offset) + (chan)->subbuf_size) & (~((chan)->subbuf_size - 1)))
+#define SUBBUF_TRUNC(offset, chan) \
+	((offset) & (~((chan)->subbuf_size - 1)))
+#define SUBBUF_INDEX(offset, chan) \
+	(BUFFER_OFFSET((offset), chan) >> (chan)->subbuf_size_order)
+
+/*
+ * ltt_reserve_slot
+ *
+ * Atomic slot reservation in a LTTng buffer. It will take care of
+ * sub-buffer switching.
+ *
+ * Parameters:
+ *
+ * @trace : the trace structure to log to.
+ * @channel : the channel to reserve space into.
+ * @transport_data : specific transport data.
+ * @data_size : size of the variable length data to log.
+ * @slot_size : pointer to total size of the slot (out)
+ * @buf_offset : pointer to reserve offset (out)
+ * @tsc : pointer to the tsc at the slot reservation (out)
+ * @rflags : reservation flags (header specificity)
+ * @cpu : cpu id
+ *
+ * Return : -ENOSPC if not enough space, else 0.
+ */
+static inline int ltt_reserve_slot(
+		struct ltt_trace_struct *trace,
+		struct ltt_channel_struct *channel,
+		void **transport_data,
+		size_t data_size,
+		size_t *slot_size,
+		long *buf_offset,
+		u64 *tsc,
+		unsigned int *rflags,
+		int largest_align,
+		int cpu)
+{
+	return trace->ops->reserve_slot(trace, channel, transport_data,
+			data_size, slot_size, buf_offset, tsc, rflags,
+			largest_align, cpu);
+}
+
+
+/*
+ * ltt_commit_slot
+ *
+ * Atomic unordered slot commit. Increments the commit count in the
+ * specified sub-buffer, and delivers it if necessary.
+ *
+ * Parameters:
+ *
+ * @channel : the channel containing the reserved slot.
+ * @transport_data : specific transport data.
+ * @buf_offset : offset of beginning of reserved slot
+ * @slot_size : size of the reserved slot.
+ */
+static inline void ltt_commit_slot(
+		struct ltt_channel_struct *channel,
+		void **transport_data,
+		long buf_offset,
+		size_t slot_size)
+{
+	struct ltt_trace_struct *trace = channel->trace;
+
+	trace->ops->commit_slot(channel, transport_data, buf_offset, slot_size);
+}
+
+/*
+ * Control channels :
+ * control/metadata
+ * control/interrupts
+ * control/...
+ *
+ * cpu channel :
+ * cpu
+ */
+#define LTT_RELAY_ROOT		"ltt"
+#define LTT_RELAY_LOCKED_ROOT	"ltt-locked"
+
+#define LTT_METADATA_CHANNEL		"metadata_state"
+#define LTT_FD_STATE_CHANNEL		"fd_state"
+#define LTT_GLOBAL_STATE_CHANNEL	"global_state"
+#define LTT_IRQ_STATE_CHANNEL		"irq_state"
+#define LTT_MODULE_STATE_CHANNEL	"module_state"
+#define LTT_NETIF_STATE_CHANNEL		"netif_state"
+#define LTT_SOFTIRQ_STATE_CHANNEL	"softirq_state"
+#define LTT_SWAP_STATE_CHANNEL		"swap_state"
+#define LTT_SYSCALL_STATE_CHANNEL	"syscall_state"
+#define LTT_TASK_STATE_CHANNEL		"task_state"
+#define LTT_VM_STATE_CHANNEL		"vm_state"
+#define LTT_FS_CHANNEL			"fs"
+#define LTT_INPUT_CHANNEL		"input"
+#define LTT_IPC_CHANNEL			"ipc"
+#define LTT_KERNEL_CHANNEL		"kernel"
+#define LTT_MM_CHANNEL			"mm"
+#define LTT_RCU_CHANNEL			"rcu"
+
+#define LTT_FLIGHT_PREFIX	"flight-"
+
+/* Tracer properties */
+#define LTT_DEFAULT_SUBBUF_SIZE_LOW	65536
+#define LTT_DEFAULT_N_SUBBUFS_LOW	2
+#define LTT_DEFAULT_SUBBUF_SIZE_MED	262144
+#define LTT_DEFAULT_N_SUBBUFS_MED	2
+#define LTT_DEFAULT_SUBBUF_SIZE_HIGH	1048576
+#define LTT_DEFAULT_N_SUBBUFS_HIGH	2
+#define LTT_TRACER_MAGIC_NUMBER		0x00D6B7ED
+#define LTT_TRACER_VERSION_MAJOR	2
+#define LTT_TRACER_VERSION_MINOR	3
+
+/*
+ * Size reserved for high priority events (interrupts, NMI, BH) at the end of a
+ * nearly full buffer. User space won't use this last amount of space when in
+ * blocking mode. This space also includes the event header that would be
+ * written by this user space event.
+ */
+#define LTT_RESERVE_CRITICAL		4096
+
+/* Register and unregister function pointers */
+
+enum ltt_module_function {
+	LTT_FUNCTION_RUN_FILTER,
+	LTT_FUNCTION_FILTER_CONTROL,
+	LTT_FUNCTION_STATEDUMP
+};
+
+extern int ltt_module_register(enum ltt_module_function name, void *function,
+		struct module *owner);
+extern void ltt_module_unregister(enum ltt_module_function name);
+
+void ltt_transport_register(struct ltt_transport *transport);
+void ltt_transport_unregister(struct ltt_transport *transport);
+
+/* Exported control function */
+
+enum ltt_control_msg {
+	LTT_CONTROL_START,
+	LTT_CONTROL_STOP,
+	LTT_CONTROL_CREATE_TRACE,
+	LTT_CONTROL_DESTROY_TRACE
+};
+
+union ltt_control_args {
+	struct {
+		enum trace_mode mode;
+		unsigned int subbuf_size_low;
+		unsigned int n_subbufs_low;
+		unsigned int subbuf_size_med;
+		unsigned int n_subbufs_med;
+		unsigned int subbuf_size_high;
+		unsigned int n_subbufs_high;
+	} new_trace;
+};
+
+int _ltt_trace_setup(const char *trace_name);
+int ltt_trace_setup(const char *trace_name);
+struct ltt_trace_struct *_ltt_trace_find_setup(const char *trace_name);
+int ltt_trace_set_type(const char *trace_name, const char *trace_type);
+int ltt_trace_set_channel_subbufsize(const char *trace_name,
+		const char *channel_name, unsigned int size);
+int ltt_trace_set_channel_subbufcount(const char *trace_name,
+		const char *channel_name, unsigned int cnt);
+int ltt_trace_set_channel_enable(const char *trace_name,
+		const char *channel_name, unsigned int enable);
+int ltt_trace_set_channel_overwrite(const char *trace_name,
+		const char *channel_name, unsigned int overwrite);
+int ltt_trace_alloc(const char *trace_name);
+int ltt_trace_destroy(const char *trace_name);
+int ltt_trace_start(const char *trace_name);
+int ltt_trace_stop(const char *trace_name);
+
+extern int ltt_control(enum ltt_control_msg msg, const char *trace_name,
+		const char *trace_type, union ltt_control_args args);
+
+enum ltt_filter_control_msg {
+	LTT_FILTER_DEFAULT_ACCEPT,
+	LTT_FILTER_DEFAULT_REJECT
+};
+
+extern int ltt_filter_control(enum ltt_filter_control_msg msg,
+		const char *trace_name);
+
+void ltt_write_trace_header(struct ltt_trace_struct *trace,
+		struct ltt_subbuffer_header *header);
+extern void ltt_buffer_destroy(struct ltt_channel_struct *ltt_chan);
+
+void ltt_core_register(int (*function)(u8, void *));
+
+void ltt_core_unregister(void);
+
+void ltt_release_trace(struct kref *kref);
+void ltt_release_transport(struct kref *kref);
+
+extern int ltt_probe_register(struct ltt_available_probe *pdata);
+extern int ltt_probe_unregister(struct ltt_available_probe *pdata);
+extern int ltt_marker_connect(const char *channel, const char *mname,
+		const char *pname);
+extern int ltt_marker_disconnect(const char *channel, const char *mname,
+		const char *pname);
+extern void ltt_dump_marker_state(struct ltt_trace_struct *trace);
+
+void ltt_lock_traces(void);
+void ltt_unlock_traces(void);
+
+/* Relay IOCTL */
+
+/* Get the next sub buffer that can be read. */
+#define RELAY_GET_SUBBUF		_IOR(0xF5, 0x00, __u32)
+/* Release the oldest reserved (by "get") sub buffer. */
+#define RELAY_PUT_SUBBUF		_IOW(0xF5, 0x01, __u32)
+/* Returns the number of sub buffers in the per cpu channel. */
+#define RELAY_GET_N_SUBBUFS		_IOR(0xF5, 0x02, __u32)
+/* Returns the size of the sub buffers. */
+#define RELAY_GET_SUBBUF_SIZE		_IOR(0xF5, 0x03, __u32)
+
+#endif /* CONFIG_LTT */
+
+#endif /* _LTT_TRACER_H */

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [RFC patch 09/41] LTTng optimize write to page function deal with unaligned access
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (7 preceding siblings ...)
  2009-03-05 22:47 ` [RFC patch 08/41] LTTng - tracer header Mathieu Desnoyers
@ 2009-03-05 22:47 ` Mathieu Desnoyers
  2009-03-05 22:47 ` [RFC patch 10/41] lttng-optimize-write-to-page-function-remove-some-memcpy-calls Mathieu Desnoyers
                   ` (33 subsequent siblings)
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:47 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers, Lai Jiangshan

[-- Attachment #1: lttng-optimize-write-to-page-function-deal-with-unaligned.patch --]
[-- Type: text/plain, Size: 5847 bytes --]

Make sure we don't end up doing unaligned accesses on architectures which lack
support for efficient unaligned access.

Standard configurations are either:

If the architecture defines
  CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
    -> !CONFIG_LTT_ALIGNMENT  (to save space)

or, if the architecture does not define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
    -> CONFIG_LTT_ALIGNMENT   (to speed up tracing)

Compiling a kernel with tracing active:

Tests done only on x86_64 (which has efficient unaligned access):

CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
!CONFIG_LTT_ALIGNMENT
real 1m29.349s

CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
CONFIG_LTT_ALIGNMENT
real 1m29.309s

!CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS (forced by modifying arch/x86/Kconfig)
CONFIG_LTT_ALIGNMENT
real	1m29.162s

So even with this additional check, the fast path stays fast.

Testing the variations on an architecture without efficient unaligned
access would be welcome.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
CC: Martin Bligh <mbligh@google.com>
---
 include/linux/ltt-core.h   |   35 ++++++++++++++++++++++++++++++++
 include/linux/ltt-relay.h  |   49 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/ltt-tracer.h |   35 --------------------------------
 3 files changed, 84 insertions(+), 35 deletions(-)

Index: linux-2.6-lttng/include/linux/ltt-relay.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/ltt-relay.h	2009-03-05 15:23:53.000000000 -0500
+++ linux-2.6-lttng/include/linux/ltt-relay.h	2009-03-05 15:24:23.000000000 -0500
@@ -20,6 +20,7 @@
 #include <linux/poll.h>
 #include <linux/kref.h>
 #include <linux/mm.h>
+#include <linux/ltt-core.h>
 
 /* Needs a _much_ better name... */
 #define FIX_SIZE(x) ((((x) - 1) & PAGE_MASK) + PAGE_SIZE)
@@ -199,6 +200,7 @@ static inline struct buf_page *ltt_relay
 	return page;
 }
 
+#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
 static inline void ltt_relay_do_copy(void *dest, const void *src, size_t len)
 {
 	switch (len) {
@@ -222,6 +224,53 @@ static inline void ltt_relay_do_copy(voi
 		memcpy(dest, src, len);
 	}
 }
+#else
+/*
+ * Returns whether the dest and src addresses are aligned on
+ * min(sizeof(void *), len). Call this with statically known len for efficiency.
+ */
+static inline int addr_aligned(const void *dest, const void *src, size_t len)
+{
+	if (ltt_align((size_t)dest, len))
+		return 0;
+	if (ltt_align((size_t)src, len))
+		return 0;
+	return 1;
+}
+
+static inline void ltt_relay_do_copy(void *dest, const void *src, size_t len)
+{
+	switch (len) {
+	case 0:
+		break;
+	case 1:
+		*(u8 *)dest = *(const u8 *)src;
+		break;
+	case 2:
+		if (unlikely(!addr_aligned(dest, src, 2)))
+			goto memcpy_fallback;
+		*(u16 *)dest = *(const u16 *)src;
+		break;
+	case 4:
+		if (unlikely(!addr_aligned(dest, src, 4)))
+			goto memcpy_fallback;
+		*(u32 *)dest = *(const u32 *)src;
+		break;
+#if (BITS_PER_LONG == 64)
+	case 8:
+		if (unlikely(!addr_aligned(dest, src, 8)))
+			goto memcpy_fallback;
+		*(u64 *)dest = *(const u64 *)src;
+		break;
+#endif
+	default:
+		goto memcpy_fallback;
+	}
+	return;
+memcpy_fallback:
+	memcpy(dest, src, len);
+}
+#endif
 
 static inline int ltt_relay_write(struct rchan_buf *buf, size_t offset,
 	const void *src, size_t len)
Index: linux-2.6-lttng/include/linux/ltt-core.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/ltt-core.h	2009-03-05 15:22:42.000000000 -0500
+++ linux-2.6-lttng/include/linux/ltt-core.h	2009-03-05 15:23:58.000000000 -0500
@@ -44,4 +44,39 @@ extern ltt_run_filter_functor ltt_run_fi
 extern void ltt_filter_register(ltt_run_filter_functor func);
 extern void ltt_filter_unregister(void);
 
+#if defined(CONFIG_LTT) && defined(CONFIG_LTT_ALIGNMENT)
+
+/*
+ * Calculate the offset needed to align the type.
+ * size_of_type must be non-zero.
+ */
+static inline unsigned int ltt_align(size_t align_drift, size_t size_of_type)
+{
+	size_t alignment = min(sizeof(void *), size_of_type);
+	return (alignment - align_drift) & (alignment - 1);
+}
+/* Default arch alignment */
+#define LTT_ALIGN
+
+static inline int ltt_get_alignment(void)
+{
+	return sizeof(void *);
+}
+
+#else
+
+static inline unsigned int ltt_align(size_t align_drift,
+		 size_t size_of_type)
+{
+	return 0;
+}
+
+#define LTT_ALIGN __attribute__((packed))
+
+static inline int ltt_get_alignment(void)
+{
+	return 0;
+}
+#endif /* defined(CONFIG_LTT) && defined(CONFIG_LTT_ALIGNMENT) */
+
 #endif /* LTT_CORE_H */
Index: linux-2.6-lttng/include/linux/ltt-tracer.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/ltt-tracer.h	2009-03-05 15:23:56.000000000 -0500
+++ linux-2.6-lttng/include/linux/ltt-tracer.h	2009-03-05 15:23:58.000000000 -0500
@@ -138,41 +138,6 @@ static inline enum marker_id marker_id_t
 		return MARKER_ID_DYNAMIC;
 }
 
-#if defined(CONFIG_LTT) && defined(CONFIG_LTT_ALIGNMENT)
-
-/*
- * Calculate the offset needed to align the type.
- * size_of_type must be non-zero.
- */
-static inline unsigned int ltt_align(size_t align_drift, size_t size_of_type)
-{
-	size_t alignment = min(sizeof(void *), size_of_type);
-	return (alignment - align_drift) & (alignment - 1);
-}
-/* Default arch alignment */
-#define LTT_ALIGN
-
-static inline int ltt_get_alignment(void)
-{
-	return sizeof(void *);
-}
-
-#else
-
-static inline unsigned int ltt_align(size_t align_drift,
-		 size_t size_of_type)
-{
-	return 0;
-}
-
-#define LTT_ALIGN __attribute__((packed))
-
-static inline int ltt_get_alignment(void)
-{
-	return 0;
-}
-#endif /* CONFIG_LTT_ALIGNMENT */
-
 #ifdef CONFIG_LTT
 
 struct user_dbg_data {

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [RFC patch 10/41] lttng-optimize-write-to-page-function-remove-some-memcpy-calls
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (8 preceding siblings ...)
  2009-03-05 22:47 ` [RFC patch 09/41] LTTng optimize write to page function deal with unaligned access Mathieu Desnoyers
@ 2009-03-05 22:47 ` Mathieu Desnoyers
  2009-03-05 22:47 ` [RFC patch 11/41] ltt-relay: cache pages address Mathieu Desnoyers
                   ` (32 subsequent siblings)
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:47 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers, Zhaolei

[-- Attachment #1: lttng-optimize-write-to-page-function-remove-some-memcpy-calls.patch --]
[-- Type: text/plain, Size: 2721 bytes --]

Zhaolei :
> Hello, Mathieu
> 
> Why not use instructions generated by gcc instead of memcpy on arch without
> 64bit write as:
> case 4: *(u32 *)dest = *(const u32 *)src;
>   break;
> case 8: *(u64 *)dest = *(const u64 *)src;
>   break;
> 
> IMHO, even on arch without 64bit write, memcpy is more complex.

#include <inttypes.h>

char dest[100];
char src[100];

typedef uint64_t u64;
typedef uint32_t u32;

void gcc_u64(void)
{
        asm("/* begin */");
        *(u64 *)dest = *(const u64 *)src;
        asm("/* end */");
}


        movl    src, %eax
        movl    src+4, %edx
        movl    %eax, dest
        movl    %edx, dest+4


void twice_u32(void)
{
        asm("/* begin */");
        ((u32 *)dest)[0] = ((const u32 *)src)[0];
        ((u32 *)dest)[1] = ((const u32 *)src)[1];
        asm("/* end */");
}
        movl    src, %eax
        movl    %eax, dest
        movl    src+4, %eax
        movl    %eax, dest+4

gcc seems to do better register scheduling than my code, so I think
it's not so bad. I will take your proposal.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Zhaolei <zhaolei@cn.fujitsu.com>
---
 include/linux/ltt-relay.h |   21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

Index: linux-2.6-lttng/include/linux/ltt-relay.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/ltt-relay.h	2009-03-05 15:40:02.000000000 -0500
+++ linux-2.6-lttng/include/linux/ltt-relay.h	2009-03-05 15:40:42.000000000 -0500
@@ -215,13 +215,16 @@ static inline void ltt_relay_do_copy(voi
 	case 4:
 		*(u32 *)dest = *(const u32 *)src;
 		break;
-#if (BITS_PER_LONG == 64)
 	case 8:
 		*(u64 *)dest = *(const u64 *)src;
 		break;
-#endif
 	default:
-		memcpy(dest, src, len);
+		/*
+		 * What we really want here is an inline memcpy, but we don't
+		 * have constants, so gcc generally uses a function call.
+		 */
+		for (; len > 0; len--)
+			*(u8 *)dest++ = *(const u8 *)src++;
 	}
 }
 #else
@@ -256,19 +259,19 @@ static inline void ltt_relay_do_copy(voi
 			goto memcpy_fallback;
 		*(u32 *)dest = *(const u32 *)src;
 		break;
-#if (BITS_PER_LONG == 64)
 	case 8:
 		if (unlikely(!addr_aligned(dest, src, 8)))
 			goto memcpy_fallback;
 		*(u64 *)dest = *(const u64 *)src;
 		break;
-#endif
 	default:
-		goto memcpy_fallback;
+		/*
+		 * What we really want here is an inline memcpy, but we don't
+		 * have constants, so gcc generally uses a function call.
+		 */
+		for (; len > 0; len--)
+			*(u8 *)dest++ = *(const u8 *)src++;
 	}
-	return;
-memcpy_fallback:
-	memcpy(dest, src, len);
 }
 #endif
 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [RFC patch 11/41] ltt-relay: cache pages address
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (9 preceding siblings ...)
  2009-03-05 22:47 ` [RFC patch 10/41] lttng-optimize-write-to-page-function-remove-some-memcpy-calls Mathieu Desnoyers
@ 2009-03-05 22:47 ` Mathieu Desnoyers
  2009-03-05 22:47 ` [RFC patch 12/41] x86 : export vmalloc_sync_all() Mathieu Desnoyers
                   ` (31 subsequent siblings)
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:47 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Lai Jiangshan, Mathieu Desnoyers

[-- Attachment #1: lttng-optimize-write-to-page-function-cache-page-address.patch --]
[-- Type: text/plain, Size: 3197 bytes --]

page_address() is not fast on some systems, so we cache its result at
buffer allocation time.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 include/linux/ltt-relay.h |    4 ++--
 ltt/ltt-relay-alloc.c     |   10 +++++-----
 2 files changed, 7 insertions(+), 7 deletions(-)

Index: linux-2.6-lttng/include/linux/ltt-relay.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/ltt-relay.h	2009-03-05 16:07:47.000000000 -0500
+++ linux-2.6-lttng/include/linux/ltt-relay.h	2009-03-05 16:07:47.000000000 -0500
@@ -34,6 +34,7 @@ struct rchan_buf;
 
 struct buf_page {
 	struct page *page;
+	void *virt;		/* page address of the struct page */
 	size_t offset;		/* page offset in the buffer */
 	struct list_head list;	/* buffer linked list */
 };
@@ -286,8 +287,7 @@ static inline int ltt_relay_write(struct
 
 	page = ltt_relay_cache_page(buf, &buf->wpage, page, offset);
 	pagecpy = min_t(size_t, len, PAGE_SIZE - (offset & ~PAGE_MASK));
-	ltt_relay_do_copy(page_address(page->page)
-		+ (offset & ~PAGE_MASK), src, pagecpy);
+	ltt_relay_do_copy(page->virt + (offset & ~PAGE_MASK), src, pagecpy);
 
 	if (unlikely(len != pagecpy))
 		_ltt_relay_write(buf, offset, src, len, page, pagecpy);
Index: linux-2.6-lttng/ltt/ltt-relay-alloc.c
===================================================================
--- linux-2.6-lttng.orig/ltt/ltt-relay-alloc.c	2009-03-05 16:07:47.000000000 -0500
+++ linux-2.6-lttng/ltt/ltt-relay-alloc.c	2009-03-05 16:07:47.000000000 -0500
@@ -54,6 +54,7 @@ static int relay_alloc_buf(struct rchan_
 			goto depopulate;
 		}
 		list_add_tail(&buf_page->list, &buf->pages);
+		buf_page->virt = page_address(buf_page->page);
 		buf_page->offset = (size_t)i << PAGE_SHIFT;
 		set_page_private(buf_page->page, (unsigned long)buf_page);
 		if (i == 0) {
@@ -529,8 +530,8 @@ void _ltt_relay_write(struct rchan_buf *
 
 		page = ltt_relay_cache_page(buf, &buf->wpage, page, offset);
 		pagecpy = min_t(size_t, len, PAGE_SIZE - (offset & ~PAGE_MASK));
-		ltt_relay_do_copy(page_address(page->page)
-			+ (offset & ~PAGE_MASK), src, pagecpy);
+		ltt_relay_do_copy(page->virt
+				+ (offset & ~PAGE_MASK), src, pagecpy);
 	} while (unlikely(len != pagecpy));
 }
 EXPORT_SYMBOL_GPL(_ltt_relay_write);
@@ -556,8 +557,7 @@ int ltt_relay_read(struct rchan_buf *buf
 	for (;;) {
 		page = ltt_relay_cache_page(buf, &buf->rpage, page, offset);
 		pagecpy = min_t(size_t, len, PAGE_SIZE - (offset & ~PAGE_MASK));
-		memcpy(dest, page_address(page->page) + (offset & ~PAGE_MASK),
-			pagecpy);
+		memcpy(dest, page->virt + (offset & ~PAGE_MASK), pagecpy);
 		len -= pagecpy;
 		if (likely(!len))
 			break;
@@ -610,7 +610,7 @@ void *ltt_relay_offset_address(struct rc
 	if (offset < page->offset || offset >= page->offset + PAGE_SIZE)
 		buf->hpage[odd] = page = buf->wpage;
 	page = ltt_relay_cache_page(buf, &buf->hpage[odd], page, offset);
-	return page_address(page->page) + (offset & ~PAGE_MASK);
+	return page->virt + (offset & ~PAGE_MASK);
 }
 EXPORT_SYMBOL_GPL(ltt_relay_offset_address);
 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [RFC patch 12/41] x86 : export vmalloc_sync_all()
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (10 preceding siblings ...)
  2009-03-05 22:47 ` [RFC patch 11/41] ltt-relay: cache pages address Mathieu Desnoyers
@ 2009-03-05 22:47 ` Mathieu Desnoyers
  2009-03-05 22:47 ` [RFC patch 13/41] LTTng - tracer code Mathieu Desnoyers
                   ` (30 subsequent siblings)
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:47 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers

[-- Attachment #1: export-vmalloc-sync-all-symbol.patch --]
[-- Type: text/plain, Size: 761 bytes --]

Needed by the tracer module, which uses it to make sure the tracer will not
trigger page faults. This is especially useful for full page fault handler
instrumentation.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 arch/x86/mm/fault.c |    1 +
 1 file changed, 1 insertion(+)

Index: linux-2.6-lttng/arch/x86/mm/fault.c
===================================================================
--- linux-2.6-lttng.orig/arch/x86/mm/fault.c	2009-03-04 13:24:31.000000000 -0500
+++ linux-2.6-lttng/arch/x86/mm/fault.c	2009-03-04 14:11:47.000000000 -0500
@@ -935,3 +935,4 @@ void vmalloc_sync_all(void)
 	}
 #endif
 }
+EXPORT_SYMBOL_GPL(vmalloc_sync_all);

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [RFC patch 13/41] LTTng - tracer code
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (11 preceding siblings ...)
  2009-03-05 22:47 ` [RFC patch 12/41] x86 : export vmalloc_sync_all() Mathieu Desnoyers
@ 2009-03-05 22:47 ` Mathieu Desnoyers
  2009-03-05 22:47 ` [RFC patch 14/41] Splice and pipe : export pipe buf operations for GPL modules Mathieu Desnoyers
                   ` (29 subsequent siblings)
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:47 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers, Zhao Lei, Daniel Walker

[-- Attachment #1: lttng-tracer.patch --]
[-- Type: text/plain, Size: 32122 bytes --]

Trace management code (API implementation). Home of the trace operation
synchronization.

Changelog :

> In ltt_exit() the ltt_traces_sem is taken and list_for_each_entry_rcu is
> used.
> 
> (There is another problem with the code I mention above. It's also
> appear to modify the list inside _ltt_trace_destroy, but your not doing
> a "safe" list traversal. So it should be list_for_each_entry_safe_rcu)
> 

This point is good, will fix.

Thanks to Daniel Walker (dwalker@mvista.com) for pointing this out.

Credits to Zhao Lei <zhaolei@cn.fujitsu.com> for a lot of fixes and extensions
which have been folded into this patch.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Zhao Lei <zhaolei@cn.fujitsu.com>
CC: Daniel Walker <dwalker@mvista.com>
---
 ltt/ltt-tracer.c | 1210 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 1210 insertions(+)

Index: linux-2.6-lttng/ltt/ltt-tracer.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/ltt/ltt-tracer.c	2009-03-04 14:13:18.000000000 -0500
@@ -0,0 +1,1210 @@
+/*
+ * ltt/ltt-tracer.c
+ *
+ * (C) Copyright	2005-2008 -
+ * 		Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca)
+ *
+ * Tracing management internal kernel API. Trace buffer allocation/free, tracing
+ * start/stop.
+ *
+ * Author:
+ *	Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca)
+ *
+ * Inspired from LTT :
+ *  Karim Yaghmour (karim@opersys.com)
+ *  Tom Zanussi (zanussi@us.ibm.com)
+ *  Bob Wisniewski (bob@watson.ibm.com)
+ * And from K42 :
+ *  Bob Wisniewski (bob@watson.ibm.com)
+ *
+ * Changelog:
+ *  22/09/06, Move to the marker/probes mechanism.
+ *  19/10/05, Complete lockless mechanism.
+ *  27/05/05, Modular redesign and rewrite.
+ */
+
+#include <linux/time.h>
+#include <linux/ltt-tracer.h>
+#include <linux/module.h>
+#include <linux/string.h>
+#include <linux/slab.h>
+#include <linux/init.h>
+#include <linux/rcupdate.h>
+#include <linux/sched.h>
+#include <linux/bitops.h>
+#include <linux/fs.h>
+#include <linux/cpu.h>
+#include <linux/kref.h>
+#include <linux/delay.h>
+#include <linux/vmalloc.h>
+#include <asm/atomic.h>
+
+static void async_wakeup(unsigned long data);
+
+static DEFINE_TIMER(ltt_async_wakeup_timer, async_wakeup, 0, 0);
+
+/* Default callbacks for modules */
+notrace int ltt_filter_control_default(enum ltt_filter_control_msg msg,
+		struct ltt_trace_struct *trace)
+{
+	return 0;
+}
+
+int ltt_statedump_default(struct ltt_trace_struct *trace)
+{
+	return 0;
+}
+
+/* Callbacks for registered modules */
+
+int (*ltt_filter_control_functor)
+	(enum ltt_filter_control_msg msg, struct ltt_trace_struct *trace) =
+					ltt_filter_control_default;
+struct module *ltt_filter_control_owner;
+
+/* These function pointers are protected by a trace activation check */
+struct module *ltt_run_filter_owner;
+int (*ltt_statedump_functor)(struct ltt_trace_struct *trace) =
+					ltt_statedump_default;
+struct module *ltt_statedump_owner;
+
+struct chan_info_struct {
+	const char *name;
+	unsigned int def_subbufsize;
+	unsigned int def_subbufcount;
+} chan_infos[] = {
+	[LTT_CHANNEL_METADATA] = {
+		LTT_METADATA_CHANNEL,
+		LTT_DEFAULT_SUBBUF_SIZE_LOW,
+		LTT_DEFAULT_N_SUBBUFS_LOW,
+	},
+	[LTT_CHANNEL_FD_STATE] = {
+		LTT_FD_STATE_CHANNEL,
+		LTT_DEFAULT_SUBBUF_SIZE_LOW,
+		LTT_DEFAULT_N_SUBBUFS_LOW,
+	},
+	[LTT_CHANNEL_GLOBAL_STATE] = {
+		LTT_GLOBAL_STATE_CHANNEL,
+		LTT_DEFAULT_SUBBUF_SIZE_LOW,
+		LTT_DEFAULT_N_SUBBUFS_LOW,
+	},
+	[LTT_CHANNEL_IRQ_STATE] = {
+		LTT_IRQ_STATE_CHANNEL,
+		LTT_DEFAULT_SUBBUF_SIZE_LOW,
+		LTT_DEFAULT_N_SUBBUFS_LOW,
+	},
+	[LTT_CHANNEL_MODULE_STATE] = {
+		LTT_MODULE_STATE_CHANNEL,
+		LTT_DEFAULT_SUBBUF_SIZE_LOW,
+		LTT_DEFAULT_N_SUBBUFS_LOW,
+	},
+	[LTT_CHANNEL_NETIF_STATE] = {
+		LTT_NETIF_STATE_CHANNEL,
+		LTT_DEFAULT_SUBBUF_SIZE_LOW,
+		LTT_DEFAULT_N_SUBBUFS_LOW,
+	},
+	[LTT_CHANNEL_SOFTIRQ_STATE] = {
+		LTT_SOFTIRQ_STATE_CHANNEL,
+		LTT_DEFAULT_SUBBUF_SIZE_LOW,
+		LTT_DEFAULT_N_SUBBUFS_LOW,
+	},
+	[LTT_CHANNEL_SWAP_STATE] = {
+		LTT_SWAP_STATE_CHANNEL,
+		LTT_DEFAULT_SUBBUF_SIZE_LOW,
+		LTT_DEFAULT_N_SUBBUFS_LOW,
+	},
+	[LTT_CHANNEL_SYSCALL_STATE] = {
+		LTT_SYSCALL_STATE_CHANNEL,
+		LTT_DEFAULT_SUBBUF_SIZE_LOW,
+		LTT_DEFAULT_N_SUBBUFS_LOW,
+	},
+	[LTT_CHANNEL_TASK_STATE] = {
+		LTT_TASK_STATE_CHANNEL,
+		LTT_DEFAULT_SUBBUF_SIZE_LOW,
+		LTT_DEFAULT_N_SUBBUFS_LOW,
+	},
+	[LTT_CHANNEL_VM_STATE] = {
+		LTT_VM_STATE_CHANNEL,
+		LTT_DEFAULT_SUBBUF_SIZE_LOW,
+		LTT_DEFAULT_N_SUBBUFS_LOW,
+	},
+	[LTT_CHANNEL_FS] = {
+		LTT_FS_CHANNEL,
+		LTT_DEFAULT_SUBBUF_SIZE_MED,
+		LTT_DEFAULT_N_SUBBUFS_MED,
+	},
+	[LTT_CHANNEL_INPUT] = {
+		LTT_INPUT_CHANNEL,
+		LTT_DEFAULT_SUBBUF_SIZE_LOW,
+		LTT_DEFAULT_N_SUBBUFS_LOW,
+	},
+	[LTT_CHANNEL_IPC] = {
+		LTT_IPC_CHANNEL,
+		LTT_DEFAULT_SUBBUF_SIZE_LOW,
+		LTT_DEFAULT_N_SUBBUFS_LOW,
+	},
+	[LTT_CHANNEL_KERNEL] = {
+		LTT_KERNEL_CHANNEL,
+		LTT_DEFAULT_SUBBUF_SIZE_HIGH,
+		LTT_DEFAULT_N_SUBBUFS_HIGH,
+	},
+	[LTT_CHANNEL_MM] = {
+		LTT_MM_CHANNEL,
+		LTT_DEFAULT_SUBBUF_SIZE_MED,
+		LTT_DEFAULT_N_SUBBUFS_MED,
+	},
+	[LTT_CHANNEL_RCU] = {
+		LTT_RCU_CHANNEL,
+		LTT_DEFAULT_SUBBUF_SIZE_MED,
+		LTT_DEFAULT_N_SUBBUFS_MED,
+	},
+	[LTT_CHANNEL_DEFAULT] = {
+		NULL,
+		LTT_DEFAULT_SUBBUF_SIZE_MED,
+		LTT_DEFAULT_N_SUBBUFS_MED,
+	},
+};
+
+static enum ltt_channels get_channel_type_from_name(const char *name)
+{
+	int i;
+
+	if (!name)
+		return LTT_CHANNEL_DEFAULT;
+
+	for (i = 0; i < ARRAY_SIZE(chan_infos); i++)
+		if (chan_infos[i].name && !strcmp(name, chan_infos[i].name))
+			return (enum ltt_channels)i;
+
+	return LTT_CHANNEL_DEFAULT;
+}
+
+/**
+ * ltt_module_register - LTT module registration
+ * @name: module type
+ * @function: callback to register
+ * @owner: module which owns the callback
+ *
+ * The module calling this registration function must ensure that no
+ * trap-inducing code will be executed by "function". E.g. vmalloc_sync_all()
+ * must be called between a vmalloc and the moment the memory is made visible to
+ * "function". This registration acts as a vmalloc_sync_all. Therefore, only if
+ * the module allocates virtual memory after its registration must it
+ * synchronize the TLBs.
+ */
+int ltt_module_register(enum ltt_module_function name, void *function,
+		struct module *owner)
+{
+	int ret = 0;
+
+	/*
+	 * Make sure no page fault can be triggered by the module about to be
+	 * registered. We deal with this here so we don't have to call
+	 * vmalloc_sync_all() in each module's init.
+	 */
+	vmalloc_sync_all();
+
+	switch (name) {
+	case LTT_FUNCTION_RUN_FILTER:
+		if (ltt_run_filter_owner != NULL) {
+			ret = -EEXIST;
+			goto end;
+		}
+		ltt_filter_register((ltt_run_filter_functor)function);
+		ltt_run_filter_owner = owner;
+		break;
+	case LTT_FUNCTION_FILTER_CONTROL:
+		if (ltt_filter_control_owner != NULL) {
+			ret = -EEXIST;
+			goto end;
+		}
+		ltt_filter_control_functor =
+			(int (*)(enum ltt_filter_control_msg,
+			struct ltt_trace_struct *))function;
+		ltt_filter_control_owner = owner;
+		break;
+	case LTT_FUNCTION_STATEDUMP:
+		if (ltt_statedump_owner != NULL) {
+			ret = -EEXIST;
+			goto end;
+		}
+		ltt_statedump_functor =
+			(int (*)(struct ltt_trace_struct *))function;
+		ltt_statedump_owner = owner;
+		break;
+	}
+
+end:
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ltt_module_register);
+
+/**
+ * ltt_module_unregister - LTT module unregistration
+ * @name: module type
+ */
+void ltt_module_unregister(enum ltt_module_function name)
+{
+	switch (name) {
+	case LTT_FUNCTION_RUN_FILTER:
+		ltt_filter_unregister();
+		ltt_run_filter_owner = NULL;
+		/* Wait for preempt sections to finish */
+		synchronize_sched();
+		break;
+	case LTT_FUNCTION_FILTER_CONTROL:
+		ltt_filter_control_functor = ltt_filter_control_default;
+		ltt_filter_control_owner = NULL;
+		break;
+	case LTT_FUNCTION_STATEDUMP:
+		ltt_statedump_functor = ltt_statedump_default;
+		ltt_statedump_owner = NULL;
+		break;
+	}
+}
+EXPORT_SYMBOL_GPL(ltt_module_unregister);
+
+static LIST_HEAD(ltt_transport_list);
+
+/**
+ * ltt_transport_register - LTT transport registration
+ * @transport: transport structure
+ *
+ * Registers a transport which can be used as output to extract the data out of
+ * LTTng. The module calling this registration function must ensure that no
+ * trap-inducing code will be executed by the transport functions. E.g.
+ * vmalloc_sync_all() must be called between a vmalloc and the moment the memory
+ * is made visible to the transport function. This registration acts as a
+ * vmalloc_sync_all. Therefore, only if the module allocates virtual memory
+ * after its registration must it synchronize the TLBs.
+ */
+void ltt_transport_register(struct ltt_transport *transport)
+{
+	/*
+	 * Make sure no page fault can be triggered by the module about to be
+	 * registered. We deal with this here so we don't have to call
+	 * vmalloc_sync_all() in each module's init.
+	 */
+	vmalloc_sync_all();
+
+	ltt_lock_traces();
+	list_add_tail(&transport->node, &ltt_transport_list);
+	ltt_unlock_traces();
+}
+EXPORT_SYMBOL_GPL(ltt_transport_register);
+
+/**
+ * ltt_transport_unregister - LTT transport unregistration
+ * @transport: transport structure
+ */
+void ltt_transport_unregister(struct ltt_transport *transport)
+{
+	ltt_lock_traces();
+	list_del(&transport->node);
+	ltt_unlock_traces();
+}
+EXPORT_SYMBOL_GPL(ltt_transport_unregister);
+
+static inline int is_channel_overwrite(enum ltt_channels chan,
+	enum trace_mode mode)
+{
+	switch (mode) {
+	case LTT_TRACE_NORMAL:
+		return 0;
+	case LTT_TRACE_FLIGHT:
+		switch (chan) {
+		case LTT_CHANNEL_METADATA:
+			return 0;
+		default:
+			return 1;
+		}
+	case LTT_TRACE_HYBRID:
+		switch (chan) {
+		case LTT_CHANNEL_KERNEL:
+		case LTT_CHANNEL_FS:
+		case LTT_CHANNEL_MM:
+		case LTT_CHANNEL_RCU:
+		case LTT_CHANNEL_IPC:
+		case LTT_CHANNEL_INPUT:
+			return 1;
+		default:
+			return 0;
+		}
+	default:
+		return 0;
+	}
+}
+
+/**
+ * ltt_write_trace_header - Write trace header
+ * @trace: Trace information
+ * @header: Memory address where the information must be written to
+ */
+void notrace ltt_write_trace_header(struct ltt_trace_struct *trace,
+		struct ltt_subbuffer_header *header)
+{
+	header->magic_number = LTT_TRACER_MAGIC_NUMBER;
+	header->major_version = LTT_TRACER_VERSION_MAJOR;
+	header->minor_version = LTT_TRACER_VERSION_MINOR;
+	header->arch_size = sizeof(void *);
+	header->alignment = ltt_get_alignment();
+	header->start_time_sec = trace->start_time.tv_sec;
+	header->start_time_usec = trace->start_time.tv_usec;
+	header->start_freq = trace->start_freq;
+	header->freq_scale = trace->freq_scale;
+}
+EXPORT_SYMBOL_GPL(ltt_write_trace_header);
+
+static void trace_async_wakeup(struct ltt_trace_struct *trace)
+{
+	int i;
+	struct ltt_channel_struct *chan;
+
+	/* Must check each channel for pending read wakeup */
+	for (i = 0; i < trace->nr_channels; i++) {
+		chan = &trace->channels[i];
+		if (chan->active)
+			trace->ops->wakeup_channel(chan);
+	}
+}
+
+/* Timer to send async wakeups to the readers */
+static void async_wakeup(unsigned long data)
+{
+	struct ltt_trace_struct *trace;
+	rcu_read_lock_sched();
+	list_for_each_entry_rcu(trace, &ltt_traces.head, list) {
+		trace_async_wakeup(trace);
+	}
+	rcu_read_unlock_sched();
+
+	mod_timer(&ltt_async_wakeup_timer, jiffies + LTT_PERCPU_TIMER_INTERVAL);
+}
+
+/**
+ * _ltt_trace_find - find a trace by given name.
+ * @trace_name: trace name
+ *
+ * Returns a pointer to the trace structure, NULL if not found.
+ */
+static struct ltt_trace_struct *_ltt_trace_find(const char *trace_name)
+{
+	struct ltt_trace_struct *trace;
+
+	list_for_each_entry(trace, &ltt_traces.head, list)
+		if (!strncmp(trace->trace_name, trace_name, NAME_MAX))
+			return trace;
+
+	return NULL;
+}
+
+/**
+ * _ltt_trace_find_setup - find a trace in the setup list by given name.
+ * @trace_name: trace name
+ *
+ * Returns a pointer to the trace structure, NULL if not found.
+ */
+struct ltt_trace_struct *_ltt_trace_find_setup(const char *trace_name)
+{
+	struct ltt_trace_struct *trace;
+
+	list_for_each_entry(trace, &ltt_traces.setup_head, list)
+		if (!strncmp(trace->trace_name, trace_name, NAME_MAX))
+			return trace;
+
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(_ltt_trace_find_setup);
+
+/**
+ * ltt_release_transport - Release an LTT transport
+ * @kref : reference count on the transport
+ */
+void ltt_release_transport(struct kref *kref)
+{
+	struct ltt_trace_struct *trace = container_of(kref,
+			struct ltt_trace_struct, ltt_transport_kref);
+	trace->ops->remove_dirs(trace);
+}
+EXPORT_SYMBOL_GPL(ltt_release_transport);
+
+/**
+ * ltt_release_trace - Release a LTT trace
+ * @kref : reference count on the trace
+ */
+void ltt_release_trace(struct kref *kref)
+{
+	struct ltt_trace_struct *trace = container_of(kref,
+			struct ltt_trace_struct, kref);
+	ltt_channels_trace_free(trace->channels);
+	kfree(trace);
+}
+EXPORT_SYMBOL_GPL(ltt_release_trace);
+
+static inline void prepare_chan_size_num(unsigned int *subbuf_size,
+					 unsigned int *n_subbufs)
+{
+	*subbuf_size = 1 << get_count_order(*subbuf_size);
+	*n_subbufs = 1 << get_count_order(*n_subbufs);
+
+	/* Subbuf size and number must both be power of two */
+	WARN_ON(hweight32(*subbuf_size) != 1);
+	WARN_ON(hweight32(*n_subbufs) != 1);
+}
+
+int _ltt_trace_setup(const char *trace_name)
+{
+	int err = 0;
+	struct ltt_trace_struct *new_trace = NULL;
+	int metadata_index;
+	unsigned int chan;
+	enum ltt_channels chantype;
+
+	if (_ltt_trace_find_setup(trace_name)) {
+		printk(KERN_ERR	"LTT : Trace name %s already used.\n",
+				trace_name);
+		err = -EEXIST;
+		goto traces_error;
+	}
+
+	if (_ltt_trace_find(trace_name)) {
+		printk(KERN_ERR	"LTT : Trace name %s already used.\n",
+				trace_name);
+		err = -EEXIST;
+		goto traces_error;
+	}
+
+	new_trace = kzalloc(sizeof(struct ltt_trace_struct), GFP_KERNEL);
+	if (!new_trace) {
+		printk(KERN_ERR
+			"LTT : Unable to allocate memory for trace %s\n",
+			trace_name);
+		err = -ENOMEM;
+		goto traces_error;
+	}
+	strncpy(new_trace->trace_name, trace_name, NAME_MAX);
+	new_trace->channels = ltt_channels_trace_alloc(&new_trace->nr_channels,
+						       0, 1);
+	if (!new_trace->channels) {
+		printk(KERN_ERR
+			"LTT : Unable to allocate memory for chaninfo %s\n",
+			trace_name);
+		err = -ENOMEM;
+		goto trace_free;
+	}
+
+	/*
+	 * Force metadata channel to active, no overwrite.
+	 */
+	metadata_index = ltt_channels_get_index_from_name("metadata");
+	WARN_ON(metadata_index < 0);
+	new_trace->channels[metadata_index].overwrite = 0;
+	new_trace->channels[metadata_index].active = 1;
+
+	/*
+	 * Set hardcoded tracer defaults for some channels
+	 */
+	for (chan = 0; chan < new_trace->nr_channels; chan++) {
+		if (!(new_trace->channels[chan].active))
+			continue;
+
+		chantype = get_channel_type_from_name(
+			ltt_channels_get_name_from_index(chan));
+		new_trace->channels[chan].subbuf_size =
+			chan_infos[chantype].def_subbufsize;
+		new_trace->channels[chan].subbuf_cnt =
+			chan_infos[chantype].def_subbufcount;
+	}
+
+	list_add(&new_trace->list, &ltt_traces.setup_head);
+	return 0;
+
+trace_free:
+	kfree(new_trace);
+traces_error:
+	return err;
+}
+EXPORT_SYMBOL_GPL(_ltt_trace_setup);
+
+
+int ltt_trace_setup(const char *trace_name)
+{
+	int ret;
+	ltt_lock_traces();
+	ret = _ltt_trace_setup(trace_name);
+	ltt_unlock_traces();
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ltt_trace_setup);
+
+/* must be called from within a traces lock. */
+static void _ltt_trace_free(struct ltt_trace_struct *trace)
+{
+	list_del(&trace->list);
+	kfree(trace);
+}
+
+int ltt_trace_set_type(const char *trace_name, const char *trace_type)
+{
+	int err = 0;
+	struct ltt_trace_struct *trace;
+	struct ltt_transport *tran_iter, *transport = NULL;
+
+	ltt_lock_traces();
+
+	trace = _ltt_trace_find_setup(trace_name);
+	if (!trace) {
+		printk(KERN_ERR "LTT : Trace not found %s\n", trace_name);
+		err = -ENOENT;
+		goto traces_error;
+	}
+
+	list_for_each_entry(tran_iter, &ltt_transport_list, node) {
+		if (!strcmp(tran_iter->name, trace_type)) {
+			transport = tran_iter;
+			break;
+		}
+	}
+	if (!transport) {
+		printk(KERN_ERR	"LTT : Transport %s is not present.\n",
+			trace_type);
+		err = -EINVAL;
+		goto traces_error;
+	}
+
+	trace->transport = transport;
+
+traces_error:
+	ltt_unlock_traces();
+	return err;
+}
+EXPORT_SYMBOL_GPL(ltt_trace_set_type);
+
+int ltt_trace_set_channel_subbufsize(const char *trace_name,
+		const char *channel_name, unsigned int size)
+{
+	int err = 0;
+	struct ltt_trace_struct *trace;
+	int index;
+
+	ltt_lock_traces();
+
+	trace = _ltt_trace_find_setup(trace_name);
+	if (!trace) {
+		printk(KERN_ERR "LTT : Trace not found %s\n", trace_name);
+		err = -ENOENT;
+		goto traces_error;
+	}
+
+	index = ltt_channels_get_index_from_name(channel_name);
+	if (index < 0) {
+		printk(KERN_ERR "LTT : Channel %s not found\n", channel_name);
+		err = -ENOENT;
+		goto traces_error;
+	}
+	trace->channels[index].subbuf_size = size;
+
+traces_error:
+	ltt_unlock_traces();
+	return err;
+}
+EXPORT_SYMBOL_GPL(ltt_trace_set_channel_subbufsize);
+
+int ltt_trace_set_channel_subbufcount(const char *trace_name,
+		const char *channel_name, unsigned int cnt)
+{
+	int err = 0;
+	struct ltt_trace_struct *trace;
+	int index;
+
+	ltt_lock_traces();
+
+	trace = _ltt_trace_find_setup(trace_name);
+	if (!trace) {
+		printk(KERN_ERR "LTT : Trace not found %s\n", trace_name);
+		err = -ENOENT;
+		goto traces_error;
+	}
+
+	index = ltt_channels_get_index_from_name(channel_name);
+	if (index < 0) {
+		printk(KERN_ERR "LTT : Channel %s not found\n", channel_name);
+		err = -ENOENT;
+		goto traces_error;
+	}
+	trace->channels[index].subbuf_cnt = cnt;
+
+traces_error:
+	ltt_unlock_traces();
+	return err;
+}
+EXPORT_SYMBOL_GPL(ltt_trace_set_channel_subbufcount);
+
+int ltt_trace_set_channel_enable(const char *trace_name,
+		const char *channel_name, unsigned int enable)
+{
+	int err = 0;
+	struct ltt_trace_struct *trace;
+	int index;
+
+	ltt_lock_traces();
+
+	trace = _ltt_trace_find_setup(trace_name);
+	if (!trace) {
+		printk(KERN_ERR "LTT : Trace not found %s\n", trace_name);
+		err = -ENOENT;
+		goto traces_error;
+	}
+
+	/*
+	 * The data in the metadata channel (marker info) is necessary to be
+	 * able to read the trace, so we always keep this channel enabled.
+	 */
+	if (!enable && !strcmp(channel_name, "metadata")) {
+		printk(KERN_ERR "LTT : Trying to disable metadata channel\n");
+		err = -EINVAL;
+		goto traces_error;
+	}
+
+	index = ltt_channels_get_index_from_name(channel_name);
+	if (index < 0) {
+		printk(KERN_ERR "LTT : Channel %s not found\n", channel_name);
+		err = -ENOENT;
+		goto traces_error;
+	}
+
+	trace->channels[index].active = enable;
+
+traces_error:
+	ltt_unlock_traces();
+	return err;
+}
+EXPORT_SYMBOL_GPL(ltt_trace_set_channel_enable);
+
+int ltt_trace_set_channel_overwrite(const char *trace_name,
+		const char *channel_name, unsigned int overwrite)
+{
+	int err = 0;
+	struct ltt_trace_struct *trace;
+	int index;
+
+	ltt_lock_traces();
+
+	trace = _ltt_trace_find_setup(trace_name);
+	if (!trace) {
+		printk(KERN_ERR "LTT : Trace not found %s\n", trace_name);
+		err = -ENOENT;
+		goto traces_error;
+	}
+
+	/*
+	 * Always put the metadata channel in non-overwrite mode : this is a
+	 * very low-traffic channel and it cannot afford to have its data
+	 * overwritten, since this data (marker info) is necessary to be able
+	 * to read the trace.
+	 */
+	if (overwrite && !strcmp(channel_name, "metadata")) {
+		printk(KERN_ERR "LTT : Trying to set metadata channel to "
+				"overwrite mode\n");
+		err = -EINVAL;
+		goto traces_error;
+	}
+
+	index = ltt_channels_get_index_from_name(channel_name);
+	if (index < 0) {
+		printk(KERN_ERR "LTT : Channel %s not found\n", channel_name);
+		err = -ENOENT;
+		goto traces_error;
+	}
+
+	trace->channels[index].overwrite = overwrite;
+
+traces_error:
+	ltt_unlock_traces();
+	return err;
+}
+EXPORT_SYMBOL_GPL(ltt_trace_set_channel_overwrite);
+
+int ltt_trace_alloc(const char *trace_name)
+{
+	int err = 0;
+	struct ltt_trace_struct *trace;
+	unsigned int subbuf_size, subbuf_cnt;
+	unsigned long flags;
+	int chan;
+	const char *channel_name;
+
+	ltt_lock_traces();
+
+	trace = _ltt_trace_find_setup(trace_name);
+	if (!trace) {
+		printk(KERN_ERR "LTT : Trace not found %s\n", trace_name);
+		err = -ENOENT;
+		goto traces_error;
+	}
+
+	kref_init(&trace->kref);
+	kref_init(&trace->ltt_transport_kref);
+	init_waitqueue_head(&trace->kref_wq);
+	trace->active = 0;
+	get_trace_clock();
+	trace->freq_scale = trace_clock_freq_scale();
+
+	if (!trace->transport) {
+		printk(KERN_ERR "LTT : Transport is not set.\n");
+		err = -EINVAL;
+		goto transport_error;
+	}
+	if (!try_module_get(trace->transport->owner)) {
+		printk(KERN_ERR	"LTT : Can't lock transport module.\n");
+		err = -ENODEV;
+		goto transport_error;
+	}
+	trace->ops = &trace->transport->ops;
+
+	err = trace->ops->create_dirs(trace);
+	if (err) {
+		printk(KERN_ERR	"LTT : Can't create dir for trace %s.\n",
+			trace_name);
+		goto dirs_error;
+	}
+
+	local_irq_save(flags);
+	trace->start_freq = trace_clock_frequency();
+	trace->start_tsc = trace_clock_read64();
+	do_gettimeofday(&trace->start_time);
+	local_irq_restore(flags);
+
+	for (chan = 0; chan < trace->nr_channels; chan++) {
+		if (!(trace->channels[chan].active))
+			continue;
+
+		channel_name = ltt_channels_get_name_from_index(chan);
+		WARN_ON(!channel_name);
+		subbuf_size = trace->channels[chan].subbuf_size;
+		subbuf_cnt = trace->channels[chan].subbuf_cnt;
+		prepare_chan_size_num(&subbuf_size, &subbuf_cnt);
+		err = trace->ops->create_channel(trace_name, trace,
+				trace->dentry.trace_root,
+				channel_name,
+				&trace->channels[chan],
+				subbuf_size,
+				subbuf_cnt,
+				trace->channels[chan].overwrite);
+		if (err != 0) {
+			printk(KERN_ERR	"LTT : Can't create channel %s.\n",
+				channel_name);
+			goto create_channel_error;
+		}
+	}
+
+	list_del(&trace->list);
+	if (list_empty(&ltt_traces.head)) {
+		mod_timer(&ltt_async_wakeup_timer,
+				jiffies + LTT_PERCPU_TIMER_INTERVAL);
+	}
+	list_add_rcu(&trace->list, &ltt_traces.head);
+	synchronize_sched();
+
+	ltt_unlock_traces();
+
+	return 0;
+
+create_channel_error:
+	for (chan--; chan >= 0; chan--)
+		if (trace->channels[chan].active)
+			trace->ops->remove_channel(&trace->channels[chan]);
+
+dirs_error:
+	module_put(trace->transport->owner);
+transport_error:
+	put_trace_clock();
+traces_error:
+	ltt_unlock_traces();
+	return err;
+}
+EXPORT_SYMBOL_GPL(ltt_trace_alloc);
+
+/*
+ * This function acts as a wrapper for the current version of ltt_control.ko.
+ * A new ltt_control based on debugfs, controlling each channel's buffer
+ * individually, is planned.
+ */
+static int ltt_trace_create(const char *trace_name, const char *trace_type,
+		enum trace_mode mode,
+		unsigned int subbuf_size_low, unsigned int n_subbufs_low,
+		unsigned int subbuf_size_med, unsigned int n_subbufs_med,
+		unsigned int subbuf_size_high, unsigned int n_subbufs_high)
+{
+	int err = 0;
+
+	err = ltt_trace_setup(trace_name);
+	if (IS_ERR_VALUE(err))
+		return err;
+
+	err = ltt_trace_set_type(trace_name, trace_type);
+	if (IS_ERR_VALUE(err))
+		return err;
+
+	err = ltt_trace_alloc(trace_name);
+	if (IS_ERR_VALUE(err))
+		return err;
+
+	return err;
+}
+
+/* Must be called while sure that trace is in the list. */
+static int _ltt_trace_destroy(struct ltt_trace_struct	*trace)
+{
+	int err = -EPERM;
+
+	if (trace == NULL) {
+		err = -ENOENT;
+		goto traces_error;
+	}
+	if (trace->active) {
+		printk(KERN_ERR
+			"LTT : Can't destroy trace %s : tracer is active\n",
+			trace->trace_name);
+		err = -EBUSY;
+		goto active_error;
+	}
+	/* Everything went fine */
+	list_del_rcu(&trace->list);
+	synchronize_sched();
+	if (list_empty(&ltt_traces.head)) {
+		/*
+		 * We stop the asynchronous delivery of reader wakeup, but
+		 * we must make one last check for reader wakeups pending
+		 * later in __ltt_trace_destroy.
+		 */
+		del_timer_sync(&ltt_async_wakeup_timer);
+	}
+	return 0;
+
+	/* error handling */
+active_error:
+traces_error:
+	return err;
+}
+
+/* Sleepable part of the destroy */
+static void __ltt_trace_destroy(struct ltt_trace_struct	*trace)
+{
+	int i;
+	struct ltt_channel_struct *chan;
+
+	for (i = 0; i < trace->nr_channels; i++) {
+		chan = &trace->channels[i];
+		if (chan->active)
+			trace->ops->finish_channel(chan);
+	}
+
+	flush_scheduled_work();
+
+	/*
+	 * The currently destroyed trace is not in the trace list anymore,
+	 * so it's safe to call the async wakeup ourself. It will deliver
+	 * the last subbuffers.
+	 */
+	trace_async_wakeup(trace);
+
+	for (i = 0; i < trace->nr_channels; i++) {
+		chan = &trace->channels[i];
+		if (chan->active)
+			trace->ops->remove_channel(chan);
+	}
+
+	kref_put(&trace->ltt_transport_kref, ltt_release_transport);
+
+	module_put(trace->transport->owner);
+
+	/*
+	 * Wait for lttd readers to release the files, therefore making sure
+	 * the last subbuffers have been read.
+	 */
+	if (atomic_read(&trace->kref.refcount) > 1) {
+		int ret = 0;
+		__wait_event_interruptible(trace->kref_wq,
+			(atomic_read(&trace->kref.refcount) == 1), ret);
+	}
+	kref_put(&trace->kref, ltt_release_trace);
+}
+
+int ltt_trace_destroy(const char *trace_name)
+{
+	int err = 0;
+	struct ltt_trace_struct *trace;
+
+	ltt_lock_traces();
+
+	trace = _ltt_trace_find(trace_name);
+	if (trace) {
+		err = _ltt_trace_destroy(trace);
+		if (err)
+			goto error;
+
+		ltt_unlock_traces();
+
+		__ltt_trace_destroy(trace);
+		put_trace_clock();
+
+		return 0;
+	}
+
+	trace = _ltt_trace_find_setup(trace_name);
+	if (trace) {
+		_ltt_trace_free(trace);
+		ltt_unlock_traces();
+		return 0;
+	}
+
+	err = -ENOENT;
+
+	/* Error handling */
+error:
+	ltt_unlock_traces();
+	return err;
+}
+EXPORT_SYMBOL_GPL(ltt_trace_destroy);
+
+/* must be called from within a traces lock. */
+static int _ltt_trace_start(struct ltt_trace_struct *trace)
+{
+	int err = 0;
+
+	if (trace == NULL) {
+		err = -ENOENT;
+		goto traces_error;
+	}
+	if (trace->active)
+		printk(KERN_INFO "LTT : Tracing already active for trace %s\n",
+				trace->trace_name);
+	if (!try_module_get(ltt_run_filter_owner)) {
+		err = -ENODEV;
+		printk(KERN_ERR "LTT : Can't lock filter module.\n");
+		goto get_ltt_run_filter_error;
+	}
+	trace->active = 1;
+	/* Read by trace points without protection : be careful */
+	ltt_traces.num_active_traces++;
+	return err;
+
+	/* error handling */
+get_ltt_run_filter_error:
+traces_error:
+	return err;
+}
+
+int ltt_trace_start(const char *trace_name)
+{
+	int err = 0;
+	struct ltt_trace_struct *trace;
+
+	ltt_lock_traces();
+
+	trace = _ltt_trace_find(trace_name);
+	err = _ltt_trace_start(trace);
+	if (err)
+		goto no_trace;
+
+	ltt_unlock_traces();
+
+	/*
+	 * Call the kernel state dump.
+	 * Events will be mixed with real kernel events, it's ok.
+	 * Notice that there is no protection on the trace : that's exactly
+	 * why we iterate on the list and check for trace equality instead of
+	 * directly using this trace handle inside the logging function.
+	 */
+
+	ltt_dump_marker_state(trace);
+
+	if (!try_module_get(ltt_statedump_owner)) {
+		err = -ENODEV;
+		printk(KERN_ERR
+			"LTT : Can't lock state dump module.\n");
+	} else {
+		ltt_statedump_functor(trace);
+		module_put(ltt_statedump_owner);
+	}
+
+	return err;
+
+	/* Error handling */
+no_trace:
+	ltt_unlock_traces();
+	return err;
+}
+EXPORT_SYMBOL_GPL(ltt_trace_start);
+
+/* must be called from within traces lock */
+static int _ltt_trace_stop(struct ltt_trace_struct *trace)
+{
+	int err = -EPERM;
+
+	if (trace == NULL) {
+		err = -ENOENT;
+		goto traces_error;
+	}
+	if (!trace->active)
+		printk(KERN_INFO "LTT : Tracing not active for trace %s\n",
+				trace->trace_name);
+	if (trace->active) {
+		trace->active = 0;
+		ltt_traces.num_active_traces--;
+		synchronize_sched(); /* Wait for each tracing to be finished */
+	}
+	module_put(ltt_run_filter_owner);
+	/* Everything went fine */
+	return 0;
+
+	/* Error handling */
+traces_error:
+	return err;
+}
+
+int ltt_trace_stop(const char *trace_name)
+{
+	int err = 0;
+	struct ltt_trace_struct *trace;
+
+	ltt_lock_traces();
+	trace = _ltt_trace_find(trace_name);
+	err = _ltt_trace_stop(trace);
+	ltt_unlock_traces();
+	return err;
+}
+EXPORT_SYMBOL_GPL(ltt_trace_stop);
+
+/**
+ * ltt_control - Trace control in-kernel API
+ * @msg: Action to perform
+ * @trace_name: Trace on which the action must be done
+ * @trace_type: Type of trace (normal, flight, hybrid)
+ * @args: Arguments specific to the action
+ */
+int ltt_control(enum ltt_control_msg msg, const char *trace_name,
+		const char *trace_type, union ltt_control_args args)
+{
+	int err = -EPERM;
+
+	printk(KERN_ALERT "ltt_control : trace %s\n", trace_name);
+	switch (msg) {
+	case LTT_CONTROL_START:
+		printk(KERN_DEBUG "Start tracing %s\n", trace_name);
+		err = ltt_trace_start(trace_name);
+		break;
+	case LTT_CONTROL_STOP:
+		printk(KERN_DEBUG "Stop tracing %s\n", trace_name);
+		err = ltt_trace_stop(trace_name);
+		break;
+	case LTT_CONTROL_CREATE_TRACE:
+		printk(KERN_DEBUG "Creating trace %s\n", trace_name);
+		err = ltt_trace_create(trace_name, trace_type,
+			args.new_trace.mode,
+			args.new_trace.subbuf_size_low,
+			args.new_trace.n_subbufs_low,
+			args.new_trace.subbuf_size_med,
+			args.new_trace.n_subbufs_med,
+			args.new_trace.subbuf_size_high,
+			args.new_trace.n_subbufs_high);
+		break;
+	case LTT_CONTROL_DESTROY_TRACE:
+		printk(KERN_DEBUG "Destroying trace %s\n", trace_name);
+		err = ltt_trace_destroy(trace_name);
+		break;
+	}
+	return err;
+}
+EXPORT_SYMBOL_GPL(ltt_control);
+
+/**
+ * ltt_filter_control - Trace filter control in-kernel API
+ * @msg: Action to perform on the filter
+ * @trace_name: Trace on which the action must be done
+ */
+int ltt_filter_control(enum ltt_filter_control_msg msg, const char *trace_name)
+{
+	int err;
+	struct ltt_trace_struct *trace;
+
+	printk(KERN_DEBUG "ltt_filter_control : trace %s\n", trace_name);
+	ltt_lock_traces();
+	trace = _ltt_trace_find(trace_name);
+	if (trace == NULL) {
+		printk(KERN_ALERT
+			"Trace does not exist. Cannot proxy control request\n");
+		err = -ENOENT;
+		goto trace_error;
+	}
+	if (!try_module_get(ltt_filter_control_owner)) {
+		err = -ENODEV;
+		goto get_module_error;
+	}
+	switch (msg) {
+	case LTT_FILTER_DEFAULT_ACCEPT:
+		printk(KERN_DEBUG
+			"Proxy filter default accept %s\n", trace_name);
+		err = (*ltt_filter_control_functor)(msg, trace);
+		break;
+	case LTT_FILTER_DEFAULT_REJECT:
+		printk(KERN_DEBUG
+			"Proxy filter default reject %s\n", trace_name);
+		err = (*ltt_filter_control_functor)(msg, trace);
+		break;
+	default:
+		err = -EPERM;
+	}
+	module_put(ltt_filter_control_owner);
+
+get_module_error:
+trace_error:
+	ltt_unlock_traces();
+	return err;
+}
+EXPORT_SYMBOL_GPL(ltt_filter_control);
+
+int __init ltt_init(void)
+{
+	/* Make sure no page fault can be triggered by this module */
+	vmalloc_sync_all();
+	return 0;
+}
+
+module_init(ltt_init)
+
+static void __exit ltt_exit(void)
+{
+	struct ltt_trace_struct *trace;
+	struct list_head *pos, *n;
+
+	ltt_lock_traces();
+	/* Stop each trace, currently being read by RCU read-side */
+	list_for_each_entry_rcu(trace, &ltt_traces.head, list)
+		_ltt_trace_stop(trace);
+	/* Wait for quiescent state. Readers have preemption disabled. */
+	synchronize_sched();
+	/*
+	 * Safe iteration is now permitted. It does not have to be RCU-safe
+	 * because no readers are left.
+	 */
+	list_for_each_safe(pos, n, &ltt_traces.head) {
+		trace = container_of(pos, struct ltt_trace_struct, list);
+		/* _ltt_trace_destroy does a synchronize_sched() */
+		_ltt_trace_destroy(trace);
+		__ltt_trace_destroy(trace);
+	}
+	/* free traces in pre-alloc status */
+	list_for_each_safe(pos, n, &ltt_traces.setup_head) {
+		trace = container_of(pos, struct ltt_trace_struct, list);
+		_ltt_trace_free(trace);
+	}
+
+	ltt_unlock_traces();
+}
+
+module_exit(ltt_exit)
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Mathieu Desnoyers");
+MODULE_DESCRIPTION("Linux Trace Toolkit Next Generation Tracer Kernel API");

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [RFC patch 14/41] Splice and pipe : export pipe buf operations for GPL modules
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (12 preceding siblings ...)
  2009-03-05 22:47 ` [RFC patch 13/41] LTTng - tracer code Mathieu Desnoyers
@ 2009-03-05 22:47 ` Mathieu Desnoyers
  2009-03-05 22:47 ` [RFC patch 15/41] Poll : add poll_wait_set_exclusive Mathieu Desnoyers
                   ` (28 subsequent siblings)
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:47 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers, Jens Axboe, Linus Torvalds

[-- Attachment #1: splice-support-modules.patch --]
[-- Type: text/plain, Size: 2162 bytes --]

The LTTng splice transport uses the generic pipe and splice operations from a
GPL module.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Jens Axboe <axboe@kernel.dk>
CC: Linus Torvalds <torvalds@osdl.org>
CC: Ingo Molnar <mingo@elte.hu>
---
 fs/pipe.c   |    5 +++++
 fs/splice.c |    1 +
 2 files changed, 6 insertions(+)

Index: linux-2.6-lttng/fs/pipe.c
===================================================================
--- linux-2.6-lttng.orig/fs/pipe.c	2009-02-06 14:45:33.000000000 -0500
+++ linux-2.6-lttng/fs/pipe.c	2009-02-06 14:57:50.000000000 -0500
@@ -188,6 +188,7 @@ void *generic_pipe_buf_map(struct pipe_i
 
 	return kmap(buf->page);
 }
+EXPORT_SYMBOL_GPL(generic_pipe_buf_map);
 
 /**
  * generic_pipe_buf_unmap - unmap a previously mapped pipe buffer
@@ -207,6 +208,7 @@ void generic_pipe_buf_unmap(struct pipe_
 	} else
 		kunmap(buf->page);
 }
+EXPORT_SYMBOL_GPL(generic_pipe_buf_unmap);
 
 /**
  * generic_pipe_buf_steal - attempt to take ownership of a &pipe_buffer
@@ -237,6 +239,7 @@ int generic_pipe_buf_steal(struct pipe_i
 
 	return 1;
 }
+EXPORT_SYMBOL_GPL(generic_pipe_buf_steal);
 
 /**
  * generic_pipe_buf_get - get a reference to a &struct pipe_buffer
@@ -252,6 +255,7 @@ void generic_pipe_buf_get(struct pipe_in
 {
 	page_cache_get(buf->page);
 }
+EXPORT_SYMBOL_GPL(generic_pipe_buf_get);
 
 /**
  * generic_pipe_buf_confirm - verify contents of the pipe buffer
@@ -267,6 +271,7 @@ int generic_pipe_buf_confirm(struct pipe
 {
 	return 0;
 }
+EXPORT_SYMBOL_GPL(generic_pipe_buf_confirm);
 
 static const struct pipe_buf_operations anon_pipe_buf_ops = {
 	.can_merge = 1,
Index: linux-2.6-lttng/fs/splice.c
===================================================================
--- linux-2.6-lttng.orig/fs/splice.c	2009-02-06 14:45:33.000000000 -0500
+++ linux-2.6-lttng/fs/splice.c	2009-02-06 14:57:50.000000000 -0500
@@ -260,6 +260,7 @@ ssize_t splice_to_pipe(struct pipe_inode
 
 	return ret;
 }
+EXPORT_SYMBOL_GPL(splice_to_pipe);
 
 static void spd_release_page(struct splice_pipe_desc *spd, unsigned int i)
 {

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [RFC patch 15/41] Poll : add poll_wait_set_exclusive
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (13 preceding siblings ...)
  2009-03-05 22:47 ` [RFC patch 14/41] Splice and pipe : export pipe buf operations for GPL modules Mathieu Desnoyers
@ 2009-03-05 22:47 ` Mathieu Desnoyers
  2009-03-05 22:47 ` [RFC patch 16/41] LTTng Transport Locked Mathieu Desnoyers
                   ` (27 subsequent siblings)
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:47 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers, William Lee Irwin III

[-- Attachment #1: poll-wait-exclusive.patch --]
[-- Type: text/plain, Size: 4267 bytes --]


Problem description :

In LTTng, all lttd readers are polling all the available debugfs files
for data. This is principally because the number of reader threads is
user-defined and there are typical workloads where a single CPU is
producing most of the tracing data and all other CPUs are idle,
available to consume data. It therefore makes sense not to tie those
threads to specific buffers. However, when the number of threads grows,
we face a "thundering herd" problem where many threads can be woken up
and put back to sleep, leaving only a single thread doing useful work.

Solution :

I just created a patch which adds a poll_wait_set_exclusive() primitive
to poll(), so the code which implements the pollfd operation can specify
that only a single waiter must be woken up.

poll_wait_set_exclusive : set poll wait queue to exclusive
Sets up a poll wait queue to use exclusive wakeups. This is useful to
wake up only one waiter at each wakeup. Used to work-around "thundering herd"
problem.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: William Lee Irwin III <wli@holomorphy.com>
CC: Ingo Molnar <mingo@elte.hu>
---
 fs/select.c          |   41 ++++++++++++++++++++++++++++++++++++++---
 include/linux/poll.h |    2 ++
 2 files changed, 40 insertions(+), 3 deletions(-)

Index: linux-2.6-lttng/fs/select.c
===================================================================
--- linux-2.6-lttng.orig/fs/select.c	2009-03-05 15:41:29.000000000 -0500
+++ linux-2.6-lttng/fs/select.c	2009-03-05 15:41:42.000000000 -0500
@@ -105,6 +105,9 @@ struct poll_table_page {
  */
 static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
 		       poll_table *p);
+static void __pollwait_exclusive(struct file *filp,
+				 wait_queue_head_t *wait_address,
+				 poll_table *p);
 
 void poll_initwait(struct poll_wqueues *pwq)
 {
@@ -144,6 +147,20 @@ void poll_freewait(struct poll_wqueues *
 }
 EXPORT_SYMBOL(poll_freewait);
 
+/**
+ * poll_wait_set_exclusive - set poll wait queue to exclusive
+ *
+ * Sets up a poll wait queue to use exclusive wakeups. This is useful to
+ * wake up only one waiter at each wakeup. Used to work-around "thundering herd"
+ * problem.
+ */
+void poll_wait_set_exclusive(poll_table *p)
+{
+	if (p)
+		init_poll_funcptr(p, __pollwait_exclusive);
+}
+EXPORT_SYMBOL(poll_wait_set_exclusive);
+
 static struct poll_table_entry *poll_get_entry(struct poll_wqueues *p)
 {
 	struct poll_table_page *table = p->table;
@@ -195,8 +212,10 @@ static int pollwake(wait_queue_t *wait, 
 }
 
 /* Add a new entry */
-static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
-				poll_table *p)
+static void __pollwait_common(struct file *filp,
+			      wait_queue_head_t *wait_address,
+			      poll_table *p,
+			      int exclusive)
 {
 	struct poll_wqueues *pwq = container_of(p, struct poll_wqueues, pt);
 	struct poll_table_entry *entry = poll_get_entry(pwq);
@@ -207,7 +226,23 @@ static void __pollwait(struct file *filp
 	entry->wait_address = wait_address;
 	init_waitqueue_func_entry(&entry->wait, pollwake);
 	entry->wait.private = pwq;
-	add_wait_queue(wait_address, &entry->wait);
+	if (!exclusive)
+		add_wait_queue(wait_address, &entry->wait);
+	else
+		add_wait_queue_exclusive(wait_address, &entry->wait);
+}
+
+static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
+				poll_table *p)
+{
+	__pollwait_common(filp, wait_address, p, 0);
+}
+
+static void __pollwait_exclusive(struct file *filp,
+				 wait_queue_head_t *wait_address,
+				 poll_table *p)
+{
+	__pollwait_common(filp, wait_address, p, 1);
 }
 
 int poll_schedule_timeout(struct poll_wqueues *pwq, int state,
Index: linux-2.6-lttng/include/linux/poll.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/poll.h	2009-03-05 15:41:29.000000000 -0500
+++ linux-2.6-lttng/include/linux/poll.h	2009-03-05 15:41:32.000000000 -0500
@@ -74,6 +74,8 @@ static inline int poll_schedule(struct p
 	return poll_schedule_timeout(pwq, state, NULL, 0);
 }
 
+extern void poll_wait_set_exclusive(poll_table *p);
+
 /*
  * Scaleable version of the fd_set.
  */

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [RFC patch 16/41] LTTng Transport Locked
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (14 preceding siblings ...)
  2009-03-05 22:47 ` [RFC patch 15/41] Poll : add poll_wait_set_exclusive Mathieu Desnoyers
@ 2009-03-05 22:47 ` Mathieu Desnoyers
  2009-03-05 22:47 ` [RFC patch 17/41] LTTng - serialization Mathieu Desnoyers
                   ` (26 subsequent siblings)
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:47 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers, Lai Jiangshan

[-- Attachment #1: lttng-transport-locked.patch --]
[-- Type: text/plain, Size: 51696 bytes --]

LTTng data relay transport protected by spin lock and interrupt disable.

Performance tests : tbench, flight recorder trace, default instrumentation

Dual quad-core x86_64, 2.0GHz
Instrumentation dynamically disabled :                           1883.09 MB/s
Markers connected :                                              1812.14 MB/s
Lockless scheme, flight recorder :                                925.68 MB/s
Per-cpu-buffer spinlock and interrupt disable, flight recorder :  871.78 MB/s
Global spinlock and interrupt disable, flight recorder :          153.74 MB/s

Single Core Pentium 4, noreplace-smp, 3.0GHz
Instrumentation dynamically disabled :                            146.85 MB/s
Markers connected :                                               144.32 MB/s
Lockless scheme, flight recorder :                                 89.59 MB/s
Per-cpu-buffer spinlock and interrupt disable, flight recorder :   86.05 MB/s


Credits to Lai Jiangshan <laijs@cn.fujitsu.com> for updates and fixes.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 ltt/ltt-relay-locked.c | 1704 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 1704 insertions(+)

Index: linux-2.6-lttng/ltt/ltt-relay-locked.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/ltt/ltt-relay-locked.c	2009-03-05 16:33:29.000000000 -0500
@@ -0,0 +1,1704 @@
+/*
+ * ltt/ltt-relay-locked.c
+ *
+ * (C) Copyright 2005-2008 - Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca)
+ *
+ * LTTng buffer space management (reader/writer) using spinlock and interrupt
+ * disable.
+ *
+ * Author:
+ *  Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca)
+ *
+ * Inspired from LTT :
+ *  Karim Yaghmour (karim@opersys.com)
+ *  Tom Zanussi (zanussi@us.ibm.com)
+ *  Bob Wisniewski (bob@watson.ibm.com)
+ * And from K42 :
+ *  Bob Wisniewski (bob@watson.ibm.com)
+ *
+ * Changelog:
+ *  08/10/08, Fork from lockless mechanism, use spinlock and irqoff.
+ *  19/10/05, Complete lockless mechanism.
+ *  27/05/05, Modular redesign and rewrite.
+ *
+ * Userspace reader semantics :
+ * while (poll fd != POLLHUP) {
+ *   - ioctl RELAY_GET_SUBBUF_SIZE
+ *   while (1) {
+ *     - ioctl GET_SUBBUF
+ *     - splice 1 subbuffer worth of data to a pipe
+ *     - splice the data from pipe to disk/network
+ *     - ioctl PUT_SUBBUF, check error value
+ *       if err val < 0, previous subbuffer was corrupted.
+ *   }
+ * }
+ */
+
+#include <linux/time.h>
+#include <linux/ltt-tracer.h>
+#include <linux/ltt-relay.h>
+#include <linux/module.h>
+#include <linux/string.h>
+#include <linux/slab.h>
+#include <linux/init.h>
+#include <linux/rcupdate.h>
+#include <linux/sched.h>
+#include <linux/bitops.h>
+#include <linux/fs.h>
+#include <linux/smp_lock.h>
+#include <linux/debugfs.h>
+#include <linux/stat.h>
+#include <linux/cpu.h>
+#include <linux/pipe_fs_i.h>
+#include <linux/splice.h>
+#include <linux/spinlock.h>
+
+#if 0
+#define printk_dbg(fmt, args...) printk(fmt, args)
+#else
+#define printk_dbg(fmt, args...)
+#endif
+
+
+/* LTTng locked logging buffer info */
+struct ltt_channel_buf_struct {
+	/* First 32 bytes cache-hot cacheline */
+	long offset;			/* Current offset in the buffer */
+	long *commit_count;		/* Commit count per sub-buffer */
+	unsigned long irqflags;		/* IRQ flags saved by reserve */
+	raw_spinlock_t lock;		/* Spinlock protecting buffer */
+	/* End of first 32 bytes cacheline */
+	unsigned long last_tsc;		/*
+					 * Last timestamp written in the buffer.
+					 */
+	long consumed;			/* Current consumed (read) count */
+	atomic_long_t active_readers;	/* Active readers count */
+	long events_lost;
+	long corrupted_subbuffers;
+	wait_queue_head_t write_wait;	/*
+					 * Wait queue for blocking user space
+					 * writers
+					 */
+	int wakeup_readers;		/* Boolean : wakeup readers waiting ? */
+	wait_queue_head_t read_wait;	/* reader wait queue */
+	unsigned int finalized;		/* buffer has been finalized */
+} ____cacheline_aligned;
+
+static const struct file_operations ltt_file_operations;
+
+/*
+ * Last TSC comparison functions. Check if the current TSC overflows
+ * LTT_TSC_BITS bits from the last TSC read. Reads and writes last_tsc
+ * atomically.
+ */
+
+#if (BITS_PER_LONG == 32)
+static inline void save_last_tsc(struct ltt_channel_buf_struct *ltt_buf,
+					u64 tsc)
+{
+	ltt_buf->last_tsc = (unsigned long)(tsc >> LTT_TSC_BITS);
+}
+
+static inline int last_tsc_overflow(struct ltt_channel_buf_struct *ltt_buf,
+					u64 tsc)
+{
+	unsigned long tsc_shifted = (unsigned long)(tsc >> LTT_TSC_BITS);
+
+	if (unlikely((tsc_shifted - ltt_buf->last_tsc)))
+		return 1;
+	else
+		return 0;
+}
+#else
+static inline void save_last_tsc(struct ltt_channel_buf_struct *ltt_buf,
+					u64 tsc)
+{
+	ltt_buf->last_tsc = (unsigned long)tsc;
+}
+
+static inline int last_tsc_overflow(struct ltt_channel_buf_struct *ltt_buf,
+					u64 tsc)
+{
+	if (unlikely((tsc - ltt_buf->last_tsc) >> LTT_TSC_BITS))
+		return 1;
+	else
+		return 0;
+}
+#endif
+
+/*
+ * A switch is done during tracing or as a final flush after tracing (so it
+ * won't write in the new sub-buffer).
+ */
+enum force_switch_mode { FORCE_ACTIVE, FORCE_FLUSH };
+
+static int ltt_relay_create_buffer(struct ltt_trace_struct *trace,
+		struct ltt_channel_struct *ltt_chan,
+		struct rchan_buf *buf,
+		unsigned int cpu,
+		unsigned int n_subbufs);
+
+static void ltt_relay_destroy_buffer(struct ltt_channel_struct *ltt_chan,
+		unsigned int cpu);
+
+static void ltt_force_switch(struct rchan_buf *buf,
+		enum force_switch_mode mode);
+
+/*
+ * Trace callbacks
+ */
+static void ltt_buffer_begin_callback(struct rchan_buf *buf,
+			u64 tsc, unsigned int subbuf_idx)
+{
+	struct ltt_channel_struct *channel =
+		(struct ltt_channel_struct *)buf->chan->private_data;
+	struct ltt_subbuffer_header *header =
+		(struct ltt_subbuffer_header *)
+			ltt_relay_offset_address(buf,
+				subbuf_idx * buf->chan->subbuf_size);
+
+	header->cycle_count_begin = tsc;
+	header->lost_size = 0xFFFFFFFF; /* for debugging */
+	header->buf_size = buf->chan->subbuf_size;
+	ltt_write_trace_header(channel->trace, header);
+}
+
+/*
+ * offset is assumed to never be 0 here : never deliver a completely empty
+ * subbuffer. The lost size is between 0 and subbuf_size-1.
+ */
+static notrace void ltt_buffer_end_callback(struct rchan_buf *buf,
+		u64 tsc, unsigned int offset, unsigned int subbuf_idx)
+{
+	struct ltt_channel_buf_struct *ltt_buf = buf->chan_private;
+	struct ltt_subbuffer_header *header =
+		(struct ltt_subbuffer_header *)
+			ltt_relay_offset_address(buf,
+				subbuf_idx * buf->chan->subbuf_size);
+
+	header->lost_size = SUBBUF_OFFSET((buf->chan->subbuf_size - offset),
+				buf->chan);
+	header->cycle_count_end = tsc;
+	header->events_lost = ltt_buf->events_lost;
+	header->subbuf_corrupt = ltt_buf->corrupted_subbuffers;
+}
+
+static notrace void ltt_deliver(struct rchan_buf *buf, unsigned int subbuf_idx,
+		void *subbuf)
+{
+	struct ltt_channel_buf_struct *ltt_buf = buf->chan_private;
+
+	ltt_buf->wakeup_readers = 1;
+}
+
+static struct dentry *ltt_create_buf_file_callback(const char *filename,
+		struct dentry *parent, int mode,
+		struct rchan_buf *buf)
+{
+	struct ltt_channel_struct *ltt_chan;
+	int err;
+	struct dentry *dentry;
+
+	ltt_chan = buf->chan->private_data;
+	err = ltt_relay_create_buffer(ltt_chan->trace, ltt_chan,
+					buf, buf->cpu,
+					buf->chan->n_subbufs);
+	if (err)
+		return ERR_PTR(err);
+
+	dentry = debugfs_create_file(filename, mode, parent, buf,
+			&ltt_file_operations);
+	if (!dentry)
+		goto error;
+	return dentry;
+error:
+	ltt_relay_destroy_buffer(ltt_chan, buf->cpu);
+	return NULL;
+}
+
+static int ltt_remove_buf_file_callback(struct dentry *dentry)
+{
+	struct rchan_buf *buf = dentry->d_inode->i_private;
+	struct ltt_channel_struct *ltt_chan = buf->chan->private_data;
+
+	debugfs_remove(dentry);
+	ltt_relay_destroy_buffer(ltt_chan, buf->cpu);
+
+	return 0;
+}
+
+/*
+ * Wake writers :
+ *
+ * This must be done after the trace is removed from the RCU list so that there
+ * are no stalled writers.
+ */
+static void ltt_relay_wake_writers(struct ltt_channel_buf_struct *ltt_buf)
+{
+
+	if (waitqueue_active(&ltt_buf->write_wait))
+		wake_up_interruptible(&ltt_buf->write_wait);
+}
+
+/*
+ * This function should not be called from NMI interrupt context
+ */
+static notrace void ltt_buf_unfull(struct rchan_buf *buf,
+		unsigned int subbuf_idx,
+		long offset)
+{
+	struct ltt_channel_buf_struct *ltt_buf = buf->chan_private;
+
+	ltt_relay_wake_writers(ltt_buf);
+}
+
+/*
+ * Reader API.
+ */
+static unsigned long get_offset(struct rchan_buf *buf)
+{
+	struct ltt_channel_buf_struct *ltt_buf = buf->chan_private;
+	return ltt_buf->offset;
+}
+
+static unsigned long get_consumed(struct rchan_buf *buf)
+{
+	struct ltt_channel_buf_struct *ltt_buf = buf->chan_private;
+	return ltt_buf->consumed;
+}
+
+static int _ltt_open(struct rchan_buf *buf)
+{
+	struct ltt_channel_buf_struct *ltt_buf = buf->chan_private;
+
+	if (!atomic_long_add_unless(&ltt_buf->active_readers, 1, 1))
+		return -EBUSY;
+	ltt_relay_get_chan(buf->chan);
+	return 0;
+}
+
+static int _ltt_release(struct rchan_buf *buf)
+{
+	struct ltt_channel_buf_struct *ltt_buf = buf->chan_private;
+
+	ltt_relay_put_chan(buf->chan);
+	WARN_ON(atomic_long_read(&ltt_buf->active_readers) != 1);
+	atomic_long_dec(&ltt_buf->active_readers);
+	return 0;
+}
+
+static int get_subbuf(struct rchan_buf *buf, unsigned long *consumed)
+{
+	struct ltt_channel_struct *ltt_channel =
+		(struct ltt_channel_struct *)buf->chan->private_data;
+	struct ltt_channel_buf_struct *ltt_buf = buf->chan_private;
+	long consumed_old, consumed_idx, commit_count, write_offset;
+
+	WARN_ON(atomic_long_read(&ltt_buf->active_readers) != 1);
+
+	local_irq_disable();
+	__raw_spin_lock(&ltt_buf->lock);
+	consumed_old = ltt_buf->consumed;
+	consumed_idx = SUBBUF_INDEX(consumed_old, buf->chan);
+	commit_count = ltt_buf->commit_count[consumed_idx];
+	write_offset = ltt_buf->offset;
+	/*
+	 * Check that the subbuffer we are trying to consume has been
+	 * already fully committed.
+	 */
+	if (((commit_count - buf->chan->subbuf_size)
+	     & ltt_channel->commit_count_mask)
+	    - (BUFFER_TRUNC(consumed_old, buf->chan)
+	       >> ltt_channel->n_subbufs_order)
+	    != 0) {
+		__raw_spin_unlock(&ltt_buf->lock);
+		local_irq_enable();
+		return -EAGAIN;
+	}
+	/*
+	 * Check that we are not about to read the same subbuffer in
+	 * which the writer head is.
+	 */
+	if ((SUBBUF_TRUNC(write_offset, buf->chan)
+	   - SUBBUF_TRUNC(consumed_old, buf->chan))
+	   == 0) {
+		__raw_spin_unlock(&ltt_buf->lock);
+		local_irq_enable();
+		return -EAGAIN;
+	}
+	__raw_spin_unlock(&ltt_buf->lock);
+	local_irq_enable();
+	*consumed = consumed_old;
+	return 0;
+}
+
+static int put_subbuf(struct rchan_buf *buf, unsigned long consumed)
+{
+	struct ltt_channel_buf_struct *ltt_buf = buf->chan_private;
+	long consumed_new, consumed_old;
+
+	WARN_ON(atomic_long_read(&ltt_buf->active_readers) != 1);
+
+	local_irq_disable();
+	__raw_spin_lock(&ltt_buf->lock);
+	consumed_old = consumed;
+	consumed_new = SUBBUF_ALIGN(consumed_old, buf->chan);
+
+	if (ltt_buf->consumed != consumed_old) {
+		/* We have been pushed by the writer : the last
+		 * buffer read _is_ corrupted! It can also
+		 * happen if this is a buffer we never got. */
+		__raw_spin_unlock(&ltt_buf->lock);
+		local_irq_enable();
+		return -EIO;
+	} else {
+		/* tell the client that buffer is now unfull */
+		int index;
+		long data;
+
+		ltt_buf->consumed = consumed_new;
+		index = SUBBUF_INDEX(consumed_old, buf->chan);
+		data = BUFFER_OFFSET(consumed_old, buf->chan);
+		ltt_buf_unfull(buf, index, data);
+		__raw_spin_unlock(&ltt_buf->lock);
+		local_irq_enable();
+	}
+	return 0;
+}
+
+static unsigned long get_n_subbufs(struct rchan_buf *buf)
+{
+	struct ltt_channel_buf_struct *ltt_buf = buf->chan_private;
+
+	WARN_ON(atomic_long_read(&ltt_buf->active_readers) != 1);
+	return buf->chan->n_subbufs;
+}
+
+static unsigned long get_subbuf_size(struct rchan_buf *buf)
+{
+	struct ltt_channel_buf_struct *ltt_buf = buf->chan_private;
+
+	WARN_ON(atomic_long_read(&ltt_buf->active_readers) != 1);
+	return buf->chan->subbuf_size;
+}
+
+static struct ltt_channel_buf_access_ops ltt_channel_buf_accessor = {
+	.get_offset   = get_offset,
+	.get_consumed = get_consumed,
+	.get_subbuf = get_subbuf,
+	.put_subbuf = put_subbuf,
+	.get_n_subbufs = get_n_subbufs,
+	.get_subbuf_size = get_subbuf_size,
+	.open = _ltt_open,
+	.release = _ltt_release,
+};
+
+/**
+ *	ltt_open - open file op for ltt files
+ *	@inode: opened inode
+ *	@file: opened file
+ *
+ *	Open implementation. Makes sure that at most one instance of a
+ *	buffer is open at any given moment.
+ */
+static int ltt_open(struct inode *inode, struct file *file)
+{
+	int ret;
+	struct rchan_buf *buf = inode->i_private;
+
+	ret = _ltt_open(buf);
+	if (!ret)
+		ret = ltt_relay_file_operations.open(inode, file);
+	return ret;
+}
+
+/**
+ *	ltt_release - release file op for ltt files
+ *	@inode: opened inode
+ *	@file: opened file
+ *
+ *	Release implementation.
+ */
+static int ltt_release(struct inode *inode, struct file *file)
+{
+	struct rchan_buf *buf = inode->i_private;
+	int ret;
+
+	_ltt_release(buf);
+	ret = ltt_relay_file_operations.release(inode, file);
+	WARN_ON(ret);
+	return ret;
+}
+
+/**
+ *	ltt_poll - file op for ltt files
+ *	@filp: the file
+ *	@wait: poll table
+ *
+ *	Poll implementation.
+ */
+static unsigned int ltt_poll(struct file *filp, poll_table *wait)
+{
+	unsigned int mask = 0, ret;
+	struct inode *inode = filp->f_dentry->d_inode;
+	struct rchan_buf *buf = inode->i_private;
+	struct ltt_channel_buf_struct *ltt_buf = buf->chan_private;
+
+	if (filp->f_mode & FMODE_READ) {
+		poll_wait_set_exclusive(wait);
+		poll_wait(filp, &ltt_buf->read_wait, wait);
+
+		local_irq_disable();
+		__raw_spin_lock(&ltt_buf->lock);
+		WARN_ON(atomic_long_read(&ltt_buf->active_readers) != 1);
+		if (SUBBUF_TRUNC(ltt_buf->offset, buf->chan)
+		  - SUBBUF_TRUNC(ltt_buf->consumed, buf->chan)
+		  == 0) {
+			if (ltt_buf->finalized)
+				ret = POLLHUP;
+			else
+				ret = 0;
+			goto end;
+		} else {
+			struct rchan *rchan = buf->chan;
+
+			if (SUBBUF_TRUNC(ltt_buf->offset, buf->chan)
+			  - SUBBUF_TRUNC(ltt_buf->consumed, buf->chan)
+			  >= rchan->alloc_size)
+				ret = POLLPRI | POLLRDBAND;
+			else
+				ret = POLLIN | POLLRDNORM;
+			goto end;
+		}
+end:
+		__raw_spin_unlock(&ltt_buf->lock);
+		local_irq_enable();
+		return ret;
+	}
+	return mask;
+}
+
+/**
+ *	ltt_ioctl - control on the debugfs file
+ *
+ *	@inode: the inode
+ *	@filp: the file
+ *	@cmd: the command
+ *	@arg: command arg
+ *
+ *	This ioctl implements the four commands needed for a minimal
+ *	producer/consumer implementation :
+ *	RELAY_GET_SUBBUF
+ *		Get the next sub-buffer that can be read. It never blocks.
+ *	RELAY_PUT_SUBBUF
+ *		Release the sub-buffer last read. Parameter is the consumed
+ *		count returned by RELAY_GET_SUBBUF.
+ *	RELAY_GET_N_SUBBUFS
+ *		Returns the number of sub-buffers in the per-cpu channel.
+ *	RELAY_GET_SUBBUF_SIZE
+ *		Returns the size of the sub-buffers.
+ */
+static int ltt_ioctl(struct inode *inode, struct file *filp,
+		unsigned int cmd, unsigned long arg)
+{
+	struct rchan_buf *buf = inode->i_private;
+	u32 __user *argp = (u32 __user *)arg;
+
+	switch (cmd) {
+	case RELAY_GET_SUBBUF:
+	{
+		unsigned long consumed;
+		int ret;
+
+		ret = get_subbuf(buf, &consumed);
+		if (ret)
+			return ret;
+		else
+			return put_user((u32)consumed, argp);
+		break;
+	}
+	case RELAY_PUT_SUBBUF:
+	{
+		struct ltt_channel_buf_struct *ltt_buf = buf->chan_private;
+		u32 uconsumed_old;
+		int ret;
+		long consumed_old;
+
+		ret = get_user(uconsumed_old, argp);
+		if (ret)
+			return ret; /* will return -EFAULT */
+
+		consumed_old = ltt_buf->consumed;
+		consumed_old = consumed_old & (~0xFFFFFFFFL);
+		consumed_old = consumed_old | uconsumed_old;
+		ret = put_subbuf(buf, consumed_old);
+		if (ret)
+			return ret;
+		break;
+	}
+	case RELAY_GET_N_SUBBUFS:
+		return put_user((u32)get_n_subbufs(buf), argp);
+		break;
+	case RELAY_GET_SUBBUF_SIZE:
+		return put_user((u32)get_subbuf_size(buf), argp);
+		break;
+	default:
+		return -ENOIOCTLCMD;
+	}
+	return 0;
+}
+
+#ifdef CONFIG_COMPAT
+static long ltt_compat_ioctl(struct file *file, unsigned int cmd,
+		unsigned long arg)
+{
+	long ret = -ENOIOCTLCMD;
+
+	lock_kernel();
+	ret = ltt_ioctl(file->f_dentry->d_inode, file, cmd, arg);
+	unlock_kernel();
+
+	return ret;
+}
+#endif
+
+static void ltt_relay_pipe_buf_release(struct pipe_inode_info *pipe,
+				   struct pipe_buffer *pbuf)
+{
+}
+
+static struct pipe_buf_operations ltt_relay_pipe_buf_ops = {
+	.can_merge = 0,
+	.map = generic_pipe_buf_map,
+	.unmap = generic_pipe_buf_unmap,
+	.confirm = generic_pipe_buf_confirm,
+	.release = ltt_relay_pipe_buf_release,
+	.steal = generic_pipe_buf_steal,
+	.get = generic_pipe_buf_get,
+};
+
+static void ltt_relay_page_release(struct splice_pipe_desc *spd, unsigned int i)
+{
+}
+
+/*
+ *	subbuf_splice_actor - splice up to one subbuf's worth of data
+ */
+static int subbuf_splice_actor(struct file *in,
+			       loff_t *ppos,
+			       struct pipe_inode_info *pipe,
+			       size_t len,
+			       unsigned int flags)
+{
+	struct rchan_buf *buf = in->private_data;
+	struct ltt_channel_buf_struct *ltt_buf = buf->chan_private;
+	unsigned int poff, subbuf_pages, nr_pages;
+	struct page *pages[PIPE_BUFFERS];
+	struct partial_page partial[PIPE_BUFFERS];
+	struct splice_pipe_desc spd = {
+		.pages = pages,
+		.nr_pages = 0,
+		.partial = partial,
+		.flags = flags,
+		.ops = &ltt_relay_pipe_buf_ops,
+		.spd_release = ltt_relay_page_release,
+	};
+	long consumed_old, consumed_idx, roffset;
+	unsigned long bytes_avail;
+
+	/*
+	 * Check that a GET_SUBBUF ioctl has been done before.
+	 */
+	local_irq_disable();
+	__raw_spin_lock(&ltt_buf->lock);
+	WARN_ON(atomic_long_read(&ltt_buf->active_readers) != 1);
+	consumed_old = ltt_buf->consumed;
+	consumed_old += *ppos;
+	consumed_idx = SUBBUF_INDEX(consumed_old, buf->chan);
+
+	/*
+	 * Adjust read len, if longer than what is available
+	 */
+	bytes_avail = SUBBUF_TRUNC(ltt_buf->offset, buf->chan) - consumed_old;
+	WARN_ON(bytes_avail > buf->chan->alloc_size);
+	len = min_t(size_t, len, bytes_avail);
+	subbuf_pages = bytes_avail >> PAGE_SHIFT;
+	nr_pages = min_t(unsigned int, subbuf_pages, PIPE_BUFFERS);
+	roffset = consumed_old & PAGE_MASK;
+	poff = consumed_old & ~PAGE_MASK;
+	printk_dbg(KERN_DEBUG "SPLICE actor len %zu pos %zd write_pos %ld\n",
+		len, (ssize_t)*ppos, ltt_buf->offset);
+
+	for (; spd.nr_pages < nr_pages; spd.nr_pages++) {
+		unsigned int this_len;
+		struct buf_page *page;
+
+		if (!len)
+			break;
+		printk_dbg(KERN_DEBUG "SPLICE actor loop len %zu roffset %ld\n",
+			len, roffset);
+
+		this_len = PAGE_SIZE - poff;
+		page = ltt_relay_read_get_page(buf, roffset);
+		spd.pages[spd.nr_pages] = page->page;
+		spd.partial[spd.nr_pages].offset = poff;
+		spd.partial[spd.nr_pages].len = this_len;
+
+		poff = 0;
+		roffset += PAGE_SIZE;
+		len -= this_len;
+	}
+	__raw_spin_unlock(&ltt_buf->lock);
+	local_irq_enable();
+
+	if (!spd.nr_pages)
+		return 0;
+
+	return splice_to_pipe(pipe, &spd);
+}
+
+static ssize_t ltt_relay_file_splice_read(struct file *in,
+				      loff_t *ppos,
+				      struct pipe_inode_info *pipe,
+				      size_t len,
+				      unsigned int flags)
+{
+	ssize_t spliced;
+	int ret;
+
+	ret = 0;
+	spliced = 0;
+
+	printk_dbg(KERN_DEBUG "SPLICE read len %zu pos %zd\n",
+		len, (ssize_t)*ppos);
+	while (len && !spliced) {
+		ret = subbuf_splice_actor(in, ppos, pipe, len, flags);
+		printk_dbg(KERN_DEBUG "SPLICE read loop ret %d\n", ret);
+		if (ret < 0)
+			break;
+		else if (!ret) {
+			if (flags & SPLICE_F_NONBLOCK)
+				ret = -EAGAIN;
+			break;
+		}
+
+		*ppos += ret;
+		if (ret > len)
+			len = 0;
+		else
+			len -= ret;
+		spliced += ret;
+	}
+
+	if (spliced)
+		return spliced;
+
+	return ret;
+}
+
+static void ltt_relay_print_subbuffer_errors(
+		struct ltt_channel_struct *ltt_chan,
+		long cons_off, unsigned int cpu)
+{
+	struct rchan *rchan = ltt_chan->trans_channel_data;
+	struct ltt_channel_buf_struct *ltt_buf = rchan->buf[cpu]->chan_private;
+	long cons_idx, commit_count, write_offset;
+
+	cons_idx = SUBBUF_INDEX(cons_off, rchan);
+	commit_count = ltt_buf->commit_count[cons_idx];
+	write_offset = ltt_buf->offset;
+	printk(KERN_WARNING
+		"LTT : unread channel %s offset is %ld "
+		"and cons_off : %ld (cpu %u)\n",
+		ltt_chan->channel_name, write_offset, cons_off, cpu);
+	/* Check each sub-buffer for a non-filled commit count */
+	if (((commit_count - rchan->subbuf_size) & ltt_chan->commit_count_mask)
+	    - (BUFFER_TRUNC(cons_off, rchan) >> ltt_chan->n_subbufs_order)
+	    != 0)
+		printk(KERN_ALERT
+			"LTT : %s : subbuffer %lu has non filled "
+			"commit count %lu.\n",
+			ltt_chan->channel_name, cons_idx, commit_count);
+	printk(KERN_ALERT "LTT : %s : commit count : %lu, subbuf size %zd\n",
+			ltt_chan->channel_name, commit_count,
+			rchan->subbuf_size);
+}
+
+static void ltt_relay_print_errors(struct ltt_trace_struct *trace,
+		struct ltt_channel_struct *ltt_chan, int cpu)
+{
+	struct rchan *rchan = ltt_chan->trans_channel_data;
+	struct ltt_channel_buf_struct *ltt_buf;
+	long cons_off;
+
+	/* Can be called in the error path of allocation, when
+	 * trans_channel_data is not yet set : check rchan first. */
+	if (!rchan)
+		return;
+	ltt_buf = rchan->buf[cpu]->chan_private;
+
+	for (cons_off = ltt_buf->consumed;
+			(SUBBUF_TRUNC(ltt_buf->offset, rchan) - cons_off) > 0;
+			cons_off = SUBBUF_ALIGN(cons_off, rchan))
+		ltt_relay_print_subbuffer_errors(ltt_chan, cons_off, cpu);
+}
+
+static void ltt_relay_print_buffer_errors(struct ltt_channel_struct *ltt_chan,
+		unsigned int cpu)
+{
+	struct ltt_trace_struct *trace = ltt_chan->trace;
+	struct rchan *rchan = ltt_chan->trans_channel_data;
+	struct ltt_channel_buf_struct *ltt_buf = rchan->buf[cpu]->chan_private;
+
+	if (ltt_buf->events_lost)
+		printk(KERN_ALERT
+			"LTT : %s : %ld events lost "
+			"in %s channel (cpu %u).\n",
+			ltt_chan->channel_name,
+			ltt_buf->events_lost,
+			ltt_chan->channel_name, cpu);
+	if (ltt_buf->corrupted_subbuffers)
+		printk(KERN_ALERT
+			"LTT : %s : %ld corrupted subbuffers "
+			"in %s channel (cpu %u).\n",
+			ltt_chan->channel_name,
+			ltt_buf->corrupted_subbuffers,
+			ltt_chan->channel_name, cpu);
+
+	ltt_relay_print_errors(trace, ltt_chan, cpu);
+}
+
+static void ltt_relay_remove_dirs(struct ltt_trace_struct *trace)
+{
+	debugfs_remove(trace->dentry.trace_root);
+}
+
+/*
+ * Create ltt buffer.
+ */
+static int ltt_relay_create_buffer(struct ltt_trace_struct *trace,
+		struct ltt_channel_struct *ltt_chan, struct rchan_buf *buf,
+		unsigned int cpu, unsigned int n_subbufs)
+{
+	struct ltt_channel_buf_struct *ltt_buf;
+	ltt_buf = kzalloc_node(sizeof(*ltt_buf), GFP_KERNEL, cpu_to_node(cpu));
+	if (!ltt_buf)
+		return -ENOMEM;
+
+	ltt_buf->commit_count =
+		kzalloc_node(sizeof(*ltt_buf->commit_count) * n_subbufs,
+			GFP_KERNEL, cpu_to_node(cpu));
+	if (!ltt_buf->commit_count) {
+		kfree(ltt_buf);
+		return -ENOMEM;
+	}
+	buf->chan_private = ltt_buf;
+
+	kref_get(&trace->kref);
+	kref_get(&trace->ltt_transport_kref);
+	ltt_buf->offset = ltt_subbuffer_header_size();
+	atomic_long_set(&ltt_buf->active_readers, 0);
+	init_waitqueue_head(&ltt_buf->write_wait);
+	init_waitqueue_head(&ltt_buf->read_wait);
+	ltt_buffer_begin_callback(buf, trace->start_tsc, 0);
+	ltt_buf->commit_count[0] += ltt_subbuffer_header_size();
+	ltt_buf->lock = (raw_spinlock_t)__RAW_SPIN_LOCK_UNLOCKED;
+
+	return 0;
+}
+
+static void ltt_relay_destroy_buffer(struct ltt_channel_struct *ltt_chan,
+		unsigned int cpu)
+{
+	struct ltt_trace_struct *trace = ltt_chan->trace;
+	struct rchan *rchan = ltt_chan->trans_channel_data;
+	struct ltt_channel_buf_struct *ltt_buf = rchan->buf[cpu]->chan_private;
+
+	kref_put(&ltt_chan->trace->ltt_transport_kref,
+		ltt_release_transport);
+	ltt_relay_print_buffer_errors(ltt_chan, cpu);
+	kfree(ltt_buf->commit_count);
+	kfree(ltt_buf);
+	kref_put(&trace->kref, ltt_release_trace);
+	wake_up_interruptible(&trace->kref_wq);
+}
+
+/*
+ * Create channel.
+ */
+static int ltt_relay_create_channel(const char *trace_name,
+		struct ltt_trace_struct *trace, struct dentry *dir,
+		const char *channel_name, struct ltt_channel_struct *ltt_chan,
+		unsigned int subbuf_size, unsigned int n_subbufs,
+		int overwrite)
+{
+	char *tmpname;
+	unsigned int tmpname_len;
+	int err = 0;
+
+	tmpname = kmalloc(PATH_MAX, GFP_KERNEL);
+	if (!tmpname)
+		return EPERM;
+	if (overwrite) {
+		strncpy(tmpname, LTT_FLIGHT_PREFIX, PATH_MAX-1);
+		strncat(tmpname, channel_name,
+			PATH_MAX-1-sizeof(LTT_FLIGHT_PREFIX));
+	} else {
+		strncpy(tmpname, channel_name, PATH_MAX-1);
+	}
+	strncat(tmpname, "_", PATH_MAX-1-strlen(tmpname));
+
+	ltt_chan->trace = trace;
+	ltt_chan->buffer_begin = ltt_buffer_begin_callback;
+	ltt_chan->buffer_end = ltt_buffer_end_callback;
+	ltt_chan->overwrite = overwrite;
+	ltt_chan->n_subbufs_order = get_count_order(n_subbufs);
+	ltt_chan->commit_count_mask = (~0UL >> ltt_chan->n_subbufs_order);
+	ltt_chan->trans_channel_data = ltt_relay_open(tmpname,
+			dir,
+			subbuf_size,
+			n_subbufs,
+			&trace->callbacks,
+			ltt_chan);
+	tmpname_len = strlen(tmpname);
+	if (tmpname_len > 0) {
+		/* Remove final _ for pretty printing */
+		tmpname[tmpname_len-1] = '\0';
+	}
+	if (ltt_chan->trans_channel_data == NULL) {
+		printk(KERN_ERR "LTT : Can't open %s channel for trace %s\n",
+				tmpname, trace_name);
+		goto relay_open_error;
+	}
+
+	ltt_chan->buf_access_ops = &ltt_channel_buf_accessor;
+
+	err = 0;
+	goto end;
+
+relay_open_error:
+	err = EPERM;
+end:
+	kfree(tmpname);
+	return err;
+}
+
+static int ltt_relay_create_dirs(struct ltt_trace_struct *new_trace)
+{
+	struct dentry *ltt_root_dentry;
+
+	ltt_root_dentry = get_ltt_root();
+	if (!ltt_root_dentry)
+		return ENOENT;
+
+	new_trace->dentry.trace_root = debugfs_create_dir(new_trace->trace_name,
+			ltt_root_dentry);
+	put_ltt_root();
+	if (new_trace->dentry.trace_root == NULL) {
+		printk(KERN_ERR "LTT : Trace directory name %s already taken\n",
+				new_trace->trace_name);
+		return EEXIST;
+	}
+
+	new_trace->callbacks.create_buf_file = ltt_create_buf_file_callback;
+	new_trace->callbacks.remove_buf_file = ltt_remove_buf_file_callback;
+
+	return 0;
+}
+
+/*
+ * LTTng channel flush function.
+ *
+ * Must be called when no tracing is active in the channel, because of
+ * accesses across CPUs.
+ */
+static notrace void ltt_relay_buffer_flush(struct rchan_buf *buf)
+{
+	struct ltt_channel_buf_struct *ltt_buf = buf->chan_private;
+
+	ltt_buf->finalized = 1;
+	ltt_force_switch(buf, FORCE_FLUSH);
+}
+
+static void ltt_relay_async_wakeup_chan(struct ltt_channel_struct *ltt_channel)
+{
+	unsigned int i;
+	struct rchan *rchan = ltt_channel->trans_channel_data;
+
+	for_each_possible_cpu(i) {
+		struct ltt_channel_buf_struct *ltt_buf;
+
+		if (!rchan->buf[i])
+			continue;
+
+		ltt_buf = rchan->buf[i]->chan_private;
+		if (ltt_buf->wakeup_readers == 1) {
+			ltt_buf->wakeup_readers = 0;
+			wake_up_interruptible(&ltt_buf->read_wait);
+		}
+	}
+}
+
+static void ltt_relay_finish_buffer(struct ltt_channel_struct *ltt_channel,
+		unsigned int cpu)
+{
+	struct rchan *rchan = ltt_channel->trans_channel_data;
+
+	if (rchan->buf[cpu]) {
+		struct ltt_channel_buf_struct *ltt_buf =
+				rchan->buf[cpu]->chan_private;
+		ltt_relay_buffer_flush(rchan->buf[cpu]);
+		ltt_relay_wake_writers(ltt_buf);
+	}
+}
+
+
+static void ltt_relay_finish_channel(struct ltt_channel_struct *ltt_channel)
+{
+	unsigned int i;
+
+	for_each_possible_cpu(i)
+		ltt_relay_finish_buffer(ltt_channel, i);
+}
+
+static void ltt_relay_remove_channel(struct ltt_channel_struct *channel)
+{
+	struct rchan *rchan = channel->trans_channel_data;
+
+	ltt_relay_close(rchan);
+}
+
+struct ltt_reserve_switch_offsets {
+	long begin, end, old;
+	long begin_switch, end_switch_current, end_switch_old;
+	long commit_count, reserve_commit_diff;
+	size_t before_hdr_pad, size;
+};
+
+/*
+ * Returns :
+ * 0 if ok
+ * !0 if execution must be aborted.
+ */
+static inline int ltt_relay_try_reserve(
+		struct ltt_channel_struct *ltt_channel,
+		struct ltt_channel_buf_struct *ltt_buf, struct rchan *rchan,
+		struct rchan_buf *buf,
+		struct ltt_reserve_switch_offsets *offsets, size_t data_size,
+		u64 *tsc, unsigned int *rflags, int largest_align)
+{
+	offsets->begin = ltt_buf->offset;
+	offsets->old = offsets->begin;
+	offsets->begin_switch = 0;
+	offsets->end_switch_current = 0;
+	offsets->end_switch_old = 0;
+
+	*tsc = trace_clock_read64();
+	if (last_tsc_overflow(ltt_buf, *tsc))
+		*rflags = LTT_RFLAG_ID_SIZE_TSC;
+
+	if (SUBBUF_OFFSET(offsets->begin, buf->chan) == 0) {
+		offsets->begin_switch = 1;		/* For offsets->begin */
+	} else {
+		offsets->size = ltt_get_header_size(ltt_channel,
+					offsets->begin, data_size,
+					&offsets->before_hdr_pad, *rflags);
+		offsets->size += ltt_align(offsets->begin + offsets->size,
+					   largest_align)
+				 + data_size;
+		if ((SUBBUF_OFFSET(offsets->begin, buf->chan) + offsets->size)
+				> buf->chan->subbuf_size) {
+			offsets->end_switch_old = 1;	/* For offsets->old */
+			offsets->begin_switch = 1;	/* For offsets->begin */
+		}
+	}
+	if (offsets->begin_switch) {
+		long subbuf_index;
+
+		if (offsets->end_switch_old)
+			offsets->begin = SUBBUF_ALIGN(offsets->begin,
+						      buf->chan);
+		offsets->begin = offsets->begin + ltt_subbuffer_header_size();
+		/* Test new buffer integrity */
+		subbuf_index = SUBBUF_INDEX(offsets->begin, buf->chan);
+		offsets->reserve_commit_diff =
+			(BUFFER_TRUNC(offsets->begin, buf->chan)
+			 >> ltt_channel->n_subbufs_order)
+			- (ltt_buf->commit_count[subbuf_index]
+			   & ltt_channel->commit_count_mask);
+		if (offsets->reserve_commit_diff == 0) {
+			/* Next buffer not corrupted. */
+			if (!ltt_channel->overwrite &&
+				(SUBBUF_TRUNC(offsets->begin, buf->chan)
+				- SUBBUF_TRUNC(ltt_buf->consumed, buf->chan))
+				>= rchan->alloc_size) {
+				/*
+				 * We do not overwrite non consumed buffers
+				 * and we are full : event is lost.
+				 */
+				ltt_buf->events_lost++;
+				return -1;
+			} else {
+				/*
+				 * next buffer not corrupted, we are either in
+				 * overwrite mode or the buffer is not full.
+				 * It's safe to write in this new subbuffer.
+				 */
+			}
+		} else {
+			/*
+			 * Next subbuffer corrupted. Force pushing reader even
+			 * in normal mode. It's safe to write in this new
+			 * subbuffer.
+			 */
+		}
+		offsets->size = ltt_get_header_size(ltt_channel,
+					offsets->begin, data_size,
+					&offsets->before_hdr_pad, *rflags);
+		offsets->size += ltt_align(offsets->begin + offsets->size,
+					   largest_align)
+				 + data_size;
+		if ((SUBBUF_OFFSET(offsets->begin, buf->chan) + offsets->size)
+				> buf->chan->subbuf_size) {
+			/*
+			 * Event too big for subbuffers, report error, don't
+			 * complete the sub-buffer switch.
+			 */
+			ltt_buf->events_lost++;
+			return -1;
+		} else {
+			/*
+			 * We just made a successful buffer switch and the event
+			 * fits in the new subbuffer. Let's write.
+			 */
+		}
+	} else {
+		/*
+		 * Event fits in the current buffer and we are not on a switch
+		 * boundary. It's safe to write.
+		 */
+	}
+	offsets->end = offsets->begin + offsets->size;
+
+	if ((SUBBUF_OFFSET(offsets->end, buf->chan)) == 0) {
+		/*
+		 * The offset_end will fall at the very beginning of the next
+		 * subbuffer.
+		 */
+		offsets->end_switch_current = 1;	/* For offsets->begin */
+	}
+	return 0;
+}
+
+/*
+ * Returns :
+ * 0 if ok
+ * !0 if execution must be aborted.
+ */
+static inline int ltt_relay_try_switch(
+		enum force_switch_mode mode,
+		struct ltt_channel_struct *ltt_channel,
+		struct ltt_channel_buf_struct *ltt_buf, struct rchan *rchan,
+		struct rchan_buf *buf,
+		struct ltt_reserve_switch_offsets *offsets,
+		u64 *tsc)
+{
+	long subbuf_index;
+
+	offsets->begin = ltt_buf->offset;
+	offsets->old = offsets->begin;
+	offsets->begin_switch = 0;
+	offsets->end_switch_old = 0;
+
+	*tsc = trace_clock_read64();
+
+	if (SUBBUF_OFFSET(offsets->begin, buf->chan) != 0) {
+		offsets->begin = SUBBUF_ALIGN(offsets->begin, buf->chan);
+		offsets->end_switch_old = 1;
+	} else {
+		/* we do not have to switch : buffer is empty */
+		return -1;
+	}
+	if (mode == FORCE_ACTIVE)
+		offsets->begin += ltt_subbuffer_header_size();
+	/*
+	 * Always begin_switch in FORCE_ACTIVE mode.
+	 * Test new buffer integrity
+	 */
+	subbuf_index = SUBBUF_INDEX(offsets->begin, buf->chan);
+	offsets->reserve_commit_diff =
+		(BUFFER_TRUNC(offsets->begin, buf->chan)
+		 >> ltt_channel->n_subbufs_order)
+		- (ltt_buf->commit_count[subbuf_index]
+		   & ltt_channel->commit_count_mask);
+	if (offsets->reserve_commit_diff == 0) {
+		/* Next buffer not corrupted. */
+		if (mode == FORCE_ACTIVE
+		    && !ltt_channel->overwrite
+		    && offsets->begin - ltt_buf->consumed
+		       >= rchan->alloc_size) {
+			/*
+			 * We do not overwrite non consumed buffers and we are
+			 * full : ignore switch while tracing is active.
+			 */
+			return -1;
+		}
+	} else {
+		/*
+		 * Next subbuffer corrupted. Force pushing reader even in normal
+		 * mode
+		 */
+	}
+	offsets->end = offsets->begin;
+	return 0;
+}
+
+static inline void ltt_reserve_push_reader(
+		struct ltt_channel_struct *ltt_channel,
+		struct ltt_channel_buf_struct *ltt_buf,
+		struct rchan *rchan,
+		struct rchan_buf *buf,
+		struct ltt_reserve_switch_offsets *offsets)
+{
+	long consumed_old, consumed_new;
+
+	consumed_old = ltt_buf->consumed;
+	/*
+	 * If buffer is in overwrite mode, push the reader consumed
+	 * count if the write position has reached it and we are not
+	 * at the first iteration (don't push the reader farther than
+	 * the writer). This operation can be done concurrently by many
+	 * writers in the same buffer, the writer being at the farthest
+	 * write position sub-buffer index in the buffer being the one
+	 * which will win this loop.
+	 * If the buffer is not in overwrite mode, pushing the reader
+	 * only happens if a sub-buffer is corrupted.
+	 */
+	if ((SUBBUF_TRUNC(offsets->end-1, buf->chan)
+	   - SUBBUF_TRUNC(consumed_old, buf->chan))
+	   >= rchan->alloc_size) {
+		consumed_new = SUBBUF_ALIGN(consumed_old, buf->chan);
+		ltt_buf->consumed = consumed_new;
+	} else
+		consumed_new = consumed_old;
+
+	if (consumed_old != consumed_new) {
+		/*
+		 * Reader pushed : we are the winner of the push, we can
+		 * therefore re-equilibrate reserve and commit. Atomic increment
+		 * of the commit count permits other writers to play around
+		 * with this variable before us. We keep track of
+		 * corrupted_subbuffers even in overwrite mode :
+		 * we never want to write over a non completely committed
+		 * sub-buffer : possible causes : the buffer size is too low
+		 * compared to the unordered data input, or there is a writer
+		 * that died between the reserve and the commit.
+		 */
+		if (offsets->reserve_commit_diff) {
+			/*
+			 * We have to alter the sub-buffer commit count.
+			 * We do not deliver the previous subbuffer, given it
+			 * was either corrupted or not consumed (overwrite
+			 * mode).
+			 */
+			ltt_buf->commit_count[SUBBUF_INDEX(offsets->begin,
+							   buf->chan)] +=
+						offsets->reserve_commit_diff;
+			if (!ltt_channel->overwrite
+			    || offsets->reserve_commit_diff
+			       != rchan->subbuf_size) {
+				/*
+				 * The reserve commit diff was not subbuf_size :
+				 * it means the subbuffer was partly written to
+				 * and is therefore corrupted. If it is multiple
+				 * of subbuffer size and we are in flight
+				 * recorder mode, we are skipping over a whole
+				 * subbuffer.
+				 */
+				ltt_buf->corrupted_subbuffers++;
+			}
+		}
+	}
+}
+
+
+/*
+ * ltt_reserve_switch_old_subbuf: switch old subbuffer
+ *
+ * Concurrency safe because we are the last and only thread to alter this
+ * sub-buffer. As long as it is not delivered and read, no other thread can
+ * alter the offset, alter the reserve_count or call the
+ * client_buffer_end_callback on this sub-buffer.
+ *
+ * The only remaining threads could be the ones with pending commits. They will
+ * have to do the deliver themselves.  Not concurrency safe in overwrite mode.
+ * We detect corrupted subbuffers with commit and reserve counts. We keep a
+ * corrupted sub-buffers count and push the readers across these sub-buffers.
+ *
+ * Not concurrency safe if a writer is stalled in a subbuffer and another writer
+ * switches in, finding out it's corrupted.  The result will be that the old
+ * (uncommitted) subbuffer will be declared corrupted, and that the new subbuffer
+ * will be declared corrupted too because of the commit count adjustment.
+ *
+ * Note : offset_old should never be 0 here.
+ */
+static inline void ltt_reserve_switch_old_subbuf(
+		struct ltt_channel_struct *ltt_channel,
+		struct ltt_channel_buf_struct *ltt_buf, struct rchan *rchan,
+		struct rchan_buf *buf,
+		struct ltt_reserve_switch_offsets *offsets, u64 *tsc)
+{
+	long oldidx = SUBBUF_INDEX(offsets->old - 1, rchan);
+
+	ltt_channel->buffer_end(buf, *tsc, offsets->old, oldidx);
+	ltt_buf->commit_count[oldidx] +=
+		rchan->subbuf_size
+		- (SUBBUF_OFFSET(offsets->old - 1, rchan)
+		+ 1);
+	offsets->commit_count = ltt_buf->commit_count[oldidx];
+	if ((BUFFER_TRUNC(offsets->old - 1, rchan)
+			>> ltt_channel->n_subbufs_order)
+			- ((offsets->commit_count - rchan->subbuf_size)
+			   & ltt_channel->commit_count_mask) == 0)
+		ltt_deliver(buf, oldidx, NULL);
+}
+
+/*
+ * ltt_reserve_switch_new_subbuf: Populate new subbuffer.
+ *
+ * This code can be executed unordered : writers may already have written to the
+ * sub-buffer before this code gets executed, caution.  The commit makes sure
+ * that this code is executed before the deliver of this sub-buffer.
+ */
+static inline void ltt_reserve_switch_new_subbuf(
+		struct ltt_channel_struct *ltt_channel,
+		struct ltt_channel_buf_struct *ltt_buf, struct rchan *rchan,
+		struct rchan_buf *buf,
+		struct ltt_reserve_switch_offsets *offsets, u64 *tsc)
+{
+	long beginidx = SUBBUF_INDEX(offsets->begin, rchan);
+
+	ltt_channel->buffer_begin(buf, *tsc, beginidx);
+	ltt_buf->commit_count[beginidx] += ltt_subbuffer_header_size();
+	offsets->commit_count = ltt_buf->commit_count[beginidx];
+	/* Check if the written buffer has to be delivered */
+	if ((BUFFER_TRUNC(offsets->end - 1, rchan)
+			>> ltt_channel->n_subbufs_order)
+			- ((offsets->commit_count - rchan->subbuf_size)
+			   & ltt_channel->commit_count_mask) == 0)
+		ltt_deliver(buf, beginidx, NULL);
+}
+
+
+/*
+ * ltt_reserve_end_switch_current: finish switching current subbuffer
+ *
+ * Concurrency safe because we are the last and only thread to alter this
+ * sub-buffer. As long as it is not delivered and read, no other thread can
+ * alter the offset, alter the reserve_count or call the
+ * client_buffer_end_callback on this sub-buffer.
+ *
+ * The only remaining threads could be the ones with pending commits. They will
+ * have to do the deliver themselves.  Not concurrency safe in overwrite mode.
+ * We detect corrupted subbuffers with commit and reserve counts. We keep a
+ * corrupted sub-buffers count and push the readers across these sub-buffers.
+ *
+ * Not concurrency safe if a writer is stalled in a subbuffer and another writer
+ * switches in, finding out it's corrupted.  The result will be that the old
+ * (uncommitted) subbuffer will be declared corrupted, and that the new subbuffer
+ * will be declared corrupted too because of the commit count adjustment.
+ */
+static inline void ltt_reserve_end_switch_current(
+		struct ltt_channel_struct *ltt_channel,
+		struct ltt_channel_buf_struct *ltt_buf, struct rchan *rchan,
+		struct rchan_buf *buf,
+		struct ltt_reserve_switch_offsets *offsets, u64 *tsc)
+{
+	long endidx = SUBBUF_INDEX(offsets->end - 1, rchan);
+
+	ltt_channel->buffer_end(buf, *tsc, offsets->end, endidx);
+	ltt_buf->commit_count[endidx] +=
+		rchan->subbuf_size
+		- (SUBBUF_OFFSET(offsets->end - 1, rchan)
+		+ 1);
+	offsets->commit_count = ltt_buf->commit_count[endidx];
+	if ((BUFFER_TRUNC(offsets->end - 1, rchan)
+			>> ltt_channel->n_subbufs_order)
+			- ((offsets->commit_count - rchan->subbuf_size)
+			   & ltt_channel->commit_count_mask) == 0)
+		ltt_deliver(buf, endidx, NULL);
+}
+
+/**
+ * ltt_relay_reserve_slot - Atomic slot reservation in a LTTng buffer.
+ * @trace : the trace structure to log to.
+ * @ltt_channel : channel structure
+ * @transport_data : data structure specific to ltt relay
+ * @data_size : size of the variable length data to log.
+ * @slot_size : pointer to total size of the slot (out)
+ * @buf_offset : pointer to reserved buffer offset (out)
+ * @tsc : pointer to the tsc at the slot reservation (out)
+ * @cpu : cpuid
+ *
+ * Return : -ENOSPC if not enough space, else returns 0.
+ *
+ * It will take care of sub-buffer switching.
+ */
+static notrace int ltt_relay_reserve_slot(struct ltt_trace_struct *trace,
+		struct ltt_channel_struct *ltt_channel, void **transport_data,
+		size_t data_size, size_t *slot_size, long *buf_offset, u64 *tsc,
+		unsigned int *rflags, int largest_align, int cpu)
+{
+	struct rchan *rchan = ltt_channel->trans_channel_data;
+	struct rchan_buf *buf = *transport_data = rchan->buf[cpu];
+	struct ltt_channel_buf_struct *ltt_buf = buf->chan_private;
+	struct ltt_reserve_switch_offsets offsets;
+	unsigned long flags;
+
+	raw_local_irq_save(flags);
+	__raw_spin_lock(&ltt_buf->lock);
+
+	offsets.reserve_commit_diff = 0;
+	offsets.size = 0;
+
+	/*
+	 * Perform retryable operations.
+	 */
+	if (__get_cpu_var(ltt_nesting) > 4) {
+		ltt_buf->events_lost++;
+		__raw_spin_unlock(&ltt_buf->lock);
+		raw_local_irq_restore(flags);
+		return -EPERM;
+	}
+
+	if (ltt_relay_try_reserve(ltt_channel, ltt_buf,
+			rchan, buf, &offsets, data_size, tsc, rflags,
+			largest_align)) {
+		__raw_spin_unlock(&ltt_buf->lock);
+		raw_local_irq_restore(flags);
+		return -ENOSPC;
+	}
+	ltt_buf->offset = offsets.end;
+
+	save_last_tsc(ltt_buf, *tsc);
+
+	/*
+	 * Push the reader if necessary
+	 */
+	ltt_reserve_push_reader(ltt_channel, ltt_buf, rchan, buf, &offsets);
+
+	/*
+	 * Switch old subbuffer if needed.
+	 */
+	if (offsets.end_switch_old)
+		ltt_reserve_switch_old_subbuf(ltt_channel, ltt_buf, rchan, buf,
+			&offsets, tsc);
+
+	/*
+	 * Populate new subbuffer.
+	 */
+	if (offsets.begin_switch)
+		ltt_reserve_switch_new_subbuf(ltt_channel, ltt_buf, rchan,
+			buf, &offsets, tsc);
+
+	if (offsets.end_switch_current)
+		ltt_reserve_end_switch_current(ltt_channel, ltt_buf, rchan,
+			buf, &offsets, tsc);
+
+	ltt_buf->irqflags = flags;
+	*slot_size = offsets.size;
+	*buf_offset = offsets.begin + offsets.before_hdr_pad;
+	return 0;
+}
+
+/*
+ * Force a sub-buffer switch for a per-cpu buffer. This operation is
+ * completely reentrant : can be called while tracing is active with
+ * absolutely no lock held.
+ */
+static notrace void ltt_force_switch(struct rchan_buf *buf,
+		enum force_switch_mode mode)
+{
+	struct ltt_channel_struct *ltt_channel =
+			(struct ltt_channel_struct *)buf->chan->private_data;
+	struct ltt_channel_buf_struct *ltt_buf = buf->chan_private;
+	struct rchan *rchan = ltt_channel->trans_channel_data;
+	struct ltt_reserve_switch_offsets offsets;
+	unsigned long flags;
+	u64 tsc;
+
+	offsets.reserve_commit_diff = 0;
+	offsets.size = 0;
+
+	raw_local_irq_save(flags);
+	__raw_spin_lock(&ltt_buf->lock);
+
+	/*
+	 * Perform retryable operations.
+	 */
+	if (ltt_relay_try_switch(mode, ltt_channel, ltt_buf,
+			rchan, buf, &offsets, &tsc)) {
+		__raw_spin_unlock(&ltt_buf->lock);
+		raw_local_irq_restore(flags);
+		return;
+	}
+	ltt_buf->offset = offsets.end;
+
+	save_last_tsc(ltt_buf, tsc);
+
+	/*
+	 * Push the reader if necessary
+	 */
+	if (mode == FORCE_ACTIVE)
+		ltt_reserve_push_reader(ltt_channel, ltt_buf, rchan,
+					buf, &offsets);
+
+	/*
+	 * Switch old subbuffer if needed.
+	 */
+	if (offsets.end_switch_old)
+		ltt_reserve_switch_old_subbuf(ltt_channel, ltt_buf, rchan, buf,
+			&offsets, &tsc);
+
+	/*
+	 * Populate new subbuffer.
+	 */
+	if (mode == FORCE_ACTIVE)
+		ltt_reserve_switch_new_subbuf(ltt_channel,
+			ltt_buf, rchan, buf, &offsets, &tsc);
+
+	__raw_spin_unlock(&ltt_buf->lock);
+	raw_local_irq_restore(flags);
+}
+
+/*
+ * For flight recording. Must be called after relay_commit.
+ * This function decrements the subbuffer's lost_size each time the commit
+ * count reaches back the reserve offset (modulo subbuffer size). It is useful
+ * for crash dumps.
+ * We use slot_size - 1 to make sure we deal correctly with the case where we
+ * fill the subbuffer completely (so the subbuf index stays in the previous
+ * subbuffer).
+ */
+#ifdef CONFIG_LTT_VMCORE
+static inline void ltt_write_commit_counter(struct rchan_buf *buf,
+		long buf_offset, size_t slot_size)
+{
+	struct ltt_channel_buf_struct *ltt_buf = buf->chan_private;
+	struct ltt_subbuffer_header *header;
+	long offset, subbuf_idx, commit_count;
+	uint32_t lost_old, lost_new;
+
+	subbuf_idx = SUBBUF_INDEX(buf_offset - 1, buf->chan);
+	offset = buf_offset + slot_size;
+	header = (struct ltt_subbuffer_header *)
+			ltt_relay_offset_address(buf,
+				subbuf_idx * buf->chan->subbuf_size);
+	for (;;) {
+		lost_old = header->lost_size;
+		commit_count = ltt_buf->commit_count[subbuf_idx];
+		/* SUBBUF_OFFSET includes commit_count_mask */
+		if (!SUBBUF_OFFSET(offset - commit_count, buf->chan)) {
+			lost_new = (uint32_t)buf->chan->subbuf_size
+				   - SUBBUF_OFFSET(commit_count, buf->chan);
+			lost_old = cmpxchg_local(&header->lost_size, lost_old,
+							lost_new);
+			if (lost_old <= lost_new)
+				break;
+		} else {
+			break;
+		}
+	}
+}
+#else
+static inline void ltt_write_commit_counter(struct rchan_buf *buf,
+		long buf_offset, size_t slot_size)
+{
+}
+#endif
+
+/*
+ * Atomic unordered slot commit. Increments the commit count in the
+ * specified sub-buffer, and delivers it if necessary.
+ *
+ * Parameters:
+ *
+ * @ltt_channel : channel structure
+ * @transport_data: transport-specific data
+ * @buf_offset : offset following the event header.
+ * @slot_size : size of the reserved slot.
+ */
+static notrace void ltt_relay_commit_slot(
+		struct ltt_channel_struct *ltt_channel,
+		void **transport_data, long buf_offset, size_t slot_size)
+{
+	struct rchan_buf *buf = *transport_data;
+	struct ltt_channel_buf_struct *ltt_buf = buf->chan_private;
+	struct rchan *rchan = buf->chan;
+	unsigned int offset_end = buf_offset;
+	long endidx = SUBBUF_INDEX(offset_end - 1, rchan);
+	long commit_count;
+
+	ltt_buf->commit_count[endidx] += slot_size;
+	commit_count = ltt_buf->commit_count[endidx];
+	/* Check if all commits have been done */
+	if ((BUFFER_TRUNC(offset_end - 1, rchan)
+			>> ltt_channel->n_subbufs_order)
+			- ((commit_count - rchan->subbuf_size)
+			   & ltt_channel->commit_count_mask) == 0)
+		ltt_deliver(buf, endidx, NULL);
+	/*
+	 * Update lost_size for each commit. It's needed only for extracting
+	 * ltt buffers from vmcore, after crash.
+	 */
+	ltt_write_commit_counter(buf, buf_offset, slot_size);
+	__raw_spin_unlock(&ltt_buf->lock);
+	raw_local_irq_restore(ltt_buf->irqflags);
+}
+
+/*
+ * This is called with preemption disabled when user space has requested
+ * blocking mode.  If one of the active traces has free space below a
+ * specific threshold value, we reenable preemption and block.
+ */
+static int ltt_relay_user_blocking(struct ltt_trace_struct *trace,
+		unsigned int chan_index, size_t data_size,
+		struct user_dbg_data *dbg)
+{
+	struct rchan *rchan;
+	struct ltt_channel_buf_struct *ltt_buf;
+	struct ltt_channel_struct *channel;
+	struct rchan_buf *relay_buf;
+	int cpu;
+	DECLARE_WAITQUEUE(wait, current);
+
+	channel = &trace->channels[chan_index];
+	rchan = channel->trans_channel_data;
+	cpu = smp_processor_id();
+	relay_buf = rchan->buf[cpu];
+	ltt_buf = relay_buf->chan_private;
+
+	/*
+	 * Check if data is too big for the channel : do not
+	 * block for it.
+	 */
+	if (LTT_RESERVE_CRITICAL + data_size > relay_buf->chan->subbuf_size)
+		return 0;
+
+	/*
+	 * If free space too low, we block. We restart from the
+	 * beginning after we resume (cpu id may have changed
+	 * while preemption is active).
+	 */
+	local_irq_disable();
+	__raw_spin_lock(&ltt_buf->lock);
+	if (!channel->overwrite) {
+		dbg->write = ltt_buf->offset;
+		dbg->read = ltt_buf->consumed;
+		dbg->avail_size = dbg->write + LTT_RESERVE_CRITICAL + data_size
+				  - SUBBUF_TRUNC(dbg->read,
+						 relay_buf->chan);
+		if (dbg->avail_size > rchan->alloc_size) {
+			__set_current_state(TASK_INTERRUPTIBLE);
+			add_wait_queue(&ltt_buf->write_wait, &wait);
+			__raw_spin_unlock(&ltt_buf->lock);
+			local_irq_enable();
+			preempt_enable();
+			schedule();
+			__set_current_state(TASK_RUNNING);
+			remove_wait_queue(&ltt_buf->write_wait, &wait);
+			if (signal_pending(current))
+				return -ERESTARTSYS;
+			preempt_disable();
+			return 1;
+		}
+	}
+	__raw_spin_unlock(&ltt_buf->lock);
+	local_irq_enable();
+	return 0;
+}
+
+static void ltt_relay_print_user_errors(struct ltt_trace_struct *trace,
+		unsigned int chan_index, size_t data_size,
+		struct user_dbg_data *dbg, int cpu)
+{
+	struct rchan *rchan;
+	struct ltt_channel_buf_struct *ltt_buf;
+	struct ltt_channel_struct *channel;
+	struct rchan_buf *relay_buf;
+
+	channel = &trace->channels[chan_index];
+	rchan = channel->trans_channel_data;
+	relay_buf = rchan->buf[cpu];
+	ltt_buf = relay_buf->chan_private;
+
+	printk(KERN_ERR "Error in LTT usertrace : "
+	"buffer full : event lost in blocking "
+	"mode. Increase LTT_RESERVE_CRITICAL.\n");
+	printk(KERN_ERR "LTT nesting level is %u.\n",
+		per_cpu(ltt_nesting, cpu));
+	printk(KERN_ERR "LTT avail size %lu.\n",
+		dbg->avail_size);
+	printk(KERN_ERR "avail write : %lu, read : %lu\n",
+			dbg->write, dbg->read);
+
+	dbg->write = ltt_buf->offset;
+	dbg->read = ltt_buf->consumed;
+
+	printk(KERN_ERR "LTT cur size %lu.\n",
+		dbg->write + LTT_RESERVE_CRITICAL + data_size
+		- SUBBUF_TRUNC(dbg->read, relay_buf->chan));
+	printk(KERN_ERR "cur write : %lu, read : %lu\n",
+			dbg->write, dbg->read);
+}
+
+static struct ltt_transport ltt_relay_transport = {
+	.name = "relay-locked",
+	.owner = THIS_MODULE,
+	.ops = {
+		.create_dirs = ltt_relay_create_dirs,
+		.remove_dirs = ltt_relay_remove_dirs,
+		.create_channel = ltt_relay_create_channel,
+		.finish_channel = ltt_relay_finish_channel,
+		.remove_channel = ltt_relay_remove_channel,
+		.wakeup_channel = ltt_relay_async_wakeup_chan,
+		.commit_slot = ltt_relay_commit_slot,
+		.reserve_slot = ltt_relay_reserve_slot,
+		.user_blocking = ltt_relay_user_blocking,
+		.user_errors = ltt_relay_print_user_errors,
+	},
+};
+
+static const struct file_operations ltt_file_operations = {
+	.open = ltt_open,
+	.release = ltt_release,
+	.poll = ltt_poll,
+	.splice_read = ltt_relay_file_splice_read,
+	.ioctl = ltt_ioctl,
+#ifdef CONFIG_COMPAT
+	.compat_ioctl = ltt_compat_ioctl,
+#endif
+};
+
+static int __init ltt_relay_init(void)
+{
+	printk(KERN_INFO "LTT : ltt-relay-locked init\n");
+
+	ltt_transport_register(&ltt_relay_transport);
+
+	return 0;
+}
+
+static void __exit ltt_relay_exit(void)
+{
+	printk(KERN_INFO "LTT : ltt-relay-locked exit\n");
+
+	ltt_transport_unregister(&ltt_relay_transport);
+}
+
+module_init(ltt_relay_init);
+module_exit(ltt_relay_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Mathieu Desnoyers");
+MODULE_DESCRIPTION("Linux Trace Toolkit Next Generation Locked Relay");

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [RFC patch 17/41] LTTng - serialization
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (15 preceding siblings ...)
  2009-03-05 22:47 ` [RFC patch 16/41] LTTng Transport Locked Mathieu Desnoyers
@ 2009-03-05 22:47 ` Mathieu Desnoyers
  2009-03-05 22:47 ` [RFC patch 18/41] Seq_file add support for sorted list Mathieu Desnoyers
                   ` (25 subsequent siblings)
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:47 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers

[-- Attachment #1: lttng-serialize.patch --]
[-- Type: text/plain, Size: 18521 bytes --]

Generic serialization mechanism : takes the variable argument list and
serializes it into the trace buffers.
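For readers unfamiliar with the approach, here is a minimal, self-contained
userspace sketch of the idea (hypothetical illustration only, not the kernel
code : it ignores alignment, sign extension and the trace-type annotations
that the real ltt_vtrace handles):

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>
#include <string.h>

/*
 * Toy va_list-driven serializer : walks a printf-like format string and
 * appends the binary representation of each argument to buf. Returns the
 * number of bytes needed, so a first pass with buf == NULL can size the
 * reservation slot. Only %d, %lu and %s are handled in this sketch.
 */
static size_t mini_serialize(char *buf, size_t bufsz, const char *fmt, ...)
{
	va_list args;
	size_t off = 0;

	va_start(args, fmt);
	for (; *fmt; fmt++) {
		if (*fmt != '%')
			continue;	/* field names are metadata only */
		fmt++;
		if (*fmt == '\0')
			break;
		if (*fmt == 'd') {	/* signed int, stored as int32_t */
			int32_t v = va_arg(args, int);
			if (buf && off + sizeof(v) <= bufsz)
				memcpy(buf + off, &v, sizeof(v));
			off += sizeof(v);
		} else if (*fmt == 'l' && *(fmt + 1) == 'u') {
			unsigned long v = va_arg(args, unsigned long);
			fmt++;
			if (buf && off + sizeof(v) <= bufsz)
				memcpy(buf + off, &v, sizeof(v));
			off += sizeof(v);
		} else if (*fmt == 's') {	/* string with its NUL */
			const char *s = va_arg(args, const char *);
			size_t len = strlen(s) + 1;
			if (buf && off + len <= bufsz)
				memcpy(buf + off, s, len);
			off += len;
		}
	}
	va_end(args);
	return off;
}
```

A call like mini_serialize(buf, sizeof(buf), "pid %d comm %s", 42, "bash")
writes 4 bytes of pid followed by the 5-byte string; the tracer records the
format string once as metadata so the binary stream can be decoded offline.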

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 ltt/ltt-serialize.c |  685 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 685 insertions(+)
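The format strings handled by this patch pair a fixed-size trace type with a
C type, e.g. "#4u%lu" for a value read as unsigned long but written to the
trace as a uint32_t. As a simplified, illustrative sketch of the trace-type
prefix parsing (the real parse_trace_type below also handles the h/l/L/z/t
qualifiers and the 'n' network-byte-order attribute):

```c
#include <assert.h>

/*
 * Illustrative parser for the "#<size><sign>" trace-type prefix :
 * "#4u" means a 4-byte unsigned integer in the trace. Returns a pointer
 * past the prefix, leaving the C-type conversion ("%lu") to be parsed
 * separately, as the real serializer does.
 */
static const char *parse_fixed_trace_type(const char *fmt, int *size,
					  int *is_signed)
{
	*size = 0;		/* 0 means : no fixed-size override */
	*is_signed = 0;
	if (*fmt != '#')
		return fmt;
	fmt++;
	if (*fmt == '1' || *fmt == '2' || *fmt == '4' || *fmt == '8')
		*size = *fmt++ - '0';
	if (*fmt == 'd') {
		*is_signed = 1;
		fmt++;
	} else if (*fmt == 'u') {
		fmt++;
	}
	return fmt;
}
```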

Index: linux-2.6-lttng/ltt/ltt-serialize.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/ltt/ltt-serialize.c	2009-03-04 14:07:58.000000000 -0500
@@ -0,0 +1,685 @@
+/*
+ * LTTng serializing code.
+ *
+ * Copyright Mathieu Desnoyers, March 2007.
+ *
+ * Licensed under the GPLv2.
+ *
+ * See the discussion about the weirdness of passing a va_list (rather than a
+ * va_list *) to functions, related to array argument passing : va_list seems
+ * to be implemented as an array on x86_64, but not on i386. This is why we
+ * pass a va_list * to ltt_vtrace.
+ */
+
+#include <stdarg.h>
+#include <linux/ctype.h>
+#include <linux/string.h>
+#include <linux/module.h>
+#include <linux/ltt-tracer.h>
+
+enum ltt_type {
+	LTT_TYPE_SIGNED_INT,
+	LTT_TYPE_UNSIGNED_INT,
+	LTT_TYPE_STRING,
+	LTT_TYPE_NONE,
+};
+
+#define LTT_ATTRIBUTE_NETWORK_BYTE_ORDER (1<<1)
+
+/*
+ * Inspired by vsnprintf.
+ *
+ * The serialization format string supports the basic printf format strings.
+ * In addition, it defines new formats that can be used to serialize more
+ * complex/non portable data structures.
+ *
+ * Typical use:
+ *
+ * field_name %ctype
+ * field_name #tracetype %ctype
+ * field_name #tracetype %ctype1 %ctype2 ...
+ *
+ * A conversion is performed between format string types supported by GCC and
+ * the trace type requested. GCC type is used to perform type checking on format
+ * strings. Trace type is used to specify the exact binary representation
+ * in the trace. A mapping is done between one or more GCC types to one trace
+ * type. Sign extension, if required by the conversion, is performed following
+ * the trace type.
+ *
+ * If a gcc format is not declared with a trace format, the gcc format is
+ * also used as binary representation in the trace.
+ *
+ * Strings are supported with %s.
+ * A single tracetype (sequence) can take multiple c types as parameter.
+ *
+ * c types:
+ *
+ * see printf(3).
+ *
+ * Note: to write a uint32_t in a trace, the following expression is recommended
+ * so it is portable:
+ *
+ * ("#4u%lu", (unsigned long)var)
+ *
+ * trace types:
+ *
+ * Serialization specific formats :
+ *
+ * Fixed size integers
+ * #1u     writes uint8_t
+ * #2u     writes uint16_t
+ * #4u     writes uint32_t
+ * #8u     writes uint64_t
+ * #1d     writes int8_t
+ * #2d     writes int16_t
+ * #4d     writes int32_t
+ * #8d     writes int64_t
+ * i.e.:
+ * #1u%lu #2u%lu #4d%lu #8d%lu #llu%hu #d%lu
+ *
+ * Attributes:
+ *
+ * n:  (for network byte order)
+ * #ntracetype%ctype
+ *            is written in the trace in network byte order.
+ *
+ * i.e.: #bn4u%lu, #n%lu, #b%u
+ *
+ * TODO (eventually)
+ * Variable length sequence
+ * #a #tracetype1 #tracetype2 %array_ptr %elem_size %num_elems
+ *            In the trace:
+ *            #a specifies that this is a sequence
+ *            #tracetype1 is the type of elements in the sequence
+ *            #tracetype2 is the type of the element count
+ *            GCC input:
+ *            array_ptr is a pointer to an array that contains members of size
+ *            elem_size.
+ *            num_elems is the number of elements in the array.
+ * i.e.: #a #lu #lu %p %lu %u
+ *
+ * Callback
+ * #k         callback (taken from the probe data)
+ *            The following % arguments are expected by the callback
+ *
+ * i.e.: #a #lu #lu #k %p
+ *
+ * Note: No conversion is done from floats to integers, nor from integers to
+ * floats between c types and trace types. float conversion from double to float
+ * or from float to double is also not supported.
+ *
+ * REMOVE
+ * %*b     expects sizeof(data), data
+ *         where sizeof(data) is 1, 2, 4 or 8
+ *
+ * Fixed length struct, union or array.
+ * FIXME: unable to extract those sizes statically.
+ * %*r     expects sizeof(*ptr), ptr
+ * %*.*r   expects sizeof(*ptr), __alignof__(*ptr), ptr
+ * struct and unions removed.
+ * Fixed length array:
+ * [%p]#a[len #tracetype]
+ * i.e.: [%p]#a[12 #lu]
+ *
+ * Variable length sequence
+ * %*.*:*v expects sizeof(*ptr), __alignof__(*ptr), elem_num, ptr
+ *         where elem_num is the number of elements in the sequence
+ */
+static inline const char *parse_trace_type(const char *fmt,
+		char *trace_size, enum ltt_type *trace_type,
+		unsigned long *attributes)
+{
+	int qualifier;		/* 'h', 'l', or 'L' for integer fields */
+				/* 'z' support added 23/7/1999 S.H.    */
+				/* 'z' changed to 'Z' --davidm 1/25/99 */
+				/* 't' added for ptrdiff_t */
+
+	/* parse attributes. */
+repeat:
+	switch (*fmt) {
+	case 'n':
+		*attributes |= LTT_ATTRIBUTE_NETWORK_BYTE_ORDER;
+		++fmt;
+		goto repeat;
+	}
+
+	/* get the conversion qualifier */
+	qualifier = -1;
+	if (*fmt == 'h' || *fmt == 'l' || *fmt == 'L' ||
+	    *fmt == 'Z' || *fmt == 'z' || *fmt == 't' ||
+	    *fmt == 'S' || *fmt == '1' || *fmt == '2' ||
+	    *fmt == '4' || *fmt == '8') {
+		qualifier = *fmt;
+		++fmt;
+		if (qualifier == 'l' && *fmt == 'l') {
+			qualifier = 'L';
+			++fmt;
+		}
+	}
+
+	switch (*fmt) {
+	case 'c':
+		*trace_type = LTT_TYPE_UNSIGNED_INT;
+		*trace_size = sizeof(unsigned char);
+		goto parse_end;
+	case 's':
+		*trace_type = LTT_TYPE_STRING;
+		goto parse_end;
+	case 'p':
+		*trace_type = LTT_TYPE_UNSIGNED_INT;
+		*trace_size = sizeof(void *);
+		goto parse_end;
+	case 'd':
+	case 'i':
+		*trace_type = LTT_TYPE_SIGNED_INT;
+		break;
+	case 'o':
+	case 'u':
+	case 'x':
+	case 'X':
+		*trace_type = LTT_TYPE_UNSIGNED_INT;
+		break;
+	default:
+		if (!*fmt)
+			--fmt;
+		goto parse_end;
+	}
+	switch (qualifier) {
+	case 'L':
+		*trace_size = sizeof(long long);
+		break;
+	case 'l':
+		*trace_size = sizeof(long);
+		break;
+	case 'Z':
+	case 'z':
+		*trace_size = sizeof(size_t);
+		break;
+	case 't':
+		*trace_size = sizeof(ptrdiff_t);
+		break;
+	case 'h':
+		*trace_size = sizeof(short);
+		break;
+	case '1':
+		*trace_size = sizeof(uint8_t);
+		break;
+	case '2':
+		*trace_size = sizeof(uint16_t);
+		break;
+	case '4':
+		*trace_size = sizeof(uint32_t);
+		break;
+	case '8':
+		*trace_size = sizeof(uint64_t);
+		break;
+	default:
+		*trace_size = sizeof(int);
+	}
+
+parse_end:
+	return fmt;
+}
+
+/*
+ * Restrictions:
+ * Field width and precision are *not* supported.
+ * %n not supported.
+ */
+static inline const char *parse_c_type(const char *fmt,
+		char *c_size, enum ltt_type *c_type)
+{
+	int qualifier;		/* 'h', 'l', or 'L' for integer fields */
+				/* 'z' support added 23/7/1999 S.H.    */
+				/* 'z' changed to 'Z' --davidm 1/25/99 */
+				/* 't' added for ptrdiff_t */
+
+	/* process flags : ignore standard print formats for now. */
+repeat:
+	switch (*fmt) {
+	case '-':
+	case '+':
+	case ' ':
+	case '#':
+	case '0':
+		++fmt;
+		goto repeat;
+	}
+
+	/* get the conversion qualifier */
+	qualifier = -1;
+	if (*fmt == 'h' || *fmt == 'l' || *fmt == 'L' ||
+	    *fmt == 'Z' || *fmt == 'z' || *fmt == 't' ||
+	    *fmt == 'S') {
+		qualifier = *fmt;
+		++fmt;
+		if (qualifier == 'l' && *fmt == 'l') {
+			qualifier = 'L';
+			++fmt;
+		}
+	}
+
+	switch (*fmt) {
+	case 'c':
+		*c_type = LTT_TYPE_UNSIGNED_INT;
+		*c_size = sizeof(unsigned char);
+		goto parse_end;
+	case 's':
+		*c_type = LTT_TYPE_STRING;
+		goto parse_end;
+	case 'p':
+		*c_type = LTT_TYPE_UNSIGNED_INT;
+		*c_size = sizeof(void *);
+		goto parse_end;
+	case 'd':
+	case 'i':
+		*c_type = LTT_TYPE_SIGNED_INT;
+		break;
+	case 'o':
+	case 'u':
+	case 'x':
+	case 'X':
+		*c_type = LTT_TYPE_UNSIGNED_INT;
+		break;
+	default:
+		if (!*fmt)
+			--fmt;
+		goto parse_end;
+	}
+	switch (qualifier) {
+	case 'L':
+		*c_size = sizeof(long long);
+		break;
+	case 'l':
+		*c_size = sizeof(long);
+		break;
+	case 'Z':
+	case 'z':
+		*c_size = sizeof(size_t);
+		break;
+	case 't':
+		*c_size = sizeof(ptrdiff_t);
+		break;
+	case 'h':
+		*c_size = sizeof(short);
+		break;
+	default:
+		*c_size = sizeof(int);
+	}
+
+parse_end:
+	return fmt;
+}
+
+static inline size_t serialize_trace_data(struct rchan_buf *buf,
+		size_t buf_offset,
+		char trace_size, enum ltt_type trace_type,
+		char c_size, enum ltt_type c_type,
+		int *largest_align, va_list *args)
+{
+	union {
+		unsigned long v_ulong;
+		uint64_t v_uint64;
+		struct {
+			const char *s;
+			size_t len;
+		} v_string;
+	} tmp;
+
+	/*
+	 * Be careful about sign extension here.
+	 * Sign extension is done with the destination (trace) type.
+	 */
+	switch (trace_type) {
+	case LTT_TYPE_SIGNED_INT:
+		switch (c_size) {
+		case 1:
+			tmp.v_ulong = (long)(int8_t)va_arg(*args, int);
+			break;
+		case 2:
+			tmp.v_ulong = (long)(int16_t)va_arg(*args, int);
+			break;
+		case 4:
+			tmp.v_ulong = (long)(int32_t)va_arg(*args, int);
+			break;
+		case 8:
+			tmp.v_uint64 = va_arg(*args, int64_t);
+			break;
+		default:
+			BUG();
+		}
+		break;
+	case LTT_TYPE_UNSIGNED_INT:
+		switch (c_size) {
+		case 1:
+			tmp.v_ulong = (unsigned long)(uint8_t)
+					va_arg(*args, unsigned int);
+			break;
+		case 2:
+			tmp.v_ulong = (unsigned long)(uint16_t)
+					va_arg(*args, unsigned int);
+			break;
+		case 4:
+			tmp.v_ulong = (unsigned long)(uint32_t)
+					va_arg(*args, unsigned int);
+			break;
+		case 8:
+			tmp.v_uint64 = va_arg(*args, uint64_t);
+			break;
+		default:
+			BUG();
+		}
+		break;
+	case LTT_TYPE_STRING:
+		tmp.v_string.s = va_arg(*args, const char *);
+		if ((unsigned long)tmp.v_string.s < PAGE_SIZE)
+			tmp.v_string.s = "<NULL>";
+		tmp.v_string.len = strlen(tmp.v_string.s)+1;
+		if (buf)
+			ltt_relay_write(buf, buf_offset, tmp.v_string.s,
+				tmp.v_string.len);
+		buf_offset += tmp.v_string.len;
+		goto copydone;
+	default:
+		BUG();
+	}
+
+	/*
+	 * If trace_size is lower than or equal to 4 bytes, there is no sign
+	 * extension to do because the value is already encoded in a long.
+	 * Therefore, we can combine signed and unsigned ops. 4-byte floats
+	 * also work with this, because we do a simple copy of 4 bytes into
+	 * 4 bytes without manipulation (and we do not support conversion
+	 * from integers to floats).
+	 * The same holds if c_size is 8 bytes, which is the largest possible
+	 * integer.
+	 */
+	if (ltt_get_alignment()) {
+		buf_offset += ltt_align(buf_offset, trace_size);
+		if (largest_align)
+			*largest_align = max_t(int, *largest_align, trace_size);
+	}
+	if (trace_size <= 4 || c_size == 8) {
+		if (buf) {
+			switch (trace_size) {
+			case 1:
+				if (c_size == 8)
+					ltt_relay_write(buf, buf_offset,
+					(uint8_t[]){ (uint8_t)tmp.v_uint64 },
+					sizeof(uint8_t));
+				else
+					ltt_relay_write(buf, buf_offset,
+					(uint8_t[]){ (uint8_t)tmp.v_ulong },
+					sizeof(uint8_t));
+				break;
+			case 2:
+				if (c_size == 8)
+					ltt_relay_write(buf, buf_offset,
+					(uint16_t[]){ (uint16_t)tmp.v_uint64 },
+					sizeof(uint16_t));
+				else
+					ltt_relay_write(buf, buf_offset,
+					(uint16_t[]){ (uint16_t)tmp.v_ulong },
+					sizeof(uint16_t));
+				break;
+			case 4:
+				if (c_size == 8)
+					ltt_relay_write(buf, buf_offset,
+					(uint32_t[]){ (uint32_t)tmp.v_uint64 },
+					sizeof(uint32_t));
+				else
+					ltt_relay_write(buf, buf_offset,
+					(uint32_t[]){ (uint32_t)tmp.v_ulong },
+					sizeof(uint32_t));
+				break;
+			case 8:
+				/*
+				 * c_size cannot be other than 8 here because
+				 * trace_size > 4.
+				 */
+				ltt_relay_write(buf, buf_offset,
+				(uint64_t[]){ (uint64_t)tmp.v_uint64 },
+				sizeof(uint64_t));
+				break;
+			default:
+				BUG();
+			}
+		}
+		buf_offset += trace_size;
+		goto copydone;
+	} else {
+		/*
+		 * Perform sign extension.
+		 */
+		if (buf) {
+			switch (trace_type) {
+			case LTT_TYPE_SIGNED_INT:
+				ltt_relay_write(buf, buf_offset,
+					(int64_t[]){ (int64_t)tmp.v_ulong },
+					sizeof(int64_t));
+				break;
+			case LTT_TYPE_UNSIGNED_INT:
+				ltt_relay_write(buf, buf_offset,
+					(uint64_t[]){ (uint64_t)tmp.v_ulong },
+					sizeof(uint64_t));
+				break;
+			default:
+				BUG();
+			}
+		}
+		buf_offset += trace_size;
+		goto copydone;
+	}
+
+copydone:
+	return buf_offset;
+}
+
+notrace size_t ltt_serialize_data(struct rchan_buf *buf, size_t buf_offset,
+			struct ltt_serialize_closure *closure,
+			void *serialize_private, int *largest_align,
+			const char *fmt, va_list *args)
+{
+	char trace_size = 0, c_size = 0;	/*
+						 * 0 (unset), 1, 2, 4, 8 bytes.
+						 */
+	enum ltt_type trace_type = LTT_TYPE_NONE, c_type = LTT_TYPE_NONE;
+	unsigned long attributes = 0;
+
+	for (; *fmt ; ++fmt) {
+		switch (*fmt) {
+		case '#':
+			/* tracetypes (#) */
+			++fmt;			/* skip first '#' */
+			if (*fmt == '#')	/* Escaped ## */
+				break;
+			attributes = 0;
+			fmt = parse_trace_type(fmt, &trace_size, &trace_type,
+				&attributes);
+			break;
+		case '%':
+			/* c types (%) */
+			++fmt;			/* skip first '%' */
+			if (*fmt == '%')	/* Escaped %% */
+				break;
+			fmt = parse_c_type(fmt, &c_size, &c_type);
+			/*
+			 * Output C types if no trace type has been
+			 * specified.
+			 */
+			if (!trace_size)
+				trace_size = c_size;
+			if (trace_type == LTT_TYPE_NONE)
+				trace_type = c_type;
+			if (c_type == LTT_TYPE_STRING)
+				trace_type = LTT_TYPE_STRING;
+			/* perform trace write */
+			buf_offset = serialize_trace_data(buf,
+						buf_offset, trace_size,
+						trace_type, c_size, c_type,
+						largest_align, args);
+			trace_size = 0;
+			c_size = 0;
+			trace_type = LTT_TYPE_NONE;
+			c_type = LTT_TYPE_NONE;
+			attributes = 0;
+			break;
+			/* default is to skip the text, doing nothing */
+		}
+	}
+	return buf_offset;
+}
+EXPORT_SYMBOL_GPL(ltt_serialize_data);
+
+/*
+ * Calculate data size
+ * Assume that the padding for alignment starts at a sizeof(void *) address.
+ */
+static notrace size_t ltt_get_data_size(struct ltt_serialize_closure *closure,
+				void *serialize_private, int *largest_align,
+				const char *fmt, va_list *args)
+{
+	ltt_serialize_cb cb = closure->callbacks[0];
+	closure->cb_idx = 0;
+	return (size_t)cb(NULL, 0, closure, serialize_private,
+				largest_align, fmt, args);
+}
+
+static notrace
+void ltt_write_event_data(struct rchan_buf *buf, size_t buf_offset,
+				struct ltt_serialize_closure *closure,
+				void *serialize_private, int largest_align,
+				const char *fmt, va_list *args)
+{
+	ltt_serialize_cb cb = closure->callbacks[0];
+	closure->cb_idx = 0;
+	buf_offset += ltt_align(buf_offset, largest_align);
+	cb(buf, buf_offset, closure, serialize_private, NULL, fmt, args);
+}
+
+
+notrace void ltt_vtrace(const struct marker *mdata, void *probe_data,
+			void *call_data, const char *fmt, va_list *args)
+{
+	int largest_align, ret;
+	struct ltt_active_marker *pdata;
+	uint16_t eID;
+	size_t data_size, slot_size;
+	unsigned int chan_index;
+	struct ltt_channel_struct *channel;
+	struct ltt_trace_struct *trace, *dest_trace = NULL;
+	struct rchan_buf *buf;
+	void *transport_data;
+	uint64_t tsc;
+	long buf_offset;
+	va_list args_copy;
+	struct ltt_serialize_closure closure;
+	struct ltt_probe_private_data *private_data = call_data;
+	void *serialize_private = NULL;
+	int cpu;
+	unsigned int rflags;
+
+	/*
+	 * This test is useful for quickly exiting static tracing when no trace
+	 * is active. We expect to have an active trace when we get here.
+	 */
+	if (unlikely(ltt_traces.num_active_traces == 0))
+		return;
+
+	rcu_read_lock_sched_notrace();
+	cpu = smp_processor_id();
+	__get_cpu_var(ltt_nesting)++;
+
+	pdata = (struct ltt_active_marker *)probe_data;
+	eID = mdata->event_id;
+	chan_index = mdata->channel_id;
+	closure.callbacks = pdata->probe->callbacks;
+
+	if (unlikely(private_data)) {
+		dest_trace = private_data->trace;
+		if (private_data->serializer)
+			closure.callbacks = &private_data->serializer;
+		serialize_private = private_data->serialize_private;
+	}
+
+	va_copy(args_copy, *args);
+	/*
+	 * Assume the event payload starts on a largest_align boundary.
+	 */
+	largest_align = 1;	/* must be non-zero for ltt_align */
+	data_size = ltt_get_data_size(&closure, serialize_private,
+					&largest_align, fmt, &args_copy);
+	largest_align = min_t(int, largest_align, sizeof(void *));
+	va_end(args_copy);
+
+	/* Iterate on each trace */
+	list_for_each_entry_rcu(trace, &ltt_traces.head, list) {
+		/*
+		 * Expect the filter to filter out events. If we get here,
+		 * we went through tracepoint activation as a first step.
+		 */
+		if (unlikely(dest_trace && trace != dest_trace))
+			continue;
+		if (unlikely(!trace->active))
+			continue;
+		if (unlikely(!ltt_run_filter(trace, eID)))
+			continue;
+#ifdef CONFIG_LTT_DEBUG_EVENT_SIZE
+		rflags = LTT_RFLAG_ID_SIZE;
+#else
+		if (unlikely(eID >= LTT_FREE_EVENTS))
+			rflags = LTT_RFLAG_ID;
+		else
+			rflags = 0;
+#endif
+		/*
+		 * Skip channels added after trace creation.
+		 */
+		if (unlikely(chan_index >= trace->nr_channels))
+			continue;
+		channel = &trace->channels[chan_index];
+		if (!channel->active)
+			continue;
+
+		/* reserve space : header and data */
+		ret = ltt_reserve_slot(trace, channel, &transport_data,
+					data_size, &slot_size, &buf_offset,
+					&tsc, &rflags,
+					largest_align, cpu);
+		if (unlikely(ret < 0))
+			continue; /* buffer full */
+
+		va_copy(args_copy, *args);
+		/* FIXME : could probably encapsulate transport better. */
+		buf = ((struct rchan *)channel->trans_channel_data)->buf[cpu];
+		/* Out-of-order write : header and data */
+		buf_offset = ltt_write_event_header(trace,
+					channel, buf, buf_offset,
+					eID, data_size, tsc, rflags);
+		ltt_write_event_data(buf, buf_offset, &closure,
+					serialize_private,
+					largest_align, fmt, &args_copy);
+		va_end(args_copy);
+		/* Out-of-order commit */
+		ltt_commit_slot(channel, &transport_data, buf_offset,
+				slot_size);
+	}
+	__get_cpu_var(ltt_nesting)--;
+	rcu_read_unlock_sched_notrace();
+}
+EXPORT_SYMBOL_GPL(ltt_vtrace);
+
+notrace void ltt_trace(const struct marker *mdata, void *probe_data,
+		       void *call_data, const char *fmt, ...)
+{
+	va_list args;
+
+	va_start(args, fmt);
+	ltt_vtrace(mdata, probe_data, call_data, fmt, &args);
+	va_end(args);
+}
+EXPORT_SYMBOL_GPL(ltt_trace);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Mathieu Desnoyers");
+MODULE_DESCRIPTION("Linux Trace Toolkit Next Generation Serializer");

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [RFC patch 18/41] Seq_file add support for sorted list
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (16 preceding siblings ...)
  2009-03-05 22:47 ` [RFC patch 17/41] LTTng - serialization Mathieu Desnoyers
@ 2009-03-05 22:47 ` Mathieu Desnoyers
  2009-03-05 22:47 ` [RFC patch 19/41] Sort module list by pointer address to get coherent sleepable seq_file iterators Mathieu Desnoyers
                   ` (24 subsequent siblings)
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:47 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers

[-- Attachment #1: seq_file_sorted.patch --]
[-- Type: text/plain, Size: 3688 bytes --]

Add support for sorted lists in seq_file. It aims at changing the way
/proc/modules and kallsyms iterate over the module list, to remove a race
between module unload and module/symbol listing.

The list is sorted by ascending list_head pointer address.

Changelog:

When reading the data by small chunks (i.e. byte by byte), the index (ppos) is
incremented by seq_read() directly and no "next" callback is called when going
to the next module.

Therefore, use ppos instead of m->private, since seq_read() increments this
index directly to move on to the next module once the buffer has been emptied.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 fs/seq_file.c            |   45 ++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/seq_file.h |   20 ++++++++++++++++++++
 2 files changed, 64 insertions(+), 1 deletion(-)

Index: linux-2.6-lttng/fs/seq_file.c
===================================================================
--- linux-2.6-lttng.orig/fs/seq_file.c	2009-03-05 15:44:04.000000000 -0500
+++ linux-2.6-lttng/fs/seq_file.c	2009-03-05 15:44:36.000000000 -0500
@@ -671,5 +671,48 @@ struct list_head *seq_list_next(void *v,
 	++*ppos;
 	return lh == head ? NULL : lh;
 }
-
 EXPORT_SYMBOL(seq_list_next);
+
+struct list_head *seq_sorted_list_start(struct list_head *head, loff_t *ppos)
+{
+	struct list_head *lh;
+
+	list_for_each(lh, head)
+		if ((unsigned long)lh >= *ppos) {
+			*ppos = (unsigned long)lh;
+			return lh;
+		}
+	return NULL;
+}
+EXPORT_SYMBOL(seq_sorted_list_start);
+
+struct list_head *seq_sorted_list_start_head(struct list_head *head,
+		loff_t *ppos)
+{
+	struct list_head *lh;
+
+	if (!*ppos) {
+		*ppos = (unsigned long)head;
+		return head;
+	}
+	list_for_each(lh, head)
+		if ((unsigned long)lh >= *ppos) {
+			*ppos = (long)lh->prev;
+			return lh->prev;
+		}
+	return NULL;
+}
+EXPORT_SYMBOL(seq_sorted_list_start_head);
+
+struct list_head *seq_sorted_list_next(void *p, struct list_head *head,
+		loff_t *ppos)
+{
+	struct list_head *lh;
+	void *next;
+
+	lh = ((struct list_head *)p)->next;
+	next = (lh == head) ? NULL : lh;
+	*ppos = next ? ((unsigned long)next) : (-1UL);
+	return next;
+}
+EXPORT_SYMBOL(seq_sorted_list_next);
Index: linux-2.6-lttng/include/linux/seq_file.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/seq_file.h	2009-03-05 15:44:04.000000000 -0500
+++ linux-2.6-lttng/include/linux/seq_file.h	2009-03-05 15:44:05.000000000 -0500
@@ -95,4 +95,24 @@ extern struct list_head *seq_list_start_
 extern struct list_head *seq_list_next(void *v, struct list_head *head,
 		loff_t *ppos);
 
+/*
+ * Helpers for iteration over a list sorted by ascending head pointer address.
+ * To be used in contexts where preemption cannot be disabled, to ensure that
+ * iteration of a modified list continues at the same location where it
+ * stopped, or at a following location. It ensures that the only information
+ * lost concerns elements added to/removed from the list between iterations.
+ * void *pos is only used to get the next list element and may no longer be a
+ * valid list_head when given to seq_sorted_list_start() or
+ * seq_sorted_list_start_head().
+ */
+extern struct list_head *seq_sorted_list_start(struct list_head *head,
+		loff_t *ppos);
+extern struct list_head *seq_sorted_list_start_head(struct list_head *head,
+		loff_t *ppos);
+/*
+ * next must be called with an existing p node
+ */
+extern struct list_head *seq_sorted_list_next(void *p, struct list_head *head,
+		loff_t *ppos);
+
 #endif

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [RFC patch 19/41] Sort module list by pointer address to get coherent sleepable seq_file iterators
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (17 preceding siblings ...)
  2009-03-05 22:47 ` [RFC patch 18/41] Seq_file add support for sorted list Mathieu Desnoyers
@ 2009-03-05 22:47 ` Mathieu Desnoyers
  2009-03-05 22:47 ` [RFC patch 20/41] Linux Kernel Markers - Iterator Mathieu Desnoyers
                   ` (23 subsequent siblings)
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:47 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers

[-- Attachment #1: module.c-sort-module-list.patch --]
[-- Type: text/plain, Size: 5867 bytes --]

A race appears in both /proc/modules and kallsyms: if the process is put to
sleep between seq file reads and, at that moment, a module is added to or
removed from the module list, the listing will skip a number of
modules/symbols corresponding to the number of elements present in the
unloaded module, but taken at the current position in the list, if the
iteration is located after the removed module.

The cleanest way I found to deal with this problem is to sort the module list.
We can then keep the old struct module * as the old iterator, knowing that it may
be removed between the seq file reads, but we only use it as "get next". If it
is not present in the module list, the next pointer will be used.

By doing this, removing a given module will now only fuzz the output related to
this specific module, not any random module anymore. Since modprobe uses
/proc/modules, it might be important to make sure that multiple concurrently
running modprobes won't interfere with each other.


Small test program for this:

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#define BUFSIZE 1024

int main()
{
	int fd = open("/proc/modules", O_RDONLY);
	char buf[BUFSIZE];
	ssize_t size;

	while ((size = read(fd, buf, 1)) > 0) {
		printf("%c", buf[0]);
		usleep(100000);
	}

	close(fd);
	return 0;
}


Actual test (kernel 2.6.23-rc3):

dijkstra:~# lsmod
Module                  Size  Used by
pl2303                 18564  0
usbserial              29032  1 pl2303
ppdev                   7844  0
sky2                   37476  0
skge                   36368  0
rtc                    10104  0
snd_hda_intel         265628  0

  (here, while we are printing the 2nd line, I rmmod pl2303)
compudj@dijkstra:~/test$ ./module
pl2303 18564 0 - Live 0xf886e000
usbserial 29032 1 pl2303, Live 0xf8865000
sky2 37476 0 - Live 0xf884f000
skge 36368 0 - Live 0xf8838000
rtc 10104 0 - Live 0xf8825000

We see that the 2nd line is garbage.

Now, with my patch applied:
  (here, while we are printing the rtc module, I rmmod rtc)
nd_hda_intel 268708 0 - Live 0xf8820000
ltt_control 2372 0 - Live 0xf8866000
rtc 10392 0 - Live 0xf886d000
skge 36768 0 - Live 0xf8871000
ltt_statedump 8516 0 - Live 0xf887b000

We see that since the rtc line was already in the buffer, it has been
printed completely.


  (here, while we are printing the skge module, I rmmod rtc)
snd_hda_intel 268708 0 - Live 0xf8820000
ltt_control 2372 0 - Live 0xf8866000
rtc 10392 0 - Live 0xf886d000
skge 36768 0 - Live 0xf8871000
ltt_statedump 8516 0 - Live 0xf887b000
sky2 38420 0 - Live 0xf88cd000

We see that the iteration continued at the same position even though the
rtc module, located at a lower address, was removed.

Changelog:

When reading the data by small chunks (i.e. byte by byte), the index (ppos) is
incremented by seq_read() directly and no "next" callback is called when going
to the next module.

Therefore, use ppos instead of m->private, since seq_read() increments this
index directly to move on to the next module once the buffer has been emptied.

Before the fix, byte-by-byte reads printed the first module indefinitely. The
patch fixes this.

Changelog:
- Remove module_mutex usage: depend on functions implemented in module.c for
  that.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 kernel/module.c |   27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

Index: linux-2.6-lttng/kernel/module.c
===================================================================
--- linux-2.6-lttng.orig/kernel/module.c	2009-03-05 15:29:39.000000000 -0500
+++ linux-2.6-lttng/kernel/module.c	2009-03-05 15:45:33.000000000 -0500
@@ -66,7 +66,9 @@
 #define INIT_OFFSET_MASK (1UL << (BITS_PER_LONG-1))
 
 /* List of modules, protected by module_mutex or preempt_disable
- * (delete uses stop_machine/add uses RCU list operations). */
+ * (delete uses stop_machine/add uses RCU list operations).
+ * Sorted by ascending list node address.
+ */
 static DEFINE_MUTEX(module_mutex);
 static LIST_HEAD(modules);
 
@@ -1877,6 +1879,7 @@ static noinline struct module *load_modu
 	void *percpu = NULL, *ptr = NULL; /* Stops spurious gcc warning */
 	unsigned long *mseg;
 	mm_segment_t old_fs;
+	struct module *iter;
 
 	DEBUGP("load_module: umod=%p, len=%lu, uargs=%p\n",
 	       umod, len, uargs);
@@ -2260,8 +2263,24 @@ static noinline struct module *load_modu
 	 * function to insert in a way safe to concurrent readers.
 	 * The mutex protects against concurrent writers.
 	 */
-	list_add_rcu(&mod->list, &modules);
 
+	/*
+	 * We sort the modules by struct module pointer address to permit
+	 * correct iteration over the module list by, at least, kallsyms for
+	 * preemptible operations, such as read(). Sorting by struct module
+	 * pointer address is equivalent to sorting by list node address.
+	 */
+	list_for_each_entry_reverse(iter, &modules, list) {
+		BUG_ON(iter == mod);	/* Should never be in the list twice */
+		if (iter < mod) {
+			/* We belong to the location right after iter. */
+			list_add_rcu(&mod->list, &iter->list);
+			goto module_added;
+		}
+	}
+	/* We should be added at the head of the list */
+	list_add_rcu(&mod->list, &modules);
+module_added:
 	err = parse_args(mod->name, mod->args, kp, num_kp, NULL);
 	if (err < 0)
 		goto unlink;
@@ -2618,12 +2637,12 @@ static char *module_flags(struct module 
 static void *m_start(struct seq_file *m, loff_t *pos)
 {
 	mutex_lock(&module_mutex);
-	return seq_list_start(&modules, *pos);
+	return seq_sorted_list_start(&modules, pos);
 }
 
 static void *m_next(struct seq_file *m, void *p, loff_t *pos)
 {
-	return seq_list_next(p, &modules, pos);
+	return seq_sorted_list_next(p, &modules, pos);
 }
 
 static void m_stop(struct seq_file *m, void *p)

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [RFC patch 20/41] Linux Kernel Markers - Iterator
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (18 preceding siblings ...)
  2009-03-05 22:47 ` [RFC patch 19/41] Sort module list by pointer address to get coherent sleepable seq_file iterators Mathieu Desnoyers
@ 2009-03-05 22:47 ` Mathieu Desnoyers
  2009-03-05 22:47 ` [RFC patch 21/41] LTTng probes specialized tracepoints Mathieu Desnoyers
                   ` (22 subsequent siblings)
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:47 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers

[-- Attachment #1: linux-kernel-markers-iterator.patch --]
[-- Type: text/plain, Size: 5374 bytes --]

Add marker iterators. Useful for /proc interface (listing markers).

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 include/linux/marker.h |   12 ++++++++
 include/linux/module.h |    6 ++++
 kernel/marker.c        |   70 +++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/module.c        |   32 ++++++++++++++++++++++
 4 files changed, 120 insertions(+)

Index: linux-2.6-lttng/kernel/marker.c
===================================================================
--- linux-2.6-lttng.orig/kernel/marker.c	2009-03-05 15:21:58.000000000 -0500
+++ linux-2.6-lttng/kernel/marker.c	2009-03-05 15:45:38.000000000 -0500
@@ -898,6 +898,76 @@ EXPORT_SYMBOL_GPL(marker_get_private_dat
 
 #ifdef CONFIG_MODULES
 
+/**
+ * marker_get_iter_range - Get the next marker, given a range.
+ * @marker: current marker (in), next marker (out)
+ * @begin: beginning of the range
+ * @end: end of the range
+ *
+ * Returns whether a next marker has been found (1) or not (0).
+ * Will return the first marker in the range if the input marker is NULL.
+ */
+int marker_get_iter_range(struct marker **marker, struct marker *begin,
+	struct marker *end)
+{
+	if (!*marker && begin != end) {
+		*marker = begin;
+		return 1;
+	}
+	if (*marker >= begin && *marker < end)
+		return 1;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(marker_get_iter_range);
+
+static void marker_get_iter(struct marker_iter *iter)
+{
+	int found = 0;
+
+	/* Core kernel markers */
+	if (!iter->module) {
+		found = marker_get_iter_range(&iter->marker,
+				__start___markers, __stop___markers);
+		if (found)
+			goto end;
+	}
+	/* Markers in modules. */
+	found = module_get_iter_markers(iter);
+end:
+	if (!found)
+		marker_iter_reset(iter);
+}
+
+void marker_iter_start(struct marker_iter *iter)
+{
+	marker_get_iter(iter);
+}
+EXPORT_SYMBOL_GPL(marker_iter_start);
+
+void marker_iter_next(struct marker_iter *iter)
+{
+	iter->marker++;
+	/*
+	 * iter->marker may be invalid because we blindly incremented it.
+	 * Make sure it is valid by walking the markers, moving on to the
+	 * markers of the following modules if necessary.
+	 */
+	marker_get_iter(iter);
+}
+EXPORT_SYMBOL_GPL(marker_iter_next);
+
+void marker_iter_stop(struct marker_iter *iter)
+{
+}
+EXPORT_SYMBOL_GPL(marker_iter_stop);
+
+void marker_iter_reset(struct marker_iter *iter)
+{
+	iter->module = NULL;
+	iter->marker = NULL;
+}
+EXPORT_SYMBOL_GPL(marker_iter_reset);
+
 int marker_module_notify(struct notifier_block *self,
 			 unsigned long val, void *data)
 {
Index: linux-2.6-lttng/kernel/module.c
===================================================================
--- linux-2.6-lttng.orig/kernel/module.c	2009-03-05 15:45:37.000000000 -0500
+++ linux-2.6-lttng/kernel/module.c	2009-03-05 15:45:38.000000000 -0500
@@ -2814,6 +2814,38 @@ void module_update_markers(void)
 				mod->markers + mod->num_markers);
 	mutex_unlock(&module_mutex);
 }
+
+/*
+ * Returns 0 if current not found.
+ * Returns 1 if current found.
+ */
+int module_get_iter_markers(struct marker_iter *iter)
+{
+	struct module *iter_mod;
+	int found = 0;
+
+	mutex_lock(&module_mutex);
+	list_for_each_entry(iter_mod, &modules, list) {
+		if (!iter_mod->taints) {
+			/*
+			 * Sorted module list
+			 */
+			if (iter_mod < iter->module)
+				continue;
+			else if (iter_mod > iter->module)
+				iter->marker = NULL;
+			found = marker_get_iter_range(&iter->marker,
+				iter_mod->markers,
+				iter_mod->markers + iter_mod->num_markers);
+			if (found) {
+				iter->module = iter_mod;
+				break;
+			}
+		}
+	}
+	mutex_unlock(&module_mutex);
+	return found;
+}
 #endif
 
 #ifdef CONFIG_TRACEPOINTS
Index: linux-2.6-lttng/include/linux/marker.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/marker.h	2009-03-05 15:21:58.000000000 -0500
+++ linux-2.6-lttng/include/linux/marker.h	2009-03-05 15:45:38.000000000 -0500
@@ -218,4 +218,16 @@ extern void *marker_get_private_data(con
  */
 #define marker_synchronize_unregister() synchronize_sched()
 
+struct marker_iter {
+	struct module *module;
+	struct marker *marker;
+};
+
+extern void marker_iter_start(struct marker_iter *iter);
+extern void marker_iter_next(struct marker_iter *iter);
+extern void marker_iter_stop(struct marker_iter *iter);
+extern void marker_iter_reset(struct marker_iter *iter);
+extern int marker_get_iter_range(struct marker **marker, struct marker *begin,
+	struct marker *end);
+
 #endif
Index: linux-2.6-lttng/include/linux/module.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/module.h	2009-03-05 15:21:58.000000000 -0500
+++ linux-2.6-lttng/include/linux/module.h	2009-03-05 15:45:38.000000000 -0500
@@ -472,6 +472,7 @@ int unregister_module_notifier(struct no
 extern void print_modules(void);
 
 extern void module_update_markers(void);
+extern int module_get_iter_markers(struct marker_iter *iter);
 
 extern void module_update_tracepoints(void);
 extern int module_get_iter_tracepoints(struct tracepoint_iter *iter);
@@ -589,6 +590,11 @@ static inline int module_get_iter_tracep
 	return 0;
 }
 
+static inline int module_get_iter_markers(struct marker_iter *iter)
+{
+	return 0;
+}
+
 #endif /* CONFIG_MODULES */
 
 struct device_driver;

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [RFC patch 21/41] LTTng probes specialized tracepoints
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (19 preceding siblings ...)
  2009-03-05 22:47 ` [RFC patch 20/41] Linux Kernel Markers - Iterator Mathieu Desnoyers
@ 2009-03-05 22:47 ` Mathieu Desnoyers
  2009-03-05 22:47 ` [RFC patch 22/41] LTTng marker control Mathieu Desnoyers
                   ` (21 subsequent siblings)
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:47 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers

[-- Attachment #1: lttng-type-serializer.patch --]
[-- Type: text/plain, Size: 6652 bytes --]

Tracing tbench runs gives these top event counts:

(~7 sec run)

kernel_arch_syscall_exit :   4570630
kernel_arch_syscall_entry :  4570589
kernel_timer_set :           2276276
kernel_softirq_entry :       1446470
kernel_softirq_exit :        1446469
kernel_sched_schedule :      1362552
kernel_sched_try_wakeup :    1140044
mm_page_alloc :              1033063
mm_page_free :                927878

All other events are much lower:
fs_write :                     20575

This patch creates specialized probes to accelerate the high-rate events seen
under tbench.

TODO: create specialized lockdep probes.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 include/linux/ltt-type-serializer.h |  107 ++++++++++++++++++++++++++++++++++++
 ltt/ltt-type-serializer.c           |   96 ++++++++++++++++++++++++++++++++
 2 files changed, 203 insertions(+)

Index: linux-2.6-lttng/ltt/ltt-type-serializer.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/ltt/ltt-type-serializer.c	2009-03-04 14:08:15.000000000 -0500
@@ -0,0 +1,96 @@
+/**
+ * ltt-type-serializer.c
+ *
+ * LTTng specialized type serializer.
+ *
+ * Copyright Mathieu Desnoyers, 2008.
+ */
+#include <linux/module.h>
+#include <linux/ltt-type-serializer.h>
+
+notrace void _ltt_specialized_trace(const struct marker *mdata,
+		void *probe_data,
+		void *serialize_private, unsigned int data_size,
+		unsigned int largest_align)
+{
+	int ret;
+	uint16_t eID;
+	size_t slot_size;
+	unsigned int chan_index;
+	struct ltt_channel_struct *channel;
+	struct ltt_trace_struct *trace;
+	struct rchan_buf *buf;
+	void *transport_data;
+	uint64_t tsc;
+	long buf_offset;
+	int cpu;
+	unsigned int rflags;
+
+	/*
+	 * If we get here, it's probably because we have useful work to do.
+	 */
+	if (unlikely(ltt_traces.num_active_traces == 0))
+		return;
+
+	rcu_read_lock_sched_notrace();
+	cpu = smp_processor_id();
+	__get_cpu_var(ltt_nesting)++;
+
+	eID = mdata->event_id;
+	chan_index = mdata->channel_id;
+
+	/* Iterate on each trace */
+	list_for_each_entry_rcu(trace, &ltt_traces.head, list) {
+		if (unlikely(!trace->active))
+			continue;
+		if (unlikely(!ltt_run_filter(trace, eID)))
+			continue;
+#ifdef CONFIG_LTT_DEBUG_EVENT_SIZE
+		rflags = LTT_RFLAG_ID_SIZE;
+#else
+		if (unlikely(eID >= LTT_FREE_EVENTS))
+			rflags = LTT_RFLAG_ID;
+		else
+			rflags = 0;
+#endif
+		/*
+		 * Skip channels added after trace creation.
+		 */
+		if (unlikely(chan_index >= trace->nr_channels))
+			continue;
+		channel = &trace->channels[chan_index];
+		if (!channel->active)
+			continue;
+
+		/* reserve space : header and data */
+		ret = ltt_reserve_slot(trace, channel, &transport_data,
+					data_size, &slot_size, &buf_offset,
+					&tsc, &rflags,
+					largest_align, cpu);
+		if (unlikely(ret < 0))
+			continue; /* buffer full */
+
+		/* FIXME : could probably encapsulate transport better. */
+		buf = ((struct rchan *)channel->trans_channel_data)->buf[cpu];
+		/* Out-of-order write : header and data */
+		buf_offset = ltt_write_event_header(trace,
+					channel, buf, buf_offset,
+					eID, data_size, tsc, rflags);
+		if (data_size) {
+			buf_offset += ltt_align(buf_offset, largest_align);
+			ltt_relay_write(buf, buf_offset, serialize_private,
+				data_size);
+			buf_offset += data_size;
+		}
+		/* Out-of-order commit */
+		ltt_commit_slot(channel, &transport_data, buf_offset,
+				slot_size);
+	}
+	__get_cpu_var(ltt_nesting)--;
+	rcu_read_unlock_sched_notrace();
+}
+EXPORT_SYMBOL_GPL(_ltt_specialized_trace);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Mathieu Desnoyers");
+MODULE_DESCRIPTION("LTT type serializer");
Index: linux-2.6-lttng/include/linux/ltt-type-serializer.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/linux/ltt-type-serializer.h	2009-03-04 14:08:15.000000000 -0500
@@ -0,0 +1,107 @@
+#ifndef _LTT_TYPE_SERIALIZER_H
+#define _LTT_TYPE_SERIALIZER_H
+
+#include <linux/ltt-tracer.h>
+
+/*
+ * largest_align must be non-zero and equal to the minimum between the size of
+ * the largest type and sizeof(void *).
+ */
+extern void _ltt_specialized_trace(const struct marker *mdata, void *probe_data,
+		void *serialize_private, unsigned int data_size,
+		unsigned int largest_align);
+
+/*
+ * Clamp largest_align so that 0 < largest_align <= sizeof(void *) to make it
+ * dumb-proof: 0 is changed into 1, and unsigned long long is changed into
+ * sizeof(void *) on 32-bit architectures.
+ */
+static inline void ltt_specialized_trace(const struct marker *mdata,
+		void *probe_data,
+		void *serialize_private, unsigned int data_size,
+		unsigned int largest_align)
+{
+	largest_align = min_t(unsigned int, largest_align, sizeof(void *));
+	largest_align = max_t(unsigned int, largest_align, 1);
+	_ltt_specialized_trace(mdata, probe_data, serialize_private, data_size,
+		largest_align);
+}
+
+/*
+ * Type serializer definitions.
+ */
+
+/*
+ * Return size of structure without end-of-structure padding.
+ */
+#define serialize_sizeof(type)	offsetof(typeof(type), end_field)
+
+struct serialize_long_int {
+	unsigned long f1;
+	unsigned int f2;
+	unsigned char end_field[0];
+} LTT_ALIGN;
+
+struct serialize_int_int_long {
+	unsigned int f1;
+	unsigned int f2;
+	unsigned long f3;
+	unsigned char end_field[0];
+} LTT_ALIGN;
+
+struct serialize_int_int_short {
+	unsigned int f1;
+	unsigned int f2;
+	unsigned short f3;
+	unsigned char end_field[0];
+} LTT_ALIGN;
+
+struct serialize_long_long_long {
+	unsigned long f1;
+	unsigned long f2;
+	unsigned long f3;
+	unsigned char end_field[0];
+} LTT_ALIGN;
+
+struct serialize_long_long_int {
+	unsigned long f1;
+	unsigned long f2;
+	unsigned int f3;
+	unsigned char end_field[0];
+} LTT_ALIGN;
+
+struct serialize_long_long_short_char {
+	unsigned long f1;
+	unsigned long f2;
+	unsigned short f3;
+	unsigned char f4;
+	unsigned char end_field[0];
+} LTT_ALIGN;
+
+struct serialize_long_long_short {
+	unsigned long f1;
+	unsigned long f2;
+	unsigned short f3;
+	unsigned char end_field[0];
+} LTT_ALIGN;
+
+struct serialize_long_short_char {
+	unsigned long f1;
+	unsigned short f2;
+	unsigned char f3;
+	unsigned char end_field[0];
+} LTT_ALIGN;
+
+struct serialize_long_short {
+	unsigned long f1;
+	unsigned short f2;
+	unsigned char end_field[0];
+} LTT_ALIGN;
+
+struct serialize_long_char {
+	unsigned long f1;
+	unsigned char f2;
+	unsigned char end_field[0];
+} LTT_ALIGN;
+
+#endif /* _LTT_TYPE_SERIALIZER_H */

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [RFC patch 22/41] LTTng marker control
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (20 preceding siblings ...)
  2009-03-05 22:47 ` [RFC patch 21/41] LTTng probes specialized tracepoints Mathieu Desnoyers
@ 2009-03-05 22:47 ` Mathieu Desnoyers
  2009-03-05 22:47 ` [RFC patch 23/41] Immediate Values Stub header Mathieu Desnoyers
                   ` (20 subsequent siblings)
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:47 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers, Jan Kiszka

[-- Attachment #1: lttng-marker-control.patch --]
[-- Type: text/plain, Size: 7470 bytes --]

Marker control. Depends on LTT to assign numeric IDs to markers. Used by the
following LTT trace control module, which exports these interfaces through
debugfs.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Jan Kiszka <jan.kiszka@siemens.com>
---
 ltt/ltt-marker-control.c |  265 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 265 insertions(+)

Index: linux-2.6-lttng/ltt/ltt-marker-control.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/ltt/ltt-marker-control.c	2009-02-06 18:15:14.000000000 -0500
@@ -0,0 +1,265 @@
+/*
+ * Copyright (C) 2007 Mathieu Desnoyers
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+
+#include <linux/module.h>
+#include <linux/stat.h>
+#include <linux/vmalloc.h>
+#include <linux/marker.h>
+#include <linux/ltt-tracer.h>
+#include <linux/uaccess.h>
+#include <linux/string.h>
+#include <linux/ctype.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/seq_file.h>
+#include <linux/slab.h>
+
+#define DEFAULT_CHANNEL "cpu"
+#define DEFAULT_PROBE "default"
+
+LIST_HEAD(probes_list);
+
+/*
+ * Mutex protecting the probe slab cache.
+ * Nests inside the traces mutex.
+ */
+DEFINE_MUTEX(probes_mutex);
+
+struct ltt_available_probe default_probe = {
+	.name = "default",
+	.format = NULL,
+	.probe_func = ltt_vtrace,
+	.callbacks[0] = ltt_serialize_data,
+};
+
+static struct kmem_cache *markers_loaded_cachep;
+static LIST_HEAD(markers_loaded_list);
+/*
+ * List sorted by name strcmp order.
+ */
+static LIST_HEAD(probes_registered_list);
+
+static struct ltt_available_probe *get_probe_from_name(const char *pname)
+{
+	struct ltt_available_probe *iter;
+	int comparison, found = 0;
+
+	if (!pname)
+		pname = DEFAULT_PROBE;
+	list_for_each_entry(iter, &probes_registered_list, node) {
+		comparison = strcmp(pname, iter->name);
+		if (!comparison)
+			found = 1;
+		if (comparison <= 0)
+			break;
+	}
+	if (found)
+		return iter;
+	else
+		return NULL;
+}
+
+int ltt_probe_register(struct ltt_available_probe *pdata)
+{
+	int ret = 0;
+	int comparison;
+	struct ltt_available_probe *iter;
+
+	mutex_lock(&probes_mutex);
+	list_for_each_entry_reverse(iter, &probes_registered_list, node) {
+		comparison = strcmp(pdata->name, iter->name);
+		if (!comparison) {
+			ret = -EBUSY;
+			goto end;
+		} else if (comparison > 0) {
+			/* We belong to the location right after iter. */
+			list_add(&pdata->node, &iter->node);
+			goto end;
+		}
+	}
+	/* Should be added at the head of the list */
+	list_add(&pdata->node, &probes_registered_list);
+end:
+	mutex_unlock(&probes_mutex);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ltt_probe_register);
+
+/*
+ * Called when a probe does not want to be called anymore.
+ */
+int ltt_probe_unregister(struct ltt_available_probe *pdata)
+{
+	int ret = 0;
+	struct ltt_active_marker *amark, *tmp;
+
+	mutex_lock(&probes_mutex);
+	list_for_each_entry_safe(amark, tmp, &markers_loaded_list, node) {
+		if (amark->probe == pdata) {
+			ret = marker_probe_unregister_private_data(
+				pdata->probe_func, amark);
+			if (ret)
+				goto end;
+			list_del(&amark->node);
+			kmem_cache_free(markers_loaded_cachep, amark);
+		}
+	}
+	list_del(&pdata->node);
+end:
+	mutex_unlock(&probes_mutex);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ltt_probe_unregister);
+
+/*
+ * Connect marker "mname" to probe "pname".
+ * Only allow one probe instance to be connected to a marker.
+ */
+int ltt_marker_connect(const char *channel, const char *mname,
+		       const char *pname)
+
+{
+	int ret;
+	struct ltt_active_marker *pdata;
+	struct ltt_available_probe *probe;
+
+	ltt_lock_traces();
+	mutex_lock(&probes_mutex);
+	probe = get_probe_from_name(pname);
+	if (!probe) {
+		ret = -ENOENT;
+		goto end;
+	}
+	pdata = marker_get_private_data(channel, mname, probe->probe_func, 0);
+	if (pdata && !IS_ERR(pdata)) {
+		ret = -EEXIST;
+		goto end;
+	}
+	pdata = kmem_cache_zalloc(markers_loaded_cachep, GFP_KERNEL);
+	if (!pdata) {
+		ret = -ENOMEM;
+		goto end;
+	}
+	pdata->probe = probe;
+	/*
+	 * ID has priority over channel in case of conflict.
+	 */
+	ret = marker_probe_register(channel, mname, NULL,
+		probe->probe_func, pdata);
+	if (ret)
+		kmem_cache_free(markers_loaded_cachep, pdata);
+	else
+		list_add(&pdata->node, &markers_loaded_list);
+end:
+	mutex_unlock(&probes_mutex);
+	ltt_unlock_traces();
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ltt_marker_connect);
+
+/*
+ * Disconnect marker "mname", probe "pname".
+ */
+int ltt_marker_disconnect(const char *channel, const char *mname,
+			  const char *pname)
+{
+	struct ltt_active_marker *pdata;
+	struct ltt_available_probe *probe;
+	int ret = 0;
+
+	mutex_lock(&probes_mutex);
+	probe = get_probe_from_name(pname);
+	if (!probe) {
+		ret = -ENOENT;
+		goto end;
+	}
+	pdata = marker_get_private_data(channel, mname, probe->probe_func, 0);
+	if (IS_ERR(pdata)) {
+		ret = PTR_ERR(pdata);
+		goto end;
+	} else if (!pdata) {
+		/*
+		 * Not registered by us.
+		 */
+		ret = -EPERM;
+		goto end;
+	}
+	ret = marker_probe_unregister(channel, mname, probe->probe_func, pdata);
+	if (ret)
+		goto end;
+	else {
+		list_del(&pdata->node);
+		kmem_cache_free(markers_loaded_cachep, pdata);
+	}
+end:
+	mutex_unlock(&probes_mutex);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ltt_marker_disconnect);
+
+static void disconnect_all_markers(void)
+{
+	struct ltt_active_marker *pdata, *tmp;
+
+	list_for_each_entry_safe(pdata, tmp, &markers_loaded_list, node) {
+		marker_probe_unregister_private_data(pdata->probe->probe_func,
+			pdata);
+		list_del(&pdata->node);
+		kmem_cache_free(markers_loaded_cachep, pdata);
+	}
+}
+
+static int __init marker_control_init(void)
+{
+	int ret;
+
+	markers_loaded_cachep = KMEM_CACHE(ltt_active_marker, 0);
+
+	ret = ltt_probe_register(&default_probe);
+	BUG_ON(ret);
+	ret = ltt_marker_connect("metadata", "core_marker_format",
+				 DEFAULT_PROBE);
+	BUG_ON(ret);
+	ret = ltt_marker_connect("metadata", "core_marker_id", DEFAULT_PROBE);
+	BUG_ON(ret);
+
+	return 0;
+}
+module_init(marker_control_init);
+
+static void __exit marker_control_exit(void)
+{
+	int ret;
+
+	ret = ltt_marker_disconnect("metadata", "core_marker_format",
+				    DEFAULT_PROBE);
+	BUG_ON(ret);
+	ret = ltt_marker_disconnect("metadata", "core_marker_id",
+				    DEFAULT_PROBE);
+	BUG_ON(ret);
+	ret = ltt_probe_unregister(&default_probe);
+	BUG_ON(ret);
+	disconnect_all_markers();
+	kmem_cache_destroy(markers_loaded_cachep);
+	marker_synchronize_unregister();
+}
+module_exit(marker_control_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Mathieu Desnoyers");
+MODULE_DESCRIPTION("Linux Trace Toolkit Marker Control");

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [RFC patch 23/41] Immediate Values Stub header
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (21 preceding siblings ...)
  2009-03-05 22:47 ` [RFC patch 22/41] LTTng marker control Mathieu Desnoyers
@ 2009-03-05 22:47 ` Mathieu Desnoyers
  2009-03-05 22:47 ` [RFC patch 24/41] Linux Kernel Markers - Use Immediate Values Mathieu Desnoyers
                   ` (19 subsequent siblings)
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:47 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers

[-- Attachment #1: immediate-value-stub-header.patch --]
[-- Type: text/plain, Size: 2955 bytes --]

For Markers tree integration.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 include/linux/immediate.h |   94 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 94 insertions(+)

Index: linux-2.6-lttng/include/linux/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/linux/immediate.h	2009-02-06 15:20:17.000000000 -0500
@@ -0,0 +1,94 @@
+#ifndef _LINUX_IMMEDIATE_H
+#define _LINUX_IMMEDIATE_H
+
+/*
+ * Immediate values, can be updated at runtime and save cache lines.
+ *
+ * (C) Copyright 2007 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#ifdef CONFIG_IMMEDIATE
+
+struct __imv {
+	unsigned long var;	/* Pointer to the identifier variable of the
+				 * immediate value
+				 */
+	unsigned long imv;	/*
+				 * Pointer to the memory location of the
+				 * immediate value within the instruction.
+				 */
+	unsigned char size;	/* Type size. */
+} __attribute__ ((packed));
+
+#include <asm/immediate.h>
+
+/**
+ * imv_set - set immediate variable (with locking)
+ * @name: immediate value name
+ * @i: required value
+ *
+ * Sets the value of @name, taking the module_mutex if required by
+ * the architecture.
+ */
+#define imv_set(name, i)						\
+	do {								\
+		name##__imv = (i);					\
+		core_imv_update();					\
+		module_imv_update();					\
+	} while (0)
+
+/*
+ * Internal update functions.
+ */
+extern void core_imv_update(void);
+extern void imv_update_range(const struct __imv *begin,
+	const struct __imv *end);
+
+#else
+
+/*
+ * Generic immediate values: a simple, standard, memory load.
+ */
+
+/**
+ * imv_read - read immediate variable
+ * @name: immediate value name
+ *
+ * Reads the value of @name.
+ */
+#define imv_read(name)			_imv_read(name)
+
+/**
+ * imv_set - set immediate variable (with locking)
+ * @name: immediate value name
+ * @i: required value
+ *
+ * Sets the value of @name, taking the module_mutex if required by
+ * the architecture.
+ */
+#define imv_set(name, i)		(name##__imv = (i))
+
+static inline void core_imv_update(void) { }
+static inline void module_imv_update(void) { }
+
+#endif
+
+#define DECLARE_IMV(type, name) extern __typeof__(type) name##__imv
+#define DEFINE_IMV(type, name)  __typeof__(type) name##__imv
+
+#define EXPORT_IMV_SYMBOL(name) EXPORT_SYMBOL(name##__imv)
+#define EXPORT_IMV_SYMBOL_GPL(name) EXPORT_SYMBOL_GPL(name##__imv)
+
+/**
+ * _imv_read - Read immediate value with standard memory load.
+ * @name: immediate value name
+ *
+ * Force a data read of the immediate value instead of the immediate value
+ * based mechanism. Useful for __init and __exit section data read.
+ */
+#define _imv_read(name)		(name##__imv)
+
+#endif

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [RFC patch 24/41] Linux Kernel Markers - Use Immediate Values
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (22 preceding siblings ...)
  2009-03-05 22:47 ` [RFC patch 23/41] Immediate Values Stub header Mathieu Desnoyers
@ 2009-03-05 22:47 ` Mathieu Desnoyers
  2009-03-05 22:47 ` [RFC patch 25/41] Markers Support for Proprietary Modules Mathieu Desnoyers
                   ` (18 subsequent siblings)
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:47 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers

[-- Attachment #1: linux-kernel-markers-immediate-values.patch --]
[-- Type: text/plain, Size: 5580 bytes --]

Make markers use immediate values.

Changelog :
- Use imv_* instead of immediate_*.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 Documentation/markers.txt |   17 +++++++++++++----
 include/linux/marker.h    |   16 ++++++++++++----
 kernel/marker.c           |   12 ++++++++----
 ltt/ltt-marker-control.c  |    4 ++--
 4 files changed, 35 insertions(+), 14 deletions(-)

Index: linux-2.6-lttng/include/linux/marker.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/marker.h	2009-02-06 15:03:19.000000000 -0500
+++ linux-2.6-lttng/include/linux/marker.h	2009-02-06 15:17:46.000000000 -0500
@@ -14,6 +14,7 @@
 
 #include <stdarg.h>
 #include <linux/types.h>
+#include <linux/immediate.h>
 
 struct module;
 struct marker;
@@ -43,7 +44,7 @@ struct marker {
 	const char *format;	/* Marker format string, describing the
 				 * variable argument list.
 				 */
-	char state;		/* Marker state. */
+	DEFINE_IMV(char, state);/* Immediate value state. */
 	char ptype;		/* probe type : 0 : single, 1 : multi */
 				/* Probe wrapper */
 	void (*call)(const struct marker *mdata, void *call_private, ...);
@@ -86,9 +87,16 @@ struct marker {
 	do {								\
 		DEFINE_MARKER(name, format);				\
 		__mark_check_format(format, ## args);			\
-		if (unlikely(__mark_##name.state)) {			\
-			(*__mark_##name.call)				\
-				(&__mark_##name, call_private, ## args);\
+		if (!generic) {						\
+			if (unlikely(imv_read(__mark_##name.state)))	\
+				(*__mark_##name.call)			\
+					(&__mark_##name, call_private,	\
+					## args);			\
+		} else {						\
+			if (unlikely(_imv_read(__mark_##name.state)))	\
+				(*__mark_##name.call)			\
+					(&__mark_##name, call_private,	\
+					## args);			\
 		}							\
 	} while (0)
 
Index: linux-2.6-lttng/kernel/marker.c
===================================================================
--- linux-2.6-lttng.orig/kernel/marker.c	2009-02-06 15:03:19.000000000 -0500
+++ linux-2.6-lttng/kernel/marker.c	2009-02-06 15:17:46.000000000 -0500
@@ -24,6 +24,7 @@
 #include <linux/marker.h>
 #include <linux/err.h>
 #include <linux/slab.h>
+#include <linux/immediate.h>
 
 extern struct marker __start___markers[];
 extern struct marker __stop___markers[];
@@ -532,7 +533,7 @@ static int set_marker(struct marker_entr
 	smp_wmb();
 	elem->ptype = entry->ptype;
 
-	if (elem->tp_name && (active ^ elem->state)) {
+	if (elem->tp_name && (active ^ _imv_read(elem->state))) {
 		WARN_ON(!elem->tp_cb);
 		/*
 		 * It is ok to directly call the probe registration because type
@@ -562,7 +563,7 @@ static int set_marker(struct marker_entr
 				(unsigned long)elem->tp_cb));
 		}
 	}
-	elem->state = active;
+	elem->state__imv = active;
 
 	return ret;
 }
@@ -578,7 +579,7 @@ static void disable_marker(struct marker
 	int ret;
 
 	/* leave "call" as is. It is known statically. */
-	if (elem->tp_name && elem->state) {
+	if (elem->tp_name && _imv_read(elem->state)) {
 		WARN_ON(!elem->tp_cb);
 		/*
 		 * It is ok to directly call the probe registration because type
@@ -593,7 +594,7 @@ static void disable_marker(struct marker
 		 */
 		module_put(__module_text_address((unsigned long)elem->tp_cb));
 	}
-	elem->state = 0;
+	elem->state__imv = 0;
 	elem->single.func = __mark_empty_function;
 	/* Update the function before setting the ptype */
 	smp_wmb();
@@ -657,6 +658,9 @@ static void marker_update_probes(void)
 	/* Markers in modules. */
 	module_update_markers();
 	tracepoint_probe_update_all();
+	/* Update immediate values */
+	core_imv_update();
+	module_imv_update();
 }
 
 /**
Index: linux-2.6-lttng/Documentation/markers.txt
===================================================================
--- linux-2.6-lttng.orig/Documentation/markers.txt	2009-02-06 14:45:26.000000000 -0500
+++ linux-2.6-lttng/Documentation/markers.txt	2009-02-06 15:17:46.000000000 -0500
@@ -15,10 +15,12 @@ provide at runtime. A marker can be "on"
 (no probe is attached). When a marker is "off" it has no effect, except for
 adding a tiny time penalty (checking a condition for a branch) and space
 penalty (adding a few bytes for the function call at the end of the
-instrumented function and adds a data structure in a separate section).  When a
-marker is "on", the function you provide is called each time the marker is
-executed, in the execution context of the caller. When the function provided
-ends its execution, it returns to the caller (continuing from the marker site).
+instrumented function and adds a data structure in a separate section). The
+immediate values are used to minimize the impact on data cache, encoding the
+condition in the instruction stream. When a marker is "on", the function you
+provide is called each time the marker is executed, in the execution context of
+the caller. When the function provided ends its execution, it returns to the
+caller (continuing from the marker site).
 
 You can put markers at important locations in the code. Markers are
 lightweight hooks that can pass an arbitrary number of parameters,
@@ -90,6 +92,13 @@ notrace void probe_tracepoint_name(unsig
 	/* write data to trace buffers ... */
 }
 
+* Optimization for a given architecture
+
+To force use of a non-optimized version of the markers, _trace_mark() should be
+used. It takes the same parameters as the normal markers, but it does not use
+the immediate values based on code patching.
+
+
 * Probe / marker example
 
 See the example provided in samples/markers/src

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [RFC patch 25/41] Markers Support for Proprietary Modules
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (23 preceding siblings ...)
  2009-03-05 22:47 ` [RFC patch 24/41] Linux Kernel Markers - Use Immediate Values Mathieu Desnoyers
@ 2009-03-05 22:47 ` Mathieu Desnoyers
  2009-03-05 22:47 ` [RFC patch 26/41] Markers remove old comment Mathieu Desnoyers
                   ` (17 subsequent siblings)
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:47 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers, Jon Masters, Jon Masters, Rusty Russell

[-- Attachment #1: markers-support-for-proprietary-modules.patch --]
[-- Type: text/plain, Size: 4913 bytes --]

There seem to be good arguments for markers to support proprietary modules, so
I am throwing this one-liner in to see how people react. It only makes sure
that a module that has been "forced" to be loaded won't have its markers used.
It is important to keep this check so the kernel does not crash by mistakenly
expecting the markers part of struct module to be present when a module has an
incorrect checksum.


Discussion so far :
http://lkml.org/lkml/2008/1/22/226

Jon Masters <jcm@redhat.com> writes:
I notice in module.c:

#ifdef CONFIG_MARKERS
      if (!mod->taints)
              marker_update_probe_range(mod->markers,
                      mod->markers + mod->num_markers, NULL, NULL);
#endif

Is this an attempt to not set a marker for proprietary modules? [...]

* Frank Ch. Eigler (fche@redhat.com) wrote:
I can't seem to find any discussion about this aspect.  If this is the
intent, it seems misguided to me.  There may instead be a relationship
to TAINT_FORCED_{RMMOD,MODULE}.  Mathieu?

- FChE

On Tue, 2008-01-22 at 22:10 -0500, Mathieu Desnoyers wrote:
On my part, it's mostly a matter of not crashing the kernel when someone
tries to force modprobe of a proprietary module (where the checksums
don't match) on a kernel that supports the markers. Not doing so
causes the markers to try to find the marker-specific information in
struct module, which doesn't exist, and OOPSes.

Christoph's point of view is rather more drastic than mine: it's not
interesting for the kernel community to help proprietary module writers,
so it's a good idea not to give them marker support. (I CC'ed him so he
can clarify his position).

* Frank Ch. Eigler (fche@redhat.com) wrote:
[...]
Another way of looking at this though is that by allowing/encouraging
proprietary module writers to include markers, we and their users get
new diagnostic capabilities.  It constitutes a little bit of opening
up, which IMO we should reward rather than punish.


* Valdis Kletnieks (Valdis.Kletnieks@vt.edu) wrote:
On Wed, 23 Jan 2008 09:48:12 EST, Mathieu Desnoyers said:

> This specific one is a kernel policy matter, and I personally don't
> have a strong opinion about it. I agree that you raise a good counter
> argument : it can be useful to proprietary modules users to be able to
> extract tracing information from those modules to argue with their
> vendors that their driver/hardware is broken (a tracer is _very_ useful
> in that kind of situation).

Amen, brother. Been there, done that, got the tshirt (not on Linux, but
other operating systems).

>                             However, it is also useful to proprietary
> module writers who can benefit from the merged kernel/module traces.
> Do we want to give them this ability?

The proprietary module writer has the *source* for the kernel and their module.
There's no way you can prevent the proprietary module writers from using this
feature as long as you allow other module writers to use it.

>                                           It would surely help writing
> better proprietary kernel modules.

The biggest complaint against proprietary modules is that they make it
impossible for *us* to debug.  And you want to argue *against* a feature that
would allow them to develop better code that causes fewer crashes, and
therefore fewer people *asking* for us to debug it?

Remember - when a user tries a Linux box with a proprietary module, and the
experience sucks because the module sucks, they will walk away thinking
"Linux sucks", not "That module sucks".

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Jon Masters <jcm@jonmasters.org>
CC: "Frank Ch. Eigler" <fche@redhat.com>
CC: Jon Masters <jcm@redhat.com>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Christoph Hellwig <hch@infradead.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
---
 kernel/module.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6-lttng/kernel/module.c
===================================================================
--- linux-2.6-lttng.orig/kernel/module.c	2009-03-05 15:45:38.000000000 -0500
+++ linux-2.6-lttng/kernel/module.c	2009-03-05 15:45:49.000000000 -0500
@@ -2809,7 +2809,7 @@ void module_update_markers(void)
 
 	mutex_lock(&module_mutex);
 	list_for_each_entry(mod, &modules, list)
-		if (!mod->taints)
+		if (!(mod->taints & TAINT_FORCED_MODULE))
 			marker_update_probe_range(mod->markers,
 				mod->markers + mod->num_markers);
 	mutex_unlock(&module_mutex);
@@ -2826,7 +2826,7 @@ int module_get_iter_markers(struct marke
 
 	mutex_lock(&module_mutex);
 	list_for_each_entry(iter_mod, &modules, list) {
-		if (!iter_mod->taints) {
+		if (!(iter_mod->taints & TAINT_FORCED_MODULE)) {
 			/*
 			 * Sorted module list
 			 */

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [RFC patch 26/41] Markers remove old comment
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (24 preceding siblings ...)
  2009-03-05 22:47 ` [RFC patch 25/41] Markers Support for Proprietary Modules Mathieu Desnoyers
@ 2009-03-05 22:47 ` Mathieu Desnoyers
  2009-03-05 22:47 ` [RFC patch 27/41] Markers use dynamic channels Mathieu Desnoyers
                   ` (16 subsequent siblings)
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:47 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers

[-- Attachment #1: markers-remove-old-comment.patch --]
[-- Type: text/plain, Size: 1118 bytes --]

> > + * Note : the empty asm volatile with read constraint is used here instead of a
> > + * "used" attribute to fix a gcc 4.1.x bug.
> 
> There seems no empty asm...

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 include/linux/marker.h |    2 --
 1 file changed, 2 deletions(-)

Index: linux-2.6-lttng/include/linux/marker.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/marker.h	2009-02-06 15:21:25.000000000 -0500
+++ linux-2.6-lttng/include/linux/marker.h	2009-02-06 15:21:30.000000000 -0500
@@ -73,8 +73,6 @@ struct marker {
 		_DEFINE_MARKER(name, #tp_name, tp_cb, format)
 
 /*
- * Note : the empty asm volatile with read constraint is used here instead of a
- * "used" attribute to fix a gcc 4.1.x bug.
  * Make sure the alignment of the structure in the __markers section will
  * not add unwanted padding between the beginning of the section and the
  * structure. Force alignment to the same alignment as the section start.

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [RFC patch 27/41] Markers use dynamic channels
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (25 preceding siblings ...)
  2009-03-05 22:47 ` [RFC patch 26/41] Markers remove old comment Mathieu Desnoyers
@ 2009-03-05 22:47 ` Mathieu Desnoyers
  2009-03-05 22:47 ` [RFC patch 28/41] LTT trace control Mathieu Desnoyers
                   ` (15 subsequent siblings)
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:47 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers

[-- Attachment #1: markers-use-dynamic-channels.patch --]
[-- Type: text/plain, Size: 22070 bytes --]

Make the marker infrastructure use dynamic channels, adding a new (first)
parameter to trace_mark(): the channel name where the data must be sent.
Switch to per-channel marker IDs.
Marker IDs are now managed by the marker infrastructure rather than by LTTng.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 include/linux/marker.h |  101 +++++++++++++++++------------
 kernel/marker.c        |  168 +++++++++++++++++++++++++++++++++++++++----------
 2 files changed, 195 insertions(+), 74 deletions(-)

Index: linux-2.6-lttng/include/linux/marker.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/marker.h	2009-02-06 15:44:53.000000000 -0500
+++ linux-2.6-lttng/include/linux/marker.h	2009-02-06 15:52:11.000000000 -0500
@@ -15,6 +15,7 @@
 #include <stdarg.h>
 #include <linux/types.h>
 #include <linux/immediate.h>
+#include <linux/ltt-channels.h>
 
 struct module;
 struct marker;
@@ -40,6 +41,7 @@ struct marker_probe_closure {
 };
 
 struct marker {
+	const char *channel;	/* Name of channel where to send data */
 	const char *name;	/* Marker name */
 	const char *format;	/* Marker format string, describing the
 				 * variable argument list.
@@ -47,6 +49,8 @@ struct marker {
 	DEFINE_IMV(char, state);/* Immediate value state. */
 	char ptype;		/* probe type : 0 : single, 1 : multi */
 				/* Probe wrapper */
+	u16 chan_id;		/* Numeric channel identifier, dynamic */
+	u16 event_id;		/* Numeric event identifier, dynamic */
 	void (*call)(const struct marker *mdata, void *call_private, ...);
 	struct marker_probe_closure single;
 	struct marker_probe_closure *multi;
@@ -56,21 +60,25 @@ struct marker {
 
 #ifdef CONFIG_MARKERS
 
-#define _DEFINE_MARKER(name, tp_name_str, tp_cb, format)		\
-		static const char __mstrtab_##name[]			\
+#define _DEFINE_MARKER(channel, name, tp_name_str, tp_cb, format)	\
+		static const char __mstrtab_##channel##_##name[]	\
 		__attribute__((section("__markers_strings")))		\
-		= #name "\0" format;					\
-		static struct marker __mark_##name			\
+		= #channel "\0" #name "\0" format;			\
+		static struct marker __mark_##channel##_##name		\
 		__attribute__((section("__markers"), aligned(8))) =	\
-		{ __mstrtab_##name, &__mstrtab_##name[sizeof(#name)],	\
-		  0, 0, marker_probe_cb, { __mark_empty_function, NULL},\
+		{ __mstrtab_##channel##_##name,				\
+		  &__mstrtab_##channel##_##name[sizeof(#channel)],	\
+		  &__mstrtab_##channel##_##name[sizeof(#channel) +	\
+						sizeof(#name)],		\
+		  0, 0, 0, 0, marker_probe_cb,				\
+		  { __mark_empty_function, NULL},			\
 		  NULL, tp_name_str, tp_cb }
 
-#define DEFINE_MARKER(name, format)					\
-		_DEFINE_MARKER(name, NULL, NULL, format)
+#define DEFINE_MARKER(channel, name, format)				\
+		_DEFINE_MARKER(channel, name, NULL, NULL, format)
 
-#define DEFINE_MARKER_TP(name, tp_name, tp_cb, format)			\
-		_DEFINE_MARKER(name, #tp_name, tp_cb, format)
+#define DEFINE_MARKER_TP(channel, name, tp_name, tp_cb, format)		\
+		_DEFINE_MARKER(channel, name, #tp_name, tp_cb, format)
 
 /*
  * Make sure the alignment of the structure in the __markers section will
@@ -81,45 +89,49 @@ struct marker {
  * If generic is true, a variable read is used.
  * If generic is false, immediate values are used.
  */
-#define __trace_mark(generic, name, call_private, format, args...)	\
+#define __trace_mark(generic, channel, name, call_private, format, args...) \
 	do {								\
-		DEFINE_MARKER(name, format);				\
+		DEFINE_MARKER(channel, name, format);			\
 		__mark_check_format(format, ## args);			\
 		if (!generic) {						\
-			if (unlikely(imv_read(__mark_##name.state)))	\
-				(*__mark_##name.call)			\
-					(&__mark_##name, call_private,	\
-					## args);			\
+			if (unlikely(imv_read(				\
+					__mark_##channel##_##name.state))) \
+				(*__mark_##channel##_##name.call)	\
+					(&__mark_##channel##_##name,	\
+					call_private, ## args);		\
 		} else {						\
-			if (unlikely(_imv_read(__mark_##name.state)))	\
-				(*__mark_##name.call)			\
-					(&__mark_##name, call_private,	\
-					## args);			\
+			if (unlikely(_imv_read(				\
+					__mark_##channel##_##name.state))) \
+				(*__mark_##channel##_##name.call)	\
+					(&__mark_##channel##_##name,	\
+					call_private, ## args);		\
 		}							\
 	} while (0)
 
-#define __trace_mark_tp(name, call_private, tp_name, tp_cb, format, args...) \
+#define __trace_mark_tp(channel, name, call_private, tp_name, tp_cb,	\
+			format, args...)				\
 	do {								\
 		void __check_tp_type(void)				\
 		{							\
 			register_trace_##tp_name(tp_cb);		\
 		}							\
-		DEFINE_MARKER_TP(name, tp_name, tp_cb, format);		\
+		DEFINE_MARKER_TP(channel, name, tp_name, tp_cb, format);\
 		__mark_check_format(format, ## args);			\
-		(*__mark_##name.call)(&__mark_##name, call_private,	\
-					## args);			\
+		(*__mark_##channel##_##name.call)(&__mark_##channel##_##name, \
+			call_private, ## args);				\
 	} while (0)
 
 extern void marker_update_probe_range(struct marker *begin,
 	struct marker *end);
 
-#define GET_MARKER(name)	(__mark_##name)
+#define GET_MARKER(channel, name)	(__mark_##channel##_##name)
 
 #else /* !CONFIG_MARKERS */
-#define DEFINE_MARKER(name, tp_name, tp_cb, format)
-#define __trace_mark(generic, name, call_private, format, args...) \
+#define DEFINE_MARKER(channel, name, format)
+#define __trace_mark(generic, channel, name, call_private, format, args...) \
 		__mark_check_format(format, ## args)
-#define __trace_mark_tp(name, call_private, tp_name, tp_cb, format, args...) \
+#define __trace_mark_tp(channel, name, call_private, tp_name, tp_cb,	\
+		format, args...)					\
 	do {								\
 		void __check_tp_type(void)				\
 		{							\
@@ -130,11 +142,12 @@ extern void marker_update_probe_range(st
 static inline void marker_update_probe_range(struct marker *begin,
 	struct marker *end)
 { }
-#define GET_MARKER(name)
+#define GET_MARKER(channel, name)
 #endif /* CONFIG_MARKERS */
 
 /**
  * trace_mark - Marker using code patching
+ * @channel: marker channel (where to send the data), not quoted.
  * @name: marker name, not quoted.
  * @format: format string
  * @args...: variable argument list
@@ -142,11 +155,12 @@ static inline void marker_update_probe_r
  * Places a marker using optimized code patching technique (imv_read())
  * to be enabled when immediate values are present.
  */
-#define trace_mark(name, format, args...) \
-	__trace_mark(0, name, NULL, format, ## args)
+#define trace_mark(channel, name, format, args...) \
+	__trace_mark(0, channel, name, NULL, format, ## args)
 
 /**
  * _trace_mark - Marker using variable read
+ * @channel: marker channel (where to send the data), not quoted.
  * @name: marker name, not quoted.
  * @format: format string
  * @args...: variable argument list
@@ -156,11 +170,12 @@ static inline void marker_update_probe_r
  * modification based enabling is not welcome. (__init and __exit functions,
  * lockdep, some traps, printk).
  */
-#define _trace_mark(name, format, args...) \
-	__trace_mark(1, name, NULL, format, ## args)
+#define _trace_mark(channel, name, format, args...) \
+	__trace_mark(1, channel, name, NULL, format, ## args)
 
 /**
  * trace_mark_tp - Marker in a tracepoint callback
+ * @channel: marker channel (where to send the data), not quoted.
  * @name: marker name, not quoted.
  * @tp_name: tracepoint name, not quoted.
  * @tp_cb: tracepoint callback. Should have an associated global symbol so it
@@ -170,14 +185,19 @@ static inline void marker_update_probe_r
  *
  * Places a marker in a tracepoint callback.
  */
-#define trace_mark_tp(name, tp_name, tp_cb, format, args...)	\
-	__trace_mark_tp(name, NULL, tp_name, tp_cb, format, ## args)
+#define trace_mark_tp(channel, name, tp_name, tp_cb, format, args...)	\
+	__trace_mark_tp(channel, name, NULL, tp_name, tp_cb, format, ## args)
 
 /**
  * MARK_NOARGS - Format string for a marker with no argument.
  */
 #define MARK_NOARGS " "
 
+extern void lock_markers(void);
+extern void unlock_markers(void);
+
+extern void markers_compact_event_ids(void);
+
 /* To be used for string format validity checking with gcc */
 static inline void __printf(1, 2) ___mark_check_format(const char *fmt, ...)
 {
@@ -198,13 +218,13 @@ extern void marker_probe_cb(const struct
  * Connect a probe to a marker.
  * private data pointer must be a valid allocated memory address, or NULL.
  */
-extern int marker_probe_register(const char *name, const char *format,
-				marker_probe_func *probe, void *probe_private);
+extern int marker_probe_register(const char *channel, const char *name,
+	const char *format, marker_probe_func *probe, void *probe_private);
 
 /*
  * Returns the private data given to marker_probe_register.
  */
-extern int marker_probe_unregister(const char *name,
+extern int marker_probe_unregister(const char *channel, const char *name,
 	marker_probe_func *probe, void *probe_private);
 /*
  * Unregister a marker by providing the registered private data.
@@ -212,8 +232,8 @@ extern int marker_probe_unregister(const
 extern int marker_probe_unregister_private_data(marker_probe_func *probe,
 	void *probe_private);
 
-extern void *marker_get_private_data(const char *name, marker_probe_func *probe,
-	int num);
+extern void *marker_get_private_data(const char *channel, const char *name,
+	marker_probe_func *probe, int num);
 
 /*
  * marker_synchronize_unregister must be called between the last marker probe
@@ -235,5 +255,6 @@ extern void marker_iter_stop(struct mark
 extern void marker_iter_reset(struct marker_iter *iter);
 extern int marker_get_iter_range(struct marker **marker, struct marker *begin,
 	struct marker *end);
+extern int is_marker_enabled(const char *channel, const char *name);
 
 #endif
Index: linux-2.6-lttng/kernel/marker.c
===================================================================
--- linux-2.6-lttng.orig/kernel/marker.c	2009-02-06 15:44:53.000000000 -0500
+++ linux-2.6-lttng/kernel/marker.c	2009-02-06 15:52:18.000000000 -0500
@@ -38,6 +38,16 @@ static const int marker_debug;
  */
 static DEFINE_MUTEX(markers_mutex);
 
+void lock_markers(void)
+{
+	mutex_lock(&markers_mutex);
+}
+
+void unlock_markers(void)
+{
+	mutex_unlock(&markers_mutex);
+}
+
 /*
  * Marker hash table, containing the active markers.
  * Protected by module_mutex.
@@ -57,6 +67,7 @@ static struct hlist_head marker_table[MA
 struct marker_entry {
 	struct hlist_node hlist;
 	char *format;
+	char *name;
 			/* Probe wrapper */
 	void (*call)(const struct marker *mdata, void *call_private, ...);
 	struct marker_probe_closure single;
@@ -65,9 +76,11 @@ struct marker_entry {
 	struct rcu_head rcu;
 	void *oldptr;
 	int rcu_pending;
+	u16 chan_id;
+	u16 event_id;
 	unsigned char ptype:1;
 	unsigned char format_allocated:1;
-	char name[0];	/* Contains name'\0'format'\0' */
+	char channel[0];	/* Contains channel'\0'name'\0'format'\0' */
 };
 
 /**
@@ -205,6 +218,13 @@ static void free_old_closure(struct rcu_
 {
 	struct marker_entry *entry = container_of(head,
 		struct marker_entry, rcu);
+	int ret;
+
+	/* Single probe removed */
+	if (!entry->ptype) {
+		ret = ltt_channels_unregister(entry->channel);
+		WARN_ON(ret);
+	}
 	kfree(entry->oldptr);
 	/* Make sure we free the data before setting the pending flag to 0 */
 	smp_wmb();
@@ -354,16 +374,19 @@ marker_entry_remove_probe(struct marker_
  * Must be called with markers_mutex held.
  * Returns NULL if not present.
  */
-static struct marker_entry *get_marker(const char *name)
+static struct marker_entry *get_marker(const char *channel, const char *name)
 {
 	struct hlist_head *head;
 	struct hlist_node *node;
 	struct marker_entry *e;
-	u32 hash = jhash(name, strlen(name), 0);
+	size_t channel_len = strlen(channel) + 1;
+	size_t name_len = strlen(name) + 1;
+	u32 hash;
 
+	hash = jhash(channel, channel_len-1, 0) ^ jhash(name, name_len-1, 0);
 	head = &marker_table[hash & ((1 << MARKER_HASH_BITS)-1)];
 	hlist_for_each_entry(e, node, head, hlist) {
-		if (!strcmp(name, e->name))
+		if (!strcmp(channel, e->channel) && !strcmp(name, e->name))
 			return e;
 	}
 	return NULL;
@@ -373,22 +396,25 @@ static struct marker_entry *get_marker(c
  * Add the marker to the marker hash table. Must be called with markers_mutex
  * held.
  */
-static struct marker_entry *add_marker(const char *name, const char *format)
+static struct marker_entry *add_marker(const char *channel, const char *name,
+		const char *format)
 {
 	struct hlist_head *head;
 	struct hlist_node *node;
 	struct marker_entry *e;
+	size_t channel_len = strlen(channel) + 1;
 	size_t name_len = strlen(name) + 1;
 	size_t format_len = 0;
-	u32 hash = jhash(name, name_len-1, 0);
+	u32 hash;
 
+	hash = jhash(channel, channel_len-1, 0) ^ jhash(name, name_len-1, 0);
 	if (format)
 		format_len = strlen(format) + 1;
 	head = &marker_table[hash & ((1 << MARKER_HASH_BITS)-1)];
 	hlist_for_each_entry(e, node, head, hlist) {
-		if (!strcmp(name, e->name)) {
+		if (!strcmp(channel, e->channel) && !strcmp(name, e->name)) {
 			printk(KERN_NOTICE
-				"Marker %s busy\n", name);
+				"Marker %s.%s busy\n", channel, name);
 			return ERR_PTR(-EBUSY);	/* Already there */
 		}
 	}
@@ -396,13 +422,16 @@ static struct marker_entry *add_marker(c
 	 * Using kmalloc here to allocate a variable length element. Could
 	 * cause some memory fragmentation if overused.
 	 */
-	e = kmalloc(sizeof(struct marker_entry) + name_len + format_len,
-			GFP_KERNEL);
+	e = kmalloc(sizeof(struct marker_entry)
+		    + channel_len + name_len + format_len,
+		    GFP_KERNEL);
 	if (!e)
 		return ERR_PTR(-ENOMEM);
-	memcpy(&e->name[0], name, name_len);
+	memcpy(e->channel, channel, channel_len);
+	e->name = &e->channel[channel_len];
+	memcpy(e->name, name, name_len);
 	if (format) {
-		e->format = &e->name[name_len];
+		e->format = &e->channel[channel_len + name_len];
 		memcpy(e->format, format, format_len);
 		if (strcmp(e->format, MARK_NOARGS) == 0)
 			e->call = marker_probe_cb_noarg;
@@ -432,15 +461,17 @@
-static int remove_marker(const char *name)
+static int remove_marker(const char *channel, const char *name)
 {
 	struct hlist_head *head;
 	struct hlist_node *node;
 	struct marker_entry *e;
 	int found = 0;
-	size_t len = strlen(name) + 1;
-	u32 hash = jhash(name, len-1, 0);
+	size_t channel_len = strlen(channel) + 1;
+	size_t name_len = strlen(name) + 1;
+	u32 hash;
 
+	hash = jhash(channel, channel_len-1, 0) ^ jhash(name, name_len-1, 0);
 	head = &marker_table[hash & ((1 << MARKER_HASH_BITS)-1)];
 	hlist_for_each_entry(e, node, head, hlist) {
-		if (!strcmp(name, e->name)) {
+		if (!strcmp(channel, e->channel) && !strcmp(name, e->name)) {
 			found = 1;
 			break;
 		}
@@ -665,6 +696,7 @@ static void marker_update_probes(void)
 
 /**
  * marker_probe_register -  Connect a probe to a marker
+ * @channel: marker channel
  * @name: marker name
  * @format: format string
  * @probe: probe handler
@@ -674,27 +706,43 @@ static void marker_update_probes(void)
  * Returns 0 if ok, error value on error.
  * The probe address must at least be aligned on the architecture pointer size.
  */
-int marker_probe_register(const char *name, const char *format,
-			marker_probe_func *probe, void *probe_private)
+int marker_probe_register(const char *channel, const char *name,
+			  const char *format, marker_probe_func *probe,
+			  void *probe_private)
 {
 	struct marker_entry *entry;
-	int ret = 0;
+	int ret = 0, ret_err;
 	struct marker_probe_closure *old;
+	int first_probe = 0;
 
 	mutex_lock(&markers_mutex);
-	entry = get_marker(name);
+	entry = get_marker(channel, name);
 	if (!entry) {
-		entry = add_marker(name, format);
+		first_probe = 1;
+		entry = add_marker(channel, name, format);
 		if (IS_ERR(entry))
 			ret = PTR_ERR(entry);
+		if (ret)
+			goto end;
+		ret = ltt_channels_register(channel);
+		if (ret)
+			goto error_remove_marker;
+		ret = ltt_channels_get_index_from_name(channel);
+		if (ret < 0)
+			goto error_unregister_channel;
+		entry->chan_id = ret;
+		ret = ltt_channels_get_event_id(channel);
+		if (ret < 0)
+			goto error_unregister_channel;
+		entry->event_id = ret;
 	} else if (format) {
 		if (!entry->format)
 			ret = marker_set_format(entry, format);
 		else if (strcmp(entry->format, format))
 			ret = -EPERM;
+		if (ret)
+			goto end;
 	}
-	if (ret)
-		goto end;
 
 	/*
 	 * If we detect that a call_rcu is pending for this marker,
@@ -705,12 +753,17 @@ int marker_probe_register(const char *na
 	old = marker_entry_add_probe(entry, probe, probe_private);
 	if (IS_ERR(old)) {
 		ret = PTR_ERR(old);
-		goto end;
+		if (first_probe)
+			goto error_unregister_channel;
+		else
+			goto end;
 	}
 	mutex_unlock(&markers_mutex);
+
 	marker_update_probes();
+
 	mutex_lock(&markers_mutex);
-	entry = get_marker(name);
+	entry = get_marker(channel, name);
 	if (!entry)
 		goto end;
 	if (entry->rcu_pending)
@@ -720,6 +773,14 @@ int marker_probe_register(const char *na
 	/* write rcu_pending before calling the RCU callback */
 	smp_wmb();
 	call_rcu_sched(&entry->rcu, free_old_closure);
+	goto end;
+
+error_unregister_channel:
+	ret_err = ltt_channels_unregister(channel);
+	WARN_ON(ret_err);
+error_remove_marker:
+	ret_err = remove_marker(channel, name);
+	WARN_ON(ret_err);
 end:
 	mutex_unlock(&markers_mutex);
 	return ret;
@@ -728,6 +788,7 @@ EXPORT_SYMBOL_GPL(marker_probe_register)
 
 /**
  * marker_probe_unregister -  Disconnect a probe from a marker
+ * @channel: marker channel
  * @name: marker name
  * @probe: probe function pointer
  * @probe_private: probe private data
@@ -738,24 +799,26 @@ EXPORT_SYMBOL_GPL(marker_probe_register)
- * itself uses stop_machine(), which insures that every preempt disabled section
- * have finished.
+ * itself uses stop_machine(), which ensures that every preempt-disabled
+ * section has finished.
  */
-int marker_probe_unregister(const char *name,
-	marker_probe_func *probe, void *probe_private)
+int marker_probe_unregister(const char *channel, const char *name,
+			    marker_probe_func *probe, void *probe_private)
 {
 	struct marker_entry *entry;
 	struct marker_probe_closure *old;
 	int ret = -ENOENT;
 
 	mutex_lock(&markers_mutex);
-	entry = get_marker(name);
+	entry = get_marker(channel, name);
 	if (!entry)
 		goto end;
 	if (entry->rcu_pending)
 		rcu_barrier_sched();
 	old = marker_entry_remove_probe(entry, probe, probe_private);
 	mutex_unlock(&markers_mutex);
+
 	marker_update_probes();
+
 	mutex_lock(&markers_mutex);
-	entry = get_marker(name);
+	entry = get_marker(channel, name);
 	if (!entry)
 		goto end;
 	if (entry->rcu_pending)
@@ -765,7 +828,7 @@ int marker_probe_unregister(const char *
 	/* write rcu_pending before calling the RCU callback */
 	smp_wmb();
 	call_rcu_sched(&entry->rcu, free_old_closure);
-	remove_marker(name);	/* Ignore busy error message */
+	remove_marker(channel, name);	/* Ignore busy error message */
 	ret = 0;
 end:
 	mutex_unlock(&markers_mutex);
@@ -823,6 +886,7 @@ int marker_probe_unregister_private_data
 	struct marker_entry *entry;
 	int ret = 0;
 	struct marker_probe_closure *old;
+	const char *channel = NULL, *name = NULL;
 
 	mutex_lock(&markers_mutex);
 	entry = get_marker_from_private_data(probe, probe_private);
@@ -833,10 +897,14 @@ int marker_probe_unregister_private_data
 	if (entry->rcu_pending)
 		rcu_barrier_sched();
 	old = marker_entry_remove_probe(entry, NULL, probe_private);
+	channel = kstrdup(entry->channel, GFP_KERNEL);
+	name = kstrdup(entry->name, GFP_KERNEL);
 	mutex_unlock(&markers_mutex);
+
 	marker_update_probes();
+
 	mutex_lock(&markers_mutex);
-	entry = get_marker_from_private_data(probe, probe_private);
+	entry = get_marker(channel, name);
 	if (!entry)
 		goto end;
 	if (entry->rcu_pending)
@@ -846,15 +914,19 @@ int marker_probe_unregister_private_data
 	/* write rcu_pending before calling the RCU callback */
 	smp_wmb();
 	call_rcu_sched(&entry->rcu, free_old_closure);
-	remove_marker(entry->name);	/* Ignore busy error message */
+	/* Ignore busy error message */
+	remove_marker(channel, name);
 end:
 	mutex_unlock(&markers_mutex);
+	kfree(channel);
+	kfree(name);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(marker_probe_unregister_private_data);
 
 /**
  * marker_get_private_data - Get a marker's probe private data
+ * @channel: marker channel
  * @name: marker name
  * @probe: probe to match
  * @num: get the nth matching probe's private data
@@ -866,19 +938,21 @@ EXPORT_SYMBOL_GPL(marker_probe_unregiste
  * owner of the data, or its content could vanish. This is mostly used to
  * confirm that a caller is the owner of a registered probe.
  */
-void *marker_get_private_data(const char *name, marker_probe_func *probe,
-		int num)
+void *marker_get_private_data(const char *channel, const char *name,
+			      marker_probe_func *probe, int num)
 {
 	struct hlist_head *head;
 	struct hlist_node *node;
 	struct marker_entry *e;
+	size_t channel_len = strlen(channel) + 1;
 	size_t name_len = strlen(name) + 1;
-	u32 hash = jhash(name, name_len-1, 0);
 	int i;
+	u32 hash;
 
+	hash = jhash(channel, channel_len-1, 0) ^ jhash(name, name_len-1, 0);
 	head = &marker_table[hash & ((1 << MARKER_HASH_BITS)-1)];
 	hlist_for_each_entry(e, node, head, hlist) {
-		if (!strcmp(name, e->name)) {
+		if (!strcmp(channel, e->channel) && !strcmp(name, e->name)) {
 			if (!e->ptype) {
 				if (num == 0 && e->single.func == probe)
 					return e->single.probe_private;
@@ -900,6 +974,33 @@ void *marker_get_private_data(const char
 }
 EXPORT_SYMBOL_GPL(marker_get_private_data);
 
+/**
+ * markers_compact_event_ids - Compact marker event IDs and reassign channels
+ *
+ * Called by the channel infrastructure when no channel users remain active.
+ * Called with lock_markers() held.
+ */
+void markers_compact_event_ids(void)
+{
+	struct marker_entry *entry;
+	unsigned int i;
+	struct hlist_head *head;
+	struct hlist_node *node;
+	int ret;
+
+	for (i = 0; i < MARKER_TABLE_SIZE; i++) {
+		head = &marker_table[i];
+		hlist_for_each_entry(entry, node, head, hlist) {
+			ret = ltt_channels_get_index_from_name(entry->channel);
+			WARN_ON(ret < 0);
+			entry->chan_id = ret;
+			ret = ltt_channels_get_event_id(entry->channel);
+			WARN_ON(ret < 0);
+			entry->event_id = ret;
+		}
+	}
+}
+
 #ifdef CONFIG_MODULES
 
 /**

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [RFC patch 28/41] LTT trace control
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (26 preceding siblings ...)
  2009-03-05 22:47 ` [RFC patch 27/41] Markers use dynamic channels Mathieu Desnoyers
@ 2009-03-05 22:47 ` Mathieu Desnoyers
  2009-03-05 22:47 ` [RFC patch 29/41] LTTng menus Mathieu Desnoyers
                   ` (14 subsequent siblings)
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:47 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Zhao Lei, Mathieu Desnoyers

[-- Attachment #1: ltt-trace-control.patch --]
[-- Type: text/plain, Size: 25898 bytes --]

ltt-trace-control is used to control LTTng traces.
It exports the tracing control interface through debugfs.

From: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 ltt/ltt-trace-control.c | 1061 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 1061 insertions(+)
 create mode 100644 ltt/ltt-trace-control.c

Index: linux-2.6-lttng/ltt/ltt-trace-control.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/ltt/ltt-trace-control.c	2009-03-05 15:46:27.000000000 -0500
@@ -0,0 +1,1061 @@
+/*
+ * LTT trace control module over debugfs.
+ *
+ * Copyright 2008 - Zhaolei <zhaolei@cn.fujitsu.com>
+ *
+ * Copyright 2009 - Gui Jianfeng <guijianfeng@cn.fujitsu.com>
+ *                  Make mark-control work in debugfs
+ */
+
+/*
+ * Todo:
+ *   Implement read operations on the control files to read back attributes
+ *   Create a README file in the ltt control dir to display help info
+ */
+
+#include <linux/module.h>
+#include <linux/fs.h>
+#include <linux/uaccess.h>
+#include <linux/debugfs.h>
+#include <linux/ltt-tracer.h>
+#include <linux/notifier.h>
+
+#define LTT_CONTROL_DIR "control"
+#define MARKERS_CONTROL_DIR "markers"
+#define LTT_SETUP_TRACE_FILE "setup_trace"
+#define LTT_DESTROY_TRACE_FILE "destroy_trace"
+
+#define LTT_WRITE_MAXLEN	(128)
+
+struct dentry *ltt_control_dir, *ltt_setup_trace_file, *ltt_destroy_trace_file,
+	*markers_control_dir;
+
+/*
+ * The traces_lock nests inside control_lock.
+ */
+static DEFINE_MUTEX(control_lock);
+
+/*
+ * Look up a file/dir in a parent dir.
+ * Only designed to work well for debugfs
+ * (although it may be OK for other filesystems).
+ *
+ * Return:
+ *	the file/dir's dentry on success
+ *	NULL on failure
+ */
+static struct dentry *dir_lookup(struct dentry *parent, const char *name)
+{
+	struct qstr q;
+	struct dentry *d;
+
+	q.name = name;
+	q.len = strlen(name);
+	q.hash = full_name_hash(q.name, q.len);
+
+	d = d_lookup(parent, &q);
+	if (d)
+		dput(d);
+
+	return d;
+}
+
+
+static ssize_t alloc_write(struct file *file, const char __user *user_buf,
+		size_t count, loff_t *ppos)
+{
+	int err = 0;
+	char buf[NAME_MAX];
+	int buf_size;
+	char cmd[NAME_MAX];
+
+	buf_size = min(count, sizeof(buf) - 1);
+	err = copy_from_user(buf, user_buf, buf_size) ? -EFAULT : 0;
+	if (err)
+		goto err_copy_from_user;
+	buf[buf_size] = 0;
+
+	if (sscanf(buf, "%s", cmd) != 1) {
+		err = -EPERM;
+		goto err_get_cmd;
+	}
+
+	if ((cmd[0] != 'Y' && cmd[0] != 'y' && cmd[0] != '1') || cmd[1]) {
+		err = -EPERM;
+		goto err_bad_cmd;
+	}
+
+	err = ltt_trace_alloc(file->f_dentry->d_parent->d_name.name);
+	if (IS_ERR_VALUE(err)) {
+		printk(KERN_ERR "alloc_write: ltt_trace_alloc failed: %d\n",
+			err);
+		goto err_alloc_trace;
+	}
+
+	return count;
+
+err_alloc_trace:
+err_bad_cmd:
+err_get_cmd:
+err_copy_from_user:
+	return err;
+}
+
+static const struct file_operations ltt_alloc_operations = {
+	.write = alloc_write,
+};
+
+
+static ssize_t enabled_write(struct file *file, const char __user *user_buf,
+		size_t count, loff_t *ppos)
+{
+	int err = 0;
+	char buf[NAME_MAX];
+	int buf_size;
+	char cmd[NAME_MAX];
+
+	buf_size = min(count, sizeof(buf) - 1);
+	err = copy_from_user(buf, user_buf, buf_size) ? -EFAULT : 0;
+	if (err)
+		goto err_copy_from_user;
+	buf[buf_size] = 0;
+
+	if (sscanf(buf, "%s", cmd) != 1) {
+		err = -EPERM;
+		goto err_get_cmd;
+	}
+
+	if (cmd[1]) {
+		err = -EPERM;
+		goto err_bad_cmd;
+	}
+
+	switch (cmd[0]) {
+	case 'Y':
+	case 'y':
+	case '1':
+		err = ltt_trace_start(file->f_dentry->d_parent->d_name.name);
+		if (IS_ERR_VALUE(err)) {
+			printk(KERN_ERR
+				"enabled_write: ltt_trace_start failed: %d\n",
+				err);
+			err = -EPERM;
+			goto err_start_trace;
+		}
+		break;
+	case 'N':
+	case 'n':
+	case '0':
+		err = ltt_trace_stop(file->f_dentry->d_parent->d_name.name);
+		if (IS_ERR_VALUE(err)) {
+			printk(KERN_ERR
+				"enabled_write: ltt_trace_stop failed: %d\n",
+				err);
+			err = -EPERM;
+			goto err_stop_trace;
+		}
+		break;
+	default:
+		err = -EPERM;
+		goto err_bad_cmd;
+	}
+
+	return count;
+
+err_stop_trace:
+err_start_trace:
+err_bad_cmd:
+err_get_cmd:
+err_copy_from_user:
+	return err;
+}
+
+static const struct file_operations ltt_enabled_operations = {
+	.write = enabled_write,
+};
+
+
+static ssize_t trans_write(struct file *file, const char __user *user_buf,
+		size_t count, loff_t *ppos)
+{
+	int err = 0;
+	char buf[NAME_MAX];
+	int buf_size;
+	char trans_name[NAME_MAX];
+
+	buf_size = min(count, sizeof(buf) - 1);
+	err = copy_from_user(buf, user_buf, buf_size) ? -EFAULT : 0;
+	if (err)
+		goto err_copy_from_user;
+	buf[buf_size] = 0;
+
+	if (sscanf(buf, "%s", trans_name) != 1) {
+		err = -EPERM;
+		goto err_get_transname;
+	}
+
+	err = ltt_trace_set_type(file->f_dentry->d_parent->d_name.name,
+		trans_name);
+	if (IS_ERR_VALUE(err)) {
+		printk(KERN_ERR "trans_write: ltt_trace_set_type failed: %d\n",
+			err);
+		goto err_set_trans;
+	}
+
+	return count;
+
+err_set_trans:
+err_get_transname:
+err_copy_from_user:
+	return err;
+}
+
+static const struct file_operations ltt_trans_operations = {
+	.write = trans_write,
+};
+
+
+static ssize_t channel_subbuf_num_write(struct file *file,
+		const char __user *user_buf, size_t count, loff_t *ppos)
+{
+	int err = 0;
+	char buf[NAME_MAX];
+	int buf_size;
+	unsigned int num;
+	const char *channel_name;
+	const char *trace_name;
+
+	buf_size = min(count, sizeof(buf) - 1);
+	err = copy_from_user(buf, user_buf, buf_size) ? -EFAULT : 0;
+	if (err)
+		goto err_copy_from_user;
+	buf[buf_size] = 0;
+
+	if (sscanf(buf, "%u", &num) != 1) {
+		err = -EPERM;
+		goto err_get_number;
+	}
+
+	channel_name = file->f_dentry->d_parent->d_name.name;
+	trace_name = file->f_dentry->d_parent->d_parent->d_parent->d_name.name;
+
+	err = ltt_trace_set_channel_subbufcount(trace_name, channel_name, num);
+	if (IS_ERR_VALUE(err)) {
+		printk(KERN_ERR "channel_subbuf_num_write: "
+			"ltt_trace_set_channel_subbufcount failed: %d\n", err);
+		goto err_set_subbufcount;
+	}
+
+	return count;
+
+err_set_subbufcount:
+err_get_number:
+err_copy_from_user:
+	return err;
+}
+
+static const struct file_operations ltt_channel_subbuf_num_operations = {
+	.write = channel_subbuf_num_write,
+};
+
+
+static ssize_t channel_subbuf_size_write(struct file *file,
+	const char __user *user_buf, size_t count, loff_t *ppos)
+{
+	int err = 0;
+	char buf[NAME_MAX];
+	int buf_size;
+	unsigned int num;
+	const char *channel_name;
+	const char *trace_name;
+
+	buf_size = min(count, sizeof(buf) - 1);
+	err = copy_from_user(buf, user_buf, buf_size) ? -EFAULT : 0;
+	if (err)
+		goto err_copy_from_user;
+	buf[buf_size] = 0;
+
+	if (sscanf(buf, "%u", &num) != 1) {
+		err = -EPERM;
+		goto err_get_number;
+	}
+
+	channel_name = file->f_dentry->d_parent->d_name.name;
+	trace_name = file->f_dentry->d_parent->d_parent->d_parent->d_name.name;
+
+	err = ltt_trace_set_channel_subbufsize(trace_name, channel_name, num);
+	if (IS_ERR_VALUE(err)) {
+		printk(KERN_ERR "channel_subbuf_size_write: "
+		"ltt_trace_set_channel_subbufsize failed: %d\n", err);
+		goto err_set_subbufsize;
+	}
+
+	return count;
+
+err_set_subbufsize:
+err_get_number:
+err_copy_from_user:
+	return err;
+}
+
+static const struct file_operations ltt_channel_subbuf_size_operations = {
+	.write = channel_subbuf_size_write,
+};
+
+
+static ssize_t channel_overwrite_write(struct file *file,
+	const char __user *user_buf, size_t count, loff_t *ppos)
+{
+	int err = 0;
+	char buf[NAME_MAX];
+	int buf_size;
+	char cmd[NAME_MAX];
+	const char *channel_name;
+	const char *trace_name;
+
+	buf_size = min(count, sizeof(buf) - 1);
+	err = copy_from_user(buf, user_buf, buf_size) ? -EFAULT : 0;
+	if (err)
+		goto err_copy_from_user;
+	buf[buf_size] = 0;
+
+	if (sscanf(buf, "%s", cmd) != 1) {
+		err = -EPERM;
+		goto err_get_cmd;
+	}
+
+	if (cmd[1]) {
+		err = -EPERM;
+		goto err_bad_cmd;
+	}
+
+	channel_name = file->f_dentry->d_parent->d_name.name;
+	trace_name = file->f_dentry->d_parent->d_parent->d_parent->d_name.name;
+
+	switch (cmd[0]) {
+	case 'Y':
+	case 'y':
+	case '1':
+		err = ltt_trace_set_channel_overwrite(trace_name, channel_name,
+			1);
+		if (IS_ERR_VALUE(err)) {
+			printk(KERN_ERR "channel_overwrite_write: "
+			"ltt_trace_set_channel_overwrite failed: %d\n", err);
+			goto err_set_subbufsize;
+		}
+		break;
+	case 'N':
+	case 'n':
+	case '0':
+		err = ltt_trace_set_channel_overwrite(trace_name, channel_name,
+			0);
+		if (IS_ERR_VALUE(err)) {
+			printk(KERN_ERR "channel_overwrite_write: "
+			"ltt_trace_set_channel_overwrite failed: %d\n", err);
+			goto err_set_overwrite;
+		}
+		break;
+	default:
+		err = -EINVAL;
+		goto err_bad_cmd;
+	}
+
+	return count;
+
+err_set_overwrite:
+err_bad_cmd:
+err_get_cmd:
+err_copy_from_user:
+	return err;
+}
+
+static const struct file_operations ltt_channel_overwrite_operations = {
+	.write = channel_overwrite_write,
+};
+
+
+static ssize_t channel_enable_write(struct file *file,
+	const char __user *user_buf, size_t count, loff_t *ppos)
+{
+	int err = 0;
+	char buf[NAME_MAX];
+	int buf_size;
+	char cmd[NAME_MAX];
+	const char *channel_name;
+	const char *trace_name;
+
+	buf_size = min(count, sizeof(buf) - 1);
+	if (copy_from_user(buf, user_buf, buf_size)) {
+		err = -EFAULT;
+		goto err_copy_from_user;
+	}
+	buf[buf_size] = 0;
+
+	if (sscanf(buf, "%s", cmd) != 1) {
+		err = -EINVAL;
+		goto err_get_cmd;
+	}
+
+	if (cmd[1]) {
+		err = -EINVAL;
+		goto err_bad_cmd;
+	}
+
+	channel_name = file->f_dentry->d_parent->d_name.name;
+	trace_name = file->f_dentry->d_parent->d_parent->d_parent->d_name.name;
+
+	switch (cmd[0]) {
+	case 'Y':
+	case 'y':
+	case '1':
+		err = ltt_trace_set_channel_enable(trace_name, channel_name,
+			1);
+		if (IS_ERR_VALUE(err)) {
+			printk(KERN_ERR "channel_enable_write: "
+			"ltt_trace_set_channel_enable failed: %d\n", err);
+			goto err_set_enable;
+		}
+		break;
+	case 'N':
+	case 'n':
+	case '0':
+		err = ltt_trace_set_channel_enable(trace_name, channel_name,
+			0);
+		if (IS_ERR_VALUE(err)) {
+			printk(KERN_ERR "channel_enable_write: "
+			"ltt_trace_set_channel_enable failed: %d\n", err);
+			goto err_set_enable;
+		}
+		break;
+	default:
+		err = -EINVAL;
+		goto err_bad_cmd;
+	}
+
+	return count;
+
+err_set_enable:
+err_bad_cmd:
+err_get_cmd:
+err_copy_from_user:
+	return err;
+}
+
+static const struct file_operations ltt_channel_enable_operations = {
+	.write = channel_enable_write,
+};
+
+
+static int _create_trace_control_dir(const char *trace_name,
+				     struct ltt_trace_struct *trace)
+{
+	int err;
+	struct dentry *trace_root, *channel_root;
+	struct dentry *tmp_den;
+	int i;
+
+	/* debugfs/control/trace_name */
+	trace_root = debugfs_create_dir(trace_name, ltt_control_dir);
+	if (IS_ERR(trace_root) || !trace_root) {
+		printk(KERN_ERR "_create_trace_control_dir: "
+			"create control root dir of %s failed\n", trace_name);
+		err = -ENOMEM;
+		goto err_create_trace_root;
+	}
+
+	/* debugfs/control/trace_name/alloc */
+	tmp_den = debugfs_create_file("alloc", S_IWUSR, trace_root, NULL,
+		&ltt_alloc_operations);
+	if (IS_ERR(tmp_den) || !tmp_den) {
+		printk(KERN_ERR "_create_trace_control_dir: "
+			"create file of alloc failed\n");
+		err = -ENOMEM;
+		goto err_create_subdir;
+	}
+
+	/* debugfs/control/trace_name/trans */
+	tmp_den = debugfs_create_file("trans", S_IWUSR, trace_root, NULL,
+		&ltt_trans_operations);
+	if (IS_ERR(tmp_den) || !tmp_den) {
+		printk(KERN_ERR "_create_trace_control_dir: "
+			"create file of trans failed\n");
+		err = -ENOMEM;
+		goto err_create_subdir;
+	}
+
+	/* debugfs/control/trace_name/enabled */
+	tmp_den = debugfs_create_file("enabled", S_IWUSR, trace_root, NULL,
+		&ltt_enabled_operations);
+	if (IS_ERR(tmp_den) || !tmp_den) {
+		printk(KERN_ERR "_create_trace_control_dir: "
+			"create file of enabled failed\n");
+		err = -ENOMEM;
+		goto err_create_subdir;
+	}
+
+	/* debugfs/control/trace_name/channel/ */
+	channel_root = debugfs_create_dir("channel", trace_root);
+	if (IS_ERR(channel_root) || !channel_root) {
+		printk(KERN_ERR "_create_trace_control_dir: "
+			"create dir of channel failed\n");
+		err = -ENOMEM;
+		goto err_create_subdir;
+	}
+
+	/*
+	 * Create dirs and files in debugfs/ltt/control/trace_name/channel/.
+	 * The layout below is created; names in <> are placeholders and the
+	 * brackets themselves are not part of the directory names:
+	 * `-- <control>
+	 *     `-- <trace_name>
+	 *         `-- <channel>
+	 *             |-- <channel_name>
+	 *             |   |-- enable
+	 *             |   |-- overwrite
+	 *             |   |-- subbuf_num
+	 *             |   `-- subbuf_size
+	 *             `-- ...
+	 */
+
+	for (i = 0; i < trace->nr_channels; i++) {
+		struct dentry *channel_den;
+		struct ltt_channel_struct *channel;
+
+		channel = &trace->channels[i];
+		if (!channel->active)
+			continue;
+		channel_den = debugfs_create_dir(channel->channel_name,
+						 channel_root);
+		if (IS_ERR(channel_den) || !channel_den) {
+			printk(KERN_ERR "_create_trace_control_dir: "
+				"create channel dir of %s failed\n",
+				channel->channel_name);
+			err = -ENOMEM;
+			goto err_create_subdir;
+		}
+
+		tmp_den = debugfs_create_file("subbuf_num", S_IWUSR,
+			channel_den, NULL, &ltt_channel_subbuf_num_operations);
+		if (IS_ERR(tmp_den) || !tmp_den) {
+			printk(KERN_ERR "_create_trace_control_dir: "
+				"create subbuf_num in %s failed\n",
+				channel->channel_name);
+			err = -ENOMEM;
+			goto err_create_subdir;
+		}
+
+		tmp_den = debugfs_create_file("subbuf_size", S_IWUSR,
+			channel_den, NULL, &ltt_channel_subbuf_size_operations);
+		if (IS_ERR(tmp_den) || !tmp_den) {
+			printk(KERN_ERR "_create_trace_control_dir: "
+				"create subbuf_size in %s failed\n",
+				channel->channel_name);
+			err = -ENOMEM;
+			goto err_create_subdir;
+		}
+
+		tmp_den = debugfs_create_file("enable", S_IWUSR, channel_den,
+			NULL, &ltt_channel_enable_operations);
+		if (IS_ERR(tmp_den) || !tmp_den) {
+			printk(KERN_ERR "_create_trace_control_dir: "
+				"create enable in %s failed\n",
+				channel->channel_name);
+			err = -ENOMEM;
+			goto err_create_subdir;
+		}
+
+		tmp_den = debugfs_create_file("overwrite", S_IWUSR, channel_den,
+			NULL, &ltt_channel_overwrite_operations);
+		if (IS_ERR(tmp_den) || !tmp_den) {
+			printk(KERN_ERR "_create_trace_control_dir: "
+				"create overwrite in %s failed\n",
+				channel->channel_name);
+			err = -ENOMEM;
+			goto err_create_subdir;
+		}
+	}
+
+	return 0;
+
+err_create_subdir:
+	debugfs_remove_recursive(trace_root);
+err_create_trace_root:
+	return err;
+}
+
+static ssize_t setup_trace_write(struct file *file, const char __user *user_buf,
+		size_t count, loff_t *ppos)
+{
+	int err = 0;
+	char buf[NAME_MAX];
+	int buf_size;
+	char trace_name[NAME_MAX];
+	struct ltt_trace_struct *trace;
+
+	buf_size = min(count, sizeof(buf) - 1);
+	if (copy_from_user(buf, user_buf, buf_size)) {
+		err = -EFAULT;
+		goto err_copy_from_user;
+	}
+	buf[buf_size] = 0;
+
+	if (sscanf(buf, "%s", trace_name) != 1) {
+		err = -EINVAL;
+		goto err_get_tracename;
+	}
+
+	mutex_lock(&control_lock);
+	ltt_lock_traces();
+
+	err = _ltt_trace_setup(trace_name);
+	if (IS_ERR_VALUE(err)) {
+		printk(KERN_ERR
+			"setup_trace_write: ltt_trace_setup failed: %d\n", err);
+		goto err_setup_trace;
+	}
+	trace = _ltt_trace_find_setup(trace_name);
+	BUG_ON(!trace);
+	err = _create_trace_control_dir(trace_name, trace);
+	if (IS_ERR_VALUE(err)) {
+		printk(KERN_ERR "setup_trace_write: "
+			"_create_trace_control_dir failed: %d\n", err);
+		goto err_create_trace_control_dir;
+	}
+
+	ltt_unlock_traces();
+	mutex_unlock(&control_lock);
+
+	return count;
+
+err_create_trace_control_dir:
+	ltt_trace_destroy(trace_name);
+err_setup_trace:
+	ltt_unlock_traces();
+	mutex_unlock(&control_lock);
+err_get_tracename:
+err_copy_from_user:
+	return err;
+}
+
+static const struct file_operations ltt_setup_trace_operations = {
+	.write = setup_trace_write,
+};
+
+static ssize_t destroy_trace_write(struct file *file,
+		const char __user *user_buf, size_t count, loff_t *ppos)
+{
+	int err = 0;
+	char buf[NAME_MAX];
+	int buf_size;
+	char trace_name[NAME_MAX];
+	struct dentry *trace_den;
+
+	buf_size = min(count, sizeof(buf) - 1);
+	if (copy_from_user(buf, user_buf, buf_size)) {
+		err = -EFAULT;
+		goto err_copy_from_user;
+	}
+	buf[buf_size] = 0;
+
+	if (sscanf(buf, "%s", trace_name) != 1) {
+		err = -EINVAL;
+		goto err_get_tracename;
+	}
+
+	mutex_lock(&control_lock);
+
+	err = ltt_trace_destroy(trace_name);
+	if (IS_ERR_VALUE(err)) {
+		printk(KERN_ERR
+			"destroy_trace_write: ltt_trace_destroy failed: %d\n",
+			err);
+		goto err_destroy_trace;
+	}
+
+	trace_den = dir_lookup(ltt_control_dir, trace_name);
+	if (!trace_den) {
+		printk(KERN_ERR
+			"destroy_trace_write: lookup for %s's dentry failed\n",
+			trace_name);
+		err = -ENOENT;
+		goto err_get_dentry;
+	}
+
+	debugfs_remove_recursive(trace_den);
+
+	mutex_unlock(&control_lock);
+
+	return count;
+
+err_get_dentry:
+err_destroy_trace:
+	mutex_unlock(&control_lock);
+err_get_tracename:
+err_copy_from_user:
+	return err;
+}
+
+static const struct file_operations ltt_destroy_trace_operations = {
+	.write = destroy_trace_write,
+};
+
+static int marker_enable_open(struct inode *inode, struct file *filp)
+{
+	filp->private_data = inode->i_private;
+	return 0;
+}
+
+static ssize_t marker_enable_read(struct file *filp, char __user *ubuf,
+			    size_t cnt, loff_t *ppos)
+{
+	struct marker *marker;
+	char *buf;
+	int len;
+
+	marker = (struct marker *)filp->private_data;
+	buf = kmalloc(1024, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	len = sprintf(buf, "%d\n", _imv_read(marker->state));
+
+	len = simple_read_from_buffer(ubuf, cnt, ppos, buf, len);
+	kfree(buf);
+
+	return len;
+}
+
+static ssize_t marker_enable_write(struct file *filp, const char __user *ubuf,
+				size_t cnt, loff_t *ppos)
+{
+	char buf[NAME_MAX];
+	int buf_size;
+	int err = 0;
+	struct marker *marker;
+
+	marker = (struct marker *)filp->private_data;
+	buf_size = min(cnt, sizeof(buf) - 1);
+	if (copy_from_user(buf, ubuf, buf_size))
+		return -EFAULT;
+
+	buf[buf_size] = 0;
+
+	switch (buf[0]) {
+	case 'Y':
+	case 'y':
+	case '1':
+		err = ltt_marker_connect(marker->channel, marker->name,
+					 "default");
+		if (err)
+			return err;
+		break;
+	case 'N':
+	case 'n':
+	case '0':
+		err = ltt_marker_disconnect(marker->channel, marker->name,
+					    "default");
+		if (err)
+			return err;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return cnt;
+}
+
+static const struct file_operations enable_fops = {
+	.open = marker_enable_open,
+	.read = marker_enable_read,
+	.write = marker_enable_write,
+};
+
+static int marker_info_open(struct inode *inode, struct file *filp)
+{
+	filp->private_data = inode->i_private;
+	return 0;
+}
+
+static ssize_t marker_info_read(struct file *filp, char __user *ubuf,
+			    size_t cnt, loff_t *ppos)
+{
+	struct marker *marker;
+	char *buf;
+	int len;
+
+	marker = (struct marker *)filp->private_data;
+	buf = kmalloc(1024, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	len = snprintf(buf, 1024, "format: \"%s\"\nstate: %d\n"
+		      "event_id: %hu\n"
+		      "call: 0x%p\n"
+		      "probe %s : 0x%p\n",
+		      marker->format, _imv_read(marker->state),
+		      marker->event_id, marker->call, marker->ptype ?
+		      "multi" : "single", marker->ptype ?
+		      (void *)marker->multi : (void *)marker->single.func);
+
+	len = simple_read_from_buffer(ubuf, cnt, ppos, buf, len);
+	kfree(buf);
+
+	return len;
+}
+
+static const struct file_operations info_fops = {
+	.open = marker_info_open,
+	.read = marker_info_read,
+};
+
+static int build_marker_file(struct marker *marker)
+{
+	struct dentry *channel_d, *marker_d, *enable_d, *info_d;
+	int err;
+
+	channel_d = dir_lookup(markers_control_dir, marker->channel);
+	if (!channel_d) {
+		channel_d = debugfs_create_dir(marker->channel,
+					       markers_control_dir);
+		if (IS_ERR(channel_d) || !channel_d) {
+			printk(KERN_ERR
+			       "%s: build channel dir of %s failed\n",
+			       __func__, marker->channel);
+			err = -ENOMEM;
+			goto err_build_fail;
+		}
+	}
+
+	marker_d  = dir_lookup(channel_d, marker->name);
+	if (!marker_d) {
+		marker_d = debugfs_create_dir(marker->name, channel_d);
+		if (IS_ERR(marker_d) || !marker_d) {
+			printk(KERN_ERR
+			       "%s: marker dir of %s failed\n",
+			       __func__, marker->name);
+			err = -ENOMEM;
+			goto err_build_fail;
+		}
+	}
+
+	enable_d = dir_lookup(marker_d, "enable");
+	if (!enable_d) {
+		enable_d = debugfs_create_file("enable", 0644, marker_d,
+						marker, &enable_fops);
+		if (IS_ERR(enable_d) || !enable_d) {
+			printk(KERN_ERR
+			       "%s: create file of %s failed\n",
+			       __func__, "enable");
+			err = -ENOMEM;
+			goto err_build_fail;
+		}
+	}
+
+	info_d = dir_lookup(marker_d, "info");
+	if (!info_d) {
+		info_d = debugfs_create_file("info", 0444, marker_d,
+						marker, &info_fops);
+		if (IS_ERR(info_d) || !info_d) {
+			printk(KERN_ERR
+			       "%s: create file of %s failed\n",
+			       __func__, "info");
+			err = -ENOMEM;
+			goto err_build_fail;
+		}
+	}
+
+	return 0;
+
+err_build_fail:
+	return err;
+}
+
+static int build_marker_control_files(void)
+{
+	struct marker_iter iter;
+	int err;
+
+	err = 0;
+	if (!markers_control_dir)
+		return -ENOENT;
+
+	marker_iter_reset(&iter);
+	marker_iter_start(&iter);
+	for (; iter.marker != NULL; marker_iter_next(&iter)) {
+		err = build_marker_file(iter.marker);
+		if (err)
+			break;
+	}
+	/* Always pair marker_iter_start() with marker_iter_stop(). */
+	marker_iter_stop(&iter);
+	return err;
+}
+
+static int remove_marker_control_dir(struct marker *marker)
+{
+	struct dentry *channel_d, *marker_d;
+
+	channel_d = dir_lookup(markers_control_dir, marker->channel);
+	if (!channel_d)
+		return -ENOENT;
+
+	marker_d = dir_lookup(channel_d, marker->name);
+	if (!marker_d)
+		return -ENOENT;
+
+	debugfs_remove_recursive(marker_d);
+	if (list_empty(&channel_d->d_subdirs))
+		debugfs_remove(channel_d);
+
+	return 0;
+}
+
+static void cleanup_control_dir(struct marker *begin, struct marker *end)
+{
+	struct marker *iter;
+
+	if (!markers_control_dir)
+		return;
+
+	for (iter = begin; iter < end; iter++)
+		remove_marker_control_dir(iter);
+
+	return;
+}
+
+static void build_control_dir(struct marker *begin, struct marker *end)
+{
+	struct marker *iter;
+	int err;
+
+	err = 0;
+	if (!markers_control_dir)
+		return;
+
+	for (iter = begin; iter < end; iter++) {
+		err = build_marker_file(iter);
+		if (err)
+			goto err_build_fail;
+	}
+
+	return;
+err_build_fail:
+	cleanup_control_dir(begin, end);
+}
+
+static int module_notify(struct notifier_block *self,
+		  unsigned long val, void *data)
+{
+	struct module *mod = data;
+
+	switch (val) {
+	case MODULE_STATE_COMING:
+		build_control_dir(mod->markers,
+				  mod->markers + mod->num_markers);
+		break;
+	case MODULE_STATE_GOING:
+		cleanup_control_dir(mod->markers,
+				    mod->markers + mod->num_markers);
+		break;
+	}
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block module_nb = {
+	.notifier_call = module_notify,
+};
+
+static int __init ltt_trace_control_init(void)
+{
+	int err = 0;
+	struct dentry *ltt_root_dentry;
+
+	ltt_root_dentry = get_ltt_root();
+	if (!ltt_root_dentry) {
+		err = -ENOENT;
+		goto err_no_root;
+	}
+
+	ltt_control_dir = debugfs_create_dir(LTT_CONTROL_DIR, ltt_root_dentry);
+	if (IS_ERR(ltt_control_dir) || !ltt_control_dir) {
+		printk(KERN_ERR
+			"ltt_channel_control_init: create dir of %s failed\n",
+			LTT_CONTROL_DIR);
+		err = -ENOMEM;
+		goto err_create_control_dir;
+	}
+
+	ltt_setup_trace_file = debugfs_create_file(LTT_SETUP_TRACE_FILE,
+		S_IWUSR, ltt_root_dentry, NULL, &ltt_setup_trace_operations);
+	if (IS_ERR(ltt_setup_trace_file) || !ltt_setup_trace_file) {
+		printk(KERN_ERR
+			"ltt_channel_control_init: create file of %s failed\n",
+			LTT_SETUP_TRACE_FILE);
+		err = -ENOMEM;
+		goto err_create_setup_trace_file;
+	}
+
+	ltt_destroy_trace_file = debugfs_create_file(LTT_DESTROY_TRACE_FILE,
+		S_IWUSR, ltt_root_dentry, NULL, &ltt_destroy_trace_operations);
+	if (IS_ERR(ltt_destroy_trace_file) || !ltt_destroy_trace_file) {
+		printk(KERN_ERR
+			"ltt_channel_control_init: create file of %s failed\n",
+			LTT_DESTROY_TRACE_FILE);
+		err = -ENOMEM;
+		goto err_create_destroy_trace_file;
+	}
+
+	markers_control_dir = debugfs_create_dir(MARKERS_CONTROL_DIR,
+						 ltt_root_dentry);
+	if (IS_ERR(markers_control_dir) || !markers_control_dir) {
+		printk(KERN_ERR
+			"ltt_channel_control_init: create dir of %s failed\n",
+			MARKERS_CONTROL_DIR);
+		err = -ENOMEM;
+		goto err_create_marker_control_dir;
+	}
+
+	err = build_marker_control_files();
+	if (err)
+		goto err_build_fail;
+
+	err = register_module_notifier(&module_nb);
+	if (!err)
+		return 0;
+
+err_build_fail:
+	debugfs_remove_recursive(markers_control_dir);
+	markers_control_dir = NULL;
+err_create_marker_control_dir:
+	debugfs_remove(ltt_destroy_trace_file);
+err_create_destroy_trace_file:
+	debugfs_remove(ltt_setup_trace_file);
+err_create_setup_trace_file:
+	debugfs_remove(ltt_control_dir);
+err_create_control_dir:
+	put_ltt_root();
+err_no_root:
+	return err;
+}
+
+static void __exit ltt_trace_control_exit(void)
+{
+	struct dentry *trace_dir;
+
+	/* destroy all traces */
+	list_for_each_entry(trace_dir, &ltt_control_dir->d_subdirs,
+		d_u.d_child) {
+		ltt_trace_stop(trace_dir->d_name.name);
+		ltt_trace_destroy(trace_dir->d_name.name);
+	}
+
+	/* clean dirs in debugfs */
+	debugfs_remove(ltt_setup_trace_file);
+	debugfs_remove(ltt_destroy_trace_file);
+	debugfs_remove_recursive(ltt_control_dir);
+	debugfs_remove_recursive(markers_control_dir);
+	unregister_module_notifier(&module_nb);
+	put_ltt_root();
+}
+
+module_init(ltt_trace_control_init);
+module_exit(ltt_trace_control_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Zhao Lei <zhaolei@cn.fujitsu.com>");
+MODULE_DESCRIPTION("Linux Trace Toolkit Trace Controller");
+
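+
As an aside for reviewers: the Y/y/1 vs. N/n/0 command parsing duplicated in
channel_overwrite_write() and channel_enable_write() above reduces to the
following userspace sketch (illustrative only; the kernel handlers return an
errno and act on the named channel):

```c
#include <assert.h>

/*
 * Sketch of the single-character boolean command parsing used by the
 * write handlers above: returns 1 to enable, 0 to disable, -1 on any
 * other input (multi-character strings, empty string, unknown chars).
 */
static int parse_bool_cmd(const char *cmd)
{
	if (cmd[0] != '\0' && cmd[1] != '\0')
		return -1;	/* more than one character */
	switch (cmd[0]) {
	case 'Y':
	case 'y':
	case '1':
		return 1;
	case 'N':
	case 'n':
	case '0':
		return 0;
	default:
		return -1;	/* includes the empty string */
	}
}
```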

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [RFC patch 29/41] LTTng menus
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (27 preceding siblings ...)
  2009-03-05 22:47 ` [RFC patch 28/41] LTT trace control Mathieu Desnoyers
@ 2009-03-05 22:47 ` Mathieu Desnoyers
  2009-03-05 23:35   ` Randy Dunlap
  2009-03-05 22:47 ` [RFC patch 30/41] LTTng build Mathieu Desnoyers
                   ` (13 subsequent siblings)
  42 siblings, 1 reply; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:47 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers

[-- Attachment #1: lttng-menus.patch --]
[-- Type: text/plain, Size: 5656 bytes --]

LTTng build Kconfig and makefiles.
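
With the defaults chosen below, a resulting configuration would contain roughly
this fragment (hand-written illustration from the `default` lines, not
generated output; the relay/serializer symbols additionally depend on which
relay transport is selected):

```
CONFIG_LTT=y
CONFIG_LTT_TRACER=y
CONFIG_LTT_TRACE_CONTROL=m
CONFIG_LTT_VMCORE=y
# CONFIG_LTT_RELAY_LOCKED is not set
# CONFIG_LTT_ALIGNMENT is not set
```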

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 ltt/Kconfig  |  127 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 ltt/Makefile |   14 ++++++
 2 files changed, 141 insertions(+)

Index: linux-2.6-lttng/ltt/Kconfig
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/ltt/Kconfig	2009-03-05 16:09:08.000000000 -0500
@@ -0,0 +1,127 @@
+menuconfig LTT
+	bool "Linux Trace Toolkit Next Generation (LTTng)"
+	depends on EXPERIMENTAL
+	select MARKERS
+	select TRACEPOINTS
+	default y
+	help
+	  It is possible for the kernel to log important events to a trace
+	  facility. Doing so enables the use of the generated traces in order
+	  to reconstruct the dynamic behavior of the kernel, and hence the
+	  whole system.
+
+	  The tracing process contains 4 parts :
+	      1) The logging of events by key parts of the kernel.
+	      2) The tracer that keeps the events in a data buffer (uses
+	         relay).
+	      3) A trace daemon that interacts with the tracer and is
+	         notified every time there is a certain quantity of data to
+	         read from the tracer.
+	      4) A trace event data decoder that reads the accumulated data
+	         and formats it in a human-readable format.
+
+	  If you say Y, the first component will be built into the kernel.
+
+	  For more information on kernel tracing, lttctl, lttd or lttv,
+	  please check the following address :
+	       http://ltt.polymtl.ca
+
+if LTT
+
+config LTT_RELAY_ALLOC
+	def_bool n
+
+config LTT_RELAY_LOCKED
+	tristate "Linux Trace Toolkit Lock-Protected Data Relay"
+	select DEBUG_FS
+	select LTT_RELAY_ALLOC
+	depends on LTT_TRACER
+	default n
+	help
+	  Support using the slow spinlock and interrupt disable algorithm to log
+	  the data obtained through LTT.
+
+config LTT_RELAY_CHECK_RANDOM_ACCESS
+	bool "Debug check for random access in ltt relay buffers"
+	depends on LTT_RELAY_ALLOC
+	default n
+	help
+	  Add checks for random access to LTTng relay buffers. Given that
+	  those buffers are a linked list, such accesses are rather slow.
+	  Rare accesses are OK; they can be caused by large writes (more than
+	  a page) or by reentrancy (e.g. interrupt nesting over tracing code).
+
+config LTT_SERIALIZE
+	tristate "Linux Trace Toolkit Serializer"
+	depends on LTT_RELAY_ALLOC
+	default y
+	help
+	  Library for serializing information from format string and argument
+	  list to the trace buffers.
+
+config LTT_FAST_SERIALIZE
+	tristate "Linux Trace Toolkit Custom Serializer"
+	depends on LTT_RELAY_ALLOC
+	default y
+	help
+	  Library for serializing information from custom, efficient
+	  tracepoint probes.
+
+config LTT_TRACE_CONTROL
+	tristate "Linux Trace Toolkit Trace Controller"
+	depends on LTT_TRACER
+	default m
+	help
+	  If you enable this option, the debugfs-based Linux Trace Toolkit
+	  Trace Controller will be built either into the kernel or as a module.
+
+config LTT_TRACER
+	tristate "Linux Trace Toolkit Tracer"
+	default y
+	help
+	  If you enable this option, the Linux Trace Toolkit Tracer will be
+	  built either into the kernel or as a module.
+
+	  Critical parts of the kernel will call upon the kernel tracing
+	  function. The data is then recorded by the tracer if a trace daemon
+	  is running in user-space and has issued a "start" command.
+
+	  For more information on kernel tracing, the trace daemon or the event
+	  decoder, please check the following address :
+	       http://www.opersys.com/ltt
+	  See also the experimental page of the project :
+	       http://ltt.polymtl.ca
+
+config LTT_ALIGNMENT
+	bool "Align Linux Trace Toolkit Traces"
+	default n
+	help
+	  This option enables dynamic alignment of data in buffers. Data
+	  is aligned on the smaller of the architecture word size and the
+	  size of the value to be written.
+
+	  Dynamically calculating the offset of the data has a performance cost,
+	  but it is more efficient on some architectures (especially 64 bits) to
+	  align data than to write it unaligned.
+
+config LTT_CHECK_ARCH_EFFICIENT_UNALIGNED_ACCESS
+	def_bool y
+	select LTT_ALIGNMENT if !HAVE_EFFICIENT_UNALIGNED_ACCESS
+
+config LTT_DEBUG_EVENT_SIZE
+	bool "Add event size field to LTT events for tracer debugging"
+	default n
+	help
+	  Tracer-internal option to help debugging event type encoding problems.
+
+config LTT_VMCORE
+	bool "Support trace extraction from crash dump"
+	default y
+	help
+	  If you enable this option, the Linux Trace Toolkit Tracer will
+	  support extracting the LTT log from a vmcore, which can be
+	  generated with kdump or the LKCD tools.
+
+	  A special crash extension must be used to extract the LTT buffers.
+
+endif # LTT
Index: linux-2.6-lttng/ltt/Makefile
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/ltt/Makefile	2009-03-05 16:09:20.000000000 -0500
@@ -0,0 +1,14 @@
+#
+# Makefile for the LTT objects.
+#
+
+obj-$(CONFIG_MARKERS)			+= ltt-channels.o
+obj-$(CONFIG_LTT)			+= ltt-core.o
+obj-$(CONFIG_LTT_TRACER)		+= ltt-tracer.o
+obj-$(CONFIG_LTT_TRACE_CONTROL)		+= ltt-marker-control.o
+obj-$(CONFIG_LTT_RELAY_LOCKED)		+= ltt-relay-locked.o
+obj-$(CONFIG_LTT_RELAY_ALLOC)		+= ltt-relay-alloc.o
+obj-$(CONFIG_LTT_SERIALIZE)		+= ltt-serialize.o
+obj-$(CONFIG_LTT_STATEDUMP)		+= ltt-statedump.o
+obj-$(CONFIG_LTT_FAST_SERIALIZE)	+= ltt-type-serializer.o
+obj-$(CONFIG_LTT_TRACE_CONTROL)		+= ltt-trace-control.o

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [RFC patch 30/41] LTTng build
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (28 preceding siblings ...)
  2009-03-05 22:47 ` [RFC patch 29/41] LTTng menus Mathieu Desnoyers
@ 2009-03-05 22:47 ` Mathieu Desnoyers
  2009-03-05 22:47 ` [RFC patch 31/41] LTTng userspace event v2 Mathieu Desnoyers
                   ` (12 subsequent siblings)
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:47 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers, Ingo Molnar, Peter Zijlstra, Thomas Gleixner,
	William L. Irwin

[-- Attachment #1: lttng-build-instrumentation-menu.patch --]
[-- Type: text/plain, Size: 2439 bytes --]

Adds the basic LTTng config options.

sparc32 needs to have ltt/ added to core-y for some reason (seems broken).

ltt/Kconfig is sourced from init/Kconfig.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Ingo Molnar <mingo@redhat.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: William L. Irwin <wli@holomorphy.com>
---
 Makefile            |    2 +-
 arch/sparc/Makefile |    2 +-
 init/Kconfig        |    2 ++
 3 files changed, 4 insertions(+), 2 deletions(-)

Index: linux-2.6-lttng/arch/sparc/Makefile
===================================================================
--- linux-2.6-lttng.orig/arch/sparc/Makefile	2009-03-05 15:57:40.000000000 -0500
+++ linux-2.6-lttng/arch/sparc/Makefile	2009-03-05 15:57:45.000000000 -0500
@@ -81,7 +81,7 @@ drivers-$(CONFIG_OPROFILE)	+= arch/sparc
 # Export what is needed by arch/sparc/boot/Makefile
 export VMLINUX_INIT VMLINUX_MAIN
 VMLINUX_INIT := $(head-y) $(init-y)
-VMLINUX_MAIN := $(core-y) kernel/ mm/ fs/ ipc/ security/ crypto/ block/
+VMLINUX_MAIN := $(core-y) kernel/ mm/ fs/ ipc/ security/ crypto/ block/ ltt/
 VMLINUX_MAIN += $(patsubst %/, %/lib.a, $(libs-y)) $(libs-y)
 VMLINUX_MAIN += $(drivers-y) $(net-y)
 
Index: linux-2.6-lttng/Makefile
===================================================================
--- linux-2.6-lttng.orig/Makefile	2009-03-05 15:57:40.000000000 -0500
+++ linux-2.6-lttng/Makefile	2009-03-05 15:57:45.000000000 -0500
@@ -630,7 +630,7 @@ export mod_strip_cmd
 
 
 ifeq ($(KBUILD_EXTMOD),)
-core-y		+= kernel/ mm/ fs/ ipc/ security/ crypto/ block/
+core-y		+= kernel/ mm/ fs/ ipc/ security/ crypto/ block/ ltt/
 
 vmlinux-dirs	:= $(patsubst %/,%,$(filter %/, $(init-y) $(init-m) \
 		     $(core-y) $(core-m) $(drivers-y) $(drivers-m) \
Index: linux-2.6-lttng/init/Kconfig
===================================================================
--- linux-2.6-lttng.orig/init/Kconfig	2009-03-05 15:57:40.000000000 -0500
+++ linux-2.6-lttng/init/Kconfig	2009-03-05 15:57:45.000000000 -0500
@@ -987,6 +987,8 @@ config MARKERS
 
 source "arch/Kconfig"
 
+source "ltt/Kconfig"
+
 endmenu		# General setup
 
 config HAVE_GENERIC_DMA_COHERENT

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [RFC patch 31/41] LTTng userspace event v2
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (29 preceding siblings ...)
  2009-03-05 22:47 ` [RFC patch 30/41] LTTng build Mathieu Desnoyers
@ 2009-03-05 22:47 ` Mathieu Desnoyers
  2009-03-05 22:48 ` [RFC patch 32/41] LTTng filter Mathieu Desnoyers
                   ` (11 subsequent siblings)
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:47 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers

[-- Attachment #1: lttng-userspace-write-event.patch --]
[-- Type: text/plain, Size: 6198 bytes --]

Add userspace event support to LTTng.

Userspace simply has to write to :

/debugfs/ltt/write_event

E.g. :

echo "Error X happened !" > /debugfs/ltt/write_event

(assuming debugfs is mounted under /debugfs)

Todo :
Maybe use ltt_relay_user_blocking to block if channel is full rather than losing
an event ? Be careful about effect of stopped tracing on userspace...

Changelog :
- Support correctly when multiple strings are sent to the same write.
- Cut the strings at each \n or \0.
- Made sure we never return a count value larger than the requested count. Count
  is counting the number of _source_ data used, not the number of trace bytes
  written.
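
The count semantics spelled out above can be modeled in plain userspace C (a
sketch, not the kernel code: strncpy_from_user() is approximated with
strnlen()+memcpy(), and the buffer is fixed-size):

```c
#include <assert.h>
#include <string.h>

/*
 * Userspace model of write_event()'s return value: copy at most count
 * bytes, cut at the first '\0' or '\n', count the terminator, and never
 * report more than count bytes consumed.
 */
static size_t write_event_model(const char *user_buf, size_t count)
{
	char buf[256];
	size_t copycount;
	char *end;

	if (count > sizeof(buf) - 1)
		count = sizeof(buf) - 1;

	/* strncpy_from_user() stops at the first NUL within count */
	copycount = strnlen(user_buf, count);
	memcpy(buf, user_buf, copycount);
	buf[copycount] = '\0';

	/* cut the event at the first newline */
	end = strchr(buf, '\n');
	if (end)
		copycount = end - buf;

	/* account for the terminating '\n' or '\0' */
	copycount++;

	/* if neither terminator fell within count, do not overreport */
	return copycount < count ? copycount : count;
}
```

With `echo "Error X happened !"`, the trailing newline ends the event, so the
whole write is reported as consumed.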

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 ltt/Kconfig               |    9 +++
 ltt/Makefile              |    1 
 ltt/ltt-userspace-event.c |  131 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 141 insertions(+)

Index: linux-2.6-lttng/ltt/Kconfig
===================================================================
--- linux-2.6-lttng.orig/ltt/Kconfig	2009-03-05 16:09:08.000000000 -0500
+++ linux-2.6-lttng/ltt/Kconfig	2009-03-05 16:09:41.000000000 -0500
@@ -114,6 +114,15 @@ config LTT_DEBUG_EVENT_SIZE
 	help
 	  Tracer-internal option to help debugging event type encoding problems.
 
+config LTT_USERSPACE_EVENT
+	tristate "Support logging events from userspace"
+	depends on LTT_TRACER
+	depends on LTT_FAST_SERIALIZE
+	default m
+	help
+	  This option lets userspace write text events in
+	  /debugfs/ltt/write_event.
+
 config LTT_VMCORE
 	bool "Support trace extraction from crash dump"
 	default y
Index: linux-2.6-lttng/ltt/Makefile
===================================================================
--- linux-2.6-lttng.orig/ltt/Makefile	2009-03-05 16:09:20.000000000 -0500
+++ linux-2.6-lttng/ltt/Makefile	2009-03-05 16:09:53.000000000 -0500
@@ -12,3 +12,4 @@ obj-$(CONFIG_LTT_SERIALIZE)		+= ltt-seri
 obj-$(CONFIG_LTT_STATEDUMP)		+= ltt-statedump.o
 obj-$(CONFIG_LTT_FAST_SERIALIZE)	+= ltt-type-serializer.o
 obj-$(CONFIG_LTT_TRACE_CONTROL)		+= ltt-trace-control.o
+obj-$(CONFIG_LTT_USERSPACE_EVENT)	+= ltt-userspace-event.o
Index: linux-2.6-lttng/ltt/ltt-userspace-event.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/ltt/ltt-userspace-event.c	2009-03-05 16:09:41.000000000 -0500
@@ -0,0 +1,131 @@
+/*
+ * Copyright (C) 2008 Mathieu Desnoyers
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+
+#include <linux/module.h>
+#include <linux/marker.h>
+#include <linux/uaccess.h>
+#include <linux/gfp.h>
+#include <linux/fs.h>
+#include <linux/debugfs.h>
+#include <linux/ltt-type-serializer.h>
+
+#define LTT_WRITE_EVENT_FILE	"write_event"
+
+DEFINE_MARKER(userspace, event, "string %s");
+static struct dentry *ltt_event_file;
+
+/**
+ * write_event - write a userspace string into the trace system
+ * @file: file pointer
+ * @user_buf: user string
+ * @count: length to copy, including the final NULL
+ * @ppos: unused
+ *
+ * Copy a string into a trace event, in channel "userspace", event "event".
+ * Copies until either \n or \0 is reached.
+ * On success, returns the number of bytes copied from the source, including the
+ * \n or \0 character (if there was one in the count range). It cannot return
+ * more than count.
+ * Inspired from tracing_mark_write implementation from Steven Rostedt and
+ * Ingo Molnar.
+ */
+static ssize_t write_event(struct file *file, const char __user *user_buf,
+		size_t count, loff_t *ppos)
+{
+	struct marker *marker;
+	char *buf, *end;
+	long copycount;
+	ssize_t ret;
+
+	buf = kmalloc(count + 1, GFP_KERNEL);
+	if (!buf) {
+		ret = -ENOMEM;
+		goto string_out;
+	}
+	copycount = strncpy_from_user(buf, user_buf, count);
+	if (copycount < 0) {
+		ret = -EFAULT;
+		goto string_err;
+	}
+	/* Cut from the first nil or newline. */
+	buf[copycount] = '\0';
+	end = strchr(buf, '\n');
+	if (end) {
+		*end = '\0';
+		copycount = end - buf;
+	}
+	/* Add final \0 to copycount */
+	copycount++;
+	marker = &GET_MARKER(userspace, event);
+	ltt_specialized_trace(marker, marker->single.probe_private,
+		buf, copycount, sizeof(char));
+	/* If there is no \0 nor \n in count, do not return a larger value */
+	ret = min_t(size_t, copycount, count);
+string_err:
+	kfree(buf);
+string_out:
+	return ret;
+}
+
+static const struct file_operations ltt_userspace_operations = {
+	.write = write_event,
+};
+
+static int __init ltt_userspace_init(void)
+{
+	struct dentry *ltt_root_dentry;
+	int err = 0;
+
+	ltt_root_dentry = get_ltt_root();
+	if (!ltt_root_dentry) {
+		err = -ENOENT;
+		goto err_no_root;
+	}
+
+	ltt_event_file = debugfs_create_file(LTT_WRITE_EVENT_FILE,
+					     S_IWUGO,
+					     ltt_root_dentry,
+					     NULL,
+					     &ltt_userspace_operations);
+	if (IS_ERR(ltt_event_file) || !ltt_event_file) {
+		printk(KERN_ERR
+			"ltt_userspace_init: failed to create file %s\n",
+			LTT_WRITE_EVENT_FILE);
+		err = -EPERM;
+		goto err_no_file;
+	}
+
+	return err;
+err_no_file:
+	put_ltt_root();
+err_no_root:
+	return err;
+}
+
+static void __exit ltt_userspace_exit(void)
+{
+	debugfs_remove(ltt_event_file);
+	put_ltt_root();
+}
+
+module_init(ltt_userspace_init);
+module_exit(ltt_userspace_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>");
+MODULE_DESCRIPTION("Linux Trace Toolkit Userspace Event");

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 57+ messages in thread
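[Editor's note] The copy/truncate/return semantics of write_event() in the patch above can be sketched in plain userspace C. This is a hedged illustration, not the kernel code: sketch_write_event, event_out, and the use of memchr/malloc are hypothetical stand-ins for strncpy_from_user(), ltt_specialized_trace() and the kernel allocator.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Userspace sketch of write_event()'s string handling: consume at most
 * `count` bytes, cut at the first '\0' or '\n', and report how many
 * source bytes were used (terminator included), never more than
 * `count`.  `event_out` receives the payload that would be traced.
 */
static ssize_t sketch_write_event(const char *user_buf, size_t count,
				  char *event_out)
{
	const char *nul = memchr(user_buf, '\0', count);
	size_t copycount = nul ? (size_t)(nul - user_buf) : count;
	char *buf, *end;
	ssize_t ret;

	buf = malloc(count + 1);
	if (!buf)
		return -1;		/* -ENOMEM in the kernel version */
	memcpy(buf, user_buf, copycount);	/* like strncpy_from_user() */
	buf[copycount] = '\0';
	end = strchr(buf, '\n');
	if (end) {			/* cut at the first newline */
		*end = '\0';
		copycount = end - buf;
	}
	copycount++;			/* account for the final '\0' */
	strcpy(event_out, buf);		/* stands in for ltt_specialized_trace() */
	/* never report more than the caller handed us */
	ret = copycount < count ? (ssize_t)copycount : (ssize_t)count;
	free(buf);
	return ret;
}
```

Writing "hello\n" with count 6 reports 6 bytes consumed and traces "hello"; writing "abc" with count 3 (no terminator in range) still reports only 3, matching the "cannot return more than count" comment in the patch.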

* [RFC patch 32/41] LTTng filter
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (30 preceding siblings ...)
  2009-03-05 22:47 ` [RFC patch 31/41] LTTng userspace event v2 Mathieu Desnoyers
@ 2009-03-05 22:48 ` Mathieu Desnoyers
  2009-03-05 22:48 ` [RFC patch 33/41] LTTng dynamic tracing support with kprobes Mathieu Desnoyers
                   ` (10 subsequent siblings)
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:48 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers

[-- Attachment #1: lttng-filter.patch --]
[-- Type: text/plain, Size: 4128 bytes --]

Add a filter module providing the filter/ debugfs directory entry.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 include/linux/ltt-tracer.h |    2 +
 ltt/Kconfig                |    3 ++
 ltt/Makefile               |    1 
 ltt/ltt-filter.c           |   66 +++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 72 insertions(+)

Index: linux-2.6-lttng/ltt/Makefile
===================================================================
--- linux-2.6-lttng.orig/ltt/Makefile	2009-03-05 16:09:53.000000000 -0500
+++ linux-2.6-lttng/ltt/Makefile	2009-03-05 16:10:10.000000000 -0500
@@ -13,3 +13,4 @@ obj-$(CONFIG_LTT_STATEDUMP)		+= ltt-stat
 obj-$(CONFIG_LTT_FAST_SERIALIZE)	+= ltt-type-serializer.o
 obj-$(CONFIG_LTT_TRACE_CONTROL)		+= ltt-trace-control.o
 obj-$(CONFIG_LTT_USERSPACE_EVENT)	+= ltt-userspace-event.o
+obj-$(CONFIG_LTT_FILTER)		+= ltt-filter.o
Index: linux-2.6-lttng/ltt/ltt-filter.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/ltt/ltt-filter.c	2009-03-05 16:09:59.000000000 -0500
@@ -0,0 +1,66 @@
+/*
+ * Copyright (C) 2008 Mathieu Desnoyers
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+
+#include <linux/module.h>
+#include <linux/debugfs.h>
+#include <linux/fs.h>
+#include <linux/ltt-tracer.h>
+#include <linux/mutex.h>
+
+#define LTT_FILTER_DIR	"filter"
+
+/*
+ * Protects the ltt_filter_dir allocation.
+ */
+static DEFINE_MUTEX(ltt_filter_mutex);
+
+static struct dentry *ltt_filter_dir;
+
+struct dentry *get_filter_root(void)
+{
+	struct dentry *ltt_root_dentry;
+
+	mutex_lock(&ltt_filter_mutex);
+	if (!ltt_filter_dir) {
+		ltt_root_dentry = get_ltt_root();
+		if (!ltt_root_dentry)
+			goto err_no_root;
+
+		ltt_filter_dir = debugfs_create_dir(LTT_FILTER_DIR,
+						    ltt_root_dentry);
+		if (!ltt_filter_dir)
+			printk(KERN_ERR
+				"ltt_filter_init: failed to create dir %s\n",
+				LTT_FILTER_DIR);
+	}
+err_no_root:
+	mutex_unlock(&ltt_filter_mutex);
+	return ltt_filter_dir;
+}
+EXPORT_SYMBOL_GPL(get_filter_root);
+
+static void __exit ltt_filter_exit(void)
+{
+	debugfs_remove(ltt_filter_dir);
+}
+
+module_exit(ltt_filter_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>");
+MODULE_DESCRIPTION("Linux Trace Toolkit Filter");
Index: linux-2.6-lttng/include/linux/ltt-tracer.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/ltt-tracer.h	2009-03-05 16:08:41.000000000 -0500
+++ linux-2.6-lttng/include/linux/ltt-tracer.h	2009-03-05 16:09:59.000000000 -0500
@@ -648,6 +648,8 @@ enum ltt_filter_control_msg {
 extern int ltt_filter_control(enum ltt_filter_control_msg msg,
 		const char *trace_name);
 
+extern struct dentry *get_filter_root(void);
+
 void ltt_write_trace_header(struct ltt_trace_struct *trace,
 		struct ltt_subbuffer_header *header);
 extern void ltt_buffer_destroy(struct ltt_channel_struct *ltt_chan);
Index: linux-2.6-lttng/ltt/Kconfig
===================================================================
--- linux-2.6-lttng.orig/ltt/Kconfig	2009-03-05 16:09:41.000000000 -0500
+++ linux-2.6-lttng/ltt/Kconfig	2009-03-05 16:09:59.000000000 -0500
@@ -28,6 +28,9 @@ menuconfig LTT
 
 if LTT
 
+config LTT_FILTER
+	tristate
+
 config LTT_RELAY_ALLOC
 	def_bool n
 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 57+ messages in thread
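[Editor's note] get_filter_root() in the patch above is the classic lazy-initialization idiom: create the directory once under a mutex, then hand back the cached pointer on every later call. A minimal userspace sketch with pthreads (create_dir and get_filter_root_sketch are hypothetical stand-ins for debugfs_create_dir() and the real function):

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t filter_mutex = PTHREAD_MUTEX_INITIALIZER;
static void *filter_dir;	/* stands in for the cached dentry */

/* hypothetical stand-in for debugfs_create_dir() */
static void *create_dir(void)
{
	return malloc(1);
}

/* Mirrors get_filter_root(): allocate on first call, then reuse. */
static void *get_filter_root_sketch(void)
{
	pthread_mutex_lock(&filter_mutex);
	if (!filter_dir)
		filter_dir = create_dir();
	pthread_mutex_unlock(&filter_mutex);
	return filter_dir;	/* may be NULL if creation failed */
}
```

Every caller gets the same pointer, and the mutex makes concurrent first calls safe; like the kernel version, a creation failure is reported by returning NULL rather than an error code.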

* [RFC patch 33/41] LTTng dynamic tracing support with kprobes
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (31 preceding siblings ...)
  2009-03-05 22:48 ` [RFC patch 32/41] LTTng filter Mathieu Desnoyers
@ 2009-03-05 22:48 ` Mathieu Desnoyers
  2009-03-05 22:48 ` [RFC patch 34/41] Marker header API update Mathieu Desnoyers
                   ` (9 subsequent siblings)
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:48 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers

[-- Attachment #1: lttng-kprobes-support.patch --]
[-- Type: text/plain, Size: 14713 bytes --]

Add Kprobe support to LTTng, so we can use

/mnt/debugfs/ltt/kprobe/enable
/mnt/debugfs/ltt/kprobe/disable
/mnt/debugfs/ltt/kprobe/list

to enable, disable, and list the active LTTng kprobes, respectively.

An event kernel.kprobe will be logged, recording the instruction pointer
associated with the probe hit.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 include/linux/ltt-tracer.h |    8 
 kernel/kallsyms.c          |    1 
 ltt/Kconfig                |   15 +
 ltt/Makefile               |    1 
 ltt/ltt-kprobes.c          |  479 +++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 504 insertions(+)

Index: linux-2.6-lttng/ltt/Kconfig
===================================================================
--- linux-2.6-lttng.orig/ltt/Kconfig	2009-03-05 16:09:59.000000000 -0500
+++ linux-2.6-lttng/ltt/Kconfig	2009-03-05 16:10:12.000000000 -0500
@@ -136,4 +136,19 @@ config LTT_VMCORE
 
 	  Special crash extension should be used to extract ltt buffers.
 
+config LTT_KPROBES
+	bool "Linux Trace Toolkit Kprobes Support"
+	depends on HAVE_KPROBES
+	select LTT_TRACE_CONTROL
+	select LTT_FAST_SERIALIZE
+	select KPROBES
+	select KALLSYMS
+	default y
+	help
+	  Allows connecting the LTTng tracer on kprobes using simple debugfs
+	  file operations :
+	    ltt/kprobes/enable
+	    ltt/kprobes/disable
+	    ltt/kprobes/list
+
 endif # LTT
Index: linux-2.6-lttng/ltt/Makefile
===================================================================
--- linux-2.6-lttng.orig/ltt/Makefile	2009-03-05 16:10:10.000000000 -0500
+++ linux-2.6-lttng/ltt/Makefile	2009-03-05 16:10:12.000000000 -0500
@@ -14,3 +14,4 @@ obj-$(CONFIG_LTT_FAST_SERIALIZE)	+= ltt-
 obj-$(CONFIG_LTT_TRACE_CONTROL)		+= ltt-trace-control.o
 obj-$(CONFIG_LTT_USERSPACE_EVENT)	+= ltt-userspace-event.o
 obj-$(CONFIG_LTT_FILTER)		+= ltt-filter.o
+obj-$(CONFIG_LTT_KPROBES)		+= ltt-kprobes.o
Index: linux-2.6-lttng/ltt/ltt-kprobes.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/ltt/ltt-kprobes.c	2009-03-05 16:10:12.000000000 -0500
@@ -0,0 +1,479 @@
+/*
+ * (C) Copyright	2009 -
+ * 		Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca)
+ *
+ * LTTng kprobes integration module.
+ */
+
+#include <linux/module.h>
+#include <linux/kprobes.h>
+#include <linux/ltt-tracer.h>
+#include <linux/marker.h>
+#include <linux/mutex.h>
+#include <linux/jhash.h>
+#include <linux/seq_file.h>
+#include <linux/slab.h>
+#include <linux/debugfs.h>
+#include <linux/kallsyms.h>
+#include <linux/ltt-type-serializer.h>
+
+#define LTT_KPROBES_DIR 	"kprobes"
+#define LTT_KPROBES_ENABLE	"enable"
+#define LTT_KPROBES_DISABLE	"disable"
+#define LTT_KPROBES_LIST	"list"
+
+/* Active LTTng kprobes hash table */
+static DEFINE_MUTEX(ltt_kprobes_mutex);
+
+#define LTT_KPROBE_HASH_BITS	6
+#define LTT_KPROBE_TABLE_SIZE	(1 << LTT_KPROBE_HASH_BITS)
+static struct hlist_head ltt_kprobe_table[LTT_KPROBE_TABLE_SIZE];
+
+struct kprobe_entry {
+	struct hlist_node hlist;
+	struct kprobe kp;
+	char key[0];
+};
+
+static struct dentry *ltt_kprobes_dir,
+		     *ltt_kprobes_enable_dentry,
+		     *ltt_kprobes_disable_dentry,
+		     *ltt_kprobes_list_dentry;
+
+static int module_exit;
+
+
+static void trace_kprobe_table_entry(void *call_data, struct kprobe_entry *e)
+{
+	char namebuf[KSYM_NAME_LEN];
+	unsigned long addr;
+
+	if (e->kp.addr) {
+		sprint_symbol(namebuf,
+			      (unsigned long)e->kp.addr);
+		addr = (unsigned long)e->kp.addr;
+	} else {
+		strcpy(namebuf, e->kp.symbol_name);
+		/* TODO : add offset */
+		addr = kallsyms_lookup_name(namebuf);
+	}
+	if (addr)
+		__trace_mark(0, kprobe_state, kprobe_table,
+			call_data,
+			"ip 0x%lX symbol %s", addr, namebuf);
+}
+
+DEFINE_MARKER(kernel, kprobe, "ip %lX");
+
+static int ltt_kprobe_handler_pre(struct kprobe *p, struct pt_regs *regs)
+{
+	struct marker *marker;
+	unsigned long data;
+
+	data = (unsigned long)p->addr;
+	marker = &GET_MARKER(kernel, kprobe);
+	ltt_specialized_trace(marker, marker->single.probe_private,
+		&data, sizeof(data), sizeof(data));
+	return 0;
+}
+
+static int ltt_register_kprobe(const char *key)
+{
+	struct hlist_head *head;
+	struct hlist_node *node;
+	struct kprobe_entry *e = NULL;
+	char *symbol_name = NULL;
+	unsigned long addr;
+	unsigned int offset = 0;
+	u32 hash;
+	size_t key_len = strlen(key) + 1;
+	int ret;
+
+	if (key_len == 1)
+		return -ENOENT;	/* only \0 */
+
+	if (sscanf(key, "%li", &addr) != 1)
+		addr = 0;
+
+	if (!addr) {
+		const char *symbol_end = NULL;
+		unsigned int symbol_len;	/* includes final \0 */
+
+		symbol_end = strchr(key, ' ');
+		if (symbol_end)
+			symbol_len = symbol_end - key + 1;
+		else
+			symbol_len = key_len;
+		symbol_name = kmalloc(symbol_len, GFP_KERNEL);
+		if (!symbol_name) {
+			ret = -ENOMEM;
+			goto error;
+		}
+		memcpy(symbol_name, key, symbol_len - 1);
+		symbol_name[symbol_len-1] = '\0';
+		if (symbol_end) {
+			symbol_end++;	/* start of offset */
+			if (sscanf(symbol_end, "%i", &offset) != 1)
+				offset = 0;
+		}
+	}
+
+	hash = jhash(key, key_len-1, 0);
+	head = &ltt_kprobe_table[hash & ((1 << LTT_KPROBE_HASH_BITS)-1)];
+	hlist_for_each_entry(e, node, head, hlist) {
+		if (!strcmp(key, e->key)) {
+			printk(KERN_NOTICE "Kprobe %s busy\n", key);
+			ret = -EBUSY;
+			goto error;
+		}
+	}
+	/*
+	 * Using kzalloc here to allocate a variable length element. Could
+	 * cause some memory fragmentation if overused.
+	 */
+	e = kzalloc(sizeof(struct kprobe_entry) + key_len, GFP_KERNEL);
+	if (!e) {
+		ret = -ENOMEM;
+		goto error;
+	}
+	memcpy(e->key, key, key_len);
+	hlist_add_head(&e->hlist, head);
+	e->kp.pre_handler = ltt_kprobe_handler_pre;
+	e->kp.symbol_name = symbol_name;
+	e->kp.offset = offset;
+	e->kp.addr = (void *)addr;
+	ret = register_kprobe(&e->kp);
+	if (ret < 0)
+		goto error_list_del;
+	trace_kprobe_table_entry(NULL, e);
+	return 0;
+
+error_list_del:
+	hlist_del(&e->hlist);
+error:
+	kfree(symbol_name);
+	kfree(e);
+	return ret;
+}
+
+static int ltt_unregister_kprobe(const char *key)
+{
+	struct hlist_head *head;
+	struct hlist_node *node;
+	struct kprobe_entry *e;
+	int found = 0;
+	size_t key_len = strlen(key) + 1;
+	u32 hash;
+
+	hash = jhash(key, key_len-1, 0);
+	head = &ltt_kprobe_table[hash & ((1 << LTT_KPROBE_HASH_BITS)-1)];
+	hlist_for_each_entry(e, node, head, hlist) {
+		if (!strcmp(key, e->key)) {
+			found = 1;
+			break;
+		}
+	}
+	if (!found)
+		return -ENOENT;
+	hlist_del(&e->hlist);
+	unregister_kprobe(&e->kp);
+	kfree(e->kp.symbol_name);
+	kfree(e);
+	return 0;
+}
+
+static void ltt_unregister_all_kprobes(void)
+{
+	struct kprobe_entry *e;
+	struct hlist_head *head;
+	struct hlist_node *node, *tmp;
+	unsigned int i;
+
+	for (i = 0; i < LTT_KPROBE_TABLE_SIZE; i++) {
+		head = &ltt_kprobe_table[i];
+		hlist_for_each_entry_safe(e, node, tmp, head, hlist) {
+			hlist_del(&e->hlist);
+			unregister_kprobe(&e->kp);
+			kfree(e->kp.symbol_name);
+			kfree(e);
+		}
+	}
+}
+
+/*
+ * Allows to specify either
+ * - symbol
+ * - symbol offset
+ * - address
+ */
+static ssize_t enable_op_write(struct file *file,
+	const char __user *user_buf, size_t count, loff_t *ppos)
+{
+	int err, buf_size;
+	char buf[NAME_MAX];
+	char *end;
+
+	mutex_lock(&ltt_kprobes_mutex);
+	if (module_exit) {
+		err = -EPERM;
+		goto error;
+	}
+
+	buf_size = min(count, sizeof(buf) - 1);
+	err = copy_from_user(buf, user_buf, buf_size);
+	if (err)
+		goto error;
+	buf[buf_size] = '\0';
+	end = strchr(buf, '\n');
+	if (end)
+		*end = '\0';
+	err = ltt_register_kprobe(buf);
+	if (err)
+		goto error;
+	mutex_unlock(&ltt_kprobes_mutex);
+
+	return count;
+error:
+	mutex_unlock(&ltt_kprobes_mutex);
+	return err;
+}
+
+static const struct file_operations ltt_kprobes_enable = {
+	.write = enable_op_write,
+};
+
+static ssize_t disable_op_write(struct file *file,
+	const char __user *user_buf, size_t count, loff_t *ppos)
+{
+	int err, buf_size;
+	char buf[NAME_MAX];
+	char *end;
+
+	mutex_lock(&ltt_kprobes_mutex);
+	if (module_exit)
+		goto end;
+
+	buf_size = min(count, sizeof(buf) - 1);
+	err = copy_from_user(buf, user_buf, buf_size);
+	if (err)
+		goto error;
+	buf[buf_size] = '\0';
+	end = strchr(buf, '\n');
+	if (end)
+		*end = '\0';
+	err = ltt_unregister_kprobe(buf);
+	if (err)
+		goto error;
+end:
+	mutex_unlock(&ltt_kprobes_mutex);
+	return count;
+error:
+	mutex_unlock(&ltt_kprobes_mutex);
+	return err;
+}
+
+static const struct file_operations ltt_kprobes_disable = {
+	.write = disable_op_write,
+};
+
+/*
+ * This seqfile read is not perfectly safe, as a kprobe could be removed from
+ * the hash table between two reads. This will result in an incomplete output.
+ */
+static struct kprobe_entry *ltt_find_next_kprobe(struct kprobe_entry *prev)
+{
+	struct kprobe_entry *e;
+	struct hlist_head *head;
+	struct hlist_node *node;
+	unsigned int i;
+	int found = 0;
+
+	if (prev == (void *)-1UL)
+		return NULL;
+
+	if (!prev)
+		found = 1;
+
+	for (i = 0; i < LTT_KPROBE_TABLE_SIZE; i++) {
+		head = &ltt_kprobe_table[i];
+		hlist_for_each_entry(e, node, head, hlist) {
+			if (found)
+				return e;
+			if (e == prev)
+				found = 1;
+		}
+	}
+	return NULL;
+}
+
+static void *lk_next(struct seq_file *m, void *p, loff_t *pos)
+{
+	m->private = ltt_find_next_kprobe(m->private);
+	if (!m->private) {
+		m->private = (void *)-1UL;
+		return NULL;
+	}
+	return m->private;
+}
+
+static void *lk_start(struct seq_file *m, loff_t *pos)
+{
+	mutex_lock(&ltt_kprobes_mutex);
+	if (!*pos)
+		m->private = NULL;
+	m->private = ltt_find_next_kprobe(m->private);
+	if (!m->private) {
+		m->private = (void *)-1UL;
+		return NULL;
+	}
+	return m->private;
+}
+
+static void lk_stop(struct seq_file *m, void *p)
+{
+	mutex_unlock(&ltt_kprobes_mutex);
+}
+
+static int lk_show(struct seq_file *m, void *p)
+{
+	struct kprobe_entry *e = m->private;
+	seq_printf(m, "%s\n", e->key);
+	return 0;
+}
+
+static const struct seq_operations ltt_kprobes_list_op = {
+	.start = lk_start,
+	.next = lk_next,
+	.stop = lk_stop,
+	.show = lk_show,
+};
+
+static int ltt_kprobes_list_open(struct inode *inode, struct file *file)
+{
+	int ret;
+
+	ret = seq_open(file, &ltt_kprobes_list_op);
+	if (ret == 0)
+		((struct seq_file *)file->private_data)->private = NULL;
+	return ret;
+}
+
+static int ltt_kprobes_list_release(struct inode *inode, struct file *file)
+{
+	struct seq_file *seq = file->private_data;
+
+	seq->private = NULL;
+	return seq_release(inode, file);
+}
+
+static const struct file_operations ltt_kprobes_list = {
+	.open = ltt_kprobes_list_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = ltt_kprobes_list_release,
+};
+
+void ltt_dump_kprobes_table(void *call_data)
+{
+	struct kprobe_entry *e;
+	struct hlist_head *head;
+	struct hlist_node *node;
+	unsigned int i;
+
+	for (i = 0; i < LTT_KPROBE_TABLE_SIZE; i++) {
+		head = &ltt_kprobe_table[i];
+		hlist_for_each_entry(e, node, head, hlist)
+			trace_kprobe_table_entry(call_data, e);
+	}
+}
+EXPORT_SYMBOL_GPL(ltt_dump_kprobes_table);
+
+static int __init ltt_kprobes_init(void)
+{
+	struct dentry *ltt_root_dentry;
+	int ret = 0;
+
+	printk(KERN_INFO "LTT : ltt-kprobes init\n");
+	mutex_lock(&ltt_kprobes_mutex);
+
+	ltt_root_dentry = get_ltt_root();
+	if (!ltt_root_dentry) {
+		ret = -ENOENT;
+		goto err_no_root;
+	}
+
+	ltt_kprobes_dir = debugfs_create_dir(LTT_KPROBES_DIR, ltt_root_dentry);
+	if (!ltt_kprobes_dir) {
+		printk(KERN_ERR
+		       "ltt_kprobes_init: failed to create dir %s\n",
+			LTT_KPROBES_DIR);
+		ret = -ENOMEM;
+		goto err_no_dir;
+	}
+
+	ltt_kprobes_enable_dentry = debugfs_create_file(LTT_KPROBES_ENABLE,
+		S_IWUSR, ltt_kprobes_dir, NULL,
+		&ltt_kprobes_enable);
+	if (IS_ERR(ltt_kprobes_enable_dentry) || !ltt_kprobes_enable_dentry) {
+		printk(KERN_ERR
+		       "ltt_kprobes_init: failed to create file %s\n",
+			LTT_KPROBES_ENABLE);
+		ret = -ENOMEM;
+		goto err_no_enable;
+	}
+
+	ltt_kprobes_disable_dentry = debugfs_create_file(LTT_KPROBES_DISABLE,
+		S_IWUSR, ltt_kprobes_dir, NULL,
+		&ltt_kprobes_disable);
+	if (IS_ERR(ltt_kprobes_disable_dentry) || !ltt_kprobes_disable_dentry) {
+		printk(KERN_ERR
+		       "ltt_kprobes_init: failed to create file %s\n",
+			LTT_KPROBES_DISABLE);
+		ret = -ENOMEM;
+		goto err_no_disable;
+	}
+
+	ltt_kprobes_list_dentry = debugfs_create_file(LTT_KPROBES_LIST,
+		S_IWUSR, ltt_kprobes_dir, NULL,
+		&ltt_kprobes_list);
+	if (IS_ERR(ltt_kprobes_list_dentry) || !ltt_kprobes_list_dentry) {
+		printk(KERN_ERR
+		       "ltt_kprobes_init: failed to create file %s\n",
+			LTT_KPROBES_LIST);
+		ret = -ENOMEM;
+		goto err_no_list;
+	}
+
+	mutex_unlock(&ltt_kprobes_mutex);
+	return ret;
+
+err_no_list:
+	debugfs_remove(ltt_kprobes_disable_dentry);
+err_no_disable:
+	debugfs_remove(ltt_kprobes_enable_dentry);
+err_no_enable:
+	debugfs_remove(ltt_kprobes_dir);
+err_no_dir:
+err_no_root:
+	mutex_unlock(&ltt_kprobes_mutex);
+	return ret;
+}
+module_init(ltt_kprobes_init);
+
+static void __exit ltt_kprobes_exit(void)
+{
+	printk(KERN_INFO "LTT : ltt-kprobes exit\n");
+	mutex_lock(&ltt_kprobes_mutex);
+	module_exit = 1;
+	debugfs_remove(ltt_kprobes_list_dentry);
+	debugfs_remove(ltt_kprobes_disable_dentry);
+	debugfs_remove(ltt_kprobes_enable_dentry);
+	debugfs_remove(ltt_kprobes_dir);
+	ltt_unregister_all_kprobes();
+	mutex_unlock(&ltt_kprobes_mutex);
+}
+module_exit(ltt_kprobes_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Mathieu Desnoyers");
+MODULE_DESCRIPTION("Linux Trace Toolkit Kprobes Support");
Index: linux-2.6-lttng/include/linux/ltt-tracer.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/ltt-tracer.h	2009-03-05 16:09:59.000000000 -0500
+++ linux-2.6-lttng/include/linux/ltt-tracer.h	2009-03-05 16:10:12.000000000 -0500
@@ -672,6 +672,14 @@ extern void ltt_dump_marker_state(struct
 void ltt_lock_traces(void);
 void ltt_unlock_traces(void);
 
+#ifdef CONFIG_LTT_KPROBES
+extern void ltt_dump_kprobes_table(void *call_data);
+#else
+static inline void ltt_dump_kprobes_table(void *call_data)
+{
+}
+#endif
+
 /* Relay IOCTL */
 
 /* Get the next sub buffer that can be read. */
Index: linux-2.6-lttng/kernel/kallsyms.c
===================================================================
--- linux-2.6-lttng.orig/kernel/kallsyms.c	2009-03-05 16:08:41.000000000 -0500
+++ linux-2.6-lttng/kernel/kallsyms.c	2009-03-05 16:10:12.000000000 -0500
@@ -160,6 +160,7 @@ unsigned long kallsyms_lookup_name(const
 	}
 	return module_kallsyms_lookup_name(name);
 }
+EXPORT_SYMBOL_GPL(kallsyms_lookup_name);
 
 static unsigned long get_symbol_pos(unsigned long addr,
 				    unsigned long *symbolsize,

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 57+ messages in thread
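[Editor's note] ltt_register_kprobe() in the patch above accepts a key that is either a numeric address, a bare symbol, or "symbol offset". The parsing step can be sketched in userspace C; parse_probe_key and its out-parameters are hypothetical, and only the sscanf() conventions mirror the patch:

```c
#include <stdio.h>
#include <string.h>

/*
 * Sketch of the key parsing in ltt_register_kprobe(): a key is either
 * a numeric address ("0xc0123456"), a bare symbol ("do_fork"), or a
 * "symbol offset" pair.  Returns 0 and fills the out-parameters on
 * success, -1 on an empty or oversized key.
 */
static int parse_probe_key(const char *key, unsigned long *addr,
			   char symbol[64], unsigned int *offset)
{
	const char *sep;
	long tmp;
	int offtmp;

	*addr = 0;
	*offset = 0;
	symbol[0] = '\0';
	if (!*key)
		return -1;	/* -ENOENT in the patch: key is only \0 */

	/* "%li" accepts decimal, octal and 0x-prefixed hex, as in the patch */
	if (sscanf(key, "%li", &tmp) == 1 && tmp) {
		*addr = (unsigned long)tmp;
		return 0;
	}

	sep = strchr(key, ' ');	/* optional "symbol offset" form */
	if (sep) {
		size_t len = (size_t)(sep - key);

		if (len >= 64)
			return -1;
		memcpy(symbol, key, len);
		symbol[len] = '\0';
		if (sscanf(sep + 1, "%i", &offtmp) == 1)
			*offset = (unsigned int)offtmp;
	} else {
		if (strlen(key) >= 64)
			return -1;
		strcpy(symbol, key);
	}
	return 0;
}
```

So "0xc0123456" resolves directly to an address, "do_fork" to a symbol with offset 0, and "do_fork 0x10" to a symbol plus offset, mirroring the three forms the enable file accepts.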

* [RFC patch 34/41] Marker header API update
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (32 preceding siblings ...)
  2009-03-05 22:48 ` [RFC patch 33/41] LTTng dynamic tracing support with kprobes Mathieu Desnoyers
@ 2009-03-05 22:48 ` Mathieu Desnoyers
  2009-03-05 22:48 ` [RFC patch 35/41] Marker " Mathieu Desnoyers
                   ` (8 subsequent siblings)
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:48 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers

[-- Attachment #1: marker.h-api-update.patch --]
[-- Type: text/plain, Size: 1739 bytes --]

Add a channel id field to the marker structure; it identifies the buffer group
into which the event will be written.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 include/linux/marker.h |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Index: linux-2.6-lttng/include/linux/marker.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/marker.h	2009-02-06 15:46:14.000000000 -0500
+++ linux-2.6-lttng/include/linux/marker.h	2009-02-06 15:46:28.000000000 -0500
@@ -22,6 +22,7 @@ struct marker;
 
 /**
  * marker_probe_func - Type of a marker probe function
+ * @mdata: marker data
  * @probe_private: probe private data
  * @call_private: call site private data
  * @fmt: format string
@@ -32,7 +33,8 @@ struct marker;
  * Type of marker probe functions. They receive the mdata and need to parse the
  * format string to recover the variable argument list.
  */
-typedef void marker_probe_func(void *probe_private, void *call_private,
+typedef void marker_probe_func(const struct marker *mdata,
+		void *probe_private, void *call_private,
 		const char *fmt, va_list *args);
 
 struct marker_probe_closure {
@@ -49,7 +51,7 @@ struct marker {
 	DEFINE_IMV(char, state);/* Immediate value state. */
 	char ptype;		/* probe type : 0 : single, 1 : multi */
 				/* Probe wrapper */
-	u16 chan_id;		/* Numeric channel identifier, dynamic */
+	u16 channel_id;		/* Numeric channel identifier, dynamic */
 	u16 event_id;		/* Numeric event identifier, dynamic */
 	void (*call)(const struct marker *mdata, void *call_private, ...);
 	struct marker_probe_closure single;

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 57+ messages in thread
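[Editor's note] The signature change in the patch above threads the marker itself (mdata) into every probe callback, so a probe can read per-marker fields such as channel_id. A minimal userspace mock of the new calling convention; struct marker is reduced to the two id fields, and demo_probe/fire_marker are hypothetical stand-ins for a registered probe and marker_probe_cb():

```c
#include <stdarg.h>
#include <stddef.h>

/* mock of struct marker, reduced to the fields the probe now needs */
struct marker {
	unsigned short channel_id;
	unsigned short event_id;
};

/* new-style probe type: receives the marker as its first argument */
typedef void marker_probe_func(const struct marker *mdata,
			       void *probe_private, void *call_private,
			       const char *fmt, va_list *args);

static unsigned short seen_channel;

static void demo_probe(const struct marker *mdata, void *probe_private,
		       void *call_private, const char *fmt, va_list *args)
{
	/* the probe can now route on per-marker ids */
	seen_channel = mdata->channel_id;
	(void)probe_private; (void)call_private; (void)fmt; (void)args;
}

/* stand-in for marker_probe_cb(): builds the va_list, calls the probe */
static void fire_marker(const struct marker *mdata, marker_probe_func *func,
			const char *fmt, ...)
{
	va_list args;

	va_start(args, fmt);
	func(mdata, NULL, NULL, fmt, &args);
	va_end(args);
}
```

Under the old typedef the probe only saw probe_private and could not tell which marker fired; passing mdata first is what lets LTTng serializers pick the destination channel from channel_id without a separate lookup.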

* [RFC patch 35/41] Marker API update
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (33 preceding siblings ...)
  2009-03-05 22:48 ` [RFC patch 34/41] Marker header API update Mathieu Desnoyers
@ 2009-03-05 22:48 ` Mathieu Desnoyers
  2009-03-05 22:48 ` [RFC patch 36/41] kvm markers " Mathieu Desnoyers
                   ` (7 subsequent siblings)
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:48 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers

[-- Attachment #1: marker.c-api-update.patch --]
[-- Type: text/plain, Size: 9930 bytes --]

Add a channel id field to the marker structure; it identifies the buffer group
into which the event will be written. Adapt kernel/marker.c to manage the new
field.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 kernel/marker.c |  115 ++++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 84 insertions(+), 31 deletions(-)

Index: linux-2.6-lttng/kernel/marker.c
===================================================================
--- linux-2.6-lttng.orig/kernel/marker.c	2009-02-06 15:52:18.000000000 -0500
+++ linux-2.6-lttng/kernel/marker.c	2009-02-06 15:52:36.000000000 -0500
@@ -25,6 +25,7 @@
 #include <linux/err.h>
 #include <linux/slab.h>
 #include <linux/immediate.h>
+#include <linux/ltt-tracer.h>
 
 extern struct marker __start___markers[];
 extern struct marker __stop___markers[];
@@ -76,7 +77,7 @@ struct marker_entry {
 	struct rcu_head rcu;
 	void *oldptr;
 	int rcu_pending;
-	u16 chan_id;
+	u16 channel_id;
 	u16 event_id;
 	unsigned char ptype:1;
 	unsigned char format_allocated:1;
@@ -85,6 +86,7 @@ struct marker_entry {
 
 /**
  * __mark_empty_function - Empty probe callback
+ * @mdata: marker data
  * @probe_private: probe private data
  * @call_private: call site private data
  * @fmt: format string
@@ -95,8 +97,8 @@ struct marker_entry {
  * though the function pointer change and the marker enabling are two distinct
  * operations that modifies the execution flow of preemptible code.
  */
-notrace void __mark_empty_function(void *probe_private, void *call_private,
-	const char *fmt, va_list *args)
+notrace void __mark_empty_function(const struct marker *mdata,
+	void *probe_private, void *call_private, const char *fmt, va_list *args)
 {
 }
 EXPORT_SYMBOL_GPL(__mark_empty_function);
@@ -134,8 +136,8 @@ notrace void marker_probe_cb(const struc
 		 * dependant, so we put an explicit smp_rmb() here. */
 		smp_rmb();
 		va_start(args, call_private);
-		func(mdata->single.probe_private, call_private, mdata->format,
-			&args);
+		func(mdata, mdata->single.probe_private, call_private,
+			mdata->format, &args);
 		va_end(args);
 	} else {
 		struct marker_probe_closure *multi;
@@ -155,8 +157,8 @@ notrace void marker_probe_cb(const struc
 		smp_read_barrier_depends();
 		for (i = 0; multi[i].func; i++) {
 			va_start(args, call_private);
-			multi[i].func(multi[i].probe_private, call_private,
-				mdata->format, &args);
+			multi[i].func(mdata, multi[i].probe_private,
+				call_private, mdata->format, &args);
 			va_end(args);
 		}
 	}
@@ -189,8 +191,8 @@ static notrace void marker_probe_cb_noar
 		/* Must read the ptr before private data. They are not data
 		 * dependant, so we put an explicit smp_rmb() here. */
 		smp_rmb();
-		func(mdata->single.probe_private, call_private, mdata->format,
-			&args);
+		func(mdata, mdata->single.probe_private, call_private,
+			mdata->format, &args);
 	} else {
 		struct marker_probe_closure *multi;
 		int i;
@@ -208,8 +210,8 @@ static notrace void marker_probe_cb_noar
 		 */
 		smp_read_barrier_depends();
 		for (i = 0; multi[i].func; i++)
-			multi[i].func(multi[i].probe_private, call_private,
-				mdata->format, &args);
+			multi[i].func(mdata, multi[i].probe_private,
+				call_private, mdata->format, &args);
 	}
 	rcu_read_unlock_sched_notrace();
 }
@@ -218,13 +220,6 @@ static void free_old_closure(struct rcu_
 {
 	struct marker_entry *entry = container_of(head,
 		struct marker_entry, rcu);
-	int ret;
-
-	/* Single probe removed */
-	if (!entry->ptype) {
-		ret = ltt_channels_unregister(entry->channel);
-		WARN_ON(ret);
-	}
 	kfree(entry->oldptr);
 	/* Make sure we free the data before setting the pending flag to 0 */
 	smp_wmb();
@@ -437,8 +432,9 @@ static struct marker_entry *add_marker(c
 			e->call = marker_probe_cb_noarg;
 		else
 			e->call = marker_probe_cb;
-		trace_mark(core_marker_format, "name %s format %s",
-				e->name, e->format);
+		trace_mark(metadata, core_marker_format,
+			   "channel %s name %s format %s",
+			   e->channel, e->name, e->format);
 	} else {
 		e->format = NULL;
 		e->call = marker_probe_cb;
@@ -458,7 +454,7 @@ static struct marker_entry *add_marker(c
  * Remove the marker from the marker hash table. Must be called with mutex_lock
  * held.
  */
-static int remove_marker(const char *name)
+static int remove_marker(const char *channel, const char *name)
 {
 	struct hlist_head *head;
 	struct hlist_node *node;
@@ -467,6 +463,7 @@ static int remove_marker(const char *nam
 	size_t channel_len = strlen(channel) + 1;
 	size_t name_len = strlen(name) + 1;
 	u32 hash;
+	int ret;
 
 	hash = jhash(channel, channel_len-1, 0) ^ jhash(name, name_len-1, 0);
 	head = &marker_table[hash & ((1 << MARKER_HASH_BITS)-1)];
@@ -483,6 +480,8 @@ static int remove_marker(const char *nam
 	hlist_del(&e->hlist);
 	if (e->format_allocated)
 		kfree(e->format);
+	ret = ltt_channels_unregister(e->channel);
+	WARN_ON(ret);
 	/* Make sure the call_rcu has been executed */
 	if (e->rcu_pending)
 		rcu_barrier_sched();
@@ -500,8 +499,9 @@ static int marker_set_format(struct mark
 		return -ENOMEM;
 	entry->format_allocated = 1;
 
-	trace_mark(core_marker_format, "name %s format %s",
-			entry->name, entry->format);
+	trace_mark(metadata, core_marker_format,
+		   "channel %s name %s format %s",
+		   entry->channel, entry->name, entry->format);
 	return 0;
 }
 
@@ -537,6 +537,8 @@ static int set_marker(struct marker_entr
 	 * callback (does not set arguments).
 	 */
 	elem->call = entry->call;
+	elem->channel_id = entry->channel_id;
+	elem->event_id = entry->event_id;
 	/*
 	 * Sanity check :
 	 * We only update the single probe private data when the ptr is
@@ -631,9 +633,9 @@ static void disable_marker(struct marker
 	smp_wmb();
 	elem->ptype = 0;	/* single probe */
 	/*
-	 * Leave the private data and id there, because removal is racy and
-	 * should be done only after an RCU period. These are never used until
-	 * the next initialization anyway.
+	 * Leave the private data and channel_id/event_id there, because removal
+	 * is racy and should be done only after an RCU period. These are never
+	 * used until the next initialization anyway.
 	 */
 }
 
@@ -652,7 +654,7 @@ void marker_update_probe_range(struct ma
 
 	mutex_lock(&markers_mutex);
 	for (iter = begin; iter < end; iter++) {
-		mark_entry = get_marker(iter->name);
+		mark_entry = get_marker(iter->channel, iter->name);
 		if (mark_entry) {
 			set_marker(mark_entry, iter, !!mark_entry->refcount);
 			/*
@@ -716,7 +718,7 @@ int marker_probe_register(const char *ch
 	int first_probe = 0;
 
 	mutex_lock(&markers_mutex);
-	entry = get_marker(name);
+	entry = get_marker(channel, name);
 	if (!entry) {
 		first_probe = 1;
 		entry = add_marker(channel, name, format);
@@ -731,10 +733,18 @@ int marker_probe_register(const char *ch
 		if (ret < 0)
 			goto error_unregister_channel;
 		entry->channel_id = ret;
-		ret = ltt_channels_get_event_id(channel);
+		ret = ltt_channels_get_event_id(channel, name);
 		if (ret < 0)
 			goto error_unregister_channel;
 		entry->event_id = ret;
+		ret = 0;
+		trace_mark(metadata, core_marker_id,
+			   "channel %s name %s event_id %hu "
+			   "int #1u%zu long #1u%zu pointer #1u%zu "
+			   "size_t #1u%zu alignment #1u%u",
+			   channel, name, entry->event_id,
+			   sizeof(int), sizeof(long), sizeof(void *),
+			   sizeof(size_t), ltt_get_alignment());
 	} else if (format) {
 		if (!entry->format)
 			ret = marker_set_format(entry, format);
@@ -773,6 +783,7 @@ int marker_probe_register(const char *ch
 	/* write rcu_pending before calling the RCU callback */
 	smp_wmb();
 	call_rcu_sched(&entry->rcu, free_old_closure);
+	goto end;
 
 error_unregister_channel:
 	ret_err = ltt_channels_unregister(channel);
@@ -978,7 +989,7 @@ EXPORT_SYMBOL_GPL(marker_get_private_dat
  * markers_compact_event_ids - Compact markers event IDs and reassign channels
  *
  * Called when no channel users are active by the channel infrastructure.
- * Called with lock_markers() held.
+ * Called with lock_markers() and channel mutex held.
  */
 void markers_compact_event_ids(void)
 {
@@ -986,6 +997,7 @@ void markers_compact_event_ids(void)
 	unsigned int i;
 	struct hlist_head *head;
 	struct hlist_node *node;
+	int ret;
 
 	for (i = 0; i < MARKER_TABLE_SIZE; i++) {
 		head = &marker_table[i];
@@ -993,7 +1005,8 @@ void markers_compact_event_ids(void)
 			ret = ltt_channels_get_index_from_name(entry->channel);
 			WARN_ON(ret < 0);
 			entry->channel_id = ret;
-			ret = ltt_channels_get_event_id(entry->channel);
+			ret = _ltt_channels_get_event_id(entry->channel,
+							 entry->name);
 			WARN_ON(ret < 0);
 			entry->event_id = ret;
 		}
@@ -1102,3 +1115,43 @@ static int init_markers(void)
 __initcall(init_markers);
 
 #endif /* CONFIG_MODULES */
+
+void ltt_dump_marker_state(struct ltt_trace_struct *trace)
+{
+	struct marker_iter iter;
+	struct ltt_probe_private_data call_data;
+	const char *channel;
+
+	call_data.trace = trace;
+	call_data.serializer = NULL;
+
+	marker_iter_reset(&iter);
+	marker_iter_start(&iter);
+	for (; iter.marker != NULL; marker_iter_next(&iter)) {
+		if (!_imv_read(iter.marker->state))
+			continue;
+		channel = ltt_channels_get_name_from_index(
+				iter.marker->channel_id);
+		__trace_mark(0, metadata, core_marker_id,
+			&call_data,
+			"channel %s name %s event_id %hu "
+			"int #1u%zu long #1u%zu pointer #1u%zu "
+			"size_t #1u%zu alignment #1u%u",
+			channel,
+			iter.marker->name,
+			iter.marker->event_id,
+			sizeof(int), sizeof(long),
+			sizeof(void *), sizeof(size_t),
+			ltt_get_alignment());
+		if (iter.marker->format)
+			__trace_mark(0, metadata,
+				core_marker_format,
+				&call_data,
+				"channel %s name %s format %s",
+				channel,
+				iter.marker->name,
+				iter.marker->format);
+	}
+	marker_iter_stop(&iter);
+}
+EXPORT_SYMBOL_GPL(ltt_dump_marker_state);

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [RFC patch 36/41] kvm markers API update
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (34 preceding siblings ...)
  2009-03-05 22:48 ` [RFC patch 35/41] Marker " Mathieu Desnoyers
@ 2009-03-05 22:48 ` Mathieu Desnoyers
  2009-03-05 22:48 ` [RFC patch 37/41] Markers : multi-probes test Mathieu Desnoyers
                   ` (6 subsequent siblings)
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:48 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers

[-- Attachment #1: kvm-markers-api-update.patch --]
[-- Type: text/plain, Size: 4004 bytes --]

Update the kvm markers to the new marker API: probe callbacks now receive the marker data pointer as their first argument, and registration/unregistration take a channel name ("kvm").

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 include/linux/kvm_host.h |   12 ++++++------
 virt/kvm/kvm_trace.c     |   12 +++++++-----
 2 files changed, 13 insertions(+), 11 deletions(-)

Index: linux-2.6-lttng/include/linux/kvm_host.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/kvm_host.h	2009-03-05 15:21:55.000000000 -0500
+++ linux-2.6-lttng/include/linux/kvm_host.h	2009-03-05 15:49:23.000000000 -0500
@@ -416,22 +416,22 @@ extern struct kvm_stats_debugfs_item deb
 extern struct dentry *kvm_debugfs_dir;
 
 #define KVMTRACE_5D(evt, vcpu, d1, d2, d3, d4, d5, name) \
-	trace_mark(kvm_trace_##name, "%u %p %u %u %u %u %u %u", KVM_TRC_##evt, \
+	trace_mark(kvm, trace_##name, "%u %p %u %u %u %u %u %u", KVM_TRC_##evt,\
 						vcpu, 5, d1, d2, d3, d4, d5)
 #define KVMTRACE_4D(evt, vcpu, d1, d2, d3, d4, name) \
-	trace_mark(kvm_trace_##name, "%u %p %u %u %u %u %u %u", KVM_TRC_##evt, \
+	trace_mark(kvm, trace_##name, "%u %p %u %u %u %u %u %u", KVM_TRC_##evt,\
 						vcpu, 4, d1, d2, d3, d4, 0)
 #define KVMTRACE_3D(evt, vcpu, d1, d2, d3, name) \
-	trace_mark(kvm_trace_##name, "%u %p %u %u %u %u %u %u", KVM_TRC_##evt, \
+	trace_mark(kvm, trace_##name, "%u %p %u %u %u %u %u %u", KVM_TRC_##evt,\
 						vcpu, 3, d1, d2, d3, 0, 0)
 #define KVMTRACE_2D(evt, vcpu, d1, d2, name) \
-	trace_mark(kvm_trace_##name, "%u %p %u %u %u %u %u %u", KVM_TRC_##evt, \
+	trace_mark(kvm, trace_##name, "%u %p %u %u %u %u %u %u", KVM_TRC_##evt,\
 						vcpu, 2, d1, d2, 0, 0, 0)
 #define KVMTRACE_1D(evt, vcpu, d1, name) \
-	trace_mark(kvm_trace_##name, "%u %p %u %u %u %u %u %u", KVM_TRC_##evt, \
+	trace_mark(kvm, trace_##name, "%u %p %u %u %u %u %u %u", KVM_TRC_##evt,\
 						vcpu, 1, d1, 0, 0, 0, 0)
 #define KVMTRACE_0D(evt, vcpu, name) \
-	trace_mark(kvm_trace_##name, "%u %p %u %u %u %u %u %u", KVM_TRC_##evt, \
+	trace_mark(kvm, trace_##name, "%u %p %u %u %u %u %u %u", KVM_TRC_##evt,\
 						vcpu, 0, 0, 0, 0, 0, 0)
 
 #ifdef CONFIG_KVM_TRACE
Index: linux-2.6-lttng/virt/kvm/kvm_trace.c
===================================================================
--- linux-2.6-lttng.orig/virt/kvm/kvm_trace.c	2009-03-05 15:21:55.000000000 -0500
+++ linux-2.6-lttng/virt/kvm/kvm_trace.c	2009-03-05 15:49:23.000000000 -0500
@@ -48,7 +48,8 @@ static inline int calc_rec_size(int time
 	return timestamp ? rec_size += KVM_TRC_CYCLE_SIZE : rec_size;
 }
 
-static void kvm_add_trace(void *probe_private, void *call_data,
+static void kvm_add_trace(const struct marker *mdata,
+			  void *probe_private, void *call_private,
 			  const char *format, va_list *args)
 {
 	struct kvm_trace_probe *p = probe_private;
@@ -88,8 +89,8 @@ static void kvm_add_trace(void *probe_pr
 }
 
 static struct kvm_trace_probe kvm_trace_probes[] = {
-	{ "kvm_trace_entryexit", "%u %p %u %u %u %u %u %u", 1, kvm_add_trace },
-	{ "kvm_trace_handler", "%u %p %u %u %u %u %u %u", 0, kvm_add_trace },
+	{ "trace_entryexit", "%u %p %u %u %u %u %u %u", 1, kvm_add_trace },
+	{ "trace_handler", "%u %p %u %u %u %u %u %u", 0, kvm_add_trace },
 };
 
 static int lost_records_get(void *data, u64 *val)
@@ -182,7 +183,8 @@ static int do_kvm_trace_enable(struct kv
 	for (i = 0; i < ARRAY_SIZE(kvm_trace_probes); i++) {
 		struct kvm_trace_probe *p = &kvm_trace_probes[i];
 
-		r = marker_probe_register(p->name, p->format, p->probe_func, p);
+		r = marker_probe_register("kvm", p->name, p->format,
+					  p->probe_func, p);
 		if (r)
 			printk(KERN_INFO "Unable to register probe %s\n",
 			       p->name);
@@ -250,7 +252,7 @@ void kvm_trace_cleanup(void)
 
 		for (i = 0; i < ARRAY_SIZE(kvm_trace_probes); i++) {
 			struct kvm_trace_probe *p = &kvm_trace_probes[i];
-			marker_probe_unregister(p->name, p->probe_func, p);
+			marker_probe_unregister("kvm", p->name, p->probe_func, p);
 		}
 		marker_synchronize_unregister();
 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [RFC patch 37/41] Markers : multi-probes test
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (35 preceding siblings ...)
  2009-03-05 22:48 ` [RFC patch 36/41] kvm markers " Mathieu Desnoyers
@ 2009-03-05 22:48 ` Mathieu Desnoyers
  2009-03-05 22:48 ` [RFC patch 38/41] Markers examples API update Mathieu Desnoyers
                   ` (5 subsequent siblings)
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:48 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers

[-- Attachment #1: markers-multi-probes-test.patch --]
[-- Type: text/plain, Size: 4005 bytes --]

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 samples/markers/Makefile     |    2 
 samples/markers/test-multi.c |  116 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 117 insertions(+), 1 deletion(-)

Index: linux-2.6-lttng/samples/markers/Makefile
===================================================================
--- linux-2.6-lttng.orig/samples/markers/Makefile	2009-01-09 18:15:52.000000000 -0500
+++ linux-2.6-lttng/samples/markers/Makefile	2009-01-09 18:17:56.000000000 -0500
@@ -1,4 +1,4 @@
 # builds the kprobes example kernel modules;
 # then to use one (as root):  insmod <module_name.ko>
 
-obj-$(CONFIG_SAMPLE_MARKERS) += probe-example.o marker-example.o
+obj-$(CONFIG_SAMPLE_MARKERS) += probe-example.o marker-example.o test-multi.o
Index: linux-2.6-lttng/samples/markers/test-multi.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/samples/markers/test-multi.c	2009-01-09 18:17:56.000000000 -0500
@@ -0,0 +1,116 @@
+/* test-multi.c
+ *
+ * Connects multiple callbacks.
+ *
+ * (C) Copyright 2007 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <linux/sched.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/marker.h>
+#include <asm/atomic.h>
+
+struct probe_data {
+	const char *name;
+	const char *format;
+	marker_probe_func *probe_func;
+};
+
+atomic_t eventb_count = ATOMIC_INIT(0);
+
+void probe_subsystem_eventa(void *probe_data, void *call_data,
+	const char *format, va_list *args)
+{
+	/* Increment counter */
+	atomic_inc(&eventb_count);
+}
+
+void probe_subsystem_eventb(void *probe_data, void *call_data,
+	const char *format, va_list *args)
+{
+	/* Increment counter */
+	atomic_inc(&eventb_count);
+}
+
+void probe_subsystem_eventc(void *probe_data, void *call_data,
+	const char *format, va_list *args)
+{
+	/* Increment counter */
+	atomic_inc(&eventb_count);
+}
+
+void probe_subsystem_eventd(void *probe_data, void *call_data,
+	const char *format, va_list *args)
+{
+	/* Increment counter */
+	atomic_inc(&eventb_count);
+}
+
+static struct probe_data probe_array[] =
+{
+	{	.name = "test_multi",
+		.format = MARK_NOARGS,
+		.probe_func = (marker_probe_func*)0xa },
+	{	.name = "test_multi",
+		.format = MARK_NOARGS,
+		.probe_func = (marker_probe_func*)0xb },
+	{	.name = "test_multi",
+		.format = MARK_NOARGS,
+		.probe_func = (marker_probe_func*)0xc },
+	{	.name = "test_multi",
+		.format = MARK_NOARGS,
+		.probe_func = (marker_probe_func*)0xd },
+	{	.name = "test_multi",
+		.format = MARK_NOARGS,
+		.probe_func = (marker_probe_func*)0x10 },
+	{	.name = "test_multi",
+		.format = MARK_NOARGS,
+		.probe_func = (marker_probe_func*)0x20 },
+	{	.name = "test_multi",
+		.format = MARK_NOARGS,
+		.probe_func = (marker_probe_func*)0x30 },
+};
+
+static int __init probe_init(void)
+{
+	int result;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(probe_array); i++) {
+		result = marker_probe_register(probe_array[i].name,
+				probe_array[i].format,
+				probe_array[i].probe_func, (void*)(long)i);
+		if (result)
+			printk(KERN_INFO "Unable to register probe %s\n",
+				probe_array[i].name);
+	}
+	return 0;
+}
+
+static void __exit probe_fini(void)
+{
+	int result;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(probe_array); i++) {
+		result = marker_probe_unregister(probe_array[i].name,
+			probe_array[i].probe_func, (void*)(long)i);
+		if (result)
+			printk(KERN_INFO "Unable to unregister probe %s\n",
+				probe_array[i].name);
+	}
+	printk(KERN_INFO "Number of event b : %u\n",
+		atomic_read(&eventb_count));
+	marker_synchronize_unregister();
+}
+
+module_init(probe_init);
+module_exit(probe_fini);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Mathieu Desnoyers");
+MODULE_DESCRIPTION("SUBSYSTEM Probe");

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [RFC patch 38/41] Markers examples API update
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (36 preceding siblings ...)
  2009-03-05 22:48 ` [RFC patch 37/41] Markers : multi-probes test Mathieu Desnoyers
@ 2009-03-05 22:48 ` Mathieu Desnoyers
  2009-03-05 22:48 ` [RFC patch 39/41] SPUFS markers " Mathieu Desnoyers
                   ` (4 subsequent siblings)
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:48 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers

[-- Attachment #1: markers-examples-api-update.patch --]
[-- Type: text/plain, Size: 3628 bytes --]

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 samples/markers/marker-example.c |    4 ++--
 samples/markers/probe-example.c  |   10 ++++++----
 samples/markers/test-multi.c     |    4 ++--
 3 files changed, 10 insertions(+), 8 deletions(-)

Index: linux-2.6-lttng/samples/markers/marker-example.c
===================================================================
--- linux-2.6-lttng.orig/samples/markers/marker-example.c	2009-02-06 14:45:12.000000000 -0500
+++ linux-2.6-lttng/samples/markers/marker-example.c	2009-02-06 15:40:05.000000000 -0500
@@ -19,10 +19,10 @@ static int my_open(struct inode *inode, 
 {
 	int i;
 
-	trace_mark(subsystem_event, "integer %d string %s", 123,
+	trace_mark(samples, subsystem_event, "integer %d string %s", 123,
 		"example string");
 	for (i = 0; i < 10; i++)
-		trace_mark(subsystem_eventb, MARK_NOARGS);
+		trace_mark(samples, subsystem_eventb, MARK_NOARGS);
 	return -EPERM;
 }
 
Index: linux-2.6-lttng/samples/markers/probe-example.c
===================================================================
--- linux-2.6-lttng.orig/samples/markers/probe-example.c	2009-02-06 14:45:12.000000000 -0500
+++ linux-2.6-lttng/samples/markers/probe-example.c	2009-02-06 15:40:05.000000000 -0500
@@ -20,7 +20,8 @@ struct probe_data {
 	marker_probe_func *probe_func;
 };
 
-void probe_subsystem_event(void *probe_data, void *call_data,
+void probe_subsystem_event(const struct marker *mdata,
+	void *probe_data, void *call_data,
 	const char *format, va_list *args)
 {
 	/* Declare args */
@@ -39,7 +40,8 @@ void probe_subsystem_event(void *probe_d
 
 atomic_t eventb_count = ATOMIC_INIT(0);
 
-void probe_subsystem_eventb(void *probe_data, void *call_data,
+void probe_subsystem_eventb(const struct marker *mdata,
+	void *probe_data, void *call_data,
 	const char *format, va_list *args)
 {
 	/* Increment counter */
@@ -62,7 +64,7 @@ static int __init probe_init(void)
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(probe_array); i++) {
-		result = marker_probe_register(probe_array[i].name,
+		result = marker_probe_register("samples", probe_array[i].name,
 				probe_array[i].format,
 				probe_array[i].probe_func, &probe_array[i]);
 		if (result)
@@ -77,7 +79,7 @@ static void __exit probe_fini(void)
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(probe_array); i++)
-		marker_probe_unregister(probe_array[i].name,
+		marker_probe_unregister("samples", probe_array[i].name,
 			probe_array[i].probe_func, &probe_array[i]);
 	printk(KERN_INFO "Number of event b : %u\n",
 			atomic_read(&eventb_count));
Index: linux-2.6-lttng/samples/markers/test-multi.c
===================================================================
--- linux-2.6-lttng.orig/samples/markers/test-multi.c	2009-02-06 15:39:59.000000000 -0500
+++ linux-2.6-lttng/samples/markers/test-multi.c	2009-02-06 15:40:05.000000000 -0500
@@ -81,7 +81,7 @@ static int __init probe_init(void)
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(probe_array); i++) {
-		result = marker_probe_register(probe_array[i].name,
+		result = marker_probe_register("samples", probe_array[i].name,
 				probe_array[i].format,
 				probe_array[i].probe_func, (void*)(long)i);
 		if (result)
@@ -97,7 +97,7 @@ static void __exit probe_fini(void)
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(probe_array); i++) {
-		result = marker_probe_unregister(probe_array[i].name,
+		result = marker_probe_unregister("samples", probe_array[i].name,
 			probe_array[i].probe_func, (void*)(long)i);
 		if (result)
 			printk(KERN_INFO "Unable to unregister probe %s\n",

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [RFC patch 39/41] SPUFS markers API update
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (37 preceding siblings ...)
  2009-03-05 22:48 ` [RFC patch 38/41] Markers examples API update Mathieu Desnoyers
@ 2009-03-05 22:48 ` Mathieu Desnoyers
  2009-03-05 22:48 ` [RFC patch 40/41] EXT4: instrumentation with tracepoints Mathieu Desnoyers
                   ` (3 subsequent siblings)
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:48 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers

[-- Attachment #1: spufs-markers-api-update.patch --]
[-- Type: text/plain, Size: 1093 bytes --]

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 arch/powerpc/platforms/cell/spufs/spufs.h |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Index: linux-2.6-lttng/arch/powerpc/platforms/cell/spufs/spufs.h
===================================================================
--- linux-2.6-lttng.orig/arch/powerpc/platforms/cell/spufs/spufs.h	2009-02-06 14:45:12.000000000 -0500
+++ linux-2.6-lttng/arch/powerpc/platforms/cell/spufs/spufs.h	2009-02-06 15:40:25.000000000 -0500
@@ -373,9 +373,9 @@ extern void spu_free_lscsa(struct spu_st
 extern void spuctx_switch_state(struct spu_context *ctx,
 		enum spu_utilization_state new_state);
 
-#define spu_context_trace(name, ctx, spu) \
-	trace_mark(name, "ctx %p spu %p", ctx, spu);
+#define spu_context_trace(name, ctx, _spu) \
+	trace_mark(spu, name, "ctx %p spu %p", ctx, _spu);
 #define spu_context_nospu_trace(name, ctx) \
-	trace_mark(name, "ctx %p", ctx);
+	trace_mark(spu, name, "ctx %p", ctx);
 
 #endif

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [RFC patch 40/41] EXT4: instrumentation with tracepoints
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (38 preceding siblings ...)
  2009-03-05 22:48 ` [RFC patch 39/41] SPUFS markers " Mathieu Desnoyers
@ 2009-03-05 22:48 ` Mathieu Desnoyers
  2009-03-05 22:48 ` [RFC patch 41/41] JBD2: use tracepoints for instrumentation Mathieu Desnoyers
                   ` (2 subsequent siblings)
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:48 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers, Theodore Tso, Stephen C. Tweedie,
	Ext4 Developers List

[-- Attachment #1: ext4-move-from-markers-to-tracepoints.patch --]
[-- Type: text/plain, Size: 20781 bytes --]

Take the instrumentation from "ext4: Add markers for better debuggability" and
move it to tracepoints.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Theodore Ts'o <tytso@mit.edu>
CC: Stephen C. Tweedie <sct@redhat.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Ext4 Developers List <linux-ext4@vger.kernel.org>
---
 fs/ext4/fsync.c      |    8 +--
 fs/ext4/ialloc.c     |   17 +++---
 fs/ext4/inode.c      |   79 +++++++++----------------------
 fs/ext4/mballoc.c    |   71 +++++++++-------------------
 fs/ext4/mballoc.h    |    2 
 fs/ext4/super.c      |    6 +-
 include/trace/ext4.h |  129 +++++++++++++++++++++++++++++++++++++++++++++++++++
 7 files changed, 194 insertions(+), 118 deletions(-)

Index: linux-2.6-lttng/fs/ext4/fsync.c
===================================================================
--- linux-2.6-lttng.orig/fs/ext4/fsync.c	2009-03-05 15:21:54.000000000 -0500
+++ linux-2.6-lttng/fs/ext4/fsync.c	2009-03-05 15:49:28.000000000 -0500
@@ -28,10 +28,12 @@
 #include <linux/writeback.h>
 #include <linux/jbd2.h>
 #include <linux/blkdev.h>
-#include <linux/marker.h>
+#include <trace/ext4.h>
 #include "ext4.h"
 #include "ext4_jbd2.h"
 
+DEFINE_TRACE(ext4_sync_file);
+
 /*
  * akpm: A new design for ext4_sync_file().
  *
@@ -52,9 +54,7 @@ int ext4_sync_file(struct file *file, st
 
 	J_ASSERT(ext4_journal_current_handle() == NULL);
 
-	trace_mark(ext4_sync_file, "dev %s datasync %d ino %ld parent %ld",
-		   inode->i_sb->s_id, datasync, inode->i_ino,
-		   dentry->d_parent->d_inode->i_ino);
+	trace_ext4_sync_file(file, dentry, datasync);
 
 	/*
 	 * data=writeback:
Index: linux-2.6-lttng/fs/ext4/ialloc.c
===================================================================
--- linux-2.6-lttng.orig/fs/ext4/ialloc.c	2009-03-05 15:21:54.000000000 -0500
+++ linux-2.6-lttng/fs/ext4/ialloc.c	2009-03-05 15:49:28.000000000 -0500
@@ -22,6 +22,7 @@
 #include <linux/random.h>
 #include <linux/bitops.h>
 #include <linux/blkdev.h>
+#include <trace/ext4.h>
 #include <asm/byteorder.h>
 #include "ext4.h"
 #include "ext4_jbd2.h"
@@ -29,6 +30,10 @@
 #include "acl.h"
 #include "group.h"
 
+DEFINE_TRACE(ext4_free_inode);
+DEFINE_TRACE(ext4_request_inode);
+DEFINE_TRACE(ext4_allocate_inode);
+
 /*
  * ialloc.c contains the inodes allocation and deallocation routines
  */
@@ -210,11 +215,7 @@ void ext4_free_inode(handle_t *handle, s
 
 	ino = inode->i_ino;
 	ext4_debug("freeing inode %lu\n", ino);
-	trace_mark(ext4_free_inode,
-		   "dev %s ino %lu mode %d uid %lu gid %lu bocks %llu",
-		   sb->s_id, inode->i_ino, inode->i_mode,
-		   (unsigned long) inode->i_uid, (unsigned long) inode->i_gid,
-		   (unsigned long long) inode->i_blocks);
+	trace_ext4_free_inode(inode);
 
 	/*
 	 * Note: we must free any quota before locking the superblock,
@@ -703,8 +704,7 @@ struct inode *ext4_new_inode(handle_t *h
 		return ERR_PTR(-EPERM);
 
 	sb = dir->i_sb;
-	trace_mark(ext4_request_inode, "dev %s dir %lu mode %d", sb->s_id,
-		   dir->i_ino, mode);
+	trace_ext4_request_inode(dir, mode);
 	inode = new_inode(sb);
 	if (!inode)
 		return ERR_PTR(-ENOMEM);
@@ -939,8 +939,7 @@ got:
 	}
 
 	ext4_debug("allocating inode %lu\n", inode->i_ino);
-	trace_mark(ext4_allocate_inode, "dev %s ino %lu dir %lu mode %d",
-		   sb->s_id, inode->i_ino, dir->i_ino, mode);
+	trace_ext4_allocate_inode(inode, dir, mode);
 	goto really_out;
 fail:
 	ext4_std_error(sb, err);
Index: linux-2.6-lttng/fs/ext4/inode.c
===================================================================
--- linux-2.6-lttng.orig/fs/ext4/inode.c	2009-03-05 15:21:54.000000000 -0500
+++ linux-2.6-lttng/fs/ext4/inode.c	2009-03-05 15:49:28.000000000 -0500
@@ -37,11 +37,24 @@
 #include <linux/namei.h>
 #include <linux/uio.h>
 #include <linux/bio.h>
+#include <trace/ext4.h>
 #include "ext4_jbd2.h"
 #include "xattr.h"
 #include "acl.h"
 #include "ext4_extents.h"
 
+DEFINE_TRACE(ext4_write_begin);
+DEFINE_TRACE(ext4_ordered_write_end);
+DEFINE_TRACE(ext4_writeback_write_end);
+DEFINE_TRACE(ext4_journalled_write_end);
+DEFINE_TRACE(ext4_da_writepage);
+DEFINE_TRACE(ext4_da_writepages);
+DEFINE_TRACE(ext4_da_writepages_result);
+DEFINE_TRACE(ext4_da_write_begin);
+DEFINE_TRACE(ext4_da_write_end);
+DEFINE_TRACE(ext4_normal_writepage);
+DEFINE_TRACE(ext4_journalled_writepage);
+
 #define MPAGE_DA_EXTENT_TAIL 0x01
 
 static inline int ext4_begin_ordered_truncate(struct inode *inode,
@@ -1353,10 +1366,7 @@ static int ext4_write_begin(struct file 
  	pgoff_t index;
 	unsigned from, to;
 
-	trace_mark(ext4_write_begin,
-		   "dev %s ino %lu pos %llu len %u flags %u",
-		   inode->i_sb->s_id, inode->i_ino,
-		   (unsigned long long) pos, len, flags);
+	trace_ext4_write_begin(inode, pos, len, flags);
  	index = pos >> PAGE_CACHE_SHIFT;
 	from = pos & (PAGE_CACHE_SIZE - 1);
 	to = from + len;
@@ -1432,10 +1442,7 @@ static int ext4_ordered_write_end(struct
 	struct inode *inode = mapping->host;
 	int ret = 0, ret2;
 
-	trace_mark(ext4_ordered_write_end,
-		   "dev %s ino %lu pos %llu len %u copied %u",
-		   inode->i_sb->s_id, inode->i_ino,
-		   (unsigned long long) pos, len, copied);
+	trace_ext4_ordered_write_end(inode, pos, len, copied);
 	ret = ext4_jbd2_file_inode(handle, inode);
 
 	if (ret == 0) {
@@ -1474,10 +1481,7 @@ static int ext4_writeback_write_end(stru
 	int ret = 0, ret2;
 	loff_t new_i_size;
 
-	trace_mark(ext4_writeback_write_end,
-		   "dev %s ino %lu pos %llu len %u copied %u",
-		   inode->i_sb->s_id, inode->i_ino,
-		   (unsigned long long) pos, len, copied);
+	trace_ext4_writeback_write_end(inode, pos, len, copied);
 	new_i_size = pos + copied;
 	if (new_i_size > EXT4_I(inode)->i_disksize) {
 		ext4_update_i_disksize(inode, new_i_size);
@@ -1513,10 +1517,7 @@ static int ext4_journalled_write_end(str
 	unsigned from, to;
 	loff_t new_i_size;
 
-	trace_mark(ext4_journalled_write_end,
-		   "dev %s ino %lu pos %llu len %u copied %u",
-		   inode->i_sb->s_id, inode->i_ino,
-		   (unsigned long long) pos, len, copied);
+	trace_ext4_journalled_write_end(inode, pos, len, copied);
 	from = pos & (PAGE_CACHE_SIZE - 1);
 	to = from + len;
 
@@ -2333,9 +2334,7 @@ static int ext4_da_writepage(struct page
 	struct buffer_head *page_bufs;
 	struct inode *inode = page->mapping->host;
 
-	trace_mark(ext4_da_writepage,
-		   "dev %s ino %lu page_index %lu",
-		   inode->i_sb->s_id, inode->i_ino, page->index);
+	trace_ext4_da_writepage(inode, page);
 	size = i_size_read(inode);
 	if (page->index == size >> PAGE_CACHE_SHIFT)
 		len = size & ~PAGE_CACHE_MASK;
@@ -2447,19 +2446,7 @@ static int ext4_da_writepages(struct add
 	int needed_blocks, ret = 0, nr_to_writebump = 0;
 	struct ext4_sb_info *sbi = EXT4_SB(mapping->host->i_sb);
 
-	trace_mark(ext4_da_writepages,
-		   "dev %s ino %lu nr_t_write %ld "
-		   "pages_skipped %ld range_start %llu "
-		   "range_end %llu nonblocking %d "
-		   "for_kupdate %d for_reclaim %d "
-		   "for_writepages %d range_cyclic %d",
-		   inode->i_sb->s_id, inode->i_ino,
-		   wbc->nr_to_write, wbc->pages_skipped,
-		   (unsigned long long) wbc->range_start,
-		   (unsigned long long) wbc->range_end,
-		   wbc->nonblocking, wbc->for_kupdate,
-		   wbc->for_reclaim, wbc->for_writepages,
-		   wbc->range_cyclic);
+	trace_ext4_da_writepages(inode, wbc);
 
 	/*
 	 * No pages to write? This is mainly a kludge to avoid starting
@@ -2595,14 +2582,7 @@ out_writepages:
 	if (!no_nrwrite_index_update)
 		wbc->no_nrwrite_index_update = 0;
 	wbc->nr_to_write -= nr_to_writebump;
-	trace_mark(ext4_da_writepage_result,
-		   "dev %s ino %lu ret %d pages_written %d "
-		   "pages_skipped %ld congestion %d "
-		   "more_io %d no_nrwrite_index_update %d",
-		   inode->i_sb->s_id, inode->i_ino, ret,
-		   pages_written, wbc->pages_skipped,
-		   wbc->encountered_congestion, wbc->more_io,
-		   wbc->no_nrwrite_index_update);
+	trace_ext4_da_writepages_result(inode, wbc, ret, pages_written);
 	return ret;
 }
 
@@ -2654,11 +2634,7 @@ static int ext4_da_write_begin(struct fi
 					len, flags, pagep, fsdata);
 	}
 	*fsdata = (void *)0;
-
-	trace_mark(ext4_da_write_begin,
-		   "dev %s ino %lu pos %llu len %u flags %u",
-		   inode->i_sb->s_id, inode->i_ino,
-		   (unsigned long long) pos, len, flags);
+	trace_ext4_da_write_begin(inode, pos, len, flags);
 retry:
 	/*
 	 * With delayed allocation, we don't log the i_disksize update
@@ -2751,10 +2727,7 @@ static int ext4_da_write_end(struct file
 		}
 	}
 
-	trace_mark(ext4_da_write_end,
-		   "dev %s ino %lu pos %llu len %u copied %u",
-		   inode->i_sb->s_id, inode->i_ino,
-		   (unsigned long long) pos, len, copied);
+	trace_ext4_da_write_end(inode, pos, len, copied);
 	start = pos & (PAGE_CACHE_SIZE - 1);
 	end = start + copied - 1;
 
@@ -2965,9 +2938,7 @@ static int ext4_normal_writepage(struct 
 	loff_t size = i_size_read(inode);
 	loff_t len;
 
-	trace_mark(ext4_normal_writepage,
-		   "dev %s ino %lu page_index %lu",
-		   inode->i_sb->s_id, inode->i_ino, page->index);
+	trace_ext4_normal_writepage(inode, page);
 	J_ASSERT(PageLocked(page));
 	if (page->index == size >> PAGE_CACHE_SHIFT)
 		len = size & ~PAGE_CACHE_MASK;
@@ -3053,9 +3024,7 @@ static int ext4_journalled_writepage(str
 	loff_t size = i_size_read(inode);
 	loff_t len;
 
-	trace_mark(ext4_journalled_writepage,
-		   "dev %s ino %lu page_index %lu",
-		   inode->i_sb->s_id, inode->i_ino, page->index);
+	trace_ext4_journalled_writepage(inode, page);
 	J_ASSERT(PageLocked(page));
 	if (page->index == size >> PAGE_CACHE_SHIFT)
 		len = size & ~PAGE_CACHE_MASK;
Index: linux-2.6-lttng/fs/ext4/mballoc.c
===================================================================
--- linux-2.6-lttng.orig/fs/ext4/mballoc.c	2009-03-05 15:21:54.000000000 -0500
+++ linux-2.6-lttng/fs/ext4/mballoc.c	2009-03-05 15:49:28.000000000 -0500
@@ -343,6 +343,17 @@ static void release_blocks_on_commit(jou
 
 
 
+DEFINE_TRACE(ext4_discard_blocks);
+DEFINE_TRACE(ext4_mb_new_inode_pa);
+DEFINE_TRACE(ext4_mb_new_group_pa);
+DEFINE_TRACE(ext4_mb_release_inode_pa);
+DEFINE_TRACE(ext4_mb_release_group_pa);
+DEFINE_TRACE(ext4_discard_preallocations);
+DEFINE_TRACE(ext4_mb_discard_preallocations);
+DEFINE_TRACE(ext4_request_blocks);
+DEFINE_TRACE(ext4_allocate_blocks);
+DEFINE_TRACE(ext4_free_blocks);
+
 static inline void *mb_correct_addr_and_bit(int *bit, void *addr)
 {
 #if BITS_PER_LONG == 64
@@ -2878,9 +2889,8 @@ static void release_blocks_on_commit(jou
 		discard_block = (ext4_fsblk_t) entry->group * EXT4_BLOCKS_PER_GROUP(sb)
 			+ entry->start_blk
 			+ le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block);
-		trace_mark(ext4_discard_blocks, "dev %s blk %llu count %u",
-			   sb->s_id, (unsigned long long) discard_block,
-			   entry->count);
+		trace_ext4_discard_blocks(sb, (unsigned long long)discard_block,
+					  entry);
 		sb_issue_discard(sb, discard_block, entry->count);
 
 		kmem_cache_free(ext4_free_ext_cachep, entry);
@@ -3700,10 +3710,7 @@ ext4_mb_new_inode_pa(struct ext4_allocat
 
 	mb_debug("new inode pa %p: %llu/%u for %u\n", pa,
 			pa->pa_pstart, pa->pa_len, pa->pa_lstart);
-	trace_mark(ext4_mb_new_inode_pa,
-		   "dev %s ino %lu pstart %llu len %u lstart %u",
-		   sb->s_id, ac->ac_inode->i_ino,
-		   pa->pa_pstart, pa->pa_len, pa->pa_lstart);
+	trace_ext4_mb_new_inode_pa(ac, pa);
 
 	ext4_mb_use_inode_pa(ac, pa);
 	atomic_add(pa->pa_free, &EXT4_SB(sb)->s_mb_preallocated);
@@ -3762,9 +3769,8 @@ ext4_mb_new_group_pa(struct ext4_allocat
 	pa->pa_linear = 1;
 
 	mb_debug("new group pa %p: %llu/%u for %u\n", pa,
-		 pa->pa_pstart, pa->pa_len, pa->pa_lstart);
-	trace_mark(ext4_mb_new_group_pa, "dev %s pstart %llu len %u lstart %u",
-		   sb->s_id, pa->pa_pstart, pa->pa_len, pa->pa_lstart);
+			pa->pa_pstart, pa->pa_len, pa->pa_lstart);
+	trace_ext4_mb_new_group_pa(ac, pa);
 
 	ext4_mb_use_group_pa(ac, pa);
 	atomic_add(pa->pa_free, &EXT4_SB(sb)->s_mb_preallocated);
@@ -3854,10 +3860,8 @@ ext4_mb_release_inode_pa(struct ext4_bud
 			ext4_mb_store_history(ac);
 		}
 
-		trace_mark(ext4_mb_release_inode_pa,
-			   "dev %s ino %lu block %llu count %u",
-			   sb->s_id, pa->pa_inode->i_ino, grp_blk_start + bit,
-			   next - bit);
+		trace_ext4_mb_release_inode_pa(ac, pa, grp_blk_start + bit,
+					       next - bit);
 		mb_free_blocks(pa->pa_inode, e4b, bit, next - bit);
 		bit = next + 1;
 	}
@@ -3891,8 +3895,7 @@ ext4_mb_release_group_pa(struct ext4_bud
 	if (ac)
 		ac->ac_op = EXT4_MB_HISTORY_DISCARD;
 
-	trace_mark(ext4_mb_release_group_pa, "dev %s pstart %llu len %d",
-		   sb->s_id, pa->pa_pstart, pa->pa_len);
+	trace_ext4_mb_release_group_pa(ac, pa);
 	BUG_ON(pa->pa_deleted == 0);
 	ext4_get_group_no_and_offset(sb, pa->pa_pstart, &group, &bit);
 	BUG_ON(group != e4b->bd_group && pa->pa_len != 0);
@@ -4058,8 +4061,7 @@ void ext4_discard_preallocations(struct 
 	}
 
 	mb_debug("discard preallocation for inode %lu\n", inode->i_ino);
-	trace_mark(ext4_discard_preallocations, "dev %s ino %lu", sb->s_id,
-		   inode->i_ino);
+	trace_ext4_discard_preallocations(inode);
 
 	INIT_LIST_HEAD(&list);
 
@@ -4515,8 +4517,7 @@ static int ext4_mb_discard_preallocation
 	int ret;
 	int freed = 0;
 
-	trace_mark(ext4_mb_discard_preallocations, "dev %s needed %d",
-		   sb->s_id, needed);
+	trace_ext4_mb_discard_preallocations(sb, needed);
 	for (i = 0; i < EXT4_SB(sb)->s_groups_count && needed > 0; i++) {
 		ret = ext4_mb_discard_group_preallocations(sb, i, needed);
 		freed += ret;
@@ -4545,17 +4546,7 @@ ext4_fsblk_t ext4_mb_new_blocks(handle_t
 	sb = ar->inode->i_sb;
 	sbi = EXT4_SB(sb);
 
-	trace_mark(ext4_request_blocks, "dev %s flags %u len %u ino %lu "
-		   "lblk %llu goal %llu lleft %llu lright %llu "
-		   "pleft %llu pright %llu ",
-		   sb->s_id, ar->flags, ar->len,
-		   ar->inode ? ar->inode->i_ino : 0,
-		   (unsigned long long) ar->logical,
-		   (unsigned long long) ar->goal,
-		   (unsigned long long) ar->lleft,
-		   (unsigned long long) ar->lright,
-		   (unsigned long long) ar->pleft,
-		   (unsigned long long) ar->pright);
+	trace_ext4_request_blocks(ar);
 
 	if (!EXT4_I(ar->inode)->i_delalloc_reserved_flag) {
 		/*
@@ -4659,18 +4650,7 @@ out3:
 						reserv_blks);
 	}
 
-	trace_mark(ext4_allocate_blocks,
-		   "dev %s block %llu flags %u len %u ino %lu "
-		   "logical %llu goal %llu lleft %llu lright %llu "
-		   "pleft %llu pright %llu ",
-		   sb->s_id, (unsigned long long) block,
-		   ar->flags, ar->len, ar->inode ? ar->inode->i_ino : 0,
-		   (unsigned long long) ar->logical,
-		   (unsigned long long) ar->goal,
-		   (unsigned long long) ar->lleft,
-		   (unsigned long long) ar->lright,
-		   (unsigned long long) ar->pleft,
-		   (unsigned long long) ar->pright);
+	trace_ext4_allocate_blocks(ar, (unsigned long long)block);
 
 	return block;
 }
@@ -4805,10 +4785,7 @@ void ext4_mb_free_blocks(handle_t *handl
 	}
 
 	ext4_debug("freeing block %lu\n", block);
-	trace_mark(ext4_free_blocks,
-		   "dev %s block %llu count %lu metadata %d ino %lu",
-		   sb->s_id, (unsigned long long) block, count, metadata,
-		   inode ? inode->i_ino : 0);
+	trace_ext4_free_blocks(inode, block, count, metadata);
 
 	ac = kmem_cache_alloc(ext4_ac_cachep, GFP_NOFS);
 	if (ac) {
Index: linux-2.6-lttng/fs/ext4/mballoc.h
===================================================================
--- linux-2.6-lttng.orig/fs/ext4/mballoc.h	2009-03-05 15:21:54.000000000 -0500
+++ linux-2.6-lttng/fs/ext4/mballoc.h	2009-03-05 15:49:28.000000000 -0500
@@ -19,8 +19,8 @@
 #include <linux/seq_file.h>
 #include <linux/version.h>
 #include <linux/blkdev.h>
-#include <linux/marker.h>
 #include <linux/mutex.h>
+#include <trace/ext4.h>
 #include "ext4_jbd2.h"
 #include "ext4.h"
 #include "group.h"
Index: linux-2.6-lttng/fs/ext4/super.c
===================================================================
--- linux-2.6-lttng.orig/fs/ext4/super.c	2009-03-05 15:21:54.000000000 -0500
+++ linux-2.6-lttng/fs/ext4/super.c	2009-03-05 15:49:28.000000000 -0500
@@ -35,9 +35,9 @@
 #include <linux/quotaops.h>
 #include <linux/seq_file.h>
 #include <linux/proc_fs.h>
-#include <linux/marker.h>
 #include <linux/log2.h>
 #include <linux/crc16.h>
+#include <trace/ext4.h>
 #include <asm/uaccess.h>
 
 #include "ext4.h"
@@ -47,6 +47,8 @@
 #include "namei.h"
 #include "group.h"
 
+DEFINE_TRACE(ext4_sync_fs);
+
 struct proc_dir_entry *ext4_proc_root;
 
 static int ext4_load_journal(struct super_block *, struct ext4_super_block *,
@@ -3048,7 +3050,7 @@ static int ext4_sync_fs(struct super_blo
 	int ret = 0;
 	tid_t target;
 
-	trace_mark(ext4_sync_fs, "dev %s wait %d", sb->s_id, wait);
+	trace_ext4_sync_fs(sb, wait);
 	sb->s_dirt = 0;
 	if (EXT4_SB(sb)->s_journal) {
 		if (jbd2_journal_start_commit(EXT4_SB(sb)->s_journal,
Index: linux-2.6-lttng/include/trace/ext4.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/trace/ext4.h	2009-03-05 15:49:28.000000000 -0500
@@ -0,0 +1,129 @@
+#ifndef _TRACE_EXT4_H
+#define _TRACE_EXT4_H
+
+#include <linux/tracepoint.h>
+
+struct ext4_free_data;
+struct ext4_prealloc_space;
+struct ext4_allocation_context;
+struct ext4_allocation_request;
+
+DECLARE_TRACE(ext4_free_inode,
+	TPPROTO(struct inode *inode),
+		TPARGS(inode));
+
+DECLARE_TRACE(ext4_request_inode,
+	TPPROTO(struct inode *dir, int mode),
+		TPARGS(dir, mode));
+
+DECLARE_TRACE(ext4_allocate_inode,
+	TPPROTO(struct inode *inode, struct inode *dir, int mode),
+		TPARGS(inode, dir, mode));
+
+DECLARE_TRACE(ext4_write_begin,
+	TPPROTO(struct inode *inode, loff_t pos, unsigned int len,
+			unsigned int flags),
+		TPARGS(inode, pos, len, flags));
+
+DECLARE_TRACE(ext4_ordered_write_end,
+	TPPROTO(struct inode *inode, loff_t pos, unsigned int len,
+			unsigned int copied),
+		TPARGS(inode, pos, len, copied));
+
+DECLARE_TRACE(ext4_writeback_write_end,
+	TPPROTO(struct inode *inode, loff_t pos, unsigned int len,
+			unsigned int copied),
+		TPARGS(inode, pos, len, copied));
+
+DECLARE_TRACE(ext4_journalled_write_end,
+	TPPROTO(struct inode *inode, loff_t pos, unsigned int len,
+			unsigned int copied),
+		TPARGS(inode, pos, len, copied));
+
+DECLARE_TRACE(ext4_da_writepage,
+	TPPROTO(struct inode *inode, struct page *page),
+		TPARGS(inode, page));
+
+DECLARE_TRACE(ext4_da_writepages,
+	TPPROTO(struct inode *inode, struct writeback_control *wbc),
+		TPARGS(inode, wbc));
+
+DECLARE_TRACE(ext4_da_writepages_result,
+	TPPROTO(struct inode *inode, struct writeback_control *wbc,
+			int ret, int pages_written),
+		TPARGS(inode, wbc, ret, pages_written));
+
+DECLARE_TRACE(ext4_da_write_begin,
+	TPPROTO(struct inode *inode, loff_t pos, unsigned int len,
+			unsigned int flags),
+		TPARGS(inode, pos, len, flags));
+
+DECLARE_TRACE(ext4_da_write_end,
+	TPPROTO(struct inode *inode, loff_t pos, unsigned int len,
+			unsigned int copied),
+		TPARGS(inode, pos, len, copied));
+
+DECLARE_TRACE(ext4_normal_writepage,
+	TPPROTO(struct inode *inode, struct page *page),
+		TPARGS(inode, page));
+
+DECLARE_TRACE(ext4_journalled_writepage,
+	TPPROTO(struct inode *inode, struct page *page),
+		TPARGS(inode, page));
+
+DECLARE_TRACE(ext4_discard_blocks,
+	TPPROTO(struct super_block *sb, unsigned long long blk,
+			struct ext4_free_data *entry),
+		TPARGS(sb, blk, entry));
+
+DECLARE_TRACE(ext4_mb_new_inode_pa,
+	TPPROTO(struct ext4_allocation_context *ac,
+			struct ext4_prealloc_space *pa),
+		TPARGS(ac, pa));
+
+DECLARE_TRACE(ext4_mb_new_group_pa,
+	TPPROTO(struct ext4_allocation_context *ac,
+			struct ext4_prealloc_space *pa),
+		TPARGS(ac, pa));
+
+DECLARE_TRACE(ext4_mb_release_inode_pa,
+	TPPROTO(struct ext4_allocation_context *ac,
+			struct ext4_prealloc_space *pa,
+			unsigned long long block, unsigned int count),
+		TPARGS(ac, pa, block, count));
+
+DECLARE_TRACE(ext4_mb_release_group_pa,
+	TPPROTO(struct ext4_allocation_context *ac,
+			struct ext4_prealloc_space *pa),
+		TPARGS(ac, pa));
+
+DECLARE_TRACE(ext4_discard_preallocations,
+	TPPROTO(struct inode *inode),
+		TPARGS(inode));
+
+DECLARE_TRACE(ext4_mb_discard_preallocations,
+	TPPROTO(struct super_block *sb, int needed),
+		TPARGS(sb, needed));
+
+DECLARE_TRACE(ext4_request_blocks,
+	TPPROTO(struct ext4_allocation_request *ar),
+		TPARGS(ar));
+
+DECLARE_TRACE(ext4_allocate_blocks,
+	TPPROTO(struct ext4_allocation_request *ar, unsigned long long block),
+		TPARGS(ar, block));
+
+DECLARE_TRACE(ext4_free_blocks,
+	TPPROTO(struct inode *inode, unsigned long block, unsigned long count,
+			int metadata),
+		TPARGS(inode, block, count, metadata));
+
+DECLARE_TRACE(ext4_sync_file,
+	TPPROTO(struct file *file, struct dentry *dentry, int datasync),
+		TPARGS(file, dentry, datasync));
+
+DECLARE_TRACE(ext4_sync_fs,
+	TPPROTO(struct super_block *sb, int wait),
+		TPARGS(sb, wait));
+
+#endif

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [RFC patch 41/41] JBD2: use tracepoints for instrumentation
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (39 preceding siblings ...)
  2009-03-05 22:48 ` [RFC patch 40/41] EXT4: instrumentation with tracepoints Mathieu Desnoyers
@ 2009-03-05 22:48 ` Mathieu Desnoyers
  2009-03-06 10:11 ` [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Ingo Molnar
  2009-03-06 18:34 ` Steven Rostedt
  42 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 22:48 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md
  Cc: Mathieu Desnoyers, Theodore Tso, Stephen C. Tweedie, linux-ext4

[-- Attachment #1: jbd2-instrumentation-move-to-tracepoints.patch --]
[-- Type: text/plain, Size: 3887 bytes --]

Move the jbd2 instrumentation to tracepoints. This increases maintainability.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Theodore Ts'o <tytso@mit.edu>
CC: Stephen C. Tweedie <sct@redhat.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: linux-ext4@vger.kernel.org
---
 fs/jbd2/checkpoint.c |    7 ++++---
 fs/jbd2/commit.c     |   12 ++++++------
 include/trace/jbd2.h |   19 +++++++++++++++++++
 3 files changed, 29 insertions(+), 9 deletions(-)

Index: linux-2.6-lttng/fs/jbd2/checkpoint.c
===================================================================
--- linux-2.6-lttng.orig/fs/jbd2/checkpoint.c	2009-01-30 11:58:01.000000000 -0500
+++ linux-2.6-lttng/fs/jbd2/checkpoint.c	2009-01-30 11:58:18.000000000 -0500
@@ -20,9 +20,11 @@
 #include <linux/time.h>
 #include <linux/fs.h>
 #include <linux/jbd2.h>
-#include <linux/marker.h>
 #include <linux/errno.h>
 #include <linux/slab.h>
+#include <trace/jbd2.h>
+
+DEFINE_TRACE(jbd2_checkpoint);
 
 /*
  * Unlink a buffer from a transaction checkpoint list.
@@ -358,8 +360,7 @@ int jbd2_log_do_checkpoint(journal_t *jo
 	 * journal straight away.
 	 */
 	result = jbd2_cleanup_journal_tail(journal);
-	trace_mark(jbd2_checkpoint, "dev %s need_checkpoint %d",
-		   journal->j_devname, result);
+	trace_jbd2_checkpoint(journal, result);
 	jbd_debug(1, "cleanup_journal_tail returned %d\n", result);
 	if (result <= 0)
 		return result;
Index: linux-2.6-lttng/fs/jbd2/commit.c
===================================================================
--- linux-2.6-lttng.orig/fs/jbd2/commit.c	2009-01-30 11:58:01.000000000 -0500
+++ linux-2.6-lttng/fs/jbd2/commit.c	2009-01-30 12:00:21.000000000 -0500
@@ -16,7 +16,6 @@
 #include <linux/time.h>
 #include <linux/fs.h>
 #include <linux/jbd2.h>
-#include <linux/marker.h>
 #include <linux/errno.h>
 #include <linux/slab.h>
 #include <linux/mm.h>
@@ -26,6 +25,10 @@
 #include <linux/writeback.h>
 #include <linux/backing-dev.h>
 #include <linux/bio.h>
+#include <trace/jbd2.h>
+
+DEFINE_TRACE(jbd2_start_commit);
+DEFINE_TRACE(jbd2_end_commit);
 
 /*
  * Default IO end handler for temporary BJ_IO buffer_heads.
@@ -393,8 +396,7 @@ void jbd2_journal_commit_transaction(jou
 	commit_transaction = journal->j_running_transaction;
 	J_ASSERT(commit_transaction->t_state == T_RUNNING);
 
-	trace_mark(jbd2_start_commit, "dev %s transaction %d",
-		   journal->j_devname, commit_transaction->t_tid);
+	trace_jbd2_start_commit(journal, commit_transaction);
 	jbd_debug(1, "JBD: starting commit of transaction %d\n",
 			commit_transaction->t_tid);
 
@@ -1045,9 +1047,7 @@ restart_loop:
 	if (journal->j_commit_callback)
 		journal->j_commit_callback(journal, commit_transaction);
 
-	trace_mark(jbd2_end_commit, "dev %s transaction %d head %d",
-		   journal->j_devname, commit_transaction->t_tid,
-		   journal->j_tail_sequence);
+	trace_jbd2_end_commit(journal, commit_transaction);
 	jbd_debug(1, "JBD: commit %d complete, head %d\n",
 		  journal->j_commit_sequence, journal->j_tail_sequence);
 	if (to_free)
Index: linux-2.6-lttng/include/trace/jbd2.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/trace/jbd2.h	2009-01-30 12:03:33.000000000 -0500
@@ -0,0 +1,19 @@
+#ifndef _TRACE_JBD2_H
+#define _TRACE_JBD2_H
+
+#include <linux/tracepoint.h>
+#include <linux/jbd2.h>
+
+DECLARE_TRACE(jbd2_checkpoint,
+	TPPROTO(journal_t *journal, int result),
+		TPARGS(journal, result));
+
+DECLARE_TRACE(jbd2_start_commit,
+	TPPROTO(journal_t *journal, transaction_t *commit_transaction),
+		TPARGS(journal, commit_transaction));
+
+DECLARE_TRACE(jbd2_end_commit,
+	TPPROTO(journal_t *journal, transaction_t *commit_transaction),
+		TPARGS(journal, commit_transaction));
+
+#endif

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [RFC patch 29/41] LTTng menus
  2009-03-05 22:47 ` [RFC patch 29/41] LTTng menus Mathieu Desnoyers
@ 2009-03-05 23:35   ` Randy Dunlap
  2009-03-05 23:47     ` Mathieu Desnoyers
  0 siblings, 1 reply; 57+ messages in thread
From: Randy Dunlap @ 2009-03-05 23:35 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md

Mathieu Desnoyers wrote:

<<attachment :(>>


+menuconfig LTT
+	bool "Linux Trace Toolkit Next Generation (LTTng)"
+	depends on EXPERIMENTAL
+	select MARKERS
+	select TRACEPOINTS
+	default y

Not default 'y', please.

-- 
~Randy


* Re: [RFC patch 29/41] LTTng menus
  2009-03-05 23:35   ` Randy Dunlap
@ 2009-03-05 23:47     ` Mathieu Desnoyers
  2009-03-05 23:51       ` Randy Dunlap
  0 siblings, 1 reply; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-05 23:47 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md

* Randy Dunlap (randy.dunlap@oracle.com) wrote:
> Mathieu Desnoyers wrote:
> 
> <<attachment :(>>
> 
> 
> +menuconfig LTT
> +	bool "Linux Trace Toolkit Next Generation (LTTng)"
> +	depends on EXPERIMENTAL
> +	select MARKERS
> +	select TRACEPOINTS
> +	default y
> 
> Not default 'y', please.
> 

OK, so default n it is. But I plan to leave the main menu "sub-features" as
default y, given that people get the standard features when they choose
to enable the tracer. Hopefully this is ok ?

Mathieu

> -- 
> ~Randy

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [RFC patch 29/41] LTTng menus
  2009-03-05 23:47     ` Mathieu Desnoyers
@ 2009-03-05 23:51       ` Randy Dunlap
  2009-03-06  0:01         ` [ltt-dev] " Mathieu Desnoyers
  0 siblings, 1 reply; 57+ messages in thread
From: Randy Dunlap @ 2009-03-05 23:51 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	Steven Rostedt, ltt-dev, Peter Zijlstra, Frederic Weisbecker,
	Arjan van de Ven, Pekka Paalanen, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Martin Bligh, Frank Ch. Eigler, Tom Zanussi,
	Masami Hiramatsu, KOSAKI Motohiro, Jason Baron,
	Christoph Hellwig, Jiaying Zhang, Eduard - Gabriel Munteanu,
	mrubin, md

Mathieu Desnoyers wrote:
> * Randy Dunlap (randy.dunlap@oracle.com) wrote:
>> Mathieu Desnoyers wrote:
>>
>> <<attachment :(>>
>>
>>
>> +menuconfig LTT
>> +	bool "Linux Trace Toolkit Next Generation (LTTng)"
>> +	depends on EXPERIMENTAL
>> +	select MARKERS
>> +	select TRACEPOINTS
>> +	default y
>>
>> Not default 'y', please.
>>
> 
> OK, so default n it is. But I plan to leave the main menu "sub-features" as
> default y, given that people get the standard features when they choose
> to enable the tracer. Hopefully this is ok ?

Sure, as long as it just enables viewing the menu and not adding
code to a growing kernel.

-- 
~Randy


* Re: [ltt-dev] [RFC patch 29/41] LTTng menus
  2009-03-05 23:51       ` Randy Dunlap
@ 2009-03-06  0:01         ` Mathieu Desnoyers
  2009-03-06  0:12           ` Randy Dunlap
  0 siblings, 1 reply; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-06  0:01 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: mrubin, Peter Zijlstra, Frederic Weisbecker, Pekka Paalanen,
	H. Peter Anvin, md, Tom Zanussi, Christoph Hellwig,
	Frank Ch. Eigler, ltt-dev, Eduard - Gabriel Munteanu,
	Ingo Molnar, Steven Rostedt, Arnaldo Carvalho de Melo,
	Arjan van de Ven, linux-kernel, Martin Bligh, Andrew Morton,
	Linus Torvalds

* Randy Dunlap (randy.dunlap@oracle.com) wrote:
> Mathieu Desnoyers wrote:
> > * Randy Dunlap (randy.dunlap@oracle.com) wrote:
> >> Mathieu Desnoyers wrote:
> >>
> >> <<attachment :(>>
> >>
> >>
> >> +menuconfig LTT
> >> +	bool "Linux Trace Toolkit Next Generation (LTTng)"
> >> +	depends on EXPERIMENTAL
> >> +	select MARKERS
> >> +	select TRACEPOINTS
> >> +	default y
> >>
> >> Not default 'y', please.
> >>
> > 
> > OK, so default n it is. But I plan to leave the main menu "sub-features" as
> > default y, given that people get the standard features when they choose
> > to enable the tracer. Hopefully this is ok ?
> 
> Sure, as long as it just enables viewing the menu and not adding
> code to a growing kernel.
> 

I want to be sure I understand your point. Would the following be OK?

Menu [ ] Linux Trace Toolkit Next Generation (LTTng)  --->  (default n)

Within this menu, the following options enable various tracer modules,
most of which are typically needed except for some very specific tracer
uses:

< >   Linux Trace Toolkit Lock-Protected Data Relay (default n)
               (default y is planned for the lockless data relay
               module, which is not posted as part of this patchset)
[ ]   Debug check for random access in ltt relay buffers (default n)
<*>   Linux Trace Toolkit Serializer (default y)
-*-   Linux Trace Toolkit Custom Serializer (default y)
-*-   Linux Trace Toolkit Trace Controller (default m)
<*>   Linux Trace Toolkit Tracer (default y)
[ ]   Align Linux Trace Toolkit Traces (default n, selected if
        !HAVE_EFFICIENT_UNALIGNED_ACCESS)
[ ]   Add event size field to LTT events for tracer debugging (default n)
<M>   Support logging events from userspace (default m)
[*]   Support trace extraction from crash dump (default y)
[*]   Linux Trace Toolkit Kprobes Support (default y)
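
Concretely, I would express it roughly as follows, with every sub-option
gated behind the menuconfig symbol so that selecting the menu by itself
builds none of the modules (the sub-option symbol names here are
illustrative, not necessarily the ones used in the patch):

```kconfig
menuconfig LTT
	bool "Linux Trace Toolkit Next Generation (LTTng)"
	depends on EXPERIMENTAL
	select MARKERS
	select TRACEPOINTS
	default n

if LTT

config LTT_SERIALIZE
	tristate "Linux Trace Toolkit Serializer"
	default y

config LTT_TRACER
	tristate "Linux Trace Toolkit Tracer"
	default y

endif # LTT
```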

I understand from your answer above that just enabling the "LTTng"
submenu should not activate any of these items, am I correct ?

Mathieu

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [ltt-dev] [RFC patch 29/41] LTTng menus
  2009-03-06  0:01         ` [ltt-dev] " Mathieu Desnoyers
@ 2009-03-06  0:12           ` Randy Dunlap
  0 siblings, 0 replies; 57+ messages in thread
From: Randy Dunlap @ 2009-03-06  0:12 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: mrubin, Peter Zijlstra, Frederic Weisbecker, Pekka Paalanen,
	H. Peter Anvin, md, Tom Zanussi, Christoph Hellwig,
	Frank Ch. Eigler, ltt-dev, Eduard - Gabriel Munteanu,
	Ingo Molnar, Steven Rostedt, Arnaldo Carvalho de Melo,
	Arjan van de Ven, linux-kernel, Martin Bligh, Andrew Morton,
	Linus Torvalds

Mathieu Desnoyers wrote:
> * Randy Dunlap (randy.dunlap@oracle.com) wrote:
>> Mathieu Desnoyers wrote:
>>> * Randy Dunlap (randy.dunlap@oracle.com) wrote:
>>>> Mathieu Desnoyers wrote:
>>>>
>>>> <<attachment :(>>
>>>>
>>>>
>>>> +menuconfig LTT
>>>> +	bool "Linux Trace Toolkit Next Generation (LTTng)"
>>>> +	depends on EXPERIMENTAL
>>>> +	select MARKERS
>>>> +	select TRACEPOINTS
>>>> +	default y
>>>>
>>>> Not default 'y', please.
>>>>
>>> OK, so default n it is. But I plan to leave the main menu "sub-features" as
>>> default y, given that people get the standard features when they choose
>>> to enable the tracer. Hopefully this is ok ?
>> Sure, as long as it just enables viewing the menu and not adding
>> code to a growing kernel.
>>

I see what you mean now.  Thanks for the details.

> 
> I want to be sure I understand your point. Would the following be OK?
> 
> Menu [ ] Linux Trace Toolkit Next Generation (LTTng)  --->  (default n)
> 
> Within this menu, the following options enable various tracer modules,
> most of which are typically needed except for some very specific tracer
> uses:
> 
> < >   Linux Trace Toolkit Lock-Protected Data Relay (default n)
>                (default y is planned for the lockless data relay
>                module, which is not posted as part of this patchset)
> [ ]   Debug check for random access in ltt relay buffers (default n)
> <*>   Linux Trace Toolkit Serializer (default y)
> -*-   Linux Trace Toolkit Custom Serializer (default y)
> -*-   Linux Trace Toolkit Trace Controller (default m)
> <*>   Linux Trace Toolkit Tracer (default y)
> [ ]   Align Linux Trace Toolkit Traces (default n, selected if
>         !HAVE_EFFICIENT_UNALIGNED_ACCESS)
> [ ]   Add event size field to LTT events for tracer debugging (default n)
> <M>   Support logging events from userspace (default m)
> [*]   Support trace extraction from crash dump (default y)
> [*]   Linux Trace Toolkit Kprobes Support (default y)
> 
> I understand from your answer above that just enabling the "LTTng"
> submenu should not activate any of these items, am I correct ?

Ideally one wouldn't add bloat to the kernel, but if someone enables
the top-level menu item, I'm OK with enabling others under it.

(not that we all consider the same things to be bloat ;)

-- 
~Randy


* Re: [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (40 preceding siblings ...)
  2009-03-05 22:48 ` [RFC patch 41/41] JBD2: use tracepoints for instrumentation Mathieu Desnoyers
@ 2009-03-06 10:11 ` Ingo Molnar
  2009-03-06 19:02   ` Mathieu Desnoyers
  2009-03-06 18:34 ` Steven Rostedt
  42 siblings, 1 reply; 57+ messages in thread
From: Ingo Molnar @ 2009-03-06 10:11 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Linus Torvalds, linux-kernel, Andrew Morton, Steven Rostedt,
	ltt-dev, Peter Zijlstra, Frederic Weisbecker, Arjan van de Ven,
	Pekka Paalanen, Arnaldo Carvalho de Melo, H. Peter Anvin,
	Martin Bligh, Frank Ch. Eigler, Tom Zanussi, Masami Hiramatsu,
	KOSAKI Motohiro, Jason Baron, Christoph Hellwig, Jiaying Zhang,
	Eduard - Gabriel Munteanu, mrubin, md


* Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:

> Hi,
> 
> I spent the last 4-5 months working with the Fujitsu team at 
> implementing the tracer elements identified as goals at Kernel 
> Summit 2008 and at the following Plumber Conference. My idea 
> was to incremententally adapt the LTTng tracer, currently used 
> in the industry and well tested, to those requirements.
> 
> I spent the last days rearranging/folding/inspecting the LTTng 
> patchset to prepare it for an LKML post. The version 0.105 in 
> the LTTng git tree corresponds to the patchset I am posting 
> here. The said patchset will only include the core features of 
> LTTng, excluding the timestamping infrastructure (trace clock) 
> and excluding the instrumentation.

I'd like to merge the good bits into the tracing tree. Looking 
at the patches you submitted there's a lot of avoidable overlap 
with existing tracing features either present upstream already 
or queued up for v2.6.30 - and we need to work more on 
eliminating that overlap.

I don't think there's much fundamental disagreement, just 
different implementations - so we should evaluate each of those 
details one by one, iteratively.

The first step would be to split the patches up into three 
logical buckets:

 - Unique features not present in the tracing infrastructure, in the 
   event tracer or other tracing plugins - those should be 
   structured as feature additions.

 - Features that you consider superior to existing tracing
   features of the kernel. For those, please iterate the
   existing code with your enhancements - instead of a parallel 
   implementation.

 - Items which offer nothing new and are not superior to 
   existing features, those should be dropped probably. This too 
   is a case by case thing.

Would you be interested in working with us on that? I know that 
both Steve and I would be very much interested in that. If you 
have time/interest to work on that then we can go through each 
patch one by one and categorize them and map out the way to go.

Let me give you a few examples of existing areas of overlap:

> The corresponding git tree contains also the trace clock 
> patches and the lttng instrumentation. The trace clock is 
> required to use the tracer, but it can be used without the 
> instrumentation : there is already a kprobes and userspace 
> event support included in this patchset.

The latest tracing tree includes kernel/tracing/trace_clock.c 
which offers three trace clock variants, with different 
performance/precision tradeoffs:

 trace_clock_local()   [ for pure CPU-local tracers with no idle 
                         events. This is the fastest but least 
                         coherent tracing clock. ]

 trace_clock()         [ intermediate, scalable clock with
                         usable but imprecise global coherency. ]

 trace_clock_global()  [ globally serialized, coherent clock. 
                         It is the slowest but most accurate variant. ]

Tracing plugins can pick their choice. (This is relatively new 
code but you get the idea.)

> This tracer exports binary data through buffers using 
> splice(). The resulting binary files can be parsed from 
> userspace because the format string metadata is exported in 
> the files. The event set can be enhanced by adding tracepoints 
> to the kernel code and by creating probe modules, which 
> connect callbacks to the tracepoints and contain the format 
> string metainformation. Those callbacks are responsible for 
> writing the data in the trace buffers. This separation between 
> the trace buffer format string and the tracepoints is done on 
> purpose so the core kernel instrumentation (tracepoints) is 
> not exported to userspace, which will make maintenance much 
> easier.

A tracepoint format specification mechanism plus working (and 
fast!) zero-copy splice() support for the ring-buffer already 
exists in the latest tracing tree - as you are probably aware, 
since you commented on those patches a few days ago.

There are 3 good ways to go from here regarding the trace 
buffering and splice code:

  1- we end up switching to the lttng version in essence
  2- we end up keeping the tracing tree version
  3- we end up somewhere in between

Which point in the above spectrum we will settle down on depends 
on the technical details.

Note, whichever path we choose, a gradual, iterative workflow is 
still needed, so that we improve the existing upstream code with
lttng enhancements step by step.

This approach works for all your other patches as well. A 
direct, constructive comparison and active work on unifying them 
is required.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9
  2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
                   ` (41 preceding siblings ...)
  2009-03-06 10:11 ` [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Ingo Molnar
@ 2009-03-06 18:34 ` Steven Rostedt
  2009-03-06 19:01   ` Frederic Weisbecker
  42 siblings, 1 reply; 57+ messages in thread
From: Steven Rostedt @ 2009-03-06 18:34 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	ltt-dev, Peter Zijlstra, Frederic Weisbecker, Arjan van de Ven,
	Pekka Paalanen, Arnaldo Carvalho de Melo, H. Peter Anvin,
	Martin Bligh, Frank Ch. Eigler, Tom Zanussi, Masami Hiramatsu,
	KOSAKI Motohiro, Jason Baron, Christoph Hellwig, Jiaying Zhang,
	Eduard - Gabriel Munteanu, mrubin, md


Hi Mathieu,

Thanks for posting this. But it might be better to post in much smaller 
chunks. Let's work out the little things first. Posting 41 patches is a 
bit overwhelming. It took me a few hours to look at them all, and when I 
got to the end, I had forgotten what was at the beginning.

There are also minor changes to core kernel infrastructure code: seq_file, 
exporting functions, and such. These really need to be packaged 
separately and sent to the proper maintainers. Having them in a patch 
bomb does not give them the proper focus that they need.


On Thu, 5 Mar 2009, Mathieu Desnoyers wrote:

> Hi,
> 
> I spent the last 4-5 months working with the Fujitsu team on implementing the
> tracer elements identified as goals at Kernel Summit 2008 and at the following
> Plumber Conference. My idea was to incrementally adapt the LTTng tracer,
> currently used in the industry and well tested, to those requirements.

We really need to work together on this too. The biggest requirement that 
came out of that conference was to have a "single unified buffering 
system". And this was discussed quite heavily on LKML afterwards. All 
development was done incrementally and publicly.

> 
> I spent the last days rearranging/folding/inspecting the LTTng patchset
> to prepare it for an LKML post. The version 0.105 in the LTTng git tree
> corresponds to the patchset I am posting here. Said patchset will
> only include the core features of LTTng, excluding the timestamping
> infrastructure (trace clock) and excluding the instrumentation.
> 
> The corresponding git tree also contains the trace clock patches and the lttng
> instrumentation. The trace clock is required to use the tracer, but it can be
> used without the instrumentation: kprobes and userspace event support
> is already included in this patchset.
> 
> This tracer exports binary data through buffers using splice(). The resulting
> binary files can be parsed from userspace because the format string metadata is
> exported in the files. The event set can be enhanced by adding tracepoints to
> the kernel code and by creating probe modules, which connect callbacks to the
> tracepoints and contain the format string metainformation. Those callbacks are
> responsible for writing the data in the trace buffers. This separation between
> the trace buffer format string and the tracepoints is done on purpose so the
> core kernel instrumentation (tracepoints) is not exported to userspace, which
> will make maintenance much easier.

I've discussed this in my previous email. There is still a separation between
the TRACE_EVENT_FORMAT and the maintainer's code. The format string is a 
"hint" only and may change without notice. LTTng could use it or ignore 
it; it is up to the tracer to actually export that string. ftrace chose to 
export it because it was a simple way to extract that information. My 
utility will need to do a bit more work when the events get more complex, 
but the way it is set up, we can do that on a case-by-case basis.


> 
> The tree including the trace clock patches is available at :
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/compudj/linux-2.6-lttng.git
> branch : 2.6.29-rc7-lttng-0.105
> 
> Project website : http://www.lttng.org/
> 
> Information about how to install and use the tracer is available at :
> 
> http://ltt.polymtl.ca/svn/trunk/lttv/LTTngManual.html
> 
> The size of the LTTng core patchset is 41 patches. The diffstat details
> as follow :


Again, this is overwhelming. This needs to be broken up into small 
subsets that can be examined piece by piece.

> 
>  include/linux/ltt-core.h                                  |   35 
>  include/linux/ltt-relay.h                                 |  161 +
>  include/linux/ltt-tracer.h                                |   43 
>  include/linux/marker.h                                    |  121 
>  kernel/marker.c                                           |  353 ++
>  kernel/module.c                                           |   31 
>  linux-2.6-lttng/Documentation/markers.txt                 |   17 
>  linux-2.6-lttng/MAINTAINERS                               |    7 
>  linux-2.6-lttng/Makefile                                  |    2 
>  linux-2.6-lttng/arch/powerpc/kernel/traps.c               |    5 
>  linux-2.6-lttng/arch/powerpc/platforms/cell/spufs/spufs.h |    6 
>  linux-2.6-lttng/arch/sparc/Makefile                       |    2 
>  linux-2.6-lttng/arch/x86/kernel/dumpstack.c               |    5 
>  linux-2.6-lttng/arch/x86/mm/fault.c                       |    1 
>  linux-2.6-lttng/fs/ext4/fsync.c                           |    8 
>  linux-2.6-lttng/fs/ext4/ialloc.c                          |   17 
>  linux-2.6-lttng/fs/ext4/inode.c                           |   79 
>  linux-2.6-lttng/fs/ext4/mballoc.c                         |   71 
>  linux-2.6-lttng/fs/ext4/mballoc.h                         |    2 
>  linux-2.6-lttng/fs/ext4/super.c                           |    6 
>  linux-2.6-lttng/fs/jbd2/checkpoint.c                      |    7 
>  linux-2.6-lttng/fs/jbd2/commit.c                          |   12 
>  linux-2.6-lttng/fs/pipe.c                                 |    5 
>  linux-2.6-lttng/fs/select.c                               |   41 
>  linux-2.6-lttng/fs/seq_file.c                             |   45 
>  linux-2.6-lttng/fs/splice.c                               |    1 

There is a lot of code above that needs to be in its own patch series.
Maintainers do not have the time to pick through 41 patches to find out 
which patch might deal with their code.

Thanks,

-- Steve


>  linux-2.6-lttng/include/linux/immediate.h                 |   94 
>  linux-2.6-lttng/include/linux/kvm_host.h                  |   12 
>  linux-2.6-lttng/include/linux/ltt-channels.h              |   94 
>  linux-2.6-lttng/include/linux/ltt-core.h                  |   47 
>  linux-2.6-lttng/include/linux/ltt-relay.h                 |  186 +
>  linux-2.6-lttng/include/linux/ltt-tracer.h                |  731 ++++++
>  linux-2.6-lttng/include/linux/ltt-type-serializer.h       |  107 
>  linux-2.6-lttng/include/linux/marker.h                    |   16 
>  linux-2.6-lttng/include/linux/module.h                    |    6 
>  linux-2.6-lttng/include/linux/poll.h                      |    2 
>  linux-2.6-lttng/include/linux/seq_file.h                  |   20 
>  linux-2.6-lttng/include/trace/ext4.h                      |  129 +
>  linux-2.6-lttng/include/trace/jbd2.h                      |   19 
>  linux-2.6-lttng/init/Kconfig                              |    2 
>  linux-2.6-lttng/kernel/kallsyms.c                         |    1 
>  linux-2.6-lttng/kernel/marker.c                           |   12 
>  linux-2.6-lttng/kernel/module.c                           |   32 
>  linux-2.6-lttng/ltt/Kconfig                               |  130 +
>  linux-2.6-lttng/ltt/Makefile                              |   15 
>  linux-2.6-lttng/ltt/ltt-channels.c                        |  338 ++
>  linux-2.6-lttng/ltt/ltt-core.c                            |  101 
>  linux-2.6-lttng/ltt/ltt-filter.c                          |   66 
>  linux-2.6-lttng/ltt/ltt-kprobes.c                         |  479 +++
>  linux-2.6-lttng/ltt/ltt-marker-control.c                  |  265 ++
>  linux-2.6-lttng/ltt/ltt-relay-alloc.c                     |  715 +++++
>  linux-2.6-lttng/ltt/ltt-relay-locked.c                    | 1704 ++++++++++++++
>  linux-2.6-lttng/ltt/ltt-serialize.c                       |  685 +++++
>  linux-2.6-lttng/ltt/ltt-trace-control.c                   | 1061 ++++++++
>  linux-2.6-lttng/ltt/ltt-tracer.c                          | 1210 +++++++++
>  linux-2.6-lttng/ltt/ltt-type-serializer.c                 |   96 
>  linux-2.6-lttng/ltt/ltt-userspace-event.c                 |  131 +
>  linux-2.6-lttng/samples/markers/Makefile                  |    2 
>  linux-2.6-lttng/samples/markers/marker-example.c          |    4 
>  linux-2.6-lttng/samples/markers/probe-example.c           |   10 
>  linux-2.6-lttng/samples/markers/test-multi.c              |  120 
>  linux-2.6-lttng/virt/kvm/kvm_trace.c                      |   12 
>  ltt/Kconfig                                               |   24 
>  ltt/Makefile                                              |    2 
>  ltt/ltt-relay-alloc.c                                     |   80 
>  65 files changed, 9445 insertions(+), 398 deletions(-)
> 
> 
> Comments are welcome.
> 
> Mathieu
> 
> 
> -- 
> Mathieu Desnoyers
> OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC patch 01/41] LTTng - core header
  2009-03-05 22:47 ` [RFC patch 01/41] LTTng - core header Mathieu Desnoyers
@ 2009-03-06 18:37   ` Steven Rostedt
  0 siblings, 0 replies; 57+ messages in thread
From: Steven Rostedt @ 2009-03-06 18:37 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	ltt-dev, Peter Zijlstra, Frederic Weisbecker, Arjan van de Ven,
	Pekka Paalanen, Arnaldo Carvalho de Melo, H. Peter Anvin,
	Martin Bligh, Frank Ch. Eigler, Tom Zanussi, Masami Hiramatsu,
	KOSAKI Motohiro, Jason Baron, Christoph Hellwig, Jiaying Zhang,
	Eduard - Gabriel Munteanu, mrubin, md


On Thu, 5 Mar 2009, Mathieu Desnoyers wrote:

> Contains the structures required by the builtin part of the LTTng tracer.


Also note, it is better to push the needed core changes first. This way, if 
something comes up, you can make changes to your code before it gets 
into the kernel. This is the approach I try to use: I send out patch 
series that modify the core kernel first, see what the feedback is, and get 
them accepted before continuing with further patches.

Thanks,

-- Steve


> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> ---
>  include/linux/ltt-core.h |   47 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 47 insertions(+)
> 
> Index: linux-2.6-lttng/include/linux/ltt-core.h
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6-lttng/include/linux/ltt-core.h	2009-03-04 13:37:26.000000000 -0500
> @@ -0,0 +1,47 @@
> +/*
> + * Copyright (C) 2005,2006 Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca)
> + *
> + * This contains the core definitions for the Linux Trace Toolkit.
> + */
> +
> +#ifndef LTT_CORE_H
> +#define LTT_CORE_H
> +
> +#include <linux/list.h>
> +#include <linux/percpu.h>
> +
> +/* ltt's root dir in debugfs */
> +#define LTT_ROOT        "ltt"
> +
> +/*
> + * All modifications of ltt_traces must be done by ltt-tracer.c, while holding
> + * the semaphore. Only reading of this information can be done elsewhere, with
> + * the RCU mechanism : the preemption must be disabled while reading the
> + * list.
> + */
> +struct ltt_traces {
> +	struct list_head setup_head;	/* Pre-allocated traces list */
> +	struct list_head head;		/* Allocated Traces list */
> +	unsigned int num_active_traces;	/* Number of active traces */
> +} ____cacheline_aligned;
> +
> +extern struct ltt_traces ltt_traces;
> +
> +/*
> + * get dentry of ltt's root dir
> + */
> +struct dentry *get_ltt_root(void);
> +
> +void put_ltt_root(void);
> +
> +/* Keep track of trap nesting inside LTT */
> +DECLARE_PER_CPU(unsigned int, ltt_nesting);
> +
> +typedef int (*ltt_run_filter_functor)(void *trace, uint16_t eID);
> +
> +extern ltt_run_filter_functor ltt_run_filter;
> +
> +extern void ltt_filter_register(ltt_run_filter_functor func);
> +extern void ltt_filter_unregister(void);
> +
> +#endif /* LTT_CORE_H */
> 
> -- 
> Mathieu Desnoyers
> OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC patch 02/41] LTTng - core data structures
  2009-03-05 22:47 ` [RFC patch 02/41] LTTng - core data structures Mathieu Desnoyers
@ 2009-03-06 18:41   ` Steven Rostedt
  0 siblings, 0 replies; 57+ messages in thread
From: Steven Rostedt @ 2009-03-06 18:41 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Linus Torvalds, Ingo Molnar, linux-kernel, Andrew Morton,
	ltt-dev, Peter Zijlstra, Frederic Weisbecker, Arjan van de Ven,
	Pekka Paalanen, Arnaldo Carvalho de Melo, H. Peter Anvin,
	Martin Bligh, Frank Ch. Eigler, Tom Zanussi, Masami Hiramatsu,
	KOSAKI Motohiro, Jason Baron, Christoph Hellwig, Jiaying Zhang,
	Eduard - Gabriel Munteanu, mrubin, md



On Thu, 5 Mar 2009, Mathieu Desnoyers wrote:

> Home of the traces data structures. Needs to be built into the kernel.
> 
> LTT heartbeat is a module specialized in firing periodic interrupts to
> record events in traces (so cycle counter rollover can be detected) and to
> update the 64-bit "synthetic TSC" (extended from the CPU's 32-bit TSC on MIPS).
> It also needs to be built into the kernel.

Why are patches 1 and 2 separate? They look like they should be a single 
patch. Patch 1 is pretty useless by itself, and this patch depends on patch 1.

-- Steve



> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> ---
>  MAINTAINERS              |    7 +++
>  include/linux/ltt-core.h |   10 ++++
>  ltt/ltt-core.c           |  101 +++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 118 insertions(+)
> 
> Index: linux-2.6-lttng/MAINTAINERS
> ===================================================================
> --- linux-2.6-lttng.orig/MAINTAINERS	2009-03-04 13:24:38.000000000 -0500
> +++ linux-2.6-lttng/MAINTAINERS	2009-03-04 13:24:59.000000000 -0500
> @@ -2766,6 +2766,13 @@ P:	Eric Piel
>  M:	eric.piel@tremplin-utc.net
>  S:	Maintained
>  
> +LINUX TRACE TOOLKIT NEXT GENERATION
> +P:	Mathieu Desnoyers
> +M:	mathieu.desnoyers@polymtl.ca
> +L:	ltt-dev@lttng.org
> +W:	http://ltt.polymtl.ca
> +S:	Maintained
> +
>  LM83 HARDWARE MONITOR DRIVER
>  P:	Jean Delvare
>  M:	khali@linux-fr.org
> Index: linux-2.6-lttng/ltt/ltt-core.c
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6-lttng/ltt/ltt-core.c	2009-03-04 13:36:17.000000000 -0500
> @@ -0,0 +1,101 @@
> +/*
> + * LTT core in-kernel infrastructure.
> + *
> + * Copyright 2006 - Mathieu Desnoyers mathieu.desnoyers@polymtl.ca
> + *
> + * Distributed under the GPL license
> + */
> +
> +#include <linux/ltt-core.h>
> +#include <linux/percpu.h>
> +#include <linux/module.h>
> +#include <linux/debugfs.h>
> +#include <linux/kref.h>
> +
> +/* Traces structures */
> +struct ltt_traces ltt_traces = {
> +	.setup_head = LIST_HEAD_INIT(ltt_traces.setup_head),
> +	.head = LIST_HEAD_INIT(ltt_traces.head),
> +};
> +EXPORT_SYMBOL(ltt_traces);
> +
> +/* Traces list writer locking */
> +static DEFINE_MUTEX(ltt_traces_mutex);
> +
> +/* root dentry mutex */
> +static DEFINE_MUTEX(ltt_root_mutex);
> +/* dentry of ltt's root dir */
> +static struct dentry *ltt_root_dentry;
> +static struct kref ltt_root_kref = {
> +	.refcount = ATOMIC_INIT(0),
> +};
> +
> +static void ltt_root_release(struct kref *ref)
> +{
> +	debugfs_remove(ltt_root_dentry);
> +	ltt_root_dentry = NULL;
> +}
> +
> +void put_ltt_root(void)
> +{
> +	mutex_lock(&ltt_root_mutex);
> +	if (ltt_root_dentry)
> +		kref_put(&ltt_root_kref, ltt_root_release);
> +	mutex_unlock(&ltt_root_mutex);
> +}
> +EXPORT_SYMBOL_GPL(put_ltt_root);
> +
> +struct dentry *get_ltt_root(void)
> +{
> +	mutex_lock(&ltt_root_mutex);
> +	if (!ltt_root_dentry) {
> +		ltt_root_dentry = debugfs_create_dir(LTT_ROOT, NULL);
> +		if (!ltt_root_dentry) {
> +			printk(KERN_ERR "LTT : create ltt root dir failed\n");
> +			goto out;
> +		}
> +		kref_init(&ltt_root_kref);
> +		goto out;
> +	}
> +	kref_get(&ltt_root_kref);
> +out:
> +	mutex_unlock(&ltt_root_mutex);
> +	return ltt_root_dentry;
> +}
> +EXPORT_SYMBOL_GPL(get_ltt_root);
> +
> +void ltt_lock_traces(void)
> +{
> +	mutex_lock(&ltt_traces_mutex);
> +}
> +EXPORT_SYMBOL_GPL(ltt_lock_traces);
> +
> +void ltt_unlock_traces(void)
> +{
> +	mutex_unlock(&ltt_traces_mutex);
> +}
> +EXPORT_SYMBOL_GPL(ltt_unlock_traces);
> +
> +DEFINE_PER_CPU(unsigned int, ltt_nesting);
> +EXPORT_PER_CPU_SYMBOL(ltt_nesting);
> +
> +int ltt_run_filter_default(void *trace, uint16_t eID)
> +{
> +	return 1;
> +}
> +
> +/* This function pointer is protected by a trace activation check */
> +ltt_run_filter_functor ltt_run_filter = ltt_run_filter_default;
> +EXPORT_SYMBOL_GPL(ltt_run_filter);
> +
> +void ltt_filter_register(ltt_run_filter_functor func)
> +{
> +	ltt_run_filter = func;
> +}
> +EXPORT_SYMBOL_GPL(ltt_filter_register);
> +
> +void ltt_filter_unregister(void)
> +{
> +	ltt_run_filter = ltt_run_filter_default;
> +}
> +EXPORT_SYMBOL_GPL(ltt_filter_unregister);
> 
> -- 
> Mathieu Desnoyers
> OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9
  2009-03-06 18:34 ` Steven Rostedt
@ 2009-03-06 19:01   ` Frederic Weisbecker
  0 siblings, 0 replies; 57+ messages in thread
From: Frederic Weisbecker @ 2009-03-06 19:01 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, Linus Torvalds, Ingo Molnar, linux-kernel,
	Andrew Morton, ltt-dev, Peter Zijlstra, Arjan van de Ven,
	Pekka Paalanen, Arnaldo Carvalho de Melo, H. Peter Anvin,
	Martin Bligh, Frank Ch. Eigler, Tom Zanussi, Masami Hiramatsu,
	KOSAKI Motohiro, Jason Baron, Christoph Hellwig, Jiaying Zhang,
	Eduard - Gabriel Munteanu, mrubin, md

On Fri, Mar 06, 2009 at 01:34:43PM -0500, Steven Rostedt wrote:
> 
> Hi Mathieu,
> 
> Thanks for posting this. But it might be better to post in much smaller 
> chunks. Let's work out the little things first. Posting 41 patches is a 
> bit overwhelming. It took me a few hours to look at them all, and when I 
> got to the end, I had forgotten what was at the beginning.
> 
> There are also minor changes to core kernel infrastructure code: seq_file, 
> exporting functions, and such. These really need to be packaged 
> separately and sent to the proper maintainers. Having them in a patch 
> bomb does not give them the proper focus that they need.
> 


Yes, I must confess I tried to review some of them, but I was discouraged
by the high volume and the multiple subjects that come with it.

Iterating with smaller, more focused topics at a time would help us bring
this work the attention it deserves.

Frederic.

 
> On Thu, 5 Mar 2009, Mathieu Desnoyers wrote:
> 
> > Hi,
> > 
> > I spent the last 4-5 months working with the Fujitsu team on implementing the
> > tracer elements identified as goals at Kernel Summit 2008 and at the following
> > Plumber Conference. My idea was to incrementally adapt the LTTng tracer,
> > currently used in the industry and well tested, to those requirements.
> 
> We really need to work together on this too. The biggest requirement that 
> came out of that conference was to have a "single unified buffering 
> system". And this was discussed quite heavily on LKML afterwards. All 
> development was done incrementally and publicly.
> > 
> > I spent the last days rearranging/folding/inspecting the LTTng patchset
> > to prepare it for an LKML post. The version 0.105 in the LTTng git tree
> > corresponds to the patchset I am posting here. Said patchset will
> > only include the core features of LTTng, excluding the timestamping
> > infrastructure (trace clock) and excluding the instrumentation.
> > 
> > The corresponding git tree also contains the trace clock patches and the lttng
> > instrumentation. The trace clock is required to use the tracer, but it can be
> > used without the instrumentation: kprobes and userspace event support
> > is already included in this patchset.
> > 
> > This tracer exports binary data through buffers using splice(). The resulting
> > binary files can be parsed from userspace because the format string metadata is
> > exported in the files. The event set can be enhanced by adding tracepoints to
> > the kernel code and by creating probe modules, which connect callbacks to the
> > tracepoints and contain the format string metainformation. Those callbacks are
> > responsible for writing the data in the trace buffers. This separation between
> > the trace buffer format string and the tracepoints is done on purpose so the
> > core kernel instrumentation (tracepoints) is not exported to userspace, which
> > will make maintenance much easier.
> 
> I've discussed this in my previous email. There is still a separation between
> the TRACE_EVENT_FORMAT and the maintainer's code. The format string is a 
> "hint" only and may change without notice. LTTng could use it or ignore 
> it; it is up to the tracer to actually export that string. ftrace chose to 
> export it because it was a simple way to extract that information. My 
> utility will need to do a bit more work when the events get more complex, 
> but the way it is set up, we can do that on a case-by-case basis.
> 
> 
> > 
> > The tree including the trace clock patches is available at :
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/compudj/linux-2.6-lttng.git
> > branch : 2.6.29-rc7-lttng-0.105
> > 
> > Project website : http://www.lttng.org/
> > 
> > Information about how to install and use the tracer is available at :
> > 
> > http://ltt.polymtl.ca/svn/trunk/lttv/LTTngManual.html
> > 
> > The size of the LTTng core patchset is 41 patches. The diffstat details
> > as follow :
> 
> 
> Again, this is overwhelming. This needs to be broken up into small 
> subsets that can be examined piece by piece.
> 
> > 
> >  include/linux/ltt-core.h                                  |   35 
> >  include/linux/ltt-relay.h                                 |  161 +
> >  include/linux/ltt-tracer.h                                |   43 
> >  include/linux/marker.h                                    |  121 
> >  kernel/marker.c                                           |  353 ++
> >  kernel/module.c                                           |   31 
> >  linux-2.6-lttng/Documentation/markers.txt                 |   17 
> >  linux-2.6-lttng/MAINTAINERS                               |    7 
> >  linux-2.6-lttng/Makefile                                  |    2 
> >  linux-2.6-lttng/arch/powerpc/kernel/traps.c               |    5 
> >  linux-2.6-lttng/arch/powerpc/platforms/cell/spufs/spufs.h |    6 
> >  linux-2.6-lttng/arch/sparc/Makefile                       |    2 
> >  linux-2.6-lttng/arch/x86/kernel/dumpstack.c               |    5 
> >  linux-2.6-lttng/arch/x86/mm/fault.c                       |    1 
> >  linux-2.6-lttng/fs/ext4/fsync.c                           |    8 
> >  linux-2.6-lttng/fs/ext4/ialloc.c                          |   17 
> >  linux-2.6-lttng/fs/ext4/inode.c                           |   79 
> >  linux-2.6-lttng/fs/ext4/mballoc.c                         |   71 
> >  linux-2.6-lttng/fs/ext4/mballoc.h                         |    2 
> >  linux-2.6-lttng/fs/ext4/super.c                           |    6 
> >  linux-2.6-lttng/fs/jbd2/checkpoint.c                      |    7 
> >  linux-2.6-lttng/fs/jbd2/commit.c                          |   12 
> >  linux-2.6-lttng/fs/pipe.c                                 |    5 
> >  linux-2.6-lttng/fs/select.c                               |   41 
> >  linux-2.6-lttng/fs/seq_file.c                             |   45 
> >  linux-2.6-lttng/fs/splice.c                               |    1 
> 
> There is a lot of code above that needs to be in its own patch series.
> Maintainers do not have the time to pick through 41 patches to find out 
> which patch might deal with their code.
> 
> Thanks,
> 
> -- Steve
> 
> 
> >  linux-2.6-lttng/include/linux/immediate.h                 |   94 
> >  linux-2.6-lttng/include/linux/kvm_host.h                  |   12 
> >  linux-2.6-lttng/include/linux/ltt-channels.h              |   94 
> >  linux-2.6-lttng/include/linux/ltt-core.h                  |   47 
> >  linux-2.6-lttng/include/linux/ltt-relay.h                 |  186 +
> >  linux-2.6-lttng/include/linux/ltt-tracer.h                |  731 ++++++
> >  linux-2.6-lttng/include/linux/ltt-type-serializer.h       |  107 
> >  linux-2.6-lttng/include/linux/marker.h                    |   16 
> >  linux-2.6-lttng/include/linux/module.h                    |    6 
> >  linux-2.6-lttng/include/linux/poll.h                      |    2 
> >  linux-2.6-lttng/include/linux/seq_file.h                  |   20 
> >  linux-2.6-lttng/include/trace/ext4.h                      |  129 +
> >  linux-2.6-lttng/include/trace/jbd2.h                      |   19 
> >  linux-2.6-lttng/init/Kconfig                              |    2 
> >  linux-2.6-lttng/kernel/kallsyms.c                         |    1 
> >  linux-2.6-lttng/kernel/marker.c                           |   12 
> >  linux-2.6-lttng/kernel/module.c                           |   32 
> >  linux-2.6-lttng/ltt/Kconfig                               |  130 +
> >  linux-2.6-lttng/ltt/Makefile                              |   15 
> >  linux-2.6-lttng/ltt/ltt-channels.c                        |  338 ++
> >  linux-2.6-lttng/ltt/ltt-core.c                            |  101 
> >  linux-2.6-lttng/ltt/ltt-filter.c                          |   66 
> >  linux-2.6-lttng/ltt/ltt-kprobes.c                         |  479 +++
> >  linux-2.6-lttng/ltt/ltt-marker-control.c                  |  265 ++
> >  linux-2.6-lttng/ltt/ltt-relay-alloc.c                     |  715 +++++
> >  linux-2.6-lttng/ltt/ltt-relay-locked.c                    | 1704 ++++++++++++++
> >  linux-2.6-lttng/ltt/ltt-serialize.c                       |  685 +++++
> >  linux-2.6-lttng/ltt/ltt-trace-control.c                   | 1061 ++++++++
> >  linux-2.6-lttng/ltt/ltt-tracer.c                          | 1210 +++++++++
> >  linux-2.6-lttng/ltt/ltt-type-serializer.c                 |   96 
> >  linux-2.6-lttng/ltt/ltt-userspace-event.c                 |  131 +
> >  linux-2.6-lttng/samples/markers/Makefile                  |    2 
> >  linux-2.6-lttng/samples/markers/marker-example.c          |    4 
> >  linux-2.6-lttng/samples/markers/probe-example.c           |   10 
> >  linux-2.6-lttng/samples/markers/test-multi.c              |  120 
> >  linux-2.6-lttng/virt/kvm/kvm_trace.c                      |   12 
> >  ltt/Kconfig                                               |   24 
> >  ltt/Makefile                                              |    2 
> >  ltt/ltt-relay-alloc.c                                     |   80 
> >  65 files changed, 9445 insertions(+), 398 deletions(-)
> > 
> > 
> > Comments are welcome.
> > 
> > Mathieu
> > 
> > 
> > -- 
> > Mathieu Desnoyers
> > OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
> > 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9
  2009-03-06 10:11 ` [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Ingo Molnar
@ 2009-03-06 19:02   ` Mathieu Desnoyers
  2009-03-11 18:32     ` Ingo Molnar
  0 siblings, 1 reply; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-06 19:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, Andrew Morton, Steven Rostedt,
	ltt-dev, Peter Zijlstra, Frederic Weisbecker, Arjan van de Ven,
	Pekka Paalanen, Arnaldo Carvalho de Melo, H. Peter Anvin,
	Martin Bligh, Frank Ch. Eigler, Tom Zanussi, Masami Hiramatsu,
	KOSAKI Motohiro, Jason Baron, Christoph Hellwig, Jiaying Zhang,
	Eduard - Gabriel Munteanu, mrubin, md

* Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> 
> > Hi,
> > 
> > I spent the last 4-5 months working with the Fujitsu team on 
> > implementing the tracer elements identified as goals at Kernel 
> > Summit 2008 and at the following Plumber Conference. My idea 
> > was to incrementally adapt the LTTng tracer, currently used 
> > in the industry and well tested, to those requirements.
> > 
> > I spent the last days rearranging/folding/inspecting the LTTng 
> > patchset to prepare it for an LKML post. The version 0.105 in 
> > the LTTng git tree corresponds to the patchset I am posting 
> > here. Said patchset will only include the core features of 
> > LTTng, excluding the timestamping infrastructure (trace clock) 
> > and excluding the instrumentation.
> 
> I'd like to merge the good bits into the tracing tree. Looking 
> at the patches you submitted there's a lot of avoidable overlap 
> with existing tracing features either present upstream already 
> or queued up for v2.6.30 - and we need to work more on 
> eliminating that overlap.
> 
> I dont think there's much fundamental disagreement just 
> different implementations - so we should evaluate each of those 
> details one by one, iteratively.
> 
> The first step would be to split the patches up into three 
> logical buckets:
> 
>  - Unique features not present in the tracing infracture, in the 
>    event tracer or other tracing plugins - those should be 
>    structured as feature additions.
> 
>  - Features that you consider superior to existing tracing
>    features of the kernel. For those, please iterate on the
>    existing code with your enhancements - instead of a parallel 
>    implementation.
> 
>  - Items which offer nothing new and are not superior to 
>    existing features - those should probably be dropped. This 
>    too is a case-by-case thing.
> 
> Would you be interested in working with us on that? I know that 
> both Steve and I would be very much interested. If you have 
> time/interest to work on this, then we can go through each patch 
> one by one, categorize them, and map out the way to go.
> 

Hi Ingo,

Yes, I think an incremental inclusion is the way to go in the current
context. If we do it correctly, the resulting discussion will end up
putting the best features of both tracers into the merged one. The only
problem is that I have less time at the moment (I should be writing my
Ph.D. thesis full time), but I think it's very important at this stage
to interact with the kernel community so everyone can benefit from the
work done over the past years.

I guess that identifying the good parts of each tracer will be a first
step towards integration. If you want, I could start by replying to my
own patchset post and do an ftrace-LTTng comparison on each important
item.

I don't know how much time I'll be able to put into refactoring all this,
though: I just spent 4 years developing LTTng and making sure every nut
and bolt fits together fine. Hopefully we'll be able to keep the
modifications as lightweight and as iterative as possible.

> Let me give you a few examples of existing areas of overlap:
> 
> > The corresponding git tree contains also the trace clock 
> > patches and the lttng instrumentation. The trace clock is 
> > required to use the tracer, but it can be used without the 
> > instrumentation : there is already a kprobes and userspace 
> > event support included in this patchset.
> 
> The latest tracing tree includes kernel/tracing/trace_clock.c 
> which offers three trace clock variants, with different 
> performance/precision tradeoffs:
> 
>  trace_clock_local()   [ for pure CPU-local tracers with no idle 
>                          events. This is the fastest but least 
>                          coherent tracing clock. ]
> 
>  trace_clock()         [ intermediate, scalable clock with
>                          usable but imprecise global coherency. ]
> 
>  trace_clock_global()  [ globally serialized, coherent clock. 
>                          It is the slowest but most accurate variant. ]
> 
> Tracing plugins can pick their choice. (This is relatively new 
> code but you get the idea.)
> 

Hehe, this reminds me of the trace clock thread I started a few months
ago on LKML. So you guys took over that work? Nice :) Is it based on
the trace-clock patches I proposed back then? Ah, no. Well, I guess
we'll have to discuss this too. I agree on the
trace_clock_local/trace_clock/trace_clock_global interface, it looks
nice. The underlying implementation will have to be discussed, though.
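[Editor's sketch] The palette interface discussed above boils down to a plugin picking, at registration time, whichever clock variant matches its coherency needs. The function names below mirror the kernel/trace/trace_clock.c names quoted in this thread, but the bodies are userspace stand-ins (a fake counter) so the sketch is self-contained; the `tracer_plugin` struct is invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in time source so this sketch runs in userspace; the real
 * implementations live in kernel/trace/trace_clock.c. */
static uint64_t fake_ns;

/* fastest, CPU-local only, least coherent across CPUs */
static uint64_t trace_clock_local(void)  { return ++fake_ns; }
/* intermediate: scalable, usable but imprecise global coherency */
static uint64_t trace_clock(void)        { return ++fake_ns; }
/* globally serialized, fully coherent, slowest */
static uint64_t trace_clock_global(void) { return ++fake_ns; }

/* A plugin picks its clock variant once, at registration time. */
struct tracer_plugin {
	const char *name;
	uint64_t (*clock)(void);
};

static const struct tracer_plugin latency_tracer = {
	.name  = "latency",
	.clock = trace_clock_local,  /* per-CPU timestamps suffice here */
};
```

The point of the shape is that the buffering code only ever calls `plugin->clock()`, so the performance/precision tradeoff stays a per-plugin decision.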


> > This tracer exports binary data through buffers using 
> > splice(). The resulting binary files can be parsed from 
> > userspace because the format string metadata is exported in 
> > the files. The event set can be enhanced by adding tracepoints 
> > to the kernel code and by creating probe modules, which 
> > connects callbacks to the tracepoints and contain the format 
> > string metainformation. Those callbacks are responsible for 
> > writing the data in the trace buffers. This separation between 
> > the trace buffer format string and the tracepoints is done on 
> > purpose so the core kernel instrumentation (tracepoints) is 
> > not exported to userspace, which will make maintainance much 
> > easier.
> 
> A tracepoint format specification mechanism plus working (and 
> fast!) zero-copy splice() support of the ring-buffer exists in 
> the latest tracing tree already - as you are probably aware of 
> because you commented on those patches a few days ago.

Yep, I know. :)
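[Editor's sketch] The self-describing layout argued for in the quoted paragraph — format string metadata exported alongside the binary data — can be illustrated in miniature: a userspace reader decodes records using only an exported id-to-format mapping, with no compiled-in knowledge of the kernel structures that produced them. Every name and the record layout here are invented for the sketch; this is not LTTng's actual trace format.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Toy self-describing stream: the exported metadata maps an event id
 * to a format string, so the reader never needs the kernel headers. */
struct event_desc {
	uint16_t id;
	const char *fmt;    /* exactly two %d fields in this toy example */
};

static const struct event_desc metadata[] = {
	{ 0, "prev_pid %d next_pid %d" },
};

struct record {
	uint16_t id;
	int32_t field[2];
};

static int decode(const struct record *r, char *out, size_t len)
{
	for (size_t i = 0; i < sizeof(metadata) / sizeof(metadata[0]); i++)
		if (metadata[i].id == r->id)
			return snprintf(out, len, metadata[i].fmt,
					r->field[0], r->field[1]);
	return -1;          /* unknown event id: metadata is incomplete */
}
```

A real reader would of course parse the field types out of the metadata rather than hard-code two integers, but the decoupling is the same.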

> 
> There are 3 good ways to go from here regarding the trace 
> buffering and splice code:
> 
>   1- we end up switching to the lttng version in essence
>   2- we end up keeping the tracing tree version
>   3- we end up somewhere inbetween
> 
> Which point in the above spectrum we will settle down on depends 
> on the technical details.
> 
> Note, whichever path we choose a gradual, iterative workflow is 
> still needed, so that we improve the existing upstream code with
> lttng enhancements gradually.
> 
> This approach works for all your other patches as well. A 
> direct, constructive comparison and active work on unifying them 
> is required.
> 

Yes, let's try to do it. Maybe it's better to start a new thread with
fewer CCs for this type of work?

Mathieu

> Thanks,
> 
> 	Ingo

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9
  2009-03-06 19:02   ` Mathieu Desnoyers
@ 2009-03-11 18:32     ` Ingo Molnar
  2009-03-13 16:18       ` Mathieu Desnoyers
  0 siblings, 1 reply; 57+ messages in thread
From: Ingo Molnar @ 2009-03-11 18:32 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Linus Torvalds, linux-kernel, Andrew Morton, Steven Rostedt,
	ltt-dev, Peter Zijlstra, Frederic Weisbecker, Arjan van de Ven,
	Pekka Paalanen, Arnaldo Carvalho de Melo, H. Peter Anvin,
	Martin Bligh, Frank Ch. Eigler, Tom Zanussi, Masami Hiramatsu,
	KOSAKI Motohiro, Jason Baron, Christoph Hellwig, Jiaying Zhang,
	Eduard - Gabriel Munteanu, mrubin, md


* Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:

> > Let me give you a few examples of existing areas of overlap:
> > 
> > > The corresponding git tree contains also the trace clock 
> > > patches and the lttng instrumentation. The trace clock is 
> > > required to use the tracer, but it can be used without the 
> > > instrumentation : there is already a kprobes and userspace 
> > > event support included in this patchset.
> > 
> > The latest tracing tree includes 
> > kernel/tracing/trace_clock.c which offers three trace clock 
> > variants, with different performance/precision tradeoffs:
> > 
> >  trace_clock_local()   [ for pure CPU-local tracers with no idle 
> >                          events. This is the fastest but least 
> >                          coherent tracing clock. ]
> > 
> >  trace_clock()         [ intermediate, scalable clock with
> >                          usable but imprecise global coherency. ]
> > 
> >  trace_clock_global()  [ globally serialized, coherent clock. 
> >                          It is the slowest but most accurate variant. ]
> > 
> > Tracing plugins can pick their choice. (This is relatively new 
> > code but you get the idea.)
> > 
> 
> Hehe this reminds me of the trace clock thread I started a few 
> months ago on LKML. So you guys took over that work ? Nice :) 
> Is it based on the trace-clock patches I proposed back then ? 
> Ah, no. Well I guess we'll have to discuss this too. I agree 
> on the trace_clock_local/trace_clock/trace_clock_global 
> interface, it looks nice. The underlying implementation will 
> have to be discussed though.

Beware: i found the assembly trace_clock() stuff you did back 
then rather ugly ;-) I dont think there's any easy solutions 
here, so i went for this palette of clocks.

> > This approach works for all your other patches as well. A 
> > direct, constructive comparison and active work on unifying 
> > them is required.
> 
> Yes, let's try to do it. Maybe it's better to start a new 
> thread with less CCs for this type of work ?

Yeah. More fine-grained steps are really needed.

The least controversial bits would be the many tracepoints you 
identified in LTTng as interesting. Mind sending them separately 
so that we can make some progress?

In the latest tracing code all tracepoints will show up 
automatically under /debug/tracing/events/ and can be used by 
user-space tools.

	Ingo


* Re: [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9
  2009-03-11 18:32     ` Ingo Molnar
@ 2009-03-13 16:18       ` Mathieu Desnoyers
  2009-03-14 16:43         ` Ingo Molnar
  0 siblings, 1 reply; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-13 16:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, Andrew Morton, Steven Rostedt,
	ltt-dev, Peter Zijlstra, Frederic Weisbecker, Arjan van de Ven,
	Pekka Paalanen, Arnaldo Carvalho de Melo, H. Peter Anvin,
	Martin Bligh, Frank Ch. Eigler, Tom Zanussi, Masami Hiramatsu,
	KOSAKI Motohiro, Jason Baron, Christoph Hellwig, Jiaying Zhang,
	Eduard - Gabriel Munteanu, mrubin, md

* Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> 
> > > Let me give you a few examples of existing areas of overlap:
> > > 
> > > > The corresponding git tree contains also the trace clock 
> > > > patches and the lttng instrumentation. The trace clock is 
> > > > required to use the tracer, but it can be used without the 
> > > > instrumentation : there is already a kprobes and userspace 
> > > > event support included in this patchset.
> > > 
> > > The latest tracing tree includes 
> > > kernel/tracing/trace_clock.c which offers three trace clock 
> > > variants, with different performance/precision tradeoffs:
> > > 
> > >  trace_clock_local()   [ for pure CPU-local tracers with no idle 
> > >                          events. This is the fastest but least 
> > >                          coherent tracing clock. ]
> > > 
> > >  trace_clock()         [ intermediate, scalable clock with
> > >                          usable but imprecise global coherency. ]
> > > 
> > >  trace_clock_global()  [ globally serialized, coherent clock. 
> > >                          It is the slowest but most accurate variant. ]
> > > 
> > > Tracing plugins can pick their choice. (This is relatively new 
> > > code but you get the idea.)
> > > 
> > 
> > Hehe this reminds me of the trace clock thread I started a few 
> > months ago on LKML. So you guys took over that work ? Nice :) 
> > Is it based on the trace-clock patches I proposed back then ? 
> > Ah, no. Well I guess we'll have to discuss this too. I agree 
> > on the trace_clock_local/trace_clock/trace_clock_global 
> > interface, it looks nice. The underlying implementation will 
> > have to be discussed though.
> 
> Beware: i found the assembly trace_clock() stuff you did back 
> then rather ugly ;-) I dont think there's any easy solutions 
> here, so i went for this palette of clocks.
> 

Hi Ingo,

I agree that a palette of clocks is needed to fit all use cases. I wonder
what exactly you found ugly in the approach I took with my trace_clock()
implementation? Maybe you could refresh my memory, as I do not recall
writing any part of it in assembly...? But this is a whole different
topic. We can discuss this later.


> > > This approach works for all your other patches as well. A 
> > > direct, constructive comparison and active work on unifying 
> > > them is required.
> > 
> > Yes, let's try to do it. Maybe it's better to start a new 
> > thread with less CCs for this type of work ?
> 
> Yeah. More finegrained steps are really needed.
> 
> The least controversial bits would be the many tracepoints you 
> identified in LTTng as interesting. Mind sending them separately 
> so that we can make some progress?
> 

OK, I'll work on it. Note however that I flipped my patchset around in
the past months, thinking that the tracer would be easier to get accepted
than the tracepoints. And now we are back at square one. Is it just me,
or do I have the funny feeling of acting like a dog running in circles
after its tail? :)


> In the latest tracing code all tracepoints will show up 
> automatically under /debug/tracing/events/ and can be used by 
> user-space tools.
> 

Hrm, the thing is: I strongly disagree with showing tracepoints to
userspace and with the fact that you embed the data serialization
"pseudo-callbacks" into the tracepoint headers. Here is why. Peter
Zijlstra convinced me that putting format strings directly in tracepoint
headers was a bad idea. First off, you end up requiring all tracers
which connect to the tracepoints to adopt your event format description
if they ever want to benefit from it. It's an "all included" formula:
either the tracers use it, or they cannot output "standard" trace
information.

Second point: the tracepoints are meant to be tied to the kernel
source. Putting those event descriptions in global headers means that
the people responsible for writing the kernel code surrounding the
tracepoints will end up being responsible for updating those tracepoint
event format descriptions. I think this is an unacceptable maintenance
burden for the whole community. Only tracer-specific modules should
refuse to build whenever they no longer match the inner kernel
structures.

Third point: it's plainly ugly. If we look at your tracepoint example:


/*
 * Tracepoint for task switches, performed by the scheduler:
 *
 * (NOTE: the 'rq' argument is not used by generic trace events,
 *        but used by the latency tracer plugin. )
 */
TRACE_EVENT(sched_switch,

        TP_PROTO(struct rq *rq, struct task_struct *prev,
                 struct task_struct *next),

        TP_ARGS(rq, prev, next),

        TP_STRUCT__entry(
                __array(        char,   prev_comm,      TASK_COMM_LEN   )
                __field(        pid_t,  prev_pid                        )
                __field(        int,    prev_prio                       )
                __array(        char,   next_comm,      TASK_COMM_LEN   )
                __field(        pid_t,  next_pid                        )
                __field(        int,    next_prio                       )
        ),

        TP_printk("task %s:%d [%d] ==> %s:%d [%d]",
                __entry->prev_comm, __entry->prev_pid, __entry->prev_prio,
                __entry->next_comm, __entry->next_pid, __entry->next_prio),

        TP_fast_assign(
                memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
                __entry->prev_pid       = prev->pid;
                __entry->prev_prio      = prev->prio;
                memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN);
                __entry->next_pid       = next->pid;
                __entry->next_prio      = next->prio;
        )
);


I notice that you actually embed the "function" that converts the event
data into the format string within a header macro declaration. Why don't
we write this in plain C?

in include/trace/sched.h :

DECLARE_TRACE(sched_switch,
        TPPROTO(struct rq *rq, struct task_struct *prev,
                struct task_struct *next),
        TPARGS(rq, prev, next));

in ltt/probes/kernel-trace.c :


void probe_sched_switch(struct rq *rq, struct task_struct *prev,
                struct task_struct *next);

DEFINE_MARKER_TP(kernel, sched_schedule, sched_switch, probe_sched_switch,
        "prev_pid %d next_pid %d prev_state #2d%ld");

notrace void probe_sched_switch(struct rq *rq, struct task_struct *prev,
                struct task_struct *next)
{
        struct marker *marker;
        struct serialize_int_int_short data;

        data.f1 = prev->pid;
        data.f2 = next->pid;
        data.f3 = prev->state;

        marker = &GET_MARKER(kernel, sched_schedule);
        ltt_specialized_trace(marker, marker->single.probe_private,
                &data, serialize_sizeof(data), sizeof(int));
}

This way, if the content of task_struct ever changes, only the tracer
module will break, not code touching a global header.

Mathieu

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9
  2009-03-13 16:18       ` Mathieu Desnoyers
@ 2009-03-14 16:43         ` Ingo Molnar
  2009-03-14 16:59           ` [ltt-dev] " Mathieu Desnoyers
  0 siblings, 1 reply; 57+ messages in thread
From: Ingo Molnar @ 2009-03-14 16:43 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Linus Torvalds, linux-kernel, Andrew Morton, Steven Rostedt,
	ltt-dev, Peter Zijlstra, Frederic Weisbecker, Arjan van de Ven,
	Pekka Paalanen, Arnaldo Carvalho de Melo, H. Peter Anvin,
	Martin Bligh, Frank Ch. Eigler, Tom Zanussi, Masami Hiramatsu,
	KOSAKI Motohiro, Jason Baron, Christoph Hellwig, Jiaying Zhang,
	Eduard - Gabriel Munteanu, mrubin, md


* Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:

> * Ingo Molnar (mingo@elte.hu) wrote:
> > 
> > * Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> > 
> > > > Let me give you a few examples of existing areas of overlap:
> > > > 
> > > > > The corresponding git tree contains also the trace clock 
> > > > > patches and the lttng instrumentation. The trace clock is 
> > > > > required to use the tracer, but it can be used without the 
> > > > > instrumentation : there is already a kprobes and userspace 
> > > > > event support included in this patchset.
> > > > 
> > > > The latest tracing tree includes 
> > > > kernel/tracing/trace_clock.c which offers three trace clock 
> > > > variants, with different performance/precision tradeoffs:
> > > > 
> > > >  trace_clock_local()   [ for pure CPU-local tracers with no idle 
> > > >                          events. This is the fastest but least 
> > > >                          coherent tracing clock. ]
> > > > 
> > > >  trace_clock()         [ intermediate, scalable clock with
> > > >                          usable but imprecise global coherency. ]
> > > > 
> > > >  trace_clock_global()  [ globally serialized, coherent clock. 
> > > >                          It is the slowest but most accurate variant. ]
> > > > 
> > > > Tracing plugins can pick their choice. (This is relatively new 
> > > > code but you get the idea.)
> > > > 
> > > 
> > > Hehe this reminds me of the trace clock thread I started a few 
> > > months ago on LKML. So you guys took over that work ? Nice :) 
> > > Is it based on the trace-clock patches I proposed back then ? 
> > > Ah, no. Well I guess we'll have to discuss this too. I agree 
> > > on the trace_clock_local/trace_clock/trace_clock_global 
> > > interface, it looks nice. The underlying implementation will 
> > > have to be discussed though.
> > 
> > Beware: i found the assembly trace_clock() stuff you did back 
> > then rather ugly ;-) I dont think there's any easy solutions 
> > here, so i went for this palette of clocks.
> > 
> 
> Hi Ingo,
> 
> I agree for the palette of clocks to fit all needs. I wonder 
> what exactly you found ugly in the approach I took with my 
> trace_clock() implementation ? Maybe you could refresh my 
> memory, I do not recall writing any part of it in assembly.. ? 
> But this is a whole different topic. We can discuss this 
> later.

hm, it was months ago. Ok, it must have been this one:

 http://lkml.org/lkml/2008/11/7/21
 http://lkml.org/lkml/2008/11/7/23

indeed no assembly but almost ;-) What i found rather ugly were 
the cnt32_to_63() complications.

	Ingo


* Re: [ltt-dev] [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9
  2009-03-14 16:43         ` Ingo Molnar
@ 2009-03-14 16:59           ` Mathieu Desnoyers
  0 siblings, 0 replies; 57+ messages in thread
From: Mathieu Desnoyers @ 2009-03-14 16:59 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: mrubin, Peter Zijlstra, Frederic Weisbecker, Pekka Paalanen,
	H. Peter Anvin, md, Tom Zanussi, Christoph Hellwig,
	Frank Ch. Eigler, ltt-dev, Eduard - Gabriel Munteanu,
	Steven Rostedt, Arnaldo Carvalho de Melo, Arjan van de Ven,
	linux-kernel, Martin Bligh, Andrew Morton, Linus Torvalds

* Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> 
> > * Ingo Molnar (mingo@elte.hu) wrote:
> > > 
> > > * Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> > > 
> > > > > Let me give you a few examples of existing areas of overlap:
> > > > > 
> > > > > > The corresponding git tree contains also the trace clock 
> > > > > > patches and the lttng instrumentation. The trace clock is 
> > > > > > required to use the tracer, but it can be used without the 
> > > > > > instrumentation : there is already a kprobes and userspace 
> > > > > > event support included in this patchset.
> > > > > 
> > > > > The latest tracing tree includes 
> > > > > kernel/tracing/trace_clock.c which offers three trace clock 
> > > > > variants, with different performance/precision tradeoffs:
> > > > > 
> > > > >  trace_clock_local()   [ for pure CPU-local tracers with no idle 
> > > > >                          events. This is the fastest but least 
> > > > >                          coherent tracing clock. ]
> > > > > 
> > > > >  trace_clock()         [ intermediate, scalable clock with
> > > > >                          usable but imprecise global coherency. ]
> > > > > 
> > > > >  trace_clock_global()  [ globally serialized, coherent clock. 
> > > > >                          It is the slowest but most accurate variant. ]
> > > > > 
> > > > > Tracing plugins can pick their choice. (This is relatively new 
> > > > > code but you get the idea.)
> > > > > 
> > > > 
> > > > Hehe this reminds me of the trace clock thread I started a few 
> > > > months ago on LKML. So you guys took over that work ? Nice :) 
> > > > Is it based on the trace-clock patches I proposed back then ? 
> > > > Ah, no. Well I guess we'll have to discuss this too. I agree 
> > > > on the trace_clock_local/trace_clock/trace_clock_global 
> > > > interface, it looks nice. The underlying implementation will 
> > > > have to be discussed though.
> > > 
> > > Beware: i found the assembly trace_clock() stuff you did back 
> > > then rather ugly ;-) I dont think there's any easy solutions 
> > > here, so i went for this palette of clocks.
> > > 
> > 
> > Hi Ingo,
> > 
> > I agree for the palette of clocks to fit all needs. I wonder 
> > what exactly you found ugly in the approach I took with my 
> > trace_clock() implementation ? Maybe you could refresh my 
> > memory, I do not recall writing any part of it in assembly.. ? 
> > But this is a whole different topic. We can discuss this 
> > later.
> 
> hm, it was months ago. Ok, it must have been this one:
> 
>  http://lkml.org/lkml/2008/11/7/21
>  http://lkml.org/lkml/2008/11/7/23
> 
> indeed no assembly but almost ;-) What i found rather ugly were 
> the cnt32_to_63() complications.
> 

The fact that I posted a patch touching cnt32_to_63 back then was just a
way to point out that the current cnt32_to_63 implementation is broken
on SMP and should stay in UP-only architecture-specific code (that was
an answer to Peter Zijlstra's reuse concerns). Once I got agreement that
tracers should not be expected to use cnt32_to_63, I dropped any patch
touching this piece of infrastructure and stayed with my
trace-clock-32-to-64.c implementation, which is SMP-safe, scalable, and
atomically extends (through an RCU-like algorithm) an N-bit clock to a
full 64-bit clock. This is very, very useful for lots of architectures.
Is it that code you find ugly?
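[Editor's sketch] The core idea behind such a 32-to-64-bit extension — detect wrap-around of the narrow hardware counter and accumulate it into synthetic upper bits — can be sketched as below. This is a deliberately simplified, single-reader version: it is NOT the SMP-safe, RCU-like algorithm in trace-clock-32-to-64.c, just the underlying arithmetic, and it assumes the counter is read at least once per wrap period.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified sketch: extend a wrapping 32-bit hardware counter to
 * 64 bits by detecting wrap-around between successive reads.
 * Single-reader only; the real implementation publishes the upper
 * bits with an RCU-like scheme so concurrent readers stay coherent. */
struct clock_ext {
	uint32_t last_lo;  /* last 32-bit value observed */
	uint64_t hi;       /* accumulated wrap count, pre-shifted */
};

static uint64_t extend_32_to_64(struct clock_ext *c, uint32_t hw)
{
	if (hw < c->last_lo)            /* counter wrapped since last read */
		c->hi += (uint64_t)1 << 32;
	c->last_lo = hw;
	return c->hi | hw;
}
```

The hard part the sketch omits is exactly what the thread argues about: making the `hi`/`last_lo` update safe and scalable when many CPUs read the clock concurrently.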

Mathieu

> 	Ingo
> 
> _______________________________________________
> ltt-dev mailing list
> ltt-dev@lists.casi.polymtl.ca
> http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


end of thread, other threads:[~2009-03-14 16:59 UTC | newest]

Thread overview: 57+ messages
2009-03-05 22:47 [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Mathieu Desnoyers
2009-03-05 22:47 ` [RFC patch 01/41] LTTng - core header Mathieu Desnoyers
2009-03-06 18:37   ` Steven Rostedt
2009-03-05 22:47 ` [RFC patch 02/41] LTTng - core data structures Mathieu Desnoyers
2009-03-06 18:41   ` Steven Rostedt
2009-03-05 22:47 ` [RFC patch 03/41] LTTng core x86 Mathieu Desnoyers
2009-03-05 22:47 ` [RFC patch 04/41] LTTng core powerpc Mathieu Desnoyers
2009-03-05 22:47 ` [RFC patch 05/41] LTTng relay buffer allocation, read, write Mathieu Desnoyers
2009-03-05 22:47 ` [RFC patch 06/41] LTTng optimize write to page function Mathieu Desnoyers
2009-03-05 22:47 ` [RFC patch 07/41] LTTng dynamic channels Mathieu Desnoyers
2009-03-05 22:47 ` [RFC patch 08/41] LTTng - tracer header Mathieu Desnoyers
2009-03-05 22:47 ` [RFC patch 09/41] LTTng optimize write to page function deal with unaligned access Mathieu Desnoyers
2009-03-05 22:47 ` [RFC patch 10/41] lttng-optimize-write-to-page-function-remove-some-memcpy-calls Mathieu Desnoyers
2009-03-05 22:47 ` [RFC patch 11/41] ltt-relay: cache pages address Mathieu Desnoyers
2009-03-05 22:47 ` [RFC patch 12/41] x86 : export vmalloc_sync_all() Mathieu Desnoyers
2009-03-05 22:47 ` [RFC patch 13/41] LTTng - tracer code Mathieu Desnoyers
2009-03-05 22:47 ` [RFC patch 14/41] Splice and pipe : export pipe buf operations for GPL modules Mathieu Desnoyers
2009-03-05 22:47 ` [RFC patch 15/41] Poll : add poll_wait_set_exclusive Mathieu Desnoyers
2009-03-05 22:47 ` [RFC patch 16/41] LTTng Transport Locked Mathieu Desnoyers
2009-03-05 22:47 ` [RFC patch 17/41] LTTng - serialization Mathieu Desnoyers
2009-03-05 22:47 ` [RFC patch 18/41] Seq_file add support for sorted list Mathieu Desnoyers
2009-03-05 22:47 ` [RFC patch 19/41] Sort module list by pointer address to get coherent sleepable seq_file iterators Mathieu Desnoyers
2009-03-05 22:47 ` [RFC patch 20/41] Linux Kernel Markers - Iterator Mathieu Desnoyers
2009-03-05 22:47 ` [RFC patch 21/41] LTTng probes specialized tracepoints Mathieu Desnoyers
2009-03-05 22:47 ` [RFC patch 22/41] LTTng marker control Mathieu Desnoyers
2009-03-05 22:47 ` [RFC patch 23/41] Immediate Values Stub header Mathieu Desnoyers
2009-03-05 22:47 ` [RFC patch 24/41] Linux Kernel Markers - Use Immediate Values Mathieu Desnoyers
2009-03-05 22:47 ` [RFC patch 25/41] Markers Support for Proprierary Modules Mathieu Desnoyers
2009-03-05 22:47 ` [RFC patch 26/41] Marers remove old comment Mathieu Desnoyers
2009-03-05 22:47 ` [RFC patch 27/41] Markers use dynamic channels Mathieu Desnoyers
2009-03-05 22:47 ` [RFC patch 28/41] LTT trace control Mathieu Desnoyers
2009-03-05 22:47 ` [RFC patch 29/41] LTTng menus Mathieu Desnoyers
2009-03-05 23:35   ` Randy Dunlap
2009-03-05 23:47     ` Mathieu Desnoyers
2009-03-05 23:51       ` Randy Dunlap
2009-03-06  0:01         ` [ltt-dev] " Mathieu Desnoyers
2009-03-06  0:12           ` Randy Dunlap
2009-03-05 22:47 ` [RFC patch 30/41] LTTng build Mathieu Desnoyers
2009-03-05 22:47 ` [RFC patch 31/41] LTTng userspace event v2 Mathieu Desnoyers
2009-03-05 22:48 ` [RFC patch 32/41] LTTng filter Mathieu Desnoyers
2009-03-05 22:48 ` [RFC patch 33/41] LTTng dynamic tracing support with kprobes Mathieu Desnoyers
2009-03-05 22:48 ` [RFC patch 34/41] Marker header API update Mathieu Desnoyers
2009-03-05 22:48 ` [RFC patch 35/41] Marker " Mathieu Desnoyers
2009-03-05 22:48 ` [RFC patch 36/41] kvm markers " Mathieu Desnoyers
2009-03-05 22:48 ` [RFC patch 37/41] Markers : multi-probes test Mathieu Desnoyers
2009-03-05 22:48 ` [RFC patch 38/41] Markers examples API update Mathieu Desnoyers
2009-03-05 22:48 ` [RFC patch 39/41] SPUFS markers " Mathieu Desnoyers
2009-03-05 22:48 ` [RFC patch 40/41] EXT4: instrumentation with tracepoints Mathieu Desnoyers
2009-03-05 22:48 ` [RFC patch 41/41] JBD2: use tracepoints for instrumentation Mathieu Desnoyers
2009-03-06 10:11 ` [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9 Ingo Molnar
2009-03-06 19:02   ` Mathieu Desnoyers
2009-03-11 18:32     ` Ingo Molnar
2009-03-13 16:18       ` Mathieu Desnoyers
2009-03-14 16:43         ` Ingo Molnar
2009-03-14 16:59           ` [ltt-dev] " Mathieu Desnoyers
2009-03-06 18:34 ` Steven Rostedt
2009-03-06 19:01   ` Frederic Weisbecker
