* [RFC PATCH 00/30 v3] mcount and latency tracing utility -v3
@ 2008-01-15 20:49 Steven Rostedt
  2008-01-15 20:49 ` [RFC PATCH 01/30 v3] Add basic support for gcc profiler instrumentation Steven Rostedt
                   ` (29 more replies)
  0 siblings, 30 replies; 35+ messages in thread
From: Steven Rostedt @ 2008-01-15 20:49 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Tim Bird,
	Sam Ravnborg, Frank Ch. Eigler, Jan Kiszka


[
  version 3 of mcount patches:

  changes include:

  Made it possible to register more than one function with mcount.
  If only one function is registered, it is called directly.
  If more than one is registered, a loop function is called that
  calls all registered functions.

  Added scheduler context switch tracing.

  Added preemption off tracing.

  Removed recording of the task's command line at each trace entry;
  it is now recorded in the scheduling trace, or at the time a new
  max preemption-off latency is hit.

  Renamed the irqs-off files to use preempt prefixes.

 Suggested by Jan Kiszka:
  cleaned up calls to mcount (following glibc more)

  always call mcount_trace_function directly from assembly

 Suggested by Sam Ravnborg:

   Created CONFIG_HAVE_MCOUNT to be selected by archs
   instead of each arch defining a new config option.

   Added time keeping fixes by John Stultz
]

All released versions of these patches can be found at:

   http://people.redhat.com/srostedt/tracing/


The following patch series brings to vanilla Linux a piece of the RT kernel
trace facility. It uses the "-pg" profiling option of gcc, which makes the
compiler insert a call to the "mcount" function at the entry of every
function in the kernel.

Note: I did investigate using -finstrument-functions, but that adds a call
to both the start and the end of a function, while mcount hooks only the
function entry. mcount alone adds ~13% overhead; -finstrument-functions
added ~19%.  It also forced me to play tricks with inline, because it
instruments inline functions as well.
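
As a rough, hand-written userspace illustration (not what gcc literally
emits, and not kernel code), -pg effectively turns every non-notrace
function into something like this:

   #include <stdio.h>

   /* stand-in for the per-arch assembly mcount stub */
   static void mcount(void)
   {
           puts("mcount hit");
   }

   static int add(int a, int b)
   {
           mcount();       /* the call that -pg conceptually inserts at entry */
           return a + b;
   }

   int main(void)
   {
           return add(1, 2) == 3 ? 0 : 1;
   }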

This patch series implements the code for x86 (32 and 64 bit), but
other archs can easily be implemented as well (note: ARM and PPC are
already implemented in -rt)

Some Background:
----------------

A while back, Ingo Molnar and William Lee Irwin III created a latency tracer
to find problem latency areas in the kernel for the RT patch.  This tracer
became an integral part of the RT kernel for finding where latency hot
spots were.  One of the features that the latency tracer added was a
function trace.  This function tracer would record all functions that
were called (implemented by the gcc "-pg" option) and would show what was
called when interrupts or preemption were turned off.

This feature is also very helpful in normal debugging. So there has been
talk of taking bits and pieces from the RT latency tracer and bringing them
to LKML. But no one had the time to do it.

Arnaldo Carvalho de Melo took a crack at it. He pulled out the mcount code
as well as part of the tracing code and made it generic with respect to
the tracing code.  I'm not sure why that effort stopped; probably because
Arnaldo is a very busy man, and his efforts had to be utilized elsewhere.

While I still maintain my own Logdev utility:

  http://rostedt.homelinux.com/logdev

I came across a need to use mcount with logdev too. I was successful,
but found that it became very dependent on a lot of code. One thing that
I liked about my logdev utility was that it was very non-intrusive, and has
been easy to port since the Linux 2.0 days. I did not want to burden the
logdev patch with the intrusiveness of mcount (it is not really that
intrusive; it just needs to add a "notrace" annotation to functions in the
kernel, which causes more conflicts for me when applying patches).

Being close to the holidays, I grabbed Arnaldo's old patches and started
massaging them into something that could be useful for logdev. What I
found out (after talking this over with Arnaldo too) is that this can
be much more useful for others as well.

The main thing I changed was that I made the mcount function itself
generic, with no dependency on the tracing code.  That is, I added

register_mcount_function()
 and
clear_mcount_function()

So whenever mcount is enabled and a function is registered, that function
is called from all functions in the kernel that are not labeled with the
"notrace" annotation.


The Simple Tracer:
------------------

To show the power of this, I also massaged the tracer code that Arnaldo
pulled from the RT patch into a nice example of what can be done
with it.

The function that is registered to mcount has the prototype:

 void func(unsigned long ip, unsigned long parent_ip);

The ip is the address of the function and parent_ip is the address of
the parent function that called it.

The x86_64 version has the assembly call the registered function directly
to save having to do a double function call.
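
For illustration, here is a minimal sketch of what a user of this interface
can look like, following the struct mcount_ops scheme introduced in patch 01
below (the my_* names are made up):

   #include <linux/cache.h>
   #include <linux/init.h>
   #include <linux/mcount.h>

   /* The callback (and everything it calls) must be marked notrace,
    * otherwise mcount will recurse into it. */
   static notrace void my_trace_func(unsigned long ip, unsigned long parent_ip)
   {
           /* record ip and parent_ip somewhere safe, e.g. a per-CPU buffer */
   }

   static struct mcount_ops my_trace_ops __read_mostly = {
           .func = my_trace_func,
   };

   static int __init my_tracer_init(void)
   {
           return register_mcount_function(&my_trace_ops);
   }
   device_initcall(my_tracer_init);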

To enable mcount, a sysctl is added:

   /proc/sys/kernel/mcount_enabled

Once mcount is enabled and a function is registered, that function will be
called from all traced functions. The tracer in this patch series shows how
this is done. It adds a directory in debugfs (/debugfs/tracing), with a ctrl
file that lets the user have the tracer register its function.  Note, the
order of enabling mcount and registering a function is not important, but
both must be done to initiate the tracing. Likewise, you can disable tracing
by either disabling mcount or by clearing the registered function.

Only one function may be registered at a time. If another function is
registered, it will simply override whatever was there previously.

Here's a simple example of the tracer output:

CPU 2: hackbench:11867 preempt_schedule+0xc/0x84 <-- avc_has_perm_noaudit+0x45d/0x52c
CPU 1: hackbench:12052 selinux_file_permission+0x10/0x11c <-- security_file_permission+0x16/0x18
CPU 3: hackbench:12017 update_curr+0xe/0x8b <-- put_prev_task_fair+0x24/0x4c
CPU 2: hackbench:11867 avc_audit+0x16/0x9e3 <-- avc_has_perm+0x51/0x63
CPU 0: hackbench:12019 socket_has_perm+0x16/0x7c <-- selinux_socket_sendmsg+0x27/0x3e
CPU 1: hackbench:12052 file_has_perm+0x16/0xbb <-- selinux_file_permission+0x104/0x11c

This is formatted as:

 CPU <CPU#>: <task-comm>:<task-pid> <function> <-- <parent-function>


Latency Tracer Format:
----------------------

The format used by the RT patch is a bit more complex. It is designed to
record a lot of information quickly and dump out a lot too.

There are two versions of the format: verbose and non-verbose.

verbose:

preemption latency trace v1.1.5 on 2.6.24-rc7-tst
--------------------------------------------------------------------
 latency: 89 us, #3/3, CPU#1 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4)
    -----------------
    | task: kjournald-600 (uid:0 nice:-5 policy:0 rt_prio:0)
    -----------------
 => started at: _spin_lock_irqsave+0x2a/0x63 <c06310d2>
 => ended at:   _spin_unlock_irqrestore+0x32/0x41 <c0631245>

       kjournald   600 1 1 00000000 00000000 [397408f1] 0.003ms (+0.079ms): _spin_lock_irqsave+0x2a/0x63 <c06310d2> (scsi_dispatch_cmd+0x155/0x234 [scsi_mod] <f8867c19>)
       kjournald   600 1 1 00000000 00000001 [39740940] 0.081ms (+0.005ms): _spin_unlock_irqrestore+0x32/0x41 <c0631245> (scsi_dispatch_cmd+0x1be/0x234 [scsi_mod] <f8867c82>)
       kjournald   600 1 1 00000000 00000002 [39740945] 0.087ms (+0.000ms): trace_hardirqs_on_caller+0x74/0x86 <c0508bdc> (_spin_unlock_irqrestore+0x32/0x41 <c0631245>)


non-verbose:

preemption latency trace v1.1.5 on 2.6.24-rc7-tst
--------------------------------------------------------------------
 latency: 89 us, #3/3, CPU#2 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4)
    -----------------
    | task: kjournald-600 (uid:0 nice:-5 policy:0 rt_prio:0)
    -----------------
 => started at: _spin_lock_irqsave+0x2a/0x63 <c06310d2>
 => ended at:   _spin_unlock_irqrestore+0x32/0x41 <c0631245>

                 _------=> CPU#            
                / _-----=> irqs-off        
               | / _----=> need-resched    
               || / _---=> hardirq/softirq 
               ||| / _--=> preempt-depth   
               |||| /                      
               |||||     delay             
   cmd     pid ||||| time  |   caller      
      \   /    |||||   \   |   /           
kjournal-600   1d...    3us+: _spin_lock_irqsave+0x2a/0x63 <c06310d2> (scsi_dispatch_cmd+0x155/0x234 [scsi_mod] <f8867c19>)
kjournal-600   1d...   81us+: _spin_unlock_irqrestore+0x32/0x41 <c0631245> (scsi_dispatch_cmd+0x1be/0x234 [scsi_mod] <f8867c82>)
kjournal-600   1d...   87us : trace_hardirqs_on_caller+0x74/0x86 <c0508bdc> (_spin_unlock_irqrestore+0x32/0x41 <c0631245>)


Debug FS:
---------

Although enabling and disabling mcount is done through the sysctl:

/proc/sys/kernel/mcount_enabled

the rest of the tracing control lives in debugfs:

/debugfs/tracing

Here are the available files:

fn_trace_ctrl
  echo 1 to this enables the function tracer (if mcount_enabled is set)
  echo 0 to disable it.

function_trace
  Outputs the function trace in latency_trace format.

preempt_fn_trace_ctrl
  echo 1 to enable function tracing during the critical section timings
  echo 0 to disable it

preempt_trace
  Outputs the critical section latency trace.

preempt_thresh
  echo a number (in usecs) into this file to record only those traces whose
  latency is greater than the threshold.

iter_ctrl
  echo "symonly" to hide the raw instruction pointers in the trace.
  echo "nosymonly" to turn symonly off.
  echo "verbose" for verbose output in the latency format.
  echo "noverbose" to disable verbose output.
  cat iter_ctrl to see the current settings.

preempt_max_latency
  Holds the current max critical latency.
  echo 0 to reset and start tracing.

trace
  Outputs the function trace in the simple format.
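
Putting it together, a typical session with these patches (using the files
above, assuming debugfs is mounted at /debugfs) might look like:

   # echo 1 > /proc/sys/kernel/mcount_enabled
   # echo 1 > /debugfs/tracing/fn_trace_ctrl
     ... run the workload ...
   # echo 0 > /debugfs/tracing/fn_trace_ctrl
   # cat /debugfs/tracing/trace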


Overhead:
---------

Note that having mcount compiled in seems to show a little overhead.

Here's 3 runs of hackbench 50 without the patches:
Time: 2.137
Time: 2.283
Time: 2.245

 Avg: 2.221

and here's 3 runs with the patches (without tracing on):
Time: 2.738
Time: 2.469
Time: 2.388

  Avg: 2.531

So the mcount patches add roughly 14% overhead here (2.531 vs 2.221,
according to hackbench), even without tracing on.

But full tracing can cause a bit more problems:

# hackbench 50
Time: 113.350

  113.350!!!!!

But this is tracing *every* function call!


Future:
-------
The way the mcount hook is done here, other utilities can easily add their
own functions. Care just needs to be taken not to call anything that is not
marked with notrace, or you will crash the box with recursion. But
even the simple tracer adds a "disabled" counter as a safety net, so that
if it does happen to call something that is not marked with notrace, the
recursion does not kill the box.
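
Extending the sketch from the simple-tracer section, that safety net in the
callback looks roughly like this (it mirrors what the simple tracer in this
series does; includes and the actual buffer handling are omitted, and the
names are made up):

   static atomic_t trace_disabled[NR_CPUS];

   static notrace void my_trace_func(unsigned long ip, unsigned long parent_ip)
   {
           unsigned long flags;
           int cpu;

           raw_local_irq_save(flags);
           cpu = raw_smp_processor_id();

           /* only record if this is not a recursive entry on this CPU */
           atomic_inc(&trace_disabled[cpu]);
           if (atomic_read(&trace_disabled[cpu]) == 1) {
                   /* record ip/parent_ip; anything called from here
                    * must itself be notrace */
           }
           atomic_dec(&trace_disabled[cpu]);

           raw_local_irq_restore(flags);
   }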

I was originally going to use the relay system to record the data, but
that had a chance of calling functions not marked with notrace. But if,
for example, LTTng wanted to use this, it could disable tracing on a CPU
when doing the calls, and this would protect it from recursion.

SystemTap:
----------
One thing that Arnaldo and I discussed last year was using systemtap to
add hooks into the kernel to start and stop tracing.  kprobes is too
heavy to use on all function calls, but it would be perfect for hooking
non-hot paths to start and stop the tracer.

So when debugging the kernel, instead of recompiling with printks
or other markers, you could simply use systemtap to place trace start
and stop points and trace the problem areas to see what is happening.


Latency Tracing:
----------------

We can also add trace points to record the time the highest priority task
needs to wait before running. This too is currently done in the RT patch.


These are just some of the ideas we have for this, and we are sure others
could come up with more.

These patches are for the underlying work. We'll see what happens next.






* [RFC PATCH 01/30 v3] Add basic support for gcc profiler instrumentation
  2008-01-15 20:49 [RFC PATCH 00/30 v3] mcount and latency tracing utility -v3 Steven Rostedt
@ 2008-01-15 20:49 ` Steven Rostedt
  2008-01-15 20:49 ` [RFC PATCH 02/30 v3] Annotate core code that should not be traced Steven Rostedt
                   ` (28 subsequent siblings)
  29 siblings, 0 replies; 35+ messages in thread
From: Steven Rostedt @ 2008-01-15 20:49 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Tim Bird,
	Sam Ravnborg, Frank Ch. Eigler, Jan Kiszka, Steven Rostedt

[-- Attachment #1: mcount-add-basic-support-for-gcc-profiler-instrum.patch --]
[-- Type: text/plain, Size: 12498 bytes --]

If CONFIG_MCOUNT is selected and /proc/sys/kernel/mcount_enabled is set to a
non-zero value, the mcount routine will be called every time we enter a kernel
function that is not marked with the "notrace" attribute.

The mcount routine will then call a registered function if a function
happens to be registered.

[This code has been highly hacked by Steven Rostedt, so don't
 blame Arnaldo for all of this ;-) ]

Update:
  It is now possible to register more than one mcount function.
  If only one mcount function is registered, that will be the
  function that mcount calls directly. If more than one function
  is registered, then mcount will call a function that will loop
  through the functions to call.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 Makefile                   |    3 
 arch/x86/Kconfig           |    1 
 arch/x86/kernel/entry_32.S |   22 +++++++
 arch/x86/kernel/entry_64.S |   33 ++++++++++
 include/linux/linkage.h    |    2 
 include/linux/mcount.h     |   38 ++++++++++++
 kernel/sysctl.c            |   11 +++
 lib/Kconfig.debug          |    2 
 lib/Makefile               |    2 
 lib/tracing/Kconfig        |   10 +++
 lib/tracing/Makefile       |    3 
 lib/tracing/mcount.c       |  141 +++++++++++++++++++++++++++++++++++++++++++++
 12 files changed, 268 insertions(+)

Index: linux-compile.git/Makefile
===================================================================
--- linux-compile.git.orig/Makefile	2008-01-15 12:49:53.000000000 -0500
+++ linux-compile.git/Makefile	2008-01-15 12:50:01.000000000 -0500
@@ -509,6 +509,9 @@ endif
 
 include $(srctree)/arch/$(SRCARCH)/Makefile
 
+ifdef CONFIG_MCOUNT
+KBUILD_CFLAGS	+= -pg
+endif
 ifdef CONFIG_FRAME_POINTER
 KBUILD_CFLAGS	+= -fno-omit-frame-pointer -fno-optimize-sibling-calls
 else
Index: linux-compile.git/arch/x86/Kconfig
===================================================================
--- linux-compile.git.orig/arch/x86/Kconfig	2008-01-15 12:49:53.000000000 -0500
+++ linux-compile.git/arch/x86/Kconfig	2008-01-15 12:50:01.000000000 -0500
@@ -19,6 +19,7 @@ config X86_64
 config X86
 	bool
 	default y
+	select HAVE_MCOUNT
 
 config GENERIC_TIME
 	bool
Index: linux-compile.git/arch/x86/kernel/entry_32.S
===================================================================
--- linux-compile.git.orig/arch/x86/kernel/entry_32.S	2008-01-15 12:49:53.000000000 -0500
+++ linux-compile.git/arch/x86/kernel/entry_32.S	2008-01-15 12:50:01.000000000 -0500
@@ -75,6 +75,28 @@ DF_MASK		= 0x00000400 
 NT_MASK		= 0x00004000
 VM_MASK		= 0x00020000
 
+#ifdef CONFIG_MCOUNT
+.globl mcount
+mcount:
+	cmpl $0, mcount_enabled
+	jz out
+
+	/* taken from glibc */
+	pushl %eax
+	pushl %ecx
+	pushl %edx
+	movl 0xc(%esp), %edx
+	movl 0x4(%ebp), %eax
+
+	call   *mcount_trace_function
+
+	popl %edx
+	popl %ecx
+	popl %eax
+out:
+	ret
+#endif
+
 #ifdef CONFIG_PREEMPT
 #define preempt_stop(clobbers)	DISABLE_INTERRUPTS(clobbers); TRACE_IRQS_OFF
 #else
Index: linux-compile.git/arch/x86/kernel/entry_64.S
===================================================================
--- linux-compile.git.orig/arch/x86/kernel/entry_64.S	2008-01-15 12:49:53.000000000 -0500
+++ linux-compile.git/arch/x86/kernel/entry_64.S	2008-01-15 12:50:01.000000000 -0500
@@ -53,6 +53,39 @@
 
 	.code64
 
+#ifdef CONFIG_MCOUNT
+
+ENTRY(mcount)
+	cmpl $0, mcount_enabled
+	jz out
+
+	/* taken from glibc */
+	subq $0x38, %rsp
+	movq %rax, (%rsp)
+	movq %rcx, 8(%rsp)
+	movq %rdx, 16(%rsp)
+	movq %rsi, 24(%rsp)
+	movq %rdi, 32(%rsp)
+	movq %r8, 40(%rsp)
+	movq %r9, 48(%rsp)
+
+	movq 0x38(%rsp), %rsi
+	movq 8(%rbp), %rdi
+
+	call   *mcount_trace_function
+
+	movq 48(%rsp), %r9
+	movq 40(%rsp), %r8
+	movq 32(%rsp), %rdi
+	movq 24(%rsp), %rsi
+	movq 16(%rsp), %rdx
+	movq 8(%rsp), %rcx
+	movq (%rsp), %rax
+	addq $0x38, %rsp
+out:
+	retq
+#endif
+
 #ifndef CONFIG_PREEMPT
 #define retint_kernel retint_restore_args
 #endif	
Index: linux-compile.git/include/linux/linkage.h
===================================================================
--- linux-compile.git.orig/include/linux/linkage.h	2008-01-15 12:49:53.000000000 -0500
+++ linux-compile.git/include/linux/linkage.h	2008-01-15 12:50:01.000000000 -0500
@@ -3,6 +3,8 @@
 
 #include <asm/linkage.h>
 
+#define notrace __attribute__((no_instrument_function))
+
 #ifdef __cplusplus
 #define CPP_ASMLINKAGE extern "C"
 #else
Index: linux-compile.git/include/linux/mcount.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-compile.git/include/linux/mcount.h	2008-01-15 12:50:57.000000000 -0500
@@ -0,0 +1,38 @@
+#ifndef _LINUX_MCOUNT_H
+#define _LINUX_MCOUNT_H
+
+#ifdef CONFIG_MCOUNT
+extern int mcount_enabled;
+
+#include <linux/linkage.h>
+
+#define CALLER_ADDR0 ((unsigned long)__builtin_return_address(0))
+#define CALLER_ADDR1 ((unsigned long)__builtin_return_address(1))
+#define CALLER_ADDR2 ((unsigned long)__builtin_return_address(2))
+
+typedef void (*mcount_func_t)(unsigned long ip, unsigned long parent_ip);
+
+struct mcount_ops {
+	mcount_func_t func;
+	struct mcount_ops *next;
+};
+
+/*
+ * The mcount_ops must be a static and should also
+ * be read_mostly.  These functions do modify read_mostly variables
+ * so use them sparely. Never free an mcount_op or modify the
+ * next pointer after it has been registered. Even after unregistering
+ * it, the next pointer may still be used internally.
+ */
+int register_mcount_function(struct mcount_ops *ops);
+int unregister_mcount_function(struct mcount_ops *ops);
+void clear_mcount_function(void);
+
+extern void mcount(void);
+
+#else /* !CONFIG_MCOUNT */
+# define register_mcount_function(ops) do { } while (0)
+# define unregister_mcount_function(ops) do { } while (0)
+# define clear_mcount_function(ops) do { } while (0)
+#endif /* CONFIG_MCOUNT */
+#endif /* _LINUX_MCOUNT_H */
Index: linux-compile.git/kernel/sysctl.c
===================================================================
--- linux-compile.git.orig/kernel/sysctl.c	2008-01-15 12:49:53.000000000 -0500
+++ linux-compile.git/kernel/sysctl.c	2008-01-15 12:50:01.000000000 -0500
@@ -46,6 +46,7 @@
 #include <linux/nfs_fs.h>
 #include <linux/acpi.h>
 #include <linux/reboot.h>
+#include <linux/mcount.h>
 
 #include <asm/uaccess.h>
 #include <asm/processor.h>
@@ -470,6 +471,16 @@ static struct ctl_table kern_table[] = {
 		.mode		= 0644,
 		.proc_handler	= &proc_dointvec,
 	},
+#ifdef CONFIG_MCOUNT
+	{
+		.ctl_name	= CTL_UNNUMBERED,
+		.procname	= "mcount_enabled",
+		.data		= &mcount_enabled,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
+#endif
 #ifdef CONFIG_KMOD
 	{
 		.ctl_name	= KERN_MODPROBE,
Index: linux-compile.git/lib/Kconfig.debug
===================================================================
--- linux-compile.git.orig/lib/Kconfig.debug	2008-01-15 12:49:53.000000000 -0500
+++ linux-compile.git/lib/Kconfig.debug	2008-01-15 12:50:01.000000000 -0500
@@ -517,4 +517,6 @@ config FAULT_INJECTION_STACKTRACE_FILTER
 	help
 	  Provide stacktrace filter for fault-injection capabilities
 
+source lib/tracing/Kconfig
+
 source "samples/Kconfig"
Index: linux-compile.git/lib/Makefile
===================================================================
--- linux-compile.git.orig/lib/Makefile	2008-01-15 12:49:53.000000000 -0500
+++ linux-compile.git/lib/Makefile	2008-01-15 12:50:01.000000000 -0500
@@ -66,6 +66,8 @@ obj-$(CONFIG_AUDIT_GENERIC) += audit.o
 obj-$(CONFIG_SWIOTLB) += swiotlb.o
 obj-$(CONFIG_FAULT_INJECTION) += fault-inject.o
 
+obj-$(CONFIG_MCOUNT) += tracing/
+
 lib-$(CONFIG_GENERIC_BUG) += bug.o
 
 hostprogs-y	:= gen_crc32table
Index: linux-compile.git/lib/tracing/Kconfig
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-compile.git/lib/tracing/Kconfig	2008-01-15 12:50:01.000000000 -0500
@@ -0,0 +1,10 @@
+
+# Archs that enable MCOUNT should select HAVE_MCOUNT
+config HAVE_MCOUNT
+       bool
+
+# MCOUNT itself is useless, or will just be added overhead.
+# It needs something to register a function with it.
+config MCOUNT
+	bool
+	select FRAME_POINTER
Index: linux-compile.git/lib/tracing/Makefile
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-compile.git/lib/tracing/Makefile	2008-01-15 12:50:01.000000000 -0500
@@ -0,0 +1,3 @@
+obj-$(CONFIG_MCOUNT) += libmcount.o
+
+libmcount-y := mcount.o
Index: linux-compile.git/lib/tracing/mcount.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-compile.git/lib/tracing/mcount.c	2008-01-15 12:50:01.000000000 -0500
@@ -0,0 +1,141 @@
+/*
+ * Infrastructure for profiling code inserted by 'gcc -pg'.
+ *
+ * Copyright (C) 2007-2008 Steven Rostedt <srostedt@redhat.com>
+ *
+ * Originally ported from the -rt patch by:
+ *   Copyright (C) 2007 Arnaldo Carvalho de Melo <acme@redhat.com>
+ *
+ * Based on code in the latency_tracer, that is:
+ *
+ *  Copyright (C) 2004-2006 Ingo Molnar
+ *  Copyright (C) 2004 William Lee Irwin III
+ */
+
+#include <linux/module.h>
+#include <linux/mcount.h>
+
+/*
+ * Since we have nothing protecting between the test of
+ * mcount_trace_function and the call to it, we can't
+ * set it to NULL without risking a race that will have
+ * the kernel call the NULL pointer. Instead, we just
+ * set the function pointer to a dummy function.
+ */
+notrace void dummy_mcount_tracer(unsigned long ip,
+				 unsigned long parent_ip)
+{
+	/* do nothing */
+}
+
+static DEFINE_SPINLOCK(mcount_func_lock);
+static struct mcount_ops mcount_list_end __read_mostly =
+{
+	.func = dummy_mcount_tracer,
+};
+
+static struct mcount_ops *mcount_list __read_mostly = &mcount_list_end;
+mcount_func_t mcount_trace_function __read_mostly = dummy_mcount_tracer;
+int mcount_enabled __read_mostly;
+
+/* mcount is defined per arch in assembly */
+EXPORT_SYMBOL_GPL(mcount);
+
+notrace void mcount_list_func(unsigned long ip, unsigned long parent_ip)
+{
+	struct mcount_ops *op = mcount_list;
+
+	while (op != &mcount_list_end) {
+		op->func(ip, parent_ip);
+		op = op->next;
+	};
+}
+
+/**
+ * register_mcount_function - register a function for profiling
+ * @ops - ops structure that holds the function for profiling.
+ *
+ * Register a function to be called by all functions in the
+ * kernel.
+ *
+ * Note: @ops->func and all the functions it calls must be labeled
+ *       with "notrace", otherwise it will go into a
+ *       recursive loop.
+ */
+int register_mcount_function(struct mcount_ops *ops)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&mcount_func_lock, flags);
+	ops->next = mcount_list;
+	/* must have next seen before we update the list pointer */
+	smp_wmb();
+	mcount_list = ops;
+	/*
+	 * For one func, simply call it directly.
+	 * For more than one func, call the chain.
+	 */
+	if (ops->next == &mcount_list_end)
+		mcount_trace_function = ops->func;
+	else
+		mcount_trace_function = mcount_list_func;
+	spin_unlock_irqrestore(&mcount_func_lock, flags);
+
+	return 0;
+}
+
+/**
+ * unregister_mcount_function - unresgister a function for profiling.
+ * @ops - ops structure that holds the function to unregister
+ *
+ * Unregister a function that was added to be called by mcount profiling.
+ */
+int unregister_mcount_function(struct mcount_ops *ops)
+{
+	unsigned long flags;
+	struct mcount_ops **p;
+	int ret = 0;
+
+	spin_lock_irqsave(&mcount_func_lock, flags);
+
+	/*
+	 * If we are the only function, then the mcount pointer is
+	 * pointing directly to that function.
+	 */
+	if (mcount_list == ops && ops->next == &mcount_list_end) {
+		mcount_trace_function = dummy_mcount_tracer;
+		mcount_list = &mcount_list_end;
+		goto out;
+	}
+
+	for (p = &mcount_list; *p != &mcount_list_end; p = &(*p)->next)
+		if (*p == ops)
+			break;
+
+	if (*p != ops) {
+		ret = -1;
+		goto out;
+	}
+
+	*p = (*p)->next;
+
+	/* If we only have one func left, then call that directly */
+	if (mcount_list->next == &mcount_list_end)
+		mcount_trace_function = mcount_list->func;
+
+ out:
+	spin_unlock_irqrestore(&mcount_func_lock, flags);
+
+	return 0;
+}
+
+/**
+ * clear_mcount_function - reset the mcount function
+ *
+ * This NULLs the mcount function and in essence stops
+ * tracing.  There may be lag
+ */
+void clear_mcount_function(void)
+{
+	mcount_trace_function = dummy_mcount_tracer;
+}

-- 


* [RFC PATCH 02/30 v3] Annotate core code that should not be traced
  2008-01-15 20:49 [RFC PATCH 00/30 v3] mcount and latency tracing utility -v3 Steven Rostedt
  2008-01-15 20:49 ` [RFC PATCH 01/30 v3] Add basic support for gcc profiler instrumentation Steven Rostedt
@ 2008-01-15 20:49 ` Steven Rostedt
  2008-01-15 20:49 ` [RFC PATCH 03/30 v3] x86_64: notrace annotations Steven Rostedt
                   ` (27 subsequent siblings)
  29 siblings, 0 replies; 35+ messages in thread
From: Steven Rostedt @ 2008-01-15 20:49 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Tim Bird,
	Sam Ravnborg, Frank Ch. Eigler, Jan Kiszka, Steven Rostedt

[-- Attachment #1: mcount-annotate-generic-code.patch --]
[-- Type: text/plain, Size: 922 bytes --]

Mark functions in core code that should not be traced with the
"notrace" annotation.  The "notrace" attribute will prevent gcc from
adding a call to mcount in the annotated functions.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>

---
 lib/smp_processor_id.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-compile.git/lib/smp_processor_id.c
===================================================================
--- linux-compile.git.orig/lib/smp_processor_id.c	2008-01-14 13:14:03.000000000 -0500
+++ linux-compile.git/lib/smp_processor_id.c	2008-01-14 13:14:13.000000000 -0500
@@ -7,7 +7,7 @@
 #include <linux/kallsyms.h>
 #include <linux/sched.h>
 
-unsigned int debug_smp_processor_id(void)
+notrace unsigned int debug_smp_processor_id(void)
 {
 	unsigned long preempt_count = preempt_count();
 	int this_cpu = raw_smp_processor_id();

-- 


* [RFC PATCH 03/30 v3] x86_64: notrace annotations
  2008-01-15 20:49 [RFC PATCH 00/30 v3] mcount and latency tracing utility -v3 Steven Rostedt
  2008-01-15 20:49 ` [RFC PATCH 01/30 v3] Add basic support for gcc profiler instrumentation Steven Rostedt
  2008-01-15 20:49 ` [RFC PATCH 02/30 v3] Annotate core code that should not be traced Steven Rostedt
@ 2008-01-15 20:49 ` Steven Rostedt
  2008-01-15 20:49 ` [RFC PATCH 04/30 v3] add notrace annotations to vsyscall Steven Rostedt
                   ` (26 subsequent siblings)
  29 siblings, 0 replies; 35+ messages in thread
From: Steven Rostedt @ 2008-01-15 20:49 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Tim Bird,
	Sam Ravnborg, Frank Ch. Eigler, Jan Kiszka, Steven Rostedt

[-- Attachment #1: mcount-add-x86_64-notrace-annotations.patch --]
[-- Type: text/plain, Size: 2221 bytes --]

Add "notrace" annotation to x86_64 specific files.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 arch/x86/kernel/head64.c     |    2 +-
 arch/x86/kernel/setup64.c    |    4 ++--
 arch/x86/kernel/smpboot_64.c |    2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

Index: linux-compile.git/arch/x86/kernel/head64.c
===================================================================
--- linux-compile.git.orig/arch/x86/kernel/head64.c	2008-01-14 13:14:01.000000000 -0500
+++ linux-compile.git/arch/x86/kernel/head64.c	2008-01-14 13:14:13.000000000 -0500
@@ -46,7 +46,7 @@ static void __init copy_bootdata(char *r
 	}
 }
 
-void __init x86_64_start_kernel(char * real_mode_data)
+notrace void __init x86_64_start_kernel(char *real_mode_data)
 {
 	int i;
 
Index: linux-compile.git/arch/x86/kernel/setup64.c
===================================================================
--- linux-compile.git.orig/arch/x86/kernel/setup64.c	2008-01-14 13:14:01.000000000 -0500
+++ linux-compile.git/arch/x86/kernel/setup64.c	2008-01-14 13:14:13.000000000 -0500
@@ -114,7 +114,7 @@ void __init setup_per_cpu_areas(void)
 	}
 } 
 
-void pda_init(int cpu)
+notrace void pda_init(int cpu)
 { 
 	struct x8664_pda *pda = cpu_pda(cpu);
 
@@ -197,7 +197,7 @@ DEFINE_PER_CPU(struct orig_ist, orig_ist
  * 'CPU state barrier', nothing should get across.
  * A lot of state is already set up in PDA init.
  */
-void __cpuinit cpu_init (void)
+notrace void __cpuinit cpu_init(void)
 {
 	int cpu = stack_smp_processor_id();
 	struct tss_struct *t = &per_cpu(init_tss, cpu);
Index: linux-compile.git/arch/x86/kernel/smpboot_64.c
===================================================================
--- linux-compile.git.orig/arch/x86/kernel/smpboot_64.c	2008-01-14 13:14:01.000000000 -0500
+++ linux-compile.git/arch/x86/kernel/smpboot_64.c	2008-01-14 13:14:13.000000000 -0500
@@ -317,7 +317,7 @@ static inline void set_cpu_sibling_map(i
 /*
  * Setup code on secondary processor (after comming out of the trampoline)
  */
-void __cpuinit start_secondary(void)
+notrace __cpuinit void start_secondary(void)
 {
 	/*
 	 * Dont put anything before smp_callin(), SMP

-- 


* [RFC PATCH 04/30 v3] add notrace annotations to vsyscall.
  2008-01-15 20:49 [RFC PATCH 00/30 v3] mcount and latency tracing utility -v3 Steven Rostedt
                   ` (2 preceding siblings ...)
  2008-01-15 20:49 ` [RFC PATCH 03/30 v3] x86_64: notrace annotations Steven Rostedt
@ 2008-01-15 20:49 ` Steven Rostedt
  2008-01-15 20:49 ` [RFC PATCH 05/30 v3] add notrace annotations for NMI routines Steven Rostedt
                   ` (25 subsequent siblings)
  29 siblings, 0 replies; 35+ messages in thread
From: Steven Rostedt @ 2008-01-15 20:49 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Tim Bird,
	Sam Ravnborg, Frank Ch. Eigler, Jan Kiszka, Steven Rostedt

[-- Attachment #1: mcount-add-x86-vdso-notrace-annotations.patch --]
[-- Type: text/plain, Size: 4578 bytes --]

Add the notrace annotations to some of the vsyscall functions.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 arch/x86/kernel/vsyscall_64.c  |    3 ++-
 arch/x86/vdso/vclock_gettime.c |   15 ++++++++-------
 arch/x86/vdso/vgetcpu.c        |    3 ++-
 include/asm-x86/vsyscall.h     |    3 ++-
 4 files changed, 14 insertions(+), 10 deletions(-)

Index: linux-compile.git/arch/x86/kernel/vsyscall_64.c
===================================================================
--- linux-compile.git.orig/arch/x86/kernel/vsyscall_64.c	2008-01-14 13:14:00.000000000 -0500
+++ linux-compile.git/arch/x86/kernel/vsyscall_64.c	2008-01-14 14:57:53.000000000 -0500
@@ -42,7 +42,8 @@
 #include <asm/topology.h>
 #include <asm/vgtod.h>
 
-#define __vsyscall(nr) __attribute__ ((unused,__section__(".vsyscall_" #nr)))
+#define __vsyscall(nr) \
+		__attribute__ ((unused, __section__(".vsyscall_" #nr))) notrace
 #define __syscall_clobber "r11","rcx","memory"
 #define __pa_vsymbol(x)			\
 	({unsigned long v;  		\
Index: linux-compile.git/arch/x86/vdso/vclock_gettime.c
===================================================================
--- linux-compile.git.orig/arch/x86/vdso/vclock_gettime.c	2008-01-14 13:14:00.000000000 -0500
+++ linux-compile.git/arch/x86/vdso/vclock_gettime.c	2008-01-14 13:14:13.000000000 -0500
@@ -24,7 +24,7 @@
 
 #define gtod vdso_vsyscall_gtod_data
 
-static long vdso_fallback_gettime(long clock, struct timespec *ts)
+notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
 {
 	long ret;
 	asm("syscall" : "=a" (ret) :
@@ -32,7 +32,7 @@ static long vdso_fallback_gettime(long c
 	return ret;
 }
 
-static inline long vgetns(void)
+notrace static inline long vgetns(void)
 {
 	long v;
 	cycles_t (*vread)(void);
@@ -41,7 +41,7 @@ static inline long vgetns(void)
 	return (v * gtod->clock.mult) >> gtod->clock.shift;
 }
 
-static noinline int do_realtime(struct timespec *ts)
+notrace static noinline int do_realtime(struct timespec *ts)
 {
 	unsigned long seq, ns;
 	do {
@@ -55,7 +55,8 @@ static noinline int do_realtime(struct t
 }
 
 /* Copy of the version in kernel/time.c which we cannot directly access */
-static void vset_normalized_timespec(struct timespec *ts, long sec, long nsec)
+notrace static void
+vset_normalized_timespec(struct timespec *ts, long sec, long nsec)
 {
 	while (nsec >= NSEC_PER_SEC) {
 		nsec -= NSEC_PER_SEC;
@@ -69,7 +70,7 @@ static void vset_normalized_timespec(str
 	ts->tv_nsec = nsec;
 }
 
-static noinline int do_monotonic(struct timespec *ts)
+notrace static noinline int do_monotonic(struct timespec *ts)
 {
 	unsigned long seq, ns, secs;
 	do {
@@ -83,7 +84,7 @@ static noinline int do_monotonic(struct 
 	return 0;
 }
 
-int __vdso_clock_gettime(clockid_t clock, struct timespec *ts)
+notrace int __vdso_clock_gettime(clockid_t clock, struct timespec *ts)
 {
 	if (likely(gtod->sysctl_enabled && gtod->clock.vread))
 		switch (clock) {
@@ -97,7 +98,7 @@ int __vdso_clock_gettime(clockid_t clock
 int clock_gettime(clockid_t, struct timespec *)
 	__attribute__((weak, alias("__vdso_clock_gettime")));
 
-int __vdso_gettimeofday(struct timeval *tv, struct timezone *tz)
+notrace int __vdso_gettimeofday(struct timeval *tv, struct timezone *tz)
 {
 	long ret;
 	if (likely(gtod->sysctl_enabled && gtod->clock.vread)) {
Index: linux-compile.git/arch/x86/vdso/vgetcpu.c
===================================================================
--- linux-compile.git.orig/arch/x86/vdso/vgetcpu.c	2008-01-14 13:14:00.000000000 -0500
+++ linux-compile.git/arch/x86/vdso/vgetcpu.c	2008-01-14 13:14:13.000000000 -0500
@@ -13,7 +13,8 @@
 #include <asm/vgtod.h>
 #include "vextern.h"
 
-long __vdso_getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *unused)
+notrace long
+__vdso_getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *unused)
 {
 	unsigned int dummy, p;
 
Index: linux-compile.git/include/asm-x86/vsyscall.h
===================================================================
--- linux-compile.git.orig/include/asm-x86/vsyscall.h	2008-01-14 13:14:00.000000000 -0500
+++ linux-compile.git/include/asm-x86/vsyscall.h	2008-01-14 13:14:13.000000000 -0500
@@ -24,7 +24,8 @@ enum vsyscall_num {
 	((unused, __section__ (".vsyscall_gtod_data"),aligned(16)))
 #define __section_vsyscall_clock __attribute__ \
 	((unused, __section__ (".vsyscall_clock"),aligned(16)))
-#define __vsyscall_fn __attribute__ ((unused,__section__(".vsyscall_fn")))
+#define __vsyscall_fn \
+	__attribute__ ((unused, __section__(".vsyscall_fn"))) notrace
 
 #define VGETCPU_RDTSCP	1
 #define VGETCPU_LSL	2

-- 


* [RFC PATCH 05/30 v3] add notrace annotations for NMI routines
  2008-01-15 20:49 [RFC PATCH 00/30 v3] mcount and latency tracing utility -v3 Steven Rostedt
                   ` (3 preceding siblings ...)
  2008-01-15 20:49 ` [RFC PATCH 04/30 v3] add notrace annotations to vsyscall Steven Rostedt
@ 2008-01-15 20:49 ` Steven Rostedt
  2008-01-15 20:49 ` [RFC PATCH 06/30 v3] mcount based trace in the form of a header file library Steven Rostedt
                   ` (24 subsequent siblings)
  29 siblings, 0 replies; 35+ messages in thread
From: Steven Rostedt @ 2008-01-15 20:49 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Tim Bird,
	Sam Ravnborg, Frank Ch. Eigler, Jan Kiszka, Steven Rostedt

[-- Attachment #1: mcount-nmi-notrace-annotations.patch --]
[-- Type: text/plain, Size: 2475 bytes --]

This annotates NMI functions with notrace. Some tracers may be able to
cope with being called from NMI context, but some cannot, so we turn off
tracing of NMIs.

One solution might be to add a notrace_nmi annotation that would only turn
off NMI tracing if a trace utility needed it off.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>

---
 arch/x86/kernel/nmi_32.c   |    2 +-
 arch/x86/kernel/nmi_64.c   |    2 +-
 arch/x86/kernel/traps_32.c |    4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

Index: linux-compile.git/arch/x86/kernel/nmi_32.c
===================================================================
--- linux-compile.git.orig/arch/x86/kernel/nmi_32.c	2008-01-14 13:13:58.000000000 -0500
+++ linux-compile.git/arch/x86/kernel/nmi_32.c	2008-01-14 13:14:13.000000000 -0500
@@ -323,7 +323,7 @@ EXPORT_SYMBOL(touch_nmi_watchdog);
 
 extern void die_nmi(struct pt_regs *, const char *msg);
 
-__kprobes int nmi_watchdog_tick(struct pt_regs * regs, unsigned reason)
+notrace __kprobes int nmi_watchdog_tick(struct pt_regs *regs, unsigned reason)
 {
 
 	/*
Index: linux-compile.git/arch/x86/kernel/nmi_64.c
===================================================================
--- linux-compile.git.orig/arch/x86/kernel/nmi_64.c	2008-01-14 13:13:58.000000000 -0500
+++ linux-compile.git/arch/x86/kernel/nmi_64.c	2008-01-14 13:14:13.000000000 -0500
@@ -314,7 +314,7 @@ void touch_nmi_watchdog(void)
  	touch_softlockup_watchdog();
 }
 
-int __kprobes nmi_watchdog_tick(struct pt_regs * regs, unsigned reason)
+notrace __kprobes int nmi_watchdog_tick(struct pt_regs *regs, unsigned reason)
 {
 	int sum;
 	int touched = 0;
Index: linux-compile.git/arch/x86/kernel/traps_32.c
===================================================================
--- linux-compile.git.orig/arch/x86/kernel/traps_32.c	2008-01-14 13:13:58.000000000 -0500
+++ linux-compile.git/arch/x86/kernel/traps_32.c	2008-01-14 13:14:13.000000000 -0500
@@ -722,7 +722,7 @@ void __kprobes die_nmi(struct pt_regs *r
 	do_exit(SIGSEGV);
 }
 
-static __kprobes void default_do_nmi(struct pt_regs * regs)
+static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
 {
 	unsigned char reason = 0;
 
@@ -762,7 +762,7 @@ static __kprobes void default_do_nmi(str
 
 static int ignore_nmis;
 
-fastcall __kprobes void do_nmi(struct pt_regs * regs, long error_code)
+notrace fastcall __kprobes void do_nmi(struct pt_regs *regs, long error_code)
 {
 	int cpu;
 

-- 


* [RFC PATCH 06/30 v3] mcount based trace in the form of a header file library
  2008-01-15 20:49 [RFC PATCH 00/30 v3] mcount and latency tracing utility -v3 Steven Rostedt
                   ` (4 preceding siblings ...)
  2008-01-15 20:49 ` [RFC PATCH 05/30 v3] add notrace annotations for NMI routines Steven Rostedt
@ 2008-01-15 20:49 ` Steven Rostedt
  2008-01-15 20:49 ` [RFC PATCH 07/30 v3] tracer add debugfs interface Steven Rostedt
                   ` (23 subsequent siblings)
  29 siblings, 0 replies; 35+ messages in thread
From: Steven Rostedt @ 2008-01-15 20:49 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Tim Bird,
	Sam Ravnborg, Frank Ch. Eigler, Jan Kiszka, Steven Rostedt

[-- Attachment #1: mcount-simple-tracer.patch --]
[-- Type: text/plain, Size: 6805 bytes --]

The design is for mcount-based tracers to be added through the
lib/tracing/tracer_interface.h file, just as mcount users should add
themselves to lib/tracing/mcount.h. A Kconfig rule chooses the right MCOUNT
and MCOUNT_TRACER user.

This is to avoid function call costs for something that is supposed to be used
only in a debug kernel and that has to reduce the per-function-call overhead
of mcount-based tracing to the bare minimum.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 lib/tracing/Kconfig            |   11 +++
 lib/tracing/Makefile           |    2 
 lib/tracing/tracer.c           |  128 +++++++++++++++++++++++++++++++++++++++++
 lib/tracing/tracer.h           |   21 ++++++
 lib/tracing/tracer_interface.h |   14 ++++
 5 files changed, 176 insertions(+)

Index: linux-compile.git/lib/tracing/Kconfig
===================================================================
--- linux-compile.git.orig/lib/tracing/Kconfig	2008-01-15 14:54:17.000000000 -0500
+++ linux-compile.git/lib/tracing/Kconfig	2008-01-15 14:54:37.000000000 -0500
@@ -8,3 +8,14 @@ config HAVE_MCOUNT
 config MCOUNT
 	bool
 	select FRAME_POINTER
+
+config MCOUNT_TRACER
+	bool "Profiler instrumentation based tracer"
+	depends on DEBUG_KERNEL && HAVE_MCOUNT
+	default n
+	select MCOUNT
+	help
+	  Use profiler instrumentation, adding -pg to CFLAGS. This will
+	  insert a call to an architecture specific __mcount routine,
+	  that the debugging mechanism using this facility will hook by
+	  providing a set of inline routines.
Index: linux-compile.git/lib/tracing/Makefile
===================================================================
--- linux-compile.git.orig/lib/tracing/Makefile	2008-01-15 14:54:17.000000000 -0500
+++ linux-compile.git/lib/tracing/Makefile	2008-01-15 14:54:37.000000000 -0500
@@ -1,3 +1,5 @@
 obj-$(CONFIG_MCOUNT) += libmcount.o
 
+obj-$(CONFIG_MCOUNT_TRACER) += tracer.o
+
 libmcount-y := mcount.o
Index: linux-compile.git/lib/tracing/tracer.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-compile.git/lib/tracing/tracer.c	2008-01-15 14:54:37.000000000 -0500
@@ -0,0 +1,128 @@
+/*
+ * ring buffer based mcount tracer
+ *
+ * Copyright (C) 2007 Arnaldo Carvalho de Melo <acme@redhat.com>
+ * 		      Steven Rostedt <srostedt@redhat.com>
+ *
+ * From code in the latency_tracer, that is:
+ *
+ *  Copyright (C) 2004-2006 Ingo Molnar
+ *  Copyright (C) 2004 William Lee Irwin III
+ */
+
+#include <linux/fs.h>
+#include <linux/gfp.h>
+#include <linux/init.h>
+#include <linux/linkage.h>
+#include <linux/seq_file.h>
+#include <linux/mcount.h>
+
+#include "tracer.h"
+#include "tracer_interface.h"
+
+static struct mctracer_trace mctracer_trace;
+
+static inline notrace void
+mctracer_add_trace_entry(struct mctracer_trace *tr,
+			 int cpu,
+			 const unsigned long ip,
+			 const unsigned long parent_ip)
+{
+	unsigned long idx, idx_next;
+	struct mctracer_entry *entry;
+
+	idx = tr->trace_idx[cpu];
+	idx_next = idx + 1;
+
+	if (unlikely(idx_next >= tr->entries)) {
+		atomic_inc(&tr->underrun[cpu]);
+		idx_next = 0;
+	}
+
+	tr->trace_idx[cpu] = idx_next;
+
+	if (unlikely(idx_next != 0 && atomic_read(&tr->underrun[cpu])))
+		atomic_inc(&tr->underrun[cpu]);
+
+	entry = tr->trace[cpu] + idx * MCTRACER_ENTRY_SIZE;
+	entry->idx	 = atomic_inc_return(&tr->cnt);
+	entry->ip	 = ip;
+	entry->parent_ip = parent_ip;
+}
+
+static notrace void trace_function(const unsigned long ip,
+				   const unsigned long parent_ip)
+{
+	unsigned long flags;
+	struct mctracer_trace *tr;
+	int cpu;
+
+	raw_local_irq_save(flags);
+	cpu = raw_smp_processor_id();
+
+	tr = &mctracer_trace;
+
+	atomic_inc(&tr->disabled[cpu]);
+	if (likely(atomic_read(&tr->disabled[cpu]) == 1))
+		mctracer_add_trace_entry(tr, cpu, ip, parent_ip);
+
+	atomic_dec(&tr->disabled[cpu]);
+
+	raw_local_irq_restore(flags);
+}
+
+static struct mcount_ops trace_ops __read_mostly =
+{
+	.func = trace_function,
+};
+
+static notrace int page_order(const unsigned long size)
+{
+	const unsigned long nr_pages = DIV_ROUND_UP(size, PAGE_SIZE);
+	return ilog2(roundup_pow_of_two(nr_pages));
+}
+
+static notrace int mctracer_alloc_buffers(void)
+{
+	const int order = page_order(MCTRACER_NR_ENTRIES * MCTRACER_ENTRY_SIZE);
+	const unsigned long size = (1UL << order) << PAGE_SHIFT;
+	struct mctracer_entry *array;
+	int i;
+
+	for_each_possible_cpu(i) {
+		array = (struct mctracer_entry *)
+			  __get_free_pages(GFP_KERNEL, order);
+		if (array == NULL) {
+			printk(KERN_ERR "mctracer: failed to allocate"
+			       " %ld bytes for trace buffer!\n", size);
+			goto free_buffers;
+		}
+		mctracer_trace.trace[i] = array;
+	}
+
+	/*
+	 * Since we allocate by orders of pages, we may be able to
+	 * round up a bit.
+	 */
+	mctracer_trace.entries = size / MCTRACER_ENTRY_SIZE;
+
+	pr_info("mctracer: %ld bytes allocated for %ld entries of %ld bytes\n",
+		size, MCTRACER_NR_ENTRIES, (long)MCTRACER_ENTRY_SIZE);
+	pr_info("   actual entries %ld\n", mctracer_trace.entries);
+
+	register_mcount_function(&trace_ops);
+
+	return 0;
+
+ free_buffers:
+	for (i-- ; i >= 0; i--) {
+		if (mctracer_trace.trace[i]) {
+			free_pages((unsigned long)mctracer_trace.trace[i],
+				   order);
+			mctracer_trace.trace[i] = NULL;
+		}
+	}
+	return -ENOMEM;
+}
+
+device_initcall(mctracer_alloc_buffers);
Index: linux-compile.git/lib/tracing/tracer.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-compile.git/lib/tracing/tracer.h	2008-01-15 14:54:37.000000000 -0500
@@ -0,0 +1,21 @@
+#ifndef _LINUX_MCOUNT_TRACER_H
+#define _LINUX_MCOUNT_TRACER_H
+
+#include <asm/atomic.h>
+
+struct mctracer_entry {
+	unsigned long idx;
+	unsigned long ip;
+	unsigned long parent_ip;
+};
+
+struct mctracer_trace {
+	void	      *trace[NR_CPUS];
+	unsigned long trace_idx[NR_CPUS];
+	unsigned long entries;
+	atomic_t      cnt;
+	atomic_t      disabled[NR_CPUS];
+	atomic_t      underrun[NR_CPUS];
+};
+
+#endif /* _LINUX_MCOUNT_TRACER_H */
Index: linux-compile.git/lib/tracing/tracer_interface.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-compile.git/lib/tracing/tracer_interface.h	2008-01-15 14:54:37.000000000 -0500
@@ -0,0 +1,14 @@
+#ifndef _LINUX_MCTRACER_INTERFACE_H
+#define _LINUX_MCTRACER_INTERFACE_H
+
+#include "tracer.h"
+
+/*
+ * Will be at least sizeof(struct mctracer_entry), but callers can request more
+ * space for private stuff, such as a timestamp, preempt_count, etc.
+ */
+#define MCTRACER_ENTRY_SIZE sizeof(struct mctracer_entry)
+
+#define MCTRACER_NR_ENTRIES (65536UL)
+
+#endif /* _LINUX_MCTRACER_INTERFACE_H */

-- 


* [RFC PATCH 07/30 v3] tracer add debugfs interface
  2008-01-15 20:49 [RFC PATCH 00/30 v3] mcount and latency tracing utility -v3 Steven Rostedt
                   ` (5 preceding siblings ...)
  2008-01-15 20:49 ` [RFC PATCH 06/30 v3] mcount based trace in the form of a header file library Steven Rostedt
@ 2008-01-15 20:49 ` Steven Rostedt
  2008-01-15 20:49 ` [RFC PATCH 08/30 v3] mcount tracer output file Steven Rostedt
                   ` (22 subsequent siblings)
  29 siblings, 0 replies; 35+ messages in thread
From: Steven Rostedt @ 2008-01-15 20:49 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Tim Bird,
	Sam Ravnborg, Frank Ch. Eigler, Jan Kiszka, Steven Rostedt

[-- Attachment #1: mcount-tracer-debugfs.patch --]
[-- Type: text/plain, Size: 3863 bytes --]

This patch adds an interface into debugfs.

  /debugfs/tracing/ctrl

echoing 1 into the ctrl file turns on the tracer,
and echoing 0 turns it off.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 lib/tracing/tracer.c |   87 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 lib/tracing/tracer.h |    1 
 2 files changed, 87 insertions(+), 1 deletion(-)

Index: linux-compile.git/lib/tracing/tracer.c
===================================================================
--- linux-compile.git.orig/lib/tracing/tracer.c	2008-01-14 13:14:13.000000000 -0500
+++ linux-compile.git/lib/tracing/tracer.c	2008-01-14 14:57:59.000000000 -0500
@@ -15,6 +15,8 @@
 #include <linux/init.h>
 #include <linux/linkage.h>
 #include <linux/seq_file.h>
+#include <linux/debugfs.h>
+#include <linux/uaccess.h>
 #include <linux/mcount.h>
 
 #include "tracer.h"
@@ -58,9 +60,12 @@ static notrace void trace_function(const
 	int cpu;
 
 	raw_local_irq_save(flags);
-	cpu = raw_smp_processor_id();
 
 	tr = &mctracer_trace;
+	if (!tr->ctrl)
+		goto out;
+
+	cpu = raw_smp_processor_id();
 
 	atomic_inc(&tr->disabled[cpu]);
 	if (likely(atomic_read(&tr->disabled[cpu]) == 1))
@@ -68,6 +73,7 @@ static notrace void trace_function(const
 
 	atomic_dec(&tr->disabled[cpu]);
 
+ out:
 	raw_local_irq_restore(flags);
 }
 
@@ -76,6 +82,83 @@ static struct mcount_ops trace_ops __rea
 	.func = trace_function,
 };
 
+#ifdef CONFIG_DEBUG_FS
+static int mctracer_open_generic(struct inode *inode, struct file *filp)
+{
+	filp->private_data = inode->i_private;
+	return 0;
+}
+
+
+static ssize_t mctracer_ctrl_read(struct file *filp, char __user *ubuf,
+				  size_t cnt, loff_t *ppos)
+{
+	struct mctracer_trace *tr = filp->private_data;
+	char buf[16];
+	int r;
+
+	r = sprintf(buf, "%ld\n", tr->ctrl);
+	return simple_read_from_buffer(ubuf, cnt, ppos,
+				       buf, r);
+}
+
+static ssize_t mctracer_ctrl_write(struct file *filp,
+				   const char __user *ubuf,
+				   size_t cnt, loff_t *ppos)
+{
+	struct mctracer_trace *tr = filp->private_data;
+	long val;
+	char buf[16];
+
+	if (cnt > 15)
+		cnt = 15;
+
+	if (copy_from_user(&buf, ubuf, cnt))
+		return -EFAULT;
+
+	buf[cnt] = 0;
+
+	val = !!simple_strtoul(buf, NULL, 10);
+
+	tr->ctrl = val;
+
+	filp->f_pos += cnt;
+
+	return cnt;
+}
+
+static struct file_operations mctracer_ctrl_fops = {
+	.open = mctracer_open_generic,
+	.read = mctracer_ctrl_read,
+	.write = mctracer_ctrl_write,
+};
+
+static void mctrace_init_debugfs(void)
+{
+	struct dentry *d_mctracer;
+	struct dentry *entry;
+
+	d_mctracer = debugfs_create_dir("tracing", NULL);
+	if (!d_mctracer) {
+		pr_warning("Could not create debugfs directory mctracer\n");
+		return;
+	}
+
+	entry = debugfs_create_file("ctrl", 0644, d_mctracer,
+				    &mctracer_trace, &mctracer_ctrl_fops);
+	if (!entry)
+		pr_warning("Could not create debugfs 'ctrl' entry\n");
+}
+#else /* CONFIG_DEBUG_FS */
+static void mctrace_init_debugfs(void)
+{
+	/*
+	 * No way to turn on or off the trace function
+	 * without debugfs.
+	 */
+}
+#endif /* CONFIG_DEBUG_FS */
+
 static notrace int page_order(const unsigned long size)
 {
 	const unsigned long nr_pages = DIV_ROUND_UP(size, PAGE_SIZE);
@@ -112,6 +195,8 @@ static notrace int mctracer_alloc_buffer
 
 	register_mcount_function(&trace_ops);
 
+	mctrace_init_debugfs();
+
 	return 0;
 
  free_buffers:
Index: linux-compile.git/lib/tracing/tracer.h
===================================================================
--- linux-compile.git.orig/lib/tracing/tracer.h	2008-01-14 13:14:13.000000000 -0500
+++ linux-compile.git/lib/tracing/tracer.h	2008-01-14 14:57:58.000000000 -0500
@@ -13,6 +13,7 @@ struct mctracer_trace {
 	void	      *trace[NR_CPUS];
 	unsigned long trace_idx[NR_CPUS];
 	unsigned long entries;
+	long	      ctrl;
 	atomic_t      cnt;
 	atomic_t      disabled[NR_CPUS];
 	atomic_t      underrun[NR_CPUS];

-- 


* [RFC PATCH 08/30 v3] mcount tracer output file
  2008-01-15 20:49 [RFC PATCH 00/30 v3] mcount and latency tracing utility -v3 Steven Rostedt
                   ` (6 preceding siblings ...)
  2008-01-15 20:49 ` [RFC PATCH 07/30 v3] tracer add debugfs interface Steven Rostedt
@ 2008-01-15 20:49 ` Steven Rostedt
  2008-01-15 20:49 ` [RFC PATCH 09/30 v3] mcount tracer show task comm and pid Steven Rostedt
                   ` (21 subsequent siblings)
  29 siblings, 0 replies; 35+ messages in thread
From: Steven Rostedt @ 2008-01-15 20:49 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Tim Bird,
	Sam Ravnborg, Frank Ch. Eigler, Jan Kiszka, Steven Rostedt

[-- Attachment #1: mcount-tracer-debugfs-show.patch --]
[-- Type: text/plain, Size: 7564 bytes --]

Add /debugfs/tracing/trace to dump the trace output.

Here's an example of the content.

  CPU 0:  [<ffffffff80494691>] notifier_call_chain+0x16/0x60 <-- [<ffffffff80494701>] __atomic_notifier_call_chain+0x26/0x56
  CPU 0:  [<ffffffff802161c8>] mce_idle_callback+0x9/0x2f <-- [<ffffffff804946b3>] notifier_call_chain+0x38/0x60
  CPU 0:  [<ffffffff8037fb7a>] acpi_processor_idle+0x16/0x518 <-- [<ffffffff8020aee8>] cpu_idle+0xa1/0xe7
  CPU 0:  [<ffffffff8037fa98>] acpi_safe_halt+0x9/0x43 <-- [<ffffffff8037fd3a>] acpi_processor_idle+0x1d6/0x518
  CPU 1:  [<ffffffff80221db8>] smp_apic_timer_interrupt+0xc/0x58 <-- [<ffffffff8020cf06>] apic_timer_interrupt+0x66/0x70
  CPU 1:  [<ffffffff8020ac22>] exit_idle+0x9/0x22 <-- [<ffffffff80221de1>] smp_apic_timer_interrupt+0x35/0x58
  CPU 1:  [<ffffffff8020ab97>] __exit_idle+0x9/0x2e <-- [<ffffffff8020ac39>] exit_idle+0x20/0x22
  CPU 1:  [<ffffffff8049473a>] atomic_notifier_call_chain+0x9/0x16 <-- [<ffffffff8020abba>] __exit_idle+0x2c/0x2e
  CPU 1:  [<ffffffff804946e9>] __atomic_notifier_call_chain+0xe/0x56 <-- [<ffffffff80494745>] atomic_notifier_call_chain+0x14/0x16
  CPU 1:  [<ffffffff80494691>] notifier_call_chain+0x16/0x60 <-- [<ffffffff80494701>] __atomic_notifier_call_chain+0x26/0x56
  CPU 1:  [<ffffffff802161c8>] mce_idle_callback+0x9/0x2f <-- [<ffffffff804946b3>] notifier_call_chain+0x38/0x60

This is in the format of the output when KALLSYMS is defined.

  CPU <CPU#>: [<IP>] <func> <-- [<Parent-IP>] <parent-func>

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 lib/tracing/tracer.c |  217 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 215 insertions(+), 2 deletions(-)

Index: linux-compile.git/lib/tracing/tracer.c
===================================================================
--- linux-compile.git.orig/lib/tracing/tracer.c	2008-01-14 13:14:13.000000000 -0500
+++ linux-compile.git/lib/tracing/tracer.c	2008-01-14 14:57:58.000000000 -0500
@@ -13,9 +13,11 @@
 #include <linux/fs.h>
 #include <linux/gfp.h>
 #include <linux/init.h>
+#include <linux/module.h>
 #include <linux/linkage.h>
 #include <linux/seq_file.h>
 #include <linux/debugfs.h>
+#include <linux/kallsyms.h>
 #include <linux/uaccess.h>
 #include <linux/mcount.h>
 
@@ -23,6 +25,7 @@
 #include "tracer_interface.h"
 
 static struct mctracer_trace mctracer_trace;
+static int trace_enabled __read_mostly;
 
 static inline notrace void
 mctracer_add_trace_entry(struct mctracer_trace *tr,
@@ -62,7 +65,7 @@ static notrace void trace_function(const
 	raw_local_irq_save(flags);
 
 	tr = &mctracer_trace;
-	if (!tr->ctrl)
+	if (!trace_enabled)
 		goto out;
 
 	cpu = raw_smp_processor_id();
@@ -83,6 +86,205 @@ static struct mcount_ops trace_ops __rea
 };
 
 #ifdef CONFIG_DEBUG_FS
+struct mctracer_iterator {
+	struct mctracer_trace *tr;
+	struct mctracer_entry *ent;
+	unsigned long next_idx[NR_CPUS];
+	int cpu;
+	int idx;
+};
+
+static struct mctracer_entry *mctracer_entry_idx(struct mctracer_trace *tr,
+						 unsigned long idx,
+						 int cpu)
+{
+	struct mctracer_entry *array = tr->trace[cpu];
+	unsigned long underrun;
+
+	if (idx >= tr->entries)
+		return NULL;
+
+	underrun = atomic_read(&tr->underrun[cpu]);
+	if (underrun)
+		idx = ((underrun - 1) + idx) % tr->entries;
+	else if (idx >= tr->trace_idx[cpu])
+		return NULL;
+
+	return &array[idx];
+}
+
+static void *find_next_entry(struct mctracer_iterator *iter)
+{
+	struct mctracer_trace *tr = iter->tr;
+	struct mctracer_entry *ent;
+	struct mctracer_entry *next = NULL;
+	int next_i = -1;
+	int i;
+
+	for_each_possible_cpu(i) {
+		if (!tr->trace[i])
+			continue;
+		ent = mctracer_entry_idx(tr, iter->next_idx[i], i);
+		if (ent && (!next || next->idx > ent->idx)) {
+			next = ent;
+			next_i = i;
+		}
+	}
+	if (next) {
+		iter->next_idx[next_i]++;
+		iter->idx++;
+	}
+	iter->ent = next;
+	iter->cpu = next_i;
+
+	return next ? iter : NULL;
+}
+
+static void *s_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	struct mctracer_iterator *iter = m->private;
+	void *ent;
+	int i = (int)*pos;
+
+	(*pos)++;
+
+	/* can't go backwards */
+	if (iter->idx > i)
+		return NULL;
+
+	if (iter->idx < 0)
+		ent = find_next_entry(iter);
+	else
+		ent = iter;
+
+	while (ent && iter->idx < i)
+		ent = find_next_entry(iter);
+
+	return ent;
+}
+
+static void *s_start(struct seq_file *m, loff_t *pos)
+{
+	struct mctracer_iterator *iter = m->private;
+	void *p = NULL;
+	loff_t l = 0;
+	int i;
+
+	iter->ent = NULL;
+	iter->cpu = 0;
+	iter->idx = -1;
+
+	for (i = 0; i < NR_CPUS; i++)
+		iter->next_idx[i] = 0;
+
+	/* stop the trace while dumping */
+	if (iter->tr->ctrl)
+		trace_enabled = 0;
+
+	for (p = iter; p && l < *pos; p = s_next(m, p, &l))
+		;
+
+	return p;
+}
+
+static void s_stop(struct seq_file *m, void *p)
+{
+	struct mctracer_iterator *iter = m->private;
+	if (iter->tr->ctrl)
+		trace_enabled = 1;
+}
+
+#ifdef CONFIG_KALLSYMS
+static void seq_print_symbol(struct seq_file *m,
+			     const char *fmt, unsigned long address)
+{
+	char buffer[KSYM_SYMBOL_LEN];
+
+	sprint_symbol(buffer, address);
+	seq_printf(m, fmt, buffer);
+}
+#else
+# define seq_print_symbol(m, fmt, address) do { } while (0)
+#endif
+
+#ifndef CONFIG_64BIT
+# define IP_FMT "%08lx"
+#else
+# define IP_FMT "%016lx"
+#endif
+
+static void notrace seq_print_ip_sym(struct seq_file *m,
+				     unsigned long ip)
+{
+	seq_print_symbol(m, "%s", ip);
+	seq_printf(m, " <" IP_FMT ">", ip);
+}
+
+static int s_show(struct seq_file *m, void *v)
+{
+	struct mctracer_iterator *iter = v;
+
+	if (iter->ent == NULL) {
+		seq_printf(m, "mctracer:\n");
+	} else {
+		seq_printf(m, "  CPU %d:  ", iter->cpu);
+		seq_print_ip_sym(m, iter->ent->ip);
+		if (iter->ent->parent_ip) {
+			seq_printf(m, " <-- ");
+			seq_print_ip_sym(m, iter->ent->parent_ip);
+		}
+		seq_printf(m, "\n");
+	}
+
+	return 0;
+}
+
+static struct seq_operations mctrace_seq_ops = {
+	.start = s_start,
+	.next = s_next,
+	.stop = s_stop,
+	.show = s_show,
+};
+
+static int mctrace_open(struct inode *inode, struct file *file)
+{
+	struct mctracer_iterator *iter;
+	int ret;
+
+	iter = kzalloc(sizeof(*iter), GFP_KERNEL);
+	if (!iter)
+		return -ENOMEM;
+
+	iter->tr = &mctracer_trace;
+
+	/* TODO stop tracer */
+	ret = seq_open(file, &mctrace_seq_ops);
+	if (!ret) {
+		struct seq_file *m = file->private_data;
+		m->private = iter;
+	} else
+		kfree(iter);
+
+	return ret;
+}
+
+int mctrace_release(struct inode *inode, struct file *file)
+{
+	struct seq_file *m = (struct seq_file *)file->private_data;
+	struct mctracer_iterator *iter = m->private;
+
+	seq_release(inode, file);
+	kfree(iter);
+	return 0;
+}
+
+static struct file_operations mctrace_fops = {
+	.open = mctrace_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = mctrace_release,
+};
+
 static int mctracer_open_generic(struct inode *inode, struct file *filp)
 {
 	filp->private_data = inode->i_private;
@@ -120,7 +322,13 @@ static ssize_t mctracer_ctrl_write(struc
 
 	val = !!simple_strtoul(buf, NULL, 10);
 
-	tr->ctrl = val;
+	if (tr->ctrl ^ val) {
+		if (val)
+			trace_enabled = 1;
+		else
+			trace_enabled = 0;
+		tr->ctrl = val;
+	}
 
 	filp->f_pos += cnt;
 
@@ -148,6 +356,11 @@ static void mctrace_init_debugfs(void)
 				    &mctracer_trace, &mctracer_ctrl_fops);
 	if (!entry)
 		pr_warning("Could not create debugfs 'ctrl' entry\n");
+
+	entry = debugfs_create_file("trace", 0444, d_mctracer,
+				    &mctracer_trace, &mctrace_fops);
+	if (!entry)
+		pr_warning("Could not create debugfs 'trace' entry\n");
 }
 #else /* CONFIG_DEBUG_FS */
 static void mctrace_init_debugfs(void)

-- 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC PATCH 09/30 v3] mcount tracer show task comm and pid
  2008-01-15 20:49 [RFC PATCH 00/30 v3] mcount and latency tracing utility -v3 Steven Rostedt
                   ` (7 preceding siblings ...)
  2008-01-15 20:49 ` [RFC PATCH 08/30 v3] mcount tracer output file Steven Rostedt
@ 2008-01-15 20:49 ` Steven Rostedt
  2008-01-15 20:49 ` [RFC PATCH 10/30 v3] Add a symbol only trace output Steven Rostedt
                   ` (20 subsequent siblings)
  29 siblings, 0 replies; 35+ messages in thread
From: Steven Rostedt @ 2008-01-15 20:49 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Tim Bird,
	Sam Ravnborg, Frank Ch. Eigler, Jan Kiszka, Steven Rostedt

[-- Attachment #1: mcount-tracer-add-pid-comm.patch --]
[-- Type: text/plain, Size: 2650 bytes --]

This adds the task comm and pid to the trace output. This gives
output like:

CPU 0: sshd:2605 [<ffffffff80251858>] remove_wait_queue+0xc/0x4a <-- [<ffffffff802ad7be>] free_poll_entry+0x1e/0x2a
CPU 2: bash:2610 [<ffffffff8038c3aa>] tty_check_change+0x9/0xb6 <-- [<ffffffff8038d295>] tty_ioctl+0x59f/0xcdd
CPU 0: sshd:2605 [<ffffffff80491ec6>] _spin_lock_irqsave+0xe/0x81 <-- [<ffffffff80251863>] remove_wait_queue+0x17/0x4a
CPU 2: bash:2610 [<ffffffff8024e2f7>] find_vpid+0x9/0x24 <-- [<ffffffff8038d325>] tty_ioctl+0x62f/0xcdd
CPU 0: sshd:2605 [<ffffffff804923ec>] _spin_unlock_irqrestore+0x9/0x3a <-- [<ffffffff80251891>] remove_wait_queue+0x45/0x4a
CPU 0: sshd:2605 [<ffffffff802a18b3>] fput+0x9/0x1b <-- [<ffffffff802ad7c6>] free_poll_entry+0x26/0x2a


Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 lib/tracing/tracer.c |    6 +++++-
 lib/tracing/tracer.h |    3 +++
 2 files changed, 8 insertions(+), 1 deletion(-)

Index: linux-compile.git/lib/tracing/tracer.c
===================================================================
--- linux-compile.git.orig/lib/tracing/tracer.c	2008-01-14 13:14:13.000000000 -0500
+++ linux-compile.git/lib/tracing/tracer.c	2008-01-14 14:57:57.000000000 -0500
@@ -35,6 +35,7 @@ mctracer_add_trace_entry(struct mctracer
 {
 	unsigned long idx, idx_next;
 	struct mctracer_entry *entry;
+	struct task_struct *tsk = current;
 
 	idx = tr->trace_idx[cpu];
 	idx_next = idx + 1;
@@ -53,6 +54,8 @@ mctracer_add_trace_entry(struct mctracer
 	entry->idx	 = atomic_inc_return(&tr->cnt);
 	entry->ip	 = ip;
 	entry->parent_ip = parent_ip;
+	entry->pid	 = tsk->pid;
+	memcpy(entry->comm, tsk->comm, TASK_COMM_LEN);
 }
 
 static notrace void trace_function(const unsigned long ip,
@@ -227,7 +230,8 @@ static int s_show(struct seq_file *m, vo
 	if (iter->ent == NULL) {
 		seq_printf(m, "mctracer:\n");
 	} else {
-		seq_printf(m, "  CPU %d:  ", iter->cpu);
+		seq_printf(m, "CPU %d: ", iter->cpu);
+		seq_printf(m, "%s:%d ", iter->ent->comm, iter->ent->pid);
 		seq_print_ip_sym(m, iter->ent->ip);
 		if (iter->ent->parent_ip) {
 			seq_printf(m, " <-- ");
Index: linux-compile.git/lib/tracing/tracer.h
===================================================================
--- linux-compile.git.orig/lib/tracing/tracer.h	2008-01-14 13:14:13.000000000 -0500
+++ linux-compile.git/lib/tracing/tracer.h	2008-01-14 14:57:57.000000000 -0500
@@ -2,11 +2,14 @@
 #define _LINUX_MCOUNT_TRACER_H
 
 #include <asm/atomic.h>
+#include <linux/sched.h>
 
 struct mctracer_entry {
 	unsigned long idx;
 	unsigned long ip;
 	unsigned long parent_ip;
+	char comm[TASK_COMM_LEN];
+	pid_t pid;
 };
 
 struct mctracer_trace {

-- 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC PATCH 10/30 v3] Add a symbol only trace output
  2008-01-15 20:49 [RFC PATCH 00/30 v3] mcount and latency tracing utility -v3 Steven Rostedt
                   ` (8 preceding siblings ...)
  2008-01-15 20:49 ` [RFC PATCH 09/30 v3] mcount tracer show task comm and pid Steven Rostedt
@ 2008-01-15 20:49 ` Steven Rostedt
  2008-01-15 20:49 ` [RFC PATCH 11/30 v3] Reset the tracer when started Steven Rostedt
                   ` (19 subsequent siblings)
  29 siblings, 0 replies; 35+ messages in thread
From: Steven Rostedt @ 2008-01-15 20:49 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Tim Bird,
	Sam Ravnborg, Frank Ch. Eigler, Jan Kiszka, Steven Rostedt

[-- Attachment #1: mcount-tracer-symbol-only.patch --]
[-- Type: text/plain, Size: 4934 bytes --]

The trace output is very verbose, outputting both the
IP address (Instruction Pointer, not Internet Protocol!)
and the kallsyms symbol. So if kallsyms is configured into
the kernel, another file, iter_ctrl, is added to debugfs.

 echo symonly > /debugfs/tracing/iter_ctrl

will turn off printing of instruction pointers.

 echo nosymonly > /debugfs/tracing/iter_ctrl

will turn them back on.

Here's an example:

CPU 1: swapper:0 smp_apic_timer_interrupt+0xc/0x58 <-- apic_timer_interrupt+0x66/0x70
CPU 1: swapper:0 exit_idle+0x9/0x22 <-- smp_apic_timer_interrupt+0x35/0x58
CPU 0: sshd:2611 _spin_unlock+0x9/0x38 <-- __qdisc_run+0xb2/0x1a1
CPU 1: swapper:0 __exit_idle+0x9/0x2e <-- exit_idle+0x20/0x22
CPU 0: sshd:2611 _spin_lock+0xe/0x7a <-- __qdisc_run+0xba/0x1a1
CPU 1: swapper:0 atomic_notifier_call_chain+0x9/0x16 <-- __exit_idle+0x2c/0x2e
CPU 1: swapper:0 __atomic_notifier_call_chain+0xe/0x56 <-- atomic_notifier_call_chain+0x14/0x16


Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 lib/tracing/tracer.c |   65 +++++++++++++++++++++++++++++++++++++++++++++++----
 lib/tracing/tracer.h |    1 
 2 files changed, 62 insertions(+), 4 deletions(-)

Index: linux-compile.git/lib/tracing/tracer.c
===================================================================
--- linux-compile.git.orig/lib/tracing/tracer.c	2008-01-15 14:57:29.000000000 -0500
+++ linux-compile.git/lib/tracing/tracer.c	2008-01-15 14:57:36.000000000 -0500
@@ -89,6 +89,10 @@ static struct mcount_ops trace_ops __rea
 };
 
 #ifdef CONFIG_DEBUG_FS
+enum trace_iterator {
+	TRACE_ITER_SYM_ONLY	= 1,
+};
+
 struct mctracer_iterator {
 	struct mctracer_trace *tr;
 	struct mctracer_entry *ent;
@@ -217,25 +221,29 @@ static void seq_print_symbol(struct seq_
 #endif
 
 static void notrace seq_print_ip_sym(struct seq_file *m,
-				     unsigned long ip)
+				     unsigned long ip,
+				     int sym_only)
 {
 	seq_print_symbol(m, "%s", ip);
-	seq_printf(m, " <" IP_FMT ">", ip);
+	if (!sym_only)
+		seq_printf(m, " <" IP_FMT ">", ip);
 }
 
 static int s_show(struct seq_file *m, void *v)
 {
 	struct mctracer_iterator *iter = v;
+	int sym_only = !!(iter->tr->iter_flags & TRACE_ITER_SYM_ONLY);
 
 	if (iter->ent == NULL) {
 		seq_printf(m, "mctracer:\n");
 	} else {
 		seq_printf(m, "CPU %d: ", iter->cpu);
 		seq_printf(m, "%s:%d ", iter->ent->comm, iter->ent->pid);
-		seq_print_ip_sym(m, iter->ent->ip);
+		seq_print_ip_sym(m, iter->ent->ip, sym_only);
 		if (iter->ent->parent_ip) {
 			seq_printf(m, " <-- ");
-			seq_print_ip_sym(m, iter->ent->parent_ip);
+			seq_print_ip_sym(m, iter->ent->parent_ip,
+					 sym_only);
 		}
 		seq_printf(m, "\n");
 	}
@@ -345,6 +353,50 @@ static struct file_operations mctracer_c
 	.write = mctracer_ctrl_write,
 };
 
+static ssize_t mctracer_iter_ctrl_read(struct file *filp, char __user *ubuf,
+				       size_t cnt, loff_t *ppos)
+{
+	struct mctracer_trace *tr = filp->private_data;
+	char buf[64];
+	int r = 0;
+
+	if (tr->iter_flags & TRACE_ITER_SYM_ONLY)
+		r = sprintf(buf, "%s", "symonly ");
+	r += sprintf(buf+r, "\n");
+
+	return simple_read_from_buffer(ubuf, cnt, ppos,
+				       buf, r);
+}
+
+static ssize_t mctracer_iter_ctrl_write(struct file *filp,
+					const char __user *ubuf,
+					size_t cnt, loff_t *ppos)
+{
+	struct mctracer_trace *tr = filp->private_data;
+	char buf[64];
+
+	if (cnt > 63)
+		cnt = 63;
+
+	if (copy_from_user(&buf, ubuf, cnt))
+		return -EFAULT;
+
+	buf[cnt] = 0;
+
+	if (strncmp(buf, "symonly", 7) == 0)
+		tr->iter_flags |= TRACE_ITER_SYM_ONLY;
+
+	filp->f_pos += cnt;
+
+	return cnt;
+}
+
+static struct file_operations mctracer_iter_fops = {
+	.open = mctracer_open_generic,
+	.read = mctracer_iter_ctrl_read,
+	.write = mctracer_iter_ctrl_write,
+};
+
 static void mctrace_init_debugfs(void)
 {
 	struct dentry *d_mctracer;
@@ -361,10 +413,15 @@ static void mctrace_init_debugfs(void)
 	if (!entry)
 		pr_warning("Could not create debugfs 'ctrl' entry\n");
 
+	entry = debugfs_create_file("iter_ctrl", 0644, d_mctracer,
+				    &mctracer_trace, &mctracer_iter_fops);
+	if (!entry)
+		pr_warning("Could not create debugfs 'iter_ctrl' entry\n");
 	entry = debugfs_create_file("trace", 0444, d_mctracer,
 				    &mctracer_trace, &mctrace_fops);
 	if (!entry)
 		pr_warning("Could not create debugfs 'trace' entry\n");
+
 }
 #else /* CONFIG_DEBUG_FS */
 static void mctrace_init_debugfs(void)
Index: linux-compile.git/lib/tracing/tracer.h
===================================================================
--- linux-compile.git.orig/lib/tracing/tracer.h	2008-01-15 14:57:29.000000000 -0500
+++ linux-compile.git/lib/tracing/tracer.h	2008-01-15 14:57:36.000000000 -0500
@@ -17,6 +17,7 @@ struct mctracer_trace {
 	unsigned long trace_idx[NR_CPUS];
 	unsigned long entries;
 	long	      ctrl;
+	unsigned long iter_flags;
 	atomic_t      cnt;
 	atomic_t      disabled[NR_CPUS];
 	atomic_t      underrun[NR_CPUS];

-- 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC PATCH 11/30 v3] Reset the tracer when started
  2008-01-15 20:49 [RFC PATCH 00/30 v3] mcount and latency tracing utility -v3 Steven Rostedt
                   ` (9 preceding siblings ...)
  2008-01-15 20:49 ` [RFC PATCH 10/30 v3] Add a symbol only trace output Steven Rostedt
@ 2008-01-15 20:49 ` Steven Rostedt
  2008-01-15 20:49 ` [RFC PATCH 12/30 v3] separate out the percpu data into a percpu struct Steven Rostedt
                   ` (18 subsequent siblings)
  29 siblings, 0 replies; 35+ messages in thread
From: Steven Rostedt @ 2008-01-15 20:49 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Tim Bird,
	Sam Ravnborg, Frank Ch. Eigler, Jan Kiszka, Steven Rostedt

[-- Attachment #1: mcount-tracer-clear-buffer.patch --]
[-- Type: text/plain, Size: 1081 bytes --]

This patch resets the trace when it is started by the user.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 lib/tracing/tracer.c |   14 ++++++++++++++
 1 file changed, 14 insertions(+)

Index: linux-compile.git/lib/tracing/tracer.c
===================================================================
--- linux-compile.git.orig/lib/tracing/tracer.c	2008-01-14 13:14:13.000000000 -0500
+++ linux-compile.git/lib/tracing/tracer.c	2008-01-14 14:57:54.000000000 -0500
@@ -88,6 +88,16 @@ static struct mcount_ops trace_ops __rea
 	.func = trace_function,
 };
 
+static notrace void mctracer_reset(struct mctracer_trace *tr)
+{
+	int cpu;
+
+	for_each_online_cpu(cpu) {
+		tr->trace_idx[cpu] = 0;
+		atomic_set(&tr->underrun[cpu], 0);
+	}
+}
+
 #ifdef CONFIG_DEBUG_FS
 enum trace_iterator {
 	TRACE_ITER_SYM_ONLY	= 1,
@@ -334,6 +344,10 @@ static ssize_t mctracer_ctrl_write(struc
 
 	val = !!simple_strtoul(buf, NULL, 10);
 
+	/* When starting a new trace, reset the buffers */
+	if (val)
+		mctracer_reset(tr);
+
 	if (tr->ctrl ^ val) {
 		if (val)
 			trace_enabled = 1;

-- 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC PATCH 12/30 v3] separate out the percpu data into a percpu struct
  2008-01-15 20:49 [RFC PATCH 00/30 v3] mcount and latency tracing utility -v3 Steven Rostedt
                   ` (10 preceding siblings ...)
  2008-01-15 20:49 ` [RFC PATCH 11/30 v3] Reset the tracer when started Steven Rostedt
@ 2008-01-15 20:49 ` Steven Rostedt
  2008-01-15 20:49 ` [RFC PATCH 13/30 v3] handle accurate time keeping over long delays Steven Rostedt
                   ` (17 subsequent siblings)
  29 siblings, 0 replies; 35+ messages in thread
From: Steven Rostedt @ 2008-01-15 20:49 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Tim Bird,
	Sam Ravnborg, Frank Ch. Eigler, Jan Kiszka, Steven Rostedt

[-- Attachment #1: mcount-tracer-percpu-struct.patch --]
[-- Type: text/plain, Size: 5263 bytes --]

For better cacheline performance, this patch creates a separate
struct for each CPU with the percpu data grouped together.
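
To spell out the intent (my own illustration, not part of the patch): with
the old parallel arrays, trace_idx[cpu] and disabled[cpu] of neighbouring
CPUs can share a cache line, so two CPUs tracing at the same time keep
bouncing that line between them. Grouping each CPU's fields into one
per-cpu structure keeps every CPU writing to its own line:

struct mctracer_trace_cpu {
	void		*trace;		/* this CPU's ring buffer */
	unsigned long	trace_idx;	/* next slot to fill */
	atomic_t	disabled;	/* nesting/recursion protection */
	atomic_t	underrun;	/* entries lost to buffer wrap */
};

/* one instance per CPU, so the hot-path writes stay CPU-local */
static DEFINE_PER_CPU(struct mctracer_trace_cpu, mctracer_trace_cpu);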

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 lib/tracing/tracer.c |   42 +++++++++++++++++++++++-------------------
 lib/tracing/tracer.h |   12 ++++++++----
 2 files changed, 31 insertions(+), 23 deletions(-)

Index: linux-compile.git/lib/tracing/tracer.c
===================================================================
--- linux-compile.git.orig/lib/tracing/tracer.c	2008-01-14 13:14:13.000000000 -0500
+++ linux-compile.git/lib/tracing/tracer.c	2008-01-14 14:57:43.000000000 -0500
@@ -16,6 +16,7 @@
 #include <linux/module.h>
 #include <linux/linkage.h>
 #include <linux/seq_file.h>
+#include <linux/percpu.h>
 #include <linux/debugfs.h>
 #include <linux/kallsyms.h>
 #include <linux/uaccess.h>
@@ -25,6 +26,7 @@
 #include "tracer_interface.h"
 
 static struct mctracer_trace mctracer_trace;
+static DEFINE_PER_CPU(struct mctracer_trace_cpu, mctracer_trace_cpu);
 static int trace_enabled __read_mostly;
 
 static inline notrace void
@@ -36,21 +38,22 @@ mctracer_add_trace_entry(struct mctracer
 	unsigned long idx, idx_next;
 	struct mctracer_entry *entry;
 	struct task_struct *tsk = current;
+	struct mctracer_trace_cpu *data = tr->data[cpu];
 
-	idx = tr->trace_idx[cpu];
+	idx = data->trace_idx;
 	idx_next = idx + 1;
 
 	if (unlikely(idx_next >= tr->entries)) {
-		atomic_inc(&tr->underrun[cpu]);
+		atomic_inc(&data->underrun);
 		idx_next = 0;
 	}
 
-	tr->trace_idx[cpu] = idx_next;
+	data->trace_idx = idx_next;
 
-	if (unlikely(idx_next != 0 && atomic_read(&tr->underrun[cpu])))
-		atomic_inc(&tr->underrun[cpu]);
+	if (unlikely(idx_next != 0 && atomic_read(&data->underrun)))
+		atomic_inc(&data->underrun);
 
-	entry = tr->trace[cpu] + idx * MCTRACER_ENTRY_SIZE;
+	entry = data->trace + idx * MCTRACER_ENTRY_SIZE;
 	entry->idx	 = atomic_inc_return(&tr->cnt);
 	entry->ip	 = ip;
 	entry->parent_ip = parent_ip;
@@ -73,11 +76,11 @@ static notrace void trace_function(const
 
 	cpu = raw_smp_processor_id();
 
-	atomic_inc(&tr->disabled[cpu]);
-	if (likely(atomic_read(&tr->disabled[cpu]) == 1))
+	atomic_inc(&tr->data[cpu]->disabled);
+	if (likely(atomic_read(&tr->data[cpu]->disabled) == 1))
 		mctracer_add_trace_entry(tr, cpu, ip, parent_ip);
 
-	atomic_dec(&tr->disabled[cpu]);
+	atomic_dec(&tr->data[cpu]->disabled);
 
  out:
 	raw_local_irq_restore(flags);
@@ -93,8 +96,8 @@ static notrace void mctracer_reset(struc
 	int cpu;
 
 	for_each_online_cpu(cpu) {
-		tr->trace_idx[cpu] = 0;
-		atomic_set(&tr->underrun[cpu], 0);
+		tr->data[cpu]->trace_idx = 0;
+		atomic_set(&tr->data[cpu]->underrun, 0);
 	}
 }
 
@@ -115,16 +118,16 @@ static struct mctracer_entry *mctracer_e
 						 unsigned long idx,
 						 int cpu)
 {
-	struct mctracer_entry *array = tr->trace[cpu];
+	struct mctracer_entry *array = tr->data[cpu]->trace;
 	unsigned long underrun;
 
 	if (idx >= tr->entries)
 		return NULL;
 
-	underrun = atomic_read(&tr->underrun[cpu]);
+	underrun = atomic_read(&tr->data[cpu]->underrun);
 	if (underrun)
 		idx = ((underrun - 1) + idx) % tr->entries;
-	else if (idx >= tr->trace_idx[cpu])
+	else if (idx >= tr->data[cpu]->trace_idx)
 		return NULL;
 
 	return &array[idx];
@@ -139,7 +142,7 @@ static void *find_next_entry(struct mctr
 	int i;
 
 	for_each_possible_cpu(i) {
-		if (!tr->trace[i])
+		if (!tr->data[i]->trace)
 			continue;
 		ent = mctracer_entry_idx(tr, iter->next_idx[i], i);
 		if (ent && (!next || next->idx > ent->idx)) {
@@ -461,6 +464,7 @@ static notrace int mctracer_alloc_buffer
 	int i;
 
 	for_each_possible_cpu(i) {
+		mctracer_trace.data[i] = &per_cpu(mctracer_trace_cpu, i);
 		array = (struct mctracer_entry *)
 			  __get_free_pages(GFP_KERNEL, order);
 		if (array == NULL) {
@@ -468,7 +472,7 @@ static notrace int mctracer_alloc_buffer
 			       " %ld bytes for trace buffer!\n", size);
 			goto free_buffers;
 		}
-		mctracer_trace.trace[i] = array;
+		mctracer_trace.data[i]->trace = array;
 	}
 
 	/*
@@ -489,10 +493,10 @@ static notrace int mctracer_alloc_buffer
 
  free_buffers:
 	for (i-- ; i >= 0; i--) {
-		if (mctracer_trace.trace[i]) {
-			free_pages((unsigned long)mctracer_trace.trace[i],
+		if (mctracer_trace.data[i] && mctracer_trace.data[i]->trace) {
+			free_pages((unsigned long)mctracer_trace.data[i]->trace,
 				   order);
-			mctracer_trace.trace[i] = NULL;
+			mctracer_trace.data[i]->trace = NULL;
 		}
 	}
 	return -ENOMEM;
Index: linux-compile.git/lib/tracing/tracer.h
===================================================================
--- linux-compile.git.orig/lib/tracing/tracer.h	2008-01-14 13:14:13.000000000 -0500
+++ linux-compile.git/lib/tracing/tracer.h	2008-01-14 14:57:43.000000000 -0500
@@ -12,15 +12,19 @@ struct mctracer_entry {
 	pid_t pid;
 };
 
+struct mctracer_trace_cpu {
+	void *trace;
+	unsigned long trace_idx;
+	atomic_t      disabled;
+	atomic_t      underrun;
+};
+
 struct mctracer_trace {
-	void	      *trace[NR_CPUS];
-	unsigned long trace_idx[NR_CPUS];
 	unsigned long entries;
 	long	      ctrl;
 	unsigned long iter_flags;
 	atomic_t      cnt;
-	atomic_t      disabled[NR_CPUS];
-	atomic_t      underrun[NR_CPUS];
+	struct mctracer_trace_cpu *data[NR_CPUS];
 };
 
 #endif /* _LINUX_MCOUNT_TRACER_H */

-- 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC PATCH 13/30 v3] handle accurate time keeping over long delays
  2008-01-15 20:49 [RFC PATCH 00/30 v3] mcount and latency tracing utility -v3 Steven Rostedt
                   ` (11 preceding siblings ...)
  2008-01-15 20:49 ` [RFC PATCH 12/30 v3] separate out the percpu data into a percpu struct Steven Rostedt
@ 2008-01-15 20:49 ` Steven Rostedt
  2008-01-15 20:49 ` [RFC PATCH 14/30 v3] ppc clock accumulate fix Steven Rostedt
                   ` (16 subsequent siblings)
  29 siblings, 0 replies; 35+ messages in thread
From: Steven Rostedt @ 2008-01-15 20:49 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Tim Bird,
	Sam Ravnborg, Frank Ch. Eigler, Jan Kiszka, John Stultz,
	Steven Rostedt

[-- Attachment #1: rt-time-starvation-fix.patch --]
[-- Type: text/plain, Size: 9388 bytes --]

Handle time accurately even if there is a long delay between
clock cycle accumulations.
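
As a concrete example of the problem (my numbers, not from the patch):
the ACPI PM timer is a 24-bit counter running at roughly 3.58 MHz, so it
wraps about every 4.7 seconds. If the old

	offset = (clocksource_read(clock) - clock->cycle_last) & clock->mask;

calculation is only done when update_wall_time() runs, a delay longer
than one wrap silently drops whole wrap periods. The patch instead lets
cycles be folded into cycle_accumulated periodically (before the
hardware counter can wrap), while readers add the accumulated part back
in via clocksource_get_cycles().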

Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 arch/x86/kernel/vsyscall_64.c |    5 ++-
 include/asm-x86/vgtod.h       |    2 -
 include/linux/clocksource.h   |   58 ++++++++++++++++++++++++++++++++++++++++--
 kernel/time/timekeeping.c     |   35 +++++++++++++------------
 4 files changed, 80 insertions(+), 20 deletions(-)

Index: linux-compile.git/arch/x86/kernel/vsyscall_64.c
===================================================================
--- linux-compile.git.orig/arch/x86/kernel/vsyscall_64.c	2008-01-14 13:14:13.000000000 -0500
+++ linux-compile.git/arch/x86/kernel/vsyscall_64.c	2008-01-14 13:14:13.000000000 -0500
@@ -86,6 +86,7 @@ void update_vsyscall(struct timespec *wa
 	vsyscall_gtod_data.clock.mask = clock->mask;
 	vsyscall_gtod_data.clock.mult = clock->mult;
 	vsyscall_gtod_data.clock.shift = clock->shift;
+	vsyscall_gtod_data.clock.cycle_accumulated = clock->cycle_accumulated;
 	vsyscall_gtod_data.wall_time_sec = wall_time->tv_sec;
 	vsyscall_gtod_data.wall_time_nsec = wall_time->tv_nsec;
 	vsyscall_gtod_data.wall_to_monotonic = wall_to_monotonic;
@@ -121,7 +122,7 @@ static __always_inline long time_syscall
 
 static __always_inline void do_vgettimeofday(struct timeval * tv)
 {
-	cycle_t now, base, mask, cycle_delta;
+	cycle_t now, base, accumulated, mask, cycle_delta;
 	unsigned seq;
 	unsigned long mult, shift, nsec;
 	cycle_t (*vread)(void);
@@ -135,6 +136,7 @@ static __always_inline void do_vgettimeo
 		}
 		now = vread();
 		base = __vsyscall_gtod_data.clock.cycle_last;
+		accumulated  = __vsyscall_gtod_data.clock.cycle_accumulated;
 		mask = __vsyscall_gtod_data.clock.mask;
 		mult = __vsyscall_gtod_data.clock.mult;
 		shift = __vsyscall_gtod_data.clock.shift;
@@ -145,6 +147,7 @@ static __always_inline void do_vgettimeo
 
 	/* calculate interval: */
 	cycle_delta = (now - base) & mask;
+	cycle_delta += accumulated;
 	/* convert to nsecs: */
 	nsec += (cycle_delta * mult) >> shift;
 
Index: linux-compile.git/include/asm-x86/vgtod.h
===================================================================
--- linux-compile.git.orig/include/asm-x86/vgtod.h	2008-01-14 13:13:44.000000000 -0500
+++ linux-compile.git/include/asm-x86/vgtod.h	2008-01-14 13:14:13.000000000 -0500
@@ -15,7 +15,7 @@ struct vsyscall_gtod_data {
 	struct timezone sys_tz;
 	struct { /* extract of a clocksource struct */
 		cycle_t (*vread)(void);
-		cycle_t	cycle_last;
+		cycle_t	cycle_last, cycle_accumulated;
 		cycle_t	mask;
 		u32	mult;
 		u32	shift;
Index: linux-compile.git/include/linux/clocksource.h
===================================================================
--- linux-compile.git.orig/include/linux/clocksource.h	2008-01-14 13:13:44.000000000 -0500
+++ linux-compile.git/include/linux/clocksource.h	2008-01-14 14:57:48.000000000 -0500
@@ -50,8 +50,12 @@ struct clocksource;
  * @flags:		flags describing special properties
  * @vread:		vsyscall based read
  * @resume:		resume function for the clocksource, if necessary
+ * @cycle_last:		Used internally by timekeeping core, please ignore.
+ * @cycle_accumulated:	Used internally by timekeeping core, please ignore.
  * @cycle_interval:	Used internally by timekeeping core, please ignore.
  * @xtime_interval:	Used internally by timekeeping core, please ignore.
+ * @xtime_nsec:		Used internally by timekeeping core, please ignore.
+ * @error:		Used internally by timekeeping core, please ignore.
  */
 struct clocksource {
 	/*
@@ -82,7 +86,10 @@ struct clocksource {
 	 * Keep it in a different cache line to dirty no
 	 * more than one cache line.
 	 */
-	cycle_t cycle_last ____cacheline_aligned_in_smp;
+	struct {
+		cycle_t cycle_last, cycle_accumulated;
+	} ____cacheline_aligned_in_smp;
+
 	u64 xtime_nsec;
 	s64 error;
 
@@ -168,11 +175,44 @@ static inline cycle_t clocksource_read(s
 }
 
 /**
+ * clocksource_get_cycles: - Access the clocksource's accumulated cycle value
+ * @cs:		pointer to clocksource being read
+ * @now:	current cycle value
+ *
+ * Uses the clocksource to return the current cycle_t value.
+ * NOTE!!!: This is different from clocksource_read, because it
+ * returns the accumulated cycle value! Must hold xtime lock!
+ */
+static inline cycle_t
+clocksource_get_cycles(struct clocksource *cs, cycle_t now)
+{
+	cycle_t offset = (now - cs->cycle_last) & cs->mask;
+	offset += cs->cycle_accumulated;
+	return offset;
+}
+
+/**
+ * clocksource_accumulate: - Accumulates clocksource cycles
+ * @cs:		pointer to clocksource being read
+ * @now:	current cycle value
+ *
+ * Used to avoid clocksource hardware overflow by periodically
+ * accumulating the current cycle delta. Must hold xtime write lock!
+ */
+static inline void clocksource_accumulate(struct clocksource *cs, cycle_t now)
+{
+	cycle_t offset = (now - cs->cycle_last) & cs->mask;
+	cs->cycle_last = now;
+	cs->cycle_accumulated += offset;
+}
+
+/**
  * cyc2ns - converts clocksource cycles to nanoseconds
  * @cs:		Pointer to clocksource
  * @cycles:	Cycles
  *
  * Uses the clocksource and ntp ajdustment to convert cycle_ts to nanoseconds.
+ * Must hold xtime lock!
  *
  * XXX - This could use some mult_lxl_ll() asm optimization
  */
@@ -184,13 +224,27 @@ static inline s64 cyc2ns(struct clocksou
 }
 
 /**
+ * ns2cyc - converts nanoseconds to clocksource cycles
+ * @cs:		Pointer to clocksource
+ * @nsecs:	Nanoseconds
+ */
+static inline cycle_t ns2cyc(struct clocksource *cs, u64 nsecs)
+{
+	cycle_t ret = nsecs << cs->shift;
+
+	do_div(ret, cs->mult + 1);
+
+	return ret;
+}
+
+/**
  * clocksource_calculate_interval - Calculates a clocksource interval struct
  *
  * @c:		Pointer to clocksource.
  * @length_nsec: Desired interval length in nanoseconds.
  *
  * Calculates a fixed cycle/nsec interval for a given clocksource/adjustment
- * pair and interval request.
+ * pair and interval request. Must hold xtime_lock!
  *
  * Unless you're the timekeeping code, you should not be using this!
  */
Index: linux-compile.git/kernel/time/timekeeping.c
===================================================================
--- linux-compile.git.orig/kernel/time/timekeeping.c	2008-01-14 13:13:44.000000000 -0500
+++ linux-compile.git/kernel/time/timekeeping.c	2008-01-14 14:57:50.000000000 -0500
@@ -66,16 +66,10 @@ static struct clocksource *clock; /* poi
  */
 static inline s64 __get_nsec_offset(void)
 {
-	cycle_t cycle_now, cycle_delta;
+	cycle_t cycle_delta;
 	s64 ns_offset;
 
-	/* read clocksource: */
-	cycle_now = clocksource_read(clock);
-
-	/* calculate the delta since the last update_wall_time: */
-	cycle_delta = (cycle_now - clock->cycle_last) & clock->mask;
-
-	/* convert to nanoseconds: */
+	cycle_delta = clocksource_get_cycles(clock, clocksource_read(clock));
 	ns_offset = cyc2ns(clock, cycle_delta);
 
 	return ns_offset;
@@ -195,7 +189,7 @@ static void change_clocksource(void)
 
 	clock = new;
 	clock->cycle_last = now;
-
+	clock->cycle_accumulated = 0;
 	clock->error = 0;
 	clock->xtime_nsec = 0;
 	clocksource_calculate_interval(clock, NTP_INTERVAL_LENGTH);
@@ -205,9 +199,15 @@ static void change_clocksource(void)
 	printk(KERN_INFO "Time: %s clocksource has been installed.\n",
 	       clock->name);
 }
+
+void timekeeping_accumulate(void)
+{
+	clocksource_accumulate(clock, clocksource_read(clock));
+}
 #else
 static inline void change_clocksource(void) { }
 static inline s64 __get_nsec_offset(void) { return 0; }
+void timekeeping_accumulate(void) { }
 #endif
 
 /**
@@ -302,6 +302,7 @@ static int timekeeping_resume(struct sys
 	timespec_add_ns(&xtime, timekeeping_suspend_nsecs);
 	/* re-base the last cycle value */
 	clock->cycle_last = clocksource_read(clock);
+	clock->cycle_accumulated = 0;
 	clock->error = 0;
 	timekeeping_suspended = 0;
 	write_sequnlock_irqrestore(&xtime_lock, flags);
@@ -448,27 +449,29 @@ static void clocksource_adjust(s64 offse
  */
 void update_wall_time(void)
 {
-	cycle_t offset;
+	cycle_t cycle_now, offset;
 
 	/* Make sure we're fully resumed: */
 	if (unlikely(timekeeping_suspended))
 		return;
 
 #ifdef CONFIG_GENERIC_TIME
-	offset = (clocksource_read(clock) - clock->cycle_last) & clock->mask;
+	cycle_now = clocksource_read(clock);
 #else
-	offset = clock->cycle_interval;
+	cycle_now = clock->cycle_last + clock->cycle_interval;
 #endif
+	offset = (cycle_now - clock->cycle_last) & clock->mask;
+	clocksource_accumulate(clock, cycle_now);
+
 	clock->xtime_nsec += (s64)xtime.tv_nsec << clock->shift;
 
 	/* normally this loop will run just once, however in the
 	 * case of lost or late ticks, it will accumulate correctly.
 	 */
-	while (offset >= clock->cycle_interval) {
+	while (clock->cycle_accumulated >= clock->cycle_interval) {
 		/* accumulate one interval */
 		clock->xtime_nsec += clock->xtime_interval;
-		clock->cycle_last += clock->cycle_interval;
-		offset -= clock->cycle_interval;
+		clock->cycle_accumulated -= clock->cycle_interval;
 
 		if (clock->xtime_nsec >= (u64)NSEC_PER_SEC << clock->shift) {
 			clock->xtime_nsec -= (u64)NSEC_PER_SEC << clock->shift;
@@ -482,7 +485,7 @@ void update_wall_time(void)
 	}
 
 	/* correct the clock when NTP error is too big */
-	clocksource_adjust(offset);
+	clocksource_adjust(clock->cycle_accumulated);
 
 	/* store full nanoseconds into xtime */
 	xtime.tv_nsec = (s64)clock->xtime_nsec >> clock->shift;

-- 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC PATCH 14/30 v3] ppc clock accumulate fix
  2008-01-15 20:49 [RFC PATCH 00/30 v3] mcount and latency tracing utility -v3 Steven Rostedt
                   ` (12 preceding siblings ...)
  2008-01-15 20:49 ` [RFC PATCH 13/30 v3] handle accurate time keeping over long delays Steven Rostedt
@ 2008-01-15 20:49 ` Steven Rostedt
  2008-01-15 20:49 ` [RFC PATCH 15/30 v3] Fixup merge between xtime_cache and timekeeping starvation fix Steven Rostedt
                   ` (15 subsequent siblings)
  29 siblings, 0 replies; 35+ messages in thread
From: Steven Rostedt @ 2008-01-15 20:49 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Tim Bird,
	Sam Ravnborg, Frank Ch. Eigler, Jan Kiszka, John Stultz,
	Steven Rostedt

[-- Attachment #1: ppc-clock-accumulate-update.patch --]
[-- Type: text/plain, Size: 1119 bytes --]


The following is a quick and dirty fix for powerpc so that it includes
cycle_accumulated in its calculation. It relies on the fact that the
powerpc clocksource is a 64-bit counter (no need to worry about
multiple overflows), so the subtraction should be safe.
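
As a sanity check on that subtraction (my arithmetic, not part of the
patch): update_gtod() wants a base cycle value that still accounts for
the cycles sitting in cycle_accumulated. With

	base = clock->cycle_last - clock->cycle_accumulated;

a vsyscall reader computing (now - base) gets
(now - cycle_last) + cycle_accumulated, which is exactly what
clocksource_get_cycles() reports on the timekeeping side. Since the
powerpc timebase is a true 64-bit counter, the subtraction cannot be
confused by a mask wrap.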

Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 arch/powerpc/kernel/time.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-compile.git/arch/powerpc/kernel/time.c
===================================================================
--- linux-compile.git.orig/arch/powerpc/kernel/time.c	2008-01-14 13:13:43.000000000 -0500
+++ linux-compile.git/arch/powerpc/kernel/time.c	2008-01-14 13:14:13.000000000 -0500
@@ -773,7 +773,8 @@ void update_vsyscall(struct timespec *wa
 	stamp_xsec = (u64) xtime.tv_nsec * XSEC_PER_SEC;
 	do_div(stamp_xsec, 1000000000);
 	stamp_xsec += (u64) xtime.tv_sec * XSEC_PER_SEC;
-	update_gtod(clock->cycle_last, stamp_xsec, t2x);
+	update_gtod(clock->cycle_last-clock->cycle_accumulated,
+		    stamp_xsec, t2x);
 }
 
 void update_vsyscall_tz(void)

-- 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC PATCH 15/30 v3] Fixup merge between xtime_cache and timekeeping starvation fix
  2008-01-15 20:49 [RFC PATCH 00/30 v3] mcount and latency tracing utility -v3 Steven Rostedt
                   ` (13 preceding siblings ...)
  2008-01-15 20:49 ` [RFC PATCH 14/30 v3] ppc clock accumulate fix Steven Rostedt
@ 2008-01-15 20:49 ` Steven Rostedt
  2008-01-15 20:49 ` [RFC PATCH 16/30 v3] time keeping add cycle_raw for actual incrementation Steven Rostedt
                   ` (14 subsequent siblings)
  29 siblings, 0 replies; 35+ messages in thread
From: Steven Rostedt @ 2008-01-15 20:49 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Tim Bird,
	Sam Ravnborg, Frank Ch. Eigler, Jan Kiszka, John Stultz,
	Steven Rostedt

[-- Attachment #1: fixup-merge-between-xtime-cache-and-starvation.patch --]
[-- Type: text/plain, Size: 3141 bytes --]

>  void update_wall_time(void)
>  {
> -	cycle_t offset;
> +	cycle_t cycle_now, offset;
> 
>  	/* Make sure we're fully resumed: */
>  	if (unlikely(timekeeping_suspended))
>  		return;
> 
>  #ifdef CONFIG_GENERIC_TIME
> -	offset = (clocksource_read(clock) - clock->cycle_last) & clock->mask;
> +	cycle_now = clocksource_read(clock);
>  #else
> -	offset = clock->cycle_interval;
> +	cycle_now = clock->cycle_last + clock->cycle_interval;
>  #endif
> +	offset = (cycle_now - clock->cycle_last) & clock->mask;

It seems this offset addition was to merge against the colliding
xtime_cache changes in mainline. However, I don't think it's quite right,
and it might be causing incorrect time() or vtime() results if NO_HZ is
enabled.

> +	clocksource_accumulate(clock, cycle_now);
> +
>  	clock->xtime_nsec += (s64)xtime.tv_nsec << clock->shift;
> 
>  	/* normally this loop will run just once, however in the
>  	 * case of lost or late ticks, it will accumulate correctly.
>  	 */
> -	while (offset >= clock->cycle_interval) {
> +	while (clock->cycle_accumulated >= clock->cycle_interval) {
>  		/* accumulate one interval */
>  		clock->xtime_nsec += clock->xtime_interval;
> -		clock->cycle_last += clock->cycle_interval;
> -		offset -= clock->cycle_interval;
> +		clock->cycle_accumulated -= clock->cycle_interval;
> 
>  		if (clock->xtime_nsec >= (u64)NSEC_PER_SEC << clock->shift) {
>  			clock->xtime_nsec -= (u64)NSEC_PER_SEC << clock->shift;
> @@ -482,7 +485,7 @@ void update_wall_time(void)
>  	}
> 
>  	/* correct the clock when NTP error is too big */
> -	clocksource_adjust(offset);
> +	clocksource_adjust(clock->cycle_accumulated);


I suspect the following is needed, but haven't been able to test it yet.


Fixup merge between xtime_cache and timekeeping starvation fix.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 kernel/time/timekeeping.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

Index: linux-compile.git/kernel/time/timekeeping.c
===================================================================
--- linux-compile.git.orig/kernel/time/timekeeping.c	2008-01-15 10:20:15.000000000 -0500
+++ linux-compile.git/kernel/time/timekeeping.c	2008-01-15 10:20:50.000000000 -0500
@@ -449,7 +449,7 @@ static void clocksource_adjust(s64 offse
  */
 void update_wall_time(void)
 {
-	cycle_t cycle_now, offset;
+	cycle_t cycle_now;
 
 	/* Make sure we're fully resumed: */
 	if (unlikely(timekeeping_suspended))
@@ -460,7 +460,6 @@ void update_wall_time(void)
 #else
 	cycle_now = clock->cycle_last + clock->cycle_interval;
 #endif
-	offset = (cycle_now - clock->cycle_last) & clock->mask;
 	clocksource_accumulate(clock, cycle_now);
 
 	clock->xtime_nsec += (s64)xtime.tv_nsec << clock->shift;
@@ -491,7 +490,7 @@ void update_wall_time(void)
 	xtime.tv_nsec = (s64)clock->xtime_nsec >> clock->shift;
 	clock->xtime_nsec -= (s64)xtime.tv_nsec << clock->shift;
 
-	update_xtime_cache(cyc2ns(clock, offset));
+	update_xtime_cache(cyc2ns(clock, clock->cycle_accumulated));
 
 	/* check to see if there is a new clocksource to use */
 	change_clocksource();

-- 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC PATCH 16/30 v3] time keeping add cycle_raw for actual incrementation
  2008-01-15 20:49 [RFC PATCH 00/30 v3] mcount and latency tracing utility -v3 Steven Rostedt
                   ` (14 preceding siblings ...)
  2008-01-15 20:49 ` [RFC PATCH 15/30 v3] Fixup merge between xtime_cache and timekeeping starvation fix Steven Rostedt
@ 2008-01-15 20:49 ` Steven Rostedt
  2008-01-15 20:49 ` [RFC PATCH 17/30 v3] initialize the clock source to jiffies clock Steven Rostedt
                   ` (13 subsequent siblings)
  29 siblings, 0 replies; 35+ messages in thread
From: Steven Rostedt @ 2008-01-15 20:49 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Tim Bird,
	Sam Ravnborg, Frank Ch. Eigler, Jan Kiszka, John Stultz

[-- Attachment #1: timekeeping-add-cycle-raw-for-actual-incrementation.patch --]
[-- Type: text/plain, Size: 1295 bytes --]

get_monotonic_cycles() needs to produce a monotonic counter as output.

This patch adds a cycle_raw field to provide an ever-accumulating counter.
Unfortunately there is already a cycle_accumulated variable, but that one is
used to avoid clock source overflow and can also be decremented
(probably that name should be changed and that field reused for this
patch).
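
For clarity, this is what the accumulate path looks like with the new
field (the code is from the patch below; the comments are mine):

static inline void clocksource_accumulate(struct clocksource *cs, cycle_t now)
{
	cycle_t offset = (now - cs->cycle_last) & cs->mask;
	cs->cycle_last = now;
	cs->cycle_accumulated += offset;	/* drained again by update_wall_time() */
	cs->cycle_raw += offset;		/* only ever grows: a monotonic counter */
}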


Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: John Stultz <johnstul@us.ibm.com>

---
 include/linux/clocksource.h |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-compile.git/include/linux/clocksource.h
===================================================================
--- linux-compile.git.orig/include/linux/clocksource.h	2008-01-14 13:14:13.000000000 -0500
+++ linux-compile.git/include/linux/clocksource.h	2008-01-14 14:57:47.000000000 -0500
@@ -87,7 +87,7 @@ struct clocksource {
 	 * more than one cache line.
 	 */
 	struct {
-		cycle_t cycle_last, cycle_accumulated;
+		cycle_t cycle_last, cycle_accumulated, cycle_raw;
 	} ____cacheline_aligned_in_smp;
 
 	u64 xtime_nsec;
@@ -204,6 +204,7 @@ static inline void clocksource_accumulat
 	cycle_t offset = (now - cs->cycle_last) & cs->mask;
 	cs->cycle_last = now;
 	cs->cycle_accumulated += offset;
+	cs->cycle_raw += offset;
 }
 
 /**

-- 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC PATCH 17/30 v3] initialize the clock source to jiffies clock.
  2008-01-15 20:49 [RFC PATCH 00/30 v3] mcount and latency tracing utility -v3 Steven Rostedt
                   ` (15 preceding siblings ...)
  2008-01-15 20:49 ` [RFC PATCH 16/30 v3] time keeping add cycle_raw for actual incrementation Steven Rostedt
@ 2008-01-15 20:49 ` Steven Rostedt
  2008-01-15 21:14   ` Mathieu Desnoyers
  2008-01-15 20:49 ` [RFC PATCH 18/30 v3] add get_monotonic_cycles Steven Rostedt
                   ` (12 subsequent siblings)
  29 siblings, 1 reply; 35+ messages in thread
From: Steven Rostedt @ 2008-01-15 20:49 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Tim Bird,
	Sam Ravnborg, Frank Ch. Eigler, Jan Kiszka, John Stultz

[-- Attachment #1: initialize-clocksource-to-jiffies.patch --]
[-- Type: text/plain, Size: 2092 bytes --]

The latency tracer can call clocksource_read very early in bootup,
before the clock source variable has been initialized. Since the
clock->read pointer is still NULL at that point, this results in a
crash at boot up (even before earlyprintk is initialized).

This patch simply initializes the clock to use clocksource_jiffies, so
that any early user of clocksource_read will not crash.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: John Stultz <johnstul@us.ibm.com>
---
 include/linux/clocksource.h |    3 +++
 kernel/time/timekeeping.c   |    9 +++++++--
 2 files changed, 10 insertions(+), 2 deletions(-)

Index: linux-compile.git/include/linux/clocksource.h
===================================================================
--- linux-compile.git.orig/include/linux/clocksource.h	2008-01-14 13:14:14.000000000 -0500
+++ linux-compile.git/include/linux/clocksource.h	2008-01-14 14:57:46.000000000 -0500
@@ -274,6 +274,9 @@ extern struct clocksource* clocksource_g
 extern void clocksource_change_rating(struct clocksource *cs, int rating);
 extern void clocksource_resume(void);
 
+/* used to initialize clock */
+extern struct clocksource clocksource_jiffies;
+
 #ifdef CONFIG_GENERIC_TIME_VSYSCALL
 extern void update_vsyscall(struct timespec *ts, struct clocksource *c);
 extern void update_vsyscall_tz(void);
Index: linux-compile.git/kernel/time/timekeeping.c
===================================================================
--- linux-compile.git.orig/kernel/time/timekeeping.c	2008-01-14 13:14:14.000000000 -0500
+++ linux-compile.git/kernel/time/timekeeping.c	2008-01-14 14:57:46.000000000 -0500
@@ -53,8 +53,13 @@ static inline void update_xtime_cache(u6
 	timespec_add_ns(&xtime_cache, nsec);
 }
 
-static struct clocksource *clock; /* pointer to current clocksource */
-
+/*
+ * pointer to current clocksource
+ *  Just in case we use clocksource_read before we initialize
+ *  the actual clock source. Instead of calling a NULL read pointer
+ *  we return jiffies.
+ */
+static struct clocksource *clock = &clocksource_jiffies;
 
 #ifdef CONFIG_GENERIC_TIME
 /**

-- 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC PATCH 18/30 v3] add get_monotonic_cycles
  2008-01-15 20:49 [RFC PATCH 00/30 v3] mcount and latency tracing utility -v3 Steven Rostedt
                   ` (16 preceding siblings ...)
  2008-01-15 20:49 ` [RFC PATCH 17/30 v3] initialize the clock source to jiffies clock Steven Rostedt
@ 2008-01-15 20:49 ` Steven Rostedt
  2008-01-15 20:49 ` [RFC PATCH 19/30 v3] add notrace annotations to timing events Steven Rostedt
                   ` (11 subsequent siblings)
  29 siblings, 0 replies; 35+ messages in thread
From: Steven Rostedt @ 2008-01-15 20:49 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Tim Bird,
	Sam Ravnborg, Frank Ch. Eigler, Jan Kiszka, Steven Rostedt

[-- Attachment #1: get-monotonic-cycles.patch --]
[-- Type: text/plain, Size: 3104 bytes --]

The latency tracer needs a way to get an accurate time
without grabbing any locks. Grabbing a lock can itself call back
into the latency tracer and, at best, cause a slowdown.

This patch adds get_monotonic_cycles that returns cycles
from a reliable clock source in a monotonic fashion.
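
A rough sketch of how the tracer is expected to use this (illustration
only; the actual hookup comes in a later patch of this series):

	/* in the mcount callback: no locks taken, safe in any context */
	entry->t = get_monotonic_cycles();

	/* later, when formatting the buffer for user space */
	unsigned long long t = cycles_to_usecs(entry->t);
	unsigned long usec_rem = do_div(t, 1000000ULL);

	seq_printf(m, "[%5lu.%06lu] ", (unsigned long)t, usec_rem);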

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 include/linux/clocksource.h |    3 ++
 kernel/time/timekeeping.c   |   48 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 51 insertions(+)

Index: linux-compile.git/include/linux/clocksource.h
===================================================================
--- linux-compile.git.orig/include/linux/clocksource.h	2008-01-14 13:14:14.000000000 -0500
+++ linux-compile.git/include/linux/clocksource.h	2008-01-14 13:14:14.000000000 -0500
@@ -273,6 +273,9 @@ extern int clocksource_register(struct c
 extern struct clocksource* clocksource_get_next(void);
 extern void clocksource_change_rating(struct clocksource *cs, int rating);
 extern void clocksource_resume(void);
+extern cycle_t get_monotonic_cycles(void);
+extern unsigned long cycles_to_usecs(cycle_t cycles);
+extern cycle_t usecs_to_cycles(unsigned long usecs);
 
 /* used to initialize clock */
 extern struct clocksource clocksource_jiffies;
Index: linux-compile.git/kernel/time/timekeeping.c
===================================================================
--- linux-compile.git.orig/kernel/time/timekeeping.c	2008-01-14 13:14:14.000000000 -0500
+++ linux-compile.git/kernel/time/timekeeping.c	2008-01-14 13:14:14.000000000 -0500
@@ -103,6 +103,54 @@ static inline void __get_realtime_clock_
 	timespec_add_ns(ts, nsecs);
 }
 
+cycle_t notrace get_monotonic_cycles(void)
+{
+	cycle_t cycle_now, cycle_delta, cycle_raw, cycle_last;
+
+	do {
+		/*
+		 * cycle_raw and cycle_last can change on
+		 * another CPU and we need the delta calculation
+		 * of cycle_now and cycle_last happen atomic, as well
+		 * as the adding to cycle_raw. We don't need to grab
+		 * any locks, we just keep trying until get all the
+		 * calculations together in one state.
+		 *
+		 * In fact, we __cant__ grab any locks. This
+		 * function is called from the latency_tracer which can
+		 * be called anywhere. To grab any locks (including
+		 * seq_locks) we risk putting ourselves into a deadlock.
+		 */
+		cycle_raw = clock->cycle_raw;
+		cycle_last = clock->cycle_last;
+
+		/* read clocksource: */
+		cycle_now = clocksource_read(clock);
+
+		/* calculate the delta since the last update_wall_time: */
+		cycle_delta = (cycle_now - cycle_last) & clock->mask;
+
+	} while (cycle_raw != clock->cycle_raw ||
+		 cycle_last != clock->cycle_last);
+
+	return cycle_raw + cycle_delta;
+}
+
+unsigned long notrace cycles_to_usecs(cycle_t cycles)
+{
+	u64 ret = cyc2ns(clock, cycles);
+
+	ret += NSEC_PER_USEC/2; /* For rounding in do_div() */
+	do_div(ret, NSEC_PER_USEC);
+
+	return ret;
+}
+
+cycle_t notrace usecs_to_cycles(unsigned long usecs)
+{
+	return ns2cyc(clock, (u64)usecs * 1000);
+}
+
 /**
  * getnstimeofday - Returns the time of day in a timespec
  * @ts:		pointer to the timespec to be set

-- 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC PATCH 19/30 v3] add notrace annotations to timing events
  2008-01-15 20:49 [RFC PATCH 00/30 v3] mcount and latency tracing utility -v3 Steven Rostedt
                   ` (17 preceding siblings ...)
  2008-01-15 20:49 ` [RFC PATCH 18/30 v3] add get_monotonic_cycles Steven Rostedt
@ 2008-01-15 20:49 ` Steven Rostedt
  2008-01-15 20:49 ` [RFC PATCH 20/30 v3] Add timestamps to tracer Steven Rostedt
                   ` (10 subsequent siblings)
  29 siblings, 0 replies; 35+ messages in thread
From: Steven Rostedt @ 2008-01-15 20:49 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Tim Bird,
	Sam Ravnborg, Frank Ch. Eigler, Jan Kiszka, Steven Rostedt

[-- Attachment #1: mcount-add-time-notrace-annotations.patch --]
[-- Type: text/plain, Size: 5200 bytes --]

This patch adds notrace annotations to the timer functions
that the tracer itself will use. This helps speed things up and
also keeps these functions from cluttering the trace output.
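
For reference, notrace is expected to boil down to the usual GCC
attribute (a sketch; the actual define comes from the mcount patches
earlier in this series):

/* keep gcc from emitting the -pg mcount call for the marked function */
#define notrace	__attribute__((no_instrument_function))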

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 arch/x86/kernel/apic_32.c     |    2 +-
 arch/x86/kernel/hpet.c        |    2 +-
 arch/x86/kernel/time_32.c     |    2 +-
 arch/x86/kernel/tsc_32.c      |    2 +-
 arch/x86/kernel/tsc_64.c      |    4 ++--
 arch/x86/lib/delay_32.c       |    6 +++---
 drivers/clocksource/acpi_pm.c |    8 ++++----
 7 files changed, 13 insertions(+), 13 deletions(-)

Index: linux-compile.git/arch/x86/kernel/apic_32.c
===================================================================
--- linux-compile.git.orig/arch/x86/kernel/apic_32.c	2008-01-14 13:13:34.000000000 -0500
+++ linux-compile.git/arch/x86/kernel/apic_32.c	2008-01-14 13:14:14.000000000 -0500
@@ -577,7 +577,7 @@ static void local_apic_timer_interrupt(v
  *   interrupt as well. Thus we cannot inline the local irq ... ]
  */
 
-void fastcall smp_apic_timer_interrupt(struct pt_regs *regs)
+notrace fastcall void smp_apic_timer_interrupt(struct pt_regs *regs)
 {
 	struct pt_regs *old_regs = set_irq_regs(regs);
 
Index: linux-compile.git/arch/x86/kernel/hpet.c
===================================================================
--- linux-compile.git.orig/arch/x86/kernel/hpet.c	2008-01-14 13:13:34.000000000 -0500
+++ linux-compile.git/arch/x86/kernel/hpet.c	2008-01-14 13:14:14.000000000 -0500
@@ -295,7 +295,7 @@ static int hpet_legacy_next_event(unsign
 /*
  * Clock source related code
  */
-static cycle_t read_hpet(void)
+static notrace cycle_t read_hpet(void)
 {
 	return (cycle_t)hpet_readl(HPET_COUNTER);
 }
Index: linux-compile.git/arch/x86/kernel/time_32.c
===================================================================
--- linux-compile.git.orig/arch/x86/kernel/time_32.c	2008-01-14 13:13:34.000000000 -0500
+++ linux-compile.git/arch/x86/kernel/time_32.c	2008-01-14 13:14:14.000000000 -0500
@@ -122,7 +122,7 @@ static int set_rtc_mmss(unsigned long no
 
 int timer_ack;
 
-unsigned long profile_pc(struct pt_regs *regs)
+notrace unsigned long profile_pc(struct pt_regs *regs)
 {
 	unsigned long pc = instruction_pointer(regs);
 
Index: linux-compile.git/arch/x86/kernel/tsc_32.c
===================================================================
--- linux-compile.git.orig/arch/x86/kernel/tsc_32.c	2008-01-14 13:13:34.000000000 -0500
+++ linux-compile.git/arch/x86/kernel/tsc_32.c	2008-01-14 13:14:14.000000000 -0500
@@ -269,7 +269,7 @@ core_initcall(cpufreq_tsc);
 
 static unsigned long current_tsc_khz = 0;
 
-static cycle_t read_tsc(void)
+static notrace cycle_t read_tsc(void)
 {
 	cycle_t ret;
 
Index: linux-compile.git/arch/x86/kernel/tsc_64.c
===================================================================
--- linux-compile.git.orig/arch/x86/kernel/tsc_64.c	2008-01-14 13:13:34.000000000 -0500
+++ linux-compile.git/arch/x86/kernel/tsc_64.c	2008-01-14 13:14:14.000000000 -0500
@@ -248,13 +248,13 @@ __setup("notsc", notsc_setup);
 
 
 /* clock source code: */
-static cycle_t read_tsc(void)
+static notrace cycle_t read_tsc(void)
 {
 	cycle_t ret = (cycle_t)get_cycles_sync();
 	return ret;
 }
 
-static cycle_t __vsyscall_fn vread_tsc(void)
+static notrace cycle_t __vsyscall_fn vread_tsc(void)
 {
 	cycle_t ret = (cycle_t)get_cycles_sync();
 	return ret;
Index: linux-compile.git/arch/x86/lib/delay_32.c
===================================================================
--- linux-compile.git.orig/arch/x86/lib/delay_32.c	2008-01-14 13:13:34.000000000 -0500
+++ linux-compile.git/arch/x86/lib/delay_32.c	2008-01-14 13:14:14.000000000 -0500
@@ -24,7 +24,7 @@
 #endif
 
 /* simple loop based delay: */
-static void delay_loop(unsigned long loops)
+static notrace void delay_loop(unsigned long loops)
 {
 	int d0;
 
@@ -39,7 +39,7 @@ static void delay_loop(unsigned long loo
 }
 
 /* TSC based delay: */
-static void delay_tsc(unsigned long loops)
+static notrace void delay_tsc(unsigned long loops)
 {
 	unsigned long bclock, now;
 
@@ -72,7 +72,7 @@ int read_current_timer(unsigned long *ti
 	return -1;
 }
 
-void __delay(unsigned long loops)
+notrace void __delay(unsigned long loops)
 {
 	delay_fn(loops);
 }
Index: linux-compile.git/drivers/clocksource/acpi_pm.c
===================================================================
--- linux-compile.git.orig/drivers/clocksource/acpi_pm.c	2008-01-14 13:13:34.000000000 -0500
+++ linux-compile.git/drivers/clocksource/acpi_pm.c	2008-01-14 13:14:14.000000000 -0500
@@ -30,13 +30,13 @@
  */
 u32 pmtmr_ioport __read_mostly;
 
-static inline u32 read_pmtmr(void)
+static inline notrace u32 read_pmtmr(void)
 {
 	/* mask the output to 24 bits */
 	return inl(pmtmr_ioport) & ACPI_PM_MASK;
 }
 
-u32 acpi_pm_read_verified(void)
+notrace u32 acpi_pm_read_verified(void)
 {
 	u32 v1 = 0, v2 = 0, v3 = 0;
 
@@ -56,12 +56,12 @@ u32 acpi_pm_read_verified(void)
 	return v2;
 }
 
-static cycle_t acpi_pm_read_slow(void)
+static notrace cycle_t acpi_pm_read_slow(void)
 {
 	return (cycle_t)acpi_pm_read_verified();
 }
 
-static cycle_t acpi_pm_read(void)
+static notrace cycle_t acpi_pm_read(void)
 {
 	return (cycle_t)read_pmtmr();
 }

-- 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC PATCH 20/30 v3] Add timestamps to tracer
  2008-01-15 20:49 [RFC PATCH 00/30 v3] mcount and latency tracing utility -v3 Steven Rostedt
                   ` (18 preceding siblings ...)
  2008-01-15 20:49 ` [RFC PATCH 19/30 v3] add notrace annotations to timing events Steven Rostedt
@ 2008-01-15 20:49 ` Steven Rostedt
  2008-01-15 20:49 ` [RFC PATCH 21/30 v3] Sort trace by timestamp Steven Rostedt
                   ` (9 subsequent siblings)
  29 siblings, 0 replies; 35+ messages in thread
From: Steven Rostedt @ 2008-01-15 20:49 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Tim Bird,
	Sam Ravnborg, Frank Ch. Eigler, Jan Kiszka, Steven Rostedt

[-- Attachment #1: mcount-tracer-timestamp.patch --]
[-- Type: text/plain, Size: 2138 bytes --]

Add timestamps to trace entries.
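
As a worked example of the formatting done below (my numbers): if
cycles_to_usecs() returns 5123456, then do_div(t, 1000000ULL) leaves
t == 5 and returns 123456, so the entry is printed as

  [    5.123456] CPU 0: ...

i.e. whole seconds and microseconds of monotonic time in front of each
trace line.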

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 lib/tracing/tracer.c |   16 ++++++++++++++++
 lib/tracing/tracer.h |    1 +
 2 files changed, 17 insertions(+)

Index: linux-compile.git/lib/tracing/tracer.c
===================================================================
--- linux-compile.git.orig/lib/tracing/tracer.c	2008-01-14 13:14:13.000000000 -0500
+++ linux-compile.git/lib/tracing/tracer.c	2008-01-14 14:57:41.000000000 -0500
@@ -19,12 +19,18 @@
 #include <linux/percpu.h>
 #include <linux/debugfs.h>
 #include <linux/kallsyms.h>
+#include <linux/clocksource.h>
 #include <linux/uaccess.h>
 #include <linux/mcount.h>
 
 #include "tracer.h"
 #include "tracer_interface.h"
 
+static inline notrace cycle_t now(void)
+{
+	return get_monotonic_cycles();
+}
+
 static struct mctracer_trace mctracer_trace;
 static DEFINE_PER_CPU(struct mctracer_trace_cpu, mctracer_trace_cpu);
 static int trace_enabled __read_mostly;
@@ -58,6 +64,7 @@ mctracer_add_trace_entry(struct mctracer
 	entry->ip	 = ip;
 	entry->parent_ip = parent_ip;
 	entry->pid	 = tsk->pid;
+	entry->t	 = now();
 	memcpy(entry->comm, tsk->comm, TASK_COMM_LEN);
 }
 
@@ -250,6 +257,15 @@ static int s_show(struct seq_file *m, vo
 	if (iter->ent == NULL) {
 		seq_printf(m, "mctracer:\n");
 	} else {
+		unsigned long long t;
+		unsigned long usec_rem;
+		unsigned long secs;
+
+		t = cycles_to_usecs(iter->ent->t);
+		usec_rem = do_div(t, 1000000ULL);
+		secs = (unsigned long)t;
+
+		seq_printf(m, "[%5lu.%06lu] ", secs, usec_rem);
 		seq_printf(m, "CPU %d: ", iter->cpu);
 		seq_printf(m, "%s:%d ", iter->ent->comm, iter->ent->pid);
 		seq_print_ip_sym(m, iter->ent->ip, sym_only);
Index: linux-compile.git/lib/tracing/tracer.h
===================================================================
--- linux-compile.git.orig/lib/tracing/tracer.h	2008-01-14 13:14:13.000000000 -0500
+++ linux-compile.git/lib/tracing/tracer.h	2008-01-14 14:57:41.000000000 -0500
@@ -5,6 +5,7 @@
 #include <linux/sched.h>
 
 struct mctracer_entry {
+	unsigned long long t;
 	unsigned long idx;
 	unsigned long ip;
 	unsigned long parent_ip;

-- 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC PATCH 21/30 v3] Sort trace by timestamp
  2008-01-15 20:49 [RFC PATCH 00/30 v3] mcount and latency tracing utility -v3 Steven Rostedt
                   ` (19 preceding siblings ...)
  2008-01-15 20:49 ` [RFC PATCH 20/30 v3] Add timestamps to tracer Steven Rostedt
@ 2008-01-15 20:49 ` Steven Rostedt
  2008-01-15 20:49 ` [RFC PATCH 22/30 v3] speed up the output of the tracer Steven Rostedt
                   ` (8 subsequent siblings)
  29 siblings, 0 replies; 35+ messages in thread
From: Steven Rostedt @ 2008-01-15 20:49 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Tim Bird,
	Sam Ravnborg, Frank Ch. Eigler, Jan Kiszka, Steven Rostedt

[-- Attachment #1: mcount-tracer-sort-by-time.patch --]
[-- Type: text/plain, Size: 1773 bytes --]

Now that each entry has a reliable timestamp, we can
use the timestamp to sort the trace and remove the
atomic increment.
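
The merge done by find_next_entry() then amounts to picking the entry
with the smallest timestamp across the per-CPU buffers. A minimal
sketch of that selection (the array layout and types are illustrative,
not the tracer's):

struct entry { unsigned long long t; };

/*
 * Return the CPU whose pending entry has the oldest timestamp,
 * or -1 if every buffer is exhausted.
 */
static int pick_next_cpu(struct entry *pending[], int ncpus)
{
	struct entry *next = NULL;
	int next_cpu = -1;
	int cpu;

	for (cpu = 0; cpu < ncpus; cpu++) {
		struct entry *ent = pending[cpu];

		if (ent && (!next || next->t > ent->t)) {
			next = ent;
			next_cpu = cpu;
		}
	}
	return next_cpu;
}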

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 lib/tracing/tracer.c |    3 +--
 lib/tracing/tracer.h |    2 --
 2 files changed, 1 insertion(+), 4 deletions(-)

Index: linux-compile.git/lib/tracing/tracer.c
===================================================================
--- linux-compile.git.orig/lib/tracing/tracer.c	2008-01-14 13:14:14.000000000 -0500
+++ linux-compile.git/lib/tracing/tracer.c	2008-01-14 14:57:40.000000000 -0500
@@ -60,7 +60,6 @@ mctracer_add_trace_entry(struct mctracer
 		atomic_inc(&data->underrun);
 
 	entry = data->trace + idx * MCTRACER_ENTRY_SIZE;
-	entry->idx	 = atomic_inc_return(&tr->cnt);
 	entry->ip	 = ip;
 	entry->parent_ip = parent_ip;
 	entry->pid	 = tsk->pid;
@@ -152,7 +151,7 @@ static void *find_next_entry(struct mctr
 		if (!tr->data[i]->trace)
 			continue;
 		ent = mctracer_entry_idx(tr, iter->next_idx[i], i);
-		if (ent && (!next || next->idx > ent->idx)) {
+		if (ent && (!next || next->t > ent->t)) {
 			next = ent;
 			next_i = i;
 		}
Index: linux-compile.git/lib/tracing/tracer.h
===================================================================
--- linux-compile.git.orig/lib/tracing/tracer.h	2008-01-14 13:14:14.000000000 -0500
+++ linux-compile.git/lib/tracing/tracer.h	2008-01-14 14:57:38.000000000 -0500
@@ -6,7 +6,6 @@
 
 struct mctracer_entry {
 	unsigned long long t;
-	unsigned long idx;
 	unsigned long ip;
 	unsigned long parent_ip;
 	char comm[TASK_COMM_LEN];
@@ -24,7 +23,6 @@ struct mctracer_trace {
 	unsigned long entries;
 	long	      ctrl;
 	unsigned long iter_flags;
-	atomic_t      cnt;
 	struct mctracer_trace_cpu *data[NR_CPUS];
 };
 

-- 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC PATCH 22/30 v3] speed up the output of the tracer
  2008-01-15 20:49 [RFC PATCH 00/30 v3] mcount and latency tracing utility -v3 Steven Rostedt
                   ` (20 preceding siblings ...)
  2008-01-15 20:49 ` [RFC PATCH 21/30 v3] Sort trace by timestamp Steven Rostedt
@ 2008-01-15 20:49 ` Steven Rostedt
  2008-01-15 20:49 ` [RFC PATCH 23/30 v3] Add latency_trace format for tracer Steven Rostedt
                   ` (7 subsequent siblings)
  29 siblings, 0 replies; 35+ messages in thread
From: Steven Rostedt @ 2008-01-15 20:49 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Tim Bird,
	Sam Ravnborg, Frank Ch. Eigler, Jan Kiszka, Steven Rostedt

[-- Attachment #1: mcount-tracer-speed-output.patch --]
[-- Type: text/plain, Size: 1901 bytes --]

The current method of printing out the trace does a linear
search for the next entry to print on every read.
This patch remembers the next entry to look at in the
iterator, so if the next read is sequential, it can
start reading from that location directly.
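
As a toy illustration of the saved-position idea (not the tracer's
code; the real s_start()/s_next() work through the seq_file interface),
a sequential reader only pays one step per read, while a non-sequential
read falls back to rescanning from the start:

#include <stdio.h>

#define N 8

static int buffer[N] = { 10, 11, 12, 13, 14, 15, 16, 17 };

struct iter {
	long pos;	/* last position handed out, -1 initially */
	int  idx;	/* cursor into the buffer                  */
};

static int *iter_get(struct iter *it, long pos)
{
	if (pos != it->pos + 1) {
		/* non-sequential access: rescan from the beginning */
		for (it->idx = 0; it->idx < pos && it->idx < N; it->idx++)
			;
	} else {
		/* sequential access: just move one step forward */
		it->idx++;
	}
	it->pos = pos;
	return it->idx < N ? &buffer[it->idx] : NULL;
}

int main(void)
{
	struct iter it = { .pos = -1, .idx = -1 };
	long pos;

	for (pos = 0; pos < N; pos++)	/* sequential reads: one step each */
		printf("%d\n", *iter_get(&it, pos));
	return 0;
}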

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 lib/tracing/tracer.c |   28 +++++++++++++++++++---------
 1 file changed, 19 insertions(+), 9 deletions(-)

Index: linux-compile.git/lib/tracing/tracer.c
===================================================================
--- linux-compile.git.orig/lib/tracing/tracer.c	2008-01-14 13:14:14.000000000 -0500
+++ linux-compile.git/lib/tracing/tracer.c	2008-01-14 14:57:38.000000000 -0500
@@ -115,6 +115,7 @@ enum trace_iterator {
 struct mctracer_iterator {
 	struct mctracer_trace *tr;
 	struct mctracer_entry *ent;
+	loff_t pos;
 	unsigned long next_idx[NR_CPUS];
 	int cpu;
 	int idx;
@@ -186,6 +187,8 @@ static void *s_next(struct seq_file *m, 
 	while (ent && iter->idx < i)
 		ent = find_next_entry(iter);
 
+	iter->pos = *pos;
+
 	return ent;
 }
 
@@ -196,19 +199,25 @@ static void *s_start(struct seq_file *m,
 	loff_t l = 0;
 	int i;
 
-	iter->ent = NULL;
-	iter->cpu = 0;
-	iter->idx = -1;
-
-	for (i = 0; i < NR_CPUS; i++)
-		iter->next_idx[i] = 0;
-
 	/* stop the trace while dumping */
 	if (iter->tr->ctrl)
 		trace_enabled = 0;
 
-	for (p = iter; p && l < *pos; p = s_next(m, p, &l))
-		;
+	if (*pos != iter->pos) {
+		iter->ent = NULL;
+		iter->cpu = 0;
+		iter->idx = -1;
+
+		for (i = 0; i < NR_CPUS; i++)
+			iter->next_idx[i] = 0;
+
+		for (p = iter; p && l < *pos; p = s_next(m, p, &l))
+			;
+
+	} else {
+		l = *pos;
+		p = s_next(m, p, &l);
+	}
 
 	return p;
 }
@@ -296,6 +305,7 @@ static int mctrace_open(struct inode *in
 		return -ENOMEM;
 
 	iter->tr = &mctracer_trace;
+	iter->pos = -1;
 
 	/* TODO stop tracer */
 	ret = seq_open(file, &mctrace_seq_ops);

-- 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC PATCH 23/30 v3] Add latency_trace format for tracer
  2008-01-15 20:49 [RFC PATCH 00/30 v3] mcount and latency tracing utility -v3 Steven Rostedt
                   ` (21 preceding siblings ...)
  2008-01-15 20:49 ` [RFC PATCH 22/30 v3] speed up the output of the tracer Steven Rostedt
@ 2008-01-15 20:49 ` Steven Rostedt
  2008-01-15 20:49 ` [RFC PATCH 24/30 v3] Split out specific tracing functions Steven Rostedt
                   ` (6 subsequent siblings)
  29 siblings, 0 replies; 35+ messages in thread
From: Steven Rostedt @ 2008-01-15 20:49 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Tim Bird,
	Sam Ravnborg, Frank Ch. Eigler, Jan Kiszka, Steven Rostedt

[-- Attachment #1: mcount-tracer-latency-tracer-fmt.patch --]
[-- Type: text/plain, Size: 19835 bytes --]

This patch adds a latency_trace file with the format used
by the RT tree, for which others have already created tools
to dissect the output. This file adds some useful information
to the trace records, but still does not add actual latency
tracing.

Format like (default):

  "echo noverbose > /debugfs/tracing/iter_ctrl"

preemption latency trace v1.1.5 on 2.6.24-rc7-tst
--------------------------------------------------------------------
 latency: 0 us, #419428/4361791, CPU#1 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4)
    -----------------
    | task: -0 (uid:0 nice:0 policy:0 rt_prio:0)
    -----------------

                 _------=> CPU#            
                / _-----=> irqs-off        
               | / _----=> need-resched    
               || / _---=> hardirq/softirq 
               ||| / _--=> preempt-depth   
               |||| /                      
               |||||     delay             
   cmd     pid ||||| time  |   caller      
      \   /    |||||   \   |   /           
 swapper-0     0d.h. 1595128us+: set_normalized_timespec+0x8/0x2d <c043841d> (ktime_get_ts+0x4a/0x4e <c04499d4>)
 swapper-0     0d.h. 1595131us+: _spin_lock+0x8/0x18 <c0630690> (hrtimer_interrupt+0x6e/0x1b0 <c0449c56>)

Or with verbose turned on:

  "echo verbose > /debugfs/tracing/iter_ctrl"

preemption latency trace v1.1.5 on 2.6.24-rc7-tst
--------------------------------------------------------------------
 latency: 0 us, #419428/4361791, CPU#1 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4)
    -----------------
    | task: -0 (uid:0 nice:0 policy:0 rt_prio:0)
    -----------------

         swapper     0 0 9 00000000 00000000 [f3675f41] 1595.128ms (+0.003ms): set_normalized_timespec+0x8/0x2d <c043841d> (ktime_get_ts+0x4a/0x4e <c04499d4>)
         swapper     0 0 9 00000000 00000001 [f3675f45] 1595.131ms (+0.003ms): _spin_lock+0x8/0x18 <c0630690> (hrtimer_interrupt+0x6e/0x1b0 <c0449c56>)
         swapper     0 0 9 00000000 00000002 [f3675f48] 1595.135ms (+0.003ms): _spin_lock+0x8/0x18 <c0630690> (hrtimer_interrupt+0x6e/0x1b0 <c0449c56>)
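
The short field after the task name and pid is the CPU number followed
by the irqs-off, need-resched, hardirq/softirq and preempt-depth
columns, decoded from the per-entry flags and preempt count. A
standalone sketch of that decoding, mirroring lat_print_generic() in
the patch below:

#include <stdio.h>

enum {
	TRACE_FLAG_IRQS_OFF	= 0x01,
	TRACE_FLAG_NEED_RESCHED	= 0x02,
	TRACE_FLAG_HARDIRQ	= 0x04,
	TRACE_FLAG_SOFTIRQ	= 0x08,
};

static void print_lat_flags(unsigned long flags, unsigned int preempt_count)
{
	int hardirq = flags & TRACE_FLAG_HARDIRQ;
	int softirq = flags & TRACE_FLAG_SOFTIRQ;

	putchar((flags & TRACE_FLAG_IRQS_OFF) ? 'd' : '.');
	putchar((flags & TRACE_FLAG_NEED_RESCHED) ? 'N' : '.');

	if (hardirq && softirq)
		putchar('H');
	else if (hardirq)
		putchar('h');
	else if (softirq)
		putchar('s');
	else
		putchar('.');

	if (preempt_count)
		printf("%x", preempt_count);
	else
		putchar('.');
}

int main(void)
{
	/* irqs off, in a hard interrupt, preempt depth 0 -> "d.h." */
	print_lat_flags(TRACE_FLAG_IRQS_OFF | TRACE_FLAG_HARDIRQ, 0);
	putchar('\n');
	return 0;
}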


Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 lib/tracing/tracer.c |  451 +++++++++++++++++++++++++++++++++++++++++++++------
 lib/tracing/tracer.h |   13 +
 2 files changed, 411 insertions(+), 53 deletions(-)

Index: linux-compile.git/lib/tracing/tracer.c
===================================================================
--- linux-compile.git.orig/lib/tracing/tracer.c	2008-01-15 15:03:17.000000000 -0500
+++ linux-compile.git/lib/tracing/tracer.c	2008-01-15 15:03:41.000000000 -0500
@@ -20,7 +20,9 @@
 #include <linux/debugfs.h>
 #include <linux/kallsyms.h>
 #include <linux/clocksource.h>
+#include <linux/utsrelease.h>
 #include <linux/uaccess.h>
+#include <linux/hardirq.h>
 #include <linux/mcount.h>
 
 #include "tracer.h"
@@ -35,16 +37,25 @@ static struct mctracer_trace mctracer_tr
 static DEFINE_PER_CPU(struct mctracer_trace_cpu, mctracer_trace_cpu);
 static int trace_enabled __read_mostly;
 
+enum trace_flag_type {
+	TRACE_FLAG_IRQS_OFF		= 0x01,
+	TRACE_FLAG_NEED_RESCHED		= 0x02,
+	TRACE_FLAG_HARDIRQ		= 0x04,
+	TRACE_FLAG_SOFTIRQ		= 0x08,
+};
+
 static inline notrace void
 mctracer_add_trace_entry(struct mctracer_trace *tr,
 			 int cpu,
 			 const unsigned long ip,
-			 const unsigned long parent_ip)
+			 const unsigned long parent_ip,
+			 unsigned long flags)
 {
 	unsigned long idx, idx_next;
 	struct mctracer_entry *entry;
 	struct task_struct *tsk = current;
 	struct mctracer_trace_cpu *data = tr->data[cpu];
+	unsigned long pc;
 
 	idx = data->trace_idx;
 	idx_next = idx + 1;
@@ -59,11 +70,18 @@ mctracer_add_trace_entry(struct mctracer
 	if (unlikely(idx_next != 0 && atomic_read(&data->underrun)))
 		atomic_inc(&data->underrun);
 
+	pc = preempt_count();
+
 	entry = data->trace + idx * MCTRACER_ENTRY_SIZE;
+	entry->preempt_count = pc & 0xff;
 	entry->ip	 = ip;
 	entry->parent_ip = parent_ip;
 	entry->pid	 = tsk->pid;
 	entry->t	 = now();
+	entry->flags = (irqs_disabled_flags(flags) ? TRACE_FLAG_IRQS_OFF : 0) |
+		((pc & HARDIRQ_MASK) ? TRACE_FLAG_HARDIRQ : 0) |
+		((pc & SOFTIRQ_MASK) ? TRACE_FLAG_SOFTIRQ : 0) |
+		(need_resched() ? TRACE_FLAG_NEED_RESCHED : 0);
 	memcpy(entry->comm, tsk->comm, TASK_COMM_LEN);
 }
 
@@ -84,7 +102,7 @@ static notrace void trace_function(const
 
 	atomic_inc(&tr->data[cpu]->disabled);
 	if (likely(atomic_read(&tr->data[cpu]->disabled) == 1))
-		mctracer_add_trace_entry(tr, cpu, ip, parent_ip);
+		mctracer_add_trace_entry(tr, cpu, ip, parent_ip, flags);
 
 	atomic_dec(&tr->data[cpu]->disabled);
 
@@ -101,6 +119,11 @@ static notrace void mctracer_reset(struc
 {
 	int cpu;
 
+	tr->time_start = now();
+	tr->saved_latency = 0;
+	tr->critical_start = 0;
+	tr->critical_end = 0;
+
 	for_each_online_cpu(cpu) {
 		tr->data[cpu]->trace_idx = 0;
 		atomic_set(&tr->data[cpu]->underrun, 0);
@@ -110,11 +133,24 @@ static notrace void mctracer_reset(struc
 #ifdef CONFIG_DEBUG_FS
 enum trace_iterator {
 	TRACE_ITER_SYM_ONLY	= 1,
+	TRACE_ITER_VERBOSE	= 2,
+};
+
+/* These must match the bit postions above */
+static const char *trace_options[] = {
+	"symonly",
+	"verbose",
+	NULL
+};
+
+enum trace_file_type {
+	TRACE_FILE_LAT_FMT	= 1,
 };
 
 struct mctracer_iterator {
 	struct mctracer_trace *tr;
 	struct mctracer_entry *ent;
+	unsigned long iter_flags;
 	loff_t pos;
 	unsigned long next_idx[NR_CPUS];
 	int cpu;
@@ -140,37 +176,53 @@ static struct mctracer_entry *mctracer_e
 	return &array[idx];
 }
 
-static void *find_next_entry(struct mctracer_iterator *iter)
+static struct notrace mctracer_entry *
+find_next_entry(struct mctracer_iterator *iter, int *ent_cpu)
 {
 	struct mctracer_trace *tr = iter->tr;
-	struct mctracer_entry *ent;
-	struct mctracer_entry *next = NULL;
-	int next_i = -1;
-	int i;
+	struct mctracer_entry *ent, *next = NULL;
+	int next_cpu = -1;
+	int cpu;
 
-	for_each_possible_cpu(i) {
-		if (!tr->data[i]->trace)
+	for_each_possible_cpu(cpu) {
+		if (!tr->data[cpu]->trace)
 			continue;
-		ent = mctracer_entry_idx(tr, iter->next_idx[i], i);
+		ent = mctracer_entry_idx(tr, iter->next_idx[cpu], cpu);
 		if (ent && (!next || next->t > ent->t)) {
 			next = ent;
-			next_i = i;
+			next_cpu = cpu;
 		}
 	}
+
+	if (ent_cpu)
+		*ent_cpu = next_cpu;
+
+	return next;
+}
+
+static void *find_next_entry_inc(struct mctracer_iterator *iter)
+{
+	struct mctracer_entry *next;
+	int next_cpu = -1;
+
+	next = find_next_entry(iter, &next_cpu);
+
 	if (next) {
-		iter->next_idx[next_i]++;
+		iter->next_idx[next_cpu]++;
 		iter->idx++;
 	}
 	iter->ent = next;
-	iter->cpu = next_i;
+	iter->cpu = next_cpu;
 
 	return next ? iter : NULL;
 }
 
-static void *s_next(struct seq_file *m, void *v, loff_t *pos)
+static void notrace *
+s_next(struct seq_file *m, void *v, loff_t *pos)
 {
 	struct mctracer_iterator *iter = m->private;
 	void *ent;
+	void *last_ent = iter->ent;
 	int i = (int)*pos;
 
 	(*pos)++;
@@ -180,15 +232,18 @@ static void *s_next(struct seq_file *m, 
 		return NULL;
 
 	if (iter->idx < 0)
-		ent = find_next_entry(iter);
+		ent = find_next_entry_inc(iter);
 	else
 		ent = iter;
 
 	while (ent && iter->idx < i)
-		ent = find_next_entry(iter);
+		ent = find_next_entry_inc(iter);
 
 	iter->pos = *pos;
 
+	if (last_ent && !ent)
+		seq_puts(m, "\n\nvim:ft=help\n");
+
 	return ent;
 }
 
@@ -249,40 +304,239 @@ static void seq_print_symbol(struct seq_
 #endif
 
 static void notrace seq_print_ip_sym(struct seq_file *m,
-				     unsigned long ip,
-				     int sym_only)
+				     unsigned long ip, int sym_only)
 {
+	if (!ip) {
+		seq_printf(m, "0");
+		return;
+	}
+
 	seq_print_symbol(m, "%s", ip);
 	if (!sym_only)
 		seq_printf(m, " <" IP_FMT ">", ip);
 }
 
+static void notrace print_help_header(struct seq_file *m)
+{
+	seq_puts(m, "                 _------=> CPU#            \n");
+	seq_puts(m, "                / _-----=> irqs-off        \n");
+	seq_puts(m, "               | / _----=> need-resched    \n");
+	seq_puts(m, "               || / _---=> hardirq/softirq \n");
+	seq_puts(m, "               ||| / _--=> preempt-depth   \n");
+	seq_puts(m, "               |||| /                      \n");
+	seq_puts(m, "               |||||     delay             \n");
+	seq_puts(m, "   cmd     pid ||||| time  |   caller      \n");
+	seq_puts(m, "      \\   /    |||||   \\   |   /           \n");
+}
+
+static void notrace print_trace_header(struct seq_file *m,
+				       struct mctracer_iterator *iter)
+{
+	struct mctracer_trace *tr = iter->tr;
+	unsigned long underruns = 0;
+	unsigned long underrun;
+	unsigned long entries   = 0;
+	int sym_only = !!(tr->iter_flags & TRACE_ITER_SYM_ONLY);
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		if (tr->data[cpu]->trace) {
+			underrun = atomic_read(&tr->data[cpu]->underrun);
+			if (underrun) {
+				underruns += underrun;
+				entries += tr->entries;
+			} else
+				entries += tr->data[cpu]->trace_idx;
+		}
+	}
+
+	seq_printf(m, "preemption latency trace v1.1.5 on %s\n",
+		   UTS_RELEASE);
+	seq_puts(m, "-----------------------------------"
+		 "---------------------------------\n");
+	seq_printf(m, " latency: %lu us, #%lu/%lu, CPU#%d |"
+		   " (M:%s VP:%d, KP:%d, SP:%d HP:%d",
+		   cycles_to_usecs(tr->saved_latency),
+		   entries,
+		   (entries + underruns),
+		   smp_processor_id(),
+#if defined(CONFIG_PREEMPT_NONE)
+		   "server",
+#elif defined(CONFIG_PREEMPT_VOLUNTARY)
+		   "desktop",
+#elif defined(CONFIG_PREEMPT_DESKTOP)
+		   "preempt",
+#else
+		   "rt",
+#endif
+		   /* These are reserved for later use */
+		   0, 0, 0, 0);
+#ifdef CONFIG_SMP
+	seq_printf(m, " #P:%d)\n", num_online_cpus());
+#else
+	seq_puts(m, ")\n");
+#endif
+	seq_puts(m, "    -----------------\n");
+	seq_printf(m, "    | task: %.16s-%d "
+		   "(uid:%d nice:%ld policy:%ld rt_prio:%ld)\n",
+		   tr->comm, tr->pid, tr->uid, tr->nice,
+		   tr->policy, tr->rt_priority);
+	seq_puts(m, "    -----------------\n");
+
+	if (tr->critical_start) {
+		seq_puts(m, " => started at: ");
+		seq_print_ip_sym(m, tr->critical_start, sym_only);
+		seq_puts(m, "\n => ended at:   ");
+		seq_print_ip_sym(m, tr->critical_end, sym_only);
+		seq_puts(m, "\n");
+	}
+
+	seq_puts(m, "\n");
+}
+
+
+static void notrace
+lat_print_generic(struct seq_file *m, struct mctracer_entry *entry, int cpu)
+{
+	int hardirq, softirq;
+
+	seq_printf(m, "%8.8s-%-5d ", entry->comm, entry->pid);
+	seq_printf(m, "%d", cpu);
+	seq_printf(m, "%c%c",
+		   (entry->flags & TRACE_FLAG_IRQS_OFF) ? 'd' : '.',
+		   ((entry->flags & TRACE_FLAG_NEED_RESCHED) ? 'N' : '.'));
+
+	hardirq = entry->flags & TRACE_FLAG_HARDIRQ;
+	softirq = entry->flags & TRACE_FLAG_SOFTIRQ;
+	if (hardirq && softirq)
+		seq_putc(m, 'H');
+	else {
+		if (hardirq)
+			seq_putc(m, 'h');
+		else {
+			if (softirq)
+				seq_putc(m, 's');
+			else
+				seq_putc(m, '.');
+		}
+	}
+
+	if (entry->preempt_count)
+		seq_printf(m, "%lx", entry->preempt_count);
+	else
+		seq_puts(m, ".");
+}
+
+unsigned long preempt_mark_thresh = 100;
+
+static void notrace
+lat_print_timestamp(struct seq_file *m, unsigned long long abs_usecs,
+		    unsigned long rel_usecs)
+{
+	seq_printf(m, " %4lldus", abs_usecs);
+	if (rel_usecs > preempt_mark_thresh)
+		seq_puts(m, "!: ");
+	else if (rel_usecs > 1)
+		seq_puts(m, "+: ");
+	else
+		seq_puts(m, " : ");
+}
+
+static void notrace
+print_lat_fmt(struct seq_file *m, struct mctracer_iterator *iter,
+	      unsigned int trace_idx, int cpu)
+{
+	struct mctracer_entry *entry = iter->ent;
+	struct mctracer_entry *next_entry = find_next_entry(iter, NULL);
+	unsigned long abs_usecs;
+	unsigned long rel_usecs;
+	int sym_only = !!(iter->tr->iter_flags & TRACE_ITER_SYM_ONLY);
+	int verbose = !!(iter->tr->iter_flags & TRACE_ITER_VERBOSE);
+
+	if (!next_entry)
+		next_entry = entry;
+	rel_usecs = cycles_to_usecs(next_entry->t - entry->t);
+	abs_usecs = cycles_to_usecs(entry->t - iter->tr->time_start);
+
+	if (verbose) {
+		seq_printf(m, "%16s %5d %d %ld %08lx %08x [%08lx]"
+			   " %ld.%03ldms (+%ld.%03ldms): ",
+			   entry->comm,
+			   entry->pid, cpu, entry->flags,
+			   entry->preempt_count, trace_idx,
+			   cycles_to_usecs(entry->t),
+			   abs_usecs/1000,
+			   abs_usecs % 1000, rel_usecs/1000, rel_usecs % 1000);
+	} else {
+		lat_print_generic(m, entry, cpu);
+		lat_print_timestamp(m, abs_usecs, rel_usecs);
+	}
+	seq_print_ip_sym(m, entry->ip, sym_only);
+	seq_puts(m, " (");
+	seq_print_ip_sym(m, entry->parent_ip, sym_only);
+	seq_puts(m, ")\n");
+}
+
+static void notrace print_trace_fmt(struct seq_file *m,
+				    struct mctracer_iterator *iter)
+{
+	unsigned long usec_rem;
+	unsigned long secs;
+	int sym_only = !!(iter->tr->iter_flags & TRACE_ITER_SYM_ONLY);
+	unsigned long long t;
+
+	t = cycles_to_usecs(iter->ent->t);
+	usec_rem = do_div(t, 1000000ULL);
+	secs = (unsigned long)t;
+
+	seq_printf(m, "[%5lu.%06lu] ", secs, usec_rem);
+	seq_printf(m, "CPU %d: ", iter->cpu);
+	seq_printf(m, "%s:%d ", iter->ent->comm,
+		   iter->ent->pid);
+	seq_print_ip_sym(m, iter->ent->ip, sym_only);
+	if (iter->ent->parent_ip) {
+		seq_printf(m, " <-- ");
+		seq_print_ip_sym(m, iter->ent->parent_ip,
+				 sym_only);
+	}
+	seq_printf(m, "\n");
+}
+
+static int trace_empty(struct mctracer_iterator *iter)
+{
+	struct mctracer_trace_cpu *data;
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		data = iter->tr->data[cpu];
+
+		if (data->trace &&
+		    (data->trace_idx ||
+		     atomic_read(&data->underrun)))
+			return 0;
+	}
+	return 1;
+}
+
 static int s_show(struct seq_file *m, void *v)
 {
 	struct mctracer_iterator *iter = v;
-	int sym_only = !!(iter->tr->iter_flags & TRACE_ITER_SYM_ONLY);
 
 	if (iter->ent == NULL) {
-		seq_printf(m, "mctracer:\n");
+		if (iter->iter_flags & TRACE_FILE_LAT_FMT) {
+			/* print nothing if the buffers are empty */
+			if (trace_empty(iter))
+				return 0;
+			print_trace_header(m, iter);
+			if (!(iter->tr->iter_flags & TRACE_ITER_VERBOSE))
+				print_help_header(m);
+		} else
+			seq_printf(m, "mctracer:\n");
 	} else {
-		unsigned long long t;
-		unsigned long usec_rem;
-		unsigned long secs;
-
-		t = cycles_to_usecs(iter->ent->t);
-		usec_rem = do_div(t, 1000000ULL);
-		secs = (unsigned long)t;
-
-		seq_printf(m, "[%5lu.%06lu] ", secs, usec_rem);
-		seq_printf(m, "CPU %d: ", iter->cpu);
-		seq_printf(m, "%s:%d ", iter->ent->comm, iter->ent->pid);
-		seq_print_ip_sym(m, iter->ent->ip, sym_only);
-		if (iter->ent->parent_ip) {
-			seq_printf(m, " <-- ");
-			seq_print_ip_sym(m, iter->ent->parent_ip,
-					 sym_only);
-		}
-		seq_printf(m, "\n");
+		if (iter->iter_flags & TRACE_FILE_LAT_FMT)
+			print_lat_fmt(m, iter, iter->idx, iter->cpu);
+		else
+			print_trace_fmt(m, iter);
 	}
 
 	return 0;
@@ -295,25 +549,52 @@ static struct seq_operations mctrace_seq
 	.show = s_show,
 };
 
-static int mctrace_open(struct inode *inode, struct file *file)
+static struct mctracer_iterator *
+__mctrace_open(struct inode *inode, struct file *file, int *ret)
 {
 	struct mctracer_iterator *iter;
-	int ret;
 
 	iter = kzalloc(sizeof(*iter), GFP_KERNEL);
-	if (!iter)
-		return -ENOMEM;
+	if (!iter) {
+		*ret = -ENOMEM;
+		goto out;
+	}
 
 	iter->tr = &mctracer_trace;
 	iter->pos = -1;
 
 	/* TODO stop tracer */
-	ret = seq_open(file, &mctrace_seq_ops);
-	if (!ret) {
+	*ret = seq_open(file, &mctrace_seq_ops);
+	if (!*ret) {
 		struct seq_file *m = file->private_data;
 		m->private = iter;
-	} else
+	} else {
 		kfree(iter);
+		iter = NULL;
+	}
+
+ out:
+	return iter;
+}
+
+static int mctrace_open(struct inode *inode, struct file *file)
+{
+	int ret;
+
+	__mctrace_open(inode, file, &ret);
+
+	return ret;
+}
+
+static int mctrace_lt_open(struct inode *inode, struct file *file)
+{
+	struct mctracer_iterator *iter;
+	int ret;
+
+	iter = __mctrace_open(inode, file, &ret);
+
+	if (!ret)
+		iter->iter_flags |= TRACE_FILE_LAT_FMT;
 
 	return ret;
 }
@@ -335,6 +616,13 @@ static struct file_operations mctrace_fo
 	.release = mctrace_release,
 };
 
+static struct file_operations mctrace_lt_fops = {
+	.open = mctrace_lt_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = mctrace_release,
+};
+
 static int mctracer_open_generic(struct inode *inode, struct file *filp)
 {
 	filp->private_data = inode->i_private;
@@ -375,6 +663,17 @@ static ssize_t mctracer_ctrl_write(struc
 	/* When starting a new trace, reset the buffers */
 	if (val)
 		mctracer_reset(tr);
+	else {
+		/* pretty meaningless for now */
+		tr->time_end = now();
+		tr->saved_latency = tr->time_end - tr->time_start;
+		memcpy(tr->comm, current->comm, TASK_COMM_LEN);
+		tr->pid = current->pid;
+		tr->uid = current->uid;
+		tr->nice = current->static_prio - 20 - MAX_RT_PRIO;
+		tr->policy = current->policy;
+		tr->rt_priority = current->rt_priority;
+	}
 
 	if (tr->ctrl ^ val) {
 		if (val)
@@ -399,15 +698,38 @@ static ssize_t mctracer_iter_ctrl_read(s
 				       size_t cnt, loff_t *ppos)
 {
 	struct mctracer_trace *tr = filp->private_data;
-	char buf[64];
+	char *buf;
 	int r = 0;
+	int i;
+	int len = 0;
 
-	if (tr->iter_flags & TRACE_ITER_SYM_ONLY)
-		r = sprintf(buf, "%s", "symonly ");
-	r += sprintf(buf+r, "\n");
+	/* calulate max size */
+	for (i = 0; trace_options[i]; i++) {
+		len += strlen(trace_options[i]);
+		len += 3; /* "no" and space */
+	}
 
-	return simple_read_from_buffer(ubuf, cnt, ppos,
-				       buf, r);
+	/* +2 for \n and \0 */
+	buf = kmalloc(len + 2, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	for (i = 0; trace_options[i]; i++) {
+		if (tr->iter_flags & (1 << i))
+			r += sprintf(buf + r, "%s ", trace_options[i]);
+		else
+			r += sprintf(buf + r, "no%s ", trace_options[i]);
+	}
+
+	r += sprintf(buf + r, "\n");
+	WARN_ON(r >= len + 2);
+
+	r = simple_read_from_buffer(ubuf, cnt, ppos,
+				    buf, r);
+
+	kfree(buf);
+
+	return r;
 }
 
 static ssize_t mctracer_iter_ctrl_write(struct file *filp,
@@ -416,6 +738,9 @@ static ssize_t mctracer_iter_ctrl_write(
 {
 	struct mctracer_trace *tr = filp->private_data;
 	char buf[64];
+	char *cmp = buf;
+	int neg = 0;
+	int i;
 
 	if (cnt > 63)
 		cnt = 63;
@@ -425,8 +750,22 @@ static ssize_t mctracer_iter_ctrl_write(
 
 	buf[cnt] = 0;
 
-	if (strncmp(buf, "symonly", 7) == 0)
-		tr->iter_flags |= TRACE_ITER_SYM_ONLY;
+	if (strncmp(buf, "no", 2) == 0) {
+		neg = 1;
+		cmp += 2;
+	}
+
+	for (i = 0; trace_options[i]; i++) {
+		int len = strlen(trace_options[i]);
+
+		if (strncmp(cmp, trace_options[i], len) == 0) {
+			if (neg)
+				tr->iter_flags &= ~(1 << i);
+			else
+				tr->iter_flags |= (1 << i);
+			break;
+		}
+	}
 
 	filp->f_pos += cnt;
 
@@ -459,6 +798,12 @@ static void mctrace_init_debugfs(void)
 				    &mctracer_trace, &mctracer_iter_fops);
 	if (!entry)
 		pr_warning("Could not create debugfs 'iter_ctrl' entry\n");
+
+	entry = debugfs_create_file("function_trace", 0444, d_mctracer,
+				    &mctracer_trace, &mctrace_lt_fops);
+	if (!entry)
+		pr_warning("Could not create debugfs 'function_trace' entry\n");
+
 	entry = debugfs_create_file("trace", 0444, d_mctracer,
 				    &mctracer_trace, &mctrace_fops);
 	if (!entry)
Index: linux-compile.git/lib/tracing/tracer.h
===================================================================
--- linux-compile.git.orig/lib/tracing/tracer.h	2008-01-15 15:03:02.000000000 -0500
+++ linux-compile.git/lib/tracing/tracer.h	2008-01-15 15:03:41.000000000 -0500
@@ -8,6 +8,8 @@ struct mctracer_entry {
 	unsigned long long t;
 	unsigned long ip;
 	unsigned long parent_ip;
+	unsigned long preempt_count;
+	unsigned long flags;
 	char comm[TASK_COMM_LEN];
 	pid_t pid;
 };
@@ -23,6 +25,17 @@ struct mctracer_trace {
 	unsigned long entries;
 	long	      ctrl;
 	unsigned long iter_flags;
+	char comm[TASK_COMM_LEN];
+	pid_t	      pid;
+	uid_t	      uid;
+	unsigned long nice;
+	unsigned long policy;
+	unsigned long rt_priority;
+	unsigned long saved_latency;
+	unsigned long critical_start;
+	unsigned long critical_end;
+	unsigned long long time_start;
+	unsigned long long time_end;
 	struct mctracer_trace_cpu *data[NR_CPUS];
 };
 

-- 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC PATCH 24/30 v3] Split out specific tracing functions
  2008-01-15 20:49 [RFC PATCH 00/30 v3] mcount and latency tracing utility -v3 Steven Rostedt
                   ` (22 preceding siblings ...)
  2008-01-15 20:49 ` [RFC PATCH 23/30 v3] Add latency_trace format for tracer Steven Rostedt
@ 2008-01-15 20:49 ` Steven Rostedt
  2008-01-15 20:49 ` [RFC PATCH 25/30 v3] Trace irq disabled critical timings Steven Rostedt
                   ` (5 subsequent siblings)
  29 siblings, 0 replies; 35+ messages in thread
From: Steven Rostedt @ 2008-01-15 20:49 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Tim Bird,
	Sam Ravnborg, Frank Ch. Eigler, Jan Kiszka, Steven Rostedt

[-- Attachment #1: mctracer-splitout.patch --]
[-- Type: text/plain, Size: 34433 bytes --]

Several different types of tracing need to use the
same core functions. This patch separates the core
functions from the more specific ones to allow for
future tracing methods.
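
With this split, tracer.c keeps the generic buffer and seq_file
handling while each specific tracer plugs in through callbacks on
struct tracing_trace. A rough sketch of the resulting shape (types are
trimmed to the essentials and the scaffolding is invented for
illustration; only the names mirror the patch):

struct tracing_trace {
	long ctrl;
	/* hooks a specific tracer fills in */
	void (*open)(struct tracing_trace *tr);
	void (*close)(struct tracing_trace *tr);
	void (*ctrl_update)(struct tracing_trace *tr, unsigned long val);
};

/* specific tracer side (cf. trace_function.c): enable/disable on ctrl */
static int trace_enabled;

static void function_trace_ctrl_update(struct tracing_trace *tr,
				       unsigned long val)
{
	val = !!val;
	if (tr->ctrl ^ val) {
		trace_enabled = val;	/* register/unregister mcount hook */
		tr->ctrl = val;
	}
}

static struct tracing_trace function_trace = {
	.ctrl_update = function_trace_ctrl_update,
};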

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 lib/Makefile                   |    1 
 lib/tracing/Kconfig            |    7 
 lib/tracing/Makefile           |    3 
 lib/tracing/trace_function.c   |  180 +++++++++++++
 lib/tracing/tracer.c           |  537 +++++++++++++++++------------------------
 lib/tracing/tracer.h           |   91 +++++-
 lib/tracing/tracer_interface.h |   14 -
 7 files changed, 485 insertions(+), 348 deletions(-)

Index: linux-compile.git/lib/Makefile
===================================================================
--- linux-compile.git.orig/lib/Makefile	2008-01-14 13:14:13.000000000 -0500
+++ linux-compile.git/lib/Makefile	2008-01-14 13:14:14.000000000 -0500
@@ -67,6 +67,7 @@ obj-$(CONFIG_SWIOTLB) += swiotlb.o
 obj-$(CONFIG_FAULT_INJECTION) += fault-inject.o
 
 obj-$(CONFIG_MCOUNT) += tracing/
+obj-$(CONFIG_TRACING) += tracing/
 
 lib-$(CONFIG_GENERIC_BUG) += bug.o
 
Index: linux-compile.git/lib/tracing/Kconfig
===================================================================
--- linux-compile.git.orig/lib/tracing/Kconfig	2008-01-14 13:14:13.000000000 -0500
+++ linux-compile.git/lib/tracing/Kconfig	2008-01-14 14:57:35.000000000 -0500
@@ -9,11 +9,16 @@ config MCOUNT
 	bool
 	select FRAME_POINTER
 
-config MCOUNT_TRACER
+config TRACING
+        bool
+	depends on DEBUG_KERNEL
+
+config FUNCTION_TRACER
 	bool "Profiler instrumentation based tracer"
 	depends on DEBUG_KERNEL && HAVE_MCOUNT
 	default n
 	select MCOUNT
+	select TRACING
 	help
 	  Use profiler instrumentation, adding -pg to CFLAGS. This will
 	  insert a call to an architecture specific __mcount routine,
Index: linux-compile.git/lib/tracing/Makefile
===================================================================
--- linux-compile.git.orig/lib/tracing/Makefile	2008-01-14 13:14:13.000000000 -0500
+++ linux-compile.git/lib/tracing/Makefile	2008-01-14 14:57:35.000000000 -0500
@@ -1,5 +1,6 @@
 obj-$(CONFIG_MCOUNT) += libmcount.o
 
-obj-$(CONFIG_MCOUNT_TRACER) += tracer.o
+obj-$(CONFIG_TRACING) += tracer.o
+obj-$(CONFIG_FUNCTION_TRACER) += trace_function.o
 
 libmcount-y := mcount.o
Index: linux-compile.git/lib/tracing/trace_function.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-compile.git/lib/tracing/trace_function.c	2008-01-14 13:14:14.000000000 -0500
@@ -0,0 +1,180 @@
+/*
+ * ring buffer based mcount tracer
+ *
+ * Copyright (C) 2007 Steven Rostedt <srostedt@redhat.com>
+ *
+ * Based on code from the latency_tracer, that is:
+ *
+ *  Copyright (C) 2004-2006 Ingo Molnar
+ *  Copyright (C) 2004 William Lee Irwin III
+ */
+#include <linux/fs.h>
+#include <linux/debugfs.h>
+#include <linux/uaccess.h>
+#include <linux/mcount.h>
+
+#include "tracer.h"
+
+static struct tracing_trace function_trace __read_mostly;
+static DEFINE_PER_CPU(struct tracing_trace_cpu, function_trace_cpu);
+static int trace_enabled __read_mostly;
+
+static notrace void function_trace_reset(struct tracing_trace *tr)
+{
+	int cpu;
+
+	tr->time_start = now();
+
+	for_each_online_cpu(cpu)
+		tracing_reset(tr->data[cpu]);
+}
+
+static void notrace function_trace_call(unsigned long ip,
+					unsigned long parent_ip)
+{
+	struct tracing_trace *tr = &function_trace;
+	struct tracing_trace_cpu *data;
+	unsigned long flags;
+	int cpu;
+
+	if (unlikely(!trace_enabled))
+		return;
+
+	raw_local_irq_save(flags);
+	cpu = raw_smp_processor_id();
+	data = tr->data[cpu];
+	atomic_inc(&data->disabled);
+
+	if (likely(atomic_read(&data->disabled) == 1))
+		tracing_function_trace(tr, data, ip, parent_ip, flags);
+
+	atomic_dec(&data->disabled);
+	raw_local_irq_restore(flags);
+}
+
+static struct mcount_ops trace_ops __read_mostly =
+{
+	.func = function_trace_call,
+};
+
+#ifdef CONFIG_DEBUG_FS
+static void function_trace_ctrl_update(struct tracing_trace *tr,
+				       unsigned long val)
+{
+	val = !!val;
+
+	/* When starting a new trace, reset the buffers */
+	if (val)
+		function_trace_reset(tr);
+
+	if (tr->ctrl ^ val) {
+		if (val) {
+			trace_enabled = 1;
+			register_mcount_function(&trace_ops);
+		} else {
+			trace_enabled = 0;
+			unregister_mcount_function(&trace_ops);
+		}
+		tr->ctrl = val;
+	}
+}
+
+static __init void function_trace_init_debugfs(void)
+{
+	struct dentry *d_tracer;
+	struct dentry *entry;
+
+	d_tracer = tracing_init_dentry();
+
+	function_trace.ctrl_update = function_trace_ctrl_update;
+
+	entry = debugfs_create_file("fn_trace_ctrl", 0644, d_tracer,
+				    &function_trace, &tracing_ctrl_fops);
+	if (!entry)
+		pr_warning("Could not create debugfs 'ctrl' entry\n");
+
+	entry = debugfs_create_file("function_trace", 0444, d_tracer,
+				    &function_trace, &tracing_lt_fops);
+	if (!entry)
+		pr_warning("Could not create debugfs 'function_trace' entry\n");
+
+	entry = debugfs_create_file("trace", 0444, d_tracer,
+				    &function_trace, &tracing_fops);
+	if (!entry)
+		pr_warning("Could not create debugfs 'trace' entry\n");
+
+}
+
+#else
+static __init void function_trace_init_debugfs(void)
+{
+	/*
+	 * No way to turn on or off the trace function
+	 * without debugfs, so we just turn it on.
+	 */
+}
+#endif
+
+static void function_trace_open(struct tracing_iterator *iter)
+{
+	/* stop the trace while dumping */
+	if (iter->tr->ctrl)
+		trace_enabled = 0;
+}
+
+static void function_trace_close(struct tracing_iterator *iter)
+{
+	if (iter->tr->ctrl)
+		trace_enabled = 1;
+}
+
+__init static int function_trace_alloc_buffers(void)
+{
+	const int order = page_order(TRACING_NR_ENTRIES * TRACING_ENTRY_SIZE);
+	const unsigned long size = (1UL << order) << PAGE_SHIFT;
+	struct tracing_entry *array;
+	int i;
+
+	for_each_possible_cpu(i) {
+		function_trace.data[i] = &per_cpu(function_trace_cpu, i);
+		array = (struct tracing_entry *)
+			  __get_free_pages(GFP_KERNEL, order);
+		if (array == NULL) {
+			printk(KERN_ERR "function tracer: failed to allocate"
+			       " %ld bytes for trace buffer!\n", size);
+			goto free_buffers;
+		}
+		function_trace.data[i]->trace = array;
+	}
+
+	/*
+	 * Since we allocate by orders of pages, we may be able to
+	 * round up a bit.
+	 */
+	function_trace.entries = size / TRACING_ENTRY_SIZE;
+
+	pr_info("function tracer: %ld bytes allocated for %ld",
+		size, TRACING_NR_ENTRIES);
+	pr_info(" entries of %ld bytes\n", (long)TRACING_ENTRY_SIZE);
+	pr_info("   actual entries %ld\n", function_trace.entries);
+
+	function_trace_init_debugfs();
+
+	function_trace.open = function_trace_open;
+	function_trace.close = function_trace_close;
+
+	return 0;
+
+ free_buffers:
+	for (i-- ; i >= 0; i--) {
+		struct tracing_trace_cpu *data = function_trace.data[i];
+
+		if (data && data->trace) {
+			free_pages((unsigned long)data->trace, order);
+			data->trace = NULL;
+		}
+	}
+	return -ENOMEM;
+}
+
+device_initcall(function_trace_alloc_buffers);
Index: linux-compile.git/lib/tracing/tracer.c
===================================================================
--- linux-compile.git.orig/lib/tracing/tracer.c	2008-01-14 13:14:14.000000000 -0500
+++ linux-compile.git/lib/tracing/tracer.c	2008-01-14 13:14:14.000000000 -0500
@@ -19,23 +19,21 @@
 #include <linux/percpu.h>
 #include <linux/debugfs.h>
 #include <linux/kallsyms.h>
-#include <linux/clocksource.h>
 #include <linux/utsrelease.h>
 #include <linux/uaccess.h>
 #include <linux/hardirq.h>
 #include <linux/mcount.h>
 
 #include "tracer.h"
-#include "tracer_interface.h"
 
-static inline notrace cycle_t now(void)
+enum trace_type
 {
-	return get_monotonic_cycles();
-}
+	__TRACE_FIRST_TYPE = 0,
+
+	TRACE_FN,
 
-static struct mctracer_trace mctracer_trace;
-static DEFINE_PER_CPU(struct mctracer_trace_cpu, mctracer_trace_cpu);
-static int trace_enabled __read_mostly;
+	__TRACE_LAST_TYPE
+};
 
 enum trace_flag_type {
 	TRACE_FLAG_IRQS_OFF		= 0x01,
@@ -44,18 +42,18 @@ enum trace_flag_type {
 	TRACE_FLAG_SOFTIRQ		= 0x08,
 };
 
-static inline notrace void
-mctracer_add_trace_entry(struct mctracer_trace *tr,
-			 int cpu,
-			 const unsigned long ip,
-			 const unsigned long parent_ip,
-			 unsigned long flags)
+void notrace tracing_reset(struct tracing_trace_cpu *data)
+{
+	data->trace_idx = 0;
+	atomic_set(&data->underrun, 0);
+}
+
+static inline notrace struct tracing_entry *
+tracing_get_trace_entry(struct tracing_trace *tr,
+			struct tracing_trace_cpu *data)
 {
 	unsigned long idx, idx_next;
-	struct mctracer_entry *entry;
-	struct task_struct *tsk = current;
-	struct mctracer_trace_cpu *data = tr->data[cpu];
-	unsigned long pc;
+	struct tracing_entry *entry;
 
 	idx = data->trace_idx;
 	idx_next = idx + 1;
@@ -70,12 +68,21 @@ mctracer_add_trace_entry(struct mctracer
 	if (unlikely(idx_next != 0 && atomic_read(&data->underrun)))
 		atomic_inc(&data->underrun);
 
+	entry = data->trace + idx * TRACING_ENTRY_SIZE;
+
+	return entry;
+}
+
+static inline notrace void
+tracing_generic_entry_update(struct tracing_entry *entry,
+			     unsigned long flags)
+{
+	struct task_struct *tsk = current;
+	unsigned long pc;
+
 	pc = preempt_count();
 
-	entry = data->trace + idx * MCTRACER_ENTRY_SIZE;
 	entry->preempt_count = pc & 0xff;
-	entry->ip	 = ip;
-	entry->parent_ip = parent_ip;
 	entry->pid	 = tsk->pid;
 	entry->t	 = now();
 	entry->flags = (irqs_disabled_flags(flags) ? TRACE_FLAG_IRQS_OFF : 0) |
@@ -85,49 +92,19 @@ mctracer_add_trace_entry(struct mctracer
 	memcpy(entry->comm, tsk->comm, TASK_COMM_LEN);
 }
 
-static notrace void trace_function(const unsigned long ip,
-				   const unsigned long parent_ip)
-{
-	unsigned long flags;
-	struct mctracer_trace *tr;
-	int cpu;
-
-	raw_local_irq_save(flags);
-
-	tr = &mctracer_trace;
-	if (!trace_enabled)
-		goto out;
-
-	cpu = raw_smp_processor_id();
-
-	atomic_inc(&tr->data[cpu]->disabled);
-	if (likely(atomic_read(&tr->data[cpu]->disabled) == 1))
-		mctracer_add_trace_entry(tr, cpu, ip, parent_ip, flags);
-
-	atomic_dec(&tr->data[cpu]->disabled);
-
- out:
-	raw_local_irq_restore(flags);
-}
-
-static struct mcount_ops trace_ops __read_mostly =
-{
-	.func = trace_function,
-};
-
-static notrace void mctracer_reset(struct mctracer_trace *tr)
-{
-	int cpu;
-
-	tr->time_start = now();
-	tr->saved_latency = 0;
-	tr->critical_start = 0;
-	tr->critical_end = 0;
-
-	for_each_online_cpu(cpu) {
-		tr->data[cpu]->trace_idx = 0;
-		atomic_set(&tr->data[cpu]->underrun, 0);
-	}
+notrace void tracing_function_trace(struct tracing_trace *tr,
+				    struct tracing_trace_cpu *data,
+				    unsigned long ip,
+				    unsigned long parent_ip,
+				    unsigned long flags)
+{
+	struct tracing_entry *entry;
+
+	entry = tracing_get_trace_entry(tr, data);
+	tracing_generic_entry_update(entry, flags);
+	entry->type	    = TRACE_FN;
+	entry->fn.ip	    = ip;
+	entry->fn.parent_ip = parent_ip;
 }
 
 #ifdef CONFIG_DEBUG_FS
@@ -143,25 +120,17 @@ static const char *trace_options[] = {
 	NULL
 };
 
+static unsigned trace_flags;
+
 enum trace_file_type {
 	TRACE_FILE_LAT_FMT	= 1,
 };
 
-struct mctracer_iterator {
-	struct mctracer_trace *tr;
-	struct mctracer_entry *ent;
-	unsigned long iter_flags;
-	loff_t pos;
-	unsigned long next_idx[NR_CPUS];
-	int cpu;
-	int idx;
-};
-
-static struct mctracer_entry *mctracer_entry_idx(struct mctracer_trace *tr,
-						 unsigned long idx,
-						 int cpu)
+static struct tracing_entry *tracing_entry_idx(struct tracing_trace *tr,
+					       unsigned long idx,
+					       int cpu)
 {
-	struct mctracer_entry *array = tr->data[cpu]->trace;
+	struct tracing_entry *array = tr->data[cpu]->trace;
 	unsigned long underrun;
 
 	if (idx >= tr->entries)
@@ -176,18 +145,18 @@ static struct mctracer_entry *mctracer_e
 	return &array[idx];
 }
 
-static struct notrace mctracer_entry *
-find_next_entry(struct mctracer_iterator *iter, int *ent_cpu)
+static struct notrace tracing_entry *
+find_next_entry(struct tracing_iterator *iter, int *ent_cpu)
 {
-	struct mctracer_trace *tr = iter->tr;
-	struct mctracer_entry *ent, *next = NULL;
+	struct tracing_trace *tr = iter->tr;
+	struct tracing_entry *ent, *next = NULL;
 	int next_cpu = -1;
 	int cpu;
 
 	for_each_possible_cpu(cpu) {
 		if (!tr->data[cpu]->trace)
 			continue;
-		ent = mctracer_entry_idx(tr, iter->next_idx[cpu], cpu);
+		ent = tracing_entry_idx(tr, iter->next_idx[cpu], cpu);
 		if (ent && (!next || next->t > ent->t)) {
 			next = ent;
 			next_cpu = cpu;
@@ -200,9 +169,9 @@ find_next_entry(struct mctracer_iterator
 	return next;
 }
 
-static void *find_next_entry_inc(struct mctracer_iterator *iter)
+static void *find_next_entry_inc(struct tracing_iterator *iter)
 {
-	struct mctracer_entry *next;
+	struct tracing_entry *next;
 	int next_cpu = -1;
 
 	next = find_next_entry(iter, &next_cpu);
@@ -220,7 +189,7 @@ static void *find_next_entry_inc(struct 
 static void notrace *
 s_next(struct seq_file *m, void *v, loff_t *pos)
 {
-	struct mctracer_iterator *iter = m->private;
+	struct tracing_iterator *iter = m->private;
 	void *ent;
 	void *last_ent = iter->ent;
 	int i = (int)*pos;
@@ -249,14 +218,14 @@ s_next(struct seq_file *m, void *v, loff
 
 static void *s_start(struct seq_file *m, loff_t *pos)
 {
-	struct mctracer_iterator *iter = m->private;
+	struct tracing_iterator *iter = m->private;
 	void *p = NULL;
 	loff_t l = 0;
 	int i;
 
-	/* stop the trace while dumping */
-	if (iter->tr->ctrl)
-		trace_enabled = 0;
+	/* let the tracer grab locks here if needed */
+	if (iter->tr->start)
+		iter->tr->start(iter);
 
 	if (*pos != iter->pos) {
 		iter->ent = NULL;
@@ -279,9 +248,11 @@ static void *s_start(struct seq_file *m,
 
 static void s_stop(struct seq_file *m, void *p)
 {
-	struct mctracer_iterator *iter = m->private;
-	if (iter->tr->ctrl)
-		trace_enabled = 1;
+	struct tracing_iterator *iter = m->private;
+
+	/* let the tracer release locks here if needed */
+	if (iter->tr->stop)
+		iter->tr->stop(iter);
 }
 
 #ifdef CONFIG_KALLSYMS
@@ -330,13 +301,14 @@ static void notrace print_help_header(st
 }
 
 static void notrace print_trace_header(struct seq_file *m,
-				       struct mctracer_iterator *iter)
+				       struct tracing_iterator *iter)
 {
-	struct mctracer_trace *tr = iter->tr;
+	struct tracing_trace *tr = iter->tr;
+	struct tracing_trace_cpu *data = tr->data[tr->cpu];
 	unsigned long underruns = 0;
 	unsigned long underrun;
 	unsigned long entries   = 0;
-	int sym_only = !!(tr->iter_flags & TRACE_ITER_SYM_ONLY);
+	int sym_only = !!(trace_flags & TRACE_ITER_SYM_ONLY);
 	int cpu;
 
 	for_each_possible_cpu(cpu) {
@@ -356,7 +328,7 @@ static void notrace print_trace_header(s
 		 "---------------------------------\n");
 	seq_printf(m, " latency: %lu us, #%lu/%lu, CPU#%d |"
 		   " (M:%s VP:%d, KP:%d, SP:%d HP:%d",
-		   cycles_to_usecs(tr->saved_latency),
+		   cycles_to_usecs(data->saved_latency),
 		   entries,
 		   (entries + underruns),
 		   smp_processor_id(),
@@ -379,15 +351,15 @@ static void notrace print_trace_header(s
 	seq_puts(m, "    -----------------\n");
 	seq_printf(m, "    | task: %.16s-%d "
 		   "(uid:%d nice:%ld policy:%ld rt_prio:%ld)\n",
-		   tr->comm, tr->pid, tr->uid, tr->nice,
-		   tr->policy, tr->rt_priority);
+		   data->comm, data->pid, data->uid, data->nice,
+		   data->policy, data->rt_priority);
 	seq_puts(m, "    -----------------\n");
 
-	if (tr->critical_start) {
+	if (data->critical_start) {
 		seq_puts(m, " => started at: ");
-		seq_print_ip_sym(m, tr->critical_start, sym_only);
+		seq_print_ip_sym(m, data->critical_start, sym_only);
 		seq_puts(m, "\n => ended at:   ");
-		seq_print_ip_sym(m, tr->critical_end, sym_only);
+		seq_print_ip_sym(m, data->critical_end, sym_only);
 		seq_puts(m, "\n");
 	}
 
@@ -396,7 +368,7 @@ static void notrace print_trace_header(s
 
 
 static void notrace
-lat_print_generic(struct seq_file *m, struct mctracer_entry *entry, int cpu)
+lat_print_generic(struct seq_file *m, struct tracing_entry *entry, int cpu)
 {
 	int hardirq, softirq;
 
@@ -422,7 +394,7 @@ lat_print_generic(struct seq_file *m, st
 	}
 
 	if (entry->preempt_count)
-		seq_printf(m, "%lx", entry->preempt_count);
+		seq_printf(m, "%x", entry->preempt_count);
 	else
 		seq_puts(m, ".");
 }
@@ -443,15 +415,15 @@ lat_print_timestamp(struct seq_file *m, 
 }
 
 static void notrace
-print_lat_fmt(struct seq_file *m, struct mctracer_iterator *iter,
+print_lat_fmt(struct seq_file *m, struct tracing_iterator *iter,
 	      unsigned int trace_idx, int cpu)
 {
-	struct mctracer_entry *entry = iter->ent;
-	struct mctracer_entry *next_entry = find_next_entry(iter, NULL);
+	struct tracing_entry *entry = iter->ent;
+	struct tracing_entry *next_entry = find_next_entry(iter, NULL);
 	unsigned long abs_usecs;
 	unsigned long rel_usecs;
-	int sym_only = !!(iter->tr->iter_flags & TRACE_ITER_SYM_ONLY);
-	int verbose = !!(iter->tr->iter_flags & TRACE_ITER_VERBOSE);
+	int sym_only = !!(trace_flags & TRACE_ITER_SYM_ONLY);
+	int verbose = !!(trace_flags & TRACE_ITER_VERBOSE);
 
 	if (!next_entry)
 		next_entry = entry;
@@ -459,7 +431,7 @@ print_lat_fmt(struct seq_file *m, struct
 	abs_usecs = cycles_to_usecs(entry->t - iter->tr->time_start);
 
 	if (verbose) {
-		seq_printf(m, "%16s %5d %d %ld %08lx %08x [%08lx]"
+		seq_printf(m, "%16s %5d %d %d %08x %08x [%08lx]"
 			   " %ld.%03ldms (+%ld.%03ldms): ",
 			   entry->comm,
 			   entry->pid, cpu, entry->flags,
@@ -471,18 +443,22 @@ print_lat_fmt(struct seq_file *m, struct
 		lat_print_generic(m, entry, cpu);
 		lat_print_timestamp(m, abs_usecs, rel_usecs);
 	}
-	seq_print_ip_sym(m, entry->ip, sym_only);
-	seq_puts(m, " (");
-	seq_print_ip_sym(m, entry->parent_ip, sym_only);
-	seq_puts(m, ")\n");
+	switch (entry->type) {
+	case TRACE_FN:
+		seq_print_ip_sym(m, entry->fn.ip, sym_only);
+		seq_puts(m, " (");
+		seq_print_ip_sym(m, entry->fn.parent_ip, sym_only);
+		seq_puts(m, ")\n");
+		break;
+	}
 }
 
 static void notrace print_trace_fmt(struct seq_file *m,
-				    struct mctracer_iterator *iter)
+				    struct tracing_iterator *iter)
 {
 	unsigned long usec_rem;
 	unsigned long secs;
-	int sym_only = !!(iter->tr->iter_flags & TRACE_ITER_SYM_ONLY);
+	int sym_only = !!(trace_flags & TRACE_ITER_SYM_ONLY);
 	unsigned long long t;
 
 	t = cycles_to_usecs(iter->ent->t);
@@ -493,18 +469,22 @@ static void notrace print_trace_fmt(stru
 	seq_printf(m, "CPU %d: ", iter->cpu);
 	seq_printf(m, "%s:%d ", iter->ent->comm,
 		   iter->ent->pid);
-	seq_print_ip_sym(m, iter->ent->ip, sym_only);
-	if (iter->ent->parent_ip) {
-		seq_printf(m, " <-- ");
-		seq_print_ip_sym(m, iter->ent->parent_ip,
-				 sym_only);
+	switch (iter->ent->type) {
+	case TRACE_FN:
+		seq_print_ip_sym(m, iter->ent->fn.ip, sym_only);
+		if (iter->ent->fn.parent_ip) {
+			seq_printf(m, " <-- ");
+			seq_print_ip_sym(m, iter->ent->fn.parent_ip,
+					 sym_only);
+		}
+		break;
 	}
 	seq_printf(m, "\n");
 }
 
-static int trace_empty(struct mctracer_iterator *iter)
+static int trace_empty(struct tracing_iterator *iter)
 {
-	struct mctracer_trace_cpu *data;
+	struct tracing_trace_cpu *data;
 	int cpu;
 
 	for_each_possible_cpu(cpu) {
@@ -520,7 +500,7 @@ static int trace_empty(struct mctracer_i
 
 static int s_show(struct seq_file *m, void *v)
 {
-	struct mctracer_iterator *iter = v;
+	struct tracing_iterator *iter = v;
 
 	if (iter->ent == NULL) {
 		if (iter->iter_flags & TRACE_FILE_LAT_FMT) {
@@ -528,10 +508,10 @@ static int s_show(struct seq_file *m, vo
 			if (trace_empty(iter))
 				return 0;
 			print_trace_header(m, iter);
-			if (!(iter->tr->iter_flags & TRACE_ITER_VERBOSE))
+			if (!(trace_flags & TRACE_ITER_VERBOSE))
 				print_help_header(m);
 		} else
-			seq_printf(m, "mctracer:\n");
+			seq_printf(m, "tracer:\n");
 	} else {
 		if (iter->iter_flags & TRACE_FILE_LAT_FMT)
 			print_lat_fmt(m, iter, iter->idx, iter->cpu);
@@ -542,17 +522,17 @@ static int s_show(struct seq_file *m, vo
 	return 0;
 }
 
-static struct seq_operations mctrace_seq_ops = {
+static struct seq_operations tracer_seq_ops = {
 	.start = s_start,
 	.next = s_next,
 	.stop = s_stop,
 	.show = s_show,
 };
 
-static struct mctracer_iterator *
-__mctrace_open(struct inode *inode, struct file *file, int *ret)
+static struct tracing_iterator notrace *
+__tracing_open(struct inode *inode, struct file *file, int *ret)
 {
-	struct mctracer_iterator *iter;
+	struct tracing_iterator *iter;
 
 	iter = kzalloc(sizeof(*iter), GFP_KERNEL);
 	if (!iter) {
@@ -560,14 +540,21 @@ __mctrace_open(struct inode *inode, stru
 		goto out;
 	}
 
-	iter->tr = &mctracer_trace;
+	iter->tr = inode->i_private;
 	iter->pos = -1;
 
 	/* TODO stop tracer */
-	*ret = seq_open(file, &mctrace_seq_ops);
+	*ret = seq_open(file, &tracer_seq_ops);
 	if (!*ret) {
 		struct seq_file *m = file->private_data;
 		m->private = iter;
+
+		/*
+		 * Most tracers want to disable the
+		 * trace while printing a trace.
+		 */
+		if (iter->tr->open)
+			iter->tr->open(iter);
 	} else {
 		kfree(iter);
 		iter = NULL;
@@ -577,21 +564,40 @@ __mctrace_open(struct inode *inode, stru
 	return iter;
 }
 
-static int mctrace_open(struct inode *inode, struct file *file)
+int tracing_open_generic(struct inode *inode, struct file *filp)
+{
+	filp->private_data = inode->i_private;
+	return 0;
+}
+
+int tracing_release(struct inode *inode, struct file *file)
+{
+	struct seq_file *m = (struct seq_file *)file->private_data;
+	struct tracing_iterator *iter = m->private;
+
+	if (iter->tr->close)
+		iter->tr->close(iter);
+
+	seq_release(inode, file);
+	kfree(iter);
+	return 0;
+}
+
+static int tracing_open(struct inode *inode, struct file *file)
 {
 	int ret;
 
-	__mctrace_open(inode, file, &ret);
+	__tracing_open(inode, file, &ret);
 
 	return ret;
 }
 
-static int mctrace_lt_open(struct inode *inode, struct file *file)
+static int tracing_lt_open(struct inode *inode, struct file *file)
 {
-	struct mctracer_iterator *iter;
+	struct tracing_iterator *iter;
 	int ret;
 
-	iter = __mctrace_open(inode, file, &ret);
+	iter = __tracing_open(inode, file, &ret);
 
 	if (!ret)
 		iter->iter_flags |= TRACE_FILE_LAT_FMT;
@@ -599,109 +605,27 @@ static int mctrace_lt_open(struct inode 
 	return ret;
 }
 
-int mctrace_release(struct inode *inode, struct file *file)
-{
-	struct seq_file *m = (struct seq_file *)file->private_data;
-	struct mctracer_iterator *iter = m->private;
-
-	seq_release(inode, file);
-	kfree(iter);
-	return 0;
-}
-
-static struct file_operations mctrace_fops = {
-	.open = mctrace_open,
+struct file_operations tracing_fops = {
+	.open = tracing_open,
 	.read = seq_read,
 	.llseek = seq_lseek,
-	.release = mctrace_release,
+	.release = tracing_release,
 };
 
-static struct file_operations mctrace_lt_fops = {
-	.open = mctrace_lt_open,
+struct file_operations tracing_lt_fops = {
+	.open = tracing_lt_open,
 	.read = seq_read,
 	.llseek = seq_lseek,
-	.release = mctrace_release,
+	.release = tracing_release,
 };
 
-static int mctracer_open_generic(struct inode *inode, struct file *filp)
-{
-	filp->private_data = inode->i_private;
-	return 0;
-}
-
-
-static ssize_t mctracer_ctrl_read(struct file *filp, char __user *ubuf,
-				  size_t cnt, loff_t *ppos)
+static ssize_t tracing_iter_ctrl_read(struct file *filp, char __user *ubuf,
+				      size_t cnt, loff_t *ppos)
 {
-	struct mctracer_trace *tr = filp->private_data;
-	char buf[16];
-	int r;
-
-	r = sprintf(buf, "%ld\n", tr->ctrl);
-	return simple_read_from_buffer(ubuf, cnt, ppos,
-				       buf, r);
-}
-
-static ssize_t mctracer_ctrl_write(struct file *filp,
-				   const char __user *ubuf,
-				   size_t cnt, loff_t *ppos)
-{
-	struct mctracer_trace *tr = filp->private_data;
-	long val;
-	char buf[16];
-
-	if (cnt > 15)
-		cnt = 15;
-
-	if (copy_from_user(&buf, ubuf, cnt))
-		return -EFAULT;
-
-	buf[cnt] = 0;
-
-	val = !!simple_strtoul(buf, NULL, 10);
-
-	/* When starting a new trace, reset the buffers */
-	if (val)
-		mctracer_reset(tr);
-	else {
-		/* pretty meaningless for now */
-		tr->time_end = now();
-		tr->saved_latency = tr->time_end - tr->time_start;
-		memcpy(tr->comm, current->comm, TASK_COMM_LEN);
-		tr->pid = current->pid;
-		tr->uid = current->uid;
-		tr->nice = current->static_prio - 20 - MAX_RT_PRIO;
-		tr->policy = current->policy;
-		tr->rt_priority = current->rt_priority;
-	}
-
-	if (tr->ctrl ^ val) {
-		if (val)
-			trace_enabled = 1;
-		else
-			trace_enabled = 0;
-		tr->ctrl = val;
-	}
-
-	filp->f_pos += cnt;
-
-	return cnt;
-}
-
-static struct file_operations mctracer_ctrl_fops = {
-	.open = mctracer_open_generic,
-	.read = mctracer_ctrl_read,
-	.write = mctracer_ctrl_write,
-};
-
-static ssize_t mctracer_iter_ctrl_read(struct file *filp, char __user *ubuf,
-				       size_t cnt, loff_t *ppos)
-{
-	struct mctracer_trace *tr = filp->private_data;
 	char *buf;
 	int r = 0;
-	int i;
 	int len = 0;
+	int i;
 
 	/* calulate max size */
 	for (i = 0; trace_options[i]; i++) {
@@ -715,7 +639,7 @@ static ssize_t mctracer_iter_ctrl_read(s
 		return -ENOMEM;
 
 	for (i = 0; trace_options[i]; i++) {
-		if (tr->iter_flags & (1 << i))
+		if (trace_flags & (1 << i))
 			r += sprintf(buf + r, "%s ", trace_options[i]);
 		else
 			r += sprintf(buf + r, "no%s ", trace_options[i]);
@@ -732,11 +656,10 @@ static ssize_t mctracer_iter_ctrl_read(s
 	return r;
 }
 
-static ssize_t mctracer_iter_ctrl_write(struct file *filp,
-					const char __user *ubuf,
-					size_t cnt, loff_t *ppos)
+static ssize_t tracing_iter_ctrl_write(struct file *filp,
+				       const char __user *ubuf,
+				       size_t cnt, loff_t *ppos)
 {
-	struct mctracer_trace *tr = filp->private_data;
 	char buf[64];
 	char *cmp = buf;
 	int neg = 0;
@@ -760,9 +683,9 @@ static ssize_t mctracer_iter_ctrl_write(
 
 		if (strncmp(cmp, trace_options[i], len) == 0) {
 			if (neg)
-				tr->iter_flags &= ~(1 << i);
+				trace_flags &= ~(1 << i);
 			else
-				tr->iter_flags |= (1 << i);
+				trace_flags |= (1 << i);
 			break;
 		}
 	}
@@ -772,104 +695,92 @@ static ssize_t mctracer_iter_ctrl_write(
 	return cnt;
 }
 
-static struct file_operations mctracer_iter_fops = {
-	.open = mctracer_open_generic,
-	.read = mctracer_iter_ctrl_read,
-	.write = mctracer_iter_ctrl_write,
+static struct file_operations tracing_iter_fops = {
+	.open = tracing_open_generic,
+	.read = tracing_iter_ctrl_read,
+	.write = tracing_iter_ctrl_write,
 };
 
-static void mctrace_init_debugfs(void)
+static ssize_t tracing_ctrl_read(struct file *filp, char __user *ubuf,
+				 size_t cnt, loff_t *ppos)
 {
-	struct dentry *d_mctracer;
-	struct dentry *entry;
+	struct tracing_trace *tr = filp->private_data;
+	char buf[64];
+	int r;
 
-	d_mctracer = debugfs_create_dir("tracing", NULL);
-	if (!d_mctracer) {
-		pr_warning("Could not create debugfs directory mctracer\n");
-		return;
-	}
+	r = sprintf(buf, "%ld\n", tr->ctrl);
+	return simple_read_from_buffer(ubuf, cnt, ppos,
+				       buf, r);
+}
 
-	entry = debugfs_create_file("ctrl", 0644, d_mctracer,
-				    &mctracer_trace, &mctracer_ctrl_fops);
-	if (!entry)
-		pr_warning("Could not create debugfs 'ctrl' entry\n");
+static ssize_t tracing_ctrl_write(struct file *filp,
+				  const char __user *ubuf,
+				  size_t cnt, loff_t *ppos)
+{
+	struct tracing_trace *tr = filp->private_data;
+	long val;
+	char buf[64];
 
-	entry = debugfs_create_file("iter_ctrl", 0644, d_mctracer,
-				    &mctracer_trace, &mctracer_iter_fops);
-	if (!entry)
-		pr_warning("Could not create debugfs 'iter_ctrl' entry\n");
+	if (cnt > 63)
+		cnt = 63;
 
-	entry = debugfs_create_file("function_trace", 0444, d_mctracer,
-				    &mctracer_trace, &mctrace_lt_fops);
-	if (!entry)
-		pr_warning("Could not create debugfs 'function_trace' entry\n");
+	if (copy_from_user(&buf, ubuf, cnt))
+		return -EFAULT;
 
-	entry = debugfs_create_file("trace", 0444, d_mctracer,
-				    &mctracer_trace, &mctrace_fops);
-	if (!entry)
-		pr_warning("Could not create debugfs 'trace' entry\n");
+	buf[cnt] = 0;
 
-}
-#else /* CONFIG_DEBUG_FS */
-static void mctrace_init_debugfs(void)
-{
-	/*
-	 * No way to turn on or off the trace function
-	 * without debugfs.
-	 */
-}
-#endif /* CONFIG_DEBUG_FS */
+	val = simple_strtoul(buf, NULL, 10);
 
-static notrace int page_order(const unsigned long size)
-{
-	const unsigned long nr_pages = DIV_ROUND_UP(size, PAGE_SIZE);
-	return ilog2(roundup_pow_of_two(nr_pages));
+	tr->ctrl_update(tr, val);
+
+	filp->f_pos += cnt;
+
+	return cnt;
 }
 
-static notrace int mctracer_alloc_buffers(void)
+struct file_operations tracing_ctrl_fops = {
+	.open = tracing_open_generic,
+	.read = tracing_ctrl_read,
+	.write = tracing_ctrl_write,
+};
+
+static struct dentry *d_tracer;
+
+struct dentry *tracing_init_dentry(void)
 {
-	const int order = page_order(MCTRACER_NR_ENTRIES * MCTRACER_ENTRY_SIZE);
-	const unsigned long size = (1UL << order) << PAGE_SHIFT;
-	struct mctracer_entry *array;
-	int i;
+	static int once;
 
-	for_each_possible_cpu(i) {
-		mctracer_trace.data[i] = &per_cpu(mctracer_trace_cpu, i);
-		array = (struct mctracer_entry *)
-			  __get_free_pages(GFP_KERNEL, order);
-		if (array == NULL) {
-			printk(KERN_ERR "mctracer: failed to allocate"
-			       " %ld bytes for trace buffer!\n", size);
-			goto free_buffers;
-		}
-		mctracer_trace.data[i]->trace = array;
+	if (d_tracer)
+		return d_tracer;
+
+	d_tracer = debugfs_create_dir("tracing", NULL);
+
+	if (!d_tracer && !once) {
+		once = 1;
+		pr_warning("Could not create debugfs directory 'tracing'\n");
+		return NULL;
 	}
 
-	/*
-	 * Since we allocate by orders of pages, we may be able to
-	 * round up a bit.
-	 */
-	mctracer_trace.entries = size / MCTRACER_ENTRY_SIZE;
+	return d_tracer;
+}
 
-	pr_info("mctracer: %ld bytes allocated for %ld entries of %ld bytes\n",
-		size, MCTRACER_NR_ENTRIES, (long)MCTRACER_ENTRY_SIZE);
-	pr_info("   actual entries %ld\n", mctracer_trace.entries);
+static __init int trace_init_debugfs(void)
+{
+	struct dentry *d_tracer;
+	struct dentry *entry;
 
-	register_mcount_function(&trace_ops);
+	d_tracer = tracing_init_dentry();
+	if (!d_tracer)
+		return 0;
 
-	mctrace_init_debugfs();
+	entry = debugfs_create_file("iter_ctrl", 0644, d_tracer,
+				    NULL, &tracing_iter_fops);
+	if (!entry)
+		pr_warning("Could not create debugfs 'iter_ctrl' entry\n");
 
 	return 0;
-
- free_buffers:
-	for (i-- ; i >= 0; i--) {
-		if (mctracer_trace.data[i] && mctracer_trace.data[i]->trace) {
-			free_pages((unsigned long)mctracer_trace.data[i]->trace,
-				   order);
-			mctracer_trace.data[i]->trace = NULL;
-		}
-	}
-	return -ENOMEM;
 }
 
-device_initcall(mctracer_alloc_buffers);
+device_initcall(trace_init_debugfs);
+
+#endif /* CONFIG_DEBUG_FS */
Index: linux-compile.git/lib/tracing/tracer.h
===================================================================
--- linux-compile.git.orig/lib/tracing/tracer.h	2008-01-14 13:14:14.000000000 -0500
+++ linux-compile.git/lib/tracing/tracer.h	2008-01-14 13:14:14.000000000 -0500
@@ -3,40 +3,93 @@
 
 #include <asm/atomic.h>
 #include <linux/sched.h>
+#include <linux/clocksource.h>
 
-struct mctracer_entry {
-	unsigned long long t;
+struct tracing_function {
 	unsigned long ip;
 	unsigned long parent_ip;
-	unsigned long preempt_count;
-	unsigned long flags;
+};
+
+struct tracing_entry {
+	char type;
+	char cpu;  /* who will want to trace more than 256 CPUS? */
+	char flags;
+	char preempt_count; /* assumes PREEMPT_MASK is 8 bits or less */
+	int pid;
+	cycle_t t;
 	char comm[TASK_COMM_LEN];
-	pid_t pid;
+	struct tracing_function fn;
 };
 
-struct mctracer_trace_cpu {
+struct tracing_trace_cpu {
 	void *trace;
 	unsigned long trace_idx;
 	atomic_t      disabled;
 	atomic_t      underrun;
+	unsigned long saved_latency;
+	unsigned long critical_start;
+	unsigned long critical_end;
+	unsigned long critical_sequence;
+	unsigned long nice;
+	unsigned long policy;
+	unsigned long rt_priority;
+	cycle_t preempt_timestamp;
+	pid_t	      pid;
+	uid_t	      uid;
+	char comm[TASK_COMM_LEN];
 };
 
-struct mctracer_trace {
+struct tracing_iterator;
+
+struct tracing_trace {
 	unsigned long entries;
 	long	      ctrl;
+	int	      cpu;
+	cycle_t	      time_start;
+	void (*open)(struct tracing_iterator *iter);
+	void (*close)(struct tracing_iterator *iter);
+	void (*start)(struct tracing_iterator *iter);
+	void (*stop)(struct tracing_iterator *iter);
+	void (*ctrl_update)(struct tracing_trace *tr,
+			    unsigned long val);
+	struct tracing_trace_cpu *data[NR_CPUS];
+};
+
+struct tracing_iterator {
+	struct tracing_trace *tr;
+	struct tracing_entry *ent;
 	unsigned long iter_flags;
-	char comm[TASK_COMM_LEN];
-	pid_t	      pid;
-	uid_t	      uid;
-	unsigned long nice;
-	unsigned long policy;
-	unsigned long rt_priority;
-	unsigned long saved_latency;
-	unsigned long critical_start;
-	unsigned long critical_end;
-	unsigned long long time_start;
-	unsigned long long time_end;
-	struct mctracer_trace_cpu *data[NR_CPUS];
+	loff_t pos;
+	unsigned long next_idx[NR_CPUS];
+	int cpu;
+	int idx;
 };
 
+#define TRACING_ENTRY_SIZE sizeof(struct tracing_entry)
+#define TRACING_NR_ENTRIES (65536UL)
+
+void notrace tracing_reset(struct tracing_trace_cpu *data);
+int tracing_open_generic(struct inode *inode, struct file *filp);
+struct dentry *tracing_init_dentry(void);
+void tracing_function_trace(struct tracing_trace *tr,
+			    struct tracing_trace_cpu *data,
+			    unsigned long ip,
+			    unsigned long parent_ip,
+			    unsigned long flags);
+
+extern struct file_operations tracing_fops;
+extern struct file_operations tracing_lt_fops;
+extern struct file_operations tracing_ctrl_fops;
+
+static inline notrace cycle_t now(void)
+{
+	return get_monotonic_cycles();
+}
+
+static inline notrace int page_order(const unsigned long size)
+{
+	const unsigned long nr_pages = DIV_ROUND_UP(size, PAGE_SIZE);
+	return ilog2(roundup_pow_of_two(nr_pages));
+}
+
 #endif /* _LINUX_MCOUNT_TRACER_H */
Index: linux-compile.git/lib/tracing/tracer_interface.h
===================================================================
--- linux-compile.git.orig/lib/tracing/tracer_interface.h	2008-01-14 13:14:13.000000000 -0500
+++ /dev/null	1970-01-01 00:00:00.000000000 +0000
@@ -1,14 +0,0 @@
-#ifndef _LINUX_MCTRACER_INTERFACE_H
-#define _LINUX_MCTRACER_INTERFACE_H
-
-#include "tracer.h"
-
-/*
- * Will be at least sizeof(struct mctracer_entry), but callers can request more
- * space for private stuff, such as a timestamp, preempt_count, etc.
- */
-#define MCTRACER_ENTRY_SIZE sizeof(struct mctracer_entry)
-
-#define MCTRACER_NR_ENTRIES (65536UL)
-
-#endif /* _LINUX_MCTRACER_INTERFACE_H */

-- 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC PATCH 25/30 v3] Trace irq disabled critical timings
  2008-01-15 20:49 [RFC PATCH 00/30 v3] mcount and latency tracing utility -v3 Steven Rostedt
                   ` (23 preceding siblings ...)
  2008-01-15 20:49 ` [RFC PATCH 24/30 v3] Split out specific tracing functions Steven Rostedt
@ 2008-01-15 20:49 ` Steven Rostedt
  2008-01-15 20:49 ` [RFC PATCH 26/30 v3] Add context switch marker to sched.c Steven Rostedt
                   ` (4 subsequent siblings)
  29 siblings, 0 replies; 35+ messages in thread
From: Steven Rostedt @ 2008-01-15 20:49 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Tim Bird,
	Sam Ravnborg, Frank Ch. Eigler, Jan Kiszka, Steven Rostedt

[-- Attachment #1: mcount-tracer-latency-trace-irqs-off.patch --]
[-- Type: text/plain, Size: 29822 bytes --]

This patch adds latency tracing for critical timings.
In /debugfs/tracing/ four files are added (see the usage sketch after this list):

  preempt_max_latency
    holds the max latency found thus far (in usecs)
    (defaults to a large number, so it must be reset, e.g. with
     "echo 0 > preempt_max_latency", to start a new latency search)

  preempt_thresh
    threshold (in usecs): any irqs-off section detected to be
    longer than this is always printed out.
    If preempt_thresh is non-zero, then preempt_max_latency
    is ignored.

  preempt_trace
    Trace of where the latency was detected.

  preempt_fn_trace_ctrl
    0 - don't use mcount
    1 - use mcount to trace
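
As a rough user-space usage sketch (not part of this patch; it assumes
debugfs is mounted at /debug, as the Kconfig help text in this patch does,
and the helper function is only illustrative), the files can be driven
like this:

#include <stdio.h>
#include <stdlib.h>

/* write a short string to one of the tracing control files */
static void write_file(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		exit(1);
	}
	fputs(val, f);
	fclose(f);
}

int main(void)
{
	char line[512];
	FILE *f;

	/* restart the maximum-latency search */
	write_file("/debug/tracing/preempt_max_latency", "0");
	/* also record every function called while irqs are off */
	write_file("/debug/tracing/preempt_fn_trace_ctrl", "1");

	/* ... run the workload of interest for a while ... */

	/* dump the worst irqs-off section seen so far */
	f = fopen("/debug/tracing/preempt_trace", "r");
	if (!f) {
		perror("preempt_trace");
		exit(1);
	}
	while (fgets(line, sizeof(line), f))
		fputs(line, stdout);
	fclose(f);

	return 0;
}

Writing a value to preempt_thresh instead of resetting preempt_max_latency
would report every irqs-off section longer than that threshold.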

Here's an example of a trace with preempt_fn_trace_ctrl == 0

=======
preemption latency trace v1.1.5 on 2.6.24-rc7
--------------------------------------------------------------------
 latency: 100 us, #3/3, CPU#1 | (M:rt VP:0, KP:0, SP:0 HP:0 #P:2)
    -----------------
    | task: swapper-0 (uid:0 nice:0 policy:0 rt_prio:0)
    -----------------
 => started at: _spin_lock_irqsave+0x2a/0xb7
 => ended at:   _spin_unlock_irqrestore+0x32/0x5f

                 _------=> CPU#            
                / _-----=> irqs-off        
               | / _----=> need-resched    
               || / _---=> hardirq/softirq 
               ||| / _--=> preempt-depth   
               |||| /                      
               |||||     delay             
   cmd     pid ||||| time  |   caller      
      \   /    |||||   \   |   /           
 swapper-0     1d.s3    0us+: _spin_lock_irqsave+0x2a/0xb7 (e1000_update_stats+0x47/0x64c [e1000])
 swapper-0     1d.s3  100us : _spin_unlock_irqrestore+0x32/0x5f (e1000_update_stats+0x641/0x64c [e1000])
 swapper-0     1d.s3  100us : trace_hardirqs_on_caller+0x75/0x89 (_spin_unlock_irqrestore+0x32/0x5f)


vim:ft=help
=======


And this is a trace with preempt_fn_trace_ctrl == 1


=======
preemption latency trace v1.1.5 on 2.6.24-rc7
--------------------------------------------------------------------
 latency: 102 us, #12/12, CPU#1 | (M:rt VP:0, KP:0, SP:0 HP:0 #P:2)
    -----------------
    | task: swapper-0 (uid:0 nice:0 policy:0 rt_prio:0)
    -----------------
 => started at: _spin_lock_irqsave+0x2a/0xb7
 => ended at:   _spin_unlock_irqrestore+0x32/0x5f

                 _------=> CPU#            
                / _-----=> irqs-off        
               | / _----=> need-resched    
               || / _---=> hardirq/softirq 
               ||| / _--=> preempt-depth   
               |||| /                      
               |||||     delay             
   cmd     pid ||||| time  |   caller      
      \   /    |||||   \   |   /           
 swapper-0     1dNs3    0us+: _spin_lock_irqsave+0x2a/0xb7 (e1000_update_stats+0x47/0x64c [e1000])
 swapper-0     1dNs3   46us : e1000_read_phy_reg+0x16/0x225 [e1000] (e1000_update_stats+0x5e2/0x64c [e1000])
 swapper-0     1dNs3   46us : e1000_swfw_sync_acquire+0x10/0x99 [e1000] (e1000_read_phy_reg+0x49/0x225 [e1000])
 swapper-0     1dNs3   46us : e1000_get_hw_eeprom_semaphore+0x12/0xa6 [e1000] (e1000_swfw_sync_acquire+0x36/0x99 [e1000])
 swapper-0     1dNs3   47us : __const_udelay+0x9/0x47 (e1000_read_phy_reg+0x116/0x225 [e1000])
 swapper-0     1dNs3   47us+: __delay+0x9/0x50 (__const_udelay+0x45/0x47)
 swapper-0     1dNs3   97us : preempt_schedule+0xc/0x84 (__delay+0x4e/0x50)
 swapper-0     1dNs3   98us : e1000_swfw_sync_release+0xc/0x55 [e1000] (e1000_read_phy_reg+0x211/0x225 [e1000])
 swapper-0     1dNs3   99us+: e1000_put_hw_eeprom_semaphore+0x9/0x35 [e1000] (e1000_swfw_sync_release+0x50/0x55 [e1000])
 swapper-0     1dNs3  101us : _spin_unlock_irqrestore+0xe/0x5f (e1000_update_stats+0x641/0x64c [e1000])
 swapper-0     1dNs3  102us : _spin_unlock_irqrestore+0x32/0x5f (e1000_update_stats+0x641/0x64c [e1000])
 swapper-0     1dNs3  102us : trace_hardirqs_on_caller+0x75/0x89 (_spin_unlock_irqrestore+0x32/0x5f)


vim:ft=help
=======


Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 arch/x86/kernel/process_64.c  |    3 
 arch/x86/lib/thunk_64.S       |   18 +
 include/asm-x86/irqflags_32.h |    4 
 include/asm-x86/irqflags_64.h |    4 
 include/linux/irqflags.h      |   37 ++
 include/linux/mcount.h        |   31 ++
 kernel/fork.c                 |    2 
 kernel/lockdep.c              |   25 +
 lib/tracing/Kconfig           |   18 +
 lib/tracing/Makefile          |    1 
 lib/tracing/trace_irqsoff.c   |  558 ++++++++++++++++++++++++++++++++++++++++++
 11 files changed, 680 insertions(+), 21 deletions(-)

Index: linux-compile.git/arch/x86/kernel/process_64.c
===================================================================
--- linux-compile.git.orig/arch/x86/kernel/process_64.c	2008-01-15 12:50:00.000000000 -0500
+++ linux-compile.git/arch/x86/kernel/process_64.c	2008-01-15 12:51:36.000000000 -0500
@@ -233,7 +233,10 @@ void cpu_idle (void)
 			 */
 			local_irq_disable();
 			enter_idle();
+			/* Don't trace irqs off for idle */
+			stop_critical_timings();
 			idle();
+			start_critical_timings();
 			/* In many cases the interrupt that ended idle
 			   has already called exit_idle. But some idle
 			   loops can be woken up without interrupt. */
Index: linux-compile.git/arch/x86/lib/thunk_64.S
===================================================================
--- linux-compile.git.orig/arch/x86/lib/thunk_64.S	2008-01-15 12:49:33.000000000 -0500
+++ linux-compile.git/arch/x86/lib/thunk_64.S	2008-01-15 12:51:36.000000000 -0500
@@ -47,8 +47,22 @@
 	thunk __up_wakeup,__up
 
 #ifdef CONFIG_TRACE_IRQFLAGS
-	thunk trace_hardirqs_on_thunk,trace_hardirqs_on
-	thunk trace_hardirqs_off_thunk,trace_hardirqs_off
+	/* put return address in rdi (arg1) */
+	.macro thunk_ra name,func
+	.globl \name
+\name:
+	CFI_STARTPROC
+	SAVE_ARGS
+	/* SAVE_ARGS pushes 9 elements */
+	/* the next element would be the rip */
+	movq 9*8(%rsp), %rdi
+	call \func
+	jmp  restore
+	CFI_ENDPROC
+	.endm
+
+	thunk_ra trace_hardirqs_on_thunk,trace_hardirqs_on_caller
+	thunk_ra trace_hardirqs_off_thunk,trace_hardirqs_off_caller
 #endif
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
Index: linux-compile.git/include/asm-x86/irqflags_32.h
===================================================================
--- linux-compile.git.orig/include/asm-x86/irqflags_32.h	2008-01-15 12:49:33.000000000 -0500
+++ linux-compile.git/include/asm-x86/irqflags_32.h	2008-01-15 12:51:36.000000000 -0500
@@ -139,9 +139,9 @@ static inline int raw_irqs_disabled(void
 static inline void trace_hardirqs_fixup_flags(unsigned long flags)
 {
 	if (raw_irqs_disabled_flags(flags))
-		trace_hardirqs_off();
+		__trace_hardirqs_off();
 	else
-		trace_hardirqs_on();
+		__trace_hardirqs_on();
 }
 
 static inline void trace_hardirqs_fixup(void)
Index: linux-compile.git/include/asm-x86/irqflags_64.h
===================================================================
--- linux-compile.git.orig/include/asm-x86/irqflags_64.h	2008-01-15 12:49:33.000000000 -0500
+++ linux-compile.git/include/asm-x86/irqflags_64.h	2008-01-15 12:51:36.000000000 -0500
@@ -120,9 +120,9 @@ static inline int raw_irqs_disabled(void
 static inline void trace_hardirqs_fixup_flags(unsigned long flags)
 {
 	if (raw_irqs_disabled_flags(flags))
-		trace_hardirqs_off();
+		__trace_hardirqs_off();
 	else
-		trace_hardirqs_on();
+		__trace_hardirqs_on();
 }
 
 static inline void trace_hardirqs_fixup(void)
Index: linux-compile.git/include/linux/irqflags.h
===================================================================
--- linux-compile.git.orig/include/linux/irqflags.h	2008-01-15 12:49:33.000000000 -0500
+++ linux-compile.git/include/linux/irqflags.h	2008-01-15 12:51:36.000000000 -0500
@@ -12,10 +12,21 @@
 #define _LINUX_TRACE_IRQFLAGS_H
 
 #ifdef CONFIG_TRACE_IRQFLAGS
-  extern void trace_hardirqs_on(void);
-  extern void trace_hardirqs_off(void);
+# include <linux/mcount.h>
+  extern void trace_hardirqs_on_caller(unsigned long ip);
+  extern void trace_hardirqs_off_caller(unsigned long ip);
   extern void trace_softirqs_on(unsigned long ip);
   extern void trace_softirqs_off(unsigned long ip);
+  extern void trace_hardirqs_on(void);
+  extern void trace_hardirqs_off(void);
+  static inline void notrace __trace_hardirqs_on(void)
+  {
+	trace_hardirqs_on_caller(CALLER_ADDR0);
+  }
+  static inline void notrace __trace_hardirqs_off(void)
+  {
+	trace_hardirqs_off_caller(CALLER_ADDR0);
+  }
 # define trace_hardirq_context(p)	((p)->hardirq_context)
 # define trace_softirq_context(p)	((p)->softirq_context)
 # define trace_hardirqs_enabled(p)	((p)->hardirqs_enabled)
@@ -28,6 +39,8 @@
 #else
 # define trace_hardirqs_on()		do { } while (0)
 # define trace_hardirqs_off()		do { } while (0)
+# define __trace_hardirqs_on()		do { } while (0)
+# define __trace_hardirqs_off()		do { } while (0)
 # define trace_softirqs_on(ip)		do { } while (0)
 # define trace_softirqs_off(ip)		do { } while (0)
 # define trace_hardirq_context(p)	0
@@ -41,24 +54,32 @@
 # define INIT_TRACE_IRQFLAGS
 #endif
 
+#ifdef CONFIG_CRITICAL_IRQSOFF_TIMING
+ extern void stop_critical_timings(void);
+ extern void start_critical_timings(void);
+#else
+# define stop_critical_timings() do { } while (0)
+# define start_critical_timings() do { } while (0)
+#endif
+
 #ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT
 
 #include <asm/irqflags.h>
 
 #define local_irq_enable() \
-	do { trace_hardirqs_on(); raw_local_irq_enable(); } while (0)
+	do { __trace_hardirqs_on(); raw_local_irq_enable(); } while (0)
 #define local_irq_disable() \
-	do { raw_local_irq_disable(); trace_hardirqs_off(); } while (0)
+	do { raw_local_irq_disable(); __trace_hardirqs_off(); } while (0)
 #define local_irq_save(flags) \
-	do { raw_local_irq_save(flags); trace_hardirqs_off(); } while (0)
+	do { raw_local_irq_save(flags); __trace_hardirqs_off(); } while (0)
 
 #define local_irq_restore(flags)				\
 	do {							\
 		if (raw_irqs_disabled_flags(flags)) {		\
 			raw_local_irq_restore(flags);		\
-			trace_hardirqs_off();			\
+			__trace_hardirqs_off();			\
 		} else {					\
-			trace_hardirqs_on();			\
+			__trace_hardirqs_on();			\
 			raw_local_irq_restore(flags);		\
 		}						\
 	} while (0)
@@ -76,7 +97,7 @@
 #ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT
 #define safe_halt()						\
 	do {							\
-		trace_hardirqs_on();				\
+		__trace_hardirqs_on();				\
 		raw_safe_halt();				\
 	} while (0)
 
Index: linux-compile.git/include/linux/mcount.h
===================================================================
--- linux-compile.git.orig/include/linux/mcount.h	2008-01-15 12:50:57.000000000 -0500
+++ linux-compile.git/include/linux/mcount.h	2008-01-15 13:12:30.000000000 -0500
@@ -6,10 +6,6 @@ extern int mcount_enabled;
 
 #include <linux/linkage.h>
 
-#define CALLER_ADDR0 ((unsigned long)__builtin_return_address(0))
-#define CALLER_ADDR1 ((unsigned long)__builtin_return_address(1))
-#define CALLER_ADDR2 ((unsigned long)__builtin_return_address(2))
-
 typedef void (*mcount_func_t)(unsigned long ip, unsigned long parent_ip);
 
 struct mcount_ops {
@@ -35,4 +31,31 @@ extern void mcount(void);
 # define unregister_mcount_function(ops) do { } while (0)
 # define clear_mcount_function(ops) do { } while (0)
 #endif /* CONFIG_MCOUNT */
+
+
+#ifdef CONFIG_FRAME_POINTER
+/* TODO: need to fix this for ARM */
+# define CALLER_ADDR0 ((unsigned long)__builtin_return_address(0))
+# define CALLER_ADDR1 ((unsigned long)__builtin_return_address(1))
+# define CALLER_ADDR2 ((unsigned long)__builtin_return_address(2))
+# define CALLER_ADDR3 ((unsigned long)__builtin_return_address(3))
+# define CALLER_ADDR4 ((unsigned long)__builtin_return_address(4))
+# define CALLER_ADDR5 ((unsigned long)__builtin_return_address(5))
+#else
+# define CALLER_ADDR0 ((unsigned long)__builtin_return_address(0))
+# define CALLER_ADDR1 0UL
+# define CALLER_ADDR2 0UL
+# define CALLER_ADDR3 0UL
+# define CALLER_ADDR4 0UL
+# define CALLER_ADDR5 0UL
+#endif
+
+#ifdef CONFIG_CRITICAL_IRQSOFF_TIMING
+  extern void notrace time_hardirqs_on(unsigned long a0, unsigned long a1);
+  extern void notrace time_hardirqs_off(unsigned long a0, unsigned long a1);
+#else
+# define time_hardirqs_on(a0, a1)		do { } while (0)
+# define time_hardirqs_off(a0, a1)		do { } while (0)
+#endif
+
 #endif /* _LINUX_MCOUNT_H */
Index: linux-compile.git/kernel/fork.c
===================================================================
--- linux-compile.git.orig/kernel/fork.c	2008-01-15 12:49:33.000000000 -0500
+++ linux-compile.git/kernel/fork.c	2008-01-15 12:51:36.000000000 -0500
@@ -1010,7 +1010,7 @@ static struct task_struct *copy_process(
 
 	rt_mutex_init_task(p);
 
-#ifdef CONFIG_TRACE_IRQFLAGS
+#if defined(CONFIG_TRACE_IRQFLAGS) && defined(CONFIG_LOCKDEP)
 	DEBUG_LOCKS_WARN_ON(!p->hardirqs_enabled);
 	DEBUG_LOCKS_WARN_ON(!p->softirqs_enabled);
 #endif
Index: linux-compile.git/kernel/lockdep.c
===================================================================
--- linux-compile.git.orig/kernel/lockdep.c	2008-01-15 12:49:33.000000000 -0500
+++ linux-compile.git/kernel/lockdep.c	2008-01-15 12:51:36.000000000 -0500
@@ -39,6 +39,7 @@
 #include <linux/irqflags.h>
 #include <linux/utsname.h>
 #include <linux/hash.h>
+#include <linux/mcount.h>
 
 #include <asm/sections.h>
 
@@ -2009,7 +2010,7 @@ void early_boot_irqs_on(void)
 /*
  * Hardirqs will be enabled:
  */
-void trace_hardirqs_on(void)
+void notrace trace_hardirqs_on_caller(unsigned long a0)
 {
 	struct task_struct *curr = current;
 	unsigned long ip;
@@ -2050,14 +2051,27 @@ void trace_hardirqs_on(void)
 	curr->hardirq_enable_ip = ip;
 	curr->hardirq_enable_event = ++curr->irq_events;
 	debug_atomic_inc(&hardirqs_on_events);
+
+#ifdef CONFIG_CRITICAL_IRQSOFF_TIMING
+	time_hardirqs_on(CALLER_ADDR0, a0);
+#endif
 }
+EXPORT_SYMBOL(trace_hardirqs_on_caller);
 
+void notrace trace_hardirqs_on(void) {
+	trace_hardirqs_on_caller(CALLER_ADDR0);
+}
 EXPORT_SYMBOL(trace_hardirqs_on);
 
+void notrace trace_hardirqs_off(void) {
+	trace_hardirqs_off_caller(CALLER_ADDR0);
+}
+EXPORT_SYMBOL(trace_hardirqs_off);
+
 /*
  * Hardirqs were disabled:
  */
-void trace_hardirqs_off(void)
+void notrace trace_hardirqs_off_caller(unsigned long a0)
 {
 	struct task_struct *curr = current;
 
@@ -2075,10 +2089,17 @@ void trace_hardirqs_off(void)
 		curr->hardirq_disable_ip = _RET_IP_;
 		curr->hardirq_disable_event = ++curr->irq_events;
 		debug_atomic_inc(&hardirqs_off_events);
+#ifdef CONFIG_CRITICAL_IRQSOFF_TIMING
+		time_hardirqs_off(CALLER_ADDR0, a0);
+#endif
 	} else
 		debug_atomic_inc(&redundant_hardirqs_off);
 }
 
+void notrace trace_hardirqs_off(void) {
+	trace_hardirqs_off_caller(CALLER_ADDR0);
+}
+
 EXPORT_SYMBOL(trace_hardirqs_off);
 
 /*
Index: linux-compile.git/lib/tracing/Kconfig
===================================================================
--- linux-compile.git.orig/lib/tracing/Kconfig	2008-01-15 12:51:21.000000000 -0500
+++ linux-compile.git/lib/tracing/Kconfig	2008-01-15 13:12:36.000000000 -0500
@@ -24,3 +24,21 @@ config FUNCTION_TRACER
 	  insert a call to an architecture specific __mcount routine,
 	  that the debugging mechanism using this facility will hook by
 	  providing a set of inline routines.
+
+config CRITICAL_IRQSOFF_TIMING
+	bool "Interrupts-off critical section latency timing"
+	default n
+	depends on TRACE_IRQFLAGS_SUPPORT
+	depends on GENERIC_TIME
+	select TRACE_IRQFLAGS
+	select TRACING
+	help
+	  This option measures the time spent in irqs-off critical
+	  sections, with microsecond accuracy.
+
+	  The default measurement method is a maximum search, which is
+	  disabled by default and can be runtime (re-)started
+	  via:
+
+	      echo 0 > /debug/tracing/preempt_max_latency
+
Index: linux-compile.git/lib/tracing/Makefile
===================================================================
--- linux-compile.git.orig/lib/tracing/Makefile	2008-01-15 12:51:21.000000000 -0500
+++ linux-compile.git/lib/tracing/Makefile	2008-01-15 13:12:36.000000000 -0500
@@ -2,5 +2,6 @@ obj-$(CONFIG_MCOUNT) += libmcount.o
 
 obj-$(CONFIG_TRACING) += tracer.o
 obj-$(CONFIG_FUNCTION_TRACER) += trace_function.o
+obj-$(CONFIG_CRITICAL_IRQSOFF_TIMING) += trace_irqsoff.o
 
 libmcount-y := mcount.o
Index: linux-compile.git/lib/tracing/trace_irqsoff.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-compile.git/lib/tracing/trace_irqsoff.c	2008-01-15 13:13:07.000000000 -0500
@@ -0,0 +1,558 @@
+/*
+ * trace irqs off critical timings
+ *
+ * Copyright (C) 2007 Steven Rostedt <srostedt@redhat.com>
+ *
+ * From code in the latency_tracer, that is:
+ *
+ *  Copyright (C) 2004-2006 Ingo Molnar
+ *  Copyright (C) 2004 William Lee Irwin III
+ */
+#include <linux/module.h>
+#include <linux/fs.h>
+#include <linux/debugfs.h>
+#include <linux/kallsyms.h>
+#include <linux/uaccess.h>
+#include <linux/mcount.h>
+
+#include "tracer.h"
+
+static struct tracing_trace irqsoff_trace __read_mostly;
+static struct tracing_trace max_tr __read_mostly;
+static DEFINE_PER_CPU(struct tracing_trace_cpu, irqsoff_trace_cpu);
+static DEFINE_PER_CPU(struct tracing_trace_cpu, max_data);
+static unsigned long preempt_max_latency = (cycle_t)ULONG_MAX;
+static unsigned long preempt_thresh;
+static __cacheline_aligned_in_smp DEFINE_MUTEX(max_mutex);
+static int trace_enabled __read_mostly;
+
+/*
+ * max trace is switched with this buffer.
+ */
+static void *max_buffer;
+
+/*
+ * Sequence count - we record it when starting a measurement and
+ * skip the latency if the sequence has changed - some other section
+ * did a maximum and could disturb our measurement with serial console
+ * printouts, etc. Truly coinciding maximum latencies should be rare
+ * and what happens together happens separately as well, so this doesn't
+ * decrease the validity of the maximum found:
+ */
+static __cacheline_aligned_in_smp unsigned long max_sequence;
+
+/*
+ * Should this new latency be reported/recorded?
+ */
+static int notrace report_latency(cycle_t delta)
+{
+	if (preempt_thresh) {
+		if (delta < preempt_thresh)
+			return 0;
+	} else {
+		if (delta <= preempt_max_latency)
+			return 0;
+	}
+	return 1;
+}
+
+/*
+ * Copy the new maximum trace into the separate maximum-trace
+ * structure. (this way the maximum trace is permanently saved,
+ * for later retrieval via /proc/latency_trace)
+ */
+static void update_max_tr(struct tracing_trace *tr,
+			  struct tracing_trace_cpu *data,
+			  int cpu)
+{
+	struct tracing_trace_cpu *save;
+	int i;
+
+#ifdef CONFIG_PREEMPT
+	WARN_ON(!preempt_count() && !irqs_disabled());
+#endif
+
+	max_tr.cpu = cpu;
+	save = max_tr.data[cpu];
+
+	/* clear out all the previous traces */
+	for_each_possible_cpu(i) {
+		if (max_tr.data[i]->trace)
+			max_tr.data[i]->trace = NULL;
+	}
+
+	max_tr.time_start = data->preempt_timestamp;
+
+	memcpy(save, data, sizeof(*data));
+	save->saved_latency = preempt_max_latency;
+
+	memcpy(save->comm, current->comm, TASK_COMM_LEN);
+	save->pid = current->pid;
+	save->uid = current->uid;
+	save->nice = current->static_prio - 20 - MAX_RT_PRIO;
+	save->policy = current->policy;
+	save->rt_priority = current->rt_priority;
+
+	/* from memcpy above: save->trace = data->trace */
+	data->trace = max_buffer;
+	max_buffer = save->trace;
+}
+
+cycle_t notrace usecs_to_cycles(unsigned long usecs);
+
+static void notrace
+check_critical_timing(struct tracing_trace *tr,
+		      struct tracing_trace_cpu *data,
+		      unsigned long parent_ip,
+		      int cpu)
+{
+	unsigned long latency, t0, t1;
+	cycle_t T0, T1, T2, delta;
+	unsigned long flags;
+
+	/*
+	 * usecs conversion is slow so we try to delay the conversion
+	 * as long as possible:
+	 */
+	T0 = data->preempt_timestamp;
+	T1 = now();
+	delta = T1-T0;
+
+	local_save_flags(flags);
+
+	if (!report_latency(delta))
+		goto out;
+
+	tracing_function_trace(tr, data, CALLER_ADDR0, parent_ip, flags);
+	/*
+	 * Update the timestamp, because the trace entry above
+	 * might change it (it can only get larger so the latency
+	 * is fair to be reported):
+	 */
+	T2 = now();
+
+	delta = T2-T0;
+
+	latency = cycles_to_usecs(delta);
+
+	if (data->critical_sequence != max_sequence ||
+	    !mutex_trylock(&max_mutex))
+		goto out;
+
+	preempt_max_latency = delta;
+	t0 = cycles_to_usecs(T0);
+	t1 = cycles_to_usecs(T1);
+
+	data->critical_end = parent_ip;
+
+	update_max_tr(tr, data, cpu);
+
+	if (preempt_thresh)
+		printk(KERN_INFO "(%16s-%-5d|#%d): %lu us critical section "
+		       "violates %lu us threshold.\n"
+		       " => started at timestamp %lu: ",
+				current->comm, current->pid,
+				raw_smp_processor_id(),
+				latency, cycles_to_usecs(preempt_thresh), t0);
+	else
+		printk(KERN_INFO "(%16s-%-5d|#%d): new %lu us maximum-latency "
+		       "critical section.\n => started at timestamp %lu: ",
+				current->comm, current->pid,
+				raw_smp_processor_id(),
+				latency, t0);
+
+	print_symbol(KERN_CONT "<%s>\n", data->critical_start);
+	printk(KERN_CONT " =>   ended at timestamp %lu: ", t1);
+	print_symbol(KERN_CONT "<%s>\n", data->critical_end);
+	dump_stack();
+	t1 = cycles_to_usecs(now());
+	printk(KERN_CONT " =>   dump-end timestamp %lu\n\n", t1);
+
+	max_sequence++;
+
+	mutex_unlock(&max_mutex);
+
+out:
+	data->critical_sequence = max_sequence;
+	data->preempt_timestamp = now();
+	tracing_reset(data);
+	tracing_function_trace(tr, data, CALLER_ADDR0, parent_ip, flags);
+}
+
+static inline void notrace
+start_critical_timing(unsigned long ip, unsigned long parent_ip)
+{
+	int cpu = raw_smp_processor_id();
+	struct tracing_trace *tr = &irqsoff_trace;
+	struct tracing_trace_cpu *data = tr->data[cpu];
+	unsigned long flags;
+
+	if (unlikely(!data) || unlikely(!data->trace) ||
+	    data->critical_start || atomic_read(&data->disabled))
+		return;
+
+	atomic_inc(&data->disabled);
+
+	data->critical_sequence = max_sequence;
+	data->preempt_timestamp = now();
+	data->critical_start = parent_ip;
+	tracing_reset(data);
+
+	local_save_flags(flags);
+	tracing_function_trace(tr, data, ip, parent_ip, flags);
+
+	atomic_dec(&data->disabled);
+}
+
+static inline void notrace
+stop_critical_timing(unsigned long ip, unsigned long parent_ip)
+{
+	int cpu = raw_smp_processor_id();
+	struct tracing_trace *tr = &irqsoff_trace;
+	struct tracing_trace_cpu *data = tr->data[cpu];
+	unsigned long flags;
+
+	if (unlikely(!data) || unlikely(!data->trace) ||
+	    !data->critical_start || atomic_read(&data->disabled))
+		return;
+
+	atomic_inc(&data->disabled);
+	local_save_flags(flags);
+	tracing_function_trace(tr, data, ip, parent_ip, flags);
+	check_critical_timing(tr, data, parent_ip, cpu);
+	data->critical_start = 0;
+	atomic_dec(&data->disabled);
+}
+
+void notrace start_critical_timings(void)
+{
+	unsigned long flags;
+
+	local_save_flags(flags);
+
+	if (irqs_disabled_flags(flags))
+		start_critical_timing(CALLER_ADDR0, 0);
+}
+
+void notrace stop_critical_timings(void)
+{
+	unsigned long flags;
+
+	local_save_flags(flags);
+
+	if (irqs_disabled_flags(flags))
+		stop_critical_timing(CALLER_ADDR0, 0);
+}
+
+#ifdef CONFIG_LOCKDEP
+void notrace time_hardirqs_on(unsigned long a0, unsigned long a1)
+{
+	unsigned long flags;
+
+	local_save_flags(flags);
+
+	if (irqs_disabled_flags(flags))
+		stop_critical_timing(a0, a1);
+}
+
+void notrace time_hardirqs_off(unsigned long a0, unsigned long a1)
+{
+	unsigned long flags;
+
+	local_save_flags(flags);
+
+	if (irqs_disabled_flags(flags))
+		start_critical_timing(a0, a1);
+}
+
+#else /* !CONFIG_LOCKDEP */
+
+/*
+ * Stubs:
+ */
+
+void early_boot_irqs_off(void)
+{
+}
+
+void early_boot_irqs_on(void)
+{
+}
+
+void trace_softirqs_on(unsigned long ip)
+{
+}
+
+void trace_softirqs_off(unsigned long ip)
+{
+}
+
+inline void print_irqtrace_events(struct task_struct *curr)
+{
+}
+
+/*
+ * We are only interested in hardirq on/off events:
+ */
+void notrace trace_hardirqs_on(void)
+{
+	unsigned long flags;
+
+	local_save_flags(flags);
+
+	if (irqs_disabled_flags(flags))
+		stop_critical_timing(CALLER_ADDR0, 0);
+}
+EXPORT_SYMBOL(trace_hardirqs_on);
+
+void notrace trace_hardirqs_off(void)
+{
+	unsigned long flags;
+
+	local_save_flags(flags);
+
+	if (irqs_disabled_flags(flags))
+		start_critical_timing(CALLER_ADDR0, 0);
+}
+EXPORT_SYMBOL(trace_hardirqs_off);
+
+void notrace trace_hardirqs_on_caller(unsigned long caller_addr)
+{
+	unsigned long flags;
+
+	local_save_flags(flags);
+
+	if (irqs_disabled_flags(flags))
+		stop_critical_timing(CALLER_ADDR0, caller_addr);
+}
+EXPORT_SYMBOL(trace_hardirqs_on_caller);
+
+void notrace trace_hardirqs_off_caller(unsigned long caller_addr)
+{
+	unsigned long flags;
+
+	local_save_flags(flags);
+
+	if (irqs_disabled_flags(flags))
+		start_critical_timing(CALLER_ADDR0, caller_addr);
+}
+EXPORT_SYMBOL(trace_hardirqs_off_caller);
+
+#endif /* CONFIG_LOCKDEP */
+
+
+#ifdef CONFIG_MCOUNT
+static void notrace irqsoff_trace_call(unsigned long ip,
+					unsigned long parent_ip)
+{
+	struct tracing_trace *tr = &irqsoff_trace;
+	struct tracing_trace_cpu *data;
+	unsigned long flags;
+	int cpu;
+
+	if (unlikely(!trace_enabled))
+		return;
+
+	local_save_flags(flags);
+
+	if (!irqs_disabled_flags(flags))
+		return;
+
+	cpu = raw_smp_processor_id();
+	data = tr->data[cpu];
+	atomic_inc(&data->disabled);
+
+	if (likely(atomic_read(&data->disabled) == 1))
+		tracing_function_trace(tr, data, ip, parent_ip, flags);
+
+	atomic_dec(&data->disabled);
+}
+
+static struct mcount_ops trace_ops __read_mostly =
+{
+	.func = irqsoff_trace_call,
+};
+#endif /* CONFIG_MCOUNT */
+
+#ifdef CONFIG_DEBUG_FS
+static void irqsoff_start(struct tracing_iterator *iter)
+{
+	mutex_lock(&max_mutex);
+}
+
+static void irqsoff_stop(struct tracing_iterator *iter)
+{
+	mutex_unlock(&max_mutex);
+}
+
+static ssize_t max_irq_lat_read(struct file *filp, char __user *ubuf,
+					size_t cnt, loff_t *ppos)
+{
+	unsigned long *ptr = filp->private_data;
+	char buf[64];
+	int r;
+
+	r = snprintf(buf, 64, "%ld\n", *ptr == -1 ? : cycles_to_usecs(*ptr));
+	if (r > 64)
+		r = 64;
+	return simple_read_from_buffer(ubuf, cnt, ppos,
+				       buf, r);
+}
+static ssize_t max_irq_lat_write(struct file *filp,
+				 const char __user *ubuf,
+				 size_t cnt, loff_t *ppos)
+{
+	long *ptr = filp->private_data;
+	long val;
+	char buf[64];
+
+	if (cnt > 63)
+		cnt = 63;
+
+	if (copy_from_user(&buf, ubuf, cnt))
+		return -EFAULT;
+
+	buf[cnt] = 0;
+
+	val = simple_strtoul(buf, NULL, 10);
+
+	*ptr = usecs_to_cycles(val);
+
+	return cnt;
+}
+
+static struct file_operations max_irq_lat_fops = {
+	.open = tracing_open_generic,
+	.read = max_irq_lat_read,
+	.write = max_irq_lat_write,
+};
+
+static void irqsoff_trace_ctrl_update(struct tracing_trace *tr,
+				      unsigned long val)
+{
+	val = !!val;
+
+	if (tr->ctrl ^ val) {
+		if (val) {
+			trace_enabled = 1;
+			register_mcount_function(&trace_ops);
+		} else {
+			trace_enabled = 0;
+			unregister_mcount_function(&trace_ops);
+		}
+		tr->ctrl = val;
+	}
+}
+
+static __init void irqsoff_trace_init_debugfs(void)
+{
+	struct dentry *d_tracer;
+	struct dentry *entry;
+
+	d_tracer = tracing_init_dentry();
+
+	irqsoff_trace.ctrl_update = irqsoff_trace_ctrl_update;
+
+#ifdef CONFIG_MCOUNT
+	entry = debugfs_create_file("preempt_fn_trace_ctrl", 0644, d_tracer,
+				    &irqsoff_trace, &tracing_ctrl_fops);
+	if (!entry)
+		pr_warning("Could not create debugfs"
+			   " 'preempt_fn_trace' entry\n");
+#endif
+
+	entry = debugfs_create_file("preempt_max_latency", 0644, d_tracer,
+				    &preempt_max_latency, &max_irq_lat_fops);
+	if (!entry)
+		pr_warning("Could not create debugfs 'ctrl' entry\n");
+
+	entry = debugfs_create_file("preempt_thresh", 0644, d_tracer,
+				    &preempt_thresh, &max_irq_lat_fops);
+	if (!entry)
+		pr_warning("Could not create debugfs 'ctrl' entry\n");
+
+	entry = debugfs_create_file("preempt_trace", 0444, d_tracer,
+				    &max_tr, &tracing_lt_fops);
+	if (!entry)
+		pr_warning("Could not create debugfs 'irqsoff_trace' entry\n");
+}
+
+#endif /* CONFIG_DEBUGFS */
+
+static void notrace irqsoff_trace_open(struct tracing_iterator *iter)
+{
+	/* stop the trace while dumping */
+	if (iter->tr->ctrl)
+		trace_enabled = 0;
+}
+
+static void notrace irqsoff_trace_close(struct tracing_iterator *iter)
+{
+	if (iter->tr->ctrl)
+		trace_enabled = 1;
+}
+
+__init static int trace_irqsoff_alloc_buffers(void)
+{
+	const int order = page_order(TRACING_NR_ENTRIES * TRACING_ENTRY_SIZE);
+	const unsigned long size = (1UL << order) << PAGE_SHIFT;
+	struct tracing_entry *array;
+	int i;
+
+	for_each_possible_cpu(i) {
+		irqsoff_trace.data[i] = &per_cpu(irqsoff_trace_cpu, i);
+		max_tr.data[i] = &per_cpu(max_data, i);
+
+		array = (struct tracing_entry *)
+			  __get_free_pages(GFP_KERNEL, order);
+		if (array == NULL) {
+			printk(KERN_ERR "irqsoff tracer: failed to allocate"
+			       " %ld bytes for trace buffer!\n", size);
+			goto free_buffers;
+		}
+		irqsoff_trace.data[i]->trace = array;
+	}
+
+	array = (struct tracing_entry *)
+		__get_free_pages(GFP_KERNEL, order);
+	if (array == NULL) {
+		printk(KERN_ERR "irqsoff tracer: failed to allocate"
+		       " %ld bytes for trace buffer!\n", size);
+		goto free_buffers;
+	}
+	max_buffer = array;
+
+	/*
+	 * Since we allocate by orders of pages, we may be able to
+	 * round up a bit.
+	 */
+	irqsoff_trace.entries = size / TRACING_ENTRY_SIZE;
+	max_tr.entries = irqsoff_trace.entries;
+	max_tr.start = irqsoff_start;
+	max_tr.stop = irqsoff_stop;
+
+	pr_info("irqs off tracer: %ld bytes allocated for %ld",
+		size, TRACING_NR_ENTRIES);
+	pr_info(" entries of %d bytes\n", (int)TRACING_ENTRY_SIZE);
+	pr_info("   actual entries %ld\n", irqsoff_trace.entries);
+
+	irqsoff_trace_init_debugfs();
+
+	irqsoff_trace.open = irqsoff_trace_open;
+	irqsoff_trace.close = irqsoff_trace_close;
+
+	return 0;
+
+ free_buffers:
+	for (i-- ; i >= 0; i--) {
+		struct tracing_trace_cpu *data = irqsoff_trace.data[i];
+
+		if (data && data->trace) {
+			free_pages((unsigned long)data->trace, order);
+			data->trace = NULL;
+		}
+	}
+	return -ENOMEM;
+}
+
+device_initcall(trace_irqsoff_alloc_buffers);

-- 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC PATCH 26/30 v3] Add context switch marker to sched.c
  2008-01-15 20:49 [RFC PATCH 00/30 v3] mcount and latency tracing utility -v3 Steven Rostedt
                   ` (24 preceding siblings ...)
  2008-01-15 20:49 ` [RFC PATCH 25/30 v3] Trace irq disabled critical timings Steven Rostedt
@ 2008-01-15 20:49 ` Steven Rostedt
  2008-01-15 20:49 ` [RFC PATCH 27/30 v3] Add tracing of context switches Steven Rostedt
                   ` (3 subsequent siblings)
  29 siblings, 0 replies; 35+ messages in thread
From: Steven Rostedt @ 2008-01-15 20:49 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Tim Bird,
	Sam Ravnborg, Frank Ch. Eigler, Jan Kiszka, Steven Rostedt

[-- Attachment #1: add-markers-to-sched-switch.patch --]
[-- Type: text/plain, Size: 1836 bytes --]

The trace facilities here need a hook into the context switch events.
Since Mathieu Desnoyers has been working on markers for LTTng, I figured
I'd take the marker he has in his patch queue.

The thing is that his marker only records prev's pid, next's pid, and
prev's state. It would suit me better if it simply passed in the prev and
next pointers, since I don't want to add markers all over the place just
to get the information I need. The latency tracer is only turned on for
short periods of time and already takes up enough memory to store the
traces; I don't want to add databases tracking changes such as priorities
when I could simply get that information from the context switch itself.

Yes, that would slow down all context switches a little when tracing is on,
but for this, it should be sufficient.

Anyway, I'm not about to start a war on this, so I simply folded and will
use Mathieu's proposed marker instead. I'm just worried that in the future
we will want more information about the processes in the context switch but
will still be stuck with just the pids and state.
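
To make that limitation concrete, here is a rough sketch (not part of this
patch; the probe name is made up, and it simply mirrors the callback added
in the next patch) of what a probe hooked to this marker actually sees:
only the values carried through the varargs, as named in the format string.

#include <linux/kernel.h>
#include <linux/marker.h>

/* Hypothetical probe, registered with:
 *   marker_probe_register("kernel_sched_schedule",
 *                         "prev_pid %d next_pid %d prev_state %ld",
 *                         probe_sched_switch, NULL);
 *   marker_arm("kernel_sched_schedule");
 */
static void probe_sched_switch(const struct marker *mdata,
			       void *private_data,
			       const char *format, ...)
{
	int prev_pid, next_pid;
	long prev_state;
	va_list ap;

	va_start(ap, format);
	prev_pid   = va_arg(ap, int);	/* "prev_pid %d"    */
	next_pid   = va_arg(ap, int);	/* "next_pid %d"    */
	prev_state = va_arg(ap, long);	/* "prev_state %ld" */
	va_end(ap);

	/* no way from here to reach prev->prio or the next task_struct */
	pr_debug("switch %d -> %d (prev state %ld)\n",
		 prev_pid, next_pid, prev_state);
}
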

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 kernel/sched.c |    3 +++
 1 file changed, 3 insertions(+)

Index: linux-compile.git/kernel/sched.c
===================================================================
--- linux-compile.git.orig/kernel/sched.c	2008-01-14 14:57:33.000000000 -0500
+++ linux-compile.git/kernel/sched.c	2008-01-15 00:15:45.000000000 -0500
@@ -1933,6 +1933,9 @@ context_switch(struct rq *rq, struct tas
 	struct mm_struct *mm, *oldmm;
 
 	prepare_task_switch(rq, prev, next);
+	trace_mark(kernel_sched_schedule,
+		   "prev_pid %d next_pid %d prev_state %ld",
+		   prev->pid, next->pid, prev->state);
 	mm = next->mm;
 	oldmm = prev->active_mm;
 	/*

-- 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC PATCH 27/30 v3] Add tracing of context switches
  2008-01-15 20:49 [RFC PATCH 00/30 v3] mcount and latency tracing utility -v3 Steven Rostedt
                   ` (25 preceding siblings ...)
  2008-01-15 20:49 ` [RFC PATCH 26/30 v3] Add context switch marker to sched.c Steven Rostedt
@ 2008-01-15 20:49 ` Steven Rostedt
  2008-01-15 20:49 ` [RFC PATCH 28/30 v3] Generic command line storage Steven Rostedt
                   ` (2 subsequent siblings)
  29 siblings, 0 replies; 35+ messages in thread
From: Steven Rostedt @ 2008-01-15 20:49 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Tim Bird,
	Sam Ravnborg, Frank Ch. Eigler, Jan Kiszka, Steven Rostedt

[-- Attachment #1: trace-add-cmdline-switch.patch --]
[-- Type: text/plain, Size: 11224 bytes --]

This patch adds context switch tracing. Each entry records the switch as
prev_pid:prev_prio:prev_state --> next_pid, in the following format:

                 _------=> CPU#            
                / _-----=> irqs-off        
               | / _----=> need-resched    
               || / _---=> hardirq/softirq 
               ||| / _--=> preempt-depth   
               |||| /                      
               |||||     delay             
   cmd     pid ||||| time  |   caller      
      \   /    |||||   \   |   /           
  <idle>-0     1d..3  361us!:  0:140:R --> 5028
    bash-5036  0d..3  552us+:  5036:120:D --> 0
  <idle>-0     0d..3  606us!:  0:140:R --> 5036
    sshd-5028  1d..3  943us!:  5028:120:S --> 0
  <idle>-0     1d..3 1316us+:  0:140:R --> 5028
    bash-5036  0d..3 1388us!:  5036:120:S --> 0
    sshd-5028  1d..3 1772us!:  5028:120:S --> 0

Two files are added to /debugfs/tracing

  sched_trace - outputs the above format.

  sched_trace_ctrl
     0 - turns off context switch tracing.
     1 - turns on context switch tracing.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 lib/tracing/Kconfig              |   10 +
 lib/tracing/Makefile             |    1 
 lib/tracing/trace_sched_switch.c |  205 +++++++++++++++++++++++++++++++++++++++
 lib/tracing/tracer.c             |   31 +++++
 lib/tracing/tracer.h             |   18 +++
 5 files changed, 264 insertions(+), 1 deletion(-)

Index: linux-compile.git/lib/tracing/Kconfig
===================================================================
--- linux-compile.git.orig/lib/tracing/Kconfig	2008-01-15 11:13:54.000000000 -0500
+++ linux-compile.git/lib/tracing/Kconfig	2008-01-15 11:15:00.000000000 -0500
@@ -42,3 +42,13 @@ config CRITICAL_IRQSOFF_TIMING
 
 	      echo 0 > /debug/tracing/preempt_max_latency
 
+config CONTEXT_SWITCH_TRACER
+	bool "Trace process context switches"
+	depends on DEBUG_KERNEL
+	default n
+	select TRACING
+	select MARKERS
+	help
+	  This tracer hooks into the context switch and records
+	  all switching of tasks.
+
Index: linux-compile.git/lib/tracing/Makefile
===================================================================
--- linux-compile.git.orig/lib/tracing/Makefile	2008-01-15 11:12:31.000000000 -0500
+++ linux-compile.git/lib/tracing/Makefile	2008-01-15 11:14:44.000000000 -0500
@@ -1,6 +1,7 @@
 obj-$(CONFIG_MCOUNT) += libmcount.o
 
 obj-$(CONFIG_TRACING) += tracer.o
+obj-$(CONFIG_CONTEXT_SWITCH_TRACER) += trace_sched_switch.o
 obj-$(CONFIG_FUNCTION_TRACER) += trace_function.o
 obj-$(CONFIG_CRITICAL_IRQSOFF_TIMING) += trace_irqsoff.o
 
Index: linux-compile.git/lib/tracing/trace_sched_switch.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-compile.git/lib/tracing/trace_sched_switch.c	2008-01-15 11:14:44.000000000 -0500
@@ -0,0 +1,205 @@
+/*
+ * trace context switch
+ *
+ * Copyright (C) 2007 Steven Rostedt <srostedt@redhat.com>
+ *
+ */
+#include <linux/module.h>
+#include <linux/fs.h>
+#include <linux/debugfs.h>
+#include <linux/kallsyms.h>
+#include <linux/uaccess.h>
+#include <linux/marker.h>
+#include <linux/mcount.h>
+
+#include "tracer.h"
+
+static struct tracing_trace sched_switch_trace __read_mostly;
+static DEFINE_PER_CPU(struct tracing_trace_cpu, sched_switch_trace_cpu);
+
+static int trace_enabled __read_mostly;
+
+static notrace void sched_switch_callback(const struct marker *mdata,
+					  void *private_data,
+					  const char *format, ...)
+{
+	struct tracing_trace *tr = mdata->private;
+	struct tracing_trace_cpu *data;
+	struct task_struct *prev = current;
+	int prev_pid, next_pid;
+	unsigned long flags;
+	va_list ap;
+	int cpu;
+
+	if (!trace_enabled)
+		return;
+
+	va_start(ap, format);
+
+	/* just use prev pointer and grab next. */
+	prev_pid = va_arg(ap, typeof(prev_pid));
+
+	/*
+	 * Unfortunately we are limited to just the
+	 * next_pid, and the marker doesn't give us
+	 * next itself.
+	 */
+	next_pid = va_arg(ap, typeof(next_pid));
+
+	/* Ignore prev_state, since we get that from prev itself */
+	va_end(ap);
+
+	raw_local_irq_save(flags);
+	cpu = raw_smp_processor_id();
+	data = tr->data[cpu];
+	atomic_inc(&data->disabled);
+
+	if (likely(atomic_read(&data->disabled) == 1))
+		tracing_sched_switch_trace(tr, data, prev, next_pid, flags);
+
+	atomic_dec(&data->disabled);
+	raw_local_irq_restore(flags);
+}
+
+static notrace void sched_switch_reset(struct tracing_trace *tr)
+{
+	int cpu;
+
+	tr->time_start = now();
+
+	for_each_online_cpu(cpu)
+		tracing_reset(tr->data[cpu]);
+}
+
+#ifdef CONFIG_DEBUG_FS
+static void sched_switch_trace_ctrl_update(struct tracing_trace *tr,
+					   unsigned long val)
+{
+	val = !!val;
+
+	/* When starting a new trace, reset the buffers */
+	if (val)
+		sched_switch_reset(tr);
+
+	if (tr->ctrl ^ val) {
+		if (val)
+			trace_enabled = 1;
+		else
+			trace_enabled = 0;
+		tr->ctrl = val;
+	}
+}
+
+static __init void sched_switch_trace_init_debugfs(void)
+{
+	struct dentry *d_tracer;
+	struct dentry *entry;
+
+	d_tracer = tracing_init_dentry();
+
+	sched_switch_trace.ctrl_update = sched_switch_trace_ctrl_update;
+
+	entry = debugfs_create_file("sched_trace_ctrl", 0644, d_tracer,
+				    &sched_switch_trace, &tracing_ctrl_fops);
+	if (!entry)
+		pr_warning("Could not create debugfs "
+			   "'sched_trace_ctrl' entry\n");
+
+	entry = debugfs_create_file("sched_trace", 0444, d_tracer,
+				    &sched_switch_trace, &tracing_lt_fops);
+	if (!entry)
+		pr_warning("Could not create debugfs 'sched_trace' entry\n");
+}
+
+#else
+static __init void sched_switch_trace_init_debugfs(void)
+{
+	/*
+	 * No way to turn on or off the trace function
+	 * without debugfs, so we just turn it on.
+	 */
+}
+#endif
+
+static void sched_switch_trace_open(struct tracing_iterator *iter)
+{
+	/* stop the trace while dumping */
+	if (iter->tr->ctrl)
+		trace_enabled = 0;
+}
+
+static void sched_switch_trace_close(struct tracing_iterator *iter)
+{
+	if (iter->tr->ctrl)
+		trace_enabled = 1;
+}
+
+__init static int sched_switch_trace_alloc_buffers(void)
+{
+	const int order = page_order(TRACING_NR_ENTRIES * TRACING_ENTRY_SIZE);
+	const unsigned long size = (1UL << order) << PAGE_SHIFT;
+	struct tracing_entry *array;
+	int ret;
+	int i;
+
+	for_each_possible_cpu(i) {
+		sched_switch_trace.data[i] =
+			&per_cpu(sched_switch_trace_cpu, i);
+		array = (struct tracing_entry *)
+			  __get_free_pages(GFP_KERNEL, order);
+		if (array == NULL) {
+			printk(KERN_ERR "sched_switch tracer: failed to"
+			       " allocate %ld bytes for trace buffer!\n", size);
+			goto free_buffers;
+		}
+		sched_switch_trace.data[i]->trace = array;
+	}
+
+	/*
+	 * Since we allocate by orders of pages, we may be able to
+	 * round up a bit.
+	 */
+	sched_switch_trace.entries = size / TRACING_ENTRY_SIZE;
+
+	pr_info("sched_switch tracer: %ld bytes allocated for %ld",
+		size, TRACING_NR_ENTRIES);
+	pr_info(" entries of %ld bytes\n", (long)TRACING_ENTRY_SIZE);
+	pr_info("   actual entries %ld\n", sched_switch_trace.entries);
+
+	sched_switch_trace_init_debugfs();
+
+	sched_switch_trace.open = sched_switch_trace_open;
+	sched_switch_trace.close = sched_switch_trace_close;
+
+	ret = marker_probe_register("kernel_sched_schedule",
+				    "prev_pid %d next_pid %d prev_state %ld",
+				    sched_switch_callback,
+				    &sched_switch_trace);
+	if (ret) {
+		pr_info("sched trace: Couldn't add marker"
+			" probe to switch_to\n");
+		goto out;
+	}
+
+	ret = marker_arm("kernel_sched_schedule");
+	if (ret) {
+		pr_info("sched trace: Couldn't arm probe switch_to\n");
+		goto out;
+	}
+
+ out:
+	return 0;
+
+ free_buffers:
+	for (i-- ; i >= 0; i--) {
+		struct tracing_trace_cpu *data = sched_switch_trace.data[i];
+
+		if (data && data->trace) {
+			free_pages((unsigned long)data->trace, order);
+			data->trace = NULL;
+		}
+	}
+	return -ENOMEM;
+}
+
+device_initcall(sched_switch_trace_alloc_buffers);
Index: linux-compile.git/lib/tracing/tracer.c
===================================================================
--- linux-compile.git.orig/lib/tracing/tracer.c	2008-01-15 11:12:31.000000000 -0500
+++ linux-compile.git/lib/tracing/tracer.c	2008-01-15 11:14:44.000000000 -0500
@@ -31,6 +31,7 @@ enum trace_type
 	__TRACE_FIRST_TYPE = 0,
 
 	TRACE_FN,
+	TRACE_CTX,
 
 	__TRACE_LAST_TYPE
 };
@@ -107,6 +108,24 @@ notrace void tracing_function_trace(stru
 	entry->fn.parent_ip = parent_ip;
 }
 
+notrace void tracing_sched_switch_trace(struct tracing_trace *tr,
+					struct tracing_trace_cpu *data,
+					struct task_struct *prev,
+					int next_pid,
+					unsigned long flags)
+{
+	struct tracing_entry *entry;
+
+	entry = tracing_get_trace_entry(tr, data);
+	tracing_generic_entry_update(entry, flags);
+	entry->type		= TRACE_CTX;
+	entry->ctx.prev_pid	= prev->pid;
+	entry->ctx.prev_prio	= prev->prio;
+	entry->ctx.prev_state	= prev->state;
+	entry->ctx.next_pid	= next_pid;
+	/* would like to save next prio, but we can't :-( */
+}
+
 #ifdef CONFIG_DEBUG_FS
 enum trace_iterator {
 	TRACE_ITER_SYM_ONLY	= 1,
@@ -414,6 +433,8 @@ lat_print_timestamp(struct seq_file *m, 
 		seq_puts(m, " : ");
 }
 
+static const char state_to_char[] = "RSDTtZX";
+
 static void notrace
 print_lat_fmt(struct seq_file *m, struct tracing_iterator *iter,
 	      unsigned int trace_idx, int cpu)
@@ -424,6 +445,7 @@ print_lat_fmt(struct seq_file *m, struct
 	unsigned long rel_usecs;
 	int sym_only = !!(trace_flags & TRACE_ITER_SYM_ONLY);
 	int verbose = !!(trace_flags & TRACE_ITER_VERBOSE);
+	int S;
 
 	if (!next_entry)
 		next_entry = entry;
@@ -450,6 +472,15 @@ print_lat_fmt(struct seq_file *m, struct
 		seq_print_ip_sym(m, entry->fn.parent_ip, sym_only);
 		seq_puts(m, ")\n");
 		break;
+	case TRACE_CTX:
+		S = entry->ctx.prev_state < sizeof(state_to_char) ?
+			state_to_char[entry->ctx.prev_state] : 'X';
+		seq_printf(m, " %d:%d:%c --> %d\n",
+			   entry->ctx.prev_pid,
+			   entry->ctx.prev_prio,
+			   S,
+			   entry->ctx.next_pid);
+		break;
 	}
 }
 
Index: linux-compile.git/lib/tracing/tracer.h
===================================================================
--- linux-compile.git.orig/lib/tracing/tracer.h	2008-01-15 11:12:31.000000000 -0500
+++ linux-compile.git/lib/tracing/tracer.h	2008-01-15 11:14:44.000000000 -0500
@@ -10,6 +10,14 @@ struct tracing_function {
 	unsigned long parent_ip;
 };
 
+struct tracing_sched_switch {
+	unsigned int prev_pid;
+	unsigned char prev_prio;
+	unsigned char prev_state;
+	unsigned int next_pid;
+	unsigned char next_prio;
+};
+
 struct tracing_entry {
 	char type;
 	char cpu;  /* who will want to trace more than 256 CPUS? */
@@ -18,7 +26,10 @@ struct tracing_entry {
 	int pid;
 	cycle_t t;
 	char comm[TASK_COMM_LEN];
-	struct tracing_function fn;
+	union {
+		struct tracing_function fn;
+		struct tracing_sched_switch ctx;
+	};
 };
 
 struct tracing_trace_cpu {
@@ -76,6 +87,11 @@ void tracing_function_trace(struct traci
 			    unsigned long ip,
 			    unsigned long parent_ip,
 			    unsigned long flags);
+void tracing_sched_switch_trace(struct tracing_trace *tr,
+				struct tracing_trace_cpu *data,
+				struct task_struct *prev,
+				int next_pid,
+				unsigned long flags);
 
 extern struct file_operations tracing_fops;
 extern struct file_operations tracing_lt_fops;

-- 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC PATCH 28/30 v3] Generic command line storage
  2008-01-15 20:49 [RFC PATCH 00/30 v3] mcount and latency tracing utility -v3 Steven Rostedt
                   ` (26 preceding siblings ...)
  2008-01-15 20:49 ` [RFC PATCH 27/30 v3] Add tracing of context switches Steven Rostedt
@ 2008-01-15 20:49 ` Steven Rostedt
  2008-01-15 21:30   ` Mathieu Desnoyers
  2008-01-15 20:49 ` [RFC PATCH 29/30 v3] make varaible size buffers for traces Steven Rostedt
  2008-01-15 20:49 ` [RFC PATCH 30/30 v3] trace preempt off critical timings Steven Rostedt
  29 siblings, 1 reply; 35+ messages in thread
From: Steven Rostedt @ 2008-01-15 20:49 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Tim Bird,
	Sam Ravnborg, Frank Ch. Eigler, Jan Kiszka, Steven Rostedt

[-- Attachment #1: trace-generic-cmdline.patch --]
[-- Type: text/plain, Size: 9637 bytes --]

Saving the comm of the task for each trace entry is very expensive.
This patch adds, in the context switch hook, a way to store the
command lines of the last 128 tasks seen (SAVED_CMDLINES). This
table is consulted when a trace is printed.

Note: a saved comm may be evicted from the table if enough other
tasks are traced in the meantime. Later (TBD) patches may simply
store this information in the trace itself.
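
The bookkeeping is perhaps easiest to see in a stripped-down user-space
model of the two lookup tables (a sketch only: the array names and sizes
here are illustrative except for SAVED_CMDLINES, and the real code below
additionally bounds-checks pids against PID_MAX_DEFAULT and takes a
trylock so it never spins in the scheduler path):

#include <stdio.h>
#include <string.h>

#define SAVED_CMDLINES 128
#define PID_MAX 32768			/* stand-in for PID_MAX_DEFAULT */

static int map_pid_to_slot[PID_MAX + 1];
static int map_slot_to_pid[SAVED_CMDLINES];
static char saved_cmdlines[SAVED_CMDLINES][16];
static int cmdline_idx;

static void save_cmdline(int pid, const char *comm)
{
	int slot = map_pid_to_slot[pid];

	if (slot < 0) {
		/* recycle the next slot, dropping the old pid's mapping */
		slot = (cmdline_idx + 1) % SAVED_CMDLINES;
		if (map_slot_to_pid[slot] >= 0)
			map_pid_to_slot[map_slot_to_pid[slot]] = -1;
		map_slot_to_pid[slot] = pid;
		map_pid_to_slot[pid] = slot;
		cmdline_idx = slot;
	}
	snprintf(saved_cmdlines[slot], sizeof(saved_cmdlines[slot]), "%s", comm);
}

static const char *find_cmdline(int pid)
{
	int slot;

	if (!pid)
		return "<idle>";
	slot = map_pid_to_slot[pid];
	return slot >= 0 ? saved_cmdlines[slot] : "<...>";
}

int main(void)
{
	memset(map_pid_to_slot, -1, sizeof(map_pid_to_slot));
	memset(map_slot_to_pid, -1, sizeof(map_slot_to_pid));

	save_cmdline(5036, "bash");
	save_cmdline(5028, "sshd");

	printf("%d -> %s\n", 5036, find_cmdline(5036));	/* bash  */
	printf("%d -> %s\n", 4242, find_cmdline(4242));	/* <...> */
	return 0;
}

The reverse map is what lets a recycled slot invalidate the stale
pid-to-slot entry; a pid that has fallen out of the table is printed
as "<...>" in the trace output.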

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 lib/tracing/Kconfig              |    1 
 lib/tracing/trace_function.c     |    2 
 lib/tracing/trace_irqsoff.c      |    3 +
 lib/tracing/trace_sched_switch.c |   14 +++-
 lib/tracing/tracer.c             |  112 +++++++++++++++++++++++++++++++++++++--
 lib/tracing/tracer.h             |    5 +
 6 files changed, 128 insertions(+), 9 deletions(-)

Index: linux-compile.git/lib/tracing/Kconfig
===================================================================
--- linux-compile.git.orig/lib/tracing/Kconfig	2008-01-15 10:34:22.000000000 -0500
+++ linux-compile.git/lib/tracing/Kconfig	2008-01-15 10:41:28.000000000 -0500
@@ -19,6 +19,7 @@ config FUNCTION_TRACER
 	default n
 	select MCOUNT
 	select TRACING
+	select CONTEXT_SWITCH_TRACER
 	help
 	  Use profiler instrumentation, adding -pg to CFLAGS. This will
 	  insert a call to an architecture specific __mcount routine,
Index: linux-compile.git/lib/tracing/trace_function.c
===================================================================
--- linux-compile.git.orig/lib/tracing/trace_function.c	2008-01-15 10:26:28.000000000 -0500
+++ linux-compile.git/lib/tracing/trace_function.c	2008-01-15 10:41:28.000000000 -0500
@@ -70,9 +70,11 @@ static void function_trace_ctrl_update(s
 	if (tr->ctrl ^ val) {
 		if (val) {
 			trace_enabled = 1;
+			atomic_inc(&trace_record_cmdline);
 			register_mcount_function(&trace_ops);
 		} else {
 			trace_enabled = 0;
+			atomic_dec(&trace_record_cmdline);
 			unregister_mcount_function(&trace_ops);
 		}
 		tr->ctrl = val;
Index: linux-compile.git/lib/tracing/trace_irqsoff.c
===================================================================
--- linux-compile.git.orig/lib/tracing/trace_irqsoff.c	2008-01-15 10:27:32.000000000 -0500
+++ linux-compile.git/lib/tracing/trace_irqsoff.c	2008-01-15 10:41:28.000000000 -0500
@@ -93,6 +93,9 @@ static void update_max_tr(struct tracing
 	save->policy = current->policy;
 	save->rt_priority = current->rt_priority;
 
+	/* record this tasks comm */
+	tracing_record_cmdline(current);
+
 	/* from memcpy above: save->trace = data->trace */
 	data->trace = max_buffer;
 	max_buffer = save->trace;
Index: linux-compile.git/lib/tracing/trace_sched_switch.c
===================================================================
--- linux-compile.git.orig/lib/tracing/trace_sched_switch.c	2008-01-15 10:37:11.000000000 -0500
+++ linux-compile.git/lib/tracing/trace_sched_switch.c	2008-01-15 10:41:28.000000000 -0500
@@ -31,7 +31,7 @@ static notrace void sched_switch_callbac
 	va_list ap;
 	int cpu;
 
-	if (!trace_enabled)
+	if (!atomic_read(&trace_record_cmdline))
 		return;
 
 	va_start(ap, format);
@@ -49,6 +49,11 @@ static notrace void sched_switch_callbac
 	/* Ignore prev_state, since we get that from prev itself */
 	va_end(ap);
 
+	tracing_record_cmdline(prev);
+
+	if (!trace_enabled)
+		return;
+
 	raw_local_irq_save(flags);
 	cpu = raw_smp_processor_id();
 	data = tr->data[cpu];
@@ -82,10 +87,13 @@ static void sched_switch_trace_ctrl_upda
 		sched_switch_reset(tr);
 
 	if (tr->ctrl ^ val) {
-		if (val)
+		if (val) {
+			atomic_inc(&trace_record_cmdline);
 			trace_enabled = 1;
-		else
+		} else {
+			atomic_dec(&trace_record_cmdline);
 			trace_enabled = 0;
+		}
 		tr->ctrl = val;
 	}
 }
Index: linux-compile.git/lib/tracing/tracer.c
===================================================================
--- linux-compile.git.orig/lib/tracing/tracer.c	2008-01-15 10:37:38.000000000 -0500
+++ linux-compile.git/lib/tracing/tracer.c	2008-01-15 10:42:46.000000000 -0500
@@ -49,6 +49,88 @@ void notrace tracing_reset(struct tracin
 	atomic_set(&data->underrun, 0);
 }
 
+#define SAVED_CMDLINES 128
+static unsigned map_pid_to_cmdline[PID_MAX_DEFAULT+1];
+static unsigned map_cmdline_to_pid[SAVED_CMDLINES];
+static char saved_cmdlines[SAVED_CMDLINES][TASK_COMM_LEN];
+static int cmdline_idx;
+static DEFINE_SPINLOCK(trace_cmdline_lock);
+atomic_t trace_record_cmdline;
+atomic_t trace_record_cmdline_disabled;
+
+static void trace_init_cmdlines(void)
+{
+	memset(&map_pid_to_cmdline, -1, sizeof(map_pid_to_cmdline));
+	memset(&map_cmdline_to_pid, -1, sizeof(map_cmdline_to_pid));
+	cmdline_idx = 0;
+}
+
+notrace void trace_stop_cmdline_recording(void);
+
+static void notrace trace_save_cmdline(struct task_struct *tsk)
+{
+	unsigned map;
+	unsigned idx;
+
+	if (!tsk->pid || unlikely(tsk->pid > PID_MAX_DEFAULT))
+		return;
+
+	/*
+	 * It's not the end of the world if we don't get
+	 * the lock, but we also don't want to spin
+	 * nor do we want to disable interrupts,
+	 * so if we miss here, then better luck next time.
+	 */
+	if (!spin_trylock(&trace_cmdline_lock))
+		return;
+
+	idx = map_pid_to_cmdline[tsk->pid];
+	if (idx >= SAVED_CMDLINES) {
+		idx = (cmdline_idx + 1) % SAVED_CMDLINES;
+
+		map = map_cmdline_to_pid[idx];
+		if (map <= PID_MAX_DEFAULT)
+			map_pid_to_cmdline[map] = (unsigned)-1;
+
+		map_pid_to_cmdline[tsk->pid] = idx;
+
+		cmdline_idx = idx;
+	}
+
+	memcpy(&saved_cmdlines[idx], tsk->comm, TASK_COMM_LEN);
+
+	spin_unlock(&trace_cmdline_lock);
+}
+
+static notrace char *trace_find_cmdline(int pid)
+{
+	char *cmdline = "<...>";
+	unsigned map;
+
+	if (!pid)
+		return "<idle>";
+
+	if (pid > PID_MAX_DEFAULT)
+		goto out;
+
+	map = map_pid_to_cmdline[pid];
+	if (map >= SAVED_CMDLINES)
+		goto out;
+
+	cmdline = saved_cmdlines[map];
+
+ out:
+	return cmdline;
+}
+
+void tracing_record_cmdline(struct task_struct *tsk)
+{
+	if (atomic_read(&trace_record_cmdline_disabled))
+		return;
+
+	trace_save_cmdline(tsk);
+}
+
 static inline notrace struct tracing_entry *
 tracing_get_trace_entry(struct tracing_trace *tr,
 			struct tracing_trace_cpu *data)
@@ -90,7 +172,6 @@ tracing_generic_entry_update(struct trac
 		((pc & HARDIRQ_MASK) ? TRACE_FLAG_HARDIRQ : 0) |
 		((pc & SOFTIRQ_MASK) ? TRACE_FLAG_SOFTIRQ : 0) |
 		(need_resched() ? TRACE_FLAG_NEED_RESCHED : 0);
-	memcpy(entry->comm, tsk->comm, TASK_COMM_LEN);
 }
 
 notrace void tracing_function_trace(struct tracing_trace *tr,
@@ -242,6 +323,8 @@ static void *s_start(struct seq_file *m,
 	loff_t l = 0;
 	int i;
 
+	atomic_inc(&trace_record_cmdline_disabled);
+
 	/* let the tracer grab locks here if needed */
 	if (iter->tr->start)
 		iter->tr->start(iter);
@@ -269,6 +352,8 @@ static void s_stop(struct seq_file *m, v
 {
 	struct tracing_iterator *iter = m->private;
 
+	atomic_dec(&trace_record_cmdline_disabled);
+
 	/* let the tracer release locks here if needed */
 	if (iter->tr->stop)
 		iter->tr->stop(iter);
@@ -390,8 +475,11 @@ static void notrace
 lat_print_generic(struct seq_file *m, struct tracing_entry *entry, int cpu)
 {
 	int hardirq, softirq;
+	char *comm;
 
-	seq_printf(m, "%8.8s-%-5d ", entry->comm, entry->pid);
+	comm = trace_find_cmdline(entry->pid);
+
+	seq_printf(m, "%8.8s-%-5d ", comm, entry->pid);
 	seq_printf(m, "%d", cpu);
 	seq_printf(m, "%c%c",
 		   (entry->flags & TRACE_FLAG_IRQS_OFF) ? 'd' : '.',
@@ -453,9 +541,12 @@ print_lat_fmt(struct seq_file *m, struct
 	abs_usecs = cycles_to_usecs(entry->t - iter->tr->time_start);
 
 	if (verbose) {
+		char *comm;
+
+		comm = trace_find_cmdline(entry->pid);
 		seq_printf(m, "%16s %5d %d %d %08x %08x [%08lx]"
 			   " %ld.%03ldms (+%ld.%03ldms): ",
-			   entry->comm,
+			   comm,
 			   entry->pid, cpu, entry->flags,
 			   entry->preempt_count, trace_idx,
 			   cycles_to_usecs(entry->t),
@@ -491,6 +582,9 @@ static void notrace print_trace_fmt(stru
 	unsigned long secs;
 	int sym_only = !!(trace_flags & TRACE_ITER_SYM_ONLY);
 	unsigned long long t;
+	char *comm;
+
+	comm = trace_find_cmdline(iter->ent->pid);
 
 	t = cycles_to_usecs(iter->ent->t);
 	usec_rem = do_div(t, 1000000ULL);
@@ -498,7 +592,7 @@ static void notrace print_trace_fmt(stru
 
 	seq_printf(m, "[%5lu.%06lu] ", secs, usec_rem);
 	seq_printf(m, "CPU %d: ", iter->cpu);
-	seq_printf(m, "%s:%d ", iter->ent->comm,
+	seq_printf(m, "%s:%d ", comm,
 		   iter->ent->pid);
 	switch (iter->ent->type) {
 	case TRACE_FN:
@@ -812,6 +906,14 @@ static __init int trace_init_debugfs(voi
 	return 0;
 }
 
-device_initcall(trace_init_debugfs);
+static __init int trace_init(void)
+{
+	trace_init_cmdlines();
+
+	return trace_init_debugfs();
+
+}
+
+device_initcall(trace_init);
 
 #endif /* CONFIG_DEBUG_FS */
Index: linux-compile.git/lib/tracing/tracer.h
===================================================================
--- linux-compile.git.orig/lib/tracing/tracer.h	2008-01-15 10:34:22.000000000 -0500
+++ linux-compile.git/lib/tracing/tracer.h	2008-01-15 10:41:28.000000000 -0500
@@ -25,7 +25,6 @@ struct tracing_entry {
 	char preempt_count; /* assumes PREEMPT_MASK is 8 bits or less */
 	int pid;
 	cycle_t t;
-	char comm[TASK_COMM_LEN];
 	union {
 		struct tracing_function fn;
 		struct tracing_sched_switch ctx;
@@ -92,11 +91,15 @@ void tracing_sched_switch_trace(struct t
 				struct task_struct *prev,
 				int next_pid,
 				unsigned long flags);
+void tracing_record_cmdline(struct task_struct *tsk);
 
 extern struct file_operations tracing_fops;
 extern struct file_operations tracing_lt_fops;
 extern struct file_operations tracing_ctrl_fops;
 
+extern atomic_t trace_record_cmdline;
+extern atomic_t trace_record_cmdline_disabled;
+
 static inline notrace cycle_t now(void)
 {
 	return get_monotonic_cycles();

-- 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC PATCH 29/30 v3] make variable size buffers for traces
  2008-01-15 20:49 [RFC PATCH 00/30 v3] mcount and latency tracing utility -v3 Steven Rostedt
                   ` (27 preceding siblings ...)
  2008-01-15 20:49 ` [RFC PATCH 28/30 v3] Generic command line storage Steven Rostedt
@ 2008-01-15 20:49 ` Steven Rostedt
  2008-01-15 20:49 ` [RFC PATCH 30/30 v3] trace preempt off critical timings Steven Rostedt
  29 siblings, 0 replies; 35+ messages in thread
From: Steven Rostedt @ 2008-01-15 20:49 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Tim Bird,
	Sam Ravnborg, Frank Ch. Eigler, Jan Kiszka, Steven Rostedt

[-- Attachment #1: trace-variable-buffer-size.patch --]
[-- Type: text/plain, Size: 5730 bytes --]

The number of entries in each tracer's buffer can now be set through
kernel command line parameters (an example boot line follows the list
below):

trace_fn_entries - function trace entries (default 65536)
trace_irq_entries - irq off trace entries (default 512)
trace_ctx_entries - schedule switch entries (default 16384)
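
[ An illustrative boot line; the parameter names come from this patch, and
  the values are arbitrary.  Since each buffer is allocated in whole pages,
  the effective count may differ from what was requested; the "actual
  entries" line printed at boot reports the final number. ]

	linux ... trace_fn_entries=131072 trace_irq_entries=1024 trace_ctx_entries=32768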

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 lib/tracing/trace_function.c     |   14 ++++++++++++--
 lib/tracing/trace_irqsoff.c      |   14 ++++++++++++--
 lib/tracing/trace_sched_switch.c |   14 ++++++++++++--
 lib/tracing/tracer.h             |    1 -
 4 files changed, 36 insertions(+), 7 deletions(-)

Index: linux-compile.git/lib/tracing/trace_function.c
===================================================================
--- linux-compile.git.orig/lib/tracing/trace_function.c	2008-01-15 15:13:59.000000000 -0500
+++ linux-compile.git/lib/tracing/trace_function.c	2008-01-15 15:15:00.000000000 -0500
@@ -18,6 +18,16 @@
 static struct tracing_trace function_trace __read_mostly;
 static DEFINE_PER_CPU(struct tracing_trace_cpu, function_trace_cpu);
 static int trace_enabled __read_mostly;
+static unsigned long trace_nr_entries = (65536UL);
+
+static int __init set_nr_entries(char *str)
+{
+	if (!str)
+		return 0;
+	trace_nr_entries = simple_strtoul(str, &str, 0);
+	return 1;
+}
+__setup("trace_fn_entries=", set_nr_entries);
 
 static notrace void function_trace_reset(struct tracing_trace *tr)
 {
@@ -132,7 +142,7 @@ static void function_trace_close(struct 
 
 __init static int function_trace_alloc_buffers(void)
 {
-	const int order = page_order(TRACING_NR_ENTRIES * TRACING_ENTRY_SIZE);
+	const int order = page_order(trace_nr_entries * TRACING_ENTRY_SIZE);
 	const unsigned long size = (1UL << order) << PAGE_SHIFT;
 	struct tracing_entry *array;
 	int i;
@@ -156,7 +166,7 @@ __init static int function_trace_alloc_b
 	function_trace.entries = size / TRACING_ENTRY_SIZE;
 
 	pr_info("function tracer: %ld bytes allocated for %ld",
-		size, TRACING_NR_ENTRIES);
+		size, trace_nr_entries);
 	pr_info(" entries of %ld bytes\n", (long)TRACING_ENTRY_SIZE);
 	pr_info("   actual entries %ld\n", function_trace.entries);
 
Index: linux-compile.git/lib/tracing/trace_irqsoff.c
===================================================================
--- linux-compile.git.orig/lib/tracing/trace_irqsoff.c	2008-01-15 15:13:59.000000000 -0500
+++ linux-compile.git/lib/tracing/trace_irqsoff.c	2008-01-15 15:15:00.000000000 -0500
@@ -25,6 +25,16 @@ static unsigned long preempt_max_latency
 static unsigned long preempt_thresh;
 static __cacheline_aligned_in_smp DEFINE_MUTEX(max_mutex);
 static int trace_enabled __read_mostly;
+static unsigned long trace_nr_entries = (512UL);
+
+static int __init set_nr_entries(char *str)
+{
+	if (!str)
+		return 0;
+	trace_nr_entries = simple_strtoul(str, &str, 0);
+	return 1;
+}
+__setup("trace_irq_entries=", set_nr_entries);
 
 /*
  * max trace is switched with this buffer.
@@ -497,7 +507,7 @@ static void notrace irqsoff_trace_close(
 
 __init static int trace_irqsoff_alloc_buffers(void)
 {
-	const int order = page_order(TRACING_NR_ENTRIES * TRACING_ENTRY_SIZE);
+	const int order = page_order(trace_nr_entries * TRACING_ENTRY_SIZE);
 	const unsigned long size = (1UL << order) << PAGE_SHIFT;
 	struct tracing_entry *array;
 	int i;
@@ -535,7 +545,7 @@ __init static int trace_irqsoff_alloc_bu
 	max_tr.stop = irqsoff_stop;
 
 	pr_info("irqs off tracer: %ld bytes allocated for %ld",
-		size, TRACING_NR_ENTRIES);
+		size, trace_nr_entries);
 	pr_info(" entries of %d bytes\n", (int)TRACING_ENTRY_SIZE);
 	pr_info("   actual entries %ld\n", irqsoff_trace.entries);
 
Index: linux-compile.git/lib/tracing/tracer.h
===================================================================
--- linux-compile.git.orig/lib/tracing/tracer.h	2008-01-15 15:13:59.000000000 -0500
+++ linux-compile.git/lib/tracing/tracer.h	2008-01-15 15:15:00.000000000 -0500
@@ -76,7 +76,6 @@ struct tracing_iterator {
 };
 
 #define TRACING_ENTRY_SIZE sizeof(struct tracing_entry)
-#define TRACING_NR_ENTRIES (65536UL)
 
 void notrace tracing_reset(struct tracing_trace_cpu *data);
 int tracing_open_generic(struct inode *inode, struct file *filp);
Index: linux-compile.git/lib/tracing/trace_sched_switch.c
===================================================================
--- linux-compile.git.orig/lib/tracing/trace_sched_switch.c	2008-01-15 15:13:59.000000000 -0500
+++ linux-compile.git/lib/tracing/trace_sched_switch.c	2008-01-15 15:15:00.000000000 -0500
@@ -18,6 +18,16 @@ static struct tracing_trace sched_switch
 static DEFINE_PER_CPU(struct tracing_trace_cpu, sched_switch_trace_cpu);
 
 static int trace_enabled __read_mostly;
+static unsigned long trace_nr_entries = (16384UL);
+
+static int __init set_nr_entries(char *str)
+{
+	if (!str)
+		return 0;
+	trace_nr_entries = simple_strtoul(str, &str, 0);
+	return 1;
+}
+__setup("trace_ctx_entries=", set_nr_entries);
 
 static notrace void sched_switch_callback(const struct marker *mdata,
 					  void *private_data,
@@ -144,7 +154,7 @@ static void sched_switch_trace_close(str
 
 __init static int sched_switch_trace_alloc_buffers(void)
 {
-	const int order = page_order(TRACING_NR_ENTRIES * TRACING_ENTRY_SIZE);
+	const int order = page_order(trace_nr_entries * TRACING_ENTRY_SIZE);
 	const unsigned long size = (1UL << order) << PAGE_SHIFT;
 	struct tracing_entry *array;
 	int ret;
@@ -170,7 +180,7 @@ __init static int sched_switch_trace_all
 	sched_switch_trace.entries = size / TRACING_ENTRY_SIZE;
 
 	pr_info("sched_switch tracer: %ld bytes allocated for %ld",
-		size, TRACING_NR_ENTRIES);
+		size, trace_nr_entries);
 	pr_info(" entries of %ld bytes\n", (long)TRACING_ENTRY_SIZE);
 	pr_info("   actual entries %ld\n", sched_switch_trace.entries);
 

-- 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC PATCH 30/30 v3] trace preempt off critical timings
  2008-01-15 20:49 [RFC PATCH 00/30 v3] mcount and latency tracing utility -v3 Steven Rostedt
                   ` (28 preceding siblings ...)
  2008-01-15 20:49 ` [RFC PATCH 29/30 v3] make variable size buffers for traces Steven Rostedt
@ 2008-01-15 20:49 ` Steven Rostedt
  29 siblings, 0 replies; 35+ messages in thread
From: Steven Rostedt @ 2008-01-15 20:49 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Tim Bird,
	Sam Ravnborg, Frank Ch. Eigler, Jan Kiszka, Steven Rostedt

[-- Attachment #1: mcount-trace-latency-trace-preempt-off.patch --]
[-- Type: text/plain, Size: 10973 bytes --]

Add preempt off timings. A lot of this code is taken from the RT patch
latency trace that was written by Ingo Molnar.

Now, instead of just tracing irqs off, preemption-off sections can also be
selected for recording.

When this is selected, it shares the same output files as the irqs-off
timings. One can trace preemption off, irqs off, or both. But once the
kernel is configured for what to trace, it can't be changed at runtime.
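
[ Illustration only, not part of the patch: a minimal sketch of how the
  preempt_count() checks added to add_preempt_count()/sub_preempt_count()
  behave under nesting.  Only the outermost disable/enable pair starts and
  stops the timing; the function name below is made up. ]

	/* kernel context assumed: CONFIG_PREEMPT and CONFIG_CRITICAL_PREEMPT_TIMING */
	#include <linux/preempt.h>

	static void preempt_timing_nesting_sketch(void)
	{
		preempt_disable();	/* count 0 -> 1: trace_preempt_off() marks the start */
		preempt_disable();	/* count 1 -> 2: nested, no trace call */

		/* ... work done with preemption disabled ... */

		preempt_enable();	/* count 2 -> 1: nested, no trace call */
		preempt_enable();	/* count 1 -> 0: trace_preempt_on() checks the elapsed
					 * time against preempt_max_latency */
	}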

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 arch/x86/kernel/process_32.c |    3 +++
 include/linux/irqflags.h     |    3 ++-
 include/linux/mcount.h       |    8 ++++++++
 include/linux/preempt.h      |    2 +-
 kernel/sched.c               |   24 +++++++++++++++++++++++-
 lib/tracing/Kconfig          |   24 ++++++++++++++++++++++++
 lib/tracing/Makefile         |    1 +
 lib/tracing/trace_irqsoff.c  |   40 ++++++++++++++++++++++++++++++----------
 8 files changed, 92 insertions(+), 13 deletions(-)

Index: linux-compile.git/lib/tracing/Kconfig
===================================================================
--- linux-compile.git.orig/lib/tracing/Kconfig	2008-01-15 15:13:59.000000000 -0500
+++ linux-compile.git/lib/tracing/Kconfig	2008-01-15 15:15:32.000000000 -0500
@@ -43,6 +43,30 @@ config CRITICAL_IRQSOFF_TIMING
 
 	      echo 0 > /debug/tracing/preempt_max_latency
 
+	  (Note that kernel size and overhead increases with this option
+	  enabled. This option and the preempt-off timing option can be
+	  used together or separately.)
+
+config CRITICAL_PREEMPT_TIMING
+	bool "Preemption-off critical section latency timing"
+	default n
+	depends on GENERIC_TIME
+	depends on PREEMPT
+	select TRACING
+	help
+	  This option measures the time spent in preemption off critical
+	  sections, with microsecond accuracy.
+
+	  The default measurement method is a maximum search, which is
+	  disabled by default and can be runtime (re-)started
+	  via:
+
+	      echo 0 > /debug/tracing/preempt_max_latency
+
+	  (Note that kernel size and overhead increases with this option
+	  enabled. This option and the irqs-off timing option can be
+	  used together or separately.)
+
 config CONTEXT_SWITCH_TRACER
 	bool "Trace process context switches"
 	depends on DEBUG_KERNEL
Index: linux-compile.git/lib/tracing/Makefile
===================================================================
--- linux-compile.git.orig/lib/tracing/Makefile	2008-01-15 15:13:50.000000000 -0500
+++ linux-compile.git/lib/tracing/Makefile	2008-01-15 15:15:32.000000000 -0500
@@ -4,5 +4,6 @@ obj-$(CONFIG_TRACING) += tracer.o
 obj-$(CONFIG_CONTEXT_SWITCH_TRACER) += trace_sched_switch.o
 obj-$(CONFIG_FUNCTION_TRACER) += trace_function.o
 obj-$(CONFIG_CRITICAL_IRQSOFF_TIMING) += trace_irqsoff.o
+obj-$(CONFIG_CRITICAL_PREEMPT_TIMING) += trace_irqsoff.o
 
 libmcount-y := mcount.o
Index: linux-compile.git/lib/tracing/trace_irqsoff.c
===================================================================
--- linux-compile.git.orig/lib/tracing/trace_irqsoff.c	2008-01-15 15:15:00.000000000 -0500
+++ linux-compile.git/lib/tracing/trace_irqsoff.c	2008-01-15 15:15:32.000000000 -0500
@@ -36,6 +36,12 @@ static int __init set_nr_entries(char *s
 }
 __setup("trace_irq_entries=", set_nr_entries);
 
+#ifdef CONFIG_CRITICAL_PREEMPT_TIMING
+# define preempt_trace() (preempt_count())
+#else
+# define preempt_trace() (0)
+#endif
+
 /*
  * max trace is switched with this buffer.
  */
@@ -208,7 +214,7 @@ start_critical_timing(unsigned long ip, 
 
 	data->critical_sequence = max_sequence;
 	data->preempt_timestamp = now();
-	data->critical_start = parent_ip;
+	data->critical_start = parent_ip ? : ip;
 	tracing_reset(data);
 
 	local_save_flags(flags);
@@ -232,18 +238,19 @@ stop_critical_timing(unsigned long ip, u
 	atomic_inc(&data->disabled);
 	local_save_flags(flags);
 	tracing_function_trace(tr, data, ip, parent_ip, flags);
-	check_critical_timing(tr, data, parent_ip, cpu);
+	check_critical_timing(tr, data, parent_ip ? : ip, cpu);
 	data->critical_start = 0;
 	atomic_dec(&data->disabled);
 }
 
+/* start and stop critical timings; used to suspend timing across idle */
 void notrace start_critical_timings(void)
 {
 	unsigned long flags;
 
 	local_save_flags(flags);
 
-	if (irqs_disabled_flags(flags))
+	if (preempt_trace() || irqs_disabled_flags(flags))
 		start_critical_timing(CALLER_ADDR0, 0);
 }
 
@@ -253,10 +260,11 @@ void notrace stop_critical_timings(void)
 
 	local_save_flags(flags);
 
-	if (irqs_disabled_flags(flags))
+	if (preempt_trace() || irqs_disabled_flags(flags))
 		stop_critical_timing(CALLER_ADDR0, 0);
 }
 
+#ifdef CONFIG_CRITICAL_IRQSOFF_TIMING
 #ifdef CONFIG_LOCKDEP
 void notrace time_hardirqs_on(unsigned long a0, unsigned long a1)
 {
@@ -264,7 +272,7 @@ void notrace time_hardirqs_on(unsigned l
 
 	local_save_flags(flags);
 
-	if (irqs_disabled_flags(flags))
+	if (!preempt_trace() && irqs_disabled_flags(flags))
 		stop_critical_timing(a0, a1);
 }
 
@@ -274,7 +282,7 @@ void notrace time_hardirqs_off(unsigned 
 
 	local_save_flags(flags);
 
-	if (irqs_disabled_flags(flags))
+	if (!preempt_trace() && irqs_disabled_flags(flags))
 		start_critical_timing(a0, a1);
 }
 
@@ -313,7 +321,7 @@ void notrace trace_hardirqs_on(void)
 
 	local_save_flags(flags);
 
-	if (irqs_disabled_flags(flags))
+	if (!preempt_trace() && irqs_disabled_flags(flags))
 		stop_critical_timing(CALLER_ADDR0, 0);
 }
 EXPORT_SYMBOL(trace_hardirqs_on);
@@ -324,7 +332,7 @@ void notrace trace_hardirqs_off(void)
 
 	local_save_flags(flags);
 
-	if (irqs_disabled_flags(flags))
+	if (!preempt_trace() && irqs_disabled_flags(flags))
 		start_critical_timing(CALLER_ADDR0, 0);
 }
 EXPORT_SYMBOL(trace_hardirqs_off);
@@ -335,7 +343,7 @@ void notrace trace_hardirqs_on_caller(un
 
 	local_save_flags(flags);
 
-	if (irqs_disabled_flags(flags))
+	if (!preempt_trace() && irqs_disabled_flags(flags))
 		stop_critical_timing(CALLER_ADDR0, caller_addr);
 }
 EXPORT_SYMBOL(trace_hardirqs_on_caller);
@@ -346,13 +354,25 @@ void notrace trace_hardirqs_off_caller(u
 
 	local_save_flags(flags);
 
-	if (irqs_disabled_flags(flags))
+	if (!preempt_trace() && irqs_disabled_flags(flags))
 		start_critical_timing(CALLER_ADDR0, caller_addr);
 }
 EXPORT_SYMBOL(trace_hardirqs_off_caller);
 
 #endif /* CONFIG_LOCKDEP */
+#endif /*  CONFIG_CRITICAL_IRQSOFF_TIMING */
 
+#ifdef CONFIG_CRITICAL_PREEMPT_TIMING
+void notrace trace_preempt_on(unsigned long a0, unsigned long a1)
+{
+	stop_critical_timing(a0, a1);
+}
+
+void notrace trace_preempt_off(unsigned long a0, unsigned long a1)
+{
+	start_critical_timing(a0, a1);
+}
+#endif /* CONFIG_CRITICAL_PREEMPT_TIMING */
 
 #ifdef CONFIG_MCOUNT
 static void notrace irqsoff_trace_call(unsigned long ip,
Index: linux-compile.git/include/linux/preempt.h
===================================================================
--- linux-compile.git.orig/include/linux/preempt.h	2008-01-15 14:51:22.000000000 -0500
+++ linux-compile.git/include/linux/preempt.h	2008-01-15 15:15:32.000000000 -0500
@@ -10,7 +10,7 @@
 #include <linux/linkage.h>
 #include <linux/list.h>
 
-#ifdef CONFIG_DEBUG_PREEMPT
+#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_CRITICAL_PREEMPT_TIMING)
   extern void fastcall add_preempt_count(int val);
   extern void fastcall sub_preempt_count(int val);
 #else
Index: linux-compile.git/kernel/sched.c
===================================================================
--- linux-compile.git.orig/kernel/sched.c	2008-01-15 15:08:46.000000000 -0500
+++ linux-compile.git/kernel/sched.c	2008-01-15 15:15:32.000000000 -0500
@@ -63,6 +63,7 @@
 #include <linux/reciprocal_div.h>
 #include <linux/unistd.h>
 #include <linux/pagemap.h>
+#include <linux/mcount.h>
 
 #include <asm/tlb.h>
 #include <asm/irq_regs.h>
@@ -3502,26 +3503,44 @@ void scheduler_tick(void)
 #endif
 }
 
-#if defined(CONFIG_PREEMPT) && defined(CONFIG_DEBUG_PREEMPT)
+#if defined(CONFIG_PREEMPT) && (defined(CONFIG_DEBUG_PREEMPT) || \
+				defined(CONFIG_CRITICAL_PREEMPT_TIMING))
+
+static inline unsigned long get_parent_ip(unsigned long addr)
+{
+	if (in_lock_functions(addr)) {
+		addr = CALLER_ADDR2;
+		if (in_lock_functions(addr))
+			addr = CALLER_ADDR3;
+	}
+	return addr;
+}
 
 void fastcall add_preempt_count(int val)
 {
+#ifdef CONFIG_DEBUG_PREEMPT
 	/*
 	 * Underflow?
 	 */
 	if (DEBUG_LOCKS_WARN_ON((preempt_count() < 0)))
 		return;
+#endif
 	preempt_count() += val;
+#ifdef CONFIG_DEBUG_PREEMPT
 	/*
 	 * Spinlock count overflowing soon?
 	 */
 	DEBUG_LOCKS_WARN_ON((preempt_count() & PREEMPT_MASK) >=
 				PREEMPT_MASK - 10);
+#endif
+	if (preempt_count() == val)
+		trace_preempt_off(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1));
 }
 EXPORT_SYMBOL(add_preempt_count);
 
 void fastcall sub_preempt_count(int val)
 {
+#ifdef CONFIG_DEBUG_PREEMPT
 	/*
 	 * Underflow?
 	 */
@@ -3533,7 +3552,10 @@ void fastcall sub_preempt_count(int val)
 	if (DEBUG_LOCKS_WARN_ON((val < PREEMPT_MASK) &&
 			!(preempt_count() & PREEMPT_MASK)))
 		return;
+#endif
 
+	if (preempt_count() == val)
+		trace_preempt_on(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1));
 	preempt_count() -= val;
 }
 EXPORT_SYMBOL(sub_preempt_count);
Index: linux-compile.git/arch/x86/kernel/process_32.c
===================================================================
--- linux-compile.git.orig/arch/x86/kernel/process_32.c	2008-01-15 14:54:17.000000000 -0500
+++ linux-compile.git/arch/x86/kernel/process_32.c	2008-01-15 15:15:32.000000000 -0500
@@ -195,7 +195,10 @@ void cpu_idle(void)
 				play_dead();
 
 			__get_cpu_var(irq_stat).idle_timestamp = jiffies;
+			/* Don't trace irqs off for idle */
+			stop_critical_timings();
 			idle();
+			start_critical_timings();
 		}
 		tick_nohz_restart_sched_tick();
 		preempt_enable_no_resched();
Index: linux-compile.git/include/linux/mcount.h
===================================================================
--- linux-compile.git.orig/include/linux/mcount.h	2008-01-15 15:07:18.000000000 -0500
+++ linux-compile.git/include/linux/mcount.h	2008-01-15 15:15:32.000000000 -0500
@@ -58,4 +58,12 @@ extern void mcount(void);
 # define time_hardirqs_off(a0, a1)		do { } while (0)
 #endif
 
+#ifdef CONFIG_CRITICAL_PREEMPT_TIMING
+  extern void notrace trace_preempt_on(unsigned long a0, unsigned long a1);
+  extern void notrace trace_preempt_off(unsigned long a0, unsigned long a1);
+#else
+# define trace_preempt_on(a0, a1)		do { } while (0)
+# define trace_preempt_off(a0, a1)		do { } while (0)
+#endif
+
 #endif /* _LINUX_MCOUNT_H */
Index: linux-compile.git/include/linux/irqflags.h
===================================================================
--- linux-compile.git.orig/include/linux/irqflags.h	2008-01-15 15:07:18.000000000 -0500
+++ linux-compile.git/include/linux/irqflags.h	2008-01-15 15:15:32.000000000 -0500
@@ -54,7 +54,8 @@
 # define INIT_TRACE_IRQFLAGS
 #endif
 
-#ifdef CONFIG_CRITICAL_IRQSOFF_TIMING
+#if defined(CONFIG_CRITICAL_IRQSOFF_TIMING) || \
+	defined(CONFIG_CRITICAL_PREEMPT_TIMING)
  extern void stop_critical_timings(void);
  extern void start_critical_timings(void);
 #else

-- 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 17/30 v3] initialize the clock source to jiffies clock.
  2008-01-15 20:49 ` [RFC PATCH 17/30 v3] initialize the clock source to jiffies clock Steven Rostedt
@ 2008-01-15 21:14   ` Mathieu Desnoyers
  2008-01-15 21:27     ` Steven Rostedt
  0 siblings, 1 reply; 35+ messages in thread
From: Mathieu Desnoyers @ 2008-01-15 21:14 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Gregory Haskins, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Tim Bird, Sam Ravnborg, Frank Ch. Eigler,
	Jan Kiszka, John Stultz

* Steven Rostedt (rostedt@goodmis.org) wrote:
> The latency tracer can call clocksource_read very early in bootup and
> before the clock source variable has been initialized. This results in a
> crash at boot up (even before earlyprintk is initialized), since the
> clock->read variable points to NULL.
> 
> This patch simply initializes the clock to use clocksource_jiffies, so
> that any early user of clocksource_read will not crash.
> 

Hrm, is it sane at all to use the jiffies as a clocksource at early
boot? I thought it was updated by the timer interrupt, which is only
activated late in the boot process.

> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> Acked-by: John Stultz <johnstul@us.ibm.com>
> ---
>  include/linux/clocksource.h |    3 +++
>  kernel/time/timekeeping.c   |    9 +++++++--
>  2 files changed, 10 insertions(+), 2 deletions(-)
> 
> Index: linux-compile.git/include/linux/clocksource.h
> ===================================================================
> --- linux-compile.git.orig/include/linux/clocksource.h	2008-01-14 13:14:14.000000000 -0500
> +++ linux-compile.git/include/linux/clocksource.h	2008-01-14 14:57:46.000000000 -0500
> @@ -274,6 +274,9 @@ extern struct clocksource* clocksource_g
>  extern void clocksource_change_rating(struct clocksource *cs, int rating);
>  extern void clocksource_resume(void);
>  
> +/* used to initialize clock */
> +extern struct clocksource clocksource_jiffies;
> +
>  #ifdef CONFIG_GENERIC_TIME_VSYSCALL
>  extern void update_vsyscall(struct timespec *ts, struct clocksource *c);
>  extern void update_vsyscall_tz(void);
> Index: linux-compile.git/kernel/time/timekeeping.c
> ===================================================================
> --- linux-compile.git.orig/kernel/time/timekeeping.c	2008-01-14 13:14:14.000000000 -0500
> +++ linux-compile.git/kernel/time/timekeeping.c	2008-01-14 14:57:46.000000000 -0500
> @@ -53,8 +53,13 @@ static inline void update_xtime_cache(u6
>  	timespec_add_ns(&xtime_cache, nsec);
>  }
>  
> -static struct clocksource *clock; /* pointer to current clocksource */
> -
> +/*
> + * pointer to current clocksource
> + *  Just in case we use clocksource_read before we initialize
> + *  the actual clock source. Instead of calling a NULL read pointer
> + *  we return jiffies.
> + */
> +static struct clocksource *clock = &clocksource_jiffies;
>  
>  #ifdef CONFIG_GENERIC_TIME
>  /**
> 
> -- 

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 17/30 v3] initialize the clock source to jiffies clock.
  2008-01-15 21:14   ` Mathieu Desnoyers
@ 2008-01-15 21:27     ` Steven Rostedt
  0 siblings, 0 replies; 35+ messages in thread
From: Steven Rostedt @ 2008-01-15 21:27 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: LKML, Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Gregory Haskins, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Tim Bird, Sam Ravnborg, Frank Ch. Eigler,
	Jan Kiszka, John Stultz


On Tue, 15 Jan 2008, Mathieu Desnoyers wrote:
> >
> > This patch simply initializes the clock to use clocksource_jiffies, so
> > that any early user of clocksource_read will not crash.
> >
>
> Hrm, is it sane at all to use the jiffies as a clocksource at early
> boot? I thought it was updated by the timer interrupt, which is only
> activated late in the boot process.

It gives us a bogus value, but we know it's always there. This was
discovered in the -rt patch where we had a hard hang at early bootup.
Seems that the tracer was calling for a clock source before it was
initialized, and ended up calling through a NULL pointer.

The original fix was to just make a dummy timer source that returned zero.
But using jiffies seemed a better solution.
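
[ Illustration only; nothing below is from the patch, and the names and
  types are simplified stand-ins.  It just shows the pattern being chosen
  here: give the clock pointer a safe default so an early read returns a
  coarse value instead of dereferencing NULL. ]

	typedef unsigned long long cycle_t;

	struct clocksource_sketch {
		const char	*name;
		cycle_t		(*read)(void);
	};

	static cycle_t jiffies_read_sketch(void)
	{
		return 0;	/* coarse/bogus this early in boot, but never a crash */
	}

	static struct clocksource_sketch jiffies_sketch = {
		.name	= "jiffies",
		.read	= jiffies_read_sketch,
	};

	/* initialized to a safe default instead of NULL */
	static struct clocksource_sketch *clock_sketch = &jiffies_sketch;

	static cycle_t read_clock_early(void)
	{
		return clock_sketch->read();	/* safe even before timekeeping is set up */
	}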

-- Steve

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 28/30 v3] Generic command line storage
  2008-01-15 20:49 ` [RFC PATCH 28/30 v3] Generic command line storage Steven Rostedt
@ 2008-01-15 21:30   ` Mathieu Desnoyers
  2008-01-15 22:15     ` Steven Rostedt
  0 siblings, 1 reply; 35+ messages in thread
From: Mathieu Desnoyers @ 2008-01-15 21:30 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Gregory Haskins, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Tim Bird, Sam Ravnborg, Frank Ch. Eigler,
	Jan Kiszka, Steven Rostedt

* Steven Rostedt (rostedt@goodmis.org) wrote:
> Saving the comm of tasks for each trace is very expensive.
> This patch includes in the context switch hook, a way to
> store the last 100 command lines of tasks. This table is
> examined when a trace is to be printed.
> 

Instead of saving the comm at context switch, could we save them when a
process exits ? (actually, if we want to do this right, we would also
have to trace the changes of "comm" upon exec(), so we deal correctly
with processes that do multiple execs).

This way, we would not duplicate the comm of processes still active in
the system, and would trigger on a much lower event-rate trace point.

And the idea would be to save only the comm of processes present in your
summary (top X processes); why do you save the last 100 ? (if the system
has more than 100 active tasks, this code will always be executed, which
seems bad)

Also, if we do as I propose, we would have to keep process exit/exec
tracing active until the summary is cleared, even though the rest of
"high event rate" tracing could be stopped sooner.

Mathieu
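[ Sketch only; nothing like this exists in the patch set, and the hook point
  and function name below are assumptions.  It just makes the suggestion
  concrete: record the comm once, when it can no longer change, instead of
  at every context switch (plus a second hook on exec, as noted above). ]

	/* would be called from the exit path; tracing_record_cmdline() is from patch 28 */
	static notrace void record_cmdline_at_exit(struct task_struct *tsk)
	{
		/* tsk->comm can no longer change at this point */
		tracing_record_cmdline(tsk);
	}
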

> Note: The comm may be destroyed if other traces are performed.
> Later (TBD) patches may simply store this information in the trace
> itself.
> 
> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
> ---
>  lib/tracing/Kconfig              |    1 
>  lib/tracing/trace_function.c     |    2 
>  lib/tracing/trace_irqsoff.c      |    3 +
>  lib/tracing/trace_sched_switch.c |   14 +++-
>  lib/tracing/tracer.c             |  112 +++++++++++++++++++++++++++++++++++++--
>  lib/tracing/tracer.h             |    5 +
>  6 files changed, 128 insertions(+), 9 deletions(-)
> 
> Index: linux-compile.git/lib/tracing/Kconfig
> ===================================================================
> --- linux-compile.git.orig/lib/tracing/Kconfig	2008-01-15 10:34:22.000000000 -0500
> +++ linux-compile.git/lib/tracing/Kconfig	2008-01-15 10:41:28.000000000 -0500
> @@ -19,6 +19,7 @@ config FUNCTION_TRACER
>  	default n
>  	select MCOUNT
>  	select TRACING
> +	select CONTEXT_SWITCH_TRACER
>  	help
>  	  Use profiler instrumentation, adding -pg to CFLAGS. This will
>  	  insert a call to an architecture specific __mcount routine,
> Index: linux-compile.git/lib/tracing/trace_function.c
> ===================================================================
> --- linux-compile.git.orig/lib/tracing/trace_function.c	2008-01-15 10:26:28.000000000 -0500
> +++ linux-compile.git/lib/tracing/trace_function.c	2008-01-15 10:41:28.000000000 -0500
> @@ -70,9 +70,11 @@ static void function_trace_ctrl_update(s
>  	if (tr->ctrl ^ val) {
>  		if (val) {
>  			trace_enabled = 1;
> +			atomic_inc(&trace_record_cmdline);
>  			register_mcount_function(&trace_ops);
>  		} else {
>  			trace_enabled = 0;
> +			atomic_dec(&trace_record_cmdline);
>  			unregister_mcount_function(&trace_ops);
>  		}
>  		tr->ctrl = val;
> Index: linux-compile.git/lib/tracing/trace_irqsoff.c
> ===================================================================
> --- linux-compile.git.orig/lib/tracing/trace_irqsoff.c	2008-01-15 10:27:32.000000000 -0500
> +++ linux-compile.git/lib/tracing/trace_irqsoff.c	2008-01-15 10:41:28.000000000 -0500
> @@ -93,6 +93,9 @@ static void update_max_tr(struct tracing
>  	save->policy = current->policy;
>  	save->rt_priority = current->rt_priority;
>  
> +	/* record this task's comm */
> +	tracing_record_cmdline(current);
> +
>  	/* from memcpy above: save->trace = data->trace */
>  	data->trace = max_buffer;
>  	max_buffer = save->trace;
> Index: linux-compile.git/lib/tracing/trace_sched_switch.c
> ===================================================================
> --- linux-compile.git.orig/lib/tracing/trace_sched_switch.c	2008-01-15 10:37:11.000000000 -0500
> +++ linux-compile.git/lib/tracing/trace_sched_switch.c	2008-01-15 10:41:28.000000000 -0500
> @@ -31,7 +31,7 @@ static notrace void sched_switch_callbac
>  	va_list ap;
>  	int cpu;
>  
> -	if (!trace_enabled)
> +	if (!atomic_read(&trace_record_cmdline))
>  		return;
>  
>  	va_start(ap, format);
> @@ -49,6 +49,11 @@ static notrace void sched_switch_callbac
>  	/* Ignore prev_state, since we get that from prev itself */
>  	va_end(ap);
>  
> +	tracing_record_cmdline(prev);
> +
> +	if (!trace_enabled)
> +		return;
> +
>  	raw_local_irq_save(flags);
>  	cpu = raw_smp_processor_id();
>  	data = tr->data[cpu];
> @@ -82,10 +87,13 @@ static void sched_switch_trace_ctrl_upda
>  		sched_switch_reset(tr);
>  
>  	if (tr->ctrl ^ val) {
> -		if (val)
> +		if (val) {
> +			atomic_inc(&trace_record_cmdline);
>  			trace_enabled = 1;
> -		else
> +		} else {
> +			atomic_dec(&trace_record_cmdline);
>  			trace_enabled = 0;
> +		}
>  		tr->ctrl = val;
>  	}
>  }
> Index: linux-compile.git/lib/tracing/tracer.c
> ===================================================================
> --- linux-compile.git.orig/lib/tracing/tracer.c	2008-01-15 10:37:38.000000000 -0500
> +++ linux-compile.git/lib/tracing/tracer.c	2008-01-15 10:42:46.000000000 -0500
> @@ -49,6 +49,88 @@ void notrace tracing_reset(struct tracin
>  	atomic_set(&data->underrun, 0);
>  }
>  
> +#define SAVED_CMDLINES 128
> +static unsigned map_pid_to_cmdline[PID_MAX_DEFAULT+1];
> +static unsigned map_cmdline_to_pid[SAVED_CMDLINES];
> +static char saved_cmdlines[SAVED_CMDLINES][TASK_COMM_LEN];
> +static int cmdline_idx;
> +static DEFINE_SPINLOCK(trace_cmdline_lock);
> +atomic_t trace_record_cmdline;
> +atomic_t trace_record_cmdline_disabled;
> +
> +static void trace_init_cmdlines(void)
> +{
> +	memset(&map_pid_to_cmdline, -1, sizeof(map_pid_to_cmdline));
> +	memset(&map_cmdline_to_pid, -1, sizeof(map_cmdline_to_pid));
> +	cmdline_idx = 0;
> +}
> +
> +notrace void trace_stop_cmdline_recording(void);
> +
> +static void notrace trace_save_cmdline(struct task_struct *tsk)
> +{
> +	unsigned map;
> +	unsigned idx;
> +
> +	if (!tsk->pid || unlikely(tsk->pid > PID_MAX_DEFAULT))
> +		return;
> +
> +	/*
> +	 * It's not the end of the world if we don't get
> +	 * the lock, but we also don't want to spin
> +	 * nor do we want to disable interrupts,
> +	 * so if we miss here, then better luck next time.
> +	 */
> +	if (!spin_trylock(&trace_cmdline_lock))
> +		return;
> +
> +	idx = map_pid_to_cmdline[tsk->pid];
> +	if (idx >= SAVED_CMDLINES) {
> +		idx = (cmdline_idx + 1) % SAVED_CMDLINES;
> +
> +		map = map_cmdline_to_pid[idx];
> +		if (map <= PID_MAX_DEFAULT)
> +			map_pid_to_cmdline[map] = (unsigned)-1;
> +
> +		map_pid_to_cmdline[tsk->pid] = idx;
> +
> +		cmdline_idx = idx;
> +	}
> +
> +	memcpy(&saved_cmdlines[idx], tsk->comm, TASK_COMM_LEN);
> +
> +	spin_unlock(&trace_cmdline_lock);
> +}
> +
> +static notrace char *trace_find_cmdline(int pid)
> +{
> +	char *cmdline = "<...>";
> +	unsigned map;
> +
> +	if (!pid)
> +		return "<idle>";
> +
> +	if (pid > PID_MAX_DEFAULT)
> +		goto out;
> +
> +	map = map_pid_to_cmdline[pid];
> +	if (map >= SAVED_CMDLINES)
> +		goto out;
> +
> +	cmdline = saved_cmdlines[map];
> +
> + out:
> +	return cmdline;
> +}
> +
> +void tracing_record_cmdline(struct task_struct *tsk)
> +{
> +	if (atomic_read(&trace_record_cmdline_disabled))
> +		return;
> +
> +	trace_save_cmdline(tsk);
> +}
> +
>  static inline notrace struct tracing_entry *
>  tracing_get_trace_entry(struct tracing_trace *tr,
>  			struct tracing_trace_cpu *data)
> @@ -90,7 +172,6 @@ tracing_generic_entry_update(struct trac
>  		((pc & HARDIRQ_MASK) ? TRACE_FLAG_HARDIRQ : 0) |
>  		((pc & SOFTIRQ_MASK) ? TRACE_FLAG_SOFTIRQ : 0) |
>  		(need_resched() ? TRACE_FLAG_NEED_RESCHED : 0);
> -	memcpy(entry->comm, tsk->comm, TASK_COMM_LEN);
>  }
>  
>  notrace void tracing_function_trace(struct tracing_trace *tr,
> @@ -242,6 +323,8 @@ static void *s_start(struct seq_file *m,
>  	loff_t l = 0;
>  	int i;
>  
> +	atomic_inc(&trace_record_cmdline_disabled);
> +
>  	/* let the tracer grab locks here if needed */
>  	if (iter->tr->start)
>  		iter->tr->start(iter);
> @@ -269,6 +352,8 @@ static void s_stop(struct seq_file *m, v
>  {
>  	struct tracing_iterator *iter = m->private;
>  
> +	atomic_dec(&trace_record_cmdline_disabled);
> +
>  	/* let the tracer release locks here if needed */
>  	if (iter->tr->stop)
>  		iter->tr->stop(iter);
> @@ -390,8 +475,11 @@ static void notrace
>  lat_print_generic(struct seq_file *m, struct tracing_entry *entry, int cpu)
>  {
>  	int hardirq, softirq;
> +	char *comm;
>  
> -	seq_printf(m, "%8.8s-%-5d ", entry->comm, entry->pid);
> +	comm = trace_find_cmdline(entry->pid);
> +
> +	seq_printf(m, "%8.8s-%-5d ", comm, entry->pid);
>  	seq_printf(m, "%d", cpu);
>  	seq_printf(m, "%c%c",
>  		   (entry->flags & TRACE_FLAG_IRQS_OFF) ? 'd' : '.',
> @@ -453,9 +541,12 @@ print_lat_fmt(struct seq_file *m, struct
>  	abs_usecs = cycles_to_usecs(entry->t - iter->tr->time_start);
>  
>  	if (verbose) {
> +		char *comm;
> +
> +		comm = trace_find_cmdline(entry->pid);
>  		seq_printf(m, "%16s %5d %d %d %08x %08x [%08lx]"
>  			   " %ld.%03ldms (+%ld.%03ldms): ",
> -			   entry->comm,
> +			   comm,
>  			   entry->pid, cpu, entry->flags,
>  			   entry->preempt_count, trace_idx,
>  			   cycles_to_usecs(entry->t),
> @@ -491,6 +582,9 @@ static void notrace print_trace_fmt(stru
>  	unsigned long secs;
>  	int sym_only = !!(trace_flags & TRACE_ITER_SYM_ONLY);
>  	unsigned long long t;
> +	char *comm;
> +
> +	comm = trace_find_cmdline(iter->ent->pid);
>  
>  	t = cycles_to_usecs(iter->ent->t);
>  	usec_rem = do_div(t, 1000000ULL);
> @@ -498,7 +592,7 @@ static void notrace print_trace_fmt(stru
>  
>  	seq_printf(m, "[%5lu.%06lu] ", secs, usec_rem);
>  	seq_printf(m, "CPU %d: ", iter->cpu);
> -	seq_printf(m, "%s:%d ", iter->ent->comm,
> +	seq_printf(m, "%s:%d ", comm,
>  		   iter->ent->pid);
>  	switch (iter->ent->type) {
>  	case TRACE_FN:
> @@ -812,6 +906,14 @@ static __init int trace_init_debugfs(voi
>  	return 0;
>  }
>  
> -device_initcall(trace_init_debugfs);
> +static __init int trace_init(void)
> +{
> +	trace_init_cmdlines();
> +
> +	return trace_init_debugfs();
> +
> +}
> +
> +device_initcall(trace_init);
>  
>  #endif /* CONFIG_DEBUG_FS */
> Index: linux-compile.git/lib/tracing/tracer.h
> ===================================================================
> --- linux-compile.git.orig/lib/tracing/tracer.h	2008-01-15 10:34:22.000000000 -0500
> +++ linux-compile.git/lib/tracing/tracer.h	2008-01-15 10:41:28.000000000 -0500
> @@ -25,7 +25,6 @@ struct tracing_entry {
>  	char preempt_count; /* assumes PREEMPT_MASK is 8 bits or less */
>  	int pid;
>  	cycle_t t;
> -	char comm[TASK_COMM_LEN];
>  	union {
>  		struct tracing_function fn;
>  		struct tracing_sched_switch ctx;
> @@ -92,11 +91,15 @@ void tracing_sched_switch_trace(struct t
>  				struct task_struct *prev,
>  				int next_pid,
>  				unsigned long flags);
> +void tracing_record_cmdline(struct task_struct *tsk);
>  
>  extern struct file_operations tracing_fops;
>  extern struct file_operations tracing_lt_fops;
>  extern struct file_operations tracing_ctrl_fops;
>  
> +extern atomic_t trace_record_cmdline;
> +extern atomic_t trace_record_cmdline_disabled;
> +
>  static inline notrace cycle_t now(void)
>  {
>  	return get_monotonic_cycles();
> 
> -- 

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 28/30 v3] Generic command line storage
  2008-01-15 21:30   ` Mathieu Desnoyers
@ 2008-01-15 22:15     ` Steven Rostedt
  0 siblings, 0 replies; 35+ messages in thread
From: Steven Rostedt @ 2008-01-15 22:15 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: LKML, Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Gregory Haskins, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Tim Bird, Sam Ravnborg, Frank Ch. Eigler,
	Jan Kiszka, Steven Rostedt


On Tue, 15 Jan 2008, Mathieu Desnoyers wrote:

> * Steven Rostedt (rostedt@goodmis.org) wrote:
> > Saving the comm of tasks for each trace is very expensive.
> > This patch includes in the context switch hook, a way to
> > store the last 100 command lines of tasks. This table is
> > examined when a trace is to be printed.
> >
>
> Instead of saving the comm at context switch, could we save them when a
> process exits ? (actually, if we want to do this right, we would also
> have to trace the changes of "comm" upon exec(), so we deal correctly
> with processes that do multiple execs).
>
> This way, we would not duplicate the comm of processes still active in
> the system, and would trigger on a much lower event-rate trace point.
>
> And the idea would be to save only the comm of processes present in your
> summary (top X processes); why do you save the last 100 ? (if the system
> has more than 100 active tasks, this code will always be executed, which
> seems bad)

100 was just an arbitrary number that seems to be fine in my current
tests. But yes, a function trace over a thousand tasks will clean them
out. But the recording is only done when a trace is active.
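
[ Illustration only; the two code lines are taken from lat_print_generic()
  in the patch, while the comment and the example values are made up.  The
  point is that recycling a slot only changes how the comm is printed later,
  not the recorded events themselves. ]

	char *comm = trace_find_cmdline(entry->pid);

	seq_printf(m, "%8.8s-%-5d ", comm, entry->pid);
	/*
	 * prints e.g. "    bash-2417 " while the pid still owns a slot,
	 *        "   <...>-2417 " once its slot has been recycled, and
	 *        "  <idle>-0    " for pid 0
	 */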

>
> Also, if we do as I propose, we would have to keep process exit/exec
> tracing active until the summary is cleared, even though the rest of
> "high event rate" tracing could be stopped sooner.

One thing I planned on doing was, at a low-frequency point in the path
(hitting a new max latency to record, etc.), to copy the array into the
trace buffer, so the info stays with the trace.

Going with your approach (marker at exit), I could flush the traces that
were saved, as well as the comms of the current tasks that are still
running. The thing is, this all starts to get a bit intrusive into the
rest of the kernel.  What is supposed to be a compact latency tracer
starts to become a larger, more problematic entity. Now changes in exit
could require updates to the tracer, which is something I want to avoid.

-- Steve


^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2008-01-15 22:15 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-01-15 20:49 [RFC PATCH 00/30 v3] mcount and latency tracing utility -v3 Steven Rostedt
2008-01-15 20:49 ` [RFC PATCH 01/30 v3] Add basic support for gcc profiler instrumentation Steven Rostedt
2008-01-15 20:49 ` [RFC PATCH 02/30 v3] Annotate core code that should not be traced Steven Rostedt
2008-01-15 20:49 ` [RFC PATCH 03/30 v3] x86_64: notrace annotations Steven Rostedt
2008-01-15 20:49 ` [RFC PATCH 04/30 v3] add notrace annotations to vsyscall Steven Rostedt
2008-01-15 20:49 ` [RFC PATCH 05/30 v3] add notrace annotations for NMI routines Steven Rostedt
2008-01-15 20:49 ` [RFC PATCH 06/30 v3] mcount based trace in the form of a header file library Steven Rostedt
2008-01-15 20:49 ` [RFC PATCH 07/30 v3] tracer add debugfs interface Steven Rostedt
2008-01-15 20:49 ` [RFC PATCH 08/30 v3] mcount tracer output file Steven Rostedt
2008-01-15 20:49 ` [RFC PATCH 09/30 v3] mcount tracer show task comm and pid Steven Rostedt
2008-01-15 20:49 ` [RFC PATCH 10/30 v3] Add a symbol only trace output Steven Rostedt
2008-01-15 20:49 ` [RFC PATCH 11/30 v3] Reset the tracer when started Steven Rostedt
2008-01-15 20:49 ` [RFC PATCH 12/30 v3] separate out the percpu date into a percpu struct Steven Rostedt
2008-01-15 20:49 ` [RFC PATCH 13/30 v3] handle accurate time keeping over long delays Steven Rostedt
2008-01-15 20:49 ` [RFC PATCH 14/30 v3] ppc clock accumulate fix Steven Rostedt
2008-01-15 20:49 ` [RFC PATCH 15/30 v3] Fixup merge between xtime_cache and timkkeeping starvation fix Steven Rostedt
2008-01-15 20:49 ` [RFC PATCH 16/30 v3] time keeping add cycle_raw for actual incrementation Steven Rostedt
2008-01-15 20:49 ` [RFC PATCH 17/30 v3] initialize the clock source to jiffies clock Steven Rostedt
2008-01-15 21:14   ` Mathieu Desnoyers
2008-01-15 21:27     ` Steven Rostedt
2008-01-15 20:49 ` [RFC PATCH 18/30 v3] add get_monotonic_cycles Steven Rostedt
2008-01-15 20:49 ` [RFC PATCH 19/30 v3] add notrace annotations to timing events Steven Rostedt
2008-01-15 20:49 ` [RFC PATCH 20/30 v3] Add timestamps to tracer Steven Rostedt
2008-01-15 20:49 ` [RFC PATCH 21/30 v3] Sort trace by timestamp Steven Rostedt
2008-01-15 20:49 ` [RFC PATCH 22/30 v3] speed up the output of the tracer Steven Rostedt
2008-01-15 20:49 ` [RFC PATCH 23/30 v3] Add latency_trace format tor tracer Steven Rostedt
2008-01-15 20:49 ` [RFC PATCH 24/30 v3] Split out specific tracing functions Steven Rostedt
2008-01-15 20:49 ` [RFC PATCH 25/30 v3] Trace irq disabled critical timings Steven Rostedt
2008-01-15 20:49 ` [RFC PATCH 26/30 v3] Add context switch marker to sched.c Steven Rostedt
2008-01-15 20:49 ` [RFC PATCH 27/30 v3] Add tracing of context switches Steven Rostedt
2008-01-15 20:49 ` [RFC PATCH 28/30 v3] Generic command line storage Steven Rostedt
2008-01-15 21:30   ` Mathieu Desnoyers
2008-01-15 22:15     ` Steven Rostedt
2008-01-15 20:49 ` [RFC PATCH 29/30 v3] make variable size buffers for traces Steven Rostedt
2008-01-15 20:49 ` [RFC PATCH 30/30 v3] trace preempt off critical timings Steven Rostedt
