linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 00/11] mcount tracing utility
@ 2008-01-03  7:16 Steven Rostedt
  2008-01-03  7:16 ` [RFC PATCH 01/11] Add basic support for gcc profiler instrumentation Steven Rostedt
                   ` (14 more replies)
  0 siblings, 15 replies; 40+ messages in thread
From: Steven Rostedt @ 2008-01-03  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, William L. Irwin


The following patch series brings to vanilla Linux a bit of the RT kernel
trace facility. It incorporates the "-pg" profiling option of gcc,
which inserts a call to the "mcount" function at the entry of every
function compiled into the kernel.

This patch series implements the code for x86 (32 and 64 bit), but
other archs can easily be implemented as well.

Some Background:
----------------

A while back, Ingo Molnar and William Lee Irwin III created a latency tracer
to find problem latency areas in the kernel for the RT patch.  This tracer
became an integral part of the RT kernel in finding where latency hot
spots were.  One of the features that the latency tracer added was a
function trace.  This function tracer would record all functions that
were called (implemented by the gcc "-pg" option) and would show what was
called when interrupts or preemption was turned off.

This feature is also very helpful in normal debugging. So there has been
talk of taking bits and pieces from the RT latency tracer and bringing them
to LKML. But no one had the time to do it.

Arnaldo Carvalho de Melo took a crack at it. He pulled out mcount
as well as part of the tracing code and made it generic from the point
of view of the tracing code.  I'm not sure why this work stopped. Probably
because Arnaldo is a very busy man, and his efforts had to be utilized elsewhere.

While I still maintain my own Logdev utility:

  http://rostedt.homelinux.com/logdev

I came across a need to do the mcount with logdev too. I was successful,
but found that it became very dependent on a lot of code. One thing that
I liked about my logdev utility was that it was very non-intrusive, and has
been easy to port since the Linux 2.0 days. I did not want to burden the
logdev patch with the intrusiveness of mcount (it's not really that intrusive,
it just needs to add a "notrace" annotation to functions in the kernel,
which would cause more conflicts in applying patches for me).

Being close to the holidays, I grabbed Arnaldo's old patches and started
massaging them into something that could be useful for logdev, and what
I found out (after talking this over with Arnaldo too) is that this can
be much more useful for others as well.

The main thing I changed was that I made the mcount function itself
generic, removing its dependency on the tracing code.  That is, I added

register_mcount_function()
 and
clear_mcount_function()

So whenever mcount is enabled and a function is registered, that function
is called for every function in the kernel that is not labeled with the
"notrace" annotation.

The key thing here is that *any* utility can now hook its own function into
mcount!

The Simple Tracer:
------------------

To show the power of this I also massaged the tracer code that Arnaldo pulled
from the RT patch and made it be a nice example of what can be done
with this.

The function that is registered to mcount has the prototype:

 void func(unsigned long ip, unsigned long parent_ip);

The ip is the address of the function and parent_ip is the address of
the parent function that called it.

The x86_64 version has the assembly call the registered function directly
to save having to do a double function call.
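As a rough illustration, the registration scheme can be modeled in user
space like this (a sketch only: the names mirror the patch, but this is not
the kernel code, and simulated_mcount stands in for the arch assembly stub):

```c
#include <assert.h>

/* Prototype from the posting: ip is the traced function, parent_ip its caller. */
typedef void (*mcount_func_t)(unsigned long ip, unsigned long parent_ip);

/* Never NULL: a dummy default avoids a racy NULL check in the hot path. */
static void dummy_mcount_tracer(unsigned long ip, unsigned long parent_ip)
{
	(void)ip;
	(void)parent_ip;
	/* do nothing */
}

static mcount_func_t mcount_trace_function = dummy_mcount_tracer;

/* Only one hook at a time; registering simply overrides the previous one. */
static int register_mcount_function(mcount_func_t func)
{
	mcount_trace_function = func;
	return 0;
}

static void clear_mcount_function(void)
{
	mcount_trace_function = dummy_mcount_tracer;
}

/* What the arch mcount stub effectively does on every function entry. */
static void simulated_mcount(unsigned long ip, unsigned long parent_ip)
{
	mcount_trace_function(ip, parent_ip);
}

/* A sample hook matching the prototype, recording the calls it sees. */
static unsigned long hits, last_ip, last_parent;
static void sample_hook(unsigned long ip, unsigned long parent_ip)
{
	hits++;
	last_ip = ip;
	last_parent = parent_ip;
}
```

A utility would register its hook once and then see every traced call;
clearing puts the dummy back, so the hot path never has to test for NULL.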

To enable mcount, a sysctl is added:

   /proc/sys/kernel/mcount_enabled

Once mcount is enabled, when a function is registered, it will be called by
every function. The tracer in this patch series shows how this is done.
It adds a directory in debugfs called mctracer, with a ctrl file that
allows the user to have the tracer register its function.  Note, the order
of enabling mcount and registering a function is not important, but both
must be done to initiate the tracing. That is, you can disable tracing
by either disabling mcount or by clearing the registered function.

Only one function may be registered at a time. If another function is
registered, it will simply override whatever was there previously.

Here's a simple example of the tracer output:

CPU 2: hackbench:11867 preempt_schedule+0xc/0x84 <-- avc_has_perm_noaudit+0x45d/0x52c
CPU 1: hackbench:12052 selinux_file_permission+0x10/0x11c <-- security_file_permission+0x16/0x18
CPU 3: hackbench:12017 update_curr+0xe/0x8b <-- put_prev_task_fair+0x24/0x4c
CPU 2: hackbench:11867 avc_audit+0x16/0x9e3 <-- avc_has_perm+0x51/0x63
CPU 0: hackbench:12019 socket_has_perm+0x16/0x7c <-- selinux_socket_sendmsg+0x27/0x3e
CPU 1: hackbench:12052 file_has_perm+0x16/0xbb <-- selinux_file_permission+0x104/0x11c

This is formatted like:

 CPU <CPU#>: <task-comm>:<task-pid> <function> <-- <parent-function>


Overhead:
---------

Note that having mcount compiled in seems to show a little overhead.

Here's 3 runs of hackbench 50 without the patches:
Time: 2.137
Time: 2.283
Time: 2.245

 Avg: 2.221

and here's 3 runs with the patches (without tracing on):
Time: 2.738
Time: 2.469
Time: 2.388

  Avg: 2.531

So mcount being compiled in adds about a 14% overhead, even with tracing
off (according to hackbench).
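For reference, the quoted averages and the resulting overhead can be
recomputed from the runs above (plain arithmetic, nothing kernel-specific):

```c
/* Average of three hackbench runs, as quoted above. */
static double avg3(double a, double b, double c)
{
	return (a + b + c) / 3.0;
}

/* Relative overhead: (patched - vanilla) / vanilla. */
static double overhead(double vanilla, double patched)
{
	return (patched - vanilla) / vanilla;
}
```

With the numbers above: vanilla averages ~2.222s, patched ~2.532s,
so the overhead is (2.532 - 2.222) / 2.222, roughly 14%.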

But full tracing can cause a bit more problems:

# hackbench 50
Time: 113.350

  113.350!!!!!

But this is tracing *every* function call!


Future:
-------
The way the mcount hook is done here, other utilities can easily add their
own functions. Care just needs to be taken not to call anything that is not
marked with notrace, or you will crash the box with recursion. But
even the simple tracer adds a "disabled" feature, so in case it happens
to call something that is not marked with notrace, there is a safety net
to keep it from killing the box.
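A sketch of such a safety net (a user-space analogue: the real tracer would
use a per-CPU disabled flag, and every name here is illustrative):

```c
static int trace_disabled;	/* would be per-CPU in a real tracer */
static unsigned long events;

static void traced_helper(void);

/* The registered hook: bail out if we are already inside the tracer. */
static void trace_hook(unsigned long ip, unsigned long parent_ip)
{
	(void)ip;
	(void)parent_ip;
	if (trace_disabled)
		return;		/* re-entered via an instrumented call: drop it */
	trace_disabled = 1;
	events++;
	/* Anything called here may itself be instrumented... */
	traced_helper();
	trace_disabled = 0;
}

/* Stands in for a kernel function whose entry calls the hook. */
static void traced_helper(void)
{
	trace_hook(0x1UL, 0UL);
}
```

Without the flag, traced_helper's own entry hook would recurse forever;
with it, the nested invocation is simply dropped.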

I was originally going to use the relay system to record the data, but
that had a chance of calling functions not marked with notrace. But if,
for example, LTTng wanted to use this, it could disable tracing on a CPU
when doing the calls, and this would protect it from recursion.

SystemTap:
----------
One thing that Arnaldo and I discussed last year was using SystemTap to
add hooks into the kernel to start and stop tracing.  kprobes is too
heavy to do on all function calls, but it would be perfect for adding to
non-hot paths to start and stop the tracer.

So when debugging the kernel, instead of recompiling with printks
or other markers, you could simply use SystemTap to place trace start
and stop locations and trace the problem areas to see what is happening.


These are just some of the ideas we have with this. And we are sure others
could come up with more.






^ permalink raw reply	[flat|nested] 40+ messages in thread

* [RFC PATCH 01/11] Add basic support for gcc profiler instrumentation
  2008-01-03  7:16 [RFC PATCH 00/11] mcount tracing utility Steven Rostedt
@ 2008-01-03  7:16 ` Steven Rostedt
  2008-01-03  8:31   ` Sam Ravnborg
                     ` (3 more replies)
  2008-01-03  7:16 ` [RFC PATCH 02/11] Add fastcall to do_IRQ for i386 Steven Rostedt
                   ` (13 subsequent siblings)
  14 siblings, 4 replies; 40+ messages in thread
From: Steven Rostedt @ 2008-01-03  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, William L. Irwin, Steven Rostedt

[-- Attachment #1: mcount-add-basic-support-for-gcc-profiler-instrum.patch --]
[-- Type: text/plain, Size: 12983 bytes --]

If CONFIG_MCOUNT is selected and /proc/sys/kernel/mcount_enabled is set to a
non-zero value, the mcount routine will be called every time we enter a kernel
function that is not marked with the "notrace" attribute.

The mcount routine will then call a registered function if a function
happens to be registered.

[This code has been highly hacked by Steven Rostedt, so don't
 blame Arnaldo for all of this ;-) ]

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 Documentation/stable_api_nonsense.txt |    3 +
 Makefile                              |    4 +
 arch/x86/Kconfig                      |    6 ++
 arch/x86/Makefile_32                  |    4 +
 arch/x86/kernel/Makefile_32           |    1 
 arch/x86/kernel/entry_64.S            |   46 ++++++++++++++++++++
 arch/x86/kernel/mcount-wrapper.S      |   25 ++++++++++
 include/linux/linkage.h               |    2 
 include/linux/mcount.h                |   21 +++++++++
 kernel/sysctl.c                       |   11 ++++
 lib/Kconfig.debug                     |    2 
 lib/Makefile                          |    2 
 lib/mcount/Kconfig                    |    6 ++
 lib/mcount/Makefile                   |    3 +
 lib/mcount/mcount.c                   |   78 ++++++++++++++++++++++++++++++++++
 15 files changed, 213 insertions(+), 1 deletion(-)
 create mode 100644 arch/i386/kernel/mcount-wrapper.S
 create mode 100644 lib/mcount/Kconfig
 create mode 100644 lib/mcount/Makefile
 create mode 100644 lib/mcount/mcount.c
 create mode 100644 lib/mcount/mcount.h

Index: linux-compile.git/Documentation/stable_api_nonsense.txt
===================================================================
--- linux-compile.git.orig/Documentation/stable_api_nonsense.txt	2008-01-03 01:02:28.000000000 -0500
+++ linux-compile.git/Documentation/stable_api_nonsense.txt	2008-01-03 01:02:33.000000000 -0500
@@ -62,6 +62,9 @@ consider the following facts about the L
       - different structures can contain different fields
       - Some functions may not be implemented at all, (i.e. some locks
 	compile away to nothing for non-SMP builds.)
+      - Parameter passing of variables from function to function can be
+	done in different ways (the CONFIG_REGPARM option controls
+	this.)
       - Memory within the kernel can be aligned in different ways,
 	depending on the build options.
   - Linux runs on a wide range of different processor architectures.
Index: linux-compile.git/Makefile
===================================================================
--- linux-compile.git.orig/Makefile	2008-01-03 01:02:28.000000000 -0500
+++ linux-compile.git/Makefile	2008-01-03 01:02:39.000000000 -0500
@@ -509,11 +509,15 @@ endif
 
 include $(srctree)/arch/$(SRCARCH)/Makefile
 
+ifdef CONFIG_MCOUNT
+KBUILD_CFLAGS	+= -pg -fno-omit-frame-pointer -fno-optimize-sibling-calls
+else
 ifdef CONFIG_FRAME_POINTER
 KBUILD_CFLAGS	+= -fno-omit-frame-pointer -fno-optimize-sibling-calls
 else
 KBUILD_CFLAGS	+= -fomit-frame-pointer
 endif
+endif
 
 ifdef CONFIG_DEBUG_INFO
 KBUILD_CFLAGS	+= -g
Index: linux-compile.git/arch/x86/Kconfig
===================================================================
--- linux-compile.git.orig/arch/x86/Kconfig	2008-01-03 01:02:28.000000000 -0500
+++ linux-compile.git/arch/x86/Kconfig	2008-01-03 01:02:33.000000000 -0500
@@ -28,6 +28,12 @@ config GENERIC_CMOS_UPDATE
 	bool
 	default y
 
+# function tracing might turn this off:
+config REGPARM
+	bool
+	depends on !MCOUNT
+	default y
+
 config CLOCKSOURCE_WATCHDOG
 	bool
 	default y
Index: linux-compile.git/arch/x86/Makefile_32
===================================================================
--- linux-compile.git.orig/arch/x86/Makefile_32	2008-01-03 01:02:28.000000000 -0500
+++ linux-compile.git/arch/x86/Makefile_32	2008-01-03 01:02:33.000000000 -0500
@@ -37,7 +37,7 @@ LDFLAGS_vmlinux := --emit-relocs
 endif
 CHECKFLAGS	+= -D__i386__
 
-KBUILD_CFLAGS += -pipe -msoft-float -mregparm=3 -freg-struct-return
+KBUILD_CFLAGS += -pipe -msoft-float
 
 # prevent gcc from keeping the stack 16 byte aligned
 KBUILD_CFLAGS += $(call cc-option,-mpreferred-stack-boundary=2)
@@ -45,6 +45,8 @@ KBUILD_CFLAGS += $(call cc-option,-mpref
 # CPU-specific tuning. Anything which can be shared with UML should go here.
 include $(srctree)/arch/x86/Makefile_32.cpu
 
+cflags-$(CONFIG_REGPARM) += -mregparm=3 -freg-struct-return
+
 # temporary until string.h is fixed
 cflags-y += -ffreestanding
 
Index: linux-compile.git/arch/x86/kernel/Makefile_32
===================================================================
--- linux-compile.git.orig/arch/x86/kernel/Makefile_32	2008-01-03 01:02:28.000000000 -0500
+++ linux-compile.git/arch/x86/kernel/Makefile_32	2008-01-03 01:02:33.000000000 -0500
@@ -23,6 +23,7 @@ obj-$(CONFIG_APM)		+= apm_32.o
 obj-$(CONFIG_X86_SMP)		+= smp_32.o smpboot_32.o tsc_sync.o
 obj-$(CONFIG_SMP)		+= smpcommon_32.o
 obj-$(CONFIG_X86_TRAMPOLINE)	+= trampoline_32.o
+obj-$(CONFIG_MCOUNT)		+= mcount-wrapper.o
 obj-$(CONFIG_X86_MPPARSE)	+= mpparse_32.o
 obj-$(CONFIG_X86_LOCAL_APIC)	+= apic_32.o nmi_32.o
 obj-$(CONFIG_X86_IO_APIC)	+= io_apic_32.o
Index: linux-compile.git/arch/x86/kernel/mcount-wrapper.S
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-compile.git/arch/x86/kernel/mcount-wrapper.S	2008-01-03 01:02:33.000000000 -0500
@@ -0,0 +1,25 @@
+/*
+ *  linux/arch/x86/mcount-wrapper.S
+ *
+ *  Copyright (C) 2004 Ingo Molnar
+ */
+
+.globl mcount
+mcount:
+	cmpl $0, mcount_enabled
+	jz out
+
+	push %ebp
+	mov %esp, %ebp
+	pushl %eax
+	pushl %ecx
+	pushl %edx
+
+	call __mcount
+
+	popl %edx
+	popl %ecx
+	popl %eax
+	popl %ebp
+out:
+	ret
Index: linux-compile.git/include/linux/linkage.h
===================================================================
--- linux-compile.git.orig/include/linux/linkage.h	2008-01-03 01:02:28.000000000 -0500
+++ linux-compile.git/include/linux/linkage.h	2008-01-03 01:02:33.000000000 -0500
@@ -3,6 +3,8 @@
 
 #include <asm/linkage.h>
 
+#define notrace __attribute__((no_instrument_function))
+
 #ifdef __cplusplus
 #define CPP_ASMLINKAGE extern "C"
 #else
Index: linux-compile.git/kernel/sysctl.c
===================================================================
--- linux-compile.git.orig/kernel/sysctl.c	2008-01-03 01:02:28.000000000 -0500
+++ linux-compile.git/kernel/sysctl.c	2008-01-03 01:02:33.000000000 -0500
@@ -46,6 +46,7 @@
 #include <linux/nfs_fs.h>
 #include <linux/acpi.h>
 #include <linux/reboot.h>
+#include <linux/mcount.h>
 
 #include <asm/uaccess.h>
 #include <asm/processor.h>
@@ -470,6 +471,16 @@ static struct ctl_table kern_table[] = {
 		.mode		= 0644,
 		.proc_handler	= &proc_dointvec,
 	},
+#ifdef CONFIG_MCOUNT
+	{
+		.ctl_name	= CTL_UNNUMBERED,
+		.procname	= "mcount_enabled",
+		.data		= &mcount_enabled,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
+#endif
 #ifdef CONFIG_KMOD
 	{
 		.ctl_name	= KERN_MODPROBE,
Index: linux-compile.git/lib/Kconfig.debug
===================================================================
--- linux-compile.git.orig/lib/Kconfig.debug	2008-01-03 01:02:28.000000000 -0500
+++ linux-compile.git/lib/Kconfig.debug	2008-01-03 01:02:33.000000000 -0500
@@ -517,4 +517,6 @@ config FAULT_INJECTION_STACKTRACE_FILTER
 	help
 	  Provide stacktrace filter for fault-injection capabilities
 
+source lib/mcount/Kconfig
+
 source "samples/Kconfig"
Index: linux-compile.git/lib/Makefile
===================================================================
--- linux-compile.git.orig/lib/Makefile	2008-01-03 01:02:28.000000000 -0500
+++ linux-compile.git/lib/Makefile	2008-01-03 01:02:33.000000000 -0500
@@ -66,6 +66,8 @@ obj-$(CONFIG_AUDIT_GENERIC) += audit.o
 obj-$(CONFIG_SWIOTLB) += swiotlb.o
 obj-$(CONFIG_FAULT_INJECTION) += fault-inject.o
 
+obj-$(CONFIG_MCOUNT) += mcount/
+
 lib-$(CONFIG_GENERIC_BUG) += bug.o
 
 hostprogs-y	:= gen_crc32table
Index: linux-compile.git/lib/mcount/Kconfig
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-compile.git/lib/mcount/Kconfig	2008-01-03 01:02:33.000000000 -0500
@@ -0,0 +1,6 @@
+
+# MCOUNT itself is useless, or will just be added overhead.
+# It needs something to register a function with it.
+config MCOUNT
+	bool
+	depends on DEBUG_KERNEL
Index: linux-compile.git/lib/mcount/Makefile
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-compile.git/lib/mcount/Makefile	2008-01-03 01:02:33.000000000 -0500
@@ -0,0 +1,3 @@
+obj-$(CONFIG_MCOUNT) += libmcount.o
+
+libmcount-objs := mcount.o
Index: linux-compile.git/lib/mcount/mcount.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-compile.git/lib/mcount/mcount.c	2008-01-03 01:02:33.000000000 -0500
@@ -0,0 +1,78 @@
+/*
+ * Infrastructure for profiling code inserted by 'gcc -pg'.
+ *
+ * Copyright (C) 2007 Arnaldo Carvalho de Melo <acme@redhat.com>
+ *
+ * Converted to be more generic:
+ *   Copyright (C) 2007-2008 Steven Rostedt <srostedt@redhat.com>
+ *
+ * From code in the latency_tracer, that is:
+ *
+ *  Copyright (C) 2004-2006 Ingo Molnar
+ *  Copyright (C) 2004 William Lee Irwin III
+ */
+
+#include <linux/module.h>
+#include <linux/mcount.h>
+
+/*
+ * Since we have nothing protecting between the test of
+ * mcount_trace_function and the call to it, we can't
+ * set it to NULL without risking a race that will have
+ * the kernel call the NULL pointer. Instead, we just
+ * set the function pointer to a dummy function.
+ */
+notrace void dummy_mcount_tracer(unsigned long ip,
+				 unsigned long parent_ip)
+{
+	/* do nothing */
+}
+
+mcount_func_t mcount_trace_function = dummy_mcount_tracer;
+int mcount_enabled;
+
+/** __mcount - hook for profiling
+ *
+ * This routine is called from the arch specific mcount routine, that in turn is
+ * called from code inserted by gcc -pg.
+ */
+notrace void __mcount(void)
+{
+	if (mcount_trace_function != dummy_mcount_tracer)
+		mcount_trace_function(CALLER_ADDR1, CALLER_ADDR2);
+}
+EXPORT_SYMBOL_GPL(mcount);
+/*
+ * The above EXPORT_SYMBOL is for the gcc call of mcount and not the
+ * function __mcount that it is underneath. I put the export there
+ * to fool checkpatch.pl. It wants that export to be with the
+ * function, but that function happens to be in assembly.
+ */
+
+/**
+ * register_mcount_function - register a function for profiling
+ * @func - the function for profiling.
+ *
+ * Register a function to be called by all functions in the
+ * kernel.
+ *
+ * Note: @func and all the functions it calls must be labeled
+ *       with "notrace", otherwise it will go into a
+ *       recursive loop.
+ */
+int register_mcount_function(mcount_func_t func)
+{
+	mcount_trace_function = func;
+	return 0;
+}
+
+/**
+ * clear_mcount_function - reset the mcount function
+ *
+ * This resets the mcount function to the dummy tracer, in essence
+ * stopping tracing.  There may be a lag before it takes effect.
+ */
+void clear_mcount_function(void)
+{
+	mcount_trace_function = dummy_mcount_tracer;
+}
Index: linux-compile.git/include/linux/mcount.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-compile.git/include/linux/mcount.h	2008-01-03 01:02:33.000000000 -0500
@@ -0,0 +1,21 @@
+#ifndef _LINUX_MCOUNT_H
+#define _LINUX_MCOUNT_H
+
+#ifdef CONFIG_MCOUNT
+extern int mcount_enabled;
+
+#include <linux/linkage.h>
+
+#define CALLER_ADDR0 ((unsigned long)__builtin_return_address(0))
+#define CALLER_ADDR1 ((unsigned long)__builtin_return_address(1))
+#define CALLER_ADDR2 ((unsigned long)__builtin_return_address(2))
+
+typedef void (*mcount_func_t)(unsigned long ip, unsigned long parent_ip);
+
+extern void mcount(void);
+
+int register_mcount_function(mcount_func_t func);
+void clear_mcount_function(void);
+
+#endif /* CONFIG_MCOUNT */
+#endif /* _LINUX_MCOUNT_H */
Index: linux-compile.git/arch/x86/kernel/entry_64.S
===================================================================
--- linux-compile.git.orig/arch/x86/kernel/entry_64.S	2008-01-03 01:02:28.000000000 -0500
+++ linux-compile.git/arch/x86/kernel/entry_64.S	2008-01-03 01:02:33.000000000 -0500
@@ -53,6 +53,52 @@
 
 	.code64
 
+#ifdef CONFIG_MCOUNT
+
+ENTRY(mcount)
+	cmpl $0, mcount_enabled
+	jz out
+
+	push %rbp
+
+	lea dummy_mcount_tracer, %rbp
+	cmpq %rbp, mcount_trace_function
+	jz out_rbp
+
+	mov %rsp,%rbp
+
+	push %r11
+	push %r10
+	push %r9
+	push %r8
+	push %rdi
+	push %rsi
+	push %rdx
+	push %rcx
+	push %rax
+
+	mov 0x0(%rbp),%rax
+	mov 0x8(%rbp),%rdi
+	mov 0x8(%rax),%rsi
+
+	call   *mcount_trace_function
+
+	pop %rax
+	pop %rcx
+	pop %rdx
+	pop %rsi
+	pop %rdi
+	pop %r8
+	pop %r9
+	pop %r10
+	pop %r11
+
+out_rbp:
+	pop %rbp
+out:
+	ret
+#endif
+
 #ifndef CONFIG_PREEMPT
 #define retint_kernel retint_restore_args
 #endif	

-- 


* [RFC PATCH 02/11] Add fastcall to do_IRQ for i386
  2008-01-03  7:16 [RFC PATCH 00/11] mcount tracing utility Steven Rostedt
  2008-01-03  7:16 ` [RFC PATCH 01/11] Add basic support for gcc profiler instrumentation Steven Rostedt
@ 2008-01-03  7:16 ` Steven Rostedt
  2008-01-03 17:36   ` Mathieu Desnoyers
  2008-01-03  7:16 ` [RFC PATCH 03/11] Annotate core code that should not be traced Steven Rostedt
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 40+ messages in thread
From: Steven Rostedt @ 2008-01-03  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, William L. Irwin, Steven Rostedt

[-- Attachment #1: mcount-fix-i386-do_irq.patch --]
[-- Type: text/plain, Size: 1491 bytes --]

MCOUNT will disable the regparm parameters of the i386 compile
options. When doing so, this breaks the prototype of do_IRQ
where the fastcall must be explicitly called.

Also fixed some whitespace damage in the call to do_IRQ.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 arch/x86/kernel/irq_32.c |    2 +-
 include/asm-x86/irq_32.h |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Index: linux-compile.git/arch/x86/kernel/irq_32.c
===================================================================
--- linux-compile.git.orig/arch/x86/kernel/irq_32.c	2007-12-20 00:20:29.000000000 -0500
+++ linux-compile.git/arch/x86/kernel/irq_32.c	2007-12-20 00:21:55.000000000 -0500
@@ -67,7 +67,7 @@ static union irq_ctx *softirq_ctx[NR_CPU
  * handlers).
  */
 fastcall unsigned int do_IRQ(struct pt_regs *regs)
-{	
+{
 	struct pt_regs *old_regs;
 	/* high bit used in ret_from_ code */
 	int irq = ~regs->orig_eax;
Index: linux-compile.git/include/asm-x86/irq_32.h
===================================================================
--- linux-compile.git.orig/include/asm-x86/irq_32.h	2007-12-20 00:20:29.000000000 -0500
+++ linux-compile.git/include/asm-x86/irq_32.h	2007-12-20 00:21:55.000000000 -0500
@@ -41,7 +41,7 @@ extern int irqbalance_disable(char *str)
 extern void fixup_irqs(cpumask_t map);
 #endif
 
-unsigned int do_IRQ(struct pt_regs *regs);
+fastcall unsigned int do_IRQ(struct pt_regs *regs);
 void init_IRQ(void);
 void __init native_init_IRQ(void);
 

-- 


* [RFC PATCH 03/11] Annotate core code that should not be traced
  2008-01-03  7:16 [RFC PATCH 00/11] mcount tracing utility Steven Rostedt
  2008-01-03  7:16 ` [RFC PATCH 01/11] Add basic support for gcc profiler instrumentation Steven Rostedt
  2008-01-03  7:16 ` [RFC PATCH 02/11] Add fastcall to do_IRQ for i386 Steven Rostedt
@ 2008-01-03  7:16 ` Steven Rostedt
  2008-01-03 17:42   ` Mathieu Desnoyers
  2008-01-03  7:16 ` [RFC PATCH 04/11] i386: notrace annotations Steven Rostedt
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 40+ messages in thread
From: Steven Rostedt @ 2008-01-03  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, William L. Irwin, Steven Rostedt

[-- Attachment #1: mcount-annotate-generic-code.patch --]
[-- Type: text/plain, Size: 8237 bytes --]

Mark with "notrace" functions in core code that should not be
traced.  The "notrace" attribute will prevent gcc from adding
a call to mcount on the annotated functions.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>

---
 drivers/clocksource/acpi_pm.c |    8 ++++----
 include/linux/preempt.h       |    4 ++--
 kernel/irq/handle.c           |    2 +-
 kernel/lockdep.c              |   27 ++++++++++++++-------------
 kernel/rcupdate.c             |    2 +-
 kernel/spinlock.c             |    2 +-
 lib/smp_processor_id.c        |    2 +-
 7 files changed, 24 insertions(+), 23 deletions(-)

Index: linux-compile.git/drivers/clocksource/acpi_pm.c
===================================================================
--- linux-compile.git.orig/drivers/clocksource/acpi_pm.c	2007-12-20 01:00:29.000000000 -0500
+++ linux-compile.git/drivers/clocksource/acpi_pm.c	2007-12-20 01:00:48.000000000 -0500
@@ -30,13 +30,13 @@
  */
 u32 pmtmr_ioport __read_mostly;
 
-static inline u32 read_pmtmr(void)
+static inline notrace u32 read_pmtmr(void)
 {
 	/* mask the output to 24 bits */
 	return inl(pmtmr_ioport) & ACPI_PM_MASK;
 }
 
-u32 acpi_pm_read_verified(void)
+notrace u32 acpi_pm_read_verified(void)
 {
 	u32 v1 = 0, v2 = 0, v3 = 0;
 
@@ -56,12 +56,12 @@ u32 acpi_pm_read_verified(void)
 	return v2;
 }
 
-static cycle_t acpi_pm_read_slow(void)
+static notrace cycle_t acpi_pm_read_slow(void)
 {
 	return (cycle_t)acpi_pm_read_verified();
 }
 
-static cycle_t acpi_pm_read(void)
+static notrace cycle_t acpi_pm_read(void)
 {
 	return (cycle_t)read_pmtmr();
 }
Index: linux-compile.git/include/linux/preempt.h
===================================================================
--- linux-compile.git.orig/include/linux/preempt.h	2007-12-20 01:00:29.000000000 -0500
+++ linux-compile.git/include/linux/preempt.h	2007-12-20 01:00:48.000000000 -0500
@@ -11,8 +11,8 @@
 #include <linux/list.h>
 
 #ifdef CONFIG_DEBUG_PREEMPT
-  extern void fastcall add_preempt_count(int val);
-  extern void fastcall sub_preempt_count(int val);
+  extern notrace void fastcall add_preempt_count(int val);
+  extern notrace void fastcall sub_preempt_count(int val);
 #else
 # define add_preempt_count(val)	do { preempt_count() += (val); } while (0)
 # define sub_preempt_count(val)	do { preempt_count() -= (val); } while (0)
Index: linux-compile.git/kernel/irq/handle.c
===================================================================
--- linux-compile.git.orig/kernel/irq/handle.c	2007-12-20 01:00:29.000000000 -0500
+++ linux-compile.git/kernel/irq/handle.c	2007-12-20 01:00:48.000000000 -0500
@@ -163,7 +163,7 @@ irqreturn_t handle_IRQ_event(unsigned in
  * This is the original x86 implementation which is used for every
  * interrupt type.
  */
-fastcall unsigned int __do_IRQ(unsigned int irq)
+notrace fastcall unsigned int __do_IRQ(unsigned int irq)
 {
 	struct irq_desc *desc = irq_desc + irq;
 	struct irqaction *action;
Index: linux-compile.git/kernel/lockdep.c
===================================================================
--- linux-compile.git.orig/kernel/lockdep.c	2007-12-20 01:00:29.000000000 -0500
+++ linux-compile.git/kernel/lockdep.c	2007-12-20 01:00:48.000000000 -0500
@@ -270,14 +270,14 @@ static struct list_head chainhash_table[
 	((key1) >> (64-MAX_LOCKDEP_KEYS_BITS)) ^ \
 	(key2))
 
-void lockdep_off(void)
+notrace void lockdep_off(void)
 {
 	current->lockdep_recursion++;
 }
 
 EXPORT_SYMBOL(lockdep_off);
 
-void lockdep_on(void)
+notrace void lockdep_on(void)
 {
 	current->lockdep_recursion--;
 }
@@ -1036,7 +1036,7 @@ find_usage_forwards(struct lock_class *s
  * Return 1 otherwise and keep <backwards_match> unchanged.
  * Return 0 on error.
  */
-static noinline int
+static noinline notrace int
 find_usage_backwards(struct lock_class *source, unsigned int depth)
 {
 	struct lock_list *entry;
@@ -1586,7 +1586,7 @@ static inline int validate_chain(struct 
  * We are building curr_chain_key incrementally, so double-check
  * it from scratch, to make sure that it's done correctly:
  */
-static void check_chain_key(struct task_struct *curr)
+static notrace void check_chain_key(struct task_struct *curr)
 {
 #ifdef CONFIG_DEBUG_LOCKDEP
 	struct held_lock *hlock, *prev_hlock = NULL;
@@ -1962,7 +1962,7 @@ static int mark_lock_irq(struct task_str
 /*
  * Mark all held locks with a usage bit:
  */
-static int
+static notrace int
 mark_held_locks(struct task_struct *curr, int hardirq)
 {
 	enum lock_usage_bit usage_bit;
@@ -2009,7 +2009,7 @@ void early_boot_irqs_on(void)
 /*
  * Hardirqs will be enabled:
  */
-void trace_hardirqs_on(void)
+notrace void trace_hardirqs_on(void)
 {
 	struct task_struct *curr = current;
 	unsigned long ip;
@@ -2057,7 +2057,7 @@ EXPORT_SYMBOL(trace_hardirqs_on);
 /*
  * Hardirqs were disabled:
  */
-void trace_hardirqs_off(void)
+notrace void trace_hardirqs_off(void)
 {
 	struct task_struct *curr = current;
 
@@ -2241,8 +2241,8 @@ static inline int separate_irq_context(s
 /*
  * Mark a lock with a usage bit, and validate the state transition:
  */
-static int mark_lock(struct task_struct *curr, struct held_lock *this,
-		     enum lock_usage_bit new_bit)
+static notrace int mark_lock(struct task_struct *curr, struct held_lock *this,
+			     enum lock_usage_bit new_bit)
 {
 	unsigned int new_mask = 1 << new_bit, ret = 1;
 
@@ -2648,7 +2648,7 @@ __lock_release(struct lockdep_map *lock,
 /*
  * Check whether we follow the irq-flags state precisely:
  */
-static void check_flags(unsigned long flags)
+static notrace void check_flags(unsigned long flags)
 {
 #if defined(CONFIG_DEBUG_LOCKDEP) && defined(CONFIG_TRACE_IRQFLAGS)
 	if (!debug_locks)
@@ -2685,8 +2685,8 @@ static void check_flags(unsigned long fl
  * We are not always called with irqs disabled - do that here,
  * and also avoid lockdep recursion:
  */
-void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
-		  int trylock, int read, int check, unsigned long ip)
+notrace void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
+			  int trylock, int read, int check, unsigned long ip)
 {
 	unsigned long flags;
 
@@ -2708,7 +2708,8 @@ void lock_acquire(struct lockdep_map *lo
 
 EXPORT_SYMBOL_GPL(lock_acquire);
 
-void lock_release(struct lockdep_map *lock, int nested, unsigned long ip)
+notrace void lock_release(struct lockdep_map *lock, int nested,
+			  unsigned long ip)
 {
 	unsigned long flags;
 
Index: linux-compile.git/kernel/rcupdate.c
===================================================================
--- linux-compile.git.orig/kernel/rcupdate.c	2007-12-20 01:00:29.000000000 -0500
+++ linux-compile.git/kernel/rcupdate.c	2007-12-20 01:00:48.000000000 -0500
@@ -504,7 +504,7 @@ static int __rcu_pending(struct rcu_ctrl
  * by the current CPU, returning 1 if so.  This function is part of the
  * RCU implementation; it is -not- an exported member of the RCU API.
  */
-int rcu_pending(int cpu)
+notrace int rcu_pending(int cpu)
 {
 	return __rcu_pending(&rcu_ctrlblk, &per_cpu(rcu_data, cpu)) ||
 		__rcu_pending(&rcu_bh_ctrlblk, &per_cpu(rcu_bh_data, cpu));
Index: linux-compile.git/kernel/spinlock.c
===================================================================
--- linux-compile.git.orig/kernel/spinlock.c	2007-12-20 01:00:29.000000000 -0500
+++ linux-compile.git/kernel/spinlock.c	2007-12-20 01:00:48.000000000 -0500
@@ -437,7 +437,7 @@ int __lockfunc _spin_trylock_bh(spinlock
 }
 EXPORT_SYMBOL(_spin_trylock_bh);
 
-int in_lock_functions(unsigned long addr)
+notrace int in_lock_functions(unsigned long addr)
 {
 	/* Linker adds these: start and end of __lockfunc functions */
 	extern char __lock_text_start[], __lock_text_end[];
Index: linux-compile.git/lib/smp_processor_id.c
===================================================================
--- linux-compile.git.orig/lib/smp_processor_id.c	2007-12-20 01:00:29.000000000 -0500
+++ linux-compile.git/lib/smp_processor_id.c	2007-12-20 01:00:48.000000000 -0500
@@ -7,7 +7,7 @@
 #include <linux/kallsyms.h>
 #include <linux/sched.h>
 
-unsigned int debug_smp_processor_id(void)
+notrace unsigned int debug_smp_processor_id(void)
 {
 	unsigned long preempt_count = preempt_count();
 	int this_cpu = raw_smp_processor_id();

-- 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [RFC PATCH 04/11] i386: notrace annotations
  2008-01-03  7:16 [RFC PATCH 00/11] mcount tracing utility Steven Rostedt
                   ` (2 preceding siblings ...)
  2008-01-03  7:16 ` [RFC PATCH 03/11] Annotate core code that should not be traced Steven Rostedt
@ 2008-01-03  7:16 ` Steven Rostedt
  2008-01-03 17:52   ` Mathieu Desnoyers
  2008-01-03  7:16 ` [RFC PATCH 05/11] x86_64: " Steven Rostedt
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 40+ messages in thread
From: Steven Rostedt @ 2008-01-03  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, William L. Irwin, Steven Rostedt

[-- Attachment #1: mcount-add-i386-notrace-annotations.patch --]
[-- Type: text/plain, Size: 7525 bytes --]

From patch-2.6.21.5-rt20. Annotates functions that should not be
profiler-instrumented, i.e. where mcount should not be called at
function entry.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 arch/x86/kernel/apic_32.c  |    2 +-
 arch/x86/kernel/hpet.c     |    2 +-
 arch/x86/kernel/irq_32.c   |    2 +-
 arch/x86/kernel/nmi_32.c   |    2 +-
 arch/x86/kernel/smp_32.c   |    2 +-
 arch/x86/kernel/time_32.c  |    2 +-
 arch/x86/kernel/traps_32.c |    4 ++--
 arch/x86/kernel/tsc_32.c   |    2 +-
 arch/x86/lib/delay_32.c    |    6 +++---
 arch/x86/mm/fault_32.c     |    4 ++--
 arch/x86/mm/init_32.c      |    2 +-
 11 files changed, 15 insertions(+), 15 deletions(-)
---

Index: linux-compile.git/arch/x86/kernel/apic_32.c
===================================================================
--- linux-compile.git.orig/arch/x86/kernel/apic_32.c	2008-01-02 22:53:52.000000000 -0500
+++ linux-compile.git/arch/x86/kernel/apic_32.c	2008-01-02 22:56:41.000000000 -0500
@@ -577,7 +577,7 @@ static void local_apic_timer_interrupt(v
  *   interrupt as well. Thus we cannot inline the local irq ... ]
  */
 
-void fastcall smp_apic_timer_interrupt(struct pt_regs *regs)
+notrace fastcall void smp_apic_timer_interrupt(struct pt_regs *regs)
 {
 	struct pt_regs *old_regs = set_irq_regs(regs);
 
Index: linux-compile.git/arch/x86/kernel/hpet.c
===================================================================
--- linux-compile.git.orig/arch/x86/kernel/hpet.c	2008-01-02 22:53:52.000000000 -0500
+++ linux-compile.git/arch/x86/kernel/hpet.c	2008-01-02 22:56:41.000000000 -0500
@@ -295,7 +295,7 @@ static int hpet_legacy_next_event(unsign
 /*
  * Clock source related code
  */
-static cycle_t read_hpet(void)
+static notrace cycle_t read_hpet(void)
 {
 	return (cycle_t)hpet_readl(HPET_COUNTER);
 }
Index: linux-compile.git/arch/x86/kernel/irq_32.c
===================================================================
--- linux-compile.git.orig/arch/x86/kernel/irq_32.c	2008-01-02 22:56:34.000000000 -0500
+++ linux-compile.git/arch/x86/kernel/irq_32.c	2008-01-02 22:56:41.000000000 -0500
@@ -66,7 +66,7 @@ static union irq_ctx *softirq_ctx[NR_CPU
  * SMP cross-CPU interrupts have their own specific
  * handlers).
  */
-fastcall unsigned int do_IRQ(struct pt_regs *regs)
+notrace fastcall unsigned int do_IRQ(struct pt_regs *regs)
 {
 	struct pt_regs *old_regs;
 	/* high bit used in ret_from_ code */
Index: linux-compile.git/arch/x86/kernel/nmi_32.c
===================================================================
--- linux-compile.git.orig/arch/x86/kernel/nmi_32.c	2008-01-02 22:53:52.000000000 -0500
+++ linux-compile.git/arch/x86/kernel/nmi_32.c	2008-01-02 22:57:52.000000000 -0500
@@ -323,7 +323,7 @@ EXPORT_SYMBOL(touch_nmi_watchdog);
 
 extern void die_nmi(struct pt_regs *, const char *msg);
 
-__kprobes int nmi_watchdog_tick(struct pt_regs * regs, unsigned reason)
+notrace __kprobes int nmi_watchdog_tick(struct pt_regs *regs, unsigned reason)
 {
 
 	/*
Index: linux-compile.git/arch/x86/kernel/smp_32.c
===================================================================
--- linux-compile.git.orig/arch/x86/kernel/smp_32.c	2008-01-02 22:53:52.000000000 -0500
+++ linux-compile.git/arch/x86/kernel/smp_32.c	2008-01-02 22:56:41.000000000 -0500
@@ -638,7 +638,7 @@ static void native_smp_send_stop(void)
  * all the work is done automatically when
  * we return from the interrupt.
  */
-fastcall void smp_reschedule_interrupt(struct pt_regs *regs)
+notrace fastcall void smp_reschedule_interrupt(struct pt_regs *regs)
 {
 	ack_APIC_irq();
 	__get_cpu_var(irq_stat).irq_resched_count++;
Index: linux-compile.git/arch/x86/kernel/time_32.c
===================================================================
--- linux-compile.git.orig/arch/x86/kernel/time_32.c	2008-01-02 22:53:52.000000000 -0500
+++ linux-compile.git/arch/x86/kernel/time_32.c	2008-01-02 22:56:41.000000000 -0500
@@ -122,7 +122,7 @@ static int set_rtc_mmss(unsigned long no
 
 int timer_ack;
 
-unsigned long profile_pc(struct pt_regs *regs)
+notrace unsigned long profile_pc(struct pt_regs *regs)
 {
 	unsigned long pc = instruction_pointer(regs);
 
Index: linux-compile.git/arch/x86/kernel/traps_32.c
===================================================================
--- linux-compile.git.orig/arch/x86/kernel/traps_32.c	2008-01-02 22:53:52.000000000 -0500
+++ linux-compile.git/arch/x86/kernel/traps_32.c	2008-01-02 22:58:19.000000000 -0500
@@ -723,7 +723,7 @@ void __kprobes die_nmi(struct pt_regs *r
 	do_exit(SIGSEGV);
 }
 
-static __kprobes void default_do_nmi(struct pt_regs * regs)
+static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
 {
 	unsigned char reason = 0;
 
@@ -763,7 +763,7 @@ static __kprobes void default_do_nmi(str
 
 static int ignore_nmis;
 
-fastcall __kprobes void do_nmi(struct pt_regs * regs, long error_code)
+notrace fastcall __kprobes void do_nmi(struct pt_regs *regs, long error_code)
 {
 	int cpu;
 
Index: linux-compile.git/arch/x86/kernel/tsc_32.c
===================================================================
--- linux-compile.git.orig/arch/x86/kernel/tsc_32.c	2008-01-02 22:53:52.000000000 -0500
+++ linux-compile.git/arch/x86/kernel/tsc_32.c	2008-01-02 22:56:41.000000000 -0500
@@ -269,7 +269,7 @@ core_initcall(cpufreq_tsc);
 
 static unsigned long current_tsc_khz = 0;
 
-static cycle_t read_tsc(void)
+static notrace cycle_t read_tsc(void)
 {
 	cycle_t ret;
 
Index: linux-compile.git/arch/x86/lib/delay_32.c
===================================================================
--- linux-compile.git.orig/arch/x86/lib/delay_32.c	2008-01-02 22:53:52.000000000 -0500
+++ linux-compile.git/arch/x86/lib/delay_32.c	2008-01-02 22:56:41.000000000 -0500
@@ -24,7 +24,7 @@
 #endif
 
 /* simple loop based delay: */
-static void delay_loop(unsigned long loops)
+static notrace void delay_loop(unsigned long loops)
 {
 	int d0;
 
@@ -39,7 +39,7 @@ static void delay_loop(unsigned long loo
 }
 
 /* TSC based delay: */
-static void delay_tsc(unsigned long loops)
+static notrace void delay_tsc(unsigned long loops)
 {
 	unsigned long bclock, now;
 
@@ -72,7 +72,7 @@ int read_current_timer(unsigned long *ti
 	return -1;
 }
 
-void __delay(unsigned long loops)
+notrace void __delay(unsigned long loops)
 {
 	delay_fn(loops);
 }
Index: linux-compile.git/arch/x86/mm/fault_32.c
===================================================================
--- linux-compile.git.orig/arch/x86/mm/fault_32.c	2008-01-02 22:53:52.000000000 -0500
+++ linux-compile.git/arch/x86/mm/fault_32.c	2008-01-02 22:56:41.000000000 -0500
@@ -293,8 +293,8 @@ int show_unhandled_signals = 1;
  *	bit 3 == 1 means use of reserved bit detected
  *	bit 4 == 1 means fault was an instruction fetch
  */
-fastcall void __kprobes do_page_fault(struct pt_regs *regs,
-				      unsigned long error_code)
+notrace fastcall void __kprobes do_page_fault(struct pt_regs *regs,
+					      unsigned long error_code)
 {
 	struct task_struct *tsk;
 	struct mm_struct *mm;
Index: linux-compile.git/arch/x86/mm/init_32.c
===================================================================
--- linux-compile.git.orig/arch/x86/mm/init_32.c	2008-01-02 22:53:52.000000000 -0500
+++ linux-compile.git/arch/x86/mm/init_32.c	2008-01-02 22:56:41.000000000 -0500
@@ -200,7 +200,7 @@ static inline int page_kills_ppro(unsign
 	return 0;
 }
 
-int page_is_ram(unsigned long pagenr)
+notrace int page_is_ram(unsigned long pagenr)
 {
 	int i;
 	unsigned long addr, end;

-- 


* [RFC PATCH 05/11] x86_64: notrace annotations
  2008-01-03  7:16 [RFC PATCH 00/11] mcount tracing utility Steven Rostedt
                   ` (3 preceding siblings ...)
  2008-01-03  7:16 ` [RFC PATCH 04/11] i386: notrace annotations Steven Rostedt
@ 2008-01-03  7:16 ` Steven Rostedt
  2008-01-03  7:16 ` [RFC PATCH 06/11] add notrace annotations to vsyscall Steven Rostedt
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 40+ messages in thread
From: Steven Rostedt @ 2008-01-03  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, William L. Irwin, Steven Rostedt

[-- Attachment #1: mcount-add-x86_64-notrace-annotations.patch --]
[-- Type: text/plain, Size: 4231 bytes --]

Add "notrace" annotation to x86_64 specific files.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 arch/x86/kernel/head64.c      |    2 +-
 arch/x86/kernel/nmi_64.c      |    2 +-
 arch/x86/kernel/setup64.c     |    4 ++--
 arch/x86/kernel/smpboot_64.c  |    2 +-
 arch/x86/kernel/tsc_64.c      |    4 ++--
 arch/x86/kernel/vsyscall_64.c |    3 ++-
 6 files changed, 9 insertions(+), 8 deletions(-)

Index: linux-compile.git/arch/x86/kernel/head64.c
===================================================================
--- linux-compile.git.orig/arch/x86/kernel/head64.c	2007-12-19 21:44:52.000000000 -0500
+++ linux-compile.git/arch/x86/kernel/head64.c	2007-12-20 00:52:47.000000000 -0500
@@ -46,7 +46,7 @@ static void __init copy_bootdata(char *r
 	}
 }
 
-void __init x86_64_start_kernel(char * real_mode_data)
+notrace void __init x86_64_start_kernel(char *real_mode_data)
 {
 	int i;
 
Index: linux-compile.git/arch/x86/kernel/nmi_64.c
===================================================================
--- linux-compile.git.orig/arch/x86/kernel/nmi_64.c	2007-12-19 21:44:52.000000000 -0500
+++ linux-compile.git/arch/x86/kernel/nmi_64.c	2007-12-20 00:51:50.000000000 -0500
@@ -314,7 +314,7 @@ void touch_nmi_watchdog(void)
  	touch_softlockup_watchdog();
 }
 
-int __kprobes nmi_watchdog_tick(struct pt_regs * regs, unsigned reason)
+notrace __kprobes int nmi_watchdog_tick(struct pt_regs *regs, unsigned reason)
 {
 	int sum;
 	int touched = 0;
Index: linux-compile.git/arch/x86/kernel/setup64.c
===================================================================
--- linux-compile.git.orig/arch/x86/kernel/setup64.c	2007-12-19 21:44:52.000000000 -0500
+++ linux-compile.git/arch/x86/kernel/setup64.c	2007-12-20 00:52:32.000000000 -0500
@@ -114,7 +114,7 @@ void __init setup_per_cpu_areas(void)
 	}
 } 
 
-void pda_init(int cpu)
+notrace void pda_init(int cpu)
 { 
 	struct x8664_pda *pda = cpu_pda(cpu);
 
@@ -197,7 +197,7 @@ DEFINE_PER_CPU(struct orig_ist, orig_ist
  * 'CPU state barrier', nothing should get across.
  * A lot of state is already set up in PDA init.
  */
-void __cpuinit cpu_init (void)
+notrace void __cpuinit cpu_init(void)
 {
 	int cpu = stack_smp_processor_id();
 	struct tss_struct *t = &per_cpu(init_tss, cpu);
Index: linux-compile.git/arch/x86/kernel/smpboot_64.c
===================================================================
--- linux-compile.git.orig/arch/x86/kernel/smpboot_64.c	2007-12-19 21:44:52.000000000 -0500
+++ linux-compile.git/arch/x86/kernel/smpboot_64.c	2007-12-20 00:49:57.000000000 -0500
@@ -317,7 +317,7 @@ static inline void set_cpu_sibling_map(i
 /*
  * Setup code on secondary processor (after comming out of the trampoline)
  */
-void __cpuinit start_secondary(void)
+notrace __cpuinit void start_secondary(void)
 {
 	/*
 	 * Dont put anything before smp_callin(), SMP
Index: linux-compile.git/arch/x86/kernel/tsc_64.c
===================================================================
--- linux-compile.git.orig/arch/x86/kernel/tsc_64.c	2007-12-19 21:44:52.000000000 -0500
+++ linux-compile.git/arch/x86/kernel/tsc_64.c	2007-12-20 00:49:57.000000000 -0500
@@ -248,13 +248,13 @@ __setup("notsc", notsc_setup);
 
 
 /* clock source code: */
-static cycle_t read_tsc(void)
+static notrace cycle_t read_tsc(void)
 {
 	cycle_t ret = (cycle_t)get_cycles_sync();
 	return ret;
 }
 
-static cycle_t __vsyscall_fn vread_tsc(void)
+static notrace cycle_t __vsyscall_fn vread_tsc(void)
 {
 	cycle_t ret = (cycle_t)get_cycles_sync();
 	return ret;
Index: linux-compile.git/arch/x86/kernel/vsyscall_64.c
===================================================================
--- linux-compile.git.orig/arch/x86/kernel/vsyscall_64.c	2007-12-19 21:44:52.000000000 -0500
+++ linux-compile.git/arch/x86/kernel/vsyscall_64.c	2007-12-20 00:54:53.000000000 -0500
@@ -42,7 +42,8 @@
 #include <asm/topology.h>
 #include <asm/vgtod.h>
 
-#define __vsyscall(nr) __attribute__ ((unused,__section__(".vsyscall_" #nr)))
+#define __vsyscall(nr) \
+		__attribute__ ((unused, __section__(".vsyscall_" #nr))) notrace
 #define __syscall_clobber "r11","rcx","memory"
 #define __pa_vsymbol(x)			\
 	({unsigned long v;  		\

-- 


* [RFC PATCH 06/11] add notrace annotations to vsyscall.
  2008-01-03  7:16 [RFC PATCH 00/11] mcount tracing utility Steven Rostedt
                   ` (4 preceding siblings ...)
  2008-01-03  7:16 ` [RFC PATCH 05/11] x86_64: " Steven Rostedt
@ 2008-01-03  7:16 ` Steven Rostedt
  2008-01-03  7:16 ` [RFC PATCH 07/11] mcount based trace in the form of a header file library Steven Rostedt
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 40+ messages in thread
From: Steven Rostedt @ 2008-01-03  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, William L. Irwin, Steven Rostedt

[-- Attachment #1: mcount-add-x86-vdso-notrace-annotations.patch --]
[-- Type: text/plain, Size: 4067 bytes --]

Add the notrace annotations to some of the vsyscall functions.

Note: checkpatch errors on the define of vsyscall_fn because it thinks
   that it is a complex macro that needs parentheses. Unfortunately
   we can't put parentheses around this macro.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 arch/x86/vdso/vclock_gettime.c |   15 ++++++++-------
 arch/x86/vdso/vgetcpu.c        |    3 ++-
 include/asm-x86/vsyscall.h     |    3 ++-
 3 files changed, 12 insertions(+), 9 deletions(-)

Index: linux-compile.git/arch/x86/vdso/vclock_gettime.c
===================================================================
--- linux-compile.git.orig/arch/x86/vdso/vclock_gettime.c	2008-01-02 22:53:52.000000000 -0500
+++ linux-compile.git/arch/x86/vdso/vclock_gettime.c	2008-01-02 22:59:09.000000000 -0500
@@ -24,7 +24,7 @@
 
 #define gtod vdso_vsyscall_gtod_data
 
-static long vdso_fallback_gettime(long clock, struct timespec *ts)
+static long notrace vdso_fallback_gettime(long clock, struct timespec *ts)
 {
 	long ret;
 	asm("syscall" : "=a" (ret) :
@@ -32,7 +32,7 @@ static long vdso_fallback_gettime(long c
 	return ret;
 }
 
-static inline long vgetns(void)
+static inline long notrace vgetns(void)
 {
 	long v;
 	cycles_t (*vread)(void);
@@ -41,7 +41,7 @@ static inline long vgetns(void)
 	return (v * gtod->clock.mult) >> gtod->clock.shift;
 }
 
-static noinline int do_realtime(struct timespec *ts)
+static noinline int notrace do_realtime(struct timespec *ts)
 {
 	unsigned long seq, ns;
 	do {
@@ -55,7 +55,8 @@ static noinline int do_realtime(struct t
 }
 
 /* Copy of the version in kernel/time.c which we cannot directly access */
-static void vset_normalized_timespec(struct timespec *ts, long sec, long nsec)
+static void notrace
+vset_normalized_timespec(struct timespec *ts, long sec, long nsec)
 {
 	while (nsec >= NSEC_PER_SEC) {
 		nsec -= NSEC_PER_SEC;
@@ -69,7 +70,7 @@ static void vset_normalized_timespec(str
 	ts->tv_nsec = nsec;
 }
 
-static noinline int do_monotonic(struct timespec *ts)
+static noinline int notrace do_monotonic(struct timespec *ts)
 {
 	unsigned long seq, ns, secs;
 	do {
@@ -83,7 +84,7 @@ static noinline int do_monotonic(struct 
 	return 0;
 }
 
-int __vdso_clock_gettime(clockid_t clock, struct timespec *ts)
+int notrace __vdso_clock_gettime(clockid_t clock, struct timespec *ts)
 {
 	if (likely(gtod->sysctl_enabled && gtod->clock.vread))
 		switch (clock) {
@@ -97,7 +98,7 @@ int __vdso_clock_gettime(clockid_t clock
 int clock_gettime(clockid_t, struct timespec *)
 	__attribute__((weak, alias("__vdso_clock_gettime")));
 
-int __vdso_gettimeofday(struct timeval *tv, struct timezone *tz)
+int notrace __vdso_gettimeofday(struct timeval *tv, struct timezone *tz)
 {
 	long ret;
 	if (likely(gtod->sysctl_enabled && gtod->clock.vread)) {
Index: linux-compile.git/arch/x86/vdso/vgetcpu.c
===================================================================
--- linux-compile.git.orig/arch/x86/vdso/vgetcpu.c	2008-01-02 22:53:52.000000000 -0500
+++ linux-compile.git/arch/x86/vdso/vgetcpu.c	2008-01-02 22:59:35.000000000 -0500
@@ -13,7 +13,8 @@
 #include <asm/vgtod.h>
 #include "vextern.h"
 
-long __vdso_getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *unused)
+long notrace
+__vdso_getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *unused)
 {
 	unsigned int dummy, p;
 
Index: linux-compile.git/include/asm-x86/vsyscall.h
===================================================================
--- linux-compile.git.orig/include/asm-x86/vsyscall.h	2008-01-02 22:53:52.000000000 -0500
+++ linux-compile.git/include/asm-x86/vsyscall.h	2008-01-02 23:00:34.000000000 -0500
@@ -24,7 +24,8 @@ enum vsyscall_num {
 	((unused, __section__ (".vsyscall_gtod_data"),aligned(16)))
 #define __section_vsyscall_clock __attribute__ \
 	((unused, __section__ (".vsyscall_clock"),aligned(16)))
-#define __vsyscall_fn __attribute__ ((unused,__section__(".vsyscall_fn")))
+#define __vsyscall_fn __attribute__ \
+	((unused, __section__(".vsyscall_fn"))) notrace
 
 #define VGETCPU_RDTSCP	1
 #define VGETCPU_LSL	2

-- 


* [RFC PATCH 07/11] mcount based trace in the form of a header file library
  2008-01-03  7:16 [RFC PATCH 00/11] mcount tracing utility Steven Rostedt
                   ` (5 preceding siblings ...)
  2008-01-03  7:16 ` [RFC PATCH 06/11] add notrace annotations to vsyscall Steven Rostedt
@ 2008-01-03  7:16 ` Steven Rostedt
  2008-01-03  7:16 ` [RFC PATCH 08/11] tracer add debugfs interface Steven Rostedt
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 40+ messages in thread
From: Steven Rostedt @ 2008-01-03  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, William L. Irwin, Steven Rostedt

[-- Attachment #1: mcount-simple-tracer.patch --]
[-- Type: text/plain, Size: 6826 bytes --]

The design is for mcount-based tracers to be added through the
lib/mcount/tracer_interface.h file, just as mcount users add
themselves to lib/mcount/mcount.h. A Kconfig rule chooses the right
MCOUNT and MCOUNT_TRACER user.

This avoids function-call costs for something that is meant to be used
only in a debug kernel and that has to reduce the per-function-call
overhead of mcount-based tracing to the bare minimum.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 lib/mcount/Kconfig            |   11 +++
 lib/mcount/Makefile           |    2 
 lib/mcount/tracer.c           |  125 ++++++++++++++++++++++++++++++++++++++++++
 lib/mcount/tracer.h           |   21 +++++++
 lib/mcount/tracer_interface.h |   14 ++++
 5 files changed, 173 insertions(+)
 create mode 100644 lib/mcount/tracer.c
 create mode 100644 lib/mcount/tracer.h
 create mode 100644 lib/mcount/tracer_interface.h

Index: linux-compile.git/lib/mcount/Kconfig
===================================================================
--- linux-compile.git.orig/lib/mcount/Kconfig	2008-01-02 23:24:53.000000000 -0500
+++ linux-compile.git/lib/mcount/Kconfig	2008-01-02 23:28:06.000000000 -0500
@@ -4,3 +4,14 @@
 config MCOUNT
 	bool
 	depends on DEBUG_KERNEL
+
+config MCOUNT_TRACER
+	bool "Profiler instrumentation based tracer"
+	depends on DEBUG_KERNEL
+	default n
+	select MCOUNT
+	help
+	  Use profiler instrumentation, adding -pg to CFLAGS. This will
+	  insert a call to an architecture specific __mcount routine,
+	  that the debugging mechanism using this facility will hook by
+	  providing a set of inline routines.
Index: linux-compile.git/lib/mcount/tracer.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-compile.git/lib/mcount/tracer.c	2008-01-02 23:28:06.000000000 -0500
@@ -0,0 +1,125 @@
+/*
+ * ring buffer based mcount tracer
+ *
+ * Copyright (C) 2007 Arnaldo Carvalho de Melo <acme@redhat.com>
+ * 		      Steven Rostedt <srostedt@redhat.com>
+ *
+ * From code in the latency_tracer, that is:
+ *
+ *  Copyright (C) 2004-2006 Ingo Molnar
+ *  Copyright (C) 2004 William Lee Irwin III
+ */
+
+#include <linux/fs.h>
+#include <linux/gfp.h>
+#include <linux/init.h>
+#include <linux/linkage.h>
+#include <linux/seq_file.h>
+#include <linux/mcount.h>
+
+#include "tracer.h"
+#include "tracer_interface.h"
+
+static struct mctracer_trace mctracer_trace;
+
+static inline notrace void
+mctracer_add_trace_entry(struct mctracer_trace *tr,
+			 int cpu,
+			 const unsigned long ip,
+			 const unsigned long parent_ip)
+{
+	unsigned long idx, idx_next;
+	struct mctracer_entry *entry;
+
+	idx = tr->trace_idx[cpu];
+	idx_next = idx + 1;
+
+	if (unlikely(idx_next >= tr->entries)) {
+		atomic_inc(&tr->underrun[cpu]);
+		idx_next = 0;
+	}
+
+	tr->trace_idx[cpu] = idx_next;
+
+	if (unlikely(idx_next != 0 && atomic_read(&tr->underrun[cpu])))
+		atomic_inc(&tr->underrun[cpu]);
+
+	entry = tr->trace[cpu] + idx * MCTRACER_ENTRY_SIZE;
+	entry->idx	 = atomic_inc_return(&tr->cnt);
+	entry->ip	 = ip;
+	entry->parent_ip = parent_ip;
+}
+
+static inline notrace void trace_function(const unsigned long ip,
+					  const unsigned long parent_ip)
+{
+	unsigned long flags;
+	struct mctracer_trace *tr;
+	int cpu;
+
+	raw_local_irq_save(flags);
+	cpu = raw_smp_processor_id();
+
+	tr = &mctracer_trace;
+
+	atomic_inc(&tr->disabled[cpu]);
+	if (likely(atomic_read(&tr->disabled[cpu]) == 1))
+		mctracer_add_trace_entry(tr, cpu, ip, parent_ip);
+
+	atomic_dec(&tr->disabled[cpu]);
+
+	raw_local_irq_restore(flags);
+}
+
+
+static inline notrace int page_order(const unsigned long size)
+{
+	const unsigned long nr_pages = DIV_ROUND_UP(size, PAGE_SIZE);
+	return ilog2(roundup_pow_of_two(nr_pages));
+}
+
+static inline notrace int mctracer_alloc_buffers(void)
+{
+	const int order = page_order(MCTRACER_NR_ENTRIES * MCTRACER_ENTRY_SIZE);
+	const unsigned long size = (1UL << order) << PAGE_SHIFT;
+	struct mctracer_entry *array;
+	int i;
+
+	for_each_possible_cpu(i) {
+		array = (struct mctracer_entry *)
+			  __get_free_pages(GFP_KERNEL, order);
+		if (array == NULL) {
+			printk(KERN_ERR "mctracer: failed to allocate"
+			       " %ld bytes for trace buffer!\n", size);
+			goto free_buffers;
+		}
+		mctracer_trace.trace[i] = array;
+	}
+
+	/*
+	 * Since we allocate by orders of pages, we may be able to
+	 * round up a bit.
+	 */
+	mctracer_trace.entries = size / MCTRACER_ENTRY_SIZE;
+
+
+	pr_info("mctracer: %ld bytes allocated for %ld entries of %ld bytes\n",
+		size, MCTRACER_NR_ENTRIES, MCTRACER_ENTRY_SIZE);
+	pr_info("   actual entries %ld\n", mctracer_trace.entries);
+
+	register_mcount_function(trace_function);
+
+	return 0;
+
+ free_buffers:
+	for (i-- ; i >= 0; i--) {
+		if (mctracer_trace.trace[i]) {
+			free_pages((unsigned long)mctracer_trace.trace[i],
+				   order);
+			mctracer_trace.trace[i] = NULL;
+		}
+	}
+	return -ENOMEM;
+}
+
+device_initcall(mctracer_alloc_buffers);
Index: linux-compile.git/lib/mcount/tracer.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-compile.git/lib/mcount/tracer.h	2008-01-02 23:28:06.000000000 -0500
@@ -0,0 +1,21 @@
+#ifndef _LINUX_MCOUNT_TRACER_H
+#define _LINUX_MCOUNT_TRACER_H
+
+#include <asm/atomic.h>
+
+struct mctracer_entry {
+	unsigned long idx;
+	unsigned long ip;
+	unsigned long parent_ip;
+};
+
+struct mctracer_trace {
+	void	      *trace[NR_CPUS];
+	unsigned long trace_idx[NR_CPUS];
+	unsigned long entries;
+	atomic_t      cnt;
+	atomic_t      disabled[NR_CPUS];
+	atomic_t      underrun[NR_CPUS];
+};
+
+#endif /* _LINUX_MCOUNT_TRACER_H */
Index: linux-compile.git/lib/mcount/tracer_interface.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-compile.git/lib/mcount/tracer_interface.h	2008-01-02 23:28:06.000000000 -0500
@@ -0,0 +1,14 @@
+#ifndef _LINUX_MCTRACER_INTERFACE_H
+#define _LINUX_MCTRACER_INTERFACE_H
+
+#include "tracer.h"
+
+/*
+ * Will be at least sizeof(struct mctracer_entry), but callers can request more
+ * space for private stuff, such as a timestamp, preempt_count, etc.
+ */
+#define MCTRACER_ENTRY_SIZE sizeof(struct mctracer_entry)
+
+#define MCTRACER_NR_ENTRIES (65536UL)
+
+#endif /* _LINUX_MCTRACER_INTERFACE_H */
Index: linux-compile.git/lib/mcount/Makefile
===================================================================
--- linux-compile.git.orig/lib/mcount/Makefile	2008-01-02 23:24:53.000000000 -0500
+++ linux-compile.git/lib/mcount/Makefile	2008-01-02 23:28:06.000000000 -0500
@@ -1,3 +1,5 @@
 obj-$(CONFIG_MCOUNT) += libmcount.o
 
+obj-$(CONFIG_MCOUNT_TRACER) += tracer.o
+
 libmcount-objs := mcount.o

-- 


* [RFC PATCH 08/11] tracer add debugfs interface
  2008-01-03  7:16 [RFC PATCH 00/11] mcount tracing utility Steven Rostedt
                   ` (6 preceding siblings ...)
  2008-01-03  7:16 ` [RFC PATCH 07/11] mcount based trace in the form of a header file library Steven Rostedt
@ 2008-01-03  7:16 ` Steven Rostedt
  2008-01-03  7:16 ` [RFC PATCH 09/11] mcount tracer output file Steven Rostedt
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 40+ messages in thread
From: Steven Rostedt @ 2008-01-03  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, William L. Irwin, Steven Rostedt

[-- Attachment #1: mcount-tracer-debugfs.patch --]
[-- Type: text/plain, Size: 3643 bytes --]

This patch adds an interface into debugfs.

  /debugfs/mctracer/ctrl

Echoing 1 into the ctrl file turns the tracer on, and echoing 0
turns it off.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 lib/mcount/tracer.c |   87 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 lib/mcount/tracer.h |    1 
 2 files changed, 87 insertions(+), 1 deletion(-)

Index: linux-compile.git/lib/mcount/tracer.c
===================================================================
--- linux-compile.git.orig/lib/mcount/tracer.c	2008-01-02 23:07:23.000000000 -0500
+++ linux-compile.git/lib/mcount/tracer.c	2008-01-02 23:12:50.000000000 -0500
@@ -15,6 +15,8 @@
 #include <linux/init.h>
 #include <linux/linkage.h>
 #include <linux/seq_file.h>
+#include <linux/debugfs.h>
+#include <linux/uaccess.h>
 #include <linux/mcount.h>
 
 #include "tracer.h"
@@ -71,6 +73,89 @@ static inline notrace void trace_functio
 	raw_local_irq_restore(flags);
 }
 
+#ifdef CONFIG_DEBUG_FS
+static int mctracer_open_generic(struct inode *inode, struct file *filp)
+{
+	filp->private_data = inode->i_private;
+	return 0;
+}
+
+
+static ssize_t mctracer_ctrl_read(struct file *filp, char __user *ubuf,
+				  size_t cnt, loff_t *ppos)
+{
+	struct mctracer_trace *tr = filp->private_data;
+	char buf[16];
+	int r;
+
+	r = sprintf(buf, "%ld\n", tr->ctrl);
+	return simple_read_from_buffer(ubuf, cnt, ppos,
+				       buf, r);
+}
+
+static ssize_t mctracer_ctrl_write(struct file *filp,
+				   const char __user *ubuf,
+				   size_t cnt, loff_t *ppos)
+{
+	struct mctracer_trace *tr = filp->private_data;
+	int val;
+	char buf[16];
+
+	if (cnt > 15)
+		cnt = 15;
+
+	if (copy_from_user(&buf, ubuf, cnt))
+		return -EFAULT;
+
+	buf[cnt] = 0;
+
+	val = !!simple_strtoul(buf, NULL, 10);
+
+	if (tr->ctrl ^ val) {
+		if (val)
+			register_mcount_function(trace_function);
+		else
+			clear_mcount_function();
+		tr->ctrl = val;
+	}
+
+	filp->f_pos += cnt;
+
+	return cnt;
+}
+
+static struct file_operations mctracer_ctrl_fops = {
+	.open = mctracer_open_generic,
+	.read = mctracer_ctrl_read,
+	.write = mctracer_ctrl_write,
+};
+
+static void mctrace_init_debugfs(void)
+{
+	struct dentry *d_mctracer;
+	struct dentry *entry;
+
+	d_mctracer = debugfs_create_dir("mctracer", NULL);
+	if (!d_mctracer) {
+		pr_warning("Could not create debugfs directory mctracer\n");
+		return;
+	}
+
+	entry = debugfs_create_file("ctrl", 0644, d_mctracer,
+				    &mctracer_trace, &mctracer_ctrl_fops);
+	if (!entry)
+		pr_warning("Could not create debugfs 'ctrl' entry\n");
+}
+#else /* CONFIG_DEBUG_FS */
+static void mctrace_init_debugfs(void)
+{
+	/*
+	 * No way to turn on or off the trace function
+	 * without debugfs, so we just turn it on.
+	 */
+	register_mcount_function(trace_function);
+}
+#endif /* CONFIG_DEBUG_FS */
 
 static inline notrace int page_order(const unsigned long size)
 {
@@ -107,7 +192,7 @@ static inline notrace int mctracer_alloc
 		size, MCTRACER_NR_ENTRIES, MCTRACER_ENTRY_SIZE);
 	pr_info("   actual entries %ld\n", mctracer_trace.entries);
 
-	register_mcount_function(trace_function);
+	mctrace_init_debugfs();
 
 	return 0;
 
Index: linux-compile.git/lib/mcount/tracer.h
===================================================================
--- linux-compile.git.orig/lib/mcount/tracer.h	2008-01-02 23:04:34.000000000 -0500
+++ linux-compile.git/lib/mcount/tracer.h	2008-01-02 23:11:39.000000000 -0500
@@ -13,6 +13,7 @@ struct mctracer_trace {
 	void	      *trace[NR_CPUS];
 	unsigned long trace_idx[NR_CPUS];
 	unsigned long entries;
+	long	      ctrl;
 	atomic_t      cnt;
 	atomic_t      disabled[NR_CPUS];
 	atomic_t      underrun[NR_CPUS];

-- 


* [RFC PATCH 09/11] mcount tracer output file
  2008-01-03  7:16 [RFC PATCH 00/11] mcount tracing utility Steven Rostedt
                   ` (7 preceding siblings ...)
  2008-01-03  7:16 ` [RFC PATCH 08/11] tracer add debugfs interface Steven Rostedt
@ 2008-01-03  7:16 ` Steven Rostedt
  2008-01-03  7:16 ` [RFC PATCH 10/11] mcount tracer show task comm and pid Steven Rostedt
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 40+ messages in thread
From: Steven Rostedt @ 2008-01-03  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, William L. Irwin, Steven Rostedt

[-- Attachment #1: mcount-tracer-debugfs-show.patch --]
[-- Type: text/plain, Size: 7583 bytes --]

Add /debugfs/mctracer/trace to output the recorded trace.

Here's an example of the content.

  CPU 0:  [<ffffffff80494691>] notifier_call_chain+0x16/0x60 <-- [<ffffffff80494701>] __atomic_notifier_call_chain+0x26/0x56
  CPU 0:  [<ffffffff802161c8>] mce_idle_callback+0x9/0x2f <-- [<ffffffff804946b3>] notifier_call_chain+0x38/0x60
  CPU 0:  [<ffffffff8037fb7a>] acpi_processor_idle+0x16/0x518 <-- [<ffffffff8020aee8>] cpu_idle+0xa1/0xe7
  CPU 0:  [<ffffffff8037fa98>] acpi_safe_halt+0x9/0x43 <-- [<ffffffff8037fd3a>] acpi_processor_idle+0x1d6/0x518
  CPU 1:  [<ffffffff80221db8>] smp_apic_timer_interrupt+0xc/0x58 <-- [<ffffffff8020cf06>] apic_timer_interrupt+0x66/0x70
  CPU 1:  [<ffffffff8020ac22>] exit_idle+0x9/0x22 <-- [<ffffffff80221de1>] smp_apic_timer_interrupt+0x35/0x58
  CPU 1:  [<ffffffff8020ab97>] __exit_idle+0x9/0x2e <-- [<ffffffff8020ac39>] exit_idle+0x20/0x22
  CPU 1:  [<ffffffff8049473a>] atomic_notifier_call_chain+0x9/0x16 <-- [<ffffffff8020abba>] __exit_idle+0x2c/0x2e
  CPU 1:  [<ffffffff804946e9>] __atomic_notifier_call_chain+0xe/0x56 <-- [<ffffffff80494745>] atomic_notifier_call_chain+0x14/0x16
  CPU 1:  [<ffffffff80494691>] notifier_call_chain+0x16/0x60 <-- [<ffffffff80494701>] __atomic_notifier_call_chain+0x26/0x56
  CPU 1:  [<ffffffff802161c8>] mce_idle_callback+0x9/0x2f <-- [<ffffffff804946b3>] notifier_call_chain+0x38/0x60

When KALLSYMS is defined, the output follows this format:

  CPU <CPU#>: [<IP>] <func> <-- [<Parent-IP>] <parent-func>
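As an editorial aside, a line in this format can be split back into its fields with a small user-space helper. This is an illustrative sketch only; parse_trace_line() is hypothetical and not part of the patch:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical user-space helper: split one trace line of the
 * above form back into its fields. */
struct trace_fields {
	int cpu;
	unsigned long long ip, parent_ip;
	char func[128], parent_func[128];
};

static int parse_trace_line(const char *line, struct trace_fields *f)
{
	/* leading spaces in the format skip whitespace; %s stops at the
	 * blank before "<--", so func/parent_func capture "sym+off/size" */
	return sscanf(line, " CPU %d: [<%llx>] %127s <-- [<%llx>] %127s",
		      &f->cpu, &f->ip, f->func,
		      &f->parent_ip, f->parent_func) == 5;
}
```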

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 lib/mcount/tracer.c |  215 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 213 insertions(+), 2 deletions(-)

Index: linux-compile.git/lib/mcount/tracer.c
===================================================================
--- linux-compile.git.orig/lib/mcount/tracer.c	2008-01-02 23:12:50.000000000 -0500
+++ linux-compile.git/lib/mcount/tracer.c	2008-01-02 23:17:21.000000000 -0500
@@ -13,9 +13,11 @@
 #include <linux/fs.h>
 #include <linux/gfp.h>
 #include <linux/init.h>
+#include <linux/module.h>
 #include <linux/linkage.h>
 #include <linux/seq_file.h>
 #include <linux/debugfs.h>
+#include <linux/kallsyms.h>
 #include <linux/uaccess.h>
 #include <linux/mcount.h>
 
@@ -74,6 +76,211 @@ static inline notrace void trace_functio
 }
 
 #ifdef CONFIG_DEBUG_FS
+struct mctracer_iterator {
+	struct mctracer_trace *tr;
+	struct mctracer_entry *ent;
+	unsigned long next_idx[NR_CPUS];
+	int cpu;
+	int idx;
+};
+
+static struct mctracer_entry *mctracer_entry_idx(struct mctracer_trace *tr,
+						 unsigned long idx,
+						 int cpu)
+{
+	struct mctracer_entry *array = tr->trace[cpu];
+	unsigned long underrun;
+
+	if (idx >= tr->entries)
+		return NULL;
+
+	underrun = atomic_read(&tr->underrun[cpu]);
+	if (underrun)
+		idx = (underrun + idx) % tr->entries;
+	else if (idx >= tr->trace_idx[cpu])
+		return NULL;
+
+	return &array[idx];
+}
+
+static void *find_next_entry(struct mctracer_iterator *iter)
+{
+	struct mctracer_trace *tr = iter->tr;
+	struct mctracer_entry *ent;
+	struct mctracer_entry *next = NULL;
+	int next_i = -1;
+	int i;
+
+	for_each_possible_cpu(i) {
+		if (!tr->trace[i])
+			continue;
+		ent = mctracer_entry_idx(tr, iter->next_idx[i], i);
+		if (ent && (!next || next->idx > ent->idx)) {
+			next = ent;
+			next_i = i;
+		}
+	}
+	if (next) {
+		iter->next_idx[next_i]++;
+		iter->idx++;
+	}
+	iter->ent = next;
+	iter->cpu = next_i;
+
+	return next ? iter : NULL;
+}
+
+static void *s_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	struct mctracer_iterator *iter = m->private;
+	void *ent;
+	int i = (int)*pos;
+
+	(*pos)++;
+
+	/* can't go backwards */
+	if (iter->idx > i)
+		return NULL;
+
+	if (iter->idx < 0)
+		ent = find_next_entry(iter);
+	else
+		ent = iter->ent;
+
+	while (ent && iter->idx < i)
+		ent = find_next_entry(iter);
+
+	return ent;
+}
+
+static void *s_start(struct seq_file *m, loff_t *pos)
+{
+	struct mctracer_iterator *iter = m->private;
+	void *p = NULL;
+	loff_t l = 0;
+	int i;
+
+	iter->ent = NULL;
+	iter->cpu = 0;
+	iter->idx = -1;
+	for (i = 0; i < NR_CPUS; i++)
+		iter->next_idx[i] = 0;
+
+	/* stop the trace while dumping */
+	if (iter->tr->ctrl)
+		clear_mcount_function();
+
+	for (p = (void *)1; p && l < *pos; p = s_next(m, p, &l))
+		;
+
+	return p;
+}
+
+static void s_stop(struct seq_file *m, void *p)
+{
+	struct mctracer_iterator *iter = m->private;
+	if (iter->tr->ctrl)
+		register_mcount_function(trace_function);
+}
+
+#ifdef CONFIG_KALLSYMS
+static void seq_print_symbol(struct seq_file *m,
+			     const char *fmt, unsigned long address)
+{
+	char buffer[KSYM_SYMBOL_LEN];
+
+	sprint_symbol(buffer, address);
+	seq_printf(m, fmt, buffer);
+}
+#else
+# define seq_print_symbol(m, fmt, address) do { } while (0)
+#endif
+
+#ifndef CONFIG_64BIT
+#define seq_print_ip_sym(m, ip)			\
+do {						\
+	seq_printf(m, "[<%08lx>]", ip);		\
+	seq_print_symbol(m, " %s", ip);	\
+} while (0)
+#else
+#define seq_print_ip_sym(m, ip)			\
+do {						\
+	seq_printf(m, "[<%016lx>]", ip);	\
+	seq_print_symbol(m, " %s", ip);	\
+} while (0)
+#endif
+
+static int s_show(struct seq_file *m, void *v)
+{
+	int i = (long)(v);
+	struct mctracer_iterator *iter = v;
+
+	if (i == 1) {
+		seq_printf(m, "mctracer:\n");
+	} else {
+		if (!iter->ent) {
+			seq_printf(m, " ERROR!!!! ent is NULL!\n");
+			return -1;
+		}
+
+		seq_printf(m, "  CPU %d:  ", iter->cpu);
+		seq_print_ip_sym(m, iter->ent->ip);
+		if (iter->ent->parent_ip) {
+			seq_printf(m, " <-- ");
+			seq_print_ip_sym(m, iter->ent->parent_ip);
+		}
+		seq_printf(m, "\n");
+	}
+
+	return 0;
+}
+
+static struct seq_operations mctrace_seq_ops = {
+	.start = s_start,
+	.next = s_next,
+	.stop = s_stop,
+	.show = s_show,
+};
+
+static int mctrace_open (struct inode *inode, struct file *file)
+{
+	struct mctracer_iterator *iter;
+	int ret;
+
+	iter = kzalloc(sizeof(*iter), GFP_KERNEL);
+	if (!iter)
+		return -ENOMEM;
+
+	iter->tr = &mctracer_trace;
+
+	/* TODO stop tracer */
+	ret = seq_open(file, &mctrace_seq_ops);
+	if (!ret) {
+		struct seq_file *m = file->private_data;
+		m->private = iter;
+	} else
+		kfree(iter);
+
+	return ret;
+}
+
+int mctrace_release(struct inode *inode, struct file *file)
+{
+	struct seq_file *m = (struct seq_file *)file->private_data;
+	struct mctracer_iterator *iter = m->private;
+
+	kfree(iter);
+	seq_release(inode, file);
+	return 0;
+}
+
+static struct file_operations mctrace_fops = {
+	.open = mctrace_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = mctrace_release,
+};
+
 static int mctracer_open_generic(struct inode *inode, struct file *filp)
 {
 	filp->private_data = inode->i_private;
@@ -98,7 +305,7 @@ static ssize_t mctracer_ctrl_write(struc
 				   size_t cnt, loff_t *ppos)
 {
 	struct mctracer_trace *tr = filp->private_data;
-	int val;
+	long val;
 	char buf[16];
 
 	if (cnt > 15)
@@ -145,6 +352,11 @@ static void mctrace_init_debugfs(void)
 				    &mctracer_trace, &mctracer_ctrl_fops);
 	if (!entry)
 		pr_warning("Could not create debugfs 'ctrl' entry\n");
+
+	entry = debugfs_create_file("trace", 0444, d_mctracer,
+				    &mctracer_trace, &mctrace_fops);
+	if (!entry)
+		pr_warning("Could not create debugfs 'trace' entry\n");
 }
 #else /* CONFIG_DEBUG_FS */
 static void mctrace_init_debugfs(void)
@@ -187,7 +399,6 @@ static inline notrace int mctracer_alloc
 	 */
 	mctracer_trace.entries = size / MCTRACER_ENTRY_SIZE;
 
-
 	pr_info("mctracer: %ld bytes allocated for %ld entries of %ld bytes\n",
 		size, MCTRACER_NR_ENTRIES, MCTRACER_ENTRY_SIZE);
 	pr_info("   actual entries %ld\n", mctracer_trace.entries);

-- 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [RFC PATCH 10/11] mcount tracer show task comm and pid
  2008-01-03  7:16 [RFC PATCH 00/11] mcount tracing utility Steven Rostedt
                   ` (8 preceding siblings ...)
  2008-01-03  7:16 ` [RFC PATCH 09/11] mcount tracer output file Steven Rostedt
@ 2008-01-03  7:16 ` Steven Rostedt
  2008-01-03 17:56   ` Mathieu Desnoyers
  2008-01-03  7:16 ` [RFC PATCH 11/11] Add a symbol only trace output Steven Rostedt
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 40+ messages in thread
From: Steven Rostedt @ 2008-01-03  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, William L. Irwin, Steven Rostedt

[-- Attachment #1: mcount-tracer-add-pid-comm.patch --]
[-- Type: text/plain, Size: 2600 bytes --]

This adds the task comm and pid to the trace output, giving output
like the following:

CPU 0: sshd:2605 [<ffffffff80251858>] remove_wait_queue+0xc/0x4a <-- [<ffffffff802ad7be>] free_poll_entry+0x1e/0x2a
CPU 2: bash:2610 [<ffffffff8038c3aa>] tty_check_change+0x9/0xb6 <-- [<ffffffff8038d295>] tty_ioctl+0x59f/0xcdd
CPU 0: sshd:2605 [<ffffffff80491ec6>] _spin_lock_irqsave+0xe/0x81 <-- [<ffffffff80251863>] remove_wait_queue+0x17/0x4a
CPU 2: bash:2610 [<ffffffff8024e2f7>] find_vpid+0x9/0x24 <-- [<ffffffff8038d325>] tty_ioctl+0x62f/0xcdd
CPU 0: sshd:2605 [<ffffffff804923ec>] _spin_unlock_irqrestore+0x9/0x3a <-- [<ffffffff80251891>] remove_wait_queue+0x45/0x4a
CPU 0: sshd:2605 [<ffffffff802a18b3>] fput+0x9/0x1b <-- [<ffffffff802ad7c6>] free_poll_entry+0x26/0x2a
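A hedged user-space model of the per-entry bookkeeping this patch adds: the kernel keeps comm as a fixed, NUL-padded TASK_COMM_LEN byte field, so a fixed-size memcpy is safe and cheaper than strcpy. Struct and helper names here are illustrative, not the patch's exact code:

```c
#include <assert.h>
#include <string.h>

#define TASK_COMM_LEN 16	/* same value the kernel uses */

/* Illustrative mirror of struct mctracer_entry after this patch. */
struct entry {
	unsigned long long ip, parent_ip;
	char comm[TASK_COMM_LEN];
	int pid;
};

/* comm is always TASK_COMM_LEN bytes and NUL-padded at the source,
 * so copying the whole field needs no length computation. */
static void fill_entry(struct entry *e, const char *comm, int pid)
{
	memcpy(e->comm, comm, TASK_COMM_LEN);
	e->pid = pid;
}
```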


Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 lib/mcount/tracer.c |    6 +++++-
 lib/mcount/tracer.h |    3 +++
 2 files changed, 8 insertions(+), 1 deletion(-)

Index: linux-compile.git/lib/mcount/tracer.c
===================================================================
--- linux-compile.git.orig/lib/mcount/tracer.c	2008-01-02 23:17:21.000000000 -0500
+++ linux-compile.git/lib/mcount/tracer.c	2008-01-02 23:17:44.000000000 -0500
@@ -34,6 +34,7 @@ mctracer_add_trace_entry(struct mctracer
 {
 	unsigned long idx, idx_next;
 	struct mctracer_entry *entry;
+	struct task_struct *tsk = current;
 
 	idx = tr->trace_idx[cpu];
 	idx_next = idx + 1;
@@ -52,6 +53,8 @@ mctracer_add_trace_entry(struct mctracer
 	entry->idx	 = atomic_inc_return(&tr->cnt);
 	entry->ip	 = ip;
 	entry->parent_ip = parent_ip;
+	entry->pid	 = tsk->pid;
+	memcpy(entry->comm, tsk->comm, TASK_COMM_LEN);
 }
 
 static inline notrace void trace_function(const unsigned long ip,
@@ -223,7 +226,8 @@ static int s_show(struct seq_file *m, vo
 			return -1;
 		}
 
-		seq_printf(m, "  CPU %d:  ", iter->cpu);
+		seq_printf(m, "CPU %d: ", iter->cpu);
+		seq_printf(m, "%s:%d ", iter->ent->comm, iter->ent->pid);
 		seq_print_ip_sym(m, iter->ent->ip);
 		if (iter->ent->parent_ip) {
 			seq_printf(m, " <-- ");
Index: linux-compile.git/lib/mcount/tracer.h
===================================================================
--- linux-compile.git.orig/lib/mcount/tracer.h	2008-01-02 23:16:15.000000000 -0500
+++ linux-compile.git/lib/mcount/tracer.h	2008-01-02 23:17:44.000000000 -0500
@@ -2,11 +2,14 @@
 #define _LINUX_MCOUNT_TRACER_H
 
 #include <asm/atomic.h>
+#include <linux/sched.h>
 
 struct mctracer_entry {
 	unsigned long idx;
 	unsigned long ip;
 	unsigned long parent_ip;
+	char comm[TASK_COMM_LEN];
+	pid_t pid;
 };
 
 struct mctracer_trace {

-- 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [RFC PATCH 11/11] Add a symbol only trace output
  2008-01-03  7:16 [RFC PATCH 00/11] mcount tracing utility Steven Rostedt
                   ` (9 preceding siblings ...)
  2008-01-03  7:16 ` [RFC PATCH 10/11] mcount tracer show task comm and pid Steven Rostedt
@ 2008-01-03  7:16 ` Steven Rostedt
  2008-01-03 17:22 ` [RFC PATCH 00/11] mcount tracing utility Mathieu Desnoyers
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 40+ messages in thread
From: Steven Rostedt @ 2008-01-03  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, William L. Irwin, Steven Rostedt

[-- Attachment #1: mcount-tracer-symbol-only.patch --]
[-- Type: text/plain, Size: 7096 bytes --]

The trace output is quite verbose, printing both the IP address
(Instruction Pointer, not Internet Protocol!) and the kallsyms
symbol. So if kallsyms is configured into the kernel, another file
is created in debugfs: trace_symonly, which leaves out the IP
address.

Here's an example:

CPU 1: swapper:0 smp_apic_timer_interrupt+0xc/0x58 <-- apic_timer_interrupt+0x66/0x70
CPU 1: swapper:0 exit_idle+0x9/0x22 <-- smp_apic_timer_interrupt+0x35/0x58
CPU 0: sshd:2611 _spin_unlock+0x9/0x38 <-- __qdisc_run+0xb2/0x1a1
CPU 1: swapper:0 __exit_idle+0x9/0x2e <-- exit_idle+0x20/0x22
CPU 0: sshd:2611 _spin_lock+0xe/0x7a <-- __qdisc_run+0xba/0x1a1
CPU 1: swapper:0 atomic_notifier_call_chain+0x9/0x16 <-- __exit_idle+0x2c/0x2e
CPU 1: swapper:0 __atomic_notifier_call_chain+0xe/0x56 <-- atomic_notifier_call_chain+0x14/0x16
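The heart of the patch is a sym_only flag threaded through the print helpers, so one code path serves both files. A user-space sketch of that decision (print_ip_sym() here is a stand-in for the patch's macro; the symbol string stands in for kallsyms' sprint_symbol() result):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch of the sym_only switch: the same helper either prefixes
 * the raw address or emits only the resolved symbol. */
static void print_ip_sym(char *out, size_t len, unsigned long long ip,
			 const char *sym, int sym_only)
{
	if (sym_only)
		snprintf(out, len, "%s", sym);
	else
		snprintf(out, len, "[<%016llx>] %s", ip, sym);
}
```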


Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 lib/mcount/tracer.c |  161 ++++++++++++++++++++++++++++++++++------------------
 1 file changed, 106 insertions(+), 55 deletions(-)

Index: linux-compile.git/lib/mcount/tracer.c
===================================================================
--- linux-compile.git.orig/lib/mcount/tracer.c	2008-01-03 00:29:31.000000000 -0500
+++ linux-compile.git/lib/mcount/tracer.c	2008-01-03 00:37:40.000000000 -0500
@@ -85,6 +85,7 @@ struct mctracer_iterator {
 	unsigned long next_idx[NR_CPUS];
 	int cpu;
 	int idx;
+	int sym_only;
 };
 
 static struct mctracer_entry *mctracer_entry_idx(struct mctracer_trace *tr,
@@ -156,7 +157,7 @@ static void *s_next(struct seq_file *m, 
 	return ent;
 }
 
-static void *s_start(struct seq_file *m, loff_t *pos)
+static void *__s_start(struct seq_file *m, loff_t *pos, int sym_only)
 {
 	struct mctracer_iterator *iter = m->private;
 	void *p = NULL;
@@ -166,6 +167,8 @@ static void *s_start(struct seq_file *m,
 	iter->ent = NULL;
 	iter->cpu = 0;
 	iter->idx = -1;
+	iter->sym_only = sym_only;
+
 	for (i = 0; i < NR_CPUS; i++)
 		iter->next_idx[i] = 0;
 
@@ -179,6 +182,11 @@ static void *s_start(struct seq_file *m,
 	return p;
 }
 
+static void *s_start(struct seq_file *m, loff_t *pos)
+{
+	return __s_start(m, pos, 0);
+}
+
 static void s_stop(struct seq_file *m, void *p)
 {
 	struct mctracer_iterator *iter = m->private;
@@ -186,58 +194,7 @@ static void s_stop(struct seq_file *m, v
 		register_mcount_function(trace_function);
 }
 
-#ifdef CONFIG_KALLSYMS
-static void seq_print_symbol(struct seq_file *m,
-			     const char *fmt, unsigned long address)
-{
-	char buffer[KSYM_SYMBOL_LEN];
-
-	sprint_symbol(buffer, address);
-	seq_printf(m, fmt, buffer);
-}
-#else
-# define seq_print_symbol(m, fmt, address) do { } while (0)
-#endif
-
-#ifndef CONFIG_64BIT
-#define seq_print_ip_sym(m, ip)			\
-do {						\
-	seq_printf(m, "[<%08lx>]", ip);		\
-	seq_print_symbol(m, " %s", ip);	\
-} while (0)
-#else
-#define seq_print_ip_sym(m, ip)			\
-do {						\
-	seq_printf(m, "[<%016lx>]", ip);	\
-	seq_print_symbol(m, " %s", ip);	\
-} while (0)
-#endif
-
-static int s_show(struct seq_file *m, void *v)
-{
-	int i = (long)(v);
-	struct mctracer_iterator *iter = v;
-
-	if (i == 1) {
-		seq_printf(m, "mctracer:\n");
-	} else {
-		if (!iter->ent) {
-			seq_printf(m, " ERROR!!!! ent is NULL!\n");
-			return -1;
-		}
-
-		seq_printf(m, "CPU %d: ", iter->cpu);
-		seq_printf(m, "%s:%d ", iter->ent->comm, iter->ent->pid);
-		seq_print_ip_sym(m, iter->ent->ip);
-		if (iter->ent->parent_ip) {
-			seq_printf(m, " <-- ");
-			seq_print_ip_sym(m, iter->ent->parent_ip);
-		}
-		seq_printf(m, "\n");
-	}
-
-	return 0;
-}
+static int s_show(struct seq_file *m, void *v);
 
 static struct seq_operations mctrace_seq_ops = {
 	.start = s_start,
@@ -246,7 +203,8 @@ static struct seq_operations mctrace_seq
 	.show = s_show,
 };
 
-static int mctrace_open (struct inode *inode, struct file *file)
+static int __mctrace_open(struct inode *inode, struct file *file,
+			  struct seq_operations *seq_ops)
 {
 	struct mctracer_iterator *iter;
 	int ret;
@@ -258,7 +216,7 @@ static int mctrace_open (struct inode *i
 	iter->tr = &mctracer_trace;
 
 	/* TODO stop tracer */
-	ret = seq_open(file, &mctrace_seq_ops);
+	ret = seq_open(file, seq_ops);
 	if (!ret) {
 		struct seq_file *m = file->private_data;
 		m->private = iter;
@@ -268,6 +226,11 @@ static int mctrace_open (struct inode *i
 	return ret;
 }
 
+static int mctrace_open(struct inode *inode, struct file *file)
+{
+	return __mctrace_open(inode, file, &mctrace_seq_ops);
+}
+
 int mctrace_release(struct inode *inode, struct file *file)
 {
 	struct seq_file *m = (struct seq_file *)file->private_data;
@@ -278,6 +241,87 @@ int mctrace_release(struct inode *inode,
 	return 0;
 }
 
+#ifndef CONFIG_64BIT
+#define seq_print_ip_sym(m, ip, sym_only)		\
+do {							\
+	if (!sym_only)					\
+		seq_printf(m, "[<%08lx>] ", ip);	\
+	seq_print_symbol(m, "%s", ip);			\
+} while (0)
+#else
+#define seq_print_ip_sym(m, ip, sym_only)		\
+do {							\
+	if (!sym_only)					\
+		seq_printf(m, "[<%016lx>] ", ip);	\
+	seq_print_symbol(m, "%s", ip);			\
+} while (0)
+#endif
+
+#ifdef CONFIG_KALLSYMS
+static void seq_print_symbol(struct seq_file *m,
+			     const char *fmt, unsigned long address)
+{
+	char buffer[KSYM_SYMBOL_LEN];
+
+	sprint_symbol(buffer, address);
+	seq_printf(m, fmt, buffer);
+}
+
+static void *s_start_sym_only(struct seq_file *m, loff_t *pos)
+{
+	return __s_start(m, pos, 1);
+}
+
+static struct seq_operations mctrace_sym_only_seq_ops = {
+	.start = s_start_sym_only,
+	.next = s_next,
+	.stop = s_stop,
+	.show = s_show,
+};
+
+static int mctrace_open_sym_only(struct inode *inode, struct file *file)
+{
+	return __mctrace_open(inode, file, &mctrace_sym_only_seq_ops);
+}
+
+static struct file_operations mctrace_sym_only_fops = {
+	.open = mctrace_open_sym_only,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = mctrace_release,
+};
+
+#else
+# define seq_print_symbol(m, fmt, address) do { } while (0)
+#endif
+
+static int s_show(struct seq_file *m, void *v)
+{
+	int i = (long)(v);
+	struct mctracer_iterator *iter = v;
+
+	if (i == 1) {
+		seq_printf(m, "mctracer:\n");
+	} else {
+		if (!iter->ent) {
+			seq_printf(m, " ERROR!!!! ent is NULL!\n");
+			return -1;
+		}
+
+		seq_printf(m, "CPU %d: ", iter->cpu);
+		seq_printf(m, "%s:%d ", iter->ent->comm, iter->ent->pid);
+		seq_print_ip_sym(m, iter->ent->ip, iter->sym_only);
+		if (iter->ent->parent_ip) {
+			seq_printf(m, " <-- ");
+			seq_print_ip_sym(m, iter->ent->parent_ip,
+					 iter->sym_only);
+		}
+		seq_printf(m, "\n");
+	}
+
+	return 0;
+}
+
 static struct file_operations mctrace_fops = {
 	.open = mctrace_open,
 	.read = seq_read,
@@ -361,6 +405,13 @@ static void mctrace_init_debugfs(void)
 				    &mctracer_trace, &mctrace_fops);
 	if (!entry)
 		pr_warning("Could not create debugfs 'trace' entry\n");
+
+#ifdef CONFIG_KALLSYMS
+	entry = debugfs_create_file("trace_symonly", 0444, d_mctracer,
+				    &mctracer_trace, &mctrace_sym_only_fops);
+	if (!entry)
+		pr_warning("Could not create debugfs 'trace_symonly' entry\n");
+#endif
 }
 #else /* CONFIG_DEBUG_FS */
 static void mctrace_init_debugfs(void)

-- 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 01/11] Add basic support for gcc profiler instrumentation
  2008-01-03  7:16 ` [RFC PATCH 01/11] Add basic support for gcc profiler instrumentation Steven Rostedt
@ 2008-01-03  8:31   ` Sam Ravnborg
  2008-01-03 14:03     ` Steven Rostedt
  2008-01-03  9:21   ` Ingo Molnar
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 40+ messages in thread
From: Sam Ravnborg @ 2008-01-03  8:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, William L. Irwin, Steven Rostedt

Hi Steven.

On Thu, Jan 03, 2008 at 02:16:10AM -0500, Steven Rostedt wrote:
> If CONFIG_MCOUNT is selected and /proc/sys/kernel/mcount_enabled is set to a
> non-zero value the mcount routine will be called everytime we enter a kernel
> function that is not marked with the "notrace" attribute.
> 
> The mcount routine will then call a registered function if a function
> happens to be registered.
> 
> [This code has been highly hacked by Steven Rostedt, so don't
>  blame Arnaldo for all of this ;-) ]
> 
> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
> ---
> 
> Index: linux-compile.git/Documentation/stable_api_nonsense.txt
> ===================================================================
> --- linux-compile.git.orig/Documentation/stable_api_nonsense.txt	2008-01-03 01:02:28.000000000 -0500
> +++ linux-compile.git/Documentation/stable_api_nonsense.txt	2008-01-03 01:02:33.000000000 -0500
> @@ -62,6 +62,9 @@ consider the following facts about the L
>        - different structures can contain different fields
>        - Some functions may not be implemented at all, (i.e. some locks
>  	compile away to nothing for non-SMP builds.)
> +      - Parameter passing of variables from function to function can be
> +	done in different ways (the CONFIG_REGPARM option controls
> +	this.)

As CONFIG_REGPARM affects calling convention we should add it to the
list of symbols checked when loading modules (vermagic.h).
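A sketch of what Sam is suggesting: the kernel builds its vermagic string by concatenating per-option tags, and a REGPARM tag would slot in the same way. The macro names and version string below are illustrative, not the literal contents of include/linux/vermagic.h:

```c
#include <assert.h>
#include <string.h>

#define CONFIG_REGPARM 1	/* assumed set, for illustration */

/* Each config option that changes the ABI contributes a tag;
 * modules built with a different string are refused at load time. */
#ifdef CONFIG_REGPARM
#define MODULE_VERMAGIC_REGPARM "REGPARM "
#else
#define MODULE_VERMAGIC_REGPARM ""
#endif

#define VERMAGIC_STRING "2.6.24-rc6 SMP " MODULE_VERMAGIC_REGPARM
```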


>        - Memory within the kernel can be aligned in different ways,
>  	depending on the build options.
>    - Linux runs on a wide range of different processor architectures.
> Index: linux-compile.git/Makefile
> ===================================================================
> --- linux-compile.git.orig/Makefile	2008-01-03 01:02:28.000000000 -0500
> +++ linux-compile.git/Makefile	2008-01-03 01:02:39.000000000 -0500
> @@ -509,11 +509,15 @@ endif
>  
>  include $(srctree)/arch/$(SRCARCH)/Makefile
>  
> +ifdef CONFIG_MCOUNT
> +KBUILD_CFLAGS	+= -pg -fno-omit-frame-pointer -fno-optimize-sibling-calls
> +else
>  ifdef CONFIG_FRAME_POINTER
>  KBUILD_CFLAGS	+= -fno-omit-frame-pointer -fno-optimize-sibling-calls
>  else
>  KBUILD_CFLAGS	+= -fomit-frame-pointer
>  endif
> +endif
Could we please move these relations to Kconfig, so we do not end up in a situation
where CONFIG_FRAME_POINTER is set but the flag is not added.



>  
>  ifdef CONFIG_DEBUG_INFO
>  KBUILD_CFLAGS	+= -g
> Index: linux-compile.git/arch/x86/Kconfig
> ===================================================================
> --- linux-compile.git.orig/arch/x86/Kconfig	2008-01-03 01:02:28.000000000 -0500
> +++ linux-compile.git/arch/x86/Kconfig	2008-01-03 01:02:33.000000000 -0500
> @@ -28,6 +28,12 @@ config GENERIC_CMOS_UPDATE
>  	bool
>  	default y
>  
> +# function tracing might turn this off:
> +config REGPARM
> +	bool
> +	depends on !MCOUNT
> +	default y
> +

Could we please define config REGPARM in _one_ Kconfig file
and let those who want it select it.
If you consider this x86 specific, this should be included in the naming
as in CONFIG_X86_REGPARM - and then the above is OK.

Defining the same config variable in several files is not good (but done too often these days).

> Index: linux-compile.git/lib/mcount/Makefile
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux-compile.git/lib/mcount/Makefile	2008-01-03 01:02:33.000000000 -0500
> @@ -0,0 +1,3 @@
> +obj-$(CONFIG_MCOUNT) += libmcount.o
> +
> +libmcount-objs := mcount.o

Preferred syntax is now:
libmcount-y := mcount.o

	Sam

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 01/11] Add basic support for gcc profiler instrumentation
  2008-01-03  7:16 ` [RFC PATCH 01/11] Add basic support for gcc profiler instrumentation Steven Rostedt
  2008-01-03  8:31   ` Sam Ravnborg
@ 2008-01-03  9:21   ` Ingo Molnar
  2008-01-03 13:58     ` Steven Rostedt
  2008-01-03 16:01   ` Daniel Walker
  2008-01-03 17:35   ` Mathieu Desnoyers
  3 siblings, 1 reply; 40+ messages in thread
From: Ingo Molnar @ 2008-01-03  9:21 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, William L. Irwin, Steven Rostedt


* Steven Rostedt <rostedt@goodmis.org> wrote:

> +# function tracing might turn this off:
> +config REGPARM
> +	bool
> +	depends on !MCOUNT
> +	default y

are you sure -pg really needs this? I just carried this along the years 
and went the path of least resistence, but we should not be 
reintroducing the !REGPARM build mode for the kernel. I'd not be 
surprised if there were a few issues with REGPARM + mcount, but we have 
to figure it out before merging ...

	Ingo

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 01/11] Add basic support for gcc profiler instrumentation
  2008-01-03  9:21   ` Ingo Molnar
@ 2008-01-03 13:58     ` Steven Rostedt
  2008-01-03 18:16       ` Chris Wright
  0 siblings, 1 reply; 40+ messages in thread
From: Steven Rostedt @ 2008-01-03 13:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, William L. Irwin, Steven Rostedt,
	Chris Wright, Rusty Russell, virtualization


[Added Chris Wright, Rusty and Virt list because they were involved with
this issue before]

On Thu, 3 Jan 2008, Ingo Molnar wrote:

>
> * Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > +# function tracing might turn this off:
> > +config REGPARM
> > +	bool
> > +	depends on !MCOUNT
> > +	default y
>
> are you sure -pg really needs this?

Nope! Arnaldo and I only carried it because you had it ;-)

> I just carried this along the years
> and went the path of least resistence, but we should not be
> reintroducing the !REGPARM build mode for the kernel. I'd not be
> surprised if there were a few issues with REGPARM + mcount, but we have
> to figure it out before merging ...

Hmm, I know paravirt-ops had an issue with mcount in the RT tree. I can't
remember the exact issues, but it did have something to do with the way
parameters were passed in.

Chris, do you remember what the issues were?

I'm also thinking that this is only an i386 issue.

Thanks,

-- Steve


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 01/11] Add basic support for gcc profiler instrumentation
  2008-01-03  8:31   ` Sam Ravnborg
@ 2008-01-03 14:03     ` Steven Rostedt
  0 siblings, 0 replies; 40+ messages in thread
From: Steven Rostedt @ 2008-01-03 14:03 UTC (permalink / raw)
  To: Sam Ravnborg
  Cc: LKML, Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, William L. Irwin, Steven Rostedt


Hi Sam!

On Thu, 3 Jan 2008, Sam Ravnborg wrote:
> > ---
> >
> > Index: linux-compile.git/Documentation/stable_api_nonsense.txt
> > ===================================================================
> > --- linux-compile.git.orig/Documentation/stable_api_nonsense.txt	2008-01-03 01:02:28.000000000 -0500
> > +++ linux-compile.git/Documentation/stable_api_nonsense.txt	2008-01-03 01:02:33.000000000 -0500
> > @@ -62,6 +62,9 @@ consider the following facts about the L
> >        - different structures can contain different fields
> >        - Some functions may not be implemented at all, (i.e. some locks
> >  	compile away to nothing for non-SMP builds.)
> > +      - Parameter passing of variables from function to function can be
> > +	done in different ways (the CONFIG_REGPARM option controls
> > +	this.)
>
> As CONFIG_REGPARM affects calling convention we should add it to the
> list of symbols checked when loading modules (vermagic.h).

Good point, thanks for mentioning this.

> > Index: linux-compile.git/Makefile
> > ===================================================================
> > --- linux-compile.git.orig/Makefile	2008-01-03 01:02:28.000000000 -0500
> > +++ linux-compile.git/Makefile	2008-01-03 01:02:39.000000000 -0500
> > @@ -509,11 +509,15 @@ endif
> >
> >  include $(srctree)/arch/$(SRCARCH)/Makefile
> >
> > +ifdef CONFIG_MCOUNT
> > +KBUILD_CFLAGS	+= -pg -fno-omit-frame-pointer -fno-optimize-sibling-calls
> > +else
> >  ifdef CONFIG_FRAME_POINTER
> >  KBUILD_CFLAGS	+= -fno-omit-frame-pointer -fno-optimize-sibling-calls
> >  else
> >  KBUILD_CFLAGS	+= -fomit-frame-pointer
> >  endif
> > +endif
> Could we please move these relations to Kconfig, so we do not end up in a situation
> where CONFIG_FRAME_POINTER is set but the flag is not added.

Yes, most definitely. I thought this part was a bit sloppy, but it
"worked".  My next series will clean this up.

>
>
> >
> >  ifdef CONFIG_DEBUG_INFO
> >  KBUILD_CFLAGS	+= -g
> > Index: linux-compile.git/arch/x86/Kconfig
> > ===================================================================
> > --- linux-compile.git.orig/arch/x86/Kconfig	2008-01-03 01:02:28.000000000 -0500
> > +++ linux-compile.git/arch/x86/Kconfig	2008-01-03 01:02:33.000000000 -0500
> > @@ -28,6 +28,12 @@ config GENERIC_CMOS_UPDATE
> >  	bool
> >  	default y
> >
> > +# function tracing might turn this off:
> > +config REGPARM
> > +	bool
> > +	depends on !MCOUNT
> > +	default y
> > +
>
> Could we please define config REGPARM in _one_ Kconfig file
> and let those who want it select it.
> If you consider this x86 specific, this should be included in the naming
> as in CONFIG_X86_REGPARM - and then the above is OK.

This is pending on resolving what effects REGPARM really has on mcount.
But, I'll keep REGPARM until it's solved, and in the meantime I'll clean
it up as you asked.


>
> Defining the same config variable in several files is not good (but done too often these days).
>
> > Index: linux-compile.git/lib/mcount/Makefile
> > ===================================================================
> > --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> > +++ linux-compile.git/lib/mcount/Makefile	2008-01-03 01:02:33.000000000 -0500
> > @@ -0,0 +1,3 @@
> > +obj-$(CONFIG_MCOUNT) += libmcount.o
> > +
> > +libmcount-objs := mcount.o
>
> Preferred syntax is now:
> libmcount-y := mcount.o

Ah! I learn something new every day :-)  Yeah, I stumbled over a few
updates in formats with the makefiles.

Thanks!

-- Steve


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 01/11] Add basic support for gcc profiler instrumentation
  2008-01-03  7:16 ` [RFC PATCH 01/11] Add basic support for gcc profiler instrumentation Steven Rostedt
  2008-01-03  8:31   ` Sam Ravnborg
  2008-01-03  9:21   ` Ingo Molnar
@ 2008-01-03 16:01   ` Daniel Walker
  2008-01-03 17:35   ` Mathieu Desnoyers
  3 siblings, 0 replies; 40+ messages in thread
From: Daniel Walker @ 2008-01-03 16:01 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, William L. Irwin, Steven Rostedt


On Thu, 2008-01-03 at 02:16 -0500, Steven Rostedt wrote:
> Index: linux-compile.git/Makefile
> ===================================================================
> --- linux-compile.git.orig/Makefile     2008-01-03 01:02:28.000000000 -0500
> +++ linux-compile.git/Makefile  2008-01-03 01:02:39.000000000 -0500
> @@ -509,11 +509,15 @@ endif
>  
>  include $(srctree)/arch/$(SRCARCH)/Makefile
>  
> +ifdef CONFIG_MCOUNT
> +KBUILD_CFLAGS  += -pg -fno-omit-frame-pointer -fno-optimize-sibling-calls
> +else
>  ifdef CONFIG_FRAME_POINTER
>  KBUILD_CFLAGS  += -fno-omit-frame-pointer -fno-optimize-sibling-calls
>  else
>  KBUILD_CFLAGS  += -fomit-frame-pointer
>  endif
> +endif
>  
>  ifdef CONFIG_DEBUG_INFO
>  KBUILD_CFLAGS  += -g

I'd much prefer if you used -finstrument-functions since it's already
architecture independent. It was suggested not too long ago. There is
also another tracing patchset that is similar to this one which uses it
(KFT).
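For context on the alternative Daniel mentions: with -finstrument-functions, gcc emits a call to two well-known hooks on every function entry and exit, on all architectures. A minimal sketch of those hooks (the hooks themselves must be excluded from instrumentation; here they are exercised directly rather than via an instrumented build):

```c
#include <assert.h>

static int entries;

/* gcc calls this on entry to every instrumented function. */
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
	(void)this_fn;
	(void)call_site;
	entries++;
}

/* ...and this on every exit. */
__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
	(void)this_fn;
	(void)call_site;
}
```

Unlike -pg/mcount, which hooks only function entry, this instruments both entry and exit, which is why function-duration tracers such as KFT build on it.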

Daniel


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 00/11] mcount tracing utility
  2008-01-03  7:16 [RFC PATCH 00/11] mcount tracing utility Steven Rostedt
                   ` (10 preceding siblings ...)
  2008-01-03  7:16 ` [RFC PATCH 11/11] Add a symbol only trace output Steven Rostedt
@ 2008-01-03 17:22 ` Mathieu Desnoyers
  2008-01-03 17:42   ` Steven Rostedt
  2008-01-03 18:05 ` Andi Kleen
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 40+ messages in thread
From: Mathieu Desnoyers @ 2008-01-03 17:22 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Gregory Haskins, Arnaldo Carvalho de Melo,
	William L. Irwin, Tim Bird

Hi Steven,

Great work!

(added Tim Bird, author of KFT/KFI to the CC list)

* Steven Rostedt (rostedt@goodmis.org) wrote:
>
...
> 
> Future:
> -------
> The way the mcount hook is done here, other utilities can easily add their
> own functions. Care just needs to be taken not to call anything that is not
> marked with notrace, or you will crash the box with recursion. But
> even the simple tracer adds a "disabled" feature so that if it happens
> to call something that is not marked with notrace, there is a safety net
> to keep from killing the box.
> 
> I was originally going to use the relay system to record the data, but
> that had a chance of calling functions not marked with notrace. But if,
> for example, LTTng wanted to use this, it could disable tracing on a CPU
> when doing the calls, and this will protect from recursion.
> 
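
The "disabled" safety net described in the quoted cover letter can be modeled in
plain user-space C. This is only a sketch of the shape of the net, not the patch's
actual code: the names (trace_disabled, record_event, traced_function) are
illustrative, and in the kernel the guard would be a per-CPU variable set with
irq-safe operations.

```c
#include <assert.h>

/* Illustrative sketch of the "disabled" safety net: a guard flag
 * (per-CPU in the real kernel, a single int here) breaks recursion
 * if the trace hook ever calls a traced function itself. */
static int trace_disabled;
static int events_recorded;

static void record_event(unsigned long ip, unsigned long parent_ip);

/* Stand-in for a traced kernel function: every call enters the hook. */
static void traced_function(void)
{
	record_event(1, 2);
}

static void record_event(unsigned long ip, unsigned long parent_ip)
{
	(void)ip; (void)parent_ip;
	if (trace_disabled)	/* re-entered: bail out instead of recursing */
		return;
	trace_disabled = 1;
	events_recorded++;
	traced_function();	/* oops: the hook calls a traced function... */
	trace_disabled = 0;	/* ...but the guard kept it from recursing */
}
```

Without the guard, record_event() and traced_function() would call each other
until the stack overflowed; with it, the nested entry returns immediately.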

Yes, I'd love to add this information source to LTTng. It simply boils
down to adding a "notrace" flag to LTTng tracing functions. Since I
don't use relay code _at all_ in the tracing path, there is no problem
with this (I only disable preemption and do "local" atomic operations on
per-cpu variables). Then I would have to do the glue code that registers
the LTTng handler to your mcount mechanism.
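
The glue described above would be little more than a notrace handler passed to
the patch's register_mcount_function(). Below is a user-space model of that
registration; the mcount_func_t typedef and the register/clear functions mirror
the quoted linux/mcount.h and mcount.c, while lttng_handler and its counter are
hypothetical names for illustration.

```c
#include <assert.h>

/* Model of the registration API from the quoted patch: a single
 * function pointer that is never NULL, defaulting to a dummy. */
typedef void (*mcount_func_t)(unsigned long ip, unsigned long parent_ip);

static void dummy_mcount_tracer(unsigned long ip, unsigned long parent_ip)
{
	(void)ip; (void)parent_ip;	/* do nothing */
}

static mcount_func_t mcount_trace_function = dummy_mcount_tracer;

static int register_mcount_function(mcount_func_t func)
{
	mcount_trace_function = func;
	return 0;
}

static void clear_mcount_function(void)
{
	/* reset to the dummy, never to NULL, to avoid a NULL-call race */
	mcount_trace_function = dummy_mcount_tracer;
}

/* Hypothetical tracer-side hook: in the kernel it (and everything it
 * calls) would have to be marked notrace. */
static unsigned long lttng_events;

static void lttng_handler(unsigned long ip, unsigned long parent_ip)
{
	(void)ip; (void)parent_ip;
	lttng_events++;		/* would write to a per-cpu buffer */
}
```

After register_mcount_function(lttng_handler), every call through
mcount_trace_function reaches the handler; clear_mcount_function() falls back
to the dummy without ever exposing a NULL pointer.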

One interesting aspect of LTTng is that it would be very lightweight.
You seem to use interrupt disabling with your simple tracer and do a
_lot_ of cacheline bouncing (trace_idx[NR_CPUS] is a very good example).
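
The cacheline-bouncing complaint about trace_idx[NR_CPUS] is about false
sharing: adjacent per-CPU counters packed into one array land in the same cache
line, which then ping-pongs between CPUs on every write. A hedged sketch of the
usual fix, padding each slot out to a cache line (the 64-byte line size and the
struct name are assumptions, not from the patches):

```c
#include <assert.h>

#define NR_CPUS		8
#define CACHE_LINE_SIZE	64	/* assumed; x86 L1 lines are typically 64 bytes */

/* Packed layout: trace_idx[0] and trace_idx[1] share a cache line,
 * so CPUs 0 and 1 bounce that line between them on every event. */
static unsigned long trace_idx[NR_CPUS];

/* Padded layout: each CPU's index owns a whole cache line. */
struct padded_idx {
	unsigned long idx;
	char pad[CACHE_LINE_SIZE - sizeof(unsigned long)];
};

static struct padded_idx per_cpu_idx[NR_CPUS];
```

With the padded layout, two CPUs incrementing their own index never touch the
same cache line, at the cost of a little memory per CPU.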

LTTng would write the information to a per-cpu memory buffer in binary
format. I see that it would be especially useful in flight recorder
mode, where we overwrite the buffers without writing them to disk: when
a problematic condition is reached (a kernel oops would be a good one),
then we just stop tracing and dump the last buffers to disk. In this
case, we would have the last function calls that led to an OOPS.
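
Flight-recorder mode as described boils down to a per-CPU ring buffer that
silently overwrites: nothing is written out until a problem hits, and then the
last few entries are dumped. A small self-contained sketch of that overwrite
behavior (the size and names are illustrative; real buffers would be pages of
binary event records, not single words):

```c
#include <assert.h>

#define RING_SIZE 4	/* tiny for illustration; real buffers are much larger */

/* One CPU's flight-recorder buffer: writes wrap and overwrite the
 * oldest entries, so it always holds the last RING_SIZE events. */
struct flight_ring {
	unsigned long entries[RING_SIZE];
	unsigned long head;	/* total events ever recorded */
};

static void ring_record(struct flight_ring *r, unsigned long ip)
{
	r->entries[r->head % RING_SIZE] = ip;
	r->head++;
}

/* On an oops: fetch the i-th most recent entry (i = 0 is newest).
 * Only valid for i < RING_SIZE and i < head. */
static unsigned long ring_recent(const struct flight_ring *r, unsigned long i)
{
	return r->entries[(r->head - 1 - i) % RING_SIZE];
}
```

Recording ten events into a four-entry ring leaves exactly the last four, which
is the "last function calls that led to an OOPS" property described above.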

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 01/11] Add basic support for gcc profiler instrumentation
  2008-01-03  7:16 ` [RFC PATCH 01/11] Add basic support for gcc profiler instrumentation Steven Rostedt
                     ` (2 preceding siblings ...)
  2008-01-03 16:01   ` Daniel Walker
@ 2008-01-03 17:35   ` Mathieu Desnoyers
  2008-01-03 17:55     ` Steven Rostedt
  3 siblings, 1 reply; 40+ messages in thread
From: Mathieu Desnoyers @ 2008-01-03 17:35 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Gregory Haskins, Arnaldo Carvalho de Melo,
	William L. Irwin, Steven Rostedt

* Steven Rostedt (rostedt@goodmis.org) wrote:
...
> Index: linux-compile.git/arch/x86/Kconfig
> ===================================================================
> --- linux-compile.git.orig/arch/x86/Kconfig	2008-01-03 01:02:28.000000000 -0500
> +++ linux-compile.git/arch/x86/Kconfig	2008-01-03 01:02:33.000000000 -0500
> @@ -28,6 +28,12 @@ config GENERIC_CMOS_UPDATE
>  	bool
>  	default y
>  
> +# function tracing might turn this off:
> +config REGPARM
> +	bool
> +	depends on !MCOUNT
> +	default y
> +
>  config CLOCKSOURCE_WATCHDOG
>  	bool
>  	default y
....
> Index: linux-compile.git/arch/x86/kernel/mcount-wrapper.S
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux-compile.git/arch/x86/kernel/mcount-wrapper.S	2008-01-03 01:02:33.000000000 -0500
> @@ -0,0 +1,25 @@
> +/*
> + *  linux/arch/x86/mcount-wrapper.S
> + *
> + *  Copyright (C) 2004 Ingo Molnar
> + */
> +
> +.globl mcount
> +mcount:
> +	cmpl $0, mcount_enabled
> +	jz out
> +
> +	push %ebp
> +	mov %esp, %ebp
> +	pushl %eax
> +	pushl %ecx
> +	pushl %edx
> +
> +	call __mcount
> +
> +	popl %edx
> +	popl %ecx
> +	popl %eax
> +	popl %ebp

Writing this stack setup in assembly may be the one thing that conflicts
with REGPARM ?

> +out:
> +	ret
> Index: linux-compile.git/include/linux/linkage.h
> ===================================================================
> --- linux-compile.git.orig/include/linux/linkage.h	2008-01-03 01:02:28.000000000 -0500
> +++ linux-compile.git/include/linux/linkage.h	2008-01-03 01:02:33.000000000 -0500
> @@ -3,6 +3,8 @@
>  
>  #include <asm/linkage.h>
>  
> +#define notrace __attribute__((no_instrument_function))
> +
>  #ifdef __cplusplus
>  #define CPP_ASMLINKAGE extern "C"
>  #else
> Index: linux-compile.git/kernel/sysctl.c
> ===================================================================
> --- linux-compile.git.orig/kernel/sysctl.c	2008-01-03 01:02:28.000000000 -0500
> +++ linux-compile.git/kernel/sysctl.c	2008-01-03 01:02:33.000000000 -0500
> @@ -46,6 +46,7 @@
>  #include <linux/nfs_fs.h>
>  #include <linux/acpi.h>
>  #include <linux/reboot.h>
> +#include <linux/mcount.h>
>  
>  #include <asm/uaccess.h>
>  #include <asm/processor.h>
> @@ -470,6 +471,16 @@ static struct ctl_table kern_table[] = {
>  		.mode		= 0644,
>  		.proc_handler	= &proc_dointvec,
>  	},
> +#ifdef CONFIG_MCOUNT
> +	{
> +		.ctl_name	= CTL_UNNUMBERED,
> +		.procname	= "mcount_enabled",
> +		.data		= &mcount_enabled,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= &proc_dointvec,
> +	},
> +#endif
>  #ifdef CONFIG_KMOD
>  	{
>  		.ctl_name	= KERN_MODPROBE,
> Index: linux-compile.git/lib/Kconfig.debug
> ===================================================================
> --- linux-compile.git.orig/lib/Kconfig.debug	2008-01-03 01:02:28.000000000 -0500
> +++ linux-compile.git/lib/Kconfig.debug	2008-01-03 01:02:33.000000000 -0500
> @@ -517,4 +517,6 @@ config FAULT_INJECTION_STACKTRACE_FILTER
>  	help
>  	  Provide stacktrace filter for fault-injection capabilities
>  
> +source lib/mcount/Kconfig
> +
>  source "samples/Kconfig"
> Index: linux-compile.git/lib/Makefile
> ===================================================================
> --- linux-compile.git.orig/lib/Makefile	2008-01-03 01:02:28.000000000 -0500
> +++ linux-compile.git/lib/Makefile	2008-01-03 01:02:33.000000000 -0500
> @@ -66,6 +66,8 @@ obj-$(CONFIG_AUDIT_GENERIC) += audit.o
>  obj-$(CONFIG_SWIOTLB) += swiotlb.o
>  obj-$(CONFIG_FAULT_INJECTION) += fault-inject.o
>  
> +obj-$(CONFIG_MCOUNT) += mcount/
> +
>  lib-$(CONFIG_GENERIC_BUG) += bug.o
>  
>  hostprogs-y	:= gen_crc32table
> Index: linux-compile.git/lib/mcount/Kconfig
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux-compile.git/lib/mcount/Kconfig	2008-01-03 01:02:33.000000000 -0500
> @@ -0,0 +1,6 @@
> +
> +# MCOUNT itself is useless, or will just be added overhead.
> +# It needs something to register a function with it.
> +config MCOUNT
> +	bool
> +	depends on DEBUG_KERNEL
> Index: linux-compile.git/lib/mcount/Makefile
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux-compile.git/lib/mcount/Makefile	2008-01-03 01:02:33.000000000 -0500
> @@ -0,0 +1,3 @@
> +obj-$(CONFIG_MCOUNT) += libmcount.o
> +
> +libmcount-objs := mcount.o
> Index: linux-compile.git/lib/mcount/mcount.c
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux-compile.git/lib/mcount/mcount.c	2008-01-03 01:02:33.000000000 -0500
> @@ -0,0 +1,78 @@
> +/*
> + * Infrastructure for profiling code inserted by 'gcc -pg'.
> + *
> + * Copyright (C) 2007 Arnaldo Carvalho de Melo <acme@redhat.com>
> + *
> + * Converted to be more generic:
> + *   Copyright (C) 2007-2008 Steven Rostedt <srostedt@redhat.com>
> + *
> + * From code in the latency_tracer, that is:
> + *
> + *  Copyright (C) 2004-2006 Ingo Molnar
> + *  Copyright (C) 2004 William Lee Irwin III
> + */
> +
> +#include <linux/module.h>
> +#include <linux/mcount.h>
> +
> +/*
> + * Since we have nothing protecting between the test of
> + * mcount_trace_function and the call to it, we can't
> + * set it to NULL without risking a race that will have
> + * the kernel call the NULL pointer. Instead, we just
> + * set the function pointer to a dummy function.
> + */
> +notrace void dummy_mcount_tracer(unsigned long ip,
> +				 unsigned long parent_ip)
> +{
> +	/* do nothing */
> +}
> +
> +mcount_func_t mcount_trace_function = dummy_mcount_tracer;
> +int mcount_enabled;
> +
> +/** __mcount - hook for profiling
> + *
> + * This routine is called from the arch specific mcount routine, that in turn is
> + * called from code inserted by gcc -pg.
> + */
> +notrace void __mcount(void)
> +{
> +	if (mcount_trace_function != dummy_mcount_tracer)
> +		mcount_trace_function(CALLER_ADDR1, CALLER_ADDR2);
> +}

I don't see what the mcount_trace_function test gives us here: we
already tested mcount_enabled.

> +EXPORT_SYMBOL_GPL(mcount);
> +/*
> + * The above EXPORT_SYMBOL is for the gcc call of mcount and not the
> + * function __mcount that it is underneath. I put the export there
> + * to fool checkpatch.pl. It wants that export to be with the
> + * function, but that function happens to be in assembly.
> + */
> +
> +/**
> + * register_mcount_function - register a function for profiling
> + * @func - the function for profiling.
> + *
> + * Register a function to be called by all functions in the
> + * kernel.
> + *
> + * Note: @func and all the functions it calls must be labeled
> + *       with "notrace", otherwise it will go into a
> + *       recursive loop.
> + */
> +int register_mcount_function(mcount_func_t func)
> +{
> +	mcount_trace_function = func;
> +	return 0;
> +}
> +
> +/**
> + * clear_mcount_function - reset the mcount function
> + *
> + * This NULLs the mcount function and in essence stops
> + * tracing.  There may be lag
> + */
> +void clear_mcount_function(void)
> +{
> +	mcount_trace_function = dummy_mcount_tracer;
> +}
> Index: linux-compile.git/include/linux/mcount.h
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux-compile.git/include/linux/mcount.h	2008-01-03 01:02:33.000000000 -0500
> @@ -0,0 +1,21 @@
> +#ifndef _LINUX_MCOUNT_H
> +#define _LINUX_MCOUNT_H
> +
> +#ifdef CONFIG_MCOUNT
> +extern int mcount_enabled;
> +
> +#include <linux/linkage.h>
> +
> +#define CALLER_ADDR0 ((unsigned long)__builtin_return_address(0))
> +#define CALLER_ADDR1 ((unsigned long)__builtin_return_address(1))
> +#define CALLER_ADDR2 ((unsigned long)__builtin_return_address(2))
> +
> +typedef void (*mcount_func_t)(unsigned long ip, unsigned long parent_ip);
> +
> +extern void mcount(void);
> +
> +int register_mcount_function(mcount_func_t func);
> +void clear_mcount_function(void);
> +
> +#endif /* CONFIG_MCOUNT */
> +#endif /* _LINUX_MCOUNT_H */
> Index: linux-compile.git/arch/x86/kernel/entry_64.S
> ===================================================================
> --- linux-compile.git.orig/arch/x86/kernel/entry_64.S	2008-01-03 01:02:28.000000000 -0500
> +++ linux-compile.git/arch/x86/kernel/entry_64.S	2008-01-03 01:02:33.000000000 -0500
> @@ -53,6 +53,52 @@
>  
>  	.code64
>  
> +#ifdef CONFIG_MCOUNT
> +
> +ENTRY(mcount)
> +	cmpl $0, mcount_enabled
> +	jz out
> +
> +	push %rbp
> +
> +	lea dummy_mcount_tracer, %rbp
> +	cmpq %rbp, mcount_trace_function


Ok, so we normally jump over the function call (with mcount_enabled being 0)
but we can call it in rare cases when it is being set concurrently (even
though the mcount_trace_function is there, concurrency could still allow
the call).

Therefore we have one data cache hit when disabled (mcount_enabled), and
must do a supplementary comparison before the call when enabled. I
wonder why the cmpq %rbp, mcount_trace_function test is there at all ?


> +	jz out_rbp
> +
> +	mov %rsp,%rbp
> +
> +	push %r11
> +	push %r10
> +	push %r9
> +	push %r8
> +	push %rdi
> +	push %rsi
> +	push %rdx
> +	push %rcx
> +	push %rax
> +
> +	mov 0x0(%rbp),%rax
> +	mov 0x8(%rbp),%rdi
> +	mov 0x8(%rax),%rsi
> +
> +	call   *mcount_trace_function
> +
> +	pop %rax
> +	pop %rcx
> +	pop %rdx
> +	pop %rsi
> +	pop %rdi
> +	pop %r8
> +	pop %r9
> +	pop %r10
> +	pop %r11
> +
> +out_rbp:
> +	pop %rbp
> +out:
> +	ret
> +#endif
> +
>  #ifndef CONFIG_PREEMPT
>  #define retint_kernel retint_restore_args
>  #endif	
> 
> -- 

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 02/11] Add fastcall to do_IRQ for i386
  2008-01-03  7:16 ` [RFC PATCH 02/11] Add fastcall to do_IRQ for i386 Steven Rostedt
@ 2008-01-03 17:36   ` Mathieu Desnoyers
  2008-01-03 17:47     ` Steven Rostedt
  0 siblings, 1 reply; 40+ messages in thread
From: Mathieu Desnoyers @ 2008-01-03 17:36 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Gregory Haskins, Arnaldo Carvalho de Melo,
	William L. Irwin, Steven Rostedt

* Steven Rostedt (rostedt@goodmis.org) wrote:
> MCOUNT will disable the regparm parameters of the i386 compile
> options. When doing so, this breaks the prototype of do_IRQ
> where the fastcall must be explicitly called.
> 
> Also fixed some whitespace damage in the call to do_IRQ.
> 

I would propose to try to see how we can #ifdef two different __mcount
assembly functions that would prepare the stack appropriately for each
REGPARM case.

> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
> ---
>  arch/x86/kernel/irq_32.c |    2 +-
>  include/asm-x86/irq_32.h |    2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> Index: linux-compile.git/arch/x86/kernel/irq_32.c
> ===================================================================
> --- linux-compile.git.orig/arch/x86/kernel/irq_32.c	2007-12-20 00:20:29.000000000 -0500
> +++ linux-compile.git/arch/x86/kernel/irq_32.c	2007-12-20 00:21:55.000000000 -0500
> @@ -67,7 +67,7 @@ static union irq_ctx *softirq_ctx[NR_CPU
>   * handlers).
>   */
>  fastcall unsigned int do_IRQ(struct pt_regs *regs)
> -{	
> +{
>  	struct pt_regs *old_regs;
>  	/* high bit used in ret_from_ code */
>  	int irq = ~regs->orig_eax;
> Index: linux-compile.git/include/asm-x86/irq_32.h
> ===================================================================
> --- linux-compile.git.orig/include/asm-x86/irq_32.h	2007-12-20 00:20:29.000000000 -0500
> +++ linux-compile.git/include/asm-x86/irq_32.h	2007-12-20 00:21:55.000000000 -0500
> @@ -41,7 +41,7 @@ extern int irqbalance_disable(char *str)
>  extern void fixup_irqs(cpumask_t map);
>  #endif
>  
> -unsigned int do_IRQ(struct pt_regs *regs);
> +fastcall unsigned int do_IRQ(struct pt_regs *regs);
>  void init_IRQ(void);
>  void __init native_init_IRQ(void);
>  
> 
> -- 

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 00/11] mcount tracing utility
  2008-01-03 17:22 ` [RFC PATCH 00/11] mcount tracing utility Mathieu Desnoyers
@ 2008-01-03 17:42   ` Steven Rostedt
  0 siblings, 0 replies; 40+ messages in thread
From: Steven Rostedt @ 2008-01-03 17:42 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: LKML, Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Gregory Haskins, Arnaldo Carvalho de Melo,
	William L. Irwin, Tim Bird


On Thu, 3 Jan 2008, Mathieu Desnoyers wrote:

> Hi Steven,
>
> Great work!

Thanks!

>
> (added Tim Bird, author of KFT/KFI to the CC list)

I'm currently investigating using -finstrument-functions instead of -pg,
but if the overhead is too much, I may try to incorporate both.
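
For reference, -finstrument-functions makes gcc emit calls to a fixed pair of
hooks rather than mcount; the signatures below are gcc's documented ones. The
hooks must themselves be exempt from instrumentation (the attribute used here is
exactly what the quoted linkage.h patch maps "notrace" to). The counting is
illustrative, and since this sketch is not compiled with -finstrument-functions,
the test exercises the hooks by calling them directly.

```c
#include <assert.h>

/* The two hooks gcc calls under -finstrument-functions.  They receive
 * the function's own address and the call site, and fire on both entry
 * and exit -- richer than mcount, at the cost of two calls per function.
 * The hooks must not be instrumented themselves. */
static unsigned long enter_count, exit_count;

__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
	(void)this_fn; (void)call_site;
	enter_count++;
}

__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
	(void)this_fn; (void)call_site;
	exit_count++;
}
```

When a translation unit is built with -finstrument-functions, the compiler
inserts these calls automatically around every non-exempt function.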

>
> One interesting aspect of LTTng is that it would be very lightweight.
> You seem to use interrupt disabling with your simple tracer and do a
> _lot_ of cacheline bouncing (trace_idx[NR_CPUS] is a very good example).

Please note that this tracer is more of a "simple example". There are lots
of improvements that can be made. It was meant more to show what mcount
can bring than to push the tracer itself.

I want to stress that the tracer in this patch set is a *much* simplified
version of the latency_tracer in the RT patch. I want to start out simple,
complexity can come later ;-)

>
> LTTng would write the information to a per-cpu memory buffer in binary
> format. I see that it would be especially useful in flight recorder
> mode, where we overwrite the buffers without writing them to disk : when
> a problematic condition is reached, (a kernel oops would be a good one),
> then we just stop tracing and dump the last buffers to disk. In this
> case, we would have the last function calls that led to an OOPS.

This sounds great. My hope is that we can get the mcount (or cyg_profile)
functionality into the kernel so that many different users can deploy it.

-- Steve


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 03/11] Annotate core code that should not be traced
  2008-01-03  7:16 ` [RFC PATCH 03/11] Annotate core code that should not be traced Steven Rostedt
@ 2008-01-03 17:42   ` Mathieu Desnoyers
  2008-01-03 18:07     ` Steven Rostedt
  0 siblings, 1 reply; 40+ messages in thread
From: Mathieu Desnoyers @ 2008-01-03 17:42 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Gregory Haskins, Arnaldo Carvalho de Melo,
	William L. Irwin, Steven Rostedt

* Steven Rostedt (rostedt@goodmis.org) wrote:
> Mark with "notrace" functions in core code that should not be
> traced.  The "notrace" attribute will prevent gcc from adding
> a call to mcount on the annotated functions.
> 
> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
> 
> ---
>  drivers/clocksource/acpi_pm.c |    8 ++++----
>  include/linux/preempt.h       |    4 ++--
>  kernel/irq/handle.c           |    2 +-
>  kernel/lockdep.c              |   27 ++++++++++++++-------------
>  kernel/rcupdate.c             |    2 +-
>  kernel/spinlock.c             |    2 +-
>  lib/smp_processor_id.c        |    2 +-
>  7 files changed, 24 insertions(+), 23 deletions(-)
> 
> Index: linux-compile.git/drivers/clocksource/acpi_pm.c
> ===================================================================
> --- linux-compile.git.orig/drivers/clocksource/acpi_pm.c	2007-12-20 01:00:29.000000000 -0500
> +++ linux-compile.git/drivers/clocksource/acpi_pm.c	2007-12-20 01:00:48.000000000 -0500
> @@ -30,13 +30,13 @@
>   */
>  u32 pmtmr_ioport __read_mostly;
>  
> -static inline u32 read_pmtmr(void)
> +static inline notrace u32 read_pmtmr(void)
>  {
>  	/* mask the output to 24 bits */
>  	return inl(pmtmr_ioport) & ACPI_PM_MASK;
>  }
>  
> -u32 acpi_pm_read_verified(void)
> +notrace u32 acpi_pm_read_verified(void)
>  {
>  	u32 v1 = 0, v2 = 0, v3 = 0;
>  
> @@ -56,12 +56,12 @@ u32 acpi_pm_read_verified(void)
>  	return v2;
>  }
>  
> -static cycle_t acpi_pm_read_slow(void)
> +static notrace cycle_t acpi_pm_read_slow(void)
>  {
>  	return (cycle_t)acpi_pm_read_verified();
>  }
>  
> -static cycle_t acpi_pm_read(void)
> +static notrace cycle_t acpi_pm_read(void)
>  {
>  	return (cycle_t)read_pmtmr();
>  }

What precision can you get from this clock source ? How many cycles are
required to read it ? Would it be useful to fall back on the CPU TSC
when they are synchronized ? Does acpi_pm_read_verified read the
timestamp atomically ? Is this a reason why you need to disable
interrupts, and therefore cannot trace NMI handlers ?

> Index: linux-compile.git/include/linux/preempt.h
> ===================================================================
> --- linux-compile.git.orig/include/linux/preempt.h	2007-12-20 01:00:29.000000000 -0500
> +++ linux-compile.git/include/linux/preempt.h	2007-12-20 01:00:48.000000000 -0500
> @@ -11,8 +11,8 @@
>  #include <linux/list.h>
>  
>  #ifdef CONFIG_DEBUG_PREEMPT
> -  extern void fastcall add_preempt_count(int val);
> -  extern void fastcall sub_preempt_count(int val);
> +  extern notrace void fastcall add_preempt_count(int val);
> +  extern notrace void fastcall sub_preempt_count(int val);
>  #else
>  # define add_preempt_count(val)	do { preempt_count() += (val); } while (0)
>  # define sub_preempt_count(val)	do { preempt_count() -= (val); } while (0)
> Index: linux-compile.git/kernel/irq/handle.c
> ===================================================================
> --- linux-compile.git.orig/kernel/irq/handle.c	2007-12-20 01:00:29.000000000 -0500
> +++ linux-compile.git/kernel/irq/handle.c	2007-12-20 01:00:48.000000000 -0500
> @@ -163,7 +163,7 @@ irqreturn_t handle_IRQ_event(unsigned in
>   * This is the original x86 implementation which is used for every
>   * interrupt type.
>   */
> -fastcall unsigned int __do_IRQ(unsigned int irq)
> +notrace fastcall unsigned int __do_IRQ(unsigned int irq)

Can you explain the notrace here ?

>  {
>  	struct irq_desc *desc = irq_desc + irq;
>  	struct irqaction *action;
> Index: linux-compile.git/kernel/lockdep.c
> ===================================================================
> --- linux-compile.git.orig/kernel/lockdep.c	2007-12-20 01:00:29.000000000 -0500
> +++ linux-compile.git/kernel/lockdep.c	2007-12-20 01:00:48.000000000 -0500
> @@ -270,14 +270,14 @@ static struct list_head chainhash_table[
>  	((key1) >> (64-MAX_LOCKDEP_KEYS_BITS)) ^ \
>  	(key2))
>  
> -void lockdep_off(void)
> +notrace void lockdep_off(void)
>  {
>  	current->lockdep_recursion++;
>  }
>  

Due to interrupt disabling in your tracing code I suppose.

>  EXPORT_SYMBOL(lockdep_off);
>  
> -void lockdep_on(void)
> +notrace void lockdep_on(void)
>  {
>  	current->lockdep_recursion--;
>  }
> @@ -1036,7 +1036,7 @@ find_usage_forwards(struct lock_class *s
>   * Return 1 otherwise and keep <backwards_match> unchanged.
>   * Return 0 on error.
>   */
> -static noinline int
> +static noinline notrace int
>  find_usage_backwards(struct lock_class *source, unsigned int depth)
>  {
>  	struct lock_list *entry;
> @@ -1586,7 +1586,7 @@ static inline int validate_chain(struct 
>   * We are building curr_chain_key incrementally, so double-check
>   * it from scratch, to make sure that it's done correctly:
>   */
> -static void check_chain_key(struct task_struct *curr)
> +static notrace void check_chain_key(struct task_struct *curr)
>  {
>  #ifdef CONFIG_DEBUG_LOCKDEP
>  	struct held_lock *hlock, *prev_hlock = NULL;
> @@ -1962,7 +1962,7 @@ static int mark_lock_irq(struct task_str
>  /*
>   * Mark all held locks with a usage bit:
>   */
> -static int
> +static notrace int
>  mark_held_locks(struct task_struct *curr, int hardirq)
>  {
>  	enum lock_usage_bit usage_bit;
> @@ -2009,7 +2009,7 @@ void early_boot_irqs_on(void)
>  /*
>   * Hardirqs will be enabled:
>   */
> -void trace_hardirqs_on(void)
> +notrace void trace_hardirqs_on(void)
>  {
>  	struct task_struct *curr = current;
>  	unsigned long ip;
> @@ -2057,7 +2057,7 @@ EXPORT_SYMBOL(trace_hardirqs_on);
>  /*
>   * Hardirqs were disabled:
>   */
> -void trace_hardirqs_off(void)
> +notrace void trace_hardirqs_off(void)
>  {
>  	struct task_struct *curr = current;
>  
> @@ -2241,8 +2241,8 @@ static inline int separate_irq_context(s
>  /*
>   * Mark a lock with a usage bit, and validate the state transition:
>   */
> -static int mark_lock(struct task_struct *curr, struct held_lock *this,
> -		     enum lock_usage_bit new_bit)
> +static notrace int mark_lock(struct task_struct *curr, struct held_lock *this,
> +			     enum lock_usage_bit new_bit)
>  {
>  	unsigned int new_mask = 1 << new_bit, ret = 1;
>  
> @@ -2648,7 +2648,7 @@ __lock_release(struct lockdep_map *lock,
>  /*
>   * Check whether we follow the irq-flags state precisely:
>   */
> -static void check_flags(unsigned long flags)
> +static notrace void check_flags(unsigned long flags)
>  {
>  #if defined(CONFIG_DEBUG_LOCKDEP) && defined(CONFIG_TRACE_IRQFLAGS)
>  	if (!debug_locks)
> @@ -2685,8 +2685,8 @@ static void check_flags(unsigned long fl
>   * We are not always called with irqs disabled - do that here,
>   * and also avoid lockdep recursion:
>   */
> -void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
> -		  int trylock, int read, int check, unsigned long ip)
> +notrace void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
> +			  int trylock, int read, int check, unsigned long ip)
>  {
>  	unsigned long flags;
>  
> @@ -2708,7 +2708,8 @@ void lock_acquire(struct lockdep_map *lo
>  
>  EXPORT_SYMBOL_GPL(lock_acquire);
>  
> -void lock_release(struct lockdep_map *lock, int nested, unsigned long ip)
> +notrace void lock_release(struct lockdep_map *lock, int nested,
> +			  unsigned long ip)
>  {
>  	unsigned long flags;

Do you really use locks in your tracing code ? I thought you were using
per cpu buffers.

>  
> Index: linux-compile.git/kernel/rcupdate.c
> ===================================================================
> --- linux-compile.git.orig/kernel/rcupdate.c	2007-12-20 01:00:29.000000000 -0500
> +++ linux-compile.git/kernel/rcupdate.c	2007-12-20 01:00:48.000000000 -0500
> @@ -504,7 +504,7 @@ static int __rcu_pending(struct rcu_ctrl
>   * by the current CPU, returning 1 if so.  This function is part of the
>   * RCU implementation; it is -not- an exported member of the RCU API.
>   */
> -int rcu_pending(int cpu)
> +notrace int rcu_pending(int cpu)
>  {
>  	return __rcu_pending(&rcu_ctrlblk, &per_cpu(rcu_data, cpu)) ||
>  		__rcu_pending(&rcu_bh_ctrlblk, &per_cpu(rcu_bh_data, cpu));
> Index: linux-compile.git/kernel/spinlock.c
> ===================================================================
> --- linux-compile.git.orig/kernel/spinlock.c	2007-12-20 01:00:29.000000000 -0500
> +++ linux-compile.git/kernel/spinlock.c	2007-12-20 01:00:48.000000000 -0500
> @@ -437,7 +437,7 @@ int __lockfunc _spin_trylock_bh(spinlock
>  }
>  EXPORT_SYMBOL(_spin_trylock_bh);
>  
> -int in_lock_functions(unsigned long addr)
> +notrace int in_lock_functions(unsigned long addr)
>  {
>  	/* Linker adds these: start and end of __lockfunc functions */
>  	extern char __lock_text_start[], __lock_text_end[];
> Index: linux-compile.git/lib/smp_processor_id.c
> ===================================================================
> --- linux-compile.git.orig/lib/smp_processor_id.c	2007-12-20 01:00:29.000000000 -0500
> +++ linux-compile.git/lib/smp_processor_id.c	2007-12-20 01:00:48.000000000 -0500
> @@ -7,7 +7,7 @@
>  #include <linux/kallsyms.h>
>  #include <linux/sched.h>
>  
> -unsigned int debug_smp_processor_id(void)
> +notrace unsigned int debug_smp_processor_id(void)
>  {
>  	unsigned long preempt_count = preempt_count();
>  	int this_cpu = raw_smp_processor_id();
> 
> -- 

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 02/11] Add fastcall to do_IRQ for i386
  2008-01-03 17:36   ` Mathieu Desnoyers
@ 2008-01-03 17:47     ` Steven Rostedt
  2008-01-07  4:50       ` H. Peter Anvin
  0 siblings, 1 reply; 40+ messages in thread
From: Steven Rostedt @ 2008-01-03 17:47 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: LKML, Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Gregory Haskins, Arnaldo Carvalho de Melo,
	William L. Irwin, Steven Rostedt



On Thu, 3 Jan 2008, Mathieu Desnoyers wrote:
>
> I would propose to try to see how we can #ifdef two different __mcount
> assembly functions that would prepare the stack appropriately for each
> REGPARM case.
>

I have to confess that I've been testing this mostly on x86_64, which
doesn't have the troubles with REGPARM as i386 does. I'll need to
investigate this a bit deeper on i386 alone.

-- Steve


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 04/11] i386: notrace annotations
  2008-01-03  7:16 ` [RFC PATCH 04/11] i386: notrace annotations Steven Rostedt
@ 2008-01-03 17:52   ` Mathieu Desnoyers
  0 siblings, 0 replies; 40+ messages in thread
From: Mathieu Desnoyers @ 2008-01-03 17:52 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Gregory Haskins, Arnaldo Carvalho de Melo,
	William L. Irwin, Steven Rostedt

* Steven Rostedt (rostedt@goodmis.org) wrote:
> From patch-2.6.21.5-rt20. Annotates functions that should not be profiler
> instrumented, i.e. where mcount should not be called at function entry.
> 
> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
> ---
>  arch/x86/kernel/apic_32.c  |    2 +-
>  arch/x86/kernel/hpet.c     |    2 +-
>  arch/x86/kernel/irq_32.c   |    2 +-
>  arch/x86/kernel/nmi_32.c   |    2 +-
>  arch/x86/kernel/smp_32.c   |    2 +-
>  arch/x86/kernel/time_32.c  |    2 +-
>  arch/x86/kernel/traps_32.c |    4 ++--
>  arch/x86/kernel/tsc_32.c   |    2 +-
>  arch/x86/lib/delay_32.c    |    6 +++---
>  arch/x86/mm/fault_32.c     |    4 ++--
>  arch/x86/mm/init_32.c      |    2 +-
>  11 files changed, 15 insertions(+), 15 deletions(-)
> ---
> 
> Index: linux-compile.git/arch/x86/kernel/apic_32.c
> ===================================================================
> --- linux-compile.git.orig/arch/x86/kernel/apic_32.c	2008-01-02 22:53:52.000000000 -0500
> +++ linux-compile.git/arch/x86/kernel/apic_32.c	2008-01-02 22:56:41.000000000 -0500
> @@ -577,7 +577,7 @@ static void local_apic_timer_interrupt(v
>   *   interrupt as well. Thus we cannot inline the local irq ... ]
>   */
>  
> -void fastcall smp_apic_timer_interrupt(struct pt_regs *regs)
> +notrace fastcall void smp_apic_timer_interrupt(struct pt_regs *regs)
>  {
>  	struct pt_regs *old_regs = set_irq_regs(regs);
>  

Why can't this be traced ?

> Index: linux-compile.git/arch/x86/kernel/hpet.c
> ===================================================================
> --- linux-compile.git.orig/arch/x86/kernel/hpet.c	2008-01-02 22:53:52.000000000 -0500
> +++ linux-compile.git/arch/x86/kernel/hpet.c	2008-01-02 22:56:41.000000000 -0500
> @@ -295,7 +295,7 @@ static int hpet_legacy_next_event(unsign
>  /*
>   * Clock source related code
>   */
> -static cycle_t read_hpet(void)
> +static notrace cycle_t read_hpet(void)
>  {
>  	return (cycle_t)hpet_readl(HPET_COUNTER);
>  }

This one is weird on 32-bit x86: a 32-bit value is cast into a
(cycle_t), which is an unsigned long long. So we think we have a 64-bit
counter when in fact we only have 32 bits. How long before the wrap
around ?
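
The wrap time for a 32-bit HPET main counter is easy to bound. Assuming the
common 14.31818 MHz HPET base frequency (an assumption for this sketch; real
hardware reports its actual period in the HPET capability registers), 2^32 ticks
wrap in roughly five minutes:

```c
#include <assert.h>

/* Rough wrap-around time for a 32-bit counter at an assumed HPET rate
 * of 14.31818 MHz (the real period comes from the HPET registers). */
static double hpet_wrap_seconds(void)
{
	const double hz = 14318180.0;
	return 4294967296.0 / hz;	/* 2^32 ticks / ticks-per-second */
}
```

About 300 seconds, so any timekeeping built on this reader must handle at least
one wrap every five minutes or so.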

> Index: linux-compile.git/arch/x86/kernel/irq_32.c
> ===================================================================
> --- linux-compile.git.orig/arch/x86/kernel/irq_32.c	2008-01-02 22:56:34.000000000 -0500
> +++ linux-compile.git/arch/x86/kernel/irq_32.c	2008-01-02 22:56:41.000000000 -0500
> @@ -66,7 +66,7 @@ static union irq_ctx *softirq_ctx[NR_CPU
>   * SMP cross-CPU interrupts have their own specific
>   * handlers).
>   */
> -fastcall unsigned int do_IRQ(struct pt_regs *regs)
> +notrace fastcall unsigned int do_IRQ(struct pt_regs *regs)

Why ?

>  {
>  	struct pt_regs *old_regs;
>  	/* high bit used in ret_from_ code */
> Index: linux-compile.git/arch/x86/kernel/nmi_32.c
> ===================================================================
> --- linux-compile.git.orig/arch/x86/kernel/nmi_32.c	2008-01-02 22:53:52.000000000 -0500
> +++ linux-compile.git/arch/x86/kernel/nmi_32.c	2008-01-02 22:57:52.000000000 -0500
> @@ -323,7 +323,7 @@ EXPORT_SYMBOL(touch_nmi_watchdog);
>  
>  extern void die_nmi(struct pt_regs *, const char *msg);
>  
> -__kprobes int nmi_watchdog_tick(struct pt_regs * regs, unsigned reason)
> +notrace __kprobes int nmi_watchdog_tick(struct pt_regs *regs, unsigned reason)
>  {

Why ? ... hrm, this is dangerous : you will then have to look at _every_
code path that can be run in an NMI context (printk included) and mark
them "notrace" or make sure your tracer is disabled when they are run.
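For what it's worth, the usual alternative to marking every NMI-reachable function notrace is a re-entrancy guard inside the tracer itself. A minimal userspace sketch of the idea (tracer_hook and trace_events are made-up names; a kernel version would use a per-CPU variable rather than one global flag):

```c
#include <stdatomic.h>

static atomic_flag in_tracer = ATOMIC_FLAG_INIT;
static int trace_events;

/* Returns 1 if the event was recorded, 0 if re-entry was detected.
 * An NMI that lands while the flag is set simply drops its event
 * instead of recursing into the tracer. */
int tracer_hook(void)
{
	if (atomic_flag_test_and_set(&in_tracer))
		return 0;		/* already inside the tracer */
	trace_events++;			/* stand-in for real event recording */
	atomic_flag_clear(&in_tracer);
	return 1;
}
```

The cost of this design is that events from nested contexts are silently lost, which may or may not be acceptable for a latency tracer.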

>  
>  	/*
> Index: linux-compile.git/arch/x86/kernel/smp_32.c
> ===================================================================
> --- linux-compile.git.orig/arch/x86/kernel/smp_32.c	2008-01-02 22:53:52.000000000 -0500
> +++ linux-compile.git/arch/x86/kernel/smp_32.c	2008-01-02 22:56:41.000000000 -0500
> @@ -638,7 +638,7 @@ static void native_smp_send_stop(void)
>   * all the work is done automatically when
>   * we return from the interrupt.
>   */
> -fastcall void smp_reschedule_interrupt(struct pt_regs *regs)
> +notrace fastcall void smp_reschedule_interrupt(struct pt_regs *regs)

why?

>  {
>  	ack_APIC_irq();
>  	__get_cpu_var(irq_stat).irq_resched_count++;
> Index: linux-compile.git/arch/x86/kernel/time_32.c
> ===================================================================
> --- linux-compile.git.orig/arch/x86/kernel/time_32.c	2008-01-02 22:53:52.000000000 -0500
> +++ linux-compile.git/arch/x86/kernel/time_32.c	2008-01-02 22:56:41.000000000 -0500
> @@ -122,7 +122,7 @@ static int set_rtc_mmss(unsigned long no
>  
>  int timer_ack;
>  
> -unsigned long profile_pc(struct pt_regs *regs)
> +notrace unsigned long profile_pc(struct pt_regs *regs)
>  {
>  	unsigned long pc = instruction_pointer(regs);
>  
> Index: linux-compile.git/arch/x86/kernel/traps_32.c
> ===================================================================
> --- linux-compile.git.orig/arch/x86/kernel/traps_32.c	2008-01-02 22:53:52.000000000 -0500
> +++ linux-compile.git/arch/x86/kernel/traps_32.c	2008-01-02 22:58:19.000000000 -0500
> @@ -723,7 +723,7 @@ void __kprobes die_nmi(struct pt_regs *r
>  	do_exit(SIGSEGV);
>  }
>  
> -static __kprobes void default_do_nmi(struct pt_regs * regs)
> +static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
>  {
>  	unsigned char reason = 0;
>  
> @@ -763,7 +763,7 @@ static __kprobes void default_do_nmi(str
>  
>  static int ignore_nmis;
>  
> -fastcall __kprobes void do_nmi(struct pt_regs * regs, long error_code)
> +notrace fastcall __kprobes void do_nmi(struct pt_regs *regs, long error_code)
>  {
>  	int cpu;
>  

Same here... we must be careful, or use atomic ops instead of interrupt
disabling.

> Index: linux-compile.git/arch/x86/kernel/tsc_32.c
> ===================================================================
> --- linux-compile.git.orig/arch/x86/kernel/tsc_32.c	2008-01-02 22:53:52.000000000 -0500
> +++ linux-compile.git/arch/x86/kernel/tsc_32.c	2008-01-02 22:56:41.000000000 -0500
> @@ -269,7 +269,7 @@ core_initcall(cpufreq_tsc);
>  
>  static unsigned long current_tsc_khz = 0;
>  
> -static cycle_t read_tsc(void)
> +static notrace cycle_t read_tsc(void)
>  {
>  	cycle_t ret;
>  
> Index: linux-compile.git/arch/x86/lib/delay_32.c
> ===================================================================
> --- linux-compile.git.orig/arch/x86/lib/delay_32.c	2008-01-02 22:53:52.000000000 -0500
> +++ linux-compile.git/arch/x86/lib/delay_32.c	2008-01-02 22:56:41.000000000 -0500
> @@ -24,7 +24,7 @@
>  #endif
>  
>  /* simple loop based delay: */
> -static void delay_loop(unsigned long loops)
> +static notrace void delay_loop(unsigned long loops)
>  {
>  	int d0;
>  
> @@ -39,7 +39,7 @@ static void delay_loop(unsigned long loo
>  }
>  
>  /* TSC based delay: */
> -static void delay_tsc(unsigned long loops)
> +static notrace void delay_tsc(unsigned long loops)
>  {
>  	unsigned long bclock, now;
>  
> @@ -72,7 +72,7 @@ int read_current_timer(unsigned long *ti
>  	return -1;
>  }
>  
> -void __delay(unsigned long loops)
> +notrace void __delay(unsigned long loops)
>  {
>  	delay_fn(loops);
>  }
> Index: linux-compile.git/arch/x86/mm/fault_32.c
> ===================================================================
> --- linux-compile.git.orig/arch/x86/mm/fault_32.c	2008-01-02 22:53:52.000000000 -0500
> +++ linux-compile.git/arch/x86/mm/fault_32.c	2008-01-02 22:56:41.000000000 -0500
> @@ -293,8 +293,8 @@ int show_unhandled_signals = 1;
>   *	bit 3 == 1 means use of reserved bit detected
>   *	bit 4 == 1 means fault was an instruction fetch
>   */
> -fastcall void __kprobes do_page_fault(struct pt_regs *regs,
> -				      unsigned long error_code)
> +notrace fastcall void __kprobes do_page_fault(struct pt_regs *regs,
> +					      unsigned long error_code)

Agreed on this one. It gets called for in-kernel faults when new
modules are loaded. Hard to trace. Will vmalloc_fault, an inline
function, trigger an mcount call ?

>  {
>  	struct task_struct *tsk;
>  	struct mm_struct *mm;
> Index: linux-compile.git/arch/x86/mm/init_32.c
> ===================================================================
> --- linux-compile.git.orig/arch/x86/mm/init_32.c	2008-01-02 22:53:52.000000000 -0500
> +++ linux-compile.git/arch/x86/mm/init_32.c	2008-01-02 22:56:41.000000000 -0500
> @@ -200,7 +200,7 @@ static inline int page_kills_ppro(unsign
>  	return 0;
>  }
>  
> -int page_is_ram(unsigned long pagenr)
> +notrace int page_is_ram(unsigned long pagenr)
>  {
>  	int i;
>  	unsigned long addr, end;
> 
> -- 

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 01/11] Add basic support for gcc profiler instrumentation
  2008-01-03 17:35   ` Mathieu Desnoyers
@ 2008-01-03 17:55     ` Steven Rostedt
  0 siblings, 0 replies; 40+ messages in thread
From: Steven Rostedt @ 2008-01-03 17:55 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: LKML, Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Gregory Haskins, Arnaldo Carvalho de Melo,
	William L. Irwin, Steven Rostedt



On Thu, 3 Jan 2008, Mathieu Desnoyers wrote:
> .....
> > Index: linux-compile.git/arch/x86/kernel/mcount-wrapper.S
> > ===================================================================
> > --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> > +++ linux-compile.git/arch/x86/kernel/mcount-wrapper.S	2008-01-03 01:02:33.000000000 -0500
> > @@ -0,0 +1,25 @@
> > +/*
> > + *  linux/arch/x86/mcount-wrapper.S
> > + *
> > + *  Copyright (C) 2004 Ingo Molnar
> > + */
> > +
> > +.globl mcount
> > +mcount:
> > +	cmpl $0, mcount_enabled
> > +	jz out
> > +
> > +	push %ebp
> > +	mov %esp, %ebp
> > +	pushl %eax
> > +	pushl %ecx
> > +	pushl %edx
> > +
> > +	call __mcount
> > +
> > +	popl %edx
> > +	popl %ecx
> > +	popl %eax
> > +	popl %ebp
>
> Writing this stack setup in assembly may be the one thing that conflicts
> with REGPARM ?

Could be.

> > +
> > +/** __mcount - hook for profiling
> > + *
> > + * This routine is called from the arch specific mcount routine, that in turn is
> > + * called from code inserted by gcc -pg.
> > + */
> > +notrace void __mcount(void)
> > +{
> > +	if (mcount_trace_function != dummy_mcount_tracer)
> > +		mcount_trace_function(CALLER_ADDR1, CALLER_ADDR2);
> > +}
>
> I don't see what the mcount_trace_function test gives us here : we
> already tested mcount_enabled.

It's probably me being anal: I did a compare instead of a function call.
I guess calling dummy_mcount_tracer is OK. I originally had it as NULL,
and that had too many races.
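The dummy-function default avoids the classic check-then-call race with a NULL pointer: the hook is never NULL, so callers can always make the indirect call. A userspace sketch of the pattern (names here are made up; the patch itself uses mcount_trace_function and dummy_mcount_tracer):

```c
/* The hook pointer always targets a valid function, so callers never
 * need a NULL test that could race with a concurrent unregister. */
static void dummy_tracer(unsigned long ip, unsigned long parent_ip)
{
	(void)ip; (void)parent_ip;	/* default: do nothing */
}

static void (*trace_function)(unsigned long, unsigned long) = dummy_tracer;

/* An example tracer that just counts calls. */
static int hits;
static void counting_tracer(unsigned long ip, unsigned long parent_ip)
{
	(void)ip; (void)parent_ip;
	hits++;
}

void mcount_hook(unsigned long ip, unsigned long parent_ip)
{
	trace_function(ip, parent_ip);	/* safe even mid-unregister */
}
```

Unregistering is then just storing dummy_tracer back into the pointer; there is no window where a caller can fetch NULL.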

> > Index: linux-compile.git/arch/x86/kernel/entry_64.S
> > ===================================================================
> > --- linux-compile.git.orig/arch/x86/kernel/entry_64.S	2008-01-03 01:02:28.000000000 -0500
> > +++ linux-compile.git/arch/x86/kernel/entry_64.S	2008-01-03 01:02:33.000000000 -0500
> > @@ -53,6 +53,52 @@
> >
> >  	.code64
> >
> > +#ifdef CONFIG_MCOUNT
> > +
> > +ENTRY(mcount)
> > +	cmpl $0, mcount_enabled
> > +	jz out
> > +
> > +	push %rbp
> > +
> > +	lea dummy_mcount_tracer, %rbp
> > +	cmpq %rbp, mcount_trace_function
>
>
> Ok, so we normally jump over the function call (with mcount_enabled being 0)
> but we can call it in rare cases when it is being set concurrently (even
> though the mcount_trace_function is there, concurrency could still allow
> the call).
>
> Therefore we have one data cache hit when disabled (mcount_enabled), and
> must do a supplementary comparison before the call when enabled. I
> wonder why the cmpq %rbp, mcount_trace_function test is there at all ?

We can have mcount_enabled on without a tracing function to call. So this
simply saves us from doing another function call.

I've been debating getting rid of mcount_enabled, but it makes it easy
for systemtap to disable tracing. We don't even need to modify systemtap
for this, since systemtap already has the ability to turn mcount_enabled
on and off. But it will be a bit uglier to have systemtap modify the
tracing function.

Perhaps calling dummy_mcount_tracer isn't that bad. I'll need to do some
benchmarks between the two.

-- Steve

>
>
> > +	jz out_rbp
> > +
> > +	mov %rsp,%rbp
> > +
> > +	push %r11
> > +	push %r10
> > +	push %r9
> > +	push %r8
> > +	push %rdi
> > +	push %rsi
> > +	push %rdx
> > +	push %rcx
> > +	push %rax
> > +
> > +	mov 0x0(%rbp),%rax
> > +	mov 0x8(%rbp),%rdi
> > +	mov 0x8(%rax),%rsi
> > +
> > +	call   *mcount_trace_function
> > +
> > +	pop %rax
> > +	pop %rcx
> > +	pop %rdx
> > +	pop %rsi
> > +	pop %rdi
> > +	pop %r8
> > +	pop %r9
> > +	pop %r10
> > +	pop %r11
> > +
> > +out_rbp:
> > +	pop %rbp
> > +out:
> > +	ret
> > +#endif
> > +
> >  #ifndef CONFIG_PREEMPT
> >  #define retint_kernel retint_restore_args
> >  #endif
> >
> > --

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 10/11] mcount tracer show task comm and pid
  2008-01-03  7:16 ` [RFC PATCH 10/11] mcount tracer show task comm and pid Steven Rostedt
@ 2008-01-03 17:56   ` Mathieu Desnoyers
  2008-01-06 15:37     ` Ingo Molnar
  0 siblings, 1 reply; 40+ messages in thread
From: Mathieu Desnoyers @ 2008-01-03 17:56 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Gregory Haskins, Arnaldo Carvalho de Melo,
	William L. Irwin, Steven Rostedt

* Steven Rostedt (rostedt@goodmis.org) wrote:
> This adds the task comm and pid to the trace output. This gives the
> output like:
> 
> CPU 0: sshd:2605 [<ffffffff80251858>] remove_wait_queue+0xc/0x4a <-- [<ffffffff802ad7be>] free_poll_entry+0x1e/0x2a
> CPU 2: bash:2610 [<ffffffff8038c3aa>] tty_check_change+0x9/0xb6 <-- [<ffffffff8038d295>] tty_ioctl+0x59f/0xcdd
> CPU 0: sshd:2605 [<ffffffff80491ec6>] _spin_lock_irqsave+0xe/0x81 <-- [<ffffffff80251863>] remove_wait_queue+0x17/0x4a
> CPU 2: bash:2610 [<ffffffff8024e2f7>] find_vpid+0x9/0x24 <-- [<ffffffff8038d325>] tty_ioctl+0x62f/0xcdd
> CPU 0: sshd:2605 [<ffffffff804923ec>] _spin_unlock_irqrestore+0x9/0x3a <-- [<ffffffff80251891>] remove_wait_queue+0x45/0x4a
> CPU 0: sshd:2605 [<ffffffff802a18b3>] fput+0x9/0x1b <-- [<ffffffff802ad7c6>] free_poll_entry+0x26/0x2a
> 
> 
> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
> ---
>  lib/mcount/tracer.c |    6 +++++-
>  lib/mcount/tracer.h |    3 +++
>  2 files changed, 8 insertions(+), 1 deletion(-)
> 
> Index: linux-compile.git/lib/mcount/tracer.c
> ===================================================================
> --- linux-compile.git.orig/lib/mcount/tracer.c	2008-01-02 23:17:21.000000000 -0500
> +++ linux-compile.git/lib/mcount/tracer.c	2008-01-02 23:17:44.000000000 -0500
> @@ -34,6 +34,7 @@ mctracer_add_trace_entry(struct mctracer
>  {
>  	unsigned long idx, idx_next;
>  	struct mctracer_entry *entry;
> +	struct task_struct *tsk = current;

Aren't there situations, like in the middle of a context switch, where
current is not valid ? It also poses a problem for early boot and NMI
tracing.

>  
>  	idx = tr->trace_idx[cpu];
>  	idx_next = idx + 1;
> @@ -52,6 +53,8 @@ mctracer_add_trace_entry(struct mctracer
>  	entry->idx	 = atomic_inc_return(&tr->cnt);
>  	entry->ip	 = ip;
>  	entry->parent_ip = parent_ip;
> +	entry->pid	 = tsk->pid;
> +	memcpy(entry->comm, tsk->comm, TASK_COMM_LEN);
>  }
>  
>  static inline notrace void trace_function(const unsigned long ip,
> @@ -223,7 +226,8 @@ static int s_show(struct seq_file *m, vo
>  			return -1;
>  		}
>  
> -		seq_printf(m, "  CPU %d:  ", iter->cpu);
> +		seq_printf(m, "CPU %d: ", iter->cpu);
> +		seq_printf(m, "%s:%d ", iter->ent->comm, iter->ent->pid);
>  		seq_print_ip_sym(m, iter->ent->ip);
>  		if (iter->ent->parent_ip) {
>  			seq_printf(m, " <-- ");
> Index: linux-compile.git/lib/mcount/tracer.h
> ===================================================================
> --- linux-compile.git.orig/lib/mcount/tracer.h	2008-01-02 23:16:15.000000000 -0500
> +++ linux-compile.git/lib/mcount/tracer.h	2008-01-02 23:17:44.000000000 -0500
> @@ -2,11 +2,14 @@
>  #define _LINUX_MCOUNT_TRACER_H
>  
>  #include <asm/atomic.h>
> +#include <linux/sched.h>
>  
>  struct mctracer_entry {
>  	unsigned long idx;
>  	unsigned long ip;
>  	unsigned long parent_ip;
> +	char comm[TASK_COMM_LEN];
> +	pid_t pid;
>  };
>  
>  struct mctracer_trace {
> 
> -- 

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 00/11] mcount tracing utility
  2008-01-03  7:16 [RFC PATCH 00/11] mcount tracing utility Steven Rostedt
                   ` (11 preceding siblings ...)
  2008-01-03 17:22 ` [RFC PATCH 00/11] mcount tracing utility Mathieu Desnoyers
@ 2008-01-03 18:05 ` Andi Kleen
  2008-01-04  6:42 ` Frank Ch. Eigler
  2008-01-08 20:35 ` Tim Bird
  14 siblings, 0 replies; 40+ messages in thread
From: Andi Kleen @ 2008-01-03 18:05 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, William L. Irwin

Steven Rostedt <rostedt@goodmis.org> writes:

> The following patch series brings to vanilla Linux a bit of the RT kernel
> trace facility. This incorporates the "-pg" profiling option of gcc
> that will call the "mcount" function for all functions called in
> the kernel.

My personal feeling regarding this code was that it would be much
simpler/cleaner to implement a driver for the "jump tracers" implemented
in various CPUs. Basically the CPU writes all jumps into a buffer by
itself. That allows you to do many kinds of traces (although not latency
traces) too.

-Andi

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 03/11] Annotate core code that should not be traced
  2008-01-03 17:42   ` Mathieu Desnoyers
@ 2008-01-03 18:07     ` Steven Rostedt
  2008-01-03 18:34       ` Mathieu Desnoyers
  0 siblings, 1 reply; 40+ messages in thread
From: Steven Rostedt @ 2008-01-03 18:07 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: LKML, Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Gregory Haskins, Arnaldo Carvalho de Melo,
	William L. Irwin, Steven Rostedt




On Thu, 3 Jan 2008, Mathieu Desnoyers wrote:

> * Steven Rostedt (rostedt@goodmis.org) wrote:
> > Mark with "notrace" functions in core code that should not be
> > traced.  The "notrace" attribute will prevent gcc from adding
> > a call to mcount on the annotated functions.
> >
> > Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
> > Signed-off-by: Steven Rostedt <srostedt@redhat.com>
> >

> >
> > Index: linux-compile.git/drivers/clocksource/acpi_pm.c
> > ===================================================================
> > --- linux-compile.git.orig/drivers/clocksource/acpi_pm.c	2007-12-20 01:00:29.000000000 -0500
> > +++ linux-compile.git/drivers/clocksource/acpi_pm.c	2007-12-20 01:00:48.000000000 -0500
> > @@ -30,13 +30,13 @@
> >   */
> >  u32 pmtmr_ioport __read_mostly;
> >
> > -static inline u32 read_pmtmr(void)
> > +static inline notrace u32 read_pmtmr(void)
> >  {
> >  	/* mask the output to 24 bits */
> >  	return inl(pmtmr_ioport) & ACPI_PM_MASK;
> >  }
> >
> > -u32 acpi_pm_read_verified(void)
> > +notrace u32 acpi_pm_read_verified(void)
> >  {
> >  	u32 v1 = 0, v2 = 0, v3 = 0;
> >
> > @@ -56,12 +56,12 @@ u32 acpi_pm_read_verified(void)
> >  	return v2;
> >  }
> >
> > -static cycle_t acpi_pm_read_slow(void)
> > +static notrace cycle_t acpi_pm_read_slow(void)
> >  {
> >  	return (cycle_t)acpi_pm_read_verified();
> >  }
> >
> > -static cycle_t acpi_pm_read(void)
> > +static notrace cycle_t acpi_pm_read(void)
> >  {
> >  	return (cycle_t)read_pmtmr();
> >  }

Wow! A lot of questions in one paragraph! ;-)
>
> What precision can you get from this clock source ? How many cycles are
> required to read it ? Would it be useful to fall back on the CPU TSC
> when they are synchronized ? Does acpi_pm_read_verified read the
> timestamp atomically ? Is this the reason why you need to disable
> interrupts, and therefore cannot trace NMI handlers ?

This is taken from the RT patch, where the tracing happens on much more
than mcount. It is used to locate latency hot spots, like timing how
long interrupts are disabled.  It uses whatever is considered the fastest
reliable clock source. Actually, for interrupts off, a simple TSC can be
used because the tracing is only per CPU. But there's also tracing of
wake-up times, which can cross CPUs.

If the TSC is synchronized then it can be used, but there are times when
the HPET or ACPI PM timer is used instead. We add notrace to these
functions just so that we can make them available to the tracing toolkit.
But just because they are marked notrace doesn't necessarily mean they
have to be used.


>
> > Index: linux-compile.git/include/linux/preempt.h
> > ===================================================================
> > --- linux-compile.git.orig/include/linux/preempt.h	2007-12-20 01:00:29.000000000 -0500
> > +++ linux-compile.git/include/linux/preempt.h	2007-12-20 01:00:48.000000000 -0500
> > @@ -11,8 +11,8 @@
> >  #include <linux/list.h>
> >
> >  #ifdef CONFIG_DEBUG_PREEMPT
> > -  extern void fastcall add_preempt_count(int val);
> > -  extern void fastcall sub_preempt_count(int val);
> > +  extern notrace void fastcall add_preempt_count(int val);
> > +  extern notrace void fastcall sub_preempt_count(int val);
> >  #else
> >  # define add_preempt_count(val)	do { preempt_count() += (val); } while (0)
> >  # define sub_preempt_count(val)	do { preempt_count() -= (val); } while (0)
> > Index: linux-compile.git/kernel/irq/handle.c
> > ===================================================================
> > --- linux-compile.git.orig/kernel/irq/handle.c	2007-12-20 01:00:29.000000000 -0500
> > +++ linux-compile.git/kernel/irq/handle.c	2007-12-20 01:00:48.000000000 -0500
> > @@ -163,7 +163,7 @@ irqreturn_t handle_IRQ_event(unsigned in
> >   * This is the original x86 implementation which is used for every
> >   * interrupt type.
> >   */
> > -fastcall unsigned int __do_IRQ(unsigned int irq)
> > +notrace fastcall unsigned int __do_IRQ(unsigned int irq)
>
> Can you explain the notrace here ?

No.

;)

It came from the RT patch. I'll have to look deeper at these to see if
they are indeed necessary.

>
> >  {
> >  	struct irq_desc *desc = irq_desc + irq;
> >  	struct irqaction *action;
> > Index: linux-compile.git/kernel/lockdep.c
> > ===================================================================
> > --- linux-compile.git.orig/kernel/lockdep.c	2007-12-20 01:00:29.000000000 -0500
> > +++ linux-compile.git/kernel/lockdep.c	2007-12-20 01:00:48.000000000 -0500
> > @@ -270,14 +270,14 @@ static struct list_head chainhash_table[
> >  	((key1) >> (64-MAX_LOCKDEP_KEYS_BITS)) ^ \
> >  	(key2))
> >
> > -void lockdep_off(void)
> > +notrace void lockdep_off(void)
> >  {
> >  	current->lockdep_recursion++;
> >  }
> >
>
> Due to interrupt disabling in your tracing code I suppose.

Probably. Again, this came from the RT patch, and this stuff has been in
the patch for ages, so some of it was just plain paranoia. Other bits
were needed for some kind of tracing.

BTW, I'm curious: how do you handle NMIs and tracing? Do you use a
separate buffer for storing your data on NMIs, or do you have some kind
of cmpxchg that can atomically reserve parts of the trace buffer?

I turn off interrupts in my tracing just to make the code simpler.

>
> >  EXPORT_SYMBOL(lockdep_off);
> >

> > -static void check_flags(unsigned long flags)
> > +static notrace void check_flags(unsigned long flags)
> >  {
> >  #if defined(CONFIG_DEBUG_LOCKDEP) && defined(CONFIG_TRACE_IRQFLAGS)
> >  	if (!debug_locks)
> > @@ -2685,8 +2685,8 @@ static void check_flags(unsigned long fl
> >   * We are not always called with irqs disabled - do that here,
> >   * and also avoid lockdep recursion:
> >   */
> > -void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
> > -		  int trylock, int read, int check, unsigned long ip)
> > +notrace void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
> > +			  int trylock, int read, int check, unsigned long ip)
> >  {
> >  	unsigned long flags;
> >
> > @@ -2708,7 +2708,8 @@ void lock_acquire(struct lockdep_map *lo
> >
> >  EXPORT_SYMBOL_GPL(lock_acquire);
> >
> > -void lock_release(struct lockdep_map *lock, int nested, unsigned long ip)
> > +notrace void lock_release(struct lockdep_map *lock, int nested,
> > +			  unsigned long ip)
> >  {
> >  	unsigned long flags;
>
> Do you really use locks in your tracing code ? I thought you were using
> per cpu buffers.

No, but the latency_tracer does. Again, this could be just paranoia. I'm
not sure the "function trace" part of the latency tracer does this, but
other parts do. I'll work on trimming down some of these "notrace"
annotations and only add them where needed.

>
> >
> > Index: linux-compile.git/kernel/rcupdate.c
> > ===================================================================
> > --- linux-compile.git.orig/kernel/rcupdate.c	2007-12-20 01:00:29.000000000 -0500
> > +++ linux-compile.git/kernel/rcupdate.c	2007-12-20 01:00:48.000000000 -0500
> > @@ -504,7 +504,7 @@ static int __rcu_pending(struct rcu_ctrl
> >   * by the current CPU, returning 1 if so.  This function is part of the
> >   * RCU implementation; it is -not- an exported member of the RCU API.
> >   */
> > -int rcu_pending(int cpu)
> > +notrace int rcu_pending(int cpu)
> >  {
> >  	return __rcu_pending(&rcu_ctrlblk, &per_cpu(rcu_data, cpu)) ||
> >  		__rcu_pending(&rcu_bh_ctrlblk, &per_cpu(rcu_bh_data, cpu));
> > Index: linux-compile.git/kernel/spinlock.c
> > ===================================================================
> > --- linux-compile.git.orig/kernel/spinlock.c	2007-12-20 01:00:29.000000000 -0500
> > +++ linux-compile.git/kernel/spinlock.c	2007-12-20 01:00:48.000000000 -0500
> > @@ -437,7 +437,7 @@ int __lockfunc _spin_trylock_bh(spinlock
> >  }
> >  EXPORT_SYMBOL(_spin_trylock_bh);
> >
> > -int in_lock_functions(unsigned long addr)
> > +notrace int in_lock_functions(unsigned long addr)
> >  {
> >  	/* Linker adds these: start and end of __lockfunc functions */
> >  	extern char __lock_text_start[], __lock_text_end[];
> > Index: linux-compile.git/lib/smp_processor_id.c
> > ===================================================================
> > --- linux-compile.git.orig/lib/smp_processor_id.c	2007-12-20 01:00:29.000000000 -0500
> > +++ linux-compile.git/lib/smp_processor_id.c	2007-12-20 01:00:48.000000000 -0500
> > @@ -7,7 +7,7 @@
> >  #include <linux/kallsyms.h>
> >  #include <linux/sched.h>
> >
> > -unsigned int debug_smp_processor_id(void)
> > +notrace unsigned int debug_smp_processor_id(void)
> >  {
> >  	unsigned long preempt_count = preempt_count();
> >  	int this_cpu = raw_smp_processor_id();
> >
> > --

Thanks for the feedback,

-- Steve


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 01/11] Add basic support for gcc profiler instrumentation
  2008-01-03 13:58     ` Steven Rostedt
@ 2008-01-03 18:16       ` Chris Wright
  2008-01-03 19:15         ` Steven Rostedt
  2008-01-03 19:18         ` Jeremy Fitzhardinge
  0 siblings, 2 replies; 40+ messages in thread
From: Chris Wright @ 2008-01-03 18:16 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, LKML, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, William L. Irwin, Steven Rostedt,
	Chris Wright, Rusty Russell, virtualization

* Steven Rostedt (rostedt@goodmis.org) wrote:
> Hmm, I know paravirt-ops had an issue with mcount in the RT tree. I can't
> remember the exact issues, but it did have something to do with the way
> parameters were passed in.
> 
> Chris, do you remember what the issues were?

Yes, paravirt ops have a well-specified calling convention (register
based).  There was a cleanup that Andi did that caused the problem
because it removed all the "fastcall" annotations, since -mregparm=3
is now always on for i386.  Since MCOUNT disables REGPARM, the calling
convention changes (the caller pushes to the stack while the callee
expects registers) and chaos ensues.  I sent a patch to fix that quite
a few months back, but it went stale and I neglected to update it.
Would you like me to dig it up, refresh it, and resend?

thanks,
-chris

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 03/11] Annotate core code that should not be traced
  2008-01-03 18:07     ` Steven Rostedt
@ 2008-01-03 18:34       ` Mathieu Desnoyers
  0 siblings, 0 replies; 40+ messages in thread
From: Mathieu Desnoyers @ 2008-01-03 18:34 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Gregory Haskins, Arnaldo Carvalho de Melo,
	William L. Irwin, Steven Rostedt

* Steven Rostedt (rostedt@goodmis.org) wrote:
> 
> 
 ....
> >
> > >  {
> > >  	struct irq_desc *desc = irq_desc + irq;
> > >  	struct irqaction *action;
> > > Index: linux-compile.git/kernel/lockdep.c
> > > ===================================================================
> > > --- linux-compile.git.orig/kernel/lockdep.c	2007-12-20 01:00:29.000000000 -0500
> > > +++ linux-compile.git/kernel/lockdep.c	2007-12-20 01:00:48.000000000 -0500
> > > @@ -270,14 +270,14 @@ static struct list_head chainhash_table[
> > >  	((key1) >> (64-MAX_LOCKDEP_KEYS_BITS)) ^ \
> > >  	(key2))
> > >
> > > -void lockdep_off(void)
> > > +notrace void lockdep_off(void)
> > >  {
> > >  	current->lockdep_recursion++;
> > >  }
> > >
> >
> > Due to interrupt disabling in your tracing code I suppose.
> 
> probably. Again, this came from the RT patch. And this stuff has been in
> the patch for ages. So some of it was just plain paranoia. Others were
> needed for some kind of tracing.
> 
> BTW, I'm curious? How do you handle NMIs and tracing. Do you use a
> separate buffer for storing your data on NMIs or do you have some kind of
> cmpxchg that can atomically reserve parts of the trace buffer?
> 

Because I want to deal with weird cases like this:

the kernel calling the tracing code, causing a user-space page fault,
which calls the tracer, which could in turn be interrupted by an NMI.
It would become tricky to find out how many contexts could be stacked
in the worst case for each architecture. Therefore, a cmpxchg-based
algorithm is the solution I used. However, in order to make this fast,
I extended the "local atomic operations" to offer a local_cmpxchg.
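The reservation step of such a cmpxchg-based scheme might look like this in plain C. This is only a rough approximation: the kernel version uses local_cmpxchg on per-CPU buffers and handles wrap-around, none of which is shown here, and all names are invented.

```c
#include <stdatomic.h>
#include <stddef.h>

#define BUF_SIZE 1024

static char buf[BUF_SIZE];
static _Atomic size_t write_pos;

/* Reserve len bytes of trace buffer without disabling interrupts: a
 * nested context (interrupt, NMI) that preempts us between the load and
 * the compare-and-exchange just makes our exchange fail, and we retry.
 * Either the whole slot becomes ours or we start over. */
static char *reserve(size_t len)
{
	size_t cur, next;

	do {
		cur = atomic_load(&write_pos);
		next = cur + len;
		if (next > BUF_SIZE)
			return NULL;		/* buffer full */
	} while (!atomic_compare_exchange_weak(&write_pos, &cur, next));

	return buf + cur;
}
```

Unlike IRQ disabling, this stays safe even against NMIs, which cannot be masked; that is precisely the property being discussed above.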

Mathieu


-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 01/11] Add basic support for gcc profiler instrumentation
  2008-01-03 18:16       ` Chris Wright
@ 2008-01-03 19:15         ` Steven Rostedt
  2008-01-03 19:17           ` Chris Wright
  2008-01-03 19:18         ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 40+ messages in thread
From: Steven Rostedt @ 2008-01-03 19:15 UTC (permalink / raw)
  To: Chris Wright
  Cc: Ingo Molnar, LKML, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, William L. Irwin, Steven Rostedt,
	Chris Wright, Rusty Russell, virtualization


On Thu, 3 Jan 2008, Chris Wright wrote:

>
> Yes, paravirt ops have a well-specified calling convention (register
> based).  There was a cleanup that Andi did that caused the problem
> because it removed all the "fastcall" annotations, since -mregparm=3
> is now always on for i386.  Since MCOUNT disables REGPARM, the calling
> convention changes (the caller pushes to the stack while the callee
> expects registers) and chaos ensues.  I sent a patch to fix that quite
> a few months back, but it went stale and I neglected to update it.
> Would you like me to dig it up, refresh it, and resend?

Chris, thanks for the refresher.

I'm going to see if we can remove the REGPARM hack and change the way
mcount does its calls. Maybe this will fix things for us.

-- Steve


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 01/11] Add basic support for gcc profiler instrumentation
  2008-01-03 19:15         ` Steven Rostedt
@ 2008-01-03 19:17           ` Chris Wright
  0 siblings, 0 replies; 40+ messages in thread
From: Chris Wright @ 2008-01-03 19:17 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Chris Wright, Ingo Molnar, LKML, Linus Torvalds, Andrew Morton,
	Peter Zijlstra, Christoph Hellwig, Mathieu Desnoyers,
	Gregory Haskins, Arnaldo Carvalho de Melo, William L. Irwin,
	Steven Rostedt, Chris Wright, Rusty Russell, virtualization

* Steven Rostedt (rostedt@goodmis.org) wrote:
> On Thu, 3 Jan 2008, Chris Wright wrote:
> > Yes, paravirt ops have a well-specified calling convention (register
> > based).  There was a cleanup that Andi did that caused the problem
> > because it removed all the "fastcall" annotations, since -mregparm=3
> > is now always on for i386.  Since MCOUNT disables REGPARM, the calling
> > convention changes (the caller pushes to the stack while the callee
> > expects registers) and chaos ensues.  I sent a patch to fix that quite
> > a few months back, but it went stale and I neglected to update it.
> > Would you like me to dig it up, refresh it, and resend?
> 
> Chris, thanks for the refresher.
> 
> I'm going to see if we can remove the REGPARM hack and change the way
> mcount does its calls. Maybe this will fix things for us.

I don't recall why mcount disables regparm, but I think you're on the
right path to remove that dependency.

thanks,
-chris

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 01/11] Add basic support for gcc profiler instrumentation
  2008-01-03 18:16       ` Chris Wright
  2008-01-03 19:15         ` Steven Rostedt
@ 2008-01-03 19:18         ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 40+ messages in thread
From: Jeremy Fitzhardinge @ 2008-01-03 19:18 UTC (permalink / raw)
  To: Chris Wright
  Cc: Steven Rostedt, Chris Wright, Mathieu Desnoyers, Peter Zijlstra,
	Gregory Haskins, LKML, virtualization, Christoph Hellwig,
	Steven Rostedt, Arnaldo Carvalho de Melo, William L. Irwin,
	Ingo Molnar, Linus Torvalds, Andrew Morton

Chris Wright wrote:
> * Steven Rostedt (rostedt@goodmis.org) wrote:
>   
>> Hmm, I know paravirt-ops had an issue with mcount in the RT tree. I can't
>> remember the exact issues, but it did have something to do with the way
>> parameters were passed in.
>>
>> Chris, do you remember what the issues were?
>>     
>
> Yes, paravirt ops have a well-specified calling convention (register
> based).  There was a cleanup that Andi did that caused the problem
> because it removed all the "fastcall" annotations since -mregparm=3
> is now always on for i386.  Since MCOUNT disables REGPARM, the calling
> convention changes (caller pushes to the stack, callee expects registers)
> and chaos ensues.  I sent a patch to fix that quite a few months back, but
> it went stale and I neglected to update it.  Would you like me to dig
> it up, refresh it, and resend?

Ingo/Andrew have been accepting patches to systematically remove all the
fastcall annotations from the kernel, so adding them back isn't going to
help.

Ingo and I discussed whether we need to reannotate paravirt.h (either
with fastcall or something else indicating a register-only calling
convention), specifically because of the -pg issue, but I think the
conclusion was that whatever problem existed no longer does, and there's
no incompatibility between -pg and regparm.

    J

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 00/11] mcount tracing utility
  2008-01-03  7:16 [RFC PATCH 00/11] mcount tracing utility Steven Rostedt
                   ` (12 preceding siblings ...)
  2008-01-03 18:05 ` Andi Kleen
@ 2008-01-04  6:42 ` Frank Ch. Eigler
  2008-01-08 20:35 ` Tim Bird
  14 siblings, 0 replies; 40+ messages in thread
From: Frank Ch. Eigler @ 2008-01-04  6:42 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, William L. Irwin

Steven Rostedt <rostedt@goodmis.org> writes:

> The following patch series brings to vanilla Linux a bit of the RT kernel
> trace facility. This incorporates the "-pg" profiling option of gcc
> that will call the "mcount" function for all functions called in
> the kernel.
> [...]
> [Future:] SystemTap:
> ----------
> One thing that Arnaldo and I discussed last year was using systemtap to
> add hooks into the kernel to start and stop tracing.  

Sure.  The dual of this makes sense too: letting systemtap scripts
hook up to the mcount callback itself, for purposes beyond just
tracing the function calls.

> kprobes is too heavy to do on all function calls, but it would be
> perfect to add to non-hot paths to start and stop the tracer.

(Note that kprobes are not the only event sources systemtap can use:
markers, timers, procfs control files, and some others.  Any
combination of these can be used in a script to express start/stop
decisions.)


- FChE

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 10/11] mcount tracer show task comm and pid
  2008-01-03 17:56   ` Mathieu Desnoyers
@ 2008-01-06 15:37     ` Ingo Molnar
  2008-01-07  4:45       ` Mathieu Desnoyers
  0 siblings, 1 reply; 40+ messages in thread
From: Ingo Molnar @ 2008-01-06 15:37 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Steven Rostedt, LKML, Linus Torvalds, Andrew Morton,
	Peter Zijlstra, Christoph Hellwig, Gregory Haskins,
	Arnaldo Carvalho de Melo, William L. Irwin, Steven Rostedt


* Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:

> > @@ -34,6 +34,7 @@ mctracer_add_trace_entry(struct mctracer
> >  {
> >  	unsigned long idx, idx_next;
> >  	struct mctracer_entry *entry;
> > +	struct task_struct *tsk = current;
> 
> Aren't there situations, like in the middle of a context switch, where
> current is not valid?  It also poses a problem for early boot and NMI
> tracing.

no such problems on x86.

	Ingo

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 10/11] mcount tracer show task comm and pid
  2008-01-06 15:37     ` Ingo Molnar
@ 2008-01-07  4:45       ` Mathieu Desnoyers
  2008-01-09 16:45         ` Ingo Molnar
  0 siblings, 1 reply; 40+ messages in thread
From: Mathieu Desnoyers @ 2008-01-07  4:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, LKML, Linus Torvalds, Andrew Morton,
	Peter Zijlstra, Christoph Hellwig, Gregory Haskins,
	Arnaldo Carvalho de Melo, William L. Irwin, Steven Rostedt

* Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> 
> > > @@ -34,6 +34,7 @@ mctracer_add_trace_entry(struct mctracer
> > >  {
> > >  	unsigned long idx, idx_next;
> > >  	struct mctracer_entry *entry;
> > > +	struct task_struct *tsk = current;
> > 
> > Aren't there situations, like in the middle of a context switch, where
> > current is not valid?  It also poses a problem for early boot and NMI
> > tracing.
> 
> no such problems on x86.
> 
> 	Ingo

I based my comments on the following code snippet, but I think I'm
starting to understand what makes it "so special":

arch/x86/mm/fault_32.c :

static inline int vmalloc_fault(unsigned long address)
{
        unsigned long pgd_paddr;
        pmd_t *pmd_k;
        pte_t *pte_k;
        /*
         * Synchronize this task's top level page-table
         * with the 'reference' page table.
         *
---->    * Do _not_ use "current" here. We might be inside
         * an interrupt in the middle of a task switch..
         */
        pgd_paddr = read_cr3();
        pmd_k = vmalloc_sync_one(__va(pgd_paddr), address);
        if (!pmd_k)
                return -1;
        pte_k = pte_offset_kernel(pmd_k, address);
        if (!pte_present(*pte_k))
                return -1;
        return 0;
}

At context switch on x86, loading the registers is done first, and only
after that is the current pointer set. However, for vmalloc faults, it's
the value in the cr3 register that matters, which may not correspond to
the cr3 value saved in "current".

So, I think using the "pid" and "comm" fields of current, even in NMI
context, is not a problem, just as you said. For early boot, the current
task will be init_task, which has pid = 0 and comm = "swapper", still
ok.

Thanks for pointing it out.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 02/11] Add fastcall to do_IRQ for i386
  2008-01-03 17:47     ` Steven Rostedt
@ 2008-01-07  4:50       ` H. Peter Anvin
  2008-01-07 12:42         ` Steven Rostedt
  0 siblings, 1 reply; 40+ messages in thread
From: H. Peter Anvin @ 2008-01-07  4:50 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, LKML, Ingo Molnar, Linus Torvalds,
	Andrew Morton, Peter Zijlstra, Christoph Hellwig,
	Gregory Haskins, Arnaldo Carvalho de Melo, William L. Irwin,
	Steven Rostedt

Steven Rostedt wrote:
> 
> On Thu, 3 Jan 2008, Mathieu Desnoyers wrote:
>> I would propose to try to see how we can #ifdef two different __mcount
>> assembly functions that would prepare the stack appropriately for each
>> REGPARM cases.
> 
> I have to confess that I've been testing this mostly on x86_64, which
> doesn't have the troubles with REGPARM as i386 does. I'll need to
> investigate this a bit deeper on i386 alone.
> 

I thought we had dropped support for the non-REGPARM case, so why don't 
we just make it work for REGPARM and be done with it?

	-hpa


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 02/11] Add fastcall to do_IRQ for i386
  2008-01-07  4:50       ` H. Peter Anvin
@ 2008-01-07 12:42         ` Steven Rostedt
  0 siblings, 0 replies; 40+ messages in thread
From: Steven Rostedt @ 2008-01-07 12:42 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Mathieu Desnoyers, LKML, Ingo Molnar, Linus Torvalds,
	Andrew Morton, Peter Zijlstra, Christoph Hellwig,
	Gregory Haskins, Arnaldo Carvalho de Melo, William L. Irwin,
	Steven Rostedt


On Sun, 6 Jan 2008, H. Peter Anvin wrote:
>
> I thought we had dropped support for the non-REGPARM case, so why don't
> we just make it work for REGPARM and be done with it?
>

I'm working on that ;-)

-- Steve


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 00/11] mcount tracing utility
  2008-01-03  7:16 [RFC PATCH 00/11] mcount tracing utility Steven Rostedt
                   ` (13 preceding siblings ...)
  2008-01-04  6:42 ` Frank Ch. Eigler
@ 2008-01-08 20:35 ` Tim Bird
  14 siblings, 0 replies; 40+ messages in thread
From: Tim Bird @ 2008-01-08 20:35 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Christoph Hellwig, Mathieu Desnoyers, Gregory Haskins,
	Arnaldo Carvalho de Melo, William L. Irwin, Nicholas Mc Guire

Steven Rostedt wrote:
> The following patch series brings to vanilla Linux a bit of the RT kernel
> trace facility. This incorporates the "-pg" profiling option of gcc
> that will call the "mcount" function for all functions called in
> the kernel.
> 
> This patch series implements the code for x86 (32 and 64 bit), but
> other archs can easily be implemented as well.

Steven,

This is really exciting!

As a former maintainer of the (out-of-tree) Kernel Function Trace
system, I really welcome this.  I'm just getting out from under
a backlog of work due to the holiday break, but I'm very interested.
I will take a detailed look at this this week.

I have been working with -finstrument-functions for a few years
now, so I know of a few gotchas with that (e.g. it's currently broken
on ARM EABI with GCC 4.x).  This bug is one of the issues that has
prevented me from attempting to mainline the KFT work over the last
year.

Please keep me CC'ed on developments in this area, and let me
know if there are any specific things I can do to help.  I'd be
very interested in helping out with non-x86 arch support.

Regards,
 -- Tim

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Corporation of America
=============================


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 10/11] mcount tracer show task comm and pid
  2008-01-07  4:45       ` Mathieu Desnoyers
@ 2008-01-09 16:45         ` Ingo Molnar
  0 siblings, 0 replies; 40+ messages in thread
From: Ingo Molnar @ 2008-01-09 16:45 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Steven Rostedt, LKML, Linus Torvalds, Andrew Morton,
	Peter Zijlstra, Christoph Hellwig, Gregory Haskins,
	Arnaldo Carvalho de Melo, William L. Irwin, Steven Rostedt


* Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:

> static inline int vmalloc_fault(unsigned long address)
> {
>         unsigned long pgd_paddr;
>         pmd_t *pmd_k;
>         pte_t *pte_k;
>         /*
>          * Synchronize this task's top level page-table
>          * with the 'reference' page table.
>          *
> ---->    * Do _not_ use "current" here. We might be inside
>          * an interrupt in the middle of a task switch..
>          */
>         pgd_paddr = read_cr3();
>         pmd_k = vmalloc_sync_one(__va(pgd_paddr), address);
>         if (!pmd_k)
>                 return -1;
>         pte_k = pte_offset_kernel(pmd_k, address);
>         if (!pte_present(*pte_k))
>                 return -1;
>         return 0;
> }
> 
> At context switch on x86, loading the registers is done first, and 
> only after that is the current pointer set. However, for vmalloc 
> faults, it's the value in the cr3 register that matters, which 
> may not correspond to the cr3 value saved in "current".
> 
> So, I think using the "pid" and "comm" fields of current, even in NMI 
> context, is not a problem, just as you said. For early boot, the 
> current task will be init_task, which has pid = 0 and comm = 
> "swapper", still ok.

yeah - during the context-switch the value of 'current' might be 'stale' 
in a number of ways, but it's always atomically and coherently either 
pointing to the previous task or the next task. So from a tracing POV 
it's perfectly safe to use it (and we've been doing that for ages with 
the mcount stuff).

(The notrace mcount exclusions aren't really to avoid any tracing 
badness, they are mostly to make the trace less spammy and more 
readable.)

	Ingo

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2008-01-09 16:46 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-01-03  7:16 [RFC PATCH 00/11] mcount tracing utility Steven Rostedt
2008-01-03  7:16 ` [RFC PATCH 01/11] Add basic support for gcc profiler instrumentation Steven Rostedt
2008-01-03  8:31   ` Sam Ravnborg
2008-01-03 14:03     ` Steven Rostedt
2008-01-03  9:21   ` Ingo Molnar
2008-01-03 13:58     ` Steven Rostedt
2008-01-03 18:16       ` Chris Wright
2008-01-03 19:15         ` Steven Rostedt
2008-01-03 19:17           ` Chris Wright
2008-01-03 19:18         ` Jeremy Fitzhardinge
2008-01-03 16:01   ` Daniel Walker
2008-01-03 17:35   ` Mathieu Desnoyers
2008-01-03 17:55     ` Steven Rostedt
2008-01-03  7:16 ` [RFC PATCH 02/11] Add fastcall to do_IRQ for i386 Steven Rostedt
2008-01-03 17:36   ` Mathieu Desnoyers
2008-01-03 17:47     ` Steven Rostedt
2008-01-07  4:50       ` H. Peter Anvin
2008-01-07 12:42         ` Steven Rostedt
2008-01-03  7:16 ` [RFC PATCH 03/11] Annotate core code that should not be traced Steven Rostedt
2008-01-03 17:42   ` Mathieu Desnoyers
2008-01-03 18:07     ` Steven Rostedt
2008-01-03 18:34       ` Mathieu Desnoyers
2008-01-03  7:16 ` [RFC PATCH 04/11] i386: notrace annotations Steven Rostedt
2008-01-03 17:52   ` Mathieu Desnoyers
2008-01-03  7:16 ` [RFC PATCH 05/11] x86_64: " Steven Rostedt
2008-01-03  7:16 ` [RFC PATCH 06/11] add notrace annotations to vsyscall Steven Rostedt
2008-01-03  7:16 ` [RFC PATCH 07/11] mcount based trace in the form of a header file library Steven Rostedt
2008-01-03  7:16 ` [RFC PATCH 08/11] tracer add debugfs interface Steven Rostedt
2008-01-03  7:16 ` [RFC PATCH 09/11] mcount tracer output file Steven Rostedt
2008-01-03  7:16 ` [RFC PATCH 10/11] mcount tracer show task comm and pid Steven Rostedt
2008-01-03 17:56   ` Mathieu Desnoyers
2008-01-06 15:37     ` Ingo Molnar
2008-01-07  4:45       ` Mathieu Desnoyers
2008-01-09 16:45         ` Ingo Molnar
2008-01-03  7:16 ` [RFC PATCH 11/11] Add a symbol only trace output Steven Rostedt
2008-01-03 17:22 ` [RFC PATCH 00/11] mcount tracing utility Mathieu Desnoyers
2008-01-03 17:42   ` Steven Rostedt
2008-01-03 18:05 ` Andi Kleen
2008-01-04  6:42 ` Frank Ch. Eigler
2008-01-08 20:35 ` Tim Bird

This is a public inbox; see mirroring instructions
for how to clone and mirror all data and code used for this inbox,
as well as URLs for NNTP newsgroup(s).