linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch 00/10] Linux Kernel Markers for 2.6.21-mm2
@ 2007-05-10  1:55 Mathieu Desnoyers
  2007-05-10  1:55 ` [patch 01/10] Linux Kernel Markers - Add kconfig menus for the marker code Mathieu Desnoyers
                   ` (10 more replies)
  0 siblings, 11 replies; 61+ messages in thread
From: Mathieu Desnoyers @ 2007-05-10  1:55 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, hch

Hi Andrew,

He is an updated, folded, version of the Linux Kernel Markers. It replaces the
version found in 2.6.21-mm2 at the exact same spot in the series file.

Main changes :
- It renames the MARK() trace_mark(), as suggested by Christoph Hellwig.
- It now defines the structures contained out of the marker section outside of
  the #ifdef __KERNEL__ so the SystemTAP tool can use the header to extract the
  markers from the kernel binary.
- The i386 optimization has been tweaked a bit : the call is now in the else
  branch. It has something to do with the i386 "fast path" being the taken
  branch of a jump, to make loops faster. It seems that the if/else statements
  are also affected : the if () will be faster than the else. Therefore, I put
  the call in the else, to make sure the fastest path is when the marker is
  disabled.
- I added the blktrace port to the markers infrastructure. It makes one in-tree
  markers user.

Please remove :

linux-kernel-markers-kconfig-menus.patch
linux-kernel-markers-architecture-independant-code.patch
linux-kernel-markers-powerpc-optimization.patch
linux-kernel-markers-i386-optimization.patch
markers-add-instrumentation-markers-menus-to-avr32.patch
linux-kernel-markers-non-optimized-architectures.patch
markers-alpha-and-avr32-supportadd-alpha-markerh-add-arm26-markerh.patch
linux-kernel-markers-documentation.patch
#
markers-define-the-linker-macro-extra_rwdata.patch
markers-use-extra_rwdata-in-architectures.patch


And add, instead :

linux-kernel-markers-kconfig-menus.patch
linux-kernel-markers-architecture-independant-code.patch
linux-kernel-markers-header-visible-from-userspace.h
linux-kernel-markers-powerpc-optimization.patch
linux-kernel-markers-i386-optimization.patch
linux-kernel-markers-non-optimized-architectures.patch
linux-kernel-markers-documentation.patch
#
linux-kernel-markers-define-the-linker-macro-extra_rwdata.patch
linux-kernel-markers-use-extra_rwdata-in-architectures.patch
#
linux-kernel-markers-port-blktrace-to-markers.patch


Thanks,

Mathieu


-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [patch 01/10] Linux Kernel Markers - Add kconfig menus for the marker code
  2007-05-10  1:55 [patch 00/10] Linux Kernel Markers for 2.6.21-mm2 Mathieu Desnoyers
@ 2007-05-10  1:55 ` Mathieu Desnoyers
  2007-05-10  6:57   ` Christoph Hellwig
  2007-05-10  1:55 ` [patch 02/10] Linux Kernel Markers, architecture independent code Mathieu Desnoyers
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 61+ messages in thread
From: Mathieu Desnoyers @ 2007-05-10  1:55 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, hch, Mathieu Desnoyers, Adrian Bunk

[-- Attachment #1: linux-kernel-markers-kconfig-menus.patch --]
[-- Type: text/plain, Size: 25693 bytes --]

With the increasing complexity of today's user-space application and the wide
deployment of SMP systems, the users need an increasing understanding of the
behavior and performance of a system across multiple processes/different
execution contexts/multiple CPUs.  In applications such as large clusters
(Google, IBM), video acquisition (Autodesk), embedded real-time systems (Wind
River, Monta Vista, Sony) or sysadmin/programmer-type tasks (SystemTAP from
Redhat), a tool that permits tracing of kernel-user space interaction becomes
necessary.

Usage of such tools have been made to successfully pinpoint problems such as:
latency issues in a user-space video acquisition application, slowdown
problems in large clusters due to a switch to a different filesystems with a
different cache size, abnormal Linux scheduler latency (just to name a few
that I have personally investigated).

The currently existing solutions does not give a system-wide overview of what
- and when - things are happening on the system.  Ptracing a program works
with few processes, but quickly becomes useless when it comes to keeping track
of many processes.

Bugs occuring because of bad interaction of such complex systems can be very
hard to find due to the fact that they occur rarely (sometimes once a week on
hundreds of machines).  One can therefore only hope at having the best
conditions to statistically reproduce the bug while extracting information
from the system.  Some bugs have been successfully found at Google using their
ktrace tracer only because they could enable it on production machines and
therefore recreate the same context where the bug happened.

Therefore, it makes sense to offer an instrumentation set of the most relevant
events occurring in the Linux that can have the smallest performance cost
possible when not active while not requiring a reboot of a production system
to activate.  This is essentially what the markers are providing.

Since we cannot limit the growth of the Linux kernel, nor can we pre-determine
each and every "interesting" instrumentation within each subsystem and driver,
it is sensible to let this task to the persons who knows the best their code. 
Adding instrumentation should therefore be as easy as adding and maintaining a
"printk" in the kernel code from the developer's point of view.

Towards a complete tracing mechanism in the Linux kernel, the markers are only
one step forward.  The following step is to connect probes to those markers
that will record the tracing information in buffers exported to user-space,
organized in timestamped "events".  Probe callbacks are responsible for
serializing the information passed as parameter to the markers (described by
the format string) into the events.  A control mechanism to activate/stop the
tracing is required, as well as a daemon that maps the buffers to write them
to disk or send them through the network.

Keeping track of the events also requires a centralized infrastructure : the
idea is to assign a unique ID to each event so they can be later recognized in
the trace.  Keeping in mind that recording the complete instrumentation site
name string for each event would be more that inefficient, assigning a numeric
unique identifier makes sense.

Finally, support for gathering events coming from user-space, with a minimal
performance impact, is very useful to see the interaction between the system's
execution contexts.

The last steps are currently implemented in Linux Trace Toolkit Next
Generation (LTTng).

The SystemTAP project could clearly benefit from such an infrastructure for
tracing.  In addition, they would be providing support for dynamic addition of
kernel probes through breakpoints/jumps when possible, with the associated
restrictions (accessing local variables, reentrancy, speed).




This marker infrastructure is a hook-callback mechanism.  It is meant to have
an impact as low as possible on the system performances when no callback
(probe) is connected so markers (hooks) can be compiled into a production
kernel without noticeable slowdown.

The rationale behind this mechanism the following :
1 - It makes sense to have instrumentation (for tracing, profiling)
    within the kernel source tree so that it can follow its evolution.
    Other options, such as kprobes, imply maintaining an external set of
    instrumentation that must be adapted to each kernel version.
    Although it may make sense for distributions, it is not well suited
    for kernel developers, since they rarely work on a major
    distribution image.
2 - kprobes, although being a very good attempt at providing a dynamic
    hooking mechanism that has no impact when disabled, suffers from
    important limitations :
  a - It cannot access local variables of a function at a particular
      point within its body that will be consistent thorough the kernel
      versions without involving a lot of recurrent hair-pulling.
  b - Kprobes is slow, since it involves going though a trap each time
      a probe site is executed. Even though the djprobes project made a
      good effort to make things faster, it cannot currently instrument
      fully-preemptible kernels and does not solve (1), (2a) and (2c).
  c - On the reentrancy side, going though a trap (thus playing with
      interrupt enable/disable) and taking spinlocks are not suited to
      some code paths, i.e. :
      kernel/lockdep.c, printk (within the lockdep_on()/lockdep_off()).
      It must be understood that some code paths interesting for
      instrumentation often present a particular reentrancy challenge.

Some more details :

The probe callback connection to its markers is done dynamically.  A predicted
branch is used to skip the hook stack setup and function call when the marker
is "disabled" (no probe is connected).  Further optimizations can be
implemented for each architecture to make this branch faster.

Instrumentation of a subsystem becomes therefore a straightforward task.  One
has to add instrumentation within the key locations of the kernel code in the
following form :

trace_mark(subsystem_event, "%d %p", myint, myptr);


Why use the markers instead of kprobes?

The rationale behind this mechanism the following :

1 - It makes sense to have instrumentation (for tracing, profiling)
    within the kernel source tree so that it can follow its evolution.
    Other options, such as kprobes, imply maintaining an external set of
    instrumentation that must be adapted to each kernel version.
    Although it may make sense for distributions, it is not well suited
    for kernel developers, since they rarely work on a major
    distribution image.
2 - kprobes, although being a very good attempt at providing a dynamic
    hooking mechanism that has no impact when disabled, suffers from
    important limitations :
  a - It cannot access local variables of a function at a particular
      point within its body that will be consistent thorough the kernel
      versions without involving a lot of recurrent hair-pulling.
  b - Kprobes is slow, since it involves going though a trap each time
      a probe site is executed. Even though the djprobes project made a
      good effort to make things faster, it cannot currently instrument
      fully-preemptible kernels and does not solve (1), (2a) and (2c).
  c - On the reentrancy side, going though a trap (thus playing with
      interrupt enable/disable) and taking spinlocks are not suited to
      some code paths, i.e. :
      kernel/lockdep.c, printk (within the lockdep_on()/lockdep_off()).
      It must be understood that some code paths interesting for
      instrumentation often present a particular reentrancy challenge.




Jim Keniston <jkenisto@us.ibm.com> adds:

kprobes remains a vital foundation for SystemTap.  But markers are attactive
as an alternate source of trace/debug info.  Here's why:

1. Markers will live in the kernel and presumably be kept up to date by
   the maintainers of the enclosing code.  We have a growing set of tapsets
   (probe libraries), each of which "knows" the source code for a certain area
   of the kernel.  Whenever the underlying kernel code changes (e.g., a
   function or one of its args disappears or is renamed), there's a chance
   that the tapset will become invalid until we bring it back in sync with the
   kernel.  As you can imagine, maintaining tapsets separate from the kernel
   source is a maintenance headache.  Markers could mitigate this.

2. Because the kernel code is highly optimized, the kernel's dwarf info
   doesn't always accurately reflect which variables have which values on
   which lines (sometimes even upon entry to a function).  A marker is a way
   to ensure that values of interest are available to SystemTap at marked
   points.

3. Sometimes the overhead of a kprobe probepoint is too much (either in
   terms of time or locking) for the particular hotspot we want to probe.


In OLS2006 proceedings, vol. 1
http://www.linuxsymposium.org/2006/linuxsymposium_procv1.pdf

Frank C. Eigler, from SystemTAP, presents its "static probing markers"
(pp. 261-268) in his paper "Problem Solving With Systemtap".

He explains the advantages :

"In exchange for this effort, systemtap marker-based probes are faster and
 more precise than kprobes.  The better precision comes from not having to
 covet the compiler's favours.  Such fickle favours include retaining
 clean boundaries in the instruction stream between interesting statements,
 and precisely describing positions of variables in the stack frame.  Since
 markers don't rely on debugging information, neither favour is required,
 and the compiler can channel its charms into unabated optimization.  The
 speed advantage comes from using direct call instructions rather than int 3
 breakpoints to dispatch to the systemtap handlers.  We will see below just
 how big a difference this makes."

He does a comparison of his "simple" marker solution with kprobes (his simple
solution looks like my generic markers, but with a major race condition).  I
also posted numbers about the markers performance impact a few months ago in
the initial thread.  I can dig into my emails to find them for you if you
consider it important for the Changelog.

He concludes with :

"To the extent that is true, we propose that these groups consider using a
 shared pool of static markers as the basic kernel-side instrumentation
 mechanism.  If they prove to have as low dormant cost and as high active
 performance as initial experience suggests, perhaps this could motivate the
 various tracing efforts and kernel subsystem developers to finally join
 forces.  Let's designate standard trace/probe points once and for all. 
 Tracing backends can attach to these markers the same way systemtap would. 
 There would be no need for them to maintain kernel patches any more. 
 Let's think about it."


This patch:

Add Kconfig menus for the marker code.

[bunk@stusta.de: Never ever select MODULES]
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/alpha/Kconfig       |    6 ++++++
 arch/arm/Kconfig         |    6 ++++++
 arch/arm26/Kconfig       |    6 ++++++
 arch/avr32/Kconfig.debug |    7 +++++++
 arch/cris/Kconfig        |    6 ++++++
 arch/frv/Kconfig         |    6 ++++++
 arch/h8300/Kconfig       |    6 ++++++
 arch/i386/Kconfig        |    3 +++
 arch/ia64/Kconfig        |    3 +++
 arch/m32r/Kconfig        |    6 ++++++
 arch/m68k/Kconfig        |    6 ++++++
 arch/m68knommu/Kconfig   |    6 ++++++
 arch/mips/Kconfig        |    6 ++++++
 arch/parisc/Kconfig      |    6 ++++++
 arch/powerpc/Kconfig     |    3 +++
 arch/ppc/Kconfig         |    6 ++++++
 arch/s390/Kconfig        |    2 ++
 arch/sh/Kconfig          |    6 ++++++
 arch/sh64/Kconfig        |    6 ++++++
 arch/sparc/Kconfig       |    2 ++
 arch/sparc64/Kconfig     |    3 +++
 arch/um/Kconfig          |    6 ++++++
 arch/v850/Kconfig        |    6 ++++++
 arch/x86_64/Kconfig      |    3 +++
 arch/xtensa/Kconfig      |    6 ++++++
 kernel/Kconfig.marker    |   20 ++++++++++++++++++++
 kernel/module.c          |    1 +
 27 files changed, 149 insertions(+)

Index: linux-2.6-lttng/arch/alpha/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/alpha/Kconfig	2007-05-09 18:14:52.000000000 -0400
+++ linux-2.6-lttng/arch/alpha/Kconfig	2007-05-09 18:15:34.000000000 -0400
@@ -642,6 +642,12 @@
 
 source "arch/alpha/oprofile/Kconfig"
 
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.marker"
+
+endmenu
+
 source "arch/alpha/Kconfig.debug"
 
 source "security/Kconfig"
Index: linux-2.6-lttng/arch/arm/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/arm/Kconfig	2007-05-09 18:14:52.000000000 -0400
+++ linux-2.6-lttng/arch/arm/Kconfig	2007-05-09 18:15:34.000000000 -0400
@@ -1025,6 +1025,12 @@
 
 source "arch/arm/oprofile/Kconfig"
 
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.marker"
+
+endmenu
+
 source "arch/arm/Kconfig.debug"
 
 source "security/Kconfig"
Index: linux-2.6-lttng/arch/arm26/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/arm26/Kconfig	2007-05-09 18:14:52.000000000 -0400
+++ linux-2.6-lttng/arch/arm26/Kconfig	2007-05-09 18:15:34.000000000 -0400
@@ -241,6 +241,12 @@
 
 source "drivers/usb/Kconfig"
 
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.marker"
+
+endmenu
+
 source "arch/arm26/Kconfig.debug"
 
 source "security/Kconfig"
Index: linux-2.6-lttng/arch/cris/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/cris/Kconfig	2007-05-09 18:14:52.000000000 -0400
+++ linux-2.6-lttng/arch/cris/Kconfig	2007-05-09 18:15:34.000000000 -0400
@@ -198,6 +198,12 @@
 
 source "drivers/usb/Kconfig"
 
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.marker"
+
+endmenu
+
 source "arch/cris/Kconfig.debug"
 
 source "security/Kconfig"
Index: linux-2.6-lttng/arch/frv/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/frv/Kconfig	2007-05-09 18:14:52.000000000 -0400
+++ linux-2.6-lttng/arch/frv/Kconfig	2007-05-09 18:15:34.000000000 -0400
@@ -385,6 +385,12 @@
 
 source "fs/Kconfig"
 
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.marker"
+
+endmenu
+
 source "arch/frv/Kconfig.debug"
 
 source "security/Kconfig"
Index: linux-2.6-lttng/arch/h8300/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/h8300/Kconfig	2007-05-09 18:14:52.000000000 -0400
+++ linux-2.6-lttng/arch/h8300/Kconfig	2007-05-09 18:15:34.000000000 -0400
@@ -220,6 +220,12 @@
 
 source "fs/Kconfig"
 
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.marker"
+
+endmenu
+
 source "arch/h8300/Kconfig.debug"
 
 source "security/Kconfig"
Index: linux-2.6-lttng/arch/i386/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/i386/Kconfig	2007-05-09 18:14:52.000000000 -0400
+++ linux-2.6-lttng/arch/i386/Kconfig	2007-05-09 18:15:34.000000000 -0400
@@ -1229,6 +1229,9 @@
 	  a probepoint and specifies the callback.  Kprobes is useful
 	  for kernel debugging, non-intrusive instrumentation and testing.
 	  If in doubt, say "N".
+
+source "kernel/Kconfig.marker"
+
 endmenu
 
 source "arch/i386/Kconfig.debug"
Index: linux-2.6-lttng/arch/ia64/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/ia64/Kconfig	2007-05-09 18:14:52.000000000 -0400
+++ linux-2.6-lttng/arch/ia64/Kconfig	2007-05-09 18:15:34.000000000 -0400
@@ -591,6 +591,9 @@
 	  a probepoint and specifies the callback.  Kprobes is useful
 	  for kernel debugging, non-intrusive instrumentation and testing.
 	  If in doubt, say "N".
+
+source "kernel/Kconfig.marker"
+
 endmenu
 
 source "arch/ia64/Kconfig.debug"
Index: linux-2.6-lttng/arch/m32r/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/m32r/Kconfig	2007-05-09 18:14:52.000000000 -0400
+++ linux-2.6-lttng/arch/m32r/Kconfig	2007-05-09 18:15:34.000000000 -0400
@@ -401,6 +401,12 @@
 
 source "arch/m32r/oprofile/Kconfig"
 
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.marker"
+
+endmenu
+
 source "arch/m32r/Kconfig.debug"
 
 source "security/Kconfig"
Index: linux-2.6-lttng/arch/m68k/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/m68k/Kconfig	2007-05-09 18:14:52.000000000 -0400
+++ linux-2.6-lttng/arch/m68k/Kconfig	2007-05-09 18:15:34.000000000 -0400
@@ -670,6 +670,12 @@
 
 source "fs/Kconfig"
 
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.marker"
+
+endmenu
+
 source "arch/m68k/Kconfig.debug"
 
 source "security/Kconfig"
Index: linux-2.6-lttng/arch/m68knommu/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/m68knommu/Kconfig	2007-05-09 18:14:52.000000000 -0400
+++ linux-2.6-lttng/arch/m68knommu/Kconfig	2007-05-09 18:15:34.000000000 -0400
@@ -676,6 +676,12 @@
 
 source "fs/Kconfig"
 
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.marker"
+
+endmenu
+
 source "arch/m68knommu/Kconfig.debug"
 
 source "security/Kconfig"
Index: linux-2.6-lttng/arch/mips/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/mips/Kconfig	2007-05-09 18:14:52.000000000 -0400
+++ linux-2.6-lttng/arch/mips/Kconfig	2007-05-09 18:15:34.000000000 -0400
@@ -2150,6 +2150,12 @@
 
 source "arch/mips/oprofile/Kconfig"
 
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.marker"
+
+endmenu
+
 source "arch/mips/Kconfig.debug"
 
 source "security/Kconfig"
Index: linux-2.6-lttng/arch/parisc/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/parisc/Kconfig	2007-05-09 18:14:52.000000000 -0400
+++ linux-2.6-lttng/arch/parisc/Kconfig	2007-05-09 18:15:34.000000000 -0400
@@ -269,6 +269,12 @@
 
 source "arch/parisc/oprofile/Kconfig"
 
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.marker"
+
+endmenu
+
 source "arch/parisc/Kconfig.debug"
 
 source "security/Kconfig"
Index: linux-2.6-lttng/arch/powerpc/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/powerpc/Kconfig	2007-05-09 18:14:52.000000000 -0400
+++ linux-2.6-lttng/arch/powerpc/Kconfig	2007-05-09 18:15:34.000000000 -0400
@@ -902,6 +902,9 @@
 	  a probepoint and specifies the callback.  Kprobes is useful
 	  for kernel debugging, non-intrusive instrumentation and testing.
 	  If in doubt, say "N".
+
+source "kernel/Kconfig.marker"
+
 endmenu
 
 source "arch/powerpc/Kconfig.debug"
Index: linux-2.6-lttng/arch/ppc/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/ppc/Kconfig	2007-05-09 18:14:52.000000000 -0400
+++ linux-2.6-lttng/arch/ppc/Kconfig	2007-05-09 18:15:34.000000000 -0400
@@ -1453,8 +1453,14 @@
 
 source "lib/Kconfig"
 
+menu "Instrumentation Support"
+
 source "arch/powerpc/oprofile/Kconfig"
 
+source "kernel/Kconfig.marker"
+
+endmenu
+
 source "arch/ppc/Kconfig.debug"
 
 source "security/Kconfig"
Index: linux-2.6-lttng/arch/s390/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/s390/Kconfig	2007-05-09 18:14:52.000000000 -0400
+++ linux-2.6-lttng/arch/s390/Kconfig	2007-05-09 18:15:34.000000000 -0400
@@ -567,6 +567,8 @@
 	  for kernel debugging, non-intrusive instrumentation and testing.
 	  If in doubt, say "N".
 
+source "kernel/Kconfig.marker"
+
 endmenu
 
 source "arch/s390/Kconfig.debug"
Index: linux-2.6-lttng/arch/sh/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/sh/Kconfig	2007-05-09 18:14:52.000000000 -0400
+++ linux-2.6-lttng/arch/sh/Kconfig	2007-05-09 18:15:34.000000000 -0400
@@ -738,6 +738,12 @@
 
 source "arch/sh/oprofile/Kconfig"
 
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.marker"
+
+endmenu
+
 source "arch/sh/Kconfig.debug"
 
 source "security/Kconfig"
Index: linux-2.6-lttng/arch/sh64/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/sh64/Kconfig	2007-05-09 18:14:52.000000000 -0400
+++ linux-2.6-lttng/arch/sh64/Kconfig	2007-05-09 18:15:34.000000000 -0400
@@ -281,6 +281,12 @@
 
 source "arch/sh64/oprofile/Kconfig"
 
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.marker"
+
+endmenu
+
 source "arch/sh64/Kconfig.debug"
 
 source "security/Kconfig"
Index: linux-2.6-lttng/arch/sparc/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/sparc/Kconfig	2007-05-09 18:14:52.000000000 -0400
+++ linux-2.6-lttng/arch/sparc/Kconfig	2007-05-09 18:15:34.000000000 -0400
@@ -306,6 +306,8 @@
 
 source "arch/sparc/oprofile/Kconfig"
 
+source "kernel/Kconfig.marker"
+
 endmenu
 
 source "arch/sparc/Kconfig.debug"
Index: linux-2.6-lttng/arch/sparc64/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/sparc64/Kconfig	2007-05-09 18:14:52.000000000 -0400
+++ linux-2.6-lttng/arch/sparc64/Kconfig	2007-05-09 18:15:34.000000000 -0400
@@ -444,6 +444,9 @@
 	  a probepoint and specifies the callback.  Kprobes is useful
 	  for kernel debugging, non-intrusive instrumentation and testing.
 	  If in doubt, say "N".
+
+source "kernel/Kconfig.marker"
+
 endmenu
 
 source "arch/sparc64/Kconfig.debug"
Index: linux-2.6-lttng/arch/um/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/um/Kconfig	2007-05-09 18:14:52.000000000 -0400
+++ linux-2.6-lttng/arch/um/Kconfig	2007-05-09 18:15:34.000000000 -0400
@@ -333,4 +333,10 @@
 	bool
 	default n
 
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.marker"
+
+endmenu
+
 source "arch/um/Kconfig.debug"
Index: linux-2.6-lttng/arch/v850/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/v850/Kconfig	2007-05-09 18:14:52.000000000 -0400
+++ linux-2.6-lttng/arch/v850/Kconfig	2007-05-09 18:15:34.000000000 -0400
@@ -339,6 +339,12 @@
 
 source "drivers/usb/Kconfig"
 
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.marker"
+
+endmenu
+
 source "arch/v850/Kconfig.debug"
 
 source "security/Kconfig"
Index: linux-2.6-lttng/arch/x86_64/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/x86_64/Kconfig	2007-05-09 18:14:52.000000000 -0400
+++ linux-2.6-lttng/arch/x86_64/Kconfig	2007-05-09 18:15:34.000000000 -0400
@@ -790,6 +790,9 @@
 	  a probepoint and specifies the callback.  Kprobes is useful
 	  for kernel debugging, non-intrusive instrumentation and testing.
 	  If in doubt, say "N".
+
+source "kernel/Kconfig.marker"
+
 endmenu
 
 source "arch/x86_64/Kconfig.debug"
Index: linux-2.6-lttng/arch/xtensa/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/xtensa/Kconfig	2007-05-09 18:14:52.000000000 -0400
+++ linux-2.6-lttng/arch/xtensa/Kconfig	2007-05-09 18:15:34.000000000 -0400
@@ -251,6 +251,12 @@
 	  provide one yourself.
 endmenu
 
+menu "Instrumentation Support"
+
+source "kernel/Kconfig.marker"
+
+endmenu
+
 source "arch/xtensa/Kconfig.debug"
 
 source "security/Kconfig"
Index: linux-2.6-lttng/kernel/Kconfig.marker
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/kernel/Kconfig.marker	2007-05-09 18:15:34.000000000 -0400
@@ -0,0 +1,20 @@
+# Code markers configuration
+
+config MARKERS
+	bool "Activate markers"
+	depends on MODULES
+	help
+	  Place an empty function call at each marker site. Can be
+	  dynamically changed for a probe function.
+
+config MARKERS_DISABLE_OPTIMIZATION
+	bool "Disable marker optimization"
+	depends on MARKERS && EMBEDDED
+	default n
+	help
+	  Disable code replacement jump optimisations. Especially useful if your
+	  code is in a read-only rom/flash.
+
+config MARKERS_ENABLE_OPTIMIZATION
+	def_bool y
+	depends on MARKERS && !MARKERS_DISABLE_OPTIMIZATION
Index: linux-2.6-lttng/kernel/module.c
===================================================================
--- linux-2.6-lttng.orig/kernel/module.c	2007-05-09 18:14:52.000000000 -0400
+++ linux-2.6-lttng/kernel/module.c	2007-05-09 18:15:34.000000000 -0400
@@ -32,6 +32,7 @@
 #include <linux/cpu.h>
 #include <linux/moduleparam.h>
 #include <linux/errno.h>
+#include <linux/marker.h>
 #include <linux/err.h>
 #include <linux/vermagic.h>
 #include <linux/notifier.h>
Index: linux-2.6-lttng/arch/avr32/Kconfig.debug
===================================================================
--- linux-2.6-lttng.orig/arch/avr32/Kconfig.debug	2007-05-09 18:14:51.000000000 -0400
+++ linux-2.6-lttng/arch/avr32/Kconfig.debug	2007-05-09 18:15:34.000000000 -0400
@@ -6,6 +6,9 @@
 
 source "lib/Kconfig.debug"
 
+menu "Instrumentation Support"
+	depends on EXPERIMENTAL
+
 config KPROBES
 	bool "Kprobes"
 	depends on DEBUG_KERNEL
@@ -17,4 +20,8 @@
           for kernel debugging, non-intrusive instrumentation and testing.
           If in doubt, say "N".
 
+source "kernel/Kconfig.marker"
+
+endmenu
+
 endmenu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [patch 02/10] Linux Kernel Markers, architecture independent code.
  2007-05-10  1:55 [patch 00/10] Linux Kernel Markers for 2.6.21-mm2 Mathieu Desnoyers
  2007-05-10  1:55 ` [patch 01/10] Linux Kernel Markers - Add kconfig menus for the marker code Mathieu Desnoyers
@ 2007-05-10  1:55 ` Mathieu Desnoyers
  2007-05-10  5:10   ` Alexey Dobriyan
  2007-05-10  1:55 ` [patch 03/10] Allow userspace applications to use marker.h to parse the markers section in the kernel binary Mathieu Desnoyers
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 61+ messages in thread
From: Mathieu Desnoyers @ 2007-05-10  1:55 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, hch, Mathieu Desnoyers, Adrian Bunk

[-- Attachment #1: linux-kernel-markers-architecture-independant-code.patch --]
[-- Type: text/plain, Size: 16532 bytes --]

[bunk@stusta.de: marker exports must be EXPORT_SYMBOL_GPL]
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/asm-generic/vmlinux.lds.h |   13 +
 include/linux/marker.h            |  124 +++++++++++++++++
 include/linux/module.h            |    4 
 kernel/module.c                   |  273 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 414 insertions(+)

Index: linux-2.6-lttng/include/asm-generic/vmlinux.lds.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-generic/vmlinux.lds.h	2007-05-09 18:14:52.000000000 -0400
+++ linux-2.6-lttng/include/asm-generic/vmlinux.lds.h	2007-05-09 18:15:55.000000000 -0400
@@ -121,6 +121,19 @@
         __ksymtab_strings : AT(ADDR(__ksymtab_strings) - LOAD_OFFSET) {	\
 		*(__ksymtab_strings)					\
 	}								\
+	/* Kernel markers : pointers */					\
+	.markers : AT(ADDR(.markers) - LOAD_OFFSET) {			\
+		VMLINUX_SYMBOL(__start___markers) = .;			\
+		*(.markers)						\
+		VMLINUX_SYMBOL(__stop___markers) = .;			\
+	}								\
+	.markers.c : AT(ADDR(.markers.c) - LOAD_OFFSET) {		\
+		VMLINUX_SYMBOL(__start___markers_c) = .;		\
+		*(.markers.c)						\
+		VMLINUX_SYMBOL(__stop___markers_c) = .;			\
+	}								\
+	__end_rodata = .;						\
+	. = ALIGN(4096);						\
 									\
 	/* Built-in module parameters. */				\
 	__param : AT(ADDR(__param) - LOAD_OFFSET) {			\
Index: linux-2.6-lttng/include/linux/marker.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/linux/marker.h	2007-05-09 18:15:55.000000000 -0400
@@ -0,0 +1,124 @@
+#ifndef _LINUX_MARKER_H
+#define _LINUX_MARKER_H
+
+/*
+ * marker.h
+ *
+ * Code markup for dynamic and static tracing.
+ *
+ * See Documentation/marker.txt.
+ *
+ * (C) Copyright 2006 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#ifdef __KERNEL__
+
+struct __mark_marker_data;
+
+typedef void marker_probe_func(const struct __mark_marker_data *mdata,
+	const char *fmt, ...);
+
+struct __mark_marker_data {
+	const char *name;
+	const char *format;
+	const char *args;
+	int flags;
+	marker_probe_func *call;
+	void *pdata;
+} __attribute__((packed));
+
+struct __mark_marker {
+	struct __mark_marker_data *mdata;
+	void *enable;
+} __attribute__((packed));
+
+#ifdef CONFIG_MARKERS
+
+/* Marker flags : selects the mechanism used to connect the probes to the
+ * markers and what can be executed within the probes. This is primarily
+ * used at reentrancy-unfriendly sites. */
+#define MF_OPTIMIZED	(1 << 0)	/* Use optimized markers */
+#define MF_LOCKDEP	(1 << 1)	/* Can call lockdep */
+#define MF_PRINTK	(1 << 2)	/* vprintk can be called in the probe */
+#define _MF_NR		3		/* Number of marker flags */
+
+/* Generic marker flavor always available */
+#define trace_mark_generic(flags, name, format, args...) \
+	do { \
+		static const char __mstrtab_name_##name[] \
+		__attribute__((section("__markers_strings"))) \
+		= #name; \
+		static const char __mstrtab_format_##name[] \
+		__attribute__((section("__markers_strings"))) \
+		= format; \
+		static const char __mstrtab_args_##name[] \
+		__attribute__((section("__markers_strings"))) \
+		= #args; \
+		static struct __mark_marker_data __mark_data_##name \
+		__attribute__((section("__markers_data"))) = \
+		{ __mstrtab_name_##name,  __mstrtab_format_##name, \
+		__mstrtab_args_##name, \
+		(flags) & ~MF_OPTIMIZED, __mark_empty_function, NULL }; \
+		static char __marker_enable_##name = 0; \
+		static const struct __mark_marker __mark_##name \
+			__attribute__((section("__markers"))) = \
+			{ &__mark_data_##name, &__marker_enable_##name } ; \
+		asm volatile ( "" : : "i" (&__mark_##name)); \
+		__mark_check_format(format, ## args); \
+		if (unlikely(__marker_enable_##name)) { \
+			preempt_disable(); \
+			(*__mark_data_##name.call)(&__mark_data_##name, \
+						format, ## args); \
+			preempt_enable(); \
+		} \
+	} while (0)
+
+#define MARK_GENERIC_ENABLE_IMMEDIATE_OFFSET 0
+#define MARK_GENERIC_ENABLE_TYPE char
+/* Dereference enable as lvalue from a pointer to its instruction */
+#define MARK_GENERIC_ENABLE(a) \
+	*(MARK_GENERIC_ENABLE_TYPE*) \
+		((char*)a+MARK_GENERIC_ENABLE_IMMEDIATE_OFFSET)
+
+static inline int marker_generic_set_enable(void *address, char enable)
+{
+	MARK_GENERIC_ENABLE(address) = enable;
+	return 0;
+}
+
+#else /* !CONFIG_MARKERS */
+#define MARK_GENERIC(flags, name, format, args...) \
+		__mark_check_format(format, ## args)
+#endif /* CONFIG_MARKERS */
+
+#ifdef CONFIG_MARKERS_ENABLE_OPTIMIZATION
+#include <asm/marker.h>			/* optimized marker flavor */
+#else
+#include <asm-generic/marker.h>		/* fallback on generic markers */
+#endif
+
+#define MARK_MAX_FORMAT_LEN	1024
+/* Pass this as a format string for a marker with no argument */
+#define MARK_NOARGS " "
+
+/* To be used for string format validity checking with sparse */
+static inline
+void __mark_check_format(const char *fmt, ...)
+{ }
+
+extern marker_probe_func __mark_empty_function;
+
+extern int _marker_set_probe(int flags, const char *name, const char *format,
+				marker_probe_func *probe, void *pdata);
+
+#define marker_set_probe(name, format, probe, pdata) \
+	_marker_set_probe(MF_DEFAULT, name, format, probe, pdata)
+
+extern int marker_remove_probe(const char *name);
+extern int marker_list_probe(marker_probe_func *probe);
+
+#endif /* __KERNEL__ */
+#endif
Index: linux-2.6-lttng/include/linux/module.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/module.h	2007-05-09 18:14:52.000000000 -0400
+++ linux-2.6-lttng/include/linux/module.h	2007-05-09 18:15:55.000000000 -0400
@@ -356,6 +356,9 @@
 	/* The command line arguments (may be mangled).  People like
 	   keeping pointers to this stuff */
 	char *args;
+
+	const struct __mark_marker *markers;
+	unsigned int num_markers;
 };
 
 /* FIXME: It'd be nice to isolate modules during init, too, so they
@@ -467,6 +470,7 @@
 int unregister_module_notifier(struct notifier_block * nb);
 
 extern void print_modules(void);
+extern void list_modules(void);
 
 #else /* !CONFIG_MODULES... */
 #define EXPORT_SYMBOL(sym)
Index: linux-2.6-lttng/kernel/module.c
===================================================================
--- linux-2.6-lttng.orig/kernel/module.c	2007-05-09 18:15:34.000000000 -0400
+++ linux-2.6-lttng/kernel/module.c	2007-05-09 18:15:55.000000000 -0400
@@ -142,6 +142,8 @@
 extern const unsigned long __start___kcrctab_gpl_future[];
 extern const unsigned long __start___kcrctab_unused[];
 extern const unsigned long __start___kcrctab_unused_gpl[];
+extern const struct __mark_marker __start___markers[];
+extern const struct __mark_marker __stop___markers[];
 
 #ifndef CONFIG_MODVERSIONS
 #define symversion(base, idx) NULL
@@ -302,6 +304,229 @@
 	return NULL;
 }
 
+#ifdef CONFIG_MARKERS
+
+/* Empty callback provided as a probe to the markers. By providing this to a
+ * disabled marker, we makes sure the  execution flow is always valid even
+ * though the function pointer change and the marker enabling are two distinct
+ * operations that modifies the execution flow of preemptible code. */
+void __mark_empty_function(const struct __mark_marker_data *mdata,
+	const char *fmt, ...)
+{
+}
+EXPORT_SYMBOL_GPL(__mark_empty_function);
+
+/* Set the enable bit of the marker, choosing the generic or architecture
+ * specific functions depending on the marker's flags.
+ */
+static int marker_set_enable(void *address, char enable, int flags)
+{
+	if (flags & MF_OPTIMIZED)
+		return marker_optimized_set_enable(address, enable);
+	else
+		return marker_generic_set_enable(address, enable);
+}
+
+/* Sets the probe callback and enables the markers corresponding to a range of
+ * markers. The enable bit and function address are set out of order, and it's
+ * ok : the state is always coherent because of the empty callback we provide.
+ */
+static int _marker_set_probe_range(int flags, const char *name,
+	const char *format,
+	marker_probe_func *probe,
+	void *pdata,
+	const struct __mark_marker *begin,
+	const struct __mark_marker *end)
+{
+	const struct __mark_marker *iter;
+	int found = 0;
+
+	for (iter = begin; iter < end; iter++) {
+		if (strcmp(name, iter->mdata->name) == 0) {
+			if (format
+				&& strcmp(format, iter->mdata->format) != 0) {
+				printk(KERN_NOTICE
+					"Format mismatch for probe %s "
+					"(%s), marker (%s)\n",
+					name,
+					format,
+					iter->mdata->format);
+				continue;
+			}
+			if (flags & MF_LOCKDEP
+				&& !(iter->mdata->flags & MF_LOCKDEP)) {
+					printk(KERN_NOTICE
+					"Incompatible lockdep flags for "
+					"probe %s\n",
+					name);
+					continue;
+			}
+			if (flags & MF_PRINTK
+				&& !(iter->mdata->flags & MF_PRINTK)) {
+					printk(KERN_NOTICE
+					"Incompatible printk flags for "
+					"probe %s\n",
+					name);
+					continue;
+			}
+			if (probe == __mark_empty_function) {
+				if (iter->mdata->call
+					!= __mark_empty_function) {
+					iter->mdata->call =
+						__mark_empty_function;
+				}
+				marker_set_enable(iter->enable, 0,
+					iter->mdata->flags);
+			} else {
+				if (iter->mdata->call
+					!= __mark_empty_function) {
+					if (iter->mdata->call != probe) {
+						printk(KERN_NOTICE
+							"Marker %s busy, "
+							"probe %p already "
+							"installed\n",
+							name,
+							iter->mdata->call);
+						continue;
+					}
+				} else {
+					found++;
+					iter->mdata->call = probe;
+				}
+				iter->mdata->pdata = pdata;
+				smp_wmb();
+				marker_set_enable(iter->enable, 1,
+					iter->mdata->flags);
+			}
+			found++;
+		}
+	}
+	return found;
+}
+
+/* Sets a range of markers to a disabled state : unset the enable bit and
+ * provide the empty callback. */
+static int marker_remove_probe_range(const char *name,
+	const struct __mark_marker *begin,
+	const struct __mark_marker *end)
+{
+	const struct __mark_marker *iter;
+	int found = 0;
+
+	for (iter = begin; iter < end; iter++) {
+		if (strcmp(name, iter->mdata->name) == 0) {
+			marker_set_enable(iter->enable, 0,
+				iter->mdata->flags);
+			iter->mdata->call = __mark_empty_function;
+			found++;
+		}
+	}
+	return found;
+}
+
+/* Provides a listing of the markers present in the kernel with their type
+ * (optimized or generic), state (enabled or disabled), callback and format
+ * string. */
+static int marker_list_probe_range(marker_probe_func *probe,
+	const struct __mark_marker *begin,
+	const struct __mark_marker *end)
+{
+	const struct __mark_marker *iter;
+	int found = 0;
+
+	for (iter = begin; iter < end; iter++) {
+		if (probe)
+			if (probe != iter->mdata->call) continue;
+		printk("name %s \n", iter->mdata->name);
+		if (iter->mdata->flags & MF_OPTIMIZED)
+			printk("  enable %u optimized ",
+				MARK_OPTIMIZED_ENABLE(iter->enable));
+		else
+			printk("  enable %u generic ",
+				MARK_GENERIC_ENABLE(iter->enable));
+		printk("  func 0x%p format \"%s\"\n",
+			iter->mdata->call, iter->mdata->format);
+		found++;
+	}
+	return found;
+}
+
+/* Calls _marker_set_probe_range for the core markers and modules markers.
+ * Marker enabling/disabling use the modlist_lock to synchronise. */
+int _marker_set_probe(int flags, const char *name, const char *format,
+				marker_probe_func *probe,
+				void *pdata)
+{
+	struct module *mod;
+	int found = 0;
+
+	mutex_lock(&module_mutex);
+	/* Core kernel markers */
+	found += _marker_set_probe_range(flags, name, format, probe,
+			pdata,
+			__start___markers, __stop___markers);
+	/* Markers in modules. */
+	list_for_each_entry(mod, &modules, list) {
+		if (!mod->taints)
+			found += _marker_set_probe_range(flags, name, format,
+			probe, pdata,
+			mod->markers, mod->markers+mod->num_markers);
+	}
+	mutex_unlock(&module_mutex);
+	return found;
+}
+EXPORT_SYMBOL_GPL(_marker_set_probe);
+
+/* Calls _marker_remove_probe_range for the core markers and modules markers.
+ * Marker enabling/disabling use the modlist_lock to synchronise. */
+int marker_remove_probe(const char *name)
+{
+	struct module *mod;
+	int found = 0;
+
+	mutex_lock(&module_mutex);
+	/* Core kernel markers */
+	found += marker_remove_probe_range(name,
+			__start___markers, __stop___markers);
+	/* Markers in modules. */
+	list_for_each_entry(mod, &modules, list) {
+		if (!mod->taints)
+			found += marker_remove_probe_range(name,
+				mod->markers, mod->markers+mod->num_markers);
+	}
+	mutex_unlock(&module_mutex);
+	return found;
+}
+EXPORT_SYMBOL_GPL(marker_remove_probe);
+
+/* Calls _marker_list_probe_range for the core markers and modules markers.
+ * Marker listing uses the modlist_lock to synchronise.
+ * TODO : should output this listing to a procfs file. */
+int marker_list_probe(marker_probe_func *probe)
+{
+	struct module *mod;
+	int found = 0;
+
+	mutex_lock(&module_mutex);
+	/* Core kernel markers */
+	printk("Listing kernel markers\n");
+	found += marker_list_probe_range(probe,
+			__start___markers, __stop___markers);
+	/* Markers in modules. */
+	printk("Listing module markers\n");
+	list_for_each_entry(mod, &modules, list) {
+		if (!mod->taints) {
+			printk("Listing markers for module %s\n", mod->name);
+			found += marker_list_probe_range(probe,
+				mod->markers, mod->markers+mod->num_markers);
+		}
+	}
+	mutex_unlock(&module_mutex);
+	return found;
+}
+EXPORT_SYMBOL_GPL(marker_list_probe);
+#endif
+
 #ifdef CONFIG_SMP
 /* Number of blocks used and allocated. */
 static unsigned int pcpu_num_used, pcpu_num_allocated;
@@ -1659,6 +1884,9 @@
 	unsigned int unusedcrcindex;
 	unsigned int unusedgplindex;
 	unsigned int unusedgplcrcindex;
+	unsigned int markersindex;
+	unsigned int markersdataindex;
+	unsigned int markersstringsindex;
 	struct module *mod;
 	long err = 0;
 	void *percpu = NULL, *ptr = NULL; /* Stops spurious gcc warning */
@@ -1755,6 +1983,10 @@
 #ifdef ARCH_UNWIND_SECTION_NAME
 	unwindex = find_sec(hdr, sechdrs, secstrings, ARCH_UNWIND_SECTION_NAME);
 #endif
+	markersindex = find_sec(hdr, sechdrs, secstrings, "__markers");
+	markersdataindex = find_sec(hdr, sechdrs, secstrings, "__markers_data");
+	markersstringsindex = find_sec(hdr, sechdrs, secstrings,
+				"__markers_strings");
 
 	/* Don't keep modinfo section */
 	sechdrs[infoindex].sh_flags &= ~(unsigned long)SHF_ALLOC;
@@ -1765,6 +1997,22 @@
 #endif
 	if (unwindex)
 		sechdrs[unwindex].sh_flags |= SHF_ALLOC;
+#ifdef CONFIG_MARKERS
+	if (markersindex)
+		sechdrs[markersindex].sh_flags |= SHF_ALLOC;
+	if (markersdataindex)
+		sechdrs[markersdataindex].sh_flags |= SHF_ALLOC;
+	if (markersstringsindex)
+		sechdrs[markersstringsindex].sh_flags |= SHF_ALLOC;
+#else
+	if (markersindex)
+		sechdrs[markersindex].sh_flags &= ~(unsigned long)SHF_ALLOC;
+	if (markersdataindex)
+		sechdrs[markersdataindex].sh_flags &= ~(unsigned long)SHF_ALLOC;
+	if (markersstringsindex)
+		sechdrs[markersstringsindex].sh_flags
+					&= ~(unsigned long)SHF_ALLOC;
+#endif
 
 	/* Check module struct version now, before we try to use module. */
 	if (!check_modstruct_version(sechdrs, versindex, mod)) {
@@ -1905,6 +2153,11 @@
 	mod->gpl_future_syms = (void *)sechdrs[gplfutureindex].sh_addr;
 	if (gplfuturecrcindex)
 		mod->gpl_future_crcs = (void *)sechdrs[gplfuturecrcindex].sh_addr;
+	if (markersindex) {
+		mod->markers = (void *)sechdrs[markersindex].sh_addr;
+		mod->num_markers =
+			sechdrs[markersindex].sh_size / sizeof(*mod->markers);
+	}
 
 	mod->unused_syms = (void *)sechdrs[unusedindex].sh_addr;
 	if (unusedcrcindex)
@@ -2399,6 +2652,26 @@
 	.show	= m_show
 };
 
+void list_modules(void)
+{
+	/* Enumerate loaded modules */
+	struct list_head	*i;
+	struct module		*mod;
+	unsigned long refcount = 0;
+
+	mutex_lock(&module_mutex);
+	list_for_each(i, &modules) {
+		mod = list_entry(i, struct module, list);
+#ifdef CONFIG_MODULE_UNLOAD
+		refcount = local_read(&mod->ref[0].count);
+#endif //CONFIG_MODULE_UNLOAD
+		trace_mark(list_module, "%s %d %lu",
+				mod->name, mod->state, refcount);
+	}
+	mutex_unlock(&module_mutex);
+}
+EXPORT_SYMBOL_GPL(list_modules);
+
 /* Given an address, look for it in the module exception tables. */
 const struct exception_table_entry *search_module_extables(unsigned long addr)
 {

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [patch 03/10] Allow userspace applications to use marker.h to parse the markers section in the kernel binary.
  2007-05-10  1:55 [patch 00/10] Linux Kernel Markers for 2.6.21-mm2 Mathieu Desnoyers
  2007-05-10  1:55 ` [patch 01/10] Linux Kernel Markers - Add kconfig menus for the marker code Mathieu Desnoyers
  2007-05-10  1:55 ` [patch 02/10] Linux Kernel Markers, architecture independent code Mathieu Desnoyers
@ 2007-05-10  1:55 ` Mathieu Desnoyers
  2007-05-10  6:51   ` Christoph Hellwig
  2007-05-10  1:55 ` [patch 04/10] Linux Kernel Markers - PowerPC optimized version Mathieu Desnoyers
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 61+ messages in thread
From: Mathieu Desnoyers @ 2007-05-10  1:55 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, hch, David Smith, Mathieu Desnoyers

[-- Attachment #1: linux-kernel-markers-header-visible-from-userspace.h --]
[-- Type: text/plain, Size: 3012 bytes --]

One of the things I'm starting to work on is adding support for your
kernel markers to systemtap.  I know the marker stuff is still in a bit
of flux because you are trying to meet the (sometimes conflicting)
requirements of the people on lkml.

One of the things systemtap is going to need is to be able to parse the
'__markers' section so it will be able to look up a user-specified
marker.  For instance, if the user says 'probe kernel.mark("foo") {}',
I've got to see if marker "foo" really exists.

There are 2 problems with this currently, since systemtap is a user-land
program:

1) include/linux/markers.h isn't currently installed by "make
headers_install"

2) even if include/linux/markers.h was installed, it is completely
surrounded by "#ifdef __KERNEL__"

So, I've attached a patch that tries to fix those 2 problems.  I've
moved "#ifdef __KERNEL__" down a bit past the structure and flag
definitions and included marker.h in Kbuild so it will get installed.

I'd appreciate any comments you have.   I'd like to get either this
patch or something like it included in the next version you post to lkml.

Signed-off-by: David Smith <dsmith@redhat.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 include/linux/Kbuild   |    1 +
 include/linux/marker.h |    8 ++++----
 2 files changed, 5 insertions(+), 4 deletions(-)

Index: linux-2.6-lttng/include/linux/marker.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/marker.h	2007-05-09 18:15:55.000000000 -0400
+++ linux-2.6-lttng/include/linux/marker.h	2007-05-09 21:44:47.000000000 -0400
@@ -14,8 +14,6 @@
  * See the file COPYING for more details.
  */
 
-#ifdef __KERNEL__
-
 struct __mark_marker_data;
 
 typedef void marker_probe_func(const struct __mark_marker_data *mdata,
@@ -35,8 +33,6 @@
 	void *enable;
 } __attribute__((packed));
 
-#ifdef CONFIG_MARKERS
-
 /* Marker flags : selects the mechanism used to connect the probes to the
  * markers and what can be executed within the probes. This is primarily
  * used at reentrancy-unfriendly sites. */
@@ -45,6 +41,10 @@
 #define MF_PRINTK	(1 << 2)	/* vprintk can be called in the probe */
 #define _MF_NR		3		/* Number of marker flags */
 
+#ifdef __KERNEL__
+
+#ifdef CONFIG_MARKERS
+
 /* Generic marker flavor always available */
 #define trace_mark_generic(flags, name, format, args...) \
 	do { \
Index: linux-2.6-lttng/include/linux/Kbuild
===================================================================
--- linux-2.6-lttng.orig/include/linux/Kbuild	2007-05-09 21:41:55.000000000 -0400
+++ linux-2.6-lttng/include/linux/Kbuild	2007-05-09 21:46:07.000000000 -0400
@@ -254,6 +254,7 @@
 unifdef-y += llc.h
 unifdef-y += loop.h
 unifdef-y += lp.h
+unifdef-y += marker.h
 unifdef-y += mempolicy.h
 unifdef-y += mii.h
 unifdef-y += mman.h

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [patch 04/10] Linux Kernel Markers - PowerPC optimized version
  2007-05-10  1:55 [patch 00/10] Linux Kernel Markers for 2.6.21-mm2 Mathieu Desnoyers
                   ` (2 preceding siblings ...)
  2007-05-10  1:55 ` [patch 03/10] Allow userspace applications to use marker.h to parse the markers section in the kernel binary Mathieu Desnoyers
@ 2007-05-10  1:55 ` Mathieu Desnoyers
  2007-05-10  6:57   ` Christoph Hellwig
  2007-05-10  1:56 ` [patch 05/10] Linux Kernel Markers - i386 " Mathieu Desnoyers
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 61+ messages in thread
From: Mathieu Desnoyers @ 2007-05-10  1:55 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, hch, Mathieu Desnoyers, Paul Mackerras,
	Benjamin Herrenschmidt

[-- Attachment #1: linux-kernel-markers-powerpc-optimization.patch --]
[-- Type: text/plain, Size: 4968 bytes --]

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/powerpc/kernel/Makefile |    1 
 arch/powerpc/kernel/marker.c |   25 ++++++++++++
 include/asm-powerpc/marker.h |   85 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 111 insertions(+)

Index: linux-2.6-lttng/arch/powerpc/kernel/Makefile
===================================================================
--- linux-2.6-lttng.orig/arch/powerpc/kernel/Makefile	2007-05-09 18:14:51.000000000 -0400
+++ linux-2.6-lttng/arch/powerpc/kernel/Makefile	2007-05-09 18:16:00.000000000 -0400
@@ -96,3 +96,4 @@
 
 extra-$(CONFIG_PPC_FPU)		+= fpu.o
 extra-$(CONFIG_PPC64)		+= entry_64.o
+obj-$(CONFIG_MARKERS_ENABLE_OPTIMIZATION)	+= marker.o
Index: linux-2.6-lttng/arch/powerpc/kernel/marker.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/arch/powerpc/kernel/marker.c	2007-05-09 18:16:00.000000000 -0400
@@ -0,0 +1,25 @@
+/* marker.c
+ *
+ * Powerpc optimized marker enabling/disabling.
+ *
+ * Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ */
+
+#include <linux/module.h>
+#include <linux/marker.h>
+#include <linux/string.h>
+#include <asm/cacheflush.h>
+
+int marker_optimized_set_enable(void *address, char enable)
+{
+	char newi[MARK_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET+1];
+	int size = MARK_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET
+			+ sizeof(MARK_OPTIMIZED_ENABLE_TYPE);
+
+	memcpy(newi, address, size);
+	MARK_OPTIMIZED_ENABLE(&newi[0]) = enable;
+	memcpy(address, newi, size);
+	flush_icache_range((unsigned long)address, size);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(marker_optimized_set_enable);
Index: linux-2.6-lttng/include/asm-powerpc/marker.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-powerpc/marker.h	2007-05-09 18:16:00.000000000 -0400
@@ -0,0 +1,85 @@
+#ifndef _ASM_POWERPC_MARKER_H
+#define _ASM_POWERPC_MARKER_H
+
+/*
+ * marker.h
+ *
+ * Code markup for dynamic and static tracing. PowerPC architecture
+ * optimisations.
+ *
+ * (C) Copyright 2006 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <asm/asm-compat.h>
+
+#ifdef CONFIG_MARKERS
+
+#define MF_DEFAULT (MF_OPTIMIZED | MF_LOCKDEP | MF_PRINTK)
+
+/* Optimized version of the markers */
+#define trace_mark_optimized(flags, name, format, args...) \
+	do { \
+		static const char __mstrtab_name_##name[] \
+		__attribute__((section("__markers_strings"))) \
+		= #name; \
+		static const char __mstrtab_format_##name[] \
+		__attribute__((section("__markers_strings"))) \
+		= format; \
+		static const char __mstrtab_args_##name[] \
+		__attribute__((section("__markers_strings"))) \
+		= #args; \
+		static struct __mark_marker_data __mark_data_##name \
+		__attribute__((section("__markers_data"))) = \
+		{ __mstrtab_name_##name,  __mstrtab_format_##name, \
+		__mstrtab_args_##name, \
+		(flags) | MF_OPTIMIZED, __mark_empty_function, NULL }; \
+		char condition; \
+		asm volatile(	".section __markers, \"a\", @progbits;\n\t" \
+					PPC_LONG "%1, 0f;\n\t" \
+					".previous;\n\t" \
+					".align 4\n\t" \
+					"0:\n\t" \
+					"li %0,0;\n\t" \
+				: "=r" (condition) \
+				: "i" (&__mark_data_##name)); \
+		__mark_check_format(format, ## args); \
+		if (unlikely(condition)) { \
+			preempt_disable(); \
+			(*__mark_data_##name.call)(&__mark_data_##name, \
+						format, ## args); \
+			preempt_enable(); \
+		} \
+	} while (0)
+
+/* Marker macro selecting the generic or optimized version of marker, depending
+ * on the flags specified. */
+#define _trace_mark(flags, format, args...) \
+do { \
+	if ((flags) & MF_OPTIMIZED) \
+		trace_mark_optimized(flags, format, ## args); \
+	else \
+		trace_mark_generic(flags, format, ## args); \
+} while (0)
+
+/* Marker with default behavior */
+#define trace_mark(format, args...) _trace_mark(MF_DEFAULT, format, ## args)
+
+/* Architecture dependant marker information, used internally for marker
+ * activation. */
+
+/* Offset of the immediate value from the start of the addi instruction (result
+ * of the li mnemonic), in bytes. */
+#define MARK_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET 2
+#define MARK_OPTIMIZED_ENABLE_TYPE short
+/* Dereference enable as lvalue from a pointer to its instruction */
+#define MARK_OPTIMIZED_ENABLE(a) \
+	*(MARK_OPTIMIZED_ENABLE_TYPE*) \
+		((char*)a+MARK_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET)
+
+extern int marker_optimized_set_enable(void *address, char enable);
+
+#endif
+#endif //_ASM_POWERPC_MARKER_H

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [patch 05/10] Linux Kernel Markers - i386 optimized version
  2007-05-10  1:55 [patch 00/10] Linux Kernel Markers for 2.6.21-mm2 Mathieu Desnoyers
                   ` (3 preceding siblings ...)
  2007-05-10  1:55 ` [patch 04/10] Linux Kernel Markers - PowerPC optimized version Mathieu Desnoyers
@ 2007-05-10  1:56 ` Mathieu Desnoyers
  2007-05-10  9:06   ` Andi Kleen
  2007-05-10  1:56 ` [patch 06/10] Linux Kernel Markers - Non optimized architectures Mathieu Desnoyers
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 61+ messages in thread
From: Mathieu Desnoyers @ 2007-05-10  1:56 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, hch, Mathieu Desnoyers, Andi Kleen

[-- Attachment #1: linux-kernel-markers-i386-optimization.patch --]
[-- Type: text/plain, Size: 7251 bytes --]

[bunk@stusta.de: marker exports must be EXPORT_SYMBOL_GPL]
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/i386/kernel/Makefile |    1 
 arch/i386/kernel/marker.c |   99 ++++++++++++++++++++++++++++++++++++++++++++++
 include/asm-i386/marker.h |   84 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 184 insertions(+)

Index: linux-2.6-lttng/arch/i386/kernel/Makefile
===================================================================
--- linux-2.6-lttng.orig/arch/i386/kernel/Makefile	2007-05-09 18:14:51.000000000 -0400
+++ linux-2.6-lttng/arch/i386/kernel/Makefile	2007-05-09 18:16:03.000000000 -0400
@@ -33,6 +33,7 @@
 obj-y				+= sysenter.o vsyscall.o
 obj-$(CONFIG_ACPI_SRAT) 	+= srat.o
 obj-$(CONFIG_EFI) 		+= efi.o efi_stub.o
+obj-$(CONFIG_MARKERS_ENABLE_OPTIMIZATION)	+= marker.o
 obj-$(CONFIG_DOUBLEFAULT) 	+= doublefault.o
 obj-$(CONFIG_SERIAL_8250)	+= legacy_serial.o
 obj-$(CONFIG_VM86)		+= vm86.o
Index: linux-2.6-lttng/arch/i386/kernel/marker.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/arch/i386/kernel/marker.c	2007-05-09 18:16:03.000000000 -0400
@@ -0,0 +1,99 @@
+/* marker.c
+ *
+ * Erratum 49 fix for Intel PIII and higher.
+ *
+ * Permits marker activation by XMC with correct serialization.
+ *
+ * Reentrant for NMI and trap handler instrumentation. Permits XMC to a
+ * location that has preemption enabled because it involves no temporary or
+ * reused data structure.
+ *
+ * Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ */
+
+#include <linux/notifier.h>
+#include <linux/mutex.h>
+#include <linux/preempt.h>
+#include <linux/smp.h>
+#include <linux/notifier.h>
+#include <linux/module.h>
+#include <linux/marker.h>
+#include <linux/kdebug.h>
+
+#include <asm/cacheflush.h>
+
+#define BREAKPOINT_INSTRUCTION  0xcc
+#define BREAKPOINT_INS_LEN 1
+
+static DEFINE_MUTEX(mark_mutex);
+static long target_eip = 0;
+
+static void mark_synchronize_core(void *info)
+{
+	sync_core();	/* use cpuid to stop speculative execution */
+}
+
+/* We simply skip the 2 bytes load immediate here, leaving the register in an
+ * undefined state. We don't care about the content (0 or !0), because we are
+ * changing the value 0->1 or 1->0. This small window of undefined value
+ * doesn't matter.
+ */
+static int mark_notifier(struct notifier_block *nb,
+	unsigned long val, void *data)
+{
+	enum die_val die_val = (enum die_val) val;
+	struct die_args *args = (struct die_args *)data;
+
+	if (!args->regs || user_mode_vm(args->regs))
+		return NOTIFY_DONE;
+
+	if (die_val == DIE_INT3	&& args->regs->eip == target_eip) {
+		args->regs->eip += 1; /* Skip the next byte of load immediate */
+		return NOTIFY_STOP;
+	}
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block mark_notify = {
+	.notifier_call = mark_notifier,
+	.priority = 0x7fffffff,	/* we need to be notified first */
+};
+
+int marker_optimized_set_enable(void *address, char enable)
+{
+	char saved_byte;
+	int ret;
+	char *dest = address;
+
+	if (!(enable ^ dest[1])) /* Must be a state change 0<->1 to execute */
+		return 0;
+
+	mutex_lock(&mark_mutex);
+	target_eip = (long)address + BREAKPOINT_INS_LEN;
+	/* register_die_notifier has memory barriers */
+	register_die_notifier(&mark_notify);
+	saved_byte = *dest;
+	*dest = BREAKPOINT_INSTRUCTION;
+	wmb();
+	/* Execute serializing instruction on each CPU.
+	 * Acts as a memory barrier. */
+	ret = on_each_cpu(mark_synchronize_core, NULL, 1, 1);
+	BUG_ON(ret != 0);
+
+	dest[1] = enable;
+	wmb();
+	*dest = saved_byte;
+		/* Wait for all int3 handlers to end
+		   (interrupts are disabled in int3).
+		   This CPU is clearly not in a int3 handler
+		   (not preemptible).
+		   synchronize_sched has memory barriers */
+	synchronize_sched();
+	unregister_die_notifier(&mark_notify);
+	/* unregister_die_notifier has memory barriers */
+	target_eip = 0;
+	mutex_unlock(&mark_mutex);
+	flush_icache_range(address, size);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(marker_optimized_set_enable);
Index: linux-2.6-lttng/include/asm-i386/marker.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-i386/marker.h	2007-05-09 18:16:03.000000000 -0400
@@ -0,0 +1,84 @@
+#ifndef _ASM_I386_MARKER_H
+#define _ASM_I386_MARKER_H
+
+/*
+ * marker.h
+ *
+ * Code markup for dynamic and static tracing. i386 architecture optimisations.
+ *
+ * (C) Copyright 2006 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+
+#ifdef CONFIG_MARKERS
+
+#define MF_DEFAULT (MF_OPTIMIZED | MF_LOCKDEP | MF_PRINTK)
+
+/* Optimized version of the markers */
+#define trace_mark_optimized(flags, name, format, args...) \
+	do { \
+		static const char __mstrtab_name_##name[] \
+		__attribute__((section("__markers_strings"))) \
+		= #name; \
+		static const char __mstrtab_format_##name[] \
+		__attribute__((section("__markers_strings"))) \
+		= format; \
+		static const char __mstrtab_args_##name[] \
+		__attribute__((section("__markers_strings"))) \
+		= #args; \
+		static struct __mark_marker_data __mark_data_##name \
+		__attribute__((section("__markers_data"))) = \
+		{ __mstrtab_name_##name,  __mstrtab_format_##name, \
+		__mstrtab_args_##name, \
+		(flags) | MF_OPTIMIZED, __mark_empty_function, NULL }; \
+		char condition; \
+		asm volatile(	".section __markers, \"a\", @progbits;\n\t" \
+					".long %1, 0f;\n\t" \
+					".previous;\n\t" \
+					".align 2\n\t" \
+					"0:\n\t" \
+					"movb $0,%0;\n\t" \
+				: "=r" (condition) \
+				: "m" (__mark_data_##name)); \
+		__mark_check_format(format, ## args); \
+		if (likely(!condition)) { \
+		} else { \
+			preempt_disable(); \
+			(*__mark_data_##name.call)(&__mark_data_##name, \
+						format, ## args); \
+			preempt_enable(); \
+		} \
+	} while (0)
+
+/* Marker macro selecting the generic or optimized version of marker, depending
+ * on the flags specified. */
+#define _trace_mark(flags, format, args...) \
+do { \
+	if (((flags) & MF_LOCKDEP) && ((flags) & MF_OPTIMIZED)) \
+		trace_mark_optimized(flags, format, ## args); \
+	else \
+		trace_mark_generic(flags, format, ## args); \
+} while (0)
+
+/* Marker with default behavior */
+#define trace_mark(format, args...) _trace_mark(MF_DEFAULT, format, ## args)
+
+/* Architecture dependant marker information, used internally for marker
+ * activation. */
+
+/* Offset of the immediate value from the start of the movb instruction, in
+ * bytes. */
+#define MARK_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET 1
+#define MARK_OPTIMIZED_ENABLE_TYPE char
+/* Dereference enable as lvalue from a pointer to its instruction */
+#define MARK_OPTIMIZED_ENABLE(a) \
+	*(MARK_OPTIMIZED_ENABLE_TYPE*) \
+		((char*)a+MARK_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET)
+
+extern int marker_optimized_set_enable(void *address, char enable);
+
+#endif
+#endif //_ASM_I386_MARKER_H

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [patch 06/10] Linux Kernel Markers - Non optimized architectures
  2007-05-10  1:55 [patch 00/10] Linux Kernel Markers for 2.6.21-mm2 Mathieu Desnoyers
                   ` (4 preceding siblings ...)
  2007-05-10  1:56 ` [patch 05/10] Linux Kernel Markers - i386 " Mathieu Desnoyers
@ 2007-05-10  1:56 ` Mathieu Desnoyers
  2007-05-10  5:13   ` Alexey Dobriyan
  2007-05-10  6:56   ` Christoph Hellwig
  2007-05-10  1:56 ` [patch 07/10] Linux Kernel Markers - Documentation Mathieu Desnoyers
                   ` (4 subsequent siblings)
  10 siblings, 2 replies; 61+ messages in thread
From: Mathieu Desnoyers @ 2007-05-10  1:56 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, hch, Mathieu Desnoyers

[-- Attachment #1: linux-kernel-markers-non-optimized-architectures.patch --]
[-- Type: text/plain, Size: 15108 bytes --]

This patch also includes marker code for non optimized architectures.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/asm-alpha/marker.h     |   13 +++++++++++++
 include/asm-arm/marker.h       |   13 +++++++++++++
 include/asm-arm26/marker.h     |   13 +++++++++++++
 include/asm-cris/marker.h      |   13 +++++++++++++
 include/asm-frv/marker.h       |   13 +++++++++++++
 include/asm-generic/marker.h   |   37 +++++++++++++++++++++++++++++++++++++
 include/asm-h8300/marker.h     |   13 +++++++++++++
 include/asm-ia64/marker.h      |   13 +++++++++++++
 include/asm-m32r/marker.h      |   13 +++++++++++++
 include/asm-m68k/marker.h      |   13 +++++++++++++
 include/asm-m68knommu/marker.h |   13 +++++++++++++
 include/asm-mips/marker.h      |   13 +++++++++++++
 include/asm-parisc/marker.h    |   13 +++++++++++++
 include/asm-ppc/marker.h       |   13 +++++++++++++
 include/asm-s390/marker.h      |   13 +++++++++++++
 include/asm-sh/marker.h        |   13 +++++++++++++
 include/asm-sh64/marker.h      |   13 +++++++++++++
 include/asm-sparc/marker.h     |   13 +++++++++++++
 include/asm-sparc64/marker.h   |   13 +++++++++++++
 include/asm-um/marker.h        |   13 +++++++++++++
 include/asm-v850/marker.h      |   13 +++++++++++++
 include/asm-x86_64/marker.h    |   13 +++++++++++++
 include/asm-xtensa/marker.h    |   13 +++++++++++++
 23 files changed, 323 insertions(+)

Index: linux-2.6-lttng/include/asm-arm/marker.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-arm/marker.h	2007-05-09 18:16:05.000000000 -0400
@@ -0,0 +1,13 @@
+/*
+ * marker.h
+ *
+ * Code markup for dynamic and static tracing. Architecture specific
+ * optimisations.
+ *
+ * No optimisation implemented.
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-cris/marker.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-cris/marker.h	2007-05-09 18:16:05.000000000 -0400
@@ -0,0 +1,13 @@
+/*
+ * marker.h
+ *
+ * Code markup for dynamic and static tracing. Architecture specific
+ * optimisations.
+ *
+ * No optimisation implemented.
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-frv/marker.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-frv/marker.h	2007-05-09 18:16:05.000000000 -0400
@@ -0,0 +1,13 @@
+/*
+ * marker.h
+ *
+ * Code markup for dynamic and static tracing. Architecture specific
+ * optimisations.
+ *
+ * No optimisation implemented.
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-generic/marker.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-generic/marker.h	2007-05-09 18:16:05.000000000 -0400
@@ -0,0 +1,37 @@
+#ifndef _ASM_GENERIC_MARKER_H
+#define _ASM_GENERIC_MARKER_H
+
+/*
+ * marker.h
+ *
+ * Code markup for dynamic and static tracing. Generic header.
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ *
+ * Note : the empty asm volatile with read constraint is used here instead of a
+ * "used" attribute to fix a gcc 4.1.x bug.
+ */
+
+/* Default flags, used by MARK() */
+#define MF_DEFAULT			(MF_LOCKDEP | MF_PRINTK)
+
+/* Fallback on the generic markers, since no optimized version is available */
+#define trace_mark_optimized		trace_mark_generic
+#define _trace_mark			trace_mark_generic
+
+/* Marker with default behavior */
+#define trace_mark(format, args...)	_trace_mark(MF_DEFAULT, format, ## args)
+
+/* Architecture dependant marker information, used internally for marker
+ * activation. */
+
+#define MARK_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET \
+		MARK_GENERIC_ENABLE_IMMEDIATE_OFFSET
+#define MARK_OPTIMIZED_ENABLE_TYPE	MARK_GENERIC_ENABLE_TYPE
+/* Dereference enable as lvalue from a pointer to its instruction */
+#define MARK_OPTIMIZED_ENABLE		MARK_GENERIC_ENABLE
+
+#define marker_optimized_set_enable marker_generic_set_enable
+
+#endif /* _ASM_GENERIC_MARKER_H */
Index: linux-2.6-lttng/include/asm-h8300/marker.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-h8300/marker.h	2007-05-09 18:16:05.000000000 -0400
@@ -0,0 +1,13 @@
+/*
+ * marker.h
+ *
+ * Code markup for dynamic and static tracing. Architecture specific
+ * optimisations.
+ *
+ * No optimisation implemented.
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-ia64/marker.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-ia64/marker.h	2007-05-09 18:16:05.000000000 -0400
@@ -0,0 +1,13 @@
+/*
+ * marker.h
+ *
+ * Code markup for dynamic and static tracing. Architecture specific
+ * optimisations.
+ *
+ * No optimisation implemented.
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-m32r/marker.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-m32r/marker.h	2007-05-09 18:16:05.000000000 -0400
@@ -0,0 +1,13 @@
+/*
+ * marker.h
+ *
+ * Code markup for dynamic and static tracing. Architecture specific
+ * optimisations.
+ *
+ * No optimisation implemented.
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-m68k/marker.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-m68k/marker.h	2007-05-09 18:16:05.000000000 -0400
@@ -0,0 +1,13 @@
+/*
+ * marker.h
+ *
+ * Code markup for dynamic and static tracing. Architecture specific
+ * optimisations.
+ *
+ * No optimisation implemented.
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-m68knommu/marker.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-m68knommu/marker.h	2007-05-09 18:16:05.000000000 -0400
@@ -0,0 +1,13 @@
+/*
+ * marker.h
+ *
+ * Code markup for dynamic and static tracing. Architecture specific
+ * optimisations.
+ *
+ * No optimisation implemented.
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-mips/marker.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-mips/marker.h	2007-05-09 18:16:05.000000000 -0400
@@ -0,0 +1,13 @@
+/*
+ * marker.h
+ *
+ * Code markup for dynamic and static tracing. Architecture specific
+ * optimisations.
+ *
+ * No optimisation implemented.
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-parisc/marker.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-parisc/marker.h	2007-05-09 18:16:05.000000000 -0400
@@ -0,0 +1,13 @@
+/*
+ * marker.h
+ *
+ * Code markup for dynamic and static tracing. Architecture specific
+ * optimisations.
+ *
+ * No optimisation implemented.
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-ppc/marker.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-ppc/marker.h	2007-05-09 18:16:05.000000000 -0400
@@ -0,0 +1,13 @@
+/*
+ * marker.h
+ *
+ * Code markup for dynamic and static tracing. Architecture specific
+ * optimisations.
+ *
+ * No optimisation implemented.
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-s390/marker.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-s390/marker.h	2007-05-09 18:16:05.000000000 -0400
@@ -0,0 +1,13 @@
+/*
+ * marker.h
+ *
+ * Code markup for dynamic and static tracing. Architecture specific
+ * optimisations.
+ *
+ * No optimisation implemented.
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-sh/marker.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-sh/marker.h	2007-05-09 18:16:05.000000000 -0400
@@ -0,0 +1,13 @@
+/*
+ * marker.h
+ *
+ * Code markup for dynamic and static tracing. Architecture specific
+ * optimisations.
+ *
+ * No optimisation implemented.
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-sh64/marker.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-sh64/marker.h	2007-05-09 18:16:05.000000000 -0400
@@ -0,0 +1,13 @@
+/*
+ * marker.h
+ *
+ * Code markup for dynamic and static tracing. Architecture specific
+ * optimisations.
+ *
+ * No optimisation implemented.
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-sparc/marker.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-sparc/marker.h	2007-05-09 18:16:05.000000000 -0400
@@ -0,0 +1,13 @@
+/*
+ * marker.h
+ *
+ * Code markup for dynamic and static tracing. Architecture specific
+ * optimisations.
+ *
+ * No optimisation implemented.
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-sparc64/marker.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-sparc64/marker.h	2007-05-09 18:16:05.000000000 -0400
@@ -0,0 +1,13 @@
+/*
+ * marker.h
+ *
+ * Code markup for dynamic and static tracing. Architecture specific
+ * optimisations.
+ *
+ * No optimisation implemented.
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-um/marker.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-um/marker.h	2007-05-09 18:16:05.000000000 -0400
@@ -0,0 +1,13 @@
+/*
+ * marker.h
+ *
+ * Code markup for dynamic and static tracing. Architecture specific
+ * optimisations.
+ *
+ * No optimisation implemented.
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-v850/marker.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-v850/marker.h	2007-05-09 18:16:05.000000000 -0400
@@ -0,0 +1,13 @@
+/*
+ * marker.h
+ *
+ * Code markup for dynamic and static tracing. Architecture specific
+ * optimisations.
+ *
+ * No optimisation implemented.
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-x86_64/marker.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-x86_64/marker.h	2007-05-09 18:16:05.000000000 -0400
@@ -0,0 +1,13 @@
+/*
+ * marker.h
+ *
+ * Code markup for dynamic and static tracing. Architecture specific
+ * optimisations.
+ *
+ * No optimisation implemented.
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-xtensa/marker.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-xtensa/marker.h	2007-05-09 18:16:05.000000000 -0400
@@ -0,0 +1,13 @@
+/*
+ * marker.h
+ *
+ * Code markup for dynamic and static tracing. Architecture specific
+ * optimisations.
+ *
+ * No optimisation implemented.
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-alpha/marker.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-alpha/marker.h	2007-05-09 18:16:05.000000000 -0400
@@ -0,0 +1,13 @@
+/*
+ * marker.h
+ *
+ * Code markup for dynamic and static tracing. Architecture specific
+ * optimisations.
+ *
+ * No optimisation implemented.
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-arm26/marker.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-arm26/marker.h	2007-05-09 18:16:05.000000000 -0400
@@ -0,0 +1,13 @@
+/*
+ * marker.h
+ *
+ * Code markup for dynamic and static tracing. Architecture specific
+ * optimisations.
+ *
+ * No optimisation implemented.
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <asm-generic/marker.h>

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [patch 07/10] Linux Kernel Markers - Documentation
  2007-05-10  1:55 [patch 00/10] Linux Kernel Markers for 2.6.21-mm2 Mathieu Desnoyers
                   ` (5 preceding siblings ...)
  2007-05-10  1:56 ` [patch 06/10] Linux Kernel Markers - Non optimized architectures Mathieu Desnoyers
@ 2007-05-10  1:56 ` Mathieu Desnoyers
  2007-05-10  6:58   ` Christoph Hellwig
                     ` (2 more replies)
  2007-05-10  1:56 ` [patch 08/10] Defines the linker macro EXTRA_RWDATA for the marker data section Mathieu Desnoyers
                   ` (3 subsequent siblings)
  10 siblings, 3 replies; 61+ messages in thread
From: Mathieu Desnoyers @ 2007-05-10  1:56 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, hch, Mathieu Desnoyers

[-- Attachment #1: linux-kernel-markers-documentation.patch --]
[-- Type: text/plain, Size: 8936 bytes --]

Here is some documentation explaining what is/how to use the Linux
Kernel Markers.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 Documentation/marker.txt |  266 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 266 insertions(+)

Index: linux-2.6-lttng/Documentation/marker.txt
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/Documentation/marker.txt	2007-05-09 18:16:13.000000000 -0400
@@ -0,0 +1,266 @@
+ 	             Using the Linux Kernel Markers
+
+			    Mathieu Desnoyers
+
+
+	This document introduces to markers and discusses its purpose. It
+shows some usage examples of the Linux Kernel Markers : how to insert markers
+within the kernel and how to connect probes to a marker. Finally, it has some
+probe module examples. This is what connects to a marker.
+
+
+* Purpose of markers
+
+A marker placed in your code provides a hook to a function (probe) that
+you can provide at runtime. A marker can be "on" (a probe is connected to it)
+or "off" (no probe is attached). An "off" marker has no effect. When turned on,
+the function you provide is called each time the marker is executed in the
+execution context of the caller. When the function provided ends its execution,
+it returns to the caller (marker site).
+
+You can put markers at important locations in the code. They act as
+lightweight hooks that can pass an arbitrary number of parameters,
+described in a printk-like format string, to a function whenever the marker
+code is reached.
+
+They can be used for tracing (LTTng, LKET over SystemTAP), overall performance
+accounting (SystemTAP). They could also be used to implement
+efficient hooks for SELinux or any other subsystem that would have this
+kind of need.
+
+Using the markers for system audit (SELinux) would require to pass a
+variable by address that would be later checked by the marked routine.
+
+
+* Usage
+
+In order to use the macro MARK, you should include linux/marker.h.
+
+#include <linux/marker.h>
+
+Add, in your code :
+
+trace_mark(subsystem_event, "%d %s", someint, somestring);
+Where :
+- subsystem_event is an identifier unique to your event
+    - subsystem is the name of your subsystem.
+    - event is the name of the event to mark.
+- "%d %s" is the formatted string for the serializer.
+- someint is an integer.
+- somestring is a char pointer.
+
+Connecting a function (probe) to a marker is done by providing a probe
+(function to call) for the specific marker through marker_set_probe(). It will
+automatically connect the function and enable the marker site. Removing a probe
+is done through marker_remove_probe(). Probe removal is preempt safe because
+preemption is disabled around the probe call. See the "Probe example" section
+below for a sample probe module.
+
+The marker mechanism supports multiple instances of the same marker.
+Markers can be put in inline functions, inlined static functions and
+unrolled loops.
+
+Note : It is safe to put markers within preempt-safe code : preempt_enable()
+will not call the scheduler due to the tests in preempt_schedule().
+
+
+* Optimization for a given architecture
+
+You will find, in asm-*/marker.h, optimisations for given architectures
+(currently i386 and powerpc). They use a load immediate instead of a data load,
+which saves a data cache hit, but also requires cross CPU code modification. In
+order to support embedded systems which use read-only memory for their code, the
+optimization can be disabled through menu options.
+
+The MF_* flags can be used to control the type of marker. See the
+include/marker.h header for the list of flags. They can be specified as the
+first parameter of the _trace_mark() macro, such as the following example which
+is safe with respect to lockdep.c (useful for marking lockdep.c and printk
+functions).
+
+_trace_mark(MF_DEFAULT | ~MF_LOCKDEP, subsystem_eventb, MARK_NOARGS);
+
+Another example is to specify that a specific marker must never call printk :
+_trace_mark(MF_DEFAULT | ~MF_PRINTK, subsystem_eventc,
+  "%d %s", someint, somestring,);
+
+Flag compatibility is checked before connecting the probe to the marker : the
+right flags must be given to _marker_set_enable().
+
+
+* Probe example
+
+You can build the kernel modules, probe-example.ko and marker-example.ko,
+using the following Makefile:
+------------------------------ CUT -------------------------------------
+obj-m := probe-example.o marker-example.o
+KDIR := /lib/modules/$(shell uname -r)/build
+PWD := $(shell pwd)
+default:
+	$(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules
+clean:
+	rm -f *.mod.c *.ko *.o
+------------------------------ CUT -------------------------------------
+/* probe-example.c
+ *
+ * Connects two functions to marker call sites.
+ *
+ * (C) Copyright 2007 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <linux/sched.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/marker.h>
+#include <asm/atomic.h>
+
+#define NUM_PROBES ARRAY_SIZE(probe_array)
+
+struct probe_data {
+	const char *name;
+	const char *format;
+	marker_probe_func *probe_func;
+};
+
+void probe_subsystem_event(const struct __mark_marker_c *mdata,
+		const char *format, ...)
+{
+	va_list ap;
+	/* Declare args */
+	unsigned int value;
+	const char *mystr;
+
+	/* Assign args */
+	va_start(ap, format);
+	value = va_arg(ap, typeof(value));
+	mystr = va_arg(ap, typeof(mystr));
+
+	/* Call printk */
+	printk("Value %u, string %s\n", value, mystr);
+
+	/* or count, check rights, serialize data in a buffer */
+
+	va_end(ap);
+}
+
+atomic_t eventb_count = ATOMIC_INIT(0);
+
+void probe_subsystem_eventb(const struct __mark_marker_c *mdata,
+	const char *format, ...)
+{
+	/* Increment counter */
+	atomic_inc(&eventb_count);
+}
+
+static struct probe_data probe_array[] =
+{
+	{	.name = "subsystem_event",
+		.format = "%d %s",
+		.probe_func = probe_subsystem_event },
+	{	.name = "subsystem_eventb",
+		.format = MARK_NOARGS,
+		.probe_func = probe_subsystem_eventb },
+};
+
+static int __init probe_init(void)
+{
+	int result;
+	uint8_t eID;
+
+	for (eID = 0; eID < NUM_PROBES; eID++) {
+		result = marker_set_probe(probe_array[eID].name,
+				probe_array[eID].format,
+				probe_array[eID].probe_func, &probe_array[eID]);
+		if (!result)
+			printk(KERN_INFO "Unable to register probe %s\n",
+				probe_array[eID].name);
+	}
+	return 0;
+}
+
+static void __exit probe_fini(void)
+{
+	uint8_t eID;
+
+	for (eID = 0; eID < NUM_PROBES; eID++) {
+		marker_remove_probe(probe_array[eID].name);
+	}
+	synchronize_sched();	/* Wait for probes to finish */
+	printk("Number of event b : %u\n", atomic_read(&eventb_count));
+}
+
+module_init(probe_init);
+module_exit(probe_fini);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Mathieu Desnoyers");
+MODULE_DESCRIPTION("SUBSYSTEM Probe");
+------------------------------ CUT -------------------------------------
+/* marker-example.c
+ *
+ * Executes a marker when /proc/marker-example is opened.
+ *
+ * (C) Copyright 2007 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <linux/module.h>
+#include <linux/marker.h>
+#include <linux/sched.h>
+#include <linux/proc_fs.h>
+
+struct proc_dir_entry *pentry_example = NULL;
+
+static int my_open(struct inode *inode, struct file *file)
+{
+	int i;
+
+	trace_mark(subsystem_event, "%d %s", 123, "example string");
+	for (i=0; i<10; i++) {
+		trace_mark(subsystem_eventb, MARK_NOARGS);
+	}
+	return -EPERM;
+}
+
+static struct file_operations mark_ops = {
+	.open = my_open,
+};
+
+static int example_init(void)
+{
+	printk(KERN_ALERT "example init\n");
+	pentry_example = create_proc_entry("marker-example", 0444, NULL);
+	if (pentry_example)
+		pentry_example->proc_fops = &mark_ops;
+	else
+		return -EPERM;
+	return 0;
+}
+
+static void example_exit(void)
+{
+	printk(KERN_ALERT "example exit\n");
+	remove_proc_entry("marker-example", NULL);
+}
+
+module_init(example_init)
+module_exit(example_exit)
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Mathieu Desnoyers");
+MODULE_DESCRIPTION("Linux Trace Toolkit example");
+------------------------------ CUT -------------------------------------
+Sequence of operations : (as root)
+make
+insmod marker-example.ko
+insmod probe-example.ko
+  (it is important to load the probe after the marked code)
+cat /proc/marker-example (returns an expected error)
+rmmod marker-example probe-example
+dmesg
+------------------------------ CUT -------------------------------------

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [patch 08/10] Defines the linker macro EXTRA_RWDATA for the marker data section.
  2007-05-10  1:55 [patch 00/10] Linux Kernel Markers for 2.6.21-mm2 Mathieu Desnoyers
                   ` (6 preceding siblings ...)
  2007-05-10  1:56 ` [patch 07/10] Linux Kernel Markers - Documentation Mathieu Desnoyers
@ 2007-05-10  1:56 ` Mathieu Desnoyers
  2007-05-10  1:56 ` [patch 09/10] Linux Kernel Markers - Use EXTRA_RWDATA in architectures Mathieu Desnoyers
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 61+ messages in thread
From: Mathieu Desnoyers @ 2007-05-10  1:56 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, hch, Mathieu Desnoyers, linux-arch

[-- Attachment #1: linux-kernel-markers-define-the-linker-macro-extra_rwdata.patch --]
[-- Type: text/plain, Size: 2784 bytes --]

Defines the linker macro EXTRA_RWDATA for the marker data section.  It puts
the marker data in a separate section that will not pollute the normal .data
section, which minimize the cache impact.  Markers need such a special section
because they define a lot of spreaded and small data structures at multiple
sites.

This patch also creates the __markers_strings section (ro marker strings) and
makes sure the __markers section is aligned by putting it before the
__ksymtab_strings (not after).

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/asm-generic/vmlinux.lds.h |   25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

Index: linux-2.6-lttng/include/asm-generic/vmlinux.lds.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-generic/vmlinux.lds.h	2007-05-08 17:07:12.000000000 -0400
+++ linux-2.6-lttng/include/asm-generic/vmlinux.lds.h	2007-05-08 17:09:12.000000000 -0400
@@ -116,21 +116,19 @@
 		*(__kcrctab_gpl_future)					\
 		VMLINUX_SYMBOL(__stop___kcrctab_gpl_future) = .;	\
 	}								\
-									\
-	/* Kernel symbol table: strings */				\
-        __ksymtab_strings : AT(ADDR(__ksymtab_strings) - LOAD_OFFSET) {	\
-		*(__ksymtab_strings)					\
-	}								\
 	/* Kernel markers : pointers */					\
-	.markers : AT(ADDR(.markers) - LOAD_OFFSET) {			\
+	__markers : AT(ADDR(__markers) - LOAD_OFFSET) {			\
 		VMLINUX_SYMBOL(__start___markers) = .;			\
-		*(.markers)						\
+		*(__markers)						\
 		VMLINUX_SYMBOL(__stop___markers) = .;			\
 	}								\
-	.markers.c : AT(ADDR(.markers.c) - LOAD_OFFSET) {		\
-		VMLINUX_SYMBOL(__start___markers_c) = .;		\
-		*(.markers.c)						\
-		VMLINUX_SYMBOL(__stop___markers_c) = .;			\
+	/* Kernel symbol table: strings */				\
+        __ksymtab_strings : AT(ADDR(__ksymtab_strings) - LOAD_OFFSET) {	\
+		*(__ksymtab_strings)					\
+	}								\
+	/* Kernel markers : strings */					\
+	__markers_strings : AT(ADDR(__markers_strings) - LOAD_OFFSET) {	\
+		*(__markers_strings)					\
 	}								\
 	__end_rodata = .;						\
 	. = ALIGN(4096);						\
@@ -145,6 +143,10 @@
 									\
 	. = ALIGN(4096);
 
+#define EXTRA_RWDATA							\
+	. = ALIGN(8);							\
+	*(__markers_data)						\
+
 #define SECURITY_INIT							\
 	.security_initcall.init : AT(ADDR(.security_initcall.init) - LOAD_OFFSET) { \
 		VMLINUX_SYMBOL(__security_initcall_start) = .;		\
@@ -241,4 +243,3 @@
   	*(.initcall6s.init)						\
   	*(.initcall7.init)						\
   	*(.initcall7s.init)
-

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [patch 09/10] Linux Kernel Markers - Use EXTRA_RWDATA in architectures
  2007-05-10  1:55 [patch 00/10] Linux Kernel Markers for 2.6.21-mm2 Mathieu Desnoyers
                   ` (7 preceding siblings ...)
  2007-05-10  1:56 ` [patch 08/10] Defines the linker macro EXTRA_RWDATA for the marker data section Mathieu Desnoyers
@ 2007-05-10  1:56 ` Mathieu Desnoyers
  2007-05-10  1:56 ` [patch 10/10] Port of blktrace to the Linux Kernel Markers Mathieu Desnoyers
  2007-05-10  2:30 ` [patch 00/10] Linux Kernel Markers for 2.6.21-mm2 Andrew Morton
  10 siblings, 0 replies; 61+ messages in thread
From: Mathieu Desnoyers @ 2007-05-10  1:56 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, hch, Mathieu Desnoyers, linux-arch

[-- Attachment #1: linux-kernel-markers-use-extra_rwdata-in-architectures.patch --]
[-- Type: text/plain, Size: 15039 bytes --]

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/alpha/kernel/vmlinux.lds.S            |    1 +
 arch/arm/kernel/vmlinux.lds.S              |    1 +
 arch/arm26/kernel/vmlinux-arm26-xip.lds.in |    1 +
 arch/arm26/kernel/vmlinux-arm26.lds.in     |    1 +
 arch/avr32/kernel/vmlinux.lds.c            |    1 +
 arch/cris/arch-v10/vmlinux.lds.S           |    1 +
 arch/cris/arch-v32/vmlinux.lds.S           |    1 +
 arch/frv/kernel/vmlinux.lds.S              |    1 +
 arch/h8300/kernel/vmlinux.lds.S            |    4 +++-
 arch/i386/kernel/vmlinux.lds.S             |    1 +
 arch/ia64/kernel/vmlinux.lds.S             |    2 +-
 arch/m32r/kernel/vmlinux.lds.S             |    1 +
 arch/m68k/kernel/vmlinux-std.lds           |    1 +
 arch/m68k/kernel/vmlinux-sun3.lds          |    1 +
 arch/m68knommu/kernel/vmlinux.lds.S        |    1 +
 arch/mips/kernel/vmlinux.lds.S             |    2 ++
 arch/parisc/kernel/vmlinux.lds.S           |    1 +
 arch/powerpc/kernel/vmlinux.lds.S          |    2 ++
 arch/ppc/kernel/vmlinux.lds.S              |    1 +
 arch/s390/kernel/vmlinux.lds.S             |    1 +
 arch/sh/kernel/vmlinux.lds.S               |    1 +
 arch/sh64/kernel/vmlinux.lds.S             |    1 +
 arch/sparc/kernel/vmlinux.lds.S            |    1 +
 arch/sparc64/kernel/vmlinux.lds.S          |    1 +
 arch/um/kernel/dyn.lds.S                   |    1 +
 arch/um/kernel/uml.lds.S                   |    1 +
 arch/v850/kernel/vmlinux.lds.S             |    1 +
 arch/x86_64/kernel/vmlinux.lds.S           |    1 +
 arch/xtensa/kernel/vmlinux.lds.S           |    2 +-
 29 files changed, 33 insertions(+), 3 deletions(-)

Index: linux-2.6-lttng/arch/alpha/kernel/vmlinux.lds.S
===================================================================
--- linux-2.6-lttng.orig/arch/alpha/kernel/vmlinux.lds.S	2007-05-09 18:14:50.000000000 -0400
+++ linux-2.6-lttng/arch/alpha/kernel/vmlinux.lds.S	2007-05-09 18:16:17.000000000 -0400
@@ -90,6 +90,7 @@
   _data = .;
   .data : {					/* Data */
 	*(.data)
+	EXTRA_RWDATA
 	CONSTRUCTORS
   }
 
Index: linux-2.6-lttng/arch/arm/kernel/vmlinux.lds.S
===================================================================
--- linux-2.6-lttng.orig/arch/arm/kernel/vmlinux.lds.S	2007-05-09 18:14:50.000000000 -0400
+++ linux-2.6-lttng/arch/arm/kernel/vmlinux.lds.S	2007-05-09 18:16:17.000000000 -0400
@@ -159,6 +159,7 @@
 		 * and the usual data section
 		 */
 		*(.data)
+		EXTRA_RWDATA
 		CONSTRUCTORS
 
 		_edata = .;
Index: linux-2.6-lttng/arch/arm26/kernel/vmlinux-arm26-xip.lds.in
===================================================================
--- linux-2.6-lttng.orig/arch/arm26/kernel/vmlinux-arm26-xip.lds.in	2007-05-09 18:14:50.000000000 -0400
+++ linux-2.6-lttng/arch/arm26/kernel/vmlinux-arm26-xip.lds.in	2007-05-09 18:16:17.000000000 -0400
@@ -112,6 +112,7 @@
 		 * and the usual data section
 		 */
 		*(.data)
+		EXTRA_RWDATA
 		CONSTRUCTORS
 
 		*(.init.data)
Index: linux-2.6-lttng/arch/arm26/kernel/vmlinux-arm26.lds.in
===================================================================
--- linux-2.6-lttng.orig/arch/arm26/kernel/vmlinux-arm26.lds.in	2007-05-09 18:14:50.000000000 -0400
+++ linux-2.6-lttng/arch/arm26/kernel/vmlinux-arm26.lds.in	2007-05-09 18:16:17.000000000 -0400
@@ -107,6 +107,7 @@
 		 * and the usual data section
 		 */
 		*(.data)
+		EXTRA_RWDATA
 		CONSTRUCTORS
 
 		_edata = .;
Index: linux-2.6-lttng/arch/avr32/kernel/vmlinux.lds.c
===================================================================
--- linux-2.6-lttng.orig/arch/avr32/kernel/vmlinux.lds.c	2007-05-09 18:14:50.000000000 -0400
+++ linux-2.6-lttng/arch/avr32/kernel/vmlinux.lds.c	2007-05-09 18:16:17.000000000 -0400
@@ -113,6 +113,7 @@
 		/* And the rest... */
 		*(.data.rel*)
 		*(.data)
+		EXTRA_RWDATA
 		CONSTRUCTORS
 
 		_edata = .;
Index: linux-2.6-lttng/arch/cris/arch-v10/vmlinux.lds.S
===================================================================
--- linux-2.6-lttng.orig/arch/cris/arch-v10/vmlinux.lds.S	2007-05-09 18:14:50.000000000 -0400
+++ linux-2.6-lttng/arch/cris/arch-v10/vmlinux.lds.S	2007-05-09 18:16:17.000000000 -0400
@@ -45,6 +45,7 @@
 	__Sdata = . ;
 	.data : {                     /* Data */
 		*(.data)
+		EXTRA_RWDATA
 	}
 	__edata = . ;                 /* End of data section */
 	_edata = . ;
Index: linux-2.6-lttng/arch/cris/arch-v32/vmlinux.lds.S
===================================================================
--- linux-2.6-lttng.orig/arch/cris/arch-v32/vmlinux.lds.S	2007-05-09 18:14:50.000000000 -0400
+++ linux-2.6-lttng/arch/cris/arch-v32/vmlinux.lds.S	2007-05-09 18:16:17.000000000 -0400
@@ -50,6 +50,7 @@
 	__Sdata = . ;
 	.data : {                     /* Data */
 		*(.data)
+		EXTRA_RWDATA
 	}
 	__edata = . ;		/* End of data section. */
 	_edata = . ;
Index: linux-2.6-lttng/arch/frv/kernel/vmlinux.lds.S
===================================================================
--- linux-2.6-lttng.orig/arch/frv/kernel/vmlinux.lds.S	2007-05-09 18:14:50.000000000 -0400
+++ linux-2.6-lttng/arch/frv/kernel/vmlinux.lds.S	2007-05-09 18:16:17.000000000 -0400
@@ -137,6 +137,7 @@
   .data : {			/* Data */
 	*(.data .data.*)
 	*(.exit.data)
+	EXTRA_RWDATA
 	CONSTRUCTORS
 	}
 
Index: linux-2.6-lttng/arch/h8300/kernel/vmlinux.lds.S
===================================================================
--- linux-2.6-lttng.orig/arch/h8300/kernel/vmlinux.lds.S	2007-05-09 18:14:50.000000000 -0400
+++ linux-2.6-lttng/arch/h8300/kernel/vmlinux.lds.S	2007-05-09 18:16:17.000000000 -0400
@@ -105,7 +105,9 @@
 	. = ALIGN(0x4) ;
 		*(.data)
 	. = ALIGN(0x4) ;
-		*(.data.*)	
+		*(.data.*)
+	. = ALIGN(0x4) ;
+	EXTRA_RWDATA
 
 	. = ALIGN(0x4) ;
 	___init_begin = .;
Index: linux-2.6-lttng/arch/i386/kernel/vmlinux.lds.S
===================================================================
--- linux-2.6-lttng.orig/arch/i386/kernel/vmlinux.lds.S	2007-05-09 18:14:50.000000000 -0400
+++ linux-2.6-lttng/arch/i386/kernel/vmlinux.lds.S	2007-05-09 18:16:17.000000000 -0400
@@ -77,6 +77,7 @@
   . = ALIGN(4096);
   .data : AT(ADDR(.data) - LOAD_OFFSET) {	/* Data */
 	*(.data)
+	EXTRA_RWDATA
 	CONSTRUCTORS
 	} :data
 
Index: linux-2.6-lttng/arch/ia64/kernel/vmlinux.lds.S
===================================================================
--- linux-2.6-lttng.orig/arch/ia64/kernel/vmlinux.lds.S	2007-05-09 18:14:50.000000000 -0400
+++ linux-2.6-lttng/arch/ia64/kernel/vmlinux.lds.S	2007-05-09 18:16:17.000000000 -0400
@@ -214,7 +214,7 @@
 
   data : { } :data
   .data : AT(ADDR(.data) - LOAD_OFFSET)
-	{ *(.data) *(.data1) *(.gnu.linkonce.d*) CONSTRUCTORS }
+	{ *(.data) *(.data1) *(.gnu.linkonce.d*) EXTRA_RWDATA CONSTRUCTORS }
 
   . = ALIGN(16);	/* gp must be 16-byte aligned for exc. table */
   .got : AT(ADDR(.got) - LOAD_OFFSET)
Index: linux-2.6-lttng/arch/m32r/kernel/vmlinux.lds.S
===================================================================
--- linux-2.6-lttng.orig/arch/m32r/kernel/vmlinux.lds.S	2007-05-09 18:14:50.000000000 -0400
+++ linux-2.6-lttng/arch/m32r/kernel/vmlinux.lds.S	2007-05-09 18:16:17.000000000 -0400
@@ -51,6 +51,7 @@
 	*(.spu)
 	*(.spi)
 	*(.data)
+	EXTRA_RWDATA
 	CONSTRUCTORS
 	}
 
Index: linux-2.6-lttng/arch/m68k/kernel/vmlinux-std.lds
===================================================================
--- linux-2.6-lttng.orig/arch/m68k/kernel/vmlinux-std.lds	2007-05-09 18:14:50.000000000 -0400
+++ linux-2.6-lttng/arch/m68k/kernel/vmlinux-std.lds	2007-05-09 18:16:17.000000000 -0400
@@ -29,6 +29,7 @@
 
   .data : {			/* Data */
 	*(.data)
+	EXTRA_RWDATA
 	CONSTRUCTORS
 	}
 
Index: linux-2.6-lttng/arch/m68k/kernel/vmlinux-sun3.lds
===================================================================
--- linux-2.6-lttng.orig/arch/m68k/kernel/vmlinux-sun3.lds	2007-05-09 18:14:50.000000000 -0400
+++ linux-2.6-lttng/arch/m68k/kernel/vmlinux-sun3.lds	2007-05-09 18:16:17.000000000 -0400
@@ -24,6 +24,7 @@
 
   .data : {			/* Data */
 	*(.data)
+	EXTRA_RWDATA
 	CONSTRUCTORS
 	. = ALIGN(16);		/* Exception table */
 	__start___ex_table = .;
Index: linux-2.6-lttng/arch/m68knommu/kernel/vmlinux.lds.S
===================================================================
--- linux-2.6-lttng.orig/arch/m68knommu/kernel/vmlinux.lds.S	2007-05-09 18:14:50.000000000 -0400
+++ linux-2.6-lttng/arch/m68knommu/kernel/vmlinux.lds.S	2007-05-09 18:16:17.000000000 -0400
@@ -134,6 +134,7 @@
 		. = ALIGN(4);
 		_sdata = . ;
 		*(.data)
+		EXTRA_RWDATA
 		. = ALIGN(8192) ;
 		*(.data.init_task)
 		_edata = . ;
Index: linux-2.6-lttng/arch/mips/kernel/vmlinux.lds.S
===================================================================
--- linux-2.6-lttng.orig/arch/mips/kernel/vmlinux.lds.S	2007-05-09 18:14:50.000000000 -0400
+++ linux-2.6-lttng/arch/mips/kernel/vmlinux.lds.S	2007-05-09 18:16:17.000000000 -0400
@@ -64,6 +64,8 @@
 
     *(.data)
 
+    EXTRA_RWDATA
+
     CONSTRUCTORS
   }
   _gp = . + 0x8000;
Index: linux-2.6-lttng/arch/parisc/kernel/vmlinux.lds.S
===================================================================
--- linux-2.6-lttng.orig/arch/parisc/kernel/vmlinux.lds.S	2007-05-09 18:14:50.000000000 -0400
+++ linux-2.6-lttng/arch/parisc/kernel/vmlinux.lds.S	2007-05-09 18:16:17.000000000 -0400
@@ -92,6 +92,7 @@
   . = ALIGN(L1_CACHE_BYTES);
   .data : {			/* Data */
 	*(.data)
+	EXTRA_RWDATA
 	CONSTRUCTORS
 	}
 
Index: linux-2.6-lttng/arch/powerpc/kernel/vmlinux.lds.S
===================================================================
--- linux-2.6-lttng.orig/arch/powerpc/kernel/vmlinux.lds.S	2007-05-09 18:14:50.000000000 -0400
+++ linux-2.6-lttng/arch/powerpc/kernel/vmlinux.lds.S	2007-05-09 18:16:17.000000000 -0400
@@ -170,11 +170,13 @@
 		*(.data)
 		*(.sdata)
 		*(.got.plt) *(.got)
+		EXTRA_RWDATA
 	}
 #else
 	.data : {
 		*(.data .data.rel* .toc1)
 		*(.branch_lt)
+		EXTRA_RWDATA
 	}
 
 	.opd : {
Index: linux-2.6-lttng/arch/ppc/kernel/vmlinux.lds.S
===================================================================
--- linux-2.6-lttng.orig/arch/ppc/kernel/vmlinux.lds.S	2007-05-09 18:14:50.000000000 -0400
+++ linux-2.6-lttng/arch/ppc/kernel/vmlinux.lds.S	2007-05-09 18:16:17.000000000 -0400
@@ -73,6 +73,7 @@
     *(.sdata2)
     *(.got.plt) *(.got)
     *(.dynamic)
+    EXTRA_RWDATA
     CONSTRUCTORS
   }
 
Index: linux-2.6-lttng/arch/s390/kernel/vmlinux.lds.S
===================================================================
--- linux-2.6-lttng.orig/arch/s390/kernel/vmlinux.lds.S	2007-05-09 18:14:50.000000000 -0400
+++ linux-2.6-lttng/arch/s390/kernel/vmlinux.lds.S	2007-05-09 18:16:17.000000000 -0400
@@ -49,6 +49,7 @@
 
   .data : {			/* Data */
 	*(.data)
+	EXTRA_RWDATA
 	CONSTRUCTORS
 	}
 
Index: linux-2.6-lttng/arch/sh/kernel/vmlinux.lds.S
===================================================================
--- linux-2.6-lttng.orig/arch/sh/kernel/vmlinux.lds.S	2007-05-09 18:14:50.000000000 -0400
+++ linux-2.6-lttng/arch/sh/kernel/vmlinux.lds.S	2007-05-09 18:16:17.000000000 -0400
@@ -42,6 +42,7 @@
 
   .data : {			/* Data */
 	*(.data)
+	EXTRA_RWDATA
 
  	 /* Align the initial ramdisk image (INITRD) on page boundaries. */
  	 . = ALIGN(PAGE_SIZE);
Index: linux-2.6-lttng/arch/sh64/kernel/vmlinux.lds.S
===================================================================
--- linux-2.6-lttng.orig/arch/sh64/kernel/vmlinux.lds.S	2007-05-09 18:14:50.000000000 -0400
+++ linux-2.6-lttng/arch/sh64/kernel/vmlinux.lds.S	2007-05-09 18:16:17.000000000 -0400
@@ -79,6 +79,7 @@
 
   .data : C_PHYS(.data) {			/* Data */
 	*(.data)
+	EXTRA_RWDATA
 	CONSTRUCTORS
 	}
 
Index: linux-2.6-lttng/arch/sparc/kernel/vmlinux.lds.S
===================================================================
--- linux-2.6-lttng.orig/arch/sparc/kernel/vmlinux.lds.S	2007-05-09 18:14:50.000000000 -0400
+++ linux-2.6-lttng/arch/sparc/kernel/vmlinux.lds.S	2007-05-09 18:16:17.000000000 -0400
@@ -23,6 +23,7 @@
   .data    :
   {
     *(.data)
+    EXTRA_RWDATA
     CONSTRUCTORS
   }
   .data1   : { *(.data1) }
Index: linux-2.6-lttng/arch/sparc64/kernel/vmlinux.lds.S
===================================================================
--- linux-2.6-lttng.orig/arch/sparc64/kernel/vmlinux.lds.S	2007-05-09 18:14:50.000000000 -0400
+++ linux-2.6-lttng/arch/sparc64/kernel/vmlinux.lds.S	2007-05-09 18:16:17.000000000 -0400
@@ -28,6 +28,7 @@
   .data    :
   {
     *(.data)
+    EXTRA_RWDATA
     CONSTRUCTORS
   }
   .data1   : { *(.data1) }
Index: linux-2.6-lttng/arch/um/kernel/dyn.lds.S
===================================================================
--- linux-2.6-lttng.orig/arch/um/kernel/dyn.lds.S	2007-05-09 18:14:50.000000000 -0400
+++ linux-2.6-lttng/arch/um/kernel/dyn.lds.S	2007-05-09 18:16:17.000000000 -0400
@@ -98,6 +98,7 @@
     . = ALIGN(KERNEL_STACK_SIZE);		/* init_task */
     *(.data.init_task)
     *(.data .data.* .gnu.linkonce.d.*)
+    EXTRA_RWDATA
     SORT(CONSTRUCTORS)
   }
   .data1          : { *(.data1) }
Index: linux-2.6-lttng/arch/um/kernel/uml.lds.S
===================================================================
--- linux-2.6-lttng.orig/arch/um/kernel/uml.lds.S	2007-05-09 18:14:50.000000000 -0400
+++ linux-2.6-lttng/arch/um/kernel/uml.lds.S	2007-05-09 18:16:17.000000000 -0400
@@ -61,6 +61,7 @@
     *(.data.init_task)
     *(.data)
     *(.gnu.linkonce.d*)
+    EXTRA_RWDATA
     CONSTRUCTORS
   }
   .data1   : { *(.data1) }
Index: linux-2.6-lttng/arch/v850/kernel/vmlinux.lds.S
===================================================================
--- linux-2.6-lttng.orig/arch/v850/kernel/vmlinux.lds.S	2007-05-09 18:14:51.000000000 -0400
+++ linux-2.6-lttng/arch/v850/kernel/vmlinux.lds.S	2007-05-09 18:16:17.000000000 -0400
@@ -116,6 +116,7 @@
         	*(.data)						      \
 			*(.exit.data)	/* 2.5 convention */		      \
 			*(.data.exit)	/* 2.4 convention */		      \
+		EXTRA_RWDATA						      \
 		. = ALIGN (16) ;					      \
 		*(.data.cacheline_aligned)				      \
 		. = ALIGN (0x2000) ;					      \
Index: linux-2.6-lttng/arch/x86_64/kernel/vmlinux.lds.S
===================================================================
--- linux-2.6-lttng.orig/arch/x86_64/kernel/vmlinux.lds.S	2007-05-09 18:14:51.000000000 -0400
+++ linux-2.6-lttng/arch/x86_64/kernel/vmlinux.lds.S	2007-05-09 18:16:17.000000000 -0400
@@ -56,6 +56,7 @@
 				/* Data */
   .data : AT(ADDR(.data) - LOAD_OFFSET) {
 	*(.data)
+	EXTRA_RWDATA
 	CONSTRUCTORS
 	} :data
 
Index: linux-2.6-lttng/arch/xtensa/kernel/vmlinux.lds.S
===================================================================
--- linux-2.6-lttng.orig/arch/xtensa/kernel/vmlinux.lds.S	2007-05-09 18:14:51.000000000 -0400
+++ linux-2.6-lttng/arch/xtensa/kernel/vmlinux.lds.S	2007-05-09 18:16:17.000000000 -0400
@@ -144,7 +144,7 @@
   _fdata = .;
   .data :
   {
-    *(.data) CONSTRUCTORS
+    *(.data) EXTRA_RWDATA CONSTRUCTORS
     . = ALIGN(XCHAL_ICACHE_LINESIZE);
     *(.data.cacheline_aligned)
   }

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [patch 10/10] Port of blktrace to the Linux Kernel Markers.
  2007-05-10  1:55 [patch 00/10] Linux Kernel Markers for 2.6.21-mm2 Mathieu Desnoyers
                   ` (8 preceding siblings ...)
  2007-05-10  1:56 ` [patch 09/10] Linux Kernel Markers - Use EXTRA_RWDATA in architectures Mathieu Desnoyers
@ 2007-05-10  1:56 ` Mathieu Desnoyers
  2007-05-10  6:53   ` Christoph Hellwig
  2007-05-10  9:20   ` Jens Axboe
  2007-05-10  2:30 ` [patch 00/10] Linux Kernel Markers for 2.6.21-mm2 Andrew Morton
  10 siblings, 2 replies; 61+ messages in thread
From: Mathieu Desnoyers @ 2007-05-10  1:56 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, hch, Mathieu Desnoyers

[-- Attachment #1: linux-kernel-markers-port-blktrace-to-markers.patch --]
[-- Type: text/plain, Size: 23900 bytes --]

Here is a proof of concept patch, for demonstration purpose, of moving
blktrace to the markers.

A few remarks : this patch has the positive effect of removing some code
from the block io tracing hot paths, minimizing the i-cache impact in a
system where the io tracing is compiled in but inactive.

It also moves the blk tracing code from a header (and therefore from the
body of the instrumented functions) to a separate C file.

There, as soon as one device has to be traced, every devices have to
fall into the tracing function call. This is slower than the previous
inline function which tested the condition quickly. If it becomes a
show stopper, it could be fixed by having the possibility to test a
supplementary condition, dependant of the marker context, at the marker
site, just after the enable/disable test.

It does not make the code smaller, since I left all the specialized
tracing functions for requests, bio, generic, remap, which would go away
once a generic infrastructure is in place to serialize the information
passed to the marker. This is mostly why I consider it a proof a
concept.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---

 block/Kconfig                |    1 
 block/blktrace.c             |  282 ++++++++++++++++++++++++++++++++++++++++++-
 block/elevator.c             |    6 
 block/ll_rw_blk.c            |   27 ++--
 drivers/block/cciss.c        |    4 
 drivers/md/dm.c              |   14 +-
 fs/bio.c                     |    4 
 include/linux/blktrace_api.h |  146 +---------------------
 mm/bounce.c                  |    4 
 mm/highmem.c                 |    2 
 10 files changed, 323 insertions(+), 167 deletions(-)

Index: linux-2.6-lttng/block/elevator.c
===================================================================
--- linux-2.6-lttng.orig/block/elevator.c	2007-05-09 18:09:47.000000000 -0400
+++ linux-2.6-lttng/block/elevator.c	2007-05-09 18:16:30.000000000 -0400
@@ -32,7 +32,7 @@
 #include <linux/init.h>
 #include <linux/compiler.h>
 #include <linux/delay.h>
-#include <linux/blktrace_api.h>
+#include <linux/marker.h>
 #include <linux/hash.h>
 
 #include <asm/uaccess.h>
@@ -551,7 +551,7 @@
 	unsigned ordseq;
 	int unplug_it = 1;
 
-	blk_add_trace_rq(q, rq, BLK_TA_INSERT);
+	trace_mark(blk_request_insert, "%p %p", q, rq);
 
 	rq->q = q;
 
@@ -730,7 +730,7 @@
 			 * not be passed by new incoming requests
 			 */
 			rq->cmd_flags |= REQ_STARTED;
-			blk_add_trace_rq(q, rq, BLK_TA_ISSUE);
+			trace_mark(blk_request_issue, "%p %p", q, rq);
 		}
 
 		if (!q->boundary_rq || q->boundary_rq == rq) {
Index: linux-2.6-lttng/block/ll_rw_blk.c
===================================================================
--- linux-2.6-lttng.orig/block/ll_rw_blk.c	2007-05-09 18:14:40.000000000 -0400
+++ linux-2.6-lttng/block/ll_rw_blk.c	2007-05-09 18:16:30.000000000 -0400
@@ -28,6 +28,7 @@
 #include <linux/task_io_accounting_ops.h>
 #include <linux/interrupt.h>
 #include <linux/cpu.h>
+#include <linux/marker.h>
 #include <linux/blktrace_api.h>
 #include <linux/fault-inject.h>
 
@@ -1551,7 +1552,7 @@
 
 	if (!test_and_set_bit(QUEUE_FLAG_PLUGGED, &q->queue_flags)) {
 		mod_timer(&q->unplug_timer, jiffies + q->unplug_delay);
-		blk_add_trace_generic(q, NULL, 0, BLK_TA_PLUG);
+		trace_mark(blk_plug_device, "%p %p %d", q, NULL, 0);
 	}
 }
 
@@ -1617,7 +1618,7 @@
 	 * devices don't necessarily have an ->unplug_fn defined
 	 */
 	if (q->unplug_fn) {
-		blk_add_trace_pdu_int(q, BLK_TA_UNPLUG_IO, NULL,
+		trace_mark(blk_pdu_unplug_io, "%p %p %d", q, NULL,
 					q->rq.count[READ] + q->rq.count[WRITE]);
 
 		q->unplug_fn(q);
@@ -1628,7 +1629,7 @@
 {
 	request_queue_t *q = container_of(work, request_queue_t, unplug_work);
 
-	blk_add_trace_pdu_int(q, BLK_TA_UNPLUG_IO, NULL,
+	trace_mark(blk_pdu_unplug_io, "%p %p %d", q, NULL,
 				q->rq.count[READ] + q->rq.count[WRITE]);
 
 	q->unplug_fn(q);
@@ -1638,7 +1639,7 @@
 {
 	request_queue_t *q = (request_queue_t *)data;
 
-	blk_add_trace_pdu_int(q, BLK_TA_UNPLUG_TIMER, NULL,
+	trace_mark(blk_pdu_unplug_timer, "%p %p %d", q, NULL,
 				q->rq.count[READ] + q->rq.count[WRITE]);
 
 	kblockd_schedule_work(&q->unplug_work);
@@ -2150,7 +2151,7 @@
 	
 	rq_init(q, rq);
 
-	blk_add_trace_generic(q, bio, rw, BLK_TA_GETRQ);
+	trace_mark(blk_get_request, "%p %p %d", q, bio, rw);
 out:
 	return rq;
 }
@@ -2180,7 +2181,7 @@
 		if (!rq) {
 			struct io_context *ioc;
 
-			blk_add_trace_generic(q, bio, rw, BLK_TA_SLEEPRQ);
+			trace_mark(blk_sleep_request, "%p %p %d", q, bio, rw);
 
 			__generic_unplug_device(q);
 			spin_unlock_irq(q->queue_lock);
@@ -2254,7 +2255,7 @@
  */
 void blk_requeue_request(request_queue_t *q, struct request *rq)
 {
-	blk_add_trace_rq(q, rq, BLK_TA_REQUEUE);
+	trace_mark(blk_requeue, "%p %p", q, rq);
 
 	if (blk_rq_tagged(rq))
 		blk_queue_end_tag(q, rq);
@@ -2940,7 +2941,7 @@
 			if (!ll_back_merge_fn(q, req, bio))
 				break;
 
-			blk_add_trace_bio(q, bio, BLK_TA_BACKMERGE);
+			trace_mark(blk_bio_backmerge, "%p %p", q, bio);
 
 			req->biotail->bi_next = bio;
 			req->biotail = bio;
@@ -2957,7 +2958,7 @@
 			if (!ll_front_merge_fn(q, req, bio))
 				break;
 
-			blk_add_trace_bio(q, bio, BLK_TA_FRONTMERGE);
+			trace_mark(blk_bio_frontmerge, "%p %p", q, bio);
 
 			bio->bi_next = req->bio;
 			req->bio = bio;
@@ -3187,10 +3188,10 @@
 		blk_partition_remap(bio);
 
 		if (old_sector != -1)
-			blk_add_trace_remap(q, bio, old_dev, bio->bi_sector, 
-					    old_sector);
+			trace_mark(blk_remap, "%p %p %u %llu %llu", q, bio, old_dev,
+					(u64)bio->bi_sector, (u64)old_sector);
 
-		blk_add_trace_bio(q, bio, BLK_TA_QUEUE);
+		trace_mark(blk_bio_queue, "%p %p", q, bio);
 
 		old_sector = bio->bi_sector;
 		old_dev = bio->bi_bdev->bd_dev;
@@ -3332,7 +3333,7 @@
 	int total_bytes, bio_nbytes, error, next_idx = 0;
 	struct bio *bio;
 
-	blk_add_trace_rq(req->q, req, BLK_TA_COMPLETE);
+	trace_mark(blk_request_complete, "%p %p", req->q, req);
 
 	/*
 	 * extend uptodate bool to allow < 0 value to be direct io error
Index: linux-2.6-lttng/block/Kconfig
===================================================================
--- linux-2.6-lttng.orig/block/Kconfig	2007-05-09 18:14:19.000000000 -0400
+++ linux-2.6-lttng/block/Kconfig	2007-05-09 18:16:30.000000000 -0400
@@ -32,6 +32,7 @@
 	depends on SYSFS
 	select RELAY
 	select DEBUG_FS
+	select MARKERS
 	help
 	  Say Y here, if you want to be able to trace the block layer actions
 	  on a given queue. Tracing allows you to see any traffic happening
Index: linux-2.6-lttng/block/blktrace.c
===================================================================
--- linux-2.6-lttng.orig/block/blktrace.c	2007-05-09 18:06:30.000000000 -0400
+++ linux-2.6-lttng/block/blktrace.c	2007-05-09 18:16:30.000000000 -0400
@@ -23,11 +23,19 @@
 #include <linux/mutex.h>
 #include <linux/debugfs.h>
 #include <linux/time.h>
+#include <linux/marker.h>
 #include <asm/uaccess.h>
 
 static DEFINE_PER_CPU(unsigned long long, blk_trace_cpu_offset) = { 0, };
 static unsigned int blktrace_seq __read_mostly = 1;
 
+/* Global reference count of probes */
+static DEFINE_MUTEX(blk_probe_mutex);
+static int blk_probes_ref;
+
+int blk_probe_arm(void);
+void blk_probe_disarm(void);
+
 /*
  * Send out a notify message.
  */
@@ -179,7 +187,7 @@
 EXPORT_SYMBOL_GPL(__blk_add_trace);
 
 static struct dentry *blk_tree_root;
-static struct mutex blk_tree_mutex;
+static DEFINE_MUTEX(blk_tree_mutex);
 static unsigned int root_users;
 
 static inline void blk_remove_root(void)
@@ -229,6 +237,10 @@
 	blk_remove_tree(bt->dir);
 	free_percpu(bt->sequence);
 	kfree(bt);
+	mutex_lock(&blk_probe_mutex);
+	if (--blk_probes_ref == 0)
+		blk_probe_disarm();
+	mutex_unlock(&blk_probe_mutex);
 }
 
 static int blk_trace_remove(request_queue_t *q)
@@ -386,6 +398,11 @@
 		goto err;
 	}
 
+	mutex_lock(&blk_probe_mutex);
+	if (!blk_probes_ref++)
+		blk_probe_arm();
+	mutex_unlock(&blk_probe_mutex);
+
 	return 0;
 err:
 	if (dir)
@@ -549,9 +566,270 @@
 #endif
 }
 
+/**
+ * blk_add_trace_rq - Add a trace for a request oriented action
+ * Expected variable arguments :
+ * @q:		queue the io is for
+ * @rq:		the source request
+ *
+ * Description:
+ *     Records an action against a request. Will log the bio offset + size.
+ *
+ **/
+static void blk_add_trace_rq(const struct __mark_marker_data *mdata,
+	const char *fmt, ...)
+{
+	va_list args;
+	u32 what;
+	struct blk_trace *bt;
+	int rw;
+	struct blk_probe_data *pinfo = mdata->pdata;
+	struct request_queue *q;
+	struct request *rq;
+
+	va_start(args, fmt);
+	q = va_arg(args, struct request_queue *);
+	rq = va_arg(args, struct request *);
+	va_end(args);
+
+	what = pinfo->flags;
+	bt = q->blk_trace;
+	rw = rq->cmd_flags & 0x03;
+
+	if (likely(!bt))
+		return;
+
+	if (blk_pc_request(rq)) {
+		what |= BLK_TC_ACT(BLK_TC_PC);
+		__blk_add_trace(bt, 0, rq->data_len, rw, what, rq->errors, sizeof(rq->cmd), rq->cmd);
+	} else  {
+		what |= BLK_TC_ACT(BLK_TC_FS);
+		__blk_add_trace(bt, rq->hard_sector, rq->hard_nr_sectors << 9, rw, what, rq->errors, 0, NULL);
+	}
+}
+
+/**
+ * blk_add_trace_bio - Add a trace for a bio oriented action
+ * Expected variable arguments :
+ * @q:		queue the io is for
+ * @bio:	the source bio
+ *
+ * Description:
+ *     Records an action against a bio. Will log the bio offset + size.
+ *
+ **/
+static void blk_add_trace_bio(const struct __mark_marker_data *mdata,
+	const char *fmt, ...)
+{
+	va_list args;
+	u32 what;
+	struct blk_trace *bt;
+	struct blk_probe_data *pinfo = mdata->pdata;
+	struct request_queue *q;
+	struct bio *bio;
+
+	va_start(args, fmt);
+	q = va_arg(args, struct request_queue *);
+	bio = va_arg(args, struct bio *);
+	va_end(args);
+
+	what = pinfo->flags;
+	bt = q->blk_trace;
+
+	if (likely(!bt))
+		return;
+
+	__blk_add_trace(bt, bio->bi_sector, bio->bi_size, bio->bi_rw, what, !bio_flagged(bio, BIO_UPTODATE), 0, NULL);
+}
+
+/**
+ * blk_add_trace_generic - Add a trace for a generic action
+ * Expected variable arguments :
+ * @q:		queue the io is for
+ * @bio:	the source bio
+ * @rw:		the data direction
+ *
+ * Description:
+ *     Records a simple trace
+ *
+ **/
+static void blk_add_trace_generic(const struct __mark_marker_data *mdata,
+	const char *fmt, ...)
+{
+	va_list args;
+	struct blk_trace *bt;
+	u32 what;
+	struct blk_probe_data *pinfo = mdata->pdata;
+	struct request_queue *q;
+	struct bio *bio;
+	int rw;
+
+	va_start(args, fmt);
+	q = va_arg(args, struct request_queue *);
+	bio = va_arg(args, struct bio *);
+	rw = va_arg(args, int);
+	va_end(args);
+
+	what = pinfo->flags;
+	bt = q->blk_trace;
+
+	if (likely(!bt))
+		return;
+
+	if (bio)
+		blk_add_trace_bio(mdata, "%p %p", q, bio);
+	else
+		__blk_add_trace(bt, 0, 0, rw, what, 0, 0, NULL);
+}
+
+/**
+ * blk_add_trace_pdu_int - Add a trace for a bio with an integer payload
+ * Expected variable arguments :
+ * @q:		queue the io is for
+ * @bio:	the source bio
+ * @pdu:	the integer payload
+ *
+ * Description:
+ *     Adds a trace with some integer payload. This might be an unplug
+ *     option given as the action, with the depth at unplug time given
+ *     as the payload
+ *
+ **/
+static void blk_add_trace_pdu_int(const struct __mark_marker_data *mdata,
+	const char *fmt, ...)
+{
+	va_list args;
+	struct blk_trace *bt;
+	u32 what;
+	struct blk_probe_data *pinfo = mdata->pdata;
+	struct request_queue *q;
+	struct bio *bio;
+	unsigned int pdu;
+	__be64 rpdu;
+
+	va_start(args, fmt);
+	q = va_arg(args, struct request_queue *);
+	bio = va_arg(args, struct bio *);
+	pdu = va_arg(args, unsigned int);
+	va_end(args);
+
+	what = pinfo->flags;
+	bt = q->blk_trace;
+	rpdu = cpu_to_be64(pdu);
+
+	if (likely(!bt))
+		return;
+
+	if (bio)
+		__blk_add_trace(bt, bio->bi_sector, bio->bi_size, bio->bi_rw, what, !bio_flagged(bio, BIO_UPTODATE), sizeof(rpdu), &rpdu);
+	else
+		__blk_add_trace(bt, 0, 0, 0, what, 0, sizeof(rpdu), &rpdu);
+}
+
+/**
+ * blk_add_trace_remap - Add a trace for a remap operation
+ * Expected variable arguments :
+ * @q:		queue the io is for
+ * @bio:	the source bio
+ * @dev:	target device
+ * @from:	source sector
+ * @to:		target sector
+ *
+ * Description:
+ *     Device mapper or raid target sometimes need to split a bio because
+ *     it spans a stripe (or similar). Add a trace for that action.
+ *
+ **/
+static void blk_add_trace_remap(const struct __mark_marker_data *mdata,
+	const char *fmt, ...)
+{
+	va_list args;
+	struct blk_trace *bt;
+	struct blk_io_trace_remap r;
+	u32 what;
+	struct blk_probe_data *pinfo = mdata->pdata;
+	struct request_queue *q;
+	struct bio *bio;
+	u64 dev, from, to;
+
+	va_start(args, fmt);
+	q = va_arg(args, struct request_queue *);
+	bio = va_arg(args, struct bio *);
+	dev = va_arg(args, u64);
+	from = va_arg(args, u64);
+	to = va_arg(args, u64);
+	va_end(args);
+
+	what = pinfo->flags;
+	bt = q->blk_trace;
+
+	if (likely(!bt))
+		return;
+
+	r.device = cpu_to_be32(dev);
+	r.sector = cpu_to_be64(to);
+
+	__blk_add_trace(bt, from, bio->bi_size, bio->bi_rw, BLK_TA_REMAP, !bio_flagged(bio, BIO_UPTODATE), sizeof(r), &r);
+}
+
+#define FACILITY_NAME "blk"
+
+static struct blk_probe_data probe_array[] =
+{
+	{ "blk_bio_queue", "%p %p", BLK_TA_QUEUE, blk_add_trace_bio },
+	{ "blk_bio_backmerge", "%p %p", BLK_TA_BACKMERGE, blk_add_trace_bio },
+	{ "blk_bio_frontmerge", "%p %p", BLK_TA_FRONTMERGE, blk_add_trace_bio },
+	{ "blk_get_request", "%p %p %d", BLK_TA_GETRQ, blk_add_trace_generic },
+	{ "blk_sleep_request", "%p %p %d", BLK_TA_SLEEPRQ,
+		blk_add_trace_generic },
+	{ "blk_requeue", "%p %p", BLK_TA_REQUEUE, blk_add_trace_rq },
+	{ "blk_request_issue", "%p %p", BLK_TA_ISSUE, blk_add_trace_rq },
+	{ "blk_request_complete", "%p %p", BLK_TA_COMPLETE, blk_add_trace_rq },
+	{ "blk_plug_device", "%p %p %d", BLK_TA_PLUG, blk_add_trace_generic },
+	{ "blk_pdu_unplug_io", "%p %p %d", BLK_TA_UNPLUG_IO,
+		blk_add_trace_pdu_int },
+	{ "blk_pdu_unplug_timer", "%p %p %d", BLK_TA_UNPLUG_TIMER,
+		blk_add_trace_pdu_int },
+	{ "blk_request_insert", "%p %p", BLK_TA_INSERT,
+		blk_add_trace_rq },
+	{ "blk_pdu_split", "%p %p %d", BLK_TA_SPLIT,
+		blk_add_trace_pdu_int },
+	{ "blk_bio_bounce", "%p %p", BLK_TA_BOUNCE, blk_add_trace_bio },
+	{ "blk_remap", "%p %p %u %llu %llu", BLK_TA_REMAP,
+		blk_add_trace_remap },
+};
+
+
+int blk_probe_arm(void)
+{
+	int result;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(probe_array); i++) {
+		result = marker_set_probe(probe_array[i].name,
+				probe_array[i].format,
+				probe_array[i].callback, &probe_array[i]);
+		if (!result)
+			printk(KERN_INFO
+				"blktrace unable to register probe %s\n",
+				probe_array[i].name);
+	}
+	return 0;
+}
+
+void blk_probe_disarm(void)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(probe_array); i++) {
+		marker_remove_probe(probe_array[i].name);
+	}
+	synchronize_sched();	/* Wait for probes to finish */
+}
+
+
 static __init int blk_trace_init(void)
 {
-	mutex_init(&blk_tree_mutex);
 	on_each_cpu(blk_trace_check_cpu_time, NULL, 1, 1);
 	blk_trace_set_ht_offsets();
 
Index: linux-2.6-lttng/include/linux/blktrace_api.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/blktrace_api.h	2007-05-09 18:06:46.000000000 -0400
+++ linux-2.6-lttng/include/linux/blktrace_api.h	2007-05-09 18:16:30.000000000 -0400
@@ -3,6 +3,7 @@
 
 #include <linux/blkdev.h>
 #include <linux/relay.h>
+#include <linux/marker.h>
 
 /*
  * Trace categories
@@ -142,149 +143,24 @@
 	u32 pid;
 };
 
+/* Probe data used for probe-marker connection */
+struct blk_probe_data {
+	const char *name;
+	const char *format;
+	u32 flags;
+	marker_probe_func *callback;
+};
+
 #if defined(CONFIG_BLK_DEV_IO_TRACE)
 extern int blk_trace_ioctl(struct block_device *, unsigned, char __user *);
 extern void blk_trace_shutdown(request_queue_t *);
 extern void __blk_add_trace(struct blk_trace *, sector_t, int, int, u32, int, int, void *);
-
-/**
- * blk_add_trace_rq - Add a trace for a request oriented action
- * @q:		queue the io is for
- * @rq:		the source request
- * @what:	the action
- *
- * Description:
- *     Records an action against a request. Will log the bio offset + size.
- *
- **/
-static inline void blk_add_trace_rq(struct request_queue *q, struct request *rq,
-				    u32 what)
-{
-	struct blk_trace *bt = q->blk_trace;
-	int rw = rq->cmd_flags & 0x03;
-
-	if (likely(!bt))
-		return;
-
-	if (blk_pc_request(rq)) {
-		what |= BLK_TC_ACT(BLK_TC_PC);
-		__blk_add_trace(bt, 0, rq->data_len, rw, what, rq->errors, sizeof(rq->cmd), rq->cmd);
-	} else  {
-		what |= BLK_TC_ACT(BLK_TC_FS);
-		__blk_add_trace(bt, rq->hard_sector, rq->hard_nr_sectors << 9, rw, what, rq->errors, 0, NULL);
-	}
-}
-
-/**
- * blk_add_trace_bio - Add a trace for a bio oriented action
- * @q:		queue the io is for
- * @bio:	the source bio
- * @what:	the action
- *
- * Description:
- *     Records an action against a bio. Will log the bio offset + size.
- *
- **/
-static inline void blk_add_trace_bio(struct request_queue *q, struct bio *bio,
-				     u32 what)
-{
-	struct blk_trace *bt = q->blk_trace;
-
-	if (likely(!bt))
-		return;
-
-	__blk_add_trace(bt, bio->bi_sector, bio->bi_size, bio->bi_rw, what, !bio_flagged(bio, BIO_UPTODATE), 0, NULL);
-}
-
-/**
- * blk_add_trace_generic - Add a trace for a generic action
- * @q:		queue the io is for
- * @bio:	the source bio
- * @rw:		the data direction
- * @what:	the action
- *
- * Description:
- *     Records a simple trace
- *
- **/
-static inline void blk_add_trace_generic(struct request_queue *q,
-					 struct bio *bio, int rw, u32 what)
-{
-	struct blk_trace *bt = q->blk_trace;
-
-	if (likely(!bt))
-		return;
-
-	if (bio)
-		blk_add_trace_bio(q, bio, what);
-	else
-		__blk_add_trace(bt, 0, 0, rw, what, 0, 0, NULL);
-}
-
-/**
- * blk_add_trace_pdu_int - Add a trace for a bio with an integer payload
- * @q:		queue the io is for
- * @what:	the action
- * @bio:	the source bio
- * @pdu:	the integer payload
- *
- * Description:
- *     Adds a trace with some integer payload. This might be an unplug
- *     option given as the action, with the depth at unplug time given
- *     as the payload
- *
- **/
-static inline void blk_add_trace_pdu_int(struct request_queue *q, u32 what,
-					 struct bio *bio, unsigned int pdu)
-{
-	struct blk_trace *bt = q->blk_trace;
-	__be64 rpdu = cpu_to_be64(pdu);
-
-	if (likely(!bt))
-		return;
-
-	if (bio)
-		__blk_add_trace(bt, bio->bi_sector, bio->bi_size, bio->bi_rw, what, !bio_flagged(bio, BIO_UPTODATE), sizeof(rpdu), &rpdu);
-	else
-		__blk_add_trace(bt, 0, 0, 0, what, 0, sizeof(rpdu), &rpdu);
-}
-
-/**
- * blk_add_trace_remap - Add a trace for a remap operation
- * @q:		queue the io is for
- * @bio:	the source bio
- * @dev:	target device
- * @from:	source sector
- * @to:		target sector
- *
- * Description:
- *     Device mapper or raid target sometimes need to split a bio because
- *     it spans a stripe (or similar). Add a trace for that action.
- *
- **/
-static inline void blk_add_trace_remap(struct request_queue *q, struct bio *bio,
-				       dev_t dev, sector_t from, sector_t to)
-{
-	struct blk_trace *bt = q->blk_trace;
-	struct blk_io_trace_remap r;
-
-	if (likely(!bt))
-		return;
-
-	r.device = cpu_to_be32(dev);
-	r.sector = cpu_to_be64(to);
-
-	__blk_add_trace(bt, from, bio->bi_size, bio->bi_rw, BLK_TA_REMAP, !bio_flagged(bio, BIO_UPTODATE), sizeof(r), &r);
-}
+extern int blk_probe_connect(void);
+extern void blk_probe_disconnect(void);
 
 #else /* !CONFIG_BLK_DEV_IO_TRACE */
 #define blk_trace_ioctl(bdev, cmd, arg)		(-ENOTTY)
 #define blk_trace_shutdown(q)			do { } while (0)
-#define blk_add_trace_rq(q, rq, what)		do { } while (0)
-#define blk_add_trace_bio(q, rq, what)		do { } while (0)
-#define blk_add_trace_generic(q, rq, rw, what)	do { } while (0)
-#define blk_add_trace_pdu_int(q, what, bio, pdu)	do { } while (0)
-#define blk_add_trace_remap(q, bio, dev, f, t)	do {} while (0)
 #endif /* CONFIG_BLK_DEV_IO_TRACE */
 
 #endif
Index: linux-2.6-lttng/mm/bounce.c
===================================================================
--- linux-2.6-lttng.orig/mm/bounce.c	2007-05-09 18:06:57.000000000 -0400
+++ linux-2.6-lttng/mm/bounce.c	2007-05-09 18:16:30.000000000 -0400
@@ -13,7 +13,7 @@
 #include <linux/init.h>
 #include <linux/hash.h>
 #include <linux/highmem.h>
-#include <linux/blktrace_api.h>
+#include <linux/marker.h>
 #include <asm/tlbflush.h>
 
 #define POOL_SIZE	64
@@ -237,7 +237,7 @@
 	if (!bio)
 		return;
 
-	blk_add_trace_bio(q, *bio_orig, BLK_TA_BOUNCE);
+	trace_mark(blk_bio_bounce, "%p %p", q, *bio_orig);
 
 	/*
 	 * at least one page was bounced, fill in possible non-highmem
Index: linux-2.6-lttng/mm/highmem.c
===================================================================
--- linux-2.6-lttng.orig/mm/highmem.c	2007-05-09 18:12:48.000000000 -0400
+++ linux-2.6-lttng/mm/highmem.c	2007-05-09 18:16:30.000000000 -0400
@@ -26,7 +26,7 @@
 #include <linux/init.h>
 #include <linux/hash.h>
 #include <linux/highmem.h>
-#include <linux/blktrace_api.h>
+#include <linux/marker.h>
 #include <asm/tlbflush.h>
 
 /*
Index: linux-2.6-lttng/fs/bio.c
===================================================================
--- linux-2.6-lttng.orig/fs/bio.c	2007-05-09 18:12:44.000000000 -0400
+++ linux-2.6-lttng/fs/bio.c	2007-05-09 18:16:30.000000000 -0400
@@ -25,7 +25,7 @@
 #include <linux/module.h>
 #include <linux/mempool.h>
 #include <linux/workqueue.h>
-#include <linux/blktrace_api.h>
+#include <linux/marker.h>
 #include <scsi/sg.h>		/* for struct sg_iovec */
 
 #define BIO_POOL_SIZE 2
@@ -1081,7 +1081,7 @@
 	if (!bp)
 		return bp;
 
-	blk_add_trace_pdu_int(bdev_get_queue(bi->bi_bdev), BLK_TA_SPLIT, bi,
+	trace_mark(blk_pdu_split, "%p %p %d", bdev_get_queue(bi->bi_bdev), bi,
 				bi->bi_sector + first_sectors);
 
 	BUG_ON(bi->bi_vcnt != 1);
Index: linux-2.6-lttng/drivers/block/cciss.c
===================================================================
--- linux-2.6-lttng.orig/drivers/block/cciss.c	2007-05-09 18:09:47.000000000 -0400
+++ linux-2.6-lttng/drivers/block/cciss.c	2007-05-09 18:16:30.000000000 -0400
@@ -37,7 +37,7 @@
 #include <linux/hdreg.h>
 #include <linux/spinlock.h>
 #include <linux/compat.h>
-#include <linux/blktrace_api.h>
+#include <linux/marker.h>
 #include <asm/uaccess.h>
 #include <asm/io.h>
 
@@ -2502,7 +2502,7 @@
 	}
 	cmd->rq->data_len = 0;
 	cmd->rq->completion_data = cmd;
-	blk_add_trace_rq(cmd->rq->q, cmd->rq, BLK_TA_COMPLETE);
+	trace_mark(blk_request_complete, "%p %p", cmd->rq->q, cmd->rq);
 	blk_complete_request(cmd->rq);
 }
 
Index: linux-2.6-lttng/drivers/md/dm.c
===================================================================
--- linux-2.6-lttng.orig/drivers/md/dm.c	2007-05-09 18:12:32.000000000 -0400
+++ linux-2.6-lttng/drivers/md/dm.c	2007-05-09 18:16:30.000000000 -0400
@@ -19,7 +19,7 @@
 #include <linux/slab.h>
 #include <linux/idr.h>
 #include <linux/hdreg.h>
-#include <linux/blktrace_api.h>
+#include <linux/marker.h>
 #include <linux/smp_lock.h>
 
 #define DM_MSG_PREFIX "core"
@@ -485,8 +485,8 @@
 			wake_up(&io->md->wait);
 
 		if (io->error != DM_ENDIO_REQUEUE) {
-			blk_add_trace_bio(io->md->queue, io->bio,
-					  BLK_TA_COMPLETE);
+			trace_mark(blk_request_complete, "%p %p",
+				io->md->queue, io->bio);
 
 			bio_endio(io->bio, io->bio->bi_size, io->error);
 		}
@@ -582,10 +582,10 @@
 	r = ti->type->map(ti, clone, &tio->info);
 	if (r == DM_MAPIO_REMAPPED) {
 		/* the bio has been remapped so dispatch it */
-
-		blk_add_trace_remap(bdev_get_queue(clone->bi_bdev), clone,
-				    tio->io->bio->bi_bdev->bd_dev, sector,
-				    clone->bi_sector);
+		trace_mark(blk_remap, "%p %p %u %llu %llu",
+			bdev_get_queue(clone->bi_bdev), clone,
+			(u64)tio->io->bio->bi_bdev->bd_dev, (u64)sector,
+			(u64)clone->bi_sector);
 
 		generic_make_request(clone);
 	} else if (r < 0 || r == DM_MAPIO_REQUEUE) {

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 00/10] Linux Kernel Markers for 2.6.21-mm2
  2007-05-10  1:55 [patch 00/10] Linux Kernel Markers for 2.6.21-mm2 Mathieu Desnoyers
                   ` (9 preceding siblings ...)
  2007-05-10  1:56 ` [patch 10/10] Port of blktrace to the Linux Kernel Markers Mathieu Desnoyers
@ 2007-05-10  2:30 ` Andrew Morton
  10 siblings, 0 replies; 61+ messages in thread
From: Andrew Morton @ 2007-05-10  2:30 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: linux-kernel, hch

On Wed, 09 May 2007 21:55:55 -0400 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:

> He is an updated, folded, version of the Linux Kernel Markers. It replaces the
> version found in 2.6.21-mm2 at the exact same spot in the series file.

urgh.  That basically takes us back to square one with whatever review has
happened.

I suppose I could do s/MARK/trace_mark/g on the existing diffs then have a
go at generating incremental diffs so we can review these changes.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 02/10] Linux Kernel Markers, architecture independent code.
  2007-05-10  1:55 ` [patch 02/10] Linux Kernel Markers, architecture independent code Mathieu Desnoyers
@ 2007-05-10  5:10   ` Alexey Dobriyan
  2007-05-10 12:58     ` Mathieu Desnoyers
  2007-05-10 13:12     ` Mathieu Desnoyers
  0 siblings, 2 replies; 61+ messages in thread
From: Alexey Dobriyan @ 2007-05-10  5:10 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: akpm, linux-kernel, hch, Adrian Bunk

On Wed, May 09, 2007 at 09:55:57PM -0400, Mathieu Desnoyers wrote:
> --- /dev/null
> +++ linux-2.6-lttng/include/linux/marker.h
> @@ -0,0 +1,124 @@

> +#ifdef __KERNEL__

Just don't add this file to include/linux/Kbuild and remove __KERNEL__
ifdef.

> --- linux-2.6-lttng.orig/include/linux/module.h
> +++ linux-2.6-lttng/include/linux/module.h
> @@ -356,6 +356,9 @@
>  	/* The command line arguments (may be mangled).  People like
>  	   keeping pointers to this stuff */
>  	char *args;
> +
> +	const struct __mark_marker *markers;
> +	unsigned int num_markers;

#ifdef CONFIG_MARKERS, please.

> --- linux-2.6-lttng.orig/kernel/module.c
> +++ linux-2.6-lttng/kernel/module.c

> @@ -1659,6 +1884,9 @@
>  	unsigned int unusedcrcindex;
>  	unsigned int unusedgplindex;
>  	unsigned int unusedgplcrcindex;
> +	unsigned int markersindex;
> +	unsigned int markersdataindex;
> +	unsigned int markersstringsindex;

Bunch of underscores wouldn't hurt.

> +void list_modules(void)
> +{
> +	/* Enumerate loaded modules */
> +	struct list_head	*i;
> +	struct module		*mod;
> +	unsigned long refcount = 0;
> +
> +	mutex_lock(&module_mutex);
> +	list_for_each(i, &modules) {
> +		mod = list_entry(i, struct module, list);
> +#ifdef CONFIG_MODULE_UNLOAD
> +		refcount = local_read(&mod->ref[0].count);
						^
Buy second CPU, already. ;-)

> +#endif //CONFIG_MODULE_UNLOAD
> +		trace_mark(list_module, "%s %d %lu",
> +				mod->name, mod->state, refcount);
> +	}
> +	mutex_unlock(&module_mutex);
> +}


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 06/10] Linux Kernel Markers - Non optimized architectures
  2007-05-10  1:56 ` [patch 06/10] Linux Kernel Markers - Non optimized architectures Mathieu Desnoyers
@ 2007-05-10  5:13   ` Alexey Dobriyan
  2007-05-10  6:56   ` Christoph Hellwig
  1 sibling, 0 replies; 61+ messages in thread
From: Alexey Dobriyan @ 2007-05-10  5:13 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: akpm, linux-kernel, hch

On Wed, May 09, 2007 at 09:56:01PM -0400, Mathieu Desnoyers wrote:
> This patch also includes marker code for non optimized architectures.

>  include/asm-alpha/marker.h     |   13 +++++++++++++
>  include/asm-arm/marker.h       |   13 +++++++++++++
>  include/asm-arm26/marker.h     |   13 +++++++++++++
>  include/asm-cris/marker.h      |   13 +++++++++++++
>  include/asm-frv/marker.h       |   13 +++++++++++++
>  include/asm-generic/marker.h   |   37 +++++++++++++++++++++++++++++++++++++
>  include/asm-h8300/marker.h     |   13 +++++++++++++
>  include/asm-ia64/marker.h      |   13 +++++++++++++
>  include/asm-m32r/marker.h      |   13 +++++++++++++
>  include/asm-m68k/marker.h      |   13 +++++++++++++
>  include/asm-m68knommu/marker.h |   13 +++++++++++++
>  include/asm-mips/marker.h      |   13 +++++++++++++
>  include/asm-parisc/marker.h    |   13 +++++++++++++
>  include/asm-ppc/marker.h       |   13 +++++++++++++
>  include/asm-s390/marker.h      |   13 +++++++++++++
>  include/asm-sh/marker.h        |   13 +++++++++++++
>  include/asm-sh64/marker.h      |   13 +++++++++++++
>  include/asm-sparc/marker.h     |   13 +++++++++++++
>  include/asm-sparc64/marker.h   |   13 +++++++++++++
>  include/asm-um/marker.h        |   13 +++++++++++++
>  include/asm-v850/marker.h      |   13 +++++++++++++
>  include/asm-x86_64/marker.h    |   13 +++++++++++++
>  include/asm-xtensa/marker.h    |   13 +++++++++++++
>  23 files changed, 323 insertions(+)

> --- /dev/null
> +++ linux-2.6-lttng/include/asm-arm/marker.h
> @@ -0,0 +1,13 @@
> +/*
> + * marker.h
> + *
> + * Code markup for dynamic and static tracing. Architecture specific
> + * optimisations.
> + *
> + * No optimisation implemented.
> + *
> + * This file is released under the GPLv2.
> + * See the file COPYING for more details.
> + */
> +
> +#include <asm-generic/marker.h>

Come on, one line file is enough!


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 03/10] Allow userspace applications to use marker.h to parse the markers section in the kernel binary.
  2007-05-10  1:55 ` [patch 03/10] Allow userspace applications to use marker.h to parse the markers section in the kernel binary Mathieu Desnoyers
@ 2007-05-10  6:51   ` Christoph Hellwig
  2007-05-10 22:14     ` David Smith
  0 siblings, 1 reply; 61+ messages in thread
From: Christoph Hellwig @ 2007-05-10  6:51 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: akpm, linux-kernel, hch, David Smith

On Wed, May 09, 2007 at 09:55:58PM -0400, Mathieu Desnoyers wrote:
> One of the things I'm starting to work on is adding support for your
> kernel markers to systemtap.  I know the marker stuff is still in a bit
> of flux because you are trying to meet the (sometimes conflicting)
> requirements of the people on lkml.
> 
> One of the things systemtap is going to need is to be able to parse the
> '__markers' section so it will be able to look up a user-specified
> marker.  For instance, if the user says 'probe kernel.mark("foo") {}',
> I've got to see if marker "foo" really exists.

NACK.  This shouldn't be includedable from userspace and systemtap people
should stop doing crap like that but use kernel infrastructure everyone
else uses including runtime kernel code instead of stuffin all kinds of
crap into their broken translator.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 10/10] Port of blktrace to the Linux Kernel Markers.
  2007-05-10  1:56 ` [patch 10/10] Port of blktrace to the Linux Kernel Markers Mathieu Desnoyers
@ 2007-05-10  6:53   ` Christoph Hellwig
  2007-05-10  9:20   ` Jens Axboe
  1 sibling, 0 replies; 61+ messages in thread
From: Christoph Hellwig @ 2007-05-10  6:53 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: akpm, linux-kernel, hch

On Wed, May 09, 2007 at 09:56:05PM -0400, Mathieu Desnoyers wrote:
> Here is a proof of concept patch, for demonstration purpose, of moving
> blktrace to the markers.

Please get an ACK from Jens as blktrace is his code.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 06/10] Linux Kernel Markers - Non optimized architectures
  2007-05-10  1:56 ` [patch 06/10] Linux Kernel Markers - Non optimized architectures Mathieu Desnoyers
  2007-05-10  5:13   ` Alexey Dobriyan
@ 2007-05-10  6:56   ` Christoph Hellwig
  2007-05-10 13:11     ` Mathieu Desnoyers
  1 sibling, 1 reply; 61+ messages in thread
From: Christoph Hellwig @ 2007-05-10  6:56 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: akpm, linux-kernel, hch

On Wed, May 09, 2007 at 09:56:01PM -0400, Mathieu Desnoyers wrote:
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6-lttng/include/asm-arm/marker.h	2007-05-09 18:16:05.000000000 -0400
> @@ -0,0 +1,13 @@
> +/*
> + * marker.h

I think I've told you a few times already that mentioning the filename
in the file is pretty braindead and not exactly helpful.

Would be good to have an automatic check for this as I'm getting a little
tired of mentioning this three times a day.

> + *
> + * Code markup for dynamic and static tracing. Architecture specific
> + * optimisations.
> + *
> + * No optimisation implemented.
> + *
> + * This file is released under the GPLv2.
> + * See the file COPYING for more details.
> + */
> +
> +#include <asm-generic/marker.h>

I don't think any kind of copyright statement or comment is appopinquate
for a file that does nothing but including an asm-generic header.  This
file should a single line, namely:

#include <asm-generic/marker.h>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 01/10] Linux Kernel Markers - Add kconfig menus for the marker code
  2007-05-10  1:55 ` [patch 01/10] Linux Kernel Markers - Add kconfig menus for the marker code Mathieu Desnoyers
@ 2007-05-10  6:57   ` Christoph Hellwig
  0 siblings, 0 replies; 61+ messages in thread
From: Christoph Hellwig @ 2007-05-10  6:57 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: akpm, linux-kernel, hch, Adrian Bunk

Kconfig additions come last because every patch in the series
should be buildable.   


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 04/10] Linux Kernel Markers - PowerPC optimized version
  2007-05-10  1:55 ` [patch 04/10] Linux Kernel Markers - PowerPC optimized version Mathieu Desnoyers
@ 2007-05-10  6:57   ` Christoph Hellwig
  0 siblings, 0 replies; 61+ messages in thread
From: Christoph Hellwig @ 2007-05-10  6:57 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: akpm, linux-kernel, hch, Paul Mackerras, Benjamin Herrenschmidt

On Wed, May 09, 2007 at 09:55:59PM -0400, Mathieu Desnoyers wrote:
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Can we please get the non-optimized ones in first and then do the
optimized incremental ontop of that?


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 07/10] Linux Kernel Markers - Documentation
  2007-05-10  1:56 ` [patch 07/10] Linux Kernel Markers - Documentation Mathieu Desnoyers
@ 2007-05-10  6:58   ` Christoph Hellwig
  2007-05-10 11:41     ` Alan Cox
                       ` (3 more replies)
  2007-05-10 12:00   ` Christoph Hellwig
  2007-05-10 15:51   ` Scott Preece
  2 siblings, 4 replies; 61+ messages in thread
From: Christoph Hellwig @ 2007-05-10  6:58 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: akpm, linux-kernel, hch

On Wed, May 09, 2007 at 09:56:02PM -0400, Mathieu Desnoyers wrote:
> Here is some documentation explaining what is/how to use the Linux
> Kernel Markers.

Please remove all references to out of tree code from the kernel documentation.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 05/10] Linux Kernel Markers - i386 optimized version
  2007-05-10  1:56 ` [patch 05/10] Linux Kernel Markers - i386 " Mathieu Desnoyers
@ 2007-05-10  9:06   ` Andi Kleen
  2007-05-10 15:55     ` Mathieu Desnoyers
  0 siblings, 1 reply; 61+ messages in thread
From: Andi Kleen @ 2007-05-10  9:06 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: akpm, linux-kernel, hch

On Wed, May 09, 2007 at 09:56:00PM -0400, Mathieu Desnoyers wrote:
> @@ -0,0 +1,99 @@
> +/* marker.c
> + *
> + * Erratum 49 fix for Intel PIII and higher.

Errata are CPU specific so they can't be higher. You mean it's a P3
erratum only?

In general you need some more description why the int3 handler is needed.

> + *
> + * Permits marker activation by XMC with correct serialization.

This doesn't follow the Intel recommended algorithm for cross modifying
code (7.1.3 in the IA32 manual) which requires the other CPUs to spin
during the modification.

If you used that would you really need the INT3 handler?

> +static DEFINE_MUTEX(mark_mutex);
> +static long target_eip = 0;
> +
> +static void mark_synchronize_core(void *info)
> +{
> +	sync_core();	/* use cpuid to stop speculative execution */
> +}
> +
> +/* We simply skip the 2 bytes load immediate here, leaving the register in an
> + * undefined state. We don't care about the content (0 or !0), because we are
> + * changing the value 0->1 or 1->0. This small window of undefined value
> + * doesn't matter.
> + */
> +static int mark_notifier(struct notifier_block *nb,
> +	unsigned long val, void *data)
> +{
> +	enum die_val die_val = (enum die_val) val;
> +	struct die_args *args = (struct die_args *)data;
> +
> +	if (!args->regs || user_mode_vm(args->regs))

I don't think regs should be ever NULL

> +		return NOTIFY_DONE;
> +
> +	if (die_val == DIE_INT3	&& args->regs->eip == target_eip) {
> +		args->regs->eip += 1; /* Skip the next byte of load immediate */

In what instruction results that then? This is definitely underdocumented.

> +int marker_optimized_set_enable(void *address, char enable)
> +{
> +	char saved_byte;
> +	int ret;
> +	char *dest = address;
> +
> +	if (!(enable ^ dest[1])) /* Must be a state change 0<->1 to execute */
> +		return 0;

Wouldn't you need that inside the mutex too to avoid races?


> +
> +	mutex_lock(&mark_mutex);
> +	target_eip = (long)address + BREAKPOINT_INS_LEN;
> +	/* register_die_notifier has memory barriers */
> +	register_die_notifier(&mark_notify);
> +	saved_byte = *dest;
> +	*dest = BREAKPOINT_INSTRUCTION;
> +	wmb();

wmb is a nop

> +	/* Execute serializing instruction on each CPU.
> +	 * Acts as a memory barrier. */
> +	ret = on_each_cpu(mark_synchronize_core, NULL, 1, 1);
> +	BUG_ON(ret != 0);
> +
> +	dest[1] = enable;
> +	wmb();
> +	*dest = saved_byte;
> +		/* Wait for all int3 handlers to end
> +		   (interrupts are disabled in int3).
> +		   This CPU is clearly not in a int3 handler
> +		   (not preemptible).

So what happens when the kernel is preemptible? 

> +		   synchronize_sched has memory barriers */
> +	synchronize_sched();
> +	unregister_die_notifier(&mark_notify);
> +	/* unregister_die_notifier has memory barriers */
> +	target_eip = 0;
> +	mutex_unlock(&mark_mutex);
> +	flush_icache_range(address, size);

That's a nop on x86

> +
> +#ifdef CONFIG_MARKERS
> +
> +#define MF_DEFAULT (MF_OPTIMIZED | MF_LOCKDEP | MF_PRINTK)
> +
> +/* Optimized version of the markers */
> +#define trace_mark_optimized(flags, name, format, args...) \
> +	do { \
> +		static const char __mstrtab_name_##name[] \
> +		__attribute__((section("__markers_strings"))) \
> +		= #name; \
> +		static const char __mstrtab_format_##name[] \
> +		__attribute__((section("__markers_strings"))) \
> +		= format; \
> +		static const char __mstrtab_args_##name[] \
> +		__attribute__((section("__markers_strings"))) \
> +		= #args; \

For what do you need special string sections?  

If it's something to be read by external programs the interface
would need to be clearly defined and commented.
If not just use normal variables.

> +		static struct __mark_marker_data __mark_data_##name \
> +		__attribute__((section("__markers_data"))) = \
> +		{ __mstrtab_name_##name,  __mstrtab_format_##name, \
> +		__mstrtab_args_##name, \
> +		(flags) | MF_OPTIMIZED, __mark_empty_function, NULL }; \
> +		char condition; \
> +		asm volatile(	".section __markers, \"a\", @progbits;\n\t" \

The volatile is not needed because the output is used.

> +					".long %1, 0f;\n\t" \
> +					".previous;\n\t" \
> +					".align 2\n\t" \
> +					"0:\n\t" \
> +					"movb $0,%0;\n\t" \

This should be a generic facility in a separate include file / separate 
macros etc so that it can be used by other code too.

> +				: "=r" (condition) \
> +				: "m" (__mark_data_##name)); \
> +		__mark_check_format(format, ## args); \
> +		if (likely(!condition)) { \
> +		} else { \
> +			preempt_disable(); \

Why the preempt disable here and why can't it be in the hook?

> +			(*__mark_data_##name.call)(&__mark_data_##name, \
> +						format, ## args); \
> +			preempt_enable(); \
> +		} \
> +	} while (0)
> +
> +/* Marker macro selecting the generic or optimized version of marker, depending
> + * on the flags specified. */
> +#define _trace_mark(flags, format, args...) \
> +do { \
> +	if (((flags) & MF_LOCKDEP) && ((flags) & MF_OPTIMIZED)) \
> +		trace_mark_optimized(flags, format, ## args); \
> +	else \
> +		trace_mark_generic(flags, format, ## args); \

Is there ever a good reason not to use optimized markers? 

> + * bytes. */
> +#define MARK_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET 1
> +#define MARK_OPTIMIZED_ENABLE_TYPE char

unsigned char is usually safer to avoid sign extension

-Andi

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 10/10] Port of blktrace to the Linux Kernel Markers.
  2007-05-10  1:56 ` [patch 10/10] Port of blktrace to the Linux Kernel Markers Mathieu Desnoyers
  2007-05-10  6:53   ` Christoph Hellwig
@ 2007-05-10  9:20   ` Jens Axboe
  1 sibling, 0 replies; 61+ messages in thread
From: Jens Axboe @ 2007-05-10  9:20 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: akpm, linux-kernel, hch

On Wed, May 09 2007, Mathieu Desnoyers wrote:
> Here is a proof of concept patch, for demonstration purpose, of moving
> blktrace to the markers.
> 
> A few remarks : this patch has the positive effect of removing some code
> from the block io tracing hot paths, minimizing the i-cache impact in a
> system where the io tracing is compiled in but inactive.
> 
> It also moves the blk tracing code from a header (and therefore from the
> body of the instrumented functions) to a separate C file.
> 
> There, as soon as one device has to be traced, every devices have to
> fall into the tracing function call. This is slower than the previous
> inline function which tested the condition quickly. If it becomes a
> show stopper, it could be fixed by having the possibility to test a
> supplementary condition, dependant of the marker context, at the marker
> site, just after the enable/disable test.

Did you benchmark it?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 07/10] Linux Kernel Markers - Documentation
  2007-05-10  6:58   ` Christoph Hellwig
@ 2007-05-10 11:41     ` Alan Cox
  2007-05-10 11:41       ` Christoph Hellwig
  2007-05-10 14:12     ` Mathieu Desnoyers
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 61+ messages in thread
From: Alan Cox @ 2007-05-10 11:41 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Mathieu Desnoyers, akpm, linux-kernel, hch

On Thu, 10 May 2007 07:58:47 +0100
Christoph Hellwig <hch@infradead.org> wrote:

> On Wed, May 09, 2007 at 09:56:02PM -0400, Mathieu Desnoyers wrote:
> > Here is some documentation explaining what is/how to use the Linux
> > Kernel Markers.
> 
> Please remove all references to out of tree code from the kernel documentation.

Christoph stop being ridiculous. Its one thing to object to kernel
markers and some of the rather nice tools they give you for
borderline technical reasons, but at the point you start saying "remove
references to" you are just trying to censor stuff you don't personally
approve of and you sound as stupid as RMS in his "don't link to non GPL
material" mode.

The kernel documentation is *FULL* of references to out of tree code and
documents precisely because the kernel does not exist in isolation. 

Alan

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 07/10] Linux Kernel Markers - Documentation
  2007-05-10 11:41     ` Alan Cox
@ 2007-05-10 11:41       ` Christoph Hellwig
  2007-05-10 12:48         ` Alan Cox
  0 siblings, 1 reply; 61+ messages in thread
From: Christoph Hellwig @ 2007-05-10 11:41 UTC (permalink / raw)
  To: Alan Cox; +Cc: Christoph Hellwig, Mathieu Desnoyers, akpm, linux-kernel

On Thu, May 10, 2007 at 12:41:20PM +0100, Alan Cox wrote:
> On Thu, 10 May 2007 07:58:47 +0100
> Christoph Hellwig <hch@infradead.org> wrote:
> 
> > On Wed, May 09, 2007 at 09:56:02PM -0400, Mathieu Desnoyers wrote:
> > > Here is some documentation explaining what is/how to use the Linux
> > > Kernel Markers.
> > 
> > Please remove all references to out of tree code from the kernel documentation.
> 
> Christoph stop being ridiculous. Its one thing to object to kernel
> markers

I do not object kernel markers, and I have no idea where you got
that crap from.  Quite contrary I've worked with Matthew on
getting both markers and lttng good enough that we can merge
it eventually.

Documentation of kernel facilities should not mention big outside
blobs of crap nerverless, which is something totally different from
not mentionining userspace code contrary to your ill-advised position.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 07/10] Linux Kernel Markers - Documentation
  2007-05-10  1:56 ` [patch 07/10] Linux Kernel Markers - Documentation Mathieu Desnoyers
  2007-05-10  6:58   ` Christoph Hellwig
@ 2007-05-10 12:00   ` Christoph Hellwig
  2007-05-10 15:51   ` Scott Preece
  2 siblings, 0 replies; 61+ messages in thread
From: Christoph Hellwig @ 2007-05-10 12:00 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: akpm, linux-kernel, hch

> +They can be used for tracing (LTTng, LKET over SystemTAP), overall performance
> +accounting (SystemTAP). They could also be used to implement
> +efficient hooks for SELinux or any other subsystem that would have this
> +kind of need.
> +
> +Using the markers for system audit (SELinux) would require to pass a
> +variable by address that would be later checked by the marked routine.

I don't think you should mention abuses like the selinux one, nor odd
projects that aren't merged.  Just make this a generic:

"The can be used for tracing or performance accounting"

> +* Usage
> +
> +In order to use the macro MARK, you should include linux/marker.h.

It's trace_mark now.

> +* Optimization for a given architecture
> +
> +You will find, in asm-*/marker.h, optimisations for given architectures
> +(currently i386 and powerpc). They use a load immediate instead of a data load,
> +which saves a data cache hit, but also requires cross CPU code modification. In
> +order to support embedded systems which use read-only memory for their code, the
> +optimization can be disabled through menu options.

I'm not sure this belong in here.  As we already have good example it's
probably not needed at all.

> +#include <linux/sched.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/marker.h>
> +#include <asm/atomic.h>
> +
> +#define NUM_PROBES ARRAY_SIZE(probe_array)

no need for this macro.

> +static int __init probe_init(void)
> +{
> +	int result;
> +	uint8_t eID;

please call the loop variable i and use int for it.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 07/10] Linux Kernel Markers - Documentation
  2007-05-10 11:41       ` Christoph Hellwig
@ 2007-05-10 12:48         ` Alan Cox
  2007-05-10 12:52           ` Pekka Enberg
  0 siblings, 1 reply; 61+ messages in thread
From: Alan Cox @ 2007-05-10 12:48 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christoph Hellwig, Mathieu Desnoyers, akpm, linux-kernel

> Documentation of kernel facilities should not mention big outside
> blobs of crap nerverless, which is something totally different from

Christoph , you are sounding even more arrogant than RMS now, which is
quite an achievement, although one you shouldn't be proud of.

"blobs of crap" - in whose opinion. Yours. So you don't want the kernel
to mention things you personally don't approve of. Thats straight forward
censorship and has no place in free software.

If you want to control everything people say and think you might want to
apply from a job at the EU commission instead.

Alan

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 07/10] Linux Kernel Markers - Documentation
  2007-05-10 12:48         ` Alan Cox
@ 2007-05-10 12:52           ` Pekka Enberg
  2007-05-10 13:04             ` Alan Cox
  0 siblings, 1 reply; 61+ messages in thread
From: Pekka Enberg @ 2007-05-10 12:52 UTC (permalink / raw)
  To: Alan Cox; +Cc: Christoph Hellwig, Mathieu Desnoyers, akpm, linux-kernel

On 5/10/07, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> "blobs of crap" - in whose opinion. Yours. So you don't want the kernel
> to mention things you personally don't approve of. Thats straight forward
> censorship and has no place in free software.

Out of kernel code comes and goes, so why mention it in-tree? Besides,
what's wrong with the suggested "can be used for tracing or
performance accounting?"

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 02/10] Linux Kernel Markers, architecture independent code.
  2007-05-10  5:10   ` Alexey Dobriyan
@ 2007-05-10 12:58     ` Mathieu Desnoyers
  2007-05-10 13:12     ` Mathieu Desnoyers
  1 sibling, 0 replies; 61+ messages in thread
From: Mathieu Desnoyers @ 2007-05-10 12:58 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: akpm, linux-kernel, hch, Adrian Bunk

Hi Alexey,

* Alexey Dobriyan (adobriyan@gmail.com) wrote:
> On Wed, May 09, 2007 at 09:55:57PM -0400, Mathieu Desnoyers wrote:
> > --- /dev/null
> > +++ linux-2.6-lttng/include/linux/marker.h
> > @@ -0,0 +1,124 @@
> 
> > +#ifdef __KERNEL__
> 
> Just don't add this file to include/linux/Kbuild and remove __KERNEL__
> ifdef.
> 
> > --- linux-2.6-lttng.orig/include/linux/module.h
> > +++ linux-2.6-lttng/include/linux/module.h
> > @@ -356,6 +356,9 @@
> >  	/* The command line arguments (may be mangled).  People like
> >  	   keeping pointers to this stuff */
> >  	char *args;
> > +
> > +	const struct __mark_marker *markers;
> > +	unsigned int num_markers;
> 
> #ifdef CONFIG_MARKERS, please.
> 

Ok, I merged the patch from SystemTAP to help them parse the markers
section, but seeing the violent objections it gets, I guess they will
have to figure out another way to parse it. Let's drop the
allow-userspace-applications-to-use-markerh-to-parse-the-markers-section-in-the-kernel-binary.patch
then.


> > --- linux-2.6-lttng.orig/kernel/module.c
> > +++ linux-2.6-lttng/kernel/module.c
> 
> > @@ -1659,6 +1884,9 @@
> >  	unsigned int unusedcrcindex;
> >  	unsigned int unusedgplindex;
> >  	unsigned int unusedgplcrcindex;
> > +	unsigned int markersindex;
> > +	unsigned int markersdataindex;
> > +	unsigned int markersstringsindex;
> 
> Bunch of underscores wouldn't hurt.
> 

Hrm, I used the exact same variable naming style already present in the
function. Do you suggest changing _every_ variable name to underscores ?

> > +void list_modules(void)
> > +{
> > +	/* Enumerate loaded modules */
> > +	struct list_head	*i;
> > +	struct module		*mod;
> > +	unsigned long refcount = 0;
> > +
> > +	mutex_lock(&module_mutex);
> > +	list_for_each(i, &modules) {
> > +		mod = list_entry(i, struct module, list);
> > +#ifdef CONFIG_MODULE_UNLOAD
> > +		refcount = local_read(&mod->ref[0].count);
> 						^
> Buy second CPU, already. ;-)
>

Good catch, will fix. I do already have more than one, don't worry ;)

> > +#endif //CONFIG_MODULE_UNLOAD
> > +		trace_mark(list_module, "%s %d %lu",
> > +				mod->name, mod->state, refcount);
> > +	}
> > +	mutex_unlock(&module_mutex);
> > +}
> 

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 07/10] Linux Kernel Markers - Documentation
  2007-05-10 12:52           ` Pekka Enberg
@ 2007-05-10 13:04             ` Alan Cox
  2007-05-10 13:16               ` Pekka J Enberg
  0 siblings, 1 reply; 61+ messages in thread
From: Alan Cox @ 2007-05-10 13:04 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: Christoph Hellwig, Mathieu Desnoyers, akpm, linux-kernel

On Thu, 10 May 2007 15:52:26 +0300
"Pekka Enberg" <penberg@cs.helsinki.fi> wrote:

> On 5/10/07, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> > "blobs of crap" - in whose opinion. Yours. So you don't want the kernel
> > to mention things you personally don't approve of. Thats straight forward
> > censorship and has no place in free software.
> 
> Out of kernel code comes and goes, so why mention it in-tree? Besides,
> what's wrong with the suggested "can be used for tracing or
> performance accounting?"

Because people want to know what it is really for ? It's quite amazing
really at the same time as people are posting crypto keys everywhere in
defiance of USSA law, we've got free software people trying to remove
references to a piece of out of tree software, and one that is free
software. 

And as I keep saying the tree is full of references to out of tree stuff.
The documentation directory alone currently contains over a thousand http
URLS as of 2.6.21-rc6-mm1.

Alan

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 06/10] Linux Kernel Markers - Non optimized architectures
  2007-05-10  6:56   ` Christoph Hellwig
@ 2007-05-10 13:11     ` Mathieu Desnoyers
  2007-05-10 13:40       ` Alan Cox
  0 siblings, 1 reply; 61+ messages in thread
From: Mathieu Desnoyers @ 2007-05-10 13:11 UTC (permalink / raw)
  To: Christoph Hellwig, akpm, linux-kernel

Linux Kernel Markers - Non Optimized Architectures fix 2

Fix comments in headers
- Remove Copyright notice in one-liner header
- Transfer generic header note "using asm to fix a gcc bug" to where it belongs
  now : in linux/marker.h.
- Fix optimisation -> optimization typo in powerpc and i386 headers.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 include/asm-arm/marker.h       |   12 ------------
 include/asm-cris/marker.h      |   12 ------------
 include/asm-frv/marker.h       |   12 ------------
 include/asm-generic/marker.h   |   12 ------------
 include/asm-h8300/marker.h     |   12 ------------
 include/asm-i386/marker.h      |    2 +-
 include/asm-ia64/marker.h      |   12 ------------
 include/asm-m32r/marker.h      |   12 ------------
 include/asm-m68k/marker.h      |   12 ------------
 include/asm-m68knommu/marker.h |   12 ------------
 include/asm-mips/marker.h      |   12 ------------
 include/asm-parisc/marker.h    |   12 ------------
 include/asm-powerpc/marker.h   |    2 +-
 include/asm-ppc/marker.h       |   12 ------------
 include/asm-s390/marker.h      |   12 ------------
 include/asm-sh/marker.h        |   12 ------------
 include/asm-sh64/marker.h      |   12 ------------
 include/asm-sparc/marker.h     |   12 ------------
 include/asm-sparc64/marker.h   |   12 ------------
 include/asm-um/marker.h        |   12 ------------
 include/asm-v850/marker.h      |   12 ------------
 include/asm-x86_64/marker.h    |   12 ------------
 include/asm-xtensa/marker.h    |   12 ------------
 include/linux/marker.h         |    4 +++-
 24 files changed, 5 insertions(+), 255 deletions(-)

Index: linux-2.6-lttng/include/asm-arm/marker.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-arm/marker.h	2007-05-10 09:02:16.000000000 -0400
+++ linux-2.6-lttng/include/asm-arm/marker.h	2007-05-10 09:02:31.000000000 -0400
@@ -1,13 +1 @@
-/*
- * marker.h
- *
- * Code markup for dynamic and static tracing. Architecture specific
- * optimisations.
- *
- * No optimisation implemented.
- *
- * This file is released under the GPLv2.
- * See the file COPYING for more details.
- */
-
 #include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-cris/marker.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-cris/marker.h	2007-05-10 09:02:16.000000000 -0400
+++ linux-2.6-lttng/include/asm-cris/marker.h	2007-05-10 09:02:35.000000000 -0400
@@ -1,13 +1 @@
-/*
- * marker.h
- *
- * Code markup for dynamic and static tracing. Architecture specific
- * optimisations.
- *
- * No optimisation implemented.
- *
- * This file is released under the GPLv2.
- * See the file COPYING for more details.
- */
-
 #include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-frv/marker.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-frv/marker.h	2007-05-10 09:02:16.000000000 -0400
+++ linux-2.6-lttng/include/asm-frv/marker.h	2007-05-10 09:02:38.000000000 -0400
@@ -1,13 +1 @@
-/*
- * marker.h
- *
- * Code markup for dynamic and static tracing. Architecture specific
- * optimisations.
- *
- * No optimisation implemented.
- *
- * This file is released under the GPLv2.
- * See the file COPYING for more details.
- */
-
 #include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-generic/marker.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-generic/marker.h	2007-05-10 09:02:17.000000000 -0400
+++ linux-2.6-lttng/include/asm-generic/marker.h	2007-05-10 09:04:14.000000000 -0400
@@ -1,18 +1,6 @@
 #ifndef _ASM_GENERIC_MARKER_H
 #define _ASM_GENERIC_MARKER_H
 
-/*
- * marker.h
- *
- * Code markup for dynamic and static tracing. Generic header.
- *
- * This file is released under the GPLv2.
- * See the file COPYING for more details.
- *
- * Note : the empty asm volatile with read constraint is used here instead of a
- * "used" attribute to fix a gcc 4.1.x bug.
- */
-
 #define _MF_DEFAULT			(_MF_LOCKDEP | _MF_PRINTK)
 
 #define MARK_OPTIMIZED			MARK_GENERIC
Index: linux-2.6-lttng/include/asm-h8300/marker.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-h8300/marker.h	2007-05-10 09:02:17.000000000 -0400
+++ linux-2.6-lttng/include/asm-h8300/marker.h	2007-05-10 09:04:23.000000000 -0400
@@ -1,13 +1 @@
-/*
- * marker.h
- *
- * Code markup for dynamic and static tracing. Architecture specific
- * optimisations.
- *
- * No optimisation implemented.
- *
- * This file is released under the GPLv2.
- * See the file COPYING for more details.
- */
-
 #include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-i386/marker.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-i386/marker.h	2007-05-10 09:02:17.000000000 -0400
+++ linux-2.6-lttng/include/asm-i386/marker.h	2007-05-10 09:05:06.000000000 -0400
@@ -4,7 +4,7 @@
 /*
  * marker.h
  *
- * Code markup for dynamic and static tracing. i386 architecture optimisations.
+ * Code markup for dynamic and static tracing. i386 architecture optimizations.
  *
  * (C) Copyright 2006 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
  *
Index: linux-2.6-lttng/include/asm-ia64/marker.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-ia64/marker.h	2007-05-10 09:02:17.000000000 -0400
+++ linux-2.6-lttng/include/asm-ia64/marker.h	2007-05-10 09:05:09.000000000 -0400
@@ -1,13 +1 @@
-/*
- * marker.h
- *
- * Code markup for dynamic and static tracing. Architecture specific
- * optimisations.
- *
- * No optimisation implemented.
- *
- * This file is released under the GPLv2.
- * See the file COPYING for more details.
- */
-
 #include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-m32r/marker.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-m32r/marker.h	2007-05-10 09:02:17.000000000 -0400
+++ linux-2.6-lttng/include/asm-m32r/marker.h	2007-05-10 09:05:12.000000000 -0400
@@ -1,13 +1 @@
-/*
- * marker.h
- *
- * Code markup for dynamic and static tracing. Architecture specific
- * optimisations.
- *
- * No optimisation implemented.
- *
- * This file is released under the GPLv2.
- * See the file COPYING for more details.
- */
-
 #include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-m68k/marker.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-m68k/marker.h	2007-05-10 09:02:17.000000000 -0400
+++ linux-2.6-lttng/include/asm-m68k/marker.h	2007-05-10 09:05:15.000000000 -0400
@@ -1,13 +1 @@
-/*
- * marker.h
- *
- * Code markup for dynamic and static tracing. Architecture specific
- * optimisations.
- *
- * No optimisation implemented.
- *
- * This file is released under the GPLv2.
- * See the file COPYING for more details.
- */
-
 #include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-m68knommu/marker.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-m68knommu/marker.h	2007-05-10 09:02:17.000000000 -0400
+++ linux-2.6-lttng/include/asm-m68knommu/marker.h	2007-05-10 09:05:19.000000000 -0400
@@ -1,13 +1 @@
-/*
- * marker.h
- *
- * Code markup for dynamic and static tracing. Architecture specific
- * optimisations.
- *
- * No optimisation implemented.
- *
- * This file is released under the GPLv2.
- * See the file COPYING for more details.
- */
-
 #include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-mips/marker.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-mips/marker.h	2007-05-10 09:02:17.000000000 -0400
+++ linux-2.6-lttng/include/asm-mips/marker.h	2007-05-10 09:05:21.000000000 -0400
@@ -1,13 +1 @@
-/*
- * marker.h
- *
- * Code markup for dynamic and static tracing. Architecture specific
- * optimisations.
- *
- * No optimisation implemented.
- *
- * This file is released under the GPLv2.
- * See the file COPYING for more details.
- */
-
 #include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-parisc/marker.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-parisc/marker.h	2007-05-10 09:02:17.000000000 -0400
+++ linux-2.6-lttng/include/asm-parisc/marker.h	2007-05-10 09:05:24.000000000 -0400
@@ -1,13 +1 @@
-/*
- * marker.h
- *
- * Code markup for dynamic and static tracing. Architecture specific
- * optimisations.
- *
- * No optimisation implemented.
- *
- * This file is released under the GPLv2.
- * See the file COPYING for more details.
- */
-
 #include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-powerpc/marker.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-powerpc/marker.h	2007-05-10 09:02:17.000000000 -0400
+++ linux-2.6-lttng/include/asm-powerpc/marker.h	2007-05-10 09:05:34.000000000 -0400
@@ -5,7 +5,7 @@
  * marker.h
  *
  * Code markup for dynamic and static tracing. PowerPC architecture
- * optimisations.
+ * optimizations.
  *
  * (C) Copyright 2006 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
  *
Index: linux-2.6-lttng/include/asm-ppc/marker.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-ppc/marker.h	2007-05-10 09:02:17.000000000 -0400
+++ linux-2.6-lttng/include/asm-ppc/marker.h	2007-05-10 09:05:38.000000000 -0400
@@ -1,13 +1 @@
-/*
- * marker.h
- *
- * Code markup for dynamic and static tracing. Architecture specific
- * optimisations.
- *
- * No optimisation implemented.
- *
- * This file is released under the GPLv2.
- * See the file COPYING for more details.
- */
-
 #include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-s390/marker.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-s390/marker.h	2007-05-10 09:02:17.000000000 -0400
+++ linux-2.6-lttng/include/asm-s390/marker.h	2007-05-10 09:05:40.000000000 -0400
@@ -1,13 +1 @@
-/*
- * marker.h
- *
- * Code markup for dynamic and static tracing. Architecture specific
- * optimisations.
- *
- * No optimisation implemented.
- *
- * This file is released under the GPLv2.
- * See the file COPYING for more details.
- */
-
 #include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-sh/marker.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-sh/marker.h	2007-05-10 09:02:17.000000000 -0400
+++ linux-2.6-lttng/include/asm-sh/marker.h	2007-05-10 09:05:47.000000000 -0400
@@ -1,13 +1 @@
-/*
- * marker.h
- *
- * Code markup for dynamic and static tracing. Architecture specific
- * optimisations.
- *
- * No optimisation implemented.
- *
- * This file is released under the GPLv2.
- * See the file COPYING for more details.
- */
-
 #include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-sh64/marker.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-sh64/marker.h	2007-05-10 09:02:17.000000000 -0400
+++ linux-2.6-lttng/include/asm-sh64/marker.h	2007-05-10 09:05:43.000000000 -0400
@@ -1,13 +1 @@
-/*
- * marker.h
- *
- * Code markup for dynamic and static tracing. Architecture specific
- * optimisations.
- *
- * No optimisation implemented.
- *
- * This file is released under the GPLv2.
- * See the file COPYING for more details.
- */
-
 #include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-sparc/marker.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-sparc/marker.h	2007-05-10 09:02:17.000000000 -0400
+++ linux-2.6-lttng/include/asm-sparc/marker.h	2007-05-10 09:05:52.000000000 -0400
@@ -1,13 +1 @@
-/*
- * marker.h
- *
- * Code markup for dynamic and static tracing. Architecture specific
- * optimisations.
- *
- * No optimisation implemented.
- *
- * This file is released under the GPLv2.
- * See the file COPYING for more details.
- */
-
 #include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-sparc64/marker.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-sparc64/marker.h	2007-05-10 09:02:17.000000000 -0400
+++ linux-2.6-lttng/include/asm-sparc64/marker.h	2007-05-10 09:05:50.000000000 -0400
@@ -1,13 +1 @@
-/*
- * marker.h
- *
- * Code markup for dynamic and static tracing. Architecture specific
- * optimisations.
- *
- * No optimisation implemented.
- *
- * This file is released under the GPLv2.
- * See the file COPYING for more details.
- */
-
 #include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-um/marker.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-um/marker.h	2007-05-10 09:02:17.000000000 -0400
+++ linux-2.6-lttng/include/asm-um/marker.h	2007-05-10 09:05:56.000000000 -0400
@@ -1,13 +1 @@
-/*
- * marker.h
- *
- * Code markup for dynamic and static tracing. Architecture specific
- * optimisations.
- *
- * No optimisation implemented.
- *
- * This file is released under the GPLv2.
- * See the file COPYING for more details.
- */
-
 #include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-v850/marker.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-v850/marker.h	2007-05-10 09:02:17.000000000 -0400
+++ linux-2.6-lttng/include/asm-v850/marker.h	2007-05-10 09:05:59.000000000 -0400
@@ -1,13 +1 @@
-/*
- * marker.h
- *
- * Code markup for dynamic and static tracing. Architecture specific
- * optimisations.
- *
- * No optimisation implemented.
- *
- * This file is released under the GPLv2.
- * See the file COPYING for more details.
- */
-
 #include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-x86_64/marker.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-x86_64/marker.h	2007-05-10 09:02:17.000000000 -0400
+++ linux-2.6-lttng/include/asm-x86_64/marker.h	2007-05-10 09:06:02.000000000 -0400
@@ -1,13 +1 @@
-/*
- * marker.h
- *
- * Code markup for dynamic and static tracing. Architecture specific
- * optimisations.
- *
- * No optimisation implemented.
- *
- * This file is released under the GPLv2.
- * See the file COPYING for more details.
- */
-
 #include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/asm-xtensa/marker.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-xtensa/marker.h	2007-05-10 09:02:17.000000000 -0400
+++ linux-2.6-lttng/include/asm-xtensa/marker.h	2007-05-10 09:06:07.000000000 -0400
@@ -1,13 +1 @@
-/*
- * marker.h
- *
- * Code markup for dynamic and static tracing. Architecture specific
- * optimisations.
- *
- * No optimisation implemented.
- *
- * This file is released under the GPLv2.
- * See the file COPYING for more details.
- */
-
 #include <asm-generic/marker.h>
Index: linux-2.6-lttng/include/linux/marker.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/marker.h	2007-05-10 09:03:25.000000000 -0400
+++ linux-2.6-lttng/include/linux/marker.h	2007-05-10 09:03:51.000000000 -0400
@@ -45,7 +45,9 @@
 #define MF_PRINTK	(1 << 2)	/* vprintk can be called in the probe */
 #define _MF_NR		3		/* Number of marker flags */
 
-/* Generic marker flavor always available */
+/* Generic marker flavor always available.
+ * Note : the empty asm volatile with read constraint is used here instead of a
+ * "used" attribute to fix a gcc 4.1.x bug. */
 #define trace_mark_generic(flags, name, format, args...) \
 	do { \
 		static const char __mstrtab_name_##name[] \
-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 02/10] Linux Kernel Markers, architecture independent code.
  2007-05-10  5:10   ` Alexey Dobriyan
  2007-05-10 12:58     ` Mathieu Desnoyers
@ 2007-05-10 13:12     ` Mathieu Desnoyers
  2007-05-10 19:00       ` Alexey Dobriyan
  1 sibling, 1 reply; 61+ messages in thread
From: Mathieu Desnoyers @ 2007-05-10 13:12 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: akpm, linux-kernel, hch, Adrian Bunk

Linux Kernel Markers - Architecture Independant Code fix 2

Fix trivial SMP bug in list_modules.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 kernel/module.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Index: linux-2.6-lttng/kernel/module.c
===================================================================
--- linux-2.6-lttng.orig/kernel/module.c	2007-05-10 08:51:02.000000000 -0400
+++ linux-2.6-lttng/kernel/module.c	2007-05-10 08:51:06.000000000 -0400
@@ -2657,13 +2657,16 @@
 	/* Enumerate loaded modules */
 	struct list_head	*i;
 	struct module		*mod;
-	unsigned long refcount = 0;
+	unsigned long refcount;
+	int cpu;
 
 	mutex_lock(&module_mutex);
 	list_for_each(i, &modules) {
 		mod = list_entry(i, struct module, list);
+		refcount = 0;
 #ifdef CONFIG_MODULE_UNLOAD
-		refcount = local_read(&mod->ref[0].count);
+		for_each_online_cpu(cpu)
+			refcount += local_read(&mod->ref[cpu].count);
 #endif //CONFIG_MODULE_UNLOAD
 		trace_mark(list_module, "%s %d %lu",
 				mod->name, mod->state, refcount);

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 07/10] Linux Kernel Markers - Documentation
  2007-05-10 13:04             ` Alan Cox
@ 2007-05-10 13:16               ` Pekka J Enberg
  2007-05-10 13:43                 ` Alan Cox
  0 siblings, 1 reply; 61+ messages in thread
From: Pekka J Enberg @ 2007-05-10 13:16 UTC (permalink / raw)
  To: Alan Cox; +Cc: Christoph Hellwig, Mathieu Desnoyers, akpm, linux-kernel

On Thu, 10 May 2007, Alan Cox wrote:
> Because people want to know what it is really for ? It's quite amazing
> really at the same time as people are posting crypto keys everywhere in
> defiance of USSA law, we've got free software people trying to remove
> references to a piece of out of tree software, and one that is free
> software. 

Oh, I am not advocating censorship. I was merely pointing out that it 
seems silly to mention out-of-tree users because they come and go. Who 
knows whether LTTng or SystemTAP will be relevant five years from now? 
Besides, are we sure the document includes all potential users now? We 
wouldn't want to show favorism to any particular projects, now would we?

On Thu, 10 May 2007, Alan Cox wrote: 
> And as I keep saying the tree is full of references to out of tree stuff.
> The documentation directory alone currently contains over a thousand http
> URLS as of 2.6.21-rc6-mm1.

Many of the already out of date or expiring in the near future... But 
whatever, I don't care too much. Just wanted to say the review comment 
makes sense to me.

				Pekka

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 06/10] Linux Kernel Markers - Non optimized architectures
  2007-05-10 13:11     ` Mathieu Desnoyers
@ 2007-05-10 13:40       ` Alan Cox
  2007-05-10 14:25         ` Mathieu Desnoyers
  2007-05-10 15:33         ` Nicholas Berry
  0 siblings, 2 replies; 61+ messages in thread
From: Alan Cox @ 2007-05-10 13:40 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: Christoph Hellwig, akpm, linux-kernel

> - Fix optimisation -> optimization typo in powerpc and i386 headers.

That isn't a typo, it's correct as is.

Alan

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 07/10] Linux Kernel Markers - Documentation
  2007-05-10 13:16               ` Pekka J Enberg
@ 2007-05-10 13:43                 ` Alan Cox
  2007-05-10 14:04                   ` Pekka J Enberg
  0 siblings, 1 reply; 61+ messages in thread
From: Alan Cox @ 2007-05-10 13:43 UTC (permalink / raw)
  To: Pekka J Enberg; +Cc: Christoph Hellwig, Mathieu Desnoyers, akpm, linux-kernel

O> Oh, I am not advocating censorship. I was merely pointing out that it 
> seems silly to mention out-of-tree users because they come and go. Who 
> knows whether LTTng or SystemTAP will be relevant five years from now? 

And in the intervening five years (or more likely 7 or more for
systemtap) the documentation will be very useful.

> Besides, are we sure the document includes all potential users now? We 
> wouldn't want to show favorism to any particular projects, now would we?

Absolutely not - and isn't it more useful to let users know about all of
the tools that this enables ?

> 
> On Thu, 10 May 2007, Alan Cox wrote: 
> > And as I keep saying the tree is full of references to out of tree stuff.
> > The documentation directory alone currently contains over a thousand http
> > URLS as of 2.6.21-rc6-mm1.
> 
> Many of the already out of date or expiring in the near future... 

Have you done a statistically valid sampling or is this guessing ?

Alan

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 07/10] Linux Kernel Markers - Documentation
  2007-05-10 13:43                 ` Alan Cox
@ 2007-05-10 14:04                   ` Pekka J Enberg
  0 siblings, 0 replies; 61+ messages in thread
From: Pekka J Enberg @ 2007-05-10 14:04 UTC (permalink / raw)
  To: Alan Cox; +Cc: Christoph Hellwig, Mathieu Desnoyers, akpm, linux-kernel

On Thu, 10 May 2007, Alan Cox wrote:
> > Besides, are we sure the document includes all potential users now? We 
> > wouldn't want to show favorism to any particular projects, now would we?
> 
> Absolutely not - and isn't it more useful to let users know about all of
> the tools that this enables ?

So set up a http//www.linux-markers.org/ and add the list there. We're not 
going to start accepting patches that add "potentially useful new tools" 
to Documentation/marker.txt are we?

On Thu, 10 May 2007, Alan Cox wrote:
> > Many of the already out of date or expiring in the near future... 
> 
> Have you done a statistically valid sampling or is this guessing ?

It's an educated guess which proves nothing but gets the point across 
quite nicely ;-)

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 07/10] Linux Kernel Markers - Documentation
  2007-05-10  6:58   ` Christoph Hellwig
  2007-05-10 11:41     ` Alan Cox
@ 2007-05-10 14:12     ` Mathieu Desnoyers
  2007-05-10 14:14     ` Mathieu Desnoyers
  2007-05-11 15:05     ` Valdis.Kletnieks
  3 siblings, 0 replies; 61+ messages in thread
From: Mathieu Desnoyers @ 2007-05-10 14:12 UTC (permalink / raw)
  To: Christoph Hellwig, akpm, linux-kernel

* Christoph Hellwig (hch@infradead.org) wrote:
> On Wed, May 09, 2007 at 09:56:02PM -0400, Mathieu Desnoyers wrote:
> > Here is some documentation explaining what is/how to use the Linux
> > Kernel Markers.
> 
> Please remove all references to out of tree code from the kernel documentation.
> 

Since I expect people who will read this documentation to be familiar
with kprobes already, I tried to make the adaptation process as easy as
possible for them. You will find similar examples in
Documentation/kprobes.txt.

Also, this kind of example, which describes what kernel code is needed
to use the markers (this can be either a separate kernel module or
embedded in the kernel, like the blktrace example), seems to be a
straightforward way to teach people how to use the markers.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 07/10] Linux Kernel Markers - Documentation
  2007-05-10  6:58   ` Christoph Hellwig
  2007-05-10 11:41     ` Alan Cox
  2007-05-10 14:12     ` Mathieu Desnoyers
@ 2007-05-10 14:14     ` Mathieu Desnoyers
  2007-05-11 15:05     ` Valdis.Kletnieks
  3 siblings, 0 replies; 61+ messages in thread
From: Mathieu Desnoyers @ 2007-05-10 14:14 UTC (permalink / raw)
  To: Christoph Hellwig, akpm, linux-kernel

* Christoph Hellwig (hch@infradead.org) wrote:
> On Wed, May 09, 2007 at 09:56:02PM -0400, Mathieu Desnoyers wrote:
> > Here is some documentation explaining what is/how to use the Linux
> > Kernel Markers.
> 
> Please remove all references to out of tree code from the kernel documentation.
> 

Hrm, I see.. you are objecting to mentioning of "LTTng, LKET, SystemTAP"...
I agree : it belongs to a changelog (which is bounded in time) rather
than to the documentation itself.

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 06/10] Linux Kernel Markers - Non optimized architectures
  2007-05-10 13:40       ` Alan Cox
@ 2007-05-10 14:25         ` Mathieu Desnoyers
  2007-05-10 15:33         ` Nicholas Berry
  1 sibling, 0 replies; 61+ messages in thread
From: Mathieu Desnoyers @ 2007-05-10 14:25 UTC (permalink / raw)
  To: Alan Cox; +Cc: Christoph Hellwig, akpm, linux-kernel

* Alan Cox (alan@lxorguk.ukuu.org.uk) wrote:
> > - Fix optimisation -> optimization typo in powerpc and i386 headers.
> 
> That isn't a typo, it's correct as is.
> 
> Alan

Well, it's correct or not depending on which side of the ocean you are.
but as I used "optimization" everywhere else in my code, I simply try to
be consistent.

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 06/10] Linux Kernel Markers - Non optimized architectures
  2007-05-10 13:40       ` Alan Cox
  2007-05-10 14:25         ` Mathieu Desnoyers
@ 2007-05-10 15:33         ` Nicholas Berry
  2007-05-10 16:09           ` Alan Cox
  1 sibling, 1 reply; 61+ messages in thread
From: Nicholas Berry @ 2007-05-10 15:33 UTC (permalink / raw)
  To: alan, mathieu.desnoyers; +Cc: linux-kernel


>>> Alan Cox <alan@lxorguk.ukuu.org.uk> 5/10/2007 9:40 AM >>>
> - Fix optimisation -> optimization typo in powerpc and i386 headers.

That isn't a typo, it's correct as is.

Alan

OED lists it under optimization, with s form as a variant.

Nik





**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 07/10] Linux Kernel Markers - Documentation
  2007-05-10  1:56 ` [patch 07/10] Linux Kernel Markers - Documentation Mathieu Desnoyers
  2007-05-10  6:58   ` Christoph Hellwig
  2007-05-10 12:00   ` Christoph Hellwig
@ 2007-05-10 15:51   ` Scott Preece
  2 siblings, 0 replies; 61+ messages in thread
From: Scott Preece @ 2007-05-10 15:51 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: akpm, linux-kernel, hch

Hi,

Here is a patch to marker.txt to make the English read a little
better. I didn't change the references to out-of-tree packages.

@@ -3,33 +3,30 @@
                            Mathieu Desnoyers


-       This document introduces to markers and discusses its purpose. It
-shows some usage examples of the Linux Kernel Markers : how to insert markers
-within the kernel and how to connect probes to a marker. Finally, it has some
-probe module examples. This is what connects to a marker.
-
+This document introduces Linux Kernel Markers and their use. It provides
+examples of how to insert markers in the kernel and connect probe functions to
+them and provides some examples of probe functions.

 * Purpose of markers

-A marker placed in your code provides a hook to a function (probe) that
+A marker placed in your code provides a hook to call a function (probe) that
 you can provide at runtime. A marker can be "on" (a probe is connected to it)
-or "off" (no probe is attached). An "off" marker has no effect. When turned on,
-the function you provide is called each time the marker is executed in the
-execution context of the caller. When the function provided ends its execution,
-it returns to the caller (marker site).
+or "off" (no probe is attached). When a marker is "off" it has no
+effect. When a marker is "on", the function you provide is called each
+time the marker is executed, in the execution context of the
+caller. When the function provided ends its execution, it returns to the
+caller (continuing from the marker site).

-You can put markers at important locations in the code. They act as
+You can put markers at important locations in the code. Markers are
 lightweight hooks that can pass an arbitrary number of parameters,
-described in a printk-like format string, to a function whenever the marker
-code is reached.
+described in a printk-like format string, to the attached probe function.

-They can be used for tracing (LTTng, LKET over SystemTAP), overall performance
-accounting (SystemTAP). They could also be used to implement
-efficient hooks for SELinux or any other subsystem that would have this
-kind of need.
+Markers can be used for tracing (LTTng, LKET over SystemTAP), overall
+performance accounting (SystemTAP), or to hook to monitoring
+subsystems (like SELinux) that need to observe kernel behavior.

-Using the markers for system audit (SELinux) would require to pass a
-variable by address that would be later checked by the marked routine.
+Using the markers for system audit (SELinux) would require passing a
+variable by address so that it could be checked by the marked routine.


 * Usage
@@ -52,15 +49,15 @@ Where :
 Connecting a function (probe) to a marker is done by providing a probe
 (function to call) for the specific marker through marker_set_probe(). It will
 automatically connect the function and enable the marker site. Removing a probe
-is done through marker_remove_probe(). Probe removal is preempt safe because
+is done through marker_remove_probe(). Probe removal is preempt-safe because
 preemption is disabled around the probe call. See the "Probe example" section
 below for a sample probe module.

-The marker mechanism supports multiple instances of the same marker.
-Markers can be put in inline functions, inlined static functions and
+The marker mechanism supports inserting multiple instances of the same marker.
+Markers can be put in inline functions, inlined static functions, and
 unrolled loops.

-Note : It is safe to put markers within preempt-safe code : preempt_enable()
+Note: It is safe to put markers within preempt-safe code : preempt_enable()
 will not call the scheduler due to the tests in preempt_schedule().


@@ -68,23 +65,23 @@ will not call the scheduler due to the t

 You will find, in asm-*/marker.h, optimisations for given architectures
 (currently i386 and powerpc). They use a load immediate instead of a data load,
-which saves a data cache hit, but also requires cross CPU code modification. In
-order to support embedded systems which use read-only memory for
their code, the
+which saves a data cache hit, but also requires cross-CPU code modification. In
+order to support embedded systems that use read-only memory for their code, the
 optimization can be disabled through menu options.

 The MF_* flags can be used to control the type of marker. See the
 include/marker.h header for the list of flags. They can be specified as the
-first parameter of the _trace_mark() macro, such as the following example which
+first parameter of the _trace_mark() macro, as in the following example, which
 is safe with respect to lockdep.c (useful for marking lockdep.c and printk
 functions).

 _trace_mark(MF_DEFAULT | ~MF_LOCKDEP, subsystem_eventb, MARK_NOARGS);

-Another example is to specify that a specific marker must never call printk :
+Another example is to specify that a specific marker must never call printk:
 _trace_mark(MF_DEFAULT | ~MF_PRINTK, subsystem_eventc,
   "%d %s", someint, somestring,);

-Flag compatibility is checked before connecting the probe to the marker : the
+Flag compatibility is checked before connecting the probe to the marker: the
 right flags must be given to _marker_set_enable().

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 05/10] Linux Kernel Markers - i386 optimized version
  2007-05-10  9:06   ` Andi Kleen
@ 2007-05-10 15:55     ` Mathieu Desnoyers
  2007-05-10 16:28       ` Alan Cox
  0 siblings, 1 reply; 61+ messages in thread
From: Mathieu Desnoyers @ 2007-05-10 15:55 UTC (permalink / raw)
  To: Andi Kleen, systemtap, prasanna, ananth, anil.s.keshavamurthy
  Cc: akpm, linux-kernel, hch

* Andi Kleen (ak@muc.de) wrote:
> On Wed, May 09, 2007 at 09:56:00PM -0400, Mathieu Desnoyers wrote:
> > @@ -0,0 +1,99 @@
> > +/* marker.c
> > + *
> > + * Erratum 49 fix for Intel PIII and higher.
> 
> Errata are CPU specific so they can't be higher. You mean it's a P3
> erratum only?
> 
> In general you need some more description why the int3 handler is needed.
> 

>From :
Intel® Core™2 Duo Processor for
Intel® Centrino® Duo Processor
Technology
Specification Update 

AH33.
Unsynchronized Cross-Modifying Code Operations Can Cause
Unexpected Instruction Execution Results

It is therefore still relevant on newer CPUs. I can add reference to this
documentation in the header.


> > + *
> > + * Permits marker activation by XMC with correct serialization.
> 
> This doesn't follow the Intel recommended algorithm for cross modifying
> code (7.1.3 in the IA32 manual) which requires the other CPUs to spin
> during the modification.
> 
> If you used that would you really need the INT3 handler?
> 

Let's go through the excercice of applying their algorithm to a running
kernel.

You would have to do something like : 
CPU A wants to modify code; it initializes 2 completion barriers (one to
know when all other cpus are spinning and another to tell them they can
stop spinning) and sends an IPI to every other CPUs.  All other CPUs,
upon entry in the IPI handler, disable interrupts, indicate that they
are spinning in the 1st barrier and wait for the completion variable to
be reset by CPU A. When all other CPUs are ready, CPU A disables
interrupts and proceeds to the change, then removes the second memory
barrier to let the other CPUs exit their loops.

* First issue : Impact on the system. If we try to make this system
  scale, we will create very long irq disable sections. The expected
  duration is the worse case IPI latency plus the time it takes to CPU A
  to change the variable. We therefore directly grow the worse case
  system's interrupt latency.

* Second issue : irq disabling does not protect us from NMI and traps.
  We cannot use this algorithm to mark these code segments.

* Third issue : Scalability. Changing code will stop every CPU on the
  system for a while. Compared to this, the int3-based approach will run
  through the breakpoint handler "if" one of the CPU happens to execute
  this code at the wrong time. The standard case is just an IPI (to
  issue one "iret" and a synchronize_sched().

Also note that djprobes, a jump-based enhancement of kprobes, uses a
mechanism very similar to mine. I did not use their implementation
because I consider that kprobes has too much side-effects to be used in
"hard to instrument" sites (traps, nmi handlers, lockdep code). kprobes
also uses a temporary data structure to put the stepping instructions,
something I wanted to avoid. I enables me to do the teardown of the
breakpoint even when it is placed in preemptible code.

> > +static DEFINE_MUTEX(mark_mutex);
> > +static long target_eip = 0;
> > +
> > +static void mark_synchronize_core(void *info)
> > +{
> > +	sync_core();	/* use cpuid to stop speculative execution */
> > +}
> > +
> > +/* We simply skip the 2 bytes load immediate here, leaving the register in an
> > + * undefined state. We don't care about the content (0 or !0), because we are
> > + * changing the value 0->1 or 1->0. This small window of undefined value
> > + * doesn't matter.
> > + */
> > +static int mark_notifier(struct notifier_block *nb,
> > +	unsigned long val, void *data)
> > +{
> > +	enum die_val die_val = (enum die_val) val;
> > +	struct die_args *args = (struct die_args *)data;
> > +
> > +	if (!args->regs || user_mode_vm(args->regs))
> 
> I don't think regs should be ever NULL
> 

I played safe and used the same tests done in kprobes. I'll let them
explain. See

arch/i386/kernel/kprobes.c

int __kprobes kprobe_exceptions_notify(struct notifier_block *self,
                                       unsigned long val, void *data)
{
        struct die_args *args = (struct die_args *)data;
        int ret = NOTIFY_DONE;

        if (args->regs && user_mode_vm(args->regs))
                return ret;
...


> > +		return NOTIFY_DONE;
> > +
> > +	if (die_val == DIE_INT3	&& args->regs->eip == target_eip) {
> > +		args->regs->eip += 1; /* Skip the next byte of load immediate */
> 
> In what instruction results that then? This is definitely underdocumented.
> 

Should maybe document this more :

The eip there points right after the 1 byte breakpoint. The breakpoint
replaces the 1st byte of a 2 bytes load immediate. Therefore, if I skip
the following byte (the load immediate value), I am then at the
instruction following the load immediate. As I stated in my comments,
since I check that there is a state change to the variable (0->1 or
1->0), I can afford having garbage in my registers at this point,
because it will be either the state prior to the change I am currently
doing or the new state after the change.

I hope it makes the following comment, found just over mark_notifier(),
clearer : 
/* We simply skip the 2 bytes load immediate here, leaving the register in an
 * undefined state. We don't care about the content (0 or !0), because we are
 * changing the value 0->1 or 1->0. This small window of undefined value
 * doesn't matter.
 */

I will add this paragraph prior to the existing one to make things clearer :

/* The eip value points right after the breakpoint instruction, in the second
 * byte of the movb. Incrementing it of 1 byte makes the code resume right after
 * the movb instruction, effectively skipping this instruction.

> > +int marker_optimized_set_enable(void *address, char enable)
> > +{
> > +	char saved_byte;
> > +	int ret;
> > +	char *dest = address;
> > +
> > +	if (!(enable ^ dest[1])) /* Must be a state change 0<->1 to execute */
> > +		return 0;
> 
> Wouldn't you need that inside the mutex too to avoid races?
> 

Absolutely, will fix.

> 
> > +
> > +	mutex_lock(&mark_mutex);
> > +	target_eip = (long)address + BREAKPOINT_INS_LEN;
> > +	/* register_die_notifier has memory barriers */
> > +	register_die_notifier(&mark_notify);
> > +	saved_byte = *dest;
> > +	*dest = BREAKPOINT_INSTRUCTION;
> > +	wmb();
> 
> wmb is a nop
> 

Quoting asm-i386/system.h :

 * Some non intel clones support out of order store. wmb() ceases to be
 * a nop for these.

#ifdef CONFIG_X86_OOSTORE
/* Actually there are no OOO store capable CPUs for now that do SSE, 
   but make it already an possibility. */
#define wmb() alternative("lock; addl $0,0(%%esp)", "sfence", X86_FEATURE_XMM)
#else
#define wmb()   __asm__ __volatile__ ("": : :"memory")
#endif

As I will soon reuse this code on x86_64, wmb() is not a nop under
CONFIG_UNORDERED_IO.

Therefore, I think I am playing safe by leaving a wmb() there.

> > +	/* Execute serializing instruction on each CPU.
> > +	 * Acts as a memory barrier. */
> > +	ret = on_each_cpu(mark_synchronize_core, NULL, 1, 1);
> > +	BUG_ON(ret != 0);
> > +
> > +	dest[1] = enable;
> > +	wmb();
> > +	*dest = saved_byte;
> > +		/* Wait for all int3 handlers to end
> > +		   (interrupts are disabled in int3).
> > +		   This CPU is clearly not in a int3 handler
> > +		   (not preemptible).
> 
> So what happens when the kernel is preemptible? 
> 

This comment may be unclear : the "(not preemptible)" applies to the
int3 handler itself. I'll arrange it.

Because the int3 handler disables interrupts, it is not preemptible.
Therefore, the CPU running this code, doing the synchronize_sched(),
cannot reschedule a int3 handler running on a preempted thread stack
waiting to be scheduled again. Since original mov instruction has been
placed back before the call to synchronize_sched(), and there is a
memory barrier in synchronize_sched, we know there is no possibility for
an int3 handler to run for this instruction on the CPU that runs
synchronize_sched(). At the end of the synchronize_sched(), we know we
have exited every currently executing non-preemptible sections, which
includes the int3 handlers of every other CPUs.


> > +		   synchronize_sched has memory barriers */
> > +	synchronize_sched();
> > +	unregister_die_notifier(&mark_notify);
> > +	/* unregister_die_notifier has memory barriers */
> > +	target_eip = 0;
> > +	mutex_unlock(&mark_mutex);
> > +	flush_icache_range(address, size);
> 
> That's a nop on x86
> 

Right, and eventually on x86_64 too. Will remove.

> > +
> > +#ifdef CONFIG_MARKERS
> > +
> > +#define MF_DEFAULT (MF_OPTIMIZED | MF_LOCKDEP | MF_PRINTK)
> > +
> > +/* Optimized version of the markers */
> > +#define trace_mark_optimized(flags, name, format, args...) \
> > +	do { \
> > +		static const char __mstrtab_name_##name[] \
> > +		__attribute__((section("__markers_strings"))) \
> > +		= #name; \
> > +		static const char __mstrtab_format_##name[] \
> > +		__attribute__((section("__markers_strings"))) \
> > +		= format; \
> > +		static const char __mstrtab_args_##name[] \
> > +		__attribute__((section("__markers_strings"))) \
> > +		= #args; \
> 
> For what do you need special string sections?  
> 
> If it's something to be read by external programs the interface
> would need to be clearly defined and commented.
> If not just use normal variables.
> 

Markers, by their nature, will be declared everywhere through the kernel
code. Therefore, the strings would pollute the kernel data and use up
space in the data caches even when the markers are disabled. This is why
I think it makes sense to put this data in a separate section.


> > +		static struct __mark_marker_data __mark_data_##name \
> > +		__attribute__((section("__markers_data"))) = \
> > +		{ __mstrtab_name_##name,  __mstrtab_format_##name, \
> > +		__mstrtab_args_##name, \
> > +		(flags) | MF_OPTIMIZED, __mark_empty_function, NULL }; \
> > +		char condition; \
> > +		asm volatile(	".section __markers, \"a\", @progbits;\n\t" \
> 
> The volatile is not needed because the output is used.
> 

True. Will fix. The same applies to asm-powerpc/marker.h.


> > +					".long %1, 0f;\n\t" \
> > +					".previous;\n\t" \
> > +					".align 2\n\t" \
> > +					"0:\n\t" \
> > +					"movb $0,%0;\n\t" \
> 
> This should be a generic facility in a separate include file / separate 
> macros etc so that it can be used by other code too.
> 

Yep, the split into a "conditional call" infrastructure will come soon.
I was thinking of something like cond_call(func(arg1, arg2));. If you
think of a better macro name, I am open to suggestions.

> > +				: "=r" (condition) \
> > +				: "m" (__mark_data_##name)); \
> > +		__mark_check_format(format, ## args); \
> > +		if (likely(!condition)) { \
> > +		} else { \
> > +			preempt_disable(); \
> 
> Why the preempt disable here and why can't it be in the hook?
> 

Because I want to safely disconnect hooks. And I need to issue a
synchronize_sched() after disconnection to make sure every handler
finished running before I unload the module that implements the hook
code. Therefore, I must surround the call by preempt_disable.

> > +			(*__mark_data_##name.call)(&__mark_data_##name, \
> > +						format, ## args); \
> > +			preempt_enable(); \
> > +		} \
> > +	} while (0)
> > +
> > +/* Marker macro selecting the generic or optimized version of marker, depending
> > + * on the flags specified. */
> > +#define _trace_mark(flags, format, args...) \
> > +do { \
> > +	if (((flags) & MF_LOCKDEP) && ((flags) & MF_OPTIMIZED)) \
> > +		trace_mark_optimized(flags, format, ## args); \
> > +	else \
> > +		trace_mark_generic(flags, format, ## args); \
> 
> Is there ever a good reason not to use optimized markers? 
> 

Yep, instrumentation of the breakpoint handler, and instrumentation of
anything that can be called from the breakpoint handler, namely
lockdep.c code.

> > + * bytes. */
> > +#define MARK_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET 1
> > +#define MARK_OPTIMIZED_ENABLE_TYPE char
> 
> unsigned char is usually safer to avoid sign extension
> 

Ok, will fix.

Thanks for the thorough review,

Mathieu

> -Andi

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 06/10] Linux Kernel Markers - Non optimized architectures
  2007-05-10 15:33         ` Nicholas Berry
@ 2007-05-10 16:09           ` Alan Cox
  0 siblings, 0 replies; 61+ messages in thread
From: Alan Cox @ 2007-05-10 16:09 UTC (permalink / raw)
  To: Nicholas Berry; +Cc: mathieu.desnoyers, linux-kernel

On Thu, 10 May 2007 11:33:41 -0400
"Nicholas Berry" <nikberry@med.umich.edu> wrote:

> OED lists it under optimization, with s form as a variant.

As I said - its not a typo. 

I'm not objecting to the change which is consistent within the markers
code.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 05/10] Linux Kernel Markers - i386 optimized version
  2007-05-10 15:55     ` Mathieu Desnoyers
@ 2007-05-10 16:28       ` Alan Cox
  2007-05-10 16:59         ` Mathieu Desnoyers
  0 siblings, 1 reply; 61+ messages in thread
From: Alan Cox @ 2007-05-10 16:28 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Andi Kleen, systemtap, prasanna, ananth, anil.s.keshavamurthy,
	akpm, linux-kernel, hch

> * First issue : Impact on the system. If we try to make this system
>   scale, we will create very long irq disable sections. The expected
>   duration is the worse case IPI latency plus the time it takes to CPU A
>   to change the variable. We therefore directly grow the worse case
>   system's interrupt latency.

Not a huge problem. It doesn't scale in really horrible ways and the IPI
latency on a PIV or later is actually very good. Also the impact is less
than you might think as on huge huge boxes you want multiple copies of
the kernel text pages to reduce NUMA traffic, so you only have to sync
the group of processors involved 

> * Second issue : irq disabling does not protect us from NMI and traps.
>   We cannot use this algorithm to mark these code segments.

If you synchronize all the other processors and disable local interrupts
then the only traps you have to worry about are those you cause, and the
only person taking the trap will be you so you're ok.

NMI is hard but NMI is a special case not worth solving IMHO.

> * Third issue : Scalability. Changing code will stop every CPU on the
>   system for a while. Compared to this, the int3-based approach will run
>   through the breakpoint handler "if" one of the CPU happens to execute
>   this code at the wrong time. The standard case is just an IPI (to

If I read the errata right then patching in an int3 will itself trigger
the errata so anything could happen.

I believe there are other safe sequences for doing code patching - perhaps
one of the Intel folk can advise ?

Alan

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 05/10] Linux Kernel Markers - i386 optimized version
  2007-05-10 16:28       ` Alan Cox
@ 2007-05-10 16:59         ` Mathieu Desnoyers
  2007-05-11  4:57           ` Ananth N Mavinakayanahalli
  2007-05-11  6:04           ` Andi Kleen
  0 siblings, 2 replies; 61+ messages in thread
From: Mathieu Desnoyers @ 2007-05-10 16:59 UTC (permalink / raw)
  To: Alan Cox
  Cc: Andi Kleen, systemtap, prasanna, ananth, anil.s.keshavamurthy,
	akpm, linux-kernel, hch

* Alan Cox (alan@lxorguk.ukuu.org.uk) wrote:
> > * First issue : Impact on the system. If we try to make this system
> >   scale, we will create very long irq disable sections. The expected
> >   duration is the worse case IPI latency plus the time it takes to CPU A
> >   to change the variable. We therefore directly grow the worse case
> >   system's interrupt latency.
> 
> Not a huge problem. It doesn't scale in really horrible ways and the IPI
> latency on a PIV or later is actually very good. Also the impact is less
> than you might think as on huge huge boxes you want multiple copies of
> the kernel text pages to reduce NUMA traffic, so you only have to sync
> the group of processors involved 
> 
> > * Second issue : irq disabling does not protect us from NMI and traps.
> >   We cannot use this algorithm to mark these code segments.
> 
> If you synchronize all the other processors and disable local interrupts
> then the only traps you have to worry about are those you cause, and the
> only person taking the trap will be you so you're ok.
> 
> NMI is hard but NMI is a special case not worth solving IMHO.
> 

Not caring about NMIs may have more impact than one could expect. You
have to be aware that (at least) the following code is executed in NMI
context. Trying to patch any of these functions could result in a dying
CPU :

default_do_nmi
  notify_die
  nmi_watchdog_tick
    printk
    die_nmi (should cause a OOPS, not kill the cpu..)
  do_nmi_callback
    unknown_nmi_panic_callback
      sprintf
  unknown_nmi_error
    panic
  reassert_nmi

In entry.S, there is also a call to local_irq_enable(), which falls into
lockdep code.

Tracing those core kernel functions is a fundamental need of crash
tracing. So, in my point of view, it is not "just" about tracing NMIs,
but it's about tracing code that can be touched by NMIs.

> > * Third issue : Scalability. Changing code will stop every CPU on the
> >   system for a while. Compared to this, the int3-based approach will run
> >   through the breakpoint handler "if" one of the CPU happens to execute
> >   this code at the wrong time. The standard case is just an IPI (to
> 
> If I read the errata right then patching in an int3 will itself trigger
> the errata so anything could happen.
> 
> I believe there are other safe sequences for doing code patching - perhaps
> one of the Intel folk can advise ?
> 

I'll let the Intel guys confirm this, I don't have the reference nearby
(I got this information by talking with the kprobe team members, and
they got this information directly from Intel developers) but the
int3 is the one special case to which the errata does not apply.
Otherwise, kprobes and gdb would have a big, big issue.

Mathieu


> Alan

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 02/10] Linux Kernel Markers, architecture independent code.
  2007-05-10 13:12     ` Mathieu Desnoyers
@ 2007-05-10 19:00       ` Alexey Dobriyan
  2007-05-10 19:46         ` Mathieu Desnoyers
  0 siblings, 1 reply; 61+ messages in thread
From: Alexey Dobriyan @ 2007-05-10 19:00 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: akpm, linux-kernel, hch, Adrian Bunk

On Thu, May 10, 2007 at 09:12:41AM -0400, Mathieu Desnoyers wrote:
> Linux Kernel Markers - Architecture Independant Code fix 2
>
> Fix trivial SMP bug in list_modules.

> --- linux-2.6-lttng.orig/kernel/module.c	2007-05-10 08:51:02.000000000 -0400
> +++ linux-2.6-lttng/kernel/module.c	2007-05-10 08:51:06.000000000 -0400
> @@ -2657,13 +2657,16 @@
>  	/* Enumerate loaded modules */
>  	struct list_head	*i;
>  	struct module		*mod;
> -	unsigned long refcount = 0;
> +	unsigned long refcount;
> +	int cpu;
>
>  	mutex_lock(&module_mutex);
>  	list_for_each(i, &modules) {
>  		mod = list_entry(i, struct module, list);
> +		refcount = 0;
>  #ifdef CONFIG_MODULE_UNLOAD
> -		refcount = local_read(&mod->ref[0].count);
> +		for_each_online_cpu(cpu)
> +			refcount += local_read(&mod->ref[cpu].count);
>  #endif //CONFIG_MODULE_UNLOAD

Still wrong: module_get on cpu X, cpu X hot unplugged. There is
module_refcount() for you.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 02/10] Linux Kernel Markers, architecture independent code.
  2007-05-10 19:00       ` Alexey Dobriyan
@ 2007-05-10 19:46         ` Mathieu Desnoyers
  0 siblings, 0 replies; 61+ messages in thread
From: Mathieu Desnoyers @ 2007-05-10 19:46 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: akpm, linux-kernel, hch, Adrian Bunk

* Alexey Dobriyan (adobriyan@gmail.com) wrote:
> On Thu, May 10, 2007 at 09:12:41AM -0400, Mathieu Desnoyers wrote:
> > Linux Kernel Markers - Architecture Independant Code fix 2
> >
> > Fix trivial SMP bug in list_modules.
> 
> > --- linux-2.6-lttng.orig/kernel/module.c	2007-05-10 08:51:02.000000000 -0400
> > +++ linux-2.6-lttng/kernel/module.c	2007-05-10 08:51:06.000000000 -0400
> > @@ -2657,13 +2657,16 @@
> >  	/* Enumerate loaded modules */
> >  	struct list_head	*i;
> >  	struct module		*mod;
> > -	unsigned long refcount = 0;
> > +	unsigned long refcount;
> > +	int cpu;
> >
> >  	mutex_lock(&module_mutex);
> >  	list_for_each(i, &modules) {
> >  		mod = list_entry(i, struct module, list);
> > +		refcount = 0;
> >  #ifdef CONFIG_MODULE_UNLOAD
> > -		refcount = local_read(&mod->ref[0].count);
> > +		for_each_online_cpu(cpu)
> > +			refcount += local_read(&mod->ref[cpu].count);
> >  #endif //CONFIG_MODULE_UNLOAD
> 
> Still wrong: module_get on cpu X, cpu X hot unplugged. There is
> module_refcount() for you.

Ok, fixing. thanks.

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 03/10] Allow userspace applications to use marker.h to parse the markers section in the kernel binary.
  2007-05-10  6:51   ` Christoph Hellwig
@ 2007-05-10 22:14     ` David Smith
  2007-06-23  8:09       ` Christoph Hellwig
  0 siblings, 1 reply; 61+ messages in thread
From: David Smith @ 2007-05-10 22:14 UTC (permalink / raw)
  To: Christoph Hellwig, Mathieu Desnoyers, akpm, linux-kernel, David Smith

Christoph Hellwig wrote:
> NACK.  This shouldn't be includedable from userspace and systemtap people
> should stop doing crap like that but use kernel infrastructure everyone
> else uses including runtime kernel code instead of stuffin all kinds of
> crap into their broken translator.

Christoph,

I was using the above information to parse all the available markers in 
a kernel, but I believe I've found a way where I don't need the marker.h 
header, so I'm OK with your NACK.

Out of curiosity, what is the kernel infrastructure you believe I should 
be using to get a list of all available markers in a kernel?  Note that 
I also have a requirement to get a list of all available markers in a 
kernel that isn't the currently running one.

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 05/10] Linux Kernel Markers - i386 optimized version
  2007-05-10 16:59         ` Mathieu Desnoyers
@ 2007-05-11  4:57           ` Ananth N Mavinakayanahalli
  2007-05-11 18:55             ` Mathieu Desnoyers
  2007-05-12  5:29             ` Suparna Bhattacharya
  2007-05-11  6:04           ` Andi Kleen
  1 sibling, 2 replies; 61+ messages in thread
From: Ananth N Mavinakayanahalli @ 2007-05-11  4:57 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Alan Cox, Andi Kleen, systemtap, prasanna, anil.s.keshavamurthy,
	akpm, linux-kernel, hch, richardj_moore, suparna

On Thu, May 10, 2007 at 12:59:18PM -0400, Mathieu Desnoyers wrote:
> * Alan Cox (alan@lxorguk.ukuu.org.uk) wrote:

...
> > > * Third issue : Scalability. Changing code will stop every CPU on the
> > >   system for a while. Compared to this, the int3-based approach will run
> > >   through the breakpoint handler "if" one of the CPU happens to execute
> > >   this code at the wrong time. The standard case is just an IPI (to
> > 
> > If I read the errata right then patching in an int3 will itself trigger
> > the errata so anything could happen.
> > 
> > I believe there are other safe sequences for doing code patching - perhaps
> > one of the Intel folk can advise ?

IIRC, when the first implementation of what exists now as kprobes was
done (as part of the dprobes framework), this question did come up. I
think the conclusion was that the errata applies only to multi-byte
modifications and single-byte changes are guaranteed to be atomic.
Given int3 on Intel is just 1-byte, we are safe.

> I'll let the Intel guys confirm this, I don't have the reference nearby
> (I got this information by talking with the kprobe team members, and
> they got this information directly from Intel developers) but the
> int3 is the one special case to which the errata does not apply.
> Otherwise, kprobes and gdb would have a big, big issue.

Perhaps Richard/Suparna can confirm.

Ananth

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 05/10] Linux Kernel Markers - i386 optimized version
  2007-05-10 16:59         ` Mathieu Desnoyers
  2007-05-11  4:57           ` Ananth N Mavinakayanahalli
@ 2007-05-11  6:04           ` Andi Kleen
  2007-05-11 18:02             ` Mathieu Desnoyers
  1 sibling, 1 reply; 61+ messages in thread
From: Andi Kleen @ 2007-05-11  6:04 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Alan Cox, systemtap, prasanna, ananth, anil.s.keshavamurthy,
	akpm, linux-kernel, hch

On Thu, May 10, 2007 at 12:59:18PM -0400, Mathieu Desnoyers wrote:
> * Alan Cox (alan@lxorguk.ukuu.org.uk) wrote:
> > > * First issue : Impact on the system. If we try to make this system
> > >   scale, we will create very long irq disable sections. The expected
> > >   duration is the worse case IPI latency plus the time it takes to CPU A
> > >   to change the variable. We therefore directly grow the worse case
> > >   system's interrupt latency.
> > 
> > Not a huge problem. It doesn't scale in really horrible ways and the IPI
> > latency on a PIV or later is actually very good. Also the impact is less
> > than you might think as on huge huge boxes you want multiple copies of
> > the kernel text pages to reduce NUMA traffic, so you only have to sync
> > the group of processors involved 

I agree with Alan and disagree with you on the impact on the system.

> > 
> > > * Second issue : irq disabling does not protect us from NMI and traps.
> > >   We cannot use this algorithm to mark these code segments.
> > 
> > If you synchronize all the other processors and disable local interrupts
> > then the only traps you have to worry about are those you cause, and the
> > only person taking the trap will be you so you're ok.
> > 
> > NMI is hard but NMI is a special case not worth solving IMHO.
> > 
> 
> Not caring about NMIs may have more impact than one could expect. You
> have to be aware that (at least) the following code is executed in NMI
> context. Trying to patch any of these functions could result in a dying
> CPU :

There is a function to disable the nmi watchdog temporarily now


> In entry.S, there is also a call to local_irq_enable(), which falls into
> lockdep code.

??

> 
> Tracing those core kernel functions is a fundamental need of crash
> tracing. So, in my point of view, it is not "just" about tracing NMIs,
> but it's about tracing code that can be touched by NMIs.

You only need to handle the erratas during the modification, not during
the whole lifetime of the marker. 

The only frequent NMIs are watchdog and oprofile which both can
be stopped. Other NMIs are very infrequent.

BTW if you worry about NMI you would need to worry about machine
check and SMI too.

-Andi

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 07/10] Linux Kernel Markers - Documentation
  2007-05-10  6:58   ` Christoph Hellwig
                       ` (2 preceding siblings ...)
  2007-05-10 14:14     ` Mathieu Desnoyers
@ 2007-05-11 15:05     ` Valdis.Kletnieks
  3 siblings, 0 replies; 61+ messages in thread
From: Valdis.Kletnieks @ 2007-05-11 15:05 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Mathieu Desnoyers, akpm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 572 bytes --]

On Thu, 10 May 2007 07:58:47 BST, Christoph Hellwig said:
> On Wed, May 09, 2007 at 09:56:02PM -0400, Mathieu Desnoyers wrote:
> > Here is some documentation explaining what is/how to use the Linux
> > Kernel Markers.
> 
> Please remove all references to out of tree code from the kernel documentation.

[/usr/src/linux-2.6.21-mm2/Documentation] find . | xargs grep /sbin | wc -l
202

-ENOPATCH?

(The usefulness of a kernel documentation tree that doesn't mention hotplug,
udev, modprobe, and initrd, all of which are out-of-tree, is more than slightly
questionable...)


[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 05/10] Linux Kernel Markers - i386 optimized version
  2007-05-11  6:04           ` Andi Kleen
@ 2007-05-11 18:02             ` Mathieu Desnoyers
  2007-05-11 21:56               ` Alan Cox
  0 siblings, 1 reply; 61+ messages in thread
From: Mathieu Desnoyers @ 2007-05-11 18:02 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Alan Cox, systemtap, prasanna, ananth, anil.s.keshavamurthy,
	akpm, linux-kernel, hch

* Andi Kleen (ak@muc.de) wrote:
> On Thu, May 10, 2007 at 12:59:18PM -0400, Mathieu Desnoyers wrote:
> > * Alan Cox (alan@lxorguk.ukuu.org.uk) wrote:
> > > > * First issue : Impact on the system. If we try to make this system
> > > >   scale, we will create very long irq disable sections. The expected
> > > >   duration is the worse case IPI latency plus the time it takes to CPU A
> > > >   to change the variable. We therefore directly grow the worse case
> > > >   system's interrupt latency.
> > > 
> > > Not a huge problem. It doesn't scale in really horrible ways and the IPI
> > > latency on a PIV or later is actually very good. Also the impact is less
> > > than you might think as on huge huge boxes you want multiple copies of
> > > the kernel text pages to reduce NUMA traffic, so you only have to sync
> > > the group of processors involved 
> 
> I agree with Alan and disagree with you on the impact on the system.
> 

I just want to make sure I understand your disagreement. You do not seem
to provide any counter-argument to the following technical fact : the
proposed algorithm will increase the worse-case interrupt latency of the
kernel.

The IPI might be fast, but I have seen interrupts being disabled for
quite a long time in some kernel code paths. Having interrupts disabled
on _each cpu_ while running an IPI handler waiting to be synchronized
with other CPUs has this side-effect. Therefore, if I understand well,
you object that the worse-case interrupt latency in the Linux kernel is
not important. Since I have some difficulty agreeing with your
objection, I'll leave the debate about the importance of such
side-effects to others, since it is mostly a political issue.

Or maybe am I not understanding you correctly ?


> > > 
> > > > * Second issue : irq disabling does not protect us from NMI and traps.
> > > >   We cannot use this algorithm to mark these code segments.
> > > 
> > > If you synchronize all the other processors and disable local interrupts
> > > then the only traps you have to worry about are those you cause, and the
> > > only person taking the trap will be you so you're ok.
> > > 
> > > NMI is hard but NMI is a special case not worth solving IMHO.
> > > 
> > 
> > Not caring about NMIs may have more impact than one could expect. You
> > have to be aware that (at least) the following code is executed in NMI
> > context. Trying to patch any of these functions could result in a dying
> > CPU :
> 
> There is a function to disable the nmi watchdog temporarily now
> 

As you pointed out below, NMI is only one example of interrupt sources
that cannot be protected by irq disable, such as MCE and SMIs. See below
for the rest of the discussion about this point.

> 
> > In entry.S, there is also a call to local_irq_enable(), which falls into
> > lockdep code.
> 
> ??
> 

arch/i386/kernel/entry.S :
iret_exc:
        TRACE_IRQS_ON
        ENABLE_INTERRUPTS(CLBR_NONE)
        pushl $0                        # no error code
        pushl $do_iret_error
        jmp error_code

include/asm-i386/irqflags.h
# define TRACE_IRQS_ON                          \
        pushl %eax;                             \
        pushl %ecx;                             \
        pushl %edx;                             \
        call trace_hardirqs_on;                 \
        popl %edx;                              \
        popl %ecx;                              \
        popl %eax;

Which falls into the lockdep.c code.

> > 
> > Tracing those core kernel functions is a fundamental need of crash
> > tracing. So, in my point of view, it is not "just" about tracing NMIs,
> > but it's about tracing code that can be touched by NMIs.
> 
> You only need to handle the erratas during the modification, not during
> the whole lifetime of the marker. 
> 

I agree with you, but you need to make the modification of every callees
of functions such as printk() safe in order to be able to trace them
later.

> The only frequent NMIs are watchdog and oprofile which both can
> be stopped. Other NMIs are very infrequent.

If we race with an "infrequent" NMI with this algorithm, it will result
in a unspecified trap, most likely a GPF. So having a solution that is
correct most of the time is not an option here. It will not just cause a
glitch, but bring the whole system down.

> 
> BTW if you worry about NMI you would need to worry about machine
> check and SMI too.
> 

Absolutely. I use NMIs as an example of these conditions, but MCE and
SMI also present the same issue.

So we get another example (I am sure we could easily find more) :

arch/i386/kernel/cpu/mcheck/p4.c:intel_machine_check
  printk (and everything that printk calls)
    vprintk
      printk_clock
        sched_clock
      release_console_sem
        call_console_drivers
          .. therefore serial port code..
        wake_up_klogd
          wake_up_interruptible
            try_to_wake_up

So.. just the fact that the MCE uses printk involves scheduler code
execution. If you plan not to support NMI, MCE or SMI, you have to
forbid instrumentation of any of those code paths.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 05/10] Linux Kernel Markers - i386 optimized version
  2007-05-11  4:57           ` Ananth N Mavinakayanahalli
@ 2007-05-11 18:55             ` Mathieu Desnoyers
  2007-05-12  5:29             ` Suparna Bhattacharya
  1 sibling, 0 replies; 61+ messages in thread
From: Mathieu Desnoyers @ 2007-05-11 18:55 UTC (permalink / raw)
  To: Ananth N Mavinakayanahalli
  Cc: Alan Cox, Andi Kleen, systemtap, prasanna, anil.s.keshavamurthy,
	akpm, linux-kernel, hch, richardj_moore, suparna

* Ananth N Mavinakayanahalli (ananth@in.ibm.com) wrote:
> On Thu, May 10, 2007 at 12:59:18PM -0400, Mathieu Desnoyers wrote:
> > * Alan Cox (alan@lxorguk.ukuu.org.uk) wrote:
> 
> ...
> > > > * Third issue : Scalability. Changing code will stop every CPU on the
> > > >   system for a while. Compared to this, the int3-based approach will run
> > > >   through the breakpoint handler "if" one of the CPU happens to execute
> > > >   this code at the wrong time. The standard case is just an IPI (to
> > > 
> > > If I read the errata right then patching in an int3 will itself trigger
> > > the errata so anything could happen.
> > > 
> > > I believe there are other safe sequences for doing code patching - perhaps
> > > one of the Intel folk can advise ?
> 
> IIRC, when the first implementation of what exists now as kprobes was
> done (as part of the dprobes framework), this question did come up. I
> think the conclusion was that the errata applies only to multi-byte
> modifications and single-byte changes are guaranteed to be atomic.
> Given int3 on Intel is just 1-byte, we are safe.
> 
> > I'll let the Intel guys confirm this, I don't have the reference nearby
> > (I got this information by talking with the kprobe team members, and
> > they got this information directly from Intel developers) but the
> > int3 is the one special case to which the errata does not apply.
> > Otherwise, kprobes and gdb would have a big, big issue.
> 
> Perhaps Richard/Suparna can confirm.
> 

Ha-ha! I found the reference. It's worth quoting in full :
http://sourceware.org/ml/systemtap/2005-q3/msg00208.html
------
From: Richard J Moore <richardj_moore at uk dot ibm dot com>

There is another issue to consider when looking into using probes other
then int3:

Intel erratum 54 - Unsynchronized Cross-modifying code - refers to the
practice of modifying code on one processor where another has prefetched
the unmodified version of the code. Intel states that unpredictable
general protection faults may result if a synchronizing instruction
(iret, int, int3, cpuid, etc ) is not executed on the second processor
before it executes the pre-fetched out-of-date copy of the instruction.

When we became aware of this I had a long discussion with Intel's
microarchitecture guys. It turns out that the reason for this erratum
(which incidentally Intel does not intend to fix) is because the trace
cache - the stream of micorops resulting from instruction interpretation
- cannot guaranteed to be valid. Reading between the lines I assume this
issue arises because of optimization done in the trace cache, where it
is no longer possible to identify the original instruction boundaries.
If the CPU discoverers that the trace cache has been invalidated because
of unsynchronized cross-modification then instruction execution will be
aborted with a GPF. Further discussion with Intel revealed that
replacing the first opcode byte with an int3 would not be subject to
this erratum.

So, is cmpxchg reliable? One has to guarantee more than mere atomicity.

-----

Therefore, it is exactly what my implementation is doing : I make sure
that no CPU sees an out-of-date copy of a pre-fetched instruction by 1 -
using a breakpoint, which skips the instruction that is going to be
modified, 2 - issuing an IPI to every CPU to execute a sync_core(), to
make sure that even when the breakpoint is removed, no cpu could
possibly still have the out-of-date copy of the instruction, modify the
now unused 2nd byte of the instruction, and then put back the original
1st byte of the instruction.

It has exactly the same intent as the algorithm proposed by Intel, but
it has less side-effects, scales better and supports NMI, SMI and MCE.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 05/10] Linux Kernel Markers - i386 optimized version
  2007-05-11 18:02             ` Mathieu Desnoyers
@ 2007-05-11 21:56               ` Alan Cox
  2007-05-13 15:20                 ` Mathieu Desnoyers
  0 siblings, 1 reply; 61+ messages in thread
From: Alan Cox @ 2007-05-11 21:56 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Andi Kleen, systemtap, prasanna, ananth, anil.s.keshavamurthy,
	akpm, linux-kernel, hch

> The IPI might be fast, but I have seen interrupts being disabled for
> quite a long time in some kernel code paths. Having interrupts disabled
> on _each cpu_ while running an IPI handler waiting to be synchronized
> with other CPUs has this side-effect. Therefore, if I understand well,

This can already occur worst case when we spin on an IPI (eg a cross CPU
TLB shootdown)

If the INT3 is acknowledged as safe by intel either as is or by some
specific usage like lock mov then great. If not it isn't too bad a
problem.

And to be real about this - how many benchmarks do you know that care
about mega-kernel-debugs per second ?

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 05/10] Linux Kernel Markers - i386 optimized version
  2007-05-11  4:57           ` Ananth N Mavinakayanahalli
  2007-05-11 18:55             ` Mathieu Desnoyers
@ 2007-05-12  5:29             ` Suparna Bhattacharya
  1 sibling, 0 replies; 61+ messages in thread
From: Suparna Bhattacharya @ 2007-05-12  5:29 UTC (permalink / raw)
  To: Ananth N Mavinakayanahalli
  Cc: Mathieu Desnoyers, Alan Cox, Andi Kleen, systemtap, prasanna,
	anil.s.keshavamurthy, akpm, linux-kernel, hch, richardj_moore

On Fri, May 11, 2007 at 10:27:29AM +0530, Ananth N Mavinakayanahalli wrote:
> On Thu, May 10, 2007 at 12:59:18PM -0400, Mathieu Desnoyers wrote:
> > * Alan Cox (alan@lxorguk.ukuu.org.uk) wrote:
> 
> ...
> > > > * Third issue : Scalability. Changing code will stop every CPU on the
> > > >   system for a while. Compared to this, the int3-based approach will run
> > > >   through the breakpoint handler "if" one of the CPU happens to execute
> > > >   this code at the wrong time. The standard case is just an IPI (to
> > > 
> > > If I read the errata right then patching in an int3 will itself trigger
> > > the errata so anything could happen.
> > > 
> > > I believe there are other safe sequences for doing code patching - perhaps
> > > one of the Intel folk can advise ?
> 
> IIRC, when the first implementation of what exists now as kprobes was
> done (as part of the dprobes framework), this question did come up. I
> think the conclusion was that the errata applies only to multi-byte
> modifications and single-byte changes are guaranteed to be atomic.
> Given int3 on Intel is just 1-byte, we are safe.
> 
> > I'll let the Intel guys confirm this, I don't have the reference nearby
> > (I got this information by talking with the kprobe team members, and
> > they got this information directly from Intel developers) but the
> > int3 is the one special case to which the errata does not apply.
> > Otherwise, kprobes and gdb would have a big, big issue.
> 
> Perhaps Richard/Suparna can confirm.

I just tried digging up past discussions on this from Richard, about int3
being safe

http://sourceware.org/ml/systemtap/2005-q3/msg00208.html
http://lkml.org/lkml/2006/9/20/30

Regards
Suparna

> 
> Ananth

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 05/10] Linux Kernel Markers - i386 optimized version
  2007-05-11 21:56               ` Alan Cox
@ 2007-05-13 15:20                 ` Mathieu Desnoyers
  0 siblings, 0 replies; 61+ messages in thread
From: Mathieu Desnoyers @ 2007-05-13 15:20 UTC (permalink / raw)
  To: Alan Cox
  Cc: Andi Kleen, systemtap, prasanna, ananth, anil.s.keshavamurthy,
	akpm, linux-kernel, hch

* Alan Cox (alan@lxorguk.ukuu.org.uk) wrote:
> > The IPI might be fast, but I have seen interrupts being disabled for
> > quite a long time in some kernel code paths. Having interrupts disabled
> > on _each cpu_ while running an IPI handler waiting to be synchronized
> > with other CPUs has this side-effect. Therefore, if I understand well,
> 
> This can already occur worst case when we spin on an IPI (eg a cross CPU
> TLB shootdown)
> 

Hrm, maybe am I understanding something incorrectly there :

arch/i386/kernel/smp.c: native_flush_tlb_others() takes a spinlock, but
does not disable interrupts, while spinning waiting for other CPUs.
smp_invalidate_interrupt(), in the same file, does not spin waiting for
other CPUs. Therefore, I understand that none of these functions spin
with interrupts disabled, so this TLB flush does not show the same
behavior.


> If the INT3 is acknowledged as safe by intel either as is or by some
> specific usage like lock mov then great. If not it isn't too bad a
> problem.
> 

Another mail in this thread explains that the main issue is not the
atomicity of the code modification operation (although it must be atomic
for the CPU to see a correct instruction), but to the fact that the CPU
expects the pre-fetched instruction and the executed instruction to be
the same, except for the int3 case.

> And to be real about this - how many benchmarks do you know that care
> about mega-kernel-debugs per second ?

For users with real-time needs, the overall IRQ latency of the system
gives an upper-bound to  what can be executed by the application in a
given time-frame. People doing audio/video acquisition should be quite
interested in this metric.

So this is mostly a matter of how this action (enabling a marker) can
influence the overall system's latency. One of my goals is to provide
tracing in the Linux kernel with minimal performance and behavioral
impact on the system so it does not make the system flakyer than normal
and can be activated on a bogus system and still reproduce the original
problem.


Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 03/10] Allow userspace applications to use marker.h to parse the markers section in the kernel binary.
  2007-05-10 22:14     ` David Smith
@ 2007-06-23  8:09       ` Christoph Hellwig
  2007-06-23  9:25         ` Alan Cox
  0 siblings, 1 reply; 61+ messages in thread
From: Christoph Hellwig @ 2007-06-23  8:09 UTC (permalink / raw)
  To: David Smith; +Cc: Christoph Hellwig, Mathieu Desnoyers, akpm, linux-kernel

On Thu, May 10, 2007 at 05:14:25PM -0500, David Smith wrote:
> Christoph Hellwig wrote:
> >NACK.  This shouldn't be includedable from userspace and systemtap people
> >should stop doing crap like that but use kernel infrastructure everyone
> >else uses including runtime kernel code instead of stuffin all kinds of
> >crap into their broken translator.
> 
> Christoph,
> 
> I was using the above information to parse all the available markers in 
> a kernel, but I believe I've found a way where I don't need the marker.h 
> header, so I'm OK with your NACK.
> 
> Out of curiosity, what is the kernel infrastructure you believe I should 
> be using to get a list of all available markers in a kernel?  Note that 
> I also have a requirement to get a list of all available markers in a 
> kernel that isn't the currently running one.

None.  Modules using markers should ship with the kernel or be written
on deman and not beeing created by some unreliable parser.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 03/10] Allow userspace applications to use marker.h to parse the markers section in the kernel binary.
  2007-06-23  8:09       ` Christoph Hellwig
@ 2007-06-23  9:25         ` Alan Cox
  2007-06-23  9:32           ` Christoph Hellwig
  0 siblings, 1 reply; 61+ messages in thread
From: Alan Cox @ 2007-06-23  9:25 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: David Smith, Christoph Hellwig, Mathieu Desnoyers, akpm, linux-kernel

O> > Out of curiosity, what is the kernel infrastructure you believe I should 
> > be using to get a list of all available markers in a kernel?  Note that 
> > I also have a requirement to get a list of all available markers in a 
> > kernel that isn't the currently running one.
> 
> None.  Modules using markers should ship with the kernel or be written
> on deman and not beeing created by some unreliable parser.

And what do you think should happen in the real world instead of
Christoph fantasy land ?

Getting the marker exports right is what is needed to avoid having an
unreliable parser and ending up with a reliable one.

Or would you rather someone loaded a JVM into kernel space so it was
"shipped with the kernel"

Alan

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 03/10] Allow userspace applications to use marker.h to parse the markers section in the kernel binary.
  2007-06-23  9:25         ` Alan Cox
@ 2007-06-23  9:32           ` Christoph Hellwig
  2007-06-23  9:49             ` Alan Cox
  0 siblings, 1 reply; 61+ messages in thread
From: Christoph Hellwig @ 2007-06-23  9:32 UTC (permalink / raw)
  To: Alan Cox
  Cc: Christoph Hellwig, David Smith, Mathieu Desnoyers, akpm, linux-kernel

On Sat, Jun 23, 2007 at 10:25:15AM +0100, Alan Cox wrote:
> Getting the marker exports right is what is needed to avoid having an
> unreliable parser and ending up with a reliable one.
> 
> Or would you rather someone loaded a JVM into kernel space so it was
> "shipped with the kernel"

You're totall missing the point here.  We're talking about kernel internal
interface, and the point for them has always been not to care about out
of tree users.  There is no relation to anything involving a JVM here.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 03/10] Allow userspace applications to use marker.h to parse the markers section in the kernel binary.
  2007-06-23  9:32           ` Christoph Hellwig
@ 2007-06-23  9:49             ` Alan Cox
  2007-06-23 10:06               ` Christoph Hellwig
  0 siblings, 1 reply; 61+ messages in thread
From: Alan Cox @ 2007-06-23  9:49 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christoph Hellwig, David Smith, Mathieu Desnoyers, akpm, linux-kernel

On Sat, 23 Jun 2007 10:32:09 +0100
Christoph Hellwig <hch@infradead.org> wrote:

> On Sat, Jun 23, 2007 at 10:25:15AM +0100, Alan Cox wrote:
> > Getting the marker exports right is what is needed to avoid having an
> > unreliable parser and ending up with a reliable one.
> > 
> > Or would you rather someone loaded a JVM into kernel space so it was
> > "shipped with the kernel"
> 
> You're totall missing the point here.  We're talking about kernel internal
> interface, and the point for them has always been not to care about out
> of tree users.  There is no relation to anything involving a JVM here.

Of course there is Christoph.

If you have a system which generates and loads modules then they can't be
in tree (as they don't exist except transiently). On the other hand if it
outputs java byte codes then a JVM to process them can be in tree. It
would be a stupid solution to the problem but you appear to be objecting
to sane ones.

The system to create the dynamic modules could certainly be in-tree but to
argue that code dynamically created should be "in tree" already is a
bit silly really isn't it ?

A second way of point out your argument is totally and utterly bogus
would be the MODULE_ interface. The modutils are clearly out of tree
users.

So whats the difference between modutils and markers ? Would it suddenely
change if modutils developed modinfo --dump-markers ? 

Alan

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 03/10] Allow userspace applications to use marker.h to parse the markers section in the kernel binary.
  2007-06-23  9:49             ` Alan Cox
@ 2007-06-23 10:06               ` Christoph Hellwig
  2007-06-23 14:55                 ` Alan Cox
  0 siblings, 1 reply; 61+ messages in thread
From: Christoph Hellwig @ 2007-06-23 10:06 UTC (permalink / raw)
  To: Alan Cox
  Cc: Christoph Hellwig, David Smith, Mathieu Desnoyers, akpm, linux-kernel

On Sat, Jun 23, 2007 at 10:49:05AM +0100, Alan Cox wrote:
> The system to create the dynamic modules could certainly be in-tree but to
> argue that code dynamically created should be "in tree" already is a
> bit silly really isn't it ?

I never argued that.  Creating them intree is equivalent to having the
generated modules in tree for all purposes related to interface stability.

The important bit is that we should not even pretend to allow external
users that we keep a tiny part of the interface stable while the major
part isn't.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [patch 03/10] Allow userspace applications to use marker.h to parse the markers section in the kernel binary.
  2007-06-23 10:06               ` Christoph Hellwig
@ 2007-06-23 14:55                 ` Alan Cox
  0 siblings, 0 replies; 61+ messages in thread
From: Alan Cox @ 2007-06-23 14:55 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christoph Hellwig, David Smith, Mathieu Desnoyers, akpm, linux-kernel

On Sat, 23 Jun 2007 11:06:00 +0100
Christoph Hellwig <hch@infradead.org> wrote:

> On Sat, Jun 23, 2007 at 10:49:05AM +0100, Alan Cox wrote:
> > The system to create the dynamic modules could certainly be in-tree but to
> > argue that code dynamically created should be "in tree" already is a
> > bit silly really isn't it ?
> 
> I never argued that.  Creating them intree is equivalent to having the
> generated modules in tree for all purposes related to interface stability.

So if all the applications using markers are shoved into the kernel
source tree you are happy, and if they are distributed elsewhere you are
not ?

Or do you want a single controlled 'with the kernel' tool for doing the
outputting which alone has close links with the kernel (ie modinfo
--dump-markers) or similar ?

^ permalink raw reply	[flat|nested] 61+ messages in thread

end of thread, other threads:[~2007-06-23 14:49 UTC | newest]

Thread overview: 61+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-05-10  1:55 [patch 00/10] Linux Kernel Markers for 2.6.21-mm2 Mathieu Desnoyers
2007-05-10  1:55 ` [patch 01/10] Linux Kernel Markers - Add kconfig menus for the marker code Mathieu Desnoyers
2007-05-10  6:57   ` Christoph Hellwig
2007-05-10  1:55 ` [patch 02/10] Linux Kernel Markers, architecture independent code Mathieu Desnoyers
2007-05-10  5:10   ` Alexey Dobriyan
2007-05-10 12:58     ` Mathieu Desnoyers
2007-05-10 13:12     ` Mathieu Desnoyers
2007-05-10 19:00       ` Alexey Dobriyan
2007-05-10 19:46         ` Mathieu Desnoyers
2007-05-10  1:55 ` [patch 03/10] Allow userspace applications to use marker.h to parse the markers section in the kernel binary Mathieu Desnoyers
2007-05-10  6:51   ` Christoph Hellwig
2007-05-10 22:14     ` David Smith
2007-06-23  8:09       ` Christoph Hellwig
2007-06-23  9:25         ` Alan Cox
2007-06-23  9:32           ` Christoph Hellwig
2007-06-23  9:49             ` Alan Cox
2007-06-23 10:06               ` Christoph Hellwig
2007-06-23 14:55                 ` Alan Cox
2007-05-10  1:55 ` [patch 04/10] Linux Kernel Markers - PowerPC optimized version Mathieu Desnoyers
2007-05-10  6:57   ` Christoph Hellwig
2007-05-10  1:56 ` [patch 05/10] Linux Kernel Markers - i386 " Mathieu Desnoyers
2007-05-10  9:06   ` Andi Kleen
2007-05-10 15:55     ` Mathieu Desnoyers
2007-05-10 16:28       ` Alan Cox
2007-05-10 16:59         ` Mathieu Desnoyers
2007-05-11  4:57           ` Ananth N Mavinakayanahalli
2007-05-11 18:55             ` Mathieu Desnoyers
2007-05-12  5:29             ` Suparna Bhattacharya
2007-05-11  6:04           ` Andi Kleen
2007-05-11 18:02             ` Mathieu Desnoyers
2007-05-11 21:56               ` Alan Cox
2007-05-13 15:20                 ` Mathieu Desnoyers
2007-05-10  1:56 ` [patch 06/10] Linux Kernel Markers - Non optimized architectures Mathieu Desnoyers
2007-05-10  5:13   ` Alexey Dobriyan
2007-05-10  6:56   ` Christoph Hellwig
2007-05-10 13:11     ` Mathieu Desnoyers
2007-05-10 13:40       ` Alan Cox
2007-05-10 14:25         ` Mathieu Desnoyers
2007-05-10 15:33         ` Nicholas Berry
2007-05-10 16:09           ` Alan Cox
2007-05-10  1:56 ` [patch 07/10] Linux Kernel Markers - Documentation Mathieu Desnoyers
2007-05-10  6:58   ` Christoph Hellwig
2007-05-10 11:41     ` Alan Cox
2007-05-10 11:41       ` Christoph Hellwig
2007-05-10 12:48         ` Alan Cox
2007-05-10 12:52           ` Pekka Enberg
2007-05-10 13:04             ` Alan Cox
2007-05-10 13:16               ` Pekka J Enberg
2007-05-10 13:43                 ` Alan Cox
2007-05-10 14:04                   ` Pekka J Enberg
2007-05-10 14:12     ` Mathieu Desnoyers
2007-05-10 14:14     ` Mathieu Desnoyers
2007-05-11 15:05     ` Valdis.Kletnieks
2007-05-10 12:00   ` Christoph Hellwig
2007-05-10 15:51   ` Scott Preece
2007-05-10  1:56 ` [patch 08/10] Defines the linker macro EXTRA_RWDATA for the marker data section Mathieu Desnoyers
2007-05-10  1:56 ` [patch 09/10] Linux Kernel Markers - Use EXTRA_RWDATA in architectures Mathieu Desnoyers
2007-05-10  1:56 ` [patch 10/10] Port of blktrace to the Linux Kernel Markers Mathieu Desnoyers
2007-05-10  6:53   ` Christoph Hellwig
2007-05-10  9:20   ` Jens Axboe
2007-05-10  2:30 ` [patch 00/10] Linux Kernel Markers for 2.6.21-mm2 Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).